Migrating Subversion repositories to Git
I currently host a number of public Subversion repositories for open source projects at svn.wincent.com. Due to long-standing dissatisfaction with Subversion's inadequate support for branching and merging, earlier this year I started using SVK locally as an additional layer while maintaining the server-side infrastructure.
But although SVK is very good, it is written in Perl and has proved to be quite slow. SVK's local mirroring eliminates the network bottleneck for some operations, but it remains slow overall. Git, on the other hand, delivers most of the advantages of SVK (see "Git advantages" and "SVK advantages") but additionally offers unrivalled speed for most operations, is more powerful and more robust in my judgement, and offers excellent documentation (see "Git documentation"), on a par with or better than Subversion's and far in advance of SVK's.
Although Git, like SVK, can be used as a gateway to/from a central Subversion server, I've decided to simplify my infrastructure by eventually replacing my existing Subversion repositories with Git ones rather than just layering Git over the top.
Once the initial set-up is done (described below), migrating additional repositories is very easy. The basic pattern for a shallow (no-history) import is:
# For *public* repositories:
# on the remote (public) machine:
# create a new bare repository
cd /pub/git/path_to_public_repositories
sudo -u git mkdir repo.git
cd repo.git
sudo -H -u git git --bare init
sudo -u git touch git-daemon-export-ok
# Alternatively, for private repositories:
cd /pub/git/path_private_repositories
sudo -u git mkdir repo.git
cd repo.git
sudo -H -u git git --bare init
# For private *and* public repositories:
sudo -u git -H git config weblog.name "name for repo in Git log"
sudo -u git -H git config weblog.host "example.com"
sudo -u git -H git config weblog.uri "/mt/mt-xmlrpc.cgi"
sudo -u git -H git config weblog.blog 1
sudo -u git -H git config weblog.login "my_login"
sudo -u git -H git config weblog.password "my_password"
sudo -u git -H git config --bool weblog.publish true
sudo -u git cp hooks/post-receive hooks/post-receive.orig
sudo -u git cp path_to_post_receive_script hooks/post-receive
sudo chmod u+x hooks/post-receive
# For *private* repositories only
sudo -u git -H git config --add weblog.filters "confidential phrase"
# on the local (private) machine:
# create a new empty repository and prepare the initial commit
svn export svn://svn.example.com/repo/trunk repo
cd repo
git init
git add .
git commit -s
# actually push
git remote add origin git.example.com:/pub/git/path_to_public_repositories/repo.git
git push --all
# add a tag corresponding to Subversion revision number (eg r79)
git tag -s r79
git push --tags
# now, back on the remote machine:
# for *public* repositories, set up gitweb
cd path_to_repo
echo "Description of this repository" | sudo -u git tee description
echo "repository.git owner@example.com" | sudo -u git tee -a /pub/git/conf/gitweb-projects
Preliminaries
The standard location for repositories served by git-daemon (see man git-daemon) is inside /pub/git so I began by creating the /pub directory:
cd /
sudo mkdir pub
Seeing as I was planning on locking down access with git-shell (see man git-shell) I added the appropriate line into my /etc/shells list:
sudo -s
# tailor according to where you installed git-shell
echo "/usr/local/bin/git-shell" >> /etc/shells
I then created a git user and set its shell to /usr/local/bin/git-shell, disabled login (that is, in the /etc/shadow file the git user should have only a * in its password field), and set its home directory to /pub/git; seeing as I used Webmin the home directory and corresponding git group were automatically set up for me.
Then came some once-only global set-up for the git user:
# turn on new 1.5 features which break backwards compatibility
sudo -H -u git git config --global core.legacyheaders false
sudo -H -u git git config --global repack.usedeltabaseoffset true
I also had to add this to /etc/services:
git 9418/tcp # Git version control system
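Because edits like this one (and the earlier addition to /etc/shells) are easy to repeat by accident, here is a minimal sketch of an append-only-if-missing helper. The function name `append_line_once` is hypothetical and not part of the original set-up. It only detects a line that is character-for-character identical, so an existing entry with different spacing would not be noticed.

```shell
# append_line_once FILE LINE
# Appends LINE to FILE unless an identical line is already present.
# (Hypothetical helper name; not part of Git or the original set-up.)
append_line_once() {
    file=$1
    line=$2
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (run as root, since /etc/services is not user-writable):
#   append_line_once /etc/services "git 9418/tcp # Git version control system"
```

Running the helper twice with the same arguments leaves a single copy of the line in the file.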
And add this file, /etc/xinetd.d/git:
service git
{
port = 9418
socket_type = stream
protocol = tcp
user = git
server = /usr/local/bin/git-daemon
server_args = --inetd --base-path=/pub/git/path_to_public_repos -- /pub/git/path_to_public_repos
type = UNLISTED
wait = no
max_load = 5
instances = 5
}
Finally I had to get xinetd to re-read its configuration by sending it a SIGHUP.
SSH set up
While anonymous, read-only access is provided by the git-daemon, write access for developers takes place over SSH. As I am the only contributor on these projects at this stage, the set-up is quite straightforward: there are no tricky permissions or shared repository concerns to worry about.
First of all, on my local machine I generated a public/private key pair for authentication:
ssh-keygen -t dsa -f ~/.ssh/id_dsa_git
chmod 400 ~/.ssh/id_dsa_git
I copied the public key to the remote machine, adding a corresponding entry to the ~/.ssh/authorized_keys file in the git user home directory.
no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty ssh-dss AAAAB...CkiWA== wincent@example.com (git)
On the local machine, add the following to ~/.ssh/config:
Host git.example.com
  IdentityFile ~/.ssh/id_dsa_git
  HostName git.example.com
  User git
I also added the key to the list of keys managed by ssh-agent so that I wouldn't have to repeatedly enter my passphrase. If this is all set up correctly you should be able to run ssh git.example.com and see this error:
fatal: What do you think I am? A shell?
This is not actually a bad thing; the error message demonstrates that you were able to log in via public key authentication (good) and that you were given only restricted access thanks to git-shell (again, a good thing).
If this doesn't work the first thing to check should be the permissions on your remote (and local) ~/.ssh directory and its contents. I recommend permissions of 500 on the directory and 400 on the contents.
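The permissions recommended above can be applied in one step. This is a minimal sketch; the function name `lock_down_ssh_dir` is hypothetical, and it assumes that every regular file under the directory should become read-only for its owner.

```shell
# lock_down_ssh_dir DIR
# Hypothetical helper: applies the permissions recommended above
# (400 on the contents, 500 on the directory itself).
lock_down_ssh_dir() {
    dir=$1
    # Regular files first, then the directory that contains them
    find "$dir" -type f -exec chmod 400 {} +
    chmod 500 "$dir"
}

# Example:
#   lock_down_ssh_dir ~/.ssh
```

Note that with mode 500 on the directory nothing new can be created inside it, so write permission has to be restored temporarily (chmod u+w) before adding further keys or files.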
Importing from the Subversion repository
As noted here there are a number of ways to get an existing Subversion repository into Git. Among them, we have:
- Use git svn init to set up a local, two-way mirror of an existing Subversion repository and all its history; this is similar to creating an SVK mirror
- Use git svnimport to do a once-off import of an existing Subversion repository and all its history
- Create a new Git repository and import only the tip of the current trunk from an existing Subversion repository (no history is imported)
I tried the first method first and the import failed half-way through:
$ git svn init -t tags -b branches -T trunk svn+ssh://svn.example.com/repo
$ git svn fetch
...
r76 = b5a75a3212b6620ec0a6967275cdfcec4844461d (trunk)
Malformed network data: Malformed network data at /usr/local/bin/git-svn line 964
Without really knowing the cause of the failure it seemed wise to try another method, and in any case, I didn't really want a two-way gateway but a once-off import so that I could eventually decommission the public Subversion server.
So I then tried the second alternative:
echo "wincent = Wincent Colaiuta <win@wincent.com>" >> ~/.svn-authors
git svnimport -i -v -I .gitignore -A ~/.svn-authors -C WOTest svn://svn.example.com/repo
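For a repository with more than one committer, the authors file can be seeded from the Subversion log rather than typed by hand. The following is a minimal sketch under stated assumptions: the function name `authors_from_log` is hypothetical, it expects the output of `svn log -q` on standard input, and the example.com addresses it emits are placeholders that still need to be edited.

```shell
# authors_from_log
# Reads `svn log -q` output on stdin and prints one skeleton line per
# distinct committer in the "login = Name <email>" format shown above.
# The example.com address is a placeholder to be edited by hand.
authors_from_log() {
    awk -F'|' '/^r[0-9]+ / {
        gsub(/^ +| +$/, "", $2)   # trim the spaces around the login
        print $2
    }' | sort -u | sed 's/.*/& = & <&@example.com>/'
}

# Example:
#   svn log -q svn://svn.example.com/repo | authors_from_log >> ~/.svn-authors
```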
Although this worked on my local (Mac OS X Tiger) machine, it failed on my remote Red Hat Enterprise Linux machine because it didn't have the necessary prerequisites installed (SVN::Core and friends). It had worked on the Mac OS X box because I had already built the Subversion Perl bindings when installing SVK. I tried to build the bindings on the Red Hat box but without success (see "Upgrading to Subversion 1.4.4").
So this left me with two options:
- Forget importing the history and seed a clean Git repository with only the tip of the current head
- Use git svnimport to do the set-up on the Mac OS X box and then transfer the repository over to the Red Hat machine
The former was the easier of the two, so that's the approach I tried first.
So, on the remote machine:
# set up remote repo, bare
sudo -H -u git mkdir test.git
cd test.git
# note that --bare comes before the init subcommand
# it is equivalent to --git-dir=`pwd`
sudo -H -u git git --bare init
sudo -H -u git touch git-daemon-export-ok
Now on the local machine, create a new empty repository and prepare the initial commit:
# grab tip of trunk to seed git repo
svn export svn://svn.example.com/repo/trunk test
cd test
git init
git add .
git commit -s
# add a tag corresponding to Subversion revision number
git tag -s r208
# actually push
git push git.example.com:/pub/git/path_to_public_repos/test.git master
Note how the default protocol is SSH and it is not necessary to specify it explicitly.
This pushed only the initial contents of the tree, not the tag. I believe I could have included the --tags switch to git push to have the tag included.
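The behaviour described here can be reproduced without touching any server. The sketch below is an illustration only: it uses a scratch directory, a local bare repository as a stand-in for the remote, a one-file tree as a stand-in for the svn export, and an annotated tag (-a) in place of the signed one (-s) so that no GPG key is needed. The function name `demo_push_tags` and the identity values are placeholders.

```shell
# demo_push_tags: show that pushing a branch leaves tags behind and that
# "git push --tags" sends them. Runs entirely inside a scratch directory.
demo_push_tags() (
    set -e
    # Identity for the throwaway commit and tag (placeholder values)
    export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
    export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

    scratch=$(mktemp -d)
    git init -q --bare "$scratch/test.git"      # stand-in for the remote

    mkdir "$scratch/test"                       # stand-in for the svn export
    cd "$scratch/test"
    echo "hello" > README
    git init -q
    git symbolic-ref HEAD refs/heads/master     # fix the branch name explicitly
    git add .
    git commit -q -m "Initial import"
    git tag -a -m "Subversion r208" r208        # annotated, unsigned tag

    git push -q "$scratch/test.git" master
    echo "after branch push: [$(git --git-dir="$scratch/test.git" tag -l)]"

    git push -q --tags "$scratch/test.git"
    echo "after --tags push: [$(git --git-dir="$scratch/test.git" tag -l)]"

    cd /
    rm -rf "$scratch"
)

demo_push_tags
```

The first line of output should show an empty tag list and the second should show r208, which matches the observation that the tag only arrives once --tags is given.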
I also decided to set up a mapping from local master to remote master to make these pushes easier in the future. For example, I could set up a file (locally) at .git/remotes/my_shortcut with contents like this:
URL: git.example.com:/pub/git/path_to_public_repos/test.git
Push: master
And then push the tag (for example) by doing:
git push my_shortcut r208
I explained this in a write-up to the mailing list here, and it was clarified that the "new" way of setting up these remote references is to add them to your .git/config:
[remote "origin"]
url = ssh://git.example.com/pub/git/path_to_public_repos/test
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
remote = origin
merge = refs/heads/master
You can get this remote configuration (the same as would be produced by git clone) by using the git remote command:
git init
git remote add origin git.example.com:/pub/git/path_to_public_repos/test.git
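To see exactly what git remote add records, the resulting settings can be read back with git config. This is a minimal sketch run in a scratch directory; the function name `show_remote_config` is hypothetical, and no network access takes place because adding a remote does not contact it.

```shell
# show_remote_config: add a remote in a scratch repository and print the
# settings that end up under [remote "origin"] in .git/config.
show_remote_config() (
    set -e
    scratch=$(mktemp -d)
    cd "$scratch"
    git init -q
    git remote add origin git.example.com:/pub/git/path_to_public_repos/test.git
    git config --get remote.origin.url
    git config --get remote.origin.fetch
    cd /
    rm -rf "$scratch"
)

show_remote_config
```

The command writes the url and fetch settings of the remote section only. The [branch "master"] section is not created by git remote add; one way to obtain it in later Git versions is to push with -u (for example, git push -u origin master).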
Yet another alternative is to just clone the now-populated remote repository; cloning a totally empty repository won't work but now that it has some content cloning will work fine and doing a git clone will automatically set you up to push and pull:
git clone ssh://git.example.com/pub/git/path_to_public_repos/test
I later also tried the approach of creating a repository on my local machine with full history, zipping up the repository and transferring it to a remote server, unpacking it and configuring it on the remote server. This approach worked roughly as follows:
# extract the repository
cd path_to_git_repositories
sudo -u git cp ~/walrus.zip .
sudo -u git unzip walrus.zip
sudo rm walrus.zip
sudo -u git mv walrus/.git Walrus.git
sudo rm -r walrus
# repository set-up
cd Walrus.git
sudo -u git touch git-daemon-export-ok
echo "Walrus.git win@wincent.com" | sudo tee -a path_to_conf_dir/gitweb-projects
echo "Object-oriented templating system" | sudo tee description
# remove local junk
sudo rm qgit_cache.dat svn2git svn-authors
# remove now irrelevant "origin" head
sudo rm refs/heads/origin
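An alternative to moving the .git directory by hand, offered here as a sketch and not as what was actually done above, is to let git clone --bare produce the bare copy before it is zipped up. A bare clone carries over only objects and refs, so stray files sitting inside .git are left behind, and it records core.bare = true in its configuration. The function name `demo_bare_clone`, the file contents and the identity values are placeholders.

```shell
# demo_bare_clone: build a small repository, make a bare clone of it, and
# report whether the clone is marked bare and how many commits it holds.
demo_bare_clone() (
    set -e
    export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
    export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

    scratch=$(mktemp -d)
    cd "$scratch"

    mkdir walrus                      # stand-in for the repository with history
    cd walrus
    git init -q
    echo "template" > README
    git add .
    git commit -q -m "Import"
    cd ..

    git clone -q --bare walrus Walrus.git
    echo "core.bare=$(git --git-dir=Walrus.git config --bool core.bare)"
    echo "commits=$(git --git-dir=Walrus.git rev-list --all --count)"

    cd /
    rm -rf "$scratch"
)

demo_bare_clone
```

The resulting Walrus.git directory could then be archived and unpacked on the server in place of the renamed .git directory.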
But in the end I decided that instead of keeping the legacy history in the Git repository I preferred to make a clean break and start with a brand new (historyless) repository. I was mostly motivated by:
- The fact that Git encourages certain conventions for formatting commit messages that I hadn't previously followed in the Subversion era; for example some of my commit messages are too wide and so don't display well in Gitweb
- The desire for a clean "psychological break" with the old code and a "fresh start"; the history is still available in the Subversion repository, but my real attention is focussed on the present codebase, not where it came from
- A desire to adopt a more disciplined approach in the future
See also
Articles in this knowledge base
- Setting up gitweb
- Git repository maintenance
- Setting up a brand new public Git repository
- Setting up a brand new private Git repository
Additional notes
My initial attempts at importing were done on the remote server directly, so I also added this to the ~/.gitconfig for the git user, seeing as I'll be the only one doing commits for the foreseeable future:
sudo -H -u git git config --global user.email win@wincent.com
sudo -H -u git git config --global user.name "Wincent Colaiuta"
I suspect that this information is not actually required when pushing from an appropriately configured local repository, because the author information from the local repository should be used.
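This can be checked locally. Author details are stored inside each commit object when the commit is made, and a push transfers those objects unchanged. The sketch below commits under a placeholder identity in a scratch repository, pushes to a local bare repository standing in for the server, and reads the author back from the receiving side. The function name `demo_author_survives_push` and the identity values are hypothetical.

```shell
# demo_author_survives_push: commit locally under a known identity, push to
# a bare repository, and print the author as seen on the receiving side.
demo_author_survives_push() (
    set -e
    # Make sure only the repository-level identity below is used
    unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

    scratch=$(mktemp -d)
    git init -q --bare "$scratch/remote.git"    # stand-in for the server

    mkdir "$scratch/local"
    cd "$scratch/local"
    git init -q
    git symbolic-ref HEAD refs/heads/master
    git config user.name "Local Author"
    git config user.email "local@example.com"
    echo "hello" > README
    git add .
    git commit -q -m "Initial import"

    git push -q "$scratch/remote.git" master
    git --git-dir="$scratch/remote.git" log -1 --format='%an <%ae>' master

    cd /
    rm -rf "$scratch"
)

demo_author_survives_push
```

The identity printed from the bare repository is the one configured in the local repository, regardless of any user.name or user.email set for the account that owns the receiving repository.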
