Migrating Subversion repositories to Git

I currently host a number of public Subversion repositories for open source projects at svn.wincent.com. Due to long-standing dissatisfaction with Subversion's inadequate support for branching and merging, earlier this year I started using SVK locally as an additional layer while maintaining the server-side infrastructure.

But although SVK is very good, it is written in Perl and has proved to be quite slow: its local mirroring eliminates the network bottleneck for some operations, but it is still slow overall. Git, on the other hand, delivers most of the advantages of SVK (see "Git advantages" and "SVK advantages") but additionally offers unrivalled speed for most operations, is in my judgement more powerful and more robust, and offers excellent documentation (see "Git documentation"), on par with or better than Subversion's and far in advance of SVK's.

Although Git, like SVK, can be used as a gateway to/from a central Subversion server, I've decided to simplify my infrastructure by eventually replacing my existing Subversion repositories with Git ones rather than just layering Git over the top.

Once the initial set-up is done (described below), migrating additional repositories is very easy. The basic pattern for a shallow (no-history) import is:

# For *public* repositories:
# on the remote (public) machine:
# create a new bare repository
cd /pub/git/path_to_public_repositories
sudo -u git mkdir repo.git
cd repo.git
sudo -H -u git git --bare init
sudo -u git touch git-daemon-export-ok

# Alternatively, for private repositories:
cd /pub/git/path_private_repositories
sudo -u git mkdir repo.git
cd repo.git
sudo -H -u git git --bare init

# For private *and* public repositories:
sudo -u git -H git config weblog.name "name for repo in Git log"
sudo -u git -H git config weblog.host "example.com"
sudo -u git -H git config weblog.uri "/mt/mt-xmlrpc.cgi"
sudo -u git -H git config weblog.blog 1
sudo -u git -H git config weblog.login "my_login"
sudo -u git -H git config weblog.password "my_password"
sudo -u git -H git config --bool weblog.publish true
sudo -u git cp hooks/post-receive hooks/post-receive.orig
sudo -u git cp path_to_post_receive_script hooks/post-receive
sudo chmod u+x hooks/post-receive

# For *private* repositories only
sudo -u git -H git config --add weblog.filters "confidential phrase"

# on the local (private) machine:
# create a new empty repository and prepare the initial commit
svn export svn://svn.example.com/repo/trunk repo
cd repo
git init
git add .
git commit -s

# actually push
git remote add origin git.example.com:/pub/git/path_to_public_repositories/repo.git
git push --all

# add a tag corresponding to Subversion revision number (eg r79)
git tag -s r79
git push --tags

# now, back on the remote machine:
# for *public* repositories, set up gitweb
cd path_to_repo
echo "Description of this repository" | sudo -u git tee description
echo "repository.git owner@example.com" | sudo -u git tee -a /pub/git/conf/gitweb-projects
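
The weblog.* keys in the recipe above are not built-in Git settings; they are read by the custom post-receive script, which is not shown in this article. As a minimal sketch only, a hook could read them roughly like this. The scratch repository, the second filter phrase, and the variable names are assumptions made for illustration:

```shell
# Sketch only: the real post-receive script is not shown in this article.
# A scratch bare repository stands in for repo.git.
scratch="$(mktemp -d)"
git init --quiet --bare "$scratch/repo.git"
cd "$scratch/repo.git"

# The same kind of settings as in the recipe above (values are placeholders).
git config weblog.name "name for repo in Git log"
git config --bool weblog.publish true
git config --add weblog.filters "confidential phrase"
git config --add weblog.filters "another phrase"

# With the repository as the current directory (as in a hook for a bare
# repository), plain "git config" calls read the per-repository settings.
weblog_name="$(git config --get weblog.name)"
weblog_publish="$(git config --bool --get weblog.publish)"
# --get-all returns every value of a multi-valued key, one per line.
weblog_filters="$(git config --get-all weblog.filters)"

if [ "$weblog_publish" = "true" ]; then
    echo "would publish to weblog: $weblog_name"
fi
```

Because weblog.filters is set with --add, it can hold several values, which is why the sketch reads it with --get-all rather than --get.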

Preliminaries

The standard location for repositories served by git-daemon (see man git-daemon) is inside /pub/git so I began by creating the /pub directory:

cd /
sudo mkdir pub

Seeing as I was planning on locking down access with git-shell (see man git-shell) I added the appropriate line into my /etc/shells list:

sudo -s

# tailor according to where you installed git-shell
echo "/usr/local/bin/git-shell" >> /etc/shells
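
Running the echo line above twice would leave a duplicate entry in /etc/shells. A minimal sketch of an append that is safe to repeat follows; add_shell_once is a hypothetical helper name, and the demonstration runs against a scratch file rather than the real /etc/shells:

```shell
# Hypothetical helper: append a shell to a shells file only if it is not
# already listed. The second argument defaults to /etc/shells.
add_shell_once() {
    shell_path="$1"
    shells_file="${2:-/etc/shells}"
    # -x matches whole lines, -F treats the path as a fixed string
    if ! grep -qxF "$shell_path" "$shells_file" 2>/dev/null; then
        echo "$shell_path" >> "$shells_file"
    fi
}

# Try it against a scratch file rather than the real /etc/shells.
demo_shells="$(mktemp)"
printf '/bin/sh\n/bin/bash\n' > "$demo_shells"
add_shell_once /usr/local/bin/git-shell "$demo_shells"
add_shell_once /usr/local/bin/git-shell "$demo_shells"   # second call is a no-op
cat "$demo_shells"
```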

I then created a git user, set its shell to /usr/local/bin/git-shell, set its home directory to /pub/git, and disabled password login (that is, in the /etc/shadow file the git user should have only a * in its password field). Seeing as I used Webmin, the home directory and corresponding git group were automatically set up for me.

Then I did some one-off global set-up for the git user:

# turn on new 1.5 features which break backwards compatibility
sudo -H -u git git config --global core.legacyheaders false
sudo -H -u git git config --global repack.usedeltabaseoffset true

I also had to add this to /etc/services:

git             9418/tcp                        # Git version control system

And add this file, /etc/xinetd.d/git:

service git
{
        port = 9418
        socket_type = stream
        protocol = tcp
        user = git
        server = /usr/local/bin/git-daemon
        server_args = --inetd --base-path=/pub/git/path_to_public_repos -- /pub/git/path_to_public_repos
        type = UNLISTED
        wait = no
        max_load = 5
        instances = 5
}

Finally I had to get xinetd to re-read its configuration by sending it a SIGHUP.
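
One way to send that signal is via xinetd's pid file. The following is a sketch under assumptions: hup_from_pidfile is a hypothetical helper, and the default path /var/run/xinetd.pid may not match your system.

```shell
# Hypothetical helper: send SIGHUP to the process recorded in a pid file.
# The default location /var/run/xinetd.pid is an assumption; check where
# your xinetd writes its pid file (or whether it writes one at all).
hup_from_pidfile() {
    pidfile="${1:-/var/run/xinetd.pid}"
    if [ ! -r "$pidfile" ]; then
        echo "cannot read pid file: $pidfile" >&2
        return 1
    fi
    kill -HUP "$(cat "$pidfile")"
}

# As root on the server this would be:
#   hup_from_pidfile
```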

SSH set up

While anonymous, read-only access is provided by the git-daemon, write access for developers takes place over SSH. As I am the only contributor on these projects at this stage the set-up is quite straightforward, as there are no tricky permissions or shared repository concerns to worry about.

First of all, on my local machine I generated a public/private key pair for authentication:

ssh-keygen -t dsa -f ~/.ssh/id_dsa_git
chmod 400 ~/.ssh/id_dsa_git

I copied the public key to the remote machine, adding a corresponding entry to the ~/.ssh/authorized_keys file in the git user home directory.

no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty ssh-dss AAAAB...CkiWA== wincent@example.com (git)

On the local machine, add the following to ~/.ssh/config:

Host git.example.com
  IdentityFile ~/.ssh/id_dsa_git
  HostName git.example.com
  User git

I also added the key to the list of keys managed by ssh-agent so that I wouldn't have to repeatedly enter my passphrase. If this is all correctly set up you should be able to run ssh git.example.com and see this error:

fatal: What do you think I am? A shell?

This is not actually a bad thing; the error message demonstrates that you were able to log in via public key authentication (good) and that you were given only restricted access thanks to git-shell (again, a good thing).

If this doesn't work the first thing to check should be the permissions on your remote (and local) ~/.ssh directory and its contents. I recommend permissions of 500 on the directory and 400 on the contents.
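
A minimal sketch of applying those permissions, assuming a hypothetical helper named tighten_ssh_perms. It is demonstrated on a scratch directory so that nothing in a real ~/.ssh is touched:

```shell
# Hypothetical helper: apply the permissions recommended above to an
# SSH configuration directory (500 on the directory, 400 on files in it).
tighten_ssh_perms() {
    ssh_dir="$1"
    find "$ssh_dir" -type f -exec chmod 400 {} +
    chmod 500 "$ssh_dir"
}

# Demonstrate on a scratch directory instead of the real ~/.ssh.
demo_ssh="$(mktemp -d)"
touch "$demo_ssh/authorized_keys"
tighten_ssh_perms "$demo_ssh"
ls -ld "$demo_ssh" "$demo_ssh/authorized_keys"
```

On the real machines the call would be tighten_ssh_perms ~/.ssh, run as the owner of the directory on both the local and the remote side.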

Importing from the Subversion repository

There are a number of ways to get an existing Subversion repository into Git. Among them:

  • Use git svn init to set up a local, two-way mirror of an existing Subversion repository and all its history; this is similar to creating an SVK mirror
  • Use git svnimport to do a once-off import of an existing Subversion repository and all its history
  • Create a new Git repository and import only the tip of the current trunk from an existing Subversion repository (no history is imported)

I tried the first method first and the import failed half-way through:

$ git svn init -t tags -b branches -T trunk svn+ssh://svn.example.com/repo
$ git svn fetch
...
r76 = b5a75a3212b6620ec0a6967275cdfcec4844461d (trunk)
Malformed network data: Malformed network data at /usr/local/bin/git-svn line 964

Without really knowing the cause of the failure it seemed wise to try another method, and in any case, I didn't really want a two-way gateway but a once-off import so that I could eventually decommission the public Subversion server.

So I then tried the second alternative:

echo "wincent = Wincent Colaiuta <win@wincent.com>" >> ~/.svn-authors
git svnimport -i -v -I .gitignore -A ~/.svn-authors -C WOTest svn://svn.example.com/repo
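
For a repository with more than one committer, the authors file needs one "login = Name <email>" line per Subversion login. The following sketch builds a skeleton of such a file from svn log --quiet output. svn_log_to_authors is a hypothetical helper, and the names and @example.com addresses it emits are placeholders to be corrected by hand:

```shell
# Hypothetical helper: read "svn log --quiet" output on stdin and print one
# "login = login <login@example.com>" line per distinct committer.
# The names and addresses are placeholders to be edited by hand afterwards.
svn_log_to_authors() {
    awk -F'|' '/^r[0-9]+ / {
        login = $2
        gsub(/^ +| +$/, "", login)   # trim the padding around the field
        print login " = " login " <" login "@example.com>"
    }' | sort -u
}

# Real use (needs access to the Subversion server):
#   svn log --quiet svn://svn.example.com/repo | svn_log_to_authors > ~/.svn-authors

# Offline demonstration with sample log lines:
sample_authors="$(printf '%s\n' \
    '------------------------------------------------------------------------' \
    'r2 | wincent | 2007-06-02 10:00:00 +0200 (Sat, 02 Jun 2007)' \
    '------------------------------------------------------------------------' \
    'r1 | wincent | 2007-06-01 10:00:00 +0200 (Fri, 01 Jun 2007)' \
    '------------------------------------------------------------------------' \
    | svn_log_to_authors)"
echo "$sample_authors"
```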

Although this worked on my local (Mac OS X Tiger) machine, it failed on my remote Red Hat Enterprise Linux machine because it didn't have the necessary prerequisites installed (SVN::Core and friends). It had worked on the Mac OS X box because I had already built the Subversion Perl bindings when installing SVK. I tried to build the bindings on the Red Hat box but without success (see "Upgrading to Subversion 1.4.4").

So this left me with two options:

  • Forget importing the history and seed a clean Git repository with only the tip of the current head
  • Use git svnimport to do the set-up on the Mac OS X box and then transfer the repository over to the Red Hat machine

The former was the easier of the two, so that's the approach I tried first.

So, on the remote machine:

# set up remote repo, bare
sudo -H -u git mkdir test.git
cd test.git

# note that --bare comes before the init subcommand
# it is equivalent to --git-dir=$(pwd)
sudo -H -u git git --bare init
sudo -H -u git touch git-daemon-export-ok
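
To confirm that the result really is a bare repository, git rev-parse can be asked directly. A minimal sketch, using a scratch directory and no sudo in place of the real /pub/git location:

```shell
# Sketch: create a bare repository in a scratch directory and inspect it.
scratch="$(mktemp -d)"
mkdir "$scratch/test.git"
cd "$scratch/test.git"
git --bare init --quiet
touch git-daemon-export-ok

# A bare repository has no working tree; HEAD, objects/ and refs/ sit
# directly in test.git rather than in a .git subdirectory.
is_bare="$(git rev-parse --is-bare-repository)"
echo "bare: $is_bare"
ls
```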

Now on the local machine, create a new empty repository and prepare the initial commit:

# grab tip of trunk to seed git repo
svn export svn://svn.example.com/repo/trunk test
cd test
git init
git add .
git commit -s

# add a tag corresponding to Subversion revision number
git tag -s r208

# actually push
git push git.example.com:/pub/git/path_to_public_repos/test.git master

Note how the default protocol is SSH and it is not necessary to specify it explicitly.

This pushed only the initial contents of the tree, not the tag. I believe I could have included the --tags switch to git push to have the tag included.
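
The difference can be checked without a server. In this sketch a local bare repository stands in for the remote; the identity values are placeholders, and an annotated tag is used instead of a signed one so that no GPG key is needed:

```shell
# Sketch: a local bare repository stands in for the remote test.git.
scratch="$(mktemp -d)"
git init --quiet --bare "$scratch/test.git"
git init --quiet "$scratch/test"
cd "$scratch/test"

# Placeholder identity so the example does not depend on ~/.gitconfig.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

echo "seed" > README
git add README
git commit --quiet -m "Initial import"
# An annotated tag (-a) is used here because a signed tag (-s) needs a GPG key.
git tag -a -m "Subversion r208" r208

branch="$(git symbolic-ref --short HEAD)"   # usually master or main

# Pushing the branch alone leaves the tag behind...
git push --quiet "$scratch/test.git" "$branch"
tags_before="$(git ls-remote --tags "$scratch/test.git" | grep -c 'refs/tags/r208$' || true)"

# ...and --tags sends it.
git push --quiet --tags "$scratch/test.git"
tags_after="$(git ls-remote --tags "$scratch/test.git" | grep -c 'refs/tags/r208$' || true)"
echo "r208 on remote before: $tags_before, after: $tags_after"
```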

I also decided to set up a mapping from local master to remote master to make these pushes easier in the future. For example, I could set up a file (locally) at .git/remotes/my_shortcut with contents like this:

URL: git.example.com:/pub/git/path_to_public_repos/test.git
Push: master

And then push the tag (for example) by doing:

git push my_shortcut r208

I explained this in a write-up to the mailing list, and it was clarified that the "new" way of setting up these remote references is to add them to your .git/config:

[remote "origin"]
        url = ssh://git.example.com/pub/git/path_to_public_repos/test
        fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
        remote = origin
        merge = refs/heads/master

You can get the [remote "origin"] part of this configuration (the same as would be produced by git clone) by using the git remote command:

git init
git remote add origin git.example.com:/pub/git/path_to_public_repos/test.git
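
As far as I can tell, git remote add writes only the remote section; the [branch "master"] section can be added with git config. A sketch that runs in a scratch repository and never contacts git.example.com:

```shell
# Sketch: inspect what "git remote add" writes to .git/config.
scratch="$(mktemp -d)"
git init --quiet "$scratch/test"
cd "$scratch/test"

# Adding a remote only edits .git/config; nothing is contacted yet.
git remote add origin git.example.com:/pub/git/path_to_public_repos/test.git

remote_url="$(git config --get remote.origin.url)"
remote_fetch="$(git config --get remote.origin.fetch)"

# The [branch "master"] section can be written with git config.
git config branch.master.remote origin
git config branch.master.merge refs/heads/master
branch_merge="$(git config --get branch.master.merge)"

printf '%s\n' "$remote_url" "$remote_fetch" "$branch_merge"
```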

Yet another alternative is to just clone the now-populated remote repository; cloning a totally empty repository won't work but now that it has some content cloning will work fine and doing a git clone will automatically set you up to push and pull:

git clone ssh://git.example.com/pub/git/path_to_public_repos/test

I later also tried the approach of creating a repository on my local machine with full history, zipping up the repository and transferring it to a remote server, unpacking it and configuring it on the remote server. This approach worked roughly as follows:

# extract the repository
cd path_to_git_repositories
sudo -u git cp ~/walrus.zip .
sudo -u git unzip walrus.zip 
sudo rm walrus.zip
sudo -u git mv walrus/.git Walrus.git
sudo rm -r walrus

# repository set-up
cd Walrus.git
sudo -u git touch git-daemon-export-ok
echo "Walrus.git win@wincent.com" | sudo tee -a path_to_conf_dir/gitweb-projects 
echo "Object-oriented templating system" | sudo tee description 

# remove local junk
sudo rm qgit_cache.dat svn2git svn-authors 

# remove now irrelevant "origin" head
sudo rm refs/heads/origin
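
A possible alternative to moving the .git directory by hand, and not the method used above, is git clone --bare, which copies only the repository data into a new bare repository. A sketch with scratch directories and placeholder identity values:

```shell
# Sketch: build a small repository, then make a bare copy of it.
scratch="$(mktemp -d)"
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

git init --quiet "$scratch/walrus"
cd "$scratch/walrus"
echo "seed" > README
git add README
git commit --quiet -m "Initial commit"

# Produce a bare copy suitable for serving; no working tree files come along.
cd "$scratch"
git clone --quiet --bare walrus Walrus.git
touch Walrus.git/git-daemon-export-ok

walrus_is_bare="$(git --git-dir="$scratch/Walrus.git" rev-parse --is-bare-repository)"
walrus_commits="$(git --git-dir="$scratch/Walrus.git" log --format=%s)"
echo "bare: $walrus_is_bare, history: $walrus_commits"
```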

But in the end I decided that instead of keeping the legacy history in the Git repository I preferred to make a clean break and start with a brand new (historyless) repository. I was mostly motivated by:

  • The fact that Git encourages certain conventions for formatting commit messages that I hadn't previously followed in the Subversion era; for example some of my commit messages are too wide and so don't display well in Gitweb
  • The desire for a clean "psychological break" with the old code and a "fresh start"; the history is still available in the Subversion repository, but my real attention is focussed on the present codebase, not where it came from
  • A desire to adopt a more disciplined approach in the future

Additional notes

My initial attempts at importing were done on the remote server directly, so I also added this to the ~/.gitconfig for the git user, seeing as I'll be the only one doing commits for the foreseeable future:

sudo -H -u git git config --global user.email win@wincent.com
sudo -H -u git git config --global user.name "Wincent Colaiuta"

I suspect that this information is not actually required when pushing from an appropriately configured local repository, because the author information from the local repository should be used.
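
This can be checked locally: author details are recorded in the commit when it is created, and a push transfers the commit unchanged. In this sketch a local bare repository stands in for the server and the identity values are placeholders:

```shell
# Sketch: the identity is configured only in the local repository.
scratch="$(mktemp -d)"
git init --quiet --bare "$scratch/remote.git"
git init --quiet "$scratch/local"
cd "$scratch/local"
git config user.name "Local Author"
git config user.email "local@example.com"

echo "seed" > README
git add README
git commit --quiet -m "Initial import"

# Push the current branch; the receiving repository has no user.* settings.
git push --quiet "$scratch/remote.git" HEAD

# Read the author back from the receiving repository.
pushed_author="$(git --git-dir="$scratch/remote.git" log -1 --all --format='%an <%ae>')"
echo "$pushed_author"
```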
