Distributed Version Control with svk

I started using Subversion a year ago and liked its elegant file-system design a lot. Soon it became impossible for me to go back to CVS: I felt uncomfortable whenever I worked on projects that used CVS, and I wanted a tool to keep my Subversion repository in sync with a CVS repository. This would not only save me time importing snapshots into vendor branches, but would also give me the whole history when I'm not online.

I found Barrie's VCP and wrote a Subversion driver for it. Then I understood why people said Subversion was slow. My driver invoked the svn command, and it took something like 30 hours to convert a CVS repository into 3000 revisions in the Subversion repository, or roughly 36 seconds per revision.

Fortunately, the Subversion developers made the code easy to wrap for other languages using SWIG. At that time, only Python bindings were implemented, so I had to do the Perl bindings myself. Once the Perl bindings were in place, VCP got much faster, and I also started writing SVN::Mirror, a module that enables mirroring between Subversion repositories. When I felt bored, I would add Subversion back-end support to tools like Blosxom and Kwiki.

Then the season for traveling came. As I'm far more productive and creative while disconnected from the Internet, I realized I needed a distributed version control system, and decided to give myself a year-long break to develop a tool that would make me even more productive in the future. svk was born soon after my birthday in September 2003.

Why?

There are other distributed version control systems available: Arch, monotone, darcs. The functionality they offer is more or less equivalent.

svk, however, is written in Perl, and so might be more hackable by a large community. svk also has a set of commands similar to those of cvs. On top of this, svk plans to implement transparent interoperation between different version control systems.

As I don't see any strong argument suggesting one system over another, it's really up to you to try and decide.

Design Decisions

Subversion has a layered design:

fs: Underlying versioned tree library using bdb.
repos: Higher-level support for the fs, like log messages.
ra: Repository access. Abstracted protocol handlers.
wc: Working copy handling.
client: Implements the commands for clients.

The first design decision was to drop the wc and ra layers. Taking the Subversion design mentality, "Bandwidth is expensive, disks are cheap," one step further, we should really keep a local copy of every revision, and SVN::Mirror is already available for that purpose.

Having everything in a local repository, we don't need anything like the bloated wc implementation at all. The wc library not only has the .svn metadata directory to confuse your favorite utilities like diff and grep, but also stores a text-base that makes your checkout twice the size of the actual content. XD (which is the character-wise increment of wc) was written to maintain checkout copies in a lightweight manner.

Next, I found the most important component of Subversion is not on the above list. It's the delta library that defines the API for describing tree deltas; this is definitely the core thing in tree-based version control systems. It's called "Delta Editor."

For example, running a delta between revision 1 and revision 3 will generate a series of method calls (add_directory, open_file, apply_textdelta, close_file, close_directory, etc.), to the targeted editor object. These method calls describe the changes made from revision 1 to revision 3.
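To make the idea concrete, here is a minimal sketch in Python rather than Perl. It is not svk or Subversion code: the tree model (a dict is a directory, a string is a file) and the drive function are assumptions made for illustration, and only the method names come from Subversion's delta editor interface.

```python
# Toy model: a directory is a dict, a file is a string of its contents.
# Method names follow Subversion's delta editor; the driver is hypothetical.

class RecordingEditor:
    """Target editor that simply remembers every call it receives."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any editor method (add_directory, open_file, ...) lands here.
        if name.startswith("_"):
            raise AttributeError(name)
        return lambda *args: self.calls.append((name, *args))


def drive(editor, old, new, parent=""):
    """Describe the change from tree `old` to tree `new` as editor calls."""
    for name in sorted(set(old) | set(new)):
        path = f"{parent}/{name}" if parent else name
        before, after = old.get(name), new.get(name)
        if before == after:
            continue                      # unchanged: nothing to report
        if before is not None and type(before) is not type(after):
            editor.delete_entry(path)     # removed, or replaced by another kind
            before = None
        if after is None:
            continue
        if isinstance(after, dict):
            if before is None:
                editor.add_directory(path)
            else:
                editor.open_directory(path)
            drive(editor, before or {}, after, path)
            editor.close_directory(path)
        else:
            if before is None:
                editor.add_file(path)
            else:
                editor.open_file(path)
            # A real text delta is a compact diff; the toy sends the full text.
            editor.apply_textdelta(path, after)
            editor.close_file(path)


# Hypothetical contents of revision 1 and revision 3.
rev1 = {"src": {"main.pl": "print 1;\n"}}
rev3 = {"src": {"main.pl": "print 2;\n"}, "doc": {"README": "hello\n"}}

editor = RecordingEditor()
drive(editor, rev1, rev3)
for call in editor.calls:
    print(call[0], call[1])
```

The recorded calls add doc and doc/README, then open src and src/main.pl to apply the text change. Any object that understands the same calls could take the recorder's place.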

Once svk was self-hosting, within two months of development, I started to refactor the existing code to center around this interface. With Perl, I could easily stack the editors together, making each editor do its own job, adding arbitrary callbacks as extensions to the API, and all of the fun things you know you can do with Perl. Much of the functionality was abstracted, which resulted in the following core components of svk:

An editor that receives delta calls to modify the checkout copy.
A function to generate delta calls for describing the modification done to the checkout copy.
An editor that takes delta calls and merges them with a tree to generate non-conflicting calls.

Together with these, the logic behind most of the commands became just a question of gluing together a delta generator and the appropriate editors.
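The gluing can be pictured with a small sketch, again in Python and again hypothetical: the class names (Recorder, Forwarding, Rebase, Skip), the file names, and the changes generator are invented for illustration and are not part of svk. Each editor does one job and hands the call to the next one in the stack.

```python
class Recorder:
    """Terminal editor: remembers each call it receives."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        return lambda path, *rest: self.calls.append((name, path, *rest))


class Forwarding:
    """Base class for editors that pass calls on to an inner editor."""

    def __init__(self, inner):
        self.inner = inner

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        return lambda path, *rest: self.handle(name, path, *rest)

    def handle(self, name, path, *rest):
        getattr(self.inner, name)(path, *rest)


class Rebase(Forwarding):
    """Rewrites every path so the delta applies under another directory."""

    def __init__(self, inner, prefix):
        super().__init__(inner)
        self.prefix = prefix

    def handle(self, name, path, *rest):
        super().handle(name, f"{self.prefix}/{path}", *rest)


class Skip(Forwarding):
    """Drops calls for paths with a given suffix and forwards the rest."""

    def __init__(self, inner, suffix):
        super().__init__(inner)
        self.suffix = suffix

    def handle(self, name, path, *rest):
        if not path.endswith(self.suffix):
            super().handle(name, path, *rest)


def changes(editor):
    """Stand-in for a delta generator describing two modified files."""
    editor.open_file("lib/Foo.pm")
    editor.apply_textdelta("lib/Foo.pm", "new text")
    editor.close_file("lib/Foo.pm")
    editor.open_file("lib/Foo.pm.orig")
    editor.close_file("lib/Foo.pm.orig")


sink = Recorder()
changes(Skip(Rebase(sink, "trunk"), ".orig"))
print(sink.calls)
```

In this sketch a command is just a choice of generator plus a stack of editors; swapping Recorder for an editor that writes to disk would turn the same pipeline into something like an update.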

Additionally, with Perl's flexible PerlIO layers system, keyword expansion (like $Id$ in cvs) was done within one hour. The reusable part of this was abstracted out to the PerlIO::via::dynamic module on CPAN.
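The transformation itself is simple, which is part of why it went quickly. The sketch below shows the idea in Python; it does not use PerlIO or PerlIO::via::dynamic, and the fields placed inside the expanded keyword (file name, revision, author) are an assumed layout chosen for illustration.

```python
import re

# Matches both the bare keyword "$Id$" and an expanded "$Id: ... $".
KEYWORD = re.compile(r"\$Id(?::[^$\n]*)?\$")


def expand(text, name, revision, author):
    """Fill in the keyword when content is written to the checkout."""
    value = f"$Id: {name} {revision} {author} $"
    return KEYWORD.sub(lambda match: value, text)


def collapse(text):
    """Strip the expansion again before content goes back to the repository."""
    return KEYWORD.sub("$Id$", text)


stored = "# $Id$\nprint 'hi';\n"
checked_out = expand(stored, "hello.pl", 42, "alice")  # hypothetical values
print(checked_out, end="")
print(collapse(checked_out) == stored)
```

A layered I/O system lets these two functions sit between the program and the file, so the rest of the code never sees the difference.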

Now let's see svk in action.

A First Look

I hate typing those long URLs when using Subversion. So mapping repositories to shorter names is a must:


    $ svk depotmap

This will help you create a default repository at ~/.svk/local, which you can refer to as // from then on. If you have a Subversion repository on disk, you can add another line: test: '/path/to/repos'. You then have immediate access to the existing repository, only with the shorter name /test/ instead of file:///long-path-plus-auto-complete-wont-work.
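The naming scheme can be sketched as a small lookup. This is an illustration in Python, not svk's implementation; the resolve function and the DEPOTMAP dictionary are hypothetical, and only the two entries shown in the text are assumed.

```python
import re

# Depot name -> repository location. The empty name is the default depot "//".
DEPOTMAP = {"": "~/.svk/local", "test": "/path/to/repos"}


def resolve(depotpath, depotmap=DEPOTMAP):
    """Split '/depotname/path' into (repository location, path inside it)."""
    match = re.fullmatch(r"/([^/]*)(/.*)", depotpath)
    if match is None:
        raise ValueError(f"not a depot path: {depotpath!r}")
    name, path = match.groups()
    if name not in depotmap:
        raise KeyError(f"no such depot: {name!r}")
    return depotmap[name], path


print(resolve("//project/trunk"))
print(resolve("/test/project/trunk"))
```

The first call lands in the default repository and the second in the one registered as test, with the same path inside each.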

Now let's put something in it:


    $ svk import //project/vendor /path/to/project-0.01

This will do what you think: import things into //project/vendor. Repeat the command with a newer version of this project, say 0.02, and you'll have a vendor branch tracked on that path.

Like Subversion, branches and tags are implemented as cheap file system copies:


    $ svk cp -m 'development trunk for project' //project/vendor //project/trunk

Now let's check it out:


    $ svk checkout //project/trunk ~/work/project

If you have experience with cvs or Subversion, you'll find it familiar when trying to add, modify, remove, or commit files. svk log will give a change history of files or directories.

Suppose you import project-0.02 after branching trunk, and want to merge the changes from the vendor branch. You just need to:


    $ svk smerge //project/vendor //project/trunk

svk remembers branch and merge history, so it does things automatically for you. If there are conflicts, just replace //project/trunk with a checkout path such as ~/work/project. You will be able to see the conflicts. Resolve them and commit once done. Merging is no longer painful.
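Why remembering history makes merging automatic can be shown with a deliberately simplified model. The article does not describe how svk stores this information, so the MergeHistory class below is a hypothetical Python sketch, and the revision numbers are made up: the point is only that once the tool knows what has already been merged, it can work out what is still pending.

```python
class MergeHistory:
    """Remembers, per (source, target) pair, the last source revision merged."""

    def __init__(self):
        self.last_merged = {}

    def record(self, source, target, revision):
        self.last_merged[(source, target)] = revision

    def pending(self, source, target, source_revisions):
        """Return the source revisions that still need merging into target."""
        done = self.last_merged.get((source, target), 0)
        return [rev for rev in source_revisions if rev > done]


history = MergeHistory()
vendor, trunk = "//project/vendor", "//project/trunk"

# Hypothetical history: 0.01 imported in r1, trunk copied from it in r2,
# 0.02 imported in r3. The copy counts as "merged up to r1".
history.record(vendor, trunk, 1)
vendor_revisions = [1, 3]

to_merge = history.pending(vendor, trunk, vendor_revisions)
print(to_merge)

history.record(vendor, trunk, to_merge[-1])
print(history.pending(vendor, trunk, vendor_revisions))
```

After the merge is recorded, asking again yields nothing to do, which is what lets the same command be repeated safely after every new import.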

Once merged, you could bring the checkout copy of your trunk to the latest revision with svk update.

Working with Remote Repositories

As mentioned earlier, svk uses SVN::Mirror to handle remote repository access. You need to mirror them before you can use them:


    $ svk mirror //project/trunk https://svn.somewhere.org/repos/trunk
    $ svk sync //project/trunk

Currently you need to set up a Subversion server (using either Apache2 or svnserve); see the relevant articles or tutorials on how to do that.

Now create a local branch, and prepare for traveling:


    $ svk cp -m 'create a local branch' //project/trunk //project/local

You could now check out //project/local and work on it just as above. Of course, you could still create your own branch with svk cp //project/local //project/new-feature.

Use svk sync to sync the latest trunk when connected. Merging the new changes from trunk to your local branch works just like the previous example of merging from a vendor branch. How about merging your local changes back to the remote repository?


    $ svk smerge //project/local //project/trunk

Transparent, isn't it?

You should use smerge -C in advance to check whether there are conflicts. Even if your local branch is not merged from the latest trunk, svk will merge the changes for you and commit to the remote repository directly, provided there are no conflicts. But be sure to sync the latest trunk first.

In fact, if you are online and about to commit a minor change, you could forget about the process "modify on local branch, then merge back." Just do:


    $ svk switch //project/trunk
    $ svk commit

The first line switches from the local branch to trunk, which is the path containing the mirrored archive. The switch command will keep your local changes and apply them to trunk as if those modifications had been made to a checkout of trunk. Then svk commit on the mirrored path will commit the changes directly to the remote server and sync the path for you. If the server is temporarily unavailable, just switch back to local and merge back later.

You could also merge individual changes. Find the change number you want with svk log, and use:


    $ svk cmerge -c 113,125-128,130 //project/trunk //project/stable
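The argument to -c mixes single change numbers and ranges. As a rough illustration of how such a list reads, here is a Python sketch; parse_changes is a hypothetical helper, and treating a range like 125-128 as inclusive at both ends is an assumption, not something the article specifies.

```python
def parse_changes(spec):
    """Expand a list like '113,125-128,130' into individual change numbers."""
    changes = []
    for part in spec.split(","):
        first, _, last = part.partition("-")
        start = int(first)
        end = int(last) if last else start
        if end < start:
            raise ValueError(f"backwards range: {part!r}")
        changes.extend(range(start, end + 1))  # assumed inclusive
    return changes


print(parse_changes("113,125-128,130"))
```

Under that reading, the command above picks six changes from trunk and applies them to the stable branch.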

Now if you are working on projects where you don't have the permission to commit, you could easily generate a diff and submit it to the author:


    $ svk diff //project/trunk //project/local

Working with Multiple Repositories

Many people track development of several projects. Once you use svk to mirror the projects, you can run svk sync -a to sync all of them.

Now suppose another hacker uses svk and adds a feature to the project and publishes his own branch, and you wish to experiment with or utilize his feature:


    $ svk mirror //project/new-feature http://svn.somewhere.else/repos/trunk
    $ svk sync //project/new-feature

Then you could merge the changes from him:


    $ svk smerge //project/new-feature ~/work/project

Or you might decide to merge that branch to trunk directly:


    $ svk smerge //project/new-feature //project/trunk

You could also use the cmerge command described above to merge specific changes only from that new-feature branch.

This is the minimal case of the distributed development model. The idea is that everyone can create a private branch of the product and then have it merged back by the maintainer. There have been arguments against such a model, but I am not going into them here. Although tools tend to promote certain models for solving problems, we change the model or just use another tool when we have to.

There are several features planned in the near future:

Changeset signing and verification

Signing the modified files in a commit with gpg. This is already done; it's just that SVN::Mirror is not yet able to propagate and verify the signatures.

VCP integration

This would enable mirroring (and thus branching) from alien version control systems, like cvs or perforce. Imagine:

    $ svk mirror //foo/fromcvs cvs://cvs.server/foo@trunk

This will make me (and perhaps other people) more comfortable when working with projects which use other version control systems, and also less confused when switching between different command sets for working with different projects.

Patch manager

Non-committers can already easily generate a diff as shown above. It would be good for the patch manager to register merge history, and large projects that need to merge many developer-submitted patches would find it handy to review, test, and then click to apply a particular change.

The development of svk is rather rapid, so expect these features soon!

Conclusion

Five months after its birth, svk has become a fast, full-featured distributed version control system. This is possible mainly because of the flexibility of Perl and the spirit of Perl: use something that exists to create new things. Besides, the commands are designed to DWIM!

If you find it interesting, get a copy from the home page and install it just like any other Perl module. Hopefully I'll then receive your complaints, make svk better, and make the open source world more productive.
