I started to use Subversion one year ago and liked the elegant file-system design a lot. Soon it became impossible for me to go back to CVS. This means that I felt uncomfortable whenever I was working on projects using CVS, and I wanted to see a tool to keep my Subversion repository in sync with a CVS repository. This would not only save me time importing snapshots into vendor branches, but it would also give me the whole history when I'm not online.
I found Barrie's VCP and wrote a Subversion driver. Then I
understood why people said Subversion was slow. My driver invoked the
svn command, and it took something like 30 hours to convert from a CVS
repository that resulted in 3000 revisions in the Subversion
repository.
Fortunately the Subversion developers made the code easy and ready for
wrapping into different languages using SWIG. At that time, only Python
bindings were implemented, so I had to do the Perl bindings myself.
With the Perl bindings implemented rapidly, VCP gets much faster, and I
also started writing SVN::Mirror,
a module that enables mirroring
between Subversion repositories. When I felt bored, I would add
Subversion back-end support to tools like Bloxsom and Kwiki.
Then the season for traveling came. As I'm far more productive and creative while disconnected from the Internet, I realized I need a distributed version control system, and decided to give myself a year break to develop such a tool to enable me to be even more productive in the future. svk was born soon after my birthday in September 2003.
Why?
There are other distributed version control systems available: Arch,
monotone, darcs. The functionalities they offer are more or less
equivalent.
svk, however, is written in Perl, and so might be more hackable by a
large community. svk also has a set of commands similar to those of cvs. On
top of this, svk plans to implement transparent interpolation between
different version-control systems.
As I don't see any strong argument suggesting one system over another, it's really up to you to try and decide.
Design Decisions
Subversion has a layered design:
-
fs: Underlying versioned tree library usingbdb. -
repos: Higher-level support for thefs, like log messages. -
ra: Repository access. Abstracted protocol handlers. -
wc: Working copy handling. -
client: Implements the commands for clients.
The first design decision was to drop the wc and
ra layers. Elaborating the Subversion design mentality:
"Bandwidth is expensive, disks are cheap," we should really keep a
local copy for every revision -- and SVN::Mirror is
already available for such purposes.
Having everything in a local repository, we don't need anything like
the bloated wc implementation at all. The
wc library not only has the .svn metadata directory
to confuse your favorite utilities like diff and grep, but also stores
a text-base that makes your checkout twice the size of the actual
content. XD (which is the character-wise increment of wc) was written
to maintain checkout copies in a lightweight manner.
Next, I found the most important component of Subversion is not on the above list. It's the delta library that defines the API for describing tree deltas; this is definitely the core thing in tree-based version control systems. It's called "Delta Editor."
For example, running a delta between revision 1 and revision 3 will
generate a series of method calls (add_directory,
open_file, apply_textdelta,
close_file, close_directory, etc.), to the
targeted editor object. These method calls describe the changes made
from revision 1 to revision 3.
While svk was self-hosting within two months of development, I started to
refactor the existing code to center around this interface. With Perl,
I could easily stack the editors together, making each editor do its own
job, adding arbitrary callbacks as extension to the API, and all of the fun
things you know you can do with Perl. Much of the functionality is
abstracted and it resulted in the following core components of svk:
- An editor that receives delta calls to modify the checkout copy.
- A function to generate delta calls for describing the modification done to the checkout copy.
- An editor that takes delta calls and merges them with a tree to generate non-conflicting calls.
Together with these, the logic behind most of the commands became just a question of gluing together a delta generator and the appropriate editors.
Additionally, with Perl's flexible PerlIO layers system, keyword
expansion (like $Id$ in cvs) was done within one
hour. The reusable part of this was abstracted out to the
PerlIO::via::dynamic module on CPAN.
Now let's see svk in action.
A First Look
I hate typing those long URLs when using Subversion. So mapping repositories to shorter names is a must:
$ svk depotmap
This will help you create a default repository at
~/.svk/local, and you could refer to it by
// in the future. If you have a Subversion repository
on the disk, you could add another line: test:
'/path/to/repos'. Then you have immediate access to the
existing repository -- only with the shorter name /test/
instead of
file:///long-path-plus-auto-complete-wont-work.
Now let's put something in it:
$ svk import //project/vendor /path/to/project-0.01
This will do what you think: import things into
//project/vendor. Repeat the command with a newer version
of this project, say 0.02, you'll have a vendor branch tracked on the
path.
Like Subversion, branches and tags are implemented as cheap file system copies:
$ svk cp -m 'development trunk for project' //project/vendor //project/trunk
Now let's check it out:
$ svk checkout //project/trunk ~/work/project
If you have experience with cvs or Subversion, you'll find it
familiar when trying to add, modify, remove, or commit files. svk log
will give a change history of files or directories.
Suppose you import project-0.02 after branching trunk, and want to merge the changes from the vendor branch. You just need to:
$ svk smerge //project/vendor //project/trunk
svk remembers branch and merge history, so it does things
automatically for you. If there are conflicts, just replace
//project/trunk with a checkout path such as
~/project/trunk. You will be able to see the
conflicts. Resolve them and commit once done. Merging is no more
painful.
Once merged, you could bring the checkout copy of your trunk to the latest
revision with svk update.
Working with Remote Repositories
As mentioned earlier, svk uses SVN::Mirror to handle
remote repository access. You need to mirror them before you can
use them:
$ svk mirror //project/trunk https://svn.somewhere.org/repos/trunk
$ svk sync //project/trunk
Currently you need to set up a Subversion server (either using Apache2
or svnserve). See relevant articles or tutorials about it.
Now create a local branch, and prepare for traveling:
$ svk cp -m 'create a local branch' //project/trunk //project/local
You could now check out //project/local and work on it
just as above. Of course you could still create your own branch
with cp //project/local //project/new-feature.
Use svk sync to sync the latest trunk when connected. Merging the new
changes from trunk to your local branch works just like the previous
example of merging from a vendor branch. How about merging your local
changes back to the remote repository?
$ svk smerge //project/local //project/trunk
Transparent, isn't it?
You should use smerge -C in advance to check if there are conflicts.
Even if your local branch is not merged from the latest trunk, svk will
merge the changes for you and commit to the remote repository directly,
provided there's no conflicts. But be sure to sync the latest trunk first.
In fact, if you are online and about to commit a minor change, you could forget about the process "modify on local branch, then merge back." Just do:
$ svk switch //project/trunk
$ svk commit
This first line means we now switch from the local branch to trunk,
which is the path containing the mirrored archive. The switch command
will keep your local change and apply it to trunk as if those
modifications are done to a checkout of trunk. Then the svk
commit on the mirrored path will just commit the changes
directly to the remote server and then sync the path for you. If
the server is temporarily unavailable, just switch back to local and
merge back later.
You could also merge individual changes. Find the change number you
want with svk log, and use:
$ svk cmerge -c 113,125-128,130 //project/trunk //project/stable
Now if you are working on projects where you don't have the permission to
commit, you could easily generate a diff and submit it to the author:
$ svk diff //project/trunk //project/local
Working with Multiple Repositories
Many people track development of several projects. Once you use svk
to mirror the projects, you can run svk sync -a to sync all of
them.
Now suppose another hacker uses svk and adds a feature to the project
and publishes his own branch, and you wish to experiment with or utilize
his feature:
$ svk mirror //project/new-feature http://svn.somewhere.else/repos/trunk
$ svk sync //project/new-feature
Then you could merge the changes from him:
$ svk smerge //project/new-feature ~/work/project
Or you might decide to merge that branch to trunk directly:
$ svk smerge //project/new-feature //project/trunk
You could also use the cmerge command described above to merge
specific changes only from that new-feature branch.
This is the minimum case of the distributed development model. The idea is that everyone could create his private branch of the product and then to be merged back by the maintainer. There have been arguments against such model, but I am not going into them here. Although tools somehow promote certain models to solve problems, we change the model or just use another tool when we have to.
There are several features planned in the near future:
- Changeset signing and verification
-
Signing the modified file in a commit with
gpg. This is already done; it's just that theSVN::Mirrorside hasn't been able to propagate and verify the signatures. - VCP integration
-
This would enable mirroring (and thus branching) from alien version control systems, like
cvsorperforce. Imagine:$ svk mirror //foo/fromcvs cvs://cvs.server/foo@trunkThis will make me (and perhaps other people) more comfortable when working with projects which use other version control systems, and also less confused when switching between different command sets for working with different projects.
- Patch manager
-
Non-committers can already easily generate the
diffas shown above. While it would be good to register a merge history with the patch manager, large projects that need to merge many developer-submitted patches would find it handy to have a feature which allowed a review, then test, then click-to-apply for a particular change.
The development of svk is rather rapid, so expect them coming soon!
Conclusion
Five months after the birth of svk, it had become a fast,
full-featured distributed version control system. This is possible
mainly because the flexibility of Perl and the spirit of Perl -- use
something that exists to create new things. Besides, the commands are
designed to DWIM!
If you find it interesting, get a copy from the home page and install it
just like any other Perl module. Hopefully I'll then receive your
complaints, make svk better, and make the open source world more
productive.



