[Adium-devl] Version Control - To change or not to change? THAT is the question!
David Symonds
dsymonds at gmail.com
Tue Jan 1 22:44:33 UTC 2008
On Jan 2, 2008 7:14 AM, Alan Humpherys <alangh at adiumx.com> wrote:
> Since I have some strong feelings here, I decided it was time to "kick the
> Hornet's Nest" and write down my thoughts so I can come to a better
> understanding of the project direction.
>
> Let me start by stating that my contributions to Adium thus far have been
> minuscule, so I have very little room to talk.
IANAAD (I Am Not An Adium Developer), so my words should carry even
less weight, but I've been using SVN for many years and Git for nearly
a year, so I thought I'd toss my pennies in, too. I care a great deal
about source control, so please don't take my following responses
personally.
> 1. Why a DVCS?
> After consideration on this topic, I have narrowed the differences between
> SVN and the various DVCS systems down to 2 of any consequence as they relate
> to Adium development. There are certainly other differences between the
> implementation, features, and usability of any particular VCS, but these are
> the most important.
>
> 1.1. Disconnected Operation
> It appears to me that the main impetus behind any DVCS is support for
> disconnected operation, to support development in a situation where there
> is not access to a centralized repository. I myself have had need of such a
> system, and see its appeal. For example, 13 years ago, when I was
> developing the NetWare OS Kernel & JVM, it was commonplace that I was doing
> development on a plane, in a hotel room, or at a convention and did not have
> Internet access to do a checkin to our repository. I just had to continue
> working and wait to do a checkin when I returned to the office. That was
> certainly not optimal, and I longed for a different solution.
>
> However, the world has changed considerably in those 13 years, it has become
> so much more connected than it used to be. You are now hard-pressed to find
> a hotel, or convention without Internet access. When that is combined with
> the fact that we are working on a network application, I find myself
> hard-pressed to find any situation where an Adium developer is doing work
> and checking in the code when they do not have network access. This would
> mean that they are doing a checkin without testing their code, which I find
> disturbing.
>
> I believe that this removes disconnected operation from being a valid reason
> to switch away from SVN.
One of the most heavily promoted features of Git is its speed, even
among the different DVCSs. After a number of months of working solely
with Git repositories I had cause to do some work with an SVN
repository, and it was painfully slow. Operations that I now find
quite natural and useful were so slow that I didn't want to use them.
For example, even a simple "svn log" to see the last few changes took
quite a few seconds (mostly network-bound), but "git log" was about as
quick as typing "less readme.txt". It's not just committing code that
benefits from disconnected operation, but all the other features,
which become much more usable as well.
Connected-only operation also encourages people to commit large chunks
at once. Personally, I would much rather work in little chunks,
committing after each useful and significant change. This makes
tracking down regressions considerably easier, too.
> 1.2. Merging Branches
> Due to its "multiple versions of the truth" model, where every developer
> has a copy of the repository, all of the DVCS systems have had to develop an
> extremely robust mechanism for creating and merging branches. In effect,
> every check-in starts to have some of the characteristics of a branch and
> merge operation in those environments.
>
> This is one of the areas where the DVCS models excel over SVN. They are
> built with the notion of tracking changes to multiple branches so that they
> can be merged together at a later date with minimal manual effort. This has
> been a common area of problems with any branching VCS for decades.
>
> The good news is that both DVCS and SVN handle the branching operation well.
> The operation is on the order of O(N), and it is inexpensive in the
> repository to make and track branches.
I disagree: SVN handles branching horribly. The notion of "copying"
the whole source tree to a separate namespace is terribly misguided
since branching is fundamentally different to just making new files
and directories.
Also, SVN branching is O(1) (though with a large constant), and Git's
branching, at least, is O(1) too (it's nothing more than writing a new
text file holding a 40-character hash).
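To make the Git half of that concrete, here is a minimal sketch,
assuming a scratch repository that uses Git's default loose-file ref
storage; the branch name "topic" and the file names are made up for
illustration:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch 'master' whatever the local default is
git config user.name "Example Dev"        # placeholder identity for this scratch repo
git config user.email "dev@example.invalid"

echo "hello" > readme.txt
git add readme.txt
git commit -q -m "Initial commit"

# Creating a branch writes one small file; nothing in the tree is copied
git branch topic
cat .git/refs/heads/topic                 # prints the hash of the commit the branch points at
branch_ref=$(cat .git/refs/heads/topic)
master_tip=$(git rev-parse master)
```

The new branch and master hold the same hash until one of them gains a
commit, so the cost of branching doesn't depend on the size of the tree.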
Regardless, and as you point out, branching is useless without
merging, which is the SVN Achilles' heel. It has no proper branch
tracking (due to its copy-in-namespace approach), which causes merging
to be very painful, and also prevents useful queries being run to
examine branches and their relationships. In Git, for example, I can
run "git log master.." to see all the changes in my current branch
that aren't in the 'master' branch. Try doing that with SVN!
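As a rough illustration of that query, assuming a throwaway repository
(the branch name "feature" and the commit messages are invented for
the example):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch 'master' whatever the local default is
git config user.name "Example Dev"        # placeholder identity for this scratch repo
git config user.email "dev@example.invalid"

echo "base" > file.txt
git add file.txt
git commit -q -m "Base commit on master"

# Do some work on a separate branch
git checkout -q -b feature
echo "one" >> file.txt
git commit -q -am "Feature: first change"
echo "two" >> file.txt
git commit -q -am "Feature: second change"

# "master.." is shorthand for "master..HEAD": commits reachable from the
# current branch but not from master
git log --oneline master..
only_on_branch=$(git rev-list --count master..)
```

Only the two "Feature" commits are listed; the base commit is excluded
because master already contains it.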
> 2. Should we change?
> By my own arguments, DVCS systems have an advantage in merging, but is that
> reason enough to make the change? Here are a number of other areas to
> consider before jumping to another VCS system "Because it's the wave of the
> future".
>
> 2.1. Tool Support
> SVN has finally come into the realm of full support for the tools we use in
> Adium development. It is a top-tier supported VCS for:
>
>
> - Mac OS (included in 10.5)
> - Trac
> - XCode
> - CIA
>
> If we change to mtn, bzr, or git, the onus is upon us as a development team
> to integrate that new system with the tools we use on a day to day basis. I
> would prefer that we use our limited development resources on new Adium
> features, rather than tool integration.
This might be a small stumbling block, though I think these issues are
less of a problem than you might think. CIA scripts exist for many of
the proposed DVCS (Git, bzr, Darcs, Mercurial). I don't use Xcode's
SVN integration since I found it somewhat unreliable and clunky; I
just keep a terminal open. Trac integration is nice, but is it really
a deal-breaker?
> 2.2. Product Maturity
> It is something we take for granted, that the VCS is a reliable mechanism
> and that bugs in the system will not corrupt our source code. I don't have
> any evidence that any one system is better than another in this area, but
> the fact that SVN has had longer to stabilize than the other systems gives
> me some level of comfort.
Many DVCS are used in large projects with no sign of data corruption
or unreliability. In fact, since every developer has a complete copy
of the repository, a DVCS is *more* reliable than SVN, which has to
have a single centralised repo. Automatic distributed backups are a
great side-effect of DVCS.
> 2.3. Size of User Community
> The SVN community is quickly becoming the largest, and as a result, there is
> increased support for developer attention to improving SVN as well as
> integrating more development tools with it as a first-tier VCS. Unless
> there is a critical mass for a platform, it will not gain that increased
> help from the community at large. (I know this first hand as we struggled
> with the "Fire" project, which lost its mind-space to Adium as the years
> went by.)
SVN is probably already the largest VCS in use, at least among free
software. I'm somewhat ill-informed about non-Git DVCS, but Git is
used in quite large and important projects like the Linux kernel,
Xorg, Cairo, etc., which would indicate there's already critical mass
for it.
However, I'd question the merit of giving too much weight to concerns
for minor/non-contributors. It really isn't too hard to give some
simple instructions for checking out a copy, making minor changes and
submitting patches. Any change of VCS would be for the benefit of the
core development team, since they are the ones who make the biggest
contribution and who would benefit the most from a new and better VCS.
> 2.4. Ease of Use
> Most of the VCS systems have adopted the CVS command set model, so someone
> familiar with that command set can get up to speed on the new tool quickly.
> However, there are several items in DVCS systems which cause me some concern
> about the usability of DVCS systems over SVN.
>
> 2.4.1. Revision numbers
> With SVN, there is a simple revision number that represents the state of the
> system at a given point. In our case, this is a 5 digit number which grows
> incrementally over time. Due to the restrictions of working in a
> distributed system, the DVCS systems typically adopt use of a date/time or
> MD5 hash, which is not as user friendly.
Non-issue. I very rarely deal with the SHA-1 hashes that Git uses,
because it supplies so many other ways of referring to different
revisions. For example, "master~3" refers to the 3rd-level parent of
the current tip of the master branch. Git also has lightweight
tagging, so you can just throw tags onto whichever revisions you care
about if you have to manipulate particular revisions repeatedly.
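A small sketch of both ways of naming revisions, assuming a scratch
repository with four commits; the tag name "known-good" is purely
illustrative:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch 'master' whatever the local default is
git config user.name "Example Dev"        # placeholder identity for this scratch repo
git config user.email "dev@example.invalid"

# Build a short linear history: Change 1 .. Change 4
for n in 1 2 3 4; do
  echo "change $n" >> log.txt
  git add log.txt
  git commit -q -m "Change $n"
done

# master~3 walks three parents back from the tip of master
third_parent=$(git log -1 --format=%s master~3)
echo "$third_parent"

# A lightweight tag is just a name for a revision; no hash needs typing
git tag known-good master~1
tagged=$(git log -1 --format=%s known-good)
echo "$tagged"
```

With four commits on master, master~3 is the first one, and the tag
lands on the commit just before the tip.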
> 2.4.2. Local Repository
> With a DVCS, you need to keep a local version of the repository on your
> machine. The setup of such a local repository is an initial hurdle to ease
> of entry into using a system. While disk space is getting cheaper all the
> time, I still found it inconvenient to find space for a few hundred
> megabytes (and growing) of local storage. With that local repository comes
> questions of backup and other maintenance that I would rather not have to
> deal with.
I find it amusing that you previously asserted that network
connectivity should be assumed or a prerequisite to Adium development,
yet disk space (now costing a fraction of a dollar per gigabyte) is an
issue! Git stores the repository in a very compact delta-compressed
form, which tends to be much smaller than even SVN's!
The Mozilla project did a sample conversion of their 10 year, 240,000
commit repository: a single checkout was 350 MB, CVS stored the
history in 3 GB, SVN in 12 GB, and Git in 300 MB. Yes, megabytes. The
entire history was stored in less space than a single full version of
the source tree.
> 2.4.3. Two step checkout and checkin
> With a DVCS, your normal checkout and checkin operations take place to your
> local repository, and that operation will appear very familiar to anyone
> acquainted with CVS or any of its descendants (like SVN). The "wrinkle"
> comes in as you have to synchronize your repository with one (or more)
> other repositories of other users. As you do these synchronization
> operations, you have to exercise care to ensure that you do not unwittingly
> create a branch off of the main line you thought you were working on. So
> now you have to do your checkin process twice....
I'd consider that a bonus: whenever two or more people work on code
together, it will start to diverge at some point. SVN doesn't prevent
that, nor does it help much in dealing with it. Hiding away merge problems
doesn't solve them, and DVCS tends to make it a lot easier to handle
properly.
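For anyone unsure what the two steps look like in practice, here is a
minimal sketch in Git, assuming a local bare repository standing in
for the shared one (the paths and file names are made up):

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/shared.git"     # stand-in for the shared repository

git init -q "$work/mine"
cd "$work/mine"
git symbolic-ref HEAD refs/heads/master   # name the first branch 'master' whatever the local default is
git config user.name "Example Dev"        # placeholder identity for this scratch repo
git config user.email "dev@example.invalid"
git remote add origin "$work/shared.git"

echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix a bug"              # step 1: recorded in the local repository only
git push -q origin master                 # step 2: published to the shared repository

local_tip=$(git rev-parse master)
shared_tip=$(git --git-dir="$work/shared.git" rev-parse master)
```

Any number of local commits can pile up between pushes, which is where
the small-chunk workflow comes from.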
> 2.5 Facilitates Fragmentation
> Because each developer has their own complete copy of the repository, they
> can very easily create their own project based upon the original, and have
> local development which is easily kept in sync with the original. It just
> makes it easier for someone to create an "AdiumZ" because they were upset
> about something that "curmudgeonly old Alan" said. While Open Source
> projects always allow for (and sometimes encourage) such fragmentation, just
> ask the average linux user what they think of having to choose between the
> ever fragmenting Linux distributions out there.
You'll notice that even though the Linux kernel is managed by a DVCS
(Git), the kernel really hasn't fragmented at all. The issue of the
different *distributions* is a completely separate topic, and a choice
between centralised and decentralised VCS has nothing to do with that.
> 2.6 Distributed User Management
> Inherent with having multiple repositories, comes the burden (potentially
> upon each developer) of managing the credentials for each user that is
> allowed to make changes to their repository. While this can be minimized
> with a centralized "master" there is still some work involved in user
> management for each participant.
I think that improves things in a DVCS. Instead of handing out commit
bits, and having to wrangle with all that guff, developers can just
pass around their changes much more naturally, and the designated
maintainer (someone like Evan) can just collate these all cleanly into
the "official" master. If Evan gets hit by a bus, someone else simply
takes over that role and nothing else need change.
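One way that hand-off could look in Git, sketched with two scratch
repositories; "maintainer" and "contributor" are hypothetical
directory names, and mailing patches is only one of several ways to
pass changes around:

```shell
set -e
work=$(mktemp -d)
# Placeholder identity, exported so it applies to both scratch repositories
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.invalid"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.invalid"

# The maintainer's repository, holding the "official" master
git init -q "$work/maintainer"
cd "$work/maintainer"
git symbolic-ref HEAD refs/heads/master   # name the first branch 'master' whatever the local default is
echo "helo" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

# A contributor clones it and commits locally; no commit bit required
git clone -q "$work/maintainer" "$work/contributor"
cd "$work/contributor"
echo "hello" > greeting.txt
git commit -q -am "Fix typo in greeting"
git format-patch -q -1 -o "$work/outbox"  # write the last commit as a mailable patch

# The maintainer applies the patch to the official master
cd "$work/maintainer"
git am -q "$work/outbox"/*.patch
applied_subject=$(git log -1 --format=%s)
```

The maintainer decides what goes into master, and the contributor
never needs write access to the maintainer's repository.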
> 3.0 My Conclusion
> Sometimes the choice of development tools borders on a "religious war", and
> each person involved often holds deeply rooted beliefs that are difficult to
> change. I myself resisted the switch from CVS to SVN for well over a year
> until I changed, and was eventually glad that I did.
>
> However, as I look at the maturity and features of the current state of DVCS
> systems, I find myself weighing the pros and cons, but still find myself
> entrenched in the "It ain't broke, so don't fix it" camp for the Adium
> project.
My summation: "It *is* broke, you just don't realise it."
Cheers,
Dave.