Hi, I’m Kim Moir, and I’m a release engineer for the Eclipse and RT Equinox projects. Since last June, our team has been working on migrating our ten-year-old CVS repository to Git. I’m going to talk about the process that we used to migrate, how our development processes changed to accommodate it, the challenges we faced, and advice for other teams that are migrating. Along the way, I'm going to include some quotes from other committers with their thoughts on our Git migration.
In honour of the fact that our Git migration is almost complete, a more appropriate name for my talk might be “Git Happens”.
Questions for the audience: Show of hands, how many of you use Git on a daily basis? How many use CVS or SVN?
Why Git? The Eclipse Foundation is in the process of phasing out support for CVS (December 2012) to reduce support costs. In theory, the summer months are our lowest activity period in terms of development because of the release we ship every June. However, the summer of 2011 was really busy for us as we began to plan our migration to Git. We wanted to minimize the disruption to the team, so we aimed to migrate as many projects as possible before the fall, when people returned from vacation. We also wanted to migrate the bulk of the components before Indigo SR1 shipped at the end of September.
-300 bundles, 62 features, 84 fragments
-JDT, PDE, Platform and Equinox projects
-Some committers had exposure to Git (Orion and OSGi Alliance experience)
-16 GB in Eclipse repo, 8 GB in Equinox repo
One of the first discussions as we planned for our migration: what would the granularity of our Git repositories be? The Eclipse project has several subprojects: Platform, JDT, PDE and Equinox. Our commit rights are quite specific. If you are a committer on jdt.core, this doesn’t mean that you have rights on jdt.ui. With CVS, you can just check out the bundles you want into your workspace; you don’t have to clone the entire repository to your machine. Thus we wanted to ensure that our repositories weren’t too big, so that a contributor wasn’t synchronizing a large repo to their machine with a lot of content that they would never use. How should our Git repositories be organized? We had a discussion with the PMC and decided that repositories should be organized by Unix group ID.
With CVS, we had two repositories, /cvsroot/eclipse and /cvsroot/rt/. There is also currently a limitation in Git where you cannot assign multiple ACLs to the same repo. In order to preserve our project structure, we needed to have a repo for each Unix group. We couldn't have built larger repos without reorganizing our project structure and commit rights. That being said, we would recommend minimizing the number of repos you create as working with multiple Git repositories can be painful.
Due to our CVS repository size and our desire to preserve our history, we decided that this would be a gradual migration over several months instead of a migration over a few days. We ran test migrations on a component basis and let the owners look at them and determine whether there were issues. Many teams took the opportunity to reorganize their repos, for instance separating features, bundles and test bundles into separate directories. The Platform UI team was the first team to migrate to Git (July). Paul Webster spent about a month testing the Git migration of the Platform UI bundles and writing scripts to assist with the process.
One of the issues that we ran into is that, when you tag or branch a repo in Git, the entire repo is tagged or branched. You can’t tag or branch a single project. In an effort to be good Eclipse citizens, during a release cycle, we only tag bundles that have changed. Thus only new bundles get downloaded as needed. For this reason, when we first ran the migration tool on our CVS repos, a maintenance branch would only include bundles that had been branched for that release, and all the bundles that were not branched would be missing. Not good! To fix this, Paul wrote some scripts to precondition the repositories so that maintenance branches would include all the bundles in that release. I know that other projects didn’t have this problem; for instance, CDT tags their bundles every time.
Another issue that we looked at during testing was that we had some rather large test repositories due to our binary files. Some background: Our build just compiles Java code. The SWT and Equinox Launcher teams have C code that must be compiled on native hardware for the 13 platforms we support and stored in the repository in binary form. Thus our initial test Git repositories were bloated with binaries, many of which had tags associated with old builds that we were never going to build again. So we decided to:
1) Have binary-only repositories for these projects
2) Clean the binary repositories of non-release binaries to reduce their size (run a git filter-branch operation to remove binaries)
3) Update build scripts to fetch artifacts from the binary repos
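As a rough illustration of step 2, here is a minimal sketch of how such a history rewrite could be assembled. The talk does not say which filter-branch options were used, so the flags chosen here (`--index-filter`, `--prune-empty`, `--tag-name-filter cat`) and the function name `filter_branch_command` are assumptions, and the path in the usage note is hypothetical.

```python
import shlex


def filter_branch_command(paths):
    """Build a git filter-branch argument list that drops `paths` from all history.

    Assumption: an index filter is used, which rewrites each commit's index
    without checking files out. The talk does not specify the real invocation.
    """
    if not paths:
        raise ValueError("at least one path is required")
    # Quote each path so the filter string is safe to hand to the shell
    # that git filter-branch uses to evaluate it.
    quoted = " ".join(shlex.quote(p) for p in paths)
    index_filter = f"git rm -r --cached --ignore-unmatch --quiet {quoted}"
    return [
        "git", "filter-branch",
        "--index-filter", index_filter,
        "--prune-empty",              # drop commits left empty by the removal
        "--tag-name-filter", "cat",   # keep tag names, pointing at rewritten commits
        "--", "--all",                # rewrite every branch and tag
    ]
```

The returned list could be passed to `subprocess.run(cmd, check=True)` inside a throwaway clone, for example `filter_branch_command(["bundles/example/launcher.so"])`. Rewriting history changes commit IDs, so this only makes sense before a repository is published.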
-Conditioned the repos back to 3.0 release
-git-move-refs: removes unneeded fix-up branches after the conversion
Challenges during migration:
-Massaging tags that didn’t meet Git standards. For instance, some JDT committers had tags with “*” in them. We applied regexp-foo to modify them.
-Long-running git filter-branch operations, from 20 minutes to 16 hours. The Eclipse webmasters created a local partition for me on the filesystem to avoid NFS timeout issues on the shared Eclipse filesystem. Otherwise, git filter-branch operations would time out after a few hours due to stale NFS file handles.
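The tag clean-up mentioned above can be sketched as follows. This is not the script we ran; it is a minimal example that covers only a subset of the naming rules documented for `git check-ref-format`, and the function name `sanitize_tag` and the choice of `_` as the replacement character are assumptions.

```python
import re

# Characters Git rejects anywhere in a ref name: ASCII control characters,
# space, DEL, and ~ ^ : ? * [ \ (a subset of the git check-ref-format rules).
_FORBIDDEN = re.compile(r"[\x00-\x20\x7f~^:?*\[\\]")


def sanitize_tag(name, replacement="_"):
    """Return `name` with characters and sequences Git rejects replaced."""
    cleaned = _FORBIDDEN.sub(replacement, name)
    # The sequences ".." and "@{" are also rejected.
    cleaned = cleaned.replace("..", replacement).replace("@{", replacement)
    # A ref name cannot begin or end with "/".
    cleaned = cleaned.strip("/")
    # A ref name cannot end with ".lock" or with ".".
    if cleaned.endswith(".lock"):
        cleaned = cleaned[: -len(".lock")] + replacement + "lock"
    return cleaned.rstrip(".")
```

For example, `sanitize_tag("v2011*0815")` returns `"v2011_0815"`. Two different CVS tags can collapse to the same sanitized name, so a real conversion would need to check for collisions before renaming.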
How long did the migration take? It depends on the size of the repo and the history associated with it. JDT Core: 24 hours. 8 hours for filter-branch. Time is correlated with repository size and history. Also, since we ran each migration twice (first as a test, then for real), the migrations took a long time in both machine time and people time.
Our committers had a number of problems when first using Git. If you delete a project from your workspace, it’s easy to push that change to the master repository as a delete by mistake. In addition, since we work in multiple branches, we have had cases where people switched to a branch for one bundle and inadvertently committed code for another bundle to the wrong stream. While switching streams, committers also inadvertently deleted changes in their local workspace.
Our developers experienced quite a learning curve when switching to Git. For many it was a surprise that they couldn’t do everything in EGit that they had done in the CVS tooling. Several people reverted to using the command line or gitk, which they found ironic because we are in the tooling business. Reverting to command-line operations to manage your code contributions seemed like a step backward.
Another challenge was the switch in focus to branches as opposed to patches. Traditionally, many teams created patches for every change and attached them to the Bugzilla entries that document the change. With Git, instead of creating a patch, you commit and then add a link to the change in Bugzilla. So we had to adjust our mindset to committing to a branch instead of making a patch.
Branches > Patches
Letting go of the patch mentality was a hurdle for many people. Several teams submit every change as a patch to Bugzilla, and have done so for years. New committers were traditionally taught to write and refine patches as part of the process of becoming a committer. So it felt unnatural to commit changes in local branches.
I missed a bundle during one of the migrations and spent a day trying to integrate CVS content into a Git repo while preserving history. I tried git-stitch and git-merge, but to no avail: the history didn’t look right. In the end, I reran the CVS migration because it was too much work to fix all the tags to look right.
Friday afternoon, the 21st of October, before milestone week: Brian de Alwis was using the bzr-git client. He pushed some changes to the master branch and it wiped all but two of the active branches in the repo. It also triggered a gc, which cleaned up the recently deleted branches. Initially, other committers tried to push back the changes but were not allowed to because of server-side commit hooks. There was then a mad scramble to find a committer with the latest copy of the repo that could be restored to eclipse.org. Paul found one on his home machine.
Comment 37: “I'll just add the final fix. We took a cloned repo that was up to date from Thursday and pulled Friday's 7 commits into R4_development and R3_6_maintenance only. Denis disabled the commit hooks. Then we pushed all tags and pushed refs/remotes/origin/*:refs/heads/*. Pushing the refs also pushed back the GCed commits. We should get that restored repo from the ISP and compare it with the public repo now, to confirm we've completely restored the repo. PW”
“We recently ran into a problem where a push inadvertently removed most of the branches and tags from our public repo, eclipse.platform.ui.git, and GCed the orphaned commits, leaving us in a bad state. This was done through normal git operations, and can be easily replicated from the command line or a little script. We'd like to discuss ways of preventing or limiting the damage to our public repos from this kind of situation in the future. Please add your comments or insights to https://bugs.eclipse.org/bugs/show_bug.cgi?id=362076”
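To make the refspec quoted in comment 37 concrete, here is a small sketch of how a single-wildcard refspec maps ref names. It only models the name mapping, not Git itself; the function name `apply_refspec` is hypothetical, and the ref list in the usage note is made up apart from the two branch names taken from the comment.

```python
def apply_refspec(refs, src_pattern, dst_pattern):
    """Map ref names through a refspec with one "*" on each side.

    Returns a dict from each matching source ref to its destination name.
    Refs that do not match the source pattern are left out.
    """
    src_prefix, src_suffix = src_pattern.split("*")
    dst_prefix, dst_suffix = dst_pattern.split("*")
    mapping = {}
    for ref in refs:
        long_enough = len(ref) >= len(src_prefix) + len(src_suffix)
        if long_enough and ref.startswith(src_prefix) and ref.endswith(src_suffix):
            # The part matched by "*" is carried over to the destination.
            middle = ref[len(src_prefix): len(ref) - len(src_suffix)]
            mapping[ref] = dst_prefix + middle + dst_suffix
    return mapping
```

Called with `"refs/remotes/origin/*"` and `"refs/heads/*"`, a clone's remote-tracking ref `refs/remotes/origin/R4_development` maps to `refs/heads/R4_development`. That is why an up-to-date clone was enough to recreate the deleted branches on the server.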
-Easier to branch
-Rolling back a commit is easier
-Seeing the Eclipse project move to Git made Wayne Beaton happy. If Wayne is happy, everyone is happy.
-Cool graphs on GitHub
-The EGit team got lots of feedback. Bugzilla feedback is love. For instance, Dani and Markus opened over 110 EGit bugs.
“Fork you” is now a valid Bugzilla resolution.
-We build with a mixture of PDE, p2 and Ant, as well as the Eclipse compiler.
-In order to build against Git repositories, we added the EGit fetch factory bundle to the subset of bundles that we use to build Eclipse.
-Modified our map files to point to Git repositories
-Builder changes: fetch maps from Git repos, compare tags, create tag for build ID
-Changes to build scripts so binaries are fetched from the appropriate repos
-Ran several test builds. Surprisingly low on the release engineering pain point scale.
-Backported changes to all four active development streams
The migration has also made us rethink our development and build processes. Today, we usually build from tags. Everyone releases to a branch and tags their contribution to the build. But with Git, you should be thinking in terms of branches. For instance, we will be moving to a git flow model where our usual development occurs in one “develop” branch and we merge changes into the master branch for the build. We will also change the builder to tag the branch automatically.
The Git migration consumed a lot of time for us. However, if you look at it from an accounting perspective, it’s a sunk cost. Every year, we make a plan of major items for the release. Migrating to Git was a major item for us, which meant that other items had to be deferred.
Advice for other projects contemplating their Git migration:
-Relax: You don’t have as many bundles or as much history as we do. It won’t be so painful or costly for you. And it won’t take months. Unless you’re WTP. Then it might take a while.
-Run test migrations and builds before the actual migration date, and get feedback from your community to see if you need to modify your strategy for the actual migration.
-[email_address] is helpful for questions related to Git migration. Other projects have been very helpful.
-Paul Webster wrote a document, “Git workflows for CVS users”, which has been very useful. Inevitably, when people have their repositories migrated to Git, they have similar questions, so it’s good to have the answers in a document you can point them to.
-Minimize the number of repos you create. We have too many repos, and cloning so many repos is not the most efficient way to work.
The benefits from the Git migration are not yet realized. Proponents of distributed version control systems suggest that they make it easier to fork and contribute.
I recently watched a talk by David Eaves, who has been helping open source and open data communities prepare metrics on bug fix rate, how long patches wait, and so on. One of his points was that people think that open source is all about collaboration and working together. But really, if we empower people to go off and work on a problem by themselves, without having to interact with someone, because of the reduced transaction costs, this is a huge benefit. It would be an interesting academic study to analyze contributions to Eclipse projects using SVN and CVS, and contributions after the same projects convert to Git, and see if there is a statistically significant increase in contributions. The most important thing is that you want to reduce the barriers to contribution in your community.
Migrating to Git: Rethinking the Commit
<ul>@Kim_Moir IBM Ottawa </ul>
<ul>Git Happens </ul><ul>@Kim_Moir IBM Ottawa </ul>
<ul>About Us </ul><ul><li>10 year old code repository
<ul>“It’s one thing to migrate to Git, it’s another thing to use it” -Olivier Thomann </ul>
<ul>“Git: The command line is where it’s at” -Bogdan Gheorge </ul>
<ul>“Patches are like dumping a database into a text file. You need to think in terms of releasing fixes to branches instead of passing around patches. Patches also lose some Git provenance information such as author and parent.” -John Arthorne </ul>
<ul>“That operation is four pay-grade levels above my current git-foo.” -Paul Webster </ul>
<ul>“I have broken the platform-ui git repository.” -Bug 361707 </ul><ul>“Better policy to guard against deleting all branches and tags from our public repos” -Bug 362076 </ul>
<ul>“There are three categories of costs that we incurred during the Git migration: The migration process itself, the developer learning curve and dealing with EGit issues” -Mike Wilson </ul>
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Eclipse and the Eclipse logo are trademarks of Eclipse Foundation, Inc.
IBM and the IBM logo are trademarks or registered trademarks of IBM Corporation, in the United States, other countries or both.
Other company, product, or service names may be trademarks or service marks of others.
THE INFORMATION DISCUSSED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION, IT IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, AND IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, SUCH INFORMATION. ANY INFORMATION CONCERNING IBM'S PRODUCT PLANS OR STRATEGY IS SUBJECT TO CHANGE BY IBM WITHOUT NOTICE </li></ul>