Using Git/Gerrit and Jenkins to Manage the Code Review Process
ESC-4024
Presenters: Marc Karasek & Phil Hord

Code Review: What is it and why do we do it?

The idea of the lone ranger programmer, cranking out code in his cube/office, is a nice romantic idea. In reality it only leads to code that is obfuscated and unmaintainable. Having a code review process as part of your development flow, however, leads to more maintainable code.

The ‘ideal’ code review system:
 1. Web interface: allows access from multiple development sites.
 2. Allows pre-commit code reviews.
 3. Can handle a large number of repositories.
 4. Inline comments and block comments.
 5. Integration with a build server.
 6. Review process workflow that can be integrated into the development process: the developer does not have to do anything “extra” to start the review process.

Let’s take each of these in order:

Web Interface

Today we have development teams spread around the world. The old adage that the sun never sets on the British Empire could be applied to some of our current development teams. With developers not always located in the same geographical area or time zone, it becomes important to have a web interface so that code review is not dependent on sitting around a table. A developer can submit code for review, and a teammate can review it on arriving at the office.

Pre-commit code reviews

One of the biggest problems facing code review is how to satisfy the requirement to have the code under SCM while not impacting the current code base with unreviewed code. There are many ways to implement this; keeping a separate (sandbox) repository for untested/unreviewed code and submitting patchsets for changes into an SCM are two of them. The problem is that most of these methods add overhead to your development process: either two repos have to be maintained, one for production and one for development, or additional steps have to be added to the development process to create the patchset for a change to be reviewed.

Can handle a large number of repositories

Development teams today work on multiple projects; each one normally has its own code base that needs to be maintained. Being able to maintain a large number of different repositories, while not a major issue with SCM systems today, is worth mentioning. It can become an issue in how the SCM stores the repository and how much space it takes on the server.

Inline comments and block comments

Reviewers need to be able to comment not only on the change as a whole but also inline in the patchset/code being reviewed. Think of the difference between a global comment on the change, “The commit message needs to have some more verbiage added to describe the change better,” and a local comment in the code, “This variable is being used in file ABCD.c, check this file to make sure we do not have an issue.” Both types of comments, inline and block, should be part of the code review history/process.

Integration with a build server

Projects that share code across platforms need to be able to cross-check common code for multiple build targets. Having a build server that can do the ‘grunt’ work of building multiple targets for a code base puts a check in place that does not depend on a developer doing the builds. With some projects having many targets, a build server helps to automate and standardize the process.

Review Process Workflow that can be integrated into the development process

The trick is to integrate the code review so it is a part of your ‘normal’ code development process. If there is any “exception” path that allows engineers to bypass code review for emergencies, it will become the normal path. From the developer’s point of view, the code review process should have a minimal impact on the development process. The best case is that the developer’s normal check-in/commit process for submitting code into the SCM is the code review process.

Current Processes

Most code review systems/processes generally fall into one of three models:
 1. Code is checked into a temporary holding branch for review. Once it has been reviewed, it is merged into a master/release branch. This merge may be done by either the original developer or a dedicated build/repository manager.
 2. Code is kept locally on the developer’s machine; it is posted/emailed for review. Once it has been reviewed, it is the responsibility of the developer to merge it into the release/master branch.
 3. Separate branches are maintained for release and development. The development branch is never guaranteed to build but always has the latest and greatest in it. Code may be checked into this branch with no review. Once it is checked in, reviewers are notified and provided a link to the commit for review.

Each of these processes has its good and bad points. What all of them lack is a way to automate the review process. This includes:
 1. being able to cherry-pick/pull a patchset to a local repository for review/testing
 2. reviewing the changes without pulling the code down to your local machine
 3. reviewing the history of the change:
    a. how many times it has been through the review process
    b. what other reviewers’ comments are

Let us see how the above processes stack up against the ‘ideal’ code review system.

Web Interface

All of the above could have some kind of web interface for accessing the code under review. This could range from something as simple as a patchset sent via email to a web-based GUI. Regardless of the method, this adds an extra step to the development process. The engineer has to package the changes into a patchset and then either send it out to an email list or post it to a web site. This adds time to the development process and does not allow good tracking of review changes. The normal process would be for the developer to receive feedback, generate a new patchset, and then send/post this new change. There is no explicit link between the old and new changes.

Pre-commit code reviews

Only #1 and #3 of the above handle this requirement. For these two, the code is checked into a holding area/development branch for review prior to being merged over to the release/master branch. #2 fails this requirement, as the change lives only on the developer’s machine, and if that machine has an ‘accident’ the changes are lost.

Even the ones that meet this requirement have problems. As with the previous requirement, this adds to the development process. The code needs to be merged over after review. This is handled either by the developer or by a dedicated build manager. In the end it is a manual step that adds time and takes up resources.

Can handle a large number of repositories

Most modern SCM systems handle large repositories. This impacts the review process very little and is best left for a separate discussion.

Inline comments and block comments

Most current review processes fail this requirement. Being able to view other reviewers’ comments on a file or about the overall change is an invaluable resource that helps to streamline the review process. Being able to review past comments on the change also leads to shorter review times; no one gets it right the first time, which is why we do code review.

Integration with a build server

This is normally a manual step in the review process, where a developer has to submit a job to the build server. At best it is somewhat automated, in a nightly or weekly build that pulls in all currently submitted changes and attempts to build them.

Where this fails is that, of all the processes, only #1 above, where the change is contained in its own holding area, could build. For #3, the development branch is never guaranteed to be buildable, so the vast majority of the time this adds time to the process. Someone has to find out why the nightly/weekly development build failed, inform the engineer who submitted the code, and so on. For #2, there is no way for the build server to get the code, as it is on the developer’s machine.

Review Process Workflow that can be integrated into the development process

Each of the three models above adds steps to the process. For the developer it is a multistep process to get code submitted. Developers have to learn a ‘new’ process and how to use it in their development: for example, how to properly generate the patchset so that it can be reviewed by the team, or how to package their changes to submit them through a web interface for review.

Introducing: Git / Gerrit / Jenkins

Using Git as an SCM with Gerrit as a frontend addresses most of the above requirements. Adding Jenkins as a build/integration server covers the requirements that Git/Gerrit alone do not.

Web Interface

Gerrit provides a web interface that allows code review, patchset generation, cherry-picking, etc. of patchsets that have been submitted for review. Access to this web interface and the underlying repositories can be controlled so that developers only have access to the projects that they are working on.

It allows for a custom view of the patchset under review. A reviewer can choose to view any number of lines that surround the change, up to the whole file. This allows each reviewer to view as much information as they need, without having to check out any code.
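
As a rough illustration of pulling a patchset down for local review or cherry-picking: Gerrit publishes each patchset under a ref of the form refs/changes/<last two digits of the change number>/<change number>/<patchset number>. The sketch below only builds that ref name and the matching git fetch arguments; it does not run git. The helper names, the remote name "origin", and change number 1234 are hypothetical and chosen for illustration.

```python
def change_ref(change_number: int, patchset: int) -> str:
    """Return the Gerrit ref that holds one patchset of a change."""
    if change_number < 1 or patchset < 1:
        raise ValueError("change and patchset numbers start at 1")
    # Gerrit shards changes by the last two digits, zero-padded.
    shard = f"{change_number % 100:02d}"
    return f"refs/changes/{shard}/{change_number}/{patchset}"


def fetch_command(change_number: int, patchset: int, remote: str = "origin") -> list:
    """Build the argument list for fetching one patchset (hypothetical helper)."""
    return ["git", "fetch", remote, change_ref(change_number, patchset)]


if __name__ == "__main__":
    # Example: patchset 5 of the (made-up) change 1234.
    print(" ".join(fetch_command(1234, 5)))
```

After such a fetch, the patchset is available locally as FETCH_HEAD and can be checked out or cherry-picked for testing.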

Pre-commit code reviews

This one item alone makes Git/Gerrit worth using. Using Gerrit as a frontend provides a ‘standard’ Git interface to the developers. They push their code to the Git server: no special check-in process, no special software to install. The developer just pushes the code to a special ref, “refs/for/<branch>”, that Gerrit understands. Gerrit then takes the changes, creates a patchset from them, and posts it to its web interface for review.

The patchset is ‘held’ in Gerrit until the code has been reviewed. It can then be submitted into the Git repository. The patchset can be updated, abandoned, resurrected, etc., all without impacting the Git repository that it has been pushed to. This allows changesets to be in review and pending without impacting the code base. The patchset can also be updated by the developer based on comments made during review. The developer makes the requested changes, amends the same commit, and pushes it to the Git server again. Gerrit sees that this is a new patchset based on a previous one and adds it to the review as patchset <x>.
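
The push described above can be sketched as follows. This is a minimal illustration, assuming a remote named "origin" and a target branch named "master"; both names and the helper functions are hypothetical. The sketch only assembles the git arguments so the refspec is easy to see.

```python
def review_refspec(branch: str, local_rev: str = "HEAD") -> str:
    """Map a local revision onto Gerrit's review ref for a target branch."""
    if not branch or branch.startswith("refs/"):
        raise ValueError("pass a plain branch name such as 'master'")
    return f"{local_rev}:refs/for/{branch}"


def push_for_review(branch: str, remote: str = "origin") -> list:
    """Build the argument list for pushing the current commit for review."""
    return ["git", "push", remote, review_refspec(branch)]


if __name__ == "__main__":
    # Same command for the first push and for every updated patchset.
    print(" ".join(push_for_review("master")))
```

The developer runs an ordinary git push; only the destination ref differs from a direct push to the branch.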

Can handle a large number of repositories

All modern SCM systems can handle multiple repositories. Where Git stands out, though, is in the size of the repository and how it stores the files.

For example, the Mozilla repository is reported to be almost 12 GB when stored in SVN using the fsfs backend. Previously, the fsfs backend also required over 240,000 files in one directory to record all 240,000 commits made over the 10-year project history. The exact same history is stored in Git by only two files totaling just over 420 MB. This means that SVN requires roughly 30x the disk space to store the same history.

Working copies are smaller as well. An SVN working directory always contains two copies of each file: one for the user to actually work with and another hidden in .svn/ to aid operations such as status, diff and commit. In contrast, a Git working directory requires only one small index file that stores about 100 bytes of data per tracked file. On projects with a large number of files this can be a substantial difference in the disk space required per working copy.

The same comparison can be made between Git and CVS, where a 3x improvement in disk space usage has been seen.

A side effect of how Git manages its repository is that each time you clone a repository locally you get the full repository. All the history, etc. is cloned to the local machine from the server. This allows developers to work on code, switch between branches, search history, etc. without having to be connected to the ‘central’ SCM.

Inline comments and block comments

Gerrit allows reviewers to enter both inline and block comments on any patchset they are reviewing. It also keeps a history of the patchset as it goes through the review process. This gives the reviewer/developer access to the past history of comments on the change.
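
The disk space figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: it treats “almost 12 GB” as 12 GB and “just over 420 MB” as 420 MB, and the 50,000-file project used for the index estimate is a made-up example, not a figure from the comparison.

```python
MB_PER_GB = 1024

svn_repo_mb = 12 * MB_PER_GB   # "almost 12 GB" rounded up to 12 GB
git_repo_mb = 420              # "just over 420 MB" rounded down to 420 MB

# Ratio of SVN to Git repository size for the same history.
ratio = svn_repo_mb / git_repo_mb


def index_size_bytes(tracked_files: int, bytes_per_entry: int = 100) -> int:
    """Estimate the Git index size at about 100 bytes per tracked file."""
    return tracked_files * bytes_per_entry


if __name__ == "__main__":
    print(f"SVN/Git repository size ratio: about {ratio:.0f}x")
    # Hypothetical project with 50,000 tracked files.
    print(f"Estimated index size: {index_size_bytes(50_000) / 1e6:.1f} MB")
```

With these roundings the ratio comes out slightly under 30, which is consistent with the approximate figure quoted.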

Integration with a build server

This is where the three amigos meet. Jenkins (the build server) has hooks, provided through plugins such as Gerrit Trigger, to monitor and build against a Gerrit/Git SCM system. This allows automated builds to be triggered by a patchset being submitted into Gerrit. The developer does not have to do anything special to trigger the build; it is automatic, based on the patchset and the branch it is being pushed to in Gerrit/Git. This can be used to build a set group of targets based on a given branch, or all of the targets that a given project builds for.

Review Process Workflow that can be integrated into the development process

This is where the rubber hits the road. Using Gerrit/Git allows the review process to be fully integrated into the development process. The developer does not have to learn any new process; they just push their changes to Git and Gerrit takes care of the magic.

The developer pushes their code to the Git server: no special check-in process, no special software to install. The code is pushed to a special ref, “refs/for/<branch>”, that Gerrit understands. Gerrit then takes the changes, creates a patchset from them, and posts it to its web interface for review. It then emails everyone on the review list that a new review is in their queue. When the reviewers log into Gerrit, they see the patchset they have been asked to review in their queue.
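
The idea of choosing build targets from the branch a patchset was pushed to can be sketched as below. This is not the Jenkins or Gerrit API; it only models the selection logic in plain Python. The event fields, the branch names, and the target names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchsetEvent:
    """Minimal stand-in for a 'patchset pushed' notification (hypothetical)."""
    project: str
    branch: str
    change_number: int
    patchset: int


# Every target the (made-up) project builds for.
ALL_TARGETS = ("arm-board", "x86-sim", "mips-board")

# A set group of targets per branch; unknown branches build everything.
TARGETS_BY_BRANCH = {
    "master": ALL_TARGETS,
    "release-1.0": ("arm-board",),
}


def targets_for(event: PatchsetEvent) -> tuple:
    """Pick the build targets for the branch the patchset targets."""
    return TARGETS_BY_BRANCH.get(event.branch, ALL_TARGETS)


if __name__ == "__main__":
    event = PatchsetEvent("firmware", "release-1.0", 1234, 1)
    for target in targets_for(event):
        print(f"build {event.project} change {event.change_number} for {target}")
```

In a real setup this mapping would live in the build server's job configuration rather than in code; the sketch is only meant to show that the developer's push carries enough information to select the builds.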