Not a great deal is certain when it comes to software development. But one thing that is certain is …
… change. There are changes to requirements and specifications, to work items and bug lists, to documentation, to development teams and their management. And of course there are changes to code.
… and there are a lot of changes …
Many open source projects had over 1,000 commits last year. Large projects can have more. Mozilla for example had over 10,000 commits, and about 28,000 new bug reports.
To manage and track all these changes, developers use many kinds of tools: bug databases, work item databases, and of course source control repositories. More and more we see these tools integrated together in systems like IBM’s Jazz and Microsoft’s Team Foundation Server. Clearly managing change is critical …
It’s important for facilitating collaboration between developers, especially when they are geographically distributed. Developers might wish to validate changes, locate and triage defects, or simply understand modifications. It’s important for helping managers understand what’s going on, something that’s critical to the success of a project. And ultimately, understanding the history of a project is essential to evaluating its success.
But even with available tools, understanding how a program is changing over time is not a simple proposition. And at its core, the problem is that program source code is difficult to understand. Professional developers spend the majority of their time trying to understand code. Let’s take a look at an example of this …
Suppose I’d like to understand this change. Maybe I’m looking for a bug, or maybe I’m responsible for this particular file and I want to make sure this change, from some other developer, is OK. What we have here is the standard diff output for a change to some Java code, showing exactly which lines changed. The problem is that diff output is not always easy to understand, because we only get a slice of the program. Here we can see that the first element of a local array called pageParts is being returned instead of an empty string … but we don’t know when that happens, and it’s also not obvious what pageParts contains, or whether this is a reasonable thing to do. Because this tends to happen often, there is also “context diff”, which prints out some extra lines above and below the changed lines. But in this case pageParts is defined six lines up, so what we probably need to do is use a tool that gives a side-by-side comparison …
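The full method isn’t reproduced in these notes, so purely as a hedged illustration, here is a hypothetical reconstruction of what the code around such a change might look like. Only the changed line and the name pageParts come from the diff; the class name, method name, and surrounding logic are invented. The point it makes is that pageParts is defined several lines above the changed return, so a plain diff hunk never shows it:

```java
// Hypothetical reconstruction -- only the changed line and the name
// pageParts come from the diff; the rest is invented for illustration.
public class LastPage {
    public static String format(String s) {
        if (s == null) {
            return "";
        }
        // pageParts is defined several lines above the changed return,
        // so a plain diff hunk never shows it.
        String[] pageParts = s.split("[-]+");
        if (pageParts.length == 2) {
            return pageParts[0] + " of " + pageParts[1];
        }
        return pageParts[0]; // changed line: previously `return "";`
    }
}
```

Reading just the changed line, you cannot tell what pageParts holds or when this branch runs; reading the whole method, you can.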
The problem is that now we’re back to reading the code, which is that difficult and time-consuming task we want to avoid.
This is one of the reasons we have “commit messages” in version control systems. These are free-form text fields that enable developers to summarize their changes so that others can see what they’ve done without having to puzzle through diffs or read code. In many projects they are required. The problem is that not only do they take time to write, these messages aren't always as complete and accurate as we would like …
Take a look at a few developer message board posts. These comments allude to the observation that although commit messages are useful, they are also burdensome for developers to create. Take the example on the top right, which says … So perhaps what we would really like is a tool that’s automatic like diff, but that produces output that’s more to-the-point, or summarized, and easier to understand like commit messages.
What I’ll talk about today is a tool that we developed called “DeltaDoc”, which automatically documents code changes in a way that is often more concise and more readable than existing tools like diff. In fact, we’ll show that it’s really more comparable to commit messages. DeltaDoc describes the EFFECT of a change on the runtime behavior of a program. That is to say, “what conditions trigger the change?” and “what are the observable differences in program behavior?” DeltaDoc is based on a combination of symbolic execution and a set of summarization transformations that I’ll describe in a bit. Returning to our previous example…
… by applying DeltaDoc, we can see the change in the function’s behavior phrased in terms of the argument, String s. The DeltaDoc tells us that if s is not null and the result of splitting the string on hyphens is not of length 2, then it returns the first item instead of returning an empty string.
Here’s another short example, this time comparing to a human-written commit message which says “Temporarily removed the trade routes from the game menu.” Below that, the DeltaDoc shows that trade route menu items are no longer added to these two menus: build orders and build view. Of course, the DeltaDoc doesn't say that it’s a temporary change, which alludes to the fact that there are really two kinds of information we might document …
In particular, we distinguish information that describes what has changed (for example, added function x) from why it was changed (for example, fixed bug number 42) and other information like “this is temporary.”
Both types of information are important, and a majority of commit messages contain some of both.
The goal of DeltaDoc is to cover as much of the important “what” information as possible, while still being concise. Note that by design, DeltaDoc doesn’t cover all information that can be documented, and there are some reasons for that, including that it is INTRA-procedural, it only documents externally visible effects, and it makes some summarizations.
But given that, if the intersection between the information in commit messages and the information in the corresponding DeltaDoc is large (as we hypothesize), then a significant portion of the documentation work done by developers could be avoided with the use of DeltaDoc.
If that’s true then developers could spend more time explaining why the change was made, or they could simply use that time to do other things, like make more changes.
In the rest of this talk I’ll discuss how DeltaDoc works, and how we evaluated it, by comparing it directly to commit messages with the help of some human annotators.
First I’ll overview the DeltaDoc algorithm which takes as input two versions of a file and outputs human-readable text describing the change.
The first step is to compute symbolic path predicates for each statement of the program, before and after the change.
Then we identify statements that have been added, removed, or have a different predicate, and distill from that documentation of the form: When X, Do Y, Instead of Z.
Finally, because our goal is to mimic human-written documentation, we apply a set of transformations which trade off precision for space. For example, we re-arrange terms to make the final documentation more concise.
Our first step is to obtain intraprocedural path predicates, and we do that with symbolic execution. For an example, consider this function, which produces a greeting prefixed by hello or bonjour depending on which language the system is using. Choosing between languages is especially relevant since we are in Belgium.
In this example there are just three paths through the function to consider: one where the argument name is null, and then one for each of the language options.
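The slide’s exact code isn’t reproduced in these notes, so here is an illustrative Java version of the greeting example (the identifiers greet, name, and language are assumptions), with each return annotated with the path predicate that reaches it:

```java
// Illustrative version of the greeting example; identifiers are assumed.
public class Greeter {
    public static String greet(String name, String language) {
        if (name == null) {
            return "";                // path 1: name == null
        }
        if (language.equals("en")) {
            return "Hello " + name;   // path 2: name != null && language == "en"
        }
        return "Bonjour " + name;     // path 3: name != null && language != "en"
    }
}
```

Only the three return statements matter for externally visible behavior, so symbolic execution needs to remember just these three (predicate, value) pairs.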
The next step is to execute each path symbolically. For each statement we compute a symbolic description, as well as the path predicate, which is the conjunction of all the branch guards.
Here’s the same table for the second path, where we note, for example, in the last row that we return “hello” + name if name != null and the system language is English.
And then the third path is just what you’d expect, similar to the second.
And finally, we only keep around information about statements which are relevant to the externally visible behavior. So in this case we just need to remember the return statements.
Now if there is a change to the function, like Hello is changed to Hi …
We would do predicate generation just like before …
… and notice that there is a difference in the functional behavior … here the predicate is the same, but Hello has changed to Hi
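As a hedged sketch (identifiers are assumed, not the slide’s exact code), the changed version of the greeting function differs only in the value returned on the English path; since the path predicate is unchanged, the comparison reduces to reporting the new value against the old one:

```java
// Illustrative post-change version of the greeting example; identifiers
// are assumed. Only the value returned on the English path differs:
//   predicate: name != null && language == "en"  (unchanged)
//   before:    return "Hello " + name;
//   after:     return "Hi "    + name;
// which distills to roughly:
//   When name != null and language is English,
//     return "Hi " + name, Instead of "Hello " + name
public class GreeterAfter {
    public static String greet(String name, String language) {
        if (name == null) {
            return "";
        }
        if (language.equals("en")) {
            return "Hi " + name;      // changed from "Hello " + name
        }
        return "Bonjour " + name;
    }
}
```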
From that, it’s fairly straightforward to distill a documentation string like this one. [read] Now in some cases, this raw documentation is too complicated or verbose to be a good replacement for commit messages …
… so in an effort to better mimic human-written documentation -- which is rarely longer than a few lines -- we apply a set of potentially precision-reducing transformations. In one such transformation, we drop parts of the predicate that are not relevant to the statements being documented. Here, since we’re talking about returning s, we guess that the important part of the path predicate is s != null, so we hide the rest of it.
We also perform changes like this one, where we reduce redundancy by re-arranging terms into a more hierarchical structure. Other types of transformations simplify program expressions or make them more readable. But there are others, so for a more complete description of all the summarizations, please have a look at the paper.
Now that we’ve seen how DeltaDoc works, I’ll talk about how we evaluated it in three dimensions … we looked at the size of DeltaDoc output, the information content, and we used human annotators to qualitatively judge its usefulness.
We looked at a total of 1000 recent revisions to 5 popular open-source Java projects across several domains. In addition to the size of the projects in lines of code, we also note the number of active authors, since one of the main reasons for using change documentation is to facilitate collaboration between developers.
This chart compares the size of DeltaDoc to standard diffs and also commit messages. Our takeaway is that while diffs averaged 35 lines, DeltaDocs were only about 9 lines, which is a lot closer to human-written commit messages, which are usually only a few lines long and never more than 10 in this sample. While this makes DeltaDoc potentially suitable for supplementing commit messages in plain text, we can also imagine a tool that allows developers to interact with the documentation … something that could allow them to drill down on demand to find out what’s going on more precisely, but without being overwhelmed with too much detail initially.
Returning to our earlier figure, we hypothesized that much of the information contained in commit messages could be found in deltadoc.
In order to test this we need to establish a way to compare the information content in DeltaDoc to the information contained in commit messages. But because commit messages are natural language and DeltaDoc is not, we need to be a little bit clever to do this scientifically …
DeltaDoc and human-written commit messages aren't directly comparable, so we introduce a third form, which we call the “relational form”, which represents the information content in each type of documentation, and it is in this form that we can make a precise comparison.
Relational form is so-named because it consists of a very specific set of relations between program variables. This makes translating to relational form (even from human-written text) straightforward and it also makes comparing two artifacts in relational form easily quantifiable.
For a quick example, consider comparing the DeltaDoc “getPriceForBuilding() > getOwner().getGold()” with the human-written message “has an insufficient amount of gold”. In this case the DeltaDoc is essentially already in relational form, but in the case of the human doc we must manually extract the relation “something” is greater than gold. After we extract all such relations, we manually look for a mapping between them, which ultimately allows us to quantify the difference in information using what we call the “score metric”.
For the details, I refer you to our paper. But at a high level, we designed the metric to be tough on DeltaDoc. In particular, we assume that only relations from commit messages are important, and we don’t reward the DeltaDoc even when it produces other potentially valuable information. The only way for DeltaDoc to score better than a commit message is to be more precise. So something like “updated to revision 5” gets slightly more points than “updated revision” or just “minor change”. We used 16 students from a graduate programming language class to validate the comparison. The score can range from 0 to 1, where a score of at least 0.5 indicates that the DeltaDoc contained all of the (what-type) information present in the commit message … that’s the primary goal.
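The paper defines the metric precisely; purely as a hedged sketch of the idea (the formula below is an assumption for illustration, not the paper’s definition), one could score a DeltaDoc against the set of relations extracted from the commit message so that full coverage guarantees at least 0.5, and extra precision in the matches pushes the score higher:

```java
import java.util.Map;
import java.util.Set;

// Toy sketch of a score in [0, 1]; the exact metric is defined in the
// paper, so treat this formula as illustrative only.
public class ScoreSketch {
    /**
     * commitRelations: relations extracted from the commit message.
     * matchPrecision:  for each of those relations that the DeltaDoc also
     *                  expresses, how precisely it does so (0.0 to 1.0).
     */
    public static double score(Set<String> commitRelations,
                               Map<String, Double> matchPrecision) {
        if (commitRelations.isEmpty()) {
            return 1.0; // nothing to cover
        }
        int covered = 0;
        double precision = 0.0;
        for (String relation : commitRelations) {
            Double p = matchPrecision.get(relation);
            if (p != null) {
                covered++;
                precision += p;
            }
        }
        double coverage = (double) covered / commitRelations.size();
        if (coverage < 1.0) {
            return 0.5 * coverage; // missing information caps the score below 0.5
        }
        // All commit-message relations covered: 0.5 guaranteed,
        // extra precision earns up to 0.5 more.
        return 0.5 + 0.5 * (precision / covered);
    }
}
```

Under this illustrative reading, a DeltaDoc that matches every commit-message relation more precisely scores above one that matches them loosely, and one that misses relations entirely scores below 0.5, matching the ordering described above.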
This graph gives the average score metric for the changes sampled from each benchmark.With an average greater than 0.5, we conclude that deltadoc contained at least as much information as corresponding commit messages most of the time.
Here’s another view of the same data, which breaks down the distribution a bit, showing that DeltaDoc is less complete or precise than commit messages only about 10% of the time. For the vast majority of commits, the DeltaDoc scores 0.5 or better.
So returning to our earlier hypothesis, we find that this intersection is indeed large. But is DeltaDoc useful?
The answer we got from our annotators, who had the chance to read many commit messages and DeltaDocs, was generally “yes”. In about two thirds of cases they either preferred DeltaDoc or had no preference in comparison to the existing commit message.
To conclude, like any program analysis, DeltaDoc has both theoretical and practical limitations.
We believe that documentation of this form has a number of advantages. It’s fast and fully automatic, it’s suitable for supplementing commit messages right now, the output is structured – so it’s amenable to search – and it’s both predictable and reliable.
Commit Messages<br />Free-form text which may describe<br />What the change was.<br />Why the change was made.<br />Phex 3542<br />Minor change<br />jfreechart rev 3405<br /> (start): Changed from Date to long,<br /> (end): Likewise,<br /> (getStartMillis): New method,<br /> (getEndMillis): Likewise,<br /> (getStart): Returns new date instance,<br /> (getEnd): Likewise.<br />Jabref rev 2917<br />Fixed NullPointerException when downloading external file and file directory is undefined.<br />
Toby,<br />Going forward, could I ask you to be more descriptive in your commit messages? Ideally you should state what you've changed and also why (unless it's obvious)... I know you're busy and this takes more time, but it will help anyone who looks through the log ...<br />http://lists.macosforge.org/pipermail/macports-dev/2009-June/008881.html<br />Subject: An appeal for more descriptive commit messages<br />I know there is a lot going on but please can we be a bit more<br />descriptive when committing changes. Recent log messages have included:<br />"some cleanup"<br />"more external service work"<br />"Fixed a bug in wiring"<br />which are a lot less informative than others...<br />http://osdir.com/ml/apache.webservices.tuscany.devel/2006-02/msg00227.html<br />Sorry to be a pain in the neck about this, but could we please use more descriptive commit messages? I do try to read the commit emails, but since the vast majority of comments are "CAY-XYZ", I can't really tell what's going on unless I then look it up.<br />http://osdir.com/ml/java.cayenne.devel/2006-10/msg00044.html<br />
DeltaDoc<br />Describes the observable EFFECT of a change<br />Conditions that trigger the changed code<br />How the change impacts functional behavior and program state<br />Symbolic Execution<br />Summarization<br />Transformations<br />
DeltaDoc<br />Diff<br />19c19,22<br />&lt; else return "";<br />---<br />&gt; else return pageParts;<br />&gt; // else return "";<br />DeltaDoc<br />When calling LastPage format(String s)<br /> If s is not null and s.split("[-]+").length != 2<br /> return s.split("[-]+") instead of ""<br />
DeltaDoc<br />Commit message<br />Temporary removed the trade routes from the game menu.<br />DeltaDoc<br />When calling FreeColMenuBar buildOrdersMenu()<br /> No longer<br /> call JMenu.add(getMenuItem("assignTradeRouteAction"))<br />When calling FreeColMenuBar buildViewMenu()<br /> No longer<br /> call JMenu.add(getMenuItem("tradeRouteAction"))<br />Freecol rev 2085<br />
Documenting Change<br />Why was it changed?<br />What was changed?<br />
Relational Form Example<br />has an insufficient amount of gold<br />getPriceForBuilding() > getOwner().getGold()<br />getPriceForBuilding() > getOwner().getGold()<br />? > gold<br />
Score Metric<br />Conservatively assume only relations from commit messages are important.<br />Reward precision.<br />Used 16 human annotators to validate.<br />Score of 0.5 indicates that the DeltaDoc contained all the information in the commit message.<br />
Example Score = 0.5<br />Commit Message<br />no need to call clear()<br />DeltaDoc<br />When calling PdfContentByte reset()<br /> If stateList.isEmpty(),<br /> No longer call stateList.clear()<br />iText Rev 3837<br />
Example Score > 0.5<br />Commit Message<br />Commented unused constant<br />DeltaDoc<br />removed field : EuropePanel : int TITLE_FONT_SIZE<br />Freecol rev 2054<br />
Example Score < 0.5<br />Commit Message<br />Fixed bug: content selector for 'editor' field uses ',' instead of 'and' as delimiter.<br />DeltaDoc<br />When calling EntryEditor getExtra()<br /> If ed.getFieldName().equals("editor")<br /> call contentSelectors.add(FieldContentSelector)<br />JabRef Rev 3111<br />
Qualitative Evaluation<br />“very useful" “highly useful” "would be a great supplement" “definitely a useful supplement" “can help make the logic clear” “often easier to understand“ “more accurate” “easy to read" “provides more information"<br />
DeltaDoc Limitations<br />Intraprocedural<br />Handling of loops not fully-precise<br />Less precise for large changes<br />Does not address reason for change<br />
DeltaDoc Advantages<br />Cheap<br />Can be computed in about a second on average.<br />Suitable for quick adoption<br />Can supplement or replace many existing commit messages.<br />Structured<br />Suitable for search.<br />Reliable<br />