Automatically Documenting Program Changes. Ray Buse and Wes Weimer. ASE, 9.22.2010, Antwerp, Belgium. (diffs, DeltaDoc, commit messages)
Change
So Much Change
So Much Change
So Much Change
Managing Change Developers Evaluation Managers
Understanding change remains difficult Peter Hallam. What Do Programmers Really Do Anyway? Microsoft Developer Network (MSDN) – C# Compiler. Jan 2006.
Diff (JabRef Revision 3066)
    19c19,22
    < else return "";
    ---
    > else return pageParts[0];
    > // else return "";
Side-by-side
Commit Messages
Free-form text which may describe what the change was and why the change was made.
    Phex 3542: Minor change
    jfreechart rev 3405: (start): Changed from Date to long, (end): Likewise, (getStartMillis): New method, (getEndMillis): Likewise, (getStart): Returns new date instance, (getEnd): Likewise.
    JabRef rev 2917: Fixed NullPointerException when downloading external file and file directory is undefined.
Toby, Going forward, could I ask you to be more descriptive in your commit messages? Ideally you should state what you've changed and also why (unless it's obvious)... I know you're busy and this takes more time, but it will help anyone who looks through the log ...
http://lists.macosforge.org/pipermail/macports-dev/2009-June/008881.html

Subject: An appeal for more descriptive commit messages
I know there is a lot going on but please can we be a bit more descriptive when committing changes. Recent log messages have included: "some cleanup" "more external service work" "Fixed a bug in wiring" which are a lot less informative than others...
http://osdir.com/ml/apache.webservices.tuscany.devel/2006-02/msg00227.html

Sorry to be a pain in the neck about this, but could we please use more descriptive commit messages? I do try to read the commit emails, but since the vast majority of comments are "CAY-XYZ", I can't really tell what's going on unless I then look it up.
http://osdir.com/ml/java.cayenne.devel/2006-10/msg00044.html
DeltaDoc
Describes the observable EFFECT of a change:
    Conditions that trigger the changed code
    How the change impacts functional behavior and program state
Built from: Symbolic Execution, Summarization Transformations
DeltaDoc
Diff:
    19c19,22
    < else return "";
    ---
    > else return pageParts[0];
    > // else return "";
DeltaDoc:
    When calling LastPage format(String s)
        If s is not null and s.split("[-]+").length != 2
            return s.split("[-]+")[0] instead of ""
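To make the DeltaDoc above concrete, here is a hypothetical before/after pair that would be consistent with it. Only the changed `else` branch and the `split("[-]+")` call come from the slide; the class name `LastPageSketch`, the null guard, and the two-part branch are assumptions for illustration and are not taken from the JabRef source.

```java
// Hypothetical reconstruction for illustration only; not the actual JabRef code.
class LastPageSketch {
    // Before the change: anything that is not a two-part range yields "".
    static String formatBefore(String s) {
        if (s == null) return "";                        // assumed null guard
        String[] pageParts = s.split("[-]+");
        if (pageParts.length == 2) return pageParts[1];  // assumed two-part case
        else return "";
    }

    // After the change: fall back to the first part instead of "".
    static String formatAfter(String s) {
        if (s == null) return "";                        // assumed null guard
        String[] pageParts = s.split("[-]+");
        if (pageParts.length == 2) return pageParts[1];  // assumed two-part case
        else return pageParts[0];
    }

    public static void main(String[] args) {
        // Behavior differs only when s is non-null and does not split into two parts.
        System.out.println("[" + formatBefore("12") + "] vs [" + formatAfter("12") + "]");
        System.out.println("[" + formatBefore("12-15") + "] vs [" + formatAfter("12-15") + "]");
    }
}
```

Under these assumptions the two versions agree on `"12-15"` and differ on `"12"`, which is exactly the condition the DeltaDoc states in terms of the argument `s`.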
DeltaDoc (Freecol rev 2085)
Commit message:
    Temporary removed the trade routes from the game menu.
DeltaDoc:
    When calling FreeColMenuBar buildOrdersMenu
        No longer call JMenu.add(getMenuItem("assignTradeRouteAction"))
    When calling FreeColMenuBar buildViewMenu
        No longer call JMenu.add(getMenuItem("tradeRouteAction"))
Documenting Change Why was it changed? What was changed?
Commit Messages Why? What? Current Practice
DeltaDoc DeltaDoc Commit Messages Why? What?
Hypothesis This area is large. DeltaDoc Commit Messages Why? What?
With DeltaDoc DeltaDoc Commit Messages Why? What?
The rest of this talk Approach: How DeltaDoc works. Evaluation: Comparing DeltaDoc to Commit messages.
DeltaDoc Architecture
DeltaDoc Architecture Compute symbolic path predicates for each statement.
DeltaDoc Architecture: Identify statements that have been added, removed, or have a different predicate. Output form: When X, Do Y Instead of Z.
DeltaDoc Architecture Apply summarization transformations until result is sufficiently concise.
Predicate Generation
    String sayHello(String name) {
        String ret = "";
        if (name != null) {
            if (System.Lang == ENG)
                ret = ret + "Hello ";
            else
                ret = ret + "Bonjour ";
            ret = ret + name;
        }
        return ret;
    }
Predicate Generation
Enumerate loop-free control flow paths.
    String ret = "";  name != null  System.Lang == ENG  ret = ret + "Hello ";  ret = ret + name;  return ret;
    String ret = "";  name != null  System.Lang != ENG  ret = ret + "Bonjour ";  ret = ret + name;  return ret;
    String ret = "";  name == null  return ret;
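The path enumeration step can be sketched as a recursive walk over a small statement tree. This is a minimal sketch, not DeltaDoc's implementation: the `Node`, `Simple`, `Ret`, `Branch`, and `PathEnumerator` types are hypothetical, statements are kept as plain strings, and loops are not modeled at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical mini-AST; not DeltaDoc's actual representation.
abstract class Node {}

class Simple extends Node {            // straight-line statement
    final String text;
    Simple(String text) { this.text = text; }
}

class Ret extends Node {               // return statement: ends the path
    final String text;
    Ret(String text) { this.text = text; }
}

class Branch extends Node {            // if (cond) thenPart else elsePart
    final String cond, negCond;
    final List<Node> thenPart, elsePart;
    Branch(String cond, String negCond, List<Node> thenPart, List<Node> elsePart) {
        this.cond = cond; this.negCond = negCond;
        this.thenPart = thenPart; this.elsePart = elsePart;
    }
}

class PathEnumerator {
    // Returns every path through body as a list of statements and branch guards.
    static List<List<String>> paths(List<Node> body) {
        List<List<String>> out = new ArrayList<>();
        walk(body, new ArrayList<>(), out);
        return out;
    }

    // todo holds the statements still to execute on the current path.
    private static void walk(List<Node> todo, List<String> prefix, List<List<String>> out) {
        if (todo.isEmpty()) { out.add(prefix); return; }
        Node head = todo.get(0);
        List<Node> rest = todo.subList(1, todo.size());
        if (head instanceof Ret) {
            out.add(append(prefix, ((Ret) head).text));
        } else if (head instanceof Simple) {
            walk(rest, append(prefix, ((Simple) head).text), out);
        } else {
            Branch b = (Branch) head;  // fork: one path per branch outcome
            walk(concat(b.thenPart, rest), append(prefix, b.cond), out);
            walk(concat(b.elsePart, rest), append(prefix, b.negCond), out);
        }
    }

    private static <T> List<T> append(List<T> xs, T x) {
        List<T> r = new ArrayList<>(xs); r.add(x); return r;
    }

    private static <T> List<T> concat(List<T> a, List<T> b) {
        List<T> r = new ArrayList<>(a); r.addAll(b); return r;
    }

    // The sayHello function from the slide, encoded by hand.
    static List<Node> sayHello() {
        return Arrays.asList(
            new Simple("String ret = \"\";"),
            new Branch("name != null", "name == null",
                Arrays.asList(
                    new Branch("System.Lang == ENG", "System.Lang != ENG",
                        Arrays.asList(new Simple("ret = ret + \"Hello \";")),
                        Arrays.asList(new Simple("ret = ret + \"Bonjour \";"))),
                    new Simple("ret = ret + name;")),
                Collections.emptyList()),
            new Ret("return ret;"));
    }
}
```

Calling `PathEnumerator.paths(PathEnumerator.sayHello())` yields the three paths listed on the slide: the English path, the non-English path, and the `name == null` path.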
Predicate Generation Symbolic Execution
Predicate Generation Symbolic Execution
Predicate Generation Symbolic Execution
Store important statements
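A rough sketch of symbolically executing one loop-free path: it keeps a symbolic value per variable, collects the branch guards into a path predicate, and remembers only the return statement. The `SymbolicPath` and `Step` names are hypothetical, only string concatenation is modeled, and guards are recorded as written, which is only sound here because they mention inputs that are never reassigned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and representation are assumptions, not DeltaDoc's code.
class SymbolicPath {
    static final class Step {
        final String kind;         // "guard", "assign", or "return"
        final String target;       // guard text or assigned variable; null for return
        final List<String> terms;  // concatenated terms: variable names or quoted literals
        Step(String kind, String target, String... terms) {
            this.kind = kind; this.target = target; this.terms = Arrays.asList(terms);
        }
    }

    static Step guard(String cond) { return new Step("guard", cond); }
    static Step assign(String var, String... terms) { return new Step("assign", var, terms); }
    static Step ret(String... terms) { return new Step("return", null, terms); }

    // Returns {path predicate, symbolic return value} for one loop-free path.
    static String[] execute(List<Step> path) {
        Map<String, List<String>> store = new HashMap<>();  // variable -> symbolic value
        List<String> guards = new ArrayList<>();
        for (Step s : path) {
            if (s.kind.equals("guard")) {
                guards.add(s.target);
            } else if (s.kind.equals("assign")) {
                store.put(s.target, eval(s.terms, store));
            } else {  // return: the only statement kept for documentation
                List<String> value = eval(s.terms, store);
                String shown = value.isEmpty() ? "\"\"" : String.join(" + ", value);
                return new String[] { String.join(" AND ", guards), shown };
            }
        }
        throw new IllegalArgumentException("path does not end in a return");
    }

    // Substitutes known variables and drops empty-string literals from a concatenation.
    private static List<String> eval(List<String> terms, Map<String, List<String>> store) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (store.containsKey(t)) out.addAll(store.get(t));
            else if (!t.equals("\"\"")) out.add(t);  // inputs and literals stay symbolic
        }
        return out;
    }

    // The English path of sayHello from the slides.
    static List<Step> englishPath() {
        return Arrays.asList(
            assign("ret", "\"\""),
            guard("name != null"),
            guard("System.Lang == ENG"),
            assign("ret", "ret", "\"Hello \""),
            assign("ret", "ret", "name"),
            ret("ret"));
    }
}
```

For `englishPath()` the result pairs the predicate `name != null AND System.Lang == ENG` with the return value `"Hello " + name`, which is the kind of row the symbolic execution slides tabulate.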
Change
Before:
    String sayHello(String name) {
        String ret = "";
        if (name != null) {
            if (System.Lang == ENG)
                ret = ret + "Hello ";
            else
                ret = ret + "Bonjour ";
            ret = ret + name;
        }
        return ret;
    }
After:
    String sayHello(String name) {
        String ret = "";
        if (name != null) {
            if (System.Lang == ENG)
                ret = ret + "Hi ";
            else
                ret = ret + "Bonjour ";
            ret = ret + name;
        }
        return ret;
    }
Predicate Generation
Enumerate Changes
Generate Documentation
    When calling sayHello(String name)
        If name != null AND System.Lang == ENG
            return "Hi " + name
            Instead of "Hello " + name
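One way to sketch the documentation step is to compare, predicate by predicate, what each version of the function returns. The `ChangeDoc` class below is hypothetical and deliberately narrow: it handles only return values keyed by an identical predicate string, whereas the talk also covers added and removed statements and changed predicates more generally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the "When X, Do Y Instead of Z" step; not DeltaDoc's code.
class ChangeDoc {
    // before/after map each path predicate to the symbolic return value on that path.
    static String document(String signature, Map<String, String> before, Map<String, String> after) {
        StringBuilder doc = new StringBuilder("When calling " + signature + "\n");
        for (Map.Entry<String, String> e : after.entrySet()) {
            String pred = e.getKey(), now = e.getValue(), old = before.get(pred);
            if (old == null) {                       // predicate only exists after the change
                doc.append("  If ").append(pred).append(", return ").append(now).append("\n");
            } else if (!old.equals(now)) {           // same predicate, different behavior
                doc.append("  If ").append(pred).append(", return ").append(now)
                   .append(" instead of ").append(old).append("\n");
            }
        }
        for (Map.Entry<String, String> e : before.entrySet()) {
            if (!after.containsKey(e.getKey())) {    // predicate only existed before the change
                doc.append("  If ").append(e.getKey()).append(", no longer return ")
                   .append(e.getValue()).append("\n");
            }
        }
        return doc.toString();
    }

    // Path summaries for sayHello, parameterized by the English greeting.
    static Map<String, String> sayHello(String greeting) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("name != null AND System.Lang == ENG", "\"" + greeting + " \" + name");
        m.put("name != null AND System.Lang != ENG", "\"Bonjour \" + name");
        m.put("name == null", "\"\"");
        return m;
    }

    public static void main(String[] args) {
        System.out.print(document("sayHello(String name)", sayHello("Hello"), sayHello("Hi")));
    }
}
```

Only the English path differs between the two versions, so the output contains a single "instead of" line; the unchanged paths produce no text.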
Summarization
Remove irrelevant path predicates.
    Before: If s != null and a is true and b is true and c is true, return s
    After:  If s != null, return s
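A minimal sketch of this transformation, assuming "relevant" means "mentions a variable used by the documented statement". That relevance test, along with the `PredicateFilter` name, is a guess made for illustration; the slides do not spell out the actual heuristic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch; the relevance test is an assumption, not DeltaDoc's rule.
class PredicateFilter {
    // Keeps only the conjuncts that mention, as a whole word, a variable in usedVars.
    static List<String> relevant(List<String> conjuncts, Set<String> usedVars) {
        List<String> kept = new ArrayList<>();
        for (String c : conjuncts) {
            for (String v : usedVars) {
                if (Pattern.compile("\\b" + Pattern.quote(v) + "\\b").matcher(c).find()) {
                    kept.add(c);
                    break;
                }
            }
        }
        return kept;
    }
}
```

With the conjuncts `s != null`, `a is true`, `b is true`, `c is true` and the used variable `s`, only `s != null` survives. The whole-word match matters: the `s` inside `is` does not count as a mention.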
Summarization
Re-arrange terms; simplification; readability enhancements based on Java idioms.
    Flat form:   If P and Q, Do X.  If P and Q and R, Do Y.
    Nested form: If P and Q, Do X.  If R, Do Y.
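The re-arrangement can be sketched as factoring out guards shared with the preceding entry, so a later entry shows only its additional conditions. This assumes the intent is to turn the flat form into the nested one; the `Rearrange` class and its prefix rule are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of prefix factoring; not DeltaDoc's actual transformation.
class Rearrange {
    // guards.get(i) is the conjunct list for actions.get(i).
    static List<String> nest(List<List<String>> guards, List<String> actions) {
        List<String> lines = new ArrayList<>();
        List<String> parent = null;  // most recent top-level guard list
        for (int i = 0; i < guards.size(); i++) {
            List<String> g = guards.get(i);
            boolean nested = parent != null && g.size() > parent.size()
                    && g.subList(0, parent.size()).equals(parent);
            // A nested entry prints only the conjuncts its parent does not already state.
            List<String> shown = nested ? g.subList(parent.size(), g.size()) : g;
            lines.add((nested ? "    " : "") + "If " + String.join(" and ", shown)
                    + ", Do " + actions.get(i));
            if (!nested) parent = g;
        }
        return lines;
    }
}
```

For guards `[P, Q]` and `[P, Q, R]` with actions `X` and `Y`, this produces `If P and Q, Do X` followed by an indented `If R, Do Y`, saving the repeated `P and Q`.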
Evaluation Quality Content Size
Benchmarks
Size Comparison Diffs are about 35 lines DeltaDocs are about 9 lines Commit messages are always less than 10 lines
Content Comparison How large is this area? DeltaDoc Commit Messages Why? What?
Content Comparison DeltaDoc CommitMessages
Content Comparison DeltaDoc Commit Messages RelationalForm RelationalForm
Relational Form
Relational Form Example
    Commit message text: has an insufficient amount of gold. Relational form: ? > gold
    DeltaDoc text: getPriceForBuilding() > getOwner().getGold(). Relational form: getPriceForBuilding() > getOwner().getGold()
Score Metric Conservatively assume only relations from commit messages are important. Reward precision. Used 16 human annotators to validate. Score of 0.5 indicates that the DeltaDoc contained all the information in the commit message.
Example Score = 0.5 (iText Rev 3837)
    Commit message: no need to call clear()
    DeltaDoc: When calling PdfContentByte reset()
        If stateList.isEmpty(), No longer call stateList.clear()
Example Score > 0.5 (Freecol rev 2054)
    Commit message: Commented unused constant
    DeltaDoc: removed field : EuropePanel : int TITLE_FONT_SIZE
Example Score < 0.5 (JabRef Rev 3111)
    Commit message: Fixed bug: content selector for 'editor' field uses ',' instead of 'and' as delimiter.
    DeltaDoc: When calling EntryEditor getExtra()
        If ed.getFieldName().equals("editor")
            call contentSelectors.add(FieldContentSelector)
Results
Results
Results About 89% coverage. DeltaDoc Commit Messages Why? What?
Qualitative Evaluation: "very useful", "highly useful", "would be a great supplement", "definitely a useful supplement", "can help make the logic clear", "often easier to understand", "more accurate", "easy to read", "provides more information"
DeltaDoc Limitations
    Intraprocedural
    Handling of loops not fully precise
    Less precise for large changes
    Does not address reason for change
DeltaDoc Advantages
    Cheap: Can be computed in about a second on average.
    Suitable for quick adoption: Can supplement or replace many existing commit messages.
    Structured: Suitable for search.
    Reliable
Questions? DeltaDoc Commit Messages Why? What?

Editor's Notes

  1. Not a great deal is certain when it comes to software development. But one thing that is certain is …
  2. … change. There are changes to requirements and specifications, work items and bug lists, documentation, and development teams and their management. And of course there are changes to code.
  3. … and there are a lot of changes …
  4. Many open source projects had over 1,000 commits last year. Large projects can have more. Mozilla for example had over 10,000 commits, and about 28,000 new bug reports.
  5. To manage and track all these changes developers use many kinds of tools. Bug databases, work item databases, and of course source control repositories. More and more we see these tools integrated together in systems like IBM’s Jazz and Microsoft’s Team Foundation Server. Clearly managing change is critical …
  6. It’s important for facilitating collaboration between developers, especially when they are geographically distributed. Developers might wish to validate changes, locate and triage defects, or simply understand modifications. It’s important for helping managers understand what’s going on, something that is critical to the success of a project. And ultimately, understanding the history of a project is essential to evaluating its success.
  7. But even with available tools, understanding how a program is changing over time is not a simple proposition. And at its core, the problem is that program source code is difficult to understand. Professional developers spend the majority of their time trying to understand code. Let’s take a look at an example of this …
  8. Suppose I’d like to understand this change. Maybe I’m looking for a bug, or maybe I’m responsible for this particular file and I want to make sure this change, from some other developer, is OK. What we have here is the standard diff output for a change to some Java code, showing exactly which lines changed. The problem is that diff output is not always easy to understand because we only get a slice of the program. Here we can see that the first element of a local array called pageParts is being returned instead of an empty string … but we don’t know when that happens, and it’s also not obvious what pageParts contains, or whether this is a reasonable thing to do. Because this tends to happen often, there is also “context diff”, which prints out some extra lines above and below the changed lines. But in this case pageParts is defined six lines up, so what we probably need to do is use a tool that gives a side-by-side comparison …
  9. The problem is that now we’re back to reading the code, which is the difficult and time-consuming task we want to avoid.
  10. This is one of the reasons we have “commit messages” in version control systems. These are free-form text fields that enable developers to summarize their changes so that others can see what they’ve done without having to puzzle through diffs or read code. In many projects they are required. The problem is that not only do they take time to write, these messages aren’t always as complete and accurate as we would like …
  11. Take a look at a few developer message board posts. These comments allude to the observation that although commit messages are useful, they are also burdensome for developers to create. Take the example on the top right which says … So perhaps what we would really like is a tool that’s automatic like diff, but that produces output that’s more to the point, summarized, and easier to understand, like commit messages.
  12. What I’ll talk about today is a tool that we developed called “DeltaDoc”, which automatically documents code changes in a way that is often more concise and more readable than existing tools like diff. In fact, we’ll show that it’s really more comparable to commit messages. DeltaDoc describes the EFFECT of a change on the runtime behavior of a program. That is to say, “what conditions trigger the change?” and “what are the observable differences in program behavior?” DeltaDoc is based on a combination of symbolic execution and a set of summarization transformations that I’ll describe in a bit. Returning to our previous example …
  13. … by applying DeltaDoc we can see the change in the function’s behavior phrased in terms of the argument, String s. The DeltaDoc tells us that if s is not null and the result of splitting the string on hyphens is not of length 2, then it returns the first item instead of returning an empty string.
  14. Here’s another short example, this time comparing to a human-written commit message which says “Temporarily removed the trade routes from the game menu”. Below that, the DeltaDoc shows that trade route menu items are no longer added to these two menus: buildOrdersMenu and buildViewMenu. Of course, the DeltaDoc doesn’t say that it’s a temporary change, which alludes to the fact that there are really two kinds of information we might document …
  15. In particular we distinguish between information that describes what has changed (for example added function x) from why it was changed (for example, fixed bug number 42) and other information like: “this is temporary.”
  16. Both types of information are important, and a majority of commit messages contain some of both.
  17. The goal of DeltaDoc is to cover as much of the important “what” information as possible, while still being concise. Note that by design, DeltaDoc doesn’t cover all information that can be documented. There are some reasons for that, including that it is INTRA-procedural, it only documents externally visible effects, and it makes some summarizations.
  18. But given that, if the intersection between the information in commit messages and the information in the corresponding DeltaDoc is large (as we hypothesize), then a significant portion of the documentation work done by developers could be avoided with the use of DeltaDoc.
  19. If that’s true then developers could spend more time explaining why the change was made, or they could simply use that time to do other things, like make more changes.
  20. In the rest of this talk I’ll discuss how DeltaDoc works, and how we evaluated it by comparing it directly to commit messages with the help of some human annotators.
  21. First I’ll overview the DeltaDoc algorithm which takes as input two versions of a file and outputs human-readable text describing the change.
  22. The first step is to compute symbolic path predicates for each statement of the program before and after the change.
  23. Then we identify statements that have been added, removed, or have a different predicate, and distill from that documentation of the form: When X, Do Y, instead of Z.
  24. Finally, because our goal is to mimic human-written documentation, we apply a set of transformations which trade off precision for space. For example, we re-arrange terms to make the final documentation more concise.
  25. Our first step is to obtain intraprocedural path predicates, and we do that with symbolic execution. For an example, consider this function, which produces a greeting prefixed by Hello or Bonjour depending on which language the system is using. Choosing between languages is especially relevant since we are in Belgium.
  26. In this example there are just three paths through the function to consider: one where the argument name is null, and then one for each of the language options.
  27. The next step is to execute each path symbolically. For each statement we compute a symbolic description as well as the path predicate, which is the conjunction of all the branch guards.
  28. Here’s the same table for the second path where we note, for example, that in the last row we return “hello” + name if name != null and the system language is English.
  29. And then the third path is just what you’d expect, similar to the second.
  30. And finally, we only keep around information about statements which are relevant to the externally visible behavior. So in this case we just need to remember the return statements.
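The path enumeration described on the last few slides can be sketched roughly as follows. This is only an illustration, not the DeltaDoc implementation: the `If`/`Return` node types, the guard strings, and the value returned on the null path are all assumptions made for the example, since the slide's actual code is not in the transcript.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Return:
    expr: str  # symbolic description of the returned value


@dataclass
class If:
    guard: str          # branch condition, kept as plain text
    then: "Node"        # taken when the guard holds
    otherwise: "Node"   # taken when the guard does not hold


Node = Union[Return, If]


def path_predicates(node: Node, guards: Tuple[str, ...] = ()) -> List[Tuple[str, str]]:
    """Enumerate every path; pair each return with the conjunction of its guards."""
    if isinstance(node, Return):
        predicate = " && ".join(guards) or "true"
        return [(predicate, f"return {node.expr}")]
    negated = f"!({node.guard})"
    return (path_predicates(node.then, guards + (node.guard,))
            + path_predicates(node.otherwise, guards + (negated,)))


# Hypothetical stand-in for the greeting function on the slide.
# The empty-string result on the null path is an assumption.
greet = If("name == null",
           Return('""'),
           If("lang == ENGLISH",
              Return('"hello " + name'),
              Return('"bonjour " + name')))

for predicate, statement in path_predicates(greet):
    print(f"{predicate}  =>  {statement}")
```

Only return statements are recorded here, which mirrors the idea of keeping just the statements that affect externally visible behavior.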
  31. Now if there is a change to the function, like Hello is changed to Hi …
  32. We would do predicate generation just like before …
  33. … and notice that there is a difference in the functional behavior … here the predicate is the same, but Hello has changed to Hi
  34. From that it’s fairly straightforward to distill a documentation string like this one [read]. Now in some cases, this raw documentation is too complicated or verbose to be a good replacement for commit messages …
  35. … so in an effort to better mimic human-written documentation, which is rarely longer than a few lines, we apply a set of potentially precision-reducing transformations. In one such transformation, we drop parts of the predicate that are not relevant to the statements documented. Here, since we’re talking about returning s, we guess that the important part of the path predicate is s != null, so we hide the rest of it.
  36. We also perform changes like this one where we reduce redundancy by re-arranging terms into a more hierarchical structure. Other types of transformations simplify programmatic expressions or make them more readable. There are others as well, so for a more complete description of all the summarization please have a look at the paper.
  37. Now that we’ve seen how deltadoc works, I’ll talk about how we evaluated it in three dimensions … we looked at the size of deltadoc output, the information content, and we used human annotators to qualitatively judge its usefulness.
  38. We looked at a total of 1000 recent revisions to 5 popular open-source Java projects across several domains. In addition to the size of the projects in lines of code we also note the number of active authors, since one of the main reasons for using change documentation is to facilitate collaboration between developers.
  39. This chart compares the size of deltadoc to standard diffs and also commit messages. Our takeaway is that while diffs averaged 35 lines, deltadocs were only about 9 lines, which is a lot closer to human-written commit messages, which are usually only a few lines long and never more than 10 in this sample. While this makes deltadoc potentially suitable for supplementing commit messages in plain text, we can also imagine a tool that can allow developers to interact with the documentation … something that could allow them to drill down on demand to find out what’s going on more precisely but without being overwhelmed with too much detail initially.
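One plausible way to approximate this filtering is to keep only the conjuncts that mention a variable used by the documented statement. The heuristic below is an assumption for illustration; the transcript does not say how DeltaDoc decides relevance, and the helper names and sample conjuncts are hypothetical.

```python
import re
from typing import List, Set

# Words that look like identifiers but are not program variables.
KEYWORDS = {"return", "null", "true", "false"}


def identifiers(text: str) -> Set[str]:
    """Collect identifier-like tokens, ignoring string literals and keywords."""
    without_strings = re.sub(r'"[^"]*"', "", text)
    return set(re.findall(r"[A-Za-z_]\w*", without_strings)) - KEYWORDS


def relevant_conjuncts(conjuncts: List[str], statement: str) -> List[str]:
    """Keep conjuncts sharing a variable with the statement; keep all if none match."""
    used = identifiers(statement)
    kept = [c for c in conjuncts if identifiers(c) & used]
    return kept or conjuncts


# Hypothetical path predicate for a statement that returns s.
predicate = ["s != null", "verbose == true", "count > 0"]
print(relevant_conjuncts(predicate, "return s"))
```

Falling back to the full predicate when nothing matches is a conservative design choice, so that a statement is never documented with an empty condition.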
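The hierarchical re-arrangement can be pictured as factoring out a guard shared by several entries. The sketch below groups entries by their first conjunct only, which is a deliberately naive assumption; the entry format and the output wording are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

Entry = Tuple[List[str], str]  # (conjuncts of the predicate, documented action)


def factor_common_guard(entries: List[Entry]) -> List[str]:
    """Group entries by their first conjunct and print it once per group."""
    groups: Dict[str, List[Entry]] = {}
    for conjuncts, action in entries:
        head, rest = conjuncts[0], conjuncts[1:]
        groups.setdefault(head, []).append((rest, action))

    lines = []
    for head, items in groups.items():  # dicts preserve insertion order
        lines.append(f"When {head}")
        for rest, action in items:
            condition = " && ".join(rest)
            lines.append(f"  If {condition}, {action}" if condition else f"  {action}")
    return lines


# Hypothetical flat documentation with a repeated guard.
flat = [
    (["name != null", "lang == ENGLISH"], 'return "hi " + name'),
    (["name != null", "lang == FRENCH"], 'return "salut " + name'),
]
print("\n".join(factor_common_guard(flat)))
```

The shared guard `name != null` appears once at the top instead of once per line, which is the redundancy reduction the slide describes.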
  40. Returning to our earlier figure, we hypothesized that much of the information contained in commit messages could be found in deltadoc.
  41. In order to test this we need to establish a way to compare the information content in deltadoc to the information contained in commit messages. But because commit messages are natural language and DeltaDoc is not, we need to be a little bit clever to do this scientifically …
  42. DeltaDoc and human-written commit messages aren’t directly comparable, so we introduce a third form, which we call the “relational form,” that represents the information content in each type of documentation. It is in this form that we can make a precise comparison.
  43. Relational form is so-named because it consists of a very specific set of relations between program variables. This makes translating to relational form (even from human-written text) straightforward and it also makes comparing two artifacts in relational form easily quantifiable.
  44. For a quick example, consider comparing the deltadoc “getpriceforbuilding > getowner.getgold” with the human-written message “has an insufficient amount of gold.” In this case the deltadoc is essentially already in relational form, but in the case of the human doc we must manually extract the relation: “something” is greater than gold. After we extract all such relations we manually look for a mapping between them, which ultimately allows us to quantify the difference in information using what we call the “score metric.”
  45. For the details, I refer you to our paper. But at a high level we designed the metric to be tough on deltadoc. In particular, we assume that only relations from commit messages are important, and we don’t reward the deltadoc even when it produces other potentially valuable information. The only way for deltadoc to score better than a commit message is to be more precise. So something like “updated to revision 5” gets slightly more points than “updated revision” or just “minor change.” We used 16 students from a graduate programming language class to validate the comparison. The score can range from 0 to 1, where a score of at least 0.5 indicates that deltadoc contained all of the (what-type) information present in the commit message … that’s the primary goal.
  46. Add another example
  47. This graph gives the average score metric for the changes sampled from each benchmark. With an average greater than 0.5, we conclude that deltadoc contained at least as much information as corresponding commit messages most of the time.
  48. Here’s another view of the same data, which breaks down the distribution a bit, showing how deltadoc is less complete or precise than commit messages only about 10% of the time. For the vast majority of commits, the deltadoc scores 0.5 or better.
  49. So returning to our earlier hypothesis, we find that this intersection is indeed large. But is DeltaDoc useful?
  50. The answer we got from our annotators, who had the chance to read many commit messages and deltadocs, was generally “yes.” In about two thirds of cases they either preferred deltadoc or had no preference in comparison to the existing commit message.
  51. To conclude, like any program analysis deltadoc has both theoretical and practical limitations.
  52. We believe that documentation of this form has a number of advantages. It’s fast and fully automatic, it’s suitable for supplementing commit messages right now, the output is structured – so it’s amenable to search, and it’s both predictable and reliable.