Software Analytics: Towards Software Mining that Matters
Upcoming SlideShare
Loading in...5
×
 

Software Analytics: Towards Software Mining that Matters

on

  • 521 views

A keynote talk given at the 2013 second International Workshop on Software Mining: http://lamda.nju.edu.cn/conf/softwaremining13/speaker.html

A keynote talk given at the 2013 second International Workshop on Software Mining: http://lamda.nju.edu.cn/conf/softwaremining13/speaker.html

Statistics

Views

Total Views
521
Views on SlideShare
521
Embed Views
0

Actions

Likes
2
Downloads
18
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Software Analytics: Towards Software Mining that Matters Software Analytics: Towards Software Mining that Matters Presentation Transcript

    • Software Analytics: Towards Software Mining that Matters Tao Xie University of Illinois at Urbana-Champaign http://www.cs.illinois.edu/homes/taoxie/ taoxie@illinois.edu
    • Should I testreview my? ©A. Hassan
    • ©A. Hassan
    • ©A. Hassan
    • ©A. Hassan
    • Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for datadriven tasks around software and services. [MALETS’11 Zhang et al.]
    • Software Intelligence & Analytics for Software Development http://people.engr.ncsu.edu/txie/publications/foser10-si.pdf http://thomas-zimmermann.com/publications/files/buse-foser-2010.pdf
    • • use Data Exploration and Analysis  Mining Software Repositories (MSR) • for Software Practitioners Beyond Software Developers • obtain Insightful and Actionable info Need get real as well • Analytic Techniques • Producing Impact on Practice
    • Look through your software data ©A. Hassan
    • Look through your software data ©A. Hassan
    • Mine through the data! An international effort to make software repositories actionable http://msrconf.org http://promisedata.org ©A. Hassan
    • Mine through the data! An international effort to make software repositories actionable http://msrconf.org http://promisedata.org ©A. Hassan
    • Promise Data Repository Mine through the data! An international effort to make software repositories actionable http://msrconf.org http://promisedata.org ©A. Hassan
    • Mining Software Repositories (MSR) • Transforms static recordkeeping repositories to active repositories • Makes repository data actionable by uncovering hidden patterns and trends Field logs CVS/SVN Bugzilla Mailinglist Crashes 11 ©A. Hassan
    • Mining Software Repositories (MSR) • Transforms static recordkeeping repositories to active repositories • Makes repository data actionable by uncovering hidden patterns and trends Field logs CVS/SVN Bugzilla Mailinglist Crashes 11 ©A. Hassan
    • Source Control CVS/SVN Bugzilla Mailing lists 12
    • Source Control CVS/SVN Bugzilla Mailing lists Crash Repos Field Logs 12
    • Source Control CVS/SVN Bugzilla Mailing lists Historical Repositories Crash Repos Field Logs 12
    • Source Control CVS/SVN Bugzilla Mailing lists Historical Repositories Crash Repos Field Logs Runtime Repos 12
    • Sourceforge GoogleCode Code Repos Source Control CVS/SVN Bugzilla Mailing lists Historical Repositories Crash Repos Field Logs Runtime Repos 12
    • MSR researchers analyze and cross-link repositories Bugzilla Mailinglist CVS/SVN Crashes ©A. Hassan
    • MSR researchers analyze and cross-link repositories discussions Buggy change & Fixing change fixed bug Bugzilla Mailinglist CVS/SVN Field crashes Crashes ©A. Hassan
    • MSR researchers analyze and cross-link repositories discussions Buggy change & Fixing change fixed bug Bugzilla Mailinglist CVS/SVN Field crashes Crashes New Bug Report ©A. Hassan
    • MSR researchers analyze and cross-link repositories discussions Buggy change & Fixing change fixed bug Bugzilla Mailinglist CVS/SVN Field crashes Crashes New Bug Report Estimate fix effort Mark duplicates Suggest experts and fix! ©A. Hassan
    • • use Data Exploration and Analysis  Mining Software Repositories (MSR) • for Software Practitioners Beyond Software Developers • obtain Insightful and Actionable info Need get real as well • Analytic Techniques • Producing Impact on Practice
    • We continue to help practitioners (esp. developers) ©A. Hassan
    • ©A. Hassan
    • ©A. Hassan
    • ©A. Hassan
    • ©A. Hassan
    • ©A. Hassan
    • ©A. Hassan
    • ©A. Hassan
    • ©A. Hassan
    • ©A. Hassan
    • Detection and Management of Code Clones ©A. Hassan
    • Support Logs Source Code ©A. Hassan
    • ©A. Hassan
    • • use Data Exploration and Analysis  Mining Software Repositories (MSR) • for Software Practitioners Beyond Software Developers • obtain Insightful and Actionable info Need get real as well • Analytic Techniques • Case Studies
    • Predicting Bugs • Studies have shown that most complexity metrics correlate well with LOC! – Graves et al. 2000 on commercial systems – Herraiz et al. 2007 on open source systems • Noteworthy findings: – Previous bugs are good predictors of future bugs – The more a file changes, the more likely it will have bugs in it – Recent changes affect more the bug potential of a file over older changes (weighted time damp models) – Number of developers is of little help in predicting bugs – Hard to generalize bug predictors across projects unless in similar domains [Nagappan, Ball et al. 2006] 23
    • Using Imports in Eclipse to Predict Bugs 71% of files that import compiler packages, had to be fixed later on. import org.eclipse.jdt.internal.compiler.lookup.*; import org.eclipse.jdt.internal.compiler.*; import org.eclipse.jdt.internal.compiler.ast.*; import org.eclipse.jdt.internal.compiler.util.*; ... import org.eclipse.pde.core.*; import org.eclipse.jface.wizard.*; import org.eclipse.ui.*; 14% of all files that import ui packages, had to be fixed later on. [Schröter et al. 06] 24
    • Don’t program on Fridays ;-) Percentage of bug-introducing changes for eclipse [Zimmermann et al. 05] 25
    • Failure is a 4-letter Word [PROMISE’11 Zeller et al.] 26
    • Actionable Alone is not Enough! [PROMISE’11 Zeller et al.] 27
    • Who produces more buggy code? ©A. Hassan
    • Who produces more buggy code? ©A. Hassan
    • • use Data Exploration and Analysis  Mining Software Repositories (MSR) • for Software Practitioners Beyond Software Developers • obtain Insightful and Actionable info Need get real as well • Analytic Techniques • Producing Impact on Practice
    • Analytic Techniques in SE • Association rules and frequent patterns • Classification • Clustering • Text mining/Natural language processing • Visualization More details are at • https://sites.google.com/site/xsoftanalytics/ 30
    • Solution-Driven Problem-Driven Where can I apply X miner? Basic mining algorithms E.g., association rule, frequent itemset mining… Advanced mining algorithms E.g., frequent partial order mining [ESEC/FSE 07] What patterns do we really need? New/adapted mining algorithms E.g., [ICSE 09], [ASE 09] 49
    • Mining  Searching + Mining Traditional approaches Code repositories 1 2 mining patterns Eclipse, Linux, … 50
    • Mining  Searching + Mining Traditional approaches Code repositories 1 Eclipse, Linux, … 2 mining patterns Often lack sufficient relevant data points (Eg. API call sites) 51
    • Mining  Searching + Mining Traditional approaches Code repositories 1 patterns mining 2 Eclipse, Linux, … Often lack sufficient relevant data points (Eg. API call sites) Our new approaches Code repositories 1 2 … Open source code on the web N searching mining patterns Code search engine e.g., 53 53 52
    • Existing approaches produce high % of false positives One major observation: Programmers often write code in different ways for achieving the same task Some ways are more frequent than others S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Existing approaches produce high % of false positives One major observation: Programmers often write code in different ways for achieving the same task Some ways are more frequent than others Frequent ways Infrequent ways S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Existing approaches produce high % of false positives One major observation: Programmers often write code in different ways for achieving the same task Some ways are more frequent than others Frequent ways Infrequent ways mine patterns Mined Patterns S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Existing approaches produce high % of false positives One major observation: Programmers often write code in different ways for achieving the same task Some ways are more frequent than others Frequent ways mine patterns Infrequent ways detect violations Mined Patterns S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Existing approaches produce high % of false positives One major observation: Programmers often write code in different ways for achieving the same task Some ways are more frequent than others Frequent ways mine patterns Infrequent ways detect violations Mined Patterns S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Example: java.util.Iterator.next() Java.util.Iterator.next() throws NoSuchElementException when invoked on a list without any elements Code Sample 1 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } Code Sample 2 PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … } 58 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Example: java.util.Iterator.next() Code Sample 1 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } Code Sample 2 PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … } 59 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Example: java.util.Iterator.next() Code Sample 1 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } Code Sample 2 PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … } 1243 code examples 60 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Example: java.util.Iterator.next() Code Sample 1 Code Sample 2 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … } Sample 1 (1218 / 1243) 1243 code examples 61 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Example: java.util.Iterator.next() Code Sample 1 Code Sample 2 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … } Sample 2 (6/1243) Sample 1 (1218 / 1243) 1243 code examples 62 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Example: java.util.Iterator.next() Code Sample 1 Code Sample 2 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … } Sample 2 (6/1243) Sample 1 (1218 / 1243) 1243 code examples Mined Pattern from existing approaches: “boolean check on return of Iterator.hasNext before Iterator.next” 63 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Example: java.util.Iterator.next() Code Sample 1 Code Sample 2 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … } Sample 2 (6/1243) Sample 1 (1218 / 1243) 1243 code examples Mined Pattern from existing approaches: “boolean check on return of Iterator.hasNext before Iterator.next” 64 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Example: java.util.Iterator.next() Code Sample 1 Code Sample 2 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … } Sample 2 (6/1243) Sample 1 (1218 / 1243) 1243 code examples Mined Pattern from existing approaches: “boolean check on return of Iterator.hasNext before Iterator.next” 65 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Example: java.util.Iterator.next() Code Sample 1 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } Code Sample 2 PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … } S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Example: java.util.Iterator.next() Code Sample 1 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } Code Sample 2 PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … } S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Example: java.util.Iterator.next() Code Sample 1 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } Code Sample 2 PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … }  Require more general patterns (alternative patterns): P1 or P2 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Example: java.util.Iterator.next() Code Sample 1 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } Code Sample 2 PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … }  Require more general patterns (alternative patterns): P1 or P2 P1 : boolean check on return of Iterator.hasNext before Iterator.next S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Example: java.util.Iterator.next() Code Sample 1 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } Code Sample 2 PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … }  Require more general patterns (alternative patterns): P1 or P2 P1 : boolean check on return of Iterator.hasNext before Iterator.next P2 : boolean check on return of ArrayList.size before Iterator.next S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Example: java.util.Iterator.next() Code Sample 1 PrintEntries1(ArrayList<string> entries) { … Iterator it = entries.iterator(); if(it.hasNext()) { string last = (string) it.next(); } … } Code Sample 2 PrintEntries2(ArrayList<string> entries) { … if(entries.size() > 0) { Iterator it = entries.iterator(); string last = (string) it.next(); } … }  Require more general patterns (alternative patterns): P1 or P2 P1 : boolean check on return of Iterator.hasNext before Iterator.next P2 : boolean check on return of ArrayList.size before Iterator.next  Cannot be mined by existing approaches, since alternative P2 is infrequent S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Our Solution: ImMiner Algorithm [ASE 09]  Mines alternative patterns of the form P1 or P2  Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1 72 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Our Solution: ImMiner Algorithm [ASE 09]  Mines alternative patterns of the form P1 or P2  Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1 1243 code examples Sample 2 (6/1243) Sample 1 (1218 / 1243) 73 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Our Solution: ImMiner Algorithm [ASE 09]  Mines alternative patterns of the form P1 or P2  Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1 1243 code examples Sample 2 (6/1243) Sample 1 (1218 / 1243) P2 is infrequent among entire 1243 code examples 74 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Our Solution: ImMiner Algorithm [ASE 09]  Mines alternative patterns of the form P1 or P2  Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1 1243 code examples Sample 2 (6/1243) Sample 1 (1218 / 1243) P2 is frequent among code examples not supporting P1 75 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • Alternative Patterns  ImMiner mines three kinds of alternative patterns of the general form “P1 or P2” Balanced: all alternatives (both P1 and P2) are frequent Imbalanced: some alternatives (P1) are frequent and others are infrequent (P2). Represented as “P1 or P^2” Single: only one alternative 76 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • ImMiner Algorithm  Uses frequent-itemset mining [Burdick et al. ICDE 01] iteratively  An input database with the following APIs for Iterator.next() Input database Mapping of IDs to APIs S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • ImMiner Algorithm: Frequent Alternatives Input database Frequent itemset mining (min_sup 0.5) Frequent item: 1 P1: boolean-check on the return of Iterator.hasNext() before Iterator.next() S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • ImMiner: Infrequent Alternatives of P1 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009. 79
    • ImMiner: Infrequent Alternatives of P1  Split input database into two databases: Positive and Negative S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009. 80
    • ImMiner: Infrequent Alternatives of P1  Split input database into two databases: Positive and Negative Positive database (PSD) S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009. 81
    • ImMiner: Infrequent Alternatives of P1  Split input database into two databases: Positive and Negative Positive database (PSD) Negative database (NSD) S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009. 82
    • ImMiner: Infrequent Alternatives of P1  Split input database into two databases: Positive and Negative Positive database (PSD) Negative database (NSD)  Mine patterns that are frequent in NSD and are infrequent in PSD  Reason: Only such patterns serve as alternatives for P1 S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009. 83
    • ImMiner: Infrequent Alternatives of P1  Split input database into two databases: Positive and Negative Positive database (PSD) Negative database (NSD)  Mine patterns that are frequent in NSD and are infrequent in PSD  Reason: Only such patterns serve as alternatives for P1  Alternative Pattern : P2 “const check on the return of ArrayList.size() before Iterator.next()”  Alattin applies ImMiner algorithm to detect neglected conditions S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009. 84
    • Neglected Conditions  Neglected conditions refer to  Missing conditions that check the arguments or receiver of the API call before the API call  Missing conditions that check the return or receiver of the API call after the API call  One primary reason for many fatal issues  security or buffer-overflow vulnerabilities [Chang et al. ISSTA 07] S.Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
    • • use Data Exploration and Analysis  Mining Software Repositories (MSR) • for Software Practitioners Beyond Software Developers • obtain Insightful and Actionable info Need get real as well • Analytic Techniques • Producing Impact on Practice
    • Machine Learning that Matters [ICML’12 Wagstaff] http://arxiv.org/ftp/arxiv/papers/1206/1206.4656.pdf
    • • Hyper-Focus on Benchmark Data Sets • Hyper-Focus on Abstract Metrics • Lack of Follow-Through [ICML’12 Wagstaff] http://arxiv.org/ftp/arxiv/papers/1206/1206.4656.pdf
    • • Meaningful Evaluation Methods • Involvement of the World Outside ML • Eyes on the Prize [ICML’12 Wagstaff] http://arxiv.org/ftp/arxiv/papers/1206/1206.4656.pdf
    • MSRA Software Analytics Group Utilize data-driven approach to help create highly performing, user friendly, and efficiently developed and operated software and services. Contact: Dongmei Zhang (dongmeiz@microsoft.com) http://research.microsoft.com/groups/sa/
    • MSRA Software Analytics Group Utilize data-driven approach to help create highly performing, user friendly, and efficiently developed and operated software and services. Software Users Software Development Process Software Systems Research Topics Contact: Dongmei Zhang (dongmeiz@microsoft.com) http://research.microsoft.com/groups/sa/
    • MSRA Software Analytics Group Utilize data-driven approach to help create highly performing, user friendly, and efficiently developed and operated software and services. Information Visualization Software Users Software Development Process Analysis Algorithms Software Systems Large-scale Computing Research Topics Technology Pillars Contact: Dongmei Zhang (dongmeiz@microsoft.com) http://research.microsoft.com/groups/sa/
    • MSRA Software Analytics Group Utilize data-driven approach to help create highly performing, user friendly, and efficiently developed and operated software and services. Information Visualization Software Users Software Development Process Analysis Algorithms Software Systems Large-scale Computing Research Topics Technology Pillars Contact: Dongmei Zhang (dongmeiz@microsoft.com) http://research.microsoft.com/groups/sa/
    • MSRA Software Analytics Group Utilize data-driven approach to help create highly performing, user friendly, and efficiently developed and operated software and services. Information Visualization Software Users Software Vertical Development Process Analysis Algorithms Horizontal Software Systems Large-scale Computing Research Topics Technology Pillars Contact: Dongmei Zhang (dongmeiz@microsoft.com) http://research.microsoft.com/groups/sa/
    • Software Analytics in Practice
    • Adoption Challenges for Software Analytics Must show value before data quality improves Correlation vs. Causation
    • ICSE Papers: Industry vs. Academia Source© Carlo Ghezzi
    • ICSE Papers: Industry vs. Academia OSDI 2008 26% vs. xSE ?% Developers, Programmers, Architects Among All Attendees Source© Carlo Ghezzi
    • ICSE Papers: Industry vs. Academia OSDI 2008 26% vs. xSE ?% Developers, Programmers, Architects Among All Attendees MSR 11 Keynote ICSE 09 Keynote Source© Carlo Ghezzi MSR 12 Keynote ICSM 11 Keynote SCAM 12 Keynote
    • "Are Automated Debugging [Research] Techniques Actually Helping Programmers?" • 50 years of automated debugging research – N papers  only 5 evaluated with actual programmers “ ” [ISSTA11 Parnin&Orso]
    • Are Regression Testing [Research] Techniques Actually Helping Industry? • Likely most studied testing problems – N papers “ ” [STVR11 Yoo&Harman]
    • Are [Some] Failure-Proneness Prediction [Research] Techniques Actually Helping? • Empirical software engineering (on prediction) – N papers ” [PROMISE11 Zeller et al.]
    • A Researcher's Observation in HCI Research Community • “The reviewers simply do not value the difficulty of building real systems and how hard controlled studies are to run on real systems for real tasks. This is in contrast with how easy it is to build new interaction techniques and then to run tight, controlled studies on these new techniques with small, artificial tasks” “I give up on CHI/UIST” by James Landay http://dubfuture.blogspot.com/2009/11/i-give-up-on-chiuist.html Source©J. Landay
    • A Researcher's Observation in HCI Research Community • “This attitude is a joke and it offers researchers no incentive to do systems work. Why should they? Why should we put 3-4 person years into every CHI publication? Instead we can do 8 weeks of work on an idea piece or create a new interaction technique and test it tightly in 8-12 weeks and get a full CHI paper.” “I give up on CHI/UIST” by James Landay http://dubfuture.blogspot.com/2009/11/i-give-up-on-chiuist.html Source©J. Landay
    • A Researcher's Observation in HCI Research Community • “When will this community wake up and understand that they are going to run out any work on creating new systems (rather than small pieces of systems) and cede that important endeavor to industry?” • “We are our own worst enemies. I think we have been blinded by the perception that "true scientific" research is only found in controlled experiments and nice statistics.” “I give up on CHI/UIST” by James Landay http://dubfuture.blogspot.com/2009/11/i-give-up-on-chiuist.html Source©J. Landay
    • A Researcher's Observation in HCI Research Community • “When will this community wake up and understand that they are going to run out any work on creating new systems (rather than Does small pieces of systems) and cede that our research community important endeavor to industry?” have similar issues?? • “We are our own worst enemies. I think we have been blinded by the perception that "true scientific" research is only found in controlled experiments and nice statistics.” “I give up on CHI/UIST” by James Landay http://dubfuture.blogspot.com/2009/11/i-give-up-on-chiuist.html Source©J. Landay
    • MS Academic Search: “Pointer Analysis”
    • “Pointer Analysis: Haven’t We Solved This Problem Yet?” [Hind PASTE’01] “During the past 21 years, over 75 papers and 9 Ph.D. theses have been published on pointer analysis. Given the tones of work on this topic one may wonder, “Haven't we solved this problem yet?'' With input from many researchers in the field, this paper describes issues related to pointer analysis and remaining open problems.” Michael Hind. Pointer analysis: haven't we solved this problem yet?. In Proc. ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE 2001) Source©M. Hind 58
    • “Pointer Analysis: Haven’t We Solved This Problem Yet?” [Hind PASTE’01] Section 4.3 Designing an Analysis for a Client’s Needs “Barbara Ryder expands on this topic: “… We can all write an unbounded number of papers that compare different pointer analysis approximations in the abstract. However, this does not accomplish the key goal, which is to design and engineer pointer analyses that are useful for solving real software problems for realistic programs.” Michael Hind. Pointer analysis: haven't we solved this problem yet?. In Proc. ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE 2001) Source©M. Hind&B. Ryder 59
    • MS Academic Search: “Clone Detection”
    • MS Academic Search: “Clone Detection” Typically focus/evaluate on intermediate steps (e.g., clone detection) instead of ultimate tasks (e.g., bug detection or refactoring), even when the field already grows mature with n years of efforts on intermediate steps
    • Some Success Stories of Applying Clone Detection [Focus on Ultimate Tasks] MSRA XIAO Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO: Tuning Code Clones at Hands of Engineers in Practice. In Proc. ACSAC 2012, http://research.microsoft.com/en-us/groups/sa/ Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In Proc. OSDI 2004. http://patterninsight.com/ http://www.blackducksoftware.com/ 61
    • Suggested Actions  Tech Adoption • • • • Get research problems from real practice Get feedback from real practice Collaborate across disciplines Collaborate with industry
    • •Software Analytics Data Exploration and Analysis For Software Practitioners Obtain Insightful and Actionable info With Analytic Techniques • Producing Impact on Practice
    • Acknowledgments • Microsoft Research Asia Software Analytics Group • Ahmed Hassan, Lin Tan, Jian Pei • Many other colleagues 64
    • Q&A
    • •Software Analytics Data Exploration and Analysis For Software Practitioners Obtain Insightful and Actionable info With Analytic Techniques • Producing Impact on Practice