Advancing Foundation and Practice of Software Analytics

Vision Statement Presentation on "Advancing Foundation & Practice of Software Analytics" at the 2nd International NSF sponsored Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE 2013) http://promisedata.org/raise/2013/

  • http://stackoverflow.com/questions/3165483/why-pex-is-not-massive
  • Advancing Foundation and Practice of Software Analytics

    1. Tao Xie, North Carolina State University, with Dongmei Zhang (Microsoft Research Asia), Xusheng Xiao (North Carolina State University), and Chunhua Weng (Columbia University). RAISE 2013
    2. Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services. Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie. Software Analytics as a Learning Case in Practice: Approaches and Experiences. In Proc. MALETS 2011. MSRA Software Analytics group founded in May 2009. Term coined/defined expanding scope of previous work [Buse and Zimmermann, FoSER 10] [Hassan and Xie, FoSER 10]. http://research.microsoft.com/en-us/groups/sa/ http://research.microsoft.com/en-us/news/features/softwareanalytics-052013.aspx
    3. ICSE 2013. Research Topics, Technology Pillars, Target Audience, Connection to Practice, Output
    4. Software Users, Software Development Process, Software System. • Covering different areas of software domain • Throughout entire development cycle • Enabling practitioners to obtain insights
    5. Runtime traces, program logs, system events, perf counters, …; usage log, user surveys, online forum posts, blog & Twitter, …; source code, bug history, check-in history, test cases, …
    6. Developer, Tester, Program Manager, Usability engineer, Designer, Support engineer, Management personnel, Operation engineer
    7. • Conveys meaningful and useful understanding or knowledge towards completing the target task • Not easily attainable via directly investigating raw data without the aid of analytics technologies • Going from correlation to causality • Examples: • It is easy to count the number of re-opened bugs, but how to find out the primary reasons for these re-opened bugs? • When the availability of an online service drops below a threshold, how to localize the problem?
    8. • Enables software practitioners to come up with concrete solutions towards completing the target task • Examples: • Why were bugs re-opened? ▪ A list of bug groups, each with the same reason for re-opening • Why did the availability of online services drop? ▪ A list of problematic areas with associated confidence values • Which part of my code should be refactored? ▪ A list of cloned code snippets easily explored from different perspectives
    9. Vertical, Horizontal. Information Visualization, Data Analysis Algorithms, Large-scale Computing. Software Users, Software Development Process, Software System
    10. • Software Analytics is naturally tied with software development practice • Getting real: Real Data, Real Problems, Real Users, Real Tools
    11. StackMine [Han et al. ICSE 12]. Diagram labels: Pattern Matching, Bug update, Problematic Pattern Repository, Bug Database, Trace analysis, Bug filing, Trace Storage, Trace collection, Internet. How many issues are still unknown? Which trace file should I investigate first? Key to issue discovery. Bottleneck of scalability. Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012
    12. “We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost.” - from Development Manager in Windows. Highly effective new issue discovery on Windows mini-hang. Continuous impact on future Windows versions. Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012
    13. Dual Ends of the Road. Foundation: Science of Software Analytics? From correlation to causality. Practice: Software Analytics. From pieces to a whole. Bring human in the loop. Make real impact in practice. Foundation, Practice
    14. Choose random system component, Find vulnerability, Suggest defense, Analyze security or test performance. Are we making progress? Positive aspect: most security research addresses real problems. @J. Mitchell
    15. Systematization of Knowledge: An organized body of knowledge gained through research. Ad hoc point solutions vs. general understanding. Repeating failures of the past with each new platform, type of vulnerability. Scientific Method: System of acquiring knowledge based on the scientific method. Process of hypothesis testing and experiments. Building abstractions and models, theorems. Universal Laws: Laws or theories that are predictive. Widely applicable. Make strong, quantitative predictions. @D. Evans, J. Mitchell
    16. Percentage of bug-introducing changes for Eclipse. Don’t program on Fridays ;-) [Zimmermann et al. 05]
    17. Failure is a 4-letter Word [PROMISE’11 Zeller et al.]
    18. From Correlation to Causality. Analytic techniques are often used for applications that emphasize results over causation of the findings. Users may choose to act on the behavior without focus on understanding it (or its causation) provided that the pattern has a high empirical probability of correctly identifying an issue. E.g., smuggling, traveling with false documents, or predicting winning stock. @L. Williams, M. Rappa
    19. From Correlation to Causality cont. Analytic techniques are often not used to support the identification and advancement of fundamental scientific principles based upon an analysis of causation. Emphasize the use of analytics to advance science (e.g., producing insights) besides the use of analytics in providing just observations. @L. Williams, M. Rappa
    20. Open Questions. How much science of a field (e.g., software analytics)? A field may be a means/solution in contrast to a problem domain like “security”, “design”. How can analytics/AI be used to help build science of “X”? How to move a field to a foundational level? How to balance foundation and practice?
    21. Dual Ends of the Road. Foundation, Practice. Foundation: Science of Software Analytics? From correlation to causality. Practice: Software Analytics. From pieces to a whole. Bring human in the loop. Make real impact in practice
    22. Download counts, initial 20 months of release. Academic: 17,366. Industrial: 13,022. Total: 30,388. Released since 2008
    23. Interesting results vs. Actionable results. Problem hunting vs. Problem driven
    24. Open Questions. Who should bring software analytics research results to the hands of practitioners? How to do so?
    25. Dual Ends of the Road. Foundation, Practice. Foundation: Science of Software Analytics? From correlation to causality. Practice: Software Analytics. From pieces to a whole. Bring human in the loop. Make real impact in practice
    26. Questions? https://sites.google.com/site/asergrp/ taoxie@gmail.com; NSF grants CCF-0845272, CCF-0915400, CNS-0958235, ARO grant W911NF-08-1-0443, an NSA Science of Security Lablet grant, a NIST grant, a 2011 Microsoft Research SEIF Award
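
The StackMine slide mentions matching incoming traces against a repository of problematic patterns and asks how many issues are still unknown. A minimal Python sketch of that general idea follows. It is not StackMine's actual algorithm: the pattern names, the function names in the stacks, and the rule that a pattern matches when its frames appear in order within a call stack are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical pattern repository (an assumption, not from the slides):
# each known issue is a list of function names that must appear,
# in this order, somewhere in a call stack.
PATTERNS = {
    "lock-contention": ["AcquireLock", "WaitForObject"],
    "disk-io-stall": ["ReadFile", "WaitForObject"],
}


def matches(stack, pattern):
    """Return True if pattern occurs as an ordered subsequence of stack."""
    frames = iter(stack)
    # `name in frames` advances the iterator, so order is enforced.
    return all(name in frames for name in pattern)


def classify(traces, patterns=PATTERNS):
    """Count traces per first matching pattern; the rest are 'unknown'."""
    counts = Counter()
    for stack in traces:
        label = next(
            (key for key, pat in patterns.items() if matches(stack, pat)),
            "unknown",
        )
        counts[label] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ["Main", "AcquireLock", "Spin", "WaitForObject"],
        ["Main", "ReadFile", "WaitForObject"],
        ["Main", "Render"],
    ]
    print(classify(sample))
```

The size of the `unknown` bucket gives a rough answer to "how many issues are still unknown?", and traces in that bucket are natural candidates to investigate first.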
