Knowledge Collaboration by Mining Software Repositories

Thomas Zimmermann, Researcher at Microsoft Research

Tom Zimmermann
Saarland University, Saarbrücken, Germany
Guiding developers




Zimmermann, Weissgerber, Diehl, Zeller (TSE 2005)
eROSE suggests further locations.
eROSE prevents incomplete changes.
eROSE is customizable.
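
The idea behind suggesting further locations can be illustrated with a small co-change miner. This is a minimal sketch, not the eROSE implementation: it assumes each commit is available as a set of changed locations, and the names `mine_rules`, `suggest`, `min_support`, and `min_confidence`, as well as the file names, are hypothetical.

```python
from collections import Counter
from itertools import permutations


def mine_rules(transactions, min_support=2, min_confidence=0.5):
    """Return {(a, b): (support, confidence)} for rules 'changed a => also change b'."""
    single = Counter()  # number of transactions that touch each location
    pair = Counter()    # number of transactions that touch both a and b
    for changed in transactions:
        items = set(changed)
        single.update(items)
        pair.update(permutations(sorted(items), 2))
    rules = {}
    for (a, b), support in pair.items():
        confidence = support / single[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


def suggest(rules, changed):
    """Rank locations not yet changed by the best confidence of any matching rule."""
    scores = {}
    for (a, b), (_, confidence) in rules.items():
        if a in changed and b not in changed:
            scores[b] = max(scores.get(b, 0.0), confidence)
    return sorted(scores, key=lambda loc: (-scores[loc], loc))


# Hypothetical history: one set of changed files per commit.
history = [
    {"Parser.java", "Lexer.java"},
    {"Parser.java", "Lexer.java", "README"},
    {"Parser.java", "Ast.java"},
    {"Lexer.java", "Parser.java"},
]
rules = mine_rules(history)
suggestions = suggest(rules, {"Lexer.java"})
```

Raising `min_support` or `min_confidence` trades fewer suggestions for more reliable ones, which is one way such a tool could be made customizable.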
“Indirect” collaboration

[Diagram: direct collaboration between developers, next to a version archive; mining the archive yields hidden knowledge, which supports indirect collaboration.]
Future
#1: Change classification

• Build a classifier from bad changes (e.g., from bug database).
• Predict the quality of a new change.
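
One possible shape of such a classifier is sketched below. It is an illustration under stated assumptions, not a method from the talk: each change is reduced to a bag of tokens, the labels "bad" and "good" are assumed to come from a bug database, and the technique swapped in here is a naive Bayes model with add-one smoothing. The class name `ChangeClassifier` and the training data are hypothetical.

```python
import math
from collections import Counter


class ChangeClassifier:
    """Naive Bayes over the tokens of a change (illustrative only)."""

    LABELS = ("bad", "good")

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.change_counts = Counter()

    def train(self, tokens, label):
        """Record one past change together with its known label."""
        self.token_counts[label].update(tokens)
        self.change_counts[label] += 1

    def predict(self, tokens):
        """Return the more likely label for a new change."""
        vocab = set()
        for counts in self.token_counts.values():
            vocab.update(counts)
        total_changes = sum(self.change_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.LABELS:
            counts = self.token_counts[label]
            total = sum(counts.values())
            # Smoothed log prior plus smoothed log likelihood of each token.
            score = math.log((self.change_counts[label] + 1) / (total_changes + 2))
            for token in tokens:
                score += math.log((counts[token] + 1) / (total + len(vocab) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: tokens taken from past changes.
clf = ChangeClassifier()
clf.train(["lock", "thread", "cache"], "bad")
clf.train(["thread", "null", "cast"], "bad")
clf.train(["doc", "comment", "rename"], "good")
clf.train(["test", "assert", "doc"], "good")

risky = clf.predict(["thread", "lock"])
safe = clf.predict(["doc", "test"])
```

In practice the features would more likely include change metadata such as size, author, or affected files; tokens are used here only to keep the sketch self-contained.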
#2: What should we collect?
• Mining software repositories has relied on
   existing repositories so far.
• Collecting new data (e.g., navigation traces)
   opens new opportunities.
• Software navigation:
   Singer et al. (ICSM 2005), DeLine et al. (VL/HCC 2005)
• Social tagging:
   Storey et al. (TagSea tool)
#3: Mining across projects

• Extend source code search engines with
  mining techniques.
• Large scale mining (129,167 SF projects) and
  large scale collaboration (1,393,250 SF users).
• Usage patterns from Koders.com:
  Xie and Pei (MSR 2006)
Conclusion

• History supports knowledge collaboration.
• Future challenges: granularity and data.
• Mining software repositories @ ASE 2006:
  − Wednesday 4pm: Impact analysis
  − Friday 9am: Management
  − Friday 11am: Mining software repositories