Towards Mining Software Repositories Research that Matters

Tao Xie
Department of Computer Science
University of Illinois at Urbana-Champaign, USA
taoxie@illinois.edu
Machine Learning that Matters
“The basic argument in her paper is that machine learning
might be in danger of losing its impact because the
community as a whole has become quite self-referential.
People are probably solving real-world problems using ML
methods, but there is little sharing of these results within
the community. Instead, people focus on existing
benchmarks which might have originally had some
connection to real-world problems which has been long
forgotten, however.”
“She proposes a number of tasks like $100M solved
through ML based decision making or a human life saved
through a diagnosis or an intervention recommended by
an ML system to get ML back on track.”
ICML’12
http://icml.cc/2012/papers/298.pdf
http://blog.mikiobraun.de/2012/06/is-machine-learning-losing-impact.html
Redwine and Riddle Study (1985)
• From idea to “the point it can be popularized and
disseminated to the technical community at
large”
– Worst case: 23 years
– Best case: 11 years
– Mean: 17 years
• 7.5 years from developed technology to wide
availability
Source: © S. L. Pfleeger
Sam Redwine Jr., William Riddle: Software Technology Maturation, In Proc. ICSE 1985.
Technology Maturation: Middleware
Source: © A. Wolf, http://www.sigsoft.org/impact/docs/ImpactWolfBCS2008.pdf
15-20 years between first
publication of an idea and
widespread availability in
products
Shall we just stay in our comfort
zone and wait 15-20 years for
our research to produce (or not
produce) practice impact??
How about the research that we
did 15-20 years ago??
[Caveat: don’t forget the need for
long-term/blue-sky research!!]
2012 NSF Workshop on Formal Methods
• Goal: to identify the future directions in research in
formal methods and its transition to industrial
practice.
• Success examples mentioned by the attendees
– SLAM/SDV
– ASTRÉE
– SMT-based tools
– …
http://goto.ucsd.edu/~rjhala/NSFWorkshop/
“What Happened to the Promise
of Software Tools?” – Jim Larus
http://www.srl.inf.ethz.ch/workshop2014/eth-larus.pdf
https://www.youtube.com/watch?v=kO9OYnkeRTM
Impacts, Impacts, Impacts, …
Image source: http://engage.synecoretech.com/marketing-technology-for-growth/bid/155279/How-Online-Content-Impacts-Your-Social-Media-Marketing-Strategy
Research Impacts
SIGSOFT Impact Paper Awards, ICSE MIP awards, …
…
Practice Impacts ACM Software System Awards
31 Awardees
http://awards.acm.org/software_system/
• Development Environments/Tools
– 2013: Coq
– 2012: LLVM
– 2011: Eclipse
– 2007: Statemate
– 2006: Eiffel
– 2005: The Boyer-Moore Theorem Prover (ACL2)
– 2003: MAKE
– 2001: SPIN
– 1992: Interlisp
• Languages
– 2002: Java
– 1998: The S System (R statistical analysis)
– 1997: Tcl/Tk
– 1987: SMALLTALK
2012 LLVM born at Illinois
• The openness of the LLVM technology and the quality of its
architecture and engineering design are key factors in
understanding the success it has had both in academia and
industry
Vikram Adve Chris Lattner Evan Cheng
http://llvm.org/
Practice Impacts commercialization/industrial adoption
…
SAGE
ASTRÉE
Statechart
SPIN
Moles
Microsoft Research
…
…
Practice Impacts
research publications → industrial adoption done by others
…
• ICSE 00 Daikon paper by Ernst et al. → Agitar Agitator
– https://homes.cs.washington.edu/~mernst/pubs/invariants-relevance-icse2000.pdf
• ASE 04 Rostra paper by Xie et al. → Parasoft Jtest improvement
– http://web.engr.illinois.edu/~taoxie/publications/ase04.pdf
• PLDI/FSE 05 DART/CUTE papers by Sen et al. → MSR SAGE, Pex
– http://srl.cs.berkeley.edu/~ksen/papers/dart.pdf
– http://srl.cs.berkeley.edu/~ksen/papers/C159-sen.pdf
HOW???
• Are these impact goals too far from you?
• Can you plan for that?
• What if you are in a university research
group?
• …
(How) Can A University Group Do It?
• Aim for research impacts more commonly
– but sometimes/often may not be predicted well,
e.g., Whyper [USENIX SEC 13] http://web.engr.illinois.edu/~taoxie/publications/usenixsec13-whyper.pdf
• Start a startup
– but desirable to have the right people (e.g., former students) to start it
– but software engineering tools may not sell well
• Collaborate with industrial research labs
– but many research lab projects may look like university projects
• Collaborate with industrial product groups
– but many problems faced by product groups may not be “researchy”
• At least focus on problems that matter (now or future)!
(How) Can A University Group Do It? (cont.)
• Need to balance/unify producing great
students vs./and great (high practice-impact)
research
http://www.cs.washington.edu/people/faculty/notkin/students
Experience Reports on Successful Tool Transfer
• Nikolai Tillmann, Jonathan de Halleux, and Tao Xie. Transferring an Automated Test
Generation Tool to Practice: From Pex to Fakes and Code Digger. In Proceedings of ASE
2014, Experience Papers.
http://web.engr.illinois.edu/~taoxie/publications/ase14-pexexperiences.pdf
• Jian-Guang Lou, Qingwei Lin, Rui Ding, Qiang Fu, Dongmei Zhang, and Tao Xie. Software
Analytics for Incident Management of Online Services: An Experience Report. In
Proceedings of ASE 2013, Experience Papers.
http://web.engr.illinois.edu/~taoxie/publications/ase13-sas.pdf
• Dongmei Zhang, Shi Han, Yingnong Dang, Jian-Guang Lou, Haidong Zhang, and Tao Xie.
Software Analytics in Practice. IEEE Software, Special Issue on the Many Faces of Software
Analytics, 2013. http://web.engr.illinois.edu/~taoxie/publications/ieeesoft13-softanalytics.pdf
• Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO:
Tuning Code Clones at Hands of Engineers in Practice. In Proceedings of ACSAC 2012.
http://web.engr.illinois.edu/~taoxie/publications/acsac12-xiao.pdf
Q & A
http://www.cs.illinois.edu/homes/taoxie/
Contact: taoxie@illinois.edu
Supported in part by a Microsoft Research Award, NSF grants CCF-1349666, CNS-1434582, CCF-1434596,
CCF-1434590, CNS-1439481, and the USA National Security Agency (NSA) Science of Security Lablet.
Discussion
Discussion Topics: HOW???
• Are these impact goals too far from you?
• Can you plan for that?
• What if you are in a university research
group?
• …