Making Sense of Online
Code Snippets
Siddharth Subramanian, Reid Holmes
University of Waterloo
Traditional Code Search
Indexes millions of random code snippets from the internet
[Diagram: Public Code on the Internet, crawler, Traditional Code Search]
MAKING SENSE OF ONLINE CODE SNIPPETS - SIDDHARTH SUBRAMANIAN, REID HOLMES
MSR '13
Curated Code Search
Indexes a limited set of good quality code snippets
[Diagram: Public Code on the Internet, crawler, Traditional Code Search, Curated Code Search]
Code Search Challenges
chrono - Type unknown!
run() - 20 different methods
    java.util.TimerTask.run()
    android.os.HandlerThread.run()
    …
start() - 26 different methods
    android.media.MediaPlayer.start()
    android.animation.Animator.start()
    …
Code Search Challenges
PROBLEMS WITH LEXICAL SEARCH
Code is treated as plain-text
Underlying API linkage is lost
Method name collisions
Code Search Challenges
chrono - android.widget.Chronometer
run() - java.lang.Runnable.run()
start() - android.widget.Chronometer.start()
Code Search Challenges
PROBLEMS WITH LEXICAL SEARCH
Code is treated as plain-text
Underlying API linkage is lost
Method name collisions
PROBLEMS WITH PARSING CODE
Code snippets are often incomplete
Missing class declarations
Missing method declarations
Incomplete code fragments
Good morning everybody. I'm Siddharth Subramanian, and I'm here to present our submission to the MSR Challenge, work done in collaboration with Reid Holmes at the University of Waterloo. We built a system that helps developers find API usage examples on the internet. We do so by extracting the structural information hidden in code snippets and using it to guide code search and to construct a repository of curated source code examples from Stack Overflow.
Developers frequently reuse source code or search for examples to learn a new API. In the process, they often turn to websites like Google Code or Krugle. How do these code search engines work? They index millions of code snippets that are publicly available on the internet. However, many of these snippets are of poor quality, and there is no assurance that they actually work. To overcome this issue, we built a curated code search engine that searches through the code snippets in accepted answers on Stack Overflow. This way, developers can search for code examples and have some guarantee that the results they get actually work.
However, lexically searching through source code is lossy. Consider the following code snippet from a post on Stack Overflow. The type declaration of the chrono object is missing, so we do not know which of the Android API's run(), setBase() and start() methods are being called. The Android API has 20 different methods named run() and 26 named start(), and it is not clear which ones are being called in this context. However, by parsing and analysing the code snippet, we can infer that the chrono object is of type android.widget.Chronometer, that the run() method being overridden is java.lang.Runnable.run(), and that start() is android.widget.Chronometer.start(). But since Stack Overflow posts contain code snippets rather than complete programs, parsing them is difficult.
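To make the ambiguity concrete, here is a minimal sketch in Java. It is not the tool's implementation: the class name `MethodLookupSketch`, its two helper methods, and the six-entry method list are assumptions made for illustration, and the list is only a tiny slice built from the method names mentioned in this talk, not the real Android API. It contrasts a lookup by bare method name with a lookup that also knows the receiver type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: a tiny hand-written slice of an API index.
class MethodLookupSketch {
    // Fully qualified method names taken from the examples in this talk.
    static final List<String> API_METHODS = Arrays.asList(
        "java.util.TimerTask.run",
        "android.os.HandlerThread.run",
        "java.lang.Runnable.run",
        "android.media.MediaPlayer.start",
        "android.animation.Animator.start",
        "android.widget.Chronometer.start");

    // Lexical lookup: only the bare method name is available.
    static List<String> lexicalMatches(String methodName) {
        List<String> matches = new ArrayList<>();
        for (String fqn : API_METHODS) {
            if (fqn.endsWith("." + methodName)) {
                matches.add(fqn);
            }
        }
        return matches;
    }

    // Structural lookup: the receiver type is known, so the name can be qualified.
    static List<String> typedMatches(String receiverType, String methodName) {
        List<String> matches = new ArrayList<>();
        String wanted = receiverType + "." + methodName;
        for (String fqn : API_METHODS) {
            if (fqn.equals(wanted)) {
                matches.add(fqn);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Several candidates when only the name "start" is known.
        System.out.println(lexicalMatches("start"));
        // A single candidate once the type of chrono is known.
        System.out.println(typedMatches("android.widget.Chronometer", "start"));
    }
}
```

A real resolver would also have to handle inherited and overridden methods, which this exact-match sketch ignores.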
We built a tool called snipparse that can parse incomplete Java source code snippets and extract structural information from them. We ran this tool on the posts about the Android framework, since previous research by Parnin and others has shown that Stack Overflow discussions cover a significant portion of the Android API. We used the results to build a curated source code repository in which code is indexed by the types and methods it uses. This repository is accessible through a web interface called codehunter that allows users to search for precise API usage examples.
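Indexing code by the types and methods it uses can be pictured as an inverted index from fully qualified API elements to snippets. The sketch below is only an assumption about what such an index might look like: the class `ApiUsageIndexSketch`, its methods, and the snippet identifiers are hypothetical and are not taken from snipparse or codehunter.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of an inverted index keyed by API element.
class ApiUsageIndexSketch {
    // Maps a fully qualified type or method name to the ids of snippets that use it.
    private final Map<String, Set<String>> snippetsByApiElement = new HashMap<>();

    // Records that a snippet uses each of the given API elements.
    void add(String snippetId, String... apiElements) {
        for (String element : apiElements) {
            snippetsByApiElement
                .computeIfAbsent(element, key -> new TreeSet<>())
                .add(snippetId);
        }
    }

    // Returns the ids of snippets that use exactly this API element (empty if none).
    Set<String> search(String apiElement) {
        return snippetsByApiElement.getOrDefault(apiElement, Collections.emptySet());
    }
}
```

With an index like this, a query for android.widget.Chronometer.start returns only snippets resolved to that method, rather than every snippet that happens to contain the text start().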
To summarize, we built a tool that can identify structy