Live API Documentation
1. Live API Documentation
Subramanian, S., Inozemtseva, L., & Holmes, R. (2014, May). Live API documentation.
In ICSE (pp. 643-652).
Presenter: Hossein Mobasher
Course: Software Evolution
3. Introduction
• APIs enable complex functionality to be used by client programs
• Understanding how to use an API can be difficult
• API documentation is often insufficient on its own.
• Writing documentation and keeping it up to date is very difficult
• Developers ignore the documentation that does exist and declare that “code
is king”
4. Introduction (continued)
• Online sites fill the gap between traditional API documentation and
more example-based resources
• StackOverflow
• GitHub Gists
• Unfortunately, these two important classes of documentation are
independent
• Baker links source code examples to API documentation
5. Previous work
• Previous approaches identify source code references within non-code resources.
• These approaches have several limitations:
• Some systems explicitly ignored external references.
• Others only returned partially qualified names (PQNs),
which are insufficient for documentation linking.
• None of them worked for dynamically typed languages.
6. What’s new?
• A constraint-based, iterative approach for determining the fully
qualified names of code elements in source code snippets.
• A prototype tool that implements this approach and uses the results
to automatically create bidirectional links between documentation
and source code examples.
7. Scenario 1
• Code snippet posted to StackOverflow to assist a developer who
didn’t understand how to manipulate the state of History objects.
• Baker can uniquely link bolded elements to the API.
• These are the elements for which it can determine a fully qualified name (FQN).
8. Scenario 2
• Code snippet from a developer who is trying to build a web app that can
take a photo and inject it into an element in an HTML document.
• Baker can also identify the API that the bolded elements come from.
9. Oracle Generation
• Baker’s oracle is key to its success.
• It is generally impossible to identify the FQNs of the code elements in a
snippet from the snippet alone.
• FQNs are essential to documentation linking tasks.
• The oracles are implemented as web services.
• This allows the Java and JS oracles to be updated dynamically by any user or program.
10. Oracle Generation (continued)
• Java Oracle
• Contains class, method, and field signatures.
• Uses Neo4j to represent the hierarchies between code elements that an
object-oriented language like Java offers.
• Oracle includes full type information in the database.
• The types of classes, fields, return types, and parameters.
• The Java oracle can be dynamically updated by adding an appropriate JAR file.
11. Oracle Generation (continued)
• JS Oracle
• Built by statically analyzing the source files of the libraries to be included.
• Uses Esprima to parse the source code of each library.
• Esprima returns a JSON representation of the AST.
• The JS oracle identifies all of the ‘FunctionExpression’ and ‘FunctionDeclaration’
nodes.
• Parsing problems:
• JavaScript is a dynamically typed language, so it is difficult to identify all method declarations
by static analysis of source code.
• JavaScript is not annotated with visibility (e.g. public and private).
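The node-collection step can be illustrated with a short sketch. It does not call Esprima itself: `SAMPLE_AST` is a hand-written fragment in the JSON shape that Esprima returns, and the names `iter_nodes` and `function_nodes` are hypothetical helpers, not part of Baker.

```python
from typing import Any, Iterator, List

# Hand-written AST fragment in the JSON shape that Esprima returns.
# It corresponds roughly to:
#   function takePhoto() {}
#   var show = function () {};
SAMPLE_AST = {
    "type": "Program",
    "body": [
        {
            "type": "FunctionDeclaration",
            "id": {"type": "Identifier", "name": "takePhoto"},
            "params": [],
            "body": {"type": "BlockStatement", "body": []},
        },
        {
            "type": "VariableDeclaration",
            "kind": "var",
            "declarations": [
                {
                    "type": "VariableDeclarator",
                    "id": {"type": "Identifier", "name": "show"},
                    "init": {
                        "type": "FunctionExpression",
                        "id": None,
                        "params": [],
                        "body": {"type": "BlockStatement", "body": []},
                    },
                }
            ],
        },
    ],
}

FUNCTION_TYPES = {"FunctionDeclaration", "FunctionExpression"}


def iter_nodes(node: Any) -> Iterator[dict]:
    """Yield every AST node (a dict with a "type" key), depth first."""
    if isinstance(node, dict):
        if "type" in node:
            yield node
        for child in node.values():
            yield from iter_nodes(child)
    elif isinstance(node, list):
        for child in node:
            yield from iter_nodes(child)


def function_nodes(ast: dict) -> List[dict]:
    """Collect the two node kinds the JS oracle is built from."""
    return [n for n in iter_nodes(ast) if n["type"] in FUNCTION_TYPES]


found = function_nodes(SAMPLE_AST)
print([n["type"] for n in found])
```

Note that the function expression has no `id` of its own; its name lives on the enclosing declarator, which is one reason static identification of all declarations is hard.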
12. Problems
• Parsing code snippets is more difficult than parsing full files.
• Code snippets can be ambiguous.
• Kinds of ambiguity:
• Declaration Ambiguity
• External Reference Ambiguity
13. Approach
• Deductive Linking
• Handles declaration and external reference ambiguity.
• The goal is to identify the sole FQN that a given identifier can represent.
• Generates an AST for each code snippet.
• Uses information from the oracle to deduce facts about the AST.
14. Example (JavaBaker)
• History: 58 candidate types are recorded for History in the oracle.
• addHistoryListener: Test the 58 candidate types to see which ones contain a
method called addHistoryListener(…) that takes a single object parameter.
• This results in 4 candidate methods.
• The History node is also updated (reduced from 58 to 4).
• History.getToken: Test the 4 candidates; reduced from 4 to 2.
• …
• At the end, Baker iterates again.
• Baker can identify History as
com.google.gwt.user.client.History
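The narrowing in this example can be sketched as a small constraint filter. This is a simplified illustration, not Baker's implementation: it handles a single identifier, `ORACLE` is a toy in-memory dictionary rather than the web-service oracle, and every type other than com.google.gwt.user.client.History is hypothetical. The helper names `candidates_for` and `narrow` are also assumptions.

```python
from typing import Dict, List, Set, Tuple

# Toy oracle: FQN -> set of (method name, parameter count).
# Only com.google.gwt.user.client.History comes from the slides; the other
# entries are hypothetical and exist only to give the filter something to prune.
ORACLE: Dict[str, Set[Tuple[str, int]]] = {
    "com.google.gwt.user.client.History": {("addHistoryListener", 1), ("getToken", 0)},
    "example.browser.History": {("addHistoryListener", 1), ("back", 0)},
    "example.audit.History": {("getToken", 0)},
    "example.shell.History": {("clear", 0)},
}


def candidates_for(simple_name: str) -> Set[str]:
    """All oracle types whose simple name matches the identifier."""
    return {fqn for fqn in ORACLE if fqn.rsplit(".", 1)[-1] == simple_name}


def narrow(candidates: Set[str], calls: List[Tuple[str, int]]) -> Set[str]:
    """Apply each observed call as a constraint until nothing changes."""
    changed = True
    while changed:
        changed = False
        for call in calls:
            kept = {fqn for fqn in candidates if call in ORACLE[fqn]}
            if kept != candidates:
                candidates, changed = kept, True
    return candidates


# Calls seen on History in the snippet: (method name, number of arguments).
observed_calls = [("addHistoryListener", 1), ("getToken", 0)]
result = narrow(candidates_for("History"), observed_calls)
print(result)
```

With one identifier the loop settles after a single pass; the repeated iteration matters once narrowing one node lets other nodes in the AST be narrowed in turn.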
15. Example (JSBaker)
• $(…).on(…)
• $ matches only with jQuery.
• on matches with jQuery’s on method (even though there are three on
methods in the oracle).
• useGetPicture
• The oracle doesn’t contain a result for it.
• Baker records this function as locally
defined, rather than as an external function.
• …
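A minimal sketch of the two decisions in this example follows. It assumes that `on` is disambiguated by the library of its receiver `$`, which the slides imply but do not spell out; `JS_ORACLE` is a toy dictionary, and `libraryA` and `libraryB` are placeholder names for the two other libraries that define `on`.

```python
from typing import Dict, List

# Toy JS oracle: function name -> libraries that define it.
# The slides say `$` matches only jQuery and that the oracle holds three `on`
# methods; the two non-jQuery library names are placeholders.
JS_ORACLE: Dict[str, List[str]] = {
    "$": ["jQuery"],
    "on": ["jQuery", "libraryA", "libraryB"],
}


def resolve_chain(receiver: str, method: str) -> List[str]:
    """Candidates for `method`, given that it is called on `receiver(...)`."""
    receiver_libs = set(JS_ORACLE.get(receiver, []))
    return [lib for lib in JS_ORACLE.get(method, []) if lib in receiver_libs]


def classify(name: str) -> str:
    """Names missing from the oracle are treated as locally defined."""
    return "external" if name in JS_ORACLE else "local"


print(resolve_chain("$", "on"))
print(classify("useGetPicture"))
```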
16. Evaluation
• Two research questions:
• Can Baker accurately identify API elements in code snippets?
• Does Baker work on a variety of systems, or is it limited to just a few libraries?
17. Linker Accuracy
• Precision is much more important than recall.
• Five Java systems (libraries) were chosen for analysis:
• Android/ GWT/ Hibernate/ Joda Time/ XStream
• 50 code elements per system were manually examined to classify
the result returned by Baker as:
• True Positive (TP)
• False Positive (FP)
• False Negative (FN)
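Precision and recall follow from these three counts in the standard way: precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below uses made-up counts purely to show the arithmetic; they are not figures from the study.

```python
def precision(tp: int, fp: int) -> float:
    """Share of returned links that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of correct links that were actually returned."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts, chosen only to illustrate the formulas.
tp, fp, fn = 30, 2, 10
p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision={p:.4f} recall={r:.4f}")
```

A false positive attaches a wrong link to documentation, while a false negative only omits one, which is why precision is weighted more heavily here.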
18. Linker Accuracy (continued)
• Baker’s overall Java precision is 0.98 and its recall is 0.83. Only exact
matches (cardinality = 1) were considered.
Precision = 98%, Recall = 83%
19. Linker Accuracy (continued)
• Baker’s overall JavaScript precision is 0.97 and its recall is 0.96. Only exact
matches (cardinality = 1) were considered.
Precision = 97%, Recall = 96%
20. Example Diversity
• JavaBaker parsed 4000 source code examples.
• Identified over 30000 links to 4500 unique elements.
• JSBaker parsed 1000 source code examples.
• Identified over 10000 links to 500 unique elements.
21. Qualifying High-cardinality Match
• Linking methods may return more than one match when there isn’t
enough information to determine the FQN of a method or type.
• This is relatively rare.
• Graph shows the cardinality of the result
for each of the 4,000 snippets.
• JDK types and methods have been removed.
• The majority (69%) of elements can be
precisely identified.
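Cardinality here is the number of candidate FQNs returned for an element, so a cardinality of 1 is an exact match. The sketch below tallies cardinalities over a hypothetical linker output; the `matches` dictionary and all `example.*` names are invented for illustration and do not reproduce the study's data.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical linker output: code element -> candidate FQNs.
matches: Dict[str, List[str]] = {
    "History": ["com.google.gwt.user.client.History"],
    "Listener": ["example.a.Listener", "example.b.Listener"],
    "Token": ["example.a.Token"],
    "Widget": ["example.a.Widget", "example.b.Widget", "example.c.Widget"],
}

# Histogram of cardinalities: how many elements have 1, 2, 3, ... candidates.
histogram = Counter(len(fqns) for fqns in matches.values())

# Share of elements that are precisely identified (cardinality = 1).
exact_share = histogram[1] / len(matches)

print(dict(histogram))
print(exact_share)
```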
22. Conclusion
• Maintaining API documentation is a challenging, time-consuming task.
• The documentation is frequently out of date.
• Baker automatically generates links between API documentation and
source code examples.
• Baker has high precision (0.97).