Multi-dimensional exploration of API usage - ICPC13 - 21-05-13


Presented at the 21st IEEE International Conference on Program Comprehension (ICPC 2013), San Francisco (USA). Website of the paper:


  1. Multi-dimensional Exploration of API Usage
     Coen De Roover (1), Ralf Lämmel (2), Ekaterina Pek (3)
     (1) Software Languages Lab, Vrije Universiteit Brussel, Belgium
     (2) Software Languages Team, University of Koblenz-Landau, Germany
     (3) ADAPT Lab, University of Koblenz-Landau, Germany
  2. Exploration Story: JHotDraw
     Question: what XML APIs are used and how extensively?
     View: API domain cloud and API cloud
     Result: relatively few references to SAX and DOM
     [Figure: JHotDraw's API Domain Cocktail (Fig. 9, cloud of API domains): GUI, Data, Basics, IO, Format, Component, Meta, XML, Distribution, Parsing, Control, Math, Output, Security, Concurrency. Fig. 9 shows API domains for all of JHotDraw and also for its undo package.]
     [Figure: JHotDraw's API Cocktail (Fig. 8, cloud of API tags): Swing, java.lang, JavaBeans, AWT, java.util, java.text, java.lang.reflect, DOM, java.net, java.util.regex, Java Print Service, java.lang.annotation, java.math, java.lang.ref, java.util.concurrent, Java security, javax.imageio, SAX. A second cloud covers the package org.jhotdraw.undo.]
     Notes: Let me start by making the concept of exploring API usage more concrete. Imagine you are a developer tasked with migrating JHotDraw from XML to JSON for persistence. The first thing you would like to know is which APIs for manipulating XML are used, and how extensively. You could gain these insights through the two tag clouds shown on the slide. The top one contains the domains of the APIs used by JHotDraw, the bottom one the actual APIs. The size of a tag corresponds to the number of references to the API or the domain. So we can conclude that XML APIs are used by JHotDraw, more concretely DOM and SAX, but not extensively. There are many more references to the AWT and Swing APIs from the GUI programming domain, for instance.
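The way tag sizes follow from reference counts can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name, the tiny API-to-domain map, and the reference list are all made up here, and nothing below is taken from the actual tool.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ApiCloudSketch {
    // Hypothetical API-to-domain tagging; the real atlas covers far more APIs.
    static final Map<String, String> DOMAIN_OF = Map.of(
        "Swing", "GUI", "AWT", "GUI", "DOM", "XML", "SAX", "XML");

    // Counts how often each tag occurs; the count is the weight that scales the tag.
    static Map<String, Integer> weights(List<String> tags) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String tag : tags) {
            counts.merge(tag, 1, Integer::sum);
        }
        return counts;
    }

    // Lifts a list of referenced APIs to the list of their domains.
    static List<String> toDomains(List<String> apis) {
        return apis.stream()
                   .map(api -> DOMAIN_OF.getOrDefault(api, "Unknown"))
                   .toList();
    }

    public static void main(String[] args) {
        // Made-up input: one entry per reference to an API.
        List<String> refs = List.of("Swing", "Swing", "AWT", "DOM", "Swing", "SAX");
        System.out.println(weights(refs));            // {AWT=1, DOM=1, SAX=1, Swing=3}
        System.out.println(weights(toDomains(refs))); // {GUI=4, XML=2}
    }
}
```

The same counting serves both clouds: the API cloud counts references per API, and the domain cloud counts them after mapping each API to its domain.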
  3. Exploration Story: JHotDraw
     Question: what elements of DOM are actually used?
     View: table of referenced API elements (i.e., DOM slice)
     Result: the footprint of DOM in JHotDraw is but 94 refs to 19 distinct elements
     Notes: The next insight to gain is whether the project uses the complete DOM API or just a small subset. Given a table of referenced API elements, the latter seems to be the case: there are only 94 references to 19 distinct types and methods. Even better news: no exotic API elements are used.
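A footprint of the kind quoted above (total references versus distinct referenced elements) could be computed roughly as follows. This is a minimal sketch: the record names and the sample references are assumptions for illustration, not the tool's data model.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FootprintSketch {
    // One reference: the API it targets and the referenced API element.
    record Ref(String api, String element) {}

    // Footprint of an API: total references (#ref) and distinct elements (#elem).
    record Footprint(int refs, int elems) {}

    static Footprint footprint(List<Ref> refs, String api) {
        int total = 0;
        Set<String> distinct = new HashSet<>();
        for (Ref r : refs) {
            if (r.api().equals(api)) {
                total++;
                distinct.add(r.element());
            }
        }
        return new Footprint(total, distinct.size());
    }

    public static void main(String[] args) {
        // Made-up references, not extracted from JHotDraw.
        List<Ref> refs = List.of(
            new Ref("DOM", "org.w3c.dom.Element"),
            new Ref("DOM", "org.w3c.dom.Element"),
            new Ref("DOM", "org.w3c.dom.Node.getNodeName()"),
            new Ref("Swing", "javax.swing.JPanel"));
        System.out.println(footprint(refs, "DOM")); // prints Footprint[refs=3, elems=2]
    }
}
```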
  4. Exploration Story: JHotDraw
     Question: how is DOM usage distributed across JHotDraw?
     View: table of referencing project elements (i.e., JHotDraw slice)
     Result: slice of JHotDraw with DOM usage is local to 1/13 top-level packages
     Code shown on the slide:
       public void applyStylesTo(Element elem) {
         for (CSSRule rule : rules) {
           if (rule.matches(elem)) {
             rule.apply(elem);
           }
         }
       }
     Notes: All good news so far, but it could still be the case that the API is used all over the project. Luckily, given a table of referencing project elements, the use of DOM is local to 4 classes in the org.jhotdraw.xml package. Our exploration therefore shows that migrating from XML to JSON is feasible.
  5. Exploring API Usage: Quaatlas API Atlas
     API metadata
       API: named collection of elements (98)
       API domain: named collection of APIs addressing the same domain (27)
       API facet: named collection of API elements addressing a particular concern
     gathered by studying API usage in a corpus of projects
       re-engineered Qualitas corpus to Eclipse projects that compile (79)
       dependencies resolved and separated from project files
     Paper excerpts visible on the slide:
       For instance, we may be interested in #api for a specific project. Also, we may be interested in #ref for some part of an API. Further, these metrics can be configured to count only specific patterns. It is easy to see now that the given metrics are not even orthogonal because, for example, #derive can be obtained from #ref by only counting patterns for ‘extends’ and ‘implements’ relationships.
       API Domains: We assume that each API addresses some programming domain such as XML processing or GUI programming. We are not aware of any general, widely adopted attempt to associate APIs with domains, but the idea appears to merit further research. We have begun collecting programming domains (or in fact, API domains) and tagging APIs appropriately. Let us list a few API domains and associate them with well-known Java APIs:
         GUI: GUI programming, e.g., Swing and AWT.
         XML: XML processing, e.g., DOM, JDOM, and SAX.
         Data: Data structures incl. containers, e.g., java.util.
         IO: File- and stream-based I/O, e.g., and java.nio.
         Component: Component-oriented programming, e.g., JavaBeans.
         Meta: Meta-programming incl. reflection, e.g., java.lang.reflect.
         Basics: Basic language support, e.g., java.lang.String.
       API domains are helpful in reporting API usage and quantifying API usage of interest in more abstract terms than the names of individual APIs, as will be illustrated in §VI.
       API Facets: An API may contain dozens or hundreds of types each of which has many method members in turn. Some APIs use sub-packages to organize such API complexity, but those sub-packages are typically concerned with advanced API usage whereas the core facets of API usage are not distinguished in any operational manner. This makes it hard to understand API usage at a somewhat abstract level. Accordingly, we propose leveraging a notion of API facets in the sense of aspects or concerns supported by the API. In this paper, we assume that facets are represented as named collections of specific API types or methods. As an illustration, we name a few API facets of the typical DOM-like API such as DOM itself, JDOM, or dom4j:
         Input / Output: De-/serialization for DOM trees.
         Observation: Getter-like access and other ‘read only’ forms.
         Addition: Addition of nodes et al. as part also of construction.
         Removal: Removal of nodes et al. as a form of mutation.
         Namespaces: XML namespace manipulation.
         Nontrivial XML: Use of CDATA, PI, and other XML idiosyncrasies.
         Nontrivial API: Usage of types and methods that are beyond normal API usage. For instance, XML APIs may provide some framework for node factories or adapters for API integration.
       API facets are helpful in communicating API usage to the user at a more abstract level than the level of individual types and methods, as will be illustrated in §VI. We leverage knowledge of the APIs to identify (to name) API facets and to tag APIs appropriately. The idea of grouping API members, e.g., by their functional roles, has also been studied in related work on code completion; see §III.
       Fig. 4. Pseudocode describing the corpus (re)-engineering method:
         2. output: corpus
         3. for each name in candidateList:
         4.   (p_src, p_bin) = obtainProject(name);
         5.   patches = exploratoryBuild(p_src, p_bin);
         6.   timestamp = build(p_src, patches);
         7.   (java, classes, jars) = collectStats(p_src);
         8.   java' = filter(java);
         9.   (jars_built, jars_lib) = detectJars(timestamp, java', jars);
         10.  java'_compiled = detectJava(timestamp, java', classes, jars_built);
         11.  p'_src = (java'_compiled, jars_lib);
         12.  p'_bin = jars_built;
         13.  p' = (p'_src, p'_bin);
         14.  if validate(p'): corpus = corpus + p';
       V. THE QUAATLAS CORPUS FOR API-USAGE ANALYSIS
       Our study requires a suitable corpus of mature, well-developed projects coming from different application domains. Arguably, such projects show sufficient and advanced API usage. We decided to restrict ourselves to open-source Java projects; in order to increase quality and reproducibility of our research, we decided to use an existing, established and curated collection of Java projects: the QUALITAS corpus [27], release 20101126r. As we discuss in §IV, API usage entails the ability to resolve types. However, QUALITAS does not guarantee the availability of a project’s library types. The collection consists of source and binary forms as they are provided by the project [...]
     Notes: In the paper, we present a similar exploration-based approach for understanding API usage. This approach relies on a lot of meta-data about APIs that we have made available in an API atlas. For 98 APIs, this atlas describes the individual packages, types, and methods the API consists of. A fine-grained description is necessary because libraries such as Google Guava or even java.util group different APIs together. We also associated a domain with each API. This resulted in 27 API domains. Finally, we have started describing groups of elements within an API that address a particular concern. We gathered this meta-data by studying the APIs used in a corpus of 79 mature projects. We re-engineered the projects from the Qualitas corpus such that all their dependencies are resolved and separated from project files. This enables extracting precise API usage facts.
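The remark in the paper excerpt that #derive can be obtained from #ref by counting only 'extends' and 'implements' patterns can be made concrete with a small sketch. The enum values and record below are hypothetical stand-ins chosen for this illustration; the actual set of syntactic patterns is larger.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class MetricSketch {
    // Hypothetical subset of the syntactic patterns a reference can take.
    enum SyntacticPattern { EXTENDS, IMPLEMENTS, METHOD_CALL, FIELD_TYPE }

    record Ref(String element, SyntacticPattern pattern) {}

    // #ref, restricted to references whose pattern is in the given set.
    static long countRefs(List<Ref> refs, Set<SyntacticPattern> patterns) {
        return refs.stream().filter(r -> patterns.contains(r.pattern())).count();
    }

    // #derive is #ref counted only over 'extends' and 'implements'.
    static long countDerive(List<Ref> refs) {
        return countRefs(refs,
            EnumSet.of(SyntacticPattern.EXTENDS, SyntacticPattern.IMPLEMENTS));
    }

    public static void main(String[] args) {
        // Made-up references for illustration.
        List<Ref> refs = List.of(
            new Ref("javax.swing.JPanel", SyntacticPattern.EXTENDS),
            new Ref("java.lang.Runnable", SyntacticPattern.IMPLEMENTS),
            new Ref("java.util.List.add", SyntacticPattern.METHOD_CALL));
        System.out.println("#ref = "
            + countRefs(refs, EnumSet.allOf(SyntacticPattern.class))); // #ref = 3
        System.out.println("#derive = " + countDerive(refs));          // #derive = 2
    }
}
```

Treating a metric as "a count over a configurable pattern set" is one simple way to see why the metrics are not orthogonal.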
  6. linked to 101
     Notes: Note that the entire API atlas is available on the paper’s website. There, we also present the meta-data in a human-readable format. One nice feature is that each API is linked to its description on the 101companies wiki, where you can also browse through small example programs that use the API.
  7. Exploring API Usage: Exapus Platform
     gathers API usage facts for a given corpus
       referenced element, referencing scope, syntactic pattern (e.g., super call)
     computes exploration views on usage facts
       selection of API references
         by referenced elements: API name, element, meta-data
         by referencing elements: project name, element, syntactic pattern, ...
       organized as project or API slice
         project members + outgoing refs within their scope
         API members + incoming refs within their scope
       rendered as graph, table or cloud
         scaled and ordered by usage metrics: #ref, #elem, #derive, #proj, #api, ...
     Notes: The actual exploration-based approach to understanding API usage is supported by a tool that extracts references to API elements from a single project or a corpus of projects. During an AST visit, the tool records for each reference it discovers the referenced element, the project scope in which this reference resides, and the syntactic form of the reference. This could be a method return type, a super call, a type parameter, and so on. The tool presents exploration views on the extracted facts, which can be configured along several dimensions. First of all, you can configure which API references to include in a view using conditions on the referenced element and the referencing element. For instance, only the exceptions defined by an API from the XML domain that are caught in the JHotDraw project. Next, you can choose to organize these references as a slice of project members with outgoing refs or as a slice of API members with incoming refs. Finally, you can have these slices rendered as a graph, table, or cloud scaled by a usage metric. For instance, a tag cloud scaled by the amount of subclassing along the border between a project and an API.
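The three view dimensions described above (selection, slice organization, rendering) can be approximated with plain Java collections. The sketch below covers the first two only, and every name in it (Fact, select, projectSlice, apiSlice) is an assumption made for illustration rather than Exapus's own API; the sample facts are invented, apart from the StyleManager.add to java.util.List.add call mentioned in the notes for a later slide.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class ViewSketch {
    // A usage fact: what is referenced, from which project scope, and in which syntactic form.
    record Fact(String project, String scope, String api, String element, String pattern) {}

    // Selection: keep facts matching conditions on the referenced and the referencing side.
    static List<Fact> select(List<Fact> facts,
                             Predicate<Fact> referenced,
                             Predicate<Fact> referencing) {
        return facts.stream()
                    .filter(referenced.and(referencing))
                    .collect(Collectors.toList());
    }

    // Project slice: project scopes with their outgoing references.
    static Map<String, List<String>> projectSlice(List<Fact> facts) {
        return facts.stream().collect(Collectors.groupingBy(
            Fact::scope, TreeMap::new,
            Collectors.mapping(Fact::element, Collectors.toList())));
    }

    // API slice: API elements with their incoming references.
    static Map<String, List<String>> apiSlice(List<Fact> facts) {
        return facts.stream().collect(Collectors.groupingBy(
            Fact::element, TreeMap::new,
            Collectors.mapping(Fact::scope, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Fact> facts = List.of(
            new Fact("jhotdraw", "StyleManager.add", "java.util", "java.util.List.add", "call"),
            // Hypothetical scope name, invented for this example.
            new Fact("jhotdraw", "ExampleReader.read", "SAX", "org.xml.sax.SAXException", "catch"));

        // Only caught exceptions from SAX, referenced from within jhotdraw.
        List<Fact> selected = select(facts,
            f -> f.api().equals("SAX") && f.pattern().equals("catch"),
            f -> f.project().equals("jhotdraw"));

        System.out.println(projectSlice(selected));
        System.out.println(apiSlice(selected));
    }
}
```

Both slices are built from the same selected facts; they differ only in which side of the reference is used as the grouping key.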
  8. Notes: What follows are some screenshots of the tool in action. At the far left, there is a list of predefined views. Their configuration can be edited in the top-right corner. Shown here is the configuration of a view that results in the tag cloud we saw earlier. At the top, you can select which referenced elements to include. Here, we include all of them using a wildcard pattern. At the bottom, you can select which referencing elements to include in a view. Here, we only include references from the JHotDraw project. Note that even though the tool has a dynamic IDE-like feel, it is actually completely web-based. We hope this will encourage others to explore and augment our API meta-data.
  9. Notes: Here you see a project-centric table of outgoing references from JHotDraw to the Java collections API and DOM. We see, for instance, that the method add of StyleManager invokes the method add of java.util.List. At the bottom left, you see a tag cloud for the currently selected project element. We see that there are more references to data APIs than to XML APIs in the StyleManager class. The source code for this class is shown at the bottom right. API references are highlighted within the source code.
  10. Notes: Finally, here you see an API-centric graph of references from JHotDraw to the APIs known to us. Nodes are APIs. Borders of the nodes are scaled by the relative number of referenced elements. So this is basically another rendering of the tag cloud you saw earlier. You could also choose to scale the borders of the nodes using a different metric, such as the amount of derivation that happens.
  11. Notes: And of course, we also made this tool publicly available.
  12. Insight: API Dispersion
     intent: understand and compare dispersion of an API across the corpus
     stakeholder: API developer
     view: project-centric table; usage metrics for quantitative comparison; API facets for qualitative comparison
     intelligence: choose compliance tests for API evolution
     [Figure: Fig. 5. JDOM’s API Dispersion in QUAATLAS (project-centric table), from the paper section "B. The API Dispersion Insight".]
     Notes: So, what insights about API usage can one hope to gain through such a tool? And how should you configure the tool such that it produces the right view for each insight? In the paper, we discuss this in a structured manner for several API usage insights. The one shown here is concerned with how dispersed or widespread an API is across a corpus of projects. It can be gained by configuring the tool to produce a table of referencing project elements, together with some usage metrics. Here, we see JDOM’s dispersion in the corpus. The table is sorted by the number of references each project contains. We see that the informa project has the most references, but that jspwiki references the most distinct API elements. We also see that this project is one of the few that contain subtypes of API elements. So who could benefit from this insight? This would be the developer of an API who needs to choose easy and difficult projects for compliance testing after an API evolution.
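A project-centric dispersion table of this kind, with one row per project and sorted by reference count, might be computed along the following lines. This is a sketch under assumptions: the record names and the sample data (projects "projA" and "projB") are invented and do not reproduce the numbers behind Fig. 5.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DispersionSketch {
    // One reference from a project to an element of the API under study.
    record Ref(String project, String element) {}

    // One table row: project, total references (#ref), distinct elements (#elem).
    record Row(String project, int refs, int elems) {}

    static List<Row> dispersion(List<Ref> refs) {
        // Collect the referenced elements per project.
        Map<String, List<String>> byProject = new TreeMap<>();
        for (Ref r : refs) {
            byProject.computeIfAbsent(r.project(), k -> new ArrayList<>()).add(r.element());
        }
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byProject.entrySet()) {
            List<String> elements = e.getValue();
            rows.add(new Row(e.getKey(), elements.size(), new HashSet<>(elements).size()));
        }
        // Sort by #ref, largest first, as in the table on the slide.
        rows.sort((a, b) -> Integer.compare(b.refs(), a.refs()));
        return rows;
    }

    public static void main(String[] args) {
        // Made-up references to a JDOM-like API.
        List<Ref> refs = List.of(
            new Ref("projA", "Element.getChild"),
            new Ref("projA", "Element.getChild"),
            new Ref("projA", "Element.getChild"),
            new Ref("projB", "Element.getChild"),
            new Ref("projB", "Document.getRootElement"));
        for (Row row : dispersion(refs)) {
            System.out.println(row);
        }
    }
}
```

In this toy input, projA has the most references while projB touches more distinct elements, which mirrors the kind of contrast the notes draw between informa and jspwiki.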
  13. Insight: API Footprint
     intent: understand what API elements are actually used in a corpus or in specific project scopes
     stakeholder: API or project developer
     view: API-centric table or tree, ordered or scaled by #ref
     intelligence: API migration by project developer: target effort; API evolution by API developer: minimize breaking changes
     [Figure: Fig. 6. JDOM’s API Footprint in QUAATLAS (API-centric table); columns include Scope, Tags incl. facets, #proj, ...]
     [Figure: Nontrivial JDOM API usage in velocity: org.apache.velocity.anakia.AnakiaJDOMFactory]
     Notes: The API footprint insight is dual to the API dispersion insight in the sense that it is gained through a slice of referenced API elements rather than through a table of referencing project elements. API developers might want to gain this insight for an entire corpus of projects to minimize the impact of breaking API changes. A project developer might want to gain this insight for a single project to decide whether a wrapper-based migration, where a wrapper of the new API has to be produced for each referenced element, is feasible.
  14. Insight: API Coupling
     intent: understand what APIs or API domains are used in smaller project scopes
     stakeholder: project developer
     view: API-centric cloud, usage metrics applied
     intelligence: reveals potential code smell: too many APIs in small scope; helps understand design and motivation for API dependencies
     [Figure: Coupling in JHotDraw. API cloud: java.lang, Swing, JavaBeans, ... API domain cloud: Distribution, GUI, IO, Component, ...]
     Notes: Shown here is an insight that is targeted more towards project developers who would like to understand what APIs are used together in a small project scope. This insight can be gained by configuring the tool to produce an API tag cloud for the currently selected project scope. The one on the slide is for the AbstractView class of JHotDraw, which seems to be referencing quite a lot of different APIs. For small project scopes, such as a method, this could be the sign of a code smell. For larger scopes, API tag clouds can also help understand the motivation behind API dependencies. Here for instance, java.lang is referenced for string manipulation, for saving a view to a URI, Swing for painting views, JavaBeans for change notifications, and for handling exceptions during the saving of a view.
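The "too many APIs in a small scope" check can be illustrated by counting distinct APIs per scope and flagging scopes above a limit. The limit, the record, and the scope names below are all assumptions made for this sketch; the talk does not prescribe a threshold.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

class CouplingSketch {
    // One reference from a project scope (e.g., a method) to an API.
    record Ref(String scope, String api) {}

    // Distinct APIs referenced within each scope.
    static Map<String, Set<String>> apisPerScope(List<Ref> refs) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Ref r : refs) {
            result.computeIfAbsent(r.scope(), k -> new TreeSet<>()).add(r.api());
        }
        return result;
    }

    // Scopes referencing more than 'limit' distinct APIs; the limit is an arbitrary choice.
    static List<String> suspicious(List<Ref> refs, int limit) {
        return apisPerScope(refs).entrySet().stream()
            .filter(e -> e.getValue().size() > limit)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up scopes and references.
        List<Ref> refs = List.of(
            new Ref("ViewExample.save", "java.lang"),
            new Ref("ViewExample.save", "Swing"),
            new Ref("ViewExample.save", "JavaBeans"),
            new Ref("ViewExample.paint", "Swing"),
            new Ref("ViewExample.paint", "Swing"));
        System.out.println(apisPerScope(refs));
        System.out.println(suspicious(refs, 2)); // [ViewExample.save]
    }
}
```

Whether a flagged scope is really a smell still needs human judgment, as the notes point out for larger scopes where several APIs are used for well-motivated reasons.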
  15. Insight: API Profile
     intent: understand what API facets are used in varying project scopes
     stakeholder: project developer
     view: API-centric cloud of API facets, usage metrics applied
     intelligence: project scope: reveals API asbestos; smaller scope: API usage scenarios
     [Figure: JDOM’s API Profile for informa. Project informa: Observation, Input, Nontrivial XML, Manipulation, Exception, Renaming, Addition, Namespaces, Nontrivial API, Output. Package de.nava.informa.parsers: Observation, Input, Exception.]
     Notes: The API profile insight is similar, but is gained through a cloud of the facets of a single API used within a project scope rather than complete APIs. At the top, we see the JDOM facets used within the entire informa project. Here, seldom-used non-trivial parts of an API reveal that the project might be difficult to change. At the bottom, we see the JDOM facets used within a smaller scope of the project. Here, the displayed facets correspond to API usage scenarios: the parsers package reads XML files and observes XML nodes.
  16. Conclusion
     described several insights to be gained about API usage
       cocktail, dispersion, distribution, footprint, coupling, profile
     Quaatlas API atlas
       re-engineered Qualitas projects for precise extraction of API usage
       added meta-data concerning APIs, API domains, API facets
     presented multi-dimensional exploration model
       supported by IDE-like web-based platform Exapus
       configurable views on API usage
     future work
       empirical research on understanding API usage through exploration
       support flow analyses in views