API Usage Pattern Extraction using Semantic Similarity


An enthusiastic project for API usage pattern extraction exploiting semantic similarity among API usage code examples.

Published in: Education, Technology

  • Hello everybody. Welcome to my presentation. I am Masudur Rahman, and today I am going to present my research project titled “Semantic Network Based API Usage Pattern Extraction and Learning”. I hope you will enjoy the presentation.
  • In my presentation I am going to cover the following topics.
  • An API, or Application Programming Interface, is an important concept in modern OO programming languages because it encourages developers to reuse existing programming resources rather than reinvent the wheel. But it is hard for a developer to master APIs when many complex APIs are involved and there is not enough help for learning how to use them. API documentation, forums and API browsers exist, but they do not fully meet the developer’s learning needs because they contain only simple examples or troubleshooting information. One possible solution is to consult existing projects by other developers. Those projects have used the APIs and can provide practical usage examples. However, we cannot capture every API usage; instead, we need to extract the API usage patterns, which provide sufficient knowledge of how to use an API. This is where the term API usage pattern comes in.
  • Now the question is: what is an API usage pattern? When a frequent and consistent sequence of API method calls and field accesses performs a particular programming task, that sequence is called an API usage pattern. It has to be widely used in different projects, and it has to be widely accepted by the developer community.
  • For example, this is a code example for reading content from a file. In our research, we found that it is a common and frequent sequence of API calls for reading file content in numerous projects. It contains calls on the Scanner class, so it can be considered an API usage pattern for the Scanner API class. We are interested in extracting this type of usage pattern for different API classes from different open source projects so that developers can learn them and use them in their work.
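The slide with the code itself is not part of this transcript, so the following is only a minimal sketch of what such a Scanner-based file-reading sequence typically looks like. The class name `ScannerReadExample` and the helper `readAll` are hypothetical and chosen for illustration.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class ScannerReadExample {
    // Reads the file line by line and returns its content,
    // with each line terminated by '\n'.
    static String readAll(File file) throws FileNotFoundException {
        StringBuilder content = new StringBuilder();
        // The File object is passed to the Scanner constructor.
        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                content.append(scanner.nextLine()).append('\n');
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ScannerReadExample.java <path-to-file>
        System.out.print(readAll(new File(args[0])));
    }
}
```

The recurring part is the call sequence: construct a File, pass it to the Scanner constructor, then loop over `hasNextLine()` and `nextLine()`.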
  • From our research, we found that there are two broad ways to extract API usage patterns: frequent method sequence mining and the graph-based approach. In our research, we follow a relatively novel approach to API usage pattern extraction: we use semantic web technologies. OK, let’s explain the semantic web technologies.
  • Besides the API usage pattern, the most important concept that needs to be explained is the semantic web, or semantic network. The semantic web is a new breed of World Wide Web proposed by Tim Berners-Lee. It consists of nodes and meaningful connecting edges, which are not like simple hyperlinks; rather, they contain meaningful information. Each node is called a resource, which can be a document, person, image, song or anything else that can be identified by a URI on the web. Basically, the semantic web is an efficient tool for knowledge representation and inference. For example, this is a simple semantic network representing some knowledge. From the network, we can retrieve the living place of the person who wrote a particular software manual. This is easy for the semantic web but relatively hard for the current structure of the WWW. We are interested in applying this kind of structure to capture source code knowledge and use it for API usage pattern extraction and learning.
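To make the "living place of the author of a manual" query concrete, here is a small sketch that stores a network as labeled edges and answers the question by following two edges. All resource and predicate names (`SoftwareManual`, `hasAuthor`, `livesIn`, `Alice`, `Saskatoon`) are made up for illustration; a real semantic web application would use URIs and an RDF store rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

class SemanticNetworkQuery {
    /** One labeled edge of the network: subject --predicate--> object. */
    static final class Edge {
        final String subject;
        final String predicate;
        final String object;

        Edge(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Hypothetical facts; every name here is invented for the example. */
    static List<Edge> network() {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("SoftwareManual", "hasAuthor", "Alice"));
        edges.add(new Edge("Alice", "livesIn", "Saskatoon"));
        edges.add(new Edge("Alice", "hasName", "Alice Example"));
        return edges;
    }

    /** Returns the first object reached from subject via predicate, or null. */
    static String follow(List<Edge> edges, String subject, String predicate) {
        for (Edge e : edges) {
            if (e.subject.equals(subject) && e.predicate.equals(predicate)) {
                return e.object;
            }
        }
        return null;
    }

    /** Two-hop query: manual -> author -> living place. */
    static String authorHome(List<Edge> edges, String manual) {
        String author = follow(edges, manual, "hasAuthor");
        return author == null ? null : follow(edges, author, "livesIn");
    }

    public static void main(String[] args) {
        System.out.println(authorHome(network(), "SoftwareManual"));
    }
}
```

Because the edges carry meaning, the answer comes from chaining two facts, which a plain hyperlink structure cannot express directly.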
  • Let us consider a new developer who has some knowledge of OO programming and is assigned to fix a problem that involves a complex API. He has to understand and learn the API to use it in the work. Here is an example that opens a file and reads its content. The code is simple and contains two object usages: Scanner and File. But the developer’s understanding of the source code depends on his knowledge of Java syntax, and he has to grasp the concepts from a few lines of source code, which is not always easy, quick or helpful.
  • But if you consider this one, does it make sense? This is a semantic network version of the source code example I just showed. We people are really visual beings, and we understand relationships more easily from graphics or structures than from text. It looks a bit colorful, but an average developer can understand the OO relationships between two objects without being concerned about Java syntax. For example, the developer can understand the relationship between the Scanner and File objects: here it shows that File is a parameter to the constructor of the Scanner object. Similarly, other relationships like hasMethod, hasChild and hasConstructor can be represented in a simple graphical way. So, basically, this type of source code representation can really help the developer in understanding and learning. The structure can also reflect the semantics and relationships among different source code entities like classes, methods, objects and instances in a novel way, which can be manipulated for API usage pattern extraction. Thus it is helpful for our research goal, and we are motivated to use a semantic network of source code for the research.
  • In this research, we try to answer these two research questions. RQ 1: Can semantic network technologies represent the semantics of OO source code properly? RQ 2: Can this representation be used for API usage pattern extraction and learning?
  • Here are some background concepts that we have explained so far. Now we will discuss RDF, the framework used to implement the semantic web or network.
  • The semantic web is implemented using a framework called the Resource Description Framework (RDF). The building block of an RDF network is the RDF statement, or triple. Each statement represents a fact or a piece of knowledge about the network or system. Each triple has three components: subject, predicate and object. Subject: the entity about which the fact is described. Predicate: the attribute of the subject. Object: the value of that attribute. For example, Scanner.new is a node whose type is constructor.
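As a rough illustration of subject, predicate and object, the sketch below encodes a few facts about the Scanner and File example as plain Java objects. It deliberately avoids any RDF library so that it stays self-contained. The predicate `hasParameter`, the node name `Scanner.nextLine` and the class `TripleSketch` are assumptions made for this example; only `type`, `hasMethod` and `hasConstructor` are named in the talk.

```java
import java.util.ArrayList;
import java.util.List;

class TripleSketch {
    /** A single statement: subject, predicate, object. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Facts about "new Scanner(file)"; node and predicate names are illustrative. */
    static List<Triple> scannerUsage() {
        List<Triple> graph = new ArrayList<>();
        graph.add(new Triple("Scanner", "hasConstructor", "Scanner.new"));
        graph.add(new Triple("Scanner.new", "type", "Constructor"));
        graph.add(new Triple("Scanner.new", "hasParameter", "File")); // assumed predicate
        graph.add(new Triple("Scanner", "hasMethod", "Scanner.nextLine"));
        return graph;
    }

    /** Returns every object attached to the given subject by the given predicate. */
    static List<String> objectsOf(List<Triple> graph, String subject, String predicate) {
        List<String> objects = new ArrayList<>();
        for (Triple t : graph) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                objects.add(t.object);
            }
        }
        return objects;
    }

    public static void main(String[] args) {
        System.out.println(objectsOf(scannerUsage(), "Scanner.new", "type"));
    }
}
```

Each `Triple` is one fact, and the list of triples together forms the network that can be queried.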
  • Now comes the proposed approach for API usage pattern extraction and learning. At first, we selected 25 open source projects and a list of API classes from 3 standard Java libraries. Then we searched each Java class in the open source projects for API usages. We consider Java methods and constructors to be the containers of API usages, so we extract them and parse them using the AST parser provided by Eclipse. After parsing, we used the selected expressions to develop the semantic network of the source code, which can be considered an equivalent graphical representation of the source code. Then we used those usage graphs to extract the API usage patterns. That is the overview; now we describe some important steps in more depth.
  • Representing source code as an equivalent semantic network is obviously a challenging task. However, we figured out a way to do it. Java source code is parsed by the AST parser provided by Eclipse. We parsed each Java statement down to the expression level and obtained a number of expressions that express the semantics. However, in this research we are concerned with API usage patterns, so we mainly focused on the expressions that reflect OO semantics, such as method call expressions, field access expressions, object creation expressions and so on. We also used Jena by Apache, an advanced framework for working with RDF technologies. Thus we used all three (the expression selection rules, a predicate list and Jena) to develop the RDF network of each API usage. Basically, we developed triples, each containing the three parts subject, predicate and object, to represent every fact about the source code semantics and knowledge. Each semantic network is formed from those triples. Thus, we got an equivalent semantic network for each API usage in the source code. The question is: why do we need to represent the source code in this way? Answer: analyzing and processing source code directly is not easy, so we need a structure that can be manipulated programmatically. A semantic-web-like structure provides more strength through its reasoning power with SPARQL.
  • Now comes API usage pattern extraction from the RDF API usages. Basically, we exploited the strength of the Jena framework for this purpose. From a list of usage graphs, we extracted all possible common subsets that capture the API calls, object creations, field accesses and all other API-related information. These subsets are isomorphic sub-graphs of each other and the possible candidates for API usage patterns. Then we calculated the score of each pattern candidate based on its frequency within the same project and its frequency across multiple projects. Finally, based on some thresholds, we accepted a candidate pattern as a selected pattern for an API class.
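The talk does not give the scoring formula, so the sketch below only illustrates the selection step under a stated assumption: a candidate's score is its total number of occurrences multiplied by the number of projects it occurs in, and a candidate is kept when the score exceeds a threshold. The formula, the threshold value, the class `PatternSelection` and the sample counts are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PatternSelection {
    /**
     * Assumed score: (total occurrences) x (number of projects with at least
     * one occurrence). This is an illustrative formula, not the one used in
     * the presented work.
     */
    static int score(Map<String, Integer> occurrencesPerProject) {
        int total = 0;
        int projects = 0;
        for (int count : occurrencesPerProject.values()) {
            total += count;
            if (count > 0) {
                projects++;
            }
        }
        return total * projects;
    }

    /** Keeps the candidates whose score is strictly above the threshold. */
    static List<String> select(Map<String, Map<String, Integer>> candidates, int threshold) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> entry : candidates.entrySet()) {
            if (score(entry.getValue()) > threshold) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up occurrence counts for two candidate patterns.
        Map<String, Integer> frequent = new LinkedHashMap<>();
        frequent.put("ProjectA", 4);
        frequent.put("ProjectB", 3);
        Map<String, Integer> rare = new LinkedHashMap<>();
        rare.put("ProjectA", 1);

        Map<String, Map<String, Integer>> candidates = new LinkedHashMap<>();
        candidates.put("File -> FileInputStream -> BufferedInputStream", frequent);
        candidates.put("File -> Scanner", rare);

        System.out.println(select(candidates, 5));
    }
}
```

Multiplying by the project count is one simple way to reward candidates that recur across projects rather than within a single code base; other weightings would fit the description equally well.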
  • Now come the experimental results. We used 25 open source projects from different domains such as Java graphics, image manipulation, networking, domain management, utilities etc. We chose 3 standard API libraries and 250 classes from them. We detected usages of 113 classes in those projects; however, we were able to extract patterns for 76 API classes. In total, we extracted 776 distinct patterns.
  • From our experiment, we extracted this type of usage pattern from the API usages. For example, this is an API usage pattern for the BufferedInputStream API class. It is a sub-graph of the total API usage. This sub-graph, or sub-network, tells us how to create a BufferedInputStream object: create a File object and use it as a parameter to the constructor of FileInputStream, then use that FileInputStream object to construct the BufferedInputStream object.
  • While the semantic representation can be used for learning and understanding, it can also be converted into the corresponding source code skeleton. So, basically, this is the source code version of the API usage pattern, and it represents the same information as the network does. For this task, we parsed the semantic network, that is, the triples, to generate code skeletons that can help the developer write the actual code.
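The generated skeleton is shown only as a figure, so the following is a hand-written sketch of the construction chain described for BufferedInputStream (File, then FileInputStream, then BufferedInputStream). The surrounding class `BufferedInputStreamPattern` and the byte-counting loop are additions for illustration so that the example does something observable.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

class BufferedInputStreamPattern {
    // Counts the bytes in a file using the File -> FileInputStream ->
    // BufferedInputStream construction chain.
    static int countBytes(String path) throws IOException {
        File file = new File(path);
        try (FileInputStream fileIn = new FileInputStream(file);
             BufferedInputStream buffered = new BufferedInputStream(fileIn)) {
            int count = 0;
            while (buffered.read() != -1) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BufferedInputStreamPattern.java <path-to-file>
        System.out.println(countBytes(args[0]));
    }
}
```

The three constructor calls are the pattern; everything after them depends on the task at hand.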
  • Here is the table that shows a portion of our results. For example, JHotdraw7 is a Java project commonly used for different software maintenance activities. We found 689 Java classes and 7330 methods and constructors. 49 API classes are found in the project and 2547 patterns are used; however, 462 distinct patterns are extracted. We applied the experiments to 25 projects, and we found our approach quite promising for extracting API usage patterns.
  • We compared our results with the results of Nguyen et al., FSE 2009. We considered the top 5 projects based on their size and the API patterns found. The graph shows the average number of patterns found per project class file. Here, we can see that our approach shows better performance for 3 projects. We also notice a regular pattern in our results which is absent in Nguyen’s approach. We checked those last 3 projects and found that they are highly popular and active in the real world. Additionally, we found that they involve more API usages than the other projects. So, we can also infer that advanced and frequent API usage may be a possible cause of their popularity. Though we used a different set of projects than Nguyen et al., we found an interesting correlation between the two sets of results. Thus, it is reasonable to think that the proposed approach is a suitable candidate for API usage pattern extraction. However, we are working on making it more efficient.
  • Now come the answers to the research questions we stated at the very beginning. RQ 1: Can semantic network technologies represent the semantics of OO source code properly? Existing work by Nguyen et al. uses a graph-based approach, but its source code representation was not completely semantic because the connecting edges were treated too abstractly, for example as data dependency or control dependency. Our approach decomposes those relationships and dependencies to a more granular level and, more importantly, can be used for knowledge inference. Existing work by Wursch et al. develops a source code ontology, but that is not a complete representation of the source code; rather, it contains partial information about it. So none of the existing works converts the source code into a semantic representation to this extent. Our approach can capture the semantics of OO source code more broadly than existing approaches.
  • RQ 2: Can this representation be used for API usage pattern extraction and learning? Yes, the semantic representation is quite helpful for API usage pattern extraction, as we have already done that. Moreover, this representation is a potential approach for developers to learn APIs because of its visual and descriptive-logic nature. Basically, we try to add a novel concept for API learning and understanding. We are still working on it, and the current outcome can be considered a preliminary result of the whole idea.
  • While working, we faced a few challenges, which we tried to overcome. Complete semantic representation is a non-trivial task because it involves the many expressions of a complete programming language. In this research, we tried to capture the OO features and concepts of Java because we focused on API usage patterns. But if more expressions are considered, a more accurate representation is possible. We also found that RDF visualization within a limited display is challenging.
  • So, we proposed a new approach for representing OO source code as a semantic network, which is helpful for API usage pattern extraction, learning and visualization. More importantly, it captures more source code semantics than existing graph-based approaches. This research also leads us to further research problems, and we have some future plans. We will conduct a real-world user study to determine its real benefits. We will apply the extracted API usage patterns to code completion in the Eclipse IDE. They will also be used for API usage violation or anomaly detection.
  • That’s all for my presentation. Thanks to everybody.

    1. SEMANTIC NETWORK BASED API USAGE PATTERN EXTRACTION & LEARNING
         Mohammad Masudur Rahman, mor543@mail.usask.ca
         Department of Computer Science, University of Saskatchewan
    2. PRESENTATION OVERVIEW
         • Introduction
         • Motivating Example
         • Background Concepts
         • Proposed Approach
         • Semantic Network of Source code
         • API Usage Pattern Extraction
         • Pattern Learning & Visualization
         • Experimental Results & Discussions
         • Threats to Validity
         • Conclusion & Future Works
    3. INTRODUCTION
         • API (Application Programming Interface) Libraries
         • API Documentation, API Browser, forums
         • API Usage learning for developers
         • Existing projects using APIs
         • API Usage Patterns
    4. WHAT IS API USAGE PATTERN?
         • A frequent and consistent sequence of API method calls and field accesses
         • Performs a particular programming task
         • Widely used in multiple projects
         • Widely accepted by developers community
    6. BIG QUESTION?
         How to extract the API usage patterns from the source code?
    7. SEMANTIC WEB OR NETWORK
         What is the living place of the author of a particular software manual?
    10. RESEARCH QUESTIONS
         • RQ 1: Can semantic network technologies represent the semantics of OO source code properly?
         • RQ 2: Can this representation be used for API usage pattern extraction and learning?
    11. BACKGROUND CONCEPTS
         • API Usage Patterns
         • API Usage Violation & Anomalies
         • Semantic Web
         • Semantic Network of Source Code
         • Resource Description Framework (RDF)
         • RDF Statement or Triples
    14. PROPOSED APPROACH FOR API USAGE PATTERN EXTRACTION & LEARNING
         [Flow diagram. Labeled elements: API Class List (API classes); OSS Projects (source files); Source code parser (parsed expressions); "Contains API?" decision (Yes/No); Semantic Network Builder (RDF files); API Usage Pattern Manager (patterns); Pattern Source Skeleton Builder; RDF Pattern Visualizer; API Pattern Explorer]
    15. SOURCE CODE SEMANTIC NETWORK
         [Diagram. Labeled elements: Java source code; AST Parser (Javaparser); Java expressions; API expression selection rules; Apache Jena framework; RDF Maker; RDF triples; RDF network]
    16. API USAGE PATTERN EXTRACTION
         [Flow diagram. Labeled elements: All usages of an API class; Common sub-graph selection; Candidate API usage patterns; "Pattern score > threshold?" decision; Yes: selected API usage patterns; No: discarded]
    17. EXPERIMENTAL RESULTS
         • 25 open source projects
         • 3 API libraries (java.io, java.util, java.awt)
         • 250 API classes selected
         • API usages found for 113 API classes
         • Patterns found for 76 API classes
         • Total 776 patterns
    19. SOURCE CODE SKELETON
         Fig: BufferedInputStream Usage Pattern
    20. EXPERIMENTAL RESULTS

         Project       #Class   #M&C    #ATCF   #ADCF   #ATPF   #ADPF
         Ant-Contrib   186      1388    96      23      1865    280
         AOI           461      6489    218     55      1651    494
         Groimp        1202     13875   132     41      1632    407
         JFreechart    1059     12368   507     38      6841    410
         JHotdraw7     689      7330    310     49      2547    462

         #M&C = methods and constructors; #ATCF = total API classes; #ADCF = distinct API classes; #ATPF = total API patterns found; #ADPF = distinct API patterns found
    21. PATTERNS PER CLASS
         Fig: # patterns extracted per class comparison
    22. RESULTS DISCUSSION
         RQ 1: Can semantic network technologies represent the semantics of OO source code properly?
         • Graph-based API usage extraction by Nguyen et al., FSE 2009: incomplete semantics for edges and attributes
         • Source code ontology by Wursch et al., ICSE 2010: does not represent the complete source code
         • The proposed approach captures expression-level syntax and semantics
         • Focuses on API usage patterns
    23. RESULTS DISCUSSION
         RQ 2: Can this representation be used for API usage pattern extraction and learning?
         • Successfully extracts 776 patterns for 76 API classes from 25 open source projects
         • A potential approach to be explored more for API usage pattern exploration
         • Visualization of RDF network helps in learning
         • Source code as visual entities rather than lines
         • More comprehensive idea about OO source code
         • Applicable for complex OO relationships
         • Very useful for quick learning
    24. THREATS TO VALIDITY
         • Representing complete semantics: a non-trivial task
         • More expressions for more accurate representation
         • RDF pattern visualization within limited display
         • Need to be introduced with RDF convention
    25. CONCLUSION & FUTURE WORKS
         • Applicability of semantic web technologies for API usage pattern extraction
         • Semantic representation for learning by the developers
         • Real world user study
         • Extracted patterns for automatic code completion in the IDE
         • Extracted patterns for API violation and anomaly detection
    26. THANK YOU!!!
    27. REFERENCES
         [1] Semantic web diagram. URL http://www.w3.org/Talks/2002/10/16-sw/slide7-0.html
         [2] Tung Thanh Nguyen, Hoan Anh Nguyen, Nam H. Pham, Jafar M. Al-Kofahi, and Tien N. Nguyen. Graph-based mining of multiple object usage patterns. In Proc. ESEC/FSE, 2009, pages 383-392.
         [3] M. Wursch, G. Ghezzi, G. Reif, and H. C. Gall. Supporting developers with natural language queries. In Proc. ICSE, 2010, pages 165-174.
         [4] Tao Xie and Jian Pei. MAPO: mining API usages from open source repositories. In Proc. MSR, 2006, pages 574-57
         [5] Semantic web technology. URL http://www.w3.org/2001/sw
         [6] Visual learning style. URL http://www.learning-styles-online.com/style/visualspatial
         [7] Apache Jena framework. URL http://jena.apache.org/
         [8] Javaparser: Java 1.5 parser and AST. URL http://code.google.com/p/javaparser/
         [9] RDF-Gravity tool. URL http://semweb.salzburgresearch.at/apps/rdf-gravity/