The document presents a conceptual dependency graph based model for mapping source code to API documentation. It proposes using a weighted graph to model dependencies between code elements and filter candidate sets for contextual similarity computation. The methodology involves preprocessing source code and documents, building a dependency graph, and computing contextual similarity. Results show the proposed method achieves higher accuracy than existing techniques. Future work includes applying the approach to larger open source projects.
A Conceptual Dependency Graph Based Keyword Extraction Model for Source Code to API Documentation Mapping
1. A Conceptual Dependency Graph Based Keyword Extraction Model for Source Code to API Documentation Mapping
Prepared By
Nakul Sharma
Under Guidance of
Dr. Prasanth Yalla
Professor, Department of Computer Science and Engineering,
Koneru Lakshmaiah Education Foundation,
Vijayawada, Andhra Pradesh, India
2. Table of Contents
Introduction
Background
Mathematical Foundations
Genesis of Research
Proposed Methodology
Results and Discussion
Future Scope and Conclusion
References
3. Introduction
• Traditional key feature extraction techniques use terms or sentences from the project source code to form a unique code structure.
• Almost all traditional document key phrase extraction techniques represent a document collection as a phrase or sentence matrix, in which each row denotes the phrase or sentence ID and the corresponding column holds its frequency.
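A minimal sketch of such a phrase-frequency matrix, assuming single whitespace-separated terms stand in for phrases and that each column corresponds to one document. The function name `phrase_frequency_matrix` and the sample documents are hypothetical and are not taken from the presentation.

```python
from collections import Counter

def phrase_frequency_matrix(documents):
    """Return (phrases, matrix); matrix[r][c] is the count of phrase r in document c."""
    # Assumption: a "phrase" is a lower-cased, whitespace-separated term.
    counts = [Counter(doc.lower().split()) for doc in documents]
    phrases = sorted(set().union(*counts))
    # A Counter returns 0 for terms it has not seen, so every row has one cell per document.
    matrix = [[c[p] for c in counts] for p in phrases]
    return phrases, matrix

docs = ["open file read file", "close file"]
phrases, matrix = phrase_frequency_matrix(docs)
for phrase, row in zip(phrases, matrix):
    print(phrase, row)
```

The matrix records only how often a term occurs, not where or alongside what, which is the limitation the next slide raises.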
4. Introduction (Continued)
The main problem with existing systems is that they ignore context-based textual information.
Contextual information is especially relevant when undertaking a software change that affects not just the current phase of the project but also the previous and next phases.
Source code analysis also aids in checking the effect of a change on the code.
In the proposed model, a weighted dependency graph is used to filter the candidate sets among the vertices for contextual similarity computation.
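The filtering idea can be sketched as follows. The slides do not define the edge weights or the filtering rule, so this sketch assumes that a weight is the number of times one code element references another and that a pair becomes a candidate once its weight reaches a threshold. The names `build_weighted_graph`, `candidate_pairs`, `min_weight` and the sample classes are all hypothetical.

```python
def build_weighted_graph(dependencies):
    """Build an adjacency map {source: {target: weight}} from (source, target) pairs."""
    graph = {}
    for src, dst in dependencies:
        graph.setdefault(src, {})
        graph.setdefault(dst, {})  # keep vertices that have no outgoing edges
        # Assumption: the weight is the number of observed references.
        graph[src][dst] = graph[src].get(dst, 0) + 1
    return graph

def candidate_pairs(graph, min_weight):
    """Keep only the vertex pairs whose edge weight is at least min_weight."""
    return sorted(
        (src, dst)
        for src, neighbours in graph.items()
        for dst, weight in neighbours.items()
        if weight >= min_weight
    )

deps = [
    ("Order", "Customer"),
    ("Order", "Customer"),
    ("Order", "Logger"),
    ("Invoice", "Order"),
]
graph = build_weighted_graph(deps)
candidates = candidate_pairs(graph, min_weight=2)
print(candidates)
```

Only the surviving pairs would then be passed on to the more expensive similarity step.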
7. Genesis of Research
Work Done in Text Mining and its related fields
Research conducted by various authors
8. Related Work
Sr. No. | Name of Authors | Work Done in Brief
1 | S. Mohammadi et al. | Presented a new approach to extract the knowledge of dependency between artifacts in the source code.
2 | V. U. Gómez et al. | Proposed a semantic model for visually characterizing source code modifications.
3 | S. L. Abebe et al. | Introduced a new extraction scheme that is sufficiently effective to extract domain concepts from the source code.
4 | S. Bajracharya et al. | Developed a new SCA framework to collect and analyze open source code on a large scale.
5 | A. S. Yumaganov | Proposed to compare different search models for similarity with limitations on the source code.
9. Related Work
Sr. No. | Name of Authors | Work Done in Brief
1 | A. Dimitriou et al. | Introduced a new top-k-size keyword search on tree-structured data.
2 | W. Ding | Proposed a review of knowledge-based techniques for the software documentation process.
3 | Hussain et al. | Proposed a new software design pattern classification and selection scheme.
4 | Ibrahim et al. | Presented a scientometric re-ranking technique.
5 | L. H. Lee et al. | Used Bayesian text classification to introduce a high-relevance keyword extraction process.
10. Related Work (Related to Software Metrics)
Sr. No. | Name of Authors | Work Done in Brief
1 | A. Dimitriou et al. | Introduced a new top-k-size keyword search on tree-structured data.
2 | W. Ding | Proposed a review of knowledge-based techniques for the software documentation process.
3 | Hussain et al. | Proposed a new software design pattern classification and selection scheme.
4 | Ibrahim et al. | Presented a scientometric re-ranking technique.
5 | L. H. Lee et al. | Used Bayesian text classification to introduce a high-relevance keyword extraction process.
11. Observations on Related Work
Large open source projects have not been considered in the SCA systems and tools developed so far.
Existing systems also do not take contextual keyphrases into consideration when providing traceability links.
The current work proposes an alternative, contextual dependency graph based software metric in the form of contextual similarity.
12. Proposed Methodology
Figure 1: Module-1 (flowchart; box labels: Project source codes, Class parsing, Project API documentation, Text pre-processing, Filtered API documents, Code dependency graph, Proposed contextual dependency graph similarity)
14. Proposed Methodology
Phase 1: Source Code and API documents Pre-processing
Step 1: Read project source codes S.
Step 2: Read project API documents D.
Step 3: For each code Ci in S[]
Do
    Parse source code Ci with methods M and fields F.
    Mi = ExtractMethods(Ci)
    Fi = ExtractFields(Ci)
    Mapping (Mi, Fi) to Ci
        C1 (M1, F1)
        C2 (M2, F2)
        … …..
        Cn (Mn, Fn)
Done
Step 4: // Remove the duplicate methods and fields in each class
For each code Ci
Do
    $M_i = \mathrm{Prob}(M_i \cap M_j / C);\ i \neq j$
    $F_i = \mathrm{Prob}(F_i \cap F_j / C);\ i \neq j$
    If (Mi != 0 AND Fi != 0)
    Then
        Remove Mi in Ci or Cj
        Remove Fi in Ci or Cj
    End if
Done
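The Phase 1 steps can be sketched as below, under stated assumptions. The slides do not name the project's programming language, so this sketch parses Python classes with the standard `ast` module as a stand-in. `extract_methods` and `extract_fields` mirror the `ExtractMethods` and `ExtractFields` steps. The duplicate rule used here (keep the first occurrence of a method or field name and drop later ones) is only one reading of "Remove Mi in Ci or Cj", and it replaces the probability test with a plain name comparison. The names `parse_classes`, `remove_duplicates` and the sample source are hypothetical.

```python
import ast

def extract_methods(class_node):
    """Names of the functions defined directly in the class body."""
    return [n.name for n in class_node.body
            if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]

def extract_fields(class_node):
    """Names assigned directly in the class body (class-level fields)."""
    fields = []
    for n in class_node.body:
        if isinstance(n, ast.Assign):
            fields += [t.id for t in n.targets if isinstance(t, ast.Name)]
        elif isinstance(n, ast.AnnAssign) and isinstance(n.target, ast.Name):
            fields.append(n.target.id)
    return fields

def parse_classes(source):
    """Step 3: map each class Ci to its (Mi, Fi)."""
    tree = ast.parse(source)
    return {n.name: (extract_methods(n), extract_fields(n))
            for n in ast.walk(tree) if isinstance(n, ast.ClassDef)}

def _keep_new(names, seen):
    kept = []
    for name in names:
        if name not in seen:
            seen.add(name)
            kept.append(name)
    return kept

def remove_duplicates(mapping):
    """Step 4 (assumed rule): drop names already seen in an earlier class."""
    seen_methods, seen_fields = set(), set()
    return {cls: (_keep_new(m, seen_methods), _keep_new(f, seen_fields))
            for cls, (m, f) in mapping.items()}

SOURCE = '''
class A:
    count = 0
    def save(self): pass
    def load(self): pass

class B:
    count = 1
    name = ""
    def save(self): pass
    def render(self): pass
'''
mapping = parse_classes(SOURCE)
deduped = remove_duplicates(mapping)
print(deduped)
```

For a Java code base, a Java parser would take the place of `ast`; the mapping and de-duplication steps would keep the same shape.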
16. Future Scope and Conclusion
This paper proposed a novel approach to find the relationship between source code and API documents using the contextual dependency graph. The proposed method uses a two-pronged approach: the project source code is scanned for the relevant metrics, while the necessary information is extracted from the API documentation. The dependency graph is then used to compute the contextual similarity between the source code metrics and the corresponding API documents.
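The slides do not give the formula for contextual similarity. As a stand-in only, the sketch below scores a code element against an API document with plain cosine similarity over term counts. It is not the proposed dependency graph based measure, and the function name and sample terms are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bags of terms; 0.0 if either bag is empty."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical terms taken from one class and from one API document.
code_terms = ["order", "save", "customer"]
doc_terms = ["save", "order", "to", "database"]
score = cosine_similarity(code_terms, doc_terms)
print(round(score, 3))
```

In a pipeline like the one described, a score of this kind would be computed only for the candidate pairs that survive the dependency graph filter.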