Software developers interact with APIs on a daily basis and, therefore, often face the need to learn how to use new APIs suitable for their purposes. Previous work has shown that recommending usage patterns to developers facilitates the learning process. Current approaches to usage pattern recommendation, however, still suffer from high redundancy and poor run-time performance. In this paper, we reformulate the problem of usage pattern recommendation in terms of a collaborative filtering recommender system. We present a new tool, FOCUS, which mines open-source project repositories to recommend API method invocations and usage patterns by analyzing how APIs are used in projects similar to the current project. We evaluate FOCUS on a large number of Java projects extracted from GitHub and Maven Central and find that it outperforms the state-of-the-art approach PAM with regard to success rate, accuracy, and execution time. Results indicate the suitability of context-aware collaborative-filtering recommender systems to provide API usage patterns.
2. ICSE 2019 – May 31, 2019 – Montréal, QC, Canada
Context
Development of new software systems by reusing existing open source components
Related activities
• Searching for candidate components
• Evaluating a set of retrieved candidate components to find the most suitable one
• Understanding how to use the selected components
• Monitoring the selected components
www.crossminer.org
@crossminer
eclipse.org/scava
3.
CROSSMINER: high-level view
Bringing to the domain of software development the notion of recommendation systems that are typically used for popular e-commerce systems to present users with interesting items previously unknown to them
[Diagram labels: Mining and Knowledge Extraction Tools; Source code; Q&A systems; Bug Reports; API Documentation; Tutorials; Configuration Management Systems; Advanced IDEs]
4.
Kinds of recommendation
• Depending on the set of selected third-party libraries, the system is able to recommend additional libraries that should be included in the project being developed
• Given a selected library, the system is able to suggest alternative ones that share some similarities with the selected one
• Depending on the set of selected libraries, the system shows API documentation and Q&A posts that can help developers to understand how to use the selected libraries
• During the development, developers get recommendations about API function calls and usage patterns that might be used
…
5.
Problem
“Which API methods should this piece of client code invoke, considering that it has already invoked these other API methods?”
6.
Explanatory example: method under development
10.
Explanatory example: requested recommendations
List of API function calls:
• get, equal, where, select, ...
11.
Explanatory example: requested recommendations
Usage patterns:
• Snippets of code containing the recommended function calls
12.
FOCUS
It recommends API FunctiOn Calls and USage patterns
It works on the basis of a context-aware collaborative-filtering system
13.
Collaborative-Filtering Technique
Recommend products to customers with similar preferences
Image source: https://towardsdatascience.com/various-implementations-of-collaborative-filtering-100385c6dfe0
14.
Collaborative-Filtering Technique
University of L'Aquila, Internal Meeting, 31 October 2017
User-item matrix: ratings given to pizza restaurants (R1 to R3) by customers (c1 to c3)
      R1   R2   R3
c1     5    5    2
c2     3    3    4
c3     5    5    ?
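As an illustration of how the missing rating of c3 for R3 could be filled in, here is a minimal sketch of user-based collaborative filtering on the matrix above. The similarity function (inverse Euclidean distance over commonly rated restaurants) and the similarity-weighted average are assumptions chosen for this example; the slide does not state which formula is used.

```python
import math

# Ratings from the slide; None marks the unknown rating to predict.
ratings = {
    "c1": {"R1": 5, "R2": 5, "R3": 2},
    "c2": {"R1": 3, "R2": 3, "R3": 4},
    "c3": {"R1": 5, "R2": 5, "R3": None},
}

def similarity(u, v):
    """Assumed similarity: 1 / (1 + Euclidean distance) over items both users rated."""
    common = [i for i in u if u[i] is not None and v.get(i) is not None]
    if not common:
        return 0.0
    distance = math.sqrt(sum((u[i] - v[i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)

def predict(ratings, user, item):
    """Similarity-weighted average of the ratings other users gave to `item`."""
    weighted = total = 0.0
    for other, row in ratings.items():
        if other == user or row.get(item) is None:
            continue
        sim = similarity(ratings[user], row)
        weighted += sim * row[item]
        total += sim
    return weighted / total if total else None

predicted = predict(ratings, "c3", "R3")
print(f"Predicted rating of c3 for R3: {predicted:.2f}")
```

Because c3 rated R1 and R2 exactly as c1 did, the prediction lands nearer to c1's rating of 2 than to c2's rating of 4.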
15.
University of L'Aquila, CROSSMINER Toulouse Meeting, 10-12 June 2018
Context-aware recommendation
16.
Context-aware recommendation
Examples of context: day of the week, hour of the day, weather conditions, …
17.
Context-aware recommendation
Predict the inclusion of additional invocations
20.
Code Parser
The available OSS repositories are mined to extract for each project:
- Method declarations
- Method invocations
- Field accesses
- Interface implementations
- Class extensions
- …
Rascal Metaprogramming Language: https://www.rascal-mpl.org/
23.
Data encoder
Extracted method declarations and invocations of each project are represented in a corresponding rating matrix
24.
Representation of projects, method declarations (MDs), and method invocations (MIs)
3D user-item-context ratings matrix
Mappings:
– contexts ←→ projects
– users ←→ declarations
– items ←→ invocations
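The mapping above can be sketched as a nested structure in which each project (context) holds one row per method declaration (user) and one column per method invocation (item). This is only an illustrative sketch: the project, declaration, and invocation names are hypothetical, and the binary rating (1 if the declaration contains the invocation, 0 otherwise) is an assumption, since the slides do not state the rating values.

```python
# Hypothetical mined data: project -> method declaration -> set of invoked API methods.
projects = {
    "p1": {
        "p1.A.load()": {"List.add", "File.open"},
        "p1.A.save()": {"File.open", "File.write"},
    },
    "p2": {
        "p2.B.run()": {"List.add", "File.write"},
    },
}

def encode(projects):
    """Build one declaration-by-invocation matrix per project (the context dimension)."""
    # Shared column order across all projects so rows are comparable.
    invocations = sorted(
        {mi for decls in projects.values() for mis in decls.values() for mi in mis}
    )
    matrix = {}
    for project, decls in projects.items():
        matrix[project] = {
            decl: [1 if mi in mis else 0 for mi in invocations]  # assumed binary rating
            for decl, mis in decls.items()
        }
    return invocations, matrix

columns, matrix = encode(projects)
print(columns)
print(matrix["p1"]["p1.A.load()"])
```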
25.
Similarity calculator
Given an active declaration in an active project, we find the subset of:
- the most similar projects
- and then the most similar declarations in those similar projects
26.
Similarity calculator: Projects and method declarations
Graph-based representation of projects and invocations
The similarity of two projects p and q is calculated by considering their feature vectors (TF-IDF)
The similarities among method declarations are calculated using the Jaccard similarity index
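The two similarity measures named above can be sketched as follows. The Jaccard index is standard. For projects, the slide only mentions TF-IDF feature vectors, so comparing those vectors by cosine similarity and using the plain log(N/df) weighting are assumptions of this sketch, and the projects p, q, r with their invocation lists are hypothetical.

```python
import math

def jaccard(a, b):
    """Jaccard index between the invocation sets of two method declarations."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def tf_idf_vectors(project_invocations):
    """Weight each invocation by term frequency times log(N / document frequency)."""
    n = len(project_invocations)
    df = {}
    for terms in project_invocations.values():
        for t in set(terms):
            df[t] = df.get(t, 0) + 1
    return {
        p: {t: terms.count(t) * math.log(n / df[t]) for t in set(terms)}
        for p, terms in project_invocations.items()
    }

def cosine(u, v):
    """Cosine similarity between two sparse vectors (assumed comparison measure)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical projects described by the API invocations they contain.
usage = {"p": ["get", "where", "where"], "q": ["get", "where"], "r": ["select"]}
vectors = tf_idf_vectors(usage)
sim_pq = cosine(vectors["p"], vectors["q"])
sim_pr = cosine(vectors["p"], vectors["r"])
decl_sim = jaccard({"get", "where"}, {"get", "select"})
print(sim_pq, sim_pr, decl_sim)
```

Projects p and q share invocations and therefore score higher than p and r, which share none.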
28.
Recommendation engine: API function calls
Generation of a ranked list of API function calls
• Additional invocations for the active declaration are predicted by computing the missing ratings
• Ranked list of invocations with scores in descending order
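One way to picture the ranking step: each invocation missing from the active declaration receives a score derived from the similar declarations that contain it, and the candidates are sorted by score. The sketch below assumes a similarity-weighted average of binary ratings; the exact formula used by FOCUS is not given on the slide, and the neighbour declarations and similarity values are hypothetical.

```python
def rank_invocations(active, neighbours):
    """Score invocations absent from the active declaration.

    active: set of invocations already present in the active declaration.
    neighbours: list of (similarity, set_of_invocations) for similar declarations.
    Returns (invocation, score) pairs in descending score order.
    """
    total_sim = sum(sim for sim, _ in neighbours)
    scores = {}
    for sim, invocations in neighbours:
        for mi in invocations - active:
            # Assumed binary rating: a neighbour contributes its similarity if it has mi.
            scores[mi] = scores.get(mi, 0.0) + sim
    if total_sim:
        scores = {mi: s / total_sim for mi, s in scores.items()}
    # Sort by score descending, then by name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

active = {"get"}
neighbours = [
    (0.8, {"get", "where", "select"}),
    (0.4, {"get", "where"}),
    (0.2, {"equal"}),
]
ranked = rank_invocations(active, neighbours)
for invocation, score in ranked:
    print(f"{invocation}: {score:.3f}")
```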
29.
Recommendation engine: API usage patterns
From the ranked list, the top-N method invocations are used as a query to search for relevant declarations
Source code snippets containing the identified relevant declarations are retrieved from the available source code base
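The retrieval step described above might look roughly like this: the top-N recommended invocations form a query, the declarations in the code base are compared against it, and the snippets of the best matches are returned. Matching by Jaccard index is an assumption of this sketch (the slide does not say how relevance is measured), and the code base entries are hypothetical placeholders.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def find_usage_patterns(ranked, code_base, top_n=2, k=1):
    """Return the snippets of the k declarations most similar to the top-N query.

    ranked: (invocation, score) pairs in descending score order.
    code_base: declaration name -> {"invocations": set, "snippet": str}.
    """
    query = {mi for mi, _ in ranked[:top_n]}
    scored = sorted(
        code_base.items(),
        key=lambda kv: (-jaccard(query, kv[1]["invocations"]), kv[0]),
    )
    return [entry["snippet"] for _, entry in scored[:k]]

# Hypothetical ranked list and code base.
ranked = [("where", 0.9), ("select", 0.6), ("equal", 0.1)]
code_base = {
    "Repo.findAll": {
        "invocations": {"where", "select", "get"},
        "snippet": "// body of Repo.findAll",
    },
    "Util.log": {
        "invocations": {"println"},
        "snippet": "// body of Util.log",
    },
}
patterns = find_usage_patterns(ranked, code_base)
print(patterns)
```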
30.
Evaluation
Assessing FOCUS's capability to recommend API function calls
– Accuracy (precision and recall)
– Success rate
– Time performance
Comparing FOCUS with a state-of-the-art tool (PAM*)
Two dataset sources:
– More than 600 GitHub projects retrieved from Software Heritage
– A set of 3,600 jars retrieved from Maven Central
* Jaroslav Fowkes, Charles Sutton. Parameter-free probabilistic API mining across GitHub. Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016)
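The accuracy and success-rate metrics listed on this slide can be computed as in the sketch below. It assumes the usual top-N definitions (precision is hits divided by N, recall is hits divided by the size of the ground truth, and a query succeeds if at least one recommended item is in the ground truth); the slide does not spell these out, and the sample queries are hypothetical.

```python
def precision_recall(recommended, ground_truth, n):
    """Precision and recall of the top-n recommended invocations."""
    top = recommended[:n]
    hits = len(set(top) & set(ground_truth))
    precision = hits / n
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall

def success_rate(results, n):
    """Fraction of queries whose top-n list contains at least one ground-truth item.

    results: list of (recommended, ground_truth) pairs, one per query.
    """
    successes = sum(1 for rec, gt in results if set(rec[:n]) & set(gt))
    return successes / len(results)

# Hypothetical outcomes of two queries.
results = [
    (["where", "select", "equal"], ["select", "get"]),
    (["add", "remove"], ["size"]),
]
precision, recall = precision_recall(*results[0], n=2)
rate = success_rate(results, n=2)
print(precision, recall, rate)
```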
31.
Evaluation process
Source Code
metadata
32.
Evaluation process: testing project
• Total number of declarations
• Declarations that are kept (the rest are discarded)
• Total number of invocations in a given declaration
• Invocations that are used as query
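33.
Evaluation process: testing project
Four different configurations
• C1.1: The first half of the declarations is used as testing data and the second half is removed. Only the first invocation is provided as a query, and the rest is used as ground-truth data
• C1.2: The first half of the declarations is used as testing data and the second half is removed. Four invocations are provided as a query, and the rest is used as ground-truth data
• C2.1: The last method declaration is selected as testing data and all the remaining declarations are used as training data. Only the first invocation is provided as a query, and the rest is used as ground-truth data
• C2.2: The last method declaration is selected as testing data and all the remaining declarations are used as training data. Four invocations are provided as a query, and the rest is used as ground-truth data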
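The four configurations can be pictured as two knobs: how many declarations of the testing project are kept, and how many invocations of the tested declaration go into the query. The sketch below is an interpretation under stated assumptions, not the authors' evaluation code: it assumes the last kept declaration is the one under test and that an odd number of declarations is halved by rounding down. The declaration and invocation names are hypothetical.

```python
# (which declarations are kept, number of invocations used as query)
CONFIGS = {
    "C1.1": ("first_half", 1),
    "C1.2": ("first_half", 4),
    "C2.1": ("all", 1),
    "C2.2": ("all", 4),
}

def prepare_testing_project(declarations, config):
    """Split a testing project into context, query, and ground truth.

    declarations: ordered list of (name, [invocations]).
    Returns (other kept declarations, tested name, query, ground truth).
    """
    keep, query_size = CONFIGS[config]
    if keep == "first_half":
        kept = declarations[: max(1, len(declarations) // 2)]
    else:
        kept = list(declarations)
    # Assumption of this sketch: the last kept declaration is the one under test.
    name, invocations = kept[-1]
    return kept[:-1], name, invocations[:query_size], invocations[query_size:]

declarations = [
    ("m1", ["a", "b"]),
    ("m2", ["c", "d", "e", "f", "g"]),
    ("m3", ["x", "y"]),
    ("m4", ["h", "i", "j", "k", "l", "m"]),
]
c11 = prepare_testing_project(declarations, "C1.1")
c22 = prepare_testing_project(declarations, "C2.2")
print(c11)
print(c22)
```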
34.
Evaluation key points
The performance of FOCUS relies on the availability of background data
– the system works more effectively when more OSS projects are available for recommendation
Accuracy improves substantially when the query contains more invocations
[Figures: precision and recall for C1.1 and C1.2 on the SH dataset; precision and recall for C1.1 and C1.2 on the MV dataset]
35.
Evaluation key points
A dataset consisting of only 200 projects has been considered
Leave-one-out cross-validation has been performed to exploit as much as possible the projects available as background data, given a testing project
PAM requires 9 seconds to provide each recommendation while FOCUS just needs 0.095 seconds
36.
What’s next
Embedding FOCUS directly into the Eclipse IDE
– Under development in CROSSMINER
A user study to thoroughly assess the system’s performance