FOCUS: A Recommender System for Mining API Function Calls and Usage Patterns

http://people.disim.univaq.it/diruscio/
davide.diruscio@univaq.it
@ddiruscio
Davide Di Ruscio
Joint work with Phuong T. Nguyen, Juri Di Rocco, Lina Ochoa, Thomas Degueule, Massimiliano Di Penta

ICSE 2019 – May 31, 2019 – Montréal, QC, Canada
Context
Development of new software systems by reusing existing open source components
Related activities
• Searching for candidate components
• Evaluating a set of retrieved candidate components to find the most suitable one
• Understanding how to use the selected components
• Monitoring the selected components
www.crossminer.org
@crossminer
eclipse.org/scava

CROSSMINER: high-level view
(Figure components: Mining and Knowledge Extraction Tools; Source code; Q&A systems; Bug Reports; API Documentation; Tutorials; Configuration Management Systems; Advanced IDEs)
Bringing to software development the notion of recommendation systems, which popular e-commerce systems typically use to present users with interesting items previously unknown to them

Kinds of recommendation
• Depending on the set of selected third-party libraries, the system recommends additional libraries that should be included in the project being developed
• Given a selected library, the system suggests alternative libraries that share some similarities with it
• Depending on the set of selected libraries, the system shows API documentation and Q&A posts that can help developers understand how to use those libraries
• During development, developers get recommendations about API function calls and usage patterns that might be used
…

Problem
“Which API methods should this piece of client code
invoke, considering that it has already invoked these
other API methods?”

Explanatory example: method under development

Explanatory example: method declaration
Method declaration (MD)
Method invocations (MI)

Explanatory example: complete method declaration

Explanatory example: requested recommendations

List of API function calls:
• get, equal, where, select, ...

Usage patterns:
• Snippets of code containing the recommended function calls

FOCUS
It recommends API FunctiOn Calls and USage patterns
It works on the basis of a context-aware collaborative-filtering system

Collaborative-Filtering Technique
Recommend products to customers with similar preferences
Image source: https://towardsdatascience.com/various-implementations-of-collaborative-filtering-100385c6dfe0

Collaborative-Filtering Technique
User-item matrix: ratings given to pizza restaurants (R1, R2, R3) by customers (c1, c2, c3); "?" is the unknown rating to be predicted
      R1  R2  R3
c1     5   5   2
c2     3   3   4
c3     5   5   ?
University of L'Aquila, Internal Meeting, 31 October 2017
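To make the "?" concrete, here is a minimal user-based collaborative-filtering sketch over the slide's matrix. The similarity measure (inverse Euclidean distance over co-rated restaurants) and the weighted-average prediction are illustrative assumptions chosen for simplicity, not the formulas used by FOCUS.

```python
import math

# Ratings from the slide: customers c1..c3, pizza restaurants R1..R3.
# c3 has no rating for R3 yet (the "?" in the matrix).
ratings = {
    "c1": {"R1": 5, "R2": 5, "R3": 2},
    "c2": {"R1": 3, "R2": 3, "R3": 4},
    "c3": {"R1": 5, "R2": 5},
}


def similarity(u, v):
    """Inverse Euclidean distance over co-rated items (an illustrative choice)."""
    common = ratings[u].keys() & ratings[v].keys()
    dist = math.sqrt(sum((ratings[u][i] - ratings[v][i]) ** 2 for i in common))
    return 1.0 / (1.0 + dist)


def predict(user, item):
    """Similarity-weighted average of the ratings that other users gave to `item`."""
    neighbours = [v for v in ratings if v != user and item in ratings[v]]
    weights = {v: similarity(user, v) for v in neighbours}
    return sum(weights[v] * ratings[v][item] for v in neighbours) / sum(weights.values())


print(round(predict("c3", "R3"), 3))  # 2.414
```

Because c3 rated R1 and R2 exactly like c1, c1 gets weight 1 while c2 (distance √8) gets about 0.26, so the prediction lands close to c1's rating of 2: R3 is unlikely to appeal to c3.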

CROSSMINER Toulouse Meeting, 10-12 June 2018
Context-aware recommendation

Examples of context: day of the week, hour of the day, weather conditions, …
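One common way to make collaborative filtering context-aware is contextual pre-filtering: keep only the ratings recorded in the same context as the request, then run an ordinary two-dimensional recommender on what remains. The sketch below assumes hypothetical ratings tagged with a day-type context; it shows the general idea only, not how FOCUS itself uses context (in FOCUS the context dimension corresponds to projects).

```python
from collections import defaultdict

# Hypothetical (customer, restaurant, context, rating) tuples; context = day type.
ratings = [
    ("c1", "R1", "weekend", 5), ("c1", "R2", "weekday", 2),
    ("c2", "R1", "weekday", 3), ("c2", "R2", "weekend", 4),
    ("c3", "R2", "weekend", 5), ("c3", "R1", "weekday", 1),
]


def prefilter(data, context):
    """Keep only ratings given in `context`, as a user -> item -> rating map."""
    matrix = defaultdict(dict)
    for user, item, ctx, rating in data:
        if ctx == context:
            matrix[user][item] = rating
    return dict(matrix)


weekend = prefilter(ratings, "weekend")
print(weekend)  # {'c1': {'R1': 5}, 'c2': {'R2': 4}, 'c3': {'R2': 5}}
```

The filtered map can then be fed to any standard user-item technique, such as the weighted average used for the pizza example.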

Predict the inclusion of additional invocations

FOCUS architecture

Code Parser
The available OSS repositories are mined to extract for each project:
- Method declarations
- Method invocations
- Field accesses
- Interface implementations
- Class extensions
- …
Rascal
Metaprogramming Language
https://www.rascal-mpl.org/
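FOCUS mines Java projects with Rascal. As a language-swapped analogue only, the sketch below uses Python's standard `ast` module to pull method declarations and the calls they contain out of Python source; the sample source and the choice to record only the called name (e.g. `execute` for `session.execute(...)`) are assumptions for illustration.

```python
import ast

SOURCE = '''
def load(session, query):
    rows = session.execute(query)
    return rows.fetchall()

def close(session):
    session.close()
'''


def extract(source):
    """Map each function declaration to the names of the calls it makes."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            calls = []
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call):
                    func = sub.func
                    # Record `obj.method(...)` as "method" and `f(...)` as "f".
                    if isinstance(func, ast.Attribute):
                        calls.append(func.attr)
                    elif isinstance(func, ast.Name):
                        calls.append(func.id)
            result[node.name] = calls
    return result


print(extract(SOURCE))  # {'load': ['execute', 'fetchall'], 'close': ['close']}
```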

FOCUS architecture

Data encoder
Extracted method declarations and invocations of each project are
represented in a corresponding rating matrix
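A minimal sketch of this encoding for a single project, assuming a binary rating: a cell is 1 when the declaration (row) contains the invocation (column) and 0 otherwise. The project data is hypothetical.

```python
def encode(project):
    """Turn {declaration: [invocations]} into a binary declaration x invocation matrix."""
    declarations = sorted(project)
    invocations = sorted({i for calls in project.values() for i in calls})
    matrix = [[1 if inv in project[d] else 0 for inv in invocations] for d in declarations]
    return declarations, invocations, matrix


project = {
    "load": ["execute", "fetchall"],
    "close": ["close"],
}
rows, cols, m = encode(project)
print(rows)  # ['close', 'load']
print(cols)  # ['close', 'execute', 'fetchall']
print(m)     # [[1, 0, 0], [0, 1, 1]]
```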

Representation of Projects-MDs-MIs
3D user-item-context
ratings matrix
Mappings:
– contexts ←→ projects
– users ←→ declarations
– items ←→ invocations
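Under the mappings above, stacking the per-project matrices gives a 3D tensor indexed by (project, declaration, invocation). The sketch below assumes binary ratings, hypothetical data, and a dense NumPy array for readability.

```python
import numpy as np

# Hypothetical mined data: project -> method declaration -> invoked API methods.
data = {
    "p1": {"loadUsers": {"get", "where"}, "saveUser": {"select"}},
    "p2": {"findOrder": {"get", "equal"}},
}

# One index per axis: contexts (projects), users (declarations), items (invocations).
projects = sorted(data)
declarations = sorted({d for decls in data.values() for d in decls})
invocations = sorted({i for decls in data.values() for calls in decls.values() for i in calls})
p_idx = {p: k for k, p in enumerate(projects)}
d_idx = {d: k for k, d in enumerate(declarations)}
i_idx = {i: k for k, i in enumerate(invocations)}

# tensor[p, d, i] == 1 iff declaration d of project p invokes i; all other cells stay 0.
tensor = np.zeros((len(projects), len(declarations), len(invocations)), dtype=np.int8)
for p, decls in data.items():
    for d, calls in decls.items():
        for i in calls:
            tensor[p_idx[p], d_idx[d], i_idx[i]] = 1

print(tensor.shape)  # (2, 3, 4)
```

Declarations that do not belong to a project leave all-zero rows in that project's slice, so a realistic corpus would call for a sparse representation instead of a dense array.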

Similarity calculator
Given an active declaration in an active project, we find:
- the subset of the most similar projects
- and then the most similar declarations in those similar projects
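The two-stage selection can be sketched as follows, assuming the similarity scores are already computed (the values and names are hypothetical). Note that `p3.a` is never considered despite its high score, because its project did not make the first cut.

```python
def top_k(scores, k):
    """Return the keys of the k highest-scoring entries (ties broken by key)."""
    return [key for key, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


# Hypothetical precomputed similarities for an active project and declaration.
project_sim = {"p2": 0.9, "p3": 0.4, "p4": 0.7}
declaration_sim = {
    "p2": {"p2.a": 0.8, "p2.b": 0.1},
    "p3": {"p3.a": 0.9},
    "p4": {"p4.a": 0.5, "p4.b": 0.6},
}

# Stage 1: most similar projects; stage 2: most similar declarations inside them.
similar_projects = top_k(project_sim, k=2)
similar_declarations = {p: top_k(declaration_sim[p], k=1) for p in similar_projects}
print(similar_projects)      # ['p2', 'p4']
print(similar_declarations)  # {'p2': ['p2.a'], 'p4': ['p4.b']}
```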

Similarity calculator: Projects and method declarations
Graph-based representation of projects and invocations
The similarity of two projects p and q is calculated by considering their feature vectors (TF-IDF)
The similarities among method declarations are calculated using the Jaccard similarity index
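Both measures can be sketched in a few lines. This version treats each project as a bag of invocations, weights them with plain TF-IDF (tf × log(N/df)), and compares projects by cosine similarity; it skips the graph representation, so it is a simplified vector-space stand-in rather than FOCUS's exact computation. The Jaccard index is the standard |A ∩ B| / |A ∪ B| over invocation sets. All data is hypothetical.

```python
import math
from collections import Counter

# Hypothetical projects, each described by the invocations it contains (with repeats).
projects = {
    "p": ["get", "get", "where", "select"],
    "q": ["get", "where", "equal"],
    "r": ["open", "close"],
}


def tfidf(projects):
    """Weight each invocation by term frequency times log inverse document frequency."""
    n = len(projects)
    df = Counter(i for calls in projects.values() for i in set(calls))
    return {
        name: {i: tf * math.log(n / df[i]) for i, tf in Counter(calls).items()}
        for name, calls in projects.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(i, 0.0) for i, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two sets of invocations."""
    return len(a & b) / len(a | b) if a | b else 0.0


vectors = tfidf(projects)
print(cosine(vectors["p"], vectors["q"]))
print(jaccard({"get", "where"}, {"get", "select"}))  # 0.3333333333333333
```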

FOCUS architecture

Recommendation engine: API function calls
Generation of a ranked list of API function calls
• Additional invocations for the active declaration are predicted by
computing the missing ratings
• Ranked list of invocations with scores in descending order
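A minimal sketch of turning the neighbourhood into a ranked list, assuming a simple weighted vote: each candidate invocation not yet in the active declaration scores the normalised product of project and declaration similarity of every neighbour that uses it. This is a simplification for illustration, not the exact rating-prediction formula of the FOCUS paper; the neighbourhood values are hypothetical.

```python
# Hypothetical output of the similarity calculator for one active declaration:
# (project similarity, declaration similarity, invocations of the similar declaration).
neighbours = [
    (0.9, 0.8, {"get", "where", "select"}),
    (0.9, 0.3, {"get", "equal"}),
    (0.7, 0.6, {"where", "orderBy"}),
]
query = {"get"}  # invocations the active declaration already contains


def rank(neighbours, query):
    """Score each missing invocation by the normalised weight of the neighbours using it."""
    total = sum(ps * ds for ps, ds, _ in neighbours)
    scores = {}
    for ps, ds, calls in neighbours:
        for inv in calls - query:
            scores[inv] = scores.get(inv, 0.0) + ps * ds / total
    # Descending score; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print([inv for inv, _ in rank(neighbours, query)])  # ['where', 'select', 'orderBy', 'equal']
```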

Recommendation engine: API usage patterns
From the ranked list, the top-N method invocations are used as a query to search for relevant declarations
Source code snippets containing the identified relevant declarations are retrieved from the available source code base
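The retrieval step can be sketched as a nearest-neighbour search: take the top-N recommended invocations as a query set, rank the declarations in the code base by Jaccard similarity to it, and return the matching snippets. The matching criterion, the code base, and the placeholder snippets are assumptions for illustration.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two sets of invocations."""
    return len(a & b) / len(a | b) if a | b else 0.0


def usage_patterns(ranked, code_base, n=3, limit=2):
    """Use the top-n recommended invocations as a query and return the best-matching snippets."""
    query = set(ranked[:n])
    matches = sorted(code_base, key=lambda d: (-jaccard(query, code_base[d][0]), d))
    return [code_base[d][1] for d in matches[:limit]]


ranked = ["get", "where", "select", "equal"]
# Hypothetical code base: declaration -> (invocations, source snippet).
code_base = {
    "findActive": ({"get", "where", "select"}, "// source of findActive"),
    "countAll": ({"select", "count"}, "// source of countAll"),
    "closeAll": ({"close"}, "// source of closeAll"),
}
print(usage_patterns(ranked, code_base))  # ['// source of findActive', '// source of countAll']
```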

Evaluation
Assessing FOCUS’s capability to recommend API function calls
– Accuracy (precision and recall)
– Success rate
– Time performance
Comparing FOCUS with a state-of-the-art tool (PAM*)
Two dataset sources:
– More than 600 GitHub projects retrieved from Software Heritage
– A set of 3,600 jars retrieved from Maven Central
* Jaroslav Fowkes, Charles Sutton. Parameter-free probabilistic API mining across GitHub. Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016)
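The accuracy metrics can be sketched as follows, using the usual top-N definitions (assumed here, since the slide only names them): precision@N is hits divided by N, recall@N is hits divided by the number of ground-truth invocations, and success rate@N is the share of queries with at least one hit in the top N. The example lists are hypothetical.

```python
def precision_recall_at_n(recommended, ground_truth, n):
    """Precision@N and recall@N of one ranked list against a non-empty ground-truth set."""
    hits = len(set(recommended[:n]) & ground_truth)
    return hits / n, hits / len(ground_truth)


def success_rate_at_n(results, n):
    """Share of queries with at least one correct invocation among the top-N."""
    successes = sum(1 for recommended, truth in results if set(recommended[:n]) & truth)
    return successes / len(results)


results = [
    (["where", "select", "orderBy"], {"select", "equal", "limit"}),
    (["open", "read"], {"close"}),
]
print(precision_recall_at_n(*results[0], n=2))  # (0.5, 0.3333333333333333)
print(success_rate_at_n(results, n=2))          # 0.5
```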

Evaluation process
(Figure label: Source code metadata)

Evaluation process: testing project
(Figure: a testing project annotated with the total number of declarations, the declarations that are kept (the rest are discarded), the total number of invocations in a given declaration, and the invocations that are used as query)

Evaluation process: testing project
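Four different configurations
• C1.1: the first half of the declarations is used as testing data and the second half is removed; only the first invocation is provided as a query, and the rest is used as ground-truth data
• C1.2: the first half of the declarations is used as testing data and the second half is removed; four invocations are provided as a query, and the rest is used as ground-truth data
• C2.1: the last method declaration is selected for testing and all the remaining declarations are used as training data; only the first invocation is provided as a query, and the rest is used as ground-truth data
• C2.2: the last method declaration is selected for testing and all the remaining declarations are used as training data; four invocations are provided as a query, and the rest is used as ground-truth data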
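A sketch of how one testing project could be split under the four configurations. It assumes that, in every configuration, the last kept declaration is the one under test and the earlier kept declarations serve as training data; that assumption and the sample project are illustrative, not taken from the FOCUS artifact.

```python
def split(declarations, config):
    """Build (training declarations, query, ground truth) for one testing project.

    declarations: list of (name, invocations) pairs in source order.
    config: one of "C1.1", "C1.2", "C2.1", "C2.2".
    """
    if config.startswith("C1"):
        kept = declarations[: max(1, len(declarations) // 2)]  # first half kept, second half removed
    else:
        kept = declarations  # all declarations kept
    # Assumption: the last kept declaration is the one under test; the others are training data.
    training, (_, calls) = kept[:-1], kept[-1]
    pi = 1 if config.endswith(".1") else 4  # invocations revealed as the query
    return training, calls[:pi], calls[pi:]


project = [
    ("d1", ["a", "b"]),
    ("d2", ["c", "d", "e", "f", "g"]),
    ("d3", ["h"]),
    ("d4", ["i", "j"]),
]
training, query, truth = split(project, "C1.2")
print([name for name, _ in training], query, truth)  # ['d1'] ['c', 'd', 'e', 'f'] ['g']
```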

Evaluation key points
The performance of FOCUS relies on the availability of background data
– the more OSS projects are available as background data, the more effectively the system works
Accuracy improves substantially when the query contains more invocations
(Figures: precision and recall for C1.1 and C1.2 on the SH (Software Heritage) dataset; precision and recall for C1.1 and C1.2 on the MV (Maven Central) dataset)

Evaluation key points
A dataset consisting of only 200 projects has been considered
Leave-one-out cross-validation has been performed so that, for each testing project, all the remaining projects serve as background data
PAM requires 9 seconds to provide each recommendation, while FOCUS needs just 0.095 seconds (roughly 95 times faster)
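Leave-one-out cross-validation over projects can be sketched as a generator that holds out each project once as the testing project and keeps all others as background data. Project names are hypothetical.

```python
def leave_one_out(projects):
    """Yield (testing project, background projects), holding out each project once."""
    for k, test in enumerate(projects):
        yield test, projects[:k] + projects[k + 1:]


for test, background in leave_one_out(["p1", "p2", "p3"]):
    print(test, background)
# p1 ['p2', 'p3']
# p2 ['p1', 'p3']
# p3 ['p1', 'p2']
```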

What’s next
Embedding FOCUS directly into the Eclipse IDE
– Under development in CROSSMINER
A user study to thoroughly evaluate the system’s performance

Conclusions
https://github.com/crossminer/FOCUS
