SlideShare a Scribd company logo
1 of 37
Download to read offline
Uncovering Library Features from API
Usage on Stack Overflow
The 29th IEEE International Conference on Software Analysis, Evolution and Reengineering
Camilo Velázquez-Rodríguez, Eleni Constantinou and Coen De Roover
March 15th, 2022
@cvelazquezr
1
Introduction
JSoup library
1
Introduction
JSoup library
…
Introduction
2
Introduction
API Features
2
Introduction
API Features Features
Persistent or thread-safe?
Methods for sorting?
Methods for reversing?
Filter a collection?
Transform collections?
2
Introduction
API Features Features Use of Features
Persistent or thread-safe?
Methods for sorting?
Methods for reversing?
Filter a collection?
Transform collections?
Single method call?
Group of method calls?
Static or instance method calls?
How do libraries compare feature-wise?
2
Introduction
“… a set of data structures (i.e., fields and classes) and operations
(i.e., functions and methods) participating in the realisation of a
(…) functionality.”
“… a set of API calls with a corresponding name.”
[1] - Antoniol, G. & Guéhéneuc, Y.-G. Feature Identification: A Novel Approach and a Case Study. 21st Ieee Int Conf Softw Maintenance Icsm’05 1–10 (2005).

[2] - Kanda, T., Manabe, Y., Ishio, T., Matsushita, M. & Inoue, K. Semi-Automatically Extracting Features from Source Code of Android Applications. Ieice T Inf Syst, 2857–2859 (2013). 3
Approach
4
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Approach
5
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Guava
…
Apache

BCEL
Approach
5
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Guava
…
Apache

BCEL
6
6
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Approach
6
6
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Approach
Approach
7
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
- Post Titles

- Body of the Question

- Body of the Answer
- Method Names

- Method Names split by Camel
Case 

- API Calls
Textual Information API Usage Information
7
Jaccard
Similarity
[1.0, -0.12, ..., 0.5]
[0.1, 1.0, ..., 0.2]
[0.4, -0.5, ..., 0.98]
[0.37, 0.19, ..., 0.32]
[0.81, 0.44, ..., 1.0]
Approach
TF-IDF
model
[0.7, -0.12, ..., 0.5]
[0.1, 0.45, ..., 0.2]
[0.4, -0.5, ..., 0.98]
[0.37, 0.19, ..., 0.32]
[0.81, 0.44, ..., 0.12]
Cosine
Similarity
[1.0, -0.12, ..., 0.5]
[0.1, 1.0, ..., 0.2]
[0.4, -0.5, ..., 0.98]
[0.37, 0.19, ..., 0.32]
[0.81, 0.44, ..., 1.0]
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
- Post Titles

- Body of the Question

- Body of the Answer
- Method Names

- Method Names split by Camel
Case 

- API Calls
Textual Information API Usage Information
Approach
8
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Approach
8
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
[3] - P. Langfelder, B. Zhang, and S. Horvath, “Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R,” Bioinformatics, vol. 24, no. 5, pp. 719–720, 2008.
Approach
9
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Formed clusters would ideally have
frequent API calls, characterising the
code-part of the cluster.
[4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000.

[5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
Approach
9
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Formed clusters would ideally have
frequent API calls, characterising the
code-part of the cluster.
[4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000.

[5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
Approach
9
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Formed clusters would ideally have
frequent API calls, characterising the
code-part of the cluster.
[4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000.

[5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
Approach
9
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Formed clusters would ideally have
frequent API calls, characterising the
code-part of the cluster.
[4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000.

[5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
Approach
9
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Formed clusters would ideally have
frequent API calls, characterising the
code-part of the cluster.
[4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000.

[5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
Approach
9
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Formed clusters would ideally have
frequent API calls, characterising the
code-part of the cluster.
[4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000.

[5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
Evaluation
RQ1. Which combination of SO answer attributes produces the most cohesive clusters? 

10
Text Code
[6] - L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990, vol. 344.
Evaluation
RQ1. Which combination of SO answer attributes produces the most cohesive clusters? 

11
The Methods combination from the
API usage information attributes
produced the most cohesive
clusters.
Evaluation
RQ2. How similar are the automatically uncovered features to documented tutorial features? 

12
Generated features
Evaluation
RQ2. How similar are the automatically uncovered features to documented tutorial features? 

13
The majority of the tutorial features is
part of our uncovered features.
Evaluation
RQ2. How similar are the automatically uncovered features to documented tutorial features? 

14
When considered against the usages
in the SO data, the matched accuracy
increases.
Evaluation
RQ2. How similar are the automatically uncovered features to documented tutorial features? 

15
On average, our uncovered features
are very similar to those in tutorials
denoted by the relevance score.
Evaluation
RQ3. To which extent do the uncovered features that do not match documented tutorial features
correspond to actual API usage in client projects? 

17
Guava HttpClient JFreeChart …
Evaluation
RQ3. To which extent do the uncovered features that do not match documented tutorial features
correspond to actual API usage in client projects? 

18
The majority of our uncovered features
are also used in GitHub client projects.
Evaluation
RQ3. To which extent do the uncovered features that do not match documented tutorial features
correspond to actual API usage in client projects? 

19
The decrease in the relevance might
be caused by a less-focused usage of
the features in GitHub.
Examples of uncovered features
21
PDFBox Several libraries
Good features Bad features
Tool LiFUSO
22
https://github.com/softwarelanguageslab/lifuso
Potential application
23
Maintainers
Potential application
23
Maintainers Users
Conclusion

More Related Content

What's hot

Algorithm Name Detection & Extraction
Algorithm Name Detection & ExtractionAlgorithm Name Detection & Extraction
Algorithm Name Detection & ExtractionDeeksha thakur
 
INDUS: A System for Information Integration and Knowledge Acquisition from Au...
INDUS: A System for Information Integration and Knowledge Acquisition from Au...INDUS: A System for Information Integration and Knowledge Acquisition from Au...
INDUS: A System for Information Integration and Knowledge Acquisition from Au...Jie Bao
 
Cross language information retrieval (clir)slide
Cross language information retrieval (clir)slideCross language information retrieval (clir)slide
Cross language information retrieval (clir)slideMohd Iqbal Al-farabi
 
Ir 1 lec 7
Ir 1 lec 7Ir 1 lec 7
Ir 1 lec 7alaa223
 
Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)
Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)
Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)Riccardo Albertoni
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsGaignard Alban
 

What's hot (6)

Algorithm Name Detection & Extraction
Algorithm Name Detection & ExtractionAlgorithm Name Detection & Extraction
Algorithm Name Detection & Extraction
 
INDUS: A System for Information Integration and Knowledge Acquisition from Au...
INDUS: A System for Information Integration and Knowledge Acquisition from Au...INDUS: A System for Information Integration and Knowledge Acquisition from Au...
INDUS: A System for Information Integration and Knowledge Acquisition from Au...
 
Cross language information retrieval (clir)slide
Cross language information retrieval (clir)slideCross language information retrieval (clir)slide
Cross language information retrieval (clir)slide
 
Ir 1 lec 7
Ir 1 lec 7Ir 1 lec 7
Ir 1 lec 7
 
Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)
Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)
Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 

Similar to Uncovering Library Features from API Usage on SO

Component Search and Retrieval
Component Search and RetrievalComponent Search and Retrieval
Component Search and RetrievalEduardo Cruz
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use CasesCarole Goble
 
Searching Heterogenous E Learning Resources
Searching Heterogenous E Learning ResourcesSearching Heterogenous E Learning Resources
Searching Heterogenous E Learning Resourcesimranlatif
 
Searching Repositories of Web Application Models
Searching Repositories of Web Application ModelsSearching Repositories of Web Application Models
Searching Repositories of Web Application ModelsMarco Brambilla
 
Reduce Query Time Up to 60% with Selective Search
Reduce Query Time Up to 60% with Selective SearchReduce Query Time Up to 60% with Selective Search
Reduce Query Time Up to 60% with Selective SearchLucidworks
 
bridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the webbridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the webFabien Gandon
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research ObjectsCarole Goble
 
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskSaurabh Saxena
 
B sc it syit sem 3 sem 4 syllabus as per mumbai university
B sc it syit sem 3 sem 4 syllabus as per mumbai universityB sc it syit sem 3 sem 4 syllabus as per mumbai university
B sc it syit sem 3 sem 4 syllabus as per mumbai universitytanujaparihar
 
Best C Sharp C# Training Online C# Online Course C# Online Training Best on...
Best C Sharp C# Training Online C# Online Course   C# Online Training Best on...Best C Sharp C# Training Online C# Online Course   C# Online Training Best on...
Best C Sharp C# Training Online C# Online Course C# Online Training Best on...Evanta Technologies
 
The Structure of Computer Science Knowledge Network
The Structure of Computer Science Knowledge NetworkThe Structure of Computer Science Knowledge Network
The Structure of Computer Science Knowledge NetworkPham Cuong
 
ICMT 2016: Search-Based Model Transformations with MOMoT
ICMT 2016: Search-Based Model Transformations with MOMoTICMT 2016: Search-Based Model Transformations with MOMoT
ICMT 2016: Search-Based Model Transformations with MOMoTMartin Fleck
 
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into EurekaACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into EurekaStuart Chalk
 
Semantics-aware Content-based Recommender Systems
Semantics-aware Content-based Recommender SystemsSemantics-aware Content-based Recommender Systems
Semantics-aware Content-based Recommender SystemsPasquale Lops
 
Open Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and ExchangeOpen Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and Exchangelagoze
 

Similar to Uncovering Library Features from API Usage on SO (20)

Component Search and Retrieval
Component Search and RetrievalComponent Search and Retrieval
Component Search and Retrieval
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
 
Searching Heterogenous E Learning Resources
Searching Heterogenous E Learning ResourcesSearching Heterogenous E Learning Resources
Searching Heterogenous E Learning Resources
 
CEDAR Technologies for AIRR Submissions
CEDAR Technologies for AIRR SubmissionsCEDAR Technologies for AIRR Submissions
CEDAR Technologies for AIRR Submissions
 
Searching Repositories of Web Application Models
Searching Repositories of Web Application ModelsSearching Repositories of Web Application Models
Searching Repositories of Web Application Models
 
Reduce Query Time Up to 60% with Selective Search
Reduce Query Time Up to 60% with Selective SearchReduce Query Time Up to 60% with Selective Search
Reduce Query Time Up to 60% with Selective Search
 
bridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the webbridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the web
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
 
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
 
B sc it syit sem 3 sem 4 syllabus as per mumbai university
B sc it syit sem 3 sem 4 syllabus as per mumbai universityB sc it syit sem 3 sem 4 syllabus as per mumbai university
B sc it syit sem 3 sem 4 syllabus as per mumbai university
 
Best C Sharp C# Training Online C# Online Course C# Online Training Best on...
Best C Sharp C# Training Online C# Online Course   C# Online Training Best on...Best C Sharp C# Training Online C# Online Course   C# Online Training Best on...
Best C Sharp C# Training Online C# Online Course C# Online Training Best on...
 
Wi presentation
Wi presentationWi presentation
Wi presentation
 
The Chemtools LaBLog
The Chemtools LaBLogThe Chemtools LaBLog
The Chemtools LaBLog
 
The Structure of Computer Science Knowledge Network
The Structure of Computer Science Knowledge NetworkThe Structure of Computer Science Knowledge Network
The Structure of Computer Science Knowledge Network
 
ICMT 2016: Search-Based Model Transformations with MOMoT
ICMT 2016: Search-Based Model Transformations with MOMoTICMT 2016: Search-Based Model Transformations with MOMoT
ICMT 2016: Search-Based Model Transformations with MOMoT
 
653 656
653 656653 656
653 656
 
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into EurekaACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
 
Semantics-aware Content-based Recommender Systems
Semantics-aware Content-based Recommender SystemsSemantics-aware Content-based Recommender Systems
Semantics-aware Content-based Recommender Systems
 
Open Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and ExchangeOpen Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and Exchange
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 

Recently uploaded

WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )Pooja Nehwal
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@vikas rana
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Salam Al-Karadaghi
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...NETWAYS
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Pooja Nehwal
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024eCommerce Institute
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptssuser319dad
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝soniya singh
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrsaastr
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...NETWAYS
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Krijn Poppe
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...NETWAYS
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...NETWAYS
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxFamilyWorshipCenterD
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024eCommerce Institute
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfhenrik385807
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...NETWAYS
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...henrik385807
 
Motivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfMotivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfakankshagupta7348026
 

Recently uploaded (20)

WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.ppt
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
 
Motivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfMotivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdf
 

Uncovering Library Features from API Usage on SO

  • 1. Uncovering Library Features from API Usage on Stack Overflow The 29th IEEE International Conference on Software Analysis, Evolution and Reengineering Camilo Velázquez-Rodríguez, Eleni Constantinou and Coen De Roover March 15th, 2022 @cvelazquezr
  • 6. Introduction API Features Features Persistent or thread-safe? Methods for sorting? Methods for reversing? Filter a collection? Transform collections? 2
  • 7. Introduction API Features Features Use of Features Persistent or thread-safe? Methods for sorting? Methods for reversing? Filter a collection? Transform collections? Single method call? Group of method calls? Static or instance method calls? How do libraries compare feature-wise? 2
  • 8. Introduction “… a set of data structures (i.e., fields and classes) and operations (i.e., functions and methods) participating in the realisation of a (…) functionality.” “… a set of API calls with a corresponding name.” [1] - Antoniol, G. & Guéhéneuc, Y.-G. Feature Identification: A Novel Approach and a Case Study. 21st Ieee Int Conf Softw Maintenance Icsm’05 1–10 (2005). [2] - Kanda, T., Manabe, Y., Ishio, T., Matsushita, M. & Inoue, K. Semi-Automatically Extracting Features from Source Code of Android Applications. Ieice T Inf Syst, 2857–2859 (2013). 3
  • 9. Approach 4 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K
  • 10. Approach 5 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Guava … Apache BCEL
  • 11. Approach 5 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Guava … Apache BCEL
  • 12. 6 6 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Approach
  • 13. 6 6 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Approach
  • 14. Approach 7 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K - Post Titles - Body of the Question - Body of the Answer - Method Names - Method Names split by Camel Case - API Calls Textual Information API Usage Information
  • 15. 7 Jaccard Similarity [1.0, -0.12, ..., 0.5] [0.1, 1.0, ..., 0.2] [0.4, -0.5, ..., 0.98] [0.37, 0.19, ..., 0.32] [0.81, 0.44, ..., 1.0] Approach TF-IDF model [0.7, -0.12, ..., 0.5] [0.1, 0.45, ..., 0.2] [0.4, -0.5, ..., 0.98] [0.37, 0.19, ..., 0.32] [0.81, 0.44, ..., 0.12] Cosine Similarity [1.0, -0.12, ..., 0.5] [0.1, 1.0, ..., 0.2] [0.4, -0.5, ..., 0.98] [0.37, 0.19, ..., 0.32] [0.81, 0.44, ..., 1.0] Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K - Post Titles - Body of the Question - Body of the Answer - Method Names - Method Names split by Camel Case - API Calls Textual Information API Usage Information
  • 16. Approach 8 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K
  • 17. Approach 8 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K [3] - P. Langfelder, B. Zhang, and S. Horvath, “Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R,” Bioinformatics, vol. 24, no. 5, pp. 719–720, 2008.
  • 18. Approach 9 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Formed clusters would ideally have frequent API calls, characterising the code-part of the cluster. [4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000. [5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
  • 19. Approach 9 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Formed clusters would ideally have frequent API calls, characterising the code-part of the cluster. [4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000. [5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
  • 20. Approach 9 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Formed clusters would ideally have frequent API calls, characterising the code-part of the cluster. [4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000. [5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
  • 21. Approach 9 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Formed clusters would ideally have frequent API calls, characterising the code-part of the cluster. [4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000. [5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
  • 22. Approach 9 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Formed clusters would ideally have frequent API calls, characterising the code-part of the cluster. [4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000. [5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
  • 23. Approach 9 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Formed clusters would ideally have frequent API calls, characterising the code-part of the cluster. [4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000. [5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
  • 24. Evaluation RQ1. Which combination of SO answer attributes produces the most cohesive clusters? 
 10 Text Code [6] - L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990, vol. 344.
  • 25. Evaluation RQ1. Which combination of SO answer attributes produces the most cohesive clusters? 
 11 The Methods combination from the API usage information attributes produced the most cohesive clusters.
  • 26. Evaluation RQ2. How similar are the automatically uncovered features to documented tutorial features? 
 12 Generated features
  • 27. Evaluation RQ2. How similar are the automatically uncovered features to documented tutorial features? 
 13 The majority of the tutorial features is part of our uncovered features.
  • 28. Evaluation RQ2. How similar are the automatically uncovered features to documented tutorial features? 
 14 When considered against the usages in the SO data, the matched accuracy increases.
  • 29. Evaluation RQ2. How similar are the automatically uncovered features to documented tutorial features? 
 15 On average, our uncovered features are very similar to those in tutorials denoted by the relevance score.
  • 30. Evaluation RQ3. To which extent do the uncovered features that do not match documented tutorial features correspond to actual API usage in client projects? 
 17 Guava HttpClient JFreeChart …
  • 31. Evaluation RQ3. To which extent do the uncovered features that do not match documented tutorial features correspond to actual API usage in client projects? 
 18 The majority of our uncovered features are also used in GitHub client projects.
  • 32. Evaluation RQ3. To which extent do the uncovered features that do not match documented tutorial features correspond to actual API usage in client projects? 
 19 The decrease in the relevance might be caused by a less-focused usage of the features in GitHub.
  • 33. Examples of uncovered features 21 PDFBox Several libraries Good features Bad features