SlideShare a Scribd company logo
Uncovering Library Features from API
Usage on Stack Overflow
The 29th IEEE International Conference on Software Analysis, Evolution and Reengineering
Camilo Velázquez-Rodríguez, Eleni Constantinou and Coen De Roover
March 15th, 2022
@cvelazquezr
1
Introduction
JSoup library
1
Introduction
JSoup library
…
Introduction
2
Introduction
API Features
2
Introduction
API Features Features
Persistent or thread-safe?
Methods for sorting?
Methods for reversing?
Filter a collection?
Transform collections?
2
Introduction
API Features Features Use of Features
Persistent or thread-safe?
Methods for sorting?
Methods for reversing?
Filter a collection?
Transform collections?
Single method call?
Group of method calls?
Static or instance method calls?
How do libraries compare feature-wise?
2
Introduction
“… a set of data structures (i.e., fields and classes) and operations
(i.e., functions and methods) participating in the realisation of a
(…) functionality.”
“… a set of API calls with a corresponding name.”
[1] - Antoniol, G. & Guéhéneuc, Y.-G. Feature Identification: A Novel Approach and a Case Study. 21st Ieee Int Conf Softw Maintenance Icsm’05 1–10 (2005).

[2] - Kanda, T., Manabe, Y., Ishio, T., Matsushita, M. & Inoue, K. Semi-Automatically Extracting Features from Source Code of Android Applications. Ieice T Inf Syst, 2857–2859 (2013). 3
Approach
4
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Approach
5
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Guava
…
Apache

BCEL
Approach
5
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Guava
…
Apache

BCEL
6
6
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Approach
6
6
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Approach
Approach
7
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
- Post Titles

- Body of the Question

- Body of the Answer
- Method Names

- Method Names split by Camel
Case 

- API Calls
Textual Information API Usage Information
7
Jaccard
Similarity
[1.0, -0.12, ..., 0.5]
[0.1, 1.0, ..., 0.2]
[0.4, -0.5, ..., 0.98]
[0.37, 0.19, ..., 0.32]
[0.81, 0.44, ..., 1.0]
Approach
TF-IDF
model
[0.7, -0.12, ..., 0.5]
[0.1, 0.45, ..., 0.2]
[0.4, -0.5, ..., 0.98]
[0.37, 0.19, ..., 0.32]
[0.81, 0.44, ..., 0.12]
Cosine
Similarity
[1.0, -0.12, ..., 0.5]
[0.1, 1.0, ..., 0.2]
[0.4, -0.5, ..., 0.98]
[0.37, 0.19, ..., 0.32]
[0.81, 0.44, ..., 1.0]
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
- Post Titles

- Body of the Question

- Body of the Answer
- Method Names

- Method Names split by Camel
Case 

- API Calls
Textual Information API Usage Information
Approach
8
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Approach
8
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
[3] - P. Langfelder, B. Zhang, and S. Horvath, “Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R,” Bioinformatics, vol. 24, no. 5, pp. 719–720, 2008.
Approach
9
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Formed clusters would ideally have
frequent API calls, characterising the
code-part of the cluster.
[4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000.

[5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
Approach
9
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Formed clusters would ideally have
frequent API calls, characterising the
code-part of the cluster.
[4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000.

[5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
Approach
9
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Formed clusters would ideally have
frequent API calls, characterising the
code-part of the cluster.
[4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000.

[5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
Approach
9
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Formed clusters would ideally have
frequent API calls, characterising the
code-part of the cluster.
[4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000.

[5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
Approach
9
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Formed clusters would ideally have
frequent API calls, characterising the
code-part of the cluster.
[4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000.

[5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
Approach
9
Select
Library
Selected
Library
Collect Library
Information
groupID,
artifactID
1 A
Collect Answers from
Stack Overflow
Answers
from SO
Name
Public Class
Names for Library
B
3
C
Filter Answers
with Code
Filtered
Answers
Filter Answers
with API Usages
4
5
D
API
Usage
Textual
Information
Text
Vectorisation E
Combine
Vectors
Vectors
Combined
Cluster
Clusters of
SO Usages
Features
Name
Clusters
F
H
J
L
7
9
11
2
6
Calculate
Similarity
Similarity
Matrix
I
8
A
B
C
D
Selection of
Clusters
10
Selected
Clusters
K
Formed clusters would ideally have
frequent API calls, characterising the
code-part of the cluster.
[4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000.

[5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
Evaluation
RQ1. Which combination of SO answer attributes produces the most cohesive clusters? 

10
Text Code
[6] - L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990, vol. 344.
Evaluation
RQ1. Which combination of SO answer attributes produces the most cohesive clusters? 

11
The Methods combination from the
API usage information attributes
produced the most cohesive
clusters.
Evaluation
RQ2. How similar are the automatically uncovered features to documented tutorial features? 

12
Generated features
Evaluation
RQ2. How similar are the automatically uncovered features to documented tutorial features? 

13
The majority of the tutorial features is
part of our uncovered features.
Evaluation
RQ2. How similar are the automatically uncovered features to documented tutorial features? 

14
When considered against the usages
in the SO data, the matched accuracy
increases.
Evaluation
RQ2. How similar are the automatically uncovered features to documented tutorial features? 

15
On average, our uncovered features
are very similar to those in tutorials
denoted by the relevance score.
Evaluation
RQ3. To which extent do the uncovered features that do not match documented tutorial features
correspond to actual API usage in client projects? 

17
Guava HttpClient JFreeChart …
Evaluation
RQ3. To which extent do the uncovered features that do not match documented tutorial features
correspond to actual API usage in client projects? 

18
The majority of our uncovered features
are also used in GitHub client projects.
Evaluation
RQ3. To which extent do the uncovered features that do not match documented tutorial features
correspond to actual API usage in client projects? 

19
The decrease in the relevance might
be caused by a less-focused usage of
the features in GitHub.
Examples of uncovered features
21
PDFBox Several libraries
Good features Bad features
Tool LiFUSO
22
https://github.com/softwarelanguageslab/lifuso
Potential application
23
Maintainers
Potential application
23
Maintainers Users
Conclusion

More Related Content

What's hot

Algorithm Name Detection & Extraction
Algorithm Name Detection & ExtractionAlgorithm Name Detection & Extraction
Algorithm Name Detection & Extraction
Deeksha thakur
 
INDUS: A System for Information Integration and Knowledge Acquisition from Au...
INDUS: A System for Information Integration and Knowledge Acquisition from Au...INDUS: A System for Information Integration and Knowledge Acquisition from Au...
INDUS: A System for Information Integration and Knowledge Acquisition from Au...
Jie Bao
 
Cross language information retrieval (clir)slide
Cross language information retrieval (clir)slideCross language information retrieval (clir)slide
Cross language information retrieval (clir)slide
Mohd Iqbal Al-farabi
 
Ir 1 lec 7
Ir 1 lec 7Ir 1 lec 7
Ir 1 lec 7
alaa223
 
Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)
Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)
Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)
Riccardo Albertoni
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
Gaignard Alban
 

What's hot (6)

Algorithm Name Detection & Extraction
Algorithm Name Detection & ExtractionAlgorithm Name Detection & Extraction
Algorithm Name Detection & Extraction
 
INDUS: A System for Information Integration and Knowledge Acquisition from Au...
INDUS: A System for Information Integration and Knowledge Acquisition from Au...INDUS: A System for Information Integration and Knowledge Acquisition from Au...
INDUS: A System for Information Integration and Knowledge Acquisition from Au...
 
Cross language information retrieval (clir)slide
Cross language information retrieval (clir)slideCross language information retrieval (clir)slide
Cross language information retrieval (clir)slide
 
Ir 1 lec 7
Ir 1 lec 7Ir 1 lec 7
Ir 1 lec 7
 
Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)
Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)
Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014)
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 

Similar to Uncovering Library Features from API Usage on Stack Overflow

Component Search and Retrieval
Component Search and RetrievalComponent Search and Retrieval
Component Search and Retrieval
Eduardo Cruz
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
Carole Goble
 
Searching Heterogenous E Learning Resources
Searching Heterogenous E Learning ResourcesSearching Heterogenous E Learning Resources
Searching Heterogenous E Learning Resources
imranlatif
 
CEDAR Technologies for AIRR Submissions
CEDAR Technologies for AIRR SubmissionsCEDAR Technologies for AIRR Submissions
CEDAR Technologies for AIRR Submissions
Syed Ahmad Chan Bukhari, PhD
 
Searching Repositories of Web Application Models
Searching Repositories of Web Application ModelsSearching Repositories of Web Application Models
Searching Repositories of Web Application Models
Marco Brambilla
 
Reduce Query Time Up to 60% with Selective Search
Reduce Query Time Up to 60% with Selective SearchReduce Query Time Up to 60% with Selective Search
Reduce Query Time Up to 60% with Selective Search
Lucidworks
 
bridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the webbridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the web
Fabien Gandon
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
Carole Goble
 
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Saurabh Saxena
 
B sc it syit sem 3 sem 4 syllabus as per mumbai university
B sc it syit sem 3 sem 4 syllabus as per mumbai universityB sc it syit sem 3 sem 4 syllabus as per mumbai university
B sc it syit sem 3 sem 4 syllabus as per mumbai university
tanujaparihar
 
Best C Sharp C# Training Online C# Online Course C# Online Training Best on...
Best C Sharp C# Training Online C# Online Course   C# Online Training Best on...Best C Sharp C# Training Online C# Online Course   C# Online Training Best on...
Best C Sharp C# Training Online C# Online Course C# Online Training Best on...
Evanta Technologies
 
Wi presentation
Wi presentationWi presentation
Wi presentation
Saeedeh Shekarpour
 
The Chemtools LaBLog
The Chemtools LaBLogThe Chemtools LaBLog
The Chemtools LaBLog
Cameron Neylon
 
The Structure of Computer Science Knowledge Network
The Structure of Computer Science Knowledge NetworkThe Structure of Computer Science Knowledge Network
The Structure of Computer Science Knowledge Network
Pham Cuong
 
ICMT 2016: Search-Based Model Transformations with MOMoT
ICMT 2016: Search-Based Model Transformations with MOMoTICMT 2016: Search-Based Model Transformations with MOMoT
ICMT 2016: Search-Based Model Transformations with MOMoT
Martin Fleck
 
653 656
653 656653 656
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into EurekaACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
Stuart Chalk
 
Semantics-aware Content-based Recommender Systems
Semantics-aware Content-based Recommender SystemsSemantics-aware Content-based Recommender Systems
Semantics-aware Content-based Recommender Systems
Pasquale Lops
 
Open Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and ExchangeOpen Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and Exchange
lagoze
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt

Similar to Uncovering Library Features from API Usage on Stack Overflow (20)

Component Search and Retrieval
Component Search and RetrievalComponent Search and Retrieval
Component Search and Retrieval
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
 
Searching Heterogenous E Learning Resources
Searching Heterogenous E Learning ResourcesSearching Heterogenous E Learning Resources
Searching Heterogenous E Learning Resources
 
CEDAR Technologies for AIRR Submissions
CEDAR Technologies for AIRR SubmissionsCEDAR Technologies for AIRR Submissions
CEDAR Technologies for AIRR Submissions
 
Searching Repositories of Web Application Models
Searching Repositories of Web Application ModelsSearching Repositories of Web Application Models
Searching Repositories of Web Application Models
 
Reduce Query Time Up to 60% with Selective Search
Reduce Query Time Up to 60% with Selective SearchReduce Query Time Up to 60% with Selective Search
Reduce Query Time Up to 60% with Selective Search
 
bridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the webbridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the web
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
 
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
 
B sc it syit sem 3 sem 4 syllabus as per mumbai university
B sc it syit sem 3 sem 4 syllabus as per mumbai universityB sc it syit sem 3 sem 4 syllabus as per mumbai university
B sc it syit sem 3 sem 4 syllabus as per mumbai university
 
Best C Sharp C# Training Online C# Online Course C# Online Training Best on...
Best C Sharp C# Training Online C# Online Course   C# Online Training Best on...Best C Sharp C# Training Online C# Online Course   C# Online Training Best on...
Best C Sharp C# Training Online C# Online Course C# Online Training Best on...
 
Wi presentation
Wi presentationWi presentation
Wi presentation
 
The Chemtools LaBLog
The Chemtools LaBLogThe Chemtools LaBLog
The Chemtools LaBLog
 
The Structure of Computer Science Knowledge Network
The Structure of Computer Science Knowledge NetworkThe Structure of Computer Science Knowledge Network
The Structure of Computer Science Knowledge Network
 
ICMT 2016: Search-Based Model Transformations with MOMoT
ICMT 2016: Search-Based Model Transformations with MOMoTICMT 2016: Search-Based Model Transformations with MOMoT
ICMT 2016: Search-Based Model Transformations with MOMoT
 
653 656
653 656653 656
653 656
 
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into EurekaACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
 
Semantics-aware Content-based Recommender Systems
Semantics-aware Content-based Recommender SystemsSemantics-aware Content-based Recommender Systems
Semantics-aware Content-based Recommender Systems
 
Open Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and ExchangeOpen Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and Exchange
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 

Recently uploaded

Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdfWhy Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Ben Linders
 
2 December UAE National Day - United Arab Emirates
2 December UAE National Day - United Arab Emirates2 December UAE National Day - United Arab Emirates
2 December UAE National Day - United Arab Emirates
UAE Ppt
 
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
OECD Directorate for Financial and Enterprise Affairs
 
ACTIVE IMPLANTABLE MEDICAL DEVICE IN EUROPE
ACTIVE IMPLANTABLE MEDICAL DEVICE IN EUROPEACTIVE IMPLANTABLE MEDICAL DEVICE IN EUROPE
ACTIVE IMPLANTABLE MEDICAL DEVICE IN EUROPE
Charmi13
 
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdfBRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
Robin Haunschild
 
Genesis chapter 3 Isaiah Scudder.pptx
Genesis    chapter 3 Isaiah Scudder.pptxGenesis    chapter 3 Isaiah Scudder.pptx
Genesis chapter 3 Isaiah Scudder.pptx
FamilyWorshipCenterD
 
IEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdfIEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdf
Claudio Gallicchio
 
Prsentation for VIVA Welike project 1semester.pptx
Prsentation for VIVA Welike project 1semester.pptxPrsentation for VIVA Welike project 1semester.pptx
Prsentation for VIVA Welike project 1semester.pptx
prafulpawar29
 
Proposal: The Ark Project and The BEEP Inc
Proposal: The Ark Project and The BEEP IncProposal: The Ark Project and The BEEP Inc
Proposal: The Ark Project and The BEEP Inc
Raheem Muhammad
 
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
kekzed
 
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
OECD Directorate for Financial and Enterprise Affairs
 
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
OECD Directorate for Financial and Enterprise Affairs
 
Using-Presentation-Software-to-the-Fullf.pptx
Using-Presentation-Software-to-the-Fullf.pptxUsing-Presentation-Software-to-the-Fullf.pptx
Using-Presentation-Software-to-the-Fullf.pptx
kainatfatyma9
 
ServiceNow CIS-ITSM Exam Dumps & Questions [2024]
ServiceNow CIS-ITSM Exam Dumps & Questions [2024]ServiceNow CIS-ITSM Exam Dumps & Questions [2024]
ServiceNow CIS-ITSM Exam Dumps & Questions [2024]
SkillCertProExams
 
Disaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other usesDisaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other uses
RIDHIMAGARG21
 
The Intersection between Competition and Data Privacy – COLANGELO – June 2024...
The Intersection between Competition and Data Privacy – COLANGELO – June 2024...The Intersection between Competition and Data Privacy – COLANGELO – June 2024...
The Intersection between Competition and Data Privacy – COLANGELO – June 2024...
OECD Directorate for Financial and Enterprise Affairs
 
一比一原版(unc毕业证书)美国北卡罗来纳大学教堂山分校毕业证如何办理
一比一原版(unc毕业证书)美国北卡罗来纳大学教堂山分校毕业证如何办理一比一原版(unc毕业证书)美国北卡罗来纳大学教堂山分校毕业证如何办理
一比一原版(unc毕业证书)美国北卡罗来纳大学教堂山分校毕业证如何办理
gfysze
 
Legislation And Regulations For Import, Manufacture,.pptx
Legislation And Regulations For Import, Manufacture,.pptxLegislation And Regulations For Import, Manufacture,.pptx
Legislation And Regulations For Import, Manufacture,.pptx
Charmi13
 
Gamify it until you make it Improving Agile Development and Operations with ...
Gamify it until you make it  Improving Agile Development and Operations with ...Gamify it until you make it  Improving Agile Development and Operations with ...
Gamify it until you make it Improving Agile Development and Operations with ...
Ben Linders
 

Recently uploaded (19)

Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdfWhy Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
 
2 December UAE National Day - United Arab Emirates
2 December UAE National Day - United Arab Emirates2 December UAE National Day - United Arab Emirates
2 December UAE National Day - United Arab Emirates
 
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
 
ACTIVE IMPLANTABLE MEDICAL DEVICE IN EUROPE
ACTIVE IMPLANTABLE MEDICAL DEVICE IN EUROPEACTIVE IMPLANTABLE MEDICAL DEVICE IN EUROPE
ACTIVE IMPLANTABLE MEDICAL DEVICE IN EUROPE
 
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdfBRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
 
Genesis chapter 3 Isaiah Scudder.pptx
Genesis    chapter 3 Isaiah Scudder.pptxGenesis    chapter 3 Isaiah Scudder.pptx
Genesis chapter 3 Isaiah Scudder.pptx
 
IEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdfIEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdf
 
Prsentation for VIVA Welike project 1semester.pptx
Prsentation for VIVA Welike project 1semester.pptxPrsentation for VIVA Welike project 1semester.pptx
Prsentation for VIVA Welike project 1semester.pptx
 
Proposal: The Ark Project and The BEEP Inc
Proposal: The Ark Project and The BEEP IncProposal: The Ark Project and The BEEP Inc
Proposal: The Ark Project and The BEEP Inc
 
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
 
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
 
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
 
Using-Presentation-Software-to-the-Fullf.pptx
Using-Presentation-Software-to-the-Fullf.pptxUsing-Presentation-Software-to-the-Fullf.pptx
Using-Presentation-Software-to-the-Fullf.pptx
 
ServiceNow CIS-ITSM Exam Dumps & Questions [2024]
ServiceNow CIS-ITSM Exam Dumps & Questions [2024]ServiceNow CIS-ITSM Exam Dumps & Questions [2024]
ServiceNow CIS-ITSM Exam Dumps & Questions [2024]
 
Disaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other usesDisaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other uses
 
The Intersection between Competition and Data Privacy – COLANGELO – June 2024...
The Intersection between Competition and Data Privacy – COLANGELO – June 2024...The Intersection between Competition and Data Privacy – COLANGELO – June 2024...
The Intersection between Competition and Data Privacy – COLANGELO – June 2024...
 
一比一原版(unc毕业证书)美国北卡罗来纳大学教堂山分校毕业证如何办理
一比一原版(unc毕业证书)美国北卡罗来纳大学教堂山分校毕业证如何办理一比一原版(unc毕业证书)美国北卡罗来纳大学教堂山分校毕业证如何办理
一比一原版(unc毕业证书)美国北卡罗来纳大学教堂山分校毕业证如何办理
 
Legislation And Regulations For Import, Manufacture,.pptx
Legislation And Regulations For Import, Manufacture,.pptxLegislation And Regulations For Import, Manufacture,.pptx
Legislation And Regulations For Import, Manufacture,.pptx
 
Gamify it until you make it Improving Agile Development and Operations with ...
Gamify it until you make it  Improving Agile Development and Operations with ...Gamify it until you make it  Improving Agile Development and Operations with ...
Gamify it until you make it Improving Agile Development and Operations with ...
 

Uncovering Library Features from API Usage on Stack Overflow

  • 1. Uncovering Library Features from API Usage on Stack Overflow The 29th IEEE International Conference on Software Analysis, Evolution and Reengineering Camilo Velázquez-Rodríguez, Eleni Constantinou and Coen De Roover March 15th, 2022 @cvelazquezr
  • 6. Introduction API Features Features Persistent or thread-safe? Methods for sorting? Methods for reversing? Filter a collection? Transform collections? 2
  • 7. Introduction API Features Features Use of Features Persistent or thread-safe? Methods for sorting? Methods for reversing? Filter a collection? Transform collections? Single method call? Group of method calls? Static or instance method calls? How do libraries compare feature-wise? 2
  • 8. Introduction “… a set of data structures (i.e., fields and classes) and operations (i.e., functions and methods) participating in the realisation of a (…) functionality.” “… a set of API calls with a corresponding name.” [1] - Antoniol, G. & Guéhéneuc, Y.-G. Feature Identification: A Novel Approach and a Case Study. 21st Ieee Int Conf Softw Maintenance Icsm’05 1–10 (2005). [2] - Kanda, T., Manabe, Y., Ishio, T., Matsushita, M. & Inoue, K. Semi-Automatically Extracting Features from Source Code of Android Applications. Ieice T Inf Syst, 2857–2859 (2013). 3
  • 9. Approach 4 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K
  • 10. Approach 5 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Guava … Apache BCEL
  • 11. Approach 5 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Guava … Apache BCEL
  • 12. 6 6 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Approach
  • 13. 6 6 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Approach
  • 14. Approach 7 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K - Post Titles - Body of the Question - Body of the Answer - Method Names - Method Names split by Camel Case - API Calls Textual Information API Usage Information
  • 15. 7 Jaccard Similarity [1.0, -0.12, ..., 0.5] [0.1, 1.0, ..., 0.2] [0.4, -0.5, ..., 0.98] [0.37, 0.19, ..., 0.32] [0.81, 0.44, ..., 1.0] Approach TF-IDF model [0.7, -0.12, ..., 0.5] [0.1, 0.45, ..., 0.2] [0.4, -0.5, ..., 0.98] [0.37, 0.19, ..., 0.32] [0.81, 0.44, ..., 0.12] Cosine Similarity [1.0, -0.12, ..., 0.5] [0.1, 1.0, ..., 0.2] [0.4, -0.5, ..., 0.98] [0.37, 0.19, ..., 0.32] [0.81, 0.44, ..., 1.0] Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K - Post Titles - Body of the Question - Body of the Answer - Method Names - Method Names split by Camel Case - API Calls Textual Information API Usage Information
  • 16. Approach 8 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K
  • 17. Approach 8 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K [3] - P. Langfelder, B. Zhang, and S. Horvath, “Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R,” Bioinformatics, vol. 24, no. 5, pp. 719–720, 2008.
  • 18. Approach 9 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Formed clusters would ideally have frequent API calls, characterising the code-part of the cluster. [4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000. [5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
  • 19. Approach 9 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Formed clusters would ideally have frequent API calls, characterising the code-part of the cluster. [4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000. [5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
  • 20. Approach 9 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Formed clusters would ideally have frequent API calls, characterising the code-part of the cluster. [4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000. [5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
  • 21. Approach 9 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Formed clusters would ideally have frequent API calls, characterising the code-part of the cluster. [4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000. [5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
  • 22. Approach 9 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Formed clusters would ideally have frequent API calls, characterising the code-part of the cluster. [4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000. [5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
  • 23. Approach 9 Select Library Selected Library Collect Library Information groupID, artifactID 1 A Collect Answers from Stack Overflow Answers from SO Name Public Class Names for Library B 3 C Filter Answers with Code Filtered Answers Filter Answers with API Usages 4 5 D API Usage Textual Information Text Vectorisation E Combine Vectors Vectors Combined Cluster Clusters of SO Usages Features Name Clusters F H J L 7 9 11 2 6 Calculate Similarity Similarity Matrix I 8 A B C D Selection of Clusters 10 Selected Clusters K Formed clusters would ideally have frequent API calls, characterising the code-part of the cluster. [4] - Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93-104. 2000. [5] - C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in The 52nd Annual Meeting of the Association for Compu- tational Linguistics: System Demonstrations, pp. 55–60. 2014.
  • 24. Evaluation RQ1. Which combination of SO answer attributes produces the most cohesive clusters? 
 10 Text Code [6] - L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990, vol. 344.
  • 25. Evaluation RQ1. Which combination of SO answer attributes produces the most cohesive clusters? 
 11 The Methods combination from the API usage information attributes produced the most cohesive clusters.
  • 26. Evaluation RQ2. How similar are the automatically uncovered features to documented tutorial features? 
 12 Generated features
  • 27. Evaluation RQ2. How similar are the automatically uncovered features to documented tutorial features? 
 13 The majority of the tutorial features is part of our uncovered features.
  • 28. Evaluation RQ2. How similar are the automatically uncovered features to documented tutorial features? 
 14 When considered against the usages in the SO data, the matched accuracy increases.
  • 29. Evaluation RQ2. How similar are the automatically uncovered features to documented tutorial features? 
 15 On average, our uncovered features are very similar to those in tutorials denoted by the relevance score.
  • 30. Evaluation RQ3. To which extent do the uncovered features that do not match documented tutorial features correspond to actual API usage in client projects? 
 17 Guava HttpClient JFreeChart …
  • 31. Evaluation RQ3. To which extent do the uncovered features that do not match documented tutorial features correspond to actual API usage in client projects? 
 18 The majority of our uncovered features are also used in GitHub client projects.
  • 32. Evaluation RQ3. To which extent do the uncovered features that do not match documented tutorial features correspond to actual API usage in client projects? 
 19 The decrease in the relevance might be caused by a less-focused usage of the features in GitHub.
  • 33. Examples of uncovered features 21 PDFBox Several libraries Good features Bad features