STRICT: INFORMATION RETRIEVAL
BASED SEARCH TERM IDENTIFICATION
FOR CONCEPT LOCATION
Mohammad Masudur Rahman, Chanchal K. Roy
Department of Computer Science
University of Saskatchewan, Canada
International Conference on Software Analysis, Evolution and
Reengineering (SANER 2017), Klagenfurt, Austria
SOFTWARE CHANGE TASK
Task Summary
Task Description
Other Information
SOFTWARE CHANGE TASK:
DOMAIN CONCEPT--ARTIFACT MAPPING
IResource
element
Tree
Level
Provider
Domain concepts
Project artifacts
(e.g., classes, methods)
Our contribution: Identifying such concepts
EXISTING WORKS
• Query reformulation & expansion
  - Haiduc et al., ICSE 2013
  - Gay et al., ICSM 2009
  - Shepherd et al., AOSD 2007
• Query quality analysis
  - Haiduc et al., ASE 2011
  - Haiduc et al., ICPC 2011
  - Haiduc et al., ICSE 2012
• Software artifact mining
  - Howard et al., MSR 2013
  - Kevic & Fritz, MSR 2014
• Heuristics
  - Kevic & Fritz, ICSE 2014
• Most studies expect the developer to provide an initial query.
• Developers succeed only in 12.2% of cases (Kevic & Fritz, ICSE 2014).
Initial search query for a change task.
PAGERANK ALGORITHM: WEB LINK ANALYSIS
Size of a face ∝ size of the faces pointing to it
Most important face
in this crowd
SEARCH TERM IDENTIFICATION USING
TEXTRANK & POSRANK
SCHEMATIC DIAGRAM: PROPOSED
APPROACH
Change request → Preprocessing → TextRank calculation + POSRank calculation → Ranking → Search terms
Focus of this talk
TEXTRANK: TERM IMPORTANCE USING CO-OCCURRENCE (MIHALCEA ET AL, EMNLP 2004)
IResource-------IJavaElement, element-----reported
Node = Distinct word
Edge = Two words co-occurring in the same context
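The node and edge definitions above can be sketched in a few lines of Python. This is only an illustration: the window size of 2 and the undirected edges are assumptions (the slide says only "same context"), and `build_cooccurrence_graph` is a hypothetical helper name, not part of STRICT.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Link distinct words that appear within `window` tokens of each other.

    `window` is an assumed setting; the slide does not state one.
    """
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        graph[word]  # touching the key makes every word a node, even if isolated
        for other in tokens[i + 1 : i + window]:
            if other != word:
                # Undirected edge: store it in both directions.
                graph[word].add(other)
                graph[other].add(word)
    return dict(graph)


# Token sequence taken from the example words on the slide.
tokens = ["IResource", "IJavaElement", "element", "reported"]
graph = build_cooccurrence_graph(tokens)
```

With a window of 2, only adjacent words are linked, which reproduces the two example edges shown on the slide (IResource to IJavaElement, element to reported).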
POSRANK: TERM IMPORTANCE USING SYNTACTIC
DEPENDENCE (BLANCO & LIOMA, INF. RETR. 2012)
Edge = Syntactic dependence between various parts of speech in the sentence
Verb-------Noun, Verb---Adjective
Jespersen’s Rank Theory: Noun, Verb, Adjective
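A rough sketch of how such a part-of-speech graph might be built is given below. It assumes the sentence has already been tagged (the tags here are assigned by hand), links only the two tag pairs named on the slide, and treats edges as undirected. The helper name `build_pos_graph`, the tag labels, and the example sentence are all hypothetical.

```python
from collections import defaultdict

# Tag pairs that may be linked, taken from the slide (Verb-Noun, Verb-Adjective).
LINKED_TAGS = {frozenset({"VERB", "NOUN"}), frozenset({"VERB", "ADJ"})}


def build_pos_graph(tagged_sentence):
    """Link words of one sentence whose tags form one of the allowed pairs."""
    graph = defaultdict(set)
    for i, (word_a, tag_a) in enumerate(tagged_sentence):
        for word_b, tag_b in tagged_sentence[i + 1 :]:
            if word_a != word_b and frozenset({tag_a, tag_b}) in LINKED_TAGS:
                graph[word_a].add(word_b)
                graph[word_b].add(word_a)
    return dict(graph)


# Hand-tagged, made-up sentence for illustration only.
sentence = [
    ("element", "NOUN"),
    ("reported", "VERB"),
    ("wrong", "ADJ"),
    ("level", "NOUN"),
]
pos_graph = build_pos_graph(sentence)
```

In this toy sentence the verb ends up connected to both nouns and to the adjective, while the nouns and the adjective are not linked to each other.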
TERM IMPORTANCE
(ADAPTED FROM PAGERANK)
\[
S(V_i) = (1 - \phi) + \phi \sum_{j \in In(V_i)} \frac{S(V_j)}{|Out(V_j)|} \qquad (0 \le \phi \le 1)
\]
• Vi – node of interest
• Vj – node connected to Vi through incoming links
• φ – damping factor (i.e., probability of choosing a node in the network)
• In(Vi) – incoming nodes to Vi
• Out(Vj) – outgoing nodes from Vj
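The recursive score above is usually computed by repeated updates until the values stop changing. The sketch below is one way to do that, not the authors' implementation: the damping value 0.85, the initial score of 1.0, the iteration limit, and the tolerance are conventional choices assumed here, and `rank_terms` is a hypothetical name.

```python
def rank_terms(graph, damping=0.85, iterations=100, tol=1e-4):
    """Iteratively score nodes; `graph` maps a node to the set of nodes it links to.

    An undirected graph is passed by listing each edge in both directions.
    """
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    # Invert the adjacency so that In(Vi) can be looked up directly.
    incoming = {n: set() for n in nodes}
    for src, targets in graph.items():
        for dst in targets:
            incoming[dst].add(src)

    scores = {n: 1.0 for n in nodes}  # assumed initial score
    for _ in range(iterations):
        new_scores = {}
        for node in nodes:
            # Each neighbour Vj shares its score equally over |Out(Vj)| links.
            total = sum(scores[j] / len(graph[j]) for j in incoming[node])
            new_scores[node] = (1 - damping) + damping * total
        delta = max(abs(new_scores[n] - scores[n]) for n in nodes)
        scores = new_scores
        if delta < tol:
            break
    return scores


# Toy chain a - b - c, with each undirected edge stored in both directions.
scores = rank_terms({"a": {"b"}, "b": {"a", "c"}, "c": {"b"}})
```

In the toy chain, the middle node receives score from both ends and therefore ranks highest, which is the behaviour the formula is meant to capture.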
TERM IMPORTANCE (EXPLAINED)
(Figure: node Vi and its connected nodes Vj1 to Vj6)
Term Score (Vi) = TextRank (Vi) + POSRank (Vi)
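As a small illustration of the sum above, the two score maps can be added per term and the highest-scoring terms kept as the query. The scores below are made-up numbers, the cut-off `k` is an assumption, and `combine_scores` and `top_terms` are hypothetical helper names.

```python
def combine_scores(text_rank, pos_rank):
    """Add the two scores per term; a term missing from one map counts as 0."""
    terms = set(text_rank) | set(pos_rank)
    return {t: text_rank.get(t, 0.0) + pos_rank.get(t, 0.0) for t in terms}


def top_terms(scores, k=10):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Made-up scores for illustration only.
text_rank = {"element": 1.25, "reported": 0.75, "tree": 0.5}
pos_rank = {"element": 0.5, "reported": 1.5, "provider": 0.25}
query = top_terms(combine_scores(text_rank, pos_rank), k=3)
```

Here "reported" overtakes "element" once both scores are combined, which shows how a term that is moderate in one graph can still lead the final ranking.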
EXPERIMENTAL DATASET
8 projects (Apache + Eclipse)
GitHub commits & change sets
BugZilla + JIRA issues
1,939 change tasks
EXPERIMENTAL SETUP
Change request → Baseline query → Code search → Baseline ranks
Change request → Suggested query → Code search → Our ranks
Compare our ranks with the baseline ranks on:
• Query Effectiveness
• Mean Average Precision
• Mean Recall
• Top-K Accuracy
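The retrieval metrics listed above can be computed per query roughly as follows; the mean over all change tasks then gives Mean Average Precision, Mean Recall, and Top-K Accuracy. This is a sketch under assumptions: average precision is divided by the total number of relevant items (one common convention, the talk does not state which it uses), and the file names are invented.

```python
def average_precision(ranked, relevant):
    """Mean of precision@k over the positions k where a relevant item appears."""
    hits, total = 0, 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items found in the top k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def top_k_hit(ranked, relevant, k):
    """True if at least one relevant item appears in the top k results."""
    return any(item in relevant for item in ranked[:k])


# Hypothetical result list and gold set for one change task.
ranked = ["A.java", "B.java", "C.java", "D.java"]
relevant = {"A.java", "C.java"}
ap = average_precision(ranked, relevant)
recall = recall_at_k(ranked, relevant, k=2)
hit = top_k_hit(ranked, relevant, k=1)
```

For this toy task the relevant files sit at ranks 1 and 3, so average precision is (1/1 + 2/3) / 2 and only half of the relevant files are recalled within the top 2.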
EXPERIMENTAL RESULTS
(QUERY EFFECTIVENESS)
Query Pairs | Improved | Worsened | P-value | Preserved | MRD
STRICT vs. Title | 57.84% | 34.94% | <0.001* | 7.22% | -147
STRICT vs. Title (10 keywords) | 62.49% | 32.26% | <0.001* | 5.25% | -201
STRICT vs. Description | 53.84% | 38.21% | <0.001* | 7.95% | -329
STRICT vs. (Title + Desc.) | 52.36% | 39.94% | <0.001* | 7.70% | -265
* = Significant difference, MRD = Mean Rank Difference
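The columns of this table can be reproduced from paired ranks along the following lines. The sketch assumes that effectiveness is the rank of the first relevant result for each query (lower is better) and that MRD is the mean of "our rank minus baseline rank", so a negative MRD favours the suggested query; both are assumptions consistent with the table but not spelled out on the slide. The rank values are invented.

```python
def compare_queries(our_ranks, baseline_ranks):
    """Return (improved, worsened, preserved, mean rank difference) for paired ranks."""
    pairs = list(zip(our_ranks, baseline_ranks))
    n = len(pairs)
    improved = sum(ours < base for ours, base in pairs) / n
    worsened = sum(ours > base for ours, base in pairs) / n
    preserved = sum(ours == base for ours, base in pairs) / n
    mrd = sum(ours - base for ours, base in pairs) / n
    return improved, worsened, preserved, mrd


# Invented ranks for four change tasks.
result = compare_queries([3, 10, 5, 1], [8, 4, 5, 21])
```

In this toy data two of four queries improve, one worsens, one is unchanged, and the mean rank difference is negative because the single large improvement outweighs the one regression.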
EXPERIMENTAL RESULTS
(RETRIEVAL PERFORMANCE)
*Our performance is significantly higher for each metric
EXPERIMENTAL RESULTS
(RETRIEVAL PERFORMANCE)
Our Top-K accuracy is clearly higher for various K-values
COMPARISON WITH EXISTING METHODS
(QUERY EFFECTIVENESS)
Technique | Improved | Worsened | Preserved | MRD
Kevic & Fritz, ICSE 2014 | 40.09% | 53.95% | 5.96% | +101
Rocchio’s Method, ICSE 2013 | 37.59% | 56.38% | 6.03% | +45
STRICT | 57.84%* | 34.94%* | 7.22% | -147
* = Significant difference, MRD = Mean Rank Difference
COMPARISON WITH EXISTING METHODS
(RETRIEVAL PERFORMANCE)
*Our performance is significantly higher for each metric than the state-of-the-art
COMPARISON WITH EXISTING METHODS
(RETRIEVAL PERFORMANCE)
Our Top-K accuracy is clearly higher for various K-values than the state-of-the-art
TAKE-HOME MESSAGES
• Identifying initial search terms is challenging.
• Only 12.20% of developers' search terms are relevant.
• The PageRank algorithm was adapted for term importance.
• We combined TextRank and POSRank for identifying important terms.
• Experiments with 1,939 change tasks from 8 systems of Apache & Eclipse.
• 57.84% of queries improved by STRICT.
• Comparison with the state-of-the-art approaches validates our approach.
THANK YOU !!! QUESTIONS?
More details on STRICT:
http://homepage.usask.ca/~masud.rahman/strict/
Contact: masud.rahman@usask.ca
PROVOCATIVE STATEMENT
• We need better algorithms to overcome the “vocabulary mismatch issue”. Where should we start? Which source/repository is more appropriate besides project source code?
PROBABLE QUESTIONS
• Did you do stemming?
  - No, we didn’t, since many recent studies reported a negative effect on performance. It especially does not help when the texts contain structured items such as camel-case tokens.
• Which one is better, TextRank or POSRank?
  - They performed quite similarly, but we combined them since they convey two distinct aspects of connectivity.
• Which settings did you apply for the ranking algorithm?
  - Details are in the paper. These PageRank-based algorithms tend to converge to the same scores regardless of their initial settings, unlike simple VSM-based models.
• Can this be used for query reformulation?
  - Possibly, yes, if you can convert the artifact into a text graph. We are currently working on that using source code.
PROBABLE QUESTIONS
• Recent studies show that IR-based methods are not effective if the bug report is not rich.
  - Yes, that’s true. We need more techniques for writing better bug reports. We also need better methods to address the vocabulary mismatch issue.
• Why didn’t you consider anything from the source code?
  - We are suggesting the initial query. The source code will be used for query reformulation. We also showed that our initial query is better than the baselines that developers frequently use.
• What is the cost? How long does it take?
  - It is pretty much real time. We are planning to develop an IDE plug-in soon.

 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

STRICT-SANER2017

  • 1. STRICT: INFORMATION RETRIEVAL BASED SEARCH TERM IDENTIFICATION FOR CONCEPT LOCATION Mohammad Masudur Rahman, Chanchal K. Roy Department of Computer Science University of Saskatchewan, Canada International Conference on Software Analysis, Evolution and Reengineering (SANER 2017), Klagenfurt, Austria
  • 2. SOFTWARE CHANGE TASK: Task Summary; Task Description; Other Information
  • 3. SOFTWARE CHANGE TASK: DOMAIN CONCEPT--ARTIFACT MAPPING. IResource, element, Tree, Level, Provider. Domain concepts. Project artifacts (e.g., classes, methods). Our contribution: Identifying such concepts
  • 4. EXISTING WORKS. Query reformulation & expansion: Haiduc et al, ICSE 2013; Gay et al, ICSM 2009; Shepherd et al, AOSD 2007. Query quality analysis: Haiduc et al, ASE 2011; Haiduc et al, ICPC 2011; Haiduc et al, ICSE 2012. Software artifact mining: Howard et al, MSR 2013; Kevic & Fritz, MSR 2014. Heuristics: Kevic & Fritz, ICSE 2014. • Most studies expect the developer to provide an initial query • Developers succeed only in 12.2% of cases (Kevic & Fritz, ICSE 2014). Initial search query for a change task.
  • 5. PAGERANK ALGORITHM: WEB LINK ANALYSIS. Size of a face ∝ size of the faces pointing to it. Most important face in this crowd
  • 6. SEARCH TERM IDENTIFICATION USING TEXTRANK & POSRANK
  • 8. TEXTRANK: TERM IMPORTANCE USING CO-OCCURRENCE (MIHALCEA ET AL, EMNLP 2004). IResource-------IJavaElement, element-----reported. Node = Distinct word. Edge = Two words co-occurring in the same context
  • 9. POSRANK: TERM IMPORTANCE USING SYNTACTIC DEPENDENCE (BLANCO & LIOMA, INF. RETR. 2012). Edge = Syntactic dependence between various parts of speech in the sentence. Verb-------Noun, Verb---Adjective. Jespersen’s Rank Theory: Noun, Verb, Adjective
  • 10. TERM IMPORTANCE (ADAPTED FROM PAGERANK) $S(V_i) = (1 - \phi) + \phi \sum_{j \in In(V_i)} \frac{S(V_j)}{|Out(V_j)|}, \quad 0 \le \phi \le 1$ • $V_i$ – node of interest • $V_j$ – node connected to $V_i$ through incoming links • $\phi$ – damping factor (i.e., probability of choosing a node in the network) • $In(V_i)$ – incoming nodes to $V_i$ • $Out(V_j)$ – outgoing nodes from $V_j$
  • 11. TERM IMPORTANCE (EXPLAINED). Nodes: Vi, Vj1, Vj2, Vj3, Vj4, Vj5, Vj6. Term Score (Vi) = TextRank (Vi) + POSRank (Vi)
  • 12. EXPERIMENTAL DATASET: 8 Projects (Apache + Eclipse); GitHub commits & Change set; BugZilla + JIRA issues; 1,939 change tasks
  • 13. EXPERIMENTAL SETUP: Change request; Baseline query; Suggested query; Code search; Our ranks; Baseline ranks; Compare. Metrics: Query Effectiveness, Mean Average Precision, Mean Recall, Top-K Accuracy
  • 14. EXPERIMENTAL RESULTS (QUERY EFFECTIVENESS)
      Query Pairs                      Improved  Worsened  P-value  Preserved  MRD
      STRICT vs. Title                 57.84%    34.94%    <0.001*  7.22%      -147
      STRICT vs. Title (10 keywords)   62.49%    32.26%    <0.001*  5.25%      -201
      STRICT vs. Description           53.84%    38.21%    <0.001*  7.95%      -329
      STRICT vs. (Title + Desc.)       52.36%    39.94%    <0.001*  7.70%      -265
      * = Significant Difference, MRD = Mean Rank Difference
  • 15. EXPERIMENTAL RESULTS (RETRIEVAL PERFORMANCE). * Our performance is significantly higher for each metric
  • 16. EXPERIMENTAL RESULTS (RETRIEVAL PERFORMANCE). Our Top-K accuracy is clearly higher for various K-values
  • 17. COMPARISON WITH EXISTING METHODS (QUERY EFFECTIVENESS)
      Technique                     Improved  Worsened  Preserved  MRD
      Kevic & Fritz, ICSE 2014      40.09%    53.95%    5.96%      +101
      Rocchio’s Method, ICSE 2013   37.59%    56.38%    6.03%      +45
      STRICT                        57.84%*   34.94%*   7.22%      -147
      * = Significant Difference, MRD = Mean Rank Difference
  • 18. COMPARISON WITH EXISTING METHODS (RETRIEVAL PERFORMANCE). * Our performance is significantly higher for each metric than the state-of-the-art
  • 19. COMPARISON WITH EXISTING METHODS (RETRIEVAL PERFORMANCE). Our Top-K accuracy is clearly higher for various K-values than the state-of-the-art
  • 20. TAKE-HOME MESSAGES • Identifying initial search terms is challenging. • Only 12.20% of developers’ search terms are relevant. • PageRank Algorithm adapted for term importance. • We combined TextRank and POSRank for identifying important terms. • Experiments with 1,939 change tasks from 8 systems of Apache & Eclipse. • 57.84% of queries improved by STRICT. • Comparison with state-of-the-art approach validates our approach.
  • 21. THANK YOU !!! QUESTIONS? More details on STRICT: http://homepage.usask.ca/~masud.rahman/strict/ Contact: masud.rahman@usask.ca
  • 22. PROVOCATIVE STATEMENT. We need better algorithms to overcome the “vocabulary mismatch issue”. Where to start from? Which source/repository is more appropriate besides the project source code?
  • 23. PROBABLE QUESTIONS. Q: Did you do stemming? A: No, we didn’t, since many recent studies reported a negative effect on performance. It especially does not help when the texts contain structured items like camel case tokens. Q: Which one is better, TextRank or POSRank? A: They performed quite similarly. But we combined them since they convey two distinct aspects of connectivity. Q: Which settings did you apply for the ranking algorithm? A: Details are in the paper. But these PageRank-based algorithms tend to converge to the same scores regardless of their initial settings, unlike simple VSM-based models. Q: Can this be used for query reformulation? A: Possibly yes, if you can convert the artifact into a text graph. We are working on that using source code.
  • 24. PROBABLE QUESTIONS. Q: Recent studies show that IR-based methods are not effective if the bug report is not rich. A: Yup, that’s true. We need more techniques for writing better bug reports. Plus, we need better methods to address the vocabulary mismatch issue. Q: Why didn’t you consider anything from the source code? A: We are suggesting the initial query. The source code will be used for query reformulation. We also showed that our initial query is better than the baselines that developers frequently use. Q: How is the cost? How long does it take? A: It is pretty much real time. We are planning to develop an IDE plug-in soon.
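
Slides 8 and 10 can be read together as a small program. The following is a minimal Python sketch, not the authors' implementation: it links words that fall within a window of two tokens in a sentence, then iterates the slide-10 recurrence until the scores settle. The function names, the damping value of 0.85, the initial score of 0.25, the iteration cap, the tolerance and the toy sentences are all assumptions chosen for illustration; the talk defers the actual settings to the paper.

```python
from collections import defaultdict


def build_text_graph(sentences, window=2):
    """Link words that co-occur within `window` consecutive tokens of a sentence."""
    graph = defaultdict(set)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            graph[word]  # make sure isolated words still become nodes
            for other in tokens[i + 1 : i + window]:
                if other != word:
                    # Undirected edge: each word is both In and Out of the other.
                    graph[word].add(other)
                    graph[other].add(word)
    return graph


def score_terms(graph, damping=0.85, iterations=100, tol=1e-4):
    """Iterate S(Vi) = (1 - d) + d * sum(S(Vj) / |Out(Vj)|) over all neighbours Vj."""
    if not graph:
        return {}
    scores = {node: 0.25 for node in graph}  # assumed initial score
    for _ in range(iterations):
        updated = {}
        for node, neighbours in graph.items():
            incoming = sum(scores[n] / len(graph[n]) for n in neighbours)
            updated[node] = (1 - damping) + damping * incoming
        delta = max(abs(updated[n] - scores[n]) for n in graph)
        scores = updated
        if delta < tol:  # stop once no score moves by more than the tolerance
            break
    return scores


# Hypothetical, already preprocessed sentences from a change request.
sentences = [["IResource", "IJavaElement", "element"], ["element", "reported"]]
scores = score_terms(build_text_graph(sentences))
search_terms = sorted(scores, key=scores.get, reverse=True)[:2]
print(search_terms)
```

A POS-based graph could be scored with the same `score_terms` function by swapping in a graph whose edges come from syntactic dependence, and the two score dictionaries could then be summed per term, as on slide 11.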

Editor's Notes

  1. Introduce yourself and the affiliation. Today I am going to talk about query suggestion for Concept location where we used Information Retrieval methods.
  2. This is a software change request. It has different sections like title, description and others. Now a developer’s task is to identify the most important terms and then use them for finding the source code to change.
  3. To model the problem formally, this is a mapping problem. And the mapping is between concepts in the change request and the relevant source artifacts from the codebase. Our job is to identify the appropriate terms from the change request for the successful mapping.
  4. There have been some studies on similar problems. However, most of these studies reformulate a given query. That means the developer needs to provide an initial query first. But studies show that choosing that initial query is itself challenging. A study reported that only 12% of the search terms developers chose from the change request were useful. So, our focus is to choose the initial query from a change request rather than to reformulate one. The most closely related work used a set of heuristics.
  5. While the earlier work used heuristics for the same problem, we used Google’s PageRank algorithm for choosing the important terms from a body of text. Here, the most important face in the crowd is the face everybody is looking at, right? This also holds true for the World Wide Web. A page is reputable if it is referred to by other reputable pages on the web. So, we model our search term identification after this idea.
  6. We identify search terms using two variants of PageRank. They are called TextRank and POSRank in the information retrieval domain.
  7. So, these are the pretty straightforward steps of our approach. We take a change request and perform standard NLP preprocessing (stop word removal and splitting). We avoided stemming. Then, from the preprocessed text, we develop two types of graphs: a text graph and a POS graph. Then we derive an importance score for each of the terms from those two graphs. Then we do a linear combination, perform ranking and choose the top words as the search terms based on their scores. Now, we will zoom in on these sections.
  8. The idea behind this text graph is word co-occurrence. For example, these two terms, IResource and IJavaElement, occur in the same context across multiple sentences. Another two terms, element and reported, also occur in the same context. Here we define context as a window of two words within a sentence. We encode their co-occurrence into an edge in this text graph. This way, the whole change request can be converted into a text graph.
  9. Similarly, we develop the second graph based on syntactic dependence among the various parts of speech of a sentence. We apply Jespersen’s Rank Theory of three ranks. More details are in the paper. That is, some parts of speech depend on other parts of speech for their complete meaning. For example, a verb modifies nouns and adjectives within the same sentence. We encode such dependencies into connecting edges, and develop another graph. Thus, some terms are more connected than others.
  10. Now, we have two graphs developed from the change request based on two different dimensions: word co-occurrence and syntactic dependence. Now, we apply the above algorithm, adapted from PageRank, for scoring. That is, a term’s importance is determined by the importance of the surrounding terms, not just by its connectivity. This is how Google beats the scam pages. We apply that to concept location as well. This is the first time it has been done for the concept location task, and this is our novelty.
  11. So, this is how the score of a term is determined, based on the scores of the surrounding terms. That means the score of Vi is determined based on the scores of Vj1 to Vj5. We collect scores for the terms from both graphs, which we call TextRank and POSRank. We combine them, rank them and collect the top ones as the search terms.
  12. For experiments, we select 8 subject systems from Apache and Eclipse. We collect 1,939 change requests/bug reports from BugZilla and JIRA, and prepare the gold set by consulting the commit history of those projects from GitHub. For selecting bug-fixing commits, we adopted a widely accepted approach. That is, we identify the Bug ID in the commit title, and then extract the corresponding change set.
  13. For experiments, we collect our queries and the baseline queries (e.g., title or description from the change request), and feed them to a code search engine. Then we collect their results/ranks and compare. For evaluation/validation, we used these four performance metrics.
  14. Results show that our method can improve 52%--62% of the baseline queries, which is promising according to relevant literature. We consider various combinations as the baseline queries, and got similar performance. Our improvement and worsening ratios are significantly different according to statistical tests. The mean rank difference also shows that our mean ranks are closer to the top than the baseline.
  15. In terms of retrieval performance, precision and recall are not too high. Precision is close to 30% and the accuracy is close to 45% when the Top-10 results are considered. But I guess that has been the status quo for the last 15 years. So, nothing very dramatic. However, they are considerably higher than the baseline performance.
  16. When we extend the K-values, we find that the accuracy grows significantly. And still, our performance remains higher than all the baselines. This shows the potential of our method.
  17. We compared with two parallel methods: Kevic & Fritz used heuristics, and the second is a classic query reformulation technique. While they were promising, our method still beat them in all aspects, and the performance is significantly higher, as you see.
  18. If we look at the box plots, we can see that our median metrics are significantly higher. While they relied on a set of heuristics and term weighting, our PageRank-based model seems to perform better.
  19. When we consider Top-K accuracy for various K-values, we get similar findings. Our method located concepts correctly for 80% of the change requests, whereas they did for 60% of them at best. This shows the potential of our technique.
  20. You can simply read out the texts I guess.
  21. Thanks for your time and attention. I am ready to have a few questions.
  22. We tried with source code and Stack Overflow to look for semantically similar words. What’s next?
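
Notes 13 to 16 describe the evaluation only in words. As a rough sketch under stated assumptions, the comparison can be computed from the rank of the first correct result returned for each change task. The function names are hypothetical, and the sign convention for MRD (suggested rank minus baseline rank, so negative values mean results closer to the top) is an assumption inferred from the negative MRD values on slide 14, not a definition taken from the paper.

```python
def compare_queries(baseline_ranks, suggested_ranks):
    """Classify each task as improved, worsened or preserved by first-hit rank."""
    pairs = list(zip(baseline_ranks, suggested_ranks))
    if not pairs:
        raise ValueError("at least one change task is required")
    total = len(pairs)
    improved = sum(s < b for b, s in pairs)  # suggested query ranks higher
    worsened = sum(s > b for b, s in pairs)  # suggested query ranks lower
    preserved = total - improved - worsened
    # Assumed convention: negative MRD means the suggested query is closer to the top.
    mean_rank_difference = sum(s - b for b, s in pairs) / total
    return {
        "improved": improved / total,
        "worsened": worsened / total,
        "preserved": preserved / total,
        "mrd": mean_rank_difference,
    }


def top_k_accuracy(first_hit_ranks, k=10):
    """Fraction of tasks whose first correct result is within the top k.

    A rank of None stands for a query that returned no correct result.
    """
    if not first_hit_ranks:
        raise ValueError("at least one change task is required")
    hits = sum(rank is not None and rank <= k for rank in first_hit_ranks)
    return hits / len(first_hit_ranks)


# Hypothetical first-hit ranks for four change tasks.
baseline = [10, 5, 3, 40]
suggested = [2, 5, 8, 10]
print(compare_queries(baseline, suggested))
print(top_k_accuracy(suggested, k=5))
```

Working from first-hit ranks keeps the sketch independent of any particular code search engine; only the ranked result lists and the gold set are needed to produce the inputs.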