SlideShare a Scribd company logo
On the Use of
     Relevance Feedback in
   IR-Based Concept Location


Gregory Gay*, Sonia Haiduc**, Andrian Marcus**, Tim
                     Menzies*

   * West Virginia University, Morgantown, WV, USA
      ** Wayne State University, Detroit, MI, USA
Software change
IR-based concept location
      Query




Ranked list of results   Source code
Challenge: the query




• Text in the query needs to match the text in the
  source code
• Difficult to formulate good queries
  - unfamiliar source code
  - unknown target
  -> hard to describe something that you do not know
Eclipse bug #13926

Bug description:
 JFace Text Editor Leaves a Black
 Rectangle on Content Assist text insertion.
 Inserting a selected completion proposal
 from the context information popup causes
 a black rectangle to appear on top of the
 display.
Queries
• Q1: jface text editor black rectangle insert text

• Q2: jface text editor rectangle insert context
  information

• Q3: jface text editor content assist

• Q4: jface insert selected completion proposal
  context information
Queries and results
• Q1: jface text editor black rectangle insert text
  – position of modified method: 7496
• Q2: jface text editor rectangle insert context
  information
  – position of modified method: 258
• Q3: jface text editor content assist
  – position of modified method: 119
• Q4: jface insert selected completion proposal
  context information
  – position of modified method: 723

  Whole change request: 54
IR CL in unfamiliar software

Developers:
• Rarely begin with a good query: hard to choose
  the right words
• Analyze very briefly list of results before
  reformulating query
• Even after reformulation, vague idea of what to
  look for -> queries not always better
• Can recognize whether the results retrieved are
  relevant or not to the problem
Questions

• Is there a way to make the query formulation
  easier on the developers?

• Is there a way to ensure that the subsequent
  queries lead us in the right direction?

• Can we do this by following the common
  practices of the developers?

• Can we improve IR-based CL using this
  approach?
Relevance feedback

•   Uses developer feedback about relevancy of
    returned results to automatically reformulate
    queries
•   Queries are reformulated by:
    –   Adding terms from relevant documents
    –   Removing terms from irrelevant documents
•   Iterative process
•   Common technique in text retrieval
•   Used also in SE
JFace Text Editor Leaves a Black Rectangle on Content Assist text
   insertion. Inserting a selected completion proposal from the
   context information popup causes a black rectangle to appear on
   top of the display.

1. createContextInfoPopup() in
   org.eclipse.jface.text.contentassist.ContextInformationPopup
2. configure() in
   org.eclipse.jdt.internal.debug.ui.JDIContentAssistPreference
3. showContextProposals() in
   org.eclipse.jface.text.contentassist.ContextInformationPopup


     + words in      documents          - words in      documents


                               New Query
IRRF tool
• IR Engine: Lucene
  – based on the Vector Space Model (VSM)
  – input: methods, query
  – output: a ranked list of methods ordered by their
    textual similarity to the query

• Relevance feedback: Rocchio algorithm
  – the classic algorithm for RF; used also in SE
  – models a way of incorporating relevance
    feedback information into the VSM
Evaluation

• Extracted bug descriptions and set of methods
  modified in the bug fixes from bug tracking
  systems
• Consider bug descriptions as initial queries for IR
• Measure #methods investigated until reaching a
  modified method before and after using RF
• Relevance feedback:
  – one developer provides feedback
  – feedback round ends after marking N methods as
    relevant or irrelevant (N = 1, 3 ,5)
Stop criteria

• Target method in top N results

• More than 50 methods analyzed

• Position of target methods in the ranked list
  of results increases for 2 consecutive rounds
  -> query moving away from wanted methods
Systems


 System     Vers.   LOC    Methods      Classes
 Eclipse     2.0 2,500,000 74,996        7,500
  jEdit     4.2      300,000   5,366     750
Adempiere   3.1.0    330,000   28,622   1,900
Results

 System     RF improves   RF does not
                 IR       improve IR
 Eclipse          6            1

  jEdit         3             3

Adempiere       4             1

   All          13            5
Results
• Eclipse:
 Report #    Baseline    IRRF N=1    IRRF N=3    IRRF N=5
  19686        428        453 (5r)   48 (16r)    46 m(9r)


• jEdit:
 Report #    Baseline    IRRF N=1    IRRF N=3    IRRF N=5
 1607211       354        103(5r)     36 (12r)    28 (6r)


• Adempiere:
 Report #    Baseline    IRRF N=1    IRRF N=3    IRRF N=5
 1628050        52         3 (3r)      5 (2r)      7 (2r)
Questions – revisited (1)
• Is there a way to make the query formulation
  easier on the developers?
  – automatic query formulation


• Is there a way to ensure that the subsequent
  queries lead us in the right direction?
  – add terms from relevant documents, remove terms
    from irrelevant documents
  – stop when we move away from the target (results
    worsen for 2 consecutive rounds)
Questions – revisited (2)
• Can we do this by following the common
  practices of the developers?
  – developers still analyze only a few results in
    the result list before reformulation


• Can we improve IR-based CL using
  relevance feedback?
  – in some cases yes
Current and future work
• Studies involving more systems and more
  developers

• Automatically calibrating the parameters
  for a specific system and a specific set of
  change requests

• Study the circumstances when RF does
  not improve IR

More Related Content

What's hot

Lecture 1
Lecture 1Lecture 1
Lecture 1
RacingKings
 
Softwaretestingstrategies
SoftwaretestingstrategiesSoftwaretestingstrategies
Softwaretestingstrategies
saieswar19
 
SYNTAX Directed Translation Report || Compiler Construction
SYNTAX Directed Translation Report || Compiler ConstructionSYNTAX Directed Translation Report || Compiler Construction
SYNTAX Directed Translation Report || Compiler Construction
Zain Abid
 
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
ijaia
 
Cyclomatic complexity
Cyclomatic complexityCyclomatic complexity
Cyclomatic complexity
Nikita Kesharwani
 
Generating test cases using UML Communication Diagram
Generating test cases using UML Communication Diagram Generating test cases using UML Communication Diagram
Generating test cases using UML Communication Diagram
Praveen Penumathsa
 
Formal Methods
Formal MethodsFormal Methods
Formal Methods
HendMuhammad
 
Complexity metrics and models
Complexity metrics and modelsComplexity metrics and models
Complexity metrics and models
Roy Antony Arnold G
 
DISE - Programming Concepts
DISE - Programming ConceptsDISE - Programming Concepts
DISE - Programming Concepts
Rasan Samarasinghe
 
Fundamentals of Software Engineering
Fundamentals of Software Engineering Fundamentals of Software Engineering
Fundamentals of Software Engineering
Madhar Khan Pathan
 
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
Twitter Inc.
 
Feature Selection Techniques for Software Fault Prediction (Summary)
Feature Selection Techniques for Software Fault Prediction (Summary)Feature Selection Techniques for Software Fault Prediction (Summary)
Feature Selection Techniques for Software Fault Prediction (Summary)
SungdoGu
 
Dynamic analysis in Software Testing
Dynamic analysis in Software TestingDynamic analysis in Software Testing
Dynamic analysis in Software Testing
Sagar Pednekar
 
Mca se chapter_9_formal_methods
Mca se chapter_9_formal_methodsMca se chapter_9_formal_methods
Mca se chapter_9_formal_methods
Aman Adhikari
 
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
CS, NcState
 
Introduction To Algorithms
Introduction To AlgorithmsIntroduction To Algorithms
Introduction To Algorithms
KM Bappi
 
Software testing- an introduction
Software testing- an introductionSoftware testing- an introduction
Software testing- an introduction
Santhi Priyan
 
Testing ppt
Testing pptTesting ppt
Testing ppt
Santhi Priyan
 
Unit 5 testing -software quality assurance
Unit 5  testing -software quality assuranceUnit 5  testing -software quality assurance
Unit 5 testing -software quality assurance
gopal10scs185
 

What's hot (19)

Lecture 1
Lecture 1Lecture 1
Lecture 1
 
Softwaretestingstrategies
SoftwaretestingstrategiesSoftwaretestingstrategies
Softwaretestingstrategies
 
SYNTAX Directed Translation Report || Compiler Construction
SYNTAX Directed Translation Report || Compiler ConstructionSYNTAX Directed Translation Report || Compiler Construction
SYNTAX Directed Translation Report || Compiler Construction
 
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
 
Cyclomatic complexity
Cyclomatic complexityCyclomatic complexity
Cyclomatic complexity
 
Generating test cases using UML Communication Diagram
Generating test cases using UML Communication Diagram Generating test cases using UML Communication Diagram
Generating test cases using UML Communication Diagram
 
Formal Methods
Formal MethodsFormal Methods
Formal Methods
 
Complexity metrics and models
Complexity metrics and modelsComplexity metrics and models
Complexity metrics and models
 
DISE - Programming Concepts
DISE - Programming ConceptsDISE - Programming Concepts
DISE - Programming Concepts
 
Fundamentals of Software Engineering
Fundamentals of Software Engineering Fundamentals of Software Engineering
Fundamentals of Software Engineering
 
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
 
Feature Selection Techniques for Software Fault Prediction (Summary)
Feature Selection Techniques for Software Fault Prediction (Summary)Feature Selection Techniques for Software Fault Prediction (Summary)
Feature Selection Techniques for Software Fault Prediction (Summary)
 
Dynamic analysis in Software Testing
Dynamic analysis in Software TestingDynamic analysis in Software Testing
Dynamic analysis in Software Testing
 
Mca se chapter_9_formal_methods
Mca se chapter_9_formal_methodsMca se chapter_9_formal_methods
Mca se chapter_9_formal_methods
 
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
 
Introduction To Algorithms
Introduction To AlgorithmsIntroduction To Algorithms
Introduction To Algorithms
 
Software testing- an introduction
Software testing- an introductionSoftware testing- an introduction
Software testing- an introduction
 
Testing ppt
Testing pptTesting ppt
Testing ppt
 
Unit 5 testing -software quality assurance
Unit 5  testing -software quality assuranceUnit 5  testing -software quality assurance
Unit 5 testing -software quality assurance
 

Viewers also liked

Finding Robust Solutions to Requirements Models
Finding Robust Solutions to Requirements ModelsFinding Robust Solutions to Requirements Models
Finding Robust Solutions to Requirements Models
gregoryg
 
Costing your Bug Data Operations
Costing your Bug Data OperationsCosting your Bug Data Operations
Costing your Bug Data Operations
DataWorks Summit
 
Using Predictive Analytics: Secrets to Creating a Successful Predictive Enter...
Using Predictive Analytics: Secrets to Creating a Successful Predictive Enter...Using Predictive Analytics: Secrets to Creating a Successful Predictive Enter...
Using Predictive Analytics: Secrets to Creating a Successful Predictive Enter...
Decision Management Solutions
 
How to Build Decision Management Systems Part 2 Decision services
How to Build Decision Management Systems Part 2 Decision servicesHow to Build Decision Management Systems Part 2 Decision services
How to Build Decision Management Systems Part 2 Decision services
Decision Management Solutions
 
Community-Assisted Software Engineering Decision Making
Community-Assisted Software Engineering Decision MakingCommunity-Assisted Software Engineering Decision Making
Community-Assisted Software Engineering Decision Making
gregoryg
 
The Robust Optimization of Non-Linear Requirements Models
The Robust Optimization of Non-Linear Requirements ModelsThe Robust Optimization of Non-Linear Requirements Models
The Robust Optimization of Non-Linear Requirements Models
gregoryg
 
Don't be Hadooped when looking for Big Data ROI
Don't be Hadooped when looking for Big Data ROIDon't be Hadooped when looking for Big Data ROI
Don't be Hadooped when looking for Big Data ROI
DataWorks Summit
 

Viewers also liked (7)

Finding Robust Solutions to Requirements Models
Finding Robust Solutions to Requirements ModelsFinding Robust Solutions to Requirements Models
Finding Robust Solutions to Requirements Models
 
Costing your Bug Data Operations
Costing your Bug Data OperationsCosting your Bug Data Operations
Costing your Bug Data Operations
 
Using Predictive Analytics: Secrets to Creating a Successful Predictive Enter...
Using Predictive Analytics: Secrets to Creating a Successful Predictive Enter...Using Predictive Analytics: Secrets to Creating a Successful Predictive Enter...
Using Predictive Analytics: Secrets to Creating a Successful Predictive Enter...
 
How to Build Decision Management Systems Part 2 Decision services
How to Build Decision Management Systems Part 2 Decision servicesHow to Build Decision Management Systems Part 2 Decision services
How to Build Decision Management Systems Part 2 Decision services
 
Community-Assisted Software Engineering Decision Making
Community-Assisted Software Engineering Decision MakingCommunity-Assisted Software Engineering Decision Making
Community-Assisted Software Engineering Decision Making
 
The Robust Optimization of Non-Linear Requirements Models
The Robust Optimization of Non-Linear Requirements ModelsThe Robust Optimization of Non-Linear Requirements Models
The Robust Optimization of Non-Linear Requirements Models
 
Don't be Hadooped when looking for Big Data ROI
Don't be Hadooped when looking for Big Data ROIDon't be Hadooped when looking for Big Data ROI
Don't be Hadooped when looking for Big Data ROI
 

Similar to Irrf Presentation

Combining IR with Relevance Feedback for Concept Location
Combining IR with Relevance Feedback for Concept LocationCombining IR with Relevance Feedback for Concept Location
Combining IR with Relevance Feedback for Concept Location
Sonia Haiduc
 
Apsec 2014 Presentation
Apsec 2014 PresentationApsec 2014 Presentation
Apsec 2014 Presentation
Ahrim Han, Ph.D.
 
Multi-method Evaluation in Scientific Paper Recommender Systems
Multi-method Evaluation in Scientific Paper Recommender SystemsMulti-method Evaluation in Scientific Paper Recommender Systems
Multi-method Evaluation in Scientific Paper Recommender Systems
Aravind Sesagiri Raamkumar
 
Query processing
Query processingQuery processing
Query processing
Dr. C.V. Suresh Babu
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Lucidworks
 
Building a Meta-search Engine
Building a Meta-search EngineBuilding a Meta-search Engine
Building a Meta-search Engine
Ayan Chandra
 
Building largescalepredictionsystemv1
Building largescalepredictionsystemv1Building largescalepredictionsystemv1
Building largescalepredictionsystemv1
arthi v
 
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
David Zibriczky
 
Customer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenCustomer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R Open
Poo Kuan Hoong
 
Gaussian Ranking by Matrix Factorization, ACM RecSys Conference 2015
Gaussian Ranking by Matrix Factorization, ACM RecSys Conference 2015Gaussian Ranking by Matrix Factorization, ACM RecSys Conference 2015
Gaussian Ranking by Matrix Factorization, ACM RecSys Conference 2015
Harald Steck
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.com
Simon Hughes
 
IRJET- Question-Answer Text Mining using Machine Learning
IRJET- Question-Answer Text Mining using Machine LearningIRJET- Question-Answer Text Mining using Machine Learning
IRJET- Question-Answer Text Mining using Machine Learning
IRJET Journal
 
IRJET- Question-Answer Text Mining using Machine Learning
IRJET-  	  Question-Answer Text Mining using Machine LearningIRJET-  	  Question-Answer Text Mining using Machine Learning
IRJET- Question-Answer Text Mining using Machine Learning
IRJET Journal
 
Test Driven Development
Test Driven DevelopmentTest Driven Development
Test Driven Development
nikhil sreeni
 
IT1204 - Software Engineering - L12
IT1204 - Software Engineering - L12IT1204 - Software Engineering - L12
IT1204 - Software Engineering - L12
BakerTilly US
 
Managing software project, software engineering
Managing software project, software engineeringManaging software project, software engineering
Managing software project, software engineering
Rupesh Vaishnav
 
Development Guideline
Development GuidelineDevelopment Guideline
Development Guideline
Mohammad Nasir Uddin
 
Triantafyllia Voulibasi
Triantafyllia VoulibasiTriantafyllia Voulibasi
Triantafyllia Voulibasi
ISSEL
 
Software Measurement and Metrics.pptx
Software Measurement and Metrics.pptxSoftware Measurement and Metrics.pptx
Software Measurement and Metrics.pptx
ubaidullah75790
 
Effective Named Entity Recognition for Idiosyncratic Web Collections
Effective Named Entity Recognition for Idiosyncratic Web CollectionsEffective Named Entity Recognition for Idiosyncratic Web Collections
Effective Named Entity Recognition for Idiosyncratic Web Collections
eXascale Infolab
 

Similar to Irrf Presentation (20)

Combining IR with Relevance Feedback for Concept Location
Combining IR with Relevance Feedback for Concept LocationCombining IR with Relevance Feedback for Concept Location
Combining IR with Relevance Feedback for Concept Location
 
Apsec 2014 Presentation
Apsec 2014 PresentationApsec 2014 Presentation
Apsec 2014 Presentation
 
Multi-method Evaluation in Scientific Paper Recommender Systems
Multi-method Evaluation in Scientific Paper Recommender SystemsMulti-method Evaluation in Scientific Paper Recommender Systems
Multi-method Evaluation in Scientific Paper Recommender Systems
 
Query processing
Query processingQuery processing
Query processing
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
 
Building a Meta-search Engine
Building a Meta-search EngineBuilding a Meta-search Engine
Building a Meta-search Engine
 
Building largescalepredictionsystemv1
Building largescalepredictionsystemv1Building largescalepredictionsystemv1
Building largescalepredictionsystemv1
 
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
 
Customer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenCustomer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R Open
 
Gaussian Ranking by Matrix Factorization, ACM RecSys Conference 2015
Gaussian Ranking by Matrix Factorization, ACM RecSys Conference 2015Gaussian Ranking by Matrix Factorization, ACM RecSys Conference 2015
Gaussian Ranking by Matrix Factorization, ACM RecSys Conference 2015
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.com
 
IRJET- Question-Answer Text Mining using Machine Learning
IRJET- Question-Answer Text Mining using Machine LearningIRJET- Question-Answer Text Mining using Machine Learning
IRJET- Question-Answer Text Mining using Machine Learning
 
IRJET- Question-Answer Text Mining using Machine Learning
IRJET-  	  Question-Answer Text Mining using Machine LearningIRJET-  	  Question-Answer Text Mining using Machine Learning
IRJET- Question-Answer Text Mining using Machine Learning
 
Test Driven Development
Test Driven DevelopmentTest Driven Development
Test Driven Development
 
IT1204 - Software Engineering - L12
IT1204 - Software Engineering - L12IT1204 - Software Engineering - L12
IT1204 - Software Engineering - L12
 
Managing software project, software engineering
Managing software project, software engineeringManaging software project, software engineering
Managing software project, software engineering
 
Development Guideline
Development GuidelineDevelopment Guideline
Development Guideline
 
Triantafyllia Voulibasi
Triantafyllia VoulibasiTriantafyllia Voulibasi
Triantafyllia Voulibasi
 
Software Measurement and Metrics.pptx
Software Measurement and Metrics.pptxSoftware Measurement and Metrics.pptx
Software Measurement and Metrics.pptx
 
Effective Named Entity Recognition for Idiosyncratic Web Collections
Effective Named Entity Recognition for Idiosyncratic Web CollectionsEffective Named Entity Recognition for Idiosyncratic Web Collections
Effective Named Entity Recognition for Idiosyncratic Web Collections
 

More from gregoryg

Distributed Decision Tree Induction
Distributed Decision Tree InductionDistributed Decision Tree Induction
Distributed Decision Tree Induction
gregoryg
 
Optimizing Requirements Decisions with KEYS
Optimizing Requirements Decisions with KEYSOptimizing Requirements Decisions with KEYS
Optimizing Requirements Decisions with KEYS
gregoryg
 
Confidence in Software Cost Estimation Results based on MMRE and PRED
Confidence in Software Cost Estimation Results based on MMRE and PREDConfidence in Software Cost Estimation Results based on MMRE and PRED
Confidence in Software Cost Estimation Results based on MMRE and PRED
gregoryg
 
Promise08 Wrapup
Promise08 WrapupPromise08 Wrapup
Promise08 Wrapup
gregoryg
 
Improving Analogy Software Effort Estimation using Fuzzy Feature Subset Selec...
Improving Analogy Software Effort Estimation using Fuzzy Feature Subset Selec...Improving Analogy Software Effort Estimation using Fuzzy Feature Subset Selec...
Improving Analogy Software Effort Estimation using Fuzzy Feature Subset Selec...
gregoryg
 
Software Defect Repair Times: A Multiplicative Model
Software Defect Repair Times: A Multiplicative ModelSoftware Defect Repair Times: A Multiplicative Model
Software Defect Repair Times: A Multiplicative Model
gregoryg
 
Complementing Approaches in ERP Effort Estimation Practice: an Industrial Study
Complementing Approaches in ERP Effort Estimation Practice: an Industrial StudyComplementing Approaches in ERP Effort Estimation Practice: an Industrial Study
Complementing Approaches in ERP Effort Estimation Practice: an Industrial Study
gregoryg
 
Multi-criteria Decision Analysis for Customization of Estimation by Analogy M...
Multi-criteria Decision Analysis for Customization of Estimation by Analogy M...Multi-criteria Decision Analysis for Customization of Estimation by Analogy M...
Multi-criteria Decision Analysis for Customization of Estimation by Analogy M...
gregoryg
 
Implications of Ceiling Effects in Defect Predictors
Implications of Ceiling Effects in Defect PredictorsImplications of Ceiling Effects in Defect Predictors
Implications of Ceiling Effects in Defect Predictors
gregoryg
 
Practical use of defect detection and prediction
Practical use of defect detection and predictionPractical use of defect detection and prediction
Practical use of defect detection and prediction
gregoryg
 
Risk And Relevance 20080414ppt
Risk And Relevance 20080414pptRisk And Relevance 20080414ppt
Risk And Relevance 20080414ppt
gregoryg
 
Organizations Use Data
Organizations Use DataOrganizations Use Data
Organizations Use Data
gregoryg
 
Cukic Promise08 V3
Cukic Promise08 V3Cukic Promise08 V3
Cukic Promise08 V3
gregoryg
 
Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2
gregoryg
 
Elane - Promise08
Elane - Promise08Elane - Promise08
Elane - Promise08
gregoryg
 
Risk And Relevance 20080414ppt
Risk And Relevance 20080414pptRisk And Relevance 20080414ppt
Risk And Relevance 20080414ppt
gregoryg
 
Introduction Promise 2008 V3
Introduction Promise 2008 V3Introduction Promise 2008 V3
Introduction Promise 2008 V3
gregoryg
 

More from gregoryg (17)

Distributed Decision Tree Induction
Distributed Decision Tree InductionDistributed Decision Tree Induction
Distributed Decision Tree Induction
 
Optimizing Requirements Decisions with KEYS
Optimizing Requirements Decisions with KEYSOptimizing Requirements Decisions with KEYS
Optimizing Requirements Decisions with KEYS
 
Confidence in Software Cost Estimation Results based on MMRE and PRED
Confidence in Software Cost Estimation Results based on MMRE and PREDConfidence in Software Cost Estimation Results based on MMRE and PRED
Confidence in Software Cost Estimation Results based on MMRE and PRED
 
Promise08 Wrapup
Promise08 WrapupPromise08 Wrapup
Promise08 Wrapup
 
Improving Analogy Software Effort Estimation using Fuzzy Feature Subset Selec...
Improving Analogy Software Effort Estimation using Fuzzy Feature Subset Selec...Improving Analogy Software Effort Estimation using Fuzzy Feature Subset Selec...
Improving Analogy Software Effort Estimation using Fuzzy Feature Subset Selec...
 
Software Defect Repair Times: A Multiplicative Model
Software Defect Repair Times: A Multiplicative ModelSoftware Defect Repair Times: A Multiplicative Model
Software Defect Repair Times: A Multiplicative Model
 
Complementing Approaches in ERP Effort Estimation Practice: an Industrial Study
Complementing Approaches in ERP Effort Estimation Practice: an Industrial StudyComplementing Approaches in ERP Effort Estimation Practice: an Industrial Study
Complementing Approaches in ERP Effort Estimation Practice: an Industrial Study
 
Multi-criteria Decision Analysis for Customization of Estimation by Analogy M...
Multi-criteria Decision Analysis for Customization of Estimation by Analogy M...Multi-criteria Decision Analysis for Customization of Estimation by Analogy M...
Multi-criteria Decision Analysis for Customization of Estimation by Analogy M...
 
Implications of Ceiling Effects in Defect Predictors
Implications of Ceiling Effects in Defect PredictorsImplications of Ceiling Effects in Defect Predictors
Implications of Ceiling Effects in Defect Predictors
 
Practical use of defect detection and prediction
Practical use of defect detection and predictionPractical use of defect detection and prediction
Practical use of defect detection and prediction
 
Risk And Relevance 20080414ppt
Risk And Relevance 20080414pptRisk And Relevance 20080414ppt
Risk And Relevance 20080414ppt
 
Organizations Use Data
Organizations Use DataOrganizations Use Data
Organizations Use Data
 
Cukic Promise08 V3
Cukic Promise08 V3Cukic Promise08 V3
Cukic Promise08 V3
 
Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2
 
Elane - Promise08
Elane - Promise08Elane - Promise08
Elane - Promise08
 
Risk And Relevance 20080414ppt
Risk And Relevance 20080414pptRisk And Relevance 20080414ppt
Risk And Relevance 20080414ppt
 
Introduction Promise 2008 V3
Introduction Promise 2008 V3Introduction Promise 2008 V3
Introduction Promise 2008 V3
 

Recently uploaded

Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 

Recently uploaded (20)

Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 

Irrf Presentation

  • 1. On the Use of Relevance Feedback in IR-Based Concept Location Gregory Gay*, Sonia Haiduc**, Andrian Marcus**, Tim Menzies* * West Virginia University, Morgantown, WV, USA ** Wayne State University, Detroit, MI, USA
  • 3. IR-based concept location Query Ranked list of results Source code
  • 4. Challenge: the query • Text in the query needs to match the text in the source code • Difficult to formulate good queries - unfamiliar source code - unknown target -> hard to describe something that you do not know
  • 5. Eclipse bug #13926 Bug description: JFace Text Editor Leaves a Black Rectangle on Content Assist text insertion. Inserting a selected completion proposal from the context information popup causes a black rectangle to appear on top of the display.
  • 6. Queries • Q1: jface text editor black rectangle insert text • Q2: jface text editor rectangle insert context information • Q3: jface text editor content assist • Q4: jface insert selected completion proposal context information
  • 7. Queries and results • Q1: jface text editor black rectangle insert text – position of modified method: 7496 • Q2: jface text editor rectangle insert context information – position of modified method: 258 • Q3: jface text editor content assist – position of modified method: 119 • Q4: jface insert selected completion proposal context information – position of modified method: 723 Whole change request: 54
  • 8. IR CL in unfamiliar software Developers: • Rarely begin with a good query: hard to choose the right words • Analyze very briefly list of results before reformulating query • Even after reformulation, vague idea of what to look for -> queries not always better • Can recognize whether the results retrieved are relevant or not to the problem
  • 9. Questions • Is there a way to make the query formulation easier on the developers? • Is there a way to ensure that the subsequent queries lead us in the right direction? • Can we do this by following the common practices of the developers? • Can we improve IR-based CL using this approach?
  • 10. Relevance feedback • Uses developer feedback about relevancy of returned results to automatically reformulate queries • Queries are reformulated by: – Adding terms from relevant documents – Removing terms from irrelevant documents • Iterative process • Common technique in text retrieval • Used also in SE
  • 11. JFace Text Editor Leaves a Black Rectangle on Content Assist text insertion. Inserting a selected completion proposal from the context information popup causes a black rectangle to appear on top of the display. 1. createContextInfoPopup() in org.eclipse.jface.text.contentassist.ContextInformationPopup 2. configure() in org.eclipse.jdt.internal.debug.ui.JDIContentAssistPreference 3. showContextProposals() in org.eclipse.jface.text.contentassist.ContextInformationPopup + words in documents - words in documents New Query
  • 12. IRRF tool • IR Engine: Lucene – based on the Vector Space Model (VSM) – input: methods, query – output: a ranked list of methods ordered by their textual similarity to the query • Relevance feedback: Rocchio algorithm – the classic algorithm for RF; used also in SE – models a way of incorporating relevance feedback information into the VSM
  • 13. Evaluation • Extracted bug descriptions and set of methods modified in the bug fixes from bug tracking systems • Consider bug descriptions as initial queries for IR • Measure #methods investigated until reaching a modified method before and after using RF • Relevance feedback: – one developer provides feedback – feedback round ends after marking N methods as relevant or irrelevant (N = 1, 3 ,5)
  • 14. Stop criteria • Target method in top N results • More than 50 methods analyzed • Position of target methods in the ranked list of results increases for 2 consecutive rounds -> query moving away from wanted methods
  • 15. Systems System Vers. LOC Methods Classes Eclipse 2.0 2,500,000 74,996 7,500 jEdit 4.2 300,000 5,366 750 Adempiere 3.1.0 330,000 28,622 1,900
  • 16. Results System RF improves RF does not IR improve IR Eclipse 6 1 jEdit 3 3 Adempiere 4 1 All 13 5
  • 17. Results • Eclipse: Report # Baseline IRRF N=1 IRRF N=3 IRRF N=5 19686 428 453 (5r) 48 (16r) 46 m(9r) • jEdit: Report # Baseline IRRF N=1 IRRF N=3 IRRF N=5 1607211 354 103(5r) 36 (12r) 28 (6r) • Adempiere: Report # Baseline IRRF N=1 IRRF N=3 IRRF N=5 1628050 52 3 (3r) 5 (2r) 7 (2r)
  • 18. Questions – revisited (1) • Is there a way to make the query formulation easier on the developers? – automatic query formulation • Is there a way to ensure that the subsequent queries lead us in the right direction? – add terms from relevant documents, remove terms from irrelevant documents – stop when we move away from the target (results worsen for 2 consecutive rounds)
  • 19. Questions – revisited (2) • Can we do this by following the common practices of the developers? – developers still analyze only a few results in the result list before reformulation • Can we improve IR-based CL using relevance feedback? – in some cases yes
  • 20. Current and future work • Studies involving more systems and more developers • Automatically calibrating the parameters for a specific system and a specific set of change requests • Study the circumstances when RF does not improve IR