Patterns for Cleaning Up Bug Data
Rodrigo Souza1,*
Christina Chavez1
Roberto Bittencourt2
1 Federal University of Bahia, Brazil
2 State University of Feira de Santana, Brazil
DAPSE’13: International Workshop on Data Analysis Patterns in Software Engineering
* speaker; email: rodrigo@dcc.ufba.br
May 21, 2013 San Francisco, USA
Bug reports
provide insight about…
- the quality of the software
- the quality of the process

Bug reports
often contain data that is…
- incomplete
- inaccurate
- biased
and may lead you to wrong conclusions

Bug reports
are like vegetables…
You have to clean them up before using them
In This Talk
Two patterns to help you clean up your data
1. Look Out For Mass Updates
2. Old Wine Tastes Better
Look Out for Mass Updates
Determine which changes to bug reports were the result of a mass update.
1. Context
2. Problem
3. Solution
4. Discussion
1. Context

Joe’s worklog, Tuesday
- Worked on bug #5
- Worked on bug #12
- Updated bug report #5
- Updated bug report #12
Today, Joe worked on two bugs and updated the corresponding bug reports

Joe’s worklog, Tuesday, as seen in the bug data
- Updated bug report #5
- Updated bug report #12
Data scientists just see the updates
Joe updated two reports ⇒ Joe worked on two bugs

Joe’s worklog, Wednesday
- Updated bug report #3
- Updated bug report #18
- Updated bug report #9
- Updated bug report #15
- Updated bug report #21
- Updated bug report #52
- Updated bug report #40
- Updated bug report #41
- Updated bug report #68
- Updated bug report #73
- Updated bug report #78
- …
Joe updated 2600 reports ⇒ Joe worked on 2600 bugs?

Mass updates do not represent actual work
Often, they are just cleanup

Mass updates should be discarded from your analyses
2. Problem
Determine which changes to bug reports were the result of a mass update
3. Solution
Ingredients
You’ll need:
- Changes in bug reports (i.e., updates)
  - What changed
  - Date
  - User
  - Comment

Bug #   What changed        Date   User   Comment
1       status ⇒ VERIFIED   ...    ...    ...
2       status ⇒ VERIFIED   ...    ...    ...
3       status ⇒ CLOSED     ...    ...    ...
4       status ⇒ VERIFIED   ...    ...    ...

Select one type of change (“what changed”), e.g., status ⇒ VERIFIED
Directions (solution #1)
1. Plot the accumulated number of changes over time
2. Seek unusually high cliffs
3. Changes in the cliff are considered mass updates
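A minimal sketch of solution #1, assuming each change of the selected type is available as a Python `date`. Rather than drawing the plot, it computes the accumulated series that would be plotted and flags the days whose jump is unusually high. The function names, the toy data, and the `factor` cutoff (10 times the median daily count) are illustrative assumptions, not part of the talk.

```python
from collections import Counter
from datetime import date
from itertools import accumulate
from statistics import median


def cumulative_changes(change_dates):
    """Return (sorted days, accumulated number of changes up to each day)."""
    per_day = Counter(change_dates)
    days = sorted(per_day)
    totals = list(accumulate(per_day[d] for d in days))
    return days, totals


def find_cliffs(change_dates, factor=10):
    """Flag days whose change count exceeds `factor` times the median daily count.

    `factor` is an assumed cutoff; the slides leave the threshold open.
    """
    per_day = Counter(change_dates)
    typical = median(per_day.values())
    return sorted(d for d, n in per_day.items() if n > factor * typical)


# Toy data (made up): three ordinary days and one day with 50 changes.
changes = (
    [date(2013, 1, 1)] * 2
    + [date(2013, 1, 2)] * 3
    + [date(2013, 1, 3)] * 50
    + [date(2013, 1, 4)] * 1
)

days, totals = cumulative_changes(changes)  # series to plot: days vs. totals
cliffs = find_cliffs(changes)               # days treated as mass updates
```

Plotting `days` against `totals` with any charting tool shows the cliff visually; the median-based cutoff is only one possible way to automate the "unusually high" judgment.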
Directions (solution #2)
1. Group changes by ⟨date, user, comment⟩
2. Count the changes in each group
3. Groups with higher counts are mass updates

Date   User   Comment   Count ▼
D1     U1     C1        1703
D2     U2     C2        972
D3     U3     C3        447
D4     U4     C4        1
D5     U5     C5        1
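Solution #2 can be sketched as follows, assuming each change is a dictionary with `date`, `user`, and `comment` keys. The record layout, the function names, the toy data, and the `min_count` cutoff are assumptions made for illustration; the talk does not prescribe them.

```python
from collections import Counter


def group_key(change):
    """The grouping key used by the pattern: (date, user, comment)."""
    return (change["date"], change["user"], change["comment"])


def group_counts(changes):
    """Count changes per (date, user, comment) group, largest group first."""
    return Counter(group_key(c) for c in changes).most_common()


def discard_mass_updates(changes, min_count):
    """Keep only changes whose group has fewer than `min_count` members.

    `min_count` is an assumed cutoff chosen by the analyst.
    """
    counts = Counter(group_key(c) for c in changes)
    return [c for c in changes if counts[group_key(c)] < min_count]


# Toy data (made up): three identical bulk changes and two ordinary ones.
changes = [
    {"bug": 3, "date": "2013-01-09", "user": "joe", "comment": "bulk close"},
    {"bug": 18, "date": "2013-01-09", "user": "joe", "comment": "bulk close"},
    {"bug": 9, "date": "2013-01-09", "user": "joe", "comment": "bulk close"},
    {"bug": 5, "date": "2013-01-08", "user": "joe", "comment": "fixed crash"},
    {"bug": 12, "date": "2013-01-08", "user": "joe", "comment": "fixed typo"},
]

groups = group_counts(changes)
kept = discard_mass_updates(changes, min_count=3)
```

Sorting the groups by count, as in the table above, makes the few very large groups stand out before any cutoff is chosen.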
4. Discussion
The main challenge is
to find a suitable threshold
(i.e., how many updates characterize mass updates)
Old Wine Tastes Better
Determine bug reports that are too recent to be classified.
1. Context
2. Problem
3. Solution
4. Discussion
1. Context
Prediction models predict which bug
reports will undergo some change, e.g.,
predict which bugs get reopened,
predict which bugs get closed as invalid,
predict which bugs get assigned to John.
e.g., predict which bugs get reopened

Training set
#   Who reported?   Severity   Age   Reopened?
1   ...             ...        ...   YES
2   ...             ...        ...   YES
3   ...             ...        ...   NO
4   ...             ...        ...   NO
5   ...             ...        ...   NO

Training set
#   Who reported?   Severity   Age     Reopened?
1   ...             ...        ...     YES
2   ...             ...        ...     YES
3   ...             ...        ...     NO
4   ...             ...        ...     NO
5   ...             ...        1 day   not yet

Can’t use bugs that are too recent for training
2. Problem
Determine bug reports that are too recent to be classified
3. Solution
Ingredients
You’ll need:
- Date of last change in your data set
- Bug reports
  - Creation date
  - Whether it has been reopened*
* or, in general, whether it has undergone a particular change
Directions
1. Measure each bug’s age, from its creation date to the date of the last change in your data set

#     ...   Age        Reopened?
1     ...   180 days   YES
2     ...   90 days    NO
3     ...   16 days    YES
4     ...   12 days    NO
...   ...   ...        ...
2. Guess a threshold so that bugs younger than the threshold are considered too recent to be classified

threshold = 42 days

#     ...   Age        Reopened?
1     ...   180 days   YES
2     ...   90 days    NO
3     ...   16 days    YES          (too recent)
4     ...   12 days    NO           (too recent)
...   ...   ...        ...
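Steps 1 and 2 can be sketched as follows, assuming each bug is a dictionary with a `created` date. The ages (180, 90, 16, 12 days) and the 42-day threshold come from the slides; the record layout, the function names, and the date of the last change are made up for illustration.

```python
from datetime import date, timedelta


def age_in_days(created, last_change):
    """Age of a bug: from its creation date to the last change in the data set."""
    return (last_change - created).days


def split_by_age(bugs, last_change, threshold_days):
    """Separate bugs that are old enough from those too recent to be classified."""
    old_enough, too_recent = [], []
    for bug in bugs:
        if age_in_days(bug["created"], last_change) < threshold_days:
            too_recent.append(bug)
        else:
            old_enough.append(bug)
    return old_enough, too_recent


# Hypothetical date of the last change in the data set.
last_change = date(2013, 5, 1)

# The four example bugs from the slides, rebuilt from their ages.
bugs = [
    {"id": i, "created": last_change - timedelta(days=age), "reopened": reopened}
    for i, age, reopened in [(1, 180, True), (2, 90, False), (3, 16, True), (4, 12, False)]
]

old_enough, too_recent = split_by_age(bugs, last_change, threshold_days=42)
```

With the 42-day threshold, bugs 3 and 4 fall into `too_recent` and would be left out of the training set.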
3. Estimate the confidence (α) that the remaining non-reopened bugs will never be reopened

α = (formula in the paper)

Quantities named on the slide:
- num. bugs that have been reopened
- num. bugs older than the threshold
4. If α is not high enough (e.g., α < 0.95), choose another threshold (i.e., repeat from step 2)
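The loop over steps 2 to 4 can be sketched as a simple search. The slides defer the α formula to the paper, so it is not reproduced here: the estimator is passed in as a function, and the `toy_alpha` values below are placeholders made up to exercise the loop, not results. The function name and the 0.95 default target are likewise assumptions.

```python
def choose_threshold(candidate_thresholds, estimate_confidence, target=0.95):
    """Try thresholds in order; return the first (threshold, alpha) with alpha >= target.

    `estimate_confidence` stands in for the paper's formula, which is not
    reproduced here. Returns None if no candidate reaches the target.
    """
    for threshold in candidate_thresholds:
        alpha = estimate_confidence(threshold)
        if alpha >= target:
            return threshold, alpha
    return None


# Placeholder confidence values per threshold in days (made up, not results).
toy_alpha = {7: 0.80, 21: 0.90, 42: 0.95}

result = choose_threshold([7, 21, 42], toy_alpha.get)
no_result = choose_threshold([7, 21], toy_alpha.get)
```

Trying the smallest thresholds first keeps as much data as possible while still reaching the target confidence.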
4. Discussion
There’s a trade-off:
larger α ⇒ more confidence, less data
smaller α ⇒ less confidence, more data
For the NetBeans/Platform project:
removing bugs younger than 6 weeks (0.7%)
raises the confidence from 88% to 95%
Arrrr!* It’s in the paper!
* Do ye have any source code to show?
Thank you!
And clean up your bug
reports before using them!

Patterns for Cleaning Up Bug Data

  • 1. Patterns for Cleaning Up Bug Data Rodrigo Souza1,* Christina Chavez1 Roberto Bittencourt2 1 Federal University of Bahia, Brazil 2 State University of Feira de Santana, Brazil DAPSE’13: International Workshop on Data Analysis Patterns in Software Engineering * speaker; email: rodrigo@dcc.ufba.br May 21, 2013 San Francisco, USA
  • 3. provide insight about… - the quality of the software - the quality of the process Bug reports  
  • 4. often contain data that is… -  incomplete -  innacurate -  biased Bug reports   may lead you to wrong conclusions  
  • 5. are like vegetables… You have to clean them up before using them Bug reports  
  • 6. In This Talk Two patterns to help you clean up your data 1. Look Out For Mass Updates 2. Old Wine Tastes Better
  • 7. Look Out for Mass Updates Determine which changes to bug reports were the result of a mass update.
  • 8. 1. Context 2. Problem 3. Solution 4. Discussion Look Out for Mass Updates
  • 9. tuesday Worked  on  bug  #5   Worked  on  bug  #12   Updated  bug  report  #5   Updated  bug  report  #12   Joe’s worklog Today, Joe worked on two bugs and updated the corresponding bug reports
  • 10. tuesday Updated  bug  report  #5   Updated  bug  report  #12   Joe’s worklog Data scientists just see the updates Joe updated two reports ⇒ Joe worked on two bugs Worked  on  bug  #5   Worked  on  bug  #12  
  • 11. wednesday Joe’s worklog Joe updated 2600 reports ⇒ Joe worked on 2600 bugs? Updated  bug  report  #3   Updated  bug  report  #18   Updated  bug  report  #9   Updated  bug  report  #15   Updated  bug  report  #21   Updated  bug  report  #52   Updated  bug  report  #40   Updated  bug  report  #41   Updated  bug  report  #68   Updated  bug  report  #73   Updated  bug  report  #78   …  
  • 13. Mass updates do not represent actual work. Often, they are just cleanup
  • 14. Mass updates should be discarded from your analyses
  • 15. 1. Context 2. Problem 3. Solution 4. Discussion Look Out for Mass Updates
  • 16. Determine which changes to bug reports were the result of a mass update
  • 17. 1. Context 2. Problem 3. Solution 4. Discussion Look Out for Mass Updates
  • 18. Ingredients. You’ll need: changes in bug reports (i.e., updates), each with: - What changed - Date - User - Comment
  • 19. Ingredients. Select one type of change (“what changed”), e.g., status ⇒ VERIFIED
        Bug # | What changed      | Date | User | Comment
        1     | status ⇒ VERIFIED | ...  | ...  | ...
        2     | status ⇒ VERIFIED | ...  | ...  | ...
        3     | status ⇒ CLOSED   | ...  | ...  | ...
        4     | status ⇒ VERIFIED | ...  | ...  | ...
  • 20. Directions (solution #1): 1. Plot the accumulated number of changes over time. 2. Seek unusually high cliffs. 3. Changes in the cliff are considered mass updates
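A cliff in the accumulated curve is simply a day on which the number of changes jumps far above normal, so solution #1 can be approximated without a plot. The sketch below is only an illustration under stated assumptions: the function name `find_cliffs`, the median baseline, and the `factor` of 10 are hypothetical choices, not something the slides prescribe, and the dates are toy data.

```python
from collections import Counter
from datetime import date
from statistics import median


def find_cliffs(change_dates, factor=10):
    """Return the days whose change count is at least `factor` times
    the median daily count (an assumed, adjustable notion of "cliff")."""
    per_day = Counter(change_dates)
    typical = median(per_day.values())
    return sorted(day for day, n in per_day.items() if n >= factor * typical)


# Toy data: a few changes on ordinary days, 2600 changes on one day.
changes = (
    [date(2013, 5, 6)] * 2
    + [date(2013, 5, 7)] * 2
    + [date(2013, 5, 8)] * 2600
    + [date(2013, 5, 9)] * 3
)
cliffs = find_cliffs(changes)
print(cliffs)
```

Changes made on the returned days would then be treated as mass-update candidates and inspected before being discarded.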
  • 21. Directions (solution #2): 1. Group changes by ⟨date, user, comment⟩. 2. Count the groups. 3. Groups with higher counts are mass updates
        Date | User | Comment | Count ▼
        D1   | U1   | C1      | 1703
        D2   | U2   | C2      | 972
        D3   | U3   | C3      | 447
        D4   | U4   | C4      | 1
        D5   | U5   | C5      | 1
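Solution #2 maps directly onto a group-and-count operation. The following is a minimal sketch, assuming each change is a dictionary with `date`, `user`, and `comment` keys; the function name `mass_update_groups` and the threshold of 100 are hypothetical, and the records are toy data that reuses the placeholder labels from the slide.

```python
from collections import Counter


def mass_update_groups(changes, threshold):
    """Group changes by (date, user, comment) and return the groups
    that contain at least `threshold` changes, largest first."""
    counts = Counter((c["date"], c["user"], c["comment"]) for c in changes)
    return {group: n for group, n in counts.most_common() if n >= threshold}


# Toy data: one group of 1703 identical updates and two singleton groups.
changes = (
    [{"bug": i, "date": "D1", "user": "U1", "comment": "C1"} for i in range(1703)]
    + [{"bug": 5000, "date": "D4", "user": "U4", "comment": "C4"}]
    + [{"bug": 5001, "date": "D5", "user": "U5", "comment": "C5"}]
)
flagged = mass_update_groups(changes, threshold=100)
print(flagged)
```

Changing `threshold` is the only knob here, which is exactly the choice the discussion slide identifies as the main challenge.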
  • 22. 1. Context 2. Problem 3. Solution 4. Discussion Look Out for Mass Updates
  • 23. The main challenge is to find a suitable threshold (i.e., how many updates characterize mass updates)
  • 24. Old Wine Tastes Better Determine bug reports that are too recent to be classified.
  • 25. 1. Context 2. Problem 3. Solution 4. Discussion Old Wine Tastes Better
  • 26. Prediction models predict which bug reports will undergo some change, e.g., predict which bugs get reopened, predict which bugs get closed as invalid, predict which bugs get assigned to John.
  • 27. e.g., predict which bugs get reopened. Training set:
        # | Who reported? | Severity | Age | Reopened?
        1 | ...           | ...      | ... | YES
        2 | ...           | ...      | ... | YES
        3 | ...           | ...      | ... | NO
        4 | ...           | ...      | ... | NO
        5 | ...           | ...      | ... | NO
  • 28. Training set: can’t use too recent bugs for training
        # | Who reported? | Severity | Age   | Reopened?
        1 | ...           | ...      | ...   | YES
        2 | ...           | ...      | ...   | YES
        3 | ...           | ...      | ...   | NO
        4 | ...           | ...      | ...   | NO
        5 | ...           | ...      | 1 day | not yet
  • 29. 1. Context 2. Problem 3. Solution 4. Discussion Old Wine Tastes Better
  • 30. Determine bug reports that are too recent to be classified
  • 31. 1. Context 2. Problem 3. Solution 4. Discussion Old Wine Tastes Better
  • 32. Ingredients. You’ll need: - Date of last change in your data set - Bug reports, each with: creation date; whether it has been reopened* (* or, in general, whether it has undergone a particular change)
  • 33. Directions, step 1: Measure each bug’s age, from its creation date to the date of the last change in your data set
        # | ... | Age      | Reopened?
        1 | ... | 180 days | YES
        2 | ... | 90 days  | NO
        3 | ... | 16 days  | YES
        4 | ... | 12 days  | NO
        … | ... | ...      | ...
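The age measurement in this step is a plain date subtraction. In the sketch below, the function name `bug_ages` and all dates are illustrative assumptions; the creation dates were picked so that the resulting ages match the 180, 90, 16, and 12 days shown on the slide.

```python
from datetime import date


def bug_ages(creation_dates, last_change):
    """Map each bug id to its age in days, measured from its creation
    date up to the date of the last change in the data set."""
    return {bug: (last_change - created).days
            for bug, created in creation_dates.items()}


# Hypothetical dates chosen to reproduce the ages on the slide.
last_change = date(2013, 5, 21)
creation_dates = {
    1: date(2012, 11, 22),
    2: date(2013, 2, 20),
    3: date(2013, 5, 5),
    4: date(2013, 5, 9),
}
ages = bug_ages(creation_dates, last_change)
print(ages)
```

Measuring against the last change in the data set, rather than against today's date, keeps the ages stable no matter when the analysis is rerun.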
  • 34. Directions, step 2: Guess a threshold so that bugs younger than the threshold are considered too recent to be classified. Example: threshold = 42 days
        # | ... | Age      | Reopened?
        1 | ... | 180 days | YES
        2 | ... | 90 days  | NO
        3 | ... | 16 days  | YES        (too recent)
        4 | ... | 12 days  | NO         (too recent)
        … | ... | ...      | ...
  • 35. Directions, step 3: Estimate the confidence (α) that the remaining non-reopened bugs will never be reopened
        # | ... | Age      | Reopened?
        1 | ... | 180 days | YES
        2 | ... | 90 days  | NO
        3 | ... | 16 days  | YES
        4 | ... | 12 days  | NO
        … | ... | ...      | ...
  • 36. Directions: α = (formula in the paper). Labels shown next to the table: “num. bugs that have been reopened” and “num. bugs older than the threshold”
        # | ... | Age      | Reopened?
        1 | ... | 180 days | YES
        2 | ... | 90 days  | NO
        3 | ... | 16 days  | YES
        4 | ... | 12 days  | NO
        … | ... | ...      | ...
  • 37. Directions, step 4: If α is not high enough (e.g., α < 0.95), choose another threshold (i.e., repeat from step 2)
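The guess, estimate, and repeat loop can be sketched as a search over candidate thresholds. The slides defer the α formula to the paper, so the sketch takes the estimator as a parameter (`estimate_alpha`) instead of guessing at it. Everything else is an assumption made for illustration: the function name `choose_threshold`, the `age_days` key, and above all the `TOY_ALPHA` lookup, whose values are made up and are not the paper's formula.

```python
def choose_threshold(bugs, estimate_alpha, candidates, target=0.95):
    """Try candidate thresholds (in days) from smallest to largest and
    return (threshold, alpha, kept_bugs) for the first one whose
    estimated confidence reaches `target`, or None if none does."""
    for threshold in sorted(candidates):
        # Bugs younger than the threshold are too recent to be classified.
        kept = [b for b in bugs if b["age_days"] >= threshold]
        alpha = estimate_alpha(kept, threshold)
        if alpha >= target:
            return threshold, alpha, kept
    return None


# Placeholder estimator with made-up values; NOT the formula from the paper.
TOY_ALPHA = {0: 0.88, 14: 0.90, 28: 0.93, 42: 0.95}


def toy_alpha(kept, threshold):
    return TOY_ALPHA[threshold]


bugs = [
    {"id": 1, "age_days": 180, "reopened": True},
    {"id": 2, "age_days": 90, "reopened": False},
    {"id": 3, "age_days": 16, "reopened": True},
    {"id": 4, "age_days": 12, "reopened": False},
]
result = choose_threshold(bugs, toy_alpha, candidates=[0, 14, 28, 42])
print(result)
```

Trying thresholds in increasing order returns the smallest one that meets the target, which discards as little data as possible for the required confidence.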
  • 38. 1. Context 2. Problem 3. Solution 4. Discussion Old Wine Tastes Better
  • 39. There’s a trade-off: larger α ⇒ more confidence, less data; smaller α ⇒ less confidence, more data
  • 40. For the project NetBeans/Platform: removing bugs younger than 6 weeks (0.7%) raises the confidence from 88% to 95%
  • 41. Arrrr!* It’s in the paper! *   Do ye have any source code to show?
  • 42. Thank you! And clean up your bug reports before using them!