Data-driven software engineering @Microsoft 
Michaela Greiler
•How can we optimize the testing process? 
•Do code reviews make a difference? 
•Are coding velocity and quality always a tradeoff? 
•What’s the optimal way to organize work on a large team? 
MSR Redmond/TSE: 
Michaela Greiler, Jacek Czerwonka, Wolfram Schulte, Suresh Thummalapenta 
MSR Redmond: 
Christian Bird, Kathryn McKinley, Nachi Nagappan, Thomas Zimmermann 
MSR Cambridge: Brendan Murphy, Kim Herzig
[Figure: Code coverage trigger of check-ins, by month from 11/2010 to 10/2013. Y-axis: 0 to 100. Series: % completely covered, % somewhat covered, % not covered.]
Reviewer recommendation: Does experience matter?
Can we change with what we can measure? 
Michaela Greiler
YES
YES 
that’s the danger!
What is measured? 
[Bar chart: number of bugs (axis 0 to 8) for Carl, Lisa, Rob, Danny] 
What is changed? 
[Bar chart: number of bugs and code quality (axis 0 to 2.5) for Carl, Lisa, Rob, Danny]
SOCIO TECHNICAL CONGRUENCE 
“Design and programming are human activities; forget that and all is lost” – Bjarne Stroustrup
So should we go without any measurements?
Interpretation 
Data Collection 
Usage 
Lessons learned 
No 
Garbage!
•What is CodeMine? What data does CodeMine have?
GQM vs. opportunistic data collection 
•Easily available ≠ what’s needed 
•Determine the needed data 
•Find proxy measures if needed 
•Know the analysis before collecting the data 
Otherwise, data is not usable for the intended purpose 
•Goal – Question – Metric (GQM) 
•Check for completeness, cleanness/noise and usefulness 
•Data background 
•How was data generated? 
•Why was it generated? 
•Who consumes the data? 
•What about outliers? 
•How was the data processed?
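A minimal sketch of the completeness and outlier checks from the list above, assuming a single numeric column in which `None` marks a missing value. The function name, the input encoding, and the 1.5 × IQR outlier rule are illustrative choices, not something the slides prescribe.

```python
import statistics


def profile_column(values):
    """Summarize completeness and flag outliers for one numeric column.

    values: list of numbers, with None marking a missing entry
    (an assumed encoding, chosen for this sketch).
    """
    present = [v for v in values if v is not None]
    completeness = len(present) / len(values) if values else 0.0

    outliers = []
    if len(present) >= 4:
        # Quartiles of the observed values; the 1.5 * IQR fence is a common
        # rule of thumb, used here only as an example of an outlier check.
        q1, _, q3 = statistics.quantiles(present, n=4)
        spread = q3 - q1
        low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
        outliers = [v for v in present if v < low or v > high]

    return {"completeness": completeness, "outliers": outliers}


# Hypothetical sample: bug counts per file, one missing, one suspicious.
report = profile_column([3, 4, None, 5, 4, 3, 120, 4])
print(report)
```

A flagged value is a prompt to ask how and why that data point was generated, not a reason to drop it automatically.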
Interpretation needs domain knowledge
Tools, processes, practices and policies. 
Release schedule 
Time 
Engineers 
What roles exist? 
Who does what? 
Responsibilities? 
M1 
M2 
Beta 
Organization of code bases 
Team structure and culture.
You cannot compare 1:1
Engineers want to understand the nitty-gritty 
•How do you calculate the recommended reviewers? 
•Why was that person recommended? 
•Why is Lisa not recommended?
Simplicity first 
[Figure: files without bugs vs. files with bugs] 
Files without bugs: main contributor made > 50% of all edits 
Files with bugs: main contributor made < 60% of all edits 
Ownership metric: 
Share of all edits to a file that were made by the contributor with the most edits 
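The ownership metric can be sketched in a few lines, assuming the edit history of a file is available as one author name per edit. The input format and the function name are hypothetical.

```python
from collections import Counter


def ownership(edit_authors):
    """Return the share of edits made by the most active contributor.

    edit_authors: one author name per edit to the file
    (a hypothetical input format chosen for this sketch).
    """
    if not edit_authors:
        raise ValueError("file has no recorded edits")
    counts = Counter(edit_authors)
    top_edits = counts.most_common(1)[0][1]  # edit count of the main contributor
    return top_edits / len(edit_authors)


# Made-up history: Carl made 3 of the 5 edits.
edits = ["Carl", "Carl", "Lisa", "Carl", "Rob"]
print(ownership(edits))
```

Because the result is a single proportion per file, it is easy to explain to engineers who ask how a number was calculated.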
Reporting vs. Prediction 
Comprehension vs. automation 
If you can do it with a decision tree… do it…
Iterative process with very close involvement of product teams and domain experts. 
It’s a dialog 
It’s a back and forth
Mixed Method Research 
Is a research approach or methodology 
•for questions that call for real-life contextual understandings; 
•employing rigorous quantitative research assessing magnitude and frequency of constructs and 
•rigorous qualitative research exploring the meaning and understanding of constructs; 
DR. MARGARET-ANNE STOREY 
Professor of Computer Science University of Victoria 
All methods are inherently flawed! 
Generalizability 
Precision 
Realism 
DR. ARIE VAN DEURSEN 
Professor of Software Engineering Delft University of Technology
Foundations of Mixed Methods Research 
Designing Social Inquiry 
Qualitative Research: Mixed Method Research 
•Interviews 
•Observations 
•Focus groups 
•Contextual Inquiry 
•Grounded Theory 
•…
A Grounded Theory Study 
Systematic procedure to discover a theory from (qualitative) data 
S. Adolph, W. Hall, Ph. Kruchten. Using grounded theory to study the experience of software development. Empirical Software Engineering, 2011. 
B. Glaser and J. Holton. Remodeling grounded theory. Forum Qualitative Res., 2004. 
Glaser and Strauss
Deductive versus inductive 
A deductive approach is concerned with developing a hypothesis (or hypotheses) based on existing theory, and then designing a research strategy to test the hypothesis (Wilson, 2010, p. 7). 
An inductive approach starts with observations. Theories emerge towards the end of the research, as a result of careful examination of patterns in the observations (Goddard and Melville, 2004). 
Theory 
Hypotheses 
Observation 
Confirm/Reject 
Observation 
Patterns 
Theory
All models are wrong but some are useful 
(George E. P. Box)
Theo: Test Effectiveness Optimization from History 
Kim Herzig*, Michaela Greiler+, Jacek Czerwonka+, Brendan Murphy* 
*Microsoft Research, Cambridge 
+Microsoft Corporation, US
Improving Development Processes 
Product / Service 
Legacy changes 
New product features 
Technology changes 
Development Environment 
Speed, Cost, Quality / Risk (should be well balanced) 
Microsoft aims for shorter release cycles 
Empirical data to support & drive decisions 
• Speed up development processes (e.g. code velocity) 
• More frequent releases 
• Maintaining / increasing product quality 
Joint effort by MSR & product teams 
• MSR Cambridge: Brendan Murphy, Kim Herzig 
• TSE Redmond: Jacek Czerwonka, Michaela Greiler 
• MSR Redmond: Tom Zimmermann, Chris Bird, Nachi Nagappan 
• Windows, Windows Phone, Office, Dynamics product teams
Software Testing for Windows 
Winmain (main branch) 
Quality gate (system testing) 
Quality gate (system & component testing) 
Quality gate (component testing) 
time 
Development branch 
Multiple area branches 
Multiple component branches 
Software testing is very expensive 
• Thousands of test suites executed, millions of test cases executed 
• On different branches, architectures, languages, etc. 
• We tend to repeat the same tests over and over again 
• Too many false alarms (failures due to test and infrastructure issues) 
• Each test failure slows down product development 
• Aims to find code issues as early as possible 
• At the cost of slower product development 
Actual problem 
Current process aims for maximal protection 
{Simplified illustration}
Software Testing for Office 
Software testing is very expensive 
• Thousands of test suites executed, millions of test cases executed 
• On different branches, architectures, languages, etc. 
• We tend to repeat the same tests over and over again 
• Too many false alarms (failures due to test and infrastructure issues) 
• Each test failure slows down product development 
• Aims to find code issues as early as possible 
• At the cost of slower product development 
Actual problem 
Current process aims for maximal protection 
Dev Inner Loop 
BVT and CVT on main 
Dog food 
Different 
• Branching structure 
• Development process 
• Testing process 
• Release schedules 
• … 
{Simplified illustration}
Goal 
Reduce the number of test executions … 
… without sacrificing code quality 
Dynamic, self-adaptive optimization model
Solution 
Reduce the number of test executions … 
•Run every test at least once before integrating a code change into the main branch (e.g., winmain). 
•We eventually find all code issues but take the risk of finding them later (on higher-level branches). 
… without sacrificing code quality 
High cost, unknown value: $$$$$ 
High cost, low value: $$$$ 
Low cost, low value: $ 
Low cost, good value: $$ 
How likely is a test to: 
1) cause false positives or 
2) find code issues? 
Analyze historic data: 
- Test events 
- Builds 
- Code integrations 
Analyze past test results: 
- Passing tests, false alarms, detected code issues
Bug finding capabilities change with context
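A minimal sketch of how the two likelihoods could be estimated from past test results, assuming each historic execution is recorded as a (test, context, outcome) triple. The outcome labels, the notion of "context" as a plain string such as a branch name, and the function name are assumptions made for illustration. Simple relative frequencies stand in for whatever estimator the actual system uses.

```python
from collections import defaultdict

# Outcome labels assumed for this sketch; they mirror the three categories
# named on the slide: passing tests, false alarms, detected code issues.
OUTCOMES = ("pass", "false_alarm", "code_issue")


def estimate_probabilities(history):
    """Estimate false-alarm and defect-finding rates per (test, context).

    history: iterable of (test, context, outcome) triples, where outcome is
    one of OUTCOMES. Keeping the context in the key reflects that bug-finding
    capability changes with context.
    """
    counts = defaultdict(lambda: dict.fromkeys(OUTCOMES, 0))
    for test, context, outcome in history:
        counts[(test, context)][outcome] += 1

    probabilities = {}
    for key, c in counts.items():
        total = sum(c.values())
        probabilities[key] = {
            "prob_fp": c["false_alarm"] / total,  # share of false alarms
            "prob_tp": c["code_issue"] / total,   # share of real code issues
        }
    return probabilities


# Made-up history for one hypothetical test on one hypothetical branch.
history = [
    ("net_smoke", "component-branch", "pass"),
    ("net_smoke", "component-branch", "pass"),
    ("net_smoke", "component-branch", "false_alarm"),
    ("net_smoke", "component-branch", "code_issue"),
]
print(estimate_probabilities(history))
```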
Solution 
Using cost function to model risk. 
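$$\mathit{Cost}_{\text{Execution}} > \mathit{Cost}_{\text{Skip}} \;?\; \text{suspend} : \text{execute test}$$
$$\mathit{Cost}_{\text{Execution}} = \mathit{Cost}_{\text{Machine}}/\mathit{Time} \cdot \mathit{Time}_{\text{Execution}} + \text{"Cost of potential false alarm"}$$
$$= \mathit{Cost}_{\text{Machine}}/\mathit{Time} \cdot \mathit{Time}_{\text{Execution}} + (\mathit{Prob}_{\text{FP}} \cdot \mathit{Cost}_{\text{Developer}}/\mathit{Time} \cdot \mathit{Time}_{\text{Triage}})$$
$$\mathit{Cost}_{\text{Skip}} = \text{"Potential cost of finding a defect later"}$$
$$= \mathit{Prob}_{\text{TP}} \cdot \mathit{Cost}_{\text{Developer}}/\mathit{Time} \cdot \mathit{Time}_{\text{Freeze branch}} \cdot \#\mathit{Developers}_{\text{Branch}}$$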
Test 
Cost to run a test. 
Value of output.
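The cost comparison above can be sketched as follows. The class and field names are hypothetical, time is measured in hours, and every number in the example is a made-up placeholder rather than a value from the study.

```python
from dataclasses import dataclass


@dataclass
class HistoryStats:
    """Per-test inputs, assumed to come from historic test data."""
    prob_fp: float     # probability of a false alarm
    prob_tp: float     # probability of finding a real code issue
    exec_hours: float  # execution time of the test


@dataclass
class CostParams:
    """Per-branch cost factors; all values used below are placeholders."""
    machine_cost_per_hour: float
    developer_cost_per_hour: float
    triage_hours: float        # time to triage one false alarm
    freeze_hours: float        # time the branch is frozen by a late defect
    developers_on_branch: int


def cost_execution(t, p):
    # Machine cost of the run plus the expected cost of a false alarm.
    return (p.machine_cost_per_hour * t.exec_hours
            + t.prob_fp * p.developer_cost_per_hour * p.triage_hours)


def cost_skip(t, p):
    # Expected cost of finding the defect later, on a frozen branch.
    return (t.prob_tp * p.developer_cost_per_hour
            * p.freeze_hours * p.developers_on_branch)


def should_skip(t, p):
    # Suspend the test only when running it costs more than skipping it.
    return cost_execution(t, p) > cost_skip(t, p)


params = CostParams(1.0, 50.0, 0.5, 2.0, 10)
noisy = HistoryStats(prob_fp=0.2, prob_tp=0.0, exec_hours=1.0)
useful = HistoryStats(prob_fp=0.0, prob_tp=0.05, exec_hours=1.0)
print(should_skip(noisy, params), should_skip(useful, params))
```

With these placeholder numbers, the test that only ever raised false alarms is suspended, while the test with even a small chance of catching a defect keeps running, because a branch freeze is multiplied by the number of affected developers.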
Current Results 
Simulated on Windows 8.1 development period (BVT only)
Dynamic, Self-Adaptive 
Decision points are connected to each other 
Skipping tests influences the risk factors of higher-level branches 
We re-enable tests if code quality drops (e.g. different milestone) 
[Figure: relative test reduction rate (0% to 70%) over time (Windows 8.1), with the training period marked]
Bug Finding Performance of Tests 
How many test executions fail? 
[Figure: number of test executions and number of failed test executions per branch level] 
How many of the failed test executions result in bug reports? 
[Figure: FP, TP test-unspecific, and TP test-specific per branch level]
Impact on Development Process 
Secondary Improvements 
•Machine Setup: we may lower the number of machines allocated to testing process 
•Developer satisfaction: Removing false test failures increases confidence in testing process 
…hard to estimate speed improvement through simulation 
“We used the data […] to cut a bunch of bad content and are running a much leaner BVT system […] we’re panning out to scale about 4x and run in well under 2 hours” (Jason Means, Windows BVT PM)
Michaela Greiler 
@mgreiler 
www.michaelagreiler.com 
http://research.microsoft.com/en-us/projects/tse/

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 

Can we induce change with what we measure?

  • 1. Data-driven software engineering @Microsoft Michaela Greiler
  • 2. Data-driven software engineering @Microsoft • How can we optimize the testing process? • Do code reviews make a difference? • Is coding velocity and quality always a tradeoff? • What’s the optimal way to organize work on a large team? MSR Redmond/TSE: Michaela Greiler, Jacek Czerwonka, Wolfram Schulte, Suresh Thummalapenta. MSR Redmond: Christian Bird, Kathryn McKinley, Nachi Nagappan, Thomas Zimmermann. MSR Cambridge: Brendan Murphy, Kim Herzig
  • 3. Code Coverage trigger of Checkins [chart: % completely covered, % somewhat covered, % not covered; y-axis 0 to 100; x-axis monthly from 11/2010 to 10/2013]
  • 4. Reviewer recommendation: Does experience matter?
  • 5. Can we change with what we can measure? Michaela Greiler
  • 6. YES
  • 8. What is measured? [bar chart: Number Bugs for Carl, Lisa, Rob, Danny] What is changed? [bar chart: Number Bugs and Code Quality for Carl, Lisa, Rob, Danny]
  • 9. What is measured? [bar chart: Number Bugs for Carl, Lisa, Rob, Danny] What is changed? [bar chart: Number Bugs and Code Quality for Carl, Lisa, Rob, Danny]
  • 10. SOCIO-TECHNICAL CONGRUENCE “Design and programming are human activities; forget that and all is lost” – Bjarne Stroustrup
  • 11. So should we go without any measurements?
  • 12. Interpretation Data Collection Usage Lessons learned No Garbage!
  • 13. • What is CodeMine? What data does CodeMine have?
  • 14. GQM vs. opportunistic data collection • Easily available ≠ what’s needed • Determine the needed data • Find proxy measures if needed • Know the analysis before collecting the data; otherwise, data is not usable for the intended purpose • Goal – Question – Metric • Check for completeness, cleanness/noise and usefulness • Data background • How was data generated? • Why was it generated? • Who consumes the data? • What about outliers? • How was the data processed?
  • 16. Tools, processes, practices and policies. Release schedule Time Engineers What roles exist? Who does what? Responsibilities? M1 M2 Beta Organization of code bases Team structure and culture.
  • 18. Engineers want to understand the nitty-gritty • How do you calculate the recommended reviewers? • Why was that person recommended? • Why is Lisa not recommended?
  • 19. Simplicity first. Files without bugs: main contributor made > 50% of all edits. Files with bugs: main contributor made < 60% of all edits. Ownership metric: proportion of all edits made by the contributor with the most edits. Reporting vs. prediction. Comprehension vs. automation. If you can do it with a decision tree… do it…
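As a minimal sketch of the ownership metric on slide 19, the snippet below computes the proportion of a file's edits made by its top contributor. The input format (one author name per edit) and the function name `ownership` are assumptions made for illustration; the slides do not say how the edit history is stored.

```python
from collections import Counter


def ownership(edit_authors):
    """Return the share of all edits made by the contributor with the most edits.

    `edit_authors` is a hypothetical input: one author name per edit of a file.
    """
    if not edit_authors:
        raise ValueError("file has no recorded edits")
    counts = Counter(edit_authors)
    top_edits = max(counts.values())
    return top_edits / len(edit_authors)


# Made-up edit history for one file: Lisa made 3 of the 5 edits.
edits = ["Lisa", "Carl", "Lisa", "Rob", "Lisa"]
print(ownership(edits))  # 0.6
```

A single ratio like this is easy to explain to engineers, which matches the slide's preference for comprehension over automation.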
  • 20. Iterative process with very close involvement of product teams and domain experts. It’s a dialog It’s a back and forth
  • 21. Mixed Method Research is a research approach or methodology • for questions that call for real-life contextual understandings; • employing rigorous quantitative research assessing magnitude and frequency of constructs and • rigorous qualitative research exploring the meaning and understanding of constructs; DR. MARGARET-ANNE STOREY, Professor of Computer Science, University of Victoria. All methods are inherently flawed! Generalizability, Precision, Realism. DR. ARIE VAN DEURSEN, Professor of Software Engineering, Delft University of Technology
  • 22. Foundations of Mixed Methods Research. Designing Social Inquiry. Qualitative Research: Mixed Method Research • Interviews • Observations • Focus groups • Contextual Inquiry • Grounded Theory • …
  • 23. A Grounded Theory Study: Systematic procedure to discover a theory from (qualitative) data. S. Adolph, W. Hall, Ph. Kruchten. Using Grounded theory to study the experience of software development. Empirical Software Engineering, 2011. B. Glaser and J. Holton. Remodeling grounded theory. Forum Qualitative Res., 2004. Glaser and Strauss
  • 24. Deductive versus inductive. A deductive approach is concerned with developing a hypothesis (or hypotheses) based on existing theory, and then designing a research strategy to test the hypothesis (Wilson, 2010, p.7). An inductive approach starts with observations. Theories emerge towards the end of the research and as a result of careful examination of patterns in observations (Goddard and Melville, 2004). Deductive: Theory, Hypotheses, Observation, Confirm/Reject. Inductive: Observation, Patterns, Theory
  • 25. All models are wrong but some are useful (George E. P. Box)
  • 26. Theo: Test Effectiveness Optimization from History Kim Herzig*, Michaela Greiler+, Jacek Czerwonka+, Brendan Murphy* *Microsoft Research, Cambridge +Microsoft Corporation, US
  • 27. Improving Development Processes. Product / Service: Legacy changes, New product features, Technology changes, Development Environment. Speed, Cost, Quality / Risk (should be well balanced). Microsoft aims for shorter release cycles. Empirical data to support & drive decisions • Speed up development processes (e.g. code velocity) • More frequent releases • Maintaining / increasing product quality. Joint effort by MSR & product teams • MSR Cambridge: Brendan Murphy, Kim Herzig • TSE Redmond: Jacek Czerwonka, Michaela Greiler • MSR Redmond: Tom Zimmermann, Chris Bird, Nachi Nagappan • Windows, Windows Phone, Office, Dynamics product teams
  • 28. Software Testing for Windows. Winmain (main branch): Quality gate (system testing). Multiple area branches: Quality gate (system & component testing). Multiple component branches: Quality gate (component testing). Development branch. Software testing is very expensive • Thousands of test suites executed, millions of test cases executed • On different branches, architectures, languages, etc. • We tend to repeat the same tests over and over again • Too many false alarms (failures due to test and infrastructure issues) • Each test failure slows down product development • Aims to find code issues as early as possible • At the cost of slower product development. Actual problem: Current process aims for maximal protection {Simplified illustration}
  • 29. Software Testing for Office. Software testing is very expensive • Thousands of test suites executed, millions of test cases executed • On different branches, architectures, languages, etc. • We tend to repeat the same tests over and over again • Too many false alarms (failures due to test and infrastructure issues) • Each test failure slows down product development • Aims to find code issues as early as possible • At the cost of slower product development. Actual problem: Current process aims for maximal protection. Dev Inner Loop, BVT and CVT on main, Dog food. Different • Branching structure • Development process • Testing process • Release schedules • … {Simplified illustration}
  • 30. Goal Reduce the number of test executions … … without sacrificing code quality Dynamic, self-adaptive optimization model
  • 31. Solution: Reduce the number of test executions … • Run every test at least once before integrating a code change into the main branch (e.g., winmain). • We eventually find all code issues but take the risk of finding them later (on higher-level branches). … without sacrificing code quality. High cost, unknown value $$$$$ High cost, low value $$$$ Low cost, low value $ Low cost, good value $$ How likely is a test causing: 1) false positives or 2) finding code issues? Analyze historic data: - Test events - Builds - Code integrations. Analyze past test results: - Passing tests, false alarms, detected code issues
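One simple way to answer "how likely is a test causing false positives or finding code issues?" from past results is to use relative frequencies. The sketch below is an assumption, not the method described in the slides: the outcome labels, the function name `estimate_probabilities`, and the sample history are all hypothetical.

```python
from collections import Counter

# Hypothetical outcome labels for past executions of a single test.
PASSED, FALSE_ALARM, CODE_ISSUE = "passed", "false_alarm", "code_issue"


def estimate_probabilities(outcomes):
    """Estimate (prob_fp, prob_tp) as relative frequencies over past executions."""
    total = len(outcomes)
    if total == 0:
        raise ValueError("no history for this test")
    counts = Counter(outcomes)
    prob_fp = counts[FALSE_ALARM] / total  # share of runs that were false alarms
    prob_tp = counts[CODE_ISSUE] / total   # share of runs that found a code issue
    return prob_fp, prob_tp


# Made-up history: 16 passes, 3 false alarms, 1 detected code issue.
history = [PASSED] * 16 + [FALSE_ALARM] * 3 + [CODE_ISSUE]
prob_fp, prob_tp = estimate_probabilities(history)
print(prob_fp, prob_tp)
```

In practice the history would likely be split by context (for example branch or architecture) before estimating, since a test's behavior can differ between contexts.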
  • 32. Bug finding capabilities change with context
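  • 33. Solution: Using cost function to model risk. Test: Cost to run a test. Value of output.
      $\mathit{Cost}_{Execution} > \mathit{Cost}_{Skip} \;?\; \text{suspend} : \text{execute test}$
      $\mathit{Cost}_{Execution} = \mathit{Cost}_{Machine}/\mathit{Time} \cdot \mathit{Time}_{Execution} + \text{“Cost of potential false alarm”}$
      $\phantom{\mathit{Cost}_{Execution}} = \mathit{Cost}_{Machine}/\mathit{Time} \cdot \mathit{Time}_{Execution} + (\mathit{Prob}_{FP} \cdot \mathit{Cost}_{Developer}/\mathit{Time} \cdot \mathit{Time}_{Triage})$
      $\mathit{Cost}_{Skip} = \text{“Potential cost of finding a defect later”}$
      $\phantom{\mathit{Cost}_{Skip}} = \mathit{Prob}_{TP} \cdot \mathit{Cost}_{Developer}/\mathit{Time} \cdot \mathit{Time}_{Freeze\ branch} \cdot \#\mathit{Developers}_{Branch}$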
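The cost comparison on slide 33 can be sketched in a few lines of Python. The field names, units (hours and cost per hour), and example numbers below are assumptions chosen for illustration; only the two cost formulas and the skip rule come from the slide.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    """Inputs to the cost model for one test in one context (values are hypothetical)."""
    prob_fp: float             # probability the test raises a false alarm
    prob_tp: float             # probability the test finds a real code issue
    time_execution: float      # machine hours per test run
    time_triage: float         # developer hours to triage a false alarm
    time_freeze: float         # hours the branch is frozen if the defect is found later
    developers_on_branch: int  # developers affected by the freeze
    cost_machine: float        # machine cost per hour
    cost_developer: float      # developer cost per hour


def cost_execution(c):
    # Machine time plus the expected cost of triaging a false alarm.
    return c.cost_machine * c.time_execution + c.prob_fp * c.cost_developer * c.time_triage


def cost_skip(c):
    # Expected cost of finding the defect later, on a branch shared by many developers.
    return c.prob_tp * c.cost_developer * c.time_freeze * c.developers_on_branch


def should_skip(c):
    # Suspend the test when running it costs more than skipping it.
    return cost_execution(c) > cost_skip(c)


flaky = CostInputs(0.15, 0.0005, 0.5, 1.0, 2.0, 10, 2.0, 50.0)
useful = CostInputs(0.01, 0.05, 0.5, 1.0, 2.0, 10, 2.0, 50.0)
print(should_skip(flaky))   # True: execution cost 8.5 exceeds skip cost 0.5
print(should_skip(useful))  # False: execution cost 1.5 is below skip cost 50.0
```

Because the rule compares expected costs, a test that rarely finds issues but often raises false alarms gets suspended, while a cheap test with a real chance of catching a defect keeps running.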
  • 34. Current Results Simulated on Windows 8.1 development period (BVT only)
  • 35. Dynamic, Self-Adaptive. Decision points are connected to each other. Skipping tests influences the risk factors of higher-level branches. We re-enable tests if code quality drops (e.g. different milestone). [chart: relative test reduction rate (0% to 70%) over time (Windows 8.1), with training period marked]
  • 36. Bug Finding Performance of Tests How many test executions fail? #failed test exec Branch level Number of test executions How many of the failed test executions result in bug reports? FP TP test-unspecific TP test-specific Branch level
  • 37. Impact on Development Process. Secondary Improvements • Machine setup: we may lower the number of machines allocated to the testing process • Developer satisfaction: removing false test failures increases confidence in the testing process … hard to estimate speed improvement through simulation. “We used the data […] to cut a bunch of bad content and are running a much leaner BVT system […] we’re panning out to scale about 4x and run in well under 2 hours” (Jason Means, Windows BVT PM)
  • 38. Michaela Greiler @mgreiler www.michaelagreiler.com http://research.microsoft.com/en-us/projects/tse/