Data-driven software engineering @Microsoft 
Michaela Greiler
Data-driven software engineering @Microsoft 
•How can we optimize the testing process? 
•Do code reviews make a difference? 
•Is coding velocity and quality always a tradeoff? 
•What’s the optimal way to organize work on a large team? 
MSR Redmond/TSE: Michaela Greiler, Jacek Czerwonka, Wolfram Schulte, Suresh Thummalapenta
MSR Redmond: Christian Bird, Kathryn McKinley, Nachi Nagappan, Thomas Zimmermann
MSR Cambridge: Brendan Murphy, Kim Herzig
Code Coverage trigger of Checkins
[Chart: monthly values from 11/2010 to 10/2013 on a 0 to 100 scale; series: % completely covered, % somewhat covered, % not covered]
Reviewer recommendation: Does experience matter?
Can we change with what we can measure? 
Michaela Greiler
YES
YES 
that’s the danger!
What is measured?
[Bar chart: Number Bugs per engineer (Carl, Lisa, Rob, Danny); axis 0 to 8]
What is changed?
[Bar chart: Number Bugs and Code Quality per engineer (Carl, Lisa, Rob, Danny); axis 0 to 2.5]
SOCIO TECHNICAL CONGRUENCE 
“Design and programming are human activities; forget that and all is lost” – Bjarne Stroustrup
So should we go without any measurements?
Interpretation 
Data Collection 
Usage 
Lessons learned 
No 
Garbage!
•What is CodeMine? What data does CodeMine have?
GQM vs. opportunistic data collection 
•Easily available ≠ what’s needed 
•Determine the needed data 
•Find proxy measures if needed 
•Know the analysis before collecting the data 
Otherwise, data is not usable for the intended purpose 
•Goal-Question-Metric (GQM) 
•Check for completeness, cleanness/noise and usefulness 
•Data background 
•How was data generated? 
•Why was it generated? 
•Who consumes the data? 
•What about outliers? 
•How was the data processed?
Interpretation needs domain knowledge
Tools, processes, 
practices and policies. 
Release schedule 
Time 
Engineers 
What roles exist? 
Who does what? 
Responsibilities? 
M1 
M2 
Beta 
Organization of code bases 
Team structure and culture.
You cannot compare 1:1
Engineers want to understand the nitty-gritty 
•How do you calculate the recommended reviewers? 
•Why was that person recommended? 
•Why is Lisa not recommended?
Simplicity first 
[Figure labels: Files without bugs; Files with bugs]
Files without bugs: main contributor made > 50% of all edits 
Files with bugs: main contributor made < 60% of all edits 
Ownership metric: 
Proportion of all edits to a file made by the contributor with the most edits 
Reporting vs. Prediction 
Comprehension 
vs. automation 
If you can do it with a decision tree… do it…
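The ownership metric on this slide can be sketched in a few lines. This is a minimal illustration, not the analysis used in the talk: the function name `ownership`, the input shape (one author name per edit), and the sample edit history are all assumptions made for the example.

```python
from collections import Counter


def ownership(edit_authors):
    """Return (top_contributor, share_of_edits) for a single file.

    edit_authors: iterable with one author name per edit to the file
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter(edit_authors)
    if not counts:
        raise ValueError("file has no recorded edits")
    # The contributor with the most edits and how many edits they made.
    top_author, top_edits = counts.most_common(1)[0]
    return top_author, top_edits / sum(counts.values())


# Hypothetical edit history for one file.
edits = ["Lisa", "Lisa", "Carl", "Lisa", "Rob"]
author, share = ownership(edits)
print(f"{author} made {share:.0%} of all edits")  # prints "Lisa made 60% of all edits"
```

A single proportion like this is easy to explain to engineers, which fits the "simplicity first" point of the slide.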
Iterative process with very close involvement of product teams and domain experts. 
It’s a dialog 
It’s a back and forth
Mixed Method Research 
Is a research approach or methodology 
•for questions that call for real-life contextual understandings; 
•employing rigorous quantitative research assessing magnitude and frequency of constructs and 
•rigorous qualitative research exploring the meaning and understanding of constructs; 
DR. MARGARET-ANNE STOREY 
Professor of Computer Science University of Victoria 
All methods are inherently flawed! 
Generalizability 
Precision 
Realism 
DR. ARIE VAN DEURSEN 
Professor of Software Engineering Delft University of Technology
[Book covers: Foundations of Mixed Methods Research; Designing Social Inquiry]
Qualitative Research: Mixed Method Research 
•Interviews 
•Observations 
•Focus groups 
•Contextual Inquiry 
•Grounded Theory 
•…
A Grounded Theory Study 
Systematic procedure to discover a theory from (qualitative) data 
S. Adolph, W. Hall, Ph. Kruchten. Using grounded theory to study the experience of software development. Empirical Software Engineering, 2011. 
B. Glaser and J. Holton. Remodeling grounded theory. Forum Qualitative Res., 2004. 
Glaser and Strauss
Deductive versus inductive 
A deductive approach is concerned with developing a hypothesis (or hypotheses) based on existing theory, and then designing a research strategy to test the hypothesis (Wilson, 2010, p. 7). 
An inductive approach starts with observations. Theories emerge towards the end of the research, as a result of careful examination of patterns in the observations (Goddard and Melville, 2004). 
Deductive: Theory → Hypotheses → Observation → Confirm/Reject 
Inductive: Observation → Patterns → Theory
All models are wrong but some are useful 
(George E. P. Box)
Theo: Test Effectiveness Optimization from History 
Kim Herzig*, Michaela Greiler+, Jacek Czerwonka+, Brendan Murphy* 
*Microsoft Research, Cambridge 
+Microsoft Corporation, US
Improving Development Processes 
[Diagram: Product / Service in a Development Environment, affected by legacy changes, new product features, and technology changes; Speed, Cost, and Quality / Risk should be well balanced]
Microsoft aims for shorter release cycles 
Empirical data to support & drive decisions 
• Speed up development processes (e.g. code velocity) 
• More frequent releases 
• Maintaining / increasing product quality 
Joint effort by MSR & product teams 
• MSR Cambridge: Brendan Murphy, Kim Herzig 
• TSE Redmond: Jacek Czerwonka, Michaela Greiler 
• MSR Redmond: Tom Zimmermann, Chris Bird, Nachi Nagappan 
• Windows, Windows Phone, Office, Dynamics product teams
Software Testing for Windows 
[Diagram over time: Winmain (main branch), development branch, multiple area branches, multiple component branches; quality gates: system testing, system & component testing, component testing]
Software testing is very expensive 
• Thousands of test suites and millions of test cases executed 
• On different branches, architectures, languages, etc. 
• We tend to repeat the same tests over and over again 
• Too many false alarms (failures due to test and infrastructure issues) 
• Each test failure slows down product development 
• Aims to find code issues as early as possible 
• At the cost of slower product development 
Actual problem 
Current process aims for maximal protection 
{Simplified illustration}
Software Testing for Office 
Software testing is very expensive 
• Thousands of test suites and millions of test cases executed 
• On different branches, architectures, languages, etc. 
• We tend to repeat the same tests over and over again 
• Too many false alarms (failures due to test and infrastructure issues) 
• Each test failure slows down product development 
• Aims to find code issues as early as possible 
• At the cost of slower product development 
Actual problem 
Current process aims for maximal protection 
[Diagram: Dev Inner Loop; BVT and CVT on main; Dog food]
Different 
• Branching structure 
• Development process 
• Testing process 
• Release schedules 
• … 
{Simplified illustration}
Goal 
Reduce the number of test executions … 
… without sacrificing code quality 
Dynamic, self-adaptive optimization model
Solution 
Reduce the number of test executions … 
•Run every test at least once before integrating a code change into the main branch (e.g., winmain). 
•We eventually find all code issues but take the risk of finding them later (on higher-level branches). 
… without sacrificing code quality 
High cost, unknown value $$$$$ 
High cost, low value $$$$ 
Low cost, low value $ 
Low cost, good value $$ 
How likely is a test to: 
1) cause false positives or 
2) find code issues? 
Analyze historic data: 
-Test Events 
-Builds 
-Code Integrations 
Analyze past test results 
-Passing tests, false alarms, detected code issues
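One simple way to turn past test results into the two likelihoods asked about above is to use relative frequencies. The slide does not say how the probabilities are actually estimated, so the sketch below is an assumption: the outcome labels, the function name `estimate_probabilities`, and the sample history are hypothetical.

```python
from collections import Counter

# Hypothetical outcome labels for past executions of one test.
PASSED = "passed"
FALSE_ALARM = "false_alarm"  # failure caused by test or infrastructure issues
CODE_ISSUE = "code_issue"    # failure that revealed a real code defect


def estimate_probabilities(history):
    """Return (prob_fp, prob_tp) as relative frequencies over past executions.

    history: list of outcome labels, one per past execution of the test.
    """
    if not history:
        # Without history there is nothing to estimate; a caller would
        # presumably keep executing the test until data accumulates.
        raise ValueError("no past executions recorded for this test")
    counts = Counter(history)
    total = len(history)
    return counts[FALSE_ALARM] / total, counts[CODE_ISSUE] / total


# Hypothetical history: 8 passes, 1 false alarm, 1 detected code issue.
history = [PASSED] * 8 + [FALSE_ALARM, CODE_ISSUE]
prob_fp, prob_tp = estimate_probabilities(history)
print(f"P(false alarm) = {prob_fp:.2f}, P(finds code issue) = {prob_tp:.2f}")
```

Because bug-finding capability depends on context, a real estimate would likely be computed per branch or per configuration rather than over one global history.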
Bug finding capabilities change with context
Solution 
Using cost function to model risk. 
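$$
\mathit{Cost}_{\mathit{Execution}} > \mathit{Cost}_{\mathit{Skip}} \;?\; \text{suspend} : \text{execute test}
$$

$$
\begin{aligned}
\mathit{Cost}_{\mathit{Execution}} &= \mathit{Cost}_{\mathit{Machine}}/\mathit{Time} \cdot \mathit{Time}_{\mathit{Execution}} + \text{"Cost of potential false alarm"} \\
&= \mathit{Cost}_{\mathit{Machine}}/\mathit{Time} \cdot \mathit{Time}_{\mathit{Execution}} + \left(\mathit{Prob}_{\mathit{FP}} \cdot \mathit{Cost}_{\mathit{Developer}}/\mathit{Time} \cdot \mathit{Time}_{\mathit{Triage}}\right) \\
\mathit{Cost}_{\mathit{Skip}} &= \text{"Potential cost of finding a defect later"} \\
&= \mathit{Prob}_{\mathit{TP}} \cdot \mathit{Cost}_{\mathit{Developer}}/\mathit{Time} \cdot \mathit{Time}_{\mathit{Freeze\ branch}} \cdot \#\mathit{Developers}_{\mathit{Branch}}
\end{aligned}
$$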
Test 
Cost to run a test. 
Value of output.
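The decision rule on this slide compares the expected cost of executing a test with the expected cost of skipping it. The sketch below mirrors the two cost terms as written on the slide; the class and field names, the hourly units, and all numeric values are assumptions for illustration and are not figures from the talk.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    """Inputs for one test at one decision point (all values hypothetical)."""
    machine_cost_per_hour: float
    execution_hours: float
    prob_fp: float                  # probability the test raises a false alarm
    prob_tp: float                  # probability the test finds a code issue
    developer_cost_per_hour: float
    triage_hours: float             # time to triage a false alarm
    freeze_hours: float             # time the branch is frozen by a late defect
    developers_on_branch: int


def cost_execution(x: CostInputs) -> float:
    # Machine time plus the expected cost of triaging a false alarm.
    return (x.machine_cost_per_hour * x.execution_hours
            + x.prob_fp * x.developer_cost_per_hour * x.triage_hours)


def cost_skip(x: CostInputs) -> float:
    # Expected cost of finding the defect later, when it blocks the branch.
    return (x.prob_tp * x.developer_cost_per_hour
            * x.freeze_hours * x.developers_on_branch)


def should_skip(x: CostInputs) -> bool:
    # Suspend the test only when running it costs more than skipping it.
    return cost_execution(x) > cost_skip(x)


# A noisy test that rarely finds real issues: execution cost dominates.
noisy = CostInputs(1.0, 2.0, 0.2, 0.001, 50.0, 1.0, 2.0, 20)
# The same test with a higher chance of finding code issues: skip cost dominates.
useful = CostInputs(1.0, 2.0, 0.2, 0.05, 50.0, 1.0, 2.0, 20)

print(should_skip(noisy), should_skip(useful))
```

With these made-up numbers the noisy test is suspended and the useful one is executed, because the number of developers on the branch multiplies the cost of a defect that is found late.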
Current Results 
Simulated on Windows 8.1 development period (BVT only)
Dynamic, Self-Adaptive 
Decision points are connected to each other 
Skipping tests influences the risk factors of higher-level branches 
We re-enable tests if code quality drops (e.g. different milestone) 
[Chart: relative test reduction rate (0% to 70%) over time (Windows 8.1), with the training period marked]
Bug Finding Performance of Tests 
How many test executions fail? 
[Chart: # failed test executions and number of test executions, by branch level]
How many of the failed test executions result in bug reports? 
[Chart: FP, TP test-unspecific, and TP test-specific, by branch level]
Impact on Development Process 
Secondary Improvements 
•Machine setup: we may lower the number of machines allocated to the testing process 
•Developer satisfaction: removing false test failures increases confidence in the testing process 
…hard to estimate speed improvement through simulation 
“We used the data […] to cut a bunch of bad content and are running a much leaner BVT system […] we’re panning out to scale about 4x and run in well under 2 hours” (Jason Means, Windows BVT PM)
Michaela Greiler 
@mgreiler 
www.michaelagreiler.com 
http://research.microsoft.com/en-us/projects/tse/

Can we induce change with what we measure?
