The document discusses data-driven approaches to optimizing software testing processes at Microsoft. It describes how historical test and code data can be analyzed to determine which tests are most valuable and cost-effective to run, in order to reduce total test execution time without negatively impacting code quality. Simulation results on Windows 8.1 data show the potential for significant test reduction (up to 60%) while maintaining bug finding ability. This could improve development processes by lowering machine costs and increasing developer satisfaction.
2. Data-driven software engineering @Microsoft
•How can we optimize the testing process?
•Do code reviews make a difference?
•Is coding velocity and quality always a tradeoff?
•What’s the optimal way to organize work on a large team?
MSR Redmond/TSE:
Michaela Greiler, Jacek Czerwonka, Wolfram Schulte, Suresh Thummalapenta
MSR Redmond:
Christian Bird, Kathryn McKinley, Nachi Nagappan, Thomas Zimmermann
MSR Cambridge: Brendan Murphy, Kim Herzig
14. GQM vs. opportunistic data collection
•Easily available ≠ what’s needed
•Determine the needed data
•Find proxy measures if needed
•Know the analysis before collecting the data
Otherwise, data is not usable for the intended purpose
•Goal –Question –Metric
•Check for completeness, cleanliness/noise, and usefulness
•Data background
•How was data generated?
•Why was it generated?
•Who consumes the data?
•What about outliers?
•How was the data processed?
16. Tools, processes, practices, and policies.
[Slide diagram: a release schedule over time (milestones M1, M2, Beta) and the engineers working to it. What roles exist? Who does what? What are their responsibilities? Also shown: organization of code bases; team structure and culture.]
18. Engineers want to understand the nitty-gritty
•How do you calculate the recommended reviewers?
•Why was that person recommended?
•Why is Lisa not recommended?
19. Simplicity first
[Chart: distribution of the ownership metric for files without bugs vs. files with bugs.]
Files without bugs: main contributor made > 50% of all edits
Files with bugs: main contributor made < 60% of all edits
Ownership metric:
Proportion of all edits made by the contributor with the most edits
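As a minimal sketch (not code from the talk), the ownership metric could be computed from a file's edit log like this; the function name and contributor names are illustrative:

```python
from collections import Counter

def ownership(edit_authors):
    """Ownership metric: the proportion of all edits to a file that
    were made by the contributor with the most edits."""
    counts = Counter(edit_authors)
    top_contributor_edits = counts.most_common(1)[0][1]
    return top_contributor_edits / len(edit_authors)

# Hypothetical edit log: one entry per edit to the file.
print(ownership(["ann", "ann", "ann", "ann", "bob"]))  # 0.8
```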
Reporting vs. Prediction
Comprehension vs. automation
If you can do it with a decision tree… do it…
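In that spirit, here is a toy, hand-readable single-split "decision tree" over the ownership metric; the 0.5 threshold is purely illustrative, not a result from the talk:

```python
def predict_bug_risk(ownership_score):
    """Toy one-split decision tree: files whose top contributor made
    less than half of all edits are flagged as higher bug risk.
    The 0.5 threshold is illustrative only."""
    if ownership_score < 0.5:
        return "higher bug risk"
    return "lower bug risk"

print(predict_bug_risk(0.35))  # higher bug risk
print(predict_bug_risk(0.90))  # lower bug risk
```

A model this simple can be explained to engineers in one sentence, which is exactly why comprehension can beat a more accurate black box.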
20. Iterative process with very close involvement of product teams and domain experts.
It’s a dialog
It’s a back and forth
21. Mixed Method Research
Is a research approach or methodology
•for questions that call for real-life contextual understandings;
•employing rigorous quantitative research assessing magnitude and frequency of constructs and
•rigorous qualitative research exploring the meaning and understanding of constructs;
DR. MARGARET-ANNE STOREY
Professor of Computer Science University of Victoria
All methods are inherently flawed!
Generalizability
Precision
Realism
DR. ARIE VAN DEURSEN
Professor of Software Engineering Delft University of Technology
22. Foundations of Mixed Methods Research. Designing Social Inquiry.
Qualitative research methods in mixed method research:
•Interviews
•Observations
•Focus groups
•Contextual Inquiry
•Grounded Theory
•…
23. A Grounded Theory Study
Systematic procedure to discover a theory from (qualitative) data
S. Adolph, W. Hall, Ph. Kruchten. Using grounded theory to study the experience of software development. Empirical Software Engineering, 2011.
B. Glaser and J. Holton. Remodeling grounded theory. Forum Qualitative Res., 2004.
Glaser and Strauss
24. Deductive versus inductive
A deductive approach is concerned with developing a hypothesis (or hypotheses) based on existing theory, and then designing a research strategy to test the hypothesis (Wilson, 2010, p.7)
Inductive approach starts with observations. Theories emerge towards the end of the research and as a result of careful examination of patterns in observations (Goddard and Melville, 2004).
[Diagram. Deductive: Theory → Hypotheses → Observation → Confirm/Reject. Inductive: Observation → Patterns → Theory.]
25. All models are wrong but some are useful
(George E. P. Box)
26. Theo: Test Effectiveness Optimization from History
Kim Herzig*, Michaela Greiler+, Jacek Czerwonka+, Brendan Murphy*
*Microsoft Research, Cambridge
+Microsoft Corporation, US
27. Improving Development Processes
[Diagram: a product/service evolves through legacy changes, new product features, and technology changes. The development environment must balance speed, cost, and quality/risk (should be well balanced).]
Microsoft aims for shorter release cycles
Empirical data to support & drive decisions
• Speed up development processes (e.g. code velocity)
• More frequent releases
• Maintaining / increasing product quality
Joint effort by MSR & product teams
• MSR Cambridge: Brendan Murphy, Kim Herzig
• TSE Redmond: Jacek Czerwonka, Michaela Greiler
• MSR Redmond: Tom Zimmermann, Chris Bird, Nachi Nagappan
• Windows, Windows Phone, Office, Dynamics product teams
28. Software Testing for Windows
[Simplified branch diagram over time: code flows from multiple component branches (quality gate: component testing) into multiple area branches (quality gate: system & component testing), through a development branch, and finally into winmain, the main branch (quality gate: system testing).]
Software testing is very expensive
• Thousands of test suites and millions of test cases executed
• On different branches, architectures, languages, etc.
• We tend to repeat the same tests over and over again
• Too many false alarms (failures due to test and infrastructure issues)
• Each test failure slows down product development
Actual problem: the current process aims for maximal protection
• It aims to find code issues as early as possible
• At the cost of slower product development
{Simplified illustration}
29. Software Testing for Office
Office faces the same actual problem as Windows: software testing is very expensive, and the current process aims for maximal protection at the cost of slower product development.
Office differs in:
• Branching structure
• Development process
• Testing process
• Release schedules
• …
{Simplified illustration: dev inner loop → BVT and CVT on main → dog food}
30. Goal
Reduce the number of test executions …
… without sacrificing code quality
Dynamic, self-adaptive optimization model
31. Solution
Reduce the number of test executions …
•Run every test at least once before integrating a code change into the main branch (e.g., winmain).
•We eventually find all code issues, but take the risk of finding them later (on higher-level branches).
… without sacrificing code quality
[Quadrant illustration of tests by cost and value: high cost, unknown value ($$$$$); high cost, low value ($$$$); low cost, low value ($); low cost, good value ($$).]
How likely is a test causing:
1) false positives or
2) finding code issues?
Analyze historic data:
- Test events
- Builds
- Code integrations
Analyze past test results:
- Passing tests, false alarms, detected code issues
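One simple way to estimate those two probabilities from historic test results is shown below; this is a sketch of the kind of analysis described, not Theo's actual implementation, and the outcome labels and counts are made up:

```python
def estimate_probs(outcomes):
    """Estimate a test's probability of raising a false alarm (ProbFP)
    and of finding a real code issue (ProbTP) from its execution
    history. `outcomes` holds one label per historic execution."""
    total = len(outcomes)
    prob_fp = outcomes.count("false_alarm") / total
    prob_tp = outcomes.count("code_issue") / total
    return prob_fp, prob_tp

# Hypothetical history: 95 passes, 4 false alarms, 1 real code issue.
history = ["pass"] * 95 + ["false_alarm"] * 4 + ["code_issue"]
print(estimate_probs(history))  # (0.04, 0.01)
```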
33. Solution
Using cost function to model risk.
CostExecution > CostSkip ? suspend : execute test

CostExecution = CostMachine/Time × TimeExecution + "cost of a potential false alarm"
             = CostMachine/Time × TimeExecution + (ProbFP × CostDeveloper/Time × TimeTriage)

CostSkip = "potential cost of finding a defect later"
         = ProbTP × CostDeveloper/Time × TimeFreezeBranch × #DevelopersBranch
Each test has a cost to run and a value of output.
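The cost comparison above can be evaluated directly. In the sketch below, all rates, times, and developer counts are hypothetical example values, not Microsoft's actual figures:

```python
def should_suspend(prob_fp, prob_tp, machine_cost_per_h, exec_h,
                   dev_cost_per_h, triage_h, freeze_h, n_devs_on_branch):
    """Suspend (skip) the test iff the expected cost of executing it
    exceeds the expected cost of skipping it, per the cost model."""
    cost_execution = (machine_cost_per_h * exec_h
                      + prob_fp * dev_cost_per_h * triage_h)
    cost_skip = prob_tp * dev_cost_per_h * freeze_h * n_devs_on_branch
    return cost_execution > cost_skip

# A noisy test: frequent false alarms, rarely finds real defects.
print(should_suspend(prob_fp=0.30, prob_tp=0.01,
                     machine_cost_per_h=1.0, exec_h=0.5,
                     dev_cost_per_h=100.0, triage_h=2.0,
                     freeze_h=1.0, n_devs_on_branch=50))  # True
```

With these numbers, execution costs 0.5 + 0.30 × 100 × 2 = 60.5 against a skip cost of 0.01 × 100 × 1 × 50 = 50, so the model suspends the test.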
35. Dynamic, Self-Adaptive
Decision points are connected to each other
Skipping tests influences the risk factors of higher-level branches
We re-enable tests if code quality drops (e.g., in a different milestone)
[Chart: relative test reduction rate over time (Windows 8.1), y-axis 0%–70%; after an initial training period, the reduction rate reaches up to about 60%.]
36. Bug Finding Performance of Tests
[Charts, per branch level: (1) how many test executions fail (# failed test executions vs. number of test executions); (2) how many of the failed test executions result in bug reports, broken down into FP, test-unspecific TP, and test-specific TP.]
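The failed-execution breakdown can be sketched as follows; the category labels and counts are made up for illustration:

```python
from collections import Counter

def failure_breakdown(failure_labels):
    """Fraction of failed test executions per category: FP (false
    alarm), TP_test_specific (bug only this test found), and
    TP_test_unspecific (bug also found by other tests)."""
    counts = Counter(failure_labels)
    total = len(failure_labels)
    return {label: count / total for label, count in counts.items()}

# Hypothetical labels for ten failed executions on one branch level.
labels = ["FP"] * 6 + ["TP_test_unspecific"] * 3 + ["TP_test_specific"]
print(failure_breakdown(labels))
```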
37. Impact on Development Process
Secondary Improvements
•Machine setup: we may lower the number of machines allocated to the testing process
•Developer satisfaction: removing false test failures increases confidence in the testing process
…hard to estimate speed improvement through simulation
“We used the data […] to cut a bunch of bad content and are running a much leaner BVT system […] we’re panning out to scale about 4x and run in well under 2 hours” (Jason Means, Windows BVT PM)