The Robust Optimization of Non-Linear Requirements Models
1. The Robust Optimization of Non-
Linear Requirements Models
GREGORY GAY
THESIS DEFENSE
WEST VIRGINIA UNIVERSITY
GREG@GREGGAY.COM
2. 2
Consider a requirements model…
Contains:
Various goals of a project.
Methods for reaching those goals.
Risks that prevent those goals.
Mitigations that remove risks.
(but carry costs)
A solution: balance between cost and attainment.
This is a non-linear optimization problem!
3. 3
Understanding the Solution Space
Open and pressing issue. [Harman ‘07]
Many SE problems are over-constrained.
No right answer, so give partial solutions.
Robustness of solutions is key – many algorithms
give brittle results. [Harman ‘01]
Important to present insight into the neighborhood.
What happens if I do B instead of A?
4. 4
The Naïve Approach
Naive approaches to understanding neighborhood:
Run N times and report (a) the solutions appearing in more
than N/2 cases, or (b) results with a 95% confidence interval.
Both are flawed – they require multiple trials!
Neighborhood assessment must be fast [Feather ’08]
Real-time if possible.
[Nielson ‘93] states that results must be within:
1Second before the mind begins to drift.
10 seconds before the mind has completely moved on.
5. 5
Research Goals
Two important concerns:
Is demonstrating solution
robustness a time-consuming
task?
Must solution quality be traded
against solution robustness?
6. 6
What We Want
These are the features we want out of an algorithm:
Feature Algorithm
High-Quality Results* Yes
Low Result Variance Yes
Neighborhood (Tame)
Assessment
Scores Plateau to Stability Yes
(Well-behaved)
Speed Real-Time Results Yes
Scalability Yes
*(We want “more for less”)
7. 7
Why Do We Care?
The later in the development process that a defect is
identified, the more exponentially expensive it
becomes to fix it
(Paraphrased from Barry Boehm [Boehm ‘81])
8. 8
Roadmap
Model
Algorithms
Experiments
Future Work
Conclusions
9. 9
The Defect Detection and Prevention Model
Used at NASA JPL by Martin Feather’s “Team X”
[Cornford ’01, Feather ‘02, Feather ’08, Jalali ‘08]
Early-lifecycle requirements model
Light-weight ontology represents:
Requirements: project objectives, weighted by importance.
Risks: events that damage attainment of requirements.
Mitigations: precautions that remove risk, carry a cost value.
Mappings: Directed, weighted edges between requirements
and risks and between risks and mitigations.
Part-of-relations: Provide structure between model
components.
11. 11
Why Use DDP?
Three Reasons:
1. Demonstrably useful. [Feather ‘02, Feather ‘08]
Costsavings often over $100,000
Numerous design improvements seen in DDP sessions
Overall shift in risks in JPL projects.
2. Availability of real-world models [Jalali ‘08]
Now and in the future.
12. 12
The Third Reason
DDP is representative of other requirements tools.
Set of influences, expressed in a hierarchy, with relationships
modeled through equations.
QOC [MacLean ‘96]
Soft-Goals [Myloupolus ‘99]
Sensitivity analysis [Cruz ‘73] not applicable to DDP.
13. 13
Using DDP
Input = Set of enabled mitigations.
Output = Two values: (Cost, Attainment)
Those values are normalized and combined into a
single score [Jalali ‘08]:
14. 14
Roadmap
Model
Algorithms
Experiments
Future Work
Conclusions
15. 15
Search-Based Software Engineering
No single solution, so reformulate as a search problem and find several!
Four factors must be met: [Harman ’01, Harman ‘04]
1. A large search space.
2. Low computational complexity.
3. Approximate continuity (in the score space).
4. No known optimal solutions.
DDP Problem fits all:
1. Some models have up to (299 = 6.33*1029) possible settings.
2. Calculating the score is fast, algorithms run in O(N2)
3. Discrete variables, but continuous score space.
4. Solutions depend on project settings, optimal not known.
16. 16
Theory of KEYS
Theory: A minority of variables control the majority
of the search space. [Menzies ‘07]
If so, then a search that (a) finds those keys and (b)
explores their ranges will rapidly plateau to stable,
optimal solutions.
This is not new: narrows, master-variables, back
doors, and feature subset selection all work on the
same theory.
[Amarel ’86, Crawford ’94, Kohavi ’97, Menzies ’03, Williams ’03]
Everyone reports them, but few exploit them!
17. 17
KEYS Algorithm
Two components: greedy search and a Bayesian
ranking method (BORE = “Best or Rest”).
Each round, a greedy search:
Generate 100 configurations of mitigations 1…M.
Score them.
Sort top 10% of scores into “Best” grouping, bottom 90% into
“Rest.”
Rank individual mitigations using BORE.
The top ranking mitigation is fixed for all subsequent rounds.
Stop when every mitigation has a value, return final
cost and attainment values.
18. 18
BORE Ranking Heuristic
We don’t have to actually search for the keys, just
keep frequency counts for “best” and “rest” scores.
BORE [Clark ‘05] based on Bayes’ theorem. Use
those frequency counts to calculate:
To avoid low-frequency evidence, add support term:
19. 19
KEYS vs KEYS2
KEYS fixes a single top-
ranked mitigation each
round.
KEYS2 [Gay ‘10]
incrementally sets more
(1 in round 1, 2 in round
2… M in round M)
Slightly less tame, much
faster.
20. 20
Discovering KEYS2
KEYS2 is simple, developing it was not!
Many different heuristics tried:
Simple: Complex:
Set more in powers of 2 Shift best/rest slope
ln(round) Neighborhood exploration
top N% KEYS-R
Lesson in simplicity.
21. 21
Benchmarked Algorithms
KEYS much be benchmarked against standard SBSE
techniques.
Simulated Annealing, MaxWalkSat, A* Search
Chosen techniques are discrete, sequential,
unconstrained algorithms. [Gu ‘97]
Constrained searches work towards a pre-determined number
of solutions, unconstrained adjust to their goal space.
22. 22
Simulated Annealing
Classic, yet common, approach. [Kirkpatrick ’83]
Choose a random starting position.
Look at a “neighboring” configuration.
If it is better, go to it.
If not, move based on guidance from probability function
(biased by the current temperature).
Over time, temperature lowers. Wild jumps stabilize
to small wiggles.
23. 23
MaxWalkSat
Hybridized local/random search. [Kautz ‘96, Selman ‘93]
Start with random configuration.
Either perform
Local Search: Move to a neighboring configuration with a
better score. (70%)
Random Search: Change one random mitigation setting. (30%)
Keeps working towards a score threshold. Allotted a
certain number of resets, which it will use if it fails to
pass the threshold within a certain number of
rounds.
24. 24
A* Search
Best first path-finding heuristic. [Hart ‘68]
Uses distance from origin (G) and estimated cost to
goal (H), and moves to the neighbor that minimizes
G+H.
Moves to new location and adds the previous
location to a closed list to prevent backtracking.
Optimal search because it always underestimates H.
Stops after being stuck for 10 rounds.
25. 25
Other Methods
[Gu ‘97]’s survey lists hundreds of methods!
Gradient descent methods and sensitivity analysis
assume a continuous range for model variables.
DDP models are discrete!
Integer programming could still be used (CPLEX
[Mittelmann ‘07])
Too slow! [Coarfa ‘00]
SE problems are over-constrained, so a solution over all
constraints is not possible. [Harman ’07]
Parallel algorithms
Communications overhead overwhelms benefits.
26. 26
Roadmap
Model
Algorithms
Experiments
Future Work
Conclusions
27. 27
Experiment 1: Costs and Attainments
We use real-world models 2,4,5 (1,3 are too small
and were only used for debugging).
Models discussed in [Feather ‘02, Jalali ‘08, Menzies ‘03]
Run each algorithm 1000 times per model.
Removed outlier problems by generating a lot of data points.
Still a small enough number to collect results in a short time
span.
Graph cost and attainment values.
Values towards bottom-right better.
31. 31
Summing it Up…
Feature Simulated MaxWalkSat A* KEYS KEYS2
Annealing
High-Quality Results No No Yes Yes Yes
Low Result Variance No No No Yes Yes
(Tame)
Scores Plateau to Stability ? ? ? ? ?
(Well-behaved)
Real-Time Results ? ? ? ? ?
Scalability ? ? ? ? ?
32. 32
Experiment 2: Runtimes
For each model:
Run each algorithm 100
times.
Record runtime using
Unix “time” command.
Divide runtime/100 to get
average.
34. 34
Summing it Up…
Feature Simulated MaxWalkSat A* KEYS KEYS2
Annealing
High-Quality Results No No Yes Yes Yes
Low Result Variance No No No Yes Yes
(Tame)
Scores Plateau to Stability ? ? ? ? ?
(Well-behaved)
Real-Time Results No No Yes Yes Yes
Scalability ? ? ? ? ?
35. 35
Experiment 3: Scale-Up Study
By 2013, we expect DDP
models 8x larger than
those used in this thesis
(from 2008) Year Num. Variables
2004 30
KEYS and KEYS2 are 2008 100
“real time” now, but will 2010 300
they scale? 2013 800
36. 36
Artificial Model Generation
To test scalability, a generator builds synthesized
models by:
Studying the existing real-world models and collecting
statistics on its internal structure.
Mutating them into larger models based on user-input size,
density, and distribution altering parameters
Models built that were 2,4, and 8 times larger than
existing models.
39. 39
Scale-Up Results (3)
KEYS KEYS2
Runtimes Model Calls Runtimes Model Calls
Exponential 0.82 0.83 0.88 0.93
Polynomial 0.99 0.99 0.99 0.98
(of degree 2)
KEYS and KEYS2 fit to O(N2)
Both scale to larger models, but KEYS requires
exponentially more model calls (thus, large jump in
execution time)
Thus, we recommend KEYS2
40. 40
Summing it Up…
Feature Simulated MaxWalkSat A* KEYS KEYS2
Annealing
High-Quality Results No No Yes Yes Yes
Low Result Variance No No No Yes Yes
(Tame)
Scores Plateau to Stability ? ? ? ? ?
(Well-behaved)
Real-Time Results No No Yes Yes Yes
Scalability No No Maybe No Yes
41. 41
Decision Ordering Diagrams
Design of KEYS2 automatically provides a way to
explore the decision neighborhood.
Decision ordering diagrams – Visual format that
ranks decisions from most to least important . [Gay ‘10]
42. 42
Decision Ordering Diagrams (2)
These diagrams can be used to assess solution
robustness in linear time by
(A) Considering the variance in performance after applying X
decisions.
Spread = measure of variance (75th-25th quartile)
43. 43
Decision Ordering Diagrams (3)
These diagrams can be used to assess solution
robustness in linear time by
(B) Comparing the results of using the first X decisions to that
of X-1 or X+1.
44. 44
Decision Ordering Diagrams (4)
Useful under three conditions:
(a) scores output are well-behaved
(b) variance is tamed
(c) they are generated in a timely manner.
45. 45
Summing it Up…
Feature Simulated MaxWalkSat A* KEYS KEYS2
Annealing
High-Quality Results No No Yes Yes Yes
Low Result Variance No No No Yes Yes
(Tame)
Scores Plateau to Stability No No No Yes Yes
(Well-behaved)
Real-Time Results No No Yes Yes Yes
Scalability No No Maybe Yes Yes
46. 46
Roadmap
Model
Algorithms
Experiments
Future Work
Conclusions
47. 47
Future Work
Weakness in KEYS2 – it
must call the model 100x
per round.
70% of execution time is
spent calling the model.
To improve –call the
model less.
Problem: do this without
losing support!
48. 48
Future Directions
Two main parts – cache and active learning.
Don’t call the model, just look up scores.
Clustering must be efficient [Jiang ’08, Poshyvanyk ‘08]
First round – generate 100 random configurations.
Greedily cluster them, assign identifiers to each, build a
hierarchical distance tree.
After this, can simply drop new instances into the tree (as the
number of clusters will remain fixed).
The model only needs to be called for new instances.
49. 49
Future Directions (2)
Most learners passive – blindly generating or
reading data.
Active learners exercise control over what data they
learn on [Cohn ‘94].
Many clusters are linear – their members form a
smooth area of the search space with similar scores
[Shepperd ‘97].
If a new configuration falls into a linear region, don’t
poll the model, just interpolate its score.
Train KEYS2 to outright ignore linear regions with
poor scores.
50. 50
KEYS as a Component
Treatment Learning Bayesian Nets
(NASA Ames) (Tsinghua University)
Work with Misty Davies, Karen Work with Hongyu Zhang.
Gundy-Burlet.
Bayes Net = directed graph of
Design process: Run variables and their influences.
simulations, then apply Variable ordering problem
Treatment Learning via [Hsu ‘04]
TAR3 or TAR4.1 Order of variables examined by
[Menzies ‘03b] learning process is crucial to
performance!
KEYS is a multi-stage Need a ranking from best -> worst
version of TAR4.1 Genetic algorithms sometimes used,
but they are slow.
Future – KEYS as part of KEYS offers a potential solution
the simulator. to the VOP.
51. 51
Roadmap
Model
Algorithms
Experiments
Future Work
Conclusions
52. 52
Conclusions
Optimization tools can study the space of
requirements, risks, and mitigations.
Finding a balance between costs and attainment is
hard!
Such solutions can be brittle, so we must comment on
solution robustness. [Harman ‘01]
53. 53
Conclusions (2)
Pre-experimental concerns:
An algorithm would need to trade solution quality for
robustness (variance vs score).
Demonstrating solution robustness is time-consuming and
requires multiple procedure calls.
KEYS2 defies both concerns.
Generates higher quality solutions than standard methods, and
generates results that are tame and well-behaved (thus, we can
generate decision ordering graphs to assess robustness).
Is faster than other techniques, and can generate decision
ordering graphs in O(N2)
54. 54
We Recommend KEYS2
Feature Simulated MaxWalkSat A* KEYS KEYS2
Annealing
High-Quality Results No No Yes Yes Yes
Low Result Variance No No No Yes Yes
(Tame)
Scores Plateau to Stability No No No Yes Yes
(Well-behaved)
Real-Time Results No No Yes Yes Yes
Scalability No No Maybe No Yes
55. 55
References
Slide 3:
[Harman ‘01] M. Harman and B.F. Jones. Search-based software engineering. Journal of Information and Software
Technology, 43:833–839, December 2001.
[Harman ‘07] Mark Harman. The current state and future of search based software engineering. In Future of Software
Engineering, ICSE’07. 2007.
Slide 4:
[Feather ‘08] M. Feather, S. Cornford, K. Hicks, J. Kiper, and T. Menzies. Application of a broad- spectrum quantitative
requirements model to early-lifecycle decision making. IEEE Software, 2008.
[Nielson ‘93] J. Nielson. Usability Engineering. Academic Press, 1993.
Slide 7:
[Boehm ‘81] B. W. Boehm. Software Engineering Economics. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1981.
Slide 9:
[Cornford ‘01] S.L. Cornford, M.S. Feather, and K.A. Hicks. DDP a tool for life-cycle risk management. In IEEE Aerospace
Conference, Big Sky, Montana, pages 441–451, March 2001.
[Feather ‘02] M.S. Feather and T. Menzies. Converging on the optimal attainment of requirements. In IEEE Joint Conference On
Requirements Engineering ICRE’02 and RE’02, 9-13th September, University of Essen, Germany, 2002.
[Jalali ‘08] Tim Menzies, Omid Jalali, and Martin Feather. Optimizing requirements decisions with keys. In Proceedings
PROMISE ’08 (ICSE), 2008.
Slide 12:
[Cruz ‘73] Cruz, J.B., editor. System Sensitivity Analysis. Dowden, Hutchinson, & Ross. Stroudsburg, PA. 1973.
[Maclean ‘96] A. MacLean, R.M. Young, V. Bellotti, and T.P. Moran. Questions, options and criteria: Elements of design space
analysis. In T.P. Moran and J.M. Carroll, editors, Design Rationale: Concepts, Techniques, and Use, pages 53–106. Lawerence
Erlbaum Associates, 1996.
[Mylopoulos ‘99] J. Mylopoulos, L. Cheng, and E. Yu. From object-oriented to goal-oriented requirements analysis.
Communications of the ACM, 42(1):31–37, January 1999.
56. 56
References (2)
Slide 15:
[Harman ‘04] Mark Harman and John Clark. Metrics are fitness functions too. In 10th International Software Metrics
Symposium (METRICS 2004), 2004), pages = 58–69, location = Chicago, IL, USA, publisher = IEEE Computer Society Press,
address = Los Alamitos, CA, USA.
Slide 16:
[Amarel ‘86] S. Amarel. Program synthesis as a theory formation task: Problem representations and solution methods. In R.
S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach: Volume II,
pages 499–569. Kaufmann, Los Altos, CA, 1986.
[Crawford ‘94] J.Crawford and A.Baker. Experimental results on the application of satisfiability algorithms to scheduling
problems. In AAAI ’94, 1994.
[Kohavi ‘97] Ron Kohavi and George H. John. Wrappers for feature subset selection. Artificial Intelli- gence, 97(1-2):273–324,
1997.
[Menzies ‘03] T. Menzies and H. Singh. Many maybes mean (mostly) the same thing. In M. Madravio, editor, Soft Computing
in Software Engineering. Springer-Verlag, 2003.
[Menzies ‘07] T. Menzies, D. Owen, and K. Richardson. The Strangest Thing About Software. Computer 40, 1 (Jan. 2007),
54-60.
[Williams ’03] R.Williams, C.P.Gomes, and B.Selman. Backdoors to typical case complexity. In Proceedings of IJCAI 2003,
2003.
Slide 18:
[Clark ‘05] R. Clark. Faster treatment learning, Computer Science, Portland State University. Master’s thesis, 2005.
Slide 19:
[Gay ‘10] Gay, Gregory and Menzies, Tim and Jalali, Omid and Mundy, Gregory and Gilkerson, Beau and Feather, Martin and
Kiper, James. Finding robust solutions in requirements models. Automated Software Engineering, 17(1): 87-116, 2010.
57. 57
References (3)
Slide 21:
[Gu ‘97] Jun Gu, Paul W. Purdom, John Franco, and Benjamin W. Wah. Algorithms for the satisfiability (sat)
problem: A survey. In DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 19–152.
American Mathematical Society, 1997.
Slide 22:
[Kirkpatrick ‘83] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Sci- ence,
Number 4598, 13 May 1983, 220, 4598:671–680, 1983.
Slide 23:
[Kautz ‘96] Henry Kautz and Bart Selman. Pushing the envelope: Planning, propositional logic and stochastic
search. In Proceedings of the Thirteenth National Conference on Artificial Intel- ligence and the Eighth Innovative
Applications of Artificial Intelligence Conference, pages 1194–1201, Menlo Park, August 4–8 1996. AAAI Press /
MIT Press. Available from http://www.cc.gatech.edu/ ~jimmyd/summaries/kautz1996.ps.
[Selman ‘93] Bart Selman, Henry A. Kautz, and Bram Cohen. Local search strategies for satisfiability testing. In
Michael Trick and David Stifler Johnson, editors, Proceedings of the Second DIMACS Challange on Cliques,
Coloring, and Satisfiability, Providence RI, 1993.
Slide 24:
[Hart ‘68] P.E. Hart, N.J. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost
paths. IEEE Transactions on Systems Science and Cybernetics, 4:100–107, 1968.
58. 58
References (4)
Slide 25:
[Coarfa ‘00] Cristian Coarfa, Demetrios D. Demopoulos, Alfonso San, Miguel Aguirre, Devika Subramanian, and
Moshe Y. Vardi. Random 3-sat: The plot thickens. In In Principles and Practice of Constraint Programming, pages
143–159, 2000.
[Mittelmann ‘07] H.D. Mittelmann. Recent benchmarks of optimization software. In 22nd Euorpean Conference on
Operational Research, 2007.
Slide 48:
[Jiang ‘08] Y. Jiang, B. Cukic, and T. Menzies. Does transformation help? In Defects 2008, 2008.
[Poshyvanyk ‘08] D. Poshyvanyk A. Marcus and R. Ferenc. Using the conceptual cohesion of classes for fault
prediction in object oriented systems. IEEE Transactions on Software Engineering, 34:287–300, 2 2008.
Slide 49:
[Cohn ‘94] David Cohn, Les Atlas, and Richard Ladner. Improving generalization with active learning. Mach.
Learn., 15(2):201–221, 1994.
[Shepperd ‘97] M. Shepperd and C. Schofield. Estimating software project effort using analogies. IEEE
Transactions on Software Engineering, 23(12), November 1997.
Slide 50:
[Menzies ’03b] T. Menzies and Y. Hu. Data mining for very busy people. In IEEE Computer, November 2003.
Available from http://menzies.us/pdf/03tar2.pdf.
[Hsu ‘04] Hsu, W.H. Genetic Wrappers for feature selection in decision tree induction and variable ordering in
Bayesian network structure learning. Inf. Sci. 163, 1-3 (June 2004), 103-122.
59. 59
Questions?
Want to contact me later?
Email: greg@greggay.com
Facebook: http://facebook.com/greg.gay
Twitter: http://twitter.com/Greg4cr
More about me: http://www.greggay.com