Slideshow transcript
Slide 1: If you fix everything you lose fixes for everything else. Tim Menzies (WVU, tim@menzies.us), Jairus Hihn (JPL), Oussama Elrawas (WVU, oelrawas@mix.wvu.edu), Dan Baker (WVU), Karen Lum (JPL). International Workshop on Living with Uncertainty, IEEE ASE 2007, Atlanta, Georgia, Nov 5, 2007. This work was conducted at West Virginia University and the Jet Propulsion Laboratory under grants with NASA's Software Assurance Research Program. Reference herein to any specific commercial product, process, or service by trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government.
Slide 2: What does this mean? A supposedly NP-hard task: abduction over first-order theories. nogood/2. Q: for what models does (a few peeks) = (many hard stares)?
Slide 3: A: models with "collars". "Collar" variables set the other variables: Narrows (Amarel in the 60s); Minimal environments (DeKleer '85); Master variables (Crawford & Baker '94); Feature subset selection (Kohavi & John '97); Back doors (Williams et al '03); etc. Implications for uncertainty? Grow: Monte Carlo a model. Pick input settings at random. For each run, score each output and add the score to each input setting. Harvest: rule generation experiments, favoring settings with better scores. If "collars", then small rules, learned quickly, will suffice. Feather & Menzies RE'02.
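The grow/harvest loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `toy_model`, its weights, and the `ranges` dictionary are hypothetical, and settings are ranked by mean score per setting (an assumption; the slide only says the score is added to each input setting).

```python
import random
from collections import defaultdict

def grow(model, ranges, runs=1000, seed=1):
    """Monte Carlo the model; add each run's score to every setting it used."""
    rng = random.Random(seed)
    totals = defaultdict(float)   # (name, value) -> summed score
    counts = defaultdict(int)     # (name, value) -> number of runs using it
    for _ in range(runs):
        settings = {name: rng.choice(values) for name, values in ranges.items()}
        score = model(settings)
        for item in settings.items():
            totals[item] += score
            counts[item] += 1
    return totals, counts

def harvest(totals, counts):
    """Rank settings by mean score, best (highest) first."""
    return sorted(totals, key=lambda item: totals[item] / counts[item], reverse=True)

def toy_model(s):
    # Hypothetical model: 'a' acts as the collar; 'b' and 'c' barely matter.
    return 10 * s["a"] + 0.1 * s["b"] + 0.1 * s["c"]

ranges = {"a": [0, 1], "b": [0, 1], "c": [0, 1]}
totals, counts = grow(toy_model, ranges)
ranking = harvest(totals, counts)
print(ranking[0], ranking[-1])
```

In this toy case the two settings of the collar variable land at opposite ends of the ranking, so a one-condition rule on `a` would capture most of the model's behavior.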
Slide 4: STAR: collars + simulated annealing on Boehm's USC software process models for effort, defects, threats. For example: y[i] = impact[i] * project[i] + b[i] for i ∈ {1,2,3,…}. α ≤ project[i] ≤ β: uncertainty in project description (controllable). χ ≤ impact[i] ≤ δ: uncertainty in model calibration (uncontrollable). Random solution: pick project[i] and impact[i] from any α .. β, χ .. δ. α .. β is set via domain knowledge; e.g. process maturity in 3 to 5. The range of χ .. δ is known from history. Score each solution by effort (Ef), defects (De) and threat (Th).
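A random solution for the linear form above might be drawn as follows. This is a sketch under stated assumptions: the numeric ranges and the `b` offsets are invented for illustration (only the "3 to 5" process maturity range comes from the slide), and the scoring by Ef, De and Th is left out because the slide does not say how those are combined.

```python
import random

# Hypothetical ranges: (alpha, beta) for each project[i], (chi, delta) for each impact[i].
PROJECT_RANGES = [(3.0, 5.0), (1.0, 2.0)]   # e.g. process maturity in 3 to 5
IMPACT_RANGES = [(0.5, 1.5), (-0.2, 0.2)]   # calibration uncertainty (made up)
B = [0.0, 1.0]                              # made-up offsets b[i]

def random_solution(project_ranges, impact_ranges, rng):
    """Pick project[i] from alpha..beta and impact[i] from chi..delta."""
    project = [rng.uniform(lo, hi) for lo, hi in project_ranges]
    impact = [rng.uniform(lo, hi) for lo, hi in impact_ranges]
    return project, impact

def outputs(project, impact, b):
    """y[i] = impact[i] * project[i] + b[i]"""
    return [m * p + c for m, p, c in zip(impact, project, b)]

rng = random.Random(0)
project, impact = random_solution(PROJECT_RANGES, IMPACT_RANGES, rng)
y = outputs(project, impact, B)
print(project, impact, y)
```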
Slide 5: Two studies of y[i] = impact[i] * project[i] + b[i]. One: certain methods. Using much historical data; learn the magnitude of the impact[i] relationship; with fixed impact[i], Monte Carlo at random across the project[i] settings. Tame uncontrollables via historical records. E.g. regression-based tools that learn impact[i] from historical records: 93 records of JPL systems; SCAT: JPL's current methods; 2CEE: WVU's improvement over SCAT (currently under test). Two: methods with more uncertainty. Using no historical data; Monte Carlo at random across the project[i] settings and impact[i] settings. E.g. STAR: Monte Carlo a model; score each output; sort settings by their "C", where "C" = cumulative score; rule generation experiments, favoring settings with better "C".
Slide 6: Inside STAR. 1. Sampling: simulated annealing. 2. Summarizing: post-processor. for setting ∈ Sx { value[setting] += E }. Sort all settings by their value. Ignore uncontrollables impact[i]. Assume the top (1 ≤ i ≤ max) project[i] settings; randomly select the rest. "Policy point": smallest i with lowest E. Median = 50th percentile. Spread = (75 - 50)% percentile. Figure labels: Bad, Good; 38 not-so-good ideas; 22 good ideas.
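The post-processor described on this slide could look roughly like the sketch below. Several things here are assumptions rather than STAR itself: plain random sampling stands in for simulated annealing, `energy` is a made-up stand-in for E (lower is better), settings are ranked by mean E per setting, and only the best-ranked value of each variable is kept when fixing the top i settings.

```python
import random
from collections import defaultdict
from statistics import quantiles

RANGES = {k: [0, 1, 2] for k in "abcd"}   # hypothetical project settings

def energy(s):
    # Hypothetical stand-in for the model score E (lower is better).
    return 10 * s["a"] + 5 * s["b"] + 0.1 * s["c"] + 0.1 * s["d"]

def sample(rng, fixed=None):
    """Random solution; any settings in `fixed` override the random picks."""
    s = {k: rng.choice(v) for k, v in RANGES.items()}
    s.update(fixed or {})
    return s

def rank_settings(rng, runs=2000):
    total, seen = defaultdict(float), defaultdict(int)
    for _ in range(runs):
        s = sample(rng)
        e = energy(s)
        for item in s.items():
            total[item] += e          # value[setting] += E
            seen[item] += 1
    ranked = sorted(total, key=lambda it: total[it] / seen[it])
    names, out = set(), []
    for name, value in ranked:        # keep the best-ranked value per variable
        if name not in names:
            names.add(name)
            out.append((name, value))
    return out

def median_spread(xs):
    """Median = 50th percentile; spread = 75th minus 50th percentile."""
    _, q2, q3 = quantiles(xs, n=4)
    return q2, q3 - q2

def policy_point(rng, ranked, runs=500):
    results = []
    for i in range(1, len(ranked) + 1):
        fixed = dict(ranked[:i])      # assume the top i settings
        es = [energy(sample(rng, fixed)) for _ in range(runs)]
        results.append((i, *median_spread(es)))
    lowest = min(m for _, m, _ in results)
    point = next(i for i, m, _ in results if m == lowest)
    return point, results

rng = random.Random(1)
ranked = rank_settings(rng)
point, results = policy_point(rng, ranked)
print(ranked[:2], point)
```

In this toy model the two heavily weighted variables rank first, and the median E drops sharply once both are fixed; the remaining variables are ranked largely by noise.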
Slide 7: SCAT vs 2CEE vs STAR: project[i].
Slide 8: SCAT vs 2CEE vs STAR: project[i]. Control impact[i] via historical data.
Slide 9: SCAT vs 2CEE vs STAR: project[i]. Control impact[i] via historical data. Stagger around superset of possible impact[i].
Slide 10: SCAT vs 2CEE vs STAR: project[i]. Control impact[i] via historical data. Stagger around superset of possible impact[i]. Median: 50% point. Spread: (75 - 50)%.
Slide 11: SCAT vs 2CEE vs STAR: project[i]. Control impact[i] via historical data. Stagger around superset of possible impact[i]. Median: 50% point. Spread: (75 - 50)%. STAR/2CEE = 50/800 = 6%. STAR/SCAT = 50/1300 = 4%.
Slide 12: SCAT vs 2CEE vs STAR: project[i]. Control impact[i] via historical data. Stagger around superset of possible impact[i]. Median: 50% point. Spread: (75 - 50)%. STAR/2CEE = 400/1600 = 25%; 180/400 = 45%; 30/620 = 5%; 50/800 = 6%. STAR/SCAT = 400/1900 = 21%; 180/1900 = 60%; 30/730 = 4%; 50/1300 = 4%.
Slide 13: SCAT vs 2CEE vs STAR: project[i]. Control impact[i] via historical data. Stagger around superset of possible impact[i]. Median: 50% point. Spread: (75 - 50)%. STAR/2CEE = 400/1600 = 25%; 180/400 = 45%; 30/620 = 5%; 50/800 = 6%. STAR/SCAT = 400/1900 = 21%; 180/1900 = 60%; 30/730 = 4%; 50/1300 = 4%.
Slide 14: SCAT vs 2CEE vs STAR: project[i]. Control impact[i] via historical data. Stagger around superset of possible impact[i]. Median: 50% point. Spread: (75 - 50)%. STAR/2CEE = 400/1600 = 25%; 180/400 = 45%; 30/620 = 5%; 50/800 = 6%. STAR/SCAT = 400/1900 = 21%; 180/1900 = 60%; 30/730 = 4%; 50/1300 = 4%. Ignoring historical data is useful (!!!?)
Slide 15: SCAT vs 2CEE vs STAR: project[i]. Control impact[i] via historical data. Stagger around superset of possible impact[i]. Median: 50% point. Spread: (75 - 50)%. STAR/2CEE = 400/1600 = 25%; 180/400 = 45%; 30/620 = 5%; 50/800 = 6%. STAR/SCAT = 400/1900 = 21%; 180/1900 = 60%; 30/730 = 4%; 50/1300 = 4%. Ignoring historical data is useful (!!!?)
Slide 16: SCAT vs 2CEE vs STAR: project[i]. Control impact[i] via historical data. Stagger around superset of possible impact[i]. Median: 50% point. Spread: (75 - 50)%. STAR/2CEE = 400/1600 = 25%; 180/400 = 45%; 30/620 = 5%; 50/800 = 6%. STAR/SCAT = 400/1900 = 21%; 180/1900 = 60%; 30/730 = 4%; 50/1300 = 4%. Ignoring historical data is useful (!!!?) If you fix everything, you lose fixes for everything else.
Slide 17: Luke, trust the force, I mean, collars. See “The Strangest Thing About Software”, IEEE Computer, Jan 2007.
Slide 18: Extra Material
Slide 19: Related work. Abduction: World W = minimal set of assumptions (w.r.t. size) such that T ∪ A => G; Not(T ∪ A => error). Framework for validation, diagnosis, planning, monitoring, explanation, tutoring, test case generation, prediction, … Theoretically slow (NP-hard) but this should be practical: abduction + stochastic sampling; find collars; learn constraints on collars. Feather, DDP, treatment learning: optimization of requirement models. Xerox PARC, 1980s, qualitative representations (QR): not overly-specific; quickly collected in a new domain; used for model diagnosis and repair; can find creative solutions in the larger space of possible qualitative behaviors, rather than in the tighter space of precise quantitative behaviors.
Slide 20: Possible optimizations (not used here). STAR, an example of a general process: stochastic sampling; sort settings by "value"; rule generation experiments favoring highly "value"-ed settings. See also, elite sampling in the cross-entropy method. If SA convergence is too slow: try moving back select into the SA; constrain solution mutation to prefer highly "value"-ed settings. BORE (best or rest): n runs; Best = top 10% scores; Rest = remaining 90%; {a,b} = frequency of discretized range in {best, rest}; sort settings by -1 * (a/n)^2 / (a/n + b/n) (ask me why, off-line). Other valuable tricks: incremental discretization (Gama & Pinto's PID + Fayyad & Irani); limited discrepancy search (Harvey & Ginsberg); treatment learning (Menzies & Yu).
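The BORE ranking on this slide can be illustrated as below. It is a sketch that takes the slide's formula literally, with both frequencies divided by the total number of runs n; the function name `bore_rank`, the run format, and the toy data are hypothetical, and higher scores are assumed to be better.

```python
from collections import Counter

def bore_rank(runs, best_fraction=0.10):
    """runs: list of (score, settings); settings maps name -> discretized value.
    Returns settings sorted by -1 * (a/n)^2 / (a/n + b/n), most 'best'-like first."""
    n = len(runs)
    ordered = sorted(runs, key=lambda r: r[0], reverse=True)
    cut = max(1, round(n * best_fraction))
    best, rest = ordered[:cut], ordered[cut:]
    a, b = Counter(), Counter()       # frequency of each setting in best / rest
    for _, settings in best:
        a.update(settings.items())
    for _, settings in rest:
        b.update(settings.items())

    def key(item):
        pa, pb = a[item] / n, b[item] / n
        return -1 * pa ** 2 / (pa + pb)

    return sorted(set(a) | set(b), key=key)

# Toy data: 20 runs where x=1 occurs only in the two top-scoring runs.
runs = [(score, {"x": 1 if score >= 18 else 0, "y": score % 2}) for score in range(20)]
ranking = bore_rank(runs)
print(ranking[0])
```

Squaring a/n rewards settings that are frequent in best and not merely rare in rest; the leading -1 makes an ascending sort put those settings first.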
Slide 21: “Uncertainty helps planning” (questions? comments?)
Slide 22: At the “policy point”, STAR’s random solutions are surprisingly accurate. LC: learn impact[i] via regression (JPL data). STAR: no tuning, randomly pick impact[i]. Diff = ∑ mre(lc) / ∑ mre(star). Mre = abs(predicted - actual) / actual. ∑ mre(lc) / ∑ mre(star), by data set (strategic, tactical): ground 66% 63%; all 91% 75%; OSP2 99% 125% ●❍; OSP 112% ●❍ 111% ●❍; flight 101% ●❍ 121% ●❍. “●”, “❍” = same at {95, 99}% confidence (MWU). Slide labels, in row order: diff diff; diff diff; same diff; same same; same same. Why so little Diff (median = 75%)? Most influential inputs tightly constrained.
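As a worked example of the two formulas on this slide, the sketch below computes Mre and Diff for a pair of predictors. The prediction and actual values are invented for illustration and are not the JPL data.

```python
def mre(predicted, actual):
    """Magnitude of relative error: abs(predicted - actual) / actual."""
    return abs(predicted - actual) / actual

def diff(lc_predictions, star_predictions, actuals):
    """Diff = sum of mre(lc) / sum of mre(star)."""
    lc = sum(mre(p, a) for p, a in zip(lc_predictions, actuals))
    star = sum(mre(p, a) for p, a in zip(star_predictions, actuals))
    return lc / star

# Made-up numbers: LC is off by 10% on each project, STAR by 20%.
actuals = [100.0, 200.0]
lc_predictions = [110.0, 180.0]
star_predictions = [120.0, 160.0]
d = diff(lc_predictions, star_predictions, actuals)
print(f"Diff = {d:.0%}")
```

A Diff below 100% means the summed error of LC is smaller than that of STAR; a Diff above 100% means the reverse.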
Slide 23: (Model uncertainty = collars) << inputs. In many models, a few “collar” variables set the other variables: Narrows (Amarel in the 60s); Minimal environments (DeKleer ’85); Master variables (Crawford & Baker ‘94); Feature subset selection (Kohavi & John ‘97); Back doors (Williams et al ‘03). See “The Strangest Thing About Software” (IEEE Computer, Jan ’07). Collars appear in all execution traces (by definition): you don’t have to find the collars, they’ll find you. So, to handle uncertainty: write a simulator; stagger over uncertainties; from stagger, find collars; constrain collars. This talk: a very simple example of this process.
Slide 24: Comparisons. Standard software process modeling: models written more than run (PROSIM community); limited sensitivity analysis; limited trade space; point solutions. Or, expensive, error-prone, incomplete data collection programs. Here: no data collection; found stable conclusions within a space of possibilities; search: very simple; solution, not brittle, with trade-off space. Figure label: 22 good ideas, sorted.
Slide 25: Summary. Living with uncertainty: sometimes, simpler than you may think; more useful than you might think. Simple: here, the smallest change to simulated annealing. Useful: sometimes uncertainty can teach you more than certainty. If you fix everything, you lose fixes to everything else. Collars control certainty: uncertainty plus constrained collars → more certainty; also, can drive model to better performance. An example you can explain to any business user. Figure labels: Bad, Good; 22 good ideas, sorted.



