Built-in Self TestAssertions check properties of expected result:  ensures balance == old(balance) - amount   && result ==...
Using model checking to produce tests      The system can never                                                    Yes it ...
Model checking example-- specification for a portion of tcas - altitude separation.-- The corresponding C code is original...
Computation Tree LogicThe usual logic operators,plus temporal:    A φ - All: φ holds on all paths starting from the  curre...
What is the most effective way to integrate       combinatorial testing with model checking?•    Given AG(P -> AX(R))    “...
What happens with these assertions?   1. AG(v1 & v2 & ... & vk & P -> AX !(R)) P may have a negation of one of the vi, so ...
All faults foundwith 5-way tests        Detection Rate for TCAS Seeded          Tradeoff of up-front                    Er...
CoverageMeasurement
Combinatorial Coverage Measurement Tests   Variables           Variable   Variable-value   Coverage                       ...
Graphing Coverage Measurement                          Bottom line:100% coverage of 33% of                          All co...
Adding a testCoverage after adding test [1,1,0,1]
Adding another testCoverage after adding test [1,0,1,1]
Additional test completes coverage   Coverage after adding test [1,0,1,0]   All combinations covered to 100% level,   so t...
Measurement of satellite tests
Extending an existingtest suite automatically
Coverage to at least 70% selected
New coverage measured
Sequences
Combinatorial Sequence Testing •We want to see if a system works correctly regardless of  the order of events. How can thi...
Sequence Covering Array• With 6 events, all sequences = 6! = 720 tests• Only 10 tests needed for all 3-way sequences,  res...
Sequence Covering Array Properties  • 2-way sequences require only 2 tests    (write events in any order, then reverse)  •...
Case study 4: Laptop application       Problem: connect many       peripherals, order of       connection may affect      ...
Connection Sequences             P-1 (USB- P-2 (USB- P-3 (USB-1   Boot      RIGHT)    BACK)      LEFT)     P-4     P-5    ...
Results• Tested peripheral connection for 3-way  sequences• Some faults detected that would not have  been found with 2-wa...
Research Questions
Fault locationGiven: a set of tests that the SUT fails, whichcombinations of variables/values triggered the failure?      ...
Fault location – whats the problem?                  If theyre in failing set but not in                  passing set:    ...
Integrating into Testing Program• Test suite development   • Generate covering arrays for tests     OR   • Measure coverag...
Industrial Usage Reports• Work with US Air Force on sequence covering arrays, submitted  for publication• WWW Consortium D...
Technology TransferTools obtained by 800+ organizations; NIST “textbook” on combinatorial testing downloaded 11,000+ times...
Please contact us if you    would like more     information.  Rick Kuhn              Raghu Kackerkuhn@nist.gov       raghu...
Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Faults in Regression Testing


Tutorial 2: Practical Combinatorial (t-way) Methods for Detecting Complex Faults in Regression Testing
Authors: Rick Kuhn and Raghu Kacker

Published in: Technology

  1. 1. Combinatorial Testing. Rick Kuhn and Raghu Kacker, National Institute of Standards and Technology, Gaithersburg, MD. ICSM 2011, Williamsburg, VA, 26 Sept 2011
  2. 2. Enhancements to meet new requirements can be tricky. Software maintenance can: • introduce new faults • expose untested paths • trigger unexpected interactions • result in loss of previous functions ⇒ need to find problems no one thought of, perhaps even more than with original development ⇒ combinatorial methods excel at testing for rare events
  3. 3. Combinatorial Methods in Testing • Goals: reduce testing cost, improve cost-benefit ratio for software assurance • Merge automated test generation with combinatorial methods • New algorithms to make large-scale combinatorial testing practical • Accomplishments: huge increase in performance, scalability + widespread use in real-world applications: >800 tool users; >11,000 manual/tutorial users • Joint research with many organizations
  4. 4. What is NIST and why are we doing this? • A US Government agency • The nation’s measurement and testing laboratory: 3,000 scientists, engineers, and support staff including 3 Nobel laureates • Research in physics, chemistry, materials, manufacturing, computer science • Analysis of engineering failures, including buildings, materials, and ...
  5. 5. Empirical Data on Software Failures • We studied software failures in a variety of fields including 15 years of FDA medical device recall data • What causes software failures? • logic errors? • calculation errors? • interaction faults? • inadequate input checking? Etc. • What testing and analysis would have prevented failures? • Would statement coverage, branch coverage, all-values, all-pairs etc. testing find the errors? Interaction faults: e.g., failure occurs if pressure < 10 (1-way interaction, caught by all-values testing) or pressure < 10 && volume > 300 (2-way interaction, caught by all-pairs testing)
  6. 6. Software Failure Internals • How does an interaction fault manifest itself in code? Example: pressure < 10 & volume > 300 (2-way interaction)

        if (pressure < 10) {
            // do something
            if (volume > 300) {
                // faulty code! BOOM!
            } else {
                // good code, no problem
            }
        } else {
            // do something else
        }

      A test that included pressure = 5 and volume = 400 would trigger this failure
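To make the 2-way interaction concrete, here is a minimal Python sketch of the same branch structure. The function `process` and the representative values are assumptions made for illustration, not code from the systems studied. It shows that varying either parameter alone never reaches the faulty branch, while one specific pair does.

```python
def process(pressure: float, volume: float) -> str:
    """Toy stand-in for the slide's code: fails only when both conditions hold."""
    if pressure < 10:
        if volume > 300:
            raise RuntimeError("faulty code reached")
        return "ok: low pressure, normal volume"
    return "ok: normal pressure"


def fails(pressure: float, volume: float) -> bool:
    """Return True if the toy function raises for this input pair."""
    try:
        process(pressure, volume)
        return False
    except RuntimeError:
        return True


# One representative value per equivalence class (assumed values).
pressures = [5, 50]
volumes = [100, 400]

failing = [(p, v) for p in pressures for v in volumes if fails(p, v)]
print(failing)  # [(5, 400)]
```

Each individual value (5, 50, 100, 400) appears in at least one passing test, so all-values testing alone can miss the fault unless it happens to pair 5 with 400.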
  7. 7. How about hard-to-find flaws? • Interactions, e.g., failure occurs if • pressure < 10 (1-way interaction) • pressure < 10 & volume > 300 (2-way interaction) • pressure < 10 & volume > 300 & velocity = 5 (3-way interaction) • The most complex failure reported required 4-way interaction to trigger [Chart: % detected vs. interaction strength, 1 to 4]
  8. 8. How about other applications? Browser (green). These faults more complex than medical device software!! Why? [Chart: % detected vs. interactions, 1 to 6]
  9. 9. And other applications? Server (magenta) [Chart: % detected vs. interactions, 1 to 6]
  10. 10. Still more? NASA distributed database (light blue) [Chart: % detected vs. interactions, 1 to 6]
  11. 11. Even more? Traffic Collision Avoidance System module (seeded errors) (purple) [Chart: % detected vs. interactions, 1 to 6]
  12. 12. Finally: Network security (Bell, 2006) (orange). Curves appear to be similar across a variety of application domains. Why this distribution?
  13. 13. So, how many parameters are involved in really tricky faults? • The interaction rule: most failures are triggered by one or two parameters, and progressively fewer by three, four, or more parameters, and the maximum interaction degree is small. • The maximum interaction degree observed for fault triggering was 6 • More empirical work needed • Reasonable evidence that maximum interaction strength for fault triggering is relatively small
  14. 14. How does this knowledge help? If all faults are triggered by the interaction of t or fewer variables, then testing all t-way combinations can provide strong assurance. (taking into account: value propagation issues, equivalence partitioning, timing issues, more complex interactions, . . . )
  15. 15. What is combinatorial testing? A simple example 10 on-off variables
  16. 16. How Many Tests Would It Take? There are 10 effects, each can be on or off. All combinations is 2^10 = 1,024 tests. What if our budget is too limited for these tests? Instead, let’s look at all 3-way interactions …
  17. 17. Now How Many Would It Take? There are C(10,3) = 120 3-way interactions. 3 binary variables = 2^3 = 8 values: 000, 001, 010… This is 120 x 2^3 = 960 possible combinations. Since we can pack 3 triples into each test, we need no more than 320 tests. BUT, each test exercises many triples: 0 1 1 0 0 0 0 1 1 0. We can pack a lot into one test, so what’s the smallest number of tests we need?
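The counting argument on this slide can be checked in a few lines of Python. This is only a sketch of the arithmetic. The example test row is the one shown on the slide, and the variable names are chosen here for illustration.

```python
from itertools import combinations
from math import comb

n, t, v = 10, 3, 2             # 10 variables, 3-way interactions, binary values

triples = comb(n, t)           # ways to choose 3 of the 10 variables
settings = v ** t              # value assignments per chosen triple
total = triples * settings     # 3-way combinations that must be covered

# The single test row from the slide covers one setting of every triple.
test = (0, 1, 1, 0, 0, 0, 0, 1, 1, 0)
covered = {
    (cols, tuple(test[c] for c in cols))
    for cols in combinations(range(n), t)
}

print(triples, settings, total, len(covered))  # 120 8 960 120
```

Because one test covers 120 of the 960 combinations, no test set can cover all triples in fewer than 960 / 120 = 8 tests. The interesting question is how close to that bound a real covering array can get.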
  18. 18. Covering Arrays. Ex: all triples in only 13 tests, covering C(10,3) x 2^3 = 960 combinations. One row per test, one column per parameter. For any 3 parameters, every combination occurs at least once in the array
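The defining property of a covering array (every t-way value combination occurs at least once for every choice of t columns) is easy to check mechanically. The sketch below is an illustrative checker, not the NIST tooling. The function name `uncovered` and the small 4-row example array are assumptions chosen so the result can be verified by hand.

```python
from itertools import combinations, product


def uncovered(tests, t, values):
    """Return the t-way (columns, values) combinations missing from tests."""
    n = len(tests[0])
    missing = []
    for cols in combinations(range(n), t):
        seen = {tuple(row[c] for c in cols) for row in tests}
        for combo in product(values, repeat=t):
            if combo not in seen:
                missing.append((cols, combo))
    return missing


# Four tests over three binary parameters: every pair of columns
# shows 00, 01, 10 and 11, so this is a 2-way covering array.
pairwise = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

print(uncovered(pairwise, 2, (0, 1)))       # []
print(len(uncovered(pairwise, 3, (0, 1))))  # 4
```

The same four rows cover only half of the eight 3-way combinations, which shows why a higher interaction strength needs a larger array.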
  19. 19. Cost and Volume of Tests • Number of tests: proportional to v^t log n, for v values, n variables, t-way interactions • Thus: • Tests increase exponentially with interaction strength t: BAD, but unavoidable • But logarithmically with the number of parameters: GOOD! • Example: suppose we want all 4-way combinations of n parameters, 5 values each: [Chart: number of tests (0 to 5000) vs. number of variables (10 to 50)]
  20. 20. Covering Arrays: key points • Not the same as orthogonal arrays • OAs: all combinations covered the same number of times • CAs: all combinations covered at least once • Developed in the 1990s as a theoretical construct • Extends the Design of Experiments concept • NP-hard problem, but good algorithms are available now. More on these issues in Part II of the tutorial
  21. 21. Two ways of using combinatorial testing: inputs or configurations (or both)
      1. Inputs
         parameter values => discretized, small number of values for each input
            account_balance: 0, 999, MAXINT, …
            state: “VA”, “NC”, “MD”, “PA”, …
         parameter properties => may have multiple properties per parameter
            state_sales_tax: 0, 3.0 to 5.0, 5.0 to 6.0, >6.0
            state_CONUS: Y or N
            state_distance: 0 to 100, 101 to 500, etc.
         => map combinations of properties to input values
         => often require constraints
      2. Configurations, such as max file size, number of input buffers, comm protocol
  22. 22. Combinations of inputs. Suppose we have a system with on-off switches, with switch settings as inputs to control software:
  23. 23. How do we test this? 34 switches = 2^34 = 1.7 x 10^10 possible inputs = 1.7 x 10^10 tests
  24. 24. What if we knew no failure involves more than 3 switch settings interacting? • 34 switches = 2^34 = 1.7 x 10^10 possible inputs = 1.7 x 10^10 tests • If only 3-way interactions, need only 33 tests • For 4-way interactions, need only 85 tests
  25. 25. Testing Configurations • Example: application must run on any configuration of OS, browser, protocol, CPU, and DBMS
  26. 26. Testing Configurations • Very effective for interoperability testing, being used by NIST for DoD Android phone testing
  27. 27. Configurations to Test
      Degree of interaction coverage: 2
      Number of parameters: 5
      Maximum number of values per parameter: 3
      Number of configurations: 10
      -------------------------------------
      Configuration #1: 1 = OS=XP, 2 = Browser=IE, 3 = Protocol=IPv4, 4 = CPU=Intel, 5 = DBMS=MySQL
      -------------------------------------
      Configuration #2: 1 = OS=XP, 2 = Browser=Firefox, 3 = Protocol=IPv6, 4 = CPU=AMD, 5 = DBMS=Sybase
      -------------------------------------
      Configuration #3: 1 = OS=XP, 2 = Browser=IE, 3 = Protocol=IPv6, 4 = CPU=Intel, 5 = DBMS=Oracle
      . . . etc.

      t    # Configs    % of Exhaustive
      2    10           14
      3    18           25
      4    36           50
      5    72           100
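A pairwise configuration set like the one above can be approximated with a simple greedy loop. The sketch below is not the algorithm used by the NIST tools. It is a naive illustration that repeatedly picks the configuration covering the most still-uncovered pairs. Only `XP` appears as an OS value in the slide, so the other two OS names are placeholders. The value counts (3 x 2 x 2 x 2 x 3 = 72) are inferred from the "72 = 100% of exhaustive" row.

```python
from itertools import combinations, product

params = {
    "OS": ["XP", "OS-B", "OS-C"],  # "OS-B"/"OS-C" are placeholder names
    "Browser": ["IE", "Firefox"],
    "Protocol": ["IPv4", "IPv6"],
    "CPU": ["Intel", "AMD"],
    "DBMS": ["MySQL", "Sybase", "Oracle"],
}
names = list(params)


def pairs_of(config):
    """All (parameter, value) pairs exercised together by one configuration."""
    return {((a, config[a]), (b, config[b])) for a, b in combinations(names, 2)}


# Every possible configuration (the exhaustive set).
candidates = [dict(zip(names, vals)) for vals in product(*params.values())]

# Every pair that must be covered at least once.
remaining = set().union(*(pairs_of(c) for c in candidates))

suite = []
while remaining:
    # Greedy choice: the configuration covering the most uncovered pairs.
    best = max(candidates, key=lambda c: len(pairs_of(c) & remaining))
    suite.append(best)
    remaining -= pairs_of(best)

print(len(candidates), "exhaustive configurations,", len(suite), "after greedy pairwise selection")
```

Scanning all candidates at each step only works because the exhaustive set here is tiny. Practical generators avoid enumerating it.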
  28. 28. The Test Oracle Problem• Creating test data is the easy part!• How do we check that the code worked correctly on the test input? • Crash testing server or other code to ensure it does not crash for any test input (like ‘fuzz testing’) - Easy but limited value • Built-in self test with embedded assertions – incorporate assertions in code to check critical states at different points in the code, or print out important values during execution • Full scale model-checking using mathematical model of system and model checker to generate expected results for each input - expensive but tractable
  29. 29. Available Tools • Covering array generator: basic tool for test input or configurations • Sequence covering array generator: new concept; applies combinatorial methods to event sequence testing • Combinatorial coverage measurement: detailed analysis of combination coverage; automated generation of supplemental tests; helpful for integrating combinatorial testing with existing test methods • Domain/application specific tools: • Access control policy tester • .NET config file generator
  30. 30. ACTS - Defining a new system
  31. 31. Variable interaction strength
  32. 32. Constraints
  33. 33. Covering array output
  34. 34. Output options
      Mappable values:
         Degree of interaction coverage: 2
         Number of parameters: 12
         Number of tests: 100
         -----------------------------
         0 0 0 0 0 0 0 0 0 0 0 0
         1 1 1 1 1 1 1 0 1 1 1 1
         2 0 1 0 1 0 2 0 2 2 1 0
         0 1 0 1 0 1 3 0 3 1 0 1
         1 1 0 0 0 1 0 0 4 2 1 0
         2 1 0 1 1 0 1 0 5 0 0 1
         0 1 1 1 0 1 2 0 6 0 0 0
         1 0 1 0 1 0 3 0 7 0 1 1
         2 0 1 1 0 1 0 0 8 1 0 0
         0 0 0 0 1 0 1 0 9 2 1 1
         1 1 0 0 1 0 2 1 0 1 0 1
         Etc.
      Human readable:
         Degree of interaction coverage: 2
         Number of parameters: 12
         Maximum number of values per parameter: 10
         Number of configurations: 100
         -----------------------------------
         Configuration #1:
         1 = Cur_Vertical_Sep=299
         2 = High_Confidence=true
         3 = Two_of_Three_Reports=true
         4 = Own_Tracked_Alt=1
         5 = Other_Tracked_Alt=1
         6 = Own_Tracked_Alt_Rate=600
         7 = Alt_Layer_Value=0
         8 = Up_Separation=0
         9 = Down_Separation=0
         10 = Other_RAC=NO_INTENT
         11 = Other_Capability=TCAS_CA
         12 = Climb_Inhibit=true
  35. 35. Algorithm comparison • Smaller test sets faster, with a more advanced user interface • First parallelized covering array algorithm • More information per test
      Traffic Collision Avoidance System (TCAS): 2^7 x 3^2 x 4^1 x 10^2. Times in seconds.

      T-Way | IPOG          | ITCH (IBM)    | Jenny (Open Source) | TConfig (U. of Ottawa) | TVG (Open Source)
            | Size   Time   | Size   Time   | Size    Time        | Size   Time            | Size      Time
      2     | 100    0.8    | 120    0.73   | 108     0.001       | 108    >1 hour         | 101       2.75
      3     | 400    0.36   | 2388   1020   | 413     0.71        | 472    >12 hour        | 9158      3.07
      4     | 1363   3.05   | 1484   5400   | 1536    3.54        | 1476   >21 hour        | 64696     127
      5     | 4226   18     | NA     >1 day | 4580    43.54       | NA     >1 day          | 313056    1549
      6     | 10941  65.03  | NA     >1 day | 11625   470         | NA     >1 day          | 1070048   12600
  36. 36. Configuration Testing
  37. 37. Testing Smartphone Configurations. Options: keyboard, screen orientation, screen size, number of screens, navigation method, and on, and on, and on . . . What if we built the app for this: … and we have to make sure it works on these
  38. 38. Testing Smartphone Configurations. Android configuration options:
      int HARDKEYBOARDHIDDEN_NO; int HARDKEYBOARDHIDDEN_UNDEFINED; int HARDKEYBOARDHIDDEN_YES;
      int KEYBOARDHIDDEN_NO; int KEYBOARDHIDDEN_UNDEFINED; int KEYBOARDHIDDEN_YES;
      int KEYBOARD_12KEY; int KEYBOARD_NOKEYS; int KEYBOARD_QWERTY; int KEYBOARD_UNDEFINED;
      int NAVIGATIONHIDDEN_NO; int NAVIGATIONHIDDEN_UNDEFINED; int NAVIGATIONHIDDEN_YES;
      int NAVIGATION_DPAD; int NAVIGATION_NONAV; int NAVIGATION_TRACKBALL; int NAVIGATION_UNDEFINED; int NAVIGATION_WHEEL;
      int ORIENTATION_LANDSCAPE; int ORIENTATION_PORTRAIT; int ORIENTATION_SQUARE; int ORIENTATION_UNDEFINED;
      int SCREENLAYOUT_LONG_MASK; int SCREENLAYOUT_LONG_NO; int SCREENLAYOUT_LONG_UNDEFINED; int SCREENLAYOUT_LONG_YES;
      int SCREENLAYOUT_SIZE_LARGE; int SCREENLAYOUT_SIZE_MASK; int SCREENLAYOUT_SIZE_NORMAL; int SCREENLAYOUT_SIZE_SMALL; int SCREENLAYOUT_SIZE_UNDEFINED;
      int TOUCHSCREEN_FINGER; int TOUCHSCREEN_NOTOUCH; int TOUCHSCREEN_STYLUS; int TOUCHSCREEN_UNDEFINED;
  39. 39. Configuration option values

      Parameter Name        Values                                      # Values
      HARDKEYBOARDHIDDEN    NO, UNDEFINED, YES                          3
      KEYBOARDHIDDEN        NO, UNDEFINED, YES                          3
      KEYBOARD              12KEY, NOKEYS, QWERTY, UNDEFINED            4
      NAVIGATIONHIDDEN      NO, UNDEFINED, YES                          3
      NAVIGATION            DPAD, NONAV, TRACKBALL, UNDEFINED, WHEEL    5
      ORIENTATION           LANDSCAPE, PORTRAIT, SQUARE, UNDEFINED      4
      SCREENLAYOUT_LONG     MASK, NO, UNDEFINED, YES                    4
      SCREENLAYOUT_SIZE     LARGE, MASK, NORMAL, SMALL, UNDEFINED       5
      TOUCHSCREEN           FINGER, NOTOUCH, STYLUS, UNDEFINED          4

      Total possible configurations: 3 x 3 x 4 x 3 x 5 x 4 x 4 x 5 x 4 = 172,800
  40. 40. Number of configurations generated

      t    # Configs    % of Exhaustive
      2    29           0.02
      3    137          0.08
      4    625          0.4
      5    2532         1.5
      6    9168         5.3
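The exhaustive count and the "% of Exhaustive" column can be reproduced from the per-parameter value counts. This is a small arithmetic sketch. The dictionary layout is chosen here for illustration, and the configuration counts per t are copied from the slide rather than generated.

```python
from math import prod

# Number of values per Android configuration parameter, from the slide.
value_counts = {
    "HARDKEYBOARDHIDDEN": 3,
    "KEYBOARDHIDDEN": 3,
    "KEYBOARD": 4,
    "NAVIGATIONHIDDEN": 3,
    "NAVIGATION": 5,
    "ORIENTATION": 4,
    "SCREENLAYOUT_LONG": 4,
    "SCREENLAYOUT_SIZE": 5,
    "TOUCHSCREEN": 4,
}

exhaustive = prod(value_counts.values())
print(exhaustive)  # 172800

# Covering array sizes reported on the slide for t = 2..6.
generated = {2: 29, 3: 137, 4: 625, 5: 2532, 6: 9168}
for t, configs in generated.items():
    share = 100 * configs / exhaustive
    print(f"t={t}: {configs} configs, {share:.2f}% of exhaustive")
```

Even the 6-way set stays close to 5% of the exhaustive configuration space.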
  41. 41. Configuration combination demo. Parameter Name: Values. HARDKEYBOARDHIDDEN: NO, UNDEFINED, YES; KEYBOARDHIDDEN: NO, UNDEFINED, YES; KEYBOARD: 12KEY, NOKEYS, QWERTY, UNDEFINED; NAVIGATIONHIDDEN: NO, UNDEFINED, YES; NAVIGATION: DPAD, NONAV, TRACKBALL, UNDEFINED, WHEEL; ORIENTATION: LANDSCAPE, PORTRAIT, SQUARE, UNDEFINED; SCREENLAYOUT_LONG: MASK, NO, UNDEFINED, YES; SCREENLAYOUT_SIZE: LARGE, MASK, NORMAL, SMALL, UNDEFINED; TOUCHSCREEN: FINGER, NOTOUCH, STYLUS, UNDEFINED
  42. 42. Configuration combination demo
  43. 43. Configuration combination demo
  44. 44. Configuration combination demo
  45. 45. Configuration combination demo
  46. 46. Configuration combination demo
  47. 47. Input Parameter Testing
  48. 48. Example: Input Testing for Pizza Orders. 6 x 2^17 x 2^17 x 2^17 x 4 x 3 x 2 x 2 x 5 x 2 = WAY TOO MUCH TO TEST. Simplified pizza ordering: 6 x 4 x 4 x 4 x 4 x 3 x 2 x 2 x 5 x 2 = 184,320 possibilities
  49. 49. Ordering Pizza Combinatorially. Simplified pizza ordering: 6 x 4 x 4 x 4 x 4 x 3 x 2 x 2 x 5 x 2 = 184,320 possibilities

      2-way tests: 32
      3-way tests: 150
      4-way tests: 570
      5-way tests: 2,413
      6-way tests: 8,330

      If all failures involve 5 or fewer parameters, then we can have confidence after running all 5-way tests.
  50. 50. Case study 1: Traffic Collision Avoidance System (TCAS) module • Used in previous testing research • 41 versions seeded with errors • 12 variables: 7 boolean, two 3-value, one 4-value, two 10-value • All flaws found with 5-way coverage • Thousands of tests, generated by model checker in a few minutes
  51. 51. Tests generated

      t        Test cases
      2-way    156
      3-way    461
      4-way    1,450
      5-way    4,309
      6-way    11,094

      [Chart: number of tests (0 to 12000) for 2-way through 6-way]
  52. 52. Results • Roughly consistent with data on large systems • But errors harder to detect than real-world examples [Charts: detection rate for TCAS seeded errors (0% to 100%) vs. fault interaction level, 2-way to 6-way; tests per error (0 to 350) vs. fault interaction level, 2-way to 6-way] Bottom line for model checking based combinatorial testing: expensive but can be highly effective
  53. 53. Case study 2: Document Object Model Events • DOM is a World Wide Web Consortium standard incorporated into web browsers • NIST Systems and Software division develops tests for standards such as DOM • DOM testing problem: • large number of events handled by separate functions • functions have 3 to 15 parameters • parameters have many, often continuous, values • verification requires human interaction (viewing screen) • testing takes a long time
  54. 54. Document Object Model Events. Exhaustive testing of equivalence class values:

      Event Name                     Param.   Tests
      Abort                          3        12
      Blur                           5        24
      Click                          15       4352
      Change                         3        12
      dblClick                       15       4352
      DOMActivate                    5        24
      DOMAttrModified                8        16
      DOMCharacterDataModified       8        64
      DOMElementNameChanged          6        8
      DOMFocusIn                     5        24
      DOMFocusOut                    5        24
      DOMNodeInserted                8        128
      DOMNodeInsertedIntoDocument    8        128
      DOMNodeRemoved                 8        128
      DOMNodeRemovedFromDocument     8        128
      DOMSubTreeModified             8        64
      Error                          3        12
      Focus                          5        24
      KeyDown                        1        17
      KeyUp                          1        17
      Load                           3        24
      MouseDown                      15       4352
      MouseMove                      15       4352
      MouseOut                       15       4352
      MouseOver                      15       4352
      MouseUp                        15       4352
      MouseWheel                     14       1024
      Reset                          3        12
      Resize                         5        48
      Scroll                         5        48
      Select                         3        12
      Submit                         3        12
      TextInput                      5        8
      Unload                         3        24
      Wheel                          15       4096
      Total Tests                             36626
  55. 55. World Wide Web Consortium Document Object Model Events: Test Results

      t    Tests    % of Orig.    Pass    Fail    Not Run
      2    702      1.92%         202     27      473
      3    1342     3.67%         786     27      529
      4    1818     4.96%         437     72      1309
      5    2742     7.49%         908     72      1762
      6    4227     11.54%        1803    72      2352

      All failures found using < 5% of original exhaustive test set
  56. 56. Buffer Overflows • Empirical data from the National Vulnerability Database • Investigated > 3,000 denial-of-service vulnerabilities reported in the NIST NVD for period of 10/06 to 3/07 • Vulnerabilities triggered by:
      • Single variable: 94.7%. Example: Heap-based buffer overflow in the SFTP protocol handler for Panic Transmit … allows remote attackers to execute arbitrary code via a long ftps:// URL.
      • 2-way interaction: 4.9%. Example: single character search string in conjunction with a single character replacement string, which causes an "off by one overflow"
      • 3-way interaction: 0.4%. Example: Directory traversal vulnerability when register_globals is enabled and magic_quotes is disabled and .. (dot dot) in the page parameter
  57. 57. Example: Finding Buffer Overflows

        if (strcmp(conn[sid].dat->in_RequestMethod, "POST")==0) {
            if (conn[sid].dat->in_ContentLength<MAX_POSTSIZE) {
                ……
                conn[sid].PostData=calloc(conn[sid].dat->in_ContentLength+1024,sizeof(char));
                ……
                pPostData=conn[sid].PostData;
                do {
                    rc=recv(conn[sid].socket, pPostData, 1024, 0);
                    ……
                    pPostData+=rc;
                    x+=rc;
                } while ((rc==1024)||(x<conn[sid].dat->in_ContentLength));
                conn[sid].PostData[conn[sid].dat->in_ContentLength]=0;
            }

  58. 58. Interaction: request-method=”POST”, content-length = -1000, data = a string > 24 bytes (same code as slide 57)
  59. 59. Interaction, step 1: request-method is ”POST”, so the strcmp test on in_RequestMethod takes the true branch
  60. 60. Interaction, step 2: content-length = -1000 is less than MAX_POSTSIZE, so the in_ContentLength test also takes the true branch
  61. 61. Interaction, step 3: calloc is called with in_ContentLength+1024, so it allocates -1000 + 1024 bytes = 24 bytes
  62. 62. Interaction, step 4: rc=recv(conn[sid].socket, pPostData, 1024, 0) can write up to 1024 bytes into the 24-byte buffer when the data is a string > 24 bytes. Boom!
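The 3-way nature of this fault can be illustrated with a small Python model of the size arithmetic. This is a simulation of the logic described on the slides, not the original server code. It models only the first `recv` call, and `MAX_POSTSIZE = 65536` is an assumed value, since the slides do not give it.

```python
MAX_POSTSIZE = 65536  # assumed limit; the real value is not shown on the slides
CHUNK = 1024          # bytes requested by each recv call in the excerpt


def first_read_overflows(method: str, content_length: int, data_len: int) -> bool:
    """Model whether the first recv can write past the allocated buffer."""
    if method != "POST":
        return False                      # outer branch not taken
    if not content_length < MAX_POSTSIZE:
        return False                      # inner branch not taken
    allocated = content_length + 1024     # size passed to calloc
    first_read = min(CHUNK, data_len)     # recv writes at most 1024 bytes
    return first_read > allocated


print(first_read_overflows("POST", -1000, 100))  # True: 100 bytes into a 24-byte buffer
print(first_read_overflows("GET", -1000, 100))   # False: method differs
print(first_read_overflows("POST", 500, 100))    # False: buffer is large enough
print(first_read_overflows("POST", -1000, 10))   # False: data fits in 24 bytes
```

Changing any one of the three inputs removes the overflow in this model, which is what makes it a 3-way interaction.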
  63. 63. Case study 3: Modeling & Simulation• “Simured” network simulator • Kernel of ~ 5,000 lines of C++ (not including GUI)• Objective: detect configurations that can produce deadlock: • Prevent connectivity loss when changing network • Attacks that could lock up network• Compare effectiveness of random vs. combinatorial inputs• Deadlock combinations discovered• Crashes in >6% of tests w/ valid values (Win32 version only)
  64. 64. Simulation Input Parameters Parameter Values 5x3x4x4x4x4x2x21 DIMENSIONS 1,2,4,6,8 x2x4x4x4x4x42 NODOSDIM 2,4,6 = 31,457,2803 NUMVIRT 1,2,3,8 configurations4 NUMVIRTINJ 1,2,3,85 NUMVIRTEJE 1,2,3,8 Are any of them6 LONBUFFER 1,2,4,6 dangerous?7 NUMDIR 1,28 FORWARDING 0,1 If so, how many?9 PHYSICAL true, false Which ones?10 ROUTING 0,1,2,311 DELFIFO 1,2,4,612 DELCROSS 1,2,4,613 DELCHANNEL 1,2,4,614 DELSWITCH 1,2,4,6
  65. 65. Network Deadlock Detection

      Deadlocks Detected: combinatorial
      t   Tests   500 pkts   1000 pkts   2000 pkts   4000 pkts   8000 pkts
      2   28      0          0           0           0           0
      3   161     2          3           2           3           3
      4   752     14         14          14          14          14

      Average Deadlocks Detected: random
      t   Tests   500 pkts   1000 pkts   2000 pkts   4000 pkts   8000 pkts
      2   28      0.63       0.25        0.75        0.50        0.75
      3   161     3          3           3           3           3
      4   752     10.13      11.75       10.38       13          13.25
  66. 66. Network Deadlock Detection
      Detected 14 configurations that can cause deadlock: 14 / 31,457,280 ≈ 4.45 x 10^-7
      Combinatorial testing found more deadlocks than random, including some that might never have been found with random testing
      Why do this testing? Risks:
      • accidental deadlock configuration: low
      • deadlock config discovered by attacker: much higher (because they are looking for it)
  67. 67. Break
  68. 68. Advanced Topics in Combinatorial Methods for Testing
  69. 69. Solutions to the oracle problem
  70. 70. How to automate checking correctness of output
      • Creating test data is the easy part!
      • How do we check that the code worked correctly on the test input?
        • Crash testing: run the server or other code to ensure it does not crash for any test input (like ‘fuzz testing’). Easy but limited value
        • Built-in self test with embedded assertions: incorporate assertions in code to check critical states at different points in the code, or print out important values during execution
        • Full-scale model checking: use a mathematical model of the system and a model checker to generate expected results for each input. Expensive but tractable
  71. 71. Crash Testing
      • Like “fuzz testing”: send packets or other input to application, watch for crashes
      • Unlike fuzz testing, input is non-random; cover all t-way combinations
      • May be more efficient: random input generation requires several times as many tests to cover the t-way combinations in a covering array
      Limited utility, but can detect high-risk problems such as:
        - buffer overflows
        - server crashes
  72. 72. Built-in Self Test through Embedded Assertions
      Simple example:
          assert( x != 0); // ensure divisor is not zero
      Or pre- and post-conditions (JML):
          //@ requires amount >= 0;
          //@ ensures balance == \old(balance) - amount && \result == balance;
  73. 73. Built-in Self Test
      Assertions check properties of expected result:
          //@ ensures balance == \old(balance) - amount && \result == balance;
      • Reasonable assurance that code works correctly across the range of expected inputs
      • May identify problems with handling unanticipated inputs
      • Example: Smart card testing
        • Used Java Modeling Language (JML) assertions
        • Detected 80% to 90% of flaws
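      The same pre/post-condition idea can be sketched outside JML. The Python below is an illustration only: the `Account` class and `withdraw` method are hypothetical names chosen to mirror the `balance`/`amount` contract on the slide, and plain `assert` statements stand in for JML's `requires` and `ensures`.

```python
class Account:
    """Hypothetical account type used only to illustrate embedded assertions."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def withdraw(self, amount: int) -> int:
        # Precondition (JML: requires amount >= 0)
        assert amount >= 0, "precondition violated: amount must be >= 0"
        old_balance = self.balance  # plays the role of \old(balance)

        self.balance -= amount
        result = self.balance

        # Postcondition (JML: ensures balance == \old(balance) - amount
        #                             && \result == balance)
        assert self.balance == old_balance - amount and result == self.balance
        return result


# The assertions act as the test oracle: any generated input can be run
# without a hand-computed expected output.
for start in (0, 100):
    for amount in (0, 1, 50):
        Account(start).withdraw(amount)
```

      Note that Python drops `assert` statements when run with `-O`, so a test run that relies on them should not use that flag.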
  74. 74. Using model checking to produce tests
      “The system can never get in this state!”
      “Yes it can, and here’s how …”
      Model-checker test production: if assertion is not true, then a counterexample is generated. This can be converted to a test case.
      Black & Ammann, 1999
  75. 75. Model checking example

      -- specification for a portion of tcas - altitude separation.
      -- The corresponding C code is originally from Siemens Corp.
      MODULE main
      VAR
        Cur_Vertical_Sep : { 299, 300, 601 };
        High_Confidence : boolean;
      ...
      init(alt_sep) := START_;
      next(alt_sep) := case
        enabled & (intent_not_known | !tcas_equipped) : case
          need_upward_RA & need_downward_RA : UNRESOLVED;
          need_upward_RA : UPWARD_RA;
          need_downward_RA : DOWNWARD_RA;
          1 : UNRESOLVED;
        esac;
        1 : UNRESOLVED;
      esac;
      ...
      SPEC AG ((enabled & (intent_not_known | !tcas_equipped) &
                !need_downward_RA & need_upward_RA) -> AX (alt_sep = UPWARD_RA))
      -- "FOR ALL executions,
      --  IF enabled & (intent_not_known ....
      --  THEN in the next state alt_sep = UPWARD_RA"
  76. 76. Computation Tree Logic
      The usual logic operators, plus temporal:
        A φ - All: φ holds on all paths starting from the current state.
        E φ - Exists: φ holds on some paths starting from the current state.
        G φ - Globally: φ has to hold on the entire subsequent path.
        F φ - Finally: φ eventually has to hold
        X φ - Next: φ has to hold at the next state
        [others not listed]
      A and E quantify over execution paths; G, F, and X refer to states on the execution paths.

      SPEC AG ((enabled & (intent_not_known | !tcas_equipped) &
                !need_downward_RA & need_upward_RA) -> AX (alt_sep = UPWARD_RA))
      “FOR ALL executions, IF enabled & (intent_not_known .... THEN in the next state alt_sep = UPWARD_RA”
  77. 77. What is the most effective way to integrate combinatorial testing with model checking?
      • Given AG(P -> AX(R)): “for all paths, in every state, if P then in the next state, R holds”
      • For k-way variable combinations, v1 & v2 & ... & vk
      • vi abbreviates “var_i = val_i”
      • Now combine this constraint with assertion to produce counterexamples. Some possibilities:
        1. AG(v1 & v2 & ... & vk & P -> AX !(R))
        2. AG(v1 & v2 & ... & vk -> AX !(1))
        3. AG(v1 & v2 & ... & vk -> AX !(R))
  78. 78. What happens with these assertions?
      1. AG(v1 & v2 & ... & vk & P -> AX !(R))
         P may have a negation of one of the vi, so we get 0 -> AX !(R), which is always true, so no counterexample, no test. This is too restrictive!
      2. AG(v1 & v2 & ... & vk -> AX !(1))
         The model checker makes non-deterministic choices for variables not in v1..vk, so all R values may not be covered by a counterexample. This is too loose!
      3. AG(v1 & v2 & ... & vk -> AX !(R))
         Forces production of a counterexample for each R. This is just right!
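      One way to picture the "just right" form is as a generator that emits one negated assertion per k-way value combination and per result R. The Python sketch below only builds the SPEC strings; it is not taken from the NIST tooling. The function name `negated_specs`, the variable domains, and the 0/1 encoding of booleans are assumptions chosen to resemble the TCAS example.

```python
from itertools import combinations, product

def negated_specs(domains, k, results):
    """Build 'AG(v1 & ... & vk -> AX !(R))' strings for every k-way
    combination of variable values and every result expression R."""
    specs = []
    for names in combinations(sorted(domains), k):
        for values in product(*(domains[name] for name in names)):
            antecedent = " & ".join(
                f"{name} = {value}" for name, value in zip(names, values)
            )
            for r in results:
                specs.append(f"SPEC AG(({antecedent}) -> AX !({r}))")
    return specs

# Hypothetical input model: three boolean inputs and three possible outcomes.
domains = {
    "enabled": ["0", "1"],
    "need_upward_RA": ["0", "1"],
    "need_downward_RA": ["0", "1"],
}
results = [
    "alt_sep = UPWARD_RA",
    "alt_sep = DOWNWARD_RA",
    "alt_sep = UNRESOLVED",
]

specs = negated_specs(domains, k=2, results=results)
print(len(specs))   # 3 pairs x 4 value settings x 3 results = 36
print(specs[0])
```

      Each counterexample the model checker returns for one of these claims is a trace in which the given combination occurs and R holds in the next state, which is what gets converted into a test with its expected result.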
  79. 79. All faults found with 5-way tests
      [Figure: Detection Rate for TCAS Seeded Errors; y-axis: detection rate (0% to 100%); x-axis: fault interaction level (2 way to 6 way)]
      Tradeoff of up-front spec creation time vs. only a few minutes to run tests
  80. 80. Coverage Measurement
  81. 81. Combinatorial Coverage Measurement

      Tests   Variables
              a  b  c  d
      1       0  0  0  0
      2       0  1  1  0
      3       1  0  0  1
      4       0  1  1  1

      Variable pairs   Variable-value combinations covered   Coverage
      ab               00, 01, 10                            .75
      ac               00, 01, 10                            .75
      ad               00, 01, 11                            .75
      bc               00, 11                                .50
      bd               00, 01, 10, 11                        1.0
      cd               00, 01, 10, 11                        1.0
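      The coverage column can be recomputed directly from the four tests. This is a minimal sketch for 2-way coverage of binary variables; `pair_coverage` is an illustrative helper, not part of any NIST tool.

```python
from itertools import combinations

# The four tests from the slide, over binary variables a, b, c, d.
tests = [
    (0, 0, 0, 0),
    (0, 1, 1, 0),
    (1, 0, 0, 1),
    (0, 1, 1, 1),
]
names = "abcd"

def pair_coverage(tests, names, num_values=2):
    """Fraction of the num_values**2 value settings covered for each variable pair."""
    coverage = {}
    for i, j in combinations(range(len(names)), 2):
        seen = {(t[i], t[j]) for t in tests}  # distinct value pairs observed
        coverage[names[i] + names[j]] = len(seen) / num_values ** 2
    return coverage

coverage = pair_coverage(tests, names)
for pair, fraction in coverage.items():
    print(pair, fraction)
```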
  82. 82. Graphing Coverage Measurement
      • 100% coverage of 33% of combinations
      • 75% coverage of half of combinations
      • 50% coverage of 16% of combinations
      Bottom line: All combinations covered to at least 50%
  83. 83. Adding a test
      Coverage after adding test [1,1,0,1]
  84. 84. Adding another test
      Coverage after adding test [1,0,1,1]
  85. 85. Additional test completes coverage
      Coverage after adding test [1,0,1,0]
      All combinations covered to 100% level, so this is a covering array.
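      The effect of the three added tests can be traced by recomputing the lowest pair coverage after each addition. A minimal sketch, starting from the four tests on the Combinatorial Coverage Measurement slide (`min_pair_coverage` is an illustrative helper name):

```python
from itertools import combinations

def min_pair_coverage(tests, num_values=2):
    """Lowest 2-way coverage over all variable pairs in the test suite."""
    n = len(tests[0])
    return min(
        len({(t[i], t[j]) for t in tests}) / num_values ** 2
        for i, j in combinations(range(n), 2)
    )

suite = [(0, 0, 0, 0), (0, 1, 1, 0), (1, 0, 0, 1), (0, 1, 1, 1)]
added = [(1, 1, 0, 1), (1, 0, 1, 1), (1, 0, 1, 0)]

progress = [min_pair_coverage(suite)]
for test in added:
    suite.append(test)
    progress.append(min_pair_coverage(suite))

print(progress)  # [0.5, 0.75, 0.75, 1.0]
```

      A minimum of 1.0 means every pair of variables takes all four value settings somewhere in the suite, which is the definition of a 2-way covering array.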
  86. 86. Measurement of satellite tests
  87. 87. Extending an existing test suite automatically
  88. 88. Coverage to at least 70% selected
  89. 89. New coverage measured
  90. 90. Sequences
  91. 91. Combinatorial Sequence Testing
      • We want to see if a system works correctly regardless of the order of events. How can this be done efficiently?
      • Failure reports often say something like: failure occurred when A started if B is not already connected.
      • Can we produce compact tests such that all t-way sequences are covered (possibly with interleaving events)?

      Event   Description
      a       connect flow meter
      b       connect pressure gauge
      c       connect satellite link
      d       connect pressure readout
      e       start comm link
      f       boot system
  92. 92. Sequence Covering Array
      • With 6 events, all sequences = 6! = 720 tests
      • Only 10 tests needed for all 3-way sequences, results even better for larger numbers of events
      • Example: .*c.*f.*b.* covered. Any such 3-way seq covered.

      Test   Sequence
      1      a b c d e f
      2      f e d c b a
      3      d e f a b c
      4      c b a f e d
      5      b f a d c e
      6      e c d a f b
      7      a e f c b d
      8      d b c f e a
      9      c e a d b f
      10     f b d a e c
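      The claim that these 10 tests cover every 3-way sequence can be checked by brute force over all 6 x 5 x 4 = 120 ordered triples of events. A minimal sketch; the helper names `covers` and `uncovered` are illustrative:

```python
from itertools import permutations

# The 10 tests from the slide, one string per test, one letter per event.
tests = [
    "abcdef", "fedcba", "defabc", "cbafed", "bfadce",
    "ecdafb", "aefcbd", "dbcfea", "ceadbf", "fbdaec",
]

def covers(test, seq):
    """True if the events in seq occur in test in that order (gaps allowed)."""
    positions = [test.index(event) for event in seq]
    return positions == sorted(positions)

def uncovered(tests, events, t):
    """All t-way event sequences that no test covers."""
    return [
        seq for seq in permutations(events, t)
        if not any(covers(test, seq) for test in tests)
    ]

missing = uncovered(tests, "abcdef", 3)
print(len(missing))                # 0: every 3-way sequence is covered
print(covers(tests[5], "cfb"))     # True: test 6 (e c d a f b) matches .*c.*f.*b.*
```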
  93. 93. Sequence Covering Array Properties
      • 2-way sequences require only 2 tests (write events in any order, then reverse)
      • For > 2-way, number of tests grows with log n, for n events
      • Simple greedy algorithm produces compact test set
      [Figure: number of tests (0 to 300) vs. number of events (5 to 80), with curves for 2-way, 3-way, and 4-way sequences]
  94. 94. Case study 4: Laptop application
      Problem: connect many peripherals, order of connection may affect application
  95. 95. Connection Sequences

      1   Boot   P-1 (USB-RIGHT)   P-2 (USB-BACK)   P-3 (USB-LEFT)   P-4   P-5   App   Scan
      2   Boot   App   Scan   P-5   P-4   P-3 (USB-RIGHT)   P-2 (USB-BACK)   P-1 (USB-LEFT)
      3   Boot   P-3 (USB-RIGHT)   P-2 (USB-LEFT)   P-1 (USB-BACK)   App   Scan   P-5   P-4
      etc...

      3-way sequence covering of connection events
  96. 96. Results
      • Tested peripheral connection for 3-way sequences
      • Some faults detected that would not have been found with 2-way sequence testing; may not have been found with random testing
      • Example:
        • If the P2-P1-P3 sequence triggers a failure, then a full 2-way sequence covering array would not have found it (because 1-2-3-4-5-6-7 and 7-6-5-4-3-2-1 is a 2-way sequence covering array)
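      The example can be checked mechanically: the forward order plus its reverse covers every ordered pair of events, yet neither test contains event 2 before 1 before 3. A minimal sketch, using integers 1 to 7 as stand-ins for the connection events (`covers` is an illustrative helper):

```python
from itertools import permutations

def covers(test, seq):
    """True if the events in seq occur in test in that order (gaps allowed)."""
    positions = [test.index(event) for event in seq]
    return positions == sorted(positions)

forward = [1, 2, 3, 4, 5, 6, 7]
two_tests = [forward, forward[::-1]]   # 1-2-...-7 and 7-6-...-1

# Every ordered pair (x, y) appears in one of the two tests.
all_pairs_covered = all(
    any(covers(test, pair) for test in two_tests)
    for pair in permutations(forward, 2)
)

# The 3-way sequence 2-1-3 appears in neither test.
p2_p1_p3_covered = any(covers(test, (2, 1, 3)) for test in two_tests)

print(all_pairs_covered)   # True
print(p2_p1_p3_covered)    # False
```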
  97. 97. Research Questions
  98. 98. Fault location
      Given: a set of tests that the SUT fails, which combinations of variables/values triggered the failure?
      [Figure: overlapping sets of variable/value combinations in passing tests and in failing tests; the combinations that appear only in failing tests are the ones we want]
  99. 99. Fault location: what's the problem?
      If they're in the failing set but not in the passing set:
        1. which ones triggered the failure?
        2. which ones don't matter?
      Each test contains $\binom{n}{t}$ combinations, out of $v^t \binom{n}{t}$ possible t-way combinations
      Example: 30 variables, 5 values each
        = 445,331,250 5-way combinations
        142,506 combinations in each test
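      Both figures in the example follow from the binomial coefficient: a test fixes one value per variable, so it contains one combination for each choice of t variables, and each such choice has v^t possible value settings. A short sketch of the arithmetic (variable names are illustrative):

```python
from math import comb

n, v, t = 30, 5, 5   # variables, values per variable, interaction strength

combinations_per_test = comb(n, t)                  # choices of t variables
total_combinations = combinations_per_test * v**t   # times v^t value settings

print(f"{combinations_per_test:,}")   # 142,506
print(f"{total_combinations:,}")      # 445,331,250
```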
  100. 100. Integrating into Testing Program
      • Test suite development
        • Generate covering arrays for tests
          OR
        • Measure coverage of existing tests and supplement
      • Training
        • Testing textbooks: Mathur, Ammann & Offutt
        • Combinatorial testing “textbook” on ACTS site
        • User manuals
        • Worked examples
  101. 101. Industrial Usage Reports
      • Work with US Air Force on sequence covering arrays, submitted for publication
      • WWW Consortium DOM Level 3 events conformance test suite
      • Cooperative Research & Development Agreement with Lockheed Martin Aerospace: report to be released 3rd or 4th quarter 2011
      • New projects with NASA IV&V facility
  102. 102. Technology Transfer
      Tools obtained by 800+ organizations; NIST “textbook” on combinatorial testing downloaded 11,000+ times since Oct. 2010
      Collaborations: USAF 46th Test Wing, Lockheed Martin, George Mason Univ., Univ. of Maryland Baltimore County, Johns Hopkins Univ. Applied Physics Lab, Carnegie Mellon Univ., NASA IV&V
      We are always interested in working with others!
  103. 103. Please contact us if you would like more information.
      Rick Kuhn: kuhn@nist.gov
      Raghu Kacker: raghu.kacker@nist.gov
      http://csrc.nist.gov/acts