Hexawise Software Test Design Tool - "Vendor Meets User" at CAST Software Testing Conference 2011 - with speaker notes



This presentation was given by Justin Hunter, Lanette Creamer, and Ajay Balamurugadas at CAST 2011. It focuses on pairwise testing as well as other, more sophisticated Design of Experiments-based software test design methods.

The description of the presentation on the CAST site is:

"Vendor Meets User: The Hexawise Test Design Tool and a Tester who Tried to Use It in Real Life

Justin Hunter, Lanette Creamer, and Ajay Balamurugadas

Dr. William G. Hunter helped manufacturers create small numbers of prototypes that were each carefully designed to reveal as much actionable information as possible. He did this using Design of Experiments methods that he taught as a professor of Applied Statistics. Five years ago, while working at Accenture, Hunter’s son Justin began to apply some of these Design of Experiments-based methods to the software testing field. After seeing promising results from 17 pilot projects he helped manage at Accenture, Justin created Hexawise, a software test design tool that generates tests using Design of Experiments-based methods.

Justin will introduce the tool. But this is not the typical vendor talk. Testers Lanette Creamer and Ajay Balamurugadas each recently used Hexawise for the first time on a real project. They will share their experiences, covering both where it helped and where they experienced limitations of the tool and the test design technique.

Justin Hunter, Founder and CEO of Hexawise, is a test design specialist who has enjoyed teaching testers on six continents how to improve the efficiency and effectiveness of their test case selection approaches. The improbably circuitous career path that led him into the software testing field included working as a securities lawyer based in London and launching Asia’s first internet-based stock brokerage firm. The Hexawise test design tool is a web-based tool that is available for free to teams of 5 or fewer testers, as well as to non-profit organizations.

Lanette Creamer: After 10 years at Adobe, including working as a Quality Lead testing across the Creative Suites, Lanette is now a Senior Consultant with Sogeti. She is currently working as a Test Lead at Starbucks. Lanette has been evangelizing test collaboration and promoting advancement in human test ideas for the past 5 years. With a deep passion for collaboration as a way to increase test coverage, she believes it is a powerful solution when facing complex technical challenges. Lanette has presented at PNSQC, Better Software/Agile Development Practices, Writing About Testing, and STPCon in 2010. She’ll be participating at CAST 2011 in her home city of Seattle. She actively participates in the testing community and has written two technical papers and a published article on testing in ST&P Mag January 2010 (now ST&QA magazine)."

1 Comment
  • Special offer to CAST attendees: Anyone who attended CAST 2011 is welcome to sign up for a free Hexawise account at https://app.hexawise.com/signup-1-5

    Hexawise is a web-based software test design tool. You enter your test inputs on the Define Inputs screen, click a Create Tests icon, and Hexawise generates a set of powerful software tests for you in seconds. You can then adjust your coverage strength (all pairs, all triples, risk-based testing coverage, etc.) to suit your preferences.


  1. Vendor Meets User: The Hexawise Test Design Tool and Two Testers Who Tried to Use It in Real Life
     Presented at CAST 2011 by Justin Hunter, Lanette Creamer, and Ajay Balamurugadas
  2. Topics: Introduction to Hexawise, “Inside the Mind of the Vendor” (Justin Hunter); Experiences, “Inside the Minds of Testers” (Lanette Creamer & Ajay Balamurugadas)
     Hexawise is a test case design tool used by testers to design their tests. In a context where test scripts are used, Hexawise can design detailed test scripts. In an Exploratory Testing context, Hexawise is used to generate “test ideas” that encourage the tester to explore, and even to design mini tests on the fly.
  3. My Dad: William G. Hunter
     Why I created Hexawise has a whole lot to do with who my father was. He was a leading applied statistician who specialized in how to make experiments more efficient and effective.
  4. 1960’s
     In the 1960’s my dad brought my family to Singapore, where he taught at a university and worked with local companies.
  5. 1970’s
     In the 1970’s my dad brought my family to Nigeria where, again, he taught at a university and worked with local companies. That’s me in the lower right-hand corner. We were visiting a factory my father was helping.
  6. 1980’s
     In the 1980’s he did something that his colleagues thought was pretty crazy, because they thought his expertise and lessons learned probably wouldn’t transfer to the government sector. While a professor in Madison, Wisconsin, he started collaborating with local and state government agencies.
  7. Design of Experiments
     Why did he uproot our family every few years and then start collaborating with government agencies? It was because he passionately believed that sharing his expertise in Design of Experiments (a specialized field of applied statistics) would really help people, by giving them skills that would improve both quality and productivity. This is the cover of a book he co-wrote with George Box that has helped to increase awareness of what Design of Experiments is and how practitioners should use it.
  8. Design of Experiments
     So what is Design of Experiments? It’s a specialized field of applied statistics that has been around since the 1930’s. Simply put, it is focused on answering this question.
  9. Design of Experiments
     Where is Design of Experiments used? It is used extensively in manufacturing, among other industries. If you’re a manufacturer trying to create a widget for a car part, for example, you don’t want to have to build 100,000 different prototypes of the widget before you stumble across a combination of heat, pressure, temperature, and ingredients that achieves the desired characteristics. You’d want to build a small handful of prototypes, with the variables going into each prototype carefully varied from prototype to prototype, so you learn as much as possible in as few experiments as possible. That’s what auto manufacturers regularly do.
  10. Design of Experiments
      Where else is Design of Experiments used? Design of Experiments-based methods have also been commonly used in agriculture for decades. If you’re Monsanto and you want to grow a hardier seed that will grow in colder temperatures and mature more quickly, you’re going to use Design of Experiments methods to identify the combinations of variables to test together in each test you execute.
  11. Design of Experiments
      Where else is Design of Experiments used? Many marketers also use Design of Experiments methods extensively. YouTube recently ran an experiment involving 1,024 different combinations of fonts, colors, messages, and button sizes and shapes to find an optimal combination that increased their sign-up rate by more than 15%. Jeff Fry, a tester at Microsoft, wrote a good article about this and posted a phenomenal video by a Design of Experiments expert who worked at Amazon before moving to Microsoft. A/B testing is a very simple, “watered down” DoE approach. Multivariate Testing (MVT) is “full-blown” DoE-based marketing.
  12. Design of Experiments: What About in Software Testing?
      This is the question I’ve been focused on for the last 5 years. Seventeen pilot projects I conducted at my prior company convinced me that DoE-based methods consistently deliver improvements in efficiency and effectiveness over teams’ business-as-usual test cases.
  13. What is Hexawise? Challenges Hexawise Addresses
      Problems During Test Design...   Impact Felt During Test Execution
      Manual Documentation             Delayed Start
      Largely Repetitive Tests         Inefficient
      Gaps in Coverage                 Missed Defects
      Hexawise was created to address these common testing challenges.
  14. Mortgage Application Example
      Let’s use this simple example to demonstrate how pairwise testing works. I’ve borrowed this idea from a presentation that Bernie Berger gave at StarEast. Imagine you’re testing a mortgage application that has several sets of details. This is an “executive summary” view of the different options that could be selected for an application. We could make this example more complicated by including hardware and software configuration options, user types, etc. We’re intentionally keeping it simple here.
  15. Mortgage Application Example
      If we examine just one of those three branches, we see that it has 27 possible test combinations associated with it. For example, 1 of the 27 possible tests would include: Income = Low & Credit Rating = Medium & Customer Status = VIP. There are 26 other similar combinations.
  16. Mortgage Application Example
      When we examine all three branches, we see they have equal complexity. Each of the three branches has 27 total possible combinations. How many total combinations are there? Hint: It is not 81.
  17. Which Tests Should You Choose? 27 × 27 × 27 = 19,683 Possible Tests
      There are almost 20,000 possible combinations to choose from.
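The arithmetic behind slides 15 to 17 can be checked in a few lines of Python. This is a sketch that assumes each branch holds three parameters with three values each (the mind map itself is only an image, but “27 combinations per branch” and the three-input example point to that shape). It also counts the value pairs a pairwise test set would have to cover.

```python
from itertools import combinations
from math import comb, prod

# Assumption: each of the three branches of the mind map holds 3 parameters with
# 3 values each, which matches "27 possible combinations" per branch (3 x 3 x 3).
BRANCHES = 3
PARAMS_PER_BRANCH = 3
VALUES_PER_PARAM = 3

levels = [VALUES_PER_PARAM] * (BRANCHES * PARAMS_PER_BRANCH)  # 9 parameters in total

per_branch = VALUES_PER_PARAM ** PARAMS_PER_BRANCH  # combinations within one branch
wrong_total = per_branch * BRANCHES                 # adding branches: the "81" trap
total_tests = prod(levels)                          # multiplying them: 27 * 27 * 27

# Pairwise target: every value of each parameter paired with every value of each other one.
total_pairs = sum(a * b for a, b in combinations(levels, 2))
pairs_per_test = comb(len(levels), 2)  # a full test holds one value pair per parameter pair
lower_bound = VALUES_PER_PARAM ** 2    # two parameters alone need 3 x 3 distinct tests

print(per_branch, wrong_total, total_tests)      # 27 81 19683
print(total_pairs, pairs_per_test, lower_bound)  # 324 36 9
```

Under this assumed model, a pairwise set has 324 pairs to cover, no single test can cover more than 36 of them, and no set can be smaller than 9 tests. The 17 tests on slide 21 sit between that floor and the 19,683-test full set.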
  18. Prioritization
      How many test inputs are needed to trigger defects in production? (Chart labels as extracted: 5%, 1; 11%, 2 (“pairwise”); 51%, 3; 33%, 4, 5, or 6)
      In order to prioritize which specific combinations should be selected as high-priority tests from those ~20,000 possible tests, it is extremely important to understand that the vast majority of defects in production can be triggered by just two test inputs tested together (e.g., a test that includes Income = Low as the first test input and also includes Credit Rating = High as the second test input). This fact has extremely important implications for software testers. Unfortunately, very few software testers are aware of (a) this fact, or (b) its implications. The implication for software testers is that small sets of tests that ensure every possible PAIR of values gets tested together in at least one test case are extremely efficient and effective at finding defects.
      Sources:
      • Medical Devices: D.R. Wallace, D.R. Kuhn, “Failure Modes in Medical Device Software: an Analysis of 15 Years of Recall Data,” International Journal of Reliability, Quality, and Safety Engineering, Vol. 8, No. 4, 2001.
      • Browser, Server: D.R. Kuhn, M.J. Reilly, “An Investigation of the Applicability of Design of Experiments to Software Testing,” 27th NASA/IEEE Software Engineering Workshop, NASA Goddard SFC, 4-6 December 2002.
      • NASA database: D.R. Kuhn, D.R. Wallace, A.J. Gallo, Jr., “Software Fault Interactions and Implications for Software Testing,” IEEE Trans. on Software Engineering, Vol. 30, No. 6, June 2004.
      • Network Security: K.Z. Bell, “Optimizing Effectiveness and Efficiency of Software Testing: a Hybrid Approach,” PhD Dissertation, North Carolina State University, 2006.
  19. Mortgage Application Example
      Select a couple of pairs of test inputs from this mind map. Possible pairs of inputs could include pairs like those shown below. Select your own two pairs, though.
      First example of a pair of values: Income = Low & Credit Rating = Medium
      Second example: Income = High & Customer Status = Employee
      Third example: Income = High & Credit Rating = Low
  20. These are the same test inputs that have been imported from the mind map into the Hexawise tool. When you click on the “Create Tests” icon at the top of the screen, you will see a pairwise testing solution. Every pair of values you might have selected will be included in a surprisingly small number of tests.
  21. Only 17 tests are required (out of 19,683 possible tests) to test every single possible pair of test conditions together in the same test case at least once.
      In other words, every single pair of test conditions will be tested together at least once. The pair of conditions we selected at random, Income = Low & Credit Rating = Medium, appears in test number 8. All the other pairs are also tested together at least once. If we have done a thorough job of identifying test inputs, the vast majority of defects will be triggered by these 17 tests out of almost 20,000 tests. This is the lesson from Design of Experiments that has been learned and applied in so many other industries since the 1930’s and is now being applied by an increasing number of software testers.
  22. This is the same set of 17 tests, shown again in case the speaker notes on the last slide covered a pair of values you were trying to confirm was tested together.
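Hexawise’s generation algorithm isn’t described in this deck, so what follows only illustrates the idea and is not its method: a minimal one-test-at-a-time greedy pairwise generator in Python. The model mirrors the mind map’s assumed shape (nine parameters, three values each). Apart from Income, Credit Rating, Customer Status, and the values named on the slides, every parameter and value is a hypothetical placeholder. Its test count will usually not match the slide’s 17 exactly, and greedy methods don’t guarantee the smallest possible set.

```python
from itertools import combinations, product

# Hypothetical 9-parameter model shaped like the mortgage mind map (3 values each).
# Income, Credit Rating, and Customer Status appear on the slides; "Medium" income,
# "Regular" status, and Param 4..9 are placeholders, not the deck's actual inputs.
PARAMS = {
    "Income": ["Low", "Medium", "High"],
    "Credit Rating": ["Low", "Medium", "High"],
    "Customer Status": ["VIP", "Employee", "Regular"],
}
PARAMS.update({f"Param {i}": ["A", "B", "C"] for i in range(4, 10)})


def pair_key(i, vi, j, vj):
    """Canonical key for a value pair, lower parameter index first."""
    return (i, vi, j, vj) if i < j else (j, vj, i, vi)


def all_pairs(levels):
    """Every value pair across every pair of parameters."""
    return {
        pair_key(i, vi, j, vj)
        for i, j in combinations(range(len(levels)), 2)
        for vi, vj in product(levels[i], levels[j])
    }


def pairs_in(test):
    """The value pairs a single full test covers."""
    return {pair_key(i, test[i], j, test[j]) for i, j in combinations(range(len(test)), 2)}


def greedy_pairwise(levels):
    """Build tests one at a time until every value pair is covered at least once."""
    remaining = all_pairs(levels)
    tests = []
    while remaining:
        # Seed the test with an uncovered pair, so every iteration makes progress.
        i, vi, j, vj = min(remaining)
        test = [None] * len(levels)
        test[i], test[j] = vi, vj
        for k, values in enumerate(levels):
            if test[k] is not None:
                continue
            fixed = [m for m, val in enumerate(test) if val is not None]
            # Choose the value that completes the most still-uncovered pairs.
            test[k] = max(
                values,
                key=lambda v: sum(pair_key(k, v, m, test[m]) in remaining for m in fixed),
            )
        tests.append(tuple(test))
        remaining -= pairs_in(test)
    return tests


names = list(PARAMS)
levels = list(PARAMS.values())
tests = greedy_pairwise(levels)
print(len(tests), "tests instead of", 3 ** len(levels))
print(dict(zip(names, tests[0])))
```

Seeding every test with a still-uncovered pair is what guarantees the loop terminates. The value-by-value greedy fill is what keeps the set small.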
  23. The tests “front-load” coverage: 87% of the pairs of values have already been tested together by the end of the 9th test.
      This is simply a coverage chart showing the pairs of test inputs tested together so far, as a percentage of the total number of possible pairs that could be tested together in the System Under Test.
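A coverage curve like the one on this slide can be computed by counting the distinct value pairs seen after each test. The sketch below uses a tiny hypothetical model (three two-valued parameters, 12 value pairs) so the numbers are easy to check by hand.

```python
from itertools import combinations, product


def all_pairs(params):
    """Every (param, value, param, value) pair the model asks to be covered."""
    names = list(params)
    return {
        (a, va, b, vb)
        for a, b in combinations(names, 2)
        for va, vb in product(params[a], params[b])
    }


def cumulative_pair_coverage(params, tests):
    """Percentage of all value pairs covered after each test, in execution order."""
    names = list(params)
    target = all_pairs(params)
    covered = set()
    curve = []
    for test in tests:
        covered |= {(a, test[a], b, test[b]) for a, b in combinations(names, 2)}
        curve.append(100.0 * len(covered & target) / len(target))
    return curve


# Hypothetical model: three parameters with two values each.
params = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]}
tests = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b2", "C": "c2"},
    {"A": "a2", "B": "b1", "C": "c2"},
    {"A": "a2", "B": "b2", "C": "c1"},
]
print(cumulative_pair_coverage(params, tests))  # [25.0, 50.0, 75.0, 100.0]
```

Here every test adds exactly three new pairs, so the curve climbs in equal steps. In larger generated sets, early tests tend to add many new pairs and later ones fewer, which produces the front-loaded shape on the chart.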
  24. User Experiences: Lanette Creamer’s Experiences
      Now let’s hear from Lanette and Ajay...
  25. HEXAWISE: First used on June 08, 2009
  26. FIRST STEPS
      • Interested to try new software
      • Aware of allpairs.exe
      Problem statement: multiple printers, printer-specific charts (Chart 1 & Chart 2), other settings
      Sl. No   Printer   Chart 1     Chart 2     Settings
      1        ABC       16 strips   64 strips   Borderless
      15       LMN       64 strips   -------     Auto cut
      45       XYZ       -------     16 strips   -------
  27. TESTED WITH ALLPAIRS
      • Excel to Notepad to Excel
      • Very useful when all pairs are valid
      • Unable to mention invalid pairs
      • Steps to be repeated based on cases
      • Maintained a common repository
  28. HEXAWISE
      • Easy to use
      • One user account, accessible anytime
      • Can specify invalid pairs
      • Multiple strength cases
  29. HEXAWISE WISH LIST
      • Desktop version, useful without internet too
      • Able to define invalid pairs after the cases are generated
      • Easy method to define invalid pairs
      • Need to try project sharing & Excel import
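Slides 27 to 29 turn on “invalid pairs”: value combinations that can’t occur together, such as a chart a given printer doesn’t support. Hexawise’s constraint handling isn’t described in the deck, so this is only a brute-force sketch suitable for small models. It enumerates every valid full combination, then greedily picks the one covering the most still-uncovered valid pairs. The printer, chart, and setting values loosely echo Ajay’s table; the specific constraint (printer XYZ cannot print borderless) is hypothetical.

```python
from itertools import combinations, product

# Values loosely echo Ajay's printer table; the constraint itself is hypothetical.
PARAMS = {
    "Printer": ["ABC", "LMN", "XYZ"],
    "Chart": ["16 strips", "64 strips"],
    "Settings": ["Borderless", "Auto cut"],
}
# Hypothetical invalid pair: assume printer XYZ cannot print borderless.
INVALID = {frozenset({("Printer", "XYZ"), ("Settings", "Borderless")})}


def pairs_of(test):
    """All value pairs in one test, as order-independent frozensets."""
    return {frozenset(p) for p in combinations(test.items(), 2)}


def is_valid(test):
    """A test is valid if it contains none of the invalid pairs."""
    return not (pairs_of(test) & INVALID)


def constrained_pairwise(params):
    """Greedy set cover over all valid full combinations (small models only)."""
    names = list(params)
    candidates = [dict(zip(names, combo)) for combo in product(*params.values())]
    candidates = [t for t in candidates if is_valid(t)]
    # Only pairs that appear in at least one valid test are worth covering.
    remaining = set().union(*(pairs_of(t) for t in candidates))
    tests = []
    while remaining:
        best = max(candidates, key=lambda t: len(pairs_of(t) & remaining))
        tests.append(best)
        remaining -= pairs_of(best)
    return tests


tests = constrained_pairwise(PARAMS)
for n, test in enumerate(tests, 1):
    print(n, test)
```

Filtering candidates before selection, rather than deleting invalid tests afterwards, keeps every valid pair reachable. Deleting after generation can silently drop valid pairs that only appeared in a discarded test.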
  30. Disclaimer: Thinking Req’d
      This is a photo I saw Lanette use in earlier presentations. It is absolutely spot on in this context. Designing DoE-based software tests is not a paint-by-numbers approach. You need to use your critical thinking skills. Without them, there will be a garbage in / garbage out problem.
  31. Disclaimer: Imperfect Models
      When you use a Design of Experiments-based test design tool, you effectively create a model that will generate your tests. Whenever you do so, there will be parts of the System Under Test you will miss. Perhaps (probably?) you will miss important parts.
  32. Disclaimer: Which Inputs?
      When creating DoE-based software tests, you will face the same kinds of test design considerations you always have... as well as new, DoE-specific considerations.
  33. “The Truffle Pig Problem”
      Design of Experiments-based test design methods face a “truffle pig problem.” If software bugs were like leaves on your lawn that you wanted to get rid of, DoE-based test design methods would be much more popular than they are now. DoE-based methods would be the equivalent of a leaf blower: you’d be able to see your productivity increase instantly. Unfortunately, bugs are not visible like leaves. They’re hiding, unseen, like truffles. In my experience, DoE-based test design methods are like an especially thorough and efficient truffle pig. The problem, of course, is that if someone gave you a super truffle pig that was twice as good at finding truffles as your regular truffle pig, you would probably have a hard time assessing how good it was. DoE-based test design methods face this same challenge.
  34. How Can You Know?
      Here’s the best approach I’ve come up with to answer the question “how can you know DoE-based test design methods are better than manually selected test cases?”
  35. “Let’s test this hypothesis.”
      (Slide images: “Cereal Box”; “Toyota - Entering the Truck market in the U.S.”)
      Even though we can complete a meaningful “bake-off” pilot project within just a couple of man-days of effort, this is the typical reaction I get from test teams to whom I propose this approach! (Brief video of office mates diving under desks, hiding under plants, etc.) It is amazing how quickly people tend to run and hide when they are given the opportunity to learn something that could fundamentally change their software testing effectiveness. The irony is that teams will say, “We’re too busy to execute a one- or two-day pilot project now.” Hello? In my experience, the pilots show, on average, more than double the number of defects found per tester hour. The entire point of learning about Design of Experiments-based test design techniques, like pairwise, 3-way, and orthogonal array-based (OA) testing, is to improve your efficiency and effectiveness... so you can get much more done with fewer resources. Saying “I’m too busy to learn how to do that” is... shortsighted is probably the most diplomatic word.
  36. Results: Less Test Design Time
      (Diagram: same System Under Test; different test design approach, identifying tests manually vs. generating tests using a Design of Experiments-based tool; different results, compared on time to design tests, test ideas, combinatorial coverage, number of bugs found, and time to execute tests. Highlighted: ~30-40% less time to design tests, because many test generation steps are automated.)
      In my experience, teams that have agreed to pilot projects have seen these results. It takes far less time to generate tests using this approach because many steps in the test case selection process and the test case documentation process get automated.
  37. Results: Better Coverage
      (Diagram: the same manual vs. Design of Experiments-based comparison, highlighting combinatorial coverage.)
      In my experience, it is easy to show that combinatorial coverage (e.g., how many pairs of values, how many triples of values, etc., are tested together) is far superior with this approach. In this actual example from a couple of months ago, we showed that 51 business-as-usual tests that were put together manually left more than 1,400 pairs of values untested.
      A skeptic will probably say... “OK. Interesting, but what does that translate to in terms of actual defects found?”
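A gap analysis like this one amounts to listing every value pair the model defines, then removing the pairs that appear together in at least one existing test. This is a minimal sketch, assuming the existing tests have already been translated into parameter/value form. The model is a hypothetical slice of the mortgage example, and the four “manual” tests are made up for illustration.

```python
from itertools import combinations, product


def untested_pairs(params, tests):
    """Value pairs defined by the model that no existing test exercises together."""
    names = list(params)
    target = {
        (a, va, b, vb)
        for a, b in combinations(names, 2)
        for va, vb in product(params[a], params[b])
    }
    seen = {(a, t[a], b, t[b]) for t in tests for a, b in combinations(names, 2)}
    return sorted(target - seen)


# Hypothetical slice of the mortgage model and a hand-picked suite.
params = {
    "Income": ["Low", "High"],
    "Credit Rating": ["Low", "Medium", "High"],
    "Customer Status": ["VIP", "Employee"],
}
manual_tests = [
    {"Income": "Low", "Credit Rating": "Medium", "Customer Status": "VIP"},
    {"Income": "Low", "Credit Rating": "Medium", "Customer Status": "Employee"},
    {"Income": "High", "Credit Rating": "Medium", "Customer Status": "VIP"},
    {"Income": "Low", "Credit Rating": "High", "Customer Status": "VIP"},
]
gaps = untested_pairs(params, manual_tests)
print(len(gaps), "of 16 pairs never tested together")  # 7 of 16 pairs never tested together
for gap in gaps:
    print(gap)
```

Even this four-test suite never tries Credit Rating = Low at all, and it never pairs Income = High with Customer Status = Employee.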
  38. Results: More Bugs Found
      (Diagram: the same comparison, highlighting the number of bugs found. Results will depend upon: (1) the System Under Test, (2) test designer skill, and (3) the coverage strength of the DoE-based tests. My experience from dozens of projects: 2-way DoE-based tests consistently find more.)
      In my experience, 2-way (or pairwise) tests, using the same test ideas as the business-as-usual tests, have consistently found more defects than the business-as-usual tests. This is true even when the business-as-usual tests are far higher in number than the pairwise set of tests. If you used 3-way or 4-way sets of tests, the number of defects found by this Design of Experiments-based test design approach would be far higher than the number found using the business-as-usual approach.
  39. Results: ~2x Bugs / Hour
      (Diagram: the same comparison, highlighting bugs found per hour. Results will depend upon: (1) the System Under Test, (2) test designer skill, and (3) the coverage strength of the DoE-based tests. My experience from dozens of projects: 2-way DoE-based tests consistently find MANY more bugs / hour (often double).)
      The number of defects is higher using Hexawise. The number of tests executed is lower using Hexawise. On average, in the dozens of pilot projects I have seen, the number of defects found per tester hour is often double the number found per tester hour from business-as-usual sets of tests.
  40. How Can You Know?
      I would strongly encourage you to try a simple one- or two-day pilot project. In fact, I’ll help you do it if you agree to publish the results (whether good or bad).
  41. Additional Information
      James Bach - Pairwise Testing: A Best Practice that Isn’t
      We’ve barely scratched the surface of what Design of Experiments-based test design is and how you could get started using it. Here are some good sources to find out more. I am happy to talk to you about it if you have any questions.
  42. Questions?
      Thank you all for your time. Any questions?
  43. Appendix Slides
  44. Select Your Thoroughness Goal
      Testing for every pair of input values is just a start. The test designer can generate plans with very different levels of testing thoroughness. The 2-way test cases Hexawise generates have been consistently shown to be more thorough than standard test cases created by most test teams at Fortune 500 firms. Even so, Hexawise allows users to “turn up the coverage dial” dramatically and generate other, extraordinarily thorough, sets of tests. In this case, we see Hexawise can generate test set solutions for this simple insurance ratings engine example ranging in size from 28 test cases (for users who prioritize speed to market) all the way up to 3,925 test cases (for users who desire extremely thorough testing).
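The insurance example’s parameters aren’t listed, so here is a sketch of why “turning up the coverage dial” grows the test count so quickly. It uses the nine three-valued parameters assumed earlier for the mortgage example. For each strength t, it counts the t-way value combinations to cover and a simple lower bound on set size.

```python
from itertools import combinations
from math import prod


def t_way_targets(levels, t):
    """Distinct t-way value combinations a strength-t test set must cover."""
    return sum(prod(group) for group in combinations(levels, t))


def t_way_lower_bound(levels, t):
    """Minimum tests: every combination of the t largest parameters must
    appear, and each test holds exactly one of those combinations."""
    return prod(sorted(levels, reverse=True)[:t])


# Assumed model: the nine three-valued parameters of the mortgage example.
levels = [3] * 9
for t in (2, 3, 4):
    targets = t_way_targets(levels, t)
    floor = t_way_lower_bound(levels, t)
    print(f"{t}-way: {targets} targets, at least {floor} tests")
# 2-way: 324 targets, at least 9 tests
# 3-way: 2268 targets, at least 27 tests
# 4-way: 10206 targets, at least 81 tests
```

Each step up in strength multiplies both the targets and the floor, which is why plans range from a few dozen tests to thousands.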
  45. How Much Is Enough Testing?
      The “Analyze Coverage” screen shows you how much coverage is achieved at each point in the set of tests. In other words, what percentage of the targeted combinations have been tested after each test? This chart gives teams the ability to make fact-based decisions about “how much testing is enough?” Here, for example, 83% of the pairs of test inputs entered into this plan have been tested together after only 12 tests (out of 295,000 possible tests).
  46. Better Than Hand-Selected Tests
      If you take a close look at any set of Hexawise-generated test cases, you will notice that there is an enormous amount of variation from test case to test case (and the smallest amount of repetition that is mathematically possible to achieve). In contrast, if you were to translate your existing manually selected test cases into a similar format and analyze them, you would find that the manually selected test cases have far more repeated test combinations and far less variation from test case to test case. This is a big part of the reason why Hexawise generates dramatic efficiency improvements. In addition, if you were to graph the percentage of the targeted 2-way combinations achieved by your existing manually selected test cases, you would find that there are many pairs of test inputs that were never covered by your tests. The fact that Hexawise ensures every pair of test inputs gets tested in at least one test case is a big part of the reason why Hexawise-generated tests result in superior coverage and more defects found during test execution.
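One way to quantify “repeated test combinations” is to count how often each value pair appears across a suite: every appearance after the first adds no new pairwise coverage. This is a sketch, assuming tests are parameter/value dictionaries. Both suites are hypothetical and use the same three-parameter slice of the mortgage model (2 × 3 × 2 values, 16 pairs).

```python
from collections import Counter
from itertools import combinations


def pair_repetition(tests):
    """Return (distinct value pairs covered, pair occurrences that add nothing new)."""
    counts = Counter(
        (a, t[a], b, t[b])
        for t in tests
        for a, b in combinations(sorted(t), 2)
    )
    return len(counts), sum(n - 1 for n in counts.values())


def row(income, credit, status):
    return {"Income": income, "Credit Rating": credit, "Customer Status": status}


# Hypothetical hand-picked suite: 4 tests that keep reusing the same values.
manual = [row("Low", "Medium", "VIP"), row("Low", "Medium", "Employee"),
          row("High", "Medium", "VIP"), row("Low", "High", "VIP")]
# A 6-test pairwise set for the same model: every one of the 16 pairs appears.
pairwise = [row("Low", "Low", "VIP"), row("Low", "Medium", "Employee"),
            row("Low", "High", "VIP"), row("High", "Low", "Employee"),
            row("High", "Medium", "VIP"), row("High", "High", "Employee")]

print(pair_repetition(manual))    # (9, 3): 12 pair occurrences, only 9 distinct
print(pair_repetition(pairwise))  # (16, 2): all 16 pairs from 18 occurrences
```

The pairwise set’s 2 repeats are unavoidable here: Income × Customer Status has only 4 combinations but appears in all 6 tests.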
  47. What is DoE-based testing?
      Topic          Details
      Definition     Design of Experiments-based testing is a test design approach used to identify a small subset of tests (from many possible ones) in order to find as many defects as possible in as few tests as possible.
      Why it works   Test conditions are constructed to ensure:
                     • No combinations of conditions get accidentally omitted
                     • Unproductive repetition is minimized
      “AKA”          “Design of Experiments-based testing” covers several closely related subjects:
                     • Pairwise / AllPairs
                     • Orthogonal Array / OA / OATs
                     • 2-way, 3-way, ... t-way
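To make the “Orthogonal Array” entry concrete: the classic L9 array fits four three-level parameters into 9 runs, and every pair of columns contains each of the 9 value combinations exactly once. This sketch builds it with one standard construction based on arithmetic modulo 3 (not presented as how Hexawise builds tests) and checks that property.

```python
from collections import Counter
from itertools import combinations, product


def l9_orthogonal_array():
    """Rows (a, b, a+b, a+2b) mod 3 for a, b in {0, 1, 2}: 9 runs, 4 columns, 3 levels."""
    return [(a, b, (a + b) % 3, (a + 2 * b) % 3) for a, b in product(range(3), repeat=2)]


def is_strength_two(rows, levels=3):
    """True if every pair of columns shows every value pair, all equally often."""
    n_cols = len(rows[0])
    for i, j in combinations(range(n_cols), 2):
        counts = Counter((r[i], r[j]) for r in rows)
        if len(counts) != levels ** 2 or len(set(counts.values())) != 1:
            return False
    return True


rows = l9_orthogonal_array()
for row in rows:
    print(row)
print(is_strength_two(rows))  # True
```

Four three-level parameters have 81 full combinations, yet these 9 runs cover all 54 value pairs. An orthogonal array demands that pairs appear equally often. Pairwise (covering array) tools only require “at least once”, which gives them more freedom with uneven parameter sizes and constraints.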
  48. Software Testing Challenges
      • Software applications are very complex; it is impossible to test every possibility
      • Extraordinarily smart, pragmatically oriented applied statisticians created the field of “Design of Experiments” to solve exactly this challenge; for the last 40+ years they have developed highly effective math-based covering array techniques and similar strategies, which are now broadly used in many areas including manufacturing, advertising, and agriculture
      • These proven Design of Experiments techniques, which are designed to find out as much information as possible in as few test cases as possible, also have direct applicability to the software testing field
      • Unfortunately, the vast majority of software testers in the relatively young field of software testing have never heard of any Design of Experiments concepts like MFAT vs. OFAT (multiple factors at a time vs. one factor at a time), Orthogonal Array coverage, pairwise coverage, or even the existence of the “Design of Experiments” field
      • Instead of using 40+ years of Design of Experiments-based knowledge to design tests that are as effective as possible, testers almost always manually select the combinations of test conditions they use in their tests, and as a result...
  49. Results without DoE / Hexawise
      ... the results from manual test case selection efforts are consistently far from optimal:
      • Missed combinations
      • Wasteful repetition
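Slide 48 contrasts MFAT with OFAT. A one-factor-at-a-time plan moves a single parameter away from a baseline in each test. That makes it a clear picture of where missed combinations come from: it never tests two non-baseline values together. This sketch counts the pairs such a plan covers for the nine three-valued parameters assumed for the mortgage example, with values encoded as 0, 1, 2 and 0 as the baseline.

```python
from itertools import combinations


def ofat_tests(n_params, n_values):
    """Baseline (all 0s), then each parameter moved to each other value on its own."""
    baseline = [0] * n_params
    tests = [tuple(baseline)]
    for p in range(n_params):
        for v in range(1, n_values):
            test = list(baseline)
            test[p] = v
            tests.append(tuple(test))
    return tests


def pairs_covered(tests):
    """Distinct (param, value, param, value) pairs exercised by a suite."""
    return {(i, t[i], j, t[j]) for t in tests for i, j in combinations(range(len(t)), 2)}


n_params, n_values = 9, 3
tests = ofat_tests(n_params, n_values)
covered = len(pairs_covered(tests))
total = (n_params * (n_params - 1) // 2) * n_values ** 2
print(len(tests), covered, total)  # 19 180 324
```

With 19 tests, the OFAT plan reaches only 180 of the 324 pairs. The 144 pairs it misses are exactly those where two parameters both leave the baseline.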
  50. Results with DoE / Hexawise
      In contrast, Hexawise algorithms use Design of Experiments-based methods to generate tests. The result is that Hexawise-generated tests consistently find more defects in fewer tests. Hexawise-generated tests pack more coverage into each test.