Hexawise Software Test Design Tool - "Vendor Meets User" at CAST Software Testing Conference 2011 - with speaker notes

Vendor Meets User

The Hexawise Test Design Tool and Two
Testers Who Tried to Use it in Real Life

Presented at CAST, 2011
Justin Hunter
Lanette Creamer
Ajay Balamurugadas

Topics

Introduction to Hexawise
“Inside the Mind of the Vendor”
Justin Hunter

Experiences
“Inside the Minds of Testers”
Lanette Creamer & Ajay Balamurugadas

Hexawise is a test case design tool used by testers to design their tests. In a context
where test scripts are used, Hexawise can design detailed test scripts. In an
Exploratory Testing context, Hexawise is used to generate “test ideas” that encourage
the tester to explore, and even design mini tests on the fly.

2

My Dad - William G. Hunter

Why I created Hexawise has a whole lot to do with who my father was. He was a leading
applied statistician who specialized in how to make experiments more efficient & effective.

3

1960’s

In the 1960’s my dad brought my family to Singapore where he taught at a university and
worked with local companies.

4

1970’s

In the 1970’s my dad brought my family to Nigeria where, again, he taught at a university
and worked with local companies. That’s me in the lower right hand corner. We were
visiting a factory my father was helping.

5

1980’s

In the 1980’s he did something that his colleagues thought was pretty crazy because they
thought his expertise and lessons learned probably wouldn’t transfer into the government
sector. While a professor in Madison, Wisconsin, he started collaborating with local and
state government agencies.

6

Design of Experiments

Why did he uproot our family every few years and then start collaborating with
government agencies?

It was because he passionately believed that sharing his expertise in Design of
Experiments (a specialized ﬁeld of applied statistics), would really help people - by giving
them skills that would improve both quality and productivity. This is a book cover from a
book he co-wrote with George Box that has helped to increase awareness of what Design
of Experiments is and how practitioners should use it.
7


So what is Design of Experiments? It’s a specialized ﬁeld of applied statistics that has
been around since the 1930’s. Simply put, it is focused on answering this question.

8


Where is Design of Experiments used?

It is used extensively in manufacturing, among other industries. If you’re a manufacturer
trying to create a widget for a car part, for example, you don’t want to have to build
100,000 different prototypes of the widget before you stumble across a combination of
heat, pressure, temperature, and ingredients that achieves the desired
characteristics. You’d want to build a small handful of prototypes, and have the variables
going into each of the prototypes carefully varied from prototype to prototype to allow you
to learn as much as possible in as few experiments as possible. That’s what auto manufacturers
regularly do.
9


Where Else is Design of Experiments used?

Design of Experiments-based methods have also been commonly used in agriculture for
decades. If you’re Monsanto and you want to grow a hardier seed that will grow in colder
temperatures and mature more quickly, you’re going to use Design of Experiments
methods to identify the combinations of variables to test together in each test you
execute.

10


Where Else is Design of Experiments used?

Many marketers also use Design of Experiments methods extensively. YouTube
recently ran an experiment involving 1,024 different combinations of fonts, colors,
messages, and button sizes and shapes to find an optimal combination that increased
their sign-up rate by more than 15%. Jeff Fry, a tester at Microsoft, wrote a good article
about this and posted a phenomenal video by a Design of Experiments expert who
worked at Amazon before moving to Microsoft.

A/B testing is a very simple “watered down” DoE approach. Multi-Variate Testing (MVT)
is “full-blown” DoE-based marketing.
11


What about in Software Testing?

This is the question I’ve been focused on for the last 5 years. Seventeen pilot projects I
conducted at my prior company convinced me that DoE-based methods consistently
deliver improvements in efficiency and effectiveness over teams’ business-as-usual test
cases.

12

What is Hexawise?

Challenges Hexawise Addresses

Problems During Test Design...    Impact Felt During Test Execution
Manual Documentation              Delayed Start
Largely Repetitive Tests          Inefficient
Gaps in Coverage                  Missed Defects

Hexawise was created to address these common testing challenges.
13

Mortgage Application Example

Let’s use this simple example to demonstrate how pairwise testing works. I’ve
borrowed this idea from a presentation that Bernie Berger gave at StarEast.

Imagine you’re testing a mortgage application that has several sets of details. This is
an “executive summary” view of the different options that could be selected for an
application. We could make this example more complicated by including hardware and
software configuration options, user types, etc. We’re intentionally keeping it simple
here.
14


If we just examine one of those three branches, we see that it has 27 possible test
combinations associated with it. For example, 1 of the 27 possible tests would include:

One example: Income = Low & Credit Rating = Medium & Customer Status = VIP

There are 26 other similar combinations.
15


When we examine all three branches, we see they have equal complexity. Each of the
three branches has 27 total possible combinations. How many total combinations are
there? Hint: It is not 81.
16

Which Tests Should You Choose?

27 X 27 X 27 =

19,683 Possible Tests

There are almost 20,000 possible combinations to choose from.
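
The arithmetic behind these numbers can be checked with a short sketch. It assumes each branch has three inputs with three values each, as in the Income / Credit Rating / Customer Status branch; the value names that the notes do not mention (such as "Regular") are hypothetical placeholders.

```python
from itertools import product

# One branch of the mind map: three inputs, three values each.
# "Medium" for Income and "Regular" for Customer Status are assumed
# placeholders; the speaker notes do not list every value.
branch = {
    "Income": ["Low", "Medium", "High"],
    "Credit Rating": ["Low", "Medium", "High"],
    "Customer Status": ["VIP", "Employee", "Regular"],
}

branch_combinations = list(product(*branch.values()))
per_branch = len(branch_combinations)  # 3 * 3 * 3
total = per_branch ** 3                # three independent branches multiply

print(per_branch)  # 27
print(total)       # 19683
print(("Low", "Medium", "VIP") in branch_combinations)  # True
```

The branches multiply rather than add because any combination from one branch can appear alongside any combination from the other two, which is why the answer is 19,683 and not 81.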

17

Prioritization

How many test inputs are needed
to trigger defects in production?

[Pie chart]
Number of test inputs    Share of defects
1                        51%
2 (“pairwise”)           33%
3                        11%
4, 5, or 6               5%
In order to prioritize which specific combinations should be selected as high priority tests from those ~20,000 possible tests, it is
extremely important to understand that the vast majority of defects in production can be triggered by just two test inputs tested together
(e.g., a test that includes Income = Low as the first test input and also includes Credit Rating = High as the second test input).

This fact has extremely important implications for software testers. Unfortunately, very few software testers are aware of (a) this fact, or
(b) the implications. The implication for software testers is that small sets of tests that ensure every possible PAIR of values gets tested
together in at least one test case are extremely efficient and effective at finding defects.

• Medical Devices: D.R. Wallace, D.R. Kuhn, Failure Modes in Medical Device Software: an Analysis of 15 Years of Recall Data, International Journal of Reliability, Quality, and Safety Engineering, Vol. 8, No. 4, 2001.
• Browser, Server: D.R. Kuhn, M.J. Reilly, An Investigation of the Applicability of Design of Experiments to Software Testing, 27th NASA/IEEE Software Engineering Workshop, NASA Goddard SFC, 4-6 December 2002.
• NASA database: D.R. Kuhn, D.R. Wallace, A.J. Gallo, Jr., Software Fault Interactions and Implications for Software Testing, IEEE Trans. on Software Engineering, Vol. 30, No. 6, June 2004.
• Network Security: K.Z. Bell, Optimizing Effectiveness and Efficiency of Software Testing: a Hybrid Approach, PhD Dissertation, North Carolina State University, 2006.

18


Select a couple of pairs of test inputs from this mind map. Possible pairs of inputs could
include pairs like those shown below. Select your own two pairs though.

First Example of a pair of values: Income = Low & Credit Rating = Medium

Second Example of a pair of values: Income = High & Customer Status = Employee

Third Example: Income = High & Credit Rating = Low

19

These are the same test inputs that have been imported from the mind map into the
Hexawise tool. When you click on the “Create Tests” icon at the top of the screen, you
will see a pairwise testing solution. Each and every pair of values you might have
selected will be included in a surprisingly small number of tests.

20

Only 17 tests are required (out of 19,683 possible tests) to test every single possible pair
of test conditions together in the same test case at least once.

In other words, every single pair of test conditions will be tested together at least once.
The pair of conditions we selected at random (Income = Low, Credit Rating = Medium)
appears in test number 8. All the other pairs are also tested together at least once.

If we have done a thorough job of identifying test inputs, the vast majority of defects
will be triggered by these 17 tests out of almost 20,000 tests. These are the lessons from
Design of Experiments that have been learned and applied in so many other industries
since the 1930’s and are now being applied by an increasing number of software testers.
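
To make the idea of "every pair in a small number of tests" concrete, here is a minimal greedy sketch. It is not Hexawise's algorithm (which the notes do not describe); it is a simple randomized heuristic, and the input names and the `greedy_pairwise` helper are hypothetical. It assumes nine inputs with three values each, matching the 19,683-combination example, and the number of tests it produces may differ from the 17 shown on the slide.

```python
import random
from itertools import combinations, product


def all_pairs(parameters):
    """Every (index_a, value_a, index_b, value_b) pair that needs covering."""
    values = list(parameters.values())
    return {
        (a, va, b, vb)
        for a, b in combinations(range(len(values)), 2)
        for va, vb in product(values[a], values[b])
    }


def pairs_in(test):
    """The pairs exercised by one test (one value per parameter)."""
    return {(a, test[a], b, test[b]) for a, b in combinations(range(len(test)), 2)}


def greedy_pairwise(parameters, candidates_per_step=50, seed=0):
    """Greedy heuristic: keep adding the candidate that covers the most new pairs."""
    rng = random.Random(seed)
    values = list(parameters.values())
    uncovered = all_pairs(parameters)
    tests = []
    while uncovered:
        pool = sorted(uncovered)
        best, best_gain = None, 0
        for _ in range(candidates_per_step):
            # Start from a still-uncovered pair so every candidate makes progress.
            a, va, b, vb = rng.choice(pool)
            candidate = [rng.choice(options) for options in values]
            candidate[a], candidate[b] = va, vb
            gain = len(pairs_in(candidate) & uncovered)
            if gain > best_gain:
                best, best_gain = tuple(candidate), gain
        tests.append(best)
        uncovered -= pairs_in(best)
    return tests


# Nine hypothetical inputs with three values each (3**9 = 19,683 combinations).
parameters = {f"Input {i}": ["Low", "Medium", "High"] for i in range(1, 10)}
tests = greedy_pairwise(parameters)
covered = set().union(*(pairs_in(t) for t in tests))
print(len(tests), "tests cover all", len(all_pairs(parameters)), "pairs:",
      covered == all_pairs(parameters))
```

Each test here covers 36 pairs at once (one for every pair of inputs), which is why a couple of dozen well-chosen tests can cover all 324 pairs.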
21

This is the same set of 17 tests shown in case the speaker notes on the last slide
covered a pair of values you were trying to conﬁrm were tested together.

22

The tests “front-load” coverage. 87% of the pairs of values have already been tested
together by the end of the 9th test.

This is simply a coverage chart showing what percentage of test input pairs have been
tested together so far as a percentage of the total number of possible pairs that could
be tested together in the System Under Test.
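
A coverage chart of this kind can be computed with a few lines of code. The sketch below is an illustration under assumptions, not the tool's implementation: the `cumulative_pair_coverage` helper and the toy two-input plan are hypothetical.

```python
from itertools import combinations


def cumulative_pair_coverage(tests, parameters):
    """Percent of all possible value pairs covered after each test, in order."""
    names = list(parameters)
    index_pairs = list(combinations(range(len(names)), 2))
    total = sum(
        len(parameters[names[a]]) * len(parameters[names[b]]) for a, b in index_pairs
    )
    seen = set()
    curve = []
    for test in tests:
        for a, b in index_pairs:
            seen.add((a, test[a], b, test[b]))
        curve.append(100.0 * len(seen) / total)
    return curve


# Toy plan (hypothetical): two inputs with two values each, so 4 possible pairs.
parameters = {"Income": ["Low", "High"], "Credit Rating": ["Low", "High"]}
tests = [("Low", "Low"), ("High", "High"), ("Low", "Low"), ("Low", "High"), ("High", "Low")]
curve = cumulative_pair_coverage(tests, parameters)
print(curve)  # [25.0, 50.0, 50.0, 75.0, 100.0]
```

Note how the repeated third test adds no coverage: the curve stays flat at 50%, which is the kind of unproductive repetition a coverage chart makes visible.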

23

User Experiences

Lanette Creamer’s Experiences

Now let’s hear from Lanette and Ajay...

24

HEXAWISE
First used on June 08, 2009

FIRST STEPS

Interested to try new software
Aware of allpairs.exe

Problem Statement
• Multiple Printers
• Printer Specific Charts
• Chart 1 & Chart 2
• Other Settings

Sl. No Printer Chart 1 Chart 2 Settings
1 ABC 16 strips 64 strips Borderless

15 LMN 64 strips ------- Auto cut

45 XYZ ------- 16 strips -------

TESTED WITH ALLPAIRS

Excel to Notepad to Excel
Very useful when all pairs are valid

Unable to mention invalid pairs

Steps to be repeated based on cases

Maintained a common repository

HEXAWISE
Easy to use
One user account – anytime accessible
Can specify invalid pairs
Multiple strength cases

HEXAWISE WISH LIST
Desktop version – useful without internet too
Able to define invalid pairs after the cases are generated
Easy method to define invalid pairs
Need to try project sharing & Excel import

Disclaimer: Thinking Req’d

This is a photo I saw Lanette use in earlier presentations. It is absolutely spot on in this
context. Designing DoE-based software tests is not a paint-by-numbers approach.
You need to use your critical thinking skills. Without using them, there will be a
garbage in / garbage out problem.

30

Disclaimer - Imperfect Models

When you use a Design of Experiments-based test design tool, you effectively create a
model that will generate your tests. Whenever you do so, there will be parts of the
System Under Test you will miss. Perhaps (probably?) you will miss important parts.
31

Disclaimer - Which Inputs?

When creating DoE-based software tests, you will face the same kinds of test design
considerations you always have... as well as new, DoE-speciﬁc considerations.

32

“The Truffle Pig Problem”
Design of Experiments-based test design methods face a “truffle pig problem.” If
software bugs were like leaves on your lawn that you wanted to get rid of, DoE-based
test design methods would be much more popular than they are now. DoE-
based methods would be the equivalent of a leaf blower: you’d be able to instantly see
your productivity increase.

Unfortunately, bugs are not visible like leaves. They’re hiding, unseen, like truffles. In
my experience, DoE-based test design methods are like an especially thorough
and efficient truffle pig. The problem is, of course, that if someone gave you a super
truffle pig that was twice as good at finding truffles as your regular truffle pig, you
would probably have a hard time assessing how good it was. DoE-based test design
methods face this same challenge.

33

How Can You Know?

Here’s the best approach I’ve come up with to answer the question of “how can you
know DoE-based test design methods are better than manually-selected test cases?”

34

“Let’s test this hypothesis.”

Cereal Box

Toyota - Entering the Truck market in the U.S.

Even though we can complete a meaningful “bake-off” pilot project within just a couple
man days of effort, this is the typical reaction I get from test teams who I propose this
approach to! (brief video of office mates diving under desks, hiding under plants, etc.)
It is amazing how quickly people tend to run and hide when they are given the
opportunity to learn something that could fundamentally change their software testing
effectiveness.

The irony is that teams will say “we’re too busy to execute a one or two day pilot project
now.” Hello? In my experience, the findings from the pilot - on average - more than
double the number of defects found per tester hour. The entire point of learning about
Design of Experiments-based test design techniques, like pairwise and 3-way, and
orthogonal array-based / OA testing is to improve your efficiency and effectiveness... So
you can get much more done with fewer resources. Saying “I’m too busy to learn how
to do that” is... shortsighted is probably the most diplomatic word.

35

Results: Less Test Design Time

[Diagram: Same (System Under Test, Test Ideas, Time); Different Test Design Approach (identify tests manually vs. generate tests using a Design of Experiments-based tool); Different Results (Time to Design Tests, Combinatorial Coverage, No. of Bugs Found, Time to Execute Tests)]

~30 - 40% Less Time (b/c Many Test Generation Steps are Automated)

In my experience, teams that have agreed to pilot projects have seen these results. It
takes far less time to generate tests using this approach because many steps in the test
case selection process and test case documentation process get automated.

36

Results: Better Coverage

[Diagram: Same (System Under Test, Test Ideas, Time); Different Test Design Approach (identify tests manually vs. generate tests using a Design of Experiments-based tool); Different Results (Time to Design Tests, Coverage, Time to Execute Tests)]

In my experience, it is easy to show that combinatorial coverage (e.g. how many pairs of
values, how many triples of values, etc. are tested together) is far superior with this
approach. In this actual example from a couple months ago, we showed that 51
business as usual tests that were put together manually did not test for more than
1,400 pairs of values.

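
A gap analysis like the one described in these notes can be sketched as follows. This is only an illustration under assumptions: the `uncovered_pairs` helper, the three inputs, and the three "manual" tests are hypothetical, not the 51-test example from the pilot.

```python
from itertools import combinations, product


def uncovered_pairs(tests, parameters):
    """Value pairs that no test in the set exercises together."""
    names = list(parameters)
    required = set()
    for a, b in combinations(names, 2):
        for va, vb in product(parameters[a], parameters[b]):
            required.add(((a, va), (b, vb)))
    covered = set()
    for test in tests:  # each test maps input name -> chosen value
        for a, b in combinations(names, 2):
            covered.add(((a, test[a]), (b, test[b])))
    return required - covered


# Hypothetical inputs and hand-picked tests that vary one input at a time.
parameters = {
    "Income": ["Low", "High"],
    "Credit Rating": ["Low", "High"],
    "Customer Status": ["VIP", "Employee"],
}
manual_tests = [
    {"Income": "Low", "Credit Rating": "Low", "Customer Status": "VIP"},
    {"Income": "High", "Credit Rating": "Low", "Customer Status": "VIP"},
    {"Income": "Low", "Credit Rating": "High", "Customer Status": "VIP"},
]

missing = uncovered_pairs(manual_tests, parameters)
print(len(missing))  # 5
for pair in sorted(missing):
    print(pair)
```

Even in this tiny case, three hand-picked tests leave 5 of the 12 possible pairs untested, including every pair involving Customer Status = Employee.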
A skeptic will probably say... “OK. Interesting, but what does that translate to in terms
of actual defects found?”
37

Results: More Bugs Found

[Diagram: Same (System Under Test, Test Ideas, Time); Different Test Design Approach (identify tests manually vs. generate tests using a Design of Experiments-based tool); Different Results (Time to Design Tests, Coverage, No. of Bugs Found, Time to Execute Tests)]

Will depend upon:
(1) the System Under Test,
(2) Test Designer skill, and
(3) the coverage strength of the DoE-based tests.

My Experience from dozens of projects: 2-way DoE-based tests consistently find more.

In my experience, 2-way (or pairwise) tests - using the same test ideas as used in
business as usual tests - have consistently found more defects than the business as
usual tests. This is true even when the business as usual tests are far higher in number
than the pairwise set of tests.

If you used 3-way or 4-way sets of tests, the number of defects found by this Design of
Experiments-based test design approach would be far higher than found using the
business as usual approach.
38

Results: ~2x Bugs / Hour

[Diagram: Same (System Under Test, Test Ideas, Time); Different Test Design Approach (identify tests manually vs. generate tests using a Design of Experiments-based tool); Different Results (Time to Design Tests, Combinatorial Coverage, Time to Execute Tests)]

Will depend upon:
(1) the System Under Test,
(2) Test Designer skill, and
(3) the coverage strength of the DoE-based tests.

My Experience from dozens of projects: 2-way DoE-based tests consistently find MANY more bugs / hour (often double)

The number of defects is higher using Hexawise. The number of tests executed is
lower using Hexawise. On average, in the dozens of pilot projects I have seen, the
number of defects found per tester hour is often double the number of defects found
per tester hour from business as usual sets of tests.

39

How Can You Know?

I would strongly encourage you to try a simple one or two day pilot project. In fact, I’ll
help you do it if you agree to publish the results (whether good or bad).
40

Additional Information

James Bach - Pairwise Testing: A Best Practice that Isn’t

We’ve barely scratched the surface on the topic of what Design of Experiments-based
test design is and how you could get started using it. Here are some good sources to
find out more about it and how you can get started using it. I am happy to talk to you
about it if you have any questions.
41

Questions?

Thank you all for your time. Any questions?

42

Appendix Slides

43

Select Your Thoroughness Goal

Testing for every pair of input values is just a start. The test designer can
generate plans with very different levels of testing thoroughness.
The 2-way test cases Hexawise generates have been consistently shown to be more thorough than standard test cases
created by most test teams at Fortune 500 ﬁrms. Even so, Hexawise allows users to “turn up the coverage dial” dramatically
and generate other, extraordinarily thorough, sets of tests. In this case, we see Hexawise can generate test set solutions for
this simple insurance ratings engine example ranging in size from 28 test cases (for users who prioritize speed to market) all
the way up to 3,925 test cases (for users who desire extremely thorough testing).
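
One way to see why higher coverage strengths need many more tests is to count how many t-way combinations of values there are to cover. The sketch below is illustrative only: it uses the nine three-valued inputs from the mortgage example rather than the insurance ratings engine shown on this slide, and `count_t_way_combinations` is a hypothetical helper. It counts combinations to be covered, not the number of tests a tool would generate.

```python
from itertools import combinations
from math import prod


def count_t_way_combinations(value_counts, t):
    """Number of distinct t-way value combinations across all groups of t inputs."""
    return sum(prod(group) for group in combinations(value_counts, t))


# Nine inputs with three values each (the mortgage example).
value_counts = [3] * 9
for t in (2, 3, 4):
    print(t, count_t_way_combinations(value_counts, t))
# 2 324
# 3 2268
# 4 10206
```

Moving from 2-way to 3-way coverage multiplies the number of targeted combinations by seven in this example, which is the trade-off behind the "coverage dial": more thorough plans, at the cost of more tests.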

44

How Much is Enough Testing?

The “Analyze Coverage” screen shows you how much coverage is achieved
at each point in the set of tests. In other words, what percentage of the
targeted combinations have been tested for after each test?

This chart gives teams the ability to make fact-based decisions about “how
much testing is enough?” Here, for example, 83% of the pairs of test inputs
entered into this plan have been tested together after only 12 tests (out of
295,000 possible tests).
45

Better Than Hand-Selected Tests

If you take a close look at any set of Hexawise-generated test cases you
will notice that there is an enormous amount of variation from test case
to test case (and the smallest amount of repetition that is mathematically
possible to achieve).
In contrast, if you were to translate
your existing manually-selected test
cases into a similar format and
analyze them, you would find that
the manually-selected test cases
have far more repeated test
combinations and far less variation
from test case to test case. This
is a big part of the reason why
Hexawise generates dramatic
efficiency improvements.

In addition, if you were to graph
the percent of the targeted 2-way
combinations achieved by your
existing manually-selected test
cases, you would find that there are
many pairs of test inputs that were
never covered by your tests. The
fact that Hexawise will ensure
every pair of test inputs gets tested
in at least one test case is a big
part of the reason why Hexawise-
generated tests result in superior
coverage and more defects found
during test execution.

46

What is DoE-based testing?

Topic         Details

Definition    Design of Experiments-based testing is a test design approach used
              to identify a small subset of tests (from many possible ones) in
              order to find as many defects as possible in as few tests as possible.

Why it        Test conditions are constructed to ensure:
Works         • No combinations of conditions get accidentally omitted
              • Unproductive repetition is minimized

“AKA”         “Design of Experiments-based testing” covers several closely-
              related subjects:
              • Pairwise / AllPairs
              • Orthogonal Array / OA / OATs
              • 2-way, 3-way, ... t-way

47

Software Testing Challenges

• Software applications are very complex; it is impossible to test every possibility

• Extraordinarily smart, pragmatically-oriented applied statisticians created the field
of “Design of Experiments” to solve exactly this challenge; for the last 40+ years
they have developed highly effective math-based covering array techniques and
similar strategies which are now broadly used in many areas including
manufacturing, advertising, and agriculture

• These proven Design of Experiments techniques, which are designed to find out
as much information as possible in as few test cases as possible, also have direct
applicability to the software testing field

• Unfortunately, the vast majority of software testers in the relatively young field of
software testing have never heard of any Design of Experiments concepts like
MFAT vs. OFAT, Orthogonal Array coverage, pairwise coverage, or even the
existence of the “Design of Experiments” field

• Instead of using 40+ years of Design of Experiments-based knowledge to design
tests that are as effective as possible, testers almost always manually select the
combinations of test conditions they use in their tests, and as a result...
48

Results without DoE / Hexawise

... the results from manual test case selection efforts are consistently far
from optimal:

Missed combinations Wasteful repetition

49

Results with DoE / Hexawise

In contrast, Hexawise algorithms use Design of Experiments-based
methods to generate tests. The result is that Hexawise-generated
tests consistently ﬁnd more defects in fewer tests. Hexawise-
generated tests pack more coverage into each test.

50
