Bj Rollison discusses the challenges of using customer-provided, tester-generated, static, and random data for testing. Simple random data may not be representative of real data, and it may not be reproducible. The proposed solution is to generate probabilistic stochastic test data that models the real data population in a statistically unbiased, repeatable way. This is done by decomposing each data parameter, generating valid and invalid values that adhere to its rules, and using deterministic algorithms to produce valid random outputs. The approach provides variability while still resembling the expected real data.
2. Customer-provided data
Domain expertise
Generally very limited in scope
Tester-generated data
Happy path, probabilistic data
Input population poorly defined, human bias
Random data not representative of population
Static data files
Library of historical failure indicators
Too restrictive
Ineffective with multiple iterations
3. Large number of variables
Variable sequences can result in a virtually infinite number of combinations
It is impractical to test all values and combinations of values in any reasonable testing cycle
Example:
NetBIOS name: up to 15 characters
Using ASCII characters only, 82 are allowable (0x20 * + = | : ; " ? < > , are invalid)
Total number of possible input tests:
$82^{15} + 82^{14} + 82^{13} + \dots + 82^{1} = \sum_{k=1}^{15} 82^{k} = 51{,}586{,}566{,}049{,}662{,}994{,}687{,}009{,}994{,}574$
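The total above overflows any 64-bit integer, so checking it needs arbitrary-precision arithmetic. The sketch below uses System.Numerics.BigInteger to sum $82^k$ for $k = 1..15$ and, as a cross-check, the closed form of the same geometric series, $(a^{n+1} - a)/(a - 1)$. The class and method names are illustrative, not from the original deck.

```csharp
using System.Numerics;

// Illustrative helper (hypothetical names) for sizing an input space.
public static class NetBiosInputSpace
{
    // Sum of alphabetSize^k for k = 1..maxLength: every string of length
    // 1 to maxLength built from an alphabet of alphabetSize characters.
    public static BigInteger CountInputs(int alphabetSize, int maxLength)
    {
        BigInteger total = BigInteger.Zero;
        for (int k = 1; k <= maxLength; k++)
        {
            total += BigInteger.Pow(alphabetSize, k);
        }
        return total;
    }

    // Closed form of the same geometric series: (a^(n+1) - a) / (a - 1).
    // Assumes alphabetSize > 1.
    public static BigInteger CountInputsClosedForm(int alphabetSize, int maxLength)
    {
        BigInteger a = alphabetSize;
        return (BigInteger.Pow(a, maxLength + 1) - a) / (a - 1);
    }
}
```

`NetBiosInputSpace.CountInputs(82, 15)` yields 51,586,566,049,662,994,687,009,994,574, matching the slide; both methods agree.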
4. It does not "look" like real-world test data.
Years ago, developers would argue that a name text box couldn't contain a number!
To a computer, what is the difference between the strings Margaret and ksjCu9ls?
Random data is not reproducible.
A seeded random generator produces exactly the same sequence given the same seed value
Random data violates the constraints of real data.
Representative data from the population
Deterministic algorithms
5. Sampling is commonly used in risk-based testing
Samples must be representative
Samples must be statistically unbiased
Sample sets must include variability for breadth
Random data generation provides variability, but
Simple random data may not be representative
Simple random data is hard to reproduce
6. Goal – generate random data that is
Representative of the input data set
Statistically unbiased – a random sample of elements from a probability distribution
Value – input test data that
Provides greater variability
Includes expected and unexpected sequences
Eliminates human bias
Is better at evaluating robustness
Is dynamic!
7. System.Security.Cryptography.RandomNumberGenerator class
Designed for cryptography; its output is indistinguishable from random data
Cannot be seeded; no repeatability
System.Random class
Produces a sequence of numbers that meets certain statistical requirements for randomness
Can be seeded for repeatability
Not perfect, but reasonably random for practical purposes
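A minimal sketch of the difference between the two classes: two `System.Random` instances built with the same seed return the same sequence, while `RandomNumberGenerator` offers no seed, so its output cannot be replayed. The class and method names are illustrative. Microsoft's documentation notes that the `Random` algorithm is not guaranteed to stay the same across major .NET versions, so a logged seed replays reliably only on the same runtime.

```csharp
using System;
using System.Security.Cryptography;

// Illustrative comparison (hypothetical names).
public static class GeneratorComparison
{
    // Draws count integers in [0, maxExclusive) from a Random seeded with seed.
    // The same seed reproduces the same sequence on the same .NET runtime.
    public static int[] SeededSequence(int seed, int count, int maxExclusive)
    {
        var random = new Random(seed);
        var values = new int[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = random.Next(maxExclusive);
        }
        return values;
    }

    // Fills a buffer from the cryptographic generator. There is no seed
    // parameter, so the bytes cannot be regenerated for a repeat run.
    public static byte[] CryptoBytes(int count)
    {
        var bytes = new byte[count];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        return bytes;
    }
}
```

Calling `SeededSequence(42, 10, 100)` twice returns identical arrays; calling `CryptoBytes(16)` twice will almost certainly not.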
8. Comparison between the RandomNumberGenerator class and the Random class
[Figure: sample output plotted in red (RandomNumberGenerator) and blue (Random). Both are pseudo-random, with no obvious pattern. Based on a sample by Jeff Atwood, http://www.codinghorror.com]
9. User-defined seed
Tester provides a seed value for repeatability
Dynamic seed
A new seed value is generated at runtime
The seed value must be preserved in the test log
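```csharp
using System;

// Wrapper class name is illustrative; the original shows only the method.
public static class TestSeed
{
    // Returns the seed for this test run.
    // User-defined seed: if the tester passes a value, reuse it so the run repeats.
    // Dynamic seed: otherwise generate a new seed at runtime.
    // Either way, the caller must write the returned seed to the test log.
    public static int GetSeedValue(string seedValue)
    {
        // IsNullOrEmpty also guards against a null argument,
        // which int.Parse would reject
        if (!string.IsNullOrEmpty(seedValue))
        {
            return int.Parse(seedValue);
        }

        Random r = new Random();
        return r.Next();
    }
}
```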
10. Define the representative data set
Example – credit card numbers, e.g., 341846580149320
Card length (BIN + remaining digits) – between 14 and 19 digits, depending on card type
Bank Identification Number (BIN) – between 1 and 4 digits, depending on card type
Checksum – Luhn (Mod 10) algorithm
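The Luhn (Mod 10) checksum mentioned above can be sketched as follows. Starting from the rightmost digit (the check digit), every second digit is doubled, 9 is subtracted from any doubled value above 9 (equivalent to adding its two digits), and the number passes when the total is divisible by 10. The class name is illustrative.

```csharp
using System;

// Illustrative Luhn (Mod 10) validator (hypothetical class name).
public static class Luhn
{
    public static bool IsValid(string number)
    {
        if (string.IsNullOrEmpty(number)) return false;

        int sum = 0;
        bool doubleIt = false; // the rightmost (check) digit is not doubled
        for (int i = number.Length - 1; i >= 0; i--)
        {
            char c = number[i];
            if (c < '0' || c > '9') return false; // reject non-digits

            int digit = c - '0';
            if (doubleIt)
            {
                digit *= 2;
                if (digit > 9) digit -= 9;
            }
            sum += digit;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }
}
```

For the example number on this slide, `Luhn.IsValid("341846580149320")` returns true (the Luhn sum is 60); changing the last digit to 1 makes it fail.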
11. Equivalence class partitioning decomposes data into discrete valid and invalid class subsets
American Express
Valid class subsets: BIN – 34, 37; Length – 15 digits; Checksum – Mod 10
Invalid class subsets: Unassigned BINs; Length <= 14 digits; Length >= 16 digits; Fail checksum
Maestro
Valid class subsets: BIN – 5020, 5038, 6034, 6759; Length – 16, 18 digits; Checksum – Mod 10
Invalid class subsets: Unassigned BINs; Length <= 15 digits; Length == 17 digits; Length >= 19 digits; Fail checksum
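Invalid class subsets can be generated as deliberately as valid ones. This sketch covers two American Express invalid classes: a number that fails the checksum (shift the last digit of a valid number by 1 to 9, mod 10; because the last digit is never doubled, the Luhn sum can no longer be a multiple of 10) and a number whose length is anything other than 15. The names are hypothetical, and the 2 to 14 and 16 to 19 length ranges are assumptions (19 is the upper card length from the previous slide). Unassigned BINs are not covered here.

```csharp
using System;
using System.Text;

// Illustrative invalid-class generators (hypothetical names).
public static class AmexInvalidClasses
{
    // Luhn (Mod 10) check, repeated so this sketch compiles on its own.
    public static bool PassesLuhn(string number)
    {
        int sum = 0;
        for (int i = 0; i < number.Length; i++)
        {
            int d = number[number.Length - 1 - i] - '0';
            if (i % 2 == 1) // every second digit from the right
            {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
        }
        return sum % 10 == 0;
    }

    // "Fail checksum" class: same BIN and length, last digit shifted by 1..9.
    public static string FailChecksum(string validNumber, Random random)
    {
        if (!PassesLuhn(validNumber))
            throw new ArgumentException("Expected a number that passes the Mod 10 check.", nameof(validNumber));

        int last = validNumber[validNumber.Length - 1] - '0';
        int wrong = (last + random.Next(1, 10)) % 10; // never equals last
        return validNumber.Substring(0, validNumber.Length - 1) + (char)('0' + wrong);
    }

    // Invalid length class: a valid BIN (34 or 37) padded with random digits
    // to a length of 2..14 or 16..19 (assumed ranges).
    public static string WrongLength(Random random)
    {
        string[] bins = { "34", "37" };
        int length = random.Next(2) == 0 ? random.Next(2, 15) : random.Next(16, 20);
        var digits = new StringBuilder(bins[random.Next(bins.Length)]);
        while (digits.Length < length)
            digits.Append((char)('0' + random.Next(10)));
        return digits.ToString();
    }
}
```

Seeding the `Random` passed in keeps these invalid inputs as repeatable as the valid ones.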
12. [Diagram: Get credit card info. Inputs: card type and an optional seed. Components: seed generator; valid BIN number(s) and length; card length(s) by type; random number generator; Luhn algorithm validity check. Output: card number.]
13. Assigned BINs ensure the data looks real
The Mod 10 check ensures the data feels real
The result is representative of real data!
```
GetCardNumber(int cardType, int seed)
    bin = GetBin(cardType, seed)
    cardLength = GetCardLength(cardType, seed)
    cardNumber = bin
    random = new Random(seed)
    while (cardNumber.Length < cardLength)
        append a random digit (0 to 9) to cardNumber
    // exactly one of the ten possible last digits passes the Mod 10 check
    while (IsNotValidCardNumber(cardNumber))
        increment the last digit by 1, wrapping 9 to 0
    return cardNumber
```
Deterministic algorithm to generate a valid random credit card number
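A runnable C# sketch of the slide 13 algorithm, limited to the two card types in the equivalence class table (American Express and Maestro). It assumes one seeded `Random` drives the BIN choice, the length choice, and every digit, so the same card type and seed always give the same number. The enum, dictionary, and method names are illustrative, not the author's code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical card type enum covering only the two types from the table.
public enum CardType { AmericanExpress, Maestro }

public static class CardNumberGenerator
{
    // Valid class subsets: assigned BINs and allowed lengths per card type.
    static readonly Dictionary<CardType, (string[] Bins, int[] Lengths)> Specs =
        new Dictionary<CardType, (string[] Bins, int[] Lengths)>
        {
            [CardType.AmericanExpress] = (new[] { "34", "37" }, new[] { 15 }),
            [CardType.Maestro] = (new[] { "5020", "5038", "6034", "6759" }, new[] { 16, 18 }),
        };

    public static string GetCardNumber(CardType cardType, int seed)
    {
        var random = new Random(seed); // same seed, same card number
        var spec = Specs[cardType];
        string bin = spec.Bins[random.Next(spec.Bins.Length)];
        int length = spec.Lengths[random.Next(spec.Lengths.Length)];

        // BIN followed by random digits 0..9 up to the chosen length.
        var digits = new StringBuilder(bin);
        while (digits.Length < length)
            digits.Append((char)('0' + random.Next(10)));

        // Increment the last digit (wrapping 9 to 0) until the Mod 10 check
        // passes; at most ten tries, since exactly one last digit is valid.
        while (!IsValidLuhn(digits.ToString()))
        {
            int last = digits[digits.Length - 1] - '0';
            digits[digits.Length - 1] = (char)('0' + (last + 1) % 10);
        }
        return digits.ToString();
    }

    // Luhn (Mod 10) check: double every second digit from the right.
    public static bool IsValidLuhn(string number)
    {
        int sum = 0;
        for (int i = 0; i < number.Length; i++)
        {
            int d = number[number.Length - 1 - i] - '0';
            if (i % 2 == 1)
            {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
        }
        return sum % 10 == 0;
    }
}
```

Adding a card type (for example the JCB types on slide 15) would only mean adding an enum value and a `Specs` entry; the generation loop stays the same.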
14. Model test data – decompose the data set for each parameter using equivalence class partitioning
Generate test data – generate valid and invalid test data adhering to parameter properties, business rules, and the test hypothesis
Apply test data – apply the test data to the application under test
Verify results – verify the actual results against the expected results (oracle!)
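The four steps can be wired into a small harness. In this sketch the generator delegate stands for the modeled data, `applyToSut` is a hypothetical stand-in for the application under test, and `oracle` computes the expected result for the same input. Every failure record carries the run seed so the run can be replayed, as slide 9 recommends. All names are illustrative.

```csharp
using System;
using System.Collections.Generic;

// Illustrative model/generate/apply/verify loop (hypothetical names).
public static class StochasticTestRun
{
    public static List<string> Run(
        int seed,
        int iterations,
        Func<Random, string> generate,    // generate: modeled test data
        Func<string, bool> applyToSut,    // apply: application under test
        Func<string, bool> oracle)        // verify: expected result
    {
        var failures = new List<string>();
        var random = new Random(seed);
        for (int i = 0; i < iterations; i++)
        {
            string input = generate(random);
            bool actual = applyToSut(input);
            bool expected = oracle(input);
            if (actual != expected)
            {
                // Log the seed with the failure so it can be reproduced.
                failures.Add($"seed={seed} iteration={i} input={input} expected={expected} actual={actual}");
            }
        }
        return failures;
    }
}
```

In practice `generate` could call a seeded card number generator and `oracle` a Luhn check, while `applyToSut` invokes the real validation code.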
15. JCB Type 1: BIN = 35, Length = 16
JCB Type 2: BIN = 1800, 2131, Length = 15
19. Static test data wears out!
Random test data that is not repeatable or not representative may find defects, but…
Probabilistic stochastic test data
Is a modeled representation of the population
Is statistically unbiased
Is especially good at testing robustness
Recommendation: use both static (real-world) test data and probabilistic stochastic test data for breadth
21. "Practice .NET Testing with IR Data," Bj Rollison, http://www.stpmag.com/issues/stp-2007-06.pdf
"Automatic test data generation for path testing using a new stochastic algorithm," Bruno T. de Abreu, Eliane Martins, Fabiano L. de Sousa, http://www.sbbd-sbes2005.ufu.br/arquivos/16-%209523.pdf
"Data Generation Techniques for Automated Software Robustness Testing," Matthew Schmid & Frank Hill, http://www.cigital.com/papers/download/ictcsfinal.pdf
Tools: http://www.TestingMentor.com