Bj Rollison - Probabilistic Stochastic Test Data

Bj Rollison
Test Architect
Microsoft
http://www.TestingMentor.com
http://blogs.msdn.com/imtesty

Customer provided data
  Domain expertise
  Generally very limited in scope
Tester generated data
  Happy path, probabilistic data
  Input population poorly defined, human bias
  Random data not representative of population
Static data files
  Library of historical failure indicators
  Too restrictive
  Ineffective with multiple iterations

Large number of variables
Variable sequences can result in a virtually infinite number of combinations
Impractical to test all values and combinations of values in any reasonable testing cycle
Example:
  NetBIOS name: up to 15 alphanumeric characters
  Using ASCII-only characters, 82 are allowable (0x20 * + = | : ; “ ? < > , are invalid)
  Total number of possible input tests equals
  82^15 + 82^14 + 82^13 + … + 82^1 = 51,586,566,049,662,994,687,009,994,574
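The total above is a sum of powers, one term per possible name length from 1 to 15. As a minimal sketch, it can be checked with arbitrary-precision arithmetic; the class and method names here (NetBiosInputSpace, CountStrings) are illustrative and not from the original slides.

```csharp
using System;
using System.Numerics;

public static class NetBiosInputSpace
{
    // Counts every string of length 1..maxLength over an alphabet of the given
    // size, i.e. alphabetSize^1 + alphabetSize^2 + ... + alphabetSize^maxLength.
    public static BigInteger CountStrings(int alphabetSize, int maxLength)
    {
        BigInteger total = BigInteger.Zero;
        BigInteger power = BigInteger.One;
        for (int length = 1; length <= maxLength; length++)
        {
            power *= alphabetSize;   // alphabetSize^length
            total += power;
        }
        return total;
    }

    public static void Main()
    {
        // 82 allowable characters, names of 1 to 15 characters.
        Console.WriteLine(CountStrings(82, 15)); // 51586566049662994687009994574
    }
}
```

A long or double would overflow or lose precision at this magnitude, which is why the sketch uses System.Numerics.BigInteger.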

It does not “look” like real-world test data.
  Years ago developers would argue that a name textbox couldn’t contain a number!
  To a computer, what is the difference between the strings Margaret and ksjCu9ls?
Random data is not reproducible.
  A seeded random generator will produce the exact same result given the same seed value
Random data violates constraints of real data.
  Representative data from population
  Deterministic algorithms
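The reproducibility point can be illustrated with a small sketch: two generators built from the same seed produce the same "unreal looking" string. The alphabet, the seed value 12345, and the names SeededStrings and Generate are assumptions made for this example.

```csharp
using System;
using System.Text;

public static class SeededStrings
{
    // Illustrative alphabet: ASCII letters and digits only.
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Builds a random string whose content depends only on the seed and length.
    public static string Generate(int seed, int length)
    {
        var random = new Random(seed);
        var builder = new StringBuilder(length);
        for (int i = 0; i < length; i++)
        {
            builder.Append(Alphabet[random.Next(Alphabet.Length)]);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        string first = Generate(12345, 8);
        string second = Generate(12345, 8);
        Console.WriteLine(first);
        Console.WriteLine(first == second); // True
    }
}
```

The sequence is repeatable on the same runtime. The .NET documentation does not guarantee that System.Random keeps the same implementation across major versions, so a logged seed is best replayed on the runtime version that produced it.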

Sampling is commonly used in risk-based testing
  Samples must be representative
  Samples must be statistically unbiased
  Sample set must include variability for breadth
Random data generation provides variability, but
  Simple random data may not be representative
  Simple random data is hard to reproduce

Goal – generate random data that is
  Representative of the input data set
  Statistically unbiased - random sample of elements from a probability distribution
Value – input test data that
  Provides greater variability
  Includes expected and unexpected sequences
  Eliminates human bias
  Is better at evaluating robustness
  Is dynamic!

System.Security.Cryptography.RandomNumberGenerator class
  Encrypted data indistinguishable from random
  Cannot be seeded; no repeatability
System.Random class
  Sequence of numbers that meet certain statistical requirements for randomness
  Can be seeded for repeatability
  Not perfect, but reasonably random for practical purposes
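A minimal sketch of the practical difference between the two classes: both can fill a byte buffer, but only System.Random accepts a seed. The wrapper names (RandomSources, FromCryptoRng, FromSeededRandom) and the seed value 42 are illustrative.

```csharp
using System;
using System.Security.Cryptography;

public static class RandomSources
{
    // Cryptographic generator: no seed parameter exists, so runs cannot be replayed.
    public static byte[] FromCryptoRng(int count)
    {
        byte[] buffer = new byte[count];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(buffer);
        }
        return buffer;
    }

    // Seeded generator: the same seed yields the same bytes on the same runtime.
    public static byte[] FromSeededRandom(int seed, int count)
    {
        byte[] buffer = new byte[count];
        new Random(seed).NextBytes(buffer);
        return buffer;
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(FromCryptoRng(8)));        // differs on every run
        Console.WriteLine(BitConverter.ToString(FromSeededRandom(42, 8))); // identical to the next line
        Console.WriteLine(BitConverter.ToString(FromSeededRandom(42, 8)));
    }
}
```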

Comparison between RandomNumberGenerator class and Random class
[Figure: plotted output of both generators. Red – RNG, Blue – Random]
  Both pseudo-random
  No obvious pattern
Based on a sample by Jeff Atwood, http://www.codinghorror.com

User defined seed
  Tester provides seed value for repeatability
Dynamic seed
  New seed value generated at runtime
  Seed variable must be preserved in test log
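public static int GetSeedValue(string seedValue)
{
    // User defined seed: parse the value supplied by the tester.
    if (!string.IsNullOrEmpty(seedValue))
    {
        return int.Parse(seedValue);
    }

    // Dynamic seed: generate a new value at run time.
    // The caller must write the returned seed to the test log.
    Random r = new Random();
    return r.Next();
}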
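One way the seed might be preserved in practice is sketched below: resolve the seed once, write it to the log before any data is generated, and build the generator from it. ResolveSeed is a hypothetical variant of GetSeedValue, and Console output stands in for whatever test log the harness actually uses.

```csharp
using System;
using System.Globalization;

public static class SeedLoggingExample
{
    // Hypothetical variant of GetSeedValue: an empty argument means "dynamic seed".
    public static int ResolveSeed(string seedArgument)
    {
        if (string.IsNullOrWhiteSpace(seedArgument))
        {
            return new Random().Next();
        }
        return int.Parse(seedArgument, CultureInfo.InvariantCulture);
    }

    public static void Main(string[] args)
    {
        int seed = ResolveSeed(args.Length > 0 ? args[0] : string.Empty);

        // Log the seed first, so a failing run can be replayed with the same value.
        Console.WriteLine("Seed: " + seed);

        var random = new Random(seed);
        Console.WriteLine("First value: " + random.Next(0, 100));
    }
}
```

Passing the logged seed back as the first command-line argument should reproduce the same sequence on the same runtime.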

Define the representative data set
Example – Credit card numbers
341846580149320
Card length (BIN + digits) – between 14 and 19 depending on card type
Bank Identification Number (BIN) – between 1 and 4 digits depending on card type
Checksum – Luhn (Mod 10) algorithm
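The Luhn (Mod 10) check can be sketched as follows: starting from the rightmost digit, every second digit is doubled (subtracting 9 when the result exceeds 9), and the number is valid when the digit sum is divisible by 10. The class and method names are illustrative.

```csharp
using System;

public static class Luhn
{
    // Returns true when the digit string passes the Luhn (Mod 10) check.
    public static bool IsValid(string number)
    {
        if (string.IsNullOrEmpty(number)) return false;

        int sum = 0;
        bool doubleDigit = false;   // the rightmost digit is not doubled
        for (int i = number.Length - 1; i >= 0; i--)
        {
            char c = number[i];
            if (c < '0' || c > '9') return false;

            int digit = c - '0';
            if (doubleDigit)
            {
                digit *= 2;
                if (digit > 9) digit -= 9;   // same as adding the two digits of the product
            }
            sum += digit;
            doubleDigit = !doubleDigit;
        }
        return sum % 10 == 0;
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("341846580149320")); // True  (the example number above)
        Console.WriteLine(IsValid("341846580149321")); // False (last digit changed)
    }
}
```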

Equivalence class partitioning decomposes data
into discrete valid and invalid class subsets
Card type          Valid class subsets             Invalid class subsets
(Input variable)   (Valid input)                   (Invalid input)

American Express   BIN – 34, 37                    Unassigned BINs
                   Length – 15 digits              Length >= 16 digits
                   Checksum – Mod 10               Length <= 14 digits
                                                   Fail checksum

Maestro            BIN – 5020, 5038, 6034, 6759    Unassigned BINs
                   Length – 16, 18                 Length <= 15 digits
                   Checksum – Mod 10               Length >= 19 digits
                                                   Length == 17 digits
                                                   Fail checksum

[Diagram: card number generator]
  Input (card type) → Get credit card info: valid BIN number(s) and card length(s) by type
  Input (optional seed) → Seed generator → Random number generator
  Is valid? – Luhn algorithm
  Output (card #)

Assigned BINs ensure the data looks real
The Mod 10 check ensures the data feels real
Result is representative of real data!
Deterministic algorithm to generate a valid random credit card number:
GetCardNumber(int cardType, int seed)
    Get BIN (cardType, seed);
    Get CardLength (cardType, seed);
    Assign BIN to cardNumber;
    Generate a new random object;
    while (cardNumberLength < CardLength)
        Generate a random digit from 0 to 9;
        Append it to the cardNumber;
    while (IsNotValidCardNumber(cardNumber))
        increment last digit by 1 (wrapping 9 to 0);
    return cardNumber;
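The generation steps above can be sketched in C# as follows. This is one possible reading of the slide, not the author's implementation: the signature takes the valid BINs and lengths directly instead of a card type, a single Random object drives every choice, and the names CardNumberGenerator and PassesLuhn are illustrative.

```csharp
using System;
using System.Text;

public static class CardNumberGenerator
{
    // Assumption: the caller passes the valid BINs and lengths for one card type.
    public static string GetCardNumber(string[] bins, int[] lengths, int seed)
    {
        var random = new Random(seed);
        string bin = bins[random.Next(bins.Length)];
        int length = lengths[random.Next(lengths.Length)];

        // Start with the assigned BIN, then pad with random digits 0..9.
        var digits = new StringBuilder(bin);
        while (digits.Length < length)
        {
            digits.Append((char)('0' + random.Next(10)));
        }

        // Step the last digit (wrapping 9 to 0) until the Mod 10 check passes.
        // Exactly one of the ten possible last digits passes, so the loop ends.
        int last = digits.Length - 1;
        while (!PassesLuhn(digits.ToString()))
        {
            digits[last] = (char)('0' + (digits[last] - '0' + 1) % 10);
        }
        return digits.ToString();
    }

    // Compact Luhn (Mod 10) check over a string of ASCII digits.
    public static bool PassesLuhn(string number)
    {
        int sum = 0;
        bool doubleDigit = false;
        for (int i = number.Length - 1; i >= 0; i--)
        {
            int digit = number[i] - '0';
            if (doubleDigit)
            {
                digit *= 2;
                if (digit > 9) digit -= 9;
            }
            sum += digit;
            doubleDigit = !doubleDigit;
        }
        return sum % 10 == 0;
    }

    public static void Main()
    {
        // American Express: BIN 34 or 37, 15 digits. The seed 2024 is arbitrary.
        Console.WriteLine(GetCardNumber(new[] { "34", "37" }, new[] { 15 }, 2024));
    }
}
```

Because the last digit is never doubled by the Luhn check, cycling it through 0 to 9 changes the checksum by one each step, which is why the correction loop needs at most nine increments.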

Model test data → Generate test data → Apply test data → Verify results
Model test data: Decompose the data set for each parameter using equivalence class partitioning
Generate test data: Generate valid and invalid test data adhering to parameter properties, business rules, and test hypothesis
Apply test data: Apply the test data to the application under test
Verify results: Verify the actual results against the expected results – oracle!

JCB Type 1
BIN = 35; Len = 16
JCB Type 2
BIN = 1800, 2131; Len = 15

Robust testing
Multi-language input testing
String length fixed or variable
Seed value
Custom range for greater control
Unicode language families
Assigned code points
Reserved characters
Unicode surrogate pairs
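A few of the options listed above (seed value, custom range, assigned code points, surrogate pairs) can be sketched as a seeded Unicode string generator. This is an illustration only and is not the tool shown in the slides; the names and the example ranges are assumptions.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class UnicodeStringGenerator
{
    // Generates 'count' assigned code points from [minCodePoint, maxCodePoint].
    // Note: the loop does not terminate if the range contains no assigned code point.
    public static string Generate(int seed, int count, int minCodePoint, int maxCodePoint)
    {
        if (minCodePoint < 0 || maxCodePoint > 0x10FFFF || minCodePoint > maxCodePoint)
        {
            throw new ArgumentOutOfRangeException("minCodePoint");
        }

        var random = new Random(seed);
        var builder = new StringBuilder();
        int produced = 0;
        while (produced < count)
        {
            int codePoint = random.Next(minCodePoint, maxCodePoint + 1);

            // Skip lone surrogate values; they are not valid code points on their own.
            if (codePoint >= 0xD800 && codePoint <= 0xDFFF) continue;

            // Code points above U+FFFF come back as a two-char surrogate pair.
            string text = char.ConvertFromUtf32(codePoint);

            // Skip code points the runtime's Unicode data reports as unassigned.
            if (CharUnicodeInfo.GetUnicodeCategory(text, 0) == UnicodeCategory.OtherNotAssigned) continue;

            builder.Append(text);
            produced++;
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // Ten Cyrillic letters (U+0410..U+044F) from seed 7.
        Console.WriteLine(Generate(7, 10, 0x0410, 0x044F));
    }
}
```

Which code points count as assigned depends on the Unicode data shipped with the runtime, so the same seed may yield different strings on runtimes with different Unicode versions.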

1000 Unicode characters
from the sample population

Character corruption and
data loss
135 characters (bytes)
obvious data loss

Static test data wears out!
Random test data that is not repeatable or not
representative may find defects, but…
Probabilistic stochastic test data
Is a modeled representation of the population
Is statistically unbiased
Is especially good at testing robustness
Recommend using both static (real-world) test data and probabilistic stochastic test data for breadth

Helping Testers Unleash Their Potential!™
Bj.Rollison@TestingMentor.com

Practice .NET Testing with IR Data
Bj Rollison
http://www.stpmag.com/issues/stp-2007-06.pdf
Automatic test data generation for path testing
using a new stochastic algorithm
Bruno T. de Abreu, Eliane Martins, Fabiano L. de Sousa
http://www.sbbd-sbes2005.ufu.br/arquivos/16-%209523.pdf
Data Generation Techniques for Automated
Software Robustness Testing
Matthew Schmid & Frank Hill
http://www.cigital.com/papers/download/ictcsfinal.pdf
Tools
