EuroSTAR Software Testing Conference 2008 presentation on What's Wrong With My Test Data by Lauri Pietarinen. See more at conferences.eurostarsoftwaretesting.com/past-presentations/
1. What's Wrong with My Test Data?
Lauri Pietarinen
Relational Consulting
EuroSTAR 2008
2. My Background
• Tietokonepalvelu (Pension Insurance) 85-97
– Mainframe development in a PL/I / DL/I environment
– Support department 87-95
• Maintenance of prog. environment, DB2-training etc...
• AtBusiness Communications 97-04
– Internet applications
– Database design, DW-implementations, Java-programming,
Project management etc...
• Relational Consulting (own company) 04
– Independent database consultant
– Specialising in test data management
• Lauri.pietarinen (at) relational-consulting.com
3. Customers
• Finland
– Ilmarinen (Insurance)
– Arek (Insurance)
– TietoEnator
– Area (Travel agency)
– + many others…
• Sweden
– BGC
– Alecta
– SEB
4. Agenda
• Why is test data management important?
• Alternatives for populating test databases
• Technical issues involved
– what is needed (scope of data)?
– subsetting issues
– de-identifying
• Case: Pension Insurance Company in Sweden
8. Problems with test data
• Test data is not semantically valid
– errors in test programs have corrupted the database
– integrity over several systems
• external interfaces!
• Test data is not comprehensive
– hard to build realistic test cases
• Test data cases are consumed
– Contracts terminated and people declared dead
– "You can't step into the same river twice"
• Herakleitos
• Consequences
– programs can't even be started
– effort is spent solving errors caused by faulty data
10. How to Populate the Test DB?
[Diagram: alternatives for populating TEST from PROD — a 100% copy of production, a 5% extract, SQL scripts, and a robot driving the UI (e.g. QTP)]
11. How to Populate Test DB?
• Copy total production full volume into test
– + is comprehensive and intact
– + technically simple (can be done with standard tools)
– - heavy operation with big databases
– - test environment hard to use and maintain
– - ad hoc updates from production not possible
– - does not solve problem of consumption and corruption
• Scripts
– + create non-existent cases
– + only need SQL-editor
– - lots of repetitive work
– - go out of date
• Extract subset from production
– + right data when needed
– + same technology can be used to manage the subsets
– - need to build home-made tools/scripts or purchase a product
– - expert knowledge of database structure required
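The "extract a subset" option above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: it assumes a hypothetical two-table schema (customers and their orders) and copies only the chosen customers, with their related rows, from a "production" database into a "test" database.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customer(id),
                     amount REAL);
"""

def extract_subset(prod, test, customer_ids):
    """Copy the chosen customers and their orders from prod into test."""
    marks = ",".join("?" * len(customer_ids))
    for table, fk_col in (("customer", "id"), ("orders", "customer_id")):
        rows = prod.execute(
            f"SELECT * FROM {table} WHERE {fk_col} IN ({marks})",
            customer_ids).fetchall()
        if rows:
            slots = ",".join("?" * len(rows[0]))
            test.executemany(f"INSERT INTO {table} VALUES ({slots})", rows)
    test.commit()

# Demo: a tiny "production" database with two customers.
prod = sqlite3.connect(":memory:")
prod.executescript(SCHEMA)
prod.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
prod.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 99.0), (11, 1, 5.0), (12, 2, 42.0)])

test = sqlite3.connect(":memory:")
test.executescript(SCHEMA)
extract_subset(prod, test, [1])   # take customer 1 only: 1 customer, 2 orders
```

A real tool must also discover the relationships instead of hard-coding them — which is exactly the "expert knowledge of database structure" cost noted above.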
12. How to Extract?
• Home-made tools/scripts
– many organisations have such tools/scripts/programs
– effort needed to maintain them
• often tied to one person (who will soon be retired!)
• Generic products
– DataBee (Net 2000)
– Grid-Tools (Grid-Tools)
– Optim/Relational Tools (IBM)
– Data Express (Micro Focus)
14. Lots of issues still remain
• What is a test case?
– must define what is needed for the specific test
– customer, with orders or without?
– often simpler to extract superset of tables
• Finding the right cases for your test
– the green-haired, left-handed midget
– maintain library of keys and/or SQL-scripts?
• Bookkeeping (is somebody else already using this
case?)
• Integrity over applications
– External parties
– 3rd party software
15. Some Concepts (Optim)
• Extract
– start from a set of rows in start table and extract all related
rows from specified tables
– use referential integrity (RI) constraints or "soft relations" for navigation
• Extract File
– binary format file containing extracted data
• Insert
– add rows from extract file into database
• Delete
– delete rows that were extracted
• Compare
– compare two extracts and flag deleted, inserted and
modified rows
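The Extract concept above — start from a set of rows and pull everything reachable through declared relations — is essentially a graph traversal. Here is a minimal sketch under assumed names: a hypothetical person → contract → payment chain, with the "soft relations" declared as plain (parent table, child table, foreign-key column) triples rather than read from the catalog.

```python
import sqlite3

# Hypothetical demo schema: persons own contracts, contracts have payments.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE contract (id INTEGER PRIMARY KEY, person_id INTEGER);
CREATE TABLE payment  (id INTEGER PRIMARY KEY, contract_id INTEGER);
""")
db.executemany("INSERT INTO contract VALUES (?, ?)", [(100, 1), (101, 2)])
db.executemany("INSERT INTO payment VALUES (?, ?)", [(1000, 100), (1001, 100)])

# "Soft relations": (parent_table, child_table, child_fk_column).
RELATIONS = [("person", "contract", "person_id"),
             ("contract", "payment", "contract_id")]

def extract(db, start_table, start_keys):
    """From rows of start_table, collect the primary keys of every
    related row reachable through the declared relations."""
    result = {start_table: set(start_keys)}
    frontier = [(start_table, set(start_keys))]
    while frontier:
        table, keys = frontier.pop()
        for parent, child, fk in RELATIONS:
            if parent != table or not keys:
                continue
            marks = ",".join("?" * len(keys))
            rows = db.execute(
                f"SELECT id FROM {child} WHERE {fk} IN ({marks})",
                sorted(keys)).fetchall()
            new = {r[0] for r in rows} - result.get(child, set())
            if new:
                result.setdefault(child, set()).update(new)
                frontier.append((child, new))
    return result

# Starting from person 1 pulls contract 100 and both of its payments.
result = extract(db, "person", [1])
```

An Insert step would then copy exactly these keyed rows into the test database, and a Delete step would remove them again.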
20. Impact on Program/DB Design
• Batch programs should be able to operate on subsets
of cases
– so as not to consume and disturb the whole database!
– external parameters (e.g. list of customers) or other
indicators
– new columns/tables in database for subsetting?
• Soft Date
– Don't get date from the system, give it as a parameter
• Choice of identifiers
– surrogate keys/logical keys
• How identifiers are generated
– surrogate table
– sequence
– select max(key)+1 from table
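The "soft date" idea above is simple to show in code. A sketch with invented names — `monthly_billing_batch` is hypothetical — where the batch takes its business date as a parameter and only falls back to the system clock when none is given:

```python
from datetime import date

def monthly_billing_batch(customers, run_date=None):
    """'Soft date': the business date arrives as a parameter;
    only a production run falls back to the system clock."""
    run_date = run_date or date.today()
    # A real batch would bill each customer as of run_date;
    # here we just record what would happen.
    return [(c, run_date.isoformat()) for c in customers]

# Life-cycle test: replay ten monthly runs in seconds, not months.
runs = [monthly_billing_batch(["cust-1"], run_date=date(2008, m, 1))
        for m in range(1, 11)]
```

Combined with a customer-list parameter for subsetting, this is what lets batches operate on a handful of cases without consuming the whole database.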
21. Case: Company X
• X is a Swedish insurance company that
specialises in Labour Pension Insurances
• X recently renewed nearly all of its application portfolio
– Billing, payouts, insurance, DW, actuary,
extranet...
– Went live April 1st 2008
• Large project with a budget of about 100M€
– development time 6 years
– up to 150 persons involved in the project
22. Case: X
• Technical platforms: >5
• Kinds of DBMS's: 4
• Number of databases: ~20
• Number of tables: >1200
• Number of integration interfaces: ~100
• Number of batches: 150
• Number of online dialogs: 100
• Number of test cases: > 1400
24. Case: X
Optim EXTRACT Process Report
Extract File : K87376.TDSRES.B54.S004.EAF.SEXT.XF
Access Definition : TDS.EAF.EXTRACT
Created by : Job K87376, using SQLID K87376 on DB2 Subsystem DB2P
Time Started : 2008-03-26 08.21.27
Time Finished : 2008-03-26 08.48.37
Process Options:
Process Mode : Batch
Retrieve Data using : DB2
Limit Extract Rows : 40000000
RowList : 'K87376.TDSRES.B54.S004.EAF.SEXT.PNS'
Total Number of Extract Tables : 112
Total Number of Extracted Rows : 6676868
Total Number of First Pass Start Table Rows : 117172
5% of a total of 2M persons
25. X: Life Cycle Tests
• Test environment was loaded with one
person at a time
– from a set of about 30 persons with different
profiles
• 10 months worth of batches were run at the
rate of about 7 min/month
– Batches used "soft date" to simulate time flow
• Before/after compares were made on the
database
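The before/after compare used in these life-cycle tests can be illustrated with a toy diff. This is a sketch, not the tool's actual algorithm: each snapshot of one table is assumed to be a dict keyed by primary key, and rows are flagged as inserted, deleted, or modified.

```python
def compare_extracts(before, after):
    """Diff two snapshots of one table, each a {primary_key: row} dict,
    flagging rows as inserted, deleted or modified."""
    inserted = sorted(after.keys() - before.keys())
    deleted  = sorted(before.keys() - after.keys())
    modified = sorted(k for k in before.keys() & after.keys()
                      if before[k] != after[k])
    return inserted, deleted, modified

before = {1: ("Alice", 100.0), 2: ("Bob", 50.0)}
after  = {1: ("Alice", 120.0), 3: ("Carol", 10.0)}
ins, dele, mod = compare_extracts(before, after)
# → inserted [3], deleted [2], modified [1]
```

Running such a diff after each simulated month makes unexpected side effects of a batch visible immediately.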
26. De-identifying Sensitive Data
• Tightening regulation
• Outsourcing
– providing your contractors with good test data is
essential
– however, security issues become important
27. De-identifying Issues
• How to de-identify
– use an algorithm to create a new id (e.g. social security number)
– create a random id and save it in a lookup table
• always reuse the same mapping?
– use a random lookup table for names
• Issues
– propagating changes to foreign keys
– introducing company-wide schemes
– introducing schemes that extend beyond the company
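The lookup-table approach above, and why it solves the foreign-key propagation issue, can be sketched as follows. The ids and the `TEST-…` format are invented for illustration; the point is that the same real identifier always maps to the same replacement, so references between tables stay consistent.

```python
def pseudonymise(real_id, lookup):
    """Replace a real identifier (e.g. a social security number) with a
    generated test id. The lookup table guarantees the same real id
    always maps to the same fake id, so foreign keys stay in step."""
    if real_id not in lookup:
        lookup[real_id] = f"TEST-{len(lookup) + 1:06d}"
    return lookup[real_id]

lookup = {}
# Hypothetical data: two persons, two contracts both owned by the first.
persons   = [("190101-123X", "Alice"), ("200202-456Y", "Bob")]
contracts = [("C1", "190101-123X"), ("C2", "190101-123X")]

persons_out   = [(pseudonymise(ssn, lookup), name) for ssn, name in persons]
contracts_out = [(cid, pseudonymise(ssn, lookup)) for cid, ssn in contracts]
# Both contracts still point at the same, now de-identified, person.
```

The lookup table itself becomes sensitive data — it maps fake ids back to real ones — so it must be stored as securely as the production data, which is one reason company-wide schemes are hard to introduce.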
28. Create a TDM-System
• Wrapping it up by building a Test Data
Management System
– process for copying data from one environment to
the other
– automated system to minimize manual work
involved
– embed bookkeeping and de-identifying
– Auditing and statistics "for free"
30. Test Data Management
• Plan
• Centralize
• Use a tool
Test data management must be
an integral part of application
testing and development and not
an afterthought