This document provides guidance on creating a project plan for testing a data warehouse project. It discusses key aspects to consider such as reviewing documentation, estimating resources like test engineers, determining the number of ETL legs and release cycles, assessing test complexity, and ensuring the test automation tool QuerySurge is configured. An example project plan estimates the time to review documentation, identifies one test engineer and ETL leg, plans for four release cycles, and provides estimates of 7 low complexity, 21 medium complexity, and 8 high complexity tests.
Creating a Project Plan for a Data Warehouse Testing Assignment
Chris Thompson
Senior Solutions Architect
Mike Calabrese
Senior Solutions Architect
QuerySurge™
the smart Data Testing solution
Chris Thompson
SENIOR DOMAIN EXPERT, DATA TESTING PRACTICE
• Military veteran: aviation electronics technician in the U.S. Navy
• BS in Computer Science from the University of Delaware
• Successful implementations of QA projects in the data space for over 15 years
• RTTS employee for the past 21 years
• Started with RTTS as an entry-level Test Engineer
• Worked in numerous fields, including pharmaceutical, utilities, and retail
Mike Calabrese
SENIOR DOMAIN EXPERT, DATA TESTING PRACTICE
• Joined RTTS as a Test Engineer in 2009
• Over a decade of experience successfully implementing automated functional, data validation, and ETL testing solutions for multiple clients across many industry verticals
• Technical expert on QuerySurge, RTTS’ flagship data testing solution; supports clients around the world with their QuerySurge implementations
• BS in Computer Engineering from Hofstra University
Introduction
• Data Testing is an integral part of the development of any data project, including data warehouse, data migration, and data integration projects
• Bad data caused by defects can lead companies to make decisions that cost millions of dollars or, in a health-related field, cost dearly in other ways
(Company identified only by a logo in the original slide) handles more than 1 million customer transactions every hour
• Data is imported into databases that contain more than 2.5 petabytes of data
• This is equivalent to 167 times the information contained in all the books in the U.S. Library of Congress
Facebook handles 40 billion photos from its user base.
Google processes 1 terabyte per hour.
Twitter processes 85 million tweets per day.
eBay processes 80 terabytes per day.
What is a Data Source?
• A Data Source is a pool of data available for extraction.
• The concept of the Data Source is technologically neutral – it is not associated
with any specific technology.
• The most common Data Sources are databases, files, and XML documents.
What is a Data Warehouse? (In this case, the target)
• A collection of data or information intended to support business
decision making.
• Data Warehouses contain a wide variety of data that present a
coherent picture of business conditions.
• A Data Warehouse is a huge repository of electronically organized
data mainly meant for the purpose of reporting and analysis.
• Most Data Warehouses receive data from multiple sources (databases and files).
• A place where historical data is stored for archival, analysis and
security purposes.
[Diagram: Legacy DB, CRM/ERP DB, Finance DB]
What is ETL?
• In computing, the term Extract, Transform and Load (ETL) refers to a data handling process that involves:
− Extracting data from outside sources
− Transforming data to fit operational or reporting needs
− Loading data into the endpoint target (usually a database, more specifically a Data Warehouse)
• Why ETL? Businesses need to load the Data Warehouse regularly (incrementally/daily/weekly) so that it can serve its purpose of supporting business analysis
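The three ETL steps can be sketched in a few lines of Python. This is only an illustration using in-memory SQLite databases; the table names (orders, fact_orders), the columns, and the sample rows are hypothetical and do not come from any real project.

```python
import sqlite3

def extract(source):
    """Extract: read raw rows from the (hypothetical) source system."""
    return source.execute(
        "SELECT id, first_name, last_name, amount_cents FROM orders ORDER BY id"
    ).fetchall()

def transform(rows):
    """Transform: merge the name fields and convert cents to dollars."""
    return [(oid, f"{first} {last}", cents / 100.0)
            for oid, first, last, cents in rows]

def load(target, rows):
    """Load: write the transformed rows into the target warehouse table."""
    target.executemany(
        "INSERT INTO fact_orders (order_id, customer_name, amount_usd) "
        "VALUES (?, ?, ?)", rows)
    target.commit()

# Hypothetical source system with two sample rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders "
               "(id INTEGER, first_name TEXT, last_name TEXT, amount_cents INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                   [(1, "Ada", "Lovelace", 1250), (2, "Alan", "Turing", 990)])

# Hypothetical target Data Warehouse table.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE fact_orders "
               "(order_id INTEGER, customer_name TEXT, amount_usd REAL)")

load(target, transform(extract(source)))
loaded = target.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(loaded)
```

A production ETL job would typically be built with a dedicated ETL tool and scheduled to run incrementally; the sketch only shows where each of the three steps sits.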
[Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process (Extract, Transform, Load) → Target DWH]
[Diagram: transactional sources (Inventory: ‘We have 212 Widgets in the east warehouse’; Customer Service: ‘The paint came off my widget’; Advertising: ‘Running a new radio ad today’), Data Warehouse, Data Marts, and BI Tools, arranged from Transactional to Analytical]
Test Points and “ETL Legs”
• An ‘ETL Leg’ refers to a single ETL process that moves/transforms data between
two discrete points.
• A full ETL process may have multiple legs
• Test points are usually across single ETL legs –
the verification is between the source and
the target for that leg.
• Example: an operational source database
(source test point) is extracted, transformed
and loaded into a Data Warehouse (target test point).
Testing is conducted across this ETL leg.
[Diagram: Inventory → Data Warehouse]
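A test across a single ETL leg can be sketched as a comparison between a query against the source test point and a query against the target test point. The example below is a minimal illustration in plain Python with SQLite, not QuerySurge itself; the table names (src_inventory, dw_inventory) and the data are hypothetical, and one target row is deliberately wrong to show a detected defect.

```python
import sqlite3

def compare_leg(source_rows, target_rows):
    """Return (rows missing from the target, rows unexpected in the target)."""
    src, tgt = set(source_rows), set(target_rows)
    return sorted(src - tgt), sorted(tgt - src)

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE src_inventory (sku TEXT, qty INTEGER);  -- hypothetical source test point
CREATE TABLE dw_inventory  (sku TEXT, qty INTEGER);  -- hypothetical target test point
INSERT INTO src_inventory VALUES ('WIDGET', 212), ('GADGET', 40);
INSERT INTO dw_inventory  VALUES ('WIDGET', 212), ('GADGET', 4);
""")

source_rows = db.execute("SELECT sku, qty FROM src_inventory").fetchall()
target_rows = db.execute("SELECT sku, qty FROM dw_inventory").fetchall()

missing, unexpected = compare_leg(source_rows, target_rows)
print("Missing from target:", missing)
print("Unexpected in target:", unexpected)
```

Because the comparison uses sets, duplicate rows are collapsed; a fuller check would also compare row counts on each side of the leg.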
[Diagram: Data Sources (Legacy DB, CRM/ERP DB, Finance DB), Staging, Target DW, and Data Mart, connected by ETL processes; each ETL process between two points is one ETL Leg]
Single Leg vs. Multi Leg Approaches
Single Leg                         | Multi Leg
More tests need to be created      | Fewer tests need to be created
Tests are less complex             | Tests are more complex
Defects are easier to pinpoint     | Defects are more difficult to pinpoint
Execution time tends to be longer  | Execution time tends to be shorter
Data Mapping Document
A data mapping document is frequently called a source-to-target map and is
generally created in a spreadsheet.
This document acts as a central part of the functional requirements. The following
information is contained within the mapping document:
• Source database information
▪ Source table
▪ Source column
• Target database information
▪ Target table
▪ Target column
• Data transformation logic
• Optional requirements
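The fields listed above can be pictured as one record per mapping row. The sketch below is an illustrative data structure only; the class name MappingRow, the helper sources_by_target, and the sample table and column names are hypothetical, chosen to mirror the spreadsheet columns described on this slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MappingRow:
    """One row of a source-to-target map (field names are illustrative)."""
    source_table: str
    source_column: str
    target_table: str
    target_column: str
    transformation: str = "Direct map"   # data transformation logic
    notes: str = ""                      # optional requirements

# Hypothetical sample rows.
mappings = [
    MappingRow("customers", "first_name", "dim_customer", "full_name",
               "Merge first_name and last_name"),
    MappingRow("customers", "last_name", "dim_customer", "full_name",
               "Merge first_name and last_name"),
    MappingRow("customers", "id", "dim_customer", "customer_id"),
]

def sources_by_target(rows):
    """Group source columns by the target column they feed."""
    grouped = {}
    for row in rows:
        key = f"{row.target_table}.{row.target_column}"
        grouped.setdefault(key, []).append(
            f"{row.source_table}.{row.source_column}")
    return grouped

print(sources_by_target(mappings))
```

Grouping by target column makes it easy to see which targets depend on more than one source field, which is often where test design needs the most attention.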
Transformation Types
• Direct Map
• Selective column and row type
• Translation
• Lookups
• Transpose
• Field Splitting
• Field Merging
• Calculated and Derived
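Several of these transformation types can be illustrated with small Python functions. These are simplified sketches on hypothetical values; in a real ETL process the same logic would normally live in SQL or in the ETL tool.

```python
def translate(code, lookup):
    """Translation / lookup: replace a source code with its target value."""
    return lookup[code]

def split_field(full_name):
    """Field splitting: one source field becomes two target fields."""
    first, _, last = full_name.partition(" ")
    return first, last

def merge_fields(first, last):
    """Field merging: two source fields become one target field."""
    return f"{first} {last}"

def derive_total(quantity, unit_price):
    """Calculated/derived: the target value is computed from source values."""
    return quantity * unit_price

def transpose(rows):
    """Transpose: source rows become target columns."""
    return [list(column) for column in zip(*rows)]

# Hypothetical sample values.
print(translate("NY", {"NY": "New York"}))
print(split_field("Ada Lovelace"))
print(merge_fields("Ada", "Lovelace"))
print(derive_total(3, 2))
print(transpose([[1, 2], [3, 4]]))
```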
Testing Methods – Automation Tool
• Automation with QuerySurge offers
− Bulk data verification, testing sample sizes up to 100%
− Management of test assets
− Test Scheduling
− Persistent access to test data
− Reporting
An automated data testing approach with QuerySurge can significantly improve
coverage, organization and efficiency when compared to the previously mentioned
manual testing techniques.
The Project Plan
What you will need:
− Gather project documents and assets
• Mapping documents
• Requirement documents
• Data Model documents
− Estimate the time to review documentation
− Determine the number of test engineer resources
− Determine the number of ETL or test legs
− Determine the number of cycles or releases
− Determine complexity of project mappings
• Low Complexity: No transformation logic (1-to-1 mapping) or minor transformation
logic including a change to data types from source to target, selective row filtering, and
minor translations
• Medium Complexity: Transformation logic including translations, joins across tables,
field splitting, and field merging
• High Complexity: Transformation logic including major translations, multiple joins across
tables, calculated or aggregated fields, transposing, derived fields, match and merge.
− Is QuerySurge installed and configured for the project?
− Do the lead or test engineers require training?
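The complexity levels above can be applied mechanically once each mapping is tagged with the kinds of transformation logic it contains. The following is a minimal sketch: the tag strings are hypothetical labels derived from the three definitions, and the rule "highest level present wins" is an assumption, not something the definitions state.

```python
# Hypothetical tags taken from the Low / Medium / High definitions above.
MEDIUM = {"translation", "join", "field splitting", "field merging"}
HIGH = {"major translation", "multiple joins", "calculated field",
        "aggregated field", "transpose", "derived field", "match and merge"}

def classify(transformations):
    """Classify one mapping by the most complex transformation it contains.

    Anything not tagged Medium or High (including no transformation logic
    at all, i.e. a 1-to-1 mapping) is treated as Low.
    """
    tags = {t.lower() for t in transformations}
    if tags & HIGH:
        return "High"
    if tags & MEDIUM:
        return "Medium"
    return "Low"

print(classify([]))                                   # 1-to-1 mapping
print(classify(["data type change"]))                 # minor logic only
print(classify(["join", "field merging"]))
print(classify(["join", "aggregated field"]))
```

Counting the results of such a classification across all mappings gives the Low, Medium, and High test counts used in the plan.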
Question                   | Answer
Review documentation       | 4
Number of Test engineers   | 1
Number of ETL Legs         | 1
Number of Releases/Cycles  | 4
Low Complexity Tests       | 7
Medium Complexity Tests    | 21
High Complexity Tests      | 8