a software division of
Creating a Project Plan for a Data Warehouse Testing Assignment
Chris Thompson
Senior Solutions Architect
Mike Calabrese
Senior Solutions Architect
QuerySurge™
the smart Data Testing solution
Chris Thompson
SENIOR DOMAIN EXPERT, DATA TESTING PRACTICE
• Military veteran: aviation electronics technician in the U.S. Navy
• BS in Computer Science from the University of Delaware
• Successful implementations of QA projects in the data space for over 15 years
• Employee of RTTS for the past 21 years
• Started with RTTS as an entry-level Test Engineer
• Worked in numerous fields, including Pharmaceutical, Utilities and Retail
Mike Calabrese
SENIOR DOMAIN EXPERT, DATA TESTING PRACTICE
• Joined RTTS as a Test Engineer in 2009
• Over a decade of experience successfully implementing automated functional, data validation and ETL testing solutions for multiple clients across many industry verticals
• Technical expert on QuerySurge, RTTS’ flagship data testing solution; supports clients around the world with their QuerySurge implementations
• BS in Computer Engineering from Hofstra University
Introduction
• Data Testing is an integral part of the development of any data project, including data warehouse, data migration and data integration projects.
• Bad data caused by defects can lead companies to make decisions that could cost millions of dollars or, in a health-related field, could cost dearly.
• Handles more than 1 million customer transactions every hour
  − Data is imported into databases that contain more than 2.5 petabytes of data
  − Equivalent to 167 times the information contained in all the books in the US Library of Congress
• Facebook handles 40 billion photos from its user base
• Google processes 1 terabyte per hour
• Twitter processes 85 million tweets per day
• eBay processes 80 terabytes per day
What is a Data Source?
• A Data Source is a pool of data available for extraction.
• The concept of the Data Source is technologically neutral – it is not associated
with any specific technology.
• The most common Data Sources are databases, files, and XML documents.
What is a Data Warehouse? (In this case, the target)
• A collection of data or information intended to support business
decision making.
• Data Warehouses contain a wide variety of data that present a
coherent picture of business conditions.
• A Data Warehouse is a huge repository of electronically organized
data mainly meant for the purpose of reporting and analysis.
• Most Data Warehouses are sent data from multiple sources
(Databases and Files).
• A place where historical data is stored for archival, analysis and
security purposes.
[Diagram: example sources feeding the Data Warehouse: Legacy DB, CRM/ERP DB, Finance DB]
What is ETL?
• In computing, Extract, Transform and Load (ETL) refers to a data handling process that involves:
− Extracting data from outside sources
− Transforming data to fit operational or reporting needs
− Loading data into the endpoint target (usually a database, more specifically a Data Warehouse)
• Why ETL? Businesses need to load the Data Warehouse regularly (incrementally/daily/weekly) so that it can serve its purpose of supporting business analysis.
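The three ETL steps can be sketched in a few lines of Python. This is only an illustration using the standard-library sqlite3 module with in-memory databases; the table names (orders, fact_orders), the columns and the transformation rules are hypothetical and do not come from the presentation.

```python
import sqlite3


def run_etl(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Run one extract-transform-load pass and return the number of rows loaded."""
    # Extract: pull raw rows from the (hypothetical) source table.
    rows = source.execute(
        "SELECT id, first_name, last_name, amount_cents FROM orders"
    ).fetchall()

    # Transform: merge the name fields and convert cents to a decimal amount.
    transformed = [
        (row_id, f"{first} {last}", cents / 100)
        for row_id, first, last, cents in rows
    ]

    # Load: write the reshaped rows into the (hypothetical) warehouse table.
    target.executemany(
        "INSERT INTO fact_orders (order_id, customer_name, amount) VALUES (?, ?, ?)",
        transformed,
    )
    target.commit()
    return len(transformed)


# Illustrative setup: two in-memory databases standing in for source and target.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER, first_name TEXT, last_name TEXT, amount_cents INTEGER)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "Ada", "Lovelace", 1250), (2, "Alan", "Turing", 990)],
)

target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, customer_name TEXT, amount REAL)"
)

loaded = run_etl(source, target)
```

A production ETL job would normally run in a dedicated ETL tool or scheduler; the point here is only the shape of the three steps.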
[Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process (Extract, Transform, Load) → Target DWH]
[Diagram: Transactional vs. Analytical systems]
• Transactional: Inventory (‘We have 212 Widgets in the east warehouse’), Customer Service (‘The paint came off my widget’), Advertising (‘Running a new radio ad today’)
• Analytical: Data Warehouse, Data Marts, BI Tools
Test Points and “ETL Legs”
• An ‘ETL Leg’ is a single ETL process that moves or transforms data between two discrete points.
• A full ETL process may have multiple legs.
• Test points usually sit at either end of a single ETL leg: the verification is between the source and the target for that leg.
• Example: an operational source database (source test point) is extracted, transformed and loaded into a Data Warehouse (target test point). Testing is conducted across this ETL leg.
[Diagram: Inventory (source test point) → Data Warehouse (target test point)]
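A test across a single ETL leg can be sketched as a comparison of two query results, one per test point. The example below is a minimal illustration using sqlite3; the tables (inventory, dw_inventory), the data and the helper name compare_leg are hypothetical, and it does not represent how QuerySurge itself is implemented.

```python
import sqlite3
from collections import Counter


def compare_leg(source_conn, source_sql, target_conn, target_sql):
    """Compare both test points of one ETL leg.

    Returns (missing, unexpected): rows the source query expects but the
    target lacks, and rows the target holds that the source does not explain.
    Counter is used so that duplicate rows are counted, not collapsed.
    """
    expected = Counter(source_conn.execute(source_sql).fetchall())
    actual = Counter(target_conn.execute(target_sql).fetchall())
    missing = sorted((expected - actual).elements())
    unexpected = sorted((actual - expected).elements())
    return missing, unexpected


# Hypothetical source test point: an operational inventory table.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE inventory (sku TEXT, qty INTEGER)")
src.executemany("INSERT INTO inventory VALUES (?, ?)", [("A1", 212), ("B2", 40)])

# Hypothetical target test point, loaded with one defect (40 became 4).
dwh = sqlite3.connect(":memory:")
dwh.execute("CREATE TABLE dw_inventory (sku TEXT, qty INTEGER)")
dwh.executemany("INSERT INTO dw_inventory VALUES (?, ?)", [("A1", 212), ("B2", 4)])

missing, unexpected = compare_leg(
    src, "SELECT sku, qty FROM inventory",
    dwh, "SELECT sku, qty FROM dw_inventory",
)
```

When the leg applies transformation logic, the source query would apply the same expected logic so that both result sets are directly comparable.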
[Diagram: Data Sources (Legacy DB, CRM/ERP DB, Finance DB), Staging, Target DW and Data Mart, connected by three ETL processes; each ETL process is labeled as one ETL Leg]
Single Leg vs. Multi Leg Approaches
Single Leg                           Multi Leg
More tests need to be created        Fewer tests need to be created
Tests are less complex               Tests are more complex
Defects are easier to pinpoint       Defects are more difficult to pinpoint
Execution time tends to be longer    Execution time tends to be shorter
Data Mapping Document
A data mapping document is frequently called a source-to-target map and is generally created in a spreadsheet.
This document acts as a central part of the functional requirements. The following information is contained within the mapping document:
• Source database information
  ▪ Source table
  ▪ Source column
• Target database information
  ▪ Target table
  ▪ Target column
• Data transformation logic
• Optional requirements
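One row of a source-to-target map can be modeled as a small record, which makes it easy to derive a list of target columns that need test coverage. This is a sketch only: the class name MappingRow, the field names and the sample rows are hypothetical and simply mirror the items listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRow:
    """One line of a source-to-target mapping spreadsheet (illustrative)."""
    source_table: str
    source_column: str
    target_table: str
    target_column: str
    transformation: str  # free-text transformation logic
    notes: str = ""      # optional requirements


# Hypothetical sample rows, as they might be read from a spreadsheet.
mappings = [
    MappingRow("customers", "first_name", "dim_customer", "full_name",
               "Merge first_name and last_name with a single space"),
    MappingRow("customers", "last_name", "dim_customer", "full_name",
               "Merge first_name and last_name with a single space"),
    MappingRow("customers", "id", "dim_customer", "customer_id",
               "Direct map", notes="Must not be null"),
]


def target_columns(rows):
    """List the distinct (target table, target column) pairs to be tested."""
    return sorted({(r.target_table, r.target_column) for r in rows})


columns_to_test = target_columns(mappings)
```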
Transformation Types
• Direct Map
• Selective column and row type
• Translation
• Lookups
• Transpose
• Field Splitting
• Field Merging
• Calculated and Derived
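A few of the transformation types above can be illustrated with small Python functions. The function names, the lookup table and the rules are hypothetical examples chosen for illustration; real mappings define their own logic.

```python
# Translation / lookup: map a source code to a target label (hypothetical codes).
STATUS_CODES = {"A": "Active", "I": "Inactive"}


def translate_status(code):
    """Translate a status code, falling back to 'Unknown' for unmapped codes."""
    return STATUS_CODES.get(code, "Unknown")


def split_full_name(full_name):
    """Field splitting: one source field becomes two target fields."""
    first, _, last = full_name.partition(" ")
    return first, last


def merge_name(first, last):
    """Field merging: two source fields become one target field."""
    return f"{first} {last}".strip()


def line_total_cents(quantity, unit_price_cents):
    """Calculated field: the target value is computed from source values."""
    return quantity * unit_price_cents
```

A test for each type would apply the same rule to the source data and compare the result with what was loaded into the target.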
Testing Methods – Automation Tool
• Automation with QuerySurge offers:
− Bulk data verification, testing sample sizes up to 100%
− Management of test assets
− Test scheduling
− Persistent access to test data
− Reporting
An automated data testing approach with QuerySurge can significantly improve coverage, organization and efficiency when compared to the previously mentioned manual testing techniques.
The Project Plan
What you will need:
− Gather project documents and assets
• Mapping documents
• Requirement documents
• Data Model documents
− Estimate the time to review documentation
− Determine the number of test engineer resources
− Determine the number of ETL or test legs
− Determine the number of cycles or releases
− Determine the complexity of the project mappings
• Low Complexity: No transformation logic (1-to-1 mapping) or minor transformation logic, including a change to data types from source to target, selective row filtering, and minor translations
• Medium Complexity: Transformation logic including translations, joins across tables, field splitting, and field merging
• High Complexity: Transformation logic including major translations, multiple joins across tables, calculated or aggregated fields, transposing, derived fields, and match and merge
− Is QuerySurge installed and configured for the project?
− Do the lead or the test engineers require training?
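The three complexity levels can be turned into a simple classification rule: a mapping takes the highest level of any transformation it contains. The tag names below are hypothetical labels for the transformation kinds named above; deciding whether a translation is minor, ordinary or major remains a judgment call for the test lead.

```python
# Hypothetical tags for the transformation kinds named in the complexity levels.
HIGH = {
    "major translation", "multiple joins", "calculated field",
    "aggregated field", "transpose", "derived field", "match and merge",
}
MEDIUM = {"translation", "join", "field splitting", "field merging"}
# Anything else (direct map, data type change, row filtering,
# minor translation) is treated as low complexity.


def classify_mapping(transformations):
    """Return 'low', 'medium' or 'high' for one mapping's transformation tags."""
    tags = set(transformations)
    if tags & HIGH:
        return "high"
    if tags & MEDIUM:
        return "medium"
    return "low"
```

For example, classify_mapping(["join", "field splitting"]) yields "medium", while adding "transpose" to the same mapping raises it to "high".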
Question                     Answer
Review documentation         4
Number of Test engineers     1
Number of ETL Legs           1
Number of Releases/Cycles    4
Low Complexity Tests         7
Medium Complexity Tests      21
High Complexity Tests        8
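The answers above could feed a rough effort estimate. The slides do not give a formula or units, so everything in the sketch below is an assumption: that the review figure is in hours, that test counts are per ETL leg, that tests are created once and re-run every cycle, and all of the hours-per-test weights.

```python
# ASSUMPTION: hours to create one test, by complexity (hypothetical weights).
CREATE_HOURS = {"low": 1.0, "medium": 2.0, "high": 4.0}
# ASSUMPTION: hours to run and analyze one test in one cycle (hypothetical).
RUN_HOURS_PER_TEST = 0.5


def estimate_hours(review_hours, engineers, legs, cycles, tests_by_complexity):
    """Rough elapsed hours: review + test creation + per-cycle execution,
    with the total split evenly across the test engineers."""
    creation = legs * sum(
        count * CREATE_HOURS[level] for level, count in tests_by_complexity.items()
    )
    total_tests = legs * sum(tests_by_complexity.values())
    execution = cycles * total_tests * RUN_HOURS_PER_TEST
    return (review_hours + creation + execution) / engineers


# Inputs taken from the table above; the units are assumed, not stated.
estimate = estimate_hours(
    review_hours=4,
    engineers=1,
    legs=1,
    cycles=4,
    tests_by_complexity={"low": 7, "medium": 21, "high": 8},
)
```

Under these assumed weights the estimate is 4 + 81 + 72 = 157 hours for a single engineer; different weights or a different model would give a different figure.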
Any questions?
