Webinar
Mike Calabrese
Team Lead/Senior Engineer
Bill Hayduk
Founder/CEO
Creating a Data Validation
& Testing Strategy
Copyright Real-Time Technology Solutions, Inc. 2019 CONFIDENTIAL – DO NOT distribute
Facts
Founded:
1996 (24th anniversary)
Location:
New York City (HQ)
Customer profile:
• Fortune 500 & mid-size
• 700+ customers
Strategic Partners:
IBM, Microsoft, Oracle,
Teradata, Cloudera,
HortonWorks, MongoDB,
SAP, Micro Focus
Other Software
Supported
QuerySurge, Selenium,
Appium, CitraTest,
Postman, Smart Bear,
JMeter, others
RTTS is the premier pure-play QA & Testing firm
that specializes in Test Automation
Agenda: Intro • Data Validation • Data Testing Strategies • Assessment • Case Study
Data Validation Assessment by RTTS
Big Impacts of Big Data
• More than 1 million customer transactions handled every hour, with data imported into databases that contain > 2.5 petabytes of data — the equivalent of 167 times the information contained in all the books in the US Library of Congress.
• Facebook handles 40 billion photos from its user base.
• Google processes 1 terabyte per hour.
• Twitter processes 85 million tweets per day.
• eBay processes 80 terabytes per day.
• and others
DWH, BI, Big Data Marketplaces

Data Warehouse Marketplace
"the worldwide data warehouse management software market is forecast to generate nearly $17 billion in revenue by 2020" – Forrester
Top vendors: Oracle, Teradata, IBM, Microsoft, SAP, Micro Focus, and Amazon

Business Intelligence Marketplace
"The business intelligence (BI) and analytics software market is forecast to grow to $22.8 billion by the end of 2020" – Gartner
Top vendors: SAP, IBM, SAS, Microsoft, Oracle, Tableau, Qlik, MicroStrategy, Information Builders

Big Data Marketplace
"By the end of 2020, companies will spend > USD $72 billion on Big Data hardware, software, & professional services" – IDC
Top vendors: Oracle, IBM, Microsoft, Amazon, Micro Focus, HortonWorks, Cloudera, Teradata, SAP, MongoDB, MapR, DataStax, Snowflake
Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process → Target DWH → ETL Process → Data Mart → Business Intelligence (BI) & Analytics
Impacts of Bad Data
"On average, poor data quality costs organizations $14.2 million annually."
"Dirty data costs the average business 15% to 25% of revenue."
"Cleaning up data will lead to average cost savings of 33%, while boosting revenue by an average of 31%."
What is Data Validation?
Data Validation Testing: the process of verifying that your data is moved completely and accurately through your systems according to the business requirements.
Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process (Extract, Transform, Load) → Target DWH
Data Validation Test Types
• Data Completeness — verifying that all data has been loaded from the sources to the target Data Warehouse, and validating that the correct data displays in BI reports.
• Data Transformation — ensuring that all data has been transformed correctly during the extract-transform-load (ETL) process.
• BI Report Testing — verifying that BI reports are formatted correctly, that calculated fields are validated, and that report data is verified against the underlying data.
• BI Performance Testing — ensuring your BI reports can be generated in a reasonable amount of time.
• Data Quality — ensuring that the ETL process correctly rejects, substitutes default values for, corrects, ignores, or reports invalid data.
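A data-completeness check can be sketched in a few lines of Python against SQLite; the table names and rows below are illustrative, not taken from the webinar.

```python
import sqlite3

# Illustrative source and target tables standing in for a source system
# and a target warehouse (names and data are made up for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (id INTEGER, amount REAL);
    CREATE TABLE tgt_orders (id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO tgt_orders VALUES (1, 10.0), (2, 20.0);  -- one row never loaded
""")

def row_count(table: str) -> int:
    # The simplest completeness test: count rows on each side of the ETL.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

src, tgt = row_count("src_orders"), row_count("tgt_orders")
print(f"source={src} target={tgt} missing={src - tgt}")  # source=3 target=2 missing=1
```

Matching counts are necessary but not sufficient — a row can be present yet wrong, which is why transformation and quality checks still follow.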
Finding Bad Data
• Missing Data — data that never makes it into the target database. Possible causes: an invalid or incorrect lookup table in the transformation logic; bad data from the source database (needs cleansing); invalid joins.
• Truncation of Data — data lost through truncation of a data field. Possible causes: invalid field lengths on the target database; transformation logic that does not consider field lengths from the source.
• Data Type Mismatch — data types not set up correctly on the target database. Possible cause: the source data field was not configured correctly.
• Null Translation — null source values not transformed to the correct target values. Possible cause: the development team did not include the null translation in the transformation logic.
• Wrong Translation — the opposite of the null translation error: a field that should be null is populated with a non-null value, or a field is populated with the wrong value. Possible cause: the development team incorrectly translated the source field for certain values.
• Misplaced Data — source data fields not transformed into the correct target data field. Possible cause: the development team inadvertently mapped the source data field to the wrong target data field.
• Extra Records — records that should not be in the ETL output are included. Possible cause: the development team did not include a filter in their code.
• Not Enough Records — records that should be in the ETL output are excluded. Possible cause: the development team had a filter in their code that should not have been there.
Finding Bad Data (cont.)
• Transformation Logic Errors/Holes — testing can reveal "holes" in the transformation logic, or show that the logic is unclear. Possible cause: the development team did not account for special cases; for example, international cities containing language-specific characters might need to be handled in the ETL code.
• Simple/Small Errors — capitalization, spacing, and other small errors. Possible cause: the development team did not add an additional space after a comma when populating the target field.
• Sequence Generator — ensuring that report sequence numbers are in the correct order is very important when processing follow-up reports or responding to an audit. Possible cause: the development team did not configure the sequence generator correctly, producing records with duplicate sequence numbers.
• Undocumented Requirements — requirements that are "understood" but not actually documented anywhere. Possible cause: several members of the development team did not share the same understanding of the undocumented requirements.
• Duplicate Records — two or more records that contain the same data. Possible cause: the development team did not add the appropriate code to filter out duplicate records.
• Numeric Field Precision — numbers not formatted to the correct decimal place or not rounded per specifications. Possible cause: the development team rounded the numbers to the wrong decimal place.
• Rejected Rows — data rows that get rejected due to data issues. Possible cause: the development team did not account for data conditions that could break the ETL for a particular row.
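Several of the issues above can be surfaced with one-line SQL checks; for instance, duplicate records fall out of a GROUP BY over every column. A minimal sketch against SQLite, with made-up table and column names:

```python
import sqlite3

# Illustrative target table containing an accidental duplicate row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tgt_customers (cust_id INTEGER, name TEXT);
    INSERT INTO tgt_customers VALUES (1, 'Ann'), (2, 'Bob'), (2, 'Bob'), (3, 'Cy');
""")

# Group on every column; any group with more than one row is a duplicate record.
dupes = conn.execute("""
    SELECT cust_id, name, COUNT(*) AS n
    FROM tgt_customers
    GROUP BY cust_id, name
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [(2, 'Bob', 2)]
```

The same GROUP BY / HAVING pattern also catches duplicate sequence numbers from a misconfigured sequence generator.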
Challenges
• How much data needs to be validated/tested?
• How do I ensure I am testing the proper data
permutations?
• What are the critical data endpoints that need
to be tested?
• How do I verify that the data from my various
source systems is propagating through the
architecture?
• How do I validate data in the cloud
environments?
• Is bad data making it into the architecture?
• How much of the data testing can be automated?
Solutions: Finding Bad Data
[Chart: the cost of finding bad data rises by phase — Data Mapping, Development/Unit Testing, QA Test Cycle, UAT Testing, End User]
• Identify testing points
• Review data mappings
• Data Testing Strategies
• comparisons (source vs. target)
• row counts
• minus queries
• automation tools
Solutions: Data Testing Permutations
• Analyze the data mappings
• Develop a test data set
  o Review transformation logic
    ▪ Case statements
    ▪ Field merges / field splitting
    ▪ Translations (lookups)
    ▪ Derived fields

Test Data Generation
• Replication of production data
• Homegrown or freeware tools
• Enterprise solutions
  o IBM InfoSphere Optim, GenRocket, SAP, Computer Associates
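For a homegrown approach, one way to cover the transformation-logic permutations is to generate one test row per combination of branch values. The statuses and regions below are hypothetical examples, not from the webinar:

```python
from itertools import product

# Hypothetical transformation inputs: a CASE statement branching on status,
# and a null translation on region (None should map to a default in the target).
statuses = ["ACTIVE", "INACTIVE", "PENDING"]
regions = ["US", "EU", None]

# One test row per permutation guarantees every branch combination is exercised.
test_rows = [{"status": s, "region": r} for s, r in product(statuses, regions)]
print(len(test_rows))  # 9
```

This scales multiplicatively, so for wide mappings it is usually applied per transformation rule rather than across the whole row.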
Solutions: How Much Data to Validate?
• Requirements
  o Regulatory authorities may require 100% of your data to be tested.
  o In other cases, 90% or 80% may be the goal.
• Time, resource, and scope driven
  o Release timeline
  o Available resources
  o Scope of authoring and executing tests
• Risk assessment
  o Business acceptance criteria — end users define their primary data use cases.
  o Critical path — validate the data that flows through the high-priority data endpoints within your system.

Estimating effort:
  days of authoring = total test authoring time ÷ (# of resources × hours per day authoring per resource)
  days of execution = total test execution time ÷ (# of resources × hours per day executing per resource)
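The estimation arithmetic can be sketched as a tiny helper; the figures plugged in are illustrative only:

```python
def days_needed(total_hours: float, resources: int, hours_per_day: float) -> float:
    # days = total effort / (number of resources * productive hours per day each)
    return total_hours / (resources * hours_per_day)

# Illustrative numbers: 240 hours of test authoring, 3 testers, 5 hours/day each.
authoring_days = days_needed(240, resources=3, hours_per_day=5)
print(authoring_days)  # 16.0
```

The same function covers execution time; only the total-hours input changes.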
Solutions
Automation vs Manual
• Recurrence
• Avoid complicated, single-use test cases
• Focus on repeatable testing paths
• Ensure modularization of test data sets
• Test Data Sets
• Consider the automation tool's assigned hardware resources and performance, which must be able to handle the load of the data set under test
• Include time needed to prepare environments into your testing estimates
• Database Performance
• Set expectations on database hardware & responsiveness.
• SQL query response time will factor into overall test run times
Solutions: How Do I Test Data in My Cloud Environment?
• On-prem vs. cloud
  o Follow the same testing methodologies, but with considerations for cloud connections and scalability
  o If an automated solution is being pursued, confirm that the tools involved allow connectivity to your cloud environment
• Hybrid-cloud mapping
  o Interface documentation
  o Define entry & exit points, if applicable
• Digital transformation
  o Clearly defined conversion requirements and mappings
• Environment scalability
  o Define limitations on testing environment resources
Data Validation Assessment
What are the goals of a
Data Validation assessment?
• Receive an expert evaluation of your
current data validation process
• Provide recommendations on how to
improve your process
• Proposal for successful implementation
of your goals
Data Validation Assessment
Components of the Assessment
• Business analysis
• Data architecture analysis
• ETL testing process evaluation
• DataOps & DevOps evaluation
• Resource evaluation (optional)
• Metrics evaluation
• Risk assessment
Data Validation Assessment
Interview with Key Players
• Business/Data Analysts create requirements
• QA Testers develop and execute test plans and
test cases
• Architects set up environments
• Developers create ETL code, perform unit tests
• DBAs test for performance and stress
• Business Users perform functional User
Acceptance Tests
Data Validation Assessment
Process Review
• Review Requirements & Mapping documentation
• Testing Process Design
• Analysis of tools and DevOps/DataOps
• Reporting metrics evaluations
Data Validation Assessment
Deliverables
• Detailed analysis report with recommendations
for improvement
• Presentation to your team on our findings
• Proposal for successful implementation of your
goals
Data Testing – Developer & Tester
• ETL Developer: codes data movement based on the mapping requirements
• Data Tester: tests data movement based on the mapping requirements
• Data flow: Source Data → Big Data lake → (ETL) → Data Warehouse → (ETL) → Data Mart → BI & Analytics
• Testing Points #1–#3 validate each data movement stage; at Testing Point #4, the BI analyst extracts data for reports and the tester tests the BI reports
Data Requirements = Mapping Document
Source-to-Target Map: the critical element required to efficiently plan the target data stores. It also defines the Extract, Transform, Load (ETL) process.
Intention:
✓ capture business rules
✓ map data flow
✓ capture data movement requirements
The mapping doc specifies:
▪ Source input definition
▪ Target/output details
▪ Business & data transformation rules
▪ Absolute data quality requirements
▪ Optional data quality requirements
Data Testing Strategies: Testing Methods
• Minus Queries — create a SQL source query and a SQL target query; using SQL, subtract the source query results from the target query results, and the target query results from the source query results.
• Visual Compare — view source data and target data and compare them manually.
• Record Counts — create SQL source and target queries that return record counts, and compare the values.
• Automation — use an automation tool to compare SQL source and target query results.
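A minimal sketch of the minus-query method, run here against SQLite (which spells the set-difference operator EXCEPT; Oracle calls it MINUS). The tables and the deliberately mutated row are illustrative:

```python
import sqlite3

# Illustrative source/target tables with one transformation defect ('Rio' -> 'RIO').
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER, city TEXT);
    CREATE TABLE tgt (id INTEGER, city TEXT);
    INSERT INTO src VALUES (1, 'NYC'), (2, 'Rio'), (3, 'Oslo');
    INSERT INTO tgt VALUES (1, 'NYC'), (2, 'RIO'), (3, 'Oslo');
""")

# Run the minus query in BOTH directions: source-minus-target finds rows that
# were lost or changed in flight; target-minus-source finds rows the ETL
# mutated or invented. An empty result in both directions means a clean load.
src_minus_tgt = conn.execute("SELECT * FROM src EXCEPT SELECT * FROM tgt").fetchall()
tgt_minus_src = conn.execute("SELECT * FROM tgt EXCEPT SELECT * FROM src").fetchall()
print(src_minus_tgt)  # [(2, 'Rio')]
print(tgt_minus_src)  # [(2, 'RIO')]
```

Note that a record-count check alone would pass here (3 rows on each side); the bidirectional minus query is what exposes the bad translation.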
Data Maturity Model – Test Execution
Level 1 – Sampling: sampling a % of data by visually comparing data sets. Not repeatable.
Level 2 – Excel, Ad Hoc Reporting: using Excel or another homegrown method. Ad hoc reporting.
Level 3 – Minus Queries: using a SQL editor & minus queries to test data. More detailed reporting.
Level 4 – Data Test Automation: repeatable test automation, agreed-upon process, centralized reporting.
Level 5 – Data Quality Optimizing: full automation, tracking of ROI, predictive data issues, auditable results. Business value is fully understood and supported by management.
On which level should your process be?
Case Study: Overview
A company in the financial industry had a development and QA team assigned to their ETL process, but there were still issues:
• They were still suffering from incorrect data fields populating their Business Intelligence (BI) reports
• Development cycles were frequently delayed
• Management was losing confidence in the BI reporting data
Case Study: Assessment
Senior RTTS resources were brought in to assess the process:
• Interview key players
• Review process documentation and tools
Problem areas identified:
• Minimal requirements
• Ticketing system was not being used for traceability
• Testing process of low-level maturity
  o Table row counts
  o Sampling
  o Excel comparisons
• Resource needs
Case Study: Recommendations for Improvement
• Centralize mapping documentation
  o Link requirements to work-item tickets and test cases
• Improve communication between team members; we recommended a new Data Analyst role
• Narrow the focus of the stand-up meetings
• Implement automated solutions to expand coverage for larger data sets
DEMO:
Automating your data validation & testing
Any questions?
Creating a Data Validation & Testing Strategy