Completing the Data Equation: Test Data + Data Validation = Success


Completing the Data Equation

In this presentation, we tackle 2 major challenges to assuring your data quality:
1) Test Data Generation
2) Data Validation

We illustrate how GenRocket and QuerySurge, used in conjunction, can solve these challenges. Also see how they can be easily integrated into your Continuous Integration/Continuous Delivery pipeline.

Session Overview
- Primary challenges organizations are facing with their data projects
- Key success factors for data validation & testing
- How to set up a workflow around test data generation and data validation using GenRocket & QuerySurge
- How to automate this workflow in your CI/CD DataOps pipeline

to see the video, go to

Published in: Technology
  1. Completing the Data Equation: Test Data Generation + Data Validation = Data Success
     GenRocket: Enterprise Test Data Automation
     QuerySurge™: The smart data testing solution
     Presenters: Garth Rose (CEO), Hycel Taylor (CTO), Eric Smyth (Director)
  2. Typical data issue areas
     C-level executives are using BI & Analytics to make critical business decisions with the assumption that the underlying data is fine. We know it is not.
     [Diagram labels: Mainframe, ETL, Business Intelligence & Analytics]
  3. Keys to Success: The Goal
     The keys to successfully testing data throughout your data architecture are the following:
     • To be able to validate the critical business rules and transformation logic being applied to the data
     • To have the ability to test large volumes of data in a period of time that will not delay the release schedule
     • To be able to execute data tests as part of a Continuous Integration / Continuous Delivery pipeline
     • To be able to pinpoint where data defects were introduced in the architecture and link them back to the specification or data model
  4. The Data Testing Challenge: The Issues
     There are two major challenges in data testing that need to be overcome to have a successful implementation:
     • How do I generate the data needed to conduct the tests to be executed?
     • How do I execute the tests in an accurate and efficient way that aligns with my testing cycle?
  5. The Ideal Data Testing Strategy
     The ideal data testing strategy should be able to accomplish the following:
     • Accurately and rapidly generate all the data needed for the scenarios that need to be tested
     • Mask data to ensure data security
     • Validate large amounts of data quickly
     • Validate difficult transformation rules between the various source and target systems being tested
     • Integrate into your build pipeline to achieve continuous testing
     • Store historical results and provide analytics
  6. The Ideal Data Testing Solutions
     These challenges can be overcome by utilizing the following two solutions as key parts of your data testing framework:
     • Test Data Generation: GenRocket, Enterprise Test Data Automation
     • Data Validation: QuerySurge™, the smart data testing solution
  7. What is GenRocket? An Enterprise Test Data Automation Platform
     • Fast, secure, low cost, versatile, industry-leading data quality
     • 9+ years of development, refinement, customer experience
     • Recommended by top Global Systems Integrators
     • Customers across the world in 10 different vertical markets
     • Only Patented Test Data Platform (2017): US 20140143525 A1 for Systems and Methods for Data Generation
  8. How It Works
  9. Test Data Use Cases
  10. QuerySurge™ Use Cases
      QuerySurge is the smart testing solution for automated validation & testing of data.
  11. QuerySurge connects to any 2 points at one time
      Source Data:
      • Data Stores: Databases, Data Warehouses, Data Marts
      • Flat Files: Fixed Width, Delimited, Excel, JSON
      • Mainframes: DB2, various file types
      Target Data:
      • Big Data stores: Hadoop, NoSQL
      • Data Warehouses
      • Business Intelligence Reports
      • XML
      • Web Services
      Queries (SQL, HQL) run against the source data and the target data, followed by a comparison of every data set.
      Results (pass/fail): Data Analytics Dashboard, Data Intelligence Reports, automated emails
  12. Data Testing Process
      [Diagram labels: Sources, Flat Files (Excel, JSON, etc.), ETL (two stages), Data Warehouse, Target Database, Test Data Generation Point (two), Testing Validation Point (two)]
  13. Use Case
      Use Case Example: A CUSTOMER_FACT table exists in our target data warehouse. Two key transformations are being executed that need to be tested:
      1) The country name associated with the customer is converted to a country code. Ex:
         ❑ United States > USA
         ❑ Canada > CAN
      2) The total number of orders a customer placed last year determines VIP status, based on the following lookup chart. Ex:
         ❑ 10+ orders > VIP Level 5
         ❑ 7-9 orders > VIP Level 4
         ❑ 4-6 orders > VIP Level 3
         ❑ 2-3 orders > VIP Level 2
         ❑ 1 order > VIP Level 1
         ❑ 0 orders > VIP Level 0
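The two transformation rules on this slide can be written down as a small reference implementation, which is handy as an oracle when checking the ETL output. This is a minimal sketch, not the presenters' code: the function names `country_code` and `vip_level` are hypothetical, and the country table contains only the two examples shown on the slide.

```python
# Hypothetical reference implementation of the two rules on the slide.
# Only the two country examples from the slide are included.
COUNTRY_CODES = {"United States": "USA", "Canada": "CAN"}


def country_code(country_name: str) -> str:
    """Map a country name to its code; raises KeyError for unknown names."""
    return COUNTRY_CODES[country_name]


def vip_level(orders_last_year: int) -> int:
    """Map last year's order count to a VIP level using the slide's lookup chart."""
    if orders_last_year < 0:
        raise ValueError("order count cannot be negative")
    if orders_last_year >= 10:
        return 5
    if orders_last_year >= 7:
        return 4
    if orders_last_year >= 4:
        return 3
    if orders_last_year >= 2:
        return 2
    # Remaining cases: 1 order is level 1, 0 orders is level 0.
    return orders_last_year
```

Checking the boundaries of each band (for example 3 versus 4 orders, or 9 versus 10) is usually where lookup rules of this kind go wrong.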
  14. Use Case
      Use Case Example: The data test should validate combinations of customers from every country and every possible VIP ordering status.
      [Diagram labels: Source Database, Target Database]
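To make "every country and every possible VIP ordering status" concrete, the sketch below enumerates one synthetic customer per combination of country and order count. It is plain Python for illustration only and does not use or represent GenRocket; the column names and the choice of boundary order counts are assumptions.

```python
from itertools import product

# Assumptions for illustration: the two countries shown in the deck, and
# order counts at the edges of each VIP band (0, 1, 2-3, 4-6, 7-9, 10+).
COUNTRIES = ["United States", "Canada"]
ORDER_COUNTS = [0, 1, 2, 3, 4, 6, 7, 9, 10]


def build_source_rows():
    """Return one synthetic customer row per (country, order count) pair."""
    rows = []
    combos = product(COUNTRIES, ORDER_COUNTS)
    for customer_id, (country, orders) in enumerate(combos, start=1):
        rows.append(
            {
                "customer_id": customer_id,        # hypothetical column name
                "country": country,                # hypothetical column name
                "orders_last_year": orders,        # hypothetical column name
            }
        )
    return rows
```

With two countries and nine order counts this yields 18 rows; adding a country grows the set multiplicatively, which is why generating such data by hand quickly becomes impractical.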
  15. Use Case Implementation
      • A Domain will be created for each database table.
      • A Scenario will be created for each Domain.
      • A Scenario Chain will be created to execute all Scenarios.
      • The Domains will contain Attributes for each database field.
  16. QuerySurge™ Use Case Implementation
      • A QueryPair is created to validate the ETL process.
      • A Test Suite is created that will contain the QueryPair.
      • A Scenario executes the Test Suite and the results are analyzed.
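The idea behind pairing a source query with a target query can be illustrated with Python's built-in `sqlite3` module. This is only a sketch of the general technique and is not QuerySurge's API: the helper `compare_query_pair`, the table names, and the sample rows are all hypothetical, and one target row is deliberately wrong so the comparison has something to report.

```python
import sqlite3


def compare_query_pair(conn, source_sql, target_sql):
    """Run both queries and report rows that differ between the result sets.

    Uses set comparison, so duplicate rows are not counted.
    """
    source_rows = set(conn.execute(source_sql).fetchall())
    target_rows = set(conn.execute(target_sql).fetchall())
    return {
        "missing_in_target": sorted(source_rows - target_rows),
        "unexpected_in_target": sorted(target_rows - source_rows),
    }


def demo():
    """Build a tiny in-memory source and target, with one seeded defect."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer_src (customer_id INTEGER, country TEXT);
        CREATE TABLE customer_fact (customer_id INTEGER, country_code TEXT);
        INSERT INTO customer_src VALUES (1, 'United States'), (2, 'Canada');
        INSERT INTO customer_fact VALUES (1, 'USA'), (2, 'CA');
        """
    )
    # The source query applies the expected transformation rule itself.
    source_sql = """
        SELECT customer_id,
               CASE country
                   WHEN 'United States' THEN 'USA'
                   WHEN 'Canada' THEN 'CAN'
               END
        FROM customer_src
    """
    target_sql = "SELECT customer_id, country_code FROM customer_fact"
    return compare_query_pair(conn, source_sql, target_sql)
```

Calling `demo()` reports customer 2 in both lists, because the target holds `CA` where the rule expects `CAN`. A pair with empty lists on both sides would count as a pass.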
  17. Use Case Implementation: CI / CD Build Pipeline
      • Generate Data: GenRocket™ Runtime, Execution API, Source Database
      • Execute ETL Job: Source Database to Target Database
      • Validation Tests: QuerySurge™ Server, Execution API
      • Results: Pass / Fail, Automatic Email Notification
      • Platforms: Linux, Mac, Windows
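The pipeline on this slide amounts to three ordered stages with a pass/fail gate. The sketch below shows that control flow only; it does not call GenRocket or QuerySurge, whose execution APIs are not described in this deck. The `run_pipeline` helper and the stage callables are hypothetical placeholders that a build server job would replace with real steps.

```python
from typing import Callable, List, Tuple

# A stage is a display name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order, stopping at the first failure.

    Returns the overall pass/fail flag and a log line per executed stage.
    """
    log = []
    for name, step in stages:
        ok = step()
        log.append(f"{name}: {'pass' if ok else 'fail'}")
        if not ok:
            return False, log
    return True, log


# Usage with placeholder stages (each lambda stands in for a real step).
passed, log = run_pipeline(
    [
        ("Generate Data", lambda: True),
        ("Execute ETL Job", lambda: True),
        ("Validation Tests", lambda: False),  # simulated failing validation
    ]
)
```

Stopping at the first failed stage keeps a broken data load from being validated, and the returned flag is what a CI job would map to its exit status.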
  18. Demonstration
      The video can be found on YouTube.