Challenges
“79% of organizations
have more than 100 data sources
with 30% using
more than 1,000 sources.” - IDC
1. Driving Business Value from Data in the Face of Fragmentation and Complexity
Data Complexity
Data Growth
1. IDC High Data Growth and Modern Applications Drive New Storage Requirements in Digitally Transformed Enterprises
Data has been growing at an ever-increasing pace. Over the next five
years IDC predicts structured data will At a compounded annual
growth rate for 40.15% from 3,892 exabytes in 2021 to 21,406
exabytes in 2026.
1 exabyte = 1,000 petabyte
1 petabyte = 1,000 terabyte
“Lack of automation impacts the ability to scale data
management. ”
- IDC
“40% of organizations haven’t been able to automate any
process. ”
- IDC
“Poor data quality now costs organizations
an average of $12.9 million annually”
- Gartner
1. Driving Business Value from Data in the Face of Fragmentation and Complexity
2. How to Improve Your Data Quality
How can we solve these data
challenges?
“AI is the defining technology of our times.”
- Satya Nadella
Microsoft CEO
Artificial Intelligence and Language Models
Artificial Intelligence – Is the ability for computers to
understand their environment and to solve problems
rationally.
Language Models (LM) – Are AI systems built for Natural
Language Processing (NLP)
AI Expansion
How can Data Testers Leverage AI?
Test Design Strategy (before AI)
Skill Level Design Effort
Test Strategy
Table-to-Table &
Column-to-Column Compares
Basic
create with
QuerySurge Wizard
Minimal
incredibly fast
Business Rules
Intermediate
create with SQL
Moderate
medium /fast
Row Count Compares
Basic
create with
QuerySurge Wizard
Minimal
incredibly fast
ETL Logic & Transformations
Intermediate/
Advanced
create with SQL
Moderate/
Difficult
time-consuming
Data Mappings
What is a Mapping Document?
• Also known as a Source-to-Target Map
• It’s THE critical element required to efficiently plan
data movement
• Defines the Extract, Transform, Load (ETL) process
It Captures:
✓ Business rules
✓ Data flow mapping
✓ Data movement requirements
It Specifies:
▪ Source input definition
▪ Target/output details
▪ Business & data transformation rules
▪ Absolute & Optional data quality requirements
What is Data Mapping?
It’s the process of matching fields from one database
to another
Schema
Mapping Doc
Survey of Data Experts
Consensus:
• 81% said their teams use formal mapping documents
• 78% of those teams with mappings store them in Excel
(the next highest answer was a 3rd party vendor tool at 14%)
1st Question:
Does your team use formal mapping documents and if yes,
in what form do you store the information?
1Poll conducted by RTTS on
targeted LinkedIn groups
We conducted a survey1 on Linkedin of 380+ Data Analysts, Data Architects &
ETL Developers and asked 2 questions about mappings:
2nd question:
Approximately how many mappings are in your typical Data
Warehouse project?
Consensus:
• 68% have more than 250 mappings
• 14% have between 100 and 250 mappings
Typical Data Movement Process
(without automated solution)
Roles Tasks
Timeline
Data Analyst
Data Architect
ETL Developer
Data/ETL Tester
Models and
builds target
data store
Reviews
Mapping
Document
Maintains
Target data
Store
Creates 2
tests: 1 for
each mapping
w/SQL editor
Reviews
Mapping
Document
Dumps
results of
tests to 2
Excel files
Compares
Excel files
by eye
Executes
tests
Determines
Requirements
Creates & maintains
Mapping Document
iterate
iterate
Reviews
Mapping
Document
Extracts & Loads data or
Extracts, Transforms &
Loads data into new store
Builds data
movement
logic
where
QuerySurge
typically fits
Integrating AI with QuerySurge
Test Process without automated data validation & ETL testing solution
Test Process with QuerySurge + QuerySurge AI
Test process with automated ETL testing solution (but not QuerySurge AI)
Skilled resource creates
2 SQL tests
for each mapping
(only db to db compares)
Dump results
to
files
Compare
data manually
&
Summaries Results
Execute
tests
QuerySurge compares results
up to 1,000x faster
&
provides detailed reports
Execute
tests
Skilled resource creates
2 SQL tests
for each mapping
(200+ data stores: db, json,
csv, Excel, xml, other)
QuerySurge compares results
up to 1,000x faster
&
provides detailed reports
Execute
tests
Any resource converts
all mappings
in minutes
Time Sinkhole Time Sinkhole
Time Sinkhole Immense Time Savings
Immense Time Savings Immense Time Savings
About QuerySurge
The Executive Office and Critical Data
C-level executives are using BI & Analytics to
make critical business decisions with the
assumption that the underlying data is fine.
We know it is not
How QuerySurge Works
QuerySurge connects
to any 2 points
at one time
Compares every data set to
find anomalies
Source Data Target Data
• Databases
• Data Warehouses
• Flat Files & Excel
• JSON files
• Web Services
• Mainframe files
• Data Warehouses
• Big Data Stores
(Hadoop & NoSQL)
• Business Intelligence
reports
Source Data
…and provides
Data Validation Analytics
Target Data
• Converts hundreds to thousands of data mappings
into tests in minutes
• Creates tests in the data store’s native SQL with little
to no human intervention
• Requires minimal technical skillset
Integrating AI with QuerySurge
2 Use Cases
1)Initial mapping upload:
Bulk upload of 100’s of mappings
at once
2) Incremental mapping uploads:
One or more mappings at a time
Mapping Uploads: Initial & Incremental
Initial Mapping Document
(100s of source-to-target data requirements)
QuerySurge QueryPairs
(SQL tests pulling data from source & target data stores for comparison)
Incremental Mapping Upload
(single source-to-target data requirement) QuerySurge AI
Pulls mappings into AI
model, converts to SQL and
injects into QuerySurge
QuerySurge Survey & ROI
Metrics:
• A data mapping of average complexity ≈ 1 hour of time to create
data tests
• To create data tests for 1,000 mappings ≈ 1,000 people/hours
• 1,000 hours * $100/hour consulting rate = $100,000
• QuerySurge AI can translate 200 mappings per hour or
1,000 mappings in 5 hours
• 5 hours * $100/hour = $500
Survey of 530 QuerySurge prospects and customers:
• 20% have > 1,000 mappings
• 10% have > 5,000 mappings
QuerySurge AI = Huge ROI
Therefore,
Demo
Future
Current AI Concerns
https://www.intel.com/content/www/us/en/newsroom/news/10-per-cent-orgs-launched-genai-solutions-2023.html
Compliance & Security
Network Network
Cloud Hosted AI
Internally Hosted AI
QuerySurgeTM
AI
Other AI Platforms
Reliability/Accuracy
QuerySurge’s fine tuned model was over 30% more accurate generating
SQL from mapping requirements then GPT-4
31.34%
38.63%
68.82%
0.00%
20.00%
40.00%
60.00%
80.00%
100.00%
GPT-3.5 GPT-4 QuerySurge AI
Execution
Accuracy
(%)
Model
SQL Generation Accuracy w/ Augmented Dataset
Cost
$51.00
$0.43
$0.00
$10.00
$20.00
$30.00
$40.00
$50.00
$60.00
GPT 4.0 QuerySurge AI
Cost Per 1,000 Mappings
QuerySurgeTM
AI has been optimized for both accuracy and size (cost). When compared to
GPT-4 QuerySurgeTM
AI is 30% more accurate and costs 1/100th of the price.
QuerySurgeTM
Trial
(1) Trial in the Cloud of QuerySurgeTM, including
self-learning tutorial that works with sample data for 3 days
(2) Downloaded Trial of QuerySurgeTM, including
self-learning tutorial with sample data or your data for 15 days
(3) Proof of Concept of QuerySurgeTM for your clients for 45 days
• Includes all QuerySurge add-ons:
- Import/Export, Ready for Analytics, BI Tester,
QuerySurge DevOps for Data
• As-needed assistance from QuerySurge team
TRIAL
IN THE CLOUD
Proof
of
Concept
Available at: https://www.querysurge.com/trial
Free Self-Paced Training and Certifications
Learn. Earn. Inform.
Learn QuerySurge.
Earn your digital badge.
Inform your social network of your new skills.
Use Code: QS-TG-2024
Available until August 31st 2024
Available at:
https://www.querysurge.com/cert
Q&A

Leveraging AI to Simplify and Speed Up ETL Testing

  • 2.
  • 3.
    “79% of organizations havemore than 100 data sources with 30% using more than 1,000 sources.” - IDC 1. Driving Business Value from Data in the Face of Fragmentation and Complexity Data Complexity
  • 4.
    Data Growth 1. IDCHigh Data Growth and Modern Applications Drive New Storage Requirements in Digitally Transformed Enterprises Data has been growing at an ever-increasing pace. Over the next five years IDC predicts structured data will At a compounded annual growth rate for 40.15% from 3,892 exabytes in 2021 to 21,406 exabytes in 2026. 1 exabyte = 1,000 petabyte 1 petabyte = 1,000 terabyte
  • 5.
    “Lack of automationimpacts the ability to scale data management. ” - IDC “40% of organizations haven’t been able to automate any process. ” - IDC “Poor data quality now costs organizations an average of $12.9 million annually” - Gartner 1. Driving Business Value from Data in the Face of Fragmentation and Complexity 2. How to Improve Your Data Quality
  • 6.
    How can wesolve these data challenges?
  • 7.
    “AI is thedefining technology of our times.” - Satya Nadella Microsoft CEO
  • 8.
    Artificial Intelligence andLanguage Models Artificial Intelligence – Is the ability for computers to understand their environment and to solve problems rationally. Language Models (LM) – Are AI systems built for Natural Language Processing (NLP)
  • 9.
  • 10.
    How can DataTesters Leverage AI?
  • 11.
    Test Design Strategy(before AI) Skill Level Design Effort Test Strategy Table-to-Table & Column-to-Column Compares Basic create with QuerySurge Wizard Minimal incredibly fast Business Rules Intermediate create with SQL Moderate medium /fast Row Count Compares Basic create with QuerySurge Wizard Minimal incredibly fast ETL Logic & Transformations Intermediate/ Advanced create with SQL Moderate/ Difficult time-consuming
  • 12.
    Data Mappings What isa Mapping Document? • Also known as a Source-to-Target Map • It’s THE critical element required to efficiently plan data movement • Defines the Extract, Transform, Load (ETL) process It Captures: ✓ Business rules ✓ Data flow mapping ✓ Data movement requirements It Specifies: ▪ Source input definition ▪ Target/output details ▪ Business & data transformation rules ▪ Absolute & Optional data quality requirements What is Data Mapping? It’s the process of matching fields from one database to another Schema Mapping Doc
  • 13.
    Survey of DataExperts Consensus: • 81% said their teams use formal mapping documents • 78% of those teams with mappings store them in Excel (the next highest answer was a 3rd party vendor tool at 14%) 1st Question: Does your team use formal mapping documents and if yes, in what form do you store the information? 1Poll conducted by RTTS on targeted LinkedIn groups We conducted a survey1 on Linkedin of 380+ Data Analysts, Data Architects & ETL Developers and asked 2 questions about mappings: 2nd question: Approximately how many mappings are in your typical Data Warehouse project? Consensus: • 68% have more than 250 mappings • 14% have between 100 and 250 mappings
  • 14.
    Typical Data MovementProcess (without automated solution) Roles Tasks Timeline Data Analyst Data Architect ETL Developer Data/ETL Tester Models and builds target data store Reviews Mapping Document Maintains Target data Store Creates 2 tests: 1 for each mapping w/SQL editor Reviews Mapping Document Dumps results of tests to 2 Excel files Compares Excel files by eye Executes tests Determines Requirements Creates & maintains Mapping Document iterate iterate Reviews Mapping Document Extracts & Loads data or Extracts, Transforms & Loads data into new store Builds data movement logic where QuerySurge typically fits
  • 15.
    Integrating AI withQuerySurge Test Process without automated data validation & ETL testing solution Test Process with QuerySurge + QuerySurge AI Test process with automated ETL testing solution (but not QuerySurge AI) Skilled resource creates 2 SQL tests for each mapping (only db to db compares) Dump results to files Compare data manually & Summaries Results Execute tests QuerySurge compares results up to 1,000x faster & provides detailed reports Execute tests Skilled resource creates 2 SQL tests for each mapping (200+ data stores: db, json, csv, Excel, xml, other) QuerySurge compares results up to 1,000x faster & provides detailed reports Execute tests Any resource converts all mappings in minutes Time Sinkhole Time Sinkhole Time Sinkhole Immense Time Savings Immense Time Savings Immense Time Savings
  • 16.
  • 17.
    The Executive Officeand Critical Data C-level executives are using BI & Analytics to make critical business decisions with the assumption that the underlying data is fine. We know it is not
  • 18.
    How QuerySurge Works QuerySurgeconnects to any 2 points at one time Compares every data set to find anomalies Source Data Target Data • Databases • Data Warehouses • Flat Files & Excel • JSON files • Web Services • Mainframe files • Data Warehouses • Big Data Stores (Hadoop & NoSQL) • Business Intelligence reports Source Data …and provides Data Validation Analytics Target Data
  • 19.
    • Converts hundredsto thousands of data mappings into tests in minutes • Creates tests in the data store’s native SQL with little to no human intervention • Requires minimal technical skillset
  • 20.
    Integrating AI withQuerySurge 2 Use Cases 1)Initial mapping upload: Bulk upload of 100’s of mappings at once 2) Incremental mapping uploads: One or more mappings at a time
  • 21.
    Mapping Uploads: Initial& Incremental Initial Mapping Document (100s of source-to-target data requirements) QuerySurge QueryPairs (SQL tests pulling data from source & target data stores for comparison) Incremental Mapping Upload (single source-to-target data requirement) QuerySurge AI Pulls mappings into AI model, converts to SQL and injects into QuerySurge
  • 22.
    QuerySurge Survey &ROI Metrics: • A data mapping of average complexity ≈ 1 hour of time to create data tests • To create data tests for 1,000 mappings ≈ 1,000 people/hours • 1,000 hours * $100/hour consulting rate = $100,000 • QuerySurge AI can translate 200 mappings per hour or 1,000 mappings in 5 hours • 5 hours * $100/hour = $500 Survey of 530 QuerySurge prospects and customers: • 20% have > 1,000 mappings • 10% have > 5,000 mappings QuerySurge AI = Huge ROI Therefore,
  • 23.
  • 24.
  • 25.
  • 26.
    Compliance & Security NetworkNetwork Cloud Hosted AI Internally Hosted AI QuerySurgeTM AI Other AI Platforms
  • 27.
    Reliability/Accuracy QuerySurge’s fine tunedmodel was over 30% more accurate generating SQL from mapping requirements then GPT-4 31.34% 38.63% 68.82% 0.00% 20.00% 40.00% 60.00% 80.00% 100.00% GPT-3.5 GPT-4 QuerySurge AI Execution Accuracy (%) Model SQL Generation Accuracy w/ Augmented Dataset
  • 28.
    Cost $51.00 $0.43 $0.00 $10.00 $20.00 $30.00 $40.00 $50.00 $60.00 GPT 4.0 QuerySurgeAI Cost Per 1,000 Mappings QuerySurgeTM AI has been optimized for both accuracy and size (cost). When compared to GPT-4 QuerySurgeTM AI is 30% more accurate and costs 1/100th of the price.
  • 29.
    QuerySurgeTM Trial (1) Trial inthe Cloud of QuerySurgeTM, including self-learning tutorial that works with sample data for 3 days (2) Downloaded Trial of QuerySurgeTM, including self-learning tutorial with sample data or your data for 15 days (3) Proof of Concept of QuerySurgeTM for your clients for 45 days • Includes all QuerySurge add-ons: - Import/Export, Ready for Analytics, BI Tester, QuerySurge DevOps for Data • As-needed assistance from QuerySurge team TRIAL IN THE CLOUD Proof of Concept Available at: https://www.querysurge.com/trial
  • 30.
    Free Self-Paced Trainingand Certifications Learn. Earn. Inform. Learn QuerySurge. Earn your digital badge. Inform your social network of your new skills. Use Code: QS-TG-2024 Available until August 31st 2024 Available at: https://www.querysurge.com/cert
  • 31.