Pipeline Testing Story
IRINA PASHKOVA
QA Lead, GreenM
Agenda
1. Regression ETL testing
2. Non-functional ETL testing
3. Functional ETL Testing
Puppy to Play with
Daily Runs
Full Refresh Mode
300 Customers
~ 500 million records / table
~ 5h ETL time
Better & Faster – ETL Evolution
or
Regression ETL Testing
ETL
Extract Transform Load
Operational storages, or DATA SOURCES
Reporting-oriented data marts, or TARGETS
New Pipeline Version
Regression Testing
Non-Functional Testing
• Same Sources & Targets
• Same Transformation Rules
• Previous fully tested version of ETL available
Regression via Reference Data Schema
• Exclude:
   • Tracking fields
   • New functionality data
• Clean up Test Schema
• Run Smoke suite first
[Diagram: SOURCE → NEW ETL VERSION → TESTED TARGET; SOURCE → PROD ETL VERSION → REFERENCE TARGET]
Regression Testing
FitNesse for ETL Regression
• Config files:
   • Connections
   • Table parameters
• Fixtures:
   • Non-empty table
   • No duplicates
   • Counts match
   • Content match
Regression Testing
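The four fixture checks listed above can be sketched outside FitNesse as plain SQL assertions. This is a minimal illustration, not the author's fixtures: it uses Python's built-in sqlite3 module as a stand-in database, and the table names, columns, and the `loaded_at` tracking field are all hypothetical.

```python
import sqlite3

# Hypothetical tracking columns that change on every run and are excluded.
TRACKING_FIELDS = {"loaded_at"}


def compared_columns(conn, table):
    """Return the table's column names minus the tracking fields."""
    names = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return ", ".join(n for n in names if n not in TRACKING_FIELDS)


def scalar(conn, sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


def regression_checks(conn, tested, reference):
    """Compare the tested target with the reference target; return {check: passed}."""
    cols = compared_columns(conn, reference)
    n_tested = scalar(conn, f"SELECT COUNT(*) FROM {tested}")
    n_reference = scalar(conn, f"SELECT COUNT(*) FROM {reference}")
    n_distinct = scalar(
        conn, f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM {tested})"
    )

    def only_in(a, b):
        # Rows present in a but absent from b, ignoring tracking fields.
        return scalar(
            conn,
            f"SELECT COUNT(*) FROM (SELECT {cols} FROM {a} EXCEPT SELECT {cols} FROM {b})",
        )

    return {
        "non_empty": n_tested > 0,
        "no_duplicates": n_tested == n_distinct,
        "counts_match": n_tested == n_reference,
        "content_match": only_in(tested, reference) == 0
        and only_in(reference, tested) == 0,
    }


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reference_visits (id INTEGER, customer TEXT, amount REAL, loaded_at TEXT);
    CREATE TABLE tested_visits    (id INTEGER, customer TEXT, amount REAL, loaded_at TEXT);
    INSERT INTO reference_visits VALUES (1, 'a', 10.0, '2024-01-01'), (2, 'b', 20.0, '2024-01-01');
    INSERT INTO tested_visits    VALUES (1, 'a', 10.0, '2024-01-02'), (2, 'b', 20.0, '2024-01-02');
""")
results = regression_checks(conn, "tested_visits", "reference_visits")
```

Here every check passes even though `loaded_at` differs between the two tables, because tracking fields are left out of the comparison.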
Regression Challenges
• Long ETL run time
• Big data volume
• Time wasted waiting for a fix / change
• Hanging tests
Manual Inspections
• Configurations:
   • Connections
   • Run mode
• Pipeline steps order & dependencies
• Source & Target tables
• ETL code queries
Regression Testing: Challenges
Set the Limits!
• “Partial” run & Extract re-using
• Limit compared data
• Set timeout in tests
• Model missing data
Extract Transform Load
Regression Testing: Challenges
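One way to "set timeout in tests" is to run each check in a worker thread and stop waiting after a fixed limit. The sketch below uses only the Python standard library; `fast_check` and `slow_check` are hypothetical stand-ins for real comparison queries, and the limits are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_timeout(check, timeout_s):
    """Run check() in a worker thread; report a timeout instead of hanging."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(check)
    try:
        return ("ok", future.result(timeout=timeout_s))
    except FutureTimeout:
        return ("timeout", None)
    finally:
        # Do not block on a stuck worker; note the thread itself is not killed.
        executor.shutdown(wait=False)


def fast_check():
    return 42


def slow_check():
    time.sleep(0.5)  # stands in for a query that runs too long
    return 0


fast_result = run_with_timeout(fast_check, timeout_s=1.0)
slow_result = run_with_timeout(slow_check, timeout_s=0.05)
```

The test run moves on after the limit, but the abandoned work keeps running in the background, so a real suite would likely also need to cancel the query on the database side.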
Take Care of the Production Support Group
or
Non-functional ETL Testing
Non-functional Pipeline Testing
• Performance
• Security
• Load/ Stress
• Scalability
• Usability
• Reliability
Non-Functional Testing
Usability Testing
• Easy to:
   • identify current state
   • find/read error info
   • re-configure
• Flexible Start
• Documentation
Non-Functional Testing
Reliability Testing
• Risk assessment
• Failure simulation
• Volume simulation
Non-Functional Testing
Reliability Testing Challenges
• Hidden risks
• Dependency on third-party services
• Communication gaps
• Underestimation of severity
• Underestimation of probability
Non-Functional Testing: Challenges
Be Informed!
• Monitor Services Logs
• Organize Recovery Training
• Be specific with to-do’s
Non-Functional Testing: Challenges
We’re done! Aren’t we?
Add Analytics for
a New Business Module…
please
New Data Module Creation
or
Functional ETL Testing
Data Warehouse Testing
[Diagram: SOURCE → Extract → Transform → Load → TARGET, annotated with Test Underlying Data, Test Data Model, Smoke Tests, Data Quality Tests, and Balancing Tests]
Test Underlying Data
1. Gather info – bridge gaps!
2. Break rules that can be broken
3. Draft a Troubleshooting doc
Source Area Testing
Test Target Data Model
1. Naming convention
2. Optimal base for Visualization
3. Testability checks
Data Mart Structure Testing
Functional ETL Testing
• Smoke Tests
• Target Data Quality tests:
   • Type
   • Constraint
   • Data Plausibility
   • Logical Constraints
Note: create similar / relevant tests for the Source where applicable, to help with further debugging
Functional ETL Testing
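The four kinds of target data quality tests above could look like the following rule set. This is only a sketch: the order rows, the column names, and the thresholds are hypothetical, and real tests would normally run as queries against the target rather than over in-memory rows.

```python
from datetime import date

# Hypothetical target rows; column names are illustrative only.
rows = [
    {"order_id": 1, "quantity": 3, "created": date(2024, 1, 5), "shipped": date(2024, 1, 9)},
    {"order_id": 2, "quantity": 50000, "created": date(2024, 2, 1), "shipped": date(2024, 2, 3)},
    {"order_id": 3, "quantity": 2, "created": date(2024, 3, 10), "shipped": date(2024, 3, 8)},
    {"order_id": None, "quantity": "4", "created": date(2024, 4, 1), "shipped": date(2024, 4, 2)},
]

# One rule per test kind; each rule returns True when the row is acceptable.
RULES = {
    # Type: quantity must be stored as an integer.
    "type": lambda r: isinstance(r["quantity"], int),
    # Constraint: the key must not be NULL.
    "constraint": lambda r: r["order_id"] is not None,
    # Plausibility: an assumed sane range; skipped when the type is already wrong.
    "plausibility": lambda r: not isinstance(r["quantity"], int) or 0 < r["quantity"] <= 1000,
    # Logical constraint: an order cannot ship before it is created.
    "logical": lambda r: r["created"] <= r["shipped"],
}


def failed_rules(row):
    """Return the names of the rules the row violates."""
    return [name for name, rule in RULES.items() if not rule(row)]


# Map row index -> violated rules, keeping only rows with at least one failure.
report = {}
for index, row in enumerate(rows):
    failures = failed_rules(row)
    if failures:
        report[index] = failures
```

Keeping each rule named makes a failure report readable on its own, which also helps when the same rules are reused against the source.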
Functional ETL Testing
• Balancing Tests:
   • Study / create specification
   • Test minus-query assertions via mutated data
   • Do both-sides comparison
Functional ETL Testing
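A both-sides minus-query comparison, and the idea of proving the assertion with mutated data, can be sketched as follows. The example uses SQLite's EXCEPT through Python's sqlite3 module as a stand-in for the real engine; the tables and the transformation rule (total = quantity × unit price) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, quantity INTEGER, unit_price_cents INTEGER);
    CREATE TABLE tgt_orders (order_id INTEGER, total_cents INTEGER);
    INSERT INTO src_orders VALUES (1, 2, 500), (2, 5, 1000), (3, 1, 999);
    INSERT INTO tgt_orders VALUES (1, 1000), (2, 5000), (3, 999);
""")

# Assumed transformation rule applied to the source, and the actual target content.
EXPECTED = "SELECT order_id, quantity * unit_price_cents AS total_cents FROM src_orders"
ACTUAL = "SELECT order_id, total_cents FROM tgt_orders"


def minus(conn, left, right):
    """Rows returned by left but not by right."""
    return sorted(conn.execute(f"{left} EXCEPT {right}").fetchall())


def balance(conn):
    """Run the minus query in both directions."""
    return {
        "missing_in_target": minus(conn, EXPECTED, ACTUAL),
        "unexpected_in_target": minus(conn, ACTUAL, EXPECTED),
    }


clean = balance(conn)

# Mutate one target row on purpose to confirm the assertion can actually fail.
conn.execute("UPDATE tgt_orders SET total_cents = total_cents + 1 WHERE order_id = 2")
mutated = balance(conn)
```

A one-directional minus query would miss rows that exist only in the target, which is why both directions are checked.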
Balancing Tests
• One all-data storage
• AWS Glue & Athena
Functional ETL Testing
Most Common bugs
• Count mismatch (incl. duplicates)
• Anomalies: NULL- or length-related issues
• Date-related calculations
Functional ETL Testing
ETL Testing Challenges
• Test complexity
• Unpredictably slow performance of AWS Athena
• Impossible to check every single record
Functional ETL Testing
Visualization in Data QA
• Source Data Analysis
• Target Quality Dashboard
• Dedicated resources & Test Results visualization
Functional ETL Testing
Ongoing Support
• Data Integrity Project
• Ongoing Logs Analysis
• Monitoring Rules & Alarms
Testing in Production
Data Pipeline
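As an illustration of a monitoring rule with an alarm, the sketch below flags a table whose row count drops sharply between two runs. The function name, the 10% threshold, and the message format are assumptions for the example, not something taken from the talk.

```python
def row_count_alarm(table, previous_count, current_count, max_drop=0.10):
    """Return an alarm message if the row count fell by more than max_drop, else None."""
    if previous_count <= 0:
        return None  # no baseline to compare against
    drop = (previous_count - current_count) / previous_count
    if drop > max_drop:
        return (
            f"ALARM {table}: row count fell {drop:.0%} "
            f"({previous_count} -> {current_count})"
        )
    return None


quiet = row_count_alarm("visits", previous_count=1000, current_count=990)
alarm = row_count_alarm("visits", previous_count=1000, current_count=700)
```

A rule like this is cheap enough to evaluate after every daily run, and the threshold would need tuning per table.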
Key Takeaways
• ETL verification is not that bad
• Know your data
• Be ready to meet Monsters:
   • Long ETL duration
   • Big data volume
   • Test data that differs from Prod
Your questions

Data Pipeline Installation Quality
