Pipeline Testing Story
IRINA PASHKOVA
QA Lead, GreenM
Agenda
1. Regression ETL testing
2. Non-functional ETL testing
3. Functional ETL Testing
Puppy to Play with
Daily Runs
Full Refresh Mode
300 Customers
~500 million records per table
~5 h ETL time
Better & Faster – ETL Evolution
or
Regression ETL Testing
ETL: Extract, Transform, Load
Operational Storages, or DATA SOURCES
Reporting-Oriented Data Marts, or TARGETS
New Pipeline Version
Regression Testing
Non-Functional Testing
• Same Sources & Targets
• Same Transformation Rules
• Previous fully tested version of ETL available
Regression via Reference Data Schema
• Exclude
  • Tracking fields
  • New functionality Data
• Clean up Test Schema
• Run Smoke suite first
[Diagram: SOURCE → NEW ETL VERSION → TESTED TARGET; SOURCE → PROD ETL VERSION → REFERENCE TARGET]
Regression Testing
FitNesse for ETL Regression
• Config files
  • Connections
  • Table parameters
• Fixtures
  • Non-empty table
  • No duplicates
  • Counts match
  • Content match
Regression Testing
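The four fixture checks listed above can be sketched outside FitNesse as plain queries. This is a minimal illustration, not the deck's actual fixtures: it uses Python's built-in sqlite3 as a stand-in database, and the table names (`ref_sales`, `test_sales`), the key columns, and the `loaded_at` tracking field are all hypothetical.

```python
import sqlite3

# Hypothetical names: `loaded_at` stands in for a tracking field to exclude.
KEY_COLS = ["customer_id", "day"]
COMPARE_COLS = ["customer_id", "day", "revenue"]  # tracking fields left out


def scalar(conn, sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]


def check_table(conn, tested, reference):
    """Run the four checks for one tested table against its reference table."""
    cols = ", ".join(COMPARE_COLS)
    keys = ", ".join(KEY_COLS)
    tested_count = scalar(conn, f"SELECT COUNT(*) FROM {tested}")
    reference_count = scalar(conn, f"SELECT COUNT(*) FROM {reference}")
    duplicate_keys = scalar(
        conn,
        f"SELECT COUNT(*) FROM (SELECT {keys} FROM {tested} "
        f"GROUP BY {keys} HAVING COUNT(*) > 1)",
    )
    # Rows present on one side only, compared in both directions.
    only_tested = scalar(
        conn,
        f"SELECT COUNT(*) FROM (SELECT {cols} FROM {tested} "
        f"EXCEPT SELECT {cols} FROM {reference})",
    )
    only_reference = scalar(
        conn,
        f"SELECT COUNT(*) FROM (SELECT {cols} FROM {reference} "
        f"EXCEPT SELECT {cols} FROM {tested})",
    )
    return {
        "non_empty": tested_count > 0,
        "no_duplicates": duplicate_keys == 0,
        "counts_match": tested_count == reference_count,
        "content_match": only_tested == 0 and only_reference == 0,
    }


def build_demo():
    """Create an in-memory reference and tested table with identical data."""
    conn = sqlite3.connect(":memory:")
    for name in ("ref_sales", "test_sales"):
        conn.execute(
            f"CREATE TABLE {name} "
            "(customer_id INTEGER, day TEXT, revenue REAL, loaded_at TEXT)"
        )
    rows = [(1, "2024-01-01", 10.0), (2, "2024-01-01", 20.5)]
    conn.executemany("INSERT INTO ref_sales VALUES (?, ?, ?, 'run-1')", rows)
    conn.executemany("INSERT INTO test_sales VALUES (?, ?, ?, 'run-2')", rows)
    return conn
```

Calling `check_table(build_demo(), "test_sales", "ref_sales")` compares the two demo tables. The differing `loaded_at` values do not affect the result because that column is not in `COMPARE_COLS`.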
FitNesse for ETL Regression
Regression Testing
Regression Challenges
• Long ETL run time
• Big data volume
• Time wasted waiting for a fix / change
• Hanging tests
Regression Testing
Manual Inspections
• Configurations:
  • Connections
  • Run mode
• Pipeline step order & dependencies
• Source & Target Tables
• ETL code queries
Regression Testing: Challenges
Set the Limits!
• “Partial” run & reuse of the Extract
• Limit compared data
• Set timeouts in tests
• Model missing data
[Diagram: Extract → Transform → Load]
Regression Testing: Challenges
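Two of the limits above, test timeouts and a reduced comparison set, can be sketched as follows. This is an assumption-laden illustration rather than the deck's setup: `run_with_timeout` and `sample_keys` are hypothetical helpers, and the timeout only stops the test from waiting; it does not cancel the underlying work.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_timeout(check, timeout_s):
    """Run check() in a worker thread and report 'timeout' instead of hanging."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(check)
    try:
        return ("done", future.result(timeout=timeout_s))
    except FutureTimeout:
        return ("timeout", None)
    finally:
        # Do not wait for a stuck worker; the thread itself is not killed.
        pool.shutdown(wait=False)


def sample_keys(keys, every_nth):
    """Pick a deterministic subset of integer keys so only part of the data is compared."""
    return [k for k in keys if k % every_nth == 0]


if __name__ == "__main__":
    print(run_with_timeout(lambda: "fast check", timeout_s=1.0))
    print(run_with_timeout(lambda: time.sleep(0.3), timeout_s=0.05))
    print(sample_keys(range(10), every_nth=5))
```

A deterministic sample (rather than a random one) keeps reruns comparable, which matters when a failure has to be reproduced after a fix.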
Take Care of the Production Support Group
or
Non-functional ETL Testing
Non-functional Pipeline Testing
• Performance
• Security
• Load/ Stress
• Scalability
• Usability
• Reliability
Non-Functional Testing
Usability Testing
• Easy to
  • identify current state
  • find/read Error info
  • re-configure
• Flexible Start
• Documentation
Non-Functional Testing
Reliability Testing
• Risk assessment
• Failure simulation
• Volume simulation
Non-Functional Testing
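One way to picture failure simulation is to inject a fault in the middle of a load step and assert that the target is left in a known state. The sketch below assumes, purely for illustration, that the load should be atomic; `SimulatedOutage`, `load_rows`, and the `target` table are hypothetical, and sqlite3 stands in for the real storage.

```python
import sqlite3


class SimulatedOutage(Exception):
    """Stands in for a dependency failing in the middle of a load."""


def load_rows(conn, rows, fail_after=None):
    """Insert rows in one transaction; optionally fail after `fail_after` rows."""
    with conn:  # commits on success, rolls back if an exception escapes
        for i, row in enumerate(rows):
            if fail_after is not None and i == fail_after:
                raise SimulatedOutage("injected failure")
            conn.execute("INSERT INTO target VALUES (?, ?)", row)


def row_count(conn):
    """Number of rows currently visible in the target table."""
    return conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER, value TEXT)")
    rows = [(1, "a"), (2, "b"), (3, "c")]
    try:
        load_rows(conn, rows, fail_after=2)
    except SimulatedOutage:
        print("rows after failed load:", row_count(conn))
    load_rows(conn, rows)
    print("rows after clean rerun:", row_count(conn))
```

The same pattern extends to other injected faults (a dropped connection, a missing source file), as long as the test states up front what the target should look like afterwards.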
Reliability Testing Challenges
• Hidden risks
• Dependency on 3rd-party services
• Communication gaps
• Underestimation of severity
• Underestimation of probability
Non-Functional Testing: Challenges
Be Informed!
• Monitor Services Logs
• Organize Recovery Training
• Be specific with to-do’s
Non-Functional Testing: Challenges
We’re done! Aren’t we?
Add Analytics for a New Business Module… please
New Data Module Creation
or
Functional ETL Testing
Data Warehouse Testing
[Diagram: Extract → Transform → Load between SOURCE and TARGET; labels: Test Underlying Data, Test Data Model, Balancing Tests, Data Quality Tests, Smoke Tests]
Test Underlying Data
1. Gather info – bridge gaps!
2. Break rules that can be broken
3. Draft a Troubleshooting doc
Source Area Testing
Test Target Data Model
1. Naming convention
2. Optimal base for Visualization
3. Testability checks
Data Mart Structure Testing
Functional ETL Testing
• Smoke Tests
• Target Data Quality tests:
  • Type
  • Constraint
  • Data Plausibility
  • Logical Constraints
Note: create similar or relevant tests for the Source where applicable, to help with further debugging.
Functional ETL Testing
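The four kinds of target data quality test named above can each be written as a query that counts violating rows. The sketch below is only an illustration under assumed names: the `orders` table, its columns, and the plausibility range are hypothetical, and sqlite3 stands in for the real target.

```python
import sqlite3

# Each check counts violating rows; zero means the check passes.
CHECKS = {
    "type: amount is numeric": (
        "SELECT COUNT(*) FROM orders "
        "WHERE typeof(amount) NOT IN ('integer', 'real')"
    ),
    "constraint: order_id is not null": (
        "SELECT COUNT(*) FROM orders WHERE order_id IS NULL"
    ),
    "plausibility: amount within range": (
        "SELECT COUNT(*) FROM orders "
        "WHERE typeof(amount) IN ('integer', 'real') "
        "AND (amount < 0 OR amount > 1000000)"
    ),
    "logical: shipped on or after ordered": (
        "SELECT COUNT(*) FROM orders WHERE shipped_on < ordered_on"
    ),
}


def run_quality_checks(conn):
    """Return the number of violating rows per check."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}


def build_demo():
    """Create an in-memory table with one violation per check."""
    conn = sqlite3.connect(":memory:")
    # Columns are declared without types so SQLite stores values as given.
    conn.execute("CREATE TABLE orders (order_id, amount, ordered_on, shipped_on)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, 10.0, "2024-01-01", "2024-01-02"),     # clean row
            (2, -5, "2024-01-01", "2024-01-03"),       # implausible amount
            (None, 20, "2024-01-01", "2024-01-02"),    # missing key
            (4, "n/a", "2024-01-05", "2024-01-01"),    # wrong type, bad dates
        ],
    )
    return conn
```

Because every check is just a count query, the same dictionary can be pointed at a source table with matching columns, which is one way to follow the note about creating similar tests for the Source.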
Functional ETL Testing
• Balancing Tests:
  • Study / create the specification
  • Test minus-query assertions via mutated data
  • Do a both-sides comparison
Functional ETL Testing
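A minus-query balancing test, its both-sides comparison, and a mutation that proves the assertion can fail might look like the sketch below. It is a stand-in under stated assumptions: sqlite3 replaces the real query engine, and the tables `src_orders` and `tgt_totals` plus the rule "total = sum of amounts per customer" are hypothetical.

```python
import sqlite3

# Hypothetical transformation rule: target total = SUM(source amount) per customer.
EXPECTED = (
    "SELECT customer_id, SUM(amount) AS total "
    "FROM src_orders GROUP BY customer_id"
)
ACTUAL = "SELECT customer_id, total FROM tgt_totals"


def minus_both_sides(conn):
    """Return (rows expected but missing in target, rows in target but not expected)."""
    missing = conn.execute(f"{EXPECTED} EXCEPT {ACTUAL}").fetchall()
    unexpected = conn.execute(f"{ACTUAL} EXCEPT {EXPECTED}").fetchall()
    return missing, unexpected


def build_demo():
    """Create a balanced source/target pair in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_orders (customer_id INTEGER, amount INTEGER)")
    conn.execute("CREATE TABLE tgt_totals (customer_id INTEGER, total INTEGER)")
    conn.executemany("INSERT INTO src_orders VALUES (?, ?)", [(1, 10), (1, 20), (2, 5)])
    conn.executemany("INSERT INTO tgt_totals VALUES (?, ?)", [(1, 30), (2, 5)])
    return conn


if __name__ == "__main__":
    conn = build_demo()
    print("balanced:", minus_both_sides(conn))
    # Mutate the target on purpose to confirm the assertion can actually fail.
    conn.execute("UPDATE tgt_totals SET total = 31 WHERE customer_id = 1")
    print("mutated value:", minus_both_sides(conn))
    # An extra target row is invisible to the expected-minus-actual direction alone.
    conn.execute("INSERT INTO tgt_totals VALUES (3, 7)")
    print("extra target row:", minus_both_sides(conn))
```

The extra-row case is the reason for comparing both sides: a single-direction minus query returns nothing for rows that exist only on the subtracted side.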
Balancing Tests
• One all-data storage
• AWS Glue & Athena
Functional ETL Testing
Most Common bugs
• Count mismatch (incl. duplicates)
• Anomaly issues: null- or length-related
• Date-related calculations
Functional ETL Testing
ETL Testing Challenges
• Test complexity
• Unpredictably slow AWS Athena performance
• Impossible to check every single record
Functional ETL Testing
Visualization in Data QA
• Source Data Analysis
• Target Quality Dashboard
• Dedicated resources & Test Results visualization
Functional ETL Testing
Ongoing Support
• Data Integrity Project
• Ongoing Logs Analysis
• Monitoring Rules & Alarms
Testing in Production
Data Pipeline
Key Takeaways
• ETL verification is not that bad
• Know your data
• Be ready to meet Monsters
  • Long ETL duration
  • Big Data Volume
  • Difference between Test Data and Prod
Your questions

Data Pipeline Installation Quality
