ETL Testing
- Chetan Gadodia
Agenda
◎Data Warehouse Architecture
◎What is ETL?
◎Why is ETL a Separate Testing Type?
◎Common ETL Jargon
◎ETL Loading Strategies
◎ETL Testing Types
◎Preparing Test Data for ETL Testing
◎ETL Testing Challenges
◎Best Practices in ETL Testing
◎Demo Example
Data Warehouse Architecture
ETL – Extract, Transform and Load
◎ Data is taken (extracted) from a source system,
converted (transformed) into a format that can be
analyzed, and stored (loaded) into a data warehouse or
other system
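The three steps above can be sketched in a few lines of Python with the standard-library `sqlite3` module. This is a minimal illustration under assumptions, not a real pipeline: the tables `src_orders` and `dw_orders` and the cleaning rules (trim and upper-case customer names, convert cents to a decimal amount) are hypothetical, and both tables sit in one in-memory database for simplicity.

```python
import sqlite3


def extract(conn):
    """Extract: read raw rows from the hypothetical source table src_orders."""
    return conn.execute(
        "SELECT order_id, customer, amount_cents FROM src_orders ORDER BY order_id"
    ).fetchall()


def transform(rows):
    """Transform: clean customer names and convert cents into a decimal amount."""
    return [
        (order_id, customer.strip().upper(), amount_cents / 100)
        for order_id, customer, amount_cents in rows
    ]


def load(conn, rows):
    """Load: write the transformed rows into the hypothetical warehouse table dw_orders."""
    conn.executemany(
        "INSERT INTO dw_orders (order_id, customer, amount) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def run_etl(conn):
    load(conn, transform(extract(conn)))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_orders (order_id INTEGER, customer TEXT, amount_cents INTEGER)")
    conn.execute("CREATE TABLE dw_orders (order_id INTEGER, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO src_orders VALUES (?, ?, ?)", [(1, " alice ", 1250), (2, "bob", 99)])
    run_etl(conn)
    print(conn.execute("SELECT * FROM dw_orders ORDER BY order_id").fetchall())
```

Keeping each step a separate function means a tester can check the transform rules on their own, without a database.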
Why Is ETL a Separate Testing Type?
◎Validation of Data Migration (End-to-End)
○ Source to Target record count match
○ Source to Target data match
○ Transformation of Data
○ Loading Techniques – Full, Incremental
◎Comparison – Current (Legacy) vs Future system
○ Reports / Data comparison
○ Loading time
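The first two checks, source-to-target record count match and data match, can be sketched as plain SQL run through Python's `sqlite3`. The table and column names are hypothetical. Because `EXCEPT` compares distinct rows, duplicate rows can slip past a data match, which is one reason to keep the count comparison as a separate check.

```python
import sqlite3

# Table and column names are formatted into the SQL text, so they must come from
# trusted test configuration, never from user input.


def record_counts(conn, source, target):
    """Return (source_count, target_count)."""
    src = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    return src, tgt


def data_mismatches(conn, source, target, columns):
    """Return (rows missing from the target, rows in the target but not in the source)."""
    cols = ", ".join(columns)
    missing = conn.execute(
        f"SELECT {cols} FROM {source} EXCEPT SELECT {cols} FROM {target}"
    ).fetchall()
    extra = conn.execute(
        f"SELECT {cols} FROM {target} EXCEPT SELECT {cols} FROM {source}"
    ).fetchall()
    return missing, extra


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_customer (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE tgt_customer (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO src_customer VALUES (?, ?)", [(1, "Ann"), (2, "Raj"), (3, "Li")])
    conn.executemany("INSERT INTO tgt_customer VALUES (?, ?)", [(1, "Ann"), (2, "Raj"), (4, "Li")])
    print(record_counts(conn, "src_customer", "tgt_customer"))
    print(data_mismatches(conn, "src_customer", "tgt_customer", ["id", "name"]))
```

In the demo the counts match (3 and 3) while the data match fails, because id 3 was loaded as id 4: a passing count check alone does not prove the migration is correct.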
Contd..
◎Validation of Business use cases
○ Transformation of data into different formats for downstream systems
○ File Transfer
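For file transfers to downstream systems, one possible check is that the received file is byte-for-byte identical to the sent file and carries the expected number of records. The sketch below assumes a hypothetical CSV layout with a single header row and compares SHA-256 checksums from `hashlib`; the real format and checks depend on the downstream system's specification.

```python
import csv
import hashlib
import io


def export_csv(rows, header) -> bytes:
    """Write rows in the (hypothetical) CSV layout a downstream system expects."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def validate_transfer(sent: bytes, received: bytes, expected_rows: int) -> list:
    """Return a list of problems; an empty list means the transfer check passed."""
    problems = []
    if sha256_of(sent) != sha256_of(received):
        problems.append("checksum mismatch")
    rows = list(csv.reader(io.StringIO(received.decode("utf-8"))))
    data_rows = max(len(rows) - 1, 0)  # exclude the header row
    if data_rows != expected_rows:
        problems.append(f"expected {expected_rows} data rows, found {data_rows}")
    return problems


if __name__ == "__main__":
    sent = export_csv([(1, "Ann"), (2, "Raj")], ["id", "name"])
    print(validate_transfer(sent, sent, expected_rows=2))
```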
ETL Jargon
◎Data Formats
○ Structured – clearly defined data types
(CSV files, databases, tab-separated files, etc.)
○ Unstructured – not as easily searchable
(emails, web pages, videos, etc.)
◎Dimensions
○ Descriptive attributes, mostly textual fields
○ Examples: people, products, places and time
Contd..
◎Facts
○ A fact table consists of business facts (measures) and foreign keys that refer to primary keys in the dimension tables; together they provide the measurements of an enterprise
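Dimensions and facts are usually arranged as a star schema, and a typical ETL test on it is referential integrity: every foreign key in the fact table should match a primary key in its dimension. A minimal sketch, assuming hypothetical `dim_product`, `dim_date` and `fact_sales` tables in SQLite:

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE fact_sales  (product_key INTEGER, date_key INTEGER, quantity INTEGER, amount REAL);
"""


def orphan_fact_rows(conn, fact, fk, dim, pk):
    """Fact rows whose foreign key has no matching primary key in the dimension.

    A NULL foreign key also counts as an orphan, since it cannot match any dimension row.
    Names are formatted into the SQL, so they must come from trusted test configuration.
    """
    return conn.execute(
        f"SELECT f.* FROM {fact} f LEFT JOIN {dim} d ON f.{fk} = d.{pk} "
        f"WHERE d.{pk} IS NULL"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Pen"), (2, "Book")])
    conn.execute("INSERT INTO dim_date VALUES (20240101, '2024-01-01')")
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 20240101, 3, 4.5), (9, 20240101, 1, 2.0)])
    print(orphan_fact_rows(conn, "fact_sales", "product_key", "dim_product", "product_key"))
```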
Contd..
◎Staging Layer
○ The staging area holds temporary tables on the data warehouse server
◎Look-up
○ Reference tables – used to fetch matching values
○ Target tables – used to find delta records for an incremental load
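A look-up against a reference table can be sketched as a `LEFT JOIN`, so that staging rows without a matching reference value surface as `None` instead of silently disappearing, as they would with an inner join. The tables `stg_orders` and `ref_country` are hypothetical.

```python
import sqlite3


def apply_lookup(conn):
    """Fetch the matching reference value for each staging row; unmatched codes give None."""
    return conn.execute(
        "SELECT s.order_id, s.country_code, r.country_name "
        "FROM stg_orders s LEFT JOIN ref_country r ON s.country_code = r.country_code "
        "ORDER BY s.order_id"
    ).fetchall()


def unmatched_lookups(rows):
    """Rows whose lookup failed; a test would expect the ETL job to reject or default these."""
    return [row for row in rows if row[2] is None]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ref_country (country_code TEXT PRIMARY KEY, country_name TEXT)")
    conn.execute("CREATE TABLE stg_orders (order_id INTEGER, country_code TEXT)")
    conn.executemany("INSERT INTO ref_country VALUES (?, ?)", [("IN", "India"), ("US", "United States")])
    conn.executemany("INSERT INTO stg_orders VALUES (?, ?)", [(1, "IN"), (2, "XX")])
    print(unmatched_lookups(apply_lookup(conn)))
```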
ETL Loading Strategies
◎Full Load – Truncate and Load
○ The target table is truncated before new data is loaded (e.g., staging area tables)
◎Incremental Load
○ Loads data in increments instead of reloading everything
○ Only new and changed data is loaded into the destination
○ Preserves the historical data already loaded into the target
○ Uses timestamps, flags or business keys to identify delta records
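Both loading strategies can be sketched against SQLite. The tables, the `updated_at` timestamp used as the delta criterion, and the upsert on the business key `id` are illustrative assumptions. Here the watermark is derived from the target itself; many ETL designs store it in a separate control table instead.

```python
import sqlite3


def full_load(conn):
    """Truncate and load: empty the target, then reload everything from the source.

    SQLite has no TRUNCATE statement, so DELETE without a WHERE clause stands in for it.
    """
    conn.execute("DELETE FROM tgt_customer")
    conn.execute("INSERT INTO tgt_customer SELECT id, name, updated_at FROM src_customer")
    conn.commit()


def incremental_load(conn):
    """Load only rows changed since the newest timestamp already in the target."""
    # Watermark: newest updated_at in the target ('' when the target is empty).
    # ISO-8601 date strings compare correctly as text.
    watermark = conn.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM tgt_customer"
    ).fetchone()[0]
    delta = conn.execute(
        "SELECT id, name, updated_at FROM src_customer WHERE updated_at > ?",
        (watermark,),
    ).fetchall()
    # Upsert on the business key: new ids are inserted, changed ids are overwritten.
    # ON CONFLICT ... DO UPDATE requires SQLite 3.24 or newer.
    conn.executemany(
        "INSERT INTO tgt_customer (id, name, updated_at) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name, updated_at = excluded.updated_at",
        delta,
    )
    conn.commit()
    return len(delta)
```

A test for the incremental path typically changes a few source rows, runs the load, and asserts that exactly those rows (and no others) reached the target.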
SCD Types
◎A Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse.
◎Implementing SCDs is considered one of the most critical ETL tasks, because it tracks the history of dimension records.
Contd..
◎Type 0 SCD – Fixed Dimension
○ No changes are allowed; the dimension never changes
◎Type 1 SCD – Overwriting
○ Existing data is overwritten and lost, as it is not stored anywhere else
○ The default type when creating a dimension
◎Type 2 SCD – Creating Another Dimension Record
○ When the value of a tracked attribute changes, the current record is closed and a new record is created, which becomes the current record
○ Each record contains an effective time and an expiration time
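The Type 2 rules above (close the current record, insert a new current record with effective and expiration times) can be sketched as follows, together with a check a tester might run afterwards: each business key should have exactly one current version. The `dim_customer` table, the `is_current` flag and the `9999-12-31` open-ended expiry date are common conventions assumed here, not details from the original slides.

```python
import sqlite3

DIM_DDL = """
CREATE TABLE dim_customer (
    sk             INTEGER PRIMARY KEY,  -- surrogate key
    customer_id    INTEGER,              -- business key
    city           TEXT,                 -- tracked attribute
    effective_date TEXT,
    expiry_date    TEXT,
    is_current     INTEGER
)
"""
OPEN_END = "9999-12-31"  # assumed sentinel meaning "not yet expired"


def scd2_apply(conn, customer_id, city, as_of):
    """Apply one incoming source record to dim_customer using Type 2 rules."""
    current = conn.execute(
        "SELECT sk, city FROM dim_customer WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if current is not None and current[1] == city:
        return  # tracked attribute unchanged: keep the current version
    if current is not None:
        # Close the current version by setting its expiration time.
        conn.execute(
            "UPDATE dim_customer SET expiry_date = ?, is_current = 0 WHERE sk = ?",
            (as_of, current[0]),
        )
    # Insert the new version, which becomes the current record.
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, effective_date, expiry_date, is_current) "
        "VALUES (?, ?, ?, ?, 1)",
        (customer_id, city, as_of, OPEN_END),
    )
    conn.commit()


def current_version_violations(conn):
    """Business keys without exactly one current version (should be empty)."""
    return conn.execute(
        "SELECT customer_id, SUM(is_current) FROM dim_customer "
        "GROUP BY customer_id HAVING SUM(is_current) <> 1"
    ).fetchall()
```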
ETL Testing Types
◎Production Validation Testing
○ Also called table balancing or production reconciliation. It is performed on data before or while it is moved into the production system, to confirm it arrives in the correct order.
◎Source To Target Testing
○ Performed to validate the data values after data transformation.
◎Application Upgrade
○ Checks that the data extracted from an older application or repository is exactly the same as the data in the new application or repository.
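Table balancing for production validation can be sketched as a comparison of per-group row counts and totals between source and target. The table and column names are hypothetical, and amounts are stored as integer cents so that sums compare exactly.

```python
import sqlite3


def _totals(conn, table, group_col, measure_col):
    """Row count and sum of the measure per group, keyed by group value."""
    rows = conn.execute(
        f"SELECT {group_col}, COUNT(*), SUM({measure_col}) FROM {table} GROUP BY {group_col}"
    )
    return {key: (count, total) for key, count, total in rows}


def balance_report(conn, source, target, group_col, measure_col):
    """Groups whose (count, sum) differ between source and target; empty means balanced."""
    src = _totals(conn, source, group_col, measure_col)
    tgt = _totals(conn, target, group_col, measure_col)
    return {
        key: (src.get(key), tgt.get(key))
        for key in src.keys() | tgt.keys()
        if src.get(key) != tgt.get(key)
    }
```

A group present on only one side shows up with `None` for the missing side, so dropped or invented groups are reported as well as wrong totals.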
Contd..
◎Data Transformation Testing
○ Multiple SQL queries are needed to verify, for every row, that the transformation rules were applied correctly.
◎Data Completeness Testing
○ Verifies that all expected data is loaded into the appropriate destination, as per the predefined standards.
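Both checks can be written as set-based SQL, so that one query covers every row instead of issuing a query per row. In this sketch the transformation rule (`full_name` is the trimmed, upper-cased first and last name) and the tables `src_person` and `tgt_person` are hypothetical; the completeness check reports source keys missing from the target and target rows with `NULL` in mandatory columns.

```python
import sqlite3

# Hypothetical mapping rule under test, re-applied to the source side.
EXPECTED_FULL_NAME = "UPPER(TRIM(s.first_name) || ' ' || TRIM(s.last_name))"


def transformation_failures(conn):
    """(id, expected, actual) for target rows that do not match the rule."""
    return conn.execute(
        f"SELECT s.id, {EXPECTED_FULL_NAME} AS expected, t.full_name AS actual "
        "FROM src_person s JOIN tgt_person t ON t.id = s.id "
        f"WHERE t.full_name IS NOT {EXPECTED_FULL_NAME} "  # IS NOT: SQLite's NULL-safe inequality
        "ORDER BY s.id"
    ).fetchall()


def completeness_issues(conn, mandatory_cols=("full_name",)):
    """Source keys missing from the target, and target rows with NULL mandatory columns."""
    missing = [row[0] for row in conn.execute(
        "SELECT id FROM src_person EXCEPT SELECT id FROM tgt_person ORDER BY 1")]
    null_filter = " OR ".join(f"{col} IS NULL" for col in mandatory_cols)
    nulls = [row[0] for row in conn.execute(
        f"SELECT id FROM tgt_person WHERE {null_filter} ORDER BY id")]
    return {"missing_keys": missing, "null_mandatory": nulls}
```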
Preparing Test Data
◎Sources of Test Data
○ Manually
○ Mass copy of data from production to testing environment
○ Mass copy of test data from legacy client systems
○ Automated Test Data Generation Tools
◎How to select data for testing
○ Data profiling
○ Full field length data
○ Null records
○ Lookup values
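The selection criteria above can be turned into a small generator of edge-case rows plus a simple column profile. This is a sketch under assumptions: the column widths, the lookup column and its valid codes are supplied by the caller, `lookup_values` must be non-empty, and `??` is an arbitrary code chosen so that it fails the lookup.

```python
def build_test_rows(column_lengths, lookup_col, lookup_values, invalid_code="??"):
    """Edge-case rows: full field length, NULLs, every valid lookup code and one invalid code."""
    base = {col: "a" for col in column_lengths}
    base[lookup_col] = lookup_values[0]
    rows = []
    # Full field length: fill every non-lookup column to its declared maximum width.
    full = dict(base)
    for col, width in column_lengths.items():
        if col != lookup_col:
            full[col] = "X" * width
    rows.append(full)
    # Null records: one row per column, with that column set to NULL (None).
    for col in column_lengths:
        rows.append({**base, col: None})
    # Lookup values: every valid code, plus one code that should fail the lookup.
    for code in list(lookup_values) + [invalid_code]:
        rows.append({**base, lookup_col: code})
    return rows


def profile(rows):
    """Basic data profile per column: null count, distinct non-null values, max text length."""
    report = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        non_null = [v for v in values if v is not None]
        report[col] = {
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "max_len": max((len(str(v)) for v in non_null), default=0),
        }
    return report
```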
ETL Testing Challenges
◎ Testers do not have privileges to execute ETL jobs on their own
◎ Very large volume and complexity of data
◎ Incompatible and duplicate data
◎ Loss of data during the ETL process
◎ Faults in business processes and procedures
◎ Trouble acquiring and building test data
◎ Unstable testing environment
◎ Missing business flow information
Best Practices
◎Make sure data is transformed correctly
◎Ensure the projected data is loaded into the data warehouse without any data loss or truncation
◎Ensure that the ETL application appropriately rejects invalid data, replaces it with default values where required, and reports it
◎Ensure the appropriate load occurs at each data layer
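The reject, default and report practice can be sketched as a validation step that splits incoming rows into accepted and rejected sets, with a reason recorded for each rejection. The rules (reject a missing `id`, reject names longer than the target column to avoid silent truncation, default a missing `country` to `UNKNOWN`) are hypothetical examples, not rules from the original slides.

```python
from dataclasses import dataclass, field


@dataclass
class LoadResult:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (row, reason) pairs for the error report


def validate_rows(rows, max_name_len, default_country="UNKNOWN"):
    """Apply hypothetical cleansing rules before loading."""
    result = LoadResult()
    for row in rows:
        if row.get("id") is None:
            result.rejected.append((row, "missing id"))
        elif len(row.get("name") or "") > max_name_len:
            # Rejecting is safer than letting the target column silently truncate the value.
            result.rejected.append((row, "name would be truncated"))
        else:
            # Replace a missing country with the default value.
            result.accepted.append({**row, "country": row.get("country") or default_country})
    return result
```

A matching test asserts that accepted plus rejected rows add up to the input count, so no row is lost without being reported.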
Contd..
◎Ensure that data is loaded into the data warehouse within the prescribed and expected time frames, to confirm scalability and performance
◎Ensure records are updated based on the appropriate business key in the target database tables
◎Ensure coding standards are in place while designing ETL mappings
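Two of these practices translate directly into automated checks: no business key should appear more than once in the target after an update load, and the load should finish within its agreed time window. A minimal sketch with a hypothetical `dim_product` table; for a Type 2 dimension the duplicate check would need to be restricted to current rows.

```python
import sqlite3
import time


def duplicate_business_keys(conn, table, business_key):
    """(key, count) for business keys appearing more than once; should be empty."""
    return conn.execute(
        f"SELECT {business_key}, COUNT(*) FROM {table} GROUP BY {business_key} "
        "HAVING COUNT(*) > 1 ORDER BY 1"
    ).fetchall()


def timed_load(load_fn, max_seconds):
    """Run a load callable; return (elapsed seconds, finished within the window?)."""
    start = time.perf_counter()
    load_fn()
    elapsed = time.perf_counter() - start
    return elapsed, elapsed <= max_seconds
```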
Demo
Demonstrating SCD type scenarios
Thanks!
Any questions?
You can find me at:
connect2chetan@live.com
+91-9765180008
/ Chetan_G