ETL Testing
- Chetan Gadodia
Agenda
◎Data Warehouse Architecture
◎What is ETL?
◎Why is ETL a separate Testing Type?
◎ETL Jargon
◎ETL Loading Strategies
◎ETL Testing Types
◎Preparing Test Data for ETL Testing
◎ETL Testing Challenges
◎Best Practices on ETL Testing
◎Demo Example
Data Warehouse Architecture
ETL – Extract, Transform and Load
◎ Data is taken (extracted) from a source system,
converted (transformed) into a format that can be
analyzed, and stored (loaded) into a data warehouse or
other system
ETL - Separate Testing Type?
◎Validation of Data Migration (End-to-End)
○ Source to Target record count match
○ Source to Target data match
○ Transformation of Data
○ Loading Techniques – Full, Incremental
◎Comparison – Current (Legacy) vs Future system
○ Reports / Data comparison
○ Loading time
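The record count and data match checks above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module with an in-memory database; the table names `src_customer` and `tgt_customer`, their columns and the helper `count_and_diff` are hypothetical and not part of the original deck.

```python
import sqlite3

def count_and_diff(conn, source, target, columns):
    """Return (source_count, target_count, source_rows_missing_in_target)."""
    cols = ", ".join(columns)
    src_count = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    tgt_count = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    # EXCEPT keeps source rows that have no identical row in the target.
    missing = conn.execute(
        f"SELECT {cols} FROM {source} EXCEPT SELECT {cols} FROM {target} ORDER BY 1"
    ).fetchall()
    return src_count, tgt_count, missing

# Hypothetical sample data: the counts match, but one name was altered in flight.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customer (id INTEGER, name TEXT);
    CREATE TABLE tgt_customer (id INTEGER, name TEXT);
    INSERT INTO src_customer VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
    INSERT INTO tgt_customer VALUES (1, 'Asha'), (2, 'Ravi K'), (3, 'Meera');
""")
result = count_and_diff(conn, "src_customer", "tgt_customer", ["id", "name"])
print(result)
```

The sample shows why a count match alone is not enough: both tables hold three rows, yet the data comparison still reports a source row with no identical counterpart in the target.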
ETL - Separate Testing Type? (contd.)
◎Validation of Business Use Cases
○ Transformation of data into different formats for downstream
systems
○ File Transfer
ETL Jargon
◎File Systems
○ Structured - clearly defined data types
(CSV, database tables, tab-separated files, etc.)
○ Unstructured - not as easily searchable
(email, web pages, videos, etc.)
◎Dimensions
○ Descriptive attributes, typically textual fields
○ Examples: people, products, place and time
ETL Jargon (contd.)
◎Facts
○ A fact table consists of business facts (measures) and foreign
keys that refer to primary keys in the dimension tables
○ Facts provide the measurements of an enterprise
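One common check that follows from the fact/dimension relationship is looking for "orphan" fact rows, that is, fact rows whose foreign key has no matching primary key in the dimension. The sketch below assumes a hypothetical `dim_product` dimension and `fact_sales` fact table in an in-memory SQLite database; none of these names come from the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE fact_sales (sale_id INTEGER, product_key INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Notebook');
    -- The last fact row points at product_key 9, which has no dimension row.
    INSERT INTO fact_sales VALUES (100, 1, 20.0), (101, 2, 55.5), (102, 9, 10.0);
""")

def orphan_facts(conn):
    """Return fact rows whose foreign key has no matching dimension primary key."""
    return conn.execute("""
        SELECT f.sale_id, f.product_key
        FROM fact_sales AS f
        LEFT JOIN dim_product AS d ON d.product_key = f.product_key
        WHERE d.product_key IS NULL
        ORDER BY f.sale_id
    """).fetchall()

orphans = orphan_facts(conn)
print(orphans)
```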
ETL Jargon (contd.)
◎Staging Layer
○ The staging area is where temporary tables are held on the
data warehouse server
◎Look-up
○ Reference tables – used to fetch the matching values
○ Target tables – used to find the delta records or perform
incremental load
ETL Loading Strategies
◎Full Load – Truncate and Load
○ The target table is truncated before new data is loaded (e.g. in
the staging area)
◎Incremental Load
○ Data is loaded in increments instead of reloading the full data set
○ Only new and changed data is loaded to the destination
○ Used to keep historical data
○ Uses timestamps, flags or business keys to fetch delta records
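A timestamp-driven incremental load might look roughly like the sketch below. It assumes a hypothetical `updated_at` column holding ISO-8601 date strings (so plain string comparison orders them correctly) and uses `id` as the business key. For simplicity it overwrites changed rows rather than keeping their history.

```python
import sqlite3

def incremental_load(conn, last_run):
    """Upsert source rows modified after last_run; return the number of delta rows."""
    delta = conn.execute(
        "SELECT id, name, updated_at FROM src_customer WHERE updated_at > ?",
        (last_run,),
    ).fetchall()
    for row in delta:
        # id is the primary key of the target, so an existing row with the
        # same id is replaced and an unknown id is inserted.
        conn.execute(
            "INSERT OR REPLACE INTO tgt_customer (id, name, updated_at) VALUES (?, ?, ?)",
            row,
        )
    return len(delta)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customer (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT);
    CREATE TABLE tgt_customer (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT);
    INSERT INTO src_customer VALUES
        (1, 'Asha', '2024-01-01'), (2, 'Ravi K', '2024-02-10'), (3, 'Meera', '2024-02-12');
    INSERT INTO tgt_customer VALUES
        (1, 'Asha', '2024-01-01'), (2, 'Ravi', '2024-01-05');
""")
loaded = incremental_load(conn, last_run="2024-02-01")
target_rows = conn.execute(
    "SELECT id, name, updated_at FROM tgt_customer ORDER BY id"
).fetchall()
print(loaded, target_rows)
```

A tester can then verify that only the delta rows (one changed, one new) were touched and that the untouched row kept its original timestamp.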
SCD Types
◎A Slowly Changing Dimension (SCD) is a dimension
that stores and manages both current and historical
data over time in a data warehouse.
◎Handling SCDs is considered one of the most critical
ETL tasks for tracking the history of dimension
records.
SCD Types (contd.)
◎Type 0 SCDs – Fixed Dimension
○ No changes allowed, dimension never changes
◎Type 1 SCDs – Overwriting
○ Existing data is lost as it is not stored anywhere else
○ Default type of dimension you create
◎Type 2 SCDs – Creating another dimension record
○ When the value of a chosen attribute changes, the current record is
closed and a new record is created, which becomes the current record
○ Each record contains an effective time and an expiration time
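The Type 2 behaviour described above can be sketched as follows. The `dim_customer` table, its columns, the tracked attribute `city` and the high date `9999-12-31` used to mark the current record are all assumptions made for illustration; real implementations often add a current-row flag as well.

```python
import sqlite3

HIGH_DATE = "9999-12-31"  # assumed marker meaning "this record is still current"

def apply_scd2(conn, customer_id, new_city, change_date):
    """Close the current record and open a new one when the tracked attribute changes."""
    current = conn.execute(
        "SELECT city FROM dim_customer WHERE customer_id = ? AND expiry_date = ?",
        (customer_id, HIGH_DATE),
    ).fetchone()
    if current is not None and current[0] == new_city:
        return  # tracked attribute unchanged, nothing to do
    if current is not None:
        # Close the current record by setting its expiration date.
        conn.execute(
            "UPDATE dim_customer SET expiry_date = ? "
            "WHERE customer_id = ? AND expiry_date = ?",
            (change_date, customer_id, HIGH_DATE),
        )
    # The new record becomes the current one.
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, effective_date, expiry_date) "
        "VALUES (?, ?, ?, ?)",
        (customer_id, new_city, change_date, HIGH_DATE),
    )

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        surrogate_key INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id INTEGER, city TEXT, effective_date TEXT, expiry_date TEXT
    )
""")
apply_scd2(conn, 1, "Pune", "2023-01-01")
apply_scd2(conn, 1, "Mumbai", "2024-06-01")
apply_scd2(conn, 1, "Mumbai", "2024-07-01")  # no change, so no new record
history = conn.execute(
    "SELECT city, effective_date, expiry_date FROM dim_customer ORDER BY surrogate_key"
).fetchall()
print(history)
```

When testing a Type 2 dimension, typical assertions are that exactly one record per business key is current and that the closed record's expiration time lines up with the new record's effective time.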
ETL Testing Types
◎Production Validation Testing
○ Also called table balancing or production reconciliation. It is
performed on data before or while it is moved into the production
system, in the correct order.
◎Source To Target Testing
○ Performed to validate the data values after data transformation.
◎Application Upgrade
○ Check that data extracted from an older application or repository is
exactly the same as the data in the new application or repository.
ETL Testing Types (contd.)
◎Data Transformation Testing
○ Multiple SQL queries usually need to be run against each row
to verify it against the data transformation rules.
◎Data Completeness Testing
○ Verify that the expected data is loaded at the appropriate
destination as per the predefined standards.
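One way to approach transformation and completeness testing together is to re-implement the mapping rule independently of the ETL job and compare its output with the target, row by row. The sketch below assumes a hypothetical rule (trim both name parts, join them with a space, upper-case the result) and hypothetical `src_person` and `tgt_person` tables.

```python
import sqlite3

def expected_full_name(first, last):
    """Hypothetical mapping rule: trim both parts, join with a space, upper-case."""
    return f"{first.strip()} {last.strip()}".upper()

def transformation_mismatches(conn):
    """Return (id, actual_value) for rows that are wrong or missing in the target."""
    rows = conn.execute("""
        SELECT s.id, s.first_name, s.last_name, t.full_name
        FROM src_person AS s
        LEFT JOIN tgt_person AS t ON t.id = s.id
        ORDER BY s.id
    """).fetchall()
    # A missing target row shows up with actual = None (completeness failure).
    return [
        (pid, actual)
        for pid, first, last, actual in rows
        if actual != expected_full_name(first, last)
    ]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_person (id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE tgt_person (id INTEGER, full_name TEXT);
    INSERT INTO src_person VALUES (1, ' asha ', 'rao'), (2, 'Ravi', 'Kumar'), (3, 'Meera', 'Nair');
    INSERT INTO tgt_person VALUES (1, 'ASHA RAO'), (2, 'Ravi Kumar');
""")
mismatches = transformation_mismatches(conn)
print(mismatches)
```

In the sample data, row 2 fails the transformation rule (not upper-cased) and row 3 fails completeness (never loaded).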
Preparing Test Data
◎Can be Generated
○ Manually
○ Mass copy of data from production to testing environment
○ Mass copy of test data from legacy client systems
○ Automated Test Data Generation Tools
◎How to select data for testing
○ Data profiling
○ Full field length data
○ Null records
○ Lookup values
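As a rough illustration of data profiling for test data selection, the sketch below computes per-column null counts, distinct counts and maximum text lengths, which helps locate null records and full-field-length values worth including in a test set. The function `profile` and the sample rows are hypothetical.

```python
def profile(rows, columns):
    """Per-column null count, distinct count and maximum text length."""
    report = {}
    for i, col in enumerate(columns):
        values = [row[i] for row in rows]
        non_null = [v for v in values if v is not None]
        report[col] = {
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            # default=0 covers columns that contain only nulls.
            "max_len": max((len(str(v)) for v in non_null), default=0),
        }
    return report

# Hypothetical sample: one null name and one long name near the field limit.
sample_rows = [
    (1, "Asha", "IN"),
    (2, None, "IN"),
    (3, "Venkataraman", "US"),
]
report = profile(sample_rows, ["id", "name", "country"])
print(report["name"])
```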
ETL Testing Challenges
◎ Testers often lack the privileges to execute ETL jobs on their own
◎ Data volume and complexity are very high
◎ Incompatible and duplicate data
◎ Loss of data during the ETL process
◎ Faults in business processes and procedures
◎ Trouble acquiring and building test data
◎ Unstable testing environment
◎ Missing business flow information
Best Practices
◎Make sure data is transformed correctly
◎Projected data should be loaded into the data warehouse
without any data loss or truncation
◎Ensure that the ETL application appropriately rejects invalid
data, replaces it with default values and reports it
◎Ensure appropriate load occurs at each data layer
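The reject / default / report behaviour mentioned above could be exercised with a small reference implementation like the one below. The rules are assumptions made for illustration: a record missing a required field is rejected, an empty optional field is replaced with a default, and every action is recorded so it can be reported.

```python
def clean_record(record, defaults, required):
    """Return (cleaned_record, issues); cleaned_record is None when rejected."""
    issues = []
    for field in required:
        if record.get(field) in (None, ""):
            issues.append(f"rejected: missing {field}")
            return None, issues
    cleaned = dict(record)
    for field, default in defaults.items():
        if cleaned.get(field) in (None, ""):
            cleaned[field] = default
            issues.append(f"defaulted: {field}")
    return cleaned, issues

defaults = {"country": "UNKNOWN"}  # hypothetical default value
required = ["id"]                  # hypothetical required field

ok = clean_record({"id": 1, "country": ""}, defaults, required)
bad = clean_record({"id": None, "country": "IN"}, defaults, required)
print(ok)
print(bad)
```

A test can feed the same invalid rows to the ETL job and compare the rejected rows, defaulted values and reported issues against this reference.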
Best Practices (contd.)
◎Ensure that data is loaded into the data warehouse
within the prescribed and expected time frames, to
confirm scalability and performance
◎Ensure records are updated as per the appropriate
business key in the target database tables
◎Ensure coding standards are in place while designing
ETL mappings
Demo
Demonstrating SCD
type scenarios
Thanks!
Any questions?
You can find me at:
connect2chetan@live.com
+91-9765180008
/ Chetan_G