Introduction to ETL Testing
The process of updating the data
warehouse.
Design by: Vibrant
Technologies & Computers
Two Data Warehousing Strategies
• Enterprise-wide warehouse, top down, the Inmon
methodology
• Data mart, bottom up, the Kimball methodology
• When properly executed, both result in an
enterprise-wide data warehouse
The Data Mart Strategy
• The most common approach
• Begins with a single mart; architected marts are
added over time to cover more subject areas
• Relatively inexpensive and easy to implement
• Can be used as a proof of concept for data
warehousing
• Can perpetuate the “silos of information” problem
• Can postpone difficult decisions and activities
• Requires an overall integration plan
The Enterprise-wide Strategy
• A comprehensive warehouse is built initially
• An initial dependent data mart is built using a
subset of the data in the warehouse
• Additional data marts are built using subsets of the
data in the warehouse
• Like all complex projects, it is expensive,
time-consuming, and prone to failure
• When successful, it results in an integrated, scalable
warehouse
Data Sources and Types
• Primarily from legacy, operational systems
• Almost exclusively numerical data at the present
time
• External data may be included, often purchased
from third-party sources
• Technology exists for storing unstructured data, and
this is expected to become more important over time
Extraction, Transformation, and Loading (ETL) Processes
• The “plumbing” work of data warehousing
• Data are moved from source to target databases
• A very costly, time-consuming part of data
warehousing
Recent Development: More Frequent Updates
• Updates can be done in bulk and trickle modes
• Business requirements, such as trading partner
access to a Web site, require current data
• For international firms, there is no good time to load
the warehouse
Recent Development: Clickstream Data
• Results from clicks at web sites
• A dialog manager handles user interactions. An
ODS (operational data store in the data staging
area) helps to custom tailor the dialog
• The clickstream data is filtered and parsed and
sent to a data warehouse where it is analyzed
• Software is available to analyze the clickstream
data
Data Extraction
• Often performed by COBOL routines
(not recommended because of high program
maintenance and no automatically generated
meta data)
• Sometimes source data is copied to the target
database using the replication capabilities of a
standard RDBMS (not recommended because of
“dirty data” in the source systems)
• Increasingly performed by specialized ETL software
Sample ETL Tools
• Teradata Warehouse Builder from Teradata
• DataStage from Ascential Software
• SAS System from SAS Institute
• PowerMart/PowerCenter from Informatica
• Sagent Solution from Sagent Software
• Hummingbird Genio Suite from Hummingbird
Communications
Reasons for “Dirty” Data
• Dummy Values
• Absence of Data
• Multipurpose Fields
• Cryptic Data
• Contradicting Data
• Inappropriate Use of Address Lines
• Violation of Business Rules
• Reused Primary Keys
• Non-Unique Identifiers
• Data Integration Problems
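Three of the problems listed above (dummy values, absence of data, and non-unique identifiers) can be detected mechanically. The following is a minimal sketch only: the placeholder list, the field names, and the profile function are assumptions made for illustration, not part of any particular ETL tool.

```python
from collections import Counter

# Hypothetical placeholder values; a real list would come from profiling the source.
DUMMY_VALUES = {"999-99-9999", "N/A", "UNKNOWN", "XXXXX"}

def profile(rows, key):
    """Return (row_index, problem) pairs for a list of dict rows."""
    problems = []
    key_counts = Counter(row.get(key) for row in rows)
    for i, row in enumerate(rows):
        for field, value in row.items():
            text = "" if value is None else str(value).strip()
            if text == "":
                problems.append((i, f"absent data in {field}"))
            elif text.upper() in DUMMY_VALUES:
                problems.append((i, f"dummy value in {field}"))
        # The same identifier appearing on more than one row is flagged on each row.
        if key_counts[row.get(key)] > 1:
            problems.append((i, f"non-unique identifier {row.get(key)!r}"))
    return problems

# Made-up sample rows for illustration.
rows = [
    {"cust_id": "C1", "ssn": "999-99-9999", "city": "Mumbai"},
    {"cust_id": "C2", "ssn": "123-45-6789", "city": ""},
    {"cust_id": "C1", "ssn": "987-65-4321", "city": "Pune"},
]
issues = profile(rows, key="cust_id")
```

Problems such as contradicting data or violated business rules need rules specific to the source system and are not covered by this sketch.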
Data Cleansing
• Source systems contain “dirty data” that must be cleansed
• ETL software contains rudimentary data cleansing capabilities
• Specialized data cleansing software is often used; it is
important for performing name and address correction and
householding functions
• Leading data cleansing vendors include Vality (Integrity),
Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)
Steps in Data Cleansing
• Parsing
• Correcting
• Standardizing
• Matching
• Consolidating
Parsing
• Parsing locates and identifies individual data
elements in the source files and then isolates these
data elements in the target files.
• Examples include parsing the first, middle, and last
name; street number and street name; and city
and state.
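The parsing examples above can be sketched as follows. This is an illustrative simplification, assuming names are space-separated with at least a first and last name, streets start with a number, and city and state are separated by a comma. The function names are hypothetical, and commercial cleansing tools handle far more variation.

```python
import re

def parse_name(full_name):
    """Split a full name into first, middle, and last parts."""
    parts = full_name.split()
    if len(parts) < 2:
        raise ValueError("expected at least a first and a last name")
    return {"first": parts[0], "middle": " ".join(parts[1:-1]), "last": parts[-1]}

def parse_street(street):
    """Split a street line into street number and street name."""
    m = re.match(r"\s*(\d+)\s+(.*\S)\s*$", street)
    if m is None:
        # No leading number found: keep the whole line as the street name.
        return {"street_number": "", "street_name": street.strip()}
    return {"street_number": m.group(1), "street_name": m.group(2)}

def parse_city_state(text):
    """Split 'City, State' at the first comma."""
    city, _, state = text.partition(",")
    return {"city": city.strip(), "state": state.strip()}
```

Each function isolates the elements of one source field into separate target fields, which is what lets the later steps work on one element at a time.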
Correcting
• Corrects parsed individual data components using
sophisticated data algorithms and secondary data
sources.
• Examples include replacing a vanity address and
adding a zip code.
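A rough sketch of correction against a secondary data source follows. The two lookup tables stand in for a postal reference file, and every value in them is made up for illustration.

```python
# Hypothetical secondary data source; in practice this would be a postal reference file.
VANITY_TO_POSTAL_CITY = {"Lakeside Estates": "Springfield"}
ZIP_BY_CITY_STATE = {("Springfield", "XX"): "12345"}

def correct(record):
    """Return a corrected copy of a parsed address record."""
    fixed = dict(record)
    # Replace a vanity place name with the postal city, if one is known.
    fixed["city"] = VANITY_TO_POSTAL_CITY.get(fixed["city"], fixed["city"])
    # Add a missing zip code from the reference data, if available.
    if not fixed.get("zip"):
        fixed["zip"] = ZIP_BY_CITY_STATE.get((fixed["city"], fixed["state"]), "")
    return fixed
```

The function returns a copy rather than changing its input, so the original source values stay available for auditing.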
Standardizing
• Standardizing applies conversion routines to
transform data into its preferred (and consistent)
format using both standard and custom business
rules.
• Examples include adding a pre-name (a title such as Ms.), replacing a
nickname, and using a preferred street name.
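Two of these conversions, nickname replacement and preferred street wording, can be sketched with simple lookup tables. The tables and function names are assumptions for illustration; real rule sets are much larger and are usually maintained as business rules outside the code.

```python
# Hypothetical conversion tables standing in for business rules.
NICKNAMES = {"beth": "Elizabeth", "bill": "William", "bob": "Robert"}
STREET_SUFFIXES = {"st": "Street", "st.": "Street", "ave": "Avenue",
                   "ave.": "Avenue", "rd": "Road", "rd.": "Road"}

def standardize_first_name(name):
    """Replace a known nickname with the formal name; otherwise tidy the casing."""
    cleaned = name.strip()
    return NICKNAMES.get(cleaned.lower(), cleaned.title())

def standardize_street(street):
    """Expand an abbreviated suffix in the last word of a street line."""
    words = street.split()
    if words:
        # Only the last word is treated as a suffix, so "St Johns Rd" keeps its "St".
        words[-1] = STREET_SUFFIXES.get(words[-1].lower(), words[-1])
    return " ".join(words)
```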
Matching
• Searching and matching records within and across
the parsed, corrected and standardized data
based on predefined business rules to eliminate
duplications.
• Examples include identifying similar names and
addresses.
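Identifying similar names and addresses can be approximated with a string-similarity score. The sketch below uses SequenceMatcher from the Python standard library. The 0.85 threshold, the name and address field names, and the sample records are assumptions, and the pairwise comparison is only practical for small record sets.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_matches(records, threshold=0.85):
    """Return index pairs of records whose name plus address look alike."""
    matches = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        key1 = f"{r1['name']} {r1['address']}"
        key2 = f"{r2['name']} {r2['address']}"
        if similarity(key1, key2) >= threshold:
            matches.append((i, j))
    return matches

# Made-up records; the first two differ by one letter in the surname.
records = [
    {"name": "Elizabeth Parker", "address": "12 Main Street"},
    {"name": "Elizabeth Parkar", "address": "12 Main Street"},
    {"name": "Robert Jones", "address": "99 Oak Avenue"},
]
pairs = find_matches(records)
```

Matching works best on data that has already been parsed, corrected, and standardized, since those steps remove differences that are not real.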
Consolidating
• Analyzing and identifying relationships between
matched records and consolidating/merging them
into ONE representation.
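One simple way to merge matched records is to keep, for each field, the first non-empty value found. This survivorship rule is an assumption chosen for illustration; real projects often prefer the most recent or most trusted source instead.

```python
def consolidate(records):
    """Merge matched records into one, keeping the first non-empty value per field."""
    merged = {}
    for record in records:
        for field, value in record.items():
            # Fill the field only if nothing non-empty has been stored yet.
            if not merged.get(field):
                merged[field] = value
    return merged

# Made-up matched records, each missing a different field.
merged = consolidate([
    {"name": "Elizabeth Parker", "phone": "", "email": "ep@example.com"},
    {"name": "Elizabeth Parker", "phone": "555-0100", "email": ""},
])
```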
Data Staging
• Often used as an interim step between data extraction
and later steps
• Accumulates data from asynchronous sources using
native interfaces, flat files, FTP sessions, or other
processes
• At a predefined cutoff time, data in the staging file is
transformed and loaded to the warehouse
• There is usually no end user access to the staging file
• An operational data store may be used for data staging
Data Transformation
• Transforms the data in accordance with the
business rules and standards that have been
established
• Examples include format changes, deduplication,
splitting up fields, replacement of codes, derived
values, and aggregates
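The transformation types listed above can be shown together in one small sketch. The column names, the code table, and the day/month/year source date format are all assumptions made for this illustration.

```python
from datetime import datetime

GENDER_CODES = {"M": "Male", "F": "Female"}  # hypothetical code table

def transform(rows):
    """Apply several typical transformations to a list of order rows."""
    seen = set()
    out = []
    for row in rows:
        if row["order_id"] in seen:  # deduplication
            continue
        seen.add(row["order_id"])
        first, _, last = row["customer"].partition(" ")  # splitting up a field
        out.append({
            "order_id": row["order_id"],
            "first_name": first,
            "last_name": last,
            "gender": GENDER_CODES.get(row["gender"], "Unknown"),  # replacement of codes
            # format change: DD/MM/YYYY text to an ISO date string
            "order_date": datetime.strptime(row["order_date"], "%d/%m/%Y").date().isoformat(),
            "total": row["quantity"] * row["unit_price"],  # derived value
        })
    return out

def total_by_date(rows):
    """Aggregate: sum of order totals per order date."""
    totals = {}
    for row in rows:
        totals[row["order_date"]] = totals.get(row["order_date"], 0) + row["total"]
    return totals

# Made-up source rows; the second is a duplicate of the first.
source = [
    {"order_id": 1, "customer": "Beth Parker", "gender": "F",
     "order_date": "05/04/2024", "quantity": 2, "unit_price": 10},
    {"order_id": 1, "customer": "Beth Parker", "gender": "F",
     "order_date": "05/04/2024", "quantity": 2, "unit_price": 10},
    {"order_id": 2, "customer": "Bob Jones", "gender": "M",
     "order_date": "05/04/2024", "quantity": 1, "unit_price": 5},
]
clean = transform(source)
daily = total_by_date(clean)
```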
Data Loading
• Data are physically moved to the data warehouse
• The loading takes place within a “load window”
• The trend is toward near-real-time updates of the data
warehouse as the warehouse is increasingly used for
operational applications
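A bulk load is commonly wrapped in a single transaction so that a failed batch leaves the warehouse unchanged. The sketch below uses Python's built-in sqlite3 module as a stand-in for a warehouse database; the sales_fact table and its columns are hypothetical.

```python
import sqlite3

def load(conn, rows):
    """Insert a batch of rows in one transaction."""
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT INTO sales_fact (order_id, order_date, total) VALUES (?, ?, ?)",
            rows,
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (order_id INTEGER PRIMARY KEY, order_date TEXT, total REAL)"
)
load(conn, [(1, "2024-04-05", 20.0), (2, "2024-04-05", 5.0)])
count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```

A trickle-mode load would call the same function with much smaller batches at short intervals instead of once per load window.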
Meta Data
• Data about data
• Needed by both information technology
personnel and users
• IT personnel need to know data sources and
targets; database, table and column names;
refresh schedules; data usage measures; etc.
• Users need to know entity/attribute definitions;
reports/query tools available; report distribution
information; help desk contact information, etc.
Recent Development: Meta Data Integration
• A growing realization that meta data is critical
to data warehousing success
• Progress is being made on getting vendors to
agree on standards and to incorporate the
sharing of meta data among their tools
• Vendors like Microsoft, Computer Associates,
and Oracle have entered the meta data
marketplace with significant product offerings
Thank You!
For more information, follow the link below:
http://vibranttechnologies.co.in/etl-testing-classes-in-mu

ETL Testing - Introduction to ETL testing

  • 1.
  • 2. Introduction to ETL TestingIntroduction to ETL Testing The process of updating the data warehouse. Design by :- Vibrant Technologies & computers
  • 3. Two Data Warehousing StrategiesTwo Data Warehousing Strategies • Enterprise-wide warehouse, top down, the Inmon methodology • Data mart, bottom up, the Kimball methodology • When properly executed, both result in an enterprise-wide data warehouse
  • 4. The Data Mart StrategyThe Data Mart Strategy • The most common approach • Begins with a single mart and architected marts are added over time for more subject areas • Relatively inexpensive and easy to implement • Can be used as a proof of concept for data warehousing • Can perpetuate the “silos of information” problem • Can postpone difficult decisions and activities • Requires an overall integration plan
  • 5. The Enterprise-wide StrategyThe Enterprise-wide Strategy • A comprehensive warehouse is built initially • An initial dependent data mart is built using a subset of the data in the warehouse • Additional data marts are built using subsets of the data in the warehouse • Like all complex projects, it is expensive, time consuming, and prone to failure • When successful, it results in an integrated, scalable warehouse
  • 6. Data Sources and TypesData Sources and Types • Primarily from legacy, operational systems • Almost exclusively numerical data at the present time • External data may be included, often purchased from third-party sources • Technology exists for storing unstructured data and expect this to become more important over time
  • 7. Extraction, Transformation, and LoadingExtraction, Transformation, and Loading (ETL) Processes(ETL) Processes • The “plumbing” work of data warehousing • Data are moved from source to target data bases • A very costly, time consuming part of data warehousing
  • 8. Recent Development:Recent Development: More Frequent UpdatesMore Frequent Updates • Updates can be done in bulk and trickle modes • Business requirements, such as trading partner access to a Web site, requires current data • For international firms, there is no good time to load the warehouse
  • 9. Recent Development:Recent Development: Clickstream DataClickstream Data • Results from clicks at web sites • A dialog manager handles user interactions. An ODS (operational data store in the data staging area) helps to custom tailor the dialog • The clickstream data is filtered and parsed and sent to a data warehouse where it is analyzed • Software is available to analyze the clickstream data
  • 10. Data ExtractionData Extraction • Often performed by COBOL routines (not recommended because of high program maintenance and no automatically generated meta data) • Sometimes source data is copied to the target database using the replication capabilities of standard RDMS (not recommended because of “dirty data” in the source systems) • Increasing performed by specialized ETL software
  • 11. Sample ETL ToolsSample ETL Tools • Teradata Warehouse Builder from Teradata • DataStage from Ascential Software • SAS System from SAS Institute • Power Mart/Power Center from Informatica • Sagent Solution from Sagent Software • Hummingbird Genio Suite from Hummingbird Communications
  • 12. Reasons for “Dirty” DataReasons for “Dirty” Data • Dummy Values • Absence of Data • Multipurpose Fields • Cryptic Data • Contradicting Data • Inappropriate Use of Address Lines • Violation of Business Rules • Reused Primary Keys, • Non-Unique Identifiers • Data Integration Problems
  • 13. Data CleansingData Cleansing • Source systems contain “dirty data” that must be cleansed • ETL software contains rudimentary data cleansing capabilities • Specialized data cleansing software is often used. Important for performing name and address correction and householding functions • Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)
  • 14. Steps in Data Cleansing • Parsing • Correcting • Standardizing • Matching • Consolidating
  • 15. Parsing • Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files. • Examples include parsing the first, middle, and last name; street number and street name; and city and state.
  • 16. Correcting • Corrects parsed individual data components using sophisticated data algorithms and secondary data sources. • Examples include replacing a vanity address and adding a ZIP code.
  • 17. Standardizing • Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules. • Examples include adding a pre name, replacing a nickname, and using a preferred street name.
  • 18. Matching • Searches and matches records within and across the parsed, corrected, and standardized data based on predefined business rules to eliminate duplications. • Examples include identifying similar names and addresses.
  • 19. Consolidating • Analyzes and identifies relationships between matched records and consolidates/merges them into ONE representation.
  • 20. Data Staging • Often used as an interim step between data extraction and later steps • Accumulates data from asynchronous sources using native interfaces, flat files, FTP sessions, or other processes • At a predefined cutoff time, data in the staging file is transformed and loaded to the warehouse • There is usually no end user access to the staging file • An operational data store may be used for data staging
  • 21. Data Transformation • Transforms the data in accordance with the business rules and standards that have been established • Examples include format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates
  • 22. Data Loading • Data are physically moved to the data warehouse • The loading takes place within a “load window” • The trend is toward near-real-time updates of the data warehouse as the warehouse is increasingly used for operational applications
  • 23. Meta Data • Data about data • Needed by both information technology personnel and users • IT personnel need to know data sources and targets; database, table and column names; refresh schedules; data usage measures; etc. • Users need to know entity/attribute definitions; reports/query tools available; report distribution information; help desk contact information, etc.
  • 24. Recent Development: Meta Data Integration • A growing realization that meta data is critical to data warehousing success • Progress is being made on getting vendors to agree on standards and to incorporate the sharing of meta data among their tools • Vendors like Microsoft, Computer Associates, and Oracle have entered the meta data marketplace with significant product offerings
  • 25. Thank You!!! For More Information click below link: Follow Us on: http://vibranttechnologies.co.in/etl-testing-classes-in-mu
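
The cleansing steps in slides 14 to 19 (parsing, standardizing, matching, consolidating) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any of the commercial tools named above work: the input format, the `NICKNAMES` and `SUFFIXES` lookup tables, and every function name are hypothetical. The correcting step is left out because it depends on secondary data sources, and matching here is exact-key matching on standardized values rather than the similarity matching that commercial software performs.

```python
from collections import defaultdict

# Hypothetical reference tables; a real tool would use far larger ones.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}
SUFFIXES = {"st": "Street", "st.": "Street", "ave": "Avenue", "ave.": "Avenue"}


def parse(raw):
    """Parsing: split 'First Last, 12 Main St, City, ST' into atomic elements."""
    name, street, city, state = [part.strip() for part in raw.split(",")]
    name_parts = name.split()
    number, street_name = street.split(" ", 1)
    return {"first": name_parts[0], "last": name_parts[-1], "number": number,
            "street": street_name, "city": city, "state": state}


def standardize(rec):
    """Standardizing: replace nicknames and apply one preferred format."""
    first = rec["first"].lower()
    words = [w.title() for w in rec["street"].split()]
    # Expand an abbreviated street suffix such as 'St' or 'Ave.'.
    words[-1] = SUFFIXES.get(words[-1].lower(), words[-1])
    return {"first": NICKNAMES.get(first, first).title(),
            "last": rec["last"].title(),
            "number": rec["number"],
            "street": " ".join(words),
            "city": rec["city"].title(),
            "state": rec["state"].upper()}


def match_key(rec):
    """Matching: records with the same key are treated as duplicates."""
    return (rec["last"], rec["first"], rec["number"], rec["street"], rec["state"])


def consolidate(records):
    """Consolidating: merge each group of matched records into ONE record."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    merged = []
    for recs in groups.values():
        one = dict(recs[0])
        one["source_count"] = len(recs)  # how many source records were merged
        merged.append(one)
    return merged


def cleanse(raw_lines):
    return consolidate([standardize(parse(line)) for line in raw_lines])


customers = cleanse([
    "Bob Smith, 12 Main St, Springfield, il",
    "ROBERT SMITH, 12 main street, Springfield, IL",
    "Liz Jones, 4 Oak Ave., Dayton, OH",
])
for customer in customers:
    print(customer)
```

The first two input lines describe the same customer, so after standardizing they share a match key and are merged into one record.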

Editor's Notes

  1. There is still debate over which approach is best.
  2. The key is to have an overall plan, processes, and technologies for integrating the different marts. The marts may be logically rather than physically separate.
  3. Even with the enterprise-wide strategy, the warehouse is developed in phases and each phase should be designed to deliver business value.
  4. It is not unusual to extract data from over 100 source systems. While the technology is available to store structured and unstructured data together, the reality is that warehouse data is almost exclusively structured -- numerical with simple textual identifiers.
  5. ETL tends to be “pick and shovel” work. Most organizations’ data is even worse than imagined.
  6. As data warehousing becomes more critical to decision making and operational processes, the pressure is to have more current data, which leads to trickle updates.
  7. The ODS is used to support the web site dialog -- an operational process -- while the data in the warehouse is analyzed -- to better understand customers and their use of the web site.
  8. It’s changing, but COBOL extracts are still the most common ETL process. There are multiple reasons for this -- the cost of specialized ETL software, in-house programmers who have a good knowledge of the COBOL-based source systems that will be used, and the peculiarities of the source systems that make the use of ETL software difficult.
  9. You might go to the vendors’ web sites to find a good demo to show your students.
  10. Here are a couple of examples: Dummy data -- a clerk enters 999-99-9999 as an SSN rather than asking the customer for theirs. Reused primary keys -- a branch bank is closed. Several years later, a new branch is opened, and the old identifier is used again.
  11. Data cleansing is critical to customer relationship management initiatives.
  12. A good example to use is cleansing customer data. Most students can identify with receiving multiple copies of the same catalog because the company is not doing a good data cleansing job.
  13. The record is broken down into atomic data elements.
  14. External data, such as census data, is often used in this process.
  15. Companies decide on the standards that they want to use.
  16. Commercial data cleansing software often uses AI techniques to match records.
  17. All of the data are now combined in a standard format.
  18. Data staging is used in cleansing, transforming, and integrating the data.
  19. Aggregates, such as sales totals, are often precalculated and stored in the warehouse to speed queries that require summary totals.
  20. Most loads involve only change data rather than a bulk reloading of all of the data in the warehouse.
  21. The importance of meta data is now realized, even though creating it is not glamorous work.
  22. Historically, each vendor had their own meta data solution -- which was incompatible with other vendors’ solutions. This is changing.
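
Notes 19 and 20 (precalculated aggregates and change-data loads) can be illustrated with a small sketch that uses Python's built-in `sqlite3` module as a stand-in warehouse. The table names `sales_fact` and `sales_by_region`, the columns, and the `load_changes` function are all assumptions made for this example; the slides do not prescribe a schema or a load mechanism.

```python
import sqlite3


def load_changes(conn, changes):
    """Apply only new or updated rows, then refresh the stored aggregate."""
    with conn:  # one transaction per load
        # Insert new rows and overwrite changed ones by primary key.
        conn.executemany(
            "INSERT OR REPLACE INTO sales_fact (sale_id, region, amount) "
            "VALUES (?, ?, ?)",
            changes,
        )
        # Precalculate the summary totals so queries need not re-aggregate.
        conn.execute("DELETE FROM sales_by_region")
        conn.execute(
            "INSERT INTO sales_by_region (region, total) "
            "SELECT region, SUM(amount) FROM sales_fact GROUP BY region"
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)")

# Initial load, followed by a later load holding one update and one new row.
load_changes(conn, [(1, "East", 100.0), (2, "West", 50.0)])
load_changes(conn, [(2, "West", 75.0), (3, "East", 25.0)])

totals = dict(conn.execute("SELECT region, total FROM sales_by_region"))
print(totals)
```

For simplicity the sketch rebuilds the whole aggregate table after each load; a larger warehouse would more likely adjust only the totals affected by the changed rows.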