PRESENTED BY,
YOGESH SURYAWANSHI.
ETL METHODOLOGIES
1
AGENDA
1. Overview
2. Staging Database
3. Extract Methodology
4. Transform Methodology
5. Load Methodology
6. Best Practices ETL Process
7. Summary
2
OVERVIEW
 ETL stands for Extraction, Transformation, and Loading
 ETL gets data out of the source systems and loads it into the data warehouse; at its simplest, it is a process of copying data from one database to another
 Data is extracted from an OLTP database, transformed to match the data warehouse schema, and loaded into the data warehouse database
 Many data warehouses also incorporate data from non-OLTP systems
such as text files, legacy systems, and spreadsheets; such data also
requires extraction, transformation and loading
 When defining ETL for a data warehouse, it is important to think of ETL as
a process, not a physical implementation
3
OVERVIEW
 ETL is often a complex combination of process and technology that
consumes a significant portion of data warehouse development efforts
and requires the skills of business analysts, database designers and
application developers
 It is not a one-time event, because new data is added to the data warehouse periodically: monthly, daily, or hourly
 ETL is an integral, ongoing, and recurring part of a data warehouse, so it should be:
 Automated
 Well documented
 Easily changeable
4
STAGING AKA OPERATIONAL DATASTORE (ODS)
 ETL operations should be performed on a relational database server
separate from the source databases and the data warehouse database
 Creates a logical and physical separation between the source systems
and the data warehouse
 Minimizes the impact of the intense periodic ETL activity on source and
data warehouse databases
5
EXTRACTION
6
EXTRACT
 During data extraction, raw data is
copied or exported from source
locations to a staging area.
 Data management teams can extract
data from a variety of data sources,
which can be structured, semi-structured,
unstructured, or streaming.
 These sources include:
 SQL or NoSQL servers
 CRM and ERP systems
 Flat files
 Email
 Web pages
7
EXTRACTION
 The ETL process needs to effectively integrate systems that have different DBMSs, hardware, operating systems, and communication protocols. Sources include legacy applications such as mainframes, customized applications, point-of-contact devices such as ATMs and call switches, text files, spreadsheets, ERP systems, and data from vendors and partners, among others.
 A logical data map is needed before the physical data can be transformed
 The logical data map describes the relationship between the extreme starting points and the extreme ending points of your ETL system; it is usually presented in a table or spreadsheet
8
EXTRACTION
 The content of the logical data mapping document has proven to be the critical element required to plan ETL processes efficiently
 The table type gives us our cue for the ordinal position of our data load processes: first dimensions, then facts.
 The primary purpose of this document is to provide the ETL developer with a clear-cut blueprint (transformation rules and logic) of exactly what is expected from the ETL process. This table must depict, without question, the course of action involved in the transformation process
 The transformation entry can contain anything from the complete solution to nothing at all. Most often, the transformation can be expressed in SQL. The SQL may or may not be the complete statement
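The logical data map described above can be sketched as a small data structure. This is only an illustration: the table names, column names, and SQL fragments are hypothetical and do not come from the slides. It shows how the table type can drive the load order (dimensions first, then facts).

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    target_table: str
    target_column: str
    table_type: str       # "dimension" or "fact"
    source_table: str
    source_column: str
    transformation: str   # often a SQL fragment; may be empty


# Hypothetical rows of a logical data map.
LOGICAL_DATA_MAP = [
    Mapping("fact_sales", "amount_usd", "fact", "orders", "amount", "amount * fx_rate"),
    Mapping("dim_customer", "full_name", "dimension", "customers", "name", "TRIM(name)"),
    Mapping("dim_customer", "customer_id", "dimension", "customers", "id", ""),
]


def load_order(data_map):
    """Return target tables in load order: dimensions first, then facts."""
    rank = {"dimension": 0, "fact": 1}
    ordered = sorted(data_map, key=lambda m: rank[m.table_type])
    # dict.fromkeys keeps first-seen order while dropping duplicate table names.
    return list(dict.fromkeys(m.target_table for m in ordered))


print(load_order(LOGICAL_DATA_MAP))
```

In practice the map usually lives in a spreadsheet; keeping it machine-readable as well makes it possible to derive the load order instead of maintaining it by hand.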
9
EXTRACTION
 Three Data Extraction methods:
 Full Extraction
 Partial Extraction – without update notification
 Partial Extraction – with update notification
 Irrespective of the method used, extraction should not affect the performance and response time of the source systems. These source systems could be test, development, or production databases. Any slowdown or locking could affect the company's bottom line.
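A minimal sketch of the difference between full and partial extraction, using an in-memory SQLite database as a stand-in source. The orders table, its updated_at column, and the watermark approach are assumptions made for illustration; the slides do not prescribe how changed rows are identified.

```python
import sqlite3

# Hypothetical source table with a last-modified timestamp per row.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01"), (2, 25.5, "2024-01-02"), (3, 7.0, "2024-01-03")],
)


def full_extract(conn):
    """Full extraction: copy every row, regardless of when it changed."""
    return conn.execute(
        "SELECT id, amount, updated_at FROM orders ORDER BY id"
    ).fetchall()


def partial_extract(conn, last_watermark):
    """Partial extraction: copy only rows changed since the previous run."""
    return conn.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ? ORDER BY id",
        (last_watermark,),
    ).fetchall()


all_rows = full_extract(source)
changed_rows = partial_extract(source, "2024-01-01")
print(len(all_rows), len(changed_rows))
```

The partial query reads far fewer rows than the full one, which is one way to limit the load that extraction places on the source system.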
10
EXTRACTION
 Some validations are done during Extraction:
 Reconcile records with the source data
 Make sure that no spam/unwanted data is loaded
 Data type check
 Remove all types of duplicate/fragmented data
 Check whether all keys are in place or not
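The validations listed above could look roughly like this in code. The row layout (id, amount) and the report fields are hypothetical; this is a sketch of the idea, not a complete validation framework.

```python
# Hypothetical source rows; in a real pipeline these would be read from the source.
source_rows = [
    {"id": 1, "amount": "10.50"},
    {"id": 2, "amount": "abc"},      # fails the data type check
    {"id": 2, "amount": "abc"},      # exact duplicate
    {"id": None, "amount": "3.00"},  # missing key
]
staged_rows = list(source_rows)  # stand-in for the rows landed in staging


def is_number(text):
    """Data type check: can the value be parsed as a number?"""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def validate(source, staged):
    """Reconcile counts, drop exact duplicates, and flag bad types and keys."""
    report = {"reconciled": len(source) == len(staged)}
    seen, unique = set(), []
    for row in staged:
        key = (row["id"], row["amount"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    report["duplicates_removed"] = len(staged) - len(unique)
    report["bad_types"] = [r for r in unique if not is_number(r["amount"])]
    report["missing_keys"] = [r for r in unique if r["id"] is None]
    return unique, report


unique_rows, report = validate(source_rows, staged_rows)
print(report["reconciled"], report["duplicates_removed"])
```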
11
TRANSFORM
12
TRANSFORM
 In the staging area, the raw data undergoes data processing. Here, the data is transformed and consolidated for its intended analytical use case.
 Filtering, cleansing, de-duplicating, validating, and authenticating the data.
 Performing calculations, translations, or summarizations based on the raw data.
13
TRANSFORM
 This processing can include changing row and column headers for consistency, converting currencies or other units of measurement, editing text strings, and more.
 Conducting audits to ensure data quality and compliance.
 Removing, encrypting, or protecting data governed by industry or governmental regulators.
 Formatting the data into tables or joined tables to match the schema of the target data warehouse.
14
TRANSFORM AKA CLEANSE DATA
 Anomaly Detection
 Data sampling – count(*) of the rows for a department column
 Column Property Enforcement
 Null values in required columns
 Numeric values that fall outside expected highs and lows
 Columns whose lengths are exceptionally short or long
 Columns with values outside discrete valid value sets
 Values that do not adhere to a required pattern or to one of a set of patterns
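Column property enforcement can be sketched as a function that returns the violations found in one row. The column names, the valid department set, the numeric limits, and the phone pattern below are all assumptions chosen for illustration.

```python
import re

# Hypothetical rules; real limits and value sets come from the business.
VALID_DEPARTMENTS = {"HR", "IT", "SALES"}
PHONE_PATTERN = re.compile(r"^\d{3}-\d{4}$")


def check_row(row):
    """Return the list of column property violations for one row."""
    problems = []
    name = row.get("name")
    if name is None:
        problems.append("name is null")
    elif not 2 <= len(name) <= 50:
        problems.append("name length out of range")
    salary = row.get("salary")
    if salary is None or not 0 <= salary <= 1_000_000:
        problems.append("salary out of range")
    if row.get("department") not in VALID_DEPARTMENTS:
        problems.append("department not in valid set")
    if not PHONE_PATTERN.match(row.get("phone") or ""):
        problems.append("phone does not match pattern")
    return problems


good = {"name": "Ann", "salary": 50000, "department": "IT", "phone": "555-1234"}
bad = {"name": None, "salary": -5, "department": "XX", "phone": "5551234"}
print(check_row(good))
print(check_row(bad))
```

Returning the violations rather than raising on the first one makes it possible to log every problem with a row before deciding whether to reject it.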
15
TRANSFORM
16
TRANSFORM - CONFORMING
 Structure Enforcement
 Tables have proper primary and foreign keys
 Obey referential integrity
 Data and Rule value enforcement
 Simple business rules
 Logical data checks
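One way to check referential integrity in staging is to look for fact rows whose foreign key has no matching dimension row. The sketch below uses an in-memory SQLite database; the dim_customer and fact_sales tables are hypothetical.

```python
import sqlite3

staging = sqlite3.connect(":memory:")
staging.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO fact_sales VALUES (100, 1, 9.99), (101, 2, 5.00), (102, 7, 1.25);
""")

# Orphans: fact rows that reference a customer missing from the dimension.
orphans = staging.execute("""
    SELECT f.sale_id, f.customer_id
    FROM fact_sales AS f
    LEFT JOIN dim_customer AS d ON d.customer_id = f.customer_id
    WHERE d.customer_id IS NULL
""").fetchall()
print(orphans)
```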
17
TRANSFORMATION
18
TRANSFORM - CONFORMING
 Data Integrity Problems
 Different spellings of the same person's name, such as Jon and John
 Multiple ways to denote a company name, such as Google and Google Inc.
 Different forms of the same name, such as Cleveland and cleveland
 Different account numbers generated by various applications for the same customer
 Required fields left blank in some data
 Invalid product data collected at the POS, since manual entry can lead to mistakes
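Some of these problems can be reduced with simple standardization rules. The sketch below assumes a hand-maintained alias table for company names and title-casing for city names; both rules are hypothetical examples, and name matching in real data usually needs more than this.

```python
# Hypothetical alias lookup: every known variant maps to one canonical name.
COMPANY_ALIASES = {"google": "Google", "google inc.": "Google"}


def standardize_company(raw):
    """Collapse whitespace and case, then map known variants to one name."""
    key = " ".join(raw.split()).lower()
    return COMPANY_ALIASES.get(key, raw.strip())


def standardize_city(raw):
    """Trim stray spaces and normalize capitalization."""
    return raw.strip().title()


print(standardize_company("Google  Inc."))
print(standardize_city("cleveland "))
```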
19
TRANSFORMATION
20
TRANSFORM - CONFORMING
 Validation During the Stage
 Filtering: select only certain columns to load
 Using rules and lookup tables for data standardization
 Character set conversion and encoding handling
 Conversion of units of measurement, such as date and time conversions, currency conversions, and numerical conversions
 Data threshold validation checks. For example, an employee's age cannot be more than two digits.
 Data flow validation from the staging area to the intermediate tables.
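A minimal sketch of a few of these steps applied to one record: column filtering, date and currency conversion, and the two-digit age threshold. The field names, the source date format, and the fixed exchange rate are assumptions made for the example.

```python
from datetime import datetime

USD_PER_EUR = 1.10  # hypothetical fixed rate, for illustration only


def transform(record):
    """Filter columns, convert units, and enforce the age threshold."""
    age = int(record["age"])
    if not 0 <= age <= 99:  # threshold check: at most two digits
        raise ValueError(f"age out of range: {age}")
    # Date conversion: day/month/year text to an ISO 8601 date string.
    hired = datetime.strptime(record["hired"], "%d/%m/%Y").date().isoformat()
    return {
        "name": record["name"],  # filtering: columns not listed here are dropped
        "age": age,
        "hired": hired,
        "salary_usd": round(float(record["salary_eur"]) * USD_PER_EUR, 2),
    }


row = {"name": "Ann", "age": "34", "hired": "05/03/2020", "salary_eur": "1000", "fax": "n/a"}
clean_row = transform(row)
print(clean_row)
```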
21
TRANSFORM - CONFORMING
 Validation During the Stage
 Required fields should not be left blank.
 Cleaning (for example, mapping NULL to 0, or gender Male to "M" and Female to "F")
 Splitting a column into multiple columns and merging multiple columns into a single column.
 Transposing rows and columns.
 Using lookups to merge data.
 Applying complex data validation (e.g., if the first two columns in a row are empty, the row is automatically rejected from processing)
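The cleaning rules above can be sketched on a three-column row. The column layout (full name, gender, bonus) is hypothetical; the rules themselves (NULL to 0, Male/Female to M/F, column splitting, rejecting rows whose first two columns are empty) follow the bullets above.

```python
GENDER_CODES = {"male": "M", "female": "F"}


def clean(row):
    """Return a cleaned dict, or None when the row must be rejected."""
    full_name, gender, bonus = row
    if not full_name and not gender:
        return None  # complex rule: first two columns empty, so reject the row
    # Split one column into two at the first space.
    given, _, family = full_name.partition(" ") if full_name else ("", "", "")
    return {
        "given_name": given,
        "family_name": family,
        "gender": GENDER_CODES.get((gender or "").strip().lower(), gender),
        "bonus": 0 if bonus is None else bonus,  # map NULL to 0
    }


print(clean(("Ann Lee", "Female", None)))
print(clean((None, "", 5)))
```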
22
LOADING
23
LOAD
 In this last step, the transformed data is
moved from the staging area into a target
data warehouse.
 Typically, this involves an initial loading of
all data, followed by periodic loading of
incremental data changes and, less often,
full refreshes to erase and replace data in
the warehouse.
24
TRANSFORMATION
25
LOAD
 Loading data into the target data warehouse database is the last step of the ETL process. In a typical data warehouse, a huge volume of data needs to be loaded in a relatively short period (overnight). Hence, the load process should be optimized for performance
 In case of load failure, recovery mechanisms should be configured to restart from the point of failure without loss of data integrity. Data warehouse admins need to monitor, resume, and cancel loads according to prevailing server performance
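Restarting from the point of failure can be approximated by loading in batches and committing a checkpoint together with each batch. The sketch below uses SQLite; the target and checkpoint tables and the fail_on_batch parameter (used only to simulate a failure) are hypothetical.

```python
import sqlite3


def load_in_batches(conn, rows, batch_size=2, fail_on_batch=None):
    """Load rows batch by batch, committing a checkpoint with each batch."""
    conn.execute("CREATE TABLE IF NOT EXISTS target (id INTEGER PRIMARY KEY, value TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS checkpoint (last_row INTEGER)")
    conn.commit()
    # Resume after the last row position that was committed successfully.
    done = conn.execute("SELECT COALESCE(MAX(last_row), 0) FROM checkpoint").fetchone()[0]
    for start in range(done, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        try:
            conn.executemany("INSERT INTO target VALUES (?, ?)", batch)
            if fail_on_batch == start:
                raise RuntimeError("simulated load failure")
            conn.execute("INSERT INTO checkpoint VALUES (?)", (start + len(batch),))
            conn.commit()  # data and checkpoint are committed together
        except Exception:
            conn.rollback()  # discard the partially loaded batch
            raise


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
conn = sqlite3.connect(":memory:")
try:
    load_in_batches(conn, rows, fail_on_batch=2)  # second batch fails
except RuntimeError:
    pass
loaded_before_restart = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]

load_in_batches(conn, rows)  # restart: continues from the checkpoint
loaded_after_restart = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(loaded_before_restart, loaded_after_restart)
```

Because the failed batch is rolled back, the restart neither skips rows nor inserts any row twice.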
26
LOAD
 Types of Loading:
 Initial Load – populating all the data warehouse tables
 Incremental Load – applying ongoing changes periodically, as and when needed
 Full Refresh – erasing the contents of one or more tables and reloading them with fresh data
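The three load types can be contrasted on a single table. This is a sketch against an in-memory SQLite database with a hypothetical dim_product table; the incremental load uses SQLite's INSERT OR REPLACE as one possible way to apply changes, which is an assumption rather than something the slides specify.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dim_product (id INTEGER PRIMARY KEY, name TEXT)")


def initial_load(conn, rows):
    """Initial load: populate the empty table."""
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", rows)
    conn.commit()


def incremental_load(conn, changes):
    """Incremental load: insert new ids and overwrite changed ones."""
    conn.executemany("INSERT OR REPLACE INTO dim_product VALUES (?, ?)", changes)
    conn.commit()


def full_refresh(conn, rows):
    """Full refresh: erase the table contents and reload with fresh data."""
    conn.execute("DELETE FROM dim_product")
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", rows)
    conn.commit()


def snapshot(conn):
    return conn.execute("SELECT id, name FROM dim_product ORDER BY id").fetchall()


initial_load(dw, [(1, "pen"), (2, "ink")])
incremental_load(dw, [(2, "blue ink"), (3, "pad")])
after_incremental = snapshot(dw)
full_refresh(dw, [(10, "desk")])
after_refresh = snapshot(dw)
print(after_incremental)
print(after_refresh)
```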
27
LOAD
 Load verification
 Ensure that the key field data is neither missing nor null.
 Test modeling views based on the target tables.
 Check the combined values and calculated measures.
 Run data checks on the dimension tables as well as the history tables.
 Check the BI reports based on the loaded fact and dimension tables.
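Two of these checks, null key fields and calculated measures, can be expressed as queries against the loaded tables. The fact_sales table and its columns are hypothetical, and the 0.005 tolerance is an arbitrary choice for comparing floating-point totals.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
    CREATE TABLE fact_sales (sale_id INTEGER, customer_id INTEGER, qty INTEGER,
                             unit_price REAL, total REAL);
    INSERT INTO fact_sales VALUES (1, 10, 2, 5.0, 10.0),
                                  (2, NULL, 1, 3.0, 3.0),
                                  (3, 11, 4, 2.5, 9.0);
""")

# Key fields must be neither missing nor null.
null_keys = dw.execute(
    "SELECT COUNT(*) FROM fact_sales WHERE sale_id IS NULL OR customer_id IS NULL"
).fetchone()[0]

# Calculated measure check: total should equal qty * unit_price.
bad_totals = dw.execute(
    "SELECT sale_id FROM fact_sales WHERE ABS(total - qty * unit_price) > 0.005"
).fetchall()
print(null_keys, bad_totals)
```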
28
BEST PRACTICES ETL PROCESS
 Never try to cleanse all the data
 Every organization would like to have all of its data clean, but most are not ready to pay for it or to wait for it. Cleaning it all would simply take too long, so it is better not to try. Cleanse only the relevant data.
 Never skip cleansing altogether
 Always plan to clean something, because the biggest reason for building the data warehouse is to offer cleaner and more reliable data.
29
BEST PRACTICES ETL PROCESS
 Determine the cost of cleansing the data
 Before cleansing all the dirty data, it is important to determine the cleansing cost for every dirty data element, i.e. the cost per data element.
30
SUMMARY
 ETL is an abbreviation of Extract, Transform, and Load.
 ETL provides a method of moving data from various sources into a data warehouse.
 In the first step, extraction, data is extracted from the source system into the staging area.
 In the transformation step, the data extracted from the source is cleansed and transformed.
 Loading data into the target data warehouse is the last step of the ETL process.
32
REFERENCE
 ETL Process in Data Warehouse by Chirayu Poundarik.
 ETL Methodology - https://www.ibm.com/in-en/cloud/learn/etl.
THANK YOU !!!
Q & A


