PRESENTED BY,
YOGESH SURYAWANSHI.
ETL METHODOLOGIES
AGENDA
1. Overview
2. Staging Database
3. Extract Methodology
4. Transform Methodology
5. Load Methodology
6. Best Practices for the ETL Process
7. Summary
OVERVIEW
 Extraction Transformation Loading – ETL
 The goal is to get data out of the source and load it into the data warehouse; at its simplest,
a process of copying data from one database to another
 Data is extracted from an OLTP database, transformed to match the data
warehouse schema and loaded into the data warehouse database
 Many data warehouses also incorporate data from non-OLTP systems
such as text files, legacy systems, and spreadsheets; such data also
requires extraction, transformation and loading
 When defining ETL for a data warehouse, it is important to think of ETL as
a process, not a physical implementation
OVERVIEW
 ETL is often a complex combination of process and technology that
consumes a significant portion of data warehouse development efforts
and requires the skills of business analysts, database designers and
application developers
 It is not a one-time event, as new data is added to the data warehouse
periodically: monthly, daily or hourly
 Because ETL is an integral, ongoing and recurring part of a data warehouse, it should be:
 Automated
 Well documented
 Easily changeable
STAGING AKA OPERATIONAL DATASTORE (ODS)
 ETL operations should be performed on a relational database server
separate from the source databases and the data warehouse database
 Creates a logical and physical separation between the source systems
and the data warehouse
 Minimizes the impact of the intense periodic ETL activity on source and
data warehouse databases
EXTRACT
 During data extraction, raw data is
copied or exported from source
locations to a staging area.
 Data management teams can extract
data from a variety of data sources,
which can be structured, semi-structured,
unstructured or streaming.
 Those sources include :
 SQL or NoSQL servers
 CRM and ERP systems
 Flat files
 Email
 Web pages
EXTRACTION
 The ETL process needs to integrate systems that run on different DBMSs,
hardware, operating systems and communication protocols. Sources
include legacy applications such as mainframes, customized applications,
point-of-contact devices such as ATMs and call switches, text files, spreadsheets,
ERP systems, and data from vendors and partners, among others.
 A logical data map is needed before the physical data can be
transformed
 The logical data map describes the relationship between the extreme
starting points and the extreme ending points of the ETL system, and is
usually presented in a table or spreadsheet
EXTRACTION
 The logical data mapping document is the critical element
required to plan ETL processes efficiently
 The table type gives us our cue for the ordinal position of the data
load processes: first dimensions, then facts.
 The primary purpose of this document is to give the ETL developer
a clear-cut blueprint (transformation rules and logic) of exactly
what is expected from the ETL process. The table must depict, without
ambiguity, the course of action involved in the transformation process
 The transformation entry can contain anything from the complete solution to
nothing at all. Most often, the transformation can be expressed in SQL,
although the SQL may or may not be a complete statement
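As a rough illustration, a logical data map can be held as rows of source-to-target mappings. Every table and column name below (src_orders, dim_customer and so on) is hypothetical; the sketch only shows how the table type can drive the load order, dimensions first, then facts.

```python
# Hypothetical logical data map: one row per target column.
LOGICAL_DATA_MAP = [
    {"target_table": "fact_sales", "table_type": "fact",
     "target_column": "amount_usd", "source": "src_orders.amount",
     "transformation": "ROUND(amount * fx_rate, 2)"},
    {"target_table": "dim_customer", "table_type": "dimension",
     "target_column": "customer_name", "source": "src_crm.cust_nm",
     "transformation": "TRIM(UPPER(cust_nm))"},
    {"target_table": "dim_product", "table_type": "dimension",
     "target_column": "product_code", "source": "src_erp.prod_cd",
     "transformation": ""},  # empty entry: the column is copied as-is
]

# Lower number loads first.
LOAD_PRIORITY = {"dimension": 0, "fact": 1}


def load_order(data_map):
    """Return target tables ordered so that dimensions load before facts."""
    ordered = sorted(data_map, key=lambda row: LOAD_PRIORITY[row["table_type"]])
    tables = []
    for row in ordered:
        if row["target_table"] not in tables:
            tables.append(row["target_table"])
    return tables


print(load_order(LOGICAL_DATA_MAP))
```

Note that the transformation column holds SQL fragments rather than complete statements, which matches how such documents are often filled in.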
EXTRACTION
 Three Data Extraction methods:
 Full Extraction
 Partial Extraction – without update notification
 Partial Extraction – with update notification
 Irrespective of the method used, extraction should not affect the performance
and response time of the source systems. These source systems could
be test, development or production databases. Any slowdown or locking
could affect the company's bottom line.
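The difference between full and partial extraction can be sketched with Python's built-in sqlite3 module. This is a minimal sketch that assumes the source table carries an updated_at column the ETL job can compare against a stored watermark, which is one common way to extract changes when the source sends no update notification. The customers table and its contents are made up for the example.

```python
import sqlite3


def full_extract(conn):
    """Full extraction: copy every row from the source table."""
    return conn.execute(
        "SELECT id, name, updated_at FROM customers ORDER BY id"
    ).fetchall()


def incremental_extract(conn, last_watermark):
    """Partial extraction: copy only rows changed since the previous run."""
    return conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY id",
        (last_watermark,),
    ).fetchall()


# Hypothetical source system, held in memory for the example.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Asha", "2024-01-01"), (2, "Ben", "2024-01-05"), (3, "Chen", "2024-01-09")],
)

all_rows = full_extract(source)
changed_rows = incremental_extract(source, "2024-01-04")
print(len(all_rows), len(changed_rows))
```

ISO-formatted date strings sort in date order, which is why a plain string comparison works for the watermark here.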
EXTRACTION
 Some validations are done during Extraction:
 Reconcile records with the source data
 Make sure that no spam or unwanted data is loaded
 Data type check
 Remove all types of duplicate/fragmented data
 Check whether all keys are in place or not
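The checks above could look roughly like this in code. It is a sketch under assumptions: rows arrive as dictionaries, the key column is called id, and amount is expected to be numeric. None of these names come from the deck.

```python
def validate_extract(source_count, rows, key="id"):
    """Return (clean_rows, problems) for a list of extracted row dicts."""
    problems = []
    # Reconcile the extracted record count with the source.
    if len(rows) != source_count:
        problems.append(
            f"row count mismatch: source={source_count}, extracted={len(rows)}"
        )
    seen, clean = set(), []
    for row in rows:
        if row.get(key) is None:                      # key check
            problems.append(f"missing key: {row}")
        elif not isinstance(row.get("amount"), (int, float)):  # data type check
            problems.append(f"bad amount type for key {row[key]}")
        elif row[key] in seen:                        # duplicate check
            problems.append(f"duplicate key {row[key]}")
        else:
            seen.add(row[key])
            clean.append(row)
    return clean, problems


extracted = [
    {"id": 1, "amount": 10.0},
    {"id": 1, "amount": 10.0},    # duplicate
    {"id": 2, "amount": "n/a"},   # wrong data type
    {"id": None, "amount": 5.0},  # missing key
]
clean, problems = validate_extract(source_count=4, rows=extracted)
print(len(clean), len(problems))
```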
TRANSFORM
 In the staging area, the raw data undergoes
data processing. Here, the data is transformed
and consolidated for its intended analytical
use case.
 Filtering, cleansing, de-duplicating, validating,
and authenticating the data.
 Performing calculations, translations, or summarizations
based on the raw data.
TRANSFORM
 These calculations and translations can include changing row and column headers for consistency,
converting currencies or other units of measurement, editing text strings,
and more.
 Conducting audits to ensure data quality and compliance.
 Removing, encrypting, or protecting data governed by industry or
governmental regulators.
 Formatting the data into tables or joined tables to
match the schema of the target data warehouse.
TRANSFORM AKA CLEANSE DATA
 Anomaly Detection
 Data sampling, for example a count(*) of the rows for each value of a department column
 Column Property Enforcement
 Null values in required columns
 Numeric values that fall outside the expected highs and lows
 Columns whose lengths are exceptionally short or long
 Columns with values outside a discrete set of valid values
 Adherence to a required pattern, or membership in a set of patterns
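A small sketch of these screens follows. The valid department set, the employee ID pattern and the numeric limits are all invented for illustration; real rules would come from the logical data map or the business.

```python
import re
from collections import Counter

VALID_DEPARTMENTS = {"HR", "SALES", "IT"}     # hypothetical valid value set
EMPLOYEE_ID_PATTERN = re.compile(r"E\d{4}")   # hypothetical required pattern


def department_counts(rows):
    """Data sampling: row count per department, like COUNT(*) with GROUP BY."""
    return Counter(row.get("department") for row in rows)


def check_row(row):
    """Return the list of column-property violations for one row."""
    errors = []
    if row.get("name") is None:
        errors.append("name is null")
    elif not 2 <= len(row["name"]) <= 50:
        errors.append("name length out of range")
    salary = row.get("salary")
    if salary is None or not 0 < salary <= 1_000_000:
        errors.append("salary out of range")
    if row.get("department") not in VALID_DEPARTMENTS:
        errors.append("department not in valid set")
    if not EMPLOYEE_ID_PATTERN.fullmatch(row.get("employee_id") or ""):
        errors.append("employee_id does not match pattern")
    return errors


rows = [
    {"employee_id": "E0001", "name": "Asha", "salary": 50000, "department": "IT"},
    {"employee_id": "12", "name": None, "salary": -5, "department": "XX"},
]
counts = department_counts(rows)
violations = [check_row(row) for row in rows]
print(counts, violations)
```

An unexpected value in the department counts (here "XX") is the kind of anomaly that sampling is meant to surface before the per-column rules are written.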
TRANSFORM - CONFIRMING
 Structure Enforcement
 Tables have proper primary and foreign keys
 Obey referential integrity
 Data and Rule value enforcement
 Simple business rules
 Logical data checks
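One way to check referential integrity in staging is to look for rows whose foreign key has no matching dimension row. The sketch below uses Python's built-in sqlite3 module; the stg_sales and dim_customer tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stg_sales (sale_id INTEGER PRIMARY KEY,
                            customer_id INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO stg_sales VALUES (10, 1, 99.0), (11, 2, 15.5), (12, 7, 42.0);
""")

# Staging rows whose customer_id has no matching row in the dimension.
orphans = conn.execute("""
    SELECT s.sale_id, s.customer_id
    FROM stg_sales AS s
    LEFT JOIN dim_customer AS d ON d.customer_id = s.customer_id
    WHERE d.customer_id IS NULL
""").fetchall()
print(orphans)
```

Orphaned rows like these are usually held back or routed to an error table rather than loaded, though the right policy depends on the warehouse.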
TRANSFORM - CONFIRMING
 Data Integrity Problems
 Different spellings of the same person's name, such as Jon and John.
 Multiple ways to denote a company name, such as Google and Google
Inc.
 Inconsistent capitalization of the same name, such as Cleveland and cleveland.
 Different account numbers generated by
various applications for the same customer.
 Required fields left blank in some data.
 Invalid products recorded at the point of sale (POS), where manual entry can lead to mistakes.
TRANSFORM - CONFIRMING
 Validation During the Stage
 Filtering: select only certain columns to load
 Using rules and lookup tables for data standardization
 Character set conversion and encoding handling
 Conversion of units of measurement, such as date and time conversions,
currency conversions and numerical conversions
 Data threshold validation checks. For example, an employee's age cannot be more
than two digits.
 Data flow validation from the staging area to the intermediate tables.
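Several of these steps can be sketched in one small function: a lookup table for standardization, a currency conversion and a threshold check. The lookup entries and the exchange rates are made-up illustrative values, not real reference data.

```python
# Hypothetical lookup table mapping known variants to one standard value.
COMPANY_LOOKUP = {
    "google": "Google",
    "google inc": "Google",
    "google inc.": "Google",
}
# Assumed, illustrative exchange rates to USD (not real market data).
RATES_TO_USD = {"USD": 1.0, "EUR": 1.1}


def standardize(record):
    """Apply a lookup, a currency conversion and a threshold check to one record."""
    out = dict(record)
    key = record["company"].strip().lower()
    out["company"] = COMPANY_LOOKUP.get(key, record["company"].strip())
    out["city"] = record["city"].strip().title()
    out["amount_usd"] = round(record["amount"] * RATES_TO_USD[record["currency"]], 2)
    out["age_valid"] = 0 < record["age"] < 100  # at most two digits
    return out


row = standardize({
    "company": "Google Inc.", "city": "cleveland ",
    "amount": 100.0, "currency": "EUR", "age": 134,
})
print(row)
```

Flagging the failed threshold (age_valid) instead of dropping the row is a design choice made for the example; a real pipeline might reject or quarantine such rows.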
TRANSFORM - CONFIRMING
 Validation During the Stage
 Required fields should not be left blank.
 Cleaning (for example, mapping NULL to 0, or gender "Male" to "M"
and "Female" to "F")
 Splitting a column into multiple columns and merging multiple columns into a
single column.
 Transposing rows and columns.
 Using lookups to merge data
 Applying complex data validation (e.g., if the first two columns in a
row are empty, the row is automatically rejected from processing)
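The cleaning rules above might be sketched as follows, assuming rows arrive as dictionaries with hypothetical full_name, gender and bonus columns in that order.

```python
GENDER_CODES = {"male": "M", "female": "F"}


def clean_row(row):
    """Return a cleaned copy of the row, or None if the row is rejected."""
    # Complex validation: reject when the first two columns are both empty.
    first_two = list(row.values())[:2]
    if all(value in (None, "") for value in first_two):
        return None
    # Split one column into two at the first space.
    first, _, last = (row["full_name"] or "").partition(" ")
    return {
        "first_name": first,
        "last_name": last,
        # Map "Male"/"Female" to codes; leave unknown values unchanged.
        "gender": GENDER_CODES.get((row["gender"] or "").lower(), row["gender"]),
        # Map NULL to 0.
        "bonus": 0 if row["bonus"] is None else row["bonus"],
    }


kept = clean_row({"full_name": "John Smith", "gender": "Male", "bonus": None})
rejected = clean_row({"full_name": "", "gender": None, "bonus": 5})
print(kept, rejected)
```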
LOAD
 In this last step, the transformed data is
moved from the staging area into a target
data warehouse.
 Typically, this involves an initial loading of
all data, followed by periodic loading of
incremental data changes and, less often,
full refreshes to erase and replace data in
the warehouse.
LOAD
 Loading data into the target data warehouse database is the last step of
the ETL process. In a typical data warehouse, a huge volume of data needs
to be loaded in a relatively short period (typically overnight). Hence, the load process
should be optimized for performance
 In case of load failure, recovery mechanisms should be configured to
restart from the point of failure without loss of data integrity. Data warehouse
admins need to monitor, resume and cancel loads according to prevailing server
performance
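One possible way to restart from the point of failure is to load in batches and store a checkpoint in the same transaction as each batch. The sketch below uses Python's built-in sqlite3 module; the target and checkpoint tables and the fail_on_batch parameter (which only simulates a failure) are assumptions made for the example.

```python
import sqlite3


def load_in_batches(conn, rows, batch_size=2, fail_on_batch=None):
    """Load rows in batches, recording progress so a rerun can resume."""
    conn.execute("CREATE TABLE IF NOT EXISTS target (id INTEGER PRIMARY KEY, val TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS checkpoint (next_offset INTEGER)")
    saved = conn.execute("SELECT next_offset FROM checkpoint").fetchone()
    start = saved[0] if saved else 0
    for offset in range(start, len(rows), batch_size):
        if fail_on_batch == offset:
            raise RuntimeError("simulated load failure")
        # One transaction per batch: the data and the checkpoint commit together,
        # so a failure never leaves a half-recorded batch behind.
        with conn:
            conn.executemany(
                "INSERT INTO target VALUES (?, ?)", rows[offset:offset + batch_size]
            )
            conn.execute("DELETE FROM checkpoint")
            conn.execute("INSERT INTO checkpoint VALUES (?)", (offset + batch_size,))
    return start


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
conn = sqlite3.connect(":memory:")
try:
    load_in_batches(conn, rows, fail_on_batch=2)  # first batch commits, then fails
except RuntimeError:
    pass
resumed_from = load_in_batches(conn, rows)        # rerun picks up at the checkpoint
loaded = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(resumed_from, loaded)
```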
LOAD
 Types of Loading :
 Initial Load – Populating all the Data Warehouse tables
 Incremental Load – applying ongoing changes periodically, as and when
needed
 Full Refresh – erasing the contents of one or more tables and reloading
them with fresh data
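The three load types can be contrasted in a short sketch using Python's built-in sqlite3 module. The dim_product table is hypothetical, and INSERT OR REPLACE is just one simple way to apply changes; it stands in for whatever merge or upsert mechanism the target database offers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT)")


def initial_load(rows):
    """Initial load: populate the empty table."""
    with conn:
        conn.executemany("INSERT INTO dim_product VALUES (?, ?)", rows)


def incremental_load(changes):
    """Incremental load: insert new keys and overwrite changed ones."""
    with conn:
        conn.executemany("INSERT OR REPLACE INTO dim_product VALUES (?, ?)", changes)


def full_refresh(rows):
    """Full refresh: erase the table and reload it in one transaction."""
    with conn:
        conn.execute("DELETE FROM dim_product")
        conn.executemany("INSERT INTO dim_product VALUES (?, ?)", rows)


def snapshot():
    """Return the current table contents in key order."""
    return conn.execute(
        "SELECT product_id, name FROM dim_product ORDER BY product_id"
    ).fetchall()


initial_load([(1, "Pen"), (2, "Ink")])
incremental_load([(2, "Blue ink"), (3, "Paper")])
after_incremental = snapshot()
full_refresh([(9, "Stapler")])
after_refresh = snapshot()
print(after_incremental, after_refresh)
```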
LOAD
 Load verification
 Ensure that the key field data is neither missing nor null.
 Test modeling views based on the target tables.
 Check combined values and calculated measures.
 Run data checks on the dimension tables as well as the history tables.
 Check the BI reports built on the loaded fact and dimension tables.
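Some of these verifications reduce to simple queries that compare staging with the target. The sketch below is illustrative only: the stg_sales and fact_sales tables are hypothetical, and one null key is planted in the target on purpose so the check has something to find.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_sales (sale_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE fact_sales (sale_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO stg_sales VALUES (1, 10, 20.0), (2, 11, 30.0), (3, 12, 50.0);
    INSERT INTO fact_sales VALUES (1, 10, 20.0), (2, NULL, 30.0), (3, 12, 50.0);
""")


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


report = {
    # Key fields must be neither missing nor null.
    "null_keys": scalar(
        "SELECT COUNT(*) FROM fact_sales "
        "WHERE sale_id IS NULL OR customer_id IS NULL"
    ),
    # Row counts and a calculated measure should match the staging data.
    "row_count_matches": scalar("SELECT COUNT(*) FROM stg_sales")
    == scalar("SELECT COUNT(*) FROM fact_sales"),
    "total_matches": scalar("SELECT SUM(amount) FROM stg_sales")
    == scalar("SELECT SUM(amount) FROM fact_sales"),
}
print(report)
```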
BEST PRACTICES FOR THE ETL PROCESS
 Never try to cleanse all the data
 Every organization would like to have all its data clean, but most
are not ready to pay for it or to wait for it. Cleaning everything
would simply take too long, so it is better not to try. Cleanse only
the relevant data.
 Never skip cleansing entirely
 Always plan to clean something, because the biggest reason for
building the data warehouse is to offer cleaner and more reliable data.
BEST PRACTICES FOR THE ETL PROCESS
 Determine the cost of cleansing the data
 Before cleansing all the dirty data, it is important for you to
determine the cleansing cost for every dirty data element.
 Determine the cost per data element.
SUMMARY
 ETL is an abbreviation of Extract, Transform and Load.
 ETL provides a method of moving the data from various sources into a
data warehouse.
 In the first step, extraction, data is extracted from the source system
into the staging area.
 In the transformation step, the data extracted from the source is cleansed
and transformed.
 Loading data into the target data warehouse is the last step of the ETL
process.
REFERENCE
 ETL Process in Data Warehouse by Chirayu Poundarik.
 ETL Methodology - https://www.ibm.com/in-en/cloud/learn/etl.
THANK YOU !!!
Q & A
