SlideShare a Scribd company logo
1 of 21
-
1
Data Warehousing
Extract Transform Load (ETL)
Ch Anwar ul Hassan (Lecturer)
Department of Computer Science and Software
Engineering
Capital University of Sciences & Technology, Islamabad
Pakistan
anwarchaudary@gmail.com
-
2
Extract Transform Load (ETL)
-
3
Data Warehouse Server
(Tier 1)
OLAP Servers
(Tier 2)
Clients
(Tier 3)
Data
Warehouse
Operational
Data Bases
Semistructured
Sources
MOLAP
ROLAP
Query/Reporting

Data Marts Tools
Meta
Data
Data sources
Data
(Tier 0)





IT
Users


Business
Users


Business Users
Data Mining

Archived
data
Analysis

Extract
Transform
Load
(ETL)
www data
Putting the pieces together
{Comment: All except ETL washed out look}
-
4
The ETL Cycle
EXTRACT
The process of reading
data from different
sources.
TRANSFORM
The process of transforming
the extracted data from its
original state into a consistent
state so that it can be placed
into another database.
LOAD
The process of writing
the data into the target
source.
TRANSFORM CLEANSE
LOAD
Data Warehouse
OLAP
Temporary
Data storage
EXTRACT
MIS Systems
(Acct, HR)
Legacy
Systems
Other indigenous applications
(COBOL, VB, C++, Java)

Archived data
www data
-
5
ETL Processing
Extracts
from
source
systems
Data
Movement
Data
Transfor-
mation
Data
Loading
Index
Mainte-
nance
Statistics
Collectio
Data
Cleansing
ETL is independent yet interrelated steps.
It is important to look at the big picture.
Data acquisition time may include…
Backup
Back-up is a major task, its a DWH not a cube
Note: Backup will come as other
elements after “Statistical collection”
-
6
Overview of Data Extraction
First step of ETL, followed by many.
Source system for extraction are typically OLTP
systems.
A very complex task due to number of reasons:
 Very complex and poorly documented source system.
 Data has to be extracted not once, but number of times.

The process design is dependent on:
 Which extraction method to choose?
 How to make available extracted data for further
processing?
-
7
Types of Data Extraction
 Logical Extraction
 Full Extraction
 Incremental Extraction
 Physical Extraction
 Online Extraction
 Offline Extraction
 Legacy vs. OLTP
-
8
Logical Data Extraction
 Full Extraction
 The data extracted completely from the source system.
 No need to keep track of changes.
 Source data made available as-is with any additional
information.
 Incremental Extraction
 Data extracted after a well defined point/event in time.
 Mechanism used to reflect/record the temporal changes in data
(column or table).
 Sometimes entire tables off-loaded from source system into the
DWH.
 Can have significant performance impacts on the data
warehouse server.
-
9
Physical Data Extraction…
 Online Extraction
 Data extracted directly from the source system.
 May access source tables through an intermediate system.
 Intermediate system usually similar to the source system.
 Offline Extraction
 Data NOT extracted directly from the source system, instead staged
explicitly outside the original source system.
 Data is either already structured or was created by an extraction
routine.
 Some of the prevalent structures are:
 Flat files
 Dump files
 Redo and archive logs
 Transportable table-spaces
-
10
Physical Data Extraction
 Legacy vs. OLTP
 Data moved from the source system
 Copy made of the source system data
 Staging area used for performance reasons
-
11
Data Transformation
 Basic tasks
1. Selection
2. Splitting/Joining
3. Conversion
4. Summarization
5. Enrichment
-
12
Data Transformation Basic Tasks
 Selection
-
13
Data Transformation Basic Tasks
 Splitting/joining
-
14
Data Transformation Basic Tasks
 Conversion
-
15
Data Transformation Basic Tasks: Conversion Example-1
 Convert common data elements into a consistent
form i.e. name and address.
 Translation of dissimilar codes into a standard
code.
Field format Field data
First-Family-title Muhammad Ibrahim Contractor
Family-title-comma-first Ibrahim Contractor, Muhammad
Family-comma-first-title Ibrahim, Muhammad Contractor
Natl. ID NID
National ID NID
F/NO-2
F-2
FL.NO.2
FL.2
FL/NO.2
FL-2
FLAT-2
FLAT#
FLAT,2
FLAT-NO-2
FL-NO.2
FLAT No. 2
-
16
 Data representation change
 EBCIDIC to ASCII
 Operating System Change
 Mainframe (MVS) to UNIX
 UNIX to NT or XP
 Data type change
 Program (Excel to Access), database format (FoxPro to
Access).
 Character, numeric and date type.
 Fixed and variable length.
Data Transformation Basic Tasks: Conversion Example-
2
-
17
Data Transformation Basic Tasks
 Summarization
-
18
Data Transformation Basic Tasks
 Enrichment
-
19
Data Transformation Basic Tasks: Enrichment
Example
 Data elements are mapped from source tables
and files to destination fact and dimension tables.
 Default values are used in the absence of source
data.
Input Data
HAJI MUHAMMAD IBRAHIM, GOVT. CONT.
K. S. ABDULLAH & BROTHERS,
MAMOOJI ROAD, ABDULLAH MANZIL
RAWALPINDI, Ph 67855
Parsed Data
First Name: HAJI MUHAMMAD
Family Name: IBRAHIM
Title: GOVT. CONT.
Firm: K. S. ABDULLAH &
BROTHERS
Firm Location: ABDULLAH MANZIL
Road: MAMOOJI ROAD
Phone: 051-67855
City: RAWALPINDI
Code: 46200
-
20
Aspects of Data Loading Strategies
 Need to look at:
 Data freshness
 System performance
 Data volatility
 Data Freshness
 Very fresh low update efficiency
 Historical data, high update efficiency
 Always trade-offs in the light of goals
 System performance
 Availability of staging table space
 Impact on query workload
 Data Volatility
 Ratio of new to historical data
 High percentages of data change (batch update)
-
21
Three Loading Strategies
 Once we have transformed data, there are three
primary loading strategies:
 Full data refresh with BLOCK INSERT or ‘block
slamming’ into empty table.
 Incremental data refresh with BLOCK INSERT or
‘block slamming’ into existing (populated) tables.
 Trickle/continuous feed with constant data
collection and loading using row level insert and
update operations.

More Related Content

What's hot

Ibm db2 10.5 for linux, unix, and windows data movement utilities guide and...
Ibm db2 10.5 for linux, unix, and windows   data movement utilities guide and...Ibm db2 10.5 for linux, unix, and windows   data movement utilities guide and...
Ibm db2 10.5 for linux, unix, and windows data movement utilities guide and...bupbechanhgmail
 
Information Flow Mechanism in Data warehouse
Information Flow Mechanism in Data warehouseInformation Flow Mechanism in Data warehouse
Information Flow Mechanism in Data warehouseGunjanShree1
 
Introduction To Msbi By Yasir
Introduction To Msbi By YasirIntroduction To Msbi By Yasir
Introduction To Msbi By Yasiryasir873
 
Introduction to Datawarehousing
Introduction to  DatawarehousingIntroduction to  Datawarehousing
Introduction to Datawarehousingkarunakar81987
 
The Database Environment Chapter 10
The Database Environment Chapter 10The Database Environment Chapter 10
The Database Environment Chapter 10Jeanie Arnoco
 
47468272 introduction-to-informatica
47468272 introduction-to-informatica47468272 introduction-to-informatica
47468272 introduction-to-informaticaVenkat485
 
Rise of Column Oriented Database
Rise of Column Oriented DatabaseRise of Column Oriented Database
Rise of Column Oriented DatabaseSuvradeep Rudra
 
Column oriented database
Column oriented databaseColumn oriented database
Column oriented databaseKanike Krishna
 
Db2 Important questions to read
Db2 Important questions to readDb2 Important questions to read
Db2 Important questions to readPrasanth Dusi
 
9780538745840 ppt ch07
9780538745840 ppt ch079780538745840 ppt ch07
9780538745840 ppt ch07Terry Yoast
 
Best Practices in the Use of Columnar Databases
Best Practices in the Use of Columnar DatabasesBest Practices in the Use of Columnar Databases
Best Practices in the Use of Columnar DatabasesDATAVERSITY
 
Physical architecture of sql server
Physical architecture of sql serverPhysical architecture of sql server
Physical architecture of sql serverDivya Sharma
 
What is ETL testing & how to enforce it in Data Wharehouse
What is ETL testing & how to enforce it in Data WharehouseWhat is ETL testing & how to enforce it in Data Wharehouse
What is ETL testing & how to enforce it in Data WharehouseBugRaptors
 
Tuning ETL's for Better BI
Tuning ETL's for Better BITuning ETL's for Better BI
Tuning ETL's for Better BIDatavail
 
Introduction to sas
Introduction to sasIntroduction to sas
Introduction to sasAjay Ohri
 
DB2 Interview Questions - Part 1
DB2 Interview Questions - Part 1DB2 Interview Questions - Part 1
DB2 Interview Questions - Part 1ReKruiTIn.com
 

What's hot (20)

IDUG 2015 NA Data Movement Utilities final
IDUG 2015 NA Data Movement Utilities finalIDUG 2015 NA Data Movement Utilities final
IDUG 2015 NA Data Movement Utilities final
 
Ibm db2 10.5 for linux, unix, and windows data movement utilities guide and...
Ibm db2 10.5 for linux, unix, and windows   data movement utilities guide and...Ibm db2 10.5 for linux, unix, and windows   data movement utilities guide and...
Ibm db2 10.5 for linux, unix, and windows data movement utilities guide and...
 
Information Flow Mechanism in Data warehouse
Information Flow Mechanism in Data warehouseInformation Flow Mechanism in Data warehouse
Information Flow Mechanism in Data warehouse
 
Introduction To Msbi By Yasir
Introduction To Msbi By YasirIntroduction To Msbi By Yasir
Introduction To Msbi By Yasir
 
Introduction to Datawarehousing
Introduction to  DatawarehousingIntroduction to  Datawarehousing
Introduction to Datawarehousing
 
The Database Environment Chapter 10
The Database Environment Chapter 10The Database Environment Chapter 10
The Database Environment Chapter 10
 
Column oriented Transactions
Column oriented TransactionsColumn oriented Transactions
Column oriented Transactions
 
Sql server basics
Sql server basicsSql server basics
Sql server basics
 
47468272 introduction-to-informatica
47468272 introduction-to-informatica47468272 introduction-to-informatica
47468272 introduction-to-informatica
 
Rise of Column Oriented Database
Rise of Column Oriented DatabaseRise of Column Oriented Database
Rise of Column Oriented Database
 
Column oriented database
Column oriented databaseColumn oriented database
Column oriented database
 
Db2 Important questions to read
Db2 Important questions to readDb2 Important questions to read
Db2 Important questions to read
 
9780538745840 ppt ch07
9780538745840 ppt ch079780538745840 ppt ch07
9780538745840 ppt ch07
 
Best Practices in the Use of Columnar Databases
Best Practices in the Use of Columnar DatabasesBest Practices in the Use of Columnar Databases
Best Practices in the Use of Columnar Databases
 
Physical architecture of sql server
Physical architecture of sql serverPhysical architecture of sql server
Physical architecture of sql server
 
What is ETL testing & how to enforce it in Data Wharehouse
What is ETL testing & how to enforce it in Data WharehouseWhat is ETL testing & how to enforce it in Data Wharehouse
What is ETL testing & how to enforce it in Data Wharehouse
 
Tuning ETL's for Better BI
Tuning ETL's for Better BITuning ETL's for Better BI
Tuning ETL's for Better BI
 
Introduction to sas
Introduction to sasIntroduction to sas
Introduction to sas
 
DB2 Interview Questions - Part 1
DB2 Interview Questions - Part 1DB2 Interview Questions - Part 1
DB2 Interview Questions - Part 1
 
Sql Server Basics
Sql Server BasicsSql Server Basics
Sql Server Basics
 

Similar to Intro to Data warehousing lecture 09

Lecture 16
Lecture 16Lecture 16
Lecture 16Shani729
 
project_phrase I.pptx
project_phrase I.pptxproject_phrase I.pptx
project_phrase I.pptxNambiraju
 
moving data between the data bases in database
moving data between the data bases in databasemoving data between the data bases in database
moving data between the data bases in databasemqasimsheikh5
 
Extract, Transform and Load.pptx
Extract, Transform and Load.pptxExtract, Transform and Load.pptx
Extract, Transform and Load.pptxJesusaEspeleta
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSSDeepali Raut
 
60141457-Oracle-Golden-Gate-Presentation.ppt
60141457-Oracle-Golden-Gate-Presentation.ppt60141457-Oracle-Golden-Gate-Presentation.ppt
60141457-Oracle-Golden-Gate-Presentation.pptpadalamail
 
The Database Environment Chapter 11
The Database Environment Chapter 11The Database Environment Chapter 11
The Database Environment Chapter 11Jeanie Arnoco
 
Hadoop - Integration Patterns and Practices__HadoopSummit2010
Hadoop - Integration Patterns and Practices__HadoopSummit2010Hadoop - Integration Patterns and Practices__HadoopSummit2010
Hadoop - Integration Patterns and Practices__HadoopSummit2010Yahoo Developer Network
 
Less18 moving data
Less18 moving dataLess18 moving data
Less18 moving dataImran Ali
 
Get started with data migration
Get started with data migrationGet started with data migration
Get started with data migrationThinqloud
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United AirlinesDataWorks Summit
 
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Andreas Buckenhofer
 
Less17 moving data
Less17 moving dataLess17 moving data
Less17 moving dataAmit Bhalla
 
The Big Data Analytics Ecosystem at LinkedIn
The Big Data Analytics Ecosystem at LinkedInThe Big Data Analytics Ecosystem at LinkedIn
The Big Data Analytics Ecosystem at LinkedInrajappaiyer
 

Similar to Intro to Data warehousing lecture 09 (20)

Lecture 16
Lecture 16Lecture 16
Lecture 16
 
project_phrase I.pptx
project_phrase I.pptxproject_phrase I.pptx
project_phrase I.pptx
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
moving data between the data bases in database
moving data between the data bases in databasemoving data between the data bases in database
moving data between the data bases in database
 
Extract, Transform and Load.pptx
Extract, Transform and Load.pptxExtract, Transform and Load.pptx
Extract, Transform and Load.pptx
 
ETL DW-RealTime
ETL DW-RealTimeETL DW-RealTime
ETL DW-RealTime
 
ETL Process
ETL ProcessETL Process
ETL Process
 
ETL_Methodology.pptx
ETL_Methodology.pptxETL_Methodology.pptx
ETL_Methodology.pptx
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSS
 
Pentaho etl-tool
Pentaho etl-toolPentaho etl-tool
Pentaho etl-tool
 
60141457-Oracle-Golden-Gate-Presentation.ppt
60141457-Oracle-Golden-Gate-Presentation.ppt60141457-Oracle-Golden-Gate-Presentation.ppt
60141457-Oracle-Golden-Gate-Presentation.ppt
 
The Database Environment Chapter 11
The Database Environment Chapter 11The Database Environment Chapter 11
The Database Environment Chapter 11
 
Hadoop - Integration Patterns and Practices__HadoopSummit2010
Hadoop - Integration Patterns and Practices__HadoopSummit2010Hadoop - Integration Patterns and Practices__HadoopSummit2010
Hadoop - Integration Patterns and Practices__HadoopSummit2010
 
Less18 moving data
Less18 moving dataLess18 moving data
Less18 moving data
 
Get started with data migration
Get started with data migrationGet started with data migration
Get started with data migration
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United Airlines
 
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
 
Less17 moving data
Less17 moving dataLess17 moving data
Less17 moving data
 
The Big Data Analytics Ecosystem at LinkedIn
The Big Data Analytics Ecosystem at LinkedInThe Big Data Analytics Ecosystem at LinkedIn
The Big Data Analytics Ecosystem at LinkedIn
 
ETL (1).ppt
ETL (1).pptETL (1).ppt
ETL (1).ppt
 

More from AnwarrChaudary

Intro to Data warehousing lecture 20
Intro to Data warehousing   lecture 20Intro to Data warehousing   lecture 20
Intro to Data warehousing lecture 20AnwarrChaudary
 
Intro to Data warehousing lecture 19
Intro to Data warehousing   lecture 19Intro to Data warehousing   lecture 19
Intro to Data warehousing lecture 19AnwarrChaudary
 
Intro to Data warehousing lecture 18
Intro to Data warehousing   lecture 18Intro to Data warehousing   lecture 18
Intro to Data warehousing lecture 18AnwarrChaudary
 
Intro to Data warehousing lecture 17
Intro to Data warehousing   lecture 17Intro to Data warehousing   lecture 17
Intro to Data warehousing lecture 17AnwarrChaudary
 
Intro to Data warehousing lecture 16
Intro to Data warehousing   lecture 16Intro to Data warehousing   lecture 16
Intro to Data warehousing lecture 16AnwarrChaudary
 
Intro to Data warehousing lecture 15
Intro to Data warehousing   lecture 15Intro to Data warehousing   lecture 15
Intro to Data warehousing lecture 15AnwarrChaudary
 
Intro to Data warehousing lecture 14
Intro to Data warehousing   lecture 14Intro to Data warehousing   lecture 14
Intro to Data warehousing lecture 14AnwarrChaudary
 
Intro to Data warehousing lecture 13
Intro to Data warehousing   lecture 13Intro to Data warehousing   lecture 13
Intro to Data warehousing lecture 13AnwarrChaudary
 
Intro to Data warehousing lecture 12
Intro to Data warehousing   lecture 12Intro to Data warehousing   lecture 12
Intro to Data warehousing lecture 12AnwarrChaudary
 
Intro to Data warehousing lecture 11
Intro to Data warehousing   lecture 11Intro to Data warehousing   lecture 11
Intro to Data warehousing lecture 11AnwarrChaudary
 
Intro to Data warehousing lecture 10
Intro to Data warehousing   lecture 10Intro to Data warehousing   lecture 10
Intro to Data warehousing lecture 10AnwarrChaudary
 
Intro to Data warehousing lecture 08
Intro to Data warehousing   lecture 08Intro to Data warehousing   lecture 08
Intro to Data warehousing lecture 08AnwarrChaudary
 
Intro to Data warehousing lecture 07
Intro to Data warehousing   lecture 07Intro to Data warehousing   lecture 07
Intro to Data warehousing lecture 07AnwarrChaudary
 
Intro to Data warehousing Lecture 06
Intro to Data warehousing   Lecture 06Intro to Data warehousing   Lecture 06
Intro to Data warehousing Lecture 06AnwarrChaudary
 
Intro to Data warehousing lecture 05
Intro to Data warehousing   lecture 05Intro to Data warehousing   lecture 05
Intro to Data warehousing lecture 05AnwarrChaudary
 
Intro to Data warehousing Lecture 04
Intro to Data warehousing   Lecture 04Intro to Data warehousing   Lecture 04
Intro to Data warehousing Lecture 04AnwarrChaudary
 
Intro to Data warehousing lecture 03
Intro to Data warehousing   lecture 03Intro to Data warehousing   lecture 03
Intro to Data warehousing lecture 03AnwarrChaudary
 
Intro to Data warehousing lecture 02
Intro to Data warehousing   lecture 02Intro to Data warehousing   lecture 02
Intro to Data warehousing lecture 02AnwarrChaudary
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data WarehouseAnwarrChaudary
 
Introduction to Software Engineering
Introduction to Software EngineeringIntroduction to Software Engineering
Introduction to Software EngineeringAnwarrChaudary
 

More from AnwarrChaudary (20)

Intro to Data warehousing lecture 20
Intro to Data warehousing   lecture 20Intro to Data warehousing   lecture 20
Intro to Data warehousing lecture 20
 
Intro to Data warehousing lecture 19
Intro to Data warehousing   lecture 19Intro to Data warehousing   lecture 19
Intro to Data warehousing lecture 19
 
Intro to Data warehousing lecture 18
Intro to Data warehousing   lecture 18Intro to Data warehousing   lecture 18
Intro to Data warehousing lecture 18
 
Intro to Data warehousing lecture 17
Intro to Data warehousing   lecture 17Intro to Data warehousing   lecture 17
Intro to Data warehousing lecture 17
 
Intro to Data warehousing lecture 16
Intro to Data warehousing   lecture 16Intro to Data warehousing   lecture 16
Intro to Data warehousing lecture 16
 
Intro to Data warehousing lecture 15
Intro to Data warehousing   lecture 15Intro to Data warehousing   lecture 15
Intro to Data warehousing lecture 15
 
Intro to Data warehousing lecture 14
Intro to Data warehousing   lecture 14Intro to Data warehousing   lecture 14
Intro to Data warehousing lecture 14
 
Intro to Data warehousing lecture 13
Intro to Data warehousing   lecture 13Intro to Data warehousing   lecture 13
Intro to Data warehousing lecture 13
 
Intro to Data warehousing lecture 12
Intro to Data warehousing   lecture 12Intro to Data warehousing   lecture 12
Intro to Data warehousing lecture 12
 
Intro to Data warehousing lecture 11
Intro to Data warehousing   lecture 11Intro to Data warehousing   lecture 11
Intro to Data warehousing lecture 11
 
Intro to Data warehousing lecture 10
Intro to Data warehousing   lecture 10Intro to Data warehousing   lecture 10
Intro to Data warehousing lecture 10
 
Intro to Data warehousing lecture 08
Intro to Data warehousing   lecture 08Intro to Data warehousing   lecture 08
Intro to Data warehousing lecture 08
 
Intro to Data warehousing lecture 07
Intro to Data warehousing   lecture 07Intro to Data warehousing   lecture 07
Intro to Data warehousing lecture 07
 
Intro to Data warehousing Lecture 06
Intro to Data warehousing   Lecture 06Intro to Data warehousing   Lecture 06
Intro to Data warehousing Lecture 06
 
Intro to Data warehousing lecture 05
Intro to Data warehousing   lecture 05Intro to Data warehousing   lecture 05
Intro to Data warehousing lecture 05
 
Intro to Data warehousing Lecture 04
Intro to Data warehousing   Lecture 04Intro to Data warehousing   Lecture 04
Intro to Data warehousing Lecture 04
 
Intro to Data warehousing lecture 03
Intro to Data warehousing   lecture 03Intro to Data warehousing   lecture 03
Intro to Data warehousing lecture 03
 
Intro to Data warehousing lecture 02
Intro to Data warehousing   lecture 02Intro to Data warehousing   lecture 02
Intro to Data warehousing lecture 02
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
Introduction to Software Engineering
Introduction to Software EngineeringIntroduction to Software Engineering
Introduction to Software Engineering
 

Recently uploaded

APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 

Recently uploaded (20)

APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 

Intro to Data warehousing lecture 09

  • 1. - 1 Data Warehousing Extract Transform Load (ETL) Ch Anwar ul Hassan (Lecturer) Department of Computer Science and Software Engineering Capital University of Sciences & Technology, Islamabad Pakistan anwarchaudary@gmail.com
  • 3. - 3 Data Warehouse Server (Tier 1) OLAP Servers (Tier 2) Clients (Tier 3) Data Warehouse Operational Data Bases Semistructured Sources MOLAP ROLAP Query/Reporting  Data Marts Tools Meta Data Data sources Data (Tier 0)      IT Users   Business Users   Business Users Data Mining  Archived data Analysis  Extract Transform Load (ETL) www data Putting the pieces together {Comment: All except ETL washed out look}
  • 4. - 4 The ETL Cycle EXTRACT The process of reading data from different sources. TRANSFORM The process of transforming the extracted data from its original state into a consistent state so that it can be placed into another database. LOAD The process of writing the data into the target source. TRANSFORM CLEANSE LOAD Data Warehouse OLAP Temporary Data storage EXTRACT MIS Systems (Acct, HR) Legacy Systems Other indigenous applications (COBOL, VB, C++, Java)  Archived data www data
  • 5. - 5 ETL Processing Extracts from source systems Data Movement Data Transfor- mation Data Loading Index Mainte- nance Statistics Collectio Data Cleansing ETL is independent yet interrelated steps. It is important to look at the big picture. Data acquisition time may include… Backup Back-up is a major task, its a DWH not a cube Note: Backup will come as other elements after “Statistical collection”
  • 6. - 6 Overview of Data Extraction First step of ETL, followed by many. Source system for extraction are typically OLTP systems. A very complex task due to number of reasons:  Very complex and poorly documented source system.  Data has to be extracted not once, but number of times.  The process design is dependent on:  Which extraction method to choose?  How to make available extracted data for further processing?
  • 7. - 7 Types of Data Extraction  Logical Extraction  Full Extraction  Incremental Extraction  Physical Extraction  Online Extraction  Offline Extraction  Legacy vs. OLTP
  • 8. - 8 Logical Data Extraction  Full Extraction  The data extracted completely from the source system.  No need to keep track of changes.  Source data made available as-is with any additional information.  Incremental Extraction  Data extracted after a well defined point/event in time.  Mechanism used to reflect/record the temporal changes in data (column or table).  Sometimes entire tables off-loaded from source system into the DWH.  Can have significant performance impacts on the data warehouse server.
  • 9. - 9 Physical Data Extraction…  Online Extraction  Data extracted directly from the source system.  May access source tables through an intermediate system.  Intermediate system usually similar to the source system.  Offline Extraction  Data NOT extracted directly from the source system, instead staged explicitly outside the original source system.  Data is either already structured or was created by an extraction routine.  Some of the prevalent structures are:  Flat files  Dump files  Redo and archive logs  Transportable table-spaces
  • 10. - 10 Physical Data Extraction  Legacy vs. OLTP  Data moved from the source system  Copy made of the source system data  Staging area used for performance reasons
  • 11. - 11 Data Transformation  Basic tasks 1. Selection 2. Splitting/Joining 3. Conversion 4. Summarization 5. Enrichment
  • 12. - 12 Data Transformation Basic Tasks  Selection
  • 13. - 13 Data Transformation Basic Tasks  Splitting/joining
  • 14. - 14 Data Transformation Basic Tasks  Conversion
  • 15. - 15 Data Transformation Basic Tasks: Conversion Example-1  Convert common data elements into a consistent form i.e. name and address.  Translation of dissimilar codes into a standard code. Field format Field data First-Family-title Muhammad Ibrahim Contractor Family-title-comma-first Ibrahim Contractor, Muhammad Family-comma-first-title Ibrahim, Muhammad Contractor Natl. ID NID National ID NID F/NO-2 F-2 FL.NO.2 FL.2 FL/NO.2 FL-2 FLAT-2 FLAT# FLAT,2 FLAT-NO-2 FL-NO.2 FLAT No. 2
  • 16. - 16  Data representation change  EBCIDIC to ASCII  Operating System Change  Mainframe (MVS) to UNIX  UNIX to NT or XP  Data type change  Program (Excel to Access), database format (FoxPro to Access).  Character, numeric and date type.  Fixed and variable length. Data Transformation Basic Tasks: Conversion Example- 2
  • 17. - 17 Data Transformation Basic Tasks  Summarization
  • 18. - 18 Data Transformation Basic Tasks  Enrichment
  • 19. - 19 Data Transformation Basic Tasks: Enrichment Example  Data elements are mapped from source tables and files to destination fact and dimension tables.  Default values are used in the absence of source data. Input Data HAJI MUHAMMAD IBRAHIM, GOVT. CONT. K. S. ABDULLAH & BROTHERS, MAMOOJI ROAD, ABDULLAH MANZIL RAWALPINDI, Ph 67855 Parsed Data First Name: HAJI MUHAMMAD Family Name: IBRAHIM Title: GOVT. CONT. Firm: K. S. ABDULLAH & BROTHERS Firm Location: ABDULLAH MANZIL Road: MAMOOJI ROAD Phone: 051-67855 City: RAWALPINDI Code: 46200
  • 20. - 20 Aspects of Data Loading Strategies  Need to look at:  Data freshness  System performance  Data volatility  Data Freshness  Very fresh low update efficiency  Historical data, high update efficiency  Always trade-offs in the light of goals  System performance  Availability of staging table space  Impact on query workload  Data Volatility  Ratio of new to historical data  High percentages of data change (batch update)
  • 21. - 21 Three Loading Strategies  Once we have transformed data, there are three primary loading strategies:  Full data refresh with BLOCK INSERT or ‘block slamming’ into empty table.  Incremental data refresh with BLOCK INSERT or ‘block slamming’ into existing (populated) tables.  Trickle/continuous feed with constant data collection and loading using row level insert and update operations.