21COM2T463
Dr. NEERUPA CHAUHAN
Asst. Professor
Kristu Jayanti College, Autonomous
(Reaccredited A++ Grade by NAAC with CGPA 3.78/4)
Bengaluru – 560077, India
ETL
(Extraction, Transformation,
Loading)
The process of updating the data
warehouse.
Two Data Warehousing
Strategies
Enterprise-wide warehouse, top down,
the Inmon methodology
Data mart, bottom up, the Kimball
methodology
When properly executed, both result in
an enterprise-wide data warehouse
The Data Mart Strategy
The most common approach
Begins with a single mart and architected marts are
added over time for more subject areas
Relatively inexpensive and easy to implement
Can be used as a proof of concept for data
warehousing
Can perpetuate the “silos of information” problem
Can postpone difficult decisions and activities
Requires an overall integration plan
The Enterprise-wide Strategy
A comprehensive warehouse is built initially
An initial dependent data mart is built using a
subset of the data in the warehouse
Additional data marts are built using subsets
of the data in the warehouse
Like all complex projects, it is expensive, time
consuming, and prone to failure
When successful, it results in an integrated,
scalable warehouse
Extraction, Transformation, and
Loading (ETL) Processes
The “plumbing” work of data
warehousing
Data are moved from source to target
databases
A very costly, time consuming part of
data warehousing
Recent Development:
More Frequent Updates
Updates can be done in bulk and trickle
modes
Business requirements, such as trading
partner access to a Web site, require
current data
For international firms, there is no good
time to load the warehouse
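A minimal sketch of the two update modes, assuming a hypothetical `sales` table in an in-memory SQLite database. The table, columns, and function names are illustrative and do not come from the slides. Bulk mode loads a whole batch in one transaction; trickle mode loads each row as it arrives.

```python
import sqlite3

def bulk_load(conn, rows):
    """Load a whole batch in a single transaction (for example, a nightly load)."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO sales (order_id, amount) VALUES (?, ?)", rows
        )

def trickle_load(conn, row):
    """Load one row as soon as it arrives (a continuous feed)."""
    with conn:
        conn.execute(
            "INSERT INTO sales (order_id, amount) VALUES (?, ?)", row
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER PRIMARY KEY, amount REAL)")

bulk_load(conn, [(1, 10.0), (2, 25.5)])  # batch of accumulated rows
trickle_load(conn, (3, 7.25))            # single late-arriving row

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Trickle loading keeps the warehouse current but pays the transaction overhead on every row, which is one reason bulk loads remain common where current data is not required.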
Recent Development:
Clickstream Data
Results from clicks at web sites
A dialog manager handles user interactions.
An ODS (operational data store in the data
staging area) helps to custom tailor the dialog
The clickstream data is filtered and parsed
and sent to a data warehouse where it is
analyzed
Software is available to analyze the
clickstream data
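The filter-and-parse step can be sketched as follows. The log format (timestamp, session id, URL separated by spaces) and the rule that drops `/static/` requests are assumptions made for illustration, not a format described in the slides.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical raw clickstream lines: "<timestamp> <session id> <url>"
RAW_CLICKS = [
    "2024-05-01T10:00:00 s1 /products?id=42",
    "2024-05-01T10:00:01 s1 /static/logo.png",
    "2024-05-01T10:00:05 s2 /cart",
]

def parse_click(line):
    """Split one raw line into named fields and break the URL apart."""
    timestamp, session_id, url = line.split(" ", 2)
    parsed = urlparse(url)
    return {
        "timestamp": timestamp,
        "session_id": session_id,
        "path": parsed.path,
        "params": parse_qs(parsed.query),
    }

def is_page_view(click):
    """Filter rule (assumed): requests for static assets are not page views."""
    return not click["path"].startswith("/static/")

# Parse every line, then keep only the page views for the warehouse.
clicks = [c for c in map(parse_click, RAW_CLICKS) if is_page_view(c)]
```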
Data Extraction
Often performed by COBOL routines
(not recommended because of high program
maintenance and no automatically generated
meta data)
Sometimes source data is copied to the target
database using the replication capabilities of
a standard RDBMS (not recommended because
of “dirty data” in the source systems)
Increasingly performed by specialized ETL
software
Sample ETL Tools
Teradata Warehouse Builder from Teradata
DataStage from Ascential Software
SAS System from SAS Institute
Power Mart/Power Center from Informatica
Sagent Solution from Sagent Software
Hummingbird Genio Suite from Hummingbird
Communications
Reasons for “Dirty” Data
 Dummy Values
 Absence of Data
 Multipurpose Fields
 Cryptic Data
 Contradicting Data
 Inappropriate Use of Address Lines
 Violation of Business Rules
 Reused Primary Keys,
 Non-Unique Identifiers
 Data Integration Problems
Data Cleansing
Source systems contain “dirty data” that must
be cleansed
ETL software contains rudimentary data
cleansing capabilities
Specialized data cleansing software is often
used. Important for performing name and
address correction and householding
functions
Leading data cleansing vendors include Vality
(Integrity), Harte-Hanks (Trillium), and
Firstlogic (i.d.Centric)
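As a rough illustration of what a rudimentary cleansing check might look like, the sketch below flags two of the dirty-data causes listed earlier: dummy values and absent data. The placeholder set and the field names are hypothetical.

```python
# Hypothetical placeholder values that source systems use instead of real data.
DUMMY_VALUES = {"999-99-9999", "N/A", "UNKNOWN", "XXXXX"}

def find_problems(record, required_fields):
    """Return (field, problem) pairs for absent or dummy values."""
    problems = []
    for field in required_fields:
        value = (record.get(field) or "").strip()
        if not value:
            problems.append((field, "absent"))
        elif value.upper() in DUMMY_VALUES:
            problems.append((field, "dummy value"))
    return problems

record = {"name": "Pat Lee", "ssn": "999-99-9999", "phone": ""}
problems = find_problems(record, ["name", "ssn", "phone", "email"])
```

Checks like this only detect problems. Correcting them usually needs the secondary data sources and matching rules that specialized cleansing software provides.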
Steps in Data Cleansing
 Parsing
 Correcting
 Standardizing
 Matching
 Consolidating
Parsing
Parsing locates and identifies individual
data elements in the source files and
then isolates these data elements in the
target files.
Examples include parsing the first,
middle, and last name; street number
and street name; and city and state.
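A simplified sketch of parsing, assuming names arrive as space-separated tokens with at least a first and last name, and street lines start with a house number. Real parsers handle many more layouts; the function names here are illustrative.

```python
import re

def parse_name(full_name):
    """Split a full name into first, middle, and last (assumes 2+ tokens)."""
    parts = full_name.split()
    return {
        "first": parts[0],
        "middle": " ".join(parts[1:-1]),  # empty when there is no middle name
        "last": parts[-1],
    }

# Leading digits are the street number; the rest is the street name.
STREET_RE = re.compile(r"^\s*(?P<number>\d+)\s+(?P<street>.+?)\s*$")

def parse_street(line):
    """Isolate street number and street name from one address line."""
    match = STREET_RE.match(line)
    if match is None:
        # No leading number: keep the whole line as the street name.
        return {"number": "", "street": line.strip()}
    return match.groupdict()

name = parse_name("Mary Ann Smith")
street = parse_street("123 Main Street")
```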
Correcting
Corrects parsed individual data
components using sophisticated data
algorithms and secondary data sources.
Examples include replacing a vanity
address and adding a zip code.
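Adding a zip code from a secondary data source might look like the sketch below. The lookup table stands in for an external postal reference file and is an assumption for illustration only.

```python
# Hypothetical secondary data source: (city, state) -> zip code.
ZIP_LOOKUP = {("SPRINGFIELD", "IL"): "62701"}

def add_zip(address):
    """Fill in a missing zip code from the lookup; leave existing ones alone."""
    if address.get("zip"):
        return address
    key = (address["city"].strip().upper(), address["state"].strip().upper())
    corrected = dict(address)  # do not modify the caller's record
    corrected["zip"] = ZIP_LOOKUP.get(key, "")
    return corrected

corrected = add_zip({"city": "Springfield", "state": "il", "zip": ""})
```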
Standardizing
Standardizing applies conversion
routines to transform data into its
preferred (and consistent) format using
both standard and custom business
rules.
Examples include adding a prename (a title
such as Mr. or Ms.), replacing a nickname,
and using a preferred street name.
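Two such conversion routines are sketched below: one replaces a nickname with its formal name, the other expands a street suffix to its preferred form. Both mapping tables are small hypothetical examples of custom business rules.

```python
# Hypothetical business rules, kept as simple lookup tables.
NICKNAMES = {"bob": "Robert", "bill": "William", "liz": "Elizabeth"}
STREET_SUFFIXES = {
    "st": "Street", "st.": "Street",
    "ave": "Avenue", "ave.": "Avenue",
    "rd": "Road", "rd.": "Road",
}

def standardize_first_name(name):
    """Replace a known nickname; otherwise just normalize capitalization."""
    cleaned = name.strip()
    return NICKNAMES.get(cleaned.lower(), cleaned.title())

def standardize_street(street):
    """Expand an abbreviated suffix in the last word of the street name."""
    words = street.split()
    if words and words[-1].lower() in STREET_SUFFIXES:
        words[-1] = STREET_SUFFIXES[words[-1].lower()]
    return " ".join(words)
```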
Matching
Searching and matching records within
and across the parsed, corrected and
standardized data based on predefined
business rules to eliminate duplications.
Examples include identifying similar
names and addresses.
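One simple way to identify similar names and addresses is a string-similarity score with a cutoff. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the 0.85 threshold and the record layout are assumptions, and commercial tools use far more elaborate rules.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Similarity ratio between two strings, ignoring case (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_matches(records, threshold=0.85):
    """Return index pairs of records whose name and address look alike."""
    matches = []
    for (i, left), (j, right) in combinations(enumerate(records), 2):
        score = similarity(
            left["name"] + " " + left["address"],
            right["name"] + " " + right["address"],
        )
        if score >= threshold:
            matches.append((i, j))
    return matches

records = [
    {"name": "Robert Smith", "address": "12 Main Street"},
    {"name": "Robert Smyth", "address": "12 Main Street"},
    {"name": "Alice Jones", "address": "9 Oak Avenue"},
]
matches = find_matches(records)
```

Comparing every pair is quadratic in the number of records, so this approach only suits small sets; larger sets are typically grouped first (for example by zip code) and compared within each group.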
Consolidating
 Analyzing and identifying relationships
between matched records and
consolidating/merging them into ONE
representation.
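A minimal sketch of merging matched records into one representation, assuming a simple survivorship rule: for each field, keep the first non-empty value seen. The rule and the field names are illustrative.

```python
def consolidate(records):
    """Merge matched records, keeping the first non-empty value per field."""
    merged = {}
    for record in records:
        for field, value in record.items():
            if value and not merged.get(field):
                merged[field] = value
    return merged

matched = [
    {"name": "Robert Smith", "phone": "", "email": "rs@example.com"},
    {"name": "Robert Smyth", "phone": "555-0100", "email": ""},
]
golden_record = consolidate(matched)
```

Here the first record supplies the name and email and the second fills in the missing phone number.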
Data Staging
Often used as an interim step between data
extraction and later steps
Accumulates data from asynchronous sources using
native interfaces, flat files, FTP sessions, or other
processes
At a predefined cutoff time, data in the staging file is
transformed and loaded to the warehouse
There is usually no end user access to the staging file
An operational data store may be used for data
staging
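The accumulate-then-release behavior can be sketched with a small in-memory class. The class name, source labels, and timestamps are hypothetical; a real staging area would be a file or database table.

```python
from datetime import datetime

class StagingArea:
    """Holds rows from asynchronous sources until a cutoff time."""

    def __init__(self):
        self._rows = []

    def __len__(self):
        return len(self._rows)

    def accumulate(self, source, payload, received_at):
        """Record one row together with its source and arrival time."""
        self._rows.append(
            {"source": source, "payload": payload, "received_at": received_at}
        )

    def release(self, cutoff):
        """Return rows received up to the cutoff; later arrivals stay staged."""
        ready = [r for r in self._rows if r["received_at"] <= cutoff]
        self._rows = [r for r in self._rows if r["received_at"] > cutoff]
        return ready

staging = StagingArea()
staging.accumulate("flat_file", {"order_id": 1}, datetime(2024, 5, 1, 21, 0))
staging.accumulate("ftp", {"order_id": 2}, datetime(2024, 5, 1, 23, 30))
staging.accumulate("native_interface", {"order_id": 3}, datetime(2024, 5, 2, 0, 15))

# At the predefined cutoff, hand the accumulated rows to transform and load.
ready = staging.release(cutoff=datetime(2024, 5, 2, 0, 0))
```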
Data Transformation
Transforms the data in accordance with
the business rules and standards that
have been established
Examples include: format changes,
deduplication, splitting up fields,
replacement of codes, derived values,
and aggregates
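Several of these transformations are sketched below on a few made-up order rows: a date format change, splitting a name field, replacing a region code, deriving a total, and aggregating by region. The field names and the code table are assumptions.

```python
from collections import defaultdict
from datetime import datetime

REGION_CODES = {"N": "North", "S": "South"}  # hypothetical code table

def transform(row):
    """Apply the business rules to one source row."""
    first, _, last = row["customer"].partition(" ")  # split one field in two
    return {
        # Format change: MM/DD/YYYY becomes ISO YYYY-MM-DD.
        "order_date": datetime.strptime(row["order_date"], "%m/%d/%Y").date().isoformat(),
        "first_name": first,
        "last_name": last,
        # Replacement of codes with readable values.
        "region": REGION_CODES.get(row["region"], "Unknown"),
        # Derived value.
        "total": round(row["quantity"] * row["unit_price"], 2),
    }

source_rows = [
    {"order_date": "05/01/2024", "customer": "Pat Lee", "region": "N", "quantity": 2, "unit_price": 9.5},
    {"order_date": "05/02/2024", "customer": "Sam Roy", "region": "N", "quantity": 1, "unit_price": 4.0},
    {"order_date": "05/02/2024", "customer": "Ana Diaz", "region": "S", "quantity": 3, "unit_price": 2.0},
]
transformed = [transform(row) for row in source_rows]

# Aggregate: total sales per region.
totals = defaultdict(float)
for row in transformed:
    totals[row["region"]] += row["total"]
totals_by_region = dict(totals)
```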
Data Loading
Data are physically moved to the data
warehouse
The loading takes place within a “load
window”
The trend is toward near-real-time updates
of the data warehouse as the
warehouse is increasingly used for
operational applications
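A load job might check that it is inside the load window before starting. The sketch below assumes a hypothetical window of 01:00 to 05:00 and also handles windows that cross midnight.

```python
from datetime import time

def in_load_window(now, start=time(1, 0), end=time(5, 0)):
    """Return True when `now` falls inside the load window [start, end)."""
    if start <= end:
        return start <= now < end
    # Window crosses midnight, for example 22:00 to 04:00.
    return now >= start or now < end

can_load_at_night = in_load_window(time(2, 30))
can_load_at_noon = in_load_window(time(12, 0))
```

As the window shrinks toward continuous loading, a check like this gives way to the trickle-style updates described earlier.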
Meta Data
Data about data
Needed by both information technology
personnel and users
IT personnel need to know data sources and
targets; database, table and column names;
refresh schedules; data usage measures; etc.
Users need to know entity/attribute
definitions; reports/query tools available;
report distribution information; help desk
contact information, etc.
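To make the two audiences concrete, the sketch below stores a few of the listed items in one structure: technical fields for IT personnel (source, target, refresh schedule) and business definitions for users. All names and values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMetadata:
    name: str        # column name, needed by IT personnel
    definition: str  # attribute definition, needed by users

@dataclass
class TableMetadata:
    source: str            # where the data comes from
    target_database: str   # where it is loaded
    target_table: str
    refresh_schedule: str  # how often it is refreshed
    columns: list = field(default_factory=list)

    def column_names(self):
        """List the column names recorded for this table."""
        return [column.name for column in self.columns]

sales_meta = TableMetadata(
    source="order_entry_system",
    target_database="warehouse",
    target_table="sales",
    refresh_schedule="nightly",
    columns=[
        ColumnMetadata("order_id", "Unique identifier of a customer order"),
        ColumnMetadata("amount", "Order value in local currency"),
    ],
)
```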
Recent Development:
Meta Data Integration
A growing realization that meta data is critical
to data warehousing success
Progress is being made on getting vendors to
agree on standards and to incorporate the
sharing of meta data among their tools
Vendors like Microsoft, Computer Associates,
and Oracle have entered the meta data
marketplace with significant product offerings
