SlideShare a Scribd company logo
1 of 24
ETL
The process of updating the data
warehouse.
Recent Developments in
Data Warehousing: A Tutorial
Hugh J. Watson
Terry College of Business
University of Georgia
hwatson@terry.uga.edu
http://www.terry.uga.edu/~hwatson/dw_tutorial.ppt
Two Data Warehousing
Strategies
Enterprise-wide warehouse, top down,
the Inmon methodology
Data mart, bottom up, the Kimball
methodology
When properly executed, both result in
an enterprise-wide data warehouse
The Data Mart Strategy
The most common approach
Begins with a single mart and architected marts are
added over time for more subject areas
Relatively inexpensive and easy to implement
Can be used as a proof of concept for data
warehousing
Can perpetuate the “silos of information” problem
Can postpone difficult decisions and activities
Requires an overall integration plan
The Enterprise-wide Strategy
A comprehensive warehouse is built initially
An initial dependent data mart is built using a
subset of the data in the warehouse
Additional data marts are built using subsets
of the data in the warehouse
Like all complex projects, it is expensive, time
consuming, and prone to failure
When successful, it results in an integrated,
scalable warehouse
Data Sources and Types
Primarily from legacy, operational systems
Almost exclusively numerical data at the
present time
External data may be included, often
purchased from third-party sources
Technology exists for storing unstructured
data and expect this to become more
important over time
Extraction, Transformation, and
Loading (ETL) Processes
The “plumbing” work of data
warehousing
Data are moved from source to target
data bases
A very costly, time consuming part of
data warehousing
Recent Development:
More Frequent Updates
Updates can be done in bulk and trickle
modes
Business requirements, such as trading
partner access to a Web site, requires
current data
For international firms, there is no good
time to load the warehouse
Recent Development:
Clickstream Data
Results from clicks at web sites
A dialog manager handles user interactions.
An ODS (operational data store in the data
staging area) helps to custom tailor the dialog
The clickstream data is filtered and parsed
and sent to a data warehouse where it is
analyzed
Software is available to analyze the
clickstream data
Data Extraction
Often performed by COBOL routines
(not recommended because of high program
maintenance and no automatically generated
meta data)
Sometimes source data is copied to the target
database using the replication capabilities of
standard RDMS (not recommended because
of “dirty data” in the source systems)
Increasing performed by specialized ETL
software
Sample ETL Tools
Teradata Warehouse Builder from Teradata
DataStage from Ascential Software
SAS System from SAS Institute
Power Mart/Power Center from Informatica
Sagent Solution from Sagent Software
Hummingbird Genio Suite from Hummingbird
Communications
Reasons for “Dirty” Data
• Dummy Values
• Absence of Data
• Multipurpose Fields
• Cryptic Data
• Contradicting Data
• Inappropriate Use of Address Lines
• Violation of Business Rules
• Reused Primary Keys,
• Non-Unique Identifiers
• Data Integration Problems
Data Cleansing
Source systems contain “dirty data” that must
be cleansed
ETL software contains rudimentary data
cleansing capabilities
Specialized data cleansing software is often
used. Important for performing name and
address correction and householding
functions
Leading data cleansing vendors include Vality
(Integrity), Harte-Hanks (Trillium), and
Firstlogic (i.d.Centric)
Steps in Data Cleansing
• Parsing
• Correcting
• Standardizing
• Matching
• Consolidating
Parsing
Parsing locates and identifies individual
data elements in the source files and
then isolates these data elements in the
target files.
Examples include parsing the first,
middle, and last name; street number
and street name; and city and state.
Correcting
Corrects parsed individual data
components using sophisticated data
algorithms and secondary data sources.
Example include replacing a vanity
address and adding a zip code.
Standardizing
Standardizing applies conversion
routines to transform data into its
preferred (and consistent) format using
both standard and custom business
rules.
Examples include adding a pre name,
replacing a nickname, and using a
preferred street name.
Matching
Searching and matching records within
and across the parsed, corrected and
standardized data based on predefined
business rules to eliminate duplications.
Examples include identifying similar
names and addresses.
Consolidating
• Analyzing and identifying relationships
between matched records and
consolidating/merging them into ONE
representation.
Data Staging
Often used as an interim step between data
extraction and later steps
Accumulates data from asynchronous sources using
native interfaces, flat files, FTP sessions, or other
processes
At a predefined cutoff time, data in the staging file is
transformed and loaded to the warehouse
There is usually no end user access to the staging file
An operational data store may be used for data
staging
Data Transformation
Transforms the data in accordance with
the business rules and standards that
have been established
Example include: format changes,
deduplication, splitting up fields,
replacement of codes, derived values,
and aggregates
Data Loading
Data are physically moved to the data
warehouse
The loading takes place within a “load
window”
The trend is to near real time updates of
the data warehouse as the warehouse
is increasingly used for operational
applications
Meta Data
Data about data
Needed by both information technology
personnel and users
IT personnel need to know data sources and
targets; database, table and column names;
refresh schedules; data usage measures; etc.
Users need to know entity/attribute
definitions; reports/query tools available;
report distribution information; help desk
contact information, etc.
Recent Development:
Meta Data Integration
A growing realization that meta data is critical
to data warehousing success
Progress is being made on getting vendors to
agree on standards and to incorporate the
sharing of meta data among their tools
Vendors like Microsoft, Computer Associates,
and Oracle have entered the meta data
marketplace with significant product offerings

More Related Content

What's hot

Data Warehouse Architectures
Data Warehouse ArchitecturesData Warehouse Architectures
Data Warehouse Architectures
Theju Paul
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
vivekjv
 
Components of a Data-Warehouse
Components of a Data-WarehouseComponents of a Data-Warehouse
Components of a Data-Warehouse
Abdul Aslam
 

What's hot (19)

Datawarehousing Terminology
Datawarehousing TerminologyDatawarehousing Terminology
Datawarehousing Terminology
 
3 tier data warehouse
3 tier data warehouse3 tier data warehouse
3 tier data warehouse
 
Transaction
TransactionTransaction
Transaction
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
Tuning data warehouse
Tuning data warehouseTuning data warehouse
Tuning data warehouse
 
Data warehouse logical design
Data warehouse logical designData warehouse logical design
Data warehouse logical design
 
Data warehouse architecture
Data warehouse architecture Data warehouse architecture
Data warehouse architecture
 
Etl Overview (Extract, Transform, And Load)
Etl Overview (Extract, Transform, And Load)Etl Overview (Extract, Transform, And Load)
Etl Overview (Extract, Transform, And Load)
 
Data Warehouse Architectures
Data Warehouse ArchitecturesData Warehouse Architectures
Data Warehouse Architectures
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technology
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
 
Data Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubeyData Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubey
 
Data Warehouse 101
Data Warehouse 101Data Warehouse 101
Data Warehouse 101
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
Components of a Data-Warehouse
Components of a Data-WarehouseComponents of a Data-Warehouse
Components of a Data-Warehouse
 
Data extraction, transformation, and loading
Data extraction, transformation, and loadingData extraction, transformation, and loading
Data extraction, transformation, and loading
 
data warehousing
data warehousingdata warehousing
data warehousing
 
Data mining
Data miningData mining
Data mining
 
Data Warehouse
Data Warehouse Data Warehouse
Data Warehouse
 

Similar to D01 etl

BI Chapter 03.pdf business business business business business business
BI Chapter 03.pdf business business business business business businessBI Chapter 03.pdf business business business business business business
BI Chapter 03.pdf business business business business business business
JawaherAlbaddawi
 
Chapter 2-data-warehousingppt2517 vero
Chapter 2-data-warehousingppt2517 veroChapter 2-data-warehousingppt2517 vero
Chapter 2-data-warehousingppt2517 vero
angshuman2387
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
sumit621
 
Data warehouse-testing
Data warehouse-testingData warehouse-testing
Data warehouse-testing
raianup
 

Similar to D01 etl (20)

Etl data processing system which is very useful for the engineering students
Etl data processing system which is very useful for the engineering studentsEtl data processing system which is very useful for the engineering students
Etl data processing system which is very useful for the engineering students
 
extract, transform, load_Data Analyt.ppt
extract, transform, load_Data Analyt.pptextract, transform, load_Data Analyt.ppt
extract, transform, load_Data Analyt.ppt
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guide
 
ETL-Datawarehousing.ppt.pptx
ETL-Datawarehousing.ppt.pptxETL-Datawarehousing.ppt.pptx
ETL-Datawarehousing.ppt.pptx
 
Datastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing Datastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing
 
ETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testingETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testing
 
ETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testingETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testing
 
ETL Testing - Introduction to ETL Testing
ETL Testing - Introduction to ETL TestingETL Testing - Introduction to ETL Testing
ETL Testing - Introduction to ETL Testing
 
BI Chapter 03.pdf business business business business business business
BI Chapter 03.pdf business business business business business businessBI Chapter 03.pdf business business business business business business
BI Chapter 03.pdf business business business business business business
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)
 
Chapter 2-data-warehousingppt2517 vero
Chapter 2-data-warehousingppt2517 veroChapter 2-data-warehousingppt2517 vero
Chapter 2-data-warehousingppt2517 vero
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Эволюция Big Data и Information Management. Reference Architecture.
Эволюция Big Data и Information Management. Reference Architecture.Эволюция Big Data и Information Management. Reference Architecture.
Эволюция Big Data и Information Management. Reference Architecture.
 
Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lake
 
Data warehouse-testing
Data warehouse-testingData warehouse-testing
Data warehouse-testing
 
(Lecture 2)Data Warehouse Architecture.pdf
(Lecture 2)Data Warehouse Architecture.pdf(Lecture 2)Data Warehouse Architecture.pdf
(Lecture 2)Data Warehouse Architecture.pdf
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse concepts
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Data warehouse presentaion
Data warehouse presentaionData warehouse presentaion
Data warehouse presentaion
 

Recently uploaded

Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
Competitive Advantage slide deck___.pptx
Competitive Advantage slide deck___.pptxCompetitive Advantage slide deck___.pptx
Competitive Advantage slide deck___.pptx
ScottMeyers35
 
Top profile Call Girls In Morena [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Morena [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Morena [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Morena [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 

Recently uploaded (20)

Delivery in 20 Mins Call Girls Malappuram { 9332606886 } VVIP NISHA Call Girl...
Delivery in 20 Mins Call Girls Malappuram { 9332606886 } VVIP NISHA Call Girl...Delivery in 20 Mins Call Girls Malappuram { 9332606886 } VVIP NISHA Call Girl...
Delivery in 20 Mins Call Girls Malappuram { 9332606886 } VVIP NISHA Call Girl...
 
Tuvalu Coastal Adaptation Project (TCAP)
Tuvalu Coastal Adaptation Project (TCAP)Tuvalu Coastal Adaptation Project (TCAP)
Tuvalu Coastal Adaptation Project (TCAP)
 
Election 2024 Presiding Duty Keypoints_01.pdf
Election 2024 Presiding Duty Keypoints_01.pdfElection 2024 Presiding Duty Keypoints_01.pdf
Election 2024 Presiding Duty Keypoints_01.pdf
 
2024 UN Civil Society Conference in Support of the Summit of the Future.
2024 UN Civil Society Conference in Support of the Summit of the Future.2024 UN Civil Society Conference in Support of the Summit of the Future.
2024 UN Civil Society Conference in Support of the Summit of the Future.
 
Vasai Call Girls In 07506202331, Nalasopara Call Girls In Mumbai
Vasai Call Girls In 07506202331, Nalasopara Call Girls In MumbaiVasai Call Girls In 07506202331, Nalasopara Call Girls In Mumbai
Vasai Call Girls In 07506202331, Nalasopara Call Girls In Mumbai
 
Call Girl Service in Korba 9332606886 High Profile Call Girls You Can Get ...
Call Girl Service in Korba   9332606886  High Profile Call Girls You Can Get ...Call Girl Service in Korba   9332606886  High Profile Call Girls You Can Get ...
Call Girl Service in Korba 9332606886 High Profile Call Girls You Can Get ...
 
An Atoll Futures Research Institute? Presentation for CANCC
An Atoll Futures Research Institute? Presentation for CANCCAn Atoll Futures Research Institute? Presentation for CANCC
An Atoll Futures Research Institute? Presentation for CANCC
 
Panchayath circular KLC -Panchayath raj act s 169, 218
Panchayath circular KLC -Panchayath raj act s 169, 218Panchayath circular KLC -Panchayath raj act s 169, 218
Panchayath circular KLC -Panchayath raj act s 169, 218
 
Finance strategies for adaptation. Presentation for CANCC
Finance strategies for adaptation. Presentation for CANCCFinance strategies for adaptation. Presentation for CANCC
Finance strategies for adaptation. Presentation for CANCC
 
Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...
 
Vivek @ Cheap Call Girls In Kamla Nagar | Book 8448380779 Extreme Call Girls ...
Vivek @ Cheap Call Girls In Kamla Nagar | Book 8448380779 Extreme Call Girls ...Vivek @ Cheap Call Girls In Kamla Nagar | Book 8448380779 Extreme Call Girls ...
Vivek @ Cheap Call Girls In Kamla Nagar | Book 8448380779 Extreme Call Girls ...
 
2024: The FAR, Federal Acquisition Regulations, Part 32
2024: The FAR, Federal Acquisition Regulations, Part 322024: The FAR, Federal Acquisition Regulations, Part 32
2024: The FAR, Federal Acquisition Regulations, Part 32
 
Competitive Advantage slide deck___.pptx
Competitive Advantage slide deck___.pptxCompetitive Advantage slide deck___.pptx
Competitive Advantage slide deck___.pptx
 
Top profile Call Girls In Morena [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Morena [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Morena [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Morena [ 7014168258 ] Call Me For Genuine Models We...
 
The NAP process & South-South peer learning
The NAP process & South-South peer learningThe NAP process & South-South peer learning
The NAP process & South-South peer learning
 
Honasa Consumer Limited Impact Report 2024.pdf
Honasa Consumer Limited Impact Report 2024.pdfHonasa Consumer Limited Impact Report 2024.pdf
Honasa Consumer Limited Impact Report 2024.pdf
 
Call Girls Basheerbagh ( 8250092165 ) Cheap rates call girls | Get low budget
Call Girls Basheerbagh ( 8250092165 ) Cheap rates call girls | Get low budgetCall Girls Basheerbagh ( 8250092165 ) Cheap rates call girls | Get low budget
Call Girls Basheerbagh ( 8250092165 ) Cheap rates call girls | Get low budget
 
Pakistani Call girls in Sharjah 0505086370 Sharjah Call girls
Pakistani Call girls in Sharjah 0505086370 Sharjah Call girlsPakistani Call girls in Sharjah 0505086370 Sharjah Call girls
Pakistani Call girls in Sharjah 0505086370 Sharjah Call girls
 
Scaling up coastal adaptation in Maldives through the NAP process
Scaling up coastal adaptation in Maldives through the NAP processScaling up coastal adaptation in Maldives through the NAP process
Scaling up coastal adaptation in Maldives through the NAP process
 
tOld settlement register shouldnotaffect BTR
tOld settlement register shouldnotaffect BTRtOld settlement register shouldnotaffect BTR
tOld settlement register shouldnotaffect BTR
 

D01 etl

  • 1. ETL The process of updating the data warehouse.
  • 2. Recent Developments in Data Warehousing: A Tutorial Hugh J. Watson Terry College of Business University of Georgia hwatson@terry.uga.edu http://www.terry.uga.edu/~hwatson/dw_tutorial.ppt
  • 3. Two Data Warehousing Strategies Enterprise-wide warehouse, top down, the Inmon methodology Data mart, bottom up, the Kimball methodology When properly executed, both result in an enterprise-wide data warehouse
  • 4. The Data Mart Strategy The most common approach Begins with a single mart and architected marts are added over time for more subject areas Relatively inexpensive and easy to implement Can be used as a proof of concept for data warehousing Can perpetuate the “silos of information” problem Can postpone difficult decisions and activities Requires an overall integration plan
  • 5. The Enterprise-wide Strategy A comprehensive warehouse is built initially An initial dependent data mart is built using a subset of the data in the warehouse Additional data marts are built using subsets of the data in the warehouse Like all complex projects, it is expensive, time consuming, and prone to failure When successful, it results in an integrated, scalable warehouse
  • 6. Data Sources and Types Primarily from legacy, operational systems Almost exclusively numerical data at the present time External data may be included, often purchased from third-party sources Technology exists for storing unstructured data and expect this to become more important over time
  • 7. Extraction, Transformation, and Loading (ETL) Processes The “plumbing” work of data warehousing Data are moved from source to target data bases A very costly, time consuming part of data warehousing
  • 8. Recent Development: More Frequent Updates Updates can be done in bulk and trickle modes Business requirements, such as trading partner access to a Web site, requires current data For international firms, there is no good time to load the warehouse
  • 9. Recent Development: Clickstream Data Results from clicks at web sites A dialog manager handles user interactions. An ODS (operational data store in the data staging area) helps to custom tailor the dialog The clickstream data is filtered and parsed and sent to a data warehouse where it is analyzed Software is available to analyze the clickstream data
  • 10. Data Extraction Often performed by COBOL routines (not recommended because of high program maintenance and no automatically generated meta data) Sometimes source data is copied to the target database using the replication capabilities of standard RDMS (not recommended because of “dirty data” in the source systems) Increasing performed by specialized ETL software
  • 11. Sample ETL Tools Teradata Warehouse Builder from Teradata DataStage from Ascential Software SAS System from SAS Institute Power Mart/Power Center from Informatica Sagent Solution from Sagent Software Hummingbird Genio Suite from Hummingbird Communications
  • 12. Reasons for “Dirty” Data • Dummy Values • Absence of Data • Multipurpose Fields • Cryptic Data • Contradicting Data • Inappropriate Use of Address Lines • Violation of Business Rules • Reused Primary Keys, • Non-Unique Identifiers • Data Integration Problems
  • 13. Data Cleansing Source systems contain “dirty data” that must be cleansed ETL software contains rudimentary data cleansing capabilities Specialized data cleansing software is often used. Important for performing name and address correction and householding functions Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)
  • 14. Steps in Data Cleansing • Parsing • Correcting • Standardizing • Matching • Consolidating
  • 15. Parsing Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files. Examples include parsing the first, middle, and last name; street number and street name; and city and state.
  • 16. Correcting Corrects parsed individual data components using sophisticated data algorithms and secondary data sources. Example include replacing a vanity address and adding a zip code.
  • 17. Standardizing Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules. Examples include adding a pre name, replacing a nickname, and using a preferred street name.
  • 18. Matching Searching and matching records within and across the parsed, corrected and standardized data based on predefined business rules to eliminate duplications. Examples include identifying similar names and addresses.
  • 19. Consolidating • Analyzing and identifying relationships between matched records and consolidating/merging them into ONE representation.
  • 20. Data Staging Often used as an interim step between data extraction and later steps Accumulates data from asynchronous sources using native interfaces, flat files, FTP sessions, or other processes At a predefined cutoff time, data in the staging file is transformed and loaded to the warehouse There is usually no end user access to the staging file An operational data store may be used for data staging
  • 21. Data Transformation Transforms the data in accordance with the business rules and standards that have been established Example include: format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates
  • 22. Data Loading Data are physically moved to the data warehouse The loading takes place within a “load window” The trend is to near real time updates of the data warehouse as the warehouse is increasingly used for operational applications
  • 23. Meta Data Data about data Needed by both information technology personnel and users IT personnel need to know data sources and targets; database, table and column names; refresh schedules; data usage measures; etc. Users need to know entity/attribute definitions; reports/query tools available; report distribution information; help desk contact information, etc.
  • 24. Recent Development: Meta Data Integration A growing realization that meta data is critical to data warehousing success Progress is being made on getting vendors to agree on standards and to incorporate the sharing of meta data among their tools Vendors like Microsoft, Computer Associates, and Oracle have entered the meta data marketplace with significant product offerings

Editor's Notes

  1. This presentation is based on a tutorial presented by Hugh Watson at the 2001 AMCIS. It provides material for an introduction to data warehousing. Customize it for your own purposes.
  2. There is still debate over which approach is best.
  3. The key is to have an overall plan, processes, and technologies for integrating the different marts. The marts may be logically rather than physically separate.
  4. Even with the enterprise-wide strategy, the warehouse is developed in phases and each phase should be designed to deliver business value.
  5. It is not unusual to extract data from over 100 source systems. While the technology is available to store structured and unstructured data together, the reality is that warehouse data is almost exclusively structured -- numerical with simple textual identifiers.
  6. ETL tends to be “pick and shovel” work. Most organization’s data is even worse than imagined.
  7. As data warehousing becomes more critical to decision making and operational processes, the pressure is to have more current data, which leads to trickle updates.
  8. The ODS is used to support the web site dialog -- an operational process -- while the data in the warehouse is analyzed -- to better understand customers and their use of the web site.
  9. It’s changing, but COBOL extracts are still the most common ETL process. There are multiple reasons for this -- the cost of specialized ETL software, in-house programmers who have a good knowledge of the COBOL based source systems that will be used, and the peculiarities of the source systems that make the use of ETL software difficult.
  10. You might go to the vendors’ web sites to find a good demo to show your students.
  11. Here’s a couple of examples: Dummy data -- a clerk enters 999-99-9999 as a SSN rather than asking the customer for theirs. Reused primary keys -- a branch bank is closed. Several years later, a new branch is opened, and the old identifier is used again.
  12. Data cleansing is critical to customer relationship management initiatives.
  13. A good example to use is cleansing customer data. Most students can identify with receiving multiple copies of the same catalog because the company is not doing a good data cleansing job.
  14. The record is broken down into atomic data elements.
  15. External data, such as census data, is often used in this process.
  16. Companies decide on the standards that they want to use.
  17. Commercial data cleansing software often uses AI techniques to match records.
  18. All of the data are now combined in a standard format.
  19. Data staging is used in cleansing, transforming, and integrating the data.
  20. Aggregates, such as sales totals, are often precalculated and stored in the warehouse to speed queries that require summary totals.
  21. Most loads involve only change data rather than a bulk reloading of all of the data in the warehouse.
  22. The importance of meta data is now realized, even though creating it is not glamorous work.
  23. Historically, each vendor had their own meta data solution -- which was incompatible with other vendors’ solutions. This is changing.