SlideShare a Scribd company logo
1 of 2
Download to read offline
Data Mining Lecture 2
What is Data Warehouse?
Defined in many different ways, but not
rigorously
 A decision support database that is
maintained separately from the organisation’s
operational database
 Support information processing by providing
a solid platform of consolidated, historical
data for analysis
Definition by Inmon
 “A data warehouse is a subject-oriented,
integrated, time-variant, and non-volatile
collection of data in support of management’s
decision-making process”
Data warehousing
 The process of constructing and using data
warehouses
Data Warehouse—Subject-Oriented
 Organised around major subjects, such as
customer, product, sales
 Focusing on the modelling and analysis of
data for decision makers, not on daily
operations or transaction processing
 Provide a simple and concise view around
particular subject issues by excluding data
that are not useful in the decision support
process
Data Warehouse—Integrated
Constructed by integrating multiple,
heterogeneous data sources
 relational databases, flat files, on-line
transaction records
Data cleaning and data integration
techniques are applied
 Ensure consistency in naming conventions,
encoding structures, attribute measures, etc.
among different data sources
o E.g., Hotel price: currency, tax,
breakfast covered, etc.
 When data is moved to the warehouse, it is
converted
Data Warehouse—Time Variant
The time horizon for the data warehouse is
significantly
longer than that of operational systems
 Operational database: current value data
 Data warehouse data: provide information
from a historical perspective (e.g., past 5-10
years)
Every key structure in the data warehouse
 Contains an element of time, explicitly or
implicitly
 But the key of operational data may or may
not contain “time element”
Data Warehouse—Non-Volatile
- Physically separate stores of data
transformed from the operational
environment
- Operational update of data does not
occur in the data warehouse
environment
 Does not require transaction processing,
recovery, and concurrency control
mechanisms
 Requires only two operations in data
accessing: initial loading of data and access of
data
Data Warehouse vs. Heterogeneous
DB
Traditional heterogeneous DB integration
- Build wrappers/mediators on top of
heterogeneous databases
- Query driven approach
o When a query is posed to a client site,
a meta-dictionary is used to translate
the query into queries appropriate for
individual heterogeneous sites
involved, and the results are
integrated into a global answer set
o Complex information filtering,
compete for resources
Data warehouse
- update-driven, high performance
- Information from heterogeneous sources is
integrated in advance and stored in
warehouses for direct access and analysis
Data Warehouse vs. Operational
DB
OLTP (On-Line Transaction Processing)
 Major task of traditional relational DB
 Day-to-day operations: purchasing, inventory,
banking, manufacturing, payroll, registration,
accounting, etc.
OLAP (On-Line Analytical Processing)
 Major task of data warehouse system
 Data analysis and decision making
Distinct features (OLTP vs. OLAP)
 User and system orientation: customer vs.
market
 Data contents: current, detailed vs. historical,
consolidated
 Database design: ER + application vs. star +
subject
 View: current, local vs. evolutionary,
integrated
 Access patterns: update vs. read-only but
complex queries
OLTP vs. OLAP
Why Separate Data Warehouse?
High performance for both systems
 DBMS— tuned for OLTP
-access methods, indexing,
concurrency control, recovery
 Warehouse—tuned for OLAP
-complex OLAP queries,
multidimensional view, consolidation
Different functions and different data
 Missing data: Decision support requires
historical data which operational DBs do not
typically maintain
 Data consolidation: DS requires consolidation
(aggregation, summarisation) of data from
heterogeneous sources
 Data quality: different sources typically use
inconsistent data representations, codes and
formats which have to be reconciled

More Related Content

What's hot

Metadata in data warehouse
Metadata in data warehouseMetadata in data warehouse
Metadata in data warehouseSiddique Ibrahim
 
White Paper - Data Warehouse Documentation Roadmap
White Paper -  Data Warehouse Documentation RoadmapWhite Paper -  Data Warehouse Documentation Roadmap
White Paper - Data Warehouse Documentation RoadmapDavid Walker
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing conceptspcherukumalla
 
multi dimensional data model
multi dimensional data modelmulti dimensional data model
multi dimensional data modelmoni sindhu
 
An Intro to NoSQL Databases
An Intro to NoSQL DatabasesAn Intro to NoSQL Databases
An Intro to NoSQL DatabasesRajith Pemabandu
 
Classification of data mart
Classification of data martClassification of data mart
Classification of data martkhush_boo31
 
3 tier data warehouse
3 tier data warehouse3 tier data warehouse
3 tier data warehouseJ M
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEZalpa Rathod
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data PipelineJesus Rodriguez
 
1.4 data warehouse
1.4 data warehouse1.4 data warehouse
1.4 data warehouseKrish_ver2
 
3 pillars of big data : structured data, semi structured data and unstructure...
3 pillars of big data : structured data, semi structured data and unstructure...3 pillars of big data : structured data, semi structured data and unstructure...
3 pillars of big data : structured data, semi structured data and unstructure...PROWEBSCRAPER
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecturepcherukumalla
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data WarehouseSOMASUNDARAM T
 

What's hot (20)

Data warehousing
Data warehousingData warehousing
Data warehousing
 
Metadata in data warehouse
Metadata in data warehouseMetadata in data warehouse
Metadata in data warehouse
 
Dimensional Modelling
Dimensional ModellingDimensional Modelling
Dimensional Modelling
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
White Paper - Data Warehouse Documentation Roadmap
White Paper -  Data Warehouse Documentation RoadmapWhite Paper -  Data Warehouse Documentation Roadmap
White Paper - Data Warehouse Documentation Roadmap
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
multi dimensional data model
multi dimensional data modelmulti dimensional data model
multi dimensional data model
 
An Intro to NoSQL Databases
An Intro to NoSQL DatabasesAn Intro to NoSQL Databases
An Intro to NoSQL Databases
 
Classification of data mart
Classification of data martClassification of data mart
Classification of data mart
 
3 tier data warehouse
3 tier data warehouse3 tier data warehouse
3 tier data warehouse
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSE
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data Pipeline
 
1.4 data warehouse
1.4 data warehouse1.4 data warehouse
1.4 data warehouse
 
DATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MININGDATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MINING
 
Data cube
Data cubeData cube
Data cube
 
3 pillars of big data : structured data, semi structured data and unstructure...
3 pillars of big data : structured data, semi structured data and unstructure...3 pillars of big data : structured data, semi structured data and unstructure...
3 pillars of big data : structured data, semi structured data and unstructure...
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
Data Warehouse 101
Data Warehouse 101Data Warehouse 101
Data Warehouse 101
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
Metadata ppt
Metadata pptMetadata ppt
Metadata ppt
 

Similar to Data Mining Lecture 2 - What is Data Warehouse

11666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect311666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect3ambujm
 
11667 Bitt I 2008 Lect4
11667 Bitt I 2008 Lect411667 Bitt I 2008 Lect4
11667 Bitt I 2008 Lect4ambujm
 
data warehousing
data warehousingdata warehousing
data warehousing143sohil
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingsumit621
 
Introduction to data warehouse dmbi
Introduction to data warehouse dmbiIntroduction to data warehouse dmbi
Introduction to data warehouse dmbiShaishavShah8
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.pptPalaniKumarR2
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.pptSumathiG8
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.pptSamPrem3
 
Dataware house multidimensionalmodelling
Dataware house multidimensionalmodellingDataware house multidimensionalmodelling
Dataware house multidimensionalmodellingmeghu123
 
presentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxpresentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxvipush1
 

Similar to Data Mining Lecture 2 - What is Data Warehouse (20)

11666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect311666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect3
 
11667 Bitt I 2008 Lect4
11667 Bitt I 2008 Lect411667 Bitt I 2008 Lect4
11667 Bitt I 2008 Lect4
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Chpt2.ppt
Chpt2.pptChpt2.ppt
Chpt2.ppt
 
data warehousing
data warehousingdata warehousing
data warehousing
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Introduction to data warehouse dmbi
Introduction to data warehouse dmbiIntroduction to data warehouse dmbi
Introduction to data warehouse dmbi
 
2. olap warehouse
2. olap warehouse2. olap warehouse
2. olap warehouse
 
Chapter 2
Chapter 2Chapter 2
Chapter 2
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
OLAP technology
OLAP technologyOLAP technology
OLAP technology
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt
 
Data Warehouse
Data Warehouse Data Warehouse
Data Warehouse
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt
 
Dataware house multidimensionalmodelling
Dataware house multidimensionalmodellingDataware house multidimensionalmodelling
Dataware house multidimensionalmodelling
 
Unit 1
Unit 1Unit 1
Unit 1
 
presentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxpresentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptx
 
DATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptxDATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptx
 
Data mining notes
Data mining notesData mining notes
Data mining notes
 

Recently uploaded

Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 

Recently uploaded (20)

Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 

Data Mining Lecture 2 - What is Data Warehouse

  • 1. Data Mining Lecture 2 What is Data Warehouse? Defined in many different ways, but not rigorously  A decision support database that is maintained separately from the organisation’s operational database  Support information processing by providing a solid platform of consolidated, historical data for analysis Definition by Inmon  “A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process” Data warehousing  The process of constructing and using data warehouses Data Warehouse—Subject-Oriented  Organised around major subjects, such as customer, product, sales  Focusing on the modelling and analysis of data for decision makers, not on daily operations or transaction processing  Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process Data Warehouse—Integrated Constructed by integrating multiple, heterogeneous data sources  relational databases, flat files, on-line transaction records Data cleaning and data integration techniques are applied  Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources o E.g., Hotel price: currency, tax, breakfast covered, etc.  When data is moved to the warehouse, it is converted Data Warehouse—Time Variant The time horizon for the data warehouse is significantly longer than that of operational systems  Operational database: current value data  Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years) Every key structure in the data warehouse  Contains an element of time, explicitly or implicitly  But the key of operational data may or may not contain “time element” Data Warehouse—Non-Volatile - Physically separate stores of data transformed from the operational environment - Operational update of data does not occur in the data warehouse environment  Does not require transaction processing, recovery, and concurrency control mechanisms  Requires only two operations in data accessing: initial loading of data and access of data Data Warehouse vs. Heterogeneous DB Traditional heterogeneous DB integration - Build wrappers/mediators on top of heterogeneous databases - Query driven approach o When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for individual heterogeneous sites involved, and the results are integrated into a global answer set o Complex information filtering, compete for resources Data warehouse - update-driven, high performance - Information from heterogeneous sources is integrated in advance and stored in warehouses for direct access and analysis
  • 2. Data Warehouse vs. Operational DB OLTP (On-Line Transaction Processing)  Major task of traditional relational DB  Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc. OLAP (On-Line Analytical Processing)  Major task of data warehouse system  Data analysis and decision making Distinct features (OLTP vs. OLAP)  User and system orientation: customer vs. market  Data contents: current, detailed vs. historical, consolidated  Database design: ER + application vs. star + subject  View: current, local vs. evolutionary, integrated  Access patterns: update vs. read-only but complex queries OLTP vs. OLAP Why Separate Data Warehouse? High performance for both systems  DBMS— tuned for OLTP -access methods, indexing, concurrency control, recovery  Warehouse—tuned for OLAP -complex OLAP queries, multidimensional view, consolidation Different functions and different data  Missing data: Decision support requires historical data which operational DBs do not typically maintain  Data consolidation: DS requires consolidation (aggregation, summarisation) of data from heterogeneous sources  Data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled