Data Warehousing

Overview
• What is a data warehouse?
• Why a data warehouse?
• Data reconciliation – ETL process
• Data warehouse architectures
• Star schema – dimensional modeling
• Data analysis

What is a Data Warehouse?
• Defined in many different ways, but not rigorously.
– A decision support database that is maintained separately from
the organization’s operational database
– Supports information processing by providing a solid platform of
consolidated, historical data for analysis.

• “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of
management’s decision-making process.” – W. H. Inmon
• Data warehousing:
– The process of constructing and using data warehouses

Data Warehouse – Subject-Oriented
• Organized around major subjects, such as
customer, product, sales
• Focuses on the modeling and analysis of data
for decision makers, not on daily operations or
transaction processing
• Provides a simple and concise view of
particular subject issues by excluding data that
are not useful in the decision support process

Data Warehouse – Integrated
• Constructed by integrating multiple,
heterogeneous data sources
– relational databases, flat files, on-line transaction
records

• Data cleaning and data integration techniques
are applied.
– Ensure consistency in naming conventions,
encoding structures, attribute measures, etc. among
different data sources
• E.g., hotel price: currency, tax, whether breakfast is included, etc.

– When data is moved to the warehouse, it is
converted.

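The hotel-price example can be made concrete. Below is a minimal sketch, assuming two hypothetical source systems that quote the same nightly rate differently (one in EUR with tax included, one in USD without tax and with a meal-plan code). The exchange rate, tax rate, field names and the `HotelPrice` target record are all illustrative assumptions, not part of the slides; the point is only that each source gets its own converter into one agreed warehouse convention.

```python
from dataclasses import dataclass

# Illustrative assumptions, not from the slides: a fixed exchange rate and tax rate.
EUR_TO_USD = 1.10
TAX_RATE = 0.10


@dataclass
class HotelPrice:
    """Warehouse convention (hypothetical): USD, tax excluded, explicit breakfast flag."""
    hotel: str
    usd_excl_tax: float
    breakfast_included: bool


def from_source_a(rec: dict) -> HotelPrice:
    # Hypothetical source A: price in EUR with tax included, breakfast as "Y"/"N".
    net_eur = rec["price_eur_incl_tax"] / (1 + TAX_RATE)
    return HotelPrice(
        hotel=rec["name"].strip(),
        usd_excl_tax=round(net_eur * EUR_TO_USD, 2),
        breakfast_included=rec["breakfast"] == "Y",
    )


def from_source_b(rec: dict) -> HotelPrice:
    # Hypothetical source B: price in USD without tax, meal plan code
    # "BB" (bed and breakfast) or "RO" (room only).
    return HotelPrice(
        hotel=rec["hotel_name"].strip(),
        usd_excl_tax=round(rec["usd"], 2),
        breakfast_included=rec["meal_plan"] == "BB",
    )
```

After conversion, prices from both sources can be compared directly, which is what "integrated" means in practice.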
Data Warehouse – Time-Variant
• The time horizon for the data warehouse is
significantly longer than that of operational
systems
– Operational database: current value data
– Data warehouse data: provides information from a
historical perspective (e.g., past 5-10 years)

• Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly
– But the key of operational data may or may not
contain a “time element”

Data Warehouse – Nonvolatile
• A physically separate store of data transformed
from the operational environment
• Operational update of data does not occur in the
data warehouse environment
– Does not require transaction processing, recovery,
and concurrency control mechanisms
– Requires only two operations in data accessing:
• initial loading of data and access of data

Trends in Organisations that encourage
the need for data warehousing
• No single system of record
• Multiple systems are not synchronized
• Organisations want to analyse the activities in
a balanced way
• Customer relationship management
• Supplier relationship management

Need for Data Warehousing
• Integrated, company-wide view of high-quality
information (from different databases)
• Separation of operational and informational systems
and data (for improved performance)

Operational & Informational Systems
The need to separate operational and informational
systems is based on three primary factors:
• A data warehouse centralizes data that are scattered
throughout disparate operational systems and makes them
available for decision support applications
• A properly designed data warehouse adds value to data
by improving their quality
• A separate data warehouse eliminates much of the contention
for resources that results when informational applications
are confounded with operational processing

Data Reconciliation
• Typical operational data is:
– Transient – not historical
– Not normalised (perhaps due to denormalisation for
performance)
– Restricted in scope – not comprehensive
– Sometimes poor quality – inconsistencies and errors

• After ETL (Extract, Transform, Load), data
should be:
– Detailed – not summarized yet
– Historical – periodic
– Normalised – 3rd normal form or higher
– Comprehensive – enterprise-wide perspective
– Timely – data should be current enough to assist decision-making
– Quality controlled – accurate with full integrity

The ETL Process / Data Reconciliation: Main Steps
• Capture/Extract
• Scrub or data cleansing
• Transform
• Load and Index

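The four steps can be sketched as plain Python functions chained into one run. This is a toy illustration under stated assumptions: the source rows, the `region`/`amount` fields, the cleansing rules (trim text, drop rows without an amount, drop exact duplicates), the cents-to-dollars conversion and the in-memory index are all hypothetical choices made for the example, not a prescribed design.

```python
def extract(source):
    """Capture/extract: copy rows out of the (hypothetical) source system."""
    return [dict(row) for row in source]


def scrub(rows):
    """Cleanse: trim text fields, drop rows missing an amount, drop exact duplicates."""
    seen, clean = set(), []
    for row in rows:
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        fingerprint = tuple(sorted(row.items()))
        if row.get("amount") is None or fingerprint in seen:
            continue
        seen.add(fingerprint)
        clean.append(row)
    return clean


def transform(rows):
    """Transform: convert cents to dollars and standardize region codes."""
    return [{"region": r["region"].upper(), "amount": r["amount"] / 100} for r in rows]


def load(rows, target, index):
    """Load and index: append rows and maintain a region -> row-position index."""
    for r in rows:
        index.setdefault(r["region"], []).append(len(target))
        target.append(r)


def run_etl(source, target, index):
    # Extract -> scrub -> transform -> load, in the order listed on the slide.
    load(transform(scrub(extract(source))), target, index)
```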
Capture/Extract
• Static extract = capturing a snapshot of the source data at a point in time
• Incremental extract = capturing changes that have occurred since the last static extract

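A minimal sketch of the two extract styles, assuming each source row carries a hypothetical `updated_at` timestamp maintained by the source system. Timestamp comparison is only one way to detect changes; reading the source DBMS's transaction log is another common approach and is not shown here.

```python
from datetime import datetime


def static_extract(source):
    """Static extract: a snapshot of every source row at the moment the extract runs."""
    return [dict(row) for row in source]


def incremental_extract(source, last_extract_time: datetime):
    """Incremental extract: only rows changed after the previous extract.

    Assumes a hypothetical `updated_at` column on every source row.
    """
    return [dict(row) for row in source if row["updated_at"] > last_extract_time]
```

The incremental version moves far less data per run, at the cost of relying on the source to record change times reliably.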
Scrub or data cleansing
• Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
• Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data

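A few of the cleansing tasks above (misspellings, erroneous dates, missing data, duplicate data, error logging) can be sketched on a list of order rows. The field names and the `CITY_FIXES` correction table are hypothetical, and real cleansing rules would come from profiling the actual source data.

```python
from datetime import date

# Hypothetical correction table for known misspellings of city names.
CITY_FIXES = {"Bostn": "Boston", "Chicgo": "Chicago"}


def scrub_orders(rows):
    """Return (clean_rows, error_log). Field names are illustrative."""
    clean, errors, seen_ids = [], [], set()
    for row in rows:
        if row["id"] in seen_ids:                          # duplicate data
            errors.append((row["id"], "duplicate"))
            continue
        seen_ids.add(row["id"])
        if not row.get("city"):                            # missing data
            errors.append((row["id"], "missing city"))
            continue
        city = CITY_FIXES.get(row["city"], row["city"])    # misspellings
        try:
            order_date = date.fromisoformat(row["order_date"])  # erroneous dates
        except ValueError:
            errors.append((row["id"], "bad date"))
            continue
        clean.append({"id": row["id"], "city": city, "order_date": order_date})
    return clean, errors
```

Rejected rows are logged rather than silently dropped, so they can be investigated and corrected at the source.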
Transform
• Record-level:
– Selection – data partitioning
– Joining – data combining
– Aggregation – data summarization
• Field-level:
– single-field – from one field to one field
– multi-field – from many fields to one, or one field to many

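Each transformation type listed above can be illustrated with a small function. This is a sketch on in-memory dictionaries; the order and customer fields, the status codes `"A"`/`"I"`, and the name and address formats are hypothetical examples chosen only to show one instance of each kind.

```python
from collections import defaultdict

# ---- Record-level transformations ----

def select(rows, predicate):
    """Selection (partitioning): keep only rows that satisfy a condition."""
    return [r for r in rows if predicate(r)]


def join(rows, lookup, key):
    """Joining (combining): enrich each row with attributes from another source."""
    return [{**r, **lookup[r[key]]} for r in rows if r[key] in lookup]


def aggregate(rows, group_by, measure):
    """Aggregation (summarization): total a measure per group."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[group_by]] += r[measure]
    return dict(totals)

# ---- Field-level transformations ----

def decode_status(code):
    """Single-field: one source field to one target field (decoding a code)."""
    return {"A": "Active", "I": "Inactive"}.get(code, "Unknown")


def split_name(full_name):
    """Multi-field, one to many: split a full name into first and last name."""
    first, _, last = full_name.partition(" ")
    return first, last


def format_address(street, city, postcode):
    """Multi-field, many to one: combine address parts into one field."""
    return f"{street}, {city} {postcode}"
```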
Load and Index
• Refresh mode: bulk rewriting of target data at periodic intervals
• Update mode: only changes in source data are written to the data warehouse

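The two load modes can be contrasted with a dictionary standing in for a target table. This is a simplified sketch: each function returns how many rows it wrote, which shows why update mode is cheaper when few rows change. For brevity, update mode here overwrites keyed rows; a warehouse that keeps periodic data would more likely append a new dated version of each changed row instead.

```python
def refresh_load(target: dict, snapshot: dict) -> int:
    """Refresh mode: bulk rewrite of the target from a full periodic snapshot."""
    target.clear()
    target.update(snapshot)
    return len(snapshot)          # every row is rewritten


def update_load(target: dict, changes: dict) -> int:
    """Update mode: write only the rows that changed in the source."""
    target.update(changes)
    return len(changes)           # only the changed rows are written
```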
Data Warehouse Architectures
• Generic Two-Level Architecture
• Independent Data Mart
• Dependent Data Mart and Operational
Data Store
• Logical Data Mart and Active
Warehouse

Generic two-level architecture
[Figure: E → T → L into one company-wide warehouse]
• One company-wide warehouse
• Periodic extraction → data is not completely current in warehouse

Independent data mart
[Figure: a separate E → T → L flow for each data mart]
• Data marts: mini-warehouses, limited in scope
• Separate ETL for each independent data mart
• Data access complexity due to multiple data marts

Dependent data mart with operational data store
[Figure: E → T → L into the enterprise data warehouse]
• ODS provides option for obtaining current data
• Single ETL for enterprise data warehouse (EDW)
• Dependent data marts loaded from EDW

Logical data mart and active warehouse
[Figure: near real-time E → T → L into the data warehouse]
• ODS and data warehouse are one and the same
• Near real-time ETL for active data warehouse
• Data marts are NOT separate databases, but logical views of the data warehouse
• Easier to create new data marts

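The idea of a logical data mart as a view can be tried with Python's built-in `sqlite3` module. The `sales` table and the `east_sales_mart` view are hypothetical names for this sketch; what it shows is that the mart stores no data of its own, so a row loaded into the warehouse is visible in the mart immediately, without a separate ETL run for the mart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Warehouse table (hypothetical schema).
    CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
    INSERT INTO sales VALUES ('EAST', 'widget', 10.0),
                             ('WEST', 'widget', 4.0),
                             ('EAST', 'gadget', 6.0);

    -- Logical data mart: a view over the warehouse, not a separate copy of the data.
    CREATE VIEW east_sales_mart AS
        SELECT product, amount FROM sales WHERE region = 'EAST';
""")


def mart_total():
    return conn.execute("SELECT SUM(amount) FROM east_sales_mart").fetchone()[0]


before = mart_total()
# Loading a row into the warehouse is enough for it to appear in the mart.
conn.execute("INSERT INTO sales VALUES ('EAST', 'gizmo', 1.0)")
after = mart_total()
```

Creating another mart is just another `CREATE VIEW` statement, which is why new data marts are easier to create in this architecture.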
Data Characteristics: Status vs. Event Data
[Figure: Status → Event → Status]
• Event – a database action (create/update/delete) that results from a transaction

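One way to read the figure is that an event takes the data from one status to the next. A minimal sketch of that reading, assuming a status is represented as a dictionary of current values and an event as a hypothetical `Event` record carrying one of the three database actions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    action: str          # "create", "update" or "delete"
    key: str
    value: object = None


def apply_event(status: dict, event: Event) -> dict:
    """Return the status after the event; the status before is left unchanged."""
    after = dict(status)
    if event.action in ("create", "update"):
        after[event.key] = event.value
    elif event.action == "delete":
        after.pop(event.key, None)
    else:
        raise ValueError(f"unknown action: {event.action}")
    return after
```

Keeping both the before and after statuses, rather than mutating one copy, mirrors the figure's two separate status boxes.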
Data Characteristics: Transient vs. Periodic Data
• Transient data: changes to existing records are written over previous records, thus destroying the previous data content
• Periodic data: data are never physically altered or deleted once they have been added to the store

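The difference can be sketched with two small in-memory tables. `TransientTable` and `PeriodicTable` are hypothetical names: the first keeps one row per key and overwrites it, the second makes the date part of the key, so every version survives and the value on any past date can be recovered. This also illustrates the earlier point that warehouse keys contain an element of time.

```python
from datetime import date


class TransientTable:
    """Operational style: one row per key; an update overwrites the old content."""

    def __init__(self):
        self.rows = {}

    def write(self, key, value, day: date):
        self.rows[key] = value  # previous value is destroyed; `day` is not kept


class PeriodicTable:
    """Warehouse style: each version is kept, with its date as part of the key."""

    def __init__(self):
        self.rows = {}

    def write(self, key, value, day: date):
        if (key, day) in self.rows:
            raise ValueError("rows already in the store are never altered")
        self.rows[(key, day)] = value

    def value_as_of(self, key, when: date):
        """Most recent value recorded on or before `when`, or None."""
        days = [d for (k, d) in self.rows if k == key and d <= when]
        return self.rows[(key, max(days))] if days else None
```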
Star schema
• Fact tables contain factual or quantitative data
• 1:N relationship between dimension tables and fact tables
• Dimension tables are denormalized to maximize performance
• Dimension tables contain descriptions about the subjects of the business

Star schema: a simple database design in which dimensional data are separated from fact data. Excellent for queries, but bad for online transaction processing.

Star schema example
• Fact table provides statistics for sales broken down by product, period and store dimensions

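The example can be sketched as a small schema in SQLite via Python's `sqlite3` module. The table and column names (`product`, `period`, `store`, `sales`, `units_sold`, `dollars_sold`) and the sample rows are hypothetical. Each dimension row can be referenced by many fact rows (the 1:N relationship), and a typical query joins the fact table to the dimensions it needs, filters and groups on dimension attributes, and sums the facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes about the business subjects.
    CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT, category TEXT);
    CREATE TABLE period  (period_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
    CREATE TABLE store   (store_key INTEGER PRIMARY KEY, city TEXT, region TEXT);

    -- Fact table: quantitative data, one row per product/period/store combination.
    CREATE TABLE sales (
        product_key  INTEGER REFERENCES product(product_key),
        period_key   INTEGER REFERENCES period(period_key),
        store_key    INTEGER REFERENCES store(store_key),
        units_sold   INTEGER,
        dollars_sold REAL,
        PRIMARY KEY (product_key, period_key, store_key)
    );

    INSERT INTO product VALUES (1, 'Widget', 'Hardware'), (2, 'Gizmo', 'Hardware');
    INSERT INTO period  VALUES (1, 2024, 1, 1), (2, 2024, 1, 2);
    INSERT INTO store   VALUES (1, 'Boston', 'East'), (2, 'Denver', 'West');
    INSERT INTO sales VALUES (1, 1, 1, 10, 100.0), (1, 2, 1, 5, 50.0),
                             (2, 1, 2, 3, 60.0),  (2, 2, 1, 4, 80.0);
""")

# Star join: constrain and group by dimension attributes, aggregate the facts.
query = """
    SELECT s.region, SUM(f.dollars_sold)
    FROM sales f
    JOIN store  s ON f.store_key  = s.store_key
    JOIN period p ON f.period_key = p.period_key
    WHERE p.year = 2024
    GROUP BY s.region
    ORDER BY s.region
"""
rows = conn.execute(query).fetchall()
```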
On-Line Analytical Processing (OLAP)
• The use of a set of graphical tools that
provides users with multidimensional views of
their data and allows them to analyze the
data using simple windowing techniques
• Relational OLAP (ROLAP)
– Traditional relational representation

• Multidimensional OLAP (MOLAP)
– Cube structure

• OLAP Operations
– Cube slicing – producing a 2-D view of the data
– Drill-down – going from summary to more
detailed views

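Both operations can be sketched on a tiny cube held in a Python dictionary keyed by (product, store, year, month). The data and function names are hypothetical, and a real OLAP server (ROLAP or MOLAP) would do this over far larger structures. Slicing fixes one dimension to leave a 2-D view; drilling down moves from yearly totals to monthly detail, and the detailed figures add back up to the summary.

```python
from collections import defaultdict

# A tiny hypothetical cube: (product, store, year, month) -> units sold.
cube = {
    ("Widget", "Boston", 2024, 1): 10,
    ("Widget", "Boston", 2024, 2): 5,
    ("Widget", "Denver", 2024, 1): 7,
    ("Gizmo",  "Boston", 2024, 1): 3,
    ("Gizmo",  "Denver", 2024, 2): 4,
}


def slice_cube(cube, product):
    """Slice: fix the product dimension, leaving a 2-D store x (year, month) view."""
    view = defaultdict(int)
    for (p, store, year, month), units in cube.items():
        if p == product:
            view[(store, (year, month))] += units
    return dict(view)


def summarize(cube, level):
    """Totals per product at 'year' (summary) or 'month' (detailed) granularity."""
    totals = defaultdict(int)
    for (p, store, year, month), units in cube.items():
        key = (p, year) if level == "year" else (p, year, month)
        totals[key] += units
    return dict(totals)


summary = summarize(cube, "year")   # summary view
detail = summarize(cube, "month")   # drill-down to monthly detail
```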
Data Warehouse vs. Operational DBMS
• OLTP (on-line transaction processing)
– Major task of traditional relational DBMS
– Day-to-day operations: purchasing, inventory, banking,
manufacturing, payroll, registration, accounting, etc.

• OLAP (on-line analytical processing)
– Major task of data warehouse system
– Data analysis and decision making

• Distinct features (OLTP vs. OLAP):
– User and system orientation: customer vs. market
– Data contents: current, detailed vs. historical, consolidated
– Database design: ER + application vs. star + subject
– View: current, local vs. evolutionary, integrated
– Access patterns: update vs. read-only but complex queries
OLTP vs. OLAP

| | OLTP | OLAP |
|---|---|---|
| users | clerk, IT professional | knowledge worker |
| function | day-to-day operations | decision support |
| DB design | application-oriented | subject-oriented |
| data | current, up-to-date; detailed, flat relational; isolated | historical; summarized, multidimensional; integrated, consolidated |
| usage | repetitive | ad-hoc |
| access | read/write; index/hash on primary key | lots of scans |
| unit of work | short, simple transaction | complex query |
| # records accessed | tens | millions |
| # users | thousands | hundreds |
| DB size | 100 MB-GB | 100 GB-TB |
| metric | transaction throughput | query throughput, response |

Slicing a data cube

Example: Drill-down
[Figure: summary report; drill-down; drill-down with color added]

Data Warehouse Usage
• Three kinds of data warehouse applications
– Information processing
• supports querying, basic statistical analysis, and reporting
using crosstabs, tables, charts and graphs
– Analytical processing
• multidimensional analysis of data warehouse data
• supports basic OLAP operations, slice-dice, drilling, pivoting
– Data mining
• knowledge discovery from hidden patterns
• supports associations, constructing analytical models,
performing classification and prediction, and presenting the
mining results using visualization tools


Ch1 data-warehousing

  • 2. Overview • • • • • • What is data warehouse? Why data warehouse? Data reconciliation – ETL process Data warehouse architectures Star schema – dimensional modeling Data analysis 2
  • 3. What is Data Warehouse? • Defined in many different ways, but not rigorously. – A decision support database that is maintained separately from the organization’s operational database – Support information processing by providing a solid platform of consolidated, historical data for analysis. • “A data warehouse is a subject-oriented, integrated, timevariant, and nonvolatile collection of data in support of management’s decision-making process.”—W. H. Inmon • Data warehousing: – The process of constructing and using data warehouses 3
  • 4. Data Warehouse—SubjectOriented • Organized around major subjects, such as customer, product, sales • Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing • Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process 4
  • 5. Data Warehouse—Integrated • Constructed by integrating multiple, heterogeneous data sources – relational databases, flat files, on-line transaction records • Data cleaning and data integration techniques are applied. – Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources • E.g., Hotel price: currency, tax, breakfast covered, etc. – When data is moved to the warehouse, it is converted. 5
  • 6. Data Warehouse—Time Variant • The time horizon for the data warehouse is significantly longer than that of operational systems – Operational database: current value data – Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years) • Every key structure in the data warehouse – Contains an element of time, explicitly or implicitly – But the key of operational data may or may not 6 contain “time element”
  • 7. Data Warehouse—Nonvolatile • A physically separate store of data transformed from the operational environment • Operational update of data does not occur in the data warehouse environment – Does not require transaction processing, recovery, and concurrency control mechanisms – Requires only two operations in data accessing: • initial loading of data and access of data 7
  • 8. Trends in Organisations that encourage the need for data warehousing • No single system of record • Multiple systems are not synchronized • Organisations want to analyse the activities in a balanced way • Customer relationship management • Supplier relationship management 8
  • 9. Need for Data Warehousing • Integrated, company-wide view of high-quality information (from different databases) • Separation of operational and informational systems and data (for improved performance) 9
  • 10. Operational & Informational System The need to separate operational and informational systems is based on three primary factors: • A data warehouse centralizes data that are scattered throughout disparate operational systems and make them a available for decision support applications • A properly designed data warehouse adds value to data by improving their quality • A separate data warehouse eliminates much of contention for resources that result when informational application confounded with operational processing 10
  • 11. 11
  • 12. Data Reconciliation • Typical operational data is: – Transient – not historical – Not normalised (perhaps due to denormalisation for performance) – Restricted in scope – not comprehensive – Sometimes poor quality – inconsistencies and errors • After ETL (Extract, Transform, Load), data should be: – – – – – Detailed – not summarized yet Historical – periodic Normalised – 3rd normal form or higher Comprehensive – enterprise-wide perspective Timely – data should be current enough to assist decisionmaking – Quality controlled – accurate with full integrity 12
  • 13. The ETL Process / Data Reconciliation: Main Steps • Capture/Extract • Scrub or data cleansing • Transform • Load and index
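The four steps can be chained as plain functions. The Python sketch below is minimal and assumes a tiny in-memory source of string-valued rows. The field names (`id`, `name`, `amount`, `customer_id`) are hypothetical, and real ETL tools do far more at each step.

```python
from typing import Dict, List

Row = Dict[str, str]


def extract(source: List[Row]) -> List[Row]:
    """Capture/extract: copy rows out of the (hypothetical) source system."""
    return [dict(r) for r in source]


def scrub(rows: List[Row]) -> List[Row]:
    """Scrub/cleanse: trim whitespace and drop rows that have no key."""
    cleaned = []
    for r in rows:
        r = {k: v.strip() for k, v in r.items()}
        if not r.get("id"):
            continue  # missing key; a real pipeline would log this error
        cleaned.append(r)
    return cleaned


def transform(rows: List[Row]) -> List[dict]:
    """Transform: rename fields, standardize case and convert types."""
    return [
        {"customer_id": r["id"], "name": r["name"].title(), "amount": float(r["amount"])}
        for r in rows
    ]


def load_and_index(rows: List[dict], table: List[dict], index: Dict[str, List[int]]) -> None:
    """Load: append rows to the target table; index: map each key to row positions."""
    for r in rows:
        index.setdefault(r["customer_id"], []).append(len(table))
        table.append(r)


source = [
    {"id": " 7 ", "name": "ada LOVELACE ", "amount": "12.50"},
    {"id": "", "name": "unknown", "amount": "0"},
]
table: List[dict] = []
index: Dict[str, List[int]] = {}
load_and_index(transform(scrub(extract(source))), table, index)
print(table)  # [{'customer_id': '7', 'name': 'Ada Lovelace', 'amount': 12.5}]
```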
  • 14. Static extract = capturing a snapshot of the source data at a point in time • Incremental extract = capturing changes that have occurred since the last static extract
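The difference between the two extract styles is easiest to see side by side. This Python sketch assumes each source row carries a hypothetical `updated_at` timestamp and detects changes by comparing it with the time of the previous extract. That is only one way to capture changes; reading the DBMS log is another.

```python
from datetime import datetime

source = [
    {"id": 1, "status": "open", "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "status": "open", "updated_at": datetime(2024, 1, 2, 9, 0)},
]


def static_extract(rows):
    """Static extract: a snapshot of the whole source at this point in time."""
    return [dict(r) for r in rows]


def incremental_extract(rows, since):
    """Incremental extract: only rows changed after the previous extract."""
    return [dict(r) for r in rows if r["updated_at"] > since]


snapshot = static_extract(source)
last_extract_time = datetime(2024, 1, 3, 0, 0)

# Later, the source changes: row 2 is updated and row 3 is inserted.
source[1] = {"id": 2, "status": "closed", "updated_at": datetime(2024, 1, 4, 10, 0)}
source.append({"id": 3, "status": "open", "updated_at": datetime(2024, 1, 5, 8, 0)})

changes = incremental_extract(source, last_extract_time)
print([r["id"] for r in changes])  # [2, 3]
```

Timestamp comparison cannot see rows that were deleted from the source. Capturing deletions needs another mechanism, such as log-based capture.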
  • 15. Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies • Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
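A few of these scrubbing tasks fit in one pass over the data. The Python sketch below fixes misspellings using a hypothetical correction table (`CITY_FIXES`). It flags erroneous dates, detects duplicates on a normalized name/city key, and logs every problem. The fields and rules are assumptions chosen for illustration.

```python
from datetime import datetime

# Hypothetical correction table for known misspellings of city names.
CITY_FIXES = {"Londn": "London", "Manchster": "Manchester"}


def parse_date(text):
    """Return a date, or None if text is not a valid YYYY-MM-DD date."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None


def scrub(rows):
    clean, errors, seen = [], [], set()
    for r in rows:
        city = r["city"].strip()
        city = CITY_FIXES.get(city, city)             # fix misspellings
        born = parse_date(r["birth_date"])
        if born is None:                               # erroneous date -> log it
            errors.append((r["id"], "erroneous date: " + r["birth_date"]))
        key = (r["name"].strip().lower(), city)        # duplicate-detection key
        if key in seen:
            errors.append((r["id"], "duplicate"))
            continue
        seen.add(key)
        clean.append({"id": r["id"], "name": r["name"].strip(), "city": city, "birth_date": born})
    return clean, errors


rows = [
    {"id": 1, "name": "Ann Lee", "city": "London", "birth_date": "1990-02-03"},
    {"id": 2, "name": "ann lee ", "city": "Londn", "birth_date": "1990-02-03"},
    {"id": 3, "name": "Bo Chan", "city": "Manchster", "birth_date": "1985-02-30"},
]
clean, errors = scrub(rows)
print(errors)  # [(2, 'duplicate'), (3, 'erroneous date: 1985-02-30')]
```

Row 2 is only recognized as a duplicate after its city has been corrected. This is why misspelling fixes usually run before duplicate detection.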
  • 16. Record-level: – Selection – data partitioning – Joining – data combining – Aggregation – data summarization • Field-level: – Single-field – from one field to one field – Multi-field – from many fields to one, or one field to many
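Each transformation type can be illustrated in a line or two of Python. This sketch assumes small hypothetical `orders` and `stores` tables. Record-level transformations change which rows exist or how they combine. Field-level transformations change individual values.

```python
from collections import defaultdict

orders = [
    {"order_id": 1, "store_id": "S1", "amount": 20.0, "ship_date": "2024-03-05"},
    {"order_id": 2, "store_id": "S2", "amount": 35.0, "ship_date": "2024-03-09"},
    {"order_id": 3, "store_id": "S1", "amount": 15.0, "ship_date": "2024-04-01"},
]
stores = {"S1": {"region": "North"}, "S2": {"region": "South"}}

# Record-level, selection (partitioning): keep only March 2024 orders.
march = [o for o in orders if o["ship_date"].startswith("2024-03")]

# Record-level, joining (combining): attach each order's store region.
joined = [{**o, **stores[o["store_id"]]} for o in orders]

# Record-level, aggregation (summarization): total sales per region.
totals = defaultdict(float)
for o in joined:
    totals[o["region"]] += o["amount"]

# Field-level, single-field: one field to one field (amount in whole cents).
cents = [round(o["amount"] * 100) for o in orders]


# Field-level, multi-field: one field to many (date split into parts) ...
def split_date(iso):
    year, month, day = (int(p) for p in iso.split("-"))
    return {"year": year, "month": month, "day": day}


# ... and many fields to one (store id and order id combined into one key).
keys = [f'{o["store_id"]}-{o["order_id"]}' for o in orders]
```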
  • 17. Refresh mode: bulk rewriting of target data at periodic intervals • Update mode: only changes in source data are written to the data warehouse
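Here is a minimal Python sketch of the two load modes, with a list standing in for the target table. In update mode, changed rows are appended rather than overwriting earlier rows, which keeps the periodic history described in the Data Reconciliation slide. That append choice is an assumption of this sketch, not something the slide specifies.

```python
def refresh(target, snapshot):
    """Refresh mode: bulk-rewrite the entire target from a full source snapshot."""
    target[:] = [dict(r) for r in snapshot]


def update(target, changes):
    """Update mode: write only the changed source rows, appended as new records."""
    target.extend(dict(r) for r in changes)


warehouse = []
refresh(warehouse, [{"id": 1, "price": 10}, {"id": 2, "price": 5}])
update(warehouse, [{"id": 2, "price": 6}])    # only the change is written
after_update = len(warehouse)                  # 3: old and new price of id 2
refresh(warehouse, [{"id": 1, "price": 10}, {"id": 2, "price": 6}])
after_refresh = len(warehouse)                 # 2: target fully rewritten
```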
  • 18. Data Warehouse Architectures • Generic Two-Level Architecture • Independent Data Mart • Dependent Data Mart and Operational Data Store • Logical Data Mart and @ctive Warehouse
  • 19. Generic two-level architecture (diagram: extract, transform, load) • One company-wide warehouse • Periodic extraction → data is not completely current in warehouse
  • 20. Independent data mart (diagram: extract, transform, load) • Data marts: mini-warehouses, limited in scope • Separate ETL for each independent data mart • Data access complexity due to multiple data marts
  • 21. Dependent data mart with operational data store (diagram: extract, transform, load) • ODS provides option for obtaining current data • Single ETL for enterprise data warehouse (EDW) • Dependent data marts loaded from EDW
  • 22. Logical data mart and @ctive warehouse (diagram: extract, transform, load) • ODS and data warehouse are one and the same • Near real-time ETL for @ctive data warehouse • Data marts are NOT separate databases, but logical views of the data warehouse → easier to create new data marts
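A logical data mart can be as simple as a SQL view over the warehouse. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `sales` table and a view named `north_sales_mart`. No data is copied, so rows loaded into the warehouse show up in the mart immediately. A production warehouse would use its own DBMS, but the idea is the same.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
INSERT INTO sales VALUES ('North', 'P1', 20.0), ('South', 'P1', 35.0), ('North', 'P2', 15.0);

-- Logical data mart: not a separate database, just a view over the warehouse.
CREATE VIEW north_sales_mart AS
    SELECT product, SUM(amount) AS total
    FROM sales WHERE region = 'North'
    GROUP BY product;
""")

before = con.execute("SELECT * FROM north_sales_mart ORDER BY product").fetchall()
print(before)  # [('P1', 20.0), ('P2', 15.0)]

# New data loaded into the warehouse is visible through the mart at once.
con.execute("INSERT INTO sales VALUES ('North', 'P1', 5.0)")
after = con.execute("SELECT * FROM north_sales_mart ORDER BY product").fetchall()
print(after)   # [('P1', 25.0), ('P2', 15.0)]
```

Creating another mart only takes another `CREATE VIEW`. This is the sense in which new data marts are easier to create in this architecture.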
  • 23. Data Characteristics: Status vs. Event Data (diagram: status → event → status) • Event – a database action (create/update/delete) that results from a transaction
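The relationship between events and status can be sketched by replaying a hypothetical event log. Each event is a create, update or delete action, and applying it to one status yields the next status. The account key and balance values are made up for illustration.

```python
# Hypothetical event log: each event is a database action on one record.
events = [
    ("create", "A1", {"balance": 100}),
    ("update", "A1", {"balance": 80}),
    ("delete", "A1", None),
]


def apply_event(status, event):
    """Return the new status (a snapshot of all records) after one event."""
    action, key, values = event
    new_status = dict(status)  # earlier statuses are left unchanged
    if action in ("create", "update"):
        new_status[key] = values
    elif action == "delete":
        new_status.pop(key, None)
    else:
        raise ValueError("unknown action: " + action)
    return new_status


statuses = [{}]  # status before any event
for e in events:
    statuses.append(apply_event(statuses[-1], e))
```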
  • 24. Data Characteristics: Transient vs. Periodic Data • Transient data: changes to existing records are written over previous records, thus destroying the previous data content • Periodic data: data are never physically altered or deleted once they have been added to the store
  • 25. Star schema • Fact tables contain factual or quantitative data • 1:N relationship between dimension tables and fact tables • Dimension tables are denormalized to maximize performance • Dimension tables contain descriptions about the subjects of the business • Star schema: a simple database design in which dimensional data are separated from fact data; excellent for queries, but bad for online transaction processing
  • 26. Star schema example • Fact table provides statistics for sales broken down by product, period, and store dimensions
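A small version of this star schema can be built with Python's built-in `sqlite3`. The column names (`units_sold`, `dollars_sold`, `category`, `quarter`, `city`, `region`) and all data values are assumptions for illustration. The query shows the typical star join: filter and group on dimension attributes, then sum the fact measures.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT, category TEXT);
CREATE TABLE period_dim  (period_key  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- Fact table: each dimension row relates to many fact rows (1:N).
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim(product_key),
    period_key   INTEGER REFERENCES period_dim(period_key),
    store_key    INTEGER REFERENCES store_dim(store_key),
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (product_key, period_key, store_key)
);

INSERT INTO product_dim VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO period_dim  VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO store_dim   VALUES (1, 'Leeds', 'North'), (2, 'Bristol', 'South');
INSERT INTO sales_fact  VALUES (1, 1, 1, 10, 100.0), (1, 2, 2, 5, 50.0), (2, 1, 1, 3, 60.0);
""")

# Star join: constrain/group by dimension attributes, sum the fact measures.
rows = con.execute("""
    SELECT s.region, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN store_dim  s ON f.store_key  = s.store_key
    JOIN period_dim p ON f.period_key = p.period_key
    WHERE p.year = 2024
    GROUP BY s.region
    ORDER BY s.region
""").fetchall()
print(rows)  # [('North', 160.0), ('South', 50.0)]
```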
  • 28. On-Line Analytical Processing (OLAP) • The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques • Relational OLAP (ROLAP) – Traditional relational representation • Multidimensional OLAP (MOLAP) – Cube structure • OLAP Operations – Cube slicing – producing a 2-D view of the data – Drill-down – going from summary to more detailed views
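The two OLAP operations can be mimicked on a tiny in-memory cube. This Python sketch assumes a hypothetical three-dimensional cube keyed by (product, store, month) that holds units sold. Slicing fixes one dimension to leave a 2-D view. Drill-down moves from a year-level summary to month-level detail.

```python
from collections import defaultdict

# Hypothetical 3-D cube: (product, store, month) -> units sold.
cube = {
    ("Widget", "Leeds", "2024-01"): 10,
    ("Widget", "Leeds", "2024-02"): 4,
    ("Widget", "Bristol", "2024-01"): 7,
    ("Gadget", "Leeds", "2024-02"): 3,
}


def slice_cube(cube, store):
    """Cube slicing: fix the store dimension, leaving a 2-D product x month view."""
    return {(p, m): v for (p, s, m), v in cube.items() if s == store}


def units_by_product(cube, level):
    """Total units per product at 'year' level, or at the finer 'month' level."""
    out = defaultdict(int)
    for (p, _store, m), v in cube.items():
        period = m[:4] if level == "year" else m
        out[(p, period)] += v
    return dict(out)


leeds_slice = slice_cube(cube, "Leeds")
summary = units_by_product(cube, "year")   # summary view
detail = units_by_product(cube, "month")   # drill-down to more detail
```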
  • 29. Data Warehouse vs. Operational DBMS • OLTP (on-line transaction processing) – Major task of traditional relational DBMS – Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc. • OLAP (on-line analytical processing) – Major task of data warehouse system – Data analysis and decision making • Distinct features (OLTP vs. OLAP): – User and system orientation: customer vs. market – Data contents: current, detailed vs. historical, consolidated – Database design: ER + application vs. star + subject – View: current, local vs. evolutionary, integrated – Access patterns: update vs. read-only but complex queries
  • 30. OLTP vs. OLAP (each feature listed as OLTP vs. OLAP) • Users: clerk, IT professional vs. knowledge worker • Function: day-to-day operations vs. decision support • DB design: application-oriented vs. subject-oriented • Data: current, up-to-date, detailed, flat relational, isolated vs. historical, summarized, multidimensional, integrated, consolidated • Usage: repetitive vs. ad-hoc • Access: read/write, index/hash on prim. key vs. lots of scans • Unit of work: short, simple transaction vs. complex query • # records accessed: tens vs. millions • # users: thousands vs. hundreds • DB size: 100MB-GB vs. 100GB-TB • Metric: transaction throughput vs. query throughput, response
  • 31. Slicing a data cube
  • 33. Data Warehouse Usage • Three kinds of data warehouse applications – Information processing • supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs – Analytical processing • multidimensional analysis of data warehouse data • supports basic OLAP operations, slice-dice, drilling, pivoting – Data mining • knowledge discovery from hidden patterns • supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools