Data Warehousing
An Introduction
Presented by Shahed Khalili on June 18th, 2007
Knowledge Management & Business Intelligence
• Transform data to usable Information that makes sense
• Support Business decisions
• Gain competitive advantage
• Identify and analyze market and user trends
• Identify popular and profitable services and products
• Increase effectiveness of marketing
• Have the ability and flexibility to create various reports for
management and clients quickly
• Have the ability to prioritize product enhancements based on
studying customers’ behavioural trends
• Detect anomalies (e.g. fraud detection)
• and more…
Business Intelligence
[Figure: Data, Information, Knowledge, Decision, arranged along a Value axis]
What is a Data Warehouse
• A way to store large amounts of operational
data so that it can be analyzed and turned into
comprehensive and intuitive reports
• A tool that gives management the ability to
access and analyze information about its
business
What is a Data Warehouse
• A data warehouse is a copy of transaction
data specifically structured for querying and
reporting.
• Large collection of integrated, non-volatile,
time variant data from multiple sources,
processed for storage in a multi-dimensional
model
Source: Ralph Kimball, Margy Ross, “The Data Warehouse Toolkit”
Characteristics of a DW
• Subject-oriented
– Data that gives information about a particular subject instead of about a company's on-going
operations (e.g. CUSTOMER, FINANCIAL INSTITUTION, VENDOR).
• Integrated
– Data that is gathered into the data warehouse from a variety of sources and merged into a
coherent whole, (standardize encoding, impose consistency in units of measure e.g. one
standard way to record a customer’s transaction across all systems).
• Non-volatile
– Data once loaded into the data warehouse does not change. Each data record represents a
distinct state (event).
• Time-variant
– Data is stored for long durations, with a time stamp recording its state (e.g. for
sampling, summary or trend analysis).
Typical DW Architecture
[Figure: typical data warehouse architecture]
Source: Connolly, Begg
OLTP vs. Data Warehouse
• OLTP
– Online transaction processing is used at the
routine operation level and supported by
transactional databases optimized for insertion,
updates, deletions and some low level queries.
• Data Warehousing
– Optimized for data retrieval rather than routine
transaction processing; supports decision-support
applications.
OLTP vs. Data Warehouse
OLTP                                        | Data Warehouse
Current data                                | Historic data
Detailed data                               | Lightly and highly summarized data
Dynamic                                     | Static
High transaction throughput                 | Low transaction throughput
Transaction driven                          | Analysis driven
Serves a large number of users (low volume) | Serves a small number of users (large volume)
Designing a DW
• Top down
– Business Questions – Interview to see what the
business needs to know
• Bottom Up
– What data sources are available and what data is
stored
Reminder
• What is a Data Warehouse
– “Large collection of integrated, non-volatile, time
variant data from multiple sources, processed for
storage in a multi-dimensional model”
Dimensional Modeling
• Every dimensional model (DM) is composed of
one table with a composite primary key,
called the FACT table, and a set of smaller
tables called DIMENSION tables.
Dimensional Modeling
[Figure: Payments cube with Customer, Vendor and Time axes]
Notes:
This is a simple three-dimensional data model
(cube) that stores Payment facts. The x, y and z
axes represent the dimension tables, and
what is inside the cube represents the
FACT table.
A real dimensional model is never this simple;
this is only a visual representation of
what it could look like. In real life, many
more dimensions are needed to describe the
business process behind a FACT.
Dimensional Modeling Rules
• Each DIMENSION table has a simple (non-composite)
primary key that corresponds exactly to one of the
components of the composite key in the FACT table.
• Forms ‘star-like’ structure, which is called a star
schema or star join.
• All natural keys are replaced with surrogate keys,
so every join between FACT and DIMENSION
tables is based on surrogate keys, not natural keys.
• Surrogate keys allow data in the warehouse to have
some independence from the data used and produced
by the OLTP systems (e.g. changing BINs).
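The surrogate-key rule can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table names (customer_dim, payment_fact), the column names and the sample values are assumptions, not part of the original deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: surrogate key (customer_id) plus the OLTP natural key.
    CREATE TABLE customer_dim (
        customer_id   INTEGER PRIMARY KEY,  -- surrogate key, owned by the warehouse
        customer_num  TEXT NOT NULL,        -- natural key from the source system
        customer_name TEXT NOT NULL
    );
    -- Fact: joins to the dimension only through the surrogate key.
    CREATE TABLE payment_fact (
        customer_id INTEGER NOT NULL REFERENCES customer_dim(customer_id),
        amount      REAL NOT NULL
    );
""")
conn.execute("INSERT INTO customer_dim VALUES (1, 'C-1001', 'Alice')")
conn.executemany("INSERT INTO payment_fact VALUES (?, ?)", [(1, 25.0), (1, 40.0)])


def total_for(name):
    """Sum the payments of one customer via the surrogate-key join."""
    row = conn.execute(
        """SELECT SUM(f.amount)
           FROM payment_fact f
           JOIN customer_dim d ON d.customer_id = f.customer_id
           WHERE d.customer_name = ?""",
        (name,),
    ).fetchone()
    return row[0]


before = total_for("Alice")
# The source system renumbers the customer: only the dimension row changes,
# and no fact row has to be touched.
conn.execute("UPDATE customer_dim SET customer_num = 'X-77' WHERE customer_id = 1")
after = total_for("Alice")
```

Because the fact rows never store the natural key, the renumbering in the source system leaves every join, and therefore every report total, unchanged.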
Denormalizing
• Denormalizing
– DW data schema is denormalized or partially
denormalized to speed data retrieval.
• e.g. in a normalized DB we don’t store information that
can be calculated from stored information. In DW
design we do.
Star Schema
• Payment FACT: Customer, Vendor, Time, Amount, Response Code
• Customer dimension: ID, Customer Num, Customer Name, Customer Age
• Vendor dimension: ID, Vendor Num, Vendor Name, Vendor Account Num, Address
• Time dimension: ID, Date, Day of Week, Quarter, Month, Year, Day of Month
Note how time is denormalized and stored as a dimension.
Why Denormalizing?
• Looking at Time Dimension table:
– We’re storing fields that can be calculated (such as day of week)
• For example, a grocery chain such as Safeway wants to see which day of the week
has the most customers so that it can staff up. The question we ask the DW would
be “show me the average number of transactions we process on different
days of the week”.
– If we weren’t storing the day of week, our DW would have to go
through millions of transactions and calculate the day of week from each
date stamp before it could match and return the results.
– This calculation is very time consuming and the response time would
be unacceptable.
– We denormalize to reduce the response time by storing more
information than [one could argue is] needed.
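The day-of-week question above could look roughly like this against a denormalized time dimension. It is a sketch using sqlite3; the table and column names, the 14-day date range and the transaction counts are invented for illustration.

```python
import sqlite3
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_dim (
        time_id     INTEGER PRIMARY KEY,
        full_date   TEXT NOT NULL,
        day_of_week TEXT NOT NULL   -- derivable from full_date, stored anyway
    );
    CREATE TABLE payment_fact (
        time_id INTEGER NOT NULL REFERENCES time_dim(time_id),
        amount  REAL NOT NULL
    );
""")

# Load 14 days. The day name is computed once at load time, not per query.
start = date(2007, 6, 4)  # a Monday
for i in range(14):
    d = start + timedelta(days=i)
    conn.execute("INSERT INTO time_dim VALUES (?, ?, ?)",
                 (i + 1, d.isoformat(), DAY_NAMES[d.weekday()]))

# Hypothetical load: time_id -> number of transactions on that day.
tx_counts = {1: 3, 8: 1, 5: 4, 12: 6}
for time_id, n in tx_counts.items():
    conn.executemany("INSERT INTO payment_fact VALUES (?, ?)",
                     [(time_id, 10.0)] * n)

# Average transactions per day of week, counting only days that had
# at least one transaction. No date arithmetic happens at query time.
rows = conn.execute("""
    SELECT t.day_of_week,
           COUNT(*) * 1.0 / COUNT(DISTINCT t.time_id) AS avg_tx
    FROM payment_fact f
    JOIN time_dim t ON t.time_id = f.time_id
    GROUP BY t.day_of_week
    ORDER BY avg_tx DESC
""").fetchall()
avg_by_day = dict(rows)
busiest_day = rows[0][0]
```

The query groups directly on the stored day_of_week column, so the cost of deriving the weekday is paid once per calendar day during loading rather than once per transaction in every report.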
Fact Table
• Consists of measured or observed variables, identified via pointers (keys) to
the dimension tables.
• Best to store facts that are numerical measurements, continuously valued and
additive (e.g. in a Payment Fact table: amount, CustomerVendorAcct, traceNo,
returnCode, etc.).
• Each measurement is taken at the intersection of all the dimensions.
• Queries are made against the fact table, which links to multiple records in the various
dimension tables to form the result set behind the report.
• The fact table is sparse: if there is no value to record, no row is stored.
• The fact fields in the FACT table should be kept to a minimum.
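A report query over a fact table might be sketched as follows, again with sqlite3. The vendor names, dates and amounts are made up; only the idea of an additive amount joined to its dimensions comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendor_dim (
        vendor_id   INTEGER PRIMARY KEY,
        vendor_name TEXT NOT NULL
    );
    CREATE TABLE time_dim (
        time_id   INTEGER PRIMARY KEY,
        full_date TEXT NOT NULL,
        quarter   TEXT NOT NULL
    );
    -- Each fact row sits at the intersection of its dimensions.
    CREATE TABLE payment_fact (
        vendor_id INTEGER NOT NULL REFERENCES vendor_dim(vendor_id),
        time_id   INTEGER NOT NULL REFERENCES time_dim(time_id),
        amount    REAL NOT NULL   -- additive measurement
    );
""")
conn.executemany("INSERT INTO vendor_dim VALUES (?, ?)",
                 [(1, "Hydro Co"), (2, "Telco Inc")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, "2007-03-15", "Q1"), (2, "2007-06-18", "Q2")])
# Sparse: there is simply no row for Telco Inc in Q1.
conn.executemany("INSERT INTO payment_fact VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (1, 2, 50.0), (2, 2, 30.0), (1, 2, 25.0)])

# Star join: the fact table is constrained and labelled by its dimensions,
# and the additive fact is summed per group.
report = conn.execute("""
    SELECT v.vendor_name, t.quarter, SUM(f.amount) AS total
    FROM payment_fact f
    JOIN vendor_dim v ON v.vendor_id = f.vendor_id
    JOIN time_dim   t ON t.time_id   = f.time_id
    GROUP BY v.vendor_name, t.quarter
    ORDER BY v.vendor_name, t.quarter
""").fetchall()
```

The dimension attributes (vendor_name, quarter) become the row headers of the report, while the additive fact (amount) is what gets summed.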
Dimension Tables
• Store descriptions of the dimensions of the business.
• Each textual description (attribute) helps to describe a property of
the respective dimension.
• Best to store attributes that are textual, discrete and used as the
source of constraints and row headers in the user’s answer set.
– For an attribute that is a numerical measurement: if it varies continuously
every time it is sampled, store it as a fact; otherwise, store it as a
dimensional attribute (e.g. the standard cost of a product, if it does not
change often, is stored as a dimensional attribute).
FACTless Tables
• Records that something happened when there is nothing to measure
– e.g. To track the Customers that are registered to
use Mobile Banking
• Answers Business questions like: “How many signed up
for this service but never used it?”
• A FACTless table contains only the keys linking
the defined dimension tables
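The Mobile Banking question can be sketched with a factless table and one ordinary fact table. All table names, column names and rows below are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, full_date TEXT);
    -- Factless fact table: only dimension keys, no measurements.
    CREATE TABLE mobile_registration_fact (
        customer_id INTEGER NOT NULL REFERENCES customer_dim(customer_id),
        time_id     INTEGER NOT NULL REFERENCES time_dim(time_id),
        PRIMARY KEY (customer_id, time_id)
    );
    -- An ordinary fact table recording actual use of the service.
    CREATE TABLE mobile_payment_fact (
        customer_id INTEGER NOT NULL REFERENCES customer_dim(customer_id),
        time_id     INTEGER NOT NULL REFERENCES time_dim(time_id),
        amount      REAL NOT NULL
    );
""")
conn.executemany("INSERT INTO customer_dim VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Carol")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?)",
                 [(1, "2007-06-01"), (2, "2007-06-02")])
# All three customers registered; only Alice ever made a mobile payment.
conn.executemany("INSERT INTO mobile_registration_fact VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2)])
conn.execute("INSERT INTO mobile_payment_fact VALUES (1, 2, 19.99)")

# "How many signed up for this service but never used it?"
never_used = conn.execute("""
    SELECT COUNT(DISTINCT r.customer_id)
    FROM mobile_registration_fact r
    WHERE NOT EXISTS (
        SELECT 1 FROM mobile_payment_fact p
        WHERE p.customer_id = r.customer_id
    )
""").fetchone()[0]
```

The registration table carries no amount or count column; the existence of a row is the whole fact.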
9 Steps DW Design Methodology
1. Choosing the process
2. Choosing the grain
3. Identifying and conforming the dimensions
4. Choosing the facts
5. Storing pre-calculations in the fact table
6. Rounding out the dimension tables
7. Choosing the duration of the database
8. Tracking slowly changing dimensions
9. Deciding the query priorities and the query modes
Source: Ralph Kimball, Margy Ross, “The Data Warehouse Toolkit”
9 Steps DW Design Methodology
• Step 1: Choosing Process
– The chosen process (function) refers to the subject matter
of a particular data mart, for example: a Bill Payment
Process
• Step 2: Choosing The Grain
– Decide what a record of the fact table is to represent, i.e.
the grain. For example, the grain is a single Payment.
• Step 3: Identifying and conforming the dimensions
– Dimensions set the context for asking questions about the
facts in the fact table, e.g. “Who made the Bill Payment?”
• Step 4: Choosing the Facts
– Facts should be numeric and additive.
9 Steps DW Design Methodology
• Step 5: Storing pre-calculations in the fact table
– Once the facts have been selected each should be re-examined to determine
whether there are opportunities to use pre-calculations. (denormalization)
• Step 6: Rounding out the dimension tables
– What properties to include in dimension table to best describe it. Should be
intuitive and understandable
• Step 7: Choosing the duration of the database
– How long to keep the data for
• Step 8: Tracking slowly changing dimensions
– Type 1: where a changed dimension attribute is overwritten
– Type 2: where a changed dimension attribute causes a new dimension record
to be created
– Type 3: where a changed dimension attribute causes an alternate attribute to
be created so that both the old and new values of the attribute are
simultaneously accessible in the same dimension record
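The three slowly-changing-dimension types in Step 8 can be sketched in plain Python over a list of dimension rows. The is_current flag, the previous_ prefix, the city attribute and the sample customer are all assumptions introduced for this illustration; real warehouses often use effective-date columns instead.

```python
from copy import deepcopy


def scd_type1(rows, natural_key, attr, new_value):
    """Type 1: overwrite the attribute in place; history is lost."""
    rows = deepcopy(rows)
    for row in rows:
        if row["customer_num"] == natural_key:
            row[attr] = new_value
    return rows


def scd_type2(rows, natural_key, attr, new_value):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    rows = deepcopy(rows)
    next_id = max(r["customer_id"] for r in rows) + 1
    for row in rows:
        if row["customer_num"] == natural_key and row["is_current"]:
            row["is_current"] = False
            new_row = dict(row, customer_id=next_id, is_current=True)
            new_row[attr] = new_value
            rows.append(new_row)
            break  # stop before iterating over the row just appended
    return rows


def scd_type3(rows, natural_key, attr, new_value):
    """Type 3: keep the old value in an alternate 'previous_<attr>' field."""
    rows = deepcopy(rows)
    for row in rows:
        if row["customer_num"] == natural_key:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value
    return rows


dim = [{"customer_id": 1, "customer_num": "C-1001",
        "city": "Calgary", "is_current": True}]

t1 = scd_type1(dim, "C-1001", "city", "Vancouver")
t2 = scd_type2(dim, "C-1001", "city", "Vancouver")
t3 = scd_type3(dim, "C-1001", "city", "Vancouver")
```

Type 1 leaves one row with the new city, Type 2 leaves two rows (the expired one and the current one under a new surrogate key), and Type 3 leaves one row holding both city and previous_city.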
9 Steps DW Design Methodology
• Step 9: Deciding the query priorities and the
query modes
– Consider physical design issues
• Indexing for performance, Indexed Views, partitioning,
physical sort order, etc.
• Storage, backup, security
Data Warehouse
[Figure: Data Sources (1, 2, 3, … n), ETL, Data Warehouse, Reporting]
ETL
• Extraction, Transformation, Loading
• The tasks of capturing data by extracting it from
source systems, cleansing (transforming) it,
and finally loading the results into the target system.
• Can be carried out either by separate products, or by a
single integrated solution.
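A toy version of the three ETL tasks, using only the Python standard library, might look like this. The CSV layout, the cleansing rules and the payment_staging table are hypothetical; a real pipeline would read from the actual source systems.

```python
import csv
import io
import sqlite3

# Stand-in for a source-system export (hypothetical layout and values).
SOURCE_CSV = """customer_num,amount,currency
 c-1001 ,25.00,usd
C-1002,abc,USD
C-1003,40.50,Usd
"""


def extract(text):
    """Extract: read raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cleanse values, standardize encoding, drop bad rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount is not numeric
        clean.append((row["customer_num"].strip().upper(),
                      amount,
                      row["currency"].strip().upper()))
    return clean


def load(conn, rows):
    """Load: write the cleansed rows into the target table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS payment_staging (
                        customer_num TEXT, amount REAL, currency TEXT)""")
    conn.executemany("INSERT INTO payment_staging VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT customer_num, amount, currency FROM payment_staging "
    "ORDER BY customer_num").fetchall()
```

Keeping the three stages as separate functions mirrors the slide's point that they can be handled by separate products or by one integrated solution.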
DW – Technology and DBMS
• MySQL
– Scale out not scale up.
• MySQL supports Clustering, Replication, etc. You can distribute
the DW across multiple Servers
– Fast database engine, especially for bulk inserts and selects
– Lots of Open Source tools available for ETL
– MySQL is a cheaper solution which makes it more attractive to
business to make the initial investment
Thank you
Questions….
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 

An introduction to data warehousing

  • 1. Data Warehousing: An Introduction. Presented by Shahed Khalili on June 18th, 2007
  • 2. Knowledge Management & Business Intelligence
      • Transform data into usable information that makes sense
      • Support business decisions
      • Gain competitive advantage
      • Identify and analyze market and user trends
      • Identify popular and profitable services and products
      • Increase the effectiveness of marketing
      • Have the ability and flexibility to create various reports for management and clients quickly
      • Have the ability to prioritize product enhancements based on studying customers' behavioural trends
      • Detect anomalies (e.g. fraud detection)
      • and more…
  • 4. What is a Data Warehouse
      • A way to store large amounts of operational data so that it can be analyzed and used to create comprehensive, intuitive reports
      • A tool that gives management the ability to access and analyze information about its business
  • 5. What is a Data Warehouse
      • A data warehouse is a copy of transaction data specifically structured for querying and reporting.
      • Large collection of integrated, non-volatile, time-variant data from multiple sources, processed for storage in a multi-dimensional model
      Source: Ralph Kimball, Margy Ross, “The Data Warehouse Toolkit”
  • 6. Characteristics of a DW
      • Subject-oriented
          – Data that gives information about a particular subject instead of about a company's ongoing operations (e.g. CUSTOMER, FINANCIAL INSTITUTION, VENDOR).
      • Integrated
          – Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole (standardized encoding, consistent units of measure, e.g. one standard way to record a customer’s transaction across all systems).
      • Non-volatile
          – Data, once loaded into the data warehouse, does not change. Each data record represents a distinct state (event).
      • Time-variant
          – Data is stored for long durations, with a timestamp recording its state (e.g. for sampling, summary, or trend analysis).
  • 8. OLTP vs. Data Warehouse
      • OLTP
          – Online transaction processing is used at the routine operational level and is supported by transactional databases optimized for insertions, updates, deletions and some low-level queries.
      • Data Warehousing
          – Optimized for data retrieval, not routine transaction processing, and supports decision-support applications.
  • 9. OLTP vs. Data Warehouse
      OLTP                                  Data Warehouse
      Current data                          Historic data
      Detailed                              Lightly and highly summarized data
      Dynamic                               Static
      High transaction throughput           Low transaction throughput
      Transaction driven                    Analysis driven
      Large number of users, low volume     Low number of users, large volume
  • 10. Designing a DW
      • Top down
          – Business questions
          – Interview to see what the business needs to know
      • Bottom up
          – What data sources are available and what data is stored
  • 11. Reminder
      • What is a Data Warehouse
          – “Large collection of integrated, non-volatile, time-variant data from multiple sources, processed for storage in a multi-dimensional model”
  • 12. Dimensional Modeling
      • Every dimensional model (DM) is composed of one table with a composite primary key, called the FACT table, and a set of smaller tables called DIMENSION tables.
  • 13. Dimensional Modeling
      [Figure: a cube with axes Customer, Vendor, and Time; the cells hold Payments.]
      Notes: This is a simple three-dimensional data model (cube) that stores Payment facts. The x, y, and z axes represent the dimension tables, and what is inside the cube represents the FACT table. A real dimensional model is never this simple; this is only a visual representation of what it could look like. In real life, many more dimensions are required to describe the business process behind a FACT.
  • 14. Dimensional Modeling Rules
      • Each DIMENSION table has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the FACT table.
      • This forms a ‘star-like’ structure, which is called a star schema or star join.
      • All natural keys are replaced with surrogate keys. This means that every join between FACT and DIMENSION tables is based on surrogate keys, not natural keys.
      • Surrogate keys allow data in the warehouse to have some independence from the data used and produced by the OLTP systems (e.g. changing BINs).
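The surrogate-key rule can be illustrated with a small sketch. This is not from the slides: the class `SurrogateKeyMap`, the source system name "billing", and the identifiers such as "BIN-1001" are all hypothetical, chosen only to show how a warehouse-owned key can stay stable when the OLTP system changes its own identifier.

```python
from itertools import count


class SurrogateKeyMap:
    """Assigns warehouse-owned integer keys to natural keys from source systems."""

    def __init__(self):
        self._next_key = count(1)  # surrogate keys are plain integers starting at 1
        self._by_natural = {}      # (source_system, natural_key) -> surrogate key

    def key_for(self, source_system, natural_key):
        """Return the existing surrogate key, or allocate a new one."""
        lookup = (source_system, natural_key)
        if lookup not in self._by_natural:
            self._by_natural[lookup] = next(self._next_key)
        return self._by_natural[lookup]

    def rekey(self, source_system, old_natural_key, new_natural_key):
        """The source system renumbered a record: keep the same surrogate key."""
        surrogate = self._by_natural.pop((source_system, old_natural_key))
        self._by_natural[(source_system, new_natural_key)] = surrogate
        return surrogate


vendor_keys = SurrogateKeyMap()
k1 = vendor_keys.key_for("billing", "BIN-1001")  # hypothetical natural key
k2 = vendor_keys.key_for("billing", "BIN-2002")

# The OLTP system changes the identifier; fact rows that reference k1 are untouched.
k1_after = vendor_keys.rekey("billing", "BIN-1001", "BIN-9001")
print(k1, k2, k1_after, vendor_keys.key_for("billing", "BIN-9001"))  # 1 2 1 1
```

Because fact rows store only the surrogate key, a change of natural key in the source system becomes a one-row update in the mapping rather than a rewrite of the fact table.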
  • 15. Denormalizing
      • Denormalizing
          – The DW data schema is denormalized or partially denormalized to speed up data retrieval.
      • e.g. in a normalized DB we don’t store information that can be calculated from stored information. In DW design we do.
  • 16. Star Schema
      [Figure: star schema with the Payment FACT table at the center, joined to the Customer, Vendor, and Time dimension tables.]
      Payment FACT: Customer, Vendor, Time, Amount, Response Code
      Customer: Customer ID, Customer Num, Customer Name, Customer Age
      Vendor: Vendor ID, Vendor Num, Vendor Name, Vendor Account Num, Address
      Time: Time ID, Date, Day of Week, Day of Month, Month, Quarter, Year
      Note how time is denormalized and stored as a dimension.
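As a rough sketch, the star schema on this slide could be expressed as follows. SQLite (via Python's standard `sqlite3` module) is used only so the example runs anywhere; the table names (`dim_customer`, `fact_payment`, and so on), column types, and sample rows are assumptions, not part of the original deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY,  -- surrogate key
    customer_num TEXT, customer_name TEXT, customer_age INTEGER);
CREATE TABLE dim_vendor (
    vendor_id INTEGER PRIMARY KEY,    -- surrogate key
    vendor_num TEXT, vendor_name TEXT, vendor_account_num TEXT, address TEXT);
CREATE TABLE dim_time (
    time_id INTEGER PRIMARY KEY,      -- surrogate key
    date TEXT, day_of_week TEXT, day_of_month INTEGER,
    month INTEGER, quarter INTEGER, year INTEGER);
-- The fact table's composite primary key is built from the dimension keys.
CREATE TABLE fact_payment (
    customer_id INTEGER NOT NULL REFERENCES dim_customer(customer_id),
    vendor_id   INTEGER NOT NULL REFERENCES dim_vendor(vendor_id),
    time_id     INTEGER NOT NULL REFERENCES dim_time(time_id),
    amount REAL NOT NULL,
    response_code TEXT,
    PRIMARY KEY (customer_id, vendor_id, time_id));
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?, ?)",
                 [(1, "C-001", "Alice", 34), (2, "C-002", "Bob", 51)])
conn.executemany("INSERT INTO dim_vendor VALUES (?, ?, ?, ?, ?)",
                 [(1, "V-001", "City Power", "ACC-77", "1 Main St"),
                  (2, "V-002", "Metro Water", "ACC-88", "2 Side St")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?, ?, ?, ?)",
                 [(1, "2007-06-18", "Monday", 18, 6, 2, 2007),
                  (2, "2007-07-02", "Monday", 2, 7, 3, 2007)])
conn.executemany("INSERT INTO fact_payment VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 50.0, "00"), (2, 1, 1, 25.0, "00"),
                  (1, 2, 2, 40.0, "00"), (2, 2, 2, 10.0, "05")])

# A star join: the fact table is joined to each dimension it needs.
rows = conn.execute("""
    SELECT v.vendor_name, t.year, t.quarter, SUM(f.amount)
    FROM fact_payment f
    JOIN dim_vendor v ON v.vendor_id = f.vendor_id
    JOIN dim_time   t ON t.time_id   = f.time_id
    GROUP BY v.vendor_name, t.year, t.quarter
    ORDER BY v.vendor_name, t.year, t.quarter
""").fetchall()
print(rows)  # [('City Power', 2007, 2, 75.0), ('Metro Water', 2007, 3, 50.0)]
```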
  • 17. Why Denormalizing?
      • Looking at the Time dimension table:
          – We’re storing fields that can be calculated (such as day of week).
              • For example, if you are Safeway, you want to see which day of the week has the most customers so you can staff up. The question we ask the DW would be “show me the average number of transactions we process on different days of the week”.
          – If we weren’t storing the day of week, our DW would have to go through millions of transactions and calculate the day of week from each datestamp in order to match and return the results.
          – This calculation is very time consuming and the response time would be unacceptable.
          – We denormalize to reduce the response time by storing more information than [one could argue is] needed.
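The Safeway question can be sketched against a Time dimension that stores the day of week as a precomputed column. Everything below is illustrative: the two-week date range, the transaction counts, and the table and column names are assumptions made up for the example.

```python
import sqlite3
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, date TEXT, day_of_week TEXT)")
conn.execute("CREATE TABLE fact_payment (time_id INTEGER NOT NULL, amount REAL NOT NULL)")

start = date(2007, 6, 18)  # a Monday
for offset in range(14):   # two weeks of hypothetical data
    day = start + timedelta(days=offset)
    time_id = offset + 1
    name = DAY_NAMES[day.weekday()]
    # day_of_week is computed once, at load time, and stored in the dimension.
    conn.execute("INSERT INTO dim_time VALUES (?, ?, ?)", (time_id, day.isoformat(), name))
    if name == "Saturday":
        n_payments = 6 if offset < 7 else 8  # made-up busy day
    else:
        n_payments = 2
    conn.executemany("INSERT INTO fact_payment VALUES (?, ?)", [(time_id, 10.0)] * n_payments)

# "Show me the average number of transactions we process on different days of the week":
# count per calendar date first, then average those counts per stored day_of_week.
busiest = conn.execute("""
    SELECT day_of_week, AVG(n) AS avg_tx
    FROM (SELECT t.day_of_week AS day_of_week, t.date AS date, COUNT(*) AS n
          FROM fact_payment f
          JOIN dim_time t ON t.time_id = f.time_id
          GROUP BY t.day_of_week, t.date) AS per_day
    GROUP BY day_of_week
    ORDER BY avg_tx DESC, day_of_week
""").fetchall()
print(busiest[0])  # ('Saturday', 7.0)
```

The query groups on a stored text column and never applies a date function to a fact row, which is the saving the slide describes.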
  • 18. Fact Table
      • Consists of measured or observed variables, identified via pointers to the dimension tables.
      • Best to store facts that are numerical measurements, continuously valued and additive (e.g. in a Payment Fact table: amount, CustomerVendorAcct, traceNo, returnCode, etc.).
      • Each measurement is taken at the intersection of all the dimensions.
      • Queries are made to the fact table, which links to multiple records from the various dimension tables to form the result set behind the report.
      • The fact table is sparse: if there is no value to be added, it is not filled.
      • Fields in the FACT table must be kept as minimal as possible.
  • 19. Dimension Tables
      • Store descriptions of the dimensions of the business.
      • Each textual description (attribute) helps to describe a property of the respective dimension.
      • Best to store attributes that are textual, discrete and used as the source of constraints and row headers in the user’s answer set.
          – For an attribute that is a numerical measurement: if it varies continuously every time it is sampled, store it as a fact; otherwise, store it as a dimensional attribute (e.g. the standard cost of a product, if it does not change often, is stored as a dimensional attribute).
  • 20. FACTless Tables
      • Record something that happens, but with nothing to measure
          – e.g. to track the customers that are registered to use Mobile Banking
      • Answer business questions like: “How many signed up for this service but never used it?”
      • A FACTless table contains only the keys linking the defined dimension tables
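A minimal sketch of the Mobile Banking example, assuming hypothetical table names (`fact_mobile_registration`, `fact_mobile_payment`) and sample customers. The registration table holds only dimension keys; a row's existence is the fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, date TEXT);
-- Factless table: keys only, no measures.
-- A row means "this customer registered for Mobile Banking on this date".
CREATE TABLE fact_mobile_registration (
    customer_id INTEGER NOT NULL REFERENCES dim_customer(customer_id),
    time_id     INTEGER NOT NULL REFERENCES dim_time(time_id),
    PRIMARY KEY (customer_id, time_id));
-- An ordinary fact table with a measure, used here to detect actual usage.
CREATE TABLE fact_mobile_payment (
    customer_id INTEGER NOT NULL, time_id INTEGER NOT NULL, amount REAL NOT NULL);
""")

conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Carol")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?)",
                 [(1, "2007-06-18"), (2, "2007-06-19")])
conn.executemany("INSERT INTO fact_mobile_registration VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1)])
conn.execute("INSERT INTO fact_mobile_payment VALUES (1, 2, 30.0)")  # only Alice used it

# "How many signed up for this service but never used it?"
never_used = [name for (name,) in conn.execute("""
    SELECT c.customer_name
    FROM fact_mobile_registration r
    JOIN dim_customer c ON c.customer_id = r.customer_id
    WHERE NOT EXISTS (SELECT 1 FROM fact_mobile_payment p
                      WHERE p.customer_id = r.customer_id)
    ORDER BY c.customer_name
""")]
print(len(never_used), never_used)  # 2 ['Bob', 'Carol']
```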
  • 21. 9 Steps DW Design Methodology
      1. Choosing the process
      2. Choosing the grain
      3. Identifying and conforming the dimensions
      4. Choosing the facts
      5. Storing pre-calculations in the fact table
      6. Rounding out the dimension tables
      7. Choosing the duration of the database
      8. Tracking slowly changing dimensions
      9. Deciding the query priorities and the query modes
      Source: Ralph Kimball, Margy Ross, “The Data Warehouse Toolkit”
  • 22. 9 Steps DW Design Methodology
      • Step 1: Choosing the process
          – The chosen process (function) refers to the subject matter of a particular data mart, for example a Bill Payment process.
      • Step 2: Choosing the grain
          – Decide what a record of the fact table is to represent, i.e. the grain. For example, the grain is a single Payment.
      • Step 3: Identifying and conforming the dimensions
          – Dimensions set the context for asking questions about the facts in the fact table, e.g. who made the Bill Payment.
      • Step 4: Choosing the facts
          – Facts should be numeric and additive.
  • 23. 9 Steps DW Design Methodology
      • Step 5: Storing pre-calculations in the fact table
          – Once the facts have been selected, each should be re-examined to determine whether there are opportunities to use pre-calculations (denormalization).
      • Step 6: Rounding out the dimension tables
          – Decide which properties to include in each dimension table to best describe it. They should be intuitive and understandable.
      • Step 7: Choosing the duration of the database
          – How long to keep the data for
      • Step 8: Tracking slowly changing dimensions
          – Type 1: a changed dimension attribute is overwritten
          – Type 2: a changed dimension attribute causes a new dimension record to be created
          – Type 3: a changed dimension attribute causes an alternate attribute to be created, so that both the old and new values of the attribute are simultaneously accessible in the same dimension record
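The three slowly changing dimension types in Step 8 can be contrasted with a small in-memory sketch. The row layout (`surrogate_key`, `natural_key`, `valid_from`, `valid_to`), the `previous_` column prefix, and the sample vendor are assumptions made for illustration; a real warehouse would apply the same ideas to dimension tables.

```python
from datetime import date


def scd_type1(rows, natural_key, attr, new_value):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in rows:
        if row["natural_key"] == natural_key:
            row[attr] = new_value


def scd_type2(rows, natural_key, attr, new_value, changed_on):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    current = next(r for r in rows
                   if r["natural_key"] == natural_key and r["valid_to"] is None)
    current["valid_to"] = changed_on
    new_row = dict(current,
                   surrogate_key=max(r["surrogate_key"] for r in rows) + 1,
                   valid_from=changed_on,
                   valid_to=None)
    new_row[attr] = new_value
    rows.append(new_row)
    return new_row["surrogate_key"]


def scd_type3(rows, natural_key, attr, new_value):
    """Type 3: keep the old value in an alternate column of the same row."""
    for row in rows:
        if row["natural_key"] == natural_key:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value


def fresh_vendor_dimension():
    # Hypothetical starting state: one vendor row, currently valid.
    return [{"surrogate_key": 1, "natural_key": "V-001", "address": "1 Main St",
             "valid_from": date(2007, 1, 1), "valid_to": None}]


t1 = fresh_vendor_dimension()
scd_type1(t1, "V-001", "address", "9 New Rd")

t2 = fresh_vendor_dimension()
new_key = scd_type2(t2, "V-001", "address", "9 New Rd", date(2007, 6, 18))

t3 = fresh_vendor_dimension()
scd_type3(t3, "V-001", "address", "9 New Rd")

print(len(t1), len(t2), new_key, t3[0]["previous_address"])  # 1 2 2 1 Main St
```

Only Type 2 changes the row count: facts loaded before the change keep pointing at surrogate key 1, and later facts use key 2, so history is preserved without touching the fact table.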
  • 24. 9 Steps DW Design Methodology
      • Step 9: Deciding the query priorities and the query modes
          – Consider physical design issues
              • Indexing for performance, indexed views, partitioning, physical sort order, etc.
              • Storage, backup, security
  • 25. Data Warehouse
      [Figure: data sources 1, 2, 3, …, n feed an ETL stage, which loads the Data Warehouse, which in turn serves Reporting.]
  • 26. ETL
      • Extraction, Transformation, Loading
      • The tasks of capturing data by extracting it from source systems, cleansing (transforming) it, and finally loading the results into the target system.
      • Can be carried out either by separate products or by a single integrated solution.
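The three ETL stages can be sketched as three small functions. This is a toy pipeline under stated assumptions: the CSV layout, the cleansing rules (trim and upper-case identifiers, reject rows without a payment date), and the `staging_payment` table are all hypothetical, and SQLite stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical extract from a source system, including some dirty rows.
SOURCE_CSV = """customer_num,vendor_num,paid_on,amount
C-001,V-001,2007-06-18, 50.00
c-002,V-001,2007-06-18,25.00
C-001,V-002,,40.00
"""


def extract(text):
    """Extraction: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transformation: cleanse records; set aside those that cannot be repaired."""
    clean, rejected = [], []
    for rec in records:
        if not rec["paid_on"]:  # a payment without a date cannot be placed in time
            rejected.append(rec)
            continue
        clean.append((rec["customer_num"].strip().upper(),
                      rec["vendor_num"].strip().upper(),
                      rec["paid_on"],
                      float(rec["amount"])))
    return clean, rejected


def load(conn, rows):
    """Loading: bulk-insert the cleansed rows into the target table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS staging_payment (
                        customer_num TEXT, vendor_num TEXT, paid_on TEXT, amount REAL)""")
    conn.executemany("INSERT INTO staging_payment VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
clean_rows, rejected_rows = transform(extract(SOURCE_CSV))
load(conn, clean_rows)

loaded = conn.execute(
    "SELECT customer_num, amount FROM staging_payment ORDER BY customer_num").fetchall()
print(loaded, len(rejected_rows))  # [('C-001', 50.0), ('C-002', 25.0)] 1
```

Keeping the stages as separate functions mirrors the slide's point that ETL can be done by separate products or one integrated tool: each stage has a clear input and output and could be swapped independently.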
  • 27. DW – Technology and DBMS
      • MySQL
          – Scale out, not scale up.
              • MySQL supports clustering, replication, etc. You can distribute the DW across multiple servers.
          – Fast database engine, especially for bulk inserts and selects
          – Lots of open source tools available for ETL
          – MySQL is a cheaper solution, which makes the initial investment more attractive to the business