SlideShare a Scribd company logo
1 of 52
1
Data Warehousing
2
Overview
Part 1: Data Warehouses
Part 2: OLAP
Part 3: Data Mining
Part 4: Query Processing and Optimization
3
Part 1: Data Warehouses
4
Data, Data everywhere
yet ...
I can’t find the data I need
data is scattered over the network
many versions, subtle differences
I can’t get the data I need
need an expert to get the data
I can’t understand the data I
found
available data poorly documented
I can’t use the data I found
results are unexpected
data needs to be transformed from
one form to other
5
What is a Data Warehouse?
A single, complete and consistent
store of data obtained from a
variety of different sources made
available to end users in a what
they can understand and use in a
business context.
Data Characteristics:
Subject Oriented
Integrated
Non-Volatile
Time-Referenced
6
Which are our
lowest/highest margin
customers ?
Which are our
lowest/highest margin
customers ?
Who are my customers
and what products
are they buying?
Who are my customers
and what products
are they buying?
Which customers
are most likely to go
to the competition ?
Which customers
are most likely to go
to the competition ?
What impact will
new products/services
have on revenue
and margins?
What impact will
new products/services
have on revenue
and margins?
What product prom-
-otions have the biggest
impact on revenue?
What product prom-
-otions have the biggest
impact on revenue?
What is the most
effective distribution
channel?
What is the most
effective distribution
channel?
Why Data Warehousing?
WHY BI AND WHAT IS IT
Information is the life-blood of modern
organization
BI transform data stored in core business systems
into knowledge,which in turn enables you to,
•Know your customers and let your customer know you
•Streamline your business processes by aligning
technology with business goals
•Know how your business runs as a whole by
getting 360 deg view of your organization
BI Architecture
9
Decision Support
Used to manage and control business
Data is historical or point-in-time
Optimized for inquiry rather than update
Use of the system is loosely defined and
can be ad-hoc
Used by managers and end-users to
understand the business and make
judgements
10
What are the users
saying...
Data should be integrated
across the enterprise
Summary data had a real
value to the organization
Historical data held the key to
understanding data over time
What-if capabilities are
required
11
Data Warehousing --
It is a process
Technique for assembling and
managing data from various
sources for the purpose of
answering business
questions. Thus making
decisions that were not
previous possible
A decision support database
maintained separately from
the organization’s operational
database
12
Traditional RDBMS used
for OLTP
Database Systems have been used
traditionally for OLTP
clerical data processing tasks
detailed, up to date data
structured repetitive tasks
read/update a few records
isolation, recovery and integrity are critical
Will call these operational systems
13
OLTP vs Data Warehouse
OLTP
Application Oriented
Used to run business
Clerical User
Detailed data
Current up to date
Isolated Data
Repetitive access by
small transactions
Read/Update access
Warehouse (DSS)
Subject Oriented
Used to analyze
business
Manager/Analyst
Summarized and refined
Snapshot data
Integrated Data
Ad-hoc access using
large queries
Mostly read access
14
Data Warehouse
Architecture
Relational
Databases
Legacy
Data
Purchased
Data
Data Warehouse
Engine
Optimized Loader
Extraction
Cleansing
Analyze
Query
Metadata Repository
15
From the Data Warehouse
to Data Marts
Departmentally
Structured
Individually
Structured
Data Warehouse
Organizationally
Structured
Less
More
History
Normalized
Detailed
Data
Information
16
Users have different views
of Data
Organizationally
structured
OLAP
Explorers: Seek out the
unknown and previously
unsuspected rewards hiding in
the detailed data
Farmers: Harvest information
from known access paths
Tourists: Browse
information harvested
by farmers
17
Wal*Mart Case Study
Founded by Sam Walton
One the largest Super Market Chains in
the US
Wal*Mart: 2000+ Retail Stores
SAM's Clubs 100+Wholesalers Stores
18
Old Retail Paradigm
Wal*Mart
Inventory Management
Merchandise Accounts
Payable
Purchasing
Supplier Promotions:
National, Region, Store
Level
Suppliers
Accept Orders
Promote Products
Provide special
Incentives
Monitor and Track The
Incentives
Bill and Collect
Receivables
Estimate Retailer
Demands
19
New (Just-In-Time) Retail
Paradigm
No more deals
Shelf-Pass Through (POS Application)
One Unit Price
Suppliers paid once a week on ACTUAL items sold
Wal*Mart Manager
Daily Inventory Restock
Suppliers (sometimes SameDay) ship to Wal*Mart
Warehouse-Pass Through
Stock some Large Items
Delivery may come from supplier
Distribution Center
Supplier’s merchandise unloaded directly onto Wal*Mart Trucks
20
Information as a Strategic
Weapon
Daily Summary of all Sales Information
Regional Analysis of all Stores in a logical area
Specific Product Sales
Specific Supplies Sales
Trend Analysis, etc.
Wal*Mart uses information when negotiating
with
Suppliers
Advertisers etc.
21
Schema Design
Database organization
must look like business
must be recognizable by business user
approachable by business user
Must be simple
Schema Types
Star Schema
Fact Constellation Schema
Snowflake schema
22
Star Schema
A single fact table and for each dimension one
dimension table
T
i
m
e
p
r
o
d
c
u
s
t
c
i
t
y
f
a
c
t
date, custno, prodno, cityname, sales
23
Dimension Tables
Dimension tables
Define business in terms already familiar to
users
Wide rows with lots of descriptive text
Small tables (about a million rows)
Joined to fact table by a foreign key
heavily indexed
typical dimensions
time periods, geographic region (markets, cities),
products, customers, salesperson, etc.
24
Fact Table
Central table
Typical example: individual sales records
mostly raw numeric items
narrow rows, a few columns at most
large number of rows (millions to a billion)
Access via dimensions
25
Snowflake schema
Represent dimensional hierarchy directly by
normalizing tables.
Easy to maintain and saves storage
T
i
m
e
p
r
o
d
c
u
s
t
c
i
t
y
f
a
c
t
date, custno, prodno, cityname, ...
r
e
g
i
o
n
26
Fact Constellation
Fact Constellation
Multiple fact tables that share many
dimension tables
Booking and Checkout may share many
dimension tables in the hotel industry
Hotels
Travel Agents
Promotion
Room Type
Customer
Booking
Checkout
27
Data Granularity in
Warehouse
Summarized data stored
reduce storage costs
reduce cpu usage
increases performance since smaller number
of records to be processed
design around traditional high level reporting
needs
tradeoff with volume of data to be stored and
detailed usage of data
28
Granularity in Warehouse
Solution is to have dual level of granularity
Store summary data on disks
95% of DSS processing done against this data
Store detail on tapes
5% of DSS processing against this data
29
Levels of Granularity
Operational
60 days of
activity
account
activity date
amount
teller
location
account bal
account
month
# trans
withdrawals
deposits
average bal
amount
activity date
amount
account bal
monthly account
register -- up to
10 years
Not all fields
need be
archived
Banking Example
30
Data Integration Across
Sources
Trust Credit cardSavings Loans
Same data
different name
Different data
Same name
Data found here
nowhere else
Different keys
same data
31
Data Transformation
Data transformation is the foundation
for achieving single version of the truth
Major concern for IT
Data warehouse can fail if appropriate
data transformation strategy is not
developed
Sequential Legacy Relational ExternalOperational/
Source Data
Data
Transformation
Accessing Capturing Extracting Householding Filtering
Reconciling Conditioning Loading Validating Scoring
32
Data Transformation
Example
encodingunitfield
appl A - balance
appl B - bal
appl C - currbal
appl D - balcurr
appl A - pipeline - cm
appl B - pipeline - in
appl C - pipeline - feet
appl D - pipeline - yds
appl A - m,f
appl B - 1,0
appl C - x,y
appl D - male, female
Data Warehouse
33
Data Integrity Problems
Same person, different spellings
Agarwal, Agrawal, Aggarwal etc...
Multiple ways to denote company name
Persistent Systems, PSPL, Persistent Pvt. LTD.
Use of different names
mumbai, bombay
Different account numbers generated by different
applications for the same customer
Required fields left blank
Invalid product codes collected at point of sale
manual entry leads to mistakes
“in case of a problem use 9999999”
34
When to Refresh?
periodically (e.g., every night, every week)
or after significant events
on every update: not warranted unless
warehouse data require current data (up
to the minute stock quotes)
refresh policy set by administrator based
on user needs and traffic
possibly different policies for different
sources
35
Refresh techniques
Incremental techniques
detect changes on base tables: replication
servers (e.g., Sybase, Oracle, IBM Data
Propagator)
snapshots (Oracle)
transaction shipping (Sybase)
compute changes to derived and summary
tables
maintain transactional correctness for
incremental load
36
How To Detect Changes
Create a snapshot log table to record ids
of updated rows of source data and
timestamp
Detect changes by:
Defining after row triggers to update snapshot
log when source table changes
Using regular transaction log to detect
changes to source data
37
Querying Data Warehouses
SQL Extensions
Multidimensional modeling of data
OLAP
More on OLAP later …
38
Reporting Tools
Andyne Computing -- GQL
Brio -- BrioQuery
Business Objects -- Business Objects
Cognos -- Impromptu
Information Builders Inc. -- Focus for Windows
Oracle -- Discoverer2000
Platinum Technology -- SQL*Assist, ProReports
PowerSoft -- InfoMaker
SAS Institute -- SAS/Assist
Software AG -- Esperant
Sterling Software -- VISION:Data
39
Deploying Data
Warehouses
What business information
keeps you in business today?
What business information can
put you out of business
tomorrow?
What business information
should be a mouse click away?
What business conditions are
the driving the need for
business information?
40
Cultural Considerations
Not just a technology project
New way of using information
to support daily activities and
decision making
Care must be taken to prepare
organization for change
Must have organizational
backing and support
41
User Training
Users must have a higher level of IT
proficiency than for operational systems
Training to help users analyze data in the
warehouse effectively
42
Warehouse Products
Computer Associates -- CA-Ingres
Hewlett-Packard -- Allbase/SQL
Informix -- Informix, Informix XPS
Microsoft -- SQL Server
Oracle -- Oracle7, Oracle Parallel Server
Red Brick -- Red Brick Warehouse
SAS Institute -- SAS
Software AG -- ADABAS
Sybase -- SQL Server, IQ, MPP
43
Strengths of OLAP
It is a powerful visualization
tool
It provides fast, interactive
response times
It is good for analyzing time
series
It can be useful to find some
clusters and outliners
Many vendors offer OLAP
tools
44
Part 3: Data Mining
45
Why Data Mining
Credit ratings/targeted marketing:
Given a database of 100,000 names, which persons are the least likely
to default on their credit cards?
Identify likely responders to sales promotions
Fraud detection
Which types of transactions are likely to be fraudulent, given the
demographics and transactional history of a particular customer?
Customer relationship management:
Which of my customers are likely to be the most loyal, and which are
most likely to leave for a competitor? :
Data Mining helps extract
such information
46
Data mining
Process of semi-automatically analyzing
large databases to find interesting and
useful patterns
Overlaps with machine learning, statistics,
artificial intelligence and databases but
more scalable in number of features and
instances
more automated to handle heterogeneous
data
47
Application Areas
Industry Application
Finance Credit Card Analysis
Insurance Claims, Fraud Analysis
Telecommunication Call record analysis
Transport Logistics management
Consumer goods promotion analysis
Data Service providers Value added data
Utilities Power usage analysis
48
Data Mining in Use
The US Government uses Data Mining to track
fraud
A Supermarket becomes an information broker
Basketball teams use it to track game strategy
Cross Selling
Target Marketing
Holding on to Good Customers
Weeding out Bad Customers
49
Why Now?
Data is being produced
Data is being warehoused
The computing power is available
The computing power is affordable
The competitive pressures are strong
Commercial products are available
50
Data Mining works with
Warehouse Data
Data Warehousing provides
the Enterprise with a memory
Data Mining provides the
Enterprise with intelligence
51
Mining market
Around 20 to 30 mining tool vendors
Major players:
Clementine,
IBM’s Intelligent Miner,
SGI’s MineSet,
SAS’s Enterprise Miner.
All pretty much the same set of tools
Many embedded products: fraud detection,
electronic commerce applications
52
OLAP Mining integration
OLAP (On Line Analytical Processing)
Fast interactive exploration of multidim.
aggregates.
Heavy reliance on manual operations for
analysis:
Tedious and error-prone on large
multidimensional data
Ideal platform for vertical integration of mining
but needs to be interactive instead of batch.

More Related Content

What's hot

INTRODUCTION TO BUSINESS INTELLIGENCE and DATA MINING
INTRODUCTION TO BUSINESS INTELLIGENCE and DATA MININGINTRODUCTION TO BUSINESS INTELLIGENCE and DATA MINING
INTRODUCTION TO BUSINESS INTELLIGENCE and DATA MININGRajesh Math
 
Data warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designData warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designSarita Kataria
 
introduction to datawarehouse
introduction to datawarehouseintroduction to datawarehouse
introduction to datawarehousekiran14360
 
Data warehouse system and its concepts
Data warehouse system and its conceptsData warehouse system and its concepts
Data warehouse system and its conceptsGaurav Garg
 
Data warehouse
Data warehouseData warehouse
Data warehouse_123_
 
Designing the business process dimensional model
Designing the business process dimensional modelDesigning the business process dimensional model
Designing the business process dimensional modelGersiton Pila Challco
 
Real World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data WarehousingReal World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data Warehousingukc4
 
Business process modeling and analysis for data warehouse design
Business process modeling and analysis for data warehouse designBusiness process modeling and analysis for data warehouse design
Business process modeling and analysis for data warehouse designSlava Kokaev
 
DW DIMENSN MODELNG
DW DIMENSN MODELNGDW DIMENSN MODELNG
DW DIMENSN MODELNGDivya Tadi
 
Business Analysis, Query Tools, Dm unit-3
Business Analysis, Query Tools, Dm unit-3Business Analysis, Query Tools, Dm unit-3
Business Analysis, Query Tools, Dm unit-3Dr. Sunil Kr. Pandey
 
Data warehouse implementation design for a Retail business
Data warehouse implementation design for a Retail businessData warehouse implementation design for a Retail business
Data warehouse implementation design for a Retail businessArsalan Qadri
 
11666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect311666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect3ambujm
 

What's hot (20)

Data warehousing
Data warehousingData warehousing
Data warehousing
 
INTRODUCTION TO BUSINESS INTELLIGENCE and DATA MINING
INTRODUCTION TO BUSINESS INTELLIGENCE and DATA MININGINTRODUCTION TO BUSINESS INTELLIGENCE and DATA MINING
INTRODUCTION TO BUSINESS INTELLIGENCE and DATA MINING
 
Data warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designData warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-design
 
introduction to datawarehouse
introduction to datawarehouseintroduction to datawarehouse
introduction to datawarehouse
 
Data warehouse system and its concepts
Data warehouse system and its conceptsData warehouse system and its concepts
Data warehouse system and its concepts
 
3dw
3dw3dw
3dw
 
ITReady DW Day2
ITReady DW Day2ITReady DW Day2
ITReady DW Day2
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Designing the business process dimensional model
Designing the business process dimensional modelDesigning the business process dimensional model
Designing the business process dimensional model
 
Real World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data WarehousingReal World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data Warehousing
 
Business process modeling and analysis for data warehouse design
Business process modeling and analysis for data warehouse designBusiness process modeling and analysis for data warehouse design
Business process modeling and analysis for data warehouse design
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
DW DIMENSN MODELNG
DW DIMENSN MODELNGDW DIMENSN MODELNG
DW DIMENSN MODELNG
 
02 Essbase
02 Essbase02 Essbase
02 Essbase
 
Business Analysis, Query Tools, Dm unit-3
Business Analysis, Query Tools, Dm unit-3Business Analysis, Query Tools, Dm unit-3
Business Analysis, Query Tools, Dm unit-3
 
Data warehouse implementation design for a Retail business
Data warehouse implementation design for a Retail businessData warehouse implementation design for a Retail business
Data warehouse implementation design for a Retail business
 
3dw
3dw3dw
3dw
 
Chapter 2 - Retail Sales
Chapter 2 - Retail Sales Chapter 2 - Retail Sales
Chapter 2 - Retail Sales
 
Data mart
Data martData mart
Data mart
 
11666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect311666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect3
 

Viewers also liked

Lineup Card Management
Lineup Card ManagementLineup Card Management
Lineup Card Managementmark swanson
 
Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Er Bansal
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseStephen Alex
 
A First Look at San Francisco’s New ETL Job Platform
A First Look at San Francisco’s New ETL Job PlatformA First Look at San Francisco’s New ETL Job Platform
A First Look at San Francisco’s New ETL Job PlatformSafe Software
 
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Andreas Buckenhofer
 
Data warehouse design
Data warehouse designData warehouse design
Data warehouse designines beltaief
 
Supporting Data Services Marketplace using Data Virtualization
Supporting Data Services Marketplace using Data VirtualizationSupporting Data Services Marketplace using Data Virtualization
Supporting Data Services Marketplace using Data VirtualizationDenodo
 
Introduction to ETL process
Introduction to ETL process Introduction to ETL process
Introduction to ETL process Omid Vahdaty
 
Automate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile wayAutomate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile wayTorana, Inc.
 
Data Warehouse Back to Basics: Dimensional Modeling
Data Warehouse Back to Basics: Dimensional ModelingData Warehouse Back to Basics: Dimensional Modeling
Data Warehouse Back to Basics: Dimensional ModelingDunn Solutions Group
 
Operating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsOperating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsDataWorks Summit/Hadoop Summit
 
Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Andreas Buckenhofer
 
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...Denodo
 
Designing and implementing_an_etl_framework
Designing and implementing_an_etl_frameworkDesigning and implementing_an_etl_framework
Designing and implementing_an_etl_frameworkBharat Vadlamudi
 
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Andreas Buckenhofer
 
CDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der AutomobilindustrieCDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der AutomobilindustrieAndreas Buckenhofer
 
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Andreas Buckenhofer
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep LearningPoo Kuan Hoong
 
Data Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_OneData Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_OnePanchaleswar Nayak
 

Viewers also liked (20)

Lineup Card Management
Lineup Card ManagementLineup Card Management
Lineup Card Management
 
Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Dwdm 2(data warehouse)
Dwdm 2(data warehouse)
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
A First Look at San Francisco’s New ETL Job Platform
A First Look at San Francisco’s New ETL Job PlatformA First Look at San Francisco’s New ETL Job Platform
A First Look at San Francisco’s New ETL Job Platform
 
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
 
Data warehouse design
Data warehouse designData warehouse design
Data warehouse design
 
Supporting Data Services Marketplace using Data Virtualization
Supporting Data Services Marketplace using Data VirtualizationSupporting Data Services Marketplace using Data Virtualization
Supporting Data Services Marketplace using Data Virtualization
 
Introduction to ETL process
Introduction to ETL process Introduction to ETL process
Introduction to ETL process
 
Automate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile wayAutomate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile way
 
Data Warehouse Back to Basics: Dimensional Modeling
Data Warehouse Back to Basics: Dimensional ModelingData Warehouse Back to Basics: Dimensional Modeling
Data Warehouse Back to Basics: Dimensional Modeling
 
Operating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsOperating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and Improvements
 
Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 1 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
 
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
 
Designing and implementing_an_etl_framework
Designing and implementing_an_etl_frameworkDesigning and implementing_an_etl_framework
Designing and implementing_an_etl_framework
 
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
 
CDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der AutomobilindustrieCDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
 
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
 
Accelerating Data Warehouse Modernization
Accelerating Data Warehouse ModernizationAccelerating Data Warehouse Modernization
Accelerating Data Warehouse Modernization
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
 
Data Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_OneData Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_One
 

Similar to Data Warehouse-Final

Datawarehouse Overview
Datawarehouse OverviewDatawarehouse Overview
Datawarehouse Overviewashok kumar
 
Oracle Hyperion overview
Oracle Hyperion overviewOracle Hyperion overview
Oracle Hyperion overviewClick4learning
 
UNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data MiningUNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data MiningNandakumar P
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingwork
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingPrithwis Mukerjee
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysNEWYORKSYS-IT SOLUTIONS
 
dataWarehouse.pptx
dataWarehouse.pptxdataWarehouse.pptx
dataWarehouse.pptxhqlm1
 
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olapData Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olap
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olapSalah Amean
 
krithi-talk-impact.ppt
krithi-talk-impact.pptkrithi-talk-impact.ppt
krithi-talk-impact.pptKRISHNARAJ207
 
Dw & etl concepts
Dw & etl conceptsDw & etl concepts
Dw & etl conceptsjeshocarme
 
Data warehousing and online analytical processing
Data warehousing and online analytical processingData warehousing and online analytical processing
Data warehousing and online analytical processingVijayasankariS
 
Dataware housing
Dataware housingDataware housing
Dataware housingwork
 
Data Warehousing Datamining Concepts
Data Warehousing Datamining ConceptsData Warehousing Datamining Concepts
Data Warehousing Datamining Conceptsraulmisir
 
DATA Warehousing & Data Mining
DATA Warehousing & Data MiningDATA Warehousing & Data Mining
DATA Warehousing & Data Miningcpjcollege
 

Similar to Data Warehouse-Final (20)

Datawarehouse Overview
Datawarehouse OverviewDatawarehouse Overview
Datawarehouse Overview
 
Oracle Hyperion overview
Oracle Hyperion overviewOracle Hyperion overview
Oracle Hyperion overview
 
UNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data MiningUNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data Mining
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
DWM
DWMDWM
DWM
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in Datawarehousing
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
 
dataWarehouse.pptx
dataWarehouse.pptxdataWarehouse.pptx
dataWarehouse.pptx
 
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olapData Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olap
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
 
Msbi by quontra us
Msbi by quontra usMsbi by quontra us
Msbi by quontra us
 
krithi-talk-impact.ppt
krithi-talk-impact.pptkrithi-talk-impact.ppt
krithi-talk-impact.ppt
 
krithi-talk-impact.ppt
krithi-talk-impact.pptkrithi-talk-impact.ppt
krithi-talk-impact.ppt
 
Dw & etl concepts
Dw & etl conceptsDw & etl concepts
Dw & etl concepts
 
IT Ready - DW: 1st Day
IT Ready - DW: 1st Day IT Ready - DW: 1st Day
IT Ready - DW: 1st Day
 
Data warehousing and online analytical processing
Data warehousing and online analytical processingData warehousing and online analytical processing
Data warehousing and online analytical processing
 
CTP Data Warehouse
CTP Data WarehouseCTP Data Warehouse
CTP Data Warehouse
 
Datawarehouse and OLAP
Datawarehouse and OLAPDatawarehouse and OLAP
Datawarehouse and OLAP
 
Dataware housing
Dataware housingDataware housing
Dataware housing
 
Data Warehousing Datamining Concepts
Data Warehousing Datamining ConceptsData Warehousing Datamining Concepts
Data Warehousing Datamining Concepts
 
DATA Warehousing & Data Mining
DATA Warehousing & Data MiningDATA Warehousing & Data Mining
DATA Warehousing & Data Mining
 

Data Warehouse-Final

  • 2. 2 Overview Part 1: Data Warehouses Part 2: OLAP Part 3: Data Mining Part 4: Query Processing and Optimization
  • 3. 3 Part 1: Data Warehouses
  • 4. 4 Data, Data everywhere yet ... I can’t find the data I need data is scattered over the network many versions, subtle differences I can’t get the data I need need an expert to get the data I can’t understand the data I found available data poorly documented I can’t use the data I found results are unexpected data needs to be transformed from one form to other
  • 5. 5 What is a Data Warehouse? A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a what they can understand and use in a business context. Data Characteristics: Subject Oriented Integrated Non-Volatile Time-Referenced
  • 6. 6 Which are our lowest/highest margin customers ? Which are our lowest/highest margin customers ? Who are my customers and what products are they buying? Who are my customers and what products are they buying? Which customers are most likely to go to the competition ? Which customers are most likely to go to the competition ? What impact will new products/services have on revenue and margins? What impact will new products/services have on revenue and margins? What product prom- -otions have the biggest impact on revenue? What product prom- -otions have the biggest impact on revenue? What is the most effective distribution channel? What is the most effective distribution channel? Why Data Warehousing?
  • 7. WHY BI AND WHAT IS IT Information is the life-blood of modern organization BI transform data stored in core business systems into knowledge,which in turn enables you to, •Know your customers and let your customer know you •Streamline your business processes by aligning technology with business goals •Know how your business runs as a whole by getting 360 deg view of your organization
  • 9. 9 Decision Support Used to manage and control business Data is historical or point-in-time Optimized for inquiry rather than update Use of the system is loosely defined and can be ad-hoc Used by managers and end-users to understand the business and make judgements
  • 10. 10 What are the users saying... Data should be integrated across the enterprise Summary data had a real value to the organization Historical data held the key to understanding data over time What-if capabilities are required
  • 11. 11 Data Warehousing -- It is a process Technique for assembling and managing data from various sources for the purpose of answering business questions. Thus making decisions that were not previous possible A decision support database maintained separately from the organization’s operational database
  • 12. 12 Traditional RDBMS used for OLTP Database Systems have been used traditionally for OLTP clerical data processing tasks detailed, up to date data structured repetitive tasks read/update a few records isolation, recovery and integrity are critical Will call these operational systems
  • 13. 13 OLTP vs Data Warehouse OLTP Application Oriented Used to run business Clerical User Detailed data Current up to date Isolated Data Repetitive access by small transactions Read/Update access Warehouse (DSS) Subject Oriented Used to analyze business Manager/Analyst Summarized and refined Snapshot data Integrated Data Ad-hoc access using large queries Mostly read access
  • 15. 15 From the Data Warehouse to Data Marts Departmentally Structured Individually Structured Data Warehouse Organizationally Structured Less More History Normalized Detailed Data Information
  • 16. 16 Users have different views of Data Organizationally structured OLAP Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data Farmers: Harvest information from known access paths Tourists: Browse information harvested by farmers
  • 17. 17 Wal*Mart Case Study Founded by Sam Walton One the largest Super Market Chains in the US Wal*Mart: 2000+ Retail Stores SAM's Clubs 100+Wholesalers Stores
  • 18. 18 Old Retail Paradigm Wal*Mart Inventory Management Merchandise Accounts Payable Purchasing Supplier Promotions: National, Region, Store Level Suppliers Accept Orders Promote Products Provide special Incentives Monitor and Track The Incentives Bill and Collect Receivables Estimate Retailer Demands
  • 19. 19 New (Just-In-Time) Retail Paradigm No more deals Shelf-Pass Through (POS Application) One Unit Price Suppliers paid once a week on ACTUAL items sold Wal*Mart Manager Daily Inventory Restock Suppliers (sometimes SameDay) ship to Wal*Mart Warehouse-Pass Through Stock some Large Items Delivery may come from supplier Distribution Center Supplier’s merchandise unloaded directly onto Wal*Mart Trucks
  • 20. 20 Information as a Strategic Weapon Daily Summary of all Sales Information Regional Analysis of all Stores in a logical area Specific Product Sales Specific Supplies Sales Trend Analysis, etc. Wal*Mart uses information when negotiating with Suppliers Advertisers etc.
  • 21. 21 Schema Design Database organization must look like business must be recognizable by business user approachable by business user Must be simple Schema Types Star Schema Fact Constellation Schema Snowflake schema
  • 22. 22 Star Schema A single fact table and for each dimension one dimension table T i m e p r o d c u s t c i t y f a c t date, custno, prodno, cityname, sales
  • 23. 23 Dimension Tables Dimension tables Define business in terms already familiar to users Wide rows with lots of descriptive text Small tables (about a million rows) Joined to fact table by a foreign key heavily indexed typical dimensions time periods, geographic region (markets, cities), products, customers, salesperson, etc.
  • 24. 24 Fact Table Central table Typical example: individual sales records mostly raw numeric items narrow rows, a few columns at most large number of rows (millions to a billion) Access via dimensions
  • 25. 25 Snowflake schema Represent dimensional hierarchy directly by normalizing tables. Easy to maintain and saves storage T i m e p r o d c u s t c i t y f a c t date, custno, prodno, cityname, ... r e g i o n
  • 26. 26 Fact Constellation Fact Constellation Multiple fact tables that share many dimension tables Booking and Checkout may share many dimension tables in the hotel industry Hotels Travel Agents Promotion Room Type Customer Booking Checkout
  • 27. 27 Data Granularity in Warehouse Summarized data stored reduce storage costs reduce cpu usage increases performance since smaller number of records to be processed design around traditional high level reporting needs tradeoff with volume of data to be stored and detailed usage of data
  • 28. 28 Granularity in Warehouse Solution is to have dual level of granularity Store summary data on disks 95% of DSS processing done against this data Store detail on tapes 5% of DSS processing against this data
  • 29. 29 Levels of Granularity Operational 60 days of activity account activity date amount teller location account bal account month # trans withdrawals deposits average bal amount activity date amount account bal monthly account register -- up to 10 years Not all fields need be archived Banking Example
  • 30. 30 Data Integration Across Sources Trust Credit cardSavings Loans Same data different name Different data Same name Data found here nowhere else Different keys same data
  • 31. 31 Data Transformation Data transformation is the foundation for achieving single version of the truth Major concern for IT Data warehouse can fail if appropriate data transformation strategy is not developed Sequential Legacy Relational ExternalOperational/ Source Data Data Transformation Accessing Capturing Extracting Householding Filtering Reconciling Conditioning Loading Validating Scoring
  • 32. 32 Data Transformation Example encodingunitfield appl A - balance appl B - bal appl C - currbal appl D - balcurr appl A - pipeline - cm appl B - pipeline - in appl C - pipeline - feet appl D - pipeline - yds appl A - m,f appl B - 1,0 appl C - x,y appl D - male, female Data Warehouse
  • 33. 33 Data Integrity Problems Same person, different spellings Agarwal, Agrawal, Aggarwal etc... Multiple ways to denote company name Persistent Systems, PSPL, Persistent Pvt. LTD. Use of different names mumbai, bombay Different account numbers generated by different applications for the same customer Required fields left blank Invalid product codes collected at point of sale manual entry leads to mistakes “in case of a problem use 9999999”
  • 34. 34 When to Refresh? periodically (e.g., every night, every week) or after significant events on every update: not warranted unless warehouse data require current data (up to the minute stock quotes) refresh policy set by administrator based on user needs and traffic possibly different policies for different sources
  • 35. 35 Refresh techniques Incremental techniques detect changes on base tables: replication servers (e.g., Sybase, Oracle, IBM Data Propagator) snapshots (Oracle) transaction shipping (Sybase) compute changes to derived and summary tables maintain transactional correctness for incremental load
  • 36. 36 How To Detect Changes Create a snapshot log table to record ids of updated rows of source data and timestamp Detect changes by: Defining after row triggers to update snapshot log when source table changes Using regular transaction log to detect changes to source data
  • 37. 37 Querying Data Warehouses SQL Extensions Multidimensional modeling of data OLAP More on OLAP later …
  • 38. 38 Reporting Tools Andyne Computing -- GQL Brio -- BrioQuery Business Objects -- Business Objects Cognos -- Impromptu Information Builders Inc. -- Focus for Windows Oracle -- Discoverer2000 Platinum Technology -- SQL*Assist, ProReports PowerSoft -- InfoMaker SAS Institute -- SAS/Assist Software AG -- Esperant Sterling Software -- VISION:Data
  • 39. 39 Deploying Data Warehouses What business information keeps you in business today? What business information can put you out of business tomorrow? What business information should be a mouse click away? What business conditions are the driving the need for business information?
  • 40. 40 Cultural Considerations Not just a technology project New way of using information to support daily activities and decision making Care must be taken to prepare organization for change Must have organizational backing and support
  • 41. 41 User Training Users must have a higher level of IT proficiency than for operational systems Training to help users analyze data in the warehouse effectively
  • 42. 42 Warehouse Products Computer Associates -- CA-Ingres Hewlett-Packard -- Allbase/SQL Informix -- Informix, Informix XPS Microsoft -- SQL Server Oracle -- Oracle7, Oracle Parallel Server Red Brick -- Red Brick Warehouse SAS Institute -- SAS Software AG -- ADABAS Sybase -- SQL Server, IQ, MPP
  • 43. 43 Strengths of OLAP It is a powerful visualization tool It provides fast, interactive response times It is good for analyzing time series It can be useful to find some clusters and outliners Many vendors offer OLAP tools
  • 44. 44 Part 3: Data Mining
  • 45. 45 Why Data Mining Credit ratings/targeted marketing: Given a database of 100,000 names, which persons are the least likely to default on their credit cards? Identify likely responders to sales promotions Fraud detection Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer? Customer relationship management: Which of my customers are likely to be the most loyal, and which are most likely to leave for a competitor? : Data Mining helps extract such information
  • 46. 46 Data mining Process of semi-automatically analyzing large databases to find interesting and useful patterns Overlaps with machine learning, statistics, artificial intelligence and databases but more scalable in number of features and instances more automated to handle heterogeneous data
  • 47. 47 Application Areas Industry Application Finance Credit Card Analysis Insurance Claims, Fraud Analysis Telecommunication Call record analysis Transport Logistics management Consumer goods promotion analysis Data Service providers Value added data Utilities Power usage analysis
  • 48. 48 Data Mining in Use The US Government uses Data Mining to track fraud A Supermarket becomes an information broker Basketball teams use it to track game strategy Cross Selling Target Marketing Holding on to Good Customers Weeding out Bad Customers
  • 49. 49 Why Now? Data is being produced Data is being warehoused The computing power is available The computing power is affordable The competitive pressures are strong Commercial products are available
  • 50. 50 Data Mining works with Warehouse Data Data Warehousing provides the Enterprise with a memory Data Mining provides the Enterprise with intelligence
  • 51. 51 Mining market Around 20 to 30 mining tool vendors Major players: Clementine, IBM’s Intelligent Miner, SGI’s MineSet, SAS’s Enterprise Miner. All pretty much the same set of tools Many embedded products: fraud detection, electronic commerce applications
  • 52. 52 OLAP Mining integration OLAP (On Line Analytical Processing) Fast interactive exploration of multidim. aggregates. Heavy reliance on manual operations for analysis: Tedious and error-prone on large multidimensional data Ideal platform for vertical integration of mining but needs to be interactive instead of batch.

Editor's Notes

  1. Absolute: 40 M$ 40M$, expected to grow 10 times by 2000 --Forrester research
  2. OLAP refers to Online Analytical Processing. I always wondered what was the analytical part in olap products? OLAP -- bunch of aggregates and simple group-bys on sums and average is not analysis. Interactive speed for selects/drill-downs/rollups/ no joins “analysis” manually and the products meet the “Online” part of the promise by pre-computing the aggregates. They offer a bare-bones RISC like functionality using which analysts do most of the work manually. This talk is about investigating if we can do some of the analysis too? When you have 5 dimensions, with avg. 3 levels hierarchy on each aggregating more than a million rows, manual exploration can get tedious. Goal is to add more complex operations although called mining think of them more like CISC functionalities.. Mining products provide the analysis part but they do it batched rather than online. Greater success of OLAP means people find this form of interactive analysis quite attractive.