05_Decision Support and OLAP.pdf

Decision Support, Data Warehousing, and OLAP

Decision Support and OLAP

Information technology to help the knowledge worker (executive, manager, analyst) make faster and better decisions.

Decision Support and OLAP
• What were the sales volumes by region and product category for the last year?
• How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?
• Which orders should we fill to maximize revenues?
• Will a 10% discount increase sales volume sufficiently?
• Which of two new medications will result in the best outcome: higher recovery rate and shorter hospital stay?

On-Line Analytical Processing (OLAP) is an element of decision support systems (DSS).

Evolution

1960s: Batch reports
• hard to find and analyze information
• inflexible and expensive; every new request must be reprogrammed

1970s: Terminal-based DSS and EIS (executive information systems)
• still inflexible, not integrated with desktop tools

1980s: Desktop data access and analysis tools
• query tools, spreadsheets, GUIs
• easier to use, but access only operational databases

1990s: Data warehousing with integrated OLAP engines and tools

OLTP vs. OLAP

OLTP (Online Transaction Processing) systems process individual transactions (insert, update, delete) directly, typically over a network.

OLAP (Online Analytical Processing) systems are built to support planning, problem solving, and decision support.

OLTP vs. OLAP
Item | OLTP | OLAP
Data source | Operational data; OLTP holds the original data. | Consolidated data; OLAP data comes from OLTP.
Data function | Controlling and running the main business tasks. | Planning, problem solving, and decision support.
Data shown | Ongoing business processes. | Various business activities.
Queries used | Simple queries. | Complex queries.

OLTP vs. OLAP
Item | OLTP | OLAP
Speed of access | Faster. | Depends on the amount of data involved; indexes can make it faster.
Space required | Smaller. | Larger; needs more indexing than OLTP.
Database design | Normalized, with many tables. | Denormalized, with fewer tables, using star or snowflake schemas.
User | IT professional. | Knowledge worker.
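To make the "simple query" versus "complex queries" row concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the rows are hypothetical. The OLTP-style statement touches a single row by key inside a short transaction; the OLAP-style query scans and aggregates many rows.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_no INTEGER PRIMARY KEY, region TEXT, "
    "category TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "West", "Beverages", 2023, 120.0),
        (2, "West", "Dairy", 2023, 80.0),
        (3, "East", "Beverages", 2023, 200.0),
        (4, "East", "Beverages", 2022, 50.0),
    ],
)

# OLTP-style work: a short transaction that updates one row by its key.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE orders SET amount = amount + 10 WHERE order_no = ?", (2,))

# OLAP-style work: scan many rows and aggregate them along two dimensions.
sales_by_region_category = conn.execute(
    "SELECT region, category, SUM(amount) FROM orders "
    "WHERE year = 2023 GROUP BY region, category ORDER BY region, category"
).fetchall()
print(sales_by_region_category)
# [('East', 'Beverages', 200.0), ('West', 'Beverages', 120.0), ('West', 'Dairy', 90.0)]
```

In a real deployment the two workloads would run on separate systems, which is the motivation for a separate warehouse discussed below.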

Data Warehouse

A decision support database that is maintained separately from the organization’s operational databases.

A data warehouse is a
• subject-oriented,
• integrated,
• time-varying,
• non-volatile
collection of data that is used primarily in organizational decision making.
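The "time-varying" and "non-volatile" properties can be illustrated with a small append-only store. This is a hypothetical sketch (the `SnapshotStore` class and its methods are not from the source): each load adds a dated version instead of overwriting the previous value, so historical values stay queryable with an "as of" lookup.

```python
from bisect import bisect_right
from datetime import date

class SnapshotStore:
    """Hypothetical append-only store: values are never overwritten (non-volatile),
    and each value carries the date from which it is valid (time-varying)."""

    def __init__(self):
        self._history = {}  # key -> list of (valid_from, value), kept in date order

    def load(self, key, valid_from, value):
        versions = self._history.setdefault(key, [])
        if versions and valid_from <= versions[-1][0]:
            raise ValueError("loads must arrive in date order; history is not rewritten")
        versions.append((valid_from, value))

    def as_of(self, key, when):
        """Return the value valid on `when`, or None if nothing was loaded by then."""
        versions = self._history.get(key, [])
        i = bisect_right([d for d, _ in versions], when)
        return versions[i - 1][1] if i else None

store = SnapshotStore()
store.load("customer:42:city", date(2021, 1, 1), "Palo Alto")
store.load("customer:42:city", date(2023, 6, 1), "San Jose")
print(store.as_of("customer:42:city", date(2022, 12, 31)))  # Palo Alto
print(store.as_of("customer:42:city", date(2024, 1, 1)))    # San Jose
```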

Why a Separate Data Warehouse?

Performance
• Operational databases are designed and tuned for known transactions and workloads.
• Complex OLAP queries would degrade performance for operational transactions.
• Special data organization, access, and implementation methods are needed for multidimensional views and queries.


Function
• Missing data: decision support requires historical data, which operational databases do not typically maintain.
• Data consolidation: decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: operational databases and external sources.
• Data quality: different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled.
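A minimal sketch of the reconciliation step, assuming two hypothetical sources (`crm` and `billing`) that encode gender and dates differently. The mapping tables, field names, and records are invented for illustration; the point is that each source is translated into one common representation before loading.

```python
from datetime import datetime

# Hypothetical mapping tables: each source encodes the same facts differently.
GENDER_CODES = {
    "crm": {"M": "male", "F": "female"},
    "billing": {1: "male", 2: "female"},
}
DATE_FORMATS = {"crm": "%Y-%m-%d", "billing": "%d/%m/%Y"}

def reconcile(source, record):
    """Convert one source record into the warehouse's common representation."""
    return {
        "customer_id": str(record["id"]).strip(),
        "gender": GENDER_CODES[source][record["gender"]],
        "signup_date": datetime.strptime(record["signup"], DATE_FORMATS[source]).date(),
    }

crm_row = {"id": " 42 ", "gender": "F", "signup": "2023-03-01"}
billing_row = {"id": 42, "gender": 2, "signup": "01/03/2023"}

a = reconcile("crm", crm_row)
b = reconcile("billing", billing_row)
print(a == b)  # True: both sources now describe the customer identically
```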

Data Warehousing Market

Hardware: servers, storage, clients

Warehouse DBMSs

Tools

Market growing from
• $2B in 1995 to $8B in 1998 (Meta Group)
• $1.5B today to $6.9B in 1999 (Gartner Group)

Systems integration and consulting

Already deployed in many industries: manufacturing, retail, financial, insurance, transportation, telecom, utilities, healthcare.

Data Warehousing Architecture
[Figure: data warehousing architecture. Components: operational DBs and external sources; extract, transform, load, and refresh; metadata repository; monitoring and administration; data marts; OLAP servers that serve analysis, query/reporting, and data mining tools.]
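The extract, transform, load, and refresh steps in the architecture can be sketched as a small Python pipeline. This assumes a hypothetical operational source whose rows carry an `updated_at` timestamp, so a refresh extracts only rows changed since the previous run; the function names and cleaning rules are illustrative, not part of the source.

```python
from datetime import datetime

def extract(source_rows, since):
    """Incremental extract: only rows changed after the previous refresh."""
    return [r for r in source_rows if r["updated_at"] > since]

def transform(rows):
    """Standardize representations: trim and upper-case region codes, parse amounts."""
    return [
        {"order_no": r["order_no"],
         "region": r["region"].strip().upper(),
         "amount": float(r["amount"])}
        for r in rows
    ]

def load(warehouse, rows):
    """Append cleaned rows; existing warehouse rows are left untouched."""
    warehouse.extend(rows)

def refresh(warehouse, source_rows, last_refresh):
    """Run one extract-transform-load cycle and return the new watermark."""
    changed = extract(source_rows, last_refresh)
    load(warehouse, transform(changed))
    return max((r["updated_at"] for r in changed), default=last_refresh)

# Hypothetical operational rows.
source = [
    {"order_no": 1, "region": " west", "amount": "120", "updated_at": datetime(2024, 1, 1)},
    {"order_no": 2, "region": "east ", "amount": "80.5", "updated_at": datetime(2024, 1, 2)},
]
warehouse = []
mark = refresh(warehouse, source, datetime.min)  # initial load: both rows
source.append({"order_no": 3, "region": "West", "amount": "10",
               "updated_at": datetime(2024, 1, 3)})
mark = refresh(warehouse, source, mark)          # refresh: only order 3
print([r["region"] for r in warehouse])          # ['WEST', 'EAST', 'WEST']
```

Because `load` only appends, a source row that changes again would arrive as a new version rather than overwrite the old one, which matches the non-volatile property.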

Three-Tier Architecture

Warehouse database server
• Almost always a relational DBMS; rarely flat files

OLAP servers
• Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operations.
• Multidimensional OLAP (MOLAP): special-purpose server that directly implements multidimensional data and operations.

Clients
• Query and reporting tools
• Analysis tools
• Data mining tools (e.g., trend analysis, prediction)
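The ROLAP idea of mapping multidimensional operations onto standard relational ones can be sketched in a few lines: a roll-up request such as "total sales by city" becomes a SQL GROUP BY. The `rollup_sql` helper and the `sales` table are hypothetical; the query runs on Python's built-in sqlite3 module.

```python
import sqlite3

def rollup_sql(fact_table, measure, group_by):
    """Hypothetical helper: translate a roll-up request into a relational query.
    Identifiers are assumed to be trusted names, not user input."""
    cols = ", ".join(group_by)
    return (f"SELECT {cols}, SUM({measure}) AS total FROM {fact_table} "
            f"GROUP BY {cols} ORDER BY {cols}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, product TEXT, day TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Palo Alto", "Cola", "3/1", 10.0),
    ("Palo Alto", "Milk", "3/1", 30.0),
    ("San Jose", "Cola", "3/2", 20.0),
    ("San Jose", "Cola", "3/3", 5.0),
])

# "Total sales by city": the product and day dimensions are aggregated away.
query = rollup_sql("sales", "amount", ["city"])
print(query)
print(conn.execute(query).fetchall())  # [('Palo Alto', 40.0), ('San Jose', 25.0)]
```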

Design and Operational Process

Define architecture. Do capacity planning.

Integrate DB and OLAP servers, storage, and client tools.

Design warehouse schema and views.

Design physical warehouse organization: data placement, partitioning, access methods.

Connect sources: gateways, ODBC drivers, wrappers.

Design and implement scripts for data extract, load, and refresh.

Define metadata and populate repository.

Design and implement end-user applications.

Roll out warehouse and applications.

Monitor the warehouse.

OLAP for Decision Support

The goal of OLAP is to support ad-hoc querying for the business analyst.

Business analysts are familiar with spreadsheets.

Extend the spreadsheet analysis model to work with warehouse data:
• Large data sets
• Semantically enriched to understand business terms (e.g., time, geography)
• Combined with reporting features

A multidimensional view of data is the foundation of OLAP.

Multidimensional Data Model

The database is a set of facts (points) in a multidimensional space.

A fact has a measure dimension
• the quantity that is analyzed, e.g., sale, budget

A set of dimensions on which data is analyzed
• e.g., store, product, and date associated with a sale amount


Dimensions form a sparsely populated coordinate system.

Each dimension has a set of attributes
• e.g., owner, city, and county of a store

Attributes of a dimension may be related by a partial order
• Hierarchy: e.g., street → county → city
• Lattice: e.g., date → month → year, date → week → year
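The date lattice can be sketched as roll-up functions between attribute levels. This hypothetical Python example assumes ISO weeks for the week level (the source does not say which week convention it means). It shows why date → month → year and date → week → year form a lattice rather than a single hierarchy: weeks do not nest inside months, and the year reached through the week path can differ from the calendar year.

```python
from datetime import date

# Hypothetical roll-up functions for each edge of the date lattice.
def date_to_month(d):
    return (d.year, d.month)

def month_to_year(m):
    return m[0]

def date_to_week(d):
    iso = d.isocalendar()
    return (iso[0], iso[1])  # (ISO year, ISO week number)

def week_to_year(w):
    return w[0]

d = date(2021, 1, 1)
print(month_to_year(date_to_month(d)))  # 2021 via date -> month -> year
print(week_to_year(date_to_week(d)))    # 2020 via date -> week -> year (ISO week 53 of 2020)
```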

Multidimensional Data
[Figure: sales volume as a function of time, city, and product, drawn as a cube with Product (Juice, Cola, Milk, Cream), Date (3/1 to 3/4), and City axes.]

Operations in Multidimensional Data Model

Aggregation (roll-up)
• dimension reduction: e.g., total sales by city
• summarization over aggregate hierarchy: e.g., total sales by city and year → total sales by region and by year

Selection (slice) defines a subcube
• e.g., sales where city = Palo Alto and date = 1/15/96

Navigation to detailed data (drill-down)
• e.g., (sales - expense) by city, top 3% of cities by average income

Visualization operations (e.g., pivot)
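Roll-up, slice, and drill-down can be sketched over an in-memory list of facts. The fact rows, the city-to-region assignment, and the `roll_up` and `slice_` helpers are hypothetical; a real OLAP server would push this work into the database or a cube engine. Drill-down is shown here simply as re-aggregating at a finer level (city instead of region).

```python
from collections import defaultdict

# Hypothetical facts: dimensions (city, region, year, product) plus a sales measure.
FACTS = [
    {"city": "Palo Alto", "region": "West", "year": 1996, "product": "Cola", "sales": 10},
    {"city": "Palo Alto", "region": "West", "year": 1996, "product": "Milk", "sales": 30},
    {"city": "San Jose", "region": "West", "year": 1996, "product": "Cola", "sales": 20},
    {"city": "Boston", "region": "East", "year": 1997, "product": "Milk", "sales": 5},
]

def roll_up(facts, dims, measure="sales"):
    """Aggregate the measure over every dimension not listed in `dims`."""
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f[measure]
    return dict(totals)

def slice_(facts, **conditions):
    """Keep only facts with the given fixed dimension values (a subcube)."""
    return [f for f in facts if all(f[d] == v for d, v in conditions.items())]

by_city_year = roll_up(FACTS, ["city", "year"])      # total sales by city and year
by_region_year = roll_up(FACTS, ["region", "year"])  # climb the hierarchy: city -> region
west = slice_(FACTS, region="West")                  # selection (slice)
drill = roll_up(west, ["region", "city"])            # drill down: region total -> per city
print(by_region_year)  # {('West', 1996): 60, ('East', 1997): 5}
```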

A Visual Operation: Pivot (Rotate)
[Figure: the sales cube rotated to present a different view, with axes labeled Product (Juice, Cola, Milk, Cream) and Date (3/1 to 3/4).]
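Pivoting changes only the presentation, not the data. A minimal Python sketch (the `pivot` helper and the sample facts are hypothetical): the same facts are laid out with products as rows and dates as columns, then rotated so that dates become the rows.

```python
def pivot(facts, row_dim, col_dim, measure="sales"):
    """Lay out the measure as a grid: one row per row_dim value, one column per col_dim value."""
    rows = sorted({f[row_dim] for f in facts})
    cols = sorted({f[col_dim] for f in facts})
    grid = {r: {c: 0 for c in cols} for r in rows}
    for f in facts:
        grid[f[row_dim]][f[col_dim]] += f[measure]
    return rows, cols, grid

# Hypothetical facts.
facts = [
    {"product": "Cola", "date": "3/1", "sales": 10},
    {"product": "Milk", "date": "3/1", "sales": 30},
    {"product": "Cola", "date": "3/2", "sales": 20},
]

# Original view (product x date), then the rotated view (date x product).
for row_dim, col_dim in [("product", "date"), ("date", "product")]:
    rows, cols, grid = pivot(facts, row_dim, col_dim)
    print(f"{row_dim} by {col_dim}:", cols)
    for r in rows:
        print(" ", r, [grid[r][c] for c in cols])
```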

Approaches to OLAP Servers

Relational OLAP (ROLAP)
• Relational and specialized relational DBMS to store and manage warehouse data
• OLAP middleware to support missing pieces
  – Optimize for each DBMS backend
  – Aggregation navigation logic
  – Additional tools and services
• Examples: MicroStrategy, MetaCube (Informix)
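Aggregation navigation logic routes a query to a precomputed summary table when one can answer it. The sketch below is a simplified, hypothetical version (the `Aggregate` record, table names, and row counts are invented): it picks the smallest materialized aggregate whose grouping dimensions include everything the query needs, assumes an additive measure such as SUM, and ignores hierarchies, so it would not derive region totals from a city-level aggregate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Aggregate:
    table: str         # name of a materialized summary table (hypothetical)
    dims: frozenset    # dimensions the summary is grouped by
    row_count: int     # size estimate used to pick the cheapest candidate

def choose_table(query_dims, aggregates, base_table):
    """Pick the smallest summary containing every dimension the query groups or
    filters on; fall back to the base fact table when none qualifies."""
    needed = frozenset(query_dims)
    candidates = [a for a in aggregates if needed <= a.dims]
    if not candidates:
        return base_table
    return min(candidates, key=lambda a: a.row_count).table

AGGS = [
    Aggregate("sales_by_city_month", frozenset({"city", "month", "year"}), 50_000),
    Aggregate("sales_by_region_year", frozenset({"region", "year"}), 200),
    Aggregate("sales_by_product_year", frozenset({"product", "year"}), 1_000),
]

print(choose_table({"region", "year"}, AGGS, "sales_fact"))  # sales_by_region_year
print(choose_table({"city", "year"}, AGGS, "sales_fact"))    # sales_by_city_month
print(choose_table({"customer"}, AGGS, "sales_fact"))        # sales_fact
```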


Multidimensional OLAP (MOLAP)
• Array-based storage structures
• Direct access to array data structures
• Examples: Essbase (Arbor), Accumate (Kenan)

Domain-specific enrichment
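Array-based storage with direct access can be sketched as a dense cube: each dimension member maps to an integer position, and a cell's location in one flat array is computed arithmetically (row-major order) instead of being found by a search or join. The `DenseCube` class and the sample values are hypothetical. Because every combination gets a slot, a dense layout wastes space when, as noted earlier, dimensions form a sparsely populated coordinate system; production MOLAP engines need some strategy for sparse data, which this sketch ignores.

```python
import math
from itertools import product

class DenseCube:
    """Hypothetical array-based cube: members map to integer positions and
    each cell lives at a computed offset in a single flat list."""

    def __init__(self, dims):
        # dims: list of (dimension_name, [members, ...]) in a fixed order
        self.names = [name for name, _ in dims]
        self.members = [list(members) for _, members in dims]
        self.index = [{m: i for i, m in enumerate(ms)} for ms in self.members]
        self.sizes = [len(ms) for ms in self.members]
        # Dense storage: one slot for every combination, even empty ones.
        self.cells = [0.0] * math.prod(self.sizes)

    def _offset(self, coords):
        # Row-major offset: direct positional access, no search or join.
        off = 0
        for idx_map, size, member in zip(self.index, self.sizes, coords):
            off = off * size + idx_map[member]
        return off

    def set(self, coords, value):
        self.cells[self._offset(coords)] = value

    def get(self, coords):
        return self.cells[self._offset(coords)]

    def total(self, dim_name, member):
        """Sum every cell whose `dim_name` coordinate equals `member`."""
        k = self.names.index(dim_name)
        axes = [[member] if i == k else ms for i, ms in enumerate(self.members)]
        return sum(self.get(coords) for coords in product(*axes))

cube = DenseCube([
    ("product", ["Juice", "Cola", "Milk", "Cream"]),
    ("city", ["Palo Alto", "San Jose"]),
    ("date", ["3/1", "3/2", "3/3", "3/4"]),
])
cube.set(("Cola", "Palo Alto", "3/1"), 10.0)
cube.set(("Cola", "San Jose", "3/2"), 20.0)
cube.set(("Milk", "Palo Alto", "3/1"), 30.0)
print(cube.get(("Cola", "San Jose", "3/2")))  # 20.0
print(cube.total("product", "Cola"))          # 30.0
print(len(cube.cells))                        # 32 slots, only 3 non-zero
```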

Relational DBMS as Warehouse Server

Schema design

Specialized scan, indexing, and join techniques

Handling of aggregate views (querying and materialization)

Supporting query language extensions beyond SQL

Complex query processing and optimization

Data partitioning and parallelism

Warehouse Database Schema

ER design techniques are not appropriate.

Design should reflect the multidimensional view:
• Star schema
• Snowflake schema
• Fact constellation schema

Example of a Star Schema
[Figure: a central fact table joined to six dimension tables.]
Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, DateKey, CityName, Quantity, TotalPrice
Order: Order No, Order Date
Salesperson: SalespersonID, SalespersonName, City, Quota
Customer: Customer No, Customer Name, Customer Address, City
Product: ProductNO, ProdName, ProdDescr, Category, CategoryDescription, UnitPrice
Date: DateKey, Date
City: CityName, State, Country
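A runnable sketch of part of this star schema using Python's built-in sqlite3 module. Table and column names are adapted to snake_case, only the Product, Date, and City dimensions are included, and the month/year columns on `date_dim` and all rows are hypothetical additions. The query is a typical star join: every fact references one row in each dimension, so each dimension is a single join away from the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (prod_no INTEGER PRIMARY KEY, prod_name TEXT,
                          category TEXT, unit_price REAL);
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, full_date TEXT,
                          month INTEGER, year INTEGER);
CREATE TABLE city_dim    (city_name TEXT PRIMARY KEY, state TEXT, country TEXT);
-- Central fact table: one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    order_no    INTEGER,
    prod_no     INTEGER REFERENCES product_dim(prod_no),
    date_key    INTEGER REFERENCES date_dim(date_key),
    city_name   TEXT    REFERENCES city_dim(city_name),
    quantity    INTEGER,
    total_price REAL
);
INSERT INTO product_dim VALUES (1, 'Cola', 'Beverages', 1.5), (2, 'Milk', 'Dairy', 2.0);
INSERT INTO date_dim VALUES (20240301, '2024-03-01', 3, 2024), (20240302, '2024-03-02', 3, 2024);
INSERT INTO city_dim VALUES ('Palo Alto', 'CA', 'USA'), ('Austin', 'TX', 'USA');
INSERT INTO sales_fact VALUES
    (1, 1, 20240301, 'Palo Alto', 10, 15.0),
    (2, 2, 20240301, 'Austin',     5, 10.0),
    (3, 1, 20240302, 'Austin',     4,  6.0);
""")

# Star join: each dimension is one hop from the fact table.
rows = conn.execute("""
    SELECT c.state, p.category, SUM(f.total_price)
    FROM sales_fact f
    JOIN product_dim p ON p.prod_no = f.prod_no
    JOIN city_dim c    ON c.city_name = f.city_name
    JOIN date_dim d    ON d.date_key = f.date_key
    WHERE d.year = 2024
    GROUP BY c.state, p.category
    ORDER BY c.state, p.category
""").fetchall()
print(rows)  # [('CA', 'Beverages', 15.0), ('TX', 'Beverages', 6.0), ('TX', 'Dairy', 10.0)]
```

Note that state and country sit directly in `city_dim`: the city → state → country hierarchy is flattened into one table rather than captured explicitly.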

Star Schema

A single fact table and a single table for each dimension

Every fact points to one tuple in each of the dimensions and has additional attributes

Does not capture hierarchies directly

Generated keys are used for performance and maintenance reasons

Fact constellation: multiple fact tables that share many dimension tables
• Example: projected expense and actual expense may share dimension tables
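The projected-versus-actual expense example of a fact constellation can be sketched with sqlite3. All table names and rows are hypothetical, and the sketch assumes both fact tables are stored at the same grain (one row per department per month), so they can be matched directly on their shared dimension keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept_dim  (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE month_dim (month_key INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
-- Two fact tables sharing the same dimension tables.
CREATE TABLE projected_expense (dept_id INTEGER, month_key INTEGER, amount REAL);
CREATE TABLE actual_expense    (dept_id INTEGER, month_key INTEGER, amount REAL);
INSERT INTO dept_dim VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO month_dim VALUES (202401, 1, 2024), (202402, 2, 2024);
INSERT INTO projected_expense VALUES (1, 202401, 100), (2, 202401, 80), (1, 202402, 100);
INSERT INTO actual_expense    VALUES (1, 202401, 120), (2, 202401, 70), (1, 202402, 90);
""")

# Shared dimensions let the two facts be compared along the same axes.
variance = conn.execute("""
    SELECT d.dept_name, m.year, m.month, p.amount, a.amount, a.amount - p.amount
    FROM projected_expense p
    JOIN actual_expense a ON a.dept_id = p.dept_id AND a.month_key = p.month_key
    JOIN dept_dim d       ON d.dept_id = p.dept_id
    JOIN month_dim m      ON m.month_key = p.month_key
    ORDER BY m.year, m.month, d.dept_name
""").fetchall()
for row in variance:
    print(row)
```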

Example of a Snowflake Schema
[Figure: the same fact table, with hierarchical dimensions normalized into chains of tables.]
Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, DateKey, CityName, Quantity, TotalPrice
Order: Order No, Order Date
Customer: Customer No, Customer Name, Customer Address
Salesperson: SalespersonID, SalespersonName, City, Quota
Product: ProductNO, ProdName, ProdDescr, CategoryName, UnitPrice
Category: CategoryName, CategoryDescr
Date: DateKey, Date, Month
Month: Month, Year
Year: Year
City: CityName, StateName
State: StateName, Country

Snowflake Schema

Represents dimensional hierarchies directly by normalizing the dimension tables

Easy to maintain

Saves storage, but it is alleged to reduce the effectiveness of browsing (Kimball)
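A minimal sqlite3 sketch of snowflaked Product and City dimensions, with names adapted from the example and all rows hypothetical. Each category description and each state's country is now stored once (the storage saving), but rolling up to country and category needs a chain of joins (fact → city → state, fact → product → category) instead of one join per dimension, which is the browsing cost the Kimball remark refers to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: category details live in their own table.
CREATE TABLE category (category_name TEXT PRIMARY KEY, category_descr TEXT);
CREATE TABLE product  (product_no INTEGER PRIMARY KEY, prod_name TEXT,
                       category_name TEXT REFERENCES category(category_name));
-- Snowflaked location dimension: city -> state.
CREATE TABLE state    (state_name TEXT PRIMARY KEY, country TEXT);
CREATE TABLE city     (city_name TEXT PRIMARY KEY,
                       state_name TEXT REFERENCES state(state_name));
CREATE TABLE sales_fact (prod_no INTEGER, city_name TEXT, total_price REAL);

INSERT INTO category VALUES ('Beverages', 'Drinks'), ('Dairy', 'Milk products');
INSERT INTO product  VALUES (1, 'Cola', 'Beverages'), (2, 'Milk', 'Dairy');
INSERT INTO state    VALUES ('CA', 'USA'), ('TX', 'USA');
INSERT INTO city     VALUES ('Palo Alto', 'CA'), ('Austin', 'TX');
INSERT INTO sales_fact VALUES (1, 'Palo Alto', 15.0), (2, 'Austin', 10.0), (1, 'Austin', 6.0);
""")

# Rolling up to country and category description walks the normalized chains.
rows = conn.execute("""
    SELECT s.country, c.category_descr, SUM(f.total_price)
    FROM sales_fact f
    JOIN city ci    ON ci.city_name = f.city_name
    JOIN state s    ON s.state_name = ci.state_name
    JOIN product p  ON p.product_no = f.prod_no
    JOIN category c ON c.category_name = p.category_name
    GROUP BY s.country, c.category_descr
    ORDER BY c.category_descr
""").fetchall()
print(rows)  # [('USA', 'Drinks', 21.0), ('USA', 'Milk products', 10.0)]
```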
