2. OLTP Vs OLAP
Feature OLTP OLAP
Level of data Detailed Aggregated
Amount of data per
transaction
Small Large
Views Pre-defined User-defined
Typical write
operation
Update, insert, delete Bulk insert
“age” of data Current (60-90 days) Historical 5-10 years and
also current
Number of users High Low-Med
Tables Flat tables Multi-Dimensional tables
Database size Med (109 B – 1012 B) High (1012 B – 1015 B)
Query Optimizing Requires experience Already “optimized”
Data availability High Low-Med
3. OLAP framework for decision support.
Physical implementation techniques: MOLAP, ROLAP,
HOLAP, and DOLAP.
Star schema design.
Topics
4. Relationship between DWH & OLAP
DataWarehouse & OLAP go together.
Analysis supported by OLAP
DWH and OLAP
5. Supporting The Human Thought Process
How many such query sequences can be programmed in advance?
THOUGHT PROCESS QUERY SEQUENCE
An enterprise wide fall in profit
Profit down by a large percentage
consistently during last quarter only.
Rest is OK
What is special about last quarter ?
Products alone doing OK, but North
region is most problematic.
What was the quarterly sales during
last year ??
What was the quarterly sales at
regional level during last year ??
What was the monthly sale for last
quarter group by products
What was the monthly sale of
products in north at store level
group by products purchased
OK. So the problem is the high cost of
products purchased
in north.
What was the quarterly sales at
product level during last year?
?
What was the monthly sale for last
quarter group by region
6. Analysis isAd-hoc
Analysis is interactive (user driven)
Analysis is iterative
Answer to one question leads to a dozen more
Analysis is directional
Drill Down
Roll Up
Pivot
Analysis of the Example
More in
subsequent
slides
7. Not feasible to write predefined queries.
Fails to remain user_driven (becomes programmer driven).
Fails to remain ad_hoc and hence is not interactive.
Enable ad-hoc query support
Business user can not build his/her own queries (does not know SQL,
should not know it).
On_the_go SQL generation and execution too slow.
Challenges
8. Contradiction
Want to compute answers in advance, but don't know the questions
Solution
Compute answers to “all” possible “queries”. But how?
NOTE: Queries are multidimensional aggregates at some level
Challenges
9. A Conceptual Hierarchy (Dimensions)
Province KPK Punjab...
Division MultanLahorePeshawarMardan ......
Lahore ... Gugranwala
City
Zone GulbergDefense ...
District LahorePeshawar
ALL ALLALL ALL
10. OLAP = On-line analytical processing.
OLAP is a characterization of applications, not a database
design technique.
Analytical processing requires access to complex aggregations
(as opposed to record-level access).
Idea is to provide
very fast response time
For ad-hoc queries
in order to facilitate iterative decision-making.
Where Does OLAP Fit In?
11. Where Does OLAP Fit In?
Transaction
Data
Presentation
Tools
Reports
OLAP
Data Cube
(MOLAP)
Data
Loading
?
Decision
Maker
12. Information is conceptually viewed as “cubes” for simplifying the way in
which users access, view, and analyze data.
Quantitative values are known as “facts” or “measures.”
e.g., sales $, units sold, etc.
Descriptive categories are known as “dimensions.”
e.g., geography, time, product, scenario (budget or actual), etc.
Dimensions are often organized in hierarchies that represent levels of detail
in the data (e.g., UPC, SKU, product subcategory, product category,
etc.).
Vendors try to make OLAP synonymous with a technology
Where Does OLAP Fit In?
13. FACTS: Quantitative values (numbers) or “measures.”
e.g., units sold, sales $, Co, Kg etc.
DIMENSIONS: Descriptive categories.
e.g., time, geography, product etc.
DIM often organized in hierarchies representing levels of detail in the
data (e.g., week, month, quarter, year, decade etc.)
Facts and Dimensions
14. MOLAP: OLAP implemented with a multi-dimensional database.
ROLAP: OLAP implemented with a relational database.
HOLAP: OLAP implemented with a hybrid of multi-dimensional and
relational database technologies.
DOLAP: OLAP implemented for desktop decision support environments.
OLAP Implementations
15. Sales volume as a function of product, month, and region
Multidimensional DataProduct
Month
Dimensions: Product, Location, Time
Hierarchical summarization paths
Industry Region Year
Category Country Quarter
Product City Month Week
Office Day
16. A sample Data Cube
Date
Country
sum
sum
TV
VCR
PC
1Qtr 2Qtr 3Qtr 4Qtr
U.S.A
Canada
Mexico
sum
Total annual sales
of TV in U.S.A.
17. Fast: Delivers information to the user at a fairly constant rate. Most
queries should be delivered to the user in five seconds or less.
Analysis: Performs basic numerical and statistical analysis of the data, pre-
defined by an application developer or defined ad hoc by the user.
Example?
Shared: Implements the security requirements necessary for sharing
potentially confidential data across a large user population.
OLAP FASMI Test
18. Multi-dimensional: The essential characteristic of OLAP.
Information: Accesses all the data and information necessary and relevant
for the application, wherever it may reside and not limited by volume.
...from the OLAP Report by Pendse and Creeth.
OLAP FASMI Test
19. Choose between MOLAP, ROLAP and HOLAP physical storage:
MOLAP: ~10s of GB. Fast response
ROLAP: 100s of GB ~TB. Slower
HOLAP: Partition the data. Store Fact table and infrequent data in
ROLAP, the rest in MOLAP
OLAP Implementations
20. OLAP has historically been implemented through use of multi-dimensional
databases (MDDs).
Dimensions are key business factors for analysis:
geographies (zip, state, region,...)
products (item, product category, product department,...)
dates (day, week, month, quarter, year,...)
Very high performance via fast look-up into “cube” data structure to
retrieve pre-calculated results.
“Cube” data structures allow pre-calculation of aggregate results for
each possible combination of dimensional values.
Use of application programming interface (API) for access via front-end
tools.
MOLAP Implementations