Lecture 09
OLAP Implementation TechniquesOLAP Implementation TechniquesOLAP Implementation TechniquesOLAP Implementation Techniques
OLTP Vs OLAP
Feature OLTP OLAP
Level of data Detailed Aggregated
Amount of data per
transaction
Small Large
Views Pre-defined User-defined
Typical write
operation
Update, insert, delete Bulk insert
“age” of data Current (60-90 days) Historical 5-10 years and
also current
Number of users High Low-Med
Tables Flat tables Multi-Dimensional tables
Database size Med (109 B – 1012 B) High (1012 B – 1015 B)
Query Optimizing Requires experience Already “optimized”
Data availability High Low-Med
OLAP framework for decision support.
Physical implementation techniques: MOLAP, ROLAP,
HOLAP, and DOLAP.
Star schema design.
Topics
Relationship between DWH & OLAP
DataWarehouse & OLAP go together.
Analysis supported by OLAP
DWH and OLAP
Supporting The Human Thought Process
How many such query sequences can be programmed in advance?
THOUGHT PROCESS QUERY SEQUENCE
An enterprise wide fall in profit
Profit down by a large percentage
consistently during last quarter only.
Rest is OK
What is special about last quarter ?
Products alone doing OK, but North
region is most problematic.
What was the quarterly sales during
last year ??
What was the quarterly sales at
regional level during last year ??
What was the monthly sale for last
quarter group by products
What was the monthly sale of
products in north at store level
group by products purchased
OK. So the problem is the high cost of
products purchased
in north.
What was the quarterly sales at
product level during last year?
?
What was the monthly sale for last
quarter group by region
Analysis isAd-hoc
Analysis is interactive (user driven)
Analysis is iterative
Answer to one question leads to a dozen more
Analysis is directional
Drill Down
Roll Up
Pivot
Analysis of the Example
More in
subsequent
slides
Not feasible to write predefined queries.
Fails to remain user_driven (becomes programmer driven).
Fails to remain ad_hoc and hence is not interactive.
Enable ad-hoc query support
Business user can not build his/her own queries (does not know SQL,
should not know it).
On_the_go SQL generation and execution too slow.
Challenges
Contradiction
Want to compute answers in advance, but don't know the questions
Solution
Compute answers to “all” possible “queries”. But how?
NOTE: Queries are multidimensional aggregates at some level
Challenges
A Conceptual Hierarchy (Dimensions)
Province KPK Punjab...
Division MultanLahorePeshawarMardan ......
Lahore ... Gugranwala
City
Zone GulbergDefense ...
District LahorePeshawar
ALL ALLALL ALL
OLAP = On-line analytical processing.
OLAP is a characterization of applications, not a database
design technique.
Analytical processing requires access to complex aggregations
(as opposed to record-level access).
Idea is to provide
very fast response time
For ad-hoc queries
in order to facilitate iterative decision-making.
Where Does OLAP Fit In?
Where Does OLAP Fit In?
Transaction
Data
Presentation
Tools
Reports
OLAP
Data Cube
(MOLAP)
Data
Loading
?
Decision
Maker
Information is conceptually viewed as “cubes” for simplifying the way in
which users access, view, and analyze data.
Quantitative values are known as “facts” or “measures.”
e.g., sales $, units sold, etc.
Descriptive categories are known as “dimensions.”
e.g., geography, time, product, scenario (budget or actual), etc.
Dimensions are often organized in hierarchies that represent levels of detail
in the data (e.g., UPC, SKU, product subcategory, product category,
etc.).
Vendors try to make OLAP synonymous with a technology
Where Does OLAP Fit In?
FACTS: Quantitative values (numbers) or “measures.”
e.g., units sold, sales $, Co, Kg etc.
DIMENSIONS: Descriptive categories.
e.g., time, geography, product etc.
DIM often organized in hierarchies representing levels of detail in the
data (e.g., week, month, quarter, year, decade etc.)
Facts and Dimensions
MOLAP: OLAP implemented with a multi-dimensional database.
ROLAP: OLAP implemented with a relational database.
HOLAP: OLAP implemented with a hybrid of multi-dimensional and
relational database technologies.
DOLAP: OLAP implemented for desktop decision support environments.
OLAP Implementations
Sales volume as a function of product, month, and region
Multidimensional DataProduct
Month
Dimensions: Product, Location, Time
Hierarchical summarization paths
Industry Region Year
Category Country Quarter
Product City Month Week
Office Day
A sample Data Cube
Date
Country
sum
sum
TV
VCR
PC
1Qtr 2Qtr 3Qtr 4Qtr
U.S.A
Canada
Mexico
sum
Total annual sales
of TV in U.S.A.
Fast: Delivers information to the user at a fairly constant rate. Most
queries should be delivered to the user in five seconds or less.
Analysis: Performs basic numerical and statistical analysis of the data, pre-
defined by an application developer or defined ad hoc by the user.
Example?
Shared: Implements the security requirements necessary for sharing
potentially confidential data across a large user population.
OLAP FASMI Test
Multi-dimensional: The essential characteristic of OLAP.
Information: Accesses all the data and information necessary and relevant
for the application, wherever it may reside and not limited by volume.
...from the OLAP Report by Pendse and Creeth.
OLAP FASMI Test
Choose between MOLAP, ROLAP and HOLAP physical storage:
MOLAP: ~10s of GB. Fast response
ROLAP: 100s of GB ~TB. Slower
HOLAP: Partition the data. Store Fact table and infrequent data in
ROLAP, the rest in MOLAP
OLAP Implementations
OLAP has historically been implemented through use of multi-dimensional
databases (MDDs).
Dimensions are key business factors for analysis:
geographies (zip, state, region,...)
products (item, product category, product department,...)
dates (day, week, month, quarter, year,...)
Very high performance via fast look-up into “cube” data structure to
retrieve pre-calculated results.
“Cube” data structures allow pre-calculation of aggregate results for
each possible combination of dimensional values.
Use of application programming interface (API) for access via front-end
tools.
MOLAP Implementations

Cs437 lecture 09

  • 1.
    Lecture 09 OLAP ImplementationTechniquesOLAP Implementation TechniquesOLAP Implementation TechniquesOLAP Implementation Techniques
  • 2.
    OLTP Vs OLAP FeatureOLTP OLAP Level of data Detailed Aggregated Amount of data per transaction Small Large Views Pre-defined User-defined Typical write operation Update, insert, delete Bulk insert “age” of data Current (60-90 days) Historical 5-10 years and also current Number of users High Low-Med Tables Flat tables Multi-Dimensional tables Database size Med (109 B – 1012 B) High (1012 B – 1015 B) Query Optimizing Requires experience Already “optimized” Data availability High Low-Med
  • 3.
    OLAP framework fordecision support. Physical implementation techniques: MOLAP, ROLAP, HOLAP, and DOLAP. Star schema design. Topics
  • 4.
    Relationship between DWH& OLAP DataWarehouse & OLAP go together. Analysis supported by OLAP DWH and OLAP
  • 5.
    Supporting The HumanThought Process How many such query sequences can be programmed in advance? THOUGHT PROCESS QUERY SEQUENCE An enterprise wide fall in profit Profit down by a large percentage consistently during last quarter only. Rest is OK What is special about last quarter ? Products alone doing OK, but North region is most problematic. What was the quarterly sales during last year ?? What was the quarterly sales at regional level during last year ?? What was the monthly sale for last quarter group by products What was the monthly sale of products in north at store level group by products purchased OK. So the problem is the high cost of products purchased in north. What was the quarterly sales at product level during last year? ? What was the monthly sale for last quarter group by region
  • 6.
    Analysis isAd-hoc Analysis isinteractive (user driven) Analysis is iterative Answer to one question leads to a dozen more Analysis is directional Drill Down Roll Up Pivot Analysis of the Example More in subsequent slides
  • 7.
    Not feasible towrite predefined queries. Fails to remain user_driven (becomes programmer driven). Fails to remain ad_hoc and hence is not interactive. Enable ad-hoc query support Business user can not build his/her own queries (does not know SQL, should not know it). On_the_go SQL generation and execution too slow. Challenges
  • 8.
    Contradiction Want to computeanswers in advance, but don't know the questions Solution Compute answers to “all” possible “queries”. But how? NOTE: Queries are multidimensional aggregates at some level Challenges
  • 9.
    A Conceptual Hierarchy(Dimensions) Province KPK Punjab... Division MultanLahorePeshawarMardan ...... Lahore ... Gugranwala City Zone GulbergDefense ... District LahorePeshawar ALL ALLALL ALL
  • 10.
    OLAP = On-lineanalytical processing. OLAP is a characterization of applications, not a database design technique. Analytical processing requires access to complex aggregations (as opposed to record-level access). Idea is to provide very fast response time For ad-hoc queries in order to facilitate iterative decision-making. Where Does OLAP Fit In?
  • 11.
    Where Does OLAPFit In? Transaction Data Presentation Tools Reports OLAP Data Cube (MOLAP) Data Loading ? Decision Maker
  • 12.
    Information is conceptuallyviewed as “cubes” for simplifying the way in which users access, view, and analyze data. Quantitative values are known as “facts” or “measures.” e.g., sales $, units sold, etc. Descriptive categories are known as “dimensions.” e.g., geography, time, product, scenario (budget or actual), etc. Dimensions are often organized in hierarchies that represent levels of detail in the data (e.g., UPC, SKU, product subcategory, product category, etc.). Vendors try to make OLAP synonymous with a technology Where Does OLAP Fit In?
  • 13.
    FACTS: Quantitative values(numbers) or “measures.” e.g., units sold, sales $, Co, Kg etc. DIMENSIONS: Descriptive categories. e.g., time, geography, product etc. DIM often organized in hierarchies representing levels of detail in the data (e.g., week, month, quarter, year, decade etc.) Facts and Dimensions
  • 14.
    MOLAP: OLAP implementedwith a multi-dimensional database. ROLAP: OLAP implemented with a relational database. HOLAP: OLAP implemented with a hybrid of multi-dimensional and relational database technologies. DOLAP: OLAP implemented for desktop decision support environments. OLAP Implementations
  • 15.
    Sales volume asa function of product, month, and region Multidimensional DataProduct Month Dimensions: Product, Location, Time Hierarchical summarization paths Industry Region Year Category Country Quarter Product City Month Week Office Day
  • 16.
    A sample DataCube Date Country sum sum TV VCR PC 1Qtr 2Qtr 3Qtr 4Qtr U.S.A Canada Mexico sum Total annual sales of TV in U.S.A.
  • 17.
    Fast: Delivers informationto the user at a fairly constant rate. Most queries should be delivered to the user in five seconds or less. Analysis: Performs basic numerical and statistical analysis of the data, pre- defined by an application developer or defined ad hoc by the user. Example? Shared: Implements the security requirements necessary for sharing potentially confidential data across a large user population. OLAP FASMI Test
  • 18.
    Multi-dimensional: The essentialcharacteristic of OLAP. Information: Accesses all the data and information necessary and relevant for the application, wherever it may reside and not limited by volume. ...from the OLAP Report by Pendse and Creeth. OLAP FASMI Test
  • 19.
    Choose between MOLAP,ROLAP and HOLAP physical storage: MOLAP: ~10s of GB. Fast response ROLAP: 100s of GB ~TB. Slower HOLAP: Partition the data. Store Fact table and infrequent data in ROLAP, the rest in MOLAP OLAP Implementations
  • 20.
    OLAP has historicallybeen implemented through use of multi-dimensional databases (MDDs). Dimensions are key business factors for analysis: geographies (zip, state, region,...) products (item, product category, product department,...) dates (day, week, month, quarter, year,...) Very high performance via fast look-up into “cube” data structure to retrieve pre-calculated results. “Cube” data structures allow pre-calculation of aggregate results for each possible combination of dimensional values. Use of application programming interface (API) for access via front-end tools. MOLAP Implementations