SlideShare a Scribd company logo
1 of 47
Dr. Abdul Basit Siddiqui
Assistant Professor
FURC, Islamabad
DWH & OLAP
Relationship between DWH & OLAP
Data Warehouse & OLAP go
together
Analysis supported by OLAP
3
Supporting the human thought process
How many such query sequences can be programmed in advance?
4
THOUGHT PROCESSTHOUGHT PROCESS QUERY SEQUENCEQUERY SEQUENCE
An enterprise wide fall in profit
Profit down by a large
percentage consistently during
last quarter only. Rest is OK
What is special about last
quarter ?
Products alone doing OK, but
North region is most
problematic.
What was the quarterly sales
during last year ??
What was the quarterly sales at
regional level during last year ??
What was the monthly sale for
last quarter group by products
What was the monthly sale of
products in north at store level
group by products purchased
OK. So the problem is the high
cost of products purchased
in north.
What was the quarterly sales at
product level during last year?

?
What was the monthly sale for
last quarter group by region
Analysis of last example
Analysis is Ad-hoc
Analysis is interactive (user driven)
Analysis is iterative
Answer to one question leads to a dozen more
Analysis is directional
Drill Down
Roll Up
Pivot
5
More in
subsequent
slides
Challenges…
Not feasible to write predefined queries.
Fails to remain user_driven (becomes programmer
driven).
Fails to remain ad_hoc and hence is not interactive.
Enable ad-hoc query support
Business user can not build his/her own queries (does
not know SQL, should not know it).
On_the_go SQL generation and execution too slow.
6
Challenges
Contradiction
Want to compute answers in advance, but don't know
the questions
Solution
Compute answers to “all” possible “queries”. But how?
NOTE: Queries are multidimensional aggregates at
some level
7
“All” possible queries (level aggregates)
8
Province Frontier Punjab...
Division MultanLahorePeshawarMardan ......
Lahore ... Gugranwala
City
Zone GulbergDefense ...
District LahorePeshawar
ALL ALLALL ALL
OLAP: Facts & Dimensions
FACTS: Quantitative values (numbers) or
“measures.”
e.g., units sold, sales $, Co
, Kg etc.
DIMENSIONS: Descriptive categories.
e.g., time, geography, product etc.
DIM often organized in hierarchies representing levels
of detail in the data (e.g., week, month, quarter, year,
decade etc.).
9
Where Does OLAP Fit In?
 It is a classification of applications, NOT a
database design technique.
 Analytical processing uses multi-level
aggregates, instead of record level access.
 Objective is to support very
I. fast
II. iterative and
III. ad-hoc decision-making
10
Where does OLAP fit in?
11
Transaction
Data

Presentation
Tools
Reports
OLAP
Data Cube
(MOLAP)
Data
Loading

?
Decision
Maker
OLTP vs. OLAP
Feature OLTP OLAP
Level of data Detailed Aggregated
Amount of data
per transaction
Small Large
Views Pre-defined User-defined
Typical write
operation
Update, insert, delete Bulk insert
“age” of data Current (60-90 days) Historical 5-10 years and
also current
Number of users High Low-Med
Tables Flat tables Multi-Dimensional tables
Database size Med (109
B – 1012
B) High (1012
B – 1015
B)
Query Optimizing Requires experience Already “optimized”
Data availability High Low-Med
12
OLAP FASMI Test
Fast: Delivers information to the user at a fairly constant rate. Most
queries answered in under five seconds.
Analysis: Performs basic numerical and statistical analysis of the
data, pre-defined by an application developer or defined ad-hocly by
the user.
Shared: Implements the security requirements necessary for sharing
potentially confidential data across a large user population.
Multi-dimensional: The essential characteristic of OLAP.
Information: Accesses all the data and information necessary and
relevant for the application, wherever it may reside and not limited
by volume.
13
14
OLAP Implementations
1. MOLAP: OLAP implemented with a multi-dimensional data
structure.
2. ROLAP: OLAP implemented with a relational database.
3. HOLAP: OLAP implemented as a hybrid of MOLAP and ROLAP.
4. DOLAP: OLAP implemented for desktop decision support
environments.
15
MOLAP Implementations
OLAP has historically been implemented using a
multi_dimensional data structure or “cube”.
Dimensions are key business factors for analysis:
Geographies (city, district, division, province,...)
Products (item, product category, product
department,...)
Dates (day, week, month, quarter, year,...)
Very high performance achieved by O(1) time lookup
into “cube” data structure to retrieve pre_aggregated
results.
16
MOLAP Implementations
No standard query language for querying MOLAP
- No SQL !
Vendors provide proprietary languages allowing business users to
create queries that involve pivots, drilling down, or rolling up.
- E.g. MDX of Microsoft
- Languages generally involve extensive visual (click and drag)
support.
- Application Programming Interface (API)’s also provided for
probing the cubes.
17
Aggregations in MOLAP
18
 Sales volume as a function of (i) product, (ii)Sales volume as a function of (i) product, (ii)
time, and (iii) geographytime, and (iii) geography
 A cube structure created to handle this.A cube structure created to handle this.
Dimensions: Product, Geography, Time
Industry
Category
Product
Hierarchical summarization pathsHierarchical summarization paths
Product
G
eog
Time
w1 w2 w3 w4 w5 w6
Milk
Bread
Eggs
Butter
Jam
Juice
N
E
W
S
12
13
45
8
23
10
Province
Division
District
City
Zone
Year
Quarter
Month Week
Day
Cube operations
Drill down: get more details
e.g., given summarized sales as above, find breakup of
sales by city within each region, or within Sindh
Rollup: summarize data
e.g., given sales data, summarize sales for last year by
product category and region
Slice and dice: select and project
e.g.: Sales of soft-drinks in Karachi during last quarter
Pivot: change the view of data
19
Querying the cube
-
5,000
10,000
15,000
20,000
25,000
30,000
35,000
40,000
2001 2002
Juices Soda Drinks
20
Drill-down
-
2,000
4,000
6,000
8,000
10,000
12,000
Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4
OJ RK 8UP PK MJ BU AJ
2001 2002
-
2,000
4,000
6,000
8,000
10,000
12,000
14,000
Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4
Juices Soda Drinks
2001 2002
Drill-Down
Roll-Up
Querying the cube: Pivoting
-
5,000
10,000
15,000
20,000
25,000
30,000
35,000
40,000
2001 2002
Juices Soda Drinks
-
2,000
4,000
6,000
8,000
10,000
12,000
14,000
16,000
18,000
Orange
juice
Mango
juice
Apple
juice
Rola-
Kola
8-UP Bubbly-
UP
Pola-
Kola
2001 2002
21
MOLAP evaluation
22
Advantages of MOLAP:

Instant response (pre-calculated
aggregates).

Impossible to ask question without an
answer.

Value added functions (ranking, %
change).
MOLAP evaluation
23
Drawbacks of MOLAP:
 Long load time ( pre-calculating the cube
may take days!).
 Very sparse cube (wastage of space) for
high cardinality (sometimes in small
hundreds). e.g. number of heaters sold in
Jacobabad or Sibi.
MOLAP Implementation issues
Maintenance issue: Every data item received
must be aggregated into every cube (assuming
“to-date” summaries are maintained). Lot of
work.
Storage issue: As dimensions get less detailed
(e.g., year vs. day) cubes get much smaller, but
storage consequences for building hundreds of
cubes can be significant. Lot of space.
24
Partitioned Cubes
To overcome the space limitation of MOLAP, the cube is
partitioned.
The divide&conquer cube partitioning approach helps alleviate the
scalability limitations of MOLAP implementation.
One logical cube of data can be spread across multiple physical
cubes on separate (or same) servers.
Ideal cube partitioning is completely invisible to end users.
Performance degradation does occurs in case of a join across
partitioned cubes.
25
Partitioned Cubes: How it looks Like?
26
Time
Geography
Men’s clothing
Children clothing
Bed linen
Sales data cube partitioned at a major cottonSales data cube partitioned at a major cotton
products sale outletproducts sale outlet
Product
Virtual Cubes
Used to query two dissimilar cubes by creating a third
“virtual” cube by a join between two cubes.
Logically similar to a relational view i.e. linking two (or
more) cubes along common dimension(s).
Biggest advantage is saving in space by eliminating
storage of redundant information.
Example: Joining the store cube and the list price cube
along the product dimension, to calculate the sale price
without redundant storage of the sale price data.
27
28
Why ROLAP?
Issue of scalability i.e. curse of dimensionality for
MOLAP
Deployment of significantly large dimension tables as
compared to MOLAP using secondary storage.
Aggregate awareness allows using pre-built summary
tables by some front-end tools.
 Star schema designs usually used to facilitate ROLAP
querying (in next lecture).
29
ROLAP as a “Cube”
OLAP data is stored in a relational database (e.g. a star schema)
The fact table is a way of visualizing as a “un-rolled” cube.
So where is the cube?
It’s a matter of perception
Visualize the fact table as an elementary cube.
30
ProductG
eog.
Time
500500Z1Z1P2P2M2M2
250250Z1Z1P1P1M1M1
Sale K Rs.Sale K Rs.ZoneZoneProductProductMonthMonth
Fact Table
How to create “Cube” in ROLAP
Cube is a logical entity containing values of a certain fact at a
certain aggregation level at an intersection of a combination of
dimensions.
The following table can be created using 3 queries
31
SUMSUM
(Sales_Amt)(Sales_Amt)
M1M1 M2M2 M3M3 ALLALL
P1P1
P2P2
P3P3
TotalTotal
Month_ID
Product_ID
How to create “Cube” in ROLAP using SQL
 For the table entries, without the totals
SELECT S.Month_Id, S.Product_Id,
SUM(S.Sales_Amt)
FROM Sales
GROUP BY S.Month_Id, S.Product_Id;
 For the row totals
SELECT S.Product_Id, SUM (Sales_Amt)
FROM Sales
GROUP BY S.Product_Id;
 For the column totals
SELECT S.Month_Id, SUM (Sales)
FROM Sales
GROUP BY S.Month_Id;
32
Problem With Simple Approach
Number of required queries increases
exponentially with the increase in number of
dimensions.
Its wasteful to compute all queries.
In the example, the first query can do most of the
work of the other two queries
If we could save that result and aggregate over
Month_Id and Product_Id, we could compute the
other queries more efficiently
33
CUBE Clause
The CUBE clause is part of SQL:1999
GROUP BY CUBE (v1, v2, …, vn)
Equivalent to a collection of GROUP BYs, one for
each of the subsets of v1, v2, …, vn
34
ROLAP & Space Requirement
If one is not careful, with the increase in number
of dimensions, the number of summary tables gets
very large
Consider the example discussed earlier with the
following two dimensions on the fact table...
Time: Day, Week, Month, Quarter, Year, All Days
Product: Item, Sub-Category, Category, All Products
35
EXAMPLE: ROLAP & Space Requirement
36
A naï ve implementation will require all combinations
of summary tables at each and every aggregation
level.
 


…
24 summary tables, add in
geography, results in 120
tables
ROLAP Issues
 Maintenance.
 Non standard hierarchy of dimensions.
 Non standard conventions.
 Explosion of storage space requirement.
Aggregation pit-falls.
37
ROLAP Issue: Maintenance
Summary tables are mostly a maintenance issue (similar
to MOLAP) than a storage issue.
Notice that summary tables get much smaller as
dimensions get less detailed (e.g., year vs. day).
Should plan for twice the size of the unsummarized data
for ROLAP summaries in most environments.
Assuming "to-date" summaries, every detail record that is
received into warehouse must aggregate into EVERY
summary table.
38
ROLAP Issue: Hierarchies
Dimensions are NOT always simple hierarchies
Dimensions can be more than simple hierarchies i.e. item,
subcategory, category, etc.
The product dimension might also branch off by trade style that cross
simple hierarchy boundaries such as:
Looking at sales of air conditioners that cross manufacturer
boundaries, such as COY1, COY2, COY3 etc.
Looking at sales of all “green colored” items that even cross product
categories (washing machine, refrigerator, split-AC, etc.).
Looking at a combination of both.
39
ROLAP Issue: Convention
Conventions are NOT absolute
Example: What is calendar year? What is a week?
 Calendar:
01 Jan. to 31 Dec or
01 Jul. to 30 Jun. or
01 Sep to 30 Aug.
 Week:
Mon. to Sat. or Thu. to Wed.
40
ROLAP Issue: Storage space explosion
41
Summary tables required for non-standardSummary tables required for non-standard
groupinggrouping
Summary tables required along differentSummary tables required along different
definitions of year, week etc.definitions of year, week etc.
Brute force approach would quickly overwhelmBrute force approach would quickly overwhelm
the system storage capacity due to athe system storage capacity due to a
combinatorial explosion.combinatorial explosion.
ROALP Issues: Aggregation pitfalls
Coarser granularity correspondingly decreases
potential cardinality.
Aggregating whatever that can be aggregated.
Throwing away the detail data after aggregation.
42
How to Reduce Summary tables?
Many ROLAP products have developed means to
reduce the number of summary tables by:
Building summaries on-the-fly as required by end-
user applications.
Enhancing performance on common queries at
coarser granularities.
Providing smart tools to assist DBAs in selecting the
"best” aggregations to build i.e. trade-off between
speed and space.
43
Performance vs. Space Trade-Off
Maximum performance boost implies using lots
of disk space for storing every pre-calculation.
Minimum performance boost implies no disk
space with zero pre-calculation.
Using meta data to determine best level of pre-
aggregation from which all other aggregates can
be computed.
44
Performance vs. Space Trade-off using Wizard
20
40
60
80
100
2 4 6 8
××
××
MB
%Gain Aggregation answers
most queries
Aggregation
answers few queries
HOLAP
Target is to get the best of both worlds.
HOLAP (Hybrid OLAP) allow co-existence of pre-
built MOLAP cubes alongside relational OLAP or
ROLAP structures.
How much to pre-build?
46
DOLAP
47
Cube on the
remote server
Local
Machine/Server
Subset of the cube is
transferred to the local
machine

More Related Content

What's hot

Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingwork
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALASaikiran Panjala
 
Business Intelligence: Multidimensional Analysis
Business Intelligence: Multidimensional AnalysisBusiness Intelligence: Multidimensional Analysis
Business Intelligence: Multidimensional AnalysisMichael Lamont
 
Data Warehousing and Bitmap Indexes - More than just some bits
Data Warehousing and Bitmap Indexes  - More than just some bitsData Warehousing and Bitmap Indexes  - More than just some bits
Data Warehousing and Bitmap Indexes - More than just some bitsTrivadis
 
11666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect311666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect3ambujm
 
Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Er Bansal
 
Designing high performance datawarehouse
Designing high performance datawarehouseDesigning high performance datawarehouse
Designing high performance datawarehouseUday Kothari
 
data warehouse , data mart, etl
data warehouse , data mart, etldata warehouse , data mart, etl
data warehouse , data mart, etlAashish Rathod
 

What's hot (15)

ITReady DW Day2
ITReady DW Day2ITReady DW Day2
ITReady DW Day2
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
 
CTP Data Warehouse
CTP Data WarehouseCTP Data Warehouse
CTP Data Warehouse
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Business Intelligence: Multidimensional Analysis
Business Intelligence: Multidimensional AnalysisBusiness Intelligence: Multidimensional Analysis
Business Intelligence: Multidimensional Analysis
 
Data Warehousing and Bitmap Indexes - More than just some bits
Data Warehousing and Bitmap Indexes  - More than just some bitsData Warehousing and Bitmap Indexes  - More than just some bits
Data Warehousing and Bitmap Indexes - More than just some bits
 
2dw
2dw2dw
2dw
 
11666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect311666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect3
 
Unit4
Unit4Unit4
Unit4
 
Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Dwdm 2(data warehouse)
Dwdm 2(data warehouse)
 
Data ware house
Data ware houseData ware house
Data ware house
 
Designing high performance datawarehouse
Designing high performance datawarehouseDesigning high performance datawarehouse
Designing high performance datawarehouse
 
data warehouse , data mart, etl
data warehouse , data mart, etldata warehouse , data mart, etl
data warehouse , data mart, etl
 
05 OLAP v6 weekend
05 OLAP  v6 weekend05 OLAP  v6 weekend
05 OLAP v6 weekend
 

Similar to Dwh lecture slides-week 12&13

Cs437 lecture 10-12
Cs437 lecture 10-12Cs437 lecture 10-12
Cs437 lecture 10-12Aneeb_Khawar
 
Intro to Data warehousing Lecture 06
Intro to Data warehousing   Lecture 06Intro to Data warehousing   Lecture 06
Intro to Data warehousing Lecture 06AnwarrChaudary
 
Dwh lecture 11-molap
Dwh  lecture 11-molapDwh  lecture 11-molap
Dwh lecture 11-molapSulman Ahmed
 
86921864 olap-case-study-vj
86921864 olap-case-study-vj86921864 olap-case-study-vj
86921864 olap-case-study-vjhomeworkping4
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processingnurmeen1
 
Intro to Data warehousing lecture 07
Intro to Data warehousing   lecture 07Intro to Data warehousing   lecture 07
Intro to Data warehousing lecture 07AnwarrChaudary
 
Dwh lecture 10-olap
Dwh   lecture 10-olapDwh   lecture 10-olap
Dwh lecture 10-olapSulman Ahmed
 
ActiveWarehouse/ETL - BI & DW for Ruby/Rails
ActiveWarehouse/ETL - BI & DW for Ruby/RailsActiveWarehouse/ETL - BI & DW for Ruby/Rails
ActiveWarehouse/ETL - BI & DW for Ruby/RailsPaul Gallagher
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processingSamraiz Tejani
 
Cameron Hawthorne - E Bus Sales analysis with Olap and Disco
Cameron Hawthorne - E Bus Sales analysis with Olap and DiscoCameron Hawthorne - E Bus Sales analysis with Olap and Disco
Cameron Hawthorne - E Bus Sales analysis with Olap and DiscoInSync Conference
 
Intro to Data warehousing lecture 05
Intro to Data warehousing   lecture 05Intro to Data warehousing   lecture 05
Intro to Data warehousing lecture 05AnwarrChaudary
 
Lecture 11
Lecture 11Lecture 11
Lecture 11Shani729
 
Hadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both WorldsHadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both WorldsInside Analysis
 

Similar to Dwh lecture slides-week 12&13 (20)

Cs437 lecture 10-12
Cs437 lecture 10-12Cs437 lecture 10-12
Cs437 lecture 10-12
 
Cs437 lecture 09
Cs437 lecture 09Cs437 lecture 09
Cs437 lecture 09
 
Intro to Data warehousing Lecture 06
Intro to Data warehousing   Lecture 06Intro to Data warehousing   Lecture 06
Intro to Data warehousing Lecture 06
 
Dwh lecture 11-molap
Dwh  lecture 11-molapDwh  lecture 11-molap
Dwh lecture 11-molap
 
86921864 olap-case-study-vj
86921864 olap-case-study-vj86921864 olap-case-study-vj
86921864 olap-case-study-vj
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processing
 
Intro to Data warehousing lecture 07
Intro to Data warehousing   lecture 07Intro to Data warehousing   lecture 07
Intro to Data warehousing lecture 07
 
Dwh lecture 10-olap
Dwh   lecture 10-olapDwh   lecture 10-olap
Dwh lecture 10-olap
 
ActiveWarehouse/ETL - BI & DW for Ruby/Rails
ActiveWarehouse/ETL - BI & DW for Ruby/RailsActiveWarehouse/ETL - BI & DW for Ruby/Rails
ActiveWarehouse/ETL - BI & DW for Ruby/Rails
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processing
 
Cameron Hawthorne - E Bus Sales analysis with Olap and Disco
Cameron Hawthorne - E Bus Sales analysis with Olap and DiscoCameron Hawthorne - E Bus Sales analysis with Olap and Disco
Cameron Hawthorne - E Bus Sales analysis with Olap and Disco
 
Lecture 10.ppt
Lecture 10.pptLecture 10.ppt
Lecture 10.ppt
 
Intro to Data warehousing lecture 05
Intro to Data warehousing   lecture 05Intro to Data warehousing   lecture 05
Intro to Data warehousing lecture 05
 
OLAP
OLAPOLAP
OLAP
 
Lecture 11
Lecture 11Lecture 11
Lecture 11
 
Msbi by quontra us
Msbi by quontra usMsbi by quontra us
Msbi by quontra us
 
3dw
3dw3dw
3dw
 
OLAP
OLAPOLAP
OLAP
 
Olap queries
Olap queriesOlap queries
Olap queries
 
Hadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both WorldsHadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both Worlds
 

More from Shani729

Python tutorialfeb152012
Python tutorialfeb152012Python tutorialfeb152012
Python tutorialfeb152012Shani729
 
Python tutorial
Python tutorialPython tutorial
Python tutorialShani729
 
Interaction design _beyond_human_computer_interaction
Interaction design _beyond_human_computer_interactionInteraction design _beyond_human_computer_interaction
Interaction design _beyond_human_computer_interactionShani729
 
Fm lecturer 13(final)
Fm lecturer 13(final)Fm lecturer 13(final)
Fm lecturer 13(final)Shani729
 
Lecture slides week14-15
Lecture slides week14-15Lecture slides week14-15
Lecture slides week14-15Shani729
 
Frequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth methodFrequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth methodShani729
 
Dwh lecture slides-week15
Dwh lecture slides-week15Dwh lecture slides-week15
Dwh lecture slides-week15Shani729
 
Dwh lecture slides-week10
Dwh lecture slides-week10Dwh lecture slides-week10
Dwh lecture slides-week10Shani729
 
Dwh lecture slidesweek7&8
Dwh lecture slidesweek7&8Dwh lecture slidesweek7&8
Dwh lecture slidesweek7&8Shani729
 
Dwh lecture slides-week5&6
Dwh lecture slides-week5&6Dwh lecture slides-week5&6
Dwh lecture slides-week5&6Shani729
 
Dwh lecture slides-week1
Dwh lecture slides-week1Dwh lecture slides-week1
Dwh lecture slides-week1Shani729
 
Dwh lecture slides-week 13
Dwh lecture slides-week 13Dwh lecture slides-week 13
Dwh lecture slides-week 13Shani729
 
Data warehousing and mining furc
Data warehousing and mining furcData warehousing and mining furc
Data warehousing and mining furcShani729
 
Lecture 40
Lecture 40Lecture 40
Lecture 40Shani729
 
Lecture 39
Lecture 39Lecture 39
Lecture 39Shani729
 
Lecture 38
Lecture 38Lecture 38
Lecture 38Shani729
 
Lecture 37
Lecture 37Lecture 37
Lecture 37Shani729
 
Lecture 35
Lecture 35Lecture 35
Lecture 35Shani729
 
Lecture 36
Lecture 36Lecture 36
Lecture 36Shani729
 
Lecture 34
Lecture 34Lecture 34
Lecture 34Shani729
 

More from Shani729 (20)

Python tutorialfeb152012
Python tutorialfeb152012Python tutorialfeb152012
Python tutorialfeb152012
 
Python tutorial
Python tutorialPython tutorial
Python tutorial
 
Interaction design _beyond_human_computer_interaction
Interaction design _beyond_human_computer_interactionInteraction design _beyond_human_computer_interaction
Interaction design _beyond_human_computer_interaction
 
Fm lecturer 13(final)
Fm lecturer 13(final)Fm lecturer 13(final)
Fm lecturer 13(final)
 
Lecture slides week14-15
Lecture slides week14-15Lecture slides week14-15
Lecture slides week14-15
 
Frequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth methodFrequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth method
 
Dwh lecture slides-week15
Dwh lecture slides-week15Dwh lecture slides-week15
Dwh lecture slides-week15
 
Dwh lecture slides-week10
Dwh lecture slides-week10Dwh lecture slides-week10
Dwh lecture slides-week10
 
Dwh lecture slidesweek7&8
Dwh lecture slidesweek7&8Dwh lecture slidesweek7&8
Dwh lecture slidesweek7&8
 
Dwh lecture slides-week5&6
Dwh lecture slides-week5&6Dwh lecture slides-week5&6
Dwh lecture slides-week5&6
 
Dwh lecture slides-week1
Dwh lecture slides-week1Dwh lecture slides-week1
Dwh lecture slides-week1
 
Dwh lecture slides-week 13
Dwh lecture slides-week 13Dwh lecture slides-week 13
Dwh lecture slides-week 13
 
Data warehousing and mining furc
Data warehousing and mining furcData warehousing and mining furc
Data warehousing and mining furc
 
Lecture 40
Lecture 40Lecture 40
Lecture 40
 
Lecture 39
Lecture 39Lecture 39
Lecture 39
 
Lecture 38
Lecture 38Lecture 38
Lecture 38
 
Lecture 37
Lecture 37Lecture 37
Lecture 37
 
Lecture 35
Lecture 35Lecture 35
Lecture 35
 
Lecture 36
Lecture 36Lecture 36
Lecture 36
 
Lecture 34
Lecture 34Lecture 34
Lecture 34
 

Recently uploaded

VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 

Recently uploaded (20)

VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 

Dwh lecture slides-week 12&13

  • 1. Dr. Abdul Basit Siddiqui Assistant Professor FURC, Islamabad
  • 2.
  • 3. DWH & OLAP Relationship between DWH & OLAP Data Warehouse & OLAP go together Analysis supported by OLAP 3
  • 4. Supporting the human thought process How many such query sequences can be programmed in advance? 4 THOUGHT PROCESSTHOUGHT PROCESS QUERY SEQUENCEQUERY SEQUENCE An enterprise wide fall in profit Profit down by a large percentage consistently during last quarter only. Rest is OK What is special about last quarter ? Products alone doing OK, but North region is most problematic. What was the quarterly sales during last year ?? What was the quarterly sales at regional level during last year ?? What was the monthly sale for last quarter group by products What was the monthly sale of products in north at store level group by products purchased OK. So the problem is the high cost of products purchased in north. What was the quarterly sales at product level during last year?  ? What was the monthly sale for last quarter group by region
  • 5. Analysis of last example Analysis is Ad-hoc Analysis is interactive (user driven) Analysis is iterative Answer to one question leads to a dozen more Analysis is directional Drill Down Roll Up Pivot 5 More in subsequent slides
  • 6. Challenges… Not feasible to write predefined queries. Fails to remain user_driven (becomes programmer driven). Fails to remain ad_hoc and hence is not interactive. Enable ad-hoc query support Business user can not build his/her own queries (does not know SQL, should not know it). On_the_go SQL generation and execution too slow. 6
  • 7. Challenges Contradiction Want to compute answers in advance, but don't know the questions Solution Compute answers to “all” possible “queries”. But how? NOTE: Queries are multidimensional aggregates at some level 7
  • 8. “All” possible queries (level aggregates) 8 Province Frontier Punjab... Division MultanLahorePeshawarMardan ...... Lahore ... Gugranwala City Zone GulbergDefense ... District LahorePeshawar ALL ALLALL ALL
  • 9. OLAP: Facts & Dimensions FACTS: Quantitative values (numbers) or “measures.” e.g., units sold, sales $, Co , Kg etc. DIMENSIONS: Descriptive categories. e.g., time, geography, product etc. DIM often organized in hierarchies representing levels of detail in the data (e.g., week, month, quarter, year, decade etc.). 9
  • 10. Where Does OLAP Fit In?  It is a classification of applications, NOT a database design technique.  Analytical processing uses multi-level aggregates, instead of record level access.  Objective is to support very I. fast II. iterative and III. ad-hoc decision-making 10
  • 11. Where does OLAP fit in? 11 Transaction Data  Presentation Tools Reports OLAP Data Cube (MOLAP) Data Loading  ? Decision Maker
  • 12. OLTP vs. OLAP Feature OLTP OLAP Level of data Detailed Aggregated Amount of data per transaction Small Large Views Pre-defined User-defined Typical write operation Update, insert, delete Bulk insert “age” of data Current (60-90 days) Historical 5-10 years and also current Number of users High Low-Med Tables Flat tables Multi-Dimensional tables Database size Med (109 B – 1012 B) High (1012 B – 1015 B) Query Optimizing Requires experience Already “optimized” Data availability High Low-Med 12
  • 13. OLAP FASMI Test Fast: Delivers information to the user at a fairly constant rate. Most queries answered in under five seconds. Analysis: Performs basic numerical and statistical analysis of the data, pre-defined by an application developer or defined ad-hocly by the user. Shared: Implements the security requirements necessary for sharing potentially confidential data across a large user population. Multi-dimensional: The essential characteristic of OLAP. Information: Accesses all the data and information necessary and relevant for the application, wherever it may reside and not limited by volume. 13
  • 14. 14
  • 15. OLAP Implementations 1. MOLAP: OLAP implemented with a multi-dimensional data structure. 2. ROLAP: OLAP implemented with a relational database. 3. HOLAP: OLAP implemented as a hybrid of MOLAP and ROLAP. 4. DOLAP: OLAP implemented for desktop decision support environments. 15
  • 16. MOLAP Implementations OLAP has historically been implemented using a multi_dimensional data structure or “cube”. Dimensions are key business factors for analysis: Geographies (city, district, division, province,...) Products (item, product category, product department,...) Dates (day, week, month, quarter, year,...) Very high performance achieved by O(1) time lookup into “cube” data structure to retrieve pre_aggregated results. 16
  • 17. MOLAP Implementations No standard query language for querying MOLAP - No SQL ! Vendors provide proprietary languages allowing business users to create queries that involve pivots, drilling down, or rolling up. - E.g. MDX of Microsoft - Languages generally involve extensive visual (click and drag) support. - Application Programming Interface (API)’s also provided for probing the cubes. 17
  • 18. Aggregations in MOLAP 18  Sales volume as a function of (i) product, (ii)Sales volume as a function of (i) product, (ii) time, and (iii) geographytime, and (iii) geography  A cube structure created to handle this.A cube structure created to handle this. Dimensions: Product, Geography, Time Industry Category Product Hierarchical summarization pathsHierarchical summarization paths Product G eog Time w1 w2 w3 w4 w5 w6 Milk Bread Eggs Butter Jam Juice N E W S 12 13 45 8 23 10 Province Division District City Zone Year Quarter Month Week Day
  • 19. Cube operations Drill down: get more details e.g., given summarized sales as above, find breakup of sales by city within each region, or within Sindh Rollup: summarize data e.g., given sales data, summarize sales for last year by product category and region Slice and dice: select and project e.g.: Sales of soft-drinks in Karachi during last quarter Pivot: change the view of data 19
  • 20. Querying the cube - 5,000 10,000 15,000 20,000 25,000 30,000 35,000 40,000 2001 2002 Juices Soda Drinks 20 Drill-down - 2,000 4,000 6,000 8,000 10,000 12,000 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 OJ RK 8UP PK MJ BU AJ 2001 2002 - 2,000 4,000 6,000 8,000 10,000 12,000 14,000 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 Juices Soda Drinks 2001 2002 Drill-Down Roll-Up
  • 21. Querying the cube: Pivoting - 5,000 10,000 15,000 20,000 25,000 30,000 35,000 40,000 2001 2002 Juices Soda Drinks - 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000 Orange juice Mango juice Apple juice Rola- Kola 8-UP Bubbly- UP Pola- Kola 2001 2002 21
  • 22. MOLAP evaluation 22 Advantages of MOLAP:  Instant response (pre-calculated aggregates).  Impossible to ask question without an answer.  Value added functions (ranking, % change).
  • 23. MOLAP evaluation 23 Drawbacks of MOLAP:  Long load time ( pre-calculating the cube may take days!).  Very sparse cube (wastage of space) for high cardinality (sometimes in small hundreds). e.g. number of heaters sold in Jacobabad or Sibi.
  • 24. MOLAP Implementation issues Maintenance issue: Every data item received must be aggregated into every cube (assuming “to-date” summaries are maintained). Lot of work. Storage issue: As dimensions get less detailed (e.g., year vs. day) cubes get much smaller, but storage consequences for building hundreds of cubes can be significant. Lot of space. 24
  • 25. Partitioned Cubes To overcome the space limitation of MOLAP, the cube is partitioned. The divide&conquer cube partitioning approach helps alleviate the scalability limitations of MOLAP implementation. One logical cube of data can be spread across multiple physical cubes on separate (or same) servers. Ideal cube partitioning is completely invisible to end users. Performance degradation does occurs in case of a join across partitioned cubes. 25
  • 26. Partitioned Cubes: How it looks Like? 26 Time Geography Men’s clothing Children clothing Bed linen Sales data cube partitioned at a major cottonSales data cube partitioned at a major cotton products sale outletproducts sale outlet Product
  • 27. Virtual Cubes Used to query two dissimilar cubes by creating a third “virtual” cube by a join between two cubes. Logically similar to a relational view i.e. linking two (or more) cubes along common dimension(s). Biggest advantage is saving in space by eliminating storage of redundant information. Example: Joining the store cube and the list price cube along the product dimension, to calculate the sale price without redundant storage of the sale price data. 27
  • 28. 28
  • 29. Why ROLAP? Issue of scalability i.e. curse of dimensionality for MOLAP Deployment of significantly large dimension tables as compared to MOLAP using secondary storage. Aggregate awareness allows using pre-built summary tables by some front-end tools.  Star schema designs usually used to facilitate ROLAP querying (in next lecture). 29
  • 30. ROLAP as a “Cube” OLAP data is stored in a relational database (e.g. a star schema) The fact table is a way of visualizing as a “un-rolled” cube. So where is the cube? It’s a matter of perception Visualize the fact table as an elementary cube. 30 ProductG eog. Time 500500Z1Z1P2P2M2M2 250250Z1Z1P1P1M1M1 Sale K Rs.Sale K Rs.ZoneZoneProductProductMonthMonth Fact Table
  • 31. How to create “Cube” in ROLAP Cube is a logical entity containing values of a certain fact at a certain aggregation level at an intersection of a combination of dimensions. The following table can be created using 3 queries 31 SUMSUM (Sales_Amt)(Sales_Amt) M1M1 M2M2 M3M3 ALLALL P1P1 P2P2 P3P3 TotalTotal Month_ID Product_ID
  • 32. How to create “Cube” in ROLAP using SQL  For the table entries, without the totals SELECT S.Month_Id, S.Product_Id, SUM(S.Sales_Amt) FROM Sales GROUP BY S.Month_Id, S.Product_Id;  For the row totals SELECT S.Product_Id, SUM (Sales_Amt) FROM Sales GROUP BY S.Product_Id;  For the column totals SELECT S.Month_Id, SUM (Sales) FROM Sales GROUP BY S.Month_Id; 32
  • 33. Problem With Simple Approach Number of required queries increases exponentially with the increase in number of dimensions. Its wasteful to compute all queries. In the example, the first query can do most of the work of the other two queries If we could save that result and aggregate over Month_Id and Product_Id, we could compute the other queries more efficiently 33
  • 34. CUBE Clause The CUBE clause is part of SQL:1999 GROUP BY CUBE (v1, v2, …, vn) Equivalent to a collection of GROUP BYs, one for each of the subsets of v1, v2, …, vn 34
  • 35. ROLAP & Space Requirement If one is not careful, with the increase in number of dimensions, the number of summary tables gets very large Consider the example discussed earlier with the following two dimensions on the fact table... Time: Day, Week, Month, Quarter, Year, All Days Product: Item, Sub-Category, Category, All Products 35
  • 36. EXAMPLE: ROLAP & Space Requirement 36 A naï ve implementation will require all combinations of summary tables at each and every aggregation level.     … 24 summary tables, add in geography, results in 120 tables
  • 37. ROLAP Issues  Maintenance.  Non standard hierarchy of dimensions.  Non standard conventions.  Explosion of storage space requirement. Aggregation pit-falls. 37
  • 38. ROLAP Issue: Maintenance Summary tables are mostly a maintenance issue (similar to MOLAP) than a storage issue. Notice that summary tables get much smaller as dimensions get less detailed (e.g., year vs. day). Should plan for twice the size of the unsummarized data for ROLAP summaries in most environments. Assuming "to-date" summaries, every detail record that is received into warehouse must aggregate into EVERY summary table. 38
  • 39. ROLAP Issue: Hierarchies Dimensions are NOT always simple hierarchies Dimensions can be more than simple hierarchies i.e. item, subcategory, category, etc. The product dimension might also branch off by trade style that cross simple hierarchy boundaries such as: Looking at sales of air conditioners that cross manufacturer boundaries, such as COY1, COY2, COY3 etc. Looking at sales of all “green colored” items that even cross product categories (washing machine, refrigerator, split-AC, etc.). Looking at a combination of both. 39
  • 40. ROLAP Issue: Convention Conventions are NOT absolute Example: What is calendar year? What is a week?  Calendar: 01 Jan. to 31 Dec or 01 Jul. to 30 Jun. or 01 Sep to 30 Aug.  Week: Mon. to Sat. or Thu. to Wed. 40
  • 41. ROLAP Issue: Storage space explosion 41 Summary tables required for non-standardSummary tables required for non-standard groupinggrouping Summary tables required along differentSummary tables required along different definitions of year, week etc.definitions of year, week etc. Brute force approach would quickly overwhelmBrute force approach would quickly overwhelm the system storage capacity due to athe system storage capacity due to a combinatorial explosion.combinatorial explosion.
  • 42. ROALP Issues: Aggregation pitfalls Coarser granularity correspondingly decreases potential cardinality. Aggregating whatever that can be aggregated. Throwing away the detail data after aggregation. 42
  • 43. How to Reduce Summary tables? Many ROLAP products have developed means to reduce the number of summary tables by: Building summaries on-the-fly as required by end- user applications. Enhancing performance on common queries at coarser granularities. Providing smart tools to assist DBAs in selecting the "best” aggregations to build i.e. trade-off between speed and space. 43
  • 44. Performance vs. Space Trade-Off Maximum performance boost implies using lots of disk space for storing every pre-calculation. Minimum performance boost implies no disk space with zero pre-calculation. Using meta data to determine best level of pre- aggregation from which all other aggregates can be computed. 44
  • 45. Performance vs. Space Trade-off using Wizard 20 40 60 80 100 2 4 6 8 ×× ×× MB %Gain Aggregation answers most queries Aggregation answers few queries
  • 46. HOLAP Target is to get the best of both worlds. HOLAP (Hybrid OLAP) allow co-existence of pre- built MOLAP cubes alongside relational OLAP or ROLAP structures. How much to pre-build? 46
  • 47. DOLAP 47 Cube on the remote server Local Machine/Server Subset of the cube is transferred to the local machine