Business Information Systems

OLAP Cubes in Datawarehousing
Prithwis Mukerjee, Ph.D.

•Acknowledgement
•Hector Garcia Moli...
OLTP vs. OLAP
OLTP: On Line
Transaction Processing










Describes processing at
operational sites
Mostly upda...
Warehouse Models & Operators
Data Models




relations
stars & snowflakes
cubes

Operators





slice & dice
roll-u...
Star Schema Terms
Fact table
Dimension tables
Measures
product
prodId
name
price

sale
orderId
date
custId
prodId
storeId
...
Star

product

prodId
p1
p2

name price
bolt
10
nut
5

sale oderId date
o100 1/7/97
o102 2/7/97
105 3/8/97

customer

cust...
Dimension Hierarchies

sType
store

store storeId
s5
s7
s9

city
cityId
sfo
sfo
la

tId
t1
t2
t1

mgr
joe
fred
nancy

 sn...
Cube

Fact table view:
sale

prodId
p1
p2
p1
p2

storeId
c1
c1
c3
c2

Multi-dimensional cube:
amt
12
11
50
8

p1
p2

c1
12...
3-D Cube

Fact table view:
sale

prodId
p1
p2
p1
p2
p1
p1

storeId
c1
c1
c3
c2
c1
c2

Multi-dimensional cube:
date
1
1
1
1...
ROLAP vs. MOLAP
ROLAP:
Relational On-Line Analytical Processing
MOLAP:
Multi-Dimensional On-Line Analytical Processing

9
...
Aggregates

• Add up amounts for day 1
• In SQL: SELECT sum(amt) FROM SALE
WHERE date = 1
sale

prodId storeId
p1
c1
p2
c1...
Aggregates

• Add up amounts by day
• In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date
sale

prodId storeId
p1
c1
p2
...
Another Example

• Add up amounts by day, product
• In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date, prodId
sale

pr...
Aggregates
Operators: sum, count, max, min,
median, ave
“Having” clause
Using dimension hierarchy



average by region (...
Cube Aggregation

day 2
day 1

p1
p2 c1
p1
12
p2
11

p1
p2

c1
56
11

c1
44

c2
4
c2

c3

Example: computing sums
...

c3
...
Cube Operators

day 2
day 1

p1
p2 c1
p1
12
p2
11

p1
p2

c1
56
11

c1
44

c2
4
c2

c3

...

c3
50

sale(c1,*,*)

8

c2
4
...
Extended Cube

c2
4
8
12
c3

p1
p2
c1
*
12

p1
p2
*
c1
44

c1
56
11
67
c2
4

c2
44

c3
4
50

11
23

8
8

50

*
62
19
81

*...
Aggregation Using Hierarchies

day 2
day 1

p1
p2 c1
p1
12
p2
11

c1
44

c2
4
c2

c3
c3
50

8

customer
region
country

p1...
Pivoting

Fact table view:
sale

prodId storeId
p1
c1
p2
c1
p1
c3
p2
c2
p1
c1
p1
c2

Multi-dimensional cube:
date
1
1
1
1
...
What is a Multi-Dimensional
Database?
A multidimensional database (MDDB) is a computer
software system designed to allow f...
Relational and Multi-Dimensional
Models: An Example

2

SALES VOLUMES FOR GLEASON DEALERSHIP
MODEL
MINI VAN
MINI VAN
MINI ...
Multidimentional Structure

Sales Volumes

Dimension

Mini Van

6

5

4

Coupe

3

5

5

Sedan

4

3

2

Blue

M
O
D
E
L

...
The “Classic” Star Scheme

Fact Table
Store Dimension
STORE KEY
Store Description
City
State
District ID
District Desc.
Re...
Differences between MDDB and
Relational Databases
Normalized Relational

MDDB

Data reorganized based on
query. Perspectiv...
Relational Model and Multi
Dimensional Databases -Example 2
SALES VOLUMES FOR ALL DEALERSHIPS
MODEL
MINI VAN
MINI VAN
MINI...
Mutlidimensional Representation

Sales Volumes

M
O
D
E
L

Mini Van
Coupe
Carr
Gleason
Clyde

Sedan
Blue

Red

DEALERSHIP
...
Viewing Data - An Example

Sales Volumes

M
O
D
E
L
DEALERSHIP
COLOR

•Assume that each dimension has 10 positions, as sho...
Adding Dimensions- An Example

Sales Volumes
M
O
D
E
L

Mini Van

Mini Van

Coupe

Mini Van

Coupe
Carr
Gleason
Clyde

Sed...
3
When is MDD (In)appropriate?

First, consider situation 1
PERSONNEL
LAST NAME
SMITH
REGAN
FOX
WELD
KELLY
LINK
KRANZ
LUCU...
When is MDD (In)appropriate?

Now consider situation 2

SALES VOLUMES FOR GLEASON DEALERSHIP
MODEL
MINI VAN
MINI VAN
MINI ...
When is MDD (In)appropriate?

MDD Structures for the Situations
Employee Age
Smith

21

Regan

Sales Volumes
Miini Van

6
...
When is MDD (In)appropriate?
Highly interrelated dataset types be placed in a
multidimensional data structure for greatest...
4

MDD Features - Rotation

Sales Volumes

Mini Van

6

5

4

Coupe

3

5

5

Sedan

4

3

2

Blue

M
O
D
E
L

Red

White
...
MDD Features - Rotation

Sales Volumes

M
O
D
E
L

C
O
L
O
R

Mini Van
Coupe
Carr
Gleason
Clyde

Sedan
Blue

Red

Red
Carr...
MDD Features - Ranging

Sales Volumes

M
O
D
E
L

Mini Van

Mini Van
Coupe

Coupe

Normal
Blue

Carr
Clyde

Normal
Blue

M...
MDD Features - Roll-Ups & Drill
Downs
ORGANIZATION DIMENSION
Midwest

REGION

DISTRICT
DEALERSHIP

Chicago
Clyde

Gleason
...
The Time Dimension

TIME as a predefined hierarchy for rolling-up and
drilling-down across days, weeks, months, years and
...
Pros/Cons of MDD

5

Cognitive Advantages for the User
Ease of Data Presentation and Navigation,
Time dimension
Performanc...
Multidimensional OLAP (MOLAP)

Frontend
Tool

specialized database technology
multidimensional storage structures
E.g. Hyp...
MOLAP Server

ty

Multi-Dimensional OLAP Server
Sales

M.D. tools

Product

Ci

B
A

milk
soda
eggs
soap

1

utilities

mu...
Relational OLAP (ROLAP)
idea: use relational data
storage
star (snowflake) schema
E.g. Microstrategy, SAP BW
+ advantages ...
ROLAP Server
Relational OLAP Server
sale

prodId
p1
p2
p1

date
1
1
2

sum
62
19
48

tools

utilities

ROLAP
server

Speci...
Client (Desktop) OLAP
proprietary data structure on the
client
data stored as file
mostly RAM based architectures
E.g. Bus...
DW Integration

MOLAP

ROLAP

ClientOLAP

ROLAPEngine
Multidim.
Database

DW-DB (mostly relational)
43
Upcoming SlideShare
Loading in …5
×

05 OLAP v6 weekend

439 views
309 views

Published on

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
439
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
14
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide
  • The classic star schema is characterized by the following:
    A single “fact table” containing a compound primary key, with one segment for each “dimension,” and additional columns of additive, numeric facts
    A single “dimension” table (for each dimension) with a generated key, and a level indicator that describes the attribute level of each record. For example, if the dimension is STORE, records in the dimension table might refer to stores, districts or regions. We might assign a level number to these (such as 1=store, 2=district, 3=region) and put that value in the “level” column.
    The single fact table will contain detail (or “atomic”) data, such as sales dollars, for a given store, for a given product in a given time period.
    The fact table may also contain partially consolidated data, such as sales dollars for a region, for a given product for a given time period.
    Confusion is possible if ALL consolidated data isn’t included. For example, if data is consolidated only to the district level, a query for regional information will bring back no records and, hence, report no sales activity. For this reason, simple stars MUST contain either
    - ALL of the combinations of aggregated data or
    - At least views of every combination
  • OLAP ist weitverbreitet: daher genauere Betrachtung der Architekturen:
    MOLAP: Benutzer greift eine spezielle Multidimensionale Datenbank zu. Diese kann seine Anfragen sehr performant beantworten, da sie extra für diesen Zweck optimiert ist. Daten werden häufig aus der zentralen DW-DB befüllt.
    ROLAP: Ausnutzung der relationalen Datenbank-Technologie. ROLAP Engine setzt multidim Operationen dynamisch in SQL um. DB Systeme nicht speziell für diese Anwendungs optimiert.
    Client-OLAP: Bildung des Würfels auf der Maschine des Benutzers. Persönlicher Würfel für jeden Benutzer, aber Kapazität!
  • 05 OLAP v6 weekend

    1. 1. Business Information Systems OLAP Cubes in Datawarehousing Prithwis Mukerjee, Ph.D. •Acknowledgement •Hector Garcia Molina – Stanford •FORWISS - Bavarian Research Centre for Knowledge Based Systems
    2. 2. OLTP vs. OLAP OLTP: On Line Transaction Processing         Describes processing at operational sites Mostly updates Many small transactions Mb-Tb of data Raw data Clerical users Up-to-date data Consistency, recoverability critical OLAP: On Line Analytical Processing       Describes processing at warehouse Mostly reads Queries long, complex Gb-Tb of data Summarized, consolidated data Decision-makers, analysts as users 2 2
    3. 3. Warehouse Models & Operators Data Models    relations stars & snowflakes cubes Operators     slice & dice roll-up, drill down pivoting other 3 3
    4. 4. Star Schema Terms Fact table Dimension tables Measures product prodId name price sale orderId date custId prodId storeId qty amt customer custId name address city store storeId city 4 4
    5. 5. Star product prodId p1 p2 name price bolt 10 nut 5 sale oderId date o100 1/7/97 o102 2/7/97 105 3/8/97 customer custId 53 81 111 store storeId c1 c2 c3 custId 53 53 111 name joe fred sally prodId p1 p2 p1 storeId c1 c1 c3 address 10 main 12 main 80 willow qty 1 2 5 city nyc sfo la amt 12 11 50 city sfo sfo la 5 5
    6. 6. Dimension Hierarchies sType store store storeId s5 s7 s9 city cityId sfo sfo la tId t1 t2 t1 mgr joe fred nancy  snowflake schema  constellations region sType tId t1 t2 city size small large cityId pop sfo 1M la 5M location downtown suburbs regId north south region regId name north cold region south warm region 6 6
    7. 7. Cube Fact table view: sale prodId p1 p2 p1 p2 storeId c1 c1 c3 c2 Multi-dimensional cube: amt 12 11 50 8 p1 p2 c1 12 11 c2 c3 50 8 dimensions = 2 7 7
    8. 8. 3-D Cube Fact table view: sale prodId p1 p2 p1 p2 p1 p1 storeId c1 c1 c3 c2 c1 c2 Multi-dimensional cube: date 1 1 1 1 2 2 amt 12 11 50 8 44 4 day 2 day 1 p1 p2 c1 p1 12 p2 11 c1 44 c2 4 c2 c3 c3 50 8 dimensions = 3 8 8
    9. 9. ROLAP vs. MOLAP ROLAP: Relational On-Line Analytical Processing MOLAP: Multi-Dimensional On-Line Analytical Processing 9 9
    10. 10. Aggregates • Add up amounts for day 1 • In SQL: SELECT sum(amt) FROM SALE WHERE date = 1 sale prodId storeId p1 c1 p2 c1 p1 c3 p2 c2 p1 c1 p1 c2 date 1 1 1 1 2 2 amt 12 11 50 8 44 4 81 10 10
    11. 11. Aggregates • Add up amounts by day • In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date sale prodId storeId p1 c1 p2 c1 p1 c3 p2 c2 p1 c1 p1 c2 date 1 1 1 1 2 2 amt 12 11 50 8 44 4 ans date 1 2 11 sum 81 48 11
    12. 12. Another Example • Add up amounts by day, product • In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date, prodId sale prodId storeId p1 c1 p2 c1 p1 c3 p2 c2 p1 c1 p1 c2 date 1 1 1 1 2 2 amt 12 11 50 8 44 4 sale prodId p1 p2 p1 date 1 1 2 amt 62 19 48 rollup drill-down 12 12
    13. 13. Aggregates Operators: sum, count, max, min, median, ave “Having” clause Using dimension hierarchy   average by region (within store) maximum by month (within date) 13 13
    14. 14. Cube Aggregation day 2 day 1 p1 p2 c1 p1 12 p2 11 p1 p2 c1 56 11 c1 44 c2 4 c2 c3 Example: computing sums ... c3 50 8 c2 4 8 rollup drill-down c3 50 sum c1 67 c2 12 c3 50 129 p1 p2 sum 110 19 14 14
    15. 15. Cube Operators day 2 day 1 p1 p2 c1 p1 12 p2 11 p1 p2 c1 56 11 c1 44 c2 4 c2 c3 ... c3 50 sale(c1,*,*) 8 c2 4 8 c3 50 sale(c2,p2,*) sum c1 67 c2 12 c3 50 129 p1 p2 sum 110 19 sale(*,*,*) 15 15
    16. 16. Extended Cube c2 4 8 12 c3 p1 p2 c1 * 12 p1 p2 * c1 44 c1 56 11 67 c2 4 c2 44 c3 4 50 11 23 8 8 50 * 62 19 81 * day 2 day 1 p1 p2 * c3 50 * 50 48 48 * 110 19 129 sale(*,p2,*) 16 16
    17. 17. Aggregation Using Hierarchies day 2 day 1 p1 p2 c1 p1 12 p2 11 c1 44 c2 4 c2 c3 c3 50 8 customer region country p1 p2 region A 56 11 region B 54 8 (customer c1 in Region A; customers c2, c3 in Region B) 17 17
    18. 18. Pivoting Fact table view: sale prodId storeId p1 c1 p2 c1 p1 c3 p2 c2 p1 c1 p1 c2 Multi-dimensional cube: date 1 1 1 1 2 2 amt 12 11 50 8 44 4 day 2 day 1 p1 p2 c1 p1 12 p2 11 p1 p2 c1 56 11 c1 44 c2 4 c2 c3 c3 50 8 c2 4 8 18 c3 50 18
    19. 19. What is a Multi-Dimensional Database? A multidimensional database (MDDB) is a computer software system designed to allow for the efficient and convenient storage and retrieval of large volumes of data that are • intimately related and • stored, viewed and analyzed from different perspectives. These perspectives are called dimensions. 19
    20. 20. Relational and Multi-Dimensional Models: An Example 2 SALES VOLUMES FOR GLEASON DEALERSHIP MODEL MINI VAN MINI VAN MINI VAN SPORTS COUPE SPORTS COUPE SPORTS COUPE SEDAN SEDAN SEDAN COLOR BLUE RED WHITE BLUE RED WHITE BLUE RED WHITE SALES VOLUME 6 5 4 3 5 5 4 3 2 The Relational Structure 20
    21. 21. Multidimentional Structure Sales Volumes Dimension Mini Van 6 5 4 Coupe 3 5 5 Sedan 4 3 2 Blue M O D E L Red White COLOR Positions Measurement Dimension 21
    22. 22. The “Classic” Star Scheme Fact Table Store Dimension STORE KEY Store Description City State District ID District Desc. Region_ID Region Desc. Regional Mgr. STORE KEY PRODUCT KEY PERIOD KEY Dollars Units Price Product Dimension PRODUCT KEY Time Dimension PERIOD KEY Period Desc Year Quarter Month Day Product Desc. Brand Color Size Manufacturer 22
    23. 23. Differences between MDDB and Relational Databases Normalized Relational MDDB Data reorganized based on query. Perspectives are placed in the fields – tells us nothing about the contents Perspectives embedded directly in the structure. Browsing and data manipulation Data retrieval and manipulation are not intuitive to user are easy Slows down for large datasets Fast retrieval for large datasets due to multiple JOIN operations due to predefined structure. needed. Flexible. Anything an MDDB can do, can be done this way. Relatively Inflexible. Changes in perspectives necessitate reprogramming of structure. 23
    24. 24. Relational Model and Multi Dimensional Databases -Example 2 SALES VOLUMES FOR ALL DEALERSHIPS MODEL MINI VAN MINI VAN MINI VAN MINI VAN MINI VAN MINI VAN MINI VAN MINI VAN MINI VAN SPORTS COUPE SPORTS COUPE SPORTS COUPE SPORTS COUPE SPORTS COUPE SPORTS COUPE SPORTS COUPE SPORTS COUPE SPORTS COUPE SEDAN SEDAN SEDAN SEDAN SEDAN SEDAN SEDAN SEDAN SEDAN COLOR BLUE BLUE BLUE RED RED RED WHITE WHITE WHITE BLUE BLUE BLUE RED RED RED WHITE WHITE WHITE BLUE BLUE BLUE RED RED RED WHITE WHITE WHITE DEALERSHIP CLYDE GLEASON CARR CLYDE GLEASON CARR CLYDE GLEASON CARR CLYDE GLEASON CARR CLYDE GLEASON CARR CLYDE GLEASON CARR CLYDE GLEASON CARR CLYDE GLEASON CARR CLYDE GLEASON CARR VOLUME 6 6 2 3 5 5 2 4 3 2 3 2 7 5 2 4 5 1 6 4 2 1 3 4 2 2 3 24
    25. 25. Mutlidimensional Representation Sales Volumes M O D E L Mini Van Coupe Carr Gleason Clyde Sedan Blue Red DEALERSHIP White COLOR 25
    26. 26. Viewing Data - An Example Sales Volumes M O D E L DEALERSHIP COLOR •Assume that each dimension has 10 positions, as shown in the cube above •How many records would be there in a relational table? •Implications for viewing data from an end-user standpoint? 26
    27. 27. Adding Dimensions- An Example Sales Volumes M O D E L Mini Van Mini Van Coupe Mini Van Coupe Carr Gleason Clyde Sedan Blue Red White COLOR JANUARY Coupe Carr Gleason Clyde Sedan Blue Red White COLOR FEBRUARY Carr Gleason Clyde Sedan Blue Red DEALERSHIP White COLOR MARCH 27
    28. 28. 3 When is MDD (In)appropriate? First, consider situation 1 PERSONNEL LAST NAME SMITH REGAN FOX WELD KELLY LINK KRANZ LUCUS WEISS EMPLOYEE# 01 12 31 14 54 03 41 33 23 EMPLOYEE AGE 21 19 63 31 27 56 45 41 19 28
    29. 29. When is MDD (In)appropriate? Now consider situation 2 SALES VOLUMES FOR GLEASON DEALERSHIP MODEL MINI VAN MINI VAN MINI VAN SPORTS COUPE SPORTS COUPE SPORTS COUPE SEDAN SEDAN SEDAN COLOR BLUE RED WHITE BLUE RED WHITE BLUE RED WHITE VOLUME 6 5 4 3 5 5 4 3 2 1. Set up a MDD structure for situation 1, with LAST NAME and Employee# as dimensions, and AGE as the measurement. 2. Set up a MDD structure for situation 2, with MODEL and COLOR as dimensions, and SALES VOLUME as the measurement . 29
    30. 30. When is MDD (In)appropriate? MDD Structures for the Situations Employee Age Smith 21 Regan Sales Volumes Miini Van 6 5 4 Coupe 3 5 5 4 3 2 Blue M O D E L Red White Sedan Fox L A S T N A M E 19 63 Weld 31 Kelly 27 Link 56 Kranz 45 COLOR Lucas 41 Weiss 19 31 41 23 01 14 54 03 12 33 EMPLOYEE # Note the sparseness in the second MDD representation 30
    31. 31. When is MDD (In)appropriate? Highly interrelated dataset types be placed in a multidimensional data structure for greatest ease of access and analysis. When there are no interrelationships, the MDD structure is not appropriate. 31
    32. 32. 4 MDD Features - Rotation Sales Volumes Mini Van 6 5 4 Coupe 3 5 5 Sedan 4 3 2 Blue M O D E L Red White COLOR View #1 C O L O R o ( ROTATE 90 ) Blue 6 3 4 Red 5 5 3 White 4 5 2 Mini Van Coupe Sedan MODEL View #2 •Also referred to as “data slicing.” •Each rotation yields a different slice or two dimensional table of data – a different face of the cube. 32
    33. 33. MDD Features - Rotation Sales Volumes M O D E L C O L O R Mini Van Coupe Carr Gleason Clyde Sedan Blue Red Red Carr Gleason Clyde White Sedan White COLOR Coupe D E A L E R S H I P Gleason Mini Van Coupe Sedan White Red Blue COLOR DEALERSHIP o o ( ROTATE 90 ) MODEL View #3 Carr M O D E L Gleason Blue Red White Clyde Mini Van Coupe Sedan o ( ROTATE 90 ) MODEL MODEL View #4 Gleason Clyde ( ROTATE 90 ) View #2 Carr Mini Van Coupe Sedan DEALERSHIP View #1 Clyde Red White Carr MODEL o ( ROTATE 90 ) Blue Mini Van DEALERSHIP D E A L E R S H I P C O L O R Blue Mini Van Coupe Blue Red White Sedan Clyde Gleason Carr o DEALERSHIP ( ROTATE 90 ) COLOR View #5 COLOR View #6 33
    34. 34. MDD Features - Ranging Sales Volumes M O D E L Mini Van Mini Van Coupe Coupe Normal Blue Carr Clyde Normal Blue Metal Blue Carr Clyde Metal Blue DEALERSHIP COLOR • The end user selects the desired positions along each dimension. • Also referred to as "data dicing." • The data is scoped down to a subset grouping 34
    35. 35. MDD Features - Roll-Ups & Drill Downs ORGANIZATION DIMENSION Midwest REGION DISTRICT DEALERSHIP Chicago Clyde Gleason St. Louis Carr Levi Gary Lucas Bolton • The figure presents a definition of a hierarchy within the organization dimension. • Aggregations perceived as being part of the same dimension. • Moving up and moving down levels in a hierarchy is referred to as “roll-up” and “drill-down.” 35
    36. 36. The Time Dimension TIME as a predefined hierarchy for rolling-up and drilling-down across days, weeks, months, years and special periods, such as fiscal years.   Eliminates the effort required to build sophisticated hierarchies every time a database is set up. Extra performance advantages 36
    37. 37. Pros/Cons of MDD 5 Cognitive Advantages for the User Ease of Data Presentation and Navigation, Time dimension Performance Less flexible Requires greater initial effort 37
    38. 38. Multidimensional OLAP (MOLAP) Frontend Tool specialized database technology multidimensional storage structures E.g. Hyperion Essbase, Oracle Express, Cognos PowerPlay (Server) Query Performance Powerful MD Model  write access Database Features  multiuser access/ backup and recovery Multidim. Database Sparsity Handling -> DB Explosion 38
    39. 39. MOLAP Server ty Multi-Dimensional OLAP Server Sales M.D. tools Product Ci B A milk soda eggs soap 1 utilities multidimensional server 2 3 4 Date could also sit on relational DBMS 39 39
    40. 40. Relational OLAP (ROLAP) idea: use relational data storage star (snowflake) schema E.g. Microstrategy, SAP BW + advantages of RDBMS Frontend Tool MDInterface ROLAPEngine SQL + Meta Data +    scalability, reliability, security etc. Sparsity handling Query Performance Data Model Complexity no write access Relational DB 40
    41. 41. ROLAP Server Relational OLAP Server sale prodId p1 p2 p1 date 1 1 2 sum 62 19 48 tools utilities ROLAP server Special indices, tuning; Schema is “denormalized” relational DBMS 41 41
    42. 42. Client (Desktop) OLAP proprietary data structure on the client data stored as file mostly RAM based architectures E.g. Business Objects, Cognos PowerPlay ClientOLAP + +   mobile user ease of installation and use data volume no multiuser capabilites 42
    43. 43. DW Integration MOLAP ROLAP ClientOLAP ROLAPEngine Multidim. Database DW-DB (mostly relational) 43

    ×