05 OLAP  v6 weekend
Upcoming SlideShare
Loading in...5
×
 

05 OLAP v6 weekend

on

  • 399 views

 

Statistics

Views

Total Views
399
Views on SlideShare
399
Embed Views
0

Actions

Likes
1
Downloads
7
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • The classic star schema is characterized by the following: <br /> A single “fact table” containing a compound primary key, with one segment for each “dimension,” and additional columns of additive, numeric facts <br /> A single “dimension” table (for each dimension) with a generated key, and a level indicator that describes the attribute level of each record. For example, if the dimension is STORE, records in the dimension table might refer to stores, districts or regions. We might assign a level number to these (such as 1=store, 2=district, 3=region) and put that value in the “level” column. <br /> The single fact table will contain detail (or “atomic”) data, such as sales dollars, for a given store, for a given product in a given time period. <br /> The fact table may also contain partially consolidated data, such as sales dollars for a region, for a given product for a given time period. <br /> Confusion is possible if ALL consolidated data isn’t included. For example, if data is consolidated only to the district level, a query for regional information will bring back no records and, hence, report no sales activity. For this reason, simple stars MUST contain either <br /> - ALL of the combinations of aggregated data or <br /> - At least views of every combination <br />
  • OLAP ist weitverbreitet: daher genauere Betrachtung der Architekturen: <br /> MOLAP: Benutzer greift eine spezielle Multidimensionale Datenbank zu. Diese kann seine Anfragen sehr performant beantworten, da sie extra für diesen Zweck optimiert ist. Daten werden häufig aus der zentralen DW-DB befüllt. <br /> ROLAP: Ausnutzung der relationalen Datenbank-Technologie. ROLAP Engine setzt multidim Operationen dynamisch in SQL um. DB Systeme nicht speziell für diese Anwendungs optimiert. <br /> Client-OLAP: Bildung des Würfels auf der Maschine des Benutzers. Persönlicher Würfel für jeden Benutzer, aber Kapazität! <br />

05 OLAP  v6 weekend 05 OLAP v6 weekend Presentation Transcript

  • Business Information Systems OLAP Cubes in Datawarehousing Prithwis Mukerjee, Ph.D. •Acknowledgement •Hector Garcia Molina – Stanford •FORWISS - Bavarian Research Centre for Knowledge Based Systems
  • OLTP vs. OLAP OLTP: On Line Transaction Processing         Describes processing at operational sites Mostly updates Many small transactions Mb-Tb of data Raw data Clerical users Up-to-date data Consistency, recoverability critical OLAP: On Line Analytical Processing       Describes processing at warehouse Mostly reads Queries long, complex Gb-Tb of data Summarized, consolidated data Decision-makers, analysts as users 2 2
  • Warehouse Models & Operators Data Models    relations stars & snowflakes cubes Operators     slice & dice roll-up, drill down pivoting other 3 3
  • Star Schema Terms Fact table Dimension tables Measures product prodId name price sale orderId date custId prodId storeId qty amt customer custId name address city store storeId city 4 4
  • Star product prodId p1 p2 name price bolt 10 nut 5 sale oderId date o100 1/7/97 o102 2/7/97 105 3/8/97 customer custId 53 81 111 store storeId c1 c2 c3 custId 53 53 111 name joe fred sally prodId p1 p2 p1 storeId c1 c1 c3 address 10 main 12 main 80 willow qty 1 2 5 city nyc sfo la amt 12 11 50 city sfo sfo la 5 5
  • Dimension Hierarchies sType store store storeId s5 s7 s9 city cityId sfo sfo la tId t1 t2 t1 mgr joe fred nancy  snowflake schema  constellations region sType tId t1 t2 city size small large cityId pop sfo 1M la 5M location downtown suburbs regId north south region regId name north cold region south warm region 6 6
  • Cube Fact table view: sale prodId p1 p2 p1 p2 storeId c1 c1 c3 c2 Multi-dimensional cube: amt 12 11 50 8 p1 p2 c1 12 11 c2 c3 50 8 dimensions = 2 7 7
  • 3-D Cube Fact table view: sale prodId p1 p2 p1 p2 p1 p1 storeId c1 c1 c3 c2 c1 c2 Multi-dimensional cube: date 1 1 1 1 2 2 amt 12 11 50 8 44 4 day 2 day 1 p1 p2 c1 p1 12 p2 11 c1 44 c2 4 c2 c3 c3 50 8 dimensions = 3 8 8
  • ROLAP vs. MOLAP ROLAP: Relational On-Line Analytical Processing MOLAP: Multi-Dimensional On-Line Analytical Processing 9 9
  • Aggregates • Add up amounts for day 1 • In SQL: SELECT sum(amt) FROM SALE WHERE date = 1 sale prodId storeId p1 c1 p2 c1 p1 c3 p2 c2 p1 c1 p1 c2 date 1 1 1 1 2 2 amt 12 11 50 8 44 4 81 10 10
  • Aggregates • Add up amounts by day • In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date sale prodId storeId p1 c1 p2 c1 p1 c3 p2 c2 p1 c1 p1 c2 date 1 1 1 1 2 2 amt 12 11 50 8 44 4 ans date 1 2 11 sum 81 48 11
  • Another Example • Add up amounts by day, product • In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date, prodId sale prodId storeId p1 c1 p2 c1 p1 c3 p2 c2 p1 c1 p1 c2 date 1 1 1 1 2 2 amt 12 11 50 8 44 4 sale prodId p1 p2 p1 date 1 1 2 amt 62 19 48 rollup drill-down 12 12
  • Aggregates Operators: sum, count, max, min, median, ave “Having” clause Using dimension hierarchy   average by region (within store) maximum by month (within date) 13 13
  • Cube Aggregation day 2 day 1 p1 p2 c1 p1 12 p2 11 p1 p2 c1 56 11 c1 44 c2 4 c2 c3 Example: computing sums ... c3 50 8 c2 4 8 rollup drill-down c3 50 sum c1 67 c2 12 c3 50 129 p1 p2 sum 110 19 14 14
  • Cube Operators day 2 day 1 p1 p2 c1 p1 12 p2 11 p1 p2 c1 56 11 c1 44 c2 4 c2 c3 ... c3 50 sale(c1,*,*) 8 c2 4 8 c3 50 sale(c2,p2,*) sum c1 67 c2 12 c3 50 129 p1 p2 sum 110 19 sale(*,*,*) 15 15
  • Extended Cube c2 4 8 12 c3 p1 p2 c1 * 12 p1 p2 * c1 44 c1 56 11 67 c2 4 c2 44 c3 4 50 11 23 8 8 50 * 62 19 81 * day 2 day 1 p1 p2 * c3 50 * 50 48 48 * 110 19 129 sale(*,p2,*) 16 16
  • Aggregation Using Hierarchies day 2 day 1 p1 p2 c1 p1 12 p2 11 c1 44 c2 4 c2 c3 c3 50 8 customer region country p1 p2 region A 56 11 region B 54 8 (customer c1 in Region A; customers c2, c3 in Region B) 17 17
  • Pivoting Fact table view: sale prodId storeId p1 c1 p2 c1 p1 c3 p2 c2 p1 c1 p1 c2 Multi-dimensional cube: date 1 1 1 1 2 2 amt 12 11 50 8 44 4 day 2 day 1 p1 p2 c1 p1 12 p2 11 p1 p2 c1 56 11 c1 44 c2 4 c2 c3 c3 50 8 c2 4 8 18 c3 50 18
  • What is a Multi-Dimensional Database? A multidimensional database (MDDB) is a computer software system designed to allow for the efficient and convenient storage and retrieval of large volumes of data that are • intimately related and • stored, viewed and analyzed from different perspectives. These perspectives are called dimensions. 19
  • Relational and Multi-Dimensional Models: An Example 2 SALES VOLUMES FOR GLEASON DEALERSHIP MODEL MINI VAN MINI VAN MINI VAN SPORTS COUPE SPORTS COUPE SPORTS COUPE SEDAN SEDAN SEDAN COLOR BLUE RED WHITE BLUE RED WHITE BLUE RED WHITE SALES VOLUME 6 5 4 3 5 5 4 3 2 The Relational Structure 20
  • Multidimentional Structure Sales Volumes Dimension Mini Van 6 5 4 Coupe 3 5 5 Sedan 4 3 2 Blue M O D E L Red White COLOR Positions Measurement Dimension 21
  • The “Classic” Star Scheme Fact Table Store Dimension STORE KEY Store Description City State District ID District Desc. Region_ID Region Desc. Regional Mgr. STORE KEY PRODUCT KEY PERIOD KEY Dollars Units Price Product Dimension PRODUCT KEY Time Dimension PERIOD KEY Period Desc Year Quarter Month Day Product Desc. Brand Color Size Manufacturer 22
  • Differences between MDDB and Relational Databases Normalized Relational MDDB Data reorganized based on query. Perspectives are placed in the fields – tells us nothing about the contents Perspectives embedded directly in the structure. Browsing and data manipulation Data retrieval and manipulation are not intuitive to user are easy Slows down for large datasets Fast retrieval for large datasets due to multiple JOIN operations due to predefined structure. needed. Flexible. Anything an MDDB can do, can be done this way. Relatively Inflexible. Changes in perspectives necessitate reprogramming of structure. 23
  • Relational Model and Multi Dimensional Databases -Example 2 SALES VOLUMES FOR ALL DEALERSHIPS MODEL MINI VAN MINI VAN MINI VAN MINI VAN MINI VAN MINI VAN MINI VAN MINI VAN MINI VAN SPORTS COUPE SPORTS COUPE SPORTS COUPE SPORTS COUPE SPORTS COUPE SPORTS COUPE SPORTS COUPE SPORTS COUPE SPORTS COUPE SEDAN SEDAN SEDAN SEDAN SEDAN SEDAN SEDAN SEDAN SEDAN COLOR BLUE BLUE BLUE RED RED RED WHITE WHITE WHITE BLUE BLUE BLUE RED RED RED WHITE WHITE WHITE BLUE BLUE BLUE RED RED RED WHITE WHITE WHITE DEALERSHIP CLYDE GLEASON CARR CLYDE GLEASON CARR CLYDE GLEASON CARR CLYDE GLEASON CARR CLYDE GLEASON CARR CLYDE GLEASON CARR CLYDE GLEASON CARR CLYDE GLEASON CARR CLYDE GLEASON CARR VOLUME 6 6 2 3 5 5 2 4 3 2 3 2 7 5 2 4 5 1 6 4 2 1 3 4 2 2 3 24
  • Mutlidimensional Representation Sales Volumes M O D E L Mini Van Coupe Carr Gleason Clyde Sedan Blue Red DEALERSHIP White COLOR 25
  • Viewing Data - An Example Sales Volumes M O D E L DEALERSHIP COLOR •Assume that each dimension has 10 positions, as shown in the cube above •How many records would be there in a relational table? •Implications for viewing data from an end-user standpoint? 26
  • Adding Dimensions- An Example Sales Volumes M O D E L Mini Van Mini Van Coupe Mini Van Coupe Carr Gleason Clyde Sedan Blue Red White COLOR JANUARY Coupe Carr Gleason Clyde Sedan Blue Red White COLOR FEBRUARY Carr Gleason Clyde Sedan Blue Red DEALERSHIP White COLOR MARCH 27
  • 3 When is MDD (In)appropriate? First, consider situation 1 PERSONNEL LAST NAME SMITH REGAN FOX WELD KELLY LINK KRANZ LUCUS WEISS EMPLOYEE# 01 12 31 14 54 03 41 33 23 EMPLOYEE AGE 21 19 63 31 27 56 45 41 19 28
  • When is MDD (In)appropriate? Now consider situation 2 SALES VOLUMES FOR GLEASON DEALERSHIP MODEL MINI VAN MINI VAN MINI VAN SPORTS COUPE SPORTS COUPE SPORTS COUPE SEDAN SEDAN SEDAN COLOR BLUE RED WHITE BLUE RED WHITE BLUE RED WHITE VOLUME 6 5 4 3 5 5 4 3 2 1. Set up a MDD structure for situation 1, with LAST NAME and Employee# as dimensions, and AGE as the measurement. 2. Set up a MDD structure for situation 2, with MODEL and COLOR as dimensions, and SALES VOLUME as the measurement . 29
  • When is MDD (In)appropriate? MDD Structures for the Situations Employee Age Smith 21 Regan Sales Volumes Miini Van 6 5 4 Coupe 3 5 5 4 3 2 Blue M O D E L Red White Sedan Fox L A S T N A M E 19 63 Weld 31 Kelly 27 Link 56 Kranz 45 COLOR Lucas 41 Weiss 19 31 41 23 01 14 54 03 12 33 EMPLOYEE # Note the sparseness in the second MDD representation 30
  • When is MDD (In)appropriate? Highly interrelated dataset types be placed in a multidimensional data structure for greatest ease of access and analysis. When there are no interrelationships, the MDD structure is not appropriate. 31
  • 4 MDD Features - Rotation Sales Volumes Mini Van 6 5 4 Coupe 3 5 5 Sedan 4 3 2 Blue M O D E L Red White COLOR View #1 C O L O R o ( ROTATE 90 ) Blue 6 3 4 Red 5 5 3 White 4 5 2 Mini Van Coupe Sedan MODEL View #2 •Also referred to as “data slicing.” •Each rotation yields a different slice or two dimensional table of data – a different face of the cube. 32
  • MDD Features - Rotation Sales Volumes M O D E L C O L O R Mini Van Coupe Carr Gleason Clyde Sedan Blue Red Red Carr Gleason Clyde White Sedan White COLOR Coupe D E A L E R S H I P Gleason Mini Van Coupe Sedan White Red Blue COLOR DEALERSHIP o o ( ROTATE 90 ) MODEL View #3 Carr M O D E L Gleason Blue Red White Clyde Mini Van Coupe Sedan o ( ROTATE 90 ) MODEL MODEL View #4 Gleason Clyde ( ROTATE 90 ) View #2 Carr Mini Van Coupe Sedan DEALERSHIP View #1 Clyde Red White Carr MODEL o ( ROTATE 90 ) Blue Mini Van DEALERSHIP D E A L E R S H I P C O L O R Blue Mini Van Coupe Blue Red White Sedan Clyde Gleason Carr o DEALERSHIP ( ROTATE 90 ) COLOR View #5 COLOR View #6 33
  • MDD Features - Ranging Sales Volumes M O D E L Mini Van Mini Van Coupe Coupe Normal Blue Carr Clyde Normal Blue Metal Blue Carr Clyde Metal Blue DEALERSHIP COLOR • The end user selects the desired positions along each dimension. • Also referred to as "data dicing." • The data is scoped down to a subset grouping 34
  • MDD Features - Roll-Ups & Drill Downs ORGANIZATION DIMENSION Midwest REGION DISTRICT DEALERSHIP Chicago Clyde Gleason St. Louis Carr Levi Gary Lucas Bolton • The figure presents a definition of a hierarchy within the organization dimension. • Aggregations perceived as being part of the same dimension. • Moving up and moving down levels in a hierarchy is referred to as “roll-up” and “drill-down.” 35
  • The Time Dimension TIME as a predefined hierarchy for rolling-up and drilling-down across days, weeks, months, years and special periods, such as fiscal years.   Eliminates the effort required to build sophisticated hierarchies every time a database is set up. Extra performance advantages 36
  • Pros/Cons of MDD 5 Cognitive Advantages for the User Ease of Data Presentation and Navigation, Time dimension Performance Less flexible Requires greater initial effort 37
  • Multidimensional OLAP (MOLAP) Frontend Tool specialized database technology multidimensional storage structures E.g. Hyperion Essbase, Oracle Express, Cognos PowerPlay (Server) Query Performance Powerful MD Model  write access Database Features  multiuser access/ backup and recovery Multidim. Database Sparsity Handling -> DB Explosion 38
  • MOLAP Server ty Multi-Dimensional OLAP Server Sales M.D. tools Product Ci B A milk soda eggs soap 1 utilities multidimensional server 2 3 4 Date could also sit on relational DBMS 39 39
  • Relational OLAP (ROLAP) idea: use relational data storage star (snowflake) schema E.g. Microstrategy, SAP BW + advantages of RDBMS Frontend Tool MDInterface ROLAPEngine SQL + Meta Data +    scalability, reliability, security etc. Sparsity handling Query Performance Data Model Complexity no write access Relational DB 40
  • ROLAP Server Relational OLAP Server sale prodId p1 p2 p1 date 1 1 2 sum 62 19 48 tools utilities ROLAP server Special indices, tuning; Schema is “denormalized” relational DBMS 41 41
  • Client (Desktop) OLAP proprietary data structure on the client data stored as file mostly RAM based architectures E.g. Business Objects, Cognos PowerPlay ClientOLAP + +   mobile user ease of installation and use data volume no multiuser capabilites 42
  • DW Integration MOLAP ROLAP ClientOLAP ROLAPEngine Multidim. Database DW-DB (mostly relational) 43