# OLAP Fundamentals


1. Understanding Multidimensional Databases
   - Date: 9/23/2010
   - Prepared by Amit Sharma, Hyperion/OBIEE Trainer
   - learnhyperion.wordpress.com, Aloo_a2@yahoo.com
2. Review
   - Architecture
   - Characteristics
   - Relational OLAP
   - Multidimensional OLAP
   - ROLAP vs. MOLAP
3. Star Schema
   - Fact table
   - Dimensions
   - Drilling down & rolling up
   - Slicing & dicing
4. Fact
   - Definition: Facts are numeric measurements (values) that represent a specific business activity. Facts are stored in a fact table, i.e., the center of the star schema. Facts commonly used in business data analysis are units, costs, prices, and revenues.
   - Example: sales figures are numeric measurements that represent product and/or service sales.
5. Fact Table
   - Central table
   - Mostly raw numeric items
   - Narrow rows, a few columns at most
   - Large number of rows (millions to a billion)
   - Accessed via dimensions
6. Fact Table
   - Definition: The central table in a star schema is called the fact table. It contains the facts and is connected to the dimension tables. A fact table typically has two types of columns:
     - columns that contain facts, and
     - foreign keys to dimension tables.
   - The primary key of a fact table is usually a composite key made up of all of its foreign keys.
   - A fact table may contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated facts are often called summary tables). A fact table usually contains facts at a single level of aggregation.
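The composite-key rule can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions: SQLite stands in for the warehouse database, the columns follow the Sales example used in these slides, and the rows are hypothetical. The primary key is the combination of all foreign keys, so a second row for the same market, product, and time period is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Fact table: one column per foreign key plus the numeric fact (Sales_Amt).
# The primary key is the composite of all foreign keys, so each
# (market, product, time) combination can be recorded only once.
conn.execute("""
    CREATE TABLE Sales (
        Market_Id  INTEGER NOT NULL,
        Product_Id INTEGER NOT NULL,
        Time_Id    INTEGER NOT NULL,
        Sales_Amt  REAL,
        PRIMARY KEY (Market_Id, Product_Id, Time_Id)
    )
""")

def add_fact(market_id, product_id, time_id, amount):
    """Insert one fact row; return False if the composite key already exists."""
    try:
        conn.execute("INSERT INTO Sales VALUES (?, ?, ?, ?)",
                     (market_id, product_id, time_id, amount))
        return True
    except sqlite3.IntegrityError:
        return False

first = add_fact(1, 1, 1, 1500.0)      # accepted
duplicate = add_fact(1, 1, 1, 200.0)   # same key combination: rejected
other_week = add_fact(1, 1, 2, 200.0)  # different Time_Id: accepted
```

A different `Time_Id` makes a different key, so the third row is stored alongside the first.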
7. Dimension
   - Definition: Qualifying characteristics that provide additional perspective on a given fact.
   - Example: sales might be compared by product, from region to region, and from one time period to the next. Here, sales have product, location, and time dimensions. Such dimensions are stored in dimension tables.
8. Dimension Tables
   - Definition: The dimensions of the fact table are further described by dimension tables.
   - Fact table: Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
   - Dimension tables:
     - Market (Market_Id, City, State, Region)
     - Product (Product_Id, Name, Category, Price)
     - Time (Time_Id, Week, Month, Quarter)
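Here is a minimal sketch of the Sales star schema, again using Python's `sqlite3` module with hypothetical city, product, and time rows. Each foreign key in the fact table references one dimension table. A star join then pulls descriptive attributes (City, Name, Quarter) onto each fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE Market  (Market_Id  INTEGER PRIMARY KEY, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Product (Product_Id INTEGER PRIMARY KEY, Name TEXT, Category TEXT, Price REAL);
CREATE TABLE Time    (Time_Id    INTEGER PRIMARY KEY, Week INTEGER, Month INTEGER, Quarter INTEGER);
CREATE TABLE Sales (
    Market_Id  INTEGER REFERENCES Market(Market_Id),
    Product_Id INTEGER REFERENCES Product(Product_Id),
    Time_Id    INTEGER REFERENCES Time(Time_Id),
    Sales_Amt  REAL,
    PRIMARY KEY (Market_Id, Product_Id, Time_Id)
);

-- Hypothetical sample rows.
INSERT INTO Market  VALUES (1, 'Chicago', 'Illinois', 'Midwest'), (2, 'Dallas', 'Texas', 'South');
INSERT INTO Product VALUES (10, 'TV', 'Electronics', 400.0), (20, 'PC', 'Computers', 900.0);
INSERT INTO Time    VALUES (100, 1, 1, 1), (200, 14, 4, 2);
INSERT INTO Sales   VALUES (1, 10, 100, 800.0), (1, 20, 200, 1800.0), (2, 10, 200, 400.0);
""")

# Star join: the fact table's foreign keys pick up descriptive
# attributes from each dimension table.
rows = conn.execute("""
    SELECT M.City, P.Name, T.Quarter, S.Sales_Amt
    FROM Sales S
    JOIN Market  M ON M.Market_Id  = S.Market_Id
    JOIN Product P ON P.Product_Id = S.Product_Id
    JOIN Time    T ON T.Time_Id    = S.Time_Id
    ORDER BY M.City, P.Name
""").fetchall()
```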
9. What is a Star Schema?
   - Definition: A star schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema and contains one or more dimension and fact tables.
   - It is called a star schema because the entity-relationship diagram between the dimension and fact tables resembles a star, with one fact table connected to multiple dimensions.
   - The center of the star schema consists of a large fact table that points to the dimension tables.
   - The advantages of a star schema are easy slicing of data, better query performance, and a data model that is easy to understand.
10. Steps in Designing a Star Schema
    - Identify a business process for analysis (e.g., sales).
    - Identify the measures or facts (sales dollars).
    - Identify the dimensions for the facts (product, location, time, and organization dimensions).
    - List the columns that describe each dimension (e.g., region name, branch name).
    - Determine the lowest level of summary in the fact table (sales dollars).
    - In a star schema, every dimension has a primary key.
    - In a star schema, a dimension table does not have any parent table, whereas in a snowflake schema a dimension table has one or more parent tables.
    - In a star schema, the hierarchies for a dimension are stored in the dimension table itself, whereas in a snowflake schema they are broken into separate tables. These hierarchies help users drill down from the topmost level to the lowest level.
11. Attributes
    - Each dimension table contains attributes.
    - Attributes are used to search, filter, and classify facts.
    - Example: for Sales, we can identify some attributes for each dimension:
      - Product dimension: product ID, description, product type
      - Location dimension: region, state, city
      - Time dimension: year, quarter, month, week, and date
12. Attribute Hierarchy
    - Definition: An attribute hierarchy provides a top-down organization of data.
    - It is used for aggregation and for drill-down/roll-up data analysis.
    - Example: location dimension attributes can be organized in a hierarchy by region, state, and city.
    - An attribute hierarchy provides the capability to perform drill-down and roll-up searches.
    - It gives data warehouse (DW) and OLAP systems a defined navigation path.
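To make roll-up along an attribute hierarchy concrete, here is a small Python sketch of a location hierarchy (city -> state -> region). The member names, the child-to-parent maps, and the sales figures are hypothetical. The `roll_up` helper is illustrative and not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical location hierarchy, stored as child -> parent maps.
CITY_TO_STATE = {"Chicago": "Illinois", "Springfield": "Illinois",
                 "Dallas": "Texas", "Austin": "Texas", "Toledo": "Ohio"}
STATE_TO_REGION = {"Illinois": "Midwest", "Ohio": "Midwest", "Texas": "South"}

# Hypothetical city-level sales facts.
city_sales = {"Chicago": 120, "Springfield": 30, "Dallas": 90, "Austin": 60, "Toledo": 40}

def roll_up(totals, parent_of):
    """Aggregate totals one level up the hierarchy using a child -> parent map."""
    result = defaultdict(int)
    for member, amount in totals.items():
        result[parent_of[member]] += amount
    return dict(result)

state_sales = roll_up(city_sales, CITY_TO_STATE)     # city -> state
region_sales = roll_up(state_sales, STATE_TO_REGION)  # state -> region
```

Drilling down is the reverse step. It needs the finer city-level totals, which cannot be recovered from the region totals alone.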
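13. A Concept Hierarchy: Dimension (location)
    [Figure: location hierarchy with levels all -> region -> country -> city -> office. Regions shown: Europe and North_America; countries include Germany, Spain, Canada, and Mexico; cities include Frankfurt, Toronto, and Vancouver; offices include L. Chan and M. Wind.]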
14. A Concept Hierarchy: Dimension (product)

    | Product Line | Product Family | Product Category | Product Name |
    |---|---|---|---|
    | Books | Audiobooks | Fiction | The Adventures of Huckleberry Finn |
    | Books | Audiobooks | Childrens | Winnie The Pooh |
    | Books | Audiobooks | Childrens | The Hobbit |
    | Books | Audiobooks | Biographies | Wild Swans: Three Daughters of China |
    | Books | Arts and Music | Architecture | High Top Almonds |

    Hierarchy: Product_Line -> Product_Family -> Product_Category -> Product_Name
15. Multidimensional Data
    - Sales volume as a function of product, month, and region.
    - Dimensions: Product, Location, Time.
    - Hierarchical summarization paths:
      - Product: Industry -> Category -> Product
      - Location: Region -> Country -> City -> Office
      - Time: Year -> Quarter -> Month -> Day, and Week -> Day
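16. A Sample Data Cube
    [Figure: a 3-D cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (U.S.A, Canada, Mexico, sum). An annotation marks the cell (TV, U.S.A, sum over Date): total annual sales of TV in U.S.A.]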
17. A Sample Data Cube
    [Figure: the same cube (Date x Product x Country, each with a sum member), shown alongside an Essbase retrieval grid.]

    Essbase grid (Sales Manager: John; measure: Sales):

    | State | Qtr1 | Qtr2 | Qtr3 | Qtr4 |
    |---|---|---|---|---|
    | Illinois | 3466 | 3466 | 3466 | 3466 |
    | Ohio | 6633 | 6633 | 6633 | 6633 |
    | Texas | 63446 | 63446 | 63446 | 63446 |
    | California | 200 | 200 | 200 | 200 |
    | New York | 1000 | 1000 | 1000 | 1000 |
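The sample cube on these slides can be mimicked with a Python dictionary keyed by (product, country, quarter). The string `"sum"` stands for the aggregated member on each axis, and the sales figures are hypothetical. `build_cube` adds each base cell to every aggregate cell it contributes to. As a result, the cell `("TV", "U.S.A", "sum")` holds the total annual sales of TV in U.S.A.

```python
from collections import defaultdict
from itertools import product

PRODUCTS = ["TV", "VCR", "PC"]
COUNTRIES = ["U.S.A", "Canada", "Mexico"]
QUARTERS = ["1Qtr", "2Qtr", "3Qtr", "4Qtr"]
ALL = "sum"  # marker for the aggregated ("all") member of a dimension

# Hypothetical base cells: every combination sells 100, except TV in U.S.A.
base = {key: 100 for key in product(PRODUCTS, COUNTRIES, QUARTERS)}
base.update({("TV", "U.S.A", "1Qtr"): 250, ("TV", "U.S.A", "2Qtr"): 300,
             ("TV", "U.S.A", "3Qtr"): 275, ("TV", "U.S.A", "4Qtr"): 325})

def build_cube(base):
    """Return base cells plus every aggregate cell (any coordinate may be ALL)."""
    cube = defaultdict(int)
    for (p, c, q), amount in base.items():
        # Each base cell contributes to 2 x 2 x 2 = 8 cells of the full cube.
        for key in product((p, ALL), (c, ALL), (q, ALL)):
            cube[key] += amount
    return dict(cube)

cube = build_cube(base)
tv_usa_annual = cube[("TV", "U.S.A", ALL)]  # total annual sales of TV in U.S.A.
```

With three products, three countries, and four quarters, plus one `sum` member per axis, the full cube has 4 x 4 x 5 = 80 cells.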
18. Star Schema
    - A single fact table, and one dimension table for each dimension
    - Does not capture hierarchies directly
19. Example of Star Schema: Figure 1.6
20. In the example, the sales fact table is connected to the location, product, time, and organization dimensions. Data can be sliced across all dimensions, and it can also be aggregated across multiple dimensions. "Sales dollar" in the sales fact table can be calculated across each dimension independently or across several dimensions in combination, for example:
    - Sales dollar value for a particular product
    - Sales dollar value for a product in a location
    - Sales dollar value for a product in a year within a location
    - Sales dollar value for a product in a year within a location, sold or serviced by an employee
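These combinations can be sketched as one grouping function that accepts any subset of dimensions. The fact rows and the `sales_by` helper are hypothetical, and the `employee` column stands in for the organization dimension.

```python
from collections import defaultdict

# Hypothetical rows from a sales fact table, already joined to dimension attributes.
FACTS = [
    {"product": "TV", "location": "Toronto", "year": 2009, "employee": "E1", "sales_dollar": 500},
    {"product": "TV", "location": "Toronto", "year": 2010, "employee": "E2", "sales_dollar": 700},
    {"product": "TV", "location": "Dallas",  "year": 2010, "employee": "E3", "sales_dollar": 300},
    {"product": "PC", "location": "Toronto", "year": 2010, "employee": "E1", "sales_dollar": 900},
]

def sales_by(facts, *dims):
    """Sum sales_dollar grouped by any combination of dimension names."""
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[d] for d in dims)] += row["sales_dollar"]
    return dict(totals)

by_product = sales_by(FACTS, "product")
by_product_location = sales_by(FACTS, "product", "location")
by_product_location_year = sales_by(FACTS, "product", "location", "year")
```

Adding a dimension to the grouping yields a finer breakdown, and every finer total still sums to the coarser one.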
21. Example of Star Schema
    - Sales fact table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
    - time: time_key, day, day_of_the_week, month, quarter, year
    - item: item_key, item_name, brand, type, supplier_type
    - branch: branch_key, branch_name, branch_type
    - location: location_key, street, city, province_or_state, country
22. Aggregation
    - Many OLAP queries involve aggregating the data in the fact table.
    - For example, to find the total sales (over time) of each product in each market, we might use:
      `SELECT S.Market_Id, S.Product_Id, SUM(S.Sales_Amt) FROM Sales S GROUP BY S.Market_Id, S.Product_Id`
    - The aggregation is over the entire time dimension and thus produces a two-dimensional view of the data.
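The aggregation query can be run against a toy Sales table using Python's `sqlite3` module. The rows are hypothetical, and the `ORDER BY` is added only to make the output order deterministic. Reshaping the result by product and market gives a two-dimensional Product x Market view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Market_Id TEXT, Product_Id TEXT, Time_Id INTEGER, Sales_Amt REAL)")
# Hypothetical fact rows covering two time periods.
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    ("M1", "P1", 1, 10.0), ("M1", "P1", 2, 20.0),
    ("M2", "P1", 1, 5.0),
    ("M1", "P2", 2, 7.0), ("M2", "P2", 1, 3.0), ("M2", "P2", 2, 4.0),
])

# Time_Id is not in GROUP BY, so SUM collapses the time dimension.
result = conn.execute("""
    SELECT S.Market_Id, S.Product_Id, SUM(S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Market_Id, S.Product_Id
    ORDER BY S.Market_Id, S.Product_Id
""").fetchall()

# Reshape into rows = Product_Id, columns = Market_Id.
grid = {}
for market, prod, total in result:
    grid.setdefault(prod, {})[market] = total
```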
23. Aggregation Over Time
    The output of the previous query: [Table: SUM(Sales_Amt) with one row per Product_Id (P1 to P5) and one column per Market_Id (M1 to M4).]
24. Typical OLAP Operations
    - Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction.
    - Drill down (roll down): the reverse of roll-up; go from a higher-level summary to a lower-level summary or detailed data, or introduce new dimensions.
    - Slice and dice: project and select.
    - Pivot (rotate): reorient the cube for visualization, e.g., turn a 3-D cube into a series of 2-D planes.
    - Other operations:
      - Drill across: involves (queries across) more than one fact table.
      - Drill through: goes through the bottom level of the cube to its back-end relational tables (using SQL).
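Two of these operations, roll-up by dimension reduction and pivot, can be sketched on a small Python dictionary cube. The cube contents and the helper names are hypothetical; an OLAP server would perform these operations on its own storage.

```python
from collections import defaultdict

# Hypothetical 3-D cube: (product, region, quarter) -> sales.
cube = {
    ("TV", "East", "Q1"): 10, ("TV", "East", "Q2"): 12,
    ("TV", "West", "Q1"): 8,  ("PC", "East", "Q1"): 20,
    ("PC", "West", "Q2"): 15,
}
DIMS = ("product", "region", "quarter")

def roll_up_dimension(cube, dims, drop):
    """Roll up by dimension reduction: sum out the dimension named `drop`."""
    i = dims.index(drop)
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:i] + key[i + 1:]] += value
    return dict(out), dims[:i] + dims[i + 1:]

def pivot(view2d):
    """Pivot a 2-D view by swapping its row and column axes."""
    return {(col, row): value for (row, col), value in view2d.items()}

by_product_quarter, kept = roll_up_dimension(cube, DIMS, "region")
by_quarter_product = pivot(by_product_quarter)
```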
25. Drilling Down and Rolling Up
    - Some dimension tables form an aggregation hierarchy: Market_Id -> City -> State -> Region.
    - Executing a series of queries that moves down a hierarchy (e.g., from aggregation over regions to aggregation over states) is called drilling down.
      - Drilling down requires the fact table or information more specific than the requested aggregation (e.g., cities).
    - Executing a series of queries that moves up the hierarchy (e.g., from states to regions) is called rolling up.
26. Drilling Down
    - Drilling down on market, from Region to State, using Sales (Market_Id, Product_Id, Time_Id, Sales_Amt) and Market (Market_Id, City, State, Region):
      - `SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt) FROM Sales S, Market M WHERE M.Market_Id = S.Market_Id GROUP BY S.Product_Id, M.Region`
      - `SELECT S.Product_Id, M.State, SUM(S.Sales_Amt) FROM Sales S, Market M WHERE M.Market_Id = S.Market_Id GROUP BY S.Product_Id, M.State`
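The two drill-down queries can be tried on a toy database using Python's `sqlite3` module. The Market and Sales rows are hypothetical. Only the `GROUP BY` attribute changes between the two queries, from Region to the finer State.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Market (Market_Id INTEGER PRIMARY KEY, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Sales  (Market_Id INTEGER, Product_Id TEXT, Time_Id INTEGER, Sales_Amt REAL);
-- Hypothetical rows: two states in one region, two cities in one state.
INSERT INTO Market VALUES (1, 'Chicago', 'Illinois', 'Midwest'),
                          (2, 'Peoria',  'Illinois', 'Midwest'),
                          (3, 'Toledo',  'Ohio',     'Midwest');
INSERT INTO Sales VALUES (1, 'P1', 1, 100.0), (2, 'P1', 1, 50.0), (3, 'P1', 1, 30.0);
""")

# Coarser level: totals per region.
by_region = conn.execute("""
    SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt)
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.Region
""").fetchall()

# Drill down: same join, grouped by the finer State attribute.
by_state = conn.execute("""
    SELECT S.Product_Id, M.State, SUM(S.Sales_Amt)
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.State
    ORDER BY M.State
""").fetchall()
```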
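27. Rolling Up
    - Rolling up on market, from State to Region:
      1. If we have already created a table, State_Sales, using
         `SELECT S.Product_Id, M.State, SUM(S.Sales_Amt) AS Sales_Amt FROM Sales S, Market M WHERE M.Market_Id = S.Market_Id GROUP BY S.Product_Id, M.State`
      2. then we can roll up from there to:
         `SELECT T.Product_Id, M.Region, SUM(T.Sales_Amt) FROM State_Sales T, (SELECT DISTINCT State, Region FROM Market) M WHERE M.State = T.State GROUP BY T.Product_Id, M.Region`
    - The second query joins to the distinct (State, Region) pairs because Market has one row per market. Joining on the full Market table would count each state total once per market in that state.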
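Rolling up from State_Sales can be checked the same way with Python's `sqlite3` module and hypothetical rows. One detail matters here. Market has one row per market, so a state with several markets (Illinois below) appears more than once. Joining State_Sales to the full Market table on State would therefore count that state's total once per market. The sketch joins to the distinct (State, Region) pairs instead and compares the two results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Market (Market_Id INTEGER PRIMARY KEY, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Sales  (Market_Id INTEGER, Product_Id TEXT, Time_Id INTEGER, Sales_Amt REAL);
INSERT INTO Market VALUES (1, 'Chicago', 'Illinois', 'Midwest'),
                          (2, 'Peoria',  'Illinois', 'Midwest'),
                          (3, 'Toledo',  'Ohio',     'Midwest');
INSERT INTO Sales VALUES (1, 'P1', 1, 100.0), (2, 'P1', 1, 50.0), (3, 'P1', 1, 30.0);

-- Step 1: materialize the state-level aggregate; the alias lets
-- the next query refer to T.Sales_Amt.
CREATE TABLE State_Sales AS
    SELECT S.Product_Id, M.State, SUM(S.Sales_Amt) AS Sales_Amt
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.State;
""")

# Step 2: roll up to Region using one row per (State, Region) pair.
rolled_up = conn.execute("""
    SELECT T.Product_Id, M.Region, SUM(T.Sales_Amt)
    FROM State_Sales T, (SELECT DISTINCT State, Region FROM Market) M
    WHERE M.State = T.State
    GROUP BY T.Product_Id, M.Region
""").fetchall()

# For comparison: joining the full Market table repeats Illinois twice.
naive = conn.execute("""
    SELECT T.Product_Id, M.Region, SUM(T.Sales_Amt)
    FROM State_Sales T, Market M
    WHERE M.State = T.State
    GROUP BY T.Product_Id, M.Region
""").fetchall()
```

`rolled_up` matches the region total computed directly from Sales (180). `naive` counts Illinois's 150 twice and reports 330.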
28. Roll-up and Drill-down
    - Hierarchy, from the highest level of aggregation to low-level details: Sales Channel -> Region -> Country -> State -> Location Address -> Sales Representative
    - Roll up moves toward higher levels of aggregation; drill down moves toward low-level details.
29. "Slicing and Dicing"
    [Figure: a cube with dimensions Product (Household, Telecomm, Video, Audio), Sales Channel (Retail, Direct, Special), and Regions (India, Far East, Europe), with the Telecomm slice highlighted.]
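Slicing and dicing can be sketched on a dictionary cube that uses the dimension members from this slide. The cell values and the helper functions are hypothetical. A slice fixes one dimension to a single member (here Product = Telecomm), which leaves a 2-D sub-cube. A dice keeps a subset of members on several dimensions at once.

```python
from itertools import product

PRODUCTS = ["Household", "Telecomm", "Video", "Audio"]
CHANNELS = ["Retail", "Direct", "Special"]
REGIONS = ["India", "Far East", "Europe"]

# Hypothetical cube: every (product, channel, region) cell holds 1,
# except two Telecomm cells given distinct values for illustration.
cube = {cell: 1 for cell in product(PRODUCTS, CHANNELS, REGIONS)}
cube[("Telecomm", "Retail", "India")] = 5
cube[("Telecomm", "Direct", "Europe")] = 7

def slice_cube(cube, axis, member):
    """Slice: fix dimension `axis` to one member and drop that coordinate."""
    return {tuple(k for i, k in enumerate(key) if i != axis): v
            for key, v in cube.items() if key[axis] == member}

def dice_cube(cube, allowed):
    """Dice: keep cells whose coordinates all fall in the allowed sets."""
    return {key: v for key, v in cube.items()
            if all(k in allowed[i] for i, k in enumerate(key))}

telecomm_slice = slice_cube(cube, 0, "Telecomm")  # (channel, region) -> value
diced = dice_cube(cube, [{"Telecomm", "Video"}, {"Retail", "Direct"}, {"India", "Europe"}])
```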
30. Snowflake Schema
    A snowflake schema is a star schema structure normalized through the use of outrigger tables, i.e., dimension table hierarchies are broken into simpler tables. The star schema example had four dimensions (location, product, time, and organization) and one fact table (sales).
31. Snowflake Schema
    - Represents dimensional hierarchies directly by normalizing tables.
    - Easy to maintain and saves storage.
32. Example of Snowflake Schema (figure)
33. Example of Snowflake Schema
    - Sales fact table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
    - time: time_key, day, day_of_the_week, month, quarter, year
    - item: item_key, item_name, brand, type, supplier_key
    - supplier: supplier_key, supplier_type
    - branch: branch_key, branch_name, branch_type
    - location: location_key, street, city_key
    - city: city_key, city, province_or_state, country
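The extra join introduced by snowflaking can be sketched with Python's `sqlite3` module. Only the location branch of the schema is modeled, and the fact table is reduced to the columns the query needs. The rows, including the street addresses, are hypothetical. Totals by city now require the path fact -> location -> city, one more join than the star version needs. In exchange, each city's details are stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked location dimension: city details live in their own table.
CREATE TABLE city     (city_key INTEGER PRIMARY KEY, city TEXT,
                       province_or_state TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(city_key));
-- Fact table reduced to the columns this sketch needs.
CREATE TABLE sales    (location_key INTEGER REFERENCES location(location_key),
                       units_sold INTEGER, dollars_sold REAL);

-- Hypothetical rows.
INSERT INTO city VALUES (1, 'Toronto', 'Ontario', 'Canada'),
                        (2, 'Vancouver', 'British Columbia', 'Canada');
INSERT INTO location VALUES (10, '1 King St', 1), (11, '2 Queen St', 1), (20, '3 Main St', 2);
INSERT INTO sales VALUES (10, 5, 500.0), (11, 2, 200.0), (20, 1, 100.0);
""")

# Two joins: fact -> location -> city.
by_city = conn.execute("""
    SELECT c.city, SUM(s.dollars_sold)
    FROM sales s
    JOIN location l ON l.location_key = s.location_key
    JOIN city c     ON c.city_key     = l.city_key
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()
```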
34. Questions?