Biug 20112026 dimensional modeling and mdx best practices


My presentation at the last Israeli BI User Group.
Some of the content is taken from SQLBits and Chris Webb's blog.

Published in: Technology, Business

  1. 1. BIUGItay
  2. 2. Agenda • Dimension Design • SSAS Best Practices • MDX • Inspired by Vincent Rainardi and Mosha Pasumansky
  3. 3. l l lBI- DB l l l DB- BI- l
  4. 4. Microstrategy- BI • DWH • • BI SQL SERVER- SYBASE- • • DB P T • DB OEM • •WEB SILVERLIGHT C • • .NET
  5. 5. White Papers
     • Analysis Services 2008 R2 Performance Guide
     • Analysis Services 2008 Operation Guide
     • Performance Improvements for MDX in SQL Server 2008 Analysis Services
     • OLAP Design Best Practices
  6. 6. 1 or 2 dimensions
     a) One Dimension: Dim Account (holding the customer attributes) joined to the Fact Table
        • Simplicity, 1 dim
        • Hierarchy from customer attribute & account attribute
        • Use when we don't have fact tables requiring customer grain.
     b) Two Dimensions: Dim Account and Dim Customer, each joined to the Fact Table
        • We can get the customer attributes without knowing the account key
        • Disadvantage: can't go from account to customer without going through the fact table (performance)
  7. 7. 1 or 2 dimensions
     c) Snowflake: Dim Customer joined to Dim Account, Dim Account joined to the Fact Table
        • Dim customer is needed by another fact table
        • Modular: 2 separate dim tables, but we can combine them easily to create a bigger dimension
        • Getting the breakdown of a measure by a customer attribute is a bit more complicated than in a):
          select c.attribute, sum(f.measure1)
          from fact1 f
          inner join dim_account a on f.account_key = a.account_key
          inner join dim_customer c on a.customer_key = c.customer_key
          group by c.attribute
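The snowflake query on the slide above can be tried end to end with a small sketch. It uses SQLite from Python only so that it is runnable anywhere; the `segment` column and all sample rows are assumptions made for illustration, not part of the original deck.

```python
import sqlite3

# In-memory database with a tiny snowflake: fact1 -> dim_account -> dim_customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, segment TEXT);
CREATE TABLE dim_account  (account_key  INTEGER PRIMARY KEY,
                           customer_key INTEGER REFERENCES dim_customer);
CREATE TABLE fact1        (account_key  INTEGER REFERENCES dim_account,
                           measure1     REAL);

-- Hypothetical sample data.
INSERT INTO dim_customer VALUES (1, 'Retail'), (2, 'Corporate');
INSERT INTO dim_account  VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO fact1        VALUES (10, 100.0), (11, 50.0), (12, 70.0), (12, 30.0);
""")

# Same shape as the slide's query: the fact table reaches the customer
# attribute only through the account dimension.
rows = conn.execute("""
SELECT c.segment, SUM(f.measure1)
FROM fact1 f
INNER JOIN dim_account  a ON f.account_key  = a.account_key
INNER JOIN dim_customer c ON a.customer_key = c.customer_key
GROUP BY c.segment
ORDER BY c.segment
""").fetchall()

print(rows)
```

The extra join is the cost of the snowflake; in option a) the same breakdown would need only one join, because the customer attribute would live in the account dimension.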
  8. 8. When to Snowflake
     1. When the sub dim is used by several dims
        • City-Country-Region columns exist in DimBroker, DimPolicy, DimOffice and DimInsured
        • Replaced by Location/GeoKey pointing to DimLocation / DimGeography
        • Advantage: consistent hierarchy, i.e. the relationship between City, Country & Region.
        • Weakness: we lose flexibility. City to Country is more or less fixed, but the grouping of countries might differ between dimensions.
  9. 9. When to Snowflake
     2. When the sub dim is used by both the main dim and the fact table(s)
        • DimCustomer is used in DimAccount, and is also used in the fact table.
        • DimManufacturer is used in DimProduct, and is also used in the fact table.
        • DimProductGroup is used in DimProduct, and is also used in some fact tables.
        • The alternative is maintaining two full dimensions (classic star).
  10. 10. When to Snowflake
      3. To make “base dim” and “detail dim”
         • Insurance classes, account types (banking), product lines, diagnosis, treatment (health care)
         • Policies for marine, aviation & property classes have different attributes.
         • Pull common attributes into 1 dim: DimBasePolicy
         • Put class-specific attributes into DimMarine, DimProperty, DimAviation
         • Ref: Kimball DW Toolkit 2nd edition page 213
  11. 11. A dimension with only 1 attribute
      Should we put the attribute in the fact table? (like DD = Degenerate Dim)
      Probably, if the grain = fact table, and it's short or it's a number.
      Reasons for putting a single attribute in its own dim:
      – Keep the fact table slim (4 bytes int, not 100 bytes varchar)
      – When the value changes, we don't have to update the BIG fact table (ETL performance)
      – Grain is much lower than the fact table (small dim)
      – Yes, it's only 1 attribute today, but in the future there could be another attribute.
  12. 12. Fact Table Primary Key
      Should we have a PK? Some experts totally disagree.
      Yes, if we need to be able to identify each fact row:
      1. Need to refer to a fact row from another fact row, e.g. chain of events
      2. Many identical fact rows and we need to update/delete only one
      3. To link the fact table to another fact table
      [Diagram labels: Related Trans (previous/next transaction); Header - Detail; Uniqueness; keys marked PK, FK, PK, FK (no RI), PK (not enforced)]
  13. 13. Fact Table Primary Key
      Single or Multi Column?
      • Single Column: Generated Identity
      • Multi Column: Dimension Keys
      A single-column PK is better than a multi-column PK because:
      1) A multi-column PK may not be unique. A single-column PK guarantees that the PK is unique, because it is an identity column.
      2) A single-column PK is slimmer than a multi-column PK, which gives better query performance. To do a self join in the fact table (e.g. to link the current fact row to the previous fact row), we join on a single integer column.
  14. 14. Fact Table Primary Key
      • Advantage: prevents duplicate rows, query performance
      • Disadvantage: loading performance
      • Indexing the PK: cluster or not?
        – Cluster the PK if: the PK is an identity column
        – Don't cluster the PK if: the PK is a composite, or when you need the clustered index for query performance (with partitioning)
      Example of not having a PK: when duplicate fact rows are allowed, e.g. a retail DW with Store Key, Date Key, Product Key, Customer Key, where the same customer buys the same milk in the same shop twice on the same day.
  15. 15. Aggregate Fact Tables
      What are they?
      • High-level aggregation of base fact tables
      • A “select group by” query on a 2 billion row fact table can take 30 mins if it joins with two big fact tables, even with indexes in place
      • So we do this query in advance as part of the DW load and store it as an Aggregate Fact Table
      • The report only takes 1 second to run.
      [Diagram: Base Fact Tables feed the Aggregate Fact Table (30 mins); the Aggregate Fact Table feeds the Report (1 sec)]
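A minimal sketch of the idea above: run the expensive "select group by" once during the load and let reports read the small result. SQLite is used from Python purely for portability; the table names `fact_sales` and `agg_sales_by_date_store`, the columns, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical base fact table (a real one would hold billions of rows).
CREATE TABLE fact_sales (date_key INTEGER, store_key INTEGER,
                         product_key INTEGER, amount REAL);
INSERT INTO fact_sales VALUES
  (20110301, 1, 100, 10.0),
  (20110301, 1, 101,  5.0),
  (20110301, 2, 100,  7.5),
  (20110302, 1, 100,  2.5);

-- Done once, as part of the DW load: pre-compute the GROUP BY.
CREATE TABLE agg_sales_by_date_store AS
SELECT date_key, store_key, SUM(amount) AS amount, COUNT(*) AS row_count
FROM fact_sales
GROUP BY date_key, store_key;
""")

# Done at report time: read the small aggregate instead of the base fact table.
agg = conn.execute("""
SELECT date_key, store_key, amount, row_count
FROM agg_sales_by_date_store
ORDER BY date_key, store_key
""").fetchall()

# Sanity check: the aggregate must reconcile with the base table.
base_total = conn.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]
agg_total = sum(row[2] for row in agg)
print(agg, base_total, agg_total)
```

Keeping a row count next to the summed measure is a design choice that makes it cheap to reconcile the aggregate against the base table after each load.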
  16. 16. Rapidly Changing Dimension
      • Why is it a problem
        – Large SCD2 dim
        – Attributes change every day
        – Slow queries when joined with large fact tables
      • What to do
        – Put into a separate dim, link directly to the fact table.
        – Just store the latest, type 1 attributes (or dual)
        – Store in the fact table (for a small attribute, e.g. an indicator)
      [Diagram labels: Type2, Type2, Type2, Type2, Type1]
  17. 17. Very Large Dimension
      Why is it a problem
      – SSAS: 4 GB string store limit for a dimension
      – SSAS: dim is “select distinct” on each attribute, so processing time is long
      – Difficult to browse a high cardinality attribute
      – Join with fact tables: performance
  18. 18. Very Large Dimension (VLD)
      What to do
      – Split into 2 dims, same grain. Always cut vertically.
      – Remove SCD2, or at least keep it only on certain columns.
      – Most common: separate the attributes with high cardinality/change frequency
  19. 19. Real Time Fact Table
      • Reporting the transaction system in real time
      • View to union with the normal fact table, or use partitions
      • Freezing the dims for key lookup, -3 unknown key
      • Key corrections next day
      Unknown keys:
      -1 null in source
      -2 not in dim table
      -3 not in dim table as dim was frozen (to be resolved in the next batch)
      [Diagram: Main partition (up to last night) and Real time partition (intraday today), both looking up keys in the frozen dims (dims as of yesterday)]
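The unknown-key convention above can be sketched as a small lookup routine. This is only an illustration of the three codes listed on the slide; the function name `lookup_key`, the dictionary-based dimension, and the sample customer IDs are assumptions, not the presenter's implementation.

```python
# Unknown-key codes as listed on the slide.
UNKNOWN_NULL_IN_SOURCE = -1   # natural key is null in the source row
UNKNOWN_NOT_IN_DIM = -2       # key missing from a dimension that is up to date
UNKNOWN_DIM_FROZEN = -3       # key missing because the dim was frozen last night


def lookup_key(natural_key, dim_lookup, dim_is_frozen):
    """Return the surrogate key for natural_key, or an unknown-key code.

    dim_lookup is a hypothetical dict {natural key: surrogate key} that
    stands in for the dimension table.
    """
    if natural_key is None:
        return UNKNOWN_NULL_IN_SOURCE
    if natural_key in dim_lookup:
        return dim_lookup[natural_key]
    # During the day the dim is frozen, so a miss may just be a new member
    # that the next batch will add; mark it so it can be corrected later.
    return UNKNOWN_DIM_FROZEN if dim_is_frozen else UNKNOWN_NOT_IN_DIM


# Dimension as of yesterday (hypothetical members).
dim_customer = {"C001": 1, "C002": 2}

intraday_keys = [lookup_key(k, dim_customer, dim_is_frozen=True)
                 for k in ["C001", None, "C999"]]
batch_keys = [lookup_key(k, dim_customer, dim_is_frozen=False)
              for k in ["C002", "C999"]]
print(intraday_keys, batch_keys)
```

Rows loaded with -3 are the ones the "key corrections next day" step would revisit once the dimension has been refreshed.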
  20. 20. Dealing with Currency Rates
      What for/background/requirements
      – Report in 3 reporting currencies, using today's rates or past rates
      – Analyse over time without the impact of currency rates (using fixed currency rates, e.g. 2010 EOY rates)
      – Had the transactions happened today
      – Currency rates historical analysis
      [Diagram: Transaction Currency (100 countries, 40 currencies) is converted with Transaction Rates (many transaction dates) to the DW Currency (1 currency, e.g. GBP), which is converted with Reporting Rates (1 reporting date) to the Reporting Currency (3-4 currencies: GBP, USD, EUR, Original)]
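The two-step conversion in the diagram above (transaction currency to DW currency at the transaction-date rate, then DW currency to reporting currency at a single reporting-date rate) can be sketched as follows. All rates, dates, and function names here are made up for illustration; they are not real market rates and not taken from the deck.

```python
from decimal import Decimal

# Hypothetical rates into the DW currency (GBP), keyed by transaction date.
txn_rates_to_gbp = {
    ("EUR", "2011-03-01"): Decimal("0.80"),
    ("EUR", "2011-03-02"): Decimal("0.90"),
    ("GBP", "2011-03-01"): Decimal("1"),
}

# Hypothetical rates out of GBP, keyed by one reporting date.
reporting_rates_from_gbp = {
    ("USD", "2011-03-31"): Decimal("1.60"),
    ("GBP", "2011-03-31"): Decimal("1"),
}


def to_dw_currency(amount, currency, txn_date):
    """Step 1: convert at the rate of the day the transaction happened."""
    return amount * txn_rates_to_gbp[(currency, txn_date)]


def to_reporting_currency(amount_gbp, reporting_currency, reporting_date):
    """Step 2: convert every DW amount at the same reporting-date rate."""
    return amount_gbp * reporting_rates_from_gbp[(reporting_currency, reporting_date)]


transactions = [
    (Decimal("100"), "EUR", "2011-03-01"),
    (Decimal("100"), "EUR", "2011-03-02"),
]

total_gbp = sum(to_dw_currency(amt, cur, day) for amt, cur, day in transactions)
total_usd = to_reporting_currency(total_gbp, "USD", "2011-03-31")
print(total_gbp, total_usd)
```

Swapping the reporting-rate table for a fixed set (for example end-of-year rates) is how the "analyse over time without the impact of currency rates" requirement could be met without touching the stored DW amounts.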
  21. 21. Dealing with Currency Rates• A good example can be found here.
  22. 22. Dealing with Status
      What/background
      – Workflow (policies, contracts, documents)
      – Bottleneck analysis (no. of days between stages)
      – How many on each stage
      [Diagram: workflow stages Status 1 to Status 6, with date1 to date4 marked against Status 1, 2, 4 and 6]
  23. 23. Dealing with Status
      Approaches
      – Accumulating Snapshot Fact, 1 row per application

        AppKey  Sts1Date  Sts1Ind  Sts2Date  Sts2Ind  Sts3Date  Sts3Ind
        1       1/3/11    1        3/3/11    1        7/3/11    1
        2       6/3/11    1        7/3/11    1                  0

      – SCD2 on DimApp

        AppKey  AppID  StsKey  StsDate  Current
        1       1      1       1/3/11   N
        2       1      2       3/3/11   N
        3       1      3       7/3/11   Y
        4       2      1       6/3/11   N
        5       2      2       7/3/11   Y

      – App Status fact table

        AppKey  StsKey  StsDateKey
        1       1       61
        1       2       63
        1       3       67
        2       1       66
        2       2       67
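To make the relationship between two of these approaches concrete, the sketch below pivots the App Status fact rows from the slide into one accumulating-snapshot row per application and derives the days between stages for bottleneck analysis. It assumes the `StsDateKey` values are consecutive day numbers (so subtracting them gives days); the function names are hypothetical.

```python
# (AppKey, StsKey, StsDateKey) rows copied from the App Status fact table.
status_fact = [
    (1, 1, 61),
    (1, 2, 63),
    (1, 3, 67),
    (2, 1, 66),
    (2, 2, 67),
]


def accumulating_snapshot(rows, n_statuses=3):
    """Pivot status rows into {app_key: [date key per status, or None]}."""
    snapshot = {}
    for app_key, sts_key, date_key in rows:
        row = snapshot.setdefault(app_key, [None] * n_statuses)
        row[sts_key - 1] = date_key
    return snapshot


def days_between_stages(date_keys):
    """Days spent between consecutive stages; None if a stage is not reached.

    Assumes date keys are consecutive day numbers.
    """
    return [
        later - earlier if earlier is not None and later is not None else None
        for earlier, later in zip(date_keys, date_keys[1:])
    ]


snapshot = accumulating_snapshot(status_fact)
durations = {app: days_between_stages(keys) for app, keys in snapshot.items()}
print(snapshot)
print(durations)
```

Application 2 has not reached status 3 yet, which corresponds to the empty Sts3Date and the 0 indicator in the accumulating snapshot table.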
  24. 24. Referenced Dimensions
      • Enables using one “master” member
      • Not a snowflake dimension
        – For example:
          • Dim Customers: UK, London, Roman Avramovich.
          • Dim Stores: UK, London, Friendly Bikes Store
        – What is the total revenue from Internet customers and stores in London?
  25. 25. MDX Optimization Methodology
      • Re-write the MDX code
      • Add aggregations
      • Add pre-calculated measure groups (ETL)
      • Solve the problem using the relational engine
      • Use .NET stored procedures.
        – Rarely can the problem be solved using better hardware.
      • Column-based databases
  26. 26. Optimizing MDX: Baselining Query Speeds
      • Clearing the Analysis Services caches
      • Clearing the operating system caches using fsutil.exe or SSAS Stored Proc (codeplex)
      • Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services
      • Configuring the Analysis Services query log
  27. 27. Cell-by-Cell Mode vs. Subspace Mode
      Almost always, performance obtained by using subspace (or block computation) mode is superior to that obtained by using cell-by-cell (or naïve) mode.
  28. 28. Using Profiler
      • So far so good
  29. 29. Doesn't use the cache
  30. 30. Subcube
      • Granularity
      • Slice
  31. 31. Granularity
      • Single grain
        – List of GROUP BY attributes in SQL SELECT
      • Mixed grain
        – Both Attribute.[All] and Attribute.MEMBERS
  32. 32. Granularity
      [Figure: granularity grid. Column labels: All Countries, Countries, Country, All City, Cities; row labels: All Products, Products]
  33. 33. Slice
      • Single member
        – SQL: Where City = 'Redmond'
        – MDX: [City].[Redmond]
      • Multiple members
        – SQL: Where City IN ('Redmond', 'Seattle')
        – MDX: { [City].[Redmond], [City].[Seattle] }
  34. 34. Slice at granularity
      SQL
        SELECT Sum(Sales), City FROM Sales_Table
        WHERE City IN ('Redmond', 'Seattle')
        GROUP BY City
      MDX
        SELECT Measures.Sales ON 0, NON EMPTY {Redmond, Seattle} ON 1
        FROM Sales_Cube
  35. 35. Slice below granularity
      SQL
        SELECT Sum(Sales) FROM Sales_Table
        WHERE City IN ('Redmond', 'Seattle')
      MDX
        SELECT Measures.Sales ON 0
        FROM Sales_Cube
        WHERE {Redmond, Seattle}
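The difference between the two SQL forms above is easy to see on a few rows. The sketch below runs both queries against an in-memory SQLite table from Python; the sales figures are hypothetical, and only the SQL side is shown because SQLite cannot run MDX.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales_Table (City TEXT, Sales REAL);
-- Hypothetical sample rows.
INSERT INTO Sales_Table VALUES
  ('Redmond', 10.0), ('Redmond', 5.0), ('Seattle', 20.0), ('London', 40.0);
""")

# Slice at granularity: City is filtered AND grouped, one row per city.
at_granularity = conn.execute("""
SELECT City, SUM(Sales) FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
GROUP BY City
ORDER BY City
""").fetchall()

# Slice below granularity: City is filtered but not grouped, so the result
# is a single total above the level at which the filter was applied.
below_granularity = conn.execute("""
SELECT SUM(Sales) FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
""").fetchall()

print(at_granularity, below_granularity)
```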
  36. 36. Examples
      [Figure: grid of cities (All Cities, Redmond, Seattle, New York, London) by years (All Years, 2005, 2006, 2007, 2008)]
  37. 37. Examples
      [Figure: same grid, subcube shown: (Seattle, Year.Year.MEMBERS)]
  38. 38. Examples
      [Figure: same grid, subcube shown: (Seattle, Year.MEMBERS)]
  39. 39. Examples
      [Figure: same grid, subcube shown: ({Redmond, Seattle, London}, Year.MEMBERS)]
  40. 40. Examples
      [Figure: same grid, subcube shown: ({Redmond, Seattle}, {2005, 2006, 2007})]
  41. 41. Arbitrary shaped subcubes
      • What is it?
      • How can it happen?
      • Why is it so bad?
      • How to avoid them?
  42. 42. Arbitrary shaped subcubes
      [Figure: grid of cities (All Cities, Redmond, Seattle, New York, London) by years (All Years, 2005, 2006, 2007, 2008)]
      Union((Redmond, Year.Year.MEMBERS), (City.City.MEMBERS, 2005))
  43. 43. Arbitrary shaped subcubes
      [Figure: grid of cities (All Cities, Redmond, Seattle, SF, Denver) by years (All Years, 2005, 2006, 2007, 2008)]
      CrossJoin(City.City.MEMBERS, Year.Year.MEMBERS) – (Seattle, 2007)
  44. 44. Arbitrary shaped subcubes
      [Figure: grid of cities (All Cities, Redmond, Seattle, New York, London) by years (All Years, 2005, 2006, 2007, 2008)]
      {(Redmond, 2005), (Seattle, 2006), (New York, 2007), (London, 2008)}
  45. 45. Arbitrary shaped subcubes
      [Figure: grid of cities (All Cities, Redmond, Seattle, New York, London) by years (All Years, 2005, 2006, 2007, 2008)]
      Union(([All Cities], Year.MEMBERS), (City.MEMBERS, [All Years]))
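One way to build intuition for the examples above is to treat a subcube as a set of (city, year) cells and ask whether it is a plain crossjoin of a set of cities with a set of years. The sketch below does that; it is a simplification that ignores granularity and the All members, and the function name `is_regular_subcube` is an assumption made for illustration.

```python
from itertools import product


def is_regular_subcube(cells):
    """True if the cells equal the crossjoin of their per-axis projections."""
    cells = set(cells)
    if not cells:
        return True
    # Project the cells onto each axis, e.g. all cities used and all years used.
    projections = [set(axis_values) for axis_values in zip(*cells)]
    return cells == set(product(*projections))


cities = ["Redmond", "Seattle", "New York", "London"]
years = [2005, 2006, 2007, 2008]

# ({Redmond, Seattle}, {2005, 2006, 2007}): a rectangle, so regular.
rectangle = set(product(["Redmond", "Seattle"], [2005, 2006, 2007]))

# Union((Redmond, all years), (all cities, 2005)): a cross shape.
cross = set(product(["Redmond"], years)) | set(product(cities, [2005]))

# CrossJoin(cities, years) minus (Seattle, 2007): a rectangle with a hole.
holed = set(product(cities, years)) - {("Seattle", 2007)}

# One cell per city on a different year: a diagonal.
diagonal = {("Redmond", 2005), ("Seattle", 2006),
            ("New York", 2007), ("London", 2008)}

results = [is_regular_subcube(s) for s in (rectangle, cross, holed, diagonal)]
print(results)
```

The three shapes that fail the check correspond to the union, the crossjoin-minus-one-cell, and the diagonal tuple set shown on the slides.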
  46. 46. Arbitrary shapes
      • WHERE/Subselect/Aggregate
      • Unnatural hierarchies
      • Parent-Child (visual totals)
      • “Non Leaves” subcube
      • Conditional logic (IIF, IF, CASE, CoalesceEmpty etc.)
      • NonEmpty, Exists
  47. 47. WHERE/Subselect
      • Severity = '1' OR Priority = '1'
      • multiselect – {USA, London}
  48. 48. Mixed grain slicer
      [Figure: hierarchy tree. All: USA (Seattle, New York), UK (London, Bristol)]
  49. 49. Mixed grain slicer
      [Figure: the same hierarchy tree, plus a grid of cities (All Cities, Seattle, New York, London, Bristol) by countries (All Countries, USA, UK)]
  50. 50. Parent-child
  51. 51. Leaves vs. Non Leaves
      [Figure: granularity grid with the leaves region marked. Column labels: All Countries, Countries, Country, All City, Cities; row labels: All Products, Products]
  52. 52. Problems with arbitrary shapes
      • Caching
      • Partition slices
      • Indexes
      • SCOPEs
      • Matching calculations
      • Many more (for every topic we discuss, just ask “What will happen with arbitrary shapes”, and I am in trouble)
  53. 53. SCOPE
      SCOPE (
          [Date].[Month of Year].[All Periods],
          [Date].[Month Name].[All],
          Except(
              [DateTool].[Aggregation].Members * [DateTool].[Comparison].Members,
              { ( [DateTool].[Aggregation].DefaultMember, [DateTool].[Comparison].DefaultMember ) }
          )
      );
          ...;
      END SCOPE;
  54. 54. Subcube decomposition
      SCOPE (
          [Date].[Month of Year].[All Periods],
          [Date].[Month Name].[All],
          Except(
              [DateTool].[Aggregation].Members * [DateTool].[Comparison].Members,
              { ( [DateTool].[Aggregation].DefaultMember, [DateTool].[Comparison].DefaultMember ) }
          )
      );
          ...;
      END SCOPE;
      [Figure: the scoped region split into Scope 1, Scope 2 and Scope 3]
  55. 55. Subcube decomposition
      SCOPE (
          [Date].[Month of Year].[All Periods],
          [Date].[Month Name].[All],
          Except( [DateTool].[Aggregation].Members, [DateTool].[Aggregation].DefaultMember ),
          Except( [DateTool].[Comparison].Members, [DateTool].[Comparison].DefaultMember )
      );
          ...;
      END SCOPE;
      SCOPE (
          [Date].[Month of Year].[All Periods],
          [Date].[Month Name].[All],
          [DateTool].[Aggregation].DefaultMember,
          Except( [DateTool].[Comparison].Members, [DateTool].[Comparison].DefaultMember )
      );
          ...;
      END SCOPE;
      SCOPE (
          [Date].[Month of Year].[All Periods],
          [Date].[Month Name].[All],
          Except( [DateTool].[Aggregation].Members, [DateTool].[Aggregation].DefaultMember ),
          [DateTool].[Comparison].DefaultMember
      );
          ...;
      END SCOPE;
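The decomposition above rests on a simple set identity: a crossjoin with one tuple removed equals the union of three regular crossjoins. The sketch below checks that identity with plain Python sets; the member names are hypothetical stand-ins for the [DateTool] members, and nothing here calls Analysis Services.

```python
from itertools import product

# Hypothetical members of the two utility attributes; the first is the default.
aggregation = ["Regular", "YTD", "MAT"]
comparison = ["Regular", "Previous Period", "Year over Year"]
default_aggregation, default_comparison = aggregation[0], comparison[0]

# The single arbitrary-shaped scope: everything except (default, default).
arbitrary = set(product(aggregation, comparison)) - {
    (default_aggregation, default_comparison)
}

non_default_aggregation = [m for m in aggregation if m != default_aggregation]
non_default_comparison = [m for m in comparison if m != default_comparison]

# The three regular scopes from the decomposition slide.
both_non_default = set(product(non_default_aggregation, non_default_comparison))
default_agg_only = set(product([default_aggregation], non_default_comparison))
default_cmp_only = set(product(non_default_aggregation, [default_comparison]))

parts = [both_non_default, default_agg_only, default_cmp_only]
covers_same_cells = set().union(*parts) == arbitrary
no_overlap = sum(len(p) for p in parts) == len(arbitrary)
print(covers_same_cells, no_overlap)
```

Because the three parts do not overlap, the same assignment can be repeated in each SCOPE without any cell being calculated twice.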
  56. 56. MDX Optimization - Tips
      • Partial expressions are not cached:
        this = iif(<expensive expression> >= 0, 1/<expensive expression>, null);
      • Instead, define the expensive expression as a hidden calculated member so that its result can be cached:
        create member currentcube.measures.MyPartialExpression as <expensive expression>, visible=0;
        this = iif(measures.MyPartialExpression >= 0, 1/measures.MyPartialExpression, null);
  57. 57. Demo
  58. 58. SSAS Denali
      • Coming in the first half of 2012
      • SSAS Tabular Mode
        – Cheaper
        – Not best of breed
        – Uses DAX or MDX
      • Have you started working with it?
  59. 59. Mobile BI BI l Smart Phone l BI l Mobile Bi BIGartner
  60. 60. Social BI
      • Discover New Insights - Analyze the demographic and psychographic profiles of your Facebook application users.
      • Analyze Facebook Data - Analyze the full spectrum of Facebook data: profiles, interests, check-ins, and more
      • Instantly Available via Cloud
  61. 61. Social BI
      • Deep Personalization
      • Enterprise Data Integration
  62. 62. Survey
      • SQL / SSAS Denali
      • Mobile BI
      • Social BI