Microsoft Analysis Services Physical Design

Published in: Technology

  1. 1. Microsoft Analysis Services Physical Design. James Snape, Application Development Consulting, Microsoft Limited
  2. 2. The Kimball Process
  3. 3. Agenda: Hardware; Dimensions; Facts; Relational stuff; Performance tuning next steps. NB: Relational design is not complete; logging, auditing etc. are discussed in the next session
  4. 4. Prime Directive: Sequential IO Good, Random IO Bad
  5. 5. Hardware: SQL Server Fast Track Data Warehouse (www.microsoft.com/sqlserver/2008/en/us/fasttrack.aspx); Pre-tested hardware configurations; Specific disk, filegroup, layouts; Minimal indexing; To feed CPU at maximum capacity
  6. 6. Dimensions vs Facts. Dimension: small (relatively), repeating data. Fact: large, numeric data + keys. Treat them differently
  7. 7. Dimensions in Relational Terms: Table structure; Keys; Indexes; Null handling; Managing change; Processing. Diagram: Customer (Full Name, Post Code, City, State, Country, Gender, Occupation, Marital Status, Email Address). Customer Geography: 1. Country 2. State 3. City 4. Post Code 5. Full Name
  8. 8. Star vs. Snowflake Schemas. Star: dbo.Customer (CustomerKey, FullName, PostCode, City, State, Country, Gender, Occupation, MaritalStatus, EmailAddress) OR Snowflake: dbo.Customer (CustomerKey, GeographyKey, FullName, Gender, Occupation, MaritalStatus, EmailAddress) and dbo.Geography (GeographyKey, PostCode, City, State, Country). NB: both are denormalized, one more than the other
  9. 9. Primary Keys: Use smallest possible integer as surrogate primary key; Primary key is a “row identifier”; Multiple row “versions” are possible; “None” and “Unknown” special values are useful; Do NOT use business/source system keys; Clustered primary key is OK for dimensions
  10. 10. Dimension Indexes: Dimension processing queries are of the form SELECT DISTINCT .... FROM ....; WHERE (filter) clauses are never used; WHERE (join) clauses are used in snowflake dimensions; Non-processing queries may end up in SQL: ROLAP dimensions, Direct to SQL queries
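
The query shape named on the Dimension Indexes slide can be sketched as follows. This is only an illustration under stated assumptions: SQLite (via Python's sqlite3 module) stands in for SQL Server, the sample rows are invented, and `attribute_members` is a hypothetical helper, not an Analysis Services API.

```python
import sqlite3

# SQLite is a stand-in for SQL Server; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerKey INTEGER PRIMARY KEY, City TEXT, Country TEXT);
INSERT INTO Customer VALUES
    (1243, 'London',  'United Kingdom'),
    (1244, 'Glasgow', 'United Kingdom'),
    (1245, 'London',  'United Kingdom');
""")

def attribute_members(conn, table, key_column, name_column):
    """Hypothetical helper: one SELECT DISTINCT per attribute, with no WHERE filter."""
    # Identifiers are interpolated, so pass only trusted, hard-coded names.
    sql = f'SELECT DISTINCT "{key_column}", "{name_column}" FROM "{table}"'
    return conn.execute(sql).fetchall()

cities = sorted(attribute_members(conn, "Customer", "City", "City"))
countries = attribute_members(conn, "Customer", "Country", "Country")
print(cities)
print(countries)
```

Each attribute scans the whole dimension table once, which is consistent with the slide's point that filter clauses play no part in processing.
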
  11. 11. Null Handling in Dimensions: By default NULL converts to 0 or an empty string; NULL attribute keys can invoke special “Unknown Member” handling; Prefer to create a specific “Unknown” row

          CustomerKey  FullName    City     Country
          -1           Unknown     Unknown  Unknown
          -2           None        None     None
          1243         John Smith  London   United Kingdom
          1244         Mary Jones  Glasgow  United Kingdom
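
One way the explicit "Unknown" row might be used during a fact load is sketched here. Assumptions: SQLite stands in for SQL Server, the `StagingSales` table and its columns are hypothetical, and mapping every NULL key to -1 is just one possible policy.

```python
import sqlite3

# SQLite stands in for SQL Server; Customer rows follow the slide,
# StagingSales is a hypothetical staging table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerKey INTEGER PRIMARY KEY,
    FullName    TEXT NOT NULL,
    City        TEXT NOT NULL,
    Country     TEXT NOT NULL
);
INSERT INTO Customer VALUES
    (-1,   'Unknown',    'Unknown', 'Unknown'),
    (-2,   'None',       'None',    'None'),
    (1243, 'John Smith', 'London',  'United Kingdom'),
    (1244, 'Mary Jones', 'Glasgow', 'United Kingdom');

-- CustomerKey is NULL where the key lookup found no customer.
CREATE TABLE StagingSales (CustomerKey INTEGER, SalesAmount REAL);
INSERT INTO StagingSales VALUES (1243, 10.0), (NULL, 5.0);
""")

# Substitute the Unknown row (-1) for NULL keys so every fact row joins to a member.
rows = conn.execute("""
    SELECT COALESCE(s.CustomerKey, -1) AS CustomerKey, c.FullName, s.SalesAmount
    FROM StagingSales AS s
    JOIN Customer AS c ON c.CustomerKey = COALESCE(s.CustomerKey, -1)
    ORDER BY s.SalesAmount DESC
""").fetchall()
print(rows)
```

Because the fact row carries a real key, the cube never has to fall back on the default NULL conversion the slide warns about.
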
  12. 12. Dimension Attributes: Attributes have keys, names (and values); Integer attribute keys are smaller and faster; Keys must be unique

          Attribute      Key       Name        (Value)
          Year           2009      CY 2009     2009
          Month          4         April       4
          Month of Year  20090400  April 2009  4

          SELECT [Month] as [Month], [Month] + ' ' + [Year] as [Month of Year] FROM dbo.Time
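
The integer key in the attribute table (20090400 for April 2009) could be derived as in the sketch below. The formula is inferred from that single example, and the function names are hypothetical.

```python
MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")

def month_of_year_key(year: int, month: int) -> int:
    """Integer key in the slide's YYYYMM00 style (assumed formula)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return year * 10000 + month * 100

def month_of_year_name(year: int, month: int) -> str:
    """Display name such as 'April 2009'."""
    return f"{MONTH_NAMES[month - 1]} {year}"

# Uniqueness check: 24 (year, month) pairs should give 24 distinct keys.
keys = {month_of_year_key(y, m) for y in (2008, 2009) for m in range(1, 13)}
print(month_of_year_key(2009, 4), month_of_year_name(2009, 4), len(keys))
```

A plain month number (4) repeats every year, so it is unique only as a "month" attribute; combining it with the year is what makes the key unique across the whole time dimension.
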
  13. 13. Slowly Changing Dimensions: PK = row identifier; Multiple rows = multiple versions; Add effective dating columns; Which can be exposed as new dimensional attributes. dbo.Customer (CustomerKey, FullName, PostCode, City, State, Country, Gender, Occupation, MaritalStatus, EmailAddress, EffectiveFrom (smalldatetime), EffectiveTo (smalldatetime), CurrentFlag (tinyint))
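
A minimal sketch of versioning a row with the effective-dating columns follows. Assumptions: SQLite stands in for SQL Server, ISO date strings stand in for smalldatetime, a NULL EffectiveTo marks the current row, and the `CustomerCode` column and `apply_city_change` function are hypothetical additions used only to find the row being versioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerKey   INTEGER PRIMARY KEY,  -- surrogate key: identifies one row version
    CustomerCode  TEXT NOT NULL,        -- hypothetical source identifier, not the PK
    FullName      TEXT NOT NULL,
    City          TEXT NOT NULL,
    EffectiveFrom TEXT NOT NULL,
    EffectiveTo   TEXT,                 -- NULL while the row is current (assumption)
    CurrentFlag   INTEGER NOT NULL
);
INSERT INTO Customer VALUES
    (1243, 'C-001', 'John Smith', 'London', '2008-01-01', NULL, 1);
""")

def apply_city_change(conn, customer_code, new_city, change_date):
    """Close the current row version, then insert the new version."""
    with conn:  # commit both statements together, or roll back on error
        (full_name,) = conn.execute(
            "SELECT FullName FROM Customer WHERE CustomerCode = ? AND CurrentFlag = 1",
            (customer_code,)).fetchone()
        conn.execute(
            "UPDATE Customer SET EffectiveTo = ?, CurrentFlag = 0 "
            "WHERE CustomerCode = ? AND CurrentFlag = 1",
            (change_date, customer_code))
        conn.execute(
            "INSERT INTO Customer "
            "(CustomerCode, FullName, City, EffectiveFrom, EffectiveTo, CurrentFlag) "
            "VALUES (?, ?, ?, ?, NULL, 1)",
            (customer_code, full_name, new_city, change_date))

apply_city_change(conn, "C-001", "Glasgow", "2009-04-01")
versions = conn.execute(
    "SELECT CustomerKey, City, EffectiveFrom, EffectiveTo, CurrentFlag "
    "FROM Customer ORDER BY CustomerKey").fetchall()
print(versions)
```

Existing fact rows keep pointing at key 1243 (the London version), while new facts pick up the new surrogate key, which is why the primary key acts as a row identifier rather than a customer identifier.
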
  14. 14. Facts in Relational Terms: Keys; Indexing; Partitioning; Processing; Consider Row and Page compression. Diagram: Internet Sales (Sales Amount, Order Quantity, Tax Amount, Unit Price, Transaction Count)
  15. 15. Fact Keys and Indexes: Is a surrogate/primary key required?; Beware the clustered index/primary key; Prefer the date FK as the clustered index; Add NOCHECK to foreign keys; Indexes are usually not useful, unless processing degenerate dimensions or servicing ROLAP/direct to SQL queries
  16. 16. Fact Partitioning – Why?: Parallel processing; Only process most recent data; Multiple storage engine threads during query; Archive off data; Multiple aggregation strategies. NB: Partitions require Enterprise Edition
  17. 17. Fact Partitioning – Guidelines: Partition when fact tables are 50-100GB+; Ideal partition size 2M-20M rows; Fewer than 1000 partitions per measure group (this wins over partition size); Prefer to partition over time; Cannot aggregate higher than partition grain; Align AS and SQL partitions!; Calculated time keys become very useful
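
The remark about calculated time keys can be illustrated with a short sketch. It assumes an integer YYYYMMDD date key and monthly partitions; the slide does not specify either, and the function names and the `Sales_YYYYMM` naming scheme are hypothetical.

```python
from datetime import date

def date_key(d: date) -> int:
    """Calculated integer date key in YYYYMMDD form (assumed format)."""
    return d.year * 10000 + d.month * 100 + d.day

def month_partition_bounds(year: int, month: int) -> tuple[int, int]:
    """Half-open [low, high) key range covering one calendar month."""
    low = date_key(date(year, month, 1))
    first_of_next = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return low, date_key(first_of_next)

def partition_name(d: date) -> str:
    """Hypothetical name that the SQL partition and the AS partition could share."""
    return f"Sales_{d.year:04d}{d.month:02d}"

low, high = month_partition_bounds(2009, 12)
print(low, high, partition_name(date(2009, 12, 15)))
```

Because the key is computed from the date, the same arithmetic can define the boundaries of the relational partition and the filter of the matching Analysis Services partition, which keeps the two aligned. Ten years of monthly partitions come to 120, comfortably under the 1000-per-measure-group guideline.
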
  18. 18. Fact Storage: MOLAP, ROLAP or HOLAP. Diagram labels: Source Data, Facts, Aggregations; Relational, Multidimensional
  19. 19. Proactive Caching: Cube = “Cache”; Automatic invalidation of cube; Automatic rebuild of cube. Diagram labels: Query, SQL Query, Valid?
  20. 20. Quick Storage Engine Tuning: Ensure attribute relations are implemented; Turn on query log; Run Usage Based Optimisation (UBO) wizard
  21. 21. © 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
