Nauzad Kapadia
Quartz Systems
nauzadk@quartzsystems.com
Analysis Services Best Practices From Large Deployments

  1. Nauzad Kapadia, Quartz Systems, nauzadk@quartzsystems.com
  2. Key Takeaways
     - How to design your cubes efficiently
     - How to effectively partition your facts
     - How to optimize cube and query processing
     - How to ensure that your solution scales well
  3. Agenda
     - Cube Design
     - Storage and Partitioning
     - Aggregations
     - Processing
     - Scalability
  4. Tips for Designing Dimensions and Facts
     - Base fact data sources on views
       - Can use query hints
       - Can facilitate write-back partitions for measure groups containing semi-additive measures
     - Avoid Linked Dimensions
     - Use the Unknown Member
  5. Tips for Designing Attributes
     - Avoid unnecessary attributes
     - Use the AttributeHierarchyEnabled property with care
     - Use Key Columns appropriately
  6. Query performance
       - Dimension storage access is faster
       - Produces more optimal execution plans
     Aggregation design
       - Enables the aggregation design algorithm to produce an effective set of aggregations
     Dimension security
       - DeniedSet = {State.WA} should deny cities and customers in WA; this requires attribute relationships
  7. How attribute relationships affect performance
  8. After adding attribute relationships…
     - Don't forget to remove redundant relationships!
     - All attributes have an implicit relationship to the key
     - Examples:
       - Customer → City (not redundant)
       - Customer → State (redundant)
       - Customer → Country (redundant)
       - Date → Month (not redundant)
       - Date → Quarter (redundant)
       - Date → Year (redundant)
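
The redundancy examples on slide 8 can be checked mechanically. The following is a minimal sketch, not an Analysis Services feature: it assumes the dimension also defines City → State and State → Country (the chain the slide implies), and it treats a relationship as redundant when the same target is reachable through other relationships. The function name and data layout are hypothetical.

```python
def redundant_relationships(relationships):
    """Return relationships already implied by a chain of other relationships."""
    rels = set(relationships)

    def reachable(src, dst, skip):
        # Depth-first search from src to dst that ignores the edge `skip`.
        stack, seen = [src], {src}
        while stack:
            node = stack.pop()
            for a, b in rels:
                if a != node or (a, b) == skip or b in seen:
                    continue
                if b == dst:
                    return True
                seen.add(b)
                stack.append(b)
        return False

    return sorted(r for r in rels if reachable(r[0], r[1], skip=r))


# Assumed relationship set for a Customer dimension (illustrative only).
customer_dim = [
    ("Customer", "City"), ("City", "State"), ("State", "Country"),
    ("Customer", "State"), ("Customer", "Country"),
]
print(redundant_relationships(customer_dim))
```

Under these assumptions only Customer → State and Customer → Country are reported, which matches the slide's classification.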
  9. User Defined Hierarchies
     - Pre-defined navigation paths through dimensional space defined by attributes
     - Why create user defined hierarchies?
       - Guide end users to interesting navigation paths
       - Existing client tools are not "attribute-aware"
       - Performance
         - Optimize navigation path at processing time
         - Materialization of hierarchy tree on disk
         - Aggregation designer favors user defined hierarchies
  10. Natural Hierarchies
     - 1:M relation (via attribute relationships) between every pair of adjacent levels
     - Examples:
       - Country-State-City-Customer (natural)
       - Country-City (natural)
       - State-Customer (natural)
       - Age-Gender-Customer (unnatural)
       - Year-Quarter-Month (depends on key columns)
         - How many quarters and months?
         - 4 & 12 across all years (unnatural)
         - 4 & 12 for each year (natural)
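
The Year-Quarter-Month case on slide 10 is easier to see with data. This sketch is an illustration only, with hypothetical names: it tests the 1:M condition by checking that every member of a level has exactly one parent in the level above, first with quarter and month keyed by number alone, then with the year added to the key.

```python
def is_natural(rows, levels):
    """True if every member of each level has exactly one parent in the level above."""
    for parent, child in zip(levels, levels[1:]):
        parent_of = {}
        for row in rows:
            # setdefault records the first parent seen for this child member.
            if parent_of.setdefault(row[child], row[parent]) != row[parent]:
                return False
    return True


# Two years of dates; quarter and month are keyed by their number only.
dates = [
    {"Year": y, "Quarter": q, "Month": m}
    for y in (2005, 2006)
    for q in (1, 2, 3, 4)
    for m in range(3 * q - 2, 3 * q + 1)
]

# Same data with composite keys that include the year.
keyed = [
    {"Year": d["Year"],
     "Quarter": (d["Year"], d["Quarter"]),
     "Month": (d["Year"], d["Month"])}
    for d in dates
]

levels = ["Year", "Quarter", "Month"]
print(is_natural(dates, levels), is_natural(keyed, levels))
```

With number-only keys, quarter 1 belongs to both 2005 and 2006, so the hierarchy is unnatural; with composite keys each quarter and month has a single parent.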
  11. Natural Hierarchies
     - Performance implications
       - Only natural hierarchies are materialized on disk during processing
       - Unnatural hierarchies are built on the fly during queries (and cached in memory)
     - Create natural hierarchies where possible
       - Using attribute relationships
       - Not always appropriate (e.g., Age-Gender)
  12. Benefits of Partitioning
     - Partitions can be added, processed, deleted independently
       - Update to last month's data does not affect prior months' partitions
       - Sliding window scenario easy to implement
       - e.g., 24 month window → add June 2006 partition and delete June 2004
     - Partitions can have different storage settings
       - Storage mode (MOLAP, ROLAP, HOLAP)
       - Aggregation design
       - Alternate disk drive
       - Remote server
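
The sliding window on slide 12 comes down to month arithmetic. This is a minimal sketch, assuming one partition per calendar month identified by its first day; `month_add` and `slide_window` are hypothetical helpers, and actually creating or dropping partitions is out of scope.

```python
from datetime import date


def month_add(d, months):
    """Shift a first-of-month date by a number of months."""
    idx = d.year * 12 + (d.month - 1) + months
    return date(idx // 12, idx % 12 + 1, 1)


def slide_window(existing, new_month, window=24):
    """Return (partition to add, partitions to drop) for a fixed-size window."""
    oldest_kept = month_add(new_month, -(window - 1))
    to_drop = sorted(m for m in existing if m < oldest_kept)
    return new_month, to_drop


# Existing monthly partitions: June 2004 through May 2006.
existing = [month_add(date(2004, 6, 1), i) for i in range(24)]
to_add, to_drop = slide_window(existing, date(2006, 6, 1))
print(to_add, to_drop)
```

Adding June 2006 to a 24 month window leaves July 2004 as the oldest month kept, so June 2004 is the only partition dropped, as in the slide's example.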
  13. Benefits of Partitioning
     - Partitions can be processed and queried in parallel
       - Better utilization of server resources
       - Reduced data warehouse load times
     - Queries are isolated to relevant partitions → less data to scan
       - SELECT … FROM … WHERE [Time].[Year].[2006]
       - Queries only 2006 partitions
     - Bottom line → partitions enable:
       - Manageability
       - Performance
       - Scalability
  14. Best Practices for Partitioning
     - No more than 20M rows per partition
     - Specify partition / data slice
       - Optional (but still recommended) for MOLAP: server auto-detects the slice and validates against user-specified slice (if any)
       - Should reflect, as closely as possible, the data in the partition
       - Must be specified for ROLAP
     - Remote partitions for scale out
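
The two guidelines above (slices that isolate queries, and at most 20M rows per partition) can be modelled in a few lines. This is a toy model rather than server behavior: the `Partition` class, its fields, and the row counts are all hypothetical, and each partition is assumed to be sliced on a single year.

```python
from dataclasses import dataclass

MAX_ROWS = 20_000_000  # guideline from the slide


@dataclass
class Partition:
    name: str
    slice_year: int  # assumed slice: one year per partition
    rows: int


def partitions_to_scan(partitions, year):
    """Partitions whose slice matches the year filter of a query."""
    return [p.name for p in partitions if p.slice_year == year]


def oversized(partitions, max_rows=MAX_ROWS):
    """Partitions that exceed the row guideline and are candidates for splitting."""
    return [p.name for p in partitions if p.rows > max_rows]


# Made-up partition layout for illustration.
parts = [
    Partition("Sales_2004", 2004, 12_000_000),
    Partition("Sales_2005", 2005, 18_500_000),
    Partition("Sales_2006", 2006, 31_000_000),
]
print(partitions_to_scan(parts, 2006), oversized(parts))
```

In this layout a query filtered on 2006 touches one partition, and that same partition is over the guideline, so it would be a candidate for splitting into finer intervals.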
  15. Best Practices for Designing Partitions
     - Design from the start
     - Partition boundary and intervals
     - Determine what storage model and aggregation level fits best
       - Frequently queried → MOLAP with lots of aggs
       - Periodically queried → MOLAP with fewer or no aggs
       - Real-time → ROLAP with no aggs
     - Pick efficient data types in fact table
  16. What is Proactive Caching?
     - Benefits of proactive caching
     - Considerations for using proactive caching
       - Use correct latency and silence settings
       - Useful in a transaction-oriented system in which changes are unpredictable
  17. What are Aggregations?
     - Benefits of aggregations
     - Aggregating data in partitions
  18. Aggregation Design Algorithm
     - Evaluate cost/benefit of aggregations
       - Relative to other aggregations
       - Designed in "waves" from top of pyramid
       - Cost is related to aggregation size
       - Benefit is related to "distance" from another aggregation
     - Storage design wizard
       - Assumes all combinations of attributes are equally likely
       - Can be done before you know the query load
     - Usage based optimization wizard
       - Assumes query pattern resembles your selection from the query log
       - Representative history is needed
  19. Aggregation Design Algorithm
     - Examines the AggregationUsage property to build list of candidate attributes
       - Full: Every agg must include the attribute
       - None: No agg can include the attribute
       - Unrestricted: No restrictions on the algorithm
       - Default: Unrestricted if attribute is All, key or belongs to a natural hierarchy, None otherwise
     - Builds the attribute lattice
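
The Default rule on slide 19 reads naturally as a small function. This sketch only restates the slide's rule; the dictionary layout and the flag names (`is_all`, `is_key`, `in_natural_hierarchy`) are assumptions for illustration and are not the Analysis Services object model.

```python
def effective_usage(attr):
    """Resolve AggregationUsage, expanding Default as described on the slide."""
    usage = attr.get("usage", "Default")
    if usage != "Default":
        return usage
    if attr.get("is_all") or attr.get("is_key") or attr.get("in_natural_hierarchy"):
        return "Unrestricted"
    return "None"


def candidate_attributes(attrs):
    """Attributes the aggregation designer may include in an aggregation."""
    return [a["name"] for a in attrs if effective_usage(a) != "None"]


# Hypothetical attribute metadata.
attrs = [
    {"name": "Customer", "is_key": True},
    {"name": "City", "in_natural_hierarchy": True},
    {"name": "Phone"},
    {"name": "Gender", "usage": "Unrestricted"},
    {"name": "Email", "usage": "None"},
]
print(candidate_attributes(attrs))
```

Here Phone drops out because it is neither a key nor part of a natural hierarchy, while Gender is kept only because it was explicitly set to Unrestricted.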
  20. How to Monitor Aggregation Usage?
     - [Figure: Profiler trace showing an aggregation hit]
  21. How to Monitor Aggregation Usage?
     - [Figure: Profiler trace showing an aggregation miss]
  22. Best Practices for Aggregations
     - Define all possible attribute relationships
     - Set accurate attribute member counts and fact table counts
     - Set AggregationUsage to guide agg designer
       - Set rarely queried attributes to None
       - Set commonly queried attributes to Unrestricted
     - Do not build too many aggregations
       - In the 100s, not 1000s!
     - Do not build aggregations larger than 30% of fact table size (aggregation design algorithm doesn't)
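
The 30% guideline on slide 22 depends on the member counts and fact table counts the same slide asks you to set accurately. The sketch below uses a deliberately crude size estimate, the product of member counts capped at the fact row count; that estimate is an assumption made for illustration and is not the estimator the aggregation design algorithm uses. All names and counts are hypothetical.

```python
from math import prod


def estimated_rows(agg, member_counts, fact_rows):
    """Crude upper bound: product of member counts, capped at the fact row count."""
    return min(prod(member_counts[a] for a in agg), fact_rows)


def within_size_limit(agg, member_counts, fact_rows, limit=0.30):
    """True if the estimated aggregation size is at most `limit` of the fact table."""
    return estimated_rows(agg, member_counts, fact_rows) <= limit * fact_rows


# Made-up counts for illustration.
member_counts = {"Year": 5, "Month": 60, "State": 50, "Customer": 2_000_000}
fact_rows = 10_000_000

for agg in [("Year", "State"), ("Customer",), ("Month", "Customer")]:
    print(agg, within_size_limit(agg, member_counts, fact_rows))
```

With these counts, Year by State is tiny and Customer alone fits under the limit, but Month by Customer is estimated at the full fact table size and would be rejected.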
  23. Best Practices for Aggregations
     - Aggregation design cycle
       - Use Storage Design Wizard (~20% perf gain) to design initial set of aggregations
       - Enable query log and run pilot workload (beta test with limited set of users)
       - Use Usage Based Optimization (UBO) Wizard to refine aggregations
       - Use a larger perf gain target (70-80%)
       - Reprocess partitions for new aggregations to take effect
       - Periodically use UBO to refine aggregations
  24. Processing Options
     - ProcessFull: Fully processes the object from scratch
     - ProcessClear: Clears all data; brings object to unprocessed state
     - ProcessData: Reads and stores fact data only (no aggs or indexes)
     - ProcessIndexes: Builds aggs and indexes
     - ProcessUpdate: Incremental update of dimension (preserves fact data)
     - ProcessAdd: Adds new rows to dimension or partition
  25. Best Practices for Processing
     - Use XMLA scripts in large production systems
       - Automation (e.g., using ascmd)
       - Finer control over parallelism, transactions, memory usage, etc.
       - Don't just process the entire database!
     - Dimension processing
       - Performance is limited by attribute relationships
       - Key attribute is a big bottleneck
       - Define all possible attribute relationships
       - Eliminate redundant relationships, especially on the key!
       - Bind dimension data sources to views instead of tables or named queries
  26. Best Practices for Processing
     - Partition processing
       - Split ProcessFull into ProcessData + ProcessIndexes for large partitions (consumes less memory)
       - Monitor aggregation processing spilling to disk (perfmon counters for temp file usage)
       - Add memory, turn on /3GB, move to x64/ia64
       - Fully process partitions periodically
       - Achieves better compression over repeated incremental processing
     - Data sources
       - Avoid using .NET data sources; OLEDB is an order of magnitude faster for processing
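
The advice to script processing in XMLA and to split ProcessFull into ProcessData + ProcessIndexes can be combined. This is a minimal sketch that builds two XMLA `Batch` documents with Python's standard library, one per processing type, each wrapping its `Process` commands in a `Parallel` element. The database, cube, measure group, and partition IDs are hypothetical placeholders, and submitting the scripts (for example with ascmd) is left out.

```python
import xml.etree.ElementTree as ET

XMLA_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"


def process_batch(database, cube, measure_group, partitions, process_type):
    """Build an XMLA Batch that processes the given partitions in parallel."""
    ET.register_namespace("", XMLA_NS)  # serialize with a default namespace
    batch = ET.Element(f"{{{XMLA_NS}}}Batch")
    parallel = ET.SubElement(batch, f"{{{XMLA_NS}}}Parallel")
    for partition in partitions:
        process = ET.SubElement(parallel, f"{{{XMLA_NS}}}Process")
        obj = ET.SubElement(process, f"{{{XMLA_NS}}}Object")
        ids = (("DatabaseID", database), ("CubeID", cube),
               ("MeasureGroupID", measure_group), ("PartitionID", partition))
        for tag, value in ids:
            ET.SubElement(obj, f"{{{XMLA_NS}}}{tag}").text = value
        ET.SubElement(process, f"{{{XMLA_NS}}}Type").text = process_type
    return ET.tostring(batch, encoding="unicode")


# Hypothetical object IDs; run the data step first, then the index step.
partitions = ["Sales_2006_05", "Sales_2006_06"]
data_step = process_batch("SalesDW", "Sales", "Fact Sales", partitions, "ProcessData")
index_step = process_batch("SalesDW", "Sales", "Fact Sales", partitions, "ProcessIndexes")
print(data_step)
```

Keeping the two steps as separate batches means the index and aggregation build starts only after the fact data for all listed partitions has been loaded.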
  27. Improving multi-user performance
     - Increase query parallelism
     - Block long running queries
     - Use a load balancing cluster
  28. © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
