Analysis Services Best Practices From Large Deployments
 

Presentation Transcript

    • Nauzad Kapadia, Quartz Systems (nauzadk@quartzsystems.com)
    • Key Takeaways
      - How to design your cubes efficiently
      - How to effectively partition your facts
      - How to optimize cube and query processing
      - How to ensure that your solution scales well
    • Agenda
      - Cube Design
      - Storage and Partitioning
      - Aggregations
      - Processing
      - Scalability
    • Tips for Designing Dimensions and Facts
      - Base fact data sources on views
          - Can use query hints
          - Can facilitate write-back partitions for measure groups containing semi-additive measures
      - Avoid linked dimensions
      - Use the Unknown Member
    • Tips for Designing Attributes
      - Avoid unnecessary attributes
      - Use the AttributeHierarchyEnabled property with care
      - Use key columns appropriately
    • Query performance
      - Dimension storage access is faster
      - Produces more optimal execution plans
    • Aggregation design
      - Enables the aggregation design algorithm to produce an effective set of aggregations
    • Dimension security
      - DeniedSet = {State.WA} should also deny the cities and customers in WA; this requires attribute relationships
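    The DeniedSet example above can be sketched as an MDX set expression; the dimension and member names ([Customer].[State].&amp;[WA]) are assumptions for illustration, not from the original deck:

    ```mdx
    -- Dimension security: deny the state of WA.
    -- With attribute relationships defined (Customer -> City -> State),
    -- Analysis Services can also deny the cities and customers under WA.
    { [Customer].[State].&[WA] }
    ```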
    • How attribute relationships affect performance
    • After adding attribute relationships, don't forget to remove redundant relationships!
      - All attributes have an implicit relationship to the key
      - Examples:
          - Customer → City (not redundant)
          - Customer → State (redundant)
          - Customer → Country (redundant)
          - Date → Month (not redundant)
          - Date → Quarter (redundant)
          - Date → Year (redundant)
    • User-Defined Hierarchies
      - Pre-defined navigation paths through the dimensional space defined by attributes
      - Why create user-defined hierarchies?
          - Guide end users to interesting navigation paths
          - Existing client tools are not "attribute-aware"
          - Performance
              - Optimizes the navigation path at processing time
              - Materialization of the hierarchy tree on disk
              - The aggregation designer favors user-defined hierarchies
    • Natural Hierarchies
      - A 1:M relation (via attribute relationships) between every pair of adjacent levels
      - Examples:
          - Country-State-City-Customer (natural)
          - Country-City (natural)
          - State-Customer (natural)
          - Age-Gender-Customer (unnatural)
          - Year-Quarter-Month (depends on the key columns): how many quarters and months? 4 and 12 across all years (unnatural); 4 and 12 for each year (natural)
    • Natural Hierarchies: Performance Implications
      - Only natural hierarchies are materialized on disk during processing
      - Unnatural hierarchies are built on the fly during queries (and cached in memory)
      - Create natural hierarchies where possible
          - Using attribute relationships
          - Not always appropriate (e.g., Age-Gender)
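    Whether Year-Quarter is natural comes down to the Quarter attribute's key columns: a composite key (year + quarter) gives each year its own four quarter members. An abbreviated DDL sketch, with table and column names assumed for illustration:

    ```xml
    <!-- Sketch only: composite key for the Quarter attribute so that
         quarters are unique per year, making Year-Quarter a natural
         (1:M) pair. Table/column names are assumptions. -->
    <Attribute>
      <Name>Quarter</Name>
      <KeyColumns>
        <KeyColumn>
          <Source xsi:type="ColumnBinding">
            <TableID>DimDate</TableID>
            <ColumnID>CalendarYear</ColumnID>
          </Source>
        </KeyColumn>
        <KeyColumn>
          <Source xsi:type="ColumnBinding">
            <TableID>DimDate</TableID>
            <ColumnID>CalendarQuarter</ColumnID>
          </Source>
        </KeyColumn>
      </KeyColumns>
    </Attribute>
    ```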
    • Benefits of Partitioning
      - Partitions can be added, processed, and deleted independently
          - An update to last month's data does not affect prior months' partitions
          - The sliding-window scenario is easy to implement (e.g., for a 24-month window → add the June 2006 partition and delete June 2004)
      - Partitions can have different storage settings
          - Storage mode (MOLAP, ROLAP, HOLAP)
          - Aggregation design
          - Alternate disk drive
          - Remote server
    • Benefits of Partitioning
      - Partitions can be processed and queried in parallel
          - Better utilization of server resources
          - Reduced data warehouse load times
      - Queries are isolated to relevant partitions → less data to scan
          - SELECT … FROM … WHERE [Time].[Year].[2006] queries only the 2006 partitions
      - Bottom line → partitions enable manageability, performance, and scalability
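    A minimal MDX sketch of the partition-elimination point above; the cube and measure names are assumptions, not from the deck:

    ```mdx
    -- With yearly partitions (and data slices set), this query only
    -- touches the 2006 partitions, not the whole measure group.
    SELECT [Measures].[Sales Amount] ON COLUMNS
    FROM [Sales]
    WHERE [Time].[Year].[2006]
    ```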
    • Best Practices for Partitioning
      - No more than 20M rows per partition
      - Specify the partition data slice
          - Optional (but still recommended) for MOLAP: the server auto-detects the slice and validates it against the user-specified slice (if any)
          - Should reflect, as closely as possible, the data in the partition
          - Must be specified for ROLAP
      - Use remote partitions for scale-out
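    Specifying the data slice can be sketched as an abbreviated XMLA partition definition; the IDs and member names are assumptions, and the source binding and storage settings are omitted:

    ```xml
    <!-- Abbreviated sketch. The Slice tells the server which data the
         partition holds (recommended for MOLAP, required for ROLAP). -->
    <Partition>
      <ID>Sales_2006</ID>
      <Name>Sales 2006</Name>
      <Slice>[Time].[Year].&amp;[2006]</Slice>
      <!-- source binding, storage mode, aggregation design, etc. -->
    </Partition>
    ```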
    • Best Practices for Designing Partitions
      - Design from the start: partition boundaries and intervals
      - Determine what storage model and aggregation level fit best
          - Frequently queried → MOLAP with lots of aggregations
          - Periodically queried → MOLAP with few or no aggregations
          - Real-time → ROLAP with no aggregations
      - Pick efficient data types in the fact table
    • Proactive Caching
      - What is proactive caching?
      - Benefits of proactive caching
      - Considerations for using proactive caching
          - Use correct latency and silence settings
          - Useful in a transaction-oriented system in which changes are unpredictable
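    The latency and silence settings mentioned above live on the partition's ProactiveCaching element; a sketch with purely illustrative interval values (XSD duration format):

    ```xml
    <!-- Sketch only: the interval values are assumptions, to be tuned
         per workload, not recommendations from the deck. -->
    <ProactiveCaching>
      <SilenceInterval>PT10S</SilenceInterval>                 <!-- wait for a quiet period after a change -->
      <SilenceOverrideInterval>PT10M</SilenceOverrideInterval> <!-- rebuild anyway if changes never go quiet -->
      <Latency>PT30M</Latency>                                 <!-- max staleness before the MOLAP cache is dropped -->
    </ProactiveCaching>
    ```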
    • Aggregations
      - What are aggregations?
      - Benefits of aggregations
      - Aggregating data in partitions
    • Aggregation Design Algorithm
      - Evaluates the cost/benefit of aggregations, relative to other aggregations
          - Designed in "waves" from the top of the pyramid
          - Cost is related to aggregation size
          - Benefit is related to "distance" from another aggregation
      - Storage Design Wizard
          - Assumes all combinations of attributes are equally likely
          - Can be run before you know the query load
      - Usage-Based Optimization Wizard
          - Assumes the query pattern resembles your selection from the query log
          - Representative history is needed
    • Aggregation Design Algorithm
      - Examines the AggregationUsage property to build the list of candidate attributes
          - Full: every aggregation must include the attribute
          - None: no aggregation can include the attribute
          - Unrestricted: no restrictions on the algorithm
          - Default: Unrestricted if the attribute is All, the key, or belongs to a natural hierarchy; None otherwise
      - Builds the attribute lattice
    • How to Monitor Aggregation Usage?
      - Use SQL Server Profiler (slides show a Profiler trace of an aggregation hit and an aggregation miss)
    • Best Practices for Aggregations
      - Define all possible attribute relationships
      - Set accurate attribute member counts and fact table counts
      - Set AggregationUsage to guide the aggregation designer
          - Set rarely queried attributes to None
          - Set commonly queried attributes to Unrestricted
      - Do not build too many aggregations (in the 100s, not 1000s!)
      - Do not build aggregations larger than 30% of the fact table size (the aggregation design algorithm doesn't)
    • Best Practices for Aggregations: Design Cycle
      - Use the Storage Design Wizard (~20% performance gain) to design the initial set of aggregations
      - Enable the query log and run a pilot workload (beta test with a limited set of users)
      - Use the Usage-Based Optimization (UBO) Wizard to refine aggregations
          - Use a larger performance gain (70-80%)
      - Reprocess partitions for the new aggregations to take effect
      - Periodically use UBO to refine aggregations
    • Processing Options
      - ProcessFull: fully processes the object from scratch
      - ProcessClear: clears all data, bringing the object to an unprocessed state
      - ProcessData: reads and stores fact data only (no aggregations or indexes)
      - ProcessIndexes: builds aggregations and indexes
      - ProcessUpdate: incremental update of a dimension (preserves fact data)
      - ProcessAdd: adds new rows to a dimension or partition
    • Best Practices for Processing
      - Use XMLA scripts in large production systems
          - Automation (e.g., using ascmd)
          - Finer control over parallelism, transactions, memory usage, etc.
          - Don't just process the entire database!
      - Dimension processing
          - Performance is limited by attribute relationships; the key attribute is a big bottleneck
          - Define all possible attribute relationships
          - Eliminate redundant relationships, especially on the key!
          - Bind dimension data sources to views instead of tables or named queries
    • Best Practices for Processing: Partitions
      - Split ProcessFull into ProcessData + ProcessIndexes for large partitions; this consumes less memory
      - Monitor aggregation processing spilling to disk (perfmon counters for temp file usage)
          - Add memory, turn on /3GB, or move to x64/IA64
      - Fully process partitions periodically
          - Achieves better compression than repeated incremental processing
      - Data sources
          - Avoid using .NET data sources; OLE DB is an order of magnitude faster for processing
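    The ProcessData + ProcessIndexes split can be sketched as an XMLA batch; the database, cube, measure group, and partition IDs are assumptions for illustration:

    ```xml
    <!-- Sketch: process fact data first, then build aggregations and
         indexes, instead of one ProcessFull on a large partition. -->
    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <Process>
        <Object>
          <DatabaseID>AdventureWorksDW</DatabaseID>
          <CubeID>Sales</CubeID>
          <MeasureGroupID>FactSales</MeasureGroupID>
          <PartitionID>Sales_2006</PartitionID>
        </Object>
        <Type>ProcessData</Type>
      </Process>
      <Process>
        <Object>
          <DatabaseID>AdventureWorksDW</DatabaseID>
          <CubeID>Sales</CubeID>
          <MeasureGroupID>FactSales</MeasureGroupID>
          <PartitionID>Sales_2006</PartitionID>
        </Object>
        <Type>ProcessIndexes</Type>
      </Process>
    </Batch>
    ```

    A script like this can be automated with the ascmd utility mentioned above, giving finer control than a full database process.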
    • Improving Multi-User Performance
      - Increase query parallelism
      - Block long-running queries
      - Use a load-balancing cluster
    • © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.