Analysis Services Best Practices From Large Deployments
 

Presentation Transcript

    • Nauzad Kapadia, Quartz Systems (nauzadk@quartzsystems.com)
    • Key Takeaways
      - How to design your cubes efficiently
      - How to effectively partition your facts
      - How to optimize cube and query processing
      - How to ensure that your solution scales well
    • Agenda
      - Cube Design
      - Storage and Partitioning
      - Aggregations
      - Processing
      - Scalability
    • Tips for Designing Dimensions and Facts
      - Base fact data sources on views
          - Can use query hints
          - Can facilitate write-back partitions for measure groups containing semi-additive measures
      - Avoid linked dimensions
      - Use the Unknown Member
    • Tips for Designing Attributes
      - Avoid unnecessary attributes
      - Use the AttributeHierarchyEnabled property with care
      - Use key columns appropriately
    • Query performance
      - Dimension storage access is faster
      - Produces more optimal execution plans
    • Aggregation design
      - Enables the aggregation design algorithm to produce an effective set of aggregations
    • Dimension security
      - DeniedSet = {State.WA} should also deny the cities and customers in WA; this requires attribute relationships
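    The DeniedSet example above can be sketched as an MDX set expression; the dimension and member names ([Customer].[State].&amp;[WA]) are assumptions for illustration, not from the original deck:

    ```mdx
    -- Dimension security: deny the state of WA.
    -- With attribute relationships defined (Customer -> City -> State),
    -- Analysis Services can also deny the cities and customers under WA.
    { [Customer].[State].&[WA] }
    ```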
    • How attribute relationships affect performance
    • After adding attribute relationships, don't forget to remove redundant relationships!
      - All attributes have an implicit relationship to the key
      - Examples:
          - Customer → City (not redundant)
          - Customer → State (redundant)
          - Customer → Country (redundant)
          - Date → Month (not redundant)
          - Date → Quarter (redundant)
          - Date → Year (redundant)
    • User-Defined Hierarchies
      - Pre-defined navigation paths through the dimensional space defined by attributes
      - Why create user-defined hierarchies?
          - Guide end users to interesting navigation paths
          - Existing client tools are not "attribute-aware"
          - Performance
              - Optimizes the navigation path at processing time
              - Materialization of the hierarchy tree on disk
              - The aggregation designer favors user-defined hierarchies
    • Natural Hierarchies
      - A 1:M relation (via attribute relationships) between every pair of adjacent levels
      - Examples:
          - Country-State-City-Customer (natural)
          - Country-City (natural)
          - State-Customer (natural)
          - Age-Gender-Customer (unnatural)
          - Year-Quarter-Month (depends on the key columns): how many quarters and months? 4 and 12 across all years (unnatural); 4 and 12 for each year (natural)
    • Natural Hierarchies: Performance Implications
      - Only natural hierarchies are materialized on disk during processing
      - Unnatural hierarchies are built on the fly during queries (and cached in memory)
      - Create natural hierarchies where possible
          - Using attribute relationships
          - Not always appropriate (e.g., Age-Gender)
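    Whether Year-Quarter is natural comes down to the Quarter attribute's key columns: a composite key (year + quarter) gives each year its own four quarter members. An abbreviated DDL sketch, with table and column names assumed for illustration:

    ```xml
    <!-- Sketch only: composite key for the Quarter attribute so that
         quarters are unique per year, making Year-Quarter a natural
         (1:M) pair. Table/column names are assumptions. -->
    <Attribute>
      <Name>Quarter</Name>
      <KeyColumns>
        <KeyColumn>
          <Source xsi:type="ColumnBinding">
            <TableID>DimDate</TableID>
            <ColumnID>CalendarYear</ColumnID>
          </Source>
        </KeyColumn>
        <KeyColumn>
          <Source xsi:type="ColumnBinding">
            <TableID>DimDate</TableID>
            <ColumnID>CalendarQuarter</ColumnID>
          </Source>
        </KeyColumn>
      </KeyColumns>
    </Attribute>
    ```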
    • Benefits of Partitioning
      - Partitions can be added, processed, and deleted independently
          - An update to last month's data does not affect prior months' partitions
          - The sliding-window scenario is easy to implement (e.g., for a 24-month window → add the June 2006 partition and delete June 2004)
      - Partitions can have different storage settings
          - Storage mode (MOLAP, ROLAP, HOLAP)
          - Aggregation design
          - Alternate disk drive
          - Remote server
    • Benefits of Partitioning
      - Partitions can be processed and queried in parallel
          - Better utilization of server resources
          - Reduced data warehouse load times
      - Queries are isolated to relevant partitions → less data to scan
          - SELECT … FROM … WHERE [Time].[Year].[2006] queries only the 2006 partitions
      - Bottom line → partitions enable manageability, performance, and scalability
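    A minimal MDX sketch of the partition-elimination point above; the cube and measure names are assumptions, not from the deck:

    ```mdx
    -- With yearly partitions (and data slices set), this query only
    -- touches the 2006 partitions, not the whole measure group.
    SELECT [Measures].[Sales Amount] ON COLUMNS
    FROM [Sales]
    WHERE [Time].[Year].[2006]
    ```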
    • Best Practices for Partitioning
      - No more than 20M rows per partition
      - Specify the partition data slice
          - Optional (but still recommended) for MOLAP: the server auto-detects the slice and validates it against the user-specified slice (if any)
          - Should reflect, as closely as possible, the data in the partition
          - Must be specified for ROLAP
      - Use remote partitions for scale-out
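    Specifying the data slice can be sketched as an abbreviated XMLA partition definition; the IDs and member names are assumptions, and the source binding and storage settings are omitted:

    ```xml
    <!-- Abbreviated sketch. The Slice tells the server which data the
         partition holds (recommended for MOLAP, required for ROLAP). -->
    <Partition>
      <ID>Sales_2006</ID>
      <Name>Sales 2006</Name>
      <Slice>[Time].[Year].&amp;[2006]</Slice>
      <!-- source binding, storage mode, aggregation design, etc. -->
    </Partition>
    ```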
    • Best Practices for Designing Partitions
      - Design from the start: partition boundaries and intervals
      - Determine what storage model and aggregation level fit best
          - Frequently queried → MOLAP with lots of aggregations
          - Periodically queried → MOLAP with few or no aggregations
          - Real-time → ROLAP with no aggregations
      - Pick efficient data types in the fact table
    • Proactive Caching
      - What is proactive caching?
      - Benefits of proactive caching
      - Considerations for using proactive caching
          - Use correct latency and silence settings
          - Useful in a transaction-oriented system in which changes are unpredictable
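    The latency and silence settings mentioned above live on the partition's ProactiveCaching element; a sketch with purely illustrative interval values (XSD duration format):

    ```xml
    <!-- Sketch only: the interval values are assumptions, to be tuned
         per workload, not recommendations from the deck. -->
    <ProactiveCaching>
      <SilenceInterval>PT10S</SilenceInterval>                 <!-- wait for a quiet period after a change -->
      <SilenceOverrideInterval>PT10M</SilenceOverrideInterval> <!-- rebuild anyway if changes never go quiet -->
      <Latency>PT30M</Latency>                                 <!-- max staleness before the MOLAP cache is dropped -->
    </ProactiveCaching>
    ```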
    • Aggregations
      - What are aggregations?
      - Benefits of aggregations
      - Aggregating data in partitions
    • Aggregation Design Algorithm
      - Evaluates the cost/benefit of aggregations, relative to other aggregations
          - Designed in "waves" from the top of the pyramid
          - Cost is related to aggregation size
          - Benefit is related to "distance" from another aggregation
      - Storage Design Wizard
          - Assumes all combinations of attributes are equally likely
          - Can be run before you know the query load
      - Usage-Based Optimization Wizard
          - Assumes the query pattern resembles your selection from the query log
          - Representative history is needed
    • Aggregation Design Algorithm
      - Examines the AggregationUsage property to build the list of candidate attributes
          - Full: every aggregation must include the attribute
          - None: no aggregation can include the attribute
          - Unrestricted: no restrictions on the algorithm
          - Default: Unrestricted if the attribute is All, the key, or belongs to a natural hierarchy; None otherwise
      - Builds the attribute lattice
    • How to Monitor Aggregation Usage?
      - Use SQL Server Profiler (slides show a Profiler trace of an aggregation hit and an aggregation miss)
    • Best Practices for Aggregations
      - Define all possible attribute relationships
      - Set accurate attribute member counts and fact table counts
      - Set AggregationUsage to guide the aggregation designer
          - Set rarely queried attributes to None
          - Set commonly queried attributes to Unrestricted
      - Do not build too many aggregations (in the 100s, not 1000s!)
      - Do not build aggregations larger than 30% of the fact table size (the aggregation design algorithm doesn't)
    • Best Practices for Aggregations: Design Cycle
      - Use the Storage Design Wizard (~20% performance gain) to design the initial set of aggregations
      - Enable the query log and run a pilot workload (beta test with a limited set of users)
      - Use the Usage-Based Optimization (UBO) Wizard to refine aggregations
          - Use a larger performance gain (70-80%)
      - Reprocess partitions for the new aggregations to take effect
      - Periodically use UBO to refine aggregations
    • Processing Options
      - ProcessFull: fully processes the object from scratch
      - ProcessClear: clears all data, bringing the object to an unprocessed state
      - ProcessData: reads and stores fact data only (no aggregations or indexes)
      - ProcessIndexes: builds aggregations and indexes
      - ProcessUpdate: incremental update of a dimension (preserves fact data)
      - ProcessAdd: adds new rows to a dimension or partition
    • Best Practices for Processing
      - Use XMLA scripts in large production systems
          - Automation (e.g., using ascmd)
          - Finer control over parallelism, transactions, memory usage, etc.
          - Don't just process the entire database!
      - Dimension processing
          - Performance is limited by attribute relationships; the key attribute is a big bottleneck
          - Define all possible attribute relationships
          - Eliminate redundant relationships, especially on the key!
          - Bind dimension data sources to views instead of tables or named queries
    • Best Practices for Processing: Partitions
      - Split ProcessFull into ProcessData + ProcessIndexes for large partitions; this consumes less memory
      - Monitor aggregation processing spilling to disk (perfmon counters for temp file usage)
          - Add memory, turn on /3GB, or move to x64/IA64
      - Fully process partitions periodically
          - Achieves better compression than repeated incremental processing
      - Data sources
          - Avoid using .NET data sources; OLE DB is an order of magnitude faster for processing
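    The ProcessData + ProcessIndexes split can be sketched as an XMLA batch; the database, cube, measure group, and partition IDs are assumptions for illustration:

    ```xml
    <!-- Sketch: process fact data first, then build aggregations and
         indexes, instead of one ProcessFull on a large partition. -->
    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <Process>
        <Object>
          <DatabaseID>AdventureWorksDW</DatabaseID>
          <CubeID>Sales</CubeID>
          <MeasureGroupID>FactSales</MeasureGroupID>
          <PartitionID>Sales_2006</PartitionID>
        </Object>
        <Type>ProcessData</Type>
      </Process>
      <Process>
        <Object>
          <DatabaseID>AdventureWorksDW</DatabaseID>
          <CubeID>Sales</CubeID>
          <MeasureGroupID>FactSales</MeasureGroupID>
          <PartitionID>Sales_2006</PartitionID>
        </Object>
        <Type>ProcessIndexes</Type>
      </Process>
    </Batch>
    ```

    A script like this can be automated with the ascmd utility mentioned above, giving finer control than a full database process.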
    • Improving Multi-User Performance
      - Increase query parallelism
      - Block long-running queries
      - Use a load-balancing cluster
    • © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.