Designing, Building, and Maintaining Large Cubes using Lessons Learned

Nicholas Dritsas, Eric Jacobsen, Denny Lee
SQL Server Customer Advisory Team
Microsoft Corp.

Customer Advisory Team
• Works on largest, most complex SQL Server projects worldwide
• US: NASDAQ, USDA, Verizon, Raymond James…
• Europe: London Stock Exchange, Barclays Capital
• Asia and Pacific: Korea Telecom, Western Digital, Japan Railways East
• ISVs: SAP, Siebel, SharePoint, GE Healthcare
• Drives product requirements back into SQL Server from our customers and ISVs
• Shares best practices with SQL Server community
• http://blogs.msdn.com/sqlcat - CAT team blog
• http://blogs.msdn.com/mssqlisv - ISV blog
• http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/default.mspx
• Coming soon: http://www.sqlcat.com – technical notes and case studies
We are wearing the Orange shirts during the conference.
Stop, say hello and feel free to ask us any questions.

Agenda
• Design
• Dimensions
• Cubes
• Aggregations
• Build
• Scalability – Processing
• Scalability – Queries
• Scalability options for multi-user queries
• Best Practices
• Maintain
• Monitor
DBA-415-M – Building and Maintaining Large Cubes Lessons Learned

Designing Dimensions
Slowly changing and large dimensions
• Slowly changing dimensions Type 2:
• Minimize data updates to avoid cube reprocessing
• If you must update, run ProcessAdd every evening and perform a
full process weekly. NOTE: ProcessAdd is only available through XMLA
• Large dimensions:
• Use natural hierarchies
• Dimension SQL queries are in the form of "select distinct colA, colB, …
from [DimensionTable]"
• Many hierarchies introduce many select distinct statements. Look at
tuning the SQL indexes.
• See TK Anand’s article, http://msdn2.microsoft.com/en-us/library/ms345142.aspx,
for more details
• Maximum Size of dimensions
• Successful implementations with 10 million members.
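
To make the indexing advice concrete, here is a minimal Python sketch that lists the kind of "select distinct" statement described above for each attribute of a dimension, together with a candidate index per attribute. The table name, column names, and index naming scheme are hypothetical, and the exact SQL that AS generates may differ from these strings.

```python
def distinct_queries(table, attributes):
    """Return one SELECT DISTINCT statement per attribute.

    `attributes` maps an attribute name to the columns it reads
    (key column first, then name and related columns).
    """
    queries = {}
    for attr, columns in attributes.items():
        col_list = ", ".join(f"[{c}]" for c in columns)
        queries[attr] = f"SELECT DISTINCT {col_list} FROM [{table}]"
    return queries


def candidate_indexes(table, attributes):
    """Suggest one index per attribute (hypothetical naming scheme)."""
    return [
        f"CREATE INDEX [IX_{table}_{attr}] ON [{table}] ("
        + ", ".join(f"[{c}]" for c in columns) + ")"
        for attr, columns in attributes.items()
    ]


# Hypothetical customer dimension with a natural hierarchy:
# Country -> State -> City.
attrs = {
    "Country": ["CountryKey", "CountryName"],
    "State": ["StateKey", "StateName", "CountryKey"],
    "City": ["CityKey", "CityName", "StateKey"],
}
for sql in distinct_queries("DimCustomer", attrs).values():
    print(sql)
for ddl in candidate_indexes("DimCustomer", attrs):
    print(ddl)
```

Each extra attribute or hierarchy level adds one more statement, which is why wide dimensions benefit from checking the indexes on the dimension table.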

Designing Cubes
Using partitions
• Partition by time plus another dimension too, such as geography
• For real-time BI, you may want only the most recent partition in
ROLAP, with the other partitions in MOLAP.
• NOTE: When data changes, the entire data cache for the measure group
is discarded. So it may make sense to separate cubes or measure
groups into “static” and “real-time” analysis.
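
As an illustration of partitioning by time plus a second dimension, the following Python sketch builds a partition plan with one partition per (month, region) pair and marks only the newest month as ROLAP. The fact table name, the `MonthKey` and `Region` columns, and the naming scheme are assumptions made for the example.

```python
from itertools import product


def plan_partitions(months, regions, fact_table="FactSales"):
    """Build one partition per (month, region); the newest month is ROLAP."""
    newest = max(months)
    plan = []
    for month, region in product(sorted(months), regions):
        plan.append({
            "name": f"{fact_table}_{month}_{region}",
            "storage": "ROLAP" if month == newest else "MOLAP",
            # Source filter for this partition (hypothetical columns).
            "where": f"MonthKey = {month} AND Region = '{region}'",
        })
    return plan


plan = plan_partitions([200707, 200708, 200709], ["US", "EU"])
for p in plan:
    print(p["name"], p["storage"], "|", p["where"])
```

The filters must not overlap between partitions, otherwise rows would be counted twice.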

Designing Aggregations
Here is an optimized method:
1) Create your initial aggregations with the Aggregation Design Wizard
at 5-10% performance gain.
2) Turn on the query log and set the sampling to 1 to record all
queries.
3) Run a set of MDX queries that best represents the types of questions
your users will typically ask.
4) With the OlapQueryLog table full of data, delete the rows that have a
Duration < 100 ms, since those queries are already fast. Then write a
SQL statement that returns only the rows for slow queries and/or
queries executed often.
5) This SQL statement can now be used in the Aggregation Manager
sample (found on CodePlex or in the SQL Server SP2 Samples)
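
The filtering in the steps above can be sketched as follows. This Python example uses an in-memory SQLite table as a stand-in for the query log so that it runs anywhere. The column names are an assumption based on the default OlapQueryLog layout, the rows are made up for illustration, and the thresholds (1000 ms, 3 executions) are arbitrary. The columns that the Aggregation Manager sample expects from the statement are not covered here.

```python
import sqlite3

# In-memory stand-in for the query log; column names are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OlapQueryLog ("
    "MSOLAP_Database TEXT, MSOLAP_ObjectPath TEXT, MSOLAP_User TEXT, "
    "Dataset TEXT, StartTime TEXT, Duration INTEGER)"
)
sample_rows = [  # made-up rows for illustration only
    ("Sales", "Srv.Sales.Cube.Fact", "u1", "0100", "2007-09-01", 50),
    ("Sales", "Srv.Sales.Cube.Fact", "u1", "0110", "2007-09-01", 2500),
    ("Sales", "Srv.Sales.Cube.Fact", "u2", "1000", "2007-09-01", 150),
    ("Sales", "Srv.Sales.Cube.Fact", "u2", "1000", "2007-09-02", 150),
    ("Sales", "Srv.Sales.Cube.Fact", "u3", "1000", "2007-09-02", 150),
    ("Sales", "Srv.Sales.Cube.Fact", "u3", "0011", "2007-09-02", 120),
]
conn.executemany(
    "INSERT INTO OlapQueryLog VALUES (?, ?, ?, ?, ?, ?)", sample_rows)

# Drop queries that are already fast (Duration is in milliseconds).
conn.execute("DELETE FROM OlapQueryLog WHERE Duration < 100")

# Keep only request patterns that are slow on average or executed often.
SLOW_OR_FREQUENT = """
    SELECT MSOLAP_ObjectPath, Dataset,
           COUNT(*) AS Executions, AVG(Duration) AS AvgDurationMs
    FROM OlapQueryLog
    GROUP BY MSOLAP_ObjectPath, Dataset
    HAVING AVG(Duration) >= :slow_ms OR COUNT(*) >= :min_runs
    ORDER BY SUM(Duration) DESC
"""
candidates = conn.execute(
    SLOW_OR_FREQUENT, {"slow_ms": 1000, "min_runs": 3}).fetchall()
for path, dataset, runs, avg_ms in candidates:
    print(dataset, runs, avg_ms)
```

Ordering by total duration puts the patterns that cost the most overall time first, which is one reasonable way to prioritize aggregation candidates.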

Demonstration #1
Aggregation Manager

Scalability - Processing
• Scale by processing many partitions in parallel
• No more than 2,000-4,000 total partitions
• If you need more, ensure you have installed build 3166 or later
• Parallelism is applicable when:
• dimensions change and partitions need to be reprocessed
• adding or modifying measure groups
• changing the aggregation design

Processing steps
• Process Data: read data from SQL Server, look up dimension keys, write to *.fact.data files
• Process Indexes: look at *.fact.data files, build *.map files, write to *.agg.*.data files

Use ProcessData and ProcessIndexes
• ProcessFull is the default method and it executes the
ProcessData and ProcessIndexes jobs.
• Processing completes faster and AS uses fewer memory
resources when using ProcessData and ProcessIndexes
separately.
Process enumeration      Processing time
Two-step process         00:12:23
  ProcessData            00:10:18
  ProcessIndexes         00:02:05
ProcessFull              00:13:34
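
The timings above can be checked with a few lines of Python. This sketch only redoes the arithmetic on the reported numbers; the helper name `parse_hms` is introduced here for illustration.

```python
from datetime import timedelta


def parse_hms(text):
    """Convert an HH:MM:SS string into a timedelta."""
    h, m, s = (int(part) for part in text.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s)


process_data = parse_hms("00:10:18")
process_indexes = parse_hms("00:02:05")
process_full = parse_hms("00:13:34")

two_step = process_data + process_indexes
saving = process_full - two_step
saving_pct = 100 * saving.total_seconds() / process_full.total_seconds()

print(two_step)              # 0:12:23
print(saving)                # 0:01:11
print(f"{saving_pct:.1f}%")  # 8.7%
```

In this measurement the two-step approach finished about 71 seconds sooner than ProcessFull.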

[Figure: ProcessFull spans the ProcessData and ProcessIndexes steps]

• ProcessData, rule of thumb
• 40-80 K rows/sec, per partition
• Best if you have
• Integer keys
• Fewer than 10 measures
• No SQL joins
• Example:
• One customer with many partitions in our lab saw 400 K
rows/sec sustained, 700 K rows/sec peaks, on an 8-CPU
machine
• We have seen customers with hundreds of measures
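
The rule of thumb can be turned into a rough planning estimate. The Python sketch below assumes throughput scales linearly with the number of partitions processed in parallel, which is optimistic, and the one-billion-row fact table is a hypothetical size chosen for the example.

```python
def estimated_seconds(total_rows, rows_per_sec_per_partition, parallel_partitions):
    """Rough ProcessData duration, assuming linear scaling across partitions."""
    return total_rows / (rows_per_sec_per_partition * parallel_partitions)


rows = 1_000_000_000  # hypothetical fact table size
for rate in (40_000, 80_000):  # rule-of-thumb range, rows/sec per partition
    secs = estimated_seconds(rows, rate, parallel_partitions=8)
    print(f"{rate:,} rows/sec x 8 partitions: about {secs / 60:.0f} minutes")
```

Treat the result as a starting point for testing, not a prediction: key types, measure counts, and SQL joins all move the per-partition rate.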

• Next Slide – Show Me the Numbers
• Project Real data
• Unisys 16-processor machine
• Chose 16 partitions with most similar size
• For each data point <n>, set MaxParallel=<n>, used <n>
partitions
• Integer keys, 2-part composite keys

• Why the performance loss at 16?
• On this machine, with this configuration and this data, the Memory
Quota (based on estimates) allowed only 12 partitions to run in
parallel, so the last 4 had to wait
• Adding more memory can help
• Imagine a Gantt chart showing which partitions run at a given time
• The takeaway is that many things can reduce scalability, but it is
often possible to get good performance
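
The Gantt chart idea can be sketched in a few lines of Python. This is a simplified model that assumes all partitions take the same time to process and that at most 12 can run at once, as in the case described above; it is not a model of how AS actually schedules work.

```python
import math


def schedule(partitions, max_concurrent):
    """Assign equally sized partitions to waves of at most `max_concurrent`."""
    return [
        list(range(start, min(start + max_concurrent, partitions)))
        for start in range(0, partitions, max_concurrent)
    ]


def relative_elapsed(partitions, max_concurrent):
    """Elapsed time in units of one partition's processing time."""
    return math.ceil(partitions / max_concurrent)


for n in (12, 16):
    waves = schedule(n, max_concurrent=12)
    print(n, "partitions:", len(waves), "wave(s) of sizes", [len(w) for w in waves])
```

Under these assumptions, 16 partitions take twice as long as 12 even though they hold only a third more data, so overall throughput drops.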

• Distinct Count
• Limit to one distinct count measure per measure group
• Distinct Count cannot use the in-memory DataCache to derive
storage engine queries; it only goes to fact table partitions
• During processing, SQL Server does the sorting
• Items to look for:
• Memory grants on SQL Server could limit you to only 3 partitions
running in parallel, if the query plan generated for these queries
exceeds 1/4 of the memory on SQL Server.
• Watch the perfmon counters for SQL Server:
"SQLServer:Memory Manager\Memory Grants Outstanding",
"SQLServer:Memory Manager\Memory Grants Pending".
• If you need to process more than 3 partitions in parallel, contact CSS
• SQL query timeout error HYT00.
• Modify <ExternalTimeout>.
• A query will be canceled if no rows are returned in that time.

• Dimension processing, potential concerns
• Longest pole (Gantt chart analogy) when processing a very large
dimension, e.g. 10 million members
• Size limitation: 4 GB for the “string store”, stored in Unicode, with
6 bytes of per-string overhead.
• E.g. 50-character name: 4*1024*1024*1024 / (6+50*2) = 40.5
million members
• Consider the size of other properties/attributes
• We saw a recent case of discretization of a 10 million member
customer dimension. Workarounds include doing it in SQL using
NTILE or defining a hierarchy. (Go back to business logic.)
• Usually the bigger concern is the impact of changing dimensions
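
The string store arithmetic above generalizes to any average string length. This Python sketch simply restates the formula from the slide (4 GB limit, 6 bytes of overhead per string, 2 bytes per Unicode character); it ignores other properties and attributes, which reduce the limit further.

```python
STRING_STORE_LIMIT = 4 * 1024 ** 3  # 4 GB string store
PER_STRING_OVERHEAD = 6             # bytes of overhead per string
BYTES_PER_CHAR = 2                  # strings are stored in Unicode


def max_members(avg_chars):
    """Upper bound on members for one string attribute of a given length."""
    return STRING_STORE_LIMIT // (PER_STRING_OVERHEAD + avg_chars * BYTES_PER_CHAR)


print(max_members(50))  # 40518559, i.e. about 40.5 million
```

Doubling the average name length roughly halves the number of members that fit.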

• Scale up vs. Scale out
• Scale up = big machine, scale out = many machines
• Today, officially, only scale up
• Scale out
• Better economics
• Can be better flexibility of machine usage
• AS team is considering delivering a supported method

Scalability - Tools
• Goal
• Drive server with workload at <n> users
• Measure average throughput (queries/sec)
• Measure average response time (sec/query)
• Can support <n> users such that average response time < 15
seconds (as example)
• Next slide – example graph
• Each point is a 15-minute run

Scalability - Tools
• Some available tools
• VBScript - per client, parse mdx text files, execute queries, log
to CSV file, analyze with Excel
• VSTS – Visual Studio Team System
• Framework to run multi-user scenarios, record perfmon counters, ramp up
users
• There is a sample on http://www.codeplex.com. It is not production quality yet
but is a good start
• LoadRunner
• ASCMD utility, soon to be updated with some additions for Multi
User testing
• Roll your own …

Scalability - Tools
• Input to VSTS tools
• Number of clients (users) to simulate
• Query file for each client (user) – each represents a sequence of
user actions
• Time to run – e.g. 15 minutes
• Think time – e.g. random between 10 to 20 seconds
• Output
• Average throughput – queries/sec
• Average response time – sec/query
• Perfmon counters
• Issue: Is it realistic (representative)
• Issue: Any problems? (Could be many.)
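
For the "roll your own" route, the inputs and outputs listed above can be sketched in Python as follows. This is a minimal illustration, not the VSTS sample: the `execute` callback is a hypothetical hook where you would call your own client library, and the demo replaces the server with a short sleep so the example runs on its own.

```python
import random
import threading
import time


def run_load_test(client_queries, execute, run_seconds, think_range=(10, 20)):
    """Run each client's query sequence in its own thread until time is up.

    `execute` is a caller-supplied function that sends one query to the
    server (hypothetical hook). Returns (queries_per_sec, avg_sec_per_query).
    """
    durations = []
    lock = threading.Lock()
    deadline = time.monotonic() + run_seconds

    def client(queries):
        i = 0
        while time.monotonic() < deadline:
            start = time.monotonic()
            execute(queries[i % len(queries)])
            elapsed = time.monotonic() - start
            with lock:
                durations.append(elapsed)
            i += 1
            # Think time between user actions, capped so the run ends on time.
            pause = random.uniform(*think_range)
            time.sleep(max(0.0, min(pause, deadline - time.monotonic())))

    threads = [threading.Thread(target=client, args=(q,)) for q in client_queries]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    throughput = len(durations) / run_seconds
    avg_response = sum(durations) / len(durations) if durations else 0.0
    return throughput, avg_response


# Demo with a stand-in for the server: each "query" just sleeps briefly.
def fake_server(mdx):
    time.sleep(0.01)


qps, avg = run_load_test(
    [["query A", "query B"]] * 3,  # three simulated users, same query file
    fake_server, run_seconds=0.5, think_range=(0.01, 0.02))
print(f"{qps:.1f} queries/sec, {avg:.3f} sec/query")
```

A real run would use a 15-minute duration, 10 to 20 seconds of think time, and perfmon counters recorded alongside.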

What effect do we see?

Another view of same, what is the effect?
TIP: CREATE CACHE or warm-up queries can help after server startup

Multi user query load testing
• Things to watch out for
• Duplicate queries (VSTS sample – 18,000 queries gives about
2,000 unique)
• ASQueryGenerator and documents with some examples on how to
create template queries to reduce duplicate and empty queries are
due for an update soon.
• Empty results (think of cube sparsity) – Real users rarely look at
empty regions (e.g. Canada swimsuit sales in December)
• Caching (applies to real world but can skew results good or bad)
• Think time
• Anything you can think of looking at …
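
Checking a workload for duplicates before a run is straightforward. The Python sketch below treats two queries as duplicates when they differ only in whitespace or letter case. That is a simplification, since string literals inside a query can be case-sensitive, and the sample queries are placeholders.

```python
import re


def normalize(mdx):
    """Collapse whitespace and case so trivially different copies match."""
    return re.sub(r"\s+", " ", mdx).strip().lower()


def unique_queries(queries):
    """Return the first occurrence of each distinct query, in order."""
    seen = set()
    unique = []
    for q in queries:
        key = normalize(q)
        if key not in seen:
            seen.add(key)
            unique.append(q)
    return unique


workload = [
    "SELECT [Measures].[Sales] ON 0 FROM [Cube]",
    "select  [Measures].[Sales] on 0\nfrom [Cube]",
    "SELECT [Measures].[Cost] ON 0 FROM [Cube]",
]
kept = unique_queries(workload)
print(f"{len(kept)} unique of {len(workload)}")
```

A low unique ratio means the cache will answer most of the test, so the measured response times will look better than a real mixed workload would.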

Multiuser scalability
• Load considerations
• Process partition while under load
• Writeback
• Proactive caching (real time)
• Force-Commit timeout
• Ramp up effects (connection, query warm up time)
• End effects (include measurements when last users finish?)
• No one-touch tool to examine effects
• WAN (wide area network) simulation

Scalability options
• Performance less than expected?
• IO Bound?
• Look at disk system, including controllers, number of disks per controller.
• Direct-attach, SAN choices.
• Use SQLIO to measure hardware.
• SSAS Partitions, Aggregations, 64-bit vs. 32-bit
• CPU Bound?
• Scale up (bigger machine), scale out (more machines, use replication)
• SSAS calculations
• Look at queries.
• Think about reasonability. It is not a black box: break it down into
components, try removing components/factors, and isolate the problem
• Diagnosing Query performance paper

Scale Up for Queries
• Scale up = more CPUs
• Note that 4-socket machines are relatively cheaper than machines
with more than 4 sockets; cost does not scale linearly
• Benefits scenarios that are CPU bound and parallel
• Some customers are led to believe (wrongly) that a bigger machine
will improve every query, even a formula-engine calculation
running by itself
• Presently the only way to improve parallel processing

Scale Out for Queries
• In clustering you ensure high availability; with scale out you
optimize query performance.
• You can set up multiple query-only AS databases on multiple AS
servers to handle a larger number of concurrent users.
• NLB or other TCP/IP load balancing is typically used.
• You can find more information in the “Scale Out Querying with
Analysis Services” whitepaper.

Scalability
• Looking back at multiuser load testing
• Part of loop, understand what is happening, go back to goals,
investigate where time is going, consider if it makes sense
• Solutions:
• Avoid time-expensive operations,
• Buy more hardware (scale up/scale out)
• Limit project goals (e.g. standard reports show less information), revisit
alignment to business goals
• Rewrite mdx queries, change design strategy (e.g. one cube for semi-static
analysis, another for 15-minute updates)
• Load testing and analysis can take 50% or more of project time. Quick
POC approach might uncover some issues early.

What can you do to improve processing performance?
• These best practice recommendations are based on the
lessons learned from working with many enterprise AS
customers.
• The suggestions below can be found in the Analysis
Services Processing Best Practices whitepaper
located on the SQL Server Best Practices web site on
TechNet.

Dimension Processing BP
• Add indexes to the underlying dimension tables to help
improve the “select distinct” queries generated by AS
• Create a separate table or view for dimension processing so you
can optimize specifically for AS dimension processing.
• Set the appropriate values for parallel processing. In general this is
1-2 times the number of CPUs. Testing can help to find the optimal value.
• Use the XMLA <Parallel> nodes to group processing tasks.
• Use the <Transaction> node to group different objects to have
different transaction commits.
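
As a rough illustration of grouping processing tasks, this Python sketch builds an XMLA Batch that processes several dimensions inside one Parallel element. The database and dimension IDs, the `ProcessUpdate` type, and the `MaxParallel` value are placeholders chosen for the example; transaction handling is left out.

```python
import xml.etree.ElementTree as ET

XMLA_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"


def process_batch(objects, process_type, max_parallel):
    """Build an XMLA Batch that processes `objects` in one Parallel group.

    Each object is a dict of ID elements, e.g.
    {"DatabaseID": "...", "DimensionID": "..."}.
    """
    batch = ET.Element("Batch", xmlns=XMLA_NS)
    parallel = ET.SubElement(batch, "Parallel", MaxParallel=str(max_parallel))
    for ids in objects:
        process = ET.SubElement(parallel, "Process")
        obj = ET.SubElement(process, "Object")
        for tag, value in ids.items():
            ET.SubElement(obj, tag).text = value
        ET.SubElement(process, "Type").text = process_type
    return ET.tostring(batch, encoding="unicode")


# Hypothetical object IDs.
dims = [
    {"DatabaseID": "SalesDB", "DimensionID": "Customer"},
    {"DatabaseID": "SalesDB", "DimensionID": "Product"},
]
xmla = process_batch(dims, process_type="ProcessUpdate", max_parallel=2)
print(xmla)
```

The generated text can be saved to a file and submitted with whatever XMLA client you already use; testing different `MaxParallel` values is how the optimal setting is found.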

Maintenance Issues
Backup and Restore Strategies
• The backup and restore functionality has markedly improved within
SQL Server 2005.
• Note, SQL Server 2008 will introduce a newer version of this feature to help
improve scaling for huge cubes.
• Back up the SQL database that holds your OlapQueryLog table.
• You can use the Database Synchronization feature, which allows you
to synchronize your database from a primary to a secondary server.
• Similar to AS2000, you can also copy the full data folder as your
backup.
• Note that some of the information is encrypted, so if you are restoring to a different
server, you will need to manually change connection strings and passwords.
• You can find more information about this approach in the best practices
whitepaper “Scale Out Querying with Analysis Services”.

Maintenance Issues
Planning for possible cube rebuild
• How do you plan for a possible full cube rebuild if
you have a catastrophic loss of your Analysis Services
database?
• For starters, partition your data (e.g. by time) if possible, so that
you can restore the available time periods while
you rebuild any missing portions.
• Presuming you have a great backup solution, then you can
restore most of the data except for the current day. The current
day data can then be rebuilt against the existing data source.

Performance Monitor
• SSAS 2005 has more perfmon counters.
• The counter “MSAS 2005:Processing\Rows read/sec” is helpful
to troubleshoot or optimize parallel
processing. It provides the number of rows/second AS is
reading from the relational data source.
• Processing begins by sending a SQL query to get the
data to populate each partition.

How to monitor AS performance
• Use the SQL Server Profiler to capture key trace events
of long running queries (user or processing).
• Use the Windows Event Log, as many AS events are
recorded there.
• The AS Query Log stores internal query information
meant to help in defining aggregations later.

Demonstration #2
Profiling SQL Executed Statement

Other AS Monitoring Tools
• Use the ascmd.exe sample application included in the SQL
Server SP2 Samples. You can use the -T option to
output a trace file when running this utility.
• Create a system-wide trace file to record the events
(refer to the attached XMLA file, ProcessPartitionAndRunTrace.xmla).
• Note, the AS Flight Recorder exists to record the last
set of events that occurred in case of a catastrophic
event on the server. Within the AS Properties, check
“Show Advanced (All) Properties” and you will notice the
Flight Recorder properties.

SQL CAT Presentations at PASS 2007
Session Code | Session Title | Speakers | Date | Time
DBA-410-M | Designing for Petabyte using Lessons Learned from Customer Experiences | Lubor Kollar; Lasse Nedergaard | 9/19/2007 | 9:45 AM - 11:00 AM
DBA-411-M | Building High Performance SQL system using Lessons Learned from customer deployments | Michael Thomassy; Burzin Patel | 9/19/2007 | 1:30 PM - 2:45 PM
DBA-412-M | ISV configuration & implementation using Lessons Learned from customer deployments | Juergen Thomas | 9/20/2007 | 10:30 AM - 11:45 AM
DBA-413-M | Building Highly Available SQL Server implementations using Lessons Learned from customer deployments | Prem Mehra; Lindsey Allen; Sanjay Mishra | 9/20/2007 | 1:30 PM - 2:45 PM
DBA-416-M | Building and Deploying Large Scale SSRS farms using Lessons Learned from customer deployments | Denny Lee; Lukasz Pawlowski | 9/21/2007 | 9:45 AM - 11:00 AM
DBA-415-M | Building & Maintaining large cubes using Lessons Learned from customer deployments | Nicholas Dritsas; Eric Jacobsen | 9/21/2007 | 1:00 PM - 2:15 PM

Thank you!
Thank you for attending this session and the
2007 PASS Community Summit in Denver
