Designing, Building & Maintaining Large Cubes using Lessons Learned
Nicholas Dritsas, Eric Jacobsen, Denny Lee
SQL Server Customer Advisory Team
Microsoft Corp.
Customer Advisory Team
• Works on the largest, most complex SQL Server projects worldwide
• US: NASDAQ, USDA, Verizon, Raymond James…
• Europe: London Stock Exchange, Barclays Capital
• Asia and Pacific: Korea Telecom, Western Digital, Japan Railways East
• ISVs: SAP, Siebel, SharePoint, GE Healthcare
• Drives product requirements from our customers and ISVs back into SQL Server
• Shares best practices with the SQL Server community
• http://blogs.msdn.com/sqlcat - CAT team blog
• http://blogs.msdn.com/mssqlisv - ISV blog
• http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/default.mspx
• Coming soon: http://www.sqlcat.com – technical notes and case studies
We are wearing orange shirts during the conference.
Stop by, say hello, and feel free to ask us any questions.
Agenda
• Design
• Dimensions
• Cubes
• Aggregations
• Build
• Scalability – Processing
• Scalability – Queries
• Scalability options for multi-user queries
• Best Practices
• Maintain
• Monitor
DBA-415-M – Building and Maintaining Large Cubes Lessons Learned
Designing Dimensions
Slowly changing and large dimensions
• Slowly changing dimensions (Type 2):
• Minimize data updates to avoid cube reprocessing
• If you must update, run ProcessAdd every evening and a full process weekly. NOTE: ProcessAdd on dimensions is only available through XMLA
• Large dimensions:
• Use natural hierarchies
• Dimension SQL queries are in the form of "select distinct colA, colB, …
from [DimensionTable]"
• Many hierarchies generate many SELECT DISTINCT statements, so look at tuning the SQL indexes.
• See TK Anand’s article, http://msdn2.microsoft.com/en-us/library/ms345142.aspx, for more details
• Maximum size of dimensions:
• We have seen successful implementations with 10 million members.
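The nightly ProcessAdd / weekly full-processing schedule can be sketched as a small Python helper that picks the processing type by day and emits the matching XMLA <Process> command. This is a minimal sketch, not the deck's own script: the database and dimension IDs are hypothetical, Sunday as the full-processing day is an assumption, and any out-of-line binding you may need to restrict a ProcessAdd to new source rows is omitted.

```python
import datetime
import xml.etree.ElementTree as ET

# XMLA engine namespace used by Analysis Services commands.
NS = "http://schemas.microsoft.com/analysisservices/2003/engine"
ET.register_namespace("", NS)  # serialize as a default xmlns, not ns0:

def processing_type(day, full_weekday=6):
    """ProcessFull on the chosen weekday (6 = Sunday, an assumption), ProcessAdd otherwise."""
    return "ProcessFull" if day.weekday() == full_weekday else "ProcessAdd"

def dimension_process_command(database_id, dimension_id, process_type):
    """Build one XMLA <Process> command for a dimension (no out-of-line binding)."""
    process = ET.Element("{%s}Process" % NS)
    obj = ET.SubElement(process, "{%s}Object" % NS)
    ET.SubElement(obj, "{%s}DatabaseID" % NS).text = database_id
    ET.SubElement(obj, "{%s}DimensionID" % NS).text = dimension_id
    ET.SubElement(process, "{%s}Type" % NS).text = process_type
    return ET.tostring(process, encoding="unicode")

if __name__ == "__main__":
    # Hypothetical IDs; 2007-09-19 is a Wednesday, so this prints a ProcessAdd command.
    print(dimension_process_command("SalesDW", "Dim Customer",
                                    processing_type(datetime.date(2007, 9, 19))))
```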
Designing Cubes
Using partitions
• Partition by time plus another dimension, such as geography.
• For real-time BI, consider making only the most recent partition ROLAP and keeping the other partitions MOLAP.
• NOTE: When data changes, the entire data cache for the measure group is discarded, so it may make sense to separate cubes or measure groups into “static” and “real-time” analysis.
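A sketch of the time-plus-geography partitioning scheme, with the most recent month marked ROLAP and older months MOLAP. The fact table name, the MonthKey and Region columns, and the SELECT * source query are hypothetical placeholders; a real partition query would list the needed columns explicitly.

```python
from itertools import product

def partition_plan(fact_table, months, regions):
    """Map partition name -> (source query, storage mode), one partition per month x region."""
    latest = max(months)  # only the most recent month is treated as real-time (ROLAP)
    plan = {}
    for month, region in product(sorted(months), regions):
        name = f"{fact_table}_{month}_{region}"
        # Hypothetical columns; SELECT * only for brevity.
        query = (f"SELECT * FROM {fact_table} "
                 f"WHERE MonthKey = {month} AND Region = '{region}'")
        plan[name] = (query, "ROLAP" if month == latest else "MOLAP")
    return plan
```

Each (month, region) pair becomes its own partition, so a change in one slice only forces reprocessing of that slice.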
Designing Aggregations
Here is an optimized method:
1) Create your initial aggregations with the Aggregation Design Wizard, stopping when the performance gain reaches 5-10%.
2) Turn on the query log and set the sampling to 1 to record all queries. Delete queries from the OlapQueryLog table that have a Duration < 100 ms, as those queries are already fast.
3) Run a set of MDX queries that best represents the types of questions your users will typically ask.
4) With the OlapQueryLog table full of data, write a SQL statement that returns only the rows for slow and/or frequently executed queries.
5) Use this SQL statement in the Aggregation Manager sample (found on CodePlex or in the SQL Server SP2 Samples).
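Steps 2 and 4 can be illustrated with a runnable Python sketch over an in-memory SQLite copy of the query log. The column names follow the AS 2005 OlapQueryLog table, but the thresholds (100 ms to prune; 1,000 ms average or 10 hits to keep) are assumptions; on SQL Server you would run the equivalent T-SQL against the real log table and hand the SELECT to the Aggregation Manager.

```python
import sqlite3

# Column names as in the AS 2005 OlapQueryLog table; Duration is in milliseconds.
SCHEMA = """CREATE TABLE OlapQueryLog (
    MSOLAP_Database TEXT, MSOLAP_ObjectPath TEXT, MSOLAP_User TEXT,
    Dataset TEXT, StartTime TEXT, Duration INTEGER)"""

# Step 2: drop queries that are already fast.
PRUNE_FAST = "DELETE FROM OlapQueryLog WHERE Duration < ?"

# Step 4: keep datasets that are slow on average or requested often.
SLOW_OR_FREQUENT = """
SELECT MSOLAP_ObjectPath, Dataset, COUNT(*) AS Hits, AVG(Duration) AS AvgMs
FROM OlapQueryLog
GROUP BY MSOLAP_ObjectPath, Dataset
HAVING AVG(Duration) >= ? OR COUNT(*) >= ?
ORDER BY Hits DESC, AvgMs DESC"""

def candidate_datasets(conn, fast_ms=100, slow_ms=1000, min_hits=10):
    """Prune fast queries, then return (object path, dataset, hits, avg ms) rows."""
    conn.execute(PRUNE_FAST, (fast_ms,))
    return conn.execute(SLOW_OR_FREQUENT, (slow_ms, min_hits)).fetchall()
```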
Demonstration #1
Aggregation Manager
Scalability - Processing
• Scale by processing many partitions in parallel
• No more than 2,000-4,000 total partitions
• If you need more, ensure you have installed build 3166 or later
• Parallelism is applicable when:
• dimensions change and partitions need to be reprocessed
• measure groups are added or modified
• the aggregation design changes
Scalability - Processing
• Process Data: read data from SQL Server → look up dimension keys → write to *.fact.data files
• Process Indexes: look at *.fact.data files → build *.map files → write to *.agg.*.data files
Scalability - Processing
Use ProcessData and ProcessIndexes
• ProcessFull is the default method; it executes both the ProcessData and ProcessIndexes jobs.
• Processing completes faster and AS uses less memory when ProcessData and ProcessIndexes are run separately.
Process enumeration | Processing time
Two-step process (total) | 00:12:23
  ProcessData | 00:10:18
  ProcessIndexes | 00:02:05
ProcessFull | 00:13:34
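To run the two-step approach, one option is to generate two XMLA batches, one with ProcessData commands and one with ProcessIndexes commands, and execute them in that order. The Python sketch below only builds the XML; the database, cube, measure group, and partition IDs are hypothetical, and how the batches are submitted (for example from an SSMS XMLA query window) is left out.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.microsoft.com/analysisservices/2003/engine"
ET.register_namespace("", NS)

def q(tag):
    """Qualify a tag with the XMLA engine namespace."""
    return "{%s}%s" % (NS, tag)

def two_step_commands(database_id, cube_id, measure_group_id, partition_ids):
    """Return [ProcessData batch, ProcessIndexes batch]; run them in that order."""
    commands = []
    for step in ("ProcessData", "ProcessIndexes"):
        batch = ET.Element(q("Batch"))
        for partition_id in partition_ids:
            process = ET.SubElement(batch, q("Process"))
            obj = ET.SubElement(process, q("Object"))
            # Partition objects are addressed by the full ID path.
            for tag, value in (("DatabaseID", database_id), ("CubeID", cube_id),
                               ("MeasureGroupID", measure_group_id),
                               ("PartitionID", partition_id)):
                ET.SubElement(obj, q(tag)).text = value
            ET.SubElement(process, q("Type")).text = step
        commands.append(ET.tostring(batch, encoding="unicode"))
    return commands
```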
Scalability - Processing
• ProcessData rule of thumb:
• 40-80 K rows/sec per partition
• Best if you have:
• Integer keys
• Fewer than 10 measures
• No SQL joins
• Example:
• One customer with many partitions in our lab saw 400 K rows/sec sustained, with 700 K rows/sec peaks, on an 8-CPU machine
• We have seen customers with hundreds of measures
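As a back-of-the-envelope check, the rule of thumb can be turned into a rough ProcessData duration estimate. This sketch assumes throughput scales linearly with the number of partitions processed in parallel until it hits a machine-wide ceiling; that model and any numbers you feed it are assumptions, not measurements from the deck.

```python
def processing_hours(total_rows, rows_per_sec_per_partition, parallel_partitions,
                     machine_cap=None):
    """Rough ProcessData duration: linear scaling with parallelism, optionally capped."""
    rate = rows_per_sec_per_partition * parallel_partitions
    if machine_cap is not None:
        rate = min(rate, machine_cap)  # the whole machine cannot exceed this rows/sec
    return total_rows / rate / 3600
```

For example, 1 billion rows at 50 K rows/sec per partition with 8 partitions in parallel gives roughly 0.7 hours; doubling the partitions does not help if the machine tops out near 400 K rows/sec.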
Scalability - Processing
• Next Slide – Show Me the Numbers
• Project Real data
• Unisys 16-processor machine
• Chose the 16 partitions most similar in size
• For each data point <n>, MaxParallel was set to <n> and <n> partitions were used
• Integer keys, 2-part composite keys
Scalability - Processing
• Why the performance loss at 16?
• On this machine, with this configuration and this data, the memory quota (based on estimates) allowed only 12 partitions to run in parallel, so the last 4 had to wait
• Adding more memory can help
• Imagine a Gantt chart showing which partitions run at a given time
• Takeaway: many things can reduce scalability, but it is often possible to get good performance
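The Gantt-chart effect can be reproduced with a small scheduling sketch: each partition starts as soon as one of a limited number of slots frees up. It assumes equal-length partitions and no slowdown from contention, which real processing will not match exactly; the 12-slot limit mirrors the memory-quota example above.

```python
import heapq

def makespan(durations, slots):
    """Total elapsed time when each partition starts as soon as one of `slots` runners is free."""
    if slots < 1:
        raise ValueError("slots must be >= 1")
    free_at = [0.0] * min(slots, len(durations))  # time at which each runner becomes free
    heapq.heapify(free_at)
    end = 0.0
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available runner
        finish = start + d
        heapq.heappush(free_at, finish)
        end = max(end, finish)
    return end
```

With 16 ten-minute partitions and 12 slots the run needs two waves (20 minutes), while 12 partitions finish in a single wave (10 minutes), so the 16-partition point shows a throughput drop.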
Scalability - Processing
• Distinct Count
• Limit each measure group to one distinct count measure
• Distinct Count cannot use the in-memory data cache to derive storage engine query results; it always goes to the fact table partitions
• During processing, SQL Server does the sorting
• Items to look for:
• Memory grants on SQL Server could limit processing to only 3 partitions in parallel if the query plan generated for these queries exceeds 1/4 of SQL Server's memory.
• Watch the SQL Server perfmon counters "SQLServer:Memory Manager\Memory Grants Outstanding" and "SQLServer:Memory Manager\Memory Grants Pending".
• If you need to process more than 3 partitions in parallel, contact CSS (Microsoft Customer Service and Support)
• SQL query timeout error HYT00:
• Increase the <ExternalCommandTimeout> server setting.
• A query will be canceled if no rows are returned within that time.
Scalability - Processing
• Dimension processing, potential concerns
• Longest pole (Gantt chart analogy) when processing a very large dimension, e.g. 10 million members
• Size limitation: 4 GB per “string store”; strings are stored in Unicode with a 6-byte per-string overhead.
• E.g. with 50-character names: 4*1024*1024*1024 / (6 + 50*2) ≈ 40.5 million members
• Consider the size of other properties/attributes
• We recently saw a case of discretization of a 10-million-member customer dimension; workarounds include doing it in SQL using NTILE or defining a hierarchy. (Go back to the business logic.)
• Usually the bigger concern is the impact of changing dimensions
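The string-store arithmetic and the NTILE workaround can both be expressed in a few lines of Python. The capacity function reproduces the 4 GB / (6 + 2 × name length) calculation above; the ntile function only illustrates how SQL's NTILE assigns sorted rows to buckets (earlier buckets receive the extra rows), since the actual discretization would be done in SQL, e.g. with NTILE(n) OVER (ORDER BY ...).

```python
def max_string_store_members(name_chars, store_bytes=4 * 1024**3,
                             overhead_bytes=6, bytes_per_char=2):
    """How many member names of a given length fit in one 4 GB Unicode string store."""
    return store_bytes // (overhead_bytes + name_chars * bytes_per_char)

def ntile(sorted_values, n):
    """Pair each value with a bucket number 1..n, mimicking SQL NTILE over sorted input."""
    size, extra = divmod(len(sorted_values), n)
    buckets = []
    for bucket in range(1, n + 1):
        # The first `extra` buckets each get one additional row, as in SQL NTILE.
        buckets.extend([bucket] * (size + (1 if bucket <= extra else 0)))
    return list(zip(sorted_values, buckets))
```

For 50-character names, max_string_store_members(50) returns 40,518,559, the roughly 40.5 million members quoted above.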
Scalability - Processing
• Scale up vs. Scale out
• Scale up = big machine, scale out = many machines
• Today, only scale up is officially supported
• Scale out
• Better economics
• Can offer more flexibility in machine usage
• The AS team is considering delivering a supported method
Scalability - Tools
• Goal
• Drive server with workload at <n> users
• Measure average throughput (queries/sec)
• Measure average response time (sec/query)
• Can support <n> users such that the average response time is under 15 seconds (for example)
• Next slide – example graph
• Each point is a 15-minute run
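The goal can be checked mechanically once each run is summarized. A minimal sketch, assuming each run records its individual response times and its length in seconds; the 15-second limit is the example threshold above.

```python
def summarize(response_times_sec, run_seconds):
    """Average throughput (queries/sec) and average response time (sec/query) for one run."""
    n = len(response_times_sec)
    return n / run_seconds, sum(response_times_sec) / n

def max_supported_users(runs, limit_sec=15.0):
    """runs: iterable of (users, avg_response_sec); largest user count under the limit, or 0."""
    passing = [users for users, avg in runs if avg < limit_sec]
    return max(passing) if passing else 0
```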
Scalability - Tools
• Some available tools
• VBScript: per client, parse MDX text files, execute queries, log to a CSV file, analyze with Excel
• VSTS – Visual Studio Team System
• Framework to run multi-user scenarios, record perfmon counters, and ramp up users
• There is a sample on http://www.codeplex.com; it is not production quality yet but is a good start
• LoadRunner
• ASCMD utility, soon to be updated with some additions for multi-user testing
• Roll your own …
Scalability - Tools
• Input to VSTS tools
• Number of clients (users) to simulate
• Query file for each client (user) – each represents a sequence of
user actions
• Time to run – e.g. 15 minutes
• Think time – e.g. random between 10 and 20 seconds
• Output
• Average throughput – queries/sec
• Average response time – sec/query
• Perfmon counters
• Issue: Is it realistic (representative)?
• Issue: Any problems? (Could be many.)
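A bare-bones multi-user driver along the lines of the VSTS inputs above: one thread per simulated user, each replaying its own query sequence with a random think time, reporting average throughput and response time. run_query is a hypothetical stand-in you would replace with code that sends MDX to the server; nothing here is the VSTS sample's API. The wall-clock time includes each user's final think time, one example of the end effects that make such measurements tricky.

```python
import random
import threading
import time

def run_load_test(client_queries, run_query, duration_sec,
                  think_time=(10.0, 20.0), seed=0):
    """One thread per simulated user; returns (queries/sec, avg sec/query)."""
    timings = []  # response time of every executed query
    lock = threading.Lock()
    deadline = time.perf_counter() + duration_sec

    def client(queries, rng):
        i = 0
        while time.perf_counter() < deadline:
            start = time.perf_counter()
            run_query(queries[i % len(queries)])  # replay the user's sequence, wrapping around
            elapsed = time.perf_counter() - start
            with lock:
                timings.append(elapsed)
            i += 1
            time.sleep(rng.uniform(*think_time))  # think time between user actions

    threads = [threading.Thread(target=client, args=(queries, random.Random(seed + n)))
               for n, queries in enumerate(client_queries)]
    started = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.perf_counter() - started  # includes the final think times (end effect)
    if not timings:
        return 0.0, 0.0
    return len(timings) / wall, sum(timings) / len(timings)
```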
What effect do we see? [graph not reproduced]
Another view of the same results: what is the effect? [graph not reproduced]
TIP: CREATE CACHE or warm-up queries can help after server startup
Multi user query load testing
• Things to watch out for
• Duplicate queries (in the VSTS sample, 18,000 queries yield only about 2,000 unique ones)
• An update is due soon for ASQueryGenerator and its documents, which include examples of how to create template queries that reduce duplicate and empty queries.
• Empty results (think of cube sparsity) – real users rarely look at empty regions (e.g. Canada swimsuit sales in December)
• Caching (applies to the real world but can skew results, for better or worse)
• Think time
• Anything else you can think of to look at
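Duplicate queries can be spotted before a test run by normalizing and counting the query texts. This sketch uses a crude normalization (collapse whitespace, lowercase), which assumes identifiers are case-insensitive and does not catch semantically equivalent MDX written differently.

```python
import re
from collections import Counter

def normalize_mdx(query):
    """Collapse whitespace and case so trivially different texts compare equal."""
    return re.sub(r"\s+", " ", query).strip().lower()

def duplicate_report(queries):
    """Return (total, unique, three most common normalized queries with counts)."""
    counts = Counter(normalize_mdx(q) for q in queries)
    return len(queries), len(counts), counts.most_common(3)
```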
Multiuser scalability
• Load considerations
• Process partition while under load
• Writeback
• Proactive caching (real time)
• Force-Commit timeout
• Ramp up effects (connection, query warm up time)
• End effects (include measurements when last users finish?)
• There is no one-touch tool to examine these effects
• WAN (wide area network) simulation
Scalability options
• Performance less than expected?
• IO Bound?
• Look at disk system, including controllers, number of disks per controller.
• Direct-attach, SAN choices.
• Use SQLIO to measure hardware.
• SSAS Partitions, Aggregations, 64-bit vs. 32-bit
• CPU Bound?
• Scale up (bigger machine), scale out (more machines, use replication)
• SSAS calculations
• Look at queries.
• Think about reasonableness. It is not a black box: break the system down into components, try removing components/factors, and isolate the problem.
• See the Diagnosing Query Performance paper
Scale Up for Queries
• Scale up = more CPUs
• Note that 4-socket machines are relatively cheap compared with machines of more than 4 sockets; cost does not scale linearly
• Benefits scenarios that are CPU bound and parallel
• Some customers have been led to believe (wrongly) that a bigger machine will improve every query, even a formula-engine calculation running by itself
• Presently the only way to improve parallel processing
Scale Out for Queries
• In clustering you ensure high availability; with scale out you optimize query performance.
• You can set up multiple query-only AS databases on multiple AS servers to handle a larger number of concurrent users.
• NLB or other TCP/IP load balancing is typically used.
• You can find more information in the “Scale Out Querying with Analysis Services” whitepaper.
Scalability
• Looking back at multiuser load testing
• It is part of a loop: understand what is happening, go back to the goals, investigate where the time is going, and consider whether it makes sense
• Solutions:
• Avoid time-expensive operations
• Buy more hardware (scale up/scale out)
• Limit project goals (e.g. standard reports show less information), and revisit alignment to business goals
• Rewrite MDX queries or change the design strategy (e.g. one cube for semi-static analysis, another for 15-minute updates)
• Load testing and analysis can take 50% or more of project time. A quick proof-of-concept (POC) approach might uncover some issues early.
What can you do to improve processing performance?
• These best-practice recommendations are based on the lessons learned from working with many enterprise AS customers.
• The suggestions below can be found in the Analysis Services Processing Best Practices whitepaper located on the SQL Server Best Practices web site on TechNet.
Dimension Processing BP
• Add indexes to the underlying dimension tables to help improve the “select distinct” queries generated by AS.
• Create a separate table or view for dimension processing so you can optimize it specifically for AS dimension processing.
• Set appropriate values for parallel processing. In general this is 1-2 times the number of CPUs; testing can help find the optimal value.
• Use the XMLA <Parallel> nodes to group processing tasks.
• Use the <Transaction> node to group different objects into separate transaction commits.
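The parallelism and transaction advice can be combined into one generated XMLA batch. In this sketch MaxParallel is a factor of the CPU count (1.5 is an arbitrary point in the 1-2× range), the Batch element's Transaction attribute controls whether everything commits together, and ProcessUpdate as the processing type is an assumption; the dimension IDs are hypothetical.

```python
import os
import xml.etree.ElementTree as ET

NS = "http://schemas.microsoft.com/analysisservices/2003/engine"
ET.register_namespace("", NS)

def q(tag):
    """Qualify a tag with the XMLA engine namespace."""
    return "{%s}%s" % (NS, tag)

def parallel_dimension_batch(database_id, dimension_ids, cpus=None, factor=1.5,
                             transaction=True, process_type="ProcessUpdate"):
    """One Batch containing a Parallel node that processes every dimension."""
    cpus = cpus or os.cpu_count() or 1
    max_parallel = max(1, int(cpus * factor))  # 1-2x CPUs; tune by testing
    batch = ET.Element(q("Batch"), {"Transaction": "true" if transaction else "false"})
    parallel = ET.SubElement(batch, q("Parallel"), {"MaxParallel": str(max_parallel)})
    for dimension_id in dimension_ids:
        process = ET.SubElement(parallel, q("Process"))
        obj = ET.SubElement(process, q("Object"))
        ET.SubElement(obj, q("DatabaseID")).text = database_id
        ET.SubElement(obj, q("DimensionID")).text = dimension_id
        ET.SubElement(process, q("Type")).text = process_type
    return ET.tostring(batch, encoding="unicode")
```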
Maintenance Issues
Backup and Restore Strategies
• The backup and restore functionality has markedly improved in SQL Server 2005.
• Note: SQL Server 2008 will introduce a newer version of this feature to help improve scaling for huge cubes.
• Back up the SQL database that holds your OlapQueryLog table.
• You can use the Database Synchronization feature to synchronize your database from a primary to a secondary server.
• Similar to AS2000, you can also copy the full data folder as your backup.
• Note that some of the information is encrypted, so if you restore to a different server you will need to manually change connection strings and passwords.
• You can find more information about this approach in the best practices whitepaper “Scale Out Querying with Analysis Services”.
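The XMLA Backup and Synchronize commands can be generated the same way. A minimal sketch: the database ID, file path, and connection string are hypothetical, and the Synchronize command is sent to the destination (secondary) server, which pulls the database from the source named in the connection string.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.microsoft.com/analysisservices/2003/engine"
ET.register_namespace("", NS)

def q(tag):
    """Qualify a tag with the XMLA engine namespace."""
    return "{%s}%s" % (NS, tag)

def backup_command(database_id, backup_file, allow_overwrite=True, apply_compression=True):
    """XMLA <Backup> of one database to an .abf file."""
    backup = ET.Element(q("Backup"))
    obj = ET.SubElement(backup, q("Object"))
    ET.SubElement(obj, q("DatabaseID")).text = database_id
    ET.SubElement(backup, q("File")).text = backup_file
    ET.SubElement(backup, q("ApplyCompression")).text = "true" if apply_compression else "false"
    ET.SubElement(backup, q("AllowOverwrite")).text = "true" if allow_overwrite else "false"
    return ET.tostring(backup, encoding="unicode")

def synchronize_command(source_connection_string, database_id, apply_compression=True):
    """XMLA <Synchronize>, run on the secondary server to copy a database from the primary."""
    sync = ET.Element(q("Synchronize"))
    source = ET.SubElement(sync, q("Source"))
    ET.SubElement(source, q("ConnectionString")).text = source_connection_string
    obj = ET.SubElement(source, q("Object"))
    ET.SubElement(obj, q("DatabaseID")).text = database_id
    ET.SubElement(sync, q("SynchronizeSecurity")).text = "CopyAll"
    ET.SubElement(sync, q("ApplyCompression")).text = "true" if apply_compression else "false"
    return ET.tostring(sync, encoding="unicode")
```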
Maintenance Issues
Planning for possible cube rebuild
• How do you plan for a possible full cube rebuild if you have a catastrophic loss of your Analysis Services database?
• For starters, partition your data (e.g. by time) if possible, so that you can restore the available time periods while you rebuild any missing portions.
• Assuming you have a good backup solution, you can restore most of the data except for the current day. The current day's data can then be rebuilt from the existing data source.
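Once data is partitioned by day, the recovery plan reduces to splitting partitions into those covered by the last good backup and those that must be reprocessed from the data source. A minimal sketch with hypothetical daily partitions identified by date.

```python
import datetime

def recovery_plan(partition_days, last_backup_day):
    """Split daily partitions into (restore from backup, rebuild from source), oldest first."""
    restore = sorted(d for d in partition_days if d <= last_backup_day)
    rebuild = sorted(d for d in partition_days if d > last_backup_day)
    return restore, rebuild
```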
Performance Monitor
• SSAS 2005 has more perfmon counters.
• The counter “MSAS 2005:Processing\Rows read/sec” is helpful to troubleshoot or optimize parallel processing. It provides the number of rows per second AS is reading from the relational data source.
• Processing begins by sending a SQL query to get the data to populate each partition.
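If the counter is logged to CSV (for example with typeperf), averaging it afterwards is straightforward. This sketch assumes typeperf-style output: a header row with full counter paths, quoted values, and blank fields for missing samples; the counter path is the one named above.

```python
import csv
import io

COUNTER = "\\MSAS 2005:Processing\\Rows read/sec"

def average_counter(csv_text, counter_suffix=COUNTER):
    """Average of one counter column in a perfmon CSV log, skipping blank samples."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    column = next((i for i, name in enumerate(header) if name.endswith(counter_suffix)), None)
    if column is None:
        raise ValueError("counter not found in header")
    values = []
    for row in reader:
        try:
            values.append(float(row[column]))
        except (ValueError, IndexError):
            continue  # blank or missing sample
    return sum(values) / len(values) if values else 0.0
```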
How to monitor AS performance
• Use SQL Server Profiler to capture key trace events of long-running queries (user or processing).
• Use the Windows Event Log, as many AS events are recorded there.
• The AS Query Log stores internal query information meant to help in defining aggregations later.
Demonstration #2
Profiling SQL Executed Statement
Other AS Monitoring Tools
• Use the ascmd.exe sample application included in the SQL Server SP2 Samples. You can use the -T option to output a trace file when running this utility.
• Create a system-wide trace file to record the events (refer to the attached XMLA file).
• Note: the AS Flight Recorder records the last set of events that occurred in case of a catastrophic event on the server. Within the AS Properties, check “Show Advanced (All) Properties” and you will see the Flight Recorder properties.
ProcessPartitionAndRunTrace.xmla
SQL CAT Presentations at PASS 2007
Session Code | Session Title | Speakers | Date | Time
DBA-410-M | Designing for Petabyte using Lessons Learned from Customer Experiences | Lubor Kollar; Lasse Nedergaard | 9/19/2007 | 9:45 AM - 11:00 AM
DBA-411-M | Building High Performance SQL system using Lessons Learned from customer deployments | Michael Thomassy; Burzin Patel | 9/19/2007 | 1:30 PM - 2:45 PM
DBA-412-M | ISV configuration & implementation using Lessons Learned from customer deployments | Juergen Thomas | 9/20/2007 | 10:30 AM - 11:45 AM
DBA-413-M | Building Highly Available SQL Server implementations using Lessons Learned from customer deployments | Prem Mehra; Lindsey Allen; Sanjay Mishra | 9/20/2007 | 1:30 PM - 2:45 PM
DBA-416-M | Building and Deploying Large Scale SSRS farms using Lessons Learned from customer deployments | Denny Lee; Lukasz Pawlowski | 9/21/2007 | 9:45 AM - 11:00 AM
DBA-415-M | Building & Maintaining large cubes using Lessons Learned from customer deployments | Nicholas Dritsas; Eric Jacobsen | 9/21/2007 | 1:00 PM - 2:15 PM
Thank you!
Thank you for attending this session and the
2007 PASS Community Summit in Denver

Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Recently uploaded (20)

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 

Designing, Building, and Maintaining Large Cubes using Lessons Learned

  • 1. Designing, Building & Maintaining Large Cubes using Lessons Learned
    Nicholas Dritsas, Eric Jacobsen, Denny Lee
    SQL Server Customer Advisory Team, Microsoft Corp.
  • 2. Customer Advisory Team
    • Works on the largest, most complex SQL Server projects worldwide
      • US: NASDAQ, USDA, Verizon, Raymond James…
      • Europe: London Stock Exchange, Barclays Capital
      • Asia and Pacific: Korea Telecom, Western Digital, Japan Railways East
      • ISVs: SAP, Siebel, SharePoint, GE Healthcare
    • Drives product requirements back into SQL Server from our customers and ISVs
    • Shares best practices with the SQL Server community
      • http://blogs.msdn.com/sqlcat - CAT team blog
      • http://blogs.msdn.com/mssqlisv - ISV blog
      • http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/default.mspx
      • Coming soon: http://www.sqlcat.com – technical notes and case studies
    We are wearing orange shirts during the conference. Stop by, say hello, and feel free to ask us any questions.
  • 3. Agenda
    • Design
      • Dimensions
      • Cubes
      • Aggregations
    • Build
      • Scalability – Processing
      • Scalability – Queries
      • Scalability options for multi-user queries
      • Best Practices
    • Maintain
    • Monitor
    DBA-415-M – Building and Maintaining Large Cubes Lessons Learned
  • 4. Agenda
    • Design
      • Dimensions
      • Cubes
      • Aggregations
    • Build
      • Scalability – Processing
      • Scalability – Queries
      • Scalability options for multi-user queries
      • Best Practices
    • Maintain
    • Monitor
  • 5. Designing Dimensions: Slowly Changing and Large Dimensions
    • Slowly changing dimensions (Type 2):
      • Minimize data updates to avoid cube reprocessing.
      • If you must update, run ProcessAdd every evening and perform full processing weekly. NOTE: ProcessAdd is only available through XMLA.
    • Large dimensions:
      • Use natural hierarchies.
      • Dimension SQL queries take the form "select distinct colA, colB, … from [DimensionTable]".
      • Many hierarchies introduce many select distinct statements, so look into tuning SQL indexes.
      • See TK Anand's article, http://msdn2.microsoft.com/en-us/library/ms345142.aspx, for more details.
    • Maximum size of dimensions:
      • We have seen successful implementations with 10 million members.
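Slide 5's nightly ProcessAdd runs are issued as XMLA commands. The Python below is a minimal sketch that uses only the standard library to emit the `<Batch>`/`<Process>` XMLA for one dimension. The database and dimension IDs are hypothetical placeholders. A real ProcessAdd would normally also carry out-of-line bindings so that the dimension reads only the new rows, and that part is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by Analysis Services XMLA engine commands.
ENGINE_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"
ET.register_namespace("", ENGINE_NS)  # serialize as the default xmlns


def build_process_command(database_id: str, dimension_id: str,
                          process_type: str = "ProcessAdd") -> str:
    """Return an XMLA <Batch> that processes a single dimension.

    database_id and dimension_id are placeholders for your own object IDs.
    Out-of-line bindings (pointing at only the new rows) are NOT included.
    """
    batch = ET.Element(f"{{{ENGINE_NS}}}Batch")
    process = ET.SubElement(batch, f"{{{ENGINE_NS}}}Process")
    obj = ET.SubElement(process, f"{{{ENGINE_NS}}}Object")
    ET.SubElement(obj, f"{{{ENGINE_NS}}}DatabaseID").text = database_id
    ET.SubElement(obj, f"{{{ENGINE_NS}}}DimensionID").text = dimension_id
    ET.SubElement(process, f"{{{ENGINE_NS}}}Type").text = process_type
    return ET.tostring(batch, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    print(build_process_command("SalesDW", "Dim Customer"))
```

The same helper with `process_type="ProcessFull"` would produce the weekly full-processing command.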
  • 6. Designing Cubes: Using Partitions
    • Partition by time plus another dimension, such as geography.
    • For real-time BI, consider making only the most recent partition ROLAP and keeping the other partitions in MOLAP.
    • NOTE: When data changes, the entire data cache for the measure group is discarded. It may therefore make sense to separate cubes or measure groups into “static” and “real-time” analysis.
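Slide 6's partitioning advice can be made concrete with a small planning helper. This is a hypothetical sketch. The fact table name, the `MonthKey` and `RegionCode` columns, and the naming pattern are all assumptions. It only produces a plan (names, source queries, storage modes) and does not create anything on a server. Following the slide's real-time suggestion, the most recent month is planned as ROLAP and every older month as MOLAP.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionSpec:
    name: str
    source_query: str
    storage_mode: str  # "MOLAP" or "ROLAP"


def plan_partitions(months: List[str], regions: List[str],
                    fact_table: str = "FactSales") -> List[PartitionSpec]:
    """Plan one partition per (month, region) pair.

    months must be sorted oldest to newest (e.g. "200709"). Only the latest
    month is ROLAP; all older partitions are MOLAP. Table and column names
    are hypothetical.
    """
    if not months:
        return []
    latest = months[-1]
    specs = []
    for month in months:
        for region in regions:
            query = (f"SELECT * FROM {fact_table} "
                     f"WHERE MonthKey = {month} AND RegionCode = '{region}'")
            mode = "ROLAP" if month == latest else "MOLAP"
            specs.append(PartitionSpec(f"Sales_{month}_{region}", query, mode))
    return specs


if __name__ == "__main__":
    for spec in plan_partitions(["200708", "200709"], ["EU", "US"]):
        print(spec.storage_mode, spec.name)
```

Because a data change discards the cache for the whole measure group, you might place the ROLAP partition in its own measure group, as the slide suggests.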
  • 7. Designing Aggregations
    Here is an optimized method:
    1) Create your aggregations with the Aggregation Design Wizard at 5-10% performance gain.
    2) Turn on the query log and set the sampling (QueryLogSampling) to 1 to record every query. Delete queries with a Duration under 100 ms from the OlapQueryLog table, since those queries are already fast.
    3) Run a set of MDX queries that best represents the questions your users will typically ask.
    4) With the OlapQueryLog table full of data, write a SQL statement that selects only the rows for slow and/or frequently executed queries.
    5) Use this SQL statement in the Aggregation Manager sample (found on CodePlex or in the SQL Server SP2 Samples).
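Steps 2 and 4 of slide 7 come down to a filter over the query log. The sketch below runs that filter against an in-memory SQLite table that stands in for OlapQueryLog. Only four of its columns are modeled. The thresholds (1,000 ms average, 3 executions) are illustrative assumptions, not recommendations from the deck. Against SQL Server you would run equivalent T-SQL, such as the query in the editor's notes.

```python
import sqlite3


def slow_or_frequent_queries(rows, min_avg_ms=1000, min_count=3):
    """Return query-log groups that are slow on average or run often.

    rows: iterable of (database, object_path, dataset, duration_ms) tuples,
    a simplified subset of the OlapQueryLog columns. An in-memory SQLite
    table stands in for the real SQL Server table.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OlapQueryLog (msolap_database TEXT, "
                 "msolap_objectpath TEXT, dataset TEXT, duration INTEGER)")
    conn.executemany("INSERT INTO OlapQueryLog VALUES (?, ?, ?, ?)", rows)
    # Step 2: drop fast queries (< 100 ms); they do not need aggregations.
    conn.execute("DELETE FROM OlapQueryLog WHERE duration < 100")
    # Step 4: keep datasets that are slow on average or executed often.
    result = conn.execute(
        """SELECT dataset, COUNT(*) AS query_count, AVG(duration) AS avg_ms
           FROM OlapQueryLog
           GROUP BY msolap_database, msolap_objectpath, dataset
           HAVING AVG(duration) >= ? OR COUNT(*) >= ?
           ORDER BY avg_ms DESC""",
        (min_avg_ms, min_count)).fetchall()
    conn.close()
    return result
```

Each returned `dataset` value identifies the attribute combination a query touched. These are the rows you would feed into the Aggregation Manager sample in step 5.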
  • 8. Demonstration #1: Aggregation Manager
  • 9. Agenda
    • Design
      • Dimensions
      • Cubes
      • Aggregations
    • Build
      • Scalability – Processing
      • Scalability – Queries
      • Scalability options for multi-user queries
      • Best Practices
    • Maintain
    • Monitor
  • 10. Scalability - Processing
    • Scale by processing many partitions in parallel.
    • Keep the total number of partitions to no more than 2,000-4,000.
      • If you need more, make sure you have installed build 3166 or later.
    • Parallelism applies when:
      • dimensions change and partitions need to be reprocessed
      • measure groups are added or modified
      • the aggregation design changes
  • 11. Scalability - Processing
    • Process Data: read data from SQL Server → look up dimension keys → write to *.fact.data files
    • Process Indexes: look at *.fact.data files → build *.map files → write to *.agg.*.data files
  • 12. Scalability - Processing: Use ProcessData and ProcessIndexes
    • ProcessFull is the default method; it executes both the ProcessData and ProcessIndexes jobs.
    • Processing completes faster, and AS uses less memory, when you run ProcessData and ProcessIndexes separately.

      Process enumeration     Processing time
      Two-step process        00:12:23
        ProcessData           00:10:18
        ProcessIndexes        00:02:05
      ProcessFull             00:13:34
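The timings on slide 12 can be checked with a few lines of arithmetic. This sketch only re-derives numbers already in the table. The two steps add up exactly to the two-step total, which is 71 seconds (about 8.7%) faster than ProcessFull in this particular test. It says nothing about how the gap scales on other hardware or data.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an HH:MM:SS duration such as '00:12:23'."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Timings from the slide's table.
process_data = parse_hms("00:10:18")
process_indexes = parse_hms("00:02:05")
process_full = parse_hms("00:13:34")

two_step = process_data + process_indexes   # sum of the two separate steps
saved = process_full - two_step             # time saved versus ProcessFull
saved_pct = 100 * (saved / process_full)    # timedelta / timedelta -> float

print(f"Two-step total: {two_step}, saved {saved} ({saved_pct:.1f}%)")
```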
  • 13. [Figure: ProcessFull vs. ProcessData and ProcessIndexes]
  • 14. Scalability - Processing
    • ProcessData rule of thumb: 40-80K rows/sec per partition
    • Best if you have:
      • Integer keys
      • Fewer than 10 measures
      • No SQL joins
    • Example:
      • One customer with many partitions in our lab saw 400K rows/sec sustained, with 700K rows/sec peaks, on an 8-CPU machine.
      • We have also seen customers with hundreds of measures, which lowers the rows/sec rate.
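Slide 14's rule of thumb gives a quick way to size a processing window. The helper below is a back-of-the-envelope sketch. It assumes every partition sustains the same rate and that the relational source and disks keep up. The 1.44 billion-row fact table is a made-up example.

```python
def estimate_process_data_hours(total_rows: int, parallel_partitions: int,
                                rows_per_sec_per_partition: float) -> float:
    """Rough ProcessData duration in hours.

    Assumes parallel_partitions partitions are always running, each at
    rows_per_sec_per_partition, so the aggregate rate is their product.
    """
    aggregate_rows_per_sec = parallel_partitions * rows_per_sec_per_partition
    return total_rows / aggregate_rows_per_sec / 3600


# Hypothetical 1.44 billion-row fact table, 8 partitions in parallel,
# across the slide's 40-80K rows/sec per-partition range.
for rate in (40_000, 50_000, 80_000):
    hours = estimate_process_data_hours(1_440_000_000, 8, rate)
    print(f"{rate:>6} rows/sec per partition -> {hours:.2f} h")
```

At 50K rows/sec per partition, 8 parallel partitions give 400K rows/sec in aggregate. That is the same order as the sustained rate in the lab example.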
  • 15. Scalability - Processing
    • Next slide: Show Me the Numbers
    • Project REAL data
    • Unisys 16-processor machine
    • Chose the 16 partitions most similar in size
    • For each data point <n>, set MaxParallel=<n> and used <n> partitions
    • Integer keys, 2-part composite keys
  • 16. [Chart: processing results for each MaxParallel=<n> data point]
  • 17. Scalability - Processing
    • Why the performance loss at 16?
      • On this machine, with this configuration and this data, the memory quota (based on estimates) allowed only 12 partitions to run in parallel, so the last 4 had to wait.
      • Adding more memory can help.
      • Imagine a Gantt chart showing which partitions run at a given time.
    • Takeaway: many things can reduce scalability, but it is often possible to get good performance.
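The Gantt-chart explanation on slide 17 can be reproduced with a deliberately simple model. It assumes equally sized partitions, a fixed per-partition duration, and a memory quota that admits at most a fixed number of partitions at once. The function and its parameters are hypothetical. Real partitions differ in size, and the server's memory estimates change as processing proceeds.

```python
import math


def elapsed_minutes(partitions: int, max_parallel: int, quota_slots: int,
                    minutes_per_partition: float) -> float:
    """Elapsed time when partitions run in 'waves' on a Gantt chart.

    At most min(max_parallel, quota_slots) partitions run at once and the
    rest wait for a slot. With equal durations, each wave takes
    minutes_per_partition.
    """
    if partitions == 0:
        return 0.0
    if min(max_parallel, quota_slots) < 1:
        raise ValueError("need at least one slot to run a partition")
    concurrent = min(max_parallel, quota_slots, partitions)
    waves = math.ceil(partitions / concurrent)
    return waves * minutes_per_partition


# Slide 17's scenario: the memory quota admits only 12 partitions at once.
for n in (8, 12, 16):
    minutes = elapsed_minutes(n, max_parallel=n, quota_slots=12,
                              minutes_per_partition=10)
    print(f"MaxParallel={n:2}: {minutes:.0f} min, "
          f"{n / minutes:.2f} partitions/min")
```

In this model, 16 partitions need two waves, so throughput per minute falls below the 12-partition run. That is the shape of the loss the slide describes.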
  • 18. Scalability - Processing: Distinct Count
    • Limit each measure group to one distinct count measure.
    • Distinct count cannot use the in-memory data cache to answer storage engine queries; it always goes to the fact table partitions.
    • During processing, SQL Server does the sorting.
    • Items to look for:
      • SQL Server memory grants can limit you to only 3 partitions processing in parallel if the query plan generated for these queries requires more than 1/4 of SQL Server's memory.
        • Watch the perfmon counters "SQLServer:Memory Manager\Memory Grants Outstanding" and "SQLServer:Memory Manager\Memory Grants Pending".
        • If you need to process more than 3 partitions in parallel, contact CSS (Microsoft Customer Service and Support).
      • SQL query timeout error HYT00:
        • Modify <ExternalCommandTimeout>.
        • A query is canceled if no rows are returned within that time.
  • 19. Scalability - Processing
    • Dimension processing, potential concerns:
      • Very large dimensions (e.g. 10 million members) can be the longest pole (Gantt chart analogy).
      • Size limitation: 4 GB for the "string store"; strings are stored in Unicode with a 6-byte per-string overhead.
        • E.g. for a 50-character name: 4 × 1024 × 1024 × 1024 / (6 + 50 × 2) ≈ 40.5 million members
        • Consider the size of other properties/attributes as well.
      • We recently saw a case of discretizing a 10-million-member customer dimension; workarounds include doing the discretization in SQL using NTILE, or defining a hierarchy. (Go back to the business logic.)
    • Usually the bigger concern is the impact of changing dimensions.
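The string-store arithmetic on slide 19 works for any average name length. The sketch below uses the slide's figures: a 4 GB store, a 6-byte per-string overhead, and 2 bytes per character. Like the slide's example, it counts only member names. Other properties/attributes are not counted here and, per the slide, should be considered as well.

```python
STRING_STORE_LIMIT_BYTES = 4 * 1024 ** 3  # 4 GB string store limit (slide 19)
PER_STRING_OVERHEAD_BYTES = 6            # per-string overhead (slide 19)
BYTES_PER_CHAR = 2                       # Unicode, 2 bytes per character as in the slide


def max_members(avg_name_chars: int) -> int:
    """Upper bound on member names that fit in one 4 GB string store.

    Counts only the names; other properties/attributes are ignored here.
    """
    bytes_per_string = PER_STRING_OVERHEAD_BYTES + avg_name_chars * BYTES_PER_CHAR
    return STRING_STORE_LIMIT_BYTES // bytes_per_string


for chars in (20, 50, 100):
    print(f"{chars:>3}-char names: ~{max_members(chars) / 1e6:.1f} million members")
```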
  • 20. Scalability - Processing
    • Scale up vs. scale out
      • Scale up = big machine; scale out = many machines
      • Today, only scale up is officially supported
    • Scale out:
      • Better economics
      • Can offer more flexibility in machine usage
      • The AS team is considering delivering a supported method
  • 21. Scalability - Tools
    • Goal:
      • Drive the server with a workload of <n> users
      • Measure average throughput (queries/sec)
      • Measure average response time (sec/query)
      • Find the <n> users that can be supported with an average response time under, for example, 15 seconds
    • Next slide: example graph
      • Each point is a 15-minute run
  • 22. [Example graph: one point per 15-minute load-test run]
  • 23. Scalability - Tools
    • Some available tools:
      • VBScript: per client, parse MDX text files, execute queries, log to a CSV file, and analyze with Excel
      • VSTS (Visual Studio Team System)
        • Framework to run multi-user scenarios, record perfmon counters, and ramp up users
        • A sample is available on http://www.codeplex.com; it is not production quality yet, but it is a good start
      • LoadRunner
      • ASCMD utility, soon to be updated with some additions for multi-user testing
      • Roll your own …
  • 24. Scalability - Tools
    • Inputs to the VSTS tools:
      • Number of clients (users) to simulate
      • Query file for each client (user); each represents a sequence of user actions
      • Time to run, e.g. 15 minutes
      • Think time, e.g. random between 10 and 20 seconds
    • Outputs:
      • Average throughput (queries/sec)
      • Average response time (sec/query)
      • Perfmon counters
    • Issue: Is the workload realistic (representative)?
    • Issue: Are there any problems? (There could be many.)
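Slides 21 and 24 reduce each load-test run to two numbers. This sketch computes them from a per-query log. The `QueryRecord` layout is an assumption, since VSTS and the other tools each have their own output formats. The 15-second target comes from the example on slide 21. Run length and think time are inputs to the load driver, not to this summary.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class QueryRecord:
    user: int
    start_s: float  # seconds since the run started
    end_s: float


def summarize(records: List[QueryRecord], run_seconds: float,
              target_response_s: float = 15.0) -> Tuple[float, float, bool]:
    """Return (queries/sec, average sec/query, meets target?) for one run.

    records must be non-empty; statistics.mean raises on an empty list.
    """
    throughput = len(records) / run_seconds
    avg_response = mean(r.end_s - r.start_s for r in records)
    return throughput, avg_response, avg_response < target_response_s


if __name__ == "__main__":
    log = [QueryRecord(1, 0.0, 2.0), QueryRecord(1, 17.0, 21.0),
           QueryRecord(2, 5.0, 11.0)]
    print(summarize(log, run_seconds=15 * 60))
```

Repeating this for increasing user counts gives one point per run on the throughput and response-time graph.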
  • 25. What effect do we see?
  • 26. Another view of the same results: what is the effect?
    TIP: CREATE CACHE statements or warm-up queries can help after server startup.
  • 27. Multi-user query load testing
    • Things to watch out for:
      • Duplicate queries (in the VSTS sample, 18,000 queries yield about 2,000 unique ones)
        • ASQueryGenerator, and documents with examples of how to create template queries that reduce duplicate and empty queries, are due for an update soon.
      • Empty results (think of cube sparsity): real users rarely look at empty regions (e.g. Canada swimsuit sales in December)
      • Caching (applies in the real world, but can skew results for better or worse)
      • Think time
      • Anything else you can think of to look at …
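Slide 27 warns that generated workloads are full of duplicates and empty queries. The sketch below filters out both. Duplicates are detected by collapsing whitespace only, with no case folding. `returns_rows` is a hypothetical callback that you would implement by running each query once against the cube and checking whether any non-empty cells come back.

```python
import re
from typing import Callable, Iterable, List


def prepare_workload(queries: Iterable[str],
                     returns_rows: Callable[[str], bool]) -> List[str]:
    """Drop duplicate and empty-result MDX queries from a generated workload.

    Two queries count as duplicates when they are identical after collapsing
    runs of whitespace. returns_rows is a caller-supplied (hypothetical)
    check that the query returns non-empty results.
    """
    seen = set()
    kept = []
    for query in queries:
        key = re.sub(r"\s+", " ", query).strip()
        if key in seen:
            continue  # duplicate of a query already considered
        seen.add(key)
        if returns_rows(query):
            kept.append(query)
    return kept
```

The empty-result check runs only once per unique query, which keeps the cost of cleaning a large generated set manageable.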
  • 28. Multi-user scalability
    • Load considerations:
      • Processing partitions while under load
      • Writeback
      • Proactive caching (real time)
      • Force-commit timeout
      • Ramp-up effects (connection, query warm-up time)
      • End effects (include measurements taken while the last users finish?)
      • No one-touch tool to examine these effects
      • WAN (wide area network) simulation
  • 29. Scalability options
    • Performance less than expected?
    • IO bound?
      • Look at the disk system, including controllers and the number of disks per controller.
      • Direct-attach vs. SAN choices.
      • Use SQLIO to measure the hardware.
      • SSAS partitions, aggregations, 64-bit vs. 32-bit
    • CPU bound?
      • Scale up (bigger machine) or scale out (more machines, using replication)
      • SSAS calculations
    • Look at the queries.
    • Think about reasonability. It is not a black box: break the problem down into components, try removing components/factors, and isolate.
    • See the Diagnosing Query Performance paper.
  • 30. Scale Up for Queries
    • Scale up = more CPUs
    • Note that 4-socket machines are relatively cheaper than machines with more than 4 sockets; cost is not a linear factor.
    • Benefits scenarios that are CPU bound and parallel
    • Some customers have been led to believe (wrongly) that a bigger machine will improve every query, even a single formula-engine calculation running by itself.
    • Presently the only way to improve parallel processing
  • 31. Scale Out for Queries
    • With clustering you ensure high availability; with scale out you optimize query performance.
    • You can set up multiple query-only AS databases on multiple AS servers to handle a larger number of concurrent users.
    • NLB or another TCP/IP load-balancing method is typically used.
    • You can find more information in the "Scale Out Querying with Analysis Services" whitepaper.
  • 32. Scalability
    • Looking back at multi-user load testing:
      • It is part of a loop: understand what is happening, go back to the goals, investigate where the time is going, and consider whether it makes sense.
    • Solutions:
      • Avoid time-expensive operations
      • Buy more hardware (scale up/scale out)
      • Limit project goals (e.g. standard reports show less information), and revisit alignment with business goals
      • Rewrite MDX queries, or change the design strategy (e.g. one cube for semi-static analysis, another for 15-minute updates)
    • Load testing and analysis can take 50% or more of project time. A quick POC approach might uncover some issues early.
  • 33. Agenda
    • Design
      • Dimensions
      • Cubes
      • Aggregations
    • Build
      • Scalability – Processing
      • Scalability – Queries
      • Scalability options for multi-user queries
      • Best Practices
    • Maintain
    • Monitor
  • 34. What can you do to improve processing performance?
    • These best practice recommendations are based on lessons learned from working with many enterprise AS customers.
    • The suggestions below can be found in the Analysis Services Processing Best Practices whitepaper on the SQL Server Best Practices site on TechNet.
  • 35. Dimension Processing Best Practices
    • Add indexes to the underlying dimension tables to help improve the "select distinct" queries generated by AS.
    • Create a separate table or view for dimension processing so you can optimize it specifically for AS dimension processing.
    • Set appropriate values for parallel processing. In general this is 1-2 times the number of CPUs; testing can help find the optimal value.
    • Use XMLA <Parallel> nodes to group processing tasks.
    • Use the Transaction attribute of the XMLA <Batch> command to control whether grouped objects are committed in one transaction or in separate transactions.
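Slide 35's parallelism advice maps onto the XMLA `<Batch>`, `<Parallel MaxParallel="n">` and `<Process>` elements. The Python below is a sketch that generates such a batch for a list of partitions. The object IDs are placeholders. The server CPU count is passed in explicitly because the machine running the script is usually not the Analysis Services server. The default factor of 1.5 sits inside the slide's 1-2× range, but testing should decide the final value.

```python
import xml.etree.ElementTree as ET

ENGINE_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"
ET.register_namespace("", ENGINE_NS)  # serialize as the default xmlns


def _el(parent, tag, text=None):
    """Append a namespaced child element, optionally with text."""
    child = ET.SubElement(parent, f"{{{ENGINE_NS}}}{tag}")
    if text is not None:
        child.text = text
    return child


def build_parallel_batch(database_id, cube_id, measure_group_id,
                         partition_ids, server_cpus,
                         process_type="ProcessData", cpu_factor=1.5,
                         transactional=True):
    """XMLA Batch that processes partitions inside one <Parallel> node.

    MaxParallel = round(server_cpus * cpu_factor), at least 1.
    All object IDs are placeholders for your own database objects.
    """
    max_parallel = max(1, round(server_cpus * cpu_factor))
    batch = ET.Element(f"{{{ENGINE_NS}}}Batch",
                       Transaction="true" if transactional else "false")
    parallel = _el(batch, "Parallel")
    parallel.set("MaxParallel", str(max_parallel))
    for partition_id in partition_ids:
        process = _el(parallel, "Process")
        obj = _el(process, "Object")
        _el(obj, "DatabaseID", database_id)
        _el(obj, "CubeID", cube_id)
        _el(obj, "MeasureGroupID", measure_group_id)
        _el(obj, "PartitionID", partition_id)
        _el(process, "Type", process_type)
    return ET.tostring(batch, encoding="unicode")
```

Calling it once with `process_type="ProcessData"` and again with `"ProcessIndexes"` gives the two-step sequence from slide 12.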
  • 36. Agenda
    • Design
      • Dimensions
      • Cubes
      • Aggregations
    • Build
      • Scalability – Processing
      • Scalability – Queries
      • Scalability options for multi-user queries
      • Best Practices
    • Maintain
    • Monitor
  • 37. Maintenance Issues: Backup and Restore Strategies
    • Backup and restore functionality has markedly improved in SQL Server 2005.
      • Note: SQL Server 2008 will introduce a newer version of this feature to help improve scaling for huge cubes.
    • Back up the SQL database that holds your OlapQueryLog table.
    • You can use the Database Synchronization feature to synchronize your database from a primary to a secondary server.
    • As with AS2000, you can also copy the full data folder as your backup.
      • Note that some of the information is encrypted, so if you are restoring to a different server you will need to manually change connection strings and passwords.
      • You can find more information about this approach in the best practices whitepaper "Scale Out Querying with Analysis Services".
  • 38. Maintenance Issues: Planning for a Possible Cube Rebuild
    • How do you plan for a possible full cube rebuild if you suffer a catastrophic loss of your Analysis Services database?
    • For starters, partition your data (e.g. by time) if possible, so that you can restore the available time periods while you rebuild any missing portions.
    • Assuming you have a good backup solution, you can restore most of the data except for the current day, which can then be rebuilt from the existing data source.
  • 39. Agenda
    • Design
      • Dimensions
      • Cubes
      • Aggregations
    • Build
      • Scalability – Processing
      • Scalability – Queries
      • Scalability options for multi-user queries
      • Best Practices
    • Maintain
    • Monitor
  • 40. Performance Monitor
    • SSAS 2005 has more perfmon counters.
    • The counter "MSAS 2005:Processing\Rows read/sec" is helpful for troubleshooting or optimizing parallel processing. It reports the number of rows per second AS is reading from the relational data source.
    • Processing begins by sending a SQL query to get the data to populate each partition.
  • 41. How to monitor AS performance
    • Use SQL Server Profiler to capture key trace events of long-running queries (user or processing).
    • Use the Windows Event Log, as many AS events are recorded there.
    • The AS Query Log stores internal query information meant to help in defining aggregations later.
  • 42. Demonstration #2: Profiling the Executed SQL Statements
  • 43. Other AS Monitoring Tools
    • Use the ascmd.exe sample application included in the SQL Server SP2 Samples. You can use its -T option to output a trace file when running this utility.
    • Create a system-wide trace file to record the events (refer to the attached XMLA file, ProcessPartitionAndRunTrace.xmla).
    • Note: the AS Flight Recorder records the last set of events that occurred, in case of a catastrophic event on the server. In the AS properties, check "Show Advanced (All) Properties" to see the Flight Recorder properties.
  • 44. SQL CAT Presentations at PASS 2007

    Session Code | Session Title | Speakers | Date | Time
    DBA-410-M | Designing for Petabyte using Lessons Learned from Customer Experiences | Lubor Kollar; Lasse Nedergaard | 9/19/2007 | 9:45 AM - 11:00 AM
    DBA-411-M | Building High Performance SQL system using Lessons Learned from customer deployments | Michael Thomassy; Burzin Patel | 9/19/2007 | 1:30 PM - 2:45 PM
    DBA-412-M | ISV configuration & implementation using Lessons Learned from customer deployments | Juergen Thomas | 9/20/2007 | 10:30 AM - 11:45 AM
    DBA-413-M | Building Highly Available SQL Server implementations using Lessons Learned from customer deployments | Prem Mehra; Lindsey Allen; Sanjay Mishra | 9/20/2007 | 1:30 PM - 2:45 PM
    DBA-416-M | Building and Deploying Large Scale SSRS farms using Lessons Learned from customer deployments | Denny Lee; Lukasz Pawlowski | 9/21/2007 | 9:45 AM - 11:00 AM
    DBA-415-M | Building & Maintaining large cubes using Lessons Learned from customer deployments | Nicholas Dritsas; Eric Jacobsen | 9/21/2007 | 1:00 PM - 2:15 PM
  • 45. Thank you! Thank you for attending this session and the 2007 PASS Community Summit in Denver.

Editor's Notes

  1. Slowly changing dimensions (Type 2): Do your best to design your dimensions so that you never need to update any of the data. This ensures that you do not need to reprocess your cubes when you process the dimensions, and it lets you use the ProcessAdd processing enumeration. If you do need updates, you can run ProcessAdd every evening and perform full processing, including updates, at the end of every week or month. Large dimensions: In general, most dimensions contain natural hierarchies; make sure you are using them. Dimension SQL queries are in the form of "select distinct colA, colB, … from [DimensionTable]". With more hierarchies in each dimension, instead of one large select distinct statement there are multiple smaller select distinct statements, one per hierarchy. ProcessAdd is the solution for this scenario; see TK Anand's article, http://msdn2.microsoft.com/en-us/library/ms345142.aspx. Basically, point the dimension at just the new rows and run ProcessAdd, then run ProcessFull on the latest partition. This assumes only the latest partition is affected and that older partitions do not reference the new members.
  2. #4) don’t use the default
  3. select *
     from (
         select count(*) QueryCount, avg(duration) AvgDuration, max(duration) MaxDuration,
                msolap_database, msolap_objectpath, dataset
         from OlapQueryLog
         group by msolap_database, msolap_objectpath, dataset
     ) T
     where QueryCount >= 1                      -- Change if desired
       and AvgDuration >= 1                     -- Change if desired
       and MaxDuration > 100                    -- Change if desired, units are milliseconds
       and msolap_objectpath = 'MyObjectPath'   -- Change
     order by msolap_objectpath, MaxDuration desc
  4. Mention that build 3166 fixes the version map and MSXML6 issues, so the number of partitions can now be larger. Mention KableNews, which had hundreds of measures: the rows/sec rate was slow, but the alternative design choice would have been many more rows (e.g. 4th normal form).
  5. Longest pole: think of a Gantt chart.
  6. The point is that throughput should rise until it plateaus, and response time should increase gradually until resources saturate; at that point queuing becomes a contributing factor.
  7. The perf team wanted to run for a longer time to see what effect it had. If a run is too short, it may show high-frequency noise rather than steady state; if it is too long, it takes too much time. One of the perf team members said, “Guess what, I ran longer and it got better.” However, it was too good to be true: later queries benefited from earlier queries filling the cache. That happens in the real world too, but it left us not knowing how long to run, and with longer runs giving better times. We discovered the problem and later fixed it by changing the queries we sent so that they did not share cache.
  8. Give the example: of 18,000 generated queries, 2,142 remained after removing duplicates and empty queries. The target was 1,500 users; we ended up supporting 50-70 users with an average think time of 15 seconds.
  9. SQL Server Best Practices site: http://technet.microsoft.com/en-us/sqlserver/bb331794.aspx ; Analysis Services Processing Best Practices whitepaper: http://technet.microsoft.com/en-us/sqlserver/bb331794.aspx
  10. Add indexes to the underlying dimension tables to help improve the "select distinct" queries generated by AS. You can use the SQL Index Tuning Wizard; this is probably only needed for the largest dimensions. Create a separate table or view for dimension processing so you can optimize it specifically for AS dimension processing. Use the ProcessAdd enumeration if you are only adding new dimension members. It is only available in XMLA and is an optimized version of ProcessUpdate for when you are only adding new members. (Consider it as part of a strategy for slowly changing dimensions.) Upgrade to AS SP2 or later.
  11. Database synchronization can be a great alternative to backup/restore, assuming that obtaining extra hardware is not an issue. This way you have an active, running backup of the system in case your primary system is lost. Back up the SQL database that holds your OlapQueryLog table. It contains valuable data for optimizing your AS queries; you may want to run a SQL Agent job to delete rows with a duration under 100 ms.
  12. Demo script:
    1) Open SQL Server Management Studio and connect to the Analysis Services server.
    2) Connect to the "Adventure Works DW" OLAP database.
    3) Expand Adventure Works DW > Cubes > Adventure Works > Measure Groups > Internet Sales > Partitions > Internet Sales 2004.
    4) Open SQL Server Profiler and run a trace on the SQL Server.
    5) Process the Internet Sales 2004 partition.
    6) Stop the trace in Profiler and pick one of the SQL queries executed.
    Purpose: Show the SQL query AS sends to SQL Server.
    • Notice how the query is constructed, which supports the recommendation to use INT keys.
    • Notice that the base nested query is the query-binding query, which supports using query binding.