Biug 20112026 dimensional modeling and mdx best practices

BIUG

Itay Braun
CTO
itay@twingo.co.il

Agenda

• Dimension Design
• SSAS Best Practices
• MDX

• Inspired by Vincent Rainardi (http://sqlbits.com/) and Mosha Pasumansky

• BI, DB
• MicroStrategy, DWH
• SQL Server, Sybase
• OEM
• Web, Silverlight, .NET

White Papers
• Analysis Services 2008 R2 Performance Guide
• Analysis Services 2008 Operation Guide
• Performance Improvements for MDX in SQL Server 2008 Analysis Services
• OLAP Design Best Practices

1 or 2 dimensions
a) One Dimension
[Diagram: Fact Table -> Dim Account (includes customer attributes)]
• Simplicity, 1 dim
• Hierarchy from customer attribute & account attribute
• Use when we don't have fact tables requiring customer grain.

b) Two Dimensions
[Diagram: Fact Table -> Dim Account; Fact Table -> Dim Customer]
• We can get the customer attributes without knowing the account key
• Disadvantage: can't go from account to customer without going through the fact table (performance)

1 or 2 dimensions
c) Snowflake
[Diagram: Fact Table -> Dim Account -> Dim Customer]
• Dim customer is needed by another fact table
• Modular: 2 separate dim tables, but we can combine them easily to create a bigger dimension
• Getting the breakdown of a measure by a customer attribute is a bit more complicated than in a):
select c.attribute, sum(f.measure1) from fact1 f
inner join dim_account a on f.account_key = a.account_key
inner join dim_customer c on a.customer_key = c.customer_key
group by c.attribute
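
The snowflake query above can be tried end to end with a small sketch. It assumes an in-memory SQLite database and made-up rows; the table and column names follow the slide, while the sample values ('Retail', 'Corporate', the keys and amounts) are hypothetical.

```python
import sqlite3

# In-memory database with a tiny snowflake: fact1 -> dim_account -> dim_customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, attribute TEXT);
CREATE TABLE dim_account (account_key INTEGER PRIMARY KEY, customer_key INTEGER);
CREATE TABLE fact1 (account_key INTEGER, measure1 REAL);
INSERT INTO dim_customer VALUES (1, 'Retail'), (2, 'Corporate');
INSERT INTO dim_account VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO fact1 VALUES (10, 100.0), (11, 50.0), (12, 70.0), (10, 25.0);
""")

# The fact table only knows the account key, so reaching a customer
# attribute takes two joins: fact -> account -> customer.
snowflake_rows = conn.execute("""
SELECT c.attribute, SUM(f.measure1)
FROM fact1 f
INNER JOIN dim_account a ON f.account_key = a.account_key
INNER JOIN dim_customer c ON a.customer_key = c.customer_key
GROUP BY c.attribute
ORDER BY c.attribute
""").fetchall()

print(snowflake_rows)
```

With option a) the same breakdown would need only one join, which is the trade-off the bullet describes.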

When to Snowflake
1. When the sub dim is used by several dims
City-Country-Region columns exist in DimBroker, DimPolicy, DimOffice and DimInsured.
Replaced by Location/GeoKey pointing to DimLocation / DimGeography.
Advantage: consistent hierarchy, i.e. relationship between City, Country & Region.
Weakness: we would lose flexibility. City to Country is more or less fixed, but the grouping of countries might be different between dimensions.

When to Snowflake
2. When the sub dim is used by both the main dim and the fact table(s)
• DimCustomer is used in DimAccount, and is also used in the fact table.
• DimManufacturer is used in DimProduct, and is also used in the fact table.
• DimProductGroup is used in DimProduct, and is also used in some fact table.

The alternative is maintaining two full dimensions (classic star).

When to Snowflake
3. To make “base dim” and “detail dim”
Insurance classes, account types (banking), product lines, diagnosis, treatment (health care)
Policies for marine, aviation & property classes have different attributes.
Pull common attributes into 1 dim: DimBasePolicy
Put class-specific attributes into DimMarine, DimProperty, DimAviation
Ref: Kimball DW Toolkit 2nd edition page 213

A dimension with only 1 attribute

Should we put the attribute in the fact table? (like DD = Degenerate Dim)
Probably, if the grain = fact table, and it's short or it's a number.
Reasons for putting a single attribute in its own dim:
– Keep fact table slim (4 bytes int not 100 bytes varchar)
– When the value changes, we don't have to update the BIG fact table – ETL performance
– Grain is much lower than fact table – small dim
– Yes it's only 1 attribute today, but in the future there could be another attribute.

Fact Table Primary Key
Should we have a PK? Some experts totally disagree

Yes, if we need to be able to identify each fact row
1. Need to refer to a fact row from another fact row e.g. chain of events
2. Many identical fact rows and we need to update/delete only one
3. To link the fact table to another fact table

[Three diagrams. Related Trans: PK and FK, linking to the previous/next transaction. Header - Detail: PK and FK (no RI). Uniqueness: PK. The FKs are not enforced.]

Single or Multi Column?
Single Column: Generated Identity
Multi Column: Dimension Keys
A single-column PK is better than a multi-column PK because:
1) A multi-column PK may not be unique. A single-column PK guarantees that the PK is unique, because it is an identity column.
2) A single-column PK is slimmer than a multi-column PK, giving better query performance. To do a self join in the fact table (e.g. to link the current fact row to the previous fact row), we join on a single integer column.

• Advantage: prevents duplicate rows, query performance
• Disadvantage: loading performance
• Indexing the PK: cluster or not?
– Cluster the PK if: the PK is an identity column
– Don't cluster the PK if: the PK is a composite, or when you need the clustered index for query performance (with partitioning)
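
As a rough illustration of point 2), the sketch below links each fact row to its previous row through a single integer key. It assumes SQLite, and the table `fact_event` and its columns are hypothetical names, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_event (
    fact_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- single-column generated identity
    prev_fact_key INTEGER,                        -- points at the previous row; no RI enforced
    account_key INTEGER,
    amount REAL
);
INSERT INTO fact_event (prev_fact_key, account_key, amount) VALUES
    (NULL, 10, 100.0),
    (1,    10, 120.0),
    (2,    10,  90.0);
""")

# Self join on one integer column: current row to previous row.
chain = conn.execute("""
SELECT cur.fact_key, cur.amount, prev.amount AS prev_amount
FROM fact_event cur
LEFT JOIN fact_event prev ON cur.prev_fact_key = prev.fact_key
ORDER BY cur.fact_key
""").fetchall()

print(chain)
```

With a composite PK made of dimension keys, the same join would need one predicate per key column and wider columns to carry the link.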

Example of not having a PK
If duplicate fact rows are allowed.
e.g. retail DW: Store Key, Date Key, Product Key, Customer Key
The same customer buys the same milk in the same shop twice on the same day.

Aggregate Fact Tables
What are they?
• High level aggregation of base fact tables
• A “select group by” query on a 2 billion row fact table can take 30 mins if it joins with two big fact tables, even with indexes in place
• So we do this query in advance as part of the DW load and store it as an Aggregate Fact Table
• The report only takes 1 second to run.

[Diagram: Base Fact Tables -> (30 mins) -> Aggregate Fact Table -> (1 sec) -> Report]
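
A minimal sketch of the idea, assuming SQLite and hypothetical names (`fact_sales`, `agg_sales_by_date_product`): the expensive GROUP BY runs once during the load, and the report reads the small pre-computed table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER, store_key INTEGER, amount REAL);
INSERT INTO fact_sales VALUES
    (20110301, 1, 1, 10.0),
    (20110301, 1, 2, 15.0),
    (20110301, 2, 1,  7.0),
    (20110302, 1, 1, 12.0);

-- Run once as part of the DW load: store the aggregated result.
CREATE TABLE agg_sales_by_date_product AS
SELECT date_key, product_key, SUM(amount) AS amount, COUNT(*) AS row_count
FROM fact_sales
GROUP BY date_key, product_key;
""")

# The report queries the aggregate table instead of the base fact table.
report = conn.execute(
    "SELECT date_key, product_key, amount FROM agg_sales_by_date_product "
    "ORDER BY date_key, product_key"
).fetchall()

print(report)
```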

Rapidly Changing Dimension
• Why is it a problem
– Large SCD2 dim: attributes change every day
– Slow queries when joined with large fact tables
• What to do
– Put into a separate dim, link direct to fact table.
– Just store the latest, type 1 attributes (or dual)
– Store in the fact table (for small attribute, e.g. indicator)


Very Large Dimension
Why is it a problem
– SSAS: 4 GB string store limit for dimension
– SSAS: dim processing is a “select distinct” on each attribute
– long processing time
– Difficult to browse high cardinality attribute
– Join with fact tables – performance

Very Large Dimension
What to do
– Split into 2 dims, same grain. Always cut vertically.
– Remove SCD2, or at least only certain columns.
– Most common: separate the attributes with high cardinality/change frequency


Real Time Fact Table
• Reporting the transaction system in real time
• View to union with the normal fact table, or use partitions
• Freezing the dims for key lookup, -3 unknown key
• Key corrections next day

[Diagram: frozen dims (as of yesterday) provide the key lookup for two partitions: Main partition (up to last night) and Real time partition (intraday today)]
Unknown keys:
-1 null in source
-2 not in dim table
-3 not in dim table as dim was frozen, to be resolved next batch
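
The unknown-key convention above could look roughly like this in an ETL key lookup. This is a sketch under assumptions: the function names, the dictionary-based dimension and the `dim_frozen` flag are hypothetical, and only the -1/-2/-3 meanings come from the slide.

```python
from typing import Dict, List, Optional, Tuple

NULL_IN_SOURCE = -1      # natural key is null in the source row
NOT_IN_DIM = -2          # natural key not found in the dim table (normal batch)
NOT_IN_FROZEN_DIM = -3   # not found because the dim was frozen (intraday load)


def lookup_key(natural_key: Optional[str], dim: Dict[str, int], dim_frozen: bool) -> int:
    """Return the surrogate key, or one of the unknown keys."""
    if natural_key is None:
        return NULL_IN_SOURCE
    if natural_key in dim:
        return dim[natural_key]
    return NOT_IN_FROZEN_DIM if dim_frozen else NOT_IN_DIM


def correct_keys(rows: List[Tuple[Optional[str], int]], dim: Dict[str, int]):
    """Next-batch correction: re-resolve rows that received -3 intraday."""
    fixed = []
    for natural_key, surrogate in rows:
        if surrogate == NOT_IN_FROZEN_DIM:
            surrogate = lookup_key(natural_key, dim, dim_frozen=False)
        fixed.append((natural_key, surrogate))
    return fixed


frozen_dim = {"C1": 1}                 # dim as of yesterday
refreshed_dim = {"C1": 1, "C2": 2}     # dim after the nightly batch
intraday = [(k, lookup_key(k, frozen_dim, dim_frozen=True)) for k in ("C1", "C2", None)]
print(intraday)
print(correct_keys(intraday, refreshed_dim))
```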

Dealing with Currency Rates
What for/background/requirements
– Report in 3 reporting currencies, using today's rates or past rates
– Analyse over time without the impact of currency rates (using fixed currency rates, e.g. 2010 EOY rates)
– What-if: as if the transactions had happened today
– Currency rates historical analysis

Transaction (original currency): 100 countries, 40 currencies
  -> Transaction Currency Rates (many transaction dates)
DW: 1 currency, e.g. GBP
  -> Reporting Currency Rates (1 reporting date)
Reporting: 3-4 currencies, e.g. GBP, USD, EUR
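
The two-step conversion can be sketched as follows. All rates, dates and amounts are invented for illustration, and the function names are hypothetical; the only things taken from the slide are the flow (original currency -> one DW currency using the transaction-date rate -> reporting currencies using one reporting-date rate) and GBP as the DW currency.

```python
from decimal import Decimal

# Hypothetical rates into the DW currency (GBP), keyed by (currency, transaction date).
TRANSACTION_RATES = {
    ("USD", "2011-03-01"): Decimal("0.60"),
    ("EUR", "2011-03-01"): Decimal("0.85"),
    ("USD", "2011-03-02"): Decimal("0.62"),
}

# Hypothetical rates from GBP to each reporting currency on ONE reporting date.
REPORTING_RATES = {"GBP": Decimal("1"), "USD": Decimal("1.60"), "EUR": Decimal("1.20")}


def to_dw_currency(amount, currency, tx_date):
    """Step 1: convert the original amount using the rate of the transaction date."""
    if currency == "GBP":
        return amount
    return amount * TRANSACTION_RATES[(currency, tx_date)]


def to_reporting(amount_gbp):
    """Step 2: convert the DW amount into every reporting currency."""
    return {cur: amount_gbp * rate for cur, rate in REPORTING_RATES.items()}


transactions = [
    (Decimal("100"), "USD", "2011-03-01"),
    (Decimal("200"), "EUR", "2011-03-01"),
    (Decimal("50"), "USD", "2011-03-02"),
]
total_gbp = sum(to_dw_currency(a, c, d) for a, c, d in transactions)
report = to_reporting(total_gbp)
print(total_gbp, report)
```

Swapping `REPORTING_RATES` for a fixed set (for example end-of-year rates) would give the analysis without the impact of rate movements.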

Dealing with Currency Rates
• A good example can be found here.

Dealing with Status
What/background
– Workflow (policies, contracts, documents)
– Bottleneck analysis (no. of days between stages)
– How many on each stage

[Diagram: Status 1 (date1) -> Status 2 (date2) -> Status 4 (date3) -> Status 6 (date4), with Status 3 and Status 5 drawn below the main path]

Dealing with Status
Approaches
– Accumulating Snapshot Fact, 1 row per application
– SCD2 on DimApp
– App Status fact table

Accumulating Snapshot Fact:
AppKey  Sts1Date  Sts1Ind  Sts2Date  Sts2Ind  Sts3Date  Sts3Ind
1       1/3/11    1        3/3/11    1        7/3/11    1
2       6/3/11    1        7/3/11    1                  0

SCD2 on DimApp:
AppKey  AppID  StsKey  StsDate  Current
1       1      1       1/3/11   N
2       1      2       3/3/11   N
3       1      3       7/3/11   Y
4       2      1       6/3/11   N
5       2      2       7/3/11   Y

App Status fact table:
AppKey  StsKey  StsDateKey
1       1       61
1       2       63
1       3       67
2       1       66
2       2       67
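
To show how the status rows answer the two questions (days between stages, how many on each stage), here is a small sketch. It uses the rows from the example tables and assumes the dates are day/month/year, which matches the gaps between the StsDateKey values; the function names are hypothetical.

```python
from datetime import date

# (AppID, StsKey, StsDate) rows from the example tables, read as d/m/yy.
status_rows = [
    (1, 1, date(2011, 3, 1)),
    (1, 2, date(2011, 3, 3)),
    (1, 3, date(2011, 3, 7)),
    (2, 1, date(2011, 3, 6)),
    (2, 2, date(2011, 3, 7)),
]


def days_between_stages(rows):
    """Bottleneck analysis: days spent between consecutive statuses per application."""
    by_app = {}
    for app_id, sts_key, sts_date in rows:
        by_app.setdefault(app_id, []).append((sts_date, sts_key))
    result = {}
    for app_id, events in by_app.items():
        events.sort()
        for (d1, s1), (d2, s2) in zip(events, events[1:]):
            result[(app_id, s1, s2)] = (d2 - d1).days
    return result


def count_per_stage(rows):
    """How many applications currently sit on each status (latest row wins)."""
    latest = {}
    for app_id, sts_key, sts_date in rows:
        if app_id not in latest or sts_date > latest[app_id][0]:
            latest[app_id] = (sts_date, sts_key)
    counts = {}
    for _, sts_key in latest.values():
        counts[sts_key] = counts.get(sts_key, 0) + 1
    return counts


print(days_between_stages(status_rows))
print(count_per_stage(status_rows))
```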

Referenced Dimensions
• Enables using one “master” member
• Not a Snowflake dimension
– For example:
• Dim customers: UK, London, Roman Avramovich.
• Dim Stores: UK, London, Friendly Bikes Store
– What is the total revenue from Internet customers and stores in London?

MDX optimization Methodology
• Re-write the MDX code
• Add Aggregations
• Add pre-calculated Measure Groups (ETL)
• Solve the problem using the Relational Engine
• Use .NET Stored Procedures.
– The problem can rarely be solved using better hardware.
• Column-based Databases

• Optimizing MDX
– Baselining Query Speeds
• Clearing the Analysis Services Caches
• Clearing the Operating System Caches using fsutil.exe or SSAS Stored Proc (codeplex)
• Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services
• Configuring the Analysis Services Query Log

• Cell-by-Cell Mode vs. Subspace Mode
Performance in subspace (or block computation) mode is almost always superior to performance in cell-by-cell (or naïve) mode.

Using Profiler
• So far so good

Subcube
• Granularity
• Slice

Granularity
• Single grain
– List of GROUP BY attributes in SQL SELECT
• Mixed grain
– Both Attribute.[All] and Attribute.MEMBERS

Granularity
[Diagram: grid of granularities. Columns: (All Country, All City), (Countries, All City), (Countries, Cities). Rows: All Products, Products.]

Slice
• Single member
– SQL: Where City = 'Redmond'
– MDX: [City].[Redmond]
• Multiple members
– SQL: Where City IN ('Redmond', 'Seattle')
– MDX: { [City].[Redmond], [City].[Seattle] }

Slice at granularity
SQL
SELECT Sum(Sales), City FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
GROUP BY City
MDX
SELECT Measures.Sales ON 0
, NON EMPTY {Redmond, Seattle} ON 1
FROM Sales_Cube

Slice below granularity

SQL
SELECT Sum(Sales) FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
MDX
SELECT Measures.Sales ON 0
FROM Sales_Cube
WHERE {Redmond, Seattle}
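
The difference between the two slices can be checked on the SQL side with a small sketch. It assumes SQLite and invented sales figures; `Sales_Table`, `City` and `Sales` follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales_Table (City TEXT, Sales REAL);
INSERT INTO Sales_Table VALUES
    ('Redmond', 10.0), ('Redmond', 5.0), ('Seattle', 20.0), ('London', 40.0);
""")

# Slice at granularity: City is both filtered and grouped, one row per city.
at_granularity = conn.execute("""
SELECT City, SUM(Sales) FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
GROUP BY City ORDER BY City
""").fetchall()

# Slice below granularity: City is filtered but not grouped, one total.
below_granularity = conn.execute("""
SELECT SUM(Sales) FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
""").fetchone()[0]

print(at_granularity, below_granularity)
```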

Examples

            All Years  2005  2006  2007  2008
All Cities
Redmond
Seattle
New York
London

Examples

            All Years  2005  2006  2007  2008
All Cities  .          .     .     .     .
Redmond     .          .     .     .     .
Seattle     .          x     x     x     x
New York    .          .     .     .     .
London      .          .     .     .     .
(Seattle, Year.Year.MEMBERS)
(x = cell in the subcube)

Examples

            All Years  2005  2006  2007  2008
All Cities  .          .     .     .     .
Redmond     .          .     .     .     .
Seattle     x          x     x     x     x
New York    .          .     .     .     .
London      .          .     .     .     .
(Seattle, Year.MEMBERS)
(x = cell in the subcube)

Examples

            All Years  2005  2006  2007  2008
All Cities  .          .     .     .     .
Redmond     x          x     x     x     x
Seattle     x          x     x     x     x
New York    .          .     .     .     .
London      x          x     x     x     x
({Redmond, Seattle, London}, Year.MEMBERS)
(x = cell in the subcube)

Examples

            All Years  2005  2006  2007  2008
All Cities  .          .     .     .     .
Redmond     .          x     x     x     .
Seattle     .          x     x     x     .
New York    .          .     .     .     .
London      .          .     .     .     .
({Redmond, Seattle}, {2005, 2006, 2007})
(x = cell in the subcube)

Arbitrary shaped subcubes
• What is it ?
• How can it happen ?
• Why is it so bad ?
• How to avoid them ?


            All Years  2005  2006  2007  2008
All Cities  .          .     .     .     .
Redmond     .          x     x     x     x
Seattle     .          x     .     .     .
New York    .          x     .     .     .
London      .          x     .     .     .
Union((Redmond, Year.Year.MEMBERS), (City.City.MEMBERS, 2005))
(x = cell in the subcube)


            All Years  2005  2006  2007  2008
All Cities  .          .     .     .     .
Redmond     .          x     x     x     x
Seattle     .          x     x     .     x
SF          .          x     x     x     x
Denver      .          x     x     x     x
CrossJoin(City.City.MEMBERS, Year.Year.MEMBERS) - (Seattle, 2007)
(x = cell in the subcube)


            All Years  2005  2006  2007  2008
All Cities  .          .     .     .     .
Redmond     .          x     .     .     .
Seattle     .          .     x     .     .
New York    .          .     .     x     .
London      .          .     .     .     x
{(Redmond, 2005), (Seattle, 2006), (New York, 2007), (London, 2008)}
(x = cell in the subcube)


            All Years  2005  2006  2007  2008
All Cities  x          x     x     x     x
Redmond     x          .     .     .     .
Seattle     x          .     .     .     .
New York    x          .     .     .     .
London      x          .     .     .     .
Union(([All Cities], Year.MEMBERS), (City.MEMBERS, [All Years]))
(x = cell in the subcube)
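
One way to think about these shapes: a regular subcube is a plain cross product of one member set per attribute, and an arbitrary shape is any cell set that is not. The sketch below tests that property on (city, year) cells. It is a conceptual illustration only and does not claim to reflect how Analysis Services detects arbitrary shapes internally; the function name is hypothetical.

```python
from itertools import product


def is_arbitrary_shape(cells):
    """True if the cell set is NOT a cross product of its per-attribute member sets."""
    cells = set(cells)
    cities = {city for city, _ in cells}
    years = {year for _, year in cells}
    return cells != set(product(cities, years))


# ({Redmond, Seattle}, {2005, 2006, 2007}): a regular subcube.
regular = set(product(["Redmond", "Seattle"], [2005, 2006, 2007]))

# Union((Redmond, all years), (all cities, 2005)): an L shape.
l_shape = {("Redmond", y) for y in (2005, 2006, 2007, 2008)} | {
    (c, 2005) for c in ("Redmond", "Seattle", "New York", "London")
}

# CrossJoin(cities, years) - (Seattle, 2007): a cross product with a hole.
with_hole = set(
    product(["Redmond", "Seattle", "SF", "Denver"], [2005, 2006, 2007, 2008])
) - {("Seattle", 2007)}

# One cell per city on the diagonal.
diagonal = {("Redmond", 2005), ("Seattle", 2006), ("New York", 2007), ("London", 2008)}

for name, shape in [("regular", regular), ("l_shape", l_shape),
                    ("with_hole", with_hole), ("diagonal", diagonal)]:
    print(name, is_arbitrary_shape(shape))
```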

Arbitrary shapes
• WHERE/Subselect/Aggregate
• Unnatural hierarchies
• Parent-Child (visual totals)
• “Non Leaves” subcube
• Conditional logic (IIF, IF, CASE, CoalesceEmpty etc)
• NonEmpty, Exists

WHERE/Subselect
• Severity = '1' OR Priority = '1'
• multiselect
– {USA, London}

Mixed grain slicer
[Diagram: hierarchy tree]
All
  USA: Seattle, New York
  UK: London, Bristol

[Diagram: grid. Columns: All Cities, Seattle, New York, London, Bristol. Rows: All Countries, USA, UK.]

Leaves vs. Non Leaves
[Diagram: grid of granularities. Columns: (All Country, All City), (Countries, All City), (Countries, Cities). Rows: All Products, Products. The Products row is labeled Leaves.]

Problems with arbitrary shapes
• Caching
• Partition slices
• Indexes
• SCOPEs
• Matching calculations
• Many more
(for every topic we discuss – just ask “What will happen with arbitrary shapes”, and I am in trouble)

SCOPE
SCOPE ( [Date].[Month of Year].[All Periods],
[Date].[Month Name].[All],
Except( [DateTool].[Aggregation].Members * [DateTool].[Comparison].Members,
{ ( [DateTool].[Aggregation].DefaultMember, [DateTool].[Comparison].DefaultMember ) } )
);
...;
END SCOPE;

Subcube decomposition
Except( [DateTool].[Aggregation].Members * [DateTool].[Comparison].Members,
{ ( [DateTool].[Aggregation].DefaultMember, [DateTool].[Comparison].DefaultMember ) } )
);
...;
END SCOPE;

[Diagram: the Except region split into three regular subcubes, labeled Scope 1, Scope 2 and Scope 3]

Subcube decomposition
SCOPE ( Except( [DateTool].[Aggregation].Members, [DateTool].[Aggregation].DefaultMember ),
Except( [DateTool].[Comparison].Members, [DateTool].[Comparison].DefaultMember )
);
...;
END SCOPE;
SCOPE ( [DateTool].[Aggregation].DefaultMember,
Except( [DateTool].[Comparison].Members, [DateTool].[Comparison].DefaultMember )
);
...;
END SCOPE;
SCOPE ( Except( [DateTool].[Aggregation].Members, [DateTool].[Aggregation].DefaultMember ),
[DateTool].[Comparison].DefaultMember
);
...;
END SCOPE;
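
A quick way to convince yourself that the three regular scopes cover exactly the same cells as the single Except is to model the members as plain sets. The member names below are hypothetical stand-ins; only the set algebra mirrors the slide.

```python
from itertools import product

aggregation = ["AggDefault", "YTD", "MTD"]            # hypothetical members
comparison = ["CmpDefault", "PrevYear", "PrevMonth"]  # hypothetical members
agg_default, cmp_default = aggregation[0], comparison[0]

# Original arbitrary-shaped scope: everything except the (default, default) tuple.
original = set(product(aggregation, comparison)) - {(agg_default, cmp_default)}

agg_rest = [a for a in aggregation if a != agg_default]
cmp_rest = [c for c in comparison if c != cmp_default]

# The three regular subcubes, each a plain cross product.
scopes = [
    set(product(agg_rest, cmp_rest)),       # non-default x non-default
    set(product([agg_default], cmp_rest)),  # default     x non-default
    set(product(agg_rest, [cmp_default])),  # non-default x default
]
union = set().union(*scopes)

print(union == original, sum(len(s) for s in scopes) == len(original))
```

The second check confirms that the pieces do not overlap, so the scoped assignment is not applied twice to any cell.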

MDX Optimization - Tips
• Partial expressions are not cached
Instead of:
this = iif(<expensive expression> >= 0, 1/<expensive expression>, null);

define the expensive expression as a hidden calculated member, so that its result can be cached and reused:
create member currentcube.measures.MyPartialExpression as <expensive expression>, visible=0;
this = iif(measures.MyPartialExpression >= 0, 1/measures.MyPartialExpression, null);
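
As a loose analogy outside MDX, the same idea is giving the sub-expression a name so that its result is stored once and reused. The sketch below uses Python's `functools.lru_cache`; the function names and the stand-in calculation are hypothetical, and the test uses `> 0` rather than `>= 0` only because Python raises on division by zero.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the expensive part really runs


@lru_cache(maxsize=None)
def expensive_expression(cell):
    """Stand-in for a costly calculation; cached per cell."""
    calls["count"] += 1
    return cell * 2 - 4


def this(cell):
    # The named, cached value is used in both the test and the division.
    value = expensive_expression(cell)
    return 1 / value if value > 0 else None


results = [this(c) for c in (3, 3, 1)]
print(results, calls["count"])
```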

SSAS Denali
• Coming in the first half of 2012
• SSAS Tabular Mode
– Cheaper
– Not best of breed
– Uses DAX or MDX
• Have you started working with it?

Mobile BI

• Smart Phone
• Mobile BI
• Gartner

Social BI

• Discover New Insights - Analyze the demographic and psychographic profiles of your Facebook application users.
• Analyze Facebook Data - Analyze the full spectrum of Facebook data: profiles, interests, check-ins, and more
• Instantly Available via Cloud

Social BI
• Deep Personalization

• Enterprise Data Integration

Survey
• SQL / SSAS Denali
• Mobile BI
• Social BI
