Microstrategy BI • DWH • BI SQL Server • Sybase • DB P T • DB OEM • Web Silverlight C • .NET
White Papers
• Analysis Services 2008 R2 Performance Guide
• Analysis Services 2008 Operation Guide
• Performance Improvements for MDX in SQL Server 2008 Analysis Services
• OLAP Design Best Practices

1 or 2 dimensions
a) One Dimension
[Diagram: Dim Account (holding the customer attributes) joined to the Fact Table]
• Simplicity, 1 dim
• Hierarchy from customer attribute & account attribute
• Use when we don't have fact tables requiring customer grain.
b) Two Dimensions
[Diagram: Dim Account and Dim Customer, each joined to the Fact Table]
• We can get the customer attributes without knowing the account key
• Disadvantage: can't go from account to customer without going through the fact table (performance)

1 or 2 dimensions
c) Snowflake
[Diagram: Fact Table joined to Dim Account, which is joined to Dim Customer]
• Dim customer is needed by another fact table
• Modular: 2 separate dim tables but we can combine them easily to create a bigger dimension
• To get the breakdown of a measure by a customer attribute is a bit more complicated than a):
    select c.attribute, sum(f.measure1)
    from fact1 f
    inner join dim_account a on f.account_key = a.account_key
    inner join dim_customer c on a.customer_key = c.customer_key
    group by c.attribute

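The snowflake query above can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient way to run the SQL; the table and column names follow the slide, while the sample rows and attribute values ('Retail', 'Corporate') are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, attribute TEXT);
    CREATE TABLE dim_account  (account_key  INTEGER PRIMARY KEY,
                               customer_key INTEGER REFERENCES dim_customer);
    CREATE TABLE fact1        (account_key  INTEGER REFERENCES dim_account,
                               measure1     REAL);
    INSERT INTO dim_customer VALUES (1, 'Retail'), (2, 'Corporate');
    INSERT INTO dim_account  VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO fact1        VALUES (10, 100.0), (11, 50.0), (12, 70.0), (12, 30.0);
""")

# The fact table only knows the account; the customer attribute is reached
# through dim_account, which is what makes this a snowflake.
snowflake_rows = conn.execute("""
    SELECT c.attribute, SUM(f.measure1)
    FROM fact1 f
    INNER JOIN dim_account  a ON f.account_key  = a.account_key
    INNER JOIN dim_customer c ON a.customer_key = c.customer_key
    GROUP BY c.attribute
    ORDER BY c.attribute
""").fetchall()

print(snowflake_rows)
```

The extra join through dim_account is the cost of the snowflake; in option a) the same breakdown would need only one join.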
When to Snowflake
1. When the sub dim is used by several dims
• City-Country-Region columns exist in DimBroker, DimPolicy, DimOffice and DimInsured
• Replaced by Location/GeoKey pointing to DimLocation / DimGeography
• Advantage: consistent hierarchy, i.e. relationship between City, Country & Region.
• Weakness: we would lose flexibility. City to Country are more or less fixed, but the grouping of countries might be different between dimensions.

When to Snowflake
2. When the sub dim is used by both the main dim and the fact table(s)
• DimCustomer is used in DimAccount, and is also used in the fact table.
• DimManufacturer is used in DimProduct, and is also used in the fact table.
• DimProductGroup is used in DimProduct, and is also used in some fact table.
The alternative is maintaining two full dimensions (star classic).

When to Snowflake
3. To make “base dim” and “detail dim”
• Insurance classes, account types (banking), product lines, diagnosis, treatment (health care)
• Policies for marine, aviation & property classes have different attributes.
• Pull common attributes into 1 dim: DimBasePolicy
• Put class-specific attributes into DimMarine, DimProperty, DimAviation
Ref: Kimball DW Toolkit 2nd edition page 213

A dimension with only 1 attribute
Should we put the attribute in the fact table (like a DD = Degenerate Dim)?
Probably, if the grain = fact table, and it's short or it's a number.
Reasons for putting a single attribute in its own dim:
– Keep the fact table slim (4-byte int, not 100-byte varchar)
– When the value changes, we don't have to update the BIG fact table (ETL performance)
– Grain is much lower than the fact table (small dim)
– Yes, it's only 1 attribute today, but in the future there could be another attribute.

Fact Table Primary Key
Should we have a PK? Some experts totally disagree.
Yes, if we need to be able to identify each fact row:
1. Need to refer to a fact row from another fact row, e.g. chain of events
2. Many identical fact rows and we need to update/delete only one
3. To link the fact table to another fact table
[Diagram: three uses of a fact table key: Related Trans (previous/next transaction), Header - Detail, and Uniqueness; labels: PK, FK, FK (no RI), PK (not enforced)]

Fact Table Primary Key
Single or Multi Column?
• Single Column: Generated Identity
• Multi Column: Dimension Keys
Single-column PK is better than multi-column PK because:
1) A multi-column PK may not be unique. A single-column PK guarantees that the PK is unique, because it is an identity column.
2) A single-column PK is slimmer than a multi-column PK, which gives better query performance. To do a self join in the fact table (e.g. to link the current fact row to the previous fact row), we join on a single integer column.

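The self join on a single integer key can be sketched as follows. This is an illustration only, run through Python's sqlite3 module: the table fact_transaction, its columns (including previous_fact_key, an FK with no RI enforced) and the sample rows are all hypothetical, and SQLite's AUTOINCREMENT stands in for an identity column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical fact table: fact_key plays the role of the generated identity PK,
    -- previous_fact_key points at the previous row in the chain (no RI enforced).
    CREATE TABLE fact_transaction (
        fact_key          INTEGER PRIMARY KEY AUTOINCREMENT,
        account_key       INTEGER NOT NULL,
        date_key          INTEGER NOT NULL,
        amount            REAL    NOT NULL,
        previous_fact_key INTEGER
    );
    INSERT INTO fact_transaction (account_key, date_key, amount, previous_fact_key)
    VALUES (10, 20110301, 100.0, NULL),
           (10, 20110303,  40.0, 1),
           (10, 20110307,  25.0, 2);
""")

# Link each fact row to the previous one by joining on one integer column.
chain = conn.execute("""
    SELECT cur.fact_key, cur.amount, prev.amount
    FROM fact_transaction cur
    LEFT JOIN fact_transaction prev
           ON cur.previous_fact_key = prev.fact_key
    ORDER BY cur.fact_key
""").fetchall()

print(chain)
```

With a composite PK made of dimension keys, the same join would need one predicate per key column, and the rows might still not be uniquely identified.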
Fact Table Primary Key
• Advantage: Prevent duplicate rows, query performance
• Disadvantage: loading performance
• Indexing the PK: cluster or not?
  – Cluster the PK if: the PK is an identity column
  – Don't cluster the PK if: the PK is a composite, or when you need the clustered index for query performance (with partitioning)
Example of not having a PK
• If duplicate fact rows are allowed, e.g. retail DW: Store Key, Date Key, Product Key, Customer Key
• Same customer buying the same milk in the same shop on the same day twice

Aggregate Fact Tables
What are they?
• High level aggregation of base fact tables
• A “select group by” query on a 2-billion-row fact table can take 30 mins if it joins with two big fact tables, even with indexes in place
• So we do this query in advance as part of the DW load and store it as an Aggregate Fact Table
• The report only takes 1 second to run.
[Diagram: Base Fact Tables -> (30 mins) -> Aggregate Fact Table -> (1 sec) -> Report]

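As a rough illustration of the idea, the sketch below pre-computes a "select group by" during the load and lets the report read the small result table. It runs on SQLite through Python's sqlite3 module; the table names fact_sales and agg_sales_by_product, the columns and the rows are hypothetical, and a real load would rebuild or incrementally refresh the aggregate on every batch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base fact table with a handful of rows.
    CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER,
                             store_key INTEGER, amount REAL);
    INSERT INTO fact_sales VALUES
        (20110301, 1, 1, 10.0),
        (20110301, 1, 2,  5.0),
        (20110302, 1, 1,  7.0),
        (20110302, 2, 1,  3.0);

    -- Load step: run the expensive GROUP BY once and store the result.
    CREATE TABLE agg_sales_by_product AS
        SELECT product_key, SUM(amount) AS amount, COUNT(*) AS row_count
        FROM fact_sales
        GROUP BY product_key;
""")

# Report step: read the small aggregate table instead of the base fact table.
report = conn.execute("""
    SELECT product_key, amount
    FROM agg_sales_by_product
    ORDER BY product_key
""").fetchall()

print(report)
```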
Rapidly Changing Dimension
• Why is it a problem
  – Large SCD2 dim
  – Attributes change every day
  – Slow query when joined with large fact tables
• What to do
  – Put into a separate dim, link direct to fact table.
  – Just store the latest, type 1 attributes (or dual)
  – Store in the fact table (for small attribute, e.g. indicator)
[Diagram: dimension tables labelled Type2, Type2, Type2, Type2 and Type1]

Very Large Dimension
Why is it a problem
– SSAS: 4 GB string store limit for dimension
– SSAS: dim is “select distinct” on each attribute, hence long processing time
– Difficult to browse high cardinality attribute
– Join with fact tables: performance

Very Large Dimension
What to do
– Split into 2 dims, same grain. Always cut vertically.
– Remove SCD2, or at least only certain columns.
– Most common: separate the attributes with high cardinality/change frequency
[Diagram: VLD]

Real Time Fact Table
• Reporting the transaction system in real time
• View to union with the normal fact table, or use partitions
• Freezing the dims for key lookup, -3 unknown key
• Key corrections next day
Unknown keys:
-1 null in source
-2 not in dim table
-3 not in dim table as dim was frozen, to be resolved next batch
[Diagram: dims frozen as of yesterday feed the key lookup; main partition (up to last night) and real time partition (intraday today)]

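The three unknown keys could be assigned by a lookup along these lines. This is a minimal sketch, not the deck's implementation: the function lookup_key, its parameters and the sample dimension are hypothetical, and only the meanings of -1, -2 and -3 are taken from the slide.

```python
from typing import Dict, Optional

# Unknown-key conventions from the slide.
UNKNOWN_NULL_IN_SOURCE = -1   # business key is null in the source
UNKNOWN_NOT_IN_DIM = -2       # business key not found in the dim table
UNKNOWN_DIM_FROZEN = -3       # not found because the dim was frozen; fix next batch


def lookup_key(business_key: Optional[str],
               dim: Dict[str, int],
               dim_frozen: bool) -> int:
    """Return the surrogate key for a business key, or an unknown key.

    `dim` maps business keys to surrogate keys. `dim_frozen` is True for the
    intraday (real time) load, where the dims are as of yesterday.
    """
    if business_key is None:
        return UNKNOWN_NULL_IN_SOURCE
    if business_key in dim:
        return dim[business_key]
    return UNKNOWN_DIM_FROZEN if dim_frozen else UNKNOWN_NOT_IN_DIM


# Hypothetical customer dim as of last night.
dim_customer = {"C001": 1, "C002": 2}

print(lookup_key("C001", dim_customer, dim_frozen=True))   # known customer
print(lookup_key("C999", dim_customer, dim_frozen=True))   # new today, dim frozen
print(lookup_key("C999", dim_customer, dim_frozen=False))  # missing in nightly batch
print(lookup_key(None, dim_customer, dim_frozen=True))     # null in source
```

Keeping -3 distinct from -2 lets the next batch find exactly the intraday rows whose keys need correcting once the dims have been refreshed.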
Dealing with Currency Rates
What for/background/requirements
– Report in 3 reporting currencies, using today's rates or past rates
– Analyse over time without the impact of currency rates (using fixed currency rates, e.g. 2010 EOY rates)
– Had the transactions happened today
– Currency rates historical analysis
[Diagram: Transaction (100 countries, 40 currencies) -> Transaction Currency Rates (many transaction dates) -> DW (1 currency, e.g. GBP) -> Reporting Currency Rates (1 reporting date) -> Reporting (3-4 currencies: GBP, USD, EUR, Original)]

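The two-step conversion in the diagram (transaction currency to the single DW currency at the transaction-date rate, then DW currency to each reporting currency at one reporting-date rate) can be sketched as below. All rates, dates and amounts are invented round numbers, and the function names are hypothetical.

```python
from datetime import date

# Hypothetical rates: value of 1 unit of the transaction currency in GBP
# (the DW currency), keyed by (currency, transaction date).
transaction_rates = {
    ("USD", date(2011, 3, 1)): 0.5,
    ("USD", date(2011, 3, 2)): 0.625,
    ("GBP", date(2011, 3, 1)): 1.0,
}

# Hypothetical rates: value of 1 GBP in each reporting currency,
# keyed by (reporting currency, reporting date).
reporting_rates = {
    ("GBP", date(2011, 3, 31)): 1.0,
    ("USD", date(2011, 3, 31)): 1.5,
    ("EUR", date(2011, 3, 31)): 1.25,
}


def to_dw_amount(amount, currency, transaction_date):
    """Step 1: transaction currency -> DW currency at the transaction-date rate."""
    return amount * transaction_rates[(currency, transaction_date)]


def to_reporting_amount(dw_amount, reporting_currency, reporting_date):
    """Step 2: DW currency -> reporting currency at the single reporting-date rate."""
    return dw_amount * reporting_rates[(reporting_currency, reporting_date)]


transactions = [
    (200.0, "USD", date(2011, 3, 1)),
    (160.0, "USD", date(2011, 3, 2)),
    (50.0, "GBP", date(2011, 3, 1)),
]

dw_total = sum(to_dw_amount(*t) for t in transactions)

reporting_date = date(2011, 3, 31)
report = {
    ccy: to_reporting_amount(dw_total, ccy, reporting_date)
    for ccy in ("GBP", "USD", "EUR")
}

print(dw_total, report)
```

Swapping the reporting-date rates for a fixed set (for example end-of-year rates) gives the "without the impact of currency rates" view, since step 2 then no longer varies over time.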
Dealing with Currency Rates
• A good example can be found here.

Dealing with Status
What/background
– Workflow (policies, contracts, documents)
– Bottleneck analysis (no. of days between stages)
– How many on each stage
[Diagram: workflow of stages Status 1 (date1) -> Status 2 (date2) -> Status 4 (date3) -> Status 6 (date4), with Status 3 and Status 5 shown off the main path]

Dealing with Status
Approaches
– Accumulative Snapshot Fact, 1 row per application

    AppKey  Sts1Date  Sts1Ind  Sts2Date  Sts2Ind  Sts3Date  Sts3Ind
    1       1/3/11    1        3/3/11    1        7/3/11    1
    2       6/3/11    1        7/3/11    1                  0

– SCD2 on DimApp

    AppKey  AppID  StsKey  StsDate  Current
    1       1      1       1/3/11   N
    2       1      2       3/3/11   N
    3       1      3       7/3/11   Y
    4       2      1       6/3/11   N
    5       2      2       7/3/11   Y

– App Status fact table

    AppKey  StsKey  StsDateKey
    1       1       61
    1       2       63
    1       3       67
    2       1       66
    2       2       67

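To show how the approaches relate, the sketch below pivots the App Status fact table into one row per application (the shape of the accumulative snapshot) and derives the days between stages for bottleneck analysis. It uses the rows from the slide and runs on SQLite via Python's sqlite3 module; the table name fact_app_status is hypothetical, and subtracting date keys assumes they are consecutive day numbers, which the slide's values suggest but do not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- App Status fact table, rows as shown on the slide.
    CREATE TABLE fact_app_status (AppKey INTEGER, StsKey INTEGER, StsDateKey INTEGER);
    INSERT INTO fact_app_status VALUES
        (1, 1, 61), (1, 2, 63), (1, 3, 67),
        (2, 1, 66), (2, 2, 67);
""")

# Pivot to one row per application; a stage not yet reached stays NULL.
snapshot = conn.execute("""
    SELECT AppKey,
           MAX(CASE WHEN StsKey = 1 THEN StsDateKey END) AS Sts1DateKey,
           MAX(CASE WHEN StsKey = 2 THEN StsDateKey END) AS Sts2DateKey,
           MAX(CASE WHEN StsKey = 3 THEN StsDateKey END) AS Sts3DateKey
    FROM fact_app_status
    GROUP BY AppKey
    ORDER BY AppKey
""").fetchall()


def days(start, end):
    """Days between two stages, or None if either stage has not happened."""
    if start is None or end is None:
        return None
    # Assumption: date keys are consecutive day numbers.
    return end - start


# (AppKey, days from stage 1 to 2, days from stage 2 to 3)
days_between = [(app, days(s1, s2), days(s2, s3)) for app, s1, s2, s3 in snapshot]

print(snapshot)
print(days_between)
```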
Referenced Dimensions
• Enables using one “master” member
• Not a Snowflake dimension
  – For example:
    • Dim Customers: UK, London, Roman Avramovich.
    • Dim Stores: UK, London, Friendly Bikes Store
  – What is the total revenue from Internet customers and stores in London?

MDX Optimization Methodology
• Re-write the MDX code
• Add Aggregations
• Add pre-calculated Measure Groups (ETL)
• Solve the problem using the Relational Engine
• Use .NET Stored Procedures.
  – The problem can rarely be solved using better hardware.
• Column-based Databases

Optimizing MDX
– Baselining Query Speeds
  • Clearing the Analysis Services Caches
  • Clearing the Operating System Caches using fsutil.exe or SSAS Stored Proc (codeplex)
  • Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services
  • Configuring the Analysis Services Query Log

Cell-by-Cell Mode vs. Subspace Mode
• Performance in subspace (or block computation) mode is almost always superior to performance in cell-by-cell (or naïve) mode.

Leaves vs. Non Leaves
[Diagram: leaf and non-leaf cells across the Country (All Countries, Countries, Country), City (All Cities, City) and Product (All Products, Product) levels]

Problems with arbitrary shapes
• Caching
• Partition slices
• Indexes
• SCOPEs
• Matching calculations
• Many more
(for every topic we discuss, just ask “What will happen with arbitrary shapes”, and I am in trouble)

SSAS Denali
• Coming in the first half of 2012
• SSAS Tabular Mode
  – Cheaper
  – Not best of breed
  – Uses DAX or MDX
• Have you started working with it?

Mobile BI
• BI on smart phones
• Source: Gartner

Social BI
• Discover New Insights: analyze the demographic and psychographic profiles of your Facebook application users.
• Analyze Facebook Data: analyze the full spectrum of Facebook data: profiles, interests, check-ins, and more.
• Instantly Available via Cloud

Social BI
• Deep Personalization
• Enterprise Data Integration

Survey
• SQL / SSAS Denali
• Mobile BI
• Social BI