Data warehouse dimension design

SOFTWARE/WEB/MOBILE/DATABASE ARCHITECT, ENGINEER, AND DEVELOPER
TORONTO, CANADA
HTTP://SAYED.JUSTETC.NET
HTTP://WWW.JUSTETC.NET
Sayed Ahmed
Logical Design of a Data Warehouse

OUR SERVICES
 Free Training and Educational Services
 Training and Education in Bangla:
 Bangla.SaLearningSchool.com
 Training and Education in English:
 www.SaLearningSchool.com
 English.SaLearningSchool.com
 http://sitestree.com
 Ask a question and get answers:
 Ask.JustEtc.net

DESIGNING DIMENSIONS
 Dimension Field/Column Types
 When designing dimension tables, define the
following types of columns/fields to
facilitate reporting and analysis
 Keys : Used to identify entities
 Name columns: Used for human names of entities
 Attributes: Used for pivoting in analyses
 Member properties: Used for labels in a report
 Lineage columns: Used for auditing, and never
exposed to end users

DESIGNING DIMENSIONS
 Design your dimensions with analysis in mind
 Keep reporting in mind as well
 For analysis, we use
 Pivot tables
 Pivot graphs
 For dimensions
 The fields used for pivoting are called
 Attributes
 Not all columns in a dimension are attributes
 In OLTP tables, all columns are attributes
 Attributes:
 The fields on which
 analysis is based
 The previous slide
 showed the different types of columns in a dimension table

DIMENSION ATTRIBUTES
 Attributes
 For pivoting
 discrete attributes with a small number of distinct
values are the most appropriate
 Attribute values should not be continuous
 Keys are not good candidates for pivoting and
analysis, so they make poor attributes
 To use a continuous column for pivoting
 Convert it to a small set of discrete values
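
A minimal Python sketch of turning a continuous column into a small set of discrete values. The band edges, the labels, and the function name income_band are assumptions made for illustration; real bands should come from the business.

```python
from bisect import bisect_right

# Hypothetical band edges and labels (assumptions, not from any real schema).
INCOME_EDGES = [40_000, 80_000, 120_000]
INCOME_LABELS = ["Low", "Medium", "High", "Very high"]

def income_band(yearly_income: float) -> str:
    """Map a continuous income to one of a few discrete labels."""
    # bisect_right counts how many edges are <= the value,
    # which is exactly the index of the matching label.
    return INCOME_LABELS[bisect_right(INCOME_EDGES, yearly_income)]

incomes = [25_000, 40_000, 95_000, 300_000]
bands = [income_band(value) for value in incomes]
```

The resulting label column has only four distinct values, which makes it usable as a pivoting attribute where the raw income is not.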

ON DIMENSION ATTRIBUTES
 SQL Server Analysis Services (SSAS) can
discretize continuous columns to produce
discrete attributes
 The automated process is not always a good fit
 You need to take the business perspective into account as well
 For example, when age is used for pivoting, a 1-year difference can be significant at
young ages
 but may not matter at age 60 (this also depends on the
business perspective)
 Age and income are not good candidates for automatic
discretization
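
The point about age can be illustrated with a small Python sketch. Equal-width binning stands in here for an automated method; it is not the algorithm SSAS uses. The age edges and labels are assumptions chosen to be narrow at young ages and wide later.

```python
from bisect import bisect_right

def equal_width_bin(value, low, high, bins):
    """Automatic-style discretization: split [low, high] into equal ranges."""
    width = (high - low) / bins
    # Clamp so that value == high falls into the last bin.
    return min(int((value - low) / width), bins - 1)

# Hypothetical business-defined edges: narrow when young, wide later.
AGE_EDGES = [18, 21, 25, 30, 40, 60]
AGE_LABELS = ["<18", "18-20", "21-24", "25-29", "30-39", "40-59", "60+"]

def business_age_group(age):
    """Business-driven discretization using the uneven edges above."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]

# Ages 17 and 19 share one equal-width bin but differ in the business bins.
same_auto_bin = equal_width_bin(17, 0, 100, 5) == equal_width_bin(19, 0, 100, 5)
different_business_bin = business_age_group(17) != business_age_group(19)
```

With five equal-width bins over 0 to 100, a minor and a young adult land in the same group, while the hand-picked edges keep them apart and still merge 60 and 75.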

NAMING COLUMNS, AND MEMBER PROPERTIES
 Name columns (another dimension column
type) identify the entity for human readers
 Not good for pivoting or as keys
 Such as address, city, or phone number
 Member Properties
 Columns used in reports as labels only, not for
pivoting, are called member properties.
 Name columns and member properties can
include translations

LINEAGE AND AUDITING
 Lineage and auditing columns
 Used for auditing data
 Never exposed to the users

AUDITING AND LINEAGE
 In a data warehouse, you may want some
auditing tables
 For every update, you should audit
 who made the update,
 when it was made,
 and how many rows were transferred
 to each dimension and
 fact table
 in your Data Warehouse

AUDITING AND LINEAGE
 You will need additional columns in
your dimension and fact tables to track
 when, by whom, and from where each row
was updated
 Your ETL process needs to be updated accordingly
 If you use SSIS for the ETL
 Modify the SSIS packages so that they record this
information
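
A minimal sketch of the idea in Python with the standard sqlite3 module, standing in for a real ETL tool such as SSIS. The table names etl_audit and dim_customer, their columns, and the load_customers function are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE etl_audit (
    audit_id   INTEGER PRIMARY KEY,
    table_name TEXT, loaded_by TEXT, loaded_at TEXT, row_count INTEGER
);
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_id   TEXT,     -- business key from the source
    city          TEXT,
    source_system TEXT,     -- lineage: where the row came from
    audit_id      INTEGER REFERENCES etl_audit(audit_id)  -- lineage: which load
);
""")

def load_customers(rows, loaded_by, source_system):
    """Insert rows and record who loaded them, when, and how many."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO etl_audit (table_name, loaded_by, loaded_at, row_count) "
        "VALUES (?, ?, ?, ?)",
        ("dim_customer", loaded_by, loaded_at, len(rows)),
    )
    audit_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO dim_customer (customer_id, city, source_system, audit_id) "
        "VALUES (?, ?, ?, ?)",
        [(cid, city, source_system, audit_id) for cid, city in rows],
    )
    conn.commit()
    return audit_id

audit_id = load_customers([("C1", "Toronto"), ("C2", "Ottawa")], "etl_service", "crm")
```

Each dimension row points back to the audit row of the load that wrote it, so the lineage columns can stay hidden from end users while still answering who, when, and from where.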

CUSTOMER DIMENSION TABLE (PARTIAL)
From the AdventureWorksDW 2012 database

POSSIBLE ATTRIBUTES FOR CUSTOMER DIMENSION
 Possible Attributes for Customer Dimension
 BirthDate (after calculating age and discretizing the age)
 MaritalStatus
 Gender
 YearlyIncome (after discretizing)
 TotalChildren
 NumberChildrenAtHome
 EnglishEducation (other education columns are for
translations)
 EnglishOccupation (other occupation columns are for
translations)
 HouseOwnerFlag
 NumberCarsOwned
 CommuteDistance

DATE DIMENSION IN ADVENTUREWORKSDW

DATE DIMENSION ATTRIBUTES
 FullDateAlternateKey (denotes a date in date format)
 EnglishMonthName
 CalendarQuarter
 CalendarSemester
 CalendarYear
 Drill Down attributes
 CalendarYear →CalendarSemester → CalendarQuarter
→ EnglishMonthName → FullDateAlternateKey.
 Leaf-level members usually appear in reports once you drill
all the way down an attribute hierarchy
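
A small Python sketch of drilling down the hierarchy above. The column names follow the date dimension attributes listed on this slide; the sample rows, the SalesAmount values, and the drill function are assumptions for illustration.

```python
from collections import defaultdict

HIERARCHY = ["CalendarYear", "CalendarSemester", "CalendarQuarter", "EnglishMonthName"]

# Hypothetical rows: date attributes already joined to one measure.
rows = [
    {"CalendarYear": 2012, "CalendarSemester": 1, "CalendarQuarter": 1,
     "EnglishMonthName": "January", "SalesAmount": 100.0},
    {"CalendarYear": 2012, "CalendarSemester": 1, "CalendarQuarter": 1,
     "EnglishMonthName": "February", "SalesAmount": 150.0},
    {"CalendarYear": 2012, "CalendarSemester": 2, "CalendarQuarter": 3,
     "EnglishMonthName": "July", "SalesAmount": 200.0},
    {"CalendarYear": 2013, "CalendarSemester": 1, "CalendarQuarter": 1,
     "EnglishMonthName": "January", "SalesAmount": 50.0},
]

def drill(rows, depth):
    """Sum SalesAmount grouped by the first `depth` levels of the hierarchy."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[level] for level in HIERARCHY[:depth])
        totals[key] += row["SalesAmount"]
    return dict(totals)

by_year = drill(rows, 1)      # top level
by_quarter = drill(rows, 3)   # two levels further down
```

Increasing depth by one corresponds to one drill-down step, from year to semester to quarter to month.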

DRILL DOWN HIERARCHIES
 Dimension columns used in reports for labels
 are called member properties, as covered earlier
 In a Snowflake schema
 lookup tables show you levels of hierarchies
 In a Star schema
 you need to extract natural hierarchies from the
names and content of columns.
 Nevertheless, because drilling down through natural
hierarchies is so useful and welcomed by end users,
 you should use them as much as possible.

SLOWLY CHANGING DIMENSIONS
 Related to auditing: keeps track of historical data
 Needed when data changes over time, such as
 Someone moves to a different city
 Someone's job title changes
 There are three approaches
 Type 1
 History lost
 Type 2
 Keeps all history
 Type 3
 Keeps partial history
 You can use a combination
 Type 1 for some columns, Type 2 for others

TYPE 1
When information changes, you simply overwrite it. The previous
information is lost.
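
A minimal Type 1 sketch in Python. The dimension is modeled as a dictionary keyed by business key; the customer data and the apply_type1 function are hypothetical.

```python
# Hypothetical dimension rows keyed by business key.
dim_customer = {"C1": {"name": "Asha", "city": "Toronto"}}

def apply_type1(dim, business_key, changes):
    """Overwrite attributes in place; the old values are not kept anywhere."""
    dim[business_key].update(changes)

apply_type1(dim_customer, "C1", {"city": "Ottawa"})
```

After the call there is still one row for C1, and nothing in it records that the city was ever Toronto.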

TYPE 2 SCD
Here you keep track of all changes. In the example below, to keep track of Occupat
You insert a new row for each change and mark the current row with a current-flag field.
Because the business key then repeats, a primary key on it would fail
(use a second kind of key, a surrogate key, as the primary key instead).
You can use date-from and date-to columns to keep track of when each version was valid.
Within the same dimension, you can use Type 1 for some columns and Type 2 for
others.
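
A minimal Type 2 sketch in Python. The column names (customer_key, date_from, date_to, is_current), the job_title attribute, and the apply_type2 function are assumptions for illustration.

```python
from datetime import date
from itertools import count

_surrogate_keys = count(1)  # warehouse-generated keys, independent of the source

def apply_type2(rows, customer_id, job_title, change_date):
    """Close the current version (if the value changed) and append a new one."""
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["job_title"] == job_title:
                return row  # nothing changed, keep the current version
            row["date_to"] = change_date
            row["is_current"] = False
    new_row = {
        "customer_key": next(_surrogate_keys),  # surrogate primary key
        "customer_id": customer_id,             # business key, may repeat
        "job_title": job_title,
        "date_from": change_date,
        "date_to": None,
        "is_current": True,
    }
    rows.append(new_row)
    return new_row

dim_customer = []
apply_type2(dim_customer, "C1", "Analyst", date(2010, 1, 1))
apply_type2(dim_customer, "C1", "Manager", date(2013, 6, 1))
```

The business key C1 now appears twice, which is why the surrogate key carries the primary key role; the date range and the flag tell the versions apart.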

TYPE 3
Partial history is kept, for example only the previous city.
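
A minimal Type 3 sketch in Python. The previous_city column and the apply_type3 function are hypothetical names.

```python
def apply_type3(row, new_city):
    """Keep exactly one previous value next to the current one."""
    row["previous_city"] = row["city"]
    row["city"] = new_city

customer = {"customer_id": "C1", "city": "Toronto", "previous_city": None}
apply_type3(customer, "Ottawa")
apply_type3(customer, "Montreal")
```

After two moves the row holds Montreal and Ottawa; Toronto is gone, which is what makes the history partial.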

THANK YOU FOR BEING WITH US
 That’s the end of Dimension Table Design
 I may come again with a training video on it
 You will see some slides on Fact Table
Design after this slide
 I will make another presentation document on
that topic

FACT TABLE DESIGN
 Fact Table Design Topics
 Define fact table column types.
 Understand the additivity of a measure.
 Handle many-to-many relationships in a Star
schema.

FACT TABLE COLUMN TYPES
 Fact Table Column Types
 Foreign keys
 Measures
 Lineage columns (optional)
 Business key columns from the primary source
table (optional)
 Surrogate keys

FACT TABLE COLUMNS
 Measure Column Type
 Measure columns help with measurements
useful for a specific business process
 Measure columns are usually numeric
 And can be aggregated
 Measure columns store values that are of
interest to business such as
 sales amount, order quantity, and discount amount

FACT TABLE COLUMNS
 Foreign Key – Column Type
 These columns reference the keys of the
dimension tables
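
A small sketch of a fact table with a foreign key and two measures, using Python's standard sqlite3 module. The table names, columns, and sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    category    TEXT
);
CREATE TABLE fact_sales (
    product_key    INTEGER REFERENCES dim_product(product_key),  -- foreign key
    order_quantity INTEGER,                                      -- measure
    sales_amount   REAL                                          -- measure
);
INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Helmets');
INSERT INTO fact_sales VALUES (1, 2, 2000.0), (1, 1, 1000.0), (2, 3, 90.0);
""")

# Aggregate the measures, grouped by a dimension attribute.
totals = conn.execute("""
    SELECT d.category, SUM(f.order_quantity), SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
```

The foreign key is what lets the measures be grouped by any attribute of the dimension it points to.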

DESIGNING FACT TABLES
 Fact tables include measures, foreign keys,
and possibly an additional primary key and
lineage columns.
 Measures can be additive, non-additive, or
semi-additive.
 For many-to-many relationships, you can
introduce an additional intermediate
dimension.

 Surrogate Key
 Usually comes from the primary dimension
table for the current fact table
 Usually one or two columns in a fact table are
surrogate keys


SURROGATE KEYS FOR FACT TABLES
OrderId and LineItemId are the
keys carried over from the
primary source order details table
(strictly speaking, these are business keys of the source
rather than warehouse-generated surrogate keys)
The OrderId and LineItemId columns help
with quick comparisons against the source data
Surrogate keys are not a must in fact tables;
however, they help
Must read:
http://www.kimballgroup.com/2006/07/design-tip-81-fact-table-surrogate-key/

LINEAGE COLUMNS IN FACT TABLES
 Lineage columns –
 Just as with dimension tables, these are strictly
for auditing purposes.
 References:
 https://upsearch.com/implementing-a-data-warehouse-fact-tables/

ADDITIVITY OF MEASURES
 The primary purpose of a data warehouse is reporting
and forecasting (and analysis in some cases)
 Reports are often aggregations such as sum or
average
 Example: sales by quarter, by region, or by product type
 Hence, fact tables have columns that support
these aggregations
 These are the measure columns discussed
before
 The measures you choose determine what
you can aggregate and report

TYPES OF ADDITIVITY OF MEASURES
 Types of Additivity of Measures
 Additive measures
 Semi-additive measures
 Non-additive measures


 Additive
 If a measure can be summed across all dimensions,
it’s referred to as an additive measure.
 Semi-additive
 Sometimes, however, we can sum a measure across
all dimensions except time; an account
balance is an example
 We can’t sum the account balance across the time
dimension. We would need to do something like take the
average instead, or simply use the last value. Measures
like this are called semi-additive measures.

 Finally, some measures can’t ever be
summed. These are called non-additive
measures, and include measures like
discount percentages and prices
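
The difference between the kinds of measures can be sketched in Python. The account balances, order lines, and variable names below are made-up illustration data.

```python
from statistics import mean

# Semi-additive: daily balances per account (hypothetical data).
daily_balance = {"A": [100.0, 120.0, 140.0], "B": [50.0, 60.0, 40.0]}

# Summing across accounts for one day is meaningful...
day_totals = [sum(day) for day in zip(*daily_balance.values())]
# ...but across days we average or take the last value instead of summing.
average_total = mean(day_totals)
closing_total = day_totals[-1]

# Non-additive: a discount percentage cannot be summed over order lines.
order_lines = [(100.0, 10.0), (300.0, 0.0)]  # (gross amount, discount amount)
naive_pct = sum(d / g for g, d in order_lines)  # adds percentages: not meaningful
# Recompute the ratio from its additive parts instead.
overall_pct = sum(d for _, d in order_lines) / sum(g for g, _ in order_lines)
```

Storing the additive components (gross amount and discount amount) in the fact table keeps the percentage derivable at any aggregation level.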

ADDITIVITY OF MEASURES IN SSAS
 SSAS has support for semi-additive and non-additive
measures
 The SSAS database model is called the Business
Intelligence Semantic Model (BISM). Compared to the
SQL Server database model, BISM includes much
additional metadata.
 SSAS has two types of storage:
 dimensional and tabular.
 Tabular storage is quicker to develop, because it works
through tables like a data warehouse does.
 The dimensional model more properly represents a cube.
 However, the dimensional model includes even more
metadata than the tabular model.

 In BISM dimensional processing, SSAS
offers semi-additive aggregate functions out
of the box.
 For example, SSAS offers the LastNonEmpty
aggregate function, which properly uses the
SUM aggregate function across all
dimensions but time, and defines the last
known value as the aggregate over time.
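
The behavior described above can be approximated in a few lines of Python. This is a sketch of the idea only, not SSAS's implementation; the stock data and the helper names period_total and last_non_empty are assumptions.

```python
def period_total(values):
    """Sum across the non-time dimension; None if the whole period is empty."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None

def last_non_empty(values):
    """Return the last value over time that is not empty."""
    for value in reversed(values):
        if value is not None:
            return value
    return None

# Hypothetical monthly stock levels per warehouse; None means no snapshot.
stock = {"W1": [10, 12, None], "W2": [5, 7, None]}

monthly_totals = [period_total(month) for month in zip(*stock.values())]
quarter_value = last_non_empty(monthly_totals)
```

The third month has no data at all, so the quarter reports the second month's total rather than an empty value or a sum over months.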

 In the BISM tabular model, you use the Data
Analysis Expressions (DAX) language. The
DAX language includes functions that let you
build semi-additive expressions quite quickly
as well.

 Fact tables
 Collection of measurements on a specific
aspect of the business
 Measure columns
 sales amount, order quantity, and discount
amount.
