2. OUR SERVICES
Free Training and Educational Services
Training and Education in Bangla:
Bangla.SaLearningSchool.com
Training and Education in English:
www.SaLearningSchool.com
English.SaLearningSchool.com
Ask a question and get answers:
Ask.JustEtc.net
3. TOPICS - KEYWORDS
Design a Data Warehouse
Star Schema
Snow Flake Schema
Dimension Tables
Fact Tables
Auditing
Surrogate Keys
Type 1, Type 2, Type 3, and mixed solutions for
slowly changing dimension (SCD) management
Pivoting for Analysis
To help with SSAS on data warehouse
4. TOPICS - KEYWORDS
Design a Data Warehouse
Additive measures
Semi additive measures
Hierarchies for dimensions
Attributes in dimensions
Attributes in lookup tables
Long term data warehouse design
Usually Star Schema
Short term data warehouse design
POC
Usually snowflake schema
5. TOPICS - KEYWORDS
Fact Tables
measures
foreign keys
and possibly an additional primary key
and lineage columns
granularity of fact tables
auditing and lineage needs
Measures can be
additive
non-additive
semi-additive
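The three kinds of measures can be told apart by how they aggregate. The following is a minimal sketch, assuming hypothetical daily bank rows (the column layout and values are made up for illustration): deposits are additive, end-of-day balances are semi-additive (summed across accounts but not across dates), and an average is non-additive.

```python
# Hypothetical fact rows: (date, account, deposit_amount, end_of_day_balance)
rows = [
    ("2024-01-01", "A", 100.0, 1100.0),
    ("2024-01-01", "B", 50.0, 550.0),
    ("2024-01-02", "A", 20.0, 1120.0),
    ("2024-01-02", "B", 0.0, 550.0),
]

def total_deposits(rows):
    """Additive: summing over every dimension (date and account) is meaningful."""
    return sum(r[2] for r in rows)

def closing_balance(rows):
    """Semi-additive: sum across accounts, but use only the last date across time."""
    last_date = max(r[0] for r in rows)
    return sum(r[3] for r in rows if r[0] == last_date)

def average_deposit(rows):
    """Non-additive: recomputed from additive parts, never summed across rows."""
    return total_deposits(rows) / len(rows)
```

Summing the balance column over all four rows would count each account once per day, which is why the balance has to be treated differently from the deposit amount.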
6. TOPICS - KEYWORDS
dimension
keys
names
attributes
member properties
translations
and lineage
7. TOPICS - KEYWORDS
attributes
natural hierarchies
many-to-many fact table relationships
you can introduce an additional intermediate
dimension
8. CONCLUSION
Not much, right?
However, if you understand all of these terms and
can implement these concepts in your data
warehouse,
that will be great
You will not necessarily need to use all of these
concepts; however, you may need to judge, based
on the situation, whether all or any of them will help:
what will help and what will not
Check our subsequent videos and tutorials
9. THANK YOU
Any Concerns?
http://ask.justetc.net
Or comment below...
10. TOOLS AND SOFTWARE REQUIREMENTS
Download the Adventure Works databases
OLTP database (LOB database)
Data warehouse Database
From
http://msftdbprodsamples.codeplex.com/releases/view/55330
For this tutorial, you can just follow our slides,
though the following tools will help
You will probably want to check the details in the downloaded
databases, especially AdventureWorksDW2012
For that you will need SQL Server and SQL Server
Management Studio
11. REQUIRED TOOLS
Useful/Required SQL Server Components
Database Engine Services
Documentation Components
Management Tools - Basic
Management Tools – Complete
SQL Server Data Tools
12. DATA WAREHOUSE DESIGN – THE DETAILS
Data Warehouse Logical Design
Topics: Design and Implement a Data Warehouse
Design and implement dimensions.
Design and implement fact tables
Design Auditing
Track the source and load time of data coming into the DW through
auditing, i.e. lineage information
Why a Data Warehouse?
With an OLTP/LOB/transactional database, it is hard to
Generate reports
Do analysis on the data (sometimes)
Get useful information, both summarized and detailed,
to support business decisions
13. DATA WAREHOUSE DESIGN – THE DETAILS
Why a Data Warehouse?
Data in OLTP databases are heavily normalized. The goal is
to keep each piece of data in one single place, to reduce
redundancy and improve the consistency of data
You may end up with many tables: 100s or even 1000s
To generate reports you may need to join many
tables, which will be slow
Historical data may not be there
Data quality is also an issue
For reporting or analyzing, you may need data from
multiple databases across many departments
14. WHY A DATA WAREHOUSE?
So you can create a Data Warehouse
By cleaning data
With historical data
Combining data from multiple sources
Denormalizing data
Using specific design geared towards Data
Warehouse design
Many consider DW design less complex than
relational database design
Though it also has some complex areas to address
15. SO WHAT DOES A DATA WAREHOUSE CONTAIN?
Usually two schemas are used for a DW
Star Schema: looks like a star
Snowflake Schema
Another one is called the Dimensional Model
It includes both Star and Snowflake in the same
Data Warehouse
Both schemas have tables of two types
Dimension Tables
Fact Tables
16. SO WHAT DOES A DATA WAREHOUSE CONTAIN?
Fact tables are in the center
A fact table joins/combines all the data required for
a report or for the business aspect behind
that report
It usually combines the primary keys of the different tables that
contain data for this report/business aspect
Dimension tables are all the other tables; they
contain the actual descriptive data
They can be the actual tables in the OLTP database
without any modification (Snowflake)
Or they can be newly created by
denormalizing the existing OLTP tables (Star)
17. SO WHAT DOES A DATA WAREHOUSE CONTAIN?
So now you know what dimension tables
and fact tables are
Fact tables contain the primary keys of all related tables
(here they are foreign keys)
Dimension tables contain data
Usually, it is better to keep your data
warehouse separate from your OLTP database
So bring all the (dimension) tables here as they are
Or denormalize them and bring them into the new
database
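To make the fact/dimension split concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. All table names, column names, and rows are hypothetical; they are only loosely inspired by the Adventure Works sample. The fact table holds foreign keys plus one measure, and the dimensions hold the descriptive data.

```python
import sqlite3

def build_star_schema():
    """Create a tiny in-memory star schema: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY, name TEXT, country TEXT
        );
        -- The fact table stores foreign keys to the dimensions plus a measure.
        CREATE TABLE fact_internet_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            customer_key INTEGER REFERENCES dim_customer(customer_key),
            sales_amount REAL
        );
        INSERT INTO dim_date VALUES (20110105, 2011), (20120310, 2012);
        INSERT INTO dim_customer VALUES (1, 'Ann', 'Canada'), (2, 'Bob', 'Australia');
        INSERT INTO fact_internet_sales VALUES
            (20110105, 1, 100.0), (20120310, 1, 50.0),
            (20120310, 2, 75.0), (20120310, 2, 25.0);
    """)
    return conn

def sales_by_country_year(conn):
    """Sales amount by country and year: one join per dimension, then GROUP BY."""
    return conn.execute("""
        SELECT c.country, d.calendar_year, SUM(f.sales_amount)
        FROM fact_internet_sales AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY c.country, d.calendar_year
        ORDER BY c.country, d.calendar_year
    """).fetchall()
```

With these sample rows, `sales_by_country_year(build_star_schema())` returns one row per country and year. Each dimension is exactly one join away from the fact table, which is what keeps star-schema queries short.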
18. SIMPLIFIED: WHAT ARE STAR AND SNOWFLAKE SCHEMAS
If you just create Fact tables and take all the
related tables from your OLTP/LOB databases
You get a Snow Flake Schema
Here all Dimension tables are still normalized (as
you just took them from the actual database)
This is easy,
so it is good for a short-term, quick, and experimental Data
Warehouse
One note: your reporting and analysis services
queries (MDX, DMV) will be slower with Snowflake
Schemas
19. SO WHAT DOES A DATA WAREHOUSE CONTAIN?
Now, when you denormalize the dimension
tables
You get the star schema
In the example, the Fact tables remain the same
Star Schema is a kind of standard and is used a
lot
It was originally developed in the 1980s
20. EXAMPLES: WHY REPORTING IN OLTP DATABASE IS NOT A GREAT IDEA
Sales amount for internet sales by different countries and historical years
21. WHY REPORTING IN OLTP DATABASE IS NOT A GREAT IDEA
Issues that I did not mention before
Even if your OLTP database is well designed (?)
It may be hard to find the tables related to the
report
The table names and the column names can be tricky:
they may not follow any conventions or carry any
meaning
So it can be hard to find the data for the report
22. WHY REPORTING IN OLTP DATABASE IS NOT A GREAT IDEA
Note: the reality
The OLTP database may not even be well designed, in its relationships
as well as its normalization, which sometimes makes
reporting hard
(so far we assumed that the OLTP database is perfect)
In a project long ago
I had to rewrite, verify, change, and optimize
close to 100
queries for a reporting system
I had to change the interface from one button per report
(easy to get lost)
into a drop-down list of reports
The relations among the data were arbitrary; they existed only in the
mind of the designer and followed no standards: no ER model, no
standard concepts
So it was a hard job
23. WHY REPORTING IN OLTP DATABASE IS NOT A GREAT IDEA
In such cases
Tools such as SQL Profiler might help
you could create a test environment,
try to insert some data through an LOB application
have SQL Profiler identify where the data was inserted
Another issue with this particular example
No lookup table for dates and years
You need to extract them yourself
The tables may not even contain historical data
No date field
So no historical data
24. WHY REPORTING IN OLTP DATABASE IS NOT A GREAT IDEA
If sales data reside in multiple databases, even owned by
multiple departments
How do you merge them?
How do you identify and match records?
Customer data can be in different databases with no
common identifier
Data quality can be low
Missing data
Partial data
Inconsistent data across multiple databases
Data can be represented differently in different databases
M or F for gender
1 or 0 for gender
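The gender example hints at what the cleaning step has to do before data from several sources can be merged. The following is a minimal sketch, assuming two hypothetical source systems named `crm` and `billing`. Which code means what (for instance whether 1 is male or female) is an assumption here and has to be confirmed with the owner of each source.

```python
# Hypothetical per-source code mappings; confirm the real meaning of each code
# with the owner of the source system before using anything like this.
GENDER_MAPS = {
    "crm": {"M": "Male", "F": "Female"},
    "billing": {"1": "Male", "0": "Female"},
}

def conform_gender(source, raw_value):
    """Map a source-specific gender code to one shared representation."""
    mapping = GENDER_MAPS[source]
    key = str(raw_value).strip().upper()
    # Missing or unexpected codes are kept visible instead of being guessed.
    return mapping.get(key, "Unknown")
```

For example, `conform_gender("crm", "m")` and `conform_gender("billing", 1)` both give `"Male"`, so the two sources can share one dimension column.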
26. TOTAL DW: MULTIPLE STAR SCHEMAS
You saw one Star Schema for Internet Sales
You can see another for Offline Sales
Another for Accounting
Your DW has many such Star Schemas
And these star schemas need to be connected/related
They are connected when you use the same
dimensions for them
i.e. if two star schemas have the same dimension, they can
share that dimension
These are called shared or conformed dimensions
For SSAS, you can use shared dimensions only
There is also a concept of a private dimension
Not a great idea in practical, real-life applications
You cannot connect/compare/verify data across star schemas over a private dimension
29. SNOW FLAKES WILL BE MORE AND MORE NORMALIZED
Everything can be normalized
Or only the first level can be normalized while the others
are not
30. NORMALIZED PRODUCT DIMENSION
In the Star Schema, you could use these normalized product tables to get a partial
snowflake schema. You could use normalized versions of all dimensions to get a full
snowflake
31. SNOW FLAKE
In reality, you are more likely to see partial snowflakes than
full ones
Though, in reality, it is better to go for a star
schema
Queries will be faster
33. GRANULARITY
The number of Dimension Tables connected
to a fact table
Dimension of a star schema
Cube = 3 dimensions
SSAS operates on and analyzes cubes
34. AUDITING AND LINEAGE
I will be very brief on this
In a data warehouse, you may want some
auditing tables
For every update, you should audit
who made the update,
when it was made,
and how many rows were transferred
to each dimension and fact table
in your DW
35. AUDITING AND LINEAGE
You will need additional fields/columns in
your dimension and fact tables to track
when, by whom, and from where each row
was updated
Your ETL process needs to be updated
If you used SSIS for the ETL
Modify the SSIS packages so that they record this
information
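The idea of an audit table plus lineage columns can be sketched outside SSIS. The following is a minimal illustration using Python's sqlite3 module. The table names (`dim_product`, `etl_audit`), the lineage column names, and the function `load_with_audit` are all hypothetical. It is not how an SSIS package does it, only the same bookkeeping in a few lines.

```python
import sqlite3
from datetime import datetime, timezone

def load_with_audit(conn, rows, loaded_by, source_system):
    """Load (key, name) rows into a dimension and write one audit row per batch."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS dim_product (
            product_key INTEGER PRIMARY KEY,
            product_name TEXT,
            -- Lineage columns: where, when, and by whom the row was loaded.
            lineage_source TEXT,
            lineage_loaded_at TEXT,
            lineage_loaded_by TEXT
        );
        CREATE TABLE IF NOT EXISTS etl_audit (
            table_name TEXT, loaded_by TEXT, loaded_at TEXT, row_count INTEGER
        );
    """)
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?, ?, ?)",
        [(key, name, source_system, loaded_at, loaded_by) for key, name in rows],
    )
    # Audit row: who made the update, when, and how many rows were transferred.
    conn.execute(
        "INSERT INTO etl_audit VALUES (?, ?, ?, ?)",
        ("dim_product", loaded_by, loaded_at, len(rows)),
    )
    conn.commit()
    return loaded_at
```

The lineage columns stay in the warehouse tables for auditing; reports and cubes built on top would simply not select them.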
37. DESIGNING DIMENSIONS
Keys: used to identify entities
Name columns: used for human names of
entities
Attributes: used for pivoting in analyses
Member properties: used for labels in a
report
Lineage columns: used for auditing, and
never exposed to end users
38. For analysis
Pivot Table
Pivot Graph
For Dimensions
The fields used for pivoting are called
Attributes
Not all columns are attributes
Attributes: the columns on which analysis is based
In the previous slide, you saw the different types of
columns
39. Attributes
For pivoting, discrete attributes with a small
number of distinct values are most appropriate
They should not be continuous
Keys are not good candidates for pivoting and
analysis
To use a continuous column for pivoting
Convert it into a small set of discrete values
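Turning a continuous column into a small set of discrete values can be as simple as assigning each value to a band. Here is a minimal sketch for age. The band edges and labels are hypothetical and would normally come from the business, not from the data alone.

```python
import bisect

# Hypothetical, business-defined band edges (lower bound of each later band).
AGE_EDGES = [30, 45, 60]
AGE_LABELS = ["Under 30", "30-44", "45-59", "60 and over"]

def age_band(age):
    """Return the discrete band for a continuous age value."""
    # bisect_right counts how many edges are <= age, which is the band index.
    return AGE_LABELS[bisect.bisect_right(AGE_EDGES, age)]
```

A pivot on `age_band` has four possible values instead of one per distinct age, which is what makes it usable as an attribute.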
40. SSAS can discretize continuous attributes.
Not always great: you need the business perspective as
well
Age and Income are not good candidates for automatic
discretization
Name columns are used to identify the entity
Not good for pivoting or as keys
An address, for example
Columns used in reports as labels only, not for
pivoting, are called member properties.
They can include translations
41. Lineage and auditing columns
Used for auditing data
Never exposed to the users
43. Possible Attributes
BirthDate (after calculating age and discretizing the age)
MaritalStatus
Gender
YearlyIncome (after discretizing)
TotalChildren
NumberChildrenAtHome
EnglishEducation (other education columns are for translations)
EnglishOccupation (other occupation columns are for translations)
HouseOwnerFlag
NumberCarsOwned
CommuteDistance
45. FullDateAlternateKey (denotes a date in date
format)
EnglishMonthName
CalendarQuarter
CalendarSemester
CalendarYear
Drill-down attributes
CalendarYear → CalendarSemester → CalendarQuarter → EnglishMonthName → FullDateAlternateKey
46. why dimension columns used in reports for
labels are called member properties.
In a Snowflake schema, lookup tables show you
levels of hierarchies. In a Star schema, you
need to extract natural hierarchies from the
names and content of columns. Nevertheless,
because drilling down through natural
hierarchies is so useful and welcomed by end
users, you should use them as much as
possible.
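Drilling down through a natural hierarchy amounts to grouping the same measure by one more level each time. The following is a minimal sketch using the date columns named earlier. The `amount` measure, the row layout, and the function `drill_down` are hypothetical.

```python
from collections import defaultdict

# Levels of the date hierarchy, from coarsest to finest.
HIERARCHY = ["CalendarYear", "CalendarSemester", "CalendarQuarter", "EnglishMonthName"]

def drill_down(rows, depth):
    """Sum the hypothetical 'amount' measure by the first `depth` hierarchy levels."""
    levels = HIERARCHY[:depth]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row["amount"]
    return dict(totals)

sample_rows = [
    {"CalendarYear": 2012, "CalendarSemester": 1, "CalendarQuarter": 1,
     "EnglishMonthName": "January", "amount": 10.0},
    {"CalendarYear": 2012, "CalendarSemester": 1, "CalendarQuarter": 1,
     "EnglishMonthName": "February", "amount": 5.0},
    {"CalendarYear": 2012, "CalendarSemester": 2, "CalendarQuarter": 3,
     "EnglishMonthName": "July", "amount": 20.0},
]
```

Calling `drill_down(sample_rows, 1)` gives the yearly total, and increasing `depth` splits that same total by semester, quarter, and month.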
47. SLOWLY CHANGING DIMENSIONS
Type 1
History lost
Type 2
Keeps all history
Type 3
Keeps partial history
You can use a combination
Type 1 for some columns, Type 2 for others
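The difference between the three types shows up in what happens to the old value. Here is a minimal sketch on plain Python dictionaries. The column names (`customer_key`, `customer_id`, `city`, `previous_city`, `valid_from`, `valid_to`) are hypothetical, and a real ETL process would do this in the database.

```python
from datetime import date

def apply_type1(dim_rows, customer_id, new_city):
    """Type 1: overwrite in place, so the history is lost."""
    for row in dim_rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city

def apply_type2(dim_rows, customer_id, new_city, change_date):
    """Type 2: close the current row and add a new row with a new surrogate key."""
    for row in dim_rows:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            row["valid_to"] = change_date
    dim_rows.append({
        "customer_key": max((r["customer_key"] for r in dim_rows), default=0) + 1,
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,  # None marks the current version of the customer
    })

def apply_type3(dim_rows, customer_id, new_city):
    """Type 3: keep only the previous value in an extra column (partial history)."""
    for row in dim_rows:
        if row["customer_id"] == customer_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city
```

Type 2 is the reason a surrogate key is needed: the same business key (`customer_id`) now appears on several rows, one per version.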
52. DESIGNING FACT TABLES
Fact tables include measures, foreign keys,
and possibly an additional primary key and
lineage columns.
Measures can be additive, non-additive, or
semi-additive.
For many-to-many relationships, you can
introduce an additional intermediate
dimension.
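One common way to handle a many-to-many relationship is an intermediate (bridge) table between the fact table and the dimension. The sketch below assumes that reading and uses Python's sqlite3 module. The tables, columns, and rows are hypothetical: one order can have several sales reasons.

```python
import sqlite3

def sales_by_reason():
    """Total sales per reason when one order can have several reasons."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE fact_sales (order_key INTEGER PRIMARY KEY, sales_amount REAL);
        CREATE TABLE dim_sales_reason (reason_key INTEGER PRIMARY KEY, reason_name TEXT);
        -- Intermediate table: one row per (order, reason) pair.
        CREATE TABLE bridge_order_reason (order_key INTEGER, reason_key INTEGER);
        INSERT INTO fact_sales VALUES (1, 100.0), (2, 40.0);
        INSERT INTO dim_sales_reason VALUES (10, 'Price'), (20, 'Promotion');
        INSERT INTO bridge_order_reason VALUES (1, 10), (1, 20), (2, 10);
    """)
    return conn.execute("""
        SELECT r.reason_name, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN bridge_order_reason AS b ON b.order_key = f.order_key
        JOIN dim_sales_reason AS r ON r.reason_key = b.reason_key
        GROUP BY r.reason_name
        ORDER BY r.reason_name
    """).fetchall()
```

Order 1 counts under both reasons, so the per-reason totals add up to more than the overall sales amount. That double counting is inherent to many-to-many analysis and should be explained to report users.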
53. Fact tables
A collection of measurements on a specific
aspect of the business
Measure columns
sales amount, order quantity, and discount
amount.