2. 1. Getting into the Context
@folio_3 www.folio3.com Copyright 2015
3. Online Transaction Processing
• Core database
• Usually ER model
• For transactions and routine tasks
4. Metadata
Data about data, i.e. information about the data tables in the OLTP system.
5. ETL (Extract, Transform, Load)
• Extract from the source (OLTP)
• Transform according to requirements
• Load into the data warehouse
6. Data Warehouse
• For effective querying, analysis and decision-making
• OLAP (Online Analytical Processing) design
• Subject-oriented, integrated, time-variant, non-volatile collection of data
7. Data Mart
• Access layer of the data warehouse
• Subset of the data warehouse
• Oriented to a specific business unit or department, e.g. marketing
• Is not another physical entity
8. OLAP (Online Analytical Processing)
To analyze multidimensional data interactively from multiple perspectives
9. Data Mining
• Computational process of discovering patterns in large data sets.
• To extract information and transform it into an understandable structure for further use.
10. Data Visualization
Creation and study of the visual representation of data, e.g. scatter plot, bar chart.
11. Retrieve and present a subset of data for a
particular purpose
16. Terminology
Dimensions
• The time-independent, textual and descriptive attributes by which users describe objects.
• Answer who, where, what, how and when.
• The angles from which data can be viewed.
• E.g. product category, date-time of a transaction.
Facts
• Quantified business measurements, e.g. quantity, amount, cost, taxes.
• Things that can be summed or aggregated, e.g. sales of a product.
• Built from the lowest level of detail (grain).
• The data under consideration.
• Time-dependent.
17. Dimensional Modeling Process
Sub-setting
De-normalization
i.e. collapsing dimension hierarchies by de-normalizing to 2NF
Summarization
i.e. summation of facts
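The de-normalization and summarization steps above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the category, product and sale tables, their columns and the sample rows are all hypothetical and do not come from the case study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized OLTP-style hierarchy (hypothetical tables and data).
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id INTEGER REFERENCES category(category_id)
    );
    CREATE TABLE sale (product_id INTEGER, amount REAL);

    INSERT INTO category VALUES (1, 'Books'), (2, 'Games');
    INSERT INTO product VALUES (10, 'Atlas', 1), (11, 'Novel', 1), (20, 'Chess', 2);
    INSERT INTO sale VALUES (10, 5.0), (11, 7.5), (20, 12.0), (20, 3.0);

    -- De-normalization: collapse the product -> category hierarchy
    -- into a single dimension table.
    CREATE TABLE dim_product AS
        SELECT p.product_id, p.product_name, c.category_name
        FROM product p JOIN category c ON p.category_id = c.category_id;
""")

# Summarization: sum the fact (amount) per value of a dimension attribute.
summary = conn.execute("""
    SELECT d.category_name, SUM(s.amount)
    FROM sale s JOIN dim_product d ON s.product_id = d.product_id
    GROUP BY d.category_name
    ORDER BY d.category_name
""").fetchall()
print(summary)
```

After de-normalization, a report needs only one join (fact to dimension) instead of walking the whole hierarchy.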
18. Modeling Design Steps
1. Identify the Business Process
Source of “measurements”
2. Identify the Grain
What does 1 row in the fact table represent or mean?
3. Identify the Dimensions
Descriptive context, true to the grain
4. Identify the Facts
Numeric additive measurements, true to the grain
19. Design Steps - Example
20. Case Study: Users Points System
Consider a system that can be summarized as follows:
It has users and groups of users.
Every user can perform certain actions, such as message, comment or meeting.
For every action, the user gets points, which are also added to the points of every user group the user belongs to.
The system also has many other features that are not relevant to points.
Let’s assume the system has over 100 tables to store various things.
21. Step 1: Identify the Business Process
Question 1: Do we apply dimensional modeling to all 100 tables in the system?
Answer: No
Question 2: So which tables should be selected?
Answer: The tables that are relevant to the business requirements.
22. Business Requirements
Two types of points are required for reporting:
1. Points per month
2. Average lifetime points at the end of each month
For:
1. Individual users
2. User groups
3. Individual users per action
4. User groups per action
23. Step 2: Identify the Grain
Analyzing the business requirements, the following grains are identified:
1. Points per individual per month
2. Points per user group per month
3. Points per user per action per month
4. Points per user group per action per month
5. Average lifetime points per individual per month
6. Average lifetime points per user group per month
7. Average lifetime points per user per action per month
8. Average lifetime points per user group per action per month
“Grain = What does 1 row in the fact table represent”
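To make the idea of a grain concrete, here is a small Python sketch that rolls raw events up to the grain "points per user per action per month", and then to the coarser grain "points per user per month". The event tuples, user names and point values are invented for illustration; the deck does not specify the source tables.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw events from the source system: (user, action, day, points).
events = [
    ("alice", "message", date(2014, 3, 2), 5),
    ("alice", "message", date(2014, 3, 9), 5),
    ("alice", "comment", date(2014, 3, 9), 2),
    ("bob", "meeting", date(2014, 3, 15), 10),
    ("alice", "message", date(2014, 4, 1), 5),
]

# Grain: one row per user, per action, per month.
points_per_user_action_month = defaultdict(int)
for user, action, day, points in events:
    key = (user, action, day.year, day.month)
    points_per_user_action_month[key] += points

# A coarser grain (per user per month) is a roll-up of the finer one.
points_per_user_month = defaultdict(int)
for (user, _action, year, month), points in points_per_user_action_month.items():
    points_per_user_month[(user, year, month)] += points

for key, points in sorted(points_per_user_action_month.items()):
    print(key, points)
```

Each dictionary key plays the role of one fact-table row: the key fields are the dimensions and the value is the fact.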
24. Step 3: Identify the Dimensions
Simply speaking, whatever follows ‘per’ in a grain is a dimension. The dimensions are found to be:
1. Date (granularity: month)
2. Users
3. User groups
4. Actions
“Dimension: descriptive context true to grain”
25. Step 4: Identify the Facts
Four facts are identified:
1. User Points
2. User Lifetime Average Points
3. User Group Points
4. User Group Lifetime Average Points
“Facts: Numeric additive measures true to grain”
26. Tables Schema
Once the grain, facts and dimensions are identified, the table schema is built from them.
Please note:
It is not necessary to keep each fact in a separate table; several facts can be part of a single table.
Alternatively, a single fact can appear in multiple fact tables, depending on its relationship with the dimensions.
Every dimension gets its own table, and each dimension can be connected to many fact tables.
27. Tables Schema
The table schema should be a direct translation of the grains defined in Step 2.
28. Star Schema – fact_points_user
Grains covered:
1. Points per individual per month
2. Average lifetime points per individual per month
29. Star Schema – fact_points_user_action
Grains covered:
1. Points per individual per action per month
2. Average lifetime points per individual per action per month
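One possible way to write this star schema down is sketched below with Python's sqlite3 module. The table and key names (fact_points_user_action, dim_user, dim_date, dim_action, username, action_name, month, year) follow the example query later in the deck; the measure columns points and avg_lifetime_points, the constraints and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Dimension tables: descriptive context.
    CREATE TABLE dim_user (dim_user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE dim_action (dim_action_id INTEGER PRIMARY KEY, action_name TEXT NOT NULL);
    CREATE TABLE dim_date (
        dim_date_id INTEGER PRIMARY KEY,
        month INTEGER NOT NULL,
        year INTEGER NOT NULL,
        UNIQUE (year, month)
    );

    -- Fact table: one row per user, per action, per month (the grain).
    CREATE TABLE fact_points_user_action (
        dim_user_id INTEGER NOT NULL REFERENCES dim_user(dim_user_id),
        dim_action_id INTEGER NOT NULL REFERENCES dim_action(dim_action_id),
        dim_date_id INTEGER NOT NULL REFERENCES dim_date(dim_date_id),
        points INTEGER NOT NULL,            -- assumed measure name
        avg_lifetime_points REAL NOT NULL,  -- assumed measure name
        PRIMARY KEY (dim_user_id, dim_action_id, dim_date_id)
    );
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO dim_user VALUES (1, 'alice')")
conn.execute("INSERT INTO dim_action VALUES (1, 'message')")
conn.execute("INSERT INTO dim_date VALUES (1, 3, 2014)")
conn.execute("INSERT INTO fact_points_user_action VALUES (1, 1, 1, 10, 10.0)")

# The composite primary key enforces the grain: a second row for the same
# user, action and month is rejected.
try:
    conn.execute("INSERT INTO fact_points_user_action VALUES (1, 1, 1, 99, 99.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Using the dimension keys as a composite primary key is one design choice among several; it makes the "one row per grain" rule something the database checks rather than a convention.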
30. Star Schema – fact_points_group
Grains covered:
1. Points per user group per month
2. Average lifetime points per user group per month
31. Star Schema for User Points Grains
Grains covered:
1. Points per user group per action per month
2. Average lifetime points per user group per action per month
32. Example Query
SELECT fp.*, du.username, da.action_name
FROM fact_points_user_action fp
JOIN dim_user du ON fp.dim_user_id = du.dim_user_id
JOIN dim_date dd ON fp.dim_date_id = dd.dim_date_id
JOIN dim_action da ON fp.dim_action_id = da.dim_action_id
WHERE dd.month = 3 AND dd.year = 2014;
34. Data Transformation
Once the OLAP schema has been designed, data has to be moved from the ERD (OLTP) DB to the new OLAP DB.
This can be achieved with dedicated scripts or cron jobs.
A simple example for the case above is a cron job that runs at the end of every month and moves the relevant data from the ERD DB to the OLAP DB, after any required calculations.
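A minimal sketch of such a month-end job is shown below, using two in-memory SQLite databases to stand in for the ERD DB and the OLAP DB. The source table user_action_log, its columns, the helper names dim_id and load_month, and the sample data are all hypothetical. Only the monthly points are loaded; the lifetime average is left out because the deck does not define its formula. Scheduling (e.g. a crontab entry) is outside the sketch.

```python
import sqlite3


def dim_id(olap, table, id_col, cols, values):
    """Return the surrogate key of a dimension row, inserting it if missing."""
    where = " AND ".join(f"{c} = ?" for c in cols)
    row = olap.execute(f"SELECT {id_col} FROM {table} WHERE {where}", values).fetchone()
    if row:
        return row[0]
    placeholders = ", ".join("?" for _ in cols)
    cur = olap.execute(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})", values
    )
    return cur.lastrowid


def load_month(oltp, olap, year, month):
    """Extract one month from the OLTP log, aggregate it, load the fact table."""
    period = f"{year:04d}-{month:02d}"
    # Extract + transform: sum points per user for the month.
    rows = oltp.execute(
        "SELECT username, SUM(points) FROM user_action_log "
        "WHERE strftime('%Y-%m', created_at) = ? GROUP BY username",
        (period,),
    ).fetchall()
    date_id = dim_id(olap, "dim_date", "dim_date_id", ("month", "year"), (month, year))
    # Load: one fact row per user per month.
    for username, points in rows:
        user_id = dim_id(olap, "dim_user", "dim_user_id", ("username",), (username,))
        olap.execute(
            "INSERT INTO fact_points_user (dim_user_id, dim_date_id, points) "
            "VALUES (?, ?, ?)",
            (user_id, date_id, points),
        )
    olap.commit()
    return len(rows)


# Hypothetical source (OLTP) data.
oltp = sqlite3.connect(":memory:")
oltp.executescript("""
    CREATE TABLE user_action_log (
        username TEXT, action_name TEXT, points INTEGER, created_at TEXT
    );
    INSERT INTO user_action_log VALUES
        ('alice', 'message', 5, '2014-03-02'),
        ('alice', 'comment', 2, '2014-03-09'),
        ('bob', 'meeting', 10, '2014-03-15'),
        ('alice', 'message', 5, '2014-04-01');
""")

# Target (OLAP) schema, reduced to what this sketch needs.
olap = sqlite3.connect(":memory:")
olap.executescript("""
    CREATE TABLE dim_user (dim_user_id INTEGER PRIMARY KEY, username TEXT UNIQUE);
    CREATE TABLE dim_date (dim_date_id INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
    CREATE TABLE fact_points_user (dim_user_id INTEGER, dim_date_id INTEGER, points INTEGER);
""")

loaded = load_month(oltp, olap, 2014, 3)
result = olap.execute("""
    SELECT du.username, fp.points
    FROM fact_points_user fp
    JOIN dim_user du ON fp.dim_user_id = du.dim_user_id
    JOIN dim_date dd ON fp.dim_date_id = dd.dim_date_id
    WHERE dd.month = 3 AND dd.year = 2014
    ORDER BY du.username
""").fetchall()
print(loaded, result)
```

A production job would also need to guard against loading the same month twice, which this sketch does not attempt.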
35. Conclusion
Dimensional Modeling helps to keep data in a
form that is relevant and quickly accessible for
reporting and analysis.