2. 104 Dimensional Modeling: In a Business Intelligence Environment
5.1 Converting an E/R model to a dimensional model
In this section we describe how to convert an E/R model to a dimensional model.
A dimensional model can be created from the enterprise data warehouse or
directly from OLTP source systems. For additional information, refer to:
Enterprise data warehouse: “Data warehouse architecture choices” on
page 57.
OLTP Source systems: “Data modeling: The organizing structure” on page 47
in the following sections:
– “Independent data mart architecture” on page 59
– “Dependent data mart architecture” on page 61
The following are the steps for converting an E/R model to a dimensional model:
Identify the business process from the E/R model.
Identify many-to-many tables in the E/R model to convert to fact tables.
Denormalize remaining tables into flat dimension tables.
Identify date and time from the E/R model.
The steps are explained in detail in the following sections.
5.1.1 Identify the business process from the E/R model
It is important to understand that an E/R model can be segmented into
multiple dimensional models. An E/R model (which may be an enterprise data
warehouse or an OLTP source system) consists of several business
processes. This is depicted in Figure 5-1 on page 105. For example, an E/R
model for an ERP system includes several business processes, such as retail
sales, order management, procurement, inventory, and store and warehouse
inventory management.
3. Chapter 5. Modeling considerations 105
Figure 5-1 E/R model consists of several business processes
5.1.2 Identify many-to-many tables in E/R model
Once the business processes are separated, the next step is to identify the
many-to-many tables (many-to-many relationships) in the E/R model and convert
them to dimensional model fact tables. These many-to-many relationships
contain numeric and additive non-key facts which generally become facts inside
the fact table.
The idea behind this step is to identify the transaction-based tables that serve to
express many-to-many relationships inside an E/R model.
Every E/R model consists of transaction-based tables which constantly have
data inserted, or are updated with data, or have data deleted from them. Some of
these tables also express a many-to-many relationship. For example, in an ERP
database, there are transaction tables, such as Invoice and Invoice_Details,
which are constantly inserted and updated because they are transaction-based
tables. However, tables such as Employee and Products in an E/R model may
be fairly static.
Description of many-to-many relationships
Many-to-many (m:n) relationships add complexity and confusion to the model
and to the application development process. The key to resolving m:n
relationships is to separate the two entities and create two one-to-many (1:n)
Business
Process #1
Business
Process #3
Business
Process #2
4. 106 Dimensional Modeling: In a Business Intelligence Environment
relationships between them with a third intersect entity. The intersect entity
usually contains attributes from both connecting entities.
To resolve an m:n relationship, analyze the business rules again. Have you
accurately diagrammed the relationship? The telephone directory example has a
m:n relationship between the name and fax entities, as Figure 5-2 depicts. The
business rules say, “One person can have zero, one, or many fax numbers; a fax
number can be for several people.” Based on what we selected earlier as our
primary key for the voice entity, an m:n relationship exists.
A problem exists in the fax entity because the telephone number, which is
designated as the primary key, can appear more than one time in the fax entity;
this violates the qualification of a primary key. Remember, the primary key must
be unique.
Figure 5-2 Telephone directory diagram to show many to many relationship
To resolve this m:n relationship, you can add an intersect entity between the
name and fax entities, as depicted in Figure 5-3 on page 107. The new intersect
entity, fax name, contains two attributes, fax_num and rec_num.
5. Chapter 5. Modeling considerations 107
Figure 5-3 Many-to-many relationship
Another example for many-to-many relationship is shown in Figure 5-4.
Figure 5-4 Many-to-many relationship
In Figure 5-4, the many-to-many relationship shows that employees use many
programming skills on many projects and each project has many employees with
varying programming skills.
5.1.3 Denormalize remaining tables into flat dimension tables
The final step involves taking the remaining tables in the E/R model and
denormalizing them into dimension tables for the dimensional model. The
primary key of each of the dimensions is made a surrogate (non-intelligent,
integer) key. This surrogate key connects directly to the fact table.
Programming
Skills
ProjectsHasEmployee
6. 108 Dimensional Modeling: In a Business Intelligence Environment
5.1.4 Identify date and time dimension from E/R model
The last step generally involves identifying the date and time dimension. Dates
are generally stored in the form of a date timestamp column inside the E/R
model. You will observe that date and time-related columns are generally found
in the transaction-based tables.
We explain the process of converting an E/R model to a dimensional model in the
section below.
Example: An E/R model conversion
We convert the E/R model shown in Figure 5-5 to a dimensional model using the
following steps:
1. Identify the business process: We discussed this step in detail in “Identify
the business process from the E/R model” on page 104. The business
process we identified for our example is retail sales. The E/R model for this
retail sales schema is shown in Figure 5-5.
Figure 5-5 E/R model for retail sales business process
2. Identify the many-to-many tables in the E/R model to convert them to fact
tables. After identifying the business process as retail sales, and identifying
the E/R model as shown in Figure 5-5, the next step is to identify the
7. Chapter 5. Modeling considerations 109
many-to-many relationships that exist inside the model. In order to find this,
we must segregate the tables inside the E/R model into two types.
Transaction-based tables and Non-Transaction based tables, as shown in
Table 5-1.
What is a transaction-based table? In an E/R model, it is one which is
generally involved in storing facts and measures about the business. Such
tables generally store foreign keys and facts, such as quantity, sales price,
profit, unit price, and discount. In transaction tables, records are usually
inserted, updated, and deleted as and when the transactions occur. Such
tables also in many ways represent many-to-many relationships between
non-transaction-based tables. Such tables are larger in volume and grow in
size much faster than the non-transaction-based tables.
What is a non-transaction-based table? This is an E/R model which is
generally involved in storing descriptions about the business. Such tables
describe entities such as products, product category, product brand,
customer, employees, regions, locations, services, departments, and
territories. In non-transaction tables, records are usually inserted and there
are fewer updates and deletes. Such tables are far smaller in volume and
grow very slowly in size, compared to the transaction-based tables.
Table 5-1 Transaction and Non-transaction tables in the E/R model
Note: Fact tables in a dimensional model express the many-to-many
relationships between dimensions. This means that the foreign keys in the fact
tables share a many-to-many relationship.
Transaction tables Non-transaction tables
Store_BILLING
[This transaction table stores the
BILL_NUMBER, Store Billing date, and
time information. The detailed Bill
information is contained inside the
Store_Billing_Details table.]
Employees: Stores information about
employees.
Store_Billing_Details
[This transaction table stores the details
for each store bill.]
Department: Stores department
information for every employee.
Store: Stores the name of the store and
type of store information.
Store_Region: Stores the region, county,
state, and country to which the store
belongs.
8. 110 Dimensional Modeling: In a Business Intelligence Environment
The transaction tables are identified in Table 5-1 on page 109, and we depict
them in Figure 5-6 on page 111.
Suppliers: Stores the supplier name and
supplier manager-related information.
Supplier_Type: Stores supplier type
information.
Products: Stores information relating to
products.
Brand: Stores brand information to which
different products belong.
Categories: Stores categories information
to which different product brands belong.
Packaging: Stores packaging-related
information for each product.
Customers: Stores customer-related
information.
Region: Stores different regions to which
customers belong.
Territories: Stores territories to which
different customers belong.
Customer_Type: Stores customer
classification information.
Transaction tables Non-transaction tables
9. Chapter 5. Modeling considerations 111
Figure 5-6 Identifying transaction-based tables in the E/R model
The Store_BILLING and Store_Billing_Details tables express a many-to-many
relationship that exists between Employee, Store, Customers, and Products.
Some of the relationships are explained below:
Each Employee can sell many products and each Product could be sold by
many Employees.
Each Store could sell many Products and each Product could be sold by
many Stores.
Each Customer could buy many products and each product could be bought
by many Customers.
Each Employee could sell to many Customers and each Customer could
purchase from many Employees.
Figure 5-7 on page 112 shows that the Store_BILLING and Store_Billing_Details
transaction tables express many-to-many relationships between the different
non-transaction tables.
Transaction Based Tables
Note: The many-to-many tables in the E/R model are converted to
dimensional model fact tables. The many-to-many tables in the E/R model are
generally the transaction-oriented tables.
10. 112 Dimensional Modeling: In a Business Intelligence Environment
Figure 5-7 Identifying many-to-many relationships in an E/R model
The Store_BILLING and Store_Billing_Details tables are the tables that identify
the many-to-many relationship between Employee, Products, Store, Customer,
and Supplier tables. The Store_BILLING table stores billing details for an order.
After having identified the many-to-many relationships in the E/R model, we are
able to identify the fact table as shown in Figure 5-8.
Figure 5-8 Fact table
The fact table design may change depending upon the grain chosen.
3. Denormalize remaining tables into flat dimension tables: After we
identified the fact table in Step 2, the next step is to take the remaining tables
Employee
Store Customer
Transaction Based Tables
Products
Supplier
Foreign Keys
Degenerate
Dimension
Facts
Note: Store_BILLING and
Store_Billing_Details tables
convert to a Fact table