1. Data Warehousing by Example
A Day at the IPL in Kolkata
Prepared By: RANJAN GANGULI
M.En(CSE), University Institute of Technology
The University of Burdwan, West Bengal, India
Email: ganguliranjan1979@gmail.com
Why?
The purpose of this document is to present a ‘Best Practice’ approach to Data Warehouse design,
based on practical experience, because there is no complete and consistent methodology for designing a
data warehouse.
Getting Started:
Here, I use a trip to watch an IPL cricket match in Kolkata, India, to show how we can apply our approach to
a real-world situation and develop a design for a tailored Dimensional Model. This approach leads to the
implementation of a Reference Data Architecture and the design of a Data Warehouse.
A Canonical Data Model (CDM) is central to this approach, as are the Design Patterns based on the CDM.
Best Practice suggests that when all the steps have been completed, each item produced
should be reviewed and extended or modified as appropriate.
2. Additional Events:
During my trip to Kolkata, the following events take place:
Event-I. Buy an IPL ticket
Event-II. Eat food in a restaurant
Event-III. Watch the IPL match
The Approach:
The Approach is to follow these Steps:
Step 1 – Identify the Events involved
Step 2 – Define a Design Pattern based on the Event-driven Canonical Data Model
Step 3 – Define a Message Format for the data in each Event
Step 4 – Design a 3rd Normal Form Data Warehouse (DWH) and update it for each Event
Step 5 – Define the format for loading data into the DWH for each Message
The reason for all the work that we have done to get to this point is, of course, to produce
Business Intelligence (‘BI’).
3. Canonical Data Model
(A General Format)
Typical Events could be:
i. A Customer makes a Purchase
ii. A Supplier makes a Delivery of Merchandise
Typical Documents could be:
i. A Sales Receipt
ii. A Contract Letter
iii. A Delivery Note
People and Organisations are examples of the Roles played by Parties.
Parties are often shown in Models produced by professional Data Modellers.
In that case, Best Practice usually dictates that Semantic Models are produced to help business users
understand how Customers, and so on, are modelled as Parties and Roles.
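The idea that a Customer is a Party playing a Role in an Event can be made concrete with a small sketch. This is only an illustration under stated assumptions: the class names, field names, and the two example parties ("Example Shopper" and "Example Store") are hypothetical and do not come from the CDM diagram itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Party:
    """A party that can take part in Events (name is illustrative)."""
    name: str


@dataclass
class Role:
    """A Role, such as Customer or Supplier, played by a Party."""
    party: Party
    role_name: str


@dataclass
class Document:
    """A Document produced by an Event, such as a Sales Receipt."""
    doc_type: str


@dataclass
class Event:
    """An Event links the participating Roles to the Documents it produces."""
    description: str
    participants: List[Role] = field(default_factory=list)
    documents: List[Document] = field(default_factory=list)


# Hypothetical example data, not taken from the source.
shopper = Party("Example Shopper")
store = Party("Example Store")

purchase = Event(
    description="A Customer makes a Purchase",
    participants=[Role(shopper, "Customer"), Role(store, "Supplier")],
    documents=[Document("Sales Receipt")],
)

role_names = [r.role_name for r in purchase.participants]
print(role_names)
```

Keeping Party separate from Role means the same Party could appear as a Customer in one Event and as a Supplier in another without being stored twice.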
5. EVENT-I (BUY TICKET FOR IPL)
Activity Performed:
Step-I: Identify the Events:
Event-I. Buy Ticket for IPL Cricket Match
Step-II: The Design Pattern
This shows how the Design Pattern applies to this Event.
(Please see the CDM format above.)
EVENT (PURCHASE A TICKET)
SERVICES (CHECK-IN)
STAFF (PURCHASE A TICKET)
DOCUMENTS
i. Ticket for train/bus
ii. Ticket for IPL
LOCATION (EDEN GARDENS, KOLKATA)
CUSTOMERS
CREDIT CARD
6. Step-III: Message Format
This shows the data items on the Ticket:
EVENT | DATE | LOCATION | PRICE | DETAILS
Purchase a ticket | 30.01.2017 | KOLKATA | 2700 | BLOCK-A, ROW-123
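As a rough sketch, the Message Format for the ticket could be represented as a typed record. The class name, field names, and the `parse_message` helper are hypothetical. The code assumes the date is written as day.month.year and that the price is a whole number in an unstated currency.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class EventMessage:
    """One message with the five columns of the Message Format."""
    event: str
    event_date: date
    location: str
    price: int
    details: str


def parse_message(fields):
    """Build an EventMessage from five raw text fields in column order."""
    event, raw_date, location, raw_price, details = fields
    return EventMessage(
        event=event,
        # Assumption: dates are written as DD.MM.YYYY.
        event_date=datetime.strptime(raw_date, "%d.%m.%Y").date(),
        location=location,
        price=int(raw_price),
        details=details,
    )


ticket = parse_message(
    ["Purchase a ticket", "30.01.2017", "KOLKATA", "2700", "BLOCK-A, ROW-123"]
)
print(ticket.event_date.isoformat(), ticket.price)  # 2017-01-30 2700
```

Converting the date and price at the message boundary means later loading steps can rely on typed values instead of re-parsing text.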
Step-IV: The 3NF Data Warehouse
The benefit of adopting a Third-Normal Form ERD is that it enforces a ‘Single View of the Truth’. If
we adopt a Dimensional Model, it is not so easy to achieve this. This shows the design of the Data
Warehouse (DWH) after the first Event of Purchasing a Ticket.
CUSTOMERS
CUSTOMER SERVICE
DOCUMENT
STAFF
SERVICES
LOCATION
ADDRESS
CREDIT CARD
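One possible way to express these entities as normalised tables is sketched below with Python's built-in sqlite3 module. The entity names come from the list above. The column names, keys, relationships, and the 'Example Customer' row are assumptions made for illustration, not the design shown in the original diagram.

```python
import sqlite3

# Assumed 3NF layout: each fact is stored once and referenced by key.
SCHEMA = """
CREATE TABLE customers    (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE addresses    (address_id INTEGER PRIMARY KEY,
                           customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
                           city TEXT NOT NULL);
CREATE TABLE credit_cards (card_id INTEGER PRIMARY KEY,
                           customer_id INTEGER NOT NULL REFERENCES customers(customer_id));
CREATE TABLE staff        (staff_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE services     (service_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE locations    (location_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE customer_service (
    customer_service_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    service_id  INTEGER NOT NULL REFERENCES services(service_id),
    staff_id    INTEGER REFERENCES staff(staff_id),
    location_id INTEGER NOT NULL REFERENCES locations(location_id),
    card_id     INTEGER REFERENCES credit_cards(card_id),
    event_date  TEXT NOT NULL,
    price       INTEGER NOT NULL
);
CREATE TABLE documents (
    document_id INTEGER PRIMARY KEY,
    customer_service_id INTEGER NOT NULL REFERENCES customer_service(customer_service_id),
    doc_type TEXT NOT NULL,
    details  TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Load the first Event (ticket purchase); the customer name is a placeholder.
conn.execute("INSERT INTO customers VALUES (1, 'Example Customer')")
conn.execute("INSERT INTO services VALUES (1, 'Purchase a ticket')")
conn.execute("INSERT INTO locations VALUES (1, 'KOLKATA')")
conn.execute(
    "INSERT INTO customer_service "
    "(customer_service_id, customer_id, service_id, location_id, event_date, price) "
    "VALUES (1, 1, 1, 1, '2017-01-30', 2700)"
)
conn.execute(
    "INSERT INTO documents VALUES (1, 1, 'Ticket for IPL', 'BLOCK-A, ROW-123')"
)

# Reassemble the message by joining the normalised tables.
row = conn.execute(
    "SELECT s.name, l.name, cs.price, d.details "
    "FROM customer_service cs "
    "JOIN services s ON s.service_id = cs.service_id "
    "JOIN locations l ON l.location_id = cs.location_id "
    "JOIN documents d ON d.customer_service_id = cs.customer_service_id"
).fetchone()
print(row)
```

Because the location and service names are each stored in one place, a correction to either is made once and is seen by every Event that refers to it.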
8. Similarly:
Event 2 – Get Lunch
This shows how we handle the Second Event.
The Design Pattern
This shows how the Design Pattern applies to this Event.
Message Format:
EVENT | DATE | LOCATION | PRICE | DETAILS
Buy Lunch | Date and Time | At Restaurant | Total Price | Rice, Chicken, Dal, etc.
BUY LUNCH
MENU
STAFF
CUSTOMERS
CREDIT CARD
RESTAURANT
SALES RECEIPT
9. DATA WAREHOUSE
Restaurant Data Mart:
This shows the Data Mart for Restaurant data.
Customer Service
Customers
Address
Credit Card
Supplier
Services
Document
Location
10. Event 3 – Watch the IPL Match:
The Design Pattern
This shows how the Design Pattern applies to this Event.
In this case, the Event is the Match between two Teams, and the Outcome is very important.
It is quite common for an Event to have an Outcome, but until now it has not been important enough to
justify appearing at the top level.
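One way to picture an Outcome that is present for some Events and absent for others is an optional field on the Event record. This is a minimal sketch with hypothetical class and field names, and the match result is a placeholder rather than a real result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    name: str
    location: str
    # None means the Outcome is not recorded for this Event.
    outcome: Optional[str] = None


def has_outcome(event: Event) -> bool:
    """Return True when the Event carries an Outcome."""
    return event.outcome is not None


buy_lunch = Event("Buy Lunch", "Restaurant")
# The outcome text is a placeholder, not an actual match result.
watch_match = Event("Watch the IPL", "Eden Gardens", outcome="<match result>")

print(has_outcome(buy_lunch), has_outcome(watch_match))  # False True
```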
Message Format:
EVENT | DATE | LOCATION | PRICE | DETAILS
Watch the IPL | Date and Time | At Eden Gardens | Price of Ticket | Results/Outcome
Competition
Cricket Venue
Staff
Audience (Customers)
Outcome
My Cricket
11. Data Warehouse:
This shows the design of the Data Warehouse (DWH) after the third Event of watching the cricket match.
Data Mart:
This shows the Data Mart for Cricket Competition Results data.
Customer
Customer Service
Documents
Location
Outcome
Services
Supplier
Staff
12. Combined Data Mart:
This shows the three Data Marts:
I. IPL Match Results
II. Restaurant Orders
III. Ticket Sales
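As a rough illustration, the three Data Marts can be thought of as subsets of the warehouse rows, grouped by the Event that produced them. The mapping from Event name to Data Mart, the row layout, and the `build_data_marts` helper are assumptions made for this sketch. Prices not given in the Message Formats are left as `None`.

```python
from collections import defaultdict

# Assumed mapping from Event name to Data Mart name.
MART_FOR_EVENT = {
    "Purchase a ticket": "Ticket Sales",
    "Buy Lunch": "Restaurant Orders",
    "Watch the IPL": "IPL Match Results",
}


def build_data_marts(warehouse_rows):
    """Group warehouse rows into Data Marts keyed by mart name."""
    marts = defaultdict(list)
    for row in warehouse_rows:
        marts[MART_FOR_EVENT[row["event"]]].append(row)
    return dict(marts)


# Illustrative rows; only the ticket price is stated in the source.
warehouse_rows = [
    {"event": "Purchase a ticket", "location": "KOLKATA", "price": 2700},
    {"event": "Buy Lunch", "location": "Restaurant", "price": None},
    {"event": "Watch the IPL", "location": "Eden Gardens", "price": None},
]

marts = build_data_marts(warehouse_rows)
print(sorted(marts))
```

Each Data Mart is derived from the same warehouse rows, so the marts stay consistent with one another as new Events are loaded.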