Data Warehousing by Example
A Day at the IPL in Kolkata
Prepared By: RANJAN GANGULI
M.En (CSE), University Institute of Technology
The University of Burdwan, West Bengal, India
Email: ganguliranjan1979@gmail.com
Why?
The purpose of this document is to present a ‘Best Practice’ approach to Data Warehouse design, drawn from practical experience, because there is no complete and consistent methodology for designing a data warehouse.
Getting Started:
Here, I use a trip to watch an IPL cricket match in Kolkata, India, to show how the approach can be applied to a real-world situation to develop the design of a tailored Dimensional Model. This approach leads to the implementation of a Reference Data Architecture and the design of a Data Warehouse.
Central to this are a Canonical Data Model (CDM) and the Design Patterns based on it.
Best Practice suggests that when all the steps have been completed, each item produced
should be reviewed and extended or modified as appropriate.
Additional Events:
During my trip to Kolkata, the following Events can be added:
Event-I. Buy an IPL ticket
Event-II. Eat food in a restaurant
Event-III. Watch the IPL match
The Approach:
The Approach follows these Steps:
Step 1 - Identify the Events involved.
Step 2 - Define a Design Pattern based on the Event-driven Canonical Data Model.
Step 3 - Define a Message Format for the data in each Event.
Step 4 - Design a Third Normal Form (3NF) Data Warehouse (DWH) and update it for each Event.
Step 5 - Define the format for loading the data from each Message into the DWH.
All of this work has, of course, one purpose: to produce Business Intelligence (BI).
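Steps 3 and 5 can be illustrated with a minimal Python sketch, assuming a message is a flat dictionary holding the five fields this document uses for every Event (EVENT, DATE, LOCATION, PRICE, DETAILS). The function `to_load_records` and the target table names are hypothetical; they only show the idea of splitting one message into rows for several DWH tables.

```python
from __future__ import annotations

# The five message fields used throughout this document.
REQUIRED_FIELDS = ("EVENT", "DATE", "LOCATION", "PRICE", "DETAILS")

# Step 3: one message per Event (values taken from the ticket example).
ticket_message = {
    "EVENT": "Purchase a ticket",
    "DATE": "30.01.2017",
    "LOCATION": "KOLKATA",
    "PRICE": 2700,
    "DETAILS": "BLOCK-A, ROW-123",
}

def to_load_records(message: dict) -> dict[str, list[dict]]:
    """Step 5 (sketch): split one message into rows for hypothetical DWH tables."""
    missing = [f for f in REQUIRED_FIELDS if f not in message]
    if missing:
        raise ValueError(f"message is missing fields: {missing}")
    return {
        "event": [{"event_type": message["EVENT"], "event_date": message["DATE"]}],
        "location": [{"location_name": message["LOCATION"]}],
        "document": [{"price": message["PRICE"], "details": message["DETAILS"]}],
    }

records = to_load_records(ticket_message)
for table, rows in records.items():
    print(table, rows)
```

Rejecting an incomplete message before loading keeps the validation in one place, so a bad message never reaches the 3NF tables.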
Canonical Data Model
(A General Format)
Typical Events could be:
i. A Customer makes a Purchase
ii. A Supplier makes a Delivery of Merchandise
Typical Documents could be:
i. A Sales Receipt
ii. A Contract Letter
iii. A Delivery Note
People and Organisations are types of Party; Customers, Suppliers and Staff are examples of the Roles played by Parties.
Parties often appear in models produced by professional Data Modellers.
In that case, Best Practice usually dictates that Semantic Models are produced to help business users
understand how Customers, Suppliers and so on are modelled as Parties playing Roles.
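A minimal Python sketch of the Party and Role idea: a Party is a Person or an Organisation, and the same Party can play one or more Roles. The class names, enum values and sample parties are illustrative assumptions, not a prescribed model.

```python
from dataclasses import dataclass
from enum import Enum

class PartyType(Enum):
    PERSON = "Person"
    ORGANISATION = "Organisation"

class RoleType(Enum):
    CUSTOMER = "Customer"
    SUPPLIER = "Supplier"
    STAFF = "Staff"

@dataclass(frozen=True)
class Party:
    party_id: int
    name: str
    party_type: PartyType

@dataclass(frozen=True)
class PartyRole:
    """Links a Party to one Role it plays; a Party may have several."""
    party: Party
    role: RoleType

# Hypothetical parties for illustration only.
visitor = Party(1, "Match visitor", PartyType.PERSON)
restaurant = Party(2, "Restaurant", PartyType.ORGANISATION)

roles = [
    PartyRole(visitor, RoleType.CUSTOMER),
    PartyRole(restaurant, RoleType.SUPPLIER),
]

def parties_in_role(roles: list, role: RoleType) -> list:
    """Return the names of all Parties currently playing the given Role."""
    return [r.party.name for r in roles if r.role is role]

print(parties_in_role(roles, RoleType.CUSTOMER))  # ['Match visitor']
```

Keeping the Role separate from the Party means a person who is both a Customer and a member of Staff is still stored once.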
(Figure: Design Pattern based on the Canonical Data Model)
Message Format:
EVENT-I (BUY TICKET FOR IPL)
Activity Performed:
Step-I: Identify the Events:
Event-I. Buy Ticket for IPL Cricket Match
Step-II: The Design Pattern
The following shows how the Design Pattern applies to this Event (see the CDM format above).
EVENT: Purchase a Ticket
SERVICES: Check-in
STAFF: Purchase a Ticket
DOCUMENTS: i. Ticket for train/bus; ii. Ticket for IPL
LOCATION: Eden Gardens, Kolkata
CUSTOMERS
CREDIT CARD
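One way to make the Design Pattern checkable is to encode each Event as a mapping from CDM slots to the items that fill them. This is a sketch under stated assumptions: the slot names come from the Event-I diagram, while `CORE_SLOTS` (the slots every Event is assumed to fill) and `missing_slots` are hypothetical.

```python
from __future__ import annotations

# Event-I design pattern, slot by slot, as listed in the diagram.
EVENT_I_PATTERN = {
    "EVENT": ["Purchase a Ticket"],
    "SERVICES": ["Check-in"],
    "STAFF": ["Purchase a Ticket"],
    "DOCUMENTS": ["Ticket for train/bus", "Ticket for IPL"],
    "LOCATION": ["Eden Gardens, Kolkata"],
    "CUSTOMERS": ["Customer"],
    "CREDIT CARD": ["Credit Card"],
}

# Assumption: these slots must be filled for any Event in this document.
CORE_SLOTS = {"EVENT", "LOCATION", "CUSTOMERS", "DOCUMENTS"}

def missing_slots(pattern: dict[str, list[str]]) -> set[str]:
    """Return the core CDM slots that an Event's pattern leaves empty."""
    return {slot for slot in CORE_SLOTS if not pattern.get(slot)}

print(missing_slots(EVENT_I_PATTERN))  # set()
```

The same check can be run against the patterns for the lunch and cricket Events to confirm they reuse the same core slots.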
Step-III: Message Format
This shows the data items on the Ticket:

EVENT              DATE        LOCATION  PRICE  DETAILS
Purchase a ticket  30.01.2017  KOLKATA   2700   BLOCK-A, ROW-123
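Before loading, the raw ticket message can be parsed into typed values. The sketch below assumes the DATE field uses the DD.MM.YYYY form shown in the table and that DETAILS holds a block and a row; the `TicketMessage` class and `parse_ticket_message` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, datetime

@dataclass(frozen=True)
class TicketMessage:
    event: str
    event_date: date
    location: str
    price: int
    block: str
    row: str

def parse_ticket_message(raw: dict) -> TicketMessage:
    """Convert raw ticket fields into typed values (DD.MM.YYYY dates assumed)."""
    # DETAILS may be written "BLOCK-A, ROW-123" or "BLOCK-A ROW-123".
    block, row = raw["DETAILS"].replace(",", " ").split()
    return TicketMessage(
        event=raw["EVENT"],
        event_date=datetime.strptime(raw["DATE"], "%d.%m.%Y").date(),
        location=raw["LOCATION"],
        price=int(raw["PRICE"]),
        block=block,
        row=row,
    )

raw = {
    "EVENT": "Purchase a ticket",
    "DATE": "30.01.2017",
    "LOCATION": "KOLKATA",
    "PRICE": "2700",
    "DETAILS": "BLOCK-A, ROW-123",
}
msg = parse_ticket_message(raw)
print(msg.event_date, msg.block, msg.row)  # 2017-01-30 BLOCK-A ROW-123
```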
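Step-IV: The 3NF Data Warehouse
The benefit of adopting a Third Normal Form (3NF) ERD is that it enforces a ‘Single View of the Truth’: each fact is stored in exactly one place. With a Dimensional Model this is harder to achieve, because descriptive attributes are usually denormalised into dimension tables and can be repeated across Data Marts. After the first Event, Purchasing a Ticket, the Data Warehouse (DWH) contains the following entities: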
CUSTOMERS
CUSTOMER SERVICE
DOCUMENT
STAFF
SERVICES
LOCATION
ADDRESS
CREDIT CARD
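As a rough sketch, these entities could be expressed as SQLite tables from Python. The table and column names, the `customer_service` table recording each occurrence of an Event, and the sample rows are assumptions for illustration, not the author's schema. Note that a fact such as the city Kolkata is stored once in `address` and reached through keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address     (address_id INTEGER PRIMARY KEY, city TEXT NOT NULL);
CREATE TABLE customer    (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          address_id INTEGER REFERENCES address(address_id));
CREATE TABLE credit_card (card_id INTEGER PRIMARY KEY,
                          customer_id INTEGER NOT NULL REFERENCES customer(customer_id));
CREATE TABLE location    (location_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          address_id INTEGER REFERENCES address(address_id));
CREATE TABLE staff       (staff_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE service     (service_id INTEGER PRIMARY KEY, description TEXT NOT NULL);
CREATE TABLE document    (document_id INTEGER PRIMARY KEY, document_type TEXT NOT NULL,
                          details TEXT);
-- One row per occurrence of an Event: who bought what, where, when, for how much.
CREATE TABLE customer_service (
    customer_service_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    service_id  INTEGER NOT NULL REFERENCES service(service_id),
    staff_id    INTEGER REFERENCES staff(staff_id),
    location_id INTEGER NOT NULL REFERENCES location(location_id),
    document_id INTEGER REFERENCES document(document_id),
    card_id     INTEGER REFERENCES credit_card(card_id),
    event_date  TEXT NOT NULL,
    price       INTEGER NOT NULL
);
""")

# Hypothetical sample rows for the ticket purchase.
conn.executescript("""
INSERT INTO address VALUES (1, 'Kolkata');
INSERT INTO customer VALUES (1, 'Visitor', 1);
INSERT INTO credit_card VALUES (1, 1);
INSERT INTO location VALUES (1, 'Eden Gardens', 1);
INSERT INTO staff VALUES (1, 'Ticket office');
INSERT INTO service VALUES (1, 'Purchase a Ticket');
INSERT INTO document VALUES (1, 'Ticket for IPL', 'BLOCK-A, ROW-123');
INSERT INTO customer_service VALUES (1, 1, 1, 1, 1, 1, 1, '2017-01-30', 2700);
""")

row = conn.execute("""
SELECT c.name, s.description, l.name, cs.price
FROM customer_service cs
JOIN customer c ON c.customer_id = cs.customer_id
JOIN service  s ON s.service_id  = cs.service_id
JOIN location l ON l.location_id = cs.location_id
""").fetchone()
print(row)  # ('Visitor', 'Purchase a Ticket', 'Eden Gardens', 2700)
```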
Step-V: Data Mart
This shows the Data Mart for Ticket Sales.
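A Ticket Sales Data Mart is commonly built as a star schema: a fact table of sales surrounded by dimension tables. The sketch below is one possible shape, assuming date, location and customer dimensions; all table names and the second sale are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, month INTEGER);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, venue TEXT, city TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
-- Grain: one row per ticket purchase.
CREATE TABLE fact_ticket_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    tickets_sold INTEGER,
    sales_amount INTEGER
);
INSERT INTO dim_date VALUES (20170130, '2017-01-30', 2017, 1);
INSERT INTO dim_location VALUES (1, 'Eden Gardens', 'Kolkata');
INSERT INTO dim_customer VALUES (1, 'Visitor A'), (2, 'Visitor B');
INSERT INTO fact_ticket_sales VALUES
    (20170130, 1, 1, 1, 2700),
    (20170130, 1, 2, 2, 5400);
""")

# A typical BI question: tickets and revenue by venue and year.
totals = conn.execute("""
SELECT l.venue, d.year, SUM(f.tickets_sold), SUM(f.sales_amount)
FROM fact_ticket_sales f
JOIN dim_location l ON l.location_key = f.location_key
JOIN dim_date d     ON d.date_key     = f.date_key
GROUP BY l.venue, d.year
""").fetchall()
print(totals)  # [('Eden Gardens', 2017, 3, 8100)]
```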
Event-II: Get Lunch
The second Event is handled in the same way.
The Design Pattern
This shows how the Design Pattern applies to this Event.
Message Format:

EVENT      DATE           LOCATION       PRICE        DETAILS
Buy Lunch  Date and Time  At Restaurant  Total Price  Rice, Chicken, Dal, etc.
BUY LUNCH
MENU
STAFF
CUSTOMERS
CREDIT CARD
RESTAURANT
SALES RECEIPT
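Unlike the ticket, the lunch message has itemised DETAILS, so Total Price can be derived from the line items rather than stored separately. A minimal sketch, in which the class names, the time, and the prices are hypothetical (the source gives only the menu items):

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class LineItem:
    menu_item: str
    quantity: int
    unit_price: int

@dataclass
class LunchMessage:
    event: str
    event_time: datetime
    restaurant: str
    items: list[LineItem] = field(default_factory=list)

    @property
    def total_price(self) -> int:
        # Derived from the itemised DETAILS, so total and items cannot disagree.
        return sum(item.quantity * item.unit_price for item in self.items)

# Hypothetical time and prices for illustration only.
lunch = LunchMessage(
    event="Buy Lunch",
    event_time=datetime(2017, 1, 30, 13, 0),
    restaurant="Restaurant",
    items=[LineItem("Rice", 1, 60), LineItem("Chicken", 1, 180), LineItem("Dal", 1, 50)],
)
print(lunch.total_price)  # 290
```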
DATA WAREHOUSE
Restaurant Data Mart:
This shows the Data Mart for Restaurant data.
Customer Service
Customers
Address
Credit Card
Supplier
Services
Document
Location
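The key design choice for a Restaurant Data Mart is its grain. The sketch below assumes one fact row per menu item on a Sales Receipt, which allows questions such as revenue by menu item. Table names and all rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_menu_item  (menu_item_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_restaurant (restaurant_key INTEGER PRIMARY KEY, name TEXT, city TEXT);
-- Grain: one row per menu item on one Sales Receipt.
CREATE TABLE fact_order_line (
    receipt_no     INTEGER,
    restaurant_key INTEGER REFERENCES dim_restaurant(restaurant_key),
    menu_item_key  INTEGER REFERENCES dim_menu_item(menu_item_key),
    quantity       INTEGER,
    amount         INTEGER
);
INSERT INTO dim_menu_item VALUES (1, 'Rice'), (2, 'Chicken'), (3, 'Dal');
INSERT INTO dim_restaurant VALUES (1, 'Restaurant', 'Kolkata');
INSERT INTO fact_order_line VALUES
    (1, 1, 1, 1, 60), (1, 1, 2, 1, 180), (1, 1, 3, 1, 50),
    (2, 1, 2, 2, 360);
""")

# Revenue and quantity by menu item, best seller first.
rows = conn.execute("""
SELECT m.name, SUM(f.quantity), SUM(f.amount)
FROM fact_order_line f
JOIN dim_menu_item m ON m.menu_item_key = f.menu_item_key
GROUP BY m.name
ORDER BY SUM(f.amount) DESC
""").fetchall()
print(rows)  # [('Chicken', 3, 540), ('Rice', 1, 60), ('Dal', 1, 50)]
```

A coarser grain (one row per receipt) would be smaller but could no longer answer per-item questions.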
Event-III: Watch the IPL Match
The Design Pattern
This shows how the Design Pattern applies to this Event.
In this case, the Event is the Match between two Teams, and its Outcome is very important.
It is quite common for an Event to have an Outcome, but until now the Outcome has not been important enough to
appear at the top level of the Design Pattern.
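The change can be sketched in Python by giving every Event an optional Outcome, which only the match fills in. The class names, attributes, teams and result are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Outcome:
    """Result of an Event, e.g. the winner of a cricket match."""
    winner: str
    summary: str

@dataclass(frozen=True)
class Event:
    event_type: str
    location: str
    # Buying a ticket or lunch has no Outcome, so it is optional.
    outcome: Optional[Outcome] = None

events = [
    Event("Purchase a Ticket", "Kolkata"),
    Event("Buy Lunch", "Restaurant"),
    # Hypothetical teams and result, for illustration only.
    Event("Watch the IPL Match", "Eden Gardens",
          Outcome(winner="Team A", summary="Team A beat Team B")),
]

def events_with_outcome(events: list) -> list:
    """Return only the Events that carry an Outcome."""
    return [e for e in events if e.outcome is not None]

for e in events_with_outcome(events):
    print(e.event_type, "->", e.outcome.winner)  # Watch the IPL Match -> Team A
```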
Message Format:

EVENT          DATE           LOCATION         PRICE            DETAILS
Watch the IPL  Date and Time  At Eden Gardens  Price of Ticket  Results/Outcome
Competition
Cricket Venue
Staff
Audience (Customers)
Outcome
My Cricket
Data Warehouse:
This shows the design of the Data Warehouse (DWH) after the third Event, watching the cricket match.
Data Mart:
This shows the Data Mart for Cricket Competition Results data.
Customer
Customer Service
Documents
Location
Outcome
Services
Supplier
Staff
Staff
Staff
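With Outcome at the top level, a Cricket Results Data Mart can record the winner of each Match directly in its fact table. The sketch assumes one fact row per Match; the table names, teams and results are hypothetical sample data, not real IPL results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_team  (team_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_venue (venue_key INTEGER PRIMARY KEY, name TEXT, city TEXT);
-- Grain: one row per Match (the Event); the Outcome is the winning team.
CREATE TABLE fact_match_result (
    match_id        INTEGER PRIMARY KEY,
    venue_key       INTEGER REFERENCES dim_venue(venue_key),
    team1_key       INTEGER REFERENCES dim_team(team_key),
    team2_key       INTEGER REFERENCES dim_team(team_key),
    winner_team_key INTEGER REFERENCES dim_team(team_key)
);
INSERT INTO dim_team VALUES (1, 'Team A'), (2, 'Team B'), (3, 'Team C');
INSERT INTO dim_venue VALUES (1, 'Eden Gardens', 'Kolkata');
INSERT INTO fact_match_result VALUES
    (1, 1, 1, 2, 1),
    (2, 1, 2, 3, 3),
    (3, 1, 1, 3, 1);
""")

# Wins per team; LEFT JOIN keeps teams with no wins.
wins = conn.execute("""
SELECT t.name, COUNT(f.match_id) AS wins
FROM dim_team t
LEFT JOIN fact_match_result f ON f.winner_team_key = t.team_key
GROUP BY t.team_key, t.name
ORDER BY wins DESC, t.name
""").fetchall()
print(wins)  # [('Team A', 2), ('Team C', 1), ('Team B', 0)]
```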
Combined Data Mart:
This shows the three Data Marts:
I. IPL Match Results
II. Restaurant Orders
III. Ticket Sales
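Combining Data Marts usually relies on conformed dimensions: a dimension such as Customer that every mart shares. The sketch below shows one common technique, drill-across, where each fact table is aggregated separately and the results are joined on the shared dimension. The tables and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conformed dimension shared by every Data Mart.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_ticket_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key), sales_amount INTEGER);
CREATE TABLE fact_order_line (
    customer_key INTEGER REFERENCES dim_customer(customer_key), amount INTEGER);
INSERT INTO dim_customer VALUES (1, 'Visitor A'), (2, 'Visitor B');
INSERT INTO fact_ticket_sales VALUES (1, 2700), (2, 5400);
INSERT INTO fact_order_line VALUES (1, 60), (1, 180), (1, 50), (2, 360);
""")

# Drill-across: aggregate each fact table on its own, then join on the
# conformed dimension, so rows from one fact never multiply the other.
rows = conn.execute("""
WITH tickets AS (
    SELECT customer_key, SUM(sales_amount) AS ticket_spend
    FROM fact_ticket_sales GROUP BY customer_key
), food AS (
    SELECT customer_key, SUM(amount) AS food_spend
    FROM fact_order_line GROUP BY customer_key
)
SELECT c.name, COALESCE(t.ticket_spend, 0), COALESCE(f.food_spend, 0)
FROM dim_customer c
LEFT JOIN tickets t ON t.customer_key = c.customer_key
LEFT JOIN food    f ON f.customer_key = c.customer_key
ORDER BY c.name
""").fetchall()
print(rows)  # [('Visitor A', 2700, 290), ('Visitor B', 5400, 360)]
```

Joining the two fact tables directly before summing would repeat each ticket sale once per lunch line item and overstate ticket spend; aggregating first avoids that.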
