1. A Business Intelligence & Data Warehousing Journey
Presented by:
Stefanie Boros - Shradha Salian - Saniya Shukla - Sarah Yousef
2. Project Description
C-flat has sat on the back burner while its sales have increased exponentially
They do not know what inventory levels to keep
They have no knowledge of the trends present in customer orders
They feel that if they could visualize the data, they could market this product effectively and take a proactive approach
3. Members
Name | Primary Role | Secondary Role
Stefanie | Project Manager, Business Analyst | BI Specialist
Sarah | Technical Architect, Data Integration Specialist | Project Manager, Data Analyst
Shradha | Data Integration Specialist, Data Analyst |
Saniya | BI Specialist | Business Analyst
4. Data Sources and Challenges
Salesforce: Sales Data
Inconsistencies in how data was entered
Took time to understand attributes
Access: Yield/production data
Had to understand manufacturing process
Access: Parts data
Needed to reconcile parts in Access vs website
Excel: Parts pricing data
Missing information
6. Data Integration Mapping
[Diagram: Source Data (Access 'Yield', Access 'Parts', Price List Excel file, Salesforce) flows through a Staging Area and a Data Store to Data Delivery. Step labels: Extract, Reconcile, Load, Manual Cleansing, Integrate, Transform, Load.]
7. Business Questions
Production:
What is the trend in yields per part?
Is there a trend in production by time of year?
Sales:
What trends are there in the sales of parts?
Is there a seasonality to the sales?
Is there a trend in sales by region of the world?
Customer:
Who are the top customers?
Does an individual customer tend to buy at a particular time or at regular intervals?
What does an individual customer tend to buy?
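As a rough illustration of how a question such as "Who are the top customers?" could be answered from the warehouse, here is a sketch using Python's built-in sqlite3 module. It assumes a simplified version of the snowflaked sales schema (SALES_FACT to an opportunity dimension to ACCOUNT_DIM); OPPORTUNITY_DIM and all key columns are hypothetical names, and the rows are made-up toy data, not Protochips data.

```python
import sqlite3

# Hypothetical, minimal version of the snowflaked sales schema:
# SALES_FACT -> OPPORTUNITY_DIM -> ACCOUNT_DIM. Key columns are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ACCOUNT_DIM (Account_Key INTEGER PRIMARY KEY, Account_Name TEXT, Region TEXT);
CREATE TABLE OPPORTUNITY_DIM (Opportunity_Key INTEGER PRIMARY KEY,
                              Account_Key INTEGER REFERENCES ACCOUNT_DIM(Account_Key));
CREATE TABLE SALES_FACT (Opportunity_Key INTEGER REFERENCES OPPORTUNITY_DIM(Opportunity_Key),
                         Total_Sales_Dollar_Amount REAL);
-- Toy rows for illustration only, not Protochips data.
INSERT INTO ACCOUNT_DIM VALUES (1, 'Lab A', 'Americas'), (2, 'Lab B', 'EMEA'), (3, 'Lab C', 'Asia');
INSERT INTO OPPORTUNITY_DIM VALUES (10, 1), (11, 1), (12, 2), (13, 3);
INSERT INTO SALES_FACT VALUES (10, 500.0), (11, 700.0), (12, 900.0), (13, 300.0);
""")

# "Who are the top customers?": rank accounts by total sales dollars.
TOP_CUSTOMERS_SQL = """
SELECT a.Account_Name, SUM(f.Total_Sales_Dollar_Amount) AS Total_Sales
FROM SALES_FACT f
JOIN OPPORTUNITY_DIM o ON o.Opportunity_Key = f.Opportunity_Key
JOIN ACCOUNT_DIM a ON a.Account_Key = o.Account_Key
GROUP BY a.Account_Name
ORDER BY Total_Sales DESC
LIMIT ?
"""
top = conn.execute(TOP_CUSTOMERS_SQL, (2,)).fetchall()
print(top)  # [('Lab A', 1200.0), ('Lab B', 900.0)]
```

The other customer questions follow the same pattern: join the fact table to the relevant dimensions, then group by the attribute being asked about (for example, month from the sales time dimension for seasonality).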
9. Learning and Experiences
Applying concepts to a real business
What bad data looks like
Data integration takes the most time!
Reconciling fact tables on row counts alone is not enough!
New Tools: Talend and Qlikview
Team Dynamics
34. ACCOUNT_DIM
Reconciliation Criteria | Source | Source Function | Target | Target Function
Total Number of Unique Rows (Distinct) | 89 | Remove Duplicates on column Account_Name | 89 | SELECT COUNT(*) FROM ACCOUNT_DIM
Total Number of Unique Rows with Region Americas | 58 | Remove Duplicates on column Account_Name; COUNTIF(C2:C90,"Americas") | 58 | SELECT COUNT(*) FROM ACCOUNT_DIM WHERE Region = 'Americas'
Total Number of Unique Rows with Region Asia | 11 | Remove Duplicates on column Account_Name; COUNTIF(C2:C90,"Asia") | 11 | SELECT COUNT(*) FROM ACCOUNT_DIM WHERE Region = 'Asia'
Total Number of Unique Rows with Region EMEA | 20 | Remove Duplicates on column Account_Name; COUNTIF(C2:C90,"EMEA") | 20 | SELECT COUNT(*) FROM ACCOUNT_DIM WHERE Region = 'EMEA'
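The ACCOUNT_DIM checks can also be scripted so they are repeatable. This is a minimal sketch using Python's standard library. It assumes a toy extract of (Account_Name, Region) pairs and a simplified ACCOUNT_DIM with only those two columns; the helper names are hypothetical. Deduplicating on Account_Name stands in for Excel's Remove Duplicates, and a Counter over regions stands in for COUNTIF.

```python
import sqlite3
from collections import Counter

# Toy source extract of (Account_Name, Region); not Protochips data.
source_rows = [
    ("Lab A", "Americas"),
    ("Lab A", "Americas"),   # duplicate entry that Remove Duplicates would drop
    ("Lab B", "EMEA"),
    ("Lab C", "Asia"),
]

def dedupe_on_name(rows):
    """Keep the first row per Account_Name (mirrors Remove Duplicates on that column)."""
    seen, unique = set(), []
    for name, region in rows:
        if name not in seen:
            seen.add(name)
            unique.append((name, region))
    return unique

def source_counts(rows):
    """Distinct-row count overall and per region (the COUNTIF side)."""
    unique = dedupe_on_name(rows)
    counts = Counter(region for _, region in unique)
    counts["ALL"] = len(unique)
    return counts

def target_counts(conn, regions):
    """Same counts taken from the loaded ACCOUNT_DIM table."""
    counts = {"ALL": conn.execute("SELECT COUNT(*) FROM ACCOUNT_DIM").fetchone()[0]}
    for region in regions:
        counts[region] = conn.execute(
            "SELECT COUNT(*) FROM ACCOUNT_DIM WHERE Region = ?", (region,)).fetchone()[0]
    return counts

# Hypothetical target table, loaded from the deduplicated source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCOUNT_DIM (Account_Name TEXT PRIMARY KEY, Region TEXT)")
conn.executemany("INSERT INTO ACCOUNT_DIM VALUES (?, ?)", dedupe_on_name(source_rows))

regions = ["Americas", "Asia", "EMEA"]
src = source_counts(source_rows)
tgt = target_counts(conn, regions)
mismatches = {k: (src[k], tgt[k]) for k in ["ALL", *regions] if src[k] != tgt[k]}
print(mismatches or "ACCOUNT_DIM reconciles")
```

In the reconciliation table, the three regional counts (58 + 11 + 20) add up to the 89 unique accounts, which is a useful extra check: it confirms that every account row carries one of the three regions.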
35. SALES_TIME_DIM
Reconciliation Criteria | Source | Source Function | Target | Target Function
Total Number of Rows | 1126 | COUNT(A2:A1127) | 1126 | SELECT COUNT(*) FROM SALES_TIME_DIM
Max Date | 10/31/15 (MM/DD/YY) | MAX(A2:A1127) | 31/10/2015 (DD/MM/YYYY) | SELECT MAX(Sales_Date) FROM SALES_TIME_DIM
Min Date | 10/01/12 (MM/DD/YY) | MIN(A2:A1127) | 01/10/2012 (DD/MM/YYYY) | SELECT MIN(Sales_Date) FROM SALES_TIME_DIM
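The source and target columns show the same dates in two display formats (MM/DD/YY in the spreadsheet, DD/MM/YYYY in the target), which is easy to misread. A small Python sketch that parses both before comparing. It also shows that the min and max dates span 1126 days inclusive, the same as the row count, which is consistent with one row per calendar day; whether SALES_TIME_DIM was actually built that way is an assumption.

```python
from datetime import datetime

# Display formats inferred from the table: the spreadsheet shows MM/DD/YY,
# the target query result is shown as DD/MM/YYYY.
SOURCE_FMT = "%m/%d/%y"
TARGET_FMT = "%d/%m/%Y"

def parse(value: str, fmt: str):
    """Turn a displayed date string into a datetime.date for comparison."""
    return datetime.strptime(value, fmt).date()

src_min, src_max = parse("10/01/12", SOURCE_FMT), parse("10/31/15", SOURCE_FMT)
tgt_min, tgt_max = parse("01/10/2012", TARGET_FMT), parse("31/10/2015", TARGET_FMT)

# Same calendar dates, just displayed differently.
dates_match = (src_min, src_max) == (tgt_min, tgt_max)

# Assumption: SALES_TIME_DIM has one row per calendar day, so the expected
# row count is the inclusive number of days between the min and max dates.
expected_rows = (src_max - src_min).days + 1
print(dates_match, src_min, src_max, expected_rows)  # True 2012-10-01 2015-10-31 1126
```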
36. SALES_FACT
Reconciliation Criteria | Source | Source Function | Target | Target Function
Total Number of Rows | 1021 | COUNT(G2:G1022) | 1021 | SELECT COUNT(*) FROM SALES_FACT
Sum of Total_Price (original source) vs. sum of Total_Sales_Dollar_Amount (target) | 1229631 | SUM(J2:J1022) | 1229631 | SELECT ROUND(SUM(Total_Sales_Dollar_Amount), 0) FROM SALES_FACT
Sum of Quantity_Pack (intermediate) vs. sum of Quantity_Pack (target) | 4271 | SELECT SUM(Quantity_Pack) FROM SALES_CONVERSION | 4271 | SELECT SUM(Quantity_Pack) FROM SALES_FACT
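These three SALES_FACT checks can be bundled into one repeatable script. A minimal sketch with Python's sqlite3 and toy rows (not Protochips data); Order_Line is a hypothetical key, and the in-memory tables only approximate SALES_CONVERSION and SALES_FACT. The toy data deliberately has matching row counts and dollar totals but a wrong quantity on one line, illustrating why reconciling on row counts alone is not enough.

```python
import sqlite3

# Toy rows, not Protochips data. Order_Line is a hypothetical key column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SALES_CONVERSION (Order_Line INTEGER, Quantity_Pack INTEGER);
CREATE TABLE SALES_FACT (Order_Line INTEGER, Quantity_Pack INTEGER,
                         Total_Sales_Dollar_Amount REAL);
INSERT INTO SALES_CONVERSION VALUES (1, 2), (2, 5), (3, 1);
-- Same number of rows as the source, but line 3 carries the wrong quantity.
INSERT INTO SALES_FACT VALUES (1, 2, 100.40), (2, 5, 250.25), (3, 3, 49.60);
""")

# Stands in for the spreadsheet's Total_Price column (SUM(J2:J1022) in the table).
source_total_price = [100.40, 250.25, 49.60]

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

checks = {
    "row count": (len(source_total_price), scalar("SELECT COUNT(*) FROM SALES_FACT")),
    "total dollars": (round(sum(source_total_price)),
                      scalar("SELECT ROUND(SUM(Total_Sales_Dollar_Amount), 0) FROM SALES_FACT")),
    "quantity packs": (scalar("SELECT SUM(Quantity_Pack) FROM SALES_CONVERSION"),
                       scalar("SELECT SUM(Quantity_Pack) FROM SALES_FACT")),
}
for name, (source, target) in checks.items():
    status = "OK" if source == target else "MISMATCH"
    print(f"{name}: source={source} target={target} {status}")
```

Here the row count and dollar total agree, while the quantity check flags the bad line; any single check on its own could miss it.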
Editor's Notes
Good evening, and thank you to our special guests for joining us! I'm Stefanie, and I'm here with my teammates Sarah, Shradha and Saniya. Tonight, we're going to share our Business Intelligence and Data Warehousing journey with you.
Protochips is a small company in North Carolina that develops analytical tools for scanning and transmission electron microscopes. They work with clients all over the world and have products in over 25 countries.
A product that has taken second place to their main, durable systems is a consumable: C-flat Holey Carbon Grids. The product was first sold in 2012, and its sales have soared unexpectedly since then. David, the CEO, decided it was time to start figuring out what trends exist in the production and sales of this product so that the company can become more proactive with inventory and marketing, instead of just reacting to orders as they come in.
At this time, Protochips does not use any Business Intelligence (BI) tools. They cannot easily see trends with their current setup of Excel spreadsheets and Salesforce reports. They are looking for a BI solution that lets them view the data easily and from different angles so that they can learn buying patterns and production trends. We also worked with Angela, the director of operations at Protochips, and had both her and David's enthusiastic support.
Stefanie: My main role was Project Manager, but I was also the link between the team and Protochips. I collected the data and contacted David and Angela several times throughout the project to understand the data and the process, and to verify that we were on the right track at every step. I collaborated with every member of the team to keep each part of the project in line with what Protochips was looking for, including working in Qlikview to create the BI tool the organization will use.
Sarah: Sarah developed the data models for the target tables, created the corresponding tables in Microsoft SQL Server, and wrote their data dictionaries. She was also responsible for the source-to-target mappings, learning Talend, and loading the data into the target tables. She also worked on the reconciliation document.
Shradha: Shradha created a database for the source tables and learned Talend's different functionalities and components. She also worked on understanding the data sources, the relationships among them, and how different operations could be carried out on them. She contributed to cleansing the QuickBooks data source and worked on creating the validation rules for all of the data sources.
Saniya: Saniya worked on the BI side of the project, translating the data sets into charts, graphs, and tables that address the business requirement questions and presenting them as effectively as possible. Before building the visualizations in the BI tool, she also created the wireframes and storyboards.
We ended up using Salesforce data, which housed all of the customer order details. There were many inconsistencies in how the data was entered, and it took us a long time (and several phone calls and emails with Protochips) to understand the attributes.
Two of our data sources came from Protochips' Access database. The yield and production data represented all of the production runs of each part for the time period we looked at. One challenge with this data was understanding the manufacturing process itself: it is very technical, but we needed to understand how it all worked so that we could represent answers to the business questions effectively.
The other Access data source was the list of part numbers. This data needed to be reconciled with the part price list, and we also had to create an intermediate table to piece together the part numbers in a consistent way.
Finally, we had an Excel spreadsheet with the prices for each part. This list was missing information and needed to be manually completed.
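One way to approach the parts reconciliation described here: normalize part numbers into an intermediate mapping, then flag Access parts that have no entry in the price list. This is a minimal Python sketch; the part numbers are placeholders, and the normalization rule (trim whitespace, upper-case) is an assumption, since the actual inconsistencies in Protochips' part numbers are not described here.

```python
# Placeholder part numbers; real Protochips part formats are not shown in the slides.
access_parts = ["p-100 ", "P-200", "P-300"]
raw_price_list = {"P-100": 85.0, " p-200": 90.0}   # incomplete price list

def normalize(part_no: str) -> str:
    """Assumed cleanup rule: strip surrounding whitespace and upper-case."""
    return part_no.strip().upper()

# Intermediate table: consistently formatted part number -> original Access value.
intermediate = {normalize(p): p for p in access_parts}
prices = {normalize(p): price for p, price in raw_price_list.items()}

# Parts that still need a price filled in manually.
missing_price = sorted(set(intermediate) - set(prices))
print(missing_price)  # ['P-300']
```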
Our dimensional model has two fact tables and five dimension tables. We have two fact tables because there are two distinct areas: Yield and Sales. Two dimension tables connect to the yield fact table (time and parts), and two connect to the sales fact table (time and Opportunity). Opportunity then snowflakes out to a third sales dimension, Account.
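The dimensional model can be sketched as DDL. This is a hypothetical reconstruction, not the team's actual schema: ACCOUNT_DIM, SALES_TIME_DIM and SALES_FACT (with Account_Name, Region, Sales_Date, Quantity_Pack and Total_Sales_Dollar_Amount) appear in the reconciliation slides, while YIELD_FACT, YIELD_TIME_DIM, PARTS_DIM, OPPORTUNITY_DIM and all key and yield columns are assumed names. It runs against SQLite through Python's sqlite3 module rather than Microsoft SQL Server.

```python
import sqlite3

# Hypothetical sketch: two fact tables, five dimensions. Names not shown in the
# slides (YIELD_*, PARTS_DIM, OPPORTUNITY_DIM, all *_Key columns) are assumptions.
DDL = """
CREATE TABLE YIELD_TIME_DIM  (Yield_Time_Key INTEGER PRIMARY KEY, Production_Date TEXT);
CREATE TABLE PARTS_DIM       (Part_Key INTEGER PRIMARY KEY, Part_Number TEXT);
CREATE TABLE YIELD_FACT      (Yield_Time_Key INTEGER REFERENCES YIELD_TIME_DIM,
                              Part_Key INTEGER REFERENCES PARTS_DIM,
                              Yield_Quantity INTEGER);
CREATE TABLE SALES_TIME_DIM  (Sales_Time_Key INTEGER PRIMARY KEY, Sales_Date TEXT);
CREATE TABLE ACCOUNT_DIM     (Account_Key INTEGER PRIMARY KEY, Account_Name TEXT, Region TEXT);
-- Snowflake: Opportunity points to Account; Account does not join the fact directly.
CREATE TABLE OPPORTUNITY_DIM (Opportunity_Key INTEGER PRIMARY KEY,
                              Account_Key INTEGER REFERENCES ACCOUNT_DIM);
CREATE TABLE SALES_FACT      (Sales_Time_Key INTEGER REFERENCES SALES_TIME_DIM,
                              Opportunity_Key INTEGER REFERENCES OPPORTUNITY_DIM,
                              Quantity_Pack INTEGER,
                              Total_Sales_Dollar_Amount REAL);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
facts = [t for t in tables if t.endswith("_FACT")]
dims = [t for t in tables if t.endswith("_DIM")]
print(len(facts), len(dims))  # 2 5
```

Keeping separate time dimensions for yield and sales matches the count of five dimensions described above; a single shared date dimension would be a common alternative design.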
Our source data came from different systems in different formats, and some of it required cleansing, which was done manually.
After that, we pulled all of the datasets into Talend and performed transformations on some of them.
We loaded our dimension tables, reconciled them, and then did the mappings.
Finally, with the data loaded into the fact tables, it was time to reconcile the fact tables.
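The rule that dimension tables must be loaded before facts follows from referential integrity. A minimal sketch in Python's sqlite3 with hypothetical PARTS_DIM and YIELD_FACT tables: with foreign keys enforced, a fact row that refers to a missing part is rejected, and the same row loads once the dimension row exists. Talend jobs and SQL Server are not modeled here.

```python
import sqlite3

# Hypothetical tables; foreign-key enforcement is what forces the load order.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE PARTS_DIM (Part_Key INTEGER PRIMARY KEY, Part_Number TEXT);
CREATE TABLE YIELD_FACT (Part_Key INTEGER NOT NULL REFERENCES PARTS_DIM(Part_Key),
                         Yield_Quantity INTEGER);
""")

def load_fact():
    """Insert one fact row that points at Part_Key 1."""
    conn.execute("INSERT INTO YIELD_FACT VALUES (1, 40)")

try:
    load_fact()                  # fact first: the parent part row does not exist yet
    fact_first_ok = True
except sqlite3.IntegrityError:
    fact_first_ok = False

conn.execute("INSERT INTO PARTS_DIM VALUES (1, 'P-100')")   # load the dimension first
load_fact()                                                 # now the fact row is accepted
fact_rows = conn.execute("SELECT COUNT(*) FROM YIELD_FACT").fetchone()[0]
print(fact_first_ok, fact_rows)  # False 1
```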
Then came the fun part! We connected Qlikview to our database, pulled in the data, and created visualizations that answer the specific questions raised by the client.
We focused on three main areas for the business questions: Production trends, Sales trends, and Customer order trends. Let’s take a look at the demonstration of the BI tool and how it answers these questions.
This project was, hands down, the most comprehensive project any of us has encountered in our MSIS careers. It was very time-consuming and a bit stressful at times, but it was also an amazing and relevant learning experience. Not only did we learn new tools, but we were also able to apply the theory learned in class to an actual project. We learned that data can be really ugly! We confirmed that data cleansing and integration most definitely take the most time! We greatly underestimated how long reconciliation would take and learned the importance of that step. And, of course, we can say that you definitely have to load the dimension tables before the facts.
One other important takeaway from this project was a lesson in teamwork. It was vital that we all pulled our weight, utilized our greatest strengths, and stepped out of our comfort zones to gain immeasurable experience. We are so excited to present our work to you all and Protochips is looking forward to seeing our final product too!