Data Warehouse on Retail Store
By: Siddharth Chaudhary
X16137001
MSc in Data Analytics
National College of Ireland
Table of Contents
Introduction
Data Sources
    Data Source 1
    Data Source 2
    Data Source 3
Data Warehouse Design and Architecture
Design of Data Warehouse
    Dim_Customer
    Dim_Product
    Dim_Location
    Dim_Source
    Dim_Month
    Fact_Table
    Star Schema of Project
Extract Transform Load (ETL) process
    Extraction
    Transformation
    Loading
    Deploying the CUBE
Business Analytics
    Case Study 1
    Case Study 2
    Case Study 3
    Case Study 4
Conclusion
Introduction:
Fifteen years ago, information technology gave the world a gift: e-commerce. Since then, businesses
large and small have used it to improve their outreach, customer count, sales, profit and almost every
other aspect of their operations. But this was not sufficient. As data grew from megabytes to gigabytes
to petabytes, these businesses felt a need to store the data efficiently and to use it to improve
various aspects of the business.
One such domain is retail, where customers and products are the key aspects. Which product is needed by
what type of customer, and when, are the key questions of a retail business. If they are answered well,
they can take a retail business to new heights. A data warehouse plays an important role in answering
these questions. It helps to analyze the key aspects that improve the sales of retail stores. To know
what customers buy and in which season, we need to look over the whole data. So we first need to collect
the whole historical data in one place in a standard format. This is done by preparing a data warehouse.
Many platforms support this, such as Teradata, Netezza, Oracle and Hadoop. Once the warehouse is
prepared, we can use its data in many ways to answer a wide range of queries. In this project I have
simulated the preparation of a real-world data warehouse and used it to answer business queries.
Data Sources:
Data is the basic requirement of any data warehouse. Data for this data warehouse is collected from
three different datasets. The first is a global supermarket dataset, from which I took the data of
five stores in different locations in the USA for the year 2012. The second is the revenue collected
by each store in each month. The third indicates which month falls in which season in the USA.
The three datasets are easily joined because each of them contains the same month_id column, which
is used in the SQL lookup query to populate data in the fact table, as shown in Fig.12.
Data Source 1-
This dataset was fetched from www.Kaggle.com. Kaggle is a repository of thousands of datasets.
This dataset contains supermarket data from across the globe. The data used in this data warehouse
covers five states of the USA: New York, New Jersey, New Hampshire, Utah and Texas.
Link of the dataset:
https://kaggle2.blob.core.windows.net/datasets/1048/1903/global_superstore_2016.xlsx.zip?sv=2015-12-11&sr=b&sig=V6MbJAh5QVwQC8wLLiPrsC8dKochxZ354VLclEnFuWM%3D&se=2017-04-07T08%3A21%3A15Z&sp=r
Data Source 2-
This is a dummy dataset generated with Mockaroo. It contains the revenue of each month for each
state.
Data Source 3-
This is the unstructured dataset, which I scraped from the site:
https://www.englishclub.com/vocabulary/time-months-of-year.htm
The data was loaded into Excel, where it initially looked as shown in Fig.3.1. It was then cleaned
and structured as shown in Fig.3. This dataset contains the seasons of the USA.
Fig.1
Fig.2
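The cleaning step for Data Source 3 can be sketched in Python as follows. This is only an
illustration: the month-to-season mapping assumes the common Northern Hemisphere convention
(December to February as winter, and so on), and the field names month_id, month_name and season
are hypothetical, not taken from the scraped page or the project files.

```python
# Month names in calendar order; position + 1 gives the month_id.
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def season_for(month_id: int) -> str:
    """Return the season for a month number (1-12).

    Assumption: December-February is winter, March-May spring,
    June-August summer and September-November autumn.
    """
    if month_id in (12, 1, 2):
        return "Winter"
    if month_id in (3, 4, 5):
        return "Spring"
    if month_id in (6, 7, 8):
        return "Summer"
    return "Autumn"


# One structured row per month, ready to be written to a sheet or staging table.
season_rows = [
    {"month_id": i, "month_name": name, "season": season_for(i)}
    for i, name in enumerate(MONTH_NAMES, start=1)
]

print(season_rows[5])
```

Each row carries a month_id, so the result can be joined to the other two datasets on that column.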
Data Warehouse Design and Architecture:
Kimball’s approach is used to build this data warehouse, so that the retail stores in different
states of the USA can be analyzed: how much revenue is generated, and how much product is sold in
which month and in which season.
Design Tools for this Data Warehouse:-
● SQL Server Management Studio
● SQL Server Integration Services
● SQL Server Analysis Services
I have followed Kimball’s architecture, which consists of the following steps:-
• Identification of the Business Process:- We need to define the main processes of the business,
such as acquiring customers, acquiring products and then the sale process. We also need to
understand at what level the sales data is summarized: daily, weekly or monthly. This step
helps in determining the entities and their relationships as per the business requirement. Later
on, these entities become the dimensions of the business. The most important entities are
Customer, Product, Location and Time.
• Defining the Grain:- The grain is the level of detail at which we store the data for these
dimensions. It defines the granularity of the system. In this project we store the sales
of each product at month level.
• Defining the Dimensions:- Once the entities and the grain are decided, we can decide the
dimensions. This dataset contains five dimensions -
Dimension Name    Primary Key     Example
Customer          Customer_Key    Sam
Product           Product_Key     Jeans
Location          Location_Key    Chicago
Season            Season_Key      Summer
Month             Month_Key       June
Table -1
These dimensions contain descriptive and textual data.
• Deciding the Facts of the Data Warehouse:- The fact table defines the measurable data we are
going to store for the dimensions. It is the pivot of the star schema and contains the primary
keys of all the dimensions together with the measurable quantities used to carry out business
queries. The fact data is designed so that it helps in identifying who our regular customers
are, how to improve the retail business given that product sales vary by season, how much
revenue is generated in which state and, last but not least, which product sells the most.
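The idea of a month-level grain can be illustrated with a small Python sketch. It rolls
individual sale records up to one total per product per month. The records, dates and
quantities are made up for illustration and are not taken from the project data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sale records: (sale date, product name, quantity sold).
sales = [
    (date(2012, 6, 3), "Jeans", 2),
    (date(2012, 6, 17), "Jeans", 1),
    (date(2012, 7, 1), "Jeans", 4),
]

# Month-level grain: one total per (year, month, product).
monthly_quantity = defaultdict(int)
for sale_date, product, quantity in sales:
    monthly_quantity[(sale_date.year, sale_date.month, product)] += quantity

for key, total in sorted(monthly_quantity.items()):
    print(key, total)
```

Choosing a coarser grain such as this keeps the fact table small, but day-level questions can no
longer be answered from it.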
Advantages of Kimball’s Model: The Kimball model takes a slightly different approach to building a
data warehouse, as it follows a bottom-up approach, which helps in merging small datasets.
• Performance of the Kimball model is generally better for analytical queries
• More focus is on the dimensions, which play an important role in analysis
• The approach focuses on the process of building the data warehouse
• Less time is needed to create the data warehouse
Overview of building the data warehouse to carry out business intelligence queries:-
The ETL is done in an SSIS package. The three datasets, held in Excel sheets, are extracted into
the staging tables. From the staging tables, data is populated into the dimension tables. With the
help of the Lookup tool (join), data is populated into the fact table. The cube is deployed in
SSAS, and the business queries are carried out in Power BI, as shown in Fig.a.
Fig.3
Star Schema: A star schema looks like a star in which the fact table acts as a pivot, residing in
the center, while multiple dimensions are attached to it in a star-like form through foreign keys.
A simple star schema usually has one fact table and multiple dimensions, but a complex star schema
can contain more than one fact table. Generally, fact tables are in 3NF.
Fact Table: A fact table consists of two types of column: (i) measure columns and (ii) foreign key
columns. Measure columns hold numeric values that can be measured or counted, while foreign key
columns hold the values that act as primary keys in the dimension tables. A measure column can be
used with or without aggregation when analyzing a business query.
Dimension Table: A dimension table consists of textual and descriptive values. Each dimension table
has its own primary key, a unique value that identifies the other column values in the row. The
foreign key columns in the fact table are simply the primary key (surrogate key) columns of the
dimension tables.
Fig.4
Advantages of Star Schema: The star schema has several merits that demonstrate its efficiency and
its suitability for building a data warehouse.
• It is easy to build an ETL process for it
• Complexity is low, as queries join tables through direct relationships
• It reduces the burden of normalizing, as data in dimension tables is stored in denormalized form
• It is very efficient for metric analysis
• Each dimension table is directly connected to the fact table
• Navigation of data is fast because of the direct connection between fact and dimension tables
Design of Data Warehouse:
For this Retail Data warehouse five dimensions and one fact table have been created.
Dim_Customer:
The customer dimension consists of the customer name, customer id and customer key. Customer_Key
is the primary key of this dimension. It is generated when I create the dimension, by declaring
the column as [Customer_Key] INT IDENTITY(1,1) PRIMARY KEY. Why generate this key when customer_id
was already available? A primary key must be unique, with no repeated values, but because
customers repeat in the data, their ids repeat as well, so customer_id would not make the column
unique. To remove this redundancy, Customer_Key is auto-generated as the primary key of this
dimension. The customer_name column contains the name of the customer and the customer_id column
contains the id of the customer. With this dimension we can analyse who our regular customers are.
Fig 5
Fig 6
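The effect of an auto-generated surrogate key can be tried out with the sketch below. It uses
Python's built-in sqlite3 module, where INTEGER PRIMARY KEY AUTOINCREMENT stands in for SQL
Server's IDENTITY(1,1); the project itself was built in SQL Server, so the syntax differs. The
customer ids and names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Dim_Customer (
        Customer_Key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id   TEXT NOT NULL,                      -- id from the source data
        customer_name TEXT NOT NULL
    )
    """
)

# Hypothetical source rows: customer C-001 appears twice.
source_rows = [
    ("C-001", "Sam"),
    ("C-002", "Ana"),
    ("C-001", "Sam"),
]
conn.executemany(
    "INSERT INTO Dim_Customer (customer_id, customer_name) VALUES (?, ?)",
    source_rows,
)

dim_rows = conn.execute(
    "SELECT Customer_Key, customer_id, customer_name "
    "FROM Dim_Customer ORDER BY Customer_Key"
).fetchall()
for row in dim_rows:
    print(row)
```

Even though customer_id repeats, every row receives its own Customer_Key, so the key column stays
unique.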
Dim_Product:
The product dimension has Product_Key as the primary key. Product_id contains the id of each
product. Product_name contains the name of the product sold. With the help of this dimension we
can analyze which product sells the most and which customer buys which product.
Fig 7
Dim_Location:
The location dimension has Location_Key as the primary key. State_id is the id of the state.
State_name contains the name of the state where the store is located. Region_name contains the
region of the country. This dimension is helpful in analyzing which state or region has the
highest number of customers and which state has the highest sales. It also helps in analyzing the
revenue earned in each state or region.
Fig 8
Dim_Source:
This dimension is built from the unstructured dataset. It has Season_key as the primary key.
Se_month_id is the id of a particular month. This dimension helps in analyzing which month
shows the highest sales and which product sells the most in each season.
Fig 9
Dim_Month:
This dimension has Month_Key as the primary key. S_month_id contains the id of a particular month.
Month_name contains the name of the month. This dimension can be used to analyze the highest sales
in a state by month, or which product sold the most in a month.
Fig 10
Fact_Table:
For our retail superstore we have created one fact table, which is connected to each dimension
table through a foreign key relationship. It has three measure columns.
(i) product_quantity- It contains the quantity of product sold.
(ii) total_sale- It contains the sale amount for each customer visit.
(iii) revenue- It contains the amount of revenue generated in the store each month.
Fig 11
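A fact table of this shape can be sketched as below, again with Python's sqlite3 as a stand-in
for SQL Server. The table and key names follow the ones used in this report; the dimension tables
are reduced to their key column, and the inserted values are made up. The sketch also shows the
foreign key relationship rejecting a fact row that points to a missing dimension row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Minimal dimension tables: only the key column is needed for this sketch.
dimensions = {
    "Dim_Customer": "Customer_Key",
    "Dim_Product": "Product_Key",
    "Dim_Location": "Location_Key",
    "Dim_Source": "Season_Key",
    "Dim_Month": "Month_Key",
}
for table, key in dimensions.items():
    conn.execute(f"CREATE TABLE {table} ({key} INTEGER PRIMARY KEY)")
    conn.execute(f"INSERT INTO {table} ({key}) VALUES (1)")

# One foreign key column per dimension, followed by the three measures.
fk_columns = ",\n        ".join(
    f"{key} INTEGER NOT NULL REFERENCES {table}({key})"
    for table, key in dimensions.items()
)
conn.execute(
    f"""
    CREATE TABLE Retail_Fact (
        {fk_columns},
        product_quantity INTEGER,  -- quantity of product sold
        total_sale       REAL,     -- sale amount per customer visit
        revenue          REAL      -- store revenue for the month
    )
    """
)

# A row whose keys all exist in the dimensions is accepted.
conn.execute("INSERT INTO Retail_Fact VALUES (1, 1, 1, 1, 1, 3, 59.97, 1200.0)")

# A row pointing to a customer key that does not exist is rejected.
try:
    conn.execute("INSERT INTO Retail_Fact VALUES (99, 1, 1, 1, 1, 1, 10.0, 1200.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

fact_count = conn.execute("SELECT COUNT(*) FROM Retail_Fact").fetchone()[0]
print(rejected, fact_count)
```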
Star Schema of Project:
The dimension tables and the fact table are connected together in a star schema, as shown in Fig 12.
Fig.12
Extract Transform Load (ETL) process:
Building a data warehouse involves extracting the data, transforming it in the staging area and
finally loading it into the destination area. This is known as the ETL process. The SSIS toolbox
is used to carry out the ETL process. First, data from the external sources is extracted into the
staging database. The next step is the transformation stage. The loading stage is the end of the
ETL process, in which data is loaded into the fact table. At the end of the ETL process, data has
been populated in the fact table as well as in the dimension tables, as shown in Fig.6.
Fig.13
Extraction:
In this phase, data is extracted from the external sources. For this project, Excel sheets are the
external sources, but they could equally be any database or OLTP server. The extraction loads the
data into the staging database, which is an OLE DB destination, as shown in Fig 14. All the data
is extracted from these Excel files into the database. The data that arrives in the staging phase
is stored in the database as
(i) dbo.Main_Stage
(ii) dbo.season_stage
(iii) state_stage as shown in Fig 15.
A TRUNCATE query is run in the staging phase so that repeated runs do not produce duplicate data,
as shown in Fig 16.
Fig.14
Fig.15
Fig.16
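The reason for emptying the staging table before each load can be demonstrated with a short
sketch. It uses Python's sqlite3, where DELETE FROM stands in for SQL Server's TRUNCATE TABLE
(SQLite has no TRUNCATE statement). The staging columns and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Main_Stage (customer_name TEXT, product_name TEXT, month_id INTEGER)"
)


def load_stage(connection, rows):
    """Empty the staging table, then reload it from the extracted rows."""
    with connection:  # commits on success, rolls back on error
        connection.execute("DELETE FROM Main_Stage")  # stand-in for TRUNCATE TABLE
        connection.executemany("INSERT INTO Main_Stage VALUES (?, ?, ?)", rows)


# Hypothetical rows extracted from the Excel source.
extracted = [("Sam", "Jeans", 6), ("Ana", "Shirt", 7)]

load_stage(conn, extracted)
load_stage(conn, extracted)  # running the package a second time

stage_count = conn.execute("SELECT COUNT(*) FROM Main_Stage").fetchone()[0]
print(stage_count)
```

Without the delete step, the second run would leave four rows in the staging table instead of two.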
Transformation:
After the data is extracted from Excel into the staging database, the next step is
transformation. For the transformation I have used the Lookup tool (join) and an SQL query, as
shown in Fig.19.2, for loading the data from the dimension tables. We have five dimension tables
in our database and one fact table.
(i) dbo.Dim_Customer
(ii) dbo.Dim_Location
(iii) dbo.Dim_Month
(iv) dbo.Dim_Product
(v) dbo.dim_Source
(vi) dbo.Retail_Fact
These tables are shown in Fig.17. Dimensions are one of the most important factors in analyzing
data. The column mappings must match, as a mismatch will terminate the ETL flow.
Fig.17
Fig.18
Loading:
After populating the dimension tables, the next step is to populate the fact table. The fact table
contains the primary keys of all the dimension tables and some measures, which are used for
analysis with some aggregation rule. The Lookup tool (join) is used to populate the dimension keys
and the measures in the fact table.
Fig.19.1
Fig.19.2
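The lookup step can be approximated in plain SQL as an INSERT ... SELECT that joins the staging
rows to the dimension tables and writes the matching keys into the fact table. The sketch below
runs it with Python's sqlite3 on two dimensions only. The staging columns, the key values and the
data are assumptions made for illustration; the actual SSIS Lookup configuration is shown only in
the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Dim_Customer (Customer_Key INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE Dim_Month    (Month_Key INTEGER PRIMARY KEY, S_month_id INTEGER, Month_name TEXT);
    CREATE TABLE Main_Stage   (customer_name TEXT, month_id INTEGER, quantity INTEGER, sale REAL);
    CREATE TABLE Retail_Fact  (Customer_Key INTEGER, Month_Key INTEGER,
                               product_quantity INTEGER, total_sale REAL);

    -- Made-up rows, for illustration only.
    INSERT INTO Dim_Customer VALUES (1, 'Sam'), (2, 'Ana');
    INSERT INTO Dim_Month    VALUES (10, 6, 'June'), (11, 7, 'July');
    INSERT INTO Main_Stage   VALUES ('Sam', 6, 3, 59.97), ('Ana', 7, 1, 19.99), ('Sam', 7, 2, 39.98);
    """
)

# Look up each staging row's dimension keys and load them with the measures.
conn.execute(
    """
    INSERT INTO Retail_Fact (Customer_Key, Month_Key, product_quantity, total_sale)
    SELECT c.Customer_Key, m.Month_Key, s.quantity, s.sale
    FROM Main_Stage AS s
    JOIN Dim_Customer AS c ON c.customer_name = s.customer_name
    JOIN Dim_Month    AS m ON m.S_month_id    = s.month_id
    """
)

fact_rows = conn.execute(
    "SELECT Customer_Key, Month_Key, product_quantity "
    "FROM Retail_Fact ORDER BY Customer_Key, Month_Key"
).fetchall()
for row in fact_rows:
    print(row)
```

The fact table ends up holding only keys and measures; names such as 'Sam' or 'June' stay in the
dimension tables.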
Deploying the CUBE:
This phase produces a multidimensional representation of the data with the help of a cube in SSAS.
The cube is then used to analyze the data on the basis of the measures present in the fact table
and the descriptive, textual data present in the dimension tables. Here, Project.Cube is
successfully deployed, as shown in Fig.20 & Fig.21. After the cube is deployed, the analysis and
reporting phase starts, in which the business intelligence queries are carried out.
Fig.20
Fig.21
Business Analytics
Tool Used for Business Queries-: Power BI
Power BI is used to carry out the analysis of this data warehouse. For the analysis, the cube is
imported into Power BI. With the help of the descriptive, textual and measurable quantities, the
business queries have been carried out.
The following business queries can be analyzed with the help of our database.
Case Study:1
Do the seasons (summer, spring, winter, autumn) in three different regions of the USA affect the
retail store business in terms of revenue collection?
This query touches all three datasets. To answer it, we take the revenue, the season name and the
region name. The graph below shows how much revenue is generated in which region and in which
season.
Fig.22
Analysis:
From the clustered bar chart we can see that the highest revenue is generated in the summer
season, followed by autumn and then winter, while spring is responsible for the least revenue in
each region of the USA. The graph also shows that in all seasons the store earns most of its
revenue from the Eastern US, while the Western region comes last. This graph gives the marketing
and sales teams a quick insight: they need to work on the Western region to increase sales and
find out why spring is so slow.
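In SQL terms, a chart like this corresponds to summing the revenue measure grouped by region and
season. The sketch below shows such a query with Python's sqlite3 on a tiny star schema. The
Season_name column, the region labels and all revenue figures are made up for illustration; they
do not reproduce the results shown in Fig.22.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Dim_Location (Location_Key INTEGER PRIMARY KEY, State_name TEXT, Region_name TEXT);
    CREATE TABLE Dim_Source   (Season_Key INTEGER PRIMARY KEY, Season_name TEXT);
    CREATE TABLE Retail_Fact  (Location_Key INTEGER, Season_Key INTEGER, revenue REAL);

    -- Made-up rows, for illustration only.
    INSERT INTO Dim_Location VALUES (1, 'New York', 'East'), (2, 'New Jersey', 'East'), (3, 'Utah', 'West');
    INSERT INTO Dim_Source   VALUES (1, 'Summer'), (2, 'Winter');
    INSERT INTO Retail_Fact  VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 1, 40.0),
                                    (1, 2, 80.0),  (3, 2, 30.0);
    """
)

# Total revenue for every (region, season) pair.
revenue_by_region_season = conn.execute(
    """
    SELECT l.Region_name, s.Season_name, SUM(f.revenue) AS revenue
    FROM Retail_Fact AS f
    JOIN Dim_Location AS l ON l.Location_Key = f.Location_Key
    JOIN Dim_Source   AS s ON s.Season_Key   = f.Season_Key
    GROUP BY l.Region_name, s.Season_name
    ORDER BY l.Region_name, s.Season_name
    """
).fetchall()
for row in revenue_by_region_season:
    print(row)
```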
Case Study:2
Sales generated in different states on the basis of seasons
This query draws on all three datasets. To answer it, the total sale, the state and the season are
used. The pie chart in Fig.23 below represents the sales of different states in different seasons.
Fig.23
Analysis:
This pie chart is used to analyze the store's sales in different states in different seasons. As
Fig.23 shows, sales in Texas are the highest in the summer season, followed by New York. The pie
chart also shows that New York has the highest sales in the autumn season, followed by Texas. So
New York and Texas are the biggest buyers in any season, while the rest of the states are slow in
all seasons. It therefore seems that the state is a very important factor in terms of sales. We
need to understand the needs of the Western US states that our store is not able to cater for.
Either we need to change the products or increase the offers, or perhaps the store manager is not
very efficient. Season and state are both very important factors in the US. A product that is
suitable for New York in winter might not be suitable for Utah at the same time. This kind of
variation needs to be considered when planning store products.
Case Study:3
Analytical Targeting of customers
To answer the above query, we need to check which customer buys the maximum number of products
in which season. The product quantity, customer name and season are used for targeting specific
customers.
Fig.24
Analysis:
The donut chart in Fig.24 represents the customers who buy the maximum number of products in the
four different seasons. The figure shows which customer bought what quantity of product in which
season. From the business point of view, we can target these specific customers and provide some
additional offers to improve our sales.
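A query of this kind can be sketched as follows: total the product quantity per customer and
season, then keep the customer with the largest total in each season. The sketch uses Python's
sqlite3; the Season_name column and all rows are made up for illustration and do not reproduce the
results in Fig.24.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Dim_Customer (Customer_Key INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE Dim_Source   (Season_Key INTEGER PRIMARY KEY, Season_name TEXT);
    CREATE TABLE Retail_Fact  (Customer_Key INTEGER, Season_Key INTEGER, product_quantity INTEGER);

    -- Made-up rows, for illustration only.
    INSERT INTO Dim_Customer VALUES (1, 'Sam'), (2, 'Ana');
    INSERT INTO Dim_Source   VALUES (1, 'Summer'), (2, 'Winter');
    INSERT INTO Retail_Fact  VALUES (1, 1, 5), (1, 1, 2), (2, 1, 4), (1, 2, 1), (2, 2, 3);
    """
)

# Total quantity bought by each customer in each season.
totals = conn.execute(
    """
    SELECT s.Season_name, c.customer_name, SUM(f.product_quantity) AS quantity
    FROM Retail_Fact AS f
    JOIN Dim_Customer AS c ON c.Customer_Key = f.Customer_Key
    JOIN Dim_Source   AS s ON s.Season_Key   = f.Season_Key
    GROUP BY s.Season_name, c.customer_name
    """
).fetchall()

# Keep the customer with the largest total for each season.
top_customer = {}
for season, customer, quantity in totals:
    if season not in top_customer or quantity > top_customer[season][1]:
        top_customer[season] = (customer, quantity)

print(top_customer)
```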
Case Study:4
Seasons affecting the revenue of States
This query also touches all three datasets. To analyze it, we used the seasons, revenue and
states to check the amount of revenue generated by each state in every season.
Fig.25
Analysis:
The graph in Fig.25 above shows how much revenue is collected in each state in each season. New
York has generated the highest amount of revenue in each season, while New Hampshire has generated
the least. From a business perspective, revenue generation in New York and Texas is significantly
high.
Conclusion:
This data warehouse can help show which customers to target and in which region of the country.
New York and Texas have the highest sales and the highest revenue generation, while New Hampshire
is significantly lower than every other state, so effort is needed to improve sales in New
Hampshire, Utah and New Jersey. Seasons also play an important role in the retail business, as
sales in the summer season are the highest of all. With the help of this data warehouse we can
also examine which product is sold in which month, so that we can give some extra offers on that
particular product.

More Related Content

What's hot

Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modelingvivekjv
 
An introduction to data warehousing
An introduction to data warehousingAn introduction to data warehousing
An introduction to data warehousing
Shahed Khalili
 
Chapter 2 - Retail Sales
Chapter 2 - Retail Sales Chapter 2 - Retail Sales
Chapter 2 - Retail Sales
Khairul Shafee Kalid
 
Oltp vs olap
Oltp vs olapOltp vs olap
Oltp vs olap
Mr. Fmhyudin
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guide
thomasmary607
 
Introduction to data warehousing
Introduction to data warehousing   Introduction to data warehousing
Introduction to data warehousing Girish Dhareshwar
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data WarehouseShanthi Mukkavilli
 
Warehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemasWarehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemas
Eric Matthews
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
Karthik Srini B R
 
Metadata in data warehouse
Metadata in data warehouseMetadata in data warehouse
Metadata in data warehouse
Siddique Ibrahim
 
ETL Process
ETL ProcessETL Process
ETL Process
Rashmi Bhat
 
Project Presentation on Data WareHouse
Project Presentation on Data WareHouseProject Presentation on Data WareHouse
Project Presentation on Data WareHouse
Abhi Bhardwaj
 
Data warehouse
Data warehouseData warehouse
Data warehouse
Ramkrishna bhagat
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
Jason S
 
Introduction to Oracle Database
Introduction to Oracle DatabaseIntroduction to Oracle Database
Introduction to Oracle Database
puja_dhar
 
OLAP v/s OLTP
OLAP v/s OLTPOLAP v/s OLTP
OLAP v/s OLTP
ahsan irfan
 
Data Warehouse Architectures
Data Warehouse ArchitecturesData Warehouse Architectures
Data Warehouse ArchitecturesTheju Paul
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
King Julian
 
Data warehouse
Data warehouseData warehouse
Data warehouse
krishna kumar singh
 

What's hot (20)

Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
 
An introduction to data warehousing
An introduction to data warehousingAn introduction to data warehousing
An introduction to data warehousing
 
Chapter 2 - Retail Sales
Chapter 2 - Retail Sales Chapter 2 - Retail Sales
Chapter 2 - Retail Sales
 
Oltp vs olap
Oltp vs olapOltp vs olap
Oltp vs olap
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guide
 
Introduction to data warehousing
Introduction to data warehousing   Introduction to data warehousing
Introduction to data warehousing
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
Warehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemasWarehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemas
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
Metadata in data warehouse
Metadata in data warehouseMetadata in data warehouse
Metadata in data warehouse
 
ETL Process
ETL ProcessETL Process
ETL Process
 
Project Presentation on Data WareHouse
Project Presentation on Data WareHouseProject Presentation on Data WareHouse
Project Presentation on Data WareHouse
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Introduction to Oracle Database
Introduction to Oracle DatabaseIntroduction to Oracle Database
Introduction to Oracle Database
 
Dimensional Modelling
Dimensional ModellingDimensional Modelling
Dimensional Modelling
 
OLAP v/s OLTP
OLAP v/s OLTPOLAP v/s OLTP
OLAP v/s OLTP
 
Data Warehouse Architectures
Data Warehouse ArchitecturesData Warehouse Architectures
Data Warehouse Architectures
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 

Similar to Data warehouse project on retail store

Intro to Data warehousing lecture 15
Intro to Data warehousing   lecture 15Intro to Data warehousing   lecture 15
Intro to Data warehousing lecture 15
AnwarrChaudary
 
Dwbi Project
Dwbi ProjectDwbi Project
Dwbi Project
Sonali Gupta
 
Business analytics and data warehousing
Business analytics and data warehousingBusiness analytics and data warehousing
Business analytics and data warehousing
Samir Majumder
 
introduction to datawarehouse
introduction to datawarehouseintroduction to datawarehouse
introduction to datawarehousekiran14360
 
Data Warehousing for students educationpptx
Data Warehousing for students educationpptxData Warehousing for students educationpptx
Data Warehousing for students educationpptx
jainyshah20
 
INFORMATICA EASY LEARNING ONLINE TRAINING
INFORMATICA EASY LEARNING ONLINE TRAININGINFORMATICA EASY LEARNING ONLINE TRAINING
INFORMATICA EASY LEARNING ONLINE TRAINING
ZaranTech LLC
 
Introduction to Dimesional Modelling
Introduction to Dimesional ModellingIntroduction to Dimesional Modelling
Introduction to Dimesional ModellingAshish Chandwani
 
Date Analysis .pdf
Date Analysis .pdfDate Analysis .pdf
Date Analysis .pdf
ABDEL RAHMAN KARIM
 
Dw hk-white paper
Dw hk-white paperDw hk-white paper
Dw hk-white paperjuly12jana
 
BI_LECTURE_4-2021.pptx
BI_LECTURE_4-2021.pptxBI_LECTURE_4-2021.pptx
BI_LECTURE_4-2021.pptx
hajon27910
 
IBM Cognos tutorial - ABC LEARN
IBM Cognos tutorial - ABC LEARNIBM Cognos tutorial - ABC LEARN
IBM Cognos tutorial - ABC LEARN
abclearnn
 
Iowa liquor sales
Iowa liquor salesIowa liquor sales
Iowa liquor sales
Trushita Redij
 
SALES_FORECASTING of sparkflows.pdf
SALES_FORECASTING of sparkflows.pdfSALES_FORECASTING of sparkflows.pdf
SALES_FORECASTING of sparkflows.pdf
Sparkflows
 
IS 2 Long Report Pardeep kumar 1271107
IS 2  Long Report Pardeep kumar  1271107IS 2  Long Report Pardeep kumar  1271107
IS 2 Long Report Pardeep kumar 1271107TouchPoint
 
Data warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designData warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-design
Sarita Kataria
 
Black_Friday_Sales_Trushita
Black_Friday_Sales_TrushitaBlack_Friday_Sales_Trushita
Black_Friday_Sales_Trushita
Trushita Redij
 
Msbi by quontra us
Msbi by quontra usMsbi by quontra us
Msbi by quontra us
QUONTRASOLUTIONS
 
Datawarehouse Overview
Datawarehouse OverviewDatawarehouse Overview
Datawarehouse Overviewashok kumar
 
Data miningvs datawarehouse
Data miningvs datawarehouseData miningvs datawarehouse
Data miningvs datawarehouse
Suman Astani
 

Similar to Data warehouse project on retail store (20)

Intro to Data warehousing lecture 15
Intro to Data warehousing   lecture 15Intro to Data warehousing   lecture 15
Intro to Data warehousing lecture 15
 
Dwbi Project
Dwbi ProjectDwbi Project
Dwbi Project
 
Business analytics and data warehousing
Business analytics and data warehousingBusiness analytics and data warehousing
Business analytics and data warehousing
 
introduction to datawarehouse
introduction to datawarehouseintroduction to datawarehouse
introduction to datawarehouse
 
Data Warehousing for students educationpptx
Data Warehousing for students educationpptxData Warehousing for students educationpptx
Data Warehousing for students educationpptx
 
INFORMATICA EASY LEARNING ONLINE TRAINING
INFORMATICA EASY LEARNING ONLINE TRAININGINFORMATICA EASY LEARNING ONLINE TRAINING
INFORMATICA EASY LEARNING ONLINE TRAINING
 
Introduction to Dimesional Modelling
Introduction to Dimesional ModellingIntroduction to Dimesional Modelling
Introduction to Dimesional Modelling
 
Date Analysis .pdf
Date Analysis .pdfDate Analysis .pdf
Date Analysis .pdf
 
Dw hk-white paper
Dw hk-white paperDw hk-white paper
Dw hk-white paper
 
BI_LECTURE_4-2021.pptx
BI_LECTURE_4-2021.pptxBI_LECTURE_4-2021.pptx
BI_LECTURE_4-2021.pptx
 
IBM Cognos tutorial - ABC LEARN
IBM Cognos tutorial - ABC LEARNIBM Cognos tutorial - ABC LEARN
IBM Cognos tutorial - ABC LEARN
 
Iowa liquor sales
Iowa liquor salesIowa liquor sales
Iowa liquor sales
 
SALES_FORECASTING of sparkflows.pdf
SALES_FORECASTING of sparkflows.pdfSALES_FORECASTING of sparkflows.pdf
SALES_FORECASTING of sparkflows.pdf
 
IS 2 Long Report Pardeep kumar 1271107
IS 2  Long Report Pardeep kumar  1271107IS 2  Long Report Pardeep kumar  1271107
IS 2 Long Report Pardeep kumar 1271107
 
Data warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designData warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-design
 
Black_Friday_Sales_Trushita
Black_Friday_Sales_TrushitaBlack_Friday_Sales_Trushita
Black_Friday_Sales_Trushita
 
Msbi by quontra us
Msbi by quontra usMsbi by quontra us
Msbi by quontra us
 
ETL QA
ETL QAETL QA
ETL QA
 
Datawarehouse Overview
Datawarehouse OverviewDatawarehouse Overview
Datawarehouse Overview
 
Data miningvs datawarehouse
Data miningvs datawarehouseData miningvs datawarehouse
Data miningvs datawarehouse
 

Data warehouse project on retail store

  • 1. Data Warehouse On Retail Store By: Siddharth Chaudhary X16137001 Msc in Data Analytics National College of Ireland
  • 2. Table of Contents
      Introduction
      Data Sources
          Data Source 1
          Data Source 2
          Data Source 3
      Data Warehouse Design and Architecture
      Design of Data Warehouse
          Dim_Customer
          Dim_Product
          Dim_Location
          Dim_Source
          Dim_Month
          Fact_Table
      Star Schema of Project
      Extract Transform Load (ETL) process
          Extraction
          Transformation
          Loading
          Deploying the CUBE
      Business Analytics
          Case Study 1 and Analysis
          Case Study 2 and Analysis
          Case Study 3 and Analysis
          Case Study 4 and Analysis
      Conclusion
  • 3. Introduction:
      Fifteen years ago, information technology gave the world a gift: e-commerce. Since then, businesses small and large have used it to improve their outreach, customer count, sales and profit. But this was not sufficient. As data grew from MB to GB to PB, these businesses felt a need to store the data efficiently and to use it to improve various aspects of the business. One such domain is retail, where customers and products are the key aspects. Which product is needed by what type of customer, and when, are the key questions of a retail business. If they are answered well, they can take the business to new heights.
      A data warehouse plays an important role in answering these questions. It helps to analyze the key aspects that improve the sales of retail stores. To know what customers buy and in which season, we need to look over the whole data. So first we need to collect all the historical data in one place in a standard format. This is done by preparing a data warehouse. Many software platforms support this, such as Teradata, Netezza, Oracle and Hadoop. Once the warehouse is prepared, we can use it in many ways to answer endless queries. In this project I have simulated the preparation of a real-world data warehouse and used it to answer business queries.
      Data Sources:
      Data is the basic requirement of any data warehouse. Data for this data warehouse is collected from three different datasets. The first is from a global supermarket store, from which I took the data of five stores in different locations of the USA for the year 2012. The second is the revenue collected by each store in each month. The third indicates which month falls in which season in the USA. The three datasets are easily combined because each one carries the same month_id, which is used in the SQL lookup query that populates the fact table, as shown in fig.12.
      Data Source 1:
      This dataset was fetched from www.Kaggle.com. Kaggle is a repository of thousands of datasets. This dataset contains supermarket data from across the globe. The data used in this data warehouse covers five US states: New York, New Jersey, New Hampshire, Utah and Texas. Link to the dataset: https://kaggle2.blob.core.windows.net/datasets/1048/1903/global_superstore_2016.xlsx.zip?sv=2015-12-11&sr=b&sig=V6MbJAh5QVwQC8wLLiPrsC8dKochxZ354VLclEnFuWM%3D&se=2017-04-07T08%3A21%3A15Z&sp=r
      Data Source 2:
      This is a dummy dataset generated with Mockaroo. It contains the revenue of each month for each state.
      Data Source 3:
      This is the unstructured dataset, which I scraped from the site: https://www.englishclub.com/vocabulary/time-months-of-year.htm The data was loaded into Excel, where it looks as shown in Fig.3.1; it was then cleaned and made structured as shown in Fig.3. This dataset contains the seasons of the USA.
  • 4. Fig.1 Fig.2
      Data Warehouse Design and Architecture:
      Kimball's approach is used to build this data warehouse, which supports the analysis of retail stores in different US states: how much revenue is generated, and how much product is sold in which month and in which season.
      Design tools for this data warehouse:
      ● SQL Server Management Studio
      ● SQL Server Integration Services
      ● SQL Server Analysis Services
      I have followed Kimball's architecture, which consists of the following steps:
      • Identifying the business process: We need to define the main business processes, such as acquiring customers, acquiring products, and the sale process. We also need to understand at what level the sales data is summarized: daily, weekly or monthly. This step helps in determining the entities and their relationships as per the business requirement. Later, these entities become the dimensions of the business. The most important entities are Customer, Product, Location and Time.
      • Defining the grain: The grain is the depth at which we store data for these dimensions. It defines the granularity of the system. In this project we store product sales at month level.
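To make the idea of a month-level grain concrete, here is a minimal sketch that rolls transaction-level rows up to one row per month and product. It uses Python's built-in sqlite3 module as a stand-in for SQL Server; the orders table, its column names and all the values are hypothetical and are not taken from the report's datasets.

```python
import sqlite3

# Hypothetical transaction-level rows: (order_date, product, quantity).
# The values are made up for illustration only.
orders = [
    ("2012-06-03", "Jeans", 2),
    ("2012-06-17", "Jeans", 1),
    ("2012-07-09", "Jeans", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, product TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

# Roll the rows up to the month grain: one row per (month, product).
monthly = conn.execute(
    """
    SELECT strftime('%m', order_date) AS month_id,
           product,
           SUM(quantity) AS product_quantity
    FROM orders
    GROUP BY month_id, product
    ORDER BY month_id
    """
).fetchall()

print(monthly)
```

Once the data is stored at this grain, individual order dates are no longer available, so questions about daily or weekly sales cannot be answered from the warehouse.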
  • 5. • Defining the dimensions: Once the entities and the grain are decided, we can decide the dimensions. This data warehouse contains five dimensions:

      Dimension Name   Primary Key    Example
      Customer         Customer_Key   Sam
      Product          Product_Key    Jeans
      Location         Location_Key   Chicago
      Season           Season_Key     Summer
      Month            Month_Key      June
      Table 1

      These dimensions contain descriptive and textual data.
      • Deciding the facts of the data warehouse: The fact table defines the measurable data we store for the dimensions. It is the pivot of the star schema and contains the primary keys of all the dimensions together with the measurable quantities used to answer business queries. The fact data is designed so that it helps in identifying who our regular customers are, how to improve the retail business given that product sales vary by season, how much revenue is generated in which state and, last but not least, which product sells the most.
      Advantages of Kimball's model:
      The Kimball model takes a slightly different approach to building a data warehouse: it follows a bottom-up approach, which helps in merging small datasets.
      • Performance of the Kimball model is better
      • More focus is on the dimensions, which play an important role in analysis
      • The focus of this approach is on the process of building the DW
      • Creating the data warehouse is less time consuming
      Overview of building the data warehouse to carry out business intelligence queries:
      ETL is done in an SSIS package. The three datasets are in Excel sheets, which are extracted into the staging tables. From the staging tables, data is populated into the dimension tables. With the help of the lookup tool (join), data is populated into the fact table. The cube is deployed in SSAS. Business queries are carried out in Power BI, as shown in Fig.a.
  • 6. Fig.3
      Star Schema:
      A star schema looks like a star: the fact table acts as the pivot at the center, while multiple dimensions are attached to it in a star-like form through foreign keys. A simple star schema usually has one fact table and multiple dimensions, but a complex star schema can contain more than one fact table. Generally, fact tables are in 3NF.
      Fact Table:
      A fact table consists of two types of column: (i) measure columns and (ii) foreign key columns. Measure columns hold numeric values that can be measured or counted, while foreign key columns hold the values that act as primary keys in the dimension tables. A measure column can be used with or without aggregation when analyzing a business query.
      Dimension Table:
      A dimension table consists of textual and descriptive values. Each dimension table has its own primary key, a unique value that identifies the other column values of the row. The foreign key columns in the fact table are the surrogate primary key columns of the dimension tables.
      Fig.4
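The relationship between a fact table and its dimensions can be sketched in a few lines. This is an illustrative example only, using Python's built-in sqlite3 module rather than SQL Server; the table and column names (dim_product, dim_month, fact_sales) and the rows are assumptions made for the sketch, not the report's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Two dimension tables with primary keys, and a fact table that
# references them through foreign key columns plus one measure column.
conn.executescript(
    """
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT
    );
    CREATE TABLE dim_month (
        month_key  INTEGER PRIMARY KEY,
        month_name TEXT
    );
    CREATE TABLE fact_sales (
        product_key      INTEGER REFERENCES dim_product(product_key),
        month_key        INTEGER REFERENCES dim_month(month_key),
        product_quantity INTEGER
    );
    """
)

conn.execute("INSERT INTO dim_product VALUES (1, 'Jeans')")
conn.execute("INSERT INTO dim_month VALUES (6, 'June')")
conn.execute("INSERT INTO fact_sales VALUES (1, 6, 3)")  # both keys exist

# A fact row pointing at a product_key that is not in the dimension is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES (99, 6, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Descriptive values come from the dimensions, the measure from the fact table.
rows = conn.execute(
    """
    SELECT p.product_name, m.month_name, f.product_quantity
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_month   m ON m.month_key   = f.month_key
    """
).fetchall()

print(rejected, rows)
```

Every join in the query goes directly from the fact table to one dimension, which is the property that keeps star-schema queries simple.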
  • 7. Advantages of Star Schema:
      The star schema has several merits that show its efficiency and suitability for building a data warehouse.
      • It is easy to generate an ETL process
      • Query complexity is low because the tables are directly related
      • It reduces the effort of normalizing, since dimension tables are kept denormalized
      • It is very efficient for metric analysis
      • Each dimension table is directly connected to the fact table
      • Navigating the data is fast because of the direct connection between the fact and dimension tables
      Design of Data Warehouse:
      For this retail data warehouse, five dimensions and one fact table have been created.
      Dim_Customer:
      The customer dimension consists of the customer name, customer id and customer key. Customer_key is the primary key of this dimension. It is generated when I create the dimension, by declaring the column as [Customer_Key] INT IDENTITY(1,1) PRIMARY KEY. The question is why I generated this key when I already had customer_id. A primary key must be unique, with no repeated values. But when a customer appears more than once, the customer's id repeats as well, so that column is not unique. To remove this redundancy, Customer_key is auto-generated as the primary key of this dimension. Customer_name contains the name of the customer and customer_id contains the id of the customer. With this dimension we can analyze who our regular customers are.
      Fig 5 Fig 6
      Dim_Product:
      The product dimension has product_key as its primary key. Product_id contains the id of the product. Product_name contains the name of the product sold. With the help of this dimension we can analyze which product sells the most and which customer buys which product.
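The reasoning behind the auto-generated Customer_key can be illustrated with a small sketch. It assumes SQLite's INTEGER PRIMARY KEY AUTOINCREMENT as a rough stand-in for SQL Server's IDENTITY(1,1); the customer ids and the second customer name are hypothetical values made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# customer_key is generated by the database, like IDENTITY(1,1) in SQL Server.
conn.execute(
    """
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id   TEXT,
        customer_name TEXT
    )
    """
)

# Hypothetical source rows in which the same customer_id appears twice.
source_rows = [("C-100", "Sam"), ("C-200", "Ana"), ("C-100", "Sam")]
conn.executemany(
    "INSERT INTO dim_customer (customer_id, customer_name) VALUES (?, ?)",
    source_rows,
)

keys = [r[0] for r in conn.execute(
    "SELECT customer_key FROM dim_customer ORDER BY customer_key")]
ids = [r[0] for r in conn.execute(
    "SELECT customer_id FROM dim_customer ORDER BY customer_key")]

print(keys)  # generated keys, one per row
print(ids)   # the natural id repeats
```

The generated key stays unique even though customer_id repeats. Whether repeated customers should be deduplicated before loading the dimension is a separate design choice that this sketch does not address.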
  • 8. Fig 7
      Dim_Location:
      The location dimension contains Location_Key as its primary key. State_id is the id of the state. State_name contains the name of the state where the store is located. Region name contains the region of the country. This dimension is helpful in analyzing which state or region has the highest number of customers and which state has the highest sales. It also helps in analyzing the revenue earned in each state or region.
      Fig 8
      Dim_Source:
      This dimension is built from the unstructured dataset. It contains Season_key as its primary key. Se_month_id is the id of a particular month. This dimension helps in analyzing which month shows the highest sales and which product sells the most in each season.
      Fig 9
      Dim_Month:
      This dimension contains Month_Key as its primary key. S_month_id contains the id of a particular month. Month_name contains the name of the month. This dimension can be used to analyze the highest sales in a state by month, or the highest-selling product in a month.
      Fig 10
      Fact_Table:
      For our retail superstore we have created one fact table, which is connected to each dimension table through a foreign key relationship. It has three measure columns.
  • 9. (i) product_quantity: the quantity of product sold.
      (ii) total_sale: the sale amount per customer visit.
      (iii) revenue: the amount of revenue generated in the store per month.
      Fig 11
      Star Schema of Project:
      The dimension tables and the fact table are connected together in a star schema, as shown in Fig 12.
      Fig.12
      Extract Transform Load (ETL) process:
      To build a data warehouse, the data must first be extracted, then transformed in the staging area, and lastly loaded into the destination area. This is known as the ETL process. The SSIS toolbox is used to carry out the ETL process. In the ETL process, data from the external source is extracted into the staging database. The next step is the transformation stage. The loading stage is the end of the ETL
  • 10. process, in which data is loaded into the fact table. At the end of the ETL process, data is populated in the fact table as well as in the dimension tables, as shown in Fig.6.
      Fig.13
      Extraction:
      In this phase, data is extracted from the external source. For this project, Excel sheets are the external source; otherwise it could be any database or OLTP server. The extraction loads the data into the staging database, which is the OLE DB destination, as shown in Fig 14. All the data is extracted into the database from these Excel files. The data that arrives in the staging phase is stored in the database as (i) dbo.Main_Stage (ii) dbo.season_stage (iii) state_stage, as shown in Fig 15. A truncate query is written in the staging phase so that repeated runs do not generate duplicate data, as shown in Fig 16.
      Fig.14
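The truncate-before-load step described above can be sketched as follows. This is a minimal illustration, not the SSIS package itself: it uses Python's built-in sqlite3 module, where DELETE FROM stands in for SQL Server's TRUNCATE TABLE (SQLite has no TRUNCATE statement). The helper load_staging, the main_stage columns and the rows are hypothetical.

```python
import sqlite3


def load_staging(conn, rows):
    """Empty the staging table, then reload it, so reruns do not duplicate rows."""
    conn.execute("DELETE FROM main_stage")  # stands in for TRUNCATE TABLE
    conn.executemany("INSERT INTO main_stage VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE main_stage (customer_id TEXT, product_id TEXT, quantity INTEGER)"
)

# Hypothetical extract from a source file; values are made up for illustration.
extract = [("C-100", "P-1", 2), ("C-200", "P-2", 1)]

# Running the load twice leaves the same two rows, not four.
load_staging(conn, extract)
load_staging(conn, extract)

count = conn.execute("SELECT COUNT(*) FROM main_stage").fetchone()[0]
print(count)
```

Without the delete step, every rerun of the package would append another copy of the extract to the staging table.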
  • 11. Fig.15 Fig.16
      Transformation:
      After the data is extracted from Excel into the staging database, the next step is transformation. For transformation I have used the lookup tool (join) and a SQL query, as shown in Fig.19.2, for loading the data from the dimension tables. We have five dimension tables and one fact table in our database:
      (i) dbo.Dim_Customer
      (ii) dbo.Dim_Location
      (iii) dbo.Dim_Month
      (iv) dbo.Dim_Product
      (v) dbo.dim_Source
      (vi) dbo.Retail_Fact
      These dimensions are shown in Fig.17. Dimensions are one of the important factors in analyzing data. The column mappings must match, as a mismatch will terminate the ETL flow.
  • 12. Fig.17 Fig.18
      Loading:
      After populating the dimension tables, the next step is to populate the fact table. The fact table contains the primary keys of all the dimension tables and some measures, which are used for analysis with an aggregation rule. The lookup tool (joins) is used to populate the dimension keys and the measures in the fact table.
      Fig.19.1
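The lookup step for loading the fact table can be sketched as a join from the staging rows to the dimensions, replacing each source id with the matching dimension key. This is an illustration under assumptions: it uses Python's built-in sqlite3 module instead of the SSIS lookup component, and the column lists and rows are simplified, hypothetical versions of the report's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_id TEXT);
    CREATE TABLE dim_month    (month_key INTEGER PRIMARY KEY, month_id INTEGER);
    CREATE TABLE main_stage   (customer_id TEXT, month_id INTEGER, quantity INTEGER);
    CREATE TABLE retail_fact  (customer_key INTEGER, month_key INTEGER,
                               product_quantity INTEGER);

    INSERT INTO dim_customer VALUES (1, 'C-100'), (2, 'C-200');
    INSERT INTO dim_month    VALUES (10, 6), (11, 7);
    -- 'C-999' has no row in dim_customer, so its lookup will fail.
    INSERT INTO main_stage   VALUES ('C-100', 6, 2), ('C-200', 7, 5), ('C-999', 6, 1);
    """
)

# Lookup: resolve each staging row's ids to dimension keys, then load the fact table.
conn.execute(
    """
    INSERT INTO retail_fact (customer_key, month_key, product_quantity)
    SELECT c.customer_key, m.month_key, s.quantity
    FROM main_stage s
    JOIN dim_customer c ON c.customer_id = s.customer_id
    JOIN dim_month    m ON m.month_id    = s.month_id
    """
)

fact_rows = conn.execute(
    "SELECT * FROM retail_fact ORDER BY customer_key").fetchall()

# Staging rows whose customer_id found no match in the dimension.
unmatched = conn.execute(
    """
    SELECT s.customer_id
    FROM main_stage s
    LEFT JOIN dim_customer c ON c.customer_id = s.customer_id
    WHERE c.customer_key IS NULL
    """
).fetchall()

print(fact_rows, unmatched)
```

An inner join silently drops staging rows with no matching dimension row, so checking for unmatched ids before or after the load is a useful safeguard.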
  • 13. Fig.19.2
      Deploying the CUBE:
      In this phase, a multidimensional representation of the data is built as a cube in SSAS. The cube is then used to analyze the data on the basis of the measures present in the fact table and the descriptive, textual data present in the dimension tables. Here, Project.Cube is successfully deployed, as shown in Fig.20 and Fig.21. After the cube is deployed, the analysis and reporting phase starts, in which business intelligence queries are carried out.
      Fig.20
  • 14. Fig.21
      Business Analytics
      Tool used for business queries: Power BI
      Power BI is used to carry out the analysis of this data warehouse. For the analysis, the cube is imported into Power BI. Business queries are carried out with the help of the descriptive, textual and measurable quantities. The following business queries can be analyzed with the help of our database.
      Case Study 1:
      Do the seasons (summer, spring, winter, autumn) in 3 different regions of the USA affect the retail store business in terms of revenue collection?
      This query touches all three datasets. To answer it, we take revenue, season name and region name. The graph below shows how much revenue is generated in which region and in which season.
      Fig.22
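The aggregation behind a chart like the one in Case Study 1 can be sketched as a join and GROUP BY over the star schema. This sketch uses Python's built-in sqlite3 module rather than the SSAS cube and Power BI; the table dim_season, the column names and every value below are hypothetical and do not reproduce the report's data or results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY,
                               state_name TEXT, region_name TEXT);
    CREATE TABLE dim_season   (season_key INTEGER PRIMARY KEY, season_name TEXT);
    CREATE TABLE retail_fact  (location_key INTEGER, season_key INTEGER, revenue REAL);

    -- Made-up rows for illustration only.
    INSERT INTO dim_location VALUES (1, 'New York', 'East'), (2, 'Utah', 'West');
    INSERT INTO dim_season   VALUES (1, 'Summer'), (2, 'Spring');
    INSERT INTO retail_fact  VALUES (1, 1, 100.0), (1, 1, 50.0),
                                    (1, 2, 40.0),  (2, 1, 30.0);
    """
)

# Total revenue per region and season: the measure is summed,
# the grouping labels come from the dimension tables.
result = conn.execute(
    """
    SELECT l.region_name, s.season_name, SUM(f.revenue) AS total_revenue
    FROM retail_fact f
    JOIN dim_location l ON l.location_key = f.location_key
    JOIN dim_season   s ON s.season_key   = f.season_key
    GROUP BY l.region_name, s.season_name
    ORDER BY l.region_name, s.season_name
    """
).fetchall()

for region, season, total in result:
    print(region, season, total)
```

Replacing region_name with state_name in the SELECT and GROUP BY clauses would give the state-level view used in the later case studies.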
  • 15. Analysis:
      From the clustered bar chart we can see that the highest revenue is generated in the summer season, followed by autumn and then winter, while spring is responsible for the least revenue in each region of the USA. The graph also shows that in all seasons the store earns most of its revenue from the Eastern US, and the Western region ranks last. This graph gives the marketing and sales teams a quick insight: they need to work on the Western region to increase sales and find the reason why spring is so slow.
      Case Study 2:
      Sales generated in different states on the basis of seasons.
      This query draws on all three datasets. To answer it, total sale, state and season are used. The pie chart in Fig.23 represents the sales of different states in different seasons.
      Fig.23
      Analysis:
      This pie chart is used to analyze the sales of the store in different states in different seasons. Fig.23 shows that sales in Texas are highest in the summer season, followed by New York. The pie chart also shows that New York has the highest sales in the autumn season, followed by Texas. So New York and Texas are the biggest buyers in any season, while the rest of the states are slow in all seasons. State therefore seems to be a very important factor in terms of sales. We need to understand the needs of the Western US states that our store is not able to cater for. Either we need to change the products or increase offers, or perhaps the store manager is not very efficient. Season and state are very important factors in the US. A product that is suitable for New York in winter might not be suitable for Utah at the same time. This kind of variation needs to be considered while planning store products.
  • 16. Case Study 3:
      Analytical targeting of customers.
      To answer this query we need to check which customer buys the maximum number of products in which season. Product quantity, customer name and season are used for targeting specific customers.
      Fig.24
      Analysis:
      The donut chart in Fig.24 represents the customers who buy the maximum number of products in the four seasons. The figure shows which customer bought what quantity of product in which season. From a business point of view, we can target specific customers and provide more offers to improve our sales.
      Case Study 4:
      Seasons affecting the revenue of states.
      This query also touches all three datasets. To analyze it, we used season, revenue and state to check the amount of revenue generated by each state in every season.
  • 17. Fig.25
      Analysis:
      The graph in Fig.25 shows how much revenue is collected in each state in each season. New York has generated the highest revenue in each season, while New Hampshire has generated the least. From a business perspective, revenue generation in New York and Texas is significantly high.
      Conclusion:
      This data warehouse can help in showing how we can target specific customers in each region of the country. New York and Texas have the highest sales and the highest revenue generation, while New Hampshire has significantly less than each of the other states, so sales need to be improved in New Hampshire, Utah and New Jersey. Seasons also play an important role in retail business, as sales in the summer season are the highest of all. With the help of this data warehouse we can also examine which product is sold in which month, so we can give extra offers on that particular product.