CIS-5270 BUSINESS INTELLIGENCE
Superstore Data Analysis
By:
Monika Mishra
Nanjesh Ramesh
CIS 5270: Business Intelligence
Submitted to: Professor Shilpa Balan
Table of Contents
1. Introduction and Goal
2. Data Set
   1. Data Set URL
   2. About the dataset
   3. Dataset details
   4. Column details
3. Data Cleaning
   1. Renaming column
   2. Removing unwanted column
   3. Duplicating and splitting column
4. Analysis & Visualizations
   1. Bar Chart
   2. Histogram
   3. Pie Chart
   4. Tree Map
   5. Correlation Matrix
   6. Word Cloud
5. Statistical Summary & Functions
   1. Statistical Summary
   2. User Defined Functions
6. Code Summary
INTRODUCTION AND GOAL
1. Introduction:
The superstore industry comprises companies that operate large retail spaces which
store and supply large volumes of goods. These stores sell a broad line of grocery items
and general merchandise, such as food, pharmaceuticals, apparel, games and toys, hobby
items, furniture and appliances. Analyzing this industry is valuable because it gives
insight into the sales and profits of various products. Our analysis is based on a
superstore dataset for the United States, covering orders placed between 2015 and 2018.
2. Goal: To find out various superstore statistics, such as:
• Region that accounts for the greatest number of orders
• Frequency distribution of quantity ordered
• Percentage of sales by category
• Most profitable category and sub-category
• Category and sub-category that incurred losses
• Product types that were ordered most often
• Yearly sales for a given state
With this analysis, the Superstore can identify various aspects of the shopping pattern and
take measures if required.
DATA SET
1. Data Set URL:
https://data.world/stanke/sample-superstore-2018
2. About the dataset:
The dataset provides information about the sales and profit of a US superstore from
2015 to 2018.
3. Dataset details:
Size: 2.4 MB
Number of columns: 21
Number of rows: 9994
Original file format: XLS
4. Column details:
The dataset contains the following columns:
Column Name      Column Detail
Row ID           Unique row ID
Order ID         Unique order ID
Order Date       Date the order was placed
Ship Date        Date the order was shipped
Ship Mode        Shipping mode of the order
Customer ID      Unique ID of the customer
Customer Name    Customer’s name
Segment          Customer segment
Country          Country of the order (always United States)
City             City of the order
State            State of the order
Postal Code      Postal code of the order
Region           Region of the order
Product ID       Unique product ID
Category         Product category
Sub-Category     Product sub-category
Product Name     Name of the product
Sales            Sales amount of the order
Quantity         Quantity ordered
Discount         Discount applied to the order
Profit           Profit on the order
DATA CLEANING
1. Renaming Column
Goal: The column name “CT” was not descriptive. The aim is to rename the column to “City”.
[Screenshots: column name before and after renaming]
Code Used
colnames(superstore)[colnames(superstore)=="CT"] <- "City"
2. Removing unwanted Column
Goal: The column named “Country” needs to be removed because it contains only one value,
“United States”.
[Screenshots: columns before and after removal]
Code Used
superstore = subset(superstore, select = -c(Country) )
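Before dropping a column on the grounds that it holds a single value, that claim can be checked directly. The following is a minimal sketch on a small made-up data frame; the `toy` data and its values are illustrative assumptions, not rows from the dataset.

```r
# Toy data frame standing in for the superstore table (values are made up)
toy <- data.frame(
  Country = c("United States", "United States", "United States"),
  City    = c("Los Angeles", "Seattle", "Houston"),
  Sales   = c(261.96, 14.62, 22.37),
  stringsAsFactors = FALSE
)

# Count distinct values per column; a count of 1 means the column is constant
n_distinct_values <- sapply(toy, function(col) length(unique(col)))
print(n_distinct_values)

# Drop every constant column by name
constant_cols <- names(n_distinct_values)[n_distinct_values == 1]
toy_clean <- toy[, !(names(toy) %in% constant_cols), drop = FALSE]
print(names(toy_clean))
```

Selecting by name rather than by position keeps the step valid even if the column order changes later.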
3. Duplicating the column and Splitting it into 3 columns
Goal: To duplicate the column “Order.Date” to “order” and then split “order” into month,
day and year
[Screenshots: before (no column after Profit), after duplicating, and after splitting the order column]
Code Used
superstore$order<-superstore$Order.Date
library(tidyr)
superstore<-separate(superstore,order,c("month","day","year"),sep="/")
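Splitting on "/" leaves month, day and year as text, and the year keeps only two digits (the execution output later in this report shows years such as 15 and 18). As an alternative sketch, base R can parse the date and extract numeric parts. The `%m/%d/%y` format is an assumption inferred from that output, and the `toy` data frame is made up for illustration.

```r
# Made-up order dates in the assumed month/day/two-digit-year layout
toy <- data.frame(Order.Date = c("11/8/17", "6/12/16", "1/3/15"),
                  stringsAsFactors = FALSE)

# Parse the text into Date objects
parsed <- as.Date(toy$Order.Date, format = "%m/%d/%y")

# Extract numeric components; %Y gives the four-digit year
toy$month <- as.integer(format(parsed, "%m"))
toy$day   <- as.integer(format(parsed, "%d"))
toy$year  <- as.integer(format(parsed, "%Y"))
print(toy)
```

Numeric components sort and plot in calendar order, which text values are not guaranteed to do.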
ANALYSIS & VISUALIZATIONS
1. What is the total number of orders by region?
Plot Type - Bar Chart
Function Used – barplot, table
Analysis
The above bar chart displays the total number of orders by region. The Western region has
the highest order count (greater than 3000), followed by the Eastern region with a count
close to 3000 and the Central region with a count of around 2300. The Southern region
placed the fewest orders (around 1500).
Code Used
> countsR <- table(superstore$Region)
> barplot(countsR, main="Total Orders by Region",
+ xlab="Region", col="lightblue")
2. What is the frequency distribution of quantity ordered?
Plot Type - Histogram
Function Used – hist
Analysis
The above histogram shows the frequency distribution of the quantity ordered. The tallest
bar is the first one, at the lowest quantities, with a frequency above 3000; the next bar
has a frequency close to 2500. Because hist() with its default settings closes the first
bin on both ends, that first bar can combine quantities 1 and 2, so the bars are best read
as bins rather than as single quantities. In general, the frequency decreases as the
quantity ordered increases, and a quantity of 14 has the lowest frequency.
Code Used
> hist(superstore$Quantity, main="Frequency Distribution of Quantity Ordered",
+ xlab="Quantity Ordered", ylab= "Frequency", col="lightpink")
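When the data are whole numbers, histogram bins can hide detail: with breaks placed on the integers, the first bin is closed on both ends and counts the two smallest values together. The sketch below uses a made-up `qty` vector (an illustrative assumption, not the dataset) to compare that with breaks at half-integers.

```r
# Made-up integer quantities
qty <- c(1, 1, 2, 2, 2, 3, 3, 4)

# Breaks on the integers: the first bin is [1, 2], so the 1s and 2s merge
merged <- hist(qty, breaks = 1:4, plot = FALSE)
print(merged$counts)

# Breaks at half-integers give one bin per quantity
per_value <- hist(qty, breaks = seq(0.5, 4.5, by = 1), plot = FALSE)
print(per_value$counts)

# table() gives the same per-quantity counts and can feed barplot()
print(table(qty))
```

For integer-valued columns such as Quantity, either half-integer breaks or `barplot(table(...))` makes each bar correspond to exactly one value.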
3. What is the percentage sales by category?
Plot Type – Pie Chart
Function Used – pie, group_by, summarize, round, paste
Analysis
The above pie chart shows the percentage of sales by category. There are three categories:
Technology, Furniture and Office Supplies. “Technology” contributes the most to sales at
36%, followed by “Furniture” at 32%. “Office Supplies” contributes the least at 31%. The
percentages are rounded to whole numbers, which is why they sum to 99% rather than 100%.
Code Used
> install.packages("dplyr")
> library("dplyr")
> library(magrittr)
> gd <- superstore %>% group_by(Category) %>% summarize(Sales=sum(Sales))
> pct<-round(gd$Sales/sum(gd$Sales)*100)
> lbls<-paste(gd$Category,pct)
> lbls<-paste(lbls, "%", sep= " ")
> colors = c('lightskyblue','plum2','peachpuff')
> pie(gd$Sales, labels = lbls,main="Percentage Sales By Category",col=colors)
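Because the labels are built with round(), the displayed percentages need not add up to 100. The sketch below shows the effect and one possible remedy, labels with one decimal place. The `sales` totals are illustrative numbers chosen to reproduce the 36/32/31 split, not figures taken from the dataset.

```r
# Illustrative sales totals per category (made up)
sales <- c(Technology = 836, Furniture = 742, "Office Supplies" = 719)

# Share of the total, in percent
share <- prop.table(sales) * 100

# Whole-number rounding can make the labels sum to 99 or 101
print(sum(round(share)))

# One decimal place keeps the labels closer to the true shares
lbls <- paste0(names(sales), " ", sprintf("%.1f", share), "%")
print(lbls)
```

The resulting `lbls` vector can be passed to `pie()` through its `labels` argument in the same way as in the code above.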
4. Which sub-category incurred losses? Which is the most profitable sub-category?
How do overall sales compare across categories and sub-categories?
Plot Type – Tree Map
Function Used – list, treemap
Analysis
The above tree map shows the sales and profit of the various product categories and
sub-categories. The cell size is determined by sales, and the color gradient encodes
profit. The map shows that the sub-category “Phones” under “Technology” has the highest
sales. Losses fall within the “Furniture” category, and the most profitable sub-category
is “Copiers”.
Code Used
> install.packages("treemap")
> library(treemap)
> treemap(superstore, index = c("Category","Sub.Category"), vSize = "Sales",
+ vColor = "Profit", type = "value", palette = "RdYlGn",
+ range = c(-20000,60000), mapping = c(-20000,10000,60000),
+ title = "Sales Treemap For categories", fontsize.labels = c(15,10),
+ align.labels = list(c("centre","centre"), c("left","top")))
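Reading losses off a color gradient is approximate, so the same question can also be answered numerically by totalling profit per sub-category. This is a minimal sketch on a made-up `toy` data frame; the rows and profit values are illustrative assumptions, not results from the dataset.

```r
# Made-up order lines
toy <- data.frame(
  Category     = c("Furniture", "Furniture", "Technology", "Technology", "Furniture"),
  Sub.Category = c("Tables", "Chairs", "Phones", "Copiers", "Tables"),
  Profit       = c(-120.5, 80.0, 45.2, 300.0, -60.0),
  stringsAsFactors = FALSE
)

# Total profit per category / sub-category pair
profit_by_sub <- aggregate(Profit ~ Category + Sub.Category, data = toy, FUN = sum)

# Sort from the largest loss to the largest profit
profit_by_sub <- profit_by_sub[order(profit_by_sub$Profit), ]
print(profit_by_sub)

# Sub-categories with a negative total are the loss-makers
losses <- profit_by_sub$Sub.Category[profit_by_sub$Profit < 0]
print(losses)
```

Applied to the real table, the first rows of the sorted result would list the loss-making sub-categories and the last row the most profitable one.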
5. What is the correlation between Sales, Quantity, Discount and Profit?
Plot Type – Correlation Matrix
Function Used – corrplot, cor
Analysis
This correlation matrix chart shows how the variables relate to one another. The color
gradient from red to blue describes the strength and direction of the correlation among
Sales, Quantity, Discount and Profit: red indicates a negative correlation and blue a
positive one. “Sales” and “Profit” are somewhat correlated, “Profit” and “Quantity” are
only very weakly correlated, and “Profit” and “Discount” are negatively correlated.
Code Used
> install.packages("corrplot")
> mydata <- superstore[, c("Sales","Quantity","Discount","Profit")]
> View(mydata)
> library(corrplot)
> mydata.cor = cor(mydata)
> mydata.cor
> corrplot(mydata.cor)
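Each cell of the matrix is a Pearson correlation coefficient between -1 and 1, which is what `cor` computes by default. The sketch below shows how to read the sign and size of one pair directly from the matrix. The `toy` values are made up for illustration and do not reproduce the dataset's coefficients.

```r
# Made-up order lines
toy <- data.frame(
  Sales    = c(100, 250, 80, 400, 150),
  Quantity = c(2, 5, 1, 7, 3),
  Discount = c(0.0, 0.2, 0.0, 0.5, 0.1),
  Profit   = c(30, 20, 25, -60, 35)
)

# Pearson correlation matrix, rounded for reading
cor_matrix <- round(cor(toy), 2)
print(cor_matrix)

# Look up a single pair by name instead of by position
print(cor_matrix["Profit", "Discount"])
```

A value near 0 indicates a weak linear relationship, while values near -1 or 1 indicate a strong negative or positive one.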
6. What are the product types that have been ordered most often?
Plot Type – Word Cloud
Function Used – wordcloud
Analysis
Word clouds (also known as text clouds or tag clouds) work in a simple way: the more often
a specific word appears in a source of textual data (such as a speech, blog post, or
database), the bigger and bolder it appears in the word cloud. In our case we want to
know which kinds of products have been ordered frequently. Looking at the above word
cloud, it is clear that products related to “Xerox” have been ordered the most. Products
related to binders, chairs and Avery have also been ordered many times.
Code Used
> install.packages("tm")
> install.packages("SnowballC")
> install.packages("wordcloud")
> install.packages("RColorBrewer")
> library(tm)
> library(SnowballC)
> library(RColorBrewer)
> library(wordcloud)
> wordcloud(words = superstore$Product.Name, min.freq = 1,
+ max.words=100, random.order=FALSE, rot.per=0.35,
+ colors=brewer.pal(8, "Dark2"))
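The word sizes in the cloud come from word frequencies, and those can be inspected as plain numbers. The following base R sketch is a simplified stand-in for that counting step, not the exact processing done by the wordcloud and tm packages (for instance, it removes no stop words). The `product_names` vector is made up for illustration.

```r
# Made-up product names
product_names <- c("Xerox 1967", "Avery Binder Labels", "Xerox 225",
                   "Global Leather Chair", "Avery Durable Binder")

# Lower-case, split on anything that is not a letter, drop empty strings
words <- unlist(strsplit(tolower(product_names), "[^a-z]+"))
words <- words[nchar(words) > 0]

# Count and sort; the largest counts would be drawn biggest in the cloud
word_freq <- sort(table(words), decreasing = TRUE)
print(head(word_freq, 3))
```

A frequency table like this gives exact counts to cite alongside the visual impression from the cloud.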
STATISTICAL SUMMARY & FUNCTIONS
1. Statistical Summary
Question - Provide a statistical summary of the Sales.
Answer – Given below is the statistical summary of the Sales:
Statistic                   Value       Meaning
Min. (Minimum)              0.444       The lowest Sales value in the table.
1st Qu. (First Quartile)    17.280      The first quartile (Q1) is the middle number between the smallest value and the median of the data set. It splits off the lowest 25% of the data from the highest 75%.
Median                      54.490      The middle value when the Sales values are ordered by rank.
Mean                        229.858     The average of Sales: the sum of all Sales values divided by the number of Sales values.
3rd Qu. (Third Quartile)    209.940     The third quartile (Q3) is the middle number between the median and the highest value of the data set. It splits off the highest 25% of the data from the lowest 75%.
Max. (Maximum)              22638.480   The highest Sales value in the table.
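In the table above the mean (229.858) is far larger than the median (54.490), which indicates a right-skewed distribution: a few very large orders pull the average up. The sketch below reproduces that pattern on a tiny made-up vector `x` (illustrative values only), so each statistic can be checked by hand.

```r
# Four small values and one very large one (made up)
x <- c(1, 2, 3, 4, 100)

# Same six statistics as in the table above
stats <- summary(x)
print(stats)

# A mean well above the median signals right skew
print(stats[["Mean"]] > stats[["Median"]])

# quantile() returns the quartiles directly
print(quantile(x, probs = c(0.25, 0.5, 0.75)))
```

Here the single value of 100 lifts the mean to 22 while the median stays at 3, so for skewed Sales data the median is the better description of a typical order.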
Code Used for Execution
> setwd("~/Desktop/BI")
> superstore<-read.csv("superstore.csv")
> View(superstore)
> summary(superstore$Sales)
2. User Defined Function
Question – What are the total sales for each year for a particular user-provided state?
Answer – To answer this question, we created a user-defined function that takes a state
name as its input parameter and displays the total sales by year for that state by
plotting a line graph.
The state name provided by the user is validated by checking whether it is present in the
superstore table. If it is not present, an error message is shown. If it is present, the
line chart is plotted to display the result.
Function Code
# Function returns total sales by year for the entered state
statesales <- function(inputstate)
{
  # importing libraries
  library(tidyr)
  library(dplyr)
  library(ggplot2)
  print(paste("The State provided by the user is: ", inputstate))
  # retrieving distinct state names from the table
  state_name <- distinct(superstore, State)
  # checking if the state provided is correct or not
  isvalid <- any(state_name == inputstate)
  # if the state name provided is valid, a graph will be plotted
  if (isvalid == TRUE)
  {
    selected <- select(superstore, State, Sales, year)
    filtered <- filter(selected, State == inputstate)
    aggregated <- aggregate(filtered$Sales, by = list(filtered$year), sum)
    print(aggregated)
    # plotting line chart; each "+" ends its line so R keeps reading one expression
    ggplot(data = aggregated, aes(x = Group.1, y = x, group = 1)) +
      geom_line(color = "red") +
      geom_point(color = "blue") + xlab("Year") + ylab("Total Sales") +
      ggtitle("Total Sales by year")
  }
  else
  { print('Enter correct state name') }
}
Execution Script
> setwd("~/Desktop/BI")
> source("sales.R")
> statesales("LA")
[1] "The State provided by the user is: LA"
[1] "Enter correct state name"
> statesales("California")
[1] "The State provided by the user is: California"
Group.1 x
1 15 91303.53
2 16 88443.84
3 17 131551.91
4 18 146388.34
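The validation and aggregation steps can also be separated from the plotting, which makes them easier to check on small inputs. The following is a hedged sketch, not the authors' function: `sales_by_year` is a hypothetical helper that takes the table as an argument and returns the yearly totals (or NULL for an unknown state), and the `toy` data frame is made up.

```r
# Hypothetical helper: yearly sales totals for one state, or NULL if unknown
sales_by_year <- function(data, state) {
  if (!(state %in% data$State)) {
    message("Enter correct state name")
    return(NULL)
  }
  rows <- data[data$State == state, ]
  # Sum Sales within each year
  aggregate(Sales ~ year, data = rows, FUN = sum)
}

# Made-up rows; years are two-digit text, as in the output above
toy <- data.frame(
  State = c("California", "California", "Texas", "California"),
  Sales = c(100, 50, 70, 25),
  year  = c("15", "15", "16", "16"),
  stringsAsFactors = FALSE
)

print(sales_by_year(toy, "California"))
print(is.null(sales_by_year(toy, "LA")))
```

Passing the table in as an argument avoids relying on a global `superstore` object, and returning the data frame lets the caller decide whether to print or plot it.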
CODE SUMMARY
1. Data Cleaning Codes
a. Renaming Column
colnames(superstore)[colnames(superstore)=="CT"] <- "City"
b. Removing unwanted Column
superstore = subset(superstore, select = -c(Country) )
c. Duplicating the column and splitting into 3 columns
superstore$order<-superstore$Order.Date
library(tidyr)
superstore<-separate(superstore,order,c("month","day","year"),sep="/")
2. Visualization Codes
a. Bar Chart
> countsR <- table(superstore$Region)
> barplot(countsR, main="Total Orders by Region",
+ xlab="Region", col="lightblue")
b. Histogram
> hist(superstore$Quantity, main="Frequency Distribution of Quantity Ordered",
+ xlab="Quantity Ordered", ylab= "Frequency", col="lightpink")
c. Pie Chart
> install.packages("dplyr")
> library("dplyr")
> library(magrittr)
> gd <- superstore %>% group_by(Category) %>% summarize(Sales=sum(Sales))
> pct<-round(gd$Sales/sum(gd$Sales)*100)
> lbls<-paste(gd$Category,pct)
> lbls<-paste(lbls, "%", sep= " ")
> colors = c('lightskyblue','plum2','peachpuff')
> pie(gd$Sales, labels = lbls,main="Percentage Sales By Category",col=colors)
d. Tree Map
> install.packages("treemap")
> library(treemap)
> treemap(superstore, index = c("Category","Sub.Category"), vSize = "Sales",
+ vColor = "Profit", type = "value", palette = "RdYlGn",
+ range = c(-20000,60000), mapping = c(-20000,10000,60000),
+ title = "Sales Treemap For categories", fontsize.labels = c(15,10),
+ align.labels = list(c("centre","centre"), c("left","top")))
e. Correlation Matrix
> install.packages("corrplot")
> mydata <- superstore[, c("Sales","Quantity","Discount","Profit")]
> View(mydata)
> library(corrplot)
> mydata.cor = cor(mydata)
> mydata.cor
> corrplot(mydata.cor)
f. Word Cloud
> install.packages("tm")
> install.packages("SnowballC")
> install.packages("wordcloud")
> install.packages("RColorBrewer")
> library(tm)
> library(SnowballC)
> library(RColorBrewer)
> library(wordcloud)
> wordcloud(words = superstore$Product.Name, min.freq = 1,
+ max.words=100, random.order=FALSE, rot.per=0.35,
+ colors=brewer.pal(8, "Dark2"))
3. Statistics Summary Code
> setwd("~/Desktop/BI")
> superstore<-read.csv("superstore.csv")
> View(superstore)
> summary(superstore$Sales)
4. User Defined Function Code
# Function returns total sales by year for the entered state
statesales <- function(inputstate)
{
  # importing libraries
  library(tidyr)
  library(dplyr)
  library(ggplot2)
  print(paste("The State provided by the user is: ", inputstate))
  # retrieving distinct state names from the table
  state_name <- distinct(superstore, State)
  # checking if the state provided is correct or not
  isvalid <- any(state_name == inputstate)
  # if the state name provided is valid, a graph will be plotted
  if (isvalid == TRUE)
  {
    selected <- select(superstore, State, Sales, year)
    filtered <- filter(selected, State == inputstate)
    aggregated <- aggregate(filtered$Sales, by = list(filtered$year), sum)
    print(aggregated)
    # plotting line chart; each "+" ends its line so R keeps reading one expression
    ggplot(data = aggregated, aes(x = Group.1, y = x, group = 1)) +
      geom_line(color = "red") +
      geom_point(color = "blue") + xlab("Year") + ylab("Total Sales") +
      ggtitle("Total Sales by year")
  }
  else
  { print('Enter correct state name') }
}

More Related Content

What's hot

Rules of data mining
Rules of data miningRules of data mining
Rules of data miningSulman Ahmed
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket AnalysisMahendra Gupta
 
Data Warehouse Architectures
Data Warehouse ArchitecturesData Warehouse Architectures
Data Warehouse ArchitecturesTheju Paul
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data WarehouseShanthi Mukkavilli
 
Big Data Case Study on Walmart
Big Data Case Study on WalmartBig Data Case Study on Walmart
Big Data Case Study on WalmartJainamParikh3
 
Porter's 5 forces model on e books retail industry
Porter's 5 forces model on e books retail industryPorter's 5 forces model on e books retail industry
Porter's 5 forces model on e books retail industryMd Golam Rabbi
 
Amazon Machine Learning Case Study: Predicting Customer Churn
Amazon Machine Learning Case Study: Predicting Customer ChurnAmazon Machine Learning Case Study: Predicting Customer Churn
Amazon Machine Learning Case Study: Predicting Customer ChurnAmazon Web Services
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Kira
 
Importance of Data Mining
Importance of Data MiningImportance of Data Mining
Importance of Data MiningScottperrone
 
Team project - Data visualization on Olist company data
Team project - Data visualization on Olist company dataTeam project - Data visualization on Olist company data
Team project - Data visualization on Olist company dataManasa Damera
 
Online Analytical Processing
Online Analytical ProcessingOnline Analytical Processing
Online Analytical Processingnayakslideshare
 
Introduction to Data warehouse
Introduction to Data warehouseIntroduction to Data warehouse
Introduction to Data warehouseSwapnilSaurav7
 
Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methodsProf.Nilesh Magar
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket AnalysisSandeep Prasad
 

What's hot (20)

Bank market classification
Bank market classificationBank market classification
Bank market classification
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data mining
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
 
Data Warehouse Architectures
Data Warehouse ArchitecturesData Warehouse Architectures
Data Warehouse Architectures
 
Retail Analytics
Retail AnalyticsRetail Analytics
Retail Analytics
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
Big Data Case Study on Walmart
Big Data Case Study on WalmartBig Data Case Study on Walmart
Big Data Case Study on Walmart
 
Porter's 5 forces model on e books retail industry
Porter's 5 forces model on e books retail industryPorter's 5 forces model on e books retail industry
Porter's 5 forces model on e books retail industry
 
Amazon Machine Learning Case Study: Predicting Customer Churn
Amazon Machine Learning Case Study: Predicting Customer ChurnAmazon Machine Learning Case Study: Predicting Customer Churn
Amazon Machine Learning Case Study: Predicting Customer Churn
 
Data Warehouse
Data Warehouse Data Warehouse
Data Warehouse
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)
 
Importance of Data Mining
Importance of Data MiningImportance of Data Mining
Importance of Data Mining
 
Team project - Data visualization on Olist company data
Team project - Data visualization on Olist company dataTeam project - Data visualization on Olist company data
Team project - Data visualization on Olist company data
 
Aggregate fact tables
Aggregate fact tablesAggregate fact tables
Aggregate fact tables
 
Online Analytical Processing
Online Analytical ProcessingOnline Analytical Processing
Online Analytical Processing
 
Introduction to Data warehouse
Introduction to Data warehouseIntroduction to Data warehouse
Introduction to Data warehouse
 
Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methods
 
DMQL(Data Mining Query Language).pptx
DMQL(Data Mining Query Language).pptxDMQL(Data Mining Query Language).pptx
DMQL(Data Mining Query Language).pptx
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
 
Oltp vs olap
Oltp vs olapOltp vs olap
Oltp vs olap
 

Similar to Superstore Data Analysis using R

Preparing for BIT – IT2301 Database Management Systems 2001g
Preparing for BIT – IT2301 Database Management Systems 2001gPreparing for BIT – IT2301 Database Management Systems 2001g
Preparing for BIT – IT2301 Database Management Systems 2001gGihan Wikramanayake
 
D365 Finance & Operations - Data & Analytics (see newer release of this docum...
D365 Finance & Operations - Data & Analytics (see newer release of this docum...D365 Finance & Operations - Data & Analytics (see newer release of this docum...
D365 Finance & Operations - Data & Analytics (see newer release of this docum...Gina Pabalan
 
Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)Muhammad Fahad
 
D365 F&O - Data and Analytics White Paper
D365 F&O - Data and Analytics White PaperD365 F&O - Data and Analytics White Paper
D365 F&O - Data and Analytics White PaperGina Pabalan
 
Customer Clustering for Retailer Marketing
Customer Clustering for Retailer MarketingCustomer Clustering for Retailer Marketing
Customer Clustering for Retailer MarketingJonathan Sedar
 
Project report aditi paul1
Project report aditi paul1Project report aditi paul1
Project report aditi paul1guest9529cb
 
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Denodo
 
Experiments and Results on Click stream analysis using R
Experiments and Results on Click stream analysis using RExperiments and Results on Click stream analysis using R
Experiments and Results on Click stream analysis using RPridhvi Kodamasimham
 
Oracle Hyperion overview
Oracle Hyperion overviewOracle Hyperion overview
Oracle Hyperion overviewClick4learning
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingPrithwis Mukerjee
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...
Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...
Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...Denodo
 
Why Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionWhy Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionDenodo
 
Real -time data visualization using business intelligence techniques. and mak...
Real -time data visualization using business intelligence techniques. and mak...Real -time data visualization using business intelligence techniques. and mak...
Real -time data visualization using business intelligence techniques. and mak...MD Owes Quruny Shubho
 
INTRODUCTION TO BUSINESS ANALYTICS.pptx
INTRODUCTION TO BUSINESS ANALYTICS.pptxINTRODUCTION TO BUSINESS ANALYTICS.pptx
INTRODUCTION TO BUSINESS ANALYTICS.pptxSurendhranatha Reddy
 
Emerging database landscape july 2011
Emerging database landscape july 2011Emerging database landscape july 2011
Emerging database landscape july 2011navaidkhan
 
Kevin Bengtson Portfolio
Kevin Bengtson PortfolioKevin Bengtson Portfolio
Kevin Bengtson PortfolioKbengt521
 
Database Development Replication Security Maintenance Report
Database Development Replication Security Maintenance ReportDatabase Development Replication Security Maintenance Report
Database Development Replication Security Maintenance Reportnyin27
 

Similar to Superstore Data Analysis using R (20)

Preparing for BIT – IT2301 Database Management Systems 2001g
Preparing for BIT – IT2301 Database Management Systems 2001gPreparing for BIT – IT2301 Database Management Systems 2001g
Preparing for BIT – IT2301 Database Management Systems 2001g
 
KPMG - TASK 1.pdf
KPMG - TASK 1.pdfKPMG - TASK 1.pdf
KPMG - TASK 1.pdf
 
D365 Finance & Operations - Data & Analytics (see newer release of this docum...
D365 Finance & Operations - Data & Analytics (see newer release of this docum...D365 Finance & Operations - Data & Analytics (see newer release of this docum...
D365 Finance & Operations - Data & Analytics (see newer release of this docum...
 
Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)
 
D365 F&O - Data and Analytics White Paper
D365 F&O - Data and Analytics White PaperD365 F&O - Data and Analytics White Paper
D365 F&O - Data and Analytics White Paper
 
Customer Clustering for Retailer Marketing
Customer Clustering for Retailer MarketingCustomer Clustering for Retailer Marketing
Customer Clustering for Retailer Marketing
 
Project report aditi paul1
Project report aditi paul1Project report aditi paul1
Project report aditi paul1
 
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
 
Experiments and Results on Click stream analysis using R
Experiments and Results on Click stream analysis using RExperiments and Results on Click stream analysis using R
Experiments and Results on Click stream analysis using R
 
Oracle Hyperion overview
Oracle Hyperion overviewOracle Hyperion overview
Oracle Hyperion overview
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in Datawarehousing
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...
Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...
Denodo 6.0: Self Service Search, Discovery & Governance using an Universal Se...
 
Why Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionWhy Data Virtualization? An Introduction
Why Data Virtualization? An Introduction
 
Real -time data visualization using business intelligence techniques. and mak...
Real -time data visualization using business intelligence techniques. and mak...Real -time data visualization using business intelligence techniques. and mak...
Real -time data visualization using business intelligence techniques. and mak...
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
INTRODUCTION TO BUSINESS ANALYTICS.pptx
INTRODUCTION TO BUSINESS ANALYTICS.pptxINTRODUCTION TO BUSINESS ANALYTICS.pptx
INTRODUCTION TO BUSINESS ANALYTICS.pptx
 
Emerging database landscape july 2011
Emerging database landscape july 2011Emerging database landscape july 2011
Emerging database landscape july 2011
 
Kevin Bengtson Portfolio
Kevin Bengtson PortfolioKevin Bengtson Portfolio
Kevin Bengtson Portfolio
 
Database Development Replication Security Maintenance Report
Database Development Replication Security Maintenance ReportDatabase Development Replication Security Maintenance Report
Database Development Replication Security Maintenance Report
 

More from Monika Mishra

Aws image recognition
Aws image recognitionAws image recognition
Aws image recognitionMonika Mishra
 
Drug Review Analysis Using Elasticsearch and Kibana
Drug Review Analysis Using Elasticsearch and KibanaDrug Review Analysis Using Elasticsearch and Kibana
Drug Review Analysis Using Elasticsearch and KibanaMonika Mishra
 
An Empirical Study on Customer Consumption, Loyalty and Retention on a B2C E-...
An Empirical Study on Customer Consumption, Loyalty and Retention on a B2C E-...An Empirical Study on Customer Consumption, Loyalty and Retention on a B2C E-...
An Empirical Study on Customer Consumption, Loyalty and Retention on a B2C E-...Monika Mishra
 
Re-admit Historical using SAS Visual Analytics
Re-admit Historical  using SAS Visual AnalyticsRe-admit Historical  using SAS Visual Analytics
Re-admit Historical using SAS Visual AnalyticsMonika Mishra
 
Diabetic Encounter Analysis using SAS studio
Diabetic Encounter Analysis using SAS studioDiabetic Encounter Analysis using SAS studio
Diabetic Encounter Analysis using SAS studioMonika Mishra
 
LA Energy and Water Efficiency Statistics using Tableau
LA Energy and Water Efficiency Statistics using TableauLA Energy and Water Efficiency Statistics using Tableau
LA Energy and Water Efficiency Statistics using TableauMonika Mishra
 
Predicting Amazon Rating Using Spark ML and Azure ML
Predicting Amazon Rating Using Spark ML and Azure MLPredicting Amazon Rating Using Spark ML and Azure ML
Predicting Amazon Rating Using Spark ML and Azure MLMonika Mishra
 
Amazon Product Review Data Analysis
Amazon Product ReviewData AnalysisAmazon Product ReviewData Analysis
Amazon Product Review Data AnalysisMonika Mishra
 

More from Monika Mishra (8)

Aws image recognition
Aws image recognitionAws image recognition
Aws image recognition
 
Drug Review Analysis Using Elasticsearch and Kibana
Drug Review Analysis Using Elasticsearch and KibanaDrug Review Analysis Using Elasticsearch and Kibana
Drug Review Analysis Using Elasticsearch and Kibana
 
An Empirical Study on Customer Consumption, Loyalty and Retention on a B2C E-...
An Empirical Study on Customer Consumption, Loyalty and Retention on a B2C E-...An Empirical Study on Customer Consumption, Loyalty and Retention on a B2C E-...
An Empirical Study on Customer Consumption, Loyalty and Retention on a B2C E-...
 
Re-admit Historical using SAS Visual Analytics
Re-admit Historical  using SAS Visual AnalyticsRe-admit Historical  using SAS Visual Analytics
Re-admit Historical using SAS Visual Analytics
 
Diabetic Encounter Analysis using SAS studio
Diabetic Encounter Analysis using SAS studioDiabetic Encounter Analysis using SAS studio
Diabetic Encounter Analysis using SAS studio
 
LA Energy and Water Efficiency Statistics using Tableau
LA Energy and Water Efficiency Statistics using TableauLA Energy and Water Efficiency Statistics using Tableau
LA Energy and Water Efficiency Statistics using Tableau
 
Predicting Amazon Rating Using Spark ML and Azure ML
Predicting Amazon Rating Using Spark ML and Azure MLPredicting Amazon Rating Using Spark ML and Azure ML
Predicting Amazon Rating Using Spark ML and Azure ML
 
Amazon Product Review Data Analysis
Amazon Product ReviewData AnalysisAmazon Product ReviewData Analysis
Amazon Product Review Data Analysis
 

Superstore Data Analysis using R

  • 1. CIS-5270 BUSINESS INTELLIGENCE 1 Superstore Data Analysis By: Monika Mishra Nanjesh Ramesh CIS 5270: Business Intelligence Submitted to: Professor Shilpa Balan
  • 2. Table of Contents
       S. No.  Topic                                      Page No.
       1       Introduction and Goal                      3
       2       Data Set
                 1. Data Set URL                          4
                 2. About the dataset                     4
                 3. Dataset details                       4
                 4. Column details                        4-5
       3       Data Cleaning
                 1. Renaming column                       6-7
                 2. Removing unwanted column              8-9
                 3. Duplicating and splitting column      10-11
       4       Analysis & Visualizations
                 1. Bar Chart                             12-13
                 2. Histogram                             14-15
                 3. Pie Chart                             16-17
                 4. Tree Map                              18-19
                 5. Correlation Matrix                    20-21
                 6. Word Cloud                            22-23
       5       Statistical Summary & Functions
                 1. Statistical Summary                   24-25
                 2. User Defined Functions                26-30
       6       Code Summary                               31-35
  • 3. INTRODUCTION AND GOAL
       1. Introduction: The superstore industry consists of companies that operate large retail spaces which stock and supply large quantities of goods. These stores sell a typical product line of grocery items and general merchandise, such as food, pharmaceuticals, apparel, games and toys, hobby items, furniture and appliances. Analyzing this industry is valuable because it gives insight into the sales and profits of various products. Our analysis is based on a superstore dataset for the United States in which the products were ordered between 2015 and 2018.
       2. Goal: To find various superstore statistics, such as:
          - Region that accounts for the greatest number of orders
          - Frequency distribution of quantity ordered
          - Percentage sales by category
          - Profitable category and sub-category
          - Category and sub-category that incurred losses
          - Product type that was ordered most often
          - Yearly sales for various states
       With this analysis, the Superstore can identify various aspects of the shopping pattern and take measures if required.
  • 4. DATA SET
       1. Data Set URL: https://data.world/stanke/sample-superstore-2018
       2. About the dataset: The dataset provides information about the sales and profit of a US superstore from 2015 to 2018.
       3. Dataset details:
          Size                  2.4 MB
          Number of columns     21
          Number of rows        9994
          Original file format  XLS
       4. Column details: The dataset contains the following columns:
          Column Name    Column Detail
          Row ID         Unique row ID
          Order ID       Unique order ID
          Order Date     Date the order was placed
          Ship Date      Shipping date of the order
          Ship Mode      Shipping mode of the order
  • 5. Column details (continued):
          Column Name    Column Detail
          Customer ID    Unique ID of the customer
          Customer Name  Customer's name
          Segment        Product segment
          Country        US
          City           City of product ordered
          State          State of product ordered
          Postal Code    Postal code for the order
          Region         Region of product ordered
          Product ID     Unique product ID
          Category       Product category
          Sub-Category   Product sub-category
          Product Name   Name of the product
          Sales          Sales contribution of the order
          Quantity       Quantity ordered
          Discount       Discount provided on the order
          Profit         Profit for the order
  • 6. DATA CLEANING
       1. Renaming Column
       Goal: The column name "CT" was not descriptive. The aim is to rename the column to "City".
       [Screenshots: before, after, and code used]
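The code for this step survives only as a screenshot on the slide; the Code Summary lists it as a `colnames()` assignment. A minimal sketch of that pattern, run on a small made-up data frame (`toy` and its values are illustrative assumptions, not rows from the dataset):

```r
# Toy data frame standing in for the superstore table (values are made up).
toy <- data.frame(CT = c("Henderson", "Los Angeles"),
                  Sales = c(261.96, 14.62),
                  stringsAsFactors = FALSE)

# Rename only the column called "CT"; every other name is left untouched.
colnames(toy)[colnames(toy) == "CT"] <- "City"

print(colnames(toy))
```

Selecting the column by name rather than by position keeps the rename correct even if the column order changes.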
  • 8. 2. Removing Unwanted Column
       Goal: The column named "Country" is removed because it contains only one value, "United States", and so carries no information for the analysis.
       [Screenshots: before and after]
  • 9. Code Used
          superstore = subset(superstore, select = -c(Country))
       [Full screenshot]
  • 10. 3. Duplicating the Column and Splitting It into 3 Columns
       Goal: To duplicate the column "Order.Date" as "order" and then split "order" into month, day and year.
       [Screenshots: before (no column after Profit), after duplicating, and after splitting the order column]
  • 11. Code Used
          superstore$order <- superstore$Order.Date
          library(tidyr)
          superstore <- separate(superstore, order, c("month", "day", "year"), sep = "/")
       [Full screenshot]
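As an alternative illustration of the same split, here is a base R sketch that does not need `tidyr`. It assumes the dates are stored as month/day/two-digit-year strings, which is consistent with the year values 15 to 18 shown later in the execution output; the data frame `toy` and its dates are made up.

```r
# Made-up order dates in the assumed month/day/year text format.
toy <- data.frame(Order.Date = c("11/8/16", "6/12/15", "10/11/17"),
                  stringsAsFactors = FALSE)

# Split each date on "/" and stack the pieces into a 3-column character matrix.
parts <- do.call(rbind, strsplit(toy$Order.Date, "/", fixed = TRUE))

# Store the pieces as integer columns next to the original date.
toy$month <- as.integer(parts[, 1])
toy$day   <- as.integer(parts[, 2])
toy$year  <- as.integer(parts[, 3])

print(toy)
```

Unlike `separate()` with its default settings, this keeps the pieces as integers rather than text, which matters if the year is later used on a numeric axis.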
  • 12. ANALYSIS & VISUALIZATIONS
       1. What is the total number of orders by region?
       Plot Type: Bar Chart
       Functions Used: barplot, table
       Analysis: The bar chart displays the total number of orders by region. The Western region has the highest order count (greater than 3000), followed by the Eastern region with a count close to 3000 and the Central region with around 2300. The Southern region has the fewest orders (around 1500).
  • 13. Code Used
          countsR <- table(superstore$Region)
          barplot(countsR, main = "Total Orders by Region",
                  xlab = "Region", col = "lightblue")
       [Full screenshot]
  • 14. 2. What is the frequency distribution of quantity ordered?
       Plot Type: Histogram
       Function Used: hist
       Analysis: The histogram shows the frequency distribution of the quantity ordered. The most frequent quantity is 1, with a frequency greater than 3000, followed by 2, with a frequency close to 2500. In general, the frequency decreases as the quantity ordered increases. A quantity of 14 has the lowest frequency.
  • 15. Code Used
          hist(superstore$Quantity, main = "Frequency Distribution of Quantity Ordered",
               xlab = "Quantity Ordered", ylab = "Frequency", col = "lightpink")
       [Full screenshot]
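One point worth checking when reading a histogram of whole-number quantities: `hist()` uses right-closed bins by default, so when the breaks fall on integers the first bar combines the two smallest values. Whether that affects the chart above depends on the breaks R chose, which the slides do not show. The sketch below uses made-up quantities (`qty` is an assumption) to compare integer breaks, half-integer breaks, and a plain `table()` count.

```r
# Made-up order quantities; not taken from the superstore data.
qty <- c(1, 1, 2, 2, 2, 3, 3, 4, 5)

# Breaks on the integers: bins are [1,2], (2,3], (3,4], (4,5],
# so quantities 1 and 2 share the first bar.
h_integer <- hist(qty, breaks = 1:5, plot = FALSE)

# Breaks centred on each integer: one bar per quantity.
h_centered <- hist(qty, breaks = seq(0.5, 5.5, by = 1), plot = FALSE)

# Exact per-quantity counts for comparison.
exact <- table(qty)

print(h_integer$counts)
print(h_centered$counts)
print(exact)
```

For discrete values, `barplot(table(...))` or half-integer breaks gives one bar per quantity and avoids the ambiguity.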
  • 16. 3. What is the percentage sales by category?
       Plot Type: Pie Chart
       Functions Used: pie, group_by, summarize, round, paste
       Analysis: The pie chart shows the percentage of sales by category. There are three categories: Technology, Furniture and Office Supplies. "Technology" contributed the most to sales at 36%, followed by "Furniture" at 32%. "Office Supplies" contributed the least at 31%. The percentages are rounded to whole numbers, so they add up to 99 rather than 100.
  • 17. Code Used
          install.packages("dplyr")
          library("dplyr")
          library(magrittr)
          gd <- superstore %>% group_by(Category) %>% summarize(Sales = sum(Sales))
          pct <- round(gd$Sales / sum(gd$Sales) * 100)
          lbls <- paste(gd$Category, pct)
          lbls <- paste(lbls, "%", sep = " ")
          colors = c('lightskyblue', 'plum2', 'peachpuff')
          pie(gd$Sales, labels = lbls, main = "Percentage Sales By Category", col = colors)
       [Full screenshot]
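The label arithmetic can be illustrated without the dataset. In the sketch below the category totals are made-up numbers (an assumption chosen only so that whole-number rounding gives 32, 31 and 36, as on the slide); it shows how keeping one decimal place makes the labels add up to 100.

```r
# Made-up category totals; the real totals are not given on the slides.
gd <- data.frame(Category = c("Furniture", "Office Supplies", "Technology"),
                 Sales = c(324, 314, 362),
                 stringsAsFactors = FALSE)

share <- gd$Sales / sum(gd$Sales) * 100

# Whole-number rounding, as in the slide code.
pct_whole <- round(share)

# One decimal place keeps the labels consistent with a 100% total.
pct_one <- round(share, 1)
lbls <- paste0(gd$Category, " ", pct_one, "%")

print(sum(pct_whole))
print(lbls)
```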
  • 18. 4. Which sub-category incurred losses? Which is the most profitable sub-category? How are the overall sales for the various categories and sub-categories?
       Plot Type: Tree Map
       Functions Used: list, treemap
       Analysis: The tree map shows the sales and profit of each product category and sub-category. The cell size is determined by sales, and the color gradient represents profit. The map shows that the sub-category "Phones" under "Technology" has the highest sales. Losses were incurred under "Furniture" (which is a category rather than a sub-category). The most profitable sub-category is "Copiers".
  • 19. Code Used
          install.packages("treemap")
          library(treemap)
          # `data` is not defined on the slides; it is assumed to hold the superstore data frame
          treemap(data, index = c("Category", "Sub.Category"),
                  vSize = "Sales", vColor = "Profit", type = "value",
                  palette = "RdYlGn", range = c(-20000, 60000),
                  mapping = c(-20000, 10000, 60000),
                  title = "Sales Treemap For categories",
                  fontsize.labels = c(15, 10),
                  align.labels = list(c("centre", "centre"), c("left", "top")))
       [Full screenshot]
  • 20. 5. What is the correlation between Sales, Quantity, Discount and Profit?
       Plot Type: Correlation Matrix
       Functions Used: corrplot, cor
       Analysis: This correlation matrix chart shows how the variables relate to one another. The color gradient from red to blue indicates the strength and direction of the correlation among Sales, Quantity, Discount and Profit, with red for negative correlation and blue for positive correlation. "Sales" and "Profit" are moderately positively correlated. "Profit" and "Quantity" are only very weakly correlated. "Profit" and "Discount" are negatively correlated.
  • 21. Code Used
          install.packages("corrplot")
          # columns selected by position: Sales, Quantity, Discount and Profit
          mydata <- superstore[, c(18, 19, 20, 21)]
          View(mydata)
          library(corrplot)
          mydata.cor = cor(mydata)
          mydata.cor
          corrplot(mydata.cor)
       [Full screenshot]
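Selecting columns by position breaks silently if a column is added or removed earlier in the cleaning steps. A minimal sketch of the same correlation step using column names instead, on a made-up data frame (`toy` and its values are assumptions, and only base R's `cor()` is used so the plot package is not needed):

```r
# Made-up numeric columns standing in for the four superstore measures.
toy <- data.frame(Sales    = c(100, 200, 300, 400, 500),
                  Quantity = c(2, 3, 1, 5, 4),
                  Discount = c(0.0, 0.2, 0.4, 0.6, 0.8),
                  Profit   = c(30, 40, 20, -10, -40))

# Pick the measures by name rather than by column index.
num_cols <- c("Sales", "Quantity", "Discount", "Profit")
toy.cor <- cor(toy[, num_cols])

# Pearson correlations; the diagonal is 1 and the matrix is symmetric.
print(round(toy.cor, 2))
```

The resulting matrix can be passed to `corrplot()` exactly as in the slide code.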
  • 22. 6. Which product types have been ordered most often?
       Plot Type: Word Cloud
       Function Used: wordcloud
       Analysis: Word clouds (also known as text clouds or tag clouds) work in a simple way: the more often a specific word appears in a source of textual data (such as a speech, blog post, or database), the bigger and bolder it appears in the word cloud. In our case we want to know which kinds of products have been ordered frequently. The word cloud shows that products related to "Xerox" have been ordered the most. Products related to binders, chairs and Avery have also been ordered many times.
  • 23. Code Used
          install.packages("tm")
          install.packages("SnowballC")
          install.packages("wordcloud")
          install.packages("RColorBrewer")
          library(tm)
          library(SnowballC)
          library(RColorBrewer)
          library(wordcloud)
          wordcloud(words = superstore$Product.Name, min.freq = 1,
                    max.words = 100, random.order = FALSE, rot.per = 0.35,
                    colors = brewer.pal(8, "Dark2"))
       [Full screenshot]
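The word cloud sizes each word by how often it occurs in the product names. A rough base R sketch of that counting step is shown below; the product names are made up, and the tokenisation (splitting on anything that is not a letter, then lower-casing) is a simplifying assumption rather than what the `wordcloud` package does internally.

```r
# Made-up product names; not taken from the dataset.
names_toy <- c("Xerox 1967", "Avery Binder Labels",
               "Xerox 225", "Global Task Chair")

# Split on runs of non-letters, lower-case, and drop empty pieces.
words <- tolower(unlist(strsplit(names_toy, "[^A-Za-z]+")))
words <- words[nchar(words) > 0]

# Count each word and list the most frequent first.
freq <- sort(table(words), decreasing = TRUE)

print(head(freq, 3))
```

Printing such a table alongside the cloud gives exact counts for the words that look largest.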
  • 24. STATISTICAL SUMMARY & FUNCTIONS
       1. Statistical Summary
       Question: Provide a statistical summary of the Sales.
       Answer: Given below is the statistical summary of the Sales:
          Statistic                  Value       Meaning
          Min. (Minimum)             0.444       The lowest Sales value in the table.
          1st Qu. (First Quartile)   17.280      The first quartile (Q1) is the middle number between the smallest number and the median of the data set. It splits off the lowest 25% of the data from the highest 75%.
          Median                     54.490      The middle number of the sequence when the values are ordered by rank.
          Mean                       229.858     The average of the Sales: the sum of all Sales values divided by the number of Sales records.
          3rd Qu. (Third Quartile)   209.940     The third quartile (Q3) is the middle number between the median and the highest value of the data set. It splits off the highest 25% of the data from the lowest 75%.
          Max. (Maximum)             22638.480   The highest Sales value in the table.
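In the table above the mean (229.858) is larger than the third quartile (209.940), which is what a few very large orders would produce. The sketch below reproduces that pattern on a made-up vector (`x` is an assumption, not Sales data) so the six statistics can be checked by hand.

```r
# Made-up values with one large outlier; not taken from the Sales column.
x <- c(10, 20, 30, 40, 400)

# Same six statistics that summary() reports for a numeric column.
s <- summary(x)
print(s)

# The single large value pulls the mean above the third quartile.
print(s[["Mean"]] > s[["3rd Qu."]])
```

With five sorted values, the quartiles fall exactly on the second and fourth values (20 and 40), the median is 30, and the mean is 500 / 5 = 100.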
  • 25. Code Used for Execution
          setwd("~/Desktop/BI")
          superstore <- read.csv("superstore.csv")
          View(superstore)
          summary(superstore$Sales)
       [Screenshots: result and full screen]
  • 26. 2. User Defined Function
       Question: What are the total sales for each year for a state provided by the user?
       Answer: To answer this question, we created a user defined function that takes a state name as its input parameter and displays the total sales by year for that state as a line graph. The state name provided by the user is validated by checking whether it is present in the superstore table. If it is not present, an error message is shown. If it is present, the line chart is plotted to display the result.
       [Full screenshot]
  • 28. [Screenshots: execution and line chart]
  • 29. Function Code
          # Function returns total sales by year for the entered state
          statesales <- function(inputstate) {
            # importing libraries
            library(tidyr)
            library(dplyr)
            library(ggplot2)
            print(paste("The State provided by the user is: ", inputstate))
            # retrieving distinct state name from the table
            state_name <- distinct(superstore, State)
            # checking if the state provided is correct or not
            isvalid <- any(state_name == inputstate)
            # if the state name provided is valid, a graph will be plotted
            if (isvalid == TRUE) {
              selected <- select(superstore, State, Sales, year)
              filtered <- filter(selected, State == inputstate)
              aggregated <- aggregate(filtered$Sales, by = list(filtered$year), sum)
              print(aggregated)
              # plotting line chart
              ggplot(data = aggregated, aes(x = Group.1, y = x, group = 1)) +
                geom_line(color = "red") +
                geom_point(color = "blue") +
                xlab("Year") + ylab("Total Sales") +
                ggtitle("Total Sales by year")
            } else {
              print('Enter correct state name')
            }
          }
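The validate-then-aggregate logic of the function can be exercised without the plotting libraries. The following is a base R sketch under stated assumptions: `state_totals` is a hypothetical helper name, `toy` is made-up data, the data frame is passed in as an argument instead of being read from the global `superstore` object, and `stop()` replaces the printed message so callers can detect an invalid state.

```r
# Made-up rows with the three columns the function relies on.
toy <- data.frame(State = c("California", "California", "Texas", "California"),
                  year  = c("15", "16", "15", "15"),
                  Sales = c(100, 250, 80, 50),
                  stringsAsFactors = FALSE)

# Hypothetical helper: total sales per year for one state.
state_totals <- function(df, inputstate) {
  # Validate the state name against the values present in the table.
  if (!(inputstate %in% df$State)) {
    stop("Enter correct state name: ", inputstate)
  }
  rows <- df[df$State == inputstate, ]
  # Sum Sales within each year; returns columns `year` and `Sales`.
  aggregate(Sales ~ year, data = rows, FUN = sum)
}

res <- state_totals(toy, "California")
print(res)
```

Returning the aggregated data frame keeps the calculation testable on its own; the line chart can then be drawn from `res` in a separate step.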
  • 30. Execution Script
          > setwd("~/Desktop/BI")
          > source("sales.R")
          > statesales("LA")
          [1] "The State provided by the user is: LA"
          [1] "Enter correct state name"
          > statesales("California")
          [1] "The State provided by the user is: California"
            Group.1         x
          1      15  91303.53
          2      16  88443.84
          3      17 131551.91
          4      18 146388.34
  • 31. CODE SUMMARY
       1. Data Cleaning Codes
       a. Renaming Column
          colnames(superstore)[colnames(superstore) == "CT"] <- "City"
       b. Removing unwanted Column
          superstore = subset(superstore, select = -c(Country))
       c. Duplicating the column and splitting into 3 columns
          superstore$order <- superstore$Order.Date
          library(tidyr)
          superstore <- separate(superstore, order, c("month", "day", "year"), sep = "/")
  • 32. 2. Visualization Codes
       a. Bar Chart
          countsR <- table(superstore$Region)
          barplot(countsR, main = "Total Orders by Region",
                  xlab = "Region", col = "lightblue")
       b. Histogram
          hist(superstore$Quantity, main = "Frequency Distribution of Quantity Ordered",
               xlab = "Quantity Ordered", ylab = "Frequency", col = "lightpink")
       c. Pie Chart
          install.packages("dplyr")
          library("dplyr")
          library(magrittr)
          gd <- superstore %>% group_by(Category) %>% summarize(Sales = sum(Sales))
          pct <- round(gd$Sales / sum(gd$Sales) * 100)
          lbls <- paste(gd$Category, pct)
          lbls <- paste(lbls, "%", sep = " ")
          colors = c('lightskyblue', 'plum2', 'peachpuff')
          pie(gd$Sales, labels = lbls, main = "Percentage Sales By Category", col = colors)
  • 33. d. Tree Map
          install.packages("treemap")
          library(treemap)
          # `data` is not defined on the slides; it is assumed to hold the superstore data frame
          treemap(data, index = c("Category", "Sub.Category"),
                  vSize = "Sales", vColor = "Profit", type = "value",
                  palette = "RdYlGn", range = c(-20000, 60000),
                  mapping = c(-20000, 10000, 60000),
                  title = "Sales Treemap For categories",
                  fontsize.labels = c(15, 10),
                  align.labels = list(c("centre", "centre"), c("left", "top")))
       e. Correlation Matrix
          install.packages("corrplot")
          # columns selected by position: Sales, Quantity, Discount and Profit
          mydata <- superstore[, c(18, 19, 20, 21)]
          View(mydata)
          library(corrplot)
          mydata.cor = cor(mydata)
          mydata.cor
          corrplot(mydata.cor)
  • 34. f. Word Cloud
          install.packages("tm")
          install.packages("SnowballC")
          install.packages("wordcloud")
          install.packages("RColorBrewer")
          library(tm)
          library(SnowballC)
          library(RColorBrewer)
          library(wordcloud)
          wordcloud(words = superstore$Product.Name, min.freq = 1,
                    max.words = 100, random.order = FALSE, rot.per = 0.35,
                    colors = brewer.pal(8, "Dark2"))
       3. Statistics Summary Code
          setwd("~/Desktop/BI")
          superstore <- read.csv("superstore.csv")
          View(superstore)
          summary(superstore$Sales)
  • 35. 4. User Defined Function Code
          # Function returns total sales by year for the entered state
          statesales <- function(inputstate) {
            # importing libraries
            library(tidyr)
            library(dplyr)
            library(ggplot2)
            print(paste("The State provided by the user is: ", inputstate))
            # retrieving distinct state name from the table
            state_name <- distinct(superstore, State)
            # checking if the state provided is correct or not
            isvalid <- any(state_name == inputstate)
            # if the state name provided is valid, a graph will be plotted
            if (isvalid == TRUE) {
              selected <- select(superstore, State, Sales, year)
              filtered <- filter(selected, State == inputstate)
              aggregated <- aggregate(filtered$Sales, by = list(filtered$year), sum)
              print(aggregated)
              # plotting line chart
              ggplot(data = aggregated, aes(x = Group.1, y = x, group = 1)) +
                geom_line(color = "red") +
                geom_point(color = "blue") +
                xlab("Year") + ylab("Total Sales") +
                ggtitle("Total Sales by year")
            } else {
              print('Enter correct state name')
            }
          }