Marketing Analysis for The Bee Corp
Qingyang(Kevin) Liu
Email: tug14939@temple.edu
June 22, 2017
1 Introduction to the Dataset
The original file Quant Round.xlsx contains three sheets. Because base R cannot import the xlsx format without additional packages, I converted Quant Round.xlsx into Quant Round.csv and kept only the first sheet: the csv format has much better compatibility, and the first sheet of Quant Round.xlsx contains all the information we need.
The data were imported with R's read.csv function, as shown below:
> df1 <- read.csv(file =
+ "/home/kevin/Desktop/The Bee Corp/Quant Round.csv",
+ header = T)
> dim(df1)
[1] 9994 22
The Quant Round.csv file has been imported into R as the data frame df1, which contains 9994 rows and 22 variables.
Summary information for the important variables is shown below.
Row.ID: The primary key for this dataset. This variable is unique for each row.
Order.ID: The order identification. This variable does not have to be unique: one order can span multiple rows (an order may contain different products). There are 5009 distinct orders in df1.
Order.Date: The date when the order was created or submitted. Order.Date was stored as a number. I converted the numeric format into yyyy-mm-dd format, assuming the origin date is "1900-01-01".
Ship.Date: The date when the order was shipped, also stored as a number. I converted it into yyyy-mm-dd format, assuming the origin date is "1900-01-01".
Ship.Mode: There are four different ship modes: Same Day, First Class, Second Class and Standard Class.
Customer.ID: Customer identification. Each customer has one unique ID.
Segment: There are three different segments in this dataset: Consumer, Corporate and Home Office.
(The value "Corporate " with a trailing space has been corrected to "Corporate".)
Country: All orders were shipped within the United States.
City: There are 531 different cities in this dataset.
State: The dataset covers the 48 contiguous U.S. states and the District of Columbia.
("CAL " has been corrected to California and "IND " to Indiana.)
Region: There are five regions in this dataset: Central, East, North, South and West. The original dataset contains a few mistakes; for example, 37 Florida records are categorized as North region.
Product.ID: Product identification. Each product has one unique ID.
Category: All products belong to one of three categories: Furniture, Office Supplies and Technology.
Sub.Category: The relationship between Sub.Category and Category is shown in Table 1.1.
Each Sub.Category belongs to only one Category.
Table 1.1: Sub.Category (in rows) and Category (in columns)
             Furniture  Office Supplies  Technology
Accessories          0                0         775
Appliances           0              466           0
Art                  0              796           0
Binders              0             1523           0
Bookcases          228                0           0
Chairs             617                0           0
Copiers              0                0          68
Envelopes            0              254           0
Fasteners            0              217           0
Furnishings        957                0           0
Labels               0              364           0
Machines             0                0         115
Paper                0             1370           0
Phones               0                0         889
Storage              0              846           0
Supplies             0              190           0
Tables             319                0           0
SalesTotal: SalesTotal = Item.Price × Quantity, where Item.Price is the price after discount.
Profit: A positive number stands for profit; a negative number stands for a loss (deficit).
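A caveat worth checking for Order.Date and Ship.Date: if these numbers are Excel serial dates (likely, since they come from an xlsx file, but still an assumption), Excel's default 1900 date system counts day 1 as 1900-01-01 and also counts the non-existent 1900-02-29. For any date after February 1900, R therefore reproduces the Excel calendar with origin = "1899-12-30", while origin = "1900-01-01" places every date two days later. A minimal sketch; the serial number below is illustrative, not taken from the dataset:

```r
# Illustrative Excel serial number (1900 date system) for 2017-01-01
serial <- 42736

# Origin that reproduces Excel's calendar for dates after 1900-02-28
excel_date <- as.Date(serial, origin = "1899-12-30")

# Origin assumed in this report; the result lands two days later
report_date <- as.Date(serial, origin = "1900-01-01")

print(excel_date)                            # 2017-01-01
print(as.numeric(report_date - excel_date))  # 2
```

Applied to df1, the conversion would read as.Date(df1$Order.Date, origin = "1899-12-30"), and likewise for Ship.Date.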
2 Sales/Profit by Region
Figure 2.1: Maps of Sales and Profit in State Level (panels: "Total Sales in State Level" and "Profit in State Level")
Table 2.1: Inconsistent definition of regions (part)
> head(table(df1$State,df1$Region),10)
Central East North South West
Alabama 0 0 1 60 0
Arizona 0 0 0 0 224
Arkansas 0 0 1 59 0
California 0 0 0 0 2001
Colorado 0 0 0 0 182
Connecticut 0 82 0 0 0
Delaware 0 96 0 0 0
District of Columbia 0 10 0 0 0
Florida 0 0 37 346 0
Georgia 0 0 6 178 0
Table 2.2: Top 4 States by Sales
> head(df2[order(-df2$sales.ratio),],4)
state sales profit sales.ratio
4 california 457687.6 76381.39 0.19923710
31 new york 310876.3 74038.55 0.13532829
42 texas 170188.0 -25729.36 0.07408497
46 washington 138641.3 33402.65 0.06035226
Table 2.3: Top 4 States by Profit
> head(df2[order(-df2$profit),],4)
state sales profit sales.ratio
4 california 457687.63 76381.39 0.19923710
31 new york 310876.27 74038.55 0.13532829
46 washington 138641.27 33402.65 0.06035226
21 michigan 76269.61 24463.19 0.03320111
Table 2.4: Discount in Texas
> table(df1[df1$State == "Texas","Discount"])
0.2 0.3 0.32 0.4 0.6 0.8
570 94 27 13 81 200
The rule for categorizing regions is doubtful and inconsistent in this dataset. According to Table 2.1, 37 Florida records and 1 Alabama record have been assigned to the North region. In total, more than 100 records have been assigned to the wrong region. In real work, the definition of each region would need to be discussed with a supervisor; for this report, quantitative marketing analysis based on regions is skipped.
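Misassigned regions can be located programmatically from the State × Region cross-tabulation in Table 2.1. A minimal sketch on made-up rows (the state names and counts are hypothetical); treating each state's most common region as the correct one is my assumption, not an established rule for this dataset:

```r
# Hypothetical records; the real check would use df1$State and df1$Region
toy <- data.frame(
  State  = c("State A", "State A", "State A", "State B", "State B", "State C"),
  Region = c("South", "South", "North", "South", "North", "West")
)

# State-by-region counts, as in Table 2.1
tab <- table(toy$State, toy$Region)

# States whose records are spread over more than one region
n_regions <- rowSums(tab > 0)
conflicting <- names(n_regions)[n_regions > 1]
print(conflicting)

# Records outside each state's most common (modal) region
misassigned <- sum(rowSums(tab) - apply(tab, 1, max))
print(misassigned)
```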
According to Table 2.2 and Table 2.3, California is the company's largest market and New York State is the second largest, whether measured by sales or by profit. Texas shows a contradictory picture: it is the company's third largest market by sales, yet the company lost $25,729.36 there. Table 2.4 shows that the company applies heavy discounts in Texas: every product sold in the Texas market carries a discount of at least 20%, and 200 records even carry an 80% discount.
Furthermore, Table 2.5 and Table 2.6 show that every sale in the loss-making states carries a discount of at least 20%. Heavy discounting is therefore likely an important reason for the losses in those markets. The reason for applying such a large discount strategy should be discussed with the business manager: it could be a market penetration strategy, or those products may simply be too difficult to sell.
Table 2.5: Deficit Markets at State Level
> df3[df3$Profit < 0,]
State Profit SalesTotal Profit.Sales.Ratio
40 Oregon -1190.470 17431.15 -0.06829558
41 Florida -3399.302 89473.71 -0.03799219
42 Arizona -3427.925 35282.00 -0.09715789
43 Tennessee -5341.694 30661.87 -0.17421289
44 Colorado -6527.858 32108.12 -0.20330864
45 North Carolina -7490.912 55603.16 -0.13472097
46 Illinois -12607.887 80166.10 -0.15727205
47 Pennsylvania -15559.960 116511.91 -0.13354823
48 Ohio -16971.377 78258.14 -0.21686405
49 Texas -25729.356 170188.05 -0.15118192
Table 2.6: Discount in Deficit Markets
> tab1 <- table(df1[,c("State","Discount")])
> tab1[as.character(df3[df3$Profit < 0,"State"]),]
Discount
State 0 0.1 0.15 0.2 0.3 0.32 0.4 0.45 0.5 0.6 0.7 0.8
Oregon 0 0 0 100 0 0 0 0 5 0 19 0
Florida 0 0 0 299 0 0 0 11 6 0 67 0
Arizona 0 0 0 174 0 0 0 0 9 0 41 0
Tennessee 0 0 0 144 0 0 8 0 2 0 29 0
Colorado 0 0 0 138 0 0 0 0 4 0 40 0
North Carolina 0 0 0 201 0 0 8 0 4 0 36 0
Illinois 0 0 0 264 53 0 0 0 18 57 0 100
Pennsylvania 0 0 0 354 36 0 82 0 10 0 105 0
Ohio 0 0 0 290 23 0 67 0 8 0 81 0
Texas 0 0 0 570 94 27 13 0 0 81 0 200
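The pattern in Tables 2.4 to 2.6 (loss-making states in which every sale carries at least a 20% discount) can be screened for directly with tapply(). A minimal sketch on hypothetical order lines; the state labels, discounts and profits below are made up, and the 20% threshold is taken from the tables above:

```r
# Hypothetical order lines; the real analysis would use df1
toy <- data.frame(
  State    = c("State A", "State A", "State A", "State B", "State B"),
  Discount = c(0.2, 0.8, 0.3, 0, 0.2),
  Profit   = c(-10, -120, 5, 40, 15)
)

# Per-state summaries (groups come out in sorted order: State A, State B)
min_discount <- tapply(toy$Discount, toy$State, min)
share_ge_20  <- tapply(toy$Discount >= 0.2, toy$State, mean)
profit       <- tapply(toy$Profit, toy$State, sum)
print(round(share_ge_20, 2))

# Loss-making states where every line carried a discount of at least 20%
flagged <- names(profit)[profit < 0 & min_discount >= 0.2]
print(flagged)
```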
Conclusion:
1. California and New York State are the two most successful markets by either profit or sales.
2. The company is losing money in states such as Texas and Ohio due to large discounts.
3. The regions are ill-defined, so no conclusion has been drawn based on them.
3 Profit by Category/Subcategory/Specific Product
According to Table 3.1, Technology and Office Supplies account for 50.79% and 42.77% of the company's total profit, respectively. Products in the Furniture category contribute only 6.44% of total profit.
Table 3.1: Profit by Category
> df4 <- ddply(df1[,c("Category","Profit")],.(Category),colwise(sum))
> df4 <- arrange(df4,-df4$Profit)
> df4$percent <- round(df4$Profit/sum(df4$Profit)*100,2)
> df4
Category Profit percent
1 Technology 145454.95 50.79
2 Office Supplies 122490.80 42.77
3 Furniture 18451.27 6.44
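The same profit-share calculation can be sketched in base R, with aggregate() standing in for the plyr::ddply() call used above (a swapped-in technique, not the code that produced Table 3.1). The order lines are hypothetical:

```r
# Hypothetical order lines; Table 3.1 was produced from df1 with plyr
toy <- data.frame(
  Category = c("Technology", "Technology", "Office Supplies", "Furniture", "Furniture"),
  Profit   = c(120, 30, 80, 25, -5)
)

# Total profit per category, largest first
by_cat <- aggregate(Profit ~ Category, data = toy, FUN = sum)
by_cat <- by_cat[order(-by_cat$Profit), ]

# Each category's share of total profit, in percent
by_cat$percent <- round(by_cat$Profit / sum(by_cat$Profit) * 100, 2)
print(by_cat)
```

With these made-up numbers the shares come out as 60, 32 and 8 percent for Technology, Office Supplies and Furniture.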
Figure 3.1: Profit by Category/Subcategory (dot plot with panels Furniture, Office Supplies and Technology; x-axis: Profit/Deficit; legend: Profit, Deficit)
Table 3.2: Most Profitable Products
Category         Sub.Category  Product.Name                                                                 Total.Quantity  Total_Profit  Avg.Profit.per.Unit  Max.Item.Price  Min.Item.Price
Technology       Copiers       Canon imageCLASS 2200 Advanced Copier                                                    20      25199.93             1259.996         3499.99        2099.994
Office Supplies  Binders       Fellowes PB500 Electric Punch Plastic Comb Binding Machine with Manual Bind              31      7753.039              250.098         1270.99         254.198
Technology       Copiers       Hewlett Packard LaserJet 3310 Copier                                                     38      6983.884             183.7864          599.99         359.994
Technology       Copiers       Canon PC1060 Personal Laser Copier                                                       19      4570.935             240.5755        10559.99         559.992
Technology       Machines      HP Designjet T520 Inkjet Large Format Printer - 24" Color                                12      4094.977             341.2481         1749.99         874.995
Technology       Machines      Ativa V4110MDD Micro-Cut Shredder                                                        11      3772.946             342.9951          699.99          699.99
(Avg.Profit.per.Unit = Total_Profit / Total.Quantity.)
Looking at Figure 3.1, all four Technology sub-categories are profitable for the company, and Copiers, Phones and Accessories each make more than $40,000! All Office Supplies sub-categories except Supplies are profitable. Among Furniture products, Chairs and Furnishings make a profit, while Bookcases and Tables are responsible for losses.
From Table 3.2, the most profitable product is the Canon imageCLASS 2200 Advanced Copier, a copier in the Technology category. However, the Item.Price of the Canon PC1060 Personal Laser Copier is doubtful: its Max.Item.Price is $10,559.99 while its Min.Item.Price is $559.992. A gap of exactly $10,000 is far too large for the same copier, so I suspect a typo. I will discuss such large differences between the maximum and minimum item price in Section 5.
Table 3.3: Details of Canon PC1060 Personal Laser Copier’s transaction
Product_Name Item_Price Quantity Discount
Canon PC1060 Personal Laser Copier 559.992 2 0.2
Canon PC1060 Personal Laser Copier 10559.992 5 0.2
Canon PC1060 Personal Laser Copier 559.992 5 0.2
Canon PC1060 Personal Laser Copier 699.99 7 0
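Table 3.3 also suggests how Item.Price is built: the undiscounted sale is at $699.99, and 699.99 × (1 - 0.2) = 559.992 matches the discounted rows. Assuming Item.Price = list price × (1 - Discount) holds in general (an inference from these four rows, not something documented in the dataset), one can back out the implied list price of each transaction. The "more than twice the median" flag below is a hypothetical threshold chosen for illustration:

```r
# Rows of Table 3.3 (Canon PC1060 Personal Laser Copier)
tx <- data.frame(
  Item.Price = c(559.992, 10559.992, 559.992, 699.99),
  Quantity   = c(2, 5, 5, 7),
  Discount   = c(0.2, 0.2, 0.2, 0)
)

# Assumed relationship: Item.Price = list price * (1 - Discount)
tx$List.Price <- tx$Item.Price / (1 - tx$Discount)

# SalesTotal = Item.Price x Quantity
tx$SalesTotal <- tx$Item.Price * tx$Quantity

# Hypothetical screen: implied list price more than twice the median
tx$Suspicious <- tx$List.Price > 2 * median(tx$List.Price)
print(tx)
```

Under this assumption the $10,559.992 row implies a list price of about $13,199.99, while the other three rows all imply $699.99, which supports the typo explanation.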
Conclusion:
1. Technology products such as copiers, phones and accessories make a lot of profit.
2. Furniture products generally perform poorly: they either make little profit or lose a lot of money for the company.
3. The most profitable product is the Canon imageCLASS 2200 Advanced Copier.
4. Some Item.Price values are doubtful (in Table 3.3, the same copier was sold at both $10,559.992 and $559.992).
4 Cluster Analysis (DEMO)
Cluster analysis is a powerful tool for marketing analysis and is especially handy when there are many continuous variables. Although this dataset does not have many continuous variables, we can still use the method to find some interesting patterns.
We create a new dataset by aggregating on State; its first 6 rows are shown in Table 4.1.
The cluster analysis is based on SalesTotal, Profit, Quantity and Avg.iterm.price (the average item price, SalesTotal/Quantity).
Table 4.1: Dataset for clustering analysis
> df7 <- ddply(df1[,c("State","SalesTotal","Profit","Quantity")],
+ .(State),colwise(sum))
> df7$Avg.iterm.price <- df7$SalesTotal/df7$Quantity
> rownames(df7) <- as.character(df7$State)
> df7 <- df7[,2:5]
> head(df7)
SalesTotal Profit Quantity Avg.iterm.price
Alabama 19510.64 5786.825 256 76.21344
Arizona 35282.00 -3427.925 862 40.93040
Arkansas 11678.13 4008.687 240 48.65887
California 457687.63 76381.387 7667 59.69579
Colorado 32108.12 -6527.858 693 46.33206
Connecticut 13384.36 3511.492 281 47.63116
After standardizing each variable with the scale function, we calculate the Euclidean distance between each pair of states. We then use the "average" linkage method for hierarchical clustering. The initial result of the cluster analysis can be found in Figure 4.1.
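The three steps above (standardize, compute distances between states, cluster with average linkage) can be sketched with base R's scale(), dist() and hclust(). To keep it self-contained, this sketch uses only the six rows shown in Table 4.1, so its tree will differ from Figure 4.1, which uses all states:

```r
# The six rows of Table 4.1 (state totals)
df <- data.frame(
  SalesTotal = c(19510.64, 35282.00, 11678.13, 457687.63, 32108.12, 13384.36),
  Profit     = c(5786.825, -3427.925, 4008.687, 76381.387, -6527.858, 3511.492),
  Quantity   = c(256, 862, 240, 7667, 693, 281),
  row.names  = c("Alabama", "Arizona", "Arkansas", "California", "Colorado", "Connecticut")
)
df$Avg.item.price <- df$SalesTotal / df$Quantity

# Standardize each column (mean 0, sd 1) so large-valued variables do not dominate
df.scaled <- scale(df)

# Euclidean distance between every pair of rows, i.e. between states
d <- dist(df.scaled, method = "euclidean")

# Agglomerative hierarchical clustering with average linkage
fit <- hclust(d, method = "average")

# Two-group cut of this small example
groups <- cutree(fit, k = 2)
print(groups)
```

In this six-state subset, California is far from every other state on sales, profit and quantity, so it ends up alone in its group.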
Figure 4.1: Initial Results - Cluster Analysis (dendrogram of the 48 states and the District of Columbia, titled "Average Linkage Clustering"; hclust(*, "average") on d; y-axis: Height)
There are many criteria we can use to determine the number of clusters. In my experience, the
NbClust::NbClust function is very helpful.
Figure 4.2: Determine the number of clusters (bar chart "Number of Clusters Chosen by 26 Criteria"; x-axis: Number of Clusters; y-axis: Number of Criteria)
NbClust::NbClust uses 26 different criteria to determine the number of clusters. Based on the NbClust::NbClust result in Figure 4.2, I set the number of clusters to 3.
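The decision rule behind Figure 4.2 is a majority vote: each criterion proposes a number of clusters, and the most frequently proposed number wins. A minimal sketch with a hypothetical vote vector standing in for nc$Best.n[1, ]; dropping zeros reflects my understanding that NbClust records 0 for graphical indices that do not propose a number, which should be treated as an assumption:

```r
# Hypothetical best-k proposals, one per criterion (stand-in for nc$Best.n[1, ])
best_k <- c(3, 3, 2, 0, 3, 5, 10, 3, 0, 2, 3, 9)

# Drop zeros, which do not propose a number of clusters
proposals <- best_k[best_k > 0]

# Count how many criteria vote for each k, then take the most common
votes <- table(proposals)
print(votes)
chosen_k <- as.integer(names(votes)[which.max(votes)])
print(chosen_k)  # 3
```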
Figure 4.3: Final Results - Cluster Analysis (dendrogram titled "Average Linkage Clustering / 3 Cluster Solution" with the three clusters boxed; hclust(*, "average") on d; y-axis: Height)
The final result can be found in Figure 4.3. New York and California are assigned to cluster 2, Wyoming is assigned to cluster 3, and the remaining states are assigned to cluster 1.
Description of Clusters
> aggregate(df7, by = list(clusters), median)
Group.1 SalesTotal Profit Quantity Avg.iterm.price
1 1 20944.270 2116.598 268.5 57.87003
2 2 384281.951 75209.968 5945.5 66.64670
3 3 1603.136 100.196 4.0 400.78400
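The cluster labels and the "Description of Clusters" table come from cutting the dendrogram with cutree() and summarizing each cluster by its medians with aggregate(). A minimal sketch on six made-up rows (two tight groups plus one outlier, loosely mirroring the Wyoming case); the data are hypothetical:

```r
# Hypothetical data: rows A-C and D-E form tight groups, F is an outlier in Price
toy <- data.frame(
  Profit = c(10, 12, 11, 200, 205, 50),
  Price  = c(5, 6, 5, 7, 6, 400),
  row.names = c("A", "B", "C", "D", "E", "F")
)
fit <- hclust(dist(scale(toy)), method = "average")

# Cut the tree into k = 3 groups
clusters <- cutree(fit, k = 3)
print(clusters)

# Median of every variable within each cluster
profile <- aggregate(toy, by = list(cluster = clusters), FUN = median)
print(profile)
```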
We can easily see that the average item price for Wyoming is as high as $400, which makes Wyoming an outlier compared with the other states. New York State and California are grouped together because of their outstanding profit performance. The other states are grouped together because the algorithm finds them highly similar to one another. However, I have to point out that this section is only a demo to illustrate my ability in data mining and machine learning; much more work would be needed to draw serious conclusions.
5 Doubtful Item.Price
Table 5.1: Doubtful Item.Price
Category         Sub_Category  Product_Name                                                              Total_Profit  Max_Item_Price  Min_Item_Price      Range
Furniture        Furnishings   Deflect-o DuraMat Antistatic Studded Beveled Mat for Medium Pile Carpeting      244.3888        10105.34          42.136    10063.2
Technology       Accessories   Logitech P710e Mobile Speakerphone                                              1645.361        10257.49         205.992    10051.5
Furniture        Chairs        DMI Arturo Collection Mission-style Design Wood Chair                           486.1556        10105.69         105.686      10000
Technology       Copiers       Canon PC1060 Personal Laser Copier                                              4570.935        10559.99         559.992      10000
Technology       Phones        BlackBerry Q10                                                                  548.0565        10100.79         100.792      10000
Technology       Phones        RCA ViSYS 25825 Wireless digital phone                                            90.993        10103.99         103.992      10000
Office Supplies  Binders       Ibico EPK-21 Electric Binding System                                            3345.282         1889.99         377.998   1511.992
Technology       Machines      Cubify CubeX 3D Printer Double Head Print                                       -8879.97        2399.992         899.997   1499.995
Technology       Copiers       Canon imageCLASS 2200 Advanced Copier                                           25199.93         3499.99        2099.994   1399.996
Office Supplies  Binders       GBC DocuBind P400 Electric Binding System                                       -1878.17         1360.99         272.198   1088.792
Technology       Machines      Lexmark MX611dhe Monochrome Laser Printer                                       -4589.97        1529.991         509.997   1019.994
Office Supplies  Binders       Fellowes PB500 Electric Punch Plastic Comb Binding Machine with Manual Bind     7753.039         1270.99         254.198   1016.792
Office Supplies  Binders       Fellowes PB200 Plastic Comb Binding Machine                                     693.5592        1050.997          50.997       1000
Office Supplies  Envelopes     Tyvek Top-Opening Peel & Seel Envelopes, Plain White                            225.0504        1021.744          21.744       1000
As I mentioned at the end of Section 3, the difference between the maximum and minimum item price is too large for some products. Table 5.1 lists all products with doubtful Item.Price values, where the Range variable equals Max_Item_Price minus Min_Item_Price. It is implausible that a BlackBerry Q10 could be sold at $10,100.79 and also at $100.792. Several ranges are exactly $10,000 or $1,000, which suggests data-entry errors in a single digit rather than genuine price differences.
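For readers without SAS, the same screen can be sketched in base R: compute the maximum and minimum Item.Price per product and keep the products whose range is at least 1000, largest first. The product names and prices below are made up for illustration:

```r
# Hypothetical order lines; Table 5.1 was produced with SAS PROC SQL
toy <- data.frame(
  Product.Name = c("Phone X", "Phone X", "Phone X", "Stapler", "Stapler", "Chair Y", "Chair Y"),
  Item.Price   = c(1100.5, 100.5, 100.5, 12.5, 10.0, 2500, 1400)
)

# Max and min Item.Price per product
mx <- tapply(toy$Item.Price, toy$Product.Name, max)
mn <- tapply(toy$Item.Price, toy$Product.Name, min)

# Range per product as a plain named vector
price_range <- setNames(as.numeric(mx - mn), names(mx))

# Keep products with a range of at least 1000, largest range first
doubtful <- sort(price_range[price_range >= 1000], decreasing = TRUE)
print(doubtful)
```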
6 Code in R and SQL
Code for Section 1:
##Data Import##
# stringsAsFactors = TRUE keeps text columns as factors, which the
# levels()/droplevels() fixes below rely on (the default became FALSE in R 4.0.0)
df1 <- read.csv(file =
"/home/kevin/Desktop/The Bee Corp/Quant Round.csv",
header = TRUE, stringsAsFactors = TRUE)
dim(df1)
##summary of variables##
#Row.ID#
length(unique(df1$Row.ID))
#Order.ID#
length(unique(df1$Order.ID))
#Order.Date#
# numeric dates converted with origin "1900-01-01"; if these are Excel serial
# dates, origin = "1899-12-30" is usually needed to match Excel's calendar
df1$Order.Date <- as.Date(df1$Order.Date, origin = "1900-01-01")
#Ship.Date#
df1$Ship.Date <- as.Date(df1$Ship.Date, origin = "1900-01-01")
#Ship.Mode#
unique(df1$Ship.Mode)
#Customer.ID#
length(unique(df1$Customer.ID))
#Customer.Name#
length(unique(df1$Customer.Name))
#Segment#
table(df1$Segment)
# sorted levels are "Consumer", "Corporate", "Corporate " (trailing space), "Home Office";
# renaming the third level to "Corporate" merges it into the second
levels(df1$Segment) <- c("Consumer","Corporate","Corporate","Home Office") ## Fix TYPO
#Country#
table(df1$Country)
#City#
length(unique(df1$City))
#State#
length(unique(df1$State))
sort(unique(df1$State))
df1$State[df1$State == "CAL "] <- "California"
df1$State <- droplevels(df1$State, exclude = "CAL ") #correct CAL #
df1$State[df1$State == "IND "] <- "Indiana"
df1$State <- droplevels(df1$State, exclude = "IND ") #correct IND #
table(df1$State)
#Region#
table(df1$Region)
#Category#
table(df1$Category)
#Sub.Category#
table(df1$Sub.Category,df1$Category)
Code for Section 2:
#maps#
library(latticeExtra)  # mapplot(); also attaches lattice
library(maps)          # map("state", ...) state boundaries
library(mapproj)       # "tetra" map projection
library(plyr)          # ddply(), colwise(), arrange()
# state-level totals; map("state") uses lower-case state names
df2 <- data.frame(state = tolower(df1$State),
sales = df1$SalesTotal,
profit = df1$Profit)
df2 <- ddply(df2, .(state), colwise(sum))
plot1 <- mapplot(state ~ sales, data = df2,
breaks = seq(from = 0,
to = max(df2$sales), length.out = 51),
map = map("state", plot = FALSE, fill = TRUE, projection = "tetra"),
scales = list(draw = FALSE),
main = "Total Sales in State Level", xlab = "")
plot2 <- mapplot(state ~ profit, data = df2,
breaks = seq(from = min(df2$profit),
to = max(df2$profit)*1.1, length.out = 51),
map = map("state", plot = FALSE, fill = TRUE, projection = "tetra"),
scales = list(draw = FALSE),
main = "Profit in State Level", xlab = "")
# stack the two maps vertically
print(plot1, split = c(1,1,1,2), more = TRUE)
print(plot2, split = c(1,2,1,2), more = FALSE)
#sales#
df2$sales.ratio <- df2$sales/sum(df2$sales)
head(df2[order(-df2$sales.ratio),],4)
#profit#
head(df2[order(-df2$profit),],4)
#discount#
table(df1[df1$State == "Texas","Discount"])
round(prop.table(table(
df1[df1$State == "Texas","Discount"])),2)
#region/sale#
table(df1$State)
head(table(df1$State,df1$Region),10)
# state totals of profit and sales, sorted by profit (descending)
df3 <- ddply(df1[,c("State","Profit","SalesTotal")],.(State),colwise(sum))
df3 <- arrange(df3,-df3$Profit)
df3$Profit.Sales.Ratio <- df3$Profit/df3$SalesTotal
df3[df3$Profit < 0,]
# discount distribution in loss-making vs. profitable states
tab1 <- table(df1[,c("State","Discount")])
tab1[as.character(df3[df3$Profit < 0,"State"]),]
head(tab1[as.character(df3[df3$Profit > 0,"State"]),],5)
Code for Section 3:
#Category#
df4 <- ddply(df1[,c("Category","Profit")],.(Category),colwise(sum))
df4 <- arrange(df4,-df4$Profit)
df4$percent <- round(df4$Profit/sum(df4$Profit)*100,2)
df4
#Sub.Category#
df5 <- ddply(df1[,c("Category","Sub.Category","Profit")],
.(Category, Sub.Category),colwise(sum))
df5 <- arrange(df5, -df5$Profit)
df5
df5$Positive <- as.numeric(df5$Profit > 0)
key.variety <- list(space = "right", text = list (c("Profit","Deficit")),
points = list(pch = 16, col = c("#0080ff","#ff00ff")))
plot2 <- dotplot(factor(Sub.Category,
levels = rev(as.character(arrange(df5, df5$Category, -df5$Profit)$Sub.Category)))
~ Profit| Category, groups = -Positive, data = df5, pch = 16, key = key.variety,
layout = c(1,3), scales=list(y = list(relation = 'free')),
panel = function(...){
panel.grid (h = 0, v= -1)
panel.dotplot(...)
},layout.heights = "free", xlab = "Profit/Deficit", lattice.options =
modifyList (lattice.options(),list(skip.boundary.labels = 0)),
main = "Profit by Category/Subcategory");plot2
resizePanels()
#Most Profitable Product#
df6 <- ddply(df1[,c("Category","Sub.Category","Product.Name","Quantity","Profit")],
.(Category,Sub.Category,Product.Name),colwise(sum))
head(arrange(df6,-df6$Profit))
Code in Section 4:
#Cluster Analysis#
par(mfrow = c(1,1))
df7 <- ddply(df1[,c("State","SalesTotal","Profit","Quantity")],
.(State),colwise(sum))
df7$Avg.iterm.price <- df7$SalesTotal/df7$Quantity
rownames(df7) <- as.character(df7$State)
df7 <- df7[,2:5]
head(df7)
df7.scaled <- scale(df7)
d <- dist(df7.scaled)
fit.average <- hclust(d, method = "average")
plot(fit.average, hang = -1, cex = 0.8, main = "Average Linkage Clustering")
library(NbClust)
devAskNewPage(ask = F)
nc <- NbClust(df7.scaled, distance = "euclidean",
min.nc = 2, max.nc = 10, method = "average")
table(nc$Best.n[1,])
par(mfrow = c(1,1))
barplot(table(nc$Best.n[1,]),xlab = "Number of Clusters", ylab = "Number of Criteria",
main = "Number of Clusters Chosen by 26 Criteria")
clusters <- cutree(fit.average, k = 3)
table(clusters)
aggregate(df7, by = list(clusters), median)
plot(fit.average, hang = -1, cex = 0.8,
main = "Average Linkage Clustering\n3 Cluster Solution")
rect.hclust(fit.average,k=3,border = 2)
Code in Section 5 - SAS/SQL Code:
PROC SQL;
/* price range per product; keep products whose range is at least 1000 */
SELECT * FROM(
SELECT DISTINCT Category, Sub_Category, Product_Name,
sum(Profit) as Total_Profit,
max(Item_Price) as Max_Item_Price, min(Item_Price) as Min_Item_Price,
Calculated Max_Item_Price - Calculated Min_Item_Price as Range
FROM WORK.IMPORT
GROUP BY Category, Sub_Category, Product_Name)
/* Range is an ordinary column of the inline view here, so CALCULATED is not used */
WHERE Range >= 1000
ORDER BY Range DESC;
QUIT;
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 

Marketing analysis - writing sample

Marketing Analysis for The Bee Corp
Qingyang(Kevin) Liu
Email:tug14939@temple.edu
June 22, 2017
1 Introduction of the dataset
The original file Quant Round.xlsx contains three sheets. Base R cannot read the xlsx format without add-on packages, so I converted Quant Round.xlsx into Quant Round.csv and kept only the first sheet. The csv format has much better compatibility, and the first sheet of Quant Round.xlsx contains all the information we need.
The import process using the read.csv command in R is shown below:
> df1 <- read.csv(file =
+ "/home/kevin/Desktop/The Bee Corp/Quant Round.csv",
+ header = T)
> dim(df1)
[1] 9994 22
The Quant Round.csv file has been imported into R as the df1 data frame, which contains 9994 rows and 22 variables. Summary information for the important variables is shown below.
Row.ID: The primary key for this dataset. This variable is unique for each row.
Order.ID: The order identification. This variable does not have to be unique: one order can span multiple rows, because one order may contain different products. There are 5009 distinct orders in df1.
Order.Date: The date when the order was created or submitted. Order.Date was stored in numeric format. I converted it into yyyy-mm-dd format, using "1900-01-01" as the origin date.
Ship.Date: The date when the order was shipped. It is also stored in numeric format, and I converted it into yyyy-mm-dd format, using "1900-01-01" as the origin date.
Ship.Mode: There are four different ship modes: Same Day, First Class, Standard Class and Second Class.
Customer.ID: Customer identification. Each customer has one unique ID.
Segment: There are three different segments in this dataset: Consumer, Corporate and Home Office. ("Corporate␣", with a trailing space, has been corrected to "Corporate".)
Country: All orders were shipped within the United States.
City: There are 531 different cities in this dataset.
State: There are 48 contiguous U.S. states and the District of Columbia in this dataset. ("CAL␣" has been corrected to "California" and "IND␣" to "Indiana".)
Region: There are five regions in this dataset: Central, East, North, South and West. The original dataset contains a few mistakes; for example, there are 37 records in which Florida was categorized as North region.
Product.ID: Product identification. Each product has one unique ID.
Category: All products belong to three categories: Furniture, Office Supplies and Technology.
Sub.Category: The relationship between Sub.Category and Category is shown in Table 1.1. Each Sub.Category belongs to exactly one Category.
Table 1.1: Sub.Category (in rows) and Category (in columns)
              Furniture  Office Supplies  Technology
Accessories           0                0         775
Appliances            0              466           0
Art                   0              796           0
Binders               0             1523           0
Bookcases           228                0           0
Chairs              617                0           0
Copiers               0                0          68
Envelopes             0              254           0
Fasteners             0              217           0
Furnishings         957                0           0
Labels                0              364           0
Machines              0                0         115
Paper                 0             1370           0
Phones                0                0         889
Storage               0              846           0
Supplies              0              190           0
Tables              319                0           0
SalesTotal: SalesTotal = Item.Price × Quantity, where Item.Price is the price after discount.
Profit: A positive value stands for a profit; a negative value stands for a deficit.
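Order.Date and Ship.Date are described above only as numeric values. If they are Excel serial numbers in Excel's default 1900 date system (an assumption; the source does not say which system the workbook uses), the origin usually used in R is "1899-12-30" rather than "1900-01-01". Excel counts 1900-01-01 as day 1 and also treats 1900 as a leap year, so the "1900-01-01" origin places every modern date two days late. A minimal check on two serial numbers whose Excel dates are known:

```r
# Excel serials (1900 date system): 36526 is 2000-01-01, 42736 is 2017-01-01.
serial <- c(36526, 42736)

# Origin commonly used in R for Excel 1900-system serial numbers.
excel_dates <- as.Date(serial, origin = "1899-12-30")

# Origin assumed in this report.
report_dates <- as.Date(serial, origin = "1900-01-01")

print(excel_dates)
# Offset between the two conversions, in days.
print(as.numeric(report_dates - excel_dates))
```

If the workbook used Excel's 1904 date system instead, the matching origin would be "1904-01-01".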
2 Sales/Profit by Region
Figure 2.1: Maps of Sales and Profit at State Level (panels: "Total Sales in State Level" and "Profit in State Level")
Table 2.1: Inconsistent definition of regions (part)
> head(table(df1$State,df1$Region),10)
                       Central East North South West
  Alabama                    0    0     1    60    0
  Arizona                    0    0     0     0  224
  Arkansas                   0    0     1    59    0
  California                 0    0     0     0 2001
  Colorado                   0    0     0     0  182
  Connecticut                0   82     0     0    0
  Delaware                   0   96     0     0    0
  District of Columbia       0   10     0     0    0
  Florida                    0    0    37   346    0
  Georgia                    0    0     6   178    0
Table 2.2: Top 4 States by Sales
> head(df2[order(-df2$sales.ratio),],4)
        state    sales    profit sales.ratio
4  california 457687.6  76381.39  0.19923710
31   new york 310876.3  74038.55  0.13532829
42      texas 170188.0 -25729.36  0.07408497
46 washington 138641.3  33402.65  0.06035226
Table 2.3: Top 4 States by Profit
> head(df2[order(-df2$profit),],4)
        state     sales   profit sales.ratio
4  california 457687.63 76381.39  0.19923710
31   new york 310876.27 74038.55  0.13532829
46 washington 138641.27 33402.65  0.06035226
21   michigan  76269.61 24463.19  0.03320111
Table 2.4: Discount in Texas
> table(df1[df1$State == "Texas","Discount"])
 0.2  0.3 0.32  0.4  0.6  0.8
 570   94   27   13   81  200
The rule for assigning regions in this dataset is doubtful and inconsistent. According to Table 2.1, 37 records in Florida, 6 in Georgia and 1 each in Alabama and Arkansas have been assigned to the North region. In total, more than 100 records have been assigned to the wrong region. In practice, the definition of each region should be confirmed with a supervisor. In this report, region-based quantitative marketing analysis is skipped.
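The inconsistency in Table 2.1 can be detected mechanically by counting how many regions each state is assigned to. A minimal base-R sketch on a few hypothetical records; in the report, the inputs would be df1$State and df1$Region:

```r
# Hypothetical records standing in for df1[, c("State", "Region")].
rec <- data.frame(
  State  = c("Florida", "Florida", "Florida", "Alabama", "Alabama", "Arizona"),
  Region = c("South", "South", "North", "South", "South", "West"),
  stringsAsFactors = FALSE
)

# State-by-region counts, the same layout as Table 2.1.
tab <- table(rec$State, rec$Region)

# Number of distinct regions each state appears in.
n_regions <- rowSums(tab > 0)

# States assigned to more than one region are inconsistent.
inconsistent <- names(n_regions)[n_regions > 1]
print(inconsistent)
```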
According to Table 2.2 and Table 2.3, California is the company's largest market and New York is the second largest, whether measured by sales or by profit. Texas shows contradictory sales and profit performance: by sales it is the company's third largest market, yet the company lost $25,729.36 there. Table 2.4 shows that the company applies heavy discounts in Texas: every record sold in the Texas market carries a discount of at least 20%, and 200 records carry an 80% discount. Furthermore, Table 2.5 and Table 2.6 show that every sale in the deficit markets carries a discount of at least 20%. Discounting is therefore an important reason for the deficits in those markets. The reason for applying such a heavy discount strategy should be discussed with the business manager: it could be a market penetration strategy, or those products may be too difficult to sell.
Table 2.5: Deficit Markets at State Level
> df3[df3$Profit < 0,]
            State     Profit SalesTotal Profit.Sales.Ratio
40         Oregon  -1190.470   17431.15        -0.06829558
41        Florida  -3399.302   89473.71        -0.03799219
42        Arizona  -3427.925   35282.00        -0.09715789
43      Tennessee  -5341.694   30661.87        -0.17421289
44       Colorado  -6527.858   32108.12        -0.20330864
45 North Carolina  -7490.912   55603.16        -0.13472097
46       Illinois -12607.887   80166.10        -0.15727205
47   Pennsylvania -15559.960  116511.91        -0.13354823
48           Ohio -16971.377   78258.14        -0.21686405
49          Texas -25729.356  170188.05        -0.15118192
Table 2.6: Discount in Deficit Markets
> tab1 <- table(df1[,c("State","Discount")])
> tab1[as.character(df3[df3$Profit < 0,"State"]),]
                Discount
State            0 0.1 0.15 0.2 0.3 0.32 0.4 0.45 0.5 0.6 0.7 0.8
  Oregon         0   0    0 100   0    0   0    0   5   0  19   0
  Florida        0   0    0 299   0    0   0   11   6   0  67   0
  Arizona        0   0    0 174   0    0   0    0   9   0  41   0
  Tennessee      0   0    0 144   0    0   8    0   2   0  29   0
  Colorado       0   0    0 138   0    0   0    0   4   0  40   0
  North Carolina 0   0    0 201   0    0   8    0   4   0  36   0
  Illinois       0   0    0 264  53    0   0    0  18  57   0 100
  Pennsylvania   0   0    0 354  36    0  82    0  10   0 105   0
  Ohio           0   0    0 290  23    0  67    0   8   0  81   0
  Texas          0   0    0 570  94   27  13    0   0  81   0 200
Conclusion:
1. California and New York are the two most successful markets, by either profit or sales.
2. The company is losing money in states such as Texas, Ohio and many others because of heavy discounts.
3. The regions are ill-defined, so no conclusions are drawn from them.
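The check behind Tables 2.5 and 2.6 (find the states with negative total profit, then look at the smallest discount applied there) can be sketched in base R. The records below are hypothetical, and aggregate() stands in for the ddply(..., colwise(sum)) step used in Section 6:

```r
# Hypothetical order lines standing in for df1[, c("State", "Profit", "Discount")].
recs <- data.frame(
  State    = c("Texas", "Texas", "Texas", "California", "California", "Ohio"),
  Profit   = c(-120.5, 15.2, -300.0, 80.1, 42.0, -55.3),
  Discount = c(0.2, 0.3, 0.8, 0, 0.2, 0.4),
  stringsAsFactors = FALSE
)

# Total profit per state.
by_state <- aggregate(Profit ~ State, data = recs, FUN = sum)

# States whose total profit is negative ("deficit markets").
deficit_states <- as.character(by_state$State[by_state$Profit < 0])

# Smallest discount applied anywhere in each deficit state.
min_disc <- tapply(recs$Discount, recs$State, min)[deficit_states]
print(min_disc)
```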
3 Profit by Category/Subcategory/Specific Product
According to Table 3.1, Technology and Office Supplies account for 50.79% and 42.77% of the company's total profit, respectively. Furniture products contribute only 6.44% of the total profit.
Table 3.1: Profit by Category
> df4 <- ddply(df1[,c("Category","Profit")],.(Category),colwise(sum))
> df4 <- arrange(df4,-df4$Profit)
> df4$percent <- round(df4$Profit/sum(df4$Profit)*100,2)
> df4
         Category    Profit percent
1      Technology 145454.95   50.79
2 Office Supplies 122490.80   42.77
3       Furniture  18451.27    6.44
Figure 3.1: Profit by Category/Subcategory (dot plot of profit or deficit by sub-category, one panel each for Furniture, Office Supplies and Technology; x-axis: Profit/Deficit; legend: Profit, Deficit)
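The percentage column of Table 3.1 is each category's profit divided by the total, times 100, rounded to two decimals. A base-R sketch of that step, using the three category totals from Table 3.1 as input:

```r
# Category totals copied from Table 3.1.
cat_profit <- data.frame(
  Category = c("Technology", "Office Supplies", "Furniture"),
  Profit   = c(145454.95, 122490.80, 18451.27),
  stringsAsFactors = FALSE
)

# Sort by profit (descending) and express each as a share of the total,
# mirroring df4$percent <- round(df4$Profit/sum(df4$Profit)*100,2).
cat_profit <- cat_profit[order(-cat_profit$Profit), ]
cat_profit$percent <- round(cat_profit$Profit / sum(cat_profit$Profit) * 100, 2)
print(cat_profit)
```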
Table 3.2: Most Profitable Products
Category | Sub.Category | Product.Name | Total.Quantity | Total_Profit | Avg.Profit.per.Unit | Max.Item.Price | Min.Item.Price
Technology | Copiers | Canon imageCLASS 2200 Advanced Copier | 20 | 25199.93 | 1259.996 | 3499.99 | 2099.994
Office Supplies | Binders | Fellowes PB500 Electric Punch Plastic Comb Binding Machine with Manual Bind | 31 | 7753.039 | 250.098 | 1270.99 | 254.198
Technology | Copiers | Hewlett Packard LaserJet 3310 Copier | 38 | 6983.884 | 183.7864 | 599.99 | 359.994
Technology | Copiers | Canon PC1060 Personal Laser Copier | 19 | 4570.935 | 240.5755 | 10559.99 | 559.992
Technology | Machines | HP Designjet T520 Inkjet Large Format Printer - 24" Color | 12 | 4094.977 | 341.2481 | 1749.99 | 874.995
Technology | Machines | Ativa V4110MDD Micro-Cut Shredder | 11 | 3772.946 | 342.9951 | 699.99 | 699.99
Figure 3.1 shows that all four Technology sub-categories are profitable, and that Copiers, Phones and Accessories each make more than $40,000 for the company. All Office Supplies sub-categories except Supplies are profitable. Within Furniture, Chairs and Furnishings make a profit, while Bookcases and Tables run a deficit. From Table 3.2, the most profitable product is the Canon imageCLASS 2200 Advanced Copier, a Copiers product in the Technology category. However, the Item.Price of the Canon PC1060 Personal Laser Copier is doubtful: its Max.Item.Price is $10,559.99 while its Min.Item.Price is $559.992, a difference far too large for a copier. I suspect the difference was caused by a typo. These large gaps between maximum and minimum item price are discussed in Section 5.
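Table 3.2 is a product-level roll-up, and its per-unit column equals Total_Profit divided by Total.Quantity (for example, 25199.93 / 20 ≈ 1260). A base-R sketch of the same roll-up with aggregate(), on hypothetical order lines and product names:

```r
# Hypothetical order lines standing in for df1[, c("Product.Name", "Quantity", "Profit")].
orders <- data.frame(
  Product.Name = c("Copier A", "Copier A", "Binder B", "Binder B", "Phone C"),
  Quantity     = c(2, 3, 10, 5, 4),
  Profit       = c(900, 1350, 60, 30, -20),
  stringsAsFactors = FALSE
)

# Total quantity and total profit per product.
prod <- aggregate(cbind(Quantity, Profit) ~ Product.Name, data = orders, FUN = sum)

# Average profit earned per unit sold.
prod$Profit.per.Unit <- prod$Profit / prod$Quantity

# Most profitable products first.
prod <- prod[order(-prod$Profit), ]
print(prod)
```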
Table 3.3: Details of Canon PC1060 Personal Laser Copier's transactions
Product_Name | Item_Price | Quantity | Discount
Canon PC1060 Personal Laser Copier | 559.992 | 2 | 0.2
Canon PC1060 Personal Laser Copier | 10559.992 | 5 | 0.2
Canon PC1060 Personal Laser Copier | 559.992 | 5 | 0.2
Canon PC1060 Personal Laser Copier | 699.99 | 7 | 0
Conclusion:
1. Products such as copiers, phones and accessories in the Technology category make a lot of profit.
2. Furniture products generally perform poorly: they either make little profit or lose a lot of money for the company.
3. The most profitable product is the Canon imageCLASS 2200 Advanced Copier.
4. Some Item.Price values are doubtful (in Table 3.3, the same copier has been sold at both $10,559.992 and $559.992).
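Item.Price is defined in Section 1 as the price after discount. Taking the undiscounted row of Table 3.3 ($699.99) as the list price, an assumption, a 20% discount gives exactly the $559.992 seen in the other rows, and the suspicious $10,559.992 exceeds that value by exactly $10,000. A quick arithmetic check:

```r
list_price <- 699.99   # undiscounted Item.Price in Table 3.3 (assumed list price)
discount   <- 0.2      # discount on the other three rows

# Price after discount.
net_price <- list_price * (1 - discount)

# Gap between the suspicious price and the expected net price.
gap <- 10559.992 - net_price

print(round(net_price, 3))
print(round(gap, 3))
```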
4 Cluster Analysis (DEMO)
Cluster analysis is a powerful tool for marketing analysis and is especially handy when there are many continuous variables. Although this dataset does not have many continuous variables, we can still use this method to obtain some interesting findings. We create a new dataset by aggregating on State; the first 6 rows of the new dataset are shown in Table 4.1. The cluster analysis is based on SalesTotal, Profit, Quantity and Avg.iterm.price (the average item price, SalesTotal / Quantity).
Table 4.1: Dataset for clustering analysis
> df7 <- ddply(df1[,c("State","SalesTotal","Profit","Quantity")],
+ .(State),colwise(sum))
> df7$Avg.iterm.price <- df7$SalesTotal/df7$Quantity
> rownames(df7) <- as.character(df7$State)
> df7 <- df7[,2:5]
> head(df7)
            SalesTotal    Profit Quantity Avg.iterm.price
Alabama       19510.64  5786.825      256        76.21344
Arizona       35282.00 -3427.925      862        40.93040
Arkansas      11678.13  4008.687      240        48.65887
California   457687.63 76381.387     7667        59.69579
Colorado      32108.12 -6527.858      693        46.33206
Connecticut   13384.36  3511.492      281        47.63116
After standardizing each variable with the scale function, we calculate the Euclidean distance between each pair of states. Then we apply hierarchical clustering with the "average" linkage method. The initial result of the cluster analysis is shown in Figure 4.1.
Figure 4.1: Initial Results - Cluster Analysis (dendrogram "Average Linkage Clustering" of the 49 states, hclust (*, "average"); y-axis: Height)
There are many criteria we can use to determine the number of clusters.
In my experience, the NbClust::NbClust function is very helpful for this.
Figure 4.2: Determine the number of clusters (bar chart "Number of Clusters Chosen by 26 Criteria"; x-axis: Number of Clusters; y-axis: Number of Criteria)
The NbClust::NbClust function uses 26 different criteria to determine the number of clusters. Based on the NbClust::NbClust result in Figure 4.2, I set the number of clusters to 3.
Figure 4.3: Final Results - Cluster Analysis (dendrogram "Average Linkage Clustering, 3 Cluster Solution", hclust (*, "average"); y-axis: Height)
The final result is shown in Figure 4.3. New York and California are categorized as cluster 2, Wyoming as cluster 3, and the remaining states as cluster 1.
Description of Clusters
> aggregate(df7, by = list(clusters), median)
  Group.1  SalesTotal    Profit Quantity Avg.iterm.price
1       1   20944.270  2116.598    268.5        57.87003
2       2  384281.951 75209.968   5945.5        66.64670
3       3    1603.136   100.196      4.0       400.78400
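A compact base-R sketch of the pipeline described in this section: standardize the four variables, compute Euclidean distances between states, apply average-linkage clustering, cut the tree into three clusters and summarize each cluster by its medians. It uses only six states, with values from Tables 2.2 and 4.1 and from the cluster medians above; New York's Quantity (4224) is back-calculated from the cluster-2 median, and Wyoming's values equal the cluster-3 medians because it forms its own cluster. Standardizing over six states instead of 49 changes the scale, so this is an illustration rather than a reproduction, although on this subset it gives the same three-way split.

```r
# Six-state subset built from the report's tables (see lead-in).
states <- data.frame(
  SalesTotal = c(457687.63, 310876.27, 19510.64, 11678.13, 13384.36, 1603.136),
  Profit     = c(76381.387, 74038.55, 5786.825, 4008.687, 3511.492, 100.196),
  Quantity   = c(7667, 4224, 256, 240, 281, 4),
  row.names  = c("California", "New York", "Alabama", "Arkansas",
                 "Connecticut", "Wyoming")
)
states$Avg.iterm.price <- states$SalesTotal / states$Quantity

# Standardize each column to mean 0 and sd 1 so no variable dominates.
z <- scale(states)

# Euclidean distances between states (rows), then average linkage.
d   <- dist(z, method = "euclidean")
fit <- hclust(d, method = "average")

# Cut the tree into three clusters and describe each by its medians.
clusters <- cutree(fit, k = 3)
print(clusters)
print(aggregate(states, by = list(cluster = clusters), FUN = median))
```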
The average price of items sold in Wyoming is as high as $400, which makes Wyoming an outlier compared with the other states. New York and California are grouped together because of their outstanding performance in sales and profit. The other states are grouped together because the algorithm finds them similar to one another. However, I should point out that this section is only a demo to illustrate my ability in data mining and machine learning; much more work needs to be done before drawing serious conclusions.
5 Doubtful Item.Price
Table 5.1: Doubtful Item.Price
Category | Sub_Category | Product_Name | Total_Profit | Max_Item_Price | Min_Item_Price | Range
Furniture | Furnishings | Deflect-o DuraMat Antistatic Studded Beveled Mat for Medium Pile Carpeting | 244.3888 | 10105.34 | 42.136 | 10063.2
Technology | Accessories | Logitech P710e Mobile Speakerphone | 1645.361 | 10257.49 | 205.992 | 10051.5
Furniture | Chairs | DMI Arturo Collection Mission-style Design Wood Chair | 486.1556 | 10105.69 | 105.686 | 10000
Technology | Copiers | Canon PC1060 Personal Laser Copier | 4570.935 | 10559.99 | 559.992 | 10000
Technology | Phones | BlackBerry Q10 | 548.0565 | 10100.79 | 100.792 | 10000
Technology | Phones | RCA ViSYS 25825 Wireless digital phone | 90.993 | 10103.99 | 103.992 | 10000
Office Supplies | Binders | Ibico EPK-21 Electric Binding System | 3345.282 | 1889.99 | 377.998 | 1511.992
Technology | Machines | Cubify CubeX 3D Printer Double Head Print | -8879.97 | 2399.992 | 899.997 | 1499.995
Technology | Copiers | Canon imageCLASS 2200 Advanced Copier | 25199.93 | 3499.99 | 2099.994 | 1399.996
Office Supplies | Binders | GBC DocuBind P400 Electric Binding System | -1878.17 | 1360.99 | 272.198 | 1088.792
Technology | Machines | Lexmark MX611dhe Monochrome Laser Printer | -4589.97 | 1529.991 | 509.997 | 1019.994
Office Supplies | Binders | Fellowes PB500 Electric Punch Plastic Comb Binding Machine with Manual Bind | 7753.039 | 1270.99 | 254.198 | 1016.792
Office Supplies | Binders | Fellowes PB200 Plastic Comb Binding Machine | 693.5592 | 1050.997 | 50.997 | 1000
Office Supplies | Envelopes | Tyvek Top-Opening Peel & Seel Envelopes, Plain White | 225.0504 | 1021.744 | 21.744 | 1000
As mentioned at the end of Section 3, the difference between the maximum and minimum item price is too large for some products. Table 5.1 lists all products with a doubtful Item.Price, that is, products whose Range is at least $1,000, where Range is the difference between Max_Item_Price and Min_Item_Price. It is implausible that the BlackBerry Q10 could be sold at $10,100.79 and also at $100.79.
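The SAS/SQL query in Section 6 flags products whose Item_Price range is at least 1,000. A base-R sketch of the same filter with tapply(); the BlackBerry Q10 prices are its Max_Item_Price and Min_Item_Price from Table 5.1, while "Stapler X" is a hypothetical product added for contrast:

```r
# Item-level prices; "Stapler X" is hypothetical.
items <- data.frame(
  Product_Name = c("BlackBerry Q10", "BlackBerry Q10", "Stapler X", "Stapler X"),
  Item_Price   = c(10100.79, 100.792, 12.5, 10.0),
  stringsAsFactors = FALSE
)

# Max and min Item_Price per product (the GROUP BY step).
mx <- tapply(items$Item_Price, items$Product_Name, max)
mn <- tapply(items$Item_Price, items$Product_Name, min)
price_range <- mx - mn

# Keep products whose range is at least 1000, largest first.
doubtful <- sort(price_range[price_range >= 1000], decreasing = TRUE)
print(doubtful)
```

A fixed threshold can also catch legitimate discounting: 3499.99 × 0.6 = 2099.994, so the Canon imageCLASS 2200 range in Table 5.1 matches a 40% discount, whereas the ranges of exactly 10,000 are harder to explain that way.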
6 Code in R and SQL
Code for Section 1:
##Data Import##
# stringsAsFactors = TRUE keeps the pre-R-4.0 default; the factor-based
# fixes below (levels<-, droplevels) rely on State and Segment being factors
df1 <- read.csv(file = "/home/kevin/Desktop/The Bee Corp/Quant Round.csv",
                header = TRUE, stringsAsFactors = TRUE)
dim(df1)
##summary of variables##
#Row.ID#
length(unique(df1$Row.ID))
#Order.ID#
length(unique(df1$Order.ID))
#Order.Date#
# Note: Excel 1900-system serial numbers usually need origin = "1899-12-30" in R
df1$Order.Date <- as.Date(df1$Order.Date, origin = "1900-01-01")
#Ship.Date#
df1$Ship.Date <- as.Date(df1$Ship.Date, origin = "1900-01-01")
#Ship.Mode#
unique(df1$Ship.Mode)
#Customer.ID#
length(unique(df1$Customer.ID))
#Customer.Name#
length(unique(df1$Customer.Name))
#Segment#
table(df1$Segment)
# Fix typo: assumes the levels sort as "Consumer", "Corporate", "Corporate ",
# "Home Office", so the trailing-space level is merged into "Corporate"
levels(df1$Segment) <- c("Consumer","Corporate","Corporate","Home Office")
#Country#
table(df1$Country)
#City#
length(unique(df1$City))
#State#
length(unique(df1$State))
sort(unique(df1$State))
df1$State[df1$State == "CAL "] <- "California"
df1$State <- droplevels(df1$State, exclude = "CAL ")  # correct "CAL "
df1$State[df1$State == "IND "] <- "Indiana"
df1$State <- droplevels(df1$State, exclude = "IND ")  # correct "IND "
table(df1$State)
#Region#
table(df1$Region)
#Category#
table(df1$Category)
#Sub.Category#
table(df1$Sub.Category, df1$Category)
Code for Section 2:
#maps#
library(latticeExtra)  # mapplot(); also attaches lattice
library(mapproj)       # map projections; attaches maps, which provides map()
library(plyr)          # ddply(), colwise(), arrange()
df2 <- data.frame(state = tolower(df1$State),
                  sales = df1$SalesTotal,
                  profit = df1$Profit)
df2 <- ddply(df2, .(state), colwise(sum))
# rng, nbreaks and breaks are not used by the two plots below
rng <- with(df2, range(sales, profit, finite = TRUE))
nbreaks <- 50
breaks <- exp(do.breaks(log(abs(rng)), nbreaks))
plot1 <- mapplot(state ~ sales, data = df2,
                 breaks = seq(from = 0, to = max(df2$sales), length.out = 51),
                 map = map("state", plot = FALSE, fill = TRUE, projection = "tetra"),
                 scales = list(draw = FALSE),
                 main = "Total Sales in State Level", xlab = "")
plot2 <- mapplot(state ~ profit, data = df2,
                 breaks = seq(from = min(df2$profit), to = max(df2$profit)*1.1, length.out = 51),
                 map = map("state", plot = FALSE, fill = TRUE, projection = "tetra"),
                 scales = list(draw = FALSE),
                 main = "Profit in State Level", xlab = "")
print(plot1, split = c(1,1,1,2), more = TRUE)
print(plot2, split = c(1,2,1,2), more = FALSE)
#sales#
df2$sales.ratio <- df2$sales/sum(df2$sales)
head(df2[order(-df2$sales.ratio),],4)
#profit#
head(df2[order(-df2$profit),],4)
#discount#
table(df1[df1$State == "Texas","Discount"])
round(prop.table(table(df1[df1$State == "Texas","Discount"])),2)
#region/sale#
table(df1$State)
head(table(df1$State,df1$Region),10)
df3 <- ddply(df1[,c("State","Profit","SalesTotal")], .(State), colwise(sum))
df3 <- arrange(df3, -df3$Profit)
df3$Profit.Sales.Ratio <- df3$Profit/df3$SalesTotal
df3[df3$Profit < 0,]
tab1 <- table(df1[,c("State","Discount")])
tab1[as.character(df3[df3$Profit < 0,"State"]),]
head(tab1[as.character(df3[df3$Profit > 0,"State"]),],5)
Code for Section 3:
#Category#
df4 <- ddply(df1[,c("Category","Profit")], .(Category), colwise(sum))
df4 <- arrange(df4, -df4$Profit)
df4$percent <- round(df4$Profit/sum(df4$Profit)*100, 2)
df4
#Sub.Category#
df5 <- ddply(df1[,c("Category","Sub.Category","Profit")],
             .(Category, Sub.Category), colwise(sum))
df5 <- arrange(df5, -df5$Profit)
df5
# 1 = profit, 0 = deficit
df5$Positive <- as.numeric(df5$Profit > 0)
key.variety <- list(space = "right",
                    text = list(c("Profit","Deficit")),
                    points = list(pch = 16, col = c("#0080ff","#ff00ff")))
# groups = -Positive puts profitable rows (-1) before deficits (0), so profitable
# sub-categories take the first group colour, matching the key
plot2 <- dotplot(factor(Sub.Category,
                        levels = rev(as.character(
                          arrange(df5, df5$Category, -df5$Profit)$Sub.Category))) ~ Profit | Category,
                 groups = -Positive, data = df5, pch = 16,
                 key = key.variety, layout = c(1,3),
                 scales = list(y = list(relation = "free")),
                 panel = function(...) {
                   panel.grid(h = 0, v = -1)
                   panel.dotplot(...)
                 },
                 layout.heights = "free", xlab = "Profit/Deficit",
                 lattice.options = modifyList(lattice.options(),
                                              list(skip.boundary.labels = 0)),
                 main = "Profit by Category/Subcategory")
plot2
resizePanels()
#Most Profitable Product#
df6 <- ddply(df1[,c("Category","Sub.Category","Product.Name","Quantity","Profit")],
             .(Category, Sub.Category, Product.Name), colwise(sum))
head(arrange(df6, -df6$Profit))
Code in Section 4:
#Cluster Analysis#
par(mfrow = c(1,1))
df7 <- ddply(df1[,c("State","SalesTotal","Profit","Quantity")], .(State), colwise(sum))
df7$Avg.iterm.price <- df7$SalesTotal/df7$Quantity
rownames(df7) <- as.character(df7$State)
df7 <- df7[,2:5]
head(df7)
df7.scaled <- scale(df7)   # standardize each variable
d <- dist(df7.scaled)      # Euclidean distance between states
fit.average <- hclust(d, method = "average")
plot(fit.average, hang = -1, cex = 0.8, main = "Average Linkage Clustering")
library(NbClust)
devAskNewPage(ask = FALSE)
nc <- NbClust(df7.scaled, distance = "euclidean", min.nc = 2, max.nc = 10,
              method = "average")
table(nc$Best.n[1,])
par(mfrow = c(1,1))
barplot(table(nc$Best.n[1,]), xlab = "Number of Clusters", ylab = "Number of Criteria",
        main = "Number of Clusters Chosen by 26 Criteria")
clusters <- cutree(fit.average, k = 3)
table(clusters)
aggregate(df7, by = list(clusters), median)
plot(fit.average, hang = -1, cex = 0.8,
     main = "Average Linkage Clustering\n3 Cluster Solution")
rect.hclust(fit.average, k = 3, border = 2)
Code in Section 5 - SAS/SQL Code:
PROC SQL;
  SELECT * FROM (
    SELECT Category, Sub_Category, Product_Name,
           sum(Profit) AS Total_Profit,
           max(Item_Price) AS Max_Item_Price,
           min(Item_Price) AS Min_Item_Price,
           CALCULATED Max_Item_Price - CALCULATED Min_Item_Price AS Range
    FROM WORK.IMPORT
    GROUP BY Category, Sub_Category, Product_Name)
  /* Range is an ordinary column of the subquery here, so CALCULATED is not needed */
  WHERE Range >= 1000
  ORDER BY Range DESC;
QUIT;