Marketing Analysis for The Bee Corp
Qingyang(Kevin) Liu
Email:tug14939@temple.edu
June 22, 2017
1 Introduction of the dataset
The original file Quant Round.xlsx contains three sheets. Base R cannot read the xlsx format without additional
packages, so I converted Quant Round.xlsx to Quant Round.csv and kept only the first sheet: the csv format has
much better compatibility, and the first sheet of Quant Round.xlsx contains all the information we need.
The import process using the read.csv command in R is shown below:
> df1 <- read.csv(file =
+ "/home/kevin/Desktop/The Bee Corp/Quant Round.csv",
+ header = T)
> dim(df1)
[1] 9994 22
The Quant Round.csv file has been imported into R as the data frame df1, which contains 9994 rows and 22 variables.
A summary of the important variables is given below.
Row.ID: The primary key for this dataset. This variable is unique for each row.
Order.ID: The order identification. This variable doesn’t have to be unique. One order can contain multiple
rows (one order may contain different products). There are 5009 distinct orders in df1.
Order.Date: The date when the order was created or submitted. Order.Date was stored in numeric format. I converted
the numeric format into yyyy-mm-dd format, assuming the origin date is "1900-01-01".
Ship.Date: The date when the order was shipped. Also stored in numeric format. I converted the numeric format
into yyyy-mm-dd format, assuming the origin date is "1900-01-01".
Ship.Mode: There are four different ship modes: Same Day, First Class, Standard Class and Second Class.
Customer.ID: Customer identification. Each customer has one unique ID.
Segment: There are three different segments in this dataset: Consumer, Corporate and Home Office.
(Corporate␣, with a trailing space, has been corrected to Corporate)
Country: All orders were shipped within the United States.
City: There are 531 different cities in this dataset.
State: The dataset covers the 48 contiguous U.S. states and the District of Columbia.
(CAL␣ has been corrected to California. IND␣ has been corrected to Indiana)
Region: There are five regions in this dataset: Central, East, North, South and West. There are a few mistakes
in the original dataset. For example, there are 37 records in which Florida is categorized as the North
region.
Product.ID: Product identification. Each product has one unique ID.
Category: All products belong to one of three categories: Furniture, Office Supplies and Technology.
Sub.Category: The relationship between Sub.Category and Category is shown in Table 1.1.
Each Sub.Category belongs to exactly one Category.
Table 1.1: Sub.Category (in rows) and Category (in columns)
             Furniture  Office Supplies  Technology
Accessories          0                0         775
Appliances           0              466           0
Art                  0              796           0
Binders              0             1523           0
Bookcases          228                0           0
Chairs             617                0           0
Copiers              0                0          68
Envelopes            0              254           0
Fasteners            0              217           0
Furnishings        957                0           0
Labels               0              364           0
Machines             0                0         115
Paper                0             1370           0
Phones               0                0         889
Storage              0              846           0
Supplies             0              190           0
Tables             319                0           0
SalesTotal: SalesTotal = Item.Price × Quantity, where Item.Price is the price after discount.
Profit: A positive number is a profit. A negative number is a deficit.
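The date conversion described for Order.Date and Ship.Date can be sketched as follows. The serial numbers are made-up values, not rows from Quant Round.csv. The first call uses the origin assumed in this report. The second uses "1899-12-30", the origin that the R documentation for as.Date gives for Excel on Windows, which is an assumption about where the file came from. Comparing one known order date against both results is a quick way to confirm which origin applies.

```r
# Hypothetical Excel serial numbers (not taken from the dataset)
serial <- c(42005, 42370)

# Origin assumed in this report
d_report <- as.Date(serial, origin = "1900-01-01")

# Origin listed in ?as.Date for Excel on Windows (assumption about the source file)
d_excel <- as.Date(serial, origin = "1899-12-30")

# Print both versions in yyyy-mm-dd format
format(d_report, "%Y-%m-%d")
format(d_excel, "%Y-%m-%d")

# The two conventions differ by a constant number of days
day_shift <- as.numeric(d_report - d_excel)
day_shift
```

Whichever origin is used, every date shifts by the same amount, so comparisons such as Ship.Date minus Order.Date are unaffected.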
2 Sales/Profit by Region
Figure 2.1: Maps of Sales and Profit in State Level
[Figure: two state-level maps. First panel: "Total Sales in State Level", color scale from 0 to 450000. Second panel: "Profit in State Level", color scale from −20000 to 80000.]
Table 2.1: Inconsistent definition of regions (part)
> head(table(df1$State,df1$Region),10)
Central East North South West
Alabama 0 0 1 60 0
Arizona 0 0 0 0 224
Arkansas 0 0 1 59 0
California 0 0 0 0 2001
Colorado 0 0 0 0 182
Connecticut 0 82 0 0 0
Delaware 0 96 0 0 0
District of Columbia 0 10 0 0 0
Florida 0 0 37 346 0
Georgia 0 0 6 178 0
Table 2.2: Top 4 States by Sales
> head(df2[order(-df2$sales.ratio),],4)
state sales profit sales.ratio
4 california 457687.6 76381.39 0.19923710
31 new york 310876.3 74038.55 0.13532829
42 texas 170188.0 -25729.36 0.07408497
46 washington 138641.3 33402.65 0.06035226
Table 2.3: Top 4 States by Profit
> head(df2[order(-df2$profit),],4)
state sales profit sales.ratio
4 california 457687.63 76381.39 0.19923710
31 new york 310876.27 74038.55 0.13532829
46 washington 138641.27 33402.65 0.06035226
21 michigan 76269.61 24463.19 0.03320111
Table 2.4: Discount in Texas
> table(df1[df1$State == "Texas","Discount"])
0.2 0.3 0.32 0.4 0.6 0.8
570 94 27 13 81 200
The rule for assigning regions is doubtful and inconsistent in this dataset. According to Table 2.1, 37 records in
Florida and 1 record in Alabama are assigned to the North region, although most records from both states are assigned
to the South region. More than 100 records are assigned to the wrong region. In real work, we would need to discuss the
definition of each region with a supervisor. For this report, quantitative marketing analysis based on regions is
skipped.
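A check for this kind of inconsistency can be sketched as below. The toy data frame and its values are hypothetical. The idea is that a state should appear in exactly one region, so any row of the State by Region table with more than one non-zero cell is suspect. Counting the minority cells as "misplaced" assumes that the most frequent region for a state is the intended one.

```r
# Toy data (hypothetical rows, not from the dataset)
toy <- data.frame(
  State  = c("Florida", "Florida", "Florida", "Texas", "Texas", "Ohio"),
  Region = c("South", "South", "North", "Central", "Central", "East"),
  stringsAsFactors = FALSE
)

# Cross-tabulate State by Region, in the same layout as Table 2.1
tab <- table(toy$State, toy$Region)

# A state is inconsistent if it has records in more than one region
n_regions <- rowSums(tab > 0)
inconsistent <- names(n_regions)[n_regions > 1]
inconsistent

# Records outside the majority region: row total minus the largest cell
misplaced <- rowSums(tab) - apply(tab, 1, max)
sum(misplaced)
```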
According to Table 2.2 and Table 2.3, California is the company's largest market and New York is the second
largest, whether measured by sales or by profit. The sales and profit performance in the Texas market is
contradictory. By sales, Texas is the company's third largest market. However, the company lost $25,729.36 in
the Texas market. Table 2.4 shows that the company discounts heavily in Texas: every record in the Texas market
has a discount of at least 20%, and 200 records have an 80% discount.
Furthermore, Table 2.5 and Table 2.6 show that every sale in the deficit markets has a discount of at least 20%.
Discounting is an important reason for the deficit in those markets. We need to discuss the reason for the large
discounts with the business manager. It could be a market penetration strategy, or those products may be too
difficult to sell.
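The link between discount and deficit can be checked with a short aggregation. This is a minimal sketch on a hypothetical toy data frame. The column names follow df1, but the numbers are invented for illustration.

```r
# Toy data (hypothetical values, not from the dataset)
toy <- data.frame(
  State    = c("Texas", "Texas", "Texas", "California", "California"),
  Discount = c(0.2, 0.8, 0.3, 0.0, 0.2),
  Profit   = c(5, -40, -3, 30, 8)
)

# Total profit and smallest discount per state, joined on State
by_state <- merge(
  aggregate(Profit ~ State, data = toy, FUN = sum),
  aggregate(Discount ~ State, data = toy, FUN = min)
)
names(by_state)[names(by_state) == "Discount"] <- "Min.Discount"

# Deficit states together with the smallest discount ever applied there
deficit <- by_state[by_state$Profit < 0, ]
deficit
```

A deficit state whose Min.Discount is above zero never sold anything at full price, which is the pattern described above for Texas.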
Table 2.5: Deficit Markets at the State Level
> df3[df3$Profit < 0,]
State Profit SalesTotal Profit.Sales.Ratio
40 Oregon -1190.470 17431.15 -0.06829558
41 Florida -3399.302 89473.71 -0.03799219
42 Arizona -3427.925 35282.00 -0.09715789
43 Tennessee -5341.694 30661.87 -0.17421289
44 Colorado -6527.858 32108.12 -0.20330864
45 North Carolina -7490.912 55603.16 -0.13472097
46 Illinois -12607.887 80166.10 -0.15727205
47 Pennsylvania -15559.960 116511.91 -0.13354823
48 Ohio -16971.377 78258.14 -0.21686405
49 Texas -25729.356 170188.05 -0.15118192
Table 2.6: Discount in Deficit Markets
> tab1 <- table(df1[,c("State","Discount")])
> tab1[as.character(df3[df3$Profit < 0,"State"]),]
Discount
State 0 0.1 0.15 0.2 0.3 0.32 0.4 0.45 0.5 0.6 0.7 0.8
Oregon 0 0 0 100 0 0 0 0 5 0 19 0
Florida 0 0 0 299 0 0 0 11 6 0 67 0
Arizona 0 0 0 174 0 0 0 0 9 0 41 0
Tennessee 0 0 0 144 0 0 8 0 2 0 29 0
Colorado 0 0 0 138 0 0 0 0 4 0 40 0
North Carolina 0 0 0 201 0 0 8 0 4 0 36 0
Illinois 0 0 0 264 53 0 0 0 18 57 0 100
Pennsylvania 0 0 0 354 36 0 82 0 10 0 105 0
Ohio 0 0 0 290 23 0 67 0 8 0 81 0
Texas 0 0 0 570 94 27 13 0 0 81 0 200
Conclusion:
1. California and New York are the two most successful markets by either profit or sales.
2. The company is losing money in states such as Texas and Ohio due to large discounts.
3. The regions are ill-defined, so no conclusion is drawn from them.
3 Profit by Category/Subcategory/Specific Product
According to Table 3.1, Technology and Office Supplies account for 50.79% and 42.77% of the company's total
profit, respectively. Products in the Furniture category contribute only 6.44% of total profit.
Table 3.1: Profit by Category
> df4 <- ddply(df1[,c("Category","Profit")],.(Category),colwise(sum))
> df4 <- arrange(df4,-df4$Profit)
> df4$percent <- round(df4$Profit/sum(df4$Profit)*100,2)
> df4
Category Profit percent
1 Technology 145454.95 50.79
2 Office Supplies 122490.80 42.77
3 Furniture 18451.27 6.44
Figure 3.1: Profit by Category/Subcategory
[Figure: dot plot titled "Profit by Category/Subcategory", x-axis "Profit/Deficit" from −20000 to 60000, one panel per category, legend "Profit" and "Deficit". Furniture panel: Tables, Bookcases, Furnishings, Chairs. Office Supplies panel: Supplies, Fasteners, Labels, Art, Envelopes, Appliances, Storage, Binders, Paper. Technology panel: Machines, Accessories, Phones, Copiers.]
Table 3.2: Most Profitable Products
Category         Sub.Category  Product.Name                                                                 Total.Quantity  Total_Profit  Average.Term.Price  Max.Item.Price  Min.Item.Price
Technology       Copiers       Canon imageCLASS 2200 Advanced Copier                                        20              25199.93      1259.996            3499.99         2099.994
Office Supplies  Binders       Fellowes PB500 Electric Punch Plastic Comb Binding Machine with Manual Bind  31              7753.039      250.098             1270.99         254.198
Technology       Copiers       Hewlett Packard LaserJet 3310 Copier                                         38              6983.884      183.7864            599.99          359.994
Technology       Copiers       Canon PC1060 Personal Laser Copier                                           19              4570.935      240.5755            10559.99        559.992
Technology       Machines      HP Designjet T520 Inkjet Large Format Printer - 24" Color                    12              4094.977      341.2481            1749.99         874.995
Technology       Machines      Ativa V4110MDD Micro-Cut Shredder                                            11              3772.946      342.9951            699.99          699.99
Figure 3.1 shows that all four Technology sub-categories are profitable. Copiers, Phones and Accessories each
make more than $40,000 for the company. Every Office Supplies sub-category except Supplies is profitable. Among
the Furniture sub-categories, Chairs and Furnishings are profitable, while Bookcases and Tables are responsible
for deficits.
From Table 3.2, the most profitable product is the Canon imageCLASS 2200 Advanced Copier, which belongs to the
Copiers sub-category of Technology. However, the Item.Price of the Canon PC1060 Personal Laser Copier is doubtful.
The Max.Item.Price for that product is $10,559.99 while the Min.Item.Price is $559.992. The difference is too
large for a copier. I suspect the difference was caused by a typo. I discuss these large differences between
maximum and minimum item price in Section 5.
Table 3.3: Details of Canon PC1060 Personal Laser Copier’s transaction
Product_Name Item_Price Quantity Discount
Canon PC1060 Personal Laser Copier 559.992 2 0.2
Canon PC1060 Personal Laser Copier 10559.992 5 0.2
Canon PC1060 Personal Laser Copier 559.992 5 0.2
Canon PC1060 Personal Laser Copier 699.99 7 0
Conclusion:
1. Products like copiers, phones and accessories in the Technology category make a lot of profit.
2. The performance of Furniture products is generally not good. Those products either make little profit or
lose money for the company.
3. The most profitable product is the Canon imageCLASS 2200 Advanced Copier.
4. Some Item.Price values are doubtful (in Table 3.3, the same copier was sold at $10,559.992 and $559.992).
4 Cluster Analysis (DEMO)
Cluster analysis is a powerful tool for marketing analysis. It is especially handy when there are many
continuous variables. Though this dataset does not have many continuous variables, we can still use this method
to obtain some interesting findings.
We create a new dataset by aggregating on State. The first 6 rows of the new dataset are shown in Table 4.1.
The cluster analysis is based on SalesTotal, Profit, Quantity and Avg.iterm.price.
Table 4.1: Dataset for clustering analysis
> df7 <- ddply(df1[,c("State","SalesTotal","Profit","Quantity")],
+ .(State),colwise(sum))
> df7$Avg.iterm.price <- df7$SalesTotal/df7$Quantity
> rownames(df7) <- as.character(df7$State)
> df7 <- df7[,2:5]
> head(df7)
SalesTotal Profit Quantity Avg.iterm.price
Alabama 19510.64 5786.825 256 76.21344
Arizona 35282.00 -3427.925 862 40.93040
Arkansas 11678.13 4008.687 240 48.65887
California 457687.63 76381.387 7667 59.69579
Colorado 32108.12 -6527.858 693 46.33206
Connecticut 13384.36 3511.492 281 47.63116
After standardizing each variable with the scale function, we calculate the Euclidean distance between each pair
of states. Then we use average linkage ("average") for the hierarchical clustering. The initial result of the
cluster analysis is shown in Figure 4.1.
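The three steps above (standardize, compute distances, link) can be sketched on a small hypothetical table. The four rows A to D and their values are invented for illustration and only two of the four variables are used. The function calls are the same base R functions used for df7.

```r
# Toy state-level table (hypothetical values)
toy <- data.frame(
  SalesTotal = c(100, 120, 900, 110),
  Profit     = c(10, 12, 150, -5),
  row.names  = c("A", "B", "C", "D")
)

# Standardize each column to mean 0 and standard deviation 1,
# so that SalesTotal does not dominate because of its larger scale
toy.scaled <- scale(toy)

# Euclidean distance between every pair of rows (states)
d <- dist(toy.scaled, method = "euclidean")

# Average linkage: the distance between two clusters is the mean
# of all pairwise distances between their members
fit <- hclust(d, method = "average")

# Cut the tree into two clusters and show the assignment
groups <- cutree(fit, k = 2)
groups
```

In this toy table, row C is far from the others on both variables, so it ends up alone in its own cluster, much as an outlying state would.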
Figure 4.1: Initial Results - Cluster Analysis
[Figure: dendrogram titled "Average Linkage Clustering", produced by hclust (*, "average") on the distance object d. The vertical axis is Height, from 0 to 6. The leaves are the states.]
There are many criteria we can choose from to determine the number of clusters. In my experience, the
NbClust::NbClust function is very helpful.
Figure 4.2: Determine the number of clusters
[Figure: bar plot titled "Number of Clusters Chosen by 26 Criteria". The x-axis is Number of Clusters, with bars at 0, 2, 3, 5, 9 and 10. The y-axis is Number of Criteria, from 0 to 8.]
NbClust::NbClust uses 26 different criteria to determine the number of clusters. Based on the result from
NbClust::NbClust in Figure 4.2, I set the number of clusters to 3.
Figure 4.3: Final Results - Cluster Analysis
[Figure: dendrogram titled "Average Linkage Clustering, 3 Cluster Solution", produced by hclust (*, "average") on the distance object d, with the three clusters outlined. The vertical axis is Height, from 0 to 6. The leaves are the states.]
The final result is shown in Figure 4.3. New York and California form cluster 2. Wyoming alone forms cluster 3.
The remaining states form cluster 1.
Description of Clusters
> aggregate(df7, by = list(clusters), median)
Group.1 SalesTotal Profit Quantity Avg.iterm.price
1 1 20944.270 2116.598 268.5 57.87003
2 2 384281.951 75209.968 5945.5 66.64670
3 3 1603.136 100.196 4.0 400.78400
The average item price in Wyoming is about $400, far higher than in the other clusters. This makes Wyoming an
outlier compared to the other states. New York and California are grouped together due to their outstanding
performance in profit. The other states are grouped together because the algorithm finds them similar to each
other. However, I have to point out that this section is just a demo to illustrate my ability in data mining and
machine learning. Much more work still needs to be done to draw serious conclusions.
5 Doubtful Item.Price
Table 5.1: Doubtful Item.Price
Category         Sub_Category  Product_Name                                                                 Total_Profit  Max_Item_Price  Min_Item_Price  Range
Furniture        Furnishings   Deflect-o DuraMat Antistatic Studded Beveled Mat for Medium Pile Carpeting   244.3888      10105.34        42.136          10063.2
Technology       Accessories   Logitech P710e Mobile Speakerphone                                           1645.361      10257.49        205.992         10051.5
Furniture        Chairs        DMI Arturo Collection Mission-style Design Wood Chair                        486.1556      10105.69        105.686         10000
Technology       Copiers       Canon PC1060 Personal Laser Copier                                           4570.935      10559.99        559.992         10000
Technology       Phones        BlackBerry Q10                                                               548.0565      10100.79        100.792         10000
Technology       Phones        RCA ViSYS 25825 Wireless digital phone                                       90.993        10103.99        103.992         10000
Office Supplies  Binders       Ibico EPK-21 Electric Binding System                                         3345.282      1889.99         377.998         1511.992
Technology       Machines      Cubify CubeX 3D Printer Double Head Print                                    -8879.97      2399.992        899.997         1499.995
Technology       Copiers       Canon imageCLASS 2200 Advanced Copier                                        25199.93      3499.99         2099.994        1399.996
Office Supplies  Binders       GBC DocuBind P400 Electric Binding System                                    -1878.17      1360.99         272.198         1088.792
Technology       Machines      Lexmark MX611dhe Monochrome Laser Printer                                    -4589.97      1529.991        509.997         1019.994
Office Supplies  Binders       Fellowes PB500 Electric Punch Plastic Comb Binding Machine with Manual Bind  7753.039      1270.99         254.198         1016.792
Office Supplies  Binders       Fellowes PB200 Plastic Comb Binding Machine                                  693.5592      1050.997        50.997          1000
Office Supplies  Envelopes     Tyvek Top-Opening Peel & Seel Envelopes, Plain White                         225.0504      1021.744        21.744          1000
As mentioned at the end of Section 3, the difference between the maximum and minimum item price is too large for
some products. Table 5.1 lists all products that have a doubtful Item.Price (a Range of at least 1000). The Range
variable equals the difference between Max_Item_Price and Min_Item_Price. It is implausible that the BlackBerry
Q10 was sold at $10,100.79 and also at $100.79.
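The same range check can be sketched in R. This is a minimal illustration on hypothetical rows. "Copier X" and "Phone Y" are made-up product names, the column names follow Table 5.1, and the threshold of 1000 is the one used for that table.

```r
# Toy transactions (hypothetical products and prices)
toy <- data.frame(
  Product_Name = c("Copier X", "Copier X", "Copier X", "Phone Y", "Phone Y"),
  Item_Price   = c(559.992, 10559.992, 699.99, 100.792, 125.99)
)

# Highest and lowest item price per product
max_p <- tapply(toy$Item_Price, toy$Product_Name, max)
min_p <- tapply(toy$Item_Price, toy$Product_Name, min)

rng <- data.frame(
  Product_Name   = names(max_p),
  Max_Item_Price = as.numeric(max_p),
  Min_Item_Price = as.numeric(min_p)
)
rng$Range <- rng$Max_Item_Price - rng$Min_Item_Price

# Keep products whose price range reaches the threshold, largest range first
doubtful <- rng[rng$Range >= 1000, ]
doubtful <- doubtful[order(-doubtful$Range), ]
doubtful
```

A Range that is an exact multiple of 1000 or 10000, as in several rows of Table 5.1, is a further hint that an extra digit was typed into the price.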
6 Code in R and SQL
Code for Section 1:
##Data Import##
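df1 <- read.csv(file =
"/home/kevin/Desktop/The Bee Corp/Quant Round.csv",
header = TRUE,
stringsAsFactors = TRUE) # the levels()/droplevels() calls below need factors; this was the default before R 4.0.0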
dim(df1)
##summary of variables##
#Row.ID#
length(unique(df1$Row.ID))
#Order.ID#
length(unique(df1$Order.ID))
#Order.Date#
df1$Order.Date <- as.Date(df1$Order.Date,origin = "1900-01-01")
#Ship.Date#
df1$Ship.Date <-as.Date(df1$Ship.Date,origin = "1900-01-01")
#Ship.Mode#
unique(df1$Ship.Mode)
#Customer.ID#
length(unique(df1$Customer.ID))
#Customer.Name#
length(unique(df1$Customer.Name))
#Segment#
table(df1$Segment)
levels(df1$Segment) <- c("Consumer","Corporate","Corporate","Home Office") ## Fix TYPO
#Country#
table(df1$Country)
#City#
length(unique(df1$City))
#State#
length(unique(df1$State))
sort(unique(df1$State))
df1$State[df1$State == "CAL "] <- c("California")
df1$State <- droplevels(df1$State,exclude = "CAL ") #correct CAL #
df1$State[df1$State == "IND "] <- c("Indiana")
df1$State <- droplevels(df1$State,exclude = "IND ") #correct IND #
table(df1$State)
#Region#
table(df1$Region)
#Category#
table(df1$Category)
#Sub.Category#
table(df1$Sub.Category,df1$Category)
Code for Section 2:
#maps#
library(latticeExtra)
library(mapproj)
library(plyr)
df2 <- data.frame(state = tolower(df1$State),
sales = df1$SalesTotal,
profit = df1$Profit)
df2 <- ddply(df2, .(state),colwise(sum))
rng <- with(df2,
range(sales, profit, finite = TRUE))
nbreaks <- 50
breaks <- exp(do.breaks(log(abs(rng)), nbreaks))
plot1 <- mapplot(state ~ sales, data = df2,
breaks = seq(from = 0,
to = max(df2$sales), length.out = 51),
map = map("state", plot = FALSE, fill = TRUE,projection = "tetra"),
scales = list(draw = FALSE),
main = "Total Sales in State Level",xlab = "")
plot2 <- mapplot(state ~ profit,data = df2,
breaks = seq(from = min(df2$profit),
to = max(df2$profit)*1.1, length.out = 51),
map = map("state", plot = FALSE, fill = TRUE,projection = "tetra"),
scales = list(draw = FALSE),
main = "Profit in State Level",xlab = "")
print(plot1, split = c(1,1,1,2), more = TRUE)
print(plot2, split = c(1,2,1,2), more = FALSE)
#sales#
df2$sales.ratio <- df2$sales/sum(df2$sales)
head(df2[order(-df2$sales.ratio),],4)
#profit#
head(df2[order(-df2$profit),],4)
#discount#
table(df1[df1$State == "Texas","Discount"])
round(prop.table(table(
df1[df1$State == "Texas","Discount"])),2)
#region/sale#
table(df1$State)
head(table(df1$State,df1$Region),10)
df3 <- ddply(df1[,c("State","Profit","SalesTotal")],.(State),colwise(sum))
df3 <- arrange(df3,-df3$Profit)
df3$Profit.Sales.Ratio <- df3$Profit/df3$SalesTotal
df3[df3$Profit < 0,]
tab1 <- table(df1[,c("State","Discount")])
tab1[as.character(df3[df3$Profit < 0,"State"]),]
head(tab1[as.character(df3[df3$Profit > 0,"State"]),],5)
Code for Section 3:
#Category#
df4 <- ddply(df1[,c("Category","Profit")],.(Category),colwise(sum))
df4 <- arrange(df4,-df4$Profit)
df4$percent <- round(df4$Profit/sum(df4$Profit)*100,2)
df4
#Sub.Category#
df5 <- ddply(df1[,c("Category","Sub.Category","Profit")],
.(Category, Sub.Category),colwise(sum))
df5 <- arrange(df5, -df5$Profit)
df5
df5$Positive <- as.numeric(df5$Profit > 0)
key.variety <- list(space = "right", text = list (c("Profit","Deficit")),
points = list(pch = 16, col = c("#0080ff","#ff00ff")))
plot2 <- dotplot(factor(Sub.Category,
levels = rev(as.character(arrange(df5, df5$Category, -df5$Profit)$Sub.Category)))
~ Profit| Category, groups = -Positive, data = df5, pch = 16, key = key.variety,
layout = c(1,3), scales=list(y = list(relation = 'free')),
panel = function(...){
panel.grid (h = 0, v= -1)
panel.dotplot(...)
},layout.heights = "free", xlab = "Profit/Deficit", lattice.options =
modifyList (lattice.options(),list(skip.boundary.labels = 0)),
main = "Profit by Category/Subcategory");plot2
resizePanels()
#Most Profitable Product#
df6 <- ddply(df1[,c("Category","Sub.Category","Product.Name","Quantity","Profit")],
.(Category,Sub.Category,Product.Name),colwise(sum))
head(arrange(df6,-df6$Profit))
Code for Section 4:
#Cluster Analysis#
par(mfrow = c(1,1))
df7 <- ddply(df1[,c("State","SalesTotal","Profit","Quantity")],
.(State),colwise(sum))
df7$Avg.iterm.price <- df7$SalesTotal/df7$Quantity
rownames(df7) <- as.character(df7$State)
df7 <- df7[,2:5]
head(df7)
df7.scaled <- scale(df7)
d <- dist(df7.scaled)
fit.average <- hclust(d, method = "average")
plot(fit.average, hang = -1, cex = 0.8, main = "Average Linkage Clustering")
library(NbClust)
devAskNewPage(ask = F)
nc <- NbClust(df7.scaled, distance = "euclidean",
min.nc = 2, max.nc = 10, method = "average")
table(nc$Best.n[1,])
par(mfrow = c(1,1))
barplot(table(nc$Best.n[1,]),xlab = "Number of Clusters", ylab = "Number of Criteria",
main = "Number of Clusters Chosen by 26 Criteria")
clusters <- cutree(fit.average, k = 3)
table(clusters)
aggregate(df7, by = list(clusters), median)
plot(fit.average, hang = -1, cex = 0.8,
main = "Average Linkage Clustering\n3 Cluster Solution")
rect.hclust(fit.average,k=3,border = 2)
Code for Section 5 - SAS/SQL Code:
PROC SQL;
SELECT * FROM(
SELECT DISTINCT Category, Sub_Category, Product_Name,
sum(Profit) as Total_Profit,
max(Item_Price) as Max_Item_Price, min(Item_Price) as Min_Item_Price,
Calculated Max_Item_Price - Calculated Min_Item_Price as Range
FROM WORK.IMPORT
GROUP BY Product_Name)
WHERE Calculated Range >= 1000
ORDER BY Calculated Range DESC;
QUIT;
v3tuleee
 

Recently uploaded (20)

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 

Marketing analysis - writing sample

  • 1. Marketing Analysis for The Bee Corp
Qingyang (Kevin) Liu
Email: tug14939@temple.edu
June 22, 2017

1 Introduction of the dataset

The original file Quant Round.xlsx contains three sheets. Because base R cannot read the xlsx format without additional packages, I converted Quant Round.xlsx to Quant Round.csv and kept only the first sheet: the csv format has much better compatibility, and the first sheet of Quant Round.xlsx contains all the information we need.
The import process using the read.csv command in R is shown below:
> df1 <- read.csv(file =
+ "/home/kevin/Desktop/The Bee Corp/Quant Round.csv",
+ header = T)
> dim(df1)
[1] 9994 22
The Quant Round.csv file has been imported into R as the data frame df1, which contains 9994 rows and 22 variables.
The summary information for the important variables is shown below.
Row.ID: The primary key for this dataset. This variable is unique for each row.
Order.ID: The order identification. This variable does not have to be unique. One order can span multiple rows (one order may contain different products). There are 5009 distinct orders in df1.
Order.Date: The date when the order was created or submitted. Order.Date was stored in numeric format. I converted the numeric format into yyyy-mm-dd format, assuming the origin date is "1900-01-01".
Ship.Date: The date when the order was shipped. Also stored in numeric format. I converted the numeric format into yyyy-mm-dd format, assuming the origin date is "1900-01-01".
Ship.Mode: There are four different ship modes: Same Day, First Class, Standard Class and Second Class.
Customer.ID: Customer identification. One customer has one unique ID.
Segment: There are three different segments in this dataset: Consumer, Corporate and Home Office. (Corporate␣ has been corrected to Corporate)
Country: All orders have been shipped within the United States.
City: There are 531 different cities in this dataset.
State: The dataset covers the 48 contiguous U.S. states and the District of Columbia. (CAL␣ has been corrected to California. IND␣ has been corrected to Indiana)
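The date conversion described for Order.Date and Ship.Date can be sketched as follows. This is only a minimal illustration: the vector serial is hypothetical and is not taken from Quant Round.csv. It also shows why the choice of origin deserves a check. Serial day numbers written by Excel for Windows are conventionally converted in R with origin "1899-12-30", so dates computed with origin "1900-01-01" come out two days later. Which convention applies to this file is an assumption that should be verified against an order whose date is known.

```r
# Hypothetical Excel-style serial day numbers (not from the actual dataset)
serial <- c(42005, 42370, 42736)

# Origin used in this report
d_report <- as.Date(serial, origin = "1900-01-01")

# Origin conventionally used for serial dates written by Excel for Windows
d_excel <- as.Date(serial, origin = "1899-12-30")

# Dates in yyyy-mm-dd format under each assumption
format(d_report, "%Y-%m-%d")
format(d_excel, "%Y-%m-%d")

# Offset in days between the two conventions (the same for every row)
offset_days <- as.integer(d_report - d_excel)
offset_days
```

If the offset matters for the analysis (for example, when orders are grouped by weekday or month boundary), the origin should be fixed before any date-based aggregation.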
  • 2. Region: There are five regions in this dataset: Central, East, North, South and West. There are a few mistakes in the original dataset. For example, there are 37 records in which Florida was categorized as North region.
Product.ID: Product identification. One product has one unique ID.
Category: All products belong to one of three categories: Furniture, Office Supplies and Technology.
Sub.Category: The relationship between Sub.Category and Category is shown in Table 1.1. Each Sub.Category belongs to exactly one Category.

Table 1.1: Sub.Category (rows) by Category (columns)
             Furniture  Office Supplies  Technology
Accessories          0                0         775
Appliances           0              466           0
Art                  0              796           0
Binders              0             1523           0
Bookcases          228                0           0
Chairs             617                0           0
Copiers              0                0          68
Envelopes            0              254           0
Fasteners            0              217           0
Furnishings        957                0           0
Labels               0              364           0
Machines             0                0         115
Paper                0             1370           0
Phones               0                0         889
Storage              0              846           0
Supplies             0              190           0
Tables             319                0           0

SalesTotal: SalesTotal = Item.Price × Quantity, where Item.Price is the price after discount.
Profit: A positive number stands for profit. A negative number stands for deficit.
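The claim that each Sub.Category belongs to exactly one Category can be checked directly from a cross-tabulation like Table 1.1. The sketch below uses a small invented data frame (toy) with the report's column names; on the real data the same two lines would be applied to df1.

```r
# Toy rows with the report's column names; the values are invented for illustration
toy <- data.frame(
  Category     = c("Furniture", "Furniture", "Technology", "Technology"),
  Sub.Category = c("Chairs", "Tables", "Phones", "Phones")
)

# Cross-tabulate sub-categories (rows) against categories (columns)
tab <- table(toy$Sub.Category, toy$Category)

# Number of categories in which each sub-category appears
n_cat <- rowSums(tab > 0)

# TRUE when every sub-category maps to exactly one category
all(n_cat == 1)
```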
  • 3. 2 Sales/Profit by Region
Figure 2.1: Maps of Sales and Profit in State Level (two state-level maps: "Total Sales in State Level", color scale from 0 to 450000, and "Profit in State Level", color scale from −20000 to 80000)
  • 4. Table 2.1: Inconsistent definition of regions (part)
> head(table(df1$State,df1$Region),10)
                       Central East North South West
  Alabama                    0    0     1    60    0
  Arizona                    0    0     0     0  224
  Arkansas                   0    0     1    59    0
  California                 0    0     0     0 2001
  Colorado                   0    0     0     0  182
  Connecticut                0   82     0     0    0
  Delaware                   0   96     0     0    0
  District of Columbia       0   10     0     0    0
  Florida                    0    0    37   346    0
  Georgia                    0    0     6   178    0

Table 2.2: Top 4 States by Sales
> head(df2[order(-df2$sales.ratio),],4)
        state    sales    profit sales.ratio
4  california 457687.6  76381.39  0.19923710
31   new york 310876.3  74038.55  0.13532829
42      texas 170188.0 -25729.36  0.07408497
46 washington 138641.3  33402.65  0.06035226

Table 2.3: Top 4 States by Profit
> head(df2[order(-df2$profit),],4)
        state     sales   profit sales.ratio
4  california 457687.63 76381.39  0.19923710
31   new york 310876.27 74038.55  0.13532829
46 washington 138641.27 33402.65  0.06035226
21   michigan  76269.61 24463.19  0.03320111

Table 2.4: Discount in Texas
> table(df1[df1$State == "Texas","Discount"])
 0.2  0.3 0.32  0.4  0.6  0.8
 570   94   27   13   81  200

The rule for categorizing regions is doubtful and inconsistent in this dataset. According to Table 2.1, 37 records in Florida and 1 record in Alabama have been assigned to the North region. In total, more than 100 records have been assigned to the wrong region. In real work, we would need to discuss the definition of each region with a supervisor. For this report, the quantitative marketing analysis based on regions is skipped.
  • 5. According to Table 2.2 and Table 2.3, California is the largest market for the company and New York State is the second largest, whether measured by sales or by profit.
The Sales/Profit performance in the Texas market is contradictory. By sales, Texas is the third largest market for the company. However, the company lost $25,729.36 in the Texas market. Table 2.4 shows that the company discounts heavily in Texas: every product sold there has at least a 20% discount, and there are even 200 records with an 80% discount. Furthermore, Table 2.5 and Table 2.6 show that every sale in the deficit markets has at least a 20% discount. Discounting is an important reason for the deficit in those markets. We need to discuss the reason for the large-discount strategy with the business manager. It could be a market penetration strategy, or those products may be too difficult to sell.

Table 2.5: Deficit Markets in State Level
> df3[df3$Profit < 0,]
            State     Profit SalesTotal Profit.Sales.Ratio
40         Oregon  -1190.470   17431.15        -0.06829558
41        Florida  -3399.302   89473.71        -0.03799219
42        Arizona  -3427.925   35282.00        -0.09715789
43      Tennessee  -5341.694   30661.87        -0.17421289
44       Colorado  -6527.858   32108.12        -0.20330864
45 North Carolina  -7490.912   55603.16        -0.13472097
46       Illinois -12607.887   80166.10        -0.15727205
47   Pennsylvania -15559.960  116511.91        -0.13354823
48           Ohio -16971.377   78258.14        -0.21686405
49          Texas -25729.356  170188.05        -0.15118192

Table 2.6: Discount in Deficit Markets
> tab1 <- table(df1[,c("State","Discount")])
> tab1[as.character(df3[df3$Profit < 0,"State"]),]
                 Discount
State              0 0.1 0.15 0.2 0.3 0.32 0.4 0.45 0.5 0.6 0.7 0.8
  Oregon           0   0    0 100   0    0   0    0   5   0  19   0
  Florida          0   0    0 299   0    0   0   11   6   0  67   0
  Arizona          0   0    0 174   0    0   0    0   9   0  41   0
  Tennessee        0   0    0 144   0    0   8    0   2   0  29   0
  Colorado         0   0    0 138   0    0   0    0   4   0  40   0
  North Carolina   0   0    0 201   0    0   8    0   4   0  36   0
  Illinois         0   0    0 264  53    0   0    0  18  57   0 100
  Pennsylvania     0   0    0 354  36    0  82    0  10   0 105   0
  Ohio             0   0    0 290  23    0  67    0   8   0  81   0
  Texas            0   0    0 570  94   27  13    0   0  81   0 200

Conclusion:
1. California and New York are the two most successful markets, based on either profit or sales.
2. The company is losing money in states like Texas, Ohio and many others, largely because of heavy discounts.
3. The regions are ill-defined, so no conclusion has been drawn from them.
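The link between discounting and deficit markets can be summarised per state in a few lines of base R. The sketch below is an illustration on an invented data frame (toy) that reuses the report's column names; it is not the code that produced Table 2.5 or Table 2.6, which relied on plyr::ddply. It computes total profit, total sales, the profit-to-sales ratio and the smallest discount applied in each state, then keeps the states that lose money.

```r
# Toy data with the report's column names; the values are invented for illustration
toy <- data.frame(
  State      = c("A", "A", "B", "B", "B"),
  Discount   = c(0.0, 0.2, 0.2, 0.8, 0.8),
  SalesTotal = c(100, 200, 150, 50, 100),
  Profit     = c(30, 20, 5, -40, -60)
)

# Total profit and sales per state (base R counterpart of ddply + colwise(sum))
by_state <- aggregate(cbind(Profit, SalesTotal) ~ State, data = toy, FUN = sum)
by_state$Profit.Sales.Ratio <- by_state$Profit / by_state$SalesTotal

# Smallest discount applied in each state, matched back by state name
min_disc <- tapply(toy$Discount, toy$State, min)
by_state$Min.Discount <- as.numeric(min_disc[as.character(by_state$State)])

# States that lose money, together with their minimum discount
deficit <- by_state[by_state$Profit < 0, ]
deficit
```

On the real data, a Min.Discount of 0.2 or more in every deficit state would correspond to the pattern seen in Table 2.6.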
  • 6. 3 Profit by Category/Subcategory/Specific Product
According to Table 3.1, Technology and Office Supplies account for 50.79% and 42.77% of the company's total profit. Products that belong to Furniture contribute only 6.44% of the total profit.

Table 3.1: Profit by Category
> df4 <- ddply(df1[,c("Category","Profit")],.(Category),colwise(sum))
> df4 <- arrange(df4,-df4$Profit)
> df4$percent <- round(df4$Profit/sum(df4$Profit)*100,2)
> df4
         Category    Profit percent
1      Technology 145454.95   50.79
2 Office Supplies 122490.80   42.77
3       Furniture  18451.27    6.44

Figure 3.1: Profit by Category/Subcategory (dot plot with one panel per category: Furniture, Office Supplies and Technology; sub-categories on the y-axis; x-axis: Profit/Deficit, from −20000 to 60000; legend: Profit, Deficit)
  • 7. Table 3.2: Most Profitable Products
Category         Sub.Category  Total.Quantity  Total_Profit  Average.Term.Price  Max.Item.Price  Min.Item.Price  Product.Name
Technology       Copiers                   20      25199.93            1259.996         3499.99        2099.994  Canon imageCLASS 2200 Advanced Copier
Office Supplies  Binders                   31      7753.039             250.098         1270.99         254.198  Fellowes PB500 Electric Punch Plastic Comb Binding Machine with Manual Bind
Technology       Copiers                   38      6983.884            183.7864          599.99         359.994  Hewlett Packard LaserJet 3310 Copier
Technology       Copiers                   19      4570.935            240.5755        10559.99         559.992  Canon PC1060 Personal Laser Copier
Technology       Machines                  12      4094.977            341.2481         1749.99         874.995  HP Designjet T520 Inkjet Large Format Printer - 24" Color
Technology       Machines                  11      3772.946            342.9951          699.99          699.99  Ativa V4110MDD Micro-Cut Shredder

Looking at Figure 3.1, we find that all 4 sub-categories in Technology are profitable. Copiers, Phones and Accessories each make more than $40,000 for the company. All sub-categories in Office Supplies, except Supplies, are profitable. Among the Furniture sub-categories, Chairs and Furnishings make a profit, while Bookcases and Tables are responsible for a deficit.
From Table 3.2, the most profitable product is the Canon imageCLASS 2200 Advanced Copier, a copier in the Technology category. However, the Item.Price of the Canon PC1060 Personal Laser Copier is doubtful. The Max.Item.Price for that product is $10,559.99 while the Min.Item.Price is $559.992. The difference is too large for a copier, and I suspect it was caused by a typo. I discuss these large differences between maximum and minimum item price in Section 5.

Table 3.3: Details of Canon PC1060 Personal Laser Copier's transactions
Product_Name                        Item_Price  Quantity  Discount
Canon PC1060 Personal Laser Copier     559.992         2       0.2
Canon PC1060 Personal Laser Copier   10559.992         5       0.2
Canon PC1060 Personal Laser Copier     559.992         5       0.2
Canon PC1060 Personal Laser Copier      699.99         7         0

Conclusion:
1. Sub-categories like Copiers, Phones and Accessories in the Technology category make a lot of profit.
2. The performance of Furniture products is generally not good. Those products either make little profit or lose money for the company.
3. The most profitable product is the Canon imageCLASS 2200 Advanced Copier.
4. Some Item.Price values are doubtful (in Table 3.3, the same copier has been sold at $10,559.992 and at $559.992).
  • 8. 4 Cluster Analysis (DEMO)
Cluster analysis is a powerful tool for marketing analysis. It is very handy when there are many continuous variables. Though we don't have many continuous variables in this dataset, we can still use this method to obtain some interesting findings.
We create a new dataset by aggregating on State. The first 6 rows of the new dataset can be found in Table 4.1. The cluster analysis is based on SalesTotal, Profit, Quantity and Avg.iterm.price (the average item price).

Table 4.1: Dataset for clustering analysis
> df7 <- ddply(df1[,c("State","SalesTotal","Profit","Quantity")],
+ .(State),colwise(sum))
> df7$Avg.iterm.price <- df7$SalesTotal/df7$Quantity
> rownames(df7) <- as.character(df7$State)
> df7 <- df7[,2:5]
> head(df7)
            SalesTotal    Profit Quantity Avg.iterm.price
Alabama       19510.64  5786.825      256        76.21344
Arizona       35282.00 -3427.925      862        40.93040
Arkansas      11678.13  4008.687      240        48.65887
California   457687.63 76381.387     7667        59.69579
Colorado      32108.12 -6527.858      693        46.33206
Connecticut   13384.36  3511.492      281        47.63116

After standardizing each variable via the scale function, we calculate the Euclidean distance between each pair of states. Then we choose the "average" linkage method for the clustering analysis. The initial result of the clustering analysis can be found in Figure 4.1.

Figure 4.1: Initial Results - Cluster Analysis (dendrogram titled "Average Linkage Clustering", produced by hclust (*, "average") on d; y-axis: Height, from 0 to 6; the leaves are the 49 states)

There are many criteria we can choose from to determine the number of clusters. In my experience, the NbClust::NbClust function can be very helpful.
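The standardize, distance and average-linkage steps described above can be sketched on a small invented table. The data frame toy and its row names are hypothetical; the values are chosen so that three groups are easy to see (three mid-sized states, two large ones, and one tiny state with a very high average item price). It uses only base R functions (scale, dist, hclust, cutree) and leaves out NbClust, so the number of clusters k = 3 is simply assumed here.

```r
# Toy state-level table; values are invented for illustration
toy <- data.frame(
  SalesTotal = c(100, 110, 105, 900, 950, 20),
  Profit     = c(10, 12, 11, 150, 160, 1),
  Quantity   = c(20, 22, 21, 180, 190, 1),
  row.names  = c("s1", "s2", "s3", "big1", "big2", "tiny")
)
toy$Avg.item.price <- toy$SalesTotal / toy$Quantity

toy.scaled <- scale(toy)               # zero mean, unit variance for each column
d <- dist(toy.scaled)                  # Euclidean distance between rows (states)
fit <- hclust(d, method = "average")   # average linkage
clusters <- cutree(fit, k = 3)         # cut the tree into 3 clusters (assumed k)

table(clusters)

# Median of each variable within each cluster, as in "Description of Clusters"
aggregate(toy, by = list(clusters), median)
```

Standardizing first matters because SalesTotal is on a much larger scale than Avg.item.price; without scale, the distance would be dominated by sales alone.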
  • 9. Figure 4.2: Determine the number of clusters (bar plot titled "Number of Clusters Chosen by 26 Criteria"; x-axis: Number of Clusters, with bars at 0, 2, 3, 5, 9 and 10; y-axis: Number of Criteria, from 0 to 8)

NbClust::NbClust uses 26 different criteria to determine the number of clusters. According to the result from NbClust::NbClust in Figure 4.2, I set the number of clusters to 3.

Figure 4.3: Final Results - Cluster Analysis (the dendrogram of Figure 4.1, titled "Average Linkage Clustering, 3 Cluster Solution", with the three clusters outlined; y-axis: Height, from 0 to 6)

The final result can be found in Figure 4.3. New York and California are categorized as cluster 2. Wyoming is categorized as cluster 3. The remaining states are categorized as cluster 1.

Description of Clusters
> aggregate(df7, by = list(clusters), median)
  Group.1 SalesTotal    Profit Quantity Avg.iterm.price
1       1  20944.270  2116.598    268.5        57.87003
2       2 384281.951 75209.968   5945.5        66.64670
3       3   1603.136   100.196      4.0       400.78400
  • 10. The average price of items sold to Wyoming is about $400, which makes Wyoming an outlier compared to the other states. New York and California are grouped together because of their outstanding performance in profit. The other states are grouped together because the algorithm "thinks" they are similar to one another.
However, I have to point out that this section is just a demo to illustrate my ability in data mining and machine learning. Much more work still needs to be done to draw serious conclusions.

5 Doubtful Item.Price

Table 5.1: Doubtful Item.Price
Category         Sub_Category  Total_Profit  Max_Item_Price  Min_Item_Price      Range  Product_Name
Furniture        Furnishings       244.3888        10105.34          42.136    10063.2  Deflect-o DuraMat Antistatic Studded Beveled Mat for Medium Pile Carpeting
Technology       Accessories       1645.361        10257.49         205.992    10051.5  Logitech P710e Mobile Speakerphone
Furniture        Chairs            486.1556        10105.69         105.686      10000  DMI Arturo Collection Mission-style Design Wood Chair
Technology       Copiers           4570.935        10559.99         559.992      10000  Canon PC1060 Personal Laser Copier
Technology       Phones            548.0565        10100.79         100.792      10000  BlackBerry Q10
Technology       Phones              90.993        10103.99         103.992      10000  RCA ViSYS 25825 Wireless digital phone
Office Supplies  Binders           3345.282         1889.99         377.998   1511.992  Ibico EPK-21 Electric Binding System
Technology       Machines          -8879.97        2399.992         899.997   1499.995  Cubify CubeX 3D Printer Double Head Print
Technology       Copiers           25199.93         3499.99        2099.994   1399.996  Canon imageCLASS 2200 Advanced Copier
Office Supplies  Binders           -1878.17         1360.99         272.198   1088.792  GBC DocuBind P400 Electric Binding System
Technology       Machines          -4589.97        1529.991         509.997   1019.994  Lexmark MX611dhe Monochrome Laser Printer
Office Supplies  Binders           7753.039         1270.99         254.198   1016.792  Fellowes PB500 Electric Punch Plastic Comb Binding Machine with Manual Bind
Office Supplies  Binders           693.5592        1050.997          50.997       1000  Fellowes PB200 Plastic Comb Binding Machine
Office Supplies  Envelopes         225.0504        1021.744          21.744       1000  Tyvek Top-Opening Peel & Seel Envelopes, Plain White

As I mentioned at the end of Section 3, the difference between the maximum and minimum item price is too large for some products. Table 5.1 lists all products that have a doubtful Item.Price. The Range variable equals the difference between Max_Item_Price and Min_Item_Price. It is implausible that the BlackBerry Q10 could be sold at $10,100.79 and also at $100.79.
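The price-range check behind Table 5.1 was run in SAS/SQL (see Section 6). As a rough R counterpart, the sketch below applies the same idea to a small invented data frame: per product, compute total profit and the maximum and minimum item price, then keep products whose range is at least 1000. The data frame toy and the helper summarise_product are hypothetical names introduced only for this illustration.

```r
# Toy transactions; values are invented for illustration
toy <- data.frame(
  Product_Name = c("P1", "P1", "P1", "P2", "P2"),
  Item_Price   = c(559.992, 10559.992, 699.99, 20.5, 25.0),
  Profit       = c(100, 900, 50, 3, 4)
)

# Hypothetical helper: one summary row per product
summarise_product <- function(g) {
  data.frame(
    Product_Name   = g$Product_Name[1],
    Total_Profit   = sum(g$Profit),
    Max_Item_Price = max(g$Item_Price),
    Min_Item_Price = min(g$Item_Price)
  )
}

res <- do.call(rbind, lapply(split(toy, toy$Product_Name), summarise_product))
res$Range <- res$Max_Item_Price - res$Min_Item_Price

# Keep products whose price range is at least 1000, largest range first
flagged <- res[res$Range >= 1000, ]
flagged <- flagged[order(-flagged$Range), ]
flagged
```

The threshold of 1000 follows the WHERE clause of the SQL query; a relative threshold (for example, Max_Item_Price more than ten times Min_Item_Price) might be a reasonable alternative for cheap products, though that is not what the report used.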
  • 11. 6 Code in R and SQL
Code for Section 1:
##Data Import##
df1 <- read.csv(file = "/home/kevin/Desktop/The Bee Corp/Quant Round.csv",
                header = T)
dim(df1)
##summary of variables##
#Row.ID#
length(unique(df1$Row.ID))
#Order.ID#
length(unique(df1$Order.ID))
#Order.Date#
df1$Order.Date <- as.Date(df1$Order.Date,origin = "1900-01-01")
#Ship.Date#
df1$Ship.Date <-as.Date(df1$Ship.Date,origin = "1900-01-01")
#Ship.Mode#
unique(df1$Ship.Mode)
#Customer.ID#
length(unique(df1$Customer.ID))
#Customer.Name#
length(unique(df1$Customer.Name))
#Segment#
table(df1$Segment)
levels(df1$Segment) <- c("Consumer","Corporate","Corporate","Home Office") ## Fix TYPO
#Country#
table(df1$Country)
#City#
length(unique(df1$City))
#State#
length(unique(df1$State))
sort(unique(df1$State))
df1$State[df1$State == "CAL "] <- c("California")
df1$State <- droplevels(df1$State,exclude = "CAL ") #correct CAL #
df1$State[df1$State == "IND "] <- c("Indiana")
df1$State <- droplevels(df1$State,exclude = "IND ") #correct IND #
table(df1$State)
#Region#
table(df1$Region)
#Category#
table(df1$Category)
#Sub.Category#
table(df1$Sub.Category,df1$Category)
  • 12. Code for Section 2:
#maps#
library(latticeExtra)
library(mapproj)
library(plyr)
df2 <- data.frame(state = tolower(df1$State),
                  sales = df1$SalesTotal,
                  profit = df1$Profit)
df2 <- ddply(df2, .(state),colwise(sum))
rng <- with(df2, range(sales, profit, finite = TRUE))
nbreaks <- 50
breaks <- exp(do.breaks(log(abs(rng)), nbreaks))
plot1 <- mapplot(state ~ sales, data = df2,
                 breaks = seq(from = 0, to = max(df2$sales), length.out = 51),
                 map = map("state", plot = FALSE, fill = TRUE,projection = "tetra"),
                 scales = list(draw = FALSE),
                 main = "Total Sales in State Level",xlab = "")
plot2 <- mapplot(state ~ profit,data = df2,
                 breaks = seq(from = min(df2$profit), to = max(df2$profit)*1.1,
                              length.out = 51),
                 map = map("state", plot = FALSE, fill = TRUE,projection = "tetra"),
                 scales = list(draw = FALSE),
                 main = "Profit in State Level",xlab = "")
print(plot1, split = c(1,1,1,2), more = TRUE)
print(plot2, split = c(1,2,1,2), more = FALSE)
#sales#
df2$sales.ratio <- df2$sales/sum(df2$sales)
head(df2[order(-df2$sales.ratio),],4)
#profit#
head(df2[order(-df2$profit),],4)
#discount#
table(df1[df1$State == "Texas","Discount"])
round(prop.table(table(
  df1[df1$State == "Texas","Discount"])),2)
#region/sale#
table(df1$State)
head(table(df1$State,df1$Region),10)
df3 <- ddply(df1[,c("State","Profit","SalesTotal")],.(State),colwise(sum))
df3 <- arrange(df3,-df3$Profit)
df3$Profit.Sales.Ratio <- df3$Profit/df3$SalesTotal
df3[df3$Profit < 0,]
tab1 <- table(df1[,c("State","Discount")])
tab1[as.character(df3[df3$Profit < 0,"State"]),]
head(tab1[as.character(df3[df3$Profit > 0,"State"]),],5)
  • 13. Code for Section 3:
#Category#
df4 <- ddply(df1[,c("Category","Profit")],.(Category),colwise(sum))
df4 <- arrange(df4,-df4$Profit)
df4$percent <- round(df4$Profit/sum(df4$Profit)*100,2)
df4
#Sub.Category#
df5 <- ddply(df1[,c("Category","Sub.Category","Profit")],
             .(Category, Sub.Category),colwise(sum))
df5 <- arrange(df5, -df5$Profit)
df5
df5$Positive <- as.numeric(df5$Profit > 0)
key.variety <- list(space = "right",
                    text = list (c("Profit","Deficit")),
                    points = list(pch = 16, col = c("#0080ff","#ff00ff")))
plot2 <- dotplot(factor(Sub.Category,
                        levels = rev(as.character(arrange(df5, df5$Category,
                                                          -df5$Profit)$Sub.Category))) ~ Profit | Category,
                 groups = -Positive, data = df5, pch = 16,
                 key = key.variety, layout = c(1,3),
                 scales = list(y = list(relation = 'free')),
                 panel = function(...){
                   panel.grid (h = 0, v= -1)
                   panel.dotplot(...)
                 },
                 layout.heights = "free",
                 xlab = "Profit/Deficit",
                 lattice.options = modifyList(lattice.options(),
                                              list(skip.boundary.labels = 0)),
                 main = "Profit by Category/Subcategory")
plot2
resizePanels()
#Most Profitable Product#
df6 <- ddply(df1[,c("Category","Sub.Category","Product.Name","Quantity","Profit")],
             .(Category,Sub.Category,Product.Name),colwise(sum))
head(arrange(df6,-df6$Profit))
  • 14. Code in Section 4:
#Cluster Analysis#
par(mfrow = c(1,1))
df7 <- ddply(df1[,c("State","SalesTotal","Profit","Quantity")],
             .(State),colwise(sum))
df7$Avg.iterm.price <- df7$SalesTotal/df7$Quantity
rownames(df7) <- as.character(df7$State)
df7 <- df7[,2:5]
head(df7)
df7.scaled <- scale(df7)
d <- dist(df7.scaled)
fit.average <- hclust(d, method = "average")
plot(fit.average, hang = -1, cex = 0.8,
     main = "Average Linkage Clustering")
library(NbClust)
devAskNewPage(ask = F)
nc <- NbClust(df7.scaled, distance = "euclidean",
              min.nc = 2, max.nc = 10, method = "average")
table(nc$Best.n[1,])
par(mfrow = c(1,1))
barplot(table(nc$Best.n[1,]),xlab = "Number of Clusters",
        ylab = "Number of Criteria",
        main = "Number of Clusters Chosen by 26 Criteria")
clusters <- cutree(fit.average, k = 3)
table(clusters)
aggregate(df7, by = list(clusters), median)
plot(fit.average, hang = -1, cex = 0.8,
     main = "Average Linkage Clustering\n3 Cluster Solution")
rect.hclust(fit.average,k=3,border = 2)

Code in Section 5 - SAS/SQL Code:
PROC SQL;
SELECT * FROM(
  SELECT DISTINCT Category, Sub_Category, Product_Name,
         sum(Profit) as Total_Profit,
         max(Item_Price) as Max_Item_Price,
         min(Item_Price) as Min_Item_Price,
         Calculated Max_Item_Price - Calculated Min_Item_Price as Range
  FROM WORK.IMPORT
  GROUP BY Product_Name)
WHERE Calculated Range >= 1000
ORDER BY Calculated Range DESC;
QUIT;