Saks Fifth Avenue Customer Behavior Report
Based on Data-Driven Analysis
Group 8: Linhan Zhang, Zhongyuan Lian, Huiruo Zhang, Yitian Chen
Executive Summary
Saks Fifth Avenue is a luxury department store chain that sells high-end brands both
online and offline. The objective of this research is to help Saks Fifth Avenue (hereafter Saks)
decrease its customers’ return rate and cancel rate so as to improve customer profitability and
satisfaction. We also want to regain our old customers and increase their loyalty.
The original data in our research come from the Customer Relationship Management
Database at Saks. This database records a wide range of historical sales information at the
order-line level, including over 137,000 orders from 100,000 customers. Each order line records
customer information, such as customer ID number and ZIP Code, and transaction-related
items such as order date, shipping date, revenue, and cost.
Since we intend to segment our customers based on their return record, order cancellation record,
total profit, and the time of their most recent order, we aggregate all records into a new data file
at the individual customer level. We then choose four clustering variables: profit, return
rate, cancel rate, and time duration since last order date. We use K-means cluster analysis as our
major segmentation method. We divide the data into a calibration set and a validation set and
conduct K-means cluster analysis on each of them to make sure that we do not miss any
meaningful group of customers. Furthermore, we repeat the analysis several times with different
methods to explore alternative solutions.
Once the outcomes of the K-means cluster analysis match our expectations, we summarize and
interpret our key findings. There are 8 clusters, each with meaningful features.
Three of these groups interest us most.
The first group makes up about 30% of all customers. These customers generate high profits, and their
return and cancel rates are extremely low. They also have the shortest time duration since last order date.
They are clearly the core customers of our company, and we should take action to retain
them in order to generate more profit. For example, we could offer them better service
and high-quality products to increase their loyalty and satisfaction.
The second is a group of customers who generate relatively high profits for our
company but whose cancel rate is extremely high. These customers could generate a large
profit for us. However, they are likely to cancel their orders for various reasons. What makes
things worse is that they cause additional costs for our company, since we need to provide
special services when they return items. For these customers, we need to identify their
true needs and the reasons for their high cancel rate. They have great financial potential if we can increase
their satisfaction: they could move into the first group of customers and generate a large
profit for the company.
The third group includes customers who generate relatively low profits but whose return rate
is very high. These customers are dissatisfied with our products or services, so they keep returning
their items. This group is a heavy financial burden for Saks, so we have to decrease its return
rate by identifying the reasons and taking action to increase these customers’ satisfaction.
Finally, we analyze the major reasons behind high return and cancel rates. Based on our
previous analysis, we provide different managerial recommendations for each group according to
its significance and characteristics. These recommendations will serve to decrease customers’
return rate and cancel rate and eventually increase profits for Saks in the future.
Table of Contents
1. Introduction
2. Background
3. Methodology and Analysis
    Definition of Clustering Analysis
    Data Obtained and Used
    Variables selection and Explanation
    Data Preparation
    Calibration and Validation
    Clustering Settings
        Measure Interval: Euclidean Distance
        Cluster Method: Ward’s Method
        Standardization: Z scores
    Specific Operations
    Findings from Clustering Results
4. Conclusion & Recommendations
    Recommendations
5. Limitations and Future Research
6. Appendix
1. Introduction
Imagine you are a store owner selling limited-edition Prada purses that normally cost more
than $5,000. Which kind of customer is more valuable to you? A customer who spends an average
amount of money but never returns or cancels an order? Or a customer who spends a huge amount
of money but ends up returning or cancelling most orders? This is a significant but tricky question
for every company, especially for Saks Fifth Avenue, whose unit prices are higher.
It is said that customers are the most valuable equity of a company. As a luxury department
store, Saks sells products that are much more expensive, so every single purchase matters
a great deal to the company financially. As a result, high return and cancel rates are more damaging for
Saks than for regular department stores such as Macy’s. At the same time, customer satisfaction
and loyalty, which directly decide the company’s fate, are also extremely significant for Saks.
Moreover, it is important for us to know how often a customer comes back to purchase.
According to our background research, the major managerial issue for Saks is to increase
profit by reducing return and cancel rates and by regaining customers who have not purchased for more
than one year. Through a series of analyses and comparisons, we segment all customers into 8
groups based on the profit they generate, the time duration since their last order date, their return rate,
and their cancel rate. Each group has its own meaningful features. Some generate the
highest profits but have not purchased for more than two years. Others generate high
profit but also have high return or cancel rates. We discuss each cluster in detail in the
remainder of this report, elaborating on each group’s features and providing managerial
recommendations.
2. Background
Saks Fifth Avenue is a luxury department store chain that was founded in 1867. With such
a long history, Saks has built its own customer pool with a large number of loyal customers.
Most customers shop at Saks for its good service and latest fashion. Saks carries a number
of world-famous luxury brands, including Gucci, Prada, and FENDI. The staff at Saks are very
professional and usually offer customers thoughtful advice during the purchase process.
However, based on our research, Saks cannot generate as much revenue as it did a few
years ago. The competition between department stores is becoming more and more fierce. Main
competitors of Saks, such as Bloomingdale’s and Neiman Marcus, have put considerable pressure on
Saks through price-off promotions. Even mid-range department stores such as Macy’s and
online stores such as amazon.com are competing with Saks. More competition means customers have
more choices. For Saks, however, it leads to high return and cancel rates, because once customers
find a lower price on amazon.com, the first action they take is to cancel their orders on our
website. Moreover, high service costs make it more difficult for Saks to generate considerable profits.
As a result, Saks faces more challenges than it ever has and needs to find a way to
solve these problems and keep growing.
In recent years, Saks has introduced its online store and app to enlarge its market
share and attract more young customers. Online shopping is an easier and cheaper way to purchase
items for both customers and companies. However, it raises several issues as well. Since Saks sells
a great deal of apparel and makeup, customers cannot try products on before purchasing on
the website. Once customers find that a product does not match their expectations, they
return it. Therefore, online shopping has increased return and cancel rates, which leads the
company to incur additional costs. Our team will help Saks find solutions to these issues
by reducing return and cancel rates as well as increasing customer satisfaction.
3. Methodology and Analysis
The nature of the retailing industry makes a deep understanding of customers very
important. Saks Fifth Avenue specializes in selling various high-end brands, including Gucci,
Burberry, and Prada. In an effort to increase the company’s profit, we note that
reducing return rate and cancel rate could play a crucial role in achieving this objective. Once a
customer returns a product or cancels an order, we lose not only the potential profit but
also the effort we previously invested in acquiring this customer and in attracting her to visit our
stores. Therefore, we focus most of our attention on investigating returns and cancellations so as to
obtain actionable insights of which we can take advantage.
We conduct cluster analysis to segment our historical customers in terms of their
return record, order cancellation record, total profit generated across their accumulated
purchases at Saks Fifth Avenue, and the time of their most recent order. Through
cluster analysis, we discover separate groups that differ from each other in these aspects. We then
compare them, identify their differences, and evaluate the possible reasons for these differences.
After understanding the characteristics and implications of these groups, we are able to come up
with corresponding recommendations that can improve their future performance.
The objective of this study is to identify different customer groups in terms of the above
four aspects and extract the contact information of the customers in each group for direct
marketing, eventually decreasing return rate and cancel rate, increasing customer satisfaction and
profit, and regaining old customers.
Definition of Clustering Analysis
Cluster analysis originated in anthropology with Driver and Kroeber in 1932 and was introduced to
psychology by Zubin in 1938 and Robert Tryon in 1939. It is the task of grouping a set of objects
in such a way that objects in the same group (called a cluster) are more similar (in some sense or
another) to each other than to those in other groups (clusters). [Reference] The outcome of cluster
analysis is a set of segments created from a set of individual samples. Samples in the same
segment share more commonalities with each other than they do with samples from other
segments. In business, cluster analysis is a popular method for market
segmentation, which is an important part of marketing planning.
Our research adopts two types of clustering methods: Hierarchical Clustering and K-means
Clustering. Hierarchical Clustering is useful when the sample size is relatively small. Different
choices of clustering method and measure interval lead to different clustering results. Among
them we select the one that meets our expectations in terms of segment size, segment characteristics,
and between-segment differences. Hierarchical clustering provides exploratory insight for the
subsequent K-means clustering analysis, which, seeded by the hierarchical solution, can handle
large-sample segmentation. An effective and efficient cluster analysis of a large data set
requires the combination of these two clustering methods. The flowchart (see Fig. 3.1)
shows the procedure of our cluster analysis.
Figure 3.1 Analysis Flowchart
Data Obtained and Used
The data analyzed in our research come from the Customer Relationship Management
Database at Saks. This database records a wide range of historical sales information at the
order-line level. Each order line records customer information, including customer ID number and
ZIP Code, and key items recorded during a transaction, such as order date, shipping date,
price, and cost. Different order lines may share the same order number, which shows that these order
lines belong to the same order. By the same token, different order numbers may share the same
customer number, meaning that the customer placed these orders at different times.
The data set is large enough to produce representative insights.
We select the records from 12/16/2004 to 09/17/2012, covering more than 226,000 order
line records. These records reflect over 137,000 orders from 100,000 customers.
Variables selection and Explanation
After reviewing the descriptions of the variables in an order line, we determine four
clustering variables: Total Profit, Return Rate, Cancel Rate, and Time Duration since
Last Order Date. These variables are not included in the current data set but can be calculated from
existing variables. We use other variables, including Customer Number and Zip Code, to describe
the features and attributes of the resulting segments. Each of the
clustering variables has its own meaning and implication for us.
Total Profit
Profit is the most important indicator of a customer’s value. The higher the profit a
customer generates, the more imperative it is to retain him/her. Generally, a company has finite
resources available for customer relationship management. If it invests equal resources in every
customer regardless of value, it will inevitably end up with highly profitable customers who are not
served and retained well and with less profitable customers who consume many resources
without creating enough profit in return. Therefore, while our final objective is to come up with
actionable recommendations for different customer groups, Total Profit tells us which group
requires more attention and hence more resources.
Return Rate
Return Rate conveys important information about a customer’s purchasing behavior. A high
return rate has many possible causes. For example, an unclear or misleading product
description could lead customers to complain after receiving their packages, which often
leads to returns. A high return rate could also be attributed to a customer’s particular taste.
Whatever the cause, the higher the return rate, the more profit we lose. While
increasing revenue is one path to greater profit, reducing unnecessary loss is also an effective one.
Saks is dedicated to the high-end niche market. High-end brands generally have
smaller sales volumes than lower-tier brands, but they invest more to support their high-end brand
positioning and marketing. Saks’s well-qualified salespeople, high rents, and high advertising
budget imply high operating costs. When a return or cancellation happens, even though most products can
be resold, much of that cost is wasted. This is another reason we attach great importance to return rate
and cancel rate. With these considerations, we select return rate as one of our clustering
variables.
Cancel Rate
Cancel Rate refers to the ratio of a customer’s cancelled quantity to total ordered quantity. Like
return rate, it has a strong relation to profit, but in a different way. While a return is a
customer’s decision after receiving the ordered products, a cancellation means the customer changes her
mind before that. We assume that return rate mainly comes down to dissatisfaction with our
products, whereas cancel rate implies unsatisfying purchasing experiences such as confusing
shopping guidance and poor customer service. As with returns, a decreased cancel
rate brings a corresponding increase in profit. Therefore, we include cancel rate in our variable list.
Time Duration since Last Order Date
Time duration since last order date is the time period between the date a customer placed
his/her last order and the current date. We audit the data set and find that a large number of customers
have not shopped at Saks for a long time. The longer the duration, the higher the
possibility that the customer has already defected. This variable matters for our decision making because
marketing strategies and plans can be totally different for new customers and old ones,
and so can the resulting marketing effects. Relatively new customers are easier to contact and
attract because their contact information is up to date and because they have a stronger connection
with our brand and products. On the other hand, customers who have not come back for more than three years
are of lesser value and priority for the opposite reasons. Therefore, differentiating
new and old customers through clustering is meaningful.
Data Preparation
The records in the data set are organized by order line. Since the objective
of our cluster analysis is to obtain information at the individual customer level, we aggregate all
records into a format with customer number as the key. We then audit the aggregated data set
and determine the calculations that transform existing variables into the four clustering variables.
Table 3.1 gives a comprehensive explanation of the variables used and the ones we compute, in
the order in which they are used throughout this analysis.
Table 3.1 Overview of the Variables Used
Variable: Explanation
Original variables
  Customer Number: A unique 11-digit numeric string identifying a customer. Each customer has only one customer number.
  ZIP Code: The 5-digit ZIP Code of the location where a customer places an order.
  Order Number: A unique 9-digit numeric string referring to a specific order. One customer may have placed more than one order, each with a different order number.
  Order Line: Line number for each unique product in an order.
  Order Date: Date an order was placed.
  Quantity: Quantity of a product in an order.
  Revenue: The total price of an order line.
  Cost: The total cost of an order line.
  Return Quantity: The quantity of returned product.
Computed variables before aggregation
  Profit: The profit of a single order line. Calculation: Profit = Revenue - Cost
  Time Duration since Order Date (Month): The time between today and the day the order was placed. Calculation: Time Duration since Order Date = Date of Today - Order Date
Aggregated variables (aggregated by Customer Number)
  Last ZIP Code: The ZIP Code where a customer placed his/her last order.
  Total Profit: The summed profit of all of a customer’s orders.
  Time Duration since Last Order Date (Month): The time between today and the day the customer’s last order was placed.
  Total Quantity: The total quantity of products a customer has ever purchased, including returned and cancelled quantity.
  Total Return Quantity: The total quantity of products a customer has ever returned.
  Total Cancel Quantity: The total quantity of products a customer has ever cancelled.
Computed variables after aggregation
  Return Rate: The return rate of a customer’s historical purchases. Calculation: Return Rate = Total Return Quantity / Total Quantity
  Cancel Rate: The cancel rate of a customer’s historical purchases. Calculation: Cancel Rate = Total Cancel Quantity / Total Quantity
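
The aggregation summarized in Table 3.1 could be sketched as follows. This is a minimal Python illustration, not the SPSS procedure used in this study: the sample order lines, the tuple layout, and the helper names `months_between` and `aggregate_by_customer` are hypothetical, and counting whole calendar months is an assumption about how the time duration is measured.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order lines:
# (customer, order_date, revenue, cost, quantity, return_qty, cancel_qty)
ORDER_LINES = [
    ("C001", date(2011, 3, 10), 500.0, 300.0, 2, 1, 0),
    ("C001", date(2012, 6, 1), 250.0, 150.0, 2, None, 1),  # missing return qty
    ("C002", date(2009, 1, 15), 900.0, 600.0, 1, 0, 0),
]


def months_between(earlier, later):
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months


def aggregate_by_customer(lines, today):
    """Collapse order lines into one record per customer."""
    acc = defaultdict(
        lambda: {"profit": 0.0, "qty": 0, "ret": 0, "can": 0, "last": None}
    )
    for cust, order_date, revenue, cost, qty, ret, can in lines:
        row = acc[cust]
        row["profit"] += revenue - cost   # Profit = Revenue - Cost
        row["qty"] += qty or 0            # missing values count as zero
        row["ret"] += ret or 0
        row["can"] += can or 0
        if row["last"] is None or order_date > row["last"]:
            row["last"] = order_date      # keep the most recent order date

    result = {}
    for cust, row in acc.items():
        qty = row["qty"]
        result[cust] = {
            "total_profit": row["profit"],
            "return_rate": row["ret"] / qty if qty else 0.0,
            "cancel_rate": row["can"] / qty if qty else 0.0,
            "months_since_last_order": months_between(row["last"], today),
        }
    return result


customers = aggregate_by_customer(ORDER_LINES, today=date(2012, 9, 17))
```

Here the rates are fractions between 0 and 1; multiplying by 100 gives percentages.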
Calibration and Validation
Once the four clustering variables are ready, we divide the data set into two subsets: a
Calibration sample set (60% of all records) and a Validation sample set (40%
of all records). The Calibration sample set is used to generate a promising division into
segments, while the Validation sample set is used to verify whether that division is appropriate
and representative. Conducting clustering on both sets helps ensure that no meaningful segments are
missed. If the two solutions are consistent, we then cluster the entire data set to further test that division.
This verification mechanism helps ensure the accuracy and representativeness of
our analysis.
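
A calibration/validation split of this kind could be sketched as below. This is only an illustration in Python with hypothetical names (`split_sample`, the seed values, and the integer customer IDs are assumptions); it produces an exact 60/40 split, whereas a random case filter may give only approximately 60%.

```python
import random


def split_sample(customer_ids, calibration_share=0.6, seed=42):
    """Shuffle the IDs reproducibly and cut them into two disjoint sets."""
    ids = list(customer_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    cut = round(len(ids) * calibration_share)
    return ids[:cut], ids[cut:]


# Hypothetical customer IDs 0..999
calibration, validation = split_sample(range(1000))

# A 10% random subset of the calibration set, as used for hierarchical clustering
subset = random.Random(7).sample(calibration, k=len(calibration) // 10)
```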
Clustering Settings
Using a random 10% sample of the calibration set, we formulate the approach to
hierarchical clustering. There are three crucial decisions: the cluster method, the
measure interval, and whether or not to standardize the clustering variables.
Measure Interval: Euclidean Distance
The measure interval determines how the distance between two samples is calculated.
Two popular measures are Squared Euclidean Distance and Euclidean Distance.
If the distance between two given samples is X under Euclidean Distance, it
becomes X² under Squared Euclidean Distance. Squared Euclidean Distance
amplifies the numeric value of any distance greater than 1, and thus enlarges the variance between samples.
An enlarged variance pushes two samples apart. However, we prefer two similar samples to be
close rather than distant. Therefore, we select Euclidean Distance.
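
As a small worked example of the two measures, the sketch below computes both for a pair of hypothetical four-variable samples (the points `p` and `q` are made up for illustration). Note that squaring enlarges distances above 1 but shrinks distances below 1.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    """Same sum of squared differences, without the square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two hypothetical samples described by four standardized variables
p = (0.0, 0.0, 0.0, 0.0)
q = (3.0, 4.0, 0.0, 0.0)

d = euclidean(p, q)            # X  = 5.0
d2 = squared_euclidean(p, q)   # X² = 25.0
```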
Cluster Method: Ward’s Method
The cluster method determines how the distance between two clusters is judged. Two
alternative methods are Furthest Neighbor and Ward’s method. The Furthest Neighbor method
takes the longest distance between any two members of the two clusters as the distance
between the clusters. This method is effective at identifying small groups that are
conspicuously different from the others; correspondingly, the resulting clusters tend to
have the majority of samples concentrated in a few large groups, with the remaining minority of samples
assigned to much smaller groups. Ward’s method, on the other hand, uses the sum of squared errors
as the measure of distance and thus tends to produce groups of similar size.
Our analysis aims to identify groups with different characteristics with respect to the four
clustering variables. The identified groups should be large enough for actionable marketing
campaigns, which means that some of the segments identified by the Furthest Neighbor method might
be too small to meet our expectations. By contrast, Ward’s method produces groups with a
relatively even sample distribution, so it is our choice.
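
The two between-cluster criteria can be illustrated with a short sketch. It assumes the standard definitions: Furthest Neighbor (complete linkage) takes the largest pairwise distance, and Ward’s criterion is the increase in the within-cluster sum of squared errors caused by a merge. The function names and the two small clusters are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a cluster."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def furthest_neighbor(a, b):
    """Complete linkage: largest pairwise Euclidean distance between clusters."""
    return max(math.dist(p, q) for p in a for q in b)


def ward_cost(a, b):
    """Increase in within-cluster sum of squared errors if a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * gap


# Two hypothetical clusters in two dimensions
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(4.0, 0.0), (4.0, 2.0)]

complete = furthest_neighbor(cluster_a, cluster_b)  # sqrt(20), about 4.47
ward = ward_cost(cluster_a, cluster_b)              # 16.0
```

At each step, a hierarchical procedure merges the pair of clusters with the smallest value of the chosen criterion.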
Standardization: Z scores
Standardization is required when clustering variables on different scales would otherwise have unequal
influence on the result. Standardization transforms the variables into comparable forms so that
they have equal influence and significance. In our study, the four variables have clearly
different value ranges and variances. To ensure an accurate analysis, we standardize them as Z scores.
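
A Z score subtracts the variable’s mean and divides by its standard deviation. The sketch below is a minimal illustration with made-up profit values; using the sample standard deviation (n - 1 denominator) is an assumption made here, not something stated in this report.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation, an assumption
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical total profits of three customers
profits = [50.0, 100.0, 150.0]
z_profit = z_scores(profits)  # approximately [-1.0, 0.0, 1.0]
```

After this transformation, a variable measured in dollars and a variable measured as a rate contribute on the same scale to the distance between two customers.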
Specific Operations
First, for our purposes we must create several new variables for
further analysis. Since customer profit is the key factor in our analysis, we use the following
equation to calculate a new variable named “Profit” (see Note 1 below):
Profit = Revenue - Cost
Because we want to know how many months have passed since each customer’s last order date, we use
“Date and Time Wizard” to create a new variable named “Time Duration”. Second, we use
“Recode into Same Variables” to replace the missing values in cancel quantity, return quantity, and
quantity with zero. Third, we aggregate the original data file into a new data file. The break
variable is “Customer Number”, and the aggregated variables are “zip code(last)”, “profit(sum)”,
“last order date(minimum)”, “cancel quantity(sum)”, “return quantity(sum)” and “quantity(sum)”.
Finally, because we want to know each customer’s return rate and cancel rate, we create two new
variables named “Return Rate” and “Cancel Rate” using the following formulas:
Return rate = return quantity / quantity
Cancel rate = cancel quantity / quantity
Note 1: Strictly, the precise total profit of a customer should be calculated by the formula
Profit = (Revenue - Cost)*[1-(Return Quantity + Cancel Quantity)/Quantity]
However, this calculation loses the ability to show a customer’s potential consumption power, since the profit on his/her
returned and cancelled items is excluded. In our study, we want to investigate customers’ true consumption power,
and therefore we use Profit = Revenue - Cost.
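
To make the difference between the two profit definitions concrete, here is a small worked example. The figures are invented for illustration and the function names are hypothetical.

```python
def gross_profit(revenue, cost):
    """Profit as used in this study: Revenue - Cost."""
    return revenue - cost


def net_profit(revenue, cost, quantity, return_qty, cancel_qty):
    """Profit scaled by the share of quantity that was neither returned nor cancelled."""
    kept_share = 1 - (return_qty + cancel_qty) / quantity
    return (revenue - cost) * kept_share


# Hypothetical customer: 4 items ordered, 1 returned, 1 cancelled
gross = gross_profit(1000.0, 600.0)          # 400.0
net = net_profit(1000.0, 600.0, 4, 1, 1)     # 400.0 * 0.5 = 200.0
```

The gross figure reflects what the customer was willing to buy, while the net figure reflects what the company actually kept.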
Standardize Decision Variables
We calculate Z-scores for all decision variables, namely “Profit”, “Time Duration”,
“Return Rate” and “Cancel Rate”, and save them as new variables.
Split the Sample
We use “Select Cases” to split the data into a calibration sample, which is about 60%
of all data, and a validation sample, which is about 40% of all data.
Hierarchical Clustering
First, we randomly choose 10% of the calibration sample as our small subset. Second, we run
Hierarchical Cluster Analysis to determine the number of clusters, using Ward’s method,
Euclidean distance, and Z scores. According to the marked line, we choose 6 to 8
clusters as the range of solutions. Based on a comparison of the Custom Tables, we choose 8
clusters because this solution gives the clearest and most meaningful managerial insights. The
detailed Custom Tables are provided in the Appendix. Third, we conduct Hierarchical Cluster
Analysis to identify the cluster centers. Finally, we save the outcome in a new data file as initial
seeds (see Table 1 in the Appendix).
K-Means Cluster Analysis
We use the results of the Hierarchical Cluster Analysis as initial seeds and conduct K-means
Cluster Analysis on the Calibration sample. We set the maximum number of iterations to 80 and save cluster
membership. There are 60,025 valid cases and 54 missing cases. The Initial Cluster Centers
(see Table 2 in the Appendix), Iteration History (see Table 3 in the Appendix), Final Cluster
Centers (see Table 4 in the Appendix) and Number of Cases in each
Cluster (see Table 5 in the Appendix) are provided in the Appendix.
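
The idea of seeding K-means with centers from a prior hierarchical solution can be sketched as follows. This is a plain textbook K-means written for illustration, not the SPSS routine used in this study; the function name `kmeans`, the toy points, and the seed values are hypothetical, and only the 80-iteration cap is taken from the text above.

```python
import math


def kmeans(points, seeds, max_iter=80):
    """K-means starting from the given initial centers (seeds)."""
    centers = [tuple(s) for s in seeds]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:  # converged: memberships unchanged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


# Toy standardized data and two seeds standing in for hierarchical cluster centers
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
seeds = [(1.0, 1.0), (9.0, 9.0)]
centers, labels = kmeans(points, seeds)
```

Starting from informed seeds rather than random ones makes the result reproducible and ties the final clusters to the exploratory hierarchical solution.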
Exploring Results
To make sure we obtain the best result, we use different random subsets and
different methods to conduct Hierarchical Cluster Analysis. When we use Furthest Neighbor and
Squared Euclidean distance, the outcome is clearly inappropriate because most of the data
are concentrated in 2 clusters, while the other 6 clusters have extremely small and meaningless counts.
More importantly, we cannot find the ideal group that has both a high return rate and a high cancel rate.
We then save these outcomes as initial seeds and conduct K-means Cluster Analysis using
the different initial seeds. As expected, the results for the
calibration sample, the validation sample, and all data do not match in their major
clusters.
Finalize Calibration Results
Based on our previous analysis, we finalize our decision by running K-means Cluster
Analysis on the Calibration sample. The calibration results are shown in Table 3.2.
Table 3.2 Calibration Clustering Results (values are cluster means; Count is the number of customers in each cluster)
Cluster                          1         2         3         4         5         6         7         8
Profit ($)                    67.6    916.98    100.48    168.81     77.49     67.79     79.26    107.68
Return Rate (%)               0.79      3.41       0.6     24.94      0.28     98.04       0.1       0.2
Cancel Rate (%)               0.44      5.43      0.15     16.47      0.09      0.02      0.02     97.09
Time Duration (months)          81        26        13        35        61        41        38        46
Count                         7057      1021     17777      2802     13597      2626     13894      1251
Validation Sample
First, we conduct Hierarchical Cluster Analysis to identify the cluster centers. We again
use Ward’s method and Euclidean distance when we run Hierarchical
Cluster Analysis. Then we run K-means Cluster Analysis on the Validation sample using the new
initial seeds. There are 39,872 valid cases and 45 missing cases. The Initial Cluster
Centers (see Table 6 in the Appendix), Iteration History (see Table 7 in the Appendix), Final
Cluster Centers (see Table 8 in the Appendix) and Number of Cases in each Cluster (see Table 9
in the Appendix) are provided in the Appendix. The validation results are shown in Table 3.3.
Table 3.3 Validation Clustering Results (values are cluster means; Count is the number of customers in each cluster)
Cluster                          1         2         3         4         5         6         7         8
Profit ($)                   86.26    525.19      1939     78.24     70.47     73.82    161.98      93.1
Return Rate (%)               0.89      4.02      2.91     85.14       0.4       0.6      4.05         0
Cancel Rate (%)               0.17      2.96     10.45      0.01      0.09      0.07     42.07     99.93
Time Duration (months)          13        25        27        40        69        41        41        46
Count                        11681      1604        99      2395     12078     10425       810       780
Compare and Finalize
We compare the calibration results and validation results, and they are consistent.
In particular, the most managerially meaningful clusters, those with high return rate and cancel rate,
are consistent. Therefore, we conduct Hierarchical Cluster Analysis to identify the cluster centers,
again using Ward’s method and Euclidean distance. Then we run K-means Cluster
Analysis on all data using the new initial seeds. There are 99,897 valid cases and 99 missing
cases. The Initial Cluster Centers (see Table 10 in the Appendix), Iteration History
(see Table 11 in the Appendix), Final Cluster Centers (see Table 12 in the Appendix) and
Number of Cases in each Cluster (see Table 13 in the Appendix) are provided in the Appendix.
The all-data results are shown in Table 3.4.
Table 3.4 All Data Clustering Results (values are cluster means; Count is the number of customers in each cluster)
Cluster                          1         2         3         4         5         6         7         8
Profit ($)                   67.38     78.05    914.13    172.52      77.4    100.16     67.23    103.31
Return Rate (%)                0.8      0.11      3.47     24.78      0.29      0.59     98.08      0.17
Cancel Rate (%)                0.4      0.02      5.28     16.64      0.09      0.14      0.01     97.41
Time Duration (months)          81        38        26        35        61        13        41        46
Count                        11778     23015      1717      4726     22793     29415      4371      2082
Findings from Clustering Results
• Cluster 3 is one of the key customer clusters because its customers contribute the
highest profit, $914.13. Their return rate (3.47%) and cancel rate (5.28%) are also
relatively low. The average time duration since last order date is 26 months,
which is the second shortest among all clusters.
• Cluster 6 is also extremely important for us because its profit, $100.16,
is relatively high. Its time since last order is the shortest among all clusters, and its return
rate and cancel rate are both under 1%. Moreover, this cluster has the
largest number of customers and makes up nearly 30% of the whole sample.
• Cluster 4 is one of the clusters our team wants to highlight. The customers in this group
have the second highest profit, $172.52, and the third shortest time duration since last order
date, 35 months. However, their return rate and cancel rate are 24.78% and 16.64%
respectively. We can obtain a large financial return if we can lower their return rate and cancel
rate.
• Cluster 7 is another group we want to analyze in depth. Its mean profit is $67.23, its cancel rate is 0.01%, and its time since the last order is 41 months. Surprisingly, its return rate is 98.08%. This means we have spent a large amount of money serving these customers, and they weigh heavily on our company’s financial status. We could cut costs substantially by lowering this cluster’s return rate.
• Cluster 8 is surprising as well. Its mean profit is $103.31, the third highest among all clusters. Its return rate is 0.17%, and its time since the last order date is 46 months. However, its cancel rate is 97.41%. From our perspective, this group has large profit potential if we can optimize our purchase process to lower the cancel rate.
• Clusters 2 and 5 also matter for our analysis. Both groups are large. Although their mean profits are both under $80, their return rates and cancel rates are extremely low, all under 0.3%. At the same time, both groups placed their last order more than 3 years ago, so we must figure out how to re-engage these old customers.
• Cluster 1 is relatively unimportant for this analysis. Its return rate and cancel rate are low, but its mean profit is only $67.38 and these customers have not bought any product from our store for more than 6 years. Thus, it is very hard to re-target this group.
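The ordinal claims in these findings (second shortest, third highest, and so on) can be checked mechanically against Table 3.4. The following is a minimal sketch, not part of the original analysis; the dictionary layout and the helper name `rank_clusters` are illustrative assumptions, and the numbers are copied from Table 3.4.

```python
# Cluster means copied from Table 3.4 (all data).
# profit is in dollars, rates are in percent, months = time since last order.
TABLE_3_4 = {
    1: {"profit": 67.38, "return_rate": 0.80, "cancel_rate": 0.40, "months": 81},
    2: {"profit": 78.05, "return_rate": 0.11, "cancel_rate": 0.02, "months": 38},
    3: {"profit": 914.13, "return_rate": 3.47, "cancel_rate": 5.28, "months": 26},
    4: {"profit": 172.52, "return_rate": 24.78, "cancel_rate": 16.64, "months": 35},
    5: {"profit": 77.40, "return_rate": 0.29, "cancel_rate": 0.09, "months": 61},
    6: {"profit": 100.16, "return_rate": 0.59, "cancel_rate": 0.14, "months": 13},
    7: {"profit": 67.23, "return_rate": 98.08, "cancel_rate": 0.01, "months": 41},
    8: {"profit": 103.31, "return_rate": 0.17, "cancel_rate": 97.41, "months": 46},
}


def rank_clusters(metric, descending=False):
    """Return cluster numbers ordered by one metric (hypothetical helper)."""
    return sorted(TABLE_3_4, key=lambda c: TABLE_3_4[c][metric], reverse=descending)


by_profit = rank_clusters("profit", descending=True)   # highest profit first
by_recency = rank_clusters("months")                    # most recent order first
by_return = rank_clusters("return_rate")                # lowest return rate first
by_cancel = rank_clusters("cancel_rate")                # lowest cancel rate first

print("Top 3 by profit:", by_profit[:3])
print("Top 3 by recency:", by_recency[:3])
```

Ranking on cluster means only reproduces the ordering statements; it says nothing about how spread out customers are within each cluster.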
4. Conclusion & Recommendations
After the data analysis, we segment our customers into 8 groups. Our goal is to decrease the return rate and cancel rate so that we can improve our customers’ profitability and satisfaction. We also want to regain our old customers and increase their loyalty. Based on Table 3.4, we create a pie chart (see Fig. 4.1) that shows the percentage of total profits contributed by each segment.
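The profit shares can be approximated directly from Table 3.4. This is a minimal sketch under one assumption: that each cluster’s total profit equals its mean profit multiplied by its customer count. The function name `profit_shares` is illustrative and not from the original analysis.

```python
# (mean profit per customer in dollars, customer count), copied from Table 3.4.
CLUSTERS = {
    1: (67.38, 11778),
    2: (78.05, 23015),
    3: (914.13, 1717),
    4: (172.52, 4726),
    5: (77.40, 22793),
    6: (100.16, 29415),
    7: (67.23, 4371),
    8: (103.31, 2082),
}


def profit_shares(clusters):
    """Return each cluster's share of total profit, in percent."""
    # Assumption: total profit of a cluster = mean profit * number of customers.
    totals = {c: mean * count for c, (mean, count) in clusters.items()}
    grand_total = sum(totals.values())
    return {c: 100 * total / grand_total for c, total in totals.items()}


shares = profit_shares(CLUSTERS)
for cluster, share in shares.items():
    print(f"Group {cluster}: {share:.0f}%")

# Combined share of the four largest contributors (clusters 6, 2, 5, and 3).
key_share = sum(shares[c] for c in (6, 2, 5, 3))
print(f"Clusters 6, 2, 5, 3 combined: {key_share:.0f}%")
```

Because the inputs are rounded cluster means, the result is only accurate to about a percentage point.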
Figure 4.1 Profit Distribution among Groups
(Pie chart of profit distribution: Group 1 8%, Group 2 18%, Group 3 15%, Group 4 8%, Group 5 17%, Group 6 29%, Group 7 3%, Group 8 2%)
From Fig. 4.1, we can see that clusters 6, 2, 5, and 3 contribute the majority (79%) of our total profits. These customers are our key customers in terms of the total profits they generate.
According to Table 3.4, customers in cluster 6 have the shortest time since their last order date, which suggests that these people currently have the highest awareness of Saks among all customers and a stronger connection to us. We need to retain these customers for long-term development because they are more likely to bring future profits. In addition, the fact that their return rate and cancel rate are both low suggests that they are currently satisfied with our products and services.
Cluster 2 has the lowest return rate and the second-lowest cancel rate. These customers appear highly satisfied with our products and services and purchase from Saks with little hesitation. But they have not placed an order for more than 3 years, so it is important for us to retarget them.
For cluster 5, customers’ return rate and cancel rate are low, so their satisfaction is stable. But they have not placed an order for more than 5 years, and the mean profit of this group is relatively low. Low return/cancel rates combined with low profits suggest that these customers may worry that the return/cancel process would be troublesome, so they are unwilling to buy very expensive products. For these customers, we need to ease their worries and convey that Saks is the ideal store for high-end products. Meanwhile, we should update their personal information and preferences since they have not purchased from us for more than 5 years.
Since the mean profit of cluster 3 is the highest, these customers are valuable to us. Their return rate and cancel rate are relatively low, but we still need to decrease both in order to increase their satisfaction as much as possible. This group’s time since the last order date is the second shortest, so we need to retain them and persuade them to build a long-term, trusting relationship with us, which will help us generate more profits in the future.
Cluster 8 has the second-lowest return rate, so these people are at least satisfied with the products they have actually bought. However, their cancel rate is extremely high, which means we lost most of the potential profit from orders they originally intended to place. Meanwhile, customers in cluster 8 have a long time since their last order date, which suggests that they are unwilling to purchase from our store because of a bad purchase experience. For example, they may be disappointed with how slowly our website is updated or with long shipping times. Thus, we may need to regain these customers by setting up specific strategies that target their needs more efficiently.
Although the profits that cluster 4 brings us are very high, these customers’ net profits are not as high as the table suggests because of their high return rate and cancel rate. This situation indicates that these customers are dissatisfied with our products or services. We should improve the quality of our products and optimize our services to convince them to keep purchasing from Saks with a lower return rate and cancel rate. In this way, we can prevent the loss of potential profits from these customers.
Whether measured by total profits or by mean profit, customers in cluster 7 generate low profits for us. Their return rate is the highest, which means they return almost all the products they purchase. Although our employees spend much time and effort serving them and trying to meet their needs, these people return most of our products, so something is likely wrong with our products or services. Since this group currently has a negative influence on our financial situation, our spending on them will be more productive and efficient if we can lower their return rate.
Recommendations
In order to provide appropriate recommendations for our customers based on their different characteristics, we need to analyze the reasons for consumers’ return and cancelation behaviors. The difference between these two behaviors is that returns happen when customers have already purchased products, while cancelations happen when people have not yet paid for the product.
Saks Fifth Avenue is both a retailer and an e-retailer. In our physical stores, customers return items largely because staff cannot provide the proper product information or shopping advice. On the other hand, an increasing number of customers purchase products on our official website or app, which creates its own problems. For example, when a customer purchases a pair of shoes on our website, he or she cannot look at or try on the product in person. Many customers are disappointed when they receive their packages because the products do not match their expectations. For these reasons, online customers are more likely to return their products.
In addition, more and more online retailers are appearing, which gives people many opportunities to compare prices. They can easily find a better price for the same product on other websites, and once they find it, they will switch to other retailers. Our team has summarized several possible reasons for return behaviors:
• The product itself does not satisfy our customers. For instance, a customer who bought a sweater on our website and was not satisfied with the material might return it.
• The product is damaged during the shipping process. In this situation, the customer will almost certainly return the product.
• The description of the product is inconsistent with the real product, or the details of the product are not presented clearly. The higher the expectations customers form from the description on our website, the more disappointed they will be if the product doesn’t match it.
• Shopping guides do not offer clear explanations. When customers ask our shopping guides for advice or information in our physical stores, the guides may be unable to provide proper advice. Misleading information and advice will probably result in returns.
• Poor post-purchase service is another important factor that causes people to return their products. Saks is a high-end retailer whose prices are relatively high. When customers pay a premium for a product, they have higher requirements for customer service. If our post-purchase service cannot solve their problems promptly and effectively, they may return their products as well. For instance, when a customer calls our representative to request an exchange and we process the request very slowly, the customer may run out of patience and decide to return the product instead.
Our team has summarized several possible reasons for cancelation behaviors:
• Customers make mistakes when they place an order. For instance, they may find at checkout that they chose the wrong size or color, or that they filled in the wrong personal information. In this situation, they cancel the order and place the correct one, so this kind of cancelation does not significantly influence our sales or profits. However, we still need to provide a clearer website design and better information to help customers place orders correctly.
• Customers find a better offer on other websites. As more and more online retailers appear, many customers are used to comparing prices of the same product on different websites before they check out. Once they find a better offer from another online retailer, they cancel the previous order on our website.
• Personal factors. Customers often put items in their shopping carts when they are stimulated by some external incentive, but they still hesitate to buy. Products from Saks usually have high prices, so a majority of customers need a longer time to consider. After the impulse disappears, most customers think more rationally and decide to cancel the order.
Based on the previous analysis, Saks can prevent many customer returns and cancelations by taking practical actions. We hereby provide managerial recommendations based on each group’s characteristics.
• Regarding cluster 7, customers generate relatively low profits but their return rate is the highest. We need to decrease the return rate in order to encourage them to spend more at Saks. Firstly, we should improve the quality of the information on our website, for example by providing more detailed product descriptions. In this way, customers can better understand products before they purchase them.
Secondly, Saks should use better shipping packaging to protect products from being damaged by external forces. According to our research, customers care more about packaging when they pay high prices for products. So, refined packaging can not only convey a good impression of our company but also match customers’ expectations. Besides, this group’s profits may be relatively low because of their frequent returns, so if we can decrease their return rate, their profits should increase somewhat.
• Regarding cluster 8, this group generates relatively high profits, but it also has the highest cancel rate and has not purchased from us for a long time. Saks should give these customers more straightforward product information when they shop on our website so as to reduce the probability of misleading them. In addition, Saks should highlight “low stock” next to the quantity box to hint that the product may soon be unavailable. In this way, we can largely reduce the time they hesitate and motivate them to pay for the order immediately.
Besides, we can show customers the number of people who are viewing the product at the same time. Giving them the impression that the product is popular can motivate them to complete the transaction quickly. A lot of potential profit will be realized if this group’s cancel rate can be decreased. Since we have the contact information of these customers, Saks should send them greeting emails to show our care. By telling them about changes at our company and our new arrivals, we can trigger their interest again.
• Regarding cluster 4, the mean profit of this group is the second highest, but their cancel rate and return rate are relatively high among all groups. Firstly, we need to systematically train our salespersons and shopping guides so that they can provide more appropriate advice and information for our customers. Considering that this group has not placed an order with us for more than two years, it would be helpful to retarget them by sending promotional emails seasonally, especially for holidays. To prevent returns, we can also offer discount coupons for their next purchase if they agree to keep their products this time. If they still intend to return an item, we can offer a partial refund, such as 5% of the original price, to convince them to keep it.
• Regarding cluster 3, this group generates much higher profits than other groups, so these VIP customers’ returns and cancelations have a more serious negative effect on our profits. Saks should provide a personal shopping guide for each of them so that we can become aware of their problems and solve them promptly and correctly. Saks will gain large financial returns if we can decrease these VIP customers’ return rate to below 1%.
• Regarding clusters 2 and 5, the mean profits of these two groups are at a middle level, and their cancel rates and return rates are extremely low. Based on our previous analysis, these customers may be concerned that the return and cancelation process would be inconvenient, so they are unwilling to purchase high-priced items. For these customers, we need to guarantee that if they are not satisfied with our products, they have multiple channels to contact us, and we will deal with their problems within 24 hours. We believe that they will spend more money if Saks’ shopping process becomes more convenient.
• Regarding cluster 6, this group is the largest, accounting for about 30% of all customers. Its mean profit is relatively high. More importantly, the time since their last order is the shortest. We should therefore send them promotional emails or mailings more frequently to maintain their interest and convince them to keep purchasing from us, for example promotion coupons such as a 10% discount. We also want these customers to generate more profit for our company because they have the potential to do so. Thus, we can offer them information about high-end brands’ products through emails or mailings, in an effort to persuade them to buy higher-priced products.
5. Limitations and Future Research
Though we successfully identify 8 groups with diverse characteristics, we understand that our analysis has its limitations.
We lack some supporting data to inform decision making and reinforce our recommendations. Our study aims at identifying and investigating actionable customer groups with unique features. For example, for a group with a high return rate, lowering that rate should increase its profit. However, the current data can identify who the high return/cancel rate customers are, but does not enable us to investigate why they return and/or cancel orders. As discussed in the previous sections, the reasons leading to high return/cancel rates are diverse. Knowing the motivations behind returns and cancelations would enable us to improve and to avoid similar situations in the future. Unfortunately, we could not draw such insights from the current data; otherwise we would have been able to come up with more specific recommendations for different segments.
For future research, we have to extend the diversity of our data, especially by adding data that helps us learn the reasons for returns and cancelations. Saks has two major retail channels: online stores and offline stores. Analyzing the entire customer pool comprehensively requires an improved data collection mechanism. For the online channel, one suggestion for future data collection is to add a check box listing possible return/cancel reasons to the after-sale service page. The check box window appears when customers apply for a return or a cancelation so that our database can record and store what we need. By the same token, when customers return items in offline stores, our sales assistants should also ask for their return reasons and record them in the sales system.
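One possible shape for such a reason record is sketched below. It is only an illustration: the class names, field names, and the reason list are assumptions drawn from the return and cancelation reasons discussed in the Recommendations section, not an existing Saks system.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Reason(Enum):
    """Hypothetical check-box options, based on the reasons listed in this report."""
    PRODUCT_UNSATISFACTORY = "product did not satisfy the customer"
    DAMAGED_IN_SHIPPING = "product damaged during shipping"
    DESCRIPTION_MISMATCH = "description did not match the product"
    MISLEADING_ADVICE = "unclear or misleading shopping advice"
    POOR_POST_PURCHASE_SERVICE = "poor post-purchase service"
    ORDER_MISTAKE = "mistake when placing the order"
    BETTER_OFFER = "better offer found elsewhere"
    CHANGED_MIND = "changed mind after impulse"


@dataclass(frozen=True)
class ReasonRecord:
    customer_id: str
    order_id: str
    action: str   # "return" or "cancel"
    channel: str  # "online" or "store"
    reason: Reason


def tally_reasons(records, action):
    """Count how often each reason was given for one action."""
    return Counter(r.reason for r in records if r.action == action)


# Made-up sample records, for illustration only.
sample = [
    ReasonRecord("C001", "O100", "return", "online", Reason.DESCRIPTION_MISMATCH),
    ReasonRecord("C002", "O101", "return", "store", Reason.MISLEADING_ADVICE),
    ReasonRecord("C003", "O102", "cancel", "online", Reason.BETTER_OFFER),
    ReasonRecord("C001", "O103", "return", "online", Reason.DESCRIPTION_MISMATCH),
]
return_counts = tally_reasons(sample, "return")
cancel_counts = tally_reasons(sample, "cancel")
print(return_counts.most_common(1))
```

Storing the customer ID alongside the reason would let the reasons be joined back to the clusters, so that each segment’s dominant return or cancelation reason could be examined.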
The ultimate goal of analyzing customer information and consumption data is to obtain financial returns, such as increased profit. We note that there are various ways to improve profit. While this study investigates return rate and cancel rate, future research could focus on improving profit by increasing revenue.
6. Appendix
6-cluster solution
Cluster   Count   Last Order Date   Profit   Return Rate   Cancel Rate
1         627     32                308      18            11
2         1039    76                50       0             0
3         2302    18                85       0             0
4         254     43                66       100           0
5         1684    53                88       0.05          0.11
6         135     48                75       0             100
7-cluster solution
Cluster   Count   Last Order Date   Profit   Return Rate   Cancel Rate
1         223     28                611      4             6.6
2         1039    76                50       0             0
3         2302    18                85       0             0
4         404     34                141      26            14
5         254     43                66       100           0
6         1684    53                88       0.05          0.11
7         135     48                75       0             100
8-cluster solution
Cluster   Count   Last Order Date   Profit   Return Rate   Cancel Rate
1         223     28                611      4.06          6.6
2         1039    76                50       0             0
3         994     27                58       0             0
4         404     34                141      26            14
5         1308    12                106      0.01          0
6         254     43                65.91    100           0
7         1684    53                88       0.05          0.11
8         135     48                75       0             100
All values other than Count are cluster means (M = mean, C = count in the original layout).
Table 1. Hierarchical Cluster Analysis on 10% Calibration Sample
Cluster Number           1          2          3          4          5          6          7          8
Zscore (Profit)          -0.49722   1.41462    -0.56187   -0.26828   -1.16908   0.07006    0.50463    0.29141
Zscore (Return Rate)     -0.08598   -0.27678   -0.27678   0.94381    -0.27648   4.42492    -0.27425   -0.27678
Zscore (Cancel Rate)     0.23681    -0.1999    -0.1999    0.70025    -0.1999    -0.1999    -0.19288   6.4128
Zscore (Time Duration)   2.86974    -0.29206   -0.24894   0.22321    0.02253    -0.20202   -0.07664   -0.15222
Table 2. Initial Cluster Centers for Calibration Sample
Iteration 1 2 3 4 5 6 7 8
1 1.299 0.88 0.303 0.49 1.333 0.338 0.324 0.455
2 0.104 0.388 0.401 0.158 0.26 0.004 0.108 0.031
3 0.029 0.412 0.151 0.065 0.064 0.003 0.141 0
4 0.037 0.322 0.083 0.041 0.009 0.001 0.131 0.02
5 0.029 0.254 0.08 0.033 0.018 0 0.105 0
6 0.011 0.207 0.029 0.031 0.03 0 0.055 0.006
7 0.004 0.15 0.018 0.027 0.023 0 0.034 0
8 0.005 0.132 0.012 0.022 0.016 0 0.02 0.002
9 0.004 0.116 0.006 0.018 0.011 0.001 0.009 0.002
10 0.007 0.082 0.006 0.014 0.007 0.001 0.007 0.006
11 0.006 0.057 0.003 0.012 0.006 0 0.005 0
12 0.006 0.055 0.003 0.005 0.004 0 0.003 0
13 0.007 0.046 0.002 0.006 0.004 0 0.001 0
14 0.009 0.033 0.001 0.002 0.006 0 0 0
15 0.016 0.026 0.001 0.003 0.01 0 0.001 0
16 0.014 0.026 0.001 0.005 0.011 0 0.003 0
17 0.006 0.031 0.001 0.005 0.007 0 0.004 0
18 0.004 0.021 0.001 0.005 0.003 0 0.002 0
19 0.001 0.019 0.001 0.003 0.001 0 0.001 0
20 0.001 0.015 0 0.003 0.001 0.001 0 0
21 0 0.015 0 0.001 0.001 0 4.68E-05 0
22 0.001 0.017 0.001 0.003 8.96E-05 0 4.97E-05 0
23 0 0.014 0 0.002 0 0 0 0
24 0 0.007 0 0.002 0 0 0 0
25 0 0.002 0 0.003 0 0 5.31E-05 0
26 0 0 0 0.001 0 0 0 0
27 0 0 3.02E-05 0 0 0 3.86E-05 0
28 0 0 0 0 0 0 0 0
Table 3. Iteration History for Calibration Sample (change in cluster centers, by iteration and cluster)
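The iteration history tables in this appendix list, for every iteration, how far each cluster center moved; the run ends once every change reaches zero. The sketch below shows how such a history can be produced by a plain K-means loop. It is an illustration on made-up two-dimensional points, not the software or settings used for this report, and the function name `kmeans_with_history` is hypothetical.

```python
import math


def kmeans_with_history(points, centers, max_iter=50, tol=0.0):
    """Plain K-means that records how far each center moves in every iteration."""
    centers = [tuple(c) for c in centers]
    history = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center (Euclidean distance).
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for k, members in enumerate(groups):
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[k])  # an empty cluster keeps its old center
        # One row of the iteration history: change in each cluster center.
        changes = [math.dist(old, new) for old, new in zip(centers, new_centers)]
        history.append(changes)
        centers = new_centers
        if max(changes) <= tol:  # stop once no center moves more than tol
            break
    return centers, history


# Made-up toy data: two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
final_centers, history = kmeans_with_history(points, centers=[(0, 0), (10, 10)])
for i, row in enumerate(history, start=1):
    print(i, [round(change, 3) for change in row])
```

On real data the changes shrink gradually over many iterations, as in the tables here, rather than reaching zero after the first update.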
Cluster Number           1          2          3          4          5          6          7          8
Zscore (Profit)          -0.1925    4.58926    -0.0074    0.37726    -0.1368    -0.1915    -0.1269    0.03311
Zscore (Return Rate)     -0.2396    -0.1166    -0.2484    0.89573    -0.2634    4.33294    -0.2722    -0.2672
Zscore (Cancel Rate)     -0.1706    0.15895    -0.1903    0.88898    -0.1938    -0.1987    -0.1987    6.2202
Zscore (Time Duration)   1.60161    -0.6019    -1.1245    -0.2261    0.8055     0.00026    -0.1061    0.2118
Table 4. Final Cluster Centers for Calibration Sample
Cluster   Number of Cases
1         7057
2         1021
3         17777
4         2802
5         13597
6         2626
7         13894
8         1251
Valid     60025
Missing   54
Table 5. Number of Cases in each Cluster for Calibration Sample
Cluster Number           1          2          3          4          5          6          7          8
Zscore (Profit)          0.05353    0.9094     1.63409    0.00658    -0.50291   -1.07304   -0.23003   0.15992
Zscore (Return Rate)     -0.27678   -0.27678   -0.27678   4.42075    0.56365    -0.27678   0.00582    -0.27678
Zscore (Cancel Rate)     -0.1999    -0.18996   -0.1999    -0.1999    -0.19252   -0.1999    2.46949    6.4128
Zscore (Time Duration)   -0.21249   -0.09599   -0.24407   -0.20105   0.62656    -0.19915   0.30391    -0.02783
Table 6. Initial Cluster Centers for Validation Sample
Iteration 1 2 3 4 5 6 7 9
1 0.405 0.206 1.192 0.294 0.859 0.543 0.579 0.258
2 0.433 0.091 0.721 0.17 0.122 0.387 0.085 0.063
3 0.129 0.134 0.59 0.199 0.062 0.094 0.019 0.003
4 0.024 0.212 0.577 0.136 0.016 0.025 0.007 0.018
5 0.021 0.214 0.534 0.071 0.009 0.017 0.018 0
6 0.012 0.189 0.553 0.024 0.01 0.017 0.011 0
7 0.003 0.135 0.431 0.009 0.006 0.014 0.006 0
8 0.004 0.102 0.404 0.003 0.002 0.009 0.003 0
9 0.006 0.099 0.412 0.002 0.001 0.007 0.004 0
10 0.005 0.098 0.498 0.001 0.001 0.006 0.007 0
11 0.005 0.078 0.383 0.002 0.001 0.004 0.006 0
12 0.004 0.068 0.286 0.002 0 0.004 0.005 0
13 0.005 0.065 0.26 0.002 0.001 0.003 0.008 0
14 0.004 0.057 0.225 0 0 0.003 0.003 0
15 0.004 0.051 0.177 0.001 0.001 0.002 0.006 0
16 0.003 0.042 0.19 0 0 0.001 0.002 0
17 0.003 0.039 0.178 0 0 0.002 0 0
18 0.003 0.033 0.191 0 0 0.001 0 0
19 0.002 0.034 0.175 0.001 0 0.001 0.002 0
20 0.001 0.033 0.257 0 0.001 0.001 0.004 0
21 0.002 0.032 0.21 0 0.001 0 0.002 0
22 0.002 0.033 0.265 0.001 0 0.001 0.008 0
23 0.001 0.023 0.079 0.001 0 0.001 0.002 0
24 0.001 0.015 0.041 0 0 0.001 0.007 0
25 0.001 0.012 0.041 0 0 0.001 0 0
26 0.001 0.007 0 0 0 0 0 0
27 0 0.002 0 0 0 0 0 0
28 0 0.001 0 0 0 0 0 0
29 0 0 0 0 0 0 0 0
Table 7. Iteration History for Validation Sample (change in cluster centers, by iteration and cluster)
Cluster Number           1          2          3          4          5          6          7          9
Zscore (Profit_sum)      -0.08745   2.38355    10.34737   -0.13262   -0.17638   -0.15752   0.33883    -0.04898
Zscore (ReturnRate)      -0.23482   -0.08775   -0.13979   3.7263     -0.25802   -0.24841   -0.08657   -0.27678
Zscore (CancelRate)      -0.18893   -0.00429   0.49115    -0.19921   -0.19388   -0.19533   2.58208    6.40827
Zscore (Time Duration)   -1.10393   -0.62541   -0.55918   -0.03315   1.15412    0.01119    0.00878    0.20639
Table 8. Final Cluster Centers for Validation Sample
Cluster   Number of Cases
1         11681
2         1604
3         99
4         2395
5         12078
6         10425
7         810
8         0
9         780
Valid     39872
Missing   45
Table 9. Number of Cases in each Cluster for Validation
Cluster Number           1          2          3          4          5          6          7          8
Zscore (Time Duration)   3.39681    -0.066     -0.239     0.26632    0.67681    -0.2987    -0.2043    -0.1314
Zscore (Profit)          -0.6108    -1.0087    1.36108    -0.2214    0.48402    0.24312    0.04853    0.2496
Zscore (Return Rate)     -0.1109    -0.2767    -0.2768    0.99106    -0.2506    -0.2768    4.42492    -0.2768
Zscore (Cancel Rate)     0.19242    -0.1999    -0.1999    0.75803    -0.1211    -0.1999    -0.1999    6.4128
Table 10. Initial Cluster Centers for All Data
Iteration 1 2 3 4 5 6 7 8
1 1.463 0.594 1.082 0.447 0.729 0.614 0.325 0.414
2 0.308 0.183 0.411 0.139 0.103 0.195 0.009 0.035
3 0.082 0.109 0.349 0.068 0.075 0.077 0.001 0.011
4 0.042 0.063 0.285 0.051 0.047 0.03 0.001 0.008
5 0.021 0.03 0.233 0.04 0.023 0.017 0 0.005
6 0.011 0.021 0.184 0.03 0.017 0.01 0.001 0
7 0.006 0.015 0.151 0.02 0.013 0.008 0.001 0
8 0.004 0.006 0.121 0.01 0.004 0.007 0 0
9 0.001 0.002 0.109 0.01 0.002 0.005 0 0.008
10 0 0.001 0.087 0.012 0.001 0.003 0 0.003
11 0 0.001 0.051 0.009 0 0.002 0 0.002
12 0 0 0.048 0.009 0 0.002 0.001 0.001
13 0 0 0.036 0.006 0.001 0.001 0 0
14 0 0 0.028 0.006 0 0.001 0 0.002
15 0 0 0.022 0.005 0 0.001 0 0
16 0 5.54E-05 0.019 0.006 0 0 0 0
17 0.001 0 0.017 0.005 0 0 0 0
18 0 9.84E-05 0.018 0.004 9.04E-05 0.001 0 0
19 0 0 0.014 0.002 7.18E-05 0.001 0 0
20 0 7.04E-05 0.011 0.002 0 0 0 0
21 0 0 0.005 0.001 0 0 0 0
22 0 3.14E-05 0.01 0.002 0 0 0 0
23 0 0 0.013 0.002 0 0 0 0
24 0 0 0.01 0.003 0 0 0 0
25 0 0 0.004 0.003 6.85E-05 0 0 0
26 0 0 0.003 0.001 6.85E-05 8.99E-05 0 0
27 0 0 0.004 0.001 6.85E-05 9.97E-05 0 0
28 0 0 0.004 0.001 9.62E-05 0 0 0
29 0 0 0.005 0.002 9.80E-05 0 0 0
30 0 4.69E-05 0.004 0.002 0 7.14E-05 0 0
31 0 4.68E-05 0.006 0.002 0 6.64E-05 0 0
32 0 0 0.004 0.001 0 9.27E-05 0 0
33 0 0 0.004 0 0 0 0 0
34 0 0 0.003 0 0 8.59E-05 0 0
35 0 0 0 0 0 0 0 0
Table 11. Iteration History for All Data (change in cluster centers, by iteration and cluster)
Cluster Number           1          2          3          4          5          6          7          8
Zscore (Profit)          -0.1938    -0.1337    4.57319    0.39816    -0.1373    -0.0092    -0.1946    0.00853
Zscore (Return Rate)     -0.2394    -0.2717    -0.1138    0.88815    -0.2632    -0.2489    4.33483    -0.2688
Zscore (Cancel Rate)     -0.1736    -0.1987    0.14937    0.9007     -0.1939    -0.1904    -0.1992    6.2413
Zscore (Time Duration)   1.60055    -0.1037    -0.6051    -0.2386    0.80404    -1.1248    0.00654    0.20816
Table 12. Final Cluster Centers for All Data
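The final centers in Table 12 are z-scores, while Table 3.4 reports the same clusters in original units. Since z = (x - mean) / std, the two should be related by a straight line, x = mean + std * z. The sketch below checks this for time duration with an ordinary least-squares fit. It is a consistency check only, assuming the cluster numbering is the same in both tables; the fitted mean and standard deviation are implied estimates, not values reported in this study.

```python
# Mean months since last order per cluster (Table 3.4), clusters 1..8.
months = [81, 38, 26, 35, 61, 13, 41, 46]
# Final z-score centers for time duration (Table 12), clusters 1..8.
z_time = [1.60055, -0.1037, -0.6051, -0.2386, 0.80404, -1.1248, 0.00654, 0.20816]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# The intercept estimates the overall mean, the slope the overall standard deviation.
implied_mean, implied_std = fit_line(z_time, months)
residuals = [m - (implied_mean + implied_std * z) for m, z in zip(months, z_time)]
max_residual = max(abs(r) for r in residuals)

print(f"implied mean: {implied_mean:.1f} months, implied std: {implied_std:.1f} months")
print(f"largest residual: {max_residual:.2f} months")
```

Residuals below half a month are what one would expect if the months in Table 3.4 are simply rounded versions of the values behind Table 12.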
Cluster   Number of Cases
1         11778
2         23015
3         1717
4         4726
5         22793
6         29415
7         4371
8         2082
Valid     99897
Missing   99
Table 13. Number of Cases in each Cluster for All Data

Saks Fifth Avenue

  • 1.
    Saks Fifth AvenueCustomer Behavior Report ——Based on Data Driven Analysis Group 8: Linhan zhang, Zhongyuan Lian, Huiruo Zhang, Yitian Chen
  • 2.
    Executive Summary Saks FifthAvenue is a luxury department chain store which sells high-end brands both online and offline. The objective of this research is to help Saks Fifth Avenue (hereafter Saks) decrease customer’s return rate and cancel rate so as to improve customer’s profitability and satisfaction. Also, we want to regain our old customers as well as increase their loyalty. The original data in our research comes from the Customer Relationship Management Database in Saks. This database records a wide range of historical sale information based on every single order line, including over 137,000 orders from 100,000 customers. Each order line records customer information in terms of customer ID number, and ZIP Code, and transaction-related items such as order date, shipping date, revenue, cost etc. Since we intend to segment our customers based on their return record, order cancel record, total profits, and the time of their most recent order, we aggregate all records into a new data file with individual level. Then, we choose four key factors as our variables which are profits, return rate, cancel rate and time duration since last order date. We use K-means cluster analysis as our major segmentation method. We divide whole data into calibration set and validation set, and conduct K-means cluster analysis on each of them to make sure that we will not miss any meaningful group of customers. Furthermore, different methods are conducted to explore our research several times. After outcomes of K-means cluster analysis match our expectation, we summarize and interpret our key findings. There are 8 clusters which have meaningful features respectively. Among them there are three groups that interest us most. The first group makes up about 30% of all customers which generate high profits, and their
  • 3.
    return/cancel rate areextremely low. They have shortest time duration since last order date. Obviously, they are the core customers for our company and we should take an action to retain these customers in order to generate more profits. For example, we could offer them better services and high quality products to increase customer loyalty and satisfaction. The second one is a group of customers who generate relatively high profits to our company, while their cancel rate are extremely high. These customers were able to generate a huge profit for us. However, they are likely to cancel their orders due to some reasons. What makes things worse is that they will cause additional costs for our company since we need to provide special services when they return items. For these kinds of customers, we need to figure out their true needs and the reasons of high cancel rate. They have huge financial potential if we can increase their customer satisfaction. They could turn into the first group of customers and generate a huge profit to the company. The third group includes customers who have relatively lower profits, but their return rate is very high. These customers are unsatisfied with our products or services so they keep returning their items back. This group is a huge financial burden for Saks, so we have to decrease their return rate by figuring out the reasons and taking any actions to increase their satisfaction. Ultimately, we analyze the major reasons, which cause high rate/cancel rate. Based on our previous analysis, we provide different managerial recommendations for each groups regarding their significance and characteristics. These recommendations will serve to decrease customers’ return rate and cancel rate and eventually increase profits for Saks in the future.
  • 4.
    Table of Contents 1.Introduction .................................................................................................1 2. Background .................................................................................................2 3. Methodology and Analysis............................................................................4 Definition of Clustering Analysis...................................................................5 Data Obtained and Used................................................................................7 Variables selection and Explanation ...............................................................7 Data Preparation ...........................................................................................9 Calibration and Validation...........................................................................11 Clustering Settings......................................................................................11 Measure Interval: Euclidean Distance....................................................12 Cluster Method: Ward’s Method............................................................12 Standardization: Z scores......................................................................13 Specific Operations.....................................................................................13 Findings from Clustering Results .................................................................18 4. Conclusion & Recommendations.................................................................20 Recommendations.......................................................................................22 5. Limitations and Future Research .................................................................28 6. Appendix...................................................................................................30
  • 5.
    1 1. Introduction Imagine youare a store owner selling limited-edition Prada’s purse which normally more than $5000. Which kind of customer is more valuable for you? A customer who spends average amount of money but never returns or cancels the order? Or a customer who spends huge amount of money but returns or cancels most their orders at the end? This is a significant but tricky question for every company, especially for Saks Fifth Avenue who has higher unit price. It is said that customers are the most valuable equity for companies. As a luxury department store, Saks sells products that are much more expensive, which means every single purchase means a lot to the company in financial level. As a result, high return and cancel rate are more lethal for Saks than regular department stores, for example, Macy’s. At the meantime, customer satisfaction and loyalty that directly decide the company’s fate are also extremely significant for Saks. What is more, it is also important for us to know how often a customer comes back and purchase. According to our background research, the major managerial issue of Saks is to increase profit by reducing return/cancel rate as well as regaining customers who have not purchased more than one year. Through a series of analysis and comparison, we segment whole customers into 8 groups based on profit that they generate, the time duration since their last order date, return rate and cancel rate. Each of group has their own meaningful features. Some of them generate the highest profits while have not purchased for more than two years. Some of them generate high profit while also have high return/cancel rate. We have discussed each cluster in detail in the following report. We will elaborate each group’s features and provide managerial recommendations.
2. Background
Saks Fifth Avenue is a luxury department store chain that was founded in 1867. With such a long history, Saks has built up its own customer pool with a large number of loyal customers. Most customers shop at Saks for its good service and latest fashion. Saks carries a number of world-famous luxury brands, including Gucci, Prada and FENDI. The staff at Saks are very professional and usually offer customers thoughtful advice during the purchase process.
However, based on our research, Saks cannot generate as much revenue as it did a few years ago. Competition between department stores is becoming more and more fierce. Saks’s main competitors, such as Bloomingdale’s and Neiman Marcus, have put considerable pressure on Saks by using price-off promotions. Even mid-range department stores such as Macy’s and online stores such as amazon.com are competing with Saks. More competition means customers have more choices. For Saks, however, it leads to high return and cancel rates, because once customers find a lower price on amazon.com, the first action they take is to cancel their orders on our website. Moreover, high service costs make it more difficult for Saks to generate considerable profits. As a result, Saks faces more challenges than it ever did and needs to find a way to solve its problems and keep growing.
In recent years, Saks has introduced its online store and app to enlarge its market share and attract more young customers. Online shopping is an easier and cheaper way to purchase items for both customers and companies. However, it raises several issues as well. Since Saks sells a lot of apparel and makeup, customers cannot try products on before purchasing on the website. Once customers find that a product does not match their expectations, they return it. Therefore, online shopping has increased the return/cancel rate, which leads the company to incur additional costs. Our team will help Saks find solutions to these issues by reducing the return/cancel rate as well as increasing customer satisfaction.
3. Methodology and Analysis
The nature of the retailing industry makes a deep understanding of customers highly important. Saks Fifth Avenue specializes in selling various high-end brands, including Gucci, Burberry and Prada. In our effort to increase the company’s profit, we notice that reducing the return rate and cancel rate could play a crucial role. Once a customer returns a product or cancels an order, we lose not only the potential profit but also the effort we previously invested in acquiring this customer and in attracting her to visit our locations. Therefore, we devote most of our attention to investigating returns and cancellations so as to obtain actionable insights.
We conduct cluster analysis to segment our historic customers in terms of their return record, cancelled order record, and total profit generated through their accumulated purchases at Saks Fifth Avenue, as well as the time of their most recent order. Through cluster analysis, we discover separate groups that differ from each other in these aspects. We then compare them, identify their differences, and evaluate the possible reasons for these differences. After understanding the characteristics and implications of these groups, we are able to come up with corresponding recommendations that can improve their future performance.
The objective of this study is to identify different customer groups in terms of the above four aspects and to screen out the contact information of the customers in each group for direct marketing, eventually decreasing the return rate and cancel rate, increasing customer satisfaction and profit, and regaining old customers.
Definition of Clustering Analysis
Originated in anthropology by Driver and Kroeber in 1932 and introduced to psychology by Zubin in 1938 and Robert Tryon in 1939, cluster analysis is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). [Reference]
The outcome of cluster analysis is a set of segments created from a set of individual samples. Samples in the same segment share more commonalities with each other than they do with samples from other segments. In business, cluster analysis is a popular and frequently used method for market segmentation, which is an important part of marketing planning.
Our research adopts two types of clustering methods: Hierarchical Clustering and K-means Clustering. Hierarchical Clustering is useful when the sample size is relatively small. Different choices of clustering method and measure interval lead to different clustering results. Among them we select the one that meets our expectations in terms of segment size, segment characteristics and between-segment differences. Hierarchical clustering provides exploratory insight for the subsequent K-means clustering analysis, and the two together are capable of segmenting a large sample. An effective and efficient cluster analysis on a large data set requires the combination of these two clustering methods. The flowchart (see Fig. 3.1) shows the procedure of our cluster analysis.
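As a rough illustration of the two-stage idea described above, the following Python sketch derives initial seeds from a Ward-style hierarchical clustering of a small sample and then passes them to K-means on the full data. It is a minimal sketch under stated assumptions, not the procedure run in the statistics package used for this study: the function names (ward_seeds, kmeans) and the toy points are hypothetical, and the inputs are assumed to be already standardized.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_seeds(sample, k):
    """Agglomerate a small sample down to k clusters and return their centroids."""
    clusters = [[p] for p in sample]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        del clusters[j]          # j > i, so index i is still valid
        clusters[i] = merged
    return [centroid(c) for c in clusters]

def kmeans(points, seeds, max_iter=80):
    """Plain K-means started from the given seeds; stops when labels no longer change."""
    centers = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = centroid(members)
    return labels, centers

# Hypothetical toy data: two well-separated groups of standardized points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
small_sample = [points[0], points[1], points[3], points[5]]
seeds = ward_seeds(small_sample, k=2)
labels, centers = kmeans(points, seeds)
```

The quadratic pair search in ward_seeds is only practical for a small sample, which is the reason for running the hierarchical step on a subset and leaving the full data set to K-means.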
Data Obtained and Used
The data analyzed in our research comes from the Customer Relationship Management Database at Saks. This database records a wide range of historical sales information at the level of the individual order line. Each order line records customer information, including customer ID number and ZIP Code, and key items recorded during a transaction, such as order date, shipping date, price and cost. Different order lines may share the same order number, showing that they belong to the same order. By the same token, different order numbers may share the same customer number, meaning that the customer placed these orders at different times.
The data set is large enough to produce representative insights. We select the records from 12/16/2004 to 09/17/2012, covering more than 226,000 order line records. These records reflect over 137,000 orders from 100,000 customers.

Variables Selection and Explanation
After studying the descriptions of the variables in a record line, we determine four clustering variables: Total Profit, Return Rate, Cancel Rate, and Time Duration since Last Order Date. These variables are not included in the original data set but can be calculated from existing variables. We use other variables, including Customer Number and ZIP Code, to describe the features and attributes of the resulting segments. Each of the clustering variables has its own meaning and implication for us.

Total Profit
Profit is the most important indicator of a customer’s value. The higher the profit a customer generates, the more imperative it is to retain him or her. Generally, a company has finite resources available for customer relationship management. If it invests equal resources in every customer regardless of value, it will unavoidably end up with highly profitable customers who are not served and retained well, and with less profitable customers who occupy many resources without creating enough profit in return. Therefore, since our final objective is to come up with actionable recommendations for different customer groups, Total Profit tells us which group requires more attention and hence more resources.

Return Rate
Return Rate conveys important information about a customer’s consumption characteristics. A high return rate has many possible causes. For example, an unclear or misleading product description could result in complaints after customers receive their packages, which often leads to returns. A high return rate could also be attributed to a customer’s particular taste. Whatever its cause, the higher the return rate, the more profit we lose. While increasing revenue is one path to greater profit, reducing unnecessary losses is also an effective one.
Saks is dedicated to the high-end niche market. High-end brands generally have smaller sales volumes than lower-tier brands, but they invest more to support their high-end brand positioning and marketing. Saks’s well-qualified salespeople, high rents and high advertising budget imply a high operational cost. Once a return or cancellation happens, much of that cost is wasted, even though most products can be resold. This is another reason we attach great importance to the return rate and cancel rate. With these considerations, we select return rate as one of our clustering variables.

Cancel Rate
Cancel Rate refers to the ratio of a customer’s cancelled order lines to total order lines. Like return rate, it has a strong relation with profit, but in a different way. While a return is a customer’s decision after receiving the ordered products, a cancellation means the customer changes her mind before that. We assume that a return, relatively speaking, comes down to dissatisfaction with our products, while a cancellation implies an unsatisfying purchasing experience such as chaotic shopping guidance or poor customer service. As with returns, a decreased cancel rate brings a corresponding increase in profit. Therefore, we put cancel rate in our variable list.

Time Duration since Last Order Date
Time duration since last order date is the period between the date a customer placed his or her last order and the current date. Auditing the data set, we find a large number of customers who have not shopped at Saks for a long time. The longer the duration, the higher the possibility that the customer has already defected. This variable matters for our decision making because marketing strategies and plans can be totally different for new customers and old ones, and so can the resulting marketing effects. Relatively new customers are easier to contact and attract because their contact information is up to date and because they have a stronger connection with our brand and products. Customers who have not come back for more than three years, on the other hand, are of lesser value and priority for the opposite reasons. Therefore, differentiating new and old customers through clustering is meaningful.

Data Preparation
The records in the data set are organized by individual order line. Since the objective of our cluster analysis is to obtain information on an individual basis, we aggregate all records into a format with customer number as the key value. We then audit the aggregated data set and determine the calculations needed to transform existing variables into the four clustering variables. Table 3.1 offers a comprehensive explanation of the original variables and the ones we compute, in the order in which they are used throughout this analysis.
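The aggregation described in Data Preparation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (customer_number, revenue, cost, quantity, return_quantity, cancel_quantity, months_since_order) are hypothetical stand-ins for the database columns, missing quantities are treated as zero as in the recoding step reported later, and a customer with zero total quantity is given no rate at all.

```python
from collections import defaultdict

def aggregate_by_customer(order_lines):
    """Roll order lines up to one record per customer (field names are hypothetical)."""
    totals = defaultdict(lambda: {
        "total_profit": 0.0,
        "total_quantity": 0,
        "total_return": 0,
        "total_cancel": 0,
        "months_since_last_order": None,
    })
    for line in order_lines:
        rec = totals[line["customer_number"]]
        # Profit of one order line: Revenue - Cost.
        rec["total_profit"] += line["revenue"] - line["cost"]
        # Missing quantities count as zero.
        rec["total_quantity"] += line.get("quantity") or 0
        rec["total_return"] += line.get("return_quantity") or 0
        rec["total_cancel"] += line.get("cancel_quantity") or 0
        # The most recent order has the smallest number of months elapsed.
        months = line["months_since_order"]
        if rec["months_since_last_order"] is None or months < rec["months_since_last_order"]:
            rec["months_since_last_order"] = months
    for rec in totals.values():
        qty = rec["total_quantity"]
        # Rates are ratios here; multiply by 100 to express them in percent.
        rec["return_rate"] = rec["total_return"] / qty if qty else None
        rec["cancel_rate"] = rec["total_cancel"] / qty if qty else None
    return dict(totals)

# Hypothetical order lines for two customers.
example_lines = [
    {"customer_number": "A", "revenue": 100.0, "cost": 60.0, "quantity": 2,
     "return_quantity": 1, "cancel_quantity": None, "months_since_order": 30},
    {"customer_number": "A", "revenue": 50.0, "cost": 20.0, "quantity": 2,
     "return_quantity": None, "cancel_quantity": 1, "months_since_order": 12},
    {"customer_number": "B", "revenue": 10.0, "cost": 4.0, "quantity": 1,
     "months_since_order": 40},
]
customers = aggregate_by_customer(example_lines)
```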
Table 3.1 Overview of the Variables Used
(Variable: Explanation)

Original variables
   Customer Number: A unique 11-digit numeric string identifying a customer. Each customer has only one customer number.
   ZIP Code: 5-digit ZIP Code of the location where a customer places an order.
   Order Number: A unique 9-digit numeric string referring to a specific order. One customer may have placed more than one order, each with a different order number.
   Order Line: Line number for each unique product in an order.
   Order Date: Date an order was placed.
   Quantity: Quantity of a product in an order.
   Revenue: The total price of an order line.
   Cost: The total cost of an order line.
   Return Quantity: The quantity of returned product.

Computed variables before aggregation
   Profit: The profit of a single order line. Calculation: Profit = Revenue - Cost
   Time Duration since Order Date (Month): The time between today and the day the order was placed. Calculation: Time Duration since Order Date = Date of Today - Order Date

Aggregated variables (aggregated by Customer Number)
   Last ZIP Code: The ZIP Code where a customer placed his/her last order.
   Total Profit: The summed Profit of all of a customer’s orders.
   Time Duration since Last Order Date (Month): The time between today and the day the customer’s last order was placed.
   Total Quantity: The total quantity of products a customer has ever purchased, including returned and cancelled quantity.
   Total Return Quantity: The total quantity of products a customer has ever returned.
   Total Cancel Quantity: The total quantity of products a customer has ever cancelled.

Computed variables after aggregation
   Return Rate: The return rate of a customer’s historical purchases. Calculation: Return Rate = Total Return Quantity / Total Quantity
   Cancel Rate: The cancel rate of a customer’s historical purchases. Calculation: Cancel Rate = Total Cancel Quantity / Total Quantity

Calibration and Validation
Once the four clustering variables are ready, we divide the data set into two subsets: a Calibration sample set (60% of all records) and a Validation sample set (40% of all records). The Calibration sample set is used to generate a promising division into segments, while the Validation sample set is used to verify whether that division is appropriate and representative. Conducting clustering on both sets ensures that no meaningful segments are missed. If the two results agree, a clustering on the entire data set is conducted to further verify that division. This verification mechanism helps to secure the accuracy and representativeness of our analysis.

Clustering Settings
Randomly selecting 10% of the samples from the calibration set, we formulate the approach to hierarchical clustering. There are three crucial decisions: selection of the cluster method, selection of the measure interval, and whether or not to standardize the clustering variables.
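The 60/40 division and the 10% subsample drawn from the calibration set can be illustrated with the short Python sketch below. It is an assumption-laden illustration rather than the tool used in the study: the function names and the fixed seed are hypothetical, and the split here is exact, whereas a statistics package may select approximately 60% of cases.

```python
import random

def split_calibration_validation(customer_ids, calibration_share=0.6, seed=0):
    """Shuffle the customers and cut them into calibration and validation sets."""
    rng = random.Random(seed)
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * calibration_share)
    return shuffled[:cut], shuffled[cut:]

def draw_subsample(ids, share=0.1, seed=0):
    """Draw a random subset (without replacement) for hierarchical clustering."""
    rng = random.Random(seed)
    ids = list(ids)
    return rng.sample(ids, round(len(ids) * share))

# Hypothetical customer identifiers.
all_ids = list(range(1000))
calibration, validation = split_calibration_validation(all_ids)
small_subset = draw_subsample(calibration)
```

Fixing the seed makes the split reproducible, which is useful when the clustering is rerun on different random subsets and the results have to be compared.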
Measure Interval: Euclidean Distance
The measure interval decides how the distance between two samples is calculated. Two popular metrics are Squared Euclidean Distance and Euclidean Distance. If the distance between two given samples is X under Euclidean Distance, it becomes X² under Squared Euclidean Distance. Squared Euclidean Distance amplifies the numeric value of a given distance, so the variance between samples is enlarged. An enlarged variance pushes two samples apart. However, we prefer two similar samples to converge rather than drift apart. Therefore, we select Euclidean Distance.

Cluster Method: Ward’s Method
The cluster method decides the criterion used to judge the distance between two clusters. Two alternative methods are Furthest Neighbor and Ward’s method. The Furthest Neighbor method takes the longest distance between any two members of the two clusters as the distance between the clusters. This method is effective at identifying small sample groups that are conspicuously different from the others. Correspondingly, its outcome tends to have the majority of samples converging in a few large groups, with the remaining minority assigned to much smaller groups. Ward’s method, on the other hand, uses the sum of squared errors as the measure of distance and thus tends to produce groups of similar size.
Our analysis aims at identifying groups with different characteristics with respect to the four clustering variables. The identified groups should be large enough for actionable marketing campaigns, which means that some of the segments identified by the Furthest Neighbor method might be too small to meet our expectations. Ward’s method, by contrast, provides groups with a relatively even sample distribution and is our choice.
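To make the two choices above concrete, the following Python sketch computes both distance measures for a pair of samples and both between-cluster criteria for a pair of small clusters. It is a minimal illustration, not the implementation used in the study; the function names and the example points are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance X between two samples."""
    return math.dist(a, b)

def squared_euclidean(a, b):
    """Squared distance X**2 between two samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def furthest_neighbor(c1, c2):
    """Furthest Neighbor: the longest distance between any member of c1 and any member of c2."""
    return max(euclidean(p, q) for p in c1 for q in c2)

def centroid(points):
    """Coordinate-wise mean of a cluster."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def ward_increase(c1, c2):
    """Ward's criterion: growth in the sum of squared errors caused by merging c1 and c2."""
    gap = squared_euclidean(centroid(c1), centroid(c2))
    return len(c1) * len(c2) / (len(c1) + len(c2)) * gap

# Hypothetical samples and clusters.
x_distance = euclidean((0, 0), (3, 4))              # X
x_squared = squared_euclidean((0, 0), (3, 4))       # X squared
far = furthest_neighbor([(0, 0), (1, 0)], [(4, 0), (6, 0)])
ward = ward_increase([(0, 0)], [(2, 0)])
```

Because the Ward criterion is weighted by the sizes of the two clusters, merging two large clusters is penalized more than absorbing a small one, which is consistent with the tendency toward groups of similar size noted above.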
Standardization: Z Scores
Standardization is required when clustering variables would otherwise carry different weights in the result. It transforms the variables into comparable forms so that they have equal influence and significance. In our study, the four variables have obviously different value ranges and variances. To guarantee an accurate analysis, we standardize them.

Specific Operations
First, we create several new variables needed for further analysis. Since a customer’s profit is the key factor in our analysis, we use the following equation to calculate a new variable named “Profit”:
Profit = Revenue - Cost [1]
Because we want to know how many months have passed since each customer’s last order date, we use the “Date and Time Wizard” to create a new variable named “Time Duration”.
Second, we use “Recode into Same Variables” to replace the missing values in cancel quantity, return quantity and quantity with zero.
Third, we aggregate the original data file into a new data file. The break variable is “Customer Number”, and the aggregated variables are “zip code(last)”, “profit(sum)”, “last order data(minimum)”, “cancel quantity(sum)”, “return quantity(sum)” and “quantity(sum)”.
Finally, because we want to know each customer’s return rate and cancel rate, we create two new variables named “Return Rate” and “Cancel Rate” using the following formulas:
Return rate = return quantity / quantity
Cancel rate = cancel quantity / quantity
[1] Strictly speaking, the precise total profit of a customer should be calculated as Profit = (Revenue - Cost) * [1 - (Return Quantity + Cancel Quantity) / Quantity]. However, this calculation loses the ability to show a customer’s potential consumption power, since the profit from returned and cancelled items is excluded. In our study we want to investigate customers’ true consumption power, and therefore we use Profit = Revenue - Cost.

Standardize Decision Variables
We calculate Z scores for all decision variables, namely “Profit”, “Time Duration”, “Return Rate” and “Cancel Rate”, and save them as new variables.

Split the Sample
We use “Select Cases” to split the whole data set into a calibration sample, which is about 60% of all data, and a validation sample, which is about 40% of all data.

Hierarchical Clustering
First, we choose 10% of the calibration sample as our small subset. Second, we run Hierarchical Cluster Analysis to determine the number of clusters, using Ward’s method, Euclidean distance and Z scores. According to the marked line, we choose 6 to 8 as the range of solutions. Based on a comparison of the Custom Tables, we choose 8 as the number of clusters because it gives the clearest and most meaningful managerial insights. The detailed Custom Tables are attached in the Appendix. Third, we conduct Hierarchical Cluster Analysis to identify the cluster centers. Finally, we save the outcome in a new data file as initial seeds (see Table 1 in the Appendix).

K-Means Cluster Analysis
We use the results of Hierarchical Cluster Analysis as initial seeds and conduct K-means Cluster Analysis on the Calibration Sample. We choose 80 as the maximum number of iterations and save cluster membership. There are 60,025 valid cases and 54 missing cases. The Initial Cluster Centers (see Table 2 in the Appendix), Iteration History (see Table 3 in the Appendix), Final Cluster Centers (see Table 4 in the Appendix) and Number of Cases in each Cluster (see Table 5 in the Appendix) are attached in the Appendix.

Exploring Results
To make sure we obtain the best result, we use different random subsets and different methods to conduct Hierarchical Cluster Analysis. When we use Furthest Neighbor and Squared Euclidean distance, the outcome is clearly inappropriate because most data are concentrated in 2 clusters, while the other 6 clusters have extremely small and meaningless counts. More importantly, we cannot find the group we are looking for, which has a high return rate and a high cancel rate. We then save these outcomes as initial seeds and run K-means Cluster Analysis with the different initial seeds. As expected, the results for the calibration sample, the validation sample and all data do not match one another in the major clusters.

Finalize Calibration Results
Based on our previous analysis, we finalize our decision by running K-means Cluster Analysis on the Calibration sample. The calibration results are reported in Table 3.2.
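The standardization step and the alternative profit formula given in the footnote to the Profit equation can be written out as a short Python sketch. It rests on assumptions that are not stated in the report: the function names are hypothetical, and the Z scores use the sample standard deviation (n - 1 in the denominator), which may differ from the convention of the software actually used.

```python
import statistics

def z_scores(values):
    """Standardize a variable: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample (n - 1) standard deviation
    return [(v - mean) / sd for v in values]

def profit(revenue, cost):
    """Profit as used in the study: Revenue - Cost."""
    return revenue - cost

def adjusted_profit(revenue, cost, return_qty, cancel_qty, quantity):
    """Footnote variant: profit scaled down by the returned and cancelled share."""
    return (revenue - cost) * (1 - (return_qty + cancel_qty) / quantity)

# Hypothetical values for illustration.
standardized = z_scores([1.0, 2.0, 3.0])
gross = profit(100.0, 60.0)
net = adjusted_profit(100.0, 60.0, return_qty=1, cancel_qty=1, quantity=4)
```

After standardization each variable has mean 0 and standard deviation 1, so a difference of one unit in profit counts as much in the Euclidean distance as a difference of one unit in return rate.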
Table 3.2 Calibration Clustering Results (cluster means and counts)
Cluster   Profit ($)   Return Rate (%)   Cancel Rate (%)   Time Duration (months)   Count
1         67.6         0.79              0.44              81                       7,057
2         916.98       3.41              5.43              26                       1,021
3         100.48       0.6               0.15              13                       17,777
4         168.81       24.94             16.47             35                       2,802
5         77.49        0.28              0.09              61                       13,597
6         67.79        98.04             0.02              41                       2,626
7         79.26        0.1               0.02              38                       13,894
8         107.68       0.2               97.09             46                       1,251

Validation Sample
First, we conduct Hierarchical Cluster Analysis to identify the cluster centers, again using Ward’s method and Euclidean distance. Then we run K-means Cluster Analysis on the Validation sample using the new initial seeds. There are 39,872 valid cases and 45 missing cases. The Initial Cluster Centers (see Table 6 in the Appendix), Iteration History (see Table 7 in the Appendix), Final Cluster Centers (see Table 8 in the Appendix) and Number of Cases in each Cluster (see Table 9 in the Appendix) are attached in the Appendix. The validation results are reported in Table 3.3.
Table 3.3 Validation Clustering Results (cluster means and counts)
Cluster   Profit ($)   Return Rate (%)   Cancel Rate (%)   Time Duration (months)   Count
1         86.26        0.89              0.17              13                       11,681
2         525.19       4.02              2.96              25                       1,604
3         1939         2.91              10.45             27                       99
4         78.24        85.14             0.01              40                       2,395
5         70.47        0.4               0.09              69                       12,078
6         73.82        0.6               0.07              41                       10,425
7         161.98       4.05              42.07             41                       810
8         93.1         0                 99.93             46                       780

Compare and Finalize
We compare the calibration results and the validation results, and they are consistent. In particular, the clusters that matter most for management, those with a high return rate or a high cancel rate, are consistent. Therefore, we conduct Hierarchical Cluster Analysis to identify the cluster centers, again using Ward’s method and Euclidean distance. Then we run K-means Cluster Analysis on all data using the new initial seeds. There are 99,897 valid cases and 99 missing cases. The Initial Cluster Centers (see Table 10 in the Appendix), Iteration History (see Table 11 in the Appendix), Final Cluster Centers (see Table 12 in the Appendix) and Number of Cases in each Cluster (see Table 13 in the Appendix) are attached in the Appendix. The all-data results are reported in Table 3.4.
Table 3.4 All Data Clustering Results (cluster means and counts)
Cluster   Profit ($)   Return Rate (%)   Cancel Rate (%)   Time Duration (months)   Count
1         67.38        0.8               0.4               81                       11,778
2         78.05        0.11              0.02              38                       23,015
3         914.13       3.47              5.28              26                       1,717
4         172.52       24.78             16.64             35                       4,726
5         77.4         0.29              0.09              61                       22,793
6         100.16       0.59              0.14              13                       29,415
7         67.23        98.08             0.01              41                       4,371
8         103.31       0.17              97.41             46                       2,082

Findings from Clustering Results
• Cluster 3 is one of the key customer clusters because this group contributes the highest mean profit, $914.13. Its return rate of 3.47% and cancel rate of 5.28% are relatively low. The average time duration since last order date is 26 months, the second shortest among all clusters.
• Cluster 6 is also extremely important for us because its profit of $100.16 is relatively high. Its time since last order is the shortest among all clusters, and its return rate and cancel rate are both under 1%. Moreover, this cluster has the largest number of customers and makes up nearly 30% of the whole data sample.
• Cluster 4 is one of the clusters our team wants to highlight. The customers in this group have the second highest profit, $172.52, and the third shortest time duration since last order date, 35 months. However, their return rate and cancel rate are 24.78% and 16.64% respectively. We can obtain a large financial return if we can lower their return rate and cancel rate.
• Cluster 7 is another group we want to analyze in depth. The profit of this cluster is $67.23, its cancel rate is 0.01%, and its time since last order is 41 months. Surprisingly, its return rate is 98.08%. This means we have spent a large amount of money serving this group of customers, and they have a large negative influence on our company’s financial status. We can substantially cut the company’s costs by decreasing this cluster’s return rate.
• Cluster 8 is surprising as well. Its profit is $103.31, the third highest among all clusters. The return rate of this group is 0.17%, and the time duration since last order date is 46 months. However, the cancel rate of this cluster is 97.41%. From our perspective, this group of customers has large profit potential if we can optimize our purchase process to lower the cancel rate.
• Cluster 2 and cluster 5 are also significant for our analysis for several reasons. These two groups have large numbers of customers. Although the profit of each group is under $80, their return rates and cancel rates are extremely low, all under 0.3%. At the same time, their time duration since last order date is more than 3 years, so we must figure out how to “arouse” these old customers.
• Cluster 1 is relatively unimportant for this analysis. Its profit is only $67.38, although its return rate and cancel rate are low. These customers have not bought any product from our store for more than 6 years, so it is very hard to re-target this group.
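The cluster sizes and profit contributions referred to in these findings can be recomputed from the means and counts in Table 3.4. The Python sketch below is a simple check rather than part of the original analysis; the variable names are hypothetical, and total profit per cluster is assumed to equal mean profit multiplied by the number of customers.

```python
# (mean profit in dollars, number of customers) per cluster, copied from Table 3.4.
table_3_4 = {
    1: (67.38, 11778),
    2: (78.05, 23015),
    3: (914.13, 1717),
    4: (172.52, 4726),
    5: (77.4, 22793),
    6: (100.16, 29415),
    7: (67.23, 4371),
    8: (103.31, 2082),
}

total_customers = sum(count for _, count in table_3_4.values())
total_profit = sum(mean * count for mean, count in table_3_4.values())

# Share of customers and share of total profit, in percent, for each cluster.
customer_share = {c: 100 * count / total_customers for c, (_, count) in table_3_4.items()}
profit_share = {c: 100 * mean * count / total_profit for c, (mean, count) in table_3_4.items()}

for cluster in sorted(table_3_4):
    print(f"Cluster {cluster}: {customer_share[cluster]:.1f}% of customers, "
          f"{profit_share[cluster]:.1f}% of total profit")
```

Under that assumption, cluster 6 accounts for roughly 29% of both customers and total profit, while cluster 3, with under 2% of customers, contributes about 15% of total profit.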
4. Conclusion & Recommendations
After the data analysis, we segment our customers into 8 groups. Our goal is to decrease the return rate and cancel rate so that we can improve our customers’ profitability and satisfaction. We also want to regain our old customers and increase their loyalty.
Based on Table 3.4, we create a pie chart (see Fig. 4.1) that illustrates the share of total profits contributed by each segment.

Figure 4.1 Profit Distribution among Groups: Group 1 8%, Group 2 18%, Group 3 15%, Group 4 8%, Group 5 17%, Group 6 29%, Group 7 3%, Group 8 2%.

From Fig. 4.1, we can clearly see that clusters 6, 2, 5 and 3 contribute the majority (79%) of our total profits. These customers are our key customers in terms of the total profits they generate.
According to Table 3.4, customers in cluster 6 have the shortest time duration since the last order date, which means these people currently have the highest awareness of Saks among all customers and a stronger connection to us. We need to retain these customers for long-term development because they are more likely to bring potential profits. In addition, the fact that their return rate and cancel rate are both low shows that they are currently satisfied with our products and services.
Cluster 2 has the lowest return rate and the second lowest cancel rate. These customers are highly satisfied with our products and services and purchase products at Saks with fewer hesitations. But they have not placed an order for more than 3 years, so it is important for us to retarget them.
For cluster 5, customers’ return rate and cancel rate are low, so their satisfaction is stable. But they have not placed an order for more than 5 years, and the mean profit of this group is relatively low. Low return/cancel rates combined with low profits suggest that these customers are perhaps concerned that the return/cancel process will cause them trouble, so they are unwilling to buy very expensive products. For these customers, we need to ease their worries and convey the message that Saks is the ideal store for buying high-end products. Meanwhile, we should update their personal information and demands, since they have not purchased products from us for more than 5 years.
Since the mean profit in cluster 3 is the highest, these customers are valuable to us. Their return rate and cancel rate are relatively low, but we still need to decrease them in order to increase their satisfaction as much as possible. This group’s time duration since last order date is the second shortest, so we need to retain these customers and persuade them to build a long-term, trusting relationship with us, helping us to generate more profits in the future.
Cluster 8 has the second-lowest return rate, so these people are at least satisfied with the products they have already bought. However, their cancel rate is extremely high, which means we lost most of the potential profit from what they initially intended to purchase. Meanwhile, customers in cluster 8 have a long time duration since their last order date, which suggests that they are unwilling to purchase products from our store because of a bad purchase experience in the past.
    22 For example, theymay be disappointed with our website’s slow updating frequency or long shipping time. Thus, we may need to regain these customers by setting up specific strategies to target their needs more efficiently. Although the profits that cluster 4 bring to us are very high, these customers’ net profits are not as high as we see in the table because of their high return rate and cancel rate. This situation indicates that customers are dissatisfied with our products or services. We should improve the quality of our products and optimize our services to convince them to keep purchasing products from Saks with a lower return rate and cancel rate. In this way, we can prevent the loss of potential profits from these customers. No matter in terms of total profits or the mean of profits, customers in cluster 7 generate low profits for us. Their return rate is the highest, which means they almost return all the products that they purchased before. Although our employees spend much time and effort serving them and trying to meet their needs, these people return most of our products. So there must be something wrong with our products or services. Since this group of customers has negative influences on our financial situation right now, the spending on them will be more productive and efficient if we can lower their return rate. Recommendations In order to provide appropriate recommendations for our customers based on their different characteristics, we need to analyze some reasons for consumer’s return and cancelation behaviors. The difference between these two behaviors is that returns happen when customers have already purchased products and cancelations happen when people have not paid for the product yet. As we all know, Saks Fifth Avenue is both a retailer and e-retailer. In our physical store,
  • 27.
    23 customers return itemslargely because of our staff who cannot provide the proper product information or shopping advice for customers. On the other hand, an increasing amount of customers are purchasing products on our official website or app. Thus some problems appear. For example, when a customer purchases a pair of shoes on our website, he/she cannot look at or try on these products in person. Many customers will be disappointed when they receive the packages because the products do not match their expectations. Therefore, for these reasons, customers have higher possibilities of returning their products. In addition, more and more online retailers appear, which gives people multiple opportunities to compare price. They can easily find a better price for the same product on other websites, and once they find it, they will switch to other retailers. Our team has summarized several possible reasons for return behaviors:  The product itself cannot satisfy our customers. For instance, if one customer bought a sweater on our website and she was not satisfied with the material of the cloth, she might return this sweater.  Another normal situation is that the product is damaged during the shipping process. Under this situation, the customer definitely will return his/her product.  The description of the product is not consistent with the real product or the details of the product are not provided very clearly. The higher the expectations customers have based on the description on our website, the more disappointed they will be if the product doesn’t match the description.  Shopping guides don’t offer clear explanations for our customers. When customers ask our shopping guides for some advice or information in our physical stores, it is possible that our shopping guides are unable to provide proper advice. Misleading information and advice will
probably result in return behaviors.
• Poor post-purchase service is another important factor that causes people to return their products. Saks is a high-end retailer, so the prices of our products are relatively high. When customers pay a premium for a product, they have higher requirements for customer service. If our post-purchase services cannot solve their problems promptly and effectively, they may return their products as well. For instance, when a customer calls our representative to request an exchange and we process the request very slowly, the customer may run out of patience and decide to return the product directly.
Our team has summarized several possible reasons for cancelation behaviors:
• Customers make mistakes when they place an order. For instance, they may find at checkout that they chose the wrong size or color. In this situation, they will cancel the order and replace it with the correct one, so this kind of cancelation does not essentially influence our sales. However, we still need to provide a clearer website design and better information to help customers place orders correctly. Similarly, customers may fill in the wrong personal information at checkout, so they need to cancel the order and order the product again. This case does not significantly influence our profits because customers usually place the order again.
• Customers find a better offer on other websites. Since more and more online retailers are appearing, many customers are used to comparing prices of the same product on different websites before they check out. Once they find a better offer from another online retailer, they will cancel the previous order on our website.
• Personal factors. Customers often put items in their shopping carts when they are stimulated by some external incentive, but they still hesitate to buy. Products from
Saks usually have high prices, so a majority of customers need a longer time to consider. After the impulse disappears, most customers recover their rational thinking and decide to cancel the order.
Based on the previous analysis, Saks can prevent many return and cancelation behaviors by taking practical actions. We hereby provide managerial recommendations based on each group’s characteristics.
• Regarding cluster 7, these customers generate relatively low profits, but their return rate is the highest. We need to decrease their return rate in order to encourage them to spend more at Saks. First, we should improve the quality of the information on our website, for example by providing more detailed product descriptions, so that customers have a better understanding before they purchase. Second, Saks should use better shipping packaging to protect products from damage in transit. According to our research, customers care more about packaging when they pay high prices for products, so refined packaging not only conveys a good impression of our company but also matches customers’ expectations. Because this group’s low profits may result partly from its frequent returns, decreasing its return rate should increase its profits.
• Regarding cluster 8, this group generates relatively high profits, but it also has the highest cancel rate and has not purchased from us for a long time. Saks should provide these customers with more straightforward product information when they shop on our website so as to reduce the probability of misleading them. In addition, Saks should highlight “low stock” next to the quantity box to signal that the product may soon be unavailable. In this way, we can largely reduce the time they hesitate and
motivate them to pay for the order immediately. We can also show customers the number of people who are viewing the product at the same time. Giving them the impression that the product is popular can motivate them to complete the transaction quickly. A lot of potential profit will be realized if this group’s cancel rate can be decreased. Since we have the contact information of these customers, Saks should send them greeting emails to show our care. By telling them about changes at our company and our new arrivals, we can renew their interest.
• Regarding cluster 4, the mean profit of this group is the second highest, but its cancel rate and return rate are relatively high among all groups. First, we need to systematically train our salespersons and shopping guides so that they can provide more appropriate advice and information to our customers. Considering that this group has not placed orders with us for more than two years, it would be helpful to retarget them with seasonal promotional emails, especially around holidays. In order to prevent returns, we can also provide discount coupons for their next purchase if they agree to keep their products this time. If they insist on returning, we can offer a partial refund, such as 5% of the original price, to convince them to keep the product.
• Regarding cluster 3, this group generates much higher profits than the other groups, so these VIP customers’ return and cancelation behaviors have more serious negative effects on our profits. Saks should provide a personal shopping guide for each of them so that we can become aware of their problems and solve them correctly and in a timely manner. Saks will gain large financial returns if we can decrease these VIP customers’ return rate to below 1%.
• Regarding clusters 2 and 5, the mean profits of these two groups are at a middle level, and their cancel rates and return rates are extremely low.
Based on our previous analysis, these
customers may have some concerns that the returning and canceling process would be inconvenient, so they are unwilling to purchase high-priced items. For these customers, we need to guarantee that if they are not satisfied with our products, they have multiple channels to contact us, and that we will deal with their problems within 24 hours. We believe that they will spend more money if Saks’ shopping process becomes more convenient.
• Regarding cluster 6, this group is the largest, accounting for 30% of all customers. Its mean profit is relatively high. More importantly, the time since its last order is the shortest. We should therefore send these customers promotional emails or mailings more frequently to maintain their interest and to convince them to keep purchasing from us, for example promotion coupons such as a 10% discount. We also want these customers to generate more profit for our company because they have potential profitability. Thus, we can offer them information about high-end brands’ products through emails or mailings, in an effort to persuade them to buy higher-priced products.
5. Limitations and Future Research
Though we successfully identified 8 groups with diverse characteristics, our analysis has its limitations. We lack some supporting data to inform decision making and reinforce our recommendations. Our study aims at identifying and investigating actionable customer groups with unique features. For example, for a group with a high return rate, lowering its return rate increases its profit. However, the current data can identify which customers have high return or cancel rates but does not enable us to investigate why they return or cancel orders. As discussed in the previous sections, the reasons leading to high return and cancel rates are diverse. Knowing customers’ motivations and reasons for returning and cancelling would enable us to improve and to avoid similar situations in the future. Unfortunately, we could not learn such insights from the current data; otherwise we would have been able to come up with more specific recommendations for different segments.
For future research, we need to extend the diversity of our data, especially by adding data that helps us learn the reasons for returns and cancelations. Saks has two major retail channels: online stores and offline stores. Analyzing the entire customer pool comprehensively requires an improved data collection mechanism. For the online channel, one suggestion is to add a check box list of possible return/cancel reasons on the after-sale service page. The check box window would appear when customers apply for a return or a cancelation, so that our database could record and store what we need. By the same token, when customers return products in offline stores, our sales assistants should also ask for their return reasons and record them in the sales system.
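As an illustration of the data collection suggested above, the sketch below shows one way a recorded return or cancel reason could be stored and summarized in Python. The reason categories are taken from the reasons discussed in the Recommendations section; the class names, field names, and sample records are hypothetical and do not come from the Saks database.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Reason(Enum):
    # Hypothetical categories based on the reasons discussed in this report.
    PRODUCT_UNSATISFACTORY = "product did not satisfy"
    DAMAGED_IN_SHIPPING = "damaged in shipping"
    DESCRIPTION_MISMATCH = "description did not match"
    POOR_ADVICE = "unclear in-store advice"
    POOR_SERVICE = "poor post-purchase service"
    ORDER_MISTAKE = "mistake when ordering"
    BETTER_OFFER = "better offer elsewhere"
    CHANGED_MIND = "changed mind"


@dataclass(frozen=True)
class ReasonRecord:
    # Hypothetical field names for one check box submission.
    customer_id: str
    order_id: str
    action: str   # "return" or "cancel"
    channel: str  # "online" or "offline"
    reason: Reason


def reason_shares(records, action):
    """Return the share of each reason among records with the given action."""
    counts = Counter(r.reason for r in records if r.action == action)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {reason: n / total for reason, n in counts.items()}


# Made-up records, for illustration only.
records = [
    ReasonRecord("C001", "O100", "return", "online", Reason.DESCRIPTION_MISMATCH),
    ReasonRecord("C002", "O101", "return", "online", Reason.DAMAGED_IN_SHIPPING),
    ReasonRecord("C003", "O102", "return", "offline", Reason.DESCRIPTION_MISMATCH),
    ReasonRecord("C004", "O103", "return", "offline", Reason.POOR_ADVICE),
    ReasonRecord("C005", "O104", "cancel", "online", Reason.BETTER_OFFER),
    ReasonRecord("C006", "O105", "cancel", "online", Reason.ORDER_MISTAKE),
]

for reason, share in reason_shares(records, "return").items():
    print(f"{reason.value}: {share:.0%}")
```

Joined to the cluster membership of each customer, a summary of this kind would show which reasons dominate in the high return and high cancel groups.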
The ultimate goal of analyzing customer information and consumption data is to obtain financial returns, such as increased profit. There are various ways to improve profit. While this study investigates return rate and cancel rate, future research could focus on improving profit by increasing revenue.
6. Appendix

Cluster Number        1            2            3            4            5            6
                      M     C      M     C      M     C      M     C      M     C      M     C
Last Order Date      32   627     76  1039     18  2302     43   254     53  1684     48   135
Profit              308   627     50  1039     85  2302     66   254     88  1684     75   135
Return Rate          18   627      0  1039      0  2302    100   254   0.05  1684      0   135
Cancel Rate          11   627      0  1039      0  2302      0   254   0.11  1684    100   135

Cluster Number        1            2            3            4            5            6            7
                      M     C      M     C      M     C      M     C      M     C      M     C      M     C
Last Order Date      28   223     76  1039     18  2302     34   404     43   254     53  1684     48   135
Profit              611   223     50  1039     85  2302    141   404     66   254     88  1684     75   135
Return Rate           4   223      0  1039      0  2302     26   404    100   254   0.05  1684      0   135
Cancel Rate         6.6   223      0  1039      0  2302     14   404      0   254   0.11  1684    100   135

Cluster Number        1            2            3            4            5            6            7            8
                      M     C      M     C      M     C      M     C      M     C      M     C      M     C      M     C
Last Order Date      28   223     76  1039     27   994     34   404     12  1308     43   254     53  1684     48   135
Profit              611   223     50  1039     58   994    141   404    106  1308  65.91   254     88  1684     75   135
Return Rate        4.06   223      0  1039      0   994     26   404   0.01  1308    100   254   0.05  1684      0   135
Cancel Rate         6.6   223      0  1039      0   994     14   404      0  1308      0   254   0.11  1684    100   135
Table 1. Hierarchical Cluster Analysis on 10% Calibration Sample

Cluster Number                   1         2         3         4         5         6         7         8
Zscore (Time Duration)     2.86974  -0.29206  -0.24894   0.22321   0.02253  -0.20202  -0.07664  -0.15222
Zscore (Profit)           -0.49722   1.41462  -0.56187  -0.26828  -1.16908   0.07006   0.50463   0.29141
Zscore (Return Rate)      -0.08598  -0.27678  -0.27678   0.94381  -0.27648   4.42492  -0.27425  -0.27678
Zscore (Cancel Rate)       0.23681   -0.1999   -0.1999   0.70025   -0.1999   -0.1999  -0.19288    6.4128
Table 2. Initial Cluster Centers for Calibration Sample
                              Change in Cluster Centers
Iteration         1         2         3         4         5         6         7         8
        1     1.299      0.88     0.303      0.49     1.333     0.338     0.324     0.455
        2     0.104     0.388     0.401     0.158      0.26     0.004     0.108     0.031
        3     0.029     0.412     0.151     0.065     0.064     0.003     0.141         0
        4     0.037     0.322     0.083     0.041     0.009     0.001     0.131      0.02
        5     0.029     0.254      0.08     0.033     0.018         0     0.105         0
        6     0.011     0.207     0.029     0.031      0.03         0     0.055     0.006
        7     0.004      0.15     0.018     0.027     0.023         0     0.034         0
        8     0.005     0.132     0.012     0.022     0.016         0      0.02     0.002
        9     0.004     0.116     0.006     0.018     0.011     0.001     0.009     0.002
       10     0.007     0.082     0.006     0.014     0.007     0.001     0.007     0.006
       11     0.006     0.057     0.003     0.012     0.006         0     0.005         0
       12     0.006     0.055     0.003     0.005     0.004         0     0.003         0
       13     0.007     0.046     0.002     0.006     0.004         0     0.001         0
       14     0.009     0.033     0.001     0.002     0.006         0         0         0
       15     0.016     0.026     0.001     0.003      0.01         0     0.001         0
       16     0.014     0.026     0.001     0.005     0.011         0     0.003         0
       17     0.006     0.031     0.001     0.005     0.007         0     0.004         0
       18     0.004     0.021     0.001     0.005     0.003         0     0.002         0
       19     0.001     0.019     0.001     0.003     0.001         0     0.001         0
       20     0.001     0.015         0     0.003     0.001     0.001         0         0
       21         0     0.015         0     0.001     0.001         0  4.68E-05         0
       22     0.001     0.017     0.001     0.003  8.96E-05         0  4.97E-05         0
       23         0     0.014         0     0.002         0         0         0         0
       24         0     0.007         0     0.002         0         0         0         0
       25         0     0.002         0     0.003         0         0  5.31E-05         0
       26         0         0         0     0.001         0         0         0         0
       27         0         0  3.02E-05         0         0         0  3.86E-05         0
       28         0         0         0         0         0         0         0         0
Table 3. Iteration History for Calibration Sample
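An iteration history such as Table 3 records, for each iteration, how far every cluster center moved, and the run stops once no center moves (iteration 28 in Table 3). The Python sketch below shows how such a history arises in a basic k-means loop on z-scored variables. It is an illustration only: the data, the two-cluster setup, and the function names `zscores` and `kmeans_history` are assumptions made for this example, not the software or settings used for this report.

```python
import math
import statistics


def zscores(values):
    """Standardize a list of numbers to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def kmeans_history(points, centers, max_iter=50):
    """Run k-means from the given initial centers.

    Returns (final_centers, history), where history[i][k] is the distance
    that center k moved during iteration i + 1.
    """
    centers = [tuple(c) for c in centers]
    history = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda k: math.dist(p, centers[k]))
            groups[nearest].append(p)
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k, members in enumerate(groups):
            if members:
                new_centers.append(
                    tuple(statistics.mean(col) for col in zip(*members)))
            else:
                new_centers.append(centers[k])  # empty cluster stays put
        changes = [math.dist(a, b) for a, b in zip(centers, new_centers)]
        history.append(changes)
        centers = new_centers
        if max(changes) == 0:
            break  # no center moved, so the solution has converged
    return centers, history


# Made-up data: two variables for six hypothetical customers.
profit = [10.0, 12.0, 11.0, 90.0, 95.0, 100.0]
return_rate = [0.0, 0.1, 0.0, 0.5, 0.6, 0.4]
points = list(zip(zscores(profit), zscores(return_rate)))

# Initial centers are supplied by the caller, here two of the data points.
centers, history = kmeans_history(points, centers=[points[0], points[3]])
for i, changes in enumerate(history, start=1):
    print(i, [round(c, 3) for c in changes])
```

Summing one column of such a history gives the total distance that center travelled, which is an upper bound on how far the final center lies from the initial one.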
Cluster Number                   1         2         3         4         5         6         7         8
Zscore (Time Duration)     1.60161   -0.6019   -1.1245   -0.2261    0.8055   0.00026   -0.1061    0.2118
Zscore (Profit)            -0.1925   4.58926   -0.0074   0.37726   -0.1368   -0.1915   -0.1269   0.03311
Zscore (Return Rate)       -0.2396   -0.1166   -0.2484   0.89573   -0.2634   4.33294   -0.2722   -0.2672
Zscore (Cancel Rate)       -0.1706   0.15895   -0.1903   0.88898   -0.1938   -0.1987   -0.1987    6.2202
Table 4. Final Cluster Centers for Calibration Sample

Cluster    1       7057
           2       1021
           3      17777
           4       2802
           5      13597
           6       2626
           7      13894
           8       1251
Valid             60025
Missing              54
Table 5. Number of Cases in each Cluster for Calibration Sample

Cluster Number                   1         2         3         4         5         6         7         8
Zscore (Time Duration)    -0.21249  -0.09599  -0.24407  -0.20105   0.62656  -0.19915   0.30391  -0.02783
Zscore (Profit)            0.05353    0.9094   1.63409   0.00658  -0.50291  -1.07304  -0.23003   0.15992
Zscore (Return Rate)      -0.27678  -0.27678  -0.27678   4.42075   0.56365  -0.27678   0.00582  -0.27678
Zscore (Cancel Rate)       -0.1999  -0.18996   -0.1999   -0.1999  -0.19252   -0.1999   2.46949    6.4128
Table 6. Initial Cluster Centers for Validation Sample
Iteration         1         2         3         4         5         6         7         9
        1     0.405     0.206     1.192     0.294     0.859     0.543     0.579     0.258
        2     0.433     0.091     0.721      0.17     0.122     0.387     0.085     0.063
        3     0.129     0.134      0.59     0.199     0.062     0.094     0.019     0.003
        4     0.024     0.212     0.577     0.136     0.016     0.025     0.007     0.018
        5     0.021     0.214     0.534     0.071     0.009     0.017     0.018         0
        6     0.012     0.189     0.553     0.024      0.01     0.017     0.011         0
        7     0.003     0.135     0.431     0.009     0.006     0.014     0.006         0
        8     0.004     0.102     0.404     0.003     0.002     0.009     0.003         0
        9     0.006     0.099     0.412     0.002     0.001     0.007     0.004         0
       10     0.005     0.098     0.498     0.001     0.001     0.006     0.007         0
       11     0.005     0.078     0.383     0.002     0.001     0.004     0.006         0
       12     0.004     0.068     0.286     0.002         0     0.004     0.005         0
       13     0.005     0.065      0.26     0.002     0.001     0.003     0.008         0
       14     0.004     0.057     0.225         0         0     0.003     0.003         0
       15     0.004     0.051     0.177     0.001     0.001     0.002     0.006         0
       16     0.003     0.042      0.19         0         0     0.001     0.002         0
       17     0.003     0.039     0.178         0         0     0.002         0         0
       18     0.003     0.033     0.191         0         0     0.001         0         0
       19     0.002     0.034     0.175     0.001         0     0.001     0.002         0
       20     0.001     0.033     0.257         0     0.001     0.001     0.004         0
       21     0.002     0.032      0.21         0     0.001         0     0.002         0
       22     0.002     0.033     0.265     0.001         0     0.001     0.008         0
       23     0.001     0.023     0.079     0.001         0     0.001     0.002         0
       24     0.001     0.015     0.041         0         0     0.001     0.007         0
       25     0.001     0.012     0.041         0         0     0.001         0         0
       26     0.001     0.007         0         0         0         0         0         0
       27         0     0.002         0         0         0         0         0         0
       28         0     0.001         0         0         0         0         0         0
       29         0         0         0         0         0         0         0         0
Table 7. Iteration History for Validation Sample
Cluster Number                   1         2         3         4         5         6         7         9
Zscore (Time Duration)    -1.10393  -0.62541  -0.55918  -0.03315   1.15412   0.01119   0.00878   0.20639
Zscore (Profit_sum)       -0.08745   2.38355  10.34737  -0.13262  -0.17638  -0.15752   0.33883  -0.04898
Zscore (Return Rate)      -0.23482  -0.08775  -0.13979    3.7263  -0.25802  -0.24841  -0.08657  -0.27678
Zscore (Cancel Rate)      -0.18893  -0.00429   0.49115  -0.19921  -0.19388  -0.19533   2.58208   6.40827
Table 8. Final Cluster Centers for Validation Sample

Cluster    1      11681
           2       1604
           3         99
           4       2395
           5      12078
           6      10425
           7        810
           8          0
           9        780
Valid             39872
Missing              45
Table 9. Number of Cases in each Cluster for Validation Sample

Cluster Number                   1         2         3         4         5         6         7         8
Zscore (Time Duration)     3.39681    -0.066    -0.239   0.26632   0.67681   -0.2987   -0.2043   -0.1314
Zscore (Profit)            -0.6108   -1.0087   1.36108   -0.2214   0.48402   0.24312   0.04853    0.2496
Zscore (Return Rate)       -0.1109   -0.2767   -0.2768   0.99106   -0.2506   -0.2768   4.42492   -0.2768
Zscore (Cancel Rate)       0.19242   -0.1999   -0.1999   0.75803   -0.1211   -0.1999   -0.1999    6.4128
Table 10. Initial Cluster Centers for All Data
                              Change in Cluster Centers
Iteration         1         2         3         4         5         6         7         8
        1     1.463     0.594     1.082     0.447     0.729     0.614     0.325     0.414
        2     0.308     0.183     0.411     0.139     0.103     0.195     0.009     0.035
        3     0.082     0.109     0.349     0.068     0.075     0.077     0.001     0.011
        4     0.042     0.063     0.285     0.051     0.047      0.03     0.001     0.008
        5     0.021      0.03     0.233      0.04     0.023     0.017         0     0.005
        6     0.011     0.021     0.184      0.03     0.017      0.01     0.001         0
        7     0.006     0.015     0.151      0.02     0.013     0.008     0.001         0
        8     0.004     0.006     0.121      0.01     0.004     0.007         0         0
        9     0.001     0.002     0.109      0.01     0.002     0.005         0     0.008
       10         0     0.001     0.087     0.012     0.001     0.003         0     0.003
       11         0     0.001     0.051     0.009         0     0.002         0     0.002
       12         0         0     0.048     0.009         0     0.002     0.001     0.001
       13         0         0     0.036     0.006     0.001     0.001         0         0
       14         0         0     0.028     0.006         0     0.001         0     0.002
       15         0         0     0.022     0.005         0     0.001         0         0
       16         0  5.54E-05     0.019     0.006         0         0         0         0
       17     0.001         0     0.017     0.005         0         0         0         0
       18         0  9.84E-05     0.018     0.004  9.04E-05     0.001         0         0
       19         0         0     0.014     0.002  7.18E-05     0.001         0         0
       20         0  7.04E-05     0.011     0.002         0         0         0         0
       21         0         0     0.005     0.001         0         0         0         0
       22         0  3.14E-05      0.01     0.002         0         0         0         0
       23         0         0     0.013     0.002         0         0         0         0
       24         0         0      0.01     0.003         0         0         0         0
       25         0         0     0.004     0.003  6.85E-05         0         0         0
       26         0         0     0.003     0.001  6.85E-05  8.99E-05         0         0
       27         0         0     0.004     0.001  6.85E-05  9.97E-05         0         0
       28         0         0     0.004     0.001  9.62E-05         0         0         0
       29         0         0     0.005     0.002  9.80E-05         0         0         0
       30         0  4.69E-05     0.004     0.002         0  7.14E-05         0         0
       31         0  4.68E-05     0.006     0.002         0  6.64E-05         0         0
       32         0         0     0.004     0.001         0  9.27E-05         0         0
       33         0         0     0.004         0         0         0         0         0
       34         0         0     0.003         0         0  8.59E-05         0         0
       35         0         0         0         0         0         0         0         0
Table 11. Iteration History for All Data

Cluster Number                   1         2         3         4         5         6         7         8
Zscore (Time Duration)     1.60055   -0.1037   -0.6051   -0.2386   0.80404   -1.1248   0.00654   0.20816
Zscore (Profit)            -0.1938   -0.1337   4.57319   0.39816   -0.1373   -0.0092   -0.1946   0.00853
Zscore (Return Rate)       -0.2394   -0.2717   -0.1138   0.88815   -0.2632   -0.2489   4.33483   -0.2688
Zscore (Cancel Rate)       -0.1736   -0.1987   0.14937    0.9007   -0.1939   -0.1904   -0.1992    6.2413
Table 12. Final Cluster Centers for All Data
Cluster    1      11778
           2      23015
           3       1717
           4       4726
           5      22793
           6      29415
           7       4371
           8       2082
Valid             99897
Missing              99
Table 13. Number of Cases in each Cluster for All Data
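The cluster sizes in Table 13 can be converted into shares of all valid cases, which gives a quick check on statements such as cluster 6 accounting for about 30% of customers. The short Python sketch below uses only the counts reported in Table 13; the variable names are chosen for illustration.

```python
# Cluster sizes as reported in Table 13 (all data).
cases = {1: 11778, 2: 23015, 3: 1717, 4: 4726,
         5: 22793, 6: 29415, 7: 4371, 8: 2082}

valid = sum(cases.values())  # should equal the reported number of valid cases
shares = {cluster: round(100 * n / valid, 1) for cluster, n in cases.items()}

for cluster in sorted(shares):
    print(f"Cluster {cluster}: {cases[cluster]:>6} cases, {shares[cluster]}%")

largest = max(cases, key=cases.get)
print("Largest cluster:", largest)
```

With these counts, the cluster sizes sum to the 99897 valid cases reported in the table, and cluster 6 holds about 29.4% of them.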