Grocery Store Classification Model
Math 381 Project Two
Group 9
Alex Forney
Keren Lai
Gerard Trimberger
Xinyu Zhou
December 7, 2016
1 Introduction
When we buy products in a grocery store, we find that the things we want to buy are usually not located
near one another, and it is common to find that one part of the store is crowded while others
have few customers. This may be because store managers or other higher-ups plan the store layout
while taking into consideration the similarities of products' sales. They may place items often
purchased together in locations farther apart in the store, so customers may need to stay in
the store longer, resulting in these customers seeing more items and potentially purchasing them.
Another added benefit may be the reduction of congestion in departments with popular items. In
our project, we seek to find the relationships between different departments of a grocery store using
multidimensional scaling (MDS). We will plot the activity of 10 different departments (Packaged
Produce, Deli, Bakery, Dairy, Meat, Dry Goods, Fresh Produce, Coffee Shop, Seafood, and Sushi)
in order to show the similarities and differences between them. The results of our study may provide
insight into the planning of grocery stores and/or customer habits.
2 Background
2.1 Idea
We began the brainstorming process by each formulating a list of topics that we were interested in,
both mathematically and socially. We also created a list of our individual skill sets and experience
that we felt were relevant to the project. We then spent time reading through each of our responses
to get an idea of what type of project we could all find interesting. We all agreed that we wanted
to do something related to a common situation that most people experience on a daily basis.
It is always more interesting if people can directly relate to the project rather than working on
something that they do not have personal experience with. Our second criterion was that we each
wanted to do something related to probabilities or Monte Carlo simulation. Keren and Xinyu
are ACMS/Economics double majors, so they were both interested in the processes involved in
economic development.
Our first formulation of the proposal involved comparing the total sales and overall market
share of different car manufacturers. We wanted to build a Markov chain of different manufacturer
states and how they relate, in order to predict how the current market share distribution would
change over time. Ultimately, we felt that we would be unable to obtain the necessary data for
an interesting Markov chain, i.e. the number or probability of car owners moving from one
manufacturer to another. Other outside factors, such as owning multiple cars, created additional
problems that we eventually felt would hinder our progress.
At this point, we decided to switch gears. While keeping the original overarching goals in
mind, specifically a publicly relatable problem and something probability/simulation based, we
formulated a new proposal that involved simulating a grocery store checkout process. We planned
on contacting a local grocery store for real-life customer and item distribution data. Gerard went
into his local QFC on Friday, November 18th. He asked to speak with the manager of the store
and presented the situation to her, asking specifically if we could obtain some data on customer
checkout times, their number of items, and what type of register (Normal, Express, or Self-Checkout)
they utilized to make their purchase. The manager suggested that he call back
on Saturday (11/19) when the bookkeeper was present, because the bookkeeper is the one with
access to that type of information. When Gerard called back on 11/19, he was informed that the
bookkeeper had called in sick, and that he would either have to call back on Monday or try a
different store. The manager provided a phone number for another store in the region that had
its bookkeeper present on 11/19. Gerard followed through with this lead and presented the
situation to the other store manager. This new store manager did not seem to comprehend the
issue and advised Gerard to contact QFC Corporate for more information. Gerard then called the
corporate phone number provided and left a message on their answering machine informing them
that we would like to talk as soon as possible. Gerard waited until Monday morning (11/21), and
when he had not heard back from corporate, decided to contact the manager at the local QFC
once again. This time he was able to speak directly to the bookkeeper of the store, and confirmed
that there was customer data available in the computer system, but that it may not be exactly
what we were looking for. He provided his name and number and was told that, if he did not hear
back from the store later that day, to come in on Tuesday (11/22). Gerard did not receive a call
for each customer according to different arrangements of the departments in the store.[9] The results
showed that the total walking distances of customers increased in the proposed new layout.[9] Chen
Li from the University of Pittsburgh modeled the department allocation design problem as a multiple
knapsack problem and optimized the adjacency preference of departments to get the maximum
possible exposure of items in the store, in an attempt to give an effective layout.[10] A similar optimization was
used in the paper by Elif Ozgormus from Auburn University.[11] To assess the revenue of the store
layout, she used stochastic simulation and classified departments into groups from which customers
often purchase items concurrently.[11] By constraining space, unit revenue production, and
department adjacency in the store, she optimized impulse purchases and customer satisfaction
to get a desired layout.[11]
All three papers have basic objectives similar to ours. The paper by Boros et al. aimed to
maximize the total walking distance of each customer and thus promote sales in the store.[9] Li's
paper also focused on profit maximization, but with consideration of the exposure of the items and
the adjacencies between departments.[10] He is the first to incorporate aisle structure, department
allocation, and departmental layout together into comprehensive research.[10] The paper by
Ozgormus took revenue and adjacency into consideration and worked on the model specifically for
grocery stores, with the objectives of maximizing revenue and adjacency satisfaction.[11] In our
paper, we simply focus on the busyness of different departments and use multidimensional scaling
to model the similarities between departments, and thus provide solid evidence for designing
an efficient and profitable layout. Instead of having data on comprehensive customer behavior in
the store, we have sales data from the register's point of view.
3 The Model
As a result of the data acquisition process described in the Background section, we were able to
obtain an hourly breakdown of the number of items, total sales, and number of customers purchasing
items at the local QFC from which we collected. The data presents a 24-hour snapshot
of a standard day within the grocery store. The data was presented in individual printouts of each
department's activity for the day, so the first step was to transcribe all of the information
from physical paper form into an Excel spreadsheet. The results are presented in the Appendix.
The next step was to separate and normalize each of the different activity indicators based
on their departmental, as well as hourly, totals. In this way, we transformed the raw data into
standardized distributions whose entries summed to one. Specifically, we separated
the data into three different 24 × 10 matrices (i.e. items, sales, and customers), where the rows
of each matrix represent the hourly data for a 24-hour time period and the columns represent
each of the 10 departments. For each of these matrices we normalized each entry by its daily
departmental total, i.e. for each department (or column) we divided each entry in the column by
the summed total of the column:
MATLAB Code:
% Normalize each department (column) by its daily departmental total
for i = 1:10
    items_normD(:,i) = items_raw(:,i)/sum(items_raw(:,i));
    sales_normD(:,i) = sales_raw(:,i)/sum(sales_raw(:,i));
    cust_normD(:,i) = cust_raw(:,i)/sum(cust_raw(:,i));
end
Additionally, we normalized each of the 24 rows (hourly data) by the row sum of the activity for
that particular hour throughout all departments:
MATLAB Code:
% Normalize each hour (row) by its store-wide hourly total
for i = 1:24
    items_normH(i,:) = items_raw(i,:)/sum(items_raw(i,:));
    sales_normH(i,:) = sales_raw(i,:)/sum(sales_raw(i,:));
    cust_normH(i,:) = cust_raw(i,:)/sum(cust_raw(i,:));
end
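For readers without MATLAB, the same two normalizations can be sketched in Python/NumPy. The array names mirror the MATLAB variables above, but the random matrix is a stand-in for our raw data, not the QFC dataset:

```python
import numpy as np

# Hypothetical stand-in for the 24 x 10 matrix of hourly item counts.
rng = np.random.default_rng(0)
items_raw = rng.integers(1, 50, size=(24, 10)).astype(float)

# Normalize each department (column) by its daily total, as in items_normD:
items_normD = items_raw / items_raw.sum(axis=0, keepdims=True)

# Normalize each hour (row) by its store-wide hourly total, as in items_normH:
items_normH = items_raw / items_raw.sum(axis=1, keepdims=True)

# Every department column of items_normD sums to one,
# and every hourly row of items_normH sums to one.
assert np.allclose(items_normD.sum(axis=0), 1.0)
assert np.allclose(items_normH.sum(axis=1), 1.0)
```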
These calculations were performed on a mid-2010 MacBook Pro, running Windows 7 SP1, in
MATLAB R2016b Student edition. The calculations were instantaneous. This normalization
process resulted in 6 different datasets of customer activity, i.e. the number of items,
sales, and the number of customers, each normalized by their daily departmental totals and additionally
by their hourly store totals. We ran each of these data sets through the distance calculations,
described below, in order to generate different variations of the information, ultimately in search
of the best "goodness of fit."
In order to create an MDS model of the above-mentioned data sets, our next step was to run
each data set through our distance algorithm in order to calculate a single-dimensional distance
between different departments. In other words, we iterated through each of the departments, a,
and compared it to the hourly customer activity of each of the other departments, b. We utilized
the Minkowski distance formula for our distance calculations [7]:
\[ \text{distance} = \left( \sum_{i=1}^{24} |r_{a,i} - r_{b,i}|^{p} \right)^{1/p} \]
where i represents the hourly time period (e.g. i = 1 represents 12 o'clock AM to 1 o'clock AM),
a and b represent each of the different departments, and p represents the power of the Minkowski
algorithm. The most common powers, p, that are considered are 1, 2, and ∞. A power
of 1 is commonly referred to as the Manhattan distance, a power of 2 is commonly referred to as
the Euclidean distance, and a power of ∞ is commonly referred to as the supremum distance. We used R
version 3.3.2 on a Late 2013 MacBook Pro running macOS 10.12.1 to carry out our calculations,
which ran instantly. Specifically, we ran the following commands in R:
library(readr)
library(wordcloud)
items <- read.csv(file = "ItemsHourLabel.csv", header = TRUE, sep = ",")
d <- dist(items, method = "euclidean")
ll <- cmdscale(d, k = 2)
textplot(ll[, 1], ll[, 2], items[, 1], ann = FALSE)
Step-by-step, here is what the commands do:
library(readr)
library(wordcloud)
These commands import libraries that allow us to read the CSV ļ¬le and create the plot.
items <- read.csv(file = "ItemsHourLabel.csv", header = TRUE, sep = ",")
This command reads in the formatted 24-dimensional vectors corresponding to each department
from the file "ItemsHourLabel.csv" into a table called "items". The file "ItemsHourLabel.csv"
consists of rows that look like this:
Department,00:00 - 01:00,01:00 - 02:00,02:00 - 03:00,03:00 - 04:00,...
Packaged Produce,0,0,0,0.011299,0,0,0.022599,0.00565,0.022599,...
Deli,0.006135,0,0,0,0,0.02454,0.006135,0.02454,0.018405,0.02454,...
Bakery,0.001661,0,0,0,0,0.021595,0.019934,0.059801,0.043189,...
...
In this case, each row represents the number of items sold in each department in a given hour
divided by the total number of items sold in the department over the course of the day. The
department names at the beginning of each row are used for the graphic output.
d <- dist(items, method = "euclidean")
This command takes the table "items" and creates a matrix of distances between every row of the
table. Here, the distance method is specified as "euclidean", which means that the distance between
row i and row j will be calculated as

\[ d_{ij} = \left( \sum_{k=1}^{24} |r_{i,k} - r_{j,k}|^{2} \right)^{1/2}. \]
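For reference, the pairwise distance matrix that dist produces can be reproduced outside of R. The following Python/NumPy sketch covers the Euclidean case as well as the other two Minkowski powers discussed earlier; the 10 × 24 matrix is random stand-in data, not the QFC dataset:

```python
import numpy as np

def dist_matrix(rows, p=2):
    """Pairwise Minkowski distances between the rows of `rows`."""
    diff = np.abs(rows[:, None, :] - rows[None, :, :])
    if np.isinf(p):
        return diff.max(axis=-1)                 # supremum / maximum distance
    return (diff ** p).sum(axis=-1) ** (1.0 / p)

rng = np.random.default_rng(1)
items = rng.random((10, 24))       # one 24-hour profile per department

d = dist_matrix(items, p=2)        # Euclidean, as in the R command above
assert d.shape == (10, 10)
assert np.allclose(d, d.T)         # distance matrices are symmetric
assert np.allclose(np.diag(d), 0)  # zero distance from a row to itself
```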
ll <- cmdscale(d, k = 2)
Here, k = 2 specifies a two-dimensional model. The output is a list of two-dimensional coordinates,
one for each object in the original set:
> head(ll, 10)
[,1] [,2]
[1,] -0.032088329 0.01770756
[2,] -0.027631806 0.02097795
[3,] -0.028511119 0.05441644
[4,] -0.013549396 -0.01713736
[5,] -0.086806729 -0.06648990
[6,] -0.007476898 -0.01173682
[7,] -0.010818238 -0.02144684
[8,] -0.001610913 0.18130208
[9,] -0.045186100 -0.12261632
[10,] 0.253679528 -0.03497679
textplot(ll[, 1], ll[, 2], items[, 1], ann = FALSE)
This command plots the result with the names of the departments. ll[, 1], ll[, 2] specifies
that the first column of ll gives the x-coordinates and the second column gives the y-coordinates.
items[, 1] specifies that the first column of the table "items" gives the labels for the data points.
ann = FALSE removes the x and y labels from the plot. The results of these commands are
presented in the following section.
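For readers curious what cmdscale does internally, classical (Torgerson) MDS double-centers the squared distances and takes the leading eigenvectors. A sketch in Python/NumPy, with random stand-in data rather than the QFC data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((10, 24))               # 10 departments, 24 hourly values
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))  # Euclidean distance matrix

n, k = D.shape[0], 2
J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
B = -0.5 * J @ (D ** 2) @ J            # double-centered squared distances
evals, evecs = np.linalg.eigh(B)       # eigenvalues in ascending order
order = np.argsort(evals)[::-1]        # reorder: largest first
evals, evecs = evals[order], evecs[:, order]

# Coordinates: leading eigenvectors scaled by the square roots of their
# eigenvalues, one 2-D point per department (cmdscale's output `ll`).
ll = evecs[:, :k] * np.sqrt(np.maximum(evals[:k], 0.0))

assert ll.shape == (n, k)
assert evals[0] >= evals[1]   # dimensions ordered by importance
assert evals.min() > -1e-8    # B is positive semidefinite for Euclidean input
```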
4 Results
4.1 Hourly
In order to draw conclusions about the two-dimensional representations of our data, we can compare
them to the original data after it has been normalized by the hourly store totals. The first of
these results is the 2D plot of the items per hour:
We immediately see that the dairy department and fresh produce department differ from the rest
of the data. Similarly, the coffee shop and bakery differ significantly. We then wish to find two
features of the data that may be causing these differences and can be used as the dimensions of
our plot. A plot of the items sold over the course of the day in each department follows:
We can see that the dairy department and fresh produce department both sell more than double
any other department at their respective peaks, which occur at approximately the same time of
day. So, the horizontal dimension of our 2D representation of the data corresponds to this
large peak between the hours of 12 p.m. and 8 p.m. This is further supported by the fact that
the dry goods department and the bakery follow this trend to a lesser degree (less than dairy
and fresh produce but more than the other departments), so they are closer to the right side
of our plot. Nothing immediately stands out from the raw data that indicates that the coffee
shop and the bakery differ from the rest of the departments in any meaningful way. We can
instead look at the normalized data to see what may be the cause of this vertical distance in the plot:
Here, we see that the coffee shop and the bakery sell the majority of the total items sold in the store
between about 6 a.m. and 9 a.m. This does seem to make sense, as many people may be purchasing
coffee and/or baked goods in the morning for breakfast. However, this second dimension tells us
that the departments differ in the times at which they are the most active, which we already knew
from our first dimension and the fact that our data is separated by departments and time intervals.
Consequently, this second dimension is not very useful.
Examining the other two 2D plots of the data normalized by hourly totals, i.e. sales and customer
count, leads to similar conclusions. That is, the axes of the plots are dependent on the times
at which business activity spikes in each department. If we now consider the example of the sales,
we see that the 2D representation is essentially the same as with the previous dataset:
While the distances are altered slightly, the plot is otherwise simply inverted. The results for the
customer data are very similar and are included in the Appendix.
4.2 Daily
Similar to when the data was normalized by the hourly totals, the 2D representations of our data
normalized by daily totals exhibit a relationship between departments that are busiest at the same
times:
For example, in the above plot of the items sold in each department, we see that the coffee shop
is far away from the seafood department. Looking at the raw data of the number of items sold
per department over the course of the day (included above in this section), there does not seem to
be anything contrasting the coffee shop and seafood in any meaningful way. Instead, we can look
directly at the normalized data:
We can see that the coffee shop is busiest early in the day, between 9 a.m. and 12 p.m., with
another spike around 2 p.m. Conversely, the seafood department does the most business between
3 p.m. and 6 p.m. The rest of the departments, other than the sushi department, seem to increase
their business steadily throughout the day and peak in the late afternoon. This leads us to the
conclusion that one axis in our plots corresponds to the time at which each department does most
of its business. However, there is also a second dimension that appears to depend only on sushi.
Looking at the 2D representation of the sales over the course of the day, we again see this strange
distance between the sushi department and the rest of the store:
When we look at the raw sales data for the sushi department, the only aspects that stand out are
the fact that the department does relatively little business and that it only has three
time periods with any transactions at all. There are two spikes around lunch time and
dinner time, but there is also a single sushi sale between midnight and 1 a.m.
The "sushi dimension" could be a result of either the two periods of activity or the fact that the
sushi department is one of the only departments to make a sale at that late hour. The former does
not seem to be the case, because all of the departments go through a rise and fall of sales over
the course of a day. Alternatively, if the latter is true, the "sushi dimension" is not particularly
interesting, since we are only analyzing one day's worth of data and the single sale is more than
likely not indicative of a trend of late-night sushi purchases. In either case, the second dimension
of our plot is not really helpful in determining the similarity of any two departments. So, we can
perform another dimension reduction in order to create a one-dimensional model for our data.
The plot of customer data was omitted from the discussion because of its similarity to the item
and sales data sets. The results are presented in the Appendix. Our next step was to consider
adjustments to the Minkowski powers and MDS dimensions in our model.
5 Adjustments and Extensions
5.1 Goodness of Fit
Our ultimate goal in generating different variations of the MDS model was to find a model with the
optimal "goodness of fit" (GoF) for each of the above-mentioned data sets. Goodness of fit is a
measure of how well the MDS model fits the original data based on a choice of MDS dimension and
Minkowski power. For each of the different customer activities (items, sales, and customers), and
for the two different normalization methods, by hour and by department (or by day), we evaluated
how changing the MDS dimension and the Minkowski power affected the goodness of fit
of our model. We considered each of the MDS dimensions between 1 and 9 because our model
contained 10 departments. As the dimension of our MDS model increased, we expected to see the
goodness of fit increase accordingly. We also considered the 3 most common Minkowski powers:
p = 1, which corresponds to the Manhattan distance or 1-norm; p = 2, which corresponds to the
Euclidean distance or 2-norm; and p = ∞, which corresponds to the maximum distance or infinity
norm.
We can use R to find the GoF data in a similar fashion to how we obtained our original model.
The entire code is included below:
library(wordcloud)
items <- read.csv(file = "ItemsDayLabel.csv", header = TRUE, sep = ",")
d <- dist(items, method = "euclidean")  # 2-norm
# d <- dist(items, method = "manhattan")  # 1-norm
# d <- dist(items, method = "maximum")    # sup norm
cmdscale(d, k = 1, eig = TRUE)$GOF  # k is the dimension
We can choose between one of the three distance measures depending on which norm we are testing.
Similarly, we can use the following command to change dimensions:
cmdscale(d, k = 1, eig = TRUE)$GOF
This command returns a goodness of fit value between 0 and 1, where a value of 1 indicates a
perfect fit, or direct correlation, and a value of 0 indicates uniform randomness. k corresponds to
the dimension of our MDS model, which we let range from 1 to n − 1 = 9, where n = 10 is the
number of departments. The results are presented in the graphs below:
For the customer data, we can see that a Minkowski power of 1, Manhattan, seems to produce
models with the best goodness of fit over most MDS dimensions. In other words, the red line
is consistently higher than the rest. Next, we are interested in finding the lowest MDS dimension
that sufficiently models the data. For the customer by department (or day) data, we see that
a dimension of 1 leads to a GoF of about 0.46. While this is acceptable in some situations, we
also noticed that by raising the dimension of our MDS model to 2, our GoF increased to 0.78.
Therefore, to optimize our MDS model for this particular data set, we chose a Minkowski power
of 1 and an MDS dimension of 2. On the other hand, if we examine the customers by hour plot,
we can see that the Manhattan distance in a 1-D MDS model produces a goodness of fit of 0.72.
Therefore, this particular set of choices is sufficient to capture the inherent trends present within
our original data set.
We noticed a similar trend in the items and sales data. The Manhattan Minkowski distance
calculation, i.e. p = 1, seems to produce the best GoF over most of the MDS dimensions between 1
and 9. Examining the plots of items and sales by department, we see that 1-D MDS models do not
sufficiently encapsulate the multi-dimensional interactions present in these data sets, producing
GoFs of 0.45 and 0.51, respectively. However, if we examine the GoF for these data sets in a
2-D MDS model, 0.77 and 0.76 respectively, we can see that there is a significant increase,
indicating that a 2-D MDS model is a significantly better fit for these data sets. Additionally,
if we examine the items and sales per hour, we notice that the 1-D Manhattan MDS models seem
to be sufficient for modeling the original data set, producing GoFs of 0.76 and 0.81, respectively.
Goodness of fit tables for each of these data sets are presented in the Appendix.
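The GoF values that cmdscale reports can be reproduced from the eigenvalues of the double-centered matrix: they are the fraction of the (absolute, resp. positive) eigenvalue mass captured by the k retained dimensions. A Python/NumPy sketch with random stand-in data, showing the expected monotone increase with dimension:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((10, 24))
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))        # Euclidean distance matrix

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
evals = np.sort(np.linalg.eigvalsh(B))[::-1]  # eigenvalues, largest first

def gof(k):
    """The two GoF ratios cmdscale reports; they coincide for Euclidean input."""
    top = evals[:k].sum()
    return top / np.abs(evals).sum(), top / np.maximum(evals, 0).sum()

# GoF is non-decreasing in the MDS dimension k and reaches 1 at k = n - 1.
gofs = [gof(k)[1] for k in range(1, n)]
assert all(b >= a - 1e-12 for a, b in zip(gofs, gofs[1:]))
assert abs(gofs[-1] - 1.0) < 1e-6
```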
5.2 Changing the Dimension
In formulating our problem, we made the assumption that our one day of data is meaningful in the
larger scheme of business at QFC. Although no single day can be indicative of the general patterns
at the store, we are working under the assumption that there are some trends present in our data
that may provide insight into the store in general. We could improve our model by obtaining more
data from QFC, at which point we might have more evidence that any relationships we
find between departments are accurate. However, we would need a lot of data over a long period
of time in order to proceed in this manner. Seeing as this data is probably very valuable to
the company, and given how difficult it was for us to obtain a single day's worth of data, this is not a
practical way forward.
As we saw in Section 5.1, calculating our plots using one dimension and the Manhattan distance
seemed to produce a high enough goodness of fit. So, we can perform our scaling again in 1D rather
than 2D in an attempt to remove the excess dimension we saw in our original results. As has been
the case so far, we expect this relationship to depend on the time of day at which each department
does the most business. In any case, we can alter our R code slightly to reflect this change in our
model:
library(readr)
library(wordcloud)
items <- read.csv(file = "SalesHourLabel.csv", header = TRUE, sep = ",")
d <- dist(items, method = "manhattan")
ll <- cmdscale(d, k = 1)
# Column of zeros used to plot a line in one dimension
textplot(ll, c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), items[, 1], yaxt = "n", ann = FALSE)
If we now compare our raw data to our 1D representations, we see a stronger relationship between
the dimension and the data itself. Consider first the number of items sold over the course of each
hour:
The fresh produce department and the dairy department sell the most items at their peaks. This
is reflected in the plot, as those two departments are the furthest away from the rest. In fact, if we
go through the lines from top to bottom in the plot on the left, we will see that this is exactly the
order in which the departments appear from right to left in the second plot. We can see the same
relationship reflected in the plots for sales per hour and customers per hour:
To verify this trend, also notice that the fresh produce department has the highest peak in the sales
data and is further to the right than the dairy department. Similarly, in the customer data, the
dairy department is further to the right of the fresh produce department because the number of
customers served between the hours of 4 p.m. and 5 p.m. is greater. So, the distances in our
scaled plots seem to correspond to the height of each peak between 4 p.m. and 5 p.m., which
provides insight into the maximum activity at what is the busiest hour at QFC.
As was the case in the original 2D MDS plots of the data normalized by daily totals, there
seems to be something unique about the sushi department in the 1D representations. In particular,
this relationship is not immediately obvious from the raw data itself. We can first compare the
normalized plot of customers served over the course of the day to the 1D representation
of the departments:
What stands out in the plot on the right is the fact that the coffee shop and the sushi department
are the furthest apart. When we look at the plot on the left, we notice that the coffee shop serves
the highest percentage of its total customers early in the day. In particular, it serves the highest
percentage of any department between 10 a.m. and 11 a.m., while the sushi department serves
none. We know that this particular hour, rather than any of the other morning hours, accounts for
the distances in the 1D plot because of the seafood department. That is, the seafood department
does not serve its first customer until this hour, and it is closer to the rest of the departments than
to the sushi department. If the plot were reflecting the differences at an earlier time, then the
seafood department would presumably be right next to the sushi department, since neither serves
a customer. This relationship is again apparent in the other two datasets:
5.3 Takeaways
We have seen that a 1D representation of our data is the most fitting when it has been normalized
by hourly totals. The GoF values for these three datasets are reasonably high, and the resulting
plots accurately reflect the peak activity in each department at the busiest hour. This information
can be useful in planning how to organize a store when the most business is being done.
Conversely, 2D representations of the data when normalized by daily totals seem to be more
useful than the 1D plots. While the 2D plots have one dimension relating to activity at certain
time periods throughout the day (e.g. breakfast time, lunch time, and dinner time) and another
relating to the business of departments at one particular hour, the 1D plots only give us insight
into the latter. This information is ultimately not helpful in coming to any meaningful conclusions
about the activity patterns in each department, because we only have data from
one day. Despite the superfluous second dimension, the 2D plots still have one useful dimension,
whereas the 1D plots do not have any.
Hence, we can best utilize our data to evaluate peak traffic between 4 p.m. and 5 p.m. by
normalizing by hourly totals and comparing one-dimensional representations of the departments.
Additionally, we can see broad trends in business by normalizing our data by daily totals and
representing it in two dimensions. In order to verify the apparent trends, though, we would still
need to obtain a larger dataset.
6 Conclusion
6.1 Object of study
From the results above, we can see that several departments are similarly busy at the same time,
such as the meat, fresh produce, and seafood departments. To avoid congestion in some parts of
the grocery store and to maximize the amount of money customers spend in the store,
the store owner would do well to separate these departments.
6.2 Limitations
First of all, we only have data for one particular day in that store. This certainly introduces
some bias into our data and thus makes our model less credible. Also, our data are statistics from
the register's point of view. What we have are the actual purchases in each department, which is
only part of customer behavior. Further, in reality the arrangement of departments may not
be flexible. Departments can be restricted by the locations of warehouses or workbenches. For instance,
the sushi department needs a workbench to make fresh sushi every day; a department containing
heavy items would prefer to be somewhere close to its warehouse; a coffee shop would most likely be close
to the entrance or exit. For these departments, the location and size are predetermined at the
point of the construction of the store.
6.3 Future work
In terms of modeling the busyness of departments, our model is currently based on the number of
customers, number of items sold, and revenue in each department. These are basic observations from the
registers. What happens before people check out would also be worth considering. If possible,
we could collect data on the time an average customer spends in each department, regardless of
whether he/she buys something in that department. Similarly, the number of customers who physically
appear in each department also measures the busyness of that department.
In terms of generating a layout that maximizes the sales in the store, there are many aspects
worth deeper discussion. In addition to the locations of different departments, we could take the sizes
of departments into consideration. The detailed placement and sizes of aisles, shelves, and items on
each shelf would also have a significant impact on sales in the store. This would be more realistic,
since it is easier to make changes to them than to the predetermined locations of departments.
Completely different models and more complicated modeling methods would be required to identify
the interrelationships of locations and sizes between different aisles and shelves.
Appendix
Link to Google Drive:
https://drive.google.com/drive/folders/0B-8II7_BkXIbTmZ0aEREQ2RzSzA?usp=sharing
A.1 Raw data
The printout of the data we got from QFC. There are 10 pages in total, one page for each department.
A.3 2D MDS Results
2D plot of items sold, normalized by day:
2D plot of sales made, normalized by day:
2D plot of customers served, normalized by day:
2D plot of items sold, normalized by hour:
2D plot of sales made, normalized by hour:
2D plot of customers served, normalized by hour:
A.4 1D MDS Results
1D plot of items sold, normalized by day:
1D plot of sales made, normalized by day:
1D plot of customers served, normalized by day:
1D plot of items sold, normalized by hour:
1D plot of sales made, normalized by hour:
1D plot of customers served, normalized by hour: