QCon London: Mastering long-running processes in modern architectures
Isqs6347 team5 proposal_032513
1. ISQS 6347- Data &Text Mining
Spring 2013 Team 5
Project title Data Analysis For Abuelo’s
Class number / Semester ISQS6347 Spring 2013 – Section 1
Student names Preeti Prajapati
Neha Soam
Ming Kuo Hui
The type of this project Data Mining Academic Project
The nature and source of the dataset Nature - Available in SAS file format
Source – Abuelo’s Restaurant
Completion date May 16 2013
2013
2. May 15, 2013 [ISQS 6347 – Final Project Report]
2 | Abuelo's
Table of Contents
Introduction............................................................................................................................................................3
Business Background......................................................................................................................................3
Objective...............................................................................................................................................................3
Project Overview...................................................................................................................................................3
Dataset Availability and Description.........................................................................................................3
Table 1 : Attributes & their Description...................................................................................................4
Data Quality and Preparation ......................................................................................................................4
Table 2...................................................................................................................................................................6
Data Exploration & Preprocessing.................................................................................................................7
Data Preparation...............................................................................................................................................7
Figure 3: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide).8
Figure 4: Output Result from SAS Enterprise Miner...........................................................................8
Preprocessing Tasks........................................................................................................................................9
Data Mining Methodologies ........................................................................................................................... 10
Primitive Results and Findings..................................................................................................................... 11
Data Filtration & Addition of New Variables ...................................................................................... 13
Refined Data’s Exploration ........................................................................................................................ 13
3. May 15, 2013 [ISQS 6347 – Final Project Report]
3 | Abuelo's
Introduction
The purpose of this project is to analyze a restaurant’s sales data and to generate a model
that would aid at restaurant’s management decisions. The restaurant would be examined in
this project, Abuelo’s, is a real restaurant and all data collected are real data. By collecting,
exploring, processing, and analyzing the real life data via the data mining techniques, we
learned from lecture, we are able to generate a model that is useful and can be applied to
restaurant’s decision making.
Business Background
The Abuelo’s is a Mexican restaurant that has established stores in several cities since 1989.
Abuelo’s has consistently been on the leading edge of Mexican cuisine, combining menu
creativity, outstanding food and beverage quality, colorful plate presentations and superior
service in an impressive Mexican courtyard-themed atmosphere. Every dish is made to
order from scratch using only the freshest premium ingredients.
Objective
Recently Abuelo’s is planning to adopt a new menu to replace the old one. The restaurant
has been conducting trails of new Value Items. Value item has a lower cost as well as a
lower profit margin compared to its full version (i.e. Chicken Zucchini and Chicken Zucchini
Lite). But value items are more frequently ordered than other items. The new menu differs
from the old one in that it is extended with Value Items and some other new items which
are not treated as value items in the list.
The main objective of this project is to analyze the effect of value items on the total profit
return. The result of this project is expected to aid at decisions of what value items should
be deleted or stayed on the menu.
Project Overview
Dataset Availability and Description
The data for Abuelo's is available for year 2011 and 2012 in excel and SAS files. The
attributes and descriptions of the available data are listed in table below:
4. May 15, 2013 [ISQS 6347 – Final Project Report]
4 | Abuelo's
Attribute Name Attribute Description
UID Unique ID representing combination of item number and store
ID
Store ID Unique ID assigned to each store
Item Number Number assigned to an item
Minor Category Category of item
Product Description Description of item
Quantity Quantity sold for each item in different stores
Avg Unit Price Average unit price of an item
Avg Unit Cost Average unit cost of an item
Guest Count Sum of customer visits in one stores in a particular week
Week IND Number assigned to each week in one year
Number Item Number Number assigned to an item
Table 1 : Attributes & their Description
Note: The dataset has approximately 1,827,700 rows and has minimal missing values.
Data Quality and Preparation
The dataset used comes from previous student project; therefore, many data preparation
tasks have been done and the dataset has already been transformed into SAS file format.
However, after exploring the dataset, we observed some issues that may require further
considerations and adjustments before the data analysis and mining stage:
» UID is not a unique identifier, and it has no value for 2406 records.
5. May 15, 2013 [ISQS 6347 – Final Project Report]
5 | Abuelo's
Figure 1: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide)
» Purpose of Num_Item_Number attribute is unclear – the value contained is same to
that of Item_Number, but their data types are different. In addition,
Num_Item_Number has 187 missing value (but there are no missing value in
Item_Number).
6. May 15, 2013 [ISQS 6347 – Final Project Report]
6 | Abuelo's
» Output Result from SAS Enterprise Miner
Figure 2: Output Result from SAS EnterpriseMiner
» Unclear variable values:
Some Avg_Unit_Price contain 0, indicating the price of item is $0.
Some Avg_Unit_Cost contain 0 and negative value.
» There are 28 Item_Number having duplicate values but with different
Product_Description.
Table of Items that Have Same Number but Different Description (Show First Two)
Item_Number Minor_Category Product_Description
101090 Sub Cooked Taco Meat BF 2.5 oz - Sub
101090 Sub Cooked Taco Meat CK 2.5 oz - Sub
12067 Margaritas Patron Shaken Margarita
12067 Margaritas Shaken Margarita
Table 2
7. May 15, 2013 [ISQS 6347 – Final Project Report]
7 | Abuelo's
Data Exploration & Preprocessing
The tasks of preliminary data mining include data preparation, data exploration, data
model selection, and discussion of primitive findings. By performing preliminary data
mining, we are able to examine data quality and observe the issues such as missing data
and duplicated or erroneous data. The appropriate data methodologies are chosen and
applied based on nature of dataset and objective of project – to analyze the effect of valued
items on the total profit return. .
Data Preparation
The dataset, available in SAS file format, contains data and information as shown in Table 1.
Attribute Name Attribute Description
UID Unique ID representing combination of item number and store
ID
StoreID Unique ID assigned to each store
ItemNumber Number assigned to an item
MinorCategory Category of item
ProductDescription Description of item
Quantity Quantity sold for each item in different stores
AvgUnitPrice Average unit price of an item
AvgUnitCost Average unit cost of an item
GuestCount Sum of customer visits in one stores in a particular week
WeekIND Number assigned to each week in one year
NumItemNumber Number assigned to an item
Table 1: Initial Data from Dataset
Because the dataset is already cleansed and is well prepared, at this stage we focused on
data exploration and examination. We found several issues that may affect the analysis of
project. Four major issues observed are listed as followed:
UID is not a unique identifier, and 2406 of the records have no value (see Figure 1).
Purpose of NumItemNumber attribute is unclear – the value contained is same to
that of ItemNumber, but their data types are different. In addition, NumItemNumber
has 187 missing value (see Figure 2).
Unclear variable values:
o Some AvgUnitPrice contain 0, indicating the price of item is $0.
o Some AvgUnitCost contain 0 and negative value.
There are 28 ItemNumbers having duplicate values but with different
ProductDescription (see Table 2)
8. May 15, 2013 [ISQS 6347 – Final Project Report]
8 | Abuelo's
Figure 3: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide)
Figure 4: Output Result from SAS Enterprise Miner
9. May 15, 2013 [ISQS 6347 – Final Project Report]
9 | Abuelo's
Table 2: Items that Have Same Number but Different Description (Show First Two)
Preprocessing Tasks
The objective of this project is to determine whether the valued item has created any
effects on the profit generated. Therefore, we decided to add additional data
attributes,Profit, Valued_Item_Flag, and New_Item_Flag, to represent sales profit, valued
menu item, and new menu item, respectively, by combining the information of menu items.
One thing needs to be noted for the newly added attributes is that majority of data are
missing for the new item flag and valued item flag. The reason is because not all stores of
Abuelo’s participated in this research of new valued menu. Therefore, which data should be
chosen for our project analysis is a very important concern. Figure 3 below is the
screenshot of modified dataset, All_Profit_Flag. Table 3 lists the three newly added
attributes in dataset.
Figure 3: Table of Modified Dataset All_Profit_Flag
Item_Number Minor_Category Product_Description
101090 Sub Cooked Taco Meat BF 2.5 oz - Sub
101090 Sub Cooked Taco Meat CK 2.5 oz - Sub
12067 Margaritas Patron Shaken Margarita
12067 Margaritas Shaken Margarita
10. May 15, 2013 [ISQS 6347 – Final Project Report]
10 | Abuelo's
Attribute Name Attribute Description
Profit Sales profit of an item at a store during a week
NewItemFlag Flag for indicating the new menu item
ValuedItemFlag Flag for indicating the valued menu item
Table 3: Newly Added Attributes in Dataset
Data Mining Methodologies
The data mining models chosen for our project must meet two important criteria: the
nature of dataset and the objective of this business analysis project. Since our objective is
to determine whether the valued menu item increases sales profit of a store, at this
preliminary data mining stage we decided to use a Regression model to analyze the
importance of valued item in terms of profits generated. Figure 5 and 6 are variable
configuration and design of data process flow. The configuration shown in Figure 5 and 6
are subject to be changed and modified later.
Figure 5: Variable Configuration for Regression
Figure 6: Data Process Flow for Regression
Initially we only included two input variables, New_Item_Flag and Valued_Item_Flag, and
one target variable, Profit, for the regression analysis. As we mentioned earlier in report,
there are many data missing for the flags of new item and valued item. As a result, the data
11. May 15, 2013 [ISQS 6347 – Final Project Report]
11 | Abuelo's
must go through a filtering step to exclude the data rows which have no information about
new/valued item flags. Below is the result of Filter. About 90% of observations are
excluded after filtering.
Figure 7
Primitive Results and Findings
Figure 8 shows the result of Regression node. According to Type 3 Analysis of Effects, if we
only analyzed the effects of new item and valued item on the profit, new item seems to have
a significant effect on profit (Pr< .0001). On the other hand, the valued item does not have
any significant effect on the change of profit.
At this preliminary data mining stage, we concluded that regression analysis indicated that
the valued item has no significant impact on sales profit.
12. May 15, 2013 [ISQS 6347 – Final Project Report]
12 | Abuelo's
Figure 8: Output of Regression Model
13. May 15, 2013 [ISQS 6347 – Final Project Report]
13 | Abuelo's
Data Filtration & Addition of New Variables
We used Enterprise Guide to filter out the missing data and to add new variables like Profit,
New_Item_Flag&Valued_Item_Flag. Then we exported this refined dataset to use it in
Enterprise Miner.
Figure 9: Enterprise Guide showing newly introduce variables
Refined Data’s Exploration
After filtering & adding “Profit” column in the existing dataset using Enterprise Guide, we
used that dataset for further analysis. Figure 9 shows the variable settings for this dataset.
14. May 15, 2013 [ISQS 6347 – Final Project Report]
14 | Abuelo's
Figure 10
In Explore Window, Actions -> Plot, use 3D bar charts which will show dialog in Figure 10
& Figure 11 shows the same dialog enlarged.
Figure 11
15. May 15, 2013 [ISQS 6347 – Final Project Report]
15 | Abuelo's
Figure 12
Figure 12 shows 3D Bar Chart Plot with Profit as Response, year as Series
&Valued_Item_Flag as Category.
16. May 15, 2013 [ISQS 6347 – Final Project Report]
16 | Abuelo's
Figure 13
Figure 14
Figure 16 shows result of Segment Profile node with the variables settings shown in Figure
15
17. May 15, 2013 [ISQS 6347 – Final Project Report]
17 | Abuelo's
Figure 15
Figure 16