ISQS 6347 - Data & Text Mining
Spring 2013, Team 5
Project title: Data Analysis for Abuelo’s
Class number / Semester: ISQS 6347, Spring 2013 – Section 1
Student names: Preeti Prajapati, Neha Soam, Ming Kuo Hui
Type of project: Data Mining Academic Project
Nature and source of the dataset: Nature - Available in SAS file format; Source – Abuelo’s Restaurant
Completion date: May 16, 2013
May 15, 2013 [ISQS 6347 – Final Project Report]
2 | Abuelo's
Table of Contents
Introduction
Business Background
Objective
Project Overview
Dataset Availability and Description
Table 1: Attributes & their Description
Data Quality and Preparation
Table 2
Data Exploration & Preprocessing
Data Preparation
Figure 3: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide)
Figure 4: Output Result from SAS Enterprise Miner
Preprocessing Tasks
Data Mining Methodologies
Primitive Results and Findings
Data Filtration & Addition of New Variables
Refined Data’s Exploration
Introduction
The purpose of this project is to analyze a restaurant’s sales data and to build a model
that supports the restaurant’s management decisions. The restaurant examined in
this project, Abuelo’s, is a real restaurant, and all of the data collected are real. By collecting,
exploring, processing, and analyzing this real-life data with the data mining techniques we
learned in lectures, we are able to build a model that is useful and can be applied to
the restaurant’s decision making.
Business Background
Abuelo’s is a Mexican restaurant that has established stores in several cities since 1989.
Abuelo’s has consistently been on the leading edge of Mexican cuisine, combining menu
creativity, outstanding food and beverage quality, colorful plate presentations and superior
service in an impressive Mexican courtyard-themed atmosphere. Every dish is made to
order from scratch using only the freshest premium ingredients.
Objective
Abuelo’s is planning to adopt a new menu to replace the old one. The restaurant
has been conducting trials of new Value Items. A value item has a lower cost as well as a
lower profit margin than its full version (e.g., Chicken Zucchini and Chicken Zucchini
Lite). However, value items are ordered more frequently than other items. The new menu differs
from the old one in that it adds Value Items and some other new items that
are not treated as value items.
The main objective of this project is to analyze the effect of value items on total profit.
The results are expected to support decisions about which value items should
be removed from or kept on the menu.
Project Overview
Dataset Availability and Description
The data for Abuelo's are available for the years 2011 and 2012 in Excel and SAS files. The
attributes of the available data and their descriptions are listed in the table below:
Attribute Name        Attribute Description
UID                   Unique ID representing the combination of item number and store ID
Store ID              Unique ID assigned to each store
Item Number           Number assigned to an item
Minor Category        Category of the item
Product Description   Description of the item
Quantity              Quantity sold of each item in each store
Avg Unit Price        Average unit price of an item
Avg Unit Cost         Average unit cost of an item
Guest Count           Total customer visits to one store in a particular week
Week IND              Number assigned to each week in one year
Number Item Number    Number assigned to an item
Table 1: Attributes & their Description
Note: The dataset has approximately 1,827,700 rows and has minimal missing values.
Data Quality and Preparation
The dataset comes from a previous student project; therefore, many data preparation
tasks have already been done and the dataset has already been converted to SAS file format.
However, after exploring the dataset, we observed some issues that may require further
consideration and adjustment before the data analysis and mining stage:
» UID is not a unique identifier, and it has no value for 2406 records.
Figure 1: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide)
» The purpose of the Num_Item_Number attribute is unclear: its values are the same as
those of Item_Number, but the data types differ. In addition,
Num_Item_Number has 187 missing values (whereas Item_Number has no missing
values).
Figure 2: Output Result from SAS Enterprise Miner
» Unclear variable values:
  o Some Avg_Unit_Price values are 0, indicating that the price of the item is $0.
  o Some Avg_Unit_Cost values are 0 or negative.
» There are 28 Item_Number values that each appear with more than one
Product_Description.
Item_Number   Minor_Category   Product_Description
101090        Sub              Cooked Taco Meat BF 2.5 oz - Sub
101090        Sub              Cooked Taco Meat CK 2.5 oz - Sub
12067         Margaritas       Patron Shaken Margarita
12067         Margaritas       Shaken Margarita
Table 2: Items that Have the Same Number but Different Descriptions (First Two Shown)
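The data quality checks listed in this section can also be expressed as a short script. The following is a minimal sketch in Python, not the SAS Enterprise Guide steps used in the project. The function name `audit_rows`, the dictionary keys, and the sample rows are hypothetical and only mirror the attribute names in Table 1; the prices and costs in the sample are invented for illustration.

```python
from collections import Counter, defaultdict

def audit_rows(rows):
    """Count the data quality issues described above for a list of row dicts.

    Each row is assumed (hypothetically) to use the keys UID, Item_Number,
    Product_Description, Avg_Unit_Price and Avg_Unit_Cost; None marks a
    missing value.
    """
    uid_counts = Counter(r["UID"] for r in rows if r["UID"] is not None)
    descriptions = defaultdict(set)
    for r in rows:
        descriptions[r["Item_Number"]].add(r["Product_Description"])
    return {
        # Rows where UID has no value.
        "missing_uid": sum(1 for r in rows if r["UID"] is None),
        # Distinct UID values that occur more than once (UID is not unique).
        "repeated_uid": sum(1 for n in uid_counts.values() if n > 1),
        # Rows with a unit price of exactly 0.
        "zero_price": sum(1 for r in rows if r["Avg_Unit_Price"] == 0),
        # Rows with a unit cost that is 0 or negative.
        "nonpositive_cost": sum(
            1 for r in rows
            if r["Avg_Unit_Cost"] is not None and r["Avg_Unit_Cost"] <= 0
        ),
        # Item numbers that appear with more than one description.
        "items_with_multiple_descriptions": sum(
            1 for d in descriptions.values() if len(d) > 1
        ),
    }

# Illustrative rows only; these are not records from the Abuelo's dataset.
sample = [
    {"UID": "A1", "Item_Number": 12067, "Product_Description": "Patron Shaken Margarita",
     "Avg_Unit_Price": 9.5, "Avg_Unit_Cost": 2.0},
    {"UID": "A1", "Item_Number": 12067, "Product_Description": "Shaken Margarita",
     "Avg_Unit_Price": 0, "Avg_Unit_Cost": -1.0},
    {"UID": None, "Item_Number": 500, "Product_Description": "Example Item",
     "Avg_Unit_Price": 4.0, "Avg_Unit_Cost": 0},
]
report = audit_rows(sample)
```

Returning counts rather than the offending rows keeps the sketch short; collecting the rows themselves would be the natural next step before deciding how to handle each issue.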
Data Exploration & Preprocessing
Preliminary data mining consists of data preparation, data exploration, data
model selection, and discussion of preliminary findings. By performing preliminary data
mining, we are able to examine data quality and identify issues such as missing data
and duplicated or erroneous data. The appropriate data mining methodologies are chosen and
applied based on the nature of the dataset and the objective of the project: to analyze the effect of valued
items on total profit.
Data Preparation
The dataset, available in SAS file format, contains data and information as shown in Table 1.
Attribute Name        Attribute Description
UID                   Unique ID representing the combination of item number and store ID
StoreID               Unique ID assigned to each store
ItemNumber            Number assigned to an item
MinorCategory         Category of the item
ProductDescription    Description of the item
Quantity              Quantity sold of each item in each store
AvgUnitPrice          Average unit price of an item
AvgUnitCost           Average unit cost of an item
GuestCount            Total customer visits to one store in a particular week
WeekIND               Number assigned to each week in one year
NumItemNumber         Number assigned to an item
Table 1: Initial Data from Dataset
Because the dataset has already been cleansed and is well prepared, at this stage we focused on
data exploration and examination. We found several issues that may affect the analysis.
The four major issues observed are as follows:
» UID is not a unique identifier, and 2406 of the records have no value (see Figure 1).
» The purpose of the NumItemNumber attribute is unclear: its values are the same as
those of ItemNumber, but the data types differ. In addition, NumItemNumber
has 187 missing values (see Figure 2).
» Unclear variable values:
  o Some AvgUnitPrice values are 0, indicating that the price of the item is $0.
  o Some AvgUnitCost values are 0 or negative.
» There are 28 ItemNumber values that each appear with more than one
ProductDescription (see Table 2).
Figure 3: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide)
Figure 4: Output Result from SAS Enterprise Miner
Item_Number   Minor_Category   Product_Description
101090        Sub              Cooked Taco Meat BF 2.5 oz - Sub
101090        Sub              Cooked Taco Meat CK 2.5 oz - Sub
12067         Margaritas       Patron Shaken Margarita
12067         Margaritas       Shaken Margarita
Table 2: Items that Have the Same Number but Different Descriptions (First Two Shown)
Preprocessing Tasks
The objective of this project is to determine whether valued items have had any
effect on the profit generated. Therefore, we decided to add three data
attributes, Profit, Valued_Item_Flag, and New_Item_Flag, to represent sales profit, valued
menu items, and new menu items, respectively, by incorporating the menu item information.
Note that for the newly added attributes, the majority of values are
missing for the new item flag and the valued item flag, because not all
Abuelo’s stores participated in this study of the new valued menu. Therefore, which data should be
chosen for our analysis is an important concern. Figure 3 below is a
screenshot of the modified dataset, All_Profit_Flag. Table 3 lists the three newly added
attributes in the dataset.
Figure 3: Table of Modified Dataset All_Profit_Flag
Attribute Name    Attribute Description
Profit            Sales profit of an item at a store during a week
NewItemFlag       Flag indicating a new menu item
ValuedItemFlag    Flag indicating a valued menu item
Table 3: Newly Added Attributes in Dataset
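One way the three attributes in Table 3 might be derived is sketched below in Python. This is an illustration under stated assumptions, not the Enterprise Guide steps actually used. The report does not give the profit formula, so Profit = Quantity × (Avg_Unit_Price − Avg_Unit_Cost) is assumed here. The function `add_profit_and_flags`, the sets `valued_items`, `new_items`, and `trial_stores`, and all sample values are hypothetical.

```python
def add_profit_and_flags(row, valued_items, new_items, trial_stores):
    """Return a copy of row with Profit, Valued_Item_Flag and New_Item_Flag.

    Assumption: Profit = Quantity * (Avg_Unit_Price - Avg_Unit_Cost).
    Rows from stores outside the trial get None for both flags, mirroring
    the missing flag values described in the text.
    """
    out = dict(row)
    out["Profit"] = row["Quantity"] * (row["Avg_Unit_Price"] - row["Avg_Unit_Cost"])
    if row["Store_ID"] in trial_stores:
        out["Valued_Item_Flag"] = int(row["Item_Number"] in valued_items)
        out["New_Item_Flag"] = int(row["Item_Number"] in new_items)
    else:
        out["Valued_Item_Flag"] = None
        out["New_Item_Flag"] = None
    return out

# Hypothetical values for illustration only.
row = {"Store_ID": 7, "Item_Number": 301, "Quantity": 10,
       "Avg_Unit_Price": 8.0, "Avg_Unit_Cost": 3.0}
flagged = add_profit_and_flags(row, valued_items={301}, new_items=set(), trial_stores={7})
unflagged = add_profit_and_flags(row, valued_items={301}, new_items=set(), trial_stores={9})
```

Using None rather than 0 for stores outside the trial keeps "not a valued item" distinct from "flag unknown", which matters for the filtering step later in the report.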
Data Mining Methodologies
The data mining models chosen for our project must suit two criteria: the
nature of the dataset and the objective of this business analysis project. Since our objective is
to determine whether valued menu items increase the sales profit of a store, at this
preliminary data mining stage we decided to use a Regression model to analyze the
importance of valued items in terms of the profit generated. Figures 5 and 6 show the variable
configuration and the design of the data process flow. The configurations shown in Figures 5 and 6
are subject to change later.
Figure 5: Variable Configuration for Regression
Figure 6: Data Process Flow for Regression
Initially we included only two input variables, New_Item_Flag and Valued_Item_Flag, and
one target variable, Profit, in the regression analysis. As mentioned earlier in this report,
many values are missing for the new item and valued item flags. As a result, the data
must go through a filtering step to exclude the rows that have no information about the
new/valued item flags. Below is the result of the Filter step. About 90% of observations are
excluded after filtering.
Figure 7
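The filtering rule described above (drop every row that lacks flag information) can be sketched in Python as follows. This is only an illustration of the rule, not the SAS tooling used in the project. The function name `filter_flagged` and the ten sample rows are hypothetical; the sample is constructed so that nine of ten rows are excluded, to mimic the roughly 90% reported above.

```python
def filter_flagged(rows):
    """Keep rows whose two flags are both present; report the share dropped."""
    kept = [
        r for r in rows
        if r.get("Valued_Item_Flag") is not None
        and r.get("New_Item_Flag") is not None
    ]
    excluded_share = 1 - len(kept) / len(rows) if rows else 0.0
    return kept, excluded_share

# Ten illustrative rows, nine of them without flag information.
rows = [{"Valued_Item_Flag": None, "New_Item_Flag": None} for _ in range(9)]
rows.append({"Valued_Item_Flag": 1, "New_Item_Flag": 0})
kept, excluded_share = filter_flagged(rows)
```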
Primitive Results and Findings
Figure 8 shows the result of the Regression node. According to the Type 3 Analysis of Effects, when we
analyze only the effects of the new item and valued item flags on profit, the new item flag appears to have
a significant effect on profit (Pr < .0001). The valued item flag, on the other hand, does not have
a significant effect on profit.
At this preliminary data mining stage, we concluded from the regression analysis that
the valued item has no significant impact on sales profit.
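To make the model form concrete, the sketch below fits Profit = b0 + b1·New_Item_Flag + b2·Valued_Item_Flag by ordinary least squares in plain Python. It is not the Enterprise Miner Regression node and it does not compute the Type 3 tests or p-values discussed above; it only estimates coefficients. The function `ols_fit` is hypothetical, and the four data points are synthetic, generated exactly from Profit = 10 + 5·New_Item_Flag so that the expected coefficients are known in advance.

```python
def ols_fit(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of predictor rows without the intercept; a constant 1.0 is
    prepended here. Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    # Build the augmented matrix [X'X | X'y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(k)]
        + [sum(a[i] * t for a, t in zip(A, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(k):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]

# Synthetic data: columns are (New_Item_Flag, Valued_Item_Flag).
X = [(0, 0), (1, 0), (0, 1), (1, 1)]
y = [10.0, 15.0, 10.0, 15.0]  # generated as 10 + 5 * New_Item_Flag
coefficients = ols_fit(X, y)  # approximately [10.0, 5.0, 0.0]
```

In this synthetic example the coefficient on Valued_Item_Flag is zero by construction; with real data, a significance test such as the Type 3 analysis is what decides whether a nonzero estimate is meaningful.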
Figure 8: Output of Regression Model
Data Filtration & Addition of New Variables
We used Enterprise Guide to filter out the missing data and to add new variables: Profit,
New_Item_Flag, and Valued_Item_Flag. We then exported this refined dataset for use in
Enterprise Miner.
Figure 9: Enterprise Guide showing newly introduced variables
Refined Data’s Exploration
After filtering the existing dataset and adding the “Profit” column using Enterprise Guide, we
used that dataset for further analysis. Figure 9 shows the variable settings for this dataset.
Figure 10
In the Explore window, choosing Actions -> Plot and selecting a 3D bar chart opens the dialog shown in Figure 10;
Figure 11 shows the same dialog enlarged.
Figure 11
Figure 12
Figure 12 shows the 3D bar chart with Profit as Response, year as Series,
and Valued_Item_Flag as Category.
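The numbers behind a chart of this kind are a grouped aggregate of Profit by year and Valued_Item_Flag. The Python sketch below shows one such aggregation. It assumes the bars represent summed profit, which the report does not state, and the function name `profit_by_year_and_flag`, the `Year` key, and the sample values are hypothetical.

```python
from collections import defaultdict

def profit_by_year_and_flag(rows):
    """Sum Profit for each (Year, Valued_Item_Flag) pair."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["Year"], r["Valued_Item_Flag"])] += r["Profit"]
    return dict(totals)

# Illustrative rows only; these are not values from the Abuelo's dataset.
rows = [
    {"Year": 2011, "Valued_Item_Flag": 0, "Profit": 120.0},
    {"Year": 2011, "Valued_Item_Flag": 1, "Profit": 40.0},
    {"Year": 2012, "Valued_Item_Flag": 1, "Profit": 60.0},
    {"Year": 2012, "Valued_Item_Flag": 1, "Profit": 15.0},
]
totals = profit_by_year_and_flag(rows)
```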
Figure 13
Figure 14
Figure 16 shows the result of the Segment Profile node with the variable settings shown in Figure 15.
Figure 15
Figure 16

More Related Content

Similar to Isqs6347 team5 proposal_032513

Intro to Data warehousing lecture 15
Intro to Data warehousing   lecture 15Intro to Data warehousing   lecture 15
Intro to Data warehousing lecture 15
AnwarrChaudary
 
Data warehousing and business intelligence project report
Data warehousing and business intelligence project reportData warehousing and business intelligence project report
Data warehousing and business intelligence project report
sonalighai
 
How to PIVOT in SQL
How to PIVOT in SQLHow to PIVOT in SQL
How to PIVOT in SQL
Select Distinct Limited
 
SQL Tips PIVOT Function.pptx
SQL Tips PIVOT Function.pptxSQL Tips PIVOT Function.pptx
SQL Tips PIVOT Function.pptx
Select Distinct Limited
 
EDA of San Francisco Employee Compensation for Fiscal Year 2014-15
EDA of San Francisco Employee Compensation for Fiscal Year 2014-15EDA of San Francisco Employee Compensation for Fiscal Year 2014-15
EDA of San Francisco Employee Compensation for Fiscal Year 2014-15
Sagar Tupkar
 
The olap tutorial 2012
The olap tutorial 2012The olap tutorial 2012
The olap tutorial 2012
Amin Jalali
 
Data Warehouse Project Report
Data Warehouse Project Report Data Warehouse Project Report
Data Warehouse Project Report
Tom Donoghue
 
Data modelling interview question
Data modelling interview questionData modelling interview question
ETL QA
ETL QAETL QA
ETL QA
dillip kar
 
Kevin Fahy Bi Portfolio
Kevin Fahy   Bi PortfolioKevin Fahy   Bi Portfolio
Kevin Fahy Bi Portfolio
KevinPFahy
 
Business analytics and data warehousing
Business analytics and data warehousingBusiness analytics and data warehousing
Business analytics and data warehousing
Samir Majumder
 
Group13 kdd cup_report_submitted
Group13 kdd cup_report_submittedGroup13 kdd cup_report_submitted
Group13 kdd cup_report_submitted
Chamath Sajeewa
 
The ABC analysis
The ABC analysis The ABC analysis
The ABC analysis
Niyati Batish
 
Iowa liquor sales
Iowa liquor salesIowa liquor sales
Iowa liquor sales
Trushita Redij
 
Sql query analyzer & maintenance
Sql query analyzer & maintenanceSql query analyzer & maintenance
Sql query analyzer & maintenance
nspyrenet
 
Jazmine Kane Portfolio
Jazmine Kane PortfolioJazmine Kane Portfolio
Jazmine Kane Portfolio
Jazmine Kane
 
Date Analysis .pdf
Date Analysis .pdfDate Analysis .pdf
Date Analysis .pdf
ABDEL RAHMAN KARIM
 
Star schema
Star schemaStar schema
See sql server graphical execution plans in action tech republic
See sql server graphical execution plans in action   tech republicSee sql server graphical execution plans in action   tech republic
See sql server graphical execution plans in action tech republic
Kaing Menglieng
 
SQL Server 2008 Portfolio for Saumya Bhatnagar
SQL Server 2008 Portfolio for Saumya BhatnagarSQL Server 2008 Portfolio for Saumya Bhatnagar
SQL Server 2008 Portfolio for Saumya Bhatnagar
sammykb
 

Similar to Isqs6347 team5 proposal_032513 (20)

Intro to Data warehousing lecture 15
Intro to Data warehousing   lecture 15Intro to Data warehousing   lecture 15
Intro to Data warehousing lecture 15
 
Data warehousing and business intelligence project report
Data warehousing and business intelligence project reportData warehousing and business intelligence project report
Data warehousing and business intelligence project report
 
How to PIVOT in SQL
How to PIVOT in SQLHow to PIVOT in SQL
How to PIVOT in SQL
 
SQL Tips PIVOT Function.pptx
SQL Tips PIVOT Function.pptxSQL Tips PIVOT Function.pptx
SQL Tips PIVOT Function.pptx
 
EDA of San Francisco Employee Compensation for Fiscal Year 2014-15
EDA of San Francisco Employee Compensation for Fiscal Year 2014-15EDA of San Francisco Employee Compensation for Fiscal Year 2014-15
EDA of San Francisco Employee Compensation for Fiscal Year 2014-15
 
The olap tutorial 2012
The olap tutorial 2012The olap tutorial 2012
The olap tutorial 2012
 
Data Warehouse Project Report
Data Warehouse Project Report Data Warehouse Project Report
Data Warehouse Project Report
 
Data modelling interview question
Data modelling interview questionData modelling interview question
Data modelling interview question
 
ETL QA
ETL QAETL QA
ETL QA
 
Kevin Fahy Bi Portfolio
Kevin Fahy   Bi PortfolioKevin Fahy   Bi Portfolio
Kevin Fahy Bi Portfolio
 
Business analytics and data warehousing
Business analytics and data warehousingBusiness analytics and data warehousing
Business analytics and data warehousing
 
Group13 kdd cup_report_submitted
Group13 kdd cup_report_submittedGroup13 kdd cup_report_submitted
Group13 kdd cup_report_submitted
 
The ABC analysis
The ABC analysis The ABC analysis
The ABC analysis
 
Iowa liquor sales
Iowa liquor salesIowa liquor sales
Iowa liquor sales
 
Sql query analyzer & maintenance
Sql query analyzer & maintenanceSql query analyzer & maintenance
Sql query analyzer & maintenance
 
Jazmine Kane Portfolio
Jazmine Kane PortfolioJazmine Kane Portfolio
Jazmine Kane Portfolio
 
Date Analysis .pdf
Date Analysis .pdfDate Analysis .pdf
Date Analysis .pdf
 
Star schema
Star schemaStar schema
Star schema
 
See sql server graphical execution plans in action tech republic
See sql server graphical execution plans in action   tech republicSee sql server graphical execution plans in action   tech republic
See sql server graphical execution plans in action tech republic
 
SQL Server 2008 Portfolio for Saumya Bhatnagar
SQL Server 2008 Portfolio for Saumya BhatnagarSQL Server 2008 Portfolio for Saumya Bhatnagar
SQL Server 2008 Portfolio for Saumya Bhatnagar
 

Recently uploaded

Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 

Recently uploaded (20)

Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 

Isqs6347 team5 proposal_032513

  • 1. ISQS 6347- Data &Text Mining Spring 2013 Team 5 Project title Data Analysis For Abuelo’s Class number / Semester ISQS6347 Spring 2013 – Section 1 Student names Preeti Prajapati Neha Soam Ming Kuo Hui The type of this project Data Mining Academic Project The nature and source of the dataset Nature - Available in SAS file format Source – Abuelo’s Restaurant Completion date May 16 2013 2013
  • 2. May 15, 2013 [ISQS 6347 – Final Project Report] 2 | Abuelo's Table of Contents Introduction............................................................................................................................................................3 Business Background......................................................................................................................................3 Objective...............................................................................................................................................................3 Project Overview...................................................................................................................................................3 Dataset Availability and Description.........................................................................................................3 Table 1 : Attributes & their Description...................................................................................................4 Data Quality and Preparation ......................................................................................................................4 Table 2...................................................................................................................................................................6 Data Exploration & Preprocessing.................................................................................................................7 Data Preparation...............................................................................................................................................7 Figure 3: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide).8 Figure 4: Output Result from SAS Enterprise Miner...........................................................................8 Preprocessing 
Tasks........................................................................................................................................9 Data Mining Methodologies ........................................................................................................................... 10 Primitive Results and Findings..................................................................................................................... 11 Data Filtration & Addition of New Variables ...................................................................................... 13 Refined Data’s Exploration ........................................................................................................................ 13
  • 3. Introduction
    The purpose of this project is to analyze a restaurant’s sales data and to build a model that aids the restaurant’s management decisions. The restaurant examined in this project, Abuelo’s, is a real restaurant, and all data collected are real. By collecting, exploring, processing, and analyzing this real-life data with the data mining techniques we learned in lecture, we are able to build a model that is useful and can be applied to the restaurant’s decision making.
    Business Background
    Abuelo’s is a Mexican restaurant that has established stores in several cities since 1989. Abuelo’s has consistently been on the leading edge of Mexican cuisine, combining menu creativity, outstanding food and beverage quality, colorful plate presentations and superior service in an impressive Mexican courtyard-themed atmosphere. Every dish is made to order from scratch using only the freshest premium ingredients.
    Objective
    Abuelo’s is planning to adopt a new menu to replace the old one, and the restaurant has been conducting trials of new Value Items. A value item has a lower cost as well as a lower profit margin than its full version (e.g., Chicken Zucchini and Chicken Zucchini Lite), but value items are ordered more frequently than other items. The new menu differs from the old one in that it adds Value Items and some other new items that are not treated as value items. The main objective of this project is to analyze the effect of value items on total profit. The result is expected to aid decisions about which value items should be removed from the menu and which should stay.
    Project Overview
    Dataset Availability and Description
    The data for Abuelo's is available for the years 2011 and 2012 in Excel and SAS files. The attributes and descriptions of the available data are listed in the table below:
  • 4. Attribute Name        Attribute Description
    UID                   Unique ID representing combination of item number and store ID
    Store ID              Unique ID assigned to each store
    Item Number           Number assigned to an item
    Minor Category        Category of item
    Product Description   Description of item
    Quantity              Quantity sold for each item in different stores
    Avg Unit Price        Average unit price of an item
    Avg Unit Cost         Average unit cost of an item
    Guest Count           Sum of customer visits in one store in a particular week
    Week IND              Number assigned to each week in one year
    Num Item Number       Number assigned to an item
    Table 1 : Attributes & their Description
    Note: The dataset has approximately 1,827,700 rows and has minimal missing values.
    Data Quality and Preparation
    The dataset comes from a previous student project; therefore, many data preparation tasks have already been done and the dataset has already been transformed into SAS file format. However, after exploring the dataset, we observed some issues that may require further consideration and adjustment before the data analysis and mining stage:
    » UID is not a unique identifier, and it has no value for 2406 records.
  • 5. Figure 1: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide)
    » The purpose of the Num_Item_Number attribute is unclear: its values are the same as those of Item_Number, but the two attributes have different data types. In addition, Num_Item_Number has 187 missing values, whereas Item_Number has none.
  • 6. Figure 2: Output Result from SAS Enterprise Miner
    » Unclear variable values: some Avg_Unit_Price values are 0, indicating that the price of the item is $0. Some Avg_Unit_Cost values are 0 or negative.
    » There are 28 Item_Number values that appear more than once with different Product_Description values.
    Table of Items that Have Same Number but Different Description (Show First Two)
    Item_Number   Minor_Category   Product_Description
    101090        Sub              Cooked Taco Meat BF 2.5 oz - Sub
    101090        Sub              Cooked Taco Meat CK 2.5 oz - Sub
    12067         Margaritas       Patron Shaken Margarita
    12067         Margaritas       Shaken Margarita
    Table 2
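The checks behind these observations were run in SAS Enterprise Guide and Enterprise Miner. As a rough illustration only, the same kinds of checks (missing values, non-unique identifiers, zero or negative amounts) could be sketched in Python as below. The sample rows, the UID format, and the helper names are hypothetical and are not taken from the Abuelo's dataset.

```python
from collections import Counter

# Hypothetical sample rows for illustration only; not real Abuelo's data.
rows = [
    {"UID": "1-12067", "Num_Item_Number": 12067, "Avg_Unit_Price": 8.5, "Avg_Unit_Cost": 2.0},
    {"UID": "1-12067", "Num_Item_Number": None, "Avg_Unit_Price": 0.0, "Avg_Unit_Cost": 2.0},
    {"UID": None, "Num_Item_Number": 101090, "Avg_Unit_Price": 1.0, "Avg_Unit_Cost": -0.5},
    {"UID": "2-101090", "Num_Item_Number": 101090, "Avg_Unit_Price": 1.0, "Avg_Unit_Cost": 0.0},
]


def is_missing(value):
    """Treat None and the empty string as missing."""
    return value is None or value == ""


def missing_count(rows, column):
    """Count rows where the column has no value."""
    return sum(1 for r in rows if is_missing(r.get(column)))


def duplicated_values(rows, column):
    """Return the non-missing values that occur more than once, with their counts."""
    counts = Counter(r[column] for r in rows if not is_missing(r.get(column)))
    return {value: n for value, n in counts.items() if n > 1}


def non_positive_rows(rows, column):
    """Return rows whose value in the column is zero or negative."""
    return [r for r in rows if not is_missing(r.get(column)) and r[column] <= 0]


print("Missing UID:", missing_count(rows, "UID"))
print("Missing Num_Item_Number:", missing_count(rows, "Num_Item_Number"))
print("Duplicated UID values:", duplicated_values(rows, "UID"))
print("Rows with price <= 0:", len(non_positive_rows(rows, "Avg_Unit_Price")))
print("Rows with cost <= 0:", len(non_positive_rows(rows, "Avg_Unit_Cost")))
```

On the real dataset, the same functions would be applied to the full table rather than to a hand-written list.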
  • 7. Data Exploration & Preprocessing
    The tasks of preliminary data mining include data preparation, data exploration, data model selection, and discussion of preliminary findings. By performing preliminary data mining, we are able to examine data quality and observe issues such as missing, duplicated, or erroneous data. The appropriate methodologies are chosen and applied based on the nature of the dataset and the objective of the project: to analyze the effect of valued items on total profit.
    Data Preparation
    The dataset, available in SAS file format, contains the data and information shown in Table 1.
    Attribute Name       Attribute Description
    UID                  Unique ID representing combination of item number and store ID
    StoreID              Unique ID assigned to each store
    ItemNumber           Number assigned to an item
    MinorCategory        Category of item
    ProductDescription   Description of item
    Quantity             Quantity sold for each item in different stores
    AvgUnitPrice         Average unit price of an item
    AvgUnitCost          Average unit cost of an item
    GuestCount           Sum of customer visits in one store in a particular week
    WeekIND              Number assigned to each week in one year
    NumItemNumber        Number assigned to an item
    Table 1: Initial Data from Dataset
    Because the dataset is already cleansed and well prepared, at this stage we focused on data exploration and examination. We found several issues that may affect the analysis. The four major issues observed are as follows:
    » UID is not a unique identifier, and 2406 of the records have no UID value (see Figure 1).
    » The purpose of the NumItemNumber attribute is unclear: its values are the same as those of ItemNumber, but the data types are different. In addition, NumItemNumber has 187 missing values (see Figure 2).
    » Unclear variable values:
      o Some AvgUnitPrice values are 0, indicating that the price of the item is $0.
      o Some AvgUnitCost values are 0 or negative.
    » There are 28 ItemNumber values that appear more than once with different ProductDescription values (see Table 2).
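The last issue, item numbers that map to more than one description, can be found by grouping descriptions per item number. The sketch below is only an illustration in Python, not the SAS procedure used in the project. The first four records are the examples listed in Table 2; the item number 99999 and its category are hypothetical and stand in for a consistent item.

```python
from collections import defaultdict

# (Item_Number, Minor_Category, Product_Description)
records = [
    ("101090", "Sub", "Cooked Taco Meat BF 2.5 oz - Sub"),
    ("101090", "Sub", "Cooked Taco Meat CK 2.5 oz - Sub"),
    ("12067", "Margaritas", "Patron Shaken Margarita"),
    ("12067", "Margaritas", "Shaken Margarita"),
    # Hypothetical consistent item, added only to show that it is not reported.
    ("99999", "Hypothetical", "Consistent Item"),
    ("99999", "Hypothetical", "Consistent Item"),
]


def conflicting_items(records):
    """Map each item number with more than one distinct description to those descriptions."""
    descriptions = defaultdict(set)
    for item_number, _category, description in records:
        descriptions[item_number].add(description)
    return {
        item_number: sorted(found)
        for item_number, found in descriptions.items()
        if len(found) > 1
    }


conflicts = conflicting_items(records)
for item_number, found in conflicts.items():
    print(item_number, found)
```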
  • 8. Figure 3: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide) Figure 4: Output Result from SAS Enterprise Miner
  • 9. Item_Number   Minor_Category   Product_Description
    101090        Sub              Cooked Taco Meat BF 2.5 oz - Sub
    101090        Sub              Cooked Taco Meat CK 2.5 oz - Sub
    12067         Margaritas       Patron Shaken Margarita
    12067         Margaritas       Shaken Margarita
    Table 2: Items that Have Same Number but Different Description (Show First Two)
    Preprocessing Tasks
    The objective of this project is to determine whether the valued items have had any effect on the profit generated. Therefore, we decided to add three data attributes, Profit, Valued_Item_Flag, and New_Item_Flag, to represent sales profit, valued menu item, and new menu item, respectively, by combining the dataset with the information on menu items. Note that the majority of values are missing for the new item flag and the valued item flag, because not all Abuelo’s stores participated in the trial of the new valued menu. Which data should be chosen for the analysis is therefore an important concern. Figure 3 below is a screenshot of the modified dataset, All_Profit_Flag. Table 3 lists the three newly added attributes.
    Figure 3: Table of Modified Dataset All_Profit_Flag
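The report does not state how Profit was computed or how the flags were assigned. A minimal sketch, assuming that profit is quantity times the difference between average unit price and average unit cost, and that flags are left missing for stores outside the trial, might look like this in Python. The function name, the store and item identifiers, and the lookup sets are all hypothetical.

```python
def add_profit_and_flags(row, value_items, new_items, trial_stores):
    """Return a copy of the row with Profit, Valued_Item_Flag and New_Item_Flag added.

    Assumption: Profit = Quantity * (Avg_Unit_Price - Avg_Unit_Cost).
    Assumption: flags are None (missing) for stores that did not take part in the trial.
    """
    profit = row["Quantity"] * (row["Avg_Unit_Price"] - row["Avg_Unit_Cost"])
    in_trial = row["Store_ID"] in trial_stores
    valued = (1 if row["Item_Number"] in value_items else 0) if in_trial else None
    new = (1 if row["Item_Number"] in new_items else 0) if in_trial else None
    return {**row, "Profit": profit, "Valued_Item_Flag": valued, "New_Item_Flag": new}


# Hypothetical lookups and rows, for illustration only.
value_items = {"V1"}
new_items = {"V1", "N1"}
trial_stores = {"S1"}

trial_row = {"Store_ID": "S1", "Item_Number": "V1", "Quantity": 10,
             "Avg_Unit_Price": 8.0, "Avg_Unit_Cost": 3.0}
other_row = {"Store_ID": "S2", "Item_Number": "V1", "Quantity": 4,
             "Avg_Unit_Price": 8.0, "Avg_Unit_Cost": 3.0}

print(add_profit_and_flags(trial_row, value_items, new_items, trial_stores))
print(add_profit_and_flags(other_row, value_items, new_items, trial_stores))
```

Leaving the flags missing rather than setting them to 0 for non-participating stores keeps "not a value item" distinct from "not observed", which matters for the later filtering step.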
  • 10. Attribute Name    Attribute Description
    Profit            Sales profit of an item at a store during a week
    NewItemFlag       Flag for indicating the new menu item
    ValuedItemFlag    Flag for indicating the valued menu item
    Table 3: Newly Added Attributes in Dataset
    Data Mining Methodologies
    The data mining models chosen for our project must suit both the nature of the dataset and the objective of this business analysis project. Since our objective is to determine whether the valued menu items increase the sales profit of a store, at this preliminary data mining stage we decided to use a Regression model to analyze the importance of valued items in terms of profit generated. Figures 5 and 6 show the variable configuration and the design of the data process flow. The configuration shown in Figures 5 and 6 is subject to change later.
    Figure 5: Variable Configuration for Regression
    Figure 6: Data Process Flow for Regression
    Initially we included only two input variables, New_Item_Flag and Valued_Item_Flag, and one target variable, Profit, in the regression analysis. As mentioned earlier in the report, many values are missing for the new item and valued item flags. As a result, the data must go through a filtering step to exclude the rows that have no information about the new/valued item flags.
  • 11. Below is the result of the Filter node. About 90% of observations are excluded after filtering.
    Figure 7
    Primitive Results and Findings
    Figure 8 shows the result of the Regression node. According to the Type 3 Analysis of Effects, when only the effects of the new item and valued item flags on profit are analyzed, the new item flag appears to have a significant effect on profit (Pr < .0001). The valued item flag, on the other hand, does not have a significant effect on profit. At this preliminary data mining stage, we concluded from the regression analysis that the valued item has no significant impact on sales profit.
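The filter and regression steps were carried out with SAS Enterprise Miner nodes. Purely as an illustration of the idea, the sketch below drops rows with missing flags and then fits an ordinary least squares model of Profit on the two flags by solving the normal equations. It does not compute the Type 3 tests or p-values reported by the Regression node, and the sample rows are hypothetical, constructed so that only the new item flag changes profit.

```python
def keep_flagged(rows):
    """Keep only rows where both flags are present."""
    return [r for r in rows
            if r["New_Item_Flag"] is not None and r["Valued_Item_Flag"] is not None]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular (for example, if the
    two flags are perfectly collinear).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows):
    """Return [intercept, new_item_coef, valued_item_coef] for Profit."""
    design = [[1.0, float(r["New_Item_Flag"]), float(r["Valued_Item_Flag"])] for r in rows]
    target = [r["Profit"] for r in rows]
    k = 3
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(design, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical rows: Profit = 10 + 5 * New_Item_Flag, with no valued-item effect.
rows = [
    {"New_Item_Flag": 0, "Valued_Item_Flag": 0, "Profit": 10.0},
    {"New_Item_Flag": 1, "Valued_Item_Flag": 0, "Profit": 15.0},
    {"New_Item_Flag": 0, "Valued_Item_Flag": 1, "Profit": 10.0},
    {"New_Item_Flag": 1, "Valued_Item_Flag": 1, "Profit": 15.0},
    {"New_Item_Flag": None, "Valued_Item_Flag": None, "Profit": 7.0},
    {"New_Item_Flag": None, "Valued_Item_Flag": None, "Profit": 9.0},
]

kept = keep_flagged(rows)
excluded = len(rows) - len(kept)
coefficients = fit_ols(kept)
print("Excluded rows:", excluded)
print("Intercept, new item, valued item:", coefficients)
```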
  • 12. Figure 8: Output of Regression Model
  • 13. Data Filtration & Addition of New Variables
    We used Enterprise Guide to filter out the missing data and to add new variables: Profit, New_Item_Flag, and Valued_Item_Flag. We then exported this refined dataset for use in Enterprise Miner.
    Figure 9: Enterprise Guide showing newly introduced variables
    Refined Data’s Exploration
    After filtering the existing dataset and adding the “Profit” column in Enterprise Guide, we used the resulting dataset for further analysis. Figure 9 shows the variable settings for this dataset.
  • 14. Figure 10
    In the Explore window, choose Actions -> Plot and select a 3D bar chart, which opens the dialog shown in Figure 10. Figure 11 shows the same dialog enlarged.
    Figure 11
  • 15. Figure 12
    Figure 12 shows the 3D bar chart with Profit as Response, year as Series, and Valued_Item_Flag as Category.
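The chart in Figure 12 was produced in the Enterprise Miner Explore window. The numbers behind such a chart amount to profit aggregated per year and per flag value. The sketch below illustrates that aggregation in Python, assuming the bars show summed profit (the report does not say whether the chart uses a sum or a mean). The column name Year and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows for illustration only.
rows = [
    {"Year": 2011, "Valued_Item_Flag": 0, "Profit": 40.0},
    {"Year": 2011, "Valued_Item_Flag": 1, "Profit": 10.0},
    {"Year": 2012, "Valued_Item_Flag": 0, "Profit": 30.0},
    {"Year": 2012, "Valued_Item_Flag": 1, "Profit": 20.0},
    {"Year": 2012, "Valued_Item_Flag": 1, "Profit": 5.0},
]


def profit_by_year_and_flag(rows):
    """Sum Profit for each (Year, Valued_Item_Flag) pair."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["Year"], r["Valued_Item_Flag"])] += r["Profit"]
    return dict(totals)


totals = profit_by_year_and_flag(rows)
for (year, flag), profit in sorted(totals.items()):
    print(year, flag, profit)
```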
  • 16. Figure 13
    Figure 14
    Figure 16 shows the result of the Segment Profile node with the variable settings shown in Figure 15.
  • 17. Figure 15 Figure 16