ISQS 6347 - Data & Text Mining
Spring 2013, Team 5
Project title: Data Analysis for Abuelo’s
Class number / Semester: ISQS 6347, Spring 2013, Section 1
Student names: Preeti Prajapati, Neha Soam, Ming Kuo Hui
Type of project: Data Mining Academic Project
Nature of the dataset: Available in SAS file format
Source of the dataset: Abuelo’s Restaurant
Completion date: May 16, 2013
May 15, 2013 [ISQS 6347 – Final Project Report]
2 | Abuelo's
Table of Contents
Introduction
  Business Background
  Objective
Project Overview
  Dataset Availability and Description
  Table 1: Attributes & their Description
  Data Quality and Preparation
  Table 2
Data Exploration & Preprocessing
  Data Preparation
  Figure 3: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide)
  Figure 4: Output Result from SAS Enterprise Miner
  Preprocessing Tasks
Data Mining Methodologies
Primitive Results and Findings
  Data Filtration & Addition of New Variables
  Refined Data’s Exploration
Introduction
The purpose of this project is to analyze a restaurant’s sales data and to build a model
that supports the restaurant’s management decisions. The restaurant examined in
this project, Abuelo’s, is a real restaurant, and all of the data collected are real. By collecting,
exploring, processing, and analyzing real-life data with the data mining techniques we
learned in lecture, we are able to build a model that is useful and can be applied to the
restaurant’s decision making.
Business Background
Abuelo’s is a Mexican restaurant that has established stores in several cities since 1989.
Abuelo’s has consistently been on the leading edge of Mexican cuisine, combining menu
creativity, outstanding food and beverage quality, colorful plate presentations and superior
service in an impressive Mexican courtyard-themed atmosphere. Every dish is made to
order from scratch using only the freshest premium ingredients.
Objective
Abuelo’s is planning to adopt a new menu to replace the old one. The restaurant
has been conducting trials of new Value Items. A value item has a lower cost as well as a
lower profit margin than its full version (e.g. Chicken Zucchini and Chicken Zucchini
Lite), but value items are ordered more frequently than other items. The new menu differs
from the old one in that it adds Value Items and some other new items that
are not treated as value items.
The main objective of this project is to analyze the effect of value items on total profit.
The result of this project is expected to support decisions about which value items should
be removed from or kept on the menu.
Project Overview
Dataset Availability and Description
The data for Abuelo's is available for the years 2011 and 2012 in Excel and SAS files. The
attributes of the available data and their descriptions are listed in the table below:
Attribute Name        Attribute Description
UID                   Unique ID representing the combination of item number and store ID
Store ID              Unique ID assigned to each store
Item Number           Number assigned to an item
Minor Category        Category of item
Product Description   Description of item
Quantity              Quantity sold for each item in different stores
Avg Unit Price        Average unit price of an item
Avg Unit Cost         Average unit cost of an item
Guest Count           Sum of customer visits to one store in a particular week
Week IND              Number assigned to each week in one year
Number Item Number    Number assigned to an item
Table 1: Attributes & their Description
Note: The dataset has approximately 1,827,700 rows and has minimal missing values.
Data Quality and Preparation
The dataset comes from a previous student project; therefore, many data preparation
tasks have already been done and the dataset has already been converted to SAS file format.
However, after exploring the dataset, we observed some issues that may require further
consideration and adjustment before the data analysis and mining stage:
» UID is not a unique identifier, and it has no value for 2406 records.
Figure 1: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide)
» The purpose of the Num_Item_Number attribute is unclear: it contains the same values as
Item_Number, but the two attributes have different data types. In addition,
Num_Item_Number has 187 missing values (whereas Item_Number has none).
Figure 2: Output Result from SAS Enterprise Miner
» Unclear variable values:
  o Some Avg_Unit_Price values are 0, indicating that the price of the item is $0.
  o Some Avg_Unit_Cost values are 0 or negative.
» 28 Item_Number values are shared by records that have different
Product_Description values.
Item_Number   Minor_Category   Product_Description
101090        Sub              Cooked Taco Meat BF 2.5 oz - Sub
101090        Sub              Cooked Taco Meat CK 2.5 oz - Sub
12067         Margaritas       Patron Shaken Margarita
12067         Margaritas       Shaken Margarita
Table 2: Items that Have the Same Number but Different Descriptions (First Two Shown)
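The checks listed above were run in SAS Enterprise Guide and Enterprise Miner. As a rough illustration of the same checks outside SAS, the Python sketch below counts missing and duplicated UIDs, missing Num_Item_Number values, zero prices, non-positive costs, and item numbers that carry more than one description. The sample rows, the UID format, and the function name quality_report are hypothetical; only the field names and the four item descriptions come from the report.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows. Field names follow Table 1; the UIDs, prices,
# and costs are invented for illustration only.
rows = [
    {"UID": "12067-1", "Item_Number": "12067", "Num_Item_Number": 12067,
     "Product_Description": "Patron Shaken Margarita",
     "Avg_Unit_Price": 9.5, "Avg_Unit_Cost": 2.1},
    {"UID": "12067-1", "Item_Number": "12067", "Num_Item_Number": 12067,
     "Product_Description": "Shaken Margarita",
     "Avg_Unit_Price": 8.0, "Avg_Unit_Cost": 1.9},
    {"UID": None, "Item_Number": "101090", "Num_Item_Number": None,
     "Product_Description": "Cooked Taco Meat BF 2.5 oz - Sub",
     "Avg_Unit_Price": 0.0, "Avg_Unit_Cost": 0.0},
    {"UID": "101090-2", "Item_Number": "101090", "Num_Item_Number": 101090,
     "Product_Description": "Cooked Taco Meat CK 2.5 oz - Sub",
     "Avg_Unit_Price": 1.5, "Avg_Unit_Cost": -0.2},
]


def quality_report(rows):
    """Summarize the data quality issues described in the report."""
    # Count how often each non-missing UID occurs.
    uid_counts = Counter(r["UID"] for r in rows if r["UID"])
    # Collect the distinct descriptions seen for each item number.
    descriptions = defaultdict(set)
    for r in rows:
        descriptions[r["Item_Number"]].add(r["Product_Description"])
    return {
        "missing_uid": sum(1 for r in rows if not r["UID"]),
        "duplicate_uid": sorted(u for u, n in uid_counts.items() if n > 1),
        "missing_num_item_number": sum(
            1 for r in rows if r["Num_Item_Number"] is None),
        "zero_price": sum(1 for r in rows if r["Avg_Unit_Price"] == 0),
        "nonpositive_cost": sum(1 for r in rows if r["Avg_Unit_Cost"] <= 0),
        "ambiguous_items": sorted(
            i for i, d in descriptions.items() if len(d) > 1),
    }


report = quality_report(rows)
for name, value in report.items():
    print(name, value)
```

On the real dataset, the same counts would be expected to match the figures quoted above (2406 missing UIDs, 187 missing Num_Item_Number values, 28 ambiguous item numbers), provided the fields are loaded with the same names and types.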
Data Exploration & Preprocessing
Preliminary data mining includes data preparation, data exploration,
model selection, and discussion of preliminary findings. By performing preliminary data
mining, we are able to examine data quality and identify issues such as missing,
duplicated, or erroneous data. The appropriate methodologies are chosen and
applied based on the nature of the dataset and the objective of the project: to analyze the
effect of valued items on total profit.
Data Preparation
The dataset, available in SAS file format, contains the attributes shown in Table 1.
Attribute Name       Attribute Description
UID                  Unique ID representing the combination of item number and store ID
StoreID              Unique ID assigned to each store
ItemNumber           Number assigned to an item
MinorCategory        Category of item
ProductDescription   Description of item
Quantity             Quantity sold for each item in different stores
AvgUnitPrice         Average unit price of an item
AvgUnitCost          Average unit cost of an item
GuestCount           Sum of customer visits to one store in a particular week
WeekIND              Number assigned to each week in one year
NumItemNumber        Number assigned to an item
Table 1: Initial Data from Dataset
Because the dataset had already been cleansed and prepared, at this stage we focused on
data exploration and examination. We found several issues that may affect the analysis.
The four major issues observed are as follows:
» UID is not a unique identifier, and 2406 of the records have no value (see Figure 1).
» The purpose of the NumItemNumber attribute is unclear: it contains the same values as
ItemNumber, but the two attributes have different data types. In addition, NumItemNumber
has 187 missing values (see Figure 2).
» Unclear variable values:
  o Some AvgUnitPrice values are 0, indicating that the price of the item is $0.
  o Some AvgUnitCost values are 0 or negative.
» 28 ItemNumber values are shared by records that have different
ProductDescription values (see Table 2).
Figure 3: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide)
Figure 4: Output Result from SAS Enterprise Miner
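Item_Number   Minor_Category   Product_Description
101090        Sub              Cooked Taco Meat BF 2.5 oz - Sub
101090        Sub              Cooked Taco Meat CK 2.5 oz - Sub
12067         Margaritas       Patron Shaken Margarita
12067         Margaritas       Shaken Margarita
Table 2: Items that Have the Same Number but Different Descriptions (First Two Shown)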
Preprocessing Tasks
The objective of this project is to determine whether valued items have had any
effect on the profit generated. Therefore, we decided to add three data
attributes, Profit, Valued_Item_Flag, and New_Item_Flag, to represent sales profit, valued
menu item, and new menu item, respectively, by combining the information on menu items.
Note that for the newly added attributes, the majority of values are
missing for the new item flag and the valued item flag, because not all Abuelo’s stores
participated in the trial of the new valued menu. Therefore, which data should be
chosen for our analysis is a very important concern. Figure 3 below is a
screenshot of the modified dataset, All_Profit_Flag. Table 3 lists the three newly added
attributes in the dataset.
Figure 3: Table of Modified Dataset All_Profit_Flag
Attribute Name   Attribute Description
Profit           Sales profit of an item at a store during a week
NewItemFlag      Flag indicating a new menu item
ValuedItemFlag   Flag indicating a valued menu item
Table 3: Newly Added Attributes in Dataset
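The report does not state how Profit was computed or how the flags were assigned. The Python sketch below shows one plausible derivation under two stated assumptions: that profit is quantity times the difference between average unit price and average unit cost, and that flags are left missing for stores that did not take part in the trial. The function name, the Store_ID and Item_Number values, and the three lookup sets are hypothetical.

```python
def add_profit_and_flags(row, valued_items, new_items, trial_stores):
    """Return a copy of row with Profit and the two flags added.

    Assumption (not stated in the report): Profit is computed as
    Quantity * (Avg_Unit_Price - Avg_Unit_Cost).
    Assumption: rows from stores outside the trial get missing (None) flags.
    """
    out = dict(row)
    out["Profit"] = row["Quantity"] * (
        row["Avg_Unit_Price"] - row["Avg_Unit_Cost"])
    if row["Store_ID"] in trial_stores:
        out["Valued_Item_Flag"] = int(row["Item_Number"] in valued_items)
        out["New_Item_Flag"] = int(row["Item_Number"] in new_items)
    else:
        out["Valued_Item_Flag"] = None
        out["New_Item_Flag"] = None
    return out


# Hypothetical lookup sets and rows, for illustration only.
valued_items = {"2001"}
new_items = {"2001", "2002"}
trial_stores = {"S01"}

in_trial = add_profit_and_flags(
    {"Store_ID": "S01", "Item_Number": "2001", "Quantity": 10,
     "Avg_Unit_Price": 8.0, "Avg_Unit_Cost": 3.0},
    valued_items, new_items, trial_stores)
not_in_trial = add_profit_and_flags(
    {"Store_ID": "S02", "Item_Number": "2001", "Quantity": 4,
     "Avg_Unit_Price": 8.0, "Avg_Unit_Cost": 3.0},
    valued_items, new_items, trial_stores)
print(in_trial["Profit"], in_trial["Valued_Item_Flag"],
      not_in_trial["Valued_Item_Flag"])
```

Keeping the flags as missing rather than 0 for non-participating stores matters for the later analysis: a 0 would wrongly count those rows as "not a valued item" instead of "unknown".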
Data Mining Methodologies
The data mining models chosen for our project must fit two important criteria: the
nature of the dataset and the objective of this business analysis project. Since our objective is
to determine whether valued menu items increase the sales profit of a store, at this
preliminary data mining stage we decided to use a Regression model to analyze the
importance of valued items in terms of profit generated. Figures 5 and 6 show the variable
configuration and the design of the data process flow. The configuration shown in Figures 5 and 6
is subject to change later.
Figure 5: Variable Configuration for Regression
Figure 6: Data Process Flow for Regression
Initially we included only two input variables, New_Item_Flag and Valued_Item_Flag, and
one target variable, Profit, in the regression analysis. As mentioned earlier in the report,
many values are missing for the new item and valued item flags. As a result, the data
must go through a filtering step to exclude the rows that have no information about the
new/valued item flags. Figure 7 below shows the result of Filter. About 90% of observations are
excluded after filtering.
Figure 7
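The filtering step described above can be sketched in Python as follows. This is only an illustration of the idea, not the Enterprise Miner Filter configuration used in the project; the function name filter_flagged and the sample rows are hypothetical, and missing flags are assumed to be represented as None.

```python
def filter_flagged(rows):
    """Keep rows where both flags are present; report the excluded share."""
    kept = [
        r for r in rows
        if r["Valued_Item_Flag"] is not None
        and r["New_Item_Flag"] is not None
    ]
    excluded_share = 1 - len(kept) / len(rows) if rows else 0.0
    return kept, excluded_share


# Hypothetical data: 1 of 10 rows comes from a store in the trial.
sample = [{"Valued_Item_Flag": None, "New_Item_Flag": None}] * 9
sample = sample + [{"Valued_Item_Flag": 1, "New_Item_Flag": 0}]

kept, excluded_share = filter_flagged(sample)
print(len(kept), round(excluded_share, 2))
```

Reporting the excluded share alongside the filtered rows makes it easy to confirm how much of the original data the later models actually see.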
Primitive Results and Findings
Figure 8 shows the result of the Regression node. According to the Type 3 Analysis of Effects, when we
analyze only the effects of new item and valued item on profit, new item appears to have
a significant effect on profit (Pr < .0001). The valued item, on the other hand, does not have
a significant effect on profit.
At this preliminary data mining stage, we concluded from the regression analysis that
the valued item has no significant impact on sales profit.
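To make the structure of this model concrete, the Python sketch below fits Profit = b0 + b1 * New_Item_Flag + b2 * Valued_Item_Flag by ordinary least squares and reports a t-statistic for each coefficient. It is not the Enterprise Miner Regression node and does not reproduce the Type 3 output in Figure 8; for a single binary input such as these flags, the squared t-statistic plays the same role as the Type 3 F statistic. The data values and the function names solve and fit_ols are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows):
    """Return (coefficients, t-statistics) for intercept, new, valued."""
    X = [[1.0, r["New_Item_Flag"], r["Valued_Item_Flag"]] for r in rows]
    y = [r["Profit"] for r in rows]
    k = 3
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [t - sum(b * v for b, v in zip(beta, x)) for x, t in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (len(rows) - k)
    t_stats = []
    for i in range(k):
        # The i-th diagonal entry of (X'X)^-1 gives the coefficient variance.
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        var_i = sigma2 * solve(xtx, unit)[i]
        t_stats.append(beta[i] / math.sqrt(var_i))
    return beta, t_stats


# Hypothetical data: new items earn much more, valued items barely differ.
data = [
    {"New_Item_Flag": n, "Valued_Item_Flag": v, "Profit": p}
    for n, v, p in [(0, 0, 10), (0, 0, 12), (1, 0, 20), (1, 0, 22),
                    (0, 1, 11), (0, 1, 13), (1, 1, 21), (1, 1, 23)]
]
beta, t_stats = fit_ols(data)
print([round(b, 3) for b in beta], [round(t, 3) for t in t_stats])
```

In this invented example the new item coefficient has a much larger t-statistic than the valued item coefficient, which mirrors the pattern the report describes; the actual magnitudes in Figure 8 are not reproduced here.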
Figure 8: Output of Regression Model
Data Filtration & Addition of New Variables
We used Enterprise Guide to filter out the missing data and to add the new variables Profit,
New_Item_Flag, and Valued_Item_Flag. We then exported the refined dataset for use in
Enterprise Miner.
Figure 9: Enterprise Guide showing newly introduced variables
Refined Data’s Exploration
After filtering the existing dataset and adding the “Profit” column using Enterprise Guide, we
used that dataset for further analysis. Figure 9 shows the variable settings for this dataset.
Figure 10
In the Explore window, choosing Actions -> Plot and selecting a 3D bar chart opens the dialog
shown in Figure 10; Figure 11 shows the same dialog enlarged.
Figure 11
Figure 12
Figure 12 shows the 3D bar chart with Profit as Response, year as Series,
and Valued_Item_Flag as Category.
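The numbers behind a chart of this kind can be sketched as a simple grouped aggregation. The Python example below totals Profit for each combination of year and Valued_Item_Flag. Whether the chart in Figure 12 plots sums or means is not stated in the report, so the use of sums here is an assumption, as are the Year field name, the function name, and the sample values.

```python
from collections import defaultdict


def profit_by_year_and_flag(rows):
    """Total Profit for each (Year, Valued_Item_Flag) pair."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["Year"], r["Valued_Item_Flag"])] += r["Profit"]
    return dict(totals)


# Hypothetical rows, for illustration only.
sales = [
    {"Year": 2011, "Valued_Item_Flag": 0, "Profit": 40.0},
    {"Year": 2011, "Valued_Item_Flag": 1, "Profit": 15.0},
    {"Year": 2012, "Valued_Item_Flag": 0, "Profit": 35.0},
    {"Year": 2012, "Valued_Item_Flag": 1, "Profit": 25.0},
    {"Year": 2012, "Valued_Item_Flag": 1, "Profit": 5.0},
]

totals = profit_by_year_and_flag(sales)
for key in sorted(totals):
    print(key, totals[key])
```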
Figure 13
Figure 14
Figure 16 shows the result of the Segment Profile node with the variable settings shown in
Figure 15.
Figure 15
Figure 16

More Related Content

Similar to Isqs6347 team5 proposal_032513

Intro to Data warehousing lecture 15
Intro to Data warehousing   lecture 15Intro to Data warehousing   lecture 15
Intro to Data warehousing lecture 15AnwarrChaudary
 
Data warehousing and business intelligence project report
Data warehousing and business intelligence project reportData warehousing and business intelligence project report
Data warehousing and business intelligence project reportsonalighai
 
EDA of San Francisco Employee Compensation for Fiscal Year 2014-15
EDA of San Francisco Employee Compensation for Fiscal Year 2014-15EDA of San Francisco Employee Compensation for Fiscal Year 2014-15
EDA of San Francisco Employee Compensation for Fiscal Year 2014-15Sagar Tupkar
 
The olap tutorial 2012
The olap tutorial 2012The olap tutorial 2012
The olap tutorial 2012Amin Jalali
 
Data Warehouse Project Report
Data Warehouse Project Report Data Warehouse Project Report
Data Warehouse Project Report Tom Donoghue
 
Kevin Fahy Bi Portfolio
Kevin Fahy   Bi PortfolioKevin Fahy   Bi Portfolio
Kevin Fahy Bi PortfolioKevinPFahy
 
Business analytics and data warehousing
Business analytics and data warehousingBusiness analytics and data warehousing
Business analytics and data warehousingSamir Majumder
 
Group13 kdd cup_report_submitted
Group13 kdd cup_report_submittedGroup13 kdd cup_report_submitted
Group13 kdd cup_report_submittedChamath Sajeewa
 
Sql query analyzer & maintenance
Sql query analyzer & maintenanceSql query analyzer & maintenance
Sql query analyzer & maintenancenspyrenet
 
Jazmine Kane Portfolio
Jazmine Kane PortfolioJazmine Kane Portfolio
Jazmine Kane PortfolioJazmine Kane
 
See sql server graphical execution plans in action tech republic
See sql server graphical execution plans in action   tech republicSee sql server graphical execution plans in action   tech republic
See sql server graphical execution plans in action tech republicKaing Menglieng
 
SQL Server 2008 Portfolio for Saumya Bhatnagar
SQL Server 2008 Portfolio for Saumya BhatnagarSQL Server 2008 Portfolio for Saumya Bhatnagar
SQL Server 2008 Portfolio for Saumya Bhatnagarsammykb
 

Similar to Isqs6347 team5 proposal_032513 (20)

Intro to Data warehousing lecture 15
Intro to Data warehousing   lecture 15Intro to Data warehousing   lecture 15
Intro to Data warehousing lecture 15
 
Data warehousing and business intelligence project report
Data warehousing and business intelligence project reportData warehousing and business intelligence project report
Data warehousing and business intelligence project report
 
How to PIVOT in SQL
How to PIVOT in SQLHow to PIVOT in SQL
How to PIVOT in SQL
 
SQL Tips PIVOT Function.pptx
SQL Tips PIVOT Function.pptxSQL Tips PIVOT Function.pptx
SQL Tips PIVOT Function.pptx
 
EDA of San Francisco Employee Compensation for Fiscal Year 2014-15
EDA of San Francisco Employee Compensation for Fiscal Year 2014-15EDA of San Francisco Employee Compensation for Fiscal Year 2014-15
EDA of San Francisco Employee Compensation for Fiscal Year 2014-15
 
The olap tutorial 2012
The olap tutorial 2012The olap tutorial 2012
The olap tutorial 2012
 
Data Warehouse Project Report
Data Warehouse Project Report Data Warehouse Project Report
Data Warehouse Project Report
 
Data modelling interview question
Data modelling interview questionData modelling interview question
Data modelling interview question
 
ETL QA
ETL QAETL QA
ETL QA
 
Kevin Fahy Bi Portfolio
Kevin Fahy   Bi PortfolioKevin Fahy   Bi Portfolio
Kevin Fahy Bi Portfolio
 
Business analytics and data warehousing
Business analytics and data warehousingBusiness analytics and data warehousing
Business analytics and data warehousing
 
Group13 kdd cup_report_submitted
Group13 kdd cup_report_submittedGroup13 kdd cup_report_submitted
Group13 kdd cup_report_submitted
 
The ABC analysis
The ABC analysis The ABC analysis
The ABC analysis
 
Iowa liquor sales
Iowa liquor salesIowa liquor sales
Iowa liquor sales
 
Sql query analyzer & maintenance
Sql query analyzer & maintenanceSql query analyzer & maintenance
Sql query analyzer & maintenance
 
Jazmine Kane Portfolio
Jazmine Kane PortfolioJazmine Kane Portfolio
Jazmine Kane Portfolio
 
Date Analysis .pdf
Date Analysis .pdfDate Analysis .pdf
Date Analysis .pdf
 
Star schema
Star schemaStar schema
Star schema
 
See sql server graphical execution plans in action tech republic
See sql server graphical execution plans in action   tech republicSee sql server graphical execution plans in action   tech republic
See sql server graphical execution plans in action tech republic
 
SQL Server 2008 Portfolio for Saumya Bhatnagar
SQL Server 2008 Portfolio for Saumya BhatnagarSQL Server 2008 Portfolio for Saumya Bhatnagar
SQL Server 2008 Portfolio for Saumya Bhatnagar
 

Recently uploaded

Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 

Recently uploaded (20)

Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 

Isqs6347 team5 proposal_032513

  • 1. ISQS 6347- Data &Text Mining Spring 2013 Team 5 Project title Data Analysis For Abuelo’s Class number / Semester ISQS6347 Spring 2013 – Section 1 Student names Preeti Prajapati Neha Soam Ming Kuo Hui The type of this project Data Mining Academic Project The nature and source of the dataset Nature - Available in SAS file format Source – Abuelo’s Restaurant Completion date May 16 2013 2013
  • 2. May 15, 2013 [ISQS 6347 – Final Project Report] 2 | Abuelo's Table of Contents Introduction............................................................................................................................................................3 Business Background......................................................................................................................................3 Objective...............................................................................................................................................................3 Project Overview...................................................................................................................................................3 Dataset Availability and Description.........................................................................................................3 Table 1 : Attributes & their Description...................................................................................................4 Data Quality and Preparation ......................................................................................................................4 Table 2...................................................................................................................................................................6 Data Exploration & Preprocessing.................................................................................................................7 Data Preparation...............................................................................................................................................7 Figure 3: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide).8 Figure 4: Output Result from SAS Enterprise Miner...........................................................................8 Preprocessing 
Tasks........................................................................................................................................9 Data Mining Methodologies ........................................................................................................................... 10 Primitive Results and Findings..................................................................................................................... 11 Data Filtration & Addition of New Variables ...................................................................................... 13 Refined Data’s Exploration ........................................................................................................................ 13
  • 3. Introduction
    The purpose of this project is to analyze a restaurant’s sales data and to build a model that supports the restaurant’s management decisions. The restaurant examined in this project, Abuelo’s, is a real restaurant, and all data collected are real. By collecting, exploring, processing, and analyzing this real-life data with the data mining techniques we learned in lecture, we are able to generate a model that is useful and can be applied to the restaurant’s decision making.
    Business Background
    Abuelo’s is a Mexican restaurant that has established stores in several cities since 1989. Abuelo’s has consistently been on the leading edge of Mexican cuisine, combining menu creativity, outstanding food and beverage quality, colorful plate presentations and superior service in an impressive Mexican courtyard-themed atmosphere. Every dish is made to order from scratch using only the freshest premium ingredients.
    Objective
    Abuelo’s is planning to adopt a new menu to replace the old one, and the restaurant has been conducting trials of new Value Items. A Value Item has a lower cost and a lower profit margin than its full version (e.g. Chicken Zucchini and Chicken Zucchini Lite), but Value Items are ordered more frequently than other items. The new menu extends the old one with Value Items and with some other new items that are not treated as Value Items. The main objective of this project is to analyze the effect of Value Items on total profit. The results are expected to inform decisions about which Value Items should be removed from or kept on the menu.
    Project Overview
    Dataset Availability and Description
    The data for Abuelo's is available for the years 2011 and 2012 in Excel and SAS files. The attributes and their descriptions are listed in the table below:
  • 4.
    Attribute Name        Attribute Description
    UID                   Unique ID representing the combination of item number and store ID
    Store ID              Unique ID assigned to each store
    Item Number           Number assigned to an item
    Minor Category        Category of item
    Product Description   Description of item
    Quantity              Quantity sold for each item in different stores
    Avg Unit Price        Average unit price of an item
    Avg Unit Cost         Average unit cost of an item
    Guest Count           Sum of customer visits at one store in a particular week
    Week IND              Number assigned to each week in one year
    Num Item Number       Number assigned to an item
    Table 1: Attributes & their Description
    Note: The dataset has approximately 1,827,700 rows and has minimal missing values.
    Data Quality and Preparation
    The dataset comes from a previous student project; therefore, many data preparation tasks have already been done and the dataset has already been converted into SAS file format. However, after exploring the dataset, we observed some issues that may require further consideration and adjustment before the data analysis and mining stage:
    » UID is not a unique identifier, and it has no value for 2406 records.
  • 5. Figure 1: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide)
    » The purpose of the Num_Item_Number attribute is unclear: its values are the same as those of Item_Number, but the two attributes have different data types. In addition, Num_Item_Number has 187 missing values, whereas Item_Number has none.
  • 6. Figure 2: Output Result from SAS Enterprise Miner
    » Unclear variable values: some Avg_Unit_Price values are 0, indicating that the price of the item is $0. Some Avg_Unit_Cost values are 0 or negative.
    » There are 28 Item_Number values that appear more than once with different Product_Description values.
    Item_Number   Minor_Category   Product_Description
    101090        Sub              Cooked Taco Meat BF 2.5 oz - Sub
    101090        Sub              Cooked Taco Meat CK 2.5 oz - Sub
    12067         Margaritas       Patron Shaken Margarita
    12067         Margaritas       Shaken Margarita
    Table 2: Items that Have the Same Number but Different Descriptions (First Two Shown)
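    The duplicate-item check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the SAS Enterprise Guide query used in the project: the first four rows are copied from Table 2, while the item 20001 rows and the function name `conflicting_items` are hypothetical additions used to show a harmless repeat.

```python
from collections import defaultdict

# Rows as (Item_Number, Minor_Category, Product_Description).
rows = [
    # The first four rows are taken from Table 2.
    ("101090", "Sub", "Cooked Taco Meat BF 2.5 oz - Sub"),
    ("101090", "Sub", "Cooked Taco Meat CK 2.5 oz - Sub"),
    ("12067", "Margaritas", "Patron Shaken Margarita"),
    ("12067", "Margaritas", "Shaken Margarita"),
    # Hypothetical rows: the same item repeated with one consistent description.
    ("20001", "Sides", "Rice"),
    ("20001", "Sides", "Rice"),
]


def conflicting_items(rows):
    """Map each item number to its descriptions when it has more than one distinct description."""
    seen = defaultdict(set)
    for item_number, _category, description in rows:
        seen[item_number].add(description)
    return {item: sorted(descs) for item, descs in seen.items() if len(descs) > 1}


conflicts = conflicting_items(rows)
for item_number, descriptions in conflicts.items():
    print(item_number, descriptions)
```

    Grouping by item number and counting distinct descriptions separates genuine conflicts (101090, 12067) from rows that merely repeat across stores or weeks.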
  • 7. Data Exploration & Preprocessing
    The tasks of preliminary data mining include data preparation, data exploration, data model selection, and discussion of preliminary findings. By performing preliminary data mining, we are able to examine data quality and observe issues such as missing, duplicated, or erroneous data. The data mining methodologies are chosen and applied based on the nature of the dataset and the objective of the project: to analyze the effect of valued items on total profit.
    Data Preparation
    The dataset, available in SAS file format, contains the attributes shown in Table 1.
    Attribute Name        Attribute Description
    UID                   Unique ID representing the combination of item number and store ID
    StoreID               Unique ID assigned to each store
    ItemNumber            Number assigned to an item
    MinorCategory         Category of item
    ProductDescription    Description of item
    Quantity              Quantity sold for each item in different stores
    AvgUnitPrice          Average unit price of an item
    AvgUnitCost           Average unit cost of an item
    GuestCount            Sum of customer visits at one store in a particular week
    WeekIND               Number assigned to each week in one year
    NumItemNumber         Number assigned to an item
    Table 1: Initial Data from Dataset
    Because the dataset has already been cleansed and is well prepared, at this stage we focused on data exploration and examination. We found several issues that may affect the analysis. The four major issues observed are as follows:
    » UID is not a unique identifier, and 2406 of the records have no UID value (see Figure 1).
    » The purpose of the NumItemNumber attribute is unclear: its values are the same as those of ItemNumber, but the two attributes have different data types. In addition, NumItemNumber has 187 missing values (see Figure 2).
    » Unclear variable values:
        o Some AvgUnitPrice values are 0, indicating that the price of the item is $0.
        o Some AvgUnitCost values are 0 or negative.
    » There are 28 ItemNumber values that appear more than once with different ProductDescription values (see Table 2).
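    The first three checks in this list can be expressed as simple counts. The sketch below is a minimal illustration in Python rather than the Enterprise Guide and Enterprise Miner steps the report used; the field names follow Table 1, but every value in `records` is made up for the example, and `quality_report` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical sample rows: field names follow Table 1, values are invented.
records = [
    {"UID": "A1", "AvgUnitPrice": 9.99, "AvgUnitCost": 3.10, "NumItemNumber": 101},
    {"UID": "A1", "AvgUnitPrice": 0.0, "AvgUnitCost": 2.00, "NumItemNumber": None},
    {"UID": None, "AvgUnitPrice": 5.49, "AvgUnitCost": -1.25, "NumItemNumber": 102},
    {"UID": "B7", "AvgUnitPrice": 7.25, "AvgUnitCost": 0.0, "NumItemNumber": 103},
]


def quality_report(records):
    """Count the kinds of data-quality issues listed above."""
    uid_counts = Counter(r["UID"] for r in records if r["UID"] is not None)
    return {
        "missing_uid": sum(r["UID"] is None for r in records),
        "repeated_uid": sorted(u for u, n in uid_counts.items() if n > 1),
        "missing_num_item_number": sum(r["NumItemNumber"] is None for r in records),
        "zero_price": sum(r["AvgUnitPrice"] == 0 for r in records),
        "zero_or_negative_cost": sum(r["AvgUnitCost"] <= 0 for r in records),
    }


report = quality_report(records)
for check, result in report.items():
    print(check, result)
```

    Reporting counts instead of dropping rows immediately keeps the decision about how to treat zero prices or negative costs separate from the act of detecting them.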
  • 8. Figure 3: Non-Unique UID and Its Number of Missing Values (via SAS Enterprise Guide)
    Figure 4: Output Result from SAS Enterprise Miner
  • 9. Table 2: Items that Have the Same Number but Different Descriptions (First Two Shown)
    Item_Number   Minor_Category   Product_Description
    101090        Sub              Cooked Taco Meat BF 2.5 oz - Sub
    101090        Sub              Cooked Taco Meat CK 2.5 oz - Sub
    12067         Margaritas       Patron Shaken Margarita
    12067         Margaritas       Shaken Margarita
    Preprocessing Tasks
    The objective of this project is to determine whether the valued items have had any effect on the profit generated. Therefore, by combining the menu item information, we added three data attributes, Profit, Valued_Item_Flag, and New_Item_Flag, to represent sales profit, valued menu item, and new menu item, respectively. Note that most values of the new item flag and the valued item flag are missing, because not all Abuelo’s stores participated in the trial of the new valued menu. Which data to choose for our analysis is therefore an important concern. Figure 3 below is a screenshot of the modified dataset, All_Profit_Flag. Table 3 lists the three newly added attributes.
    Figure 3: Table of Modified Dataset All_Profit_Flag
  • 10.
    Attribute Name    Attribute Description
    Profit            Sales profit of an item at a store during a week
    NewItemFlag       Flag indicating a new menu item
    ValuedItemFlag    Flag indicating a valued menu item
    Table 3: Newly Added Attributes in Dataset
    Data Mining Methodologies
    The data mining models chosen for our project must suit both the nature of the dataset and the objective of this business analysis project. Since our objective is to determine whether valued menu items increase the sales profit of a store, at this preliminary data mining stage we decided to use a Regression model to analyze the importance of valued items in terms of profit generated. Figures 5 and 6 show the variable configuration and the design of the data process flow. The configuration shown in Figures 5 and 6 is subject to change.
    Figure 5: Variable Configuration for Regression
    Figure 6: Data Process Flow for Regression
    Initially we included only two input variables, New_Item_Flag and Valued_Item_Flag, and one target variable, Profit, in the regression analysis. As mentioned earlier in the report, many values are missing for the new item and valued item flags. As a result, the data
  • 11. must go through a filtering step to exclude the rows that have no information about the new/valued item flags. The result of the Filter node is shown below. About 90% of observations are excluded after filtering.
    Figure 7
    Primitive Results and Findings
    Figure 8 shows the result of the Regression node. According to the Type 3 Analysis of Effects, when only the effects of new item and valued item on profit are analyzed, new item has a significant effect on profit (Pr < .0001), whereas valued item does not. At this preliminary data mining stage, we concluded from the regression analysis that valued items have no significant impact on sales profit.
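    For reference, the preliminary model can be written out as an equation. The report states only that Profit is the target and the two flags are the inputs, so the linear main-effects form and the coefficient symbols below are assumptions made for illustration:

```latex
\text{Profit}_i = \beta_0
  + \beta_1 \,\text{New\_Item\_Flag}_i
  + \beta_2 \,\text{Valued\_Item\_Flag}_i
  + \varepsilon_i
```

    Under this form, each Type 3 test checks the null hypothesis that one coefficient is zero. The reported output corresponds to rejecting $\beta_1 = 0$ (Pr < .0001) and failing to reject $\beta_2 = 0$.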
  • 12. Figure 8: Output of Regression Model
  • 13. Data Filtration & Addition of New Variables
    We used Enterprise Guide to filter out the missing data and to add the new variables Profit, New_Item_Flag, and Valued_Item_Flag. We then exported this refined dataset for use in Enterprise Miner.
    Figure 9: Enterprise Guide showing newly introduced variables
    Refined Data’s Exploration
    After filtering the existing dataset and adding the “Profit” column in Enterprise Guide, we used the resulting dataset for further analysis. Figure 9 shows the variable settings for this dataset.
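    The filtering and derivation step can be sketched as follows. This is a minimal Python illustration, not the Enterprise Guide task used in the project. The report does not give the Profit formula, so the expression Quantity × (AvgUnitPrice − AvgUnitCost) is an assumption, and the sample rows and the function name `refine` are hypothetical.

```python
# Hypothetical sample rows; None marks a flag that was not recorded for the store.
records = [
    {"Quantity": 10, "AvgUnitPrice": 8.00, "AvgUnitCost": 3.00,
     "New_Item_Flag": 0, "Valued_Item_Flag": 1},
    {"Quantity": 4, "AvgUnitPrice": 12.50, "AvgUnitCost": 5.00,
     "New_Item_Flag": 1, "Valued_Item_Flag": 0},
    {"Quantity": 7, "AvgUnitPrice": 6.00, "AvgUnitCost": 2.00,
     "New_Item_Flag": None, "Valued_Item_Flag": None},
    {"Quantity": 3, "AvgUnitPrice": 9.00, "AvgUnitCost": 4.00,
     "New_Item_Flag": None, "Valued_Item_Flag": 1},
]


def refine(records):
    """Drop rows with a missing flag and add a Profit field to the remaining rows."""
    refined = []
    for r in records:
        if r["New_Item_Flag"] is None or r["Valued_Item_Flag"] is None:
            continue  # no information about the new/valued item flags
        row = dict(r)
        # ASSUMPTION: profit = quantity * (unit price - unit cost); not stated in the report.
        row["Profit"] = round(r["Quantity"] * (r["AvgUnitPrice"] - r["AvgUnitCost"]), 2)
        refined.append(row)
    return refined


refined = refine(records)
for row in refined:
    print(row["Valued_Item_Flag"], row["Profit"])
```

    Copying each row before adding Profit leaves the original records untouched, which makes it easy to compare row counts before and after filtering.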
  • 14. Figure 10
    In the Explore window, choose Actions -> Plot and select a 3D bar chart; this opens the dialog shown in Figure 10. Figure 11 shows the same dialog enlarged.
    Figure 11
  • 15. Figure 12
    Figure 12 shows a 3D bar chart with Profit as Response, year as Series, and Valued_Item_Flag as Category.
  • 16. Figure 13
    Figure 14
    Figure 16 shows the result of the Segment Profile node with the variable settings shown in Figure 15.
  • 17. Figure 15
    Figure 16