Let’s Eat!
Brad Binder, Lesley Chapman,
Jon Froiland, David Lee
Introduction
History:
Since 1979 there have been services that review
and rank restaurants (Zagat)
Today:
According to Nielsen, Americans have on
average 41 apps on their smartphones, many of
which provide a recommendation service
Introduction
A variety of restaurant recommendation apps
have been created
Features include finding restaurants, making reservations,
and surfacing healthy options
A Restaurant Recommender would aim to help
users save money and time, and could help cure
buyer’s remorse
Problem Summary
We need a tool that resolves the challenge of
finding a restaurant in your area based upon
specific cuisine and menu item criteria
entered by the user
Hypothesis
Hypothesis: The Restaurant Recommender will make a
more accurate restaurant recommendation than selecting a restaurant
by chance alone
H0 (null hypothesis): A user will find a restaurant that they like
based on chance alone
HA (alternative hypothesis): The restaurant recommender app
will provide a better restaurant suggestion to the user compared
to chance alone
Data Ingestion
• WORM Storage
–Stored HTML menu pages in one location
which could be read many times
• Parsed HTML with BeautifulSoup
–Built out a list of “Restaurant” objects
• GET requests to WMATA API to pull metro
station data
–JSON data parsed with pandas read_json()
function
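The two ingestion steps above can be sketched roughly as follows. This is a minimal illustration, not the team's code: the HTML layout, the class names, the Restaurant fields, and the station records are all hypothetical stand-ins, and the station coordinates are approximate. Only BeautifulSoup and pandas read_json() come from the slide.

```python
from dataclasses import dataclass, field
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup


@dataclass
class Restaurant:
    # Hypothetical shape of a "Restaurant" object.
    name: str
    menu: list = field(default_factory=list)  # list of (item, price) tuples


# Hypothetical saved menu page; real pages would be read from the HTML store.
SAMPLE_HTML = """
<div class="restaurant">
  <h1 class="name">Example Bistro</h1>
  <ul class="menu">
    <li><span class="item">Pad Thai</span><span class="price">$12.50</span></li>
    <li><span class="item">Green Curry</span><span class="price">$13.00</span></li>
  </ul>
</div>
"""

# Hypothetical, simplified station records (not the actual WMATA response shape).
SAMPLE_STATIONS_JSON = (
    '[{"Name": "Metro Center", "Lat": 38.8983, "Lon": -77.0281},'
    ' {"Name": "Dupont Circle", "Lat": 38.9096, "Lon": -77.0434}]'
)


def parse_restaurant(html):
    """Build one Restaurant from a saved menu page."""
    soup = BeautifulSoup(html, "html.parser")
    name = soup.find("h1", class_="name").get_text(strip=True)
    menu = []
    for li in soup.find("ul", class_="menu").find_all("li"):
        item = li.find("span", class_="item").get_text(strip=True)
        price = li.find("span", class_="price").get_text(strip=True)
        menu.append((item, price))
    return Restaurant(name=name, menu=menu)


restaurants = [parse_restaurant(SAMPLE_HTML)]
# read_json expects a path or file-like object, so wrap the string in StringIO.
stations = pd.read_json(StringIO(SAMPLE_STATIONS_JSON))
```

Keeping the raw HTML on disk and parsing it separately means the parser can be rerun without fetching the pages again.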
Ingestion Wrangling Analysis Modeling Visualization
Wrangling and Munging
• Majority of time spent wrangling the data and
building restaurants
–Removing duplicate and incomplete
records
–Standardizing inconsistent fields (e.g. price)
–Aggregating and grouping
–Correcting data types
• Merged restaurant and WMATA data using
Euclidean distance
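One plausible reading of the merge step is that each restaurant was paired with its closest metro station. The sketch below assumes that reading; the field names and coordinates are hypothetical. It measures Euclidean distance directly on latitude/longitude degrees, which is only a rough approximation of ground distance but is usually adequate within a single city.

```python
import math


def station_distance(restaurant, station):
    """Euclidean distance between two points, measured in lat/lon degrees."""
    return math.hypot(
        restaurant["lat"] - station["lat"],
        restaurant["lon"] - station["lon"],
    )


def attach_nearest_station(restaurants, stations):
    """Return copies of the restaurant records with the closest station added."""
    merged = []
    for restaurant in restaurants:
        nearest = min(stations, key=lambda s: station_distance(restaurant, s))
        record = dict(restaurant)
        record["station"] = nearest["name"]
        record["station_distance"] = station_distance(restaurant, nearest)
        merged.append(record)
    return merged


# Hypothetical sample records with approximate coordinates.
stations = [
    {"name": "Metro Center", "lat": 38.8983, "lon": -77.0281},
    {"name": "Dupont Circle", "lat": 38.9096, "lon": -77.0434},
]
restaurants = [{"name": "Example Bistro", "lat": 38.9000, "lon": -77.0300}]

merged = attach_nearest_station(restaurants, stations)
```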
Data Overview
964 Total Restaurants
115,517 Total Menu Items
• Restaurant data includes:
–Name
–Location (address, latitude, longitude)
–Type of cuisine
–Menu (item, price, description)
• WMATA data includes:
–Station name
–Location (latitude, longitude)
–Metro Line
Analysis
10 cities
964 Restaurants
115,517 Menu Items
Washington, D.C.
Feature Selection
• Four feature extraction pipelines using sklearn
–Chunking
–Cuisine Type
• TfidfVectorizer
–Extract keywords and assign significance score
– Tokenize and chunk parts of speech using nltk
• LabelBinarizer
–Convert cuisine types to binary features
• FeatureUnion
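A minimal sketch of how these pieces might fit together is shown below. It is an assumption-laden illustration rather than the team's pipeline: ColumnSelector and CuisineBinarizer are hypothetical helper classes, the sample records are invented, and the nltk chunking step is left out. The wrapper around LabelBinarizer is there because LabelBinarizer's own fit_transform accepts a single argument and so does not slot directly into a FeatureUnion.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import LabelBinarizer


class ColumnSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: pull one field out of a list of dict records."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [row[self.key] for row in X]


class CuisineBinarizer(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper giving LabelBinarizer a transformer-style interface."""

    def fit(self, X, y=None):
        self.binarizer_ = LabelBinarizer().fit(X)
        return self

    def transform(self, X):
        return self.binarizer_.transform(X)


# Invented sample records.
restaurants = [
    {"menu": "pad thai green curry spring rolls", "cuisine": "Thai"},
    {"menu": "margherita pizza spaghetti carbonara", "cuisine": "Italian"},
    {"menu": "carne asada tacos chicken burrito", "cuisine": "Mexican"},
]

union = FeatureUnion([
    ("menu_text", Pipeline([
        ("select", ColumnSelector("menu")),
        ("tfidf", TfidfVectorizer()),  # keyword weights per menu
    ])),
    ("cuisine", Pipeline([
        ("select", ColumnSelector("cuisine")),
        ("binarize", CuisineBinarizer()),  # one binary column per cuisine
    ])),
])

# One row per restaurant: TF-IDF columns followed by cuisine columns.
features = union.fit_transform(restaurants)
```

FeatureUnion concatenates the outputs column-wise, so downstream clustering sees a single feature matrix.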
Modeling and Prediction
• Transformation pipelines and transformed
feature vectors pickled
• K-means models fitted using training
restaurant data, then pickled
• User inputs entered via Flask are stored as
a training instance
• Relevant pipeline and model loaded to
transform and predict
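The fit, pickle, load, and predict cycle described above might look roughly like this. The feature vectors are tiny invented 2-D points standing in for the transformed restaurant features, and the model is pickled to an in-memory byte string here rather than to a file.

```python
import pickle

import numpy as np
from sklearn.cluster import KMeans

# Invented stand-ins for transformed restaurant feature vectors.
train = np.array([
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],   # one group of similar restaurants
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],   # a second, distant group
])

# Fit once on the training restaurants.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(train)

# Serialize the fitted model; the project would write this to disk instead.
blob = pickle.dumps(model)

# Later (for example inside a Flask request handler), load and predict.
loaded = pickle.loads(blob)
user_vector = np.array([[4.8, 5.2]])  # hypothetical transformed user input
cluster = int(loaded.predict(user_vector)[0])
```

Pickling the fitted pipeline and model keeps the web request path cheap, since nothing has to be refitted per user query.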
K=15
Reporting and Visualization
• Restaurant recommendations are determined
by similarity within a matched cluster
–“Similarity” is measured with sklearn’s pairwise
Euclidean distance between the test data and the
training instances in the feature space; the closest
instances are recommended
• Predictions are exported into an interactive
Tableau visualization
–Allows the user flexibility in making a selection
through filtering and visual indicators
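A rough sketch of ranking restaurants inside the matched cluster follows. The recommend function, the restaurant names, the vectors, and the cluster labels are all hypothetical; only the use of sklearn's pairwise Euclidean distance comes from the slide.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances


def recommend(user_vector, train_vectors, train_labels, cluster, names, top_n=3):
    """Rank restaurants in the matched cluster by distance to the user vector."""
    members = np.flatnonzero(train_labels == cluster)
    dists = euclidean_distances(
        user_vector.reshape(1, -1), train_vectors[members]
    )[0]
    order = np.argsort(dists)[:top_n]  # smallest distance first
    return [(names[members[i]], float(dists[i])) for i in order]


# Invented training data: three restaurants per cluster.
names = ["A", "B", "C", "D", "E", "F"]
train_vectors = np.array([
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.2], [5.5, 4.5], [4.0, 5.0],
])
train_labels = np.array([0, 0, 0, 1, 1, 1])

user_vector = np.array([5.0, 5.0])
ranked = recommend(user_vector, train_vectors, train_labels, cluster=1, names=names)
```

Restricting the search to one cluster keeps the comparison small, at the cost of missing a close restaurant that landed in a neighbouring cluster.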
Demo
Results
• Some predictions are good, others not so
good
–Some clusters still contain a “hodge podge”
• Removing the “cuisine type” feature helped to
eliminate what we saw as overfit
• Different k values produced better results in some
cases, worse in others
• Additional features (price, ratings, metro)
would require more clusters and MORE DATA
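One common way to compare k values more systematically is the silhouette score. The slides do not say how the team compared them, so the sketch below is only an illustration on synthetic data, with three invented, well-separated groups of points.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated groups of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

# Score each candidate k; a higher silhouette means tighter, better-separated clusters.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
```

On real menu features the scores are unlikely to show such a clear peak, which would be consistent with the mixed results reported above.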
Conclusions
• More data over a “better” model
• Might improve results using transformations
like Singular Value Decomposition (SVD) or
Latent Dirichlet Allocation (LDA)
– Better model analysis
• With more data, improve our tokenizer
– Incorporate stemming, improve chunking
• Incorporating user feedback into prediction
model (ex: Flask interface)
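As a rough illustration of the SVD idea, sklearn's TruncatedSVD can reduce a sparse TF-IDF matrix to a few dense components before clustering. This was not part of the project as presented; the menu strings and the choice of two components are invented for the example.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented menu text for four restaurants.
menus = [
    "pad thai green curry spring rolls",
    "red curry pad see ew spring rolls",
    "margherita pizza spaghetti carbonara",
    "pepperoni pizza lasagna spaghetti",
]

tfidf = TfidfVectorizer().fit_transform(menus)  # sparse, one column per term

# Project the sparse term matrix onto two dense components.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(tfidf)
```

The reduced matrix could then be passed to K-means in place of the raw TF-IDF features.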
Additional Opportunities
• “Waiter-caller” function that would allow users to log in, use
the restaurant map search function, click on a restaurant, and
be matched with menu items based on keyword matches,
as opposed to reading through an entire menu to find
relevant items.
–Would have required more knowledge and implementation of
JavaScript, CSS, and Jinja in the Flask environment.
• Sentiment analyzer was developed but not integrated. It would
allow users to go to a restaurant and input a review. The review
would then be analyzed, giving back a recommended score (1-5)
to the user.
–Similar requirements
Sources
• Downey, Allen B. Think Bayes. O’Reilly Media; 1st Edition, 2013. Paperback.
• Downey, Allen B. Think Python. O’Reilly Media; 1st Edition, 2012. Paperback.
• Dwyer, Gareth. Flask by Example. Packt Publishing, 2016. Paperback.
• Harris, Harlan, Sean Murphy, and Marck Vaisman. Analyzing the Analyzers: An
Introspective Survey of Data Scientists and Their Work. O’Reilly Media; 1st Edition,
2013.
• Julian, David. Designing Machine Learning Systems with Python. Packt Publishing,
2016. Paperback.
• Kirk, Matthew. Thoughtful Machine Learning: A Test-Driven Approach. O’Reilly
Media; 1st Edition, 2014. Paperback.
• Kumar, Ashish. Learning Predictive Analytics with Python. Packt Publishing, 2016.
Paperback.
• McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy,
and IPython. O’Reilly Media; 1st Edition, 2012. Paperback.
• Mitchell, Ryan. Web Scraping with Python: Collecting Data from the Modern Web.
O’Reilly Media; 1st Edition, 2015. Paperback.
• Raschka, Sebastian. Python Machine Learning. Packt Publishing, 2015. Paperback.
• Segaran, Toby. Programming Collective Intelligence: Building Smart Web 2.0
Applications. O’Reilly Media, 2007. Paperback.
