Predicting Restaurants Rating and Popularity
based on Yelp Dataset
Presented by : Alin Babu(67)|Nandu O(66)|Liju Thomas(36)
Abstract
A restaurant's rating on Yelp has become an important indicator of its
future. In this project, we focus on predicting the ratings and popularity
change of restaurants. With data from Yelp, we use several machine
learning methods, including logistic regression and Naive Bayes, to make
the relevant predictions. While logistic regression seems to perform better
than the others, predictions from all the methods are far from perfect.
This suggests room for improvement through more data and better-suited
methodology.
Project Goals
• To predict ratings of restaurants on Yelp and popularity change based
on restaurant features.
• The project can shed light on what customers value most about a
restaurant.
Input
• Uses the Yelp dataset
• The input to the algorithm is a set of feature variables about a
restaurant, such as:
– Restaurant name
– Comfortability
– Date
– Review
• Star rating
• Comments
Algorithms and Methods
• Logistic Regression
• Naive Bayes
• Multinomial Naive Bayes
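As an illustration of the last method listed, here is a minimal pure-Python sketch of Multinomial Naive Bayes over word counts with add-one smoothing. It is not the project's own code: the class name, the toy token lists, and the "high"/"low" labels (a hypothetical binning of star ratings) are all assumptions made for the example. Logistic regression is not shown.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the log likelihood of every known token.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                if t in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (hypothetical): tokenised comments and a binned rating.
train_docs = [
    ["great", "food", "friendly", "staff"],
    ["amazing", "food", "great", "service"],
    ["terrible", "food", "rude", "staff"],
    ["bad", "service", "cold", "food"],
]
train_labels = ["high", "high", "low", "low"]

model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict(["great", "friendly", "service"]))  # high
print(model.predict(["rude", "cold", "food"]))          # low
```

Working in log space avoids numeric underflow on long comments, and the add-one term keeps a word that is absent from one class from driving that class's probability to zero.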
Dataset
• The data comes from the Yelp Dataset Challenge.
• It includes review data: text, time and rating.
• From the raw dataset, we select 20,000 samples for testing.
• Because cultures differ across cities, we focus only on restaurants in
a particular city and its surrounding areas in this project.
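The selection step could look roughly like the sketch below, which filters records to one city and then draws a fixed-size random sample. The function name, the record field names, and the placeholder city labels are hypothetical and do not reflect the actual Yelp schema or the city used in the project.

```python
import random


def select_samples(reviews, city, n, seed=0):
    """Keep reviews from one city, then draw up to n of them at random."""
    in_city = [r for r in reviews if r["city"] == city]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(in_city, min(n, len(in_city)))


# Toy records with hypothetical field names and placeholder cities.
reviews = [
    {"city": "CityA", "stars": 4, "text": "Good tacos"},
    {"city": "CityA", "stars": 2, "text": "Slow service"},
    {"city": "CityB", "stars": 5, "text": "Great pasta"},
    {"city": "CityA", "stars": 5, "text": "Lovely patio"},
]

sample = select_samples(reviews, "CityA", n=2)
print(len(sample))
```

In the project the sample size would be 20,000 rather than 2.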
Data Pre-processing
• The raw data has the fields restaurant name, date, comfortability, star rating, comments and review id.
• Two valid features are selected manually:
– Star rating
– Comments
• Missing values of an attribute are found using isna().
• Root words are extracted from the comments by:
– Removing punctuation
– Removing stop words
– Stemming
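The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's code: the stop-word list is a tiny hand-picked set, `simple_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's, and the `None` check plays the role that pandas `isna()` would play on a DataFrame. The field names `stars` and `comments` are hypothetical.

```python
import string

# Tiny illustrative stop-word list; a real run would use a fuller one (assumption).
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "were", "to", "of", "it", "in"}

# Crude suffix list standing in for a real stemming algorithm (assumption).
SUFFIXES = ("ing", "ed", "es", "s")


def simple_stem(word):
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(comment):
    """Lowercase, remove punctuation, drop stop words, then stem each token."""
    no_punct = comment.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in no_punct.split() if t not in STOP_WORDS]
    return [simple_stem(t) for t in tokens]


def drop_missing(rows):
    """Keep only rows where both selected features are present."""
    return [r for r in rows if r.get("stars") is not None and r.get("comments")]


rows = [
    {"stars": 5, "comments": "The waiters were friendly, and the food tasted amazing!"},
    {"stars": None, "comments": "Missing rating."},
    {"stars": 2, "comments": None},
]

clean = [(r["stars"], preprocess(r["comments"])) for r in drop_missing(rows)]
print(clean)  # [(5, ['waiter', 'friendly', 'food', 'tast', 'amaz'])]
```

Stems such as "tast" and "amaz" are not dictionary words; that is expected, since stemming only needs to map related word forms to the same token.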
Performance Evaluation