Predicting Restaurants Rating and Popularity
based on Yelp Dataset
Presented by : Alin Babu(67)|Nandu O(66)|Liju Thomas(36)
Abstract
Restaurants rating on Yelp becomes an important indicator of their
future. In this project, we focus on predicting ratings and popularity
change of restaurants. With data from Yelp, we use several machine
learning methods including logistic regression and Naive Bayes, to make
relevant predictions. While logistic regression seems to perform better
than the others, predictions from all the methods are far from perfect.
This implies the potential improvement of more data and more suited
methodology.
Project Goals
 To predict ratings of restaurants on Yelp and popularity change based
on restaurant features.
 Project can shed lights on what customers value the most about a
restaurant.
Input
 Uses Yelp dataset
 The input to the algorithm is related feature variables about a
restaurant, such as :
– Restaurant name
– Comfortability
– Date
– Review
 Star Rating
 Comments
Algorithms and Methods
 Logistic Regression
 Naive Bayes
 Multinomial Naive Bayes
Dataset
 The data comes from Yelp Dataset Challenge .
 It includes review data, including text, time and rating.
 From the raw dataset, we select 20000 samples for testing.
 Due to different cultures across cities, we only focus on restaurants in
a particular city and surrounding areas in this project.
Data Pre-processing
• There is Restaurant name, date, comfortability, star rating, comments, review id
 Selecting two valid features manually.
 Star rating
 Comments
• Finding missing values of an attribute using isna().
• Root word extracting from comment rating using the methods,
 Removing punctuations
 Removing stop words
 Stemming
Performance Evaluation
Performance Evaluation
Performance Evaluation
Prediciting restaurant and popularity based on Yelp Dataset - 2

Prediciting restaurant and popularity based on Yelp Dataset - 2

  • 1.
    Predicting Restaurants Ratingand Popularity based on Yelp Dataset Presented by : Alin Babu(67)|Nandu O(66)|Liju Thomas(36)
  • 2.
    Abstract Restaurants rating onYelp becomes an important indicator of their future. In this project, we focus on predicting ratings and popularity change of restaurants. With data from Yelp, we use several machine learning methods including logistic regression and Naive Bayes, to make relevant predictions. While logistic regression seems to perform better than the others, predictions from all the methods are far from perfect. This implies the potential improvement of more data and more suited methodology.
  • 3.
    Project Goals  Topredict ratings of restaurants on Yelp and popularity change based on restaurant features.  Project can shed lights on what customers value the most about a restaurant.
  • 4.
    Input  Uses Yelpdataset  The input to the algorithm is related feature variables about a restaurant, such as : – Restaurant name – Comfortability – Date – Review  Star Rating  Comments
  • 5.
    Algorithms and Methods Logistic Regression  Naive Bayes  Multinomial Naive Bayes
  • 6.
    Dataset  The datacomes from Yelp Dataset Challenge .  It includes review data, including text, time and rating.  From the raw dataset, we select 20000 samples for testing.  Due to different cultures across cities, we only focus on restaurants in a particular city and surrounding areas in this project.
  • 7.
    Data Pre-processing • Thereis Restaurant name, date, comfortability, star rating, comments, review id  Selecting two valid features manually.  Star rating  Comments • Finding missing values of an attribute using isna(). • Root word extracting from comment rating using the methods,  Removing punctuations  Removing stop words  Stemming
  • 8.
  • 9.
  • 10.