Predicting Restaurant Ratings and Popularity Based on the Yelp Dataset
  
MACHINE LEARNING PROJECT REPORT 
 
 
 
 
 
Submitted by 
ALIN BABU (67) 
NANDU O (66) 
LIJU THOMAS (36) 
 
 
 
 
 
 
Introduction  
A restaurant's rating on Yelp has become an important indicator of its future. In this project, we focus on predicting the ratings and popularity change of restaurants. With data from Yelp, we use several machine learning methods, including logistic regression and Naive Bayes, to make these predictions. While logistic regression seems to perform better than the others, predictions from all the methods are far from perfect. This suggests room for improvement with more data and a better-suited methodology.
 
Project Objectives 
 
➔ To predict the ratings and popularity change of restaurants on Yelp based on restaurant features.
➔ To shed light on what customers value most about a restaurant.
 
Dataset 
➔ The data comes from the Yelp Dataset Challenge.
➔ It includes review data: text, time, and star rating.
➔ From the raw dataset, we select 20000 samples for testing (a loading sketch follows this list).
➔ Because dining culture differs across cities, we focus only on restaurants in a particular city and its surrounding areas in this project.
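The report does not show the loading step; a minimal sketch, assuming pandas and the challenge's JSON-lines review file (the file name and column names are assumptions, not taken from the report), could look like:

import pandas as pd

# Assumed file name and fields from the Yelp Dataset Challenge review dump;
# adjust to the actual release used by the project.
reviews = pd.read_json("yelp_academic_dataset_review.json", lines=True)

# Keep the fields the report relies on and draw 20000 samples.
sample = reviews[["review_id", "date", "stars", "text"]].sample(n=20000, random_state=0)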
 
 
Algorithm and Methods 
 
In this project we mainly use three machine learning algorithms to predict restaurant ratings. All of them are supervised learning algorithms:
★ Logistic Regression 
★ Multinomial Naive Bayes 
★ Naive Bayes 
Logistic Regression 
Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. Unlike linear regression, which outputs continuous values, logistic regression transforms its output using the logistic sigmoid function to return a probability value, which can then be mapped to two or more discrete classes.
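A minimal sketch of such a classifier, assuming scikit-learn and a TF-IDF representation of the review text (neither is specified in the report), could look like:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical review texts and their star ratings (1-5).
reviews = ["Great food and friendly staff", "Slow service and cold food"]
stars = [5, 2]

# TF-IDF features feeding a logistic regression classifier over star classes.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(reviews, stars)
print(model.predict(["Decent food, average prices"]))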
Naive Bayes 
A Naive Bayes classifier is a probabilistic machine learning model used for classification tasks. The crux of the classifier is Bayes' theorem.
Bayes' Theorem:
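With hypothesis A and evidence B as described below, the theorem can be written as:

P(A | B) = P(B | A) · P(A) / P(B)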
 
Using Bayes' theorem, we can find the probability of A happening given that B has occurred. Here, B is the evidence and A is the hypothesis. The assumption made here is that the predictors/features are independent; that is, the presence of one particular feature does not affect another. Hence it is called "naive".
Multinomial Naive Bayes 
Multinomial Naive Bayes is a specialized version of Naive Bayes designed for text documents. Whereas simple Naive Bayes would model a document as the presence or absence of particular words, Multinomial Naive Bayes explicitly models the word counts and adjusts the underlying calculations to deal with them.
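A minimal sketch, assuming scikit-learn's CountVectorizer and MultinomialNB (the report does not name a library), could look like:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical review texts and star labels.
reviews = ["Loved the pasta, will come back", "Terrible wait and a rude waiter"]
stars = [5, 1]

# CountVectorizer produces word counts, which Multinomial Naive Bayes models directly.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reviews, stars)
print(model.predict(["the pasta was okay"]))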
 
 
Data Pre-Processing 
In this project we mainly use the Yelp dataset, which consists of user reviews and ratings. The raw data includes restaurant name, date, comfortability, star rating, comments, and review id. Out of these, we manually select the two features that are essential for our prediction:
❖ Star rating 
❖ Comments 
After selecting the valid features, we handle missing values of each attribute and then extract root words from the comment text using the methods below (a code sketch follows the list):
❖ Removing punctuations 
 
❖ Removing stop words 
❖ Stemming - Reducing each word to its root/base form (for example, "waiting" becomes "wait").
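A minimal sketch of these cleaning steps, assuming NLTK's stop-word list and Porter stemmer (the report does not name the tools used), could look like:

import string

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# Requires the NLTK 'stopwords' corpus: nltk.download("stopwords").
stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def clean_comment(text):
    # Remove punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Remove stop words and reduce each remaining word to its stem.
    return [stemmer.stem(word) for word in text.lower().split() if word not in stop_words]

print(clean_comment("The waiters were friendly and the food arrived quickly!"))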
 
Performance Evaluation 
 
Logistic Regression 
 
 
Naive Bayes 
 
 
 
 
Multinomial Naive Bayes 
 
 
 
Conclusion 
 
After testing with 20000 samples, we can see that logistic regression performs better than the other methods. One possible explanation is that the assumptions behind the other models are problematic, and logistic regression is more robust to such violated assumptions. This suggests room for improvement with more data and a better-suited methodology. However, the predictions still need further improvement: we compare our best predictor, logistic regression, with a random-number predictor and a constant-number predictor, and the logistic predictor is only slightly better than the constant-number predictor.
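A sketch of this comparison, assuming scikit-learn's DummyClassifier for the random and constant baselines (the report does not say how the baselines were built), could look like:

from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression

# Placeholder features and star labels; in the project these would be the
# held-out Yelp samples and their ratings.
X = [[0.1], [0.9], [0.5], [0.7]]
y = [1, 5, 3, 5]

baselines = [
    ("random", DummyClassifier(strategy="uniform", random_state=0)),
    ("constant", DummyClassifier(strategy="most_frequent")),
    ("logistic", LogisticRegression(max_iter=1000)),
]
for name, clf in baselines:
    clf.fit(X, y)
    print(name, clf.score(X, y))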
 
