This is an analysis of Indian restaurants in New York, Los Angeles, and Chicago. I used the Yelp API to fetch the relevant data and Python's statistical programming packages for visualisation and analysis.
2. Introduction
+ Since I can't share the projects I did for my previous employers, I did a
small market research project to showcase my skill set.
Source code and Main project:
https://colab.research.google.com/drive/1tJyxqIDbrrLm9LjUfcBcgFT9RLuk-Vum
Motivation
+ In this project, I have used Python to extract information and prepared a report to
evaluate the popularity and quality of Indian restaurants in three different regions of
America.
+ I used Yelp API and Python to fetch pertinent information and used Python's
statistical programming packages for analysis.
3. Data Fetching
+ I queried the API with the keyword "Indian Restaurant" for each
location (Chicago, New York, LA) and stored the results in three separate tables.
+ I used Python packages such as Pandas, JSON, Matplotlib, etc.
+ I got the following table out of the process, which you can also see in detail
in the source code provided at the bottom of the second slide.
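The fetching step can be sketched roughly as follows. This is a minimal illustration, not the code from the linked notebook: it assumes the Yelp Fusion business search endpoint, and the helper names (`build_search_request`, `extract_rows`, `fetch_city`) and the choice of `limit=50` are hypothetical. It uses only the standard library, so the resulting rows would still need to be loaded into a Pandas table.

```python
import json
import urllib.parse
import urllib.request

# Yelp Fusion business search endpoint (assumed to be the one the project used).
YELP_SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def build_search_request(api_key, location, term="Indian Restaurant", limit=50):
    """Build (but do not send) a search request for one city."""
    query = urllib.parse.urlencode(
        {"term": term, "location": location, "limit": limit}
    )
    return urllib.request.Request(
        f"{YELP_SEARCH_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def extract_rows(payload, city):
    """Keep only the fields the analysis needs from a decoded search response."""
    return [
        {
            "city": city,
            "name": business.get("name"),
            "rating": business.get("rating"),
            "review_count": business.get("review_count"),
        }
        for business in payload.get("businesses", [])
    ]


def fetch_city(api_key, location):
    """Send the request and return one row per restaurant (needs network access)."""
    request = build_search_request(api_key, location)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return extract_rows(payload, location)
```

Calling `fetch_city(api_key, "Chicago")` with a valid API key would return a list of dictionaries that `pandas.DataFrame(rows)` can turn into one of the three city tables.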
4. Analysis Overview
+ I started by analyzing the distribution of ratings and review counts in
each location.
+ I used customer ratings as a signal to evaluate and compare quality
in these three regions.
+ I used both the ratings and the number of ratings and reviews as
signals of popularity.
+ Using the information from the table on the slide above, I have plotted
histograms for the number of reviews and the ratings for each city.
+ I have also generated a summary stats table at the end of it.
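A summary stats table of this kind could be produced along these lines. The rows below are made up purely for illustration and are not the project's data; the column names and the helper functions `summarize_by_city` and `rating_counts` are assumptions as well.

```python
import pandas as pd

# Made-up rows for illustration only; NOT the data fetched in the project.
sample = pd.DataFrame(
    {
        "city": ["New York", "New York", "New York", "Chicago", "Chicago", "Chicago"],
        "rating": [4.0, 4.5, 3.5, 4.0, 4.5, 4.0],
        "review_count": [350, 1200, 480, 150, 300, 420],
    }
)


def summarize_by_city(df):
    """Restaurant count plus mean and median rating and review count per city."""
    return df.groupby("city").agg(
        restaurants=("rating", "size"),
        mean_rating=("rating", "mean"),
        median_rating=("rating", "median"),
        mean_reviews=("review_count", "mean"),
        median_reviews=("review_count", "median"),
    )


def rating_counts(df, city):
    """Restaurants at each rating level in one city (the bar heights of a rating chart)."""
    return df.loc[df["city"] == city, "rating"].value_counts().sort_index()


summary = summarize_by_city(sample)
print(summary.round(2))
print(rating_counts(sample, "New York"))
```

With Matplotlib installed, `sample["review_count"].plot.hist()` would draw the corresponding histogram of review counts.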
5. Data Analysis: New York City
+ The histogram on the left shows the distribution
of the number of reviews per restaurant.
+ We can observe that very few restaurants
have more than 1000 reviews. Many of the
restaurants have between 300 and 500 reviews,
and a moderate number have between 500 and
1000.
+ But how do we know whether these reviews are
positive or negative? We cannot know for sure
without some form of sentiment analysis.
+ So we look at the bar graph on the right, which
shows ratings: most of the restaurants have a
rating of either 4 or 4.5. Out of 50 restaurants,
around 5 have a 3.5 rating, and all the others are
rated higher.
+ The high ratings support the conjecture that most
reviews are positive. Hence, based on the
histogram on the left, we can say that Indian
restaurants are popular in New York City, while
from the graph on the right we can say that
customers highly approve of the overall quality of
the restaurants in the city.
6. SUMMARY STATS
+ The summary statistics validate
our intuition from the graphs above.
+ The mean rating across restaurants
in New York City is 4.17 and the
mean review count is ~472.
+ This indicates that a high number
of people rate Indian restaurants
above 4 (on a scale of 5).
7. Data Analysis: Los Angeles
We see a similar trend in Los Angeles as we did in New York. Both the average review count (~595) and
the average rating (4.18) are very high. By the same line of argument that we used for New York City, the
quality and popularity of Indian restaurants in Los Angeles are also impressive.
8. Data Analysis: Chicago
• We can see that the mean review count is approximately 295, which is much lower than that of New
York City and LA. The mean rating of Chicago's Indian restaurants is 4.06, which is
slightly lower than that of LA and New York City.
• We can conclude that Indian restaurants in Chicago aren’t as popular as in the other two cities, and
that quality is also slightly lower, based on the average rating.
9. Correlation between rating and the
number of reviews per restaurant: -0.1328
+ The correlation is slightly negative. But the small
magnitude of the correlation and the low variance hardly
give us any room to interpret this statistic.
+ Therefore, we will compare the average ratings and review
counts across the three cities by using bar graphs.
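For reference, the statistic is presumably the Pearson correlation coefficient, which is what pandas' `Series.corr` computes by default. Squaring -0.1328 gives roughly 0.018, so under a linear fit the rating would account for less than 2% of the variation in review counts. The sketch below spells out the formula in plain Python. The function name `pearson_r` and the sample values are made up for illustration and are not the project's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    covariance = sum(a * b for a, b in zip(dev_x, dev_y))
    spread = math.sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))
    if spread == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return covariance / spread


# Made-up values for illustration only; NOT the project's data.
ratings = [3.5, 4.0, 4.0, 4.5, 4.5]
review_counts = [900, 400, 600, 300, 350]

r = pearson_r(ratings, review_counts)
print(f"r = {r:.4f}, r squared = {r * r:.4f}")
```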
10. Visual representation of combined data.
• The high average rating across all cities gives us reason to believe that, in general,
customers highly approve of Indian restaurants.
• It is hard to say what is causing the lower average number of reviews in Chicago
despite its high ratings.
• But this also gives us further scope for exploration and research.
11. Further scope of exploration.
+ We can further investigate why NY and Chicago have
fewer reviews than LA despite having almost the same
average ratings.
+ Perhaps the restaurant business is marketed more in one
region than in the others, in which case we can explore data
relevant to marketing in each city.
+ Maybe it has to do with the relative distribution of the Indian
diaspora and the income distribution in different cities,
in which case we can explore income and demographic data.
+ We can also investigate more cities to see whether any of
these cities is an outlier for its region.
12. Conclusion:
• We don’t have sufficient data to comment on the
discrepancy between the number of reviews in
different cities and their nearly identical
ratings. But it also gives us further scope to explore.
• However, the combined average review count
and average rating are approximately 445 and 4.13
respectively, which are both good numbers in the
context of the restaurant business.
• These two statistics show that Indian restaurants
are very popular across the three cities.
• Although generalizing this result across all cities
would be a big jump, this analysis
does give us an idea of the general vibe
surrounding the Indian restaurant business in the
US.