Battle of the Neighborhoods Capstone for Laura Willis

Evaluation of Neighborhoods in Toronto for a New Burmese Restaurant
Applied Data Science Capstone (Week 5)
by Laura Willis

Introduction
• As owner of The Dataologist, I provide clients with
custom insights using data science
• I use the skills I gained from the Coursera IBM Data
Science Professional Certificate to help them build
business plans, optimize their ideas and find solutions
related to their business
• In this scenario, a restaurateur who has a successful
Burmese restaurant in New York wants to open a second
location of “Rangoon” in Toronto
• This restaurateur is less familiar with Toronto and has
come to my firm for help in selecting the neighborhood

Location, Location, Location
• The proper location is crucial for a new restaurant
business to succeed
• Our entrepreneur feels that his best chance for
success is to open his restaurant where there are
already a number of Asian, and specifically Thai,
restaurants, as Burmese food shares some similarities
with Thai food (much more so than with Chinese,
Japanese or Korean food)
• Thus we will provide an analysis of neighborhoods
with a large number of Thai restaurants

Data Acquisition and Extraction
• Data Needed
• To solve this problem, I will need to collect the following data:
• List of neighborhoods in Toronto, Canada.
• Latitude and longitude of these neighborhoods.
• Venue data on Asian restaurants, to help us find the
neighborhoods that already have many Asian
restaurants and are thus the areas where my client is
looking to open his restaurant.
• Data Acquisition
• Scraping the list of Toronto neighborhoods from Wikipedia
• Gathering latitude and longitude data for these neighborhoods
via the Geocoder package
• Using the Foursquare API to get venue data by neighborhood
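
A minimal sketch of how the venue request for one neighborhood could be assembled, assuming the legacy Foursquare v2 `venues/explore` endpoint (the slides do not say which endpoint was used). The 500-meter radius and 100-venue limit follow the Methodology slide; the credentials, version date, and sample coordinates are placeholders, and `build_explore_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 "explore" endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Return the request URL for venues near one neighborhood centre."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date (illustrative value)
        "ll": f"{lat},{lng}",            # latitude,longitude of the neighborhood
        "radius": radius,                # search radius in meters
        "limit": limit,                  # maximum number of venues returned
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Illustrative coordinates in downtown Toronto, not taken from the project data.
url = build_explore_url(43.6532, -79.3832, "YOUR_ID", "YOUR_SECRET")
print(url)
```

The URL is only built here, not sent, so the sketch runs without network access or real credentials.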

Methodology
• First, I gathered a list of neighborhoods in Toronto, Canada, by extracting it from the
Wikipedia page
(“https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M”)
• Web scraping was done with the pandas HTML table scraping method, which
allowed me to pull tabular data directly from the web page into a data frame.
• However, this only provided a list of neighborhood names and postal codes. To proceed with the
project for my client I needed a way to layer in the data from Foursquare. I was able to do this
by using coordinates to pull the list of venues near these neighborhoods.
• To get the coordinates, I used the CSV file provided by the IBM team to match the
coordinates to the Toronto neighborhoods.
• The next step was to create a map of Toronto using the Folium package,
which allowed me to verify these coordinates (shown in the map to the left)
• Next, I used the Foursquare API to pull the top 100 venues within a 500-meter radius of each
neighborhood. This provided me with the venue names, categories, and the latitude and longitude of
their locations. I was also able to check how many unique categories these venues fell into. Then, I
analyzed each neighborhood by grouping the rows by neighborhood and taking the mean of the
frequency of occurrence of each venue category. The purpose of this step was to prepare the data
for the clustering done later.
• Here, I decided to look specifically for “Thai restaurants”. The reasoning is that Thai cuisine
provides a better approximation of the likely success of a Burmese restaurant than
Chinese, Japanese, or Korean restaurants would. Burmese flavor profiles are more
similar to Thai food, so this is what I decided to focus on.
• Finally, I clustered the neighborhoods using k-means clustering. This method is one of
the simplest and most popular unsupervised machine learning algorithms, and it suits
this project well. I clustered the neighborhoods in Toronto into 3 clusters based on
their frequency of occurrence for “Thai food”. Based on the results (the concentration of
clusters), I will be able to recommend the ideal location to open the restaurant.
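
The grouping and clustering steps above can be sketched as follows. This is a dependency-free stand-in, not the project's actual code: the slides name k-means but not the implementation, and in practice pandas (`get_dummies` plus `groupby(...).mean()`) and scikit-learn's `KMeans` would do the same work. The venue rows are made up, the helper names are hypothetical, and the evenly spaced starting centroids are an assumption chosen to make the run repeatable.

```python
from collections import Counter

# Made-up venue rows (neighborhood, category); not data from the project.
venues = (
    [("Hood A", "Thai Restaurant")] * 3 + [("Hood A", "Café")] * 7
    + [("Hood B", "Thai Restaurant")] * 1 + [("Hood B", "Park")] * 3
    + [("Hood C", "Thai Restaurant")] * 1 + [("Hood C", "Bar")] * 9
    + [("Hood D", "Thai Restaurant")] * 2 + [("Hood D", "Gym")] * 23
    + [("Hood E", "Bakery")] * 10
    + [("Hood F", "Pizza Place")] * 8
)


def category_frequency(rows, category):
    """Share of each neighborhood's venues that fall in `category`."""
    totals, hits = Counter(), Counter()
    for hood, cat in rows:
        totals[hood] += 1
        if cat == category:
            hits[hood] += 1
    return {hood: hits[hood] / totals[hood] for hood in totals}


def kmeans_1d(values, k, iterations=100):
    """Plain k-means on one feature (requires k >= 2)."""
    lo, hi = min(values), max(values)
    # Assumption: start with centroids evenly spaced between min and max.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def nearest(v):
        # Index of the centroid closest to v.
        return min(range(k), key=lambda c: abs(v - centroids[c]))

    for _ in range(iterations):
        labels = [nearest(v) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # assignments have stopped changing
            break
        centroids = updated
    return [nearest(v) for v in values], centroids


freq = category_frequency(venues, "Thai Restaurant")
labels, centroids = kmeans_1d(list(freq.values()), k=3)
cluster = dict(zip(freq, labels))
for hood in freq:
    print(hood, round(freq[hood], 2), "cluster", cluster[hood])
```

Because only one feature (Thai restaurant frequency) is clustered, the three clusters amount to three frequency bands. Cluster numbers from k-means are arbitrary labels in general, so they should be read against the centroid values rather than assumed to be ordered.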

Results
• The results from k-means clustering show that we can
categorize Toronto neighborhoods into 3 clusters based
on how many Thai restaurants are in each neighborhood:
• Cluster 0: Neighborhoods with few or no Thai
restaurants
• Cluster 1: Neighborhoods with no Thai restaurants
• Cluster 2: Neighborhoods with a high number of Thai
restaurants
• The results are visualized in the map to the right, with
Cluster 0 in red, Cluster 1 in purple and Cluster
2 in light green.

Recommendation
• Our recommendation is for the restaurateur to set up
business in Cluster 2, which has a high number of Thai
restaurants.
• This cluster includes the Adelaide, King, and
Richmond areas.
• We recommend against the Cluster 1 areas, which are the
North Toronto West and Parkdale areas.
• Limitations and Suggestions for Future Research
• Per our contract with our client, the restaurateur, this project
looks at a single factor: the density of Thai restaurants in each
neighborhood. There are many other factors that this restaurateur
should take into consideration, such as population density, income
of area residents, and rent prices. Future research can take these
factors into account.
