SlideShare a Scribd company logo
1 of 15
CAPSTONE PROJECT
BATTLE OF NEIGHBORHOODS
-SUBHADEEP DEB
INTRODUCTION
Toronto, the capital of Ontario province is the most populous city in Canada and the 4th most
in entire North America with a population of 2.9 million in 2017. Toronto is an international
center of business, finance, arts, and culture, and is recognized as one of the most
multicultural and cosmopolitan cities in the world, attracting tourists as well as immigrants
from various different parts of the world. Of the 2011 population, a whopping 49% of people
living in Toronto are immigrants (higher than the Canada-wide 29%) of which the Indian
community accounts for 6.3% of the total population. Having a considerable amount of
Indian population in Toronto means a good chunk of people will be practicing Indian culture
and customs in Toronto and the ethic food is undoubtedly one of the biggest part of Indian
lives. Research paper published by Joel Waldfogel found that Indian cuisine was the 4th most
popular in the world hence all these factors indicate no shortage in demand of Indian
restaurants in Toronto.
PROBLEM STATEMENT- To find the best place in Toronto to open a new Indian Restaurant.
DATA
For this project we used the following data:
1) Wikipedia page
'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M' to get data
containing the postal codes of various regions of Canada.
2) Foursquare API: to get the information about different venues (Indian Restaurants)
in Toronto.
3) Beautiful Soap for web page scraping.
4) "https://cocl.us/Geospatial_data": We will get the data of Toronto from here
METHODOLOGY
Importing Libraries, Web Scraping and Data Cleaning
At first, we imported all the necessary libraries
We then used web scraping to get data of postal codes of various Boroughs of Canada from the
Wikipedia page 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', we store the
information in a new dataset containing names of boroughs, neighbourhoods and postal codes.
Now we see that the dataset contains various ‘Not assigned’ values hence we need to clean the data set
by dropping the postal codes having ‘Not Assigned’ values and storing the values in a new dataframe.
Now create a new table containing the co-ordinates of all the postal codes with help of
"https://cocl.us/Geospatial_data" and then merge this table with the table containing names of
boroughs and neighbourhoods. We now only look for those neighbourhoods that have word ‘Toronto’
in it as our focus is only on the city of Toronto.
Visualization
We now visualize all the neighbourhoods of Toronto with help of folium map and create popup label
showing the neighbourhood’s name.
Then with the help of Foursquare API we create a get request URL to get the top 100 venues within a
radius of 500 meters. Then we do one hot encoding for getting dummies of the venue category and
group the whole data by Neighbourhoods.
Machine Learning
We will use k-means clustering to create separate clusters containing different neighbourhoods of
Toronto. But for using k-means clustering we must first find the optimal value for k so we create a graph
containing different values of k and the corresponding error. By analysing the graph, we see that the
elbow point for the graph occurs at k=4, hence we choose k=4. choose k=4.
So now we cluster the neighbourhoods and merge the tables so that we can see the names of
neighbourhood, cluster, co-ordinates, venue categories.
Now we show all the clusters in a Toronto map with help of folium to better visualise the results.
RESULTS
We find out the number of neighbourhoods in each cluster to give us the idea of each cluster. We find
out that Cluster 1 has the greatest number of neighbourhoods (30) followed by Cluster 0, Cluster 3 and
Cluster 2.
Then we look at the number of Indian restaurants in each of these clusters and find that
Cluster 2 has the highest number of Indian restaurants followed by Cluster 3 and Cluster 0
meanwhile Cluster 1 has little to no Indian restaurants.
Then we find the neighbourhoods in Cluster 2.
DISCUSSION
As we can see that the Cluster 1 has the most no. of Neighbourhoods but has the least number of
Indian restaurants of any cluster of Toronto. This shows that there is a potential market for opening a
new Indian restaurant in Cluster 1 neighbourhoods like Richmond, Studio District, Rosedale etc.
Meanwhile the situation is quite the opposite on Cluster 2 which as the fewest no. of neighbourhoods
but the most no. of Indian Restaurants. Cluster 0 and Cluster 3 both have average no. of
neighbourhoods and average no. of Indian restaurants hence they are well balanced. In the end I will
conclude that if I had to open a new Indian Restaurant in Toronto, I would have opened it in Cluster 1
neighbourhood.
CONCLUSION
With this report we found out a potential region for opening a new Indian restaurant in Toronto with
the help of various data science tools. The results of the report can be improved with more
comprehensive analysis and with help of other better data sources in the future.

More Related Content

Similar to Capstone project presentation

Adams_UrbanAgricultureandTerroir
Adams_UrbanAgricultureandTerroirAdams_UrbanAgricultureandTerroir
Adams_UrbanAgricultureandTerroir
Burke Adams
 

Similar to Capstone project presentation (20)

Scholarship Essay Educational Goals Sample
Scholarship Essay Educational Goals SampleScholarship Essay Educational Goals Sample
Scholarship Essay Educational Goals Sample
 
Introduction - How To Write An Essay - LibGuides At Universi
Introduction - How To Write An Essay - LibGuides At UniversiIntroduction - How To Write An Essay - LibGuides At Universi
Introduction - How To Write An Essay - LibGuides At Universi
 
Undergrad HTMG3030 Real Estate Assignment.docx
Undergrad HTMG3030 Real Estate Assignment.docxUndergrad HTMG3030 Real Estate Assignment.docx
Undergrad HTMG3030 Real Estate Assignment.docx
 
Clustering neighbourhood data
Clustering neighbourhood dataClustering neighbourhood data
Clustering neighbourhood data
 
Geek-A-Q (comp sci and tech quiz) finals, NIT JAMSHEDPUR
Geek-A-Q (comp sci and tech quiz) finals, NIT JAMSHEDPURGeek-A-Q (comp sci and tech quiz) finals, NIT JAMSHEDPUR
Geek-A-Q (comp sci and tech quiz) finals, NIT JAMSHEDPUR
 
Adams_UrbanAgricultureandTerroir
Adams_UrbanAgricultureandTerroirAdams_UrbanAgricultureandTerroir
Adams_UrbanAgricultureandTerroir
 
Grow to: An Urban Agriculture Action Plan for Toronto, Canada
Grow to: An Urban Agriculture Action Plan for Toronto, CanadaGrow to: An Urban Agriculture Action Plan for Toronto, Canada
Grow to: An Urban Agriculture Action Plan for Toronto, Canada
 
The Battle of Neighborhoods (Week 2)
The Battle of Neighborhoods (Week 2)The Battle of Neighborhoods (Week 2)
The Battle of Neighborhoods (Week 2)
 
Tuskegee Essay
Tuskegee EssayTuskegee Essay
Tuskegee Essay
 
Best Canadian Cities for Newcomers in 2024
Best Canadian Cities for Newcomers in 2024Best Canadian Cities for Newcomers in 2024
Best Canadian Cities for Newcomers in 2024
 
City Of Toronto, A World Class City
City Of Toronto, A World Class CityCity Of Toronto, A World Class City
City Of Toronto, A World Class City
 
Business Essay. Online business essay. Business Essay Topics To Write About....
Business Essay.  Online business essay. Business Essay Topics To Write About....Business Essay.  Online business essay. Business Essay Topics To Write About....
Business Essay. Online business essay. Business Essay Topics To Write About....
 
RestaurantProject
RestaurantProjectRestaurantProject
RestaurantProject
 
Essay About Yourself College. Online assignment writing service.
Essay About Yourself College. Online assignment writing service.Essay About Yourself College. Online assignment writing service.
Essay About Yourself College. Online assignment writing service.
 
Capstone project the battle of neighborhoods presentation
Capstone project the battle of neighborhoods presentationCapstone project the battle of neighborhoods presentation
Capstone project the battle of neighborhoods presentation
 
Wisconsin Essays 2016. Online assignment writing service.
Wisconsin Essays 2016. Online assignment writing service.Wisconsin Essays 2016. Online assignment writing service.
Wisconsin Essays 2016. Online assignment writing service.
 
Portfolio
PortfolioPortfolio
Portfolio
 
BIBLIOGRAPIYA- Kumalap ng mga Datos.pptx
BIBLIOGRAPIYA- Kumalap ng mga Datos.pptxBIBLIOGRAPIYA- Kumalap ng mga Datos.pptx
BIBLIOGRAPIYA- Kumalap ng mga Datos.pptx
 
4Th Grade Essay Prompt. Online assignment writing service.
4Th Grade Essay Prompt. Online assignment writing service.4Th Grade Essay Prompt. Online assignment writing service.
4Th Grade Essay Prompt. Online assignment writing service.
 
4Th Grade Essay Prompt
4Th Grade Essay Prompt4Th Grade Essay Prompt
4Th Grade Essay Prompt
 

Recently uploaded

Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 

Recently uploaded (20)

WSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public AdministrationWSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and ApplicationsWSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
WSO2Con2024 - Facilitating Broadband Switching Services for UK Telecoms Provi...
WSO2Con2024 - Facilitating Broadband Switching Services for UK Telecoms Provi...WSO2Con2024 - Facilitating Broadband Switching Services for UK Telecoms Provi...
WSO2Con2024 - Facilitating Broadband Switching Services for UK Telecoms Provi...
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
WSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital BusinessesWSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital Businesses
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of TransformationWSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
 

Capstone project presentation

  • 1. CAPSTONE PROJECT BATTLE OF NEIGHBORHOODS -SUBHADEEP DEB
  • 2. INTRODUCTION Toronto, the capital of Ontario province is the most populous city in Canada and the 4th most in entire North America with a population of 2.9 million in 2017. Toronto is an international center of business, finance, arts, and culture, and is recognized as one of the most multicultural and cosmopolitan cities in the world, attracting tourists as well as immigrants from various different parts of the world. Of the 2011 population, a whopping 49% of people living in Toronto are immigrants (higher than the Canada-wide 29%) of which the Indian community accounts for 6.3% of the total population. Having a considerable amount of Indian population in Toronto means a good chunk of people will be practicing Indian culture and customs in Toronto and the ethic food is undoubtedly one of the biggest part of Indian lives. Research paper published by Joel Waldfogel found that Indian cuisine was the 4th most popular in the world hence all these factors indicate no shortage in demand of Indian restaurants in Toronto. PROBLEM STATEMENT- To find the best place in Toronto to open a new Indian Restaurant.
  • 3. DATA For this project we used the following data: 1) Wikipedia page 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M' to get data containing the postal codes of various regions of Canada. 2) Foursquare API: to get the information about different venues (Indian Restaurants) in Toronto. 3) Beautiful Soap for web page scraping. 4) "https://cocl.us/Geospatial_data": We will get the data of Toronto from here
  • 4. METHODOLOGY Importing Libraries, Web Scraping and Data Cleaning At first, we imported all the necessary libraries
  • 5. We then used web scraping to get data of postal codes of various Boroughs of Canada from the Wikipedia page 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', we store the information in a new dataset containing names of boroughs, neighbourhoods and postal codes. Now we see that the dataset contains various ‘Not assigned’ values hence we need to clean the data set by dropping the postal codes having ‘Not Assigned’ values and storing the values in a new dataframe.
  • 6. Now create a new table containing the co-ordinates of all the postal codes with help of "https://cocl.us/Geospatial_data" and then merge this table with the table containing names of boroughs and neighbourhoods. We now only look for those neighbourhoods that have word ‘Toronto’ in it as our focus is only on the city of Toronto.
  • 7. Visualization We now visualize all the neighbourhoods of Toronto with help of folium map and create popup label showing the neighbourhood’s name.
  • 8. Then with the help of Foursquare API we create a get request URL to get the top 100 venues within a radius of 500 meters. Then we do one hot encoding for getting dummies of the venue category and group the whole data by Neighbourhoods. Machine Learning We will use k-means clustering to create separate clusters containing different neighbourhoods of Toronto. But for using k-means clustering we must first find the optimal value for k so we create a graph containing different values of k and the corresponding error. By analysing the graph, we see that the elbow point for the graph occurs at k=4, hence we choose k=4. choose k=4.
  • 9.
  • 10. So now we cluster the neighbourhoods and merge the tables so that we can see the names of neighbourhood, cluster, co-ordinates, venue categories.
  • 11. Now we show all the clusters in a Toronto map with help of folium to better visualise the results.
  • 12. RESULTS We find out the number of neighbourhoods in each cluster to give us the idea of each cluster. We find out that Cluster 1 has the greatest number of neighbourhoods (30) followed by Cluster 0, Cluster 3 and Cluster 2.
  • 13. Then we look at the number of Indian restaurants in each of these clusters and find that Cluster 2 has the highest number of Indian restaurants followed by Cluster 3 and Cluster 0 meanwhile Cluster 1 has little to no Indian restaurants. Then we find the neighbourhoods in Cluster 2.
  • 14. DISCUSSION As we can see that the Cluster 1 has the most no. of Neighbourhoods but has the least number of Indian restaurants of any cluster of Toronto. This shows that there is a potential market for opening a new Indian restaurant in Cluster 1 neighbourhoods like Richmond, Studio District, Rosedale etc. Meanwhile the situation is quite the opposite on Cluster 2 which as the fewest no. of neighbourhoods but the most no. of Indian Restaurants. Cluster 0 and Cluster 3 both have average no. of neighbourhoods and average no. of Indian restaurants hence they are well balanced. In the end I will conclude that if I had to open a new Indian Restaurant in Toronto, I would have opened it in Cluster 1 neighbourhood.
  • 15. CONCLUSION With this report we found out a potential region for opening a new Indian restaurant in Toronto with the help of various data science tools. The results of the report can be improved with more comprehensive analysis and with help of other better data sources in the future.