Neighborhoods
Similarities
Introduction
• Moving is a tricky decision in itself, even before having to choose the area that suits you best.
• Since it is hard to adapt to a new place, we found that it helps to find a place where you can keep up your daily routines, like going to a nearby gym in the morning or getting your coffee in a café near your house before going to work.
• Hence we came up with this idea: we try to find the most similar neighborhoods between two cities, to help people who are interested in moving out of their current neighborhood by recommending the neighborhood that best fits their lifestyle.
Data: New York Data
• For the New York data we used the same dataset we used previously in the 3rd week lab of the course [1]. It provides a JSON file containing already well-organized data that did not need much cleaning and eventually looked like this:
[Table: New York neighborhoods data frame]
Data: Toronto Data
• For the Toronto data we retrieved the boroughs and neighborhoods from a Wikipedia page [2] using web scraping techniques, and then used the geospatial data file provided in the 3rd week assignment to add the coordinates of each area. After some cleaning (removing all unassigned boroughs and grouping neighborhoods that belong to the same postal code) we arrived at the following table:
[Table: cleaned Toronto neighborhoods data frame]
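The cleaning steps described for the Toronto data can be sketched with pandas. This is a minimal illustration rather than the project's actual code: the column names, the "Not assigned" marker, and every row and coordinate below are hypothetical stand-ins for the scraped table and the geospatial file.

```python
import pandas as pd

# Hypothetical rows standing in for the scraped Wikipedia table.
raw = pd.DataFrame({
    "PostalCode": ["M1A", "M3A", "M5A", "M5A", "M6A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto",
                "Downtown Toronto", "North York"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Regent Park",
                     "Harbourfront", "Lawrence Manor"],
})

# Hypothetical coordinates standing in for the geospatial data file.
coords = pd.DataFrame({
    "PostalCode": ["M3A", "M5A", "M6A"],
    "Latitude": [43.75, 43.65, 43.72],
    "Longitude": [-79.33, -79.36, -79.46],
})

# Step 1: drop rows whose borough is unassigned.
assigned = raw[raw["Borough"] != "Not assigned"]

# Step 2: collapse neighborhoods that share a postal code into one row.
grouped = (
    assigned.groupby(["PostalCode", "Borough"], as_index=False)["Neighborhood"]
    .agg(", ".join)
)

# Step 3: attach the coordinates of each postal code area.
toronto = grouped.merge(coords, on="PostalCode", how="left")
print(toronto)
```

Joining the grouped names with a comma keeps every neighborhood visible in the row for its postal code, which is one simple way to represent the grouping.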
Data: Merging the Two Datasets
• Finally, we reformatted the Toronto neighborhood data frame to match the format of the New York neighborhood data frame, so that the two could be concatenated into one data frame containing all neighborhoods. Before concatenating, we added a suffix to each neighborhood name to indicate the country the neighborhood belongs to. This step will be very useful if we decide to expand the project in the future to include more cities.
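One possible way to do the merge is sketched below, assuming both data frames share the four columns listed in `columns`. The column names, the sample rows, and the suffix strings "USA" and "Canada" are illustrative assumptions, not taken from the project.

```python
import pandas as pd

# Hypothetical one-row samples of the two cleaned data frames.
new_york = pd.DataFrame({
    "Borough": ["Bronx"],
    "Neighborhood": ["Port Morris"],
    "Latitude": [40.80],
    "Longitude": [-73.92],
})
toronto = pd.DataFrame({
    "PostalCode": ["M5A"],
    "Borough": ["Downtown Toronto"],
    "Neighborhood": ["Regent Park, Harbourfront"],
    "Latitude": [43.65],
    "Longitude": [-79.36],
})

# Shared layout that both frames are reduced to before concatenation.
columns = ["Borough", "Neighborhood", "Latitude", "Longitude"]


def tag(frame, suffix):
    """Keep the shared columns and append a suffix to each neighborhood name."""
    out = frame[columns].copy()
    out["Neighborhood"] = out["Neighborhood"] + ", " + suffix
    return out


all_neighborhoods = pd.concat(
    [tag(new_york, "USA"), tag(toronto, "Canada")],
    ignore_index=True,
)
print(all_neighborhoods)
```

Because the suffix is applied per source frame, adding another city later would only require one more `tag(...)` entry in the list passed to `pd.concat`.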
Methodology: Features
For the purpose of this project, we consider the venues surrounding a neighborhood to be the only factor that determines the similarity between neighborhoods. We extract the X most common venues of each neighborhood (X = 7 in this project) to use as the features of our model, and after retrieving those features we store them in a data frame, which eventually looked like this:
[Table: most common venues per neighborhood]
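The feature extraction above could look roughly like the following sketch. It assumes the venue data is available as one row per venue with a category label; the column names and sample venues are hypothetical, and `TOP_N` is set to 2 only to keep the toy example small.

```python
import pandas as pd

# Hypothetical venue list: one row per venue found near a neighborhood.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "A", "A", "B", "B", "B"],
    "Category": ["Coffee Shop", "Coffee Shop", "Coffee Shop", "Gym", "Gym",
                 "Park", "Gym", "Gym", "Bar"],
})

TOP_N = 2  # the project uses the 7 most common venues

# Relative frequency of each venue category within each neighborhood.
freq = pd.crosstab(venues["Neighborhood"], venues["Category"], normalize="index")


def top_categories(row, n=TOP_N):
    """Return the n most frequent venue categories for one neighborhood."""
    return row.sort_values(ascending=False, kind="stable").index[:n].tolist()


top = {name: top_categories(row) for name, row in freq.iterrows()}
print(top)
```

The frequency table `freq` is numeric, so it can be passed directly to a distance-based model, while `top` gives the human-readable ranking.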
Methodology: Model
For our model we chose hierarchical clustering, specifically agglomerative clustering, since it is the most commonly used hierarchical clustering model. The reason behind this choice is that a hierarchical model can give us different levels of similarity. Classical partition-based clustering such as K-means can only say that points belonging to the same cluster are similar, without considering how similar they are, whereas with a hierarchical model we can easily find the most similar and the most dissimilar neighborhood for any given neighborhood, which gives us more options for our goal.
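A small sketch of this idea with SciPy follows. The feature vectors and neighborhood names are made up, and the choice of average linkage with Euclidean distance is an assumption, since the slides do not state which linkage or metric was used.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical venue-frequency features for four neighborhoods.
names = ["NY-1", "NY-2", "TO-1", "TO-2"]
features = np.array([
    [0.60, 0.30, 0.10],
    [0.50, 0.40, 0.10],
    [0.10, 0.20, 0.70],
    [0.58, 0.32, 0.10],
])

# Agglomerative clustering: repeatedly merge the two closest groups.
Z = linkage(features, method="average", metric="euclidean")

# Cutting the tree at a chosen level yields a flat grouping (here: 2 groups).
two_groups = fcluster(Z, t=2, criterion="maxclust")

# Pairwise distances let us rank neighbors for any single neighborhood.
dist = squareform(pdist(features, metric="euclidean"))
masked = dist.copy()
np.fill_diagonal(masked, np.inf)  # ignore the distance of a point to itself

most_similar = {names[i]: names[int(masked[i].argmin())] for i in range(len(names))}
most_dissimilar = {names[i]: names[int(dist[i].argmax())] for i in range(len(names))}

print(two_groups)
print(most_similar)
print(most_dissimilar)
```

In this toy data the closest match for "NY-1" is "TO-2", a neighborhood in the other city, which is the kind of cross-city pairing the project is looking for.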
Methodology: Libraries
• Beautiful Soup: we used the Beautiful Soup library for web scraping, in order to retrieve the Toronto city data and format it properly.
• SciPy: we chose the SciPy library to implement our model. The reason we chose it over sklearn is that it provides simple functions that help us draw our dendrogram.
• Foursquare: we used the Foursquare API to retrieve all the venues near a specific location.
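As an illustration of the SciPy point above, the sketch below builds a dendrogram from a linkage matrix. The data and labels are hypothetical, and `no_plot=True` is used here only so the example runs without matplotlib; it returns the tree layout as a dictionary instead of drawing it.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical features and labels for four neighborhoods.
labels = ["NY-1", "NY-2", "TO-1", "TO-2"]
features = np.array([
    [0.60, 0.30, 0.10],
    [0.50, 0.40, 0.10],
    [0.10, 0.20, 0.70],
    [0.58, 0.32, 0.10],
])

# Linkage method is an assumption; the slides do not name one.
Z = linkage(features, method="average")

# no_plot=True computes the layout without drawing anything.
tree = dendrogram(Z, labels=labels, no_plot=True)

# "ivl" holds the leaf labels in the order they appear along the axis.
leaf_order = tree["ivl"]
print(leaf_order)
```

With matplotlib installed, dropping `no_plot=True` draws the tree on the current axes; neighborhoods that merge low in the tree are the most similar ones.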
Result
After fitting our model to our features, we found that most neighborhoods that are very similar lie within the same city, but there are a few exceptions. One is Port Morris, New York, whose closest match is the group of neighborhoods inside the postal area M7Y. This means that the two are more similar to each other than they are to the other neighborhoods within their own cities.
Discussion
As we saw previously, we were able to reach a model that can answer our question. However, it is somewhat hard to measure the accuracy of this model, as is the case with most clustering problems. That is acceptable in a way, since the goal of the model is to give an approximate recommendation to someone who is looking to move to a new neighborhood and does not want to change his/her lifestyle dramatically.
Yet there is still one big limitation in our project: we did not use enough features to reach the best decision possible, since we considered the 7 most common venues to be the only deciding factor.
Discussion (Contd.)
For future work, we can add more features such as land area, population density, average monthly temperature, and other properties that can factor into one's decision when choosing a new living area. We can also try different models that can help us achieve our goal, such as a fuzzy clustering model, which can be very useful to us since it supports a data point belonging to multiple clusters with different degrees of possibility.
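To make the fuzzy clustering idea concrete, the sketch below computes membership degrees in the style of fuzzy c-means for fixed cluster centers. It is only the membership step, not a full clustering algorithm, and the points, centers, and fuzzifier value `m = 2` are assumptions chosen for illustration.

```python
import numpy as np


def fuzzy_memberships(points, centers, m=2.0):
    """Degree to which each point belongs to each center (rows sum to 1).

    Uses the fuzzy c-means membership rule
    u_ij = d_ij^(-2/(m-1)) / sum_k d_ik^(-2/(m-1)).
    """
    # Distance from every point to every center, shape (n_points, n_centers).
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # avoid division by zero when a point sits on a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


# Hypothetical 2-D points and two fixed cluster centers.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]])
centers = np.array([[0.0, 0.0], [1.0, 0.0]])

u = fuzzy_memberships(points, centers)
print(u)
```

The third point lies halfway between the two centers, so it belongs to both clusters equally, whereas a hard partition would have to assign it to just one.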
Conclusion
In conclusion, we found that a hierarchical clustering model can help us find the most similar neighborhood for a specific neighborhood, in order to help someone decide on a new place to live without making dramatic changes to his/her routine. We also found that one limiting factor in our work is the set of features we chose: it is not enough for the model to reach the best decision possible, so we will add more features in future work.
References
• [1]: https://cocl.us/new_york_dataset
• [2]: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
