A New Venue for Toronto
Capstone Project for IBM Data Science Professional Certificate via Coursera
ffmulksin
This is a presentation of my capstone project for the IBM
Data Science Professional Certificate via Coursera.[1]
IBM DATA SCIENCE PROFESSIONAL
The code is available as a Jupyter notebook on
my GitHub account:
https://github.com/ffmulks/data...certification
CODE AVAILABILITY
Florian is a chemistry researcher who strives to use data-
driven decisions to fast-forward experimental and
computational chemical research processes.
DR. FLORIAN F. MULKS
Introduction
Data science is the study of large quantities of data, which
can reveal insights that help make strategic choices.
WHAT IS DATA SCIENCE?
Data scientists need to be curious, judgemental and
argumentative. Finding a good story in the data and
telling it well is as important as data treatment.
DATA SCIENTISTS
Data science is a multi-disciplinary field influenced by
computer science, software engineering, mathematics,
statistics, economics and business management and uses
tools from all of these.
METHODS
Data Science
Types of Machine Learning
[Figure: Machine Learning splits into Unsupervised Learning (Dimensionality Reduction, Clustering), Supervised Learning (Regression, Classification), and Reinforcement Learning.]
This is an overview of the available algorithm groups
classified as machine learning methods. Please refer to
the course material for further details.[1]
QUICK REMINDER
A broad set of machine learning methods was applied in
the scope of this project. Justification and further
explanation are given in the discussion of our results.
USED WITHIN THIS PROJECT
Methodology
CRoss-Industry Standard Process for Data Mining (1996),
project by the EU and five companies. Most widely used
analytics model.
CRISP-DM, 1996
A more refined 10-step version of CRISP-DM by an IBM data
scientist. It adds analytic approach, data collection, data
understanding and feedback.
JOHN ROLLINS’ MODEL, 2015
Analytics Solutions Unified Method for Data
Mining/Predictive Analytics, by IBM. Especially refines the
model around reiteration.
ASUM-DM, 2015
[Figure: CRISP-DM cycle around the central Data: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment.]
Business Understanding
Investors or city planners are looking to open a venue in
Toronto that is profitable and contributes toward the local
cultural infrastructure.
LAUNCH ANY VENUE
We are not posing any prior limitations regarding the
location or type of venue. The decision will be purely
based on data-driven predictions.
DECISION CIRCUMSTANCES
Toronto is a cultural melting pot among the top 10 most
populated cities in North America. Rich diversity is found
in venues and citizens.
TORONTO, CANADA
Ref. [2]
Literature Overview
There are plenty of analyses of Toronto’s venue and
population structure because the city is the standard
example in the course capstone.
DATA SCIENCE CAPSTONE
Available studies aim at opening specific venue types,[3] at
finding the best places to live,[4] or at declaring an area
the “best” neighborhood.[5]
AIM OF PREVIOUS STUDIES
Most studies start by clustering postal code areas and
then derive decisions from bar graphs of additional data
sources chosen to suit their goal.
METHODS SUMMARY
Ref. [2]
Toronto Postal Codes and Venues
We will use the 103 postal codes starting with M in
Toronto, Canada, scraped from Wikipedia together with
assigned boroughs and neighborhoods.
DATA SET
Venue data was requested from the Foursquare API
within 500 m radii of postal code centroids. A limit of 100
venues per postal code request was applied.
VENUES
The collected dataset contains names, addresses,
geographic locations, and venue categories. We have
extracted 2130 venues from 271 categories spread over
103 areas defined by 500 m radii around the center of
postal codes.
DATA OVERVIEW
Geographic Venue Distribution
We computed the total number of venues and the mean
distance between them, using great-circle (haversine)
distances over the Earth's surface.
FEATURE CONSTRUCTION
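The haversine distance behind this feature can be sketched as follows. This is a minimal illustration rather than the notebook's code: the function names `haversine_km` and `mean_pairwise_distance_km`, the Earth radius of 6371 km, and the sample coordinates are all assumptions made for the example.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_pairwise_distance_km(points):
    """Mean haversine distance over all unordered pairs of (lat, lon) points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(haversine_km(*p, *q) for p, q in pairs) / len(pairs)


# Hypothetical venue coordinates near downtown Toronto (illustrative only).
venues = [(43.6650, -79.3800), (43.6660, -79.3810), (43.6640, -79.3790)]
print(round(mean_pairwise_distance_km(venues), 3))
```

Applied per postal code, the venue count is simply `len(venues)` and the second feature is the mean pairwise distance.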
Most of the investigated 500 m radii contain fewer than 20
venues. Venue distances vary little among the different
postal code areas.
DISTRIBUTION
Models are expected to be of limited representativeness
because little data exists per postal code radius.
FIRST UNDERSTANDING
Venue Density Distribution
High venue counts should lead to high venue densities,
which could be a good measure of both the competitiveness
and the profitability of a location.
VENUE DENSITY
Because there is little data for neighborhoods with high
venue counts, no good correlation can be found. A slight
decrease in density is found with higher counts.
DISTRIBUTION
Distances seem broadly spread in the low venue count
neighborhoods. The mean actually slightly increases with
increasing venue counts. Let us see if we can model the
structure of the venue density.
WEAK CORRELATION EXPECTED
Venue Density Regression
Linear regression was applied to the venue density and
we created polynomial features of the distances and
venue counts to investigate the data structure.
REGRESSION MODELS
The linear model fits poorly, with R² = 0.01. The cubic
model delivers R² = 0.06, explaining only 6% of the
variation in the data.
CORRELATION
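A comparison of a linear and a cubic fit by their R² scores can be sketched as below. The example uses `numpy.polyfit` on a single predictor as a simplified stand-in for the polynomial features of distances and venue counts described above. The helper names and the synthetic data are assumptions, so the printed scores will not reproduce the values reported on this slide.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def polynomial_r2(x, y, degree):
    """Fit a least-squares polynomial of the given degree and return its R^2."""
    coeffs = np.polyfit(x, y, degree)
    return r_squared(y, np.polyval(coeffs, x))


# Synthetic stand-in data: venue counts and an unrelated, noisy density.
rng = np.random.default_rng(0)
venue_count = rng.integers(1, 100, size=103).astype(float)
density = rng.normal(loc=1.0, scale=0.3, size=103)

r2_linear = polynomial_r2(venue_count, density, degree=1)
r2_cubic = polynomial_r2(venue_count, density, degree=3)
print(f"linear R^2 = {r2_linear:.3f}, cubic R^2 = {r2_cubic:.3f}")
```

On the data used for fitting, R² cannot decrease when higher-order terms are added, so a small gain from the linear to the cubic model is not by itself evidence of real structure.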
We cannot employ the venue density for a priori binning
of our data, but the distance between the venues seems a
valuable feature to keep for further analyses.
FEATURE TREATMENT
Venue Categories Distribution
Foursquare defines categories to explain what exactly the
customer can expect when visiting a venue.
FOURSQUARE CATEGORIES
The categories are extremely detailed. The vast majority
of the 271 venue categories occur only once in the
whole observed region in Toronto.
DISTRIBUTION
Clustering and Principal Component Analysis (PCA) will be
needed to reduce the dimensionality. 271 categories with
only 103 samples (postal codes) make for ill-defined
matrices reducing the choice of applicable algorithms.
HIGH DIMENSIONALITY
Clustering Algorithms
Several algorithms and initialization types were employed
to find similarities in neighborhoods based on their venue
counts and categories.
CLUSTERING
Most algorithms assign only downtown areas to smaller
clusters. The vast majority of areas end up in one or
sometimes two “low venue count” clusters.
DISCOVERED PATTERNS
K-Means (k-means++ initialization) was chosen for further
investigation as it was capable of showing some
structures even outside of downtown Toronto.
ALGORITHM CHOICE
Algorithms compared in the figure: K-Means (Random, PCA), K-Means (Random), K-Means (k-means++), DBSCAN, Agglomerative Clustering.
k = 52 clusters found
K-Means Number of Clusters k
More clusters will always capture more of the variation in
the data. The usual choice is the cluster number at which
this gain levels off (the elbow).
ELBOW METHOD
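The elbow method can be sketched with a small k-means written in plain NumPy. Note the swap: this is Lloyd's algorithm with random initialization and a few restarts, not the library k-means++ implementation chosen in this project. The function name `kmeans_inertia` and the synthetic three-blob data are assumptions made for illustration.

```python
import numpy as np


def kmeans_inertia(X, k, n_iter=50, n_init=5, seed=0):
    """Run plain Lloyd's k-means several times and return the lowest inertia found."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        # Random initialization: pick k distinct samples as starting centroids.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Squared distance from every sample to every centroid.
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Move each centroid to the mean of its members; keep it if the cluster is empty.
            new_centroids = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        # Inertia: sum of squared distances to the nearest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d2.min(axis=1).sum())
    return best


# Synthetic data with three well-separated groups (illustrative only).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centers])

inertias = [kmeans_inertia(X, k) for k in range(1, 7)]
for k, inertia in zip(range(1, 7), inertias):
    print(k, round(float(inertia), 1))
```

Plotting these inertias against k gives the elbow plot. When most samples are very similar, the curve flattens gradually and shows no distinct bend.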
While we find the correct location of downtown
neighborhoods at all k, the elbow plot shows no good k
due to high similarity of most areas.
RESULTS
Due to the high similarity of our observed areas,
clustering does not deliver useful insights with the
employed data set.
HIGH AREA SIMILARITY
2d-Principal Component Analysis (PCA)
The axes are found to correlate to venue counts from
certain categories, indicating similarities between venues
of certain categories.
NEIGHBORHOODS
The matrix was transposed to find structures in our
categories rather than areas. The x-PCA represents high
density, the y-PCA low density neighborhoods.
CATEGORIES
Certain venues are likely to appear in certain density
neighborhoods. PCA is valuable for grouping the
categories. We will reduce the dimensionality of our data
set to 30 (11% of 271 categories) to create more
meaningful variables.
DATA STRUCTURE
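The two PCA views, by neighborhood and by category after transposing, can be sketched with a singular value decomposition. This is a sketch under stated assumptions: `pca_reduce` is a hypothetical helper, and the Poisson-distributed matrix is a random stand-in for the real 103 x 271 table of venue counts per category.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X (samples x features) onto its first n_components principal axes."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = X_centered @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


# Random stand-in for the 103 x 271 matrix of venue counts per category.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=0.5, size=(103, 271)).astype(float)

# Neighborhood view: rows are postal code areas.
area_scores, area_var = pca_reduce(counts, n_components=2)
# Category view: transpose so that rows are categories.
category_scores, category_var = pca_reduce(counts.T, n_components=2)
print(area_scores.shape, category_scores.shape)
```

The second return value is the fraction of variance explained by each kept component, which indicates how much structure a 2D plot can show.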
Model Development
The differences between true venue counts and those
predicted by multiple linear regression (MLR) were used to
evaluate the demand for venues in Toronto neighborhoods.
PREDICTIVE MODEL
Categories were clustered to identify similar venue types
catering to the same needs. These clusters were used to
correct for demand that similar venues already saturate.
REAL DEMAND CORRECTION
The resulting predictions show both the location and the
category for venues that are in demand in Toronto.
DEMAND PREDICTION
[Figure: workflow diagram. Nodes: Venue Counts in Categories, Category Clusters, Predicted Venue Counts, Predicted Cluster Venue Counts, Venue Demand, Cluster Demand, Real Venue Demand. Steps: K-Means, Multiple Linear Regression, Difference (True vs. Predicted), Product (Venue x Cluster).]
Demand Prediction
MLR with 5-fold cross-validation was performed to predict
each of our 273 variables (271 categories + venue
count + mean distance).
MULTIPLE LINEAR REGRESSION (MLR)
The resulting prediction matrix emulates average healthy
Toronto neighborhoods based on the number of
neighboring venues of certain categories.
VENUE COUNT PREDICTION
Differences between true and predicted values can be
employed to predict market oversaturation and, more
importantly, market demand for venue types.
VENUE DEMAND PREDICTION
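The idea of predicting every variable from all the others and reading demand off the residual can be sketched as follows. This is a minimal sketch, not the notebook's implementation: it uses `numpy.linalg.lstsq` with a hand-written 5-fold split, the helper names are hypothetical, the sign convention (demand = predicted minus true) is an assumption, and the data are synthetic.

```python
import numpy as np


def cross_val_predict_column(X, target, n_splits=5, seed=0):
    """Predict column `target` of X from all other columns with k-fold least squares."""
    n = X.shape[0]
    features = np.delete(X, target, axis=1)
    # Add an intercept column of ones.
    A = np.hstack([np.ones((n, 1)), features])
    y = X[:, target]
    order = np.random.default_rng(seed).permutation(n)
    predictions = np.empty(n)
    for test_idx in np.array_split(order, n_splits):
        train_idx = np.setdiff1d(order, test_idx)
        # lstsq returns the minimum-norm solution if the system is underdetermined.
        coef, *_ = np.linalg.lstsq(A[train_idx], y[train_idx], rcond=None)
        predictions[test_idx] = A[test_idx] @ coef
    return predictions


def demand_matrix(X, n_splits=5):
    """Assumed convention: demand = predicted minus true value, for every column of X."""
    predicted = np.column_stack(
        [cross_val_predict_column(X, j, n_splits) for j in range(X.shape[1])]
    )
    return predicted - X


# Synthetic example: the third column is an exact linear function of the first two,
# so every column is perfectly predictable and all demands should be close to zero.
rng = np.random.default_rng(42)
base = rng.poisson(lam=3.0, size=(40, 2)).astype(float)
X = np.column_stack([base, base[:, 0] + 2.0 * base[:, 1]])
demand = demand_matrix(X)
print(demand.shape)
```

Under this convention a positive entry means the area has fewer venues of that type than the model expects (demand), and a negative entry means oversaturation. With 273 variables and only 103 areas each regression is underdetermined, so out-of-fold predictions are the meaningful ones.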
Category Clustering
Some categories are very similar. Predicted demand for
coffee shops might already be fully met by existing cafés.
DEMAND SATURATION
We used K-Means clustering with our transposed data to
find 30 category clusters to reduce the dimensionality of
our data.
DIMENSIONALITY REDUCTION
The clusters capture other venues that might satisfy the
need for, e.g., coffee, such as hotels and restaurants,
along with some surprising categories.
CLUSTERS CAPTURE INTERACTIONS
General Travel, Modern European Restaurant,
Steakhouse, Restaurant, Plaza, Cuban Restaurant, Gift
Shop, New American Restaurant, Brazilian Restaurant,
Japanese Restaurant, Smoke Shop, French Restaurant,
Pub, Art Gallery, Shopping Mall, Speakeasy, Tea Room,
Italian Restaurant, Salon / Barbershop, Food Court,
Vegetarian / Vegan Restaurant, Seafood Restaurant,
Concert Hall, Nightclub, Gluten-free Restaurant, Soup
Place, American Restaurant, Bakery, Department Store,
Gastropub, Hotel, Coffee Shop, Opera House, Food Truck,
Lounge, Asian Restaurant, Art Museum, Cupcake Shop,
Train Station, Beer Bar, Colombian Restaurant, Café,
Record Shop, Bookstore, Deli / Bodega, Building, Men’s
Store, Fast Food Restaurant, Wine Bar, Dog Run,
Monument / Landmark, Museum, Thai Restaurant, Salad
Place
COFFEE SHOP CLUSTER
Demand in Venue Categories
We analyzed the demand as the difference between
predicted and true venue counts over the 30 category
clusters.
CLUSTER DEMAND
The highest demand for venues is found in the cluster
containing coffee shops. The map shows neighborhood
demands in the coffee cluster.
COFFEE CLUSTER HIGHLY DEMANDED
Most suburban areas have a fulfilled demand in the
coffee cluster. We found that many downtown areas
would support several more venues in the cluster.
STRONG DEMAND IN DOWNTOWN AREA
Coffee Cluster Demand in Neighborhoods
The coffee cluster showed the highest demand of all
clusters. We now look at the distribution of this
demand over the different locations.
DEMAND LOCATION
Toronto seems to have a healthy number of venues in the
category, but their locations are suboptimal, with both
over- and undersaturated neighborhoods.
DEMAND DISTRIBUTION
There are several neighborhoods that strongly demand
more venues in the coffee cluster. Let us look into the
details to find out which venue category would be the
most lucrative to launch.
AREAS WITH HIGH DEMAND FOUND
Coffee and Café Demand
Coffee shops and cafés are in the highest demand within
their cluster. Their combined demand is shown in the
maps.
CATEGORY DEMAND
Toronto needs coffee! The two almost identical categories
coffee shops and cafés are in extreme demand especially
in the Church and Wellesley area.
COFFEE DEMAND
We now need to make sure that this neighborhood is not
saturated with e.g. tea rooms and other venues that serve
coffee as well.
INTERACTION WITH OTHER VENUES
Real Demand
We took the product of the market demand (negative market
saturation) of the whole cluster and the demand for
coffee shops and cafés alone.
DEMAND PRODUCT
In some cases, the coffee demand is largely saturated by
other venues (larger yellow than red circle). One extreme
demand product is found.
SATURATION OF COFFEE DEMAND
The neighborhood Church and Wellesley lacks 4.5 coffee
shops/cafés and is even lacking 6.0 venues from the
whole cluster which yields an outstanding demand
product of 26.7.
OUTSTANDING DEMAND FOUND
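As a quick arithmetic check, the displayed one-decimal values do not multiply exactly to the displayed products (4.5 x 6.0 = 27.0, not 26.7). A plausible explanation, stated here as an assumption, is that the products were computed from unrounded demands. The sketch below tests whether each reported product lies within the range allowed by rounding both factors to one decimal. The helper `product_bounds` is hypothetical.

```python
# (cafés demand, cluster demand, reported product) as displayed on the map labels.
labels = [
    (2.1, 5.3, 10.9),
    (4.5, 6.0, 26.7),  # Church and Wellesley
    (1.4, 6.2, 8.8),
    (1.8, 4.2, 7.5),
]


def product_bounds(cafes, cluster, half_step=0.05):
    """Range of products compatible with two factors rounded to one decimal."""
    low = (cafes - half_step) * (cluster - half_step)
    high = (cafes + half_step) * (cluster + half_step)
    return low, high


for cafes, cluster, reported in labels:
    low, high = product_bounds(cafes, cluster)
    naive = cafes * cluster
    consistent = low <= reported <= high
    print(f"{cafes} x {cluster} = {naive:.2f}, reported {reported}, "
          f"rounding range [{low:.2f}, {high:.2f}], consistent: {consistent}")
```

All four reported products fall inside their rounding ranges, so the labels are mutually consistent under that assumption.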
Map labels (Cafés / Cluster / Product):
2.1 / 5.3 / 10.9
4.5 / 6.0 / 26.7
1.4 / 6.2 / 8.8
1.8 / 4.2 / 7.5
Among all venue categories, coffee shops/cafés were
identified by linear regression to be in extreme demand
throughout many central Toronto areas.
HIGH COFFEE DEMAND
Venue clustering showed that there is an extreme coffee
demand in the Church and Wellesley area that even lacks
further venues of similar categories.
BEST LOCATION FOR A STORE
Launch a coffee shop in Church and Wellesley. The area
lacks 4.5 coffee shops and a grand total of 6 venues
catering to the need for coffee. This should be very lucrative!
ACTION RECOMMENDATION
Conclusions
Ref. [6]
Thanks for Reading!
Thank you for your attention. If you have any comments,
please get in touch. Constructive criticism is always
welcome.
This project was done to learn the ropes of data science
for application in my research aiming to enable simple
and semi-automatic computational chemical modelling
for experimental researchers. If you are interested in my
work, feel free to contact me or check out my homepage
and social media.
GET IN TOUCH
inff@mulks.ac
mulks.ac ffmulks
ffmulks
Ref. [6]
References
[1] Course material: https://www.coursera.org/professional-certificates/ibm-data-science, 07/08/2020.
[2] Toronto skyline photography: Wladyslaw Sojka via http://www.sojka.photo, 07/08/2020.
[3] Best place to open specific venue: https://towardsdatascience.com/exploring-toronto-neighborhoods-to-open-an-indian-restaurant-ff4dd6bf8c8a, https://capstoneprojectcoursera.wordpress.com/, 07/08/2020.
[4] Rental or personal flat: https://medium.com/...-ibm-capstone-project-52b4292ef410, https://www.linkedin.com/pulse/capstone-project-battle-neighborhoods-rohitaksh-gs/, 07/08/2020.
[5] Best neighbourhood: https://github.com/gnavia007/Coursera_Capstone/, http://roshangrewal.com/...finding-a-better-place-in-scarborough-toronto/, 07/08/2020.
[6] Coffee photography: Mike Kenneally via https://unsplash.com, 07/08/2020.
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
 

A New Venue for Toronto: IBM Data Science Capstone Project by F. F. Mulks

  • 1. A New Venue for Toronto: Capstone Project for the IBM Data Science Professional Certificate via Coursera
  • 2. Introduction. IBM DATA SCIENCE PROFESSIONAL: This is a presentation of my capstone project for the IBM Data Science Professional Certificate via Coursera.[1] CODE AVAILABILITY: The code is available as a Jupyter notebook on my GitHub account: https://github.com/ffmulks/data...certification DR. FLORIAN F. MULKS: Florian is a chemistry researcher who strives to use data-driven decisions to fast-forward experimental and computational chemical research processes.
  • 3. Data Science. WHAT IS DATA SCIENCE? Data science is the study of large quantities of data, which can reveal insights that help make strategic choices. DATA SCIENTISTS: Data scientists need to be curious, judgemental and argumentative. Finding a good story in the data and telling it well is as important as data treatment. METHODS: Data science is a multi-disciplinary field influenced by computer science, software engineering, mathematics, statistics, economics and business management, and it uses tools from all of these.
  • 4. Types of Machine Learning. Diagram: Machine Learning divides into Unsupervised Learning (Dimensionality Reduction, Clustering), Supervised Learning (Regression, Classification), and Reinforcement Learning. QUICK REMINDER: This is an overview of the available algorithm groups classified as machine learning methods. Please refer to the course material for further details.[1] USED WITHIN THIS PROJECT: A broad set of machine learning methods was applied in the scope of this project. Justification and further explanation are given in the discussion of our results.
  • 5. Methodology. CRISP-DM, 1996: CRoss-Industry Standard Process for Data Mining (1996), a project by the EU and five companies. It is the most widely used analytics model. JOHN ROLLINS’ MODEL, 2015: A more refined 10-step version of CRISP-DM by an IBM data scientist. He adds analytic approach, data collection, data understanding and feedback. ASUM-DM, 2015: Analytics Solutions Unified Method for Data Mining/Predictive Analytics, by IBM. It especially refines the model around reiteration. Diagram labels: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment, Data.
  • 6. Business Understanding. LAUNCH ANY VENUE: Investors or city planners are looking to open a venue in Toronto that is profitable and contributes to the local cultural infrastructure. DECISION CIRCUMSTANCES: We impose no prior limitations on the location or type of venue. The decision will be based purely on data-driven predictions. TORONTO, CANADA: Toronto is a cultural melting pot and among the 10 most populous cities in North America. Its venues and citizens are richly diverse. Ref. [2]
  • 7. Literature Overview. DATA SCIENCE CAPSTONE: There are plenty of analyses of Toronto’s venue and population structure because the city serves as the standard example in the course capstone. AIM OF PREVIOUS STUDIES: Available studies aim at opening specific venue types,[3] at finding the best places to live,[4] or at declaring an area the “best” neighborhood.[5] METHODS SUMMARY: Most studies start by clustering postal code areas and then derive decisions from bar graphs of additional data sources chosen to suit their goal. Ref. [2]
  • 8. Toronto Postal Codes and Venues. DATA SET: We use the 103 postal codes starting with M in Toronto, Canada, scraped from Wikipedia together with their assigned boroughs and neighborhoods. VENUES: Venue data was requested from the Foursquare API within 500 m radii of the postal code centroids, with a limit of 100 venues per postal code request. DATA OVERVIEW: The collected dataset contains names, addresses, geographic locations, and venue categories. We extracted 2130 venues from 271 categories spread over 103 areas, each defined by a 500 m radius around the center of a postal code.
  • 9. Geographic Venue Distribution. FEATURE CONSTRUCTION: We computed the total number of venues and the mean distances between them by calculating distances along the Earth’s surface (Haversine distances). DISTRIBUTION: Most of the investigated 500 m radii contain fewer than 20 venues. Venue distances appear to vary little among the different postal code areas. FIRST UNDERSTANDING: Models are expected to be of limited representativeness because little data exists per postal code radius. Figure views: distribution plot, box plots.
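The Haversine feature mentioned on this slide can be sketched in a few lines. This is a minimal illustration, not the notebook's actual code: the function names, the Earth radius constant, the sample coordinates, and the choice to average over all unordered venue pairs are assumptions.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_pairwise_distance_km(venues):
    """Mean Haversine distance over all unordered pairs of (lat, lon) tuples."""
    pairs = list(combinations(venues, 2))
    if not pairs:
        return 0.0  # fewer than two venues: no distance to average
    return sum(haversine_km(*p, *q) for p, q in pairs) / len(pairs)


# Hypothetical venue coordinates near downtown Toronto (illustrative only).
venues = [(43.6650, -79.3800), (43.6660, -79.3810), (43.6640, -79.3790)]
mean_km = mean_pairwise_distance_km(venues)
```

Computing this value once per postal code area would give one "mean distance" feature per area alongside the venue count.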
  • 10. Venue Density Distribution. VENUE DENSITY: High venue counts should lead to high venue densities, which could be a good measure of both the competitiveness and the profitability of a location. DISTRIBUTION: Because there is little data for neighborhoods with high venue counts, no good correlation can be found. A slight decrease in density is found with higher counts. WEAK CORRELATION EXPECTED: Distances are broadly spread in the neighborhoods with low venue counts. The mean distance actually increases slightly with increasing venue counts. Let us see if we can model the structure of the venue density.
  • 11. Venue Density Regression. REGRESSION MODELS: Linear regression was applied to the venue density, and we created polynomial features of the distances and venue counts to investigate the data structure. CORRELATION: The linear model yields a poor correlation with an R² of 0.01; the cubic model delivers an R² of 0.06 (explaining only 6% of the variation in the data). FEATURE TREATMENT: We cannot employ the venue density for a priori binning of our data, but the distance between the venues seems a valuable feature to keep for further analyses. Figure views: cubic regression, linear regression.
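To make the R² comparison between the linear and the cubic model concrete, here is a small sketch using NumPy. It assumes a single predictor and uses synthetic, trend-free data, so the variable names and numbers are illustrative and do not reproduce the reported 0.01 and 0.06.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def polynomial_r2(x, y, degree):
    """Fit a least-squares polynomial of the given degree and return its in-sample R^2."""
    coeffs = np.polyfit(x, y, degree)
    return r_squared(y, np.polyval(coeffs, x))


# Synthetic data (not the Toronto data): noise with no real trend.
rng = np.random.default_rng(0)
venue_counts = rng.integers(1, 100, size=103).astype(float)
mean_distances = rng.normal(0.3, 0.05, size=103)

r2_linear = polynomial_r2(venue_counts, mean_distances, 1)
r2_cubic = polynomial_r2(venue_counts, mean_distances, 3)
```

Because the linear model is nested in the cubic one, the in-sample R² of the cubic fit can never be lower, so a small gain such as 0.01 to 0.06 says little on its own.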
  • 12. Venue Categories Distribution. FOURSQUARE CATEGORIES: Foursquare defines categories to explain what exactly the customer can expect when visiting a venue. DISTRIBUTION: The categories are extremely detailed. The vast majority of the 271 venue categories occur only once in the whole observed region of Toronto. HIGH DIMENSIONALITY: Clustering and Principal Component Analysis (PCA) will be needed to reduce the dimensionality. 271 categories with only 103 samples (postal codes) make for ill-defined matrices, which reduces the choice of applicable algorithms.
  • 13. Clustering Algorithms. CLUSTERING: Several algorithms and initialization types were employed to find similarities between neighborhoods based on their venue counts and categories. DISCOVERED PATTERNS: Most algorithms allocate only downtown areas to smaller clusters. The vast majority of areas is assigned to one or sometimes two “low venue count” clusters. ALGORITHM CHOICE: K-Means (k-means++ initialization) was chosen for further investigation as it was capable of showing some structure even outside of downtown Toronto. Figure views: K-Means (Random, PCA), K-Means (Random), K-Means (k-means++), DBSCAN, Agglomerative Clustering.
  • 14. K-Means Number of Clusters k. ELBOW METHOD: More clusters will always capture more of the variation in the data. The k at which this gain levels off (the “elbow”) is usually chosen. RESULTS: While downtown neighborhoods are located correctly at all k, the elbow plot shows no good k because most areas are highly similar. HIGH AREA SIMILARITY: Due to the high similarity of our observed areas, clustering does not deliver useful insights with the employed data set. Figure views: elbow plot, choose k.
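The elbow method on this slide amounts to computing the within-cluster sum of squares (inertia) for a range of k and looking for the point where the curve flattens. The sketch below uses a plain Lloyd's algorithm with random initialization written in NumPy, not the k-means++ variant chosen in the project, and runs on synthetic blobs; all names and parameters are assumptions.

```python
import numpy as np


def kmeans_inertia(X, k, n_iter=50, seed=0):
    """Run Lloyd's algorithm from random initial centers; return the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance of every sample to every center, shape (n_samples, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())


# Synthetic data with three well-separated blobs (not the Toronto data).
rng = np.random.default_rng(1)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(0.0, 0.5, size=(30, 2)) for c in blob_centers])

# Inertia for k = 1..6; plotting these values against k gives the elbow plot.
inertias = {k: kmeans_inertia(X, k) for k in range(1, 7)}
```

On clearly separated data the curve bends sharply at the true number of groups. When most samples are alike, as reported for the Toronto areas, the curve declines smoothly and no elbow stands out.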
  • 15. 2D Principal Component Analysis (PCA). NEIGHBORHOODS: The axes are found to correlate with venue counts from certain categories, indicating similarities between venues of certain categories. CATEGORIES: The matrix was transposed to find structures in our categories rather than areas. The PCA x-axis represents high-density neighborhoods, the y-axis low-density neighborhoods. DATA STRUCTURE: Certain venues are likely to appear in neighborhoods of certain densities. PCA is valuable for grouping the categories. We will reduce the dimensionality of our data set to 30 (11% of 271 categories) to create more meaningful variables. Figure views: categories, neighborhoods.
  • 16. Model Development. PREDICTIVE MODEL: The differences between true venue counts and those predicted by multiple linear regression (MLR) were used to evaluate the demand for venues in Toronto neighborhoods. REAL DEMAND CORRECTION: Categories were clustered to identify similar venue types catering to the same needs. These clusters were used to correct for demand that is already saturated. DEMAND PREDICTION: The resulting predictions show both the location and the category of venues that are in demand in Toronto. Flowchart labels: Real Venue Demand; Venue Counts in Categories; Category Clusters; Venue Demand; Predicted Venue Counts; Cluster Demand; Predicted Cluster Venue Counts; K-Means; Multiple Linear Regression; Difference True vs. Predicted; Product Venue x Cluster.
  • 17. Demand Prediction. MULTIPLE LINEAR REGRESSION (MLR): MLR with 5-fold cross-validation was performed to predict every single one of our 273 variables (271 categories + venue count + mean distance). VENUE COUNT PREDICTION: The resulting prediction matrix emulates average healthy Toronto neighborhoods based on the number of neighboring venues of certain categories. VENUE DEMAND PREDICTION: Differences between true and predicted values can be employed to predict market oversaturation and, more importantly, market demand for venue types.
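One way to read this slide is that each column of the area-by-variable matrix is predicted from all other columns, using only out-of-fold predictions, and that demand is the predicted value minus the true value. The sketch below implements that reading with NumPy's least-squares solver on synthetic counts. The function names, the fold assignment, and the sign convention (positive means unmet demand) are assumptions rather than the notebook's code.

```python
import numpy as np


def cross_val_predict_column(data, target, n_folds=5, seed=0):
    """Out-of-fold least-squares predictions of one column from all the others."""
    n = data.shape[0]
    X = np.delete(data, target, axis=1)
    X = np.column_stack([np.ones(n), X])  # add an intercept term
    y = data[:, target]
    order = np.random.default_rng(seed).permutation(n)
    pred = np.empty(n)
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        coef, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        pred[test_idx] = X[test_idx] @ coef
    return pred


def demand_matrix(data, n_folds=5):
    """Predicted minus true values: positive entries suggest unmet demand."""
    preds = np.column_stack(
        [cross_val_predict_column(data, j, n_folds) for j in range(data.shape[1])]
    )
    return preds - data


# Synthetic counts: 40 areas x 4 correlated categories (not the Toronto data).
rng = np.random.default_rng(0)
base = rng.poisson(5.0, size=(40, 1)).astype(float)
counts = np.hstack([base + rng.normal(0.0, 0.5, size=(40, 1)) for _ in range(4)])
demand = demand_matrix(counts)
```

With more predictors than samples, as with 273 variables and 103 areas, `np.linalg.lstsq` returns the minimum-norm solution, which is one reason such a model should be interpreted with care.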
  • 18. Category Clustering. DEMAND SATURATION: Some categories are very similar. Predicted demand for coffee shops might be fully saturated by existing cafés. DIMENSIONALITY REDUCTION: We used K-Means clustering on our transposed data to find 30 category clusters and thereby reduce the dimensionality of our data. CLUSTERS CAPTURE INTERACTIONS: The clusters nicely capture other venues that might satisfy the need for e.g. coffee, such as hotels, restaurants, and also some surprising categories. COFFEE SHOP CLUSTER: General Travel, Modern European Restaurant, Steakhouse, Restaurant, Plaza, Cuban Restaurant, Gift Shop, New American Restaurant, Brazilian Restaurant, Japanese Restaurant, Smoke Shop, French Restaurant, Pub, Art Gallery, Shopping Mall, Speakeasy, Tea Room, Italian Restaurant, Salon / Barbershop, Food Court, Vegetarian / Vegan Restaurant, Seafood Restaurant, Concert Hall, Nightclub, Gluten-free Restaurant, Soup Place, American Restaurant, Bakery, Department Store, Gastropub, Hotel, Coffee Shop, Opera House, Food Truck, Lounge, Asian Restaurant, Art Museum, Cupcake Shop, Train Station, Beer Bar, Colombian Restaurant, Café, Record Shop, Bookstore, Deli / Bodega, Building, Men’s Store, Fast Food Restaurant, Wine Bar, Dog Run, Monument / Landmark, Museum, Thai Restaurant, Salad Place
  • 19. Demand in Venue Categories. CLUSTER DEMAND: We analyzed the demand as the difference between predicted and true venue counts over the 30 category clusters. COFFEE CLUSTER HIGHLY DEMANDED: The highest demand for venues is found in the cluster containing coffee shops. The map shows neighborhood demands in the coffee cluster. STRONG DEMAND IN DOWNTOWN AREA: Most suburban areas have a fulfilled demand in the coffee cluster. We found that many downtown areas would support several more venues in the cluster. Figure views: bar chart, coffee cluster.
  • 20. Coffee Cluster Demand in Neighborhoods. DEMAND LOCATION: The coffee cluster showed the highest demand of all clusters. We then looked at the distribution of this demand over the different locations. DEMAND DISTRIBUTION: Toronto seems to have a healthy number of venues in the category, but their locations are suboptimal, with both over- and undersaturated neighborhoods. AREAS WITH HIGH DEMAND FOUND: There are several neighborhoods that strongly demand more venues in the coffee cluster. Let us look into the details to find out which venue category would be the most lucrative to launch.
  • 21. Coffee and Café Demand. CATEGORY DEMAND: Coffee shops and cafés are demanded the most within their cluster. Their combined demand is shown in the maps. COFFEE DEMAND: Toronto needs coffee! The two almost identical categories, coffee shops and cafés, are in extreme demand, especially in the Church and Wellesley area. INTERACTION WITH OTHER VENUES: We now need to make sure that this neighborhood is not saturated with e.g. tea rooms and other venues that serve coffee as well. Figure views: bar chart.
  • 22. Real Demand. DEMAND PRODUCT: The market demand (negative market saturation) of the whole cluster was multiplied by the demand for coffee shops and cafés alone. SATURATION OF COFFEE DEMAND: In some cases, the coffee demand is largely saturated by other venues (larger yellow than red circle). One extreme demand product is found. OUTSTANDING DEMAND FOUND: The neighborhood Church and Wellesley lacks 4.5 coffee shops/cafés and even lacks 6.0 venues from the whole cluster, which yields an outstanding demand product of 26.7. Map callouts (Cafés / Cluster / Product): 2.1 / 5.3 / 10.9; 4.5 / 6.0 / 26.7; 1.4 / 6.2 / 8.8; 1.8 / 4.2 / 7.5.
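The demand product described on this slide can be sketched as follows. This is a minimal illustration using the rounded values shown in the map callouts, so the top product comes out as 27.0 rather than the reported 26.7, which presumably was computed from unrounded demands. Clamping non-positive demands to zero, the function name, and the "Area A" to "Area C" labels are assumptions; the slide names only Church and Wellesley.

```python
def demand_product(category_demand, cluster_demand):
    """Multiply the two demands, but only where both indicate unmet demand."""
    # Assumption: without this guard, two negative (oversaturated) values
    # would multiply to a misleading positive product.
    if category_demand <= 0 or cluster_demand <= 0:
        return 0.0
    return category_demand * cluster_demand


# (coffee shop/café demand, whole-cluster demand), rounded values from the slide.
rounded_demands = {
    "Church and Wellesley": (4.5, 6.0),
    "Area A (name not given)": (2.1, 5.3),
    "Area B (name not given)": (1.4, 6.2),
    "Area C (name not given)": (1.8, 4.2),
}

# Rank areas by demand product, highest first.
ranking = sorted(
    ((area, demand_product(cafe, cluster)) for area, (cafe, cluster) in rounded_demands.items()),
    key=lambda item: item[1],
    reverse=True,
)
```

Multiplying rather than adding the two demands rewards areas where both the specific category and its substitutes are missing, which is why a single neighborhood can stand out so strongly.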
  • 23. Conclusions. HIGH COFFEE DEMAND: Among all venue categories, coffee shops/cafés were identified by linear regression to be in extreme demand throughout many central Toronto areas. BEST LOCATION FOR A STORE: Venue clustering showed that there is an extreme coffee demand in the Church and Wellesley area, which also lacks further venues of similar categories. ACTION RECOMMENDATION: Launch a coffee shop in Church and Wellesley. The area lacks 4.5 coffee shops and a grand total of 6 venues catering to the need for coffee. This should be very lucrative! Ref. [6]
  • 24. Thanks for Reading! GET IN TOUCH: Thank you for your attention. If you have any comments, please get in touch. Constructive criticism is always welcome. This project was done to learn the ropes of data science for application in my research, which aims to enable simple and semi-automatic computational chemical modelling for experimental researchers. If you are interested in my work, feel free to contact me or check out my homepage and social media. Contact: inff@mulks.ac mulks.ac ffmulks ffmulks Ref. [6]
  • 25. References. [1] Course material: https://www.coursera.org/professional-certificates/ibm-data-science, 07/08/2020. [2] Toronto skyline photography: Wladyslaw Sojka via http://www.sojka.photo, 07/08/2020. [3] Best place to open specific venue: https://towardsdatascience.com/exploring-toronto-neighborhoods-to-open-an-indian-restaurant-ff4dd6bf8c8a, https://capstoneprojectcoursera.wordpress.com/, 07/08/2020. [4] Rental or personal flat: https://medium.com/...-ibm-capstone-project-52b4292ef410, https://www.linkedin.com/pulse/capstone-project-battle-neighborhoods-rohitaksh-gs/, 07/08/2020. [5] Best neighbourhood: https://github.com/gnavia007/Coursera_Capstone/, http://roshangrewal.com/...finding-a-better-place-in-scarborough-toronto/, 07/08/2020. [6] Coffee photography: Mike Kenneally via https://unsplash.com, 07/08/2020.