Universitatea Politehnica București (University Politehnica of Bucharest)
Faculty of Automatic Control and Computers
Department of Computers
The Collection and Analysis of
Public Data
Case study - Bucharest
Costin-Gabriel CHIRU and Constantin Ciprian MIHAILA
costin.chiru@cs.pub.ro, cipri.mihaila@gmail.com
Purpose
• A method for collecting and analyzing data within
urban settlements – case study: Bucharest
• Purpose: collect important information about
streets, points of interest, urban planning
details, etc.
• Goals:
– facilitating a quick and correct evaluation of specific
areas (the proximity of different points of interest)
and
– identifying suitable locations for adding new points of
interest (using heuristics and data mining techniques
such as clustering algorithms and association rules)
12.09.2014 RoeduNet 2014 2
Introduction
• Public data = information produced or held by a
person, institution or company that any citizen
can freely access, reuse and redistribute.
• Efficient use of this data may contribute to the
improvement of people's lives and to the
intelligent development of a city (e.g. reducing
pollution, recycling, optimal use of infrastructure,
traffic management, efficient public
transport, planning of new construction,
informing customers based on real data, etc.)
State-of-the-art
• Applications for obtaining directions / evaluating different
locations (Google Maps)
– Advantage: allows users to mark different locations on the
existing maps, offering information about their location (hotels,
bars, hospitals, shops, public transportation stations, etc.)
– Drawbacks: it has a relatively small number of annotations
(marks for different points of interest) and it does not
distinguish between the marked points, so it does not
support searching for specific types of points of interest
• Applications for tourists
– Advantage: offer information about locations like restaurants,
bars and coffee shops (+ ratings), recommendations, maps,
itinerary plans and attractions
– Drawback: limited to the tourist-relevant categories of points
of interest
State-of-the-art
• Applications similar to Yelp, which allows searching for points
of interest from different categories: food,
nightlife, shopping, health & medical, etc.
– Drawback: suggestions only for the most popular
cities around the world
• To identify suitable locations for adding
new points of interest, we used the spatial data
mining framework of Chawla, Shekhar and
Wu, which predicts locations using map
similarity metrics
Data Collection
• Points of Interest and Streets Data Collection
– Using a Web Crawler for http://strazi.rou.ro/ (data divided
into categories and subcategories - airports, agencies,
banks, churches, shops - and included associated details -
longitude, latitude, city and street where it is placed)
– Google services (Google Places API) – allows four types of
search: nearby search, radar search, text search, details
search (e.g. information about 200 schools within the
Bucharest perimeter)
• Urban Planning Data
– Extracted images having spatial coordinates and a legend
(http://www.melon.ro/maps/PUG_BUCURESTI_IE.html)
– This information was integrated in the current project by
adding a new layer on top of Google Maps (built from
these images)
– Extracted and saved the information about the legend
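As an illustration of the nearby search mentioned above, the following minimal sketch only builds a request URL and does not call the service. It assumes the legacy Places API Nearby Search endpoint and its `location`, `radius`, `type` and `key` parameters; the helper name `nearby_search_url`, the coordinates and the API key are placeholders, and the parameter set should be checked against the current Google documentation before use.

```python
from urllib.parse import urlencode

# Assumed legacy Nearby Search endpoint (verify against current Google documentation).
NEARBY_SEARCH_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"


def nearby_search_url(lat, lon, radius_m, place_type, api_key):
    """Build a Nearby Search request URL for one place type around a point."""
    params = {
        "location": f"{lat},{lon}",  # centre of the search circle, in degrees
        "radius": radius_m,          # search radius in metres
        "type": place_type,          # e.g. "school", "bank", "hospital"
        "key": api_key,              # placeholder; a real key is required
    }
    return f"{NEARBY_SEARCH_URL}?{urlencode(params)}"


# Hypothetical usage: schools within 1 km of an illustrative point in Bucharest.
url = nearby_search_url(44.4268, 26.1025, 1000, "school", "YOUR_API_KEY")
```

The returned string can be passed to any HTTP client; collecting a few hundred results per category, as in the schools example, would additionally require handling result paging.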
Evaluating Proximity of a Location
• Present the information in a useful manner by evaluating
the proximity of a given location
• 2 different ways of evaluation:
– Radius search: searching for points inside a circle whose radius
and center are selected by the user → results: list of points of
interest that are found within the selected area, along with their
details
• Scenario: an elderly person wants to buy a house and needs to see
how many points of interest are within walking distance (shops,
transportation, hospitals, etc.).
– Closest points of interest: searching for the points of interest
closest to a selected point.
This method receives as parameters the current position and
one or more location types that the user is interested in (e.g.
schools, banks, shops, hospitals, etc.) and displays the
nearest point from each selected category (according to the
Haversine distance), along with its details.
• Scenario: someone needs to know where the closest pharmacy
or the closest doctor is
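The two evaluation modes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout (`name`, `category`, `lat`, `lon`), the function names and the sample coordinates are assumptions made for the example. Only the use of the Haversine distance comes from the slides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_search(pois, center, radius_km):
    """Return every point of interest located within radius_km of center."""
    lat, lon = center
    return [p for p in pois if haversine_km(lat, lon, p["lat"], p["lon"]) <= radius_km]


def closest_per_category(pois, position, categories):
    """Return {category: nearest POI} for each requested category that has a POI."""
    lat, lon = position
    nearest = {}  # category -> (distance, poi)
    for p in pois:
        if p["category"] not in categories:
            continue
        d = haversine_km(lat, lon, p["lat"], p["lon"])
        if p["category"] not in nearest or d < nearest[p["category"]][0]:
            nearest[p["category"]] = (d, p)
    return {cat: poi for cat, (_, poi) in nearest.items()}


# Hypothetical sample data (illustrative coordinates, not the collected data set).
POIS = [
    {"name": "Shop A", "category": "shop", "lat": 44.4270, "lon": 26.1030},
    {"name": "Shop B", "category": "shop", "lat": 44.4600, "lon": 26.1500},
    {"name": "Bank A", "category": "bank", "lat": 44.4300, "lon": 26.1000},
    {"name": "Hospital A", "category": "hospital", "lat": 44.5000, "lon": 26.2000},
]

within_1km = radius_search(POIS, (44.4268, 26.1025), 1.0)
nearest = closest_per_category(POIS, (44.4268, 26.1025), {"shop", "hospital"})
```

Both functions scan every point linearly, which is adequate for a sketch; a spatial index would be a natural refinement for city-scale data.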
Evaluating Proximity of a Location
[Figures: radius search; closest points of interest]
Town Planning Analysis
• We also analyze the town planning
in the selected area (identify the main urban areas and the
% they cover within it)
• Works with the radius evaluation because, in this case, we
can estimate the evaluated area (which is not possible in
the case of the closest points of interest)
• Takes into account the tiles that have their center inside
the evaluation area (circle)
• Results: a sorted table that contains the average % of
different area types within the area, along with their
legend descriptions.
• Scenario: someone who wants to buy a house might be
interested in the type of area in the neighborhood, as this
is important information that influences the price of the
house (e.g. how central it is, whether there are public parks or
factories nearby).
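The tile-based analysis can be sketched as below. The slides do not describe the tile data structure, so this example assumes each tile stores its center coordinates and a `shares` dictionary giving the percentage of each area type inside that tile. The function name and sample values are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def average_area_shares(tiles, center, radius_km):
    """Average % of each area type over tiles whose center lies inside the circle.

    Returns a list of (area_type, average_percent) sorted in descending order.
    """
    lat, lon = center
    inside = [t for t in tiles if haversine_km(lat, lon, t["lat"], t["lon"]) <= radius_km]
    if not inside:
        return []
    totals = {}
    for tile in inside:
        for area_type, percent in tile["shares"].items():
            totals[area_type] = totals.get(area_type, 0.0) + percent
    averages = [(area_type, total / len(inside)) for area_type, total in totals.items()]
    return sorted(averages, key=lambda item: item[1], reverse=True)


# Hypothetical tiles (assumed layout: tile center plus per-tile area-type percentages).
TILES = [
    {"lat": 44.4270, "lon": 26.1030, "shares": {"residential": 80.0, "park": 20.0}},
    {"lat": 44.4300, "lon": 26.1000, "shares": {"residential": 40.0, "industrial": 60.0}},
    {"lat": 44.5000, "lon": 26.2000, "shares": {"industrial": 100.0}},
]

table = average_area_shares(TILES, (44.4268, 26.1025), 1.0)
```

Only the first two tiles have their center inside the 1 km circle, so the third tile does not influence the averages.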
Location Prediction
• Identification of suitable locations for adding new
points of interest such as shops, banks, schools,
hospitals, etc.
• Highly dependent on the information collected
about each settlement, as each settlement
has its own specific characteristics
• We worked on the data that we collected about
Bucharest, which consists of the locations of various
(categorized) points of interest and the city
plan (offering details about regulations and
local rules, urban area delimitation, traffic
network structure, type and height of buildings,
etc.)
Location Prediction
• Using data mining techniques:
– Clustering algorithms (hierarchical clustering, DBSCAN) – used for
analyzing the clusters built from the points of interest of the same
category (agencies, banks, schools, shops) → determine a clustering
coefficient for each type of point
– Association rules: rules link the urban plan legends to
the points of interest → identify points of interest that can be found
inside an urban planning area and ones that cannot be found there.
• Using heuristics:
– based on the similarities and differences between different urban
planning areas → assumption: the categories of points of interest are
uniformly distributed in all areas of the same type
– evaluation of an area to ensure that, if we want to add a specific point
type in that area, such a point does not already exist → the area is
checked with a circle whose radius is the cluster coefficient previously
computed and whose center is the center of the group of tiles from
that area
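The heuristic check above can be sketched as follows. The slides do not say how the clustering coefficient is computed, so this example substitutes the mean nearest-neighbor distance within a category as a placeholder, and reduces the association rules to a simple lookup of which categories were observed in which area type. All function names and sample values are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_nearest_neighbour_km(points):
    """Placeholder 'cluster coefficient': mean distance from each point to its nearest neighbor."""
    if len(points) < 2:
        return 0.0
    total = 0.0
    for i, (lat1, lon1) in enumerate(points):
        total += min(
            haversine_km(lat1, lon1, lat2, lon2)
            for j, (lat2, lon2) in enumerate(points)
            if j != i
        )
    return total / len(points)


def observed_categories(observations):
    """Map each urban-plan area type to the set of POI categories observed inside it."""
    rules = {}
    for area_type, category in observations:
        rules.setdefault(area_type, set()).add(category)
    return rules


def is_candidate(area_center, area_type, category, existing_points, rules):
    """True if the category occurs in this area type and no existing point of that
    category lies within the coefficient radius of the area center."""
    if category not in rules.get(area_type, set()):
        return False
    radius = mean_nearest_neighbour_km(existing_points)
    lat, lon = area_center
    return all(haversine_km(lat, lon, p_lat, p_lon) > radius for p_lat, p_lon in existing_points)


# Hypothetical data: two existing bars about 1.1 km apart, and observed (area type, category) pairs.
BARS = [(44.43, 26.10), (44.44, 26.10)]
RULES = observed_categories([("residential", "bar"), ("residential", "shop"), ("industrial", "shop")])

far_ok = is_candidate((44.47, 26.10), "residential", "bar", BARS, RULES)
too_close = is_candidate((44.435, 26.10), "residential", "bar", BARS, RULES)
not_allowed = is_candidate((44.47, 26.10), "industrial", "bar", BARS, RULES)
```

A clustering step such as DBSCAN would replace `mean_nearest_neighbour_km` in a fuller version; the rest of the check stays the same as long as it yields one radius per category.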
Location Prediction
[Figures: hierarchical clustering; DBSCAN; suitable location for bars in a specific urban area]
Conclusions
• Public data = important source of information that can be automatically
analyzed using algorithms and techniques from data mining
• Bucharest case study → a fast, efficient and correct town area
evaluation and the identification of suitable locations for adding
new points of interest
• The evaluation part has medium complexity, but high utility
• The prediction part involves high-complexity algorithms that use a lot
of data
• Possible improvements:
– find new sources of data to be added to the system
– port the application to mobile devices
– identify better algorithms and heuristics for the prediction part
– take advantage of the ratings provided by different users
– adapt the system to other towns (this can be done easily)
Questions
Thank you very much!
