Public data is a large and important source of information that can be used for many purposes. In this paper we present a method for collecting and analyzing data within urban settlements. To focus the analysis and gather a large amount of data, we use Bucharest as a case study. The main purpose of the analysis is to collect important information about streets, points of interest, urban planning details, etc., with the goal of facilitating a quick and correct evaluation of specific areas and identifying suitable locations for adding new points of interest. The prediction of suitable locations uses heuristics and data mining techniques such as clustering algorithms and association rules.
2. Purpose
• A method for collecting and analyzing data within
urban settlements – case study: Bucharest
• Purpose: collect important information about
different streets, points of interest, details about
urban planning, etc.
• Goals:
– facilitating a quick and correct evaluation of specific
areas (the proximity of different points of interest)
and
– identifying suitable locations for adding new points of
interest (using heuristics and data mining techniques
such as clustering algorithms and association rules)
12.09.2014 RoeduNet 2014 2
3. Introduction
• Public data = information produced or held by a
certain person, institution or company, that can
be accessed, reused, redistributed in a free way
by any citizen.
• Efficient use of this data may contribute to the
improvement of people's lives and to the
intelligent development of a city (e.g. reducing
pollution, recycling, optimal use of infrastructure,
traffic management, efficiency of public
transport, planning of new construction,
customer information based on real data, etc.)
4. State-of-the-art
• Applications for obtaining directions / evaluating different
locations (Google Maps)
– Advantage: allows users to mark different locations on the
existing maps, offering information about their location (hotels,
bars, hospitals, shops, public transportation stations, etc.)
– Drawbacks: it has a relatively small number of annotations
(marks for different points of interest), it does not
distinguish between the marked points, and it does not
allow searching for specific types of points of interest
• Applications for tourists
– Advantage: offer information about locations like restaurants,
bars and coffee shops (+ ratings), recommendations, maps,
itinerary plans and attractions
– Drawback: limited to the categories of points of interest
that are relevant for tourists
5. State-of-the-art
• Similar applications, such as Yelp, allow searching for points
of interest from different categories: food,
nightlife, shopping, health & medical, etc.
– Drawback: suggestions only for the most popular
cities around the world
• The identification of suitable locations for adding
new points of interest uses the framework for
spatial data mining from Chawla, Shekhar and
Wu, which tries to predict locations using map
similarity metrics
6. Data Collection
• Points of Interest and Streets Data Collection
– Using a Web Crawler for http://strazi.rou.ro/ (data divided
into categories and subcategories - airports, agencies,
banks, churches, shops - and included associated details -
longitude, latitude, city and street where it is placed)
– Google Services (Google Places API): allows four types of
search: nearby search, radar search, text search, details
search (e.g. information about 200 schools within the
Bucharest perimeter)
• Urban Planning Data
– Extracted images having spatial coordinates and legend
(http://www.melon.ro/maps/PUG_BUCURESTI_IE.html)
– This information was integrated in the current project by
adding a new layer on top of Google Maps (built from
these images)
– Extracted and saved the information about the legend
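The nearby search mentioned above can be illustrated with a small request-building sketch. It assumes the legacy Places API web service endpoint and its `location`, `radius`, `type` and `key` query parameters; these names are not taken from the slides and should be checked against the current Google documentation. The helper name `nearby_search_url` and the key value are hypothetical, and the sketch only builds the URL, it does not send the request.

```python
from urllib.parse import urlencode

# Assumed endpoint of the legacy Places API nearby search (not stated in the slides).
NEARBY_SEARCH_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"


def nearby_search_url(lat, lon, radius_m, place_type, api_key):
    """Build a nearby-search request URL for one place type around a point."""
    params = {
        "location": f"{lat},{lon}",  # center of the search, as "lat,lng"
        "radius": radius_m,          # search radius in meters
        "type": place_type,          # e.g. "school"
        "key": api_key,              # hypothetical placeholder, supply a real key
    }
    return NEARBY_SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Schools within 500 m of a point in central Bucharest (illustrative coordinates).
    print(nearby_search_url(44.4325, 26.1039, 500, "school", "YOUR_API_KEY"))
```

Collecting a few hundred schools, as in the example on the slide, would presumably mean repeating such requests over several centers and following the paging mechanism of the service.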
7. Evaluating Proximity of a Location
• Present the information in a useful manner by evaluating
the proximity of a given location
• 2 different ways of evaluation:
– Radius search: searching for points inside a circle whose radius
and center are selected by the user. Results: a list of the points of
interest that are found within the selected area, along with their
details
• Scenario: an elderly person wants to buy a house and needs to see
how many points of interest are within walking distance (shops,
transportation, hospitals, etc.).
– Searching for the closest points of interest from a selected point.
This method receives as parameters the current position and
one or more location types that the user is interested in (e.g.
schools, banks, shops, hospitals, etc.) and displays the
nearest point from each selected category (according to the
Haversine distance), along with its details.
• Scenario: someone needs to know where the closest pharmacy
or the closest doctor is
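The two evaluation modes above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the point-of-interest records, their field names (`name`, `category`, `lat`, `lon`) and the sample coordinates are hypothetical, and the Earth radius of 6371 km is an assumed mean value.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_search(pois, center, radius_km):
    """Return every point of interest inside the circle (center, radius_km)."""
    lat, lon = center
    return [p for p in pois if haversine_km(lat, lon, p["lat"], p["lon"]) <= radius_km]


def closest_per_category(pois, position, categories):
    """Return {category: (poi, distance_km)} with the nearest point of each category."""
    lat, lon = position
    closest = {}
    for p in pois:
        if p["category"] not in categories:
            continue
        d = haversine_km(lat, lon, p["lat"], p["lon"])
        if p["category"] not in closest or d < closest[p["category"]][1]:
            closest[p["category"]] = (p, d)
    return closest


# Hypothetical sample data, for illustration only.
POIS = [
    {"name": "School A", "category": "school", "lat": 44.4350, "lon": 26.1000},
    {"name": "Bank B", "category": "bank", "lat": 44.4400, "lon": 26.1100},
    {"name": "School C", "category": "school", "lat": 44.5000, "lon": 26.2000},
]

if __name__ == "__main__":
    here = (44.4325, 26.1039)
    print([p["name"] for p in radius_search(POIS, here, 2.0)])
    for category, (poi, dist) in closest_per_category(POIS, here, {"school", "bank"}).items():
        print(category, poi["name"], round(dist, 2), "km")
```

Both functions scan the whole list, which is enough for a sketch; a larger data set would likely call for a spatial index.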
8. Evaluating Proximity of a Location
[Figure: screenshots of the two evaluation modes, radius search and closest points of interest]
9. Town Planning Analysis
• Additionally, we analyze the town planning
in the selected area (identify the main urban areas and the
percentage of the area that each of them covers)
• Works with the radius evaluation because, in this case, we
can estimate the evaluated area (which is not possible in
the case of the closest points of interest)
• Takes into account the tiles that have their center inside
the evaluation area (circle)
• Results: a sorted table that contains the average % of
different area types within the area, along with their
legend descriptions.
• Scenario: someone who wants to buy a house might be
interested in the type of area in the neighborhood, as this
is important information that influences the price of the
house (e.g. how central it is, whether there are public parks or
factories nearby).
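The tile-based analysis above can be sketched as follows. The tile representation is an assumption made for illustration: each tile is taken to carry the coordinates of its center and a `shares` mapping from legend area type to the percentage of the tile it covers. The field names, the sample values and the 6371 km Earth radius are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def area_type_shares(tiles, center, radius_km):
    """Average the per-tile percentages over the tiles whose center lies in the circle.

    Returns a list of (area_type, average_percent), sorted in descending order.
    """
    lat, lon = center
    inside = [t for t in tiles if haversine_km(lat, lon, t["lat"], t["lon"]) <= radius_km]
    if not inside:
        return []
    totals = {}
    for tile in inside:
        for area_type, percent in tile["shares"].items():
            totals[area_type] = totals.get(area_type, 0.0) + percent
    averages = {area_type: total / len(inside) for area_type, total in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical tiles, for illustration only.
TILES = [
    {"lat": 44.4330, "lon": 26.1040, "shares": {"residential": 60.0, "park": 40.0}},
    {"lat": 44.4340, "lon": 26.1050, "shares": {"residential": 100.0}},
    {"lat": 44.6000, "lon": 26.3000, "shares": {"industrial": 100.0}},
]

if __name__ == "__main__":
    for area_type, percent in area_type_shares(TILES, (44.4325, 26.1039), 1.0):
        print(f"{area_type}: {percent:.1f}%")
```

An area type that is missing from a tile counts as 0% for that tile, which is why the totals are divided by the number of tiles inside the circle rather than by the number of tiles that contain the type.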
10. Location Prediction
• Identification of suitable locations for adding new
points of interest such as: shops, banks, schools,
hospitals, etc.
• Highly dependent on the information collected
about different settlements, as each settlement
has its own specificity
• We worked on the data that we collected about
Bucharest, which consists of the locations of various
(categorized) points of interest and the city
plan (offering details about regulations and
local rules, urban area delimitation, traffic
network structure, type and height of buildings,
etc.)
11. Location Prediction
• Using Data Mining techniques:
– Clustering algorithms (hierarchical clustering, DBSCAN): used for
analyzing the clusters built with the points of interest from the same
category (agencies, banks, schools, shops), in order to determine a clustering
coefficient for each type of point
– Association rules: the rules link the urban plan legends to
the points of interest, identifying the points of interest that can be found
inside an urban planning area and the ones that cannot be found there.
• Using heuristics:
– based on the similarities and differences between different urban
planning areas. Assumption: the categories of points of interest are
uniformly distributed in all areas of the same type
– evaluation of an area to ensure that, if we want to add a specific point
type in that area, such a point does not already exist there. The search
uses a circle whose radius is the cluster coefficient previously computed
and whose center is the center of the group of tiles from
that area
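The association rules that link urban plan legends to points of interest can be illustrated with a simplified sketch. It is not the authors' algorithm: it assumes that each point of interest has already been matched to the legend type of the area it lies in, and it only computes the support and confidence of rules of the form "area type → category", plus the categories never observed in a given area type. The function names and the sample observations are hypothetical.

```python
from collections import Counter


def rule_stats(observations):
    """Compute support and confidence for each rule (area_type -> poi_category).

    observations: list of (area_type, poi_category) pairs, one per point of interest.
    """
    pair_counts = Counter(observations)
    area_counts = Counter(area for area, _ in observations)
    total = len(observations)
    return {
        (area, category): {
            "support": count / total,                 # share of all observations
            "confidence": count / area_counts[area],  # share within this area type
        }
        for (area, category), count in pair_counts.items()
    }


def absent_categories(observations, area_type):
    """Return the categories that were never observed in the given area type."""
    all_categories = {category for _, category in observations}
    seen = {category for area, category in observations if area == area_type}
    return sorted(all_categories - seen)


# Hypothetical observations, for illustration only.
OBSERVATIONS = [
    ("residential", "school"),
    ("residential", "shop"),
    ("residential", "shop"),
    ("industrial", "agency"),
]

if __name__ == "__main__":
    for rule, stats in rule_stats(OBSERVATIONS).items():
        print(rule, stats)
    print(absent_categories(OBSERVATIONS, "industrial"))
```

A category that is absent from an area type in the data is only a candidate for "cannot be found there"; with few observations the absence may simply reflect missing data.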
13. Conclusions
• Public data = important source of information that can be automatically
analyzed using data mining algorithms and techniques
• Bucharest case study for a fast, efficient and correct town area
evaluation and for the identification of suitable locations for adding
new points of interest
• The evaluation part has medium complexity but high utility
• The prediction part involves high-complexity algorithms that use a lot
of data
• Possible improvements:
– find new sources of data to be added to the system
– port the application to mobile devices
– identify better algorithms and heuristics for the prediction part
– take advantage of the ratings provided by different users
– the application can be easily adapted for other towns