City Data Dating: emerging affinities
between diverse urban datasets
Gloria Re Calegari, Irene Celino, Diego Peroni
Paper available at:
http://www.sciencedirect.com/science/article/pii/S0306437915001362
Elsevier - Information Systems Journal
Digital information about cities
• Open data (large number of data sources available on the web):
• Urban planning (land cover, public registers)
• Demographics and statistics about municipality
• Closed data sources produced and maintained by enterprises:
• Phone activity data (but sometimes made open!)
• User generated information:
• Volunteered geographic information and crowdsourcing information (Open Street Map)
• Location based social network (Foursquare check-ins and geo located information)
• Real-time and streaming information
• Sensors (e.g. temperature, energy consumption, ...)
Do diverse urban datasets provide the same
“picture” of the city?
Short term goals
• Discovering “affinities” between heterogeneous datasets.
• Using a human relations metaphor, do diverse urban datasets “date each other” and show
“natural affinities”?
• What is the influence of spatial resolution and data complexity on the strength of the dependence
between heterogeneous urban sources?
Long term goals
• Would it be possible to use one or more “cheap” datasets as proxy for more “expensive” data
sources?
Milano datasets
Demographics:
• Population density
• Spatial resolution: census area (6079 areas, median
size of a census area 12,000 m²)
• Source: Milano open data
Points of interest (POIs):
• Transports, schools, sports facilities, amenity places,
shops ...
• Spatial resolution: lat-long points
• Source: Milano open data (official, 6718) and Open
Street Map (user generated, 44351)
Milano datasets
Land use:
• type of land use according to CORINE taxonomy
(3-levels hierarchy, up to 40 types of land use defined)
• CORINE taxonomy
http://swa.cefriel.it/ontologies/corine.html#
• 12 types selected (those that best characterize a
metropolitan area such as Milan)
[dense residential areas, scattered residential areas, industrial
areas, parks and green areas, roads, railways, hospitals, sports
centres, public services and offices, construction sites, agricultural
areas and wild areas.]
• Spatial resolution: building level
• Source: Lombardy region open data
Milano datasets
Call data records:
• 5 phone activities
• Incoming SMS
• Outgoing SMS
• Incoming CALL
• Outgoing CALL
• Internet
• Recorded every 10 minutes (144 values a day for each
activity) for 2 months (Nov-Dec 2013)
• Spatial resolution: grid of 3538 square cells of 250m
• Source: Telecom Italia – provided for their Big Data
Challenge http://theodi.fbk.eu/openbigdata/
Pre-processing of data
Making spatial resolution uniform
Spatial resolutions used:
• District level with 88 official subdivisions
• Grid level with 3538 square cells of 250m
[Figure: the two spatial partitions, cells and districts]
New datasets generated:
• Density of POIs in each cell/district
• Weighted sum of population density in each cell/district
• Percentage shares of each land use over each cell/district area
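The first derived dataset (density of POIs per cell) can be sketched as follows. This is only an illustration, not the authors' procedure: it assumes POI coordinates are already projected to metres, and the function name `poi_density_per_cell`, the grid `origin` and the (column, row) cell index are hypothetical.

```python
from collections import Counter

CELL_SIZE_M = 250.0  # cell side taken from the slides


def poi_density_per_cell(points, origin=(0.0, 0.0), cell_size=CELL_SIZE_M):
    """Count POIs falling in each square cell and divide by the cell area.

    points: iterable of (x, y) coordinates in metres (assumed projection).
    Returns {(col, row): POIs per square metre} for non-empty cells only.
    """
    ox, oy = origin
    counts = Counter()
    for x, y in points:
        col = int((x - ox) // cell_size)
        row = int((y - oy) // cell_size)
        counts[(col, row)] += 1
    area = cell_size * cell_size
    return {cell: n / area for cell, n in counts.items()}


# Three made-up POIs: two in cell (0, 0), one in cell (1, 0).
pois = [(10.0, 20.0), (240.0, 249.0), (260.0, 20.0)]
density = poi_density_per_cell(pois)
```

District-level densities would need a point-in-polygon test against the district boundaries instead of the integer division used here.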
Pre-processing of data
Telecom data
Footprint/temporal signature for each
cell/district
(average activity over all the 60 days, distinguishing
between week and weekend days)
Temporal Data compression (pre-processing large scale time series to get a more
manageable compressed representation)
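A minimal sketch of how such a footprint could be computed for one cell/district and one activity, assuming the raw data is a list of (timestamp, value) pairs sampled every 10 minutes. The function name `footprint` and the record layout are assumptions, not the format of the Telecom Italia files.

```python
from datetime import datetime

SLOTS_PER_DAY = 144  # one value every 10 minutes


def footprint(records):
    """Average activity per 10-minute slot, separately for weekdays and weekends.

    records: iterable of (datetime, value) for one cell/district and one activity.
    Returns {"week": [144 floats], "weekend": [144 floats]}; empty slots are 0.0.
    """
    sums = {"week": [0.0] * SLOTS_PER_DAY, "weekend": [0.0] * SLOTS_PER_DAY}
    counts = {"week": [0] * SLOTS_PER_DAY, "weekend": [0] * SLOTS_PER_DAY}
    for ts, value in records:
        kind = "weekend" if ts.weekday() >= 5 else "week"  # 5 = Saturday, 6 = Sunday
        slot = (ts.hour * 60 + ts.minute) // 10
        sums[kind][slot] += value
        counts[kind][slot] += 1
    return {
        kind: [s / c if c else 0.0 for s, c in zip(sums[kind], counts[kind])]
        for kind in sums
    }


# Made-up values: a Monday, a Tuesday and a Saturday in November 2013, all at 08:00.
records = [
    (datetime(2013, 11, 4, 8, 0), 10.0),
    (datetime(2013, 11, 5, 8, 0), 20.0),
    (datetime(2013, 11, 2, 8, 0), 6.0),
]
fp = footprint(records)
```

Concatenating the week and weekend profiles of the 5 activities gives 5 x 2 x 144 = 1440 values per cell/district.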
Data exploration experiments
To discover possibly “natural” connections between heterogeneous datasets.
Three-step process
1) Correlation Analysis
2) Regression Analysis
3) Clustering Analysis
All analyses are performed at both district and cell level, with increasing complexity of data from step 1 to step 3
1) Correlation analysis
Try to identify possible correspondences between different datasets.
Measure whether and how two variables change together using a correlation index: Pearson's
correlation coefficient r, with -1 ≤ r ≤ 1
(r > 0: positive correlation; r < 0: negative correlation)
Pairwise comparisons between 1-dimensional vectors:
• POIs: density
• Population: density
• Telecom: first Principal Component with 90% of explained variability
• Land use data: only residential and agricultural land use, used separately, as percentage
shares of the district/cell area
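The pairwise comparison between two 1-dimensional vectors can be sketched with the textbook definition of Pearson's coefficient. The function name `pearson_r` and the sample values are illustrative only; they are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up per-district values: one vector per dataset, one entry per district.
pop_density = [1.0, 2.0, 3.0, 4.0]
poi_density = [2.0, 4.0, 6.0, 8.0]
agri_share = [4.0, 3.0, 2.0, 1.0]
r_pos = pearson_r(pop_density, poi_density)
r_neg = pearson_r(pop_density, agri_share)
```

Here `r_pos` is close to 1 (perfect positive linear relation) and `r_neg` is close to -1 (perfect negative linear relation).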
Correlation analysis - district level
• Correlations between Telecom and residential land use, and between Telecom and POIs, do exist.
• The data fit quasi-linear models.
[Figure: pairwise correlations at district level; variables: tlc, resid, agric, POI mun, POI OSM, pop]
• Negative correlation between agricultural
land use and other datasets -> human action
inversely related to agricultural areas.
Correlation analysis - cell level
• All coefficients lower than the district level
• Higher values again between Telecom and
residential and POIs
=> the choice of resolution level can have a
significant impact on the correlation results.
[Figure: pairwise correlations at cell level; variables: tlc, resid, agric, POI mun, POI OSM, pop]
• Some phenomena causing the correlation are
independent of the resolution level (0.76
residential-population).
Correlation analysis - phone calls and population
• Could the correlation change during the
day according to the everyday human
behaviour pattern (get up, go to work,
come back home in the evening)?
• Call activity at 6 different day times
• Week and weekend profiles are different
-> mirroring people’s different habits
• Average correlation higher in the weekend
(phone activity related to the actual
presence of people at home)
• Weekday profile -> human behaviour
pattern
[Figure panels: district and cell level; week and weekend]
2) Regression analysis
Fit multiple linear regression models (MLR)
(To take into consideration a larger portion of urban data complexity)
MLR models
Telecom
(1-d vector for each communication type
for week/weekend days with the average
activity)
POIs
(1-d vector for each POI category with
the POI density)
Demographics
(1-d vector with the population
density)
Land use
(1-d vector for each land use with
the percentage of land covered.
Only 5 main land use types are used.)
[Diagram labels: cheap-to-produce; expensive-to-maintain]
Regression analysis - results
• The more predictors are selected (increasing
k), the larger the part of the outcome
variable's variance that is explained (increasing
adjusted R²).
• The strength of the correlation decreases
from district level (coarse-grained) to
cell level (fine-grained)
• The benefit of adding a higher number of
predictors is weaker at cell level
• Population, dense residential and
agricultural areas show a stronger
correlation
[Figure labels: manual stepwise model selection; AIC criterion]
Heterogeneous datasets provide
comparable or even similar pictures of
the urban environment.
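The two quantities named on this slide can be written down with their textbook definitions. The sketch below is an assumption about which forms were used: the slides do not give the formulas, and the AIC is the common least-squares form under Gaussian errors, defined up to an additive constant. The function names are hypothetical.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a linear model with k predictors fitted on n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def aic_gaussian(rss, n, n_params):
    """AIC (up to an additive constant) of a least-squares fit with Gaussian errors.

    rss: residual sum of squares; n_params: number of estimated coefficients.
    """
    return n * math.log(rss / n) + 2 * n_params


# Illustrative numbers only: with 88 districts, 5 predictors cost little;
# with very few observations the same R^2 is penalised much more.
r2_adj_districts = adjusted_r2(0.8, 88, 5)
r2_adj_small = adjusted_r2(0.8, 11, 5)
```

In a stepwise procedure, a predictor is kept only if adding it lowers the AIC, i.e. if the drop in `n * log(rss / n)` outweighs the penalty of 2 per extra coefficient.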
Regression analysis – significant predictors
Significant predictors of the models selected with manual backwards elimination
3) Clustering analysis
Exploit the whole city information available (n-dimensional datasets), avoiding data compression.
Clustering technique to understand if diverse data naturally group together throughout the urban
space.
Cluster each dataset and compare the clustering obtained for each pair of datasets.
k-means algorithm with 5 clusters (number chosen with the Silhouette coefficient) applied to:
• CORINE: one vector for each district (NIL)/cell, with the percentage shares of the 12 categories
• Telecom: the whole footprint for each cell/district (NIL) (a vector of 1440 values)
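A minimal sketch of the k-means step (Lloyd's algorithm) in plain Python. It is not the authors' implementation: the initialisation (random sample with a fixed seed), the stopping rule and the toy 2-dimensional vectors are assumptions, and the Silhouette-based choice of the number of clusters is not shown.

```python
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids) for a list of equal-length vectors."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids would not move
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: two well separated groups instead of 12- or 1440-dimensional vectors.
data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(data, k=2)
```

Running the same function on the land-use vectors and on the telecom footprints yields one label per cell/district for each dataset, which is what the pairwise comparison needs.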
Clustering analysis - Pairwise clustering comparisons
Pairwise clustering comparisons using the Rand Index and Overall Accuracy to evaluate the correspondence between datasets
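Assuming the index named on this slide is the Rand index, the agreement between two clusterings of the same cells/districts can be sketched as the fraction of pairs on which both clusterings agree (both put the pair together, or both keep it apart). The function name `rand_index` and the labels are illustrative.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree (1.0 = identical partitions)."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two label lists of equal length >= 2")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        agree += same_in_a == same_in_b
        total += 1
    return agree / total


# Made-up cluster labels for four districts.
land_use_clusters = [0, 0, 1, 1]
telecom_clusters = [1, 1, 0, 0]   # same partition, different label names
other_clusters = [0, 1, 0, 1]     # a different partition
ri_same = rand_index(land_use_clusters, telecom_clusters)
ri_diff = rand_index(land_use_clusters, other_clusters)
```

The index does not depend on how the clusters are numbered, which matters because k-means assigns arbitrary label ids in each run.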
Clustering analysis - Qualitative analysis at district level
Districts classified in the same way in the Telecom and CORINE datasets
[Figure panels: CORINE and whole activity; CORINE and SMS in-out activity; CORINE and Internet activity]
Conclusions and future work
• Evaluation of the correlations between datasets at different levels
• different spatial resolution (coarse-grained district level vs. fine-grained cell level)
• different data complexity (very compressed information vs multi-dimensional data).
• Correlations between different sources exist and their strength depends on both
spatial resolution and data complexity
• Diverse urban datasets can “date each other”, but their actual “affinity” can
vary.
What is coming next?
• Extending our investigation toward a predictive approach (statistical and machine
learning techniques).
