Geo-statistical Exploration of Milano Datasets
Irene Celino and Gloria Re Calegari 
CEFRIEL 
The 13th International Semantic Web Conference 2014 
Riva del Garda, Italy 
19 – 23 October 2014 
Semantic Statistics workshop 
19th October 2014
Research goal 
Long term goal 
• Compare data related to the same city but obtained from heterogeneous 
sources. Do they provide the same ‘picture’ of the city? 
• Can we update a dataset (expensive and time consuming, updated once every 
5/10 years) with another dataset (cheaper and always up to date)? 
Presentation target 
• Comparison of two heterogeneous datasets referring to the same city (Milan) to 
discover if they have any intrinsic correlation 
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 2
Datasets available 
• Telecom (phone activity data) provided for their “Big Data Challenge” 
• Milan + surroundings 
• Nov-Dec 2013 
• Grid of 10,000 cells 
• Activity recorded every ten minutes 
• A footprint for each cell 
• ISTAT - Italian National Statistical Institute 
• demographic data of 2011 and 2001 
• divided by sex, age and nationality 
Analysis was performed on the data in their original formats (GeoJSON, CSV); the results were serialized to RDF at the end of the analysis. 
Datasets available – different spatial granularity 
ISTAT – 50 surrounding towns + 9 internal Milano districts 
Telecom – grid of 10,000 cells (250 x 250 m each) 
Pre-processing of data required: mapping of Telecom data into municipality granularity. 
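The mapping step can be sketched as follows. This is a minimal illustration only: the slides do not say how cells were assigned to municipalities (for example by centroid or by area overlap), so the lookup table `cell_to_municipality`, the function name and all values below are hypothetical.

```python
from collections import defaultdict

def aggregate_to_municipalities(cell_activity, cell_to_municipality):
    """Sum per-cell activity values into per-municipality totals."""
    totals = defaultdict(float)
    for cell_id, activity in cell_activity.items():
        municipality = cell_to_municipality.get(cell_id)
        if municipality is None:
            continue  # cell is not covered by any known area; skip it
        totals[municipality] += activity
    return dict(totals)

# Hypothetical data: four grid cells, three of them assigned to an area.
cell_activity = {1: 10.0, 2: 5.0, 3: 2.5, 4: 7.0}
cell_to_municipality = {1: "Milano-1", 2: "Milano-1", 3: "Sesto San Giovanni"}

totals = aggregate_to_municipalities(cell_activity, cell_to_municipality)
```

A plain sum is only one possible aggregation; an area-weighted split of cells that straddle a boundary would be another.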
Methodology of analysis 
Goal of analysis: do the two datasets have the same intrinsic meaning? 
Unsupervised clustering to group data in each dataset: 
• k-means (Euclidean distance) 
• Adaptive k-means (cosine distance) 
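For reference, a minimal sketch of plain k-means with Euclidean distance, plus a cosine distance function. It is not the implementation used in the study: the initialization (first k points) is a simplification chosen here to keep the example deterministic, and the adaptive k-means variant is not reproduced.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def kmeans(points, k, distance=euclidean, max_iter=100):
    """Lloyd's algorithm; centroids start at the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical 2-D feature vectors forming two obvious groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9)]
labels, centroids = kmeans(points, k=2)
```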
Comparison of the two clustering results to find correlation between the two datasets. 
Validation of the comparison: 
• Rand Index 
• Kappa Index 
(the closer to 1 the index is, the stronger the correlation between the data clusterings) 
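The two indices can be sketched as below. The Rand Index counts the pairs of areas on which two clusterings agree; the Kappa Index is taken here to be Cohen's kappa, which is an assumption, as is the premise that cluster labels in the two results have already been matched to each other (the slides do not describe that step). The label lists are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs treated the same way by both clusterings."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def cohen_kappa(labels_a, labels_b):
    """Agreement corrected for chance; assumes labels are already aligned."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical cluster assignments for six areas.
telecom_labels = [0, 0, 1, 1, 2, 2]
istat_labels = [0, 0, 1, 1, 2, 1]

ri = rand_index(telecom_labels, istat_labels)
kappa = cohen_kappa(telecom_labels, istat_labels)
```

Unlike kappa, the Rand Index does not depend on how clusters are numbered, so it needs no label matching.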
Experiments - Telecom vs ISTAT (range on total population) 
Telecom - k-means 6 classes 
ISTAT 2011 - range 6 classes 
Rand Index: 0.23, Kappa Index: 0.23 
Only partial correlation. 
Try to add information to total population (feature vector with distribution of population divided by sex, age and nationality). 
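One way such a feature vector could be built is sketched below. It assumes that "distribution" means the share of each category within its own breakdown; the slides do not confirm this. The function name, category names and counts are invented for illustration.

```python
def feature_vector(breakdowns):
    """Turn per-breakdown counts into one flat vector of shares.

    breakdowns maps a breakdown name (e.g. "sex") to a dict of
    category -> count. Each count is divided by its breakdown's total.
    """
    names, values = [], []
    for breakdown in sorted(breakdowns):
        counts = breakdowns[breakdown]
        total = sum(counts.values())
        for category in sorted(counts):
            names.append(f"{breakdown}:{category}")
            values.append(counts[category] / total if total else 0.0)
    return names, values

# Hypothetical counts for one municipality.
names, values = feature_vector({
    "sex": {"female": 520, "male": 480},
    "nationality": {"italian": 900, "foreign": 100},
})
```

Using shares rather than raw counts keeps large and small municipalities comparable, which matters for distance-based clustering.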
Experiments - Telecom vs ISTAT (2011 fine-grained population) 
Telecom - k-means 4 classes 
ISTAT 2011 – k-means 4 classes 
Rand Index: 0.77, Kappa Index: 0.81 
Good correlation. 
Try to add historical information (2001 datasets). Does adding temporal dynamics improve correlation? 
Experiments - Telecom vs ISTAT (2011+2001 fine-grained population) 
Telecom - k-means 4 classes 
ISTAT 2011 + 2001 – k-means 4 classes 
Rand Index: 0.77, Kappa Index: 0.81 
Good correlation. The same as the previous test. 
Clustering results serialization 
Results of clustering analysis, serialized in three formats: 
• Geographic format: GeoJSON file 
• RDF format: RDF Data Cube vocabulary (adding the concepts of clustering algorithm and cluster number to the Observation definition) 
• Geo + RDF format: enrich GeoJSON with the power of JSON-LD -> “GeoJSON-LD” 
“GeoJSON-LD” 
“GeoJSON-LD” is obtained by adding the ‘@context’ prefix to the GeoJSON file. 
The prefix specifies how to interpret GeoJSON tags as RDF resources. 
[Figure: the @context prefix followed by the GeoJSON file] 
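A minimal sketch of the idea: a GeoJSON document is wrapped with an `@context` that maps its keys to RDF terms. The actual context used by the authors is not shown in the slides, so every IRI below (the `http://example.org/vocab#` namespace and its terms), the `cluster` property and the sample feature are placeholders.

```python
import json

def to_geojson_ld(feature_collection, context):
    """Return a copy of the GeoJSON document with an @context added first."""
    return {"@context": context, **feature_collection}

# Placeholder context: maps GeoJSON keys to terms in a made-up vocabulary.
context = {
    "ex": "http://example.org/vocab#",
    "features": "ex:features",
    "properties": "ex:properties",
    "geometry": "ex:geometry",
    "coordinates": "ex:coordinates",
    "cluster": "ex:cluster",
}

# Hypothetical clustering result for one area, as plain GeoJSON.
feature_collection = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"name": "Milano-1", "cluster": 2},
        "geometry": {"type": "Point", "coordinates": [9.19, 45.46]},
    }],
}

doc = to_geojson_ld(feature_collection, context)
serialized = json.dumps(doc, indent=2)
```

Because the original keys are left untouched, a GIS tool that ignores `@context` still sees ordinary GeoJSON.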
“GeoJSON-LD” pros and cons 
Pro: correct interpretation of the geographic information in the GeoJSON-LD file. GeoJSON-LD is correctly interpreted as GeoJSON by GIS. 
Con: problems in the interpretation of the RDF information. “GeoJSON-LD” does not correctly interpret a list of lists (the polygon coordinates representation). 
[Figure: GeoJSON list of lists and its RDF serialization] 
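One possible direction for the list-of-lists problem, offered purely as an illustration and not as something the slides propose: flatten the nested polygon coordinates into a single WKT string, which can be carried as one literal value instead of nested arrays. The function name and sample polygon are hypothetical.

```python
def polygon_to_wkt(coordinates):
    """Convert GeoJSON Polygon coordinates (list of rings) to a WKT string."""
    rings = []
    for ring in coordinates:
        # Each position is a list of numbers, e.g. [longitude, latitude].
        points = ", ".join(" ".join(str(v) for v in position) for position in ring)
        rings.append(f"({points})")
    return "POLYGON (" + ", ".join(rings) + ")"

# Hypothetical triangle near Milan: one outer ring, closed on its first point.
coordinates = [[[9.0, 45.0], [9.1, 45.0], [9.1, 45.1], [9.0, 45.0]]]
wkt = polygon_to_wkt(coordinates)
```

The trade-off is that a document carrying only the WKT string is no longer plain GeoJSON, so the original `coordinates` array would have to be kept alongside it for GIS tools.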
Conclusions and future work 
• Identification of a correlation between phone activity data and demographic information 
• Further tests using different datasets (land use, points of interest) 
• Find an efficient method for handling the different granularity levels of the datasets 
• Find a method for handling the temporal aspect of phone activity data (temporal information was lost during the clustering process) 
• Find a method to overcome the “GeoJSON-LD” serialization problem 
Thank you! Any questions? 
Further details at: http://swa.cefriel.it/geo/ 
Irene Celino and Gloria Re Calegari 
CEFRIEL 
