matching
mobile phone
signals and
census data
Metulini,
Carpita
Data &
Context
Methods
Results
Conclusions
Acknowledgm.
& References
A strategy for the matching of mobile phone
signals with census data
Rodolfo Metulini¹, Maurizio Carpita¹
¹ Data Methods and Systems Statistical Laboratory, Department of
Economics and Management, University of Brescia
Milano - June 19th, 2019
Table of contents
1 Data & Context
2 Methods
3 Results
4 Conclusions
5 Acknowledgm. & References
The Context
• Administrative data are traditionally used to count the presence of
people (static)
• Geo-localizing people through their mobile phones quantifies the
number of people present at a given moment in time and enriches the
information available for “smart city” evaluations (dynamic)
• Using Telecom Italia Mobile (TIM) data, we can characterize the
spatio-temporal dynamics of presences in the city, but only for TIM
users
• In this paper we propose a strategy to extrapolate the total number of
people using mobile phone data from TIM only
• To do so, we apply a spatial record linkage of mobile phone data
with administrative archives, using the number of residents at the
level of “sezione di censimento” (census section)
Data
• Provided by Telecom Italia Mobile (TIM), thanks to a research
collaboration of DMS StatLab with the Statistical Office of the
Municipality of Brescia
• Recorded from April 1st, 2014 to August 11th, 2016, in a
rectangular region defined by latitude 45.21° N - 46.36° N and
longitude 9.83° E - 10.85° E
• Aggregated into 923 x 607 rectangular cells of 150 m² each
• Available at 15-minute intervals, for a total of more than 40,000
million records
• Each record is the average number of mobile phones simultaneously
connected to the network in a given cell during a given time interval
• The mobility feature of these data is hidden: it is not possible to
trace a single person over time
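As a rough plausibility check of the record count quoted above, the sketch below multiplies the number of cells by the number of days and by the 96 quarter-hours in a day. It assumes that both end dates are included and that no interval is missing, so it gives an upper bound rather than the actual size of the dataset; the function and variable names are illustrative.

```python
from datetime import date

# Figures quoted on the slide
GRID_SHAPE = (923, 607)                    # grid of rectangular cells
START, END = date(2014, 4, 1), date(2016, 8, 11)
INTERVALS_PER_DAY = 24 * 60 // 15          # 96 quarter-hours per day


def max_record_count(grid_shape, start, end, per_day):
    """Upper bound on the number of records, assuming no missing data."""
    n_cells = grid_shape[0] * grid_shape[1]
    n_days = (end - start).days + 1        # both end dates included (assumption)
    return n_cells, n_days, n_cells * n_days * per_day


cells, days, records = max_record_count(GRID_SHAPE, START, END, INTERVALS_PER_DAY)
print(f"{cells:,} cells x {days} days x {INTERVALS_PER_DAY} intervals = {records:,} records")
```

Under these assumptions the bound is on the order of tens of billions of records, in line with the "more than 40,000 million" reported.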
Mobile phone data in Literature
Similar data have been used by:
• Carpita and Simonetto (2014), who analyzed the presence of people during
big events in the city of Brescia
• Zanini et al. (2016), who found, by means of Independent Component
Analysis (ICA), a number of spatial components that separate the main
areas of the city of Milano
• Manfredini et al. (2015), who used Treelet Decomposition and Voronoi
Tessellation to study density curves
• Secchi et al. (2017), who used Blind Source Separation, a method that
extracts significant sources and associates each source
with a specific urban behavior
Estimating Density Profiles (i)
• We estimate the presence of TIM users in a specific area by
grouping days that are similar in their spatial and temporal patterns,
using a large (≈ 2 years) dataset (Metulini & Carpita, 2019)
• To cope with high-dimensional data, we employ a multi-stage
procedure (Tomasi, 2012) that converts the data matrix containing
the values of the grid (2-D) into a vector of features (1-D)
• The procedure defines reference days using a mix of traditional
(k-means) and model-based functional data clustering techniques
Step  Action                Aim                            Methods                                            Using
1     group days            find similar raster images     histogram of oriented gradients (HOG) & k-means    HOG features
2     group groups of days  find similar densities         functional model-based clustering                  daily density profiles
3     characterize groups   find reference daily profiles  functional box plots                               daily density profiles
Estimating Density Profiles (ii)
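From an n x n raster ...

$$X_t = \begin{pmatrix} 93 & 124 & 77 & \cdots & \cdots \\ 217 & 55 & 94 & \cdots & \cdots \\ 24 & 77 & 109 & \cdots & \cdots \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \vdots & \vdots & \vdots & \cdots & \ddots \end{pmatrix}$$

... to $X_t$, a matrix representing the number of people in each cell at time $t$ ...

... to a vector of features of the 96 quarters of the same day ...

quart.  feat.  day 1         day 2         ...  day $\tilde{T}$
1       1      $h^{1}_{1,1}$   $h^{2}_{1,1}$   ...  $h^{\tilde{T}}_{1,1}$
1       2      $h^{1}_{1,2}$   $h^{2}_{1,2}$   ...  $h^{\tilde{T}}_{1,2}$
1       ...    ...           ...           ...  ...
1       k      $h^{1}_{1,k}$   $h^{2}_{1,k}$   ...  $h^{\tilde{T}}_{1,k}$
...     ...    ...           ...           ...  ...
96      k      $h^{1}_{96,k}$  $h^{2}_{96,k}$  ...  $h^{\tilde{T}}_{96,k}$

... to a classification of the days in clusters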
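The path from rasters to day-level feature vectors to clusters can be sketched as follows. This is a simplified stand-in, not the authors' implementation: the descriptor is a single global histogram of gradient orientations per raster (full HOG also splits the image into local cells and blocks), the k-means is a plain Lloyd iteration seeded with the first k points, and the toy rasters are hypothetical.

```python
import math


def orientation_histogram(raster, n_bins=8):
    """Simplified HOG-style descriptor for one raster (a list of equal-length rows).

    Gradients are central differences on interior cells; each cell votes for
    its unsigned gradient orientation (0 to 180 degrees), weighted by the
    gradient magnitude. The histogram is L2-normalised.
    """
    hist = [0.0] * n_bins
    for i in range(1, len(raster) - 1):
        for j in range(1, len(raster[0]) - 1):
            gx = (raster[i][j + 1] - raster[i][j - 1]) / 2.0
            gy = (raster[i + 1][j] - raster[i - 1][j]) / 2.0
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / 180.0 * n_bins) % n_bins] += magnitude
    norm = math.sqrt(sum(h * h for h in hist)) or 1.0
    return [h / norm for h in hist]


def day_features(rasters_of_day, n_bins=8):
    """Concatenate the descriptors of all time slots of one day into one vector."""
    return [v for raster in rasters_of_day for v in orientation_histogram(raster, n_bins)]


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def ramp(direction, scale):
    """Hypothetical 6 x 6 raster whose density rises along one axis."""
    return [[scale * (j if direction == "x" else i) for j in range(6)] for i in range(6)]


# Toy data: 4 "days" with 2 time slots each (the real data have 96 per day).
days = [
    [ramp("x", 1), ramp("x", 2)],   # density rising along the x axis
    [ramp("y", 1), ramp("y", 3)],   # density rising along the y axis
    [ramp("x", 5), ramp("x", 1)],
    [ramp("y", 2), ramp("y", 2)],
]
features = [day_features(d) for d in days]
labels = kmeans(features, k=2)
print(labels)
```

Because the histograms are normalised, days are grouped by the spatial shape of the density rather than by its overall level; days 1 and 3 of the toy data end up in one cluster and days 2 and 4 in the other.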
Estimating Density Profiles (iii)
• The output of the procedure is a functional box plot of the daily
density profiles of a group of similar days (Febrero et al., 2008;
Bouveyron et al., 2015; Sun & Genton, 2011)
• Applying the procedure to the 39 x 39 rectangular grid defined by
latitude 45.516° N - 46.564° N and longitude 10.18° E - 10.245° E
(Brescia), we find, for example, that most weekdays of
summer 2016 belong to the same group
• The number of TIM users varies by month and by quarter of an hour,
from a minimum of 30 thousand to a maximum of about 55 thousand
Figure: daily density profiles, one panel each for June, July and August (x-axis ticks 0 to 80, y-axis ticks 30 to 55)
The Market share coefficient
• To estimate the dynamics of the total number of people, we would need
data on all mobile phone users. Such data are often unavailable, or
available only at a high cost
• Alternative approach: apply the mobile phone market share
coefficient to the number of TIM users
• A country-level estimate is available from the newspaper “Il Sole 24 Ore”:
30.2% (December 2016)
• However, we have reason to think that the TIM market share varies
across cities because of socio-economic and demographic variables

Quantity                                Brescia   Italy
Per-capita income (Euro/year) [1]       23,418    19,514
% foreigners [2]                        18.5      8.5
Avg. number of people per family [2]    2.11      2.33
Avg. age [2]                            45.8      44.7
[1] MEF - Dip. delle Finanze (2016)
[2] ISTAT (2017)

• Assuming that, in the late evening, residential areas are populated only by
residents, we compare the number of residents with the number of TIM
users in selected areas at a specific hour of the day (9 pm)
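Applying a market share coefficient amounts to dividing the TIM count by the share. The sketch below illustrates this with the national share and the range of TIM users quoted in these slides. It assumes one phone per person and a share that is constant over space and time, which is exactly the assumption the slides question; the function name is illustrative.

```python
def estimate_total_people(tim_users, market_share):
    """Scale a TIM-only count up to an estimate of all people present."""
    if not 0.0 < market_share <= 1.0:
        raise ValueError("market_share must be in (0, 1]")
    return tim_users / market_share


NATIONAL_SHARE = 0.302               # country-level TIM share quoted above
for tim_users in (30_000, 55_000):   # range of TIM users quoted for the Brescia grid
    print(tim_users, round(estimate_total_people(tim_users, NATIONAL_SHARE)))
```

A lower local share gives a proportionally higher population estimate, which is why a city-level estimate of the coefficient matters.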
Administrative data
• ISTAT (https://www.istat.it/it/archivio/104317) publishes
“Basi territoriali e variabili censuarie” (territorial bases and census variables)
as a shapefile with attribute data (a SpatialPolygonsDataFrame in R).
• For municipalities with more than 20,000 residents, ISTAT
aggregates the data at the level of “Sezioni di censimento” (SC). The
municipality of Brescia has 1,836 SCs.
• For each polygon (SC), the shapefile contains the
number of residents.
Matching Strategy
• We compute the number of TIM users in each SC by matching the
TIM grid cells with the shapefile of residents (grid cells have a
regular size, whereas SCs are irregular polygons)
• We apply a weighting scheme based on the portion of each polygon
contained in each cell
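The weights are the shares of a polygon's area that fall in each grid cell. A minimal sketch of how such shares could be computed is given below, clipping a simple polygon to axis-aligned cells with the Sutherland-Hodgman algorithm and measuring areas with the shoelace formula. The coordinates are hypothetical and planar (metres); in practice a GIS library would normally do this step, and nothing here is taken from the authors' code.

```python
def polygon_area(vertices):
    """Unsigned area of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def clip_to_cell(vertices, xmin, ymin, xmax, ymax):
    """Clip a polygon to an axis-aligned cell (Sutherland-Hodgman)."""

    def clip(points, inside, intersect):
        out = []
        for prev, curr in zip(points[-1:] + points[:-1], points):
            if inside(curr):
                if not inside(prev):
                    out.append(intersect(prev, curr))
                out.append(curr)
            elif inside(prev):
                out.append(intersect(prev, curr))
        return out

    def cut_x(x):  # intersection of segment pq with the vertical line at x
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def cut_y(y):  # intersection of segment pq with the horizontal line at y
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    edges = [
        (lambda p: p[0] >= xmin, cut_x(xmin)),
        (lambda p: p[0] <= xmax, cut_x(xmax)),
        (lambda p: p[1] >= ymin, cut_y(ymin)),
        (lambda p: p[1] <= ymax, cut_y(ymax)),
    ]
    points = list(vertices)
    for inside, intersect in edges:
        if not points:
            break
        points = clip(points, inside, intersect)
    return points


def overlap_fractions(polygon, cells):
    """Share of the polygon's area falling in each cell (xmin, ymin, xmax, ymax)."""
    total = polygon_area(polygon)
    return [polygon_area(clip_to_cell(polygon, *cell)) / total for cell in cells]


# Hypothetical irregular SC lying across a 2 x 2 block of cells with 150 m sides.
sc_polygon = [(100.0, 90.0), (260.0, 120.0), (230.0, 240.0), (120.0, 210.0)]
cells = [(0, 0, 150, 150), (150, 0, 300, 150), (0, 150, 150, 300), (150, 150, 300, 300)]
shares = overlap_fractions(sc_polygon, cells)
print([round(s, 3) for s in shares], round(sum(shares), 3))
```

Since the four cells cover the whole polygon, the shares add up to one, as the percentages in the worked example for SC110 do (up to rounding).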
EXAMPLE: SC110 overlaps with 4 cells
October 28th, 2015, 9 pm: cell 1: 682 TIM users, cell 2: 555, cell 3: 677, cell 4: 751
8.3% of SC110 lies in cell 1, 27.0% in cell 2, 26.4% in cell 3, 38.2% in cell 4.

$$\text{TIM users} = (682 \cdot 0.083 + 555 \cdot 0.270 + 677 \cdot 0.264 + 751 \cdot 0.382) \cdot \frac{\text{area(SC)}}{\text{area(cell)}}$$
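The worked example for SC110 can be reproduced numerically as below. The cell counts and overlap shares are those on the slide; the slide does not report the two areas, so the area ratio used here is a hypothetical placeholder and only the weighted average (about 672 TIM users per cell) follows from the slide's figures.

```python
def weighted_cell_average(cell_counts, overlap_shares):
    """Average of the cell counts, weighted by the share of the SC lying in each cell."""
    return sum(c * w for c, w in zip(cell_counts, overlap_shares))


def tim_users_in_sc(cell_counts, overlap_shares, sc_area, cell_area):
    """Rescale the weighted cell average by area(SC) / area(cell)."""
    return weighted_cell_average(cell_counts, overlap_shares) * sc_area / cell_area


counts = [682, 555, 677, 751]            # TIM users in cells 1..4 (from the slide)
shares = [0.083, 0.270, 0.264, 0.382]    # share of SC110 in each cell (from the slide)

avg = weighted_cell_average(counts, shares)

# Hypothetical: the slide gives neither area, so this ratio is a placeholder.
AREA_RATIO = 0.5
print(round(avg, 3), round(tim_users_in_sc(counts, shares, AREA_RATIO, 1.0), 3))
```

The area ratio converts a per-cell count into a count for the SC: an SC half the size of a cell is assigned half of the weighted average.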
Market share estimate (i)
• The median value is consistent with the TIM market share at
country level
• Values in the right tail of the distribution are extremely large

Percentiles of the ratio (TIM users / residents) across SCs:
min     5th     25th    median  75th    95th    max
0.006   0.070   0.139   0.245   0.547   5.567   347.024

Figure: Ratio map; Population map
• In large areas with a small number of residents the ratio is
overestimated
• Antennas are often located in large SCs where few people reside.
People in residential areas are likely to be recorded where the antenna is
(i.e. up to a few hundred meters away)
Market share estimate (ii)
• Alternative strategy: estimate the ratio for selected residential
areas plus a portion of their neighbourhood, so as to include the antenna
• Residential areas are chosen according to the DUSAF (Destinazione
d’Uso dei Suoli Agricoli e Forestali) map,
https://www.dati.lombardia.it/Territorio/Dusaf-5-0-Uso-del-suolo-2015/iq6r-u7y2
Figure: Villaggio Sereno residential area: 0.265
Figure: San Polo residential area: 0.310
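The difference between per-SC ratios and a single ratio pooled over a residential area and its neighbourhood can be illustrated as follows. All numbers are hypothetical: four SCs, the last of which hosts the antenna and has very few residents. This is a sketch of the idea, not the authors' procedure.

```python
from statistics import median


def per_sc_ratios(tim_users, residents):
    """Ratio TIM users / residents for each SC with at least one resident."""
    return [u / r for u, r in zip(tim_users, residents) if r > 0]


def pooled_ratio(tim_users, residents):
    """Single ratio for a whole area: total TIM users over total residents."""
    return sum(tim_users) / sum(residents)


# Hypothetical area made of four SCs; the last one hosts the antenna.
tim_users = [120.0, 95.0, 140.0, 110.0]
residents = [520, 410, 600, 12]

ratios = per_sc_ratios(tim_users, residents)
print("per-SC ratios:", [round(x, 3) for x in ratios])
print("median:", round(median(ratios), 3), "max:", round(max(ratios), 3))
print("pooled:", round(pooled_ratio(tim_users, residents), 3))
```

The SC with few residents produces a very large ratio on its own, while pooling reassigns its TIM users to the residents of the surrounding SCs and yields one value for the whole area.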
Further developments
• New density curves, estimated via the market share coefficient
• Population on an average working day (11:00 am, peak):
r = 0.30, ≈ 217,000
r = 0.25, ≈ 254,000
• Estimates at macro (municipality) and micro (Sezioni di censimento) level, and by land
use.
Conclusions
• Mobile phone data can be used to estimate the number of people in
the city and its dynamics over time, unlike
administrative archives, which report static numbers
• Generally, data are available for only a portion of phone users
• We have developed a method to estimate the market share at
municipality level by combining private mobile phone data with two
kinds of freely available administrative data: census data (number of
residents by “Sezione di censimento”) and land-use information
• Results are consistent with the market share at national level and
can be used to infer the dynamics of presences at municipality level
Acknowledgements
Data Methods and Systems Statistical Laboratory (DMS StatLab)
The authors are grateful to the Statistical Office of the Municipality of
Brescia, with a special mention to
• Dr. Marco Palamenghi
• Dr. Paola Chiesa
• Dr. Maria Elena Comune
who kindly supported us and provided the data.
References
1 Bouveyron, C., Come, E., & Jacques, J. (2015). The discriminative functional mixture model for a
comparative analysis of bike sharing systems. The Annals of Applied Statistics, 9(4), 1726-1760.
2 Carpita, M., & Simonetto, A. (2014). Big Data to Monitor Big Social Events: Analysing the mobile phone
signals in the Brescia Smart City. Electronic Journal of Applied Statistical Analysis: Decision
Support Systems and Services Evaluation, 5(1), 31-41.
3 Febrero, M., Galeano, P., & Gonzalez-Manteiga, W. (2008). Outlier detection in functional data by
depth measures, with application to identify abnormal NOx levels. Environmetrics: The official
journal of the International Environmetrics Society, 19(4), 331-345.
4 Manfredini, F., Pucci, P., Secchi, P., Tagliolato, P., Vantini, S., & Vitelli, V. (2015). Treelet
decomposition of mobile phone data for deriving city usage and mobility pattern in the Milan urban
region. In Advances in complex data modeling and computational methods in statistics (pp.
133-147). Springer, Cham.
5 Metulini, R., & Carpita, M. (2019). Human activity spatio-temporal indicators using
mobile phone data. Data Science & Social Research 2019 Book of Abstracts, 89.
6 Secchi, P., Vantini, S., & Zanini, P. (2017). Analysis of Mobile Phone Data for Deriving City Mobility
Patterns. In Electric Vehicle Sharing Services for Smarter Cities (pp. 37-58). Springer, Cham.
7 Sun, Y., & Genton, M. G. (2011). Functional boxplots. Journal of Computational and Graphical
Statistics, 20(2), 316-334.
8 Tomasi, C. (2012). Histograms of oriented gradients. Computer Vision Sampler, 1-6.
9 Zanini, P., Shen, H., & Truong, Y. (2016). Understanding resident mobility in Milan through independent
component analysis of Telecom Italia mobile usage data. The Annals of Applied Statistics,
10(2), 812-833.
Supplemental
Figure: land-use map. Legend: Agricultural; Areas of supra-municipal interest; Woodland;
Unclassified extra-urban; Railways; Parking; Multi-purpose; Productive; Residential;
Municipal services; Roads
Figure: Ratio - October 28th (Wednesday), 2015, 9 pm
Figure: Population - January 1st, 2016
Back to slide
matching
mobile phone
signals and
census data
Metulini,
Carpita
Supplemental
Back to slide

More Related Content

Similar to A strategy for the matching of mobile phone signals with census data

Monitoring temporary populations through cellular core network data
Monitoring temporary populations through cellular core network dataMonitoring temporary populations through cellular core network data
Monitoring temporary populations through cellular core network dataBeniamino Murgante
 
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...Gloria Re Calegari
 
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...PAPIs.io
 
Nhst 11 surat, Application of RS & GIS in urban waste management
Nhst 11 surat,  Application of RS  & GIS in urban waste managementNhst 11 surat,  Application of RS  & GIS in urban waste management
Nhst 11 surat, Application of RS & GIS in urban waste managementSamirsinh Parmar
 
MODELLING DYNAMIC PATTERNS USING MOBILE DATA
MODELLING DYNAMIC PATTERNS USING MOBILE DATAMODELLING DYNAMIC PATTERNS USING MOBILE DATA
MODELLING DYNAMIC PATTERNS USING MOBILE DATAcscpconf
 
Modelling dynamic patterns using mobile data
Modelling dynamic patterns using mobile dataModelling dynamic patterns using mobile data
Modelling dynamic patterns using mobile datacsandit
 
Predicting growth of urban agglomerations through fractal analysis of geo spa...
Predicting growth of urban agglomerations through fractal analysis of geo spa...Predicting growth of urban agglomerations through fractal analysis of geo spa...
Predicting growth of urban agglomerations through fractal analysis of geo spa...Indicus Analytics Private Limited
 
Adam Mtaho & Fredrick Ishengoma - Factors Affecting QoS in Tanzania Cellular ...
Adam Mtaho & Fredrick Ishengoma - Factors Affecting QoS in Tanzania Cellular ...Adam Mtaho & Fredrick Ishengoma - Factors Affecting QoS in Tanzania Cellular ...
Adam Mtaho & Fredrick Ishengoma - Factors Affecting QoS in Tanzania Cellular ...Fredrick Ishengoma
 
Density of route frequency for enforcement
Density of route frequency for enforcement Density of route frequency for enforcement
Density of route frequency for enforcement Conference Papers
 
Richard Smith: Addressing the Problems of Addressing at British Transport Police
Richard Smith: Addressing the Problems of Addressing at British Transport PoliceRichard Smith: Addressing the Problems of Addressing at British Transport Police
Richard Smith: Addressing the Problems of Addressing at British Transport PoliceAGI Geocommunity
 
Human activity spatio-temporal indicators using mobile phone data
Human activity spatio-temporal indicators using mobile phone dataHuman activity spatio-temporal indicators using mobile phone data
Human activity spatio-temporal indicators using mobile phone dataUniversity of Salerno
 
Km4city: Open Urban Platform for a Sentient Smart City
Km4city: Open Urban Platform for a Sentient Smart CityKm4city: Open Urban Platform for a Sentient Smart City
Km4city: Open Urban Platform for a Sentient Smart CityPaolo Nesi
 
ledio_gjoni_tesi
ledio_gjoni_tesiledio_gjoni_tesi
ledio_gjoni_tesiLedio Gjoni
 
Mobile Data for Development Primer
Mobile Data for Development PrimerMobile Data for Development Primer
Mobile Data for Development PrimerUN Global Pulse
 
SENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITY
SENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITYSENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITY
SENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITYSangeetha Mam
 
Satellite based observations of the time-variation of urban pattern morpholog...
Satellite based observations of the time-variation of urban pattern morpholog...Satellite based observations of the time-variation of urban pattern morpholog...
Satellite based observations of the time-variation of urban pattern morpholog...Beniamino Murgante
 
Project Report for City of Los Angeles’ 311 Service Request
Project Report for City of Los Angeles’ 311 Service RequestProject Report for City of Los Angeles’ 311 Service Request
Project Report for City of Los Angeles’ 311 Service RequestRaman Deep Singh
 

Similar to A strategy for the matching of mobile phone signals with census data (20)

Monitoring temporary populations through cellular core network data
Monitoring temporary populations through cellular core network dataMonitoring temporary populations through cellular core network data
Monitoring temporary populations through cellular core network data
 
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
 
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
 
Nhst 11 surat, Application of RS & GIS in urban waste management
Nhst 11 surat,  Application of RS  & GIS in urban waste managementNhst 11 surat,  Application of RS  & GIS in urban waste management
Nhst 11 surat, Application of RS & GIS in urban waste management
 
MODELLING DYNAMIC PATTERNS USING MOBILE DATA
MODELLING DYNAMIC PATTERNS USING MOBILE DATAMODELLING DYNAMIC PATTERNS USING MOBILE DATA
MODELLING DYNAMIC PATTERNS USING MOBILE DATA
 
Modelling dynamic patterns using mobile data
Modelling dynamic patterns using mobile dataModelling dynamic patterns using mobile data
Modelling dynamic patterns using mobile data
 
Predicting growth of urban agglomerations through fractal analysis of geo spa...
Predicting growth of urban agglomerations through fractal analysis of geo spa...Predicting growth of urban agglomerations through fractal analysis of geo spa...
Predicting growth of urban agglomerations through fractal analysis of geo spa...
 
Where Next
Where NextWhere Next
Where Next
 
Adam Mtaho & Fredrick Ishengoma - Factors Affecting QoS in Tanzania Cellular ...
Adam Mtaho & Fredrick Ishengoma - Factors Affecting QoS in Tanzania Cellular ...Adam Mtaho & Fredrick Ishengoma - Factors Affecting QoS in Tanzania Cellular ...
Adam Mtaho & Fredrick Ishengoma - Factors Affecting QoS in Tanzania Cellular ...
 
Machine learning and Satellite Images
Machine learning and Satellite ImagesMachine learning and Satellite Images
Machine learning and Satellite Images
 
Density of route frequency for enforcement
Density of route frequency for enforcement Density of route frequency for enforcement
Density of route frequency for enforcement
 
Sergio ICON project
Sergio ICON projectSergio ICON project
Sergio ICON project
 
Richard Smith: Addressing the Problems of Addressing at British Transport Police
Richard Smith: Addressing the Problems of Addressing at British Transport PoliceRichard Smith: Addressing the Problems of Addressing at British Transport Police
Richard Smith: Addressing the Problems of Addressing at British Transport Police
 
Human activity spatio-temporal indicators using mobile phone data
Human activity spatio-temporal indicators using mobile phone dataHuman activity spatio-temporal indicators using mobile phone data
Human activity spatio-temporal indicators using mobile phone data
 
Km4city: Open Urban Platform for a Sentient Smart City
Km4city: Open Urban Platform for a Sentient Smart CityKm4city: Open Urban Platform for a Sentient Smart City
Km4city: Open Urban Platform for a Sentient Smart City
 
ledio_gjoni_tesi
ledio_gjoni_tesiledio_gjoni_tesi
ledio_gjoni_tesi
 
Mobile Data for Development Primer
Mobile Data for Development PrimerMobile Data for Development Primer
Mobile Data for Development Primer
 
SENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITY
SENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITYSENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITY
SENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITY
 
Satellite based observations of the time-variation of urban pattern morpholog...
Satellite based observations of the time-variation of urban pattern morpholog...Satellite based observations of the time-variation of urban pattern morpholog...
Satellite based observations of the time-variation of urban pattern morpholog...
 
Project Report for City of Los Angeles’ 311 Service Request
Project Report for City of Los Angeles’ 311 Service RequestProject Report for City of Los Angeles’ 311 Service Request
Project Report for City of Los Angeles’ 311 Service Request
 

More from University of Salerno

Detecting and classifying moments in basketball matches using sensor tracked ...
Detecting and classifying moments in basketball matches using sensor tracked ...Detecting and classifying moments in basketball matches using sensor tracked ...
Detecting and classifying moments in basketball matches using sensor tracked ...University of Salerno
 
BASKETBALL SPATIAL PERFORMANCE INDICATORS
BASKETBALL SPATIAL PERFORMANCE INDICATORSBASKETBALL SPATIAL PERFORMANCE INDICATORS
BASKETBALL SPATIAL PERFORMANCE INDICATORSUniversity of Salerno
 
Players Movements and Team Performance
Players Movements and Team PerformancePlayers Movements and Team Performance
Players Movements and Team PerformanceUniversity of Salerno
 
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...University of Salerno
 
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...University of Salerno
 
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...University of Salerno
 
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...University of Salerno
 
The Worldwide Network of Virtual Water with Kriskogram
The Worldwide Network of Virtual Water with KriskogramThe Worldwide Network of Virtual Water with Kriskogram
The Worldwide Network of Virtual Water with KriskogramUniversity of Salerno
 

More from University of Salerno (20)

Regression models for panel data
Regression models for panel dataRegression models for panel data
Regression models for panel data
 
Detecting and classifying moments in basketball matches using sensor tracked ...
Detecting and classifying moments in basketball matches using sensor tracked ...Detecting and classifying moments in basketball matches using sensor tracked ...
Detecting and classifying moments in basketball matches using sensor tracked ...
 
BASKETBALL SPATIAL PERFORMANCE INDICATORS
BASKETBALL SPATIAL PERFORMANCE INDICATORSBASKETBALL SPATIAL PERFORMANCE INDICATORS
BASKETBALL SPATIAL PERFORMANCE INDICATORS
 
Poster venezia
Poster veneziaPoster venezia
Poster venezia
 
Metulini280818 iasi
Metulini280818 iasiMetulini280818 iasi
Metulini280818 iasi
 
Players Movements and Team Performance
Players Movements and Team PerformancePlayers Movements and Team Performance
Players Movements and Team Performance
 
Big Data Analytics for Smart Cities
Big Data Analytics for Smart CitiesBig Data Analytics for Smart Cities
Big Data Analytics for Smart Cities
 
Meeting progetto ode_sm_rm
Meeting progetto ode_sm_rmMeeting progetto ode_sm_rm
Meeting progetto ode_sm_rm
 
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...
 
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...
 
Metulini1503
Metulini1503Metulini1503
Metulini1503
 
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...
 
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
 
The Global Virtual Water Network
The Global Virtual Water NetworkThe Global Virtual Water Network
The Global Virtual Water Network
 
The Worldwide Network of Virtual Water with Kriskogram
The Worldwide Network of Virtual Water with KriskogramThe Worldwide Network of Virtual Water with Kriskogram
The Worldwide Network of Virtual Water with Kriskogram
 
Ad b 1702_metu_v2
Ad b 1702_metu_v2Ad b 1702_metu_v2
Ad b 1702_metu_v2
 
Statistics lab 1
Statistics lab 1Statistics lab 1
Statistics lab 1
 
Talk 2
Talk 2Talk 2
Talk 2
 
Talk 3
Talk 3Talk 3
Talk 3
 
Talk 4
Talk 4Talk 4
Talk 4
 

A strategy for the matching of mobile phone signals with census data

  • 1. matching mobile phone signals and census data Metulini, Carpita Data & Context Methods Results Conclusions Acknowledgm. & References A strategy for the matching of mobile phone signals with census data Rodolfo Metulini1, Maurizio Carpita1 1. Data Methods and Systems Statistical Laboratory - Department of Economics and Management, University of Brescia Milano - June 19th, 2019
  • 2. matching mobile phone signals and census data Metulini, Carpita Data & Context Methods Results Conclusions Acknowledgm. & References Table of contents 1 Data & Context 2 Methods 3 Results 4 Conclusions 5 Acknowledgm. & References
  • 3. matching mobile phone signals and census data Metulini, Carpita Data & Context Methods Results Conclusions Acknowledgm. & References The Context • Administrative data are traditionally used to count the presence of people (Static) • The geo-localization of people by mobile phone, by quantifying the number of people at a given moment in time, enriches the amount of useful information for “smart” (cities) evaluations. (Dynamic) • Using Telecom Italia Mobile (TIM) data, we are able to characterize the spatio-temporal dynamic of the presences in the city of just TIM users. • In this paper we propose a strategy to extrapolate the number of people by using TIM data only. • To do so, we apply a spatial record linkage of mobile phone data with administrative archives using the number of residents at the level of “sezione di censimento”.
  • 4. matching mobile phone signals and census data Metulini, Carpita Data & Context Methods Results Conclusions Acknowledgm. & References Data • Provided by Telecom Italia Mobile (TIM), thanks to a research collaboration of DMS StatLab with the Statistical Office of the Municipality of Brescia • Recorded in the period April 1st 2014 – August 11th 2016, in a rectangular region defined by latitude 45.21◦ N - 46.36◦ N and longitude 9.83◦ N - 10.85◦ N • Aggregated into 923 x 607 rectangular cells of 150 m2 size each • Available at intervals of 15 minutes, for a total of more than 40,000 millions of records collected • The corresponding record refers to the average number of mobile phones simultaneously connected to the network in that rectangular area in that time interval • The mobility feature of these data is hidden, in the sense it is not possible to trace the single person over the time
  • 5. matching mobile phone signals and census data Metulini, Carpita Data & Context Methods Results Conclusions Acknowledgm. & References Mobile phone data in Literature Similar data has been used by: • Carpita and Simonetto (2014) analyzed the presence of people during big events in the city of Brescia • Zanini et al. (2016) find, by mean of a Independent Component Analysis (ICA), a number of spatial components that separate main areas of the city of Milano • Manfredini et al. (2015) used Treelet Decomposition and Voronoi Tassellation to study density curves • Secchi et al. (2017) used Blind Source Separation, a method that allows to extrapolate significant sources and to associate each source to a specific urban behavior
  • 6. matching mobile phone signals and census data Metulini, Carpita Data & Context Methods Results Conclusions Acknowledgm. & References Estimating Density Profiles (i) • We estimate the presence of TIM users in a specific area by classifying similar days in terms of spatial and temporal dimension, using a large (≈ 2 years) dataset (Metulini & Carpita, 2019) • To manage with high dimensional data, we employed a multi-stage procedure (Tomasi, 2012) that converts the data matrix containing the values of the grid (2-D) to a vector of features (1-D) • The procedure defines reference days using a mix of traditional (k-means) and model-based functional data clustering techniques Step Action Aim Methods Using .. 1 group days find similar raster images histogram of oriented gradi- ents (HOG) & k-means HOG features 2 group groups of days find similar den- sities functional model- based clustering daily density profiles 3 characterize groups find reference daily profiles functional box plots daily density profiles
  • 7. Estimating Density Profiles (ii)
      From n x n raster data ... to X_t, a matrix representing the number of people in each cell at time t:

      $$
      X_t = \begin{pmatrix}
      93 & 124 & 77 & \cdots \\
      217 & 55 & 94 & \cdots \\
      24 & 77 & 109 & \cdots \\
      \vdots & \vdots & \vdots & \ddots
      \end{pmatrix}
      $$

      ... to a vector of features of the 96 quarters of the same day:

      quart.  feat.  day 1        day 2        ...  day \tilde{T}
      1       1      h_{11,1}     h_{21,1}     ...  h_{\tilde{T}1,1}
      1       2      h_{11,2}     h_{21,2}     ...  h_{\tilde{T}1,2}
      1       ...    ...          ...          ...  ...
      1       k      h_{11,k}     h_{21,k}     ...  h_{\tilde{T}1,k}
      ...     ...    ...          ...          ...  ...
      96      k      h_{1\,96,k}  h_{2\,96,k}  ...  h_{\tilde{T}\,96,k}

      (h_{dq,f} is feature f of quarter q on day d, with f = 1, ..., k, q = 1, ..., 96 and d = 1, ..., \tilde{T})

      ... to a classification of the days into clusters
  • 8. Estimating Density Profiles (iii)
      • The output of the procedure is a functional box plot of the daily density profiles of a group of similar days (Febrero et al., 2008; Bouveyron et al., 2015; Sun & Genton, 2011)
      • By applying the procedure to the 39 x 39 rectangular grid defined by latitude 45.516° N - 45.564° N and longitude 10.18° E - 10.245° E (Brescia) we find, for example, that most of the weekdays of Summer 2016 belong to the same group
      • The number of TIM users varies, by month and by quarter of the day, from a minimum of about 30 thousand to a maximum of about 55 thousand
      [Figure: three panels titled June, July and August; horizontal axis ticks from 0 to 80, vertical axis ticks from 30 to 55]
  • 9. The Market share coefficient
      • To estimate the dynamics of the total number of people we have to consider all mobile phone users. These data are often unavailable, except at an onerous cost
      • Alternative approach: apply the mobile phone market share coefficient to the number of TIM users
      • A country-level estimate is available through the newspaper “Il Sole 24 Ore”. This value stands at 30.2% (December 2016)
      • However, we have reasons to think that the TIM market share varies across cities due to socio-economic and demographic variables

      Quantity                              Brescia  Italy
      Per-capita income (Euro/year) [1]     23,418   19,514
      % foreigners [2]                      18.5     8.5
      Avg. number of people per family [2]  2.11     2.33
      Avg. age [2]                          45.8     44.7
      [1] MEF - Dip. delle Finanze (2016)
      [2] ISTAT (2017)

      • Assuming that residential areas are populated, in the late evening, only by residents, we compare the number of residents with the number of TIM users in selected regions during a specific hour of the day (9pm)
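The market share approach described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `estimate_total_people` is hypothetical, the 30.2% share is the country-level figure quoted on the slide, and the count of 55,000 TIM users is an illustrative value of the same order as the peaks reported for Summer 2016, not an exact record from the data.

```python
def estimate_total_people(tim_users, market_share):
    """Scale a count of TIM users up to an estimate of all people present.

    Assumes the market share is constant over the area and time considered,
    which the slides themselves call into question at city level.
    """
    if not 0 < market_share <= 1:
        raise ValueError("market_share must lie in (0, 1]")
    return tim_users / market_share


# Country-level TIM market share quoted on the slide (December 2016).
national_share = 0.302
# Illustrative count of TIM users (assumption, not an exact record).
peak_tim_users = 55_000

estimate = estimate_total_people(peak_tim_users, national_share)
print(round(estimate))
```

Dividing by the share, rather than multiplying, is the key point: a smaller share implies that each observed TIM user stands for more people, so the estimate is sensitive to small changes in the coefficient.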
  • 10. Administrative data
      • ISTAT (https://www.istat.it/it/archivio/104317) publishes “Basi territoriali e variabili censuarie” in the form of a shape file with data (the so-called SpatialPolygonsDataFrame in the R language).
      • For municipalities with more than 20,000 residents, ISTAT partitions the territory into “Sezioni di censimento” (SC). The municipality of Brescia has 1,836 SCs.
      • The shape file contains, for each polygon, the number of residents of that SC.
  • 11. Matching Strategy
      • We compute the number of TIM users in each SC by matching the TIM grid cells with the shape files on residents (grid cells have regular size, SCs are irregular polygons)
      • We apply a weighting scheme based on the portion of the polygon contained in each cell
      EXAMPLE: SC110 overlaps with 4 cells
      October 28th 2015, 9pm: cell 1: 682 TIM users, cell 2: 555, cell 3: 677, cell 4: 751
      8.3% of SC110 lies in cell 1, 27.0% in cell 2, 26.4% in cell 3, 38.2% in cell 4.

      $$
      \text{TIM users} = (682 \cdot 0.083 + 555 \cdot 0.270 + 677 \cdot 0.264 + 751 \cdot 0.382) \cdot \frac{\text{area(SC)}}{\text{area(cell)}}
      $$
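The SC110 example above can be reproduced with a short Python sketch. The function name `tim_users_in_sc` and the parameter `area_ratio` (standing for area(SC) / area(cell)) are hypothetical names introduced here; the slide does not report the two areas, so the ratio is left as a parameter and the example only evaluates the weighted sum inside the parentheses.

```python
def tim_users_in_sc(cell_users, sc_share_in_cell, area_ratio=1.0):
    """Area-weighted number of TIM users attributed to one census section.

    cell_users       : TIM users recorded in each overlapping grid cell
    sc_share_in_cell : fraction of the SC polygon lying in each of those cells
    area_ratio       : area(SC) / area(cell); not reported on the slide, so the
                       default of 1.0 is only a placeholder (assumption)
    """
    if len(cell_users) != len(sc_share_in_cell):
        raise ValueError("one weight is needed for each overlapping cell")
    weighted = sum(u * w for u, w in zip(cell_users, sc_share_in_cell))
    return weighted * area_ratio


# Values from the SC110 example (October 28th 2015, 9pm).
users = [682, 555, 677, 751]
shares = [0.083, 0.270, 0.264, 0.382]

weighted_mean = tim_users_in_sc(users, shares)
print(round(weighted_mean, 3))
```

In practice the shares would come from intersecting each SC polygon with the grid, a step a GIS library can perform; here they are simply taken as given.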
  • 12. Market share estimate (i)
      • The median value is consistent with the TIM market share at country level
      • Values in the right tail of the distribution are extremely large

      min    5th    25th   median  75th   95th   max
      0.006  0.070  0.139  0.245   0.547  5.567  347.024

      Figures: Ratio Map; Population Map
      • In large areas with a small number of residents the ratio is overestimated
      • The antenna is often located in large SCs where few people reside. People in residential areas are likely to be tracked where the antenna is (i.e. up to a few hundred meters away)
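The per-SC ratio behind this table, and the reason its right tail explodes, can be illustrated with a minimal Python sketch. All numbers below are invented for illustration and are not taken from the Brescia data; the function name `users_to_residents_ratio` is hypothetical. The third section mimics a large SC with very few residents that hosts an antenna.

```python
from statistics import median


def users_to_residents_ratio(tim_users, residents):
    """Ratio of TIM users to residents for each census section.

    Sections with no residents are skipped, since the ratio is undefined there.
    """
    return [u / r for u, r in zip(tim_users, residents) if r > 0]


# Hypothetical 9pm counts for three census sections (illustration only).
residents = [1200, 800, 5]
tim_users = [300.0, 180.0, 400.0]

ratios = users_to_residents_ratio(tim_users, residents)
median_ratio = median(ratios)
max_ratio = max(ratios)
print(median_ratio, max_ratio)
```

The median stays close to a plausible market share while the maximum is dominated by the sparsely populated section, which is why a robust summary such as the median is more informative here than the mean.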
  • 13. Market share estimate (ii)
      • Alternative strategy: estimate the ratio for selected residential areas plus a portion of their neighbourhood, so as to include the antenna
      • Residential areas are chosen according to the DUSAF (Destinazione d’Uso dei Suoli Agricoli e Forestali) map: https://www.dati.lombardia.it/Territorio/Dusaf-5-0-Uso-del-suolo-2015/iq6r-u7y2
      Figure: Villaggio Sereno residential area: 0.265
      Figure: San Polo residential area: 0.310
  • 14. Further developments
      • New density curves, estimated via the market share coefficient
      • Population on an average working day (11:00am, peak): r = 0.30, ∼ 217,000; r = 0.25, ∼ 254,000
      • At macro (Municipality) level, at micro (Sezioni di censimento) level and by land usage
  • 15. Conclusions
      • Mobile phone data can be used to estimate the number of people in the city while accounting for dynamics over time, unlike administrative archives, which report static numbers
      • Generally, data are available for only a portion of phone users
      • We have developed a method to estimate the market share at municipality level by using administrative data on the number of residents by “Sezione di censimento”
      • The method combines private mobile phone data with two kinds of freely available administrative data (census data and land usage information)
      • Results are consistent with the market share at national level and can be used to infer the dynamics of presences at municipality level
  • 16. Acknowledgements
      Data Methods and Systems Statistical Laboratory (DMS StatLab)
      The authors are grateful to the Statistical Office of the Municipality of Brescia, with a special mention to
      • Dr. Marco Palamenghi
      • Dr. Paola Chiesa
      • Dr. Maria Elena Comune
      who kindly supported us and provided the data.
  • 17. References
      1 Bouveyron, C., Come, E., & Jacques, J. (2015). The discriminative functional mixture model for a comparative analysis of bike sharing systems. The Annals of Applied Statistics, 9(4), 1726-1760.
      2 Carpita, M., & Simonetto, A. (2014). Big Data to Monitor Big Social Events: Analysing the mobile phone signals in the Brescia Smart City. Electronic Journal of Applied Statistical Analysis: Decision Support Systems and Services Evaluation, 5(1), 31-41.
      3 Febrero, M., Galeano, P., & Gonzalez-Manteiga, W. (2008). Outlier detection in functional data by depth measures, with application to identify abnormal NOx levels. Environmetrics: The official journal of the International Environmetrics Society, 19(4), 331-345.
      4 Manfredini, F., Pucci, P., Secchi, P., Tagliolato, P., Vantini, S., & Vitelli, V. (2015). Treelet decomposition of mobile phone data for deriving city usage and mobility pattern in the Milan urban region. In Advances in complex data modeling and computational methods in statistics (pp. 133-147). Springer, Cham.
      5 Metulini, R., & Carpita, M. (2019). Human activity spatio-temporal indicators using mobile phone data. Data Science & Social Research 2019 Book of Abstracts, 89.
      6 Secchi, P., Vantini, S., & Zanini, P. (2017). Analysis of Mobile Phone Data for Deriving City Mobility Patterns. In Electric Vehicle Sharing Services for Smarter Cities (pp. 37-58). Springer, Cham.
      7 Sun, Y., & Genton, M. G. (2011). Functional boxplots. Journal of Computational and Graphical Statistics, 20(2), 316-334.
      8 Tomasi, C. (2012). Histograms of oriented gradients. Computer Vision Sampler, 1-6.
      9 Zanini, P., Shen, H., & Truong, Y. (2016). Understanding resident mobility in Milan through independent component analysis of Telecom Italia mobile usage data. The Annals of Applied Statistics, 10(2), 812-833.
  • 18. Supplemental
      Land usage map legend: Agricultural; Areas of supra-municipal interest; Woods; Unclassified extra-urban; Railways; Parking; Multifunctional; Productive; Residential; Municipal services; Roads
  • 20. Supplemental
      Figure: Ratio - October 28th (Wednesday), 2015, 9pm
      Figure: Population - January 1st, 2016