This document describes a strategy for matching mobile phone signal data with census data to estimate population densities over time. The researchers obtained mobile phone usage data from Telecom Italia Mobile (TIM) covering a rectangular region around Brescia, Italy. They estimated daily density profiles by grouping days with similar patterns and characterizing each group. To estimate the total population from a single carrier's data, they calculated a market share coefficient by comparing TIM users with census counts of residents in different areas. The result is a method for inferring dynamic population patterns from passive mobile phone data and administrative data.
A strategy for the matching of mobile phone signals with census data
1. matching
mobile phone
signals and
census data
Metulini,
Carpita
Data &
Context
Methods
Results
Conclusions
Acknowledgm.
& References
Rodolfo Metulini¹, Maurizio Carpita¹
¹ Data Methods and Systems Statistical Laboratory - Department of Economics and Management, University of Brescia
Milano - June 19th, 2019
Table of contents
1 Data & Context
2 Methods
3 Results
4 Conclusions
5 Acknowledgm. & References
The Context
• Administrative data are traditionally used to count the presence of people (static)
• Geo-localizing people through their mobile phones quantifies the number of people present at a given moment in time, which enriches the information available for “smart city” evaluations (dynamic)
• Using Telecom Italia Mobile (TIM) data, we can characterize the spatio-temporal dynamics of presence in the city, but for TIM users only
• In this paper we propose a strategy to extrapolate the total number of people using TIM data only
• To do so, we apply a spatial record linkage between mobile phone data and administrative archives, using the number of residents at the level of “sezione di censimento” (census section)
Data
• Provided by Telecom Italia Mobile (TIM), thanks to a research collaboration between DMS StatLab and the Statistical Office of the Municipality of Brescia
• Recorded from April 1st, 2014 to August 11th, 2016, in a rectangular region defined by latitude 45.21°N - 46.36°N and longitude 9.83°E - 10.85°E
• Aggregated into 923 x 607 rectangular cells of 150 m² size each
• Available at 15-minute intervals, for a total of more than 40,000 million records
• Each record is the average number of mobile phones simultaneously connected to the network in that cell during that time interval
• The mobility feature of these data is hidden, in the sense that it is not possible to track a single person over time
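The stated data volume can be checked with a rough count. The sketch below assumes (this is not stated in the slides) that every cell has one record for every 15-minute interval of every day in the period, both end dates included, with no gaps.

```python
from datetime import date

# Grid and sampling figures quoted in the slide.
N_ROWS, N_COLS = 923, 607
QUARTERS_PER_DAY = 24 * 4  # one record every 15 minutes

# Assumption: the period includes both end dates and has no missing days.
n_days = (date(2016, 8, 11) - date(2014, 4, 1)).days + 1

n_cells = N_ROWS * N_COLS
n_records = n_cells * QUARTERS_PER_DAY * n_days

print(f"{n_cells:,} cells x {QUARTERS_PER_DAY} quarters x {n_days} days = {n_records:,} records")
```

Under these assumptions the count exceeds 40,000 million, in line with the figure quoted above.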
Mobile phone data in Literature
Similar data have been used by:
• Carpita and Simonetto (2014), who analyzed the presence of people during big events in the city of Brescia
• Zanini et al. (2016), who used Independent Component Analysis (ICA) to find a number of spatial components that separate the main areas of the city of Milano
• Manfredini et al. (2015), who used Treelet Decomposition and Voronoi Tessellation to study density curves
• Secchi et al. (2017), who used Blind Source Separation, a method that extracts significant sources and associates each source with a specific urban behavior
Estimating Density Profiles (i)
• We estimate the presence of TIM users in a specific area by classifying days that are similar in their spatial and temporal dimensions, using a large (≈ 2 years) dataset (Metulini & Carpita, 2019)
• To handle high-dimensional data, we employ a multi-stage procedure (Tomasi, 2012) that converts the data matrix containing the values of the grid (2-D) into a vector of features (1-D)
• The procedure defines reference days using a mix of traditional (k-means) and model-based functional data clustering techniques
Step  Action                Aim                            Methods                                          Using
1     group days            find similar raster images     histogram of oriented gradients (HOG) & k-means  HOG features
2     group groups of days  find similar densities         functional model-based clustering                daily density profiles
3     characterize groups   find reference daily profiles  functional box plots                             daily density profiles
Estimating Density Profiles (ii)
From an n x n raster of the data ...
... to X_t, a matrix whose entries are the number of people in each cell at time t:
  93  124   77  ...
 217   55   94  ...
  24   77  109  ...
 ...  ...  ...  ...
... to a vector of k features for each of the 96 quarters of the same day, where h^{d}_{q,j} is feature j of quarter q on day d:
quart.  feat.  day 1          day 2          ...  day \tilde{T}
1       1      h^{1}_{1,1}    h^{2}_{1,1}    ...  h^{\tilde{T}}_{1,1}
1       2      h^{1}_{1,2}    h^{2}_{1,2}    ...  h^{\tilde{T}}_{1,2}
1       ...    ...            ...            ...  ...
1       k      h^{1}_{1,k}    h^{2}_{1,k}    ...  h^{\tilde{T}}_{1,k}
...     ...    ...            ...            ...  ...
96      k      h^{1}_{96,k}   h^{2}_{96,k}   ...  h^{\tilde{T}}_{96,k}
... to a classification of the days into clusters
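The conversion from a 2-D raster to a 1-D feature vector can be illustrated with a deliberately simplified descriptor. The sketch below is not the full HOG of the cited procedure (it has no cells, blocks or block normalisation): it builds a single magnitude-weighted histogram of unsigned gradient orientations per raster and concatenates the 96 quarterly histograms of a day. The function names, the choice of 9 bins for k and the toy data are assumptions made for illustration only.

```python
import math

def orientation_histogram(raster, n_bins=9):
    """Simplified HOG-style descriptor: one magnitude-weighted histogram of
    unsigned gradient orientations for the whole raster (no cells or blocks)."""
    n_rows, n_cols = len(raster), len(raster[0])
    hist = [0.0] * n_bins
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            gx = raster[i][j + 1] - raster[i][j - 1]  # horizontal central difference
            gy = raster[i + 1][j] - raster[i - 1][j]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            bin_index = min(int(angle / (180.0 / n_bins)), n_bins - 1)
            hist[bin_index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def day_feature_vector(rasters_by_quarter, n_bins=9):
    """Concatenate the per-quarter histograms of one day into a 1-D vector."""
    features = []
    for raster in rasters_by_quarter:
        features.extend(orientation_histogram(raster, n_bins))
    return features

# Toy data: the 3 x 3 corner shown in the slide, reused for all 96 quarters.
toy_raster = [[93, 124, 77], [217, 55, 94], [24, 77, 109]]
day_vector = day_feature_vector([toy_raster] * 96, n_bins=9)
print(len(day_vector))  # 96 quarters x 9 bins
```

Each day then becomes one column of the feature table, and the columns are what a clustering method such as k-means would group.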
Estimating Density Profiles (iii)
• The output of the procedure is a functional box plot of the daily density profiles of a group of similar days (Febrero et al., 2008; Bouveyron et al., 2015; Sun & Genton, 2011)
• By applying the procedure to the 39 x 39 rectangular grid defined by latitude 45.516°N - 45.564°N and longitude 10.18°E - 10.245°E (Brescia) we find, for example, that most of the weekdays of summer 2016 belong to the same group
• The number of TIM users varies, by month and by quarter of the day, from a minimum of 30 thousand to a maximum of about 55 thousand
Figure: daily density profiles of TIM users (thousands, 30 to 55) by quarter of the day, in three panels: June, July, August
The Market share coefficient
• To estimate the dynamics of the total number of people we have to consider all mobile phone users. These data are often unavailable, or available only at a high cost
• Alternative approach: apply the mobile phone market share coefficient to the number of TIM users
• A country-level estimate is available from the newspaper “Il Sole 24 Ore”: 30.2% (December 2016)
• However, we have reasons to think that the TIM market share varies across cities because of socio-economic and demographic variables
Quantity                               Brescia  Italy
Per-capita revenues (Euro/year) [1]    23,418   19,514
% foreigners [2]                       18.5     8.5
Avg. number of people per family [2]   2.11     2.33
Avg. age [2]                           45.8     44.7
[1] MEF - Dip. delle Finanze (2016)
[2] ISTAT (2017)
• Assuming that residential areas are populated, in the late evening, only by residents, we compare the number of residents with the number of TIM users in selected regions at a specific hour of the day (9pm)
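The two steps described above (estimate the share from a residential area at 9pm, then scale TIM counts up to all people) can be sketched as follows. The function names and all numbers are hypothetical and serve only to show the arithmetic.

```python
def market_share(tim_users_9pm, residents):
    """Ratio of TIM users seen at 9pm to census residents in the same area."""
    if residents <= 0:
        raise ValueError("residents must be positive")
    return tim_users_9pm / residents

def estimate_population(tim_users, share):
    """Scale a TIM-only count up to all people using the market share coefficient."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return tim_users / share

# Hypothetical residential area: 1,200 residents, 360 TIM users at 9pm.
r = market_share(360, 1200)

# Hypothetical count of TIM users observed elsewhere at another time of day.
print(r, round(estimate_population(450, r)))
```

The sketch relies on the same assumption as the slide: at 9pm the people present in a residential area are its residents, so the ratio approximates the local TIM market share.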
Administrative data
• ISTAT (https://www.istat.it/it/archivio/104317) publishes “Basi territoriali e variabili censuarie” in the form of a shapefile with data (a so-called SpatialPolygonsDataFrame in the R language).
• For municipalities with more than 20,000 residents, ISTAT aggregates the territory at the level of “Sezioni di censimento” (SC). The municipality of Brescia has 1,836 SCs.
• The shapefile contains, for each polygon, the number of residents by SC.
Matching Strategy
• We compute the number of TIM users in each SC by matching TIM grid cells with the shapefile of residents (grid cells have regular size, whereas SCs are irregular polygons)
• We apply a weighting scheme based on the portion of the polygon contained in each cell
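The portion of a polygon contained in a cell can be computed by clipping the polygon to the cell and comparing areas. The slides do not say which tool was used; the sketch below is a self-contained stand-in for a GIS library, using Sutherland-Hodgman clipping against an axis-aligned rectangle and the shoelace formula. It assumes planar coordinates and a simple polygon; the triangle and the 2 x 2 grid are toy data.

```python
def polygon_area(vertices):
    """Shoelace formula; vertices are (x, y) pairs in order."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def clip_to_half_plane(vertices, inside, intersect):
    """One Sutherland-Hodgman pass: keep the part of the polygon where inside() holds."""
    clipped = []
    for current, previous in zip(vertices, vertices[-1:] + vertices[:-1]):
        if inside(current):
            if not inside(previous):
                clipped.append(intersect(previous, current))
            clipped.append(current)
        elif inside(previous):
            clipped.append(intersect(previous, current))
    return clipped

def clip_to_cell(vertices, xmin, ymin, xmax, ymax):
    """Clip a polygon to the rectangle [xmin, xmax] x [ymin, ymax]."""
    def cross_x(x):  # intersection of segment p-q with the vertical line at x
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))
    def cross_y(y):  # intersection of segment p-q with the horizontal line at y
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)
    passes = [
        (lambda p: p[0] >= xmin, cross_x(xmin)),
        (lambda p: p[0] <= xmax, cross_x(xmax)),
        (lambda p: p[1] >= ymin, cross_y(ymin)),
        (lambda p: p[1] <= ymax, cross_y(ymax)),
    ]
    for inside, intersect in passes:
        if not vertices:
            break
        vertices = clip_to_half_plane(vertices, inside, intersect)
    return vertices

def fraction_in_cell(polygon, cell):
    """Share of the polygon's area that falls inside the cell."""
    clipped = clip_to_cell(polygon, *cell)
    return polygon_area(clipped) / polygon_area(polygon) if clipped else 0.0

# Toy polygon and a 2 x 2 grid of cells given as (xmin, ymin, xmax, ymax).
triangle = [(0, 0), (4, 0), (0, 4)]
cells = [(0, 0, 2, 2), (2, 0, 4, 2), (0, 2, 2, 4), (2, 2, 4, 4)]
weights = [fraction_in_cell(triangle, cell) for cell in cells]
print(weights)
```

Because the four cells tile the triangle's bounding box, the weights sum to one; these are the percentages that enter the weighted scheme.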
EXAMPLE: SC110 overlaps with 4 cells
October 28th, 2015, 9pm: cell 1: 682 TIM users, cell 2: 555, cell 3: 677, cell 4: 751
8.3% of SC110 lies in cell 1, 27.0% in cell 2, 26.4% in cell 3, 38.2% in cell 4.
\[ \text{TIM users} = (682 \cdot 0.083 + 555 \cdot 0.270 + 677 \cdot 0.264 + 751 \cdot 0.382) \cdot \frac{\text{area(SC)}}{\text{area(cell)}} \]
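The SC110 example can be reproduced numerically. The counts and overlap percentages below are those given in the example; the areas of SC110 and of a grid cell are not given, so the area ratio used in the last line is a hypothetical value chosen only to show the rescaling.

```python
# Figures from the SC110 example (October 28th, 2015, 9pm).
tim_users_per_cell = [682, 555, 677, 751]
share_of_sc_in_cell = [0.083, 0.270, 0.264, 0.382]

# Weighted count: each cell contributes in proportion to the part of SC110 it contains.
weighted_mean = sum(u * w for u, w in zip(tim_users_per_cell, share_of_sc_in_cell))

def tim_users_in_sc(weighted_mean, area_sc, area_cell):
    """Rescale the per-cell weighted count by the SC-to-cell area ratio."""
    return weighted_mean * area_sc / area_cell

print(round(weighted_mean, 3))

# Hypothetical areas: an SC half as large as one grid cell.
print(round(tim_users_in_sc(weighted_mean, area_sc=0.5, area_cell=1.0), 1))
```

The quoted percentages sum to 99.9% because of rounding, so the weighted count is marginally below an exact weighted average.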
Market share estimate (i)
• The median value is consistent with the TIM market share at the country level
• Values in the right tail of the distribution are extremely large
Percentiles of the ratio of TIM users to residents by SC:
min    5th    25th   median  75th   95th   max
0.006  0.070  0.139  0.245   0.547  5.567  347.024
Figures: Ratio Map; Population Map
• In large areas with a small number of residents the ratio is overestimated
• The antenna is often located in large SCs where few people reside. People in residential areas are likely to be tracked where the antenna is (i.e. up to a few hundred meters away)
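The effect of sparsely populated SCs on the ratio can be illustrated with made-up numbers. In the sketch below all (residents, TIM users) pairs are hypothetical; the last one mimics a large SC with very few residents that hosts an antenna.

```python
from statistics import mean, median

# Hypothetical (residents, TIM users at 9pm) pairs for five SCs; the last one
# is a large SC with few residents that hosts an antenna.
sections = [(400, 96), (250, 65), (800, 190), (120, 33), (4, 310)]

ratios = sorted(users / residents for residents, users in sections)

print("median:", round(median(ratios), 3))
print("mean:  ", round(mean(ratios), 3))
print("max:   ", round(max(ratios), 3))
```

A single such SC pulls the mean far above the median, which is why the median is the more informative summary of the ratio distribution here.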
Market share estimate (ii)
• Alternative strategy: estimate the ratio for selected residential areas plus a portion of their neighbourhood, so as to include the antenna
• Residential areas are chosen according to the DUSAF (Destinazione d’Uso dei Suoli Agricoli e Forestali) map
https://www.dati.lombardia.it/Territorio/Dusaf-5-0-Uso-del-suolo-2015/iq6r-u7y2
Figure: Villaggio Sereno residential area: 0.265
Figure: San Polo residential area: 0.310
Further developments
• New density curves estimated via the market share coefficient
• Population on an average working day (11:00am, peak):
  r = 0.30, ∼ 217,000
  r = 0.25, ∼ 254,000
• Estimates at macro (Municipality) and micro (Sezioni di Censimento) level, and by land usage
Conclusions
• Mobile phone data can be used to estimate the number of people in the city and its dynamics over time, unlike administrative archives, which report static numbers
• Generally, data are available for only a portion of phone users
• We have developed a method to estimate the market share at the municipality level by using administrative data on the number of residents by “Sezione di censimento”
• The method combines private mobile phone data with two kinds of freely available administrative data (census data and land usage information)
• Results are consistent with the market share at the national level and can be used to infer the dynamics of presence at the municipality level
Acknowledgements
Data Methods and Systems Statistical Laboratory (DMS StatLab)
The authors are grateful to the Statistical Office of the Municipality of Brescia, with a special mention to
• Dr. Marco Palamenghi
• Dr. Paola Chiesa
• Dr. Maria Elena Comune
who kindly supported us and provided the data.
References
1 Bouveyron, C., Come, E., & Jacques, J. (2015). The discriminative functional mixture model for a comparative analysis of bike sharing systems. The Annals of Applied Statistics, 9(4), 1726-1760.
2 Carpita, M., & Simonetto, A. (2014). Big Data to Monitor Big Social Events: Analysing the mobile phone signals in the Brescia Smart City. Electronic Journal of Applied Statistical Analysis: Decision Support Systems and Services Evaluation, 5(1), 31-41.
3 Febrero, M., Galeano, P., & Gonzalez-Manteiga, W. (2008). Outlier detection in functional data by depth measures, with application to identify abnormal NOx levels. Environmetrics: The official journal of the International Environmetrics Society, 19(4), 331-345.
4 Manfredini, F., Pucci, P., Secchi, P., Tagliolato, P., Vantini, S., & Vitelli, V. (2015). Treelet decomposition of mobile phone data for deriving city usage and mobility pattern in the Milan urban region. In Advances in complex data modeling and computational methods in statistics (pp. 133-147). Springer, Cham.
5 Metulini, R., & Carpita, M. (2019). Human activity spatio-temporal indicators using mobile phone data. Data Science & Social Research 2019 Book of Abstracts, 89.
6 Secchi, P., Vantini, S., & Zanini, P. (2017). Analysis of Mobile Phone Data for Deriving City Mobility Patterns. In Electric Vehicle Sharing Services for Smarter Cities (pp. 37-58). Springer, Cham.
7 Sun, Y., & Genton, M. G. (2011). Functional boxplots. Journal of Computational and Graphical Statistics, 20(2), 316-334.
8 Tomasi, C. (2012). Histograms of oriented gradients. Computer Vision Sampler, 1-6.
9 Zanini, P., Shen, H., & Truong, Y. (2016). Understanding resident mobility in Milan through independent component analysis of Telecom Italia mobile usage data. The Annals of Applied Statistics, 10(2), 812-833.
Supplemental
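Figure: land-use map. Legend classes:
• Agricolo (agricultural)
• Aree di interesse sovracomunale (areas of supra-municipal interest)
• Boschi (woodland)
• Extraurbano non classificato (extra-urban, unclassified)
• Ferrovie (railways)
• Parcheggio (parking)
• Polifunzionale (multi-purpose)
• Produttivo (productive)
• Residenziale (residential)
• Servizio comunale (municipal services)
• Strade (roads)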
Figure: Ratio - Wednesday, October 28th, 2015, 9pm
Figure: Population - January 1st, 2016