Big Data Analytics for smart cities
Department of Economics and Management & Big & Open Data Innovation Laboratory (BODaI Lab),
Università degli Studi di Brescia
Rodolfo Metulini
rodolfo.metulini@unibs.it
Brescia, 23 May 2018
Databases and Linkages
1. Types of data and sources
- .shp → spatial objects (points, polygons, …)
- .txt → data objects
- Manipulating the two: over, coerce, coordinates (sp package, R)
2. Data characteristics
- Point patterns (regular grid, irregular)
- Areal data → aggregation, clustering, M.A.U.P.
- Origin-Destination (OD)
3. Data matching
- Semantic issues
- Linkages (deterministic, probabilistic)
Janus statue in the Vatican Museums (Rome)
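The slide lists `over` from R's sp package as the way to combine point objects with polygon objects. The underlying idea, assigning each point to the polygon that contains it, can be sketched in plain Python with a ray-casting test. This is only an illustration of the operation, not the sp implementation; the polygons, point ids and function names below are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: toggle 'inside' each time a rightward ray crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def overlay(points: Dict[str, Point],
            polygons: Dict[str, Polygon]) -> Dict[str, Optional[str]]:
    """Give each point the id of the first polygon containing it, or None."""
    return {
        pid: next((gid for gid, poly in polygons.items()
                   if point_in_polygon(pt, poly)), None)
        for pid, pt in points.items()
    }


# Hypothetical data: two adjacent unit squares and three points.
polygons = {
    "area_A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "area_B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
points = {"p1": (0.5, 0.5), "p2": (1.5, 0.25), "p3": (3.0, 3.0)}
matches = overlay(points, polygons)
print(matches)
```

Points that fall in no polygon come back as `None`, which is the case a deterministic overlay cannot resolve on its own.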
Indicators for Geo-referenced data
1. Define who is the neighbour of whom
2. Assign a weight to each link (weights matrix Wnn)
3. Find an index to evaluate (spatial auto-)correlation → Geary's C, Moran's I
Contiguity criteria (figure panels): A) Queen B) Rook C) Rook + Queen
Data types (figure panels): A) Point pattern B) Areal data C) OD flows
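The three steps above (neighbours, weights, index) can be sketched for a regular grid as follows. This is a minimal illustration, assuming rook or queen contiguity, row-standardised weights and Moran's I as the index; the grid values and function names are hypothetical and the slides do not prescribe this particular implementation.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Weights = Dict[Cell, Dict[Cell, float]]


def neighbours(cells: List[Cell], rule: str = "queen") -> Dict[Cell, List[Cell]]:
    """Step 1: contiguity neighbours on a regular grid (rook = shared edge)."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if rule == "queen":  # queen also counts shared corners
        steps = steps + [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    present = set(cells)
    return {
        (r, c): [(r + dr, c + dc) for dr, dc in steps if (r + dr, c + dc) in present]
        for r, c in cells
    }


def row_standardised_weights(nb: Dict[Cell, List[Cell]]) -> Weights:
    """Step 2: every link of cell i gets weight 1 / (number of neighbours of i)."""
    return {i: {j: 1.0 / len(js) for j in js} for i, js in nb.items() if js}


def morans_i(values: Dict[Cell, float], weights: Weights) -> float:
    """Step 3: Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values.values()) / n
    dev = {i: v - mean for i, v in values.items()}
    s0 = sum(w for row in weights.values() for w in row.values())
    num = sum(w * dev[i] * dev[j] for i, row in weights.items() for j, w in row.items())
    den = sum(d * d for d in dev.values())
    return (n / s0) * num / den


# Hypothetical 4 x 4 grid with two contrasting patterns.
grid = [(r, c) for r in range(4) for c in range(4)]
checker = {(r, c): float((r + c) % 2) for r, c in grid}       # alternating values
clustered = {(r, c): 1.0 if c < 2 else 0.0 for r, c in grid}  # left half vs right half

w_rook = row_standardised_weights(neighbours(grid, "rook"))
i_checker = morans_i(checker, w_rook)
i_cluster = morans_i(clustered, w_rook)
print(i_checker, i_cluster)
```

Under rook contiguity the checkerboard gives a value close to -1 (every neighbour has the opposite value), while the clustered pattern gives a positive value. Switching `rule` to `"queen"` changes the neighbour sets and therefore the index, which is why step 1 matters.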
Applications
1. Gravity Models for Human Mobility
2. Probabilistic Record Linkages for Energy Efficiency
1. Gravity models for human mobility
Trip purposes: work, study, occasional, business, return home
Transport modes:
- Car (driver)
- Car (passenger)
- Road public transport (coach, trolleybus, urban, interurban, company or school bus)
- Rail public transport (train, tram, underground)
- Motorcycle
- Bicycle
- On foot
- Other
1. Number of passengers
2. Travel duration
3. Customer satisfaction
Source: Open Data Regione Lombardia
For each reason k (e.g. by car, to work), for each directed dyad od (e.g. from Lumezzane to Brescia), at time t (e.g. Monday 11 Sept., 9:01-10:00)
From spatial interaction to matrix representation
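A minimal sketch of the step from interaction records to a matrix, assuming the flows arrive as (origin, destination, passengers) triples. The records and the function name are hypothetical; only Lumezzane and Brescia appear in the slides, and the third municipality is added purely for illustration.

```python
from typing import List, Tuple

Trip = Tuple[str, str, int]

# Hypothetical trip records: (origin, destination, passengers).
trips: List[Trip] = [
    ("Lumezzane", "Brescia", 120),
    ("Brescia", "Lumezzane", 80),
    ("Lumezzane", "Brescia", 30),
    ("Brescia", "Desenzano", 50),
]


def od_matrix(records: List[Trip]) -> Tuple[List[str], List[List[int]]]:
    """Aggregate directed flows into a square origin-destination matrix."""
    zones = sorted({zone for o, d, _ in records for zone in (o, d)})
    index = {zone: k for k, zone in enumerate(zones)}
    matrix = [[0] * len(zones) for _ in zones]
    for origin, destination, passengers in records:
        # Rows are origins, columns are destinations.
        matrix[index[origin]][index[destination]] += passengers
    return zones, matrix


zones, matrix = od_matrix(trips)
outflows = [sum(row) for row in matrix]        # total leaving each zone
inflows = [sum(col) for col in zip(*matrix)]   # total arriving in each zone
print(zones)
print(matrix)
```

The matrix is directed, so the Lumezzane to Brescia cell and the Brescia to Lumezzane cell are kept separate. One such matrix would be built for each reason k and time slot t.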
Gravity model
- Following Newton's law, the force between two masses depends on their masses (phone cells, inhabitants*) and, inversely, on their distance (ticket cost, time, direct connection or not, distance).
* Linkage with population data available
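A common way to write this analogy is T_od = k * M_o^alpha * M_d^beta / d_od^gamma, where T_od is the flow from o to d. The sketch below assumes that functional form with alpha = beta = 1 and estimates k and gamma by ordinary least squares on logs. The parameter names, the simplifying assumption on alpha and beta, and the synthetic data are illustrative and are not taken from the slides.

```python
import math
from typing import List, Tuple

# (mass at origin, mass at destination, distance, observed flow)
Observation = Tuple[float, float, float, float]


def gravity_flow(mass_o: float, mass_d: float, distance: float,
                 k: float = 1.0, gamma: float = 2.0) -> float:
    """Predicted flow under the assumed form k * M_o * M_d / d**gamma."""
    return k * mass_o * mass_d / distance ** gamma


def fit_gravity(observations: List[Observation]) -> Tuple[float, float]:
    """Estimate (k, gamma) from log(T / (M_o * M_d)) = log(k) - gamma * log(d)."""
    xs = [math.log(d) for _, _, d, _ in observations]
    ys = [math.log(flow / (mo * md)) for mo, md, _, flow in observations]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free data generated with k = 0.5 and gamma = 1.5.
pairs = [(1000.0, 5000.0, 10.0), (2000.0, 5000.0, 25.0),
         (800.0, 1200.0, 5.0), (3000.0, 900.0, 40.0)]
data = [(mo, md, d, gravity_flow(mo, md, d, k=0.5, gamma=1.5)) for mo, md, d in pairs]

k_hat, gamma_hat = fit_gravity(data)
print(k_hat, gamma_hat)
```

With real flows the masses could be phone-cell counts or inhabitants, and the distance could be replaced by cost or travel time, as the slide suggests.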
Masses: phone cells
1. Provided by TIM
2. Estimate the number of people in a specific area (regular grid) in a specific time interval.
Figure 2. From Carpita, Simonetto (2014, EJASA)
PROS:
More detailed (disaggregated level, different times) than population data
CONS:
They do not cover all phone companies
Distance decay in human mobility
Distance decay is a geographical term that describes the effect of distance on cultural or spatial interaction. The interaction between two locales declines as the distance between them increases.
But… in a globalized world, geographical distance is assumed to tend to zero.
What really matters (in human mobility)?
- Costs (ticket, fuel, motorway tolls)
- Infrastructure (km of road, minutes of road travel)
- Many others ...
Multidimensional scaling (MDS) on the distances between municipalities, measured in minutes of road travel (author's elaboration)
2. Record Linkages for Energy Efficiency
ITALY, ISTAT data, 2015:
- 24.1% of the population reports structural housing problems (water infiltration, damp from ceilings or window frames)
- About 9.6% reports difficult housing conditions
These figures affect energy efficiency: higher costs and more CO2 pollution
OBJECTIVES:
Frame the current situation, in order to:
- Improve health and well-being, reduce poverty, raise incomes
- Lower gas emissions, lower tariffs, preserve natural resources
Problem
A.P.E. (Attestato di prestazione energetica, the energy performance certificate): small sample of dwellings
- Building efficiency class
- CO2 emissions
- Consumption by time interval
Merge with census sections to recover dimensionality
PROBLEM: different coordinate systems (POINT TO POLYGON)
NEED FOR PROBABILISTIC RECORD LINKAGE
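The slides do not say which probabilistic linkage method is intended. One widely used formulation is the Fellegi-Sunter model, in which each compared field contributes a log-likelihood-ratio weight and the summed weight decides whether a pair of records is linked. The sketch below assumes that model; the comparison fields, the m- and u-probabilities and the thresholds are all hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical comparison fields with assumed probabilities:
# m = P(field agrees | records truly match), u = P(field agrees | no match).
FIELDS: Dict[str, Tuple[float, float]] = {
    "municipality": (0.90, 0.05),
    "street": (0.85, 0.10),
    "near_section": (0.95, 0.01),  # point lies within a tolerance of the section polygon
}


def match_weight(agreement: Dict[str, bool]) -> float:
    """Sum of Fellegi-Sunter weights (base-2 logs) over the compared fields."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if agreement[field]:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def classify(weight: float, upper: float = 5.0, lower: float = 0.0) -> str:
    """Three-way decision with assumed thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


# Hypothetical certificate/section pairs described by their field agreements.
all_agree = {"municipality": True, "street": True, "near_section": True}
partial = {"municipality": True, "street": True, "near_section": False}
none_agree = {"municipality": False, "street": False, "near_section": False}

for label, pair in [("all", all_agree), ("partial", partial), ("none", none_agree)]:
    w = match_weight(pair)
    print(label, round(w, 2), classify(w))
```

Pairs in the "possible link" band are the ones that would need manual review or additional comparison fields. In practice the m- and u-probabilities are estimated from the data rather than fixed by hand.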
Expected outputs
Paper 1.
In the context of Geographical Analysis and Urban Modelling, develop an ad hoc probabilistic record linkage methodology (R code) that the relevant parties can use to study energy efficiency.
Paper 2.
In the context of Environmental Economics, analyse the determinants that explain the spatial variability of consumption and emissions.
