Twitter-based Sensing
of City-level Air Quality
Polychronis Charitidis, Eleftherios Spyromitros-Xioufis,
Symeon Papadopoulos, Yiannis Kompatsiaris
IVMSP 2018, June 6, Aristi, Greece
source: http://www.worldbank.org
Primary air pollutants
PM and ways of measuring
• Particulate matter: air-suspended mixture of both
solid and liquid particles
• PM10: particles smaller than 10μm
• PM2.5: particles smaller than 2.5μm
• Ways of measuring PM
• Certified reference instruments
• Certified equivalent instruments
• Certified indicative instruments
• Indicative instruments
source: https://www.aeroqual.com/particulate-matters-why-monitor-pm10-and-pm2-5
[Figure: instrument classes compared by accuracy and cost]
Sensing AQ using Twitter
[Diagram labels: air pollution, affected city, affected citizens, social media discussions, monitoring and mining, AQ-related tweets, prediction]
Related work
• Related research has focused on using Sina Weibo as a
“sensor” of air quality in China, in a standard
model-building setting
• Our setting is different and more challenging
• Much lower volatility/variability in AQ values → fewer relevant
signals in social media
• Smaller population compared to China → fewer posts in social media
• We differentiate between monitored & unmonitored cities and
adopt a transfer learning formulation
Mei et al. “Inferring air pollution by sniffing social media.” ASONAM 2014
Jiang et al. “Using social media to detect outdoor air pollution ...” PLOS One 2015
Wang et al. “Social media as a sensor of air …”. Journal of Medical Internet Research, 2015
Tao et al. “Inferring atmospheric particulate matter concentrations ….”. PLOS One 2016
Problem formulation
• $C_M$ / $C_U$: sets of monitored and unmonitored cities
• For each $c_j \in C_M$:
• training samples $D_{c_j} = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)\}$
• $\mathbf{x}_i \in \mathbb{R}^d$: d-dimensional vector summarizing tweets in city $c_j$
during the i-th temporal bin; $y_i \in \mathbb{R}$: average PM2.5
concentration during bin i
• For each $c_q \in C_U$:
• build model $h_{c_q}: X \to Y$ (only $\mathbf{x}_i$ available)
Transfer learning
• Data pooling approach
• Train regression model h on $D = \bigcup_{c_j \in C_M} D_{c_j}$
• Simultaneously minimize prediction error on all
monitored cities
• Feature selection: keep top k features ranked by
their Pearson correlation with Y
• Variant: weighted data pooling where each training
example weighted by inverse distance between its
city and target city
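The pooling, feature selection and weighting steps above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names, the `city_data` layout and the `distance_to_target` dictionary are hypothetical, and ranking features by the absolute value of the correlation is an assumption (the slide only says "ranked by their Pearson correlation").

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def top_k_features(X, y, k):
    """Indices of the k features most correlated with y.

    Assumption: features are ranked by |r|, so strongly negative
    correlations also count as informative.
    """
    d = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(d)]
    return sorted(range(d), key=lambda j: scores[j], reverse=True)[:k]

def pool(city_data, distance_to_target=None):
    """Concatenate the samples of all monitored cities.

    city_data: {city: (X_c, y_c)} (hypothetical layout).
    distance_to_target: optional {city: distance}; if given, every sample
    of a city gets weight 1 / distance (the weighted pooling variant).
    """
    X, y, w = [], [], []
    for city, (X_c, y_c) in city_data.items():
        weight = 1.0 if distance_to_target is None else 1.0 / distance_to_target[city]
        X.extend(X_c)
        y.extend(y_c)
        w.extend([weight] * len(y_c))
    return X, y, w
```

The returned weights could then be passed as per-sample weights to whichever regressor is trained on the pooled data.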
Data
• Track 120 English air quality-related keywords
air pollution, aqi, emission, smog, haze, cough, wheeze, …
• Infer the location of each tweet using a geotagging
method (Kordopatis-Zilos et al., 2017):
• Tweet text, if geotagging confidence > 0.8
• Twitter account’s location field otherwise
• Air quality data: OpenAQ API
• Hourly measurements
• Average measurements from different stations in the
same city
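The averaging of hourly station measurements into one value per city and temporal bin might look like the sketch below. The `(city, station, timestamp, pm25)` record layout is hypothetical and is not the OpenAQ response format; aligning bins to midnight is also an assumption.

```python
from collections import defaultdict
from datetime import datetime

def city_bin_averages(measurements, bin_hours=6):
    """Return {(city, bin_start): mean PM2.5}.

    measurements: iterable of (city, station, timestamp, pm25) tuples
    (hypothetical layout). bin_hours is expected to divide 24.
    """
    # Step 1: average over all stations of a city for each hour.
    hourly = defaultdict(list)
    for city, _station, ts, value in measurements:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        hourly[(city, hour)].append(value)

    # Step 2: average the hourly city values inside each temporal bin.
    binned = defaultdict(list)
    for (city, hour), values in hourly.items():
        bin_start = hour.replace(hour=(hour.hour // bin_hours) * bin_hours)
        binned[(city, bin_start)].append(sum(values) / len(values))

    return {key: sum(v) / len(v) for key, v in binned.items()}
```

Averaging stations first and hours second keeps a city with many stations from dominating a bin in which some hours have fewer reporting stations.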
Kordopatis-Zilos et al. “Geotagging text content with language models and ...” PIEEE 2017
Feature extraction
• Bag-of-words:
• tokenization, lowercasing and stop word removal
• vocabulary W = {w1,…,wn} of n=10K most frequent words
in a random sample of 1M of the collected tweets
• x = [x1, … , xn] represents all tweets in city c at time
interval t, xi denotes number of tweets containing wi
divided by total number of tweets in (c,t)
• Two variants:
• “current”: only tweets from current temporal bin
• “lagged”: include tweets from previous bins
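A minimal sketch of the bag-of-words features described above. The tokenizer, the tiny stop-word list and all function names are illustrative assumptions. For the "lagged" variant the sketch pools tweets from the current and preceding bins into one bag; the slide does not say whether tweets are pooled or per-bin vectors are concatenated.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"the", "a", "is", "in", "of", "and", "to"}

def tokenize(text):
    """Lowercase, split into tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9#@']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_vocabulary(tweets, n):
    """The n most frequent words in a sample of tweets."""
    counts = Counter(tok for tweet in tweets for tok in tokenize(tweet))
    return [word for word, _ in counts.most_common(n)]

def bow_vector(tweets, vocabulary):
    """x_i = (number of tweets containing w_i) / (number of tweets in the bin)."""
    if not tweets:
        return [0.0] * len(vocabulary)
    token_sets = [set(tokenize(t)) for t in tweets]
    return [sum(w in s for s in token_sets) / len(tweets) for w in vocabulary]

def lagged_tweets(bins, t, lags=1):
    """Pool the tweets of bin t with those of up to `lags` preceding bins."""
    start = max(0, t - lags)
    return [tweet for b in bins[start:t + 1] for tweet in b]
```

For example, `bow_vector(lagged_tweets(bins, t, lags=2), vocabulary)` would give a feature vector in the spirit of the "lagged" variant with two previous bins.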
Experiments
Setup
• Five cities in UK and five in US
• Period: Feb 8, 2017 to Jan 18, 2018
• Each city in turn considered test city (i.e. no access to
its ground truth air quality)
• Three temporal granularities: 6h, 12h, 24h (ground
truth = average of hourly measurements)
• Root Mean Squared Error (RMSE)
• Macro-averaging for country-wise/overall performance
(αRMSE)
• Gradient Tree Boosting for regression
• scikit-learn: learning rate = 0.01, nr. estimators = 200
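The evaluation protocol above can be sketched as follows. Reading αRMSE as the plain mean of per-city RMSE values is an assumption based on the "macro-averaging" bullet; the function names are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def macro_rmse(per_city):
    """Mean of the per-city RMSE values.

    per_city: {city: (y_true, y_pred)}. Every city counts equally,
    regardless of how many samples it contributes.
    """
    scores = [rmse(t, p) for t, p in per_city.values()]
    return sum(scores) / len(scores)

def leave_one_city_out(cities):
    """Yield (test_city, training_cities), with each city held out in turn."""
    for test_city in cities:
        yield test_city, [c for c in cities if c != test_city]
```

With scikit-learn installed, the regressor settings listed on the slide would presumably correspond to `GradientBoostingRegressor(learning_rate=0.01, n_estimators=200)`.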
UK cities
CITY #TWEETS/DAY
London 3972
Birmingham 198
Leeds 112
Liverpool 108
Manchester 321
US cities
CITY #TWEETS/DAY
New York 2564
Philadelphia 478
Boston 574
Baltimore 394
Pittsburgh 169
Baseline performance analysis
• IDW: Inverse Distance Weighting (spatial interpol.)
• High correlation between close-by cities
• mean: always predict mean PM2.5 value per city
• Small variability and mostly low PM2.5 values
αRMSE (lower is better)
        UK                  US                  Overall
        6h    12h   24h     6h    12h   24h     6h    12h   24h
IDW     3.79  3.34  3.09    4.12  3.73  3.41    3.96  3.54  3.25
mean    7.00  6.64  6.36    4.60  4.26  4.02    5.80  5.46  5.19
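The IDW baseline can be illustrated with a short sketch. The great-circle distance, the `power` exponent (default 1) and the `(coords, value)` input layout are assumptions; the slides do not give the exact weighting used.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def idw_estimate(target, neighbours, power=1.0):
    """Inverse Distance Weighting estimate of PM2.5 at `target`.

    neighbours: list of ((lat, lon), pm25) for monitored cities
    (hypothetical layout). Each value is weighted by 1 / distance**power,
    so closer cities contribute more. Assumes no neighbour coincides
    with the target (distance > 0).
    """
    num = den = 0.0
    for coords, value in neighbours:
        weight = 1.0 / haversine_km(target, coords) ** power
        num += weight * value
        den += weight
    return num / den
```

Because the estimate is a weighted mean of neighbouring measurements, it always lies between the smallest and largest neighbouring value.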
Within-city models
• #tw: total number of tweets in spatiotemporal bin
• #aqs: number of tweets related to air quality
• #high: number of tweets related to high air pollution
• all: concatenation of #tw, #aqs, #high
• BoW/BoW-1/BoW-2: BoW and lagged versions
αRMSE (lower is better)
        #tw   #aqs  #high  all   BoW   BoW-1  BoW-2
6h      5.96  5.93  5.98   5.84  5.15  4.99   4.97
12h     6.17  5.98  6.02   5.77  4.96  4.84   5.16
24h     5.83  6.11  5.82   5.52  4.65  4.96   5.16
Ground truth vs features
[Figure: PM2.5 in London; x-axis: date, y-axis: PM2.5 (μg/m3)]
Cross-city models
• full: full dimensional BoW (or lagged)
• k=N: top-k features are selected
• w=0/1: without/with sample weighting
αRMSE (lower is better)
             full  k=10  k=20  k=50  k=100  k=200  k=500
w=0   6h     5.36  5.48  5.28  5.21  5.24   5.29   5.31
      12h    5.21  5.29  5.18  5.12  5.09   5.11   5.15
      24h    4.97  4.89  4.78  4.78  4.75   4.79   4.86
w=1   6h     5.35  5.47  5.27  5.21  5.24   5.29   5.30
      12h    5.21  5.26  5.18  5.11  5.08   5.11   5.16
      24h    4.95  4.85  4.77  4.76  4.73   4.77   4.84
Fusion
• Simple fusion of two inputs
• IDW estimate
• Twitter-based estimate
• Overall, accuracy still slightly lower
than IDW alone (higher αRMSE)
• Better for three cities: Boston, London, Pittsburgh
(i.e. cities that are far from the rest)
Overall αRMSE (lower is better)
         6h    12h   24h
IDW      3.96  3.54  3.25
mean     5.80  5.46  5.19
fusion   4.15  4.00  3.63
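The slides do not specify how the two inputs are fused. Purely as an illustration, the sketch below assumes a convex combination of the IDW estimate and the Twitter-based estimate; the function name and the `alpha` parameter are hypothetical, and the authors may have used a different scheme.

```python
def fuse(idw_value, twitter_value, alpha=0.5):
    """Convex combination of two PM2.5 estimates.

    Assumption: alpha is the weight of the IDW estimate and
    (1 - alpha) the weight of the Twitter-based estimate.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * idw_value + (1.0 - alpha) * twitter_value
```

Under this assumption, a city far from all monitored cities might use a smaller `alpha`, since IDW is less reliable there.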
Summary & outlook
• Features extracted from Twitter can offer useful
signals that can contribute to coarse air quality
estimations
• Combined with actual air quality measurements
from nearby locations, Twitter-based estimations
can lead to improved results
• Still room for further improvements:
• Better tweet classification, feature extraction, modelling
• Use of additional modalities (sky images)
Thank you!
Symeon Papadopoulos
papadop@iti.gr / @sympap
code: https://github.com/MKLab-ITI/twitter-aq
Top selected features
feature        correlation
measured       0.678
moderate       0.666
particles      0.666
temperature    0.661
wind           0.655
humidity       0.654
pm10           0.591
weather        0.574
haze           0.574
pollutants     0.560
currently      0.557
spam           0.551
forecast       0.529
tube           0.527
bonfire        0.523
polluted       0.515
air            0.515
temperatures   0.514
begun          0.508
exceeding      0.507
PM2.5 correlations of city pairs
Example tweets (aqs-high)
Tweets classified as aqs AND high
RT @PlumeInLondon: High pollution (50) at 10PM. High for #London. Avoid
physical activities if sensitive https://t.co/3LVRgps965
London's air pollution is killing me. Coughs now sound like squeaky chew toy.
#sendhelp #sendventolin
RT @cargill_taxi: And the mayor of London tries to blame poor air quality on toxic
air from German factories.
@claireL23 The traffic, poor air quality, the light pollution, the lack of green space,
the concrete jungle, the building work. Need I go on?
RT @SkyNews: THE GUARDIAN FRONT PAGE: "Toxic air risk to one in four London
schools" #skypapers https://t.co/2c6ANlujep
RT @MayorofLondon: London’s toxic air is a public health emergency. Here is what
I’m doing about it https://t.co/YHw2CVepPI

More Related Content

What's hot

Air quality challenges and business opportunities in China: Fusion of environ...
Air quality challenges and business opportunities in China: Fusion of environ...Air quality challenges and business opportunities in China: Fusion of environ...
Air quality challenges and business opportunities in China: Fusion of environ...CLIC Innovation Ltd
 
China testbed FMI-Enfuser in Langfang by Adj. Prof. Ari Karppinen
China testbed FMI-Enfuser in Langfang by Adj. Prof. Ari KarppinenChina testbed FMI-Enfuser in Langfang by Adj. Prof. Ari Karppinen
China testbed FMI-Enfuser in Langfang by Adj. Prof. Ari KarppinenCLEEN_Ltd
 
Breathe London - Hyperlocal Air Quality Monitoring Network - Jim Mills
Breathe London - Hyperlocal Air Quality Monitoring Network - Jim MillsBreathe London - Hyperlocal Air Quality Monitoring Network - Jim Mills
Breathe London - Hyperlocal Air Quality Monitoring Network - Jim MillsIES / IAQM
 
Aerosol optical depth
Aerosol optical depthAerosol optical depth
Aerosol optical depthHardik Gajjar
 
Cosmic rays and clouds: using open science to clear the confusion
Cosmic rays and clouds: using open science to clear the confusionCosmic rays and clouds: using open science to clear the confusion
Cosmic rays and clouds: using open science to clear the confusionBenjamin Laken
 
Calibration of Environmental Sensor Data Using a Linear Regression Technique
Calibration of Environmental Sensor Data Using a Linear Regression TechniqueCalibration of Environmental Sensor Data Using a Linear Regression Technique
Calibration of Environmental Sensor Data Using a Linear Regression Techniqueijtsrd
 
Use of the German Weather Services KLAM Model to Investigate the Cold Air Dra...
Use of the German Weather Services KLAM Model to Investigate the Cold Air Dra...Use of the German Weather Services KLAM Model to Investigate the Cold Air Dra...
Use of the German Weather Services KLAM Model to Investigate the Cold Air Dra...IES / IAQM
 
Prognostic Meteorological Models and Their Use in Dispersion Modelling
Prognostic Meteorological Models and Their Use in Dispersion ModellingPrognostic Meteorological Models and Their Use in Dispersion Modelling
Prognostic Meteorological Models and Their Use in Dispersion ModellingIES / IAQM
 
Air pollution monitoring system using mobile gprs sensors array ppt
Air pollution monitoring system using mobile gprs sensors array pptAir pollution monitoring system using mobile gprs sensors array ppt
Air pollution monitoring system using mobile gprs sensors array pptSaurabh Giratkar
 
3D Analyst - Watershed and Stream Network
3D Analyst - Watershed and Stream Network3D Analyst - Watershed and Stream Network
3D Analyst - Watershed and Stream NetworkHartanto Sanjaya
 
Shair: Adding the missing dimensions to modelling
Shair: Adding the missing dimensions to modellingShair: Adding the missing dimensions to modelling
Shair: Adding the missing dimensions to modellingIES / IAQM
 
An Analytical Survey on Prediction of Air Quality Index
An Analytical Survey on Prediction of Air Quality IndexAn Analytical Survey on Prediction of Air Quality Index
An Analytical Survey on Prediction of Air Quality Indexijtsrd
 
K venkata reddy
K venkata reddyK venkata reddy
K venkata reddyClimDev15
 
New generation of high sensitivity airborne potassium magnetometers
New generation of high sensitivity airborne potassium magnetometersNew generation of high sensitivity airborne potassium magnetometers
New generation of high sensitivity airborne potassium magnetometersGem Systems
 
Watershed Delineation of Kolhapur District Maharashtra, India
Watershed Delineation of Kolhapur District Maharashtra, IndiaWatershed Delineation of Kolhapur District Maharashtra, India
Watershed Delineation of Kolhapur District Maharashtra, Indiaabhijeetbmore
 
AERMOD CHANGES AND UPDATES
AERMOD CHANGES AND UPDATESAERMOD CHANGES AND UPDATES
AERMOD CHANGES AND UPDATESSergio A. Guerra
 

What's hot (20)

Kcw rb
Kcw rbKcw rb
Kcw rb
 
Air quality challenges and business opportunities in China: Fusion of environ...
Air quality challenges and business opportunities in China: Fusion of environ...Air quality challenges and business opportunities in China: Fusion of environ...
Air quality challenges and business opportunities in China: Fusion of environ...
 
China testbed FMI-Enfuser in Langfang by Adj. Prof. Ari Karppinen
China testbed FMI-Enfuser in Langfang by Adj. Prof. Ari KarppinenChina testbed FMI-Enfuser in Langfang by Adj. Prof. Ari Karppinen
China testbed FMI-Enfuser in Langfang by Adj. Prof. Ari Karppinen
 
Breathe London - Hyperlocal Air Quality Monitoring Network - Jim Mills
Breathe London - Hyperlocal Air Quality Monitoring Network - Jim MillsBreathe London - Hyperlocal Air Quality Monitoring Network - Jim Mills
Breathe London - Hyperlocal Air Quality Monitoring Network - Jim Mills
 
Aerosol optical depth
Aerosol optical depthAerosol optical depth
Aerosol optical depth
 
Cosmic rays and clouds: using open science to clear the confusion
Cosmic rays and clouds: using open science to clear the confusionCosmic rays and clouds: using open science to clear the confusion
Cosmic rays and clouds: using open science to clear the confusion
 
Calibration of Environmental Sensor Data Using a Linear Regression Technique
Calibration of Environmental Sensor Data Using a Linear Regression TechniqueCalibration of Environmental Sensor Data Using a Linear Regression Technique
Calibration of Environmental Sensor Data Using a Linear Regression Technique
 
Use of the German Weather Services KLAM Model to Investigate the Cold Air Dra...
Use of the German Weather Services KLAM Model to Investigate the Cold Air Dra...Use of the German Weather Services KLAM Model to Investigate the Cold Air Dra...
Use of the German Weather Services KLAM Model to Investigate the Cold Air Dra...
 
Prognostic Meteorological Models and Their Use in Dispersion Modelling
Prognostic Meteorological Models and Their Use in Dispersion ModellingPrognostic Meteorological Models and Their Use in Dispersion Modelling
Prognostic Meteorological Models and Their Use in Dispersion Modelling
 
Air pollution monitoring system using mobile gprs sensors array ppt
Air pollution monitoring system using mobile gprs sensors array pptAir pollution monitoring system using mobile gprs sensors array ppt
Air pollution monitoring system using mobile gprs sensors array ppt
 
Flood hazard mapping four provinces of cambodia
Flood hazard mapping four provinces of cambodiaFlood hazard mapping four provinces of cambodia
Flood hazard mapping four provinces of cambodia
 
AIR QUALITY ANALYZER USING DRONE
AIR QUALITY ANALYZER USING DRONEAIR QUALITY ANALYZER USING DRONE
AIR QUALITY ANALYZER USING DRONE
 
3D Analyst - Watershed and Stream Network
3D Analyst - Watershed and Stream Network3D Analyst - Watershed and Stream Network
3D Analyst - Watershed and Stream Network
 
Shair: Adding the missing dimensions to modelling
Shair: Adding the missing dimensions to modellingShair: Adding the missing dimensions to modelling
Shair: Adding the missing dimensions to modelling
 
An Analytical Survey on Prediction of Air Quality Index
An Analytical Survey on Prediction of Air Quality IndexAn Analytical Survey on Prediction of Air Quality Index
An Analytical Survey on Prediction of Air Quality Index
 
K venkata reddy
K venkata reddyK venkata reddy
K venkata reddy
 
Ampio Pollution Control Drone
Ampio Pollution Control DroneAmpio Pollution Control Drone
Ampio Pollution Control Drone
 
New generation of high sensitivity airborne potassium magnetometers
New generation of high sensitivity airborne potassium magnetometersNew generation of high sensitivity airborne potassium magnetometers
New generation of high sensitivity airborne potassium magnetometers
 
Watershed Delineation of Kolhapur District Maharashtra, India
Watershed Delineation of Kolhapur District Maharashtra, IndiaWatershed Delineation of Kolhapur District Maharashtra, India
Watershed Delineation of Kolhapur District Maharashtra, India
 
AERMOD CHANGES AND UPDATES
AERMOD CHANGES AND UPDATESAERMOD CHANGES AND UPDATES
AERMOD CHANGES AND UPDATES
 

Similar to Twitter-based Sensing of City-level Air Quality

Tlad 2015 presentation amin+charles-final
Tlad 2015 presentation   amin+charles-finalTlad 2015 presentation   amin+charles-final
Tlad 2015 presentation amin+charles-finalAmin Chowdhury
 
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...STEP_scotland
 
An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...
An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...
An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...Wassim Derguech
 
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...Fatima Qayyum
 
BA Summit 2014 Predictive maintenance: Met big data het lek dichten
BA Summit 2014  Predictive maintenance: Met big data het lek dichtenBA Summit 2014  Predictive maintenance: Met big data het lek dichten
BA Summit 2014 Predictive maintenance: Met big data het lek dichtenDaniel Westzaan
 
Kalman Graffi - Disputation Talk - Monitoring and Management of P2P Systems -...
Kalman Graffi - Disputation Talk - Monitoring and Management of P2P Systems -...Kalman Graffi - Disputation Talk - Monitoring and Management of P2P Systems -...
Kalman Graffi - Disputation Talk - Monitoring and Management of P2P Systems -...Kalman Graffi
 
air Pollution monitoring and control
air Pollution monitoring and controlair Pollution monitoring and control
air Pollution monitoring and controlRohit566499
 
CFD Apps: Presentation of the Urban Wind Study App
CFD Apps: Presentation of the Urban Wind Study AppCFD Apps: Presentation of the Urban Wind Study App
CFD Apps: Presentation of the Urban Wind Study AppJulien de Charentenay
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteRoger Barga
 
ATS-16: Making Data Count, Krista Nordback
ATS-16: Making Data Count, Krista NordbackATS-16: Making Data Count, Krista Nordback
ATS-16: Making Data Count, Krista NordbackBTAOregon
 
Sampling-SDM2012_Jun
Sampling-SDM2012_JunSampling-SDM2012_Jun
Sampling-SDM2012_JunMDO_Lab
 
RIPE Atlas for Network Researchers
RIPE Atlas for Network ResearchersRIPE Atlas for Network Researchers
RIPE Atlas for Network ResearchersRIPE NCC
 
COBWEB Summit at the OGC TC Dublin, 2016
COBWEB Summit at the OGC TC Dublin, 2016COBWEB Summit at the OGC TC Dublin, 2016
COBWEB Summit at the OGC TC Dublin, 2016COBWEB Project
 
Complying with EPA's Guidance for SO2 Designations
Complying with EPA's Guidance for SO2 DesignationsComplying with EPA's Guidance for SO2 Designations
Complying with EPA's Guidance for SO2 DesignationsSergio A. Guerra
 
Crowd sourcing gis for global urban area mapping
Crowd sourcing gis for global urban area mappingCrowd sourcing gis for global urban area mapping
Crowd sourcing gis for global urban area mappingHiroyuki Miyazaki
 
EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning
EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning
EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning Phuc Nguyen
 

Similar to Twitter-based Sensing of City-level Air Quality (20)

Tlad 2015 presentation amin+charles-final
Tlad 2015 presentation   amin+charles-finalTlad 2015 presentation   amin+charles-final
Tlad 2015 presentation amin+charles-final
 
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
Scottish Urban Air Qualtiy Steering Group - Modelling & Monitoring Workshop -...
 
An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...
An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...
An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...
 
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
 
COBWEB: Brief Introduction, GBIF Secretariat
COBWEB: Brief Introduction, GBIF SecretariatCOBWEB: Brief Introduction, GBIF Secretariat
COBWEB: Brief Introduction, GBIF Secretariat
 
BA Summit 2014 Predictive maintenance: Met big data het lek dichten
BA Summit 2014  Predictive maintenance: Met big data het lek dichtenBA Summit 2014  Predictive maintenance: Met big data het lek dichten
BA Summit 2014 Predictive maintenance: Met big data het lek dichten
 
01-11 StreamAir - Donald.pdf
01-11 StreamAir - Donald.pdf01-11 StreamAir - Donald.pdf
01-11 StreamAir - Donald.pdf
 
Kalman Graffi - Disputation Talk - Monitoring and Management of P2P Systems -...
Kalman Graffi - Disputation Talk - Monitoring and Management of P2P Systems -...Kalman Graffi - Disputation Talk - Monitoring and Management of P2P Systems -...
Kalman Graffi - Disputation Talk - Monitoring and Management of P2P Systems -...
 
air Pollution monitoring and control
air Pollution monitoring and controlair Pollution monitoring and control
air Pollution monitoring and control
 
CFD Apps: Presentation of the Urban Wind Study App
CFD Apps: Presentation of the Urban Wind Study AppCFD Apps: Presentation of the Urban Wind Study App
CFD Apps: Presentation of the Urban Wind Study App
 
ACCESS-Opt_Overview
ACCESS-Opt_OverviewACCESS-Opt_Overview
ACCESS-Opt_Overview
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 Keynote
 
ATS-16: Making Data Count, Krista Nordback
ATS-16: Making Data Count, Krista NordbackATS-16: Making Data Count, Krista Nordback
ATS-16: Making Data Count, Krista Nordback
 
AIoT for AIA.pptx
AIoT for AIA.pptxAIoT for AIA.pptx
AIoT for AIA.pptx
 
Sampling-SDM2012_Jun
Sampling-SDM2012_JunSampling-SDM2012_Jun
Sampling-SDM2012_Jun
 
RIPE Atlas for Network Researchers
RIPE Atlas for Network ResearchersRIPE Atlas for Network Researchers
RIPE Atlas for Network Researchers
 
COBWEB Summit at the OGC TC Dublin, 2016
COBWEB Summit at the OGC TC Dublin, 2016COBWEB Summit at the OGC TC Dublin, 2016
COBWEB Summit at the OGC TC Dublin, 2016
 
Complying with EPA's Guidance for SO2 Designations
Complying with EPA's Guidance for SO2 DesignationsComplying with EPA's Guidance for SO2 Designations
Complying with EPA's Guidance for SO2 Designations
 
Crowd sourcing gis for global urban area mapping
Crowd sourcing gis for global urban area mappingCrowd sourcing gis for global urban area mapping
Crowd sourcing gis for global urban area mapping
 
EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning
EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning
EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning
 

More from Symeon Papadopoulos

DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...Symeon Papadopoulos
 
Deepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionSymeon Papadopoulos
 
Knowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationSymeon Papadopoulos
 
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Symeon Papadopoulos
 
COVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingCOVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingSymeon Papadopoulos
 
Similarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSimilarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSymeon Papadopoulos
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentSymeon Papadopoulos
 
Verifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetVerifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetSymeon Papadopoulos
 
A Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionA Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionSymeon Papadopoulos
 
Learning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterLearning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterSymeon Papadopoulos
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersSymeon Papadopoulos
 
Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Symeon Papadopoulos
 
Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Symeon Papadopoulos
 
In-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceIn-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceSymeon Papadopoulos
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Symeon Papadopoulos
 
Web and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsWeb and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsSymeon Papadopoulos
 
Predicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsPredicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsSymeon Papadopoulos
 
Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Symeon Papadopoulos
 
CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015Symeon Papadopoulos
 

More from Symeon Papadopoulos (20)

DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
 
Deepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their Detection
 
Knowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering Localization
 
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
 
COVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingCOVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact Tracing
 
Similarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSimilarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia content
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media Content
 
Verifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetVerifying Multimedia Content on the Internet
Verifying Multimedia Content on the Internet
 
A Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionA Web-based Service for Image Tampering Detection



Twitter-based Sensing of City-level Air Quality

  • 1. Twitter-based Sensing of City-level Air Quality Polychronis Charitidis, Eleftherios Spyromitros-Xioufis, Symeon Papadopoulos, Yiannis Kompatsiaris IVMSP 2018, June 6, Aristi, Greece
  • 5. PM and ways of measuring
      • Particulate matter: air-suspended mixture of both solid and liquid particles
          • PM10: particles smaller than 10μm
          • PM2.5: particles smaller than 2.5μm
      • Ways of measuring PM (figure axes: accuracy, cost)
          • Certified reference instruments
          • Certified equivalent instruments
          • Certified indicative instruments
          • Indicative instruments
      source: https://www.aeroqual.com/particulate-matters-why-monitor-pm10-and-pm2-5
  • 6. Sensing AQ using Twitter (diagram labels: air pollution; affected city; affected citizens; social media discussions; monitoring and mining; prediction)
  • 8. Related work
      • Related research has focused on using Sina Weibo as a “sensor” of air quality in China, in a standard model-building setting
      • Our setting is different and more challenging
          • Much lower volatility/variability in AQ values → fewer relevant signals in social media
          • Smaller population compared to China → fewer posts in social media
          • We differentiate between monitored & unmonitored cities and adopt a transfer learning formulation
      Mei et al. “Inferring air pollution by sniffing social media.” ASONAM 2014
      Jiang et al. “Using social media to detect outdoor air pollution ...” PLOS One 2015
      Wang et al. “Social media as a sensor of air …”. Journal of Medical Internet Research, 2015
      Tao et al. “Inferring atmospheric particulate matter concentrations ….”. PLOS One 2016
  • 9. Problem formulation
      • $C_M$ / $C_U$: sets of monitored and unmonitored cities
      • For each $c_j \in C_M$:
          • training samples $D_{c_j} = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)\}$
          • $\mathbf{x}_i \in \mathbb{R}^d$: d-dim vector summarizing tweets in city $c_j$ during the i-th temporal bin; $y_i \in \mathbb{R}$: average PM2.5 concentration during bin i
      • For each $c_q \in C_U$:
          • build model $h_{c_q}: X \to Y$ (only $\mathbf{x}_i$ available)
  • 10. Transfer learning
      • Data pooling approach
          • Train regression model h on $D = \bigcup_{c_j \in C_M} D_{c_j}$
          • Simultaneously minimize prediction error on all monitored cities
      • Feature selection: keep the top k features ranked by their Pearson correlation with Y
      • Variant: weighted data pooling, where each training example is weighted by the inverse distance between its city and the target city
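
The data pooling, top-k feature selection and inverse-distance sample weighting on slide 10 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the use of great-circle (haversine) distance, and ranking by signed rather than absolute correlation are all assumptions, and the coordinates in the usage note are approximate.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def top_k_features(X, y, k):
    """Indices of the k features most correlated with y (signed ranking: an assumption)."""
    n_features = len(X[0])
    scores = [pearson([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def pool(datasets, coords, target):
    """Pool samples of all monitored cities; weight each by 1 / distance to the target.

    `datasets` maps city -> (X, y); the target city is unmonitored, so it must
    not appear in `datasets` (its distance to itself would be zero).
    """
    X, y, w = [], [], []
    for city, (Xc, yc) in datasets.items():
        weight = 1.0 / haversine_km(coords[city], coords[target])
        X.extend(Xc)
        y.extend(yc)
        w.extend([weight] * len(yc))
    return X, y, w
```

With approximate coordinates such as London (51.51, -0.13), Manchester (53.48, -2.24) and a target of Liverpool (53.41, -2.99), samples from Manchester receive a larger weight than samples from London. The returned weights could then be passed as per-sample weights to whichever regressor is trained on the pooled data.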
  • 11. Data
      • Track 120 English air quality-related keywords: air pollution, aqi, emission, smog, haze, cough, wheeze, …
      • Infer the location of each tweet using a geotagging method (Kordopatis-Zilos et al., 2017):
          • Tweet text, if geotagging confidence > 0.8
          • Twitter account’s location field if the above is not possible
      • Air quality data: OpenAQ API
          • Hourly measurements
          • Average measurements from different stations in the same city
      Kordopatis-Zilos et al. “Geotagging text content with language models and ...” PIEEE 2017
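
The location fallback rule and the per-city station averaging on slide 11 could look roughly like this. It is a sketch under stated assumptions: the function names and argument layout are hypothetical, and the geotagger itself (which produces the text-based estimate and its confidence) is not modelled here.

```python
def infer_location(text_geotag, text_confidence, account_location, threshold=0.8):
    """Prefer the text-based geotag when its confidence exceeds the threshold;
    otherwise fall back to the account's location field (None if that is empty)."""
    if text_geotag is not None and text_confidence > threshold:
        return text_geotag
    return account_location or None

def city_hourly_mean(station_values):
    """Average the measurements reported by different stations of one city for one hour."""
    return sum(station_values) / len(station_values)
```

A tweet whose text is geotagged to London with confidence 0.9 is assigned to London; with confidence 0.5 the account's location field is used instead.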
  • 12. Feature extraction
      • Bag-of-words:
          • tokenization, lowercasing and stop word removal
          • vocabulary W = {w1,…,wn} of the n=10K most frequent words in a random sample of 1M of the collected tweets
          • x = [x1, … , xn] represents all tweets in city c at time interval t; xi denotes the number of tweets containing wi divided by the total number of tweets in (c,t)
      • Two variants:
          • “current”: only tweets from the current temporal bin
          • “lagged”: include tweets from previous bins
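
The bag-of-words representation on slide 12 can be illustrated with a small sketch. The tokenizer regex and the tiny stop-word list are placeholders chosen for the example, not the ones used in the study, and only the “current” variant is shown.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the actual list is not given).
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "and", "to"}

def tokenize(text):
    """Lowercase, split into alphanumeric tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]

def build_vocabulary(tweets, n):
    """The n most frequent tokens in a sample of tweets."""
    counts = Counter(tok for tw in tweets for tok in tokenize(tw))
    return [word for word, _ in counts.most_common(n)]

def bow_vector(tweets, vocabulary):
    """x_i = number of tweets in the bin containing w_i / total tweets in the bin."""
    if not tweets:
        return [0.0] * len(vocabulary)
    token_sets = [set(tokenize(tw)) for tw in tweets]
    return [sum(w in s for s in token_sets) / len(tweets) for w in vocabulary]
```

For the three tweets "Smog in London", "The smog is bad" and "Clear sky" with vocabulary ["smog", "sky", "haze"], the vector is [2/3, 1/3, 0].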
  • 15. Setup
      • Five cities in UK and five in US
      • Period: Feb 8, 2017 → Jan 18, 2018
      • Each city in turn considered the test city (i.e. no access to its ground-truth air quality)
      • Three temporal granularities: 6h, 12h, 24h (ground truth → average of hourly measurements)
      • Root Mean Squared Error (RMSE)
          • Macro-averaging for country-wise/overall performance (αRMSE)
      • Gradient Tree Boosting for regression
          • scikit-learn: learning rate = 0.01, nr. estimators = 200
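
The evaluation quantities on slide 15 can be written down compactly. The sketch below assumes that macro-averaging means the unweighted mean of per-city RMSE values and that temporal bins are formed from consecutive hourly values; the function names are illustrative.

```python
import math

def bin_average(hourly, width):
    """Average consecutive hourly values into bins of `width` hours (e.g. 6, 12, 24)."""
    bins = [hourly[i:i + width] for i in range(0, len(hourly), width)]
    return [sum(b) / len(b) for b in bins]

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def macro_rmse(per_city):
    """Unweighted mean of per-city RMSE; `per_city` maps city -> (y_true, y_pred)."""
    scores = [rmse(t, p) for t, p in per_city.values()]
    return sum(scores) / len(scores)
```

Because every city contributes equally to the macro-average, a high-volume city such as London does not dominate the country-wise or overall score.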
  • 16. UK cities
      CITY          #TWEETS/DAY
      London        3972
      Birmingham    198
      Leeds         112
      Liverpool     108
      Manchester    321
  • 17. US cities
      CITY          #TWEETS/DAY
      New York      2564
      Philadelphia  478
      Boston        574
      Baltimore     394
      Pittsburgh    169
  • 18. Baseline performance analysis
      • IDW: Inverse Distance Weighting (spatial interpolation)
          • High correlation between close-by cities
      • mean: always predict mean PM2.5 value per city
          • Small variability and mostly low PM2.5 values
      αRMSE   UK 6h  UK 12h  UK 24h  US 6h  US 12h  US 24h  Overall 6h  Overall 12h  Overall 24h
      IDW     3.79   3.34    3.09    4.12   3.73    3.41    3.96        3.54         3.25
      mean    7.00   6.64    6.36    4.60   4.26    4.02    5.80        5.46         5.19
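
The IDW baseline on slide 18 estimates the value at an unmonitored city from the values measured at monitored cities, weighting closer cities more heavily. A minimal sketch, assuming a `power` exponent of 1 (the slides do not state the exponent used):

```python
def idw_estimate(values, distances, power=1.0):
    """Inverse-distance-weighted mean of known values.

    `values[i]` is the PM2.5 value at monitored city i and `distances[i]` its
    (non-zero) distance to the target city; `power` is an assumed parameter.
    """
    weights = [1.0 / d ** power for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

For two monitored cities with values 10 and 20 at distances 1 and 3 from the target, the estimate is 12.5, pulled towards the closer city.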
  • 19. Within-city models
      • #tw: total number of tweets in spatiotemporal bin
      • #aqs: number of tweets related to air quality
      • #high: number of tweets related to high air pollution
      • all: concatenation of #tw, #aqs, #high
      • BoW/BoW-1/BoW-2: BoW and lagged versions
      αRMSE   #tw    #aqs   #high  all    BoW    BoW-1  BoW-2
      6h      5.96   5.93   5.98   5.84   5.15   4.99   4.97
      12h     6.17   5.98   6.02   5.77   4.96   4.84   5.16
      24h     5.83   6.11   5.82   5.52   4.65   4.96   5.16
  • 20. Ground truth vs features (figure: PM2.5 in London; x-axis: date, y-axis: PM2.5 in μg/m³)
  • 21. Cross-city models
      • full: full dimensional BoW (or lagged)
      • k=N: top-k features are selected
      • w=0/1: without/with sample weighting
      αRMSE      full   k=10   k=20   k=50   k=100  k=200  k=500
      w=0  6h    5.36   5.48   5.28   5.21   5.24   5.29   5.31
      w=0  12h   5.21   5.29   5.18   5.12   5.09   5.11   5.15
      w=0  24h   4.97   4.89   4.78   4.78   4.75   4.79   4.86
      w=1  6h    5.35   5.47   5.27   5.21   5.24   5.29   5.30
      w=1  12h   5.21   5.26   5.18   5.11   5.08   5.11   5.16
      w=1  24h   4.95   4.85   4.77   4.76   4.73   4.77   4.84
  • 22. Fusion
      • Simple fusion of two inputs
          • IDW estimate
          • Twitter-based estimate
      • Overall, still slightly worse than IDW (higher error)
      • Better for three cities: Boston, London, Pittsburgh (i.e. cities that are far from the rest)
      Overall  6h     12h    24h
      IDW      3.96   3.54   3.25
      mean     5.80   5.46   5.19
      fusion   4.15   4.00   3.63
  • 23. Summary & outlook • Features extracted from Twitter can offer useful signals that can contribute to coarse air quality estimations • Combined with actual air quality measurements from nearby locations, Twitter-based estimations can lead to improved results • Still room for further improvements: • Better tweet classification, feature extraction, modelling • Use of additional modalities (sky images)
  • 24. Thank you! Symeon Papadopoulos papadop@iti.gr / @sympap code: https://github.com/MKLab-ITI/twitter-aq
  • 25. Top selected features
      FEATURE       CORRELATION
      measured      0.678
      moderate      0.666
      particles     0.666
      temperature   0.661
      wind          0.655
      humidity      0.654
      pm10          0.591
      weather       0.574
      haze          0.574
      pollutants    0.560
      currently     0.557
      spam          0.551
      forecast      0.529
      tube          0.527
      bonfire       0.523
      polluted      0.515
      air           0.515
      temperatures  0.514
      begun         0.508
      exceeding     0.507
  • 26. PM2.5 correlations of city pairs
  • 27. Example tweets (aqs-high): tweets classified as aqs AND high
      • RT @PlumeInLondon: High pollution (50) at 10PM. High for #London. Avoid physical activities if sensitive https://t.co/3LVRgps965
      • London's air pollution is killing me. Coughs now sound like squeaky chew toy. #sendhelp #sendventolin
      • RT @cargill_taxi: And the mayor of London tries to blame poor air quality on toxic air from German factories.
      • @claireL23 The traffic, poor air quality, the light pollution, the lack of green space, the concrete jungle, the building work. Need I go on?
      • RT @SkyNews: THE GUARDIAN FRONT PAGE: "Toxic air risk to one in four London schools" #skypapers https://t.co/2c6ANlujep
      • RT @MayorofLondon: London’s toxic air is a public health emergency. Here is what I’m doing about it https://t.co/YHw2CVepPI

Editor's Notes

  1. https://www.sutp.org/en/news-reader/world-energy-outlook-special-report-2016-on-air-pollution-released-9039.html
  2. (people only complain about low air quality conditions)
  3. Other regressors tested: Random Forest, LinearSVR, Lasso, Ridge