Extracting City Events from Social Streams
Pramod Anantharam
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
http://www.ict-citypulse.eu/page/
Collaborator: Dr. Payam Barnaghi
Advisors: Prof. Amit Sheth, Prof. Krishnaprasad Thirunarayan
Pramod Anantharam, Payam Barnaghi, Krishnaprasad Thirunarayan, and Amit Sheth. 2015. Extracting City Traffic Events
from Social Streams. ACM Trans. Intell. Syst. Technol. 6, 4, Article 43 (July 2015), 27 pages. DOI=10.1145/2717317
http://doi.acm.org/10.1145/2717317
http://wiki.knoesis.org/index.php/Citypulse
A Historical Perspective on Cities and Their Inhabitants
“kings, emperors and other rulers benefited from being on the front lines with their
people when it came to making decisions.”¹
¹ http://gicoaches.com/what-we-can-learn-from-kings-of-the-past-who-disguised-themselves-as-ordinary-men/
http://en.wikipedia.org/wiki/Qianlong_Emperor
Qianlong Emperor (8 October 1735 – 9 February 1796)
Qing Dynasty (1644–1912)
Disguised as a commoner, Qianlong visited
cities to understand a common man’s life
This practice has been popularly known as “Management by
Walking Around” since the 1980s
A Modern Perspective on Cities and Their Inhabitants
City authorities, governments, and humanitarian agencies benefit from
being on the front lines with their people when it comes to making decisions.
We want to be connected to citizens to
understand and prioritize decisions
Image credit: http://www-03.ibm.com/software/products/us/en/intelligent-operations-center
Image credit: http://www.ibm.com/smarterplanet/us/en/smarter_cities/overview/index.html
Life in a City
Image credit: http://www.ibm.com/smarterplanet/us/en/smarter_cities/overview/index.html
Public Safety · Urban Planning · Gov. & Agency Admin. · Energy & Water · Environmental · Transportation · Social Programs · Healthcare · Education
Pulse of a City (CityPulse)
What are People Talking About City Infrastructure on Twitter?
• What are people saying about city
infrastructure on Twitter?
• How do we extract city infrastructure-related
events from Twitter?
• How can we leverage event and location
knowledge bases for event extraction?
• How well can we extract city events?
Research Questions
Some Challenges in Extracting Events from Tweets
• No well-accepted definition of ‘events related to a
city’
• Tweets are short (140 characters), and their informal
nature makes them hard to analyze
– Entity, location, time, and type of an event
• Multiple reports of the same event and sparse
reports of some events (biased sample)
– Numbers don’t necessarily indicate intensity
• Validation of the solution is hard due to the
open-domain nature of the problem
Formal Text Informal Text
Closed Domain
Open Domain [Roitman et al. 2012][Kumaran and Allan 2004]
[Lampos and Cristianini 2012]
[Becker et al. 2011]
[Wang et al. 2012]
[Ritter et al. 2012]
Related Work on Event Extraction
• N-grams + Regression
– Text analysis to extract uni- and bi-grams (event markers)
– Feature selection to select best possible event markers
– Apply regression to estimate conditional probability P(Y|X) to enable
prediction where Y is the target (e.g., rainfall) and X is the input (e.g., traffic
jam event).
• Clustering
– Create event clusters incrementally over time
– Identify clusters of interest based on their relevance (manual inspection)
– Granularity remains at the tweet/cluster level (tweets are assigned to clusters
of interest)
• Sequence Labeling (CRFs)
– Text analysis to extract features such as named entities and POS¹ tags
– Each event indicator is modeled as a mixture of event types that are latent
variables
– Each type corresponds to a distribution over named entities n (labels assigned
to event types by manual inspection) and other features
Event Extraction -- Techniques
¹ POS: part of speech
• Event extraction should be open domain (no a
priori event types) preferably with event
metadata (e.g., event duration, impact).
• Incorporate background knowledge related to
city events e.g., 511.org hierarchy, SCRIBE
ontology, city location names.
• Assess the intensity of an event using content and
network cues.
• Be robust w.r.t. the noise, informal nature, and
variability of the data.
City Event Extraction -- Desiderata
• N-grams + Regression
– Open domain: works best when there is a reference
corpus to extract n-grams
– Event metadata: cannot distinguish between entities
and hence hard to extract event metadata
– Background knowledge: incorporating domain
vocabulary (e.g., subsumption) is not natural
– Event intensity: regression maps the event indicators
to some quantified values
– Robustness: quite robust if there is a reference corpus
Techniques -- Desiderata
• Clustering
– Open domain: works well for domains with no a priori
knowledge of events (may need human inspection)
– Event metadata: too coarse grained (document level)
and event metadata extraction is not natural
– Background knowledge: incorporating domain
vocabulary is not natural
– Event intensity: not captured
– Robustness: quite robust for Twitter data when there is
enough data for each cluster
Techniques -- Desiderata
• Sequence Labeling (CRFs)
– Open domain: works well for domains with no a priori
knowledge of events (may need human inspection)
– Event metadata: event metadata extraction is
captured naturally with the named entities
– Background knowledge: incorporating domain
vocabulary is quite natural
– Event intensity: part-of-speech tags may indirectly
capture intensity
– Robustness: with a deeper model for capturing
context, quite robust for Twitter data
Techniques -- Desiderata
[Figure: solution architecture. Input: tweets from a city (city infrastructure). Components shown: POS tagging, hybrid NER + event term extraction, geohashing, temporal estimation, impact assessment, event aggregation. Background knowledge: OSM locations, SCRIBE ontology, 511.org hierarchy. Stages labeled: city event annotation, city event extraction.]
City Event Extraction Solution Architecture
[Figure: a chain-structured CRF over three tokens, shown in two panels, “A General CRF Model” and “Regression Based Implementation of CRF Model”, with training data consisting of tokens and tags. Factors Φ^1_1(tag1, tag2) and Φ^1_2(tag2, tag3) connect adjacent tags; factors Φ^2_1(tag1, token1), Φ^2_2(tag2, token2), and Φ^2_3(tag3, token3) connect each tag to its token.]
City Event Annotation – CRF Formalization
Global normalization distinguishes CRFs from
other models, allowing them to factor in long-distance dependencies
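The factor diagram on this slide can be written out as a linear-chain CRF. This is a sketch under stated assumptions: Φ^1 denotes the tag-to-tag factors and Φ^2 the tag-to-token factors named on the slide, n is the sequence length (n = 3 in the diagram), and Z is the global normalizer that the statement above refers to. The placement of the indices as superscript and subscript is a reading of the flattened slide text, not something the slide states.

```latex
\[
p(\mathrm{tag}_{1:n} \mid \mathrm{token}_{1:n})
  = \frac{1}{Z(\mathrm{token}_{1:n})}
    \prod_{i=1}^{n-1} \Phi^{1}_{i}(\mathrm{tag}_i, \mathrm{tag}_{i+1})
    \prod_{i=1}^{n} \Phi^{2}_{i}(\mathrm{tag}_i, \mathrm{token}_i)
\]
\[
Z(\mathrm{token}_{1:n})
  = \sum_{\mathrm{tag}'_{1:n}}
    \prod_{i=1}^{n-1} \Phi^{1}_{i}(\mathrm{tag}'_i, \mathrm{tag}'_{i+1})
    \prod_{i=1}^{n} \Phi^{2}_{i}(\mathrm{tag}'_i, \mathrm{token}_i)
\]
```

Because Z sums over every possible tag sequence for the whole tweet, the score of one tag depends on the entire sequence rather than on a locally normalized step.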
City Event Annotation – CRF Annotation Examples
Last O night O in O CA... O (@ O Half B-LOCATION Moon I-LOCATION Bay B-LOCATION Brewing I-LOCATION Company O w/ O 8 O others) O http://t.co/w0eGEJjApY O
#Manteca O accident. B-EVENT two O lanes O blocked B-EVENT on O Hwy O 99 O NB O at O Austin O Rd O #traffic O http://t.co/YehsHpD7aC O
#Fontana O accident. B-EVENT three O lanes O blocked B-EVENT on O I-10 O WB O between O Cherry B-LOCATION Ave I-LOCATION and O Etiwanda O Ave O in O #Ontario O #LAtraffic O http://t.co/e2e6MW3d78 O
Tags used in our approach: B-LOCATION, I-LOCATION, B-EVENT, I-EVENT, O
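To show how the B-/I-/O tags in the annotation examples turn into event and location mentions, here is a minimal sketch. The function name `bio_spans` and the choice to ignore an `I-` tag that does not continue an open span of the same label are assumptions made for illustration; the slides do not describe this step.

```python
def bio_spans(tagged):
    """Collect (label, text) spans from a list of (token, tag) pairs.

    A span starts at a B-<label> tag and is extended by following
    I-<label> tags with the same label. Assumption: an I- tag that does
    not continue an open span of the same label is treated like O.
    """
    spans = []
    label, tokens = None, []

    def close():
        # Emit the span collected so far, if any.
        if tokens:
            spans.append((label, " ".join(tokens)))

    for token, tag in tagged:
        if tag.startswith("B-"):
            close()
            label, tokens = tag[2:], [token]
        elif tag.startswith("I-") and tokens and tag[2:] == label:
            tokens.append(token)
        else:
            close()
            label, tokens = None, []
    close()
    return spans


# Part of the #Fontana example from the slide.
fontana = [
    ("#Fontana", "O"), ("accident.", "B-EVENT"), ("three", "O"),
    ("lanes", "O"), ("blocked", "B-EVENT"), ("on", "O"),
    ("between", "O"), ("Cherry", "B-LOCATION"), ("Ave", "I-LOCATION"),
    ("and", "O"),
]
print(bio_spans(fontana))
```

On this input the sketch yields two single-token EVENT spans and one two-token LOCATION span.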
a) Space: events reported within a grid (g_i ∈ G,
where G is the set of all grids in a city) at a certain
time are most likely reporting the same event
b) Time: events reported within a time ∆t in a grid
g_i are most likely reporting the same event
c) Theme: events with similar entities within a grid
g_i and time ∆t are most likely reporting the same
event
City Event Extraction -- Key Insights
We will utilize these principles in the event aggregation algorithm
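The space, time, and theme principles above can be sketched as a simple grouping pass. This is not the aggregation algorithm from the talk, only an illustration: `Report`, `aggregate`, and the use of an exact event-type match as a stand-in for "similar entities" are assumptions, as is counting the merged reports in the last tuple field.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Report:
    event_type: str   # theme, e.g. "accident" (assumed proxy for similar entities)
    grid: int         # grid cell identifier g_i
    time: datetime    # when the tweet was posted


def aggregate(reports, delta_t):
    """Merge reports that share a grid cell and type and fall within delta_t.

    Returns tuples (event_type, grid, start, end, number_of_reports).
    """
    events = []
    latest = {}  # (event_type, grid) -> index of the most recent event
    for r in sorted(reports, key=lambda r: r.time):
        key = (r.event_type, r.grid)
        idx = latest.get(key)
        if idx is not None and r.time - events[idx][3] <= delta_t:
            etype, grid, start, _, n = events[idx]
            events[idx] = (etype, grid, start, r.time, n + 1)
        else:
            latest[key] = len(events)
            events.append((r.event_type, r.grid, r.time, r.time, 1))
    return events


day = datetime(2013, 10, 20)
reports = [
    Report("accident", 5889, day.replace(hour=19, minute=0)),
    Report("concert", 5889, day.replace(hour=19, minute=10)),
    Report("accident", 5889, day.replace(hour=19, minute=20)),
    Report("accident", 5889, day.replace(hour=23, minute=0)),
]
events = aggregate(reports, timedelta(hours=1))
for e in events:
    print(e)
```

The two accident reports twenty minutes apart collapse into one event, while the late accident report and the concert report stay separate.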
[Figure: a geohash grid cell about 0.6 miles wide and 0.38 miles tall, with max-lat 37.7545166015625, min-lat 37.7490234375, min-long -122.420654296875, and max-long -122.40966796875. The sample point 37.74933, -122.4106711 falls inside this cell. Nested grids of labeled sub-cells illustrate the hierarchy.]
Hierarchical spatial structure of geohash for
representing locations with variable precision.
Here the location string is 5H34
City Event Extraction – Geohashing
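The cell on this slide can be reproduced by repeatedly halving the latitude and longitude ranges, which is the core idea behind geohashing. The sketch below is an illustration, not the talk's implementation: `geohash_cell` and `cell_size_miles` are hypothetical names, the choice of 15 halvings per axis is an assumption that happens to reproduce the corner coordinates shown, and the size estimate uses a simple spherical approximation with an assumed Earth radius of 3958.8 miles. It does not produce the location string (such as 5H34), whose alphabet the slide does not fully specify.

```python
import math


def geohash_cell(lat, lon, bits_per_axis=15):
    """Return (min_lat, max_lat, min_lon, max_lon) of the cell holding the point."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    for _ in range(bits_per_axis):
        # Halve each range and keep the half that contains the point.
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        if lat >= lat_mid:
            lat_lo = lat_mid
        else:
            lat_hi = lat_mid
        if lon >= lon_mid:
            lon_lo = lon_mid
        else:
            lon_hi = lon_mid
    return lat_lo, lat_hi, lon_lo, lon_hi


def cell_size_miles(min_lat, max_lat, min_lon, max_lon, earth_radius_miles=3958.8):
    """Approximate (width, height) of a small cell in miles."""
    mid_lat = math.radians((min_lat + max_lat) / 2)
    height = math.radians(max_lat - min_lat) * earth_radius_miles
    width = math.radians(max_lon - min_lon) * earth_radius_miles * math.cos(mid_lat)
    return width, height


cell = geohash_cell(37.74933, -122.4106711)
width, height = cell_size_miles(*cell)
print(cell)
print(round(width, 2), round(height, 2))
```

Fewer halvings give larger cells and more halvings give smaller ones, which is what lets a single scheme represent locations with variable precision.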
City Event Extraction – Metadata Population Algorithm
City Event Extraction – Event Aggregation Algorithm
Event metadata inference
Spatial filtering
Grouping Events by types
• <traffic, 5889, 2013-10-22 19:24:39, 2013-10-23 18:54:08, 19>
• <concert, 5889, 2013-10-20 19:46:06, 2013-10-21 19:06:29, 35>
• <accident, 32400, 2013-10-20 19:51:10, 2013-10-21 15:53:08, 11>
• <parade, 8672, 2013-08-10 12:57:17, 2013-08-10 18:57:21, 11>
City Event Extraction – A Sample of Extracted Events
Location refers to the geohash number, which is mapped to a lat-long
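The sample tuples can be read into a small record type for further processing. The field names below are assumptions: the slide only states that the second field is a geohash number, and the meaning of the last field is not labeled, so the sketch calls it `impact` as a guess based on the impact assessment step in the architecture.

```python
from datetime import datetime
from typing import NamedTuple

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


class CityEvent(NamedTuple):
    event_type: str
    geohash: int
    start: datetime
    end: datetime
    impact: int  # assumed meaning; the slide does not label this field


def parse_event(line):
    """Parse one '<type, geohash, start, end, n>' line from the slide."""
    body = line.strip().lstrip("• ").strip("<>")
    etype, geohash, start, end, last = [f.strip() for f in body.split(",")]
    return CityEvent(
        etype,
        int(geohash),
        datetime.strptime(start, TIME_FORMAT),
        datetime.strptime(end, TIME_FORMAT),
        int(last),
    )


ev = parse_event("• <accident, 32400, 2013-10-20 19:51:10, 2013-10-21 15:53:08, 11>")
print(ev.event_type, ev.geohash, ev.end - ev.start)
```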
• City Event Annotation
– Automated creation of training data
– Annotation task (our CRF model vs. baseline CRF model)
• City Event Extraction
– Use aggregation algorithm for event extraction
– Extracted events AND ground truth
• Dataset (Aug – Nov 2013) ~ 8 GB of data on disk
– Over 8 million tweets (extract events using Alg. 1 and 2)
– Over 162 million sensor data points (find delays by looking
at change in travel time for links serving as ground truth)
– 311 active events and 170 scheduled events (readily
available events as ground truth)
Evaluation
Evaluation – Automated Creation of Training Data (to train CRF model)
Evaluation over 500 randomly chosen tweets from around 8,000 annotated tweets
Aho-Corasick [Commentz-Walter 1979] string matching algorithm implemented by LingPipe [Alias-i 2008]
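The idea of creating training data automatically from event and location vocabularies can be sketched with a dictionary lookup. This is deliberately not Aho-Corasick and does not use LingPipe: it is a plain greedy longest-match over tokens, and `auto_annotate` and the tiny `lexicon` are hypothetical stand-ins for the knowledge sources named in the talk.

```python
def auto_annotate(tokens, lexicon):
    """Tag tokens with B-/I-/O labels using greedy longest-match lookup.

    lexicon maps a tuple of lowercase tokens to a label such as
    "EVENT" or "LOCATION".
    """
    max_len = max((len(key) for key in lexicon), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate phrase first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in lexicon:
                label = lexicon[key]
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                matched = n
                break
        i += matched if matched else 1
    return list(zip(tokens, tags))


lexicon = {
    ("accident",): "EVENT",
    ("cherry", "ave"): "LOCATION",
    ("half", "moon", "bay"): "LOCATION",
}
tokens = "accident on Cherry Ave near Half Moon Bay".split()
annotated = auto_annotate(tokens, lexicon)
print(annotated)
```

A multi-pattern matcher such as Aho-Corasick does the same job in a single pass over the text, which matters when the vocabulary and the tweet collection are large.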
Evaluation – Annotation Task (our CRF model vs. baseline CRF model)
Baseline CRF Model
Our CRF Model
The baseline CRF model (trained on a large, manually
created dataset) works well on generic tasks.
Our CRF model, trained on automatically generated
training data, performs on par with the baseline.
Our CRF model does better on the event extraction
task due to the availability of event-related
knowledge
Ground Truth Data (only incident reports) -- City Event Extraction
We have around 162 million data records from sensors monitoring over 3,700 links in the San Francisco Bay Area
<link_id, link_speed, link_volume, link_travel_time, time_stamp>: a data record
GREEN – Active Events
YELLOW – Scheduled Events
311 active events and 170 scheduled events
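The evaluation finds delays by looking at changes in travel time for each link. A minimal sketch of that idea follows, using the record layout shown on this slide. The per-link median baseline and the 1.5× threshold are assumptions chosen for illustration, and `find_delays` is a hypothetical name; the talk does not specify how a delay is detected.

```python
from collections import defaultdict
from statistics import median


def find_delays(records, ratio=1.5):
    """Flag records whose travel time exceeds ratio x the link's median.

    records: iterable of
    (link_id, link_speed, link_volume, link_travel_time, time_stamp).
    Returns (link_id, time_stamp, travel_time, baseline) tuples.
    """
    by_link = defaultdict(list)
    for link_id, _speed, _volume, travel_time, ts in records:
        by_link[link_id].append((ts, travel_time))

    delays = []
    for link_id, observations in by_link.items():
        baseline = median(t for _, t in observations)
        for ts, travel_time in sorted(observations):
            if travel_time > ratio * baseline:
                delays.append((link_id, ts, travel_time, baseline))
    return delays


records = [
    (1, 55, 120, 60, 1), (1, 54, 118, 62, 2), (1, 56, 121, 61, 3),
    (1, 20, 300, 150, 4), (1, 57, 119, 59, 5),
    (2, 60, 80, 30, 1), (2, 59, 82, 31, 2), (2, 61, 79, 30, 3),
]
delays = find_delays(records)
print(delays)
```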
Evaluation – Use Aggregation Algorithm for Event Extraction
Evaluation – Extracted Events AND Ground Truth Verification
Evaluation Metric For Comparing Events with Ground Truth:
• Complementary Events
• Additional information
• e.g., slow traffic from sensor data and accident from textual data
• Corroborative Events
• Additional confidence
• e.g., accident event supporting an accident report from ground truth
• Timeliness
• Additional insight
• e.g., knowing poor visibility before formal report from ground truth
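The three comparison categories above can be sketched as a rule over one extracted event and one ground-truth report. The matching rule is an assumption made for illustration (same grid cell and overlapping time window), as are the dictionary keys and the function name `compare`; the slides describe the categories but not the exact test.

```python
from datetime import datetime


def compare(extracted, report):
    """Label an extracted event relative to a ground-truth report.

    Both arguments are dicts with keys 'type', 'grid', 'start', 'end'.
    Returns a set of labels, empty if the two do not co-occur.
    """
    same_place = extracted["grid"] == report["grid"]
    overlap = (extracted["start"] <= report["end"]
               and report["start"] <= extracted["end"])
    if not (same_place and overlap):
        return set()
    labels = set()
    if extracted["type"] == report["type"]:
        labels.add("corroborative")   # same kind of event: more confidence
    else:
        labels.add("complementary")   # different kind: more information
    if extracted["start"] < report["start"]:
        labels.add("timely")          # seen on Twitter before the formal report
    return labels


tweet_event = {"type": "accident", "grid": 32400,
               "start": datetime(2013, 10, 20, 19, 51),
               "end": datetime(2013, 10, 20, 21, 0)}
sensor_report = {"type": "slow traffic", "grid": 32400,
                 "start": datetime(2013, 10, 20, 20, 5),
                 "end": datetime(2013, 10, 20, 22, 0)}
print(compare(tweet_event, sensor_report))
```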
[Figures: examples of complementary events, corroborative events, and timeliness]
• People in a city do talk about infrastructure-related
topics, specifically traffic.
• City traffic-related events can be extracted from
tweets using sequence labeling techniques and
spatial aggregation algorithms.
• Domain knowledge of events and locations can
be utilized to create large training datasets to
train sequence labeling algorithms.
• Traffic events extracted from Twitter can be
complementary, corroborative, and timely
compared to formal reports of traffic events.
Conclusion
[Kumaran and Allan 2004] Giridhar Kumaran and James Allan. 2004. Text classification and named entities for new event detection. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 297–304.
[Lampos and Cristianini 2012] Vasileios Lampos and Nello Cristianini. 2012. Nowcasting events from the social web with statistical learning. ACM Transactions on Intelligent Systems and Technology (TIST) 3, 4 (2012), 72.
[Roitman et al. 2012] Haggai Roitman, Jonathan Mamou, Sameep Mehta, Aharon Satt, and LV Subramaniam. 2012. Harnessing the Crowds for smart city sensing. In Proceedings of the 1st international workshop on Multimodal crowd sensing. ACM, 17–18.
[Ritter et al. 2012] Alan Ritter, Oren Etzioni, Sam Clark, and others. 2012. Open domain event extraction from twitter. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1104–1112.
[Wang et al. 2012] Xiaofeng Wang, Matthew S Gerber, and Donald E Brown. 2012. Automatic crime prediction using events extracted from twitter posts. In Social Computing, Behavioral-Cultural Modeling and Prediction. Springer, 231–238.
[Becker et al. 2011] Hila Becker, Mor Naaman, and Luis Gravano. 2011. Beyond Trending Topics: Real-World Event Identification on Twitter. In ICWSM.
[Alias-i 2008] Alias-i. 2008. LingPipe 4.1.0. (2008). http://alias-i.com/lingpipe
[Commentz-Walter 1979] Beate Commentz-Walter. 1979. A string matching algorithm fast on the average. Springer.
References

 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Air Compressor reciprocating single stage
Air Compressor reciprocating single stageAir Compressor reciprocating single stage
Air Compressor reciprocating single stage
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
Bridge Jacking Design Sample Calculation.pptx
Bridge Jacking Design Sample Calculation.pptxBridge Jacking Design Sample Calculation.pptx
Bridge Jacking Design Sample Calculation.pptx
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 

Extracting City Traffic Events from Social Streams

  • 1. Extracting City Events from Social Streams Pramod Anantharam Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing Wright State University, Dayton, Ohio, USA http://www.ict-citypulse.eu/page/ Collaborator: Dr. Payam Barnaghi Advisors: Prof. Amit Sheth, Prof. Krishnaprasad Thirunarayan Pramod Anantharam, Payam Barnaghi, Krishnaprasad Thirunarayan, and Amit Sheth. 2015. Extracting City Traffic Events from Social Streams. ACM Trans. Intell. Syst. Technol. 6, 4, Article 43 (July 2015), 27 pages. DOI=10.1145/2717317 http://doi.acm.org/10.1145/2717317 http://wiki.knoesis.org/index.php/Citypulse
  • 2. A Historical Perspective on Cities and its Inhabitants “kings, emperors and other rulers benefited from being on the front lines with their people when it came to making decisions.”1 1http://gicoaches.com/what-we-can-learn-from-kings-of-the-past-who-disguised-themselves-as-ordinary-men/ http://en.wikipedia.org/wiki/Qianlong_Emperor Qianlong Emperor (8 October 1735 – 9 February 1796) Qing Dynasty (1644–1912) Disguised as a commoner, Qianlong visited cities to understand a common man’s life This is popularly known as “Management by Walking Around” since the 1980’s
  • 3. A Modern Perspective on Cities and its Inhabitants City authorities, government and other humanitarian agencies benefit from being on the front lines with their people when it comes to making decisions. We want to be connected to citizens to understand and prioritize decisions
  • 4. Image credit: http://www-03.ibm.com/software/products/us/en/intelligent-operations-center Image credit: http://www.ibm.com/smarterplanet/us/en/smarter_cities/overview/index.html Life in a City
  • 5. Image credit: http://www.ibm.com/smarterplanet/us/en/smarter_cities/overview/index.html Public Safety Urban planning Gov. & agency admin. Energy & water Environmental Transportation Social Programs Healthcare Education Pulse of a City (CityPulse)
  • 6. What are People Talking About City Infrastructure on Twitter?
  • 7. • What are people talking about city infrastructure on twitter? • How do we extract city infrastructure related events from twitter? • How can we leverage event and location knowledge bases for event extraction? • How well can we extract city events? Research Questions
  • 8. Some Challenges in Extracting Events from Tweets • No well accepted definition of ‘events related to a city’ • Tweets are short (140 characters) and their informal nature makes them hard to analyze – Entity, location, time, and type of an event • Multiple reports of the same event and sparse reports of some events (biased sample) – Numbers don’t necessarily indicate intensity • Validation of the solution is hard due to the open domain nature of the problem
  • 9. Formal Text Informal Text Closed Domain Open Domain [Roitman et al. 2012][Kumaran and Allan 2004] [Lampos and Cristianini 2012] [Becker et al. 2011] [Wang et al. 2012] [Ritter et al. 2012] Related Work on Event Extraction
  • 10. • N-grams + Regression – Text analysis to extract uni- and bi-grams (event markers) – Feature selection to select best possible event markers – Apply regression to estimate conditional probability P(Y|X) to enable prediction where Y is the target (e.g., rainfall) and X is the input (e.g., traffic jam event). • Clustering – Create event clusters incrementally over time – Identify clusters of interest based on their relevance (manual inspection) – Granularity remains at the tweet/cluster level (tweets are assigned to clusters of interest) • Sequence Labeling (CRFs) – Text analysis to extract features such as named entities, POS (part-of-speech) tagging – Each event indicator is modeled as a mixture of event types that are latent variables – Each type corresponds to a distribution over named entities n (labels assigned to event types by manual inspection) and other features Event Extraction -- Techniques
  • 11. • Event extraction should be open domain (no a priori event types), preferably with event metadata (e.g., event duration, impact). • Incorporate background knowledge related to city events, e.g., 511.org hierarchy, SCRIBE ontology, city location names. • Assess the intensity of an event using content and network cues. • Be robust w.r.t. noise, informal nature, and variability of data. City Event Extraction -- Desiderata
  • 12. • N-grams + Regression – Open domain: works best when there is a reference corpus to extract n-grams – Event metadata: cannot distinguish between entities and hence hard to extract event metadata – Background knowledge: incorporating domain vocabulary (e.g., subsumption) is not natural – Event intensity: regression maps the event indicators to some quantified values – Robustness: quite robust if there is a reference corpus Techniques -- Desiderata
  • 13. • Clustering – Open domain: works well for domains with no a priori knowledge of events (may need human inspection) – Event metadata: too coarse grained (document level) and event metadata extraction is not natural – Background knowledge: incorporating domain vocabulary is not natural – Event intensity: not captured – Robustness: quite robust for twitter data with enough data for each cluster Techniques -- Desiderata
  • 14. • Sequence Labeling (CRFs) – Open domain: works well for domains with no a priori knowledge of events (may need human inspection) – Event metadata: event metadata extraction is captured naturally with the named entities – Background knowledge: incorporating domain vocabulary is quite natural – Event intensity: part-of-speech tag may indirectly capture intensity – Robustness: with a deeper model for capturing context, quite robust for twitter data Techniques -- Desiderata
  • 15. City Infrastructure Tweets from a city POS Tagging Hybrid NER+ Event term extraction Geohashing Temporal Estimation Impact Assessment Event Aggregation OSM Locations SCRIBE ontology 511.org hierarchy City Event Extraction City Event Extraction Solution Architecture City Event Annotation
  • 16. City Event Annotation – CRF Formalization [Figure: a general CRF model over tags (tag1, tag2, tag3) and tokens (token1, token2, token3), with tag-tag potentials Φ^1_1(tag1, tag2), Φ^1_2(tag2, tag3) and tag-token potentials Φ^2_1(tag1, token1), Φ^2_2(tag2, token2), Φ^2_3(tag3, token3), shown alongside training data with tokens and tags and a regression-based implementation of the CRF model.] The global normalization distinguishes CRFs from other models, allowing long-distance dependencies to be factored in
  • 17. City Event Annotation – CRF Annotation Examples Last O night O in O CA... O (@ O Half B-LOCATION Moon I-LOCATION Bay B-LOCATION Brewing I-LOCATION Company O w/ O 8 O others) O http://t.co/w0eGEJjApY O #Manteca O accident. B-EVENT two O lanes O blocked B-EVENT on O Hwy O 99 O NB O at O Austin O Rd O #traffic O http://t.co/YehsHpD7aC O #Fontana O accident. B-EVENT three O lanes O blocked B-EVENT on O I-10 O WB O between O Cherry B-LOCATION Ave I-LOCATION and O Etiwanda O Ave O in O #Ontario O #LAtraffic O http://t.co/e2e6MW3d78 O B-LOCATION I-LOCATION B-EVENT I-EVENT O Tags used in our approach:
  • 18. a) Space: events reported within a grid (gi ∈G where G is a set of all grids in a city)at a certain time are most likely reporting the same event b) Time: events reported within a time ∆t in a grid gi are most likely to be reporting the same event c) Theme: events with similar entities within a grid gi and time ∆t are most likely reporting the same event City Event Extraction -- Key Insights We will utilize these principles in the event aggregation algorithm
  • 19. City Event Extraction – Geohashing [Figure: hierarchical spatial structure of geohash for representing locations with variable precision; in the illustrated example the location string is 5H34.] The point 37.74933, -122.4106711 falls in a grid cell about 0.6 miles wide (min-long to max-long) and 0.38 miles tall (min-lat to max-lat), with corners 37.7545166015625, -122.420654296875; 37.7545166015625, -122.40966796875; 37.7490234375, -122.40966796875; 37.7490234375, -122.420654296875
  • 20. City Event Extraction – Metadata Population Algorithm
  • 21. City Event Extraction – Event Aggregation Algorithm Event metadata inference Spatial filtering Grouping Events by types
  • 22. • <traffic, 5889, 2013-10-22 19:24:39, 2013-10-23 18:54:08, 19> • <concert, 5889, 2013-10-20 19:46:06, 2013-10-21 19:06:29, 35> • <accident, 32400, 2013-10-20 19:51:10, 2013-10-21 15:53:08, 11> • <parade, 8672, 2013-08-10 12:57:17, 2013-08-10 18:57:21, 11> City Event Extraction – A Sample of Extracted Events Location refers to the geohash number which is mapped to lat-long
  • 23. • City Event Annotation – Automated creation of training data – Annotation task (our CRF model vs. baseline CRF model) • City Event Extraction – Use aggregation algorithm for event extraction – Extracted events AND ground truth • Dataset (Aug – Nov 2013) ~ 8 GB of data on disk – Over 8 million tweets (extract events using Alg. 1 and 2) – Over 162 million sensor data points (find delays by looking at change in travel time for links serving as ground truth) – 311 active events and 170 scheduled events (readily available events as ground truth) Evaluation
  • 24. Evaluation – Automated Creation of Training Data (to train CRF model) Evaluation over 500 randomly chosen tweets from around 8,000 annotated tweets Aho-Corasick [Commentz-Walter 1979] string matching algorithm implemented by LingPipe [Alias-i 2008]
  • 25. Evaluation – Annotation Task (our CRF model vs. baseline CRF model) Baseline CRF Model Our CRF Model The baseline CRF model (trained on a large, manually created dataset) works well on generic tasks. Our CRF model, trained on automatically generated training data, performs on par with the baseline. Our CRF model does better on the event extraction task due to the availability of event-related knowledge
  • 26. Ground Truth Data (only incident reports) -- City Event Extraction We have around 162 million data records from sensors monitoring over 3,700 links in the San Francisco Bay Area. A data record has the form <link_id, link_speed, link_volume, link_travel_time, time_stamp>. GREEN – Active Events YELLOW – Scheduled Events 311 active events and 170 scheduled events
  • 27. Evaluation – Use Aggregation Algorithm for Event Extraction
  • 28. Evaluation – Extracted Events AND Ground Truth Verification Evaluation Metric For Comparing Events with Ground Truth: • Complementary Events • Additional information • e.g., slow traffic from sensor data and accident from textual data • Corroborative Events • Additional confidence • e.g., accident event supporting an accident report from ground truth • Timeliness • Additional insight • e.g., knowing poor visibility before formal report from ground truth
  • 29. Evaluation – Extracted Events AND Ground Truth Verification Complementary Events Complementary Events
  • 30. Evaluation – Extracted Events AND Ground Truth Verification Corroborative Events Corroborative Events
  • 31. Evaluation – Extracted Events AND Ground Truth Verification Corroborative Events
  • 32. Evaluation – Extracted Events AND Ground Truth Verification Corroborative Events Complementary Events
  • 33. Evaluation – Extracted Events AND Ground Truth Verification Timeliness Timeliness
  • 34. • People in a city do talk about various infrastructure-related issues, specifically traffic. • City traffic related events can be extracted from tweets using sequence labeling techniques and spatial aggregation algorithms. • Domain knowledge of events and locations can be utilized to create large training datasets to train sequence labeling algorithms. • Traffic events extracted from Twitter can be complementary, corroborative, and timely compared to formal reports of traffic events. Conclusion
  • 35. [Kumaran and Allan 2004] Giridhar Kumaran and James Allan. 2004. Text classification and named entities for new event detection. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 297–304. [Lampos and Cristianini 2012] Vasileios Lampos and Nello Cristianini. 2012. Nowcasting events from the social web with statistical learning. ACM Transactions on Intelligent Systems and Technology (TIST) 3, 4 (2012), 72. [Roitman et al. 2012] Haggai Roitman, Jonathan Mamou, Sameep Mehta, Aharon Satt, and LV Subramaniam. 2012. Harnessing the Crowds for smart city sensing. In Proceedings of the 1st international workshop on Multimodal crowd sensing. ACM, 17–18. [Ritter et al. 2012] Alan Ritter, Oren Etzioni, Sam Clark, and others. 2012. Open domain event extraction from twitter. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1104–1112. [Wang et al. 2012] Xiaofeng Wang, Matthew S Gerber, and Donald E Brown. 2012. Automatic crime prediction using events extracted from twitter posts. In Social Computing, Behavioral-Cultural Modeling and Prediction. Springer, 231–238. [Becker et al. 2011] Hila Becker, Mor Naaman, and Luis Gravano. 2011. Beyond Trending Topics: Real-World Event Identification on Twitter. In ICWSM. [Alias-i 2008] Alias-i. 2008. LingPipe 4.1.0. (2008). http://alias-i.com/lingpipe [Commentz-Walter 1979] Beate Commentz-Walter. 1979. A string matching algorithm fast on the average. Springer. References
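
The B-/I-/O tags shown on slide 17 can be turned back into event and location mentions with a few lines of code. The following is a minimal sketch, not the authors' implementation: the function name `bio_spans` is hypothetical, and the choice to drop an I- tag that does not continue an open span of the same type is an assumption made for this illustration.

```python
from typing import List, Optional, Tuple


def bio_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (label, text) spans from parallel token and B-/I-/O tag lists."""
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new span, closing any open one first.
            if words:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # An I- tag of the same type extends the open span.
            words.append(token)
        else:
            # "O", or an I- tag with no matching open span (assumed to be noise).
            if words:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if words:
        spans.append((label, " ".join(words)))
    return spans


# Part of the #Fontana example from slide 17.
tokens = "#Fontana accident. three lanes blocked on I-10 WB between Cherry Ave".split()
tags = ["O", "B-EVENT", "O", "O", "B-EVENT", "O", "O", "O", "O",
        "B-LOCATION", "I-LOCATION"]
spans = bio_spans(tokens, tags)
```

Here `spans` holds two EVENT mentions ("accident." and "blocked") and one LOCATION mention ("Cherry Ave"), which is the kind of input the aggregation step on slides 18 to 21 works from.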

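The space, time, and theme principles on slide 18, together with the geohash grid on slide 19, can be sketched as follows. This is an illustration under stated assumptions, not the algorithm from slides 20 and 21: it uses the standard base-32 geohash alphabet (the slide's own location strings, such as 5H34, appear to use a different encoding), a hypothetical `aggregate` function, a 24-hour value for ∆t, and the event type as the only "theme" signal.

```python
from collections import defaultdict
from datetime import datetime, timedelta

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet (assumed)


def geohash_encode(lat, lon, precision=6):
    """Return (hash, (lat_min, lat_max, lon_min, lon_max)) of the enclosing cell."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, bits, nbits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        # Alternate between halving the longitude and the latitude interval.
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = bits * 2 + 1, mid
            else:
                bits, lon_hi = bits * 2, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = bits * 2 + 1, mid
            else:
                bits, lat_hi = bits * 2, mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars), (lat_lo, lat_hi, lon_lo, lon_hi)


def aggregate(reports, window=timedelta(hours=24)):
    """Group (event_type, lat, lon, timestamp) reports by grid cell, type, and time.

    Returns (event_type, cell, start, end, count) tuples, mirroring the
    shape of the sample events on slide 22.
    """
    by_key = defaultdict(list)
    for event_type, lat, lon, ts in reports:
        cell, _ = geohash_encode(lat, lon)
        by_key[(event_type, cell)].append(ts)
    events = []
    for (event_type, cell), stamps in sorted(by_key.items()):
        stamps.sort()
        start = end = stamps[0]
        count = 1
        for ts in stamps[1:]:
            if ts - start <= window:  # same event: within the window of its start
                end, count = ts, count + 1
            else:  # too late: close the event and open a new one
                events.append((event_type, cell, start, end, count))
                start = end = ts
                count = 1
        events.append((event_type, cell, start, end, count))
    return events


t0 = datetime(2013, 10, 20, 19, 51, 10)
reports = [
    ("accident", 37.74933, -122.4106711, t0),
    ("accident", 37.7500, -122.4150, t0 + timedelta(hours=2)),
    ("accident", 37.74933, -122.4106711, t0 + timedelta(days=3)),
]
cell, box = geohash_encode(37.74933, -122.4106711)
events = aggregate(reports)
```

With six characters the cell for the slide's example point has the same corner coordinates as the box listed on slide 19. The first two reports fall in that cell within the window and merge into one event with a count of 2; the third report, three days later, starts a separate event.
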
Editor's Notes

  1. Note Citizen Centered Smart City Understanding Events in a City from Twitter Streams – hope this will be my direction
  2. Citizens are central to a city, country or in the past kingdom Since the beginning of civilizations (or settlements) around 8000 BC, people have moved toward living in ‘cities’ Kings and emperors realized the importance of understanding citizen moods, sentiments, and opinions in making decisions Qianlong emperor is one such example Concept applies to even today!
  3. - Connection to people is the key! We always want to hear citizens talk about both good and bad in a city - Good will help us know what works and bad will help us prioritize and work toward making it better Social media such as twitter, FB, myspace, and many others gives direct access to what citizens think about a city Best way to tap into the problems and challenges citizens face in a city
  4. - Citizens need following services to help lead a healthy and prosperous life in a city
  5. Data gathered in a city by various departments Citizens reporting their observations of city infrastructure You may ask do citizens really talk about city infrastructure?
  6. Twitter as a source of real-time information There are over 200 million users generating 500 million tweets / day Twitter as a source of events in a city Citizens use twitter to express their concerns of city infrastructure that impacts their life
  7. There are some knowledge bases from IBM Smart Planet initiative that can help us for city events
  8. [Kumaran and Allan 2004] - event detection from news articles using a combination of classification and NER techniques [Lampos and Cristianini 2012] – they estimated rainfall in London using tweets and compared it with actual rainfall, and estimated influenza-like illness and compared it with Health Protection Agency data [Roitman et al. 2012] – interesting piece of work on harnessing the crowd for city sensing, sensor fusion [Ritter et al. 2012] – open domain event extraction from twitter (general events such as product release, TV shows, movie release) [Wang et al. 2012] – predicts crime using a prediction model built by associating topics in tweets to ground truth data [Becker et al. 2011] – distinguishes between real-world event and non-event tweets using clustering
  9. Classification is another method – but it is not feasible for the open domain problem we are interested in. CRFs – Conditional Random Fields Sequence Labeling Consider deeper features such as named entities, POS tags Captures context of mention of entities within a tweet Allows natural incorporation of domain knowledge
  10. Intensity vs. density => tweets, page-rank analogous to importance of event, importance of events Importance of event based on location + time Contrasting density Noise Flood - people affected vs. people conversing
  11. N-grams + Regression No direct way to extract event metadata May need reference corpus for creating n-grams Needs good quality tweets if no reference corpus
  12. Clustering Does not capture event metadata Too coarse grained (tweet level) May not be able to identify location, time etc.
  13. CRF assigns a tag to each token Global normalization is the argmax term RHS is just a regression based implementation of linear chain (potentials defined only over adjacent tags) CRF LingPipe implementation of CRF is used in our experiments
  14. Localized event detection strategy: a city is a composition of smaller geographical units. We call these geographical units grids. Geohash provides us a way of compartmentalizing a city into uniquely addressable grids
  15. Distance computed using the haversine formula: dlon = lon2 - lon1; dlat = lat2 - lat1; a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2; c = 2 * atan2(sqrt(a), sqrt(1-a)); d = R * c (where R is the radius of the Earth). Found the box for the tweet! Its corners are (37.7545166015625, -122.420654296875), (37.7545166015625, -122.40966796875), (37.7490234375, -122.40966796875), and (37.7490234375, -122.420654296875)
  16. Infer delays – we have sensor data including speed of vehicles and travel time. We will use this data for verifying the events we have extracted. We look for change (reduction/increase) in travel time for all the links in the vicinity of the event (we will vary the radius from 0.5 miles to 2 miles)
  17. [Alias-i 2008] Alias-i. 2008. LingPipe 4.1.0. (2008). http://alias-i.com/lingpipe [Commentz-Walter 1979] Beate Commentz-Walter. 1979. A string matching algorithm fast on the average. Springer. - The only place this annotation suffered is the I-EVENT since this is a stateless model
  18. The record of 511.org may have its own timestamp which may be before tweets
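
The distance formula given in the notes above (the haversine formula) translates directly into code. The sketch below applies it to two edges of the grid cell whose corners are listed in the same note. The function name `haversine_miles` and the Earth radius of 3958.8 miles are assumptions for illustration; the notes only say that R is the radius of the Earth.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius in miles (assumed value for R)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return EARTH_RADIUS_MILES * c


# Two edges of the grid cell: south-west to north-west, and south-west to south-east.
height = haversine_miles(37.7490234375, -122.420654296875,
                         37.7545166015625, -122.420654296875)
width = haversine_miles(37.7490234375, -122.420654296875,
                        37.7490234375, -122.40966796875)
```

The two distances should come out close to the 0.38 mile and 0.6 mile labels shown on the geohashing slide, which is a useful sanity check that the corner coordinates and the formula agree.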