GRÉGOIRE BUREL¹, HARITH ALANI¹
¹Knowledge Media Institute, The Open University, Milton Keynes, UK.
ISCRAM’18, Rochester, New York, USA.
20-23 May 2018.
Crisis Event Extraction Service
(CREES) – Automatic Detection and
Classification of Crisis-related
Content on Social Media
www.comrades-project.eu
evhart.github.io/crees
Social Media during Crises
How to obtain relevant information from this large amount
of heterogeneous high-velocity data?
Social media has become a common place for communities and
organisations to communicate and share information during crises,
to enhance their situational awareness, to share requests or offers
for help and support, and to coordinate their recovery efforts.
Examples of Twitter usage during crises (over 200 million active users and over 400 million posts a day):
1. During the 2011 Japan earthquake, 177 million tweets related
to the event were sent in one day.
2. The news about the Boston bombings first appeared on
Twitter.
Event Detection and Crisis Situations
Event detection is “the task of automatically
identifying certain clues in texts that denote a
specific event type or theme”.
- Helps identify and respond to events.
- Organises relevant information during crises.
- Prioritises response.
Social Media Event Detection Challenges
Typical (Manual) Processing Pipeline
During crises, data is mostly processed manually:
- Collect: search/browse social media websites and copy/paste into a spreadsheet.
- Curation and filtering: manually select relevant posts during collection.
- Analyse (classification, entity extraction): annotate data, information priority, etc.
- Understand and visualise: use visualisation applications (e.g., Ushahidi, OSM) or libraries.
Automatic Processing Pipeline
Supervised or semi-supervised methods as well as APIs and automatic tools can be used.
- Collect: use APIs for collecting data.
- Curation and filtering: use event detection models for filtering documents.
- Classification: supervised / non-supervised classification models.
- Entity extraction: Named Entity Recognition and Entity Linking.
- Visualisation: use visualisation applications (e.g., Ushahidi, OSM) or libraries.
Automatic Event Classification / API / Integration
Classification API Integration
Analyse Understand and Visualise
Automatic Classification of Crisis-related Content using CNNs
Crisis-Related Event Detection Tasks
Crisis-related event detection is often divided into three main tasks of increasing granularity (Olteanu et al., 2015):
- Task 1, crisis related / unrelated: differentiate the posts that are related or unrelated to crises.
- Task 2, crisis type: identify the type of crisis the message is related to (e.g., shooting, explosion, building collapse, fires, floods, meteorite fall, etc.).
- Task 3, information categories: differentiate the type of information contained in the message (e.g., affected individuals, infrastructures and utilities, donations and volunteer, caution and advice, etc.).
Automatic Text Classification
We can broadly distinguish two types of text classification approaches:
1. Unsupervised text classification: e.g., clustering, LDA.
+ Does not need predefined categories or existing annotations.
- Inferred classes are not typed and may have limited usefulness.
2. Supervised methods: e.g., SVM, CNN.
+ More precise and accurate classifications.
- Need annotated content.
‘Traditional’ ML vs. Deep Learning
Deep Learning
- Artificial neural networks.
- Minimal feature engineering.
- Word embeddings (Bengio et al., 2013).
‘Traditional’ ML
- Standard classifiers (e.g., SVM,
J48…).
- Feature engineering (e.g.,
lemmatisation, TF-IDF…).
- Bag of words.
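The bag-of-words and TF-IDF features mentioned above can be sketched in plain Python. This is a minimal illustration only: the function name `tfidf_vectors`, the whitespace tokenisation and the `tf * log(N / df)` weighting variant are assumptions, not the feature pipeline used for the baselines.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    # Naive tokenisation (assumption): lowercase and split on whitespace.
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up example messages, for illustration only.
tweets = [
    "flood warning issued for the river",
    "earthquake shakes the city",
    "the concert was great",
]
vectors = tfidf_vectors(tweets)
```

A term that occurs in every message (here "the") gets a weight of zero, while terms specific to one message (such as "flood") keep a positive weight. This is what makes such vectors usable as input for classifiers like SVM.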
CNN for Sentence Classification (Kim, 2014)
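The convolution and max-over-time pooling at the core of this kind of model can be sketched in plain Python. This is a toy illustration with made-up 2-dimensional embeddings and hand-picked filter weights, not the CREES implementation. The function names are hypothetical, and ReLU is assumed as the non-linearity.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def conv_max_feature(embeddings, filt, bias=0.0):
    """Slide one filter over the sentence and return its max-over-time feature.

    embeddings: one list of floats per token.
    filt: h weight vectors, i.e. a filter spanning h consecutive tokens.
    """
    h = len(filt)
    feature_map = []
    for start in range(len(embeddings) - h + 1):
        window = embeddings[start:start + h]
        # Dot product between the filter and the window of h word vectors.
        activation = bias + sum(
            w * x
            for w_vec, x_vec in zip(filt, window)
            for w, x in zip(w_vec, x_vec)
        )
        feature_map.append(relu(activation))
    # Max-over-time pooling keeps the strongest response of this filter.
    return max(feature_map) if feature_map else 0.0


def sentence_features(embeddings, filters):
    """One pooled feature per filter; these would feed the final classifier."""
    return [conv_max_feature(embeddings, f) for f in filters]


# Toy sentence of four tokens with 2-dimensional embeddings (assumption).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],              # filter spanning 2 tokens
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],  # filter spanning 3 tokens
]
features = sentence_features(sentence, filters)
```

In a full model, many filters of several widths are learned, and the pooled features are passed through dropout and a softmax layer.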
CNN Variations – Complexity and Responsiveness
More complex CNN models exist (e.g., semantic CNN, multimodal CNN, GCN), but they:
- Require additional pre-processing when making predictions.
- Require more time for training or re-training.
- May not yield a large improvement over a standard CNN.
Standard CNN models do not require complex pre-processing steps and can be more responsive for real-time applications.
CNN Classification – Experimental Setup
Dataset: T26 (28,000 annotated tweets)
- 12 crisis types (shooting, explosion, building collapse, fires, floods, meteorite fall, haze, bombing, typhoon, crash, earthquake, and derailment).
- 6 information categories (affected individuals, infrastructures and utilities, donations and volunteer, caution and advice, sympathy and emotional support, and other useful information).
CNN Classification – Experimental Setup
Dataset versions
- Full dataset: 28,000 tweets.
- Balanced datasets: 6,703 tweets (24%) / 12,997 tweets (46.5%) / 9,105 tweets (32.6%).
Baselines
- Naïve Bayes / CART / SVM: classical ML models using the words’ TF-IDF vectors extracted from our dataset.
Evaluation
- 5-fold cross-validation.
- CNN: 300-dimensional embeddings, Fn = 128 convolutional filters of sizes Fs = [3, 4, 5], 0.5 dropout and the Adam optimiser.
- Evaluation measures: precision (P), recall (R) and F1.
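The evaluation protocol above can be illustrated with a small sketch of the P, R and F1 measures for one class, together with a simple 5-fold split. The function names and the example labels are made up for illustration. The slides do not state how scores are averaged across classes or folds, so that step is left out.

```python
def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 for one class treated as the positive label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def k_fold_indices(n_items, k=5):
    """Yield (train, test) index lists; item i lands in test fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


# Made-up gold labels and predictions for the relatedness task.
gold = ["related", "related", "unrelated", "related", "unrelated"]
pred = ["related", "unrelated", "unrelated", "related", "related"]
p, r, f1 = precision_recall_f1(gold, pred, positive="related")

folds = list(k_fold_indices(10, k=5))
```

With five folds, each item is used for testing exactly once and for training four times.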
Results
+ CNN results are similar to SVM.
+ F1 > 83% for relatedness and event type detection.
+ CNN (word embeddings) is potentially better at generalising to unseen datasets than SVM.
- Crisis-information categories are hard to classify (61% F1).
- The studied datasets are relatively small for the task (small amount of training data).
- More complex deep learning models can perform better (e.g., Sem-CNN).
CREES API
The COMRADES CREES services (Crisis Event Extraction Service) use CNNs to provide a REST API for annotating short text documents (e.g., tweets) by identifying:
1. Whether a document is related to a crisis.
2. The type of event discussed.
3. The type of information present in the document.
https://evhart.github.io/crees
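A client for such a service could look roughly like the sketch below. The host, the endpoint paths, the `text` query parameter and the `label` response field are all hypothetical placeholders chosen for illustration. The actual routes and response format should be taken from the CREES documentation at the URL above.

```python
import json
from urllib.parse import urlencode

# Hypothetical host; not the real CREES deployment.
BASE_URL = "https://crees.example.org"

# Hypothetical paths, one per classification task listed above.
ENDPOINTS = {
    "related": "/events/related",
    "event_type": "/events/type",
    "info_type": "/events/info",
}


def build_query(task, text):
    """Build a GET URL that asks one classifier to annotate a short text."""
    return f"{BASE_URL}{ENDPOINTS[task]}?{urlencode({'text': text})}"


def parse_label(body):
    """Read the predicted label from a JSON body ('label' is an assumed field)."""
    return json.loads(body)["label"]


url = build_query("related", "Flood warning in York")
# Sending the request would use e.g. urllib.request.urlopen(url).read().

# Made-up response body, used here instead of a real network call.
sample_body = '{"label": "related"}'
label = parse_label(sample_body)
```

Keeping URL construction and response parsing in separate functions makes it easy to swap in the real routes once they are known.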
CREES JSON API
CREES API Load Testing
- The CREES API was load tested using Locust.
- Approach:
  - Increase the number of users every second up to 1,000 users performing 12 GET queries per second, and record query latency and failures.
  - Perform the evaluation from an external network.
- Results: 5% failures with 1,000 users; no failures with 500 users (latency < 700 ms).
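The two figures reported above (failure rate and latency) can be derived from per-request records as in this minimal sketch. The record format, the function name and the sample values are invented for illustration. They are not Locust's output format or the measurements from the evaluation.

```python
def summarise_load_test(samples):
    """Summarise (latency_ms, succeeded) records, one tuple per request."""
    total = len(samples)
    failures = sum(1 for _, ok in samples if not ok)
    ok_latencies = [latency for latency, ok in samples if ok]
    return {
        "requests": total,
        "failure_rate_pct": 100.0 * failures / total if total else 0.0,
        # Latency is reported for successful requests only (assumption).
        "max_latency_ms": max(ok_latencies) if ok_latencies else None,
    }


# Made-up records: three successful requests and one failure.
samples = [(120.0, True), (340.0, True), (690.0, True), (1500.0, False)]
summary = summarise_load_test(samples)
```

Running the same summary at each user level (e.g., 500 and 1,000 users) gives the kind of comparison shown in the results line.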
Integration Examples
CREES Integration Examples
The CREES API can be integrated easily with different tools. We have already integrated it with:
- Google Sheets:
- Spreadsheets are commonly used during crises for collecting and annotating information.
- We created a Google Sheets add-on that automatically annotates spreadsheet cells or columns.
- Ushahidi (COMRADES Platform):
- The Ushahidi platform is a data collection and visualisation platform for situation awareness.
- CREES automatically annotates reports added to the platform.
Hurricane Harvey – Data Collection/Filtering
Google Sheets
- Solicit: information is solicited on social media.
- Collect and enrich: copy/paste and manually extract information into a spreadsheet.
CREES Google Sheets Add-on
Ushahidi Integration / COMRADES Platform
Summary and Conclusion
+ CREES is a CNN-based document classification API.
+ Open source and integrated in Google Sheets as an add-on and in the COMRADES Ushahidi platform.
+ F-measure > 83% for event identification and event type identification.
- Ability to identify information categories is still not reliable (F-measure 61%).
- No user authentication, caching or rate limiting.
evhart.github.io/crees
Questions
Email: g.burel@open.ac.uk
Twitter: @evhart
CREES: evhart.github.io/crees
