A Real-time System for Detecting Landslide Reports on Social Media using Artificial Intelligence

Ferda Ofli¹, Umair Qazi¹, Muhammad Imran¹, Julien Roch², Catherine Pennington³, Vanessa Banks³, Remy Bossu²
¹ Qatar Computing Research Institute
² European-Mediterranean Seismological Centre
³ British Geological Survey
ICWE 2022
Bari, Italy

Agenda
• Motivation
• System Design
• Model Development
• System Benchmark
• Real-world Deployment
• Conclusion

Motivation
Landslides cause thousands of deaths and millions of dollars in infrastructural damage worldwide each year.

Motivation
Landslide events are often under-reported and insufficiently documented.
The lack of such important data not only hinders humanitarian aid but also impedes scientific research.
Credit: Petley, D. Geology (2012)

Existing Approaches
• On-the-ground surveys
• Satellite imagery analysis
Image credit: © BGS
- Expensive
- Time-consuming
- Impractical or not applicable

Existing Approaches – Citizen Science (I)
NASA Landslide Reporter
Juang et al., “Using citizen science to expand the global map of landslides: Introducing the Cooperative Open Online Landslide Repository”, PLOS ONE 2019.

Existing Approaches – Citizen Science (II)
Mobile Applications
• Kocaman & Gokceoglu, “A CitSci app for landslide data collection”, Landslides 2019.
• Sellers et al., “MARLI: a mobile application for regional landslide inventories in Ecuador”, Landslides 2021.
Not easily scalable, as they require the active participation of volunteers who opt in to use a particular application.

Goal
Identify landslide reports on social media seamlessly and at a much larger scale

Detecting Landslides in Tweets
[Figure: labels shown on the slide: motorcycle accident, heavy rainfall, earthquake, wildfire, tropical cyclone, on fire, flooded, car accident]

System Architecture – Image Pipeline

System Architecture – Text Pipeline

Duplicate Filter
• Image features extracted from the penultimate layer of a ResNet-50
model pre-trained on the Places dataset
• Threshold based on Euclidean distance
• 600 image pairs (460 duplicate / 140 non-duplicate)
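The thresholding idea above can be sketched as follows. This is a minimal illustration, not the system's implementation: it assumes feature vectors have already been extracted, and it assumes the threshold is picked by maximizing accuracy on labeled pairs (the slide mentions 600 labeled pairs but does not say how the threshold was chosen). The function names `euclidean`, `is_duplicate`, and `pick_threshold` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_duplicate(a: Sequence[float], b: Sequence[float], threshold: float) -> bool:
    """Two images count as duplicates when their features are close enough."""
    return euclidean(a, b) <= threshold


def pick_threshold(pairs: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Pick the distance threshold with the highest accuracy on labeled pairs.

    `pairs` holds (distance, is_duplicate_label) tuples. Only distances that
    occur in the data are tried as candidates (an assumption of this sketch).
    Returns (threshold, accuracy); ties keep the smallest threshold.
    """
    best_t, best_acc = 0.0, -1.0
    for t in sorted({d for d, _ in pairs}):
        correct = sum((d <= t) == label for d, label in pairs)
        acc = correct / len(pairs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

With a labeled set such as the 460 duplicate and 140 non-duplicate pairs, `pick_threshold` would be called once offline, and `is_duplicate` would then be applied to each incoming image against recently seen ones.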

Junk Filter
• Fine-tune a ResNet-50 model, pre-trained on the ImageNet dataset, using a custom dataset introduced by Nguyen et al. [ISCRAM 2017]
Nguyen et al., “Automatic Image Filtering on Social Networks Using Deep Learning and Perceptual Hashing During Crises”, ISCRAM 2017.

Landslides
Landslide Rockslide Mudslide
Keywords: landslide, landslip, earth slip, mudslide, mudflow, rockslide, rock fall, cliff fall
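As an illustration of how text could be checked against the keyword list above, here is a minimal sketch. It is an assumption, not the deployed matching logic: the slides do not describe how keywords are matched, and the live system uses 339 keywords in 32 languages rather than this English list. The helpers `build_matcher` and `find_keywords` are hypothetical names.

```python
import re
from typing import Iterable, List

# English keywords copied from the slide above.
KEYWORDS = [
    "landslide", "landslip", "earth slip", "mudslide",
    "mudflow", "rockslide", "rock fall", "cliff fall",
]


def build_matcher(keywords: Iterable[str]) -> re.Pattern:
    """Compile one case-insensitive pattern for all keywords.

    Multi-word keywords tolerate any run of whitespace between words, and an
    optional trailing "s" catches simple plurals such as "landslides".
    """
    ordered = sorted(keywords, key=len, reverse=True)
    parts = [r"\s+".join(re.escape(w) for w in kw.split()) for kw in ordered]
    return re.compile(r"\b(?:" + "|".join(parts) + r")s?\b", re.IGNORECASE)


def find_keywords(text: str, matcher: re.Pattern) -> List[str]:
    """Return matched keywords in order of appearance, lowercased and with
    internal whitespace normalized to single spaces."""
    return [" ".join(m.group(0).lower().split()) for m in matcher.finditer(text)]
```

A tweet would be kept for further processing when `find_keywords` returns a non-empty list.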

Collection of Landslide Images
• Downloaded from Google and Twitter using keywords
• Donated by BGS

Labeling Methodology
• Manual annotation by three landslide specialists
• Several rounds of discussion to agree on a labeling methodology
• Computer vision (CV)-based interpretation differs from desk- or field-based landslide identification
Pennington et al., “A near-real-time global landslide incident reporting tool demonstrator using social media and artificial intelligence”, IJDRR 2022.

Final Dataset
• Inter-annotator agreement
  • Fleiss’ Kappa = 0.58 (almost substantial)
  • Percent agreement = 76%
• Imbalanced class distribution
  • 23% landslide vs. 77% not-landslide
               Google  Twitter    BGS   Total
Landslide       1,240      598    852   2,690
Not-landslide   5,044      555  3,448   9,047
Total           6,284    1,153  4,300  11,737
Pennington et al., “A near-real-time global landslide incident reporting tool demonstrator using social media and artificial intelligence”, IJDRR 2022.
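For readers unfamiliar with the agreement statistic reported on this slide, the standard Fleiss' kappa formula can be sketched as below. This is a generic illustration on made-up rating counts, not the annotation data from the study; the function name `fleiss_kappa` and the input layout are assumptions of this sketch.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a fixed number of raters per item.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every row must sum to the same number of raters n (n >= 2).
    Undefined (division by zero) when all ratings fall in a single category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_cats = len(counts[0])

    # Share of all ratings that went to each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_item) / n_items          # mean observed agreement
    p_exp = sum(p * p for p in p_cat)      # agreement expected by chance
    return (p_bar - p_exp) / (1 - p_exp)
```

For three annotators and two classes (landslide, not-landslide), each row would be a pair such as `[2, 1]`, meaning two annotators chose landslide and one chose not-landslide.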

Landslide Model Training
• Fine-tune a ResNet-50 model, pre-trained on the ImageNet dataset, using the home-grown dataset.
Ofli et al., “Landslide Detection in Real-Time Social Media Image Streams”, arXiv preprint arXiv:2110.04080, 2021.

Qualitative Analysis with t-SNE

Class Activation Maps – True Positives

Class Activation Maps – True Negatives

Class Activation Maps – False Positives

Class Activation Maps – False Negatives

Geolocation Tagger
Qazi et al., “GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19 Tweets with Location Information”, ACM SIGSPATIAL Special, vol. 12, pp. 6-15, 2020.

Performance Evaluation & Benchmarking
• Stress-test the system and understand its scalability
• Latency
  • Time taken by a module to process a given input load
• Throughput
  • Number of items processed per unit time (one second) for a given input load
• Critical system components
  • Duplicate filter
  • Junk filter
  • Landslide detector
  • Geolocation tagger
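The two metrics defined above can be measured with a harness along these lines. This is a minimal single-process sketch under stated assumptions: the slides do not describe how input load was generated or whether modules ran concurrently or in batches, so `benchmark`, `sweep`, and the stand-in workload are hypothetical.

```python
import time
from typing import Callable, Dict, List, Sequence


def benchmark(process: Callable[[object], object], items: List[object]) -> Dict[str, float]:
    """Run `process` over a batch and report latency and throughput.

    latency_s: wall-clock seconds to process the whole input load.
    throughput_per_s: items processed per second for that load.
    """
    start = time.perf_counter()
    for item in items:
        process(item)
    latency = time.perf_counter() - start
    throughput = len(items) / latency if latency > 0 else float("inf")
    return {"load": len(items), "latency_s": latency, "throughput_per_s": throughput}


def sweep(
    process: Callable[[object], object],
    make_item: Callable[[int], object],
    loads: Sequence[int] = (1, 2, 4, 8, 16),
) -> List[Dict[str, float]]:
    """Benchmark `process` at increasing input loads (doubling by default)."""
    return [benchmark(process, [make_item(i) for i in range(n)]) for n in loads]
```

In practice `process` would wrap one module (for example the junk filter's inference call) and `make_item` would supply a representative input such as a decoded image.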

[Figure: Duplicate Filter. Two plots against input load (per second, from 0 to 4096): latency in seconds (axis 0 to 150) and throughput in items/second (axis 0 to 40).]

[Figure: Junk Filter. Two plots against input load (per second, from 0 to 4096): latency in seconds (axis 0 to 15) and throughput in items/second (axis 0 to 500).]

[Figure: Landslide Detector. Two plots against input load (per second, from 0 to 4096): latency in seconds (axis 0 to 20) and throughput in items/second (axis 0 to 500).]

[Figure: Geolocation Tagger, with cache vs. without cache. Two plots against input load (per second, from 0 to 4096): latency in seconds (axis 0 to 300) and throughput in items/second (axis 0 to 60).]

Real-world Deployment
• Online since February 2020, monitoring the live Twitter stream globally
  • 339 multilingual keywords in 32 languages
• February 2020 – December 2021
  • Collected more than 54 million tweets and 15 million image URLs
  • ~2.5 million image URLs deemed unique and downloaded for further analysis
  • ~17,000 images classified as relevant, unique, and landslides
    • Corresponds to <1% of the collected images
    • Highlights the challenging nature of the problem
  • ~6,500 landslide reports were shared by personal accounts and ~4,500 by organizational accounts

Real-world Deployment – Data Statistics
[Figure: data volume (log scale, 1 to 1,000,000) over time from 2020-02-01 to 2021-12-31, with series for raw tweets, raw images, relevant images, non-duplicate images, and landslide images.]

Real-world Deployment – Verification
• Randomly sampled 3,600 images processed by the system
• Asked experts to label the sampled images
• System-predicted labels compared to expert annotations
                           True   False
Landslide (positive)        123      39
Not-landslide (negative)  3,395      43
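Standard classification metrics can be derived from the table above. The sketch below assumes the table reads as true positives = 123, false positives = 39, true negatives = 3,395, and false negatives = 43 (these sum to the 3,600 sampled images); the derived values are computed here and are not reported on the slide. The function name `classification_metrics` is hypothetical.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts as read from the verification table (an interpretation, see above).
metrics = classification_metrics(tp=123, fp=39, tn=3395, fn=43)
# precision ~ 0.759, recall ~ 0.741, f1 = 0.750, accuracy ~ 0.977
```

Because only about 5% of the sampled images are landslides, accuracy is dominated by the negative class; precision and recall say more about the landslide class itself.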

Real-world Deployment – Worldwide Reports
NASA landslide susceptibility map

Real-world Deployment – Country Maps

Real-world Deployment – Quarterly Maps
• The US, Ecuador, Colombia, and India experience significant landslide numbers all year round.
• In India, landslides become even more prevalent in Q3.
• Mexico experiences a significant increase in Q3.
• Prominent landslide numbers in Indonesia and Malaysia occur in Q1 and Q4.
• Prominent landslide numbers in the UK occur in Q1 and Q2.
• Turkey experiences most landslides in Q1 through Q3.

Conclusion
• An interdisciplinary collaboration between computer scientists (QCRI), seismologists (EMSC), and landslide specialists (BGS).
• The system leverages online social media data in real time to identify landslide reports automatically using state-of-the-art AI techniques
  • Reduces information overload by eliminating duplicate and irrelevant content
  • Identifies landslide images
  • Infers their geolocation
  • Categorizes the user type (organization or person)
• The real-world deployment shows the success of the system.

Conclusion
• We believe that our system can contribute to the harvesting of global landslide data and facilitate further landslide research.
• It can support global landslide susceptibility maps to provide situational awareness and improve emergency response and decision-making.
• Next steps:
  • Historical data analysis with ground truth from other sources, e.g., BGS, NASA, EM-DAT
  • Spatiotemporal detection of events

Thank you!
https://landslide-aidr.qcri.org/service.php
Please give us feedback!
