Towards Deep Learning from Twitter for
Improved Tsunami Alerts and Advisories
L. I. Lumb1 & J. R. Freemantle2
1York University & 2Independent
NH14A-03, 2017 AGU Fall Meeting
New Orleans, LA; December 11, 2017
Outline
● Motivation
● Previous Work
○ Text Classification
● Current Work
○ Natural Language Processing via Word Embeddings
○ Reanalysis of 2 Event Pairs
● Discussion
Geist, E.L., Titov, V.V., and Synolakis, C.E., 2006, Tsunami: wave of change: Scientific
American, v. 294, p. 56-63
Data extracted from Twitter via a Perl script that targets #earthquake
Lumb & Freemantle,
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
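The Perl harvester itself is not reproduced here; as a minimal sketch of the same idea in Python, assuming Tweepy 3.x and placeholder OAuth credentials (all four keys below are hypothetical):

import tweepy

# Hypothetical credentials; real ones come from a Twitter developer account
CONSUMER_KEY, CONSUMER_SECRET = "...", "..."
ACCESS_TOKEN, ACCESS_SECRET = "...", "..."

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)

# Page through recent tweets carrying the #earthquake hashtag
for tweet in tweepy.Cursor(api.search, q="#earthquake", lang="en").items(500):
    print(tweet.created_at, tweet.text)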
Key Takeaways of “earthquake” Spam Classification
● Twitter metadata (handles, hashtags, and URLs) contributes equally with Twitter
data (the unstructured text that comprises the body of a Tweet) in constructing
feature vectors - i.e., the semantic value of Twitter metadata is ignored
● Curation of training data is extremely important (e.g., for accuracy), but also
extremely time consuming, as this supervised learning is a manual process
● “earthquake” can be used in different contexts (e.g., geophysics vs. movies
vs. politics …) and can have subtly different meanings
Lumb & Freemantle,
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
Word Vectors
https://adriancolyer.files.wordpress.com/2016/04/word2vec-distributed-representation.png?w=600
"... a word is characterized by the company it keeps ..."
Firth (1957)
Firth, J.R. (1957). "A synopsis of linguistic theory 1930-1955". Studies in Linguistic
Analysis. Oxford: Philological Society: 1–32. Reprinted in F.R. Palmer, ed. (1968).
Selected Papers of J.R. Firth 1952-1959. London: Longman.
“earthquake” and its ‘closest’ 20 words
Lumb & Freemantle,
HPCS 2017, http://2017.hpcs.ca/ (accepted).
Word-Vector Workflow: NLP via GloVe + PyTorch
http://pytorch.org
https://nlp.stanford.edu/projects/glove/
Lumb & Freemantle,
HPCS 2017, http://2017.hpcs.ca/ (accepted).
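As a minimal sketch of this workflow, assuming the published glove.twitter.27B.50d.txt file (one word plus 50 floats per line) and recovering the 20 words closest to “earthquake” (cf. the previous slide) via PyTorch cosine similarity:

import torch
import torch.nn.functional as F

# Load the pre-trained GloVe vectors (a large file; fine for a sketch)
words, vecs = [], []
with open("glove.twitter.27B.50d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        words.append(parts[0])
        vecs.append([float(x) for x in parts[1:]])

emb = torch.tensor(vecs)                     # shape: (vocab, 50)
idx = {w: i for i, w in enumerate(words)}

# Cosine similarity of every vocabulary word against "earthquake"
query = emb[idx["earthquake"]].unsqueeze(0)  # shape: (1, 50)
sims = F.cosine_similarity(query, emb)       # shape: (vocab,)

# Top 21 hits; the first is "earthquake" itself, so report the next 20
top = torch.topk(sims, 21)
for i in top.indices.tolist()[1:]:
    print(words[i], round(sims[i].item(), 4))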
Preliminary Results: “earthquake” Cosine Similarities

Pre-Trained Vectors    Hammy Tweets    Spammy Tweets
GloVe 6B               0.1182          0.0097481
Twitter 27B            -0.033930       -0.064906

GloVe 6B = Wikipedia 2014 + Gigaword 5, 6B tokens, 400K vocab, uncased, 50d
Twitter 27B = 2B tweets, 27B tokens, 1.2M vocab, uncased, 50d
Lumb & Freemantle, HPCS 2017, http://2017.hpcs.ca/ (accepted).
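The similarity measure behind these tables is defined in the HPCS 2017 paper; one plausible reading, sketched below as an assumption rather than the paper's actual code, averages the pre-trained vectors of all words in a tweet corpus and takes the cosine similarity of that centroid against the “earthquake” vector:

import torch
import torch.nn.functional as F

def load_glove(path):
    """Read GloVe's text format: a word followed by its floats, per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = torch.tensor([float(x) for x in parts[1:]])
    return vectors

def corpus_similarity(tokens, vectors, target="earthquake"):
    """Cosine similarity between a tweet-corpus centroid and a target word."""
    hits = torch.stack([vectors[t] for t in tokens if t in vectors])
    centroid = hits.mean(dim=0, keepdim=True)
    return F.cosine_similarity(centroid, vectors[target].unsqueeze(0)).item()

# vectors = load_glove("glove.twitter.27B.50d.txt")
# corpus_similarity(hammy_tokens, vectors)   # compare against 0.1182 above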
Event Pairs Selected for Reanalysis

Tohoku
05:46 UTC, 11 March 2011
29 km depth, ~9.0 Mw earthquake & tsunami

Miyagi
14:32 UTC, 7 April 2011
49 km depth, 7.1 Mw earthquake only

Chiapas
04:49 UTC, 8 September 2017
50 km depth, 8.2 Mw earthquake & tsunami

Central Mexico
18:14 UTC, 19 September 2017
51 km depth, 7.1 Mw earthquake only

Curated according to start time ONLY
Re-analysis Results: “earthquake” Cosine Similarities

Pre-Trained Vectors    Tohoku 3/11/2011    Miyagi 4/7/2011
GloVe 6B               -0.2289             0.06455
Twitter 27B            -0.05655            -0.03156
# tweets / # words     1374 / 715          146 / 328

Pre-Trained Vectors    Chiapas 9/8/2017    Central Mexico 9/19/2017
GloVe 6B               -0.1306             -0.01169
Twitter 27B            0.1050              0.1273
# tweets / # words     304 / 468           415 / 759

GloVe 6B = Wikipedia 2014 + Gigaword 5, 6B tokens, 400K vocab, uncased, 50d
Twitter 27B = 2B tweets, 27B tokens, 1.2M vocab, uncased, 50d
“earthquake-tsunami” Similarity

Corpus            Cosine Similarity
GloVe 6B          0.8255
Twitter 27B       0.009244
Tohoku            0.7161
Miyagi            -0.2540
Chiapas           0.3156
Central Mexico    -0.001964

Vector size = 50
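The per-event values above appear to come from embeddings trained on each event's own tweets; as a sketch of that idea, with gensim's word2vec standing in for GloVe training and a two-tweet corpus that is purely illustrative:

from gensim.models import Word2Vec

# Tokenized tweets from one event's harvest (illustrative; the real
# corpora hold hundreds of tweets per event)
tweets = [
    ["major", "earthquake", "off", "tohoku", "tsunami", "warning", "issued"],
    ["tsunami", "waves", "observed", "after", "the", "earthquake"],
]

# 50-d vectors to match the pre-trained runs above (gensim 4.x keyword)
model = Word2Vec(tweets, vector_size=50, window=5, min_count=1, seed=42)

print(model.wv.similarity("earthquake", "tsunami"))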
Discussion
● Embedded word vectors superior to text classification in isolating
geophysically relevant content
○ Embeddings convey significantly enhanced semantic value over bland features
○ Unsupervised learning replaces manually intensive requirement for close supervision
● Using NLP via embedded word vectors
○ Closest word and inter-corpora cosine similarities prove inconclusive in isolation
○ Intra-corpora cosine similarities (e.g., “earthquake-tsunami”) appear more promising in
isolating tsunami-producing earthquakes
○ Word-vector analogies require additional consideration
● Steps towards operationalization
○ Enable shift from offline reanalysis to online, real-time streaming
○ Focus efforts on the time interval between the earthquake and (potential) arrival of the tsunami
● Applicable in other disaster scenarios - e.g., hurricanes, wildfires, ...
Tsunami Advisories
if ( EARTHQUAKE ) then { TSUNAMI }

if ( Mw > 8.0 and TRENCH and DISPLACEMENT and DEEP WATER ) then { TSUNAMI }
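A Python rendering of the refined rule, with the threshold and predicates taken from the slide; how the boolean flags would be derived in practice is left open:

def tsunami_likely(mw, at_trench, seafloor_displacement, deep_water):
    """Refined advisory rule from the slide: a Mw > 8.0 event at a
    trench, displacing the seafloor under deep water."""
    return mw > 8.0 and at_trench and seafloor_displacement and deep_water

print(tsunami_likely(9.0, True, True, True))    # Tohoku-like: True
print(tsunami_likely(7.1, False, True, False))  # Central Mexico-like: False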
Q&A
L. I. Lumb1 & J. R. Freemantle2
1ianlumb@yorku.ca & 2james.freemantle@rogers.com
Additional Content
Motivation
● Non-deterministic cause
○ Uncertainty inherent in any attempt to predict earthquakes
■ In situ measurements may reduce uncertainty
● Lead times
○ Availability of actionable observations
○ Communication of situation - advisories, warnings, etc.
● Cause-effect relationship
○ Energy transfer - inputs ... coupling ... outputs
■ ‘Geometry’ - bathymetry and topography
○ Other factors - e.g., tides
● Established effect
○ Far-field estimates of tsunami propagation (pre-computed) and coastal inundation (real-time)
have proven to be extremely accurate ... but this requires:
● Distributed array of deep-ocean tsunami detection buoys + forecasting model
“earthquake” Spam Classification via Apache Spark
After Karau et al., Learning Spark, O’Reilly, 2015
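A condensed sketch of that pipeline, following the spam-classification pattern in Karau et al. (Spark's RDD-based MLlib API; the input file names are placeholders):

from pyspark import SparkContext
from pyspark.mllib.feature import HashingTF
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.classification import LogisticRegressionWithSGD

sc = SparkContext(appName="EarthquakeSpam")

# Tweets previously curated as spam (movies, politics, ...) or ham (geophysics)
spam = sc.textFile("spammy_tweets.txt")   # placeholder paths
ham = sc.textFile("hammy_tweets.txt")

# Map each tweet to a sparse term-frequency feature vector
tf = HashingTF(numFeatures=10000)
spam_feats = spam.map(lambda t: tf.transform(t.split(" ")))
ham_feats = ham.map(lambda t: tf.transform(t.split(" ")))

# Label the examples: 1 = spam, 0 = ham
data = spam_feats.map(lambda f: LabeledPoint(1, f)).union(
    ham_feats.map(lambda f: LabeledPoint(0, f))).cache()

model = LogisticRegressionWithSGD.train(data)
print(model.predict(tf.transform("earthquake shakes the box office".split(" "))))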
The Opportunity for Semantics
● A feature vector is a feature vector - it is devoid of semantics
● Ignores inherent, overall credibility of a Tweet - e.g., as quantified by
TweetCred
● Twitter metadata (handles, hashtags, and URLs) contributes equally with Twitter
data (the unstructured text that comprises the body of a Tweet) in constructing
feature vectors - i.e., the semantic value of Twitter metadata is also ignored
by Deep Learning
● The W3C’s Resource Description Framework (RDF) facilitates the
representation of metadata and thus exposes semantics
● The W3C’s Web Ontology Language (OWL) accounts for domain specifics -
disambiguates use of overloaded terms (e.g., “earthquake”) in different
contexts (e.g., geophysics vs. movies vs. …)
● Deep Learning in combination with RDF/OWL semantics has the potential to
produce learned models in which knowledge is explicitly represented, as
sketched below
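As a small sketch of the RDF side with rdflib; the ex: vocabulary (Tweet, hasHashtag, refersTo, SeismicEvent) is a hypothetical stand-in for a proper OWL ontology:

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/tsunami#")   # hypothetical ontology

g = Graph()
g.bind("ex", EX)

tweet = URIRef("http://twitter.com/statuses/123456")  # illustrative ID
g.add((tweet, RDF.type, EX.Tweet))
g.add((tweet, EX.hasHashtag, Literal("#earthquake")))
# Disambiguation: this tweet's "earthquake" is the geophysical sense
g.add((tweet, EX.refersTo, EX.SeismicEvent))

print(g.serialize(format="turtle"))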
PyTorch
http://pytorch.org/about/
● Python package that provides
○ Tensor computation – strong GPU acceleration, efficient memory usage
■ Integrated with NVIDIA CuDNN and NCCL libraries
○ Deep Neural Networks built on a tape-based autograd system
● Can leverage numpy, scipy and Cython as needed
● Available tutorials include Natural Language Processing (NLP)
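A minimal illustration of the tape-based autograd noted above:

import torch

# requires_grad=True tells the tape to record operations on x
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = x1^2 + x2^2 + x3^2

y.backward()         # replay the tape to accumulate gradients
print(x.grad)        # dy/dx = 2x -> tensor([2., 4., 6.])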
Big Data’s 6Vs
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
