Cody Buntain, Jimmy Lin, and Jennifer Golbeck presented a method called LABurst to discover key moments in social media streams without relying on pre-defined seed keywords. LABurst uses machine learning classifiers trained on bursty token patterns from sports event data to identify minutes with multiple bursting tokens as important moments. It outperforms baselines that detect bursts based only on frequency changes, achieving a best ROC-AUC of 89.84%. Bursty tokens at important World Cup moments included names of players who scored goals. LABurst also found moments in earthquake Twitter data, showing potential for other impactful domains.
15.
Event                              Tweet Count
Training Data
2010 NFL Division Championship         109,809
2012 Premier League Soccer Games     1,064,040
2014 NHL Stanley Cup Playoffs        2,421,065
2014 NBA Playoffs                      500,170
2014 Kentucky Derby Horse Race         233,172
2014 Belmont Stakes Horse Race         226,160
2014 FIFA World Cup Stages A+B       5,867,783
Testing Data
2013 MLB World Series Game 5         1,052,852
2013 MLB World Series Game 6         1,026,848
2013 Honshu Earthquake                 444,018
2014 NFL Super Bowl                  1,024,367
2014 FIFA World Cup Third Place        809,426
2014 FIFA World Cup Final            1,166,767
2014 Iwaki Earthquake                  358,966
Total                               16,305,443
Methods
19. How do we model these bursts?
Each token gets a feature vector v, built from:
Freq. Regression
ΔAverage Freq.
Inter-Arrival Time
Message Entropy
Network Density
TF-IDF
TF-PDF
BursT
Methods
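A minimal sketch of three of the listed features, assuming per-minute token counts, token timestamps, and the messages containing the token are available. Function names and windowing choices here are my own illustration, not the paper's implementation:

```python
import math
from collections import Counter

def delta_avg_freq(counts):
    """The slide's ΔAverage Freq.: difference between the current
    window's token count and the average over preceding windows."""
    *history, current = counts
    return current - sum(history) / len(history)

def mean_inter_arrival(timestamps):
    """Average gap (seconds) between consecutive appearances of a
    token; a bursting token shows a sharp drop in this value."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return sum(gaps) / len(gaps)

def message_entropy(messages):
    """Shannon entropy over the messages containing the token: low
    entropy suggests retweet floods, high entropy organic discussion."""
    counts = Counter(messages)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())
```

For example, a token seen 2, 3, 1, 2 times in past minutes and 40 times now yields a ΔAverage Freq. of 38.0, a strong burst signal.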
23. Baseline 1: RawBurst
Find "bursts" in Twitter's raw message frequency:
Current Freq − Avg Freq > Threshold ⇒ KEY MOMENT!
Evaluation
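The RawBurst baseline reduces to a single threshold test against a trailing average of per-minute message counts. A toy sketch (the five-minute history window is my assumption, not a value from the talk):

```python
from collections import deque

def raw_burst(minute_counts, threshold, history=5):
    """RawBurst-style baseline: flag minute i as a key moment when
    its message count exceeds the trailing average by `threshold`."""
    window = deque(maxlen=history)
    moments = []
    for minute, count in enumerate(minute_counts):
        if len(window) == history and count - sum(window) / history > threshold:
            moments.append(minute)
        window.append(count)
    return moments
```

A stream idling around 10 messages/minute that jumps to 50 trips the detector at that minute, e.g. `raw_burst([10, 11, 9, 10, 10, 50, 10], threshold=20)` returns `[5]`.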
24. Baseline 2: TokenBurst
Modify RawBurst to use the frequency of pre-specified seed tokens:
Current Freq − Avg Freq > Threshold

Sport          Seed Tokens
World Series   run, home, homerun
Super Bowl     score, touchdown, td, fieldgoal, points
World Cup      goal, gol, golazo, score, foul, penalty, card, red, yellow, points
Evaluation
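TokenBurst differs from RawBurst only in what gets counted: messages must contain at least one seed token before they enter the per-minute counts, after which the same threshold test applies. A sketch using the World Cup seed set from the table (the tokenization is my simplification):

```python
# Seed vocabulary for the World Cup, taken from the table above.
SEED_TOKENS = {"goal", "gol", "golazo", "score", "foul",
               "penalty", "card", "red", "yellow", "points"}

def token_burst_counts(minute_messages, seeds=SEED_TOKENS):
    """TokenBurst preprocessing: per minute, count only the messages
    that mention at least one seed token; the resulting series is fed
    to the same trailing-average threshold test as RawBurst."""
    return [sum(1 for msg in minute
                if seeds & set(msg.lower().split()))
            for minute in minute_messages]
```

The obvious limitation, and LABurst's motivation, is that the seed list must be written per domain in advance.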
26. How well does our method perform?
10-fold cross-validation: the best-scoring LABurst ensemble classifier achieves an ROC-AUC of 89.84% on the training data.
Results
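The reported ROC-AUC can be read as the probability that a randomly chosen positive minute (a true key moment) is scored above a randomly chosen negative minute, counting ties as half. A small self-contained implementation of that definition (not the paper's evaluation code):

```python
def roc_auc(labels, scores):
    """ROC-AUC via its probabilistic interpretation: fraction of
    positive/negative pairs ranked correctly, ties worth 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under this reading, 89.84% means roughly nine out of ten key-moment minutes outrank a random non-key minute.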
27. Which features are the most important?

Feature Set                 ROC-AUC   Difference
AdaBoost, All Features       89.84%       –
Without Regression           87.79%   −2.05
Without Entropy              87.94%   −1.90
Without TF-IDF               88.85%   −0.99
Without TF-PDF               89.00%   −0.84
Without Density              89.07%   −0.77
Without InterArrival         89.46%   −0.38
Without BursT                89.52%   −0.31
Without Average Difference   90.56%   +0.72
Results
33. Why is the Super Bowl hard?
[Figure panels: Training/Testing Data; Other Impactful Moments]
Discussion
34. What was bursting at these moments?

Match: Brazil v. Netherlands, 12 July 2014
Event: Netherlands' Van Persie scores a goal on a penalty at 3', 1-0
Bursty tokens: 0-1, 1-0, 1:0, 1x0, card, goaaaaaaal, goal, gol, goool, holandaaaa, kırmızı, pen, penal, penalti, pênalti, persie, red

Match: Brazil v. Netherlands, 12 July 2014
Event: Brazil's Oscar gets a yellow card at 68'
Bursty tokens: dive, juiz, penalty, ref

Match: Germany v. Argentina, 13 July 2014
Event: Germany's Götze scores a goal at 113', 1-0
Bursty tokens: goaaaaallllllll, goalllll, godammit, goetze, gollllll, gooooool, gotze, gotzeeee, götze, nooo, yessss
Discussion
38. Can these models be useful in other domains?
Earthquake detection:
Honshu, Japan earthquake, 25 October 2013
Iwaki, Japan earthquake, 11 July 2014
LABurst simultaneously detects spikes about the earthquake, and also detects an aftershock.
Discussion
39. Conclusions
Can discover key moments from Twitter streams without seed tokens
45. How do we train these classifiers?
Examples of bursty tokens: saints, peterson, 7-0, 1-0, touchdown, score, goal, penalty, td, fumble, persie, messi, tonalist
Examples of non-bursty tokens (stop words): the, i, me, my, myself, we, our, ours, ourselves, you, before, after, above, below, to, from, up, down, in, out, on
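The labeling scheme above can be sketched as follows: tokens known to burst at game events become positive examples, stop words (which never burst meaningfully) become negatives, and everything else is left unlabeled. The token lists and function name are illustrative, not the paper's exact setup:

```python
# Negative class: stop words, per the slide's examples.
STOP_WORDS = {"the", "i", "me", "my", "myself", "we", "our", "ours",
              "ourselves", "you", "before", "after", "above", "below",
              "to", "from", "up", "down", "in", "out", "on"}
# Positive class: tokens that burst at known game events.
BURSTY = {"touchdown", "score", "goal", "penalty", "td", "fumble",
          "persie", "messi", "tonalist"}

def build_training_set(token_vectors):
    """Turn a {token: feature_vector} map into labeled (vector, label)
    pairs for a supervised classifier such as the AdaBoost ensemble
    reported earlier; unlabeled tokens are simply skipped."""
    examples = []
    for token, vec in token_vectors.items():
        if token in BURSTY:
            examples.append((vec, 1))
        elif token in STOP_WORDS:
            examples.append((vec, 0))
    return examples
```

This sidesteps hand-labeling minutes: ground truth comes from token identity, so any sport with known event vocabulary can contribute training data.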