Cody Buntain, Jimmy Lin, and Jennifer Golbeck presented a method called LABurst to discover key moments in social media streams without relying on pre-defined seed keywords. LABurst uses machine learning classifiers trained on bursty token patterns from sports event data to identify minutes with multiple bursting tokens as important moments. It outperforms baselines that detect bursts based only on frequency changes, achieving a best ROC-AUC of 89.84%. Bursty tokens at important World Cup moments included names of players who scored goals. LABurst also found moments in earthquake Twitter data, showing potential for other impactful domains.
15.
Event                              Tweet Count
Training Data
2010 NFL Division Championship         109,809
2012 Premier League Soccer Games     1,064,040
2014 NHL Stanley Cup Playoffs        2,421,065
2014 NBA Playoffs                      500,170
2014 Kentucky Derby Horse Race         233,172
2014 Belmont Stakes Horse Race         226,160
2014 FIFA World Cup Stages A+B       5,867,783
Testing Data
2013 MLB World Series Game 5         1,052,852
2013 MLB World Series Game 6         1,026,848
2013 Honshu Earthquake                 444,018
2014 NFL Super Bowl                  1,024,367
2014 FIFA World Cup Third Place        809,426
2014 FIFA World Cup Final            1,166,767
2014 Iwaki Earthquake                  358,966
Total                               16,305,443
Methods
19. How do we model these bursts?
Each token gets a feature vector v, built from:
Freq. Regression
ΔAverage Freq.
Inter-Arrival Time
Message Entropy
Network Density
TF-IDF
TF-PDF
BursT
Methods
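A minimal sketch of three of the listed features, assuming per-minute token counts, token timestamps, and the messages containing the token are available. Function names and windowing choices here are my own illustration, not the paper's implementation:

```python
import math
from collections import Counter

def delta_avg_freq(counts):
    """The slide's ΔAverage Freq.: difference between the current
    window's token count and the average over preceding windows."""
    *history, current = counts
    return current - sum(history) / len(history)

def mean_inter_arrival(timestamps):
    """Average gap (seconds) between consecutive appearances of a
    token; a bursting token shows a sharp drop in this value."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return sum(gaps) / len(gaps)

def message_entropy(messages):
    """Shannon entropy over the messages containing the token: low
    entropy suggests retweet floods, high entropy organic discussion."""
    counts = Counter(messages)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())
```

For example, a token seen 2, 3, 1, 2 times in past minutes and 40 times now yields a ΔAverage Freq. of 38.0, a strong burst signal.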
23. Baseline 1: RawBurst
Find "bursts" in Twitter's raw message frequency:
Current Freq − Avg Freq > Threshold ⇒ KEY MOMENT!
Evaluation
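The RawBurst baseline reduces to a single threshold test against a trailing average of per-minute message counts. A toy sketch (the five-minute history window is my assumption, not a value from the talk):

```python
from collections import deque

def raw_burst(minute_counts, threshold, history=5):
    """RawBurst-style baseline: flag minute i as a key moment when
    its message count exceeds the trailing average by `threshold`."""
    window = deque(maxlen=history)
    moments = []
    for minute, count in enumerate(minute_counts):
        if len(window) == history and count - sum(window) / history > threshold:
            moments.append(minute)
        window.append(count)
    return moments
```

A stream idling around 10 messages/minute that jumps to 50 trips the detector at that minute, e.g. `raw_burst([10, 11, 9, 10, 10, 50, 10], threshold=20)` returns `[5]`.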
24. Baseline 2: TokenBurst
Modify RawBurst to use the frequency of pre-specified seed tokens:
Current Freq − Avg Freq > Threshold

Sport          Seed Tokens
World Series   run, home, homerun
Super Bowl     score, touchdown, td, fieldgoal, points
World Cup      goal, gol, golazo, score, foul, penalty, card, red, yellow, points
Evaluation
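TokenBurst differs from RawBurst only in what gets counted: messages must contain at least one seed token before they enter the per-minute counts, after which the same threshold test applies. A sketch using the World Cup seed set from the table (the tokenization is my simplification):

```python
# Seed vocabulary for the World Cup, taken from the table above.
SEED_TOKENS = {"goal", "gol", "golazo", "score", "foul",
               "penalty", "card", "red", "yellow", "points"}

def token_burst_counts(minute_messages, seeds=SEED_TOKENS):
    """TokenBurst preprocessing: per minute, count only the messages
    that mention at least one seed token; the resulting series is fed
    to the same trailing-average threshold test as RawBurst."""
    return [sum(1 for msg in minute
                if seeds & set(msg.lower().split()))
            for minute in minute_messages]
```

The obvious limitation, and LABurst's motivation, is that the seed list must be written per domain in advance.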
26. How well does our method perform?
10-fold cross-validation: the best-scoring LABurst ensemble classifier achieves an ROC-AUC of 89.84% on the training data.
Results
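The reported ROC-AUC can be read as the probability that a randomly chosen positive minute (a true key moment) is scored above a randomly chosen negative minute, counting ties as half. A small self-contained implementation of that definition (not the paper's evaluation code):

```python
def roc_auc(labels, scores):
    """ROC-AUC via its probabilistic interpretation: fraction of
    positive/negative pairs ranked correctly, ties worth 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under this reading, 89.84% means roughly nine out of ten key-moment minutes outrank a random non-key minute.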
27. Which features are the most important?

Feature Set                 ROC-AUC   Difference
AdaBoost, All Features       89.84%       –
Without Regression           87.79%   −2.05
Without Entropy              87.94%   −1.90
Without TF-IDF               88.85%   −0.99
Without TF-PDF               89.00%   −0.84
Without Density              89.07%   −0.77
Without InterArrival         89.46%   −0.38
Without BursT                89.52%   −0.31
Without Average Difference   90.56%   +0.72
Results
33. Why is the Super Bowl hard?
[Figure panels: Training/Testing Data; Other Impactful Moments]
Discussion
34. What was bursting at these moments?

Match: Brazil v. Netherlands, 12 July 2014
Event: Netherlands' Van Persie scores a goal on a penalty at 3', 1-0
Bursty tokens: 0-1, 1-0, 1:0, 1x0, card, goaaaaaaal, goal, gol, goool, holandaaaa, kırmızı, pen, penal, penalti, pênalti, persie, red

Match: Brazil v. Netherlands, 12 July 2014
Event: Brazil's Oscar gets a yellow card at 68'
Bursty tokens: dive, juiz, penalty, ref

Match: Germany v. Argentina, 13 July 2014
Event: Germany's Götze scores a goal at 113', 1-0
Bursty tokens: goaaaaallllllll, goalllll, godammit, goetze, gollllll, gooooool, gotze, gotzeeee, götze, nooo, yessss
Discussion
38. Can these models be useful in other domains?
Earthquake detection:
Honshu, Japan earthquake, 25 October 2013
Iwaki, Japan earthquake, 11 July 2014
LABurst simultaneously detects spikes about the earthquake, and also detects an aftershock.
Discussion
39. Conclusions
Can discover key moments from Twitter streams without seed tokens
45. How do we train these classifiers?
Examples of bursty tokens: saints, peterson, 7-0, 1-0, touchdown, score, goal, penalty, td, fumble, persie, messi, tonalist
Examples of non-bursty tokens (stop words): the, i, me, my, myself, we, our, ours, ourselves, you, before, after, above, below, to, from, up, down, in, out, on
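The labeling scheme above can be sketched as follows: tokens known to burst at game events become positive examples, stop words (which never burst meaningfully) become negatives, and everything else is left unlabeled. The token lists and function name are illustrative, not the paper's exact setup:

```python
# Negative class: stop words, per the slide's examples.
STOP_WORDS = {"the", "i", "me", "my", "myself", "we", "our", "ours",
              "ourselves", "you", "before", "after", "above", "below",
              "to", "from", "up", "down", "in", "out", "on"}
# Positive class: tokens that burst at known game events.
BURSTY = {"touchdown", "score", "goal", "penalty", "td", "fumble",
          "persie", "messi", "tonalist"}

def build_training_set(token_vectors):
    """Turn a {token: feature_vector} map into labeled (vector, label)
    pairs for a supervised classifier such as the AdaBoost ensemble
    reported earlier; unlabeled tokens are simply skipped."""
    examples = []
    for token, vec in token_vectors.items():
        if token in BURSTY:
            examples.append((vec, 1))
        elif token in STOP_WORDS:
            examples.append((vec, 0))
    return examples
```

This sidesteps hand-labeling minutes: ground truth comes from token identity, so any sport with known event vocabulary can contribute training data.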