SlideShare a Scribd company logo
1 of 25
Download to read offline
Category-Level Transfer Learning from Knowledge Base
to Microblog Stream for Accurate Event Detection
Weijing Huang, Tengjiao Wang, Wei Chen, Yazhou Wang
School of Electronics Engineering and Computer Science, Peking University
@DASFAA 2017, Suzhou, China
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 1 / 25
Motivation
Many Web applications need the accurate event detection technique on
microblog stream, including:
1 public opinion analysis [Chen, SIGIR 2013]
2 public security [Li, ICDE 2012], [Imran, WWW 2014]
3 disaster response [Sakaki,WWW 2010]
4 breaking news report1
But detecting events on twitter stream accurately is still challenging.
1
http://www.theverge.com/2016/12/1/13804542/
reuters-algorithm-breaking-news-twitter
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 2 / 25
Challenges (1/2)
According to [Huang, WWW 2016], the challenges include,
1 fast changing
2 high noise
3 short length
And, we found another key factor,
1 Small events with fewer tweets Hard to trade off between precision
and recall.
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 3 / 25
Challenges (2/2)
Exploratory study on the Edinburgh twitter corpus: 11/27 events contain
less than 50 tweets.
Table: Statistics of labeled events.
Event Date Event Size
S&P downgrade US credit rating 05/08/2011 656
Atlantis shuttle lands 21/07/2011 595
US increases debt ceiling 25/07/2011 485
Plane with Russian hocky team Lokomotiv crashes 07/09/2011 286
Amy Winehouse dies 23/07/2011 283
Gunman opens fire in youth camp in Norway 23/07/2011 260
Earthquake in Virginia 24/08/2011 246
First victim of London riots dies 09/08/2011 174
Explosion in French nuclear plant in Marcoule 12/09/2011 135
Google announces plans to bury Motorola Mobility 15/08/2011 127
NASA announces there might be water on Mars 04/08/2011 124
Car bomb explodes in Oslo, Norway 22/07/2011 114
... ... ...
Indian and Bangladesh sign a border pact 06/09/2011 25
Flight 4896 crash 13/07/2011 21
First aritficial organ transplant 12/07/2011 18
three men die in riots in england 10/08/2011 16
rebels capture interational tripoli airport 21/08/2011 13
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 4 / 25
11
Small
events
with
fewer
tweets
How about existing methods? (1/2)
Event detection methods without extra information, such as
1 clustering articles
LSH[Petrovic, NAACL 2010]
need to set threshold to determine whether new article represents a
new event.
2 analyzing word frequencies
EDCoW[Weng, ICWSM 2011]
treat the word as the basic unit in analysis, without regarding polysemy
words (words have different meanings, e.g. “apple”)
3 finding bursty topics via topic modeling
TimeUserLDA[Diao, ACL 2012], BurstyBTM[Yan, AAAI 2015]
detects the “large” events but may ignore the “small” ones.
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 5 / 25
How about existing methods? (2/2)
Event detection methods by leveraging extra information
1 typical one: Twevent[Li, CIKM 2012]
divides the tweet into segments according to the Microsoft Web
N-Gram service and Wikpedia
detects the bursty segments and cluster these segments into candidate
events
still has to trade off between precision and recall
e.g., the bursty segment in the not-so-popular event “first artificial
organ transplant” is missed
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 6 / 25
An Example
Much easier to detect the event on the time series of word “hood”
related to Military, without adjusting the threshold.
Jul 2011 Jul 2011
18 19 20 21 22 23 24 25 26 27 28 29 30 31
0
25
50
75
100
Document
Frequency
Ft. Hood Attack
word "hood"
word "hood"
(about Military)
Figure: The comparison of the time series between the raw word hood and the
Military related word hood, computed on the Edinburgh twitter corpus. Refer the
event to https://en.wikipedia.org/wiki/Fort_Hood#2011_attack_plot.
An infant wearing a hood.
k and
he head,
n from
orm of
hood is
are used
ses where
the
bers,
(a)
Fort Hood
From Wikipedia, the free encyclopedia
Fort Hood is a U.S. military post located in Killeen, Texas. The
post is named after Confederate General John Bell Hood. It is
located halfway between Austin and Waco, about 60 miles
(b)Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 7 / 25
The insights on the example
Knowledge Base
Well organized
Constructed elaborately
full of rich information
Microblog stream
Short length
Fast changing
High noise
The benefit of enriching the semantics and filtering out noise by
Knowledge Base for microblogs is attractive.
But it’s expensive to retrieve every word of tweets in the Knowledge
Base.
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 8 / 25
Our solution
TransDetector: a novel category-level transfer learning method
transfer KB’s category-level info into microblog stream
balance the performance and the cost of leveraging knowledge base
for event detection
knowledge base
information
about Military
docs@2011-7-28
in the stream
Texture
Texture
Texture
army
military
…
shootings
projectiles
…
killed
news
#libya
libya
…
hood
…
analysis on words
“killed”, “news”, …
category-level topic in
microblog stream
@2011-07-28, about Military
category-level topic
in knowledge base,
about Military
2011-07-28
event: “attack
at Fort Hood”,
event: “#libya
Libyan rebels
chief killed” …
1
2
3
Figure: TransDetector’s processing flow, in 3 phases.
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 9 / 25
TransDetector: Phase 1 (Extracting Category-Level
Topics in KB) (1/3)
There is a three fold hierarchical structure in Knowledge Base.
1 Taxonomy Graph G(0). The directed edges in G(0) represent the
class→subclass relations in KB.
e.g., Main topic classifications → Society
2 Category-Page Bipartite Graph G(1). The directed edges in G(1)
represent the class→instance relations in KB.
e.g., Military → page: Armed Forces
3 Page-Content Map G(2). For a specific Wikipedia dumps version,
the edges page →content in G(2) define a one-to-one mapping.
e.g., page: Armed Forces → content(20170325version): “The armed
forces of a country are its government-sponsored defense, fighting
forces, and organizations...”
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 10 / 25
TransDetector: Phase 1 (Extracting Category-Level
Topics in KB) (2/3)
Algorithm 1: Extraction of Category-Level Topics in Knowledge Base
Input: Taxonomy’s Graph G(0), Category-Page Bipartite Graph G(1), Page-Content Bipartite
Graph G(2), topic related category node c
Output: c’s category-level topic in knowledge base hc
1 Pages(c) ← ∅, hc ← ∅
2 DAG G(0) ← Remove Cycles of G(0) by nodes’ HITS-PageRank scores.
3 SuccessorNodes(c) ← Breadth-first-traverse(G(0) , c)
4 for node ∈ SuccessorNodes(c) do
5 Pages(c) ← Pages(c) ∪ G(1).neighbours(node)
6 Word frequency table n(c, .) ← do word count on the text contents of Pages(c)
7 Word frequency table n(All, .) ← do word count on the text contents of all pages in G(2).
8 for word w in WordFrequencyTable(All).keys() do
9 chi(c, w) ← w’s chi-square statistics on WordFrequencyTable(c) and
WordFrequencyTable(All).
10 hc,w ← chi(c, w)
11 return hc
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 11 / 25
TransDetector: Phase 1 (Extracting Category-Level
Topics in KB) (3/3)
Taking the category Military as an example, we extract Military’s
category-level topic hMilitary .
Main topic
classifications
Culture
…
Society Politics
page: Military
page: Armed
Forces
page:
Firearm
page: World
War II
Charity
…
…
… Military
…
Activism
…
…
World War II…
category-level topic in
knowledge base, about military
army, military, air, command, force, regiment, forces,
squadron, infantry, battle, commander, …, shootings,
projectiles,… (order by words’ chi-square scores )
…
G(0):Taxonomy’s
Graph
G(1)
:Category-Page Bipartite Graph
G(2): Page-Content
Bipartite Graph
Figure: Extracting Category-Level Topics in Knowledge Base via its three fold
hierarchical structure, taking Military as an example.
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 12 / 25
TransDetector: Phase 2 (Transferring Category-Level
Info into Microblog Stream) (1/2)
Transfer KB’s Category-Level Topics {hc}KKB
c=1 into microblogs stream:
CTrans-LDA.
u m θd zdn wdn
α β ϕk τk
D
Nd
K
Figure: Diagram of CTrans-LDA.
In CTrans-LDA, {hc}KKB
c=1 is used as prior information:
τkv =



λ
hkv
v∈Sk
hkv
, v ∈ Sk and k ≤ KKB
0, v /∈ Sk or k > KKB
(1)
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 13 / 25
TransDetector: Phase 2 (Transferring Category-Level
Info into Microblog Stream) (2/2)
We use Gibbs Sampling for solving CTrans-LDA.
The initialization probability ˆqk|v makes sure that the learned topics
are aligned to the pre-defined category-level topic.
ˆqk|v =



τkv
K
k=1 τkv
,
k
τkv > 0 (a)
0,
k
τkv = 0 and k ≤ KKB (b)
1/(K − KKB ),
k
τkv = 0 and k > KKB (c)
(2)
Conditional probability in gibbs sampling:
p(zdn = k|.) ∝ (n
(d)
dk + αmk)(n
(w)
kv + τkv + β)/(n
(w)
k,. + τk,. + V β).
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 14 / 25
TransDetector: Phase 3 (Detecting Events on
Category-Level Word Time Series) (1/2)
After transfer learning, we conduct analysis on category-level word time
series, and detect events in microblog stream.
Figure: Visualizing Category-Level Word Time Series in Microblog Stream on
Edinburgh Twitter Corpus (20110711-20110915), taking Military as an example.
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 15 / 25
TransDetector: Phase 3 (Detecting Events on
Category-Level Word Time Series) (2/2)
With richer semantics, and fewer noise, we detect events more accurately,
in all the following granularities.(see more technique details in our paper)
1 events’ candidate words
e.g., Ft., Hood, attack.
2 event phrases
e.g., Ft. Hood attack.
3 events’ representative microblogs
e.g., Possible Ft. Hood Attack Thwarted http: // t. co/ BSJ33hk .
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 16 / 25
Experiment Settings
Dataset
Knowledge Base. Wikipedia dumps23
Microblog Stream. Edinburgh twitter corpus4
Baseline Methods
Twevent, BurstyBTM, LSH, EDCoW, and TimeUserLDA
Ground Truth
Benchmark1 is labeled events in previous study.
Benchmark2 is our manually checked events based on Twevent,
BurstyBTM, LSH, EDCoW, TimeUserLDA and TransDetector.
2
https:
//dumps.wikimedia.org/enwiki/latest/enwiki-latest-categorylinks.sql.gz
3
https:
//dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
4
http://demeter.inf.ed.ac.uk/cross/docs/fsd_corpus.tar.gz
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 17 / 25
Experimental Results
Evaluation on Categroy-Level Topics in Knowledge Base. On Aviation
topic, semantic coherence is much better than LightLDA (same as LDA)
in terms of NPMI[Roder, WSDM 2015]5.
Table: The comparison on the topic coherence(NPMI) between our method and
LightLDA, taking Aviation as an example. (NPMI is computed on a group of ten
words. ∼ stands for the top five words.)
Category-Level Topics extracted from Wikipedia by TransDetector Topics Learned from Wikipedia by LightLDA
GID #words* words NPMI GID #words* words NPMI
- 1-5 aircraft air airport flight airline - - 1-5 engine aircraft car air power -
0 1-5, 6-10 ∼, airlines aviation flying pilot squadron 0.113 0 1-5, 6-10 ∼, design flight model production speed 0.112
1 1-5, 11-15 ∼, flights pilots raf airways fighter 0.155 1 1-5, 11-15 ∼, system vehicle cars engines mm 0.062
2 1-5, 16-20 ∼, boeing runway force crashed flew 0.092 2 1-5, 16-20 ∼, fuel vehicles designed models type 0.072
3 1-5, 21-25 ∼, airfield landing passengers plane aerial 0.179 3 1-5, 21-25 ∼, version front produced rear electric 0.035
4 1-5, 26-30 ∼, bomber radar wing bombers crash 0.137 4 1-5, 26-30 ∼, space control motor standard development 0.085
5 1-5, 31-35 ∼, airbus airports operations jet helicopter 0.189 5 1-5, 31-35 ∼, film range light using available -0.002
6 1-5, 36-40 ∼, squadrons base flown havilland crew 0.088 6 1-5, 36-40 ∼, wing powered wheel weight launch 0.087
7 1-5, 41-45 ∼, combat luftwaffe aerodrome carrier fokker 0.159 7 1-5, 41-45 ∼, developed low test ford cylinder 0.007
8 1-5, 46-50 ∼, planes fly engine takeoff fleet 0.186 8 1-5, 46-50 ∼, equipment side pilot hp aviation 0.091
9 1-5, 51-55 ∼, fuselage helicopters aviator naval aero 0.157 9 1-5, 51-55 ∼, systems us sold body drive -0.051
10 1-5, 56-60 ∼, glider command training balloon faa 0.166 10 1-5, 56-60 ∼, gear introduced class safety seat 0.069
· · · · · · · · · · · · · · · · · · · · · · · ·
18 1-5, 96-100 ∼, scheduled carriers military curtiss biplane 0.131 18 1-5, 96-100 ∼, transmission special replaced limited different 0.059
19 1-5, 101-105 ∼, accident engines iaf albatross rcaf 0.068 19 1-5, 101-105 ∼, features machine nuclear even unit 0.011
5
https://github.com/AKSW/Palmetto
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 18 / 25
Experimental Results
Evaluation on Categroy-Level Topics in Knowledge Base. On more topics,
semantic coherence is much better than LightLDA (same as LDA) in
terms of NPMI[Roder, WSDM 2015].
−0.1
0.0
0.1
0.2
0.3
0 10 20 30 40 50
Group ID
NPMI
Category−Level Topics in Knowledge Base
Topics learned by LightLDA
Figure: More topics are compared at the NPMI metrics between our method and
LightLDA
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 19 / 25
Experimental Results
Effectiveness of transferring category-level topics into the microblog
stream, and finding more new words in the stream which is not stored in
the knowledge base.
Table: Category-Level Topics extracted from knowledge base and the
corresponding topics on microblog stream learned from CTrans-LDA. The words
in bold font are newly learned on the microblog stream by the transfer learning.
Aviation Health Middle East Military Mobile Phones
Knowledge
Base
Microblog
Stream
Knowledge
Base
Microblog
Stream
Knowledge
Base
Microblog
Stream
Knowledge
Base
Microblog
Stream
Knowledge
Base
Microblog
Stream
aircraft air health weight al #syria army killed android iphone
air plane patients loss israel #bahrain military news mobile apple
airport flight medical diet iran people air #libya nokia android
flight time disease health arab israel command libya ios app
airline airlines treatment cancer israeli police force rebels phone ipad
airlines news hospital lose egypt #libya regiment people samsung samsung
aviation boat patient fat egyptian #egypt forces police game mobile
flying airport clinical tips ibn news squadron war app blackberry
pilot force symptoms treatment jerusalem #israel infantry libyan iphone tablet
squadron fly cancer body syria world battle attack htc apps
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 20 / 25
Experimental Results
Effectiveness of event detection:
1 TransDetector performs better in terms of the precision and the
recall.
2 only sacrificing in the DERate slightly because an event could be
grouped into multiple categories.
the event “S&P downgrade US credit rating”, related to politics and
financial simultaneously.
Table: Overall Performance on Event Detection
Method
Number of
Events to
be Evaluated
Recall@
Benchmark1
Precision@
Benchmark2
Recall@
Benchmark2
F@
Benchmark2
DERatea (Duplicate
Event Rate)@
Benchmark2
LSH 500 0.704 0.788 0.651 0.713 0.348
TimeUserLDA 100 0.370 0.790 0.177 0.289 0.114
Twevent 375 0.741 0.808 0.658 0.725 0.142
EDCoW 349 0.556 0.748 0.511 0.607 0.226
BurstyBTM 200 0.667 0.825 0.384 0.497 0.079
TransDetector 457 0.889 0.912 0.876 0.894 0.170
a DERate = (the number of duplicate events) / (the total number of detected realistic events)
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 21 / 25
Experimental Results
To understand why TransDetector performs better.
Show the relationship between the recall and the event size.
0.00
0.25
0.50
0.75
1.00
>200 (100,200] (50,100] [0,50]
Group with different event size (number of tweets)
Recall@Benchmark1
TimeUserLDA
BurstyBTM
EDCoW
LSH
Twevent
TransDetector
Figure: The relation between the recall and the event size
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 22 / 25
Experimental Results
To understand why TransDetector performs better.
Show the relationship between the recall and the event size, taking the
military-related events as an example.
Table: Events about military detected by systems between 2011-07-22 and
2011-07-28
Date Event key words Representative event tweet
Number of
event tweet
Methodsa
L TU TW E B TD
7/22/11
Norway, Oslo,
attacks, bombing
Terror Attacks Devastate Norway: A bomb
ripped through government offices in Oslo
and a gunman... http://dlvr.it/cLbk8
557
7/23/11 Gunman, rink
Gunman Kills Self, 5 Others at Texas Roller
Rink http://dlvr.it/cLcTH
43 - - -
7/26/11
Kandahar, mayor,
suicide, attack
TELEGRAPH]: Kandahar mayor killed by
Afghan suicide bomber: The mayor of
Kandahar, the biggest city in south
47 - -
7/28/11 Ft., Hood, attack
Possible Ft. Hood Attack Thwarted
http://t.co/BSJ33hk
52 - - - - -
7/28/11
Libyan, rebel,
gunned
Libyan rebel chief gunned down in Benghazi
http://sns.mx/prfvy1
44 - - - - -
a L=LSH, TU=TimeUserLDA, TW=Twevent, E=EDCoW, B=BurstyBTM, TD=TransDetector.
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 23 / 25
Conclusions
Knowledge base is constructed elaborately and contains rich
information, which can benefit the not-well-organized microblog
stream.
We propose TransDetector method, and
use category-level topic in knowledge base as the prior knowledge,
transfer abundant knowledge from knowledge base into microblog
stream
enrich the semantics of microblogs and further enhances the accuracy
of microblogs event detection
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 24 / 25
Thanks!
Q&A
5
This slide and more data are available at http://q-r.to/bajx8I
Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 25 / 25

More Related Content

Similar to Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection

Pre-defense_talk
Pre-defense_talkPre-defense_talk
Pre-defense_talkaphex34
 
A&S Chairs Hybrid Courses Project Presentation
A&S Chairs Hybrid Courses Project PresentationA&S Chairs Hybrid Courses Project Presentation
A&S Chairs Hybrid Courses Project PresentationChristopher Rice
 
Kid171 chap0 english version
Kid171 chap0 english versionKid171 chap0 english version
Kid171 chap0 english versionFrank S.C. Tseng
 
Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Searchkrisztianbalog
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph FuturesPaul Groth
 
파이콘 한국 2019 튜토리얼 - 설명가능인공지능이란? (Part 1)
파이콘 한국 2019 튜토리얼 - 설명가능인공지능이란? (Part 1)파이콘 한국 2019 튜토리얼 - 설명가능인공지능이란? (Part 1)
파이콘 한국 2019 튜토리얼 - 설명가능인공지능이란? (Part 1)XAIC
 
Semantic Web in Physical Science
Semantic Web in Physical ScienceSemantic Web in Physical Science
Semantic Web in Physical Sciencepetermurrayrust
 
Processing Events in Probabilistic Risk Assessment
Processing Events in Probabilistic Risk AssessmentProcessing Events in Probabilistic Risk Assessment
Processing Events in Probabilistic Risk AssessmentHaystax Technology
 
Software Design Patterns Lecutre (Intro)
Software Design Patterns Lecutre (Intro)Software Design Patterns Lecutre (Intro)
Software Design Patterns Lecutre (Intro)VikramJothyPrakash1
 
Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021hala Skaf
 
Semantic Wiki Based Collaborative Scientific Modeling Infrastructure
Semantic Wiki Based  Collaborative Scientific Modeling Infrastructure Semantic Wiki Based  Collaborative Scientific Modeling Infrastructure
Semantic Wiki Based Collaborative Scientific Modeling Infrastructure Jie Bao
 
From data portal to knowledge portal: Leveraging semantic technologies to sup...
From data portal to knowledge portal: Leveraging semantic technologies to sup...From data portal to knowledge portal: Leveraging semantic technologies to sup...
From data portal to knowledge portal: Leveraging semantic technologies to sup...Xiaogang (Marshall) Ma
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Leon Derczynski
 
Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Mariana Damova, Ph.D
 
Deep Context-Awareness: Context Coupling and New Types of Context Information...
Deep Context-Awareness: Context Coupling and New Types of Context Information...Deep Context-Awareness: Context Coupling and New Types of Context Information...
Deep Context-Awareness: Context Coupling and New Types of Context Information...Hong-Linh Truong
 
College of Doctoral StudiesRES-845 Module 2 Problem.docx
        College of Doctoral StudiesRES-845 Module 2 Problem.docx        College of Doctoral StudiesRES-845 Module 2 Problem.docx
College of Doctoral StudiesRES-845 Module 2 Problem.docxShiraPrater50
 
College of Doctoral StudiesRES-845 Module 2 Problem.docx
College of Doctoral StudiesRES-845 Module 2 Problem.docxCollege of Doctoral StudiesRES-845 Module 2 Problem.docx
College of Doctoral StudiesRES-845 Module 2 Problem.docxadkinspaige22
 
Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...Don Pellegrino
 

Similar to Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection (20)

AI Science
AI Science AI Science
AI Science
 
Data and science
Data and scienceData and science
Data and science
 
Pre-defense_talk
Pre-defense_talkPre-defense_talk
Pre-defense_talk
 
A&S Chairs Hybrid Courses Project Presentation
A&S Chairs Hybrid Courses Project PresentationA&S Chairs Hybrid Courses Project Presentation
A&S Chairs Hybrid Courses Project Presentation
 
Kid171 chap0 english version
Kid171 chap0 english versionKid171 chap0 english version
Kid171 chap0 english version
 
Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Search
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph Futures
 
파이콘 한국 2019 튜토리얼 - 설명가능인공지능이란? (Part 1)
파이콘 한국 2019 튜토리얼 - 설명가능인공지능이란? (Part 1)파이콘 한국 2019 튜토리얼 - 설명가능인공지능이란? (Part 1)
파이콘 한국 2019 튜토리얼 - 설명가능인공지능이란? (Part 1)
 
Semantic Web in Physical Science
Semantic Web in Physical ScienceSemantic Web in Physical Science
Semantic Web in Physical Science
 
Processing Events in Probabilistic Risk Assessment
Processing Events in Probabilistic Risk AssessmentProcessing Events in Probabilistic Risk Assessment
Processing Events in Probabilistic Risk Assessment
 
Software Design Patterns Lecutre (Intro)
Software Design Patterns Lecutre (Intro)Software Design Patterns Lecutre (Intro)
Software Design Patterns Lecutre (Intro)
 
Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021
 
Semantic Wiki Based Collaborative Scientific Modeling Infrastructure
Semantic Wiki Based  Collaborative Scientific Modeling Infrastructure Semantic Wiki Based  Collaborative Scientific Modeling Infrastructure
Semantic Wiki Based Collaborative Scientific Modeling Infrastructure
 
From data portal to knowledge portal: Leveraging semantic technologies to sup...
From data portal to knowledge portal: Leveraging semantic technologies to sup...From data portal to knowledge portal: Leveraging semantic technologies to sup...
From data portal to knowledge portal: Leveraging semantic technologies to sup...
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
 
Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011
 
Deep Context-Awareness: Context Coupling and New Types of Context Information...
Deep Context-Awareness: Context Coupling and New Types of Context Information...Deep Context-Awareness: Context Coupling and New Types of Context Information...
Deep Context-Awareness: Context Coupling and New Types of Context Information...
 
College of Doctoral StudiesRES-845 Module 2 Problem.docx
        College of Doctoral StudiesRES-845 Module 2 Problem.docx        College of Doctoral StudiesRES-845 Module 2 Problem.docx
College of Doctoral StudiesRES-845 Module 2 Problem.docx
 
College of Doctoral StudiesRES-845 Module 2 Problem.docx
College of Doctoral StudiesRES-845 Module 2 Problem.docxCollege of Doctoral StudiesRES-845 Module 2 Problem.docx
College of Doctoral StudiesRES-845 Module 2 Problem.docx
 
Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...
 

Recently uploaded

9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookmanojkuma9823
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 

Recently uploaded (20)

9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 

Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection

  • 1. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection Weijing Huang, Tengjiao Wang, Wei Chen, Yazhou Wang School of Electronics Engineering and Computer Science, Peking University @DASFAA 2017, Suzhou, China Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 1 / 25
  • 2. Motivation Many Web applications need the accurate event detection technique on microblog stream, including: 1 public opinion analysis [Chen, SIGIR 2013] 2 public security [Li, ICDE 2012], [Imran, WWW 2014] 3 disaster response [Sakaki,WWW 2010] 4 breaking news report1 But detecting events on twitter stream accurately is still challenging. 1 http://www.theverge.com/2016/12/1/13804542/ reuters-algorithm-breaking-news-twitter Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 2 / 25
  • 3. Challenges (1/2) According to [Huang, WWW 2016], the challenges include, 1 fast changing 2 high noise 3 short length And, we found another key factor, 1 Small events with fewer tweets Hard to trade off between precision and recall. Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 3 / 25
  • 4. Challenges (2/2) Exploratory study on the Edinburgh twitter corpus: 11/27 events contain less than 50 tweets. Table: Statistics of labeled events. Event Date Event Size S&P downgrade US credit rating 05/08/2011 656 Atlantis shuttle lands 21/07/2011 595 US increases debt ceiling 25/07/2011 485 Plane with Russian hocky team Lokomotiv crashes 07/09/2011 286 Amy Winehouse dies 23/07/2011 283 Gunman opens fire in youth camp in Norway 23/07/2011 260 Earthquake in Virginia 24/08/2011 246 First victim of London riots dies 09/08/2011 174 Explosion in French nuclear plant in Marcoule 12/09/2011 135 Google announces plans to bury Motorola Mobility 15/08/2011 127 NASA announces there might be water on Mars 04/08/2011 124 Car bomb explodes in Oslo, Norway 22/07/2011 114 ... ... ... Indian and Bangladesh sign a border pact 06/09/2011 25 Flight 4896 crash 13/07/2011 21 First aritficial organ transplant 12/07/2011 18 three men die in riots in england 10/08/2011 16 rebels capture interational tripoli airport 21/08/2011 13 Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 4 / 25 11 Small events with fewer tweets
  • 5. How about existing methods? (1/2) Event detection methods without extra information, such as 1 clustering articles LSH[Petrovic, NAACL 2010] need to set threshold to determine whether new article represents a new event. 2 analyzing word frequencies EDCoW[Weng, ICWSM 2011] treat the word as the basic unit in analysis, without regarding polysemy words (words have different meanings, e.g. “apple”) 3 finding bursty topics via topic modeling TimeUserLDA[Diao, ACL 2012], BurstyBTM[Yan, AAAI 2015] detects the “large” events but may ignore the “small” ones. Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 5 / 25
  • 6. How about existing methods? (2/2) Event detection methods by leveraging extra information 1 typical one: Twevent[Li, CIKM 2012] divides the tweet into segments according to the Microsoft Web N-Gram service and Wikpedia detects the bursty segments and cluster these segments into candidate events still has to trade off between precision and recall e.g., the bursty segment in the not-so-popular event “first artificial organ transplant” is missed Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 6 / 25
  • 7. An Example Much easier to detect the event on the time series of word “hood” related to Military, without adjusting the threshold. Jul 2011 Jul 2011 18 19 20 21 22 23 24 25 26 27 28 29 30 31 0 25 50 75 100 Document Frequency Ft. Hood Attack word "hood" word "hood" (about Military) Figure: The comparison of the time series between the raw word hood and the Military related word hood, computed on the Edinburgh twitter corpus. Refer the event to https://en.wikipedia.org/wiki/Fort_Hood#2011_attack_plot. An infant wearing a hood. k and he head, n from orm of hood is are used ses where the bers, (a) Fort Hood From Wikipedia, the free encyclopedia Fort Hood is a U.S. military post located in Killeen, Texas. The post is named after Confederate General John Bell Hood. It is located halfway between Austin and Waco, about 60 miles (b)Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 7 / 25
  • 8. The insights on the example Knowledge Base Well organized Constructed elaborately full of rich information Microblog stream Short length Fast changing High noise The benefit of enriching the semantics and filtering out noise by Knowledge Base for microblogs is attractive. But it’s expensive to retrieve every word of tweets in the Knowledge Base. Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 8 / 25
  • 9. Our solution TransDetector: a novel category-level transfer learning method transfer KB’s category-level info into microblog stream balance the performance and the cost of leveraging knowledge base for event detection knowledge base information about Military docs@2011-7-28 in the stream Texture Texture Texture army military … shootings projectiles … killed news #libya libya … hood … analysis on words “killed”, “news”, … category-level topic in microblog stream @2011-07-28, about Military category-level topic in knowledge base, about Military 2011-07-28 event: “attack at Fort Hood”, event: “#libya Libyan rebels chief killed” … 1 2 3 Figure: TransDetector’s processing flow, in 3 phases. Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 9 / 25
  • 10. TransDetector: Phase 1 (Extracting Category-Level Topics in KB) (1/3) There is a three fold hierarchical structure in Knowledge Base. 1 Taxonomy Graph G(0). The directed edges in G(0) represent the class→subclass relations in KB. e.g., Main topic classifications → Society 2 Category-Page Bipartite Graph G(1). The directed edges in G(1) represent the class→instance relations in KB. e.g., Military → page: Armed Forces 3 Page-Content Map G(2). For a specific Wikipedia dumps version, the edges page →content in G(2) define a one-to-one mapping. e.g., page: Armed Forces → content(20170325version): “The armed forces of a country are its government-sponsored defense, fighting forces, and organizations...” Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 10 / 25
  • 11. TransDetector: Phase 1 (Extracting Category-Level Topics in KB) (2/3) Algorithm 1: Extraction of Category-Level Topics in Knowledge Base Input: Taxonomy’s Graph G(0), Category-Page Bipartite Graph G(1), Page-Content Bipartite Graph G(2), topic related category node c Output: c’s category-level topic in knowledge base hc 1 Pages(c) ← ∅, hc ← ∅ 2 DAG G(0) ← Remove Cycles of G(0) by nodes’ HITS-PageRank scores. 3 SuccessorNodes(c) ← Breadth-first-traverse(G(0) , c) 4 for node ∈ SuccessorNodes(c) do 5 Pages(c) ← Pages(c) ∪ G(1).neighbours(node) 6 Word frequency table n(c, .) ← do word count on the text contents of Pages(c) 7 Word frequency table n(All, .) ← do word count on the text contents of all pages in G(2). 8 for word w in WordFrequencyTable(All).keys() do 9 chi(c, w) ← w’s chi-square statistics on WordFrequencyTable(c) and WordFrequencyTable(All). 10 hc,w ← chi(c, w) 11 return hc Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 11 / 25
  • 12. TransDetector: Phase 1 (Extracting Category-Level Topics in KB) (3/3) Taking the category Military as an example, we extract Military’s category-level topic hMilitary . Main topic classifications Culture … Society Politics page: Military page: Armed Forces page: Firearm page: World War II Charity … … … Military … Activism … … World War II… category-level topic in knowledge base, about military army, military, air, command, force, regiment, forces, squadron, infantry, battle, commander, …, shootings, projectiles,… (order by words’ chi-square scores ) … G(0):Taxonomy’s Graph G(1) :Category-Page Bipartite Graph G(2): Page-Content Bipartite Graph Figure: Extracting Category-Level Topics in Knowledge Base via its three fold hierarchical structure, taking Military as an example. Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 12 / 25
  • 13. TransDetector: Phase 2 (Transferring Category-Level Info into Microblog Stream) (1/2) Transfer KB’s Category-Level Topics {hc}KKB c=1 into microblogs stream: CTrans-LDA. u m θd zdn wdn α β ϕk τk D Nd K Figure: Diagram of CTrans-LDA. In CTrans-LDA, {hc}KKB c=1 is used as prior information: τkv =    λ hkv v∈Sk hkv , v ∈ Sk and k ≤ KKB 0, v /∈ Sk or k > KKB (1) Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 13 / 25
  • 14. TransDetector: Phase 2 (Transferring Category-Level Info into Microblog Stream) (2/2) We use Gibbs Sampling for solving CTrans-LDA. The initialization probability ˆqk|v makes sure that the learned topics are aligned to the pre-defined category-level topic. ˆqk|v =    τkv K k=1 τkv , k τkv > 0 (a) 0, k τkv = 0 and k ≤ KKB (b) 1/(K − KKB ), k τkv = 0 and k > KKB (c) (2) Conditional probability in gibbs sampling: p(zdn = k|.) ∝ (n (d) dk + αmk)(n (w) kv + τkv + β)/(n (w) k,. + τk,. + V β). Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 14 / 25
  • 15. TransDetector: Phase 3 (Detecting Events on Category-Level Word Time Series) (1/2) After transfer learning, we conduct analysis on category-level word time series, and detect events in microblog stream. Figure: Visualizing Category-Level Word Time Series in Microblog Stream on Edinburgh Twitter Corpus (20110711-20110915), taking Military as an example. Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 15 / 25
  • 16. TransDetector: Phase 3 (Detecting Events on Category-Level Word Time Series) (2/2) With richer semantics, and fewer noise, we detect events more accurately, in all the following granularities.(see more technique details in our paper) 1 events’ candidate words e.g., Ft., Hood, attack. 2 event phrases e.g., Ft. Hood attack. 3 events’ representative microblogs e.g., Possible Ft. Hood Attack Thwarted http: // t. co/ BSJ33hk . Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 16 / 25
  • 17. Experiment Settings Dataset Knowledge Base. Wikipedia dumps23 Microblog Stream. Edinburgh twitter corpus4 Baseline Methods Twevent, BurstyBTM, LSH, EDCoW, and TimeUserLDA Ground Truth Benchmark1 is labeled events in previous study. Benchmark2 is our manually checked events based on Twevent, BurstyBTM, LSH, EDCoW, TimeUserLDA and TransDetector. 2 https: //dumps.wikimedia.org/enwiki/latest/enwiki-latest-categorylinks.sql.gz 3 https: //dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2 4 http://demeter.inf.ed.ac.uk/cross/docs/fsd_corpus.tar.gz Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 17 / 25
  • 18. Experimental Results Evaluation on Categroy-Level Topics in Knowledge Base. On Aviation topic, semantic coherence is much better than LightLDA (same as LDA) in terms of NPMI[Roder, WSDM 2015]5. Table: The comparison on the topic coherence(NPMI) between our method and LightLDA, taking Aviation as an example. (NPMI is computed on a group of ten words. ∼ stands for the top five words.) Category-Level Topics extracted from Wikipedia by TransDetector Topics Learned from Wikipedia by LightLDA GID #words* words NPMI GID #words* words NPMI - 1-5 aircraft air airport flight airline - - 1-5 engine aircraft car air power - 0 1-5, 6-10 ∼, airlines aviation flying pilot squadron 0.113 0 1-5, 6-10 ∼, design flight model production speed 0.112 1 1-5, 11-15 ∼, flights pilots raf airways fighter 0.155 1 1-5, 11-15 ∼, system vehicle cars engines mm 0.062 2 1-5, 16-20 ∼, boeing runway force crashed flew 0.092 2 1-5, 16-20 ∼, fuel vehicles designed models type 0.072 3 1-5, 21-25 ∼, airfield landing passengers plane aerial 0.179 3 1-5, 21-25 ∼, version front produced rear electric 0.035 4 1-5, 26-30 ∼, bomber radar wing bombers crash 0.137 4 1-5, 26-30 ∼, space control motor standard development 0.085 5 1-5, 31-35 ∼, airbus airports operations jet helicopter 0.189 5 1-5, 31-35 ∼, film range light using available -0.002 6 1-5, 36-40 ∼, squadrons base flown havilland crew 0.088 6 1-5, 36-40 ∼, wing powered wheel weight launch 0.087 7 1-5, 41-45 ∼, combat luftwaffe aerodrome carrier fokker 0.159 7 1-5, 41-45 ∼, developed low test ford cylinder 0.007 8 1-5, 46-50 ∼, planes fly engine takeoff fleet 0.186 8 1-5, 46-50 ∼, equipment side pilot hp aviation 0.091 9 1-5, 51-55 ∼, fuselage helicopters aviator naval aero 0.157 9 1-5, 51-55 ∼, systems us sold body drive -0.051 10 1-5, 56-60 ∼, glider command training balloon faa 0.166 10 1-5, 56-60 ∼, gear introduced class safety seat 0.069 · · · · · · · · · · · · · · · · · · · · · · · · 18 1-5, 96-100 ∼, scheduled carriers military curtiss biplane 0.131 18 1-5, 96-100 ∼, transmission special replaced limited different 0.059 19 1-5, 101-105 ∼, accident engines iaf albatross rcaf 0.068 19 1-5, 101-105 ∼, features machine nuclear even unit 0.011 5 https://github.com/AKSW/Palmetto Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 18 / 25
  • 19. Experimental Results Evaluation on Categroy-Level Topics in Knowledge Base. On more topics, semantic coherence is much better than LightLDA (same as LDA) in terms of NPMI[Roder, WSDM 2015]. −0.1 0.0 0.1 0.2 0.3 0 10 20 30 40 50 Group ID NPMI Category−Level Topics in Knowledge Base Topics learned by LightLDA Figure: More topics are compared at the NPMI metrics between our method and LightLDA Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 19 / 25
  • 20. Experimental Results Effectiveness of transferring category-level topics into the microblog stream, and finding more new words in the stream which is not stored in the knowledge base. Table: Category-Level Topics extracted from knowledge base and the corresponding topics on microblog stream learned from CTrans-LDA. The words in bold font are newly learned on the microblog stream by the transfer learning. Aviation Health Middle East Military Mobile Phones Knowledge Base Microblog Stream Knowledge Base Microblog Stream Knowledge Base Microblog Stream Knowledge Base Microblog Stream Knowledge Base Microblog Stream aircraft air health weight al #syria army killed android iphone air plane patients loss israel #bahrain military news mobile apple airport flight medical diet iran people air #libya nokia android flight time disease health arab israel command libya ios app airline airlines treatment cancer israeli police force rebels phone ipad airlines news hospital lose egypt #libya regiment people samsung samsung aviation boat patient fat egyptian #egypt forces police game mobile flying airport clinical tips ibn news squadron war app blackberry pilot force symptoms treatment jerusalem #israel infantry libyan iphone tablet squadron fly cancer body syria world battle attack htc apps Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 20 / 25
  • 21. Experimental Results Effectiveness of event detection: 1 TransDetector performs better in terms of the precision and the recall. 2 only sacrificing in the DERate slightly because an event could be grouped into multiple categories. the event “S&P downgrade US credit rating”, related to politics and financial simultaneously. Table: Overall Performance on Event Detection Method Number of Events to be Evaluated Recall@ Benchmark1 Precision@ Benchmark2 Recall@ Benchmark2 F@ Benchmark2 DERatea (Duplicate Event Rate)@ Benchmark2 LSH 500 0.704 0.788 0.651 0.713 0.348 TimeUserLDA 100 0.370 0.790 0.177 0.289 0.114 Twevent 375 0.741 0.808 0.658 0.725 0.142 EDCoW 349 0.556 0.748 0.511 0.607 0.226 BurstyBTM 200 0.667 0.825 0.384 0.497 0.079 TransDetector 457 0.889 0.912 0.876 0.894 0.170 a DERate = (the number of duplicate events) / (the total number of detected realistic events) Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 21 / 25
  • 22. Experimental Results To understand why TransDetector performs better. Show the relationship between the recall and the event size. 0.00 0.25 0.50 0.75 1.00 >200 (100,200] (50,100] [0,50] Group with different event size (number of tweets) Recall@Benchmark1 TimeUserLDA BurstyBTM EDCoW LSH Twevent TransDetector Figure: The relation between the recall and the event size Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 22 / 25
  • 23. Experimental Results To understand why TransDetector performs better. Show the relationship between the recall and the event size, taking the military-related events as an example. Table: Events about military detected by systems between 2011-07-22 and 2011-07-28 Date Event key words Representative event tweet Number of event tweet Methodsa L TU TW E B TD 7/22/11 Norway, Oslo, attacks, bombing Terror Attacks Devastate Norway: A bomb ripped through government offices in Oslo and a gunman... http://dlvr.it/cLbk8 557 7/23/11 Gunman, rink Gunman Kills Self, 5 Others at Texas Roller Rink http://dlvr.it/cLcTH 43 - - - 7/26/11 Kandahar, mayor, suicide, attack TELEGRAPH]: Kandahar mayor killed by Afghan suicide bomber: The mayor of Kandahar, the biggest city in south 47 - - 7/28/11 Ft., Hood, attack Possible Ft. Hood Attack Thwarted http://t.co/BSJ33hk 52 - - - - - 7/28/11 Libyan, rebel, gunned Libyan rebel chief gunned down in Benghazi http://sns.mx/prfvy1 44 - - - - - a L=LSH, TU=TimeUserLDA, TW=Twevent, E=EDCoW, B=BurstyBTM, TD=TransDetector. Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 23 / 25
  • 24. Conclusions Knowledge base is constructed elaborately and contains rich information, which can benefit the not-well-organized microblog stream. We propose TransDetector method, and use category-level topic in knowledge base as the prior knowledge, transfer abundant knowledge from knowledge base into microblog stream enrich the semantics of microblogs and further enhances the accuracy of microblogs event detection Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 24 / 25
  • 25. Thanks! Q&A 5 This slide and more data are available at http://q-r.to/bajx8I Weijing Huang, et,al. Category-Level Transfer Learning from Knowledge Base to Microblog Stream for Accurate Event Detection@DASFAA 2017, Suzhou, China 25 / 25