Topic detection and tracking
1. Outline
• Topic detection and tracking
• Overview of TDT 2004

Topic Detection and Tracking
Valentin Jijkoun & Maarten de Rijke
Informatics Institute
University of Amsterdam
http://ilps.science.uva.nl/Teaching/II0506
March 6, 2006
Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke
TDT…
• Introduction…
– http://www.nist.gov/speech/tests/tdt/

Topic Detection and Tracking
Terabytes of unorganized data
• 5 TDT Applications
– Story segmentation*
– Topic Tracking
– Topic Detection
– First Story Detection
– Link Detection
* Not evaluated in 2004
TDT’s Research Domain
• Technology challenge
– Develop applications that organize and locate relevant stories from a continuous feed of news stories
• Research driven by evaluation tasks
• Composite applications built from
– Document Retrieval
– Speech-to-Text (STT) – not included in 2004
– Story Segmentation – not included in 2004

Definitions
• An event is …
– A specific thing that happens at a specific time and place along with all necessary preconditions and unavoidable consequences.
• A topic is …
– an event or activity, along with all directly related events and activities
• A broadcast news story is …
– a section of transcribed text with substantive information content and a unified topical focus
2. TDT 2004 Evaluation Corpus
• TDT Evaluation Overview
• Changes in 2004
• 2004 TDT Evaluation Result Summaries
– New Event Detection
– Link Detection
– Topic Tracking
– Experimental Tasks:
• Supervised Adaptive Topic Tracking
• Hierarchical Topic Detection

                         TDT4 (2003 corpus)                  TDT5 (2004 corpus)
Collection Dates         Oct 1, 2000 to Jan 31, 2001         April 1, 2003 to Sep 30, 2003
Newswire Sources         3 Arabic, 2 English, 2 Mandarin     6 Arabic, 7 English, 4 Mandarin
Broadcast News Sources   2 Arabic, 5 English, 5 Mandarin     NONE
Story Counts             90735 news, 7513 non-news stories   407503 news, 0 non-news
Annotated topics         80                                  250
Average topic size       79 stories                          40 stories

• 2004: same languages as 2003
• Summary of differences
– New time period
– No broadcast news
• No non-news stories
– 4.5 times more stories
– 3.1 times more topics
– Topics have ½ as many on-topic stories
Topic Size Distribution
[Figure: number of on-topic stories per topic (log scale), topics sorted by language and size; series: Arabic, Mandarin, English. Topics per language group: Arb+Eng+Man 35, Arb+Eng 7, Eng+Man 21, Eng 63, Man 62, Arb 62.]

Multilingual Topic overlap
[Figure: diagram of single-overlap and multiple-overlap topics showing common and unique story counts per topic ID; labelled examples include topic 107 (Casablanca bombs), topic 71 (Demonstrations in Casablanca) and a group of topics on terrorism.]
Topic labels
• Single Overlap Topics
– 72 Court indicts Liberian President
– 89 Liberian former president arrives in exile
– 29 Swedish Foreign Minister killed
– 125 Sweden rejects the Euro
– 151 Egyptian delegation in Gaza
– 189 Palestinian public uprising suspended for three months
– 69 Earthquake in Algeria
– 145 Visit of Morocco Minister of Foreign Affairs to Algeria
– 186 Press conference between Lebanon and US foreign ministers
– 193 Colin Powell Plans to visit Middle East and Europe
• Multiple Overlap Topics
– 105 UN official killed in attack
– 126 British soldiers attacked in Basra
– 215 Jerusalem: Bus suicide bombing
– 227 Bin Laden Videotape
– 171 Morocco: death sentences for bombing suspects
– 107 Casablanca bombs
– 71 Demonstrations in Casablanca
– 106 Bombing in Riyadh, Saudi Arabia
– 118 World Economic Forum in Jordan
– 154 Saudi suicide bomber dies in shootout
– 60 Saudi King has eye surgery
– 80 Spanish Elections

Participation by Task: Showing the Number of Submitted System Runs
Task columns: New Event Detection, Hierarchical Topic Detection, Topic Tracking (Traditional), Topic Tracking (Supervised Adaptation), Link Detection
• Domestic sites (run counts in column order, empty cells omitted)
– CMU, Carnegie Mellon Univ.: 1, 6, 8, 10
– IBM, International Business Machines: 4
– SHAI, Stottler Henke Associates, Inc.: 5
– UIowa, Univ. of Iowa: 4
– UMd, Univ. of Maryland: 1, 2
– UMass, Univ. Massachusetts: 4, 6, 5, 7, 4
• Foreign sites (run counts in column order, empty cells omitted)
– CUHK, Chinese Univ. of Hong Kong: 1
– ICT, Institute of Computing Technology, Chinese Academy of Sciences: 11, 1
– NEU, Northeastern University in China: 2, 2
– TNO, Netherlands Org for Applied Scientific Research: 8
3. New Event Detection Task
• System Goal:
– To detect the first story that discusses each topic
[Figure: a stream of stories in which the first stories on two topics (Topic 1, Topic 2) are marked; all other stories are not first stories.]

TDT Evaluation Methodology
• Tasks are modeled as detection tasks
– Systems are presented with many trials and must answer the question: “Is this example a target trial?”
– Systems respond:
• YES this is a target, or NO this is not
• Each decision includes a likelihood score indicating the system’s confidence in the decision
• System performance measured by linearly combining the system’s missed detection rate and false alarm rate
Detection Evaluation Methodology
• Performance is measured in terms of Detection Cost
– C_Det = C_Miss * P_Miss * P_target + C_FA * P_FA * (1 - P_target)
– Constants:
• C_Miss = 1 and C_FA = 0.1 are preset costs
• P_target = 0.02 is the a priori probability of a target
– System performance estimates
• P_Miss and P_FA
– Normalized Detection Cost generally lies between 0 and 1:
• (C_Det)_Norm = C_Det / min{C_Miss * P_target, C_FA * (1 - P_target)}
• Detection Error Tradeoff (DET) curves graphically depict the performance tradeoff between P_Miss and P_FA
– Makes use of likelihood scores attached to the YES/NO decisions
• Two important scores per system
– Actual Normalized Detection Cost
• Based on the YES/NO decision threshold
– Minimum Normalized DET point
• Based on the DET curve: Minimum score with proper threshold

Performance Measures Example
[Figure: bar chart of Actual Normalized Detection Cost and Minimum DET Normalized Cost (log scale, 0.01 to 1) for English and Mandarin, shown next to a DET curve on which bottom left is better. Marked points: P(miss) = 5.5%, P(fa) = 1.1%, Min DET Norm Cost = 0.11; and P(miss) = 0.7%, P(fa) = 1.5%, Min DET Norm Cost = 0.08.]
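The cost definitions above can be sketched in a few lines of Python. This is a minimal illustration, not NIST's scoring tool: the constants are the ones stated on the slide, while the function names and the threshold sweep used for the minimum DET point are assumptions made for the example. The sweep assumes at least one target and one non-target trial.

```python
# Constants as stated on the slide.
C_MISS = 1.0
C_FA = 0.1
P_TARGET = 0.02


def detection_cost(p_miss, p_fa):
    """C_Det = C_Miss * P_Miss * P_target + C_FA * P_FA * (1 - P_target)."""
    return C_MISS * p_miss * P_TARGET + C_FA * p_fa * (1.0 - P_TARGET)


def normalized_detection_cost(p_miss, p_fa):
    """Divide C_Det by the cost of the cheaper trivial system (all NO or all YES)."""
    floor = min(C_MISS * P_TARGET, C_FA * (1.0 - P_TARGET))
    return detection_cost(p_miss, p_fa) / floor


def min_normalized_cost(scores, is_target):
    """Hypothetical helper: sweep the decision threshold over the likelihood
    scores and return the lowest normalized cost (the minimum DET point)."""
    n_target = sum(is_target)
    n_non_target = len(is_target) - n_target
    ranked = sorted(zip(scores, is_target), key=lambda pair: -pair[0])
    best = normalized_detection_cost(1.0, 0.0)  # threshold above every score
    hits = false_alarms = 0
    for i, (score, target) in enumerate(ranked):
        if target:
            hits += 1
        else:
            false_alarms += 1
        # Only evaluate between distinct score values.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        p_miss = 1.0 - hits / n_target
        p_fa = false_alarms / n_non_target
        best = min(best, normalized_detection_cost(p_miss, p_fa))
    return best


print(round(normalized_detection_cost(0.055, 0.011), 2))
print(round(normalized_detection_cost(0.007, 0.015), 2))
```

The two printed values correspond to the operating points quoted in the performance measures example, P(miss) = 5.5% with P(fa) = 1.1% and P(miss) = 0.7% with P(fa) = 1.5%.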
Primary New Event Detection Results
Newswire, English Texts
[Figure: bar chart of Actual Norm(Cost) and Minimum Norm(Cost) (normalized cost, log scale) for the runs CMU1, IBM1, SHAI1 and UMass1, with 2003’s best score marked.]

TDT Link Detection Task
System Goal:
– To detect whether a pair of stories discuss the same topic.
(Can be thought of as a “primitive operator” to build a variety of applications)
[Figure: two stories joined by a question mark.]
4. Primary Link Detection Results
Newswire, Multilingual links, 10-file deferral period
[Figure: bar chart of Actual Norm(Cost) and Minimum Norm(Cost) (normalized cost, log scale, 0.01 to 1) for the runs CMU1, NEU1, UIowa1 and UMass1.]
Scores are better than last year!

Topic Tracking Task
• System Goal:
– To detect stories that discuss the target topic, in multiple source streams
• Supervised Training
– Given Nt sample stories that discuss a given target topic
• Testing
– Find all subsequent stories that discuss the target topic
[Figure: story stream split into training data and test data; training stories are marked on-topic, test stories are unknown.]
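To make the tracking setup concrete, here is a minimal sketch of a cosine-similarity tracker: the Nt training stories are pooled into one term-frequency vector, and every subsequent story gets a YES/NO decision plus a score. It is not any participant's system. Whitespace tokenization, raw term counts, the `track` function and the 0.2 threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumed whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def track(training_stories, stream, threshold=0.2):
    """Return (decision, score) per incoming story; threshold is hypothetical."""
    topic = Counter()
    for story in training_stories:
        topic.update(vectorize(story))
    decisions = []
    for story in stream:
        score = cosine(topic, vectorize(story))
        decisions.append((score >= threshold, score))
    return decisions


training = ["bomb attack in casablanca kills dozens"]
stream = [
    "casablanca bomb attack suspects arrested",
    "sweden rejects the euro in referendum",
]
for decision, score in track(training, stream):
    print("YES" if decision else "NO", round(score, 3))
```

Returning the score alongside the decision matters for evaluation: the YES/NO decisions determine the actual normalized cost, while the scores are what a DET curve is built from.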
Primary Tracking Results
Newswire, Multilingual Texts, 1 English Training Story
[Figure: bar chart of Actual Norm(Cost) and Minimum Norm(Cost) (normalized cost, log scale, 0.01 to 1) for the runs CMU1, ICT1, NEU1, UMD1 and UMass1, with 2003’s best score marked.]

Supervised Adaptive Tracking Task
• Variation of Topic Tracking system goal:
– To detect stories that discuss the target topic when a human provides feedback to the system
• System receives human judgment (on or off-topic) for every retrieved story
– Same task as TREC 2002 Adaptive Filtering
[Figure: story stream split into training data and test data; training stories are on-topic, test stories are unknown until retrieved, and are then shown as un-retrieved, retrieved on-topic or retrieved off-topic.]
Supervised Adaptive Tracking Metrics
• Normalized Detection Cost
– Same measure as for basic Tracking task
• Linear Utility Measure
– As defined for TREC 2002 Filtering Track (Robertson & Soboroff)
– Measures value of the stories sent to the user:
• Credit for relevant stories, debit for non-relevant stories
• Equivalent to thresholding based on estimated probability of relevance
– No penalty for missing relevant stories (i.e. all precision, no recall)
– Implication: Challenge is to beat the “do-nothing” baseline (i.e. a system that rejects all stories)
• Linear Utility Measure Computation:
– Basic formula: U = W_rel * R - NR
• R = number of relevant stories retrieved
• NR = number of non-relevant stories retrieved
• W_rel = relative weight of relevant vs non-relevant (set to 10, by analogy with C_Miss vs. C_FA weights for C_Det)
– Normalization across topics:
• Divide by maximum possible utility score for each topic
– Scaling across topics:
• Define arbitrary minimum possible score, to avoid having average dominated by a few topics with huge NR counts
• Corresponds to application scenario in which user stops looking at stories when system exceeds some tolerable false alarm rate
– Scaled, normalized value:
U_scale = [ max(U_norm, U_min) - U_min ] / [ 1 - U_min ]
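The utility computation can be sketched as follows. The weight of 10 comes from the slide. The value U_min = -0.5 is an assumption: it is the value for which a do-nothing system scores 0.33, the baseline scaled utility reported later in these slides. For the same reason the sketch uses the TREC 2002 style scaling in which U_min is subtracted in the numerator, so that scores fall between 0 and 1. Function and parameter names are hypothetical.

```python
W_REL = 10.0   # weight of a relevant story, as stated on the slide
U_MIN = -0.5   # assumed floor; reproduces the 0.33 do-nothing baseline


def linear_utility(relevant_retrieved, nonrelevant_retrieved):
    """U = W_rel * R - NR."""
    return W_REL * relevant_retrieved - nonrelevant_retrieved


def scaled_utility(relevant_retrieved, nonrelevant_retrieved,
                   total_relevant, u_min=U_MIN):
    """Normalize by the best achievable utility for the topic, clip at
    u_min, then rescale so that u_min maps to 0 and a perfect run to 1."""
    u_max = W_REL * total_relevant  # every relevant story, no false alarms
    u_norm = linear_utility(relevant_retrieved, nonrelevant_retrieved) / u_max
    return (max(u_norm, u_min) - u_min) / (1.0 - u_min)


# A topic with 40 relevant stories (the average TDT5 topic size).
print(round(scaled_utility(0, 0, 40), 2))      # do-nothing system
print(round(scaled_utility(40, 0, 40), 2))     # perfect system
print(round(scaled_utility(5, 10000, 40), 2))  # swamped by false alarms
```

The clipping step is what keeps a single topic with thousands of false alarms from dominating the average across topics.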
5. Supervised Adaptive Tracking
Best Two Submissions per Site
Newswire, Multilingual Texts, 1 English Training Story
[Figure: bar chart of normalized cost per submission; chart annotation: “Best 2004 standard tracking result!”]

Effect of Supervised Adaptation
• CMU4 is a simple cosine similarity tracker
– Contrastive run submitted without supervised adaptation
[Figure: bar chart of Minimum Norm(Cost) (normalized cost, log scale, 0.01 to 1) comparing Tracking with SA Tracking.]
Supervised Adaptive Tracking
Utility vs. Detection cost
[Figure: system performance chart showing Actual Normalized DET Cost, Min. Normalized DET Cost and Scaled Utility for the runs CMU6, CMU2, CMU1, CMU5, CMU3-TrecUtl, CMU4, CMU7, CMU8-dbg, UMass2, UMass1, UMass3, UMass4, UMass7, UMD1 and UMD2.]
[Figure: scatter plot of Minimum DET Cost (x axis, 0 to 0.8) vs. Scaled Utility (y axis), with trend line y = 1.0398x + 0.2942, R² = 0.2349.]
• Performance on Utility measure:
– 2/3 of systems surpassed baseline scaled utility score (0.33)
– Most systems optimized for detection cost, not utility
• Detection Cost and Utility are uncorrelated: R² of 0.23
– Even for CMU3 which was tuned for utility

Hierarchical Topic Detection
• System goal:
– To detect topics in terms of the (clusters of) stories that discuss them
• Problems with past Topic Detection evaluations:
– Topics are at different levels of granularity, yet systems had to choose single operating point for creating a new cluster
– Stories may pertain to multiple topics, yet systems had to assign each to only one cluster
Topic Hierarchy Solves Problems
• System operation:
– Unsupervised topic training - no topic instances as input
– Assign each story to one or more clusters
– Clusters may overlap or include other clusters
– Clusters must be organized as directed acyclic graph (DAG) with single root
– Treated as retrospective search
• Semantics of topic hierarchy:
– Root = entire collection
– Leaf nodes = the most specific topics
– Intermediate nodes represent different levels of granularity
• Performance assessment:
– Given a topic, find matching cluster with lowest cost
[Figure: example hierarchy with root vertex a and descendant vertices b to j connected by edges, with story IDs s1 to s16 attached to the vertices.]

Hierarchical Topic Detection Observations
• All systems structured hierarchy as a tree: each vertex has one parent
• Travel cost has very little effect on finding the best cluster
– Setting W_DET to 1.0 has little effect on topic mapping
• Cost parameters favor false alarms
– Average mapped cluster sizes are between 1262 and 7757 stories
– Average topic size is 40 stories
6. Summary
• Eleven research groups participated in five evaluation tasks
• Error rates increased for new event detection
– Why?
• Error rates decreased for tracking
• Error rates decreased for link detection
• Dry run of hierarchical topic detection completed
– Solves previous problems with topic detection task, but raises new issues
– Questions to consider:
• Is the specified hierarchical structure (single-root DAG) appropriate?
• Is the minimal cost metric appropriate?
• If so, is the normalization right?
• Dry run of supervised adaptive tracking completed
– Promising results for including relevance feedback
– Questions to consider:
• Should we continue the task?
• If so, should we continue using both metrics?

What do teams use?
• TNO (HTD at TDT 2004)
– Focus on scalability
– Agglomerative clustering scalable for large document collections
• Take a sample
• Build a hierarchical cluster structure of this sample
• Optimize resulting binary tree for minimal cost metric
– Detection cost, travel cost
• Assign remaining docs from the corpus to clusters in the structure obtained from the sample
UMass at TDT 2004
• Hierarchical topic detection
– Topic detection classifies stories into different
topics
– Two step algorithm
• 1-NN for event formation
– Stories from same source selected and time ordered
– Stories are processed one by one, each incoming story
is compared to (a certain number of) stories before it
• Agglomerative clustering for building the hierarchy
– Events are sorted by time order according to time stamp
of first story, then do a bounded agglomerative clustering
for the events
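The first step of the two-step algorithm above, 1-NN event formation, can be sketched as below. This is an illustration of the idea only, not the UMass implementation: cosine similarity over raw term counts, the window of 50 earlier stories and the 0.3 threshold are assumptions, and the second step (bounded agglomerative clustering of the resulting events) is not shown.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumed whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def form_events(stories, window=50, threshold=0.3):
    """Assign each time-ordered story to the event of its nearest earlier
    neighbour within the window, or start a new event if nothing is close
    enough. window and threshold are hypothetical values."""
    vectors = []
    event_of = []
    next_event = 0
    for text in stories:
        vec = vectorize(text)
        start = max(0, len(vectors) - window)
        best_score, best_idx = 0.0, None
        for idx in range(start, len(vectors)):
            score = cosine(vec, vectors[idx])
            if score > best_score:
                best_score, best_idx = score, idx
        if best_idx is not None and best_score >= threshold:
            event_of.append(event_of[best_idx])  # join the neighbour's event
        else:
            event_of.append(next_event)          # first story of a new event
            next_event += 1
        vectors.append(vec)
    return event_of


stories = [
    "casablanca bomb attack kills dozens",
    "sweden rejects the euro",
    "casablanca bomb attack suspects arrested",
    "sweden euro vote result",
]
print(form_events(stories))
```

Limiting the comparison to a bounded window of earlier stories keeps the cost per incoming story constant, which fits the time-ordered, one-by-one processing described on the slide.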