Topic Detection and Tracking

Valentin Jijkoun & Maarten de Rijke
Informatics Institute
University of Amsterdam
http://ilps.science.uva.nl/Teaching/II0506
March 6, 2006

Internet Information/MIKII6, March 6, 2006    Valentin Jijkoun, Maarten de Rijke

Outline
• Topic detection and tracking
• Overview of TDT 2004




TDT…
• Introduction…
       – http://www.nist.gov/speech/tests/tdt/

Topic Detection and Tracking
Terabytes of unorganized data
• 5 TDT Applications
       – Story segmentation*
       – Topic Tracking
       – Topic Detection
       – First Story Detection
       – Link Detection
* Not evaluated in 2004




TDT’s Research Domain
• Technology challenge
       – Develop applications that organize and locate relevant stories from a continuous feed of news stories
• Research driven by evaluation tasks
• Composite applications built from
       – Document Retrieval
       – Speech-to-Text (STT) – not included in 2004
       – Story Segmentation – not included in 2004

Definitions
• An event is …
       – a specific thing that happens at a specific time and place, along with all necessary preconditions and unavoidable consequences
• A topic is …
       – an event or activity, along with all directly related events and activities
• A broadcast news story is …
       – a section of transcribed text with substantive information content and a unified topical focus
TDT 2004
• TDT Evaluation Overview
• Changes in 2004
• 2004 TDT Evaluation Result Summaries
       – New Event Detection
       – Link Detection
       – Topic Tracking
       – Experimental Tasks:
              • Supervised Adaptive Topic Tracking
              • Hierarchical Topic Detection

Evaluation Corpus
                          TDT4 (2003 corpus)             TDT5 (2004 corpus)
Collection Dates          Oct 1, 2000 to Jan 31, 2001    April 1, 2003 to Sep 30, 2003
Newswire Sources          3 Arabic                       6 Arabic
                          2 English                      7 English
                          2 Mandarin                     4 Mandarin
Broadcast News Sources    2 Arabic                       NONE
                          5 English
                          5 Mandarin
Story Counts              90735 news,                    407503 news,
                          7513 non-news stories          0 non-news
Annotated topics          80                             250
Average topic size        79 stories                     40 stories

• 2004: same languages as 2003
• Summary of differences
       – New time period
       – No broadcast news
              • No non-news stories
       – 4.5 times more stories
       – 3.1 times more topics
       – Topics have ½ as many on-topic stories
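The size differences between the two corpora can be re-derived from the figures in the Evaluation Corpus table. The short Python sketch below recomputes the three ratios. The dictionary keys are illustrative names, and it assumes that the "4.5 times more stories" figure compares news stories only (counting TDT4's 7513 non-news stories as well would give a smaller ratio).

```python
# Figures copied from the Evaluation Corpus table; key names are illustrative.
tdt4 = {"news_stories": 90735, "topics": 80, "avg_topic_size": 79}
tdt5 = {"news_stories": 407503, "topics": 250, "avg_topic_size": 40}

# Ratio of TDT5 to TDT4 for each quantity.
ratios = {key: tdt5[key] / tdt4[key] for key in tdt4}

for key, value in ratios.items():
    print(f"{key}: TDT5 is {value:.2f} times TDT4")
```

Rounded to one decimal place, the ratios come out at 4.5 for stories, 3.1 for topics, and 0.5 for average topic size, which agrees with the summary bullets.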




Topic Size Distribution
[Figure: number of on-topic stories per topic (log scale, 1 to 1000), with topics sorted by language and size. Legend: Arabic, Mandarin, English.]
Topic counts by language combination, as labeled on the chart:
       Arb+Eng+Man   35
       Arb+Eng        7
       Eng+Man       21
       Eng           63
       Man           62
       Arb           62

Multilingual Topic overlap
[Figure: diagrams of common and unique story counts per topic ID, in two panels: Single Overlap Topics and Multiply Overlap Topics. The second panel marks topics on terrorism and calls out 107: Casablanca bombs and 71: Demonstrations in Casablanca.]




Topic labels
Single Overlap Topics
        72  Court indicts Liberian President
        89  Liberian former president arrives in exile
        29  Swedish Foreign Minister killed
       125  Sweden rejects the Euro
       151  Egyptian delegation in Gaza
       189  Palestinian public uprising suspended for three months
        69  Earthquake in Algeria
       145  Visit of Morocco Minister of Foreign Affairs to Algeria
       186  Press conference between Lebanon and US foreign ministers
       193  Colin Powell Plans to visit Middle East and Europe
Multiple Overlap Topics
       105  UN official killed in attack
       126  British soldiers attacked in Basra
       215  Jerusalem: Bus suicide bombing
       227  Bin Laden Videotape
       171  Morocco: death sentences for bombing suspects
       107  Casablanca bombs
        71  Demonstrations in Casablanca
       106  Bombing in Riyadh, Saudi Arabia
       118  World Economic Forum in Jordan
       154  Saudi suicide bomber dies in shootout
        60  Saudi King has eye surgery
        80  Spanish Elections

Participation by Task: Showing the Number of Submitted System Runs
(A dash marks a task with no submitted runs.)

                                                                       New Event  Hierarchical  Topic Tracking  Topic Tracking  Link
Sites                                                                  Detection  Topic         (Traditional)   (Supervised     Detection
                                                                                  Detection                     Adaptation)
Domestic
CMU    Carnegie Mellon Univ.                                           1          -             6               8               10
IBM    International Business Machines                                 4          -             -               -               -
SHAI   Stottler Henke Associates, Inc.                                 5          -             -               -               -
UIowa  Univ. of Iowa                                                   -          -             -               -               4
UMd    Univ. of Maryland                                               -          -             1               2               -
UMass  Univ. Massachusetts                                             4          6             5               7               4
Foreign
CUHK   Chinese Univ. of Hong Kong                                      -          1             -               -               -
ICT    Institute of Computing Technology, Chinese Academy of Sciences  -          11            1               -               -
NEU    Northeastern University in China                                -          -             2               -               2
TNO    Netherlands Org for Applied Scientific Research                 -          8             -               -               -
New Event Detection Task
• System Goal:
       – To detect the first story that discusses each topic
[Figure: a stream of stories on two topics (Topic 1 and Topic 2), with the first story on each topic marked and the remaining stories labeled "Not First Stories".]

TDT Evaluation Methodology
• Tasks are modeled as detection tasks
       – Systems are presented with many trials and must answer the question: “Is this example a target trial?”
       – Systems respond:
              • YES this is a target, or NO this is not
              • Each decision includes a likelihood score indicating the system’s confidence in the decision
• System performance is measured by linearly combining the system’s missed detection rate and false alarm rate
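To make the detection framing concrete, here is a minimal Python sketch of new event detection cast as a series of YES/NO trials with confidence scores, followed by the estimation of the miss and false alarm rates. Everything in it is an assumption made for illustration: the word-overlap (Jaccard) similarity, the 0.2 threshold, the `Trial` record, the function names, and the four made-up toy stories. It is not the method of any TDT participant.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One detection trial: the system's answer plus the reference label."""
    decision: bool    # True = system answered YES (this is a first story)
    score: float      # system's confidence that the trial is a target
    is_target: bool   # reference annotation: really a first story?


def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity between two stories (an illustrative choice)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def detect_first_stories(stories, threshold=0.2):
    """Process (text, topic_id) pairs in arrival order, one Trial per story.

    A story is declared a first story (YES) when its best similarity to any
    earlier story falls below `threshold`. Topic ids are used only to build
    the reference labels, never by the detector itself.
    """
    seen_words, seen_topics, trials = [], set(), []
    for text, topic_id in stories:
        words = set(text.lower().split())
        best = max((jaccard(words, prev) for prev in seen_words), default=0.0)
        trials.append(Trial(decision=best < threshold,
                            score=1.0 - best,  # novelty used as confidence
                            is_target=topic_id not in seen_topics))
        seen_words.append(words)
        seen_topics.add(topic_id)
    return trials


def error_rates(trials):
    """Return (P_Miss, P_FA) estimated from the YES/NO decisions."""
    targets = [t for t in trials if t.is_target]
    non_targets = [t for t in trials if not t.is_target]
    misses = sum(1 for t in targets if not t.decision)
    false_alarms = sum(1 for t in non_targets if t.decision)
    p_miss = misses / len(targets) if targets else 0.0
    p_fa = false_alarms / len(non_targets) if non_targets else 0.0
    return p_miss, p_fa


# Made-up toy stream: two stories on each of two topics.
stream = [
    ("earthquake hits northern algeria", "quake"),
    ("algeria earthquake death toll rises", "quake"),
    ("sweden votes on the euro", "euro"),
    ("sweden rejects the euro in referendum", "euro"),
]
trials = detect_first_stories(stream)
p_miss, p_fa = error_rates(trials)
```

On this toy stream the first and third stories are answered YES and the other two NO, so both error rates are zero. A real news feed would produce both misses and false alarms, which is what the cost-based evaluation weighs against each other.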




Detection Evaluation Methodology
   • Performance is measured in terms of Detection Cost
           – CDet = CMiss * PMiss * Ptarget + CFA * PFA * (1 - Ptarget)
           – Constants:
                   • CMiss = 1 and CFA = 0.1 are preset costs
                   • Ptarget = 0.02 is the a priori probability of a target
           – System performance estimates:
                   • PMiss and PFA
           – Normalized Detection Cost generally lies between 0 and 1:
                   • (CDet)Norm = CDet / min{CMiss * Ptarget, CFA * (1 - Ptarget)}
   • Detection Error Tradeoff (DET) curves graphically depict the
     performance tradeoff between PMiss and PFA
           – They make use of likelihood scores attached to the YES/NO decisions
   • Two important scores per system
           – Actual Normalized Detection Cost
                   • Based on the YES/NO decision threshold
           – Minimum Normalized DET point
                   • Based on the DET curve: minimum score with proper threshold

Performance Measures Example
[Figure: DET curve (bottom left is better) next to a bar chart of Actual Normalized Detection Cost and Minimum DET Normalized Cost for English and Mandarin; Detection Cost on a log axis, 0.01 to 1]
   • P(miss) = 5.5%, P(fa) = 1.1%: Min DET Norm Cost = 0.11
   • P(miss) = 0.7%, P(fa) = 1.5%: Min DET Norm Cost = 0.08
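
The detection cost formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the official scoring tool: the constants are the ones given on the slide, while the function names and the threshold sweep used for the minimum DET point are assumptions made for this example (a story counts as YES when its score is at or above the threshold).

```python
C_MISS = 1.0     # preset cost of a miss (from the slide)
C_FA = 0.1       # preset cost of a false alarm (from the slide)
P_TARGET = 0.02  # a priori probability of a target (from the slide)


def detection_cost(p_miss, p_fa, c_miss=C_MISS, c_fa=C_FA, p_target=P_TARGET):
    """CDet = CMiss * PMiss * Ptarget + CFA * PFA * (1 - Ptarget)."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def normalized_cost(p_miss, p_fa, c_miss=C_MISS, c_fa=C_FA, p_target=P_TARGET):
    """Divide CDet by the cost of the cheaper trivial system (all NO or all YES)."""
    floor = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return detection_cost(p_miss, p_fa, c_miss, c_fa, p_target) / floor


def min_normalized_cost(scores, is_target):
    """Minimum normalized cost over all thresholds (hypothetical helper).

    scores: likelihood scores attached to the decisions.
    is_target: True where the trial really is a target.
    """
    n_target = sum(1 for t in is_target if t)
    n_nontarget = len(is_target) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")
    # A threshold above every score answers NO everywhere: PMiss = 1, PFA = 0.
    best = normalized_cost(1.0, 0.0)
    for threshold in sorted(set(scores)):
        misses = sum(1 for s, t in zip(scores, is_target) if t and s < threshold)
        false_alarms = sum(
            1 for s, t in zip(scores, is_target) if not t and s >= threshold
        )
        cost = normalized_cost(misses / n_target, false_alarms / n_nontarget)
        best = min(best, cost)
    return best


# The two operating points quoted in the Performance Measures Example:
print(round(normalized_cost(0.055, 0.011), 2))  # 0.11
print(round(normalized_cost(0.007, 0.015), 2))  # 0.08
```

With these constants the normalizer is 0.02, so the normalized cost reduces to PMiss + 4.9 * PFA, which is how false alarm rates near 1% end up weighing about as much as miss rates near 5%.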





Primary New Event Detection Results
Newswire, English Texts
[Bar chart: Actual Norm(Cost) and Minimum Norm(Cost) per system; Normalized Cost on a log axis, 0.1 to 1; 2003’s best score marked]

TDT Link Detection Task
   • System Goal:
           – To detect whether a pair of stories discuss the same topic.
             (Can be thought of as a “primitive operator” to build a variety of
             applications)
Primary Link Detection Results
Newswire, Multilingual links, 10-file deferral period
[Bar chart: Actual Norm(Cost) and Minimum Norm(Cost) per system; Normalized Cost on a log axis, 0.01 to 1]
Scores are better than last year!

Topic Tracking Task
   • System Goal:
           – To detect stories that discuss the target topic, in
             multiple source streams
                   • Supervised Training
                           – Given Nt sample stories that discuss a given target topic
                   • Testing
                           – Find all subsequent stories that discuss the target topic
[Figure: story stream split into training data (on-topic and unknown stories) and test data (unknown stories)]




Primary Tracking Results
Newswire, Multilingual Texts, 1 English Training Story
[Bar chart: Actual Norm(Cost) and Minimum Norm(Cost) per system; Normalized Cost on a log axis, 0.01 to 1; 2003’s best score marked]

Supervised Adaptive Tracking Task
   • Variation of Topic Tracking system goal:
           – To detect stories that discuss the target topic when
             a human provides feedback to the system
                   • System receives human judgment (on- or off-topic)
                     for every retrieved story
           – Same task as TREC 2002 Adaptive Filtering
[Figure: training data and test data streams; legend: on-topic, unknown, un-retrieved, retrieved on-topic, retrieved off-topic]
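
The feedback loop of the Supervised Adaptive Tracking task can be sketched as follows. This is only an illustration under stated assumptions: the slides say that one submitted system (CMU4) is a simple cosine similarity tracker but give no details, so the class name `AdaptiveTracker`, the tokenizer, the threshold of 0.2, and the rule of adding on-topic stories to the topic centroid are all hypothetical choices for this example.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (a deliberately crude tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class AdaptiveTracker:
    """Tracks one topic; adapts only on stories it actually retrieved."""

    def __init__(self, training_stories, threshold=0.2):
        self.centroid = Counter()
        for story in training_stories:
            self.centroid.update(term_vector(story))
        self.threshold = threshold

    def track(self, story, judge):
        """Return True if the story is retrieved (a YES decision).

        judge(story) stands in for the human and returns True for on-topic.
        It is called only for retrieved stories, as in the task definition.
        """
        vector = term_vector(story)
        if cosine(self.centroid, vector) < self.threshold:
            return False  # un-retrieved: no feedback is available
        if judge(story):
            self.centroid.update(vector)  # retrieved on-topic: adapt
        return True


def always_on_topic(story):
    return True


tracker = AdaptiveTracker(["earthquake strikes coastal city"])
print(tracker.track("rescue teams reach earthquake city", always_on_topic))  # True
print(tracker.track("rescue teams find survivors", always_on_topic))         # True

static = AdaptiveTracker(["earthquake strikes coastal city"])
print(static.track("rescue teams find survivors", always_on_topic))          # False
```

The second story shares no terms with the single training story, so only the tracker that adapted on the first retrieved story picks it up. Rejected stories never produce feedback, so a threshold set too high leaves the tracker nothing to adapt on.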




Supervised Adaptive Tracking Metrics
   • Normalized Detection Cost
           – Same measure as for basic Tracking task
   • Linear Utility Measure
           – As defined for TREC 2002 Filtering Track
             (Robertson & Soboroff)
           – Measures value of the stories sent to the user:
                   • Credit for relevant stories, debit for non-relevant stories
                   • Equivalent to thresholding based on estimated probability
                     of relevance
           – No penalty for missing relevant stories
             (i.e. all precision, no recall)
           – Implication: Challenge is to beat the “do-nothing” baseline
             (i.e. a system that rejects all stories)

Supervised Adaptive Tracking Metrics
   • Linear Utility Measure Computation:
           – Basic formula: U = Wrel * R - NR
                   • R = number of relevant stories retrieved
                   • NR = number of non-relevant stories retrieved
                   • Wrel = relative weight of relevant vs. non-relevant
                     (set to 10, by analogy with CMiss vs. CFA weights for CDet)
           – Normalization across topics:
                   • Divide by maximum possible utility score for each topic
           – Scaling across topics:
                   • Define arbitrary minimum possible score, to avoid having
                     average dominated by a few topics with huge NR counts
                   • Corresponds to application scenario in which user stops looking
                     at stories when system exceeds some tolerable false alarm rate
           – Scaled, normalized value:
                   Uscale = [ max(Unorm, Umin) - Umin ] / [ 1 - Umin ]
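
The utility computation can be sketched as below, with two labeled assumptions. First, the scaled value subtracts Umin in the numerator, as in the TREC 2002 filtering measure this metric is modeled on. Second, Umin is taken to be -0.5, a value not stated on the slide but consistent with the do-nothing baseline of 0.33 quoted later in the deck. The function names are illustrative.

```python
W_REL = 10.0  # relative weight of relevant vs. non-relevant (from the slide)
U_MIN = -0.5  # assumed minimum normalized utility (not given on the slide)


def linear_utility(relevant_retrieved, nonrelevant_retrieved, w_rel=W_REL):
    """U = Wrel * R - NR."""
    return w_rel * relevant_retrieved - nonrelevant_retrieved


def normalized_utility(relevant_retrieved, nonrelevant_retrieved,
                       total_relevant, w_rel=W_REL):
    """Divide U by the best possible score: every relevant story, nothing else."""
    max_utility = w_rel * total_relevant
    if max_utility <= 0:
        raise ValueError("topic needs at least one relevant story")
    return linear_utility(relevant_retrieved, nonrelevant_retrieved, w_rel) / max_utility


def scaled_utility(u_norm, u_min=U_MIN):
    """Clip at u_min, then map the range [u_min, 1] onto [0, 1]."""
    return (max(u_norm, u_min) - u_min) / (1.0 - u_min)


# A system that rejects every story has U = 0 and therefore Unorm = 0:
print(round(scaled_utility(0.0), 2))  # 0.33

# Hypothetical topic with 10 relevant stories; 5 retrieved plus 10 false alarms:
u_norm = normalized_utility(5, 10, total_relevant=10)
print(round(u_norm, 2), round(scaled_utility(u_norm), 2))  # 0.4 0.6
```

Clipping at Umin means that a topic with thousands of false alarms scores 0 instead of a large negative number, so it cannot dominate the average across topics.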


Supervised Adaptive Tracking
Best Two Submissions per Site
Newswire, Multilingual Texts, 1 English Training Story
[Chart not recovered in extraction; annotation: Best 2004 standard tracking result!]

Effect of Supervised Adaptation
   • CMU4 is a simple cosine similarity tracker
           – Contrastive run submitted without supervised
             adaptation
[Bar chart: Minimum Norm(Cost) for Tracking vs. SA Tracking; Normalized Cost on a log axis, 0.01 to 1]




                          Supervised Adaptive Tracking
                          Utility vs. Detection cost                                                                                                                                                                                                              Hierarchical Topic Detection
                                                                                                          Actual Normalized DET Cost

                                                                                                                                                                                                                                                                  • System goal:
                                                                                                          Min. Normalized DET Cost
                                                                                                                                                                                                       Minimum DET Cost vs. Scaled Utility
                             1                                                                            Scaled Utility
                                                                                                                                                                                                   0.8
                                                                                                                                                                                                                                                                         – To detect topics in terms of the (clusters of) stories
    System Performance




                                                                                                                                                                                                   0.7

                         0.1
                                                                                                                                                                                                   0.6                                                                     that discuss them
                                                                                                                                                                                  Scaled Utility




                                                                                                                                                                                                   0.5
                                                                                                                                                                                                   0.4                                                            • Problems with past Topic Detection evaluations:
              0.01
                                                                                                                                                                                                   0.3
                                                                                                                                                                                                   0.2
                                                                                                                                                                                                                           y = 1.0398x + 0.2942                          – Topics are at different levels of granularity,
                                                                                                                                                                                                                                     2

                                                                                                                                                                                                                                                                           yet systems had to choose single operating point
                                 CMU6

                                        CMU2

                                               CMU1

                                                      CMU5

                                                             CMU3-TrecUtl

                                                                            CMU4

                                                                                   CMU7

                                                                                          CMU8-dbg

                                                                                                     UMass2

                                                                                                              UMass1

                                                                                                                       UMass3

                                                                                                                                UMass4

                                                                                                                                         UMass7

                                                                                                                                                  UMD1

                                                                                                                                                         UMD2




      for creating a new cluster
    – Stories may pertain to multiple topics, yet systems had to assign each to only one cluster

[Figure: Minimum DET Cost (x-axis) vs. Scaled Utility (y-axis), with linear fit; R² = 0.2349]

• Performance on Utility measure:
    – 2/3 of systems surpassed the baseline scaled utility score (0.33)
    – Most systems optimized for detection cost, not utility
• Detection Cost and Utility are only weakly correlated: R² of 0.23
    – Even for CMU3, which was tuned for utility
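The baseline scaled utility of 0.33 is what a "do-nothing" system (one that rejects every story) would score. The sketch below shows one way to arrive at that number. It assumes the TREC 2002 form of the scaling, (max(Unorm, Umin) − Umin) / (1 − Umin), and a floor of Umin = −0.5; neither the floor value nor the function and parameter names come from the slides, so treat them as illustrative assumptions. Wrel = 10 follows the utility definition used for the task.

```python
def scaled_utility(relevant_retrieved: int,
                   nonrelevant_retrieved: int,
                   total_relevant: int,
                   w_rel: float = 10.0,
                   u_min: float = -0.5) -> float:
    """Scaled, normalized linear utility for a single topic.

    u_min = -0.5 is an assumed floor (the TREC 2002 filtering value),
    not a number stated on the slides.
    """
    # Basic linear utility: U = Wrel * R - NR
    raw = w_rel * relevant_retrieved - nonrelevant_retrieved
    # Normalize by the maximum possible utility for this topic.
    u_norm = raw / (w_rel * total_relevant)
    # Clip at the floor, then rescale so the result lies in [0, 1].
    return (max(u_norm, u_min) - u_min) / (1.0 - u_min)


baseline = scaled_utility(0, 0, total_relevant=40)     # system retrieves nothing
perfect = scaled_utility(40, 0, total_relevant=40)     # all relevant, no false alarms
flooded = scaled_utility(5, 1000, total_relevant=40)   # many false alarms, hits the floor
print(round(baseline, 2), perfect, flooded)
```

Under these assumptions a system only beats the baseline if the relevant stories it retrieves outweigh its false alarms by the 10:1 weighting, which is why surpassing 0.33 is a meaningful threshold.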




Topic Hierarchy Solves Problems
• System operation:
    – Unsupervised topic training: no topic instances as input
    – Assign each story to one or more clusters
    – Clusters may overlap or include other clusters
    – Clusters must be organized as a directed acyclic graph (DAG) with a single root
    – Treated as retrospective search
• Semantics of topic hierarchy:
    – Root = entire collection
    – Leaf nodes = the most specific topics
    – Intermediate nodes represent different levels of granularity
• Performance assessment:
    – Given a topic, find the matching cluster with the lowest cost

[Figure: example topic hierarchy. Legend: Vertex, Edge, Story IDs. Vertices a to j are drawn in levels (a; b, c; d, e, f, g; h, i, j), with story IDs s1 to s16 attached.]

Hierarchical Topic Detection
Observations
• All systems structured the hierarchy as a tree: each vertex has one parent
• Travel cost has very little effect on finding the best cluster
    – Setting WDET to 1.0 has little effect on topic mapping
• Cost parameters favor false alarms
    – Average mapped cluster sizes are between 1262 and 7757 stories
    – Average topic size is 40 stories
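The performance assessment described above ("given a topic, find the matching cluster with the lowest cost") can be sketched as a search over all vertices of the hierarchy. This is a minimal illustration, not the official scoring procedure: it uses only the normalized detection cost with the deck's constants (CMiss = 1, CFA = 0.1, Ptarget = 0.02) and leaves out the travel cost. The `Vertex` class, the function names, and the rule that a vertex's cluster contains the stories of all its descendants are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Detection cost constants as defined earlier in the deck.
C_MISS, C_FA, P_TARGET = 1.0, 0.1, 0.02


@dataclass
class Vertex:
    name: str
    stories: set = field(default_factory=set)     # story IDs attached directly
    children: list = field(default_factory=list)  # outgoing DAG edges

    def cluster(self) -> set:
        """Stories of this vertex plus all descendants (an assumption)."""
        found = set(self.stories)
        for child in self.children:
            found |= child.cluster()
        return found


def normalized_cost(n_missed, n_false_alarms, topic_size, corpus_size):
    """Normalized detection cost of one cluster with respect to one topic."""
    p_miss = n_missed / topic_size
    p_fa = n_false_alarms / (corpus_size - topic_size)
    cost = C_MISS * p_miss * P_TARGET + C_FA * p_fa * (1 - P_TARGET)
    return cost / min(C_MISS * P_TARGET, C_FA * (1 - P_TARGET))


def best_cluster(root, topic, corpus_size):
    """Return (cost, vertex name) for the lowest-cost vertex in the hierarchy."""
    best, stack, seen = None, [root], set()
    while stack:
        vertex = stack.pop()
        if id(vertex) in seen:  # in a DAG a vertex may be reachable by several paths
            continue
        seen.add(id(vertex))
        stories = vertex.cluster()
        cost = normalized_cost(len(topic - stories), len(stories - topic),
                               len(topic), corpus_size)
        if best is None or cost < best[0]:
            best = (cost, vertex.name)
        stack.extend(vertex.children)
    return best


# Tiny hypothetical hierarchy: root a with two leaf clusters b and c.
b = Vertex("b", {"s1", "s2"})
c = Vertex("c", {"s3", "s4"})
a = Vertex("a", children=[b, c])
print(best_cluster(a, topic={"s1", "s2"}, corpus_size=4))  # → (0.0, 'b')
```

The same cost function also illustrates why the cost parameters favor false alarms. Assuming the 407,503-story TDT5 corpus, a 7,757-story cluster that contains all 40 stories of a topic has 7,717 false alarms and scores `normalized_cost(0, 7717, 40, 407503)`, roughly 0.09. A small cluster that misses just 4 of the 40 on-topic stories scores `normalized_cost(4, 0, 40, 407503)`, which is 0.10. Very large clusters are therefore cheap matches.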
Summary
• Eleven research groups participated in five evaluation tasks
• Error rates increased for new event detection
    – Why?
• Error rates decreased for tracking
• Error rates decreased for link detection
• Dry run of hierarchical topic detection completed
    – Solves previous problems with the topic detection task, but raises new issues
    – Questions to consider:
        • Is the specified hierarchical structure (single-root DAG) appropriate?
        • Is the minimal cost metric appropriate?
        • If so, is the normalization right?
• Dry run of supervised adaptive tracking completed
    – Promising results for including relevance feedback
    – Questions to consider:
        • Should we continue the task?
        • If so, should we continue using both metrics?

What do teams use?
• TNO (HDT at TDT 2004)
    – Focus on scalability
    – Making agglomerative clustering scalable for large document collections:
        • Take a sample
        • Build a hierarchical cluster structure of this sample
        • Optimize the resulting binary tree for the minimal cost metric
            – Detection cost, travel cost
        • Assign the remaining docs from the corpus to clusters in the structure obtained from the sample
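The sample-then-assign idea listed for TNO can be sketched as follows. This is only an illustration of the general scheme, not TNO's system: the cosine similarity over term-weight dictionaries, the centroid linkage, the random sampling, and all names (`cluster_corpus`, `flatten`, the toy corpus) are assumptions, and the step that optimizes the binary tree for the minimal cost metric is left out entirely.

```python
import math
import random
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_corpus(corpus, sample_size, seed=0):
    """corpus: dict doc_id -> term-weight dict.

    Returns a binary tree: leaves are lists of doc IDs,
    internal nodes are (left, right) tuples.
    """
    sample_ids = random.Random(seed).sample(sorted(corpus), sample_size)

    # Agglomerative clustering of the sample only (expensive step, small input).
    leaves = {d: [d] for d in sample_ids}
    nodes = [(leaves[d], Counter(corpus[d])) for d in sample_ids]
    while len(nodes) > 1:
        pairs = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))]
        # Merge the two clusters whose summed vectors are most similar.
        i, j = max(pairs, key=lambda p: cosine(nodes[p[0]][1], nodes[p[1]][1]))
        merged = ((nodes[i][0], nodes[j][0]), nodes[i][1] + nodes[j][1])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]

    # Cheap step: attach every remaining document to its most similar sample leaf.
    for doc_id, vec in corpus.items():
        if doc_id not in leaves:
            nearest = max(sample_ids, key=lambda s: cosine(vec, corpus[s]))
            leaves[nearest].append(doc_id)
    return nodes[0][0]


def flatten(tree):
    """List all doc IDs below a tree node."""
    if isinstance(tree, list):
        return list(tree)
    left, right = tree
    return flatten(left) + flatten(right)


# Hypothetical toy corpus.
corpus = {
    "d1": {"bomb": 2, "casablanca": 1},
    "d2": {"bomb": 1, "casablanca": 2},
    "d3": {"euro": 2, "sweden": 1},
    "d4": {"euro": 1, "sweden": 2},
    "d5": {"bomb": 1, "riyadh": 1},
}
tree = cluster_corpus(corpus, sample_size=3)
print(tree)
```

The point of the design is that the quadratic pairwise comparison runs only over the sample, while each remaining document costs one pass over the sample leaves.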




UMass at TDT 2004
• Hierarchical topic detection
    – Topic detection classifies stories into different topics
    – Two-step algorithm
        • 1-NN for event formation
            – Stories from the same source are selected and time ordered
            – Stories are processed one by one; each incoming story is compared to (a certain number of) stories before it
        • Agglomerative clustering for building the hierarchy
            – Events are sorted by the time stamp of their first story, then a bounded agglomerative clustering is applied to the events
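The first step above (1-NN event formation) can be sketched as below. The slides do not give the similarity measure, the look-back size, or the decision threshold, so cosine similarity, `window=100`, and `threshold=0.3` are placeholder assumptions, as are the function names and the toy stories. The second step (bounded agglomerative clustering of the resulting events) is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def form_events(stories, window=100, threshold=0.3):
    """stories: list of (story_id, term-weight dict) from one source, in time order.

    Returns a list of events, each a list of story IDs.
    """
    events = []    # events[e] = story IDs in event e
    event_of = []  # event_of[i] = event index of the i-th story
    for idx, (story_id, vec) in enumerate(stories):
        # Compare only against a bounded number of preceding stories.
        best_sim, best_idx = 0.0, None
        for prev in range(max(0, idx - window), idx):
            sim = cosine(vec, stories[prev][1])
            if sim > best_sim:
                best_sim, best_idx = sim, prev
        if best_idx is not None and best_sim >= threshold:
            event = event_of[best_idx]  # join the nearest neighbour's event
            events[event].append(story_id)
        else:
            event = len(events)         # no close neighbour: start a new event
            events.append([story_id])
        event_of.append(event)
    return events


# Hypothetical stories in time order.
stories = [
    ("s1", {"bomb": 1, "casablanca": 1}),
    ("s2", {"euro": 1, "sweden": 1}),
    ("s3", {"bomb": 1, "casablanca": 1, "suspects": 1}),
    ("s4", {"euro": 1, "vote": 1}),
]
print(form_events(stories))  # → [['s1', 's3'], ['s2', 's4']]
```

Because each story is compared with at most `window` predecessors, the cost grows linearly with the number of stories rather than quadratically.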

Topic detection and tracking

  • 1. Outline Topic Detection and Tracking • Topic detection and tracking • Overview of TDT 2004 Valentin Jijkoun & Maarten de Rijke Informatics Institute University of Amsterdam http://ilps.science.uva.nl/Teaching/II0506 March 6, 2006 Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 1 Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 2 TDT… Topic Detection and Tracking • Introduction… Terabytes of • 5 TDT Applications – http://www.nist.gov/speech/tests/tdt/ Unorganized data – Story segmentation* – Topic Tracking – Topic Detection – First Story Detection – Link Detection * Not evaluated in 2004 Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 3 Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 4 TDT’s Research Domain Definitions • Technology challenge • An event is … – Develop applications that organize and locate – A specific thing that happens at a specific time and place along with all necessary preconditions and relevant stories from a continuous feed of unavoidable consequences. news stories • A topic is … • Research driven by evaluation tasks – an event or activity, along with all directly related • Composite applications built from events and activities – Document Retrieval • A broadcast news story is … – a section of transcribed text with substantive – Speech-to-Text (STT) – not included in 2004 information content and a unified topical focus – Story Segmentation – not included in 2004 Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 5 Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 6
  • 2. TDT 2004 Evaluation Corpus • TDT Evaluation Overview TDT4 TDT5 • 2004: same (2003 (2004 languages as 2003 • Changes in 2004 Collection Dates corpus) Oct 1, 2000 to corpus) April 1, 2003 to • Summary of • 2004 TDT Evaluation Result Summaries Newswire Jan 31, 2001 3 Arabic Sep 30, 2003 6 Arabic differences – New Event Detection Sources 2 English 7 English – New time period 2 Mandarin 4 Mandarin – No broadcast news – Link Detection Broadcast News Sources 2 Arabic NONE • No non-news stories 5 English – Topic Tracking 5 Mandarin – 4.5 times more stories Story Counts 90735 news, 407503 news, – Experimental Tasks: 7513 non-news 0 non-news – 3.1 times more topics stories – Topics have ! as • Supervised Adaptive Topic Tracking Annotated topics 80 250 many on-topic stories • Hierarchical Topic Detection Average topic 79 stories 40 stories size Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 7 Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 8 Topic Size Distribution Multilingual Topic overlap Single Overlap Topics Multiply Overlap Topics 35 7 21 63 62 62 Arb+Eng+Man Arb+Eng Eng+Man Eng Man Arb Common Topic ID 107: Casablanca bombs Stories 105 Unique Stories 126 12 1000 283 583 106 118 72 89 532 105 20 380 2 215 3 107 154 1110 6 9 Number of On-Topic Stories 92 1 29 125 25 100 451 63 140 1 60 227 70 Arabic 93 1 9 Mandarin 151 189 42 2 90 171 22 71 2 English 80 5 6 10 78 69 145 427 1 3 186 193 Topics on 71: Demonstrations in 31 1 104 1 Terrorism Casablanca Topics (sorted by language and size) Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 9 Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 10 Topic labels Participation by Task: Showing the Number of Submitted System Runs Single Overlap Topics Multiple Overlap Topics Sites New Event Hierarchical Topic Tracking Link Detection Topic Traditional Supervised Detection 72 Court indicts Liberian President 105 UN 
official killed in attack Detection Adaptation 89 Liberian former president arrives in exile 126 British soldiers attacked in Basra CMU Carnegie Mellon Univ. 215 Jerusalem: Bus suicide bombing 1 6 8 10 International Business 29 Swedish Foreign Minister killed 227 Bin Laden Videotape IBM Machines 4 125 Sweden rejects the Euro 171 Morocco: death sentences for bombing Domestic Stottler Henke suspects SHAI Associates, Inc. 5 151 Egyptian delegation in Gaza UIowa Univ. of Iowa 4 189 Palestinian public uprising suspended 107 Casablanca bombs UMd Univ. of Maryland for three months 71 Demonstrations in Casablanca 1 2 Univ. Massachusetts UMass 4 6 5 7 4 69 Earthquake in Algeria 106 Bombing in Riyadh, Saudi Arabia Chinese Univ. of Hong 145 Visit of Morocco Minister of Foreign 118 World Economic Forum in Jordan CUHK Kong 1 Affairs to Algeria 154 Saudi suicide bomber dies in shootout Institute of Computing ICT 11 1 60 Saudi King has eye surgery Foreign Technology Chinese Academy of Sciences 186 Press conference between Lebanon and 80 Spanish Elections US foreign ministers NEU Northeastern University 2 2 in China 193 Colin Powell Plans to visit Middle East Netherlands Org for and Europe TNO Applied Scientific 8 Research Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 11 Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 12
  • 3. New Event Detection Task TDT Evaluation Methodology • System Goal: • Tasks are modeled as detection tasks – To detect the first story that discusses each – Systems are presented with many trials and must answer the question: “Is this example a target trial?” topic – Systems respond: • YES this is a target, or NO this is not • Each decision includes a likelihood score indicating the First Stories on two topics system’s confidence in the decision • System performance measured by linearly = Topic 1 = Topic 2 combining the system’s missed detection rate and false alarm rate Not First Stories Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 13 Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 14 Detection Evaluation Methodology Performance Measures Example • Performance is measured in terms of Detection Cost Bar Chart – CDet = CMiss * PMiss * Ptarget + CFA * PFA * (1- Ptarget) DET Curve Actual Normalized 1 – Constants: Detection Cost • CMiss = 1 and CFA = 0.1 are preset costs Minimum DET • Ptarget = 0.02 is the a priori probability of a target Normalized Cost – System performance estimates • PMiss and PFA P(miss) = 5.5% > Min DET Norm Detection Cost – Normalized Detection Cost generally lies between 0 and 1: P(fa)=1.1% Cost = 0.11 • (CDet)Norm = CDet/min{CMiss*Ptarget, CFA * (1-Ptarget)} 0.1 te r • Detection Error Tradeoff (DET) curves graphically depict the is bet performance tradeoff between PMiss and PFA le f t – Makes use of likelihood scores attached to the YES/NO decisions tto m Bo ! 
Two important scores per system – Actual Normalized Detection Cost 0.01 > • Based on the YES/NO decision threshold P(miss) = 0.7% Min DET NormEnglish Mandarin – Minimum Normalized DET point P(fa)=1.5% Cost = 0.08 • Based on the DET curve: Minimum score with proper threshold Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 15 Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 16 Primary New Event Detection Results TDT Link Detection Task Newswire, English Texts System Goal: Actual Norm(Cost) – To detect whether a pair of stories discuss the same topic. Minimum Norm(Cost) (Can be thought of as a “primitive operator” to build a variety of 1 applications) Normalized Cost ? 0.1 1 U1 1 1 AI M ass CM IB SH UM 2003’s best score Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 17 Internet Information/MIKII6, March 6, 2006 Valentin Jijkoun, Maarten de Rijke 18
Primary Link Detection Results
Newswire, Multilingual links, 10-file deferral period

[Chart: Actual Norm(Cost) and Minimum Norm(Cost) per system; y-axis: Normalized Cost]

Scores are better than last year!

Topic Tracking Task
• System Goal:
  – To detect stories that discuss the target topic, in multiple source streams
• Supervised Training
  – Given Nt sample stories that discuss a given target topic
• Testing
  – Find all subsequent stories that discuss the target topic

[Figure: story stream split into training data and test data; legend: on-topic, unknown]

Primary Tracking Results
Newswire, Multilingual Texts, 1 English Training Story

[Chart: Actual Norm(Cost) and Minimum Norm(Cost) per system; y-axis: Normalized Cost; reference line: 2003's best score]

Supervised Adaptive Tracking Task
• Variation of Topic Tracking system goal:
  – To detect stories that discuss the target topic when a human provides feedback to the system
    • System receives human judgment (on or off-topic) for every retrieved story
  – Same task as TREC 2002 Adaptive Filtering

[Figure: story stream split into training data and test data; legend: on-topic, unknown, un-retrieved, retrieved on-topic, retrieved off-topic]

Supervised Adaptive Tracking Metrics
• Normalized Detection Cost
  – Same measure as for basic Tracking task
• Linear Utility Measure
  – As defined for TREC 2002 Filtering Track (Robertson & Soboroff)
  – Measures value of the stories sent to the user:
    • Credit for relevant stories, debit for non-relevant stories
    • Equivalent to thresholding based on estimated probability of relevance
  – No penalty for missing relevant stories (i.e. all precision, no recall)
  – Implication: Challenge is to beat the “do-nothing” baseline (i.e. a system that rejects all stories)

Supervised Adaptive Tracking Metrics
• Linear Utility Measure Computation:
  – Basic formula: U = Wrel × R - NR
    • R = number of relevant stories retrieved
    • NR = number of non-relevant stories retrieved
    • Wrel = relative weight of relevant vs non-relevant (set to 10, by analogy with CMiss vs. CFA weights for CDet)
  – Normalization across topics:
    • Divide by maximum possible utility score for each topic: Unorm = U / Umax
  – Scaling across topics:
    • Define arbitrary minimum possible score Umin, to avoid having average dominated by a few topics with huge NR counts
    • Corresponds to application scenario in which user stops looking at stories when system exceeds some tolerable false alarm rate
  – Scaled, normalized value: Uscale = [ max(Unorm, Umin) - Umin ] / [ 1 - Umin ]
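The utility computation can be sketched in a few lines. This is an illustration under stated assumptions, not the official scorer: it uses the TREC 2002 form of the scaling, (max(Unorm, Umin) - Umin) / (1 - Umin), takes the maximum possible utility of a topic to be Wrel times its number of relevant stories, and assumes Umin = -0.5 (the TREC 2002 value; the slides only call it an arbitrary minimum). The function names are hypothetical.

```python
def scaled_utility(relevant_retrieved, nonrelevant_retrieved, total_relevant,
                   w_rel=10.0, u_min=-0.5):
    """Scaled, normalized linear utility for one topic.

    w_rel = 10 follows the slides; u_min = -0.5 is an assumption.
    """
    utility = w_rel * relevant_retrieved - nonrelevant_retrieved
    max_utility = w_rel * total_relevant      # every relevant story, no false alarms
    u_norm = utility / max_utility
    # Clip at u_min so a topic flooded with false alarms cannot dominate the average.
    return (max(u_norm, u_min) - u_min) / (1.0 - u_min)


def mean_scaled_utility(per_topic_counts):
    """Average over topics; each item is (relevant_retrieved, nonrelevant_retrieved, total_relevant)."""
    values = [scaled_utility(r, nr, total) for r, nr, total in per_topic_counts]
    return sum(values) / len(values)


do_nothing = scaled_utility(0, 0, total_relevant=40)    # rejects every story
perfect = scaled_utility(40, 0, total_relevant=40)      # retrieves exactly the topic
flooded = scaled_utility(0, 1000, total_relevant=40)    # only false alarms
print(round(do_nothing, 2), perfect, flooded)
```

With these assumptions the do-nothing system scores 1/3, which is consistent with the 0.33 baseline scaled utility quoted elsewhere in these slides, and any system that retrieves more harm than good falls below it.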
Supervised Adaptive Tracking
Best Two Submissions per Site
Newswire, Multilingual Texts, 1 English Training Story

[Chart: results per system; reference line: best 2004 standard tracking result]

Effect of Supervised Adaptation
• CMU4 is a simple cosine similarity tracker
  – Contrastive run submitted without supervised adaptation

[Chart: Minimum Norm(Cost) for Tracking vs. SA Tracking; y-axis: Normalized Cost]

Supervised Adaptive Tracking: Utility vs. Detection cost

[Chart "System Performance": Actual Normalized DET Cost, Min. Normalized DET Cost and Scaled Utility for CMU6, CMU2, CMU1, CMU5, CMU3-TrecUtl, CMU4, CMU7, CMU8-dbg, UMass2, UMass1, UMass3, UMass4, UMass7, UMD1, UMD2]
[Scatter plot "Minimum DET Cost vs. Scaled Utility": x-axis Minimum DET Cost, y-axis Scaled Utility; linear fit y = 1.0398x + 0.2942, R² = 0.2349]

• Performance on Utility measure:
  – 2/3 of systems surpassed baseline scaled utility score (0.33)
  – Most systems optimized for detection cost, not utility
• Detection Cost and Utility are uncorrelated: R² of 0.23
  – Even for CMU3, which was tuned for utility

Hierarchical Topic Detection
• System goal:
  – To detect topics in terms of the (clusters of) stories that discuss them
• Problems with past Topic Detection evaluations:
  – Topics are at different levels of granularity, yet systems had to choose single operating point for creating a new cluster
  – Stories may pertain to multiple topics, yet systems had to assign each to only one cluster

Topic Hierarchy Solves Problems
• System operation:
  – Unsupervised topic training: no topic instances as input
  – Assign each story to one or more clusters
  – Clusters may overlap or include other clusters
  – Clusters must be organized as directed acyclic graph (DAG) with single root
  – Treated as retrospective search
• Semantics of topic hierarchy:
  – Root = entire collection
  – Leaf nodes = the most specific topics
  – Intermediate nodes represent different levels of granularity
• Performance assessment:
  – Given a topic, find matching cluster with lowest cost

[Figure: example topic hierarchy; vertices a to j connected by edges, with story IDs s1 to s16 attached]

Hierarchical Topic Detection Observations
• All systems structured hierarchy as a tree
  – Each vertex has one parent
• Travel cost has very little effect on finding the best cluster
  – Setting WDET to 1.0 has little effect on topic mapping
• Cost parameters favor false alarms
  – Average mapped cluster sizes are between 1262 and 7757 stories
  – Average topic size is 40 stories
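The performance assessment step ("given a topic, find matching cluster with lowest cost") can be sketched as a search over all clusters. This is a simplified illustration: the travel cost component is left out because its formula is not given in these slides (the observations above suggest it has little effect on which cluster is chosen), and the detection cost uses the assumed TDT constants C_miss = 1, C_fa = 0.1, P_target = 0.02. The hierarchy, story IDs and function names are hypothetical.

```python
def cluster_detection_cost(cluster, topic, n_stories,
                           c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Normalized detection cost of using `cluster` as the answer for `topic`.

    cluster, topic: sets of story IDs; n_stories: size of the whole collection.
    The cost constants are assumptions, not taken from the slides.
    """
    p_miss = len(topic - cluster) / len(topic)
    p_fa = len(cluster - topic) / (n_stories - len(topic))
    c_det = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    return c_det / min(c_miss * p_target, c_fa * (1.0 - p_target))


def best_cluster(clusters, topic, n_stories):
    """Return (name, cost) of the lowest-cost cluster; travel cost is ignored here."""
    costs = {name: cluster_detection_cost(stories, topic, n_stories)
             for name, stories in clusters.items()}
    name = min(costs, key=costs.get)
    return name, costs[name]


# Hypothetical single-root hierarchy over 16 stories; each cluster lists all
# stories below it, so a parent always includes its children.
hierarchy = {
    "root": set(range(1, 17)),
    "b": {1, 2, 3, 4, 5, 6, 7, 8},
    "d": {5, 6, 7, 8},
    "h": {7, 8},
}
topic_stories = {5, 6, 7}

best_name, best_cost = best_cluster(hierarchy, topic_stories, n_stories=16)
print(best_name, round(best_cost, 3))
```

In this toy case the mid-level cluster wins: the root and its large child pay for many false alarms, while the small leaf misses two of the three topic stories.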
Summary
• Eleven research groups participated in five evaluation tasks
• Error rates increased for new event detection
  – Why?
• Error rates decreased for tracking
• Error rates decreased for link detection
• Dry run of hierarchical topic detection completed
  – Solves previous problems with topic detection task, but raises new issues
  – Questions to consider:
    • Is the specified hierarchical structure (single-root DAG) appropriate?
    • Is the minimal cost metric appropriate?
    • If so, is the normalization right?
• Dry run of supervised adaptive tracking completed
  – Promising results for including relevance feedback
  – Questions to consider:
    • Should we continue the task?
    • If so, should we continue using both metrics?

What do teams use?
• TNO (HDT at TDT 2004)
  – Focus on scalability
  – Agglomerative clustering scalable for large document collections
    • Take a sample
    • Build a hierarchical cluster structure of this sample
    • Optimize resulting binary tree for minimal cost metric
      – Detection cost, travel cost
    • Assign remaining docs from the corpus to cluster in the structure obtained from the sample

UMass at TDT 2004
• Hierarchical topic detection
  – Topic detection classifies stories into different topics
  – Two-step algorithm
    • 1-NN for event formation
      – Stories from same source selected and time ordered
      – Stories are processed one by one; each incoming story is compared to (a certain number of) stories before it
    • Agglomerative clustering for building the hierarchy
      – Events are sorted by time order according to time stamp of first story, then do a bounded agglomerative clustering for the events
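The first step of the two-step algorithm, 1-NN event formation, might look roughly like the sketch below. It is not the UMass implementation: the slides do not say how stories are represented or compared, so raw term counts with cosine similarity, the `window` size (the "certain number of stories before it") and the `threshold` are all assumptions, and the function names are hypothetical. The second step, bounded agglomerative clustering over the resulting events, is not shown.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def form_events(stories, window=50, threshold=0.3):
    """Assign an event ID to each story by 1-nearest-neighbour matching.

    stories: token lists from one source, already in time order.
    Each story is compared with at most `window` preceding stories; it joins
    the event of its most similar predecessor if that similarity reaches
    `threshold`, otherwise it starts a new event.
    """
    vectors, event_ids = [], []
    next_event = 0
    for tokens in stories:
        vec = Counter(tokens)
        best_sim, best_idx = 0.0, None
        for i in range(max(0, len(vectors) - window), len(vectors)):
            sim = cosine(vec, vectors[i])
            if sim > best_sim:
                best_sim, best_idx = sim, i
        if best_idx is not None and best_sim >= threshold:
            event_ids.append(event_ids[best_idx])
        else:
            event_ids.append(next_event)
            next_event += 1
        vectors.append(vec)
    return event_ids


toy_stories = [
    ["quake", "hits", "city"],
    ["election", "vote", "results"],
    ["quake", "city", "rescue"],
    ["vote", "results", "recount"],
]
toy_events = form_events(toy_stories)
print(toy_events)  # [0, 1, 0, 1]
```

Bounding the comparison window keeps the pass linear in the number of stories for a fixed window, which fits the scalability concern raised for this task.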