University of Helsinki                                                                                                    Department of Computer Science




          Utilizing Temporal Information in
            Topic Detection and Tracking
                                       Juha Makkonen and Helena Ahonen–Myka
                                           {jamakkon,hahonen}@cs.helsinki.fi


                            University of Helsinki – Department of Computer Science




Juha Makkonen and Helena Ahonen–Myka            Utilizing Temporal Information in Topic Detection and Tracking – p.1/15                     2003-08-1




    Outline
                         Introduction
                         Topic Detection and Tracking
                         Resolving temporal expressions
                            Recognition
                            Formalization
                            Comparison
                         Experiments
                         Future Work








    Introduction
                         Temporal expressions are often omitted, because
                            their extraction requires tools,
                            they have to be formalized in order to be of any use,
                            comparing formalizations is sometimes tricky.
                         Using them is by no means a novel idea:
                            in AI to form chronologies of events,
                            in question answering to extract a fact,
                            in databases, diagnosing systems, dialog systems . . .
                         We want to measure the temporal similarity of two
                         documents.





    Topic Detection and Tracking
                         A TDT system monitors news broadcasts in order to
                             detect new, previously unreported events, and to
                             track the development of the detected events.
                         The focus is on news events: something nontrivial taking
                         place at a specific time and place.
                         A topic is understood as an event or an activity,
                         along with all related events and activities.
                         The news stream that is monitored is intrinsically
                         sensitive to time.







    Resolving Temporal Expressions
                         An expression can be
                             explicit: “the 19th of August 2003”,
                             implicit: “today”, “Tuesday afternoon”, or
                             vague: “since April”, “a couple of weeks ago”.
                         The evaluation is based on a point of reference. “The
                         winter of 1974 was cold. The next winter will be colder.”
                         “The winter of 1974 was cold. The next winter was colder.”
                         Resolving the meaning of the latter winter requires
                             the reference time or the utterance time and
                             the tense of the relevant verb.
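
To illustrate why both the reference time and the tense matter, here is a minimal sketch that resolves a bare weekday name against a reference date. The function name `resolve_weekday`, the two tense labels, and the rule "past tense looks backward, future tense looks forward" are assumptions made for this example, not the method used in the talk.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_weekday(name: str, reference: date, tense: str) -> date:
    """Resolve a weekday name relative to a reference date (hypothetical rule)."""
    target = WEEKDAYS.index(name.lower())
    if tense == "future":
        # Closest matching weekday strictly after the reference date.
        ahead = (target - reference.weekday()) % 7
        return reference + timedelta(days=ahead or 7)
    if tense == "past":
        # Closest matching weekday strictly before the reference date.
        back = (reference.weekday() - target) % 7
        return reference - timedelta(days=back or 7)
    raise ValueError(f"unknown tense: {tense!r}")


reference = date(2003, 8, 19)  # a Tuesday
print(resolve_weekday("Monday", reference, "past"))
print(resolve_weekday("Monday", reference, "future"))
```

The same surface expression resolves to two different dates depending on the tense, which is the point the two "winter" sentences make.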





    Recognition
                         The relevant terms are split into categories.
              category                 terms
              baseterm                 day, week, weekday, month, monthname, quarter, season, year, decade
              indexical                yesterday, today, tomorrow
              internal                 beginning, end, early, late, middle
              determiner               this, last, next, previous, the
              temporal                 in, on, by, during, after, until, since, before, later
              postmodifier              of, to
              numeral                  one, two, . . .
              ordinal                  first, second, . . .
              adverb                   ago
              meta                     throughout
              vague                    some, few, several
              recurrence               every, per
              source                   from
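
The category table can be read as a small lexicon. The sketch below tags whitespace-separated tokens with their category. The names `LEXICON`, `TERM_TO_CATEGORY` and `tag` are hypothetical, and the concrete weekday and month names are filled in as an assumption, since the table only names those classes. The open-ended numeral and ordinal categories are left out for brevity.

```python
# Category lexicon following the table above. Weekday and month names are
# expanded here as an assumption; the table lists only the class names.
LEXICON = {
    "baseterm": ["day", "week", "month", "quarter", "season", "year", "decade"],
    "weekday": ["monday", "tuesday", "wednesday", "thursday",
                "friday", "saturday", "sunday"],
    "monthname": ["january", "february", "march", "april", "may", "june",
                  "july", "august", "september", "october", "november",
                  "december"],
    "indexical": ["yesterday", "today", "tomorrow"],
    "internal": ["beginning", "end", "early", "late", "middle"],
    "determiner": ["this", "last", "next", "previous", "the"],
    "temporal": ["in", "on", "by", "during", "after", "until", "since",
                 "before", "later"],
    "postmodifier": ["of", "to"],
    "adverb": ["ago"],
    "meta": ["throughout"],
    "vague": ["some", "few", "several"],
    "recurrence": ["every", "per"],
    "source": ["from"],
}

# Invert the lexicon so each term maps to its category.
TERM_TO_CATEGORY = {term: cat for cat, terms in LEXICON.items() for term in terms}


def tag(text):
    """Return (token, category) pairs; unknown tokens are tagged 'other'."""
    return [(tok, TERM_TO_CATEGORY.get(tok.lower(), "other"))
            for tok in text.split()]


print(tag("the end of June"))
```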




    Recognition
                         The categories are used to build automata.

                         [Figure: finite-state automaton starting from the state init, with transitions
                         labeled determiner, temporal, ordinal, postmodifier, internal, monthname,
                         numeral and year.]

                         “The strike started on the 15th of May 1919. It lasted until
                         the end of June, although there was still turmoil in late
                         January next year”.
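
A minimal sketch of such an automaton, working on sequences of category labels rather than raw text. The states, the transition table and the set of accepting states are hypothetical and were chosen only so that the three expressions in the example sentence are accepted; they are not the automaton shown on the slide.

```python
# Hypothetical transition table: (state, category) -> next state.
TRANSITIONS = {
    ("init", "temporal"): "temp",
    ("init", "determiner"): "det",
    ("init", "internal"): "int",
    ("temp", "determiner"): "det",
    ("temp", "internal"): "int",
    ("det", "ordinal"): "ord",
    ("det", "internal"): "int",
    ("ord", "postmodifier"): "post",
    ("int", "postmodifier"): "post",
    ("int", "monthname"): "month",
    ("post", "monthname"): "month",
    ("month", "year"): "year",
    ("month", "determiner"): "det2",
    ("det2", "baseterm"): "base",
}
ACCEPTING = {"month", "year", "base"}


def accepts(categories):
    """Return True if the category sequence ends in an accepting state."""
    state = "init"
    for category in categories:
        state = TRANSITIONS.get((state, category))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING


# "on the 15th of May 1919"
print(accepts(["temporal", "determiner", "ordinal", "postmodifier", "monthname", "year"]))
# "until the end of June"
print(accepts(["temporal", "determiner", "internal", "postmodifier", "monthname"]))
# "in late January next year"
print(accepts(["temporal", "internal", "monthname", "determiner", "baseterm"]))
```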




    Formalization
                         We map the expressions onto a calendar, which consists of
                             a time-line: points with a precedence relation,
                             a set of granularities (year, month, week, . . . );
                             note: March, Thursday and weekend are also granularities,

                             a set of conversion functions between granularities.
                         The expressions are mapped as intervals [t_start, t_end] of the
                         bottom granularity, which in our case is day.
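
As a rough illustration of conversion to the bottom granularity, the sketch below turns a day, a month and a year into closed intervals of days. The helper names are hypothetical, and representing an interval as a pair of `datetime.date` values is an assumption made for this example.

```python
import calendar
from datetime import date


def day_interval(year, month, day):
    """A single day is an interval whose start and end coincide."""
    d = date(year, month, day)
    return (d, d)


def month_interval(year, month):
    """Convert the month granularity to days: first to last day of the month."""
    last_day = calendar.monthrange(year, month)[1]
    return (date(year, month, 1), date(year, month, last_day))


def year_interval(year):
    """Convert the year granularity to days: 1 January to 31 December."""
    return (date(year, 1, 1), date(year, 12, 31))


print(month_interval(1919, 5))
print(year_interval(1974))
```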








    Formalization
                         The baseterm of the expression defines the interval.
                         The non-baseterms are interpreted as shift and span
                         functions that modify the start and end points:
                            shift: this, next, last, 3 weeks ago, etc.
                            span: until, before, after, from, etc.
                         The length of the interval is modified by internals:
                            in the beginning of the 1970s, late May, etc.
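
The sketch below shows one possible reading of shift, span and internal modifiers on day intervals. All function names are hypothetical, and the concrete semantics (for example, that "late" means the last third of the interval) are assumptions for illustration; the slides do not specify them.

```python
from datetime import date, timedelta
from typing import Tuple

Interval = Tuple[date, date]


def shift_days(iv: Interval, days: int) -> Interval:
    """Shift: move both end points, e.g. days=-21 for '3 weeks ago'."""
    delta = timedelta(days=days)
    return (iv[0] + delta, iv[1] + delta)


def span_until(iv: Interval, reference: date) -> Interval:
    """Span: 'until X' stretches from the reference date to the end of X."""
    return (reference, iv[1])


def apply_internal(iv: Interval, part: str) -> Interval:
    """Internal: keep roughly a third of the interval (assumed semantics)."""
    length = (iv[1] - iv[0]).days + 1
    if length < 3:
        return iv  # too short to subdivide
    third = length // 3
    if part in ("beginning", "early"):
        return (iv[0], iv[0] + timedelta(days=third - 1))
    if part in ("end", "late"):
        return (iv[1] - timedelta(days=third - 1), iv[1])
    if part == "middle":
        return (iv[0] + timedelta(days=third), iv[1] - timedelta(days=third))
    raise ValueError(f"unknown internal: {part!r}")


may_1919 = (date(1919, 5, 1), date(1919, 5, 31))
print(apply_internal(may_1919, "late"))
```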








    Comparison
                         We want to measure the temporal similarity of two
                         documents, i.e., how much the references overlap.
                         When comparing the intervals of two documents
                            compare pairwise all intervals
                            similarity = 2 * overlap / size of the intervals
                            take the average of the best matches for each interval.
                         The outcome measures how well the references of one
                         document cover those of the other.
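
The comparison steps above can be sketched as follows. This assumes that "size of the intervals" means the sum of the two interval lengths in days, which makes the pairwise score a Dice-style coefficient between 0 and 1. The function names and the date-pair representation are hypothetical.

```python
from datetime import date
from typing import List, Tuple

Interval = Tuple[date, date]


def size(iv: Interval) -> int:
    """Length of a closed interval in days."""
    return (iv[1] - iv[0]).days + 1


def overlap(a: Interval, b: Interval) -> int:
    """Number of days shared by two closed intervals (0 if disjoint)."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return max(0, (end - start).days + 1)


def similarity(a: Interval, b: Interval) -> float:
    """similarity = 2 * overlap / (size(a) + size(b)), an assumed reading."""
    return 2 * overlap(a, b) / (size(a) + size(b))


def coverage(doc_a: List[Interval], doc_b: List[Interval]) -> float:
    """Average, over intervals of doc_a, of the best match found in doc_b."""
    if not doc_a or not doc_b:
        return 0.0
    best = [max(similarity(a, b) for b in doc_b) for a in doc_a]
    return sum(best) / len(best)


a = (date(2003, 8, 1), date(2003, 8, 10))
b = (date(2003, 8, 6), date(2003, 8, 15))
print(similarity(a, b))
```

Note that `coverage(doc_a, doc_b)` is not symmetric: it measures how well the intervals of `doc_b` cover those of `doc_a`.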








    Experiments
                         Data: transcribed TV and radio broadcasts and online
                         news.
                             8595 documents from the TDT2 corpus.
                             2383 documents were labeled with one of 35 events.
                         Temporal expression recognition, evaluated on 1417 sentences:

                             type         freq    recognition    canonization
                             simple        326           0.98            0.93
                             composite     209           0.85            0.66

                         Verbs like to schedule, to plan or to expect proved difficult.
                         “The meeting was scheduled for Monday.” Which Monday?




    Experiments
                         The distribution of temporal relations

                             relation        same event: yes    same event: no
                             before                    0.761             0.831
                             meets                     0.001             0.000
                             overlaps                  0.016             0.008
                             begins                    0.010             0.006
                             falls within              0.168             0.122
                             finishes                  0.010             0.008
                             exact                     0.072             0.056
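
For reference, the seven relations in the table can be told apart with a few comparisons on the interval end points. The sketch below is one possible classifier for closed day intervals. The function name and the exact definitions (for example, that "meets" means the later interval starts on the day after the earlier one ends) are assumptions, since the slides do not define the relations.

```python
from datetime import date, timedelta


def relation(a, b):
    """Classify the relation between two closed day intervals (order-insensitive)."""
    if a == b:
        return "exact"
    # Put the interval that starts first (or, on a tie, ends first) in front.
    (s1, e1), (s2, e2) = sorted([a, b])
    if s1 == s2:
        return "begins"
    if e1 == e2:
        return "finishes"
    if e1 < s2:  # no shared day
        return "meets" if e1 + timedelta(days=1) == s2 else "before"
    if e2 < e1:  # the later interval lies strictly inside the earlier one
        return "falls within"
    return "overlaps"


def d(day):
    """Shorthand for a day in August 2003."""
    return date(2003, 8, day)


print(relation((d(1), d(10)), (d(5), d(15))))
```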




    Experiments
                         Temporal similarity is higher when documents are
                         relevant.

             average of           same event    different event    ratio of yes/no
             sum of pairwise          0.0034             0.0023             1.4783
             max of pairwise          0.0059             0.0040             1.4750


                         Finding the best match for each interval does not pay off.
                         Better accuracy in formalization would help.
                         What is the meaning of “three years ago”?
                         How to represent informativeness?




    Future Work
                         Improvement of the composite expression processing
                            more work on the automata
                         Introduction of vagueness:
                            an expression would be formalized as a probability
                            distribution on the timeline,
                            similarity could be the Kullback-Leibler divergence, for instance.
                         Survey of the behaviour of the temporal expressions:
                            how are the references distributed per medium?
                            how does the first story compare to the following ones?
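
As a rough sketch of the vagueness idea, an expression can be represented as a discrete distribution over the days of a timeline and two expressions compared with the Kullback-Leibler divergence. The near-uniform distribution, the smoothing constant `eps` and the function names are all assumptions made for illustration; the slides only name the divergence as one option.

```python
import math


def interval_distribution(start, end, n_days, eps=1e-9):
    """Near-uniform distribution over days start..end on a timeline of n_days.

    Days outside the interval receive a tiny mass eps so that the
    divergence below stays finite (a smoothing assumption).
    """
    raw = [1.0 if start <= day <= end else eps for day in range(n_days)]
    total = sum(raw)
    return [value / total for value in raw]


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); lower means more similar."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = interval_distribution(0, 9, 30)
q = interval_distribution(5, 14, 30)   # overlaps p on five days
r = interval_distribution(20, 29, 30)  # disjoint from p
print(kl_divergence(p, q) < kl_divergence(p, r))
```

The divergence is asymmetric and is a dissimilarity rather than a similarity, so a practical measure would need to symmetrize or invert it in some way.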







    The End



                                       Thank you




