SlideShare a Scribd company logo
1 of 15
Download to read offline
University of Helsinki                                                                                                    Department of Computer Scien




          Utilizing Temporal Information in
            Topic Detection and Tracking
                                       Juha Makkonen and Helena Ahonen–Myka
                                           {jamakkon,hahonen}@cs.helsinki.fi


                            University of Helsinki – Department of Computer Science




Juha Makkonen and Helena Ahonen–Myka            Utilizing Temporal Information in Topic Detection and Tracking – p.1/15                     2003-08-1
University of Helsinki                                                                                              Department of Computer Scien




    Outline
                         Introduction
                         Topic Detection and Tracking
                         Resolving temporal expressions
                            Recognition
                            Formalization
                            Comparison
                         Experiments
                         Future Work




Juha Makkonen and Helena Ahonen–Myka      Utilizing Temporal Information in Topic Detection and Tracking – p.2/15                     2003-08-1
University of Helsinki                                                                                              Department of Computer Scien




    Introduction
                         Temporal expressions are often omitted.
                            their extraction requires tools,
                            they have to be formalized in order to be of any use,
                            comparing formalizations is sometimes tricky.
                         By no means a novel idea
                            in AI to form chronologies of events,
                            in question answering to extract a fact,
                            in databases, diagnosing systems, dialog systems . . .
                         We want to measure the temporal similarity of two
                         documents.

Juha Makkonen and Helena Ahonen–Myka      Utilizing Temporal Information in Topic Detection and Tracking – p.3/15                     2003-08-1
University of Helsinki                                                                                               Department of Computer Scien




    Topic Detection and Tracking
                         TDT system monitors news broadcasts in order to
                             detect new, previously unreported events, and to
                             track the development of the detected events.
                         The focus is on news events: something untrivial taking
                         place at a specific time and place.
                         A topic is understood as as is an event or an activity,
                         along with all related events and activities.
                         The news stream that is monitored in intrinsically
                         sensitive to time.



Juha Makkonen and Helena Ahonen–Myka       Utilizing Temporal Information in Topic Detection and Tracking – p.4/15                     2003-08-1
University of Helsinki                                                                                                Department of Computer Scien




    Resolving Temporal Expressions
                         An expression can be
                             explicit: “the 19th of August 2003”,
                             implicit: “today”, “Tuesday afternoon”, or
                             vague: “since April”, “a couple of weeks ago” .
                         The evaluation is based on a point of reference. “The
                         winter of 1974 was cold. The next winter will be colder.”
                         “The winter of 1974 was cold. The next winter was colder.”
                         Resolving the meaning of the latter winter requires
                             the reference time or the utterance time and
                             the tense of the relevant verb.

Juha Makkonen and Helena Ahonen–Myka        Utilizing Temporal Information in Topic Detection and Tracking – p.5/15                     2003-08-1
University of Helsinki                                                                                                             Department of Computer Scien




    Recognition
                         The relevant terms are split into categories.
              category                 terms
              baseterm                 day, week, weekday, month, monthname, quarter, season, year, decade
              indexical                yesterday, today, tomorrow
              internal                 beginning, end, early, late, middle
              determiner               this, last, next, previous, the
              temporal                 in, on, by, during, after, until, since, before, later
              postmodifier              of, to
              numeral                  one, two, . . .
              ordinal                  first, second, . . .
              adverb                   ago
              meta                     throughout
              vague                    some, few, several
              recurrence               every, per
              source                   from
Juha Makkonen and Helena Ahonen–Myka                     Utilizing Temporal Information in Topic Detection and Tracking – p.6/15                     2003-08-1
University of Helsinki                                                                                                         Department of Computer Scien




    Recognition
                         The categories are used to build automata.

                                                                 postmodifier
                                                                                                 determiner

                                               ordinal          postmodifier                       determiner
                                                                                                                        year
                                       determiner
                                init
                                                                     monthname
                                           determiner

                                                                    internal
                                         temporal
                                                                                                       numeral
                                                                        internal                       ordinal

                         “The strike started on the 15th of May 1919. It lasted until
                         the end of June, although there was still turmoil in late
                         January next year”.
Juha Makkonen and Helena Ahonen–Myka                Utilizing Temporal Information in Topic Detection and Tracking – p.7/15                      2003-08-1
University of Helsinki                                                                                                     Department of Computer Scien




    Formalization
                         We map the expressions onto a calendar
                             a time-line – points with precedence relation,
                             a set of granularities (year, month, week, . . . )
                             note: March, Thursday and weekend are also granularities.

                             a set of conversion functions between granularities.
                         The expressions are mapped as intervals [tstart , tend ] of the
                         bottom granularity which in our case is day.




Juha Makkonen and Helena Ahonen–Myka             Utilizing Temporal Information in Topic Detection and Tracking – p.8/15                     2003-08-1
University of Helsinki                                                                                               Department of Computer Scien




    Formalization
                         The baseterm of the expression defines interval.
                         The non-baseterms are interpreted as shift and span
                         functions that modify the start and end points.
                            shift: this, next, last, 3 weeks ago, etc.
                            span: until, before, after, from, etc.
                         the length of the interval is modified by internals
                            in the beginning of 1970s, late May, etc.




Juha Makkonen and Helena Ahonen–Myka       Utilizing Temporal Information in Topic Detection and Tracking – p.9/15                     2003-08-1
University of Helsinki                                                                                               Department of Computer Scien




    Comparison
                         We want to measure the temporal similarity of two
                         documents, i.e., how much the references overlap.
                         When comparing the intervals of two documents
                            compare pairwise all intervals
                            similarity = 2 * overlap / size of the intervals
                            take the average of the best matches for each interval.
                         The outcome measures how well the references of one
                         document cover those of the other.




Juha Makkonen and Helena Ahonen–Myka      Utilizing Temporal Information in Topic Detection and Tracking – p.10/15                     2003-08-1
University of Helsinki                                                                                                  Department of Computer Scien




    Experiments
                         Data: transcribed TV and radio broadcasts and online
                         news.
                             8595 documents from the TDT2 corpus.
                             2383 documents were labeled to one of 35 events.
                         Temporal expression recognition with 1417 sentences

                                         type     freq          recognition                      canonization
                                       simple     326                            0.98                                  0.93
                                  composite       209                            0.85                                  0.66
                         Verbs like to schedule , to plan or to expect gave hard time.
                         “The meeting was scheduled for Monday.” Which one?
Juha Makkonen and Helena Ahonen–Myka        Utilizing Temporal Information in Topic Detection and Tracking – p.11/15                      2003-08-1
University of Helsinki                                                                                                Department of Computer Scien




    Experiments
                         The distribution of temporal relations

                                                                     same event
                                       relation                         yes                 no
                                       before                     0.761              0.831
                                       meets                      0.001              0.000
                                       overlaps                   0.016              0.008
                                       begins                     0.010              0.006
                                       falls within               0.168              0.122
                                       finishes                    0.010              0.008
                                       exact                      0.072              0.056
Juha Makkonen and Helena Ahonen–Myka       Utilizing Temporal Information in Topic Detection and Tracking – p.12/15                     2003-08-1
University of Helsinki                                                                                               Department of Computer Scien




    Experiments
                         Temporal similarity is higher when documents are
                         relevant.

             average of                 same event different event ratio of yes/no
             sum of pairwise                 0.0034                                 0.0023                               1.4783
             max of pairwise                 0.0059                                 0.0040                               1.4750


                         Finding the best-match for each interval does not pay off.
                         A better accuracy on formalization would help.
                         What is the meaning of “three years ago?”
                         How to represent informativeness?
Juha Makkonen and Helena Ahonen–Myka      Utilizing Temporal Information in Topic Detection and Tracking – p.13/15                     2003-08-1
University of Helsinki                                                                                               Department of Computer Scien




    Future Work
                         Improvement of the composite expression processing
                            more work on the automata
                         Introduction of vagueness:
                            an expression would formalized as probability
                            distributions on the timeline
                            similarity could be Kullback-Leibler, for instance.
                         Survey of the behaviour of the temporal expressions
                            how the references distribute per medium?
                            the first story compared to the following ones?



Juha Makkonen and Helena Ahonen–Myka      Utilizing Temporal Information in Topic Detection and Tracking – p.14/15                     2003-08-1
University of Helsinki                                                                                            Department of Computer Scien




    The End



                                       Thank you




Juha Makkonen and Helena Ahonen–Myka   Utilizing Temporal Information in Topic Detection and Tracking – p.15/15                     2003-08-1

More Related Content

More from George Ang

Do not crawl in the dust 
different ur ls similar text
Do not crawl in the dust 
different ur ls similar textDo not crawl in the dust 
different ur ls similar text
Do not crawl in the dust 
different ur ls similar textGeorge Ang
 
大规模数据处理的那些事儿
大规模数据处理的那些事儿大规模数据处理的那些事儿
大规模数据处理的那些事儿George Ang
 
腾讯大讲堂02 休闲游戏发展的文化趋势
腾讯大讲堂02 休闲游戏发展的文化趋势腾讯大讲堂02 休闲游戏发展的文化趋势
腾讯大讲堂02 休闲游戏发展的文化趋势George Ang
 
腾讯大讲堂03 qq邮箱成长历程
腾讯大讲堂03 qq邮箱成长历程腾讯大讲堂03 qq邮箱成长历程
腾讯大讲堂03 qq邮箱成长历程George Ang
 
腾讯大讲堂04 im qq
腾讯大讲堂04 im qq腾讯大讲堂04 im qq
腾讯大讲堂04 im qqGeorge Ang
 
腾讯大讲堂05 面向对象应对之道
腾讯大讲堂05 面向对象应对之道腾讯大讲堂05 面向对象应对之道
腾讯大讲堂05 面向对象应对之道George Ang
 
腾讯大讲堂06 qq邮箱性能优化
腾讯大讲堂06 qq邮箱性能优化腾讯大讲堂06 qq邮箱性能优化
腾讯大讲堂06 qq邮箱性能优化George Ang
 
腾讯大讲堂07 qq空间
腾讯大讲堂07 qq空间腾讯大讲堂07 qq空间
腾讯大讲堂07 qq空间George Ang
 
腾讯大讲堂08 可扩展web架构探讨
腾讯大讲堂08 可扩展web架构探讨腾讯大讲堂08 可扩展web架构探讨
腾讯大讲堂08 可扩展web架构探讨George Ang
 
腾讯大讲堂09 如何建设高性能网站
腾讯大讲堂09 如何建设高性能网站腾讯大讲堂09 如何建设高性能网站
腾讯大讲堂09 如何建设高性能网站George Ang
 
腾讯大讲堂01 移动qq产品发展历程
腾讯大讲堂01 移动qq产品发展历程腾讯大讲堂01 移动qq产品发展历程
腾讯大讲堂01 移动qq产品发展历程George Ang
 
腾讯大讲堂10 customer engagement
腾讯大讲堂10 customer engagement腾讯大讲堂10 customer engagement
腾讯大讲堂10 customer engagementGeorge Ang
 
腾讯大讲堂11 拍拍ce工作经验分享
腾讯大讲堂11 拍拍ce工作经验分享腾讯大讲堂11 拍拍ce工作经验分享
腾讯大讲堂11 拍拍ce工作经验分享George Ang
 
腾讯大讲堂14 qq直播(qq live) 介绍
腾讯大讲堂14 qq直播(qq live) 介绍腾讯大讲堂14 qq直播(qq live) 介绍
腾讯大讲堂14 qq直播(qq live) 介绍George Ang
 
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍George Ang
 
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍George Ang
 
腾讯大讲堂16 产品经理工作心得分享
腾讯大讲堂16 产品经理工作心得分享腾讯大讲堂16 产品经理工作心得分享
腾讯大讲堂16 产品经理工作心得分享George Ang
 
腾讯大讲堂17 性能优化不是仅局限于后台(qzone)
腾讯大讲堂17 性能优化不是仅局限于后台(qzone)腾讯大讲堂17 性能优化不是仅局限于后台(qzone)
腾讯大讲堂17 性能优化不是仅局限于后台(qzone)George Ang
 
腾讯大讲堂18 让我们戴上有色眼镜--qzone前台架构的优化分享
腾讯大讲堂18 让我们戴上有色眼镜--qzone前台架构的优化分享腾讯大讲堂18 让我们戴上有色眼镜--qzone前台架构的优化分享
腾讯大讲堂18 让我们戴上有色眼镜--qzone前台架构的优化分享George Ang
 
腾讯大讲堂19 系统优化的方向
腾讯大讲堂19 系统优化的方向腾讯大讲堂19 系统优化的方向
腾讯大讲堂19 系统优化的方向George Ang
 

More from George Ang (20)

Do not crawl in the dust 
different ur ls similar text
Do not crawl in the dust 
different ur ls similar textDo not crawl in the dust 
different ur ls similar text
Do not crawl in the dust 
different ur ls similar text
 
大规模数据处理的那些事儿
大规模数据处理的那些事儿大规模数据处理的那些事儿
大规模数据处理的那些事儿
 
腾讯大讲堂02 休闲游戏发展的文化趋势
腾讯大讲堂02 休闲游戏发展的文化趋势腾讯大讲堂02 休闲游戏发展的文化趋势
腾讯大讲堂02 休闲游戏发展的文化趋势
 
腾讯大讲堂03 qq邮箱成长历程
腾讯大讲堂03 qq邮箱成长历程腾讯大讲堂03 qq邮箱成长历程
腾讯大讲堂03 qq邮箱成长历程
 
腾讯大讲堂04 im qq
腾讯大讲堂04 im qq腾讯大讲堂04 im qq
腾讯大讲堂04 im qq
 
腾讯大讲堂05 面向对象应对之道
腾讯大讲堂05 面向对象应对之道腾讯大讲堂05 面向对象应对之道
腾讯大讲堂05 面向对象应对之道
 
腾讯大讲堂06 qq邮箱性能优化
腾讯大讲堂06 qq邮箱性能优化腾讯大讲堂06 qq邮箱性能优化
腾讯大讲堂06 qq邮箱性能优化
 
腾讯大讲堂07 qq空间
腾讯大讲堂07 qq空间腾讯大讲堂07 qq空间
腾讯大讲堂07 qq空间
 
腾讯大讲堂08 可扩展web架构探讨
腾讯大讲堂08 可扩展web架构探讨腾讯大讲堂08 可扩展web架构探讨
腾讯大讲堂08 可扩展web架构探讨
 
腾讯大讲堂09 如何建设高性能网站
腾讯大讲堂09 如何建设高性能网站腾讯大讲堂09 如何建设高性能网站
腾讯大讲堂09 如何建设高性能网站
 
腾讯大讲堂01 移动qq产品发展历程
腾讯大讲堂01 移动qq产品发展历程腾讯大讲堂01 移动qq产品发展历程
腾讯大讲堂01 移动qq产品发展历程
 
腾讯大讲堂10 customer engagement
腾讯大讲堂10 customer engagement腾讯大讲堂10 customer engagement
腾讯大讲堂10 customer engagement
 
腾讯大讲堂11 拍拍ce工作经验分享
腾讯大讲堂11 拍拍ce工作经验分享腾讯大讲堂11 拍拍ce工作经验分享
腾讯大讲堂11 拍拍ce工作经验分享
 
腾讯大讲堂14 qq直播(qq live) 介绍
腾讯大讲堂14 qq直播(qq live) 介绍腾讯大讲堂14 qq直播(qq live) 介绍
腾讯大讲堂14 qq直播(qq live) 介绍
 
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
 
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
 
腾讯大讲堂16 产品经理工作心得分享
腾讯大讲堂16 产品经理工作心得分享腾讯大讲堂16 产品经理工作心得分享
腾讯大讲堂16 产品经理工作心得分享
 
腾讯大讲堂17 性能优化不是仅局限于后台(qzone)
腾讯大讲堂17 性能优化不是仅局限于后台(qzone)腾讯大讲堂17 性能优化不是仅局限于后台(qzone)
腾讯大讲堂17 性能优化不是仅局限于后台(qzone)
 
腾讯大讲堂18 让我们戴上有色眼镜--qzone前台架构的优化分享
腾讯大讲堂18 让我们戴上有色眼镜--qzone前台架构的优化分享腾讯大讲堂18 让我们戴上有色眼镜--qzone前台架构的优化分享
腾讯大讲堂18 让我们戴上有色眼镜--qzone前台架构的优化分享
 
腾讯大讲堂19 系统优化的方向
腾讯大讲堂19 系统优化的方向腾讯大讲堂19 系统优化的方向
腾讯大讲堂19 系统优化的方向
 

Recently uploaded

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Utilizing temporal information in topic detection and tracking

  • 1. University of Helsinki Department of Computer Scien Utilizing Temporal Information in Topic Detection and Tracking Juha Makkonen and Helena Ahonen–Myka {jamakkon,hahonen}@cs.helsinki.fi University of Helsinki – Department of Computer Science Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.1/15 2003-08-1
  • 2. University of Helsinki Department of Computer Scien Outline Introduction Topic Detection and Tracking Resolving temporal expressions Recognition Formalization Comparison Experiments Future Work Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.2/15 2003-08-1
  • 3. University of Helsinki Department of Computer Scien Introduction Temporal expressions are often omitted. their extraction requires tools, they have to be formalized in order to be of any use, comparing formalizations is sometimes tricky. By no means a novel idea in AI to form chronologies of events, in question answering to extract a fact, in databases, diagnosing systems, dialog systems . . . We want to measure the temporal similarity of two documents. Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.3/15 2003-08-1
  • 4. University of Helsinki Department of Computer Scien Topic Detection and Tracking TDT system monitors news broadcasts in order to detect new, previously unreported events, and to track the development of the detected events. The focus is on news events: something untrivial taking place at a specific time and place. A topic is understood as as is an event or an activity, along with all related events and activities. The news stream that is monitored in intrinsically sensitive to time. Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.4/15 2003-08-1
  • 5. University of Helsinki Department of Computer Scien Resolving Temporal Expressions An expression can be explicit: “the 19th of August 2003”, implicit: “today”, “Tuesday afternoon”, or vague: “since April”, “a couple of weeks ago” . The evaluation is based on a point of reference. “The winter of 1974 was cold. The next winter will be colder.” “The winter of 1974 was cold. The next winter was colder.” Resolving the meaning of the latter winter requires the reference time or the utterance time and the tense of the relevant verb. Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.5/15 2003-08-1
  • 6. University of Helsinki Department of Computer Scien Recognition The relevant terms are split into categories. category terms baseterm day, week, weekday, month, monthname, quarter, season, year, decade indexical yesterday, today, tomorrow internal beginning, end, early, late, middle determiner this, last, next, previous, the temporal in, on, by, during, after, until, since, before, later postmodifier of, to numeral one, two, . . . ordinal first, second, . . . adverb ago meta throughout vague some, few, several recurrence every, per source from Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.6/15 2003-08-1
  • 7. University of Helsinki Department of Computer Scien Recognition The categories are used to build automata. postmodifier determiner ordinal postmodifier determiner year determiner init monthname determiner internal temporal numeral internal ordinal “The strike started on the 15th of May 1919. It lasted until the end of June, although there was still turmoil in late January next year”. Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.7/15 2003-08-1
  • 8. University of Helsinki Department of Computer Scien Formalization We map the expressions onto a calendar a time-line – points with precedence relation, a set of granularities (year, month, week, . . . ) note: March, Thursday and weekend are also granularities. a set of conversion functions between granularities. The expressions are mapped as intervals [tstart , tend ] of the bottom granularity which in our case is day. Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.8/15 2003-08-1
  • 9. University of Helsinki Department of Computer Scien Formalization The baseterm of the expression defines interval. The non-baseterms are interpreted as shift and span functions that modify the start and end points. shift: this, next, last, 3 weeks ago, etc. span: until, before, after, from, etc. the length of the interval is modified by internals in the beginning of 1970s, late May, etc. Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.9/15 2003-08-1
  • 10. University of Helsinki Department of Computer Scien Comparison We want to measure the temporal similarity of two documents, i.e., how much the references overlap. When comparing the intervals of two documents compare pairwise all intervals similarity = 2 * overlap / size of the intervals take the average of the best matches for each interval. The outcome measures how well the references of one document cover those of the other. Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.10/15 2003-08-1
  • 11. University of Helsinki Department of Computer Scien Experiments Data: transcribed TV and radio broadcasts and online news. 8595 documents from the TDT2 corpus. 2383 documents were labeled to one of 35 events. Temporal expression recognition with 1417 sentences type freq recognition canonization simple 326 0.98 0.93 composite 209 0.85 0.66 Verbs like to schedule , to plan or to expect gave hard time. “The meeting was scheduled for Monday.” Which one? Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.11/15 2003-08-1
  • 12. University of Helsinki Department of Computer Scien Experiments The distribution of temporal relations same event relation yes no before 0.761 0.831 meets 0.001 0.000 overlaps 0.016 0.008 begins 0.010 0.006 falls within 0.168 0.122 finishes 0.010 0.008 exact 0.072 0.056 Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.12/15 2003-08-1
  • 13. University of Helsinki Department of Computer Scien Experiments Temporal similarity is higher when documents are relevant. average of same event different event ratio of yes/no sum of pairwise 0.0034 0.0023 1.4783 max of pairwise 0.0059 0.0040 1.4750 Finding the best-match for each interval does not pay off. A better accuracy on formalization would help. What is the meaning of “three years ago?” How to represent informativeness? Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.13/15 2003-08-1
  • 14. University of Helsinki Department of Computer Scien Future Work Improvement of the composite expression processing more work on the automata Introduction of vagueness: an expression would formalized as probability distributions on the timeline similarity could be Kullback-Leibler, for instance. Survey of the behaviour of the temporal expressions how the references distribute per medium? the first story compared to the following ones? Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.14/15 2003-08-1
  • 15. University of Helsinki Department of Computer Scien The End Thank you Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.15/15 2003-08-1