The document discusses utilizing temporal information in topic detection and tracking. It describes recognizing and formalizing temporal expressions to measure the temporal similarity between documents. The authors present their approach to resolving temporal expressions, recognizing terms, formalizing expressions as intervals on a timeline, and comparing temporal references between documents. They also discuss experiments on a news corpus and directions for future work, including improving composite expression processing and introducing vagueness.
Utilizing temporal information in topic detection and tracking
1. University of Helsinki Department of Computer Science
Utilizing Temporal Information in
Topic Detection and Tracking
Juha Makkonen and Helena Ahonen–Myka
{jamakkon,hahonen}@cs.helsinki.fi
University of Helsinki – Department of Computer Science
Juha Makkonen and Helena Ahonen–Myka Utilizing Temporal Information in Topic Detection and Tracking – p.1/15 2003-08-1
2. Outline
Introduction
Topic Detection and Tracking
Resolving temporal expressions
Recognition
Formalization
Comparison
Experiments
Future Work
3. Introduction
Temporal expressions are often omitted:
their extraction requires tools,
they have to be formalized in order to be of any use,
comparing formalizations is sometimes tricky.
By no means a novel idea
in AI to form chronologies of events,
in question answering to extract a fact,
in databases, diagnosing systems, dialog systems . . .
We want to measure the temporal similarity of two
documents.
4. Topic Detection and Tracking
A TDT system monitors news broadcasts in order to
detect new, previously unreported events, and to
track the development of the detected events.
The focus is on news events: something nontrivial taking
place at a specific time and place.
A topic is understood as an event or an activity,
along with all related events and activities.
The news stream that is monitored is intrinsically
sensitive to time.
5. Resolving Temporal Expressions
An expression can be
explicit: “the 19th of August 2003”,
implicit: “today”, “Tuesday afternoon”, or
vague: "since April", "a couple of weeks ago".
The evaluation is based on a point of reference. “The
winter of 1974 was cold. The next winter will be colder.”
“The winter of 1974 was cold. The next winter was colder.”
Resolving the meaning of the latter winter requires
the reference time or the utterance time and
the tense of the relevant verb.
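The role of the reference time and the tense can be made concrete with a toy resolver (a hypothetical sketch, not the authors' implementation; a "winter" is identified here by the year after its anchor, a deliberate simplification):

```python
from datetime import date

def resolve_next_winter(narrated: date, utterance: date, tense: str) -> int:
    """Hypothetical resolver for 'the next winter'.

    Past tense ('the next winter was colder') anchors on the narrated
    reference time; future tense ('the next winter will be colder')
    anchors on the utterance time.
    """
    anchor = narrated if tense == "past" else utterance
    return anchor.year + 1  # year of the winter following the anchor
```

For the slide's examples, with the narrated winter of 1974 and an utterance date in 2003, the past-tense reading resolves to 1975 and the future-tense reading to 2004.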
6. Recognition
The relevant terms are split into categories.
category terms
baseterm day, week, weekday, month, monthname, quarter, season, year, decade
indexical yesterday, today, tomorrow
internal beginning, end, early, late, middle
determiner this, last, next, previous, the
temporal in, on, by, during, after, until, since, before, later
postmodifier of, to
numeral one, two, . . .
ordinal first, second, . . .
adverb ago
meta throughout
vague some, few, several
recurrence every, per
source from
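The table above can be rendered as a simple lookup; a minimal sketch in Python (term lists abbreviated, and the open classes such as monthnames and weekday names omitted):

```python
from typing import Optional

# Category table mirroring the slide; term lists are abbreviated.
CATEGORIES = {
    "baseterm":     {"day", "week", "weekday", "month", "quarter",
                     "season", "year", "decade"},
    "indexical":    {"yesterday", "today", "tomorrow"},
    "internal":     {"beginning", "end", "early", "late", "middle"},
    "determiner":   {"this", "last", "next", "previous", "the"},
    "temporal":     {"in", "on", "by", "during", "after", "until",
                     "since", "before", "later"},
    "postmodifier": {"of", "to"},
    "numeral":      {"one", "two", "three"},
    "ordinal":      {"first", "second", "third"},
    "adverb":       {"ago"},
    "meta":         {"throughout"},
    "vague":        {"some", "few", "several"},
    "recurrence":   {"every", "per"},
    "source":       {"from"},
}

def categorize(token: str) -> Optional[str]:
    """Return the temporal category of a token, or None."""
    t = token.lower()
    for category, terms in CATEGORIES.items():
        if t in terms:
            return category
    return None
```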
7. Recognition
The categories are used to build automata.
(Diagram: a finite-state automaton built from the categories; its states include init, determiner, ordinal, postmodifier, internal, temporal, numeral, monthname, and year.)
“The strike started on the 15th of May 1919. It lasted until
the end of June, although there was still turmoil in late
January next year”.
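A rough, greedy approximation of such automata can be sketched with a small hand-made lexicon (hypothetical and heavily abbreviated): collect maximal runs of temporal-category tokens and keep those anchored by a date-bearing term.

```python
from typing import List

# Hypothetical, abbreviated lexicon standing in for the category
# table; a real recognizer covers all monthnames, ordinals, etc.
LEXICON = {
    "the": "determiner", "on": "temporal", "until": "temporal",
    "in": "temporal", "of": "postmodifier", "end": "internal",
    "late": "internal", "next": "determiner", "15th": "ordinal",
    "may": "monthname", "june": "monthname", "january": "monthname",
    "year": "baseterm", "1919": "year",
}
# A run only counts as an expression if it contains an anchoring term.
ANCHORS = {"baseterm", "monthname", "year", "ordinal", "indexical"}

def recognize(tokens: List[str]) -> List[List[str]]:
    """Greedy stand-in for the automata: collect maximal runs of
    temporal-category tokens, then keep the anchored ones."""
    runs, current = [], []
    for token in tokens:
        if LEXICON.get(token.lower().strip(".,")) is not None:
            current.append(token)
        else:
            if current:
                runs.append(current)
            current = []
    if current:
        runs.append(current)
    return [r for r in runs
            if any(LEXICON[t.lower().strip(".,")] in ANCHORS for t in r)]
```

On the slide's example sentence this picks out runs such as "on the 15th of May 1919" while discarding the bare determiner at the start of the sentence.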
8. Formalization
We map the expressions onto a calendar
a time-line – points with precedence relation,
a set of granularities (year, month, week, . . . )
note: March, Thursday and weekend are also granularities.
a set of conversion functions between granularities.
The expressions are mapped as intervals [t_start, t_end] of the
bottom granularity, which in our case is day.
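A minimal sketch of this mapping for month- and year-granularity expressions, using Python's `calendar` module (an illustration, not the paper's conversion functions):

```python
import calendar
from datetime import date
from typing import Tuple

def month_interval(year: int, month: int) -> Tuple[date, date]:
    """Map a month-granularity expression such as 'May 1919'
    onto a day interval [t_start, t_end]."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)

def year_interval(year: int) -> Tuple[date, date]:
    """Map a year-granularity expression onto a day interval."""
    return date(year, 1, 1), date(year, 12, 31)
```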
9. Formalization
The baseterm of the expression defines the interval.
The non-baseterms are interpreted as shift and span
functions that modify the start and end points.
shift: this, next, last, 3 weeks ago, etc.
span: until, before, after, from, etc.
the length of the interval is modified by internals
in the beginning of 1970s, late May, etc.
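These modifiers can be sketched as functions over day intervals (a hypothetical rendering; the names and signatures are illustrative):

```python
from datetime import date, timedelta
from typing import Tuple

Interval = Tuple[date, date]

def shift(interval: Interval, days: int) -> Interval:
    """Move an interval along the timeline, as for 'next', 'last'
    or '3 weeks ago' (offset given directly in days here)."""
    d = timedelta(days=days)
    return interval[0] + d, interval[1] + d

def span_until(origin: date, interval: Interval) -> Interval:
    """'until <interval>': stretch from an origin date (e.g. the
    publication date) to the end of the interval."""
    return origin, interval[1]
```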
10. Comparison
We want to measure the temporal similarity of two
documents, i.e., how much the references overlap.
When comparing the intervals of two documents
compare pairwise all intervals
similarity = 2 * overlap / size of the intervals
take the average of the best matches for each interval.
The outcome measures how well the references of one
document cover those of the other.
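The comparison above can be sketched directly; a toy version over inclusive day-ordinal intervals:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # [start_day, end_day], inclusive ordinals

def similarity(a: Interval, b: Interval) -> float:
    """2 * overlap / combined size of the intervals, as on the slide."""
    overlap = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if overlap <= 0:
        return 0.0
    size = (a[1] - a[0] + 1) + (b[1] - b[0] + 1)
    return 2 * overlap / size

def temporal_similarity(doc1: List[Interval], doc2: List[Interval]) -> float:
    """Average of the best pairwise match for each interval of doc1:
    how well doc2's references cover those of doc1."""
    if not doc1 or not doc2:
        return 0.0
    best = [max(similarity(a, b) for b in doc2) for a in doc1]
    return sum(best) / len(best)
```

Identical intervals score 1.0, disjoint ones 0.0, and two ten-day intervals that share five days score 0.5.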
11. Experiments
Data: transcribed TV and radio broadcasts and online
news.
8595 documents from the TDT2 corpus.
2383 documents were labeled to one of 35 events.
Temporal expression recognition with 1417 sentences
type freq recognition canonization
simple 326 0.98 0.93
composite 209 0.85 0.66
Verbs like to schedule, to plan or to expect gave us a hard time.
“The meeting was scheduled for Monday.” Which one?
12. Experiments
The distribution of temporal relations, for story pairs on the same event (yes) versus different events (no)
relation yes no
before 0.761 0.831
meets 0.001 0.000
overlaps 0.016 0.008
begins 0.010 0.006
falls within 0.168 0.122
finishes 0.010 0.008
exact 0.072 0.056
13. Experiments
Temporal similarity is higher when documents are
relevant.
average of same event different event ratio of yes/no
sum of pairwise 0.0034 0.0023 1.4783
max of pairwise 0.0059 0.0040 1.4750
Finding the best-match for each interval does not pay off.
Better accuracy in formalization would help.
What is the meaning of “three years ago?”
How to represent informativeness?
14. Future Work
Improvement of the composite expression processing
more work on the automata
Introduction of vagueness:
an expression would be formalized as a probability
distribution on the timeline;
similarity could be measured with Kullback-Leibler divergence, for instance.
Survey of the behaviour of the temporal expressions
how do the references distribute per medium?
how does the first story compare to the following ones?
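The vagueness idea above could be sketched as follows, assuming expressions are discretized into probability mass over timeline bins (the binning and the divergence direction are illustrative choices, not from the slides):

```python
import math
from typing import List

def kl_divergence(p: List[float], q: List[float]) -> float:
    """Kullback-Leibler divergence D(p || q) over shared timeline
    bins; q must be nonzero wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

A vague expression like "since April" would then spread its mass over many day bins instead of occupying a crisp interval, and two documents would be close when their timeline distributions diverge little.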
15. The End
Thank you