A particular problem of searching news archives with named entities is that named entities are very dynamic in appearance compared to other vocabulary terms, and synonym relationships between terms change over time. In previous work, we proposed an approach to extracting time-based synonyms of named entities from the whole history of Wikipedia. In this paper, we present QUEST (Query Expansion using Synonyms over Time), a system that exploits time-based synonyms in searching news archives. The system takes a named entity query as input and automatically determines time-based synonyms for that query wrt. time criteria. Query expansion using the determined synonyms can then be employed to improve retrieval effectiveness.
QUEST: Query Expansion using Synonyms over Time (poster presentation)
Nattiya Kanhabua and Kjetil Nørvåg
Norwegian University of Science and Technology
Overview
Demo: the QUEST system bridges semantic gaps in searching document archives, i.e., a lack of knowledge about terms that are semantically equivalent or related to a named entity query wrt. time
Problem statement: when searching news archives using named entities (people, organizations, locations), synonym* relationships between terms change over time, e.g., due to changes of roles, name alterations, and semantic shift
*In our context, synonyms are name variants (alternative names, titles or roles) of a named entity
Offline module
· The system is driven by entity-synonym relationships [1]
· Synonyms are extracted automatically from the history of Wikipedia
System prototype
· The system takes a named entity query and a time period as input
· Synonyms of the entity at that time period are retrieved and ranked by time-based scores
· A user can select a synonym to expand the named entity query
Extracting Synonyms over Time
Ranking Time-based Synonyms
Expanding Query using Synonyms
Motivation: evolving entity-synonym relationships are discovered from snapshots of previous Wikipedia versions
Our approach has two main steps: (1) named entity recognition, and (2) synonym extraction
Named entity recognition
· Partition the Wikipedia history wrt. the time granularity g = month to obtain snapshots
· For each snapshot, identify named entity pages to obtain a set of named entities, as described in [2]
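The partitioning step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the revision record format and the helper names `month_key` and `build_snapshots` are hypothetical, and it assumes that the snapshot for month tk keeps, for each article, the latest revision made during that month. Identifying named entity pages as in [2] is not shown.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical revision record: (article title, revision timestamp, wikitext).
Revision = tuple[str, datetime, str]


def month_key(ts: datetime) -> str:
    """Label of the time partition containing ts at granularity g = month."""
    return f"{ts.year:04d}-{ts.month:02d}"


def build_snapshots(revisions: list[Revision]) -> dict[str, dict[str, str]]:
    """Partition revisions into monthly snapshots.

    Assumption (not from the poster): the snapshot for month t_k keeps,
    for each article, the latest revision made during that month.
    """
    latest = defaultdict(dict)  # month -> title -> (timestamp, wikitext)
    for title, ts, text in revisions:
        month = month_key(ts)
        seen = latest[month].get(title)
        if seen is None or ts > seen[0]:
            latest[month][title] = (ts, text)
    # Return {month: {title: wikitext}} with months in chronological order.
    return {
        month: {title: text for title, (_, text) in articles.items()}
        for month, articles in sorted(latest.items())
    }


if __name__ == "__main__":
    revs = [
        ("Barack_Obama", datetime(2008, 11, 5), "rev A"),
        ("Barack_Obama", datetime(2008, 11, 20), "rev B"),
        ("Barack_Obama", datetime(2009, 1, 20), "rev C"),
    ]
    print(build_snapshots(revs))
```

Zero-padded "YYYY-MM" labels sort chronologically as plain strings, which keeps the snapshot order simple to maintain.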
Given a named entity ei and temporal criteria [ta, tb], synonyms are retrieved and ranked by a time-based score, defined as a mixture model of a temporal feature and the frequency of a synonym sj
Given a query q and temporal criteria [ta, tb]:
· pf(sj, [ta, tb]) is the time partition frequency, i.e., the number of time partitions within [ta, tb] in which sj occurs
· tf(sj, [ta, tb]) is the averaged term frequency of sj in [ta, tb]
· µ weights the relative importance of the temporal feature and the frequency
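The exact form of the mixture is not given here, so the following is only a sketch under an assumed linear mixture: score(sj) = µ · pf / N + (1 − µ) · tf / max_tf, where N is the number of monthly partitions in [ta, tb] and max_tf is the largest averaged frequency among the candidates. The normalizations, the choice to average tf over the partitions in which sj occurs, and the names `months_in_range` and `rank_synonyms` are all assumptions for illustration.

```python
def months_in_range(ta: str, tb: str) -> int:
    """Number of monthly partitions in the inclusive range [ta, tb] ("YYYY-MM")."""
    ya, ma = map(int, ta.split("-"))
    yb, mb = map(int, tb.split("-"))
    return (yb - ya) * 12 + (mb - ma) + 1


def rank_synonyms(
    occurrences: dict[str, dict[str, int]],
    ta: str,
    tb: str,
    mu: float = 0.5,
) -> list[tuple[str, float]]:
    """Rank synonyms of one entity by an ASSUMED time-based score.

    occurrences maps synonym -> {month "YYYY-MM": term frequency}.
    Assumed score: mu * pf / N + (1 - mu) * tf / max_tf.
    """
    n_partitions = months_in_range(ta, tb)
    features: dict[str, tuple[int, float]] = {}
    for syn, per_month in occurrences.items():
        # Zero-padded "YYYY-MM" labels compare correctly as strings.
        counts = [c for month, c in per_month.items() if ta <= month <= tb and c > 0]
        if counts:
            pf = len(counts)       # partitions in [ta, tb] where syn occurs
            tf = sum(counts) / pf  # averaged over those partitions (assumption)
            features[syn] = (pf, tf)
    if not features:
        return []
    max_tf = max(tf for _, tf in features.values())
    scored = [
        (syn, mu * pf / n_partitions + (1 - mu) * tf / max_tf)
        for syn, (pf, tf) in features.items()
    ]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

With a large µ, a synonym that appears steadily across many partitions outranks one with a single burst of mentions; with a small µ, the burst can win. This mirrors the two factors (robustness over time and high usage) that motivate the score.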
A live demo can be found at: http://research.idi.ntnu.no/wislab/quest/
Synonym extraction
· For each entity in the current snapshot, extract as synonyms all anchor texts that link to the entity's article [3]
· Accumulate the set of entity-synonym relationships from all snapshots
Figure 2. Wikipedia snapshot at time tk
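The anchor-text extraction and accumulation steps can be sketched as below. This is a simplified illustration rather than the method of [3]: the regular expression handles only plain internal links of the form [[Target]] and [[Target|anchor]], ignores MediaWiki details such as first-letter case normalization and templates, and the names `extract_anchor_texts` and `accumulate_synonyms` are hypothetical.

```python
import re
from collections import defaultdict

# [[Target]] or [[Target|anchor text]]; an optional "#section" part is skipped.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]+))?\]\]")


def extract_anchor_texts(wikitext: str) -> list[tuple[str, str]]:
    """Return (target article, anchor text) pairs for internal links."""
    pairs = []
    for m in LINK_RE.finditer(wikitext):
        target = m.group(1).strip().replace(" ", "_")
        # Unpiped links display the target title itself.
        anchor = (m.group(2) or m.group(1)).strip()
        pairs.append((target, anchor))
    return pairs


def accumulate_synonyms(
    snapshots: dict[str, dict[str, str]],
    entities: set[str],
) -> dict[str, dict[str, dict[str, int]]]:
    """Collect entity -> synonym -> {snapshot: count} over all snapshots."""
    rel = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for snapshot, articles in snapshots.items():
        for wikitext in articles.values():
            for target, anchor in extract_anchor_texts(wikitext):
                if target in entities:  # keep links to named entity pages only
                    rel[target][anchor][snapshot] += 1
    # Convert nested defaultdicts to plain dicts.
    return {e: {s: dict(c) for s, c in syns.items()} for e, syns in rel.items()}
```

Keeping a per-snapshot count for each synonym, rather than just a set of timestamps, preserves both the number of partitions a synonym occurs in and how often it is used there.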
* The time of a synonym is given by the timestamps of the Wikipedia articles (spanning 8 years) in which it appears, not by temporal expressions extracted from the content. Refer to [1] for improving the accuracy of time using the New York Times Annotated Corpus.
In the link [[President_of_the_United_States|Barack Obama]], “Barack Obama” is an anchor text linking to the article President_of_the_United_States
Figure 1. Extracting time-based synonyms from the history of Wikipedia
[1] N. Kanhabua and K. Nørvåg. Exploiting Time-based Synonyms in Searching Document Archives. In Proceedings of JCDL, 2010.
[2] R. C. Bunescu and M. Pasca. Using encyclopedic knowledge for named entity disambiguation. In Proceedings of EACL, 2006.
[3] C. Bøhn and K. Nørvåg. Extracting named entities and synonyms from Wikipedia. In Proceedings of AINA, 2010.
Intuition: measure the popularity of synonyms based on two factors
· Robustness to change over time: the more time partitions a synonym occurs in, the more robust to time it is
· High usage over time, i.e., a high averaged frequency
Found 43 synonyms for "Pope Benedict XVI" during [01/1987, 01/2007]
Step 1: verify whether q is a named entity by searching Wikipedia, and use the first retrieved page as the associated named entity
Step 2: retrieve synonyms for the associated named entity
Step 3: select synonyms to expand the original q in order to improve retrieval effectiveness
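Step 3 can be illustrated with a small helper that turns the original query and the user-selected synonyms into one expanded query. The poster does not specify the expansion form; this sketch assumes a Boolean disjunction of quoted phrases in a Lucene-style syntax, and `expand_query` and `quote_phrase` are hypothetical names. Steps 1 and 2 (Wikipedia lookup and synonym retrieval) are not shown.

```python
def quote_phrase(text: str) -> str:
    """Wrap text as a phrase, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def expand_query(q: str, selected: list[str]) -> str:
    """Expand q with user-selected synonyms as an OR of phrases (assumed form).

    Duplicates (case-insensitive, including q itself) are dropped;
    the original query is kept first.
    """
    seen: set[str] = set()
    phrases: list[str] = []
    for term in [q, *selected]:
        term = term.strip()
        key = term.lower()
        if term and key not in seen:
            seen.add(key)
            phrases.append(quote_phrase(term))
    return " OR ".join(phrases)


if __name__ == "__main__":
    # Illustrative synonyms only.
    print(expand_query("Pope Benedict XVI", ["Joseph Ratzinger", "Cardinal Ratzinger"]))
```

Quoting each name variant as a phrase keeps multi-word synonyms intact, so a document mentioning only an earlier name of the entity can still match the expanded query.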