Temporal Web Dynamics
Implications for Information Retrieval
Nattiya Kanhabua
1st ALEXANDRIA Workshop
L3S Research Center, Hannover, Germany
15 September 2014
Outline
• What are temporal web dynamics?
• Why do the dynamics impact search?
• Overview of time-aware approaches
– Temporal Information Extraction
– Temporal Query Analysis
– Time-aware Retrieval and Ranking
• Conclusion and outlook
Temporal Web Dynamics
• The Web changes over time in many aspects,
e.g., size, content, structure, and how it is
accessed through user interactions or queries.
– Size: web pages are added and deleted all the time
– Content: web pages are edited/modified
– Query: users’ information needs change
[Ke et al., CN 2006; Risvik et al., CN 2002]
[Dumais, SIAM-SDM 2012; WebDyn 2010]
Content/Structure Changes
Implications: Crawling, Indexing, Ranking
Fig. 1 Categorization of document collections with content changes over time.
Changes in User Behavior
Implications: Query Analysis, Ranking
Fig. 2 Categorization of queries with temporal information needs.
http://www.google.com/insights/search
Temporal Query Examples
• A temporal query consists of:
– Query keywords
– Temporal expressions
• A document consists of:
– Terms, i.e., bag-of-words
– Publication time and temporal expressions
[Berberich et al., ECIR 2010]
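The query and document models above can be sketched as two small data structures. This is only an illustration: the class names, field names, and the "YYYY-MM" interval representation are assumptions, not taken from the cited work.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A time interval as (start, end) in "YYYY-MM" form (assumed representation).
Interval = Tuple[str, str]


@dataclass
class TemporalQuery:
    keywords: List[str]
    temporal_expressions: List[Interval] = field(default_factory=list)


@dataclass
class TemporalDocument:
    terms: List[str]        # bag-of-words
    publication_time: str   # e.g. "2006-07"
    temporal_expressions: List[Interval] = field(default_factory=list)


# Example: a query about the World Cup in Germany, June to July 2006.
query = TemporalQuery(["Germany", "World", "Cup"], [("2006-06", "2006-07")])
doc = TemporalDocument(["world", "cup", "final"], "2006-07", [("2006-07", "2006-07")])
```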
Implications for Search
[Diagram: time-sensitive queries over the temporal Web pass through "Determining Search Intent" (Term: {Germany, World, Cup}; Time: {06/2006, 07/2006}); documents pass through "Semantic Annotation" (annotated documents with Term: {w1, w2, …, wn}; Time: {PubTime(di), ContentTime(di)}); matching/ranking yields the retrieved results, e.g., D2006.]
Temporal Information
Extraction
Two Time Aspects
Two time dimensions
1. Publication or modified time
2. Content or event time
content time
publication time
Problem Statements
• It is difficult to find a trustworthy time for web documents
– Time gap between crawling and indexing
– Decentralization and relocation of web documents
– No standard metadata for time/date
Document Dating
“I found a bible-like document, but I have no idea when it was created.”
“Let me see… This document was probably written in 850 A.C., with 95% confidence.”
“ For a given document with uncertain
timestamp, can the contents be used to
determine the timestamp with a sufficiently
high confidence? ”
Probabilistic Approach
Temporal Language Models (reference corpus)
Timestamp  Word        Freq
1999       tsunami     1
1999       Japan       1
1999       tidal wave  1
2004       tsunami     1
2004       Thailand    1
2004       earthquake  1
A non-timestamped document: tsunami, Thailand
Similarity Scores
Score(1999) = 1
Score(2004) = 1 + 1 = 2
Most likely timestamp is 2004
Temporal Language Models
• Based on the statistical usage of words over time
• Compare each word of a non-timestamped document with a reference corpus
• Tentative timestamp: the time partition whose word usage overlaps most with the document
[de Jong et al., AHC 2005; Kraaij, SIGIR Forum 2005; Kanhabua et al., ECDL 2008]
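The toy example above can be reproduced with a few lines of code. This is a simplified sketch that sums word frequencies per time partition; it is not the probabilistic scoring used in the cited papers, and the function names are made up for illustration.

```python
from collections import defaultdict

# Toy reference corpus from the slide: (time partition, word, frequency).
corpus = [
    (1999, "tsunami", 1), (1999, "Japan", 1), (1999, "tidal wave", 1),
    (2004, "tsunami", 1), (2004, "Thailand", 1), (2004, "earthquake", 1),
]


def build_models(rows):
    """Group word frequencies by time partition."""
    models = defaultdict(dict)
    for year, word, freq in rows:
        models[year][word] = models[year].get(word, 0) + freq
    return models


def score(doc_words, model):
    """Simplified overlap score: total frequency of the document's words in a partition."""
    return sum(model.get(w, 0) for w in doc_words)


def date_document(doc_words, models):
    """Return the best-scoring partition together with all partition scores."""
    scores = {year: score(doc_words, m) for year, m in models.items()}
    return max(scores, key=scores.get), scores


models = build_models(corpus)
best, scores = date_document(["tsunami", "Thailand"], models)
print(best, scores)
```

With the two-word document from the slide, the 2004 partition scores 2 and the 1999 partition scores 1, so 2004 is chosen.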
Extracting Content Time
• How to determine relevant temporal
expressions tagged in a document?
– Not all temporal expressions associated with an event
are equally relevant
• Approaches: machine learning; rule-based
Reported by World Health Organization (WHO) on
29 July 2012 about an ongoing Ebola outbreak
in Uganda since the beginning of July 2012
[Kanhabua et al., TAIA 2012; Strötgen et al., TempWeb 2012; Hoffart et al., AIJ 2012]
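A rule-based variant can be sketched with a regular expression over the WHO example sentence. The cue-word rule below (a date preceded by "since", "began", or "started" is treated as the event time) is a hypothetical heuristic for illustration only; the cited approaches use richer features.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
# Matches "29 July 2012" as well as "July 2012" (the day is optional).
PATTERN = re.compile(r"\b(?:(\d{1,2})\s+)?(" + "|".join(MONTHS) + r")\s+(\d{4})\b")
EVENT_CUES = ("since", "began", "started")  # hypothetical cue words

TEXT = ("Reported by World Health Organization (WHO) on 29 July 2012 about an "
        "ongoing Ebola outbreak in Uganda since the beginning of July 2012")


def extract(text, window=40):
    """Return (normalized date, is_event_time) for each temporal expression."""
    results = []
    for m in PATTERN.finditer(text):
        day, month, year = m.groups()
        iso = f"{year}-{MONTHS.index(month) + 1:02d}"
        if day:
            iso += f"-{int(day):02d}"
        # Look for a cue word in the characters just before the expression.
        context = text[max(0, m.start() - window):m.start()].lower()
        results.append((iso, any(cue in context for cue in EVENT_CUES)))
    return results


print(extract(TEXT))
```

Here the reporting date (29 July 2012) is not flagged, while the outbreak start (July 2012) is flagged as the event time.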
Temporal Query Analysis
Temporal Queries
• Temporal queries exist in
the Web and archives
– Relevancy is dependent on time
– Documents are about events at
particular time
– Users: historians, librarians or
journalists
[Li et al., CIKM 2003; Jones and Diaz, ACM TOIS 2007; Berberich et al., ECIR 2010;
Peetz et al., Information Retrieval 2014]
• Searching temporal document collections
– E.g., digital libraries, web/news archives
• Problems: semantic gaps or lacking knowledge
1. possibly relevant time of queries
2. terminology changes over time
Challenges
Challenges
• Semantic gaps: lacking knowledge about
1. possibly relevant time of queries
2. terminology changes over time
[Diagram: for a query, suggest relevant times time1, time2, …, timek]
How to determine the time of an implicit temporal query?
Current Approaches
1. Query log analysis
2. Search result analysis
Query Log Analysis
• Mining query logs
– Analyze query frequencies over time for identifying
the relevant time of queries
– Re-rank search results of implicit temporal queries
using the determined time
[Metzler et al., SIGIR 2009; Zhang et al., EMNLP 2010]
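As a rough sketch of the first step, query frequencies can be counted per time unit and the most frequent units taken as candidate times. The log below is invented for illustration, and treating the year a query was issued as its relevant time is a simplification.

```python
from collections import Counter

# Hypothetical query log: (query string, year the query was issued).
log = [
    ("world cup", 2006), ("world cup", 2006), ("world cup", 2010),
    ("world cup", 2006), ("olympics", 2008), ("world cup", 2010),
    ("world cup", 2007),
]


def relevant_times(query, log, top_k=2):
    """Return the top_k years in which the query was issued most often."""
    counts = Counter(year for q, year in log if q == query)
    return [year for year, _ in counts.most_common(top_k)]


print(relevant_times("world cup", log))
```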
Search Result Analysis
• Use temporal bursts for query
modeling
– Identify temporal bursts in the ranked
lists of documents
– Sample terms from the documents and
update the query model
• Use temporal language models
– Determine tentative time for a query
– Re-rank search results using the
determined time
[Kanhabua et al., ECDL 2010; Peetz et al., Information Retrieval 2014]
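One simple way to determine a tentative query time from search results is to aggregate the publication years of the top-k documents, weighted by retrieval score. This is a sketch under that assumption; the data and function name are hypothetical, and the cited work also considers dating the query with temporal language models.

```python
from collections import defaultdict


def query_times_from_results(results, k=5, top_n=3):
    """results: ranked list of (publication year, retrieval score).

    Sums the scores of the top-k results per year and returns the
    top_n years, heaviest first (ties broken by earlier year).
    """
    weights = defaultdict(float)
    for year, score in results[:k]:
        weights[year] += score
    ranked = sorted(weights, key=lambda y: (-weights[y], y))
    return ranked[:top_n]


# Hypothetical ranked list for an implicit temporal query.
results = [(2005, 0.9), (2004, 0.8), (2005, 0.7), (2006, 0.6), (2009, 0.5), (2009, 0.4)]
print(query_times_from_results(results))
```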
• Intuition: documents published close to the
time of queries are more relevant
– Assign document priors based on publication dates
Re-rank Search Results
[Diagram: the time of the query is determined over a news archive (2005, 2004, 2006, ...); the initial retrieved results contain, e.g., D2009]
Re-ranked results: D2005
[Kanhabua et al., ECDL 2010]
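The re-ranking idea can be sketched by multiplying each initial score with a prior that decays with the distance between a document's publication year and the nearest determined query time. The decay form and its base of 0.5 are assumptions chosen for illustration, not the prior from the cited paper.

```python
def prior(pub_year, query_years, decay=0.5):
    """Prior in (0, 1]: 1.0 when the document was published in a determined year."""
    return max(decay ** abs(pub_year - t) for t in query_years)


def rerank(results, query_years):
    """results: list of (doc id, publication year, initial score)."""
    rescored = [(doc, s * prior(year, query_years)) for doc, year, s in results]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


initial = [("D2009", 2009, 0.9), ("D2005", 2005, 0.8)]
reranked = rerank(initial, [2005, 2004, 2006])
print(reranked)
```

With the determined times 2005, 2004, 2006, the document D2005 moves ahead of D2009 even though its initial score is lower.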
Challenges
• Semantic gaps: lacking knowledge about
1. Possibly relevant time of queries
2. Named entity changes over time
[Diagram: for a query, suggest synonym@2001, synonym@2002, …, synonym@2011]
Problem Statements
• Queries of named entities (people, companies, places)
– Highly dynamic in appearance, i.e., relationships between
terms change over time
– E.g., changes of roles, name alterations, or semantic shift
Named Entity Evolution
Scenario 1
Query: “Pope Benedict XVI” and written before 2005
Documents about “Joseph Alois Ratzinger” are relevant
Scenario 2
Query: “Hillary R. Clinton” and written from 1997 to 2002
Documents about “New York Senator” and “First Lady of
the United States” are relevant
QUEST Demo: http://research.idi.ntnu.no/wislab/quest/
Find Temporal Synonyms
• Extract time-based synonyms from Wikipedia
• Find a set of entity-synonym relationships at time tk
• For each ei ϵ Etk , extract anchor texts from article
links:
– Entity: President_of_the_United_States
– Synonym: George W. Bush
– Time: 11/2004
[Diagram: anchor texts of links pointing to the article President_of_the_United_States: “George W. Bush”, “President George W. Bush”, “President Bush (43)”]
[Kanhabua et al., JCDL 2010]
Temporal Entity-Synonym
Note: the times of synonyms are the timestamps of the Wikipedia articles (8 years)
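The entity-synonym relationships can be kept in a small index keyed by entity and time and then used to look up synonyms for query expansion. In this sketch, only the (President_of_the_United_States, George W. Bush, 11/2004) row comes from the slide; assigning the same month to the other two anchor texts is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict

# (entity, synonym, time as "YYYY-MM"); the last two timestamps are assumed.
rows = [
    ("President_of_the_United_States", "George W. Bush", "2004-11"),
    ("President_of_the_United_States", "President George W. Bush", "2004-11"),
    ("President_of_the_United_States", "President Bush (43)", "2004-11"),
]


def build_index(rows):
    """Map (entity, time) to the set of synonyms observed at that time."""
    index = defaultdict(set)
    for entity, synonym, month in rows:
        index[(entity, month)].add(synonym)
    return index


def synonyms_at(index, entity, month):
    """Return the synonyms of an entity at a given time, sorted for stable output."""
    return sorted(index.get((entity, month), set()))


index = build_index(rows)
print(synonyms_at(index, "President_of_the_United_States", "2004-11"))
```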
Time-aware Retrieval and
Ranking
Searching the Past
• Time must be explicitly modeled in order to
increase the effectiveness of ranking
– To order search results so that the most relevant ones
are ranked higher
Web
archives
news
archives
blogs emails
“temporal document
collections”
Retrieve documents
about Pope Benedict
XVI written before 2005
Term-based IR approaches
may give unsatisfactory results
Query/Document Models
• A temporal query consists of:
– Query keywords
– Temporal expressions
• A document consists of:
– Terms, i.e., bag-of-words
– Publication time and temporal expressions
Time-aware Ranking Models
• Two main approaches
1. Mixture model [Kanhabua et al., ECDL 2010]
• Linearly combining textual- and temporal similarity
2. Probabilistic model [Berberich et al., ECIR 2010]
• Generating a query from the textual part and temporal part
of a document independently
Mixture Model
• Linearly combine textual- and temporal similarity
– α indicates the importance of similarity scores
• Both scores are normalized before combining
– Textual similarity can be determined using any term-
based retrieval model
• E.g., tf.idf or a unigram language model
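The linear combination can be sketched as follows, with α weighting the temporal score and (1 − α) the textual score. Dividing by the maximum score is one possible normalization and is an assumption here, as are the function names and sample scores.

```python
def normalize(scores):
    """Scale scores to [0, 1] by dividing by the maximum (assumed normalization)."""
    top = max(scores.values())
    return {d: (s / top if top > 0 else 0.0) for d, s in scores.items()}


def mixture(text_scores, time_scores, alpha=0.5):
    """Combine normalized textual and temporal similarity linearly."""
    text_n, time_n = normalize(text_scores), normalize(time_scores)
    return {d: (1 - alpha) * text_n[d] + alpha * time_n[d] for d in text_n}


# Hypothetical scores for two documents.
textual = {"d1": 2.0, "d2": 4.0}
temporal = {"d1": 0.8, "d2": 0.2}
combined = mixture(textual, temporal, alpha=0.5)
print(combined)
```

With α = 0.5, d1 overtakes d2 because its temporal similarity is much higher; with α = 0 the ranking falls back to the textual scores alone.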
How to determine temporal similarity?
Temporal Similarity
• Assume that temporal expressions in the query are
generated independently from a two-step
generative model:
– P(tq|td) can be estimated based on publication time
using an exponential decay function [Kanhabua et al.,
ECDL 2010]
– Linear interpolation smoothing is applied to eliminate
zero probabilities
• I.e., for a temporal expression tq that is unseen in d
[Figure: similarity score plotted over time; documents d1 and d2 lie at distances Dist(d1,q) and Dist(d2,q) from the query time <q>]
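A possible reading of the decay-based similarity and the smoothing step is sketched below. The exact functional form and the parameter values (decay_rate, lam, mu, and the smoothing weight) are assumptions for illustration; consult the cited paper for the actual definition.

```python
def decay_similarity(tq, td, decay_rate=0.5, lam=0.5, mu=2.0):
    """Exponential decay in the distance |tq - td| (times in years).

    Returns 1.0 when the times coincide and shrinks as they move apart.
    """
    return decay_rate ** (lam * abs(tq - td) / mu)


def p_query_time(tq, doc_times, collection_prob, smoothing=0.1):
    """Average the decay similarity over a document's times, then smooth.

    Linear interpolation with a collection-level probability keeps the
    result non-zero even when the document has no temporal evidence.
    """
    p_doc = 0.0
    if doc_times:
        p_doc = sum(decay_similarity(tq, td) for td in doc_times) / len(doc_times)
    return (1 - smoothing) * p_doc + smoothing * collection_prob


# d1 is published closer to the query time than d2, so it scores higher.
print(decay_similarity(2005, 2005), decay_similarity(2005, 2009))
print(p_query_time(2005, [], collection_prob=0.2))
```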
Conclusion and Outlook
• Temporal web dynamics and their impact
• State-of-the-art temporal IR techniques
• Future work:
– Search in versioned document collections
– Efficient methods for document processing
– Effective retrieval and ranking, e.g., return
aggregated results or summaries
– Support exploratory search in Web archives
References
• [Berberich et al., WebDB 2009] Klaus Berberich, Srikanta J. Bedathur, Mauro Sozio, Gerhard Weikum: Bridging the Terminology Gap in Web Archive Search. WebDB 2009
• [Berberich et al., ECIR 2010] Klaus Berberich, Srikanta J. Bedathur, Omar Alonso, Gerhard Weikum: A Language Modeling Approach for Temporal Information Needs. ECIR 2010: 13-25
• [Dumais, SIAM-SDM 2012] Susan T. Dumais: Temporal Dynamics and Information Retrieval. SIAM-SDM 2012
• [de Jong et al., AHC 2005] Franciska de Jong, Henning Rode, Djoerd Hiemstra: Temporal language models for the disclosure of historical text. AHC 2005: 161-168
• [Kaluarachchi et al., CIKM 2010] Amal Chaminda Kaluarachchi, Aparna S. Varde, Srikanta J. Bedathur, Gerhard Weikum, Jing Peng, Anna Feldman: Incorporating terminology evolution for query translation in text retrieval with association rules. CIKM 2010: 1789-1792
• [Kanhabua et al., JCDL 2010] Nattiya Kanhabua, Kjetil Nørvåg: Exploiting time-based synonyms in searching document archives. JCDL 2010: 79-88
• [Kanhabua et al., ECDL 2010] Nattiya Kanhabua, Kjetil Nørvåg: Determining Time of Queries for Re-ranking Search Results. ECDL 2010: 261-272
• [Kanhabua et al., TAIA 2012] Nattiya Kanhabua, Sara Romano, Avaré Stewart: Identifying Relevant Temporal Expressions for Real-World Events. Time-aware Information Access Workshop 2012
• [Ke et al., CN 2006] Yiping Ke, Lin Deng, Wilfred Ng, Dik Lun Lee: Web dynamics and their ramifications for the development of Web search engines. Computer Networks 50(10): 1430-1447 (2006)
• [Metzler et al., SIGIR 2009] Donald Metzler, Rosie Jones, Fuchun Peng, Ruiqiang Zhang: Improving search relevance for implicitly temporal queries. SIGIR 2009: 700-701
• [Nunes et al., ECIR 2008] Sérgio Nunes, Cristina Ribeiro, Gabriel David: Use of Temporal Expressions in Web Search. ECIR 2008: 580-584
• [Peetz et al., Information Retrieval 2014] Maria-Hendrike Peetz, Edgar Meij, Maarten de Rijke: Using temporal bursts for query modeling. Information Retrieval 17(1): 74-108 (2014)
• [Risvik et al., CN 2002] Knut Magne Risvik, Rolf Michelsen: Search engines and Web dynamics. Computer Networks 39(3): 289-302 (2002)
• [Shokouhi, SIGIR 2011] Milad Shokouhi: Detecting Seasonal Queries by Time-Series Analysis. SIGIR 2011: 1171-1172
• [Strötgen et al., TempWeb 2012] Jannik Strötgen, Omar Alonso, Michael Gertz: Identification of top relevant temporal expressions in documents. Temporal Web Workshop 2012
• [Tahmasebi et al., COLING 2012] Nina Tahmasebi, Gerhard Gossen, Nattiya Kanhabua, Helge Holzmann, Thomas Risse: NEER: An Unsupervised Method for Named Entity Evolution Recognition. COLING 2012
• [WebDyn 2010] Web Dynamics course: http://www.mpi-inf.mpg.de/departments/d5/teaching/ss10/dyn/, Max-Planck Institute for Informatics, Saarbrücken, Germany, 2010
• [Zhang et al., EMNLP 2010] Ruiqiang Zhang, Yuki Konda, Anlei Dong, Pranam Kolari, Yi Chang, Zhaohui Zheng: Learning Recurrent Event Queries for Web Search. EMNLP 2010: 1129-1139

Chizaram's Women Tech Makers Deck. .pptxogubuikealex
 
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...漢銘 謝
 
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...Henrik Hanke
 

Recently uploaded (20)

INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRRINDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRR
 
Work Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptxWork Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptx
 
PHYSICS PROJECT BY MSC - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC  - NANOTECHNOLOGYPHYSICS PROJECT BY MSC  - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC - NANOTECHNOLOGY
 
Call Girls In Aerocity 🤳 Call Us +919599264170
Call Girls In Aerocity 🤳 Call Us +919599264170Call Girls In Aerocity 🤳 Call Us +919599264170
Call Girls In Aerocity 🤳 Call Us +919599264170
 
The Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationThe Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism Presentation
 
The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software Engineering
 
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATIONRACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
 
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSimulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
 
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
 
Anne Frank A Beacon of Hope amidst darkness ppt.pptx
Anne Frank A Beacon of Hope amidst darkness ppt.pptxAnne Frank A Beacon of Hope amidst darkness ppt.pptx
Anne Frank A Beacon of Hope amidst darkness ppt.pptx
 
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular PlasticsDutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
 
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
 
SaaStr Workshop Wednesday w/ Kyle Norton, Owner.com
SaaStr Workshop Wednesday w/ Kyle Norton, Owner.comSaaStr Workshop Wednesday w/ Kyle Norton, Owner.com
SaaStr Workshop Wednesday w/ Kyle Norton, Owner.com
 
Quality by design.. ppt for RA (1ST SEM
Quality by design.. ppt for  RA (1ST SEMQuality by design.. ppt for  RA (1ST SEM
Quality by design.. ppt for RA (1ST SEM
 
Event 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxEvent 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptx
 
Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸
 
Chizaram's Women Tech Makers Deck. .pptx
Chizaram's Women Tech Makers Deck.  .pptxChizaram's Women Tech Makers Deck.  .pptx
Chizaram's Women Tech Makers Deck. .pptx
 
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
 
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
 

Temporal Web Dynamics and Implications for Information Retrieval

  • 1. Temporal Web Dynamics Implications for Information Retrieval Nattiya Kanhabua 1st ALEXANDRIA Workshop L3S Research Center, Hannover, Germany 15 September 2014
  • 2. Outline • What are temporal web dynamics? • Why do the dynamics impact search? • Overview of time-aware approaches – Temporal Information Extraction – Temporal Query Analysis – Time-aware Retrieval and Ranking • Conclusion and outlook
  • 3. Temporal Web Dynamics • The Web is changing over time in many aspects, e.g., size, content, structure and how it is accessed by user interactions or queries. – Size: web pages are added/deleted at all times – Content: web pages are edited/modified – Query: users’ information needs change [Ke et al., CN 2006; Risvik et al., CN 2002] [Dumais, SIAM-SDM 2012; WebDyn 2010]
  • 4. Content/Structure Changes Implications: Crawling, Indexing, Ranking Fig. 1 Categorization of document collections with content changes over time.
  • 5. Changes in User Behavior Implications: Query Analysis, Ranking Fig. 2 Categorization of queries with temporal information needs. http://www.google.com/insights/search
  • 6. Temporal Query Examples • A temporal query consists of: – Query keywords – Temporal expressions • A document consists of: – Terms, i.e., bag-of-words – Publication time and temporal expressions [Berberich et al., ECIR 2010]
  • 7. Implications for Search • Time-sensitive queries: Determining Search Intent yields Term: {Germany, World, Cup} and Time: {06/2006, 07/2006} • Temporal Web: Semantic Annotation yields annotated documents with Term: {w1, w2, …, wn} and Time: {PubTime(di), ContentTime(di)} • Matching/ranking the query against the annotated documents gives the retrieved results (e.g., D2006)
  • 9. Two Time Aspects • Two time dimensions: 1. Publication or modified time 2. Content or event time
  • 10. Problem Statements • Difficult to find the trustworthy time for web documents – Time gap between crawling and indexing – Decentralization and relocation of web documents – No standard metadata for time/date Document Dating • “I found a bible-like document, but I have no idea when it was created.” • “Let me see… This document was probably written in 850 A.C. with 95% confidence.” • Research question: “For a given document with an uncertain timestamp, can the contents be used to determine the timestamp with sufficiently high confidence?”
  • 11. Probabilistic Approach: Temporal Language Models • Based on the statistical usage of words over time • Compare each word of a non-timestamped document with a reference corpus • Tentative timestamp: the time partition that overlaps most in word usage • Example reference corpus as (timestamp, word, frequency): (1999, tsunami, 1), (1999, Japan, 1), (1999, tidal wave, 1), (2004, tsunami, 1), (2004, Thailand, 1), (2004, earthquake, 1) • A non-timestamped document containing “tsunami” and “Thailand” gets the similarity scores Score(1999) = 1 and Score(2004) = 1 + 1 = 2, so the most likely timestamp is 2004 [de Jong et al., AHC 2005; Kraaij, SIGIR Forum 2005; Kanhabua et al., ECDL 2008]
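The toy example on slide 11 can be sketched in a few lines of Python. This is only an illustration of the overlap scoring shown on the slide: the function names are hypothetical, and the cited papers use probabilistic scores over a large reference corpus rather than the raw frequency sums used here.

```python
from collections import defaultdict

# Toy reference corpus from slide 11: (timestamp, word, frequency).
REFERENCE = [
    (1999, "tsunami", 1), (1999, "japan", 1), (1999, "tidal wave", 1),
    (2004, "tsunami", 1), (2004, "thailand", 1), (2004, "earthquake", 1),
]


def build_models(reference):
    """Group word frequencies by time partition (one 'language model' per year)."""
    models = defaultdict(dict)
    for year, word, freq in reference:
        models[year][word] = models[year].get(word, 0) + freq
    return dict(models)


def score_partitions(doc_words, models):
    """Score each partition by summing the frequencies of the document's words."""
    return {
        year: sum(words.get(w, 0) for w in doc_words)
        for year, words in models.items()
    }


def most_likely_timestamp(doc_words, models):
    """Return the partition with the highest overlap score."""
    scores = score_partitions(doc_words, models)
    return max(scores, key=scores.get)


models = build_models(REFERENCE)
document = ["tsunami", "thailand"]  # the non-timestamped document
print(score_partitions(document, models))
print(most_likely_timestamp(document, models))
```

With the slide's data the 2004 partition wins because both document words occur in it, while only "tsunami" occurs in 1999.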
  • 12. Extracting Content Time • How to determine relevant temporal expressions tagged in a document? – Not all temporal expressions associated with an event are equally relevant • Approaches: machine learning; rule-based • Example: “Reported by World Health Organization (WHO) on 29 July 2012 about an ongoing Ebola outbreak in Uganda since the beginning of July 2012” [Kanhabua et al., TAIA 2012; Strötgen et al., TempWeb 2012; Hoffart et al., AIJ 2012]
  • 14. Temporal Queries • Temporal queries exist in the Web and archives – Relevancy is dependent on time – Documents are about events at particular time – Users: historians, librarians or journalists [Li et al., CIKM 2003; Jones and Diaz, ACM TOIS 2007; Berberich et al., ECIR 2010; Peetz et al., Information Retrieval 2014]
  • 15. Challenges • Searching temporal document collections – E.g., digital libraries, web/news archives • Problems: semantic gaps or lacking knowledge 1. possibly relevant time of queries 2. terminology changes over time
  • 17. Challenges • Semantic gaps: lacking knowledge about 1. possibly relevant time of queries 2. terminology changes over time • Diagram: for a query, suggest candidate times time1, time2, …, timek • How to determine the time of an implicit temporal query?
  • 18. Current Approaches 1. Query log analysis 2. Search result analysis
  • 19. Query Log Analysis • Mining query logs – Analyze query frequencies over time for identifying the relevant time of queries – Re-rank search results of implicit temporal queries using the determined time [Metzler et al., SIGIR 2009; Zhang et al., EMNLP 2010]
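A minimal sketch of the first step on slide 19, assuming a query log is available as (query string, year) pairs. The log format, the sample entries, and the `relevant_times` helper are illustrative assumptions, not the method of the cited papers.

```python
from collections import Counter


def relevant_times(log, query, top_k=2):
    """Return the top_k years in which `query` was issued most often.

    log: iterable of (query_string, year) pairs (hypothetical format).
    """
    counts = Counter(year for q, year in log if q == query)
    return [year for year, _ in counts.most_common(top_k)]


# Hypothetical log entries, for illustration only.
sample_log = [
    ("world cup", 2006), ("world cup", 2006), ("world cup", 2006),
    ("world cup", 2010), ("world cup", 2010),
    ("world cup", 2008),
    ("tsunami", 2004),
]
print(relevant_times(sample_log, "world cup"))
```

The years returned here could then serve as the determined time for re-ranking the results of an implicit temporal query.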
  • 20. Search Result Analysis • Use temporal bursts for query modeling – Identify temporal bursts in the ranked lists of documents – Sample terms from the documents and update the query model • Use temporal language models – Determine tentative time for a query – Re-rank search results using the determined time [Kanhabua et al., ECDL 2010; Peetz et al., Information Retrieval 2014]
  • 22. Re-rank Search Results • Intuition: documents published close to the time of queries are more relevant – Assign document priors based on publication dates • Diagram: the query is run against a news archive and its time is determined (2005, 2004, 2006, ...); the initial retrieved results are led by D2009, the re-ranked results by D2005 [Kanhabua et al., ECDL 2010]
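The re-ranking idea on this slide can be sketched as follows. The weighting scheme is an assumption made for illustration: each determined year carries a weight, a document's prior is the weight of its publication year plus a small smoothing constant, and the final score is the retrieval score multiplied by that prior. The cited paper may define the prior differently.

```python
def rerank(results, time_weights, epsilon=0.01):
    """Re-rank results using a publication-date prior.

    results: list of (doc_id, pub_year, retrieval_score) tuples.
    time_weights: dict mapping a determined query year to its weight
        (hypothetical representation of the 'determined time').
    epsilon: smoothing so documents from other years keep a nonzero prior.
    """
    rescored = []
    for doc_id, pub_year, score in results:
        prior = time_weights.get(pub_year, 0.0) + epsilon
        rescored.append((doc_id, score * prior))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Illustrative values only: D2009 ranks first initially, D2005 after re-ranking.
initial = [("D2009", 2009, 0.9), ("D2005", 2005, 0.6)]
weights = {2005: 0.5, 2004: 0.3, 2006: 0.2}
print([doc_id for doc_id, _ in rerank(initial, weights)])
```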
  • 23. Challenges • Semantic gaps: lacking knowledge about 1. Possibly relevant time of queries 2. Named entity changes over time • Diagram: for a query, suggest synonym@2001, synonym@2002, …, synonym@2011
  • 26. Problem Statements: Named Entity Evolution • Queries of named entities (people, company, place) – Highly dynamic in appearance, i.e., relationships between terms change over time – E.g., changes of roles, name alterations, or semantic shift • Scenario 1: Query “Pope Benedict XVI”, documents written before 2005. Documents about “Joseph Alois Ratzinger” are relevant • Scenario 2: Query “Hillary R. Clinton”, documents written from 1997 to 2002. Documents about “New York Senator” and “First Lady of the United States” are relevant
  • 28. Find Temporal Synonyms • Extract time-based synonyms from Wikipedia • Find a set of entity-synonym relationships at time tk • For each ei ∈ Etk, extract anchor texts from article links: – Entity: President_of_the_United_States – Synonym: George W. Bush – Time: 11/2004 • Example anchor texts linking to President_of_the_United_States: “George W. Bush”, “President George W. Bush”, “President Bush (43)” [Kanhabua et al., JCDL 2010]
  • 29. Temporal Entity-Synonym Note: the times of synonyms are the timestamps of Wikipedia articles (8 years)
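One way to picture the entity-synonym relationships from slides 28 and 29 is an index from entity to anchor texts, each tagged with the times at which it was observed. The sketch below assumes snapshots arrive as (entity, anchor text, (year, month)) triples; the data layout and function names are hypothetical, and the last row of the sample data is an added illustration that does not come from the slides.

```python
from collections import defaultdict


def build_synonym_index(snapshots):
    """Map entity -> anchor text -> set of (year, month) observation times."""
    index = defaultdict(lambda: defaultdict(set))
    for entity, anchor, when in snapshots:
        index[entity][anchor].add(when)
    return index


def synonyms_in_range(index, entity, start, end):
    """Return anchor texts observed for `entity` between start and end (inclusive)."""
    return sorted(
        anchor
        for anchor, times in index[entity].items()
        if any(start <= t <= end for t in times)
    )


ENTITY = "President_of_the_United_States"
snapshots = [
    (ENTITY, "George W. Bush", (2004, 11)),
    (ENTITY, "President George W. Bush", (2004, 11)),
    (ENTITY, "President Bush (43)", (2004, 11)),
    (ENTITY, "Barack Obama", (2009, 6)),  # illustrative row, not from the slides
]
index = build_synonym_index(snapshots)
print(synonyms_in_range(index, ENTITY, (2004, 1), (2004, 12)))
```

A query about the entity restricted to 2004 could then be expanded with the synonyms valid in that period only.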
  • 31. Searching the Past • Time must be explicitly modeled in order to increase the effectiveness of ranking – To order search results so that the most relevant ones are ranked higher • “Temporal document collections”: web archives, news archives, blogs, emails • Example: retrieve documents about Pope Benedict XVI written before 2005 • Term-based IR approaches may give unsatisfactory results
  • 32. Query/Document Models • A temporal query consists of: – Query keywords – Temporal expressions • A document consists of: – Terms, i.e., bag-of-words – Publication time and temporal expressions
  • 33. Time-aware Ranking Models • Two main approaches 1. Mixture model [Kanhabua et al., ECDL 2010] • Linearly combining textual and temporal similarity 2. Probabilistic model [Berberich et al., ECIR 2010] • Generating a query from the textual part and temporal part of a document independently
  • 35. Mixture Model • Linearly combine textual and temporal similarity – α indicates the importance of similarity scores • Both scores are normalized before combining – Textual similarity can be determined using any term-based retrieval model • E.g., tf.idf or a unigram language model • How to determine temporal similarity?
  • 36. Temporal Similarity • Assume that temporal expressions in the query are generated independently from a two-step generative model: – P(tq|td) can be estimated based on publication time using an exponential decay function [Kanhabua et al., ECDL 2010] – Linear interpolation smoothing is applied to eliminate zero probabilities • I.e., for an unseen temporal expression tq in d • Figure: similarity score plotted over time for documents d1 and d2 around the query time <q>, with distances Dist(d1,q) and Dist(d2,q)
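The mixture model and the decay-based temporal similarity can be sketched together. The slides do not reproduce the formulas, so the forms below are assumptions: the combined score is taken as (1 - α) · textual similarity + α · temporal similarity, and P(tq|td) as decay_rate raised to (λ · |tq - td| / μ), averaged over the query's temporal expressions. The parameter names and default values are hypothetical, and smoothing is omitted.

```python
def temporal_similarity(query_times, doc_time, decay_rate=0.5, lam=0.5, mu=2.0):
    """Average exponential-decay similarity between query times and a document time.

    Times are integers in a common unit (e.g., years). The decay form and the
    default parameters are assumptions made for illustration.
    """
    if not query_times:
        return 0.0
    total = sum(
        decay_rate ** (lam * abs(tq - doc_time) / mu) for tq in query_times
    )
    return total / len(query_times)


def mixture_score(text_sim, time_sim, alpha=0.5):
    """Linearly combine normalized textual and temporal similarity scores."""
    return (1.0 - alpha) * text_sim + alpha * time_sim


# A document published in the query year scores higher than one four years off.
near = temporal_similarity([2005], 2005)
far = temporal_similarity([2005], 2009)
print(mixture_score(0.4, near), mixture_score(0.4, far))
```

Setting α to 0 reduces the model to purely term-based ranking, while α close to 1 ranks almost entirely by temporal closeness.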
  • 37. Conclusion and Outlook • Temporal web dynamics and its impact • State of the art temporal IR techniques • Future work: – Search in versioned document collections – Efficient methods for document processing – Effective retrieval and ranking, e.g., return aggregated results or summaries – Support exploratory search in Web archives
  • 38. References • [Berberich et al., WebDB 2009] Klaus Berberich, Srikanta J. Bedathur, Mauro Sozio, Gerhard Weikum: Bridging the Terminology Gap in Web Archive Search. WebDB 2009 • [Berberich et al., ECIR 2010] Klaus Berberich, Srikanta J. Bedathur, Omar Alonso, Gerhard Weikum: A Language Modeling Approach for Temporal Information Needs. ECIR 2010: 13-25 • [Dumais, SIAM-SDM 2012] Susan T. Dumais: Temporal Dynamics and Information Retrieval. SIAM-SDM 2012 • [de Jong et al., AHC 2005] Franciska de Jong, Henning Rode, Djoerd Hiemstra: Temporal language models for the disclosure of historical text. AHC 2005: 161-168 • [Kaluarachchi et al., CIKM 2010] Amal Chaminda Kaluarachchi, Aparna S. Varde, Srikanta J. Bedathur, Gerhard Weikum, Jing Peng, Anna Feldman: Incorporating terminology evolution for query translation in text retrieval with association rules. CIKM 2010: 1789-1792 • [Kanhabua et al., JCDL 2010] Nattiya Kanhabua, Kjetil Nørvåg: Exploiting time-based synonyms in searching document archives. JCDL 2010: 79-88 • [Kanhabua et al., ECDL 2010] Nattiya Kanhabua, Kjetil Nørvåg: Determining Time of Queries for Re-ranking Search Results. ECDL 2010: 261-272 • [Kanhabua et al., TAIA 2012] Nattiya Kanhabua, Sara Romano, Avaré Stewart: Identifying Relevant Temporal Expressions for Real-World Events. Time-aware Information Access Workshop 2012 • [Ke et al., CN 2006] Yiping Ke, Lin Deng, Wilfred Ng, Dik Lun Lee: Web dynamics and their ramifications for the development of Web search engines. Computer Networks 50(10): 1430-1447 (2006)
  • 39. References (cont’d) • [Metzler et al., SIGIR 2009] Donald Metzler, Rosie Jones, Fuchun Peng, Ruiqiang Zhang: Improving search relevance for implicitly temporal queries. SIGIR 2009: 700-701 • [Nunes et al., ECIR 2008] Sérgio Nunes, Cristina Ribeiro, Gabriel David: Use of Temporal Expressions in Web Search. ECIR 2008: 580-584 • [Peetz et al., Information Retrieval 2014] Maria-Hendrike Peetz, Edgar Meij, Maarten de Rijke: Using temporal bursts for query modeling. Information Retrieval 17(1): 74-108 (2014) • [Risvik et al., CN 2002] Knut Magne Risvik, Rolf Michelsen: Search engines and Web dynamics. Computer Networks 39(3): 289-302 (2002) • [Shokouhi, SIGIR 2011] Milad Shokouhi: Detecting Seasonal Queries by Time-Series Analysis. SIGIR 2011: 1171-1172 • [Strötgen et al., TempWeb 2012] Jannik Strötgen, Omar Alonso, Michael Gertz: Identification of top relevant temporal expressions in documents. Temporal Web Workshop 2012 • [Tahmasebi et al., COLING 2012] Nina Tahmasebi, Gerhard Gossen, Nattiya Kanhabua, Helge Holzmann, Thomas Risse: NEER: An Unsupervised Method for Named Entity Evolution Recognition. COLING 2012 • [WebDyn 2010] Web Dynamics course: http://www.mpi-inf.mpg.de/departments/d5/teaching/ss10/dyn/, Max-Planck Institute for Informatics, Saarbrücken, Germany, 2010 • [Zhang et al., EMNLP 2010] Ruiqiang Zhang, Yuki Konda, Anlei Dong, Pranam Kolari, Yi Chang, Zhaohui Zheng: Learning Recurrent Event Queries for Web Search. EMNLP 2010: 1129-1139

Editor's Notes

  1. The Web evolves over time and shows temporal dynamics in many aspects.
  2. With Google Insights for Search, you can compare search volume patterns across specific regions, categories, time frames and properties.
  3. Note that the actual value of any time point, e.g., tbl, tbu, tel, or teu, is an integer: the number of time units (e.g., milliseconds or days) that have passed since, or are still to pass until, a reference point in time (e.g., the UNIX epoch).
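As a small illustration of note 3, a calendar date can be mapped to an integer number of days relative to the UNIX epoch. The choice of days as the time unit and the helper name `to_days` are assumptions made for this sketch.

```python
from datetime import date

UNIX_EPOCH = date(1970, 1, 1)


def to_days(d):
    """Return the number of days between the UNIX epoch and date d.

    The value is negative for dates before the epoch.
    """
    return (d - UNIX_EPOCH).days


print(to_days(date(1970, 1, 2)))    # one day after the epoch
print(to_days(date(1969, 12, 31)))  # one day before the epoch
```

With such a mapping, time points can be compared and subtracted as plain integers.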