SlideShare a Scribd company logo
1 of 55
Download to read offline
Searching the Temporal Web
Challenges and Current
Approaches
Nattiya Kanhabua
IR Group, University of Glasgow
4 February 2013
Outline
• Evolution of the Web
• Temporal IR system
• Current Approaches
– Content Analysis
– Query Analysis
– Retrieval and Ranking
• Open Issues
2
Evolution of the Web
• Web is changing over time in many aspects:
– Size: web pages are added/deleted all the time
– Content: web pages are edited/modified
– Query: users’ information needs changes
[Ke et al., CN 2006; Risvik et al., CN 2002]
[Dumais, SIAM-SDM 2012; WebDyn 2010] 3
2000
First billion-URL index
The world’s largest!
≈5000 PCs in clusters!2004
Index grows to
4.2 billion pages
1995 2012
2008
Google counts
1 trillion
unique URLs
Web and Index Sizes
2009
TBs or PBs of data/index
Tens of thousands of PCs
http://www.worldwidewebsize.com/
Impacts: crawling, indexing, and caching 4
Content Dynamics
• WayBack Machine
– Web archive search by the Internet Archive
5
1998
2006
Content Dynamics
2012Impacts: document representation and retrieval 6
Query Dynamics
• Search queries exhibit temporal patterns
– Spikes or seasonality
Impacts: search intent and query representation
http://www.google.com/insights/search/
7
Temporal IR System
8
Time-sensitive Queries
• Represent temporal information needs
– E.g., 2006 FIFA World Cup, Thailand tsunami
• Queries and the relevance depend on time
– Query popularities change over time
– Documents are about events at particular time
9
Time Distribution of Qrel
Recency query Time-sensitive query
Time-insensitive query
[Li et al., CIKM 2003]
10
Query/Document Matching
query
Temporal
Web
Determining
Search Intent
Term: {Germany, World, Cup}
Time: {06/2006, 07/2006}
D2006
Retrieved results
matching
Time-sensitive
queries
Semantic
Annotation
Annotated
documents Term: {w1, w2, …, wn}
Time: {PubTime(di), ContentTime(di)}
11
Content Analysis
Two Time Aspects
Two time dimensions
1. Publication or modified time
2. Content or event time
content time
publication time
13
Problem Statements
• Difficult to find the trustworthy time for web documents
– Time gap between crawling and indexing
– Decentralization and relocation of web documents
– No standard metadata for time/date
Document Dating
Let’s me see…
This document is
probably
written in 850 A.C.
with 95% confidence.
I found a bible-like
document. But I have
no idea when it was
created?
“ For a given document with uncertain
timestamp, can the contents be used to
determine the timestamp with a sufficiently
high confidence? ”
14
Current Approaches
1. Content-based
2. Link-based
3. Hybrid
15
Content-based Approach
Partition Word
1999 tsunami
1999 Japan
1999 tidal wave
2004 tsunami
2004 Thailand
2004 earthquake
Temporal Language Models
Temporal Language
Models
• Based on the statistic usage
of words over time
• Compare each word of a
non-timestamped document
with a reference corpus
• Tentative timestamp -- a
time partition mostly
overlaps in word usage
Freq
1
1
1
1
1
1
tsunami
Thailand
A non-timestamped
document
16
[de Jong et al., AHC 2005; Kanhabua et al., ECDL 2008]
Content-based Approach
Partition Word
1999 tsunami
1999 Japan
1999 tidal wave
2004 tsunami
2004 Thailand
2004 earthquake
Temporal Language Models
tsunami
Thailand
A non-timestamped
document
Temporal Language
Models
• Based on the statistic usage
of words over time
• Compare each word of a
non-timestamped document
with a reference corpus
• Tentative timestamp -- a
time partition mostly
overlaps in word usage
Freq
1
1
1
1
1
1
17
[de Jong et al., AHC 2005; Kanhabua et al., ECDL 2008]
Content-based Approach
Partition Word
1999 tsunami
1999 Japan
1999 tidal wave
2004 tsunami
2004 Thailand
2004 earthquake
Temporal Language Models
tsunami
Thailand
A non-timestamped
document
Temporal Language
Models
• Based on the statistic usage
of words over time
• Compare each word of a
non-timestamped document
with a reference corpus
• Tentative timestamp -- a
time partition mostly
overlaps in word usage
Freq
1
1
1
1
1
1
18
[de Jong et al., AHC 2005; Kanhabua et al., ECDL 2008]
Content-based Approach
Partition Word
1999 tsunami
1999 Japan
1999 tidal wave
2004 tsunami
2004 Thailand
2004 earthquake
Temporal Language Models
tsunami
Thailand
A non-timestamped
document
Temporal Language
Models
• Based on the statistic usage
of words over time
• Compare each word of a
non-timestamped document
with a reference corpus
• Tentative timestamp -- a
time partition mostly
overlaps in word usage
Freq
1
1
1
1
1
1
19
[de Jong et al., AHC 2005; Kanhabua et al., ECDL 2008]
Content-based Approach
Partition Word
1999 tsunami
1999 Japan
1999 tidal wave
2004 tsunami
2004 Thailand
2004 earthquake
Temporal Language Models
tsunami
Thailand
A non-timestamped
document
Similarity Scores
Score(1999) = 1
Score(2004) = 1 + 1 = 2 Most likely timestamp is 2004
Temporal Language
Models
• Based on the statistic usage
of words over time
• Compare each word of a
non-timestamped document
with a reference corpus
• Tentative timestamp -- a
time partition mostly
overlaps in word usage
[de Jong et al., AHC 2005; Kanhabua et al., ECDL 2008]
Freq
1
1
1
1
1
1
20
Normalized Log-likelihood Ratio
Partition Word
1999 tsunami
1999 Japan
1999 tidal wave
2004 tsunami
2004 Thailand
2004 earthquake
Temporal Language Models
tsunami
Thailand
A non-timestamped
document
Similarity Scores
Score(1999) = 1
Score(2004) = 1 + 1 = 2 Most likely timestamp is 2004
Normalized log-likelihood
ratio
• Variant of Kullback-Leibler
divergence
• Similarity of a document and
time partitions
• C is the background model
estimated on the corpus
• Linear interpolation
smoothing to avoid the zero
probability of unseen words
[Kraaij, SIGIR Forum 2005]
21
Link-based Approach
• Dating a document using its neighbors
1. Web pages linking to the document
• Incoming links
2. Web pages pointed by the document
• Outgoing links
3. Media assets associated with the document
• E.g., images
• Averaging the last-modified dates of its
neighbors as timestamps
[Nunes et al., WIDM 2007]
22
Hybrid Approach
• Inferring timestamps using machine
learning
– Exploit links, contents of a web pages and its neighbors
– Features: linguistic, position, page formats, and tags
[Chen et al., SIGIR 2010]
23
Content Time Extraction
• Three types of temporal expressions
1. Explicit: time mentions being mapped directly to
a time point or interval, e.g., “July 4, 2012”
2. Implicit: imprecise time point or interval, e.g.,
“Independence Day 2012”
3. Relative: resolved to a time point or interval
using other types or the publication date, e.g.,
“next month”
[Alonso et al., SIGIR Forum 2007]
24
Identifying Relevant Time
• How to determine relevant temporal
expressions tagged in a document?
– Not all temporal expressions associated to an event
are equally relevant
Reported by World Health Organization (WHO) on
29 July 2012 about an ongoing Ebola outbreak
in Uganda since the beginning of July 2012
25
Current Approaches
1. Ranking temporal expressions using
different features
2. The task of identifying relevant time is
regarded as a classification problem
– Two classes: (1) relevant and (2) irrelevant
– Definition: relevant referring to the starting, ending or
ongoing time of the event
[Strötgen et al., TempWeb 2012; Kanhabua et al., TAIA 2012]
26
Query Analysis
Determining Time of Queries
• Two types of temporal queries:
1. Explicit: time is provided, "Presidential election 2012“
2. Implicit: time is not provided, "Germany World Cup"
• Temporal intent can be implicitly inferred
• Previous studies on temporal queries:
– 1.5% of web queries are explicit
– ~7% of web queries are implicit
[Nunes et al., ECIR 2008; Metzler et al., SIGIR 2009]
28
Current Approaches
1. Query log analysis
2. Analyzing top-k documents
29
Query Log Analysis
• Features from query logs
– Analyze query frequencies over time for identifying
the relevant time of queries
• Time-series analysis
– Leverage time-series decomposition for detecting
seasonal queries
[Metzler et al., SIGIR 2009; Shokouhi, SIGIR 2011]
30
Time-series Decomposition
Query: Easter Query: World cup
31
Analyzing Top-k Documents
• Using temporal language models
– Determine time of queries when no time is given explicitly
• Exploiting time from search snippets
– Extract temporal expressions (i.e., years) from the
contents of top-k retrieved web snippets for a given query
[Kanhabua et al., ECDL 2010; Campos et al., CIKM 2012]
32
Matching: Re-visited
D2006
Ranked results
query
Temporal
Web
Determining
Search Intent
Term: {Germany, World, Cup}
Time: {06/2006, 07/2006}
D2006
Retrieved results
matching
Time-sensitive
Queries
Semantic
Annotation
Annotated
documents Term: {w1, w2, …, wn}
Time: {PubTime(di), ContentTime(di)}
33
Retrieval and Ranking
Searching the Past
• Searching documents created/edited over time
– E.g., web archives, news archives, blogs, or emails
– A journalist wants to write a timeline of a news article
– A Wikipedia contributor searches for historical
information about an entity of interests
Web
archives
news
archives
blogs emails
“temporal document
collections”
Retrieve documents
about Pope Benedict
XVI written before 2005
Term-based IR approaches
may give unsatisfied results
35
• Time must be explicitly modeled in order to
increase the effectiveness of ranking
– To order search results so that the most relevant
ones are ranked higher
• Time uncertainty should be taken into account
– Two temporal expressions can refer to the same time
period even though they are not equally written
– E.g. the query “Independence Day 2011”
• A retrieval model relying on term-matching only will fail to
retrieve documents mentioning “July 4, 2011”
Challenges
36
Query/Document Models
• A temporal query consists of:
– Query keywords
– Temporal expressions
• A document consists of:
– Terms, i.e., bag-of-words
– Publication time and temporal expressions
37
Temporal Query Examples
• A temporal query consists of:
– Query keywords
– Temporal expressions
• A document consists of:
– Terms, i.e., bag-of-words
– Publication time and temporal expressions
[Berberich et al., ECIR 2010]
38
Time-aware Ranking
• Two main approaches
1. Mixture model
• Linearly combining textual- and temporal similarity
2. Probabilistic model
• Generating a query from the textual part and temporal part
of a document independently
[Li et al., CIKM 2003; Baeza-Yates, SIGIR Forum 2005]
[Berberich et al., ECIR 2010; Kanhabua et al., ECDL 2010]
39
Mixture Model
• Linearly combine textual- and temporal similarity
– α indicates the importance of similarity scores
• Both scores are normalized before combining
– Textual similarity can be determined using any term-
based retrieval model
• E.g., tf.idf or a unigram language model
40[Li et al., CIKM 2003; Baeza-Yates, SIGIR Forum 2005]
[Kanhabua et al., ECDL 2010]
Mixture Model
• Linearly combine textual- and temporal similarity
– α indicates the importance of similarity scores
• Both scores are normalized before combining
– Textual similarity can be determined using any term-
based retrieval model
• E.g., tf.idf or a unigram language model
How to determine temporal similarity?
41
Temporal Similarity
Similarityscore
Time distance
d1 d2
42
Probabilistic Model
• Assume that temporal expressions in the
query are generated independently from a
two-step generative model:
– P(tq|td) can be estimated likelihood for each time
interval tq that td can refer to
– Linear interpolation smoothing is applied to
eliminates zero probabilities
• I.e., an unseen temporal expression tq in d
43
[Berberich et al., ECIR 2010]
• Five time-aware ranking models
– LMT [Berberich et al., ECIR 2010]
– LMTU [Berberich et al., ECIR 2010]
– TS [Kanhabua et al., ECLD 2010]
– TSU [Kanhabua et al., ECLD 2010]
– FuzzySet [Kalczynski et al., Inf. Process. 2005]
Comparison of Time-aware Ranking
[Kanhabua et al., SIGIR 2011a]
44
Open Issues
Problem Statements
• Queries of named entities (people, company, place)
– Highly dynamic in appearance, i.e., relationships between
terms changes over time
– E.g. changes of roles, name alterations, or semantic shift
Named Entity Evolution
Scenario 1
Query: “Pope Benedict XVI” and written before 2005
Documents about “Joseph Alois Ratzinger” are relevant
Scenario 2
Query: “Hillary R. Clinton” and written from 1997 to 2002
Documents about “New York Senator” and “First Lady of
the United States” are relevant
46
Examples of Name Changes
47
QUEST Demo: http://research.idi.ntnu.no/wislab/quest/
Current Approaches
• Temporal co-occurrence
• Mining Wikipedia revisions
• Temporal association rule mining
48[Berberich et al., WebDB 2009; Kanhabua et al., JCDL 2010]
[Kaluarachchi et al., CIKM 2010; Tahmasebi et al., COLING 2012]
1. Performance prediction
– Predict the retrieval effectiveness wrt. a ranking model
Query Prediction Problems
query
precision = ?
recall = ?
MAP = ?
predict
[Cronen-Townsend et al., SIGIR 2002; Diaz et al., SIGIR 2004]
[Hauff et al., ECIR 2010; Carmel et al., 2010; Kanhabua et al. SIGIR 2011b]
49
1. Performance prediction
– Predict the retrieval effectiveness wrt. a ranking model
2. Retrieval model prediction
– Predict the retrieval model that is most suitable
Query Prediction Problems
query ranking = ?
predict max(precision)
max(recall)
max(MAP)
[Peng et al., ECIR 2010; Kanhabua et al., SIGIR 2012]
50
References
• [Alonso et al., SIGIR Forum 2007] Omar Alonso, Michael Gertz, Ricardo A. Baeza-Yates: On the
value of temporal information in information retrieval. SIGIR Forum 41(2): 35-41 (2007)
• [Baeza-Yates, SIGIR Forum 2005] Ricardo A. Baeza-Yates: Searching the future. SIGIR workshop
MF/IR 2005
• [Berberich et al., WebDB 2009] Klaus Berberich, Srikanta J. Bedathur, Mauro Sozio, Gerhard
Weikum: Bridging the Terminology Gap in Web Archive Search. WebDB 2009
• [Berberich et al., ECIR 2010] Klaus Berberich, Srikanta J. Bedathur, Omar Alonso, Gerhard Weikum:
A Language Modeling Approach for Temporal Information Needs. ECIR 2010: 13-25
• [Campos et al., CIKM 2012] Ricardo Campos, Gaël Dias, Alípio Jorge, Celia Nunes: GTE: A
Distributional Second-Order Co-Occurrence Approach to Improve the Identification of Top Relevant
Dates in Web Snippets. CIKM 2012
• [Carmel et al., 2010] David Carmel, Elad Yom-Tov: Estimating the Query Difficulty for Information
Retrieval. Morgan & Claypool Publishers 2010
• [Chen et al., SIGIR 2010] Zhumin Chen, Jun Ma, Chaoran Cui, Hongxing Rui, Shaomang Huang: Web
page publication time detection and its application for page rank. SIGIR 2010: 859-860
• [Cronen-Townsend et al., SIGIR 2002] Stephen Cronen-Townsend, Yun Zhou, W. Bruce Croft:
Predicting query performance. SIGIR 2002: 299-306
• [Diaz et al., SIGIR 2004] Fernando Diaz, Rosie Jones: Using temporal profiles of queries for precision
prediction. SIGIR 2004: 18-24
• [Dumais, SIAM-SDM 2012] Susan T. Dumais: Temporal Dynamics and Information Retrieval. SIAM-
SDM 2012
51
References (cont’)
• [Hauff et al., ECIR 2010] Claudia Hauff, Leif Azzopardi, Djoerd Hiemstra, Franciska de Jong: Query
Performance Prediction: Evaluation Contrasted with Effectiveness. ECIR 2010: 204-216
• [de Jong et al., AHC 2005] Franciska de Jong, Henning Rode, Djoerd Hiemstra: Temporal language
models for the disclosure of historical text. AHC 2005: 161-168
• [Kaluarachchi et al., CIKM 2010] Amal Chaminda Kaluarachchi, Aparna S. Varde, Srikanta J.
Bedathur, Gerhard Weikum, Jing Peng, Anna Feldman: Incorporating terminology evolution for query
translation in text retrieval with association rules. CIKM 2010: 1789-1792
• [Kalczynski et al., Inf. Process. 2005] Pawel Jan Kalczynski, Amy Chou: Temporal Document
Retrieval Model for business news archives. Inf. Process. Manage. 41(3): 635-650 (2005)
• [Kanhabua et al., JCDL 2010] Nattiya Kanhabua, Kjetil Nørvåg: Exploiting time-based synonyms in
searching document archives. JCDL 2010: 79-88
• [Kanhabua et al., ECDL 2010] Nattiya Kanhabua, Kjetil Nørvåg: Determining Time of Queries for Re-
ranking Search Results. ECDL 2010: 261-272
• [Kanhabua et al., SIGIR 2011a] Nattiya Kanhabua, Kjetil Nørvåg: A comparison of time-aware ranking
methods. SIGIR 2011: 1257-1258
• [Kanhabua et al., SIGIR 2011b] Nattiya Kanhabua, Kjetil Nørvåg: Time-based query performance
predictors. SIGIR 2011: 1181-1182
• [Kanhabua et al., SIGIR 2012] Nattiya Kanhabua, Klaus Berberich, Kjetil Nørvåg: Learning to select a
time-aware retrieval model. SIGIR 2012
• [Kanhabua et al., TAIA 2012] Nattiya Kanhabua, Sara Romano, Avaré Stewart: Identifying Relevant
Temporal Expressions for Real-World Events. Time-aware Information Access Workshop 2012
52
References (cont’)
• [Ke et al., CN 2006] Yiping Ke, Lin Deng, Wilfred Ng, Dik Lun Lee: Web dynamics and their
ramifications for the development of Web search engines. Computer Networks 50(10): 1430-1447
(2006)
• [Kraaij, SIGIR Forum 2005] Wessel Kraaij: Variations on language modeling for information
retrieval. SIGIR Forum 39(1): 61 (2005)
• [Li et al., CIKM 2003] Xiaoyan Li, W. Bruce Croft: Time-based language models. CIKM 2003:
469-475
• [Nunes et al., WIDM 2007] Sérgio Nunes, Cristina Ribeiro, Gabriel David: Using neighbors to
date web documents. WIDM 2007: 129-136
• [Metzler et al., SIGIR 2009] Donald Metzler, Rosie Jones, Fuchun Peng, Ruiqiang Zhang:
Improving search relevance for implicitly temporal queries. SIGIR 2009: 700-701
• [Mazeika et al., CIKM 2011] Arturas Mazeika, Tomasz Tylenda, Gerhard Weikum: Entity
timelines: visual analytics and named entity evolution. CIKM 2011: 2585-2588
• [Peng et al., ECIR 2010] Jie Peng, Craig Macdonald, Iadh Ounis: Learning to Select a Ranking
Function. ECIR 2010: 114-126
• [Risvik et al., CN 2002] Knut Magne Risvik, Rolf Michelsen: Search engines and Web dynamics.
Computer Networks 39(3): 289-302 (2002)
• [Shokouhi, SIGIR 2011] Milad Shokouhi: Detecting Seasonal Queries by Time-Series Analysis.
SIGIR 2011: 1171-1172
• [Strötgen et al., SemEval 2010] Jannik Strötgen, Michael Gertz: Heideltime: High quality rule-
based extraction and normalization of temporal expressions. SemEval 2010: 321-324
53
References (cont’)
• [Strötgen et al., TempWeb 2012] Jannik Strötgen, Omar Alonso, Michael Gertz: Identification of
top relevant temporal expressions in documents. Temporal Web Workshop 2012.
• [Tahmasebi et al., COLING2012] Nina Tahmasebi, Gerhard Gossen, Nattiya Kanhabua, Helge
Holzmann, Thomas Risse: NEER: An Unsupervised Method for Named Entity Evolution
Recognition. COLING 2012
• [Verhagen et al., ACL 2005] Marc Verhagen, Inderjeet Mani, Roser Sauri, Jessica Littman,
Robert Knippen, Seok Bae Jang, Anna Rumshisky, John Phillips, James Pustejovsky: Automating
Temporal Annotation with TARSQI. ACL 2005
• [WebDyn 2010] Web Dynamics course: http://www.mpi-
inf.mpg.de/departments/d5/teaching/ss10/dyn/, Max-Planck Institute for Informatics, Saarbrücken,
Germany, 2010
• [Zhang et al., EMNLP 2010] Ruiqiang Zhang, Yuki Konda, Anlei Dong, Pranam Kolari, Yi Chang,
Zhaohui Zheng: Learning Recurrent Event Queries for Web Search. EMNLP 2010: 1129-1139
54
Thank you!
55

More Related Content

Similar to Searching the Temporal Web: Challenges and Current Approaches

Temporal Web Dynamics and Implications for Information Retrieval
Temporal Web Dynamics and Implications for Information RetrievalTemporal Web Dynamics and Implications for Information Retrieval
Temporal Web Dynamics and Implications for Information RetrievalNattiya Kanhabua
 
Exploiting temporal information in retrieval of archived documents (doctoral ...
Exploiting temporal information in retrieval of archived documents (doctoral ...Exploiting temporal information in retrieval of archived documents (doctoral ...
Exploiting temporal information in retrieval of archived documents (doctoral ...Nattiya Kanhabua
 
Time-aware Approaches to Information Retrieval
Time-aware Approaches to Information RetrievalTime-aware Approaches to Information Retrieval
Time-aware Approaches to Information RetrievalNattiya Kanhabua
 
Searching over the past, present and future
Searching over the past, present and futureSearching over the past, present and future
Searching over the past, present and futureRoi Blanco
 
Search, Exploration and Analytics of Evolving Data
Search, Exploration and Analytics of Evolving DataSearch, Exploration and Analytics of Evolving Data
Search, Exploration and Analytics of Evolving DataNattiya Kanhabua
 
On the Value of Temporal Anchor Texts in Wikipedia
On the Value of Temporal Anchor Texts in WikipediaOn the Value of Temporal Anchor Texts in Wikipedia
On the Value of Temporal Anchor Texts in WikipediaNattiya Kanhabua
 
Data Science Capstone - Global Economics
Data Science Capstone - Global EconomicsData Science Capstone - Global Economics
Data Science Capstone - Global EconomicsMeagan Thompson
 
Blueprint for Collaboration-UF Storage Cataloging Project
Blueprint for Collaboration-UF Storage Cataloging ProjectBlueprint for Collaboration-UF Storage Cataloging Project
Blueprint for Collaboration-UF Storage Cataloging ProjectNASIG
 
Building an LDA topic model using Wikipedia
Building an LDA topic model using WikipediaBuilding an LDA topic model using Wikipedia
Building an LDA topic model using WikipediaSharon Garewal
 
Temporal models for mining, ranking and recommendation in the Web
Temporal models for mining, ranking and recommendation in the WebTemporal models for mining, ranking and recommendation in the Web
Temporal models for mining, ranking and recommendation in the WebTu Nguyen
 
Shebanq roma-2013-10-01
Shebanq roma-2013-10-01Shebanq roma-2013-10-01
Shebanq roma-2013-10-01Dirk Roorda
 
Carolin Müller-Spitzer & Sascha Wolfer - A quantitative view on dictionary us...
Carolin Müller-Spitzer & Sascha Wolfer - A quantitative view on dictionary us...Carolin Müller-Spitzer & Sascha Wolfer - A quantitative view on dictionary us...
Carolin Müller-Spitzer & Sascha Wolfer - A quantitative view on dictionary us...Scottish Language Dictionaries
 
Emerging topic detection on twitter based on temporal and social terms evalua...
Emerging topic detection on twitter based on temporal and social terms evalua...Emerging topic detection on twitter based on temporal and social terms evalua...
Emerging topic detection on twitter based on temporal and social terms evalua...HopeBay Technologies, Inc.
 
Towards Research Engines: Supporting Search Stages in Web Archives (2015)
Towards Research Engines: Supporting Search Stages in Web Archives (2015)Towards Research Engines: Supporting Search Stages in Web Archives (2015)
Towards Research Engines: Supporting Search Stages in Web Archives (2015)TimelessFuture
 
Training Researchers with the MOVING Platform
Training Researchers with the MOVING PlatformTraining Researchers with the MOVING Platform
Training Researchers with the MOVING PlatformIacopo Vagliano
 
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014eswcsummerschool
 

Similar to Searching the Temporal Web: Challenges and Current Approaches (20)

Temporal Web Dynamics and Implications for Information Retrieval
Temporal Web Dynamics and Implications for Information RetrievalTemporal Web Dynamics and Implications for Information Retrieval
Temporal Web Dynamics and Implications for Information Retrieval
 
Exploiting temporal information in retrieval of archived documents (doctoral ...
Exploiting temporal information in retrieval of archived documents (doctoral ...Exploiting temporal information in retrieval of archived documents (doctoral ...
Exploiting temporal information in retrieval of archived documents (doctoral ...
 
Time-aware Approaches to Information Retrieval
Time-aware Approaches to Information RetrievalTime-aware Approaches to Information Retrieval
Time-aware Approaches to Information Retrieval
 
Searching over the past, present and future
Searching over the past, present and futureSearching over the past, present and future
Searching over the past, present and future
 
Search, Exploration and Analytics of Evolving Data
Search, Exploration and Analytics of Evolving DataSearch, Exploration and Analytics of Evolving Data
Search, Exploration and Analytics of Evolving Data
 
On the Value of Temporal Anchor Texts in Wikipedia
On the Value of Temporal Anchor Texts in WikipediaOn the Value of Temporal Anchor Texts in Wikipedia
On the Value of Temporal Anchor Texts in Wikipedia
 
Data Science Capstone - Global Economics
Data Science Capstone - Global EconomicsData Science Capstone - Global Economics
Data Science Capstone - Global Economics
 
Blueprint for Collaboration-UF Storage Cataloging Project
Blueprint for Collaboration-UF Storage Cataloging ProjectBlueprint for Collaboration-UF Storage Cataloging Project
Blueprint for Collaboration-UF Storage Cataloging Project
 
LR.pptx
LR.pptxLR.pptx
LR.pptx
 
Building an LDA topic model using Wikipedia
Building an LDA topic model using WikipediaBuilding an LDA topic model using Wikipedia
Building an LDA topic model using Wikipedia
 
Temporal models for mining, ranking and recommendation in the Web
Temporal models for mining, ranking and recommendation in the WebTemporal models for mining, ranking and recommendation in the Web
Temporal models for mining, ranking and recommendation in the Web
 
Shebanq roma-2013-10-01
Shebanq roma-2013-10-01Shebanq roma-2013-10-01
Shebanq roma-2013-10-01
 
Carolin Müller-Spitzer & Sascha Wolfer - A quantitative view on dictionary us...
Carolin Müller-Spitzer & Sascha Wolfer - A quantitative view on dictionary us...Carolin Müller-Spitzer & Sascha Wolfer - A quantitative view on dictionary us...
Carolin Müller-Spitzer & Sascha Wolfer - A quantitative view on dictionary us...
 
Emerging topic detection on twitter based on temporal and social terms evalua...
Emerging topic detection on twitter based on temporal and social terms evalua...Emerging topic detection on twitter based on temporal and social terms evalua...
Emerging topic detection on twitter based on temporal and social terms evalua...
 
Towards Research Engines: Supporting Search Stages in Web Archives (2015)
Towards Research Engines: Supporting Search Stages in Web Archives (2015)Towards Research Engines: Supporting Search Stages in Web Archives (2015)
Towards Research Engines: Supporting Search Stages in Web Archives (2015)
 
Digital Library
Digital LibraryDigital Library
Digital Library
 
#unbordering_svu
#unbordering_svu#unbordering_svu
#unbordering_svu
 
Training Researchers with the MOVING Platform
Training Researchers with the MOVING PlatformTraining Researchers with the MOVING Platform
Training Researchers with the MOVING Platform
 
Television News Search and Analysis with Lucene/Solr
Television News Search and Analysis with Lucene/SolrTelevision News Search and Analysis with Lucene/Solr
Television News Search and Analysis with Lucene/Solr
 
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
 

More from Nattiya Kanhabua

Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...
Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...
Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...Nattiya Kanhabua
 
Understanding the Diversity of Tweets in the Time of Outbreaks
Understanding the Diversity of Tweets in the Time of OutbreaksUnderstanding the Diversity of Tweets in the Time of Outbreaks
Understanding the Diversity of Tweets in the Time of OutbreaksNattiya Kanhabua
 
Why Is It Difficult to Detect Outbreaks in Twitter?
Why Is It Difficult to Detect Outbreaks in Twitter?Why Is It Difficult to Detect Outbreaks in Twitter?
Why Is It Difficult to Detect Outbreaks in Twitter?Nattiya Kanhabua
 
Leveraging Dynamic Query Subtopics for Time-aware Search Result Diversification
Leveraging Dynamic Query Subtopics for Time-aware Search Result DiversificationLeveraging Dynamic Query Subtopics for Time-aware Search Result Diversification
Leveraging Dynamic Query Subtopics for Time-aware Search Result DiversificationNattiya Kanhabua
 
Ranking Related News Predictions
Ranking Related News PredictionsRanking Related News Predictions
Ranking Related News PredictionsNattiya Kanhabua
 
Temporal summarization of event related updates
Temporal summarization of event related updatesTemporal summarization of event related updates
Temporal summarization of event related updatesNattiya Kanhabua
 
Preservation and Forgetting: Friends or Foes?
Preservation and Forgetting: Friends or Foes?Preservation and Forgetting: Friends or Foes?
Preservation and Forgetting: Friends or Foes?Nattiya Kanhabua
 
Concise Preservation by Combining Managed Forgetting and Contextualized Remem...
Concise Preservation by Combining Managed Forgetting and Contextualized Remem...Concise Preservation by Combining Managed Forgetting and Contextualized Remem...
Concise Preservation by Combining Managed Forgetting and Contextualized Remem...Nattiya Kanhabua
 
Can Twitter & Co. Save Lives?
Can Twitter & Co. Save Lives?Can Twitter & Co. Save Lives?
Can Twitter & Co. Save Lives?Nattiya Kanhabua
 
Improving Temporal Language Models For Determining Time of Non-Timestamped Do...
Improving Temporal Language Models For Determining Time of Non-Timestamped Do...Improving Temporal Language Models For Determining Time of Non-Timestamped Do...
Improving Temporal Language Models For Determining Time of Non-Timestamped Do...Nattiya Kanhabua
 
Determining Time of Queries for Re-ranking Search Results
Determining Time of Queries for Re-ranking Search ResultsDetermining Time of Queries for Re-ranking Search Results
Determining Time of Queries for Re-ranking Search ResultsNattiya Kanhabua
 
Supporting Exploration and Serendipity in Information Retrieval
Supporting Exploration and Serendipity in Information RetrievalSupporting Exploration and Serendipity in Information Retrieval
Supporting Exploration and Serendipity in Information RetrievalNattiya Kanhabua
 
Learning to Rank Search Results for Time-Sensitive Queries (poster presentation)
Learning to Rank Search Results for Time-Sensitive Queries (poster presentation)Learning to Rank Search Results for Time-Sensitive Queries (poster presentation)
Learning to Rank Search Results for Time-Sensitive Queries (poster presentation)Nattiya Kanhabua
 
Estimating Query Difficulty for News Prediction Retrieval (poster presentation)
Estimating Query Difficulty for News Prediction Retrieval (poster presentation)Estimating Query Difficulty for News Prediction Retrieval (poster presentation)
Estimating Query Difficulty for News Prediction Retrieval (poster presentation)Nattiya Kanhabua
 
Exploiting Time-based Synonyms in Searching Document Archives
Exploiting Time-based Synonyms in Searching Document ArchivesExploiting Time-based Synonyms in Searching Document Archives
Exploiting Time-based Synonyms in Searching Document ArchivesNattiya Kanhabua
 
Identifying Relevant Temporal Expressions for Real-world Events
Identifying Relevant Temporal Expressions for Real-world EventsIdentifying Relevant Temporal Expressions for Real-world Events
Identifying Relevant Temporal Expressions for Real-world EventsNattiya Kanhabua
 
Leveraging Learning To Rank in an Optimization Framework for Timeline Summari...
Leveraging Learning To Rank in an Optimization Framework for Timeline Summari...Leveraging Learning To Rank in an Optimization Framework for Timeline Summari...
Leveraging Learning To Rank in an Optimization Framework for Timeline Summari...Nattiya Kanhabua
 
What Triggers Human Remembering of Events? A Large-Scale Analysis of Catalyst...
What Triggers Human Remembering of Events? A Large-Scale Analysis of Catalyst...What Triggers Human Remembering of Events? A Large-Scale Analysis of Catalyst...
What Triggers Human Remembering of Events? A Large-Scale Analysis of Catalyst...Nattiya Kanhabua
 

More from Nattiya Kanhabua (18)

Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...
Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...
Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...
 
Understanding the Diversity of Tweets in the Time of Outbreaks
Understanding the Diversity of Tweets in the Time of OutbreaksUnderstanding the Diversity of Tweets in the Time of Outbreaks
Understanding the Diversity of Tweets in the Time of Outbreaks
 
Why Is It Difficult to Detect Outbreaks in Twitter?
Why Is It Difficult to Detect Outbreaks in Twitter?Why Is It Difficult to Detect Outbreaks in Twitter?
Why Is It Difficult to Detect Outbreaks in Twitter?
 
Leveraging Dynamic Query Subtopics for Time-aware Search Result Diversification
Leveraging Dynamic Query Subtopics for Time-aware Search Result DiversificationLeveraging Dynamic Query Subtopics for Time-aware Search Result Diversification
Leveraging Dynamic Query Subtopics for Time-aware Search Result Diversification
 
Ranking Related News Predictions
Ranking Related News PredictionsRanking Related News Predictions
Ranking Related News Predictions
 
Temporal summarization of event related updates
Temporal summarization of event related updatesTemporal summarization of event related updates
Temporal summarization of event related updates
 
Preservation and Forgetting: Friends or Foes?
Preservation and Forgetting: Friends or Foes?Preservation and Forgetting: Friends or Foes?
Preservation and Forgetting: Friends or Foes?
 
Concise Preservation by Combining Managed Forgetting and Contextualized Remem...
Concise Preservation by Combining Managed Forgetting and Contextualized Remem...Concise Preservation by Combining Managed Forgetting and Contextualized Remem...
Concise Preservation by Combining Managed Forgetting and Contextualized Remem...
 
Can Twitter & Co. Save Lives?
Can Twitter & Co. Save Lives?Can Twitter & Co. Save Lives?
Can Twitter & Co. Save Lives?
 
Improving Temporal Language Models For Determining Time of Non-Timestamped Do...
Improving Temporal Language Models For Determining Time of Non-Timestamped Do...Improving Temporal Language Models For Determining Time of Non-Timestamped Do...
Improving Temporal Language Models For Determining Time of Non-Timestamped Do...
 
Determining Time of Queries for Re-ranking Search Results
Determining Time of Queries for Re-ranking Search ResultsDetermining Time of Queries for Re-ranking Search Results
Determining Time of Queries for Re-ranking Search Results
 
Supporting Exploration and Serendipity in Information Retrieval
Supporting Exploration and Serendipity in Information RetrievalSupporting Exploration and Serendipity in Information Retrieval
Supporting Exploration and Serendipity in Information Retrieval
 
Learning to Rank Search Results for Time-Sensitive Queries (poster presentation)
Learning to Rank Search Results for Time-Sensitive Queries (poster presentation)Learning to Rank Search Results for Time-Sensitive Queries (poster presentation)
Learning to Rank Search Results for Time-Sensitive Queries (poster presentation)
 
Estimating Query Difficulty for News Prediction Retrieval (poster presentation)
Estimating Query Difficulty for News Prediction Retrieval (poster presentation)Estimating Query Difficulty for News Prediction Retrieval (poster presentation)
Estimating Query Difficulty for News Prediction Retrieval (poster presentation)
 
Exploiting Time-based Synonyms in Searching Document Archives
Exploiting Time-based Synonyms in Searching Document ArchivesExploiting Time-based Synonyms in Searching Document Archives
Exploiting Time-based Synonyms in Searching Document Archives
 
Identifying Relevant Temporal Expressions for Real-world Events
Identifying Relevant Temporal Expressions for Real-world EventsIdentifying Relevant Temporal Expressions for Real-world Events
Identifying Relevant Temporal Expressions for Real-world Events
 
Leveraging Learning To Rank in an Optimization Framework for Timeline Summari...
Leveraging Learning To Rank in an Optimization Framework for Timeline Summari...Leveraging Learning To Rank in an Optimization Framework for Timeline Summari...
Leveraging Learning To Rank in an Optimization Framework for Timeline Summari...
 
What Triggers Human Remembering of Events? A Large-Scale Analysis of Catalyst...
What Triggers Human Remembering of Events? A Large-Scale Analysis of Catalyst...What Triggers Human Remembering of Events? A Large-Scale Analysis of Catalyst...
What Triggers Human Remembering of Events? A Large-Scale Analysis of Catalyst...
 

Recently uploaded

Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...NETWAYS
 
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
LANDMARKS  AND MONUMENTS IN NIGERIA.pptxLANDMARKS  AND MONUMENTS IN NIGERIA.pptx
LANDMARKS AND MONUMENTS IN NIGERIA.pptxBasil Achie
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...NETWAYS
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Salam Al-Karadaghi
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...NETWAYS
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrsaastr
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Kayode Fayemi
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...NETWAYS
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Krijn Poppe
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024eCommerce Institute
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Pooja Nehwal
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxFamilyWorshipCenterD
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...NETWAYS
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )Pooja Nehwal
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Delhi Call girls
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfhenrik385807
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AITatiana Gurgel
 
Motivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfMotivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfakankshagupta7348026
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@vikas rana
 

Recently uploaded (20)

Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
 
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
LANDMARKS  AND MONUMENTS IN NIGERIA.pptxLANDMARKS  AND MONUMENTS IN NIGERIA.pptx
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AI
 
Motivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfMotivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdf
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@
 

Searching the Temporal Web: Challenges and Current Approaches

  • 1. Searching the Temporal Web Challenges and Current Approaches Nattiya Kanhabua IR Group, University of Glasgow 4 February 2013
  • 2. Outline • Evolution of the Web • Temporal IR system • Current Approaches – Content Analysis – Query Analysis – Retrieval and Ranking • Open Issues 2
  • 3. Evolution of the Web • Web is changing over time in many aspects: – Size: web pages are added/deleted all the time – Content: web pages are edited/modified – Query: users’ information needs changes [Ke et al., CN 2006; Risvik et al., CN 2002] [Dumais, SIAM-SDM 2012; WebDyn 2010] 3
  • 4. 2000 First billion-URL index The world’s largest! ≈5000 PCs in clusters!2004 Index grows to 4.2 billion pages 1995 2012 2008 Google counts 1 trillion unique URLs Web and Index Sizes 2009 TBs or PBs of data/index Tens of thousands of PCs http://www.worldwidewebsize.com/ Impacts: crawling, indexing, and caching 4
  • 5. Content Dynamics • WayBack Machine – Web archive search by the Internet Archive 5
  • 6. 1998 2006 Content Dynamics 2012Impacts: document representation and retrieval 6
  • 7. Query Dynamics • Search queries exhibit temporal patterns – Spikes or seasonality Impacts: search intent and query representation http://www.google.com/insights/search/ 7
  • 9. Time-sensitive Queries • Represent temporal information needs – E.g., 2006 FIFA World Cup, Thailand tsunami • Queries and the relevance depend on time – Query popularities change over time – Documents are about events at particular time 9
  • 10. Time Distribution of Qrel Recency query Time-sensitive query Time-insensitive query [Li et al., CIKM 2003] 10
  • 11. Query/Document Matching query Temporal Web Determining Search Intent Term: {Germany, World, Cup} Time: {06/2006, 07/2006} D2006 Retrieved results matching Time-sensitive queries Semantic Annotation Annotated documents Term: {w1, w2, …, wn} Time: {PubTime(di), ContentTime(di)} 11
  • 13. Two Time Aspects Two time dimensions 1. Publication or modified time 2. Content or event time content time publication time 13
  • 14. Problem Statements • Difficult to find the trustworthy time for web documents – Time gap between crawling and indexing – Decentralization and relocation of web documents – No standard metadata for time/date Document Dating Let’s me see… This document is probably written in 850 A.C. with 95% confidence. I found a bible-like document. But I have no idea when it was created? “ For a given document with uncertain timestamp, can the contents be used to determine the timestamp with a sufficiently high confidence? ” 14
  • 15. Current Approaches 1. Content-based 2. Link-based 3. Hybrid 15
  • 16. Content-based Approach Partition Word 1999 tsunami 1999 Japan 1999 tidal wave 2004 tsunami 2004 Thailand 2004 earthquake Temporal Language Models Temporal Language Models • Based on the statistic usage of words over time • Compare each word of a non-timestamped document with a reference corpus • Tentative timestamp -- a time partition mostly overlaps in word usage Freq 1 1 1 1 1 1 tsunami Thailand A non-timestamped document 16 [de Jong et al., AHC 2005; Kanhabua et al., ECDL 2008]
  • 17. Content-based Approach Partition Word 1999 tsunami 1999 Japan 1999 tidal wave 2004 tsunami 2004 Thailand 2004 earthquake Temporal Language Models tsunami Thailand A non-timestamped document Temporal Language Models • Based on the statistic usage of words over time • Compare each word of a non-timestamped document with a reference corpus • Tentative timestamp -- a time partition mostly overlaps in word usage Freq 1 1 1 1 1 1 17 [de Jong et al., AHC 2005; Kanhabua et al., ECDL 2008]
  • 18. Content-based Approach Partition Word 1999 tsunami 1999 Japan 1999 tidal wave 2004 tsunami 2004 Thailand 2004 earthquake Temporal Language Models tsunami Thailand A non-timestamped document Temporal Language Models • Based on the statistic usage of words over time • Compare each word of a non-timestamped document with a reference corpus • Tentative timestamp -- a time partition mostly overlaps in word usage Freq 1 1 1 1 1 1 18 [de Jong et al., AHC 2005; Kanhabua et al., ECDL 2008]
  • 19. Content-based Approach Partition Word 1999 tsunami 1999 Japan 1999 tidal wave 2004 tsunami 2004 Thailand 2004 earthquake Temporal Language Models tsunami Thailand A non-timestamped document Temporal Language Models • Based on the statistic usage of words over time • Compare each word of a non-timestamped document with a reference corpus • Tentative timestamp -- a time partition mostly overlaps in word usage Freq 1 1 1 1 1 1 19 [de Jong et al., AHC 2005; Kanhabua et al., ECDL 2008]
  • 20. Content-based Approach Partition Word 1999 tsunami 1999 Japan 1999 tidal wave 2004 tsunami 2004 Thailand 2004 earthquake Temporal Language Models tsunami Thailand A non-timestamped document Similarity Scores Score(1999) = 1 Score(2004) = 1 + 1 = 2 Most likely timestamp is 2004 Temporal Language Models • Based on the statistic usage of words over time • Compare each word of a non-timestamped document with a reference corpus • Tentative timestamp -- a time partition mostly overlaps in word usage [de Jong et al., AHC 2005; Kanhabua et al., ECDL 2008] Freq 1 1 1 1 1 1 20
  • 21. Normalized Log-likelihood Ratio Partition Word 1999 tsunami 1999 Japan 1999 tidal wave 2004 tsunami 2004 Thailand 2004 earthquake Temporal Language Models tsunami Thailand A non-timestamped document Similarity Scores Score(1999) = 1 Score(2004) = 1 + 1 = 2 Most likely timestamp is 2004 Normalized log-likelihood ratio • Variant of Kullback-Leibler divergence • Similarity of a document and time partitions • C is the background model estimated on the corpus • Linear interpolation smoothing to avoid the zero probability of unseen words [Kraaij, SIGIR Forum 2005] 21
  • 22. Link-based Approach • Dating a document using its neighbors 1. Web pages linking to the document • Incoming links 2. Web pages pointed by the document • Outgoing links 3. Media assets associated with the document • E.g., images • Averaging the last-modified dates of its neighbors as timestamps [Nunes et al., WIDM 2007] 22
  • 23. Hybrid Approach • Inferring timestamps using machine learning – Exploit links, contents of a web pages and its neighbors – Features: linguistic, position, page formats, and tags [Chen et al., SIGIR 2010] 23
  • 24. Content Time Extraction • Three types of temporal expressions 1. Explicit: time mentions being mapped directly to a time point or interval, e.g., “July 4, 2012” 2. Implicit: imprecise time point or interval, e.g., “Independence Day 2012” 3. Relative: resolved to a time point or interval using other types or the publication date, e.g., “next month” [Alonso et al., SIGIR Forum 2007] 24
  • 25. Identifying Relevant Time • How to determine relevant temporal expressions tagged in a document? – Not all temporal expressions associated to an event are equally relevant Reported by World Health Organization (WHO) on 29 July 2012 about an ongoing Ebola outbreak in Uganda since the beginning of July 2012 25
  • 26. Current Approaches 1. Ranking temporal expressions using different features 2. The task of identifying relevant time is regarded as a classification problem – Two classes: (1) relevant and (2) irrelevant – Definition: relevant referring to the starting, ending or ongoing time of the event [Strötgen et al., TempWeb 2012; Kanhabua et al., TAIA 2012] 26
  • 28. Determining Time of Queries • Two types of temporal queries: 1. Explicit: time is provided, "Presidential election 2012“ 2. Implicit: time is not provided, "Germany World Cup" • Temporal intent can be implicitly inferred • Previous studies on temporal queries: – 1.5% of web queries are explicit – ~7% of web queries are implicit [Nunes et al., ECIR 2008; Metzler et al., SIGIR 2009] 28
  • 29. Current Approaches 1. Query log analysis 2. Analyzing top-k documents 29
  • 30. Query Log Analysis • Features from query logs – Analyze query frequencies over time for identifying the relevant time of queries • Time-series analysis – Leverage time-series decomposition for detecting seasonal queries [Metzler et al., SIGIR 2009; Shokouhi, SIGIR 2011] 30
  • 32. Analyzing Top-k Documents • Using temporal language models – Determine time of queries when no time is given explicitly • Exploiting time from search snippets – Extract temporal expressions (i.e., years) from the contents of top-k retrieved web snippets for a given query [Kanhabua et al., ECDL 2010; Campos et al., CIKM 2012] 32
  • 33. Matching: Re-visited D2006 Ranked results query Temporal Web Determining Search Intent Term: {Germany, World, Cup} Time: {06/2006, 07/2006} D2006 Retrieved results matching Time-sensitive Queries Semantic Annotation Annotated documents Term: {w1, w2, …, wn} Time: {PubTime(di), ContentTime(di)} 33
  • 35. Searching the Past • Searching documents created/edited over time – E.g., web archives, news archives, blogs, or emails – A journalist wants to write a timeline of a news article – A Wikipedia contributor searches for historical information about an entity of interests Web archives news archives blogs emails “temporal document collections” Retrieve documents about Pope Benedict XVI written before 2005 Term-based IR approaches may give unsatisfied results 35
  • 36. • Time must be explicitly modeled in order to increase the effectiveness of ranking – To order search results so that the most relevant ones are ranked higher • Time uncertainty should be taken into account – Two temporal expressions can refer to the same time period even though they are not equally written – E.g. the query “Independence Day 2011” • A retrieval model relying on term-matching only will fail to retrieve documents mentioning “July 4, 2011” Challenges 36
  • 37. Query/Document Models • A temporal query consists of: – Query keywords – Temporal expressions • A document consists of: – Terms, i.e., bag-of-words – Publication time and temporal expressions 37
  • 38. Temporal Query Examples • A temporal query consists of: – Query keywords – Temporal expressions • A document consists of: – Terms, i.e., bag-of-words – Publication time and temporal expressions [Berberich et al., ECIR 2010] 38
  • 39. Time-aware Ranking • Two main approaches 1. Mixture model • Linearly combining textual- and temporal similarity 2. Probabilistic model • Generating a query from the textual part and temporal part of a document independently [Li et al., CIKM 2003; Baeza-Yates, SIGIR Forum 2005] [Berberich et al., ECIR 2010; Kanhabua et al., ECDL 2010] 39
  • 40. Mixture Model • Linearly combine textual- and temporal similarity – α indicates the importance of similarity scores • Both scores are normalized before combining – Textual similarity can be determined using any term- based retrieval model • E.g., tf.idf or a unigram language model 40[Li et al., CIKM 2003; Baeza-Yates, SIGIR Forum 2005] [Kanhabua et al., ECDL 2010]
  • 41. Mixture Model • Linearly combine textual- and temporal similarity – α indicates the importance of similarity scores • Both scores are normalized before combining – Textual similarity can be determined using any term- based retrieval model • E.g., tf.idf or a unigram language model How to determine temporal similarity? 41
  • 43. Probabilistic Model • Assume that temporal expressions in the query are generated independently from a two-step generative model: – P(tq|td) can be estimated likelihood for each time interval tq that td can refer to – Linear interpolation smoothing is applied to eliminates zero probabilities • I.e., an unseen temporal expression tq in d 43 [Berberich et al., ECIR 2010]
  • 44. • Five time-aware ranking models – LMT [Berberich et al., ECIR 2010] – LMTU [Berberich et al., ECIR 2010] – TS [Kanhabua et al., ECLD 2010] – TSU [Kanhabua et al., ECLD 2010] – FuzzySet [Kalczynski et al., Inf. Process. 2005] Comparison of Time-aware Ranking [Kanhabua et al., SIGIR 2011a] 44
  • 46. Problem Statements • Queries of named entities (people, company, place) – Highly dynamic in appearance, i.e., relationships between terms changes over time – E.g. changes of roles, name alterations, or semantic shift Named Entity Evolution Scenario 1 Query: “Pope Benedict XVI” and written before 2005 Documents about “Joseph Alois Ratzinger” are relevant Scenario 2 Query: “Hillary R. Clinton” and written from 1997 to 2002 Documents about “New York Senator” and “First Lady of the United States” are relevant 46
  • 47. Examples of Name Changes 47 QUEST Demo: http://research.idi.ntnu.no/wislab/quest/
  • 48. Current Approaches • Temporal co-occurrence • Mining Wikipedia revisions • Temporal association rule mining 48[Berberich et al., WebDB 2009; Kanhabua et al., JCDL 2010] [Kaluarachchi et al., CIKM 2010; Tahmasebi et al., COLING 2012]
  • 49. 1. Performance prediction – Predict the retrieval effectiveness wrt. a ranking model Query Prediction Problems query precision = ? recall = ? MAP = ? predict [Cronen-Townsend et al., SIGIR 2002; Diaz et al., SIGIR 2004] [Hauff et al., ECIR 2010; Carmel et al., 2010; Kanhabua et al. SIGIR 2011b] 49
  • 50. 1. Performance prediction – Predict the retrieval effectiveness wrt. a ranking model 2. Retrieval model prediction – Predict the retrieval model that is most suitable Query Prediction Problems query ranking = ? predict max(precision) max(recall) max(MAP) [Peng et al., ECIR 2010; Kanhabua et al., SIGIR 2012] 50
  • 51. References • [Alonso et al., SIGIR Forum 2007] Omar Alonso, Michael Gertz, Ricardo A. Baeza-Yates: On the value of temporal information in information retrieval. SIGIR Forum 41(2): 35-41 (2007) • [Baeza-Yates, SIGIR Forum 2005] Ricardo A. Baeza-Yates: Searching the future. SIGIR workshop MF/IR 2005 • [Berberich et al., WebDB 2009] Klaus Berberich, Srikanta J. Bedathur, Mauro Sozio, Gerhard Weikum: Bridging the Terminology Gap in Web Archive Search. WebDB 2009 • [Berberich et al., ECIR 2010] Klaus Berberich, Srikanta J. Bedathur, Omar Alonso, Gerhard Weikum: A Language Modeling Approach for Temporal Information Needs. ECIR 2010: 13-25 • [Campos et al., CIKM 2012] Ricardo Campos, Gaël Dias, Alípio Jorge, Celia Nunes: GTE: A Distributional Second-Order Co-Occurrence Approach to Improve the Identification of Top Relevant Dates in Web Snippets. CIKM 2012 • [Carmel et al., 2010] David Carmel, Elad Yom-Tov: Estimating the Query Difficulty for Information Retrieval. Morgan & Claypool Publishers 2010 • [Chen et al., SIGIR 2010] Zhumin Chen, Jun Ma, Chaoran Cui, Hongxing Rui, Shaomang Huang: Web page publication time detection and its application for page rank. SIGIR 2010: 859-860 • [Cronen-Townsend et al., SIGIR 2002] Stephen Cronen-Townsend, Yun Zhou, W. Bruce Croft: Predicting query performance. SIGIR 2002: 299-306 • [Diaz et al., SIGIR 2004] Fernando Diaz, Rosie Jones: Using temporal profiles of queries for precision prediction. SIGIR 2004: 18-24 • [Dumais, SIAM-SDM 2012] Susan T. Dumais: Temporal Dynamics and Information Retrieval. SIAM- SDM 2012 51
  • 52. References (cont’) • [Hauff et al., ECIR 2010] Claudia Hauff, Leif Azzopardi, Djoerd Hiemstra, Franciska de Jong: Query Performance Prediction: Evaluation Contrasted with Effectiveness. ECIR 2010: 204-216 • [de Jong et al., AHC 2005] Franciska de Jong, Henning Rode, Djoerd Hiemstra: Temporal language models for the disclosure of historical text. AHC 2005: 161-168 • [Kaluarachchi et al., CIKM 2010] Amal Chaminda Kaluarachchi, Aparna S. Varde, Srikanta J. Bedathur, Gerhard Weikum, Jing Peng, Anna Feldman: Incorporating terminology evolution for query translation in text retrieval with association rules. CIKM 2010: 1789-1792 • [Kalczynski et al., Inf. Process. 2005] Pawel Jan Kalczynski, Amy Chou: Temporal Document Retrieval Model for business news archives. Inf. Process. Manage. 41(3): 635-650 (2005) • [Kanhabua et al., JCDL 2010] Nattiya Kanhabua, Kjetil Nørvåg: Exploiting time-based synonyms in searching document archives. JCDL 2010: 79-88 • [Kanhabua et al., ECDL 2010] Nattiya Kanhabua, Kjetil Nørvåg: Determining Time of Queries for Re- ranking Search Results. ECDL 2010: 261-272 • [Kanhabua et al., SIGIR 2011a] Nattiya Kanhabua, Kjetil Nørvåg: A comparison of time-aware ranking methods. SIGIR 2011: 1257-1258 • [Kanhabua et al., SIGIR 2011b] Nattiya Kanhabua, Kjetil Nørvåg: Time-based query performance predictors. SIGIR 2011: 1181-1182 • [Kanhabua et al., SIGIR 2012] Nattiya Kanhabua, Klaus Berberich, Kjetil Nørvåg: Learning to select a time-aware retrieval model. SIGIR 2012 • [Kanhabua et al., TAIA 2012] Nattiya Kanhabua, Sara Romano, Avaré Stewart: Identifying Relevant Temporal Expressions for Real-World Events. Time-aware Information Access Workshop 2012 52
  • 53. References (cont’) • [Ke et al., CN 2006] Yiping Ke, Lin Deng, Wilfred Ng, Dik Lun Lee: Web dynamics and their ramifications for the development of Web search engines. Computer Networks 50(10): 1430-1447 (2006) • [Kraaij, SIGIR Forum 2005] Wessel Kraaij: Variations on language modeling for information retrieval. SIGIR Forum 39(1): 61 (2005) • [Li et al., CIKM 2003] Xiaoyan Li, W. Bruce Croft: Time-based language models. CIKM 2003: 469-475 • [Nunes et al., WIDM 2007] Sérgio Nunes, Cristina Ribeiro, Gabriel David: Using neighbors to date web documents. WIDM 2007: 129-136 • [Metzler et al., SIGIR 2009] Donald Metzler, Rosie Jones, Fuchun Peng, Ruiqiang Zhang: Improving search relevance for implicitly temporal queries. SIGIR 2009: 700-701 • [Mazeika et al., CIKM 2011] Arturas Mazeika, Tomasz Tylenda, Gerhard Weikum: Entity timelines: visual analytics and named entity evolution. CIKM 2011: 2585-2588 • [Peng et al., ECIR 2010] Jie Peng, Craig Macdonald, Iadh Ounis: Learning to Select a Ranking Function. ECIR 2010: 114-126 • [Risvik et al., CN 2002] Knut Magne Risvik, Rolf Michelsen: Search engines and Web dynamics. Computer Networks 39(3): 289-302 (2002) • [Shokouhi, SIGIR 2011] Milad Shokouhi: Detecting Seasonal Queries by Time-Series Analysis. SIGIR 2011: 1171-1172 • [Strötgen et al., SemEval 2010] Jannik Strötgen, Michael Gertz: Heideltime: High quality rule- based extraction and normalization of temporal expressions. SemEval 2010: 321-324 53
  • 54. References (cont’) • [Strötgen et al., TempWeb 2012] Jannik Strötgen, Omar Alonso, Michael Gertz: Identification of top relevant temporal expressions in documents. Temporal Web Workshop 2012. • [Tahmasebi et al., COLING2012] Nina Tahmasebi, Gerhard Gossen, Nattiya Kanhabua, Helge Holzmann, Thomas Risse: NEER: An Unsupervised Method for Named Entity Evolution Recognition. COLING 2012 • [Verhagen et al., ACL 2005] Marc Verhagen, Inderjeet Mani, Roser Sauri, Jessica Littman, Robert Knippen, Seok Bae Jang, Anna Rumshisky, John Phillips, James Pustejovsky: Automating Temporal Annotation with TARSQI. ACL 2005 • [WebDyn 2010] Web Dynamics course: http://www.mpi- inf.mpg.de/departments/d5/teaching/ss10/dyn/, Max-Planck Institute for Informatics, Saarbrücken, Germany, 2010 • [Zhang et al., EMNLP 2010] Ruiqiang Zhang, Yuki Konda, Anlei Dong, Pranam Kolari, Yi Chang, Zhaohui Zheng: Learning Recurrent Event Queries for Web Search. EMNLP 2010: 1129-1139 54