We estimate that nearly one third of news articles contain references to future events. While this information can prove crucial to understanding news stories and how events will develop for a given topic, there is currently no easy way to access this information. We propose a new task to address the problem of retrieving and ranking sentences that contain mentions to future events, which we call ranking related news predictions. In this paper, we formally define this task and propose a learning to rank approach based on 4 classes of features: term similarity, entity-based similarity, topic similarity, and temporal similarity. Through extensive evaluations using a corpus consisting of 1.8 millions news articles and 6,000 manually judged relevance pairs, we show that our approach is able to retrieve a significant number of relevant predictions related to a given topic.
Uncommon Grace The Autobiography of Isaac Folorunso
Ranking Related News Predictions
1. Ranking Related News Predictions
Ranking Related News Predictions
Nattiya Kanhabua1, Roi Blanco2 and Michael Matthews2
1Norwegian University of Science and Tech., Norway
2Yahoo! Research, Barcelona, Spain
SIGIR’2011, Beijing
2. Ranking Related News Predictions
Outline
Outline
Introduction
Problem Statement
Related Work
Contributions
Task Definition
System Architecture
Models
Approach
Features
Ranking Method
Evaluation
Experiment Setting
Experimental Results
3. Ranking Related News Predictions
Outline
Outline
Introduction
Problem Statement
Related Work
Contributions
Task Definition
System Architecture
Models
Approach
Features
Ranking Method
Evaluation
Experiment Setting
Experimental Results
4. Ranking Related News Predictions
Outline
Outline
Introduction
Problem Statement
Related Work
Contributions
Task Definition
System Architecture
Models
Approach
Features
Ranking Method
Evaluation
Experiment Setting
Experimental Results
5. Ranking Related News Predictions
Outline
Outline
Introduction
Problem Statement
Related Work
Contributions
Task Definition
System Architecture
Models
Approach
Features
Ranking Method
Evaluation
Experiment Setting
Experimental Results
6. Ranking Related News Predictions
Introduction
Problem Statement
Outline
Introduction
Problem Statement
Related Work
Contributions
Task Definition
System Architecture
Models
Approach
Features
Ranking Method
Evaluation
Experiment Setting
Experimental Results
7. Ranking Related News Predictions
Introduction
Problem Statement
Problem statement
People are naturally curious about the future.
◮ How long will a war in the middle east last?
◮ What is the latest health care plan?
◮ What will happen to EU economies in next 5 years?
◮ What will be potential effects of climate changes?
Over 32% of 2.5M documents from Yahoo! News (July 2009 to
July 2010) contain at least one prediction.
A new task called ranking related news predictions.
◮ Retrieve predictions related to a news story in news archives.
◮ Rank them according to their relevance to the news story.
8. Ranking Related News Predictions
Introduction
Problem Statement
Problem statement
People are naturally curious about the future.
◮ How long will a war in the middle east last?
◮ What is the latest health care plan?
◮ What will happen to EU economies in next 5 years?
◮ What will be potential effects of climate changes?
Over 32% of 2.5M documents from Yahoo! News (July 2009 to
July 2010) contain at least one prediction.
A new task called ranking related news predictions.
◮ Retrieve predictions related to a news story in news archives.
◮ Rank them according to their relevance to the news story.
9. Ranking Related News Predictions
Introduction
Problem Statement
Problem statement
People are naturally curious about the future.
◮ How long will a war in the middle east last?
◮ What is the latest health care plan?
◮ What will happen to EU economies in next 5 years?
◮ What will be potential effects of climate changes?
Over 32% of 2.5M documents from Yahoo! News (July 2009 to
July 2010) contain at least one prediction.
A new task called ranking related news predictions.
◮ Retrieve predictions related to a news story in news archives.
◮ Rank them according to their relevance to the news story.
10. Ranking Related News Predictions
Introduction
Problem Statement
Related News Predictions
11. Ranking Related News Predictions
Introduction
Problem Statement
Related News Predictions
12. Ranking Related News Predictions
Introduction
Problem Statement
Related News Predictions
Query = <gas, emission, percent,
european, global, climate>
13. Ranking Related News Predictions
Introduction
Related Work
Outline
Introduction
Problem Statement
Related Work
Contributions
Task Definition
System Architecture
Models
Approach
Features
Ranking Method
Evaluation
Experiment Setting
Experimental Results
14. Ranking Related News Predictions
Introduction
Related Work
Future-related Information Analyzing Tools
Recorded Future
Difference: a user must specify a query in advance using “predefined” entities.
15. Ranking Related News Predictions
Introduction
Related Work
Future-related Information Analyzing Tools
Yahoo’s Time Explorer
Difference: No ranking or performance evaluation is done.
16. Ranking Related News Predictions
Introduction
Related Work
Previous Work on Future Retrieval
R. Baeza-Yates. Searching the future. SIGIR’2005 Workshop
on Mathematical/Formal Methods in IR.
◮ Extract temporal expressions from news articles.
◮ Retrieve future information using a probabilistic model, i.e.,
multiplying term similarity and a time confidence.
◮ Only a small data set and a year granularity are used.
17. Ranking Related News Predictions
Introduction
Related Work
Previous Work on Future Retrieval
A. Jatowt et al. Supporting analysis of future-related
information in news archives and the web. JCDL’2009.
◮ Extract future mentions from news snippets obtained from
search engines.
◮ Summarize and aggregate results using clustering methods.
◮ Not focus on relevance and ranking of future information.
18. Ranking Related News Predictions
Introduction
Contributions
Outline
Introduction
Problem Statement
Related Work
Contributions
Task Definition
System Architecture
Models
Approach
Features
Ranking Method
Evaluation
Experiment Setting
Experimental Results
19. Ranking Related News Predictions
Introduction
Contributions
Contributions
I. Formally define ranking related news predictions.
II. Four classes of features: term similarity, entity-based
similarity, topic similarity and temporal similarity.
III. Extensive evaluation using dataset with over 6000
judgments from the NYT Annotated Corpus.
20. Ranking Related News Predictions
Task Definition
System Architecture
Outline
Introduction
Problem Statement
Related Work
Contributions
Task Definition
System Architecture
Models
Approach
Features
Ranking Method
Evaluation
Experiment Setting
Experimental Results
21. Ranking Related News Predictions
Task Definition
System Architecture
System Architecture
Step 1: Document annotation.
◮ Extract temporal expressions
using time and event recognition.
◮ Normalize them to dates so they
can be anchored on a timeline.
◮ Output: sentences annotated
with named entities and dates,
i.e., predictions.
22. Ranking Related News Predictions
Task Definition
System Architecture
System Architecture
Step 2: Retrieving predictions.
◮ Automatically generate a query
from a news article being read.
◮ Retrieve predictions that match
the query.
◮ Rank predictions by relevance. A
prediction is “relevant” if it is
about the topics of the article.
23. Ranking Related News Predictions
Task Definition
Models
Outline
Introduction
Problem Statement
Related Work
Contributions
Task Definition
System Architecture
Models
Approach
Features
Ranking Method
Evaluation
Experiment Setting
Experimental Results
24. Ranking Related News Predictions
Task Definition
Models
Annotated Document Model
Collection C = {d1, . . . , dn}.
Document d = {{w1, . . . , wn} , time(d)}.
◮ time(d) gives the publication date of d.
Annotated document ˆd is composed of:
◮ Named entities ˆde = {e1, . . . , en}
◮ Temporal expressions ˆdt = {t1, . . . , tm}
◮ Sentences ˆds = {s1, . . . , sz}
25. Ranking Related News Predictions
Task Definition
Models
Annotated Document Model
Collection C = {d1, . . . , dn}.
Document d = {{w1, . . . , wn} , time(d)}.
◮ time(d) gives the publication date of d.
Annotated document ˆd is composed of:
◮ Named entities ˆde = {e1, . . . , en}
◮ Temporal expressions ˆdt = {t1, . . . , tm}
◮ Sentences ˆds = {s1, . . . , sz}
26. Ranking Related News Predictions
Task Definition
Models
Prediction Model
Let dp be the parent document of a prediction p.
p is a sentence containing field/value pairs:
Field Value
ID 1136243_1
PARENT_ID 1136243
TITLE Gore Pledges A Health Plan For Every Child
TEXT Vice President Al Gore proposed today to guarantee access to
affordable health insurance for all children by 2005, expanding
on a program enacted two years ago that he conceded had had
limited success so far.
CONTEXT Mr. Gore acknowledged that the number of Americans without
health coverage had increased steadily since he and President
Clinton took office.
ENTITY Al Gore
FUTURE_DATE 2005
PUB_DATE 1999/09/08
27. Ranking Related News Predictions
Task Definition
Models
Query Model
Query q is extracted from a news article being read dq.
1. Keywords qtext
2. Time constraints qtime
28. Ranking Related News Predictions
Task Definition
Models
Query Keywords
A news article
being read
Query keyword
extraction
Term query
(1) (2) (3)
Entity query
Q Q
E T
Field
A prediction
ID
PARENT_ID
TITLE
TEXT
ENTITY
CONTEXT
FUTURE_DATE
PUB_DATE
Combined query
Q
C
QE = {e1, . . . , em}
E.g., Barack Obama, Iraq, America
29. Ranking Related News Predictions
Task Definition
Models
Query Keywords
A news article
being read
Query keyword
extraction
Term query
(1) (2) (3)
Entity query
Q Q
Combined query
Q
E T C
Field
A prediction
ID
PARENT_ID
TITLE
TEXT
ENTITY
CONTEXT
FUTURE_DATE
PUB_DATE
QE = {w1, . . . , wn}
E.g., troop, war, withdraw
30. Ranking Related News Predictions
Task Definition
Models
Query Keywords
A news article
being read
Query keyword
extraction
Term query
(1) (2) (3)
Entity query
Q Q
Combined query
Q
E T C
Field
A prediction
ID
PARENT_ID
TITLE
TEXT
ENTITY
CONTEXT
FUTURE_DATE
PUB_DATE
QC = {e1, . . . , em} ∪ {w1, . . . , wn}
E.g., Barack Obama, Iraq, America, troop, war, withdraw
31. Ranking Related News Predictions
Task Definition
Models
Query Time
Time constraints qtime
1. only predictions that are future to time(dq) - (time(dq), tmax]
2. only articles published before time(dq) - [tmin, time(dq)]
now futurepast
Query
2016 203320181999 20062002
P P P
32. Ranking Related News Predictions
Approach
Features
Outline
Introduction
Problem Statement
Related Work
Contributions
Task Definition
System Architecture
Models
Approach
Features
Ranking Method
Evaluation
Experiment Setting
Experimental Results
33. Ranking Related News Predictions
Approach
Features
Term Similarity
Capture the term-similarity between q and p.
1. retScore(q,p) Lucene’s TF-IDF scoring function
◮ Problem: keyword matching, short texts
◮ Predictions not containing query terms are not retrieved.
2. bm25f(q,p) field-aware ranking function
◮ Extend a sentence structure by surrounding sentences.
◮ Search CONTEXT in addition to TEXT [Blanco et al. 2010].
34. Ranking Related News Predictions
Approach
Features
Term Similarity
Capture the term-similarity between q and p.
1. retScore(q,p) Lucene’s TF-IDF scoring function
◮ Problem: keyword matching, short texts
◮ Predictions not containing query terms are not retrieved.
2. bm25f(q,p) field-aware ranking function
◮ Extend a sentence structure by surrounding sentences.
◮ Search CONTEXT in addition to TEXT [Blanco et al. 2010].
35. Ranking Related News Predictions
Approach
Features
Term Similarity
Capture the term-similarity between q and p.
1. retScore(q,p) Lucene’s TF-IDF scoring function
◮ Problem: keyword matching, short texts
◮ Predictions not containing query terms are not retrieved.
2. bm25f(q,p) field-aware ranking function
◮ Extend a sentence structure by surrounding sentences.
◮ Search CONTEXT in addition to TEXT [Blanco et al. 2010].
36. Ranking Related News Predictions
Approach
Features
Entity-based Similarity
Measure the similarity between q and p
by exploiting annotated entities in dp
, p,
q.
◮ Only applicable for QE and QC.
◮ Features commonly employed in
entity ranking tasks.
◮ Time distance captures the
relationship of term and time.
ID Feature
1 entitySim(q, p)
2 title(e, dp)
3 titleSim(e, dp)
4 senPos(e, dp)
5 senLen(e, dp)
6 cntSenSubj(e, dp)
7 cntEvent(e, dp)
8 cntFuture(e, dp)
9 cntEventSubj(e, dp)
10 cntFutureSubj(e, dp)
11 timeDistEvent(e, dp)
12 timeDistFuture(e, dp)
13 tagSim(e, dp)
14 isSubj(e, p)
15 timeDist(e, p)
37. Ranking Related News Predictions
Approach
Features
Topic Similarity
Compute the similarity between q and p on a topic level.
◮ Latent Dirichlet allocation [Blei et al. 2003] for modeling topics.
1. Train a topic model
2. Infer topics
3. Compute topic similarity
38. Ranking Related News Predictions
Approach
Features
Topic Similarity
Step 1: Learn a topic model.
◮ Partition DN into sub-collections,
called document snapshot Dtrain,tk
.
◮ For each Dtrain,tk
, randomly select
documents for training a topic model.
◮ Output: topic models at different
time snapshots, e.g., φtk
at tk .
39. Ranking Related News Predictions
Approach
Features
Topic Similarity
Step 2: Infer topics.
◮ Determine topics for q and p using
their contents, called topic inference.
◮ Both q and p are represented by a
probability distribution of topics.
◮ pφ = p(z1), . . . , p(zn), where p(z) is
a probability of a topic z.
40. Ranking Related News Predictions
Approach
Features
Topic Similarity
I. Which model snapshot should be used for inference?
Select a topic model φtk
for inference in 2 ways:
◮ tk = time(dq
)
◮ tk = time(dp
)
II. Which contents should be used for inference?
For a query q, the parent document dq is used. For a
prediction p, the contents can be:
◮ Only text ptxt
◮ Both text ptxt and context pctx
◮ Parent document dp
41. Ranking Related News Predictions
Approach
Features
Topic Similarity
I. Which model snapshot should be used for inference?
Select a topic model φtk
for inference in 2 ways:
◮ tk = time(dq
)
◮ tk = time(dp
)
II. Which contents should be used for inference?
For a query q, the parent document dq is used. For a
prediction p, the contents can be:
◮ Only text ptxt
◮ Both text ptxt and context pctx
◮ Parent document dp
42. Ranking Related News Predictions
Approach
Features
Topic Similarity
I. Which model snapshot should be used for inference?
Select a topic model φtk
for inference in 2 ways:
◮ tk = time(dq
)
◮ tk = time(dp
)
II. Which contents should be used for inference?
For a query q, the parent document dq is used. For a
prediction p, the contents can be:
◮ Only text ptxt
◮ Both text ptxt and context pctx
◮ Parent document dp
43. Ranking Related News Predictions
Approach
Features
Topic Similarity
I. Which model snapshot should be used for inference?
Select a topic model φtk
for inference in 2 ways:
◮ tk = time(dq
)
◮ tk = time(dp
)
II. Which contents should be used for inference?
For a query q, the parent document dq is used. For a
prediction p, the contents can be:
◮ Only text ptxt
◮ Both text ptxt and context pctx
◮ Parent document dp
44. Ranking Related News Predictions
Approach
Features
Topic Similarity
Step 3: Measuring topic similarity.
◮ q and p are represented by topic
distributions.
◮ qφ = p(z1), . . . , p(zn)
◮ pφ = p(z1), . . . , p(zn)
◮ Compute the topic similarity using
cosine similarity.
topicSim(q, p) =
qφ · pφ
||qφ|| · ||pφ||
= z∈Z qφz · pφz
z∈Z q2
φz
· z∈Z p2
φz
45. Ranking Related News Predictions
Approach
Features
Temporal Similarity
Hypothesis I. Predictions that are more recent to the query are
more relevant.
now futurepast
Query
2016 203320181999 20062002
P P P
Time distance
46. Ranking Related News Predictions
Approach
Features
Temporal Similarity
Hypothesis II. Predictions extracted from more recent
documents are more relevant.
now futurepast
Query
2016 203320181999 20062002
P P P
Time distance
◮ Timestamp-based Uncertainty (TSU) [Kanhabua and Nørvåg 2010]
◮ FussySet (FS) [Kalczynski and Chou 2005]
47. Ranking Related News Predictions
Approach
Ranking Method
Outline
Introduction
Problem Statement
Related Work
Contributions
Task Definition
System Architecture
Models
Approach
Features
Ranking Method
Evaluation
Experiment Setting
Experimental Results
48. Ranking Related News Predictions
Approach
Ranking Method
Ranking Method
Learning-to-rank: Given an unseen (q, p), p is ranked using a
model trained over a set of labeled query/prediction pairs.
score(q, p) =
N
i=1
wi × fi
◮ SVMMAP [Yue et al. 2007]
◮ RankSVM [Joachims 2002]
◮ SGD-SVM [Zhang 2004]
◮ PegasosSVM [Shalev-Shwartz et al. 2007]
◮ PA-Perceptron [Crammer et al. 2006]
49. Ranking Related News Predictions
Evaluation
Experiment Setting
Outline
Introduction
Problem Statement
Related Work
Contributions
Task Definition
System Architecture
Models
Approach
Features
Ranking Method
Evaluation
Experiment Setting
Experimental Results
50. Ranking Related News Predictions
Evaluation
Experiment Setting
Document collection
NYT Annotated Corpus 1.8M from 1987 to 2007.
◮ More than 25% contain at least one prediction
Annotation process uses several language processing tools.
◮ OpenNLP for tokenizing, sentence splitting, part-of-speech
tagging, shallow parsing
◮ SuperSense tagger for named entity recognition
◮ TARSQI for extracting temporal expressions
Apache Lucene for indexing and retrieving.
◮ 44,335,519 sentences and 548,491 predictions
◮ 939,455 future dates (avg. future date/prediction is 1.7)
51. Ranking Related News Predictions
Evaluation
Experiment Setting
Relevance judgments
42 future-related topics
POLITICS ENVIRONMENT SPACE
president election global warming Mars
Iraq war energy efficiency Moon
SCIENCE PHYSICS HEALTH
earthquake particle Physics bird flue
tsunami Big Bang influenza
BUSINESS SPORT TECHNOLOGY
subprime Olympics Internet
financial crisis World cup search engine
52. Ranking Related News Predictions
Evaluation
Experiment Setting
Relevance judgments
Human assessors gave a relevance score Grade(q, p, t).
◮ 4 (very relevant), 3 (relevant), 2 (related), 1 (non-relevant), and 0
(incorrect tagged date)
◮ relevant if Grade(q, p, t) ≥ 3 and non-relevant if
1 ≤ Grade(q, p, t) ≤ 2
In total, assessors judged 52 queries.
◮ On average 94 predictions were retrieved per query
◮ 4,888 query/prediction pairs (approximately 6,032 of triples)
Available for download at:
www.idi.ntnu.no/~nattiya/data/sigir2011/futurepredictions.zip
53. Ranking Related News Predictions
Evaluation
Experiment Setting
Parameter setting
BM25F: b = 0.75, k1 = 1.2 [Robertson et al. 1994]
◮ boost(TEXT) = 5.0
◮ boost(CONTEXT) = 1.0
◮ boost(TITLE) = 2.0
LDA: Stanford Topic Modeling Toolbox
◮ randomly select 4% of documents in each year for training
◮ filter 100 most common terms and in less than 15 documents
◮ number of topics Nz is 500
◮ collapsed variational Bayes approximation algorithm
Temporal features:
◮ DecayRate = 0.5, λ = 0.5, µ = 2y
◮ n = 2, m = 2, smin = 4y, smax = 2y
◮ α1 = time(dq) − 4y, α2 = time(dq) + 2y
54. Ranking Related News Predictions
Evaluation
Experimental Results
Outline
Introduction
Problem Statement
Related Work
Contributions
Task Definition
System Architecture
Models
Approach
Features
Ranking Method
Evaluation
Experiment Setting
Experimental Results
55. Ranking Related News Predictions
Evaluation
Experimental Results
Methods for comparison
Baseline: QE , QT , QC
◮ Rank using Lucene’s default ranking function.
Our approach: Re-QE , Re-QT , Re-QC
◮ Re-rank the baseline results using learning-to-rank.
Metrics: P@1, P@3, MRR
◮ Typically, a user is interested in a few top predictions.
56. Ranking Related News Predictions
Evaluation
Experimental Results
Selecting top-m entities and top-n terms
Select m and n with reasonable improvement in a hold-out set.
◮ Using QE to retrieve predictions, choose m = 11.
◮ Observing the performance of QC when m = 11, choose n = 10.
0
0.1
0.2
0.3
0.4
0.5
0.6
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
top-m entities
P10
MAP
0
0.1
0.2
0.3
0.4
0.5
0.6
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
top-n terms
P10
MAP
57. Ranking Related News Predictions
Evaluation
Experimental Results
Compare all other methods against QE
0
0.2
0.4
0.6
0.8
1
P1
QE
QT
QC
Re-QE
Re-QT
Re-QC
0
0.2
0.4
0.6
0.8
1
P3
QE
QT
QC
Re-QE
Re-QT
Re-QC
0
0.2
0.4
0.6
0.8
1
MRR
QE
QT
QC
Re-QE
Re-QT
Re-QC
Results:
◮ QE performs worst among the baselines, while QC is superior to QT .
◮ Re-QC gains the highest effectiveness followed by Re-QT .
◮ Re-ranking approach gains improvement, except Re-QE .
58. Ranking Related News Predictions
Evaluation
Experimental Results
Compare all other methods against QE
0
0.2
0.4
0.6
0.8
1
P1
QE
QT
QC
Re-QE
Re-QT
Re-QC
0
0.2
0.4
0.6
0.8
1
P3
QE
QT
QC
Re-QE
Re-QT
Re-QC
0
0.2
0.4
0.6
0.8
1
MRR
QE
QT
QC
Re-QE
Re-QT
Re-QC
Analysis:
◮ QE not retrieved any relevant result in the judged pool, difficult for re-ranking.
◮ Entity-based features perform well for some topics.
59. Ranking Related News Predictions
Evaluation
Experimental Results
Feature analysis
Top-5 features with highest weights and lowest weights for each query type.
QE QT QC
Feature Wi Feature Wi Feature Wi
tagSim 1.00 bm25f 1.00 LDA1,parent,k 1.00
FS1 0.97 retScore 0.60 retScore 0.99
TSU2 0.88 LDA1,parent,k 0.55 LDA1,parent,all 0.96
LDA1,txt,k 0.87 LDA2,parent,k 0.51 bm25f 0.93
LDA1,txt,all 0.82 LDA1,parent,all 0.49 isSubj 0.87
cntSenSubj 0.01 timeDistEvent -0.03 cntEventSen -0.02
cntEventSubj 0.01 timeDistFuture -0.11 querySim -0.05
isInTitle 0.00 cntEventSen -0.12 cntFutureSen -0.10
cntEventSen 0.00 cntFutureSen -0.12 timeDistFuture -0.14
querySim -0.01 senLen -0.16 senLen -0.18
◮ Topic-based features play an important role in the re-ranking model.
◮ Although relying on terms, retScore and bm25f help to re-rank predictions.
◮ Features in top-5 features with lowest weights are from the entity-based class.
60. Ranking Related News Predictions
Conclusions
Conclusions and Future Work
Outline
Introduction
Problem Statement
Related Work
Contributions
Task Definition
System Architecture
Models
Approach
Features
Ranking Method
Evaluation
Experiment Setting
Experimental Results
61. Ranking Related News Predictions
Conclusions
Conclusions and Future Work
Conclusions and future work
◮ Define the task of ranking related future predictions.
◮ Employ learning-to-rank incorporating 4 feature classes.
◮ Conduct extensive experiments and create an evaluation
dataset with over 6000 relevance judgments.
◮ Future work:
◮ Combining multiple sources (Wikipedia, blogs, home
pages, etc.) of future-related information.
◮ Sentimental analysis for future-related information.
62. Ranking Related News Predictions
Conclusions
Conclusions and Future Work
Acknowledgment: Thank Hugo Zaragoza for his help at the
early stages of this work.
Thank you!