This document discusses estimating the difficulty of queries for a news prediction retrieval system. It presents 10 predictors that capture the ambiguity of a query using annotation information about entities in top search results. These predictors are used to train a machine learning model to classify queries as either easy or difficult. The combined feature model achieves an accuracy of 92% in classifying queries, demonstrating the ability to estimate query difficulty.
Estimating Query Difficulty for News Prediction Retrieval (poster presentation)
1. Contact info:
Nattiya Kanhabua
L3S Research Center
Appelstrasse 9a,
30167 Hannover, Germany
Email: kanhabua@L3S.de http://www.l3s.de
Estimating Query Difficulty for
News Prediction Retrieval
Nattiya Kanhabua
L3S Research Center
Leibniz Universität, Hannover, Germany
kanhabua@L3S.de
Kjetil Nørvåg
Department of Computer Science
Norwegian University of Science and Technology
Trondheim, Norway
noervaag@idi.ntnu.no
Query Difficulty Estimation
• We perform the first study of estimating the quality of result
predictions for a certain type of query, namely entity queries.
• Queries are labeled into two classes: Easy and Difficult.
• Given q, the Mean Average Precision (MAP) is measured for
different ranking models by considering prediction robustness [2].
• We split queries into two groups using a condition
based on the average and standard deviation of MAP.
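The labeling step can be sketched as follows. This is a minimal illustration, not the poster's procedure: the exact split condition is not reproduced here, so the threshold (mean minus `num_std` standard deviations) and the names `average_precision`, `label_queries`, and `num_std` are all assumptions. `map_by_query` is assumed to hold one MAP score per query, already averaged over the ranking models.

```python
from statistics import mean, pstdev


def average_precision(ranked_relevance):
    """Average precision for one ranked list of 0/1 relevance labels.

    Simplification: normalizes by the number of relevant items found in
    the list, i.e. it assumes every relevant item was retrieved.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def label_queries(map_by_query, num_std=1.0):
    """Label each query Easy or Difficult from its MAP score.

    ASSUMPTION: a query is Difficult when its MAP falls below
    mean - num_std * std. The poster's actual condition may differ.
    """
    scores = list(map_by_query.values())
    threshold = mean(scores) - num_std * pstdev(scores)
    return {
        q: ("Difficult" if s < threshold else "Easy")
        for q, s in map_by_query.items()
    }
```

Any other rule built from the same two statistics (for example, a different multiple of the standard deviation) fits the same function by changing how `threshold` is computed.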
Query Difficulty Predictors
• We employ a machine learning approach trained using the
10 proposed post-retrieval predictors shown in Table 1.
• Our predictors capture the ambiguity of a query (or news article)
using annotation information about entities in top-k predictions.
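As an illustration of an entity-based post-retrieval predictor, the sketch below computes an average entity count over the top-k predictions. The definition is inferred from the name avgEntityPerPredict and may not match Table 1; the function name `avg_entity_per_predict` and the `"entities"` field are hypothetical.

```python
def avg_entity_per_predict(predictions, k):
    """Mean number of distinct annotated entities per prediction in the top k.

    ASSUMPTION: each prediction is a dict with an "entities" list produced
    by the annotation step; the real predictor definition may differ.
    """
    top_k = predictions[:k]
    if not top_k:
        return 0.0
    return sum(len(set(p["entities"])) for p in top_k) / len(top_k)
```

For example, if the top two predictions are annotated with two entities and one entity respectively, the value for k = 2 is 1.5.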
Experiments
• The baseline is the majority class, with an accuracy of 0.79.
• The best single predictor is avgEntityPerPredict for all values of k.
• The combined feature set ALL achieves an accuracy of 0.92.
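For reference, a majority-class baseline always predicts the most frequent label, so its accuracy equals that label's share of the data. A minimal sketch follows; the function names are illustrative and the labels in the usage note are toy data, not the poster's query set.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true one."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

With the toy labels `["Easy", "Easy", "Easy", "Easy", "Difficult"]`, the baseline accuracy is 0.8; a classifier is only useful if its accuracy exceeds that figure.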
Motivation
• People are naturally curious about the future and try to anticipate it [1].
• When reading news, questions like these commonly arise:
- What will happen in the eurozone after the financial crisis?
- How will health care change in the post-genomic society?
- When can renewable energy replace fossil fuels?
• Future information is useful for understanding the temporal
development of news stories, and for planning strategies that
minimize disruptions and risks or maximize new opportunities.
What is News Prediction Retrieval?
• Retrieve predictions related to a news story in news archives and
rank by relevance [3].
• Over 32% of 2.5M documents from Yahoo! News (July’09-July’10)
contain at least one prediction.
References
[1] R. Baeza-Yates. Searching the future. In Proceedings of ACM SIGIR workshop on MF/IR 2005.
[2] D. Carmel and E. Yom-Tov. Estimating the Query Difficulty for Information Retrieval. Morgan & Claypool Publishers, 2010.
[3] N. Kanhabua, R. Blanco, and M. Matthews. Ranking related news predictions. In Proceedings of SIGIR’11, pp. 755-764, 2011.
Fig. 1: Result predictions for an automatically generated query.
System Pipeline
Step 1: Document annotation
• Extract temporal expressions
using time and event recognition
• Normalize them to dates so they
can be anchored on a timeline
• Output: predictions annotated
with named entities and dates
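The normalization idea in Step 1 can be sketched with a small pattern matcher. This is a deliberately simplified stand-in for a real time and event recognizer: it only handles "Month YYYY" and bare four-digit years, and the names `MONTHS`, `PATTERN`, and `extract_dates` are assumptions made for illustration.

```python
import re
from datetime import date

# Month names mapped to their number (january -> 1, ..., december -> 12).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Optional month name followed by a four-digit year in 1900-2099.
PATTERN = re.compile(
    r"\b(?:(" + "|".join(MONTHS) + r")\s+)?((?:19|20)\d{2})\b",
    re.IGNORECASE,
)


def extract_dates(sentence):
    """Return (expression, normalized date) pairs found in a sentence.

    A bare year is anchored to January 1 and a month to its first day,
    so every expression maps to a single point on a timeline.
    """
    results = []
    for match in PATTERN.finditer(sentence):
        month = MONTHS[match.group(1).lower()] if match.group(1) else 1
        results.append((match.group(0), date(int(match.group(2)), month, 1)))
    return results
```

Applied to "Growth may return by March 2015 and debt should fall in 2020.", the function yields the expressions "March 2015" and "2020", anchored to 2015-03-01 and 2020-01-01.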
Step 2: Retrieving predictions
• Automatically generate a query
from a news article being read
• Retrieve predictions that match
the query and rank by relevance
(i.e., a prediction is “relevant” if it
is about the topics of the article)
Fig. 2: News prediction retrieval system
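Step 2 can be illustrated with a toy term-overlap retriever. This is not the ranking model of [3]: query generation by term frequency, scoring by shared terms, the stopword list, and all function names are assumptions chosen to keep the sketch short.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (hypothetical, far from complete).
STOPWORDS = {"a", "an", "and", "as", "by", "in", "of", "the", "to", "will"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def generate_query(article, num_terms=5):
    """Use the most frequent article terms as the query (an assumption)."""
    return [term for term, _ in Counter(tokenize(article)).most_common(num_terms)]


def rank_predictions(query, predictions):
    """Keep predictions sharing at least one query term, best match first."""
    query_terms = set(query)
    scored = [(len(query_terms & set(tokenize(p))), p) for p in predictions]
    matched = [(score, p) for score, p in scored if score > 0]
    # sorted() is stable, so ties keep their original order.
    return [p for _, p in sorted(matched, key=lambda sp: sp[0], reverse=True)]
```

A real system would also draw on the entity and date annotations from Step 1 when matching; the overlap score here only shows the generate, retrieve, and rank flow.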
Table 1: Description of the post-retrieval predictors.
Table 2: Accuracy of query classification.