Estimating Query Difficulty for News Prediction Retrieval (poster presentation)


This document discusses estimating the difficulty of queries for a news prediction retrieval system. It presents 10 predictors that capture the ambiguity of a query using annotation information about entities in top search results. These predictors are used to train a machine learning model to classify queries as either easy or difficult. The combined feature model achieves an accuracy of 92% in classifying queries, demonstrating the ability to estimate query difficulty.


Contact info:
Nattiya Kanhabua
L3S Research Center
Appelstrasse 9a,
30167 Hannover, Germany
Email: kanhabua@L3S.de http://www.l3s.de
Estimating Query Difficulty for
News Prediction Retrieval
Nattiya Kanhabua
L3S Research Center
Leibniz Universität, Hannover, Germany
kanhabua@L3S.de
Kjetil Nørvåg
Department of Computer Science
Norwegian University of Science and Technology
Trondheim, Norway
noervaag@idi.ntnu.no
Query Difficulty Estimation
• We perform the first study of estimating the quality of result
predictions for one type of query, namely entity queries.
• Queries are labeled into two classes: Easy and Difficult.
• For a given query q, the Mean Average Precision (MAP) is measured for
different ranking models, taking prediction robustness into account [2].
• We split queries into two groups using the following condition
based on the average and standard deviation of MAP.
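The labeling step can be sketched as follows. Average precision is computed with its standard definition; everything else is an assumption for illustration. The poster's exact split condition is not reproduced in this text, so `label_queries` uses a hypothetical rule (a query is "difficult" if its score falls below the mean minus `c` standard deviations), and the function names, the `c` parameter, and the input layout are all invented here.

```python
from statistics import mean, pstdev


def average_precision(ranked_relevance, num_relevant=None):
    """Standard average precision for one ranked result list.

    ranked_relevance[i] is True when the result at rank i+1 is relevant.
    num_relevant defaults to the number of relevant results in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    total = hits if num_relevant is None else num_relevant
    return precision_sum / total if total else 0.0


def label_queries(score_by_query, c=0.0):
    """Hypothetical split rule (NOT the poster's condition, which is not shown):
    a query is 'difficult' if its score is below mean - c * std."""
    scores = list(score_by_query.values())
    threshold = mean(scores) - c * pstdev(scores)
    return {
        query: "difficult" if score < threshold else "easy"
        for query, score in score_by_query.items()
    }


# Toy usage: per-query scores, e.g. AP averaged over several ranking models.
labels = label_queries({"q1": 0.9, "q2": 0.1, "q3": 0.8})
```

Keeping the threshold as a function of the mean and standard deviation means the split adapts to the score distribution of the query set rather than relying on a fixed cut-off.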
Query Difficulty Predictors
• We employ a machine learning approach trained on the
10 proposed post-retrieval predictors shown in Table 1.
• Our predictors capture the ambiguity of a query (or news article)
using annotation information about entities in top-k predictions.
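As an illustration of a post-retrieval predictor of this kind, the sketch below computes one plausible reading of avgEntityPerPredict, the only predictor named in this text. Table 1 is not reproduced here, so the definition (mean number of distinct annotated entities per prediction in the top-k) is an assumption based on the name alone, and the dictionary layout with an `entities` key is hypothetical.

```python
def avg_entity_per_predict(predictions, k):
    """Mean number of distinct annotated entities per prediction in the top-k.

    predictions: ranked list of dicts with an 'entities' list (assumed layout).
    """
    top_k = predictions[:k]
    if not top_k:
        return 0.0
    return sum(len(set(p["entities"])) for p in top_k) / len(top_k)


# Toy ranked list of retrieved predictions with entity annotations.
ranked = [
    {"entities": ["EU", "ECB"]},
    {"entities": ["EU"]},
    {"entities": []},
]
score_at_2 = avg_entity_per_predict(ranked, k=2)
score_at_3 = avg_entity_per_predict(ranked, k=3)
```

Computing the value at several cut-offs k yields one feature per k for the classifier.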
Experiments
• The baseline is the majority
class, with an accuracy of 0.79.
• The best single predictor
is avgEntityPerPredict for all values of k.
• The combined feature set ALL
achieves an accuracy of 0.92.
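For context on the baseline figure: a majority-class baseline always predicts the most frequent label, so its accuracy equals that label's share of the queries. The sketch below shows the computation on invented labels; the 79/21 split is chosen only to reproduce the reported 0.79, and which class is the majority is not stated in this text.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Illustrative labels only: 79 queries in one class, 21 in the other.
toy_labels = ["A"] * 79 + ["B"] * 21
baseline = majority_baseline_accuracy(toy_labels)
```

A classifier is only informative if its accuracy clearly exceeds this value, which is why the combined-feature result is reported against it.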
Motivation
• People are naturally curious about the future and try to anticipate it [1].
• When reading news, questions like these commonly arise:
- What will happen in the eurozone after the financial crisis?
- How will health care change in the post-genomic society?
- When can renewable energy replace fossil fuels?
• Future information is useful for understanding the temporal
development of news stories and for planning strategies that
minimize disruptions and risks or maximize new opportunities.
What is News Prediction Retrieval?
• Retrieve predictions related to a news story in news archives and
rank by relevance [3].
• Over 32% of 2.5M documents from Yahoo! News (July’09-July’10)
contain at least one prediction.
References
[1] R. Baeza-Yates. Searching the future. In Proceedings of ACM SIGIR workshop on MF/IR 2005.
[2] D. Carmel and E. Yom-Tov. Estimating the Query Difficulty for Information Retrieval. Morgan & Claypool Publishers, 2010.
[3] N. Kanhabua, R. Blanco, and M. Matthews. Ranking related news predictions. In Proceedings of SIGIR’11, pp. 755-764, 2011.
Fig. 1: Result predictions for an automatically generated query.
System Pipeline
Step 1: Document annotation
• Extract temporal expressions
using time and event recognition
• Normalize them to dates so they
can be anchored on a timeline
• Output: predictions annotated
with named entities and dates
Step 2: Retrieving predictions
• Automatically generate a query
from a news article being read
• Retrieve predictions that match
the query and rank by relevance
(i.e., a prediction is “relevant” if it
is about the topics of the article)
Fig. 2: News prediction retrieval system
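A very rough sketch of the annotation idea in Step 1 is given below. It is not the system's method: a plain regular expression for four-digit years stands in for the time and event recognition the poster mentions, and a sentence is treated as a candidate prediction when it mentions a year after the article's publication year. The function names and the January 1 anchoring are assumptions made for illustration.

```python
import re
from datetime import date

# Stand-in for real temporal tagging: four-digit years from 1900 to 2099.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def future_years(sentence, published):
    """Years mentioned in the sentence that lie after the publication year."""
    years = {int(y) for y in YEAR.findall(sentence)}
    return sorted(y for y in years if y > published.year)


def anchor_on_timeline(sentence, published):
    """Normalize each future year to a date (January 1, by assumption)."""
    return [date(y, 1, 1) for y in future_years(sentence, published)]


sentence = "The deficit will fall below 3% by 2013, down from 2009 levels."
anchors = anchor_on_timeline(sentence, published=date(2010, 7, 1))
```

Normalizing expressions to concrete dates is what allows predictions to be placed on a timeline and compared with the time of the article being read.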
Table 1: Description of the post-retrieval predictors.
Table 2: Accuracy of query classification.

More from Nattiya Kanhabua

Search, Exploration and Analytics of Evolving Data

Nattiya Kanhabua

In human memory, forgetting plays a crucial role for focusing on important things and neglecting irrelevant details. In digital memories, the idea of systematic forgetting has found little attention, so far. At first glance, forgetting seems to contradict the purpose of archival and preservation. However, we are currently facing a tremendous growth in volumes of digital content. Thus, it becomes ever more important to focus, while forgetting irrelevant details, redundancies and noise. This holds true for better organizing the information space as well as in preservation management for making and revisiting decisions on what to keep. Therefore, we propose the introduction of the concept of managed forgetting as part of a joint information management and preservation management process in digital memories. Managed forgetting models resource selection as a function of attention and significance dynamics. Based on dynamic, multidimensional information value assessment it identifies information objects, e.g., documents or images of decreasing importance and/or topicality and triggers forgetting actions. Those actions include a variety of options, namely, aggregation and summarization, revised search and ranking behavior, elimination of redundancy, and finally, also deletion. In this paper, we present our vision for managed forgetting, discuss the challenges as well as our first ideas for its introduction, and present a case study for its motivation.

Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...

Nattiya Kanhabua

A microblogging service like Twitter continues to surge in importance as a means of sharing information in social networks. In the medical domain, several works have shown the potential of detecting public health events (i.e., infectious disease outbreaks) using Twitter messages or tweets. Given its real-time nature, Twitter can enhance early outbreak warning for public health authorities in order that a rapid response can take place. Most of previous works on detecting outbreaks in Twitter simply analyze tweets matched disease names and/or locations of interests. However, the effectiveness of such method is limited for two main reasons. First, disease names are highly ambiguous, i.e., referring slangs or non health-related contexts. Second, the characteristics of infectious diseases are highly dynamic in time and place, namely, strongly time-dependent and vary greatly among different regions. In this paper, we propose to analyze the temporal diversity of tweets during the known periods of real-world outbreaks in order to gain insight into a temporary focus on specific events. More precisely, our objective is to understand whether the temporal diversity of tweets can be used as indicators of outbreak events, and to which extent. We employ an efficient algorithm based on sampling to compute the diversity statistics of tweets at particular time. To this end, we conduct experiments by correlating temporal diversity with the estimated event magnitude of 14 real-world outbreak events manually created as ground truth. Our analysis shows that correlation results are diverse among different outbreaks, which can reflect the characteristics (severity and duration) of outbreaks.

Understanding the Diversity of Tweets in the Time of Outbreaks

Nattiya Kanhabua

In this paper, we present an event-based Epidemic Intelligence (EI) system framework leveraging social media data, e.g., Twitter messages (or tweets) for providing public health officials the necessary tools to survey and sift through relevant information, namely, disease outbreak events. There exists three main research challenges in gathering epidemic intelligence from social media streams: 1) dynamic classification to enable message filtering, 2) signal generation producing reliable warnings based on observed term frequency changes in the filtered messages, and 3) providing search and recommendation functionalities to domain experts, for better assessment of the potential outbreak threats associated with the generated signals. We outline possible approaches to solve these important challenges as well as discuss areas where further research is required. The aim of this paper is to provide guidance for similar endeavors, and to give prospective event-based Epidemic Intelligence system builders a more realistic view on the benefits and issues of social media stream analysis.

Why Is It Difficult to Detect Outbreaks in Twitter?

Nattiya Kanhabua

Search result diversification is a common technique for tackling the problem of ambiguous and multi-faceted queries by maximizing query aspects or subtopics in a result list. In some special cases, subtopics associated to such queries can be temporally ambiguous, for instance, the query US Open is more likely to be targeting the tennis open in September, and the golf tournament in June. More precisely, users' search intent can be identified by the popularity of a subtopic with respect to the time where the query is issued. In this paper, we study search result diversification for time-sensitive queries, where the temporal dynamics of query subtopics are explicitly determined and modeled into result diversification. Unlike aforementioned work that, in general, considered only static subtopics, we leverage dynamic subtopics by analyzing two data sources (i.e., query logs and a document collection). By using these data sources, it provides the insights from different perspectives of how query subtopics change over time. Moreover, we propose novel time-aware diversification methods that leverage the identified dynamic subtopics. A key idea is to re-rank search results based on the freshness and popularity of subtopics. To this end, our experimental results show that the proposed methods can significantly improve the diversity and relevance effectiveness for time-sensitive queries in comparison with state-of-the-art methods.

Leveraging Dynamic Query Subtopics for Time-aware Search Result Diversification

Nattiya Kanhabua

Wikipedia has become a widely accepted reference point for information of all kinds; real-world events (e.g., natural disasters, man-made incidents, and political events) as well as specific entities like politicians, celebrities, and entities involved in an event. Due to its open construction and negotiation, Wikipedia is an important new cultural and societal phenomenon, and the content of Wikipedia articles is a valuable source for different applications. For instance, the edit history and view logs of Wikipedia can be leveraged for detecting an event and its associated entities. In this study, we analyze temporal anchor texts extracted from the edit history. We propose a model for Wikipedia and anchor texts viewed as a temporal resource and a probabilistic method for ranking temporal anchor texts. Our preliminary results show that relevant anchor texts composed of evolving information (e.g., the changes of names and semantic roles, as well as evolving context) that reflects societal trends and perceptions, thus being candidates for capturing entity evolution.

On the Value of Temporal Anchor Texts in Wikipedia

Nattiya Kanhabua

We estimate that nearly one third of news articles contain references to future events. While this information can prove crucial to understanding news stories and how events will develop for a given topic, there is currently no easy way to access this information. We propose a new task to address the problem of retrieving and ranking sentences that contain mentions of future events, which we call ranking related news predictions. In this paper, we formally define this task and propose a learning-to-rank approach based on 4 classes of features: term similarity, entity-based similarity, topic similarity, and temporal similarity. Through extensive evaluations using a corpus consisting of 1.8 million news articles and 6,000 manually judged relevance pairs, we show that our approach is able to retrieve a significant number of relevant predictions related to a given topic.

Ranking Related News Predictions

Nattiya Kanhabua

Wikipedia is a free multilingual online encyclopedia covering a wide range of general and specific knowledge. Its content is continuously kept up to date and extended by a supporting community. In many cases, real-world events influence the collaborative editing of Wikipedia articles of the involved or affected entities. In this paper, we present Wikipedia Event Reporter, a web-based system that supports the entity-centric, temporal analytics of event-related information in Wikipedia by analyzing the whole history of article updates. For a given entity, the system first identifies peaks of update activities for the entity using burst detection and automatically extracts event-related updates using a machine-learning approach. Further, the system determines distinct events through the clustering of updates by exploiting different types of information such as update time, textual similarity, and the position of the updates within an article. Finally, the system generates a meaningful temporal summarization of event-related updates and automatically annotates the identified events in a timeline.

Temporal summarization of event related updates

Nattiya Kanhabua

In this talk, we will give a survey of current approaches to searching the temporal web. In such a web collection, the contents are created and/or edited over time; examples are web archives, news archives, blogs, micro-blogs, personal emails and enterprise documents. Unfortunately, traditional IR approaches based only on term matching can give unsatisfactory results when searching the temporal web. The reasons for this are manifold: 1) the collection is strongly time-dependent, i.e., with multiple versions of documents, 2) the contents of documents are about events that happened at particular time periods, 3) the meanings of semantic annotations can change over time, and 4) a query representing an information need can be time-sensitive, a so-called temporal query. Several major challenges in searching the temporal web will be discussed, namely, 1) How to understand temporal search intent represented by time-sensitive queries? 2) How to handle the temporal dynamics of queries and documents? and 3) How to explicitly model temporal information in retrieval and ranking models? We will present current approaches to these problems and outline directions for future research.

Temporal Web Dynamics: Implications from Search Perspective

Nattiya Kanhabua

Temporal Web Dynamics and Implications for Information Retrieval

Nattiya Kanhabua

Humans are very effective in remembering by abstraction, pattern exploitation, or contextualization. On the other hand, humans are also capable of forgetting irrelevant details, which plays an important role in the human brain by helping us to focus on relevant things instead of drowning in details by remembering everything. The research question that we address in this paper is: Can we learn from human remembering and forgetting in order to develop more advanced preservation technology? In particular, we aim at studying how a managed or controlled form of forgetting can play a role in digital preservation, including personal and organizational archives as well as collective memories. Our research goal is twofold: 1) to establish effective preservation for more concise and accessible digital memories, and 2) to enable the easier and wider adoption of preservation technology. The concept of managed forgetting is discussed in more detail in the research work of the European project ForgetIT, which investigates the proposed concept by means of an integrated information and preservation management approach.

Preservation and Forgetting: Friends or Foes?

Nattiya Kanhabua

With the growing volumes of and reliance on digital content, there is a clear need for better information access solutions that keep relevant information accessible and usable in the long term. Inspired by the role of forgetting in the human brain, we envision a concept of managed forgetting for systematically dealing with information that progressively loses importance as well as with redundant information. Although inspired by human memory, managed forgetting is meant to complement rather than copy human remembering and forgetting. It can be regarded as functions of attention and significance dynamics relying on multi-faceted information assessment. This talk introduces our vision for managed forgetting on the conceptual level as part of an Integrated Cognitive Framework for Time-aware Information Access. We discuss relevant research and application aspects for managed forgetting. We also present our first results and point out issues where further research is required.

Concise Preservation by Combining Managed Forgetting and Contextualized Remem...

Nattiya Kanhabua

In this talk, we present an event-based Epidemic Intelligence (EI) system framework leveraging social media data, e.g., Twitter messages (or tweets), to provide public health officials with the necessary tools to survey and sift through relevant information, namely, disease outbreak events. There exist three main research challenges in gathering epidemic intelligence from social media streams: 1) dynamic classification to enable message filtering, 2) signal generation producing reliable warnings based on observed term frequency changes in the filtered messages, and 3) providing search and recommendation functionalities to domain experts, for better assessment of the potential outbreak threats associated with the generated signals. We outline possible approaches to solve these important challenges as well as discuss areas where further research is required. The objective is to provide guidance for similar endeavors, and to give prospective event-based Epidemic Intelligence system builders a more realistic view on the benefits and issues of social media stream analysis.

Can Twitter & Co. Save Lives?

Nattiya Kanhabua

This talk gives a survey of current approaches to searching the temporal web. In such a web collection, the contents are created and/or edited over time; examples are web archives, news archives, blogs, micro-blogs, personal emails and enterprise documents. Unfortunately, traditional IR approaches based only on term matching can give unsatisfactory results when searching the temporal web. The reasons for this are manifold: 1) the collection is strongly time-dependent, i.e., with multiple versions of documents, 2) the contents of documents are about events that happened at particular time periods, 3) the meanings of semantic annotations can change over time, and 4) a query representing an information need can be time-sensitive, a so-called temporal query. Several major challenges in searching the temporal web will be discussed, namely, 1) How to understand temporal search intent represented by time-sensitive queries? 2) How to handle the temporal dynamics of queries and documents? and 3) How to explicitly model temporal information in retrieval and ranking models? We will present current approaches to these problems and outline directions for future research.

Searching the Temporal Web: Challenges and Current Approaches

Nattiya Kanhabua

Taking the temporal dimension into account in searching, i.e., using time of content creation as part of the search condition, is now gaining increasing interest. However, in the case of web search and web warehousing, the timestamps (time of creation or creation of contents) of web pages and documents found on the web are in general not known or cannot be trusted, and must be determined otherwise. In this paper, we describe approaches that enhance and increase the quality of existing techniques for determining timestamps based on a temporal language model. Through a number of experiments on temporal document collections, we show how our new methods improve the accuracy of timestamping compared to the previous models.

Improving Temporal Language Models For Determining Time of Non-Timestamped Do...

Nattiya Kanhabua

In the text retrieval community, many researchers have demonstrated good search quality on a current snapshot of the Web. However, only a few have demonstrated good search quality in a long-term archival domain, where documents are preserved for a long time, i.e., ten years or more. In such a domain, a search application is useful not only for archivists or historians, but also in the context of national library and enterprise search (searching document repositories, emails, etc.). In the rest of this paper, we will explain three problems of searching document archives and propose possible approaches to solve these problems. Our main research question is: How to improve the quality of search in a document archive using temporal information?

Exploiting temporal information in retrieval of archived documents (doctoral ...

Nattiya Kanhabua

Recent work on analyzing query logs shows that a significant fraction of queries are temporal, i.e., relevancy is dependent on time, and temporal queries play an important role in many domains, e.g., digital libraries and document archives. Temporal queries can be divided into two types: 1) those with temporal criteria explicitly provided by users, and 2) those with no temporal criteria provided. In this paper, we deal with the latter type of queries, i.e., queries that comprise only keywords and whose relevant documents are associated with particular time periods not given by the queries. We propose a number of methods to determine the time of queries using temporal language models. After that, we show how to increase the retrieval effectiveness by using the determined time of queries to re-rank the search results. Through extensive experiments, we show that our proposed approaches improve retrieval effectiveness.

Determining Time of Queries for Re-ranking Search Results

Nattiya Kanhabua

Supporting Exploration and Serendipity in Information Retrieval

Nattiya Kanhabua

We address major challenges in searching temporal document collections. In such collections, documents are created and/or edited over time. Examples of temporal document collections are web archives, news archives, blogs, personal emails and enterprise documents. Unfortunately, traditional IR approaches based only on term matching can give unsatisfactory results when searching temporal document collections. The reason for this is twofold: the contents of documents are strongly time-dependent, i.e., documents are about events that happened at particular time periods, and a query representing an information need can be time-dependent as well, i.e., a temporal query. Our contributions are different time-aware approaches within three topics in IR: content analysis, query analysis, and retrieval and ranking models. In particular, we aim at improving the retrieval effectiveness by 1) analyzing the contents of temporal document collections, 2) performing an analysis of temporal queries, and 3) explicitly modeling the time dimension into retrieval and ranking. Leveraging the time dimension in ranking can improve the retrieval effectiveness if information about the creation or publication time of documents is available. We analyze the contents of documents in order to determine the time of non-timestamped documents using temporal language models. We subsequently employ the temporal language models for determining the time of implicit temporal queries, and the determined time is used for re-ranking search results in order to improve the retrieval effectiveness. We study the effect of terminology changes over time and propose an approach to handling terminology changes using time-based synonyms. In addition, we propose different methods for predicting the effectiveness of temporal queries, so that a particular query enhancement technique can be performed to improve the overall performance.
When the time dimension is incorporated into ranking, documents will be ranked according to both textual and temporal similarity. In this case, time uncertainty should also be taken into account. Thus, we propose a ranking model that considers the time uncertainty, and improve ranking by combining multiple features using learning-to-rank techniques. Through extensive evaluation, we show that our proposed time-aware approaches outperform traditional retrieval methods and improve the retrieval effectiveness in searching temporal document collections.

Time-aware Approaches to Information Retrieval

Nattiya Kanhabua

Retrieval effectiveness of temporal queries can be improved by taking into account the time dimension. Existing temporal ranking models follow one of two main approaches: 1) a mixture model linearly combining textual similarity and temporal similarity, and 2) a probabilistic model generating a query from the textual and temporal parts of a document independently. In this paper, we propose a novel time-aware ranking model based on learning-to-rank techniques. We employ two classes of features for learning a ranking model, entity-based and temporal features, which are derived from annotation data. Entity-based features are aimed at capturing the semantic similarity between a query and a document, whereas temporal features measure the temporal similarity. Through extensive experiments, we show that our ranking model significantly improves the retrieval effectiveness over existing time-aware ranking models.

Learning to Rank Search Results for Time-Sensitive Queries (poster presentation)

Nattiya Kanhabua

Estimating Query Difficulty for News Prediction Retrieval (poster presentation)

Contact info:
Nattiya Kanhabua
L3S Research Center
Appelstrasse 9a,
30167 Hannover, Germany
Email: kanhabua@L3S.de http://www.l3s.de

Estimating Query Difficulty for News Prediction Retrieval
Nattiya Kanhabua
L3S Research Center
Leibniz Universität, Hannover, Germany
kanhabua@L3S.de
Kjetil Nørvåg
Department of Computer Science
Norwegian University of Science and Technology
Trondheim, Norway
noervaag@idi.ntnu.no

Query Difficulty Estimation
• We perform the first study of estimating the quality of result predictions for a certain type of query, namely, entity queries.
• Queries are labeled into two classes: Easy and Difficult.
• Given a query q, the Mean Average Precision (MAP) is measured for different ranking models by considering prediction robustness [2].
• We split queries into two groups using the following condition based on the average and standard deviation of MAP.

Query Difficulty Predictors
• We employ a machine learning approach trained using the 10 proposed post-retrieval predictors shown in Table 1.
• Our predictors capture the ambiguity of a query (or news article) using annotation information about entities in top-k predictions.

Experiments
• The baseline is the majority class, with an accuracy of 0.79.
• The best single predictor is avgEntityPerPredict for all values of k.
• The combined feature set ALL achieves an accuracy of 0.92.

Motivation
• People are naturally curious about the future and try to anticipate it [1].
• When reading news, questions like these commonly arise:
- What will happen in the eurozone after the financial crisis?
- How will health care change in the post-genomic society?
- When can renewable energy replace fossil fuels?
• Future information is useful for understanding the temporal development of news stories, and for planning strategies to minimize disruptions and risks or maximize new opportunities.

What is News Prediction Retrieval?
• Retrieve predictions related to a news story in news archives and rank by relevance [3].
• Over 32% of 2.5M documents from Yahoo! News (July’09-July’10) contain at least one prediction.

References
[1] R. Baeza-Yates. Searching the future. In Proceedings of ACM SIGIR workshop on MF/IR 2005.
[2] D. Carmel and E. Yom-Tov. Estimating the Query Difficulty for Information Retrieval. Morgan & Claypool Publishers, 2010.
[3] N. Kanhabua, R. Blanco, and M. Matthews. Ranking related news predictions. In Proceedings of SIGIR’11, pp. 755-764, 2011.

Fig. 1: Result predictions of a query automatically generated.

System Pipeline
Step 1: Document annotation
• Extract temporal expressions using time and event recognition
• Normalize them to dates so they can be anchored on a timeline
• Output: predictions annotated with named entities and dates
Step 2: Retrieving predictions
• Automatically generate a query from a news article being read
• Retrieve predictions that match the query and rank by relevance (i.e., a prediction is “relevant” if it is about the topics of the article)

Fig. 2: News prediction retrieval system
Table 1: Description of the post-retrieval predictors.
Table 2: Accuracy of query classification.
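To make the Experiments bullets more concrete, here is a minimal Python sketch of two quantities they mention. It is an illustration only, not the authors' implementation. The function avg_entities_per_prediction is a guess at what a predictor named avgEntityPerPredict might compute (the mean number of annotated entities per prediction in a top-k list); the actual definition is in the poster's Table 1, which is not reproduced here. The function majority_baseline_accuracy shows the standard meaning of a majority-class baseline. All data, entity names, and the class split in the usage lines are hypothetical.

```python
from collections import Counter
from typing import Sequence


def avg_entities_per_prediction(top_k: Sequence[Sequence[str]]) -> float:
    """Mean number of annotated entities per prediction in a top-k list.

    ASSUMPTION: this is only a plausible reading of the predictor name
    avgEntityPerPredict; the poster's Table 1 holds the real definition.
    Each element of top_k is the list of entities annotated in one prediction.
    """
    if not top_k:
        return 0.0
    return sum(len(entities) for entities in top_k) / len(top_k)


def majority_baseline_accuracy(labels: Sequence[str]) -> float:
    """Accuracy obtained by always predicting the most frequent class."""
    if not labels:
        raise ValueError("labels must be non-empty")
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)


# Hypothetical top-3 predictions, each with its annotated entities.
top_k = [["Eurozone", "ECB"], ["Greece"], ["ECB", "IMF", "Greece"]]
print(avg_entities_per_prediction(top_k))  # 2.0

# Hypothetical label distribution chosen so that the baseline equals 0.79;
# the poster does not say which class is the majority.
labels = ["Easy"] * 79 + ["Difficult"] * 21
print(majority_baseline_accuracy(labels))  # 0.79
```

A classifier trained on such features is only useful if it beats the majority baseline, which is why the poster reports 0.92 for the combined features against 0.79.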
