Learning to Rank User Queries to Detect Search Tasks
Claudio Lucchese1, Franco Maria Nardini1,
Salvatore Orlando2, Gabriele Tolomei3
1 ISTI-CNR, Pisa, Italy
2 Università Ca' Foscari Venezia, Italy
3 Yahoo Labs, London, UK
Introduction
The Evolution of Web Search
An increasing number of user searches are part of complex patterns.
Complex search patterns are often composed of several, multi-term, interleaved queries spread across many sessions.
User information needs are getting harder to understand and satisfy.
Search Task: a cluster of queries with the same latent information need, mined from a real-world search engine log.
Complex Search Patterns: AOL 2006
Queries within short-time sessions are part of different complex tasks.
Each complex task spans across several sessions.
Related Work
Related Work - I
Jones and Klinkner [3]
First high-level analysis of user search tasks.
Hierarchical search: flat query streams can be structured as complex search missions linked to each other; each mission in turn contains simpler search goals.
They design a binary classifier that predicts whether two queries belong to the same goal or not.
Related Work - II
Lucchese et al. [4, 5]
Formally introduce the search task discovery problem.
Graph-based representation of each user session: nodes are queries; edges between query pairs are weighted according to a query similarity measure.
Search tasks are identified by the connected components of each user session graph.
Outperforms other approaches for session boundary detection, such as the Query-Flow Graph [1].
Related Work - III
Wang et al. [6]
Cross-session discovery of search tasks.
Graph-based representation of all queries.
Search tasks as connected components of the graph with the following characteristic: each query of a task can be linked only to one past query of the same task, so tasks are modeled as trees.
SVM model to identify the best tree structure hidden in the query similarity graph.
Search Task Discovery (STD) Framework
Starting from a ground truth of search tasks, the framework works in three steps: (1) learning the query similarity function (QSL), (2) learning the pruning threshold, and (3) finding the connected components (GQC).
Query Similarity Learning (QSL)
Query Similarity Learning (QSL): estimates a target query similarity function σ̂ from a ground truth of manually-labeled search tasks.
Binary classes: same-task (positive) and not-same-task (negative).
Learning to Rank: instead of predicting same-task, we learn a ranking function where same-task queries are ranked highest.
How to build the training sets?
Graph-based Query Clustering (GQC)
Graph-based Query Clustering (GQC): transforms a user search log into a weighted query graph G^u_σ̂:
Queries are nodes.
Edges are labeled using the σ̂ previously learnt by QSL.
Connected components of G^u_σ̂ correspond to user search tasks.
Weak edges introduced by σ̂: an ε-neighborhood technique is used as selective pruning, removing edges whose weight falls below a given ε.
The optimal ε̂ is learnt from the ground truth.
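As a sketch, the GQC step above (prune edges below ε, then take connected components) might look as follows; here `sim` stands in for the learnt σ̂ and `eps` for ε̂, both of which would be supplied by the learning steps:

```python
from itertools import combinations

def gqc(queries, sim, eps):
    """Cluster one user's queries into tasks: connect query pairs whose
    similarity is >= eps, then return the connected components
    (one component = one search task)."""
    parent = {q: q for q in queries}

    def find(q):
        # Union-find with path halving.
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    for qi, qj in combinations(queries, 2):
        if sim(qi, qj) >= eps:          # epsilon-neighborhood pruning:
            parent[find(qi)] = find(qj)  # only strong edges survive

    tasks = {}
    for q in queries:
        tasks.setdefault(find(q), []).append(q)
    return list(tasks.values())
```

With a toy token-Jaccard similarity, "cheap flights rome" and "flights to rome" end up in one task while an unrelated query forms its own singleton component.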
Search Task Discovery Problem
Given a user query log Q^u, a clustering algorithm C that extracts the connected components of the graph, and a quality function γ measuring the quality of a clustering, the Search Task Discovery Problem requires finding the best similarity function and pruning threshold that maximize the average quality of the clusters C(G^u_{σ̂,ε̂}) for all u ∈ U, i.e.,

$$(\hat{\sigma}^*, \hat{\varepsilon}^*) = \operatorname*{argmax}_{\hat{\sigma}, \hat{\varepsilon}} \; \frac{1}{|U|} \sum_{u \in U} \gamma\big(C(G^u_{\hat{\sigma}, \hat{\varepsilon}})\big).$$
Reducing QSL to a Learning to Rank Problem
Query-centric approach: we aim at learning a query similarity function that scores higher those queries that appear in the same task.
For any given q^u_i ∈ T^u_k, we say that q^u_j is relevant to q^u_i if q^u_j ∈ T^u_k, and irrelevant otherwise.
Labels {1, 0} are assigned accordingly when building the training set.
Number of relevance labels: Σ_u |Q^u|².
User-centric approach: we aim at learning a query similarity function that scores higher every pair of objects in the same task.
Given any pair of queries (q^u_i, q^u_j) in the user search log Q^u, we require their similarity to be high iff they belong to the same task.
Here, each (q^u_i, q^u_j) is a single ordered pair with i ≤ j, associated with the tuple for user u.
The binary relation "≤" between queries is given by the order of their issuing times.
Number of relevance labels: Σ_u (|Q^u| choose 2).
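The two reductions differ only in which query pairs receive a relevance label. A minimal sketch of both (the `task_of` mapping from queries to ground-truth task ids is a hypothetical helper; queries are assumed to be listed in issuing order):

```python
from itertools import combinations, product

def query_centric_labels(queries, task_of):
    """One label per ordered pair of queries: |Q_u|^2 labels in total."""
    return [(qi, qj, int(task_of[qi] == task_of[qj]))
            for qi, qj in product(queries, repeat=2)]

def user_centric_labels(queries, task_of):
    """One label per unordered pair (i < j, issuing order preserved):
    binom(|Q_u|, 2) labels in total."""
    return [(qi, qj, int(task_of[qi] == task_of[qj]))
            for qi, qj in combinations(queries, 2)]
```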
Types of Features Used¹
Symmetric global features based on Q^u. Examples: Session Num Queries, Session Time Span, Avg Session Query Len, etc.
Symmetric features extracted from the query pair (q^u_i, q^u_j). Examples: Levenshtein, Jaccard (3-grams), ∆ Time, ∆ Pos, Global Joint Prob (queries), Wikipedia Cosine [5].
Asymmetric features extracted from the query pair (q^u_i, q^u_j). Examples: Is Proper Subset (term-set), Global Conditional Prob (queries).
37 features in total.
¹ See our paper for the complete list of features used.
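A few of the pairwise features listed above can be sketched as follows (illustrative implementations, not necessarily the exact ones used in the paper):

```python
def levenshtein(a, b):
    """Edit distance between two query strings, via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[-1] + 1,          # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def jaccard_3grams(a, b):
    """Symmetric feature: Jaccard similarity over character 3-grams."""
    g = lambda s: {s[i:i + 3] for i in range(len(s) - 2)}
    A, B = g(a), g(b)
    return len(A & B) / len(A | B) if A | B else 0.0

def is_proper_subset(a, b):
    """Asymmetric feature: is a's term set a proper subset of b's?"""
    A, B = set(a.split()), set(b.split())
    return A < B
```

Note that `is_proper_subset` is asymmetric by construction: swapping the two queries can flip its value, which is exactly why such features are listed separately from the symmetric ones.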
Implementing STD
Learning to Rank:
Gradient-Boosted Regression Trees (GBRT), optimizing RMSE.
LambdaMART (λMART), optimizing nDCG.
Binary Classification:
Logistic Regression (LogReg) [3], optimizing logistic loss.
Decision Trees (DT) [5], splitting on information gain.
k-fold cross validation (k = 5): parameters tuned on validation data.
Clustering measure to get ε̂: Jaccard.
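The threshold ε̂ can be tuned by a simple grid search that maximizes the average clustering quality (e.g. Jaccard) over the validation users; a hypothetical sketch, where `cluster` and `quality` are placeholders for the GQC step and the chosen clustering measure:

```python
def learn_eps(users, sim, cluster, quality, grid):
    """Pick the pruning threshold maximizing the average clustering
    quality over validation users.

    users:   list of (queries, ground_truth) pairs
    cluster: callable (queries, sim, eps) -> clustering
    quality: callable (clustering, ground_truth) -> score
    grid:    candidate thresholds to try
    """
    def avg_quality(eps):
        return sum(quality(cluster(q, sim, eps), truth)
                   for q, truth in users) / len(users)
    return max(grid, key=avg_quality)
```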
Experiments
Dataset
Proposed by Hagen et al. [2]: a three-month sample of the AOL query log.
8,840 queries issued by 127 users.
Labeled by two human assessors into 1,378 user search tasks (called missions in the original paper).
We remove stopwords and noisy characters, and transform query strings to lowercase.
We remove the longest and shortest user sessions.
Resulting dataset: 6,381 queries from 125 users.
[Figure: distribution of (a) tasks by number of queries and (b) users by number of tasks.]
Singleton tasks are 41% of the dataset.
Singleton baseline: always answering not-same-task.
Results
Query-centric approach

Table: comparison of L2R techniques and baselines in terms of Rand, F1avg, Jaccard, and F1, averaged across the 5 validation folds. Best results in boldface; (*) marks statistically significant results (α = .05).

(a) Query-centric dataset L′

Method     Rand    F1avg     Jaccard   F1
Singleton  0.738   0.458     0         0
DT         0.898   0.853     0.620     0.714
LogReg     0.919   0.868     0.639     0.737
GBRT       0.915   0.889(*)  0.670     0.763
λMART      0.919   0.879     0.687(*)  0.778(*)
User-centric approach

(b) User-centric dataset L″

Method     Rand      F1avg     Jaccard   F1
Singleton  0.738     0.458     0         0
DT         0.880     0.843     0.604     0.706
LogReg     0.921(*)  0.868     0.639     0.738
GBRT       0.913     0.875(*)  0.682     0.771
λMART      0.914     0.873     0.684(*)  0.778(*)

The most discriminative features are those regarding the relative times and positions of a given pair of queries.
Conclusion
We proposed the Search Task Discovery framework, made up of two modules: QSL and GQC.
QSL learns a query similarity function from a ground truth of manually-labeled search tasks.
GQC models the user queries as a graph.
We propose to employ Learning to Rank techniques (GBRT, λMART) in QSL.
Experiments prove the effectiveness of Learning to Rank techniques in detecting search tasks.
Future Work: we plan to employ STD in a streaming setting to detect search tasks in (pseudo) real time.
Thank you for your attention!
References I
[1] P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, and S. Vigna. The query-flow graph: model and applications. In CIKM'08, pages 609–618. ACM, 2008.
[2] M. Hagen, J. Gomoll, A. Beyer, and B. Stein. From search session detection to search mission detection. In OAIR'13, pages 85–92, 2013.
[3] R. Jones and K. L. Klinkner. Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs. In CIKM'08. ACM, 2008.
[4] C. Lucchese, S. Orlando, R. Perego, F. Silvestri, and G. Tolomei. Identifying task-based sessions in search engine query logs. In WSDM'11, pages 277–286. ACM, 2011.
[5] C. Lucchese, S. Orlando, R. Perego, F. Silvestri, and G. Tolomei. Discovering user tasks in long-term web search engine logs. ACM TOIS, 31(3):1–43, July 2013.
References II
[6] H. Wang, Y. Song, M.-W. Chang, X. He, R. W. White, and W. Chu. Learning to extract cross-session search tasks. In WWW'13, pages 1353–1364. ACM, 2013.
Clustering Metrics
$$\text{Rand} = \frac{tp + tn}{tp + tn + fp + fn}, \qquad \text{Jaccard} = \frac{tp}{tp + fp + fn}, \qquad F_1 = \frac{2 \cdot p \cdot r}{p + r}.$$

$$F_1^{avg} = \sum_j \frac{m_j}{m} \, F_1^{max}(j), \quad \text{where } m_j = |T^u_j| \text{ and } m = |Q^u|.$$
Feature Importance