Learning to Rank User Queries to Detect Search Tasks
Claudio Lucchese (1), Franco Maria Nardini (1), Salvatore Orlando (2), Gabriele Tolomei (3)
(1) ISTI-CNR, Pisa, Italy
(2) Università Ca' Foscari Venezia, Italy
(3) Yahoo Labs, London, UK
Introduction
The Evolution of Web Search
An increasing number of user searches are part of complex patterns.
Complex search patterns are often composed of several multi-term, interleaved queries spread across many sessions.
User information needs are getting harder to understand and satisfy.
Search task: a cluster of queries sharing the same latent information need, extracted from a real-world search engine log.
Complex Search Patterns: AOL 2006
Queries within short-time sessions are part of different complex tasks.
Each complex task spans several sessions.
Related Work
Related Work - I
Jones and Klinkner [3]
First high-level analysis of user search tasks.
Hierarchical search:
Flat query streams can be structured as complex search missions linked to each other.
Each mission in turn contains simpler search goals.
They design a binary classifier that predicts whether two queries belong to the same goal.
Related Work - II
Lucchese et al. [4, 5]
Formally introduce the search task discovery problem.
Graph-based representation of each user session:
Nodes are queries.
Edges between query pairs are weighted according to a query
similarity measure.
Search tasks are identified by connected components of
each user session graph.
Outperforms other approaches for session boundary detection
like the Query-Flow Graph [1].
Related Work - III
Wang et al. [6]
Cross-session discovery of search tasks.
Graph-based representation of all queries.
Search tasks as connected components of the graph having
the following characteristics:
Each query of a task can be linked only to one past query of
the same task.
Tasks are therefore modeled as trees.
SVM model to identify the best tree structure hidden by the
query similarity graph.
Search Task Discovery (STD) Framework
Search Task Discovery Framework
Figure: STD framework. A ground truth of search tasks feeds QSL (1. learning query similarity), followed by GQC (2. learning the pruning threshold; 3. finding connected components).
Query Similarity Learning (QSL)
Query Similarity Learning
Query Similarity Learning (QSL): estimates a target query similarity function $\hat{\sigma}$ from a ground truth of manually labeled search tasks.
Binary classes: same-task (positive) and not-same-task (negative).
Learning to Rank: instead of predicting same-task, we learn a ranking function in which same-task queries are ranked highest.
How to build the training sets?
Graph-based Query Clustering (GQC)
Graph-based Query Clustering
Graph-based Query Clustering (GQC): transforms a user search log into a weighted query graph $G^u_{\hat{\sigma}}$:
Queries are nodes.
Edges are labeled using the $\hat{\sigma}$ previously learnt by QSL.
Connected components of $G^u_{\hat{\sigma}}$: user search tasks.
Weak edges introduced by $\hat{\sigma}$:
$\epsilon$-neighborhood technique as selective pruning: edges are removed if their weight is below a given $\epsilon$.
The optimal $\hat{\epsilon}$ is learnt from the ground truth.
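The clustering step can be illustrated with a minimal sketch. It assumes the learnt similarity is available as a plain Python callable; `find_tasks`, `toy_sim`, and the sample log are hypothetical names and data introduced here, and the term-overlap similarity is only a stand-in for the function learnt by QSL.

```python
from collections import defaultdict
from itertools import combinations


def find_tasks(queries, sim, eps):
    """Cluster queries into tasks: connected components of the pruned graph."""
    parent = list(range(len(queries)))  # union-find forest, one node per query

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(queries)), 2):
        # Keep the edge only if its weight reaches the pruning threshold.
        if sim(queries[i], queries[j]) >= eps:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, q in enumerate(queries):
        groups[find(i)].append(q)
    return sorted(groups.values())


def toy_sim(a, b):
    """Hypothetical stand-in for the learnt similarity: Jaccard on terms."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)


log = ["cheap flights rome", "flights rome", "rome hotels", "python list sort"]
tasks = find_tasks(log, toy_sim, eps=0.3)
```

Raising `eps` prunes more edges and splits tasks apart; lowering it merges them, which is why the threshold has to be tuned against labeled data.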
Search Task Discovery Problem
Given a user query log $Q^u$, a clustering algorithm $C$ that extracts the connected components of the graph, and a quality function $\gamma$ measuring the quality of a clustering, the Search Task Discovery Problem requires finding the similarity function $\hat{\sigma}$ and pruning threshold $\hat{\epsilon}$ that maximize the average quality of the clusters $C(G^u_{\hat{\sigma},\hat{\epsilon}})$ over all $u \in U$, i.e.,
$$(\hat{\sigma}^*, \hat{\epsilon}^*) = \operatorname*{argmax}_{\hat{\sigma},\hat{\epsilon}} \frac{1}{|U|} \sum_{u \in U} \gamma\big(C(G^u_{\hat{\sigma},\hat{\epsilon}})\big).$$
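As a rough illustration of this objective, the sketch below keeps the similarity fixed and searches only over candidate thresholds, picking the one with the highest quality averaged over users. Everything here is an assumption made for the example: the edge-dictionary graph encoding, the helper names, the candidate grid, and the use of pairwise Jaccard as the quality function.

```python
from itertools import combinations


def components(n, edges, eps):
    """Connected components over n nodes, keeping edges with weight >= eps."""
    label = list(range(n))
    for (i, j), w in edges.items():
        if w >= eps:
            old, new = label[i], label[j]
            label = [new if x == old else x for x in label]
    return label  # label[i] identifies the component of node i


def pair_jaccard(pred, truth):
    """Pairwise Jaccard: tp / (tp + fp + fn) over all node pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred, same_true = pred[i] == pred[j], truth[i] == truth[j]
        tp += same_pred and same_true
        fp += same_pred and not same_true
        fn += same_true and not same_pred
    return tp / (tp + fp + fn) if tp + fp + fn else 0.0


def select_threshold(users, candidates):
    """Pick the eps maximizing quality averaged over users."""
    def avg_quality(eps):
        scores = [pair_jaccard(components(len(t), e, eps), t) for e, t in users]
        return sum(scores) / len(scores)
    return max(candidates, key=avg_quality)


# Hypothetical data: per user, weighted edges and ground-truth task ids.
users = [
    ({(0, 1): 0.9, (1, 2): 0.4, (0, 2): 0.2}, [0, 0, 1]),
    ({(0, 1): 0.7, (2, 3): 0.8, (1, 2): 0.3}, [0, 0, 1, 1]),
]
best_eps = select_threshold(users, candidates=[0.1, 0.5, 0.95])
```

In a full pipeline the same loop would sit inside the cross-validation that also fits the similarity function, so that both arguments of the argmax are chosen on held-out data.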
Reducing QSL to a Learning to Rank Problem
Query-centric approach: we aim at learning a query similarity function that scores higher those queries that appear in the same task.
For any given $q^u_i \in T^u_k$, we say that $q^u_j$ is relevant to $q^u_i$ if $q^u_j \in T^u_k$, and irrelevant otherwise.
Labels {1, 0} are assigned accordingly when building the training set.
Number of relevance labels: $\sum_u |Q^u|^2$.
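A minimal sketch of how such a query-centric training set could be assembled for one user follows. The function name and the toy task assignment are hypothetical; including each query's pairing with itself is an assumption made so that the label count matches $|Q^u|^2$.

```python
def query_centric_labels(task_of):
    """Build (anchor, candidate, label) triples for one user's log.

    task_of[i] is the ground-truth task id of the i-th query. Each query acts
    as an anchor and every query in the log is labeled against it, so one user
    with n queries yields n * n labels.
    """
    n = len(task_of)
    return [(i, j, int(task_of[i] == task_of[j]))
            for i in range(n) for j in range(n)]


task_of = [0, 0, 1]          # hypothetical log: q0, q1 in one task, q2 alone
triples = query_centric_labels(task_of)
```

Each anchor index then plays the role of a "query" in a learning-to-rank toolkit, with the candidates as the items to be ranked.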
Reducing QSL to a Learning to Rank Problem
User-centric approach: we aim at learning a query similarity function that scores higher every pair of queries in the same task.
Given any pair of queries $(q^u_i, q^u_j)$ in the user search log $Q^u$, we require their similarity to be high iff they belong to the same task.
Here, each $(q^u_i, q^u_j)$ with $i \le j$ is a single ordered pair associated with the tuple for user $u$.
The binary relation "≤" between queries is given by the order of their issuing times.
Number of relevance labels: $\sum_u \binom{|Q^u|}{2}$.
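The user-centric construction can be sketched in the same spirit. The function name and sample input are hypothetical, and the sketch emits only pairs of distinct queries (strict i < j), an assumption chosen so that the number of labels equals the binomial count given above.

```python
from itertools import combinations


def user_centric_labels(task_of):
    """Build (i, j, label) triples, one per unordered pair of distinct queries.

    task_of[i] is the ground-truth task id of the i-th query, with queries
    indexed in order of issuing time, so every emitted pair has i < j.
    """
    return [(i, j, int(task_of[i] == task_of[j]))
            for i, j in combinations(range(len(task_of)), 2)]


pairs = user_centric_labels([0, 0, 1, 0])
```

Compared with the query-centric construction, all pairs of a user form a single ranked list, and each pair appears once rather than twice.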
Types of Features Used
Symmetric global features based on $Q^u$. Examples: Session Num Queries, Session Time Span, Avg Session Query Len, etc.
Symmetric features extracted from the query pair $(q^u_i, q^u_j)$. Examples: Levenshtein, Jaccard (3-grams), ∆ Time, ∆ Pos, Global Joint Prob (queries), Wikipedia Cosine [5].
Asymmetric features extracted from the query pair $(q^u_i, q^u_j)$. Examples: Is Proper Subset (term-set), Global Conditional Prob (queries).
37 features in total.
See our paper for the complete list of features used.
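To make the symmetric/asymmetric distinction concrete, here is a small sketch of three of the pair features named above. The function names are illustrative, and treating "3-grams" as character trigrams is an assumption; the exact definitions are in the paper.

```python
def char_ngrams(s, n=3):
    """Set of character n-grams of s (the whole string if shorter than n)."""
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}


def jaccard_3grams(a, b):
    """Symmetric: Jaccard coefficient between character 3-gram sets."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)


def levenshtein(a, b):
    """Symmetric: edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def is_proper_subset(a, b):
    """Asymmetric: True if the terms of a are a proper subset of b's terms."""
    return set(a.split()) < set(b.split())
```

Swapping the arguments leaves the first two features unchanged, whereas `is_proper_subset("rome hotels", "cheap rome hotels")` holds and the reverse does not.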
Implementing STD
Learning to Rank
Gradient-Boosted Regression Trees (GBRT): RMSE.
LambdaMART (λMART): nDCG.
Binary Classification
Logistic Regression (LogReg) [3]: logistic loss.
Decision Trees (DT) [5]: information gain.
k-fold cross validation (k = 5): parameters tuned on validation data.
Clustering measure used to select $\hat{\epsilon}$: Jaccard.
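For readers unfamiliar with nDCG, the quantity paired with λMART above, the following sketch computes it for a ranked list of binary relevance labels. It uses one common formulation (gain 2^label - 1, log2 discount); whether the authors used this exact variant or a cutoff is not stated in the slides, so treat it as an assumption.

```python
import math


def dcg(labels):
    """Discounted cumulative gain with gain 2^label - 1 and log2 discount."""
    return sum((2 ** label - 1) / math.log2(rank + 1)
               for rank, label in enumerate(labels, start=1))


def ndcg(labels_in_ranked_order):
    """nDCG of a ranking: DCG divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(labels_in_ranked_order, reverse=True))
    return dcg(labels_in_ranked_order) / ideal if ideal > 0 else 0.0
```

A ranking that places all same-task queries first scores 1.0, and the score drops as same-task queries slip down the list.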
Experiments
Dataset
Proposed by Hagen et al. [2].
Three-month sample of the AOL query log.
8,840 queries issued by 127 users.
Labeled by two human assessors into 1,378 user search tasks (called missions in the original paper).
We remove stopwords and noisy characters, and transform query strings to lowercase.
We remove the longest and shortest user sessions.
Resulting dataset: 6,381 queries from 125 users.
Figure 2: Properties of the dataset. (a) Ratio of tasks by number of queries. A further panel plots the ratio of users.
Singleton tasks are 41% of the dataset.
Singleton baseline: always answers not-same-task, so every query forms its own task.
Results
Query-centric approach
Table 5: Comparison of L2R techniques and baselines in terms of Rand, F1avg, Jaccard, and F1, averaged across 5 validation folds. A (*) marks results that are statistically significant ($\alpha$ = .05).
(a) Query-centric dataset $L'$
Method     Rand   F1avg     Jaccard   F1
Singleton  0.738  0.458     0         0
DT         0.898  0.853     0.620     0.714
LogReg     0.919  0.868     0.639     0.737
GBRT       0.915  0.889(*)  0.670     0.763
λMART      0.919  0.879     0.687(*)  0.778(*)
User-centric approach
(b) User-centric dataset $L''$
Method     Rand      F1avg     Jaccard   F1
Singleton  0.738     0.458     0         0
DT         0.880     0.843     0.604     0.706
LogReg     0.921(*)  0.868     0.639     0.738
GBRT       0.913     0.875(*)  0.682     0.771
λMART      0.914     0.873     0.684(*)  0.778(*)
... discriminative features are those regarding the relative times and positions of a given pair of queries.
Conclusion
We proposed the Search Task Discovery (STD) framework, made up of two modules: QSL and GQC.
QSL learns a query similarity function from a ground truth of manually labeled search tasks.
GQC models the user queries as a graph.
We propose to employ Learning to Rank techniques (GBRT, λMART) in QSL.
Experiments show the effectiveness of Learning to Rank techniques in detecting search tasks.
Future Work: We plan to employ STD in a streaming setting to detect search tasks in (pseudo) real time.
Thank you for your attention!
References
[1] P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, and S. Vigna. The query-flow graph: model and applications. In CIKM’08, pages 609–618. ACM, 2008.
[2] M. Hagen, J. Gomoll, A. Beyer, and B. Stein. From search session detection to search mission detection. In OAIR’13, pages 85–92, 2013.
[3] R. Jones and K. L. Klinkner. Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs. In CIKM’08. ACM, 2008.
[4] C. Lucchese, S. Orlando, R. Perego, F. Silvestri, and G. Tolomei. Identifying task-based sessions in search engine query logs. In WSDM’11, pages 277–286. ACM, 2011.
[5] C. Lucchese, S. Orlando, R. Perego, F. Silvestri, and G. Tolomei. Discovering user tasks in long-term web search engine logs. ACM TOIS, 31(3):1–43, July 2013.
[6] H. Wang, Y. Song, M.-W. Chang, X. He, R. W. White, and W. Chu. Learning to extract cross-session search tasks. In WWW’13, pages 1353–1364. ACM, 2013.
Clustering Metrics
Rand: $\frac{tp + tn}{tp + tn + fp + fn}$
Jaccard: $\frac{tp}{tp + fp + fn}$
F1: $\frac{2 \cdot p \cdot r}{p + r}$
F1avg: $\sum_j \frac{m_j}{m} \, F1_{max}(j)$, where $m_j = |T^u_j|$ and $m = |Q^u|$.
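The first three metrics can be sketched as follows, under the assumption that tp, tn, fp, and fn are counted over pairs of queries (a pair is positive when both queries share a task) and that p and r denote precision and recall. The function name and sample data are hypothetical; F1avg is left out because the slides do not define F1max.

```python
from itertools import combinations


def pairwise_metrics(pred, truth):
    """Rand, Jaccard and F1 computed over all pairs of queries.

    pred[i] and truth[i] are the predicted and ground-truth task ids of
    query i; a pair is "positive" when both queries share a task.
    """
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred, same_true = pred[i] == pred[j], truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1
    total = tp + tn + fp + fn
    p = tp / (tp + fp) if tp + fp else 0.0   # precision
    r = tp / (tp + fn) if tp + fn else 0.0   # recall
    return {
        "rand": (tp + tn) / total if total else 0.0,
        "jaccard": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
        "f1": 2 * p * r / (p + r) if p + r else 0.0,
    }


truth = [0, 0, 0, 1, 1]
singleton = pairwise_metrics([0, 1, 2, 3, 4], truth)  # every query alone
perfect = pairwise_metrics(truth, truth)
```

On this toy input the all-singletons prediction still obtains a Rand value of 0.6 while its Jaccard and F1 are 0, since Rand also rewards true negatives and the other two do not.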
Feature Importance

Boyles law module in the grade 10 sciencefloriejanemacaya1
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 

Recently uploaded (20)

Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 

Learning To Rank User Queries to Detect Search Tasks

  • 1. Learning to Rank User Queries to Detect Search Tasks
      Claudio Lucchese¹, Franco Maria Nardini¹, Salvatore Orlando², Gabriele Tolomei³
      ¹ ISTI-CNR, Pisa, Italy; ² Università Ca' Foscari Venezia, Italy; ³ Yahoo Labs, London, UK
  • 2. Introduction: The Evolution of Web Search
      An increasing number of user searches are part of complex patterns.
      Complex search patterns are often composed of several multi-term, interleaved queries spread across many sessions.
      User information needs are getting harder to understand and satisfy.
      Search tasks: clusters of queries with the same latent information need, taken from a real-world search engine log.
  • 3. Introduction: Complex Search Patterns (AOL 2006)
      Queries within short-time sessions are part of different complex tasks.
      Each complex task spans several sessions.
  • 4. Related Work - I: Jones and Klinkner [3]
      First high-level analysis of user search tasks.
      Hierarchical search: flat query streams can be structured as complex search missions linked to each other; each mission in turn contains simpler search goals.
      They design a binary classifier that predicts whether two queries belong to the same goal.
  • 5. Related Work - II: Lucchese et al. [4, 5]
      Formally introduce the search task discovery problem.
      Graph-based representation of each user session: nodes are queries; edges between query pairs are weighted according to a query similarity measure.
      Search tasks are identified by the connected components of each user session graph.
      Outperforms other approaches for session boundary detection, such as the Query-Flow Graph [1].
  • 6. Related Work - III: Wang et al. [6]
      Cross-session discovery of search tasks.
      Graph-based representation of all queries.
      Search tasks are connected components of the graph in which each query of a task can be linked to only one past query of the same task; tasks are therefore modeled as trees.
      An SVM model identifies the best tree structure hidden in the query similarity graph.
  • 7. Search Task Discovery (STD) Framework
      Diagram labels: ground truth of search tasks; QSL; GQC.
      Steps: 1. Learning query similarity. 2. Learning pruning threshold. 3. Find connected components.
  • 8. Search Task Discovery (STD) Framework: Query Similarity Learning (QSL)
      QSL estimates a target query similarity function $\hat\sigma$ from a ground truth of manually labeled search tasks.
      Binary classes: same-task (positive) and not-same-task (negative).
      Learning to Rank: instead of predicting same-task, we learn a ranking function in which same-task queries are ranked highest.
      How to build the training sets?
  • 9. Search Task Discovery (STD) Framework: Graph-based Query Clustering (GQC)
      GQC transforms a user search log into a weighted query graph $G^u_{\hat\sigma}$: queries are nodes; edges are labeled using the $\hat\sigma$ previously learnt by QSL.
      The connected components of $G^u_{\hat\sigma}$ are the user search tasks.
      Weak edges introduced by $\hat\sigma$ are handled with the $\epsilon$-neighborhood technique as selective pruning: an edge is removed if its weight is below a given $\epsilon$.
      The optimal $\hat\epsilon$ is learnt from the ground truth.
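The GQC step on this slide (weighted query graph, pruning of weak edges, connected components) can be sketched in a few lines. This is only an illustration under stated assumptions: `token_jaccard` is a hypothetical stand-in for the learnt similarity function, `eps` stands for the pruning threshold, and the function names and example queries are invented here rather than taken from the authors' implementation.

```python
def connected_components(n, edges):
    """Union-find over nodes 0..n-1; returns components as sorted lists."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


def discover_tasks(queries, sim, eps):
    """Keep only edges whose similarity is at least eps, then return the
    connected components of the pruned graph as lists of query strings."""
    n = len(queries)
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if sim(queries[i], queries[j]) >= eps
    ]
    return [[queries[k] for k in comp] for comp in connected_components(n, edges)]


def token_jaccard(a, b):
    """Hypothetical similarity: Jaccard overlap of the two term sets."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0
```

With a toy log such as `["cheap flights rome", "flights rome june", "python tutorial", "rome hotels"]`, a higher `eps` separates "rome hotels" from the flight queries, while a lower one merges them into a single task; this is the trade-off that learning the threshold from the ground truth is meant to settle.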
  • 10. Search Task Discovery (STD) Framework: Search Task Discovery Problem
      Given a user query log $Q_u$, a clustering algorithm $C$ that extracts the connected components of the graph, and a quality function $\gamma$ measuring the quality of a clustering, the Search Task Discovery Problem is to find the similarity function and pruning threshold that maximize the average quality of the clusters $C(G^u_{\hat\sigma,\hat\epsilon})$ over all $u \in U$, i.e.,
      $$(\hat\sigma^*, \hat\epsilon^*) = \arg\max_{\hat\sigma,\hat\epsilon} \frac{1}{|U|} \sum_{u \in U} \gamma\big(C(G^u_{\hat\sigma,\hat\epsilon})\big).$$
  • 11. Reducing QSL to a Learning to Rank Problem: query-centric approach
      We aim at learning a query similarity function that scores higher those queries that appear in the same task.
      For any given $q_i^u \in T_k^u$, we say that $q_j^u$ is relevant to $q_i^u$ if $q_j^u \in T_k^u$, and irrelevant otherwise.
      Labels {1, 0} are assigned accordingly when building the training set.
      Number of relevance labels: $\sum_u |Q_u|^2$.
  • 12. Reducing QSL to a Learning to Rank Problem: user-centric approach
      We aim at learning a query similarity function that scores higher every pair of queries in the same task.
      Given any pair of queries $(q_i^u, q_j^u)$ in the user search log $Q_u$, we require their similarity to be high iff they belong to the same task.
      Here, each $(q_i^u, q_j^u)$ is a single ordered pair with $i \le j$, associated with the tuple for user $u$; the binary relation "≤" between queries is given by the order of their issuing times.
      Number of relevance labels: $\sum_u \binom{|Q_u|}{2}$.
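The two ways of building the training set can be made concrete with a small sketch. It assumes that query ids are integers ordered by issuing time and that one user's ground truth is given as a list of tasks; the function names are hypothetical. To match the $|Q_u|^2$ count, the query-centric version below also pairs each query with itself, which is an assumption of this sketch and not something the slides state.

```python
from itertools import combinations


def task_index(tasks):
    """Map each query id to the index of the ground-truth task containing it."""
    return {q: k for k, task in enumerate(tasks) for q in task}


def query_centric_labels(tasks):
    """One ranked list per query q_i: every query q_j of the same user gets
    label 1 if it is in the same task as q_i, and 0 otherwise."""
    task_of = task_index(tasks)
    queries = sorted(task_of)
    return {
        qi: [(qj, int(task_of[qi] == task_of[qj])) for qj in queries]
        for qi in queries
    }


def user_centric_labels(tasks):
    """One ranked list per user: every pair (q_i, q_j) with q_i issued
    before q_j gets label 1 iff both queries belong to the same task."""
    task_of = task_index(tasks)
    queries = sorted(task_of)  # ids are assumed to follow issuing time
    return [
        ((qi, qj), int(task_of[qi] == task_of[qj]))
        for qi, qj in combinations(queries, 2)
    ]
```

For a user with tasks `[[0, 2], [1]]`, the query-centric construction yields three lists of three labels each (nine in total), while the user-centric one yields a single list with three labelled pairs.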
  • 13. Reducing QSL to a Learning to Rank Problem: Types of Features Used (see our paper for the complete list of features used)
      Symmetric global features based on $Q_u$. Examples: Session Num Queries, Session Time Span, Avg Session Query Len, etc.
      Symmetric features extracted from the query pair $(q_i^u, q_j^u)$. Examples: Levenshtein, Jaccard (3-grams), ∆ Time, ∆ Pos, Global Joint Prob (queries), Wikipedia Cosine [5].
      Asymmetric features extracted from the query pair $(q_i^u, q_j^u)$. Examples: Is Proper Subset (term-set), Global Conditional Prob (queries).
      37 features in total.
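Three of the pairwise features named on this slide are easy to illustrate. The sketch below shows plain Levenshtein distance, Jaccard similarity over character 3-grams, and the asymmetric proper-subset test on term sets. How the paper normalizes or tokenizes these features is not stated on the slide, so the exact definitions here are assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two strings (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def char_ngrams(s, n=3):
    """Set of character n-grams of s (empty if s is shorter than n)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard_3grams(a, b):
    """Symmetric feature: Jaccard similarity of the character 3-gram sets."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def is_proper_subset(a, b):
    """Asymmetric feature: are the terms of a a proper subset of the terms of b?"""
    return set(a.split()) < set(b.split())
```

The first two functions return the same value when their arguments are swapped, whereas `is_proper_subset("rome hotels", "cheap rome hotels")` holds only in that direction, which is what makes it an asymmetric feature.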
  • 14. Implementing STD
      Learning to Rank:
        Gradient-Boosted Regression Trees (GBRT): RMSE.
        LambdaMART (λMART): nDCG.
      Binary Classification:
        Logistic Regression (LogReg) [3]: logistic loss.
        Decision Trees (DT) [5]: information gain.
      k-fold cross validation (k = 5): parameters tuned on validation data.
      Clustering measure used to obtain $\hat\epsilon$: Jaccard.
  • 15. Experiments: Dataset
      Proposed by Hagen et al. [2].
      Three-month sample of the AOL query log.
      8,840 queries issued by 127 users.
      Labeled by two human assessors into 1,378 user search tasks (called missions in the original paper).
      We remove stopwords and noisy characters, and transform query strings to lowercase.
      We remove the longest and shortest user sessions.
      Resulting dataset: 6,381 queries from 125 users.
  • 16. Experiments: Dataset
      [Figure: panel (a) plots the ratio of tasks against the number of queries; a second panel plots the ratio of users.]
      Singleton tasks are 41% of the dataset.
      Singleton baseline: always answering not-same-task.
  • 17. Experiments: Results, query-centric approach
      Table 5 (caption cropped on the slide): "Comparison of L2R techniques and baselines in … validation folds. Best results are shown in boldface, and … α = .05."
      (a) Query-centric dataset L′
      Method      Rand    F1avg      Jaccard    F1
      Singleton   0.738   0.458      0          0
      DT          0.898   0.853      0.620      0.714
      LogReg      0.919   0.868      0.639      0.737
      GBRT        0.915   0.889(*)   0.670      0.763
      λMART       0.919   0.879      0.687(*)   0.778(*)
      Cropped paper text: "…skewness of class labels distribution in our dataset. Therefore, even though it is a widely used measure, Rand inde…"
  • 18. Experiments: Results, user-centric approach
      Cropped captions: "Figure 2: Properties of the dataset." and "…baselines in terms of Rand, F1avg, Jaccard, and F1 averaged across 5 … and there is a (*) next to those which are statistically signifi…"
      (b) User-centric dataset L″
      Method      Rand       F1avg      Jaccard    F1
      Singleton   0.738      0.458      0          0
      DT          0.880      0.843      0.604      0.706
      LogReg      0.921(*)   0.868      0.639      0.738
      GBRT        0.913      0.875(*)   0.682      0.771
      λMART       0.914      0.873      0.684(*)   0.778(*)
      Cropped paper text: "…discriminative features are those regarding the relative times and positions of a given pair of queries. This [is rea]sonable, since the chance of two close queries to be…"
  • 19. Conclusion
      We proposed the Search Task Discovery framework, made up of two modules: QSL and GQC.
      QSL learns a query similarity function from a ground truth of manually labeled search tasks.
      GQC models the user queries as a graph.
      We propose to employ Learning to Rank techniques (GBRT, λMART) in QSL.
      Experiments show the effectiveness of Learning to Rank techniques in detecting search tasks.
      Future work: we plan to employ STD in a streaming setting to detect search tasks in (pseudo) real time.
  • 20. Thank you for your attention!
  • 21. References I
      [1] P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, and S. Vigna. The query-flow graph: model and applications. In CIKM'08, pages 609–618. ACM, 2008.
      [2] M. Hagen, J. Gomoll, A. Beyer, and B. Stein. From search session detection to search mission detection. In OAIR'13, pages 85–92, 2013.
      [3] R. Jones and K. L. Klinkner. Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs. In CIKM'08. ACM, 2008.
      [4] C. Lucchese, S. Orlando, R. Perego, F. Silvestri, and G. Tolomei. Identifying task-based sessions in search engine query logs. In WSDM'11, pages 277–286. ACM.
      [5] C. Lucchese, S. Orlando, R. Perego, F. Silvestri, and G. Tolomei. Discovering user tasks in long-term web search engine logs. ACM TOIS, 31(3):1–43, July 2013.
  • 22. References II
      [6] H. Wang, Y. Song, M.-W. Chang, X. He, R. W. White, and W. Chu. Learning to extract cross-session search tasks. In WWW'13, pages 1353–1364. ACM, 2013.
  • 23. Clustering Metrics
      Rand: $\frac{tp + tn}{tp + tn + fp + fn}$
      Jaccard: $\frac{tp}{tp + fp + fn}$
      F1: $\frac{2 \cdot p \cdot r}{p + r}$
      F1avg: $\sum_j \frac{m_j}{m} \, \mathrm{F1}_{\max}(j)$, where $m_j = |T_j^u|$ and $m = |Q_u|$.
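The metrics on this slide can be computed from a ground-truth labelling and a predicted clustering of one user's queries. The sketch below assumes that tp, tn, fp and fn are counted over unordered query pairs (a pair is positive when both queries share a task) and that F1max(j) is the best F1 between ground-truth task j and any predicted cluster; both readings are assumptions, since the slide gives only the formulas. Function names are hypothetical.

```python
from itertools import combinations


def f1(p, r):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    return 2 * p * r / (p + r) if p + r else 0.0


def pairwise_metrics(truth, pred):
    """truth, pred: dicts mapping each query to its task / cluster label."""
    tp = tn = fp = fn = 0
    for a, b in combinations(sorted(truth), 2):
        same_task = truth[a] == truth[b]
        same_cluster = pred[a] == pred[b]
        if same_task and same_cluster:
            tp += 1
        elif same_task:
            fn += 1
        elif same_cluster:
            fp += 1
        else:
            tn += 1
    total = tp + tn + fp + fn
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return {
        "rand": (tp + tn) / total if total else 0.0,
        "jaccard": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
        "f1": f1(p, r),
    }


def groups(labels):
    """Turn a query -> label mapping into a list of query sets."""
    out = {}
    for q, lab in labels.items():
        out.setdefault(lab, set()).add(q)
    return list(out.values())


def f1_avg(truth, pred):
    """Size-weighted average, over ground-truth tasks, of the best F1
    achieved by any predicted cluster against that task."""
    tasks, clusters = groups(truth), groups(pred)
    m = len(truth)
    score = 0.0
    for task in tasks:
        best = max(
            f1(len(task & c) / len(c), len(task & c) / len(task))
            for c in clusters
        )
        score += len(task) / m * best
    return score
```

Under these assumptions, a predictor that puts every query in its own cluster never produces a true positive pair, so its pairwise Jaccard and F1 are both 0, which agrees with the Singleton rows of the results tables.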
  • 24. Feature Importance