OAIR 2013

Modeling and Predicting the Task-by-Task Behavior of Search Engine Users
Gabriele Tolomei
Università Ca' Foscari Venezia, Italy
Claudio Lucchese
ISTI-CNR, Pisa, Italy
Salvatore Orlando
Università Ca' Foscari Venezia, Italy
Fabrizio Silvestri
Raffaele Perego
May 23, 2013 - Lisbon, Portugal
10th International Conference in the RIAO series

Outline
• Motivation
• Research Challenges
• Experiments and Results
• Conclusion and Future Work

A New Way of Search
Alice
Bob
Same Task!
“Reserving a hotel room in New York”

… and Search Engines?
• Roughly, they are still Web document
retrieval tools
– answering on a per-query basis
– ten-blue links to relevant Web pages

Information Need Hierarchy
• Web Task: any (atomic) activity that a user
performs through Web search
– “find a recipe”, “book a flight”, “read news”,
etc.
– distinct users may use different queries to
accomplish the same Web task
• Web Mission: composition of Web tasks to
achieve complex goals
– distinct users may use different Web tasks to
accomplish the same Web mission
[Jones and Klinkner, CIKM '08]

Goals
• Mine Search Engine logs to detect Web
tasks
• Provide a user model for task-oriented
search
– from query-by-query to task-by-task
• Show how such model can be used to
design a real-world application
– from query to task recommendation

The Big Picture
• Bottom-up, 2-stage clustering solution:
– User Task Discovery from “raw” queries
issued by the same user and stored in query
logs
– Collective Task Discovery from distinct User
Tasks
• Graph-based representation of Collective Tasks

User Task Discovery
• User Task
– set of possibly non-contiguous queries (multi-tasking), issued by a single user, whose aim is to carry out a specific Web task
• QC-HTC
– graph-based query clustering solution proposed in our previous work [Lucchese et al., WSDM '11]
– outperforms other techniques for session boundary detection in query logs (e.g., QFG [Boldi et al., CIKM '08])

User Task Discovery: QC-HTC
• Splits each long-term user session into shorter time-based sessions
• Builds a weighted undirected graph for each time-based session
– nodes in each graph are the queries of a time-based session
• Links consecutive pairs of queries with edges weighted by their content-based similarity:
– lexical (query character n-grams)
– semantic (query “wikification”)
• Merges any two sequential clusters if their first (head) and last (tail) queries are similar enough
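The steps above can be sketched roughly as follows. This is a simplified illustration, not the published QC-HTC implementation: it assumes a hypothetical `max_gap` time threshold, uses only a lexical similarity (Jaccard overlap of character trigrams) and leaves out the semantic "wikification" component, and all function names and the `threshold` parameter are invented for the example.

```python
from typing import List, Tuple

Query = Tuple[float, str]  # (timestamp in seconds, query text)


def split_by_time(queries: List[Query], max_gap: float) -> List[List[str]]:
    """Start a new time-based session whenever two consecutive
    queries are more than max_gap seconds apart."""
    sessions: List[List[str]] = []
    last_ts = None
    for ts, text in sorted(queries):
        if last_ts is None or ts - last_ts > max_gap:
            sessions.append([])
        sessions[-1].append(text)
        last_ts = ts
    return sessions


def char_ngrams(text: str, n: int = 3) -> set:
    """Set of character n-grams of the space-padded, lowercased query."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def lexical_sim(a: str, b: str) -> float:
    """Jaccard overlap of character trigrams (assumed lexical similarity)."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0


def head_tail_clusters(session: List[str], threshold: float) -> List[List[str]]:
    # Step 1: chain consecutive queries that are similar enough.
    clusters: List[List[str]] = []
    for q in session:
        if clusters and lexical_sim(clusters[-1][-1], q) >= threshold:
            clusters[-1].append(q)
        else:
            clusters.append([q])
    # Step 2: merge clusters whose head or tail queries are similar enough.
    merged: List[List[str]] = []
    for c in clusters:
        for m in merged:
            sim = max(lexical_sim(x, y)
                      for x in (m[0], m[-1]) for y in (c[0], c[-1]))
            if sim >= threshold:
                m.extend(c)
                break
        else:
            merged.append(c)
    return merged
```

Comparing only head and tail queries keeps the merge step cheap, which is the point of the head-tail heuristic: a user who returns to an interrupted task (multi-tasking) is regrouped without comparing every pair of queries.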

Task-oriented User Sessions

Collective Task Discovery
• Collective Task
– group of distinct user tasks (i.e., distinct sets of queries issued by several users) that represent the same Web task
• Identify similar user tasks by clustering their “bag of words” representations
– each user query is a sentence
– each user task is a concatenation of possibly many sentences (i.e., a text document)
• T = {T1, …, TK} is the final set of Collective Tasks
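As a minimal sketch of the "bag of words" idea above, each user task can be turned into a word-count vector and two tasks compared with cosine similarity. Whitespace tokenization, raw counts (no TF-IDF weighting) and the function names are assumptions made for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List


def task_to_bow(task_queries: List[str]) -> Counter:
    """Treat each query as a sentence and the whole user task as one document."""
    return Counter(word for q in task_queries for word in q.lower().split())


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Two users who phrase the same Web task with different queries still share many words, so their vectors end up close together and a clustering algorithm can place them in the same collective task.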

Mapping User to Collective Tasks

Task Relation Graph (TRG)
• Task-oriented model of user search behavior
• TRG(T, E, w, η) is a weighted directed graph
– nodes are the set of collective tasks T = {T1, …, TK}
– edges E represent task relatedness
– w: T × T → [0,1] is the edge-weighting function
– η is a weight threshold
• Ti and Tj are linked together iff w(Ti, Tj) > η
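The definition above maps naturally onto a nested dictionary. The sketch below assumes the weighting function is supplied as a Python callable and skips self-loops (the slides do not say whether a task may link to itself); `build_trg` is a hypothetical name.

```python
from typing import Callable, Dict, Hashable, Iterable

Task = Hashable


def build_trg(tasks: Iterable[Task],
              weight: Callable[[Task, Task], float],
              eta: float) -> Dict[Task, Dict[Task, float]]:
    """Return a directed adjacency map: graph[ti][tj] = w(ti, tj),
    keeping only the edges whose weight is strictly greater than eta."""
    task_list = list(tasks)
    graph: Dict[Task, Dict[Task, float]] = {t: {} for t in task_list}
    for ti in task_list:
        for tj in task_list:
            if ti == tj:
                continue  # assumption: no self-loops
            w = weight(ti, tj)
            if w > eta:
                graph[ti][tj] = w
    return graph
```

Raising `eta` prunes weak edges, which makes the graph sparser and leaves fewer tasks with any outgoing recommendation.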

Data Set: AOL 2006 Query Log

Results
Results were evaluated on a manually built ground truth of user tasks
[Lucchese et al., TOIS 2013]

Data Set: AOL 2006 Query Log

Training Set vs. Test Set

Clustering User Tasks
• Algorithm: Repeated Bisections vs. Agglomerative
• Similarity Measure: Cosine similarity vs. Pearson's correlation
• Objective Function: maximize intra-cluster similarity
• Stop Criterion: heuristically choose the final number K of clusters through the “elbow method”
• We select K = 1,024
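The slides do not say how the elbow was located, so the following is only one possible formalization: given the objective value measured at several candidate K, pick the K after which the gain in intra-cluster similarity drops the most. The function name and the "largest drop in gain" rule are assumptions.

```python
from typing import Sequence


def elbow_k(ks: Sequence[int], scores: Sequence[float]) -> int:
    """Pick the candidate K where the improvement of the objective
    slows down the most (largest drop between consecutive gains)."""
    if len(ks) != len(scores) or len(ks) < 3:
        raise ValueError("need at least three (K, score) pairs of equal length")
    best_index, best_drop = 1, float("-inf")
    for i in range(1, len(ks) - 1):
        gain_before = scores[i] - scores[i - 1]
        gain_after = scores[i + 1] - scores[i]
        drop = gain_before - gain_after
        if drop > best_drop:
            best_index, best_drop = i, drop
    return ks[best_index]
```

In practice the elbow is often read off a plot by eye; an automatic rule like this one is mainly useful as a sanity check on that visual choice.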

Results and Example
Results were evaluated on a manually built ground truth of collective tasks
[Lucchese et al., TOIS 2013]

Building TRG: Task Relatedness
• Use the training set to compute w(Ti, Tj)
• Frequent Sequential Patterns
– η = support (i.e., probability) of Ti and Tj co-occurring in a specified sequence: P(<Ti, Tj>)
– task order matters!
• Association Rules Ti → Tj
– η = support: P({Ti, Tj})
– η = confidence: P(Tj|Ti)
– task order doesn't matter!
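The measures above can be sketched as follows. The sketch assumes that each training session is already a list of collective task identifiers and that probabilities are estimated as fractions of sessions; "Ti before Tj" is read as "not necessarily adjacent". All function names are illustrative.

```python
from typing import Hashable, List, Sequence

Task = Hashable


def seq_support(sessions: Sequence[List[Task]], ti: Task, tj: Task) -> float:
    """P(<Ti, Tj>): fraction of sessions in which ti occurs before tj."""
    if not sessions:
        return 0.0
    hits = sum(1 for s in sessions
               if ti in s and tj in s[s.index(ti) + 1:])
    return hits / len(sessions)


def rule_support(sessions: Sequence[List[Task]], ti: Task, tj: Task) -> float:
    """P({Ti, Tj}): fraction of sessions containing both tasks, in any order."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if ti in s and tj in s) / len(sessions)


def rule_confidence(sessions: Sequence[List[Task]], ti: Task, tj: Task) -> float:
    """P(Tj | Ti): among the sessions containing ti, the fraction also containing tj."""
    with_ti = [s for s in sessions if ti in s]
    if not with_ti:
        return 0.0
    return sum(1 for s in with_ti if tj in s) / len(with_ti)
```

Note that rule support is symmetric in Ti and Tj, while confidence and sequential support are not, so the latter two yield different weights for the two directions of an edge.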

Task Recommendation
• One out of many possible applications of TRG
• A user is performing (or has just performed) a task Ti
– more precisely, a user task that is similar to a known Ti
• Retrieve from TRG the set Rm(Ti) containing the top-m nodes/tasks related to Ti
– tasks in Rm(Ti) are those having the m highest edge weights among all the nodes adjacent to Ti
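A minimal sketch of this lookup, assuming the TRG is stored as a nested dictionary `trg[ti][tj] = weight` (a representation chosen here for illustration) and using a hypothetical `recommend` function:

```python
import heapq
from typing import Dict, Hashable, List

Task = Hashable


def recommend(trg: Dict[Task, Dict[Task, float]], ti: Task, m: int) -> List[Task]:
    """Return Rm(Ti): the (at most) m neighbours of ti with the highest edge weights."""
    neighbours = trg.get(ti, {})
    return heapq.nlargest(m, neighbours, key=neighbours.get)
```

A task with no outgoing edges, or one that is not in the graph at all, simply yields an empty recommendation list.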

Task Recommendation: Experiments
• Use TRGs built from the training set to generate task recommendations for the test set
• Original user sessions in the test set are split into a 1/3 prefix and a 2/3 suffix of user tasks
• Each user task is mapped to a candidate collective task Tc (cosine similarity)
• From all the Tc in the prefix, retrieve the union set of recommendations ∪ Rm(Tc) from the TRG
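The protocol above can be sketched as follows, assuming each test session has already been mapped to a list of collective task identifiers and that the TRG is a nested dictionary `trg[ti][tj] = weight`. The rounding of the 1/3 cut point and the function names are assumptions, since the slides do not specify them.

```python
from typing import Dict, Hashable, List, Set, Tuple

Task = Hashable


def split_prefix_suffix(session_tasks: List[Task]) -> Tuple[List[Task], List[Task]]:
    """Split a session into (roughly) its first third and the remaining two thirds."""
    cut = max(1, len(session_tasks) // 3)
    return session_tasks[:cut], session_tasks[cut:]


def union_recommendations(trg: Dict[Task, Dict[Task, float]],
                          prefix: List[Task], m: int) -> Set[Task]:
    """Union of the top-m recommendations Rm(Tc) over every task Tc in the prefix."""
    recommended: Set[Task] = set()
    for tc in prefix:
        neighbours = trg.get(tc, {})
        top_m = sorted(neighbours, key=neighbours.get, reverse=True)[:m]
        recommended.update(top_m)
    return recommended
```

The recommended set can then be compared with the tasks that actually appear in the suffix, for example by intersecting the two sets.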

Evaluation
Coverage is affected by the edge weighting function and by the threshold η

Task Recommendation: Results (top-1)

Task Recommendation: Results (top-3)

Examples

Task vs. Query Recommendation
• Goal: show that task recommendation is different from well-known query recommendation
• TRG vs. QFG
– 83.8% of top-3 query suggestions generated by
QFG live in the same (collective) task
– Only 15.1% of top-3 query suggestions generated
by QFG lead to 2 separate (collective) tasks
• QFG is great if user wants to stay in the
same task
• TRG allows user to switch and jump to other
tasks

The “Take-Away” Message
• Web Search Engines should move from handling user requests “query-by-query” to handling them “task-by-task”
• New models for user search behavior are needed: from Query Flow Graph to Task Relation Graph
• Task Relation Graph may be exploited for several applications (e.g., Task Recommendation)

Future Work
• Advanced Task Representation
– E.g., linked data, as opposed to simple bag-of-queries
• Automatic Task Labeling (taxonomy of Web
tasks):
– Linking queries of collective tasks with referent
entities in a knowledge base
– Exploit entity categories to label the whole task
• Use TRG for other applications
– Task-based advertising, Mission discovery, etc.
• New SERP to render task-oriented results
