Dynamic Search and
Beyond
Prof. Grace Hui Yang
InfoSense Group
Department of Computer Science
Georgetown University
huiyang@cs.georgetown.edu
Sep 29, 2018
CCIR 2018 @ Guilin
• Our graduate program focuses on
Information Systems,
Privacy and Security,
and Computer Theory.
• Ph.D., Master’s, Postdocs
• ACM International Conference on Theory of
Information Retrieval (ICTIR)
• Its importance in the IR community
• Acknowledgements to Guangxi Normal University,
CCF, and many old and new friends
Statistical Modeling of
Information Seeking
• Aims to connect user’s information seeking
behaviors with retrieval models
• The ‘dynamics’ in the search process are the
primary elements to be modeled
• I call this set of novel retrieval algorithms “Dynamic
IR Modeling”
Task: Dynamic IR
• The information retrieval task that aims to find
relevant documents for a session of multiple queries.
• It arises when information needs are complex,
vague, and evolving, often containing multiple subtopics
• It cannot be resolved by one-shot ad-hoc
retrieval
• e.g. “Purchasing a home”, “What is the meaning of
life”
E.g., find what city and state Dulles airport is in, what shuttles, ride-sharing vans, and
taxi cabs connect the airport to other cities, what hotels are close to the airport, what
cheap off-airport parking is available, and what metro stops are close to the Dulles
airport.
An Illustration: [figure showing the user, driven by an information need, interacting with the search engine]
Characteristics of Dynamic IR
• Rich interactions
• Query formulation
• Document clicks
• Document examination
• eye movement
• mouse movements
• etc.
Characteristics of Dynamic IR
• Temporal dependency
[Figure: a session unfolds over iterations 1 … n; at iteration i, query q_i yields the ranked documents D_i and the clicked documents C_i, all driven by the underlying information need I, with each iteration depending on the previous ones.]
Characteristics of Dynamic IR
• Aim for a long-term goal
• Great if we can find early what a user
ultimately wants
Reinforcement Learning (RL)
• Fits well in this trial-and-error setting
• It learns from repeated, varied attempts that
continue until success.
• The learner (also known as agent) learns from its dynamic
interactions with the world
• rather than from a labeled dataset as in supervised
learning.
• The stochastic model assumes that the system's current
state depends on the previous state and action in a non-
deterministic manner
Most of Our Work is inspired
by MDPs/POMDPs
○ Based on Markov Decision Process (MDP)
○ States: Queries
! Observable
○ Actions:
! User actions:
○ Add / remove / keep query terms
○ Nicely correspond to our definition of query change
! Search Engine actions:
○ Increase / decrease / keep term weights
○ Rewards:
! nDCG
[Guan, Zhang, and Yang SIGIR 2013]
QUERY CHANGE MODEL
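The reward here is nDCG over the documents returned for the current query. As a concrete reference point, below is a minimal nDCG sketch; it is not code from the original work, and the relevance grades and cut-off k in the example are illustrative.

```python
import math

def dcg(grades):
    """Discounted cumulative gain of graded relevance labels in ranked order."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades))

def ndcg(ranked_grades, k=10):
    """nDCG@k: DCG of the returned ranking divided by the DCG of the same
    grades sorted in descending order (the ideal ordering)."""
    ideal = dcg(sorted(ranked_grades, reverse=True)[:k])
    return dcg(ranked_grades[:k]) / ideal if ideal > 0 else 0.0

# Relevance grades of the documents returned for the current query, in rank
# order; this value would serve as the search-engine agent's reward.
reward = ndcg([3, 2, 0, 1, 0], k=5)
```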
SEARCH ENGINE AGENT’S ACTIONS
| Term type | ∈ D_{i−1}? | Action | Example |
|-----------|------------|--------|---------|
| q_theme | Y | increase | "pocono mountain" in s6 |
| q_theme | N | increase | "france world cup 98 reaction" in s28: france world cup 98 reaction stock market → france world cup 98 reaction |
| +∆q | Y | decrease | 'policy' in s37: Merck lobbyists → Merck lobbyists US policy |
| +∆q | N | increase | 'US' in s37: Merck lobbyists → Merck lobbyists US policy |
| −∆q | Y | decrease | 'reaction' in s28: france world cup 98 reaction → france world cup 98 |
| −∆q | N | no change | 'legislation' in s32: bollywood legislation → bollywood law |
QUERY CHANGE RETRIEVAL MODEL (QCM)
○ The Bellman equation gives the optimal value for an MDP.
○ The reward function is used as the document relevance score function and is derived by working backwards from the Bellman equation.
○ Its components: the document relevance score, the query transition model, the maximum past relevance, and the current reward/relevance score.
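For reference, the Bellman optimality equation has the standard form below; the second line is only a sketch of how a QCM-style document score can be read backwards from it (current relevance plus a discounted, transition-weighted maximum past relevance), with the exact transition term defined in the cited SIGIR 2013 paper.

```latex
% Bellman optimality equation for an MDP
V^{*}(s) = \max_{a}\Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s') \Big]

% Sketch of a QCM-style document score read backwards from the Bellman equation
\mathrm{Score}(q_t, d) = P(q_t \mid d)
  + \gamma\, P(q_t \mid q_{t-1}, D_{t-1}) \max_{d' \in D_{t-1}} \mathrm{Score}(q_{t-1}, d')
```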
CALCULATING THE TRANSITION MODEL
• According to the query change and the search engine actions (applied to the current reward/relevance score):
• Increase weights for theme terms
• Decrease weights for old added terms
• Decrease weights for removed terms
• Increase weights for novel added terms
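A minimal sketch of the weight adjustments listed above, assuming simple additive steps; the real QCM model derives the adjustment strengths from the query-change and transition probabilities, so the step size and function name here are purely illustrative.

```python
def adjust_term_weights(q_prev, q_curr, prev_doc_terms, base=1.0, step=0.2):
    """Weight query terms for the current iteration based on query change.

    q_prev, q_curr: sets of terms of the previous and current query.
    prev_doc_terms: set of terms appearing in the previously retrieved D_{t-1}.
    """
    theme = q_prev & q_curr      # q_theme: terms kept across queries
    added = q_curr - q_prev      # +Delta q: newly added terms
    removed = q_prev - q_curr    # -Delta q: removed terms

    weights = {w: base + step for w in theme}            # increase theme terms
    for w in added:
        # old added terms (already seen in D_{t-1}) get decreased,
        # novel added terms get increased
        weights[w] = base - step if w in prev_doc_terms else base + step
    for w in removed:
        # removed terms seen in D_{t-1} are penalized, otherwise left unchanged
        weights[w] = -step if w in prev_doc_terms else 0.0
    return weights
```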
○ Partially Observable Markov Decision Process
○ Two agents
● Cooperative game
● Joint Optimization
WIN-WIN SEARCH: DUAL-AGENT STOCHASTIC GAME
● Hidden states
● Actions
● Rewards
● Markov
[Luo, Zhang, and Yang SIGIR 2014]
A MARKOV CHAIN OF DECISION MAKING STATES
[Luo, Zhang, and Yang SIGIR 2014]
The four hidden decision-making states (q0 is the initial query):
● SRT: Relevant & Exploitation
● SRR: Relevant & Exploration
● SNRT: Non-Relevant & Exploitation
● SNRR: Non-Relevant & Exploration
Example query changes:
● scooter price ⟶ scooter stores
● collecting old US coins ⟶ selling old US coins
● Philadelphia NYC travel ⟶ Philadelphia NYC train
● Boston tourism ⟶ NYC tourism
HIDDEN DECISION MAKING STATES
[Luo, Zhang, and Yang SIGIR 2014]
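Because these four decision-making states are hidden, a POMDP-style agent tracks a belief (a distribution over the states) and updates it after every observation. The sketch below is the generic Bayesian belief update, not the specific parameterization of the win-win model; the transition table T and observation table O are assumed inputs.

```python
STATES = ["S_RT", "S_RR", "S_NRT", "S_NRR"]  # (non-)relevant x exploitation/exploration

def belief_update(belief, action, obs, T, O):
    """Generic POMDP belief update: b'(s') ~ O(obs | s', a) * sum_s T(s' | s, a) * b(s).

    belief: {state: probability}
    T: {(s, action, s_next): transition probability}
    O: {(s_next, action, obs): observation probability}
    """
    new_belief = {}
    for s_next in STATES:
        predicted = sum(T[(s, action, s_next)] * belief[s] for s in STATES)
        new_belief[s_next] = O[(s_next, action, obs)] * predicted
    z = sum(new_belief.values())
    return {s: p / z for s, p in new_belief.items()} if z > 0 else dict(belief)
```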
Dual Agent Stochastic Game
ACTIONS
! User Action (Au)
○ add query terms (+Δq)
○ remove query terms (-Δq)
○ keep query terms (qtheme)
! Search Engine Action (A_se)
○ Increase / decrease / keep term weights
○ Switch on or off a search technique
○ e.g., to use or not to use query expansion
○ Adjust parameters in search techniques
○ e.g., select the best k for the top-k docs used in PRF
! Message from the user (Σ_u)
○ clicked documents
○ SAT-clicked documents
! Message from the search engine (Σ_se)
○ top k returned documents
Messages are essentially
documents that an agent thinks
are relevant.
[Luo, Zhang, and Yang SIGIR 2014]
REWARDS
! Explicit Rewards:
! nDCG
! Implicit Rewards:
! clicks
[Luo et al, SIGIR 2014, ECIR 2015]
EXPERIMENTS
○ Corpus: ClueWeb09, ClueWeb12, and the TREC DD datasets
○ Query Logs
SEARCH ACCURACY
○ Search accuracy on the TREC 2012 Session Track
◆ Win-win outperforms most retrieval algorithms on TREC 2012.
SEARCH ACCURACY
○ Search accuracy on the TREC 2013 Session Track
◆ Win-win outperforms all retrieval algorithms on TREC 2013.
◆ It is highly effective in session search.
◆ Systems in TREC 2012 perform better than in TREC 2013, because many relevant documents are not included in the ClueWeb12 CatB collection.
SEARCH ACCURACY FOR DIFFERENT
SESSION TYPES
○ TREC 2012 Sessions are classified into:
! Product: Factual / Intellectual
! Goal quality: Specific / Amorphous
| Run | Intellectual | %chg | Amorphous | %chg | Specific | %chg | Factual | %chg |
|-----|--------------|------|-----------|------|----------|------|---------|------|
| TREC best | 0.3369 | 0.00% | 0.3495 | 0.00% | 0.3007 | 0.00% | 0.3138 | 0.00% |
| Nugget | 0.3305 | -1.90% | 0.3397 | -2.80% | 0.2736 | -9.01% | 0.2871 | -8.51% |
| QCM | 0.3870 | 14.87% | 0.3689 | 5.55% | 0.3091 | 2.79% | 0.3066 | -2.29% |
| QCM+DUP | 0.3900 | 15.76% | 0.3692 | 5.64% | 0.3114 | 3.56% | 0.3072 | -2.10% |
- QCM better handles sessions that demonstrate evolution and exploration, because it treats a session as a continuous process, studying the changes across query transitions and modeling the dynamics.
How to design the states,
actions, and rewards
DESIGN OPTIONS
○ Is there a temporal component?
○ States – What changes with each time step?
○ Actions – How does your system change the state?
○ Rewards – How do you measure feedback or
effectiveness in your problem at each time step?
○ Transition Probability – Can you determine this?
! If not, then a model-free approach is more suitable
ECIR’15
… can it be more
efficient?
A Direct Policy Learning
Framework
• Learns a direct mapping from observations to actions by
gradient descent
• Define a history: A chain of events happening in a
session
• the dynamic changes of states, actions, observations,
and rewards in a session
ICTIR’15
Browse Phase
• Actor: the user
• It happens
• after the search results are shown to the user
• before the user starts to write the next query
• Records how the user perceives and examines the
(previously retrieved) search results
ICTIR’15
Decompose a history
Query Phase
• Actor: the user
• It happens
• when the user writes a query
• Assuming the query is created based on
• what has been seen in the browse phase
• the information need
ICTIR’15
Decompose a history
Rank Phase
• Actor: the search engine
• It happens
• after the query is entered
• before the search results are returned
• It is where the search algorithm takes place
Decompose a history
[Equations on the slide: the objective function, the action selection distribution (a softmax function), its gradient, and the resulting ranking function.]
• The softmax originally represents the probability of selecting a
(ranking) action
• In our context, it is the probability of selecting d to be placed at
the top of the ranked list under n3 and θ3 at the t-th iteration
• We then sort the documents by this probability to generate the
ranked document list
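The ranking function described above can be made concrete with a small sketch: each document is scored by θ·φ(d), a softmax turns the scores into selection probabilities, sorting by that probability produces the ranked list, and a REINFORCE-style step then moves θ in the direction of the observed reward. The feature matrix, the learning rate, and the REINFORCE form are illustrative stand-ins; the exact gradient used in the DPL framework is the one defined in the ICTIR 2015 paper.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    z = scores - scores.max()
    e = np.exp(z)
    return e / e.sum()

def rank_documents(theta, doc_features):
    """doc_features: (n_docs, n_features) matrix. Score each document with
    theta, convert scores into selection probabilities, and sort by them."""
    probs = softmax(doc_features @ theta)
    order = np.argsort(-probs)          # descending by selection probability
    return order, probs

def policy_gradient_update(theta, doc_features, chosen, probs, reward, lr=0.1):
    """REINFORCE-style step: grad log pi(chosen) = phi(chosen) - E_pi[phi]."""
    grad_log_pi = doc_features[chosen] - probs @ doc_features
    return theta + lr * reward * grad_log_pi

# Toy usage: 5 candidate documents with 3 features each
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 3))
theta = np.zeros(3)
order, probs = rank_documents(theta, features)
theta = policy_gradient_update(theta, features, order[0], probs, reward=1.0)
```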
Updates:
Feature function:
Query Features
• Test if a search term w ∈ q_t and w ∈ q_{t−1}
• # of times that a term w occurs in q_1, q_2, …, q_t
Query-Document Features
• Test if a search term w ∈ +∆q_t and w ∈ D_{t−1}
• Test if a document d contains a term w ∈ −∆q_t
• tf.idf score of a document d with respect to q_t
Click Features
• Test if there are SAT-Clicks in D_{t−1}
• # of times a document has been clicked in the current session
• # of seconds a document has been viewed and re-viewed in the current session
Query-Document-Click Features
• Test if q_i leads to SAT-Clicks in D_i, where i = 0 … t−1
Session Features
• position in the current session
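A small sketch of how a few of the listed features could be computed for the current iteration; the exact feature set and any weighting used in the DPL work may differ, and the helper names here are made up for illustration.

```python
import math
from collections import Counter

def query_features(queries):
    """queries: list of token lists q_1 .. q_t (most recent last)."""
    q_t = set(queries[-1])
    q_prev = set(queries[-2]) if len(queries) > 1 else set()
    repeat_counts = Counter(w for q in queries for w in set(q))
    return {
        "terms_kept_from_prev": len(q_t & q_prev),
        "max_term_repetition": max(repeat_counts[w] for w in q_t),
    }

def tfidf_score(doc_tokens, query_tokens, df, n_docs):
    """Simple tf.idf score of a document for the current query q_t."""
    tf = Counter(doc_tokens)
    return sum(tf[w] * math.log((n_docs + 1) / (df.get(w, 0) + 1))
               for w in set(query_tokens))

def click_features(clicked_doc_ids, dwell_seconds, doc_id):
    """Click-based signals for one document in the current session."""
    return {
        "times_clicked": clicked_doc_ids.count(doc_id),
        "seconds_viewed": dwell_seconds.get(doc_id, 0.0),
    }
```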
Efficiency - TREC 2012 Session
• lemur > dpl > qcm > winwin
• dpl achieves a good balance between accuracy and efficiency
• the conclusions are consistent across experiments on the TREC 2012-2014 Session Tracks
DPL
TREC 2012 Session
• dpl achieves a significant improvement over the TREC best run
• We found similar conclusions on the TREC 2013 and 2014 Session Tracks
DPL
TREC DYNAMIC DOMAIN 2015-2017
! The search task focuses on specific
domains
! Over the three years, we explored domains
ranging from the dark web (illicit goods and
Ebola) and polar science to more general
web domains (NYT)
! What is consistent?
○ The participating system is expected to
help the user through interactions & get
their tasks done
○ User’s information need usually consists
of multiple aspects
THE TREC DYNAMIC DOMAIN TASK
FEEDBACK FROM A SIMULATED USER
! https://github.com/trec-dd/trec-dd-jig
DOMAIN USED IN 2017
○ New York Times Annotated Corpus
! Sandhaus, Evan. "The New York Times Annotated Corpus." Linguistic Data
Consortium, Philadelphia 6, no. 12 (2008): e26752.
! Archives of The New York Times spanning 20 years, from January 1, 1987 to June 19, 2007
! Uncompressed size 16 GB
! Over 1.8 million documents
! Over 650,000 article summaries written by library scientists.
! Over 1,500,000 articles manually tagged by library scientists
! Over 275,000 algorithmically-tagged articles that have been hand verified by
professionals
ANNOTATION
○ Create Topic and Relevance Judgement at the same time
! Not by pooling
○ Topic – subtopic – passage – Relevance Judgement
○ The challenge: how to be complete
○ Useful information that the user gains
! Raw relevance score
○ Discounting
! Based on document ranking
! Based on diversity
○ User’s efforts
! Time spent
! Lengths of documents being
viewed
EVALUATION METRICS FOR DYNAMIC SEARCH
○ Most session search metrics fold all those factors into
one overwhelmingly complex formula
○ The optimal value, i.e., the upper bound, of those metrics
varies greatly across search topics
○ In Cranfield-like settings (e.g. TREC), the difference is often
ignored
THE PROBLEM
TOY EXAMPLE
Relevance scores regarding topic-subtopic:

| Doc | Topic 1 (subtopics 1-1, 1-2) | Topic 2 (subtopics 2-1 … 2-5) |
|-----|------------------------------|-------------------------------|
| d1  | 1                            | 4                             |
| d2  | 3                            | 4                             |
| d3  |                              | 4                             |
| d4  |                              | 4                             |
| d5  |                              | 4                             |

| System  | Topic 1 run                    | CT-topic 1 | Topic 2 run           | CT-topic 2 | CT-avg | Normalized CT-avg |
|---------|--------------------------------|------------|-----------------------|------------|--------|-------------------|
| System1 | d1, irrel, irrel, irrel, irrel | 1          | d1, d3, d4, d5, irrel | 16         | 8.5    | 0.596             |
| System2 | d2, irrel, irrel, irrel, irrel | 3          | d1, d3, d4, d5, irrel | 14         | 8.5    | 0.787             |
| Optimal | d1, d2, irrel, irrel, irrel    | 4          | d1, d2, d3, d4, d5    | 17         |        |                   |
○ What is the optimal metric value that a system can
achieve?
! How to get the upper bound for each search topic?
! How does it affect the evaluation conclusions?
○ Variance of different topics
○ Normalization
RESEARCH QUESTIONS
$$\mathrm{Score}' = \sum_{topic} \frac{\mathrm{raw\_score}(topic, s) - \mathrm{lower\_bound}(topic)}{\mathrm{upper\_bound}(topic) - \mathrm{lower\_bound}(topic)}$$
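A sketch of the per-topic min-max normalization expressed by the formula above; topic_bounds is an assumed mapping from each topic to the metric's (lower, upper) bounds for that topic.

```python
def normalized_score(raw_scores, topic_bounds):
    """raw_scores: {topic: raw metric value of the evaluated system}.
    topic_bounds: {topic: (lower_bound, upper_bound) of the metric}."""
    total = 0.0
    for topic, raw in raw_scores.items():
        lo, hi = topic_bounds[topic]
        total += (raw - lo) / (hi - lo) if hi > lo else 0.0
    # Sum over topics, as in the formula; divide by the number of topics
    # if an average per topic is preferred.
    return total
```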
○ Session-DCG (sDCG)
! Järvelin et al. "Discounted cumulated gain based evaluation of multiple-query IR sessions." Advances in Information Retrieval (2008): 4-15.
○ Cube Test (CT)
! Luo et al. "The water filling model and the cube test: multi-dimensional evaluation for professional search." CIKM 2013.
○ Expected Utility (EU)
! Yang and Lad. "Modeling expected utility of multi-session information distillation." ICTIR 2009.
DYNAMIC SEARCH METRICS
$$\mathrm{EU} = \sum_{\pi} P(\pi) \Big( \sum_{(i,j)\in\pi} \sum_{c \in C_{i,j}} \theta_c \, \nu(c,i,j-1) - \lambda \, \mathrm{Cost}(i,j) \Big)$$

$$\mathrm{CT} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{|D_i|} \sum_{c} \theta_c \, rel_c(i,j) \, \nu(c,i,j-1)}{\sum_{i=1}^{m} \sum_{j=1}^{|D_i|} \mathrm{Cost}(i,j)}$$

$$\mathrm{sDCG} = \sum_{i=1}^{m} \sum_{j=1}^{|D_i|} \frac{rel(i,j)}{(1+\log_{b} j)\,(1+\log_{bq} i)}$$

where $i$ indexes the queries in a session, $j$ the rank positions, $c$ the subtopics, $\theta_c$ the subtopic weight, $rel$ the relevance gain, $\nu$ the novelty discount, $\lambda$ the cost penalty, and $\mathrm{Cost}(i,j)$ the user's effort.
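As a concrete reading of the sDCG double sum above, here is a direct implementation; session is a list of per-query gain lists, and the log bases bq (query discount) and b (rank discount) use commonly cited default values, which is an assumption here.

```python
import math

def sdcg(session, bq=4, b=2):
    """session[i] holds the graded relevance of the documents returned for the
    (i+1)-th query of the session, in ranked order."""
    total = 0.0
    for i, gains in enumerate(session, start=1):          # query position in session
        q_discount = 1 + math.log(i, bq)
        for j, rel in enumerate(gains, start=1):          # rank within the query
            total += rel / ((1 + math.log(j, b)) * q_discount)
    return total

# Toy session with two queries
print(sdcg([[3, 1, 0], [2, 2, 0, 1]]))
```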
○ sDCG
○ Cube Test
○ Expected Utility
DECONSTRUCT THE METRICS
Each metric combines the same four components: a Gain term, a Cost term, a Rank discount, and a Novelty discount.
○ sDCG: the gain rel(i,j) is discounted by rank within a query and by query position within the session.
○ Cube Test: the subtopic-weighted, novelty-discounted gain is divided by the cumulated Cost(i,j).
○ Expected Utility: the expected novelty-discounted gain is traded off against Cost(i,j) through the penalty λ.
BOUNDS ON DIFFERENT TOPICS
○ sDCG = discounted gain
○ CT = discounted gain / cost
○ EU = discounted gain − discounted cost
! The optimal value a metric can produce differs greatly from topic to topic, and this difference should not be ignored.
○ Rearrangement Inequality
○ In IR, the Probability Ranking Principle [4]
! The overall effectiveness of an IR system is maximized by ranking documents in descending order of their usefulness
OUR SOLUTION
$$x_1 y_n + x_2 y_{n-1} + \dots + x_n y_1 \;\le\; x_{\sigma(1)} y_1 + x_{\sigma(2)} y_2 + \dots + x_{\sigma(n)} y_n \;\le\; x_1 y_1 + x_2 y_2 + \dots + x_n y_n$$
for $x_1 \le x_2 \le \dots \le x_n$, $y_1 \le y_2 \le \dots \le y_n$, and any permutation $\sigma$.
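Following the rearrangement inequality, the per-topic upper bound of a discounted-gain metric is obtained by pairing the largest gains with the smallest discounts. The sketch below does this for an sDCG-style discount; the session layout (number of queries and results per query) is an assumption the evaluator has to fix.

```python
import math

def sdcg_upper_bound(all_grades, n_queries, results_per_query, bq=4, b=2):
    """Pair the largest relevance grades with the smallest discounts over a
    fixed session layout of n_queries x results_per_query result slots."""
    discounts = sorted(
        (1 + math.log(j, b)) * (1 + math.log(i, bq))
        for i in range(1, n_queries + 1)
        for j in range(1, results_per_query + 1)
    )
    gains = sorted(all_grades, reverse=True)
    return sum(g / d for g, d in zip(gains, discounts))
```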
NORMALIZATION EFFECT
○ sDCG = discounted gain
○ CT = discounted gain / cost
○ EU = discounted gain − λ · discounted cost, with λ = 0.01
! Using the bounds for normalization brings more fairness to the evaluation
Conclusion
• Our main contributions:
• Put the user into the models
• Created a bridge between information seeking / user behavior studies and machine learning
• Yielded a family of new generative retrieval models for complex, dynamic settings
• Able to explain the results
A Few Thoughts
• Information seeking is a Markov Decision Process, not a series of independent searches
• User actions that cost more effort, such as query changes, are stronger signals than clicks
• Search is also a learning process for the user, who evolves as well
• Users and search engines form a partnership to explore the information space
• They influence each other; it is a two-way communication
• Overly complex evaluation metrics might not be appropriate; the complexity should be modelled either in the model or in the metric, but not in both
Look into the future
• Dynamic IR models are good for modeling information seeking
• There is a lot of room to study the user-search engine interaction in a generative way
• The thinking presented here could generate new methods not only in retrieval and evaluation, but also in related fields
• Exciting!!
Thank You!
• Email: huiyang@cs.georgetown.edu
• Group Page: InfoSense at http://infosense.cs.georgetown.edu/
• Dynamic IR Website: http://www.dynamic-ir-modeling.org/
• Book: Dynamic Information Retrieval Modeling
• TREC Dynamic Domain Track: http://trec-dd.org/