DESIGNING STATES, ACTIONS,
AND REWARDS FOR USING
POMDP IN SESSION SEARCH
Jiyun Luo, Sicong Zhang, Xuchu Dong, Grace Hui Yang
InfoSense
Department of Computer Science
Georgetown University
{jl1749,sz303,xd47}@georgetown.edu
huiyang@cs.georgetown.edu
DYNAMIC IR: A NEW PERSPECTIVE ON SEARCH
E.g. Find what city and state Dulles airport is in; what shuttles, ride-sharing vans, and taxi cabs connect the airport to other cities; what hotels are close to the airport; where to find cheap off-airport parking; and which metro stops are close to Dulles airport.
[Figure: a user with an information need interacting with a search engine]
CHARACTERISTICS OF DYNAMIC IR
•  Trial-and-error
   -  q1: "dulles hotels"
   -  q2: "dulles airport"
   -  q3: "dulles airport location"
   -  q4: "dulles metrostop"
•  Rich interactions
   -  Query formulation
   -  Document clicks
   -  Document examination
   -  Eye movement
   -  Mouse movements
   -  etc.
•  Temporal dependency
[Figure: starting from an information need I, each iteration i = 1..n consists of a query qi, ranked documents Di, and clicked documents Ci]
REINFORCEMENT LEARNING (RL)
•  Fits well in this trial-and-error setting
•  Learns from repeated, varied attempts that continue until success
•  The learner (also known as the agent) learns from its dynamic interactions with the world
   -  rather than from a labeled dataset, as in supervised learning
•  The stochastic model assumes that the system's current state depends on the previous state and action in a non-deterministic manner
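The non-deterministic dependence of the next state on the previous state and action can be sketched as sampling from a transition distribution P(s' | s, a). This is only an illustration: the state names, the single action, and all probabilities below are hypothetical and do not come from the slides.

```python
import random

# Hypothetical transition model P(s' | s, a); every number here is illustrative.
TRANSITIONS = {
    ("relevant", "reformulate"): {"relevant": 0.7, "non_relevant": 0.3},
    ("non_relevant", "reformulate"): {"relevant": 0.4, "non_relevant": 0.6},
}


def sample_next_state(state, action, rng):
    """Draw the next state from P(s' | s, a) instead of computing it deterministically."""
    dist = TRANSITIONS[(state, action)]
    next_states = list(dist)
    weights = [dist[s] for s in next_states]
    return rng.choices(next_states, weights=weights, k=1)[0]


# Roll out a short trajectory; a seeded generator keeps the run repeatable.
rng = random.Random(0)
trajectory = ["non_relevant"]
for _ in range(5):
    trajectory.append(sample_next_state(trajectory[-1], "reformulate", rng))
```

Running the same state and action twice may give different successors, which is the property an RL agent has to cope with when it learns from interaction.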
PARTIALLY OBSERVABLE MARKOV DECISION PROCESS (POMDP)
[Figure: a chain of hidden states s0, s1, s2, s3, ... with actions a0, a1, a2, rewards r0, r1, r2, and observations o1, o2, o3]
   -  Hidden states
   -  Actions
   -  Rewards
   -  Markov
   -  Long-term optimization
   -  Observations, beliefs
[1] R. D. Smallwood et al., '73
GOAL OF THIS PAPER
Study the design of states, actions, and reward functions for RL algorithms in session search
A MARKOV CHAIN OF DECISION MAKING STATES
[Luo, Zhang, and Yang SIGIR 2014]
WIN-WIN SEARCH: DUAL-AGENT STOCHASTIC GAME
•  Partially Observable Markov Decision Process
•  Two agents
   -  Cooperative game
   -  Joint optimization
   -  Hidden states
   -  Actions
   -  Rewards
   -  Markov
[Luo, Zhang, and Yang SIGIR 2014]
PARTIALLY OBSERVABLE MARKOV DECISION PROCESS (POMDP)
•  A tuple (S, M, A, R, γ, O, Θ, B)
   -  S: state space
   -  M: state transition function
   -  A: actions
   -  R: reward function
   -  γ: discount factor, 0 < γ ≤ 1
   -  O: observations; an observation is a symbol emitted according to a hidden state
   -  Θ: observation function; Θ(s,a,o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o|s,a)
   -  B: belief space; a belief is a probability distribution over hidden states
[1] R. D. Smallwood et al., '73
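Because the states are hidden, the agent keeps a belief and revises it after each action and observation. A minimal sketch of the standard Bayesian belief update, b'(s') ∝ Θ(s',a,o) Σ_s M(s,a,s') b(s), is given below. The two states, the action name, the observation names, and all probabilities are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def update_belief(belief, action, observation, transition, observe):
    """Return b'(s') proportional to Theta(s', a, o) * sum_s M(s, a, s') * b(s)."""
    unnormalized = {}
    for s_next in belief:
        # Prediction step: probability of landing in s_next before seeing o.
        predicted = sum(transition[(s, action)][s_next] * p for s, p in belief.items())
        # Correction step: weight by the chance of observing o in s_next.
        unnormalized[s_next] = observe[(s_next, action)][observation] * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: v / total for s, v in unnormalized.items()}


# Hypothetical model: M as transition[(s, a)][s'], Theta as observe[(s', a)][o].
transition = {
    ("relevant", "search"): {"relevant": 0.8, "non_relevant": 0.2},
    ("non_relevant", "search"): {"relevant": 0.4, "non_relevant": 0.6},
}
observe = {
    ("relevant", "search"): {"click": 0.7, "no_click": 0.3},
    ("non_relevant", "search"): {"click": 0.2, "no_click": 0.8},
}

belief = {"relevant": 0.5, "non_relevant": 0.5}
new_belief = update_belief(belief, "search", "click", transition, observe)
# A click shifts the belief toward "relevant": 0.84 vs. 0.16.
```

The returned dictionary is again a probability distribution over hidden states, so it is a point in the belief space B.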
HIDDEN DECISION MAKING STATES
•  SRT: Relevant & Exploitation
•  SRR: Relevant & Exploration
•  SNRT: Non-Relevant & Exploitation
•  SNRR: Non-Relevant & Exploration
Example query changes:
   -  scooter price → scooter stores
   -  collecting old US coins → selling old US coins
   -  Philadelphia NYC travel → Philadelphia NYC train
   -  Boston tourism → NYC tourism
[Figure: state diagram over the four hidden states, with initial query q0]
[Luo, Zhang, and Yang SIGIR 2014]
ACTIONS
•  User action (Au)
   -  add query terms (+Δq)
   -  remove query terms (−Δq)
   -  keep query terms (qtheme)
•  Search engine action (Ase)
   -  increase / decrease / keep term weights
   -  switch a search technique on or off, e.g. whether or not to use query expansion
   -  adjust parameters of search techniques, e.g. select the best k for the top k documents used in PRF
•  Message from the user (Σu)
   -  clicked documents
   -  SAT-clicked documents
•  Message from the search engine (Σse)
   -  top k returned documents
Messages are essentially documents that an agent thinks are relevant.
[Luo, Zhang, and Yang SIGIR 2014]
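The three user actions can be read off a pair of consecutive queries. The sketch below splits a query change into kept terms (qtheme), added terms (+Δq), and removed terms (−Δq). It assumes lowercased whitespace tokenization and simple set membership, which is a simplification chosen for illustration and not necessarily how the cited systems extract these parts; the function name `query_change` is hypothetical.

```python
def query_change(prev_query, curr_query):
    """Split the change from prev_query to curr_query into (theme, added, removed) terms."""
    prev_terms = prev_query.lower().split()
    curr_terms = curr_query.lower().split()
    prev_set, curr_set = set(prev_terms), set(curr_terms)
    theme = [t for t in curr_terms if t in prev_set]        # qtheme: kept terms
    added = [t for t in curr_terms if t not in prev_set]    # +delta q: new terms
    removed = [t for t in prev_terms if t not in curr_set]  # -delta q: dropped terms
    return theme, added, removed


theme, added, removed = query_change("Merck lobbyists", "Merck lobbyists US policy")
# theme == ["merck", "lobbyists"], added == ["us", "policy"], removed == []
```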
2ND MODEL: QUERY CHANGE MODEL
•  Based on a Markov Decision Process (MDP)
•  States: queries
   -  Observable
•  Actions:
   -  User actions:
      •  Add / remove / keep query terms
      •  Correspond nicely to our definition of query change
   -  Search engine actions:
      •  Increase / decrease / keep term weights
•  Rewards:
   -  nDCG
[Guan, Zhang, and Yang SIGIR 2013]
SEARCH ENGINE AGENT'S ACTIONS
Term type | ∈ Di−1 | Action    | Example
qtheme    | Y      | increase  | "pocono mountain" in s6
qtheme    | N      | increase  | "france world cup 98 reaction" in s28, france world cup 98 reaction stock market → france world cup 98 reaction
+Δq       | Y      | decrease  | 'policy' in s37, Merck lobbyists → Merck lobbyists US policy
+Δq       | N      | increase  | 'US' in s37, Merck lobbyists → Merck lobbyists US policy
−Δq       | Y      | decrease  | 'reaction' in s28, france world cup 98 reaction → france world cup 98
−Δq       | N      | no change | 'legislation' in s32, bollywood legislation → bollywood law
[Guan, Zhang, and Yang SIGIR 2013]
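The table above amounts to a lookup from (term type, whether the term occurs in the previously retrieved documents Di−1) to a term-weight action. A small sketch of that lookup follows; the key names `"theme"`, `"added"`, `"removed"` and the function name are hypothetical labels introduced here for illustration.

```python
# (term type, term appears in D_{i-1}) -> term-weight action, as listed in the table.
ACTION_TABLE = {
    ("theme", True): "increase",
    ("theme", False): "increase",
    ("added", True): "decrease",
    ("added", False): "increase",
    ("removed", True): "decrease",
    ("removed", False): "no change",
}


def term_weight_action(term_type, in_prev_docs):
    """Look up the search engine's action for one query term."""
    return ACTION_TABLE[(term_type, in_prev_docs)]


# 'policy' is an added term that already occurs in D_{i-1}, so its weight goes down.
action = term_weight_action("added", True)
```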
QUERY CHANGE RETRIEVAL MODEL (QCM)
•  The Bellman equation gives the optimal value for an MDP:
$$V^*(s) = \max_a \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^*(s') \Big]$$
•  The reward function is used as the document relevance score function and is derived backwards from the Bellman equation:
$$\mathrm{Score}(q_i, d) = P(q_i \mid d) + \gamma \sum_a P(q_i \mid q_{i-1}, D_{i-1}, a) \max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$$
   -  Score(q_i, d): document relevance score
   -  P(q_i | d): current reward / relevance score
   -  P(q_i | q_{i-1}, D_{i-1}, a): query transition model
   -  max over D_{i-1} of P(q_{i-1} | D_{i-1}): maximum past relevance
[Guan, Zhang, and Yang SIGIR 2013]
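To make the Bellman equation concrete, here is a minimal value-iteration sketch on a toy MDP. It is not the QCM scoring procedure itself: the two states, two actions, rewards, and deterministic transitions are hypothetical, and the loop assumes γ < 1 so that the iteration converges.

```python
def value_iteration(states, actions, reward, transition, gamma, tol=1e-8):
    """Iterate V(s) = max_a [R(s,a) + gamma * sum_s' P(s'|s,a) V(s')] until it stabilizes."""
    values = {s: 0.0 for s in states}
    while True:
        new_values = {
            s: max(
                reward[(s, a)]
                + gamma * sum(p * values[s2] for s2, p in transition[(s, a)].items())
                for a in actions
            )
            for s in states
        }
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:
            return values


# Hypothetical toy MDP: staying in "good" pays 1, everything else pays 0.
states = ("good", "bad")
actions = ("stay", "switch")
reward = {("good", "stay"): 1.0, ("good", "switch"): 0.0,
          ("bad", "stay"): 0.0, ("bad", "switch"): 0.0}
transition = {("good", "stay"): {"good": 1.0}, ("good", "switch"): {"bad": 1.0},
              ("bad", "stay"): {"bad": 1.0}, ("bad", "switch"): {"good": 1.0}}

optimal = value_iteration(states, actions, reward, transition, gamma=0.5)
# With gamma = 0.5: V*(good) = 1 / (1 - 0.5) = 2 and V*(bad) = 0.5 * 2 = 1.
```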
CALCULATING THE TRANSITION MODEL
•  According to query change and search engine actions:
$$\mathrm{Score}(q_i, d) = \log P(q_i \mid d) + \alpha \sum_{t \in q_{theme}} \big[1 - P(t \mid d^*_{i-1})\big] \log P(t \mid d) - \beta \sum_{t \in +\Delta q,\ t \in d^*_{i-1}} P(t \mid d^*_{i-1}) \log P(t \mid d) + \varepsilon \sum_{t \in +\Delta q,\ t \notin d^*_{i-1}} \mathrm{idf}(t) \log P(t \mid d) - \delta \sum_{t \in -\Delta q} P(t \mid d^*_{i-1}) \log P(t \mid d)$$
   -  log P(q_i | d): current reward / relevance score
   -  α term: increase weights for theme terms
   -  β term: decrease weights for old added terms
   -  ε term: increase weights for novel added terms
   -  δ term: decrease weights for removed terms
[Guan, Zhang, and Yang SIGIR 2013]
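The weighting scheme on this slide (raise theme terms, lower old added terms, raise novel added terms, lower removed terms) can be sketched as a scoring function. Everything below is an assumption made for illustration: the function name, the dictionary-based language models, the `floor` probability for terms missing from a document, and the rule that a term counts as "in d*" when its probability there is positive. The weights α, β, ε, δ are passed in by the caller and no particular values are implied.

```python
import math


def qcm_score(base_log_prob, doc_lm, prev_best_lm, idf, theme, added, removed,
              alpha, beta, epsilon, delta, floor=1e-6):
    """Adjust log P(q_i|d) using the query change (theme, added, removed terms)."""

    def log_p(term):
        # log P(t|d); `floor` is a hypothetical stand-in for smoothing.
        return math.log(doc_lm.get(term, floor))

    def p_prev(term):
        # P(t|d*_{i-1}); zero means the term does not occur in d*_{i-1}.
        return prev_best_lm.get(term, 0.0)

    score = base_log_prob
    score += alpha * sum((1.0 - p_prev(t)) * log_p(t) for t in theme)
    score -= beta * sum(p_prev(t) * log_p(t) for t in added if p_prev(t) > 0.0)
    score += epsilon * sum(idf.get(t, 0.0) * log_p(t) for t in added if p_prev(t) == 0.0)
    score -= delta * sum(p_prev(t) * log_p(t) for t in removed)
    return score


# Hypothetical toy inputs; all weights set to 1.0 only to keep the arithmetic simple.
score = qcm_score(
    base_log_prob=-1.0,
    doc_lm={"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5},
    prev_best_lm={"a": 0.5, "b": 0.5, "d": 0.5},
    idf={"c": 2.0},
    theme=["a"], added=["b", "c"], removed=["d"],
    alpha=1.0, beta=1.0, epsilon=1.0, delta=1.0,
)
```

Since log P(t|d) is negative, a positive coefficient in front of a sum lowers the score of documents in which those terms are unlikely, which is how the signs realise the "increase" and "decrease" actions.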
RELATED WORK
•  Katja Hofmann, Shimon Whiteson, and Maarten de Rijke. Balancing exploration and exploitation in learning to rank online. In ECIR '11.
•  Xiaoran Jin, Marc Sloan, and Jun Wang. Interactive exploratory search for multi page search results. In WWW '13.
•  Xuehua Shen, Bin Tan, and Chengxiang Zhai. Implicit user modeling for personalized search. In CIKM '05.
•  Norbert Fuhr. A Probability Ranking Principle for Interactive Information Retrieval. In IRJ, 11(3), 2008.
STATE DESIGN OPTIONS
•  (S1) Fixed number of states
   -  use two binary relevance states
      •  "Relevant" or "Irrelevant"
   -  use four states
      •  whether the previously retrieved documents are relevant
      •  whether the user desires to explore
•  (S2) Varying number of states
   -  model queries as states: n queries → n states
   -  infinitely many states
      •  document relevance score distribution as states
      •  one document corresponds to one state
ACTION DESIGN OPTIONS
•  (A1) Technology selection
   -  a meta-level modeling of actions
      •  implement multiple search methods and select the best method for each query
      •  select the best parameters for each method
•  (A2) Term weight adjustment
   -  adjust term weights
•  (A3) Ranked list
   -  one possible ranking of a list of documents is a single action
      •  if the corpus size is N and the number of retrieved documents is n, then the size of the action space is:
$$P^N_n = N(N-1)\cdots(N-n+1) = \frac{N!}{(N-n)!}$$
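To get a feel for how quickly the ranked-list action space grows, the permutation count can be evaluated directly. The sketch uses Python's `math.perm` (available from Python 3.8); the corpus sizes in the example are arbitrary illustrative numbers, and the function name is hypothetical.

```python
import math


def ranked_list_action_space(corpus_size, list_length):
    """Number of ordered lists of `list_length` distinct documents from `corpus_size`."""
    return math.perm(corpus_size, list_length)  # N! / (N - n)!


small = ranked_list_action_space(10, 3)      # 10 * 9 * 8 = 720
large = ranked_list_action_space(1000, 10)   # already about 10**30 distinct rankings
```

Even a modest corpus yields an action space far too large to enumerate, which is consistent with the deck's later remark that the number of actions heavily affects efficiency.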
REWARD FUNCTION DESIGN OPTIONS
•  (R1) Explicit feedback
   -  Rewards generated from the user's relevance assessments
      •  nDCG, MAP, etc.
•  (R2) Implicit feedback
   -  Use implicit feedback obtained from user behavior
      •  Clicks, SAT clicks
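As an example of an explicit-feedback reward, nDCG can be computed from graded relevance labels of a ranked list. The sketch below uses the exponential-gain form, (2^rel − 1) / log2(rank + 1), which is one common variant, and it normalizes by the ideal ordering of the same list. Both choices are assumptions made here; the slides do not say which nDCG variant or which ideal ranking the experiments use.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain over the top k graded relevance labels."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg(gains, k=10):
    """DCG divided by the DCG of the ideal (descending) ordering of the same labels."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


perfect = ndcg([3, 2, 1, 0])   # already ideally ordered, so the reward is 1.0
swapped = ndcg([0, 3])         # the relevant document sits at rank 2: 1 / log2(3)
```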
SYSTEMS UNDER COMPARISON
•  S1A1R1(win-win): Luo, et al. Win-Win Search: Dual-Agent Stochastic Game in Session Search. SIGIR '14
•  S1A3R2: Zhang, et al. A POMDP Model for Content-Free Document Re-ranking. SIGIR '14
•  S2A2R1(QCM): Guan, et al. Utilizing Query Change for Session Search. SIGIR '13
•  S2A1R1(UCAIR): Shen, et al. Implicit user modeling for personalized search. CIKM '05
•  S2A3R1(IES): Jin, et al. Interactive exploratory search for multi page search results. WWW '13
•  Other combinations compared: S1A1R2, S1A2R1, S2A1R1
EXPERIMENTS
•  Evaluate on the TREC 2012 and 2013 Session Tracks
   -  The session logs contain
      •  session topic
      •  user queries
      •  previously retrieved URLs and snippets
      •  user clicks, dwell time, etc.
   -  Task: retrieve 2,000 documents for the last query in each session
   -  The evaluation is based on the whole session. Metrics include:
      •  nDCG@10, nDCG, nERR@10, and MAP
      •  wall-clock time, CPU cycles, and Big O notation
•  Datasets
   -  ClueWeb09 CatB
   -  ClueWeb12 CatB
   -  spam documents are removed
   -  duplicate documents are removed
EFFICIENCY VS. # OF ACTIONS ON TREC 2012
•  When the number of actions increases, efficiency tends to drop dramatically
•  S1A3R2, S1A2R1, S2A1R1(UCAIR), S2A2R1(QCM), and S2A1R1 are efficient
•  S1A1R1(win-win) and S1A1R2 are moderately efficient
•  S2A3R1(IES) is the slowest system
ACCURACY VS. EFFICIENCY
[Figure: accuracy vs. efficiency, one panel each for TREC 2012 and TREC 2013]
•  Accuracy tends to increase when efficiency decreases
•  S2A1R1(UCAIR) strikes a good balance between accuracy and efficiency
•  S1A1R1(win-win) gives impressive accuracy with a fair degree of efficiency
OUR RECOMMENDATION
•  If the focus is on accuracy
•  If the time limit is within one hour
•  If a balance of accuracy and efficiency is wanted
Note: the number of actions heavily affects efficiency and needs to be carefully designed
CONCLUSIONS
•  POMDPs are good for session search modeling
   -  Information seeking behaviors
•  Design questions
   -  States: What changes with each time step?
   -  Actions: How does our system change the state?
   -  Rewards: How can we measure feedback or effectiveness?
•  The design is something between an art and empirical experimentation
•  Balance between efficiency and accuracy
RESOURCES
•  InfoSense
   -  http://infosense.cs.georgetown.edu/
•  Dynamic IR Website
   -  Tutorials: http://www.dynamic-ir-modeling.org/
•  Live Online Search Engine: Dumpling
   -  http://dumplingproject.org
•  Upcoming Book
   -  Dynamic Information Retrieval Modeling
•  TREC 2015 Dynamic Domain Track
   -  http://trec-dd.org/
   -  Please participate if you are interested in interactive and dynamic search
THANK YOU
InfoSense
Georgetown University
huiyang@cs.georgetown.edu

Designing States, Actions, and Rewards for Using POMDP in Session Search

  • 1. DESIGNING STATES, ACTIONS, AND REWARDS FOR USING POMDP IN SESSION SEARCH Jiyun Luo, Sicong Zhang, Xuchu Dong, Grace Hui Yang InfoSense Department of Computer Science Georgetown University {jl1749,sz303,xd47}@georgetown.edu huiyang@cs.georgetown.edu 1
  • 2. 2 E.g. Find what city and state Dulles airport is in, what shuttles ride- sharing vans and taxi cabs connect the airport to other cities, what hotels are close to the airport, what are some cheap off-airport parking, and what are the metro stops close to the Dulles airport. DYNAMIC IR- A NEW PERSPECTIVE TO LOOK AT SEARCH Information need User Search Engine
  • 3. 3 ¢  Trial-and-error CHARACTERISTICS OF DYNAMIC IR 3¢  q1 – "dulles hotels" ¢  q2 – "dulles airport" ¢  q3 – "dulles airport location” ¢  q4 – "dulles metrostop"
  • 4. 4 ¢  Rich interactions —  Query formulation —  Document clicks —  Document examination —  eye movement —  mouse movements —  etc. 4 CHARACTERISTICS OF DYNAMIC IR
  • 5. 5 ¢  Temporal dependency 5 CHARACTERISTICS OF DYNAMIC IR clicked documentsquery D1 ranked documents q1 C1 D2 q2 C2 …… …… Dn qn Cn I informa(on  need itera(on  1 itera(on  2 itera(on  n
  • 6. 6 ¢  Fits well in this trial-and-error setting ¢  It is to learn from repeated, varied attempts which are continued until success. ¢  The learner (also known as agent) learns from its dynamic interactions with the world —  rather than from a labeled dataset as in supervised learning. ¢  The stochastic model assumes that the system's current state depend on the previous state and action in a non- deterministic manner REINFORCEMENT LEARNING (RL) 6
  • 7. PARTIALLY OBSERVABLE MARKOV DECISION PROCESS (POMDP) 7 ……s0 s1 r0 a0 s2 r1 a1 s3 r2 a2 —  Hidden states —  Actions —  Rewards 1R. D. Smallwood et. al., ‘73 o1 o2 o3 7 —  Markov —  Long Term Optimization —  Observations, Beliefs
  • 8. 8 8 Study designs of states, actions, reward functions of RL algorithms in Session Search GOAL OF THIS PAPER
  • 9. A MARKOV CHAIN OF DECISION MAKING STATES [Luo, Zhang, and Yang SIGIR 2014] 9
  • 10. 10 ¢ Partially Observable Markov Decision Process ¢ Two agents —  Cooperative game —  Joint Optimization WIN-WIN SEARCH: DUAL-AGENT STOCHASTIC GAME —  Hidden states —  Actions —  Rewards —  Markov [Luo, Zhang, and Yang SIGIR 2014]
  • 11. 11 ¢  A tuple (S, M, A, R, γ, O, Θ, B) —  S : state space —  M: state transition function —  A: actions —  R: reward function —  γ: discount factor, 0< γ ≤1 —  O: observations a symbol emitted according to a hidden state. —  Θ: observation function Θ(s,a,o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o|s,a). —  B: belief space Belief is a probability distribution over hidden states. PARTIALLY OBSERVABLE MARKOV DECISION PROCESS (POMDP) 1R. D. Smallwood et. al., ‘73
  • 12. 12 SRT Relevant & Exploitation SRR Relevant & Exploration SNRT Non-Relevant & Exploitation SNRR Non-Relevant & Exploration —  scooter price ⟶    scooter stores —  collecting old US coins⟶   selling old US coins —  Philadelphia NYC travel ⟶   Philadelphia NYC train —  Boston tourism ⟶ NYC tourism q0 HIDDEN DECISION MAKING STATES [Luo, Zhang, and Yang SIGIR 2014]
  • 13. ACTIONS —  User Action (Au) ¢  add query terms (+Δq) ¢  remove query terms (-Δq) ¢  keep query terms (qtheme) —  Search Engine Action(Ase) ¢  Increase/ decrease/ keep term weights ¢  Switch on or off a search technique, ¢  e.g. to use or not to use query expansion ¢  adjust parameters in search techniques ¢  e.g., select the best k for the top k docs used in PRF —  Message from the user(Σu) ¢  clicked documents ¢  SAT clicked documents —  Message from search engine(Σse) ¢  top k returned documents Messages are essentially documents that an agent thinks are relevant. [Luo, Zhang, and Yang SIGIR 2014] 13
  • 14. ¢  Based on Markov Decision Process (MDP) ¢  States: Queries —  Observable ¢  Actions: —  User actions: ¢  Add/remove/ unchange the query terms ¢  Nicely correspond to our definition of query change —  Search Engine actions: ¢  Increase/ decrease /remain term weights ¢  Rewards: —  nDCG 14 [Guan, Zhang, and Yang SIGIR 2013] 2ND MODEL: QUERY CHANGE MODEL
  • 15. SEARCH ENGINE AGENT’S ACTIONS ∈ Di−1 action Example qtheme Y increase “pocono mountain” in s6 N increase “france world cup 98 reaction” in s28, france world cup 98 reaction stock market→ france world cup 98 reaction +∆q Y decrease ‘policy’ in s37, Merck lobbyists → Merck lobbyists US policy N increase ‘US’ in s37, Merck lobbyists → Merck lobbyists US policy −∆q Y decrease ‘reaction’ in s28, france world cup 98 reaction → france world cup 98 N No change ‘legislation’ in s32, bollywood legislation →bollywood law 15[Guan, Zhang, and Yang SIGIR 2013]
  • 16. QUERY CHANGE RETRIEVAL MODEL (QCM) [Guan, Zhang, and Yang SIGIR 2013] ¢  The Bellman equation gives the optimal value for an MDP:
    $$V^*(s) = \max_a \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^*(s') \Big]$$
    ¢  The reward function is used as the document relevance score function and is derived by working backwards from the Bellman equation:
    $$\text{Score}(q_i, d) = P(q_i \mid d) + \gamma \sum_a P(q_i \mid q_{i-1}, D_{i-1}, a)\, \max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$$
    —  Score(q_i, d): document relevance score —  P(q_i | d): current reward / relevance score —  P(q_i | q_{i−1}, D_{i−1}, a): query transition model —  max P(q_{i−1} | D_{i−1}): maximum past relevance
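For a small, fully specified MDP, the Bellman equation can be solved by repeated backups (value iteration). The sketch below is a generic textbook procedure shown for illustration, not the QCM retrieval model itself; the dictionary layout, tolerance, and iteration cap are assumptions.

```python
def value_iteration(states, actions, reward, transition, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Approximate V*(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V*(s') ].

    reward:     dict (s, a) -> R(s, a)
    transition: dict (s, a, s_next) -> P(s_next | s, a)
    """
    values = {s: 0.0 for s in states}
    for _ in range(max_iter):
        new_values = {
            s: max(
                reward[(s, a)]
                + gamma * sum(transition[(s, a, s2)] * values[s2] for s2 in states)
                for a in actions
            )
            for s in states
        }
        # Stop once a full sweep changes no state value by more than tol.
        if max(abs(new_values[s] - values[s]) for s in states) < tol:
            return new_values
        values = new_values
    return values
```

With γ < 1 the backup is a contraction, so the loop converges; with γ = 1 convergence is not guaranteed and the iteration cap is what ends the loop.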
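  • 17. CALCULATING THE TRANSITION MODEL [Guan, Zhang, and Yang SIGIR 2013] ¢  According to Query Change and Search Engine Actions:
    $$\begin{aligned} \text{Score}(q_i, d) = {} & \log P(q_i \mid d) + \alpha \sum_{t \in q_{theme}} \left[ 1 - P(t \mid d^*_{i-1}) \right] \log P(t \mid d) \\ & - \beta \sum_{t \in +\Delta q,\; t \in d^*_{i-1}} P(t \mid d^*_{i-1}) \log P(t \mid d) + \epsilon \sum_{t \in +\Delta q,\; t \notin d^*_{i-1}} idf(t) \log P(t \mid d) \\ & - \delta \sum_{t \in -\Delta q} P(t \mid d^*_{i-1}) \log P(t \mid d) \end{aligned}$$
    —  log P(q_i | d): current reward / relevance score —  α term: increase weights for theme terms —  β term: decrease weights for old added terms —  ε term: increase weights for novel added terms —  δ term: decrease weights for removed terms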
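The scoring function on this slide combines the current relevance score with four weighted sums over theme, old added, novel added, and removed terms. The following is a rough sketch of that combination, assuming the term probabilities and idf values are precomputed and passed in as dictionaries; the function name, the probability floor, and treating "t ∈ d*_{i−1}" as "P(t | d*_{i−1}) > 0" are assumptions of this sketch, and no parameter values from the paper are implied.

```python
import math


def qcm_score(log_p_q_d, p_doc, p_prev, idf, theme, added, removed,
              alpha, beta, epsilon, delta, floor=1e-10):
    """Combine the current score with the four query-change adjustments.

    log_p_q_d: log P(q_i | d)
    p_doc:     dict term -> P(t | d)
    p_prev:    dict term -> P(t | d*_{i-1})
    floor:     assumed smoothing floor so that log is defined for unseen terms
    """
    def log_p(term):
        return math.log(max(p_doc.get(term, 0.0), floor))

    def prev(term):
        return p_prev.get(term, 0.0)

    score = log_p_q_d
    # Theme terms: weighted by how little they appeared in d*_{i-1}.
    score += alpha * sum((1.0 - prev(t)) * log_p(t) for t in theme)
    # Added terms already present in d*_{i-1} ("old" added terms).
    score -= beta * sum(prev(t) * log_p(t) for t in added if prev(t) > 0)
    # Added terms absent from d*_{i-1} ("novel" added terms), weighted by idf.
    score += epsilon * sum(idf.get(t, 0.0) * log_p(t) for t in added if prev(t) == 0)
    # Removed terms.
    score -= delta * sum(prev(t) * log_p(t) for t in removed)
    return score
```

The term lists can come from any query-change extraction step; the weights α, β, ε, δ are left as required arguments because they need to be tuned per collection.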
  • 18. RELATED WORK ¢  Katja Hofmann, Shimon Whiteson, and Maarten de Rijke. Balancing exploration and exploitation in learning to rank online. In ECIR '11. ¢  Xiaoran Jin, Marc Sloan, and Jun Wang. Interactive exploratory search for multi page search results. In WWW '13. ¢  Xuehua Shen, Bin Tan, and Chengxiang Zhai. Implicit user modeling for personalized search. In CIKM '05. ¢  Norbert Fuhr. A Probability Ranking Principle for Interactive Information Retrieval. In IRJ, 11(3), 2008.
  • 19. STATE DESIGN OPTIONS ¢  (S1) Fixed number of states —  use two binary relevance states ¢  “Relevant” or “Irrelevant” —  use four states ¢  whether the previously retrieved documents are relevant ¢  whether the user desires to explore ¢  (S2) Varying number of states —  model queries as states, n queries → n states —  infinite states ¢  document relevance score distribution as states ¢  one document corresponds to one state
  • 20. ACTION DESIGN OPTIONS ¢  (A1) Technology Selection —  a meta-level modeling of actions ¢  implement multiple search methods, and select the best method for each query ¢  select the best parameters for each method ¢  (A2) Term Weight Adjustment —  adjust term weights ¢  (A3) Ranked List —  One possible ranking of a list of documents is one single action ¢  If the corpus size is N and the number of retrieved documents is n, then the size of the action space is:
    $$P_N^n = N(N-1)\cdots(N-n+1) = \frac{N!}{(N-n)!}$$
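To get a feel for how quickly the (A3) action space grows, the permutation count can be evaluated directly. This is a small illustrative sketch using Python's standard `math.perm` (available from Python 3.8); the function name and the example sizes are made up.

```python
import math


def ranked_list_action_count(corpus_size, list_length):
    """Number of ordered lists of list_length distinct documents: N! / (N - n)!."""
    return math.perm(corpus_size, list_length)


# Even a toy corpus of 1,000 documents with 10 ranked results gives a count
# on the order of 10**30 possible actions.
toy_count = ranked_list_action_count(1000, 10)
```

This growth is why ranked-list actions tend to be far more expensive than selecting among a handful of search techniques.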
  • 21. REWARD FUNCTION DESIGN OPTIONS ¢  (R1) Explicit Feedback —  Rewards generated from the user’s relevance assessments ¢  nDCG, MAP, etc. ¢  (R2) Implicit Feedback —  Use implicit feedback obtained from user behavior ¢  Clicks, SAT clicks
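As one concrete instance of an (R1) reward, nDCG can be computed from graded relevance judgments of a ranked list. The sketch below is a simplified version that uses linear gains and takes the ideal ordering from the returned list only; official evaluation tools may use a different gain function and build the ideal ranking from all judged documents.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain over the top k graded judgments."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg(gains, k):
    """DCG normalized by the DCG of the same judgments sorted best-first."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0
```

An (R2) reward could be plugged in the same way by replacing the graded judgments with click or SAT-click indicators.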
  • 22. SYSTEMS UNDER COMPARISON ¢  S1A1R1 (win-win): Luo, et al. Win-Win Search: Dual-Agent Stochastic Game in Session Search. SIGIR '14 ¢  S1A3R2: Zhang, et al. A POMDP Model for Content-Free Document Re-ranking. SIGIR '14 ¢  S2A2R1 (QCM): Guan, et al. Utilizing Query Change for Session Search. SIGIR '13 ¢  S2A1R1 (UCAIR): Shen, et al. Implicit user modeling for personalized search. CIKM '05 ¢  S2A3R1 (IES): Jin, et al. Interactive exploratory search for multi page search results. WWW '13 ¢  Additional design combinations: S1A1R2, S1A2R1, S2A1R1
  • 23. EXPERIMENTS ¢  Evaluate on TREC 2012 and 2013 Session Tracks —  The session logs contain ¢  session topic ¢  user queries ¢  previously retrieved URLs, snippets ¢  user clicks, dwell time, etc. —  Task: retrieve 2,000 documents for the last query in each session —  The evaluation is based on the whole session. Metrics include: ¢  nDCG@10, nDCG, nERR@10, and MAP ¢  wall-clock time, CPU cycles, and Big O complexity ¢  Datasets —  ClueWeb09 CatB —  ClueWeb12 CatB —  spam documents are removed —  duplicate documents are removed
  • 24. EFFICIENCY VS. # OF ACTIONS ON TREC 2012 ¢  As the number of actions increases, efficiency tends to drop dramatically ¢  S1A3R2, S1A2R1, S2A1R1(UCAIR), S2A2R1(QCM) and S2A1R1 are efficient ¢  S1A1R1(win-win) and S1A1R2 are moderately efficient ¢  S2A3R1(IES) is the slowest system
  • 25. ACCURACY VS. EFFICIENCY (plots for TREC 2012 and TREC 2013) ¢  Accuracy tends to increase when efficiency decreases ¢  S2A1R1(UCAIR) strikes a good balance between accuracy and efficiency ¢  S1A1R1(win-win) gives impressive accuracy with a fair degree of efficiency
  • 26. OUR RECOMMENDATION ¢  If the focus is on accuracy ¢  If the time limit is within one hour ¢  If a balance of accuracy and efficiency is wanted ¢  Note: the number of actions heavily affects efficiency and needs to be carefully designed
  • 27. CONCLUSIONS ¢  POMDPs are good for session search modeling —  Information seeking behaviors ¢  Design questions —  States: What changes with each time step? —  Actions: How does our system change the state? —  Rewards: How can we measure feedback or effectiveness? ¢  The design is something between an art and empirical experimentation ¢  Balance between efficiency and accuracy
  • 28. RESOURCES ¢  InfoSense —  http://infosense.cs.georgetown.edu/ ¢  Dynamic IR Website —  Tutorials: http://www.dynamic-ir-modeling.org/ ¢  Live Online Search Engine – Dumpling —  http://dumplingproject.org ¢  Upcoming Book —  Dynamic Information Retrieval Modeling ¢  TREC 2015 Dynamic Domain Track —  http://trec-dd.org/ —  Please participate if you are interested in interactive and dynamic search