
Dynamic Information Retrieval Tutorial - SIGIR 2015


Dynamic aspects of Information Retrieval (IR), including changes found in data, users and systems, are increasingly being utilized in search engines and information filtering systems. Examples include large datasets containing sequential data capturing document dynamics and modern IR systems observing user dynamics through interactivity. Existing IR techniques are limited in their ability to optimize over changes, learn with minimal computational footprint and be responsive and adaptive.

The objective of this tutorial is to provide a comprehensive and up-to-date introduction to Dynamic Information Retrieval Modeling. Dynamic IR Modeling is the statistical modeling of IR systems that can adapt to change. It is a natural follow-up to previous statistical IR modeling tutorials with a fresh look on state-of-the-art dynamic retrieval models and their applications including session search and online advertising. The tutorial covers techniques ranging from classic relevance feedback to the latest applications of partially observable Markov decision processes (POMDPs) and presents to fellow researchers and practitioners a handful of useful algorithms and tools for solving IR problems incorporating dynamics.

http://www.dynamic-ir-modeling.org/

A newer version of this tutorial presented at WSDM 2015 can be found here http://www.slideshare.net/marcCsloan/dynamic-information-retrieval-tutorial-wsdm-2015
This version places greater emphasis on the underlying theory and includes a guest lecture on evaluation by Dr Emine Yilmaz. The newer version presents a wider range of applications of DIR in state-of-the-art research and features a guest lecture on evaluation by Prof Charles Clarke.

@inproceedings{Yang:2014:DIR:2600428.2602297,
author = {Yang, Hui and Sloan, Marc and Wang, Jun},
title = {Dynamic Information Retrieval Modeling},
booktitle = {Proceedings of the 37th International ACM SIGIR Conference on Research \& Development in Information Retrieval},
series = {SIGIR '14},
year = {2014},
isbn = {978-1-4503-2257-7},
location = {Gold Coast, Queensland, Australia},
pages = {1290--1290},
numpages = {1},
url = {http://doi.acm.org/10.1145/2600428.2602297},
doi = {10.1145/2600428.2602297},
acmid = {2602297},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {dynamic information retrieval modeling, probabilistic relevance model, reinforcement learning},
}


Dynamic Information Retrieval Tutorial - SIGIR 2015

  1. 1. SIGIR Tutorial July 7th 2014 Grace Hui Yang Marc Sloan Jun Wang Guest Speaker: Emine Yilmaz Dynamic Information Retrieval Modeling
  2. 2. Dynamic Information Retrieval ModelingTutorial 20142
  3. 3. Age of Empire Dynamic Information Retrieval ModelingTutorial 20143
  4. 4. Dynamic Information Retrieval Dynamic Information Retrieval ModelingTutorial 20144 Documents to explore Information need Observed documents User Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren’t, and satisfy their information need.
  5. 5. Evolving IR Dynamic Information Retrieval ModelingTutorial 20145  Paradigm shifts in IR as new models emerge  e.g.VSM → BM25 → Language Model  Different ways of defining relationship between query and document  Static → Interactive → Dynamic  Evolution in modeling user interaction with search engine
  6. 6. Outline Dynamic Information Retrieval ModelingTutorial 20146  Introduction  Static IR  Interactive IR  Dynamic IR  Theory and Models  Session Search  Reranking  GuestTalk: Evaluation
  7. 7. Conceptual Model – Static IR Dynamic Information Retrieval ModelingTutorial 20147 Static IR Interactive IR Dynamic IR  No feedback
  8. 8. Characteristics of Static IR Dynamic Information Retrieval ModelingTutorial 20148  Does not learn directly from user  Parameters updated periodically
  9. 9. Static Information Retrieval Model Dynamic Information Retrieval ModelingTutorial 20149 Learning to Rank
  10. 10. Dynamic Information Retrieval ModelingTutorial 201410 Commonly Used Static IR Models BM25 PageRank Language Model
  11. 11. Feedback in IR Dynamic Information Retrieval ModelingTutorial 201411
  12. 12. Outline Dynamic Information Retrieval ModelingTutorial 201412  Introduction  Static IR  Interactive IR  Dynamic IR  Theory and Models  Session Search  Reranking  GuestTalk: Evaluation
  13. 13. Conceptual Model – Interactive IR Dynamic Information Retrieval ModelingTutorial 201413 Static IR Interactive IR Dynamic IR  Exploit Feedback
  14. 14. Interactive User Feedback Dynamic Information Retrieval ModelingTutorial 201414 Like, dislike, pause, skip
  15. 15. Learn the user’s taste interactively! At the same time, provide good recommendations! Dynamic Information Retrieval ModelingTutorial 201415 Interactive Recommender Systems
  16. 16. Example - Multi Page Search Dynamic Information Retrieval ModelingTutorial 201416 Ambiguous Query
  17. 17. Example - Multi Page Search Dynamic Information Retrieval ModelingTutorial 201417 Topic: Car
  18. 18. Example - Multi Page Search Dynamic Information Retrieval ModelingTutorial 201418 Topic:Animal
  19. 19. Example – Interactive Search Dynamic Information Retrieval ModelingTutorial 201419 Click on ‘car’ webpage
  20. 20. Example – Interactive Search Dynamic Information Retrieval ModelingTutorial 201420 Click on ‘Next Page’
  21. 21. Example – Interactive Search Dynamic Information Retrieval ModelingTutorial 201421 Page 2 results: Cars
  22. 22. Example – Interactive Search Dynamic Information Retrieval ModelingTutorial 201422 Click on ‘animal’ webpage
  23. 23. Example – Interactive Search Dynamic Information Retrieval ModelingTutorial 201423 Page 2 results: Animals
  24. 24. Example – Dynamic Search Dynamic Information Retrieval ModelingTutorial 201424 Topic: Guitar
  25. 25. Example – Dynamic Search Dynamic Information Retrieval ModelingTutorial 201425 Diversified Page 1 Topics: Cars, animals, guitars
  26. 26. Toy Example Dynamic Information Retrieval Modeling Tutorial 2014  Multi-Page search scenario  User image searches for “jaguar”  Rank two of the four results over two pages: r = 0.5, r = 0.51, r = 0.9, r = 0.49
  27. 27. Toy Example – Static Ranking Dynamic Information Retrieval ModelingTutorial 201427  Ranked according to PRP Page 1 Page 2 1. 2. 𝑟 = 0.9 𝑟 = 0.51 1. 2. 𝑟 = 0.5 𝑟 = 0.49
  28. 28. Toy Example – Relevance Feedback Dynamic Information Retrieval Modeling Tutorial 2014  Interactive Search  Improve 2nd page based on feedback from 1st page  Use clicks as relevance feedback  Rocchio1 algorithm on terms in image webpage  $w_q' = \alpha w_q + \frac{\beta}{|D_r|}\sum_{d \in D_r} w_d - \frac{\gamma}{|D_n|}\sum_{d \in D_n} w_d$  New query closer to relevant documents and different to non-relevant documents 1Rocchio, J. J., ’71, Baeza-Yates & Ribeiro-Neto ‘99
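A minimal Python sketch of the Rocchio update above; the α, β, γ weights and the toy term vectors are illustrative choices, not values from the slides.

```python
import numpy as np

def rocchio(query_vec, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: move the query vector toward the centroid of the
    relevant documents and away from the centroid of the non-relevant ones."""
    q = np.asarray(query_vec, dtype=float)
    d_r = np.mean(relevant, axis=0) if len(relevant) else np.zeros_like(q)
    d_n = np.mean(nonrelevant, axis=0) if len(nonrelevant) else np.zeros_like(q)
    return alpha * q + beta * d_r - gamma * d_n

# hypothetical term-weight vectors over a 3-term vocabulary ("jaguar", "car", "cat")
q_new = rocchio([1.0, 0.0, 0.0],
                relevant=[[0.6, 0.4, 0.0]],        # clicked 'car' page
                nonrelevant=[[0.5, 0.0, 0.5]])     # skipped 'animal' page
print(q_new)
```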
  29. 29. Toy Example – Relevance Feedback Dynamic Information Retrieval ModelingTutorial 201429  Ranked according to PRP and Rocchio Page 1 Page 2 2. 𝑟 = 0.9 𝑟 = 0.51 1. 2. 𝑟 = 0.5 𝑟 = 0.49 1. * * Click
  30. 30. Toy Example – Relevance Feedback Dynamic Information Retrieval ModelingTutorial 201430  No click when searching for animals Page 1 Page 2 2. 𝑟 = 0.9 𝑟 = 0.51 1. 2. 1. ? ?
  31. 31. Toy Example – Value Function Dynamic Information Retrieval Modeling Tutorial 2014  Optimize both pages using dynamic IR  Bellman equation for value function  Simplified example: $V_t(\theta_t, \Sigma_t) = \max_{s_t}\big[\theta_{s_t} + E\big(V_{t+1}(\theta_{t+1}, \Sigma_{t+1}) \mid C_t\big)\big]$  $\theta_t, \Sigma_t$ = relevance and covariance of documents for page t  $C_t$ = clicks on page t  $V_t$ = ‘value’ of ranking on page t  Maximize value over all pages based on estimating feedback
  32. 32. Toy Example - Covariance Dynamic Information Retrieval Modeling Tutorial 2014  Covariance matrix represents similarity between images: Σ = [[1, 0.8, 0.1, 0], [0.8, 1, 0.1, 0], [0.1, 0.1, 1, 0.95], [0, 0, 0.95, 1]]
  33. 33. Toy Example – Myopic Value Dynamic Information Retrieval ModelingTutorial 201433  For myopic ranking, 𝑉2 = 16.380 Page 1 2. 1.
  34. 34. Toy Example – Myopic Ranking Dynamic Information Retrieval ModelingTutorial 201434  Page 2 ranking stays the same regardless of clicks Page 1 Page 2 2. 1. 2. 1.
  35. 35. Toy Example – Optimal Value Dynamic Information Retrieval ModelingTutorial 201435  For optimal ranking, 𝑉2 = 16.528 Page 1 2. 1.
  36. 36. Toy Example – Optimal Ranking Dynamic Information Retrieval ModelingTutorial 201436  If car clicked, Jaguar logo is more relevant on next page Page 1 Page 2 2. 1. 2. 1.
  37. 37. Toy Example – Optimal Ranking Dynamic Information Retrieval ModelingTutorial 201437  In all other scenarios, rank animal first on next page Page 1 Page 2 2. 1. 2. 1.
  38. 38. Interactive vs Dynamic IR Dynamic Information Retrieval ModelingTutorial 201438 • Treats interactions independently • Responds to immediate feedback • Static IR used before feedback received • Optimizes over all interaction • Long term gains • Models future user feedback • Also used at beginning of interaction Interactive Dynamic
  39. 39. Outline Dynamic Information Retrieval ModelingTutorial 201439  Introduction  Static IR  Interactive IR  Dynamic IR  Theory and Models  Session Search  Reranking  GuestTalk: Evaluation
  40. 40. Conceptual Model – Dynamic IR Dynamic Information Retrieval ModelingTutorial 201440 Static IR Interactive IR Dynamic IR  Explore and exploit Feedback
  41. 41. Characteristics of Dynamic IR Dynamic Information Retrieval ModelingTutorial 201441 Rich interactions  Query formulation  Document clicks  Document examination  eye movement  mouse movements  etc.
  42. 42. Characteristics of Dynamic IR Dynamic Information Retrieval ModelingTutorial 201442 Temporal dependency clicked documentsquery D1 ranked documents q1 C1 D2 q2 C2 …… …… Dn qn Cn I information need iteration 1 iteration 2 iteration n
  43. 43. Characteristics of Dynamic IR Dynamic Information Retrieval ModelingTutorial 201443 Overall goal Optimize over all iterations for goal IR metric or user satisfaction Optimal policy
  44. 44. Dynamic IR Dynamic Information Retrieval ModelingTutorial 201444  Dynamic IR explores actions  Dynamic IR learns from user and adjusts its actions  May hurt performance in a single stage, but improves over all stages
  45. 45. Applications to IR Dynamic Information Retrieval ModelingTutorial 201445  Dynamics found in lots of different aspects of IR  Dynamic Users  Users change behaviour over time, user history  Dynamic Documents  Information Filtering, document content change  Dynamic Queries  Changing query definition i.e.‘Twitter’  Dynamic Information Needs  Topic ontologies evolve over time  Dynamic Relevance  Seasonal/time of day change in relevance
  46. 46. User Interactivity in DIR Dynamic Information Retrieval ModelingTutorial 201446  Modern IR interfaces  Facets  Verticals  Personalization  Responsive to particular user  Complex log data  Mobile  Richer user interactions  Ads  Adaptive targeting
  47. 47. Big Data Dynamic Information Retrieval ModelingTutorial 201447  Data set sizes are always increasing  Computational footprint of learning to rank  Rich, sequential data 1Yin He et. al, ’11  Complex user model behaviour found in data, takes into account reading, skipping and re-reading behaviours1  Uses a POMDP Example
  48. 48. Online Learning to Rank Dynamic Information Retrieval ModelingTutorial 201448  Learning to rank iteratively on sequential data  Clicks as implicit user feedback/preference  Often uses multi-armed bandit techniques 1Katja Hofmann et. al., ’11 2YisongYue et. al.,‘09  Uses click models to interpret clicks and a contextual bandit to improve learning1  Pairwise comparison of rankings using duelling bandits formulation2 Example
  49. 49. Evaluation Dynamic Information Retrieval ModelingTutorial 201449  Use complex user interaction data to assess rankings  Compare ranking techniques in online testing  Minimise user dissatisfaction 1Jeff Huang et. al.,‘11 2Olivier Chapelle et. al.,‘12  Modelled cursor activity and correlated with eye tracking to validate good or bad abandonment1  Interleave search results from two ranking algorithms to determine which is better2 Example
  50. 50. Filtering and News Dynamic Information Retrieval Modeling Tutorial 2014  Adaptive techniques to personalize information filtering or news recommendation  Understand the complex dynamics of real world events in search logs  Capture temporal document change1 1Dennis Fetterly et. al.,‘03 2Stephen Robertson,‘02 3Jure Leskovec et. al.,‘09  Uses relevance feedback to adapt threshold sensitivity over time in information filtering to maximise overall utility2  Detected patterns and memes in news cycles and modeled how information spreads3 Example
  51. 51. Advertising Dynamic Information Retrieval ModelingTutorial 201451  Behavioural targeting and personalized ads  Learn when to display new ads  Maximise profit from available ads 1ShuaiYuan et. al.,‘12 2ZeyuanAllen Zhu et. al.,‘10  Uses a POMDP and ad correlation to find the optimal ad to display to a user1  Dynamic click model that can interpret complex user behaviour in logs and apply results to tail queries and unseen ads2 Example
  52. 52. Outline Dynamic Information Retrieval ModelingTutorial 201452  Introduction  Theory and Models  Session Search  Reranking  GuestTalk: Evaluation
  53. 53. Outline Dynamic Information Retrieval ModelingTutorial 201453  Introduction  Theory and Models  Why not use supervised learning  Markov Models  Session Search  Reranking  Evaluation
  54. 54. Why not use Supervised Learning for Dynamic IR Modeling? Dynamic Information Retrieval Modeling Tutorial 2014  Lack of enough training data  Dynamic IR problems contain a sequence of dynamic interactions  E.g. a series of queries in a session  Rare to find repeated sequences (close to zero)  Even in large query logs (WSCD 2013 & 2014, query logs from Yandex)  Chance of finding repeated adjacent query pairs is also low  Repeated / total adjacent query pairs: WSCD 2013: 476,390 / 17,784,583 (2.68%); WSCD 2014: 1,959,440 / 35,376,008 (5.54%)
  55. 55. Our Solution Dynamic Information Retrieval ModelingTutorial 201455 Try to find an optimal solution through a sequence of dynamic interactions Trial and Error: learn from repeated, varied attempts which are continued until success No Supervised Learning
  56. 56. Trial and Error Dynamic Information Retrieval ModelingTutorial 201456  q1 – "dulles hotels"  q2 – "dulles airport"  q3 – "dulles airport location"  q4 – "dulles metrostop"
  57. 57. Dynamic Information Retrieval ModelingTutorial 201457  Rich interactions Query formulation, Document clicks, Document examination, eye movement, mouse movements, etc.  Temporal dependency  Overall goal Recap – Characteristics of Dynamic IR
  58. 58. Dynamic Information Retrieval ModelingTutorial 201458  Model interactions, which means it needs to have place holders for actions;  Model information need hidden behind user queries and other interactions;  Set up a reward mechanism to guide the entire search algorithm to adjust its retrieval strategies;  Represent Markov properties to handle the temporal dependency. What is a Desirable Model for Dynamic IR A model inTrial and Error setting will do! A Markov Model will do!
  59. 59. Outline Dynamic Information Retrieval ModelingTutorial 201459  Introduction  Theory and Models  Why not use supervised learning  Markov Models  Session Search  Reranking  Evaluation
  60. 60. Markov Process  Markov Property1 (the “memoryless” property) for a system, its next state depends on its current state. Pr(Si+1|Si,…,S0)=Pr(Si+1|Si)  Markov Process a stochastic process with Markov property. e.g. Dynamic Information Retrieval ModelingTutorial 201460 1A.A. Markov,‘06 s0 s1 …… si ……si+1
  61. 61. Dynamic Information Retrieval ModelingTutorial 201461  Markov Chain  Hidden Markov Model  Markov Decision Process  Partially Observable Markov Decision Process  Multi-armed Bandit Family of Markov Models
  62. 62. Markov Chain Dynamic Information Retrieval Modeling Tutorial 2014  Discrete-time Markov process  Example: Google PageRank1  $\mathrm{PageRank}(S) = \frac{1-\alpha}{N} + \alpha \sum_{Y \in \Pi} \frac{\mathrm{PageRank}(Y)}{L(Y)}$, where N = # of pages, L(Y) = # of outlinks of Y, Π = pages linking to S, and α = random jump (damping) factor  [figure: web graph of pages A–E with their PageRank values]  The stable (stationary) state distribution of such an MC is PageRank  State S – web page  Transition probability M  PageRank: how likely a random web surfer will land on a page (S, M) 1L. Page et. al.,‘99
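A short Python sketch of PageRank as power iteration on this Markov chain; the tiny three-page web graph and the damping factor are illustrative.

```python
import numpy as np

def pagerank(adjacency, alpha=0.85, iters=100):
    """Power iteration on the PageRank Markov chain; returns the
    stationary distribution over pages."""
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    out[out == 0] = 1.0                # avoid divide-by-zero for dangling pages (simplification)
    M = A / out                        # row-stochastic transition matrix
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - alpha) / n + alpha * (r @ M)
    return r

# hypothetical web graph: A->B, A->C, B->C, C->A
links = [[0, 1, 1],
         [0, 0, 1],
         [1, 0, 0]]
print(pagerank(links))
```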
  63. 63. Hidden Markov Model  A Markov chain that states are hidden and observable symbols are emitted with some probability according to its states1. Dynamic Information Retrieval ModelingTutorial 201463 s0 s1 s2 …… o0 o1 o2 p0 𝑒0 p1 p2 𝑒1 𝑒2 Si– hidden state pi -- transition probability oi --observation ei --observation probability (emission probability) 1Leonard E. Baum et. al.,‘66 (S, M, O, e)
  64. 64. An HMM example for IR Dynamic Information Retrieval Modeling Tutorial 2014 Construct an HMM for each document1  S_i – “Document” or “General English”  p_i – a0 or a1  t_i – query term  e_i – Pr(t|D) or Pr(t|GE)  Document-to-query relevance: $P(D \mid q) \propto \prod_{t \in q} \big(a_0 P(t \mid GE) + a_1 P(t \mid D)\big)$ 1Miller et. al.‘99
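A minimal Python sketch of this two-state HMM scoring rule; the term probabilities and the mixing weight a1 are made-up illustrative values.

```python
def hmm_score(query_terms, p_t_doc, p_t_ge, a1=0.8):
    """Two-state HMM retrieval score: a mixture of the document language
    model and a 'general English' model, multiplied over query terms."""
    a0 = 1 - a1
    score = 1.0
    for t in query_terms:
        score *= a0 * p_t_ge.get(t, 1e-6) + a1 * p_t_doc.get(t, 0.0)
    return score

doc = {'jaguar': 0.02, 'car': 0.01}                    # P(t|D), toy values
ge = {'jaguar': 0.0005, 'car': 0.003, 'the': 0.05}     # P(t|GE), toy values
print(hmm_score(['jaguar', 'car'], doc, ge))
```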
  65. 65.  MDP extends MC with actions and rewards1 si– state ai – action ri – reward pi – transition probability p0 p1 p2 Markov Decision Process Dynamic Information Retrieval ModelingTutorial 201465 ……s0 s1 r0 a0 s2 r1 a1 s3 r2 a2 1R. Bellman,‘57 (S, M, A, R, γ)
  66. 66. Definition of MDP  A tuple (S, M, A, R, γ)  S : state space  M: transition matrix Ma(s, s') = P(s'|s, a)  A: action space  R: reward function R(s,a) = immediate reward taking action a at state s  γ: discount factor, 0< γ ≤1  policy π π(s) = the action taken at state s  Goal is to find an optimal policy π* maximizing the expected total rewards. Dynamic Information Retrieval ModelingTutorial 201466
  67. 67. Policy Policy: (s) = a According to which, select an action a at state s. (s0) =move right and ups0 (s1) =move right and ups1 (s2) = move rights2 Dynamic Information Retrieval ModelingTutorial 201467 [Slide altered from Carlos Guestrin’s ML lecture]
  68. 68. Value of Policy Dynamic Information Retrieval Modeling Tutorial 2014 Value: V(s) = expected long-term reward starting from s. Starting from s0 and taking action π(s0): $V(s_0) = E[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \gamma^3 R(s_3) + \gamma^4 R(s_4) + \cdots]$, with future rewards discounted by $\gamma \in [0,1)$ [Slide altered from Carlos Guestrin’s ML lecture]
  69. 69. Value of Policy Dynamic Information Retrieval Modeling Tutorial 2014 Same value expression; the figure now shows that the first action π(s0) may lead to several possible next states s1, s1′, s1″ with rewards R(s1), R(s1′), R(s1″) [Slide altered from Carlos Guestrin’s ML lecture]
  70. 70. Value of Policy Dynamic Information Retrieval Modeling Tutorial 2014 The expansion continues a further step: from each next state, π picks an action leading to states s2, s2′, s2″ with rewards R(s2), R(s2′), R(s2″) [Slide altered from Carlos Guestrin’s ML lecture]
  71. 71. Computing the value of a policy Dynamic Information Retrieval Modeling Tutorial 2014  $V(s_0) = E_\pi[R(s_0,a) + \gamma R(s_1,a) + \gamma^2 R(s_2,a) + \gamma^3 R(s_3,a) + \cdots]$  $= E_\pi\big[R(s_0,a) + \gamma \sum_{t=1}^{\infty} \gamma^{t-1} R(s_t,a)\big]$  $= R(s_0,a) + \gamma E_\pi\big[\sum_{t=1}^{\infty} \gamma^{t-1} R(s_t,a)\big]$  $= R(s_0,a) + \gamma \sum_{s'} M_{\pi(s)}(s,s')\, V(s')$  (value function; s is the current state, s′ a possible next state)
  72. 72. Optimality — Bellman Equation Dynamic Information Retrieval Modeling Tutorial 2014  The Bellman equation1 for an MDP is a recursive definition of the optimal (state-)value function V*(.): $V^*(s) = \max_a \big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V^*(s')\big]$  Optimal policy: $\pi^*(s) = \arg\max_a \big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V^*(s')\big]$ 1R. Bellman,‘57
  73. 73. Optimality — Bellman Equation Dynamic Information Retrieval Modeling Tutorial 2014  The Bellman equation can be rewritten with the action-value function Q: $V^*(s) = \max_a Q(s,a)$, where $Q(s,a) = R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V^*(s')$  Optimal policy: $\pi^*(s) = \arg\max_a Q(s,a)$  (this is the relationship between V and Q)
  74. 74. MDP algorithms Dynamic Information Retrieval ModelingTutorial 201474  Value Iteration  Policy Iteration  Modified Policy Iteration  Prioritized Sweeping  Temporal Difference (TD) Learning  Q-Learning Model free approaches Model-based approaches [Bellman, ’57, Howard,‘60, Puterman and Shin,‘78, Singh & Sutton,‘96, Sutton & Barto,‘98, Richard Sutton,‘88,Watkins,‘92] Solve Bellman equation Optimal valueV*(s) Optimal policy *(s) [Slide altered from Carlos Guestrin’s ML lecture]
  75. 75. Value Iteration Dynamic Information Retrieval Modeling Tutorial 2014  Initialization: initialize $V_0(s)$ arbitrarily  Loop / Iteration: $V_{i+1}(s) \leftarrow \max_a \big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_i(s')\big]$, $\pi(s) \leftarrow \arg\max_a \big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_i(s')\big]$  Stopping criterion: $\pi(s)$ is good enough 1Bellman,‘57
  76. 76. Greedy Value Iteration Dynamic Information Retrieval Modeling Tutorial 2014  Initialization: initialize $V_0(s)$ arbitrarily  Iteration: $V_{i+1}(s) \leftarrow \max_a \big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_i(s')\big]$  Stopping criterion: $\forall s\; |V_{i+1}(s) - V_i(s)| < \varepsilon$  Optimal policy: $\pi(s) \leftarrow \arg\max_a \big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_i(s')\big]$ 1Bellman,‘57
  77. 77. Greedy Value Iteration Dynamic Information Retrieval Modeling Tutorial 2014 Algorithm: 1. For each state s∈S, initialize V0(s) arbitrarily. 2. i ← 0. 3. Repeat: 3.1 i ← i+1; 3.2 for each s ∈ S, $V_i(s) \leftarrow \max_a \big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_{i-1}(s')\big]$; until $\forall s\; |V_i(s) - V_{i-1}(s)| < \varepsilon$. 4. For each s ∈ S, $\pi(s) \leftarrow \arg\max_a \big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_i(s')\big]$.
  78. 78. Greedy Value Iteration Dynamic Information Retrieval Modeling Tutorial 2014  $V(s) = \max_a \big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V(s')\big]$  Transition matrices: Ma1 = [[0.3, 0.7, 0], [1.0, 0, 0], [0.8, 0.2, 0]], Ma2 = [[0, 0, 1.0], [0, 0.2, 0.8], [0, 1.0, 0]]  V(0)(S1)=max{R(S1,a1), R(S1,a2)}=6, V(0)(S2)=max{R(S2,a1), R(S2,a2)}=4, V(0)(S3)=max{R(S3,a1), R(S3,a2)}=8  V(1)(S1)=max{3+0.96*(0.3*6+0.7*4), 6+0.96*(1.0*8)} = max{3+0.96*4.6, 6+0.96*8.0} = max{7.416, 13.68} = 13.68
  79. 79. Greedy Value Iteration Dynamic Information Retrieval Modeling Tutorial 2014  $V(s) = \max_a \big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V(s')\big]$  Iterates (i: V(i)(S1), V(i)(S2), V(i)(S3)): 0: 6, 4, 8; 1: 13.680, 9.760, 13.376; 2: 18.841, 17.133, 20.380; 3: 25.565, 22.087, 25.759; …; 200: 168.039, 165.316, 168.793  Converged policy: π(S1) = a2, π(S2) = a1, π(S3) = a1
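A small Python sketch of greedy value iteration on the worked example above (γ = 0.96, transition matrices from the slide). The slide only shows R(S1,a1)=3, R(S1,a2)=6, R(S2,a1)=4 and R(S3,a1)=8, so the unshown rewards R(S2,a2) and R(S3,a2) are assumed to be 0 here.

```python
import numpy as np

# Transition matrices from the slide; R(S2,a2) and R(S3,a2) are assumed = 0.
M = {'a1': np.array([[0.3, 0.7, 0.0], [1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]),
     'a2': np.array([[0.0, 0.0, 1.0], [0.0, 0.2, 0.8], [0.0, 1.0, 0.0]])}
R = {'a1': np.array([3.0, 4.0, 8.0]), 'a2': np.array([6.0, 0.0, 0.0])}
gamma = 0.96

V = np.array([max(R[a][s] for a in M) for s in range(3)])      # V(0) = 6, 4, 8
for _ in range(200):
    V = np.max([R[a] + gamma * M[a] @ V for a in M], axis=0)   # Bellman backup
policy = [max(M, key=lambda a: R[a][s] + gamma * M[a][s] @ V) for s in range(3)]
print(V, policy)    # ≈ [168.0, 165.3, 168.8], ['a2', 'a1', 'a1']
```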
  80. 80. Policy Iteration Dynamic Information Retrieval Modeling Tutorial 2014  Initialization: $V^{\pi_0}(s) \leftarrow 0$, $\pi_0(s) \leftarrow$ arbitrary policy  Iteration (over i):  Policy Evaluation: iterate $V^{\pi_i}(s) \leftarrow R(s, \pi_i(s)) + \gamma \sum_{s'} M_{\pi_i(s)}(s,s')\, V^{\pi_i}(s')$ to convergence  Policy Improvement: $\pi_{i+1}(s) \leftarrow \arg\max_a \big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V^{\pi_i}(s')\big]$  Stopping criterion: policy stops changing 1Howard,‘60
  81. 81. Policy Iteration Dynamic Information Retrieval Modeling Tutorial 2014 Algorithm: 1. For each state s∈S: V(s) ← 0, π0(s) ← arbitrary policy, i ← 0. 2. Repeat: 2.1 repeat, for each s ∈ S: V′(s) ← V(s); $V(s) \leftarrow R(s, \pi_i(s)) + \gamma \sum_{s'} M_{\pi_i(s)}(s,s')\, V(s')$; until $\forall s\; |V(s) - V'(s)| < \varepsilon$. 2.2 For each s ∈ S: $\pi_{i+1}(s) \leftarrow \arg\max_a \big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V(s')\big]$. 2.3 i ← i+1. Until $\pi_i = \pi_{i-1}$.
  82. 82. Modified Policy Iteration  The “Policy Evaluation” step in Policy Iteration is time- consuming, especially when the state space is large.  The Modified Policy Iteration calculates an approximated policy evaluation by running just a few iterations Dynamic Information Retrieval ModelingTutorial 201482 Modified Policy Iteration Policy Iteration GreedyValue Iterationk=1 k=∞
  83. 83. Modified Policy Iteration Dynamic Information Retrieval Modeling Tutorial 2014 Algorithm: 1. For each state s∈S: V(s) ← 0, π0(s) ← arbitrary policy, i ← 0. 2. Repeat: 2.1 repeat k times, for each s ∈ S: $V(s) \leftarrow R(s, \pi_i(s)) + \gamma \sum_{s'} M_{\pi_i(s)}(s,s')\, V(s')$. 2.2 For each s ∈ S: $\pi_{i+1}(s) \leftarrow \arg\max_a \big[R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V(s')\big]$. 2.3 i ← i+1. Until $\pi_i = \pi_{i-1}$.
  84. 84. MDP algorithms Dynamic Information Retrieval ModelingTutorial 201484  Value Iteration  Policy Iteration  Modified Policy Iteration  Prioritized Sweeping  Temporal Difference (TD) Learning  Q-Learning Model free approaches Model-based approaches [Bellman, ’57, Howard,‘60, Puterman and Shin,‘78, Singh & Sutton,‘96, Sutton & Barto,‘98, Richard Sutton,‘88,Watkins,‘92] Solve Bellman equation Optimal valueV*(s) Optimal policy *(s) [Slide altered from Carlos Guestrin’s ML lecture]
  85. 85. Temporal Difference Learning Dynamic Information Retrieval Modeling Tutorial 2014  Monte Carlo sampling can be used for model-free policy iteration  Estimate $V^\pi(s)$ in “Policy Evaluation” by the average reward of trajectories from s  However, parts of the trajectories can be reused  So we estimate via an expectation over the next state: $V^\pi(s) \leftarrow r + \gamma E[V^\pi(s') \mid s, a]$  The simplest estimation: $V^\pi(s) \leftarrow r + \gamma V^\pi(s')$  A smoothed version: $V^\pi(s) \leftarrow \alpha\,(r + \gamma V^\pi(s')) + (1 - \alpha)\, V^\pi(s)$  TD-Learning rule: $V^\pi(s) \leftarrow V^\pi(s) + \alpha\,(r + \gamma V^\pi(s') - V^\pi(s))$, where $r + \gamma V^\pi(s') - V^\pi(s)$ is the temporal difference, r is the immediate reward and α is the learning rate Richard Sutton,‘88 Singh & Sutton,‘96 Sutton & Barto,‘98
  86. 86. Dynamic Information Retrieval ModelingTutorial 201486 1. For each state s∈S Initialize V 𝜋(s) arbitrarily End for 2. For each step in the state sequence 2.1 Initialize s 2.2 repeat 2.2.1 take action a at state s according to 𝜋 2.2.2 observe immediate reward r and the next state 𝑠′ 2.2.3 𝑉 𝜋 s ← 𝑉 𝜋 𝑠 + 𝛼 𝑟 + 𝛾𝑉 𝜋 𝑠′ − 𝑉 𝜋(𝑠) 2.2.4 𝑠 ← 𝑠′ Until s is a terminal state End for Algorithm Temporal Difference Learning
  87. 87. Q-Learning Dynamic Information Retrieval Modeling Tutorial 2014  TD-Learning rule: $V^\pi(s) \leftarrow V^\pi(s) + \alpha\,(r + \gamma V^\pi(s') - V^\pi(s))$  Q-learning rule: $Q(s,a) \leftarrow Q(s,a) + \alpha\,(r + \gamma \max_{a'} Q(s',a') - Q(s,a))$  with $V(s) = \max_a Q(s,a)$, $\pi^*(s) = \arg\max_a Q^*(s,a)$ and $Q^*(s,a) = R(s,a) + \gamma \sum_{s'} M_a(s,s') \max_{a'} Q^*(s',a')$
  88. 88. Q-Learning Dynamic Information Retrieval ModelingTutorial 201488 1. For each state s∈S and a∈A initialize Q0(s,a) arbitrarily End for 2. 𝑖 ← 0 3. For each step in the state sequence 3.1 Initialize s 3.2 Repeat 3.2.1 𝑖 ← 𝑖 + 1 3.2.2 select an action a at state s according to Qi-1 3.2.3 take action a, observe immediate reward r and the next state 𝑠′ 3.2.4 𝑄𝑖 𝑠, 𝑎 ← 𝑄𝑖−1 𝑠, 𝑎 + 𝛼 𝑟 + 𝛾 max 𝑎′ 𝑄𝑖−1 𝑠′ , 𝑎′ − 𝑄𝑖−1(𝑠, 𝑎) 3.2.5 𝑠 ← 𝑠′ Until s is a terminal state End for 4. For each 𝑠 ∈ 𝑆 π s ← arg 𝑚𝑎𝑥 𝑎 𝑄𝑖 𝑠, 𝑎 End for Algorithm
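A compact Python sketch of tabular Q-learning in the spirit of the algorithm above; the two-state toy environment, episode scheme and hyper-parameters (α, γ, ε) are illustrative assumptions.

```python
import random

def q_learning(states, actions, step, start, episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
    """Model-free Q-learning: update Q(s,a) from sampled transitions
    using an epsilon-greedy behaviour policy."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s, done = start, False
        while not done:
            a = random.choice(actions) if random.random() < eps \
                else max(actions, key=lambda x: Q[(s, x)])
            s_next, r, done = step(s, a)                 # observe reward and next state
            target = r + (0.0 if done else gamma * max(Q[(s_next, b)] for b in actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q

# hypothetical 2-state task: action 1 in state 0 reaches the terminal goal (reward 1)
def step(s, a):
    return (1, 1.0, True) if (s == 0 and a == 1) else (0, 0.0, False)

Q = q_learning(states=[0, 1], actions=[0, 1], step=step, start=0)
print(max([0, 1], key=lambda a: Q[(0, a)]))   # learned action at state 0 -> 1
```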
  89. 89. Apply an MDP to an IR Problem Dynamic Information Retrieval ModelingTutorial 201489  We can model IR systems using a Markov Decision Process  Is there a temporal component?  States –What changes with each time step?  Actions – How does your system change the state?  Rewards – How do you measure feedback or effectiveness in your problem at each time step?  Transition Probability – Can you determine this?  If not, then model free approach is more suitable
  90. 90. Apply an MDP to an IR Problem - Example Dynamic Information Retrieval ModelingTutorial 201490  User agent in session search  States – user’s relevance judgement  Action – new query  Reward – information gained
  91. 91. Apply an MDP to an IR Problem - Example Dynamic Information Retrieval ModelingTutorial 201491  Search engine’s perspective  What if we can’t directly observe user’s relevance judgement?  Click ≠ relevance ? ? ? ?
  92. 92. Dynamic Information Retrieval ModelingTutorial 201492  Markov Chain  Hidden Markov Model  Markov Decision Process  Partially Observable Markov Decision Process  Multi-armed Bandit Family of Markov Models
  93. 93. POMDP Model Dynamic Information Retrieval ModelingTutorial 201493 ……s0 s1 r0 a0 s2 r1 a1 s3 r2 a2  Hidden states  Observations  Belief 1R. D. Smallwood et. al.,‘73 o1 o2 o3
  94. 94. POMDP Definition Dynamic Information Retrieval ModelingTutorial 201494  A tuple (S, M,A, R, γ, O, Θ, B)  S : state space  M: transition matrix  A: action space  R: reward function  γ: discount factor, 0< γ ≤1  O: observation set an observation is a symbol emitted according to a hidden state.  Θ: observation function Θ(s,a,o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o|s,a).  B: belief space Belief is a probability distribution over hidden states.
  95. 95. POMDP → Belief Update Dynamic Information Retrieval Modeling Tutorial 2014  The agent uses a state estimator to update its belief about the hidden states: $b' = SE(b, a, o')$  $b'(s') = P(s' \mid o', a, b) = \frac{P(s', o' \mid a, b)}{P(o' \mid a, b)} = \frac{\Theta(s', a, o') \sum_s M(s, a, s')\, b(s)}{P(o' \mid a, b)}$
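A small Python sketch of this state estimator; the two-state transition and observation matrices are made-up toy values.

```python
import numpy as np

def belief_update(b, a, o, M, Theta):
    """POMDP belief update: b'(s') ∝ Θ(s',a,o) * Σ_s M_a(s,s') b(s)."""
    b_new = Theta[a][:, o] * (b @ M[a])    # observation likelihood × predicted belief
    return b_new / b_new.sum()             # normalise by P(o'|a,b)

# hypothetical 2-state, 1-action, 2-observation toy model
M = {0: np.array([[0.7, 0.3], [0.2, 0.8]])}        # M[a][s, s']
Theta = {0: np.array([[0.9, 0.1], [0.3, 0.7]])}    # Theta[a][s', o]
print(belief_update(np.array([0.5, 0.5]), a=0, o=1, M=M, Theta=Theta))
```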
  96. 96. POMDP → Bellman Equation Dynamic Information Retrieval Modeling Tutorial 2014  The Bellman equation for a POMDP: $V(b) = \max_a \big[r(b,a) + \gamma \sum_{o'} P(o' \mid a, b)\, V(b')\big]$  A POMDP can be transformed into a continuous belief MDP (B, M′, A, r, γ)  B: the continuous belief space  M′: transition function $M'_a(b, b') = \sum_{o' \in O} \mathbf{1}_{a,o'}(b', b)\, \Pr(o' \mid a, b)$, where $\mathbf{1}_{a,o'}(b', b) = 1$ if $SE(b, a, o') = b'$ and 0 otherwise  A: action space  r: reward function $r(b, a) = \sum_{s \in S} b(s)\, R(s, a)$
  97. 97. Dynamic Information Retrieval ModelingTutorial 201497 The optimal policy of a POMDP The optimal policy of its belief MDP 1L. Kaelbling et. al., ’98 A variation of the value iteration algorithm Solving POMDPs – The Witness Algorithm
  98. 98. Policy Tree Dynamic Information Retrieval ModelingTutorial 201498 • A policy tree of depth i is an i-step non-stationary policy • As if we run value iteration until the ith iteration a(h) ok(h) ok a11 a21 a2k a2l … … … … … … … … … … … o1 ol …aik … a(i-1)k ai1 ail o1 olok i steps to go i-1 steps to go 2 steps to go 1 step to go
  99. 99. Value of a Policy Tree Dynamic Information Retrieval Modeling Tutorial 2014  We can only determine the value of a policy tree h from some belief state b, because the agent never knows the exact state: $V_h(b) = \sum_{s \in S} b(s)\, V_h(s)$  $V_h(s) = R(s, a(h)) + \gamma \sum_{s' \in S} M_{a(h)}(s, s') \sum_{o_k \in O} \Theta(s', a(h), o_k)\, V_{o_k(h)}(s')$, where a(h) is the action at the root node of h and $o_k(h)$ is the (i−1)-step subtree associated with $o_k$ under the root node of h
  100. 100. Idea of the Witness Algorithm Dynamic Information Retrieval ModelingTutorial 2014100  For each action a, compute Γ𝑖 𝑎 , the set of candidate i-step policy trees with action a at their roots  The optimal value function at the ith step, 𝑉𝑖 ∗ (b), is the upper surface of the value functions of all i-step policy trees.
  101. 101. Optimal value function Dynamic Information Retrieval ModelingTutorial 2014101  Geometrically, 𝑉𝑖 ∗ (b) is piecewise linear and convex. An example for a two-state POMDP b(s1)+b(s2)=1 Simplex constraint The belief space is one-dimensional Vh2(b) Vh3(b) Vh1(b) Vh5(b) Vh4(b) 𝑉𝑖 ∗ 𝑏 = max ℎ∈H 𝑉ℎ 𝑏 Pruning the Set of PolicyTrees
  102. 102. Outlines of the Witness Algorithm Dynamic Information Retrieval ModelingTutorial 2014102 Algorithm 1.𝐻1 ←{} 2. i ← 1 3. Repeat 3.1 i ← i+1 3.2 For each a in A Γ𝑖 𝑎 ← witness(𝐻i−1, a) end for 3.3 Prune Γ𝑖 𝑎 𝑎 to get 𝐻i until 𝑠𝑢𝑝 𝑏|Vi(b) − Vi−1(b)| < 𝜀 the inner loop
  103. 103. Inner Loop of the Witness Algorithm Dynamic Information Retrieval ModelingTutorial 2014103 Inner loop of the witness algorithm 1. Select a belief b arbitrarily. Generate a best i-step policy tree hi. Add ℎi to an agenda. 2. In each iteration 2.1 Select a policy tree ℎ 𝑛𝑒𝑤 from the agenda. 2.2 Look for a witness point b using Za and ℎ 𝑛𝑒𝑤. 2.3 If find such a witness point b, 2.3.1 Calculate the best policy tree ℎ 𝑏𝑒𝑠𝑡 for b. 2.3.2 Add ℎ 𝑏𝑒𝑠𝑡 to Za. 2.3.3 Add all the alternative trees of ℎ 𝑏𝑒𝑠𝑡 to the agenda. 2.4 Else remove ℎ 𝑛𝑒𝑤 from the agenda. 3. Repeat the above iteration until the agenda is empty.
  104. 104. Other Solutions Dynamic Information Retrieval ModelingTutorial 2014104  QMDP1  MC-POMDP (Monte Carlo POMDP)2  Grid BasedApproximation3  Belief Compression4 …… 1 Thrun et. al.,‘06 2 Thrun et. al.,‘05 3 Lovejoy,‘91 4 Roy,‘03
  105. 105. Dynamic Information Retrieval ModelingTutorial 2014105 POMDP Dynamic IR Environment Documents Agents User, Search engine States Queries, User’s decision making status, Relevance of documents, etc Actions Provide a ranking of documents, Weigh terms in the query, Add/remove/unchange the query terms, Switch on or switch off a search technology, Adjust parameters for a search technology Observations Queries, Clicks, Document lists, Snippets, Terms, etc Rewards Evaluation measures (such as DCG, NDCG or MAP) Clicking information Transition matrix Given in advance or estimated from training data. Observation function Problem dependent, Estimated based on sample datasets Applying POMDP to Dynamic IR
  106. 106. Session Search Example - States SRT Relevant & Exploitation SRR Relevant & Exploration SNRT Non-Relevant & Exploitation SNRR Non-Relevant & Exploration  scooter price ⟶ scooter stores  Hartford visitors ⟶ Hartford Connecticut tourism  Philadelphia NYC travel ⟶ Philadelphia NYC train  distance NewYork Boston ⟶ maps.bing.com q0 106 [ J. Luo ,et al., ’14]
  107. 107. Session Search Example - Actions (Au, Ase)  User Action(Au)  Add query terms (+Δq)  Remove query terms (-Δq)  keep query terms (qtheme)  clicked documents  SAT clicked documents  Search Engine Action(Ase)  increase/decrease/keep term weights,  Switch on or switch off query expansion  Adjust the number of top documents used in PRF  etc. 107 [ J. Luo et al., ’14]
  108. 108. Multi Page Search Example - States & Actions Dynamic Information Retrieval Modeling Tutorial 2014 State: Relevance of document Action: Ranking of documents Observation: Clicks Belief: Multivariate Gaussian Reward: DCG over 2 pages [Xiaoran Jin et. al., ’13]
  109. 109. SIGIR Tutorial July 7th 2014 Grace Hui Yang Marc Sloan Jun Wang Guest Speaker: Emine Yilmaz Dynamic Information Retrieval Modeling Exercise
  110. 110. Dynamic Information Retrieval ModelingTutorial 2014110  Markov Chain  Hidden Markov Model  Markov Decision Process  Partially Observable Markov Decision Process  Multi-Armed Bandit Family of Markov Models
  111. 111. Multi Armed Bandits (MAB) Dynamic Information Retrieval ModelingTutorial 2014111 …… …… Which slot machine should I select in this round? Reward
  112. 112. Multi Armed Bandits (MAB) Dynamic Information Retrieval ModelingTutorial 2014112 I won! Is this the best slot machine? Reward
  113. 113. MAB Definition Dynamic Information Retrieval ModelingTutorial 2014113  A tuple (S,A, R, B) S : hidden reward distribution of each bandit A: choose which bandit to play R: reward for playing bandit B: belief space, our estimate of each bandit’s distribution
  114. 114. Comparison with Markov Models Dynamic Information Retrieval ModelingTutorial 2014114  Single state Markov Decision Process No transition probability  Similar to POMDP in that we maintain a belief state  Action = choose a bandit, does not affect state  Does not‘plan ahead’ but intelligently adapts  Somewhere between interactive and dynamic IR
  115. 115. Markov Multi Armed Bandits Dynamic Information Retrieval ModelingTutorial 2014115 …… …… Markov Process 1 Markov Process 2 Markov Process k Which slot machine should I select in this round? Reward
  116. 116. Markov Multi Armed Bandits Dynamic Information Retrieval ModelingTutorial 2014116 …… …… Markov Process 1 Markov Process 2 Markov Process k Markov Process Action Which slot machine should I select in this round? Reward
  117. 117. MAB Policy Reward Dynamic Information Retrieval Modeling Tutorial 2014  MAB algorithm describes a policy 𝜋 for choosing bandits  Maximise rewards from chosen bandits over all time steps  Minimize regret: $\sum_{t=1}^{T} \big[\mathrm{Reward}(a^*) - \mathrm{Reward}(a_{\pi(t)})\big]$  Cumulative difference between optimal reward and actual reward
  118. 118. Exploration vs Exploitation Dynamic Information Retrieval ModelingTutorial 2014118  Exploration  Try out bandits to find which has highest average reward  Exploitation  Too much exploration leads to poor performance  Play bandits that are known to pay out higher reward on average  MAB algorithms balance exploration and exploitation  Start by exploring more to find best bandits  Exploit more as best bandits become known
  119. 119. Exploration vs Exploitation Dynamic Information Retrieval ModelingTutorial 2014119
  120. 120. MAB – Index Algorithms Dynamic Information Retrieval Modeling Tutorial 2014  Gittins index1  Play the bandit with the highest ‘Dynamic Allocation Index’  Modelled using an MDP but suffers the ‘curse of dimensionality’  𝜖-greedy2  Play the highest-reward bandit with probability 1 − ϵ  Play a random bandit with probability 𝜖  UCB (Upper Confidence Bound)3  Play bandit i with the highest $\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$  The chance of playing infrequently played bandits increases over time 1J. C. Gittins,‘89 2Nicolò Cesa-Bianchi et. al.,‘98 3P. Auer et. al.,‘02
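A small Python sketch of the UCB1 rule from this slide, applied to a hypothetical ad-selection setting; the click-through rates and horizon are made up for illustration.

```python
import math, random

def ucb1(pull, n_arms, horizon=10000):
    """UCB1: play each arm once, then always play the arm with the highest
    empirical mean plus confidence bonus sqrt(2 ln t / T_i)."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                       # initialise: try every arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # running mean reward
    return means, counts

# hypothetical ads with unknown click-through rates; a click is reward 1
true_ctr = [0.02, 0.05, 0.03]
means, counts = ucb1(lambda i: 1.0 if random.random() < true_ctr[i] else 0.0, n_arms=3)
print(counts)   # the 5% CTR ad should be played most often
```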
  121. 121. MAB use in IR Dynamic Information Retrieval ModelingTutorial 2014121  Choosing ads to display to users1  Each ad is a bandit  User click through rate is reward  Recommending news articles2  News article is a bandit  Similar to Information Filtering case  Diversifying search results3  Each rank position is an MAB dependent on higher ranks  Documents are bandits chosen by each rank 1Deepayan Chakrabarti et. al. ,‘09 2Lihong Li et. al., ’10 3Radlinski et. al.,‘08
  122. 122. MAB Variations Dynamic Information Retrieval ModelingTutorial 2014122  Contextual Bandits1  World has some context 𝑥 ∈ 𝑋 (i.e. user location)  Learn policy 𝜋: 𝑋 → 𝐴 that maps context to arms (online or offline)  Duelling Bandits2  Play two (or more) bandits at each time step  Observe relative reward rather than absolute  Learn order of bandits  Mortal Bandits3  Value of bandits decays over time  Exploitation > exploration 1Lihong Li et. al.,‘10 2YisongYue et. al.,‘09 3Deepayan Chakrabarti et. al. ,‘09
  123. 123. Comparison of Markov Models Dynamic Information Retrieval Modeling Tutorial 2014  MC – a fully observable stochastic process  HMM – a partially observable stochastic process  MDP – a fully observable decision process  MAB – a decision process, either fully or partially observable  POMDP – a partially observable decision process  Summary (model: actions / rewards / states): MC: No / No / Observable; HMM: No / No / Unobservable; MDP: Yes / Yes / Observable; POMDP: Yes / Yes / Unobservable; MAB: Yes / Yes / Fixed
  124. 124. SIGIR Tutorial July 7th 2014 Grace Hui Yang Marc Sloan Jun Wang Guest Speaker: Emine Yilmaz Dynamic Information Retrieval Modeling Exercise
  125. 125. Outline Dynamic Information Retrieval ModelingTutorial 2014125  Introduction  Theory and Models  Session Search  Reranking  GuestTalk: Evaluation
  126. 126. TREC Session Tracks (2010-2012)  Given a series of queries {q1,q2,…,qn}, top 10 retrieval results {D1, … Di-1 } for q1 to qi-1, and click information  The task is to retrieve a list of documents for the current/last query, qn  Relevance judgment is made based on how relevant the documents are for qn, and how relevant they are for information needs for the entire session (in topic description)  no need to segment the sessions 126
  127. 127. 1.pocono mountains pennsylvania 2.pocono mountains pennsylvania hotels 3.pocono mountains pennsylvania things to do 4.pocono mountains pennsylvania hotels 5.pocono mountains camelbeach 6.pocono mountains camelbeach hotel 7.pocono mountains chateau resort 8.pocono mountains chateau resort attractions 9.pocono mountains chateau resort getting to 10.chateau resort getting to 11.pocono mountains chateau resort directions TREC 2012 Session 6 127 Information needs: You are planning a winter vacation to the Pocono Mountains region in Pennsylvania in the US.Where will you stay?What will you do while there? How will you get there? In a session, queries change constantly
  128. 128. Query change is an important form of feedback  We define query change as the syntactic editing change between two adjacent queries: $\Delta q_i = q_i - q_{i-1}$  It includes +Δq_i, the added terms, and −Δq_i, the removed terms  The unchanged/shared terms are called q_theme, the theme terms  Example: q1 = “bollywood legislation”, q2 = “bollywood law” --- Theme term = “bollywood”, Added (+Δq) = “law”, Removed (−Δq) = “legislation”
  129. 129. Where do these query changes come from?  GivenTREC Session settings, we consider two sources of query change:  the previous search results that a user viewed/read/examined  the information need  Example:  Kurosawa  Kurosawa wife  `wife’ is not in any previous results, but in the topic description  However, knowing information needs before search is difficult to achieve 129
  130. 130. Previous search results could influence query change in quite complex ways  Merck lobbyists  Merck lobbying US policy  D1 contains several mentions of‘policy’, such as  “A lobbyist who until 2004 worked as senior policy advisor to Canadian Prime Minister Stephen Harper was hired last month by Merck …”  These mentions are about Canadian policies; while the user adds US policy in q2  Our guess is that the user might be inspired by‘policy’, but he/she prefers a different sub-concept other than `Canadian policy’  Therefore, for the added terms `US policy’,‘US’ is the novel term here, and‘policy’ is not since it appeared in D1.  The two terms should be treated differently 130
  131. 131.  We propose to model session search as a Markov decision process (MDP)  Two agents: the User and the Search Engine Dynamic Information Retrieval ModelingTutorial 2014131  Environments Search results  States Queries  Actions  User actions: Add/remove/unchange the query terms  Search Engine actions: Increase/ decrease /remain term weights Applying MDP to Session Search
  132. 132. Search Engine Agent’s Actions ∈ Di−1 action Example qtheme Y increase “pocono mountain” in s6 N increase “france world cup 98 reaction” in s28, france world cup 98 reaction stock market→ france world cup 98 reaction +∆q Y decrease ‘policy’ in s37, Merck lobbyists → Merck lobbyists US policy N increase ‘US’ in s37, Merck lobbyists → Merck lobbyists US policy −∆q Y decrease ‘reaction’ in s28, france world cup 98 reaction → france world cup 98 N No change ‘legislation’ in s32, bollywood legislation →bollywood law 132
  133. 133. Query Change retrieval Model (QCM)  The Bellman equation gives the optimal value for an MDP: $V^*(s) = \max_a \big[R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^*(s')\big]$  The reward function is used as the document relevance score function and is tweaked backwards from the Bellman equation: $\mathrm{Score}(q_i, d) = P(q_i \mid d) + \gamma\, P(q_i \mid q_{i-1}, D_{i-1}, a) \max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$  (document relevance score = current reward/relevance score + query transition model × maximum past relevance)
  134. 134. Calculating the Transition Model • According to query change and search engine actions: $\mathrm{Score}(q_i, d) = \log P(q_i \mid d) + \alpha \sum_{t \in q_{theme}} [1 - P(t \mid d^*_{i-1})] \log P(t \mid d) - \beta \sum_{t \in -\Delta q} P(t \mid d^*_{i-1}) \log P(t \mid d) + \varepsilon \sum_{t \in +\Delta q,\, t \notin d^*_{i-1}} \mathrm{idf}(t) \log P(t \mid d) - \delta \sum_{t \in +\Delta q,\, t \in d^*_{i-1}} P(t \mid d^*_{i-1}) \log P(t \mid d)$  Current reward/relevance score; increase weights for theme terms; decrease weights for removed terms; increase weights for novel added terms; decrease weights for old added terms
  135. 135. Maximizing the Reward Function  Generate a maximum-rewarded document, denoted $d^*_{i-1}$, from $D_{i-1}$  That is the document(s) most relevant to $q_{i-1}$  The relevance score can be calculated as $P(q_{i-1} \mid d_{i-1}) = 1 - \prod_{t \in q_{i-1}} \{1 - P(t \mid d_{i-1})\}$ with $P(t \mid d_{i-1}) = \frac{\#(t, d_{i-1})}{|d_{i-1}|}$  From several options, we choose to use only the document with top relevance: $\max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$
  136. 136. Scoring the Entire Session  The overall relevance score for a session of queries is aggregated recursively: $\mathrm{Score}_{session}(q_n, d) = \mathrm{Score}(q_n, d) + \gamma\, \mathrm{Score}_{session}(q_{n-1}, d) = \mathrm{Score}(q_n, d) + \gamma[\mathrm{Score}(q_{n-1}, d) + \gamma\, \mathrm{Score}_{session}(q_{n-2}, d)] = \sum_{i=1}^{n} \gamma^{n-i}\, \mathrm{Score}(q_i, d)$
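A one-function Python sketch of the recursive session scoring above; the per-query scores and the discount γ = 0.92 are illustrative values, not from the tutorial.

```python
def session_score(per_query_scores, gamma=0.92):
    """Aggregate per-query relevance scores of a document over a session,
    discounting earlier queries: sum_i gamma^(n-i) * Score(q_i, d)."""
    n = len(per_query_scores)
    return sum(gamma ** (n - i) * s for i, s in enumerate(per_query_scores, 1))

print(session_score([1.2, 0.7, 2.1]))   # hypothetical scores of d for q1..q3
```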
  137. 137. Experiments  TREC 2011-2012 query sets and datasets  ClueWeb09 Category B
  138. 138. Search Accuracy (TREC 2012)  nDCG@10 (official metric used inTREC) Approach nDCG@10 %chg MAP %chg Lemur 0.2474 -21.54% 0.1274 -18.28% TREC’12 median 0.2608 -17.29% 0.1440 -7.63% Our TREC’12 submission 0.3021 −4.19% 0.1490 -4.43% TREC’12 best 0.3221 0.00% 0.1559 0.00% QCM 0.3353 4.10%† 0.1529 -1.92% QCM+Dup 0.3368 4.56%† 0.1537 -1.41% 138
  139. 139. Search Accuracy (TREC 2011)  nDCG@10 (official metric used inTREC) Approach nDCG@10 %chg MAP %chg Lemur 0.3378 -23.38% 0.1118 -25.86% TREC’11 median 0.3544 -19.62% 0.1143 -24.20% TREC’11 best 0.4409 0.00% 0.1508 0.00% QCM 0.4728 7.24%† 0.1713 13.59%† QCM+Dup 0.4821 9.34%† 0.1714 13.66%† Our TREC’12 submission 0.4836 9.68%† 0.1724 14.32%† 139
  140. 140. Search Accuracy for Different Session Types  TREC 2012 Sessions are classified into:  Product: Factual / Intellectual  Goal quality: Specific / Amorphous  Results (columns: Intellectual %chg, Amorphous %chg, Specific %chg, Factual %chg): TREC best: 0.3369 0.00%, 0.3495 0.00%, 0.3007 0.00%, 0.3138 0.00%; Nugget: 0.3305 -1.90%, 0.3397 -2.80%, 0.2736 -9.01%, 0.2871 -8.51%; QCM: 0.3870 14.87%, 0.3689 5.55%, 0.3091 2.79%, 0.3066 -2.29%; QCM+DUP: 0.3900 15.76%, 0.3692 5.64%, 0.3114 3.56%, 0.3072 -2.10%  QCM better handles sessions that demonstrate evolution and exploration, because it treats a session as a continuous process by studying changes among query transitions and modeling the dynamics
  141. 141. Outline Dynamic Information Retrieval ModelingTutorial 2014141  Introduction  Theory and Models  Session Search  Reranking  GuestTalk: Evaluation
  142. 142. Multi Page Search Dynamic Information Retrieval ModelingTutorial 2014142
  143. 143. Multi Page Search Dynamic Information Retrieval ModelingTutorial 2014143 Page 1 Page 2 2. 1. 2. 1.
  144. 144. Relevance Feedback Dynamic Information Retrieval ModelingTutorial 2014144  No UI Changes  Interactivity is Hidden  Private, performed in browser
  145. 145. Relevance Feedback Dynamic Information Retrieval ModelingTutorial 2014145 Page 1 • Diverse Ranking • Maximise learning potential • Exploration vs Exploitation Page 2 • Clickthroughs or explicit ratings • Respond to feedback from page 1 • Personalized
  146. 146. Model Dynamic Information Retrieval ModelingTutorial 2014146
  147. 147. Model Dynamic Information Retrieval ModelingTutorial 2014147  𝑁 𝜃1, Σ1  𝜃1 -prior estimate of relevance  Σ1 - prior estimate of covariance  Document similarity  Topic Clustering
  148. 148. Model Dynamic Information Retrieval ModelingTutorial 2014148  Rank action for page 1
  149. 149. Model Dynamic Information Retrieval ModelingTutorial 2014149
  150. 150. Model Dynamic Information Retrieval ModelingTutorial 2014150  Feedback from page 1  𝒓 ~ 𝑁(𝜃𝒔 1 , Σ 𝒔 1 )
  151. 151. Model Dynamic Information Retrieval Modeling Tutorial 2014  Update estimates using r¹: partition the prior as $\theta^1 = (\theta_{s}, \theta_{s'})$ and $\Sigma^1 = \begin{pmatrix}\Sigma_{s} & \Sigma_{s s'}\\ \Sigma_{s' s} & \Sigma_{s'}\end{pmatrix}$, where s indexes the page-1 documents and s′ the remaining ones; then $\theta^2 = \theta_{s'} + \Sigma_{s' s}\,\Sigma_{s}^{-1}(r^1 - \theta_{s})$ and $\Sigma^2 = \Sigma_{s'} - \Sigma_{s' s}\,\Sigma_{s}^{-1}\,\Sigma_{s s'}$
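A Python sketch of the Gaussian belief update above, using the toy-example relevance priors and covariance from slides 26–32; which documents were shown on page 1 and the observed feedback r are illustrative choices.

```python
import numpy as np

def gaussian_update(theta, sigma, shown, r):
    """Condition a joint Gaussian relevance model on the feedback r observed
    for the shown documents; return the posterior over the remaining ones."""
    rest = [i for i in range(len(theta)) if i not in shown]
    t_s, t_r = theta[shown], theta[rest]
    S_ss = sigma[np.ix_(shown, shown)]
    S_rs = sigma[np.ix_(rest, shown)]
    S_rr = sigma[np.ix_(rest, rest)]
    gain = S_rs @ np.linalg.inv(S_ss)
    return t_r + gain @ (r - t_s), S_rr - gain @ S_rs.T

theta = np.array([0.5, 0.51, 0.9, 0.49])                 # prior relevance (toy example)
sigma = np.array([[1.0, 0.8, 0.1, 0.0], [0.8, 1.0, 0.1, 0.0],
                  [0.1, 0.1, 1.0, 0.95], [0.0, 0.0, 0.95, 1.0]])
mean2, cov2 = gaussian_update(theta, sigma, shown=[2, 1], r=np.array([1.0, 0.0]))
print(mean2)    # posterior relevance of the two documents not shown on page 1
```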
  152. 152. Model Dynamic Information Retrieval ModelingTutorial 2014152  Rank using PRP
  153. 153. Model Dynamic Information Retrieval Modeling Tutorial 2014  Utility of a ranking (DCG over the two pages): $\lambda \sum_{j=1}^{M} \frac{\theta_{s_j^1}}{\log_2(j+1)} + (1-\lambda) \sum_{j=M+1}^{2M} \frac{\theta_{s_j^2}}{\log_2(j+1)}$
  154. 154. Model – Bellman Equation Dynamic Information Retrieval Modeling Tutorial 2014  Optimize s¹ to improve U(s²):  $V(\theta^1, \Sigma^1, 1) = \max_{s^1}\Big[\lambda\, \theta_{s^1} \cdot W^1 + \int_{r} \max_{s^2} (1-\lambda)\, \theta_{s^2} \cdot W^2\, P(r)\, dr\Big]$
  155. 155. 𝜆 Dynamic Information Retrieval ModelingTutorial 2014155  Balances exploration and exploitation in page 1  Tuned for different queries  Navigational  Informational  𝜆 = 1 for non-ambiguous search
  156. 156. Approximation Dynamic Information Retrieval Modeling Tutorial 2014  Monte Carlo sampling:  $\approx \max_{s^1}\Big[\lambda\, \theta_{s^1} \cdot W^1 + \frac{1}{S} \sum_{r \in O} \max_{s^2} (1-\lambda)\, \theta_{s^2} \cdot W^2\, P(r)\Big]$  Sequential Ranking Decision
  157. 157. Experiment Data Dynamic Information Retrieval Modeling Tutorial 2014  Difficult to evaluate without access to live users  Simulated using 3 TREC collections and relevance judgements  WT10G – Explicit Ratings  TREC8 – Clickthroughs  Robust – Difficult (ambiguous) search
  158. 158. User Simulation Dynamic Information Retrieval ModelingTutorial 2014158  Rank M documents  Simulated user clicks according to relevance judgements  Update page 2 ranking  Measure at page 1 and 2  Recall  Precision  nDCG  MRR  BM25 – prior ranking model
  159. 159. Investigating λ Dynamic Information Retrieval ModelingTutorial 2014159
  160. 160. Baselines Dynamic Information Retrieval ModelingTutorial 2014160  𝜆 determined experimentally  BM25  BM25 with conditional update (𝜆 = 1)  Maximum Marginal Relevance (MMR)  Diversification  MMR with conditional update  Rocchio  Relevance Feedback
  161. 161. Results Dynamic Information Retrieval ModelingTutorial 2014161
  162. 162. Results Dynamic Information Retrieval ModelingTutorial 2014162
  163. 163. Results Dynamic Information Retrieval ModelingTutorial 2014163
  164. 164. Results Dynamic Information Retrieval ModelingTutorial 2014164
  165. 165. Results Dynamic Information Retrieval Modeling Tutorial 2014  Similar results across data sets and metrics  2nd page gain outweighs 1st page losses  Outperformed Maximum Marginal Relevance using MRR to measure diversity  BM25-U is simply the no-exploration case  Similar results when M = 5
  166. 166. Results Dynamic Information Retrieval ModelingTutorial 2014166
  167. 167. Outline Dynamic Information Retrieval ModelingTutorial 2014167  Introduction  Theory and Models  Session Search  Reranking  GuestTalk: Evaluation
  168. 168. Dynamic Information Retrieval Evaluation Emine Yilmaz University College London Emine.Yilmaz@ucl.ac.uk
  169. 169. Information Retrieval Systems Match information seekers with the information they seek
  170. 170. Retrieval Evaluation: Traditional View
  171. 171. Retrieval Evaluation: Dynamic View
  172. 172. Retrieval Evaluation: Dynamic View
  173. 173. Retrieval Evaluation: Dynamic View
  174. 174. Different Approaches to Evaluation  Online Evaluation  Design interactive experiments  Use users’ actions to evaluate the quality  Inherently dynamic in nature  Offline Evaluation  Controlled laboratory experiments  The users’ interaction with the engine is only simulated  Recent work focused on dynamic IR evaluation
  175. 175. Online Evaluation  Standard click metrics  Clickthrough rate  Probability user skips over results they have considered (pSkip)  Most recently: Result interleaving    Click/Noclick Evaluate 175
  176. 176. What is result interleaving?  A way to compare rankers online  Given the two rankings produced by two methods  Present a combination of the rankings to users  Team Draft Interleaving (Radlinski et al., 2008)  Interleaving two rankings  Input:Two rankings (“can be seen as teams who pick players”)  Repeat: o Toss a coin to see which team (ranking) picks next o Winner picks their best remaining player (document) o Loser picks their best remaining player (document)  Output: One ranking (2 teams of 5)  Credit assignment  Ranking providing more of the clicked results wins
  177. 177. Team Draft InterleavingRanking A 1. Napa Valley – The authority for lodging... www.napavalley.com 2. Napa Valley Wineries - Plan your wine... www.napavalley.com/wineries 3. Napa Valley College www.napavalley.edu/homex.asp 4. Been There | Tips | Napa Valley www.ivebeenthere.co.uk/tips/16681 5. Napa Valley Wineries and Wine www.napavintners.com 6. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley Ranking B 1. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley 2. Napa Valley – The authority for lodging... www.napavalley.com 3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=... 4. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com 5. NapaValley.org www.napavalley.org 6. The Napa Valley Marathon www.napavalleymarathon.org Presented Ranking 1. Napa Valley – The authority for lodging... www.napavalley.com 2. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley 3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=... 4. Napa Valley Wineries – Plan your wine... www.napavalley.com/wineries 5. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com 6. Napa Valley College www.napavalley.edu/homex.asp 7 NapaValley.org www.napavalley.org AB
  178. 178. Team Draft InterleavingRanking A 1. Napa Valley – The authority for lodging... www.napavalley.com 2. Napa Valley Wineries - Plan your wine... www.napavalley.com/wineries 3. Napa Valley College www.napavalley.edu/homex.asp 4. Been There | Tips | Napa Valley www.ivebeenthere.co.uk/tips/16681 5. Napa Valley Wineries and Wine www.napavintners.com 6. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley Ranking B 1. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley 2. Napa Valley – The authority for lodging... www.napavalley.com 3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=... 4. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com 5. NapaValley.org www.napavalley.org 6. The Napa Valley Marathon www.napavalleymarathon.org Presented Ranking 1. Napa Valley – The authority for lodging... www.napavalley.com 2. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley 3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=... 4. Napa Valley Wineries – Plan your wine... www.napavalley.com/wineries 5. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com 6. Napa Valley College www.napavalley.edu/homex.asp 7 NapaValley.org www.napavalley.org B wins!
  179. 179. Team Draft InterleavingRanking A 1. Napa Valley – The authority for lodging... www.napavalley.com 2. Napa Valley Wineries - Plan your wine... www.napavalley.com/wineries 3. Napa Valley College www.napavalley.edu/homex.asp 4. Been There | Tips | Napa Valley www.ivebeenthere.co.uk/tips/16681 5. Napa Valley Wineries and Wine www.napavintners.com 6. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley Ranking B 1. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley 2. Napa Valley – The authority for lodging... www.napavalley.com 3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=... 4. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com 5. NapaValley.org www.napavalley.org 6. The Napa Valley Marathon www.napavalleymarathon.org Presented Ranking 1. Napa Valley – The authority for lodging... www.napavalley.com 2. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley 3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=... 4. Napa Valley Wineries – Plan your wine... www.napavalley.com/wineries 5. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com 6. Napa Valley College www.napavalley.edu/homex.asp 7 NapaValley.org www.napavalley.org B wins! Repeat Over Many Different Queries!
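A simplified Python sketch of the team-draft interleaving procedure described on slide 176, with credit going to the ranking that contributed more of the clicked results; the document IDs and click positions are illustrative.

```python
import random

def team_draft_interleave(rank_a, rank_b):
    """Team-draft interleaving: each round a coin toss decides which ranking
    picks first; both then add their best remaining (unpicked) document."""
    all_docs = set(rank_a) | set(rank_b)
    interleaved, team = [], []
    while len(interleaved) < len(all_docs):
        order = [('A', rank_a), ('B', rank_b)]
        random.shuffle(order)                               # coin toss for this round
        for name, ranking in order:
            doc = next((d for d in ranking if d not in interleaved), None)
            if doc is not None:
                interleaved.append(doc)
                team.append(name)
    return interleaved, team

def credit(team, clicked_positions):
    """The ranker contributing more of the clicked results wins."""
    votes = [team[i] for i in clicked_positions]
    return {'A': votes.count('A'), 'B': votes.count('B')}

mixed, team = team_draft_interleave(['d1', 'd2', 'd3'], ['d4', 'd1', 'd5'])
print(mixed, credit(team, clicked_positions=[0]))
```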
  180. 180. Offline Evaluation  Controlled laboratory experiments  The user’s interaction with the engine is only simulated  Ask experts to judge each query result  Predict how users behave when they search  Aggregate judgments to evaluate 180
  181. 181. Offline Evaluation  Until recently: metrics assume that the user’s information need was not affected by the documents read  E.g. Average Precision, NDCG, … • Users are more likely to stop searching when they see a highly relevant document • Lately: metrics that incorporate the effect of the relevance of documents seen by the user on user behavior  Based on devising more realistic user models  EBU, ERR [Yilmaz et al CIKM10, Chapelle et al CIKM09]
  182. 182. Modeling User Behavior Cascade-based models black powder ammunition 1 2 3 4 5 6 7 8 9 10 … • The user views search results from top to bottom • At each rank i, the user has a certain probability of being satisfied. • Probability of satisfaction proportional to the relevance grade of the document at rank i. • Once the user is satisfied with a document, he terminates the search.
  183. 183. Rank Biased Precision  [figure: the user issues a query and at each rank either stops or views the next item; example query: black powder ammunition, ranks 1–10]
  184. 184. Rank Biased Precision  Total utility $= \sum_{i=1}^{\infty} rel_i\, p^{i-1}$  Num. docs examined $= \sum_{i=1}^{\infty} p^{i-1} = 1/(1-p)$  RBP = Total utility / Num. docs examined $= (1-p) \sum_{i=1}^{\infty} rel_i\, p^{i-1}$  (p is the persistence probability of viewing the next item)  [example query: black powder ammunition, ranks 1–10]
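A one-line Python sketch of RBP as defined above; the binary relevance vector and persistence p are illustrative.

```python
def rbp(relevances, p=0.8):
    """Rank-biased precision: utility per document examined, for a user who
    views the next result with persistence probability p."""
    return (1 - p) * sum(rel * p ** i for i, rel in enumerate(relevances))

print(rbp([1, 0, 1, 1, 0]))   # binary relevance of the top-5 results
```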
  185. 185. Expected Reciprocal Rank [Chapelle et al CIKM09]  [figure: at each rank the user asks “Relevant?” (no / somewhat / highly) and either stops or views the next item; example query: black powder ammunition, ranks 1–10]
  186. 186. Expected Reciprocal Rank [Chapelle et al CIKM09]  $\varphi(r)$: utility of finding “the perfect document” at rank r, $\varphi(r) = 1/r$  $ERR = \sum_{r=1}^{n} \frac{1}{r}\, P(\text{user stops at position } r) = \sum_{r=1}^{n} \frac{1}{r} \prod_{i=1}^{r-1}(1 - R_i)\, R_r$  Prob. of stopping at doc i (prob. of relevance of doc i): $R_i = \frac{2^{g_i} - 1}{2^{g_{max}}}$, where $g_i$ is the relevance grade of the i-th document  [example query: black powder ammunition, ranks 1–10]
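A short Python sketch of ERR as defined above; the relevance grades and maximum grade are illustrative.

```python
def err(grades, max_grade=4):
    """Expected reciprocal rank: utility 1/r, weighted by the probability
    that the user is first satisfied at rank r."""
    p_continue, score = 1.0, 0.0
    for r, g in enumerate(grades, start=1):
        p_stop = (2 ** g - 1) / 2 ** max_grade    # prob. of satisfaction at rank r
        score += p_continue * p_stop / r
        p_continue *= 1 - p_stop
    return score

print(err([4, 0, 2, 1]))   # graded relevance of the top-4 results
```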
  187. 187. Session Evaluation  (Example session containing the queries “J Lo”, “Paris Hilton” and “Paris Luxurious Hotels”)
  188. 188. What is a good system?
  189. 189. Measuring “goodness”  The user steps down a ranked list of documents, observing each one until a decision point, at which they either a) abandon the search, or b) reformulate the query.  While stepping down (or sideways to a new ranked list), the user accumulates utility.
  190. 190. Evaluation over a single ranked list  (Figure: a ranked list of ten results for session queries such as “kenya cooking traditional swahili” and “kenya cooking traditional”)
  191. 191. Session DCG [Järvelin et al. ECIR 2008]
  For each ranked list $RL_i$ retrieved in the session (e.g. for “kenya cooking traditional swahili”, then “kenya cooking traditional”):
  $DCG(RL_i) = \sum_{r=1}^{k} \frac{2^{rel(r)} - 1}{\log_b(r + b - 1)}$
  The session score discounts later reformulations:
  $sDCG = \frac{1}{\log_c(1 + c - 1)} DCG(RL_1) + \frac{1}{\log_c(2 + c - 1)} DCG(RL_2) + \ldots$
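A minimal sketch of this computation; the parameter defaults b = 2 and c = 4 are illustrative choices, not values fixed by the slide:

import math

def dcg(relevances, b=2):
    # DCG over one ranked list: sum_r (2^rel(r) - 1) / log_b(r + b - 1)
    return sum((2 ** rel - 1) / math.log(r + b - 1, b)
               for r, rel in enumerate(relevances, start=1))

def session_dcg(session, b=2, c=4):
    # sDCG: discount the DCG of the i-th reformulation by 1 / log_c(i + c - 1)
    return sum(dcg(ranked_list, b) / math.log(i + c - 1, c)
               for i, ranked_list in enumerate(session, start=1))

# session = one list of graded relevances per query reformulation
print(session_dcg([[3, 2, 0, 1], [2, 2, 1]]))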
  192. 192. Model-based measures  A probabilistic space of users following different paths  Ω is the space of all paths  P(ω) is the probability of a user following a path ω in Ω  Mω is a measure over a path ω  [Yang and Lad ICTIR 2009, Kanoulas et al. SIGIR 2011]
  193. 193. Probability of a path  Example: P(path) = (1) probability of abandoning at reformulation 2 × (2) probability of reformulating at rank 3  (Figure: queries Q1–Q3, each with a column of relevant (R) / non-relevant (N) results)
  194. 194. Expected Global Utility [Yang and Lad ICTIR 2009] 1. User steps down ranked results one-by-one 2. Stops browsing documents based on a stochastic process that defines a stopping probability distribution over ranks and reformulates 3. Gains something from relevant documents, accumulating utility
  195. 195. (1) Probability of abandoning the session at reformulation i: geometric with parameter p_reform  (Figure: queries Q1–Q3 with columns of relevant (R) / non-relevant (N) results)
  196. 196. (2) Probability of reformulating at rank j: geometric with parameter p_down within each ranked list, with the geometric p_reform of (1) governing abandonment across reformulations  (Figure: queries Q1–Q3 with columns of relevant (R) / non-relevant (N) results)
  197. 197. Expected Global Utility [Yang and Lad ICTIR 2009]  The probability of a user following a path ω: $P(\omega) = P(r_1, r_2, \ldots, r_K)$, where $r_i$ is the stopping and reformulation point in list $i$  Assumption: stopping positions in each list are independent, so $P(r_1, r_2, \ldots, r_K) = P(r_1) P(r_2) \cdots P(r_K)$  Use a geometric distribution (as in RBP) to model the stopping and reformulation behaviour: $P(r_i = r) = (1 - p)\, p^{\,r-1}$
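A rough sketch of the path probability under these assumptions; the parameter names p_down and p_reform follow the preceding slides, and the default values are purely illustrative:

def path_probability(stop_ranks, p_down=0.8, p_reform=0.6):
    # Probability of one path omega = (r_1, ..., r_K): the user reformulates
    # after the first K-1 lists, abandons after list K, and stops at rank r_i
    # in list i.  Stopping positions are assumed independent, and both the
    # within-list and across-reformulation behaviours are modelled geometrically.
    k = len(stop_ranks)
    prob = (1 - p_reform) * p_reform ** (k - 1)        # abandon at reformulation K
    for r in stop_ranks:                               # stop at rank r_i in list i
        prob *= (1 - p_down) * p_down ** (r - 1)
    return prob

# Expected Global Utility then sums utility(omega) * P(omega) over paths omega.
print(path_probability([3, 1, 2]))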
  198. 198. Conclusions  Recent focus on evaluating the dynamic nature of the search process  Interleaving  New offline evaluation metrics  ERR, RBP  Session evaluation metrics
  199. 199. Outline  Dynamic Information Retrieval Modeling Tutorial 2014  Introduction  Theory and Models  Session Search  Reranking  Guest Talk: Evaluation  Conclusion
  200. 200. Conclusions  Dynamic Information Retrieval Modeling Tutorial 2014  Dynamic IR describes a new class of interactive models  Incorporates rich feedback and temporal dependency, and is goal oriented  The family of Markov models and multi-armed bandit theory are useful in building DIR models  Applicable to a range of IR problems  Useful in applications such as session search and evaluation
  201. 201. Dynamic IR Book  Dynamic Information Retrieval Modeling Tutorial 2014  Published by Morgan & Claypool  ‘Synthesis Lectures on Information Concepts, Retrieval, and Services’  Due March/April 2015 (in time for SIGIR 2015)
  202. 202. Acknowledgment  Dynamic Information Retrieval Modeling Tutorial 2014  We thank Dr. Emine Yilmaz for giving the guest lecture  We sincerely thank Dr. Xuchu Dong for his help in preparing the tutorial  We also thank the following colleagues for their comments and suggestions:  Dr. Jamie Callan  Dr. Ophir Frieder  Dr. Fernando Diaz  Dr. Filip Radlinski
  203. 203. Dynamic Information Retrieval Modeling Tutorial 2014
  204. 204. Thank You  Dynamic Information Retrieval Modeling Tutorial 2014
  205. 205. References  Dynamic Information Retrieval Modeling Tutorial 2014  Static IR  Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. Addison-Wesley, 1999.  The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. 1999.  Implicit User Modeling for Personalized Search. Xuehua Shen et al. CIKM, 2005.  A Short Introduction to Learning to Rank. Hang Li. IEICE Transactions 94-D(10): 1854-1862, 2011.
  206. 206. References  Dynamic Information Retrieval Modeling Tutorial 2014  Interactive IR  Relevance Feedback in Information Retrieval. J. J. Rocchio. The SMART Retrieval System (pp. 313-23), 1971.  A study in interface support mechanisms for interactive information retrieval. Ryen W. White et al. JASIST, 2006.  Visualizing stages during an exploratory search session. Bill Kules et al. HCIR, 2011.  Dynamic Ranked Retrieval. Cristina Brandt et al. WSDM, 2011.  Structured Learning of Two-level Dynamic Rankings. Karthik Raman et al. CIKM, 2011.
  207. 207. References  Dynamic Information Retrieval Modeling Tutorial 2014  Dynamic IR  A hidden Markov model information retrieval system. D. R. H. Miller, T. Leek, and R. M. Schwartz. In SIGIR ’99, pages 214-221.  Threshold setting and performance optimization in adaptive filtering. Stephen Robertson. JIR, 2002.  A large-scale study of the evolution of web pages. Dennis Fetterly et al. WWW, 2003.  Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg, Thorsten Joachims. ICML, 2008.  Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem. Yisong Yue et al. ICML, 2009.  Meme-tracking and the dynamics of the news cycle. Jure Leskovec et al. KDD, 2009.
  208. 208. References  Dynamic Information Retrieval Modeling Tutorial 2014  Dynamic IR  Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS, 2009.  A Novel Click Model and Its Applications to Online Advertising. Zeyuan Allen Zhu et al. WSDM, 2010.  A contextual-bandit approach to personalized news article recommendation. Lihong Li, Wei Chu, John Langford, Robert E. Schapire. WWW, 2010.  Inferring search behaviors using partially observable Markov model with duration (POMD). Yin He et al. WSDM, 2011.  No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search. Jeff Huang et al. CHI, 2011.  Balancing Exploration and Exploitation in Learning to Rank Online. Katja Hofmann et al. ECIR, 2011.  Large-Scale Validation and Analysis of Interleaved Search Evaluation. Olivier Chapelle et al. TOIS, 2012.
  209. 209. References  Dynamic Information Retrieval Modeling Tutorial 2014  Dynamic IR  Using Control Theory for Stable and Efficient Recommender Systems. T. Jambor, J. Wang, N. Lathia. In: WWW '12, pages 11-20.  Sequential selection of correlated ads by POMDPs. Shuai Yuan et al. CIKM, 2012.  Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. In SIGIR ’13, pages 453–462.  Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H. Yang. In SIGIR 2013.  Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J. Wang. In WWW ’13.  Interactive Collaborative Filtering. X. Zhao, W. Zhang, J. Wang. In: CIKM 2013, pages 1411-1420.  Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H. Yang. In SIGIR ’14.
  210. 210. References  Dynamic Information Retrieval Modeling Tutorial 2014  Markov Processes  A Markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679–684, 1957.  Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957.  Dynamic Programming and Markov Processes. R. A. Howard. MIT Press, 1960.  Linear Programming and Sequential Decisions. Alan S. Manne. Management Science, 1960.  Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Leonard E. Baum, Ted Petrie. The Annals of Mathematical Statistics 37, 1966.
  211. 211. References  Dynamic Information Retrieval Modeling Tutorial 2014  Markov Processes  Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3, 1988.  Computationally feasible bounds for partially observed Markov decision processes. W. Lovejoy. Operations Research 39: 162–175, 1991.  Q-Learning. Christopher J. C. H. Watkins, Peter Dayan. Machine Learning, 1992.  Reinforcement learning with replacing eligibility traces. S. P. Singh and R. S. Sutton. Machine Learning, 22, pages 123-158, 1996.  Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998.  Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra. Artificial Intelligence, 101(1-2):99–134, 1998.
  212. 212. References  Dynamic Information Retrieval Modeling Tutorial 2014  Markov Processes  Finding approximate POMDP solutions through belief compression. N. Roy. PhD Thesis, Carnegie Mellon, 2003.  VDCBPI: an approximate scalable algorithm for large scale POMDPs. P. Poupart and C. Boutilier. In NIPS 2004, pages 1081–1088.  Finding Approximate POMDP Solutions Through Belief Compression. N. Roy, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 23:1-40, 2005.  Probabilistic Robotics. S. Thrun, W. Burgard, D. Fox. MIT Press, 2005.  Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S. Thrun. Volume 27, pages 335-380, 2006.
  213. 213. References  Dynamic Information Retrieval Modeling Tutorial 2014  Markov Processes  The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood, E. J. Sondik. Operations Research, 1973.  Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. M. L. Puterman and M. C. Shin. Management Science 24, 1978.  An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. A. A. Markov. Science in Context, 19:591–600, 2006.  Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer Science & Business Media, 2011.  Finite-Time Regret Bounds for the Multiarmed Bandit Problem. Nicolò Cesa-Bianchi, Paul Fischer. ICML, pages 100-108, 1998.  Multi-armed bandit allocation indices. J. C. Gittins. Wiley, 1989.  Finite-time Analysis of the Multiarmed Bandit Problem. Peter Auer et al. Machine Learning 47, Issue 2-3, 2002.
