Partially observable Markov decision processes for spoken dialog systems
Jason D. Williams, Steve Young (AT&T Labs)
2007, Computer Speech and Language, 21(2)
Outline
- Introduction
- Partially observable Markov decision processes
- Spoken Dialog System
- SDS-POMDP
- Comparing SDS-POMDP with current approaches
- Empirical support
POMDP (1)
- Partially observable Markov decision process: POMDP = {S, A, T, R, O, Z, λ, b_0}
  - S – set of states describing the agent's world
  - A – set of actions the agent may take
  - T – transition probability, P(s'|s, a)
  - R – reward, r(s, a)
  - O – set of observations about the world
  - Z – observation probability, P(o'|s', a)
POMDP (2)
- POMDP = {S, A, T, R, O, Z, λ, b_0} (see the code sketch below)
  - λ – geometric discount factor, λ ∈ [0, 1]
  - b_0 – initial belief state, b_0(s)
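A minimal sketch of the tuple as a container, for concreteness. This is my illustration, not code from the paper; all field names and key conventions are assumptions.

```python
# Hypothetical container for the POMDP tuple {S, A, T, R, O, Z, lambda, b0}.
from dataclasses import dataclass

@dataclass
class POMDP:
    states: list            # S: set of states describing the agent's world
    actions: list           # A: set of actions the agent may take
    transition: dict        # T: P(s'|s, a), keyed by (s, a, s')
    reward: dict            # R: r(s, a), keyed by (s, a)
    observations: list      # O: set of observations about the world
    observation_prob: dict  # Z: P(o'|s', a), keyed by (s', a, o')
    discount: float         # lambda: geometric discount factor in [0, 1]
    b0: dict                # initial belief state, b0[s] = P(s at t=0)
```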
POMDP (3)
Influence-diagram notation:
- ○ – random variable
- □ – decision node
- ◊ – utility node
- Shaded – unobserved
- Solid arrow – causal effect
- Dashed arrow – distribution is used
- RL – reinforcement learning
POMDP (Example)
- Dialog system: saving/deleting messages
Spoken Dialog System
- S_u – internal user state
- S_d – dialog state (user's view)
- A_u – user action (intention)
Spoken Dialog System
- Y_u – user audio signal
- Ã_u – user action as recognized by the machine
- C – confidence score
- S_m – dialog state (machine's view)
Spoken Dialog System
- A_m – machine action
- Y_m – machine audio signal
- Ã_m – machine action as recognized by the user
Mapping SDS to POMDP
- POMDP = {S, A, T, R, O, Z, λ, b_0}
- SDS = {S_u, S_d, S_m, C, A_u, Ã_u, A_m}
SDS-POMDP
- s = (s_u, a_u, s_d)
- s_m = b(s) = b(s_u, a_u, s_d)
Math behind
- Formula for new belief (reconstructed below)
- Exact algorithms rarely scale beyond roughly 10 actions, states, and observations.
- Effective approximate solutions exist.
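The "formula for new belief" appears only as an image in the original deck. What it refers to is the standard POMDP belief update, restated here with η a normalizing constant; the SDS-POMDP form follows by substituting the factored state s = (s_u, a_u, s_d):

```latex
b'(s') = \eta \, P(o' \mid s', a) \sum_{s \in S} P(s' \mid s, a) \, b(s)
```

The same rule as a minimal code sketch, assuming the hypothetical POMDP container from the earlier sketch:

```python
# Standard POMDP belief update: b'(s') ∝ P(o|s',a) · Σ_s P(s'|s,a) · b(s).
# Illustrative only; `pomdp` is the hypothetical container sketched earlier,
# and beliefs are dicts mapping state -> probability.

def belief_update(pomdp, b: dict, a, o) -> dict:
    new_b = {}
    for s2 in pomdp.states:
        # Predict s' from the transition model, then weight by the
        # probability of the actual observation o.
        pred = sum(pomdp.transition[(s, a, s2)] * b[s] for s in pomdp.states)
        new_b[s2] = pomdp.observation_prob[(s2, a, o)] * pred
    eta = sum(new_b.values())  # normalizing constant
    return {s2: p / eta for s2, p in new_b.items()}
```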
Comparing SDS-POMDP
- Better than current approaches
- Current approaches (CA) are simplifications or special cases of the SDS-POMDP
- Approaches compared:
  - Parallel state hypotheses
  - Local use of confidence score
  - Automated action planning
Parallel state hypotheses
- Traditional systems track a single state
- Uncertainty → multiple state hypotheses
- Two techniques:
  - Greedy decision-theoretic approaches
  - M-best list
Greedy decisions
- Maximizes immediate reward (see the formulation below)
- Does not plan ahead
- Handcrafting + ad hoc tuning
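Stated in belief terms (a standard formulation, not spelled out on the slide): a greedy decision-theoretic controller picks the action maximizing only the immediate expected reward under the current belief, with no lookahead:

```latex
a^{*} = \operatorname*{arg\,max}_{a \in A} \; \sum_{s \in S} b(s) \, r(s, a)
```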
M-best list
- Considers only the top M hypotheses
- Equivalent to a POMDP with handcrafted action selection
- Operates in a subspace of the belief space (see the sketch below)
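A minimal sketch of the idea (my illustration, not from the paper): prune a belief over dialog-state hypotheses to the M most probable entries and renormalize. The hypothesis names reuse the save/delete-messages example from earlier; M and the probabilities are made up.

```python
def prune_to_m_best(belief: dict, m: int) -> dict:
    """Keep the M most probable hypotheses and renormalize."""
    top = sorted(belief.items(), key=lambda kv: kv[1], reverse=True)[:m]
    total = sum(p for _, p in top)
    return {s: p / total for s, p in top}

belief = {"save_msg": 0.55, "delete_msg": 0.30, "replay_msg": 0.10, "hang_up": 0.05}
print(prune_to_m_best(belief, m=2))  # {'save_msg': 0.647..., 'delete_msg': 0.352...}
```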
Local use of confidence score
- Handcrafted update rules
- A_c = {expl-confirm, imp-confirm, reject}
- Useful, but hard to tune for long-term goals
Automated action selection
- Handcrafted planning
  - Problem: unforeseen dialog situations
- Amounts to a POMDP with a single state
- Two main techniques:
  - Supervised learning
  - Markov decision processes
Supervised learning
- Training data:
  - Human-human – much richer
  - Human-machine – machine errors
- Single state
Markov decision process
- A fully observable MDP is a simplification of the partially observable case
- Assumes that the world state is known exactly (see the formula below)
- Single state
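Formally (a standard observation, not on the slide): "known exactly" means the belief collapses to a point mass on the true state s_t, so belief monitoring becomes trivial:

```latex
b_t(s) = \begin{cases} 1 & \text{if } s = s_t \\ 0 & \text{otherwise} \end{cases}
```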
Empirical support
- Based on simulations
- Benefits of the POMDP relative to:
  - Parallel state hypotheses
  - Confidence score
  - Automated planning
- Real data
Parallel state hypotheses (1) [figure]
Parallel state hypotheses (2) [figure]
Parallel state hypotheses (3) [figure]
Confidence score (1)
- Confidence score: Reject, 0.4, Low, 0.8, High (thresholds at 0.4 and 0.8; see the sketch below)
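A hypothetical sketch of that binning (my reading of the slide's thresholds, not the paper's code); the comments echo the action set A_c = {expl-confirm, imp-confirm, reject} from the earlier slide:

```python
def confidence_bin(score: float) -> str:
    """Map an ASR confidence score in [0, 1] to the slide's three bins."""
    if score < 0.4:
        return "reject"   # treat the recognition result as unusable
    elif score < 0.8:
        return "low"      # usable, but worth an explicit/implicit confirm
    else:
        return "high"     # accept without confirmation

print([confidence_bin(s) for s in (0.25, 0.55, 0.95)])  # ['reject', 'low', 'high']
```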
Confidence score (2) [figure]
Confidence score (3) [figure]
Confidence score (4) [figure]
Automated planning (1)
- Handcrafted baseline controllers: HC1, HC2, HC3
Automated planning (2) [figure]
Automated planning (3) [figure]
Real data (1)
- SACTI-1 corpus: 144 human-human dialogs in the travel domain
Real data (2) [figure]
Conclusion
- Significant improvement in robustness
- Current approaches are simplifications or special cases of the SDS-POMDP
- Scales poorly
- Unique
Future work
- Other approaches:
  - Information State Update
  - Hidden Information State
- Evaluating on real users
Questions?
Thank you!
