
# Partially observable Markov decision processes for spoken dialog systems

Paper presentation:
Partially observable Markov decision processes for spoken dialog systems
Jason D. Williams, Steve Young (AT&T Labs)
2007, Computer Speech and Language, 21(2)

## Outline

- Introduction
- Partially observable Markov decision processes
- Spoken Dialog System
- SDS-POMDP
- Comparing
- Empirical support

## POMDP (1)

- Partially observable Markov decision processes
- $\mathrm{POMDP} = \{S, A, T, R, O, Z, \lambda, b_0\}$
  - $S$ – set of states describing the agent's world
  - $A$ – set of actions that the agent may take
  - $T$ – transition probability, $P(s' \mid s, a)$
  - $R$ – reward, $r(s, a)$
  - $O$ – set of observations about the world
  - $Z$ – observation probability, $P(o' \mid s', a)$

## POMDP (2)

- $\mathrm{POMDP} = \{S, A, T, R, O, Z, \lambda, b_0\}$
  - $\lambda$ – geometric discount factor, $\lambda \in [0, 1]$
  - $b_0$ – initial belief state $b_0(s)$

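To make the tuple concrete, here is a minimal Python sketch that stores a finite POMDP as plain dictionaries. The class name `POMDP`, the field name `lam` (standing in for λ), the `validate` and `discounted_return` helpers, and the tiny two-state model are illustrative assumptions, not code from the paper. `discounted_return` computes $\sum_t \lambda^t r_t$, the quantity a POMDP policy tries to maximize in expectation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class POMDP:
    """Tabular POMDP = {S, A, T, R, O, Z, lambda, b0} over finite sets (illustrative)."""
    S: List[str]                                # states
    A: List[str]                                # actions
    T: Dict[Tuple[str, str], Dict[str, float]]  # T[(s, a)][s2] = P(s' | s, a)
    R: Dict[Tuple[str, str], float]             # R[(s, a)] = r(s, a)
    O: List[str]                                # observations
    Z: Dict[Tuple[str, str], Dict[str, float]]  # Z[(s2, a)][o2] = P(o' | s', a)
    lam: float                                  # geometric discount factor in [0, 1]
    b0: Dict[str, float]                        # initial belief b0(s)

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError if lambda is out of range or a distribution does not sum to 1."""
        if not 0.0 <= self.lam <= 1.0:
            raise ValueError("lambda must lie in [0, 1]")
        for dist in [self.b0, *self.T.values(), *self.Z.values()]:
            if abs(sum(dist.values()) - 1.0) > tol:
                raise ValueError(f"distribution does not sum to 1: {dist}")


def discounted_return(rewards: List[float], lam: float) -> float:
    """Sum over t of lam**t * r_t for one sequence of rewards."""
    return sum((lam ** t) * r for t, r in enumerate(rewards))


# Hypothetical one-action, two-state model, only to exercise the container.
model = POMDP(
    S=["save", "delete"],
    A=["ask"],
    T={("save", "ask"): {"save": 1.0}, ("delete", "ask"): {"delete": 1.0}},
    R={("save", "ask"): -1.0, ("delete", "ask"): -1.0},
    O=["save", "delete"],
    Z={("save", "ask"): {"save": 0.8, "delete": 0.2},
       ("delete", "ask"): {"save": 0.2, "delete": 0.8}},
    lam=0.95,
    b0={"save": 0.5, "delete": 0.5},
)
model.validate()
print(discounted_return([-1.0, -1.0, 10.0], model.lam))
```

Dictionaries keep the sketch readable; a real solver would normally use arrays indexed by state, action, and observation.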
## POMDP (3)

- ○ – random variable
- □ – decision node
- ◊ – utility node
- solid arrow – causal effect
- dashed arrow – distribution is used
- RL – reinforcement learning

## POMDP (Example)

- Dialog system
  - saving/deleting messages

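A minimal sketch of belief tracking in the saving/deleting example. It assumes a hypothetical model in which the user's goal never changes while the machine asks (action `ask`) and the recognizer reports the true goal with probability 0.8; those numbers and the function names are illustrative only. The update itself is the standard POMDP belief update built from the $T$ and $Z$ defined above.

```python
# Hypothetical two-state voicemail model: the user wants to "save" or "delete".
STATES = ["save", "delete"]


def transition(s_next: str, s: str, a: str) -> float:
    """P(s' | s, a): assume the user's goal does not change while the machine asks."""
    return 1.0 if s_next == s else 0.0


def observation(o: str, s_next: str, a: str) -> float:
    """P(o' | s', a): assume the recognizer reports the true goal 80% of the time."""
    return 0.8 if o == s_next else 0.2


def update_belief(b: dict, a: str, o: str) -> dict:
    """b'(s') = eta * P(o' | s', a) * sum_s P(s' | s, a) * b(s)."""
    unnormalized = {
        s_next: observation(o, s_next, a)
        * sum(transition(s_next, s, a) * b[s] for s in STATES)
        for s_next in STATES
    }
    total = sum(unnormalized.values())  # P(o' | b, a); eta = 1 / total
    return {s: p / total for s, p in unnormalized.items()}


b = {"save": 0.5, "delete": 0.5}
b = update_belief(b, "ask", "save")  # recognizer hears "save": save ≈ 0.8
print(b)
b = update_belief(b, "ask", "save")  # hears "save" again: save ≈ 0.94
print(b)
```

Two consistent "save" observations move b(save) from 0.5 to about 0.94, so evidence accumulates over turns instead of the system committing to a single hypothesis after the first recognition result.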
## Spoken Dialog System

- $S_u$ – internal user state
- $S_d$ – dialog state (user view)
- $A_u$ – user action (intention)

## Spoken Dialog System

- $Y_u$ – user audio signal
- $\tilde{A}_u$ – action recognized by the machine
- $C$ – confidence score
- $S_m$ – dialog state (machine view)

## Spoken Dialog System

- $A_m$ – machine action
- $Y_m$ – machine audio signal
- $\tilde{A}_m$ – action recognized by the user

## Mapping SDS to POMDP

- $\mathrm{POMDP} = \{S, A, T, R, O, Z, \lambda, b_0\}$
- $\mathrm{SDS} = \{S_u, S_d, S_m, C, A_u, \tilde{A}_u, A_m\}$

## SDS-POMDP

- $s = (s_u, a_u, s_d)$
- $s_m = b(s) = b(s_u, a_u, s_d)$

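Because the SDS-POMDP state is the Cartesian product of the user state, the user action, and the dialog state, the joint state space grows multiplicatively. The sketch below uses made-up value sets for a hypothetical travel-style task (none of these values come from the paper) to show that growth and how the belief over the user state alone is recovered by summing out $a_u$ and $s_d$.

```python
from collections import defaultdict
from itertools import product

# Hypothetical value sets for each component of s = (s_u, a_u, s_d).
USER_GOALS = ["london", "paris"]                          # s_u
USER_ACTIONS = ["say_london", "say_paris", "yes", "no"]   # a_u
DIALOG_STATES = ["not_stated", "stated", "confirmed"]     # s_d

# The POMDP state space is the Cartesian product of the three components.
JOINT_STATES = list(product(USER_GOALS, USER_ACTIONS, DIALOG_STATES))


def marginal_goal_belief(b: dict) -> dict:
    """Sum b(s_u, a_u, s_d) over a_u and s_d to get a belief over s_u alone."""
    out = defaultdict(float)
    for (s_u, _a_u, _s_d), p in b.items():
        out[s_u] += p
    return dict(out)


uniform = {s: 1.0 / len(JOINT_STATES) for s in JOINT_STATES}
print(len(JOINT_STATES))              # 24 joint states from 2 * 4 * 3
print(marginal_goal_belief(uniform))  # each goal gets about 0.5
```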
## Math behind

- Formula for new belief
- Exact algorithms rarely scale beyond about 10 states, actions, and observations.
- Effective approximate solutions exist.

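For reference, the generic POMDP belief update can be written with the $T$ and $Z$ defined earlier, where $\eta$ normalizes the new belief so that it sums to 1. This is the textbook form; the factored version specific to $s = (s_u, a_u, s_d)$ is not reconstructed here.

```latex
b'(s') = \eta \, P(o' \mid s', a) \sum_{s \in S} P(s' \mid s, a) \, b(s),
\qquad
\eta = \frac{1}{P(o' \mid a, b)}
```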
## Comparing SDS-POMDP

- Better than current approaches (CA)
- CA are simplifications or special cases of SDS-POMDP
- Approaches
  - Parallel state hypotheses
  - Local use of confidence score
  - Automated action planning

## Parallel state hypotheses

- Traditional: 1 state
- Uncertainty → multiple states
- 2 techniques
  - Greedy decision-theoretic approaches
  - M-best list

## Greedy decisions

- Maximize immediate reward
- Do not plan ahead
- Handcrafting + ad hoc tuning

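A greedy decision-theoretic controller scores each action by its immediate expected reward $\sum_s b(s)\, r(s, a)$ and takes the best one, with no look-ahead. The reward table and the action names `ask`, `doSave`, and `doDelete` below are hypothetical, chosen only to show the mechanism.

```python
# Hypothetical reward table r(s, a) for a voicemail-style task.
REWARD = {
    ("save", "ask"): -1.0, ("save", "doSave"): 5.0, ("save", "doDelete"): -20.0,
    ("delete", "ask"): -1.0, ("delete", "doSave"): -10.0, ("delete", "doDelete"): 5.0,
}
ACTIONS = ["ask", "doSave", "doDelete"]


def expected_reward(b: dict, a: str) -> float:
    """Immediate expected reward: sum over s of b(s) * r(s, a)."""
    return sum(p * REWARD[(s, a)] for s, p in b.items())


def greedy_action(b: dict) -> str:
    """Pick the action with the highest immediate expected reward (no planning)."""
    return max(ACTIONS, key=lambda a: expected_reward(b, a))


print(greedy_action({"save": 0.5, "delete": 0.5}))  # ask
print(greedy_action({"save": 0.8, "delete": 0.2}))  # doSave
```

With this table the controller asks while the belief is split evenly and saves once b(save) reaches 0.8. Because it never looks ahead, any long-term trade-off has to be built into the rewards by hand, which is the tuning burden the slide refers to.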
## M-best list

- Considers only the top M hypotheses
- Equivalent to a POMDP with handcrafted action selection
- Operates on a subspace of the belief space

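An M-best list can be sketched as pruning the belief to its M most probable hypotheses and renormalizing, which confines the dialog manager to a subspace of the full belief space. The function name `m_best` and the example city hypotheses are illustrative assumptions.

```python
import heapq


def m_best(b: dict, m: int) -> dict:
    """Keep the m most probable hypotheses and renormalize them to sum to 1."""
    top = heapq.nlargest(m, b.items(), key=lambda item: item[1])
    total = sum(p for _, p in top)
    return {s: p / total for s, p in top}


belief = {"london": 0.5, "paris": 0.3, "leeds": 0.15, "lisbon": 0.05}
print(m_best(belief, 2))  # london ≈ 0.625, paris ≈ 0.375
```

Everything outside the top M is discarded at each turn, so probability mass that later turns might have revived is lost; a full POMDP belief keeps it.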
## Local use of confidence score

- Handcrafted update rules
- $A_c$ = {expl-confirm, imp-confirm, reject}
- Useful, but hard to tune toward long-term dialog goals

## Automated action selection

- Handcrafted planning
  - Unforeseen dialog situations
- POMDP with a single state
- 2 main techniques
  - Supervised learning
  - Markov decision processes

## Supervised learning

- Training data
  - Human-human: much richer
  - Human-machine: contains machine errors
- Single state

## Markov decision process

- A fully observable MDP is a simplification of a POMDP
- Assumes that the world state is known exactly
- Single state

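For contrast, a fully observable MDP can be solved by value iteration, because the policy maps a known state directly to an action. This is a minimal sketch on a hypothetical three-state slot-filling task; the states, rewards, and the default `lam=0.95` are assumptions, and value iteration is simply one standard MDP solver, not necessarily the one used by the systems the paper surveys.

```python
# Hypothetical 3-state slot-filling MDP: the machine observes the state exactly.
STATES = ["empty", "filled", "done"]
ACTIONS = ["ask", "submit"]
# T[(s, a)] = {s': P(s' | s, a)}
T = {
    ("empty", "ask"): {"filled": 0.9, "empty": 0.1},
    ("empty", "submit"): {"done": 1.0},
    ("filled", "ask"): {"filled": 1.0},
    ("filled", "submit"): {"done": 1.0},
    ("done", "ask"): {"done": 1.0},
    ("done", "submit"): {"done": 1.0},
}
# R[(s, a)] = r(s, a); "done" is absorbing with zero reward.
R = {
    ("empty", "ask"): -1.0, ("empty", "submit"): -10.0,
    ("filled", "ask"): -1.0, ("filled", "submit"): 10.0,
    ("done", "ask"): 0.0, ("done", "submit"): 0.0,
}


def value_iteration(lam: float = 0.95, tol: float = 1e-8):
    """Iterate V(s) = max_a [r(s, a) + lam * sum_s' P(s'|s, a) V(s')] to convergence."""
    V = {s: 0.0 for s in STATES}
    while True:
        Q = {(s, a): R[(s, a)] + lam * sum(p * V[s2] for s2, p in T[(s, a)].items())
             for s in STATES for a in ACTIONS}
        V_new = {s: max(Q[(s, a)] for a in ACTIONS) for s in STATES}
        if max(abs(V_new[s] - V[s]) for s in STATES) < tol:
            policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
            return V_new, policy
        V = V_new


values, policy = value_iteration()
print(policy["empty"], policy["filled"])  # ask submit
```

On this toy model the policy asks while the slot is empty and submits once it is filled. The same approach breaks down when the state is only partially observed, which is the gap the POMDP formulation addresses.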
## Empirical support

- Based on simulations
- Benefits of POMDP for
  - Parallel state hypotheses
  - Confidence score
  - Automated planning
- Real data

## Parallel state hypotheses (1–3)

## Confidence score (1)

- Confidence score bands: reject below 0.4, low from 0.4 to 0.8, high above 0.8

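A sketch of a handcrafted confidence-score rule using the 0.4 and 0.8 thresholds. Treating the boundaries as half-open, and mapping the reject/low/high bands onto the $A_c$ actions `reject`/`expl-confirm`/`imp-confirm`, are assumptions made for illustration; the slide gives only the thresholds and band names.

```python
def confidence_bucket(c: float) -> str:
    """Map a confidence score in [0, 1] to a band (boundary handling is assumed)."""
    if c < 0.4:
        return "reject"
    if c < 0.8:
        return "low"
    return "high"


# Hypothetical handcrafted rule: which confirmation action each band triggers.
RULE = {"reject": "reject", "low": "expl-confirm", "high": "imp-confirm"}


def handcrafted_action(c: float) -> str:
    """Choose a confirmation action from the confidence score alone."""
    return RULE[confidence_bucket(c)]


for score in (0.3, 0.6, 0.9):
    print(score, handcrafted_action(score))
```

The rule looks only at the current score, which is why such local use of confidence is easy to write but hard to tune toward long-term dialog goals.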
## Confidence score (2–4)

## Automated planning (1)

- Handcrafted policies: HC1, HC2, HC3

## Automated planning (2–3)

## Real data (1)

- SACTI-1 corpus
  - 144 human-human dialogs in the travel domain

## Real data (2)

## Conclusion

- Significant improvement in robustness
- CA are simplifications or special cases
- Scales poorly
- Unique

## Future work

- Other approaches
  - Information State Update
  - Hidden Information State
- Evaluating on real users

## Questions?

## Thank you!