Harm van Seijen, Research Scientist, Maluuba at MLconf SF 2016

Using Deep Reinforcement Learning for Dialogue Systems:

  1. 1. Using Deep Reinforcement Learning for Dialogue Systems. Harm van Seijen, Research Scientist. Montréal, Canada
  2. 2. spoken dialogue system. components: natural language understanding, state tracker, policy manager, natural language generation, data. example: user input “Hi, do you know a good Indian restaurant” → user act inform(food=“Indian”) → dialogue state → system act request(price_range) → system response “Sure. What price range are you thinking of?”
  3. 3. spoken dialogue system. components: natural language understanding, state tracker, policy manager, natural language generation, data. example: user input “Hi, do you know a good Indian restaurant” → user act inform(food=“Indian”) → dialogue state → system act request(price_range) → system response “Sure. What price range are you thinking of?” The central question: how to train the policy manager?
  4. 4. outline: (1) what is reinforcement learning; (2) solution strategies for RL; (3) applying RL to dialogue systems
  5. 5. what is reinforcement learning. Reinforcement Learning is a data-driven approach towards learning behaviour.
  6. 6. what is reinforcement learning. Reinforcement Learning is a data-driven approach towards learning behaviour. machine learning: unsupervised learning, supervised learning, reinforcement learning
  7. 7. what is reinforcement learning. Reinforcement Learning is a data-driven approach towards learning behaviour. machine learning: unsupervised learning + deep learning, supervised learning + deep learning, reinforcement learning + deep learning
  8. 8. what is reinforcement learning. Reinforcement Learning is a data-driven approach towards learning behaviour. machine learning: unsupervised learning + deep learning, supervised learning + deep learning, reinforcement learning + deep learning = deep reinforcement learning
  9. 9. RL vs supervised learning. behaviour: function that maps environment states to actions
  10. 10. RL vs supervised learning. behaviour: function that maps environment states to actions. supervised learning: hard to specify function; easy to identify correct output
  11. 11. RL vs supervised learning. behaviour: function that maps environment states to actions. supervised learning: hard to specify function; easy to identify correct output. example: recognizing cats in images (f: image → cat / no cat)
  12. 12. RL vs supervised learning. behaviour: function that maps environment states to actions. reinforcement learning: hard to specify function; hard to identify correct output; easy to specify behaviour goal
  13. 13. RL vs supervised learning. behaviour: function that maps environment states to actions. reinforcement learning: hard to specify function; hard to identify correct output; easy to specify behaviour goal. example: double inverted pendulum. state: θ1, θ2, ω1, ω2; action: clockwise/counter-clockwise torque on top joint; goal: balance pendulum upright
  14. 14. advantages RL: does not require knowledge of good policy; does not require labelled data; online learning: adaptation to environment changes
  15. 15. challenges RL: requires lots of data; sample distribution changes during learning; samples are not i.i.d.
  16. 16. outline: (1) what is reinforcement learning; (2) solution strategies for RL; (3) applying RL to dialogue systems
  17. 17. definitions
  18. 18. definitions
  19. 19. definitions
  20. 20. definitions
  21. 21. definitions
  22. 22. estimating the value function
  23. 23. estimating the value function
  24. 24. estimating the value function
  25. 25. estimating the value function
  26. 26. estimating the value function
  27. 27. finding the optimal policy. policy estimation: policy improvement:
  28. 28. finding the optimal policy. policy estimation: policy improvement: Q-learning: classical RL algorithm; combines (partial) policy evaluation with (partial) policy improvement. update target:
  29. 29. deep reinforcement learning. 2015 Nature paper from DeepMind introduced an RL method based on deep learning, called DQN. main result: with same network architecture, learned to play large number of Atari 2600 games effectively
  30. 30. deep reinforcement learning. 2015 Nature paper from DeepMind introduced an RL method based on deep learning, called DQN. main result: with same network architecture, learned to play large number of Atari 2600 games effectively. DQN characteristics: variation on Q-learning that uses deep neural networks to approximate the Q function; uses experience replay to deal with non-i.i.d. samples; uses two networks (Q and Q’) to mitigate non-stationarity of update targets
  31. 31. outline: (1) what is reinforcement learning; (2) solution strategies for RL; (3) applying RL to dialogue systems
  32. 32. applying RL to dialogue system. training dialogue manager requires huge number of online samples; hence, a user simulator, trained on offline data, is used to train dialogue manager. [diagram labels: policy manager, system act, user simulator, training, state tracker, dialogue act, offline data]
  33. 33. deep RL for dialogue system. exact state is not observed, hence belief state is used; belief-state spaces are typically discretized into summary state spaces to make the task tractable; deep RL can be applied directly to the belief-state space due to its strong generalization properties; with pre-training, a deep RL method can become even more efficient
  34. 34. effect of pre-training. [figure: without pre-training vs. with pre-training; based on DSTC2 dataset]
  35. 35. summary. RL is a data-driven approach towards learning behaviour; RL does not require knowledge of good policy; RL can be used for online learning; combining RL with deep learning means that RL can be applied to much bigger problems; constructing a good policy for a modern dialogue manager is a challenging task; deep RL is the perfect candidate to address this challenge
  36. 36. Further reading:
 “Reinforcement Learning: An Introduction” by Richard S. Sutton & Andrew G. Barto, https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html
 “Algorithms for Reinforcement Learning” by Csaba Szepesvari, https://sites.ualberta.ca/~szepesva/RLBook.html
 “Policy Networks with Two-Stage Training for Dialogue Systems” by Mehdi Fatemi, Layla El Asri, Hannes Schulz, Jing He, Kaheer Suleman, https://arxiv.org/abs/1606.03152
 Code examples:
 simple DQN example in Python: https://edersantana.github.io/articles/keras_rl/
 tool for testing/developing RL algorithms: https://gym.openai.com/
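
Slide 28 names Q-learning and its update target. As a minimal sketch, assuming the standard textbook form of that target, r + γ · max over a' of Q(s', a'), the update can be written in tabular form as below. The slide's own notation is not in the transcript, and the toy chain environment, the function names, and the hyperparameter values are hypothetical choices made only for illustration.

```python
import random


def q_learning_update(q, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place; return the update target."""
    # Assumed textbook target: r + gamma * max_a' Q(s', a').
    # Terminal transitions do not bootstrap from the next state.
    best_next = 0.0 if done else max(q[next_state])
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return target


def chain_step(state, action, n_states=4):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.

    Reaching the right-most state ends the episode with reward 1.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, n_states=4, epsilon=0.2, seed=0):
    """Run epsilon-greedy Q-learning on the toy chain and return the Q table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                # Exploit, breaking ties at random so early episodes still move.
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state, reward, done = chain_step(state, action, n_states)
            q_learning_update(q, state, action, reward, next_state, done)
            state = next_state
    return q
```

Calling `train()` returns a table in which, for every non-terminal state, the value of moving right exceeds the value of moving left. Each update is a partial evaluation step (moving Q towards the target) combined with a partial improvement step (the max in the target), in the sense described on slide 28.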

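The two DQN mechanisms listed on slide 30, experience replay and a second network Q’ for computing update targets, can be sketched without a deep learning library. This is an illustration under stated assumptions, not the DQN implementation from the Nature paper: plain Python tables stand in for the neural networks, and `ReplayBuffer`, `replay_update`, and `sync_target` are hypothetical names.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions (state, action, reward, next_state, done).

    Sampling uniformly from past transitions reduces the correlation
    between consecutive training samples.
    """

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted
        self.rng = random.Random(seed)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        size = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), size)

    def __len__(self):
        return len(self.buffer)


def replay_update(q, q_target, batch, alpha=0.1, gamma=0.9):
    """Update Q on a sampled batch, bootstrapping from the frozen copy Q'."""
    for state, action, reward, next_state, done in batch:
        # The target uses q_target, which stays fixed between syncs,
        # so the update targets do not shift with every change to q.
        bootstrap = 0.0 if done else max(q_target[next_state])
        target = reward + gamma * bootstrap
        q[state][action] += alpha * (target - q[state][action])


def sync_target(q):
    """Return a fresh copy of Q to serve as the new target Q'."""
    return copy.deepcopy(q)
```

In a training loop, transitions would be added to the buffer as they are observed, `replay_update` would run on sampled batches, and `sync_target` would be called only every so many updates, which is what keeps the targets stable in between.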