Wei Xu at AI Frontiers : Language Learning in an Interactive and Embodied Setting

  1. Language Learning in an Interactive and Embodied Setting. Wei Xu, Horizon Robotics, 11/2018.
  2. A Developmental Approach to Machine Intelligence: 1. It might be easier than solving all the tasks a human adult can do. 2. Learn skills and knowledge unspecified at design time. 3. Gradually proceed from easy tasks to difficult tasks. "Instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child's? If this were then subjected to an appropriate course of education one would obtain the adult brain." - Alan Turing (1950)
  3. Why Embodied? Learn from the experiences coming from the machine's interactions with its environment. Learn common sense through observation of and interaction with the environment. Meaning emerges by "grounding" language in the modalities of our environment. (Human driving: < 1,000 miles; self-driving: > 10 million miles.)
  4. Why Interactive? A useful robot needs to be able to understand and communicate effectively. It is easier for humans to teach machines directly using language than by writing code, and humans are great teachers. The machine can learn the effects of speaking by observing feedback from its conversational partner, and can learn human values through the interaction.
  5. Answering Questions and Following Commands: 1. Is it possible to learn to follow commands using end-to-end reinforcement learning, without any pretraining for vision or language? 2. Does learning question answering help learning to follow commands? 3. Can the machine understand words in new contexts not seen in training? (Haonan Yu, Haichao Zhang, Wei Xu, "Interactive Grounded Language Acquisition and Generalization in a 2D World," ICLR 2018.)
  6. Problem Setup. "East" and "avocado" never appear together in training; "watermelon" only appears in answers during training.
  7. Model Architecture. The model outputs both an answer and an action value.
  8. Experiments. (Comparison against a variant with no QA training.)
  9. Generalization Ability. The model generalizes to word combinations never seen in training, and to questions containing words never seen in training. ("Held out X%": X% of words/combinations are held out from training.)
  10. Navigation in a 3D Environment ("Navigate to the dog!"). Challenges: partially observed; much longer reward delay; more visual variation.
  11. Guided Feature Transformation. (Haonan Yu, Xiaochen Lian, Haichao Zhang, Wei Xu, "Guided Feature Transformation (GFT): A Neural Language Grounding Module for Embodied Agents," CoRL 2018.)
  12. Experimental Results.
  13. Demo. Example commands: "the object besides candle is your target ."; "please move to the object that is front of the basketball ."; "can you reach the object right of toilet ?"; "go to the object to the right of bike please ."; "reach the location between car and trampoline please ."; "please navigate to the grid between gift and tower ."; "please navigate to the grid between bucket and chair ."
  14. Learning to Speak and Remember: 1. How can the machine learn to speak by talking with other people? 2. What information should be remembered? 3. How can knowledge in memory be utilized? (Haichao Zhang, Haonan Yu, Wei Xu, "Interactive Language Acquisition with One-Shot Visual Concept Learning through a Conversation Game," ACL 2018.)
  15. Problem Setup. Rewards are given for each learner response based on its appropriateness.
  16. Memory-Augmented Imitation + Behavior Shaping Through RL. Modules: interpreter, speaker, vision, memory. (Example exchange: "What is this?" / "It is a bird.", followed by a reward.)
  17. Model Detail. Trained end-to-end using gradient descent over imitation cost + REINFORCE cost.
  18. Example Dialogs (T: virtual teacher; L: learner, i.e., the machine). Before learning: T: "i see grape" L: "watermelon grape watermelon"; T: "tell what you see" L: "see see see see see"; T: "there is grape" L: "grape grape watermelon"; T: "i can observe coconut" L: "fox watermelon watermelon". (The slide also shows dialogs after learning.)
  19. Summary. What we have now: learning to understand and use simple language, memorize useful information, and execute simple commands from interactions with a virtual teacher in virtual environments. What we will do in the future: simple → complex; virtual → real.
  20. AI Research at Horizon Robotics. About the company: a leading technology powerhouse of edge AI platforms; provides algorithms, processors, and hardware jointly optimized for high-performance, low-power, and low-cost edge AI capabilities; CES 2019 Innovation Award. General AI Lab @ Silicon Valley: research toward the company's long-term vision for artificial general intelligence; building machines that can learn skills and knowledge unspecified at design time. Applied AI Lab @ Silicon Valley: applied research focusing on near-term needs; developing novel AI technologies that are critical to our current products. Jobs: bit.ly/general-ai-lab, bit.ly/applied-ai-lab.
  21. THANKS!

Editor's Notes

  • Good afternoon everyone. I am Wei Xu from Horizon Robotics. Today I am going to talk about our recent work on language learning in an interactive and embodied setting.
  • In 1950, in the same article where the famous Turing test was proposed, Turing also proposed a solution: "Instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child's? If this were then subjected to an appropriate course of education one would obtain the adult brain." There are several advantages to this approach. First, there are so many things a human adult can do that it would be too expensive and difficult to solve each of them individually. Second, requiring that all of the machine's skills and knowledge be acquired through its own learning ensures that it will be able to learn new skills and knowledge unspecified at design time. Third, learning in a developmental way lets the machine gradually proceed from easier tasks to more difficult ones, which can make learning easier. This is like curriculum learning, which has been found to be effective in many difficult learning problems.
  • For embodied learning, the learning experiences come from the machine's physical interactions with its environment.
    By actually doing things and observing the effects, the machine can learn a lot of commonsense knowledge about the environment. This kind of knowledge is typically very hard to capture with rules or a static dataset. Self-driving cars are a great example. Waymo recently announced that the total mileage of their cars exceeds 10 million miles, yet they are still not fully ready for deployment. On the other hand, we all know from experience that a human can learn to drive very well with a few hundred miles of practice. A key difference between self-driving cars and humans is that humans have a lot of commonsense knowledge about the world. For example, even without learning to drive, a person knows which situations are unsafe, which obstacles should be avoided, and so on. But for self-driving, all of this commonsense knowledge has to be either coded as rules or obtained from a huge amount of driving data.

    Embodied learning is also very helpful for understanding language. In order for the machine to understand and use language, it needs to connect word sequences with the actual objects and events in the environment. …
  • Why should the machine learn in an interactive way? There are several reasons. First, a useful robot needs to interact with humans, so it should be able to understand them and communicate with them effectively.
    Second, it is easier for humans to teach machines directly using language than by writing code, and humans are great teachers because they are good at adjusting their teaching to the state of the learner. In order to use language, the machine needs to learn the effects of speaking by observing feedback from its conversational partner. Finally, through interaction with humans, the machine can learn human values, which is very important for making sure it will act consistently with them.

    So far I've talked about our motivation for learning language in an interactive and embodied setting. In the rest of the talk, I will present our recent work along this direction.
  • The first one is about learning to answer questions and follow commands. This work was published at ICLR 2018. There are three questions we want to study in this paper.
  • Here is the problem setup. We developed a 2D simulator. For each session, we generate a random map, a question, and an instruction. The answer is provided as direct supervision, and the agent is given a reward based on whether it successfully executes the instruction. At test time, the agent is given commands with words or word combinations never seen in the training commands or questions.
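    To make this protocol concrete, here is a minimal, self-contained Python sketch of one such session. This is my own illustration under assumed names (ToyEnv, RandomAgent) and an assumed reward scheme, not the paper's code:

      # Toy sketch of the session protocol: questions give direct supervision
      # on the answer; commands give reward only on successful execution.
      import random

      class ToyEnv:
          """5x5 grid with named objects; the agent must step onto the target."""
          def __init__(self):
              cells = random.sample(range(25), 3)
              self.objects = dict(zip(["apple", "avocado", "dog"], cells))
              self.agent = random.choice([c for c in range(25) if c not in cells])

          def step(self, move):  # move in {-5, +5, -1, +1} = north/south/west/east
              self.agent = max(0, min(24, self.agent + move))

      class RandomAgent:
          def act(self, obs, sentence):
              return random.choice([-5, 5, -1, 1])
          def answer(self, obs, sentence):
              return random.choice(["apple", "avocado", "dog"])

      def run_session(env, agent, max_steps=20):
          target = random.choice(list(env.objects))
          if random.random() < 0.5:                  # QA task: answer is supervised
              pred = agent.answer(env.objects, "what is in the south east ?")
              return "qa", pred == target
          for _ in range(max_steps):                 # command task: reward on success
              env.step(agent.act(env.agent, "please go to the " + target + " ."))
              if env.agent == env.objects[target]:
                  return "nav", True                 # positive reward
          return "nav", False                        # episode ends without reward

      print(run_session(ToyEnv(), RandomAgent()))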
  • This is the high-level structure of our model. I won't go into its details; what I want to say here is that we designed the structure with a focus on generalization ability.
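    As a rough sketch of a model with this interface (image and sentence in; answer distribution and action values out), one might write something like the following in PyTorch. The layer sizes and module choices are assumptions for illustration, not the architecture from the paper:

      # Illustrative dual-head model: a shared visual-language representation
      # feeds both an answer head (QA) and an action-value head (commands).
      import torch
      import torch.nn as nn

      class GroundingModel(nn.Module):
          def __init__(self, vocab=100, n_actions=4, d=64):
              super().__init__()
              self.embed = nn.Embedding(vocab, d)
              self.lang = nn.GRU(d, d, batch_first=True)      # sentence encoder
              self.vision = nn.Sequential(                    # image encoder
                  nn.Conv2d(3, d, 3, stride=2), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten())
              self.answer_head = nn.Linear(2 * d, vocab)      # softmax over words
              self.value_head = nn.Linear(2 * d, n_actions)   # per-action values

          def forward(self, image, tokens):
              _, h = self.lang(self.embed(tokens))            # h: (1, B, d)
              fused = torch.cat([self.vision(image), h[-1]], dim=-1)
              return self.answer_head(fused), self.value_head(fused)

      model = GroundingModel()
      img = torch.randn(2, 3, 32, 32)
      toks = torch.randint(0, 100, (2, 7))
      answer_logits, action_values = model(img, toks)
      print(answer_logits.shape, action_values.shape)         # (2, 100) (2, 4)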
  • This is a short video demo showing how the agent navigates following commands. The current command is "please move to the object that is front of the basketball". The agent needs to reach the toilet paper, the object located in front of the basketball. After it finishes a task, a new map and command are generated.
  • So far our agent is able to understand some language. In this work, we want the agent to learn to use language through conversation.
  • Here is the problem setup. Initially, the agent has zero language ability; it can neither understand nor use language, just like a newborn baby.
  • This is the high-level structure of our model. First, it needs a memory module because it must remember information coming from teacher utterances and images. The vision module generates the visual representation. The interpreter module understands the teacher's utterance and decides whether to store information into memory. The speaker module is responsible for generating responses based on its understanding of the teacher's utterances and on information retrieved from memory. The whole system is trained by predicting the teacher's word sequences and from rewards indicating the appropriateness of its responses.
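    A minimal sketch of the memory flow described here, assuming a simple key-value store with soft attention reads (the paper's actual memory design may differ):

      # Toy key-value memory: the interpreter writes (visual feature, word
      # embedding) pairs; the speaker reads them back by attention over keys.
      import torch
      import torch.nn.functional as F

      class KeyValueMemory:
          """Stores (visual feature -> word embedding) pairs across a session."""
          def __init__(self):
              self.keys, self.values = [], []

          def write(self, visual_feat, word_emb):   # interpreter decides to store
              self.keys.append(visual_feat)
              self.values.append(word_emb)

          def read(self, query):                    # speaker retrieves by attention
              if not self.keys:
                  return torch.zeros_like(query)
              K, V = torch.stack(self.keys), torch.stack(self.values)
              attn = F.softmax(K @ query / K.shape[-1] ** 0.5, dim=0)
              return attn @ V

      d = 8
      mem = KeyValueMemory()
      bird_visual, bird_word = torch.randn(d), torch.randn(d)
      mem.write(bird_visual, bird_word)             # "It is a bird." gets stored
      retrieved = mem.read(bird_visual + 0.1 * torch.randn(d))
      print(F.cosine_similarity(retrieved, bird_word, dim=0))  # close to 1.0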
  • I will skip the details of the model. Just note that it is trained end-to-end using gradient descent over an imitation cost plus a REINFORCE cost, sketched below.
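    In sketch form, the objective might combine a cross-entropy imitation term on the teacher's words with a REINFORCE term on the learner's own sampled responses. The weighting and baseline below are my assumptions, not values from the paper:

      # Combined objective: imitate the teacher's word sequence, and reinforce
      # sampled response words in proportion to the reward advantage.
      import torch
      import torch.nn.functional as F

      def combined_loss(teacher_logits, teacher_targets,
                        sampled_logprobs, reward, baseline=0.0, alpha=1.0):
          # Imitation: predict the teacher's next word at every position.
          imitation = F.cross_entropy(teacher_logits, teacher_targets)
          # REINFORCE: scale sampled-response log-likelihood by the advantage.
          reinforce = -(reward - baseline) * sampled_logprobs.sum()
          return imitation + alpha * reinforce

      logits = torch.randn(5, 100, requires_grad=True)   # 5 teacher tokens, vocab 100
      targets = torch.randint(0, 100, (5,))
      logp = torch.log_softmax(torch.randn(3, 100), dim=-1)[[0, 1, 2], [1, 4, 2]]
      loss = combined_loss(logits, targets, logp, reward=1.0)
      loss.backward()
      print(loss.item())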
  • Here I show some dialog examples. This is a dialog before learning: the agent just generates garbage responses, like a newborn baby.
    Then dialogs after learning. Here I want to mention that the machine never saw these types of objects during training.
    From these dialogs we can see that the machine has learned several things. It can confirm statements from the teacher. It can actively seek information by asking questions. It can remember relevant information provided by the teacher and later use it to answer questions. And it can answer the teacher's questions when it knows the answer. It apparently learned to use shape as the major cue for differentiating objects.
    I want to emphasize that, unlike most chatbots, where the behavior of the bot is largely designed by humans, none of these behaviors are programmed here. The machine learned all of them through its interaction with the teacher, in a similar way to how a baby learns from its parents.
  • In this final slide, I am going to say a little bit about Horizon Robotics. It's a leading technology powerhouse of edge AI platforms. Its current focus is providing algorithms, processors, and hardware jointly optimized for high-performance, low-power, and low-cost edge AI capabilities. And I want to share some good news: we just received a CES 2019 Innovation Award for Vehicle Intelligence and Self-Driving Technology.
    We have two AI labs in Silicon Valley. One is the General AI Lab, which does the kind of research I just talked about: building machines that can learn skills and knowledge unspecified at design time.
    We also have the Applied AI Lab, which does applied research focusing on the near-term needs of the company, developing novel AI technologies that are critical to our current products.
    We are actively hiring. If you are interested, please visit either of these two websites.
