



Cloud robotics for building conversational robots.



  1. Cloud Robotics for Building Conversational Robots. Komei Sugiura, National Institute of Information and Communications Technology (NICT), Japan
  2. Beyond the Language Barrier: NICT’s free software and cloud services
     1. Speech-to-speech translation system VoiceTra (2010): >1M downloads; high performance in translation to/from Asian languages
     2. MCML Speech Interaction SDK (2013): enables users to build WFST-based multilingual dialogue systems
     3. Smartphone dialogue apps (2011): spoken dialogue and recommendation in tourist-guidance domains
     4. Cloud robotics platform rospeex (2013): 40K unique users; top-level quality in Japanese dialogue-oriented TTS
  3. [New] Automatic captioning SDK for developers: free of charge, but authentication is required
     [Video]
  4. Motivation: How can we build communicative robots that help people?
     • Smartphones and other consumer devices: speech interfaces benefit consumers
       – Example queries: “Show me today’s schedule”, “Sushi restaurants around here”
       – QA/search benefits from GPS, contacts, and other context information
       – cf. Market size of speech recognition: ¥88B (2013) → ¥170B (2018, €1.5B)*
     • Current communication with robots: insufficient benefit to consumers
       – Example utterances that leave the robot confused: “Throw them away.”, “Is there any milk in the fridge?”
       – Bad recognition accuracy
       – The user needs to specify [what, where, how] as well as start/end conditions
     * Estimation by NEDO, TSC Foresight Vol. 8, 2015
  6. Background: Speech recognition/synthesis is a bottleneck for reducing cost in human-robot interaction
     • Synthesized speech sounds monotonous and unfriendly
     • Speech recognition does not work as well as expected
     • (Audio samples: XIMERA 3 (text-reading) and a voice talent)
     • Target = interactions with service robots
  7. Rospeex: A cloud robotics platform for multilingual spoken dialogues
     • More than 40,000 unique users have used rospeex
     • WER = 7.9% (word accuracy = 92.1%) on IWSLT tst2011 (1st place in IWSLT 2012, 2013, and 2014)
     • Top-level quality dialogue-oriented TTS
     • Python and C++ samples are available (search for “rospeex”)
     * Free of charge for research
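The WER figure on this slide counts substitutions (S), deletions (D), and insertions (I) against the N words of the reference transcript, WER = (S + D + I) / N, and the quoted word accuracy is 1 − WER (1 − 0.079 = 0.921). A minimal sketch using word-level Levenshtein distance follows; it is a generic illustration, not the official IWSLT scoring tool, and the function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed with word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


def word_accuracy(wer: float) -> float:
    """Word accuracy in the sense used on the slide: 1 - WER."""
    return 1.0 - wer


if __name__ == "__main__":
    # One deletion ("any") and one substitution ("the" -> "a") over 7 reference words
    print(round(word_error_rate("is there any milk in the fridge",
                                "is there milk in a fridge"), 3))  # 0.286
```

For a whole test set such as tst2011, errors and reference word counts are normally summed over all utterances before dividing, rather than averaging per-utterance WERs; real scoring also applies text normalization (case, punctuation) that this sketch omits.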
  8. Rospeex’s positioning in robot dialogue quadrants
     • Quadrant axes: cloud-based vs. stand-alone, and robot-middleware-compatible vs. incompatible
     • Alternatives: cloud APIs (Google, Microsoft, IBM, NTT docomo, …), free software (OpenHRI, PocketSphinx, Festival), and commercial software
     • Limitations of the alternatives: does not work on very low-spec PCs; robotics-specific logs are lost; authentication required; low quality; expensive
     • (Figure: distribution of rospeex users)
     • rospeex applications (40k unique users): conversational agents in elderly care facilities, service robots, humanoids, dialogue agents, speech interfaces for car navigation systems or smart-home devices, …
  9. Analysis: TTS requests depend heavily on the individual developer
     • Question: Do developers reuse the same sentences for TTS? If so, we can speed up TTS by introducing a local cache (figure: cache hit vs. cache miss)
     • Analysis of the top 88 users
       – New requests = 50.4% on average
       – An individual uses at most 200 unique sentences
     • Introducing a cache will reduce communication time
     • Without a cloud platform, we could not conduct a large-scale analysis of robot developers
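The caching idea on this slide can be sketched as a client-side least-recently-used (LRU) cache keyed on the sentence and voice. This is an illustrative assumption, not rospeex’s actual client: the `synthesize` callable (and `fake_cloud_tts` in the demo) is a hypothetical stand-in for a cloud TTS request, and the default capacity of 200 merely echoes the observation that an individual uses at most 200 unique sentences.

```python
from collections import OrderedDict
from typing import Callable, Tuple


class TTSCache:
    """Client-side LRU cache of synthesized audio keyed by (sentence, voice).

    `synthesize` is a hypothetical stand-in for a cloud TTS request;
    it is not the rospeex API.
    """

    def __init__(self, synthesize: Callable[[str, str], bytes], capacity: int = 200):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._synthesize = synthesize
        self._capacity = capacity
        self._store = OrderedDict()  # (sentence, voice) -> audio bytes
        self.hits = 0
        self.misses = 0

    def get(self, sentence: str, voice: str = "default") -> bytes:
        key: Tuple[str, str] = (sentence, voice)
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        self.misses += 1
        audio = self._synthesize(sentence, voice)  # cache miss: ask the server
        self._store[key] = audio
        if len(self._store) > self._capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
        return audio

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    def fake_cloud_tts(sentence: str, voice: str) -> bytes:
        # Hypothetical placeholder: a real client would send a network request here
        return sentence.encode("utf-8")

    cache = TTSCache(fake_cloud_tts)
    for s in ["Hello", "How can I help you?", "Hello", "Hello"]:
        cache.get(s)
    print(cache.hits, cache.misses)  # 2 2
```

With new requests averaging 50.4%, roughly half of a typical developer’s requests are repeats, so even an unbounded cache of this kind could serve at most about that share locally; the actual saving depends on each developer’s request pattern.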
  11. Multimodal language understanding
     • Kollar+ 2010 (HRI 2010 Best Paper)
       – Input: text, LRF (laser range finder), image
       – Output: path planning
       – E.g. “Go down the hallway”
     • Iwahashi & Sugiura+ 2010
       – Input: image and speech
       – Output: object manipulation
       – E.g. “Place-on Elmo”
     • Visual QA [2015-]
       – Input: image and question
       – Output: answer
       – E.g. “How many elephants are there?” → “2”
     [Video]
  12. LCore: Multimodal Robot Language Acquisition [Iwahashi, Sugiura, et al. 2010]
     Key features
     • Fully grounded vocabulary
     • Imitation learning
     • Incremental and interactive learning
     • Language independent
     • Learning when to ask questions
  13. Imitation learning for spoken language understanding: re-ranking hypotheses using the likelihood of planned trajectories
     • Transformation of reference-point-dependent HMMs*
       – Input: verb ID and object ID(s), e.g. <place-on, Object 1, Object 3>
       – Transforms the HMM from its intrinsic coordinate system into the world coordinate system
     (Figure: HMM “Place-on” for “Place X on Y”, mapped from its intrinsic CS into the world CS for a given situation)
     * Sugiura et al., IROS 2011 RoboCup Best Paper
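A reference-point-dependent model stores each HMM state’s output Gaussian in an intrinsic coordinate system attached to a reference object. The sketch below assumes a 2-D intrinsic frame with origin `ref_point` and rotation `theta` relative to the world frame, so a mean maps as R(θ)μ + r and a covariance as R(θ)ΣR(θ)ᵀ. It then scores a planned trajectory under a fixed, assumed state alignment and re-ranks recognition hypotheses by a weighted sum of speech and motion log-likelihoods. All names, the 2-D frame, and the linear score combination are illustrative assumptions, not the published method.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float]
Mat = Tuple[Vec, Vec]


def rotation(theta: float) -> Mat:
    """2-D rotation matrix R(theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))


def matmul(A: Mat, B: Mat) -> Mat:
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def transpose(A: Mat) -> Mat:
    return ((A[0][0], A[1][0]), (A[0][1], A[1][1]))


def to_world(means: Sequence[Vec], covs: Sequence[Mat], ref_point: Vec, theta: float):
    """Map per-state Gaussians from the intrinsic frame to the world frame.

    Assumption: x_world = R(theta) x_intrinsic + ref_point.
    """
    R = rotation(theta)
    world_means = [
        (R[0][0] * m[0] + R[0][1] * m[1] + ref_point[0],
         R[1][0] * m[0] + R[1][1] * m[1] + ref_point[1])
        for m in means
    ]
    world_covs = [matmul(matmul(R, S), transpose(R)) for S in covs]
    return world_means, world_covs


def gaussian_logpdf(x: Vec, mean: Vec, cov: Mat) -> float:
    """Log density of a 2-D normal distribution."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Mahalanobis term using the closed-form inverse of a 2x2 matrix
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + maha)


def motion_loglik(trajectory, states, world_means, world_covs) -> float:
    """Sum of per-frame log densities under a fixed state alignment (assumed given)."""
    return sum(
        gaussian_logpdf(x, world_means[q], world_covs[q])
        for x, q in zip(trajectory, states)
    )


def rerank(candidates, weight: float = 1.0):
    """Sort (label, speech_logp, motion_logp) tuples by speech_logp + weight * motion_logp."""
    return sorted(candidates, key=lambda c: c[1] + weight * c[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-state "place-on" model in the landmark's intrinsic frame
    means = [(-0.2, 0.1), (0.0, 0.05)]
    covs = [((0.01, 0.0), (0.0, 0.01))] * 2
    wm, wc = to_world(means, covs, ref_point=(1.0, 2.0), theta=0.0)
    planned = [(0.8, 2.1), (1.0, 2.05)]
    print(motion_loglik(planned, [0, 1], wm, wc))
```

Adding the motion term lets a hypothesis with a slightly worse speech score win when its planned trajectory fits the current scene much better; the weight would need to be tuned on data.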
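  14. HMM-based trajectory generation using dynamic features*
     Maximum likelihood trajectory:
     $$\hat{\boldsymbol{C}} = \arg\max_{\boldsymbol{C}} P(\boldsymbol{W}\boldsymbol{C} \mid \boldsymbol{q}, \lambda), \qquad \boldsymbol{O} = \boldsymbol{W}\boldsymbol{C}$$
     $$\boldsymbol{W}^{\top}\boldsymbol{U}^{-1}\boldsymbol{W}\,\hat{\boldsymbol{C}} = \boldsymbol{W}^{\top}\boldsymbol{U}^{-1}\boldsymbol{M}$$
     • q: state sequence
     • λ: HMM parameters
     • O: time series of (position, velocity, acceleration)
     • M: vector of mean vectors
     • U: matrix of covariance matrices of each OPDF (output probability density function)
     • W: matrix of coefficients in difference approximation
     • C: time series of position
     *Tokuda, K. et al., “Speech parameter generation algorithms for HMM-based speech synthesis”, 2000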
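Stacking position, velocity, and acceleration as O = WC makes the observation linear in the position sequence C, so maximizing the Gaussian likelihood under a fixed state sequence reduces to the linear system WᵀU⁻¹W C = WᵀU⁻¹M, where M stacks the state means and U the state covariances. The sketch below solves it for a 1-D position track, assuming the state sequence is already fixed (so per-frame means and diagonal variances are known) and using common central-difference windows; the window coefficients and boundary handling are assumptions, not taken from the slide.

```python
from typing import List, Sequence

# (offset, coefficient) pairs per window; an assumed, commonly used choice
WINDOWS = [
    [(0, 1.0)],                        # position      c_t
    [(-1, -0.5), (1, 0.5)],            # velocity      (c_{t+1} - c_{t-1}) / 2
    [(-1, 1.0), (0, -2.0), (1, 1.0)],  # acceleration  c_{t+1} - 2c_t + c_{t-1}
]


def window_matrix(T: int) -> List[List[float]]:
    """Build W (3T x T); rows ordered [pos_0, vel_0, acc_0, pos_1, ...].

    Terms falling outside [0, T) are dropped at the boundaries.
    """
    W = [[0.0] * T for _ in range(3 * T)]
    for t in range(T):
        for d, window in enumerate(WINDOWS):
            for offset, coef in window:
                if 0 <= t + offset < T:
                    W[3 * t + d][t + offset] = coef
    return W


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a nonsingular square A."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def ml_trajectory(means: Sequence[Sequence[float]],
                  variances: Sequence[Sequence[float]]) -> List[float]:
    """Solve W^T U^-1 W C = W^T U^-1 M for the position sequence C.

    means, variances: length-T lists of (pos, vel, acc) Gaussian parameters of
    the state occupied at each frame; U is assumed diagonal.
    """
    T = len(means)
    W = window_matrix(T)
    m = [v for frame in means for v in frame]               # M, ordered like W's rows
    prec = [1.0 / v for frame in variances for v in frame]  # diagonal of U^-1
    A = [[sum(W[r][i] * prec[r] * W[r][j] for r in range(3 * T)) for j in range(T)]
         for i in range(T)]
    b = [sum(W[r][i] * prec[r] * m[r] for r in range(3 * T)) for i in range(T)]
    return solve(A, b)


if __name__ == "__main__":
    # Position target steps from 0 to 1; velocity/acceleration means are zero,
    # so the dynamic features smooth the step into a gradual motion.
    T = 10
    means = [(0.0 if t < T // 2 else 1.0, 0.0, 0.0) for t in range(T)]
    variances = [(0.01, 0.1, 0.1)] * T
    print([round(c, 3) for c in ml_trajectory(means, variances)])
```

Tokuda et al. exploit the banded, symmetric positive-definite structure of WᵀU⁻¹W (e.g. with a Cholesky decomposition); the dense elimination here is only for readability.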
  16. RoboCup@Home: Benchmark tests for domestic robots
     • RoboCup@Home: the largest competition for domestic robots
       – One of the major RoboCup leagues
       – Focuses on human-robot interaction and mobile manipulation
       – Robots are evaluated by 8 standardized tasks and 3 demonstration tasks
     • Scientific challenges
       – Navigation in unknown environments (e.g. a real shop), handling everyday objects, spoken dialogue in very noisy environments, …
  17. RoboCup@Home Standard Platform Leagues start in 2017
     • Many teams need low-cost standardized platforms
     • Companies are aware of NAO’s success after it was selected as the Standard Platform for RoboCup soccer (SoftBank bought Aldebaran for 100M USD)
     • Toyota HSR
       – Main use case: partner robot for people who need care
       – Lease-based
     • SoftBank Pepper
       – Already deployed in restaurants and shops
       – Very low price
     • Both are compatible with ROS
     • CFPs for HSR/Pepper users will open soon
  18. Summary
     • Data-driven approaches
     • Multimodal spoken dialogue with robots
     • RoboCup and domestic service robots
     • …and we’re hiring!