This document discusses cloud robotics platforms for human-robot dialogues. It introduces Rospeex, a cloud robotics platform developed by NICT for multilingual dialogues. Rospeex has over 30,000 unique users and achieves state-of-the-art performance in speech recognition. The document also discusses NICT's work building domestic service robots through the RoboCup@Home competition, and their research using machine learning to better predict air pollution and reduce its social costs.
Cloud Robotics for Human-Robot Dialogues
1. Cloud Robotics for Human-Robot Dialogues
Komei Sugiura
Senior Researcher,
National Institute of Information and Communications Technology
Trustee,
RoboCup Federation
2. NICT:
Japan’s National Research Institute for ICT
Possible collaborations
• Human-robot communication, scene understanding, multimodal dialogues, IoT data mining, and other robotics/machine-learning/CV topics
Annual budget: ¥29.7B (£168M)
Researchers / staff: 434 / 937
Research topics
Spoken language processing, natural language processing, machine translation, databases, data mining, etc. @Kyoto
Photonic network, wireless network, cybersecurity, time standard, neuroscience, space weather, etc. @Tokyo, Osaka
VoiceTra: >1M downloads since July 2010
3. Multimodal dialogues with robots: Language processing using
non-linguistic information is challenging
Smartphones and other consumer devices
Language processing using non-linguistic information is beneficial
cf. Market size of speech recognition: ¥88B (2013) → ¥170B (2018) (£1B)*
* Estimate by NEDO, TSC Foresight Vol.8, 2015
"Show me today’s schedule"
"Sushi restaurants around here"
Benefit for QA/search
GPS, contacts, other context info.
Current communication with robots
Limited multi-modality and scalability in robot intelligence
"Throw them away." → ??
"Is there any milk in the fridge?" → ??
cf. [Steels 2003, Roy 2005, Iwahashi 2007, Kollar+ 2010, Yu+ 2013]
4. Key Question:
How can we make robot intelligence scalable and multimodal?
Major speech recognition engines are trained with large-scale corpora (>1000
hours ≒ 100M utterances) and are continuously improved as cloud services
RoboCup@Home: Target user scenario with service robots
XIMERA 3 (by NICT)
Voice talent
cf. [Sugiura+ ICRA2014]
Can we make such innovations in robotics?
• Training with large-scale datasets and continuous improvements in e.g.
dialogues, object recognition, grasping, simulation, …
5. (1) Rospeex:
We built a cloud robotics platform for multilingual dialogues*
• 30,000 unique users since Sep. 2013
• Non-monologue speech synthesis designed for robots [Sugiura+ 2014]
• Word error rate (WER) = 7.9% on IWSLT tst2011 (1st place in IWSLT 2012, 2013, and 2014)
Python & C++ samples are available
Search: rospeex
* For research/development use only
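The word error rate quoted on this slide is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal Python sketch under stated assumptions: whitespace tokenization, no text normalization, and a hypothetical function name `word_error_rate`. It is not part of rospeex or of the IWSLT scoring tools, and the example sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared verbatim.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words consumed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative sentences only; not data from the evaluation.
    ref = "is there any milk in the fridge"
    hyp = "is there milk in fridge"
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.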
6. Rospeex’s positioning in robot dialogue quadrants
[Figure: quadrant chart. Axes: cloud-based vs. stand-alone; robot middleware-compatible vs. incompatible]
Alternatives shown:
• Cloud APIs (Google, Microsoft, Nuance, NTT docomo, Wit.ai, …)
• Free software (OpenHRI, PocketSphinx, Festival)
• Commercial software
Drawbacks noted for the alternatives: does not work with very low-spec PCs; robotics-specific logs are lost; authentication; low quality; expensive
Distribution of rospeex users by IP address
Rospeex has been applied to:
Humanoids, web agents, conversational robots for elderly people, automotive navigation systems, smart-home interfaces
7. (2) Building domestic service robots
(1st place in 2008 & 2010, 2nd place in 2009 & 2012)
• RoboCup@Home: The largest competition for domestic service robots
– Focuses on human-robot interaction and mobile manipulation
• Challenges
– Navigation in unknown environments (e.g. a real shop), handling everyday
objects, spoken dialogues in very noisy environments (70 dBA), …
• cf. Social impacts of other RoboCup leagues
– RoboCupRescue: Fukushima Power plant investigation
– Aldebaran sold >1000 NAO robots and was bought by SoftBank for US$100M, …
by Channel 5
8. (3) Machine learning for environments:
Air pollution prediction can reduce social cost
• Losses from PM2.5 and other air pollutants
– 3.3M premature deaths per year [Lelieveld, Nature, 2015]
• Prediction can prevent possible exposure, but prediction accuracy is quite low
– A standard approach gave only 42% accuracy in Fukuoka (2013-14)*
• Applying the DPT-DRNN method to weather open-data outperforms a
standard weather model-based approach
* Threat score = TP / (TP + FP + FN)
Premature deaths in London ≒ 2,800 (2010)
[Ong & Sugiura, IEEE BigData 2014]
Hefei, China, 2015 (not fog, not cloudy)
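The threat score defined in the footnote on this slide, TP / (TP + FP + FN), ignores true negatives, so a forecaster cannot score well simply by predicting "no pollution event" on the many clean days. The following is a minimal Python sketch, assuming binary event/no-event labels per time step. The function names and the example labels are hypothetical and are not taken from the Fukuoka experiment.

```python
from typing import Sequence


def threat_score(tp: int, fp: int, fn: int) -> float:
    """Threat score = TP / (TP + FP + FN). True negatives are not used."""
    denominator = tp + fp + fn
    if denominator == 0:
        raise ValueError("undefined: no events were observed or predicted")
    return tp / denominator


def threat_score_from_labels(observed: Sequence[bool],
                             predicted: Sequence[bool]) -> float:
    """Count hits, false alarms, and misses, then apply the formula."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    tp = sum(o and p for o, p in zip(observed, predicted))        # hits
    fp = sum((not o) and p for o, p in zip(observed, predicted))  # false alarms
    fn = sum(o and (not p) for o, p in zip(observed, predicted))  # misses
    return threat_score(tp, fp, fn)


if __name__ == "__main__":
    # Hypothetical labels: True = pollution level exceeded a threshold.
    observed = [True, True, False, False, True, False, False, False]
    predicted = [True, False, True, False, True, False, False, False]
    print(f"threat score = {threat_score_from_labels(observed, predicted):.2f}")
```

In this made-up example there are 2 hits, 1 false alarm, and 1 miss, so the threat score is 2 / 4 = 0.50, whereas plain accuracy (which also counts the 4 correct "no event" steps) would be 6 / 8 = 0.75.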