rospeex: a cloud-based speech communication toolkit for ROS

2013/12/13

Komei Sugiura
National Institute of Information and Communications Technology, Japan
komei.sugiura@nict.go.jp

ROS (Robot Operating System)
• ROS: middleware for robots
– Version 1.0 released in 2010
– De facto standard worldwide
– Covers everything from drivers and package management to learning and visualization


Speech communication toolkit for ROS

rospeex

• ROS compatible
• Speech recognition using the VoiceTra engine
• Other functionalities
– Noise reduction, non-monologue speech synthesis

Conventional packages vs. rospeex
– Speech recognition/synthesis: Sphinx, Festival, Julius (or commercial tools) vs. VoiceTra engine (or third-party engines)
– Engine: stand-alone vs. cloud-based
– Language: single language vs. ja, en, zh, ko

Position in Cloud Robotics
• Cloud robotics [James Kuffner@Google, 2011]
– Manipulation using Google Goggles [Kehoe+ 2013]
– Knowledge sharing based on RoboEarth [Tenorth+ 2012]
– Speech communication for robots: rospeex

Speech communication systems by engine type and robot middleware compatibility
– Cloud-based, incompatible with robot middleware: commercial systems (Nuance, ToSpeak, AmiVoice Cloud, ...)
– Cloud-based, robot middleware compatible: rospeex
– Stand-alone, robot middleware compatible: many (e.g., OpenHRI, HARK, PocketSphinx, Festival)

Quadrilingual communication using rospeex


rospeex provides speech recognition and synthesis; the user constructs the dialogue processing.
• Processing flow: speech input → noise reduction → VAD (voice activity detection) → speech recognition → dialogue processing → speech synthesis → speech output
• Speech recognition and synthesis run on speech recognition & synthesis servers
• Task manager: receives input from other modules (sensors, recognized objects, etc.) and sends output to other modules (actuators, learning, etc.)
• Provided by rospeex: speech recognition and synthesis
• Provided by the user: dialogue processing
• Provided by third parties: alternative speech recognition & synthesis servers
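The division of labor above (rospeex handles recognition and synthesis, the user supplies dialogue processing) can be sketched as a single dialogue turn. This is a minimal illustration under stated assumptions, not the rospeex API: `recognize`, `synthesize`, `run_turn`, and `simple_dialogue` are hypothetical names, the recognition and synthesis stubs only convert between bytes and text instead of calling the servers, and noise reduction, VAD, ROS messaging, and the task manager are left out.

```python
from typing import Callable


# Hypothetical stand-ins for the parts rospeex provides. In a real system
# these would go through ROS and the cloud servers; here they are local
# stubs so the data flow can run without a network.
def recognize(audio: bytes) -> str:
    # Stub "recognizer": treats the audio bytes as UTF-8 text
    return audio.decode("utf-8")


def synthesize(text: str) -> bytes:
    # Stub "synthesizer": returns the reply text as bytes instead of audio
    return text.encode("utf-8")


def run_turn(audio: bytes, dialogue: Callable[[str], str]) -> bytes:
    """One turn: speech recognition -> user dialogue processing -> speech synthesis."""
    user_text = recognize(audio)
    reply_text = dialogue(user_text)
    return synthesize(reply_text)


# The user-provided part (hypothetical): map recognized text to a reply.
def simple_dialogue(text: str) -> str:
    replies = {"hello": "Hello! How can I help you?"}
    return replies.get(text.strip().lower(), "Sorry, could you say that again?")


if __name__ == "__main__":
    print(run_turn(b"Hello", simple_dialogue))
```

Passing the dialogue step in as a plain function mirrors the diagram: the user can swap in any dialogue logic without touching the recognition and synthesis side.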

Non-monologue speech synthesis for robots
• Reading-style robot voice
– Monotonous, unnatural and unfriendly
– Hard for listeners to recognize that the robot is asking a question
(Audio samples: XIMERA 3 (text reading) vs. voice talent)
• Conventional text-to-speech (TTS) systems are not optimized for communication


Demo
http://komeisugiura.jp/software/nm_tts.html


Using speech recognition/synthesis without ROS
• Send a JSON request to the server
– Recognition: http://rospeex.ucri.jgn-x.jp/nauth_json/jsServices/VoiceTraSR
– Synthesis: http://rospeex.ucri.jgn-x.jp/nauth_json/jsServices/VoiceTraSS
• Sample code (JavaScript, Python, C++) is available
• Non-monologue speech synthesis is also available
Recognition request:
{ "method": "recognize",
  "params": [
    "ja",
    { "audio": "base64-encoded wav",
      "audioType": "audio/x-wav",
      "voiceType": "*"
    }]}
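A recognition request body with the fields shown above could be assembled in Python as follows. This is a sketch under assumptions: `build_recognition_request` is a hypothetical helper, and the placeholder bytes stand in for the contents of a real WAV file; only the field names and values come from the slide.

```python
import base64
import json


def build_recognition_request(wav_bytes: bytes, language: str = "ja") -> str:
    """Build a "recognize" JSON body with the slide's field layout (hypothetical helper)."""
    payload = {
        "method": "recognize",
        "params": [
            language,  # language code, e.g. "ja", "en", "zh", "ko"
            {
                # WAV file contents, base64-encoded and stored as ASCII text
                "audio": base64.b64encode(wav_bytes).decode("ascii"),
                "audioType": "audio/x-wav",
                "voiceType": "*",
            },
        ],
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Placeholder bytes; a real client would read a file with open(path, "rb").read()
    print(build_recognition_request(b"RIFF....WAVEfmt ", "en"))
```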

Synthesis request:
{ "method": "speak",
  "params": [
    "ja",
    "こんにちは",
    "*",
    "audio/x-wav"
  ]}
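The synthesis request can be built the same way, together with a helper that sends a body to a server endpoint. Both function names are hypothetical. Reading the third parameter ("*") as a voice type is an assumption based on the recognition request's `voiceType` field, and sending the body as an HTTP POST with a JSON content type is also an assumption not stated on the slide. Because the response format is not described here, `post_json` returns the raw bytes.

```python
import json
import urllib.request


def build_synthesis_request(text: str, language: str = "ja",
                            voice_type: str = "*",
                            audio_type: str = "audio/x-wav") -> str:
    """Build a "speak" JSON body with the slide's layout (hypothetical helper)."""
    payload = {
        "method": "speak",
        # Positional params: language, text, voice type (assumed), output audio format
        "params": [language, text, voice_type, audio_type],
    }
    # ensure_ascii=False keeps non-ASCII text (e.g. Japanese) unescaped in the JSON string
    return json.dumps(payload, ensure_ascii=False)


def post_json(url: str, body: str, timeout: float = 10.0) -> bytes:
    """POST a JSON body (assumed transport) and return the raw response bytes."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    print(build_synthesis_request("こんにちは"))
    # prints {"method": "speak", "params": ["ja", "こんにちは", "*", "audio/x-wav"]}
```

Actually sending a request means passing the synthesis endpoint URL and the body to `post_json`, which requires network access to the server.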
