Micai 13 contextualized practical speech

Departamento de Ciencias de la Computación
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas
Universidad Nacional Autónoma de México

Practical Speech Recognition for
Contextualized Service Robots
Ivan Meza, Caleb Rascón and Luis Pineda

http://golem.iimas.unam.mx/
GrupoGolem

Service robots
● Our future butlers
● They are task-oriented
○ Clean up a room
○ Play a game

● Interaction with spoken language
● They work in noisy environments
● The microphone is not close to the speaker
● Poor speech recognition

Proposal
● Improve the system in four aspects:
○ Contextualized recognizer
○ Prompting strategies
○ Recovery strategies
○ Audio calibration

I. Contextualized recognition
● Use language models specific to the expectations of the current context, e.g.:
■ YES: yes, okay, all right
■ NO: no, don’t, do not
■ NAVIGATE: go to the kitchen, go to the living room, go to the bedroom
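A minimal sketch of the idea, assuming each expectation is reduced to a plain list of allowed phrases taken from the examples above (the names `CONTEXT_LMS`, `active_phrases` and `interpret` are hypothetical, and the robot itself used real language models, not string matching). As a stand-in for decoding with a restricted model, it snaps a raw hypothesis onto the closest phrase allowed in the current context using Python's `difflib`:

```python
import difflib

# Hypothetical phrase lists per expectation, copied from the slide examples.
CONTEXT_LMS = {
    "YES": ["yes", "okay", "all right"],
    "NO": ["no", "don't", "do not"],
    "NAVIGATE": ["go to the kitchen", "go to the living room", "go to the bedroom"],
}


def active_phrases(expectations):
    """Map every phrase allowed by the active expectations to its expectation."""
    phrases = {}
    for expectation in expectations:
        for phrase in CONTEXT_LMS[expectation]:
            phrases[phrase] = expectation
    return phrases


def interpret(hypothesis, expectations, cutoff=0.6):
    """Return (expectation, phrase) for the closest allowed phrase, or None."""
    phrases = active_phrases(expectations)
    match = difflib.get_close_matches(
        hypothesis.lower(), list(phrases), n=1, cutoff=cutoff
    )
    if not match:
        return None  # nothing in the current context is close enough
    return phrases[match[0]], match[0]


if __name__ == "__main__":
    # In a yes/no context, a noisy "all rite" can only map to a YES/NO phrase.
    print(interpret("all rite", ["YES", "NO"]))
    # A navigation command is rejected while only YES/NO are expected.
    print(interpret("go to the kitchen", ["YES", "NO"]))
```

The point of the design is that the same acoustic input is judged only against what makes sense at that moment, which is why a smaller, context-specific model can lower the error rate.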

II. Prompting strategies
● Let the user know when to speak
■ Beep sound

● Monitor the speaker's volume
■ Could you speak louder? / Could you speak softer?
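A minimal sketch of the volume monitor, assuming audio frames arrive as float samples in [-1.0, 1.0] and have already been classified as speech. The thresholds `TOO_SOFT_DBFS` and `TOO_LOUD_DBFS` are hypothetical; the slides do not give the values used on the robot:

```python
import math

# Hypothetical thresholds in dB relative to full scale (dBFS).
TOO_SOFT_DBFS = -40.0
TOO_LOUD_DBFS = -6.0


def rms_dbfs(samples):
    """RMS level of float samples in [-1.0, 1.0], in dBFS."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def volume_prompt(samples):
    """Return a prompt asking the user to adjust their voice, or None if the level is fine.

    Assumes the frame contains speech; silence would also be reported as too soft.
    """
    level = rms_dbfs(samples)
    if level < TOO_SOFT_DBFS:
        return "Could you speak louder?"
    if level > TOO_LOUD_DBFS:
        return "Could you speak softer?"
    return None


if __name__ == "__main__":
    for amplitude in (0.001, 0.1, 0.9):
        frame = [amplitude, -amplitude] * 80  # 160 samples of a toy square wave
        print(amplitude, round(rms_dbfs(frame), 1), volume_prompt(frame))
```

In a full loop, the robot would play the beep, record the user's turn, and run a check like this on the recorded speech before deciding whether to prompt.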

III. Recovery strategy
● Let the user know when something went wrong
■ Could you repeat?
■ I can’t hear you well, could you repeat?
■ Sorry, I’m a little deaf
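A minimal sketch of a recovery loop, assuming recognition failure is detected from an empty hypothesis or a low confidence score. The trigger rule, the `min_confidence` threshold and the rotation through the prompts are illustrative assumptions; only the prompt wording comes from the slides:

```python
# Prompt wording follows the slides; trigger rule and rotation order are assumptions.
RECOVERY_PROMPTS = [
    "Could you repeat?",
    "I can't hear you well, could you repeat?",
    "Sorry, I'm a little deaf.",
]


def needs_recovery(hypothesis, confidence, min_confidence=0.5):
    """Hypothetical trigger: empty hypothesis or confidence below a threshold."""
    return not hypothesis.strip() or confidence < min_confidence


def recovery_prompt(failures):
    """Prompt for the n-th consecutive failure (n >= 1), cycling through the list."""
    if failures < 1:
        raise ValueError("failures must be >= 1")
    return RECOVERY_PROMPTS[(failures - 1) % len(RECOVERY_PROMPTS)]


if __name__ == "__main__":
    failures = 0
    for hyp, conf in [("", 0.0), ("go to the kichen", 0.3), ("go to the kitchen", 0.9)]:
        if needs_recovery(hyp, conf):
            failures += 1
            print(recovery_prompt(failures))
        else:
            print("understood:", hyp)
```

Varying the wording across consecutive failures avoids the robot repeating the same sentence verbatim.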

IV. Calibration of audio settings
● Hardware
■ 1 directional microphone
■ 1 USB interface with 4 channels
■ 2 speakers

● Calibration of SNR in situ
■ Background noise: -58 dB
■ SNR set to 20 dB
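The SNR arithmetic behind these two numbers can be sketched as follows, assuming both levels are measured on the same dB scale. With the background at -58 dB and a target SNR of 20 dB, speech has to reach at least -38 dB; how the robot applied this setting is not stated in the slides, and the helper names are hypothetical:

```python
import math

NOISE_LEVEL_DB = -58.0  # background noise measured in situ (from the slides)
TARGET_SNR_DB = 20.0    # SNR setting (from the slides)


def snr_db(signal_rms, noise_rms):
    """SNR in dB from RMS amplitudes: 20 * log10(signal / noise)."""
    return 20 * math.log10(signal_rms / noise_rms)


def min_speech_level_db(noise_level_db=NOISE_LEVEL_DB, target_snr_db=TARGET_SNR_DB):
    """Lowest speech level, on the noise's dB scale, that reaches the target SNR."""
    return noise_level_db + target_snr_db


def meets_target(speech_level_db, noise_level_db=NOISE_LEVEL_DB, target_snr_db=TARGET_SNR_DB):
    """True if the speech level is at least target_snr_db above the noise level."""
    return speech_level_db - noise_level_db >= target_snr_db


if __name__ == "__main__":
    print(min_speech_level_db())  # -58 + 20
    print(meets_target(-35.0), meets_target(-45.0))
```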

Corpus evaluation
● Logs from the robot performing RoboCup tasks
■ 2 years of interactions in the lab and in competition
■ 1,439 utterances
■ 2,472 tokens
■ 120 types
■ 11 tasks
■ 9 of the 11 tasks are contextualized
■ 14 language models
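Tokens and types are the usual corpus counts: every word occurrence is a token and every distinct word is a type. A small sketch, assuming whitespace tokenization of lower-cased transcriptions (the slides do not state how the corpus was tokenized; `corpus_stats` is a hypothetical helper):

```python
def corpus_stats(utterances):
    """Count utterances, word tokens and distinct word types."""
    tokens = [word for utt in utterances for word in utt.lower().split()]
    return {
        "utterances": len(utterances),
        "tokens": len(tokens),
        "types": len(set(tokens)),
    }


if __name__ == "__main__":
    print(corpus_stats(["yes", "go to the kitchen", "go to the bedroom"]))
```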

Contextualized recognition
We measure word error rate (WER; lower is better)
● With a single LM for all tasks: 53.84%
● With task-based LMs: 28.28%
● With contextualized LMs: 23.42%

17.2% relative error reduction (task-based to contextualized)
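WER counts the word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcription, divided by the number of reference words. A minimal sketch using the standard word-level edit distance, plus the relative-reduction arithmetic (function names are illustrative, not from the slides):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Assumes a non-empty reference.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


def relative_reduction(baseline, improved):
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    print(wer("go to the kitchen", "go to kitchen"))
    print(round(100 * relative_reduction(28.28, 23.42), 1))
```

For a whole corpus, errors and reference words are normally summed over all utterances before dividing, rather than averaging per-utterance WER.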

Beep sound
● 79 utterances were recorded without the beep sound
■ Without beeps: 55.86%
■ With beeps: 39.75%
■ With beeps (full): 53.72%

Relative error reduction: 28.8% and 3.8%, respectively

Usage of the SoundLoc system
● We measure how often it was used
■ 174 times it could have been triggered
■ 21 times for soft speech
■ 4 times for loud speech

Used 14.37% of the time (25 of 174)

Recovery strategy
● We measure how often it was used
■ 504 times it could have been triggered
■ 85 times activated

Activated 16.87% of the time (85 of 504)

Conclusions
● Each strategy improves performance by a small amount
● Together, they make speech recognition practical on a service robot
