This thesis is concerned with the autonomous acquisition of speech production skills by a robotic system.
The acquisition should occur in interaction with a human tutor, making few or no assumptions about the vocabulary and language of interaction.
A particular target embodiment of the acquisition framework presented in this thesis is the humanoid robot ASIMO.
Because of the robot's size and its limited knowledge of the world, a child's voice is probably the most appropriate type of voice for such an interactive system.
This means, however, that the acoustic properties of the tutor's voice are very different from the system's.
Consequently, the system has to address the correspondence problem in speech.
For this, inspired by findings in the development of speech skills in infants, we propose an interaction scheme involving a cooperative tutor that provides imitative feedback for simple utterances of the system.
This scheme allows the robot to learn a probabilistic correspondence model, which lets the system associate configurations of its own vocal tract with the acoustic properties of the tutor's voice.
Using this correspondence model, the system can project a target tutor utterance into its motor space, making an imitation possible.
We also integrated this interaction scheme into an embodied speech structure acquisition framework that is already used to teach and interact with the robot.
With this integration, we measure both the tutor's response and the utterances to be imitated in a previously trained perceptual space.
This is not only biologically more plausible, but also paves the way for an embodiment in the humanoid robot.
We also investigated a new speech synthesis algorithm, which operates in the acoustic domain and provides the system with a child-like voice.
Its architecture is a hybrid of a harmonic model and a channel vocoder, and it uses a gammatone filter bank to produce the spectral representations.
For the control of the speech synthesizer in the context of imitation learning, a synergistic coding scheme based on the concept of motor primitives was investigated.
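
To make the notion of a gammatone filter bank mentioned above more concrete, the following is a minimal sketch and not the implementation used in this thesis. It assumes the textbook fourth-order gammatone impulse response with bandwidths derived from the Glasberg and Moore ERB formula; the function names, the centre frequencies, and the filter duration are hypothetical choices made for illustration only.

```python
import numpy as np


def erb_bandwidth(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_ir(fc_hz, fs_hz, duration_s=0.05, order=4):
    """Peak-normalised impulse response of one gammatone filter.

    Assumed form: t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t),
    with b = 1.019 * ERB(fc). These are common textbook choices,
    not parameters taken from the thesis.
    """
    t = np.arange(int(duration_s * fs_hz)) / fs_hz
    b = 1.019 * erb_bandwidth(fc_hz)
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t)
    ir = envelope * np.cos(2.0 * np.pi * fc_hz * t)
    return ir / np.max(np.abs(ir))


def gammatone_filterbank(signal, fs_hz, centre_freqs_hz):
    """Filter `signal` with one gammatone filter per centre frequency.

    Returns an array of shape (n_channels, len(signal)).
    """
    signal = np.asarray(signal, dtype=float)
    channels = [
        np.convolve(signal, gammatone_ir(fc, fs_hz))[: len(signal)]
        for fc in centre_freqs_hz
    ]
    return np.stack(channels)
```

Summing the squared output of each channel over short frames would give a coarse spectral representation of the kind a channel vocoder operates on; the frame length and the channel spacing are further design choices that this sketch leaves open.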