This thesis is concerned with the autonomous acquisition of speech production skills by a robotic system.
The acquisition should occur in interaction with a human tutor, making few or no assumptions about the vocabulary and language of interaction.
A particular target embodiment of the acquisition framework presented in this thesis is the humanoid robot ASIMO.
Because of the robot's size, and the limited knowledge of the world it possesses, a child's voice is probably the most appropriate type of voice for such an interactive system.
This means, however, that the acoustic properties of the tutor's voice are very different from the system's.
Consequently, the system has to address the correspondence problem in speech.
For this, inspired by findings in the development of speech skills in infants, we propose an interaction scheme involving a cooperative tutor that provides imitative feedback for simple utterances of the system.
It allows the robot to learn a probabilistic correspondence model, which lets the system associate configurations of its own vocal tract with the acoustic properties of the tutor's voice.
Using this correspondence model, the system can project a target tutor utterance into its motor space, making an imitation possible.
We also integrated this interaction scheme in an embodied speech structure acquisition framework, already used to teach and interact with the robot.
With this integration, we measure the tutor response, and the utterances to be imitated, in a previously trained perceptual space.
This is not only biologically more plausible, but also paves the way for an embodiment in the humanoid robot.
We also investigated a new speech synthesis algorithm, which operates in the acoustic domain and provides the system with a child-like voice.
Its architecture is a hybrid of a harmonic model and a channel vocoder, and uses a gammatone filter bank to produce the spectral representations.
For the control of the speech synthesizer in the context of imitation learning, a synergistic coding scheme, based on the concept of motor primitive, was investigated.
The ultimate goal was to use the framework to learn speech through interaction with a tutor.
In the end, I have shown you the first steps
- syllable structure
- number of vowels in the vowel system
- prosodic
- traditional HMM synthesis approaches are not suitable
- motivate the need for the new technique
- articulatory synthesis: limited in voices and phoneme sets
- the VOCODER has been shown to work well with good spectral representations
- speech is the physical result of air being expelled from the lungs and passing through the vocal tract
- Source-Filter Model of speech production
- a source signal (larynx, vocal-tract constriction) that is modulated by a vocal tract filter function
- different ways of representing and deriving the vocal tract filter function
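The source-filter idea above can be sketched in a few lines: a glottal impulse train (the source) pushed through a cascade of second-order resonators standing in for vocal-tract formants. The formant frequencies, bandwidths and the child-like pitch are illustrative assumptions, not values from the thesis:

```python
import numpy as np

def formant_resonator(x, freq, bandwidth, fs):
    """Second-order IIR resonator approximating one vocal-tract formant."""
    r = np.exp(-np.pi * bandwidth / fs)        # pole radius from bandwidth
    theta = 2 * np.pi * freq / fs              # pole angle from center frequency
    a1, a2 = -2 * r * np.cos(theta), r * r     # denominator coefficients
    gain = 1 - r                               # rough unity-gain scaling
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = gain * x[n] - a1 * y[n - 1] - a2 * y[n - 2]
    return y

fs = 16000
t = np.arange(int(0.5 * fs))
f0 = 250.0                                     # child-like pitch (assumption)
# Glottal source: an impulse wherever the pitch phase crosses an integer
source = (np.floor(t * f0 / fs) != np.floor((t - 1) * f0 / fs)).astype(float)

# Vocal tract filter: cascade of three formants, roughly /a/-like (illustrative)
vowel = source
for freq, bw in [(850, 80), (1600, 90), (2900, 120)]:
    vowel = formant_resonator(vowel, freq, bw, fs)
```

Changing only the `(freq, bw)` triples while keeping the same source is exactly the source/filter separation the model describes.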
We tested for intelligibility and naturalness.
- also mention the work of the Edinburgh group, which performs spectral morphing between an adult and a child speaker by maximizing the likelihood of a given sequence
- Gros-Louis 2006: interactive, differentiated and proximate responses increase the production of more advanced utterances
- Goldstein 2003 -
M. Vaz, H. Brandl, F. Joublin, and C. Goerick, "Speech imitation with a child's voice: addressing the correspondence problem," accepted for 13th Int. Conf. on Speech and Computer (SPECOM), 2009.
\begin{equation}
\begin{split}
p_1(t) & = F_1(t) \\
p_2(t) & = F_2(t) - F_1(t) \\
p_3(t) & = F_3(t) - F_1(t) \\
p_{4,5,6}(t) & = \log\big( S( C_{1,2,3}(t), t) \big)
\end{split}
\end{equation}
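The parameter mapping in these equations is easy to sketch in code. The function name, the array layout and the way the spectral representation $S$ is sampled at the channels $C_{1,2,3}$ are assumptions made for illustration:

```python
import numpy as np

def control_params(F, S, C):
    """Map formant tracks and spectral samples to the six control parameters.

    F : (3, T) array, formant frequency tracks F1..F3 over time
    S : callable S(c, t) giving spectral magnitude at channel c, frame t
    C : (3, T) integer array, channel indices C1..C3 over time
    (names follow the equations above; sampling S this way is an assumption)
    """
    T = F.shape[1]
    p = np.zeros((6, T))
    p[0] = F[0]                # p1 = F1
    p[1] = F[1] - F[0]         # p2 = F2 - F1
    p[2] = F[2] - F[0]         # p3 = F3 - F1
    for k in range(3):         # p4..p6 = log spectral magnitudes at C1..C3
        p[3 + k] = [np.log(S(C[k, t], t)) for t in range(T)]
    return p
```

Encoding F2 and F3 relative to F1 keeps the parameters loosely speaker-normalized, which fits the correspondence setting where tutor and system differ in absolute formant ranges.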
- no assumptions on the distribution of the elements of each class
- important because the data are quite irregular
For a set of labels or vocal classes $C_j$ and an input feature vector $x$, we consider a neighbourhood $V$ of $x$ that contains exactly $K$ points.
The posterior probability of class membership depends on the number of training points of class $C_j$ present in $V$, denoted by $K_j$:
\begin{equation}
p( C_j \mid x ) = \frac{K_j}{K}
\end{equation}
\begin{equation}
\alpha = \frac{p( C_{j1} \mid x )}{ p( C_{j1} \mid x ) + p( C_{j2} \mid x )}
\end{equation}
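This $K$-nearest-neighbour posterior estimate is generic and can be sketched directly; the function name and the Euclidean metric are assumptions for illustration:

```python
import numpy as np

def knn_posterior(X_train, y_train, x, K):
    """Estimate p(C_j | x) as K_j / K, the class fractions among the
    K training points nearest to x (Euclidean distance assumed)."""
    d = np.linalg.norm(X_train - x, axis=1)   # distances to all training points
    nearest = y_train[np.argsort(d)[:K]]      # labels of the K closest points
    return {c: np.count_nonzero(nearest == c) / K
            for c in np.unique(y_train)}
```

The two-class ratio $\alpha$ above is then just `post[c1] / (post[c1] + post[c2])` on the returned dictionary. Because nothing is fitted beyond storing the data, no distributional assumption is made per class, which is the point stressed in the notes.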
system benefits from an extended vocal repertoire
trends:
canonical vowels
generalization isn't working 100%: morphing might be introducing some distortions
\begin{equation}
C_{ij} = P( m_j \mid \lambda_i^{p} ) = \frac{ P( \lambda_i^p \mid m_j , D_j) \, P(m_j) } { P(\lambda_i^p) }
\end{equation}
\begin{equation}
M_{ij} = P( \lambda_i^p \mid m_j , D_j)
\end{equation}
\begin{equation}
[ \lambda^p_{1}, \ldots , \lambda^p_{n} ] = \operatorname*{arg\,max}_{[\lambda^p] \in \mathcal{P}} P( [\lambda^p] \mid X_{tutor})
\end{equation}
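Turning the likelihood matrix $M$ into the correspondence matrix $C$ is a row-wise application of Bayes' rule, with the evidence $P(\lambda_i^p)$ obtained by summing over the motor primitives. A minimal sketch, assuming uniform motor-primitive priors when none are supplied:

```python
import numpy as np

def correspondence_matrix(M, prior_m=None):
    """Turn likelihoods M[i, j] = P(lambda_i^p | m_j) into posteriors
    C[i, j] = P(m_j | lambda_i^p) via Bayes' rule.
    prior_m: P(m_j); a uniform prior is assumed if none is given."""
    n_motor = M.shape[1]
    if prior_m is None:
        prior_m = np.full(n_motor, 1.0 / n_motor)
    joint = M * prior_m                           # P(lambda_i^p | m_j) P(m_j)
    evidence = joint.sum(axis=1, keepdims=True)   # P(lambda_i^p)
    return joint / evidence                       # each row sums to 1
```

Each row of $C$ is then a proper distribution over motor primitives for one perceptual model, which is what lets the system project a decoded tutor sequence into its motor space.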
- from a given input, the Azubi model
add to scheme that the system gets the phone models after they have been
1. there are some phonemes for which there is only sparse activity
2. some phone models are never active
3. some are active all of the time
- the whole subset is not covered
- primitives are only vowels
- different primitives show a stronger dispersion than others
either
- a non-uniform imitative response of the tutor to the vocal primitive
- limitations of synthesizing a phoneme with only one spectral vector
- or the absence of any phone model fully representing the imitative response
- issues of over- or under-representation
retake conclusions