Akira Taniguchi, Tadahiro Taniguchi, and Tetsunari Inamura, "Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information", 13th IFAC/IFIP/IFORS/IEA Symposium on Analysis, Design, and Evaluation of Human-Machine Systems (IFAC HMS2016), Aug, 2016. in Kyoto, Japan.
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
1. Simultaneous Estimation of
Self-position and Word from Noisy
Utterances and Sensory Information
Akira Taniguchi *, Tadahiro Taniguchi *,
Tetsunari Inamura **
* Ritsumeikan University, Japan
(e-mail: a.taniguchi@em.ci.ritsumei.ac.jp).
** National Institute of Informatics/The Graduate
University for Advanced Studies, Japan.
The 13th IFAC/IFIP/IFORS/IEA Symposium on Analysis, Design, and Evaluation of
Human-Machine Systems (IFAC HMS 2016 in Kyoto)
2. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Outline
1. Background and purpose of our research
2. Research outline
3. Proposed method:
Simultaneous estimation of self-positions and words
4. Experiment1:
Learning spatial concepts
5. Experiment2:
Evaluation for word clustering
6. Experiment3:
Self-localization using learned spatial concepts
7. Conclusion
2
3. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Outline
1. Background and purpose of our research
2. Research outline
3. Proposed method:
Simultaneous estimation of self-positions and words
4. Experiment1:
Learning spatial concepts
5. Experiment2:
Evaluation for word clustering
6. Experiment3:
Self-localization using learned spatial concepts
7. Conclusion
3
4. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Background of our research
• Lexical acquisition in robots
– The robot does not have word knowledge in advance.
– The robot learns words from human speech signals.
– Estimating the identity of the speech recognition results with errors is
a very difficult problem.
• Self-localization in robots
– The robot estimates self-position using sensor information and an
environment map.
– If the robot uses local sensor only, the sequential global localization is
a very difficult problem.
4
5. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Mutually effective utilization
• Uncertain position information
• It is estimated by Monte-Carlo
Localization (MCL).
Self-localization
• Uncertain verbal information
• It is obtained from human speech
• Phoneme/Syllable recognition
Purpose of our research
5
Lexical acquisition
Lexical acquisition
related to places
Monte-Carlo Localization
/afroqtabutibe/
/bigbuqkkais/
/waitosherfu/
“The front of TV.”
/zafroqtabutibe/ ??
/zafrontobutibi/ ??
6. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Outline
1. Background and purpose of our research
2. Research outline
3. Proposed method:
Simultaneous estimation of self-positions and words
4. Experiment1:
Learning spatial concepts
5. Experiment2:
Evaluation for word clustering
6. Experiment3:
Self-localization using learned spatial concepts
7. Conclusion
6
7. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Research outline (1/4)
7
Place of learning target
8. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Research outline (2/4)
8
Teaching
“white shelf”
“a front of TV”
Teachings of multiple
/afurantotibi/ ?
/waitosherufu/ ?
Ambiguous speech recognition
9. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Research outline (3/4)
9
Learning spatial concepts
/afurantobutibi/
/bigusheufu/
/waitosherufu/
10. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Research outline (4/4)
10
“a front of TV”
After
Modification of self-localization
/furontabutebi/ ?
“Where is .
this place?”
The robot can narrow down the hypothesis of self-position.
Before
11. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Outline
1. Background and purpose of our research
2. Research outline
3. Proposed method:
Simultaneous estimation of self-positions and words
4. Experiment1:
Learning spatial concepts
5. Experiment2:
Evaluation for word clustering
6. Experiment3:
Self-localization using learned spatial concepts
7. Conclusion
11
12. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Simultaneous estimation of
Self-positions and words
12
Probabilistic self-localization algorithm
based on particle filter
Monte Carlo localization(MCL)
• Place name
• Spatial area (position distribution)
Spatial concept
W1:/sherufu/
W4:/terebi/
W3:/teebou/
W2:
/torashuboqs/
Ex) spatial concepts
• The proposed method is based on a Bayesian probability approach.
• The robot can learn place names from the user’s speech signals without
previous word knowledge
• The robot can reduce the estimation error of self-position by using learned
word information.
13. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
The posterior distribution is calculated as follows:
Self-localization
),,|(
),|()|()|(
),,|(
1:11:11:11:0
1
:1:1:1:0
tttt
ttttttt
tttt
Ouzxp
uxxpxOpxzp
Ouzxp
1tx
xt
xt1
1tz 1tztz
1tu tu 1tu
tC
tO
W
Σμ,
State of
spatial concept
Simultaneous estimation of
Self-positions and words
13
L
ΣμxNWO
CpCxpCOp
xOp
tt
t
t
t
CCt
C
Ct
ttt
C
tt
tt
1
),|()),LD(exp(
)(),,|(),|(
)|(
ΣμW
β:the parameter of the effect of LD
L :the number of spatial concepts
Robot’s
position
Control data
sensor data
Speech
recognition result
Place names
Position distribution
Parameter estimation by Gibbs sampling
Learning spatial concepts
Levenshtein distance Gaussian distribution
14. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
We set the initial position distribution.
The mean vector 𝝁 is the uniform random number in the given range,
i.e., the range in which the robot can move on the map.
The covariance matrix 𝚺 is set as diag( 𝜎initial; 𝜎initial).
1. Initialization 𝝁, 𝚺
2. Sampling of 𝐶𝑡
without W
3. Sampling of 𝝁, 𝚺
4. Sampling of W
5. Sampling of 𝐶𝑡 with W
6. Multiple iterations of
3 - 5
Learning algorithm for spatial concept using Gibbs sampling
14
15. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
1. Initialization 𝝁, 𝚺
2. Sampling of 𝐶𝑡
without W
3. Sampling of 𝝁, 𝚺
4. Sampling of W
5. Sampling of 𝐶𝑡 with W
6. Multiple iterations of
3 - 5
Learning algorithm for spatial concept using Gibbs sampling
The conditional posterior distribution of Ct samples the state of spatial concept
Ct at each time (data) t.
In this step, the robot does not estimate the place names W.
Therefore, the sampling of Ct is performed as follows:
),|(~ CtCttt xpC
1
2
3
15
16. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
• 位置分布の初期値を決定
initial
initial
c
c
0
0
範囲内に一様乱数
1. Initialization 𝝁, 𝚺
2. Sampling of 𝐶𝑡
without W
3. Sampling of 𝝁, 𝚺
4. Sampling of W
5. Sampling of 𝐶𝑡 with W
6. Multiple iterations of
3 - 5
Learning algorithm for spatial concept using Gibbs sampling
• どのデータがどの場所概念を表すかを推定する
• 発話された場所の座標データを使って位置分布から確率的に選択する
),|(~ CtCttt xpC
The conditional posterior distribution of 𝝁, 𝚺 samples the position distribution
for each state of the spatial concept c.
The sampling of 𝝁, 𝚺 is performed as follows:
),|Wishert(),|(~, 1
NNccNNccc μ V
mN
NNNN ,,, Vm :the hyperparameters of the posterior distribution
1
2
3
16
17. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
• 位置分布の初期値を決定
initial
initial
c
c
0
0
範囲内に一様乱数
1. Initialization 𝝁, 𝚺
2. Sampling of 𝐶𝑡
without W
3. Sampling of 𝝁, 𝚺
4. Sampling of W
5. Sampling of 𝐶𝑡 with W
6. Multiple iterations of
3 - 5
Learning algorithm for spatial concept using Gibbs sampling
• どのデータがどの場所概念を表すかを推定する
• 発話された場所の座標データを使って位置分布から確率的に選択する
),|(~ CtCttt xpC
• 場所概念ごとにデータから位置分布を更新
• 事前分布にガウス‐ウィシャート分布を用いてガウス分布に対するベイズ
推論を行う
),|Wishert()(,|(~, 111
NNccNNccc μ V
mN
NNNN ,,, Vm :ガウスウィシャート分布のハイパーパラメータ
The conditional posterior distribution of Wc samples the place name Wc for
each state of the spatial concept c.
The sampling of Wc is performed as follows:
)()|(~ c
Tt
cC
Ctc WpWOpW
o
t
t
1
2
3
word
wordword
p(Wc) = 1/N
17
18. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
1. Initialization 𝝁, 𝚺
2. Sampling of 𝐶𝑡
without W
3. Sampling of 𝝁, 𝚺
4. Sampling of W
5. Sampling of 𝐶𝑡 with W
6. Multiple iterations of
3 - 5
Learning algorithm for spatial concept using Gibbs sampling
• 場所概念ごとにデータから位置分布を更新
• 事前分布にガウス‐ウィシャート分布を用いてガウス分布に対するベイズ
推論を行う
),|Wishert()(,|(~, 111
NNccNNccc μ V
mN
:ガウスウィシャート分布のハイパーパラメータ
The conditional posterior distribution of Ct samples the state of spatial concept
Ct for each time t.
The sampling of Ct is shown as follows:
)|(),|(~ CttCtCttt WOpxpC
1
2
3
1
3
2
word
wordword
18
19. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
initial
initial
c
c
0
0
範囲内に一様乱数
1. Initialization 𝝁, 𝚺
2. Sampling of 𝐶𝑡
without W
3. Sampling of 𝝁, 𝚺
4. Sampling of W
5. Sampling of 𝐶𝑡 with W
6. Multiple iterations of
3 - 5
Learning algorithm for spatial concept using Gibbs sampling
),|Wishert()(,|(~, 111
NNccNNccc μ V
mN
Multiple iterations were performed for the process described in steps (3) to (5)
above.
1
2
3
1
3
2
word
wordword
word
wordword
word
wordword
1
3
2
4.
5.
3.
19
20. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Outline
1. Background and purpose of our research
2. Research outline
3. Proposed method:
Simultaneous estimation of self-positions and words
4. Experiment1:
Learning spatial concepts
5. Experiment2:
Evaluation for word clustering
6. Experiment3:
Self-localization using learned spatial concepts
7. Conclusion
20
21. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Experiment 1 :
Learning spatial concepts
21
/gomibako/
(trash bin)/siroitana/
(white shelf)
Four places were selected as learning
targets.
The user decides whether the robot
arrived at a learning target.
The teaching utterances are repeated
40 times.
The number of spatial concepts:𝐿 = 4
The number of iterations:10
Speech recognition system : Julius
(using Japanese syllables dictionary)
Microphone : SHURE PG27-USB
Conditions
/tsukue/
(desk)
500cm
500cm
The environment on SIGVerse[Inamura et al. (2010)]
Inamura, T., et al. (2010). Simulator platform that enables social interaction simulation -SIGVerse:
SocioIntelliGenesis simulator-. In IEEE/SICE International Symposium on System Integration, 212-217.
/terebimae/
(in front of the TV)
22. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
W1:
/sinitana/
W4:
/terimae/
W3:
/tsukune/
W2:
/gumiwako/
Learning result of spatial concepts
22
23. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Outline
1. Background and purpose of our research
2. Research outline
3. Proposed method:
Simultaneous estimation of self-positions and words
4. Experiment1:
Learning spatial concepts
5. Experiment2:
Evaluation for word clustering
6. Experiment3:
Self-localization using learned spatial concepts
7. Conclusion
23
24. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Experiment 2 :
Evaluation for word clustering
24
datateachingallofnumberThe
datacorrectestimatedofnumberThe
ΕΑR
We compared the matching rate for the estimated states
of the spatial concept Ct of teaching data and the
classification results of ground truth by a human.
Classification by human
(Ground truth)
Ct = 2
Ct = 3
Ct = 1
Ct = 4Conditions
The EAR (estimation accuracy rate) is calculated as follows:
The white boxes represent the classification of the teaching data
according to the state of the spatial concept.
25. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Evaluation for word clustering
25
The proposed method
EAR: 0.96
The number of estimated Ct
Figures show estimated results of the state of spatial concept in 10 trials.
Word 𝑂𝑡 only clustering
EAR: 0.81
As a result, we showed that it is possible
to improve the lexical acquisition accuracy
by performing clustering that considered
position and word information.
26. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Outline
1. Background and purpose of our research
2. Research outline
3. Proposed method:
Simultaneous estimation of self-positions and words
4. Experiment1:
Learning spatial concepts
5. Experiment2:
Evaluation for word clustering
6. Experiment3:
Self-localization using learned spatial concepts
7. Conclusion
26
27. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
While the robot performs self-localization, the estimation error for each time step
is calculated as follows:
Experiment 3 : Self-localization using learned
spatial concepts
• Conventional MCL (without word information)
• The proposed method (with word information)
27
22
)()(
ttttt yyxxe
𝑒 𝑚: the average of 𝑒𝑡
𝑒 𝑟 : the minimum value of the estimation error 𝛾
M
i
i
t
i
tt
M
i
i
t
i
tt ywyxwx )()()()(
,
:the robot’s true position
tt yx ,
Comparison
We performed an experiment for the evaluation of self-localization using
learned spatial concepts.
𝛾 is determined by the interval 0, 𝛾 , which includes values greater than or equal to 95% of 𝑒𝑡 in
all teaching times.
Two evaluation indicators
28. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Experiment 3 : Self-localization using learned
spatial concepts
• Landmark based MCL
– If the robot looks at the landmark,
the robot gets the distance and
angle of the landmark.
– View angle of the camera:45
degree
– The number of particles:1000
– The number of landmarks:4
– The robot moves for 400 steps
28
Conditions
W1:sinitana
W4:terimae
W3:tsukune
W2:
gumiwako
When the robot arrives at the learning target,
the user says the place name of the target
place for the robot.
29. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Situation of the experiment
29
(Video clip)
30. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Results
The estimation errors of the proposed method were smaller than that
of MCL for two evaluation indicators 𝑒 𝑚, 𝑒 𝑟.
30
As a result, we showed that the spatial concepts enable the modification
of the global self-localization error.
31. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Outline
1. Background and purpose of our research
2. Research outline
3. Proposed method:
Simultaneous estimation of self-positions and words
4. Experiment1:
Learning spatial concepts
5. Experiment2:
Evaluation for word clustering
6. Experiment3:
Self-localization using learned spatial concepts
7. Conclusion
31
32. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
• The robot could acquire spatial concepts by integrating noisy
information from sensors and speech.
Learning spatial concepts
• It is possible to improve the lexical acquisition accuracy by
performing clustering that considered uncertain position
information and uncertain word information.
Lexical acquisition related to places
• The robot was able to estimate self-position by employing spatial
concepts with lower error than when using MCL.
Self-localization using spatial concepts
Conclusion
32THANK YOU FOR YOUR KIND ATTENTION.
34. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Teaching word Recognized words
/shiroitana/
(white shelf)
/shiroitano/ /shinitana/ /shiroitaga/ /shinotaga/ /shinutana/
/chinitano/ /tsutana/ /shinonitana/ /shinotana/ /shiruitana/
/tsukue/
(desk)
/tsukube/ /tsukune/ /tsukuu/ /tsukue/ /tsukume/
/tsukune/ /tsukuke/ /tsutsune/ /tsukune/ /tsukume/
/gomibako/
(trash bin)
/komiwako/ /gubiwako/ /gumiwako/ /gubiwako/ /kumiwako/
/komiwako/ /gumiwako/ /gumiwako/ /gumibako/ /gumibaku/
/terebimae/
(in front of the TV)
/terimae/ /tebimae/ /tegikunae/ /terenae/ /terimae/
/terenae/ /tezurimae/ /terimae/ /teteginae/ /teribinae/
The results of speech recognition using the
Japanese syllable dictionary
34
35. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Smoothing
Estimation of self-positions 𝑥1:𝑇
• Monte Carlo localization : on-line algorithm
• Learning spatial concepts : off-line algorithm
35
In general, the smoothing method can provide a more accurate estimation
than the MCL of online estimation.
Self-positions 𝑥1:𝑇 are sampled by using a Monte Carlo fixed-lag smoother in
the learning phase.
x0
x1
xt 1
xt
xt 1
1z 1tz tz 1tz
Tx
Tz
36. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
xt
robot’s true position
time
Ltx
:the trajectories of particles :the smoothed trajectory
Monte Carlo fixed-lag smoothing
(Particle smoother)
In general, the smoothing method can provide a more accurate estimation
than the MCL of online estimation.
37. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Speech recognition system: Julius
• Word dictionary
– Japanese syllables only
37
To recognize speech signals as syllable sequences.
Japanese syllable dictionary
The set of 43 Japanese phonemes defined by
Acoustical Society of Japan (ASJ)’s speech database
committee is adopted by Julius.
The Julius system uses a word dictionary containing
115 Japanese syllables.
Julius dictation-kit-v4.2, http://julius.sourceforge.jp/
We assume that the robot can recognize user speech at the unit of syllables, i.e., the
robot does not have word knowledge in advance.
38. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
The clustering method using word data only Ot
(without position data 𝑥 𝑡)
2 )(~ tt CpC
3 ),(~, cccc p
4
)()|(~ c
Tt
Cc
Ctc WpWOpW
o
t
t
5 )()|(~ tCtt CpWOpC t
38
1. Initialization 𝝁, 𝚺
2. Sampling of 𝐶𝑡
without W
3. Sampling of 𝝁, 𝚺
4. Sampling of W
5. Sampling of 𝐶𝑡 with W
6. Multiple iterations of
3 - 5
39. Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information
Nonparametric Bayesian Spatial Concept
Acquisition method (SpCoA)
39
• This model can learn spatial concepts from continuous speech signals
• This model can learn an appropriate number of spatial concepts, depending
on the data (using nonparametric Beysian approach)
• This model can relate several places to several names.
(many-to-many correspondences between names and places)
/emajentoshitemurabo/,
/taniguchizurabo/
“Here is
Emergent
System Lab.”
/konekutiNgukorida/
(connecting corridor)
Akira Taniguchi, Tadahiro Taniguchi, and Tetsunari Inamura, "Spatial Concept Acquisition for a Mobile Robot that Integrates Self-Localization
and Unsupervised Word Discovery from Spoken Sentences", IEEE Transactions on Cognitive and Developmental Systems, 2016.