Multiple Categorization by iCub: Learning Relationships between Multiple Modalities and Words

○Akira Taniguchi*1, Tadahiro Taniguchi*1, Angelo Cangelosi*2
*1 Ritsumeikan University, Japan
*2 Plymouth University, UK
IROS Workshop on Machine Learning Methods for High-Level Cognitive Capabilities in Robotics 2016 (ML-HLCR 2016)

Research background
• Infants can acquire word meanings by estimating the
relationships between multiple situations and words.
• For example, if an infant grasps a nearby red ball, the parent may describe the infant's action and the object with a sentence.
In this case, the infant does not know the relationships between the words and the situation, because it has not yet acquired the word meanings.
The infant cannot determine whether the word “red” indicates an action, an object, a position, or a color.
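(Figure: the sentence “grasp front red ball”; it is unclear which of the words “ball”, “grasp”, “front”, and “red” refers to which aspect of the situation.)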

Research background
(Figure: three situations described by the sentences “grasp front red ball”, “look at red apple”, and “right red car”; in each situation, it is unknown which word refers to which aspect of the scene.)
We consider that an infant can learn that the word “red” refers to the color red by observing that the word co-occurs with red objects across multiple situations.
This is called cross-situational learning.
[Smith et al. 2011], [Fontanari et al. 2009]
(Figure annotation: “car”! “car”! “red”!)
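The co-occurrence argument above can be made concrete with simple counts. The sketch below is a minimal illustration, not the proposed model: it assumes each situation has already been reduced to a set of symbolic labels (the labels such as `color:red` are hypothetical), and keeps, for each word, only the labels that co-occur with it in every situation where the word is heard.

```python
from collections import Counter, defaultdict

# Hypothetical symbolic descriptions of the three situations on the slide.
SITUATIONS = [
    (["grasp", "front", "red", "ball"],
     {"action:grasp", "position:front", "color:red", "object:ball"}),
    (["look at", "red", "apple"],
     {"action:look_at", "color:red", "object:apple"}),
    (["right", "red", "car"],
     {"position:right", "color:red", "object:car"}),
]


def consistent_labels(situations):
    """Map each word to the labels present in every situation where it occurs."""
    word_count = Counter()
    pair_count = defaultdict(Counter)
    for words, labels in situations:
        for word in words:
            word_count[word] += 1
            pair_count[word].update(labels)  # +1 for each co-occurring label
    return {
        word: sorted(label for label, c in pair_count[word].items() if c == n)
        for word, n in word_count.items()
    }


if __name__ == "__main__":
    table = consistent_labels(SITUATIONS)
    print(table["red"])   # ['color:red']
    print(table["ball"])  # still four candidates after a single situation
```

Hard intersection like this fails as soon as one sentence is noisy, which is one motivation for a probabilistic treatment of cross-situational learning.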

Related work
• Peniak et al. 2011
Action learning with a multiple time-scale recurrent neural network
In our study, we perform cross-situational learning, including action learning, with a Bayesian probabilistic model.
• M. Attamimi et al. 2016
Learning word meanings and grammar with multilayered multimodal latent Dirichlet allocation (mMLDA) and a Bayesian HMM
The relationships between words and multiple concepts are estimated in post-processing, by weighting the learned words according to their mutual information.
In our study, the proposed method estimates the multiple categories and the relationships between words and modalities simultaneously.
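For reference, the mutual information used in such post-processing can be computed from a word-concept co-occurrence table. The sketch below is a generic illustration, not the exact weighting scheme of Attamimi et al.; the count tables are hypothetical.

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) between the row and column variables of a count table."""
    total = sum(sum(row) for row in joint)
    p_row = [sum(row) / total for row in joint]
    p_col = [sum(col) / total for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, count in enumerate(row):
            if count > 0:  # 0 * log 0 is taken as 0
                p = count / total
                mi += p * math.log(p / (p_row[i] * p_col[j]))
    return mi


# Hypothetical counts: rows = word "red" heard / not heard,
# columns = attended object in color category c1 / not.
red_vs_color = [[9, 1], [2, 8]]
# Hypothetical counts: same rows, columns = object category o1 / not.
red_vs_object = [[5, 5], [5, 5]]
print(mutual_information(red_vs_color))   # clearly positive: informative pairing
print(mutual_information(red_vs_object))  # 0.0: independent
```

A word whose occurrence is nearly independent of a concept's categories receives a weight near zero for that concept.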

Research purpose
Multiple categorization (action, object, color, position) and learning the relationships between multiple modalities and words
(Figure: a human tutor says “grasp green front cup” to the humanoid iCub robot, which must relate each word to its action information or to the visual features, colors, and positions of the objects.)

Overview of the task
1. The robot is in front of a table with objects on it.
2. The robot selects an object, directs its visual attention to it, and performs an action on it.
– e.g., touch, reach, grasp, look at
3. The human tutor speaks a sentence about the object and the robot's action.
4. The robot processes the sentence to discover the meanings of the words.
This process (steps 1-4) is carried out many times in different situations.
The robot learns word meanings and multiple categories by using visual, tactile,
and proprioceptive information, as well as words.

The proposed method
Multiple categorizations and word meaning learning
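• The categorization for each modality is represented by a Gaussian mixture model (GMM).
• 𝐹𝑑 is the sequence of modalities related to the words of the d-th sentence (one modality per word).
• 𝐴𝑑 is the object of attention, selected from the objects on the table.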
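(Figure: graphical model of the proposed method. For each of the D data: the object of attention 𝐴𝑑, the modality sequence 𝐹𝑑, the N words w_dn, and the action category z_d^a with the action data a_d. For each of the M objects: the category and observation of the object feature (z_dm^o, o_dm), the color (z_dm^c, c_dm), and the position (z_dm^p, p_dm). Each modality (action, position, color, object feature) is modeled by a GMM with K^a, K^p, K^c, or K^o categories, and there are L word distributions.)
Example: for the sentence “grasp front green cup” (words 𝑊1 𝑊2 𝑊3 𝑊4), 𝐹𝑑 = (a, p, c, o) and 𝐴𝑑 = object 1.
a: action, p: position, c: color, o: object feature; M: the number of objects; D: the number of data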
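Because each modality's categorization is a GMM, assigning an observation to a category amounts to computing the posterior over mixture components. The sketch below assumes diagonal covariances and uses made-up parameters for two hypothetical position categories; it shows only this assignment step, not how the model's parameters are learned. The query point reuses object 1's coordinates from the simulator slide.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance `var`."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def responsibilities(x, weights, means, variances):
    """Posterior p(z = k | x) over the components of a diagonal-covariance GMM."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum before exponentiating (log-sum-exp trick)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 2-D position categories (not learned values from the paper).
weights = [0.5, 0.5]
means = [[-0.35, -0.18], [-0.35, 0.18]]
variances = [[0.01, 0.01], [0.01, 0.01]]
post = responsibilities([-0.351, -0.175], weights, means, variances)
print(post)  # almost all mass on component 0
```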

The proposed method
Generative model
(Figure: the graphical model of the previous slide, highlighting the word distributions, the selection of an object, and the selection of the modality.)
L = K^a + K^p + K^o + K^c, i.e., there is one word distribution for each category of each modality.
In equation (1), we assume that a word related to each modality is spoken only once in each sentence.
✔ 𝐹𝑑 = (a, p, c, o)   ✖ 𝐹𝑑 = (o, o, o, o)
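The sentence-generation part of the model can be sketched as follows. This is a simplified illustration under our own assumptions, not a transcription of equation (1): the object of attention 𝐴𝑑 is drawn uniformly, 𝐹𝑑 is drawn as an ordering in which each modality appears exactly once (matching the constraint above), and each word is drawn from the word distribution of the relevant category. Priors and GMM observations are omitted, and all function and variable names are hypothetical.

```python
import random

MODALITIES = ("a", "p", "c", "o")  # action, position, color, object feature


def sample_sentence(action_cat, object_cats, theta, rng):
    """Sample (A_d, F_d, words) for one trial.

    action_cat:  category index of the robot's action.
    object_cats: one dict per object, e.g. {"p": 1, "c": 2, "o": 3}.
    theta:       maps (modality, category) to a {word: probability} dict.
    """
    attended = rng.randrange(len(object_cats))       # A_d (uniform: an assumption)
    order = rng.sample(MODALITIES, len(MODALITIES))  # F_d: each modality exactly once
    words = []
    for modality in order:
        if modality == "a":
            k = action_cat                            # the action category of the trial
        else:
            k = object_cats[attended][modality]       # category of the attended object
        dist = theta[(modality, k)]
        words.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return attended, order, words


# Toy, deterministic word distributions (hypothetical).
theta = {("a", 0): {"grasp": 1.0}, ("p", 1): {"front": 1.0},
         ("c", 2): {"green": 1.0}, ("o", 3): {"cup": 1.0}}
print(sample_sentence(0, [{"p": 1, "c": 2, "o": 3}], theta, random.Random(0)))
```

In the simulator experiment the order was fixed, which corresponds to replacing the `rng.sample` call with the tuple `MODALITIES` itself.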

Simulator experiment
The procedure for getting and processing data
Object positions (ID: 𝑥, 𝑦)
1: -0.351, -0.175
2: -0.348, 0.184
3: -0.291, 0.007
Action
• Looking at the object of attention
• Reaching for an object
• Grasping with a random degree
Getting action information
• Posture
• Tactile information
• Relative coordinates of the object from the hand
Getting visual information
• Area detection of objects (background subtraction)
• Object feature (SIFT)
• Color (RGB histogram)
• Position (homography)
• k-means & normalization
(Figure: example feature histograms with 10 bins.)
Word information: “grasp right green ball”
An object of attention: 2
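The “k-means & normalization” step is commonly realized as a bag-of-features histogram: each local descriptor (e.g., SIFT) is assigned to its nearest k-means centroid, and the counts are normalized. The sketch below assumes that reading and a precomputed codebook; the 2-D toy descriptors and centroids are hypothetical stand-ins for 128-D SIFT vectors.

```python
def nearest(descriptor, centroids):
    """Index of the centroid closest to `descriptor` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, centroids[k])),
    )


def bag_of_features(descriptors, centroids):
    """Histogram of nearest-centroid assignments, normalized to sum to 1."""
    hist = [0.0] * len(centroids)
    for desc in descriptors:
        hist[nearest(desc, centroids)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
print(bag_of_features(descriptors, centroids))  # [0.25, 0.5, 0.25]
```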

Simulator experiment
Condition
• 5 categories for each modality
• Normalization of each data dimension to [0, 1]
• The number of action trials: 20
• The number of objects on the table: 1–3
• The number of words in each sentence: 4
• The word order was 𝐹𝑑 = (a, p, c, o) in all of the sentences.
• The number of distinct words: 14
– “reach”, “touch”, “grasp”, “look at”
– “front”, “left”, “right”, “far”
– “green”, “red”, “blue”
– “box”, “cup”, “ball”
(Figure: the objects “box”, “ball”, and “cup”, with the example sentence “grasp front green cup”.)
Example of teaching sentences from the 20 trials (order: action, position, color, object):
reach front green box
touch right green cup
look at right blue box
reach front blue ball
grasp far red box
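Per-dimension normalization to [0, 1] can be done with min-max scaling. The sketch below assumes the minimum and maximum are taken over all data of one modality (the slide does not specify this); the example reuses the three object coordinates from the simulator slide.

```python
def minmax_normalize(data):
    """Rescale each dimension (column) of `data` to [0, 1]."""
    columns = list(zip(*data))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant dimension carries no information, so it is mapped to 0.0.
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in data
    ]


positions = [[-0.351, -0.175], [-0.348, 0.184], [-0.291, 0.007]]
print(minmax_normalize(positions))
```

Mapping constant dimensions to 0.0 avoids a division by zero; other conventions (e.g., 0.5) would work equally well.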

Experimental results
Word probability distributions 𝜃𝑙 (Multinomial distribution)
(Figure: heat map of 𝜃𝑙 with one row per category, a0–a4, p0–p4, o0–o4, and c0–c4, and one column per word: “touch”, “grasp”, “look at”, “reach”, “far”, “left”, “front”, “right”, “box”, “ball”, “cup”, “green”, “red”, “blue”.)
Higher probability values are represented by darker shades.
a: action, p: position, o: object feature, c: color
The results show that the proposed method associated each word with its corresponding modality (thick-bordered boxes).
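One simple way to read a word-to-modality association off a table like 𝜃𝑙 is to score each modality by the total probability its categories assign to the word and pick the highest-scoring modality. This scoring rule is our illustration, not necessarily how the figure was produced, and the 𝜃 values below are hypothetical (one category per modality, four words), not the experimental results.

```python
def word_to_modality(theta, words):
    """For each word, the modality whose categories give it the most probability mass.

    theta: maps a category label such as "a0" (modality letter + index)
           to a list of word probabilities aligned with `words`.
    """
    result = {}
    for j, word in enumerate(words):
        scores = {}
        for label, probs in theta.items():
            modality = label[0]  # "a", "p", "o", or "c"
            scores[modality] = scores.get(modality, 0.0) + probs[j]
        result[word] = max(scores, key=scores.get)
    return result


# Hypothetical word distributions over a 4-word vocabulary.
words = ["grasp", "front", "cup", "green"]
theta = {
    "a0": [0.85, 0.05, 0.05, 0.05],
    "p0": [0.05, 0.85, 0.05, 0.05],
    "o0": [0.05, 0.05, 0.85, 0.05],
    "c0": [0.05, 0.05, 0.05, 0.85],
}
print(word_to_modality(theta, words))
# {'grasp': 'a', 'front': 'p', 'cup': 'o', 'green': 'c'}
```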

Experimental results
Position, object, and color categories
(Figure: part of the categorization results. Position category: example positions of categories p0, p1, p2, and p4, with the rows p0–p4 of 𝜃𝑙 over “far”, “left”, “front”, “right”. Object category: the rows o0–o4 over “box”, “ball”, “cup”. Color category: the rows c0–c4 over “green”, “red”, “blue”.)

Conclusions
• We have proposed a Bayesian probabilistic model that can learn
multiple categories and the relationships between words and
multiple modalities.
• The experimental results showed that the robot can categorize the data of each modality and estimate the modality related to each word, even in complex situations with multiple objects.
Future directions
• Experiments using a real iCub
• Learning from uncertain spoken sentences
– Varying the number of words and the word order
• Action generation task, description task
THANK YOU FOR YOUR KIND ATTENTION.
