Automatic selection of object recognition methods using reinforcement learning
1. Automatic Selection of Object Recognition Methods Using Reinforcement Learning
Reinaldo A.C. Bianchi†, Arnau Ramisa‡,
and Ramón López de Mántaras‡
†Centro Universitário da FEI, Brazil
‡Artificial Intelligence Research Institute, Spain
Presenter: Shunta SAITO
2. Authors
• Reinaldo A.C. Bianchi
‣ Full professor at the Electrical Engineering Department of the Centro Universitário da FEI, São Bernardo do Campo, São Paulo, Brazil
• Arnau Ramisa
‣ Postdoc with the Perception and Manipulation team at the Industrial Robotics Institute (IRI UPC-CSIC), Universitat Politecnica de Catalunya
• Ramón López de Mántaras
‣ Director of the IIIA (Artificial Intelligence Research Institute)
of the CSIC (Spanish National Research Council)
3. Abstract
Question: Which algorithm should be used to recognize objects?
Goal: Automatically select the best algorithm from two state-of-the-art object recognition algorithms
Methodology: Reinforcement Learning
Background: The robot should be able to decide by itself which object recognition method should be used, depending on the current conditions of the world
5. Reinforcement Learning
The RL problem is meant to be a straightforward framing of
the problem of learning from interaction to achieve a goal
(Sutton and Barto, 1998)
Formulation:
Assuming that the environment fulfills the Markov property:
- a finite set of states $s \in S$ that the agent can reach;
- a finite set of possible actions $a \in A$ that the agent can perform;
- a state transition function $T : S \times A \to \Pi(S)$, where $\Pi(S)$ is a probability distribution over $S$;
- a finite set of bounded reinforcements (payoffs) $R : S \times A \to \mathbb{R}$.
Task: find a stationary policy of actions $\pi^* : S \to A$
6. Reinforcement Learning
Formulation:
$Q^*(s, a)$: the optimal state-action function
$Q^*(s, a)$ is the reward received upon performing action $a$ in state $s$, plus the discounted value of following the optimal policy thereafter:

$$Q^*(s, a) \equiv R(s, a) + \gamma \sum_{s' \in S} T(s, a, s') V^*(s')$$

where $T(s, a, s')$ is the transition probability. Equivalently,

$$Q^*(s, a) \equiv R(s, a) + \gamma \sum_{s' \in S} T(s, a, s') \max_{a'} Q^*(s', a')$$

The optimal policy is $\pi^* \equiv \arg\max_a Q^*(s, a)$.
7. Q-Learning
Formulation:
Let $\hat{Q}$ be the learner's estimate of $Q^*(s, a)$.
The Q-learning algorithm iteratively approximates $\hat{Q}$; the $\hat{Q}$ values will converge with probability 1 to $Q^*$:

$$\hat{Q}(s, a) \leftarrow \hat{Q}(s, a) + \alpha \left[ r + \gamma \max_{a'} \hat{Q}(s', a') - \hat{Q}(s, a) \right]$$

$\gamma$ is the discount factor $(0 \le \gamma < 1)$ and $\alpha$ is the step-size parameter,

$$\alpha = \frac{1}{1 + \mathrm{visits}(s, a)}$$

where $\mathrm{visits}(s, a)$ is the total number of times this state-action pair has been visited.

[Backup diagram: from $\hat{Q}(s, a)$ at the root, the tree branches through successor states $s'$ and actions $a'$ down to $\hat{Q}(s', a')$ at the leaves.]
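As a concrete reading of this update rule, here is a minimal tabular Q-learning sketch in Python. The environment interface (reset, step, sample_action) and the epsilon-greedy exploration are hypothetical stand-ins, not from the paper; only the backup line and the visit-count step size follow the slide.

```python
import numpy as np

def q_learning(env, n_episodes=500, gamma=0.9, epsilon=0.1):
    q = np.zeros((env.n_states, env.n_actions))
    visits = np.zeros_like(q)
    for _ in range(n_episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection (an assumption; the slide
            # does not specify the exploration strategy)
            if np.random.rand() < epsilon:
                a = env.sample_action()
            else:
                a = int(np.argmax(q[s]))
            s_next, r, done = env.step(a)
            visits[s, a] += 1
            alpha = 1.0 / (1.0 + visits[s, a])  # step size from the slide
            # Q-learning backup: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            q[s, a] += alpha * (r + gamma * np.max(q[s_next]) - q[s, a])
            s = s_next
    return q
```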
9. RL applications in Computer Vision
Active Vision
Whitehead and Ballard (1991) => Machine Learning (Jnl.)
described an adaptive control architecture to integrate active sensory-
motor systems with RL based decision systems
Minut and Mahadevan (2001) => ICAA
proposed a model of selective attention for visual search tasks, such as
deciding where to fixate next in order to reach the region where an object is
most likely to be found
Darrell and Pentland (1996a,b) => NIPS, ICPR
proposed a gesture recognition system that guides an active camera to
foveate salient features based on a RL paradigm
(foveate: to angle one's eyes such that the foveae are directed at an object in one's field of view)
10. RL applications in Computer Vision
Active Vision
Darrell (1998) => NIPS
concisely represented active recognition behavior derived from hidden-
state reinforcement learning techniques
Paletta and Pinz (2000) => Robotics and Autonomous Systems (Jnl.)
applied RL in an active object recognition system, to learn how to move
the camera to informative viewpoints, defining the recognition process as a
sequential decision problem with the objective of disambiguating initial
object hypotheses
For these authors, "Reinforcement Learning provides then an efficient method to autonomously develop near-optimal decision strategies in terms of sensori-motor mappings" (Paletta et al, 1998)
11. RL applications in Computer Vision
Active Vision
Borotschnig, et al. (1999) => Image and Vision Computing (Jnl.)
built a system that learns to reposition the camera to capture additional views to improve the image classification result obtained from a single view
Paletta, et al. (2005) => ICML
proposed the use of Q-learning to associate shift of attention actions to
cumulative reward with respect to object recognition
Image Segmentation
Peng and Bhanu (1998) => IEEE Trnsc. on PAMI
used RL to learn to adapt the image segmentation parameters of a specific algorithm to changing environmental conditions
12. RL applications in Computer Vision
Image Segmentation and Object Recognition
Peng and Bhanu (1998, 2000) => IEEE Trnsc. on SMC
improved the recognition results by using the output at the highest level as
feedback for the learning system
Taylor (2004) => MSc Thesis
proposed a general framework for applying RL to the parameter selection problem in vision
Parameter Selection in Vision Problem
Tizhoosh and Taylor (2006) => Int. Jnl. of Image and Graphics
proposed an automated technique for obtaining a subjectively ideal image enhancement
13. RL applications in Computer Vision
Parameter Selection in Vision Problem
Shokri and Tizhoosh (2003∼2008) => some Int. Jnl.s...
proposed a reinforcement agent for finding an optimal threshold in order
to segment digital images
Yin (2002) => Signal Process (Jnl.)
designed a general framework for an intelligent system to extract one object of interest from ultrasound images based on reinforcement learning
Sahba, et al. (2008) => Expert Systems with Applications (Jnl.)
proposed a reinforcement agent for object segmentation in ultrasound images
Hossain, et al. (1999) => IEEE SMC
proposed an RL system for adaptive tropical cyclone pattern segmentation and feature extraction from satellite imagery, and introduced a closed-loop tropical cyclone forecasting system based on RL
14. RL applications in Computer Vision
Object Recognition
Draper, et al. (1999) => Jnl. of Computer Vision Research
modeled the object recognition problem as a Markov Decision Problem, and proposed a theoretically sound method for constructing object recognition strategies by combining CV algorithms to perform segmentation (the result is a system called ADORE (Adaptive Object Recognition) that automatically learns object-recognition strategies from training data)
There are many applications of RL in Computer Vision.
15. Summary of RL in CV
Whitehead and Ballard (1991): active sensory-motor systems + RL-based decision systems
→ an adaptive control architecture
This line of work has since diversified to:
‣ optimize the performance of active vision systems
‣ decide where the focus of attention should be
‣ learn how to move a camera to more informative viewpoints
‣ optimize parameters of existing and new CV algorithms
16. Limitations caused by RL
In the Object Recognition Task
the reward value associated with a situation
‣ is usually not directly available
‣ requires a certain amount of knowledge about the world to be defined
the large state space
‣ makes it difficult for RL algorithms to converge
‣ raises performance issues
17. Two Object Recognition Methods
Lowe's Feature Matching method (Lowe, 2004)
‣ was proposed together with SIFT
‣ is a single-view object detection and recognition system
‣ matches features between a test image and model images
Vocabulary Tree Algorithm (Nistér and Stewénius, 2006)
‣ uses visual words (Bag-of-Words) to classify images
20. Preprocessing
• segmentation (a sketch of this pipeline follows below)
1. apply bilateral filtering to remove texture from the image
2. apply the Canny edge detector to find the edges in the image
3. apply mathematical morphology operators to close the contours that remained open
4. use a flood-fill algorithm to fill the connected areas divided by the edges
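A minimal sketch of this four-step pipeline using OpenCV; the filter sizes, Canny thresholds, and the corner-seeded flood fill are illustrative assumptions, not the paper's actual parameters.

```python
import cv2
import numpy as np

def segment(image_bgr):
    # 1. bilateral filtering removes texture while preserving edges
    smooth = cv2.bilateralFilter(image_bgr, d=9, sigmaColor=75, sigmaSpace=75)
    gray = cv2.cvtColor(smooth, cv2.COLOR_BGR2GRAY)
    # 2. Canny edge detection (thresholds are guesses)
    edges = cv2.Canny(gray, 50, 150)
    # 3. morphological closing to close contours that remained open
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
    # 4. flood-fill the background from a corner; pixels not reached
    #    belong to connected regions bounded by edges
    mask = np.zeros((closed.shape[0] + 2, closed.shape[1] + 2), np.uint8)
    filled = closed.copy()
    cv2.floodFill(filled, mask, seedPoint=(0, 0), newVal=255)
    regions = cv2.bitwise_not(filled) | closed
    return regions
```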
21. Weaknesses of two methods
Lowe's Feature Matching Method
performs poorly when recognizing sparsely textured
objects or objects with repetitive patterns
Vocabulary Tree Algorithm
needs an accurate segmentation stage prior to classification, which can be very time-consuming, and it depends on the quality of that segmentation stage to provide good results
22. Learning to Select Object Recognition Methods
1st stage 1. decide to use the image for recognition
‣ because the image contains an object
2. decide the image should be discarded
‣ because the image does not contain objects
2nd stage 1. decide to use Lowe's algorithm
2. decide to use the Vocabulary Tree (VT) algorithm
use Reinforcement Learning as a classification method
23. State
State space = attributes extracted from the images + the possible classification of the image
State definition example in the 1st stage:
s = [I, σ, c]
I: mean image intensity
σ: standard deviation of image intensity
c: class ID
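A sketch of how such a state might be built; the discretization into bins is an assumption (the slides do not give bin sizes), but some quantization is needed for a finite, tabular state set.

```python
import numpy as np

def make_state(image_gray, class_id, n_bins=64):
    mean_i = float(np.mean(image_gray))   # I: mean image intensity
    std_i = float(np.std(image_gray))     # sigma: std dev of intensity
    # discretize so the states form a finite grid for tabular RL
    i_bin = min(int(mean_i * n_bins / 256), n_bins - 1)
    s_bin = min(int(std_i * n_bins / 256), n_bins - 1)
    return (i_bin, s_bin, class_id)       # c: class ID
```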
24. Action
Update action (not a real action happening in the world):
update the value Q(s, a) of a state-action pair at one state using the value of a neighbor pair
(an update action is a move through the state space)
Example: if the state space is composed of [I, σ] (2D), the Q(s, a) values form a grid over I and σ:

0.1 0.7 1.2 3.1 1.8
0.5 1.1 0.3 2.6 4.1
1.4 2.3 3.2 0.9 2.7
0.7 4.3 2.7 1.4 3.9
3.2 4.6 1.3 1.7 0.7

after an update action a (the center cell takes its neighbor's value):

0.1 0.7 1.2 3.1 1.8
0.5 1.1 0.3 2.6 4.1
1.4 2.3 2.3 0.9 2.7
0.7 4.3 2.7 1.4 3.9
3.2 4.6 1.3 1.7 0.7
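The grid example rendered as code: a sketch where the update action copies a neighbor's Q-value into the current cell. The plain copy and the offset convention are illustrative readings of the slide, not a verified implementation.

```python
import numpy as np

def update_action(q, cell, neighbor_offset):
    # copy the neighbor's value into the current cell, if the neighbor exists
    i, j = cell
    ni, nj = i + neighbor_offset[0], j + neighbor_offset[1]
    if 0 <= ni < q.shape[0] and 0 <= nj < q.shape[1]:
        q[i, j] = q[ni, nj]
    return q

q = np.array([[0.1, 0.7, 1.2, 3.1, 1.8],
              [0.5, 1.1, 0.3, 2.6, 4.1],
              [1.4, 2.3, 3.2, 0.9, 2.7],
              [0.7, 4.3, 2.7, 1.4, 3.9],
              [3.2, 4.6, 1.3, 1.7, 0.7]])
update_action(q, cell=(2, 2), neighbor_offset=(0, -1))  # 3.2 becomes 2.3
```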
25. Reward
If the learning agent reaches a state where a training image exists and the state corresponds to the correct classification of the image, the agent receives a reward.
Example:
a training image (mean = 50, std = 10, contains no objects)
a state (mean = 50, std = 10, classification = discard)
Does a matching training image exist?
yes → reward > 0; no → reward = 0
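Combining this slide with the reward-table slide later (+10 / -10), the reward function might look like the sketch below; the exact matching rule over discretized (mean, std) values is an assumption.

```python
def reward(state, training_labels):
    # training_labels maps a discretized (mean, std) pair to the correct
    # class ID of a training image, if one exists at that location
    i_bin, s_bin, class_id = state
    true_class = training_labels.get((i_bin, s_bin))
    if true_class is None:
        return 0.0                        # no training image here
    return 10.0 if class_id == true_class else -10.0
```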
26. MDP Definition
• The finite set of states $s \in S$ is the n-dimensional space of values of the attributes extracted from the images, plus its classification;
• The set of update actions $a \in A$ that the agent can perform, defined as updating the Q value using the value of a neighbor;
• The state transition function allows updates to be made between any pair of neighbors in the set of states;
• The reinforcements $R : S \times A \to \mathbb{R}$ are defined using a set of training images.
27. Training phase
Reinforcement learning is performed over a set of pre-
classified images
→ learn a mapping from images to image classes
Learning algorithm
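The algorithm listing on this slide is an image; below is a minimal sketch of what the training loop might look like, reusing the reward function sketched earlier. The grid size, episode counts, random-walk exploration, and the wrap-around on the class axis are all assumptions.

```python
import random
import numpy as np

N_BINS, N_CLASSES, GAMMA = 64, 3, 0.9
# update actions: move to a neighbor along the I, sigma, or class axis
MOVES = [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1)]

def train(training_labels, n_episodes=1000, episode_len=200):
    q = np.zeros((N_BINS, N_BINS, N_CLASSES, len(MOVES)))
    visits = np.zeros_like(q)
    for _ in range(n_episodes):
        s = (random.randrange(N_BINS), random.randrange(N_BINS),
             random.randrange(N_CLASSES))
        for _ in range(episode_len):
            a = random.randrange(len(MOVES))   # random-walk exploration
            di, dj, dc = MOVES[a]
            s2 = (min(max(s[0] + di, 0), N_BINS - 1),
                  min(max(s[1] + dj, 0), N_BINS - 1),
                  (s[2] + dc) % N_CLASSES)
            r = reward(s2, training_labels)    # +10 / -10 / 0, as sketched above
            visits[s][a] += 1
            alpha = 1.0 / (1.0 + visits[s][a])
            q[s][a] += alpha * (r + GAMMA * np.max(q[s2]) - q[s][a])
            s = s2
    return q
```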
28. Training phase
What happens during the learning phase?
[Figure: a path through the state space toward a goal state]
Every time the robot finds a goal state and receives a reward, the state-action pair where the robot was before reaching the goal state is updated.
Every time the robot moves, it iteratively updates the origin state-action pair.
29. Training phase
[Figure: the state space with axes I, σ, and class id, during learning]
id 0: no such image in the dataset
id 1: an image that should be recognized by Lowe's method is found
id 2: an image that should be recognized by the VT algorithm is found
30. Training phase
[Figure: the state space with axes I, σ, and class id, after learning]
The right image shows a table where the classification has spread to states where there are no prior examples, which allows the classification of other images.
31. Learning and Test database
image dataset of nine typical household objects
(Ramisa et al, 2008)
3 categories of objects:
‣ textured objects
‣ objects with repetitive texture
‣ non-textured objects
Each category consists of 3 different objects, and each object has approximately 20 training images.
32. Learning and Test database
image dataset of nine typical household objects
(Ramisa et al, 2008)
test images
include occlusions, illumination changes, blur, and other typical nuisances encountered while navigating with a mobile robot
33. Learning and Test database
image dataset of nine typical household objects
(Ramisa et al, 2008)
background images
that do not contain objects to be recognized
34. State Space Descriptors
MS: mean and standard deviation of the image intensity
MSE: mean and standard deviation of the image intensity, plus the entropy of the image
MSI: mean and standard deviation of the image intensity, plus the number of interest points detected by the Difference of Gaussians operator
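A sketch of computing the three descriptors; using OpenCV's SIFT detector as the Difference of Gaussians interest-point operator is an implementation assumption.

```python
import cv2
import numpy as np

def descriptors(image_gray):
    mean_i = float(np.mean(image_gray))
    std_i = float(np.std(image_gray))
    # entropy of the intensity histogram
    hist = np.bincount(image_gray.ravel(), minlength=256) / image_gray.size
    entropy = float(-np.sum(hist[hist > 0] * np.log2(hist[hist > 0])))
    # number of DoG interest points (via SIFT's detector)
    n_points = len(cv2.SIFT_create().detect(image_gray, None))
    return {"MS": (mean_i, std_i),
            "MSE": (mean_i, std_i, entropy),
            "MSI": (mean_i, std_i, n_points)}
```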
36. Reward table
[Figure: the reward table built from the training set; marked cells give reward +10 or reward -10]
x : an image with this combination of mean and std-dev values exists in the training set
. : represents images that do not contain objects (backgrounds)
(whitespace) : represents the absence of images
37. Classification Table
[Figure: the classification table, i.e. the result of applying the RL algorithm during the learning phase]
x : an image with this combination of mean and std-dev values exists in the training set
. : represents images that do not contain objects (backgrounds)
(whitespace) : represents the absence of images
38. Experiments
image dataset of nine typical household objects (Ramisa et al, 2008):
‣ object images (3 categories)
‣ test images (with nuisances)
‣ background images (with no objects)
Experiment phases
1. the training of the RL
‣ 40 test images, from which approximately 160 images containing objects were segmented and previously classified
‣ 360 background images, also resulting from the segmentation process
2. the execution phase, where training quality can be verified
39. Correctly Classified Images
Full ImgFull ImgFull Img Small ImgSmall ImgSmall Img Expert
MS MSE MSI MS MSE MSI
Back 91.9 100.0 98.0 92.6 100.0 98.9 100.0
Lowe 84.5 100.0 44.4 76.0 98.4 38.1 93.2
(%)
↵ = 0.1 = 0.9
40. Incorrect Classification
Full ImgFull ImgFull Img Small ImgSmall ImgSmall Img Expert
MS MSE MSI MS MSE MSI
Back 12.8 1.8 14.2 20.4 2.4 25.3 8.2
Lowe 11.6 1.9 7.9 15.8 1.9 9.9 10.8
(%)
↵ = 0.1 = 0.9
41. Conclusion
• Reinforcement Learning has been widely used in the
Computer Vision field
• In this paper we presented a method that uses
Reinforcement Learning to decide which algorithm
should be used to recognize objects seen by a mobile
robot in an indoor environment
• Another important contribution of this work is a
method that allows the use of a Reinforcement
Learning algorithm as a Classifier