Automatic Selection of Object Recognition Methods Using Reinforcement Learning
1. Automatic Selection of Object Recognition Methods Using Reinforcement Learning
Reinaldo A.C. Bianchi†, Arnau Ramisa‡, and Ramón López de Mántaras‡
†Centro Universitário da FEI, Brazil
‡Artificial Intelligence Research Institute, Spain
Presenter: Shunta SAITO
2. Authors
• Reinaldo A.C. Bianchi
‣ full professor at the Electric Engineering Department of the Centro Universitário da FEI, at São Bernardo do Campo, São Paulo, Brazil
• Arnau Ramisa
‣ postdoc with the Perception and Manipulation team at the Industrial Robotics Institute (IRI, UPC-CSIC), at the Universitat Politècnica de Catalunya
• Ramón López de Mántaras
‣ Director of the IIIA (Artificial Intelligence Research Institute) of the CSIC (Spanish National Research Council)
3. Abstract
Question: Which algorithm should be used to recognize objects?
Goal: Automatically select the best algorithm from 2 state-of-the-art object recognition algorithms
Methodology: Using Reinforcement Learning
Background: The robot should be able to decide by itself which object recognition method should be used, depending on the current conditions of the world
5. Reinforcement Learning
The RL problem is meant to be a straightforward framing of the problem of learning from interaction to achieve a goal (Sutton and Barto, 1998)
Formulation (assuming that the environment satisfies the Markov property):
- a finite set of states $s \in S$ that the agent can reach;
- a finite set of possible actions $a \in A$ that the agent can perform;
- a state transition function $T : S \times A \to \Pi(S)$, where $\Pi(S)$ is a probability distribution over $S$;
- a finite set of bounded reinforcements (payoffs) $R : S \times A \to \mathbb{R}$
Task: find a stationary policy of actions $\pi^* : S \to A$
6. Reinforcement Learning
Formulation:
$Q^*(s, a)$ is the reward received upon performing action $a$ in state $s$, plus the discounted value of following the optimal policy thereafter:
$$Q^*(s, a) \equiv R(s, a) + \gamma \sum_{s' \in S} T(s, a, s') V^*(s')$$
where $T(s, a, s')$ is the transition probability; the optimal policy is $\pi^* \equiv \arg\max_a Q^*(s, a)$
Written in terms of the optimal state-action function:
$$Q^*(s, a) \equiv R(s, a) + \gamma \sum_{s' \in S} T(s, a, s') \max_{a'} Q^*(s', a')$$
7. Q-Learning
Formulation:
Let $\hat{Q}$ be the learner's estimate of $Q^*(s, a)$. The Q-learning algorithm iteratively approximates $\hat{Q}$, and the values will converge with probability 1 to $Q^*$:
$$\hat{Q}(s, a) \leftarrow \hat{Q}(s, a) + \alpha \left[ r + \gamma \max_{a'} \hat{Q}(s', a') - \hat{Q}(s, a) \right], \quad 0 \le \gamma < 1$$
where the step-size parameter $\alpha$ decays with the total number of times this state-action pair has been visited:
$$\alpha = \frac{1}{1 + \text{visits}(s, a)}$$
(figure: backup diagram from $\hat{Q}(s, a)$ to $\hat{Q}(s', a')$)
8. Reinforcement Learning
Q-learning Algorithm:

Q.initialize(arbitrarily)
Episodes.each do
  s.initialize
  until s.is_terminal?
    a = epsilon_greedy(s) # using policy derived from Q
    reward, next_state = take_action(a)
    Q.update(s, a, reward, next_state)
    s = next_state
  end
end

where Q.update applies $\hat{Q}(s, a) \leftarrow \hat{Q}(s, a) + \alpha \left[ r + \gamma \max_{a'} \hat{Q}(s', a') - \hat{Q}(s, a) \right]$
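For concreteness, a minimal runnable Python version of the same tabular Q-learning loop, using the decaying step size α = 1/(1 + visits(s, a)) from the previous slide. The gym-style environment interface (reset, step, actions) is an assumption for illustration, not something defined in the paper.

import random
from collections import defaultdict

def q_learning(env, num_episodes, gamma=0.9, epsilon=0.1):
    # env is assumed to expose: reset() -> state,
    # step(action) -> (next_state, reward, done), and a list env.actions
    Q = defaultdict(float)      # Q[(state, action)], initialized to 0
    visits = defaultdict(int)   # visit counts for the decaying step size
    for _ in range(num_episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection using the policy derived from Q
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(a)
            visits[(s, a)] += 1
            alpha = 1.0 / (1.0 + visits[(s, a)])
            # backup: r + gamma * max_a' Q(s', a'), taken as 0 at terminal states
            best_next = 0.0 if done else max(Q[(s_next, act)] for act in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q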
9. RL applications in Computer Vision
Active Vision
Whitehead and Ballard (1991) => Machine Learning (Jnl.)
described an adaptive control architecture to integrate active sensory-motor systems with RL-based decision systems
Minut and Mahadevan (2001) => ICAA
proposed a model of selective attention for visual search tasks, such as deciding where to fixate next in order to reach the region where an object is most likely to be found
Darrell and Pentland (1996a,b) => NIPS, ICPR
proposed a gesture recognition system that guides an active camera to foveate salient features based on a RL paradigm
(to foveate: to angle one's eyes such that the foveae are directed at an object in one's field of view)
10. RL applications in Computer Vision
Active Vision
Darrell (1998) => NIPS
concisely represented active recognition behavior derived from hidden-state reinforcement learning techniques
Paletta and Pinz (2000) => Robotics and Autonomous Systems (Jnl.)
applied RL in an active object recognition system, to learn how to move the camera to informative viewpoints, defining the recognition process as a sequential decision problem with the objective of disambiguating initial object hypotheses
For these authors, "Reinforcement Learning provides then an efficient method to autonomously develop near-optimal decision strategies in terms of sensorimotor mappings" (Paletta et al, 1998)
11. RL applications in Computer Vision
Active Vision
Borotschnig, et al. (1999) => Image and Vision Computing (Jnl.)
built a system that learns to reposition the camera to capture additional views to improve the image classification result obtained from a single view
Paletta, et al. (2005) => ICML
proposed the use of Q-learning to associate shift-of-attention actions to cumulative reward with respect to object recognition
Image Segmentation
Peng and Bhanu (1998) => IEEE Trnsc. on PAMI
used RL to learn to adapt the image segmentation parameters of a specific algorithm to the changing environmental conditions
12. RL applications in Computer Vision
Image Segmentation and Object Recognition
Peng and Bhanu (1998, 2000) => IEEE Trnsc. on SMC
improved the recognition results by using the output at the highest level as feedback for the learning system
Parameter Selection in Vision Problems
Taylor (2004) => MSc Thesis
proposed a general framework for applying RL to parameter selection problems in vision
Tizhoosh and Taylor (2006) => Int. Jnl. of Image and Graphics
proposed an automated technique for obtaining a subjectively ideal image enhancement
13. RL applications in Computer Vision
Parameter Selection in Vision Problems
Shokri and Tizhoosh (2003∼2008) => several Int. Jnls.
proposed a reinforcement agent for finding an optimal threshold in order to segment digital images
Yin (2002) => Signal Processing (Jnl.)
designed a general framework for an intelligent system to extract one object of interest from ultrasound images based on reinforcement learning
Sahba, et al. (2008) => Expert Systems with Applications (Jnl.)
proposed a RL-based approach for segmentation of ultrasound images
Hossain, et al. (1999) => IEEE SMC
proposed a RL system for adaptive tropical cyclone pattern segmentation and feature extraction from satellite imagery, and introduced a closed-loop tropical cyclone forecast system based on RL
14. RL applications in Computer Vision
Object Recognition
Draper, et al. (1999) => Jnl. of Computer Vision Research
modeled the object recognition problem as a Markov Decision Problem, and proposed a theoretically sound method for constructing object recognition strategies by combining CV algorithms to perform segmentation (the result is a system called ADORE (Adaptive Object Recognition) that automatically learns object-recognition strategies from training data)
There are many applications of RL in Computer Vision
15. Summary of RL in CV
Whitehead and Ballard (1991): adaptive control architecture integrating active sensory-motor systems with RL-based decision systems
diversified to...
‣ optimize the performance of active vision systems
‣ decide where the focus of attention should be
‣ learn how to move a camera to more informative viewpoints
‣ optimize parameters of existing and new CV algorithms
16. Limitations caused by RL
In the Object Recognition Task:
the reward value associated with a situation
‣ is usually not directly available
‣ requires a certain amount of knowledge about the world to be defined
the large state space
‣ makes it difficult for RL algorithms to converge
‣ raises performance issues
17. Two Object Recognition Methods
Lowe's Feature Matching method (Lowe, 2004)
‣ proposed together with SIFT
‣ a single-view object detection and recognition system
‣ matches features between a test image and model images, as below
Vocabulary Tree Algorithm (Nistér and Stewénius, 2006)
‣ uses visual words (Bag-of-Words) to classify images
18. Lowe's Method
(figure: model image matched against an input image; result: identification)
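To illustrate the kind of matching Lowe's method performs, a minimal OpenCV sketch of SIFT feature matching with Lowe's ratio test; the file names are placeholders, and this is a generic sketch rather than the authors' exact pipeline.

import cv2

# load a model image and a test (scene) image; paths are placeholders
model = cv2.imread("model.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

# detect SIFT keypoints and compute descriptors (Lowe, 2004)
sift = cv2.SIFT_create()
kp_m, des_m = sift.detectAndCompute(model, None)
kp_s, des_s = sift.detectAndCompute(scene, None)

# for each model descriptor, find its 2 nearest neighbors in the scene
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des_m, des_s, k=2)

# Lowe's ratio test: keep a match only if it is clearly better
# than the second-best candidate
good = [m for m, n in matches if m.distance < 0.8 * n.distance]
print(len(good), "good matches")

With enough good matches agreeing on a pose, the model object is identified in the input image.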
19. Vocabulary Tree Algorithm
(figure: query image; result: classification, "which class?")
20. Preprocessing
• segmentation (see the sketch below)
1. apply bilateral filtering to remove texture from the image
2. apply the Canny edge detector to define the edges in the image
3. apply mathematical morphology operators to close the contours that remained open
4. use a flood-fill algorithm to fill connected areas divided by the edges
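A minimal OpenCV sketch of this segmentation pipeline; the parameter values (filter diameter, Canny thresholds, kernel size) are illustrative assumptions, not the paper's settings.

import cv2
import numpy as np

img = cv2.imread("frame.png")  # placeholder path

# 1. bilateral filtering: removes texture while preserving edges
smooth = cv2.bilateralFilter(img, 9, 75, 75)

# 2. Canny edge detection on the smoothed image
gray = cv2.cvtColor(smooth, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# 3. morphological closing to close contours that remained open
kernel = np.ones((5, 5), np.uint8)
closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)

# 4. flood-fill from a corner assumed to be background; the pixels
#    that stay unfilled are the connected areas divided by the edges
mask = np.zeros((closed.shape[0] + 2, closed.shape[1] + 2), np.uint8)
filled = closed.copy()
cv2.floodFill(filled, mask, (0, 0), 255)
segments = closed | cv2.bitwise_not(filled)  # filled object regions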
21. Weaknesses of the two methods
Lowe's Feature Matching Method
‣ performs poorly when recognizing sparsely textured objects or objects with repetitive patterns
Vocabulary Tree Algorithm
‣ needs an accurate segmentation stage prior to classification, which can be very time consuming, and it depends on the quality of the segmentation stage to provide good results
22. Learning to Select Object Recognition Methods
1st stage:
1. decide to use the image for recognition, because the image contains an object
2. decide the image should be discarded, because the image does not contain objects
2nd stage:
1. decide to use Lowe's algorithm
2. decide to use the Vocabulary Tree (VT) algorithm
→ use Reinforcement Learning as the classification method
23. State
state space = attributes extracted from the images + the possible classification of the image
State definition example in the 1st stage:
$s = [I, \sigma, c]$
I: mean image intensity
σ: standard deviation of the image intensity
c: class ID
24. Action
Update action (not a real action happening in the world):
update the value of a state-action pair $Q(s, a)$ at one state using the value of a neighbor pair
Example: if the state space is composed of [I, σ] (2D)
(figure: two 5×5 tables of Q(s, a) values over (I, σ); after the update action, one cell's value 3.2 is replaced by its neighbor's value 2.3)
25. Reward
If the learning agent reaches a state where a training image exists and the state corresponds to the correct classification of the image, the agent receives a reward
Example:
a training image (mean = 50, std = 10, contains no objects)
a state (mean = 50, std = 10, classification = discard)
does such an image exist? yes → reward > 0; no → reward = 0
26. MDP Definition (see the sketch below)
• The set of update actions $a \in A$ that the agent can perform, defined as "update the Q value using the value of a neighbor"
• The finite set of states $s \in S$ in this case is the n-dimensional space of values of the attributes extracted from the images, plus its classification
• The state transition function allows updates to be made between any pair of neighbors in the set of states
• The reinforcements $R : S \times A \to \mathbb{R}$ are defined using a set of training images
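To make the MDP concrete, a small Python sketch of the state grid, the neighbor-update actions, and the training-image reward. The bin counts, reward values and training entries are toy assumptions for illustration, not the paper's settings.

import numpy as np

# discretized state space: mean intensity I x std dev sigma x class id
# (class 0 = background/discard, 1 = Lowe, 2 = Vocabulary Tree)
N_BINS, N_CLASSES = 64, 3
Q = np.zeros((N_BINS, N_BINS, N_CLASSES))

# toy training set: (I_bin, sigma_bin) -> correct class id
training = {(12, 40): 1, (30, 5): 2, (50, 10): 0}

def reward(i, s, c):
    # positive reward only where a training image with these attribute
    # values exists and the state's class matches its label
    return 10.0 if training.get((i, s)) == c else 0.0

def neighbors(i, s, c):
    # states reachable by one update action: adjacent (I, sigma) cells
    for di, ds in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, ns = i + di, s + ds
        if 0 <= ni < N_BINS and 0 <= ns < N_BINS:
            yield ni, ns, c

def update(i, s, c, alpha=0.1, gamma=0.9):
    # one update action: back up the best neighbor's value plus reward
    best = max(Q[n] for n in neighbors(i, s, c))
    Q[i, s, c] += alpha * (reward(i, s, c) + gamma * best - Q[i, s, c])

Repeatedly applying update over the grid spreads the value of rewarded (training) states to their neighbors, which is what the next slides visualize.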
27. Training phase
Reinforcement learning is performed over a set of pre-classified images
→ learn a mapping from images to image classes
(figure: learning algorithm)
28. Training phase
What is happening during the learning phase?
Every time the robot finds a goal state and receives a reward, the state-action pair where the robot was before reaching the goal state is updated
Every time the robot moves, it iteratively updates the origin state-action pair
(figure: value updates propagating back from the goal state)
29. Training phase
(figure: Q-table over I and σ, labeled by class id, during learning)
class id 0: no such image in the dataset
class id 1: an image that should be recognized by Lowe's method is found
class id 2: an image that should be recognized by the VT algorithm is found
30. Training phase
(figure: the same I × σ table after learning)
The right image shows a table where the classification has spread to states with no prior examples, which allows the classification of other images (see the sketch below)
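After training, classifying a new image reduces to reading the learned table: extract the image's attributes, find the corresponding cell, and take the class with the highest value. A minimal sketch, continuing the toy discretization from the MDP sketch above:

import numpy as np

def classify(image, Q, n_bins=64):
    # image: grayscale uint8 array; Q: learned (I, sigma, class) table
    i = min(int(image.mean()) * n_bins // 256, n_bins - 1)
    s = min(int(image.std()) * n_bins // 256, n_bins - 1)
    return int(np.argmax(Q[i, s, :]))  # 0: discard, 1: Lowe, 2: VT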
31. Learning and Test database
image dataset of nine typical household objects (Ramisa et al, 2008)
3 categories: objects with repetitive texture, textured objects, non-textured objects
each category consists of 3 different objects, and each object has approximately 20 training images
32. Learning and Test database
image dataset of nine typical household objects (Ramisa et al, 2008)
test images include occlusions, illumination changes, blur and other typical nuisances that will be encountered while navigating with a mobile robot
33. Learning and Test database
image dataset of nine typical household objects (Ramisa et al, 2008)
background images: images that do not contain objects to be recognized
34. Space State Descriptors (see the sketch below)
MS: mean and standard deviation of the image intensity
MSE: mean and standard deviation of the image intensity, plus entropy of the image
MSI: mean and standard deviation of the image intensity, plus the number of interest points detected by the Difference of Gaussians operator
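A sketch of how the three descriptors could be computed with OpenCV and NumPy; the histogram-based entropy and the use of SIFT's DoG detector for the interest-point count are assumptions for illustration.

import cv2
import numpy as np

def state_descriptors(gray):
    # gray: grayscale uint8 image
    m, s = float(gray.mean()), float(gray.std())

    # entropy of the 256-bin intensity histogram
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))

    # number of interest points from the Difference of Gaussians
    # operator (SIFT's keypoint detector)
    n_points = len(cv2.SIFT_create().detect(gray, None))

    return {"MS": (m, s), "MSE": (m, s, entropy), "MSI": (m, s, n_points)}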
35. Experiments
36. Reward table
(figure: reward table over mean and std dev of image intensity; reward +10 for correct classification, -10 for incorrect)
x: an image with this combination of mean and std dev values exists in the training set
. : represents images that do not contain objects (backgrounds)
(whitespace): represents absence of images
37. Classification Table
(figure: the results of applying the RL algorithm during the learning phase, over the same mean/std dev space)
x: an image with this combination of mean and std dev values exists in the training set
. : represents images that do not contain objects (backgrounds)
(whitespace): represents absence of images
38. Experiments
image dataset of nine typical household objects (Ramisa et al, 2008): object images (3 categories), test images (with nuisances), background images (with no objects)
Experiment phases:
1. the training of the RL
‣ 40 test images, from which approximately 160 images containing objects were segmented and previously classified
‣ 360 background images, also resulting from the segmentation process
2. the execution phase, where training quality can be verified
39. Correctly Classified Images (%), α = 0.1, γ = 0.9

            Full Img                Small Img           Expert
        MS     MSE     MSI      MS     MSE     MSI
Back    91.9   100.0   98.0     92.6   100.0   98.9     100.0
Lowe    84.5   100.0   44.4     76.0    98.4   38.1      93.2
40. Incorrect Classification (%), α = 0.1, γ = 0.9

            Full Img                Small Img           Expert
        MS     MSE     MSI      MS     MSE     MSI
Back    12.8     1.8   14.2     20.4     2.4   25.3       8.2
Lowe    11.6     1.9    7.9     15.8     1.9    9.9      10.8
41. Conclusion
• Reinforcement Learning has been widely used in the Computer Vision field
• In this paper we presented a method that uses Reinforcement Learning to decide which algorithm should be used to recognize objects seen by a mobile robot in an indoor environment
• Another important contribution of this work is a method that allows the use of a Reinforcement Learning algorithm as a Classifier
