Hierarchical Object Detection with Deep Reinforcement Learning

Miriam Bellver, Xavier Giro-i-Nieto, Ferran Marques, and Jordi Torres. "Hierarchical Object Detection with Deep Reinforcement Learning." In Deep Reinforcement Learning Workshop (NIPS). 2016.

We present a method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. The key idea is to focus on those parts of the image that contain richer information and zoom on them. We train an intelligent agent that, given an image window, is capable of deciding where to focus the attention among five different predefined region candidates (smaller windows). This procedure is iterated providing a hierarchical image analysis.We compare two different candidate proposal strategies to guide the object search: with and without overlap. Moreover, our work compares two different strategies to extract features from a convolutional neural network for each region proposal: a first one that computes new feature maps for each region proposal, and a second one that computes the feature maps for the whole image to later generate crops for each region proposal. Experiments indicate better results for the overlapping candidate proposal strategy and a loss of performance for the cropped image features due to the loss of spatial resolution. We argue that, while this loss seems unavoidable when working with large amounts of object candidates, the much more reduced amount of region proposals generated by our reinforcement learning agent allows considering to extract features for each location without sharing convolutional computation among regions.

https://imatge-upc.github.io/detection-2016-nipsws/



  1. Hierarchical Object Detection with Deep Reinforcement Learning. NIPS 2016 Workshop on Reinforcement Learning. [github] [arXiv]. Míriam Bellver, Xavier Giró-i-Nieto, Ferran Marqués, Jordi Torres
  2. Outline: ● Introduction ● Related Work ● Hierarchical Object Detection Model ● Experiments ● Conclusions
  3. Introduction
  4. Introduction: We present a method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. (OBJECT FOUND)
  5. Introduction: We present a method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. (OBJECT FOUND)
  6. Introduction: We present a method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. (OBJECT FOUND)
  7. Introduction: What is Reinforcement Learning? "a way of programming agents by reward and punishment without needing to specify how the task is to be achieved" [Kaelbling, Littman & Moore, 96]
  8. Introduction: Reinforcement Learning ● There is no supervisor, only a reward signal ● Feedback is delayed, not instantaneous ● Time really matters (sequential, non-i.i.d. data). Slide credit: UCL Course on RL by David Silver
  9. Introduction: Reinforcement Learning. An agent, the decision-maker, interacts with the environment and learns through trial and error. We model the decision-making process through a Markov Decision Process. Slide credit: UCL Course on RL by David Silver
  10. Introduction: Reinforcement Learning. An agent, the decision-maker, interacts with the environment and learns through trial and error. Slide credit: UCL Course on RL by David Silver
  11. Introduction: Contributions: ● Hierarchical object detection in images using a deep reinforcement learning agent ● We define two different hierarchies of regions ● We compare two different strategies to extract features for each candidate proposal to define the state ● We manage to find objects by analyzing just a few regions
  12. Related Work
  13. Related Work: Deep Reinforcement Learning. ATARI 2600: Mnih, V. (2013). Playing Atari with deep reinforcement learning. AlphaGo: Silver, D. (2016). Mastering the game of Go with deep neural networks and tree search.
  14. Related Work: Object detection pipelines: region proposals / sliding window + detector; sharing convolutions over locations + detector; sharing convolutions over locations and also with the detector; single-shot detectors. Uijlings, J. R. (2013). Selective search for object recognition; Girshick, R. (2015). Fast R-CNN; Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN; Redmon, J. (2015). YOLO; Liu, W. (2015). SSD.
  15. Related Work: The proposal- and sliding-window-based pipelines rely on a large number of locations; single-shot detectors rely on a number of reference boxes from which bounding boxes are regressed.
  16. Related Work: So far we can cluster object detection pipelines by how the analyzed regions are obtained: ● using object proposals ● using reference boxes ("anchors") that are then regressed
  17. Related Work: So far we can cluster object detection pipelines by how the analyzed regions are obtained: ● using object proposals ● using reference boxes ("anchors") that are then regressed. There is a third approach: ● methods that iteratively refine one initial bounding box (AttentionNet, Active Object Localization with DRL)
  18. Related Work: Refinement of bounding box predictions. AttentionNet casts object detection as an iterative classification problem, where each category corresponds to a weak direction pointing to the target object. Yoo, D. (2015). AttentionNet: Aggregating weak directions for accurate object detection.
  19. Related Work: Refinement of bounding box predictions. Caicedo, J. C., & Lazebnik, S. (2015). Active object localization with deep reinforcement learning.
  20. Hierarchical Object Detection Model: Reinforcement Learning formulation
  21. Reinforcement Learning Formulation: We cast the problem as a Markov Decision Process.
  22. Reinforcement Learning Formulation: State: the agent decides which action to choose based on the concatenation of: ● a visual description of the currently observed region ● a history vector that encodes the past actions performed
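The state on slide 22 can be sketched in code. This is a minimal illustration, not the authors' implementation: the feature values, the history length of 4 actions, and all function names are assumptions. The history vector one-hot encodes the last few actions, with unused slots left at zero.

```python
def one_hot(action, n_actions=6):
    """One-hot encode a single action index (5 movements + 1 terminal)."""
    v = [0.0] * n_actions
    v[action] = 1.0
    return v

def build_state(visual_features, past_actions, memory=4, n_actions=6):
    """Concatenate the visual description of the current region with a
    fixed-size history vector of the last `memory` actions, one-hot encoded."""
    history = []
    for a in past_actions[-memory:]:
        history += one_hot(a, n_actions)
    history += [0.0] * (memory * n_actions - len(history))  # pad unused slots
    return list(visual_features) + history

# Toy visual descriptor of length 3 and two past actions (2, then 5):
state = build_state([0.1, 0.5, 0.9], [2, 5])
# state length = 3 visual features + 4 * 6 history slots = 27
```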
  23. Reinforcement Learning Formulation: Actions: two kinds of actions: ● movement actions: to which of the 5 possible regions defined by the hierarchy to move ● terminal action: the agent indicates that the object has been found
  24. Reinforcement Learning Formulation: Hierarchies of regions. For the first kind of hierarchy, fewer steps are required to reach a certain scale of bounding boxes, but the space of possible regions is smaller. (trigger)
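A hierarchy of 5 candidate sub-windows can be sketched as follows. This is an illustrative quadrant-style decomposition, assuming each child has half the parent's side; the slides do not give the exact region geometry, and the overlapping variant would use larger children instead.

```python
def child_regions(x, y, w, h):
    """Five candidate sub-windows of a region (x, y, w, h): the four
    quadrants plus a centered box, each half the parent's size.
    (Sketch only; the overlapping hierarchy would use larger children.)"""
    hw, hh = w / 2, h / 2
    return [
        (x,          y,          hw, hh),  # top-left
        (x + hw,     y,          hw, hh),  # top-right
        (x,          y + hh,     hw, hh),  # bottom-left
        (x + hw,     y + hh,     hw, hh),  # bottom-right
        (x + hw / 2, y + hh / 2, hw, hh),  # center
    ]

# Starting from the whole 100x100 image, the agent picks one of these
# 5 windows at each step, descending the hierarchy until it triggers.
regions = child_regions(0, 0, 100, 100)
```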
  25. Reinforcement Learning Formulation: Reward: a reward for movement actions and a reward for the terminal action.
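The reward formulas on slide 25 were rendered as images; a plausible sketch in the spirit of Caicedo & Lazebnik (2015) is given below: the movement reward is the sign of the change in IoU with the ground truth, and the terminal reward is ±η depending on whether the final IoU clears a threshold. The values η = 3 and τ = 0.5 are assumptions, not taken from the slides.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def movement_reward(prev_box, new_box, gt):
    """+1 if the movement increases IoU with the ground truth, else -1."""
    return 1.0 if iou(new_box, gt) > iou(prev_box, gt) else -1.0

def terminal_reward(box, gt, eta=3.0, tau=0.5):
    """+eta if the final box overlaps the target enough, else -eta.
    (eta and tau are assumed values, not from the slides.)"""
    return eta if iou(box, gt) >= tau else -eta
```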
  26. Hierarchical Object Detection Model: Q-learning
  27. Q-learning: In Reinforcement Learning we want to obtain a function Q(s,a) that predicts the best action a in state s in order to maximize the cumulative reward. This function can be estimated with Q-learning, which iteratively updates Q(s,a) using the Bellman equation: Q(s,a) ← r + γ max_a' Q(s',a'), where r is the immediate reward, max_a' Q(s',a') the future reward, and γ = 0.90 the discount factor.
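The Bellman update can be written as a small tabular sketch. The slide fixes only γ = 0.90; the learning rate α and the dictionary-based table are illustrative assumptions.

```python
def q_update(Q, s, a, r, s_next, n_actions=6, alpha=0.1, gamma=0.90,
             terminal=False):
    """One Q-learning step on a table Q keyed by (state, action):
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    gamma = 0.90 as on the slide; alpha is an assumed learning rate."""
    future = 0.0 if terminal else max(
        Q.get((s_next, a2), 0.0) for a2 in range(n_actions))
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * future - old)
    return Q[(s, a)]

# First visit to ('start', 0) with reward 1 and an all-zero table:
Q = {}
q_update(Q, 'start', 0, 1.0, 'child2')  # Q[('start', 0)] becomes 0.1
```

In the deep variant of the next slide, the table lookup is replaced by a network forward pass and the same target r + γ max_a' Q(s',a') drives a regression loss.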
  28. Q-learning: What is deep reinforcement learning? It is when we estimate this Q(s,a) function by means of a deep network, with one output for each action. Figure credit: Nervana blog post about RL.
  29. Hierarchical Object Detection Model: Model
  30. Model: We tested two different configurations of feature extraction. Image-Zooms model: we extract features for every region observed. Pool45-Crops model: we extract features once for the whole image, and ROI-pool features for each subregion.
  31. Model: Our RL agent is based on a Q-network. The input is: ● a visual description ● a history vector. The output is: ● a fully connected layer of 6 neurons, indicating the Q-value for each action.
  32. Hierarchical Object Detection Model: Training
  33. Training: Exploration-exploitation dilemma: ε-greedy policy. Exploration: with probability ε the agent performs a random action. Exploitation: with probability 1-ε it performs the action with the highest Q(s,a).
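The ε-greedy policy is a few lines of code; a minimal sketch (function name assumed):

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action (exploration);
    otherwise pick the action with the highest Q-value (exploitation)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In practice ε is typically annealed from 1.0 toward a small value over training; the slides do not specify the schedule used here.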
  34. Training: Experience replay. The Bellman equation learns from transitions (s,a,r,s'). Consecutive experiences are highly correlated, which leads to inefficient training. Experience replay collects a buffer of experiences, and the algorithm randomly takes mini-batches from this replay memory to train the network.
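A minimal sketch of such a replay memory (the class name, capacity, and batch size are illustrative assumptions):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (s, a, r, s') transitions. Sampling random
    mini-batches breaks the correlation between consecutive experiences."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest entries evicted first

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(list(self.buffer),
                             min(batch_size, len(self.buffer)))

# Fill past capacity to show eviction, then draw a training mini-batch:
memory = ReplayMemory(capacity=100)
for t in range(150):
    memory.push(t, t % 6, 1.0, t + 1)
batch = memory.sample(32)
```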
  35. Experiments
  36. Visualizations: These results were obtained with the Image-Zooms model, which yielded better results. We observe that the model approaches the object, but that the final bounding box is not accurate.
  37. Experiments: We calculate an upper-bound and a baseline experiment with the hierarchies, and observe that both are very limited in terms of recall. The Image-Zooms model achieves the better precision-recall curve.
  38. Experiments: Most of our agent's object searches finish in just 1, 2 or 3 steps, so the agent requires very few steps to approach objects.
  39. Conclusions
  40. Conclusions: ● The Image-Zooms model yields better results. We argue that with the ROI-pooling approach we do not have as much resolution as with the Image-Zooms features. Although Image-Zooms is more computationally intensive, we can afford it because we approach the object in just a few steps. ● Our agent approaches the object, but the final bounding box is not accurate enough because the hierarchy limits our space of solutions. A solution could be to train a regressor that adjusts the bounding box to the target object.
  41. Acknowledgements. Technical support: Albert Gil (UPC), Josep Pujal (UPC), Carlos Tripiana (BSC). Financial support.
  42. Thank you for your attention!
