The document describes Faster R-CNN, an object detection method that uses a Region Proposal Network (RPN) to generate region proposals from feature maps, pools each proposal's features to a fixed size with RoI pooling, and then classifies and regresses a bounding box for each proposal with a convolutional network. The RPN slides over the feature map and outputs objectness scores and bounding-box adjustments for a set of anchor boxes; non-maximum suppression is then applied to reduce redundant proposals.
1. Tutorial
Faster R-CNN
Object Detection: Localization & Classification
Hwa Pyung Kim
Department of Computational Science and Engineering, Yonsei University
hpkim0512@yonsei.ac.kr
2. Object Detection: Classification + Regression
Bounding box regression (localization) answers "Where?": a box (x, y, w, h).
Classification (recognition) answers "What?": a score vector over classes (Dog, Cat, …, Person).
Together they give "A dog at (x, y, w, h)".
A feature map is obtained by encoding the image (conv & pool) and combining features.
Bounding box information
• (x, y): top-left corner position
• w = width
• h = height
3. CNN-based Object Detection
A [224, 224, 3] input image passed through the VGG16 network yields pool5 features of size [7, 7, 512], since 7 = 224/32 and 32 = 2^5, where 5 = # of pooling layers.
There are clues of a dog (What) at a local position (Where) in the convolution feature map: the red boxes contain clues of "dog at the bounding box (x, y, w, h)".
Fully-connected layers on top of these features then perform classification (Dog, Cat, …, Person) and regression (x, y, w, h).
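The 7 = 224/32 arithmetic above can be sketched as a tiny helper (a sketch assuming, as on the slide, that only the five 2×2 poolings of VGG16 change the spatial resolution; the function name is hypothetical):

```python
# Spatial side of the conv feature map after repeated 2x2 pooling.
def feature_map_size(side: int, num_pools: int = 5) -> int:
    # Each pooling halves the resolution, so the overall stride is 2**num_pools = 32.
    return side // (2 ** num_pools)

print(feature_map_size(224))  # 7, matching the 7x7 pool5 features
```

The same helper reproduces the 11×15 feature map used for the 360×480 image later in the deck.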
4. Multiple Object Detection
Localize and classify all objects appearing in the image (e.g. PASCAL VOC2007).
How many objects are there?
• Classify these multiply overlapping objects
• Identify their bounding boxes
5. Region-based CNN (R-CNN) method
Extract "region proposals" using the selective search method.
Affine image warping: compute a fixed-size CNN input from each region proposal, regardless of the region's shape.
Each warped proposal is passed through a ConvNet and a Classifier & Regressor, yielding labels such as Background, Person, or Dining table.
7. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Pipeline: ConvNet → feature map → Region Proposal Network → proposals → RoI pooling → Classifier & Regressor.
What is a Region Proposal Network?
8. Region Proposal Network (RPN)
For a 360×480 input image, the conv feature map is 11×15×512:
11 = 360/32, 15 = 480/32, 32 = 2^5, where 5 = # of pooling layers and 512 = # of filters.
The RPN outputs a set of rectangular object proposals, each with an objectness score. How?
9. Region Proposals & Anchor Boxes
Input: each sliding window (red cuboid) over the 11×15×512 conv feature map, expressed as a 3×3×512 vector, is fed to fully-connected layers.
Output:
• 4 coordinates: p_x, p_y, p_w, p_h
• 2 scores: s^obj, s^nobj, which estimate the probability of object / not object for each proposal
The proposal is parametrized relative to an anchor:
p_x = a_x + a_w · t_x
p_y = a_y + a_h · t_y
p_w = a_w · exp(t_w)
p_h = a_h · exp(t_h)
Anchor box information
• (a_x, a_y): center position, determined by the position of the sliding window
• a_w = width, a_h = height; both are fixed (for example, a_w = a_h = 128)
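The four parametrization equations above can be checked with a minimal sketch (the helper name is hypothetical; math.exp implements the exp in the last two equations):

```python
import math

# Decode one RPN output (t_x, t_y, t_w, t_h) into a proposal box,
# relative to an anchor (a_x, a_y, a_w, a_h), following the slide's equations.
def decode_proposal(anchor, deltas):
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = deltas
    px = ax + aw * tx       # shift center x by a fraction of the anchor width
    py = ay + ah * ty       # shift center y by a fraction of the anchor height
    pw = aw * math.exp(tw)  # scale width (exp keeps it positive)
    ph = ah * math.exp(th)  # scale height
    return px, py, pw, ph

# Zero deltas reproduce the anchor itself:
print(decode_proposal((100, 80, 128, 128), (0, 0, 0, 0)))  # (100, 80, 128.0, 128.0)
```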
10. Region Proposals & Anchor Boxes
9 anchor boxes = 3 ratios × 3 scales. For example,
a_w1 = a_h1 = 128, a_w2 = a_h2 = 2 × 128, a_w3 = a_h3 = 4 × 128,
a_w4 = 2 × a_h4 = 128, …, a_w7 = (1/2) × a_h7 = 128, …
Input: each sliding window (red cuboid), expressed as a 3×3×512 vector, is fed to fully-connected layers.
Output: for i = 1, …, 9,
• 4 coordinates: p_xi, p_yi, p_wi, p_hi
• 2 scores: s_i^obj, s_i^nobj, which estimate the probability of object / not object for each proposal
The 9 proposals are parametrized relative to the 9 anchors: for i = 1, …, 9,
p_xi = a_xi + a_wi · t_xi
p_yi = a_yi + a_hi · t_yi
p_wi = a_wi · exp(t_wi)
p_hi = a_hi · exp(t_hi)
Anchor box information
• (a_xi, a_yi): center position, determined by the position of the sliding window
• a_wi = width, a_hi = height (both fixed)
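The 3 ratios × 3 scales enumeration can be sketched as follows. This follows the slide's examples literally: at each scale the square anchor has side 128 × scale, and for the 2:1 / 1:2 anchors the width stays at 128 × scale (an assumption where the slide's examples are elliptical; the function name is hypothetical):

```python
# Enumerate the 9 anchor (width, height) shapes: 3 ratios x 3 scales.
def anchor_shapes(base=128, scales=(1, 2, 4)):
    shapes = []
    for s in scales:
        side = base * s
        shapes.append((side, side))       # ratio 1:1, e.g. 128x128
        shapes.append((side, side // 2))  # ratio 2:1 (a_w = 2 * a_h)
        shapes.append((side, side * 2))   # ratio 1:2 (a_w = a_h / 2)
    return shapes

print(len(anchor_shapes()))  # 9
```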
12. Generate Region Proposals
Total # of windows = 11 × 15; # of proposals per window = 9.
Total # of proposals: 11 × 15 × 9 = 1485.
The proposals highly overlap each other! We need to reduce redundancy.
13. Reduce redundancy by Non-Maximum Suppression (NMS)
Step 1. Take the most probable proposal from the 1485 proposals.
Proposal information
• (p_xi, p_yi): top-left corner position
• p_wi = width, p_hi = height
• p_i = objectness probability, sorted so that p_1 ≥ p_2 ≥ … ≥ p_1485
14. Reduce redundancy by Non-Maximum Suppression (NMS)
Step 1. Take the most probable proposal from the 1485 proposals.
Step 2. Compute the IoU between the most probable proposal and the other proposals, and discard proposals having IoU > threshold (0.7). For example, proposals with IoU = 0.83 or 0.71 against the most probable one are discarded, while those with IoU = 0.30 or 0 are kept.
16. Reduce redundancy by NMS
Summary of steps 1–2: given the most probable proposal, the blue proposals have IoU > threshold (0.7); here 30 proposals having IoU > 0.7 are discarded.
Step 3. Take the next most probable proposal among the remaining 1485 − 30 proposals and repeat the previous process; for the next most probable proposal, 36 proposals having IoU > 0.7 are discarded.
17. Reduce redundancy by NMS
Repeat the previous procedure until…
Before NMS: 1,485 proposals → After NMS: 300 proposals
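The NMS steps above can be sketched as a greedy NumPy loop (a minimal sketch assuming (x, y, w, h) boxes with (x, y) the top-left corner, as on the slides; the function name is hypothetical):

```python
import numpy as np

# Greedy non-maximum suppression over (N, 4) boxes and (N,) objectness scores.
def nms(boxes, scores, iou_threshold=0.7):
    x1, y1 = boxes[:, 0], boxes[:, 1]
    x2, y2 = x1 + boxes[:, 2], y1 + boxes[:, 3]
    areas = boxes[:, 2] * boxes[:, 3]
    order = np.argsort(scores)[::-1]  # most probable first
    keep = []
    while order.size > 0:
        i = order[0]                  # step 1: take the most probable proposal
        keep.append(int(i))
        # step 2: IoU between proposal i and the remaining proposals
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # step 3: discard proposals with IoU > threshold, then repeat
        order = order[1:][iou <= iou_threshold]
    return keep
```

For example, two heavily overlapping boxes and one distant box reduce to two survivors:

```python
boxes = np.array([[0, 0, 10, 10], [0.5, 0.5, 10, 10], [100, 100, 10, 10]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```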
18. Summary of RPN
Inputs:
• Conv feature map
Outputs:
• Region proposal coordinates
• Probabilities representing how likely each region proposal is to contain an object
19. Classifier & Regressor
Pipeline recap: ConvNet → feature map → Region Proposal Network → proposals → RoI pooling → Classifier & Regressor.
Now we are ready to explain the Classifier & Regressor.
20. RoI pooling layer
Convert the features inside each valid RoI into a small feature map with a fixed spatial extent.
A proposal (p_x, p_y, p_w, p_h) given in 480×360 image coordinates is projected onto the 15×11 conv feature map:
p_x' = p_x · (15/480), p_y' = p_y · (11/360), p_w' = p_w · (15/480), p_h' = p_h · (11/360)
Bilinear interpolation & max pooling then turn the projected region (p_x', p_y', p_w', p_h') into the fixed-size 7×7 input for the Classifier & Regressor.
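The coordinate projection onto the feature map can be sketched directly from the four equations on this slide (a sketch for the slide's 480×360 image and 15×11 feature map; the function name is hypothetical):

```python
# Project a proposal from 480x360 image coordinates onto the 15x11 feature map.
def project_to_feature_map(proposal, img_w=480, img_h=360, fm_w=15, fm_h=11):
    px, py, pw, ph = proposal
    return (px * fm_w / img_w,   # p_x' = p_x * (15/480)
            py * fm_h / img_h,   # p_y' = p_y * (11/360)
            pw * fm_w / img_w,   # p_w' = p_w * (15/480)
            ph * fm_h / img_h)   # p_h' = p_h * (11/360)

print(project_to_feature_map((96, 72, 160, 144)))  # (3.0, 2.2, 5.0, 4.4)
```

Note that both ratios, 15/480 and 11/360, equal 1/32, the overall stride of the network.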
21. The RoI pooling layer generates the inputs for the Classifier & Regressor: 300 RoI-pooled feature maps, each of size 7×7×512.
23. Summary of Classification & Regression
• Regress & classify each class from the region proposals (e.g. Background, Person, TV monitor, Dining table).
• Discard bounding boxes with p < 0.6 or classified as background.
• Reduce redundancy by NMS.
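The discard step in the list above can be sketched as a small filter (the dictionary fields and example detections are hypothetical; the 0.6 threshold and the background rule come from the slide):

```python
# Keep only detections that are not background and score at least the threshold.
def filter_detections(detections, score_threshold=0.6):
    return [d for d in detections
            if d["label"] != "background" and d["score"] >= score_threshold]

dets = [{"label": "person", "score": 0.92},
        {"label": "background", "score": 0.88},
        {"label": "tv monitor", "score": 0.41}]
print(filter_detections(dets))  # only the person detection survives
```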
24. Summary of Classifier & Regressor
Inputs:
• Conv feature map
• Region proposals
Outputs:
• Bounding box coordinates of the objects in the image
• Classification of the bounding boxes
29. References
[Gitbooks] Object Localization and Detection
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/object_localization_and_detection.html
[ICCV2015 Tutorial] Convolutional Feature Maps
https://courses.engr.illinois.edu/ece420/sp2017/iccv2015_tutorial_convolutional_feature_maps_kaiminghe.pdf
[Infographic] The Modern History of Object Recognition
https://github.com/Nikasa1889/HistoryObjectRecognition
[Tensorflow Code] tf-Faster-RCNN
https://github.com/kevinjliang/tf-Faster-RCNN
[Medium] A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN
https://blog.athelas.com/a-brief-history-of-cnns-in-image-segmentation-from-r-cnn-to-mask-r-cnn-34ea83205de4
[pyimagesearch] Intersection over Union (IoU) for object detection
https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
[Stanford c231n] Lecture 11: Detection and Segmentation
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf