An introduction to selective search for object proposals, followed by deep dives into the R-CNN family and RetinaNet, a state-of-the-art object detection model. Also covers the mAP concept for evaluating models and how anchor boxes help the model learn where to draw bounding boxes.
5. Segmentation
Idea: If we correctly segment the image before running object
recognition, we can use our segmentations as candidate objects.
Advantages: Can be efficient, makes no assumptions about object
sizes or shapes.
6. Selective search
• Start by oversegmenting the input image
“Efficient graph-based image segmentation”, Felzenszwalb and
Huttenlocher, IJCV 2004
9. Similarity measures
Color: 25-bin color histogram for each channel = 75 dims (RGB)
Texture: HOG-like Gaussian derivatives of the image in 8
directions for each channel, with a 10-bin histogram for each
direction and channel = 8 × 10 × 3 = 240-dim vector per region.
Size: Size similarity encourages smaller regions to merge
early. It ensures that region proposals at all scales are formed
in all parts of the image.
Shape: Measures how well two regions (ri and rj) fit into each
other. If ri fits into rj, merge them to fill gaps.
10. Selective search
1. Merge the two most similar regions based on S.
2. Update similarities between the new region and its
neighbors.
3. Go back to step 1 until the whole image is a single region.
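The three steps above can be sketched as a greedy loop. This is a simplified illustration and not the reference implementation: it assumes colour similarity is a histogram intersection and size similarity is 1 − (size_i + size_j) / image size, and it leaves out the texture and shape terms. The region dictionaries, the `neighbors` set, and all function names are hypothetical.

```python
import numpy as np

def similarity(a, b, image_size):
    # Colour term: histogram intersection of normalised histograms (assumption).
    s_color = float(np.minimum(a["hist"], b["hist"]).sum())
    # Size term: small regions score higher, so they merge early.
    s_size = 1.0 - (a["size"] + b["size"]) / image_size
    return s_color + s_size

def merge(a, b):
    size = a["size"] + b["size"]
    hist = (a["size"] * a["hist"] + b["size"] * b["hist"]) / size
    box = (min(a["box"][0], b["box"][0]), min(a["box"][1], b["box"][1]),
           max(a["box"][2], b["box"][2]), max(a["box"][3], b["box"][3]))
    return {"size": size, "hist": hist, "box": box}

def hierarchical_grouping(regions, neighbors, image_size):
    """regions: {id: region}; neighbors: set of frozenset({i, j}) adjacency pairs."""
    regions, neighbors = dict(regions), set(neighbors)
    proposals = [r["box"] for r in regions.values()]
    next_id = max(regions) + 1

    def pair_score(pair):
        i, j = sorted(pair)
        return similarity(regions[i], regions[j], image_size)

    while neighbors:
        # Step 1: merge the two most similar neighbouring regions.
        pair = max(neighbors, key=pair_score)
        i, j = sorted(pair)
        merged = merge(regions[i], regions[j])
        # Step 2: regions that touched i or j now touch the merged region.
        touching = {k for p in neighbors if p & pair for k in p} - pair
        neighbors = {p for p in neighbors if not (p & pair)}
        neighbors |= {frozenset((next_id, k)) for k in touching}
        del regions[i], regions[j]
        regions[next_id] = merged
        proposals.append(merged["box"])
        next_id += 1  # Step 3: repeat until no neighbouring pairs remain.
    return proposals

# Toy example: three 2x2 regions in a row on a 6x2 image.
toy_regions = {
    0: {"size": 4, "hist": np.array([1.0, 0.0]), "box": (0, 0, 2, 2)},
    1: {"size": 4, "hist": np.array([1.0, 0.0]), "box": (2, 0, 4, 2)},
    2: {"size": 4, "hist": np.array([0.0, 1.0]), "box": (4, 0, 6, 2)},
}
toy_neighbors = {frozenset((0, 1)), frozenset((1, 2))}
toy_proposals = hierarchical_grouping(toy_regions, toy_neighbors, image_size=12)
```

Every intermediate region contributes its bounding box, which is why the proposals cover all scales.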
11. Selective search
• Use hierarchical segmentation: start with small superpixels and
merge based on diverse cues
• Take bounding boxes of all generated regions and treat them as possible
object locations
16. R-CNN details
• Cons
• Training is slow (84h), takes a lot of disk space
• 2000 CNN passes per image
• Inference (detection) is slow (47s / image with VGG16)
• Selective search is a fixed algorithm: no learning is
happening. This could lead to the generation of bad candidate
region proposals.
17. Fast R-CNN
[Figure: Fast R-CNN pipeline]
• Forward whole image through ConvNet to get the “conv5” feature map of the image
• “RoI Pooling” layer applied to each region proposal
• FCs: fully-connected layers
• Linear + softmax: softmax classifier
• Linear: bounding-box regressors
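To make the “RoI Pooling” step concrete, here is a minimal NumPy sketch. It assumes the region proposal has already been projected onto the feature map and is given in whole feature-map cells; the function name, the argument layout, and the 2×2 output size are illustrative choices, not taken from the slides.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """feature_map: (H, W, C); roi: (x0, y0, x1, y1) in cells, x1/y1 exclusive."""
    x0, y0, x1, y1 = roi
    out_h, out_w = output_size
    # Split the RoI into an out_h x out_w grid of bins.
    ys = np.linspace(y0, y1, out_h + 1)
    xs = np.linspace(x0, x1, out_w + 1)
    pooled = np.zeros((out_h, out_w, feature_map.shape[2]), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            ya, yb = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
            xa, xb = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
            # Max over each bin gives a fixed-size output for any RoI size.
            pooled[i, j] = feature_map[ya:yb, xa:xb].max(axis=(0, 1))
    return pooled

fmap = np.arange(16).reshape(4, 4, 1)
pooled = roi_max_pool(fmap, roi=(0, 0, 4, 4))
```

Because every RoI is reduced to the same grid size, the fully-connected layers can be shared across proposals of different shapes.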
18. Fast R-CNN
• Pros
• Less compute overhead
• 2.3 seconds per image inference time
• Cons
• Inference at 2.3 seconds per image is still too slow for real-time use!
• Selective search is a fixed algorithm: no learning is
happening. This could lead to the generation of bad candidate
region proposals.
22. Region proposal network (RPN)
• Slide a small window over the feature map
• Predict object/no object
• Regress bounding box coordinates
• Box regression is with reference to anchors (3 scales x 3 aspect ratios)
23. Loss
Classification + Regression
i: index of an anchor in a mini-batch
p_i: predicted probability of anchor i being an object
p*_i: 1 if the anchor is positive, 0 if the anchor is negative
t_i: 4 predicted bounding box coordinates
t*_i: ground-truth box coordinates associated with a positive anchor
L_reg(t_i, t*_i) = R(t_i − t*_i), where R is the robust loss
function (smooth L1)
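The definitions above can be combined into a small NumPy sketch: a classification term over all sampled anchors plus a smooth L1 regression term that only counts for positive anchors (p*_i = 1). The weight `lam` and the normalisation by the number of positives are assumptions made for this sketch; implementations differ on both.

```python
import numpy as np

def smooth_l1(x):
    # Quadratic near zero, linear for |x| >= 1, so outliers are penalised less.
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=1.0):
    """p, p_star: (N,); t, t_star: (N, 4). lam is an assumed balancing weight."""
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    l_cls = -(p_star * np.log(p) + (1 - p_star) * np.log(1 - p))
    l_reg = smooth_l1(t - t_star).sum(axis=1)
    n_cls = len(p)
    n_reg = max(p_star.sum(), 1)  # assumption: normalise by number of positives
    # Regression only contributes for positive anchors (multiplied by p_star).
    return l_cls.sum() / n_cls + lam * (p_star * l_reg).sum() / n_reg

p_star = np.array([1.0, 0.0])
t_star = np.zeros((2, 4))
perfect = rpn_loss(np.array([1.0, 0.0]), p_star, t_star, t_star)
wrong = rpn_loss(np.array([0.1, 0.9]), p_star, t_star + 2.0, t_star)
```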
24. Online hard example mining
• Class imbalance hurts training.
• With far more background anchors than objects, the model mostly
learns background space rather than how to detect objects.
• Sort anchors by their calculated loss, apply NMS.
• Pick the top ones such that the ratio between the
picked negatives and positives is at most 3:1.
• Faster R-CNN selects 256 anchors: 128 positive,
128 negative
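The selection rule above (keep the highest-loss negatives, at most 3 per positive) might look like the following sketch. It skips the NMS step mentioned on the slide, and the function and argument names are hypothetical.

```python
import numpy as np

def hard_negative_mining(losses, is_positive, neg_pos_ratio=3):
    """losses: (N,) per-anchor loss; is_positive: (N,) boolean mask."""
    pos_idx = np.flatnonzero(is_positive)
    neg_idx = np.flatnonzero(~is_positive)
    num_neg = min(len(neg_idx), neg_pos_ratio * len(pos_idx))
    # Sort negatives by loss, highest first, and keep only the hardest ones.
    hardest = neg_idx[np.argsort(-losses[neg_idx])[:num_neg]]
    return pos_idx, hardest

losses = np.array([0.1, 0.9, 0.2, 0.8, 0.05, 0.7, 2.0])
is_positive = np.array([False, False, False, False, False, False, True])
pos_idx, neg_idx = hard_negative_mining(losses, is_positive)
```

With one positive anchor, only the three negatives with the largest loss take part in the update.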
29. NMS: non max suppression (refresher)
[Figure: initial predicted boxes vs. boxes remaining after suppression by IoU]
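As a refresher in code, here is a minimal greedy NMS sketch in NumPy. Boxes are assumed to be in (x0, y0, x1, y1) format, and the 0.5 default threshold is only an example value.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box (4,) and an array of boxes (M, 4)."""
    x0 = np.maximum(box[0], boxes[:, 0])
    y0 = np.maximum(box[1], boxes[:, 1])
    x1 = np.minimum(box[2], boxes[:, 2])
    y1 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    order = np.argsort(-scores)  # highest score first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Drop every remaining box that overlaps the kept box too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
```

The second box overlaps the first with IoU 81 / 119 ≈ 0.68, so it is suppressed; the third box does not overlap and survives.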
30. Why do one-stage detectors trail in accuracy?
Two-stage:
• The proposal stage rapidly narrows down the number of candidate
object locations to a small number (e.g., 1-2k), filtering out most
background samples.
• The classification stage fixes the foreground-to-background ratio at
1:3, or uses online hard example mining (OHEM).
One-stage:
• Has to process a much larger set of candidate object locations
regularly sampled across an image, which amounts to
enumerating ~100k locations that densely cover spatial positions,
scales, and aspect ratios.
• Encounters extreme foreground-background class imbalance.
31. Activation maps
How about predicting from multiple maps?
As the image passes deeper into the network, resolution decreases and
semantic value increases.
32. Feature pyramid networks (FPN)
• Improve the predictive power of lower-level feature maps by adding
contextual information from higher-level feature maps
• Top-down + lateral connections
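A single top-down step can be sketched in NumPy as "upsample the coarser map, then add the lateral map". This sketch assumes both maps already have the same number of channels (in FPN a 1×1 convolution on the lateral path takes care of that) and uses nearest-neighbour upsampling; the function names are hypothetical.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour upsampling of an (H, W, C) map to (2H, 2W, C).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def top_down_merge(coarse, lateral):
    # Crop in case the finer map has an odd size (e.g. 75 vs. 2 * 38 = 76).
    up = upsample2x(coarse)[: lateral.shape[0], : lateral.shape[1]]
    return up + lateral

c5 = np.ones((2, 2, 4))        # coarse, semantically strong map
c4 = 2 * np.ones((4, 4, 4))    # finer map from the lateral connection
p4 = top_down_merge(c5, c4)
```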
36. Anchors - Example
• Anchor width = (size × scale) / sqrt(ratio), height = (size × scale) × sqrt(ratio), with ratio = height / width
• E.g. for anchor size 32 and scales 1, 2^(1/3), 2^(2/3) (coordinates truncated to integers):
• Ratio 0.5: [-22 -11 22 11] 44×22, [-28 -14 28 14] 56×28, [-35 -17 35 17] 70×34
• Ratio 1: [-16 -16 16 16] 32×32, [-20 -20 20 20] 40×40, [-25 -25 25 25] 50×50
• Ratio 2: [-11 -22 11 22] 22×44, [-14 -28 14 28] 28×56, [-17 -35 17 35] 34×70
For an 800×600 input image:
• P3 activation map shape: 100×75
• Stride: 8
• Total (A) = 9 anchors per pixel location
• Total anchors at P3 level = 100 × 75 × 9
= 67,500
• Similarly, summing over all pyramid levels
P3, P4, P5, P6, P7 gives a total of 90,360 anchors per
image
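The anchor table and the anchor count can be reproduced with a short sketch. The scales 1, 2^(1/3), 2^(2/3), the strides 8 to 128 for P3 to P7, and rounding feature-map sizes up are assumptions chosen because they match the numbers on the slide; the function names are hypothetical.

```python
import math
import numpy as np

def generate_anchors(base_size=32, ratios=(0.5, 1.0, 2.0),
                     scales=(2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3))):
    anchors = []
    for ratio in ratios:            # ratio = height / width
        for scale in scales:
            side = base_size * scale
            w = side / math.sqrt(ratio)
            h = side * math.sqrt(ratio)
            # Box centred at (0, 0) as [x0, y0, x1, y1].
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

def total_anchors(width, height, strides=(8, 16, 32, 64, 128), per_location=9):
    # One set of anchors per feature-map cell, summed over pyramid levels.
    return sum(math.ceil(width / s) * math.ceil(height / s) * per_location
               for s in strides)

anchors = generate_anchors()
p3_count = total_anchors(800, 600, strides=(8,))
all_count = total_anchors(800, 600)
```

Truncating `anchors` to integers gives the coordinates listed above, and the per-level feature maps come out as 100×75, 50×38, 25×19, 13×10, and 7×5.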
37. Shift anchors
Shift anchors from the activation map onto the input image.
Anchors are applied with respect to the input image!
• Shift the anchor centered at (0,0) on the P3 (stride 8)
activation map by [ 4. 4. 4. 4.]
• Example: the anchor with corners (-22,-11) and (22,11) moves to
(-18,-7) and (26,15), centered at (4,4)
• Next shifts: [ 12. 4. 12. 4.], [ 20. 4. 20. 4.], …, giving centers
(4,4), (12,4), … spaced 8 pixels apart on the input image
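The shifting step can be sketched with NumPy broadcasting. It assumes each feature-map cell maps to the centre of its stride-sized patch on the input image, i.e. (index + 0.5) × stride, which reproduces the shifts [4, 4, 4, 4], [12, 4, 12, 4], … listed above. The function name is hypothetical.

```python
import numpy as np

def shift_anchors(anchors, fmap_w, fmap_h, stride):
    """anchors: (A, 4) boxes centred at (0, 0); returns (fmap_h * fmap_w * A, 4)."""
    cx = (np.arange(fmap_w) + 0.5) * stride
    cy = (np.arange(fmap_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)  # row-major: x varies fastest
    shifts = np.stack([cx.ravel(), cy.ravel(), cx.ravel(), cy.ravel()], axis=1)
    # Add every shift to every base anchor via broadcasting.
    return (shifts[:, None, :] + anchors[None, :, :]).reshape(-1, 4)

base = np.array([[-22.0, -11.0, 22.0, 11.0]])
shifted = shift_anchors(base, fmap_w=100, fmap_h=75, stride=8)
```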
38. Cross Entropy loss
Even examples that are easily classified (pt >
0.5) incur a loss with non-trivial magnitude.
Summed over a large number of easy
examples, these small loss values can
overwhelm the rare class.
39. Balanced cross entropy loss
• Weight α for foreground, 1 − α for background
• α is a hyperparameter
• While α balances the importance
of positive/negative examples, it
does not differentiate between
easy/hard examples!
40. Example
• The loss from easy examples = 100000 × 0.1 = 10000
• The loss from hard examples = 100 × 2.3 = 230
• 10000 / 230 ≈ 43, so the total loss from easy examples is
about 40× bigger.
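The per-example values 0.1 and 2.3 are consistent with cross entropy at pt = 0.9 and pt = 0.1. Those two probabilities are an assumption made here (the slide does not state them); the sketch below only redoes the arithmetic without rounding.

```python
import math

def cross_entropy(p_t):
    # CE for the true class given its predicted probability p_t.
    return -math.log(p_t)

easy_loss = cross_entropy(0.9)   # about 0.105, shown as 0.1 on the slide
hard_loss = cross_entropy(0.1)   # about 2.303, shown as 2.3 on the slide
total_easy = 100_000 * easy_loss
total_hard = 100 * hard_loss
ratio = total_easy / total_hard
```

Without rounding the ratio is slightly higher than 43, but the conclusion is the same: the many easy examples dominate the total loss.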
41. Focal loss!
• When an example is misclassified and pt is small, the modulating
factor is near 1 and the loss is unaffected.
• As pt → 1, the factor goes to 0 and the loss for well-classified
examples is down-weighted.
• With γ = 2, an example classified with pt = 0.9 would have 100×
lower loss compared to CE, and with pt ≈ 0.968 it would have
1000× lower loss. This in turn increases the importance of
correcting misclassified examples!
• Every sample is weighted
according to its error!
• Modulating factor (1 − pt)^γ added to the cross entropy loss
• Focusing parameter γ smoothly
adjusts the rate at which easy
examples are down-weighted
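A minimal NumPy sketch of focal loss for binary labels, useful for checking the 100× and 1000× figures above. The default α = 0.25 is an assumed value that does not come from the slides; γ = 2 is the value used above.

```python
import numpy as np

def modulating_factor(p_t, gamma=2.0):
    # Near 1 for misclassified examples, near 0 for well-classified ones.
    return (1.0 - p_t) ** gamma

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """p: predicted foreground probability; y: 1 for foreground, 0 for background."""
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # assumed class-balancing weight
    return -alpha_t * modulating_factor(p_t, gamma) * np.log(p_t)

reduction_at_090 = 1.0 / modulating_factor(0.9)    # roughly 100x lower than CE
reduction_at_0968 = 1.0 / modulating_factor(0.968)  # roughly 1000x lower than CE
easy = focal_loss(np.array([0.9]), np.array([1]))
hard = focal_loss(np.array([0.1]), np.array([1]))
```

With γ = 0 the modulating factor is 1 and the expression reduces to α-balanced cross entropy.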
45. Prediction pipeline
• Predict regression deltas relative to the anchor boxes
• Filter anchors by a 0.05 score threshold
• Keep the top 1000 boxes per pyramid level, then merge all levels
• Apply NMS at an IoU threshold of 0.5
• Keep up to 300 final boxes and display them to the user
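The filtering steps of this pipeline can be sketched as follows. The sketch assumes the boxes have already been decoded from deltas into (x0, y0, x1, y1) coordinates and treats all detections as one class, whereas a real detector would usually run NMS per class. All function names are hypothetical.

```python
import numpy as np

def _iou(box, boxes):
    x0 = np.maximum(box[0], boxes[:, 0]); y0 = np.maximum(box[1], boxes[:, 1])
    x1 = np.minimum(box[2], boxes[:, 2]); y1 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def _nms(boxes, scores, iou_threshold):
    order, keep = np.argsort(-scores), []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        order = rest[_iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

def postprocess(level_boxes, level_scores, score_threshold=0.05, top_k=1000,
                iou_threshold=0.5, max_detections=300):
    boxes, scores = [], []
    for b, s in zip(level_boxes, level_scores):
        keep = np.flatnonzero(s > score_threshold)   # drop low-confidence anchors
        keep = keep[np.argsort(-s[keep])[:top_k]]    # top-k per pyramid level
        boxes.append(b[keep]); scores.append(s[keep])
    boxes, scores = np.concatenate(boxes), np.concatenate(scores)  # merge levels
    keep = np.array(_nms(boxes, scores, iou_threshold)[:max_detections], dtype=int)
    return boxes[keep], scores[keep]

lvl_boxes = [np.array([[0, 0, 10, 10], [1, 1, 11, 11]], dtype=float),
             np.array([[20, 20, 30, 30], [50, 50, 60, 60]], dtype=float)]
lvl_scores = [np.array([0.9, 0.8]), np.array([0.7, 0.01])]
final_boxes, final_scores = postprocess(lvl_boxes, lvl_scores)
```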
At each sliding-window location, we simultaneously predict multiple region proposals, where the maximum number of possible proposals for each location is denoted as k. So the reg layer has 4k outputs encoding the coordinates of k boxes, and the cls layer outputs 2k scores that estimate the probability of object or not object for each proposal.
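As a quick sanity check of the 4k and 2k output sizes, the sketch below computes the shapes of the two RPN heads for a given feature map. The helper name and the 38×50 feature-map size are illustrative assumptions; k = 9 corresponds to 3 scales × 3 aspect ratios.

```python
def rpn_head_output_shapes(fmap_h, fmap_w, k=9):
    return {
        "reg": (fmap_h, fmap_w, 4 * k),   # 4 box coordinates per anchor
        "cls": (fmap_h, fmap_w, 2 * k),   # object / not-object score per anchor
        "num_anchors": fmap_h * fmap_w * k,
    }

shapes = rpn_head_output_shapes(38, 50)
```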