This document summarizes a tutorial on object detection beyond RetinaNet and Mask R-CNN. It discusses challenges in object detection including the backbone network, detection head, pretraining, handling scale variations, large batch sizes, detecting objects in crowds, and neural architecture search. It also introduces recent works that aim to address these challenges, such as DetNet, Light Head R-CNN, Objects365 pretraining, SFace for scale, MegDet for batch size, CrowdHuman benchmark for crowds, and NAS approaches. The document concludes that further improving object detection requires focusing on details and that continued progress will significantly benefit computer vision applications.
Intro to selective search for object proposals, rcnn family and retinanet state of the art model deep dives for object detection along with MAP concept for evaluating model and how does anchor boxes make the model learn where to draw bounding boxes
Mask R-CNN present a conceptually simple, flexible, and general framework for object instance segmentation. This approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without tricks, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition.
presentation: https://www.youtube.com/watch?v=FZePQKPEwoo (한국어)
reference: He, Kaiming, et al. "Mask r-cnn." arXiv preprint arXiv:1703.06870 (2017).
Intro to selective search for object proposals, rcnn family and retinanet state of the art model deep dives for object detection along with MAP concept for evaluating model and how does anchor boxes make the model learn where to draw bounding boxes
Mask R-CNN present a conceptually simple, flexible, and general framework for object instance segmentation. This approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without tricks, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition.
presentation: https://www.youtube.com/watch?v=FZePQKPEwoo (한국어)
reference: He, Kaiming, et al. "Mask r-cnn." arXiv preprint arXiv:1703.06870 (2017).
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...changedaeoh
computer vision 분야에서 dominant 한 Convolutional Layer를 일절 사용하지 않고, NLP에서 제안된 순수 Transformer의 architecture를 그대로 가져와 Attention과 일반 Feed Forward NN만을 이용하여 SOTA수준의 Image Classification Model을 구축한다.
TAVE research seminar 21.03.30 발표자료
발표자: 오창대
Slides by Amaia Salvador at the UPC Computer Vision Reading Group.
Source document on GDocs with clickable links:
https://docs.google.com/presentation/d/1jDTyKTNfZBfMl8OHANZJaYxsXTqGCHMVeMeBe5o1EL0/edit?usp=sharing
Based on the original work:
Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In Advances in Neural Information Processing Systems, pp. 91-99. 2015.
Slides by Míriam Bellver at the UPC Reading group for the paper:
Liu, Wei, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, and Scott Reed. "SSD: Single Shot MultiBox Detector." ECCV 2016.
Full listing of papers at:
https://github.com/imatge-upc/readcv/blob/master/README.md
cvpaper.challenge の Meta Study Group 発表スライド
cvpaper.challenge はコンピュータビジョン分野の今を映し、トレンドを創り出す挑戦です。論文サマリ・アイディア考案・議論・実装・論文投稿に取り組み、凡ゆる知識を共有します。2019の目標「トップ会議30+本投稿」「2回以上のトップ会議網羅的サーベイ」
http://xpaperchallenge.org/cv/
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...changedaeoh
computer vision 분야에서 dominant 한 Convolutional Layer를 일절 사용하지 않고, NLP에서 제안된 순수 Transformer의 architecture를 그대로 가져와 Attention과 일반 Feed Forward NN만을 이용하여 SOTA수준의 Image Classification Model을 구축한다.
TAVE research seminar 21.03.30 발표자료
발표자: 오창대
Slides by Amaia Salvador at the UPC Computer Vision Reading Group.
Source document on GDocs with clickable links:
https://docs.google.com/presentation/d/1jDTyKTNfZBfMl8OHANZJaYxsXTqGCHMVeMeBe5o1EL0/edit?usp=sharing
Based on the original work:
Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In Advances in Neural Information Processing Systems, pp. 91-99. 2015.
Slides by Míriam Bellver at the UPC Reading group for the paper:
Liu, Wei, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, and Scott Reed. "SSD: Single Shot MultiBox Detector." ECCV 2016.
Full listing of papers at:
https://github.com/imatge-upc/readcv/blob/master/README.md
cvpaper.challenge の Meta Study Group 発表スライド
cvpaper.challenge はコンピュータビジョン分野の今を映し、トレンドを創り出す挑戦です。論文サマリ・アイディア考案・議論・実装・論文投稿に取り組み、凡ゆる知識を共有します。2019の目標「トップ会議30+本投稿」「2回以上のトップ会議網羅的サーベイ」
http://xpaperchallenge.org/cv/
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/08/understanding-dnn-based-object-detectors-a-presentation-from-au-zone-technologies/
Azhar Quddus, Senior Computer Vision Engineer at Au-Zone Technologies, presents the “Understanding DNN-Based Object Detectors” tutorial at the May 2022 Embedded Vision Summit.
Unlike image classifiers, which merely report on the most important objects within or attributes of an image, object detectors determine where objects of interest are located within an image. Consequently, object detectors are central to many computer vision applications including (but not limited to) autonomous vehicles and virtual reality.
In this presentation, Quddus provides a technical introduction to deep-neural-network-based object detectors. He explains how these algorithms work, and how they have evolved in recent years, utilizing examples of popular object detectors. Quddus examines some of the trade-offs to consider when selecting an object detector for an application, and touches on accuracy measurement. He also discusses performance comparison among the models discussed in this presentation.
Yinyin Liu presents a model for object detection and localization, called Fast-RCNN. She will show how to introduce a ROI pooling layer into neon, and how to add the PASCAL VOC dataset to interface with model training and inference. Lastly, Yinyin will run through a demo on how to apply the trained model to detect new objects.
Object extraction from satellite imagery using deep learningAly Abdelkareem
Presentation for extract objects from satellite imagery using deep learning techniques. you find a comparison between state-of-art approaches in computer vision.
Recent Progress on Object Detection_20170331Jihong Kang
This slide provides a brief summary of recent progress on object detection using deep learning.
The concept of selected previous works(R-CNN series/YOLO/SSD) and 6 recent papers (uploaded to the Arxiv between Dec/2016 and Mar/2017) are introduced in this slide.
Most papers are focusing on improving the performance of small object detection.
Object detection is a central problem in computer vision and underpins many applications from medical image analysis to autonomous driving. In this talk, we will review the basics of object detection from fundamental concepts to practical techniques. Then, we will dive into cutting-edge methods that use transformers to drastically simplify the object detection pipeline while maintaining predictive performance. Finally, we will show how to train these models at scale using Determined’s integrated deep learning platform and then serve the models using MLflow.
What you will learn:
Basics of object detection including main concepts and techniques
Main ideas from the DETR and Deformable DETR approaches to object detection
Overview of the core capabilities of Determined’s deep learning platform, with a focus on its support for effortless distributed training
How to serve models trained in Determined using MLflow
Invited talk at USTC and SJTU, discuss recent progress in object re-identification against very large repository, especially the problem of fast key point detection, feature repeatability prediction, aggregation, and object repository indexing and search.
DEEP NEURAL NETWORKS APPLIED TO LOW POWER ONBOARD IMAGE COMPRESSION
Over the past decade, rapid developments in digital technologies and access to space have enabled unprecedented capabilities of monitoring our planet and, more generally, our Universe.
This new space race is pushing for a paradigm shift in order to respond to the ever-increasing challenge of delivering the useful information to the end users. With huge number of satellites, greater spatial and spectral resolutions, higher temporal cadence and shrinking spectrum resources, on-board data reduction becomes not only a cost saving solution but, in many cases also, a key enabling technology to achieve viable missions.
https://atpi.eventsair.com/obpdc2022/
How static analysis supports quality over 50 million lines of C++ codecppfrug
Retour d’expérience sur l'utilisation d'outils d'analyse statique pour maintenir des activités Q&A dans le développement de logiciel scientifique de grande échelle au sein du CERN.
NERSC is the production high-performance computing (HPC) center for the United States Department of Energy (DOE) Office of Science. The center supports over 6,000 users in 600 projects, using a variety of applications in materials science, chemistry, biology, astrophysics, high energy physics, climate science, fusion science, and more.
NERSC deployed the Cori system on over 9,000 Intel® Xeon Phi™ processors. This session describes the optimization strategy for porting codes that target traditional manycore architectures to the processors. We also discuss highlights and lessons learned from the optimization process on 20 applications associated with the NERSC Exascale Science Application Program (NESAP).
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
ICME2019 Tutorial: Intelligent Image Enhancement and Restoration - From Prior Driven Model to Advanced Deep Learning Part 4: retinex model based low light enhancement
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
ICME2019 Tutorial: Intelligent Image Enhancement and Restoration - From Prior Driven Model to Advanced Deep Learning Part 3: prior embedding deep super resolution
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
ICME2019 Tutorial: Intelligent Image Enhancement and Restoration - From Prior Driven Model to Advanced Deep Learning Part 2: text centric image style transfer
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
ICME2019 Tutorial: Intelligent Image Enhancement and Restoration - From Prior Driven Model to Advanced Deep Learning Part 1: prior embedding deep rain removal
# Internet Security: Safeguarding Your Digital World
In the contemporary digital age, the internet is a cornerstone of our daily lives. It connects us to vast amounts of information, provides platforms for communication, enables commerce, and offers endless entertainment. However, with these conveniences come significant security challenges. Internet security is essential to protect our digital identities, sensitive data, and overall online experience. This comprehensive guide explores the multifaceted world of internet security, providing insights into its importance, common threats, and effective strategies to safeguard your digital world.
## Understanding Internet Security
Internet security encompasses the measures and protocols used to protect information, devices, and networks from unauthorized access, attacks, and damage. It involves a wide range of practices designed to safeguard data confidentiality, integrity, and availability. Effective internet security is crucial for individuals, businesses, and governments alike, as cyber threats continue to evolve in complexity and scale.
### Key Components of Internet Security
1. **Confidentiality**: Ensuring that information is accessible only to those authorized to access it.
2. **Integrity**: Protecting information from being altered or tampered with by unauthorized parties.
3. **Availability**: Ensuring that authorized users have reliable access to information and resources when needed.
## Common Internet Security Threats
Cyber threats are numerous and constantly evolving. Understanding these threats is the first step in protecting against them. Some of the most common internet security threats include:
### Malware
Malware, or malicious software, is designed to harm, exploit, or otherwise compromise a device, network, or service. Common types of malware include:
- **Viruses**: Programs that attach themselves to legitimate software and replicate, spreading to other programs and files.
- **Worms**: Standalone malware that replicates itself to spread to other computers.
- **Trojan Horses**: Malicious software disguised as legitimate software.
- **Ransomware**: Malware that encrypts a user's files and demands a ransom for the decryption key.
- **Spyware**: Software that secretly monitors and collects user information.
### Phishing
Phishing is a social engineering attack that aims to steal sensitive information such as usernames, passwords, and credit card details. Attackers often masquerade as trusted entities in email or other communication channels, tricking victims into providing their information.
### Man-in-the-Middle (MitM) Attacks
MitM attacks occur when an attacker intercepts and potentially alters communication between two parties without their knowledge. This can lead to the unauthorized acquisition of sensitive information.
### Denial-of-Service (DoS) and Distributed Denial-of-Service (DDoS) Attacks
This 7-second Brain Wave Ritual Attracts Money To You.!nirahealhty
Discover the power of a simple 7-second brain wave ritual that can attract wealth and abundance into your life. By tapping into specific brain frequencies, this technique helps you manifest financial success effortlessly. Ready to transform your financial future? Try this powerful ritual and start attracting money today!
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC
Ellisha Heppner, Grant Management Lead, presented an update on APNIC Foundation to the PNG DNS Forum held from 6 to 10 May, 2024 in Port Moresby, Papua New Guinea.
1.Wireless Communication System_Wireless communication is a broad term that i...JeyaPerumal1
Wireless communication involves the transmission of information over a distance without the help of wires, cables or any other forms of electrical conductors.
Wireless communication is a broad term that incorporates all procedures and forms of connecting and communicating between two or more devices using a wireless signal through wireless communication technologies and devices.
Features of Wireless Communication
The evolution of wireless technology has brought many advancements with its effective features.
The transmitted distance can be anywhere between a few meters (for example, a television's remote control) and thousands of kilometers (for example, radio communication).
Wireless communication can be used for cellular telephony, wireless access to the internet, wireless home networking, and so on.
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesSanjeev Rampal
Talk presented at Kubernetes Community Day, New York, May 2024.
Technical summary of Multi-Cluster Kubernetes Networking architectures with focus on 4 key topics.
1) Key patterns for Multi-cluster architectures
2) Architectural comparison of several OSS/ CNCF projects to address these patterns
3) Evolution trends for the APIs of these projects
4) Some design recommendations & guidelines for adopting/ deploying these solutions.
9. How to perform a detection?
• Sliding window: enumerate all the windows (up to millions of windows)
• VJ detector: cascade chain
• Fully Convolutional network
• shared computation
Robust Real-time Object Detection; Viola, Jones; IJCV 2001
http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf
10. General Detection Before Deep Learning
• Feature + classifier
• Feature
• Haar Feature
• HOG (Histogram of Gradient)
• LBP (Local Binary Pattern)
• ACF (Aggregated Channel Feature)
• …
• Classifier
• SVM
• Bootsing
• Random Forest
13. General Detection Before Deep Learning
Traditional Methods
• Pros
• Efficient to compute (e.g., HAAR, ACF) on CPU
• Easy to debug, analyze the bad cases
• reasonable performance on limited training data
• Cons
• Limited performance on large dataset
• Hard to be accelerated by GPU
14. Deep Learning for Object Detection
Based on the whether following the “proposal and refine”
• One Stage
• Example: Densebox, YOLO (YOLO v2), SSD, Retina Net
• Keyword: Anchor, Divide and conquer, loss sampling
• Two Stage
• Example: RCNN (Fast RCNN, Faster RCNN), RFCN, FPN,
MaskRCNN
• Keyword: speed, performance
15. A bit of History
Image
Feature
Extractor
classification
localization
(bbox)
One stage detector
Densebox (2015) UnitBox (2016) EAST (2017)
YOLO (2015) Anchor Free
Anchor importedYOLOv2 (2016)
SSD (2015)
RON(2017)
RetinaNet(2017)
DSSD (2017)
two stages detector
Image
Feature
Extractor
classification
localization
(bbox)
Proposal
classification
localization
(bbox)
Refine
RCNN (2014) Fast RCNN(2015)
Faster RCNN (2015)
RFCN (2016)
MultiBox(2014)
RFCN++ (2017)
FPN (2017)
Mask RCNN (2017)
OverFeat(2013)
16. Outline
• Introduction to Object Detection
• Modern Object detectors
• One Stage detector vs Two-stage detector
• Challenges
• Backbone
• Head
• Pretraining
• Scale
• Batch Size
• Crowd
• NAS
• Fine-Grained
• Conclusion
17. Modern Object detectors
Backbone Head
• Modern object detectors
• RetinaNet
• f1-f7 for backbone, f3-f7 with 4 convs for head
• FPN with ROIAlign
• f1-f6 for backbone, two fcs for head
• Recall vs localization
• One stage detector: Recall is high but compromising the localization ability
• Two stage detector: Strong localization ability
Postprocess
NMS
18. One Stage detector: RetinaNet
• FPN Structure
• Focal loss
Focal Loss for Dense Object Detection, Lin etc, ICCV 2017 Best student paper
19. One Stage detector: RetinaNet
• FPN Structure
• Focal loss
Focal Loss for Dense Object Detection, Lin etc, ICCV 2017 Best student paper
21. What is next for object detection?
• The pipeline seems to be mature
• There still exists a large gap between existing state-of-arts and product
requirements
• The devil is in the detail
22. Outline
• Introduction to Object Detection
• Modern Object detectors
• One Stage detector vs Two-stage detector
• Challenges
• Backbone
• Head
• Pretraining
• Scale
• Batch Size
• Crowd
• NAS
• Fine-Grained
• Conclusion
23. Challenges Overview
• Backbone
• Head
• Pretraining
• Scale
• Batch Size
• Crowd
• NAS
• Fine-grained
Backbone Head Postprocess
NMS
24. Challenges - Backbone
• Backbone network is designed for classification task but not for
localization task
• Receptive Field vs Spatial resolution
• Only f1-f5 is pretrained but randomly initializing f6 and f7 (if applicable)
25. Backbone - DetNet
• DetNet: A Backbone network for Object Detection, Li etc, 2018,
https://arxiv.org/pdf/1804.06215.pdf
31. Challenges - Head
• Speed is significantly improved for the two-stage detector
• RCNN - > Fast RCNN -> Faster RCNN - > RFCN
• How to obtain efficient speed as one stage detector like YOLO, SSD?
• Small Backbone
• Light Head
32. Head – Light head RCNN
• Light-Head R-CNN: In Defense of Two-Stage Object Detector, 2017,
https://arxiv.org/pdf/1711.07264.pdf
Code: https://github.com/zengarden/light_head_rcnn
33. Head – Light head RCNN
• Backbone
• L: Resnet101
• S: Xception145
• Thin Feature map
• L:C_{mid} = 256
• S: C_{mid} =64
• C_{out} = 10 * 7 * 7
• R-CNN subnet
• A fc layer is connected to the PS ROI pool/Align
36. Head – Light head RCNN
• Mobile Version
• ThunderNet: Towards Real-time Generic Object Detection, Qin etc, Arxiv
2019
• https://arxiv.org/abs/1903.11752
37. Pretraining – Objects365
• ImageNet pretraining is usually employed for backbone training
• Training from Scratch
• Scratch Det claims GN/BN is important
• Rethinking ImageNet Pretraining validates that training time is important
42. Pretraining – Objects365
• Summary
• Pretraining is important to reduce the training time
• Pretraining with a large dataset is beneficial for the performance
44. Challenges - Scale
• Scale variations is extremely large for object detection
• Previous works
• Divide and Conquer: SSD, DSSD, RON, FPN, …
• Limited Scale variation
• Scale Normalization for Image Pyramids, Singh etc, CVPR2018
• Slow inference speed
• How to address extremely large scale variation without compromising
inference speed?
45. Scale - SFace
• SFace: An Efficient Network for Face Detection in Large Scale Variations,
2018, http://cn.arxiv.org/pdf/1804.06559.pdf
• Anchor-based:
• Good localization for the scales which are covered by anchors
• Difficult to address all the scale ranges of faces
• Anchor-free:
• Able to cover various face scales
• Not good for the localization ability
49. Scale - SFace
• Summary:
• Integrate anchor-based and anchor-free for the scale issue
• A new benchmark for face detection with large scale variations: 4K Face
50. Challenges - Batchsize
• Small mini-batchsize for general object detection
• 2 for R-CNN, Faster RCNN
• 16 for RetinaNet, Mask RCNN
• Problem with small mini-batchsize
• Long training time
• Insufficient BN statistics
• Inbalanced pos/neg ratio
51. Batchsize – MegDet
• MegDet: A Large Mini-Batch Object Detector, CVPR2018,
https://arxiv.org/pdf/1711.07240.pdf
53. Challenges - Crowd
• NMS is a post-processing step to eliminate multiple responses on one object
instance
• Reasonable for mild crowdness like COCO and VOC
• Will Fail in the case when the objects are in a crowd
54. Challenges - Crowd
• A few works have been devoted to this topic
• Softnms, Bodla etc, ICCV 2017, http://www.cs.umd.edu/~bharat/snms.pdf
• Relation Networks, Hu etc, CVPR 2018,
https://arxiv.org/pdf/1711.11575.pdf
• Lacking a good benchmark for evaluation in the literature
55. Crowd - CrowdHuman
• CrowdHuman: A Benchmark for Detecting Human in a Crowd, 2018,
https://arxiv.org/pdf/1805.00123.pdf, http://www.crowdhuman.org/
• A benchmark with Head, Visible Human, Full body bounding-box
• Generalization ability for other head/pedestrian datasets
• Crowdness
59. Conclusion
• The task of object detection is still far from solved
• Details are important to further improve the performance
• Backbone
• Head
• Pretraining
• Scale
• Batchsize
• Crowd
• The improvement of object detection will be a significantly boost for the
computer vision industry