This document summarizes the SSD object detection model. SSD is a single-shot detector that predicts bounding boxes and class probabilities from multiple feature maps extracted from a base network. It improves speed over two-stage detectors like Faster R-CNN by performing detection in a single stage, without region proposals, using default bounding boxes of different scales and aspect ratios on multiple feature maps. The document covers SSD's model architecture, training procedure, and experimental results, showing that SSD achieves real-time speeds with accuracy competitive with other detectors.
Slides by Míriam Bellver at the UPC Reading group for the paper:
Liu, Wei, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, and Scott Reed. "SSD: Single Shot MultiBox Detector." ECCV 2016.
Full listing of papers at:
https://github.com/imatge-upc/readcv/blob/master/README.md
6. SSD : Introduction
The state of the art is Faster R-CNN (a two-stage detector)
- Hypothesizes bounding boxes, resamples pixels or features for each box, and classifies each box
Too computationally intensive for embedded systems
- Even Faster R-CNN only reaches 7 FPS
Significantly increased speed has come at a cost
- YOLO is faster but less accurate
- Faster R-CNN: 7 FPS with mAP 73.2% vs. YOLO: 45 FPS with mAP 63.4%
SSD is the first deep-network-based object detector that
- does not resample pixels or features for bounding box hypotheses
- is as accurate as approaches that do
The goal: catch both rabbits at once (speed and accuracy)!
7. SSD : Single Shot Detector
- Uses multiple default boxes and runs prediction on multiple feature maps
- High-level features are well abstracted, so they detect large objects well
- Low-level features carry more precise location information
The intuition:
instead of detecting only on the last feature map, detect on the early, middle, and last feature maps.
9. SSD : Model
Multi-scale feature maps for detection
- Detection is performed on several different feature maps
- Lower layers localize objects more precisely, while higher layers are better abstracted, so the two are combined.
Convolutional predictors for detection
The SSD approach is based on a feed-forward convolutional network that produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes, followed by a non-maximum suppression step to produce the final detections.
- Detection uses 3×3×P convolutional filters
- Each filter outputs either a score for a category (1 value) or a shape offset relative to the default box coordinates (4 values)
Default boxes and aspect ratios
- Our default boxes are similar to the anchor boxes used in Faster R-CNN
- As in Faster R-CNN, the default boxes serve as the initial guess and the network learns the offsets Δx, Δy, Δw, Δh
10. SSD : Model
Convolutional predictors for detection, in more detail
- Classifier: Conv 3×3×(k×(Classes+4)), with k default boxes per location
- Structure of the first box: 4 offsets (dx, dy, dh, dw) + 20 class scores (PASCAL VOC's 20 classes) + 1 background score,
  and likewise for the 2nd, 3rd, ... up to the 6th box
- Output channels: 150 = 6 × (21 + 4)
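A minimal PyTorch sketch of one such predictor head (the 512 input channels and the 38×38 feature map are assumed values for illustration):

```python
import torch
import torch.nn as nn

# With 6 default boxes per location and PASCAL VOC (20 classes + background),
# each location needs 6 * (21 + 4) = 150 output channels.
num_boxes, num_classes = 6, 21          # 20 VOC classes + background
in_channels = 512                       # hypothetical feature-map depth
predictor = nn.Conv2d(in_channels, num_boxes * (num_classes + 4),
                      kernel_size=3, padding=1)

feat = torch.randn(1, in_channels, 38, 38)       # e.g. a 38x38 feature map
out = predictor(feat)                            # (1, 150, 38, 38)
# Reshape to (batch, boxes, 25): 21 class scores + 4 offsets per default box
out = out.permute(0, 2, 3, 1).reshape(1, -1, num_classes + 4)
print(out.shape)                                 # torch.Size([1, 8664, 25])
```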
12. SSD : Training
Matching strategy
- Of the many default boxes, those that overlap a ground-truth (GT) box strongly become positives and the rest are treated as background; the criterion is IoU 0.5
- We then match default boxes to any ground truth with jaccard overlap higher than a threshold (0.5)
- Jaccard overlap is the same thing as IoU
The key difference between training SSD and training a typical detector that uses region proposals is that ground-truth information needs to be assigned to specific outputs in the fixed set of detector outputs. The same holds for YOLO and for the region proposal stage of Faster R-CNN.
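A minimal NumPy sketch of this matching rule, with boxes as (x1, y1, x2, y2); the best-overlap rule that guarantees every ground truth gets at least one default box is included, as in the paper:

```python
import numpy as np

def iou(box, boxes):
    # Jaccard overlap between one box and an array of boxes, all (x1, y1, x2, y2)
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def match(defaults, gts, threshold=0.5):
    # Label each default box with a ground-truth index; -1 means background
    labels = np.full(len(defaults), -1)
    for gi, gt in enumerate(gts):
        overlaps = iou(gt, defaults)
        labels[overlaps > threshold] = gi   # IoU > 0.5 -> positive
        labels[np.argmax(overlaps)] = gi    # best default always matches its GT
    return labels
```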
13. SSD : Training
Training objective
- Similar to Faster R-CNN's loss: L = (1/N)(L_conf + α·L_loc)
● L_conf : the confidence loss is the softmax loss over multiple class confidences
● L_loc : we regress to offsets for the center (cx, cy) of the default bounding box (d) and for its width (w) and height (h); i.e., the network learns how far the default box must be moved
The width and height offsets are in log space,
since their scale can grow large.
N : the number of matched default boxes
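A sketch of the offset encoding used by L_loc (the paper's variance/scaling constants are omitted for clarity):

```python
import numpy as np

# d = default box, g = matched ground truth, both as (cx, cy, w, h)
def encode(d, g):
    return np.array([(g[0] - d[0]) / d[2],  # center-x offset, relative to default width
                     (g[1] - d[1]) / d[3],  # center-y offset, relative to default height
                     np.log(g[2] / d[2]),   # width ratio in log space
                     np.log(g[3] / d[3])])  # height ratio in log space
```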
14. SSD : Training
- An image contains a cat and a dog (the cat is small, the dog is large)
- On the 8×8 (low-level) feature map, only the cat's boxes reach IoU ≥ 0.5 (the dog needs to be seen at a larger scale)
- On the 4×4 (high-level) feature map, only the dog's boxes reach IoU ≥ 0.5 (the cat is too small there)
- The region of the original image that one feature-map cell covers differs per feature map
Reading the first figure again in light of the matching algorithm and the loss makes this clear.
16. SSD : Training
- How the default boxes are generated
Choosing scales and aspect ratios for default boxes
● s_k = s_min + ((s_max − s_min)/(m − 1))·(k − 1)
● m : the number of feature maps boxes are drawn from
● s_min, s_max are constants (0.2 and 0.9)
● k indexes the chosen feature map
● Example for PASCAL VOC: s_k = 0.1 (conv4_3), then 0.2, 0.375, 0.55, 0.725, 0.9
- Once s_k is computed, choose the box aspect ratios
● a_r ∈ {1, 2, 3, 1/2, 1/3}
● width = s_k·√a_r, height = s_k/√a_r; a_r = 1 gives a square box, 2 a short-and-wide box, 1/2 a tall-and-narrow box
● This generates 5 boxes with different aspect ratios
● Whether 6 or 4 boxes are drawn, 1 extra square box is added using only the scale, s'_k = √(s_k·s_{k+1})
● The 4-box layers drop ratios 3 and 1/3, which leaves 4
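A small Python sketch of this scale/shape computation (the value of s_{m+1} used for the extra box at the last level is an assumption here; the paper only defines s'_k = √(s_k·s_{k+1})):

```python
import math

# s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1),  k = 1..m
s_min, s_max, m = 0.2, 0.9, 5

def scale(k):
    # s_{m+1} = 1.0 is an assumption for the extra box at the last level
    return 1.0 if k > m else s_min + (s_max - s_min) * (k - 1) / (m - 1)

for k in range(1, m + 1):
    s_k = scale(k)
    ratios = [1, 2, 3, 1/2, 1/3]                        # 4-box layers drop 3 and 1/3
    boxes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in ratios]
    boxes.append((math.sqrt(s_k * scale(k + 1)),) * 2)  # extra square box at s'_k
    print(f"k={k}: s_k={s_k:.3f},",
          [(round(w, 2), round(h, 2)) for w, h in boxes])
```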
17. SSD : Training
Hard negative mining (a sketch follows below)
- After the matching step, most of the default boxes are negatives, especially when the number of possible default boxes is large
- This is a problem common to all detectors: there are 8,732 default boxes, and if only the IoU ≥ 0.5 matches are kept as positives, almost all of the remaining samples are background
- Using the highest confidence loss for each default box,
- the ratio between the negatives and positives is kept at most 3:1
- That is, rank the negatives by confidence loss and keep only the hardest ones, up to 3× the number of positives
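A minimal PyTorch sketch of that 3:1 selection rule:

```python
import torch

def hard_negative_mining(conf_loss, positive_mask, neg_pos_ratio=3):
    # conf_loss:     (num_boxes,) per-default-box confidence loss
    # positive_mask: (num_boxes,) bool, True where a box matched a ground truth
    num_pos = int(positive_mask.sum())
    neg_loss = conf_loss.clone()
    neg_loss[positive_mask] = -float("inf")      # exclude positives from the ranking
    _, idx = neg_loss.sort(descending=True)      # hardest negatives first
    negative_mask = torch.zeros_like(positive_mask)
    negative_mask[idx[:neg_pos_ratio * num_pos]] = True
    return positive_mask | negative_mask         # boxes that contribute to L_conf
```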
Data augmentation
- Use the entire original input image.
- Sample a patch so that the minimum jaccard overlap with the objects is 0.1, 0.3, 0.5, 0.7, or 0.9.
- Randomly sample a patch.
- The sampled patch's aspect ratio is between 1/2 and 2.
- Horizontally flip with probability 0.5.
- Apply some photometric distortions.
18. SSD : Experimental Results
Base network
- VGG16
- We convert fc6 and fc7 to convolutional layers
- Subsample parameters from fc6 and fc7; change pool5 from 2×2 stride 2 to 3×3 stride 1
- We remove all the dropout layers and the fc8 layer
- We fine-tune the resulting model using SGD with initial learning rate 10^-3, 0.9 momentum, 0.0005 weight decay, and batch size 32
19. SSD : Experimental Results
- Both Fast and Faster R-CNN use input images whose minimum dimension is 600
- The two SSD models have exactly the same settings except for their input sizes (300×300 vs. 512×512)
20. SSD : Experimental Results
- Bounding-box size: XS=extra-small; S=small; M=medium; L=large; XL=extra-large. Aspect ratio: XT=extra-tall/narrow; T=tall; M=medium; W=wide; XW=extra-wide
- SSD does not detect small objects well
- Distorted aspect ratios, however, are handled fairly well
21. SSD : Experimental Results
Sensitivity and impact of different object characteristics
- The paper tries to address the small-object weakness with data augmentation, adding small objects to the training data via a zoom-out operation:
● we first randomly place an image on a canvas of 16× the original image size, filled with the mean values of the image
● before we do any random crop operation
● and then the image is pasted in
With this, small objects are found fairly well.
22. SSD : Experimental Results
Other reasons? The beginning of FPN
- Small objects are detected at the low layers.
- Low layers are not sufficiently abstracted, which makes detection hard.
- High layers are sufficiently abstracted, but small objects are hard to detect there (large objects are found well).
- So propagate the high layers' abstractions back down to the low layers, and then up again.
- This is where FPN starts; of that line of work we will look at RetinaNet.
24. RETINA : Introduction
The state of the art is two-stage detectors (Faster R-CNN, ...)
Could a simple one-stage detector achieve similar accuracy?
The problem is class imbalance (far too many negatives, i.e., background)
We propose a new loss function that acts as a more effective alternative to previous approaches for dealing with class imbalance
- Faster R-CNN heuristically reduces the number of candidate boxes through its RPN
- A single-stage detector proposes far more boxes, and most of them are background
- One stage: fast, simple
- Two stage: 10-40% better accuracy
- We propose the focal loss: cross entropy (CE) with a few extra terms
- A loss that makes easy samples even easier, so that training focuses on the hard samples
- YOLOv1 (98 boxes), YOLOv2 (~1k), OverFeat (~1-2k), SSD (~8-26k)
- The more default boxes, the better the performance tends to be
25. RETINA : Introduction
Cross entropy with imbalanced data
We propose a new loss function that acts as a more effective alternative to previous approaches for dealing with class imbalance
- We propose the focal loss: CE with a few extra terms
- A loss that makes easy samples even easier, so that training focuses on the hard samples
- With 100,000 easy and 100 hard examples,
- the total loss is roughly 40× bigger from the easy examples
- So CE is modified slightly
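A back-of-the-envelope illustration of that imbalance (the p_t values here are hypothetical; the slide only gives the ~40× figure):

```python
import math

ce = lambda p_t: -math.log(p_t)        # cross entropy of the true class
easy_total = 100_000 * ce(0.9)         # 100k well-classified examples, ~10,536
hard_total = 100 * ce(0.1)             # 100 hard examples, ~230
print(easy_total / hard_total)         # ~45: easy examples dominate the total loss
```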
27. RETINA : Focal Loss
Focal loss
- We introduce the focal loss starting from the cross entropy (CE) loss for binary classification: CE(p, y) = -log(p_t)
● y ∈ {±1} specifies the ground-truth class
● p ∈ [0, 1] is the model’s estimated probability for the class with label y = 1
● p_t = p if y = 1, and 1 − p otherwise
28. RETINA : Focal Loss
Balanced cross entropy
- α-balanced CE, CE(p_t) = -α_t·log(p_t), addresses imbalance by weighting, but does not separate easy from hard examples
Focal loss definition
- FL(p_t) = -(1 − p_t)^γ·log(p_t)
● For instance, with γ = 2, an example classified with p_t = 0.9 would have 100× lower loss compared with CE, and with p_t ≈ 0.968 about 1000× lower
- A loss that makes the easy samples even cheaper, so that training concentrates on the hard samples
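A minimal PyTorch sketch of the α-balanced focal loss with the paper's defaults α = 0.25, γ = 2:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # logits: raw predictions; targets: 1.0 for foreground, 0.0 for background
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)              # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).sum()         # FL = -a_t (1-p_t)^g log(p_t)
```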
29. RETINA : RetinaNet Detector
RetinaNet detector
- RetinaNet is a single, unified network composed of a backbone network and two task-specific subnetworks
- The backbone is responsible for computing a convolutional feature map over an entire input image
- The first subnet performs classification on the backbone's output; the second subnet performs convolutional bounding box regression
- We construct a pyramid with levels P3 through P7
- The spatial resolution is upsampled by a factor of 2 using nearest neighbor for simplicity (FPN), merged via 1×1 lateral convolutions
- The well-abstracted features are brought down to the lower layers so that small objects are also detected well
31. RETINA : RetinaNet Detector
Open questions
- If we keep the backbone and just design the FPN part well, wouldn't performance improve?
- Does an FPN have to mix features top-down?
- What is the most efficient way to mix them?
- We don't really know, so let AutoML mix everything and test it
This leads to NAS-FPN.
34. NAS-FPN : Introduction
The challenge of designing a feature pyramid architecture lies in its huge design space
The key contribution of our work is in designing a search space that covers all possible cross-scale connections to generate multi-scale feature representations.
The discovered architecture, named NAS-FPN, offers great flexibility in building object detection architectures.
- Recently, Neural Architecture Search algorithms have demonstrated promising results in efficiently discovering top-performing architectures for image classification in a huge search space
Current state-of-the-art convolutional architectures for object detection are manually designed. Here we aim to learn a better feature pyramid network architecture for object detection.
35. NAS-FPN : Method
- The FPN architecture can be stacked N times for better accuracy
- The backbone model and the subnets for class and box predictions follow the original RetinaNet design
RetinaNet with NAS-FPN
36. NAS-FPN : Method
- 5 scales {C3, C4, C5, C6, C7} with corresponding feature strides of {8, 16, 32, 64, 128} pixels
- C6 and C7 are created by simply applying stride-2 and stride-4 max pooling to C5
- Proposes the merging cell: pick two feature maps and combine them with a suitable operation
Merging cell (sketched below)
- Pick two feature maps, choose an output resolution, and combine them with a binary op
- The input feature layers are adjusted to the output resolution by nearest-neighbor upsampling or max pooling, if needed, before applying the binary operation
- The merged feature layer is always followed by a ReLU, a 3×3 convolution, and a batch normalization layer
- The result goes back into the candidate pool, and this is repeated N times
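A minimal PyTorch sketch of a merging cell using sum as the binary op (the other searched op, global-pooling attention, is omitted; the channel count 256 is RetinaNet's usual pyramid width):

```python
import torch.nn as nn
import torch.nn.functional as F

class MergingCell(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn = nn.BatchNorm2d(channels)

    def _resize(self, x, size):
        if x.shape[-2:] == tuple(size):
            return x
        if x.shape[-1] < size[-1]:                    # lower resolution -> upsample
            return F.interpolate(x, size=size, mode="nearest")
        stride = x.shape[-1] // size[-1]              # higher resolution -> max pool
        return F.max_pool2d(x, kernel_size=stride, stride=stride)

    def forward(self, a, b, out_size):
        merged = self._resize(a, out_size) + self._resize(b, out_size)
        return self.bn(self.conv(F.relu(merged)))     # ReLU -> 3x3 conv -> BN
```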
38. NAS-FPN : Experiments
Architecture search for NAS-FPN
Proxy task
- To speed up the training of the RNN controller we need a proxy task
- The proxy task is trained for 10 epochs, instead of 50
- A small backbone architecture of ResNet-10 with 512×512 input image size
- Reward: we reserve a randomly selected 7,392 images from the COCO train2017 set as the validation set, which we use to obtain rewards
Controller
- The controller is a recurrent neural network (RNN), trained using the Proximal Policy Optimization (PPO) algorithm
- The total number of unique architectures generated by the RNN controller is tracked during the search
39. NAS-FPN : Experiments
Architecture search for NAS-FPN
- Left: the reward is computed as the AP of sampled architectures on the proxy task
- Right: the number of sampled unique architectures against the total number of sampled architectures
- The number of unique FPN structures converges at roughly 8,000
- And the result of throwing countless TPUs at it? (100 TPUs? 1,000 TPUs??)
41. NAS-FPN : Experiments
Architecture graph of NAS-FPN
- Feature layers in the same row have identical resolution
- The resolution decreases in the bottom-up direction
- Interpretation: the original FPN only has connections from low resolution to high resolution
- As NAS finds higher-AP architectures, it tends to add connections from high-resolution to low-resolution features
The more the discovered network wires in the high-resolution features that detect small objects, the better the performance.
43. NAS-FPN : Experiments
Further improvements with DropBlock
- We apply DropBlock with block size 3×3 after the batch normalization layers in the NAS-FPN layers
- Using DropBlock improves performance further
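A minimal sketch of DropBlock itself (simplified: seed positions are sampled over the whole map rather than only the valid interior region):

```python
import torch
import torch.nn.functional as F

def drop_block(x, drop_prob=0.1, block_size=3, training=True):
    # Zero out contiguous block_size x block_size regions of the feature map
    if not training or drop_prob == 0.0:
        return x
    _, _, h, w = x.shape
    # Seed rate gamma chosen so the expected dropped fraction ~ drop_prob
    gamma = (drop_prob * h * w / (block_size ** 2)
             / ((h - block_size + 1) * (w - block_size + 1)))
    seeds = (torch.rand_like(x) < gamma).float()
    # Grow each seed into a block_size x block_size dropped square
    mask = 1.0 - F.max_pool2d(seeds, block_size, stride=1, padding=block_size // 2)
    return x * mask * mask.numel() / mask.sum()   # rescale the kept activations
```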
44. NAS-FPN : Experiments
Open questions
- This is a case of AutoML being applied to the detection domain
- Running AutoML takes enormous hardware and time; can the rest of us actually do this?
- Is there a more efficient approach?
- When multi-resolution features are combined, they are only summed; is there no other way?
This leads to EfficientDet.
46. EFFICIENTDET : Introduction
State-of-the-art object detectors have also become increasingly expensive
- The latest AmoebaNet-based NAS-FPN detector requires 167M parameters and 3045B FLOPS (30× more than RetinaNet)
- Given these real-world resource constraints, model efficiency becomes increasingly important for object detection
Model efficiency has become increasingly important in computer vision. First, we propose a weighted bi-directional feature pyramid network (BiFPN). Second, we propose a compound scaling method (following EfficientNet). On these we build a new family of object detectors, called EfficientDet.
47. EFFICIENTDET : Introduction
Although these methods tend to achieve better efficiency, they usually sacrifice accuracy
- Most previous works only focus on a specific or a small range of resource requirements
- Yet real-world applications vary widely, from mobile devices to datacenters
A natural question:
Is it possible to build a scalable detection architecture with both higher accuracy and better efficiency across a wide spectrum of resource constraints?
The question shared by every object detection paper: catch accuracy and efficiency at the same time!
48. EFFICIENTDET : Introduction
Challenge 1: efficient multi-scale feature fusion
- FPN has been widely used for multi-scale feature fusion
- PANet, NAS-FPN, and other studies have developed more network structures for cross-scale feature fusion
- Most previous works simply sum the features up without distinction
- We propose a simple yet highly effective weighted bi-directional feature pyramid network (BiFPN)
- PANet adds one more bottom-up path on top of the RetinaNet/FPN top-down path
- The reasoning: low-level features carry more location information, so sending them up once more should add location information to the high-level features and improve performance.
49. EFFICIENTDET : Introduction
Challenge 2: model scaling
- Inspired by recent work on EfficientNet, we propose a compound scaling method for object detectors, which jointly scales up the resolution/depth/width of the backbone, feature network, and box/class prediction network
- There are three ways to grow a model (width, depth, resolution); scale all three together, in balance, as EfficientNet does.
50. EFFICIENTDET : Introduction
Our contributions can be summarized as:
- We propose BiFPN, a weighted bidirectional feature network for easy and fast multi-scale feature fusion
- We propose a new compound scaling method, which jointly scales up backbone, feature network, box/class network, and resolution in a principled way
- Based on BiFPN and compound scaling, we develop EfficientDet
52. EFFICIENTDET : BiFPN
Problem Formulation
- Formally, given a list of multi-scale features P_in (the features the feature pyramid consumes),
- our goal is to find a transformation f that can effectively aggregate different features
- and output a list of new features: P_out = f(P_in)
54. EFFICIENTDET : BiFPN
Cross-Scale Connections
- We observe that PANet achieves better accuracy than FPN and NAS-FPN
- Really?? Then why was NAS run at all??
- First, we remove the nodes that only have one input edge
- Our intuition is simple: if a node has only one input edge with no feature fusion, it contributes less; this yields the Simplified PANet
- Second, we add an extra edge from the original input to the output node if they are at the same level
- Third, unlike PANet, which has only one top-down and one bottom-up path, we treat each bidirectional (top-down & bottom-up) path as one feature network layer, and repeat the same layer multiple times to enable more high-level feature fusion
First → Second → Third → repeat N times
55. EFFICIENTDET : BiFPN
Weighted Feature Fusion
- A common way is to first resize the features to the same resolution and then sum them up
- Pyramid attention network introduces global self-attention upsampling to recover pixel localization (similar to SENet)
Unbounded fusion: O = Σ_i w_i · I_i
- w_i is a learnable weight that can be a scalar (per feature), a vector (per channel), or a multi-dimensional tensor (per pixel)
- We find a scalar works well, but the scalar weight is unbounded
- so we resort to weight normalization to bound the value range of each weight
56. EFFICIENTDET : BiFPN
Softmax-based fusion: O = Σ_i (e^{w_i} / Σ_j e^{w_j}) · I_i
- An intuitive idea is to apply softmax to each weight, such that all weights are normalized into probabilities in [0, 1] representing the importance of each input
- But the extra softmax leads to a significant slowdown on GPU hardware
Fast normalized fusion: O = Σ_i (w_i / (ε + Σ_j w_j)) · I_i
- w_i ≥ 0 is ensured by applying a ReLU after each w_i
- ε = 0.0001 is a small value to avoid numerical instability
- This fast fusion approach has very similar learning behavior and accuracy to the softmax-based fusion, but runs up to 30% faster on GPUs
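A minimal PyTorch sketch of fast normalized fusion:

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    # O = sum_i (w_i / (eps + sum_j w_j)) * I_i, with w_i kept >= 0 by a ReLU
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):                 # inputs: list of same-shape tensors
        w = torch.relu(self.weights)           # ReLU keeps each weight non-negative
        w = w / (self.eps + w.sum())           # normalize without a softmax
        return sum(wi * x for wi, x in zip(w, inputs))
```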
59. EFFICIENTDET : Architecture
EfficientDet architecture
- EfficientNet as the backbone network
- BiFPN as the feature network, repeated n times
- A shared class/box prediction network
61. EFFICIENTDET : EFFICIENTNET
Compound Scaling
- We propose a new compound scaling method for object detection, which uses a simple compound coefficient φ to jointly scale up all dimensions of the backbone network, BiFPN network, class/box network, and resolution
- Grid search over all dimensions is prohibitively expensive, therefore we use a heuristic-based scaling approach
Backbone network
- We reuse the same width/depth scaling coefficients of EfficientNet-B0 to B6
62. EFFICIENTDET : EFFICIENTNET
BiFPN network
- We exponentially grow the BiFPN width W_bifpn (#channels): W_bifpn = 64·(1.35^φ)
- and linearly increase its depth D_bifpn (#layers): D_bifpn = 3 + φ
Box/class prediction network
- We fix their width to always match the BiFPN (i.e., W_pred = W_bifpn)
- but linearly increase the depth (#layers): D_box = D_class = 3 + ⌊φ/3⌋
(width = channel depth, depth = number of layers)
Input image resolution
- Since feature levels 3-7 are used in BiFPN, the input resolution must be divisible by 2^7 = 128
- so the resolution is increased linearly: R_input = 512 + φ·128
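A small sketch of these scaling equations (rounding channel counts to hardware-friendly multiples, as released implementations do, is omitted here):

```python
def efficientdet_config(phi):
    # Compound scaling per the equations above, phi = 0..6 for D0..D6
    return {
        "bifpn_width": int(64 * (1.35 ** phi)),  # W_bifpn grows exponentially
        "bifpn_depth": 3 + phi,                  # D_bifpn grows linearly
        "head_depth": 3 + phi // 3,              # D_box = D_class
        "resolution": 512 + phi * 128,           # stays divisible by 2^7 = 128
    }

for phi in range(7):
    print(f"D{phi}:", efficientdet_config(phi))
```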