The document summarizes the CornerNet object detection method. CornerNet detects objects as pairs of top-left and bottom-right corners using a convolutional neural network. It introduces corner pooling to better localize corners and achieves state-of-the-art performance among single-stage detectors. The method formulates object detection as an association problem between corners using embeddings and outperforms other detectors on standard benchmarks with an average inference time of 244ms per image.
The center point-based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding-box based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS.
PR-284: End-to-End Object Detection with Transformers (DETR), by Jinwon Lee
This is the 284th paper review of the TensorFlow Korea paper-reading group PR12.
This paper is DETR (DEtection with TRansformer) from Facebook.
It is also the top-ranked paper in arxiv-sanity's top recent/last year listing (http://www.arxiv-sanity.com/top?timefilter=year&vfilter=all).
With ViT recently submitted to ICLR 2021, there has been much talk about whether Transformers will now replace CNNs. This paper was published at ECCV this year, and although it still uses a CNN for feature extraction, I consider it an important paper that proposes an effective way to perform object detection with a transformer. The paper points out that detection pipelines rely heavily on heuristic, non-differentiable components such as anchor boxes and NMS (Non-Maximum Suppression), which is why object detection, unlike other problems, has resisted the end-to-end philosophy of deep learning. As a solution, it casts bounding-box prediction as a set prediction problem (no duplicates, order-invariant) and proposes an end-to-end algorithm based on transformers. If you want the details of the DETR algorithm, which needs neither anchor boxes nor NMS, please watch the video!
Video link: https://youtu.be/lXpBcW_I54U
Paper link: https://arxiv.org/abs/2005.12872
Slides for a study session given by Christian Saravia at Arithmer Inc.
It is a summary of CenterNet, a recent method for object detection.
Arithmer began at the University of Tokyo Graduate School of Mathematical Sciences. Today, our research of modern mathematics and AI systems has the capability of providing solutions when dealing with tough complex issues. At Arithmer we believe it is our job to realize the functions of AI through improving work efficiency and producing more useful results for society.
You Only Look Once: Unified, Real-Time Object Detection, by Dadajon Jurakuziev
YOLO is a new approach to object detection: a single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation.
Slides from the UPC reading group on computer vision about the following paper:
Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. "You only look once: Unified, real-time object detection." arXiv preprint arXiv:1506.02640 (2015).
Object detection is an important computer vision technique with applications in several domains such as autonomous driving, personal and industrial robotics. The below slides cover the history of object detection from before deep learning until recent research. The slides aim to cover the history and future directions of object detection, as well as some guidelines for how to choose which type of object detector to use for your own project.
Slides by Míriam Bellver at the UPC Reading group for the paper:
Liu, Wei, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, and Scott Reed. "SSD: Single Shot MultiBox Detector." ECCV 2016.
Full listing of papers at:
https://github.com/imatge-upc/readcv/blob/master/README.md
Slides by Amaia Salvador at the UPC Computer Vision Reading Group.
Source document on GDocs with clickable links:
https://docs.google.com/presentation/d/1jDTyKTNfZBfMl8OHANZJaYxsXTqGCHMVeMeBe5o1EL0/edit?usp=sharing
Based on the original work:
Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In Advances in Neural Information Processing Systems, pp. 91-99. 2015.
PR-207: YOLOv3: An Incremental Improvement, by Jinwon Lee
This is the 207th paper review of the TensorFlow Korea paper-reading group PR12.
This paper is YOLO v3.
It is so well known that it hardly needs introduction; among object detection algorithms, YOLO is a very distinctive one-stage algorithm. The paper walks through, one by one, the changes applied since YOLO v2 (YOLO9000) to improve performance. It also criticizes MS COCO's averaged mAP metric and discusses how mAP should be evaluated; see the video for details.
Paper link: https://arxiv.org/abs/1804.02767
Video link: https://youtu.be/HMgcvgRrDcA
Yinyin Liu presents a model for object detection and localization called Fast R-CNN. She shows how to introduce an ROI pooling layer into neon, and how to add the PASCAL VOC dataset to interface with model training and inference. Lastly, Yinyin runs through a demo on how to apply the trained model to detect new objects.
Real-time object detection coz YOLO! by Shagufta Gurmukhdas
You Only Look Once is a state-of-the-art, high speed real-time object detection algorithm. It looks at the whole image at test time so its predictions are informed by global context in the image. This talk teaches you to develop your own application to detect and classify objects in images & videos.
1. Intro to the YOLO algorithm
2. Image detection on video with YOLO
3. Processing images in Python, adding bounding boxes and labels
4. Processing complete videos in Python, in a similar way to the previous section
5. Processing real-time video from a webcam
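The image-processing step above (adding bounding boxes and labels) is typically done with OpenCV's `cv2.rectangle`; as a dependency-free sketch of the same idea (the function name and layout here are illustrative, not from the talk), a box border can be painted directly onto a NumPy image array:

```python
import numpy as np

def draw_box(img, x1, y1, x2, y2, color=(0, 255, 0), thickness=2):
    """Paint a rectangular border onto an HxWx3 uint8 image array, in place."""
    img[y1:y1 + thickness, x1:x2] = color  # top edge
    img[y2 - thickness:y2, x1:x2] = color  # bottom edge
    img[y1:y2, x1:x1 + thickness] = color  # left edge
    img[y1:y2, x2 - thickness:x2] = color  # right edge
    return img

img = np.zeros((100, 100, 3), dtype=np.uint8)
draw_box(img, 10, 20, 60, 80)
```

In a real pipeline the same call would run once per detection returned by the model, with the class name rendered next to each box.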
Slides from Portland Machine Learning meetup, April 13th.
Abstract: You've heard all the cool tech companies are using them, but what are Convolutional Neural Networks (CNNs) good for and what is convolution anyway? For that matter, what is a Neural Network? This talk will include a look at some applications of CNNs, an explanation of how CNNs work, and what the different layers in a CNN do. There's no explicit background required so if you have no idea what a neural network is that's ok.
Introduction to Capsule Networks (CapsNets), by Aurélien Géron
CapsNets are a hot new architecture for neural networks, invented by Geoffrey Hinton, one of the godfathers of deep learning.
You can view this presentation on YouTube at: https://youtu.be/pPN8d0E3900
NIPS 2017 Paper:
* Dynamic Routing Between Capsules,
* by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
* https://arxiv.org/abs/1710.09829
The 2011 paper:
* Transforming Autoencoders
* by Geoffrey E. Hinton, Alex Krizhevsky and Sida D. Wang
* https://goo.gl/ARSWM6
CapsNet implementations:
* Keras w/ TensorFlow backend: https://github.com/XifengGuo/CapsNet-Keras
* TensorFlow: https://github.com/naturomics/CapsNet-Tensorflow
* PyTorch: https://github.com/gram-ai/capsule-networks
Book:
Hands-On Machine Learning with Scikit-Learn and TensorFlow
O'Reilly, 2017
Amazon: https://goo.gl/IoWYKD
Github: https://github.com/ageron
Twitter: https://twitter.com/aureliengeron
This is a presentation on the YOLO (You Only Look Once) object detection system, a state-of-the-art system that works very fast. The presentation is derived from the paper cited below:
@article{yolov3,
title={YOLOv3: An Incremental Improvement},
author={Redmon, Joseph and Farhadi, Ali},
journal = {arXiv},
year={2018}
}
The presentation explores the trend towards a scholarly communication system that is friendly to machines. It presents 3 exhibits illustrating the trend and 1 exhibit illustrating inertia in the system. It makes the point that machine-actionability can be much easier achieved if content and metadata are available in Open Access and under a permissive Creative Commons license. It also observes that even with content and metadata openly available, new costs related to advanced tools to explore the scholarly record will emerge. Finally, it points at significant challenges regarding the persistence of the scholarly record in light of increasingly interconnected and actionable content and advanced tools to interact with it.
The slides were used for a plenary presentation at the LIBER 2011 Conference in Barcelona, Spain, on June 30 2011.
"Towards a Science of Reproducible Science?" DPRMA Workshop talk at JCDL 2013, Indianapolis, 25th July 2013. Workshop website is http://dprma.oerc.ox.ac.uk/
Paper is
David De Roure. 2013. Towards computational research objects. In Proceedings of the 1st International Workshop on Digital Preservation of Research Methods and Artefacts (DPRMA '13). ACM, New York, NY, USA, 16-19. DOI=10.1145/2499583.2499590 http://doi.acm.org/10.1145/2499583.2499590
Software Defined Visualization (SDVis): Get the Most Out of ParaView* with OS..., by Intel® Software
Get an introduction to rasterization and ray tracing with demonstrations in ParaView*, a popular general-purpose scientific visualization environment. Get an overview of open-source software packages available to the open-science community.
Object detection is a central problem in computer vision and underpins many applications from medical image analysis to autonomous driving. In this talk, we will review the basics of object detection from fundamental concepts to practical techniques. Then, we will dive into cutting-edge methods that use transformers to drastically simplify the object detection pipeline while maintaining predictive performance. Finally, we will show how to train these models at scale using Determined’s integrated deep learning platform and then serve the models using MLflow.
What you will learn:
Basics of object detection including main concepts and techniques
Main ideas from the DETR and Deformable DETR approaches to object detection
Overview of the core capabilities of Determined’s deep learning platform, with a focus on its support for effortless distributed training
How to serve models trained in Determined using MLflow
MLconf - Distributed Deep Learning for Classification and Regression Problems..., by Sri Ambati
Video recording (no audio?): http://new.livestream.com/accounts/7874891/events/3565981/videos/68114143 from 32:00 to 54:30
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice for highest predictive performance in traditional business analytics. This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of business-critical problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization, parameter tuning and a fully-featured R interface. World record performance on the classic MNIST dataset, best-in-class accuracy for a high-dimensional eBay text classification problem and other relevant datasets showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
Bio:
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes. He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich. Arno was named 2014 Big Data All-Star by Fortune Magazine.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
This contains the agenda of the Spark Meetup I organised in Bangalore on Friday, the 23rd of Jan 2014. It carries the slides for the talk I gave on distributed deep learning over Spark
Collaborations in the Extreme: The rise of open code development in the scie..., by Kelle Cruz
Video: https://www.simonsfoundation.org/event/collaborations-in-the-extreme-the-rise-of-open-code-development-in-the-scientific-community/
The internet is changing the scientific landscape by fostering international, interdisciplinary and collaborative software development. More than ever before, software is a crucial component of any scientific result. The ability to easily share code is reshaping expectations about reproducibility -- a fundamental tenet of the scientific process. In this lecture, Kelle Cruz will briefly provide the backstory of how these shifts have come about, describe some of the most impactful open source projects, and discuss efforts currently underway aimed at ensuring these community-led projects are sustainable and receive support.
Histolab: an Open Source Python Library for Reproducible Digital Pathology, by Alessia Marcolini
The histo-pathological analysis of tissue sections is the gold standard for assessing the presence of many complex diseases, such as tumors, and it is expected to be at the center of the AI revolution in medicine, a prediction supported by the increasing success of deep learning applications in digital pathology. The aim of histolab is to provide a tool for Whole Slide Image (WSI) processing in a reproducible environment to support clinical and scientific research. histolab is designed to handle WSIs, automatically detect the tissue, and retrieve informative tiles.
Performance evaluation of GANs in a semisupervised OCR use case, by Florian Wilhelm
Even in the age of big data, labeled data is a scarce resource in many machine learning use cases. Florian Wilhelm evaluates generative adversarial networks (GANs) when used to extract information from vehicle registrations under a varying amount of labeled data, compares the performance with supervised learning techniques, and demonstrates a significant improvement when using unlabeled data.
Performance evaluation of GANs in a semisupervised OCR use case, by inovex GmbH
Online vehicle marketplaces are embracing artificial intelligence to ease the process of selling a vehicle on their platform. The tedious work of copying information from the vehicle registration document into some web form can be automated with the help of smart text-spotting systems, in which the seller takes a picture of the document, and the necessary information is extracted automatically.
Florian Wilhelm details the components of a text-spotting system, including the subtasks of object detection and optical character recognition (OCR). Florian elaborates on the challenges of OCR in documents with various distortions and artifacts, which rule out off-the-shelf products for this task. After offering an overview of semisupervised learning based on generative adversarial networks (GANs), Florian evaluates the performance gains of this method compared to supervised learning. More specifically, for a varying amount of labeled data, he compares the accuracy of a convolutional neural network (CNN) to a GAN that uses additional unlabeled data during the training phase, showing that GANs significantly outperform classical CNNs in use cases with a lack of labeled data.
What you'll learn:
Understand how semisupervised learning with GANs works
Explore beneficial semisupervised methods based on GANs for use cases with a limited amount of labeled data
Gain insight into an interesting OCR use case of an online vehicle marketplace
Event: O'Reilly Artificial Intelligence Conference, London, 11.10.2018
Speaker: Dr. Florian Wilhelm
Mehr Tech-Vorträge: www.inovex.de/vortraege
Mehr Tech-Artikel: www.inovex.de/blog
Adjusting primitives for graph: SHORT REPORT / NOTES, by Subhajit Sahu
Graph algorithms such as PageRank typically operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23..., by John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting OpenMP PageRank: SHORT REPORT / NOTES, by Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives (i.e., sumAt, multiply) in sequential mode.
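The computation being parallelized here is ordinary PageRank power iteration; a minimal sequential sketch in plain Python (the OpenMP variants parallelize the per-vertex loops; the uniform redistribution of dead-end rank shown below is a common choice, assumed here rather than taken from the report):

```python
def pagerank(graph, d=0.85, iters=50):
    """graph: dict mapping each vertex to a list of its out-neighbors."""
    verts = list(graph)
    n = len(verts)
    rank = {v: 1.0 / n for v in verts}
    for _ in range(iters):
        new = {v: (1.0 - d) / n for v in verts}  # teleport term
        for v, outs in graph.items():
            if outs:
                share = d * rank[v] / len(outs)  # push rank to out-neighbors
                for u in outs:
                    new[u] += share
            else:  # dead end: spread its rank uniformly over all vertices
                for u in verts:
                    new[u] += d * rank[v] / n
        rank = new
    return rank

r = pagerank({'a': ['b'], 'b': ['c'], 'c': ['a']})  # symmetric 3-cycle
```

On the symmetric 3-cycle every vertex converges to rank 1/3, and the ranks always sum to 1.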
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2..., by pchutichetpong
M Capital Group (“MCG”) expects demand to keep growing and supply to evolve, facilitated by institutional investment rotating out of offices and into work-from-home (“WFH”) assets, while the need for data storage expands with global internet usage; experts predict 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to expect strong annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ..., by Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
6. Main Contributions
• CornerNet: Detecting objects as pairs of top-left and bottom-right corners
• Corner pooling to help better localize corners
• State-of-the-art performance among single-stage detectors
https://heilaw.github.io/
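As a rough illustration of the second contribution: top-left corner pooling takes, at every location, the maximum over all features to its right plus the maximum over all features below it, so corner evidence can be gathered even when no object appears locally. A NumPy sketch of the 2-D case (a simplified rendering of the idea, not the paper's CUDA implementation):

```python
import numpy as np

def top_left_corner_pool(f):
    """Top-left corner pooling on a 2D feature map: at each cell, the max over
    cells to the right plus the max over cells below (both inclusive)."""
    h, w = f.shape
    right = np.empty_like(f)
    below = np.empty_like(f)
    right[:, -1] = f[:, -1]
    for j in range(w - 2, -1, -1):          # scan right-to-left
        right[:, j] = np.maximum(f[:, j], right[:, j + 1])
    below[-1, :] = f[-1, :]
    for i in range(h - 2, -1, -1):          # scan bottom-to-top
        below[i, :] = np.maximum(f[i, :], below[i + 1, :])
    return right + below

f = np.array([[1.0, 0.0],
              [0.0, 3.0]])
out = top_left_corner_pool(f)  # [[2., 3.], [3., 6.]]
```

The bottom-right variant is symmetric, scanning left-to-right and top-to-bottom.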
8-10. CornerNet: Detecting Objects as Paired Keypoints (2. Introduction)
[Slide diagrams: for each candidate location, a ConvNet predicts whether it is a top-left corner and whether it is a bottom-right corner (Yes/No), the corner's class (e.g., Person), and whose corner it is; embeddings of corners belonging to the same object are trained with a distance loss, and embeddings of different objects with a similarity loss.]
https://heilaw.github.io/
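The "whose corner?" question is answered by comparing corner embeddings at test time: a top-left and a bottom-right corner belong to the same object if their embeddings are close. A toy sketch of that grouping idea (scalar embeddings, a greedy matching strategy, and the `max_dist` threshold are illustrative simplifications, not the paper's exact procedure):

```python
def pair_corners(top_lefts, bottom_rights, max_dist=0.5):
    """Each corner is (x, y, embedding). Greedily pair each top-left corner
    with the geometrically valid bottom-right whose embedding is closest."""
    pairs, used = [], set()
    for tx, ty, te in top_lefts:
        best, best_d = None, max_dist
        for i, (bx, by, be) in enumerate(bottom_rights):
            d = abs(te - be)
            # a valid bottom-right lies to the lower-right of the top-left
            if i not in used and d < best_d and bx >= tx and by >= ty:
                best, best_d = i, d
        if best is not None:
            used.add(best)
            bx, by, _ = bottom_rights[best]
            pairs.append((tx, ty, bx, by))
    return pairs

boxes = pair_corners([(0, 0, 0.1), (5, 5, 0.9)],
                     [(4, 4, 0.12), (9, 9, 0.88)])
```

Here the corners with embeddings 0.1 and 0.12 form one box, and 0.9 and 0.88 the other.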
14. Two-Stage Detector (3. Related Works)
[Slide diagram: a 1st network generates Regions of Interest [Ren et al. NIPS'15], region pooling [Girshick, ICCV'15] extracts per-region features, and a 2nd network classifies each region (e.g., Person).]
R-CNN [Girshick et al. CVPR'14], SPP [He et al. ECCV'14], Mask R-CNN [He et al. ICCV'17], Cascade R-CNN [Cai & Vasconcelos, CVPR'18], SNIP [Singh & Davis, CVPR'18]
Faster R-CNN PR-012: https://youtu.be/kcPAGIgBGRs
Mask R-CNN PR-057: https://youtu.be/RtSZALC9DlU
https://heilaw.github.io/
15. One-Stage Detector (3. Related Works)
[Slide diagram: a ConvNet classifies dense anchors over the image directly, e.g., as Person or Background.]
YOLO9000 [Redmon & Farhadi, CVPR'17], DSOD [Shen et al. ICCV'17], SSD [Liu et al. ECCV'16], DSSD [Fu et al. arXiv'17], RetinaNet [Lin et al. ICCV'17], RefineDet [Zhang et al. CVPR'18]
YOLO PR-016: https://youtu.be/eTDcoeqj1_w
YOLO9000 PR-023: https://youtu.be/6fdclSGgeio
SSD PR-132: https://youtu.be/ej1ISEoAK5g
https://heilaw.github.io/
16. Anchor Boxes (3. Related Works)
Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems. 2015. (https://arxiv.org/abs/1506.01497)
https://medium.com/@andersasac/anchor-boxes-the-key-to-quality-object-detection-ddf9d612d4f9
17. Drawbacks of Anchor Boxes
1. Need a large number of anchors so that at least one anchor sufficiently overlaps with the ground truth
Only a tiny fraction of anchors are positive examples
Slows down training [Lin et al. ICCV’17]
2. Extra hyperparameters – sizes and aspect ratios
19. 3.2 Detecting Corners
Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose
estimation." European Conference on Computer Vision. Springer, Cham, 2016.
20. 3.2 Detecting Corners
Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE
international conference on computer vision. 2017.
Ground-Truth Annotation
21. 3.2 Detecting Corners
Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE international conference on computer vision. 2015.
Faster R-CNN
Bounding-box regression
o_k : offset for corner k
n : downsampling factor
(x_k, y_k) : coordinates of corner k
o_k = ( x_k/n − ⌊x_k/n⌋ , y_k/n − ⌊y_k/n⌋ )
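The offset definition can be made concrete with a small plain-Python sketch (not the authors' code): with downsampling factor n, a corner at (x_k, y_k) lands at heatmap cell (⌊x_k/n⌋, ⌊y_k/n⌋), and o_k is the fractional remainder that the network is trained to regress.

```python
import math

def corner_offset(x_k, y_k, n):
    """Ground-truth offset o_k for a corner at (x_k, y_k):
    the fractional part lost when mapping to the downsampled heatmap."""
    return (x_k / n - math.floor(x_k / n),
            y_k / n - math.floor(y_k / n))
```

For example, with n = 4 a corner at (103, 57) maps to heatmap cell (25, 14) with offset (0.75, 0.25); this offset is what the Smooth L1 loss regresses.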
25. Newell, Alejandro, Zhiao Huang, and Jia Deng. "Associative embedding: End-to-end learning for joint detection and grouping."
Advances in Neural Information Processing Systems. 2017.
3.3 Grouping Corners
26. 3.3 Grouping Corners
e_tk : embedding for the top-left corner of object k
e_bk : embedding for the bottom-right corner of object k
e_k : average of e_tk and e_bk
Δ : 1 (margin used in the push loss)
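CornerNet's pull/push embedding losses can be sketched as a toy pure-Python implementation (1-D embeddings as in the paper; an illustrative sketch, not the released code):

```python
def pull_push_losses(e_t, e_b, delta=1.0):
    """e_t, e_b: lists of 1-D embeddings for the top-left and
    bottom-right corners of N objects, paired by index."""
    N = len(e_t)
    e = [(t + b) / 2.0 for t, b in zip(e_t, e_b)]  # e_k: mean embedding per object
    # Pull: corners of the same object are trained toward their mean.
    l_pull = sum((t - ek) ** 2 + (b - ek) ** 2
                 for t, b, ek in zip(e_t, e_b, e)) / N
    if N < 2:
        return l_pull, 0.0
    # Push: mean embeddings of different objects are pushed at least delta apart.
    l_push = sum(max(0.0, delta - abs(e[k] - e[j]))
                 for k in range(N) for j in range(N) if j != k) / (N * (N - 1))
    return l_pull, l_push
```

With perfectly matched, well-separated embeddings both losses vanish; embeddings of different objects closer than Δ = 1 incur a push penalty.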
32. Set α and β to 0.1, and γ to 1 (weights for the pull/push and offset losses)
4 Experiments
4.1 Training Details
- Implementation in PyTorch https://github.com/princeton-vl/CornerNet
- Network is randomly initialized with no pretraining on any external dataset
- Input Resolution : 511 x 511, Output Resolution : 128 x 128
- Data augmentation : horizontal flipping, random scaling/cropping/color jittering
- Batch size : 49 (10 Titan X GPUs in total: 4 images on the master GPU, 5 on each of the remaining 9)
- For the ablation study : 250k iterations with a learning rate of 2.5 × 10⁻⁴
- For comparison with other detectors : an extra 250k iterations, reducing the learning rate to 2.5 × 10⁻⁵ for the last 50k iterations
33. 4 Experiments
4.2 Testing Details
A simple post-processing algorithm
1. Non-maximal suppression :
3 x 3 max pooling layer on the corner heatmap
2. Pick the top 100 top-left and top 100 bottom-right corners from the heatmaps
3. Adjust the corner locations by the corresponding offsets
4. Calculate L1 distances between the embeddings of the top-left and bottom-right corners
5. Reject pairs whose distance is greater than 0.5 or whose corners come from different categories
6. Use the average scores of the top-left and bottom-right corners as the detection scores
Generating bounding boxes
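The max-pooling NMS in step 1 can be sketched in pure Python: a score survives only if it equals the maximum of its own 3 x 3 neighbourhood (a toy version of the idea, not the authors' PyTorch code):

```python
def heatmap_nms(heat):
    """Keep only local maxima of a 2-D heatmap (list of lists):
    a score survives if it equals the max of its 3x3 neighbourhood,
    otherwise it is zeroed out."""
    H, W = len(heat), len(heat[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            # Max over the (clipped) 3x3 window centred at (i, j).
            neigh = max(heat[ii][jj]
                        for ii in range(max(0, i - 1), min(H, i + 2))
                        for jj in range(max(0, j - 1), min(W, j + 2)))
            if heat[i][j] == neigh:
                out[i][j] = heat[i][j]
    return out
```

The surviving peaks are then ranked by score, and the top 100 per corner type move on to the pairing steps.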
36. Conclusion
• CornerNet: Detecting objects as pairs of top-left and bottom-
right corners
• Corner pooling to help better localize corners
• State-of-the-art performance among single-stage detectors
37. Further Discussion
• Other backbone?
• Occlusion between points?
• Corner Pooling
• Speed?
Corner pooling
38. The average inference time : 244ms per image on a Titan X (PASCAL) GPU (AP : 42.1)
Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE international conference on computer vision. 2017.
39. REFERENCES
[1] Law, Hei, and Jia Deng. "Cornernet: Detecting objects as paired keypoints."
Proceedings of the European Conference on Computer Vision (ECCV). 2018.
[2] Lin, Tsung-Yi, et al. "Focal loss for dense object detection."
Proceedings of the IEEE international conference on computer vision. 2017.
[3] Newell, Alejandro, Zhiao Huang, and Jia Deng. "Associative embedding: End-to-end learning for joint
detection and grouping." Advances in Neural Information Processing Systems. 2017.
[4] Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE international conference on computer vision. 2015.
[5] Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose
estimation." European Conference on Computer Vision. Springer, Cham, 2016.
Editor's Notes
Alright, let me begin the presentation. The paper I'm presenting today is
"CornerNet: Detecting Objects as Paired Keypoints", published at ECCV'18.
As you can tell from the title, it is an object detection paper.
Many object detection papers have been covered in PR-12,
so by now most of you will know that the problem object detection tries to solve is
localizing, with bounding boxes, what each of the multiple objects in an image is and where it is.
Simply put, it draws rectangles around things.
This paper tackles the object detection task as paired-keypoint detection:
a bounding box can be represented by two corner points, the top-left and the bottom-right,
so in the end the goal is to find the top-left and bottom-right corners of the bounding box.
You can think of it as applying almost the same approach as keypoint detection in human pose estimation.
I chose this paper because most object detection papers build on high-recall anchor boxes,
and I found it interesting that this one tries something different.
The authors released a lot of resources, so I referred to them heavily when making these slides.
This is a figure I've been putting to good use in every recent presentation:
someone very kindly listed up the deep-learning-based object detection algorithms,
and you can see the CornerNet paper sitting around here.
You can check it at this link.
First, the paper's main contributions are as follows.
It proposes an object detection network called CornerNet.
Its biggest distinction is that, unlike existing anchor-box-based networks, it is a new model that predicts bounding boxes from top-left and bottom-right corner points.
Second, it proposes a new pooling method called corner pooling that helps the network predict corners better.
With these methods, it achieves state-of-the-art performance among single-stage detectors in COCO mAP.
To look at it once more with a picture, the core idea of CornerNet is to predict a bounding box from these two keypoints,
and consequently it borrows most of its ideas from the human pose estimation side.
Roughly, how does it work? If we want to predict two points and obtain a bounding box and its class,
we predict the top-left corner point and the bottom-right corner point, and of course the class probabilities as well; in practice these are combined into maps of size W x H x numClasses.
The thing here in the middle is the embedding vector.
I'll come back to it later, but once the points are predicted,
we have to decide which top-left corner connects to which bottom-right corner.
Top-left and bottom-right corner points are matched using the similarity between these embedding vectors:
the network is trained so that these two take the same value
and those two take different values.
One of the network outputs, the heatmap, looks like this:
the left one is the heatmap for top-left corners and the right one for bottom-right corners.
Here the person class and the tennis racket class are shown in a single heatmap,
but there is actually one heatmap per class, so the network really holds W x H x C heatmaps,
and the person and the tennis racket would originally sit on different channels.
The experiments are on MS COCO, and CornerNet is reported to achieve SOTA COCO mAP among one-stage detectors.
Now, to give you the paper's narrative:
existing object detection algorithms can be broadly divided into
one-stage and two-stage algorithms.
Two-stage detectors are usually R-CNN-family algorithms
with a region proposal network attached at the front;
compared with one-stage detectors they are relatively slow but detect better.
Conversely, a one-stage network has a single network find both the anchor boxes and the classes;
its detection performance trails two-stage methods a little, but it is comparatively fast.
But whether one-stage or two-stage, these are usually anchor-box-based detectors:
they lay out an enormous number of candidate bounding boxes like this and hope that at least one of them hits.
From here on is the paper's argument.
So what is the problem with anchor-box-based approaches? An enormous number of anchor boxes are created in advance,
but in fact only a few out of thousands actually match an object.
This fundamentally causes a data imbalance between positive and negative boxes and slows down training.
Second, using anchor boxes also requires extra hyperparameters that need human heuristics.
Hence: let's drop anchor boxes and use corner points instead.
The network looks like this. The backbone is an hourglass network, a model proposed in the human pose estimation paper
"Stacked Hourglass Networks for Human Pose Estimation"; for reference, I believe it came from
the same lab as CornerNet. I don't know this part well and the paper doesn't go into it deeply,
so I'll cover the hourglass network properly if I get a chance to present that paper later.
In short, as in other detection papers, a representation is extracted by the hourglass backbone network,
after which it splits into two branches, one for the top-left corners and the other for
the bottom-right corners; each module predicts a heatmap containing the corners, embedding vectors, and offsets.
We saw the heatmaps and embeddings earlier; the offsets play almost the same role as the bounding-box regression in existing object detection algorithms:
they fine-tune the location of each point.
Now let's look at the loss function a bit, starting with L_det.
As I said earlier, the location of each point is kept as a W x H x numClasses heatmap.
When building the ground truth for this map, rather than marking just a single point, a Gaussian centered on that point is painted onto the ground-truth heatmap,
so the targets come out in this kind of shape.
The actual loss function is a modified focal loss; in my view this setting inevitably has a class imbalance problem too,
which I think is why focal loss is used. In the end it is a variant of the cross-entropy loss, except that since the targets are not binary,
this (1 − y) term appears: with the Gaussian, the negatives are not all exactly 0.
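CornerNet's modified focal loss over the corner heatmaps can be sketched in plain Python (a toy version over flattened heatmap values, with α = 2 and β = 4 as in the paper; not the released code):

```python
import math

def cornernet_focal_loss(preds, gts, alpha=2.0, beta=4.0):
    """preds, gts: flat lists of predicted probabilities and
    Gaussian-smoothed ground-truth values at the same heatmap locations.
    Normalized by N, the number of positive (y == 1) locations."""
    n_pos = sum(1 for y in gts if y == 1.0)
    loss = 0.0
    for p, y in zip(preds, gts):
        if y == 1.0:  # positive location: standard focal term
            loss -= (1 - p) ** alpha * math.log(p)
        else:         # negative: penalty reduced near positives by (1 - y)^beta
            loss -= (1 - y) ** beta * p ** alpha * math.log(1 - p)
    return loss / max(n_pos, 1)
```

The (1 − y)^β factor is exactly the "1 − y part": locations inside the Gaussian bump around a true corner are penalized less for firing.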
L_off is the loss for the offsets I mentioned earlier.
The reason for the offsets is that CornerNet's input and output sizes differ:
they differ by a factor of n, which introduces a corresponding localization error, and the offsets compensate for it.
The loss metric is the smooth L1 loss, taken over directly from the loss used in Faster R-CNN and various other
bounding-box regression setups. As far as I know, training does not go well with plain L1 or L2;
I'd have to think about it more to say exactly why.
Now the embedding loss remains:
the pull and push losses.
Put simply, they train same-object embeddings to be equal
and different-object embeddings to differ.
This training scheme is also taken from keypoint detection papers:
when linking keypoints in keypoint detection, an embedding vector is learned
so that similar keypoints are connected to each other.
This is another example from keypoint detection.
I had assumed the embedding vectors would be at least two-dimensional,
but like CornerNet, they use 1-D embeddings.
In this example there are nine people, including one mistaken one;
the paper shows that if you train the embeddings so that similar embedding vectors are linked when connecting keypoints,
it works quite well.
So that method is brought over as-is and applied to detection.
e_tk is the embedding of the top-left corner of the k-th object,
and e_bk is the embedding of the bottom-right corner of the k-th object.
e_k is the average of the two.
L_pull ultimately trains these two embeddings to move toward their average,
and L_push trains the average embeddings of different objects to move apart.
And finally, the paper proposes a technique called corner pooling.
Earlier, the backbone split into two branches, right?
One was the top-left corner pooling module and the other the bottom-right pooling module;
corner pooling is what goes into them.
The paper's phrasing is that a corner may lack local visual evidence.
Even in examples like this, the corner location is actually very far from the object, so in such cases
the idea is that it would help to look once horizontally and once vertically.
For a top-left corner, for instance, we take the largest value along this line here
and the largest value along this line there.
Looking again: this value 3 is the largest value from over here,
this 3 is the largest value from over there, and adding the two elementwise gives 6.
This way of assigning activations to corner pixel locations is called corner pooling.
For bottom-right corner pooling, you just do it the other way around.
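Top-left corner pooling (max to the right plus max downward, added elementwise) can be sketched as two running-max scans in pure Python; a toy version over lists, not the authors' CUDA/PyTorch implementation:

```python
def top_left_corner_pool(feat_h, feat_v):
    """feat_h, feat_v: two 2-D feature maps (lists of lists).
    For each location, take the max to the right in feat_h and the max
    downward in feat_v, then add the two results elementwise."""
    H, W = len(feat_h), len(feat_h[0])
    right_max = [[0.0] * W for _ in range(H)]
    down_max = [[0.0] * W for _ in range(H)]
    for i in range(H):                      # right-to-left: running max toward the right
        m = float("-inf")
        for j in range(W - 1, -1, -1):
            m = max(m, feat_h[i][j])
            right_max[i][j] = m
    for j in range(W):                      # bottom-to-top: running max downward
        m = float("-inf")
        for i in range(H - 1, -1, -1):
            m = max(m, feat_v[i][j])
            down_max[i][j] = m
    return [[right_max[i][j] + down_max[i][j] for j in range(W)]
            for i in range(H)]
```

Bottom-right corner pooling would flip both scan directions (max to the left plus max upward).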
There is an ablation study on corner pooling here:
with corner pooling, the AP of medium and large objects in particular rose markedly.
In practice it is implemented in PyTorch.
One peculiarity is that no pretrained network is used; the reason isn't given.
Earlier, the network's outputs were the heatmaps, embeddings, and offsets;
from these we now have to build the actual bounding boxes.
The paper says it uses a "simple post-processing algorithm", though there are quite a few steps.
First, the locally highest values are extracted with 3 x 3 max pooling,
and from those the top 100 top-left and bottom-right corners are picked.
The offsets are then added, and the distance between top-left and bottom-right embeddings is computed as the L1 distance.
Pairs whose distance exceeds 0.5 or whose classes differ are removed,
and bounding boxes are built from the paired points that remain.
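The pairing and filtering steps (L1 embedding distance at most 0.5, same class, averaged score) might look roughly like this toy sketch; the geometric in-order check is my own added sanity filter, not something stated on the slide:

```python
def pair_corners(tl, br, max_dist=0.5):
    """tl, br: lists of (x, y, class_id, embedding, score) corner candidates.
    Returns boxes (x1, y1, x2, y2, class_id, score) for surviving pairs."""
    boxes = []
    for (x1, y1, c1, e1, s1) in tl:
        for (x2, y2, c2, e2, s2) in br:
            # Reject cross-category pairs and distant embeddings (L1 > max_dist).
            if c1 != c2 or abs(e1 - e2) > max_dist:
                continue
            # Assumed sanity check: bottom-right must lie below-right of top-left.
            if x2 <= x1 or y2 <= y1:
                continue
            boxes.append((x1, y1, x2, y2, c1, (s1 + s2) / 2.0))
    return boxes
```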
The results say: ours works well.
It would have been nice to include some failure cases too;
that's a slight pity.
And compared against other SOTA detectors,
it is the highest among one-stage detectors
and is said to be comparable even to two-stage detectors.
To conclude once more:
this paper proposed a network called CornerNet,
a method for finding bounding boxes from the top-left and bottom-right corners.
A technique called corner pooling pushed the performance a bit higher,
and by COCO mAP it showed SOTA performance among single-stage detectors.
That about sums it up.
Among the questions in the oral session video,
one asked whether backbones other than the hourglass network had been tried;
the answer was that they tried models such as ResNet and ResNeXt, but performance was worse.
To the question of what happens when occlusion occurs between points, the answer was that there is no good remedy and it seems to be future work.
Personally, I'm a little suspicious of corner pooling: in settings like surveillance video where the same person class appears many times,
using that kind of pooling seems to risk mixing activations across different objects.
For example, if there is a 6 here, then everything becomes 6, right? That sort of thing could be a problem.
I also have questions about speed.
As I mentioned, one-stage detectors are usually compared in a way that emphasizes speed
while being slightly worse or comparable in accuracy, so I found it odd that there is no table with FPS.
Instead, there is a statement that inference time is about 244 ms per image on a Titan X,
which is disappointing for a one-stage detector. This plot is taken from the RetinaNet paper, and as far as I know
the GPU used for that benchmark was also Titan-X class; measured against it, CornerNet may be more accurate
than other one-stage detectors, but its inference time is very slow. So in the end I wondered
whether achieving SOTA among one-stage detectors on the COCO dataset means that much.
In this work, we propose CornerNet, a new one-stage detector that does away with anchor boxes. We reformulate object detection as detecting and grouping keypoints. In particular, we detect the top-left corners and bottom-right corners of bounding boxes, and pair them to form individual object instances.