PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector - Jinwon Lee
This is the 270th paper review of the TensorFlow Korea paper-reading group PR12.
This paper is PP-YOLO: An Effective and Efficient Implementation of Object Detector from Baidu. By applying a variety of techniques to YOLOv3, it catches both rabbits at once: very high accuracy together with very high speed. I took a deeper look at the various tricks used in the paper. If you are interested in techniques used for object detection such as deformable convolution, exponential moving average, DropBlock, IoU-aware prediction, grid sensitivity elimination, Matrix NMS, and CoordConv, please check out the video and the slides!
Paper link: https://arxiv.org/abs/2007.12099
Video link: https://youtu.be/7v34cCE5H4k
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/auvizsystems/embedded-vision-training/videos/pages/may-2015-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Nagesh Gupta, CEO and Founder of Auviz Systems, presents the "Trade-offs in Implementing Deep Neural Networks on FPGAs" tutorial at the May 2015 Embedded Vision Summit.
Video and images are a key part of Internet traffic—think of all the data generated by social networking sites such as Facebook and Instagram—and this trend continues to grow. Extracting usable information from video and images is thus a growing requirement in the data center. For example, object and face recognition are valuable for a wide range of uses, from social applications to security applications. Deep neural networks are currently the most popular form of convolutional neural networks (CNN) used in data centers for such applications. 3D convolutions are a core part of CNNs. Nagesh presents alternative implementations of 3D convolutions on FPGAs, and discusses trade-offs among them.
[PR12] Inception and Xception - Jaejun Yoo
Introduction to Inception and Xception
video: https://youtu.be/V0dLhyg5_Dw
Papers:
Going Deeper with Convolutions
Rethinking the Inception Architecture for Computer Vision
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Xception: Deep Learning with Depthwise Separable Convolutions
Convolutional Neural Networks: Popular Architectures - ananth
In this presentation we look at some of the popular architectures, such as ResNet, that have been successfully used for a variety of applications. Starting from AlexNet and VGG, which showed that deep learning architectures can deliver unprecedented accuracies for image classification and localization tasks, we review other recent architectures such as ResNet, GoogLeNet (Inception), and the more recent SENet that have won ImageNet competitions.
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting - Taegyun Jeon
PR-050: Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
Original Slide from http://home.cse.ust.hk/~xshiab/data/valse-20160323.pptx
Youtube: https://youtu.be/3cFfCM4CXws
PR-217: EfficientDet: Scalable and Efficient Object Detection - Jinwon Lee
This is the 217th paper review of the TensorFlow Korea paper-reading group PR12.
This paper is EfficientDet from Google Brain. As a follow-up to EfficientNet, it proposes an object detection method that aims for both accuracy and efficiency. To that end it introduces a weighted bidirectional feature pyramid network (BiFPN) and a compound scaling method for detection similar to that of EfficientNet. Please see the video for details.
Paper link: https://arxiv.org/abs/1911.09070
Video link: https://youtu.be/11jDC8uZL0E
This is the 243rd paper review of the TensorFlow Korea paper-reading group PR12.
This paper is Designing Network Design Spaces from Facebook AI Research, better known as RegNet.
When designing a CNN, are bottleneck layers really good? Do more layers always give higher accuracy? When the width and height of the activation map are halved (stride 2 or pooling), the channel count is doubled; is that really the best choice? Might it be better to have no bottleneck layers at all, is there a magic number of layers for peak accuracy, and when the activations shrink by half, might tripling the channels beat doubling them?
Rather than designing a single good neural network, this paper is about designing a good design space: a space populated by good networks, in which techniques like AutoML can then search. It proposes a human-in-the-loop procedure that starts from a nearly unconstrained design space and progressively narrows it down to a good one. In the video below you can see from which design space RegNet, which outperforms EfficientNet, was born, and which design choices we took for granted turn out to be questionable.
Video link: https://youtu.be/bnbKQRae_u4
Paper link: https://arxiv.org/abs/2003.13678
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks - Jinwon Lee
This is the 95th presentation of the TensorFlow-KR paper-reading group.
Titled Modularity Matters, this paper proposes a way to solve visual relational reasoning problems. It shows that existing CNNs are weak at such problems and presents a method to address this. I reviewed it because the topic interests me and because it comes from Prof. Bengio's team.
Presentation video: https://youtu.be/dAGI3mlOmfw
Paper link: https://arxiv.org/abs/1806.06765
Final presentation of my thesis on "A Neurally Controlled Robot That Learns" at Imperial College, 22 Sept 2011.
Full thesis incl. source code available on GitHub:
https://github.com/bwalther/DA-STDP-modulated-learning-in-mobile-robots
PR-144: SqueezeNext: Hardware-Aware Neural Network Design - Jinwon Lee
This is the 144th paper review of the TensorFlow-KR paper-reading group PR12.
This time I reviewed SqueezeNext, one of the representative efficient CNNs. I also reviewed its predecessor SqueezeNet, and since SqueezeNext ranks first under NetScore, a paper on a metric for evaluating CNNs, I reviewed NetScore as well.
Paper links:
SqueezeNext - https://arxiv.org/abs/1803.10615
SqueezeNet - https://arxiv.org/abs/1602.07360
NetScore - https://arxiv.org/abs/1806.05512
Video link: https://youtu.be/WReWeADJ3Pw
PR-231: A Simple Framework for Contrastive Learning of Visual Representations - Jinwon Lee
This is the 231st paper review of the TensorFlow Korea paper-reading group PR12.
This paper is A Simple Framework for Contrastive Learning of Visual Representations from Google Brain. It has drawn extra attention recently, in part because Geoffrey Hinton is the last author.
It is a self-supervised learning paper using contrastive learning, a very hot topic lately, and it proposes an unsupervised pre-training method that matches the accuracy of a ResNet-50 trained with supervised learning. Using data augmentation, a non-linear projection head, large batch sizes, longer training, the NT-Xent loss, and more, it shows that excellent representation learning is possible, with outstanding results in semi-supervised learning and transfer learning as well. Please see the video for details.
Paper link: https://arxiv.org/abs/2002.05709
Video link: https://youtu.be/FWhM3juUM6s
PR-297: Training data-efficient image transformers & distillation through att... - Jinwon Lee
Hello, this is the 297th review of the TensorFlow Korea paper-reading group PR-12.
Only three papers remain until the end of PR-12 season 3.
Recruiting of new members for season 4 will begin right after season 3 ends; your interest and applications are very welcome!
(The recruiting announcement will be posted in the Facebook TensorFlow Korea group.)
The paper I reviewed today is Training data-efficient image transformers & distillation through attention from Facebook.
Since Google's ViT paper, interest in computer vision algorithms that use only attention, without any convolution, is higher than ever.
The DeiT model proposed in this paper uses the same architecture as ViT, but where ViT underperformed when trained on ImageNet data alone,
DeiT achieves better accuracy than EfficientNet using only ImageNet data, thanks to an improved training recipe and a new knowledge distillation method.
Will CNNs really fade away now? Will attention conquer computer vision too?
Personally, I am convinced that attention-based CV papers will pour out for a while, and that surprising things can happen there.
CNNs have advanced through a decade of research, while transformers have only just been applied to CV, so the expectations are even higher,
and since attention is the model form with the least inductive bias, I think it can produce even more surprising results.
OpenAI's recent DALL-E is a representative example. If you are curious about yet another transformation by the Transformer, please see the video below.
Video link: https://youtu.be/DjEvzeiWBTo
Paper link: https://arxiv.org/abs/2012.12877
For real-world applications, a convolutional neural network (CNN) model can take more than 100 MB of space and can be computationally too expensive. Therefore, multiple methods exist in the state of the art to reduce this complexity. Ristretto is a plug-in to the Caffe framework that employs several model approximation methods. For this project, a CNN model is first trained on the CIFAR-10 dataset with Caffe; then Ristretto is used to generate multiple approximated versions of the trained model using different schemes. The goal of this project is a comparison of the models in terms of execution performance, model size, and cache utilization in the test or inference phase. The same steps are done with TensorFlow and its quantization tool. The quantization schemes of TensorFlow and Ristretto are then compared.
Modern Convolutional Neural Network techniques for image segmentation - Gioele Ciaparrone
Recently, Convolutional Neural Networks have been successfully applied to image segmentation tasks. Here we present some of the most recent techniques that increased the accuracy in such tasks. First we describe the Inception architecture and its evolution, which allowed increasing the width and depth of the network without increasing the computational burden. We then show how to adapt classification networks into fully convolutional networks, able to perform pixel-wise classification for segmentation tasks. We finally introduce the hypercolumn technique to further improve the state of the art on various fine-grained localization tasks.
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks - Jinwon Lee
This is the 169th paper review of the TensorFlow-KR paper-reading group PR12.
This paper is EfficientNet from Google. Research on efficient neural networks has usually focused on small networks for edge devices with limited computing power, such as mobile phones. In practice, however, networks are commonly grown larger and larger to raise accuracy, and this paper studies how to scale a network up more efficiently in that setting. Please see the video for details.
Paper link: https://arxiv.org/abs/1905.11946
Video link: https://youtu.be/Vhz0quyvR7I
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/sep-2019-alliance-vitf-facebook
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Raghuraman Krishnamoorthi, Software Engineer at Facebook, delivers the presentation "Quantizing Deep Networks for Efficient Inference at the Edge" at the Embedded Vision Alliance's September 2019 Vision Industry and Technology Forum. Krishnamoorthi gives an overview of practical deep neural network quantization techniques and tools.
Introduction to computer vision with Convolutional Neural Networks - MarcinJedyk
Introduction to computer vision with convolutional neural networks: going over the history of CNNs, describing basic concepts such as convolution, and discussing applications of computer vision and image recognition technologies.
Once-for-All: Train One Network and Specialize it for Efficient Deployment - taeseon ryu
Hello, this is the Deep Learning Paper Reading Group! The paper introduced today is Once-for-All: Train One Network and Specialize it for Efficient Deployment.
The paper looks at the situation of actually deploying models to hardware. The biggest problem it points out is that there are simply too many hardware environments on which a trained model must be deployed; every device has different resources, so finding a model that fits every piece of hardware is practically impossible.
The usual question is what to do when the optimal network architecture differs for each hardware target. One possible approach is to search for the optimal architecture for each device separately, but that requires an impractical amount of computation. Taking the Samsung Note 10 as an example: if an application requires the model to run within 20 ms, then to find which model fits within 20 ms and what accuracy it achieves, you would have to evaluate all of the blue points in the figure, and each point means one training run. So in effect you must run many trainings and then pick the best among them. As the number of deployment scenarios grows, this cost grows linearly,
so finding the optimal network for each hardware target is practically impossible.
The approach OFA proposes: once you train one network, you no longer need to retrain for each hardware target; you simply take the sub-network that fits each environment. That is the main approach.
Donghyun Kim of the Fundamental team kindly prepared today's detailed review. Thank you in advance for your interest!
NIT Silchar ML Hackathon 2019 session on computer vision with deep learning.
Target audience / prerequisite: basic knowledge of machine learning and deep learning.
Recurrent Neural Networks have been shown to be very powerful models, as they can propagate context over several time steps. Because of this they can be applied effectively to several problems in Natural Language Processing, such as language modelling, tagging problems, speech recognition, etc. In this presentation we introduce the basic RNN model and discuss the vanishing gradient problem. We describe LSTM (Long Short-Term Memory) and Gated Recurrent Units (GRU). We also discuss bidirectional RNNs with an example. RNN architectures can be considered deep learning systems where the number of time steps plays the role of the depth of the network. It is also possible to build the RNN with multiple hidden layers, each having recurrent connections from the previous time steps that represent abstraction both in time and in space.
MXNet image segmentation to predict and diagnose the cardiac diseases karp... - KannanRamasamy25
A powerful open-source deep learning framework.
MXNet supports multiple languages such as C++, Python, R, Julia, Perl, etc.
MXNet is supported by Intel, Dato, Baidu, Microsoft, Wolfram Research, and research institutions such as Carnegie Mellon, MIT, the University of Washington, and the Hong Kong University of Science and Technology.
Symbolic execution: a static symbolic graph executor, which provides efficient symbolic graph execution and optimization.
Supports efficient deployment of a trained model to low-end devices for inference, such as mobile devices, IoT devices (using AWS Greengrass), serverless (using AWS Lambda), or containers.
In this talk, after a brief overview of AI concepts, in particular machine learning (ML) techniques, some of the well-known computer design concepts for high performance and power efficiency are presented. Subsequently, those techniques that have had a promising impact on computing ML algorithms are discussed. Deep learning has emerged as a game changer for many applications in various fields of engineering and the medical sciences. Although the primary computation function is matrix-vector multiplication, many competing efficient implementations of this primary function have been proposed and put into practice. This talk will review and compare some of those techniques that are used for ML computer design.
3. NetAdapt: Platform Aware Neural Network Adaptation for Mobile Applications (Google)
• Typical optimization focuses on reducing indirect metrics such as MACs / FLOPs
• But does that actually optimize direct metrics such as latency and energy consumption? (Not necessarily.) NetAdapt optimizes with these direct metrics in mind
• Empirical measurements
• Contribution
• Automatically and progressively simplifies a pre-trained network until the resource budget is met, while maximizing accuracy
• Achieves better accuracy-versus-latency trade-offs on mobile CPU & GPU, compared with state-of-the-art automated network simplification algorithms
• Method
• Rather than meeting the given constraints in one shot, tighten the constraint iteratively while optimizing accuracy at each step
• Per step: adjust the filter count of the layer that meets the tightened constraint with the smallest accuracy drop
• Slow
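As a rough illustration of this loop, here is a minimal sketch (my own, not the authors' code), assuming a hypothetical per-layer latency look-up table `lut` and a stand-in `eval_acc` for short-term fine-tuning plus evaluation:

```python
# A "model" is a dict {layer_name: n_filters}; lut[layer][n_filters] gives
# the measured latency of that layer configuration.

def latency_from_lut(model, lut):
    """Approximate total latency by summing per-layer LUT entries."""
    return sum(lut[layer][n] for layer, n in model.items())

def netadapt(model, lut, eval_acc, target_latency, step=0.05):
    budget = latency_from_lut(model, lut)
    while budget > target_latency:
        budget *= 1.0 - step                 # tighten the constraint gradually
        candidates = []
        for layer in model:                  # one candidate per layer
            cand = dict(model)
            # shrink this layer until the tightened budget is met
            while cand[layer] > 1 and latency_from_lut(cand, lut) > budget:
                cand[layer] -= 1
            if latency_from_lut(cand, lut) <= budget:
                candidates.append((eval_acc(cand), cand))
        # keep the candidate with the smallest accuracy drop
        model = max(candidates, key=lambda c: c[0])[1]
    return model  # long-term fine-tuning is run once, on this final network
```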
6. Algorithm Details
• Empirical Measurements
• Build a look-up table per layer to save as much measurement time as possible
• Choose which Filter
• Remove filters in increasing order of L2-norm magnitude
• Computing their joint influence and pruning on that basis would be another option*
• Fine-tuning
• Roughly compare candidates with short-term fine-tuning; run long-term fine-tuning only on the final result
• Short-term training: about 40k iterations, w/ ImageNet training set minus a 10,000-image holdout set
*Yang, Tien-Ju and Chen, Yu-Hsin and Sze, Vivienne: Designing energy-efficient convolutional neural networks using energy-aware pruning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2017)
8. ADC: Automated Deep Compression and Acceleration with Reinforcement Learning (Song Han)
• NetAdapt's competitor
• LPIRC: Google achieved better accuracy than ADC & is practical
• Efficient DL workshop: Song Han: "NetAdapt is slow!"
• Reinforcement-learning-based agent
• Efficient design space exploration
• Accuracy & compression rate
• Sampling the design space greatly improves the model compression quality
• Even better than human expertise!
9. ADC Agent
• w/ continuous compression ratio control (DDPG*)
• Receives a reward based on the approximated model performance, without fine-tuning
• Accuracy & overall compression rate
• Further scenarios: FLOPs-constrained compression & accuracy-guaranteed compression
• Processes a network in a layer-by-layer manner
• Input: layer embedding state $s_t$
• Output: a fine-grained sparsity ratio for each layer
* N. Johnson, S. Kotz, and N. Balakrishnan. Continuous univariate probability distributions, (vol. 1), 1994.
10. Algorithm
• Specified compression algorithms (reducing channels to c') for an n × c × k × k layer:
• Spatial decomposition[1]: n × c' × k × 1, c' × c × 1 × k (data-independent reconstruction)
• Channel decomposition[2]: n × c' × k × k, c' × c × 1 × 1
• Channel pruning[3]: n × c' × k × k (L2-norm / magnitude-based pruning)
• Agent
• Each transition in an episode is $(s_t, a_t, R, s_{t+1})$
• The agent is trained with a reward based on the error[4] after the action is applied
• FLOPs-constrained compression
• R = -Error
• Compress the network once first, then use heuristics to progressively push it under the given budget
• Accuracy-guaranteed compression
• Observe that the accuracy error is inversely proportional to log(FLOPs)
• R = -Error * log(FLOPs)
[1] M. Jaderberg, A. Vedaldi, and A. Zisserman. Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866, 2014.
[2] X. Zhang, J. Zou, K. He, and J. Sun. Accelerating very deep convolutional networks for classification and detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(10):1943–1955, 2016.
[3] Y. He, X. Zhang, and J. Sun. Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1389–1397, 2017.
[4] B. Baker, O. Gupta, N. Naik, and R. Raskar. Designing neural network architectures using reinforcement learning.
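A minimal sketch of the two reward definitions above, as I read them from the slide (helper names are mine):

```python
import math

def reward_flops_constrained(error):
    # the FLOPs budget is enforced by the heuristic allocation, so the
    # reward tracks accuracy only: R = -Error
    return -error

def reward_accuracy_guaranteed(error, flops):
    # error is observed to be inversely proportional to log(FLOPs), so
    # R = -Error * log(FLOPs) trades accuracy against model size
    return -error * math.log(flops)
```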
12. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference (Google, CVPR 2018)
• How to train quantized neural networks?
• Previous quantization approaches:
• Tend to tackle only easy problems (AlexNet, ResNet, VGG), all over-parameterized
• Consider only compression, not computational efficiency
• Look-up-table methods: perform poorly on common devices
• Methods relying on bitwise operations such as shift / XOR gain little on existing hardware
• Fully XOR-based networks suffer from performance degradation
• Quantization scheme
• Weights / activations: 8-bit integers
• Bias vectors: 32-bit integers
• Quantized inference / training framework
• Adopted in TFLite (inference)
• Inference: integer-only arithmetic / training: floating-point arithmetic
13. Quantized Inference
• Affine mapping
• $r = S(q - Z)$
• maps integers $q$ to real numbers $r$; $S$, $Z$ are the quantization parameters
• Uses a single set of quantization parameters for all values within each activations array and within each weights array
• Computation of matrix multiplication
• For $r_3 = r_1 r_2$ (each $r_\alpha$: an N × N matrix):
• $r_3^{(i,k)} = \sum_{j=1}^{N} r_1^{(i,j)} r_2^{(j,k)}$
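To make the arithmetic concrete, here is a small NumPy sketch of the affine scheme and an integer-only matmul with int32 accumulation; the uint8 range and helper names are illustrative assumptions, not TFLite's actual kernels:

```python
import numpy as np

def quantize(r, scale, zero_point):
    # r = S * (q - Z)  =>  q = round(r / S) + Z, stored as uint8
    q = np.round(r / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.int32) - zero_point)

def quantized_matmul(q1, s1, z1, q2, s2, z2, s3, z3):
    # accumulate sum_j (q1 - Z1)(q2 - Z2) in int32
    acc = (q1.astype(np.int32) - z1) @ (q2.astype(np.int32) - z2)
    # requantize to the output array's scale: q3 = Z3 + (S1*S2/S3) * acc
    q3 = np.round((s1 * s2 / s3) * acc) + z3
    return np.clip(q3, 0, 255).astype(np.uint8)
```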
14. Quantized Inference
• Bias quantization
• Bias quantization error acts as an overall bias
• 32-bit representation
• $Z_{bias} = 0$, $S_{bias} = S_1 \cdot S_2$
• Things left to do
• Scale down to the final scale (8-bit output activations)
• Cast down to uint8
• Apply the activation function to yield the final 8-bit output activation
15. Training with simulated quantization
• All weights & biases are stored in floating point
• Weights are quantized before they are convolved with the input
• Activations are quantized at the points where they would be during inference
• Tuning quantization parameters
• Weights: linear quantization over the [min value, max value] range
• Activations: ranges tracked with exponential moving averages
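A minimal sketch of this simulated ("fake") quantization idea, assuming an 8-bit grid and a hypothetical EMA range tracker:

```python
import numpy as np

def fake_quant(x, x_min, x_max, num_bits=8):
    """Round x to the num_bits grid over [x_min, x_max], staying in float."""
    scale = (x_max - x_min) / (2 ** num_bits - 1)
    q = np.round((np.clip(x, x_min, x_max) - x_min) / scale)
    return q * scale + x_min  # float again, so the next layer runs as usual

class EmaRange:
    """Track an activation range with an exponential moving average."""
    def __init__(self, momentum=0.99):
        self.momentum, self.min, self.max = momentum, None, None

    def update(self, x):
        lo, hi = float(x.min()), float(x.max())
        if self.min is None:
            self.min, self.max = lo, hi
        else:
            m = self.momentum
            self.min = m * self.min + (1 - m) * lo
            self.max = m * self.max + (1 - m) * hi
        return self.min, self.max
```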
18. SBNet: Sparse Blocks Network for Fast Inference (Uber)
• A low-cost computation mask reduces computation in the high-resolution main network
• Tiling-based sparse convolution algorithm
• Implements a tiling-based GPU kernel
• LiDAR 3D object detection tasks
19. Sparse Blocks Network
• How to handle sparse input?
• Mask to indices: extract a list of active location indices
• Sparse gather/scatter: extract data from the sparse inputs
• Signal processing: overlap-save algorithm
• Repeated gathering / scattering while processing
21. Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions (UC Berkeley, Kurt Keutzer)
• Shift-based module
• Uses the shift operation to mix spatial information across channels
• Let's use a simple shift operation instead of depth-wise convolution!
• A series of memory operations that shifts the channels of the input tensor in certain directions
• Assigns a different shift kernel per channel group: $k^2$ different shift kernels, with each group of $M/k^2$ channels adopting one shift
• Results
• It looks not that efficient
• But it can be adapted to MIDAP easily
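A rough NumPy sketch of the shift operation as described above (my own illustration; the paper's channel-grouping details may differ):

```python
import numpy as np

def shift(x, kernel_size=3):
    """x: feature map of shape (C, H, W); zero FLOPs, zero parameters."""
    c = x.shape[0]
    half = kernel_size // 2
    offsets = [(dy, dx) for dy in range(-half, half + 1)
                        for dx in range(-half, half + 1)]   # k*k offsets
    out = np.zeros_like(x)
    group = int(np.ceil(c / len(offsets)))   # ~C / k^2 channels per shift
    for i, (dy, dx) in enumerate(offsets):
        chs = slice(i * group, min((i + 1) * group, c))
        shifted = np.roll(x[chs], shift=(dy, dx), axis=(1, 2))
        # zero out wrapped-around borders so it is a true shift, not a roll
        if dy > 0: shifted[:, :dy, :] = 0
        elif dy < 0: shifted[:, dy:, :] = 0
        if dx > 0: shifted[:, :, :dx] = 0
        elif dx < 0: shifted[:, :, dx:] = 0
        out[chs] = shifted
    return out
```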
22. Shift based modules
• (Shift-)Conv-Shift-Conv module
• $SC^2$ module / CSC module
• Shift kernel
• Size $D_k$: $D_k^2$ possible shift matrices
• Dilation rate: similar to dilated convolution
• Expansion rate $\varepsilon$: expand the channel size via a 1x1 convolution kernel to gather sufficient information with the shift operation
• Only 1x1 convolutions
• Target
• Mobile / IoT applications
• Memory footprint reduction
24. Squeeze-and-Excitation Networks (Momenta & Oxford)
• 1st place winner of the ILSVRC 2017 classification task
• Suggests the SE block
• Feature recalibration
• Squeeze: global average pooling (H × W → 1 × 1)
• Excitation: adaptive recalibration (captures channel-wise dependencies)
25. Squeeze & Excitation
• Excitation
• Gating mechanism with two fully connected layers
• Acts similarly to an attention module
• Results
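A minimal NumPy sketch of an SE block, assuming toy fully connected weights `w1`, `w2` (hypothetical names):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on x of shape (C, H, W).

    w1: (C/r, C) and w2: (C, C/r) are the two FC layers (reduction r)."""
    s = x.mean(axis=(1, 2))                  # squeeze: global average pool
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0))  # excitation: FC-ReLU-FC-sigmoid
    return x * e[:, None, None]              # channel-wise recalibration

# toy usage: C=8, reduction r=2
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((4, 8))
w2 = rng.standard_normal((8, 4))
y = se_block(x, w1, w2)
```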
26. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (Megvii Inc.)
• Simple idea
• State-of-the-art architectures: 1x1 conv + DWConv + 1x1 conv
• Intuitive shuffling: 1x1 group conv + shuffle + DWConv + 1x1 group conv
• For g × n output channels (g: # of groups): reshape to (g, n) → transpose to (n, g) → flatten back to g × n (see the sketch below)
• Good results
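A small NumPy sketch of that reshape-transpose-flatten shuffle:

```python
import numpy as np

def channel_shuffle(x, groups):
    """x: (C, H, W) with C = g * n; interleave channels across groups."""
    c, h, w = x.shape
    n = c // groups
    return x.reshape(groups, n, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

# toy usage: 6 channels tagged 0..5, 2 groups
x = np.arange(6, dtype=float)[:, None, None] * np.ones((6, 1, 1))
print(channel_shuffle(x, groups=2)[:, 0, 0])  # [0. 3. 1. 4. 2. 5.]
```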
27. CondenseNet: An Efficient DenseNet using Learned Group Convolutions (Cornell Univ.)
• Observation
• 1x1 group convolution usually leads to drastic reductions in accuracy
• Learned group convolution
• Removes superfluous computation in the DenseNet architecture via group convolution
• Automatic input feature grouping during training
28. CondenseNet Training
• Split the filters into G groups of equal size before training
• Random grouping for further condensation
• Condensation criterion
• For each input feature: the averaged absolute value of its weights across all outputs within the group
• Group Lasso
• Group-level sparsity
• Condensation procedure
• Condensation factor C
• C - 1 condensing stages
• Prune 1/C of the filter weights at the end of each stage
• Re-index the layer
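A rough sketch of one condensing stage for a 1x1 layer, assuming weights of shape (out_channels, in_channels) that split evenly into G groups (my simplification of the paper's procedure):

```python
import numpy as np

def group_lasso(w, groups):
    """Group-level sparsity penalty for weights w of shape (out, in)."""
    return sum(np.sqrt((g ** 2).sum(axis=0)).sum()
               for g in np.split(w, groups, axis=0))

def condense_stage(w, groups, condensation_factor):
    """Within each output group, zero out the 1/C fraction of input
    features with the smallest mean |weight| (the condensation criterion)."""
    for g in np.split(w, groups, axis=0):     # views into w: prunes in place
        score = np.abs(g).mean(axis=0)        # importance of each input
        k = len(score) // condensation_factor # prune 1/C of the inputs
        g[:, np.argsort(score)[:k]] = 0.0
    return w
```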
29. Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks (Nanyang Technological University & Adobe & Nvidia)
• Training the network w/ stochastic downsampling
31. Efficient video object segmentation via Network Modulation (Snap)
• Semi-supervised video segmentation
• A human can easily segment an object in the whole video without knowing its semantic meaning
• Typical scenario
• Given: first frame of a video along with an annotated object mask
• Task: to accurately locate the object in all following frames
• Modulator + segmentation network
• Previous approach: FCN pre-training + fine-tuning the network for each specific video sequence
• The fine-tuning step is inefficient
• Proposed: train the segmentation network only once, and train a modulator that fits the given task
• One-shot fine-tuning (an application of one-shot learning == meta-learning)
• Visual modulator (attention), spatial modulator
33. Mobile Video Object Detection with Temporally-Aware Feature Maps (Georgia Tech, Google)
• Video object detection
• ImageNet VID 2015 dataset
• Single-image object detector + LSTM
• LSTM layers create an interweaved recurrent-convolutional architecture
• Bottleneck-LSTM to reduce computational cost
• 15 FPS on a mobile CPU
• Smaller and faster than DFF (Deep Feature Flow)
• This work does not use optical flow estimation
34. Approach
• SSD + convolutional LSTMs
• MobileNet-SSD, removing the final layer
• Inject convolutional LSTM layers directly into the single-frame detector
• Allows the network to encode both spatial and temporal information
• Feature refinement with LSTMs
• Place a single LSTM after the Conv13 layer
• Stack multiple LSTMs after the Conv13 layer
• Place one LSTM after each feature map
36. Towards High Performance Video Object Detection (USTC, Microsoft Research)
• Recent works
• A motion estimation module is built into the network architecture
• Sparse feature propagation
• Run the expensive feature network only on sparse key frames
• Motion field
• Dense feature aggregation
• Utilize every frame to enhance accuracy
• This paper suggests a unified approach
• Sparsely recursive feature aggregation
• Spatially-adaptive partial feature updating
• Recompute features on non-key frames wherever the propagated features have bad quality
• Temporally-adaptive key frame scheduling
• Dynamic key frame scheduling
38. Low-shot Learning with Imprinted Weights (UCLA)
• How to recognize novel visual categories?
• Given base classes w/ abundant samples for training
• Exposed to previously unseen novel classes with a limited amount of training data for each category
• Directly set the weights for a new category based on an appropriately scaled copy of the embedding-layer activations for that training example
• Mimics a human's ability to accept new visual categories: the learner grows its capability as it encounters more categories and training samples
• A single imprinted weight vector is learned for each novel category
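A minimal sketch of imprinting, assuming unit-norm class templates stored as rows of the classifier weight matrix (helper names are mine):

```python
import numpy as np

def l2_normalize(v, eps=1e-8):
    return v / (np.linalg.norm(v) + eps)

def imprint(weights, embedding):
    """Append one unit-norm row (class template) per novel example.

    weights: (num_classes, D) array whose rows are unit-norm templates."""
    return np.vstack([weights, l2_normalize(embedding)[None, :]])

def classify(weights, embedding):
    # with unit-norm rows, the dot product is the cosine similarity
    return int(np.argmax(weights @ l2_normalize(embedding)))
```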
39. Metric Learning
• Proxy-based embedding training
• Previous work: neighborhood components analysis, which learns a distance metric
• Comparison against all other classes
• Proxy-based training
• Comparison against other negatively correlated proxies
• Trainable proxies
• I cannot understand this concept exactly
• Imprinting
• Remembering the semantic embeddings of low-shot examples as the templates for new classes
41. Memory Matching Networks for One-Shot Image Recognition (USTC, Microsoft)
• Writes the features of a set of labelled images into memory
• Reads from memory when performing inference
• A contextual learner employs the memory slots in a sequential manner to predict the parameters of CNNs for unlabeled images
• MM-Net can output one unified model irrespective of the number of shots and categories
42. One-Shot Image Recognition
• Given an unlabeled image $x$, predict its class $\hat{y}$
• $\hat{y} = \arg\max_{y_n} P(y_n \mid x, S)$, where $P(y_n \mid x, S) = f(x \mid S)^{\top} \cdot g(x_n \mid S)$
• Different embedding functions for the unlabeled image and the support images
• $x_n$: support sample of label $y_n$
• Design a memory module to encode the contextual information within the support set into the memory via a write controller
• Memory: consists of M key-value pairs
• Key: $D_m$-dimensional memory representation
• Value: class label
• Write controller
• Encodes the sequence of N support images into M memory slots
• Aiming to distill the intrinsic characteristics of classes
• Contextual embedding
• For the support set / unlabeled image
• bi-LSTM-based approach
43. Feature Generating Networks for Zero-Shot Learning (Saarland Informatics Campus)
• How to cope with unseen classes? (zero-shot learning task)
• Use a GAN to synthesize features of unseen classes
• Use class-level semantic information
45. Dual Skipping Networks (Fudan Univ, Tencent AI)
• Inspired by neuroscience studies
• Coarse-to-fine object categorization
• Mimicking the behavior of the human brain
• LH (fine grain) & RH (coarse grain)
• Proposes a layer-skipping mechanism
• Learns a gating network to predict which layers to skip
46. Model
• The network has left-right subnets, by reference to the LH and RH
• At first, both branches have roughly the same initialized layers and structures
• Skip-Dense Block
• Dense layer: residual or DenseNet-based block
• Gating network
• Path selection
• Whether or not to skip each convolutional layer, learned from the training data
• Threshold function of the gating network
• Performs as a binary classifier
• Training: acts as a scale value
• Testing: a discrete binary value (0: skip)
• Guide
• The faster coarse subnet can guide the slower fine/local subnet
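A tiny sketch of such a gate: a soft scale during training and a hard 0/1 skip decision at test time (the gating weights `w` are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate(features, w, training, threshold=0.5):
    """Soft scale in training, hard skip decision (0/1) at test time."""
    g = sigmoid(w @ features)             # gating network output in (0, 1)
    if training:
        return g                          # multiply the block's output by g
    return 0.0 if g < threshold else 1.0  # 0 means: skip this layer
```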
48. Deep Mutual Learning (Dalian University of Technology, China)
• Model distillation
• A powerful large network teaches a small network
• Deep mutual learning
• An ensemble of students learns collaboratively & teaches each other
• Collaborative learning
• Dual learning[1]: two cross-lingual translation models teach each other
• Cooperative learning[2]: recognizing the same set of object categories but with different inputs (e.g., RGB + depth)
• This work: different models, but the same input and task
• No a-priori powerful teacher network is necessary!
[1] D. He, Y. Xia, T. Qin, L. Wang, N. Yu, T. Liu, and W. Ma. Dual learning for machine translation. In NIPS, pages 820-828, 2016.
[2] T. Batra and D. Parikh. Cooperative learning with visual attributes. arXiv:1705.05512, 2017.
49. Deep Mutual Learning
• Uses KL divergence so that each network provides training experience to the other
• $D_{KL}(p_2 \| p_1) = \sum_{i=1}^{N} \sum_{m=1}^{M} p_2^m(x_i) \log \frac{p_2^m(x_i)}{p_1^m(x_i)}$ ($N$: # of samples, $M$: # of classes, $p_n$: output posterior of network $\theta_n$)
• Loss function: $L_{\theta_k} = L_{C_k} + \frac{1}{K-1} \sum_{l=1,\, l \neq k}^{K} D_{KL}(p_l \| p_k)$, and vice versa ($L_{C_k}$: classification loss)
• It can be extended to semi-supervised tasks
• (Label information is not required for the posterior computation)
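A small NumPy sketch of the two-network case, using the slide's KL definition (here averaged over the batch; helper names are mine):

```python
import numpy as np

def kl(p_from, p_to, eps=1e-12):
    """D_KL(p_from || p_to), averaged over the batch."""
    return float(np.mean(np.sum(
        p_from * np.log((p_from + eps) / (p_to + eps)), axis=1)))

def cross_entropy(p, y, eps=1e-12):
    return float(-np.mean(np.log(p[np.arange(len(y)), y] + eps)))

def mutual_losses(p1, p2, y):
    """p1, p2: (N, M) softmax outputs of the two peers; y: integer labels."""
    l1 = cross_entropy(p1, y) + kl(p2, p1)  # network 1 mimics network 2
    l2 = cross_entropy(p2, y) + kl(p1, p2)  # and vice versa
    return l1, l2
```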
52. Interpret Neural Networks by Identifying Critical Data Routing Paths (Tsinghua Univ.)
• Interpretable machine learning algorithm
• Explains, or presents in terms understandable to a human
• Distillation guided routing method
• Discover the critical nodes on the data routing paths for individual input samples
• Scalar control gates
• Decide whether each layer's output channel is critical for the decision
53. Methodology
• Pretrained model + channel-wise control gates
• Control gates are learned to find the optimal routing decision in the network
• A scale value for each channel
• Distillation guided routing
• Perform SGD on the same input for T = 30 iterations
• Most scalar values of the gates should be close to zero
• The output of the new network should be similar to the original network
• $\arg\min_{\Lambda} L\left(f_\theta(x),\, f_\theta(x; \Lambda)\right) + \gamma \sum_k \lambda_k$
• Gradients for the control gates: $\frac{\partial \mathrm{Loss}}{\partial \Lambda} = \frac{\partial L}{\partial \Lambda} + \gamma \cdot \mathrm{sign}(\Lambda)$
• CDRP representation
• $v$ for image $x$ = Concatenate(all $\Lambda$)
• Adversarial sample detection
• CDRP comparison
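A rough sketch of one gate-update step implied by the gradient above, with the task gradient `grad_task` assumed to come from backprop (hypothetical helper):

```python
import numpy as np

def gate_step(gates, grad_task, gamma=0.05, lr=0.1):
    """One SGD step on the control gates (lambdas): minimize
    L(f(x), f(x; gates)) + gamma * sum(gates), keeping gates >= 0."""
    grad = grad_task + gamma * np.sign(gates)   # the slide's gradient rule
    return np.clip(gates - lr * grad, 0.0, None)

# run for T = 30 iterations per input, then read off the near-zero gates
```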
54. Deep Photo Enhancer: Unpaired Learning for Image Enhancement from Photographs with GANs (National Taiwan Univ.)
• Problem
• Given a set of photographs w/ desired characteristics
• Transform an input image into an enhanced image with those characteristics
• MIT-Adobe 5K dataset
• 5K images: original images & several versions of retouched images
• Competitive samples: retouched images from photographer C
55. Network
• Define an enhancement by a set of examples Y
• Input X → U-Net-based generator → output (vs. Y) → discriminator
• Add an attention-based feature in the U-Net
• To capture global features (such as the sky)
• Can use a 2-way GAN for consistency checking
56. A2-RL: Aesthetics Aware Reinforcement Learning for Image Cropping
• Cropping the image to improve aesthetic quality
• AVA dataset*
• Traditional approach: sliding window method
• Time-consuming, fixed aspect ratio
• Weakly supervised aesthetics-aware reinforcement learning
• Train the agent using the actor-critic architecture
• Sequential decision making
* N. Murray, L. Marchesotti, and F. Perronnin. AVA: A large-scale database for aesthetic visual analysis. In CVPR, 2012.
57. RL Agent
• 14 pre-defined actions
• Reward function: aesthetic score
• Output of the pretrained view-finding network (aesthetic ranker), trained with the same dataset
58. Distort-and-Recover: Color Enhancement using Deep Reinforcement Learning (Lunit)
• Distort the original image & use the original image as ground truth for recovering
• Adobe-5K training set, but only utilizes the retouched images
• Training a reinforcement learning agent for color enhancement
• Compare the features & take an action
• Reduce the gap between the two images
59. Neural Style Transfer via Meta Networks (Peking Univ., National University of Singapore)
• Generates a network specialized for a specific style
• through one feed-forward pass in the meta network for neural style transfer
• No need for enormous training iterations to adopt a new style
• A small neural style transfer network is generated
60. Embodied Question Answering (Georgia Institute of Technology, Facebook AI)
• New AI task
• 3D environment
• Question → navigate to find the answer → answer
61. Excluded papers
• NestedNet: Learning Nested Sparse Structures in Deep Neural Networks (SNU)
• Real-Time Monocular Depth Estimation using Synthetic Data with Domain Adaptation via Image Style Transfer (Durham Univ.)
• Low-Latency Video Semantic Segmentation (CAS)
• Guided Proofreading of Automatic Segmentations for Connectomics (Harvard)
• Generative Adversarial Learning Towards Fast Weakly Supervised Detection (Xiamen Univ, Microsoft)
• Logo Synthesis and Manipulation with Clustered Generative Adversarial Networks (ETH Zurich)
• Neural Baby Talk (Georgia Institute of Technology, Facebook AI)
• Self-Supervised Feature Learning by Learning to Spot Artifacts (University of Bern)
• CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise (Microsoft AI)
Editor's Notes
Empirical measurements: build a look-up table per layer to save as much time as possible.
Input image resolution does not seem to be included in this overall process. (The process is run separately for each resolution.)
Which filter? for k from 1 to K (also shown in the figure on the right)
Idea: one could combine empirical experiments with a scheduling-problem formulation to design a network-tuning algorithm.
Even if you skip compressing the early part at first, the later part ends up compressed more, so it is a loss.
Plain-20, VGG16 4x, MobileNet
Operations such as residual / concat are hard to support.
Similar to our idea.
Channel contribution: computed based on the 1x1 weight values multiplied with that channel's activations.