○ Overview
Many researchers currently obtain networks with high recognition accuracy by designing them deeper and wider. As networks have grown, the number of parameters and the amount of computation have grown with them, and pruning-based compression algorithms have been proposed to address this. However, these methods cannot change the network architecture itself, so limitations that stem from the architecture remain unresolved.
Network recasting is a method that changes the network architecture itself in order to overcome limitations caused by structural characteristics. With network recasting, the blocks that make up a network can be converted into blocks of a different type. Each block is converted with block-wise recasting, and applying this method repeatedly changes the structure of the entire network. Using sequential recasting preserves inference accuracy better and also mitigates the vanishing gradient problem regardless of network architecture. Applying network recasting within the same network architecture reduces parameters and computation, while converting to a different type of network architecture accelerates the network. In the latter case, because the network architecture itself can be changed, the speedup can exceed the limit imposed by the original structure.
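The block-wise idea can be illustrated with a toy sketch. It assumes scalar inputs, a made-up two-layer source block, and a mean-squared-error objective that fits a one-layer target block to the source block's outputs; the names `source_block` and `recast_block`, the weights, and the choice of loss are hypothetical and are not taken from the talk.

```python
def source_block(x):
    """Stand-in for a pretrained two-layer block (weights are made up)."""
    hidden = max(0.0, 1.5 * x + 0.5)      # layer 1 + ReLU
    return max(0.0, 2.0 * hidden - 1.0)   # layer 2 + ReLU


def recast_block(inputs, lr=0.05, epochs=2000):
    """Fit a one-layer target block y = relu(a*x + c) to the source block's outputs."""
    a, c = 0.1, 0.0                        # hypothetical initial target weights
    targets = [source_block(x) for x in inputs]
    n = len(inputs)
    for _ in range(epochs):
        grad_a = grad_c = 0.0
        for x, t in zip(inputs, targets):
            pre = a * x + c
            if pre > 0.0:                  # ReLU passes gradient only where it is active
                err = pre - t
                grad_a += 2.0 * err * x / n
                grad_c += 2.0 * err / n
        a -= lr * grad_a                   # plain gradient descent on the MSE
        c -= lr * grad_c
    return a, c


inputs = [i / 10 for i in range(21)]       # sample inputs 0.0 .. 2.0
a, c = recast_block(inputs)
max_gap = max(abs(max(0.0, a * x + c) - source_block(x)) for x in inputs)
print(f"a={a:.3f}, c={c:.3f}, max output gap={max_gap:.2e}")
```

In this toy case the two source layers collapse into a single layer with fewer parameters. A real network would apply the same fit-then-replace step block by block, with the input distribution taken from actual training data.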
This document proposes a method called Factor Transfer for compressing complex networks via knowledge transfer from a teacher network to a student network. It introduces paraphrasing and translating modules to extract factors from the teacher and student networks and minimize their difference, unlike existing methods that directly compare outputs. Experiments on image classification datasets CIFAR-10, CIFAR-100 and ImageNet, as well as object detection, show the proposed method helps increase student network accuracy compared to directly transferring knowledge or attention from the teacher.
The document discusses relational knowledge distillation (RKD), a technique for transferring knowledge from a teacher model to a student model. It begins by providing background on knowledge distillation and recent approaches. It then introduces RKD, which transfers relational information between examples in the teacher's embedding space, such as distances and angles, rather than just individual example outputs. The document describes experiments applying RKD to metric learning, image classification, and few-shot learning, finding it improves student model performance over other distillation methods. It concludes RKD effectively leverages relational information to transfer knowledge between models.
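A distance-based relational loss of this kind might be sketched as follows. The sketch assumes that pairwise distances are normalised by their mean and compared with a Huber loss; both choices, and the function names, are assumptions for illustration, since the summary above does not specify the loss form.

```python
import math
from itertools import combinations


def pairwise_distances(embeddings):
    """Euclidean distance for every unordered pair of examples."""
    return [math.dist(u, v) for u, v in combinations(embeddings, 2)]


def normalized(dists):
    """Divide by the mean distance so the relation is scale-invariant."""
    mean = sum(dists) / len(dists)
    return [d / mean for d in dists]


def huber(x, y, delta=1.0):
    diff = abs(x - y)
    return 0.5 * diff * diff if diff <= delta else delta * (diff - 0.5 * delta)


def distance_relation_loss(teacher, student):
    """Penalise differences between teacher and student pairwise-distance structure."""
    t = normalized(pairwise_distances(teacher))
    s = normalized(pairwise_distances(student))
    return sum(huber(a, b) for a, b in zip(t, s)) / len(t)


teacher = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (6.0, 8.0, 0.0)]
same_shape = [(0.0, 0.0), (1.5, 2.0), (3.0, 4.0)]     # same relative distances, smaller space
other_shape = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]    # different relative distances
loss_same = distance_relation_loss(teacher, same_shape)
loss_other = distance_relation_loss(teacher, other_shape)
print(loss_same, loss_other)
```

Because only relations between examples are compared, the teacher and student embeddings may have different dimensionality, as in the example.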
Improving neural question generation using answer separation (NAVER Engineering)
Neural question generation (NQG) is the task of generating a question from a given passage with deep neural networks. Previous NQG models suffer from a problem that a significant proportion of the generated questions include words in the question target, resulting in the generation of unintended questions. In this paper, we propose answer-separated seq2seq, which better utilizes the information from both the passage and the target answer. By replacing the target answer in the original passage with a special token, our model learns to identify which interrogative word should be used. We also propose a new module termed keyword-net, which helps the model better capture the key information in the target answer and generate an appropriate question. Experimental results demonstrate that our answer separation method significantly reduces the number of improper questions which include answers. Consequently, our model significantly outperforms previous state-of-the-art NQG models.
Devil in the Details: Analysing the Performance of ConvNet Features (Ken Chatfield)
This document summarizes research comparing different convolutional neural network (CNN) architectures and feature representations on common image classification tasks. It finds that CNN-based methods outperform traditional bag-of-words models. Specifically, it compares different pre-trained CNNs, explores the effects of data augmentation, and shows that fine-tuning networks to target datasets improves performance. The best results are achieved with smaller filters, deeper networks, and ranking loss fine-tuning, outperforming more complex architectures. Code and models are available online for others to replicate the findings.
The performance of deep neural networks improves with more annotated data, but the budget for annotation is limited. One solution is active learning, where a model asks humans to annotate the data it perceives as uncertain. A variety of recent methods apply active learning to deep networks, but most of them are either designed specifically for their target tasks or computationally inefficient for large networks. In this paper, we propose a novel active learning method that is simple yet task-agnostic and works efficiently with deep networks. We attach a small parametric module, named “loss prediction module,” to a target network and train it to predict the target losses of unlabeled inputs. This module can then suggest data for which the target model is likely to produce a wrong prediction. The method is task-agnostic because networks are learned from a single loss regardless of target task. We rigorously validate our method through image classification, object detection, and human pose estimation with recent network architectures. The results demonstrate that our method consistently outperforms previous methods across these tasks.
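The selection step can be sketched in a few lines. The sketch assumes the loss prediction module has already produced one predicted loss per unlabeled sample; the dictionary, the sample ids and the function name `select_for_annotation` are hypothetical stand-ins.

```python
def select_for_annotation(predicted_losses, budget):
    """Return ids of the `budget` unlabeled samples with the highest predicted loss."""
    ranked = sorted(predicted_losses, key=predicted_losses.get, reverse=True)
    return ranked[:budget]


# Mocked outputs of a loss prediction module (values are made up).
predicted = {"img_a": 0.12, "img_b": 1.90, "img_c": 0.75, "img_d": 2.30}
chosen = select_for_annotation(predicted, budget=2)
print(chosen)  # ['img_d', 'img_b']
```

The chosen samples would then be sent to human annotators and added to the labeled set before the next training round.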
https://telecombcn-dl.github.io/2017-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
Emerging Properties in Self-Supervised Vision Transformers (Sungchul Kim)
The document summarizes the DINO self-supervised learning approach for vision transformers. DINO uses a teacher-student framework where the teacher's predictions are used to supervise the student through knowledge distillation. Two global and several local views of an image are passed through the student, while only global views are passed through the teacher. The student is trained to match the teacher's predictions for local views. DINO achieves state-of-the-art results on ImageNet with linear evaluation and transfers well to downstream tasks. It also enables vision transformers to discover object boundaries and semantic layouts.
Naver learning to rank question answer pairs using hrde-ltc (NAVER Engineering)
The automatic question answering (QA) task has long been considered a primary objective of artificial intelligence.
Among the QA sub-systems, we focused on the answer-ranking part. In particular, we investigated a novel neural network architecture with an additional data clustering module to improve performance in ranking answer candidates that are longer than a single sentence. This work can be used not only for the QA ranking task but also to evaluate the relevance of the next utterance to a given dialogue generated by a dialogue model.
In this talk, I'll present our research results (NAACL 2018) and their potential use cases (e.g. fake news detection). Finally, I'll conclude by discussing some issues with previous research and introducing recent approaches in academia.
Explores the type of structure learned by Convolutional Neural Networks, the applications where they're most valuable and a number of appropriate mental models for understanding deep learning.
A beginner's guide to Style Transfer and recent trends (JaeJun Yoo)
Style transfer techniques have evolved from matching gram matrices to using neural networks. Early methods matched gram statistics of CNN features to transfer texture styles. Recent work uses adaptive instance normalization and feed-forward networks. WCT2 achieves photorealistic transfer using wavelet transforms that satisfy the perfect reconstruction condition, enabling high resolution stylization and temporal consistency in videos without post-processing.
Efficient Neural Network Architecture for Image Classification (Yogendra Tamang)
The document outlines the objectives, methodology, and work accomplished for a project involving designing an efficient convolutional neural network architecture for image classification. The objectives were to classify images using CNNs and design an effective CNN architecture. The methodology involved designing convolution and pooling layers, and using gradient descent to train the network. Work accomplished included GPU configuration, designing CNN architectures for CIFAR-10 and MNIST datasets, and tracking training loss, validation loss, and accuracy over epochs.
The presentation covers convolutional neural network (CNN) design. First, the main building blocks of CNNs are introduced. Then we systematically investigate the impact of a range of recent advances in CNN architectures and learning methods on the object categorization (ILSVRC) problem. The evaluation tests the influence of the following architectural choices: non-linearity (ReLU, ELU, maxout, compatibility with batch normalization), pooling variants (stochastic, max, average, mixed), network width, classifier design (convolution, fully-connected, SPP), and image pre-processing; and of the following learning parameters: learning rate, batch size, cleanliness of the data, etc.
https://telecombcn-dl.github.io/dlmm-2017-dcu/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
Convolutional neural networks (CNNs) are better suited than traditional neural networks for processing image data due to properties of images. CNNs apply filters with local receptive fields and shared weights across the input, allowing them to detect features regardless of position. A CNN architecture consists of convolutional layers that apply filters, and pooling layers for downsampling. This reduces parameters and allows the network to learn representations of the input with minimal feature engineering.
Deep Learning for Computer Vision: A comparison between Convolutional Neural... (Vincenzo Lomonaco)
This document describes a study comparing Convolutional Neural Networks (CNNs) and Hierarchical Temporal Memories (HTMs) on object recognition tasks. The study implements a CNN using Theano, creates a new benchmark of image sequences from the NORB dataset, and evaluates the performance of CNNs and HTMs on the original NORB dataset and new image sequences. The results show that while CNNs achieve higher accuracy on the original NORB data, HTMs are more competitive on the image sequences and can achieve comparable performance using less training data. The study proves that bio-inspired approaches like HTM can advance deep learning research.
Recently, WaveNet, which predicts the probability distribution of speech samples auto-regressively, has provided a new paradigm for speech synthesis tasks.
Since the usage of WaveNet for speech synthesis varies with the conditional vectors, it is very important to design the baseline system structure effectively.
In this talk, I would like to first introduce various types of WaveNet vocoders, such as the conventional speech-domain approach and the recently proposed source-filter theory-based approach.
Then, I will explain linear prediction (LP)-based WaveNet speech synthesis, i.e., LP-WaveNet, which overcomes the limitations of source-filter theory-based WaveNet vocoders caused by the mismatch between the speech excitation signal and the vocal tract filter.
While presenting experimental setups and results, I would also like to share some know-how for successfully training the network.
Modern Convolutional Neural Network techniques for image segmentation (Gioele Ciaparrone)
Recently, Convolutional Neural Networks have been successfully applied to image segmentation tasks. Here we present some of the most recent techniques that increased the accuracy in such tasks. First we describe the Inception architecture and its evolution, which made it possible to increase the width and depth of the network without increasing the computational burden. We then show how to adapt classification networks into fully convolutional networks that can perform pixel-wise classification for segmentation tasks. We finally introduce the hypercolumn technique, which further improves the state of the art on various fine-grained localization tasks.
Convolutional neural networks (CNNs) learn multi-level features and perform classification jointly, and they outperform traditional approaches on image classification and segmentation problems. CNNs have four main components: convolution, nonlinearity, pooling, and fully connected layers. Convolution extracts features from the input image using filters. The nonlinearity applies an activation function so that the network can model non-linear relationships. Pooling reduces dimensionality while retaining important information. The fully connected layer uses high-level features for classification. CNNs are trained end-to-end using backpropagation to minimize output errors by updating weights.
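The first three components can be sketched in plain Python on a tiny image. The image, the 2x2 edge filter and the function names are made up for illustration; real networks learn their filters and use optimized library kernels.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation (what deep learning libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Element-wise nonlinearity: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Keep the strongest response in each non-overlapping 2x2 window."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


# 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_filter = [[-1, 1],
               [-1, 1]]                      # responds to dark-to-bright transitions

features = conv2d(image, edge_filter)        # 4x4 feature map
pooled = max_pool2x2(relu(features))         # 2x2 map after ReLU and pooling
print(pooled)  # [[2, 0], [2, 0]]
```

The same filter weights are reused at every position, which is why the edge is detected wherever it appears in the image.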
Convolutional Neural Networks : Popular Architectures (ananth)
In this presentation we look at some of the popular architectures that have been successfully used for a variety of applications. Starting from AlexNet and VGG, which showed that deep learning architectures can deliver unprecedented accuracy on image classification and localization tasks, we review more recent architectures such as ResNet, GoogLeNet (Inception) and SENet that have won ImageNet competitions.
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi... (Jinwon Lee)
The document summarizes a study on training Vision Transformers (ViTs) by exploring different combinations of data augmentation, regularization techniques, model sizes, and training dataset sizes. Some key findings include: 1) Models trained with extensive data augmentation on ImageNet-1k performed comparably to those trained on the larger ImageNet-21k dataset without augmentation. 2) Transfer learning from pre-trained models was more efficient and achieved better results than training models from scratch, even with extensive compute. 3) Models pre-trained on more data showed better transfer ability, indicating more data yields more generic representations.
Convolutional neural networks for image classification: evidence from Kaggle... (Dmytro Mishkin)
This document discusses convolutional neural networks for image classification and their application to the Kaggle National Data Science Bowl competition. It provides an overview of CNNs and their effectiveness for computer vision tasks. It then details various CNN architectures, preprocessing techniques, and ensembling methods that were tested on the competition dataset, achieving a top score of 0.609 log loss. The document concludes with highlights of the winning team's solution, including novel pooling methods and knowledge distillation.
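For readers unfamiliar with the metric, multi-class log loss can be sketched as the mean negative log of the probability assigned to the true class. The clipping constant `eps` and the example values are assumptions for illustration and are not taken from the competition.

```python
import math


def log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log-probability assigned to the correct class (lower is better)."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        p = min(max(probs[label], eps), 1.0 - eps)   # clip to avoid log(0)
        total -= math.log(p)
    return total / len(true_labels)


labels = [0, 1]                                       # true class index per sample
probs = [[0.5, 0.5], [0.25, 0.75]]                    # predicted class probabilities
score = log_loss(labels, probs)
print(round(score, 4))
```

A confident wrong prediction is penalised heavily, which is why averaging several models often helps on this metric.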
Convolutional Neural Network and RNN for OCR problem. (Vishal Mishra)
This document presents a thesis on using sequence-to-sequence learning with deep learning techniques for optical character recognition. The author aims to convert images of mathematical equations into LaTeX representations. Convolutional neural networks, recurrent neural networks, long short-term memory networks, and attention models are discussed as approaches. Details are provided on the architecture and workings of CNNs, RNNs, and LSTMs. The thesis will propose a model and discuss results and future work.
1) The Meta Network model proposes a two-level learning approach for few-shot learning. It includes a slow-learning meta-learner and a fast-learning base learner.
2) The meta-learner learns to generate fast weights for the base learner using gradient-based meta information from previous tasks. It stores these weights in a memory indexed by task embeddings.
3) Experiments on few-shot classification datasets like Omniglot and MiniImageNet demonstrate the model can learn new concepts from very few examples through fast adaptation of the base learner's weights.
Network Recasting: A Universal Method for Network Architecture Transformation (JoonsangYu2)
The document presents a method called network recasting that can transform pretrained neural network blocks into different block types. It trains a target block to match the output of a source block, then replaces the source with the target. This allows neural networks to be compressed by reducing filters or sped up by changing block types. Experiments showed network recasting outperformed previous methods by achieving lower error rates and up to a 2.1x inference speedup on ResNet-50 and 3.2x speedup on VGG-16.
Naver learning to rank question answer pairs using hrde-ltcNAVER Engineering
The automatic question answering (QA) task has long been considered a primary objective of artificial intelligence.
Among the QA sub-systems, we focused on answer-ranking part. In particular, we investigated a novel neural network architecture with additional data clustering module to improve the performance in ranking answer candidates which are longer than a single sentence. This work can be used not only for the QA ranking task, but also to evaluate the relevance of next utterance with given dialogue generated from the dialogue model.
In this talk, I'll present our research results (NAACL 2018), and also its potential use cases (i.e. fake news detection). Finally, I'll conclude by introducing some issues on previous research, and by introducing recent approach in academic.
Explores the type of structure learned by Convolutional Neural Networks, the applications where they're most valuable and a number of appropriate mental models for understanding deep learning.
A beginner's guide to Style Transfer and recent trendsJaeJun Yoo
Style transfer techniques have evolved from matching gram matrices to using neural networks. Early methods matched gram statistics of CNN features to transfer texture styles. Recent work uses adaptive instance normalization and feed-forward networks. WCT2 achieves photorealistic transfer using wavelet transforms that satisfy the perfect reconstruction condition, enabling high resolution stylization and temporal consistency in videos without post-processing.
Efficient Neural Network Architecture for Image ClassficationYogendra Tamang
The document outlines the objectives, methodology, and work accomplished for a project involving designing an efficient convolutional neural network architecture for image classification. The objectives were to classify images using CNNs and design an effective CNN architecture. The methodology involved designing convolution and pooling layers, and using gradient descent to train the network. Work accomplished included GPU configuration, designing CNN architectures for CIFAR-10 and MNIST datasets, and tracking training loss, validation loss, and accuracy over epochs.
The presentation is coverong the convolution neural network (CNN) design.
First,
the main building blocks of CNNs will be introduced. Then we systematically
investigate the impact of a range of recent advances in CNN architectures and
learning methods on the object categorization (ILSVRC) problem. In the
evaluation, the influence of the following choices of the architecture are
tested: non-linearity (ReLU, ELU, maxout, compatibility with batch
normalization), pooling variants (stochastic, max, average, mixed), network
width, classifier design (convolution, fully-connected, SPP), image
pre-processing, and of learning parameters: learning rate, batch size,
cleanliness of the data, etc.
https://telecombcn-dl.github.io/dlmm-2017-dcu/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
Convolutional neural networks (CNNs) are better suited than traditional neural networks for processing image data due to properties of images. CNNs apply filters with local receptive fields and shared weights across the input, allowing them to detect features regardless of position. A CNN architecture consists of convolutional layers that apply filters, and pooling layers for downsampling. This reduces parameters and allows the network to learn representations of the input with minimal feature engineering.
Deep Learning for Computer Vision: A comparision between Convolutional Neural...Vincenzo Lomonaco
This document describes a study comparing Convolutional Neural Networks (CNNs) and Hierarchical Temporal Memories (HTMs) on object recognition tasks. The study implements a CNN using Theano, creates a new benchmark of image sequences from the NORB dataset, and evaluates the performance of CNNs and HTMs on the original NORB dataset and new image sequences. The results show that while CNNs achieve higher accuracy on the original NORB data, HTMs are more competitive on the image sequences and can achieve comparable performance using less training data. The study proves that bio-inspired approaches like HTM can advance deep learning research.
Recently, WaveNet, which predicts the probability distribution of speech sample auto-regressively, provides a new paradigm in speech synthesis tasks.
Since the usage of WaveNet for speech synthesis varies by conditional vectors, it is very important to effectively design a baseline system structure.
In this talk, I would like to first introduce various types of WaveNet vocoders such as conventional speech-domain approach and recently proposed source-filter theory-based approach.
Then, I will explain a linear prediction (LP)-based WaveNet speech synthesis, i.e., LP-WaveNet, which overcomes the limitations of source-filter theory-based WaveNet vocoders caused by the mismatch between speech excitation signal and vocal tract filter.
While presenting experimental setups and results, I also would like to share some know-hows to successfully training the network.
Modern Convolutional Neural Network techniques for image segmentationGioele Ciaparrone
Recently, Convolutional Neural Networks have been successfully applied to image segmentation tasks. Here we present some of the most recent techniques that increased the accuracy in such tasks. First we describe the Inception architecture and its evolution, which allowed to increase width and depth of the network without increasing the computational burden. We then show how to adapt classification networks into fully convolutional networks, able to perform pixel-wise classification for segmentation tasks. We finally introduce the hypercolumn technique to further improve state-of-the-art on various fine-grained localization tasks.
Network Recasting: A Universal Method for Network Architecture Transformation
JoonsangYu2
The document presents a method called network recasting that can transform pretrained neural network blocks into different block types. It trains a target block to match the output of a source block, then replaces the source with the target. This allows neural networks to be compressed by reducing filters or sped up by changing block types. Experiments showed network recasting outperformed previous methods by achieving lower error rates and up to a 2.1x inference speedup on ResNet-50 and 3.2x speedup on VGG-16.
5. Related Works
Hardware architecture
Intel Skylake architecture [1]
NVIDIA Turing architecture [2]
• Traditional computer architectures are not efficient for DNNs.
• NVIDIA introduced Tensor Cores to accelerate DNNs.
6. Related Works
DL accelerator
DianNao architecture [3]
ZeNA architecture [4]
• To accelerate neural networks, several dedicated accelerators have also been introduced.
• DNNs consist of simple operations (MAC), so they are easy to accelerate.
• In addition, conditional memory access is also possible thanks to pruning.
7. Related Works
Network architecture
Big-Little architecture [5]
ShuffleNet v2 architecture [6]
• Many network architectures have been introduced to improve performance.
• In addition, much research also focuses on lightweight,
low-computation CNN architectures.
8. Related Works
Compression (pruning)
Example of pruning method: ThiNet [7]
• Pruning-based network compression methods were introduced.
• After training, we can remove weak weights or filters.
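The idea of removing weak filters after training can be illustrated with a minimal sketch. It assumes a simple L1-magnitude criterion, which is a generic stand-in and not the ThiNet criterion cited above; the function name `prune_filters` and the flattened-list filter representation are hypothetical.

```python
def prune_filters(filters, keep):
    """Keep the `keep` filters with the largest L1 norm (assumed criterion).

    `filters` is a list of flattened filters (lists of floats).
    Returns the surviving filters and their original indices, in order.
    """
    # Score each filter by the sum of absolute weights.
    scores = [sum(abs(v) for v in f) for f in filters]
    # Rank filter indices from strongest to weakest.
    ranked = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)
    # Preserve the original ordering among the survivors.
    kept = sorted(ranked[:keep])
    return [filters[i] for i in kept], kept


filters = [[0.1, -0.1], [2.0, -3.0], [0.0, 0.01], [1.0, 1.0]]
pruned, kept_indices = prune_filters(filters, keep=2)
```

Here the two filters with near-zero weights are dropped and the layer keeps filters 1 and 3. In practice the following layer's input channels would need to be sliced to match.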
10. Related Works
Compression (distillation)
Deep mutual learning [9]
• By distilling knowledge from a cumbersome model, a small network can
achieve higher accuracy than with conventional training.
13. Network Recasting
Network Recasting
• We transform pretrained blocks (source) into new blocks (target).
• The transformation is done by training the target block to generate output
activations (=feature map) similar to those of the source block.
• After training, the source block can be replaced with the target block.
Basic concept of network recasting.
Teacher network Student network
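The basic concept above can be sketched with a toy example. It is a stand-in under stated assumptions, not the authors' PyTorch implementation: the "source block" is two stacked 2x2 linear layers with made-up weights, the "target block" is a single 2x2 linear layer, and the target is fitted with plain SGD on an MSE loss between the two blocks' output activations. All names (`source_block`, `recast_block`) are hypothetical.

```python
import random

W1 = [[1.0, 0.0], [1.0, 1.0]]   # first layer of the source block (made-up weights)
W2 = [[1.0, -2.0], [0.5, 3.0]]  # second layer of the source block (made-up weights)


def matvec(w, x):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [sum(w[i][j] * x[j] for j in range(2)) for i in range(2)]


def source_block(x):
    """Pretrained source block: two stacked linear layers."""
    return matvec(W2, matvec(W1, x))


def recast_block(inputs, epochs=500, lr=0.05):
    """Train a single-layer target block to reproduce the source activations."""
    rng = random.Random(0)
    w = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(2)]
    for _ in range(epochs):
        for x in inputs:
            y_src = source_block(x)   # activations to imitate
            y_tgt = matvec(w, x)      # current target activations
            for i in range(2):
                err = y_tgt[i] - y_src[i]
                for j in range(2):
                    # Gradient of the squared error with respect to w[i][j].
                    w[i][j] -= lr * 2.0 * err * x[j]
    return w


inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]]
target_w = recast_block(inputs)  # approaches W2 @ W1 = [[-1.0, -2.0], [3.5, 3.0]]
```

After training, calls to `source_block` could be replaced by `matvec(target_w, x)`, which mirrors replacing the source block with the target block.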
24. Training Methods
Mixed-architecture network
• When we recast partially, we can obtain a mixed-architecture network.
• The mixed-architecture network combines the advantages of its constituent block types.
Figure: Mixed-architecture network, with labels "Bottom" and "Top" (image via Wikipedia).
26. Training Methods
Block Training
• To avoid the dimension mismatch problem, we train the target block together
with the next block, approximating the output activations of the next block.
Figure labels: 256-d, 64-d (dimension mismatch!); 256-d, 256-d.
37. Training Methods
Fine-tuning
• After finishing sequential recasting, we use the knowledge distillation
approach to fine-tune the student network.
• We train the student network with logits of the teacher network and
ground truth.
Loss terms: MSE loss for the logits; cross-entropy loss between the given label and the softmax output.
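The two loss terms named on this slide can be written out as a small sketch. The slide does not state how the terms are weighted, so the unweighted sum and the mean over logits used here are assumptions; `fine_tune_loss` is a hypothetical name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fine_tune_loss(student_logits, teacher_logits, label):
    """MSE on logits plus cross-entropy with the ground-truth label.

    Assumption: both terms are summed with equal weight.
    """
    n = len(student_logits)
    # MSE between the student's and the teacher's logits.
    mse = sum((s - t) ** 2 for s, t in zip(student_logits, teacher_logits)) / n
    # Cross-entropy between the given label and the student's softmax output.
    ce = -math.log(softmax(student_logits)[label])
    return mse + ce


loss = fine_tune_loss([2.0, 0.0, 0.0], [3.0, -1.0, 0.0], label=0)
```

When the student's logits equal the teacher's, the MSE term vanishes and only the cross-entropy term remains.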
41. Experiments
Filter reduction (Compression)
• Recast a given source block into a smaller target block of the same type.
• Network recasting automatically removes redundant filters while reconstructing
the output activations of the source block.
Source Target
42. Experiments
Visualization of Filter Reduction
• We recast the first block of AlexNet to visualize the filter reduction.
• Our method can remove redundant filters without any similarity or
effectiveness check criteria.
Visualization of filters in the first layer of AlexNet
47. Experiments
Activation load
• Generally, 1x1 convolution is used to reduce # of mults and params.
• However, 1x1 convolution actually increases activation loads from main
memory, and thus inference time.
Comparison of # multiplications. Comparison of inference time.
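The trade-off between multiplications and activation loads can be checked with simple arithmetic. The sketch below compares a ResNet-style bottleneck block (1x1, 3x3, 1x1 over 256-d features) with a basic block (two 3x3 convolutions over 64-d features). The 56x56 feature-map size, stride 1, "same" padding, and the assumption that each layer reads its input feature map once from main memory are all illustrative choices, not figures from the slides.

```python
def block_cost(layers, height, width):
    """Return (multiplications, activation elements read) for a conv stack.

    `layers` is a list of (c_in, c_out, kernel_size) tuples.
    Assumes stride 1 and 'same' padding, so spatial size is unchanged.
    """
    mults = 0
    act_loads = 0
    for c_in, c_out, k in layers:
        mults += height * width * c_in * c_out * k * k
        act_loads += height * width * c_in  # input feature map read once
    return mults, act_loads


bottleneck = [(256, 64, 1), (64, 64, 3), (64, 256, 1)]
basic = [(64, 64, 3), (64, 64, 3)]

bn_mults, bn_loads = block_cost(bottleneck, 56, 56)
# bn_mults == 218_365_952, bn_loads == 1_204_224
ba_mults, ba_loads = block_cost(basic, 56, 56)
# ba_mults == 231_211_008, ba_loads == 401_408
```

Under these assumptions the bottleneck block needs slightly fewer multiplications but reads three times as many activation elements, which is consistent with the slide's point that 1x1 convolutions can increase activation loads.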
48. Experiments
Activation load
• To reduce activation loads, we recast the source block into a block of a different type.
• By transforming the network architecture, we can reduce the inference time.
Figure labels: Source, Target (smaller activation).
53. Experiments
Previous works
• Many previous works use weight/filter pruning to reduce # of mults and params.
• The network architecture is not changed, so many 1x1 convolutions still exist.
• Thus, activation loads are still large.
Limitation of weight/filter pruning.
54. Experiments
Comparison with Previous work
• Compared with previous work, network recasting achieved the lowest error
rate and the highest actual speed-up.
Comparison with previous works. (batch size is 64, NVIDIA Titan X (pascal))
58. Conclusion
• Network recasting enables the transformation of a network into a
different type.
• Sequential training of the student network gives better results
by alleviating the vanishing gradient problem.
• Network recasting can remove redundant filters and also
accelerate inference effectively.
We achieved up to 2.1x inference time reduction on ResNet-50
We also achieved up to 3.2x reduction on VGG-16.
60. Reference
• [1] https://wccftech.com/idf15-intel-skylake-analysis-cpu-gpu-microarchitecture-ddr4-memory-impact/3/
• [2] https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/
• [3] Chen, Tianshi, et al. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In ASPLOS, 2014.
• [4] Kim, Dongyoung, et al. ZeNA: Zero-aware neural network accelerator. IEEE Design & Test 35.1 (2018): 39-46.
• [5] Chen, Chun-Fu, et al. Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition. In ICLR, 2019.
• [6] Ma, Ningning, et al. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In ECCV, 2018.
• [7] Luo, J.-H., et al. ThiNet: A filter level pruning method for deep neural network compression. In ICCV, 2017.
• [8] Hinton, G., et al. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop, 2014.
• [9] Zhang, Ying, et al. Deep mutual learning. In CVPR, 2018.
• [10] Yim, Junho, et al. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In CVPR, 2017.
64. Appendix
Experimental Setup
• Network recasting was implemented in the PyTorch framework.
• We adopted batch normalization for all networks.
• We used the Xavier initializer in all experiments.
• We used SGD with Nesterov momentum to train the teacher network and the
Adam optimizer for network recasting.
• We used the pre-trained ResNet-50, DenseNet-121, and VGG-16 available from
torchvision.