(1) The document proposes Multi-Scale Dense Networks (MSDNet) for efficient image classification with multiple classifiers of varying resource demands. (2) MSDNet uses multi-scale feature maps and dense connectivity to ensure later classifiers have rich features and do not interfere with earlier classifiers. (3) Experiments show MSDNet achieves state-of-the-art results for anytime image classification, where predictions are made until a resource budget expires, and budgeted batch classification, where exits occur after a confidence threshold is met.
Joint Negative and Positive Learning for Noisy Labels (harmonylab)
Paper introduced:
Joint Negative and Positive Learning for Noisy Labels
Youngdong Kim, Juseung Yun, Hyounguk Shon, Junmo Kim — School of Electrical Engineering, KAIST, South Korea
Summary: Proposes JNPL, which improves on NLNL, a previous method for learning with noisy labels. By using the new loss functions NL+ and PL+ within a single training phase, it simplifies the pipeline, reduces training cost, and improves accuracy, achieving state-of-the-art results.
āIntroduction to DNN Model Compression Techniques,ā a Presentation from Xailient (Edge AI and Vision Alliance)
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2021/09/introduction-to-dnn-model-compression-techniques-a-presentation-from-xailient/
Sabina Pokhrel, Customer Success AI Engineer at Xailient, presents the āIntroduction to DNN Model Compression Techniquesā tutorial at the May 2021 Embedded Vision Summit.
Embedding real-time, large-scale deep learning vision applications at the edge is challenging because of their heavy computational, memory, and bandwidth requirements. System architects can mitigate these demands by applying model compression techniques that make deep neural networks more energy efficient and less demanding of processing resources.
In this talk, Pokhrel provides an introduction to four established techniques for model compression. She discusses network pruning, quantization, knowledge distillation and low-rank factorization compression approaches.
A detailed overview of the whole StyleGAN family, from ProgressiveGAN to the latest StyleGAN3.
Tracing the StyleGAN improvements in sequence gives an excellent understanding of the research principles and approaches, which can help you improve your own models.
This presentation covers the following topics-
1. Video Classification as a sequence of frames
2. Video Classification as a sequence of frame-blocks
3. 2D ConvNets for Videos
4. CNN + LSTM
This document discusses domain adaptation techniques for machine learning models. It summarizes several papers on domain adaptation methods, including domain-adversarial training which uses a gradient reversal layer to learn domain-invariant features, adversarial discriminative domain adaptation which matches features from source and target domains, and maximum classifier discrepancy which trains generators and classifiers adversarially to preserve decision boundaries during domain adaptation. The document provides an overview of common domain adaptation scenarios and different approaches for supervised, unsupervised, and domain generalization settings.
PR-355: Masked Autoencoders Are Scalable Vision Learners (Jinwon Lee)
- Masked Autoencoders Are Scalable Vision Learners presents a new self-supervised learning method called Masked Autoencoder (MAE) for computer vision.
- MAE works by masking random patches of input images, encoding the visible patches, and decoding to reconstruct the full image. This forces the model to learn visual representations from incomplete views of images.
- Experiments on ImageNet show that MAE achieves superior results compared to supervised pre-training from scratch as well as other self-supervised methods, scaling effectively to larger models. MAE representations also transfer well to downstream tasks like object detection, instance segmentation and semantic segmentation.
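The random-masking step described above can be sketched in a few lines of numpy; the function name, mask ratio, and patch dimensions below are illustrative, not taken from the paper's code.

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, seed=0):
    """Keep a random subset of patches, as in MAE's encoder input.

    patches: (num_patches, dim) array of flattened image patches.
    Returns the visible patches and the (sorted) indices that were kept.
    """
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    keep = rng.permutation(n)[:n_keep]   # random subset stays visible
    return patches[keep], np.sort(keep)

patches = np.arange(16 * 8, dtype=float).reshape(16, 8)  # 16 patches, dim 8
visible, idx = random_masking(patches)
# with mask_ratio=0.75, only 4 of the 16 patches reach the encoder
```

The decoder then reconstructs the full image from these few visible patches plus learned mask tokens, which is what makes the pretext task hard enough to be useful.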
āAn Introduction to Data Augmentation Techniques in ML Frameworks,ā a Presentation from AMD (Edge AI and Vision Alliance)
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2021/09/an-introduction-to-data-augmentation-techniques-in-ml-frameworks-a-presentation-from-amd/
Rajy Rawther, PMTS Software Architect at AMD, presents the āIntroduction to Data Augmentation Techniques in ML Frameworksā tutorial at the May 2021 Embedded Vision Summit.
Data augmentation is a set of techniques that expand the diversity of data available for training machine learning models by generating new data from existing data. This talk introduces different types of data augmentation techniques as well as their uses in various training scenarios.
Rawther explores some built-in augmentation methods in popular ML frameworks like PyTorch and TensorFlow. She also discusses some tips and tricks that are commonly used to randomly select parameters to avoid having model overfit to a particular dataset.
This material was presented at a study session of DeNA's AI Systems Department. It introduces the types of deep fakes and methods for detecting them.
It mainly covers the following papers:
S. Agarwal, et al., "Protecting World Leaders Against Deep Fakes," in Proc. of CVPR Workshop on Media Forensics, 2019.
A. Rossler, et al., "FaceForensics++: Learning to Detect Manipulated Facial Images," in Proc. of ICCV, 2019.
The document discusses the history and development of artificial neural networks and deep learning. It describes early neural network models like perceptrons from the 1950s and their use of weighted sums and activation functions. It then explains how additional developments led to modern deep learning architectures like convolutional neural networks and recurrent neural networks, which use techniques such as hidden layers, backpropagation, and word embeddings to learn from large datasets.
The document discusses the Vision Transformer (ViT) model for computer vision tasks. It covers:
1. How ViT tokenizes images into patches and uses position embeddings to encode spatial relationships.
2. ViT uses a class embedding to trigger class predictions, unlike CNNs which have decoders.
3. The receptive field of ViT grows as the attention mechanism allows elements to attend to other distant elements in later layers.
4. Initial results showed ViT performance was comparable to CNNs when trained on large datasets but lagged CNNs trained on smaller datasets like ImageNet.
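Point 1 above, tokenizing an image into patches, reduces to a reshape; a minimal numpy sketch with illustrative sizes:

```python
import numpy as np

def image_to_patches(img, patch=4):
    """Split an (H, W, C) image into non-overlapping flattened patches,
    the tokenization step ViT applies before adding position embeddings."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (N, p*p*C)
    x = img.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, patch * patch * c)

img = np.zeros((32, 32, 3))
tokens = image_to_patches(img)   # 8 * 8 = 64 tokens, each of length 48
```

Each token is then linearly projected and summed with its position embedding before entering the Transformer encoder.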
This document is a slide presentation on recent advances in deep learning. It discusses self-supervised learning, which involves using unlabeled data to learn representations by predicting structural information within the data. The presentation covers pretext tasks, invariance-based approaches, and generation-based approaches for self-supervised learning in computer vision and natural language processing. It provides examples of specific self-supervised methods like predicting image rotations, clustering representations to generate pseudo-labels, and masked language modeling.
Bayes by Backprop is a method for introducing weight uncertainty into neural networks using variational Bayesian learning. It represents each weight as a probability distribution rather than a fixed value. This allows the model to better assess uncertainty. The paper proposes Bayes by Backprop, which uses a simple approximate learning algorithm similar to backpropagation to learn the distributions over weights. Experiments show it achieves good results on classification, regression, and contextual bandit problems, outperforming standard regularization methods by capturing weight uncertainty.
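The "weight as a distribution" idea is usually implemented with the reparameterization trick; a minimal numpy sketch of sampling one weight vector (parameter names and sizes are illustrative):

```python
import numpy as np

def sample_weight(mu, rho, rng):
    """Draw one weight sample w = mu + softplus(rho) * eps, eps ~ N(0, 1).

    Each weight is a Gaussian with mean mu and std softplus(rho); the
    softplus keeps the standard deviation positive while rho is unconstrained,
    so both mu and rho can be trained by ordinary backpropagation."""
    sigma = np.log1p(np.exp(rho))            # softplus(rho)
    eps = rng.standard_normal(np.shape(mu))
    return mu + sigma * eps

rng = np.random.default_rng(0)
mu, rho = np.zeros(5), np.full(5, -3.0)      # small initial uncertainty
w = sample_weight(mu, rho, rng)
```

Sampling a fresh w per forward pass is what lets the network express uncertainty in its predictions.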
Diffusion Models Beat GANs on Image Synthesis (BeerenSahu)
Diffusion models have recently been shown to produce higher quality images than GANs while also offering better diversity and being easier to scale and train. Specifically, a 2021 paper by OpenAI demonstrated that a diffusion model achieved an FID score of 2.97 on ImageNet 128x128, beating the previous state-of-the-art held by BigGAN. Diffusion models work by gradually adding noise to images in a forward process and then learning to remove noise in a backward denoising process, allowing them to generate diverse, high fidelity images.
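The forward (noising) process has a closed form, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε, which a short numpy sketch can illustrate; the linear beta schedule is the common DDPM choice, and the variable names are illustrative:

```python
import numpy as np

def q_sample(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) for the forward (noising) process:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    alpha_bar = np.cumprod(1.0 - betas)      # cumulative product of (1 - beta)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)        # linear schedule over 1000 steps
x0 = np.ones((8, 8))                         # stand-in for a clean image
x_early = q_sample(x0, 10, betas, rng)       # still close to the clean image
x_late = q_sample(x0, 999, betas, rng)       # almost pure Gaussian noise
```

The learned model runs this in reverse, predicting the noise to remove at each step.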
The document discusses data augmentation techniques for improving machine learning models. It begins with definitions of data augmentation and reasons for using it, such as enlarging datasets and preventing overfitting. Examples of data augmentation for images, text, and audio are provided. The document then demonstrates how to perform data augmentation for natural language processing tasks like text classification. It shows an example of augmenting a movie review dataset and evaluating a text classifier. Pros and cons of data augmentation are discussed, along with key takeaways about using it to boost performance of models with small datasets.
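For text, one of the simplest augmentations is random token deletion, in the style of EDA; a pure-Python sketch, with the deletion probability chosen arbitrarily:

```python
import random

def random_deletion(tokens, p=0.2, seed=0):
    """Drop each token with probability p, producing a noisy paraphrase
    of the input that can be added to a small training set."""
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]   # never return empty

sent = "the movie was surprisingly good".split()
aug = random_deletion(sent)
```

A text classifier trained on both the originals and such perturbed copies tends to overfit less on small datasets, which is the trade-off the talk's pros-and-cons section is about.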
This document discusses mixed precision training techniques for deep neural networks. It introduces three techniques to train models with half-precision floating point without losing accuracy: 1) Maintaining a FP32 master copy of weights, 2) Scaling the loss to prevent small gradients, and 3) Performing certain arithmetic like dot products in FP32. Experimental results show these techniques allow a variety of networks to match the accuracy of FP32 training while reducing memory and bandwidth. The document also discusses related work and PyTorch's new Automatic Mixed Precision features.
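Technique 2 (loss scaling) can be illustrated numerically with numpy's float16: a gradient too small for FP16 flushes to zero, but survives if scaled up before the cast and unscaled in FP32 afterwards. The scale factor below is arbitrary, and this is only a numerical sketch, not the framework implementation:

```python
import numpy as np

def scaled_grad(grad_fp32, scale=65536.0):
    """Loss-scaling sketch: multiply the gradient by `scale` before
    casting to FP16, then divide in FP32 when applying the update."""
    g16 = (grad_fp32 * scale).astype(np.float16)   # survives the cast
    return g16.astype(np.float32) / scale          # unscale back in FP32

tiny = np.array([1e-8], dtype=np.float32)   # representable in FP32
flushed = tiny.astype(np.float16)           # below FP16's smallest subnormal
recovered = scaled_grad(tiny)               # close to the original value
```

This is the same mechanism that PyTorch's Automatic Mixed Precision wraps up, with the scale adjusted dynamically during training.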
This document provides an outline and overview of training convolutional neural networks. It discusses update rules like stochastic gradient descent, momentum, and Adam. It also covers techniques like data augmentation, transfer learning, and monitoring the training process. The goal of training a CNN is to optimize its weights and parameters to correctly classify images from the training set by minimizing output error through backpropagation and updating weights.
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Vision Transformers (Jinwon Lee)
The document summarizes a study on training Vision Transformers (ViTs) by exploring different combinations of data augmentation, regularization techniques, model sizes, and training dataset sizes. Some key findings include: 1) Models trained with extensive data augmentation on ImageNet-1k performed comparably to those trained on the larger ImageNet-21k dataset without augmentation. 2) Transfer learning from pre-trained models was more efficient and achieved better results than training models from scratch, even with extensive compute. 3) Models pre-trained on more data showed better transfer ability, indicating more data yields more generic representations.
PR-433: Test-time Training with Masked Autoencoders (Sunghoon Joo)
This document summarizes a research paper that proposes a new method called TTT-MAE (Test-Time Training with Masked Autoencoders) to address the problem of domain shift in visual recognition tasks. TTT-MAE uses masked autoencoders as the self-supervised pretext task in test-time training, instead of the rotation prediction used in previous work. Experimental results on datasets like ImageNet-C and ImageNet-R show that TTT-MAE achieves higher performance gains than prior methods under different types of distribution shift. However, TTT-MAE is slower at test time than directly applying a fixed model. Future work could focus on improving efficiency and generalizing the approach to other tasks.
This document proposes ResNeSt, a split-attention network that divides feature maps into groups and applies attention mechanisms across groups. It outperforms ResNet variants on image classification, object detection, semantic segmentation, and instance segmentation while maintaining the same computational efficiency. The paper introduces ResNeSt's split attention block, training strategies including large batches, data augmentation, and regularization methods. Evaluation shows ResNeSt achieves state-of-the-art accuracy on ImageNet and downstream tasks using less computation than NAS models.
This document provides an introduction and tutorial on transfer learning using Keras pretrained models. It discusses using pretrained models like ResNet50 for feature extraction and fine-tuning. The code example shows transferring a pretrained ResNet50 model trained on ImageNet to a CIFAR10 image classification task. Homework involves trying different pretrained models like VGG16, changing trainable layers, or applying the techniques to new tasks.
https://github.com/telecombcn-dl/dlmm-2017-dcu
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
Presentation in Vietnam Japan AI Community in 2019-05-26.
The presentation summarizes what I've learned about Regularization in Deep Learning.
Disclaimer: The presentation is given in a community event, so it wasn't thoroughly reviewed or revised.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (ssuser6a46522)
Batch normalization is a technique that normalizes the inputs to each node in a neural network layer. It reduces internal covariate shift and allows higher learning rates to be used, speeding up training. The method normalizes each feature by subtracting the batch mean and dividing by the batch standard deviation. It was shown to significantly accelerate training of state-of-the-art models on MNIST and ImageNet, requiring only a fraction of the training steps to reach the same level of accuracy compared to models without batch normalization.
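The normalization itself is two lines; a numpy sketch of the per-feature transform, with the learned scale (gamma) and shift (beta) fixed for illustration:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch dimension, then apply the
    learned scale and shift: y = gamma * x_hat + beta."""
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])  # batch of 3, 2 features
y = batch_norm(x)
# each column now has (approximately) zero mean and unit variance
```

At inference, running averages of the batch statistics replace the per-batch mean and variance.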
The document discusses improving generalization performance in large mini-batch training for distributed deep learning. It proposes using second-order optimization methods like natural gradient descent (NGD) and K-FAC to converge with fewer iterations, and smoothing the objective function with data augmentation techniques like mixup to avoid sharp minima and improve accuracy. Experimental results show that K-FAC converges faster than SGD but suffers from greater accuracy degradation with large batches. Applying mixup to K-FAC training reduces the accuracy degradation and improves generalization.
Practical tips for handling noisy data and annotation (RyuichiKanoh)
The document summarizes a KaggleDays workshop on techniques for handling noisy data and annotation. It includes an agenda covering an introduction, experiment setup, and techniques for learning with noisy datasets. The techniques discussed are mixup, using large batch sizes, and distillation. For mixup, virtual training samples are constructed by linearly interpolating real samples and labels. Large batch sizes help because noise from random labels cancels out within a batch. Distillation trains a student network using predictions from a pre-trained teacher network to ease training. Code links and examples of applying the techniques in competitions are also provided.
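The mixup construction mentioned above is one line of interpolation; a numpy sketch, with alpha and shapes chosen for illustration:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Build a virtual training sample as a convex combination of two
    real ones: x = lam*x1 + (1-lam)*x2, with lam ~ Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

rng = np.random.default_rng(0)
x1, y1 = np.ones(4), np.array([1.0, 0.0])    # sample with one-hot label
x2, y2 = np.zeros(4), np.array([0.0, 1.0])
x, y = mixup(x1, y1, x2, y2, rng=rng)        # soft label sums to 1
```

Because the interpolated soft labels average over possibly-wrong hard labels, mixup is a natural fit for noisy-annotation settings.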
PR-231: A Simple Framework for Contrastive Learning of Visual Representations (Jinwon Lee)
The document presents SimCLR, a framework for contrastive learning of visual representations using simple data augmentation. Key aspects of SimCLR include using random cropping and color distortions to generate positive sample pairs for the contrastive loss, a nonlinear projection head to learn representations, and large batch sizes. Evaluation shows SimCLR learns representations that outperform supervised pretraining on downstream tasks and achieves state-of-the-art results with only view augmentation and contrastive loss.
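The contrastive (NT-Xent) loss at the heart of SimCLR can be written compactly; a numpy sketch for 2N embeddings where adjacent rows are the two views of one image. Batch size and dimension are illustrative, and real training would use learned encoder outputs rather than random vectors:

```python
import numpy as np

def nt_xent(z, temperature=0.5):
    """NT-Xent loss for 2N embeddings; rows (2i, 2i+1) are a positive pair."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize
    sim = z @ z.T / temperature                        # cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n2 = z.shape[0]
    pos = np.array([i + 1 if i % 2 == 0 else i - 1 for i in range(n2)])
    # log-softmax over each row, then pick the positive's log-probability
    logprob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -logprob[np.arange(n2), pos].mean()

rng = np.random.default_rng(0)
views = rng.standard_normal((8, 16))   # 4 images x 2 augmented views
loss = nt_xent(views)
```

Training drives each pair of views together and away from the other 2N−2 samples in the batch, which is why large batches help.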
Comparing Incremental Learning Strategies for Convolutional Neural Networks (Vincenzo Lomonaco)
In the last decade, Convolutional Neural Networks (CNNs) have been shown to perform incredibly well in many computer vision tasks such as object recognition and object detection, being able to extract meaningful high-level invariant features. However, partly because of their complex training and tricky hyper-parameter tuning, CNNs have been scarcely studied in the context of incremental learning, where data arrive in consecutive batches and retraining the model from scratch is unfeasible. In this work we compare different incremental learning strategies for CNN-based architectures, targeting real-world applications.
If you are interested in this work please cite:
Lomonaco, V., & Maltoni, D. (2016, September). Comparing Incremental Learning Strategies for Convolutional Neural Networks. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition (pp. 175-184). Springer International Publishing.
For further information visit my website: http://www.vincenzolomonaco.com/
This document discusses various techniques for optimizing deep neural network models and hardware for efficiency. It covers approaches such as exploiting activation and weight statistics, sparsity, compression, pruning neurons and synapses, decomposing trained filters, and knowledge distillation. The goal is to reduce operations, memory usage, and energy consumption to enable efficient inference on hardware like mobile phones and accelerators. Evaluation methodologies are also presented to guide energy-aware design space exploration.
Predicting Drug Target Interaction Using Deep Belief Network (Rashim Dhaubanjar)
With advances in AI, machine learning methods are being used to train classifiers that separate interacting from non-interacting drug-target pairs, a difficult task because the non-linear nature of large biological data makes dockable and non-dockable ligands hard to distinguish. Since deep learning has produced state-of-the-art results on many tasks, we propose a new approach to efficiently predict drug-target interactions. A deep belief network (DBN) extracts high-level features from 2D chemical substructures represented in fingerprint format. The DBN is trained in a greedy, layer-wise, unsupervised fashion, and the result of this pre-training phase initializes the parameters before backpropagation is used for fine-tuning. A logistic regression output layer is stacked on top, and the model is fine-tuned by backpropagation of error derivatives to directly predict whether a drug interacts with a target of interest. We also propose reducing training time by using a GPU, a highly parallel programmable processor whose peak arithmetic and memory bandwidth substantially outpace its CPU counterpart.
Deep Gradient Compression reduces the communication bandwidth required for distributed deep learning training by 300-600x without impacting accuracy. It achieves this through gradient sparsification, local gradient accumulation, momentum correction, local gradient clipping, momentum factor masking, and warm-up training. These techniques compress gradients to reduce data transfer size while preventing accuracy loss. The paper demonstrates Deep Gradient Compression can scale distributed training to use inexpensive network infrastructure.
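Gradient sparsification with local accumulation, two of the ingredients listed above, can be sketched as follows; the ratio and tensor are illustrative, and the full method adds momentum correction, clipping, masking, and warm-up on top:

```python
import numpy as np

def sparsify(grad, residual, ratio=0.01):
    """Accumulate the incoming gradient into a local residual, transmit
    only the largest-magnitude entries, and keep the rest locally so no
    gradient information is ever discarded."""
    acc = residual + grad
    k = max(1, int(acc.size * ratio))
    idx = np.argsort(np.abs(acc))[-k:]       # top-k entries by magnitude
    sent = np.zeros_like(acc)
    sent[idx] = acc[idx]                     # sparse tensor to communicate
    new_residual = acc.copy()
    new_residual[idx] = 0.0                  # transmitted entries are cleared
    return sent, new_residual

g = np.linspace(-1.0, 1.0, 100)
sent, res = sparsify(g, np.zeros_like(g))
# only 1% of entries (here a single value) are transmitted this step
```

Because small gradients accumulate in the residual until they cross the threshold, they are delayed rather than lost, which is why accuracy is preserved.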
Similar to Bag of tricks for image classification with convolutional neural networks review [cdm]
1) iMTFA is an incremental approach to few-shot instance segmentation that allows adding new classes without retraining.
2) It extends the MTFA baseline by training an instance feature extractor to generate discriminative embeddings for each instance, with the average embedding used as the class representative.
3) At inference, it predicts classes based on the cosine distance between ROI embeddings and stored class representatives, using class-agnostic box regression and mask prediction.
4) Experiments on COCO, VOC2007 and VOC2012 show iMTFA outperforms SOTA few-shot object detection and instance segmentation methods while enabling incremental class addition.
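Point 3 above, cosine-distance classification against stored class representatives, reduces to normalized dot products; a minimal numpy sketch with made-up two-dimensional embeddings:

```python
import numpy as np

def cosine_classify(embedding, class_reps):
    """Predict the class whose stored representative has the highest
    cosine similarity to the given ROI embedding."""
    e = embedding / np.linalg.norm(embedding)
    r = class_reps / np.linalg.norm(class_reps, axis=1, keepdims=True)
    return int(np.argmax(r @ e))             # nearest representative

reps = np.array([[1.0, 0.0], [0.0, 1.0]])   # one representative per class
pred = cosine_classify(np.array([0.9, 0.1]), reps)   # -> class 0
```

Adding a new class is then just appending one averaged embedding row to `reps`, which is what makes the approach incremental.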
The document discusses the application of transformers to computer vision tasks. It first introduces the standard transformer architecture and its use in natural language processing. It then summarizes recent works on applying transformers to object detection (DETR) and image classification (ViT). DETR proposes an end-to-end object detection method using a CNN-Transformer encoder-decoder architecture. Deformable DETR improves on DETR by incorporating deformable attention mechanisms. ViT represents images as sequences of patches and applies a standard Transformer encoder for image recognition, exceeding state-of-the-art models with less pre-training computation. While promising results have been achieved, challenges remain regarding model parameters and expanding transformer applications to other computer vision tasks.
Review: Inter-slice Context Residual Learning for 3D Medical Image Segmentation (Dongmin Choi)
The document proposes a method called ConResNet to improve 3D medical image segmentation using inter-slice context residual learning. ConResNet captures inter-slice contextual information more effectively using a context residual module. This module predicts a context residual mask showing differences between adjacent slices and uses it to boost the model's ability to perceive 3D context. Experiments on brain tumor and pancreas segmentation datasets demonstrate ConResNet outperforms other methods and ablation studies show the effectiveness of using context residuals.
Review: Structure Boundary Preserving Segmentation for Medical Image with Ambiguous Boundary (Dongmin Choi)
Paper title: Structure Boundary Preserving Segmentation for Medical Image with Ambiguous Boundary (CVPR 2020)
Paper link : https://openaccess.thecvf.com/content_CVPR_2020/papers/Lee_Structure_Boundary_Preserving_Segmentation_for_Medical_Image_With_Ambiguous_Boundary_CVPR_2020_paper.pdf
This document discusses how to customize options when extracting radiomic features using the Pyradiomics package in Python. It covers reading image and mask files using SimpleITK, extracting features using a feature extractor object, and customizing options at the feature extractor level and filter class level, including specifying which features to extract, image preprocessing settings, and filter-specific settings.
Seeing What a GAN Cannot Generate [cdm] (Dongmin Choi)
1. The document discusses research on understanding limitations in generative adversarial networks (GANs) by analyzing what types of images or image features GANs cannot generate.
2. Mode collapse is identified as a problem where GAN outputs lack diversity and collapse to near-identical samples. The paper presents techniques for understanding omissions both in the generated distribution and in individual generated images.
3. By inverting a GAN to find the closest latent vector that generates a real image, differences between the real and generated images reveal what the GAN cannot capture. This helps analyze limitations in GANs and understand how they can be improved.
This paper proposes a method called network deconvolution to remove pixel-wise and channel-wise correlation in convolutional networks. It does this by learning a decorrelation matrix during training that whitens the input data, removing redundancy. Experiments show it converges faster than batch normalization and achieves better performance on image classification tasks. The method is inspired by the decorrelation process observed in animal visual cortex and results in sparser representations.
How much position information do convolutional neural networks encode? review... (Dongmin Choi)
This document presents research into whether convolutional neural networks (CNNs) encode absolute spatial or position information. The authors hypothesize that CNN models implicitly encode position information through techniques like zero-padding. They propose PosENet, a model that couples a pretrained encoder like VGG or ResNet with a position encoding module to predict gradient-like position maps. PosENet is trained on a saliency detection dataset and evaluated on a semantic segmentation dataset. Results show deeper models and position-dependent tasks encode more position information. The authors conclude that zero-padding plays a key role in delivering position cues to CNNs.
Objects as points (CenterNet) review [CDM] (Dongmin Choi)
The document proposes representing objects as single center points rather than bounding boxes. This allows detecting objects through keypoint estimation using a single neural network without post-processing. The method, called CenterNet, predicts center points along with object properties like size in one forward pass. Experiments show CenterNet runs in real-time and is simpler, faster and more accurate than two-stage detectors that require additional pre and post-processing steps. It provides a new direction for real-time object recognition.
Main news related to the CCS TSI 2023 (2023/1695) (Jakub Marek)
An English translation of the presentation accompanying the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on communications and signalling systems on railways, held at the Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). It was attended by around 500 participants and 200 online followers.
The original Czech version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The video recording (in Czech) of the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Essentials of Automations: Exploring Attributes & Automation Parameters (Safe Software)
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as "keys"). In fact, it's unlikely you'll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they'll also be making use of the Split-Merge Block functionality.
You'll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
AppSec PNW: Android and iOS Application Security with MobSF (Ajin Abraham)
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario-based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...Fwdays
Ā
Direct losses from one minute of downtime = $5-$10 thousand. Reputation is priceless.
As part of the talk, we will consider the architectural strategies necessary for the development of highly loaded fintech solutions. We will focus on using queues and streaming to efficiently work and manage large amounts of data in real-time and to minimize latency.
We will pay special attention to the architectural patterns used in the design of the fintech system, microservices and event-driven architecture, which ensure the scalability, fault tolerance, and consistency of the entire system.
Introduction of Cybersecurity with OSS at Code Europe 2024 (Hiroshi SHIBATA)
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Session 1 - Intro to Robotic Process Automation.pdf (UiPathCommunity)
Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program:
https://bit.ly/Automation_Student_Kickstart
In this session, we shall introduce you to the world of automation, the UiPath Platform, and guide you on how to install and setup UiPath Studio on your Windows PC.
Detailed agenda:
What is RPA? Benefits of RPA?
RPA Applications
The UiPath End-to-End Automation Platform
UiPath Studio CE Installation and Setup
Extra training through UiPath Academy:
Introduction to Automation
UiPath Business Automation Platform
Explore automation development with UiPath Studio
Register here for our upcoming Session 2 on June 20: Introduction to UiPath Studio Fundamentals: https://community.uipath.com/events/details/uipath-lagos-presents-session-2-introduction-to-uipath-studio-fundamentals/
What is an RPA CoE? Session 1 - CoE Vision (DianaGray10)
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization's priorities determine the CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect, Anika Systems
Must Know Postgres Extension for DBA and Developer during Migration (Mydbops)
Mydbops Opensource Database Meetup 16
Topic: Must-Know PostgreSQL Extensions for Developers and DBAs During Migration
Speaker: Deepak Mahto, Founder of DataCloudGaze Consulting
Date & Time: 8th June | 10 AM - 1 PM IST
Venue: Bangalore International Centre, Bangalore
Abstract: Discover how PostgreSQL extensions can be your secret weapon! This talk explores how key extensions enhance database capabilities and streamline the migration process for users moving from other relational databases like Oracle.
Key Takeaways:
* Learn about crucial extensions like oracle_fdw, pgtt, and pg_audit that ease migration complexities.
* Gain valuable strategies for implementing these extensions in PostgreSQL to achieve license freedom.
* Discover how these key extensions can empower both developers and DBAs during the migration process.
* Don't miss this chance to gain practical knowledge from an industry expert and stay updated on the latest open-source database trends.
Mydbops Managed Services specializes in taking the pain out of database management while optimizing performance. Since 2015, we have been providing top-notch support and assistance for the top three open-source databases: MySQL, MongoDB, and PostgreSQL.
Our team offers a wide range of services, including assistance, support, consulting, 24/7 operations, and expertise in all relevant technologies. We help organizations improve their database's performance, scalability, efficiency, and availability.
Contact us: info@mydbops.com
Visit: https://www.mydbops.com/
Follow us on LinkedIn: https://in.linkedin.com/company/mydbops
For more details and updates, please follow the links below.
Meetup Page : https://www.meetup.com/mydbops-databa...
Twitter: https://twitter.com/mydbopsofficial
Blogs: https://www.mydbops.com/blog/
Facebook (Meta): https://www.facebook.com/mydbops/
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency (ScyllaDB)
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf (Chart Kalyan)
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
"Temporal Event Neural Networks: A More Efficient Alternative to the Transfor..." (Edge AI and Vision Alliance)
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip, presents the "Temporal Event Neural Networks: A More Efficient Alternative to the Transformer" tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip's Akida neuromorphic hardware IP further enhances TENNs' capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
Ā
In this talk we will discuss DDoS protection tools and best practices, network architectures, and what AWS has to offer. We will also look into one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022, and see what techniques helped keep web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on the Ukraine experience.
3. Abstract
• Focus on training refinements, such as data augmentations and optimization methods
• Tips on how to apply these refinements
• Raised ResNet-50's top-1 accuracy from 73.5% to 79.29% on ImageNet
• These improvements also lead to better transfer learning performance
4. Introduction
• Image classification performance has been raised with various architectures including AlexNet, VGG, ResNet, DenseNet and NASNet.
• However, these advancements did not come solely from improved model architectures: training refinements (e.g. loss functions, data preprocessing, optimization) have played a major role.
7. Training Procedures
1. Baseline Training Procedure - During Training
• (1) Randomly sample an image and decode it into 32-bit floating point raw pixel values in [0, 255].
• (2) Randomly crop a rectangular region whose aspect ratio is randomly sampled in [3/4, 4/3] and area randomly sampled in [8%, 100%], then resize the cropped region into a 224-by-224 square image.
• (3) Flip horizontally with 0.5 probability.
• (4) Scale hue, saturation, and brightness with coefficients uniformly drawn from [0.6, 1.4].
• (5) Add PCA noise with a coefficient sampled from a normal distribution N(0, 0.1).
• (6) Normalize RGB channels by subtracting 123.68, 116.779, 103.939 and dividing by 58.393, 57.12, 57.375, respectively.
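As a rough sketch of step (2), the crop geometry can be sampled in pure Python; the function name, the 10-attempt retry loop, and the center-square fallback are illustrative assumptions rather than details from the slides:

```python
import math
import random

def sample_crop(img_w, img_h, rng=random):
    """Sample a crop whose aspect ratio is drawn from [3/4, 4/3] and
    whose area is drawn from [8%, 100%] of the image area; the caller
    then resizes the crop to 224x224. Falls back to a center square
    crop if no valid rectangle is found after a few tries."""
    for _ in range(10):
        area = rng.uniform(0.08, 1.0) * img_w * img_h
        aspect = rng.uniform(3 / 4, 4 / 3)  # width / height
        w = int(round(math.sqrt(area * aspect)))
        h = int(round(math.sqrt(area / aspect)))
        if 0 < w <= img_w and 0 < h <= img_h:
            x = rng.randint(0, img_w - w)
            y = rng.randint(0, img_h - h)
            return x, y, w, h
    side = min(img_w, img_h)
    return (img_w - side) // 2, (img_h - side) // 2, side, side
```

Library transforms (e.g. a random-resized-crop op) do the same sampling internally before resizing.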
8. Training Procedures
1. Baseline Training Procedure - During Validation
• Resize each image's shorter edge to 256 pixels while keeping its aspect ratio
• Crop out the 224-by-224 region in the center
• Normalize RGB channels as in training
• No random data augmentations
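The validation-time geometry above can be sketched as a small helper; the function name and the (x0, y0, x1, y1) box convention are illustrative:

```python
def val_preprocess_geometry(img_w, img_h, short=256, crop=224):
    """Compute the resized dimensions (shorter edge -> 256, aspect
    ratio kept) and the centered 224x224 crop box used at validation."""
    scale = short / min(img_w, img_h)
    rw, rh = round(img_w * scale), round(img_h * scale)
    x = (rw - crop) // 2
    y = (rh - crop) // 2
    return (rw, rh), (x, y, x + crop, y + crop)
```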
9. Training Procedures
1. Baseline Training Procedure - Weights Initialization
• Convolutional and fully-connected layers : Xavier algorithm
• All biases : 0
• For batch normalization : γ vectors initialized to 1, β vectors to 0
10. Training Procedures
1. Baseline Training Procedure - Optimizer and Hyper-parameters
• Optimizer : Nesterov Accelerated Gradient (NAG) descent
• Epochs : 120
• Batch size : 256
• GPUs : 8 NVIDIA V100 GPUs
• Learning rate : 0.1, divided by 10 at epochs 30, 60, and 90
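The step schedule above (0.1, divided by 10 at epochs 30, 60 and 90) can be written as a small helper; the function name is illustrative:

```python
def baseline_lr(epoch, base_lr=0.1, milestones=(30, 60, 90), gamma=0.1):
    """Baseline schedule: start at 0.1 and divide the learning rate
    by 10 at each milestone epoch (120 epochs total)."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```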
13. Efficient Training
1. Large-batch Training
• It is more efficient to use a larger batch size
• But using a large batch size decreases the convergence rate for convex problems and degrades validation accuracy.
• Four heuristics help scale the batch size up:
- Linear scaling learning rate
- Learning rate warmup
- Zero γ
- No bias decay
14. Efficient Training
1. Large-batch Training : Linear Scaling Learning Rate
• A large batch size reduces the noise in the gradient (= reduces variance)
• It is therefore possible to increase the learning rate with a large batch size
• In Goyal et al.*, linearly increasing the learning rate with the batch size works empirically for ResNet-50 training.
• In He et al.**, the initial learning rate is set to 0.1 x b/256 (b = batch size)
Goyal et al.*, Accurate, large minibatch SGD: training imagenet in 1 hour. CoRR, abs/1706.02677, 2017
He et al.**, Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770-778, 2016
15. Efficient Training
1. Large-batch Training : Learning Rate Warmup
• Initial network parameters are typically far away from the final solution
- Using a too-large learning rate may result in numerical instability
• Use a small learning rate at the beginning and then switch back to the initial learning rate once the training process is stable
• In Goyal et al.*, the learning rate is set as below:
- Use the first m batches to warm up
- Initial learning rate : η
- At batch i (1 ≤ i ≤ m), the learning rate is iη/m
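Combining the linear-scaling rule with gradual warmup, a sketch of the per-batch learning rate might look like this (the function name is illustrative):

```python
def warmup_lr(batch_idx, m, batch_size):
    """Gradual warmup combined with the linear-scaling rule: the
    target learning rate is eta = 0.1 * batch_size / 256, and batch i
    (1 <= i <= m) uses i * eta / m; after warmup, eta is used."""
    eta = 0.1 * batch_size / 256
    if batch_idx <= m:
        return batch_idx * eta / m
    return eta
```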
16. Efficient Training
1. Large-batch Training : Zero γ
• In batch normalization, the output is γx̂ + β
(x̂ : the standardized input / γ, β : learnable parameters initialized to 1s and 0s)
• Zero γ initialization heuristic:
- Initialize γ = 0 for all BN layers that sit at the end of a residual block
- All residual blocks then just return their inputs [ output = x + block(x) ]
- Mimics a network with a smaller number of layers, which is easier to train at the initial stage
https://mc.ai/bag-of-tricks-for-image-classification-with-convolutional-neural-networks-in-keras/
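A toy numeric illustration of the zero-γ heuristic, assuming a residual block whose last BN scale is γ (the helper is illustrative, not framework code):

```python
def residual_block_out(x, block, gamma):
    """Toy residual block whose final BN scale is `gamma`:
    output = x + gamma * block(x). With gamma initialized to 0,
    the block reduces to the identity at the start of training."""
    return [xi + gamma * bi for xi, bi in zip(x, block(x))]
```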
17. Efficient Training
1. Large-batch Training : No bias decay
• Weight decay is often applied to all learnable parameters in order to avoid overfitting
• According to Jia et al.*, it is recommended to apply weight decay only to weights (not to biases)
• That is, the other parameters (biases and the γ and β in BN layers) are left unregularized
Jia et al.*, Highly scalable deep learning training system with mixed-precision: Training imagenet in four minutes. arXiv preprint arXiv:1807.11205, 2018.
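A minimal sketch of splitting parameters into decay and no-decay groups; the name-matching rule (".bias" suffix, ".bn" substring) is a simplifying assumption, real frameworks inspect parameter types instead:

```python
def split_for_weight_decay(param_names):
    """Partition parameter names into a group that receives weight
    decay (weights) and one that does not (biases and BN gamma/beta)."""
    decay, no_decay = [], []
    for name in param_names:
        if name.endswith(".bias") or ".bn" in name:
            no_decay.append(name)
        else:
            decay.append(name)
    return decay, no_decay
```

The two groups are then handed to the optimizer with weight decay enabled only for the first.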
18. Efficient Training
2. Low-precision training
• Neural networks are commonly trained with 32-bit floating point (FP32)
• However, new hardware supports lower-precision data types
• 2 to 3 times faster after switching from FP32 to FP16 on V100
• Micikevicius et al.* suggested mixed precision training (forward and backward passes in FP16, weight updates in FP32)
Micikevicius et al.*, Mixed precision training. arXiv preprint arXiv:1710.03740, 2017.
20. Model Tweaks
• A minor adjustment to the network architecture, such as changing the stride
• Such a tweak often barely changes the computational complexity but can have a non-negligible effect on accuracy
22. Model Tweaks
Original Downsampling Block
ResNet-B Downsampling Block
Using a 1x1 kernel with a stride of 2 → ignores three-quarters of the input feature map
First appeared in a Torch implementation
23. Model Tweaks
Original Input Stem
ResNet-C Input Stem
A 7 x 7 convolution is 5.4 times more expensive than a 3 x 3 convolution
Proposed in Inception-v2 originally
24. Model Tweaks
Original Downsampling Block
ResNet-D Downsampling Block
Path B also ignores 3/4 of the input feature maps because of the stride size
Proposed model tweak
27. Training Refinements
1. Cosine Learning Rate Decay
• Cosine annealing strategy proposed by Loshchilov et al.*

η_t = (1/2) (1 + cos(tπ/T)) η

- η_t : the learning rate at batch t
- T : the number of total batches
- η : the initial learning rate
Loshchilov et al.*, SGDR: stochastic gradient descent with restarts. CoRR, abs/1608.03983, 2016.
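The cosine schedule can be sketched directly from the formula (names are illustrative):

```python
import math

def cosine_lr(t, T, eta=0.1):
    """Cosine annealing: eta_t = 0.5 * (1 + cos(t * pi / T)) * eta,
    decaying from eta at t = 0 to 0 at t = T."""
    return 0.5 * (1 + math.cos(t * math.pi / T)) * eta
```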
29. Training Refinements
2. Label Smoothing, proposed in Inception-v2*
Inception-v2* : C. Szegedy et al. Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567, 2015.
Ground truth probability : q_i = 1 - ε if i = y, otherwise ε / (K - 1)
Optimal solution : z*_i = log((K - 1)(1 - ε)/ε) + α if i = y, otherwise α (α an arbitrary real number)
- K : the number of labels
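The smoothed target distribution can be sketched as follows (the function name is illustrative):

```python
def smoothed_targets(y, K, eps=0.1):
    """Smoothed ground-truth distribution: probability 1 - eps for
    the true class y, eps / (K - 1) for each of the other classes."""
    return [1 - eps if i == y else eps / (K - 1) for i in range(K)]
```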
30. Training Refinements
2. Label Smoothing, proposed in Inception-v2*
Inception-v2* : C. Szegedy et al. Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567, 2015.
With smoothing, the gap between the maximum prediction and the average of the others centers at the theoretical value, with fewer extreme values
31. Training Refinements
3. Knowledge Distillation
G. Hinton et al. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
Image → Teacher Model (large; e.g. ResNet-152) → output r
Image → Student Model (small; e.g. ResNet-50) → output z
Training loss : loss(p, softmax(z)) + T² · loss(softmax(r/T), softmax(z/T))
Goal : improve the accuracy of the student model while keeping its complexity the same
※ p : the true probability distribution / T : the temperature hyper-parameter that makes the softmax outputs smoother
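A pure-Python sketch of the distillation loss with cross-entropy as `loss` (function names are illustrative; real training would use framework ops on batched tensors):

```python
import math

def softmax(z, T=1.0):
    """Temperature-scaled softmax; softmax(z, T) equals softmax(z/T)."""
    m = max(z)
    exps = [math.exp((zi - m) / T) for zi in z]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(p, r, z, T):
    """loss(p, softmax(z)) + T^2 * loss(softmax(r/T), softmax(z/T)),
    with cross-entropy as the loss function.
    p: true distribution, r: teacher logits, z: student logits."""
    def xent(q, s):
        return -sum(qi * math.log(si) for qi, si in zip(q, s))
    return xent(p, softmax(z)) + T ** 2 * xent(softmax(r, T), softmax(z, T))
```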
32. Training Refinements
4. Mixup Training
H. Zhang et al. mixup: Beyond empirical risk minimization. CoRR, abs/1710.09412, 2017.
https://hoya012.github.io/blog/Bag-of-Tricks-for-Image-Classification-with-Convolutional-Neural-Networks-Review/
Only the mixed example (x̂, ŷ) is used in training, generated by a weighted linear interpolation of two examples (x_i, y_i) and (x_j, y_j):
x̂ = λ x_i + (1 - λ) x_j, ŷ = λ y_i + (1 - λ) y_j, where λ ∈ [0, 1] is drawn from the Beta(α, α) distribution
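A minimal mixup sketch on flat feature/label vectors (names are illustrative; real pipelines mix image tensors and one-hot labels batch-wise):

```python
import random

def mixup(xi, yi, xj, yj, alpha=0.2, rng=random):
    """Mixup: build one training example as a convex combination of
    two, with mixing weight lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x_hat = [lam * a + (1 - lam) * b for a, b in zip(xi, xj)]
    y_hat = [lam * a + (1 - lam) * b for a, b in zip(yi, yj)]
    return x_hat, y_hat
```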
33. Training Refinements
5. Experiment Results
- ε = 0.1 for label smoothing
- T = 20 for knowledge distillation
- α = 0.2 for mixup training