Binary classification and linear separators. Perceptron, ADALINE, artificial neurons. Artificial neural networks (ANNs), activation functions, and universal approximation theorem. Linear versus non-linear classification problems. Typical tasks, architectures and loss functions. Gradient descent and back-propagation. Support Vector Machines (SVMs), soft margins and kernel trick. Connections between ANNs and SVMs.
Localization and classification. Overfeat: class-agnostic versus class-specific localization, fully convolutional neural networks, greedy merge strategy. Multi-object detection. Region proposal and selective search. R-CNN, Fast R-CNN, Faster R-CNN and YOLO. Image segmentation. Semantic segmentation and transposed convolutions. Instance segmentation and Mask R-CNN. Image captioning. Recurrent Neural Networks (RNNs). Language generation. Long Short-Term Memory (LSTM). DeepImageSent, Show and Tell, and Show, Attend and Tell algorithms.
Overview of the course. Introduction to image sciences, image processing and computer vision. Basics of machine learning, terminology, paradigms. No-free-lunch theorem. Supervised versus unsupervised learning. Clustering and K-Means. Classification and regression. Linear least squares and polynomial curve fitting. Model complexity and overfitting. Curse of dimensionality. Dimensionality reduction and principal component analysis. Image representation, semantic gap, image features, and classical computer vision pipelines.
Image generation. Gaussian models for human faces, limits and relations with linear neural networks. Generative adversarial networks (GANs), generators, discriminators, adversarial loss and two-player games. Convolutional GAN and image arithmetic. Super-resolution. Nearest-neighbor, bilinear and bicubic interpolation. Image sharpening. Linear inverse problems, Tikhonov and Total-Variation regularization. Super-Resolution CNN, VDSR, Fast SRCNN, SRGAN, perceptual, adversarial and content losses. Style transfer: Gatys model, content loss and style loss.
https://telecombcn-dl.github.io/2017-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
Scene classification using Convolutional Neural Networks - Jayani Withanawasam WithTheBest
This talk covers scene classification with Convolutional Neural Networks (CNNs). We seek to redefine computer vision as an AI problem, and to understand the importance of scene classification, its challenges, and the difference between traditional machine learning and deep learning. Additionally, we discuss CNNs, using Caffe to implement CNNs, and important resources for improving.
CNNs
Jayani Withanawasam
Image segmentation is a classic computer vision task that aims at labeling pixels with semantic classes. These slides provide an overview of the basic deep learning approaches applied to tackle this challenge and present the basic subtasks (semantic, instance and panoptic segmentation) and related datasets.
Presented at the International Summer School on Deep Learning (ISSonDL) 2020 held online and organized by the University of Gdansk (Poland) between the 30th August and 2nd September.
http://2020.dl-lab.eu/virtual-summer-school-on-deep-learning/
Zaikun Xu from the Università della Svizzera Italiana presented this deck at the 2016 Switzerland HPC Conference.
“In the past decade, deep learning, as a life-changing technology, has achieved huge success on various tasks, including image recognition, speech recognition, machine translation, etc. Pioneered by several research groups, Geoffrey Hinton (U Toronto), Yoshua Bengio (U Montreal), Yann LeCun (NYU), Juergen Schmidhuber (IDSIA, Switzerland), deep learning is a renaissance of neural networks in the big data era.
A neural network is a learning algorithm that consists of an input layer, hidden layers and an output layer, where each circle represents a neuron and each arrow connection is associated with a weight. A neural network learns by measuring how different the output of the output layer is from the ground truth, then calculating the gradients of this discrepancy w.r.t. the weights and adjusting the weights accordingly. Ideally, it will find weights that map input X to target y with as low an error as possible.”
Watch the video presentation: http://insidehpc.com/2016/03/deep-learning/
See more talks in the Swiss Conference Video Gallery: http://insidehpc.com/2016-swiss-hpc-conference/
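The learning rule described in the quoted abstract (measure the discrepancy between output and ground truth, compute its gradient with respect to the weights, adjust the weights) can be sketched for the simplest possible case. This is a minimal illustration, not taken from the talk: a single linear neuron trained with mean squared error, where the function name, learning rate, epoch count and toy data are all assumptions.

```python
def train_linear_neuron(xs, ys, lr=0.1, epochs=500):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step against its gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data (hypothetical) generated by y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear_neuron(xs, ys)
```

A multi-layer network follows the same loop; back-propagation is only the bookkeeping that obtains the gradients for the hidden-layer weights.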
Anomaly detection using deep one class classifier 홍배 김
- Introduces various methods for anomaly detection
- Uses Support Vector Data Description (SVDD) to
simplify the shape of each cluster so that clusters are easier to model, and
presents a way to handle ambiguous points near the boundary
A comprehensive tutorial on Convolutional Neural Networks (CNNs) that covers the motivation behind CNNs and deep learning in general, followed by a description of the various components involved in a typical CNN layer. It explains the theory behind the different variants used in practice and also gives a big picture of the whole network by putting everything together.
Next, there is a discussion of the various state-of-the-art frameworks used to implement CNNs for real-world classification and regression problems.
Finally, a CNN implementation is demonstrated by implementing the paper 'Age and Gender Classification Using Convolutional Neural Networks' by Hassner (2015).
Evolution of Deep Learning and new advancements Chitta Ranjan
Earlier known as neural networks, deep learning saw a remarkable resurgence in the past decade. Neural networks did not find enough adopters in the past century because of their limited accuracy in real-world applications (for various reasons) and their difficult interpretation. Many of these limitations were resolved in recent years, and the field was re-branded as deep learning. Deep learning is now widely used in industry and has become a popular research topic in academia. The story of its evolution and development is intriguing. In this presentation, we will learn how the issues in the last generation of neural networks were resolved, how the recent advanced methods grew out of the earlier works, and what the different components of deep learning models are.
Feature selection using Deep Neural Networks. March 18, 2016. CSI 991. Kevin Ham
2. Neural Networks Basics. Perceptron: equivalent performance to the least mean squares algorithm (linear regression). Activation function: sigmoid, hyperbolic tangent. Multi-layer perceptrons: chains of perceptrons that perform feature extraction. Training the network: training set, validation set, generalization set. Back-propagation.
3. Perceptron and Activation Function. The basic building block of neural networks (1). Summation of weighted inputs. The bias performs a change in y-intercept. Output is present when the activation threshold is overcome. The activation function must be differentiable. [Figure: perceptron with Input 1, Input 2 and Input 3, weights w1, w2 and w3, a bias, an activation function, and an output.]
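The perceptron slide can be illustrated with a short sketch: a weighted sum of the inputs, a bias that shifts the intercept, and a differentiable activation. The sigmoid is one of the two activations the deck names; the numeric values for the three inputs, the weights w1 to w3 and the bias are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    """Differentiable activation that squashes z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical values for Input 1..3, weights w1..w3 and the bias.
inputs = [1.0, 0.5, -1.0]
weights = [0.4, -0.2, 0.1]
bias = 0.2
output = perceptron(inputs, weights, bias)
```

Swapping `activation` for `math.tanh` gives the hyperbolic tangent variant mentioned on the previous slide.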
4. Multi-Layer Perceptrons and Training. [Figures: classification with a 20-node MLP NN (4); feature extraction with a 5-layer convolutional neural network (2); feature extraction with an MLP NN (4).]
5. Article Objectives. … we propose a supervised approach for task-aware selection of features using Deep Neural Networks (DNN) in the context of action recognition (e.g. walking, running, jumping). (1) … selected features are found to give better classification performance than the original high-dimensional features. (1) It is also shown that the classification performance of the proposed feature selection technique is superior to the low-dimensional representation obtained by principal component analysis (PCA). (1)
6. Methodology. … analyze the contribution of each of the input dimensions to identify the features (inputs) important for classification (1) … to correctly analyze the contribution of an input feature, we study its activation potential (averaged over all training values of the input and hidden neurons) relative to the total activation potential (1) The higher the activation potential contribution of an input dimension, the more likely is its participation in hidden neuronal activity and consequently, classification. (1)
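One plausible reading of the methodology slide is sketched below. The slide does not give a formula, so the details here are assumptions rather than the cited article's definition: the activation potential of input i at hidden neuron j is taken as |w_ji * x_i|, it is divided by the total potential arriving at that neuron, and the ratio is averaged over all training samples and first-layer hidden neurons. The function names and toy numbers are hypothetical.

```python
def input_contributions(samples, weights):
    """Average relative activation potential of each input dimension.

    samples: list of input vectors, one per training example.
    weights: weights[j][i] connects input i to first-layer hidden neuron j.
    """
    n_inputs = len(samples[0])
    totals = [0.0] * n_inputs
    count = 0
    for x in samples:
        for row in weights:
            # Assumed definition: potential of input i at this neuron is |w * x_i|.
            potentials = [abs(w * xi) for w, xi in zip(row, x)]
            total = sum(potentials)
            if total == 0.0:
                continue  # this neuron receives no input for this sample
            for i, p in enumerate(potentials):
                totals[i] += p / total
            count += 1
    return [t / count for t in totals] if count else totals

def select_features(contributions, k):
    """Indices of the k inputs with the largest average contribution."""
    ranked = sorted(range(len(contributions)),
                    key=lambda i: contributions[i], reverse=True)
    return sorted(ranked[:k])

# Toy example (hypothetical): 2 training samples, 3 inputs, 2 hidden neurons.
samples = [[1.0, 0.0, 2.0], [0.5, 0.1, 1.0]]
weights = [[0.2, 0.9, 1.0], [0.1, 0.5, 2.0]]
scores = input_contributions(samples, weights)
top2 = select_features(scores, 2)
```

Inputs with a higher score take a larger share of hidden-neuron activity, which is the criterion the slide uses to decide which features to keep.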
ResNet (short for Residual Network) is a deep neural network architecture that has achieved significant advancements in image recognition tasks. It was introduced by Kaiming He et al. in 2015.
The key innovation of ResNet is the use of residual connections, or skip connections, that enable the network to learn residual mappings instead of directly learning the desired underlying mappings. This addresses the problem of vanishing gradients that commonly occurs in very deep neural networks.
In a ResNet, the input data flows through a series of residual blocks. Each residual block consists of several convolutional layers, each followed by batch normalization and a rectified linear unit (ReLU) activation. The original input to a residual block also bypasses these layers through a shortcut connection and is added to their output. Because of this addition, the layers only need to learn the residual F(x) = H(x) - x, the difference between the desired mapping H(x) and the input x, rather than H(x) itself.
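The shortcut connection can be sketched in a few lines. This is a simplified illustration, not the architecture from the original paper: fully connected layers stand in for the convolutional layers, batch normalization is omitted, and all function names and values are hypothetical.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def residual_block(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is two dense layers with a ReLU between."""
    h = relu(dense(x, w1, b1))
    f = dense(h, w2, b2)                             # residual branch F(x)
    return relu([fi + xi for fi, xi in zip(f, x)])   # shortcut adds the input x

x = [1.0, 2.0]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
zero_b = [0.0, 0.0]
# With all-zero weights F(x) = 0, so the block passes non-negative inputs through unchanged.
identity_out = residual_block(x, zero_w, zero_b, zero_w, zero_b)
```

The all-zero case shows why depth is cheap here: a block that has learned nothing behaves as an identity mapping instead of destroying its input.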
By using residual connections, the gradients can propagate more effectively through the network, enabling the training of deeper models. This makes it possible to build very deep ResNet architectures with more than a hundred layers, such as ResNet-101 or ResNet-152, while still maintaining good performance.
ResNet has become a widely adopted architecture in various computer vision tasks, including image classification, object detection, and image segmentation. Its ability to train very deep networks effectively has made it a fundamental building block in the field of deep learning.
Similar to MLIP - Chapter 3 - Introduction to deep learning (20)
Heat equation. Discretization and finite difference. Explicit and implicit Euler schemes. CFL conditions. Continuous Gaussian convolution solution. Linear and non-linear scale spaces. Anisotropic diffusion. Perona-Malik and Weickert model. Variational methods. Tikhonov regularization by gradient descent. Links between variational models and diffusion models. Total-Variation regularization and ROF model. Sparsity and group sparsity. Applications to image deconvolution.
Sample mean, law of large numbers, and method of moments. Mean square error and bias-variance tradeoff. Unbiased estimation: MVUE, Cramér-Rao bound, efficiency, MLE. Linear estimation: BLUE, Gauss-Markov theorem, least-square error estimator, Moore-Penrose pseudo-inverse. Bayesian estimators: likelihood and priors, MMSE and posterior mean, MAP. Linear MMSE, applications to Wiener deconvolution, image filtering with PCA, and the non-local Bayes algorithm. Applications to image denoising.
Limits of Wiener filtering. Non-linear shrinkage functions. Limits of Fourier representation. Continuous and discrete wavelet transforms. Sparsity and shrinkage in wavelet domain. Undecimated wavelet transforms, à trous algorithm. Regularization. Sparse regression, combinatorial optimization and matching pursuit. LASSO, non-smooth optimization, and proximal minimization. Link with implicit Euler scheme. ISTA algorithm. Synthesis versus analysis regularized models. Applications to image deconvolution.
Patch models and sparse decompositions of image patches. Dictionary learning and the k-SVD algorithm. Collaborative filtering and BM3D. Non-local sparse based models. Expected patch log-likelihood. Other applications of patch models in inpainting, super-resolution and deblurring.
IVR - Chapter 2 - Basics of filtering I: Spatial filters (25Mb) Charles Deledalle
Moving averages. Finite differences and edge detectors. Gradient, Sobel and Laplacian. Linear translation-invariant filters, cross-correlation and convolution. Adaptive and non-linear filters. Median filters. Morphological filters. Local versus global filters. Sigma filter. Bilateral filter. Patches and non-local means. Applications to image denoising.
Image sciences, image processing, image restoration, photo manipulation. Image and video representation. Digital versus analog imagery. Quantization and sampling. Sources and models of noise in digital CCD imagery: photon, thermal and readout noise. Sources and models of blur. Convolutions and point spread functions. Overview of other standard models, problems and tasks: salt-and-pepper and impulse noise, halftoning, inpainting, super-resolution, compressed sensing, high dynamic range imagery, demosaicing. Short introduction to other types of imagery: SAR, Sonar, ultrasound, CT and MRI. Linear and ill-posed restoration problems.
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...ssuser7dcef0
Power plants release a large amount of water vapor into the
atmosphere through the stack. The flue gas can be a potential
source for obtaining much needed cooling water for a power
plant. If a power plant could recover and reuse a portion of this
moisture, it could reduce its total cooling water intake
requirement. One of the most practical way to recover water
from flue gas is to use a condensing heat exchanger. The power
plant could also recover latent heat due to condensation as well
as sensible heat due to lowering the flue gas exit temperature.
Additionally, harmful acids released from the stack can be
reduced in a condensing heat exchanger by acid condensation. reduced in a condensing heat exchanger by acid condensation.
Condensation of vapors in flue gas is a complicated
phenomenon since heat and mass transfer of water vapor and
various acids simultaneously occur in the presence of noncondensable
gases such as nitrogen and oxygen. Design of a
condenser depends on the knowledge and understanding of the
heat and mass transfer processes. A computer program for
numerical simulations of water (H2O) and sulfuric acid (H2SO4)
condensation in a flue gas condensing heat exchanger was
developed using MATLAB. Governing equations based on
mass and energy balances for the system were derived to
predict variables such as flue gas exit temperature, cooling
water outlet temperature, mole fraction and condensation rates
of water and sulfuric acid vapors. The equations were solved
using an iterative solution technique with calculations of heat
and mass transfer coefficients and physical properties.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
TOP 10 B TECH COLLEGES IN JAIPUR 2024.pptxnikitacareer3
Looking for the best engineering colleges in Jaipur for 2024?
Check out our list of the top 10 B.Tech colleges to help you make the right choice for your future career!
1) MNIT
2) MANIPAL UNIV
3) LNMIIT
4) NIMS UNIV
5) JECRC
6) VIVEKANANDA GLOBAL UNIV
7) BIT JAIPUR
8) APEX UNIV
9) AMITY UNIV.
10) JNU
TO KNOW MORE ABOUT COLLEGES, FEES AND PLACEMENT, WATCH THE FULL VIDEO GIVEN BELOW ON "TOP 10 B TECH COLLEGES IN JAIPUR"
https://www.youtube.com/watch?v=vSNje0MBh7g
VISIT CAREER MANTRA PORTAL TO KNOW MORE ABOUT COLLEGES/UNIVERSITITES in Jaipur:
https://careermantra.net/colleges/3378/Jaipur/b-tech
Get all the information you need to plan your next steps in your medical career with Career Mantra!
https://careermantra.net/
3. Deep learning
Brief history
• Popularized by Hinton in 2006 with Restricted Boltzmann Machines
• Developed by different actors:
and many others...
3
4. Deep learning – Hype or reality?
Quotes
I have worked all my life in Machine Learning, and I’ve never seen one
algorithm knock over benchmarks like Deep Learning
– Andrew Ng (Stanford & Baidu)
Deep Learning is an algorithm which has no theoretical limitations
of what it can learn; the more data you give and the more
computational time you provide, the better it is – Geoffrey Hinton (Google)
Human-level artificial intelligence has the potential to help humanity
thrive more than any invention that has come before it – Dileep George
(Co-Founder Vicarious)
For a very long time it will be a complementary tool that human
scientists and human experts can use to help them with the things
that humans are not naturally good at – Demis Hassabis (Co-Founder DeepMind)
(Source: Lucas Masuch)
5. Deep learning
Actors and applications
• Very active technology adopted by big actors
• Success story for many different academic problems
• Image processing
• Computer vision
• Speech recognition
• Natural language processing
• Translation
• etc
5
7. Deep learning – Hype or reality?
NIPS (Computational Neuroscience Conference) Growth
7
8. Deep learning – Success stories
Success stories
AlexNet wins the ImageNet classification challenge (Krizhevsky, 2012)
Since then, deep learning has been the standard and has achieved remarkable results.
8
9. Deep learning – Success stories
Success stories
Google Brain’s image super-resolution (Dahl et al., 2017).
“Google’s neural networks have achieved the dream
of CSI viewers”, The Guardian.
9
10. Deep learning – Success stories
Success stories
Image captioning (Karpathy, Fei-Fei, CVPR, 2015).
“I had never thought AI would arrive at this stage that fast”, D. J. Lasiman.
10
11. Deep learning – Success stories
Success stories
Speech recognition
Same improvements in other fields: automatic translation, Go, . . .
11
13. Deep learning
ANNs recap
• Interconnections of neurons (summations and activations),
• Feedforward architecture: input → hidden layer(s) → output,
• Parameterized by a collection of weights Wk (and biases),
• Backprop: parameters learned from a collection of labeled data.
14. Deep learning
What is deep learning?
• Representation learning using artificial neural networks
→ Learning good features automatically from raw data.
→ Exceptionally effective at learning patterns.
• Learning representations of data with multiple levels of abstraction
→ hierarchy of layers that mimic the neural networks of our brain,
→ cascade of non-linear transforms.
Google’s detection NN
• If you provide the system with tons of information, it begins to understand
it and responds in useful ways.
(Source: Caner Hazırbaş & Lucas Masuch)
15. Deep learning
How to teach a machine?
Good representations are often very complex to define.
(Source: Caner Hazırbaş)
14
16. Deep learning
Deep learning: No more feature engineering
Learn the latent factors/features behind the data generation
(Source: Lucas Masuch)
15
17. Deep learning
Inspired by the Brain
• The first hierarchy of neurons that receives information in the visual cortex
is sensitive to specific edges, while brain regions further down the visual
pipeline are sensitive to more complex structures such as faces.
• Our brain has lots of neurons connected together and the strength of the
connections between neurons represents long term knowledge.
• One learning algorithm hypothesis: all significant mental algorithms are
learned except for the learning and reward machinery itself.
(Source: Lucas Masuch)
16
18. Deep learning
Deep learning: No more feature engineering
• The traditional model of pattern recognition (since the late 50’s)
• Fixed/engineered features (or fixed kernel) + trainable classifier
• End-to-end learning / Feature learning / Deep learning
• Trainable features (or kernel) + trainable classifier
(Source: Yann LeCun & Marc’Aurelio Ranzato)
17
19. Deep learning
Deep learning: No more feature engineering
• Non-deep learning architecture for pattern recognition
• Speech recognition: early 90’s – 2011
• Object Recognition: 2006 – 2012
(Source: Yann LeCun & Marc’Aurelio Ranzato)
18
21. Deep learning
Trainable feature hierarchy
• Hierarchy of representations with increasing levels of abstraction.
• Each stage is a kind of trainable feature transform.
Image recognition
• Pixel → edge → texton → motif → part → object
Text
• Character → word → word group → clause → sentence → story
Speech
• Sample → spectral band → sound → ... → phone → phoneme → word
Deep Learning addresses the problem of
learning hierarchical representations.
(Source: Yann LeCun & Marc’Aurelio Ranzato)
20
22. Deep learning
Deep learning – Feature hierarchy
• It’s deep if it has more than one stage of non-linear feature transformation
[Figure: feature visualization of a convolutional net trained on ImageNet (Zeiler & Fergus, 2013); panel label: Low-Level Feature]
(Source: Yann LeCun & Marc’Aurelio Ranzato)
23. Deep learning
Deep learning – Feature hierarchy
Each layer progressively extracts higher level features of the input until the final
layer essentially makes a decision about what the input shows. The more layers
the network has, the higher level features it will learn.
(Source: Andrew Ng & Lucas Masuch & Caner Hazırbaş)
22
24. Deep learning
Deep learning – Feature hierarchy
• Mid- to high-level features are specific to the given task.
• Low-level representations contain less specific features
(they can be shared across many different applications).
23
25. Deep learning
Deep learning – Training
Today’s trend: make it deeper and deeper
• 2012: 8 layers (AlexNet – Krizhevsky et al., 2012)
• 2014: 19 layers (VGG Net – Simonyan & Zisserman, 2014)
• 2014: 22 layers (GoogLeNet – Szegedy et al., 2014)
• 2015: 152 layers (ResNet – He et al., 2015)
• 2016: 201 layers (DenseNet – Huang et al., 2017)
But remember, with back-propagation:
• We get stuck at local optima or saddle points
• The learning time does not scale well
• it is very slow for deep networks and can be unstable.
How did networks get so deep? First, why does backprop fail?
24
26. Deep learning
Deep learning – Gradient vanishing problems
Back-propagation and gradient vanishing problems
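Update $w_{i,j} \leftarrow w_{i,j} - \gamma \frac{\partial E(W)}{\partial w_{i,j}}$ with $\frac{\partial E(W)}{\partial w_{i,j}} = \sum_{(x,d) \in T} \delta_i h_j$
where $\delta_i = g'(a_i) \times \begin{cases} y_i - d_i & \text{if } i \text{ is an output node} \\ \sum_l w_{l,i} \delta_l & \text{otherwise} \end{cases}$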
• With deep networks, the gradient vanishes quickly.
• Unfortunately, this arises even though we are far from a solution.
• The updates become insignificant, which leads to slow training rates.
• This strongly depends on the shape of $g'(a)$.
• The gradient may also explode leading to instabilities:
→ gradient exploding problem.
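The dependence on the derivative of the activation can be illustrated numerically. The sketch below is a toy under stated assumptions (a chain of single sigmoid units with unit weights and identical pre-activations, not a full network; `backprop_factor` is a hypothetical helper): it applies the recursion for δ repeatedly and shows that the back-propagated signal shrinks geometrically with depth, because the sigmoid derivative never exceeds 1/4.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_prime(a):
    s = sigmoid(a)
    return s * (1.0 - s)

def backprop_factor(depth, a=0.0, w=1.0):
    """Magnitude of delta after back-propagating through `depth` sigmoid
    units, each with pre-activation `a` and weight `w` (toy assumption)."""
    delta = 1.0  # error signal (y - d) at the output node, set to 1
    for _ in range(depth):
        delta = sigmoid_prime(a) * w * delta  # delta_i = g'(a_i) * w * delta_l
    return delta

depths = (1, 5, 10, 20)
factors = [backprop_factor(d) for d in depths]
for d, f in zip(depths, factors):
    print(f"depth {d:2d}: delta magnitude {f:.3e}")
```

With larger weights the same product can grow instead of shrink, which is the exploding counterpart.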
25
27. Deep learning
Deep learning – Gradient vanishing problem
As the network gets deeper, the landscape of E becomes:
• very hilly → lots of stationary points,
• with large plateaus → gradient vanishing problem,
• and delimited by cliffs → gradient exploding problem.
So, what has changed?
26
29. Deep learning – Unsupervised pretraining (Prehistory)
(Greedy layer-wise) Unsupervised pretraining
Idea: Training a shallow network is easier than training a deep one.
1 Pretraining: Train several successive layers in an unsupervised way,
2 Stacking: Stack them, stick a classifier at the last layer, and
3 Fine tuning: do classical back propagation.
• Originally proposed by Hinton 2006 to learn Deep Belief Networks (DBN),
→ Considered as deep learning renaissance.
• Greedy because each layer is optimized independently,
• Regularization: decreases test error without decreasing training error.
How to train layers/networks in an unsupervised way?
For DBN, Hinton proposed using restricted Boltzmann machines.
But here, we will consider auto-encoders (Bengio et al., 2007).
27
30. Deep learning – Unsupervised pretraining – Auto-encoders
Auto-encoders (AEs)
• Unsupervised learning: unlabeled training data $T = \{x^1, x^2, \ldots, x^N\}$
• Applications: Dimensionality reduction, denoising, feature extraction, . . .
• Model: feed-forward NNs trained to reproduce the inputs at the outputs,
• Early attempts date back to auto-associative memories (Kohonen 80s),
• Renewed interest since 2006.
Trained with backprop, using the x’s both as
inputs and desired outputs, e.g.:
$E(W) = \sum_{i=1}^{N} \|x^i - y^i\|^2$ where $y^i = f(x^i; W)$
The hidden layer h learns features relevant for the
given dataset. Also called representation layer.
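As a minimal sketch of this training procedure, the NumPy code below fits a one-hidden-layer auto-encoder by batch gradient descent on the mean reconstruction error. Everything specific here is an assumption made for illustration: the synthetic data, the tanh encoder with a linear decoder, the layer sizes, and the learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): N points in R^4 lying on a 2-D subspace.
N, D, H = 200, 4, 2
X = rng.normal(size=(N, 2)) @ rng.normal(size=(2, D))
X /= X.std()

# Encoder h = tanh(W1 x + b1), decoder y = W2 h + b2.
W1 = 0.1 * rng.normal(size=(H, D)); b1 = np.zeros(H)
W2 = 0.1 * rng.normal(size=(D, H)); b2 = np.zeros(D)

def forward(X):
    Hid = np.tanh(X @ W1.T + b1)   # representation layer h
    Y = Hid @ W2.T + b2            # reconstruction y
    return Hid, Y

def loss(X):
    return np.sum((X - forward(X)[1]) ** 2) / len(X)

initial_loss = loss(X)
gamma = 0.01
for _ in range(1000):
    Hid, Y = forward(X)
    dY = 2.0 * (Y - X) / N               # gradient of the mean error w.r.t. Y
    dW2 = dY.T @ Hid; db2 = dY.sum(axis=0)
    dA = (dY @ W2) * (1.0 - Hid ** 2)    # back through tanh
    dW1 = dA.T @ X; db1 = dA.sum(axis=0)
    W1 -= gamma * dW1; b1 -= gamma * db1
    W2 -= gamma * dW2; b2 -= gamma * db2
final_loss = loss(X)
print(initial_loss, final_loss)
```

The inputs X play both roles: they are fed to the network and used as the desired outputs.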
28
31. Deep learning – Unsupervised pretraining – Auto-encoders
Auto-encoders (AEs)
• Will try to reproduce the input, i.e., to learn the identity function.
• In order to avoid this trivial solution: constrain the hidden layer.
Undercomplete representations
• The classical constraint is to use fewer hidden units than input units,
(like on the previous illustration)
• Dimensionality reduction: learn a compressed representation h of the data
by exploiting the correlation among the features,
• If all units are linear and the loss is the MSE: performs a transformation
similar to PCA, i.e., it learns a projection that maximizes the variance of
the data (without the orthogonal property).
29
32. Deep learning – Unsupervised pretraining – Auto-encoders
Auto-encoders (AEs)
Overcomplete representations (1/3)
• Hidden layer may be larger than the input layer.
• Add a penalization into the loss function to avoid the identity solution:
$E(W) = \sum_{i=1}^{N} \|x^i - y^i\|^2 + \Omega(h^i)$
• The penalty prevents hidden units from being
freely independent, and h from taking arbitrary
configurations.
• This requires adapting the backprop algorithm.
30
33. Deep learning – Unsupervised pretraining – Auto-encoders
Auto-encoders (AEs)
Overcomplete representations (2/3)
• Sparse autoencoders: $\ell_1$ penalty: $\Omega(h) = \lambda \|h\|_1 = \lambda \sum_k |h_k|$
• promote sparsity: tends to drive some entries of h to be exactly zero,
• responds differently to different stimuli x (different non-zero coeffs),
• hidden units are specialized,
• learns local mappings (union of subspaces/manifold learning).
31
36. Deep learning – Unsupervised pretraining – Auto-encoders
Auto-encoders (AEs)
Overcomplete representations (3/3)
• Contractive autoencoders:
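$\Omega(h) = \lambda \left\| \frac{\partial h}{\partial x} \right\|_F^2 = \lambda \sum_k \sum_l \left( \frac{\partial h_k}{\partial x_l} \right)^2$
• Encourage $\frac{\partial h}{\partial x}$ to be close to 0,
• Insensitive to variations in x except in meaningful directions.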
• Prior on the distribution of hidden units
• Under a probabilistic interpretation of the NN,
priors can play the role of penalties (Bayesian framework).
• Ex: $\Omega(h) = \lambda \|h\|_1 = -\log p(h) + \text{cst.}$ under a Laplacian prior $p$.
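To make the contractive penalty concrete, here is a small sketch assuming a sigmoid encoder h = sigmoid(Wx + b) (the slides do not fix the encoder, so this choice is an assumption). In that case ∂h_k/∂x_l = h_k(1 − h_k)W_kl, so the squared Frobenius norm factorizes. The second function, a hypothetical sanity check, recomputes the same quantity from a finite-difference Jacobian.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def contractive_penalty(W, b, x, lam=1.0):
    """lam * ||dh/dx||_F^2 for the encoder h = sigmoid(W x + b)."""
    h = sigmoid(W @ x + b)
    # dh_k/dx_l = h_k (1 - h_k) W_kl, so the squared norm factorizes per unit.
    return lam * np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=1))

def numerical_penalty(W, b, x, lam=1.0, eps=1e-6):
    """Same quantity from a central finite-difference Jacobian."""
    J = np.empty((W.shape[0], x.size))
    for l in range(x.size):
        e = np.zeros_like(x)
        e[l] = eps
        J[:, l] = (sigmoid(W @ (x + e) + b) - sigmoid(W @ (x - e) + b)) / (2 * eps)
    return lam * np.sum(J ** 2)

rng = np.random.default_rng(0)
W, b, x = rng.normal(size=(3, 5)), rng.normal(size=3), rng.normal(size=5)
print(contractive_penalty(W, b, x), numerical_penalty(W, b, x))
```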
32
37. Deep learning – Unsupervised pretraining – Auto-encoders
Denoising auto-encoders (DAEs) (Vincent et al., 2008)
• Train the AE to reconstruct the input from a corrupted version of it:
• Some inputs randomly set to 0: $\tilde{x}_i = x_i$ with probability $P$, 0 otherwise,
• Or additive Gaussian noise: $\tilde{x}_i = x_i + w_i$, $w_i \sim \mathcal{N}(0, \sigma^2)$.
• Force the hidden layer to discover more robust features,
• Sort of regularization: prevents from simply learning the identity.
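The two corruption schemes can be sketched as follows. The function names and the parameter values are illustrative assumptions; only the corruption step is shown, not the training of the auto-encoder itself.

```python
import numpy as np

def mask_corrupt(x, keep_prob, rng):
    """Keep each entry with probability keep_prob, set it to 0 otherwise."""
    return x * (rng.random(x.shape) < keep_prob)

def gaussian_corrupt(x, sigma, rng):
    """Add i.i.d. Gaussian noise N(0, sigma^2) to each entry."""
    return x + sigma * rng.normal(size=x.shape)

rng = np.random.default_rng(0)
x = np.ones(10_000)
x_masked = mask_corrupt(x, keep_prob=0.7, rng=rng)
x_noisy = gaussian_corrupt(x, sigma=0.1, rng=rng)
# The DAE is then trained to minimize ||x - f(x_tilde; W)||^2: the input is
# the corrupted x_tilde while the target remains the clean x.
print(x_masked.mean(), x_noisy.std())
```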
33
38. Deep learning – Unsupervised pretraining – Auto-encoders
Auto-encoders (AEs) – Visualizing hidden units activities
One way to understand what a hidden unit (or any unit) is doing,
is to look at its weight vector.
• Sparse AE trained on a small set of 10×10 images,
• 100 hidden units (hidden layer size = input layer size),
• Each square represents an input image maximizing the
response of one of the hidden units,
• Each hidden unit has learned a feature detector,
• Mostly edge detectors at different angles and positions
→ Low-level features.
(Source: Andrew Ng)
How to get higher level features?
34
41. Deep learning – Unsupervised pretraining – Auto-encoders
Stacked auto-encoders (Bengio et al., 2007)
• Idea: Train successive layers of AEs to learn more complex features.
• Strategy: (greedy layer-wise training)
• Each AE is trained using the representation layer of a previous AE,
• The encoders and decoders are next stacked together,
• Fine tuning can be performed to tweak the deep AE.
35
42. Deep learning – Unsupervised pretraining – Auto-encoders
Deep auto-encoders
Learning High Level Representations in Videos – Google (Le et al. 2012)
• 10 million images from YouTube videos,
• Images are 200 × 200 pixels,
• Sparse deep auto-encoder with $10^9$ connections,
• Show test images to the network (e.g., faces),
• Look for neurons with maximum response,
• Some neurons respond to high level characteristics
→ Faces, cats, silhouettes, . . .
• Major breakthrough in machine learning.
Remark: in fact this deep auto-encoder has not been trained using
unsupervised pretraining, but with techniques that will be introduced later.
36
43. Deep learning – Unsupervised pretraining – Auto-encoders
Combining deep auto-encoders with a classifier
Idea:
• The learned features can be used as inputs to any other system,
• Instead of using the input features x use h obtained at the deepest layer
of the encoder.
Strategy:
1 Given a training set $\{(x^1, d^1), \ldots, (x^N, d^N)\}$,
2 Train an auto-encoder in an unsupervised way on $\{x^1, \ldots, x^N\}$,
3 Get the feature codes $\{h^1, \ldots, h^N\}$ (from the deepest layer),
4 Use these codes to train a classifier on $\{(h^1, d^1), \ldots, (h^N, d^N)\}$,
5 Plug the encoder and the classifier together,
6 Perform fine tuning (end-to-end training with backprop).
37
44. Deep learning – Unsupervised pretraining – Auto-encoders
Combining deep auto-encoders with a classifier
1 Train auto-encoder:
2 Plug classifier:
38
47. Deep learning
Unsupervised pretraining – Overview
Why not learn all the layers at once?
• The gradients propagated backwards diminish rapidly in magnitude so that
the first layers will not learn much (vanishing gradient problem),
• Non convex form of the loss function: many local minima.
Why use auto-encoders?
• Unsupervised learning
• Often easy to get unlabeled data,
• Extract “meaningful” features,
• Can be used as preprocessing for other datasets (transfer learning).
Supervised fine tuning
• Can be used when labels are available for other tasks.
Today, there exist procedures for training deep NN architectures from
scratch, but the unsupervised pretraining approach
was the first method to succeed.
39
49. Modern deep learning
What has changed again?
Modern deep learning makes use of several tools and heuristics for training
large architectures: type of units, normalization, optimization. . .
1 Fight vanishing/exploding gradients
−→ Rectifiers, Gradient clipping.
2 Improve accuracy by using tons of data thanks to
• Stochastic gradient descent (and variants),
• GPUs, parallelization, distributed computing, . . .
3 Fight poor solutions (saddle points) and over-fitting
−→ Early stopping, Dropout, Batch normalization, Regularization, . . .
4 Multitude of open source solutions.
40
50. Modern deep learning – Tools and recipes
Disclaimers
The field evolves so quickly that parts of what follows
are possibly already outdated.
And if they are not outdated yet, they will be in the next few months.
Many recent tools, sometimes less than a year old, become the new standard
just because they work better.
Why do they work better? The answer is usually based on intuition, and
rigorous answers are often lacking.
41
51. Modern deep learning – Tools and recipes
Rectifier (Glorot, Bordes and Bengio, 2011)
• Goal: Avoid the gradient vanishing problem.
• How: make sure $g'(a_j)$ is not close to zero for all training points,
which often happened with hyperbolic tangents or sigmoids.
• Rectified linear unit (ReLU): $g(a) = \max(a, 0)$
• Softplus: g(a) = log (1 + exp a)
• Leaky ReLU: g(a) = max(a, 0.1a)
• Exponential LU (ELU): $g(a) = \begin{cases} a & \text{if } a > 0 \\ \alpha(e^a - 1) & \text{otherwise} \end{cases}$
In a randomly initialized network, about 50% of hidden units are activated.
It also promotes sparsity of hidden units (structured/organized features).
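The four activations listed on this slide can be written in a few lines of NumPy. This is a sketch: the default `alpha=1.0` for ELU and the use of standard normal pre-activations to check the "about 50%" remark are assumptions made for illustration.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def softplus(a):
    return np.logaddexp(0.0, a)  # log(1 + exp(a)), numerically stable

def leaky_relu(a, slope=0.1):
    return np.maximum(a, slope * a)

def elu(a, alpha=1.0):
    a = np.asarray(a, dtype=float)
    # np.minimum avoids overflow in exp for large positive inputs.
    return np.where(a > 0, a, alpha * (np.exp(np.minimum(a, 0.0)) - 1.0))

a = np.array([-2.0, 0.0, 3.0])
print(relu(a), softplus(a), leaky_relu(a), elu(a))

# With zero-mean random pre-activations, about half of the ReLU units fire.
rng = np.random.default_rng(0)
active_fraction = np.mean(relu(rng.normal(size=100_000)) > 0)
print(active_fraction)
```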
42
52. Modern deep learning – Tools and recipes
Gradient clipping (Mikolov, 2012; Pascanu et al., 2012)
• Goal: Avoid the gradient exploding problem.
• How: make sure $\nabla E(W)$ does not get too large.
• Cliffs cause the learning process to overshoot a desirable minimum.
• Preserve the direction but prevent extreme values:
If $\|\nabla E(W^t)\| \leq \tau$: $W^{t+1} \leftarrow W^t - \gamma \nabla E(W^t)$
Else: $W^{t+1} \leftarrow W^t - \gamma \tau \frac{\nabla E(W^t)}{\|\nabla E(W^t)\|}$
• Pick threshold τ by averaging the norm over a large number of updates.
• New solution $W^{t+1}$ stays in a ball of radius $\gamma\tau$ around the current solution.
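A minimal sketch of one clipped update, with a hypothetical helper name and toy values chosen only for illustration. The gradient keeps its direction, but its norm is capped at τ, so the step length never exceeds γτ.

```python
import numpy as np

def clipped_step(W, grad, gamma, tau):
    """One gradient step where the gradient norm is capped at tau."""
    norm = np.linalg.norm(grad)
    if norm > tau:
        grad = tau * grad / norm  # same direction, norm exactly tau
    return W - gamma * grad

W = np.zeros(2)
small = clipped_step(W, np.array([0.3, 0.4]), gamma=0.1, tau=1.0)    # norm 0.5: unchanged
large = clipped_step(W, np.array([30.0, 40.0]), gamma=0.1, tau=1.0)  # norm 50: clipped
print(small, large)
```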
43
53. Modern deep learning – Tools and recipes
Stochastic gradient descent
Reminder: (Batch) gradient descent with backprop
1 For each training sample
• Perform backprop,
• Sum their gradients.
2 Update the weights (based on the gradient over the whole batch).
→ Go to step 1.
→ Convergence to a stationary point under some technical assumptions.
• Requires the loading of the whole dataset into memory (big data sets),
• Can’t be efficiently parallelized or used online,
• Scanning the whole training set is slow (epoch),
• Epoch: a scan in which each training sample has been visited at least once.
• After 10 epochs, the weights have been updated only 10 times,
• Designed to fall into local minima.
44
54. Modern deep learning – Tools and recipes
Stochastic gradient descent
Stochastic gradient descent (SGD) (Robbins and Monro, 1951)
1 For each training sample (shuffled randomly – thus “stochastic”)
• Perform backprop,
• Evaluate its gradient,
• Update the weights (based only on that sample).
→ Go to step 1.
→ Convergence under technical assumptions with high probability.
• After 10 epochs, the weights have been updated 10 × N times
(N size of the training set),
• Faster if the gradient of each sample is a good proxy for the total gradient,
• Can get out of local minima as it explores the space in a random fashion,
• It allows online learning,
• But it is noisy, unstable and can still get trapped in local minima.
• Origin: ADALINE (Widrow & Hoff, 1960).
45
55. Modern deep learning – Tools and recipes
Stochastic gradient descent
Mini-batch gradient descent
1 Shuffle: create a random partition of subsets of size b (mini-batches)
2 For each mini-batch
• Perform backprop and accumulate the gradients,
• Update the weights (based on the gradient of the mini-batch)
→ Go to step 1.
→ Convergence under technical assumptions with high probability.
• Usually, a mini-batch contains around 50 to 256 data points,
• After 10 epochs, the weights have been updated 10 × N/b times
• Smooths the noise, and thus provides a better proxy for the total gradient,
• Can still get out of some local minima,
• Trade-off between batch GD and SGD (stability vs. speed)
• Enjoys highly optimized matrix manipulation tools.
Remark: Nowadays, SGD often refers to mini-batch gradient descent.
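The mini-batch loop can be sketched on a toy least-squares problem. The data, the learning rate and the batch size b = 50 are assumptions for illustration; the gradient is computed in closed form rather than by backprop, since the model is linear.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem (assumption): d = X w_true + small noise.
N, D, b = 1000, 5, 50
X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
d = X @ w_true + 0.01 * rng.normal(size=N)

def grad(w, X_batch, d_batch):
    """Gradient of the mean squared error over one mini-batch."""
    return 2.0 * X_batch.T @ (X_batch @ w - d_batch) / len(d_batch)

w = np.zeros(D)
gamma, n_updates = 0.05, 0
for epoch in range(10):
    perm = rng.permutation(N)            # shuffle: random partition of the data
    for start in range(0, N, b):
        idx = perm[start:start + b]      # one mini-batch of size b
        w -= gamma * grad(w, X[idx], d[idx])
        n_updates += 1
print(n_updates, np.linalg.norm(w - w_true))
```

After 10 epochs the counter equals 10 × N/b = 200 updates, compared with 10 for batch gradient descent and 10 × N = 10,000 for per-sample SGD.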
46
56. Modern deep learning – Tools and recipes
Stochastic gradient descent
Mini-batch gradient descent:
• Not only faster,
• Usually finds better solutions.
Because randomness helps explore new configurations,
and mini-batches favor more relevant configurations.
But make sure to adjust the learning rates.
47
57. Modern deep learning – Tools and recipes
Learning rate (reminder)
$W^{t+1} \leftarrow W^t - \gamma \nabla E(W^t)$
48
58. Modern deep learning – Tools and recipes
Learning rate schedules
Learning rate schedules: reduce γ at each epoch t
$W^{t+1} \leftarrow W^t - \gamma^t \nabla E(W^t)$ with $\lim_{t \to \infty} \gamma^t = 0$
• Time-based decay: $\gamma^t = \frac{1}{1 + \kappa t} \gamma^0$, $\kappa > 0$ (aka search-then-converge)
• Step decay: $\gamma^t = \kappa^t \gamma^0$, $0 < \kappa < 1$ (aka drop decay; here $\kappa^t$ is $\kappa$ to the power $t$)
• Exponential decay: $\gamma^t = \exp(-\kappa t) \gamma^0$, $\kappa > 0$
• Guaranteed to converge (prevents SGD from oscillating forever),
• But may stop too early (too fast),
• Connections with simulated annealing ($\gamma^t$ seen as temperature),
→ Hence, often referred to as annealing.
• Large $\gamma^t$: explore the space (allows uphill jumps),
• Small $\gamma^t$: move downhill towards the best local solution.
• Variant: restart schedule from the best iterate (achieving smaller loss).
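The three schedules translate directly into code. This is a sketch with arbitrary illustrative values for the initial rate and κ; the function names are hypothetical.

```python
import numpy as np

def time_based_decay(t, gamma0, kappa):
    return gamma0 / (1.0 + kappa * t)      # kappa > 0

def step_decay(t, gamma0, kappa):
    return gamma0 * kappa ** t             # 0 < kappa < 1

def exponential_decay(t, gamma0, kappa):
    return gamma0 * np.exp(-kappa * t)     # kappa > 0

epochs = np.arange(5)
print(time_based_decay(epochs, 0.1, 0.5))
print(step_decay(epochs, 0.1, 0.5))
print(exponential_decay(epochs, 0.1, 0.5))
```

All three start at the initial rate for t = 0 and decrease monotonically to zero; they differ in how fast they do so.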
49
59. Modern deep learning – Tools and recipes
Adaptive learning rates
Adaptive learning rates: adapt γ during the iteration t independently for
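each connection $w_{ij}$
$w^{t+1}_{ij} \leftarrow w^t_{ij} - \gamma^t_{ij} \frac{\partial E(W^t)}{\partial w^t_{ij}}$
AdaGrad (Duchi et al., 2011): $\gamma^t_{ij} = \frac{\gamma^0}{\sqrt{\sum_{k=1}^{t} \left( \frac{\partial E(W^k)}{\partial w^k_{ij}} \right)^2 + \varepsilon}}$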
Use larger rates for directions that remain flat, smaller if they keep changing.
Other strategies: RMSProp (Hinton, 2012) and Adam (Kingma & Ba, 2015), the new standard.
Main advantage: Less sensitive to the initial choice of $\gamma^0 > 0$.
Further reading: http://ruder.io/optimizing-gradient-descent/
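A minimal AdaGrad sketch on a toy problem follows. The badly scaled quadratic objective, the initial rate of 0.5 and the placement of ε inside the square root are assumptions made for illustration. Each coordinate divides its step by the root of its own accumulated squared gradients, so the steep and the flat directions make nearly identical progress.

```python
import numpy as np

def adagrad(grad_fn, w0, gamma0=0.5, eps=1e-8, n_steps=200):
    """AdaGrad: per-coordinate rate gamma0 / sqrt(sum of squared grads + eps)."""
    w = np.array(w0, dtype=float)
    accum = np.zeros_like(w)               # running sum of squared gradients
    for _ in range(n_steps):
        g = grad_fn(w)
        accum += g ** 2
        w -= gamma0 / np.sqrt(accum + eps) * g
    return w

# Toy objective (assumption): E(w) = 50 w_0^2 + 0.5 w_1^2, gradient = scales * w.
scales = np.array([100.0, 1.0])
w_final = adagrad(lambda w: scales * w, w0=[1.0, 1.0])
print(w_final)
```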
60. Modern deep learning – Tools and recipes
Momentum and Nesterov accelerated gradient (NAG)
Idea: denoise gradients (in SGD) by using inertia, memory of velocity
(Stochastic) gradient descent (for each mini-batch)
$W^{t+1} \leftarrow W^t - \gamma \nabla E(W^t)$
Momentum – introduce velocity $V^t$ (parameter $\rho \approx 0.9$)
$W^{t+1} \leftarrow W^t \underbrace{- \gamma \nabla E(W^t) + \rho V^t}_{= V^{t+1}}$
Nesterov acceleration (1983) (in the deep learning community: Sutskever et al., 2013)
$W^{t+1} \leftarrow W^t \underbrace{- \gamma \nabla E(W^t - \rho V^t) + \rho V^t}_{= V^{t+1}}$
Nice illustrations: https://distill.pub/2017/momentum/
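The classical momentum update can be sketched as below; only plain momentum is implemented, not the Nesterov variant. The quadratic objective and the step size are assumptions chosen so that the effect of the velocity term is visible: setting ρ = 0 recovers plain gradient descent.

```python
import numpy as np

def momentum_descent(grad_fn, w0, gamma=0.01, rho=0.9, n_steps=300):
    """Gradient descent with momentum: V <- rho V - gamma grad, W <- W + V."""
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)                   # velocity V^0 = 0
    for _ in range(n_steps):
        v = rho * v - gamma * grad_fn(w)   # V^{t+1}
        w = w + v                          # W^{t+1} = W^t + V^{t+1}
    return w

# Toy objective (assumption): E(w) = 0.5 * (10 w_0^2 + w_1^2).
curv = np.array([10.0, 1.0])
w_plain = momentum_descent(lambda w: curv * w, [1.0, 1.0], rho=0.0)
w_mom = momentum_descent(lambda w: curv * w, [1.0, 1.0], rho=0.9)
print(np.linalg.norm(w_plain), np.linalg.norm(w_mom))
```

On this example the flat direction limits plain gradient descent, while the accumulated velocity lets the momentum variant get much closer to the minimum in the same number of steps.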
61. Modern deep learning – Tools and recipes
Momentum and Nesterov accelerated gradient (NAG)
$W^{t+1} \leftarrow W^t \underbrace{- \gamma \nabla E(W^t) + \rho V^t}_{= V^{t+1}}$
$W^{t+1} \leftarrow W^t \underbrace{- \gamma \nabla E(W^t - \rho V^t) + \rho V^t}_{= V^{t+1}}$
(Correction of the gradient) (Correction of the momentum)
In convex and batch settings, NAG decays in $O(1/t^2)$ instead of $O(1/t)$.
For deep learning, dramatic acceleration observed in practice.
It also improves regularity of SGD.
52
62. Modern deep learning – Tools and recipes
Weight decay
Goal: prevent the weights from exploding, limit the freedom, avoid over-fitting.
Weight decay:
$W^{t+1} \leftarrow W^t - \gamma \left( \nabla E(W^t) + \kappa W^t \right)$
It corresponds to adding an $\ell_2$ penalty to the loss
$\min_W \; E(W) + \frac{\kappa}{2} \|W\|_F^2$
and has the effect of shrinking the weights towards zero.
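The shrinking effect is easy to check in isolation. In the sketch below (toy values, hypothetical helper name) the data gradient is set to zero, so each step simply multiplies the weights by (1 − γκ).

```python
import numpy as np

def weight_decay_step(W, grad_E, gamma, kappa):
    """W <- W - gamma * (grad E(W) + kappa * W)."""
    return W - gamma * (grad_E + kappa * W)

W = np.array([[1.0, -2.0], [0.5, 4.0]])
# With a zero data gradient, one step multiplies W by (1 - gamma * kappa).
W_next = weight_decay_step(W, np.zeros_like(W), gamma=0.1, kappa=0.5)
print(W_next)
```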
53
63. Modern deep learning – Tools and recipes
Gradient noise (Neelakantan, 2015)
If a little bit of noise is good, what about adding noise to the gradient?
$W^{t+1} \leftarrow W^t - \gamma \left( \nabla E(W^t) + \mathcal{N}(0, \sigma_t^2) \right)$ with $\sigma_t = \frac{\sigma_0}{(1 + t)^\kappa}$
Works surprisingly well in many situations!
All together
Nesterov + Adaptive learning rates + Weight decay + Gradient noise:
$W^{t+1} \leftarrow W^t \underbrace{- \gamma^t \left( \nabla E(W^t - \rho V^t) + \kappa (W^t - \rho V^t) + \mathcal{N}(0, \sigma_t^2) \right) + \rho V^t}_{= V^{t+1}}$
Good luck tuning all the parameters jointly!
Luckily, many toolboxes offer good default settings.
54
64. Modern deep learning – Tools and recipes
Early stopping (Girosi, Jones and Poggio, 1995)
• Goal: Reduce over-fitting.
• How: Stop the iterations of gradient descent before convergence.
1 Split the training set into a training subset and a validation subset,
2 Train only on the training subset,
3 Evaluate the error on the validation subset every few epochs,
4 Stop training as soon as the error on the validation starts increasing.
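The stopping rule in step 4 can be sketched as a small function over a list of validation errors. The function name and the error values are hypothetical; real training loops often add a "patience" window before stopping, which is not shown here.

```python
def early_stopping_index(val_errors):
    """Index of the last check before the validation error first increases.

    `val_errors` is a hypothetical list of validation errors, one per check.
    """
    for t in range(1, len(val_errors)):
        if val_errors[t] > val_errors[t - 1]:
            return t - 1            # keep the weights saved at this check
    return len(val_errors) - 1      # error never increased: train to the end

val_errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.50]
best = early_stopping_index(val_errors)
print(best, val_errors[best])
```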
55
65. Modern deep learning – Tools and recipes
Normalized initialization
Idea: A good weight initialization should be random
$w_{ij} \sim \mathcal{N}(0, \sigma^2)$ (Gaussian) or $w_{ij} \sim \mathcal{U}\left( -\sqrt{3}\sigma, \sqrt{3}\sigma \right)$ (Uniform)
with standard deviation σ chosen such that all (hidden) outputs $h_j$ have unit
variance (during the first step of backprop).
$\mathrm{Var}[h_j] = \mathrm{Var}\left[ g\left( \sum_i w_{ij} h_i \right) \right] = 1$
Randomness: prevents units from learning the same thing.
Normalization: preserves the dynamic range of the
outputs and gradients at each layer:
→ all units get updated with similar speeds.
56
66. Modern deep learning – Tools and recipes
Normalized initialization
1 Normalize the inputs with about zero-mean and unit variance,
2 Initialize all the biases to zero,
3 For the weights wij of a unit j with N inputs and activation g:
• Xavier initialization (Glorot & Bengio, 2010): for $g(a) = a$
$\mathrm{Var}\left[ g\left( \sum_i w_{ij} h_i \right) \right] = \sum_i \sigma^2 \underbrace{\mathrm{Var}[h_i]}_{=1} = 1 \Rightarrow \sigma = \frac{1}{\sqrt{N}}$
(Works also for $g(a) = \tanh(a)$.)
• He initialization (He et al., 2015): for ReLU → $\sigma = \sqrt{\frac{2}{N}}$
(First time a machine surpassed humans on ImageNet.)
• Kumar initialization (Kumar, 2017): for any g differentiable at 0:
$\sigma^2 = \frac{1}{N g'(0)^2 (1 + g(0)^2)}$
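The Xavier rule can be checked empirically. The sketch below draws weights with the Xavier and He standard deviations and verifies that a linear layer with unit-variance inputs produces outputs of roughly unit variance. The layer sizes and the Gaussian inputs are assumptions made for illustration.

```python
import numpy as np

def xavier_std(n_inputs):
    return 1.0 / np.sqrt(n_inputs)        # for g(a) = a (also used for tanh)

def he_std(n_inputs):
    return np.sqrt(2.0 / n_inputs)        # for ReLU

def init_weights(n_inputs, n_units, std, rng, uniform=False):
    """Zero-mean weights with the requested standard deviation."""
    if uniform:
        bound = np.sqrt(3.0) * std        # U(-sqrt(3) std, sqrt(3) std)
        return rng.uniform(-bound, bound, size=(n_units, n_inputs))
    return rng.normal(0.0, std, size=(n_units, n_inputs))

rng = np.random.default_rng(0)
N = 500
h = rng.normal(size=(N, 10_000))          # unit-variance inputs (assumption)
W = init_weights(N, 200, xavier_std(N), rng)
a = W @ h                                 # linear units: g(a) = a
W_he = init_weights(N, 200, he_std(N), rng, uniform=True)
print(a.var(), W_he.std(), he_std(N))
```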
57
67. Modern deep learning – Tools and recipes
Normalized initialization
Alternatives:
• Force weights of different units to be orthogonal (Saxe et al., 2014)
• Choose W as one of the orthonormal matrices obtained by taking the
SVD of a matrix with random elements: $M = U S V^T$.
• It promotes units to be specialized for different features.
• Mix orthogonality and equal variance principles (Mishkin & Matas, 2016)
• Choose W as a random orthonormal matrix,
• Normalize each unit by their variance estimated on mini-batches,
• It forces units to be specialized in different tasks,
• It makes sure the gradients do not vanish in some layers.
58
68. Modern deep learning – Tools and recipes
Dropout (Hinton et al., 2012)
• Goal: Reduce over-fitting.
• How: Train smaller networks.
• Build a sub-network for each training sample,
• Drop nodes with probability 1 − p or keep them with probability p,
(in practice use a random binary mask to disable entries of W ).
• Usually, p = .5 for hidden layer, and p ≈ 1 for the input.
59
69. Modern deep learning – Tools and recipes
Dropout (Hinton et al., 2012)
• Use back-propagation for each data point and its sub-network,
• Note that each sub-network within a mini-batch shares common weights,
• At test time, dropout is disabled and unit outputs are multiplied by their
corresponding p such that all units behave as if trained without dropout.
• Prevents units from co-adapting too much (depend on each other),
• Allows the network to learn more robust and specific features,
• Converges faster, and each epoch costs less.
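The train/test asymmetry can be sketched as follows, using the convention described on this slide (random mask during training, multiplication by p at test time). The helper names are hypothetical. Many libraries instead rescale by 1/p during training ("inverted dropout"), which is not what is shown here.

```python
import numpy as np

def dropout_train(h, p, rng):
    """Keep each unit with probability p, drop it (set to 0) otherwise."""
    mask = rng.random(h.shape) < p
    return h * mask

def dropout_test(h, p):
    """At test time all units are kept and their outputs are scaled by p."""
    return h * p

rng = np.random.default_rng(0)
h = np.ones(100_000)
p = 0.5
train_mean = dropout_train(h, p, rng).mean()
test_mean = dropout_test(h, p).mean()
print(train_mean, test_mean)
```

The scaling by p makes the expected output of a unit at test time match its expected output during training.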
60
70. Modern deep learning – Tools and recipes
Batch normalization (Ioffe and Szegedy, 2015)
• Goal: Learn faster, learn better.
• How: Normalize inputs of each hidden layer.
Internal Covariate Shift
• With backprop, the distribution of the inputs h of a (deep) hidden layer
keeps changing during the iterations.
• This is painful as the layers need to continuously adapt to it, then
• Requires low learning rates,
• Careful parameter initialization,
• Proper choices of activation functions,
• Leads to slow convergence, and poor solutions.
61
71. Modern deep learning – Tools and recipes
Batch normalization (Ioffe and Szegedy, 2015)
• Ultimately, transform a (recall, a = W h + b) to satisfy
• Zero-mean: remove arbitrary shift of your data,
• Unit variance: remove arbitrary spread of your data,
• White: force decorrelation/independence
(non-linearities are often applied entry-wise),
• Gaussian: force entries to be Gaussian distributed,
→ remove extreme values.
• Dates back to classical ML with hand-crafted features
(see standardization, PCA, feature warping, histogram specification, . . . ),
• but here we also normalize the inputs of each (hidden) layer at each epoch.
• To avoid reducing speed: only do zero-mean and unit variance.
72. Modern deep learning – Tools and recipes
Batch normalization (Ioffe and Szegedy, 2015)
Idea: perform for each unit i of a given hidden layer
$$a_i^{\mathrm{BN}} \leftarrow \beta_i + \alpha_i \underbrace{\frac{a_i - \mu_i}{\sqrt{\sigma_i^2 + \varepsilon}}}_{\text{standardized input}}$$
• µi, σi: mean and standard deviation of ai over the current mini-batch,
(obtained during the forward step)
• αi, βi: scalar parameters learned by backprop,
(updated during the backward step)
• ε > 0: regularization to avoid instabilities.
αi and βi introduced to preserve the network capacity.
(to make use of non-linearities, otherwise sigmoid ≈ linear)
During testing, replace µi and σi by an average of the last estimates obtained
over mini-batches during training.
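The forward step of batch normalization can be sketched as below. This assumes NumPy; the exponential moving average and its `momentum` value are one common way to "average the last estimates" and are an assumption here, as are the function names.

```python
import numpy as np

def batchnorm_train(a, alpha, beta, running, momentum=0.9, eps=1e-5):
    """Normalize pre-activations a (batch, units) with mini-batch statistics."""
    mu = a.mean(axis=0)
    var = a.var(axis=0)
    # Keep an average of the mini-batch estimates for use at test time
    # (exponential moving average; the momentum value is an assumption).
    running["mu"] = momentum * running["mu"] + (1 - momentum) * mu
    running["var"] = momentum * running["var"] + (1 - momentum) * var
    return beta + alpha * (a - mu) / np.sqrt(var + eps)

def batchnorm_test(a, alpha, beta, running, eps=1e-5):
    """Normalize with the averaged statistics instead of mini-batch ones."""
    return beta + alpha * (a - running["mu"]) / np.sqrt(running["var"] + eps)

rng = np.random.default_rng(0)
a = 3.0 * rng.standard_normal((256, 4)) + 10.0  # arbitrary shift and spread
alpha, beta = np.full(4, 2.0), np.full(4, 1.0)
running = {"mu": np.zeros(4), "var": np.ones(4)}
out = batchnorm_train(a, alpha, beta, running)
print(out.mean(axis=0), out.std(axis=0))  # close to beta and alpha
```

After normalization each unit has mean βi and standard deviation close to αi, whatever the shift and spread of the incoming a. In a real network αi and βi would be updated by backprop; here they are fixed for illustration.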
73. Modern deep learning – Tools and recipes
Batch normalization (Ioffe and Szegedy, 2015)
• αi and βi encode the range of inputs given to the activation function,
(can be seen as parameters of the activation function)
• each layer learns a little bit more by itself independently of other layers,
• allows higher learning rates (larger gradient descent step size γ),
• reduce over-fitting – slight regularization effects.
74. Modern deep learning – Tools and recipes
Batch normalization – variant
• Some implementations normalize hi instead of ai,
• In this case αi and βi perform a kind of preconditioning,
• What strategy is best is still under debate.
75. Modern deep learning – Tools and recipes
Residual Networks (ResNets) (Microsoft – He et al., 2015).
Up to 2015, "deep" meant about 20 layers, and using more was harmful.
Goal: Get deeper and deeper while improving accuracy.
Idea: Learn the residual f : x → d − x instead of h : x → d (assuming same size),
then give your prediction of d as y = f(x) + x
• Tough to learn a mapping close to the identity with non-linear layers:
• Consider a single ReLU unit:
(1) $y = \underbrace{\max(0, wx + b)}_{h(x)}$
(2) $y = \underbrace{\max(0, wx + b)}_{f(x)} + x$
• Learn y = x: trivial for (2) (take w = b = 0), impossible for (1) (its output is never negative).
• Of course, possible with more units but less trivial.
• Simpler to learn what the fluctuations around x should be.
76. Modern deep learning – Tools and recipes
Residual Networks (ResNets)
Example (Denoising x = d + n with d the desired clean image)
• The residual is (minus) the noise component: d − x = −n,
• subtle difference between the input x = d + n and output d
→ hard to learn.
• large difference between the input x = d + n and the residual −n
→ easier to learn.
x −→ d: hard to learn (×);   x −→ −n: easier to learn (√)
But why does it help me to be deeper?
77. Modern deep learning – Tools and recipes
Residual Networks (ResNets)
• Don’t necessarily use this concept only for the input and output layers,
• Instead, use successions of hidden layers with same size,
• Use it every two hidden layers:
$$a^{k+1} = W^k h^k, \qquad h^{k+1} = g(a^{k+1})$$
$$a^{k+2} = W^{k+1} h^{k+1}, \qquad h^{k+2} = g(a^{k+2} + h^k)$$
• Use shortcut connections:
• It does not require extra parameters,
• Does not add computational overhead,
• Acts as a kind of preconditioning
−→ allowing for huge accelerations,
• Led to a 152-layer deep network that won the 2015 ImageNet competition,
−→ we will return to this next class.
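The two-layer residual block with a shortcut connection can be sketched as follows, assuming NumPy, a ReLU for g, and no bias terms (as in the equations of this slide). The function name `residual_block` is illustrative.

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

def residual_block(h_k, W_k, W_k1):
    """Two layers of equal width with a shortcut from h_k to the second one."""
    a1 = W_k @ h_k          # a^{k+1} = W^k h^k
    h1 = relu(a1)           # h^{k+1} = g(a^{k+1})
    a2 = W_k1 @ h1          # a^{k+2} = W^{k+1} h^{k+1}
    return relu(a2 + h_k)   # h^{k+2} = g(a^{k+2} + h^k)

h = np.array([0.5, 2.0, 1.0])  # non-negative, e.g. output of a previous ReLU
zeros = np.zeros((3, 3))
# With all weights at zero the block reduces to the identity on h.
print(np.allclose(residual_block(h, zeros, zeros), h))
```

The shortcut is a plain addition, so it adds no parameters; the identity is obtained for free when the weights are zero, and training only has to learn the fluctuations around it.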
78. Modern deep learning – Tools and recipes
Regularization
• Goal: Avoid over-fitting.
• How: Use prior knowledge.
Early stopping, dropout, batch normalization, etc., implicitly promote some
regularity. But we can also introduce regularity explicitly given prior knowledge
on the problem/application context.
Remember: autoencoders extract relevant features by making use of
• Undercomplete representation (remove nodes):
information can be compressed/encoded with a small dictionary,
• Overcomplete/sparse representation (penalize node’s outputs):
information can be encoded with a few atoms of a large dictionary.
In both cases, we have used prior knowledge that
information must be structured/organized.
79. Modern deep learning – Tools and recipes
Regularization
In general, there is much more intrinsic regularity in the data that should
be injected into the model (all the more so for images).
For instance, as for the nodes, depending on the application, some connections
can be removed (non-fully connected layer), or their weights can be penalized
by adding a regularization term Ω in the loss function:
• $\Omega = \|W\|_1$ to promote sparsity (sparse layer),
• $\Omega = \|W\|_2^2$ to promote small coefficients (weight decay),
• . . .
As we will see next class, Convolutional Neural Networks (CNNs) are a kind of
highly constrained architecture very relevant for images.
Regularity is probably the most important ingredient allowing for deep learning.
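As a minimal sketch of how a penalty term Ω on the weights enters a gradient descent step (assuming NumPy; the regularization strength `lam` and the function names are hypothetical, and the data-fidelity gradient is passed in as given):

```python
import numpy as np

def penalty(W, kind):
    """Regularization term: sum of |w| (l1) or sum of w^2 (l2)."""
    if kind == "l1":
        return np.abs(W).sum()
    if kind == "l2":
        return (W ** 2).sum()
    raise ValueError(kind)

def penalty_grad(W, kind):
    """(Sub)gradient of the penalty with respect to W."""
    if kind == "l1":
        return np.sign(W)
    if kind == "l2":
        return 2.0 * W
    raise ValueError(kind)

def regularized_step(W, data_grad, kind, lam, gamma):
    """One gradient step on loss + lam * penalty, with step size gamma."""
    return W - gamma * (data_grad + lam * penalty_grad(W, kind))

W = np.array([[1.0, -2.0], [0.0, 3.0]])
print(penalty(W, "l1"), penalty(W, "l2"))
# With a zero data gradient, the l2 penalty shrinks every weight by the
# same factor (1 - 2 * gamma * lam), hence the name "weight decay".
print(regularized_step(W, np.zeros_like(W), "l2", lam=0.1, gamma=0.5))
```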
80. Modern deep learning – Tools and recipes
Data augmentation
Idea: generate artificially more data points from the training set itself.
Others: rotations, zooms (rescalings), brightness, noise, . . .
(Source: Ken Chatfield)
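A few of these augmentations can be sketched on a NumPy image array. The flip and crop are assumed to be the transformations shown in the original figure; the crop size, brightness range and noise level are illustrative values, not recommendations from the slides.

```python
import numpy as np

def augment(image, rng, crop=24):
    """Return a randomly flipped, cropped, brightened and noised copy.

    image: float array of shape (H, W, C) with values in [0, 1].
    """
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                       # horizontal flip
    top = rng.integers(0, out.shape[0] - crop + 1)  # random crop position
    left = rng.integers(0, out.shape[1] - crop + 1)
    out = out[top:top + crop, left:left + crop, :]
    out = out * rng.uniform(0.8, 1.2)               # brightness change
    out = out + rng.normal(0.0, 0.02, out.shape)    # additive noise
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
batch = np.stack([augment(image, rng) for _ in range(8)])
print(batch.shape)  # eight different 24x24 views of the same image
```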
81. Modern deep learning – Tools and recipes
Curriculum learning (Bengio et al., 2009)
• Convex criteria: the order of presentation of samples should not matter
to the convergence point, but could influence convergence speed.
• Non-convex criteria: the order and selection of samples could yield a
better solution.
1. Integers
2. Real numbers
3. Complex numbers
4. $e^{i\pi} = -1$
(Inspiration: Pawan Kumar)
• Start learning on easy samples, and increase the difficulty step by step.
• Improves results, but requires sorting the training samples by difficulty.
• When you cannot or do not know how, shuffling randomly is safer.
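One possible way to organize such a curriculum is sketched below, assuming a per-sample difficulty score is available (the score itself and the function name `curriculum_stages` are hypothetical; the slides do not specify how difficulty is measured):

```python
import numpy as np

def curriculum_stages(difficulty, n_stages):
    """Return index arrays; stage s holds the easiest (s+1)/n_stages of the samples."""
    order = np.argsort(difficulty)  # easiest first
    n = len(order)
    sizes = [int(np.ceil(n * (s + 1) / n_stages)) for s in range(n_stages)]
    return [order[:size] for size in sizes]

difficulty = np.array([0.9, 0.1, 0.5, 0.3, 0.7, 0.2])
for stage in curriculum_stages(difficulty, n_stages=3):
    print(stage)  # growing sets of sample indices, hardest added last
```

Within each stage the selected samples would still be shuffled before forming mini-batches.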
82. Modern deep learning – Tools and recipes
Multitask learning (Caruana, 1997)
• Learning jointly how to solve several tasks improves performance,
• Effect of regularization (avoid over-fitting),
• Useful in context of insufficiently labeled data.
Share first hidden layers, use specific ones last.
Related to:
• unsupervised pretraining (Hinton, 2006): train first in an unsupervised
way, fine-tune to solve a given task.
• transfer learning (Pratt, 1993, 1997): train first on a given large dataset
(detect red cars), fine-tune to a smaller one (detect tumors).
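The "share first, specialize last" layout can be sketched as a forward pass with one shared hidden layer and two task-specific heads. This assumes NumPy, squared-error losses and two tasks; all parameter names and the `weight_b` trade-off are hypothetical.

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

def multitask_forward(x, params):
    """Shared hidden layer followed by one linear head per task."""
    h = relu(x @ params["W_shared"])  # features shared by both tasks
    y_a = h @ params["W_task_a"]      # head specific to task A
    y_b = h @ params["W_task_b"]      # head specific to task B
    return y_a, y_b

def joint_loss(y_a, d_a, y_b, d_b, weight_b=1.0):
    """Sum of the per-task losses; its gradient reaches W_shared via both heads."""
    return np.mean((y_a - d_a) ** 2) + weight_b * np.mean((y_b - d_b) ** 2)

rng = np.random.default_rng(0)
params = {
    "W_shared": rng.standard_normal((10, 16)),
    "W_task_a": rng.standard_normal((16, 3)),
    "W_task_b": rng.standard_normal((16, 1)),
}
x = rng.standard_normal((5, 10))
y_a, y_b = multitask_forward(x, params)
print(y_a.shape, y_b.shape)
```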
83. Modern deep learning – Tools and recipes
Open source solutions
Most popular
• TensorFlow – Google Brain – Python, C/C++
• Torch – Facebook – Lua, Python
• Theano – University of Montreal – Python (ended Nov 2017)
• Keras – high-level interface for TensorFlow, Theano
• Caffe – Berkeley Vision and Learning Center - Python, MATLAB
• MxNet – Apache – Python, Matlab, Julia, C++, . . .
• MatConvNet – University of Oxford – Matlab
Main features
• Implementations of most classical DL models, tools, recipes...
• Pretrained models,
• GPU/CUDA support,
• Flexible interfaces,
• Automatic differentiation.
84. Modern deep learning – Tools and recipes
(Multi-)GPU architectures
• Neural Networks are inherently parallel,
• GPUs are good at matrix/tensor operations,
• Much faster than CPU, lower power, lower cost,
• Provide same or better prediction,
• New architectures are optimized for deep learning:
NVIDIA Tesla V100: World’s first
GPU to break the 100 teraflops
barrier of deep learning performance.
New feature: 640 tensor cores.
Relatively cheap (≈ $8,000).
Released June 2017.
• Relatively easy to manipulate thanks to CUDA,
• Many deep learning toolboxes have GPU support,
• Can be integrated in clusters/datacenters for distributed computing.
85. Modern deep learning – Tools and recipes
Best practices
• Check/clean your data: remove corrupted ones, perform normalization, . . .
• Shuffle the training samples,
• Split your data into training (70%) and testing (30%),
• Use early stopping: take another 30% of training for validation,
• Never train on test data,
• Start with an existing network and adapt it to your problem,
• Start smallish, keep adding layers and nodes until you overfit too much,
• Check that you can achieve zero loss on a tiny subset (20 examples),
• Track the loss during training,
• Track the first-layer features,
• Try the successive rates: γ = 0.3, 0.1, 0.03, 0.01, 0.003, 0.001.
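The shuffling and splitting steps can be sketched as follows, assuming NumPy and working on sample indices. The function name `split_indices` is illustrative; the fractions and the list of learning rates are the ones quoted above.

```python
import numpy as np

def split_indices(n, rng, test_frac=0.3, val_frac=0.3):
    """Shuffle n sample indices and split them into train/validation/test."""
    idx = rng.permutation(n)                      # shuffle the samples
    n_test = int(round(test_frac * n))
    test, rest = idx[:n_test], idx[n_test:]       # 30% held out for testing
    n_val = int(round(val_frac * len(rest)))
    val, train = rest[:n_val], rest[n_val:]       # 30% of the rest for validation
    return train, val, test

rng = np.random.default_rng(0)
train, val, test = split_indices(1000, rng)
print(len(train), len(val), len(test))  # 490 210 300

# Candidate step sizes to try in turn, monitoring the validation loss.
learning_rates = [0.3, 0.1, 0.03, 0.01, 0.003, 0.001]
```

The test indices are set aside first and never used for training or for choosing the learning rate; early stopping relies on the validation part only.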
87. Weaknesses and criticisms
Differences from the brain
• The way artificial neurons fire is fundamentally different from the way
biological neurons do.
• A human brain has 100 billion neurons and 100 trillion connections
(synapses) and operates on 20 watts (enough to run a dim light bulb); in
comparison, the 2012 Google Brain project had 10 million neurons and 1
billion connections on 16,000 CPUs (about 3 million watts).
• The brain is limited to 5 types of input data from the 5 senses.
• Children do not learn what a cow is by reviewing 100,000 pictures labeled
“cow” and “not cow”, but this is how deep learning works.
(Source: Devin Didericksen)
88. Weaknesses and Criticisms
Deep neural networks are easily fooled (Nguyen et al., CVPR, 2014)
• Images are unrecognizable by humans,
• System has 99.1% certainty that the image is that type of thing,
• Unclear what it learns; clearly it does not organize concepts as we do.
89. Weaknesses and Criticisms
Intriguing properties of neural networks (Szegedy et al., 2013)
For each sample, there exists a small (non random) perturbation that can
completely fool the network, even though humans are not fooled.
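The existence of small fooling perturbations can be illustrated on a much simpler model than a deep network. The sketch below assumes a binary linear classifier sign(w·x + b) as a stand-in (not the setting of the cited paper), for which the smallest ℓ2 perturbation that changes the decision has a closed form: move x along w just past the decision boundary.

```python
import numpy as np

def minimal_flip(w, b, x, overshoot=1e-3):
    """Smallest l2 perturbation (plus a small overshoot) flipping sign(w.x + b)."""
    score = w @ x + b
    # Project x onto the decision boundary along w, then step slightly beyond.
    return -(1.0 + overshoot) * score / (w @ w) * w

rng = np.random.default_rng(0)
w, b = rng.standard_normal(1000), 0.1
x = rng.standard_normal(1000)
r = minimal_flip(w, b, x)
print(np.sign(w @ x + b), np.sign(w @ (x + r) + b))  # opposite signs
print(np.linalg.norm(r) / np.linalg.norm(x))         # relative size of r
```

In high dimension the perturbation is small relative to the input because it only has to cancel the component of x along a single direction.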
90. Weaknesses and criticisms
Universal Adversarial Perturbations (Moosavi-Dezfooli et al., ICCV, 2017)
Look for a small perturbation r such that
$$\operatorname*{argmin}_{r} \; \|r\|_2 \quad \text{subject to} \quad \forall (x, d) \in \mathcal{T}, \; y = f(x + r) \neq d$$
Worse, it can be the same perturbation for all training samples.
−→ raises serious security issues in critical applications.
93. Deep learning – Overview
Pros & Cons
Pros
• Best performing method in many Computer Vision tasks,
• No need for hand-crafted features,
• Robustness to natural variations in the data is automatically learned,
• Can be reused for many different applications and data types,
• Applicable to large-scale problems, e.g., classification with 1000 classes,
• Performance improves with more data,
• Easy parallelization on GPUs.
Cons
• Needs a huge amount of training data,
• Hard to design and tune,
• Difficult to analyze and understand: many features are not really human-understandable,
• Other methods like SVMs are easily deployed even by novices and can usually get you good results,
• Tends to learn everything. It's better to encode prior knowledge about the structure of images (or audio or text) if large datasets are not available.
(Source: Caner Hazırbaş)
94. Deep learning – Overview
When to use Deep Learning?
• You have a large amount of data with good quality,
• You are modeling image/audio/language/time-series data,
• Your data contains high-level abstractions:
Excels in tasks where the basic unit (pixel, word) has very little
meaning in itself, but their combination has a useful meaning.
• You need a model that is less reliant on handmade features, and instead
can learn important features from the data.
(Source: Lucas Masuch, Andrew Ng)
95. Questions?
Next class: Image classification and CNNs
Slides and images courtesy of
• K. Chatfield
• P. Gallinari
• C. Hazırbaş
• Y. LeCun
• V. Lepetit
• L. Masuch
• A. Ng
• M. Ranzato