This document provides an introduction and overview of deep learning. It begins with defining neural networks and how they are inspired by biological neurons. It then discusses different types of neural networks like single perceptrons, multi-layer perceptrons, convolutional neural networks, recurrent neural networks, and autoencoders. The document explains key concepts in deep learning like weights, biases, activation functions, loss functions, and training neural networks using gradient descent. It also clarifies terms like epochs, batches, and iterations in the training process. Finally, it touches on important hyperparameters like learning rate that control the training of neural networks.
DeepLearning.pdf
1. Deep Learning
Ramesh Naidu L
Joint Director, Big Data Analytics & Machine Learning
C-DAC, Bangalore
2. Outline
01 Introduction: What is a Neural Network?
02 Deep Neural Network: DNN and its components
03 CNN: Convolutional Neural Networks & Transfer Learning
04 Autoencoders: Latent space networks
05 RNN: Recurrent Neural Networks & its variants
06 Applications of DL: Applications and future of DL
6. Deep learning isn’t magic.
But, it is very good at finding patterns…to make predictions.
7. What is Machine Learning? (Andrew Ng)
Figure: raw pixels in the input space (pixel 1, pixel 2) are passed through feature extraction into a feature space ("wheel", "handle"), where a machine learning algorithm separates motorbikes from "non"-motorbikes.
10. NEURAL NETWORK (BIOLOGY)
Dendrites (input vector): receive incoming signals.
Soma: weighted summation of the positive and negative signals from the dendrites.
Axon (output): transmits the output signal to other neurons.
Image Credits: https://en.wikipedia.org/wiki/Neuron
25. WHAT HAPPENS IN A NEURON?
Problem: Whether to go to a cricket match or not?
▪ Will the weather be nice?
▪ What’s the music like?
▪ Do I have anyone to go with?
▪ Can I afford it?
▪ Do I need to write my exam?
▪ Will I like the food in the food stalls at the stadium?
Just take the first four, for simplicity.
26. WHAT HAPPENS IN A NEURON?
Problem: Whether to go to a cricket match or not?
To make my decision, I would consider how important each factor is to me, weigh them all up, and see if the result is over a certain threshold. If so, I will go to the match! It’s a bit like the process we often use of weighing up pros and cons to make a decision.
27. WHAT HAPPENS IN A NEURON?
Problem: Whether to go to a cricket match or not?
Give some weights…and scores…
28. WHAT HAPPENS IN A NEURON?
Problem: Whether to go to a cricket match or not?
Scores:
• Weather is looking pretty good but not perfect - 3 out of 4.
• Music is ok but not my favourite - 2 out of 4.
• Company - my best friend has said she’ll come with me, so I know the company will be great - 4 out of 4.
• Money - a little pricey but not completely unreasonable - 2 out of 4.
29. WHAT HAPPENS IN A NEURON?
Problem: Whether to go to a cricket match or not?
Now for the importance:
• Weather - I really want to go if it’s sunny, very important - I give it a full 4 out of 4.
• Music - I’m happy to listen to most things, not super important - 2 out of 4.
• Company - I wouldn’t mind too much going to the match on my own, so company can have a 2 out of 4 for importance.
• Money - I’m not too worried about money, so I can give it just a 1 out of 4.
30. WHAT HAPPENS IN A NEURON?
Problem: Whether to go to a cricket match or not?
Total Score = Sum of (Score x Weight) = 3x4 + 2x2 + 4x2 + 2x1 = 26
Threshold = 25
Score (26) > Threshold (25) → Going to the cricket match
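To make the arithmetic concrete, here is a minimal NumPy sketch of this decision neuron; the variable names and the threshold comparison are simply my own framing of the numbers on slides 28-30.

```python
import numpy as np

# Scores and importance weights from the cricket-match example (slides 28-30)
scores = np.array([3, 2, 4, 2])     # weather, music, company, money
weights = np.array([4, 2, 2, 1])    # importance of each factor
threshold = 25

total = np.dot(scores, weights)     # 3*4 + 2*2 + 4*2 + 2*1 = 26
decision = total > threshold        # equivalently: total + bias > 0, with bias = -threshold
print(total, "-> go to the match" if decision else "-> stay home")
```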
31. WHAT HAPPENS IN A NEURON?
Problem: Whether to go to a cricket match or not?
This threshold number, moved to the other side of the comparison as a negative term added to the weighted sum, is what is known as the ‘bias’.
32. WHAT HAPPENS IN A NEURON?
A general form of a Neuron/Single Perceptron
33. A PERCEPTRON – GENERAL FORM
• The basic unit of computation in a neural network is the neuron, often called a node or unit.
• It receives input from some other nodes, or from an external source, and computes an output.
• Each input has an associated weight (w), which is assigned on the basis of its relative importance.
• The node applies a function f to the weighted sum of its inputs.
34. A PERCEPTRON…HOW IT WORKS?
1. All the inputs x are multiplied with their weights w. Let’s call it k.
Fig: Multiplying inputs with weights for 5 inputs
35. A PERCEPTRON…HOW IT WORKS?
2. Add all the multiplied values and call them Weighted Sum.
Fig: Adding with Summation
36. WHY DO WE NEED WEIGHTS AND BIAS?
• Weights show the strength of the particular node/input.
• A bias value allows you to shift the activation function to the left or right.
37. WHY DO WE NEED WEIGHTS AND BIAS?
• Weights show the strength of the particular node/input.
• A bias value allows you to shift the activation function to the left or right.
• Without the bias, a ReLU network is not able to deviate from zero at (0,0).
39. WHY DO WE NEED “ACTIVATION FUNCTION”?
For example, if we simply want to estimate the cost of a meal, we could use a neuron based on a linear function f(z) = wx + b. Using f(z) = z and weights equal to the price of each item, the linear neuron would take in the number of burgers, fries, and sodas and output the price of the meal.
The above kind of linear neuron is very easy to compute. However, by just having an interconnected set of linear neurons, we won’t be able to learn complex relationships: taking multiple linear functions and combining them still gives us a linear function, and we could represent the whole network by a simple matrix transformation.
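The “combining linear functions stays linear” point can be checked numerically. This is my own illustration (random weights, arbitrary shapes), not code from the slides:

```python
import numpy as np

# Stacking two purely linear layers is equivalent to a single linear layer.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x + b1) + b2          # linear -> linear
collapsed  = (W2 @ W1) @ x + (W2 @ b1 + b2)   # one equivalent linear layer
print(np.allclose(two_layers, collapsed))     # True: no extra expressive power
```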
40. WHY DO WE NEED “ACTIVATION FUNCTION”?
Real-world data is mostly non-linear, so this is where activation functions come in to help.
Activation functions introduce non-linearity into the output of a neuron, and this ensures that we can learn more complex functions by approximating every non-linear function as a linear combination of a large number of non-linear functions.
Also, activation functions help us to limit the output to a certain finite value.
41. WHY DO WE NEED “ACTIVATION FUNCTION”?
• Their main purpose is to convert an input signal of a node in an A-NN to an output signal.
• In short, activation functions are used to map the input between required values like (0, 1) or (-1, 1).
• An activation function is also known as a Transfer Function.
• They introduce non-linear properties to our network.
• That output signal is then used as an input in the next layer in the stack.
The activation functions can be basically divided into 2 types:
• Linear Activation Function
• Non-linear Activation Functions
43. SIGMOID
• Sigmoid is a special case of the logistic function, with L = 1, k = 1 and x0 = 0: σ(x) = 1/(1 + e^(-x)).
• Takes a real number and transforms it to σ(x) ∈ (0, 1).
• Large negative → 0
• Large positive → 1
• Sigmoidal → S-shaped
Figure: logistic sigmoid and its derivative (dotted line).
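A small NumPy sketch of the sigmoid and its derivative (my own illustration); the sample points at -6, 0 and 6 echo the saturation behaviour discussed on the next slide:

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x)), squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # derivative: sigma(x) * (1 - sigma(x)); nearly zero for large |x| (saturation)
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([-6.0, 0.0, 6.0])
print(sigmoid(x))       # approx [0.0025, 0.5, 0.9975]
print(sigmoid_grad(x))  # gradient is largest at 0 and almost zero at +/-6
```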
44. SIGMOID – DISADVANTAGES
1) Sigmoids saturate and kill gradients
• The neuron’s activation saturates at either tail, 0 or 1.
• No gradient → no signal to the neuron.
• Too large weights → neurons saturate → no learning.
• Beyond about 6 or -6, the gradient of the sigmoid becomes essentially zero (the vanishing gradient problem).
2) Output is not zero-centred
45. HYPERBOLIC TANGENT FUNCTION - TANH
• It is also sigmoidal (S-shaped).
• Mathematically, tanh(x) = 2σ(2x) − 1.
• Transforms data to the range (−1, 1): tanh(x) ∈ (−1, 1).
• Output is zero-centred.
• It also saturates and kills gradients.
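As a quick numerical check of the identity quoted above (my own snippet, not from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 101)
# Identity from the slide: tanh(x) = 2*sigmoid(2x) - 1
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True
```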
46. RELU (RECTIFIED LINEAR UNIT) ACTIVATION FUNCTION
R(x) = max(0, x), i.e.
R(x) = 0 if x <= 0
R(x) = x if x > 0
• The ReLU is the most used activation function.
• Range: [0, infinity)
• More effective than sigmoidal functions.
• Disadvantage: any negative input given to the ReLU activation function is turned into zero immediately, so negative values are not mapped appropriately.
47. LEAKY RELU
Fixes the “dying ReLU” problem by introducing a small negative slope (0.01 or so):
f(x) = 0.01x, if x < 0
f(x) = x, if x ≥ 0
- Does not saturate
- Computationally efficient
- Converges much faster than sigmoid/tanh in practice (e.g. 6x)
- Will not “die”
[Maas et al., 2013]
[He et al., 2015]
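A minimal NumPy sketch of ReLU and Leaky ReLU as defined on these two slides; the test values are arbitrary:

```python
import numpy as np

def relu(x):
    # R(x) = max(0, x)
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    # Small negative slope (alpha ~ 0.01) keeps gradients alive for x < 0
    return np.where(x >= 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))        # [0.    0.    0.  2.]
print(leaky_relu(x))  # [-0.03 -0.005 0.  2.]
```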
48. RANDOMIZED RELU
• Fixes the problem of dying neurons.
• It introduces a small slope to keep the updates alive.
• The leak helps to increase the range of the ReLU function. Usually, the value of a is 0.01 or so.
• When a is not 0.01, it is called Randomized ReLU.
49. SOFTMAX
• Transforms a real-valued vector to [0, 1].
• The output of softmax is the list of probabilities of K different classes/categories, j = 1, 2, ..., K.
50. SOFTMAX – CLASSIFICATION OF DIGITS
• Transforms a real-valued vector to [0, 1].
• The output of softmax is the list of probabilities of the 10 digits.
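A small NumPy sketch of softmax (my own illustration); subtracting the maximum before exponentiating is a common numerical-stability trick and is not mentioned on the slide:

```python
import numpy as np

def softmax(z):
    # softmax(z)_j = exp(z_j) / sum_k exp(z_k); subtracting max(z) does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())  # class probabilities, summing to 1.0
```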
51. WHICH ACTIVATION FUNCTION TO USE?
▪ Use ReLU, which should only be applied to the hidden layers.
▪ If our model suffers from dead neurons during training, we should use Leaky ReLU.
▪ Sigmoid and tanh should not be used nowadays due to the vanishing gradient problem, which causes a lot of problems when training a neural network model.
▪ Softmax is used in the last layer (because it gives output probabilities).
54. SINGLE NEURON TO A NETWORK
• The outputs of the neurons in one layer flow through to become the inputs of the next layer, and so on.
55. A NETWORK TO A DEEP NETWORK
Add another hidden layer….
A neural network can be considered ‘deep’ if it contains more hidden layers...
56. FEEDFORWARD NEURAL NETWORK
Add another hidden layer….
A neural network can be considered ‘deep’ if it contains more hidden layers...
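A minimal NumPy sketch of a feedforward pass through two hidden layers, where the outputs of one layer become the inputs of the next; the layer sizes, random weights and the ReLU choice are purely illustrative:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

rng = np.random.default_rng(42)
x = rng.normal(size=3)                          # input vector
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer 1
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)   # hidden layer 2
W3, b3 = rng.normal(size=(2, 4)), np.zeros(2)   # output layer

h1 = relu(W1 @ x + b1)     # layer 1 outputs feed layer 2
h2 = relu(W2 @ h1 + b2)    # layer 2 outputs feed the output layer
out = W3 @ h2 + b3
print(out)
```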
57. TRAINING THE NETWORK
1. Randomly initialise the network weights and biases.
2. Training:
• Get a ton of labelled training data (e.g. pictures of cats labelled ‘cat’ and pictures of other stuff also correctly labelled).
• For every piece of training data, feed it into the network.
3. Testing:
• Check whether the network gets it right (given a picture labelled ‘cat’, is the result of the network also ‘cat’, or is it ‘dog’, or something else?).
• If not, how wrong was it? Or, how right was it? (What probability did it assign to its guess?)
4. Tuning/Improving: Nudge the weights a little to increase the probability of the network getting the answer right.
60. HOW TO INCREASE THE ACCURACY?
Figure: the network predicts 72% “Trump” and 28% “Hillary”.
1. Two output neurons, one for each class.
2. Output: a probability of 72% that the image is “Trump” → 72% is not sufficient.
→ Go backwards and adjust the weights.
61. HOW TO INCREASE THE ACCURACY?
▪ Measure the difference between the network’s output and the correct output using the ‘loss function’.
▪ Which loss function?
▪ Depends on the data.
▪ Examples: Mean Square Error (MSE), Categorical Cross Entropy.
62. HOW TO INCREASE THE ACCURACY?
▪ Measure the difference between the network’s output and the correct output using the ‘loss function’.
▪ Which loss function?
▪ Depends on the data.
▪ Example: Mean Square Error
The goal of training is to find the weights and biases that minimize the loss function.
Image source: firsttimeprogrammer.blogspot.co.u
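A tiny sketch of the Mean Square Error loss mentioned above; the target and prediction vectors are made-up values:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Square Error: average of the squared differences
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 0.0, 0.0])   # e.g. a one-hot target
y_pred = np.array([0.7, 0.2, 0.1])   # network output
print(mse(y_true, y_pred))           # 0.04666...
```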
63. HOW TO INCREASE THE ACCURACY?
▪ We can plot the loss against the weights.
▪ To do this accurately, we would need to be able to visualise tons of dimensions, to account for the many weights and biases in the network.
▪ Because I find it difficult to visualise more than three dimensions, let’s pretend we only need to find two weight values. We can then use the third dimension for the loss.
▪ Before training the network, the weights and biases are randomly initialised, so the loss function is likely to be high as the network will get a lot of things wrong.
▪ Our aim is to find the lowest point of the loss function, and then see what weight values that corresponds to. (See next slide.)
64. HOW TO INCREASE THE ACCURACY?
▪ Here, we can easily see where the lowest point is and could happily read off the corresponding weight values.
▪ Unfortunately, it’s not that easy in reality. The network doesn’t have a nice overview of the loss function; it can only know what its current loss is and what its current weights and biases are.
Image source: firsttimeprogrammer.blogspot.co.u
67. BASIC UNITS OF NEURAL NETWORK
Input Layer: input variables, sometimes called the visible layer.
Hidden Layers: layers of nodes between the input and output layers. There may be one or more of these layers.
Output Layer: a layer of nodes that produces the output variables.
Things that describe the shape and capability of a neural network:
▪ Size: the number of nodes in the model/network.
▪ Width: the number of nodes in a specific layer.
▪ Depth: the number of layers in a neural network (we don’t count the input as a layer).
▪ Capacity: the type or structure of functions that can be learned by a network configuration. Sometimes called “representational capacity“.
▪ Architecture: the specific arrangement of the layers and nodes in the network.
71. Learning Rate (LR)
• Deep learning neural networks are trained using the stochastic gradient descent algorithm.
• This is an optimization algorithm that estimates the error gradient for the current state of the model using examples from the training dataset, then updates the weights of the model using the back-propagation of errors algorithm.
• Learning Rate: the amount that the weights are updated during training is referred to as the step size or the “learning rate.”
• Specifically, the learning rate is a configurable hyperparameter used in the training of neural networks that has a small positive value, often in the range between 0.0 and 1.0.
72. Learning Rate
• The learning rate controls how quickly the model is adapted to the problem.
• Smaller learning rates → require more training epochs, given the smaller changes made to the weights at each update.
• Larger learning rates → result in rapid changes and require fewer training epochs.
• We have to choose the learning rate carefully.
The learning rate is perhaps the most important hyperparameter. If you have time to tune only one hyperparameter, tune the learning rate.
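The update rule behind these statements, sketched in NumPy with made-up weights and gradients, shows how the learning rate scales the step size:

```python
import numpy as np

# new_weight = weight - learning_rate * gradient
w = np.array([0.5, -0.3])
grad = np.array([0.2, -0.1])   # illustrative error gradient from backpropagation
for lr in (0.01, 0.1, 1.0):
    print(lr, w - lr * grad)   # small lr -> small step, large lr -> large step
```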
74. Learning Rate – How to find the optimal LR?
Gradually increase the learning rate after each mini-batch, recording the loss at each increment. This gradual increase can be on either a linear or an exponential scale.
a) For learning rates which are too low, the loss may decrease, but at a very shallow rate.
b) When entering the optimal learning rate zone, you'll observe a quick drop in the loss function.
c) Increasing the learning rate further will cause the loss to increase, as the parameter updates cause the loss to "bounce around" and even diverge from the minima.
78. 1. Simple Gradient Descent
When we initialize our weights, we are at point A in the loss landscape. The first thing we do is to check, out of all possible directions in the x-y plane, moving along which direction brings about the steepest decline in the value of the loss function. This is the direction we have to move in.
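A toy 1-D version of this idea (my own illustration): repeatedly step against the gradient of a simple quadratic loss and watch the weight approach the minimum.

```python
# Loss L(w) = (w - 3)^2 with gradient dL/dw = 2*(w - 3); values are illustrative.
w = -4.0        # starting point (the "point A" of the slide)
lr = 0.1
for _ in range(50):
    grad = 2 * (w - 3)
    w -= lr * grad          # step in the direction of steepest decrease
print(w)                    # close to the minimum at w = 3
```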
79. Problems with SGD – Local Minima
Gradient descent is driven by the gradient, which will be zero at the base of any minimum.
A local minimum is called so since the value of the loss function is minimum at that point in a local region.
A global minimum is called so since the value of the loss function is minimum there, globally across the entire domain of the loss function.
80. Optimization: Problems with SGD – Saddle Points
• A saddle point gets its name from the saddle of a horse, which it resembles.
• While it is a minimum in one direction (x), it is a local maximum in another direction, and if the contour is flatter towards the x direction, GD would keep oscillating to and fro in the y direction, giving us the illusion that we have converged to a minimum.
https://blog.paperspace.com/intro-to-optimization-in-deep-learning-gradient-descent/
81. Optimization: Problems with SGD – Local Minima
A global minimum is called so since the value of the loss function is minimum there, globally across the entire domain of the loss function.
Figure: a complicated loss landscape.
82. Optimization: Problems with SGD
What if the loss changes quickly in one direction and slowly in another? What does gradient descent do?
Very slow progress along the shallow dimension, jitter along the steep direction.
84. 2. SGD + Momentum
Try to capture some information regarding the previous updates a weight has gone through before performing the current update.
If a weight is constantly moving in a particular direction (increasing or decreasing), it will slowly accumulate some “momentum” in that direction.
When it faces some resistance and actually has to go the opposite way, it will still continue going in the original direction for a while because of the accumulated momentum.
85. 2. SGD + Momentum
Example:
• This is comparable to actual momentum in physics. Let’s consider a small valley and a ball.
• If we release the ball at one side of the valley, it will continuously roll down, and in that process it gains momentum in that direction.
• Finally, when the ball reaches the bottom, it doesn’t stop there.
• Instead, it starts rolling up the opposite side for a while, since it has gained some momentum, even though gravity is telling it to come down.
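A minimal sketch of the momentum update described here; the hyperparameter names (lr, mu) and their values are my own choices:

```python
import numpy as np

def momentum_update(w, grad, v, lr=0.01, mu=0.9):
    # v accumulates past gradients ("momentum"); mu controls how much is kept
    v = mu * v - lr * grad
    return w + v, v

w = np.array([0.5, -0.3])
v = np.zeros_like(w)
for _ in range(3):                     # repeated gradients in the same direction build up speed
    w, v = momentum_update(w, np.array([0.2, -0.1]), v)
print(w, v)
```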
86. 3. Nesterov Momentum
In this, instead of calculating the gradients at the current position, we try to calculate them from an approximate future position. This is because we want to calculate our gradients in a smarter way.
Just before reaching a minimum, the momentum will start reducing before it gets there, because we are using gradients from a future point. This results in improved stability and fewer oscillations while converging; furthermore, in practice it performs better than pure momentum.
88. 4. AdaGrad (Adaptive Optimization)
In these methods, parameters like the learning rate (alpha) and the momentum coefficient do not stay constant throughout the training process. Instead, these values constantly adapt for each and every weight in the network and hence change along with the weights.
The first algorithm we are going to look into is Adagrad (Adaptive Gradient): try to change the learning rate (alpha) for each update.
The learning rate changes during each update in such a way that it will decrease if a weight is being updated too much in a short amount of time, and it will increase if a weight is not being updated much.
90. 4. AdaGrad (Adaptive Optimization)
Each weight has its own cache value, which collects the squares of the gradients up to the current point. The cache will continue to increase in value as training progresses. The new update formula is as follows:
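Since the formula itself is shown as an image on the slide, here is a standard AdaGrad-style update sketched in NumPy; the parameter names and values are my own:

```python
import numpy as np

def adagrad_update(w, grad, cache, lr=0.01, eps=1e-8):
    # cache collects squared gradients; the effective step shrinks for
    # weights that have been updated a lot
    cache = cache + grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

w = np.array([0.5, -0.3])
cache = np.zeros_like(w)
w, cache = adagrad_update(w, np.array([0.2, -0.1]), cache)
print(w, cache)
```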
91. 5. RMSProp
In RMSProp the only difference lies in the cache updating strategy. In the new formula, we introduce a new parameter, the decay rate (gamma). The gamma value is usually around 0.9 or 0.99.
Hence, for each update, the square of the gradients gets added at a very slow rate compared to Adagrad. This ensures that the learning rate is changing constantly based on the way the weight is being updated, just like Adagrad, but at the same time the learning rate does not decay too quickly, allowing training to continue for much longer.
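A matching RMSProp sketch (again, the formula on the slide is an image, so this is my reading of the standard update):

```python
import numpy as np

def rmsprop_update(w, grad, cache, lr=0.01, gamma=0.9, eps=1e-8):
    # Same idea as AdaGrad, but the cache decays with rate gamma (0.9 or 0.99),
    # so the effective learning rate does not shrink towards zero
    cache = gamma * cache + (1 - gamma) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

w, cache = np.array([0.5, -0.3]), np.zeros(2)
w, cache = rmsprop_update(w, np.array([0.2, -0.1]), cache)
print(w, cache)
```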
92. 6. Adam
Adam is a little like combining RMSProp with Momentum. First we calculate our m value, which will represent the momentum at the current point (the slide compares the Adam and Momentum update formulas).
The only difference between this equation and the momentum equation is that instead of the learning rate we keep (1 - beta_1) to be multiplied with the current gradient. Next we calculate the accumulated cache, which is exactly the same as it is in RMSProp. Now we can get the final update formula:
93. 6. Adam
• As we can observe here, we accumulate the gradients by calculating momentum, and we also constantly change the learning rate by using the cache.
• Due to these two features, Adam usually performs better than any other optimizer out there and is usually preferred while training neural networks.
• In the paper for Adam, the recommended parameters are 0.9 for beta_1, 0.99 for beta_2 and 1e-08 for epsilon.
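An Adam-style update sketched from the description above, using the beta values quoted on the slide (note that the original Adam paper's default for beta_2 is 0.999, and it also bias-corrects m and v, which is omitted here to mirror the slide):

```python
import numpy as np

def adam_update(w, grad, m, v, lr=0.001, beta1=0.9, beta2=0.99, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # momentum-style first moment
    v = beta2 * v + (1 - beta2) * grad ** 2   # RMSProp-style cache
    w = w - lr * m / (np.sqrt(v) + eps)
    return w, m, v

w = np.array([0.5, -0.3])
m, v = np.zeros_like(w), np.zeros_like(w)
w, m, v = adam_update(w, np.array([0.2, -0.1]), m, v)
print(w)
```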
94. In practice:
- Adam is a good default choice in many cases.
- SGD + Momentum with learning rate decay often outperforms Adam by a bit, but requires more tuning.
96. HOW A NEURON WORKS?
1. Each neuron computes a weighted sum of the outputs of the previous layer (which, for the inputs, we’ll call x),
2. applies an activation function, before forwarding it on to the next layer.
3. Each neuron in a layer has its own set of weights, so while each neuron in a layer is looking at the same inputs, their outputs will all be different.
97. HOW A NETWORK WORKS?
1. we want to find a set of weights W that minimizes on the output of J(W).
2. Consider a simple double layer Network.
3. each neuron is a function of the previous one connected to it. In other words,
if one were to change the value of w1, both “hidden 1” and “hidden 2” (and
ultimately the output) neurons would change.
98. HOW A NETWORK WORKS?
we can mathematically formulate the output as an extensive composite
function:
99. INTRODUCE ERROR
Similarly, the error J can be written as an extensive composite function of the whole network:
J is a function of → the input, the weights, the activation functions and the output
100. HOW TO MINIMIZE THE ERROR – BACKPROPAGATION
The error derivative can be used in gradient descent (as discussed
earlier) to iteratively improve upon the weight.
We compute the error derivatives w.r.t. every other weight in the
network and apply gradient descent in the same way.
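A compressed numpy sketch of this idea for a tiny two-layer network (all shapes, the tanh activation and the learning rate are illustrative):

```python
import numpy as np

# Tiny two-layer network trained with gradient descent; the chain rule gives
# dJ/dW for every weight, exactly as described above.
np.random.seed(0)
X = np.random.randn(4, 3)            # 4 samples, 3 features
y = np.random.randn(4, 1)
W1, W2 = np.random.randn(3, 5), np.random.randn(5, 1)

for step in range(100):
    h = np.tanh(X @ W1)              # hidden layer
    y_hat = h @ W2                   # output layer (linear)
    J = np.mean((y_hat - y) ** 2)    # loss J(W)

    # Backpropagation: error derivatives w.r.t. every weight.
    d_yhat = 2 * (y_hat - y) / len(y)
    dW2 = h.T @ d_yhat
    dh = d_yhat @ W2.T
    dW1 = X.T @ (dh * (1 - h ** 2))  # tanh'(z) = 1 - tanh(z)^2

    W1 -= 0.1 * dW1                  # gradient descent step
    W2 -= 0.1 * dW2
```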
104. Fast-forward to today: ConvNets are everywhere
Classification Retrieval
Figures copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.
105. Fast-forward to today: ConvNets are everywhere
Figures copyright Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, 2015. Reproduced with permission.
[Faster R-CNN: Ren, He, Girshick, Sun 2015]
Detection Segmentation
[Farabet et al., 2012]
Figures copyright Clement Farabet, 2012. Reproduced with permission.
107. CNN – STEP 1: BREAK THE IMAGE INTO OVERLAPPING IMAGE TILES
108. CNN - STEP 2: FEED EACH IMAGE TILE INTO A SMALL NEURAL NETWORK
109. CNN - STEP 3: SAVE THE RESULTS FROM EACH TILE INTO A NEW ARRAY
110. CNN - STEP 4: DOWN SAMPLING
The result of Step 3 was an array that maps out which parts of the original image are the
most interesting. But that array is still pretty big.
111. CNN - STEP 4: DOWN SAMPLING…
To reduce the size of the array, we down sample it using an algorithm called max pooling. It
sounds fancy, but it isn’t at all!
We’ll just look at each 2x2 square of the array and keep the biggest number:
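A minimal numpy sketch of 2x2 max pooling (the helper name is ours):

```python
import numpy as np

# 2x2 max pooling as described: keep the biggest number in each 2x2 square.
def max_pool_2x2(a):
    h, w = a.shape
    return a[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.array([[1, 0, 2, 3],
              [4, 6, 6, 8],
              [3, 1, 1, 0],
              [1, 2, 2, 4]])
print(max_pool_2x2(a))   # [[6 8]
                         #  [3 4]]
```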
112. CNN – 5 STEP PIPELINE
So from start to finish, our whole five-step pipeline looks like this:
190. Case Study: AlexNet
[Krizhevsky et al. 2012]
Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with
permission.
Architecture:
CONV1
MAX POOL1
NORM1
CONV2
MAX POOL2
NORM2
CONV3
CONV4
CONV5
Max POOL3
FC6
FC7
FC8
191. Case Study: AlexNet
[Krizhevsky et al. 2012]
Input: 227x227x3 images
First layer (CONV1): 96 11x11 filters applied at stride 4
=> Q: what is the output volume size?
Hint: (N – F)/S + 1 = (227 - 11)/4 + 1 = 55
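The same arithmetic as a tiny helper (the function name is illustrative):

```python
# Output spatial size of a conv layer with no padding: (N - F) / S + 1.
def conv_output_size(N, F, S):
    return (N - F) // S + 1

print(conv_output_size(227, 11, 4))   # 55 -> CONV1 output is 55x55x96
```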
192. Case Study: AlexNet
[Krizhevsky et al. 2012]
Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with
permission.
Input: 227x227x3 images
First layer (CONV1): 96 11x11 filters applied at stride 4
=>
Output volume [55x55x96]
Q: What is the total number of parameters in this layer?
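For reference (the slide leaves the answer open): each 11x11 filter spans all 3 input channels and has one bias, so the count works out to roughly 35K parameters:

```python
# CONV1 parameter count: (filter height * filter width * input depth + bias) * num filters.
params_conv1 = (11 * 11 * 3 + 1) * 96
print(params_conv1)   # 34944, i.e. ~35K parameters
```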
193. Case Study: AlexNet
[Krizhevsky et al. 2012]
Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with
permission.
Input: 227x227x3 images
After CONV1: 55x55x96
Second layer (POOL1): 3x3 filters applied at stride 2
Q: what is the output volume size? Hint: (55-3)/2+1 = 27
195. Case Study: AlexNet
[Krizhevsky et al. 2012]
[4096] FC6: 4096 neurons
[4096] FC7: 4096 neurons
[1000] FC8: 1000 neurons (class scores)
Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with
permission.
Full (simplified) AlexNet architecture:
[227x227x3] INPUT
[55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0
[27x27x96] MAX POOL1: 3x3 filters at stride 2
[27x27x96] NORM1: Normalization layer
[27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
[13x13x256] MAX POOL2: 3x3 filters at stride 2
[13x13x256] NORM2: Normalization layer
[13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
[13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
[13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
[6x6x256] MAX POOL3: 3x3 filters at stride 2
196. Case Study: AlexNet
[Krizhevsky et al. 2012]
[4096] FC6: 4096 neurons
[4096] FC7: 4096 neurons
[1000] FC8: 1000 neurons (class scores)
Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with
permission.
Full (simplified) AlexNet architecture:
[227x227x3] INPUT
[55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0
[27x27x96] MAX POOL1: 3x3 filters at stride 2
[27x27x96] NORM1: Normalization layer
[27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
[13x13x256] MAX POOL2: 3x3 filters at stride 2
[13x13x256] NORM2: Normalization layer
[13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
[13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
[13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
[6x6x256] MAX POOL3: 3x3 filters at stride 2
Details/Retrospectives:
- first use of ReLU
- used Norm layers (not common anymore)
- heavy data augmentation
- dropout 0.5
- batch size 128
- SGD Momentum 0.9
- Learning rate 1e-2, reduced by 10 manually when val accuracy plateaus
197. ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
[Chart of winning entries by year and network depth: Lin et al and Sanchez & Perronnin (shallow), Krizhevsky et al / AlexNet (8 layers), Zeiler & Fergus (8 layers), Simonyan & Zisserman / VGG (19 layers), Szegedy et al / GoogLeNet (22 layers), He et al / ResNet (152 layers), Shao et al (152 layers), Hu et al / SENet (152 layers); Russakovsky et al]
AlexNet: first CNN-based winner
198. ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
[Same ILSVRC winners chart as above]
ZFNet: Improved hyperparameters over AlexNet
199. ZFNet [Zeiler and Fergus, 2013]
AlexNet but:
CONV1: change from (11x11 stride 4) to (7x7 stride 2)
CONV3,4,5: instead of 384, 384, 256 filters use 512, 1024, 512
ImageNet top 5 error: 16.4% -> 11.7%
200. ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
[Same ILSVRC winners chart as above]
Deeper Networks
201. Case Study: VGGNet
[Simonyan and Zisserman, 2014]
Small filters, Deeper networks
8 layers (AlexNet)
-> 16 and 19 layers (VGG16, VGG19)
Only 3x3 CONV stride 1, pad 1
and 2x2 MAX POOL stride 2
11.7% top 5 error in ILSVRC’13 (ZFNet)
-> 7.3% top 5 error in ILSVRC’14
[Figure: AlexNet vs. VGG16 vs. VGG19 layer stacks]
204. ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
[Same ILSVRC winners chart as above]
Deeper Networks
205. Case Study: GoogLeNet
[Szegedy et al., 2014]
Inception module
- 22 layers
- Efficient “Inception” module
- No FC layers
- Only 5 million parameters!
12x less than AlexNet
- ILSVRC’14 classification winner
(6.7% top 5 error)
Deeper networks, with computational
efficiency
206. Case Study: GoogLeNet
[Szegedy et al., 2014]
“Inception module”: design a
good local network topology
(network within a network) and
then stack these modules on
top of each other
Inception module
207. Case Study: GoogLeNet
[Szegedy et al., 2014]
Naive Inception module
Apply parallel filter operations on
the input from previous layer:
- Multiple receptive field sizes
for convolution (1x1, 3x3,
5x5)
- Pooling operation (3x3)
Concatenate all filter outputs
together depth-wise
208. Case Study: GoogLeNet
[Szegedy et al., 2014]
Naive Inception module
Apply parallel filter operations on
the input from previous layer:
- Multiple receptive field sizes
for convolution (1x1, 3x3,
5x5)
- Pooling operation (3x3)
Concatenate all filter outputs
together depth-wise
Q: What is the problem with this?
[Hint: Computational complexity]
209. Case Study: GoogLeNet
[Szegedy et al., 2014]
Q: What is the problem with this?
[Hint: Computational complexity]
Example: Module input: 28x28x256
Naive Inception module
210. Case Study: GoogLeNet
[Szegedy et al., 2014]
Q: What is the problem with this?
[Hint: Computational complexity]
Example: Module input: 28x28x256
Naive Inception module
Q1: What is the output size of
the 1x1 conv, with 128 filters?
211. Case Study: GoogLeNet
[Szegedy et al., 2014]
Q: What is the problem with this?
[Hint: Computational complexity]
Example: Module input: 28x28x256
Naive Inception module
Q1: What is the output size of the 1x1 conv, with 128 filters?
A: 28x28x128
212. Case Study: GoogLeNet
[Szegedy et al., 2014]
Q: What is the problem with this?
[Hint: Computational complexity]
Example: Module input: 28x28x256
Naive Inception module
(1x1 conv, 128 filters: 28x28x128)
Q2: What are the output sizes of all the different filter operations?
213. Case Study: GoogLeNet
[Szegedy et al., 2014]
Q: What is the problem with this?
[Hint: Computational complexity]
Example: Module input: 28x28x256
Naive Inception module
Q2: What are the output sizes of all the different filter operations?
A: 28x28x128 (1x1 conv), 28x28x192 (3x3 conv), 28x28x96 (5x5 conv), 28x28x256 (3x3 pool)
214. Case Study: GoogLeNet
[Szegedy et al., 2014]
Q: What is the problem with this?
[Hint: Computational complexity]
Example: Module input: 28x28x256
Naive Inception module
Filter outputs: 28x28x128, 28x28x192, 28x28x96, 28x28x256
Q3: What is the output size after filter concatenation?
215. Case Study: GoogLeNet
[Szegedy et al., 2014]
Q: What is the problem with this?
[Hint: Computational complexity]
Example: Module input: 28x28x256
Naive Inception module
Filter outputs: 28x28x128, 28x28x192, 28x28x96, 28x28x256
Q3: What is the output size after filter concatenation?
A: 28x28x(128+192+96+256) = 28x28x672
216. Case Study: GoogLeNet
[Szegedy et al., 2014]
Q: What is the problem with this?
[Hint: Computational complexity]
Example: Module input: 28x28x256
Naive Inception module
Q3: What is the output size after filter concatenation?
A: 28x28x(128+192+96+256) = 28x28x672
Conv Ops:
[1x1 conv, 128] 28x28x128x1x1x256
[3x3 conv, 192] 28x28x192x3x3x256
[5x5 conv, 96] 28x28x96x5x5x256
Total: 854M ops
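The operation count above can be reproduced with a few lines (multiply-accumulates per conv branch):

```python
# Multiply-accumulates per branch of the naive Inception module on a 28x28x256 input.
ops_1x1 = 28 * 28 * 128 * 1 * 1 * 256   # ~26M
ops_3x3 = 28 * 28 * 192 * 3 * 3 * 256   # ~347M
ops_5x5 = 28 * 28 * 96 * 5 * 5 * 256    # ~482M
print((ops_1x1 + ops_3x3 + ops_5x5) / 1e6)   # ~854M ops in total
```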
217. Case Study: GoogLeNet
[Szegedy et al., 2014]
Naive Inception module
Q: What is the problem with this?
[Hint: Computational complexity]
Example: Module input: 28x28x256
Output size after filter concatenation: 28x28x(128+192+96+256) = 28x28x672
Conv Ops:
[1x1 conv, 128] 28x28x128x1x1x256
[3x3 conv, 192] 28x28x192x3x3x256
[5x5 conv, 96] 28x28x96x5x5x256
Total: 854M ops
Very expensive compute
218. ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
[Same ILSVRC winners chart as above]
“Revolution of Depth”
219. Case Study: ResNet
[He et al., 2015]
Very deep networks using residual
connections
- 152-layer model for ImageNet
- ILSVRC’15 classification winner
(3.57% top 5 error)
- Swept all classification and
detection competitions in
ILSVRC’15 and COCO’15!
[Residual block diagram: the input X passes through conv/relu layers producing F(x); an identity shortcut adds X back, giving F(x) + x]
220. Case Study: ResNet
[He et al., 2015]
Solution: Use network layers to fit a residual mapping instead of directly trying to fit the desired underlying mapping.
[“Plain” layers: X -> layers -> H(x). Residual block: X -> layers -> F(x), plus identity shortcut -> F(x) + x]
221. Case Study: ResNet
[He et al., 2015]
Solution: Use network layers to fit a residual mapping instead of directly trying to fit the desired underlying mapping.
Use the layers to fit the residual F(x) = H(x) - x instead of H(x) directly, so the block outputs H(x) = F(x) + x.
[“Plain” layers: X -> layers -> H(x). Residual block: X -> layers -> F(x), plus identity shortcut -> F(x) + x]
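A minimal Keras sketch of such a residual block (the filter count, input shape, and the assumption of a plain identity shortcut are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Minimal residual block: two conv layers form F(x), the shortcut adds x back,
# so the block outputs F(x) + x. Channel counts and input shape are illustrative.
def residual_block(x, filters=64):
    f = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    f = layers.Conv2D(filters, 3, padding="same")(f)   # F(x)
    out = layers.Add()([f, x])                          # F(x) + x
    return layers.Activation("relu")(out)

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = residual_block(inputs)
model = tf.keras.Model(inputs, outputs)
```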
222. Comparing complexity... Inception-v4: ResNet + Inception!
An Analysis of Deep Neural Network Models for Practical Applications, 2017.
Figures copyright Alfredo Canziani, Adam Paszke, Eugenio Culurciello, 2017. Reproduced with permission.
223. Comparing complexity...
An Analysis of Deep Neural Network Models for Practical Applications,
2017.
VGG: Highest memory, most operations
Figures copyright Alfredo Canziani, Adam Paszke, Eugenio Culurciello, 2017. Reproduced with
permission.
224. Comparing complexity...
An Analysis of Deep Neural Network Models for Practical Applications,
2017.
GoogLeNet: most efficient
Figures copyright Alfredo Canziani, Adam Paszke, Eugenio Culurciello, 2017. Reproduced with
permission.
225. Comparing complexity...
An Analysis of Deep Neural Network Models for Practical Applications,
2017.
AlexNet: Smaller compute, still memory heavy, lower accuracy
Figures copyright Alfredo Canziani, Adam Paszke, Eugenio Culurciello, 2017. Reproduced with
permission.
226. Comparing complexity...
An Analysis of Deep Neural Network Models for Practical Applications,
2017.
ResNet: Moderate efficiency depending on model, highest accuracy
Figures copyright Alfredo Canziani, Adam Paszke, Eugenio Culurciello, 2017. Reproduced with
permission.
227. ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
[Same ILSVRC winners chart as above]
Network ensembling
228. Improving ResNets...
“Good Practices for Deep Feature Fusion”
[Shao et al. 2016]
- Multi-scale ensembling of Inception, Inception-ResNet, ResNet, and Wide ResNet models
- ILSVRC’16 classification winner
229. ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
[Same ILSVRC winners chart as above]
Adaptive feature map reweighting
230. Improving ResNets...
Squeeze-and-Excitation Networks (SENet)
[Hu et al. 2017]
- Add a “feature recalibration” module that learns to adaptively reweight feature maps
- Global information (global avg. pooling layer) + 2 FC layers used to determine feature map weights
- ILSVRC’17 classification winner (using ResNeXt-152 as a base architecture)
232. ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
[Same ILSVRC winners chart as above]
Completion of the challenge: the annual ImageNet competition was no longer held after 2017 -> it has moved to Kaggle.
247. Summary: Data Augmentation
- Augmentation helps to prevent overfitting
- It makes the network invariant to certain transformations: translations, flips, etc.
- Can be done on-the-fly (see the sketch below)
- Can be used to learn image representations when no labeled datasets are available
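A hedged example of on-the-fly augmentation with Keras’ ImageDataGenerator (all parameter values are illustrative):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random transforms are applied to each batch as it is drawn, so the
# augmented images never need to be stored; ranges here are illustrative.
datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)
# model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=10)
```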
250. SHORTCOMINGS OF NORMAL NEURAL NETWORKS
For example, imagine you want to classify what kind of event is happening
at every point in a movie.
It’s unclear how a traditional neural network could use its reasoning
about previous events in the film to inform later ones.
251. RNN
RNNs address this issue: they contain loops that allow information to persist.
A chunk of neural network, A, looks at some input x_t and outputs a value h_t.
A loop allows information to be passed from one step of the network to the next.
252. RNN
▪ In traditional Networks, all inputs are independent of each other.
▪ RNN will make use of sequential information.
▪ If you want to predict the next word in a sentence you better know which
words came before it (A sequence of words → Predict the next word).
253. RNN…
▪ The most straightforward example is perhaps a time series of numbers, where the task is to predict the next value given previous values.
▪ The input to the RNN at every time-step is the current value as well as a state vector which represents what the network has “seen” at earlier time-steps.
▪ This state vector is the encoded memory of the RNN, initially set to zero.
254. RNN…
▪ A recurrent neural network can be thought of as multiple copies of the same
network, each passing a message to a successor.
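A minimal numpy sketch of this unrolling (all shapes and names are illustrative):

```python
import numpy as np

# One recurrent step, repeated for every element of the sequence: the state
# vector h carries what the network has "seen" so far.
def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

seq = np.random.randn(10, 3)            # 10 time-steps, 3 features each
W_xh, W_hh, b_h = np.random.randn(3, 5), np.random.randn(5, 5), np.zeros(5)
h = np.zeros(5)                         # memory, initially set to zero
for x_t in seq:
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```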
256. RNN – VARIOUS TYPES
(2) Sequence output
Example: Image captioning
Takes an image and outputs a sentence of words.
Single Image → Sequence of words (Caption)
260. RNN – VARIOUS TYPES
(3) Sequence of input to a single output
Example: Sentiment Prediction
Takes a sequence of words (Sentence) and
outputs a word (Sentiment).
Sentence (sequence of words) → Sentiment
(word)
261. RNN – VARIOUS TYPES
(4) Sequence input and sequence output
Example: Machine Translation
An RNN reads a sentence in English and
then outputs a sentence in French.
262. RNN – MACHINE TRANSLATION
Sequence of words (German) →
Sequence of words (English)
Output only starts after we have seen the complete input, because the first word of the translated sentence may require information captured from the complete input sequence.
263. RNN – VARIOUS TYPES
(5) Synced Sequence input and output
Example: Video Classification
Label each frame of the video
267. RNN – DISADVANTAGES
Can’t persist long-term dependencies.
For “the clouds are in the sky”, we don’t need any further context to predict the last word – it’s pretty obvious it is going to be “sky”. In such cases, where the gap between the relevant information and the place it’s needed is small, RNNs can learn to use the past information.
RNNs don’t have memory cells: x0 and x1 are lost by the time we reach x3.
268. RNN – DISADVANTAGES
Can’t persist long-term dependencies.
“I grew up in Bangalore… I speak fluent ?”
Ans: Kannada
Recent information (“I speak”) suggests that the next word is probably the name of a language. But which language? We need the context of Bangalore, from further back.
It’s entirely possible for the gap between the relevant information and the point where it is needed to become very large. As the gap grows, RNNs become unable to learn to connect the information.
269. LSTM – LONG SHORT-TERM MEMORY
▪ RNNs cannot capture long-term dependencies.
“I grew up in France… I speak fluent French.”
We need the context of France, so we go for a new variant of the RNN: the LSTM.
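A minimal Keras sketch of an LSTM for next-value prediction (the sequence length, layer width and all other choices are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

# LSTM that reads sequences of 20 steps with 1 feature each and predicts
# the next value; all sizes here are illustrative.
model = tf.keras.Sequential([
    layers.LSTM(32, input_shape=(20, 1)),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```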
271. BIDIRECTIONAL RNN (e.g. BiLSTM)
The output at time t may depend not only on the previous elements in the sequence, but also on future elements.
Example: to predict a missing word in a sequence you want to look at both the left and the right context.
Two RNNs are stacked on top of each other, one reading the sequence forwards and one backwards. The output is computed based on the hidden states of both RNNs.
Huang Zhiheng, Xu Wei, Yu Kai. Bidirectional LSTM Conditional Random Field Models for Sequence Tagging.
274. PCA – PRINCIPAL COMPONENT ANALYSIS
- Statistical approach for data
compression and visualization
- Invented by Karl Pearson in
1901
- Weakness: linear components
only.
275. TRADITIONAL AUTOENCODER
▪ Unlike PCA, we can now use activation functions to achieve non-linearity.
▪ It has been shown that an AE without activation functions achieves the same capacity as PCA.
278. LATENT SPACE REPRESENTATION
Original vs. latent representation:
In this fairly simple example, let’s say our original data point is a 5 x 5 x 1 image. We set the latent space size to 3 x 1, which means that the compressed data point is a vector with 3 dimensions.
Now every 5x5x1 data point (25 numbers) is uniquely identified by just 3 numbers.
279. LATENT SPACE REPRESENTATION
But you may be wondering, what are “similar” images and why does reducing the size of
the data make similar images “closer together” in space?
If we look at the three images above, two chairs of different color and a table, we will easily say that the two images of the chairs are more similar to each other, while the table is the most different in shape.
280. LATENT SPACE REPRESENTATION
But what makes the other two chairs “alike”?
We consider related features such as angles, characteristics of shape, etc.; each such feature becomes a data point represented in the latent space.
Irrelevant information, such as the color of the seat, is ignored (in other words, removed) when the image is represented in the latent space.
As a result, when we reduce the size, the images of both chairs become less different and more similar.
281. AE - SIMPLE IDEA
- Given data x (no labels) we would like to learn the functions f (encoder) and g (decoder) where:
  f(x) = act(Wx + b) = z
  g(z) = act(W'z + b') = x̂
  such that h(x) = g(f(x)) = x̂, where h is an approximation of the identity function.
(z is some latent representation or code, “act” is a non-linearity such as the sigmoid, and x̂ is x’s reconstruction.)
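A minimal Keras sketch of this idea as a dense autoencoder (the 784/32 sizes and the sigmoid/binary-crossentropy choices are illustrative, e.g. for flattened MNIST digits):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Dense autoencoder: f encodes x to z, g decodes z back to x_hat.
inputs = tf.keras.Input(shape=(784,))
z = layers.Dense(32, activation="sigmoid")(inputs)      # f(x) = act(Wx + b)
x_hat = layers.Dense(784, activation="sigmoid")(z)      # g(z) = act(W'z + b')
autoencoder = tf.keras.Model(inputs, x_hat)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
# autoencoder.fit(x_train, x_train, ...)  # target is the input itself
```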
282. CONVOLUTIONAL AE
Encoder:
  Input: (28, 28, 1)
  Conv 1: 16 filters @ (3,3,1), same padding -> C1 (28, 28, 16)
  MaxPool 1: (2,2), same -> (14, 14, 16)
  Conv 2: 8 filters @ (3,3,16), same -> C2 (14, 14, 8)
  MaxPool 2: (2,2), same -> (7, 7, 8)
  Conv 3: 8 filters @ (3,3,8), same -> C3 (7, 7, 8)
  MaxPool 3: (2,2), same -> (4, 4, 8) = hidden code
Decoder:
  DConv 1: 8 filters @ (3,3,8), same -> D.C1 (4, 4, 8)
  UpSample 1: (2,2) -> (8, 8, 8)
  DConv 2: 8 filters @ (3,3,8), same -> D.C2 (8, 8, 8)
  UpSample 2: (2,2) -> (16, 16, 8)
  DConv 3: 16 filters @ (3,3,8), valid -> D.C3 (14, 14, 16)
  UpSample 3: (2,2) -> (28, 28, 16)
  DConv 4: 1 filter @ (5,5), same -> D.C4 (28, 28, 1) = output
* Input values are normalized
* All of the conv layers use relu activations except for the last conv, which uses sigmoid
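A hedged Keras sketch following the layer table above (assuming 28x28x1 inputs normalized to [0, 1]):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Convolutional autoencoder following the table above (MNIST-style inputs).
inputs = tf.keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, (3, 3), activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(x)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(x)
encoded = layers.MaxPooling2D((2, 2), padding="same")(x)          # (4, 4, 8) hidden code

x = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), activation="relu")(x)                # 'valid' -> (14, 14, 16)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (5, 5), activation="sigmoid", padding="same")(x)

autoencoder = tf.keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
```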
283. DENOISING AUTOENCODERS
Intuition:
- We still aim to encode the input and NOT to mimic the identity function.
- We try to undo the effect of a corruption process stochastically applied to the input.
[Noisy input -> Encoder -> latent space representation -> Decoder -> denoised input]
A more robust model
286. DENOISING AUTOENCODERS
Instead of trying to mimic the identity function by minimizing L(x, g(f(x))), where L is some loss function, a DAE minimizes:
L(x, g(f(x̃)))
where x̃ is a copy of x that has been corrupted by some form of noise.
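A small sketch of the corruption step (the Gaussian noise and the 0.5 noise factor are illustrative choices):

```python
import numpy as np

# Corrupt the input for a denoising autoencoder: add Gaussian noise and
# clip back to [0, 1]; the noise level is an illustrative choice.
def corrupt(x, noise_factor=0.5):
    x_noisy = x + noise_factor * np.random.normal(size=x.shape)
    return np.clip(x_noisy, 0.0, 1.0)

# The DAE is then trained to map corrupt(x) back to the clean x:
# autoencoder.fit(corrupt(x_train), x_train, ...)
```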
293. MACHINE LEARNING TOOLS/FRAMEWORKS
Caffe is a deep learning framework made with expression, speed, and modularity in
mind. Caffe is developed by the Berkeley Vision and Learning Center (BVLC), as well
as community contributors and is popular for computer vision.
Caffe supports cuDNN v5 for GPU acceleration.
Supported interfaces: C, C++, Python.
Caffe2 is a deep learning framework enabling simple and flexible deep learning. Built
on the original Caffe, Caffe2 is designed with expression, speed, and modularity in
mind, allowing for a more flexible way to organize computation.
Caffe2 supports cuDNN v5.1 for GPU acceleration.
Supported interfaces: C++, Python
294. MACHINE LEARNING TOOLS/FRAMEWORKS
The Microsoft Cognitive Toolkit (previously known as CNTK) is a unified deep-learning toolkit from Microsoft Research that makes it easy to train and combine popular model types across multiple GPUs and servers. The Microsoft Cognitive Toolkit implements highly efficient CNN and RNN training for speech, image and text data.
Microsoft Cognitive Toolkit supports cuDNN v5.1 for GPU acceleration.
Supported interfaces: Python, C++, C# and a command-line interface
TensorFlow is a software library for numerical computation using data flow graphs,
developed by Google’s Machine Intelligence research organization.
TensorFlow supports cuDNN v5.1 for GPU acceleration.
Supported interfaces: C++, Python
295. MACHINE LEARNING TOOLS/FRAMEWORKS
Theano is a math expression compiler that efficiently defines, optimizes, and evaluates
mathematical expressions involving multi-dimensional arrays.
Theano supports cuDNN v5 for GPU acceleration.
Supported interfaces: Python
Keras is a minimalist, highly modular neural networks library, written in Python, and
capable of running on top of either TensorFlow or Theano. Keras was developed with
a focus on enabling fast experimentation.
The cuDNN version depends on the version of TensorFlow or Theano installed with Keras.
Supported Interfaces: Python