Deep learning concepts
Joe Li (lizhi97@gmail.com)
§ Unit (Neurons)
§ Input Layer
§ Hidden Layers
§ Batch Normalization
§ Cost/Loss (Min) and Objective (Max) Functions
§ Regularization
§ Activation Functions
§ Backpropagation
§ Learning Rate
§ Optimization
§ Libraries
§ A unit (neuron) often refers to the element of a layer that transforms its inputs
via a nonlinear activation function (for example, the logistic sigmoid function).
Usually, a unit has several incoming connections and several outgoing connections.
§ Comprised of multiple real-valued inputs. Each input should be linearly
independent of the others.
§ Layers other than the input and output layers. A layer is the highest-level building
block in deep learning. A layer is a container that usually receives weighted input,
transforms it with a set of mostly non-linear functions and then passes these values
as output to the next layer.
§ Using mini-batches of examples, as opposed to one example at a time, is helpful in
several ways. First, the gradient of the loss over a mini-batch is an estimate of the
gradient over the training set, whose quality improves as the batch size increases.
Second, computation over a batch of size m can be much more efficient than m separate
computations for individual examples, due to the parallelism afforded by modern
computing platforms.
§ With SGD, training proceeds in steps, and at each step we consider a mini-batch
x1...xm of size m. The mini-batch is used to approximate the gradient of the loss
function with respect to the parameters.
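A minimal sketch of the batch normalization transform referenced by this slide's title, assuming the standard formulation (normalize each feature over the mini-batch, then scale and shift); all names and values here are illustrative:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch x of shape (m, features), then scale and shift:
    x_hat = (x - mean) / sqrt(var + eps), y = gamma * x_hat + beta."""
    mu = x.mean(axis=0)                   # per-feature mean over the mini-batch
    var = x.var(axis=0)                   # per-feature variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Illustrative usage on a random mini-batch of size m = 32 with 4 features.
x = np.random.randn(32, 4) * 5.0 + 3.0
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0), y.std(axis=0))      # roughly 0 and 1 per feature
```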
§ Intuition
§ Maximum Likelihood Estimation(MLE)
§ Cross-Entropy
§ Logistic
§ Quadratic
§ 0-1 Loss
§ Hinge Loss
§ Exponential
§ https://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications
§ https://en.wikipedia.org/wiki/Loss_functions_for_classification
§ The cost function tells us how right the predictions of our model and weight matrix
are, and the choice of cost function drives how much we care about how wrong each
prediction is. For instance, a hinge loss penalizes each incorrect prediction linearly
in its margin. Were we to square the hinge loss and use that as our cost function, we
would be telling the system that being very wrong gets quadratically worse as we move
away from the right prediction. Cross-entropy offers a probabilistic approach.
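As a rough illustration of the point above, the sketch below (numpy assumed, values illustrative) compares how hinge, squared hinge, and cross-entropy/log loss grow as a prediction's margin t*y moves from wrong to right:

```python
import numpy as np

# Margins t*y from confidently wrong (-2) to confidently right (+2).
margins = np.linspace(-2.0, 2.0, 5)

hinge = np.maximum(0.0, 1.0 - margins)               # grows linearly with wrongness
squared_hinge = np.maximum(0.0, 1.0 - margins) ** 2  # grows quadratically
log_loss = np.log1p(np.exp(-margins))                # cross-entropy / logistic shape

for m, h, sh, ll in zip(margins, hinge, squared_hinge, log_loss):
    print(f"margin={m:+.1f}  hinge={h:.2f}  squared_hinge={sh:.2f}  log={ll:.2f}")
```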
§ Many cost functions are the result of applying Maximum Likelihood. For instance,
the Least Squares cost function can be obtained via Maximum Likelihood.
Cross-Entropy is another example.
§ The likelihood of a parameter value (or vector of parameter values), θ, given
outcomes x, is equal to the probability (density) assumed for those observed
outcomes given those parameter values, that is L(θ | x) = P(x | θ).
§ The natural logarithm of the likelihood function, called the log-likelihood, is more
convenient to work with. Because the logarithm is a monotonically increasing
function, the logarithm of a function achieves its maximum value at the same points
as the function itself, and hence the log-likelihood can be used in place of the
likelihood in maximum likelihood estimation and related techniques.
§ In general, for a fixed set of data and underlying statistical model, the method of maximum
likelihood selects the set of values of the model parameters that maximizes the likelihood
function. Intuitively, this maximizes the "agreement" of the selected model with the observed
data, and for discrete random variables it indeed maximizes the probability of the observed data
under the resulting distribution. Maximum-likelihood estimation gives a unified approach to
estimation, which is well-defined in the case of the normal distribution and many other
problems.
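A minimal numerical sketch of maximum likelihood, assuming Gaussian data with known variance; it shows that the log-likelihood of the mean is maximized (approximately) at the sample mean. All names and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=1000)         # observed outcomes

def log_likelihood(mu, x, sigma=1.0):
    """Gaussian log-likelihood of mean mu given data x (variance known)."""
    return np.sum(-0.5 * np.log(2.0 * np.pi * sigma**2)
                  - (x - mu) ** 2 / (2.0 * sigma**2))

candidates = np.linspace(0.0, 4.0, 401)
lls = [log_likelihood(mu, x) for mu in candidates]
mle = candidates[int(np.argmax(lls))]
print(mle, x.mean())                                   # MLE is roughly the sample mean
```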
§ Cross entropy can be used to define the loss function in machine learning and
optimization. The true probability p_i is the true label, and the given distribution
q_i is the predicted value of the current model.
§ Cross-entropy error function and logistic regression
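A small sketch of the cross-entropy loss H(p, q) = -Σ_i p_i log q_i for a one-hot true label, assuming numpy; names and values are illustrative:

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Cross entropy H(p, q) = -sum_i p_i * log(q_i); p is the true
    distribution (e.g. a one-hot label), q is the predicted distribution."""
    q = np.clip(q, eps, 1.0)                  # avoid log(0)
    return -np.sum(p * np.log(q))

p = np.array([0.0, 1.0, 0.0])                 # true class is index 1
q_good = np.array([0.05, 0.90, 0.05])         # confident, correct prediction
q_bad = np.array([0.70, 0.20, 0.10])          # confident, wrong prediction
print(cross_entropy(p, q_good), cross_entropy(p, q_bad))   # small vs. large loss
```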
§ The logistic loss function for a true label t = ±1 and a classifier score y is defined as V(y) = (1 / ln 2) · ln(1 + e^(-t·y)).
§ The use of a quadratic loss function is common, for example when using least
squares techniques. It is often more mathematically tractable than other loss
functions because of the properties of variances, as well as being symmetric: an
error above the target causes the same loss as the same magnitude of error below
the target. If the target is t, then a quadratic loss function is λ(x) = C(t - x)² for some constant C (often simply (t - x)²).
§ In statistics and decision theory, a frequently used loss function is the 0-1 loss
function, L(ŷ, y) = I(ŷ ≠ y), which is 1 when the prediction is wrong and 0 when it is correct.
§ The hinge loss is a loss function used for training classifiers. For an intended output
t = ±1 and a classifier score y, the hinge loss of the prediction y is defined as ℓ(y) = max(0, 1 - t·y).
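A one-function sketch of the hinge loss just defined, assuming numpy; the labels and scores are illustrative:

```python
import numpy as np

def hinge_loss(t, y):
    """Hinge loss max(0, 1 - t*y) for labels t in {-1, +1} and scores y."""
    return np.maximum(0.0, 1.0 - t * y)

t = np.array([+1, +1, -1, -1])
y = np.array([2.0, 0.3, -1.5, 0.4])           # classifier scores
print(hinge_loss(t, y))                        # [0.  0.7 0.  1.4]
```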
§ L1 norm
§ L2 norm
§ Early Stopping
§ Dropout
§ Sparse regularizer on columns
§ Nuclear norm regularization
§ Mean-constrained regularization
§ Clustered mean-constrained regularization
§ Graph-based similarity
§ Manhattan Distance
§ L1-norm is also known as least absolute deviations (LAD) or least absolute errors (LAE). It
basically minimizes the sum of the absolute differences between the target values and the
estimated values: S = Σ_i |y_i - f(x_i)|.
§ Intuitively, the L1 norm prefers a weight matrix that contains a larger number of zeros.
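A minimal sketch of an L1 penalty term and its subgradient, as would be added to a cost function; assumes numpy, names and values illustrative:

```python
import numpy as np

def l1_penalty(w, lam):
    """L1 regularization term lam * sum(|w|) and its subgradient lam * sign(w)."""
    return lam * np.sum(np.abs(w)), lam * np.sign(w)

w = np.array([0.5, -2.0, 0.0, 3.0])
penalty, grad = l1_penalty(w, lam=0.1)
print(penalty, grad)                           # 0.55, [ 0.1 -0.1  0.   0.1]
```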
§ Euclidean Distance
§ L2-norm is also known as least squares. It basically minimizes the sum of the squared
differences between the target values and the estimated values: S = Σ_i (y_i - f(x_i))².
§ Intuitively, the L2 norm prefers a weight matrix where the norm is distributed across all
weight matrix entries.
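The corresponding L2 (weight decay) penalty, as a minimal sketch under the same assumptions:

```python
import numpy as np

def l2_penalty(w, lam):
    """L2 regularization term lam * sum(w**2) and its gradient 2 * lam * w."""
    return lam * np.sum(w ** 2), 2.0 * lam * w

w = np.array([0.5, -2.0, 0.0, 3.0])
penalty, grad = l2_penalty(w, lam=0.1)
print(penalty, grad)                           # 1.325, [ 0.1 -0.4  0.   0.6]
```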
§ Early stopping rules provide guidance as to how many iterations can be run before
the learner begins to over-fit, at which point training is stopped.
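A sketch of such a rule as a patience-based loop; the training and validation callables are passed in, and the toy stand-ins at the bottom are purely illustrative:

```python
def train_with_early_stopping(train_one_epoch, validation_error, max_epochs=100, patience=5):
    """Stop training once validation error has not improved for `patience` epochs."""
    best_error, epochs_without_improvement = float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch()                      # one pass over the training data
        error = validation_error()             # evaluate on held-out data
        if error < best_error:
            best_error, epochs_without_improvement = error, 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                              # stop before the learner over-fits
    return epoch, best_error

# Toy stand-ins: validation error improves for 10 epochs, then plateaus.
errors = iter([1.0 / (e + 1) if e < 10 else 0.1 for e in range(100)])
print(train_with_early_stopping(lambda: None, lambda: next(errors)))
```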
§ Dropout is a regularization technique for reducing overfitting in neural networks by
preventing complex co-adaptations on training data. It is a very efficient way of
performing model averaging with neural networks. The term "dropout" refers to
dropping out units (both hidden and visible) in a neural network.
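A minimal sketch of (inverted) dropout applied to a layer's activations during training, assuming numpy; the keep probability and shapes are illustrative:

```python
import numpy as np

def dropout(activations, keep_prob=0.5, training=True):
    """Inverted dropout: randomly zero units and rescale the survivors by
    1/keep_prob so the expected activation is unchanged; a no-op at test time."""
    if not training:
        return activations
    mask = (np.random.rand(*activations.shape) < keep_prob) / keep_prob
    return activations * mask

h = np.ones((2, 8))                            # activations of a hidden layer
print(dropout(h, keep_prob=0.5))               # roughly half the units are zeroed
```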
§ This regularizer defines an L2 norm on each column and an L1 norm over all
columns. It can be solved by proximal methods.
§ This regularizer constrains the functions learned for each task to be similar to the
overall average of the functions across all tasks. This is useful for expressing prior
information that each task is expected to share similarities with each other task. An
example is predicting blood iron levels measured at different times of the day,
where each task represents a different person.
§ This regularizer is similar to the mean-constrained regularizer, but instead
enforces similarity between tasks within the same cluster. This can capture more
complex prior information. This technique has been used to predict Netflix
recommendations.
§ More general than above, similarity between tasks can be defined by a function.
The regularizer encourages the model to learn similar functions for similar tasks.
§ An activation function defines the output of a node given an input or set of inputs. A few common types are listed below, with a small code sketch after the list.
§ Types
§ ReLU
§ Sigmoid / Logistic
§ Binary
§ Tanh
§ Softplus
§ Softmax
§ Maxout
§ Leaky ReLU, PReLU, RReLU, ELU, SELU, and others.
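A compact sketch of several of the activations listed above, assuming numpy; these are the standard formulas, written here only for illustration:

```python
import numpy as np

def relu(x):                    return np.maximum(0.0, x)
def leaky_relu(x, alpha=0.01):  return np.where(x > 0, x, alpha * x)
def sigmoid(x):                 return 1.0 / (1.0 + np.exp(-x))
def softplus(x):                return np.log1p(np.exp(x))
def softmax(x):
    z = np.exp(x - np.max(x))                  # shift for numerical stability
    return z / z.sum()

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), leaky_relu(x), sigmoid(x), np.tanh(x), softplus(x), softmax(x), sep="\n")
```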
§ Backpropagation is a method used in artificial neural networks to calculate the error
contribution of each neuron after a batch of data. It calculates the gradient of the loss
function and is commonly used in the gradient descent optimization algorithm. It is also
called backward propagation of errors, because the error is calculated at the output and
distributed back through the network layers.
§ Figure: a neural network taking a 4-dimensional vector representation of words as input.
§ In this method, we reuse partial derivatives computed for higher layers in lower
layers, for efficiency.
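A tiny worked sketch of this reuse: a two-layer network with a sigmoid hidden layer and squared error, where the derivative computed at the output (dy) is reused when propagating to the lower layer. Shapes and values are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x, target = rng.normal(size=3), 1.0
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))

# Forward pass: x -> h = sigmoid(W1 x) -> y = W2 h, loss = 0.5 * (y - target)^2.
h = sigmoid(W1 @ x)
y = W2 @ h
loss = 0.5 * (y - target) ** 2

# Backward pass: higher-layer partial derivatives are reused in lower layers.
dy = y - target                        # dLoss/dy
dW2 = np.outer(dy, h)                  # output-layer gradient
dh = W2.T @ dy                         # reuse dy when moving down one layer
dW1 = np.outer(dh * h * (1 - h), x)    # chain rule through the sigmoid

print(loss, dW1.shape, dW2.shape)
```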
§ Intuition for backpropagation
§ Simple Example (Circuits)
§ Another Example (Circuits)
§ Simple Example (Flowgraphs)
§ Neural networks are often trained by gradient descent on the weights. This means that
at each iteration we use backpropagation to calculate the derivative of the loss
function with respect to each weight and subtract it from that weight.
§ However, if you actually try that, the weights will change far too much each iteration, which
will make them “overcorrect” and the loss will actually increase/diverge. So in practice,
people usually multiply each derivative by a small value called the “learning rate” before
they subtract it from its corresponding weight.
§ Simplest recipe: keep it fixed and use the same for all parameters.
§ Better results come from allowing the learning rate to decrease over time. Options (sketched in code after this list):
§ Reduce by 0.5 when validation error stops improving
§ Reduce by O(1/t) because of theoretical convergence guarantees, with hyper-parameters
ε0 and τ, where t is the iteration number
§ Better yet: no hand-set learning rates, by using AdaGrad
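A sketch of the two schedules listed above; the exact O(1/t) form used on the slide is not shown, so the inverse-time formula below (eps0 * tau / (tau + t)) is just one common variant, assumed for illustration:

```python
def step_decay(lr, validation_errors, factor=0.5):
    """Halve the learning rate when validation error stops improving."""
    if len(validation_errors) >= 2 and validation_errors[-1] >= validation_errors[-2]:
        return lr * factor
    return lr

def inverse_time_decay(eps0, tau, t):
    """An assumed O(1/t) schedule: eps_t = eps0 * tau / (tau + t)."""
    return eps0 * tau / (tau + t)

lr = step_decay(0.1, [0.9, 0.7, 0.6, 0.61])    # validation error stopped improving
print(lr, [round(inverse_time_decay(0.1, 100, t), 4) for t in (0, 100, 1000)])
```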
Optimization
§ Gradient Descent
§ Is a first-order iterative optimization algorithm for finding the minimum of a function. To find a
local minimum of a function using gradient descent, one takes steps proportional to the
negative of the gradient (or of the approximate gradient) of the function at the current point. If
instead one takes steps proportional to the positive of the gradient, one approaches a local
maximum of that function; the procedure is then known as gradient ascent. (Update rules for the
optimizers in this list are sketched in code after the list.)
§ Stochastic Gradient Descent (SGD)
§ Gradient descent uses the total gradient over all examples per update; SGD updates after only 1
or a few examples, e.g. θ ← θ - η ∇θ J(θ; x(i), y(i)).
§ Mini-batch Stochastic Gradient Descent (SGD)
§ Gradient descent uses the total gradient over all examples per update; mini-batch SGD updates
using a small batch of examples.
§ Momentum
§ Idea: add a fraction v of the previous update to the current one. When the gradient keeps pointing in
the same direction, this will increase the size of the steps taken towards the minimum.
§ Adagrad
§ Adaptive learning rates for each parameter
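Single-step update rules for the optimizers above, as a hedged sketch assuming numpy and a gradient already computed by backpropagation; hyper-parameter values are illustrative:

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain (stochastic) gradient descent: step against the gradient."""
    return w - lr * grad

def momentum_step(w, grad, velocity, lr=0.01, mu=0.9):
    """Add a fraction mu of the previous update to the current one."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

def adagrad_step(w, grad, cache, lr=0.01, eps=1e-8):
    """Per-parameter adaptive learning rates from accumulated squared gradients."""
    cache = cache + grad ** 2
    return w - lr * grad / (np.sqrt(cache) + eps), cache

w, velocity, cache = np.zeros(3), np.zeros(3), np.zeros(3)
grad = np.array([0.5, -1.0, 2.0])              # assume this came from backprop
print(sgd_step(w, grad))
print(momentum_step(w, grad, velocity)[0])
print(adagrad_step(w, grad, cache)[0])
```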
