CS 229 – Machine Learning https://stanford.edu/~shervine
VIP Cheatsheet: Deep Learning
Afshine Amidi and Shervine Amidi
September 15, 2018
Neural Networks
Neural networks are a class of models that are built with layers. Commonly used types of neural
networks include convolutional and recurrent neural networks.
❒ Architecture – The vocabulary around neural network architectures is described in the figure below:
By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:
z_j^[i] = (w_j^[i])ᵀ x + b_j^[i]
where we note w, b, z the weight, bias and output respectively.
❒ Activation function – Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:

Sigmoid:     g(z) = 1 / (1 + e^(−z))
Tanh:        g(z) = (e^z − e^(−z)) / (e^z + e^(−z))
ReLU:        g(z) = max(0, z)
Leaky ReLU:  g(z) = max(εz, z), with ε ≪ 1
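To make these formulas concrete, here is a minimal NumPy sketch of the four activations; the leak coefficient eps = 0.01 in leaky_relu is an illustrative choice, not one prescribed by the cheatsheet.

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # g(z) = (e^z - e^(-z)) / (e^z + e^(-z))
    return np.tanh(z)

def relu(z):
    # g(z) = max(0, z), applied element-wise
    return np.maximum(0.0, z)

def leaky_relu(z, eps=0.01):
    # g(z) = max(eps * z, z), with eps << 1; 0.01 is a common but arbitrary value
    return np.maximum(eps * z, z)
```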
❒ Cross-entropy loss – In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:
L(z,y) = −[ y log(z) + (1 − y) log(1 − z) ]
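As a quick sketch of the formula (assuming z is a predicted probability and y ∈ {0,1}), with a small clipping constant added purely for numerical stability:

```python
import numpy as np

def cross_entropy(z, y, tiny=1e-12):
    # L(z, y) = -[ y log(z) + (1 - y) log(1 - z) ]
    # tiny guards against log(0); it is a numerical convenience, not part of the definition
    z = np.clip(z, tiny, 1.0 - tiny)
    return -(y * np.log(z) + (1.0 - y) * np.log(1.0 - z))
```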
❒ Learning rate – The learning rate, often noted η, indicates the pace at which the weights get updated. It can be fixed or adaptively changed. The current most popular method is Adam, which adapts the learning rate.
❒ Backpropagation – Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using the chain rule and is of the following form:
∂L(z,y)/∂w = ∂L(z,y)/∂a × ∂a/∂z × ∂z/∂w
As a result, the weight is updated as follows:
w ← w − η ∂L(z,y)/∂w
❒ Updating weights – In a neural network, weights are updated as follows (a minimal sketch in code follows these steps):
• Step 1: Take a batch of training data.
• Step 2: Perform forward propagation to obtain the corresponding loss.
• Step 3: Backpropagate the loss to get the gradients.
• Step 4: Use the gradients to update the weights of the network.
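A minimal sketch of these four steps on a single-layer sigmoid model, reusing the sigmoid and cross_entropy helpers sketched above; the data, model shape and learning rate are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))             # Step 1: a (hypothetical) batch of training data
y = (X.sum(axis=1) > 0).astype(float)    # hypothetical binary labels

w, b, eta = np.zeros(3), 0.0, 0.1

for step in range(100):
    # Step 2: forward propagation to obtain the corresponding loss
    z = sigmoid(X @ w + b)
    loss = cross_entropy(z, y).mean()
    # Step 3: backpropagate the loss to get the gradients
    # (for sigmoid + cross-entropy, the gradient w.r.t. the pre-activation is z - y)
    grad_w = X.T @ (z - y) / len(y)
    grad_b = (z - y).mean()
    # Step 4: use the gradients to update the weights: w <- w - eta * dL/dw
    w -= eta * grad_w
    b -= eta * grad_b
```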
❒ Dropout – Dropout is a technique meant to prevent overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1 − p.
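A sketch of training-time dropout under the common "inverted dropout" convention, where kept activations are rescaled by 1/(1 − p) so their expectation is unchanged; that rescaling is an implementation detail not stated in the cheatsheet.

```python
import numpy as np

def dropout(a, p, rng=np.random.default_rng()):
    # Drop each unit with probability p, keep it with probability 1 - p
    mask = rng.random(a.shape) >= p
    # Inverted dropout: rescale kept units; at test time the layer is then used as-is
    return a * mask / (1.0 - p)
```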
Convolutional Neural Networks
❒ Convolutional layer requirement – By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding and S the stride, the number of neurons N that fit in a given volume is such that:
N = (W − F + 2P) / S + 1
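The formula translates directly into a small helper; the example numbers below are illustrative.

```python
def conv_output_size(W, F, P, S):
    # N = (W - F + 2P) / S + 1; the division must be exact for the neurons to fit
    N, rem = divmod(W - F + 2 * P, S)
    assert rem == 0, "(W - F + 2P) must be divisible by the stride S"
    return N + 1

# e.g. a 32-wide input, 5-wide filters, padding 2, stride 1 -> 32 output positions
assert conv_output_size(32, 5, 2, 1) == 32
```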
❒ Batch normalization – It is a step of hyperparameters γ, β that normalizes the batch {x_i}. By noting µ_B, σ_B² the mean and variance of the batch that we want to correct, it is done as follows:
x_i ← γ (x_i − µ_B) / √(σ_B² + ε) + β
It is usually done after a fully connected/convolutional layer and before a non-linearity layer and
aims at allowing higher learning rates and reducing the strong dependence on initialization.
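A sketch of the training-time normalization over a batch of activations x (rows are examples, columns are features); eps is the small stability constant ε, and the running statistics a full implementation would keep for test time are omitted.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature using the batch statistics, then rescale and shift
    mu = x.mean(axis=0)     # mu_B
    var = x.var(axis=0)     # sigma_B^2
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```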
Recurrent Neural Networks
❒ Types of gates – Here are the different types of gates that we encounter in a typical recurrent neural network:

Input gate:  write to cell or not?
Forget gate: erase a cell or not?
Output gate: reveal a cell or not?
Gate:        how much writing?
❒ LSTM – A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.
Reinforcement Learning and Control
The goal of reinforcement learning is for an agent to learn how to evolve in an environment.
❒ Markov decision processes – A Markov decision process (MDP) is a 5-tuple (S, A, {P_sa}, γ, R) where:
• S is the set of states
• A is the set of actions
• {P_sa} are the state transition probabilities for s ∈ S and a ∈ A
• γ ∈ [0,1[ is the discount factor
• R : S × A → ℝ or R : S → ℝ is the reward function that the algorithm wants to maximize
❒ Policy – A policy π is a function π : S → A that maps states to actions.
Remark: we say that we execute a given policy π if given a state s we take the action a = π(s).
❒ Value function – For a given policy π and a given state s, we define the value function V^π as follows:
V^π(s) = E[ R(s_0) + γ R(s_1) + γ² R(s_2) + ... | s_0 = s, π ]
❒ Bellman equation – The optimal Bellman equation characterizes the value function V^{π∗} of the optimal policy π∗:
V^{π∗}(s) = R(s) + max_{a∈A} γ Σ_{s′∈S} P_sa(s′) V^{π∗}(s′)
Remark: we note that the optimal policy π∗ for a given state s is such that:
π∗(s) = argmax_{a∈A} Σ_{s′∈S} P_sa(s′) V∗(s′)
❒ Value iteration algorithm – The value iteration algorithm is in two steps (a minimal sketch in code follows):
• We initialize the value:
  V_0(s) = 0
• We iterate the value based on the values before:
  V_{i+1}(s) = R(s) + max_{a∈A} [ Σ_{s′∈S} γ P_sa(s′) V_i(s′) ]
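A minimal sketch of the two steps on a generic MDP; the transition tensor P[s, a, s′], the reward vector R, the discount gamma and the fixed iteration count are all assumptions.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=100):
    # P[s, a, s2] = P_sa(s2); R[s] = R(s)
    V = np.zeros(R.shape[0])                  # V_0(s) = 0
    for _ in range(iters):
        # V_{i+1}(s) = R(s) + max_a sum_{s2} gamma * P_sa(s2) * V_i(s2)
        V = R + gamma * (P @ V).max(axis=1)
    return V
```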
❒ Maximum likelihood estimate – The maximum likelihood estimates for the state transition probabilities are as follows:
P_sa(s′) = (#times took action a in state s and got to s′) / (#times took action a in state s)
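A sketch of this counting estimate from a log of observed (s, a, s′) transitions; the log format is hypothetical.

```python
import numpy as np

def estimate_transitions(log, n_states, n_actions):
    # P_sa(s2) = #times took a in s and got to s2 / #times took a in s
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s2 in log:
        counts[s, a, s2] += 1
    totals = counts.sum(axis=2, keepdims=True)
    # Leave never-visited (s, a) pairs at zero instead of dividing by zero
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
```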
❒ Q-learning – Q-learning is a model-free estimation of Q, which is done as follows:
Q(s,a) ← Q(s,a) + α [ R(s,a,s′) + γ max_{a′} Q(s′,a′) − Q(s,a) ]
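A sketch of the update applied to a single observed transition (s, a, r, s′); the step size alpha is the usual learning-rate assumption, and how transitions are collected is left out.

```python
import numpy as np

def q_update(Q, s, a, r, s2, alpha=0.1, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * [ R(s,a,s') + gamma * max_a' Q(s',a') - Q(s,a) ]
    td_target = r + gamma * Q[s2].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```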