CSC321 Lecture 10: Automatic Differentiation
Roger Grosse
Roger Grosse CSC321 Lecture 10: Automatic Differentiation 1 / 23
Overview
Implementing backprop by hand is like programming in assembly
language.
You’ll probably never do it, but it’s important for having a mental
model of how everything works.
Lecture 6 covered the math of backprop, which you are now using to code
up a particular network for Assignment 1.
This lecture: how to build an automatic differentiation (autodiff)
library, so that you never have to write derivatives by hand
We’ll cover a simplified version of Autograd, a lightweight autodiff tool.
PyTorch’s autodiff feature is based on very similar principles.
Confusing Terminology
Automatic differentiation (autodiff) refers to a general way of taking
a program which computes a value, and automatically constructing a
procedure for computing derivatives of that value.
In this lecture, we focus on reverse mode autodiff. There is also a
forward mode, which is for computing directional derivatives.
Backpropagation is the special case of autodiff applied to neural nets.
But in machine learning, we often use “backprop” synonymously with
“autodiff”.
Autograd is the name of a particular autodiff package.
But lots of people, including the PyTorch developers, got confused and
started using “autograd” to mean “autodiff”.
What Autodiff Is Not
Autodiff is not finite differences.
Finite differences are expensive, since you need to do a forward pass for
each derivative.
They also introduce large numerical error.
Normally, we only use it for testing.
Autodiff is both efficient (linear in the cost of computing the value)
and numerically stable.
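For concreteness, here is a minimal sketch of the centered-difference check we use for testing, applied to the logistic least squares loss used later in this lecture. The parameter values and the helper names (`loss`, `finite_diff`, `dloss_dw`) are made up for illustration:

```python
import numpy as np

def loss(w, b, x, t):
    """Logistic least squares loss from the running example."""
    z = w * x + b
    y = 1.0 / (1.0 + np.exp(-z))
    return 0.5 * (y - t) ** 2

def finite_diff(f, w, eps=1e-6):
    """Centered difference: two extra forward passes per scalar input,
    and only accurate to roughly sqrt(machine epsilon)."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

def dloss_dw(w, b, x, t):
    """Exact derivative via the chain rule, for comparison."""
    y = 1.0 / (1.0 + np.exp(-(w * x + b)))
    return (y - t) * y * (1 - y) * x

w, b, x, t = 0.5, -0.3, 2.0, 1.0
approx = finite_diff(lambda w_: loss(w_, b, x, t), w)
exact = dloss_dw(w, b, x, t)
```

The two values agree to several decimal places, which is exactly how one sanity-checks an autodiff implementation.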
What Autodiff Is Not
Autodiff is not symbolic differentiation (e.g. Mathematica).
Symbolic differentiation can result in complex and redundant
expressions.
Mathematica’s derivatives for one layer of soft ReLU (univariate case):
Derivatives for two layers of soft ReLU:
There might not be a convenient formula for the derivatives.
The goal of autodiff is not a formula, but a procedure for computing
derivatives.
What Autodiff Is
Recall how we computed the derivatives of logistic least squares regression.
An autodiff system should transform the left-hand side into the right-hand
side.
Computing the loss:

z = wx + b
y = σ(z)
L = (1/2)(y − t)²

Computing the derivatives:

L̄ = 1
ȳ = y − t
z̄ = ȳ σ′(z)
w̄ = z̄ x
b̄ = z̄
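These derivative computations can be written out directly as code. A sketch with made-up parameter values; `manual_backprop` and `sigma` are hypothetical names, not Autograd code:

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def manual_backprop(w, b, x, t):
    """Forward pass, then the backward equations above, in order."""
    # Forward pass
    z = w * x + b
    y = sigma(z)
    L = 0.5 * (y - t) ** 2
    # Backward pass (the bar quantities)
    L_bar = 1.0
    y_bar = L_bar * (y - t)
    z_bar = y_bar * sigma(z) * (1.0 - sigma(z))  # sigma'(z)
    w_bar = z_bar * x
    b_bar = z_bar
    return L, w_bar, b_bar
```

An autodiff system's job is to produce the second half of this function automatically from the first half.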
What Autodiff Is
An autodiff system will convert the program into a sequence of primitive
operations which have specified routines for computing derivatives.
In this representation, backprop can be done in a completely mechanical way.
Original program:

z = wx + b
y = 1 / (1 + exp(−z))
L = (1/2)(y − t)²

Sequence of primitive operations:

t1 = wx
z = t1 + b
t3 = −z
t4 = exp(t3)
t5 = 1 + t4
y = 1/t5
t6 = y − t
t7 = t6²
L = t7/2
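The decomposition can be checked mechanically. A sketch with hypothetical function names, assuming scalar inputs:

```python
import numpy as np

def original_program(w, x, b, t):
    z = w * x + b
    y = 1.0 / (1.0 + np.exp(-z))
    return 0.5 * (y - t) ** 2

def primitive_program(w, x, b, t):
    """The same computation, expanded into the primitive sequence above."""
    t1 = w * x
    z = t1 + b
    t3 = -z
    t4 = np.exp(t3)
    t5 = 1.0 + t4
    y = 1.0 / t5
    t6 = y - t
    t7 = t6 ** 2
    L = t7 / 2.0
    return L
```

Each line of `primitive_program` is a single op with a known derivative rule, which is what makes the backward pass mechanical.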
Autograd
The rest of this lecture covers how Autograd is implemented.
Source code for the original Autograd package:
https://github.com/HIPS/autograd
Autodidact, a pedagogical implementation of Autograd — you are
encouraged to read the code.
https://github.com/mattjj/autodidact
Thanks to Matt Johnson for providing this!
Building the Computation Graph
Most autodiff systems, including Autograd, explicitly construct the
computation graph.
Some frameworks like TensorFlow provide mini-languages for building
computation graphs directly. Disadvantage: need to learn a totally new API.
Autograd instead builds them by tracing the forward pass computation,
allowing for an interface nearly indistinguishable from NumPy.
The Node class (defined in tracer.py) represents a node of the
computation graph. It has attributes:
value, the actual value computed on a particular set of inputs
fun, the primitive operation defining the node
args and kwargs, the arguments the op was called with
parents, the parent Nodes
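A minimal sketch of such a Node class, with the attribute names listed above (the real class in tracer.py also carries additional tracing state):

```python
class Node:
    """A node of the computation graph."""
    def __init__(self, value, fun, args, kwargs, parents):
        self.value = value      # the actual value computed on these inputs
        self.fun = fun          # the primitive operation defining the node
        self.args = args        # positional arguments the op was called with
        self.kwargs = kwargs    # keyword arguments the op was called with
        self.parents = parents  # the parent Nodes

# A leaf node (an input) has no defining op and no parents.
leaf = Node(value=3.0, fun=None, args=(), kwargs={}, parents=[])
```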
Building the Computation Graph
Autograd’s fake NumPy module provides primitive ops which look and
feel like NumPy functions, but secretly build the computation graph.
They wrap around NumPy functions:
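A simplified sketch of how such a wrapper might look. This is illustrative, not Autograd's actual implementation: the pared-down `Node` and the `primitive` decorator name are assumptions.

```python
import numpy as np

class Node:
    """Minimal graph node (repeated here so this sketch is self-contained)."""
    def __init__(self, value, fun, args, kwargs, parents):
        self.value, self.fun = value, fun
        self.args, self.kwargs, self.parents = args, kwargs, parents

def primitive(raw_fun):
    """Turn a raw NumPy function into a graph-building op: compute the
    value as usual, but if any argument is a Node, record the call."""
    def wrapped(*args, **kwargs):
        parents = [a for a in args if isinstance(a, Node)]
        if not parents:
            return raw_fun(*args, **kwargs)
        # Unbox Node arguments to their underlying values.
        raw_args = [a.value if isinstance(a, Node) else a for a in args]
        value = raw_fun(*raw_args, **kwargs)
        return Node(value, raw_fun, args, kwargs, parents)
    return wrapped

anp_exp = primitive(np.exp)          # a "fake NumPy" exp
x = Node(0.0, None, (), {}, [])
y = anp_exp(x)                       # computes exp(0.0) and records the op
```

Calling `anp_exp` on plain numbers behaves exactly like `np.exp`, which is why the traced interface feels nearly indistinguishable from NumPy.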
Building the Computation Graph
Example:
Vector-Jacobian Products
Previously, I suggested deriving backprop equations in terms of sums
and indices, and then vectorizing them. But we’d like to implement
our primitive operations in vectorized form.
The Jacobian is the matrix of partial derivatives:
J = ∂y/∂x = ⎡ ∂y1/∂x1 ⋯ ∂y1/∂xn ⎤
            ⎢    ⋮     ⋱    ⋮    ⎥
            ⎣ ∂ym/∂x1 ⋯ ∂ym/∂xn ⎦
The backprop equation (single child node) can be written as a
vector-Jacobian product (VJP):
x̄j = Σi ȳi ∂yi/∂xj        x̄ = ȳ⊤ J

That gives a row vector. We can treat it as a column vector by taking

x̄ = J⊤ ȳ
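The equivalence of the row-vector and column-vector forms is easy to check numerically; a sketch with a random, hypothetical Jacobian:

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((3, 4))    # Jacobian of some map from R^4 to R^3
y_bar = rng.standard_normal(3)     # output gradient

x_bar_row = y_bar @ J              # row-vector form:    y_bar^T J
x_bar_col = J.T @ y_bar            # column-vector form: J^T y_bar
```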
Vector-Jacobian Products
Examples
Matrix-vector product:

z = Wx        J = W        x̄ = W⊤ z̄

Elementwise operations:

y = exp(z)    J = diag(exp(z)) = ⎡ exp(z1)        0    ⎤
                                 ⎢          ⋱          ⎥
                                 ⎣    0        exp(zD) ⎦

z̄ = exp(z) ◦ ȳ
Note: we never explicitly construct the Jacobian. It’s usually simpler
and more efficient to compute the VJP directly.
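Both examples can be verified against the explicit Jacobian in a few lines of NumPy; the arrays here are made up for illustration, and the explicit Jacobian is built only to check the answer:

```python
import numpy as np

# Matrix-vector product z = Wx: the Jacobian is W itself,
# so the VJP is just x_bar = W^T z_bar.
W = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
z_bar = np.array([1.0, 0.5, -1.0])
x_bar = W.T @ z_bar

# Elementwise exp: the Jacobian is diag(exp(z)), so the VJP is an
# elementwise product, with no D x D matrix ever built.
z = np.array([0.0, 1.0, -2.0])
y_bar = np.array([1.0, -1.0, 2.0])
z_bar_elem = np.exp(z) * y_bar

# For testing only: the explicit diagonal Jacobian gives the same answer.
J_explicit = np.diag(np.exp(z))
```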
Vector-Jacobian Products
For each primitive operation, we must specify VJPs for each of its
arguments. Consider y = exp(x).
Its VJP is a function which takes in the output gradient (i.e. ȳ), the
answer (y), and the argument (x), and returns the input gradient (x̄).
defvjp (defined in core.py) is a convenience routine for registering
VJPs. It just adds them to a dict.
Examples from numpy/numpy_vjps.py:
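A sketch of the idea, assuming a plain dict registry and an illustrative VJP signature (output gradient g, answer ans, then the arguments); this is not Autograd's actual code:

```python
import numpy as np

# Registry mapping each primitive to a tuple of VJPs, one per argument.
vjps = {}

def defvjp(fun, *arg_vjps):
    """Register the VJPs for each argument of a primitive."""
    vjps[fun] = arg_vjps

# VJP for y = exp(x): the input gradient is g * exp(x) = g * ans.
defvjp(np.exp, lambda g, ans, x: g * ans)

# VJPs for y = a * b: one per argument.
defvjp(np.multiply,
       lambda g, ans, a, b: g * b,   # gradient w.r.t. a
       lambda g, ans, a, b: g * a)   # gradient w.r.t. b

x = 1.5
ans = np.exp(x)
x_bar = vjps[np.exp][0](2.0, ans, x)   # output gradient g = 2.0
```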
Backward Pass
Recall that the backprop computations are more modular if we view
them as message passing.
This procedure can be implemented directly using the data structures
we’ve introduced.
Backward Pass
The backwards pass is defined in core.py.
The argument g is the error signal for the end node; for us this is always L̄ = 1.
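The message-passing procedure can be sketched as follows. This is a simplified stand-in for the real backward pass, with a pared-down Node that stores a VJP closure per parent; the names and the tiny traced example are assumptions:

```python
import numpy as np

class Node:
    """Graph node storing, for each parent, the VJP closure that maps this
    node's output gradient to a gradient message for that parent."""
    def __init__(self, value, parent_vjps=()):
        self.value = value
        self.parent_vjps = list(parent_vjps)   # pairs (parent, vjp)

def backward_pass(g, nodes):
    """Message passing in reverse topological order. nodes must be
    topologically ordered and end with the loss node; g is the error
    signal at that end node (for us, L_bar = 1)."""
    grads = {id(nodes[-1]): g}
    for node in reversed(nodes):
        out_grad = grads.get(id(node), 0.0)
        for parent, vjp in node.parent_vjps:
            # Each node's gradient is the sum of messages from its children.
            grads[id(parent)] = grads.get(id(parent), 0.0) + vjp(out_grad)
    return grads

# Hand-traced graph for L = exp(x)**2 at x = 1, where dL/dx = 2 e^2.
x = Node(1.0)
y = Node(np.exp(x.value), [(x, lambda g: g * np.exp(x.value))])
L = Node(y.value ** 2, [(y, lambda g: g * 2.0 * y.value)])
grads = backward_pass(1.0, [x, y, L])
x_grad = grads[id(x)]
```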
Backward Pass
grad (in differential_operators.py) is just a wrapper around make_vjp (in
core.py) which builds the computation graph and feeds it to backward_pass.
grad itself can be viewed as a VJP, if we treat L̄ as the 1 × 1 matrix with
entry 1:

w̄ = L̄ ∂L/∂w = ∂L/∂w
Recap
We saw three main parts to the code:
tracing the forward pass to build the computation graph
vector-Jacobian products for primitive ops
the backwards pass
Building the computation graph requires fancy NumPy gymnastics,
but the other two items are basically what I showed you.
You’re encouraged to read the full code (< 200 lines!) at:
https://github.com/mattjj/autodidact/tree/master/autograd
Differentiating through a Fluid Simulation
https://github.com/HIPS/autograd#end-to-end-examples
Gradient-Based Hyperparameter Optimization