Linear Algebra
Lecture slides for Chapter 2 of Deep Learning
Ian Goodfellow
2016-06-24
About this chapter
• Not a comprehensive survey of all of linear algebra
• Focused on the subset most relevant to deep
learning
• Larger subset: e.g., Linear Algebra by Georgi Shilov
Scalars
• A scalar is a single number
• Integers, real numbers, rational numbers, etc.
• We denote it with italic font:
a, n, x
Vectors
• A vector is a 1-D array of numbers:
• Can be real, binary, integer, etc.
• Example notation for type and size:
A vector is an array of numbers arranged in order. We can identify each individual number by its index in that ordering. Typically we give vectors lower case names written in bold typeface, such as x. The elements of the vector are identified by writing its name in italic typeface, with a subscript. The first element of x is x_1, the second element is x_2 and so on. We also need to say what kind of numbers are stored in the vector. If each element is in R, and the vector has n elements, then the vector lies in the set formed by taking the Cartesian product of R n times, denoted as R^n. When we need to explicitly identify the elements of a vector, we write them as a column enclosed in square brackets:

x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}. (2.1)

We can think of vectors as identifying points in space, with each element giving the coordinate along a different axis.

Sometimes we need to index a set of elements of a vector. In this case, we define a set containing the indices and write the set as a subscript. For example, to access x_1, x_3 and x_6, we define the set S = {1, 3, 6} and write x_S. We use the − sign to index the complement of a set. For example, x_{−1} is the vector containing all elements of x except for x_1, and x_{−S} is the vector containing all of the elements of x except for x_1, x_3 and x_6.
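The slides contain no code, but the indexing notation above maps directly onto NumPy. A minimal sketch (my own illustration; the array and its values are arbitrary, and NumPy indices are 0-based where the text is 1-based):

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0, 0.0, 2.5, -0.7])  # a vector in R^6

# Access the set of elements S = {1, 3, 6} (1-based in the text, 0-based here)
S = [0, 2, 5]
print(x[S])        # x_1, x_3, x_6  ->  [ 0.5  3.  -0.7]

# Complement indexing x_{-S}: all elements except those in S
mask = np.ones(len(x), dtype=bool)
mask[S] = False
print(x[mask])     # x_2, x_4, x_5
```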
Matrices
• A matrix is a 2-D array of numbers:
• Example notation for type and shape:
A matrix is a 2-D array of numbers, so each element is identified by two indices instead of just one (in A_{i,j}, the first subscript i indexes the row and the second subscript j indexes the column). We usually give matrices upper-case variable names with bold typeface, such as A. If a real-valued matrix A has a height of m and a width of n, we say that A ∈ R^{m×n}. We usually identify the elements of a matrix using its name in italic but not bold font. When we need to explicitly identify the elements of a matrix, we write them as an array enclosed in square brackets:

\begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix}. (2.2)

Sometimes we may need to index matrix-valued expressions that are not just a single letter. In this case, we use subscripts after the expression, but do not convert anything to lower case. For example, f(A)_{i,j} gives element (i, j) of the matrix computed by applying the function f to A.

Tensors: In some cases we will need an array with more than two axes. In the general case, an array of numbers arranged on a regular grid with a variable number of axes is known as a tensor. We denote a tensor named "A" with this typeface: A. We identify the element of A at coordinates (i, j, k) by writing A_{i,j,k}.
Tensors
• A tensor is an array of numbers that may have:
• zero dimensions, and be a scalar
• one dimension, and be a vector
• two dimensions, and be a matrix
• or more dimensions.
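As an illustrative aside (not in the original slides), the scalar/vector/matrix/tensor hierarchy corresponds to the number of axes of a NumPy array:

```python
import numpy as np

s = np.float32(2.5)        # 0 dimensions: a scalar
v = np.zeros(4)            # 1 dimension:  a vector
M = np.zeros((4, 3))       # 2 dimensions: a matrix
T = np.zeros((4, 3, 2))    # 3 dimensions: a tensor with three axes
print(T[1, 2, 0])          # element access in the style of A_{i,j,k}
```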
Matrix Transpose
A = \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \\ A_{3,1} & A_{3,2} \end{bmatrix}  ⇒  A^T = \begin{bmatrix} A_{1,1} & A_{2,1} & A_{3,1} \\ A_{1,2} & A_{2,2} & A_{3,2} \end{bmatrix}

Figure 2.1: The transpose of the matrix can be thought of as a mirror image across the main diagonal.

One important operation on matrices is the transpose. The transpose of a matrix is the mirror image of the matrix across a diagonal line, called the main diagonal, running down and to the right, starting from its upper left corner. See Figure 2.1 for a graphical depiction of this operation. We denote the transpose of a matrix A as A^T, and it is defined such that

(A^T)_{i,j} = A_{j,i}. (2.3)

Vectors can be thought of as matrices that contain only one column. The transpose of a vector is therefore a matrix with only one row.

Matrix operations have many useful properties that make mathematical analysis more convenient. For example, matrix multiplication is distributive and associative:

A(B + C) = AB + AC. (2.6)
A(BC) = (AB)C. (2.7)

Matrix multiplication is not commutative (the condition AB = BA does not always hold), unlike scalar multiplication. However, the dot product between two vectors is commutative:

x^T y = y^T x. (2.8)

The transpose of a matrix product has a simple form:

(AB)^T = B^T A^T. (2.9)
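A quick NumPy check of the transpose identities above (an added sketch, not from the slides; matrices are arbitrary):

```python
import numpy as np

A = np.arange(6.0).reshape(3, 2)   # a 3x2 matrix
B = np.arange(6.0).reshape(2, 3)   # a 2x3 matrix

# (A^T)_{i,j} = A_{j,i}  (Eq. 2.3)
assert A.T[1, 0] == A[0, 1]

# (AB)^T = B^T A^T  (Eq. 2.9)
assert np.allclose((A @ B).T, B.T @ A.T)

# The dot product is commutative: x^T y = y^T x  (Eq. 2.8)
x, y = np.array([1.0, 2.0]), np.array([3.0, 4.0])
assert x @ y == y @ x
```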
Matrix (Dot) Product
The matrix product of matrices A and B is a third matrix C. In order for this product to be defined, A must have the same number of columns as B has rows. If A is of shape m × n and B is of shape n × p, then C is of shape m × p. We write the matrix product just by placing two or more matrices together, e.g.

C = AB. (2.4)

The product operation is defined by

C_{i,j} = Σ_k A_{i,k} B_{k,j}. (2.5)

The standard product of two matrices is not just a matrix containing the product of the individual elements. Such an operation exists and is called the element-wise product or Hadamard product, and is denoted as A ⊙ B.

The dot product between two vectors x and y of the same dimensionality is the matrix product x^T y. We can think of the matrix product C = AB as computing C_{i,j} as the dot product between row i of A and column j of B.

(Slide diagram: an m × p matrix C equals an m × n matrix A times an n × p matrix B; the inner dimensions, both n, must match.)
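A short NumPy sketch (my addition) contrasting the matrix product with the Hadamard product, and checking Eq. 2.5 by explicit summation:

```python
import numpy as np

m, n, p = 2, 3, 4
A = np.random.randn(m, n)
B = np.random.randn(n, p)

C = A @ B                  # matrix product: shape (m, p); inner dims must match
assert C.shape == (m, p)

# Eq. 2.5 written out with an explicit sum over k
C_manual = np.array([[sum(A[i, k] * B[k, j] for k in range(n))
                      for j in range(p)] for i in range(m)])
assert np.allclose(C, C_manual)

# The Hadamard (element-wise) product is different and requires equal shapes
D = np.random.randn(m, n)
H = A * D                  # A ⊙ D
```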
Identity Matrix
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

Figure 2.2: Example identity matrix: this is I_3.

Linear algebra offers a powerful tool called matrix inversion, which allows us to analytically solve Ax = b for many values of A. To describe matrix inversion, we first need to define the concept of an identity matrix. An identity matrix is a matrix that does not change any vector when we multiply that vector by that matrix. We denote the identity matrix that preserves n-dimensional vectors as I_n. Formally, I_n ∈ R^{n×n}, and

∀x ∈ R^n, I_n x = x. (2.20)

The structure of the identity matrix is simple: all of the entries along the main diagonal are 1, while all of the other entries are zero. See Figure 2.2 for an example.
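A one-line NumPy illustration of Eq. 2.20 (my addition; the vector is arbitrary):

```python
import numpy as np

I3 = np.eye(3)                   # the identity matrix I_3 from Figure 2.2
x = np.array([2.0, -1.0, 0.5])
assert np.allclose(I3 @ x, x)    # I_n x = x for all x
```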
Systems of Equations
We omit a full list of useful properties of the matrix product here, but the reader should be aware that many more exist.

We now know enough linear algebra notation to write down a system of linear equations:

Ax = b (2.11)

where A ∈ R^{m×n} is a known matrix, b ∈ R^m is a known vector, and x ∈ R^n is a vector of unknown variables we would like to solve for. Each element x_i of x is one of these unknown variables. Each row of A and each element of b provide another constraint. We can rewrite Eq. 2.11 as:

A_{1,:} x = b_1 (2.12)
A_{2,:} x = b_2 (2.13)
... (2.14)
A_{m,:} x = b_m (2.15)

or, even more explicitly, as:

A_{1,1} x_1 + A_{1,2} x_2 + · · · + A_{1,n} x_n = b_1 (2.16)
A_{2,1} x_1 + A_{2,2} x_2 + · · · + A_{2,n} x_n = b_2 (2.17)
... (2.18)
A_{m,1} x_1 + A_{m,2} x_2 + · · · + A_{m,n} x_n = b_m. (2.19)

Matrix-vector product notation provides a more compact representation for equations of this form.
Solving Systems of Equations
• A linear system of equations can have:
• No solution
• Many solutions
• Exactly one solution: this means multiplication by
the matrix is an invertible function
Matrix Inversion
• Matrix inverse:
• Solving a system using an inverse:
• Numerically unstable, but useful for abstract
analysis
The structure of the identity matrix is simple: all of the entries along the main diagonal are 1, while all of the other entries are zero. See Fig. 2.2 for an example.

The matrix inverse of A is denoted as A^{-1}, and it is defined as the matrix such that

A^{-1} A = I_n. (2.21)

We can now solve Eq. 2.11 by the following steps:

Ax = b (2.22)
A^{-1} A x = A^{-1} b (2.23)
I_n x = A^{-1} b (2.24)
x = A^{-1} b. (2.25)
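A NumPy sketch of the two routes (my addition; the system is random). It also illustrates the slide's point about numerical practice: a solver is preferred over forming A^{-1} explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
b = rng.standard_normal(3)

# Abstractly: x = A^{-1} b  (Eqs. 2.22-2.25)
x_inv = np.linalg.inv(A) @ b

# In practice, prefer a solver; it avoids forming A^{-1} explicitly
x_solve = np.linalg.solve(A, b)

assert np.allclose(x_inv, x_solve)
assert np.allclose(A @ x_solve, b)   # the solution satisfies Ax = b
```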
Invertibility
• Matrix can’t be inverted if…
• More rows than columns
• More columns than rows
• Redundant rows/columns (“linearly dependent”,
“low rank”)
Norms
• Functions that measure how “large” a vector is
• Similar to a distance between zero and the point
represented by the vector
Formally, the L^p norm is given by

||x||_p = ( Σ_i |x_i|^p )^{1/p}

for p ∈ R, p ≥ 1.

Norms, including the L^p norm, are functions mapping vectors to non-negative values. On an intuitive level, the norm of a vector x measures the distance from the origin to the point x. More rigorously, a norm is any function f that satisfies the following properties:

• f(x) = 0 ⇒ x = 0
• f(x + y) ≤ f(x) + f(y) (the triangle inequality)
• ∀α ∈ R, f(αx) = |α| f(x)

The L^2 norm, with p = 2, is known as the Euclidean norm. It is simply the Euclidean distance from the origin to the point identified by x.
Norms
• L^p norm
• Most popular norm: L^2 norm, p = 2
• L^1 norm, p = 1
• Max norm, infinite p
||x||_p = ( Σ_i |x_i|^p )^{1/p}

The squared L^2 norm is sometimes undesirable because it increases very slowly near the origin. In several machine learning applications, it is important to discriminate between elements that are exactly zero and elements that are small but nonzero. In these cases, we turn to a function that grows at the same rate in all locations but retains mathematical simplicity: the L^1 norm. The L^1 norm may be simplified to

||x||_1 = Σ_i |x_i|. (2.31)

The L^1 norm is commonly used in machine learning when the difference between zero and nonzero elements is very important. Every time an element of x moves away from 0 by ε, the L^1 norm increases by ε.

We sometimes measure the size of the vector by counting its number of nonzero elements. Some authors refer to this function as the "L^0 norm," but this is incorrect terminology. The number of non-zero entries in a vector is not a norm, because scaling the vector by α does not change the number of nonzero entries. The L^1 norm is often used as a substitute for the number of nonzero entries.

One other norm that commonly arises in machine learning is the L^∞ norm, also known as the max norm. This norm simplifies to the absolute value of the element with the largest magnitude in the vector,

||x||_∞ = max_i |x_i|. (2.32)

Sometimes we may also wish to measure the size of a matrix. In the context of deep learning, the most common way to do this is with the otherwise obscure Frobenius norm,

||A||_F = ( Σ_{i,j} A_{i,j}^2 )^{1/2},

which is analogous to the L^2 norm of a vector.
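All of these norms are available through np.linalg.norm; a brief sketch (my addition, values arbitrary):

```python
import numpy as np

x = np.array([1.0, -2.0, 0.0, 3.0])

l2   = np.linalg.norm(x)           # L^2 (Euclidean) norm, the default
l1   = np.linalg.norm(x, 1)        # L^1 norm: sum of |x_i|  (Eq. 2.31)
linf = np.linalg.norm(x, np.inf)   # max norm: max_i |x_i|   (Eq. 2.32)
nnz  = np.count_nonzero(x)         # number of nonzeros ("L^0"): not a true norm

A = np.arange(6.0).reshape(2, 3)
fro = np.linalg.norm(A)            # Frobenius norm of a matrix
```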
Special Matrices and Vectors
• Unit vector:
• Symmetric matrix:
• Orthogonal matrix:
While we may write a machine learning algorithm in terms of arbitrary matrices, we can often obtain a less expensive (and less descriptive) algorithm by restricting some matrices to be diagonal. Not all diagonal matrices need be square; it is possible to construct a rectangular diagonal matrix. Non-square diagonal matrices do not have inverses, but it is still possible to multiply by them cheaply. For a non-square diagonal matrix D, the product Dx involves scaling each element of x, and either concatenating some zeros to the result if D is taller than it is wide, or discarding some of the last elements of the vector if D is wider than it is tall.

A symmetric matrix is any matrix that is equal to its own transpose:

A = A^T. (2.35)

Symmetric matrices often arise when the entries are generated by some function of two arguments that does not depend on the order of the arguments. For example, if A is a matrix of distance measurements, with A_{i,j} giving the distance from point i to point j, then A_{i,j} = A_{j,i} because distance functions are symmetric.

A unit vector is a vector with unit norm:

||x||_2 = 1. (2.36)

A vector x and a vector y are orthogonal to each other if x^T y = 0. If both vectors have nonzero norm, this means that they are at a 90 degree angle to each other. In R^n, at most n vectors may be mutually orthogonal with nonzero norm. If the vectors are not only orthogonal but also have unit norm, we call them orthonormal.

An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal:

A^T A = A A^T = I, (2.37)

which implies that

A^{-1} = A^T, (2.38)

so orthogonal matrices are of interest because their inverse is very cheap to compute.
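A NumPy sketch of these definitions (my addition; the rotation matrix is a standard example of an orthogonal matrix, not from the slides):

```python
import numpy as np

# Symmetric matrix: A = A^T  (Eq. 2.35)
A = np.array([[1.0, 2.0], [2.0, 5.0]])
assert np.allclose(A, A.T)

# Unit vector: ||x||_2 = 1  (Eq. 2.36)
x = np.array([3.0, 4.0])
u = x / np.linalg.norm(x)
assert np.isclose(np.linalg.norm(u), 1.0)

# Orthogonal matrix: A^T A = A A^T = I, so A^{-1} = A^T  (Eqs. 2.37-2.38)
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a 2-D rotation
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.allclose(np.linalg.inv(Q), Q.T)
```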
Eigendecomposition
• Eigenvector and eigenvalue:
• Eigendecomposition of a diagonalizable matrix:
• Every real symmetric matrix has a real, orthogonal
eigendecomposition:
One of the most widely used kinds of matrix decomposition is called eigendecomposition, in which we decompose a matrix into a set of eigenvectors and eigenvalues.

An eigenvector of a square matrix A is a non-zero vector v such that multiplication by A alters only the scale of v:

Av = λv. (2.39)

The scalar λ is known as the eigenvalue corresponding to this eigenvector. (One can also find a left eigenvector such that v^T A = λ v^T, but we are usually concerned with right eigenvectors.)

If v is an eigenvector of A, then so is any rescaled vector sv for s ∈ R, s ≠ 0, and sv has the same eigenvalue. For this reason, we usually only look for unit eigenvectors.

Suppose that a matrix A has n linearly independent eigenvectors. We may form a matrix V with one eigenvector per column, V = [v^{(1)}, ..., v^{(n)}], and concatenate the eigenvalues to form a vector λ = [λ_1, ..., λ_n]^T. The eigendecomposition of A is then given by

A = V diag(λ) V^{-1}. (2.40)

Constructing matrices with specific eigenvalues and eigenvectors allows us to stretch space in desired directions. However, we often want to decompose matrices into their eigenvalues and eigenvectors. Doing so can help us analyze certain properties of the matrix, much as decomposing an integer into its prime factors can help us understand the behavior of that integer.

Not every matrix can be decomposed into eigenvalues and eigenvectors, and in some cases the decomposition exists but involves complex rather than real numbers. In this book, we usually need to decompose only a specific class of matrices that have a simple decomposition. Specifically, every real symmetric matrix can be decomposed into an expression using only real-valued eigenvectors and eigenvalues:

A = Q Λ Q^T, (2.41)

where Q is an orthogonal matrix composed of eigenvectors of A, and Λ is a diagonal matrix. The eigenvalue Λ_{i,i} is associated with the eigenvector in column i of Q.
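A NumPy sketch of Eqs. 2.39-2.41 (my addition); np.linalg.eigh handles the symmetric case and returns real eigenvalues with orthonormal eigenvectors:

```python
import numpy as np

# A real symmetric matrix has a real, orthogonal eigendecomposition (Eq. 2.41)
A = np.array([[2.0, 1.0], [1.0, 2.0]])
lam, Q = np.linalg.eigh(A)       # eigenvalues, and eigenvectors as columns of Q

# Check A v = lambda v for each eigenpair (Eq. 2.39)
for i in range(len(lam)):
    assert np.allclose(A @ Q[:, i], lam[i] * Q[:, i])

# Reconstruct A = Q diag(lambda) Q^T
assert np.allclose(A, Q @ np.diag(lam) @ Q.T)
```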
Effect of Eigenvalues
(Figure: an example of the effect of eigenvectors and eigenvalues, for a matrix A with two orthonormal eigenvectors, v^{(1)} with eigenvalue λ_1 and v^{(2)} with eigenvalue λ_2. Left panel, "Before multiplication": the set of all unit vectors u ∈ R^2, plotted as a unit circle. Right panel, "After multiplication": the set of all points Au. By observing the way that A distorts the unit circle, we can see that it scales space in direction v^{(i)} by λ_i.)
Singular Value Decomposition
• Similar to eigendecomposition
• More general; matrix need not be square
The SVD is more generally applicable than the eigendecomposition. Every real matrix has a singular value decomposition, but the same is not true of the eigenvalue decomposition. For example, if a matrix is not square, the eigendecomposition is not defined, and we must use a singular value decomposition instead.

Recall that the eigendecomposition involves analyzing a matrix A to discover a matrix V of eigenvectors and a vector of eigenvalues λ such that we can rewrite A as

A = V diag(λ) V^{-1}. (2.42)

The singular value decomposition is similar, except this time we write A as a product of three matrices:

A = U D V^T. (2.43)

Suppose that A is an m × n matrix. Then U is defined to be an m × m matrix, D to be an m × n matrix, and V to be an n × n matrix.
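A NumPy sketch of Eq. 2.43 and the shapes involved (my addition; note np.linalg.svd returns the diagonal of D as a vector and V already transposed):

```python
import numpy as np

A = np.random.randn(4, 3)        # SVD works for non-square matrices too
U, s, Vt = np.linalg.svd(A)      # A = U D V^T  (Eq. 2.43)

assert U.shape == (4, 4) and Vt.shape == (3, 3)
D = np.zeros((4, 3))
D[:3, :3] = np.diag(s)           # s holds the diagonal of D
assert np.allclose(A, U @ D @ Vt)
```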
Moore-Penrose Pseudoinverse
• If the equation Ax = y has:
• Exactly one solution: this is the same as the inverse.
• No solution: this gives us the solution with the
smallest error
• Many solutions: this gives us the solution with the
smallest norm of x.
Practical algorithms for computing the pseudoinverse are not based on its limit definition, but rather on the formula

A^+ = V D^+ U^T, (2.47)

where U, D and V are the singular value decomposition of A, and the pseudoinverse D^+ of a diagonal matrix D is obtained by taking the reciprocal of its non-zero elements then taking the transpose of the resulting matrix.

When A has more columns than rows, solving a linear equation using the pseudoinverse provides one of the many possible solutions. Specifically, it provides the solution x = A^+ y with minimal Euclidean norm ||x||_2 among all possible solutions.

When A has more rows than columns, it is possible for there to be no solution. In this case, using the pseudoinverse gives us the x for which Ax is as close as possible to y in terms of Euclidean norm ||Ax − y||_2.
Computing the Pseudoinverse
The Moore-Penrose pseudoinverse allows us to make some headway in these cases. The pseudoinverse of A is defined as the matrix

A^+ = lim_{α ↘ 0} (A^T A + α I)^{-1} A^T. (2.46)

The SVD allows the computation of the pseudoinverse. Practical algorithms are not based on the definition above, but rather on the formula

A^+ = V D^+ U^T, (2.47)

where U, D and V are the singular value decomposition of A, and the pseudoinverse D^+ of the diagonal matrix D is obtained by taking the reciprocal of its non-zero entries, then taking the transpose of the resulting matrix.
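A NumPy sketch of Eq. 2.47 (my addition; it assumes A has full column rank so all singular values are nonzero, and compares against the library routine):

```python
import numpy as np

A = np.random.randn(5, 3)        # more rows than columns
y = np.random.randn(5)

# Library pseudoinverse: x minimizes ||Ax - y||_2
x = np.linalg.pinv(A) @ y

# Manual construction from the SVD: A+ = V D+ U^T  (Eq. 2.47)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T   # reciprocal of the nonzero singular values
assert np.allclose(A_pinv, np.linalg.pinv(A))
```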
Trace
The trace operator gives the sum of all of the diagonal entries of a matrix:

Tr(A) = Σ_i A_{i,i}. (2.48)

The trace operator is useful for a variety of reasons. Some operations that are difficult to specify without resorting to summation notation can be specified using matrix products and the trace operator, and we can manipulate the resulting expression using many useful identities. For example, the trace operator is invariant to the transpose operator:

Tr(A) = Tr(A^T). (2.50)

The trace of a square matrix composed of many factors is also invariant to moving the last factor into the first position, if the shapes of the corresponding matrices allow the resulting product to be defined:

Tr(ABC) = Tr(CAB) = Tr(BCA), (2.51)

or, more generally,

Tr( Π_{i=1}^{n} F^{(i)} ) = Tr( F^{(n)} Π_{i=1}^{n-1} F^{(i)} ). (2.52)

This invariance to cyclic permutation holds even if the resulting product has a different shape. For example, for A ∈ R^{m×n} and B ∈ R^{n×m}, we have Tr(AB) = Tr(BA).
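A short NumPy check of the trace identities (my addition; matrices are random):

```python
import numpy as np

A = np.random.randn(3, 4)
B = np.random.randn(4, 3)

# Tr(A) = sum of diagonal entries (Eq. 2.48); invariant to transpose (Eq. 2.50)
M = A @ B
assert np.isclose(np.trace(M), np.trace(M.T))

# Cyclic permutation holds even when the product changes shape:
# AB is 3x3 while BA is 4x4, yet the traces agree
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```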
Learning linear algebra
• Do a lot of practice problems
• Start out with lots of summation signs and indexing
into individual entries
• Eventually you will be able to mostly use matrix
and vector product notation quickly and easily