
# Linear Algebra for Machine Learning: Linear Systems


The seminar series focuses on the mathematical background needed for machine learning. The first set of seminars is on "Linear Algebra for Machine Learning". These are the slides of the first part, which gives a short overview of matrices and discusses linear systems.

Published in: Education


1. **Seminar Series on Linear Algebra for Machine Learning. Part 1: Linear Systems.**
    Dr. Ceni Babaoglu, Ryerson University, cenibabaoglu.com
2. **Overview**
    1. Matrices and Matrix Operations
    2. Special Types of Matrices
    3. Inverse of a Matrix
    4. Determinant of a Matrix
    5. A Statistical Application: Correlation Coefficient
    6. Matrix Transformations
    7. Systems of Linear Equations
    8. Linear Systems and Inverses
    9. References
3. **Matrices**
    An $m \times n$ matrix:
    $$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \dots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \dots & a_{mn} \end{bmatrix} = [a_{ij}]$$
    The $i$th row of $A$ is $\begin{bmatrix} a_{i1} & a_{i2} & a_{i3} & \dots & a_{in} \end{bmatrix}$, $1 \le i \le m$.
    The $j$th column of $A$ is $\begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}$, $1 \le j \le n$.
4. **Matrix Operations**
    *Matrix addition:* $C = A + B = [a_{ij}] + [b_{ij}]$, where $c_{ij} = a_{ij} + b_{ij}$ for $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, n$.
    *Scalar multiplication:* $C = rA = r[a_{ij}]$, where $c_{ij} = r\,a_{ij}$.
    *Transpose of a matrix:* $A^T = [a^T_{ij}]$, where $a^T_{ij} = a_{ji}$.
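The three entrywise rules above can be checked numerically. A minimal sketch using NumPy (our choice of tool here, not part of the slides; the 2 × 2 matrices are hypothetical examples):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

assert np.array_equal(A + B, [[6, 8], [10, 12]])   # c_ij = a_ij + b_ij
assert np.array_equal(3 * A, [[3, 6], [9, 12]])    # c_ij = r * a_ij
assert np.array_equal(A.T, [[1, 3], [2, 4]])       # (A^T)_ij = a_ji
```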
5. **Special Types of Matrices**
    *Diagonal matrix:* an $n \times n$ matrix $A = [a_{ij}]$ is called a diagonal matrix if $a_{ij} = 0$ for $i \ne j$, e.g.
    $$\begin{bmatrix} a & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{bmatrix}$$
    *Identity matrix:* the scalar matrix $I_n = [d_{ij}]$, where $d_{ii} = 1$ and $d_{ij} = 0$ for $i \ne j$, is called the $n \times n$ identity matrix:
    $$I_n = \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{bmatrix}$$
6. **Special Types of Matrices**
    *Upper triangular matrix:* an $n \times n$ matrix $A = [a_{ij}]$ is called upper triangular if $a_{ij} = 0$ for $i > j$:
    $$\begin{bmatrix} 2 & b & c \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
    *Lower triangular matrix:* an $n \times n$ matrix $A = [a_{ij}]$ is called lower triangular if $a_{ij} = 0$ for $i < j$:
    $$\begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ a & b & 1 \end{bmatrix}$$
7. **Special Types of Matrices**
    *Symmetric matrix:* a matrix $A$ with real entries is called symmetric if $A^T = A$.
    $$\begin{bmatrix} 1 & b & c \\ b & 2 & d \\ c & d & 3 \end{bmatrix}$$
    *Skew-symmetric matrix:* a matrix $A$ with real entries is called skew symmetric if $A^T = -A$.
    $$\begin{bmatrix} 0 & b & -c \\ -b & 0 & -d \\ c & d & 0 \end{bmatrix}$$
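Each special type above amounts to a zero pattern or a symmetry condition, so each can be written as a one-line predicate. A NumPy sketch (the helper names are ours, not from the slides):

```python
import numpy as np

def is_diagonal(A):
    # a_ij = 0 whenever i != j
    return np.array_equal(A, np.diag(np.diag(A)))

def is_upper_triangular(A):
    # all entries below the main diagonal are zero
    return np.array_equal(A, np.triu(A))

def is_symmetric(A):
    return np.array_equal(A.T, A)

def is_skew_symmetric(A):
    return np.array_equal(A.T, -A)

assert is_diagonal(np.diag([4, 1, 1]))
assert is_upper_triangular(np.array([[2, 5, 7], [0, 3, 0], [0, 0, 1]]))
assert is_symmetric(np.array([[1, 4, 5], [4, 2, 6], [5, 6, 3]]))
assert is_skew_symmetric(np.array([[0, 2, -3], [-2, 0, -4], [3, 4, 0]]))
```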
8. **Matrix Operations**
    *Inner product:*
    $$a \cdot b = a_1 b_1 + a_2 b_2 + \dots + a_n b_n = \sum_{i=1}^{n} a_i b_i$$
    *Matrix multiplication* of an $m \times p$ matrix and a $p \times n$ matrix:
    $$c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \dots + a_{ip} b_{pj} = \sum_{k=1}^{p} a_{ik} b_{kj}, \quad 1 \le i \le m, \; 1 \le j \le n.$$
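Both formulas can be verified on small hypothetical examples; note that each entry $c_{ij}$ of a matrix product is itself an inner product of row $i$ of $A$ with column $j$ of $B$:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
assert a @ b == 1*4 + 2*5 + 3*6        # inner product: 32

A = np.array([[1, 2], [3, 4]])         # 2 x 2
B = np.array([[5, 6], [7, 8]])         # 2 x 2
C = A @ B
# c_11 = a_11 b_11 + a_12 b_21 = 1*5 + 2*7 = 19, etc.
assert np.array_equal(C, [[19, 22], [43, 50]])
assert C[0, 1] == A[0, :] @ B[:, 1]    # entry = row-column inner product
```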
9. **Algebraic Properties of Matrix Operations**
    Let $A$, $B$ and $C$ be matrices of appropriate sizes, and let $r$ and $s$ be real numbers.
    - $A + B$ is a matrix of the same dimensions as $A$ and $B$.
    - $A + B = B + A$
    - $A + (B + C) = (A + B) + C$
    - There is a unique zero matrix $O$ such that $A + O = A$ for any matrix $A$.
    - For each $A$, there is a unique matrix $-A$ such that $A + (-A) = O$.
    - $A(BC) = (AB)C$
    - $(A + B)C = AC + BC$
    - $C(A + B) = CA + CB$
    - $r(sA) = (rs)A$
    - $(r + s)A = rA + sA$
    - $r(A + B) = rA + rB$
    - $A(rB) = r(AB) = (rA)B$
10. **Inverse of a Matrix**
    *Nonsingular matrices:* an $n \times n$ matrix $A$ is called nonsingular, or invertible, if there exists an $n \times n$ matrix $B$ such that $AB = BA = I_n$.
    *Inverse matrix:* such a $B$ is called an inverse of $A$. If no such $B$ exists, $A$ is called singular, or noninvertible. The inverse of a matrix, if it exists, is unique, and $AA^{-1} = A^{-1}A = I_n$:
    $$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
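The slide's example inverse can be confirmed numerically:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
A_inv = np.linalg.inv(A)
assert np.allclose(A_inv, [[-2.0, 1.0], [1.5, -0.5]])
assert np.allclose(A @ A_inv, np.eye(2))   # A A^{-1} = I_2
assert np.allclose(A_inv @ A, np.eye(2))   # A^{-1} A = I_2
```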
11. **Determinant of a Matrix**
    Associated with every square matrix $A$ is a number called the determinant, denoted by $\det(A)$. For $2 \times 2$ matrices, the determinant is defined as
    $$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad \det(A) = ad - bc.$$
    Example:
    $$A = \begin{bmatrix} 2 & 1 \\ -4 & -2 \end{bmatrix}, \quad \det(A) = (2)(-2) - (1)(-4) = 0.$$
12. **Properties of Determinants**
    1. If $I$ is the identity, then $\det(I) = 1$.
    2. If $B$ is obtained from $A$ by interchanging two rows, then $\det(B) = -\det(A)$.
    3. If $B$ is obtained from $A$ by adding a multiple of one row of $A$ to another row, then $\det(B) = \det(A)$.
    4. If $B$ is obtained from $A$ by multiplying a row of $A$ by the number $m$, then $\det(B) = m \det(A)$.
    5. The determinant of an upper (or lower) triangular matrix is equal to the product of its diagonal entries.
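Properties 2 to 4 can be spot-checked on a random matrix (a sketch; the specific matrix and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

B = A[[1, 0, 2]]                   # interchange rows 1 and 2
assert np.isclose(np.linalg.det(B), -np.linalg.det(A))    # property 2

C = A.copy()
C[2] += 5 * C[0]                   # add a multiple of row 1 to row 3
assert np.isclose(np.linalg.det(C), np.linalg.det(A))     # property 3

D = A.copy()
D[1] *= 4                          # multiply a row by m = 4
assert np.isclose(np.linalg.det(D), 4 * np.linalg.det(A)) # property 4
```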
13. **Determinant of an $n \times n$ Matrix**
    *Minor:* suppose that in an $n \times n$ matrix $A$ we delete the $i$th row and $j$th column to obtain an $(n-1) \times (n-1)$ matrix. The determinant of this sub-matrix is called the $(i,j)$th minor of $A$ and is denoted by $M_{ij}$.
    *Cofactor:* the number $(-1)^{i+j} M_{ij}$ is called the $(i,j)$th cofactor of $A$ and is denoted by $C_{ij}$.
    *Determinant:* let $A$ be an $n \times n$ matrix. Then $\det(A)$ can be evaluated by expanding by cofactors along any row or any column:
    $$\det(A) = a_{i1} C_{i1} + a_{i2} C_{i2} + \dots + a_{in} C_{in}, \quad 1 \le i \le n,$$
    or
    $$\det(A) = a_{1j} C_{1j} + a_{2j} C_{2j} + \dots + a_{nj} C_{nj}, \quad 1 \le j \le n.$$
14. **Example**
    Let's find the determinant of the following matrix.
    $$A = \begin{bmatrix} 2 & -3 & 1 \\ 4 & 0 & -2 \\ 3 & -1 & -3 \end{bmatrix}$$
    If we expand by cofactors along the first row:
    $$|A| = (2)C_{11} + (-3)C_{12} + (1)C_{13} = 2(-1)^{1+1}\begin{vmatrix} 0 & -2 \\ -1 & -3 \end{vmatrix} - 3(-1)^{1+2}\begin{vmatrix} 4 & -2 \\ 3 & -3 \end{vmatrix} + 1(-1)^{1+3}\begin{vmatrix} 4 & 0 \\ 3 & -1 \end{vmatrix}$$
    $$= 2(-2) + 3(-6) + (-4) = -26.$$
    If we expand along the third column, we obtain
    $$|A| = (1)C_{13} + (-2)C_{23} + (-3)C_{33} = 1(-1)^{1+3}\begin{vmatrix} 4 & 0 \\ 3 & -1 \end{vmatrix} - 2(-1)^{2+3}\begin{vmatrix} 2 & -3 \\ 3 & -1 \end{vmatrix} - 3(-1)^{3+3}\begin{vmatrix} 2 & -3 \\ 4 & 0 \end{vmatrix} = -26.$$
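The cofactor expansion translates directly into a recursive function. This sketch mirrors the definition (it is exponential-time, so only suitable for small matrices; `numpy.linalg.det` is the practical choice):

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # delete row 0 and column j to get the minor M_0j
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = np.array([[2.0, -3.0, 1.0],
              [4.0, 0.0, -2.0],
              [3.0, -1.0, -3.0]])
assert np.isclose(det_cofactor(A), -26.0)        # matches the slide
assert np.isclose(np.linalg.det(A), -26.0)
```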
15. **Angle Between Two Vectors**
    The length of an $n$-vector $v = (v_1, v_2, \dots, v_{n-1}, v_n)^T$ is defined as
    $$\|v\| = \sqrt{v_1^2 + v_2^2 + \dots + v_{n-1}^2 + v_n^2}.$$
    The angle $\theta$ between two nonzero vectors $u$ and $v$ is determined by
    $$\cos(\theta) = \frac{u \cdot v}{\|u\|\,\|v\|}, \quad -1 \le \frac{u \cdot v}{\|u\|\,\|v\|} \le 1, \quad 0 \le \theta \le \pi.$$
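A quick numerical illustration with a hypothetical pair of 2-vectors whose angle is known to be 45 degrees:

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])
cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
theta = np.arccos(cos_theta)
assert np.isclose(np.linalg.norm(v), np.sqrt(2.0))  # length formula
assert np.isclose(theta, np.pi / 4)                 # 45-degree angle
```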
16. **A Statistical Application: Correlation Coefficient**
    Sample means of two attributes:
    $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
    Centered form:
    $$x_c = [x_1 - \bar{x} \;\; x_2 - \bar{x} \;\; \cdots \;\; x_n - \bar{x}]^T, \quad y_c = [y_1 - \bar{y} \;\; y_2 - \bar{y} \;\; \cdots \;\; y_n - \bar{y}]^T$$
    Correlation coefficient:
    $$\mathrm{Cor}(x_c, y_c) = \frac{x_c \cdot y_c}{\|x_c\|\,\|y_c\|}, \quad r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
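The correlation coefficient is exactly the cosine of the angle between the centered vectors, which this sketch checks against NumPy's built-in `corrcoef` (the data vectors are hypothetical):

```python
import numpy as np

def correlation(x, y):
    """Correlation as the cosine between the centered vectors."""
    xc = x - x.mean()
    yc = y - y.mean()
    return (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])   # y = 2x: perfectly correlated
assert np.isclose(correlation(x, y), 1.0)
assert np.isclose(correlation(x, -y), -1.0)
assert np.isclose(correlation(x, y), np.corrcoef(x, y)[0, 1])
```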
17. **Linear Algebra vs Data Science**
    1. Length of a vector ↔ variability of a variable.
    2. Angle between two vectors is small ↔ the two variables are highly positively correlated.
    3. Angle between two vectors is near $\pi$ ↔ the two variables are highly negatively correlated.
    4. Angle between two vectors is near $\pi/2$ ↔ the two variables are uncorrelated.
18. **Matrix Transformations**
    If $A$ is an $m \times n$ matrix and $u$ is an $n$-vector, then the matrix product $Au$ is an $m$-vector. A function $f$ mapping $\mathbb{R}^n$ into $\mathbb{R}^m$ is denoted by $f : \mathbb{R}^n \to \mathbb{R}^m$. A matrix transformation is a function $f : \mathbb{R}^n \to \mathbb{R}^m$ defined by $f(u) = Au$.
19. **Example**
    Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be the matrix transformation defined by
    $$f(u) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} u, \quad f\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ -y \end{bmatrix}.$$
    This transformation performs a reflection with respect to the $x$-axis in $\mathbb{R}^2$. To see the reflection of a point, say $(2, -3)$:
    $$\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 2 \\ -3 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$$
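The reflection example as a two-line computation:

```python
import numpy as np

A = np.array([[1, 0], [0, -1]])        # reflection about the x-axis
u = np.array([2, -3])                  # the slide's example point
assert np.array_equal(A @ u, [2, 3])   # (2, -3) maps to (2, 3)
assert np.array_equal(A @ (A @ u), u)  # reflecting twice recovers the point
```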
20. **Systems of Linear Equations**
    A linear equation in variables $x_1, x_2, \dots, x_n$ is an equation of the form
    $$a_1 x_1 + a_2 x_2 + \dots + a_n x_n = b.$$
    A collection of such equations is called a linear system:
    $$\begin{aligned} a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n &= b_1 \\ a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n &= b_2 \\ &\;\;\vdots \\ a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n &= b_m \end{aligned}$$
21. **Systems of Linear Equations**
    The system above can be written compactly as $Ax = b$, with augmented matrix
    $$\left[\begin{array}{cccc|c} a_{11} & a_{12} & \dots & a_{1n} & b_1 \\ a_{21} & a_{22} & \dots & a_{2n} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} & b_m \end{array}\right]$$
    If $b_1 = b_2 = \dots = b_m = 0$, the system is called homogeneous: $Ax = 0$.
22. **Linear Systems and Inverses**
    If $A$ is an $n \times n$ matrix, then the linear system $Ax = b$ is a system of $n$ equations in $n$ unknowns. Suppose that $A$ is nonsingular. Then
    $$\begin{aligned} Ax &= b \\ A^{-1}(Ax) &= A^{-1}b \\ (A^{-1}A)x &= A^{-1}b \\ I_n x &= A^{-1}b \\ x &= A^{-1}b \end{aligned}$$
    so $x = A^{-1}b$ is the unique solution of the linear system.
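The derivation above can be reproduced numerically (the 2 × 2 system here is a hypothetical example):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])   # nonsingular
b = np.array([5.0, 6.0])
x = np.linalg.inv(A) @ b                 # x = A^{-1} b
assert np.allclose(A @ x, b)             # x solves the system
# In practice np.linalg.solve is preferred: it factors A rather than
# forming the inverse explicitly, which is faster and more accurate.
assert np.allclose(x, np.linalg.solve(A, b))
```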
23. **Solving Linear Systems**
    A matrix is in echelon form if:
    1. All zero rows, if there are any, appear at the bottom of the matrix.
    2. The first nonzero entry from the left of a nonzero row is a 1. This entry is called a leading one of its row.
    3. For each nonzero row, the leading one appears to the right and below any leading ones in preceding rows.
    4. If a column contains a leading one, then all other entries in that column are zero.

    Examples:
    $$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 3 \\ 0 & 1 & 0 & 0 & 5 & 2 \\ 0 & 0 & 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \begin{bmatrix} 1 & 2 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
24. **Solving Linear Systems**
    An elementary row operation on a matrix is one of the following:
    1. interchange two rows,
    2. add a multiple of one row to another, and
    3. multiply one row by a nonzero constant.

    Two matrices are row equivalent if one can be converted into the other through a series of elementary row operations. Every matrix is row equivalent to a matrix in echelon form.
25. **Solving Linear Systems**
    If an augmented matrix is in echelon form, then the first nonzero entry of each row is a pivot. The variables corresponding to the pivots are called pivot variables, and the other variables are called free variables. A matrix is in reduced echelon form if all pivot entries are 1 and all entries above and below the pivots are 0.
    A system of linear equations with more unknowns than equations will either fail to have any solutions or will have an infinite number of solutions.
26. **Example**
    Let's solve the following system.
    $$\begin{aligned} x_1 - 3x_2 + x_3 &= 1 \\ 2x_1 + x_2 - x_3 &= 2 \\ 4x_1 + 4x_2 - 2x_3 &= 1 \\ 5x_1 - 8x_2 + 2x_3 &= 5 \end{aligned}$$
    $$\left[\begin{array}{ccc|c} 1 & -3 & 1 & 1 \\ 2 & 1 & -1 & 2 \\ 4 & 4 & -2 & 1 \\ 5 & -8 & 2 & 5 \end{array}\right] \xrightarrow{\substack{R_2 - 2R_1 \to R_2 \\ R_3 - 4R_1 \to R_3 \\ R_4 - 5R_1 \to R_4}} \left[\begin{array}{ccc|c} 1 & -3 & 1 & 1 \\ 0 & 7 & -3 & 0 \\ 0 & 16 & -6 & -3 \\ 0 & 7 & -3 & 0 \end{array}\right] \xrightarrow{R_2/7 \to R_2} \left[\begin{array}{ccc|c} 1 & -3 & 1 & 1 \\ 0 & 1 & -3/7 & 0 \\ 0 & 16 & -6 & -3 \\ 0 & 7 & -3 & 0 \end{array}\right]$$
    $$\xrightarrow{\substack{R_1 + 3R_2 \to R_1 \\ R_3 - 16R_2 \to R_3 \\ R_4 - 7R_2 \to R_4}} \left[\begin{array}{ccc|c} 1 & 0 & -2/7 & 1 \\ 0 & 1 & -3/7 & 0 \\ 0 & 0 & 6/7 & -3 \\ 0 & 0 & 0 & 0 \end{array}\right] \xrightarrow{7R_3/6 \to R_3} \left[\begin{array}{ccc|c} 1 & 0 & -2/7 & 1 \\ 0 & 1 & -3/7 & 0 \\ 0 & 0 & 1 & -7/2 \\ 0 & 0 & 0 & 0 \end{array}\right] \xrightarrow{\substack{R_1 + 2R_3/7 \to R_1 \\ R_2 + 3R_3/7 \to R_2}} \left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -3/2 \\ 0 & 0 & 1 & -7/2 \\ 0 & 0 & 0 & 0 \end{array}\right]$$
    $$\Leftrightarrow \quad x_1 = 0, \quad x_2 = -3/2, \quad x_3 = -7/2.$$
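The solution read off from the reduction can be checked by substituting it back into all four original equations:

```python
import numpy as np

# Coefficients and right-hand side of the example system.
A = np.array([[1.0, -3.0, 1.0],
              [2.0, 1.0, -1.0],
              [4.0, 4.0, -2.0],
              [5.0, -8.0, 2.0]])
b = np.array([1.0, 2.0, 1.0, 5.0])

x = np.array([0.0, -1.5, -3.5])      # x1 = 0, x2 = -3/2, x3 = -7/2
assert np.allclose(A @ x, b)          # all four equations are satisfied
```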
27. **Example**
    Let's solve the following homogeneous system.
    $$\begin{aligned} 2x_1 + 4x_2 + 3x_3 + 3x_4 + 3x_5 &= 0 \\ x_1 + 2x_2 + x_3 + 2x_4 + x_5 &= 0 \\ x_1 + 2x_2 + 2x_3 + x_4 + 2x_5 &= 0 \\ x_3 - x_4 - x_5 &= 0 \end{aligned}$$
    $$\left[\begin{array}{ccccc|c} 2 & 4 & 3 & 3 & 3 & 0 \\ 1 & 2 & 1 & 2 & 1 & 0 \\ 1 & 2 & 2 & 1 & 2 & 0 \\ 0 & 0 & 1 & -1 & -1 & 0 \end{array}\right] \xrightarrow{R_1 \leftrightarrow R_2} \left[\begin{array}{ccccc|c} 1 & 2 & 1 & 2 & 1 & 0 \\ 2 & 4 & 3 & 3 & 3 & 0 \\ 1 & 2 & 2 & 1 & 2 & 0 \\ 0 & 0 & 1 & -1 & -1 & 0 \end{array}\right] \xrightarrow{\substack{R_2 - 2R_1 \to R_2 \\ R_3 - R_1 \to R_3}} \left[\begin{array}{ccccc|c} 1 & 2 & 1 & 2 & 1 & 0 \\ 0 & 0 & 1 & -1 & 1 & 0 \\ 0 & 0 & 1 & -1 & 1 & 0 \\ 0 & 0 & 1 & -1 & -1 & 0 \end{array}\right]$$
    $$\xrightarrow{\substack{R_3 - R_2 \to R_3 \\ R_4 - R_2 \to R_4}} \left[\begin{array}{ccccc|c} 1 & 2 & 1 & 2 & 1 & 0 \\ 0 & 0 & 1 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -2 & 0 \end{array}\right] \xrightarrow{R_3 \leftrightarrow R_4} \xrightarrow{-R_3/2 \to R_3} \left[\begin{array}{ccccc|c} 1 & 2 & 1 & 2 & 1 & 0 \\ 0 & 0 & 1 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
    The reduced system reads $x_1 + 2x_2 + x_3 + 2x_4 + x_5 = 0$, $x_3 - x_4 + x_5 = 0$ and $x_5 = 0$. With free variables $x_2 = \alpha$ and $x_4 = \beta$, we get $x_5 = 0$, $x_3 = \beta$, and $x_1 = -2\alpha - \beta - 2\beta = -2\alpha - 3\beta$.
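The two-parameter family of solutions can be verified by checking that it satisfies $Ax = 0$ for arbitrary values of the free variables:

```python
import numpy as np

A = np.array([[2.0, 4.0, 3.0, 3.0, 3.0],
              [1.0, 2.0, 1.0, 2.0, 1.0],
              [1.0, 2.0, 2.0, 1.0, 2.0],
              [0.0, 0.0, 1.0, -1.0, -1.0]])

def solution(alpha, beta):
    # Parametric solution found above: x2 = alpha and x4 = beta are free.
    return np.array([-2*alpha - 3*beta, alpha, beta, beta, 0.0])

# Every choice of the free variables gives a solution of Ax = 0.
for a, b in [(1.0, 0.0), (0.0, 1.0), (2.5, -3.0)]:
    assert np.allclose(A @ solution(a, b), 0.0)
```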
28. **Example**
    Let's use elementary row operations to find $A^{-1}$ if
    $$A = \begin{bmatrix} 4 & 3 & 2 \\ 5 & 6 & 3 \\ 3 & 5 & 2 \end{bmatrix}.$$
    $$\left[\begin{array}{ccc|ccc} 4 & 3 & 2 & 1 & 0 & 0 \\ 5 & 6 & 3 & 0 & 1 & 0 \\ 3 & 5 & 2 & 0 & 0 & 1 \end{array}\right] \xrightarrow{R_1 - R_3 \to R_1} \left[\begin{array}{ccc|ccc} 1 & -2 & 0 & 1 & 0 & -1 \\ 5 & 6 & 3 & 0 & 1 & 0 \\ 3 & 5 & 2 & 0 & 0 & 1 \end{array}\right] \xrightarrow{\substack{R_2 - 5R_1 \to R_2 \\ R_3 - 3R_1 \to R_3}} \left[\begin{array}{ccc|ccc} 1 & -2 & 0 & 1 & 0 & -1 \\ 0 & 16 & 3 & -5 & 1 & 5 \\ 0 & 11 & 2 & -3 & 0 & 4 \end{array}\right]$$
    $$\xrightarrow{R_2/16 \to R_2} \left[\begin{array}{ccc|ccc} 1 & -2 & 0 & 1 & 0 & -1 \\ 0 & 1 & 3/16 & -5/16 & 1/16 & 5/16 \\ 0 & 11 & 2 & -3 & 0 & 4 \end{array}\right] \xrightarrow{\substack{R_1 + 2R_2 \to R_1 \\ R_3 - 11R_2 \to R_3}} \left[\begin{array}{ccc|ccc} 1 & 0 & 3/8 & 3/8 & 1/8 & -3/8 \\ 0 & 1 & 3/16 & -5/16 & 1/16 & 5/16 \\ 0 & 0 & -1/16 & 7/16 & -11/16 & 9/16 \end{array}\right]$$
    $$\xrightarrow{\substack{R_1 + 6R_3 \to R_1 \\ R_2 + 3R_3 \to R_2}} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 3 & -4 & 3 \\ 0 & 1 & 0 & 1 & -2 & 2 \\ 0 & 0 & -1/16 & 7/16 & -11/16 & 9/16 \end{array}\right] \xrightarrow{-16R_3 \to R_3} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 3 & -4 & 3 \\ 0 & 1 & 0 & 1 & -2 & 2 \\ 0 & 0 & 1 & -7 & 11 & -9 \end{array}\right]$$
    $$A^{-1} = \begin{bmatrix} 3 & -4 & 3 \\ 1 & -2 & 2 \\ -7 & 11 & -9 \end{bmatrix}$$
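The inverse produced by the Gauss-Jordan reduction can be confirmed with a direct multiplication:

```python
import numpy as np

A = np.array([[4.0, 3.0, 2.0],
              [5.0, 6.0, 3.0],
              [3.0, 5.0, 2.0]])
A_inv = np.array([[3.0, -4.0, 3.0],
                  [1.0, -2.0, 2.0],
                  [-7.0, 11.0, -9.0]])   # result of the row reduction
assert np.allclose(A @ A_inv, np.eye(3))
assert np.allclose(A_inv @ A, np.eye(3))
assert np.allclose(np.linalg.inv(A), A_inv)
```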
29. **References**
    - Steven J. Leon, *Linear Algebra with Applications*, 7th Edition.
    - Bernard Kolman and David Hill, *Elementary Linear Algebra with Applications*, 9th Edition.