Like this document? Why not share!

# Riyazi mohandesi jeffrey(www.mtka.lxb.ir)

## on Nov 22, 2013

• 915 views

ریاضی مهندسی پیشرفته جفری

ریاضی مهندسی پیشرفته جفری

### Views

Total Views
915
Views on SlideShare
915
Embed Views
0

Likes
1
164
0

No embeds

### Report content

• Comment goes here.
Are you sure you want to

## Riyazi mohandesi jeffrey(www.mtka.lxb.ir)Document Transcript

• Alan Jeffrey University of Newcastle-upon-Tyne San Diego San Francisco New York Boston London Toronto Sydney Tokyo
• To Lisl and our family
• C O N T E N T S Preface PART ONE CHAPTER 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 1.10 1.11 1.12 1.13 1.14 xv REVIEW MATERIAL 1 Review of Prerequisites 3 Real Numbers, Mathematical Induction, and Mathematical Conventions 4 Complex Numbers 10 The Complex Plane 15 Modulus and Argument Representation of Complex Numbers 18 Roots of Complex Numbers 22 Partial Fractions 27 Fundamentals of Determinants 31 Continuity in One or More Variables 35 Differentiability of Functions of One or More Variables 38 Tangent Line and Tangent Plane Approximations to Functions 40 Integrals 41 Taylor and Maclaurin Theorems 43 Cylindrical and Spherical Polar Coordinates and Change of Variables in Partial Differentiation 46 Inverse Functions and the Inverse Function Theorem 49 vii
• PART TWO CHAPTER 2 2.1 2.2 2.3 2.4 VECTORS AND MATRICES 53 Vectors and Vector Spaces 55 2.5 2.6 2.7 CHAPTER Vectors, Geometry, and Algebra 56 The Dot Product (Scalar Product) 70 The Cross Product (Vector Product) 77 Linear Dependence and Independence of Vectors and Triple Products 82 n -Vectors and the Vector Space R n 88 Linear Independence, Basis, and Dimension 95 Gram–Schmidt Orthogonalization Process 101 3 Matrices and Systems of Linear Equations 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.10 CHAPTER 4 4.1 4.2 4.3 4.4 4.5 viii 105 Matrices 106 Some Problems That Give Rise to Matrices 120 Determinants 133 Elementary Row Operations, Elementary Matrices, and Their Connection with Matrix Multiplication 143 The Echelon and Row-Reduced Echelon Forms of a Matrix 147 Row and Column Spaces and Rank 152 The Solution of Homogeneous Systems of Linear Equations 155 The Solution of Nonhomogeneous Systems of Linear Equations 158 The Inverse Matrix 163 Derivative of a Matrix 171 Eigenvalues, Eigenvectors, and Diagonalization Characteristic Polynomial, Eigenvalues, and Eigenvectors 178 Diagonalization of Matrices 196 Special Matrices with Complex Elements Quadratic Forms 210 The Matrix Exponential 215 205 177
• PART THREE CHAPTER 5 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 5.10 CHAPTER 6 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 6.10 6.11 6.12 ORDINARY DIFFERENTIAL EQUATIONS 225 First Order Differential Equations 227 Background to Ordinary Differential Equations Some Problems Leading to Ordinary Differential Equations 233 Direction Fields 240 Separable Equations 242 Homogeneous Equations 247 Exact Equations 250 Linear First Order Equations 253 The Bernoulli Equation 259 The Riccati Equation 262 Existence and Uniqueness of Solutions 264 228 Second and Higher Order Linear Differential Equations and Systems 269 Homogeneous Linear Constant Coefﬁcient Second Order Equations 270 Oscillatory Solutions 280 Homogeneous Linear Higher Order Constant Coefﬁcient Equations 291 Undetermined Coefﬁcients: Particular Integrals 302 Cauchy–Euler Equation 309 Variation of Parameters and the Green’s Function 311 Finding a Second Linearly Independent Solution from a Known Solution: The Reduction of Order Method 321 Reduction to the Standard Form u + f (x)u = 0 324 Systems of Ordinary Differential Equations: An Introduction 326 A Matrix Approach to Linear Systems of Differential Equations 333 Nonhomogeneous Systems 338 Autonomous Systems of Equations 351 ix
• CHAPTER 7 7.1 7.2 7.3 7.4 CHAPTER 8 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9 8.10 8.11 PART FOUR CHAPTER 9 9.1 9.2 9.3 9.4 9.5 9.6 x The Laplace Transform Laplace Transform: Fundamental Ideas 379 Operational Properties of the Laplace Transform 390 Systems of Equations and Applications of the Laplace Transform 415 The Transfer Function, Control Systems, and Time Lags 379 437 Series Solutions of Differential Equations, Special Functions, and Sturm–Liouville Equations A First Approach to Power Series Solutions of Differential Equations 443 A General Approach to Power Series Solutions of Homogeneous Equations 447 Singular Points of Linear Differential Equations 461 The Frobenius Method 463 The Gamma Function Revisited 480 Bessel Function of the First Kind Jn(x) 485 Bessel Functions of the Second Kind Yν (x) 495 Modiﬁed Bessel Functions I ν (x) and K ν (x) 501 A Critical Bending Problem: Is There a Tallest Flagpole? Sturm–Liouville Problems, Eigenfunctions, and Orthogonality 509 Eigenfunction Expansions and Completeness 526 443 504 FOURIER SERIES, INTEGRALS, AND THE FOURIER TRANSFORM 543 Fourier Series 545 Introduction to Fourier Series 545 Convergence of Fourier Series and Their Integration and Differentiation 559 Fourier Sine and Cosine Series on 0 ≤ x ≤ L 568 Other Forms of Fourier Series 572 Frequency and Amplitude Spectra of a Function 577 Double Fourier Series 581
• CHAPTER 10 10.1 10.2 10.3 PART FIVE CHAPTER 11 11.1 11.2 11.3 11.4 11.5 11.6 CHAPTER 12 12.1 12.2 12.3 12.4 PART SIX CHAPTER 13 13.1 13.2 13.3 13.4 Fourier Integrals and the Fourier Transform The Fourier Integral 589 The Fourier Transform 595 Fourier Cosine and Sine Transforms 589 611 VECTOR CALCULUS 623 Vector Differential Calculus 625 Scalar and Vector Fields, Limits, Continuity, and Differentiability 626 Integration of Scalar and Vector Functions of a Single Real Variable 636 Directional Derivatives and the Gradient Operator Conservative Fields and Potential Functions 650 Divergence and Curl of a Vector 659 Orthogonal Curvilinear Coordinates 665 644 Vector Integral Calculus Background to Vector Integral Theorems 678 Integral Theorems 680 Transport Theorems 697 Fluid Mechanics Applications of Transport Theorems 677 704 COMPLEX ANALYSIS 709 Analytic Functions 711 Complex Functions and Mappings 711 Limits, Derivatives, and Analytic Functions 717 Harmonic Functions and Laplace’s Equation 730 Elementary Functions, Inverse Functions, and Branches 735 xi
• CHAPTER 14 14.1 14.2 14.3 14.4 CHAPTER 15 15.1 15.2 15.3 15.4 15.5 CHAPTER 16 16.1 CHAPTER 17 17.1 17.2 PART SEVEN CHAPTER 18 18.1 18.2 18.3 18.4 xii Complex Integration 745 Complex Integrals 745 Contours, the Cauchy–Goursat Theorem, and Contour Integrals 755 The Cauchy Integral Formulas 769 Some Properties of Analytic Functions 775 Laurent Series, Residues, and Contour Integration Complex Power Series and Taylor Series 791 Uniform Convergence 811 Laurent Series and the Classiﬁcation of Singularities 816 Residues and the Residue Theorem 830 Evaluation of Real Integrals by Means of Residues 791 839 The Laplace Inversion Integral The Inversion Integral for the Laplace Transform Conformal Mapping and Applications to Boundary Value Problems 863 863 877 Conformal Mapping 877 Conformal Mapping and Boundary Value Problems 904 PARTIAL DIFFERENTIAL EQUATIONS 925 Partial Differential Equations 927 What Is a Partial Differential Equation? 927 The Method of Characteristics 934 Wave Propagation and First Order PDEs 942 Generalizing Solutions: Conservation Laws and Shocks 951
• 18.5 18.6 18.7 18.8 18.9 18.10 18.11 18.12 PART EIGHT CHAPTER 19 19.1 19.2 19.3 19.4 19.5 19.6 19.7 The Three Fundamental Types of Linear Second Order PDE 956 Classiﬁcation and Reduction to Standard Form of a Second Order Constant Coefﬁcient Partial Differential Equation for u(x, y) 964 Boundary Conditions and Initial Conditions 975 Waves and the One-Dimensional Wave Equation 978 The D’Alembert Solution of the Wave Equation and Applications 981 Separation of Variables 988 Some General Results for the Heat and Laplace Equation 1025 An Introduction to Laplace and Fourier Transform Methods for PDEs 1030 NUMERICAL MATHEMATICS 1043 Numerical Mathematics 1045 Decimal Places and Signiﬁcant Figures 1046 Roots of Nonlinear Functions 1047 Interpolation and Extrapolation 1058 Numerical Integration 1065 Numerical Solution of Linear Systems of Equations 1077 Eigenvalues and Eigenvectors 1090 Numerical Solution of Differential Equations 1095 Answers 1109 References 1143 Index 1147 xiii
• P R E F A C E T his book has evolved from lectures on engineering mathematics given regularly over many years to students at all levels in the United States, England, and elsewhere. It covers the more advanced aspects of engineering mathematics that are common to all ﬁrst engineering degrees, and it differs from texts with similar names by the emphasis it places on certain topics, the systematic development of the underlying theory before making applications, and the inclusion of new material. Its special features are as follows. Prerequisites T he opening chapter, which reviews mathematical prerequisites, serves two purposes. The ﬁrst is to refresh ideas from previous courses and to provide basic self-contained reference material. The second is to remove from the main body of the text certain elementary material that by tradition is usually reviewed when ﬁrst used in the text, thereby allowing the development of more advanced ideas to proceed without interruption. Worked Examples T he numerous worked examples that follow the introduction of each new idea serve in the earlier chapters to illustrate applications that require relatively little background knowledge. The ability to formulate physical problems in mathematical terms is an essential part of all mathematics applications. Although this is not a text on mathematical modeling, where more complicated physical applications are considered, the essential background is ﬁrst developed to the point at which the physical nature of the problem becomes clear. Some examples, such as the ones involving the determination of the forces acting in the struts of a framed structure, the damping of vibrations caused by a generator and the vibrational modes of clamped membranes, illustrate important mathematical ideas in the context of practical applications. Other examples occur without speciﬁc applications and their purpose is to reinforce new mathematical ideas and techniques as they arise. A different type of example is the one that seeks to determine the height of the tallest ﬂagpole, where the height limitation is due to the phenomenon of xv
• buckling. Although the model used does not give an accurate answer, it provides a typical example of how a mathematical model is constructed. It also illustrates the reasoning used to select a physical solution from a scenario in which other purely mathematical solutions are possible. In addition, the example demonstrates how the choice of a unique physically meaningful solution from a set of mathematically possible ones can sometimes depend on physical considerations that did not enter into the formulation of the original problem. Exercise Sets T he need for engineering students to have a sound understanding of mathematics is recognized by the systematic development of the underlying theory and the provision of many carefully selected fully worked examples, coupled with their reinforcement through the provision of large sets of exercises at the ends of sections. These sets, to which answers to odd-numbered exercises are listed at the end of the book, contain many routine exercises intended to provide practice when dealing with the various special cases that can arise, and also more challenging exercises, each of which is starred, that extend the subject matter of the text in different ways. Although many of these exercises can be solved quickly by using standard computer algebra packages, the author believes the fundamental mathematical ideas involved are only properly understood once a signiﬁcant number of exercises have ﬁrst been solved by hand. Computer algebra can then be used with advantage to conﬁrm the results, as is required in various exercise sets. Where computer algebra is either required or can be used to advantage, the exercise numbers are in blue. A comparison of computer-based solutions with those obtained by hand not only conﬁrms the correctness of hand calculations, but also serves to illustrate how the method of solution often determines its form, and that transforming one form of solution to another is sometimes difﬁcult. It is the author’s belief that only when fundamental ideas are fully understood is it safe to make routine use of computer algebra, or to use a numerical package to solve more complicated problems where the manipulation involved is prohibitive, or where a numerical result may be the only form of solution that is possible. New Material T ypical of some of the new material to be found in the book is the matrix exponential and its application to the solution of linear systems of ordinary differential equations, and the use of the Green’s function. The introductory discussion of the development of discontinuous solutions of ﬁrst order quasilinear equations, which are essential in the study of supersonic gas ﬂow and in various other physical applications, is also new and is not to be found elsewhere. The account of the Laplace transform contains more detail than usual. While the Laplace transform is applied to standard engineering problems, including xvi
• control theory, various nonstandard problems are also considered, such as the solution of a boundary value problem for the equation that describes the bending of a beam and the derivation of the Laplace transform of a function from its differential equation. The chapter on vector integral calculus ﬁrst derives and then applies two fundamental vector transport theorems that are not found in similar texts, but which are of considerable importance in many branches of engineering. Series Solutions of Differential Equations U nderstanding the derivation of series solutions of ordinary differential equations is often difﬁcult for students. This is recognized by the provision of detailed examples, followed by carefully chosen sets of exercises. The worked examples illustrate all of the special cases that can arise. The chapter then builds on this by deriving the most important properties of Legendre polynomials and Bessel functions, which are essential when solving partial differential equations involving cylindrical and spherical polar coordinates. Complex Analysis B ecause of its importance in so many different applications, the chapters on complex analysis contain more topics than are found in similar texts. In particular, the inclusion of an account of the inversion integral for the Laplace transform makes it possible to introduce transform methods for the solution of problems involving ordinary and partial differential equations for which tables of transform pairs are inadequate. To avoid unnecessary complication, and to restrict the material to a reasonable length, some topics are not developed with full mathematical rigor, though where this occurs the arguments used will sufﬁce for all practical purposes. If required, the account of complex analysis is sufﬁciently detailed for it to serve as a basis for a single subject course. Conformal Mapping and Boundary Value Problems S ufﬁcient information is provided about conformal transformations for them to be used to provide geometrical insight into the solution of some fundamental two-dimensional boundary value problems for the Laplace equation. Physical applications are made to steady-state temperature distributions, electrostatic problems, and ﬂuid mechanics. The conformal mapping chapter also provides a quite different approach to the solution of certain two-dimensional boundary value problems that in the subsequent chapter on partial differential equations are solved by the very different method of separation of variables. xvii
• Partial Differential Equations A n understanding of partial differential equations is essential in all branches of engineering, but accounts in engineering mathematics texts often fall short of what is required. This is because of their tendency to focus on the three standard types of linear second order partial differential equations, and their solution by means of separation of variables, to the virtual exclusion of ﬁrst order equations and the systems from which these fundamental linear second order equations are derived. Often very little is said about the types of boundary and initial conditions that are appropriate for the different types of partial differential equations. Mention is seldom if ever made of the important part played by nonlinearity in ﬁrst order equations and the way it inﬂuences the properties of their solutions. The account given here approaches these matters by starting with ﬁrst order linear and quasilinear equations, where the way initial and boundary conditions and nonlinearity inﬂuence solutions is easily understood. The discussion of the effects of nonlinearity is introduced at a comparatively early stage in the study of partial differential equations because of its importance in subjects like ﬂuid mechanics and chemical engineering. The account of nonlinearity also includes a brief discussion of shock wave solutions that are of fundamental importance in both supersonic gas ﬂow and elsewhere. Linear and nonlinear wave propagation is examined in some detail because of its considerable practical importance; in addition, the way integral transform methods can be used to solve linear partial differential equations is described. From a rigorous mathematical point of view, the solution of a partial differential equation by the method of separation of variables only yields a formal solution, which only becomes a rigorous solution once the completeness of any set of eigenfunctions that arises has been established. To develop the subject in this manner would take the text far beyond the level for which it is intended and so the completeness of any set of eigenfunctions that occurs will always be assumed. This assumption can be fully justiﬁed when applying separation of variables to the applications considered here and also in virtually all other practical cases. Technology Projects T o encourage the use of technology and computer algebra and to broaden the range of problems that can be considered, technology-based projects have been added wherever appropriate; in addition, standard sets of exercises of a theoretical nature have been included at the ends of sections. These projects are not linked to a particular computer algebra package: Some projects illustrating standard results are intended to make use of simple computer skills while others provide insight into more advanced and physically important theoretical questions. Typical of the projects designed to introduce new ideas are those at the end of the chapter on partial differential equations, which offer a brief introduction to the special nonlinear wave solutions called solitons. xviii
• Numerical Mathematics A lthough an understanding of basic numerical mathematics is essential for all engineering students, in a book such as this it is impossible to provide a systematic account of this important discipline. The aim of this chapter is to provide a general idea of how to approach and deal with some of the most important and frequently encountered numerical operations, using only basic numerical techniques, and thereafter to encourage the use of standard numerical packages. The routines available in numerical packages are sophisticated, highly optimized and efﬁcient, but the general ideas that are involved are easily understood once the material in the chapter has been assimilated. The accounts that are given here purposely avoid going into great detail as this can be found in the quoted references. However, the chapter does indicate when it is best to use certain types of routine and those circumstances where routines might be inappropriate. The details of references to literature contained in square brackets at the ends of sections are listed at the back of the book with suggestions for additional reading. An instructor’s Solutions Manual that gives outline solutions for the technology projects is also available. Acknowledgments I wish to express my sincere thanks to the reviewers and accuracy readers, those cited below and many who remain anonymous, whose critical comments and suggestions were so valuable, and also to my many students whose questions when studying the material in this book have contributed so fundamentally to its development. Particular thanks go to: Chun Liu, Pennsylvania State University William F. Moss, Clemson University Donald Hartig, California Polytechnic State University at San Luis Obispo Howard A. Stone, Harvard University Donald Estep, Georgia Institute of Technology Preetham B. Kumar, California State University at Sacramento Anthony L. Peressini, University of Illinois at Urbana-Champaign Eutiquio C. Young, Florida State University Colin H. Marks, University of Maryland Ronald Jodoin, Rochester Institute of Technology Edgar Pechlaner, Simon Fraser University Ronald B. Guenther, Oregon State University Mattias Kawski, Arizona State University L. F. Shampine, Southern Methodist University In conclusion, I also wish to thank my editor, Barbara Holland, for her invaluable help and advice on presentation; Julie Bolduc, senior production editor, for her patience and guidance; Mike Sugarman, for his comments during the early stages of writing; and, ﬁnally, Chuck Glaser, for encouraging me to write the book in the ﬁrst place. xix
• PART ONE REVIEW MATERIAL Chapter 1 Review of Prerequisites 1
• 1 C H A P T E R Review of Prerequisites E very account of advanced engineering mathematics must rely on earlier mathematics courses to provide the necessary background. The essentials are a ﬁrst course in calculus and some knowledge of elementary algebraic concepts and techniques. The purpose of the present chapter is to review the most important of these ideas that have already been encountered, and to provide for convenient reference results and techniques that can be consulted later, thereby avoiding the need to interrupt the development of subsequent chapters by the inclusion of review material prior to its use. Some basic mathematical conventions are reviewed in Section 1.1, together with the method of proof by mathematical induction that will be required in later chapters. The essential algebraic operations involving complex numbers are summarized in Section 1.2, the complex plane is introduced in Section 1.3, the modulus and argument representation of complex numbers is reviewed in Section 1.4, and roots of complex numbers are considered in Section 1.5. Some of this material is required throughout the book, though its main use will be in Part 5 when developing the theory of analytic functions. The use of partial fractions is reviewed in Section 1.6 because of the part they play in Chapter 7 in developing the Laplace transform. As the most basic properties of determinants are often required, the expansion of determinants is summarized in Section 1.7, though a somewhat fuller account of determinants is to be found later in Section 3.3 of Chapter 3. The related concepts of limit, continuity, and differentiability of functions of one or more independent variables are fundamental to the calculus, and to the use that will be made of them throughout the book, so these ideas are reviewed in Sections 1.8 and 1.9. Tangent line and tangent plane approximations are illustrated in Section 1.10, and improper integrals that play an essential role in the Laplace and Fourier transforms, and also in complex analysis, are discussed in Section 1.11. The importance of Taylor series expansions of functions involving one or more independent variables is recognized by their inclusion in Section 1.12. A brief mention is also made of the two most frequently used tests for the convergence of series, and of the differentiation and integration of power series that is used in Chapter 8 when considering series solutions of linear ordinary differential equations. These topics are considered again in Part 5 when the theory of analytic functions is developed. The solution of many problems involving partial differential equations can be simpliﬁed by a convenient choice of coordinate system, so Section 1.13 reviews the theorem for the 3
• 4 Chapter 1 Review of Prerequisites change of variable in partial differentiation, and describes the cylindrical polar and spherical polar coordinate systems that are the two that occur most frequently in practical problems. Because of its fundamental importance, the implicit function theorem is stated without proof in Section 1.14, though it is not usually mentioned in ﬁrst calculus courses. 1.1 Real Numbers, Mathematical Induction, and Mathematical Conventions N umbers are fundamental to all mathematics, and real numbers are a subset of complex numbers. A real number can be classiﬁed as being an integer, a rational number, or an irrational number. From the set of positive and negative integers, and zero, the set of positive integers 1, 2, 3, . . . is called the set of natural numbers. The rational numbers are those that can be expressed in the form m/n, √ where m and n are integers with n = 0. Irrational numbers such as π , 2, and sin 2 are numbers that cannot be expressed in rational form, so, for example, for no √ integers m and n is it true that 2 is equal to m/n. Practical calculations can only be performed using rational numbers, so all irrational numbers that arise must be approximated arbitrarily closely by rational numbers. Collectively, the sets of integers and rational and irrational numbers form what is called the set of all real numbers, and this set is denoted by R. When it is necessary to indicate that an arbitrary number a is a real number a shorthand notation is adopted involving the symbol ∈, and we will write a ∈ R. The symbol ∈ is to be read “belongs to” or, more formally, as “is an element of the set.” If a is not a member of set R, the symbol ∈ is negated by writing ∈, and we will write a ∈ R where, of / / course, the symbol ∈ is to be read as “does not belong to,” or “is not an element / of the set.” As real numbers can be identiﬁed in a unique manner with points on a line, the set of all real numbers R is often called the real line. The set of all complex numbers C to which R belongs will be introduced later. One of the most important properties of real numbers that distinguishes them from other complex numbers is that they can be arranged in numerical order. This fundamental property is expressed by saying that the real numbers possess the order property. This simply means that if x, y ∈ R, with x = y, then either x < y or x > y, where the symbol < is to be read “is less than” and the symbol > is to be read “is greater than.” When the foregoing results are expressed differently, though equivalently, if x, y ∈ R, with x = y, then either x − y < 0 absolute value or x − y > 0. It is the order property that enables the graph of a real function f of a real variable x to be constructed. This follows because once length scales have been chosen for the axes together with a common origin, a real number can be made to correspond to a unique point on an axis. The graph of f follows by plotting all possible points (x, f (x)) in the plane, with x measured along one axis and f (x) along the other axis. The absolute value |x| of a real number x is deﬁned by the formula |x| = x if x ≥ 0 −x if x < 0.
• Section 1.1 Real Numbers, Mathematical Induction, and Mathematical Conventions 5 This form of deﬁnition is in reality a concise way of expressing two separate statements. One statement is obtained by reading |x| with the top condition on the right and the other by reading it with the bottom condition on the right. The absolute value of a real number provides a measure of its magnitude without regard to its sign so, for example, |3| = 3, |−7.41| = 7.41, and |0| = 0. Sometimes the form of a general mathematical result that only depends on an arbitrary natural number n can be found by experiment or by conjecture, and then the problem that remains is how to prove that the result is either true or false for all n. A typical example is the proposition that the product (1 − 1/4)(1 − 1/9)(1 − 1/16) . . . [1 − 1/(n + 1)2 ] = (n + 2)/(2n + 2), mathematical induction for n = 1, 2, . . . . This assertion is easily checked for any speciﬁc positive integer n, but this does not amount to a proof that the result is true for all natural numbers. A powerful method by which such propositions can often be shown to be either true or false involves using a form of argument called mathematical induction. This type of proof depends for its success on the order property of numbers and the fact that if n is a natural number, then so also is n + 1. The steps involved in an inductive proof can be summarized as follows. Proof by Mathematical Induction Let P(n) be a proposition depending on a positive integer n. STEP 1 STEP 2 STEP 3 STEP 4 Show, if possible, that P(n) is true for some positive integer n0 . Show, if possible, that if P(n) is true for an arbitrary integer n = k ≥ n0 , then the proposition P(k + 1) follows from proposition P(k). If Step 2 is true, the fact that P(n0 ) is true implies that P(n0 + 1) is true, and then that P(n0 + 2) is true, and hence that P(n) is true for all n ≥ n0 . If no number n = n0 can be found for which Step 1 is true, or if in Step 2 it can be shown that P(k) does not imply P(k + 1), the proposition P(n) is false. The example that follows is typical of the situation where an inductive proof is used. It arises when determining the nth term in the Maclaurin series for sin ax that involves ﬁnding the nth derivative of sin ax. A result such as this may be found intuitively by inspection of the ﬁrst few derivatives, though this does not amount to a formal proof that the result is true for all natural numbers n. EXAMPLE 1.1 Prove by mathematical induction that dn /dx n [sin ax] = a n sin(ax + nπ/2), for n = 1, 2, . . . . Solution The proposition P(n) is that dn /dx n [sin ax] = a n sin(ax + nπ/2), STEP 1 for n = 1, 2, . . . . Differentiation gives d/dx[sin ax] = a cos ax,
• 6 Chapter 1 Review of Prerequisites but setting n = 1 in P(n) leads to the result d/dx[sin ax] = a sin(ax + π/2) = a cos ax, showing that proposition P(n) is true for n = 1 (so in this case n0 = 1). STEP 2 Assuming P(k) to be true for k > 1, differentiation gives d/dx{dk/dx k[sin ax]} = d/dx[a k sin(ax + kπ/2)], so dk+1 /dx k+1 [sin ax] = a k+1 cos(ax + kπ/2). However, replacing k by k + 1 in P(k) gives dk+1 /dx k+1 [sin ax] = a k+1 sin[ax + (k + 1)π/2] = a k+1 sin[(ax + kπ/2) + π/2] = a k+1 cos(ax + kπ/2), showing, as required, that proposition P(k) implies proposition P(k + 1), so Step 2 is true. STEP 3 As P(n) is true for n = 1, and P(k) implies P(k + 1), it follows that the result is true for n = 1, 2, . . . and the proof is complete. The binomial theorem ﬁnds applications throughout mathematics at all levels, so we quote it ﬁrst when the exponent n is a positive integer, and then in its more general form when the exponent α involved is any real number. Binomial theorem when n is a positive integer If a, b are real numbers and n is a positive integer, then n(n − 1) n−2 2 a b 2! n(n − 1)(n − 2) n−3 3 a b + · · · + bn , + 3! (a + b)n = a n + na n−1 b + binomial coefﬁcient or more concisely in terms of the binomial coefﬁcient n n! , = r (n − r )!r ! we have n (a + b)n = r =0 n n−r r a b, r where m! is the factorial function deﬁned as m! = 1 · 2 · 3 · · · m with m > 0 an integer, and 0! is deﬁned as 0! = 1. It follows at once that n n = = 1. 0 n
• Section 1.1 Real Numbers, Mathematical Induction, and Mathematical Conventions 7 The binomial theorem involving the expression (a + b)α , where a and b are real numbers with |b/a| < 1 and α is an arbitrary real number takes the following form. General form of the binomial theorem when α is an arbitrary real number If a and b are real numbers such that |b/a| < 1 and α is an arbitrary real number, then (a + b)α = a α 1 + + b a α = aα 1 + α(α − 1)(α − 2) 3! b a α 1! b α(α − 1) + a 2! b a 2 3 + ··· . The series on the right only terminates after a ﬁnite number of terms if α is a positive integer, in which case the result reduces to the one just given. If α is a negative integer, or a nonintegral real number, the expression on the right becomes an inﬁnite series that diverges if |b/a| > 1. EXAMPLE 1.2 Expand (3 + x)−1/2 by the binomial theorem, stating for what values of x the series converges. Solution Setting b/a = 1 x in the general form of the binomial theorem gives 3 1 (3 + x)−1/2 = 3−1/2 1 + x 3 −1/2 1 1 1 5 3 x + ··· . = √ 1 − x + x2 − 6 24 432 3 The series only converges if | 1 x| < 1, and so it is convergent provided |x| < 3. 3 Some standard mathematical conventions Use of combinations of the ± and ∓ signs The occurrence of two or more of the symbols ± and ∓ in an expression is to be taken to imply two separate results, the ﬁrst obtained by taking the upper signs and the second by taking the lower signs. Thus, the expression a ± b sin θ ∓ c cos θ is an abbreviation for the two separate expressions a + b sin θ − c cos θ and a − b sin θ + c cos θ. Multi-statements multi-statement When a function is deﬁned sectionally on n different intervals of the real line, instead of formulating n separate deﬁnitions these are usually simpliﬁed by being combined into what can be considered to be a single multi-statement. The following example is typical of a multi-statement: ⎧ ⎨sin x, x < π 0, π ≤ x ≤ 3π/2 f (x) = ⎩ −1, x > 3π/2.
• 8 Chapter 1 Review of Prerequisites It is, in fact, three statements. The ﬁrst is obtained by reading f (x) in conjunction with the top line on the right, the second by reading it in conjunction with the second line on the right, and the third by reading it in conjunction with the third line on the right. An example of a multi-statement has already been encountered in the deﬁnition of the absolute value |x| of a number x. Frequent use of multi-statements will be made in Chapter 9 on Fourier series, and elsewhere. Polynomials polynomials A polynomial is an expression of the form P(x) = a0 x n + a1 x n−1 + · · · + an−1 x + an . The integer n is called the degree of the polynomial, and the numbers ai are called its coefﬁcients. The fundamental theorem of algebra that is proved in Chapter 14 asserts that P(x) = 0 has n roots that may be either real or complex, though some of them may be repeated. (a0 = 0 is assumed.) Notation for ordinary and partial derivatives If f (x) is an n times differentiable function then f (n) (x) will, on occasion, be used to signify dn f/dx n , so that f (n) (x) = sufﬁx notation for partial derivatives dn f . dx n If f (x, y) is a suitably differentiable function of x and y, a concise notation used to signify partial differentiation involves using sufﬁxes, so that fx = ∂f ∂ , fyx = ( fy )x = ∂x ∂x ∂f ∂y = ∂2 f ∂2 f , fyy = ,..., ∂ y∂ x ∂ y2 with similar results when f is a function of more than two independent variables. Inverse trigonometric functions The periodicity of the real variable trigonometric sine, cosine, and tangent functions means that the corresponding general inverse trigonometric functions are many val√ ued. So, for example, if y = sin x and we ask for what values of x is y = 1/ 2, we ﬁnd this is true for x = π/4 ± 2nπ and x = 3π/4 ± 2nπ for n = 0, 1, 2, . . . . To overcome this ambiguity, we introduce the single valued inverses, denoted respectively by x = Arcsin y, x = Arccos y, and x = Arctan y by restricting the domain and range of the sine, cosine, and tangent functions to one where they are either strictly increasing or strictly decreasing functions, because then one value of x corresponds to one value of y and, conversely, one value of y corresponds to one value of x. In the case of the function y = sin x, by restricting the argument x to the interval −π/2 ≤ x ≤ π/2 the function becomes a strictly increasing function of x. The corresponding single valued inverse function is denoted by x = Arcsin y, where y is a number in the domain of deﬁnition [−1, 1] of the Arcsine function and x is a number in its range [−π/2, π/2]. Similarly, when considering the function y = cos x, the argument is restricted to 0 ≤ x ≤ π to make cos x a strictly decreasing function of x. The corresponding single valued inverse function is denoted by x = Arccos y, where y is a number in the domain of deﬁnition [−1, 1] of the Arccosine function and x is a number in its range [0, π]. Finally, in the case of the function y = tan x, restricting
• Section 1.1 Real Numbers, Mathematical Induction, and Mathematical Conventions 9 the argument to the interval −π/2 < x < π/2 makes the tangent function a strictly increasing function of x. The corresponding single valued inverse function is denoted by x = Arctan y where y is a number in the domain of deﬁnition (−∞, ∞) of the Arctangent function and x is a number in its range (−π/2, π/2). As the inverse trigonometric functions are important in their own right, the variables x and y in the preceding deﬁnitions are interchanged to allow consideration of the inverse functions y = Arcsin x, y = Arccos x, and y = Arctan x, so that now x is the independent variable and y is the dependent variable. With this interchange of variables the expression y = arcsin x will be used to refer to any single valued inverse function with the same domain of deﬁnition as Arcsin x, but with a different range. Similar deﬁnitions apply to the functions y = arccos x and y = arctan x. Double summations An expression involving a double summation like ∞ ∞ amn sin mx sin ny, m=1 n=1 double summation means sum the terms amn sin mx sin ny over all possible values of m and n, so that ∞ ∞ amn sin mx sin ny = a11 sin x sin y + a12 sin x sin 2y m=1 n=1 + a21 sin 2x sin y + a22 sin 2x sin 2y + · · · . A more concise notation also in use involves writing the double summation as ∞ amn sin mx sin nx. m=1,n=1 The signum function signum function The signum function, usually written sign(x), and sometimes sgn(x), is deﬁned as sign(x) = 1 if x > 0 −1 if x < 0. We have, for example, sign(cos x) = 1 for 0 < x < π/2, and sign(cos x) = −1 for π/2 < x < π or, equivalently, sign(cos x) = 1, 0 < x < 1 π 2 −1, 1 π < x < π. 2 Products Let {uk}n be a sequence of numbers or functions u1 , u2 , . . . ; then the product of k=1 the n members of this sequence is denoted by n uk, so that k=1 n uk = u1 u2 · · · un . k=1 inﬁnite product When the sequence is inﬁnite, n uk = lim n→∞ k=1 ∞ uk k=1
• 10 Chapter 1 Review of Prerequisites is called an inﬁnite product involving the sequence {uk}. Typical examples of inﬁnite products are ∞ 1− k=2 1 k2 = 1 2 ∞ and k=1 1− x2 k2 π 2 = sin x . x More background information and examples can be found in the appropriate sections in any of references [1.1], [1.2], and [1.5]. Logarithmic functions the functions ln and Log The notation ln x is used to denote the natural logarithm of a real number x, that is, the logarithm of x to the base e, and in some books this is written loge x. In this book logarithms to the base 10 are not used, and when working with functions of a complex variable the notation Log z, with z = r eiθ means Log z = ln r + iθ . EXERCISES 1.1 √ √ 1. Prove √that if a > 0, b > 0, then a/ b + b/ a ≥ √ a + b. Prove Exercises 2 through 6 by mathematical induction. 2. 3. 4. 5. 6. n−1 k=0 (a + kd) = (n/2)[2a + (n − 1)d] (sum of an arithmetic series). n−1 k r = (1 − r n )/(1 − r ) (r = 1) k=0 (sum of a geometric series). n k2 = (1/6)n(n + 1)(2n + 1) (sum of squares). k=1 dn /dx n [cos ax] = a n cos(ax + nπ/2), with n a natural number. dn /dx n [ln(1 + x)] = (−1)n+1 (n − 1)!/(1 + x)n , with n a natural number. 1.2 7. Use the binomial theorem to expand (3 + 2x)4 . 8. Use the binomial theorem and multiplication to expand (1 − x 2 )(2 + 3x)3 . In Exercises 9 through 12 ﬁnd the ﬁrst four terms of the binomial expansion of the function and state conditions for the convergence of the series. 9. 10. 11. 12. (3 + 2x)−2 . (2 − x 2 )1/3 . (4 + 2x 2 )−1/2 . (1 − 3x 2 )3/4 . Complex Numbers Mathematical operations can lead to numbers that do not belong to the real number system R introduced in Section 1.1. In the simplest case this occurs when ﬁnding the roots of the quadratic equation ax 2 + bx + c = 0 with a, b, c ∈ R, a = 0 by means of the quadratic formula x= discriminant of a quadratic −b ± √ b2 − 4ac . 2a The discriminant of the equation is b2 − 4ac, and if b2 − 4ac < 0 the formula involves the square root of a negative real number; so, if the formula is to have meaning, numbers must be allowed that lie outside the real number system. The inadequacy of the real number system when considering different mathematical operations can be illustrated in other ways by asking, for example, how to ﬁnd the three roots that are expected of a third degree algebraic equation as
• Section 1.2 Complex Numbers 11 simple as x 3 − 1 = 0, where only the real root 1 can be found using y = x 3 − 1, or by seeking to give meaning to ln(−1), both of which questions will arise later. Difﬁculties such as these can all be overcome if the real number system is extended by introducing the imaginary unit i deﬁned as i 2 = −1, so expressions like (−k2 ) where k a positive real number may be written (−1) (k2 ) = ±ik. Notice that as the real number k only scales the imaginary unit i, it is immaterial whether the result is written as ik or as ki. The extension to the real number system that is required to resolve problems of the type just illustrated involves the introduction of complex numbers, denoted collectively by C, in which the general complex number, usually denoted by z, has the form z = α + iβ, real and imaginary part notation with α, β real numbers. The real number α is called the real part of the complex number z, and the real number β is called its imaginary part. When these need to be identiﬁed separately, we write Re{z} = α and Im{z} = β, so if z = 3 − 7i, Re{z} = 3 and Im{z} = −7. If Im{z} = β = 0 the complex number z reduces to a real number, and if Re{z} = α = 0 it becomes a purely imaginary number, so, for example, z = 5i is a purely imaginary number. When a complex number z is considered as a variable it is usual to write it as z = x + i y, where x and y are now real variables. If it is necessary to indicate that z is a general complex number we write z ∈ C. When solving the quadratic equation az2 + bz + c = 0 with a, b, and c real numbers and a discriminant b2 − 4ac < 0, by setting 4ac − b2 = k2 in the quadratic formula, with k > 0, the two roots z1 and z2 are given by the complex numbers z1 = −(b/2a) + i(k/2a) and z2 = −(b/2a) − i(k/2a). Algebraic rules for complex numbers Let the complex numbers z1 and z2 be deﬁned as z1 = a + ib and z2 = c + id, with a, b, c, and d arbitrary real numbers. Then the following rules govern the arithmetic manipulation of complex numbers. Equality of complex numbers The complex numbers z1 and z2 are equal, written z1 = z2 if, and only if, Re{z1 } = Re{z2 } and Im{z1 } = Im{z2 }. So a + ib = c + id if, and only if, a = c and b = d.
• 12 Chapter 1 Review of Prerequisites EXAMPLE 1.3 (a) 3 − 9i = 3 + bi if, and only if, b = −9. (b) If u = −2 + 5i, v = 3 + 5i, w = a + 5i, then u = w if, and only if, a = −2 but u = v, and v = w if, and only if, a = 3. Zero complex number The zero complex number, also called the null complex number, is the number 0 + 0i that, for simplicity, is usually written as an ordinary zero 0. EXAMPLE 1.4 If a + ib = 0, then a = 0 and b = 0. Addition and subtraction of complex numbers The addition (sum) and subtraction (difference) of the complex numbers z1 and z2 is deﬁned as z1 + z2 = Re{z1 } + Re{z2 } + i[Im{z1 } + Im{z2 }] and z1 − z2 = Re{z1 } − Re{z2 } + i[Im{z1 } − Im{z2 }]. So, if z1 = a + ib and z2 = c + id, then z1 + z2 = (a + ib) + (c + id) = (a + c) + i(b + d), and z1 − z2 = (a + ib) − (c + id) = (a − c) + i(b − d). EXAMPLE 1.5 If z1 = 3 + 7i and z2 = 3 + 2i, then the sum z1 + z2 = (3 + 3) + (7 + 2)i = 6 + 9i, and the difference z1 − z2 = (3 − 3) + (7 − 2)i = 5i. Multiplication of complex numbers The multiplication (product) of the two complex numbers z1 = a + ib and z2 = c + id is deﬁned by the rule z1 z2 = (a + ib)(c + id) = (ac − bd) + i(ad + bc). An immediate consequence of this deﬁnition is that if k is a real number, then kz1 = k(a + ib) = ka + ikb. This operation involving multiplication of a complex
• Section 1.2 Complex Numbers 13 number by a real number is called scaling a complex number. Thus, if z1 = 3 + 7i and z2 = 3 + 2i, then 2z1 − 3z2 = (6 + 14i) − (9 + 6i) = −3 + 8i. In particular, if z = a + ib, then −z = (−1)z = −a − ib. This is as would be expected, because it leads to the result z − z = 0. In practice, instead of using this formal deﬁnition of multiplication, it is more convenient to perform multiplication of complex numbers by multiplying the bracketed quantities in the usual algebraic manner, replacing every product i 2 by −1, and then combining separately the real and imaginary terms to arrive at the required product. EXAMPLE 1.6 (a) 5i(−4 + 3i) = −15 − 20i. (b) (3 − 2i)(−1 + 4i)(1 + i) = (−3 + 12i + 2i − 8i 2 )(1 + i) = [(−3 + 8) + (12 + 2)i](1 + i) = (5 + 14i)(1 + i) = 5 + 14i + 5i + 14i 2 = (5 − 14) + (5 + 14)i = −9 + 19i. Complex conjugate If z = a + ib, then the complex conjugate of z, usually denoted by z and read “z bar,” is deﬁned as z = a − ib. It follows directly that (z) = z and zz = a 2 + b2 . In words, the complex conjugate operation has the property that taking the complex conjugate of a complex conjugate returns the original complex number, whereas the product of a complex number and its complex conjugate always yields a real number. If z = a + ib, then adding and subtracting z and z gives the useful results z + z = 2Re{z} = 2a and z − z = 2i Im{z} = 2ib. These can be written in the equivalent form Re{z} = a = 1 (z + z) 2 and Im{z} = b = 1 (z − z). 2i Quotient (division) of complex numbers Let z1 = a + ib and z2 = c + id. Then the quotient z1 /z2 is deﬁned as z1 (ac + bd) + i(bc − ad) = , z2 = 0. z2 c2 + d2
• 14 Chapter 1 Review of Prerequisites In practice, division of complex numbers is not carried out using this deﬁnition. Instead, the quotient is written in the form z1 z1 z2 = , z2 z2 z2 where the denominator is now seen to be a real number. The quotient is then found by multiplying out and simplifying the numerator in the usual manner and dividing the real and imaginary parts of the numerator by the real number z2 z2 . EXAMPLE 1.7 Find z1 /z2 given that z1 = (3 + 2i) and z2 = 1 + 3i. Solution (3 + 2i)(1 − 3i) 3 − 9i + 2i − 6i 2 9 7i 3 + 2i = = = − . 1 + 3i (1 + 3i)(1 − 3i) 10 10 10 Modulus of a complex number The modulus of the complex number z = a + ib denoted by |z|, and also called its magnitude, is deﬁned as |z| = (a 2 + b2 )1/2 = (zz)1/2 . It follows directly from the deﬁnitions of the modulus and division that |z| = |z| = (a 2 + b2 )1/2 , and z1 /z2 = z1 z2 /|z2 |2 . EXAMPLE 1.8 If z = 3 + 7i, then |z| = |3 + 7i| = (32 + 72 )1/2 = √ 58. It is seen that the foregoing rules for the arithmetic manipulation of complex numbers reduce to the ordinary arithmetic rules for the algebraic manipulation of real numbers when all the complex numbers involved are real numbers. Complex numbers are the most general numbers that need to be used in mathematics, and they contain the real numbers as a special case. There is, however, a fundamental difference between real and complex numbers to which attention will be drawn after their common properties have been listed. Properties shared by real and complex numbers Let z, u, and w be arbitrary real or complex numbers. Then the following properties are true: z + u = u + z. This means that the order in which complex numbers are added does not affect their sum. 2. zu = uz. This means that the order in which complex numbers are multiplied does not affect their product. 1.
• Section 1.3 The Complex Plane 3. (z + u) + w = z + (u + w). This means that the order in which brackets are inserted into a sum of ﬁnitely many complex numbers does not affect the sum. 4. z(uw) = (zu)w. This means that the terms in a product of complex numbers may be grouped and multiplied in any order without affecting the resulting product. 5. z(u + w) = zu + zw. This means that the product of z and a sum of complex numbers equals the sum of the products of z and the individual complex numbers involved in the sum. 6. z + 0 = 0 + z = z. This result means that the addition of zero to any complex number leaves it unchanged. 7. 15 z · 1 = 1 · z = z. This result means that multiplication of any complex number by unity leaves the complex number unchanged. Despite the properties common to real and complex numbers just listed, there remains a fundamental difference because, unlike real numbers, complex numbers have no natural order. So if z1 and z2 are any complex numbers, a statement such as z1 < z2 has no meaning. EXERCISES 1.2 Find the roots of the equations in Exercises 1 through 6. 1. z2 + z + 1 = 0. 2. 2z2 + 5z + 4 = 0. 3. z2 + z + 6 = 0. 4. 3z2 + 2z + 1 = 0. 5. 3z2 + 3z + 1 = 0. 6. 2z2 − 2z + 3 = 0. 7. Given that z = 1 is a root, ﬁnd the other two roots of 2z3 − z2 + 3z − 4 = 0. 8. Given that z = −2 is a root, ﬁnd the other two roots of 4z3 + 11z2 + 10z + 8 = 0. 1.3 9. Given u = 4 − 2i, v = 3 − 4i, w = −5i and a + ib = (u + iv)w, ﬁnd a and b. 10. Given u = −4 + 3i, v = 2 + 4i, and a + ib = uv2 , ﬁnd a and b. 11. Given u = 2 + 3i, v = 1 − 2i, w = −3 − 6i, ﬁnd |u + v|, u + 2v, u − 3v + 2w, uv, uvw, |u/v|, v/w. 12. Given u = 1 + 3i, v = 2 − i, w = −3 + 4i, ﬁnd uv/w, uw/v and |v|w/u. The Complex Plane cartesian representation of z Complex numbers can be represented geometrically either as points, or as directed line segments (vectors), in the complex plane. The complex plane is also called the z-plane because of the representation of complex numbers in the form z = x + i y. Both of these representations are accomplished by using rectangular cartesian coordinates and plotting the complex number z = a + ib as the point (a, b) in the plane, so the x-coordinate of z is a = Re{z} and its y-coordinate is b = Im{z}. Because of this geometrical representation, a complex number written in the form z = a + ib is said to be expressed in cartesian form. To acknowledge the Swiss amateur mathematician Jean-Robert Argand, who introduced the concept of the complex plane in 1806, and who by profession was a bookkeeper, this representation is also called the Argand diagram.
• Section 1.3 The Complex Plane 17 Imaginary axis Imaginary axis b+d d d z1 z2 + z2 b −c b z1 a−c 0 −z2 z1 0 z2 z1 − zc −z2 a 2 Real axis b−d −d a a + c Real axis c (a) (b) FIGURE 1.2 Addition and subtraction of complex numbers using the triangle/parallelogram law. The geometrical interpretation of the subtraction of z2 from z1 follows similarly by adding to z1 the directed line segment −z2 that is obtained by reversing of the sense (arrow) along z2 , as shown in Fig. 1.2b. It is an elementary fact from Euclidean geometry that the sum of the lengths of the two sides |u| and |v| of the triangle in Fig. 1.3 is greater than or equal to the length of the hypotenuse |u + v|, so from geometrical considerations we can write |u + v| ≤ |u| + |v|. triangle inequality This result involving the moduli of the complex numbers u and v is called the triangle inequality for complex numbers, and it has many applications. An algebraic proof of the triangle inequality proceeds as follows: |u + v|2 = (u + v)(u + v) = uu + vu + uv + vv = |u|2 + |v|2 + (uv + uv) ≤ |u2 | + |v2 | + 2|uv| = (|u| + |v|)2 . The required result now follows from taking the positive square root. A similar argument, the proof of which is left as an exercise, can be used to show that u| − |v ≤ |u + v|, so when combined with the triangle inequality we have u| − |v ≤ |u + v| ≤ |u| + |v|. Imaginary axis u+ ⎢ v⎥ ⎢v⎥ ⎢u⎥ 0 FIGURE 1.3 The triangle inequality. Real axis
• 18 Chapter 1 Review of Prerequisites EXERCISES 1.3 In Exercises 1 through 8 use the parallelogram law to form the sum and difference of the given complex numbers and then verify the results by direct addition and subtraction. 1. 2. 3. 4. u = 2 + 3i, v = 1 − 2i. u = 4 + 7i, v = −2 − 3i. u = −3, v = −3 − 4i. u = 4 + 3i, v = 3 + 4i. 1.4 5. 6. 7. 8. u = 3 + 6i, v = −4 + 2i. u = −3 + 2i, v = 6i. u = −4 + 2i, v = −4 − 10i. u = 4 + 7i, v = −3 + 5i. In Exercises 9 through 11 use the parallelogram law to verify the triangle inequality |u + v| ≤ |u| + |v| for the given complex numbers u and v. 9. u = −4 + 2i, v = 3 + 5i. 10. u = 2 + 5i, v = 3 − 2i. 11. u = −3 + 5i, v = 2 + 6i. Modulus and Argument Representation of Complex Numbers polar representation of z When representing z = x + i y in the complex plane by a point P with coordinates (x, y), a natural alternative to the cartesian representation is to give the polar coordinates (r, θ) of P. This polar representation of z is shown in Fig. 1.4, where OP = r = |z| = (x 2 + y2 )1/2 and tan θ = y/x. (1) The radial distance OP is the modulus of z, so r = |z|, and the angle θ measured counterclockwise from the positive real axis is called the argument of z. Because of this, a complex number expressed in terms of the polar coordinates (r, θ ) is said to be in modulus–argument form. The argument θ is indeterminate up to a multiple of 2π, because the polar coordinates (r, θ ), and (r, θ + 2kπ ), with k = ±1, ±2, . . . , identify the same point P. By convention, the the angle θ is called the principal value of the argument of z when it lies in the interval −π < θ ≤ π . To distinguish the principal value of the argument from all of its other values, we write Arg z = θ, when −π < θ ≤ π. (2) The values of the argument of z that differ from this value of θ by a multiple of 2π are denoted by arg z, so that arg z = θ + 2kπ, with k = ± 1, ±2, . . . . Imaginary axis P(r, θ) y = r sin θ r= θ O z ⎢⎥ z x = r cos θ Real axis FIGURE 1.4 The complex plane and the (r, θ ) representation of z. (3)
• Section 1.4 Modulus and Argument Representation of Complex Numbers 19 The signiﬁcance of the multivalued nature of arg z will become apparent later when the roots of complex numbers are determined. The connection between the cartesian coordinates (x, y) and the polar coordinates (r, θ ) of the point P corresponding to z = x + i y is easily seen to be given by x = r cos θ modulus–argument representation of z and y = r sin θ. This leads immediately to the representation of z = x + i y in the alternative modulus–argument form z = r (cos θ + i sin θ ). (4) A routine calculation using elementary trigonometric identities shows that (cos θ + i sin θ )2 = (cos 2θ + i sin 2θ). An inductive argument using the above result as its ﬁrst step then establishes the following simple but important theorem. THEOREM 1.1 De Moivre’s theorem (cos θ + i sin θ)n = (cos nθ + i sin nθ), EXAMPLE 1.9 for n a natural number. Use de Moivre’s theorem to express cos 4θ and sin 4θ in terms of powers of cos θ and sin θ . Solution The result is obtained by ﬁrst setting n = 4 in de Moivre’s theorem and expanding (cos θ + i sin θ )4 to obtain cos4 θ + 4i cos3 θ sin θ − 6 cos2 θ sin2 θ − 4i cos θ sin3 θ + sin4 θ = cos 4θ + i sin 4θ. Equating the respective real and imaginary parts on either side of this identity gives the required results cos 4θ = cos4 θ − 6 cos2 θ sin2 θ + sin4 θ and sin 4θ = 4 cos3 θ sin θ − 4 cos θ sin3 θ. As the complex number z = cos θ + i sin θ has unit modulus, it follows that all numbers of this form lie on the unit circle (a circle of radius 1) centered on the origin, as shown in Fig. 1.5. Using (5), we see that if z = r (cos θ + i sin θ), then zn = r n (cos nθ + i sin nθ), for n a natural number. θ (5) The relationship between e , sin θ, and cos θ can be seen from the following well-known series expansions of the functions eθ = sin θ = cos θ = ∞ n=0 ∞ θ2 θ3 θ4 θ5 θ6 θn =1+θ + + + + + + ···; n! 2! 3! 4! 5! 6! (−1)n θ3 θ5 θ7 θ 2n+1 =θ− + − + ···; (2n + 1)! 3! 5! 7! (−1)n θ2 θ4 θ6 θ 2n =1− + − + ···. (2n)! 2! 4! 6! n=0 ∞ n=0
• 20 Chapter 1 Review of Prerequisites Imaginary axis i z = cos θ + i sin θ y = sin θ −1 0 θ x = cos θ 1 Real axis −i FIGURE 1.5 Point z = cos θ + i sin θ on the unit circle centered on the origin. Euler formula By making a formal power series expansion of the function eiθ , simplifying powers of i, grouping together the real and imaginary terms, and using the series representations for cos θ and sin θ, we arrive at what is called the real variable form of the Euler formula eiθ = cos θ + i sin θ, for any real θ . (6) This immediately implies that if z = r eiθ , then zα = r α eiαθ , for any real α. (7) When θ is restricted to the interval −π < θ ≤ π , formula (6) leads to the useful results 1 = ei0 , i = eiπ/2 , −1 = eiπ , −i = e−iπ/2 and, in particular, to 1 = e2kπi for k = 0, ±1, ±2, . . . . The Euler form for complex numbers makes their multiplication and division very simple. To see this we set z1 = r1 eiα and z2 = r2 eiβ and then use the results z1 z2 = r1r2 ei(α+β) and z1 /z2 = r1 /r2 ei(α−β) . (8) These show that when complex numbers are multiplied, their moduli are multiplied and their arguments are added, whereas when complex numbers are divided, their moduli are divided and their arguments are subtracted. EXAMPLE 1.10 Find uv, u/v, and u25 given that u = 1 + i, v = √ 3 − i. √ √ √ i Solution u = 1 + i = 2eiπ/4 , v = 3 −√ = 2e−iπ/6 , so uv = 2 2eiπ/12 , u/v = √ iπ/4 25 √ i(6+1/4)π √ i5π/12 2)e while u25 = √ 2e ) = ( √ 25 (eiπ/4 )25 = 4096 2(e ( 2) ) = (1/ √ i6π 4096 2(e )(eiπ/4 ) = 4096 2(eiπ/4 ) = 4096 2(1 + i). To ﬁnd the principal value of the argument of a given complex number z, namely Arg z, use should be made of the signs of x = Re{z}, and y = Im{z} together
• Section 1.4 Modulus and Argument Representation of Complex Numbers 21 with the results listed below, all of which follow by inspection of Fig. 1.5. Signs of x and y x < 0, y < 0 x > 0, y < 0 x > 0, y > 0 x < 0, y > 0 EXAMPLE 1.11 Arg z = θ −π < θ < −π/2 −π/2 < θ < 0 0 < θ < π/2 π/2 < θ < π Find r = |z|, Arg z, arg z, and the modulus–argument form of the following values of z. √ √ √ (a) −2 3 − 2i (b) −1 + i 3 (c) 1 + i (d) 2 − i2 3. √ Solution (a) r = {(−2 3)2 + (−2)2 }1/2 = 4, Argz = θ = −5π/6 and arg z = −5π/6 + 2kπ , k = ± 1, ±2, . . . , z = 4(cos(−5π/6) + i sin(−5π/6)). √ (b) r = {(−1)2 + ( 3)2 }1/2 = 2, Arg z = θ = 2π/3 and arg z = 2π/3 + 2kπ, k = ±1, ±2, . . . , z = 2(cos(2π/3) + i sin(2π/3)). √ (c) r = {(1)2 + (1)2 }1/2 = 2, Arg z = θ = π/4 and arg z = π/4 + 2kπ, √ k = ±1, ±2, . . . , z = 2(cos(π/4) + i sin(π/4)). √ (d) r = {(2)2 + (−2 3)2 }1/2 = 4, Arg z = θ = −π/3 and arg z = −π/3 + 2kπ, k = ±1, ±2, . . . , z = 4(cos(−π/3) + i sin(−π/3)). EXERCISES 1.4 1. Expand (cos θ + i sin θ)2 and then use trigonometric identities to show that (cos θ + i sin θ )2 = (cos 2θ + i sin 2θ). 2. Give an inductive proof of de Moivre’s theorem (cos θ + i sin θ)n = (cos nθ + i sin nθ ), for n a natural number. 3. Use de Moivre’s theorem to express cos 5θ and sin 5θ in terms of powers of cos θ and sin θ . 4. Use de Moivre’s theorem to express cos 6θ and sin 6θ in terms of powers of cos θ and sin θ. 5. Show by expanding (cos α + i sin α)(cos β + i sin β) and using trigonometric identities that (cos α + i sin α)(cos β + i sin β) = cos(α + β) + i sin(α + β). 6. Show by expanding (cos α + i sin α)/(cos β + i sin β) and using trigonometric identities that (cos α + i sin α)/(cos β + i sin β) = cos(α − β) + i sin(α − β). 7. If z = cos θ + i sin θ = eiθ , show that when n is a natural number, cos(nθ) = 1 1 n z + n 2 z and sin(nθ) = 1 1 zn − n . 2i z Use these results to express cos3 θ sin3 θ in terms of multiple angles of θ. Hint: z = 1/z. ¯ 8. Use the method of Exercise 7 to express sin6 θ in terms of multiple angles of θ. 9. By expanding (z + 1/z)4 , grouping terms, and using the method of Exercise 7, show that cos4 θ = (1/8)(3 + 4 cos 2θ + cos 4θ). 10. By expanding (z − 1/z)5 , grouping terms, and using the method of Exercise 7, show that sin5 θ = (1/16)(sin 5θ − 5 sin 3θ + 10 sin θ). 11. Use the method of Exercise 7 to show that cos3 θ + sin3 θ = (1/4)(cos 3θ + 3 cos θ − sin3 θ + 3 sin θ).
• 22 Chapter 1 Review of Prerequisites In Exercises 12 through 15 express the functions of u, v, and w in modulus-argument form. √ 12. uv, u/v, and v5 , given that u = 2 − 2i and v = 3 + i3 3. √ 13. uv, u/v, and u7 , given that u = −1 − i 3, v = −4 + 4i. √ 14. uv, u/v, and v6 , given that u = 2 − 2i, v = 2 − i2 3. 15. uvw, √ uw/v, and w 3 /u4 , given that u = 2 − 2i, v = 3 − i3 3 and w = 1 + i. √ 16. Express [(−8 + i8 3)/(−1 − i)]2 in modulus–argument form. √ 17. Find in modulus–argument form [(1 + i 3)3 / (−1 + i)2 ]3 . 18. Use the factorization (1 − zn+1 ) = (1 − z)(1 + z + z2 + · · · + zn ) 1.5 with z = eiθ = exp(iθ) to show that n exp(ikθ) = k=1 exp(inθ) − 1 . 1 − exp(−iθ) 19. Use the ﬁnal result of Exercise 18 to show that n exp(ikθ) = k=1 exp[i(n + 1/2)θ] − exp(iθ/2) , exp(iθ/2) − exp(−iθ/2) and then use the result to deduce the Lagrange identity 1 + cos θ + cos 2θ + · · · + cos nθ sin[(n + 1/2)θ] = 1/2 + , for 0 < θ < 2π. 2 sin(θ/2) (z = 1) Roots of Complex Numbers It is often necessary to ﬁnd the n values of z1/n when n is a positive integer and z is an arbitrary complex number. This process is called ﬁnding the nth roots of z. To determine these roots we start by setting w = z1/n , which is equivalent to w n = z. Then, after deﬁning w and z in modulus–argument form as w = ρeiφ and z = reiθ , (9) we substitute for w and z in w n = z to obtain ρ n einφ = r eiθ . It is at this stage, in order to ﬁnd all n roots, that use must be made of the manyvalued nature of the argument of a complex number by recognizing that 1 = e2kπi for k = 0, ±1, ±2, . . . . Using this result we now multiply the right-hand side of the foregoing result by by e2kπi (that is, by 1) to obtain ρ n einφ = reiθ e2kπi = rei(θ +2kπ) . Equality of complex numbers in modulus–argument form means the equality of their moduli and, correspondingly, the equality of their arguments, so applying this to the last result we have ρn = r and nφ = θ + 2kπ, ρ = r 1/n and φ = (θ + 2kπ)/n. showing that Here r 1/n is simply the nth positive root of r : ρ = √ n r.
• Section 1.5 Roots of Complex Numbers 23 Imaginary axis w2 w1 (θ + 4π)/ n 2π/n (θ + )/n 2π w0 θ/n 0 Real axis wn−1 FIGURE 1.6 Location of the roots of z1/n . nth roots of a complex number z Finally, when we substitute these results into the expression for w, we see that the n values of the roots denoted by w0 , w1 , . . . , wn−1 are given by wk = r 1/n {cos[(θ + 2kπ)/n] + i sin[(θ + 2kπ )/n]}, for k = 0, 1, . . . , n − 1. (10) Notice that it is only necessary to allow k to run through the successive integers 0, 1, . . . , n − 1, because the period of the sine and cosine functions is 2π, so allowing k to increase beyond the value n − 1 will simply repeat this same set of roots. An identical argument shows that allowing kto run through successive negative integers can again only generate the same n roots w0 , w1 , . . . , wn−1 . Examination of the arguments of the roots shows them to be spaced uniformly around a circle of radius r 1/n centered on the origin. The angle between the radial lines drawn from the origin to each successive root is 2π/n, with the radial line from the origin to the ﬁrst root w0 making an angle θ/n to the positive real axis, as shown in Fig. 1.6. This means that if the location on the circle of any one root is known, then the locations of the others follow immediately. Writing unity in the form 1 = ei0 shows its modulus to be r = 1 and the principal value of its argument to be θ = 0. Substitution in formula (10) then shows the n roots of 11/n , called the nth roots of unity, to be w0 = 1, w1 = eiπ/n , w2 = ei2π/n , . . . , wn−1 = ei(n−1)π/n . (11) By way of example, the ﬁfth roots of unity are located around the unit circle as shown in Fig. 1.7. If we set ω = w1 , it follows that the nth roots of unity can be written in the form 1, ω, ω2 , . . . , ωn−1 . As ωn = 1 and ωn − 1 = (ω − 1)(1 + ω + ω2 + · · · + ωn−1 ) = 0, as ω1 = 1 we see that the the nth roots of unity satisfy 1 + ω + ω2 + · · · + ωn−1 = 0. (12)
• 24 Chapter 1 Review of Prerequisites Imaginary axis i w1 w2 2π /5 2π /5 w0 2π/5 1 0 Real axis 2π /5 2π /5 w3 w4 FIGURE 1.7 The ﬁfth roots of unity. This result remains true if ω is replaced by any one of the other nth roots of unity, with the exception of 1 itself. EXAMPLE 1.12 Find w = (1 + i)1/3 . √ √ Solution Setting z = 1 + i = 2eiπ/4 shows that r = |z| = 2 and θ = π/4. Substituting these results into formula (1) gives wk = 21/6 {cos[(1/12)(1 + 8k)π ] + i sin[(1/12)(1 + 8k)π ]}, for k = 0, 1, 2. The square root of a complex number ζ = α + iβ is often required, so we now derive a useful formula for its two roots in terms of |ζ |, α and the sign of β. To obtain the result we consider the equation z2 = ζ, where ζ = α + iβ, and let Arg ζ = θ . Then we may write z2 = |ζ |eiθ , and taking the square root of this result we ﬁnd the two square roots z− and z+ are given by z± = ±|ζ |1/2 eiθ/2 = ±|ζ |1/2 {cos(θ/2) + i sin(θ/2)}. Now cos θ = α/|ζ |, but cos2 (θ/2) = (1/2)(1 + cos θ ), and sin2 (θ/2) = (1/2)(1 − cos θ),
• Section 1.5 Roots of Complex Numbers 25 so cos2 (θ/2) = (1/2)(1 + α/|ζ |), and sin2 (θ/2) = (1/2)(1 − α/|ζ |). As −π < θ ≤ π, it follows that in this interval cos(θ/2) is nonnegative, so taking the square root of cos2 (θ/2) we obtain cos(θ/2) = |ζ | + α 2|ζ | 1/2 . However, the function sin(θ/2) is negative in the interval −π < θ < 0 and positive in the interval 0 < θ < π, and so has the same sign as β. Thus, the square root of sin2 (θ/2) can be written in the form sin(θ/2) = sign (β) |ζ | − α 2|ζ | 1/2 . Using these expressions for cos(θ/2) and sin(θ/2) in the square roots z± brings us to the following useful rule. Rule for ﬁnding the square root of a complex number Let z2 = ζ , with ζ = α + iβ. Then the square roots z+ and z− of ζ are given by z+ = z− = − EXAMPLE 1.13 |ζ | + α 2 1/2 |ζ | + α 2 + i sign (β) 1/2 − i sign (β) |ζ | − α 2 |ζ | − α 2 1/2 1/2 . Find the square roots of (a) ζ = 1 + i and (b) ζ = 1 − i. √ Solution (a) ζ = 1 + i so |ζ | = 2, α = 1 and sign(β) = 1, so the square roots of ζ = 1 + i are ⎧ √ ⎫ √ 1/2 1/2 ⎬ ⎨ 2+1 2−1 z± = ± +i . ⎩ ⎭ 2 2 √ (b) ζ = 1 − i, so |ζ | = 2, α = 1 and sign(β) = −1, from which it follows that the square roots of ζ = 1 − i are ⎧ √ ⎫ √ 1/2 1/2 ⎬ ⎨ 2+1 2−1 −i . z± = ± ⎩ ⎭ 2 2 The theorem that follows provides information about the roots of polynomials with real coefﬁcients that proves to be useful in a variety of ways.
• 26 Chapter 1 Review of Prerequisites THEOREM 1.2 Roots of a polynomial with real coefﬁcients Let P(z) = zn + a1 zn−1 + a2 zn−2 + · · · an−1 z + an be a polynomial of degree n in which all the coefﬁcients a1 , a2 , . . . , an are real. Then either all the n roots of P(z) = 0 are real, that is, the n zeros of P(z) are all real, or any that are complex must occur in complex conjugate pairs. Proof The proof uses the following simple properties of the complex conjugate operation. 1. If a is real, then a = a. This result follows directly from the deﬁnition of the complex conjugate operation. 2. If b and c are any two complex numbers, then b + c = b + c. This result also follows directly from the deﬁnition of the complex conjugate operation. 3. If b and c are any two complex numbers, then bc = bc and br = (b)r . We now proceed to the proof. Taking the complex conjugate of P(z) = 0 gives zn + a1 zn−1 + a2 zn−2 + · · · + an−1 z + an = 0, but the ar are all real so ar zn−r = ar zn−r = ar zn−r = ar (z)n−r , allowing the preceding equation to be rewritten as (z)n + a1 (z)n−1 + a2 (z)n−2 + · · · + an−1 z + an = 0. This result is simply P(z) = 0, showing that if z is a complex root of P(z), then so also is z. Equivalently, z and z are both zeros of P(z). If, however, z is a real root, then z = z and the result remains true, so the ﬁrst part of the theorem is proved. The second part follows from the fact that if z = α + iβ is a root, then so also is z = α − iβ, and so (z − α − iβ) and (z − α + iβ) are factors of P(z). The product of these factors must also be a factor of P(z), but (z − α − iβ)(z − α + iβ) = z2 − 2αz + α 2 + β 2 , and the expression on the right is a quadratic in z with real coefﬁcients, so the ﬁnal result of the theorem is established. EXAMPLE 1.14 Find the roots of z3 − z2 − z − 2 = 0, given that z = 2 is a root. Solution If z = 2 is a root of P(z) = 0, then z − 2 is a factor of P(z), so dividing P(z) by z − 2 we obtain z2 + z + 1. The remaining two roots of P(z) = 0 are the (−1 roots of z2 + z + 1 = 0. Solving this quadratic equation we ﬁnd that z =√ ± √ √ i 3)/2, so the three roots of the equation are 2, (−1 + i 3)/2, and (−1 − i 3)/2. For more background information and examples on complex numbers, the complex plane and roots of complex numbers, see Chapter 1 of reference [6.1], Sections 1.1 to 1.5 of reference [6.4], and Chapter 1 of reference [6.6].
• Section 1.6 Partial Fractions 27 EXERCISES 1.5 In Exercises 1 through 8 ﬁnd the square roots of the given complex number by using result (10), and then conﬁrm the result by using the formula for ﬁnding the square root of a complex mumber. 1. 2. 3. 4. −1 + i. 3 + 2i. i. −1 + 4i. 5. 6. 7. 8. 1 + cos(2π/n) + cos(4π/n) + · · · + cos[(2(n − 1)π/n)] = 0 and 2 − 3i. −2 − i. 4 − 3i. −5 + i. sin(2π/n) + sin(4π/n) + · · · + sin[(2(n − 1)π/n)] = 0. In Exercises 9 through 14 ﬁnd the roots of the given complex number. √ 12. (−1 − i)1/3 . 9. (1 + i 3)1/3 . 1/4 13. (−i)1/3 . 10. i . 1/4 14. (4 + 4i)1/4 . 11. (−1) . 15. Find the roots of z3 + z(i − 1) = 0. 16. Find the roots of z3 + i z/(1 + i) = 0. 1.6 17. Use result (12) to show that 18. Use Theorem 1.1 and the representation z = r eiθ to prove that if a and b are any two arbitrary complex numbers, then ab = ab and (a r ) = (a)r . 19. Given z = 1 is a zero of the polynomial P(z) = z3 − 5z2 + 17z − 13, ﬁnd its other two zeros and verify that they are complex conjugates. 20. Given that z = −2 is a zero of the polynomial P(z) = z5 + 2z4 − 4z − 8, ﬁnd its other four zeros and verify that they occur in complex conjugate pairs. 21. Find the two zeros of the quadratic P(z) = z2 − 1 + i, and explain why they do not occur as a complex conjugate pair. Partial Fractions Let N(x) and D(x) be two polynomials. Then a rational function of x is any function of the form N(x)/D(x). The method of partial fractions involves the decomposition of rational functions into an equivalent sum of simpler terms of the type P2 P1 , ,... ax + b (ax + b)2 and Q2 x + R2 Q1 x + R1 , ,..., 2 + Bx + C 2 + Bx + C)2 Ax (Ax where the coefﬁcients are all real together with, possibly, a polynomial in x. The steps in the reduction of a rational function to its partial fraction representation are as follows: STEP 1 Factorize D(x) into a product of linear factors and quadratic factors with real coefﬁcients with complex roots, called irreducible factors. This is the hardest step, and real quadratic factors will only arise when D(x) = 0 has pairs of complex conjugate roots (see Theorem 1.2). Use the result to express D(x) in the form D(x) = (a1 x + b1 )r1 . . . (am x + bm)rm A1 x 2 + B1 x + C1 . . . Ak x 2 + Bk x + Ck sk s1 , where ri is the number of times the linear factor (ai x + bi ) occurs in the factorization of D(x), called its multiplicity, and s j is the corresponding multiplicity of the quadratic factor (Aj x 2 + Bj x + C j ).
• 28 Chapter 1 Review of Prerequisites STEP 2 Suppose ﬁrst that the degree n of the numerator is less than the degree d of the denominator. Then, to every different linear factor (ax + b) with multiplicity r , include in the partial fraction expansion the terms P1 P2 Pr + + ··· + , (ax + b) (ax + b)2 (ax + b)r partial fraction undetermined coefﬁcients where the constant coefﬁcients Pi are unknown at this stage, and so are called undetermined coefﬁcients. STEP 3 To every quadratic factor ( Ax 2 + Bx + C)s with multiplicity s include in the partial fraction expansion the terms Q1 x + R1 Q2 x + R2 Qs x + Rs + + ··· + , 2 + Bx + C) 2 + Bx + C)2 2 + Bx + C)s (Ax (Ax (Ax where the Q j and Rj for j = 1, 2, . . . , s are undetermined coefﬁcients. STEP 4 Take as the partial fraction representation of N(x)/D(x) the sum of all the terms in Steps 2 and 3. STEP 5 Multiply the expression N(x)/D(x) = Partial fraction representation in Step 4 by D(x), and determine the unknown coefﬁcients by equating the coefﬁcients of corresponding powers of x on either side of this expression to make it an identity (that is, true for all x). STEP 6 Substitute the values of the coefﬁcients determined in Step 5 into the expression in Step 4 to obtain the required partial fraction representation. STEP 7 If n ≥ d, use long division to divide the denominator into the numerator to obtain the sum of a polynomial of degree n − d of the form T0 + T1 x + T2 x 2 + · · · + Tn−d x n−d , together with a remainder term in the form of a rational function R(x) of the type just considered. Find the partial fraction representation of the rational function R(x) using Steps 1 to 6. The required partial fraction representation is then the sum of the polynomial found by long division and the partial fraction representation of R(x). EXAMPLE 1.15 Find the partial fraction representations of (a) F(x) = x2 (x + 1)(x − 2)(x + 3) and (b) F(x) = 2x 3 − 4x 2 + 3x + 1 . (x − 1)2 Solution (a) All terms in the denominator are linear factors, so by Step 1 the appropriate form of partial fraction representation is A B C x2 = + + . (x + 1)(x − 2)(x + 3) x+1 x−2 x+3 Cross multiplying, we obtain x 2 = A(x − 2)(x + 3) + B(x + 1)(x + 3) + C(x + 1)(x − 2).
• Section 1.6 Partial Fractions 29 Setting x = −1 makes the terms in B and C vanish and gives A = −1/6. Setting x = 2 makes the terms in A and C vanish and gives B = 4/15, whereas setting x = −3 makes the terms in A and B vanish and gives C = 9/10, so x2 −1 4 9 = + + . (x + 1)(x − 2)(x + 3) 6(x + 1) 15(x − 2) 10(x + 3) (b) The degree of the numerator exceeds that of the denominator, so from Step 7 it is necessary to start by dividing the denominator into the numerator longhand to obtain 2x 3 − 4x 2 + x + 3 3−x = 2x + . 2 (x − 1) (x − 1)2 We now seek a partial fraction representation of (3 − x)/(x − 1)2 by using Step 1 and writing 3−x B A + = . 2 (x − 1) x − 1 (x − 1)2 When we multiply by (x − 1)2 , this becomes 3 − x = A(x − 1) + B. Equating the constant terms gives 3 = −A+ B, whereas equating the coefﬁcients of x gives −1 = Aso that B = 2. Thus, the required partial fraction representation is 2x 3 − 4x 2 + x + 3 2 1 + = 2x + . (x − 1)2 1 − x (x − 1)2 An examination of the way the undetermined coefﬁcients were obtained in (a) earlier, where the degree of the numerator is less than that of the denominator and linear factors occur in the denominator, leads to a simple rule for ﬁnding the undetermined coefﬁcients called the “cover-up rule.” The cover-up rule Let a partial fraction decomposition be required for a rational function N(x)/D(x) in which the degree of the numerator N(x) is less than that of the denominator D(x) and, when factored, let D(x) contain some linear factors (factors of degree 1). Let (x − α) be a linear factor of D(x). Then the unknown coefﬁcient K in the term K/(x − α) in the partial fraction decomposition of N(x)/D(x) is obtained by “covering up” (ignoring) all of the other terms in the partial fraction expansion, multiplying the remaining expression N(x)/D(x) = K/(x − α) by (x − α), and then determining K by setting x = α in the result. To illustrate the use of this rule we use it in case (a) given earlier to ﬁnd Afrom the representation x2 A B C = + + . (x + 1)(x − 2)(x + 3) x+1 x−2 x+3
• 30 Chapter 1 Review of Prerequisites We “cover up” (ignore) the terms involving B and C, multiply through by (x + 1), and ﬁnd A from the result x2 =A (x − 2)(x + 3) completing the square by setting x = −1, when we obtain A = −1/6. The undetermined coefﬁcients B and C follow in similar fashion. Once a partial fraction representation of a function has been obtained, it is often necessary to express any quadratic x 2 + px + q that occurs in a denominator in the form (x + A)2 + B, where A and B may be either positive or negative real numbers. This is called completing the square, and it is used, for example, when integrating rational functions and when ﬁnding inverse Laplace transforms. To ﬁnd A and B we set x 2 + px + q = (x + A)2 + B = x 2 + 2Ax + A2 + B, and to make this an identity we now equate the coefﬁcients of corresponding powers of x on either side of this expression: (coefﬁcients of x 2 ) (coefﬁcients of x) (constant terms) 1 = 1 (this tells us nothing) p = 2A q = A2 + B. Consequently A = (1/2) p and B = q − (1/4) p2 , and so the result obtained by completing the square is x 2 + px + q = [x + (1/2) p]2 + q − (1/4) p2 . If the more general quadratic ax 2 + bx + c occurs, all that is necessary to reduce it to this same form is to write it as ax 2 + bx + c = a[x 2 + (b/a)x + c/a], and then to complete the square using p = b/a and q = c/a. EXAMPLE 1.16 Complete the square in the following expressions: (a) x 2 + x + 1. (b) x 2 + 4x. (c) 3x 2 + 2x + 1. Solution (a) p = 1, q = 1, so A = 1/2, B = 3/4, and hence x 2 + x + 1 = (x + 1/2)2 + 3/4. (b) p = 4, q = 0, so A = 2, B = −4, and hence x 2 + 4x = (x + 2)2 − 4. (c) 3x 2 + 2x + 1 = 3[x 2 + (2/3)x + 1/3] and so p = 2/3, q = 1/3, from which it follows that A = 1/3 and B = 2/9, so 3x 2 + 2x + 1 = 3{(x + 1/3)2 + 2/9}. Further information and examples of partial fractions can be found in any one of references [1.1] to [1.7].
• Section 1.7 Fundamentals of Determinants 31 EXERCISES 1.6 6. (x 2 − 1)/(x 2 + x + 1). 7. (x 3 + x 2 + x + 1)/{(x + 2)2 (x + 1)}. 8. (x 2 + 4)/(x 3 + 3x 2 + 3x + 1). Express the rational functions in Exercises 1 through 8 in terms of partial fractions using the method of Section 1.6, and verify the results by using computer algebra to determine the partial fractions. 1. 2. 3. 4. 5. Complete the square in Exercises 9 through 14. (3x + 4)/(2x 2 + 5x + 2). (x 2 + 3x + 5)/(2x 2 + 5x + 3). (3x − 7)/(2x 2 + 9x + 10). (x 2 + 3x + 2)/(x 2 + 2x − 3). (x 3 + x 2 + x + 1)/[(x + 2)2 (x 2 + 1)]. 1.7 9. x 2 + 4x + 5. 10. x 2 + 6x + 7. 11. 2x 2 + 3x − 6. 12. 4x 2 − 4x − 3. 13. 2 − 2x + 9x 2 . 14. 2 + 2x − x 2 . Fundamentals of Determinants A determinant of order n is a single number associated with an array A of n2 numbers arranged in n rows and n columns. If the number in the ith row and jth column of a determinant is ai j , the determinant of A, denoted by det A and sometimes by |A|, is written a11 a21 det A = |A| = ... an1 a12 a22 ... an2 . . . a1n . . . a2n . ... ... . . . ann (13) It is customary to refer to the entries ai j in a determinant as its elements. Notice the use of vertical bars enclosing the array A in the notation |A| for the determinant of A, as opposed to the use of the square brackets in [A] that will be used later to denote the matrix associated with an array A of quantities in which the number of rows need not be equal to the number of columns. The value of a ﬁrst order determinant det A with the single element a11 is deﬁned as a11 so that det[a11 ] = a11 or, in terms of the alternative notation for a determinant, |a11 | = a11 . This use of the notation |.| to signify a determinant should not be confused with the notation used to signify the absolute value of a number. The second order determinant associated with an array of elements containing two rows and two columns is deﬁned as det A = a11 a21 a12 = a11 a22 − a12 a21 , a22 (14) so, for example, using the alternative notation for a determinant we have 9 −7 3 = 9(−4) − (−7)3 = −15. −4 Notice that interchanging two rows or columns of a determinant changes its sign. We now introduce the terms minor and cofactor that are used in connection with determinants of all orders, and to do so we consider the third order determinant a11 det A = a21 a31 a12 a22 a32 a13 a23 . a33 (15)
• 32 Chapter 1 Review of Prerequisites minors and cofactors The minor Mi j associated with ai j , the element in the ith row and jth column of det A, is deﬁned as the second order determinant obtained from det A by deleting the elements (numbers) in its ith row and jth column. The cofactor Ci j of an element in the ith row and jth column of the det A in (15) is deﬁned as the signed minor using the rule Ci j = (−1)i+ j Mi j . (16) With these ideas in mind, the determinant det A in (15) is deﬁned as 3 det A = a1 j (−1)1+ j det M1 j j=1 = a11 M11 − a12 M12 + a13 M13 . If we introduce the cofactors Ci j , this last result can be written det A = a11 C11 + a12 C12 + a13 C13 , (17) and more concisely as 3 det A = a1 j C1 j . (18) j=1 Result (18), or equivalently (17), will be taken as the deﬁnition of a third order determinant. EXAMPLE 1.17 Evaluate the determinant 1 2 −2 3 −3 1 0 . 1 1 Solution The minor M11 = 1 0 = (1)(1) − (0)(1) = 1, so the cofactor 1 1 C11 = (−1)(1+1) M11 = 1. 2 The minor M12 = −2 0 = (2)(1) − (0)(−2) = 2, so the cofactor 1 C12 = (−1)(1+2) M12 = −2. 2 The minor M13 = −2 1 = (2)(1) − (1)(−2) = 4, so the cofactor 1 C13 = (−1)(1+3) M13 = 4. Using (17) we have 1 2 −2 3 −3 1 0 = (1)C11 + (3)C12 + (−3)C13 = (1)(1) + (3)(−2) + (−3)(4) = −17. 1 1 When expanded, (17) becomes det A = a11 a22 a33 − a11 a32 a23 − a12 a21 a33 + a12 a31 a23 + a13 a21 a32 − a13 a31 a22 ,
• Section 1.7 Fundamentals of Determinants 33 and after regrouping these terms in the form det A = −a21 a12 a33 + a21 a32 a13 + a22 a11 a33 − a22 a31 a13 − a23 a11 a32 + a23 a31 a12 , we ﬁnd that det A = a21 C21 + a22 C22 + a23 C23 . Proceeding in this manner, we can easily show that det A may be obtained by forming the sum of the products of the elements of A and their cofactors in any row or column of det A. These results can be expressed symbolically as follows. Expanding in terms of the elements of the ith row: 3 det A = ai1 Ci1 + ai2 Ci2 + ai3 Ci3 = ai j Ci j . (19) j=1 Laplace expansion theorem Expanding in terms of the elements of the jth column: 3 det A = a1 j C1 j + a2 j C2 j + a3 j C3 j = ai j Ci j . (20) i=1 Results (19) and (20) are the form taken by the Laplace expansion theorem when applied to a third order determinant. The extension of the theorem to determinants of any order will be made later in Chapter 3, Section 3.3. EXAMPLE 1.18 Expand the following determinant (a) in terms of elements of its ﬁrst row, and (b) in terms of elements of its third column: 1 |A| = 1 1 2 0 2 4 2 . 1 Solution (a) Expanding in terms of the elements of the ﬁrst row requires the three cofactors C11 = M11 , C12 = −M12 , and C13 = M13 , where M11 = 0 2 2 = −4, 1 M12 = 1 1 2 = −1, 1 M13 = 1 1 0 = 2, 2 so C11 = (−1)(1+1) (−4) = −4, C12 = (−1)(1+2) (−1) = 1, C13 = (−1)(1+3) (2) = 2, and so |A| = (1)(−4) + (2)(1) + (4)(2) = 6. (b) Expanding in terms of the elements of the third column requires the three cofactors C13 = M13 , C23 = −M23 , and C33 = M33 , where M13 = 1 1 0 = 2, 2 M23 = 1 1 2 = 0, 2 M33 = 1 1 2 = −2, 0 so C13 = (−1)(1+3) (2) = 2, C23 = 0, C33 = (−1)(3+3) (−2) = −2 and so |A| = (4)(2) + (2)(0) + (1)(−2) = 6.
• 34 Chapter 1 Review of Prerequisites Two especially simple third order determinants are of the form a11 det A = 0 0 a12 a22 0 a13 a23 a33 a11 det A = a21 a31 and 0 a22 a32 0 0 . a33 The ﬁrst of these determinants has only zero elements below the diagonal line drawn from its top left element to its bottom right one, and the second determinant has only zero elements above this line. This diagonal line in every determinant is called the leading diagonal. The value of each of the preceding determinants is easily seen to be given by the product a11 a22 a33 of the terms on its leading diagonal. Simpler still in form is the third order determinant a11 det A = 0 0 0 a22 0 0 0 = a11 a22 a33 , a33 whose value a11 a22 a33 is again the product of the elements on the leading diagonal. For another approach to the elementary properties of determinants, see Appendix A16 of reference [1.2], and Chapter 2 of reference [2.1]. EXERCISES 1.7 Evaluate the determinants in Exercises 1 through 6 (a) in terms of elements of the ﬁrst row and (b) in terms of elements of the second column. −1 2 −1 1 1. 1 1 5 7 −1 1 . 2 1 2 2. 2 5 1 6 1 −1 −1 . −1 1 5. 2 4 5 3. 1 3 2 2 1 4 1 . 5 6. 4. 3 1 3 0 1 3 6 4 . 1 −6 3 . 21 1 5 −1 2 1 −3 . −4 1 1 7. On occasion the elements of a matrix may be functions, in which case the determinant may be a function. Evaluate the functional determinant 1 0 0 0 sin x cos x 0 − cos x . sin x 8. Determine the values of λ that make the following determinant vanish: 3−λ 2 2 2 2 2−λ 0 . 0 4−λ Hint: This is a polynomial in λ of degree 3. 9. A matrix is said to be transposed if its ﬁrst row is written as its ﬁrst column, its second row is written as its second column . . . , and its last row is written as its last column. If the determinant is |A|, the determinant of AT , the transpose matrix A, is denoted by |AT |. Write out the expansion of |A| using (17) and reorder the terms to show that |A| = |AT |. 10. Use elimination to solve the system of linear equations a11 x1 + a12 x2 = b1 a21 x1 + a22 x2 = b2 for x1 and x2 , in which not both b1 and b2 are zero, and show that the solution can be written in the form x1 = D1 /|A| and x2 = D2 /|A|, provided |A| = 0, where |A| is the determinant of the matrix of coefﬁcients of the system |A| = a11 a12 b a a b , D1 = 1 12 , and D2 = 11 1 . a21 a22 b2 a22 a21 b2 Notice that D1 is obtained from |A| by replacing its ﬁrst column by b1 and b2 , whereas D2 is obtained from |A| by replacing its second column by b1 and b2 . This is Cramer’s rule for a system of two simultaneous equations. Use this method to ﬁnd the solution of x1 + 5x2 = 3 7x1 − 3x2 = −1.
• Section 1.8 Continuity in One or More Variables where |A| is the determinant of the matrix of coefﬁcients and Di is the determinant obtained from |A| by replacing its ith column by b1 , b2 , and b3 for i = 1, 2, 3. This is Cramer’s rule for a system of three simultaneous equations, and the method generalizes to a system of n linear equations in n unknowns. Use this method to ﬁnd the solution of 11. Repeat the calculation in Exercise 10 using the system of equations a11 x1 + a12 x2 + a13 x3 = b1 a21 x1 + a22 x2 + a23 x3 = b2 a31 x1 + a32 x2 + a33 x3 = b3 , in which not all of b1 , b2 , and b3 are zero, and show that provided |A| = 0, x1 = D1 /|A|, 1.8 x2 = D2 /|A|, and 35 x1 + 2x2 − x3 = 2 x1 − 3x2 − 2x3 = −1 2x1 + x2 + 2x3 = 1. x3 = D3 /|A|, Continuity in One or More Variables If the function y = f (x) is deﬁned in the interval a ≤ x ≤ b, the interval is called the domain of deﬁnition of the function. The function f is said to have a limit at a point c in a ≤ x ≤ b, written limx→c f (x) = L, if for every arbitrarily small number ε > 0 there exists a number δ > 0 such that | f (x) − L| < ε when |x − c| < δ. (21) This technical deﬁnition means that as x either increases toward c and becomes arbitrarily close to it, or decreases toward c and becomes arbitrarily close to it, so f (x) approaches arbitrarily close to the value L. Notice that it is not necessary for f (x) to be deﬁned at x = c, or, if it is, that f (c) assumes the value L. If f (x) has a limit L as x → c and in addition f (c) = L, so that lim f (x) = f (c) = L, x→c continuity from the right then the function f is said to be continuous at c. It must be emphasized that in this deﬁnition of continuity the limiting operation x → c must be true as x tends to c from both the left and right. It is convenient to say that x approaches c from the left when it increases toward c and, correspondingly, to say that x approaches c from the right when it decreases toward it. The function f is continuous from the right at x = c if lim f (x) = f (c), x→c+ continuity from the left lim f (x) = f (c), (24) where now x → c− means that x increases toward c, causing x to tend to c from the left. The relationship among deﬁnitions (22), (23), and (24) is that f is continuous at the point c if lim f (x) = lim+ f (x) = f (c). x→c− continuous function (23) where the notation x → c+ means that x decreases toward c, causing x to tend to c from the right. Similarly, f is continuous from the left at x = c if x→c− continuity at x = c (22) x→c (25) When expressed in words, this says that f is continuous at x = c if the limits of f as x tends to c from both the left and right exist and, furthermore, the limits equal the functional value f (c). A function f that is continuous at all points of a ≤ x ≤ b is said to be a continuous function on that interval. Graphically, a continuous function on a ≤ x ≤ b is a function whose graph is unbroken but not necessarily smooth. A function f is said
• 36 Chapter 1 Review of Prerequisites Discontinuous Continuous from the left at x = d y Continuous at x = c y Discontinuous at x = c k2 f (c) k1 y = f(x ) y = f (x ) Continuous from the right 0 a c d b x 0 a c (a) b x (b) FIGURE 1.8 (a) A continuous function for a < x < b. (b) A discontinuous function. smooth function continuous and piecewise smooth function discontinuous function to be smooth over an interval if at each point of the graph the tangent lines to the left and right of the point are the same. Figure 1.8a shows the graph of a continuous function that is smooth over the intervals a ≤ x < c and c < x < b but has different tangent lines to the immediate left and right of x = c where the function is not smooth. A function such as this is said to be continuous and piecewise smooth over the interval a ≤ x ≤ b. A function f is said to be discontinuous at a point c if it is not continuous there. For a jump discontinuity we have lim f (x) = k1 x→c− piecewise continuity and lim f (x) = k2 , x→c+ but k1 = k2 . (26) A function f is said to have a removable discontinuity at a point c if k1 = k2 in (26), but f (c) = k1 , as at the point c2 in Fig. 1.9. An example of a discontinuous function is shown in Fig. 1.8b where a jump discontinuity occurs at x = c. A function f is said to be piecewise continuous on an interval a ≤ x ≤ b if it is continuous on a ﬁnite number of adjacent subintervals, but discontinuous at the end points of the subintervals, as shown in Fig. 1.9. The notion of continuity of a function of several variables is best illustrated by considering a function f (x, y) of the two independent variables x and y. The function f deﬁned in some region of the (x, y)-plane D, say, is said to be continuous y Discontinuous at c1 Discontinuous at c3 y = f (x ) Discontinuous at c2 0 a c1 c2 FIGURE 1.9 A piecewise continuous function. c3 b x
• Section 1.8 Continuity in One or More Variables 37 at the point (a, b) in D if continuity of f(x, y) lim x→a,y→b discontinuity of f(x, y) f (x, y) = f (a, b), (27) and to be discontinuous otherwise. In this deﬁnition of continuity, it is important to recognize that a general point P at (x, y) is allowed to tend to the point (a, b) in D along any path in the (x, y)-plane that lies in D. Expressed differently, f will only be continuous at (a, b) if the limit in (27) is independent of the way in which the point (x, y) approaches the point (a, b). When this is true for all points in D, the function f is said to be continuous in D. The function f is, for instance, discontinuous at (a, b) if lim x→a,y→b f (x, y) = k, but f (a, b) = k. Sufﬁcient for showing that a function f is discontinuous at a point (a, b) is by demonstrating that two different limiting values of f are obtained if the point P at (x, y) is allowed to tend to (a, b) along two different straight-line paths. This approach can be used to show that the function xy f (x, y) = 2 x + a 2 y2 has no limit at the origin. If we allow the point P at (x, y) to tend to the origin along the straight line y = kx, with k an arbitrary constant, the function f becomes f (x, kx) = k , 1 + a 2 k2 and it is seen from this that f is constant along each such line. However, the value of f on each line, and hence at the origin, depends on k, so f has no limit at the origin and so is discontinuous at that point, though f is deﬁned and continuous at all other points of the (x, y)-plane. An example of a function f (x, y) that is continuous everywhere except at points along a curve in the (x, y)-plane is shown in Fig. 1.10. z = f (x, y ) us o tinu con Dis long Γ a z y 0 Γ x D FIGURE 1.10 A function f (x, y) continuous everywhere except at points on .
• 38 Chapter 1 Review of Prerequisites The extension of these deﬁnitions to functions of n variables is immediate and so will not be discussed. Discussions on continuity and its consequences can be found in any one of references [1.1] to [1.7]. 1.9 Differentiability of Functions of One or More Variables The function f (x) deﬁned in a ≤ x ≤ b is said to be differentiable with the derivative f (c) at a point c inside the interval if the following limit exists: lim h→0 differentiability of f(x) left- and right-hand derivatives of f(x) f (c + h) − f (c) = f (c). h (28) Here, as in the deﬁnition of continuity, for f to be differentiable at point c the limit must remain unchanged as h tends to zero through both positive and negative values. The function f is said to be differentiable in the interval a ≤ x ≤ b if it is differentiable at every point in the interval. When f is differentiable at a point c with derivative f (c), the number f (c) is the gradient, or slope, of the tangent line to the graph at the point (c, f (c)). A function with a continuous derivative throughout an interval is said to be a smooth function over the interval. The function f will be said to be nondifferentiable at any point c where the limit in (28) does not exist. Even when a function f is nondifferentiable at a point, it is possible that a special form of derivative can still be deﬁned to the left and right of the point if the requirement that the limit in (28) exists as h → 0 through both positive and negative values is relaxed. The function f has a right-hand derivative at a if the limit lim+ h→0 f (a + h) − f (a) h (29) exists, and a left-hand derivative at b if the limit lim h→0− ﬁrst order partial derivatives of f(x, y) f (b + h) − f (b) . h (30) exists. When c is a speciﬁc point, f (c) is a number, but when x is a variable, f (x) becomes a function. Left- and right-hand derivatives are illustrated in Fig. 1.11. An important consequence of differentiability is that differentiability implies continuity, but the converse is not true. The ﬁrst order partial derivative with respect to x of the function f (x, y) of the two independent variables x and y at the point (a, b) is the number deﬁned by lim h→0 f (a + h, b) − f (a, b) , h (31)
• Section 1.9 Differentiability of Functions of One or More Variables 39 left-hand derivative equal to slope of line y f (b) f (c) y = f (x) right-hand derivative equal to slope of line right-hand derivative equal to slope of line left-hand derivative equal to slope of line f (a) 0 a c b x FIGURE 1.11 Left- and right-hand derivatives as tangent lines. provided the limit exists. The value of this partial derivative is denoted either by ∂ f/∂ x at (a, b), or by fx (a, b). The corresponding partial derivative at a general point (x, y) is the function fx (x, y). Similarly, the ﬁrst order partial derivative with respect to y of the function f (x, y) at the point (a, b) is the number deﬁned by the limit lim k→0 second order partial derivatives of f(x, y) f (a, b + k) − f (a, b) , k (32) provided the limit exists. The value of this partial derivative is denoted either by ∂ f/∂ y at (a, b), or by fy (a, b). At a general point (x, y) this partial derivative becomes the function fy (x, y). Higher order partial derivatives are deﬁned in a similar fashion leading, for example, to the second order partial derivatives ∂ 2 f/∂ x 2 = ∂/∂ x(∂ f/∂ x), ∂ 2 f/∂ y2 = ∂/∂ y(∂ f/∂ y), ∂ 2 f/∂ x∂ y = ∂/∂ y(∂ f/∂ x), and ∂ 2 f/∂ y∂ x = ∂/∂ x(∂ f/∂ y). A more compact notation for these same derivatives is fxx , fyy , fxy , and fyx , so that, for example fyx = ∂ 2 f/∂ y∂ x and fyy = ∂ 2 f/∂ y2 . mixed partial derivatives THEOREM 1.3 The derivatives fxy and fyx are called mixed partial derivatives, and their relationship forms the statement of the next theorem, the proof of which can be found in any one of references [1.1] to [1.7]. Equality of mixed partial derivatives Let f, fx , fxy , and fyx all be deﬁned and continuous at a point (a, b) in a region. Then fxy (a, b) = fyx (a, b).
• 40 Chapter 1 Review of Prerequisites total differential This result, given conditions for the equality of mixed partial derivatives, is an important one, and use will be made of it on numerous occasions as, for example, in Chapter 18 when second order partial differential equations are considered. If z = f (x, y), the total differential dz of f is deﬁned as dz = (∂ f/∂ x) dx + (∂ f/∂ y) dy, (33) where dz, dx, and dy are differentials. Here, a differential means a small quantity, and the differential dz is determined by (33) when the differentials dx and dy are speciﬁed. When ∂ f/∂ x and ∂ f/∂ y are evaluated at a speciﬁc point (a, b), result (33) provides a linear approximation to f (x, y) near to the point (a, b). Although ﬁnite, the limits of the quotients of the differentials dz ÷ dx and dy ÷ dx as the differential dx → 0 are such that they become the values of the derivatives dz/dx and dy/dx, respectively, at a point (x, y) where ∂ f/∂ x and ∂ f/∂ y are evaluated. 1.10 Tangent Line and Tangent Plane Approximations to Functions tangent line approximation Let y = f (x) be deﬁned in the interval a ≤ x ≤ b and be differentiable throughout it. Then a tangent line (linear) approximation to f near a point x0 in the interval is given by yT = f (x0 ) + (x − x0 ) f (x0 ). (34) This linear expression approximates the function f close to x0 by the tangent to the graph of y = f (x) at the point (x0 , f (x0 )). This simple approximation has many uses; one will be in the Euler and modiﬁed Euler methods for solving initial value problems for ordinary differential equations developed in Chapter 19. EXAMPLE 1.19 Find a tangent line approximation to y = 1 + x 2 + sin x near the point x = α. Solution Setting x0 = α and substituting into (34) gives y ≈ 1 + α 2 + sin α + (x − α)(2α + cos α) for x close to α. tangent plane approximation Let the function z = f (x, y) be deﬁned in a region Dof the (x, y)-plane where it possesses continuous ﬁrst order partial derivatives ∂ f/∂ x and ∂ f/∂ y. Then a tangent plane (linear) approximation to f near any point (x0 , y0 ) in D is given by zT = f (x0 , y0 ) + (x − x0 ) fx (x0 , y0 ) + (y − y0 ) fy (x0 , y0 ). (35) This linear expression approximates the function f close to the point (x0 , y0 ) by a plane that is tangent to the surface z = f (x, y) at the point (x0 , y0 , f (x0 , y0 )). The tangent plane approximation in (35) is an immediate extension to functions of two variables of the tangent line approximation in (34), to which it simpliﬁes when only one independent variable is involved.
• Section 1.11 Integrals 41 Both of these approximations are derived from the appropriate Taylor series expansions of functions discussed in Section 1.12 by retaining only the linear terms. EXAMPLE 1.20 Find the tangent plane approximation to the function z = x 2 − 3y2 near the point (1, 2). Solution Setting x0 = 1, y0 = 2 and substituting into (35) gives z ≈ −11 + 2(x − 1) − 12(y − 2) for (x, y) close to (1, 2). 1.11 Integrals indeﬁnite and deﬁnite integrals A differentiable function F(x) is called an antiderivative of the function f (x) on some interval if at each point of the interval dF/dx = f (x). If F(x) is any antiderivative of f (x), the indeﬁnite integral of f (x), written f (x) dx, is f (x) dx = F(x) + c, where c is an arbitrary constant called the constant of integration. The function f (x) is called the integrand of the integral. Thus, an indeﬁnite integral is a function, and an antiderivative and an indeﬁnite integral can only differ by an arbitrary additive constant. b The expression a f (x) dx, called a deﬁnite integral, is a number and may be interpreted geometrically as the area between the graph of f (x) and the lines x = a and x = b, for b > a, with areas above the x-axis counted as positive and those below it as negative. The relationship between deﬁnite integrals that are numbers and indeﬁnite integrals that are functions is given in the next theorem, included in which is also the mean value theorem for integrals. See the references at the end of the chapter for proofs and further information. THEOREM 1.4 Fundamental theorem of integral calculus and the mean value theorem for integrals If F (x) is continuous in the interval a ≤ x ≤ b, throughout which F (x) = f (x), then b f (x) dx = F(b) − F(a). a Another result is b f (x) dx = (b − a) f (ξ ), a if f is differentiable, where the number ξ , although unknown, lies in the interval a < ξ < b. In this form the result is called the mean value theorem for integrals.
• 42 Chapter 1 Review of Prerequisites An improper integral is a deﬁnite integral in which one or more of the following cases arises: (a) the integrand becomes inﬁnite inside or at the end of the interval of integration, or (b) one (or both) of the limits of integration is inﬁnite. Types of Improper Integrals Case (a) convergence and divergence of improper integrals If the integrand of an integral becomes inﬁnite at a point c inside the interval of integration a ≤ x ≤ b as shown in Fig. 1.12a, the improper integral is said to exist if the limits in (36) exist. When the improper integral exists it is said to converge to the (ﬁnite) value of the following limit: b h→0 a a Cauchy principal value c−h f (x) dx = lim f (x) dx + lim b k→0 c+k f (x) dx. (36) In this deﬁnition h > 0 and k > 0 are allowed to tend to zero independently of each other. If, when the limit is taken, the integral is either inﬁnite or indeterminate, the integral is said to diverge. Some integrals of this type diverge when h and k are allowed to tend to zero independently of each other, but converge when the limit is taken with h = k, in which case the result of the limit is called the Cauchy principal value of the integral. Integrals of this type arise frequently when certain types of deﬁnite integral are evaluated in the complex plane by means of contour integration (see Chapter 15, Section 15.5). Case (b) If a limit of integration in a deﬁnite integral is inﬁnite, say the upper limit as shown in Fig. 1.12b, then, when it exists, the improper integral is said to converge to the value of the limit ∞ R f (x) dx = lim R→∞ a a y f (x) dx, (37) y y = f (x) y = f(x) 0 a c (a) b x 0 x a (b) FIGURE 1.12 (a) f (x) is inﬁnite inside the interval of integration. (b) The interval of integration is inﬁnite in length.
• Section 1.12 Taylor and Maclaurin Theorems 43 and the integral is divergent if the limit is either inﬁnite or indeterminate. If both limits are inﬁnite, the improper integral is said to converge to the value of the limit ∞ f (x) dx = −∞ R lim R→∞,S→∞ −S f (x) dx (38) when it exists, and the integral is said to be divergent if the limit is either inﬁnite or indeterminate. In (38) R and S are allowed to tend to inﬁnity independently of each other. Integrals of this type also have Cauchy principal values if the foregoing process leads to divergence, but the integrals are convergent when the limit is taken with R = S. Integrals of this type also occur when certain real integrals are evaluated by means of contour integration (see Chapter 15, Section 15.5). Elementary examples of convergent improper integrals of the types shown in (36) to (38) are 1 0 ∞ 1 x p − x− p dx = − π cot pπ, x−1 p exp(−x) sin xdx = 1/2 ∞ and 0 THEOREM 1.5 ( p2 < 1), −∞ dx = π. 1 + x2 Differentiation under the integral sign — Leibniz’ rule If ξ (t), η(t), dξ/dt, dη/dt, f (x, t), and ∂ f/∂t are continuous for t0 ≤ t ≤ t1 and for x in the interval of integration, then η(t) d dt ξ (t) η(t) f (x, t) dx = ξ (t) dη dξ ∂ f (x, t) dx + f (η(t), t) − f (ξ (t), t) . ∂t dt dt This theorem is used, for example, in Chapter 18 when discussing discontinuous solutions of a class of partial differential equations called conservation laws. Extensions of the theorem to functions of more variables are developed in Chapter 12, Section 12.3, where certain vector integral theorems are developed, and applications of the results of that section to ﬂuid mechanics are to be found in Chapter 12, Section 12.4. An application of Theorem 1.5 that is easily checked by direct calculation is d dt t2 2t (x 2 + t) dx = t2 dx + (t 4 + t) · 2t − (4t 2 + t) · 2 = 2t 5 − 5t 2 − 4t. 2t A proof of Leibniz’ rule can be found, for example, in Chapter 12 of reference [1.6]. 1.12 Taylor and Maclaurin Theorems THEOREM 1.6 Taylor’s theorem for a function of one variable Let a function f (x) have derivatives of all orders in the interval a < x < b. Then for each positive integer n and
• 44 Chapter 1 Review of Prerequisites each x0 in the interval f (x) = f (x0 ) + (x − x0 ) f (1) (x0 ) + + (x − x0 )2 (2) f (x0 ) + · · · 2! (x − x0 )n (n) f (x0 ) + Rn+1 (x), n! where f (r ) (x) = dr f/dxr , and the remainder term Rn+1 (x) is given by Rn+1 (x) = (x − x0 )n+1 (n+1) f (ξ ), (n + 1)! for some ξ between x0 and x. Taylor polynomial Maclaurin’s theorem Taylor’s theorem becomes the Taylor series for f (x) when n is allowed to become inﬁnite, and if the remainder term is neglected in Taylor’s theorem the result is called the Taylor polynomial approximation to f (x) of degree n. The Taylor polynomial of degree 1 is simply the tangent line approximation to f at x0 given in (34). Taylor’s theorem reduces to Maclaurin’s theorem if x0 = 0, and if we allow n to become inﬁnite in Maclaurin’s theorem, it becomes the Maclaurin series for f (x). A special case of Theorem 1.6 arises when Taylor’s theorem is terminated with the term R1 (x), corresponding to n = 0, because the result can be written f (x) − f (x0 ) = f (ξ ), x − x0 mean value theorem (39) with ξ between x0 and x, and in this form it is called the mean value theorem for derivatives (see the last result of Theorem 1.4). A Taylor series is an example of an inﬁnite series called a power series, the general form of which is ∞ an (x − x0 )n = a0 + a1 (x − x0 ) + a2 (x − x0 )2 + · · · . (40) n=0 In (40) the quantity x is a variable, the numbers ai are the coefﬁcients of the power series, the constant x0 is called the center of the series, or the point about which the series is expanded, and unless otherwise stated, x, x0 , and the ai are real numbers, so the power series is a function of x. A power series is said to converge for a given value of x if the sum of the inﬁnite series for this value of x is ﬁnite. If the sum is inﬁnite, or is not deﬁned, the power series will be said to diverge for that value of x. Power series converge in an interval x0 − R < x < x0 + R, where the number R is called the radius of convergence of the series. Expressions for R are derived in Section 15.1. The interval x0 − R < x < x0 + R is called the interval of convergence of the power series. A power series converges for all x inside the interval of convergence and diverges for all x outside it, and the series may, or may not, converge at the end points of the interval. The convergence properties of power series are shown diagramatically in Fig. 1.13, and results (40) and combining expressions for R with
• Section 1.12 Taylor and Maclaurin Theorems Interval of Convergence Divergence x0 − R x0 45 Divergence x0 + R FIGURE 1.13 Interval of convergence of a power series with center x0 . (40) gives the following theorem (see the references at the end of the chapter for real variable proofs of the following results and for more information). THEOREM 1.7 Ratio test and nth root test for the convergence of power series The power series ∞ an (x − x0 )n = a0 + a1 (x − x0 ) + a2 (x − x0 )2 + · · · n=0 radius and interval of convergence converges in the interval of convergence x0 − R < x < x0 + R, where the radius of convergence R is determined by either of the formulas (a) R = 1/ lim |an+1 /an | n→∞ or (b) R = 1/ lim |an |1/n . n→∞ The power series will diverge outside the interval of convergence, and its behavior at the ends of the interval of convergence must be determined separately. A simple result on the convergence of a series that is often useful is the alternating series test. An alternating series is so named because the signs of successive terms of the series alternate in sign. THEOREM 1.8 The alternating series test for convergence The alternating series converges if an > 0 and an+1 < an for all n and limn→∞ an = 0. ∞ n+1 an n=1 (−1) The following theorem on the differentiation and integration of power series is often needed, and it is a real variable form of a result proved later in Chapter 15 when complex power series are studied. THEOREM 1.9 Differentiation and integration of power series Let a power series have an interval of convergence x0 − R < x < x0 + R. Then the series may be differentiated and integrated term by term, and in each case the resulting series will have the same interval of convergence as the original series. In addition, within an interval of convergence common to any two power series, the series may be scaled by a constant and added or subtracted term by term and the resulting power series will have the same common interval of convergence. The simplest form of Taylor’s theorem for a function of two variables that ﬁnds many applications is given in the next theorem. THEOREM 1.10 Taylor’s theorem for a function of two variables Let f (x, y) be deﬁned for a < x < b and c < y < d and have continuous partial derivatives up to and including
• 46 Chapter 1 Review of Prerequisites those of order 2. Then for x0 and y0 any points such that a < x0 < b and c < y0 < d, f (x, y) = f (x0 , y0 ) + (x − x0 ) fx (x0 , y0 ) + (y − y0 ) fy (x0 , y0 ) 1 (x − x0 )2 fxx (x0 + ξ, y0 + η) + 2(x − x0 )(y − y0 ) + 2! × fxy (x0 + ξ, y0 + η)(y − y0 )2 fyy (x0 + ξ, y0 + η) , where the numbers ξ and η are unknown, but ξ lies between x0 and x and η lies between y0 and y. The group of second order partial derivatives in Theorem 1.10 forms the remainder term, and when these derivatives are ignored, the result reduces to the tangent plane approximation to f (x, y) at the point (x0 , y0 ) given in (35). More information on Taylor’s theorem and series can be found, for example, in reference [1.2]. 1.13 Cylindrical and Spherical Polar Coordinates and Change of Variables in Partial Differentiation Mathematical problems formulated using a particular coordinate system, such as cartesian coordinates, often need to be reexpressed in terms of a different coordinate system in order to simplify the task of ﬁnding a solution. When partial derivatives occur in the formulation of problems, it becomes necessary to know how they transform when a different coordinate system is used. The fundamental theorem governing the transformation of partial derivatives under a change of variables takes the following form (see the references at the end of the chapter for the proof of Theorem 1.11 and for more examples of its use). THEOREM 1.11 Change of variables in partial differentiation Let f (x1 , x2 , . . . , xn ) be a differentiable function with respect to the n independent variables x1 , x2 , . . . , xn , and let the n new independent variables u1 , u2 , . . . , un be determined in terms of x1 , x2 , . . . , xn by x1 = X1 (u1 , u2 , . . . , un ), x2 = X2 (u1 , u2 , . . . , un ), . . . , xn = Xn (u1 , u2 , . . . , un ), where X1 , X2 , . . . , Xn are differentiable functions of their arguments. Then, if as a result of the change of variables the function f (x1 , x2 , . . . , xn ) becomes the function F(X1 , X2 , . . . , Xn ), and using chain rules we have ∂F ∂ f ∂ X1 ∂ f ∂ X2 ∂ f ∂ Xn = + +···+ ∂u1 ∂ x1 ∂u1 ∂ x2 ∂u1 ∂ xn ∂u1 ∂F ∂ f ∂ X1 ∂ f ∂ X2 ∂ f ∂ Xn = + +···+ ∂u2 ∂ x1 ∂u2 ∂ x2 ∂u2 ∂ xn ∂u2 .................................... ∂ f ∂ X1 ∂ f ∂ X2 ∂ f ∂ Xn ∂F = + +···+ . ∂un ∂ x1 ∂un ∂ x2 ∂un ∂ xn ∂un (41)
• Section 1.13 Cylindrical and Spherical Polar Coordinates and Change of Variables in Partial Differentiation 47 To ﬁnd higher order partial derivatives it is necessary to express the relationships between the operations of differentiation in the two coordinate systems, rather than between the actual derivatives themseves. This can be accomplished by rewriting the results of Theorem 1.11 in the form of partial differential operators as follows: ∂ ∂ X1 ∂ ∂ X2 ∂ ∂ Xn ∂ ≡ + + ··· + ∂u1 ∂u1 ∂ x1 ∂u1 ∂ x2 ∂u1 ∂ xn ∂ ∂ X1 ∂ ∂ X2 ∂ ∂ Xn ∂ ≡ + + ··· + ∂u2 ∂u2 ∂ x1 ∂u2 ∂ x2 ∂u2 ∂ xn .................................... (42) ∂ ∂ X1 ∂ ∂ X2 ∂ ∂ Xn ∂ ≡ + + ··· + . ∂un ∂un ∂ x1 ∂un ∂ x2 ∂un ∂ xn When expressed in this form the relationships between the partial differentiation operations ∂/∂ x1 , ∂/∂ x2 , . . . , ∂/∂ xn and ∂/∂u1 , ∂/∂u2 , . . . , ∂/∂un become clear. This interpretation is needed when ﬁnding higher order partial derivatives such as ∂ 2 F/∂u2 ∂u1 , because ∂ ∂2 F = ∂u2 ∂u1 ∂u1 ∂F ∂u2 = ∂ X1 ∂ ∂ X2 ∂ ∂ Xn ∂ + + ··· + ∂u1 ∂ x1 ∂u1 ∂ x2 ∂u1 ∂ xn ∂F ∂u2 . An important combination of partial derivatives that occurs throughout physics and engineering is called the Laplacian of a function. When a twice differentiable function f (x, y, z) of the cartesian coordinates x, y, and z is involved, the Laplacian of f , denoted by f and sometimes by ∇ 2 f , read “del squared f ,” takes the form f = ∇2 f = ∂2 f ∂2 f ∂2 f + 2 + 2. 2 ∂x ∂y ∂z (43) Cylindrical Polar Coordinates (r, θ, z) The cylindrical polar coordinate system (r, θ, z) is illustrated in Fig. 1.14, and its relationship to cartesian coordinates is given by x = r cos θ, y = r sin θ, z = z, with 0 ≤ θ < 2π and r ≥ 0. (44) Spherical Polar Coordinates (r, φ, θ) The spherical polar coordinate system (r, φ, θ ) shown in Fig. 1.15 is related to cartesian coordinates by x = r sin θ cos φ, y = r sin θ sin φ, z = r cos θ, with 0 ≤ θ ≤ π, 0 ≤ φ < 2π. (45) The derivation of the formulas for the change of variables in functions of several variables can be found in any one of references [1.1] to [1.7], where cylindrical and
• 48 Chapter 1 Review of Prerequisites z z P (r, θ, z) P (r, φ, θ) θ z r 0 0 θ r y y x P Q P y x φ y x z x FIGURE 1.14 Cylindrical polar coordinates (r, θ, z). FIGURE 1.15 Spherical polar coordinates (r, φ, θ ). spherical polar coordinates are also discussed. Information on general orthogonal coordinate systems can be found in references [G.3] and [2.3]. EXERCISES 1.13 1. By making the change of variables x = r cos θ, y = r sin θ, z = z, in the function f (x, y, z), when it becomes the function F(r, θ, z), show that in cylindrical polar coordinates ∂F ∂f ∂f = cos θ + sin θ , ∂r ∂x ∂y ∂f ∂f ∂F ∂f ∂F = −r sin θ + r cos θ , = . ∂θ ∂x ∂y ∂z ∂z 2. Use the results of Exercise 1 to show that in cylindrical polar coordinates the Laplacian ∂ f ∂ f ∂ f + + 2 becomes ∂ x2 ∂ y2 ∂z 2 ∂ F 1 ∂2 F 1 ∂F ∂2 F F = + 2 + + , 2 2 ∂r r ∂r r ∂θ ∂z2 and hence that an equivalent form of F is f = F= 1 ∂ r ∂r 2 r ∂F ∂r 2 + 2 1 ∂ r ∂θ ∂F ∂θ + ∂ ∂F r ∂z ∂z becomes F(r, φ, θ ), show that in spherical polar coordinates ∂F ∂f ∂f ∂f = sin θ cos φ + sin φ sin θ + cos φ ∂r ∂x ∂y ∂z ∂f ∂f ∂f ∂F = r cos φ cos θ + r cos φ sin θ − r sin φ ∂φ ∂x ∂y ∂z ∂f ∂f ∂F = −r sin φ sin θ + r sin φ cos θ . ∂z ∂x ∂y 4. Use the results of Exercise 3 to show that in spherical polar coordinates the Laplacian f = ∂2 f ∂2 f ∂2 f + + 2 2 2 ∂x ∂y ∂z becomes . 3. By making the change of variable x = r sin θ cos φ, y = r sin θ sin φ, z = r cos θ in the function f (x, y, z), when it F = 1 1 ∂ ∂F r2 + r 2 ∂r ∂r r 2 sin2 θ 1 ∂F ∂ + 2 sin θ . r sin θ ∂θ ∂θ ∂2 F ∂φ 2
• Section 1.14 1.14 Inverse Functions and the Inverse Function Theorem 49 Inverse Functions and the Inverse Function Theorem In mathematics and its applications it is often necessary to ﬁnd the inverse of a function y = f (x) so x can be expressed in the form x = g(y), and when this can be done the function g is called the inverse of f and is such that y = f (g(y)). When f is an arbitrary function its inverse is often denoted by f −1 , and this superscript notation is also used to denote the inverse of trigonometric functions so if, for example, y = sin x, the inverse sine function is written sin−1 , so that x = sin−1 y. However, the notation y = arcsin y is also used with the understanding that the notations arcsin and sin−1 are equivalent. A trivial example of a function whose inverse can be found unambiguously is y = ax + b, because provided a = 0 we can write x = (y − b)/a for all x and y. This is not the case, however, when trigonometric functions are involved, because the function y = sin x will give a unique value of y for any given x, but given y there are inﬁnitely many values of x for which y = sin x. This and similar inverse trigonometric functions are considered in elementary calculus courses. There the multivalued nature of the inverse sine function is resolved by restricting it to make y lie in a speciﬁc interval chosen so that one y corresponds to one x and, conversely, one x corresponds to one y. This situation is described by saying that the relationship between x and y is one-to-one. Speciﬁcally, in the case of the sine function, this is accomplished by requiring that if x = sin y, the inverse function y = Arcsin x is restricted so its principal value lies in the interval −π/2 ≤ Arcsinx ≤ π/2, where the domain of deﬁnition of the inverse function is −1 ≤ x ≤ 1. A different possibility that arises frequently is when x and y are related by an equation of the form f (x, y) = 0 from which it is impossible to extract either x as a function of y, or y as a function of x in terms of known functions. A typical example of this type is f (x, y) = x 2 − 2y2 − sin xy. To make matters precise, if x and y are related by an equation f (x, y) = 0, then if a function y = g(x) exists such that f (x, g(x)) = 0, the function y = g(x) is said to be deﬁned implicitly by f (x, y) = 0. Although it is often not possible to ﬁnd the function g(x), it is still necessary to know when, in a neighborhood of a point (x0 , y0 ), given a value of x, a unique value of y can be found, sometimes only numerically. The implicit function theorem that follows is seldom mentioned in ﬁrst calculus courses because its proof involves certain technicalities, but it is quoted here in the simplest possible form because of its fundamental importance and the fact that is it frequently used by implication. THEOREM 1.12 The implicit function theorem Let f (x, y) and fy (x, y) be continuous in a region D of the (x, y)-plane and let (x0 , y0 ) be a point inside D, where f (x0 , y0 ) = 0 and fy (x0 , y0 ) = 0. Then (i) There is a rectangle R inside D containing (x0 , y0 ) at all points of which there can be found a unique y such that f (x, y) = 0. (ii) If the value of y is denoted by g(x), then y0 = g(x0 ), with f (x, g(x)) = 0, and g(x) is continuous inside R.
• 50 Chapter 1 Review of Prerequisites (iii) If, in addition, fx (x, y) is continuous in D then g(x) is differentiable in R and g (x) = − fx (x,g(x)) . fy (x,g(x)) In general terms, the implicit function theorem gives conditions that ensure the existence of an inverse function that is continuous and smooth enough to be differentiable. The theorem has a more general form involving functions f (x1 , x2 , . . . , xn ) of n variables, though this will not be given here. The interested reader can ﬁnd accounts of the implicit function theorem and some of its generalizations in references [1.4], [1.6], and [5.1].
• CHAPTER 1 TECHNOLOGY PROJECTS Project 1 Linear Difference Equations and the Fibonacci Sequence In Italy in 1202, Leonardo of Pisa, also known as Fibonacci, posed the following question. Let a newly born pair of rabbits produce two offspring each month, with breeding starting when they are 2 months old. Assuming that the pair of offspring start breeding in the same fashion when 2 months old, and that the process continues thereafter in a similar manner with no deaths, how many pairs of rabbits will there be after n months? If un , is the number of pairs of rabbits after n months, the production of rabbits can be represented by the linear difference equation, or recurrence relation, un+2 = un+1 + un , where the sequence of numbers ur with r = 1, 2, . . . is generated by setting u1 = 1 and u2 = 1, since this represents the initial pair of rabbits that began the breeding process. A simple calculation using this difference equation shows that the sequence of numbers generated in this manner that represents the number of pairs of rabbits present each month is 1, 1, 2, 3, 5, 8, . . . , and this is called the Fibonacci sequence. This sequence is found to occur in the study of regular solids, in numerical analysis, and elsewhere in mathematics. A linear difference equation of the form un+2 = aun+1 + bun , with a and b real numbers, can be solved by substituting un = Aλn into the difference equation and ﬁnding the two roots λ1 and λ2 of the resulting quadratic equation in λ. When λ1 = λ2 , the general solution is un = A1 λn + Aλn , and when λ1 = λ2 = λ , say, the gen2 1 eral solution is un = (A1 + nA2 )μn . The arbitrary constants A1 and A2 are found by requiring un to satisfy some given conditions of the form u1 = α and u2 = β, where the numbers α and β specify the way the sequence starts (the initial conditions). Use this method to show that the solution un for the Fibonacci sequence is √ n √ n 1 1 5 1+ 5 un = √ , 2 2 5 ( ) ( ) for n = 1, 2, . . . . Make use of computer algebra to generate the ﬁrst 30 terms of the Fibonacci sequence directly from the difference equation, and verify that the results are in agreement wïth the preceding formula. Use computer √algebra to show that limn→∞ (un /un 1 ) = 1 ( 5 + 1). This number is called 2 the golden mean, and in art and architecture it represents the ratio of the sides of a rectangle that is considered to have the most pleasing appearance. Project 2 Erratic Behavior of a Sequence Generated by a Difference Equation 1. Not all difference equations generate sequences of numbers that evolve steadily as happens with the Fibonacci sequence. Use computer algebra to generate the ﬁrst 20 terms of the sequence produced by the difference equation un+2 = 2un+1 5un with u1 = 1, u2 = 3, and observe its erratic behavior. Use the method of Project 1 to determine the analytical solution, and by means of computer algebra conﬁrm that the two results are in agreement. Examine the analytical solution and explain why the behavior of the sequence of terms is so erratic. 2. Construct a difference equation of your own in which the roots λ1 and λ2 are equal. Find the analytical solution and use computer algebra to determine the ﬁrst 20 terms of the sequence. Verify that these terms are in agreement with the ones generated directly from the difference equation. 51
• PART TWO VECTORS AND MATRICES 2 Chapter 3 Chapter Chapter 4 Vectors and Vector Spaces Matrices and System of Linear Equations Eigenvalues, Eigenvectors, and Diagonalization 53
• 2 C H A P T E R Vectors and Vector Spaces E ngineers, scientists, and physicists need to work with systems involving physical quantities that, unlike the density of a solid, cannot be characterized by a single number. This chapter is about the algebra of important and useful quantities called vectors that arise naturally when studying physical systems, and are deﬁned by an ordered group of three numbers (a, b, c). Vectors are of fundamental importance and they play an essential role when the laws governing engineering and physics are expressed in mathematical terms. A scalar quantity is one that is completely described when its magnitude is known, such as pressure, temperature, and area. A vector is a quantity that is completely speciﬁed when both its magnitude and direction are given, such as force, velocity, and momentum. A vector can be described geometrically as a directed straight line segment, with its length proportional to the magnitude of the vector, the line representing the vector parallel to the line of action of the vector, and an arrow on the line showing the direction along the line, or the sense, in which the vector acts. This geometrical interpretation of a vector is valuable in many ways, as it can be used to add and subtract vectors and to multiply them by a scalar, since this merely involves changing their magnitude and sense, while leaving the line to which they are parallel unchanged. However, to perform more general algebraic operations on vectors some other form of representation is required. The one that is used most frequently involves describing a vector in terms of what are called its components along a set of three mutually orthogonal axes, which are usually taken to be the axes O{x, y, z} in the cartesian coordinate system. Here, by the component of a vector along a given line l , we mean the length of the perpendicular projection of the vector onto the line l . We will see later that this cartesian representation of a vector identiﬁes it completely in terms of three components and enables algebraic operations to be performed on it. In particular, it allows the introduction of the scalar product, or dot product, of two vectors that results in a scalar, and a vector product, or cross product, of two vectors that leads to a vector. Finally, vectors and their algebra will be generalized to n space dimensions, leading to the concept of a vector space and to some related ideas. 55
• 56 Chapter 2 2.1 Vectors and Vector Spaces Vectors, Geometry, and Algebra M scalar vector directed straight line segment translation any quantities are completely described once their magnitude is known. A typical example of a physical quantity of this type is provided by the temperature at a given point in a room that is determined by the number specifying its value measured on a temperature scale, such as degrees F or degrees C. A quantity such as this is called a scalar quantity, and different examples of mathematical and physical scalar quantities are real numbers, length, area, volume, mass, speed, pressure, chemical concentration, electrical resistance, electric potential, and energy. Other physical quantities are only fully speciﬁed when both their magnitude and direction are given. Quantities like this are called vector quantities, and a typical example of a vector quantity arises when specifying the instantaneous motion of a ﬂuid particle in a river. In this case both the particle speed and its direction must be given if the description of its motion is to be complete. Speed in a given direction is called velocity, and velocity is a vector quantity. Some other examples of vector quantities are force, acceleration, momentum, the heat ﬂow vector at a point in a block of metal, the earth’s magnetic ﬁeld at a given location, and a mathematical quantity called the gradient of a scalar function of position that will be deﬁned later. By deﬁnition, the magnitude of a vector quantity is a nonnegative number (a scalar) that measures its size without regard to its direction, so, for example, the magnitude of a velocity is a speed. A convenient geometrical representation of a vector is provided by a straight line segment drawn in space parallel to the required direction, with an arrowhead indicating the sense in which the vector acts along the line segment, and the length of the line segment proportional to the magnitude of the vector. This is called a directed straight line segment, and by deﬁnition all directed straight line segments that are parallel to one another and have the same sense and length are regarded as equal. Expressed differently, moving a directed straight line segment parallel to itself so that its length remains the same and its arrow still points in the same direction leaves the vector it represents unchanged. A shift of a directed straight line segment of this type is called a translation of the vector it represents. For this reason the terms directed straight line segment and vector can be used interchangeably. Some examples of vectors that are equal through translation are shown in Fig. 2.1. It must be emphasized that geometrical representations of vectors as directed straight line segments in space are deﬁned without reference to a speciﬁc coordinate system. This purely geometrical interpretation of vectors ﬁnds many applications, though a different form of representation is necessary if an effective vector algebra is to be developed for use with the calculus. An analytical representation of vectors that allows a vector algebra to be constructed with this purpose in mind can be based on a general coordinate system. However, throughout this chapter only rectangular cartesian coordinates will be used because they provide a simple and natural way of representing vectors. FIGURE 2.1 Equal geometrical vectors.
• Section 2.1 Vectors, Geometry, and Algebra 57 z y 0 x FIGURE 2.2 A right-handed rectangular cartesian coordinate system. right-handed system In rectangular cartesian coordinates the x-, y-, and z-axes are all mutually orthogonal (perpendicular), and the positive sense along the axes is taken to be in the direction of increasing x, y, and z. The orientation of the axes will always be such that the positive direction along the z-axis is the one in which a right-handed screw (such as a corkscrew) aligned with the z-axis will advance when rotated from the positive x-axis to the positive y-axis, as shown in Fig. 2.2. A system of axes with this property is called a right-handed system. The end of a vector toward which the arrow points will be called the tip of the vector, and the other end its base. Because a vector is invariant under a translation, there is no loss of generality in taking its base to be located at the origin O of the coordinate system, and its tip at a point P with the coordinates (a1 , a2 , a3 ), say, as shown in Fig. 2.3. An application of the Pythagoras theorem to the triangle OPP z a3 OP = 2 + (a 1 1 /2 2) 2 + a3 a2 P (a1, a2, a3) a2 O y OP '= (a 1 a1 2 +a 2 2) 1/ 2 P x FIGURE 2.3 The vector from O to P and its components a1 , a2 , and a3 in the x-, y-, z-coordinate system.
• 58 Chapter 2 Vectors and Vector Spaces magnitude, unit vector, and components ordered number triple norm and modulus 2 2 2 shows the length of the line from O to P to be (a1 + a2 + a3 )1/2 . This length is proportional to the magnitude of the vector it represents, and as the base of the vector is at O, the sense of the vector is from O to P. For convenience, the constant of proportionality will be taken to be 1, so a directed straight line segment of unit length will represent a vector of magnitude 1 and so will be called a unit vector. Using this convention, the vector represented by the line from O to P in Fig. 2.3 has 2 2 2 magnitude (a1 + a2 + a3 )1/2 . The three numbers a1 , a2 , and a3 , in this order, that deﬁne the vector from O to P are called its components in the x, y, and z directions, respectively. A set of three numbers a1 , a2 , and a3 in a given order, written (a1 , a2 , a3 ), is called an ordered number triple. As the coordinates (a1 , a2 , a3 ) of point P in Fig. 2.3 completely deﬁne the vector from O to P, this ordered number triple may be taken as the deﬁnition of the vector itself. In general, changing the order of the numbers in an ordered number triple changes the vector it deﬁnes. Sometimes it is necessary to consider a vector whose base does not coincide with the origin. Suppose that when this occurs the base C is at the point (c1 , c2 , c3 ) and the tip D is at the point (d1 , d2 , d3 ). Then Fig. 2.4 shows the components of this vector in the x, y, and z directions to be d1 − c1 , d2 − c2 , and d3 − c3 . These components determine both the magnitude and direction of the vector. The vector is described by the ordered number triple (d1 − c1 , d2 − c2 , d3 − c3 ), and the length of CD that is equal to the magnitude of the vector is [(d1 − c1 )2 + (d2 − c2 )2 + (d3 − c3 )2 ]1/2 . For convenience, it is usual to represent a vector by a single boldface character such as a, and its magnitude (length) by a , called the norm of a. It is necessary to say here that in applications of vectors to mechanics, and in some purely geometrical applications of vectors, the norm of vector r is often called its modulus and written |r|. When this convention is used, because |r| is a scalar it is usual to denote it by the corresponding ordinary italic letter r , so that r = |r|. If the base and tip of a vector need to be identiﬁed by letters, a vector such as the one from C to D in Fig. 2.4 is written CD, with underlining used to indicate that a vector is involved, and the ordering of the letters is such that the ﬁrst shows the z d3 D c3 C 0 c2 d2 y c1 d1 C' D x FIGURE 2.4 Vector directed from point C at (c1 , c2 , c3 ) to point D at (d1 , d2 , d3 ).
• Section 2.1 Vectors, Geometry, and Algebra 59 base and the second the tip of the vector. Thus, CD and DC are vectors of equal magnitude but opposite sense, and when these vectors are represented by arrows, the arrows are parallel and of equal length, but point in opposite directions. EXAMPLE 2.1 If, in Fig. 2.4, C is the point (−3, 4, 9) and D the point (2, 5, 7), the vector CD has components 2 − (−3) = 5, 5 − 4 = 1, and 7 − 9 = −2, and so is represented by the ordered number triple (5, 1, −2), whereas vector DC has components −5, −1, and 2 and is represented by the ordered number triple (−5, −1, 2). Having illustrated the concepts of scalars and vectors using some familiar examples, we now develop the algebra of vectors in rather more general terms. Vectors A vector quantity a is an ordered number triple (a1 , a2 , a3 ) in which a1 , a2 , and a3 are real numbers, and we shall write a = (a1 , a2 , a3 ). The numbers a1 , a2 , and a3 , in this order, are called the ﬁrst, second, and third components of vector a or, equivalently, its x-, y-, and z-components. Null vector The null (zero) vector, written 0, has neither magnitude nor direction and is the ordered number triple 0 = (0, 0, 0). Equality of vectors Two vectors a = (a1 , a2 , a3 ) and b = (b1 , b2 , b3 ) are equal, written a = b, if, and only if, a1 = b1 , a2 = b2 , and a3 = b3 . EXAMPLE 2.2 If a = (a1 , −5, 6), b = (3, b2 , b3 ) and c = (3, −5, 1), then a = b if a1 = 3, b2 = −5 and b3 = 6, and b = c if b2 = −5 and b3 = 1, but a = c for any choice of a1 because 6 = 1. Norm of a vector The norm of vector a = (a1 , a2 , a3 ), denoted by a , is the non-negative real number 2 2 2 a = a1 + a2 + a3 1/2 , and in geometrical terms a is the length of vector a. The norm of the null vector 0 is 0 = 0. For example, if a is in m/sec, “length” of a is in m/sec. EXAMPLE 2.3 If a = (1, −3, 2), then a = [12 + (−3)2 + 22 ]1/2 = √ 14, as illustrated in Fig. 2.5.
• 60 Chapter 2 Vectors and Vector Spaces z 2 A OA =⎥ ⎢a ⎥ a ⎢ −3 y 0 1 A x FIGURE 2.5 Vector a and its norm a . The sum of two vectors If a = (a1 , a2 , a3 ) and b = (b1 , b2 , b3 ) have the same dimensions, say, both are m/sec, their sum, written a + b, is deﬁned as the ordered number triple (vector) obtained by adding corresponding components of a and b to give a + b = (a1 + b1 , a2 + b2 , a3 + b3 ). EXAMPLE 2.4 If a = (1, 2, −5) and b = (−2, 2, 4), then a + b = (1 + (−2), 2 + 2, −5 + 4) = (−1, 4, −1). Multiplying a vector by a scalar Let a = (a1 , a2 , a3 ) and λ be an arbitrary real number. Then the product λa is deﬁned as the vector λa = (λa1 , λa2 , λa3 ). EXAMPLE 2.5 Let a = (2, −3, 5), b = (−1, 2, 4). Then 2a = (4, −6, 10), 4b = (−4, 8, 16), and 2a + 4b = (4 + (−4), −6 + 8, 10 + 16) = (0, 2, 26). This deﬁnition of the product of a vector and a scalar, called scaling a vector, shows that when vector a is multiplied by a scalar λ, the norm of a is multiplied by |λ|, because 2 2 2 λa = λ2 a1 + λ2 a2 + λ2 a3 1/2 = |λ| · a . It also follows from the deﬁnition that the sense of vector a is reversed when it is multiplied by −1, though its norm is left unaltered. The deﬁnition of the difference
• Section 2.1 Vectors, Geometry, and Algebra 61 y y a2 + b2 b2 a+ b a2 b b a2 a a 0 b1 a1 x a1 a1 + b1 x 0 FIGURE 2.6 The vector sum a + b. of two vectors is seen to be contained in the deﬁnition of their sum, because a − b = a + (−b). In particular, when a = b, we ﬁnd that that a − a = 0, showing that −a is the additive inverse of a. The geometrical interpretations of the sum a + b, the difference a − b, and the scaled vector λa in terms of their components are shown in Figs. 2.6 to 2.8, though to simplify the diagrams only the two-dimensional cases are illustrated. This involves no loss of generality, because it is always possible to choose the (x, y)-plane to coincide with the plane containing the vectors a and b. Vector Addition by the Triangle Rule triangle rule for addition Consideration of Fig. 2.6 shows that the addition of vector b to vector a is obtained geometrically by translating vector b until its base is located at the tip of vector a, and then the vector representing the sum a + b has its base at the base of vector a and its tip at the tip of the repositioned vector b. Because of the triangle involving vectors a, b, and a + b, this geometrical interpretation of a vector sum is called the triangle rule for vector addition. The triangle rule also applies to the difference of two vectors, as may be seen by considering Fig. 2.7, because after obtaining −b from b by reversing its sense, the difference a − b can be written as the vector sum a + (−b), where −b is added to vector a by means of the triangle rule. The algebraic results discussed so far concerning the addition and scaling of vectors, together with some of their consequences, are combined to form the following theorem. y y b2 a2 a2 b −b1 −b a a1 − b1 a 0 b1 a1 x −b2 FIGURE 2.7 The vector difference a − b. 0 a2 − b2 a−b −b a1 x
• 62 Chapter 2 Vectors and Vector Spaces y y 2a2 a2 1/2 a 2a 1/2 a2 a a1 0 y y x 0 x 2a1 0 (k = 2) −2a1 1/2 a1 x x (k = 1/2) −2a −2a2 (k = −2) FIGURE 2.8 The vector ka for different values of k. THEOREM 2.1 Addition and scaling of vectors Let P, Q, and R be arbitrary vectors and let α and β be arbitrary real numbers. Then: 1. P+Q=Q+P 2. P+0=0+P=P 3. (P + Q) + R = P + (Q + R) 4. α(P + Q) = αP + αQ 5. (αβ)P = α(βP) = β(αP) 6. (α + β)P = αP + βP 7. αP = |α| · P (vector addition is commutative); (0 is the identity element in vector addition); (vector addition is associative); (multiplication by a scalar is distributive over vector addition); (multiplication of a vector by a product of scalars is associative); (multiplication of a vector by a sum of scalars is distributive); (scaling P by α scales the norm of P by |α|). Proof The results of this theorem are all immediate consequences of the above deﬁnitions so as the proofs of results 1 to 6 are all very similar, and result 7 has already been established, we only prove result 4. Let P = ( p1 , p2 , p3 ) and Q = (q1 , q2 , q3 ); then α(P + Q) = α( p1 + q1 , p2 + q2 , p3 + q3 ) = α[( p1 , p2 , p3 ) + (q1 , q2 , q3 )] = α( p1 , p2 , p3 ) + α(q1 , q2 , q3 ) = αP + αQ, as was to be shown. The Representation of Vectors in Terms of the Unit Vectors i, j, and k The components of a vector, together with vector addition, can be used to describe vectors in a very convenient way. The idea is simple, and it involves using the standard convention that i, j, and k are vectors of unit length that point in the positive sense along the x-, y-, and z-axes, respectively. Vectors such as i, j, and k that have a unit norm (length) are called unit vectors, so i = j = k = 1.
• Section 2.1 Vectors, Geometry, and Algebra 63 z a3 k j a= i a 1i j + a2 A(a1, a2, a3) k + a3 a3k 0 a2 y a1i a1 a2 j x FIGURE 2.9 Vector a in terms of the unit vectors i, j, and k. An arbitrary vector a can be represented by an “arrow,” with its base at the origin and its tip at the point A with cartesian coordinates (a1 , a2 , a3 ) where, of course, a1 , a2 , and a3 are also the components of a. Consequently, scaling the unit vectors i, j, and k by the respective x, y, and z components a1 , a2 , and a3 of a, followed by vector addition of these three vectors, shows that a can be written a = a1 i + a2 j + a3 k, (1) as can be seen from Fig. 2.9. The representation of vector a in terms of the unit vectors i, j, and k in (1), and the ordered triple notation, are equivalent, so a = a1 i + a2 j + a3 k = (a1 , a2 , a3 ). position vector (2) In some applications a vector deﬁnes a point in space, so vectors of this type are called position vectors. The symbol r is normally used for a position vector, so if point P with coordinates (x, y, z) is a general point in space, as in Fig. 2.10, its z k j r= i xi j+ +y P (x, y, z) zk zk 0 y xi yj 1/ OP = ⎥⎢r⎥⎢ = (x2 + y2 + z2) 2 x FIGURE 2.10 Position vector of a general point P in space.
• 64 Chapter 2 Vectors and Vector Spaces position vector relative to the origin is r = xi + yj + zk, (3) r = (x 2 + y2 + z2 )1/2 . (4) and its norm (length) is EXAMPLE 2.6 (a) Find the distance of point P from the origin given that its position vector is r = 2i + 4j − 3k. (b) If a general point P in space has position vector r = xi + yj + zk, describe the surface deﬁned by r = 3 and ﬁnd its cartesian equation. Solution (a) As r is the position vector of P relative to the origin, the distance of √ point P from the origin is r = [22 + 42 + (−3)2 ]1/2 = 29. (b) As r = 3 (constant), it follows that the required surface is one for which every point lies at a distance 3 from the origin, so the surface must be a sphere of radius 3 centered on the origin. As r = xi + yj + zk is the general position vector of a point on this sphere, the result r = 3 is equivalent to (x 2 + y2 + z2 )1/2 = 3, so the cartesian equation of the sphere is x 2 + y2 + z2 = 9. Because of the equivalence of the ordered number triple notation and the representation of vectors in terms of the unit vectors i, j, and k given in (2), both systems obey the same rules governing the addition and scaling of vectors in terms of their components. Thus, the following rules apply to the combination of any two vectors a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k expressed in terms of i, j, and k, and an arbitrary real number λ. The sum a + b is given by a + b = (a1 + b1 )i + (a2 + b2 )j + (a3 + b3 )k. (5) The product λ a is given by λ a = λa1 i + λa2 j + λa3 k. (6) The norm of scaled vector λ a is given by λ a = |λ| · a 2 2 2 = |λ| a1 + a2 + a3 EXAMPLE 2.7 1/2 . (7) If a = 5i + j − 3k and b = 2i − 2j − 7k, ﬁnd (a) a + b, (b) a − b, (c) 2a + b, and (d) |−2a|. Solution (a) a + b = (5i + j − 3k) + (2i − 2j − 7k) = (5 + 2)i + (1 − 2)j + (−3 − 7)k = 7i − j − 10k. (b) a − b = (5i + j − 3k) − (2i − 2j − 7k) = (5 − 2)i + (1 − (−2))j + (−3 − (−7))k = 3i + 3j + 4k.
• Section 2.1 Vectors, Geometry, and Algebra 65 2a + b = 2(5i + j − 3k) + (2i − 2j − 7k) = (10i + 2j − 6k) + (2i − 2j − 7k) = (10 + 2)i + (2 + (−2))j + (−6 + (−7))k = 12i − 13k. √ 1/2 |−2a| = [(−10)2 + (−2)2 + 62 ] = 2 35 (c) (d) or, equivalently, √ |−2a| = |−2| · a = 2 a = 2[52 + 12 + (−3)2 ]1/2 = 2 35. Finding a Unit Vector in the Direction of an Arbitrary Vector It is often necessary to ﬁnd a unit vector in the direction of an arbitrary vector a = a1 i + a2 j + a3 k. This is accomplished by dividing a by its norm a , because the vector a/ a has the same sense as a and its norm is 1. It is convenient to use a symbol related to an arbitrary vector a to indicate the unit vector in its direction, so from ˆ now on such a vector will be denoted by a, read “a hat.” So if a = a1 i + a2 j + a3 k, 2 2 2 ˆ a = a/ a = (a1 i + a2 j + a3 k)/ a1 + a2 + a3 = (a1 /a)i + (a2 /a)j + (a3 /a)k, 1/2 2 2 2 with a = a1 + a2 + a3 1/2 . (8) As the symbols i, j, and k are used exclusively for the unit vectors in the x-, y-, and ˆ z-directions, it is not necessary to write ˆ j, and k. i, ˆ ˆ The relationship between a, a, and a can be put in the useful form ˆ a = a a, (9) ˆ showing that a general vector a can always be written as the unit vector a scaled by a . Unless otherwise stated, a = 0. EXAMPLE 2.8 Find a unit vector in the direction of a = 3i + 2j + 5k. √ Solution As a = (32 + 22 + 52 )1/2 = 38, it follows that √ √ √ ˆ a = a/ a = (3/ 38)i + (2/ 38)j + (5/ 38)k. EXAMPLE 2.9 It is known from experiments in mechanics that forces are vector quantities and so combine according to the laws of vector algebra. Use this fact to ﬁnd the sum and difference of a force of 9 units in the direction of 2i + j − 2k and a force of 10 units in the direction of 4i − 3j, and determine the magnitudes of these forces. Solution We will use the convention that a unit vector represents a force of 1 unit. Let F be the force of 9 units. Then as 2i + j − 2k = [22 + 12 + (−2)2 ]1/2 = 3, the unit vector in the direction of F is ˆ F = (1/3)(2i + j − 2k) = (2/3)i + (1/3)j − (2/3)k, ˆ so F = 9F = 6i + 3j − 6k units.
• 66 Chapter 2 Vectors and Vector Spaces Similarly, let G be the force of 10 units. Then as 4i − 3j = 5, the unit vector in the direction of G is ˆ G = (1/5)(4i − 3j) = (4/5)i − (3/5)j, ˆ so G = 10G = 8i − 6j units. Combining these results shows that F + G = 14i − 3j − 6k units, and F − G = −2i + 9j − 6k units, from which it follows that the magnitudes of the forces are given by √ F + G = 241 units and F − G = 11 units. Equality of vectors expressed in terms of unit vectors As the difference of two equal and opposite vectors is the null vector 0, this shows that if a = b, where a = a1 i + a2 j + a3 k, and b = b1 i + b2 j + b3 k, then the respective components of vectors a and b must be equal, leading to the result that (10) a = b if, and only if, a1 = b1 , a2 = b2 , and a3 = b3 . Simple Geometrical Applications of Vectors Although our use of vectors will be mainly in connection with the calculus, the following simple geometrical applications are helpful because they illustrate basic vector arguments and properties. Although we have seen how an arbitrary vector can be expressed in terms of unit vectors associated with a cartesian coordinate system, it must be remembered that the fundamental concept of a vector and its algebra is independent of a coordinate system. Because of this, it is often possible to use the rules governing elementary vector algebra given in Theorem 2.1 to establish equations in a purely vectorial manner, without the need to appeal to any coordinate system. Once a general vector equation has been established, the representation of the vectors involved in terms of their components and the unit vectors i, j, and k can be used to convert the vector equation into the equivalent cartesian equations. The purely vectorial approach to geometrical problems is well illustrated by ﬁnding the vector AB in terms of the position vectors of points A and B, and then using the result to ﬁnd the position vector of the mid-point of AB. After this, the purely vectorial derivation of a geometrical result followed by its interpretation in cartesian form will be illustrated by ﬁnding the equation of a straight line in three space dimensions. Vector AB in terms of the position vectors of A and B Let a and b be the position vectors of points A and B relative to an origin O, as shown in Fig. 2.11. An application of the triangle rule for the addition of vectors gives OA + AB = OB, but OA = a and OB = b, so a + AB = b,
• Section 2.1 Vectors, Geometry, and Algebra 67 B AB A b a O FIGURE 2.11 Vectors a, b, and AB. giving AB = b − a. (11) When expressed in words, this simple but useful result asserts that vector AB is obtained by subtracting the position vector a of point A from the position vector b of point B. EXAMPLE 2.10 Find the position vector of the mid-point of AB if point A has position vector a and point B has position vector b relative to an origin O. Solution Let point C, with position vector c relative to origin O, be the mid-point of AB, as shown in Fig. 2.12. By the triangle rule, OA + AC = OC, but OA = a, and from (11) AC = (1/2)(b − a), so OC = a + (1/2)(b − a), so the required result is c = OC = (1/2)(b + a). B C AC A c b a O FIGURE 2.12 C is the mid-point of AB.
• 68 Chapter 2 Vectors and Vector Spaces b A a λb AP = L P r O FIGURE 2.13 The straight line L. The vector and cartesian equations of a straight line Let line L be a straight line through point A with position vector a relative to an origin O, and let the line be parallel to a vector b. If P is an arbitrary point on line L with position vector r relative to O, an application of the triangle rule for vector addition to the vectors shown in Fig. 2.13 gives r = OA + AP. vector equation of straight line But OA = a, and as AP is parallel to b, a number λ can always be found such that AP = λb, so the vector equation of line L becomes r = a + λb. (12) Notice that result (12) determines all points P on L if λ is taken to be a number in the interval −∞ < λ < ∞. The cartesian equations of line L follow by setting a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k, and r = xi + yj + zk in result (12), and then using the deﬁnition of equality of vectors given in (10) to obtain the corresponding three scalar cartesian equations. Proceeding in this way we ﬁnd that xi + yj + zk = a1 i + a2 j + a3 k + λ(b1 i + b2 j + b3 k), cartesian and standard form of straight line so equating corresponding components of i, j, and k on each side of this equation brings us to the required cartesian equations for L in the form x1 = a1 + λb1 , x2 = a2 + λb2 , x3 = a3 + λb3 . (13) An equivalent form of these equations is obtained by solving each equation for λ and equating the results to get x − a1 y − a2 z − a3 = = = λ. b1 b2 b3 (14)
• Section 2.1 Vectors, Geometry, and Algebra 69 This is the standard form (also called the canonical form) of the cartesian equations of a straight line. It is important to notice that when written in standard form the coefﬁcients of x, y, and z are all unity. Once the equation of a straight line is written in standard form, equating each numerator to zero determines the components (a1 , a2 , a3 ) of a position vector of a point on the line, while the denominators in the order (b1 , b2 , b3 ) determine the components of a vector parallel to the line. EXAMPLE 2.11 A straight line L is given in the form 3−y z+ 1 2x − 3 = = . 4 2 3 Find the position vector of a point on L and a vector parallel to L. Solution When the equation is written in standard form it becomes x − 3/2 y−3 z+ 1 = = = λ. 2 −2 3 Comparing these equations with (14) shows that (a1 , a2 , a3 ) = (3/2, 3, −1) and b = (b1 , b2 , b3 ) = (2, −2, 3). So the position vector of a point on the line is a = (3/2)i + 3j − k, and a vector parallel to the line is b = 2i − 2j + 3k. Neither of these results is unique, because μb is also parallel to the line for any scalar μ = 0, and any other point on L would sufﬁce. For example, the vector 14i − 14j + 21k is also parallel to the line, while setting λ = 2 leads to the result (a1 , a2 , a3 ) = (11/2, −1, 5), corresponding to a different point on the same line, this time with position vector a = (11/2)i − j + 5k. Summary This section has introduced vectors both as geometrical quantities that can be represented by directed line segments and, using a right-handed system of cartesian axes, as ordered number triples. Deﬁnitions of the scaling, addition, and subtraction of vectors have been given, and a general vector has been deﬁned in terms of the set of three unit vectors i, j, and k that lie along the orthogonal cartesian axes O{x, y, z}. Finally, the vector and cartesian equations of a straight line in space have been derived, and the standard form of the cartesian equations has been introduced from which a vector parallel to the line may be found by inspection. EXERCISES 2.1 1. Prove Results 1, 3, and 6 of Theorem 2.1. 2. Given that a = 2i + 3j − k, b = i − j + 2k, and c = 3i + 4j + k, ﬁnd (a) a + 2b − c, (b) a vector d such that a + b + c + d = 0, and (c) a vector d such that a − b + c + 3d = 0. 3. Given a = i + 2j + 3k, b = 2i − 2j + k, ﬁnd (a) a vector c such that 2a + b + 2c = i + k, (b) a vector c such that 3a − 2b + c = i + j − 2k. 4. Given that a = 3i + 2j − 3k, b = 2i − j + 5k, and c = 2i + 5j + 2k, ﬁnd (a) 2a + 3b − 3c, (b) a vector d such that a + 3b − 2c + 3d = 0, and (c) a vector d such that 2a − 3d = b + 4c. 5. Given that Aand B have the respective position vectors 2i + 3j − k and i + 2j + 4k, ﬁnd the vector AB and a unit vector in the direction of AB. 6. Given that A and B have the respective position vectors 3i − j + 4k and 2i + j + k, ﬁnd the vector AB and the position vector c of the mid-point of AB. 7. Given that Aand B have the respective position vectors a and b, ﬁnd the position vector of a point P on the line AB located between A and B such that (length AP)/(length PB) = m/n, where m, n > 0 are any two real numbers.
• 70 Chapter 2 Vectors and Vector Spaces 8. Find the position vector r of a point P on the straight line joining point Aat (1, 2, 1) and point B at (3, −1, 2) and between A and B such that 13. A straight line L is given in the form 2x + 1 3y + 2 2 − 4z = = . 3 4 −1 (length AP)/(length PB) = 3/2. 9. It is known from Euclidean geometry that the medians of a triangle (lines drawn from a vertex to the mid-point of the opposite side) all meet at a single point P, and that P is two-thirds of the distance along each median from the vertex through which it passes. If the vertices A, B, and C of a triangle have the respective position vectors a, b, and c, show that the position vector of P is (1/3)(a + b + c). 10. Forces of 1, 2, and 3 units act through the origin along, and in the positive directions of, the respective x-, y-, and z-axes. Find the vector sum S of these forces, the magnitude S of the sum of the vectors, and a unit vector in the direction of S. 11. Forces of 2, 1, and 4 units act through the origin along, and in the positive directions of, the respective x-, y-, and z-axes. Find the vector sum S of these forces, the magnitude S of the sum of the vectors, and a unit vector in the direction of S. 12. A straight line L is given in the form 3x − 1 2y + 3 2 − 3z = = . 4 2 1 Find the position vectors of two different points on L and a unit vector parallel to L. 2.2 14. 15. 16. 17. 18. Find position vectors of two different points on L and a unit vector parallel to L. Given that a straight line L1 passes through the points (−2, 3, 1) and (1, 4, 6), ﬁnd (a) the position vector of a point on the line and a vector parallel to it, and (b) a straight line L2 parallel to L1 that passes through the point (1, 2, 1). Given that a straight line L1 passes through the points (3, 2, 4) and (2, 1, 6), ﬁnd (a) the position vector of a point on the line and a vector parallel to it, and (b) a straight line L2 parallel to L1 that passes through the point (−2, 1, 2). A straight line has the vector equation r = a + λb, where a = 3j + 2k, and b = 2i + j + 2k. Find the cartesian equations of the line and the coordinates of three points that lie on it. A straight line passes through the point (3, 2, −3) parallel to the vector 2i + 3j − 3k. Find the cartesian equations of the line and the coordinates of three points that lie on it. In mechanics, if a point A moves with velocity vA and point B moves with velocity vB , the velocity vR of A relative to B (the relative velocity of A with respect to B) is deﬁned as vR = vA − vB . Power boat A moves northeast at 20 knots and power boat B moves southeast at 30 knots. Find the velocity of boat Arelative to boat B, and a unit vector in the direction of the relative velocity. The Dot Product (Scalar Product) A product of two vectors a and b can be formed in such a way that the result is a scalar. The result is written a · b and called the dot product of a and b. The names scalar product and inner product are also used in place of the term dot product. Dot Product dot or scalar product Let a and b be any two vectors that after a translation to bring their bases into coincidence are inclined to one another at an angle θ , as shown in Fig. 2.14, where 0 ≤ θ ≤ π. Then the dot product of a and b is deﬁned as the number a · b = a · b cos θ. This geometrical deﬁnition of the dot product has many uses, but when working with vectors a and b that are expressed in terms of their components in the i, j, and
• Section 2.2 The Dot Product (Scalar Product) 71 a θ b FIGURE 2.14 Vectors a and b inclined at an angle θ. k directions, a more convenient form is needed. An equivalent deﬁnition that is easier to use is given later in (23). properties of the dot product Properties of the dot product The following results, in which a and b are any two vectors and λ and μ are any two scalars, are all immediate consequences of the deﬁnition of the dot product. The dot product is commutative a · b = b · a and λa · μb = μa · λb = λμa · b (15) The dot product is distributive and linear a · (b + c) = a · b + a · c and a · (λb + μc) = λa · b + μa · c. (16) The angle between two vectors The angle θ between vectors a and b is given by cos θ = a·b , a · b with 0 ≤ θ ≤ π. (17) Parallel vectors (θ = 0) If vectors a and b are parallel, then a·b= a · b and, in particular, a · a = a 2. (18) Orthogonal vectors (θ = π/2) If vectors a and b are orthogonal, then a · b = 0. (19)
• 72 Chapter 2 Vectors and Vector Spaces Product of unit vectors ˆ ˆ If a and b are unit vectors, then ˆ ˆ a · b = cos θ, with 0 ≤ θ ≤ π. (20) An immediate consequence of properties (15), (19), and (20) is that i · i = j · j = k · k = 1, (21) i · j = j · i = i · k = k · i = j · k = k · j = 0. (22) and We now use results (21) and (22) to arrive at a simple expression for the dot product in terms of the components of a and b. To arrive at the result we set a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k and form the dot product a · b = (a1 i + a2 j + a3 k) · (b1 i + b2 j + b3 k). dot product in terms of components Expanding this product using (15) and (16) and making use of results (21) and (22) brings us to the following alternative deﬁnition of the dot product expressed in terms of the components of a and b: a · b = a1 b1 + a2 b2 + a3 b3 . (23) Using (23) in (17) produces the following useful expression that can be used to ﬁnd the angle θ between a and b: cos θ = EXAMPLE 2.12 a1 b1 + a2 b2 + a3 b3 2 a1 + 2 a2 2 + a3 1/2 2 2 2 b1 + b2 + b3 1/2 where 0 ≤ θ ≤ π. (24) Find a · b and the angle between the vectors a and b, given that a = i + 2j + 3k and b = 2i − j − 2k. √ Solution a = 14, b = 3, and a · b = 1 · 2 + 2 · (−1) + 3 · (−2) = −6. Using these results in (24) gives √ √ cos θ = −6/(3 14) = −2/ 14, so as 0 ≤ θ ≤ π we see that θ = 2.1347 radians, or θ = 122.3◦ . projecting a vector onto a line The projection of a vector onto the line of another vector The projection of vector a onto the line of vector b is a scalar, and it is the signed length of the geometrical projection of vector a onto a line parallel to b, with the sign positive for 0 ≤ θ < π/2 and negative for π/2 < θ ≤ π . This is illustrated in Fig. 2.15, from which it is seen that the signed length of the projection of a onto the line of vector b is ON, where ON = a cos θ.
• Section 2.2 OA = ⎥⎢a⎥⎢ a A The Dot Product (Scalar Product) A a OA = ⎥⎢a⎥⎢ a a θ b 73 O N N b ^ b θ O ^ b π/2 < θ ≤ π 0 ≤ θ < π/2 FIGURE 2.15 The projection of vector a onto the line of vector b. ˆ ˆ ˆ ˆ If b is the unit vector along b, then as a = a a , and a · b = cos θ , the projection ON = a cos θ can be written as the dot product ˆ ˆ ˆ ON = a a · b = a · b = EXAMPLE 2.13 a·b b (25) Find the strength of the magnetic ﬁeld vector H = 5i + 3j + 7k in the direction of 2i − j + 2k, where a unit vector represents one unit of magnetic ﬂux. Solution We are required to ﬁnd the projection of vector H in the direction of ˆ the vector 2i − j + 2k. Setting b = 2i − j + 2k, b = 3, so b = (1/3)(2i − j + 2k), so the strength of the vector H in the direction of b is ˆ H · b = (1/3)(5i + 3j + 7k) · (2i − j + 2k) = 7. Direction cosines and direction ratios ˆ If a = a1 i + a2 j + a3 k is an arbitrary vector, the unit vector a in the direction of a is ˆ a = (a1 i + a2 j + a3 k)/ a 1/2 2 2 2 = (a1 i + a2 j + a3 k)/ a1 + a2 + a3 . (26) Taking the dot product of a with i, j, and k, and setting l = a1 / 2 2 2 2 2 2 2 2 2 (a1 + a2 + a3 )1/2 , m = a2 /(a1 + a2 + a3 )1/2 , and n = a3 /(a1 + a2 + a3 )1/2 gives ˆ l = i · a, ˆ m = j · a, and ˆ n = k · a, so we may write ˆ a = li + mj + nk. 2 2 2 ˆ ˆ The dot product a · a = l 2 + m2 + n2 = (a1 + a2 + a3 )/ a 2 , but 2 2 2 a1 + a2 + a3 , so l 2 + m2 + n2 = 1. (27) a 2 = (28) The number l is the cosine of the angle β1 between a and the x-axis, the number m is the cosine of the angle β2 between a and the y-axis, and the number n is the cosine of the angle β3 between a and the z-axis, as shown in
• 74 Chapter 2 Vectors and Vector Spaces z β3 O a⎥ ⎢ = ⎥⎢ OA a A β2 β1 y x FIGURE 2.16 The angles β1 , β2 , and β3 . direction cosines Fig. 2.16. The numbers (l, m, n) are called the direction cosines of a, because ˆ they determine the direction of the unit vector a that is parallel to a. Notice that when any two of the three direction cosines l, m, and n of a vector a are given, the third is related to them by l 2 + m2 + n2 = 1. Because of result (27) it is always possible to write a = a (li + mj + nk), direction ratios EXAMPLE 2.14 (29) where l, m, and n are the direction cosines of a. As the components a1 , a2 , and a3 of a are proportional to the direction cosines, they are called the direction ratios of a. Find the direction cosines and direction ratios of a = 3i + j − 2k. √ √ √ Solution√ As a = 14, the direction cosines are l = 3/ 14, m = 1/ 14, and n = −2/ 14. The direction ratios of a are 3, 1, and −2, or any nonnegative multiple √ √ √ of these three numbers such as 15/ 14, 5/ 14, and −10/ 14. The triangle inequality The following result will be needed in the proof of the triangle inequality that is to follow. The absolute value of a · b = a · b cos θ is |a · b| = a · b |cos θ|, but | cos θ | ≤ 1, so using this in the above result we obtain the Cauchy–Schwarz inequality, |a · b| ≤ a · b . (30)
• Section 2.2 THEOREM 2.2 The Dot Product (Scalar Product) 75 The triangle inequality If a and b are any two vectors, then a+b ≤ a + b . Proof From (18) we have a+b 2 = (a + b) · (a + b) = a · a + 2a · b + b · b = a 2 + 2a · b + b 2 , but a · b ≤ |a · b|, so from the Cauchy–Schwarz inequality (30) a+b 2 ≤ a 2 +2 a · b + b 2 = ( a + b )2 . Taking the positive square root of this last result, we obtain the triangle inequality a+b ≤ a + b . The triangle inequality will be generalized in Section 2.5, but in its present form it is the vector equivalent of the Euclidean theorem that “the sum of the lengths of any two sides of a triangle is greater than or equal to the length of the third side,” and it is from this theorem that the inequality derives its name. Equation of a Plane vector equation of a plane When working with the vector calculus it is sometimes necessary to consider a plane that is locally tangent to a point on a surface in space so it will be useful to derive the general equation of a plane in both its vector and cartesian forms. A plane can be deﬁned by specifying a ﬁxed point belonging to the plane and a vector n that is perpendicular to the plane. This follows because if n is perpendicular at a point on the plane, it must be perpendicular at every point on the plane. Any vector n that is perpendicular to a plane is called a normal to the plane. Clearly a normal to a plane is not unique, because a plane has two sides, so if a normal n is directed away from one side of the plane, the vector −n is a normal directed away from the other side. Both n and −n can be scaled by any nonzero number and still remain normals; consequently, if n is a normal to a plane, so also are all vectors of the form λn, with λ = 0 any real number. Let a ﬁxed point Aon plane with normal n have position vector a relative to an origin O, and let P be a general point on plane with position vector r relative to O. Then, as may be seen from Fig. 2.17, the vector r − a lies in the plane, and so is perpendicular (normal) to n. Forming the dot product of n and r − a, and using (19), shows that the vector equation of plane is n · (r − a) = 0, (31) n · r = n · a. (32) or, equivalently, cartesian equation of a plane The cartesian form of this equation follows by considering a general point with coordinates (x, y, z) on plane , setting r = xi + yj + zk, a = a1 i + a2 j + a3 k,
• 76 Chapter 2 Vectors and Vector Spaces n π A r−a AP = P a ^.a = ^.r n n ^ n r O FIGURE 2.17 Plane through point A. with normal n passing and n = n1 i + n2 j + n3 k, and then substituting into (32) to get (n1 i + n2 j + n3 k) · (xi + yj + zk) = (n1 i + n2 j + n3 k) · (a1 i + a2 j + a3 k). Taking the dot products and using results (21) and (22) show the cartesian equation of plane to be n1 x + n2 y + n3 z = n1 a1 + n2 a2 + n3 a3 = d, a constant. EXAMPLE 2.15 (33) Find the cartesian equation of the plane through the point (2, 5, 3) with normal 3i + 2j − 7k. Solution Here n1 = 3, n2 = 2, n3 = −7 and a1 = 2, a2 = 5, and a3 = 3, so substituting into (33) shows the plane has the equation 3x + 2y − 7z = −5. Summary This section has introduced the dot or scalar product of two vectors in geometrical terms and, more conveniently for calculations, in terms of the components of the two vectors involved. The applications given include the important operation of projecting a vector onto the line of another vector and the derivation of the vector equation and cartesian equation of a plane. EXERCISES 2.2 1. Find the dot products of the following pairs of vectors: (a) i − j + 3k, 2i + 3j + k. (b) 2i − j + 4k, −i + 2 j + 2k. (c) i + j − 3k, 2i + j + k. 2. Find the dot products of the following pairs of vectors: (a) i − 2 j + 4k, i + 2 j + 3k. (b) 3i + j + 2k, 4i − 3j + k. (c) 5i − 3j + 3k, 2i − 3j + 5k. 3. Find which of the following pairs of vectors are orthogonal: (a) 3i + 2 j − 6k, −9i − 6j + 18k. (b) 3i − j + 7k, 3i + 2 j + k. (c) 2i + j + k, i + j − k. (d) i + j − 3k, 2i + j + k. 4. Find which, if any, of the following pairs of vectors are orthogonal: (a) 2i + j + k, 8i + 2 j + 2k. (b) i + 2 j + 3k, 2i − 2 j − 3k. (c) i + 2 j + 4k, 2i + j + 3k. (d) i + j, 2 j + 3k. 5. Given that a = 2i + 3j − 2k, b = i + 3j + k and c = 3i + j − k, ﬁnd (a) (a + b) · c. (b) (2b − 3c) · a. (c) a · a. (d) c · (a − 2b).
• Section 2.3 6. Given that a = 3i + 2 j − 3k, b = 2i + j + 2k, and c = 5i + 2 j − 2k, ﬁnd (a) b · (b + (a · c)c). (b) (a + 2b) · (2b − 3c). (c) (c · c)b − (a · a)c. 7. Find the angle between the following pairs of vectors: (a) i + j + k, 2i + j − k. (b) 2i − j + 3k, 2i + j + 3k. (c) 3i − j + k, i − 2 j + 3k. (d) i − 2 j + k, 4i − 8j + 16k. 8. Given a = 2i − 3j − 3k, b = i + j + 2k, and c = 3i − 2 j − k, ﬁnd the angles between the following pairs of vectors: (a) a + b, b − 2c. (b) 2a − c, a + b − c. (c) b + 3c, a − 2c. 9. Find the component of the force F = 4i + 3j + 2k in the direction of the vector i + j + k. 10. Find the component of the force F = 2i + 5j − 3k in the direction of the vector 2i + j − 2k. 11. Given that a = i + 2 j + 2k and b = 2i − 3j + k, ﬁnd (a) the projection of a onto the line of b, and (b) the projection of b onto the line of a. 12. Given that a = 3i + 6j + 9k and b = i + 2 j + 3k, (a) ﬁnd the projection of a onto the line of b and (b) compare the magnitude of a with the result found in (a) and comment on the result. 13. Find the direction cosines and corresponding angles for the following vectors: (a) i + j + k. (b) i − 2 j + 2k. (c) 4i − 2 j + 3k. 14. Find the direction cosines and corresponding angles for the following vectors: (a) i − j − k. (b) 2i + 2 j − 5k. (c) −4j − k. 15. Verify the triangle inequality for vectors a = i + 2 j + 3k and b = 2i + j + 7k. 16. Verify the triangle inequality for vectors a = 2i − j − 2k and 3i + 2 j + 3k. 17. Find the equation of the plane with normal 2i − 3j + k that contains the point (1, 0, 1). 18. Find the equation of the plane with normal i − 2 j + 2k that contains the point (2, −3, 4). 19. Given that a plane passes through the point (2, 3, −5), and the vector 2i + k is normal to the plane, ﬁnd the cartesian form of its equation. 2.3 The Cross Product 77 20. The equation of a plane is 3x + 2y − 5z = 4. Find a vector that is normal to the plane, and the position vector of a point on the plane. 21. Explain why if the vector equation of plane in (32) is divided by n to bring it into the form r · n = a · n, the number |a · n| is the perpendicular distance of origin O from the plane. Explain also why if a · n > 0 the plane lies to the side of O toward which n is directed, as in Fig. 2.15, but that if a · n < 0 it lies on the opposite side of O toward which −n is directed. 22. Use the result of Exercise 21 to ﬁnd the perpendicular distance of the plane 2x − 4y − 5z = 5 from the origin. 23. The angle between two planes is deﬁned as the angle between their normals. Find the angle between the two planes x + 3y + 2z = 4 and 2x − 5y + z = 2. 24. Find the angle between the two planes 3x + 2y − 2z = 4 and 2x + y + 2z = 1. 25. Let a and b be two arbitrary skew (nonparallel) vectors, and set a = ab + ap , where ab is parallel to b and ap is perpendicular to b and lies in the plane of a and b. Find ab and ap in terms of a and b. 26. The law of cosines for a triangle with sides of length a, b, and c, in which the angle opposite the side of length c is C, takes the form c2 = a 2 + b2 − 2ab cos C. Prove this by taking vectors a, b, and c such that c = a − b and considering the dot product c · c = (a − b) · (a − b). 27. The work units W done by a constant force F when moving its point of application along a straight line L parallel to a vector a are deﬁned as the product of the component of F in the direction of a and the distance d moved along line L. Express W in terms of F, a, and d. 28. If a and b are arbitrary vectors and λ and μ are any two scalars, prove that λa + μb 2 ≤ λ2 a 2 + 2λμa · b + μ2 b 2 . 29. Verify the result of Exercise 28 by setting λ = 2, μ = −3, a = 3i + j − 4k, and b = 2i + 3j + k. The Cross Product A product of two vectors a and b can be deﬁned in such a way that the result is a vector. The result is written a × b and called the cross product of a and b. The name vector product is also used in place of the term cross product. Before deﬁning the cross product we ﬁrst formulate what is called the right-hand rule. Given any two skew vectors a and b, the right-hand rule is used to determine
• 78 Chapter 2 Vectors and Vector Spaces the sense of a third vector c that is required to be normal to the plane containing vectors a and b. right-hand rule The Right-Hand Rule Let a and b be two arbitrary skew vectors with the same base point, with c a vector normal to the plane containing them. If the ﬁngers of the right hand are curled in such a way that they point from vector a to vector b through the angle θ between them, with 0 < θ < π , then when the thumb is extended away from the palm it will point in the direction of vector c. When applying the right-hand rule, the order of the vectors is important. If vectors a, b, and c obey the right-hand rule, they will always be written in the order a, b, c, with the understanding that c is normal to the plane of a and b, with its sense determined by the right-hand rule. Figure 2.18 illustrates the right-hand rule. An important special case of the right-hand rule has already been encountered in connection with the unit vectors i, j, and k that obey the rule, and because the vectors are mutually orthogonal the vectors j, k, i and k, i, j also obey the right-hand rule. geometrical deﬁnition of a cross product The cross product (a geometrical interpretation) ˆ Let a and b be two arbitrary vectors, with n a unit vector normal to the plane of ˆ a and b chosen so that a, b, and n, in this order, obey the right-hand rule. Then the cross product of vectors a and b, written a × b, is deﬁned as the vector ˆ a × b = a . b sin θ n. (34) This geometrical deﬁnition of the cross product is useful in many situations, but when the vectors a and b are speciﬁed in terms of their cartesian components a different form of the deﬁnition will be needed. The cross product can be interpreted as a vector area, in the sense that it can ˆ be written a × b = Sn, where S = OA· BN = a · b sin θ is the geometrical area c θ b a FIGURE 2.18 The right-hand rule.
• Section 2.3 The Cross Product 79 B ^ n S b A θ a O FIGURE 2.19 The cross product interpreted as the vector area of a parallelogram. ˆ of the parallelogram in Fig. 2.19, and the unit vector n is normal to the area. This shows that the geometrical area S of the vector parallelogram with sides a and b is simply the modulus of the cross product a × b, so S = a × b . properties of the cross product Properties of the cross product The following results are consequences of the deﬁnition of the cross product. The cross product is anticommutative a × b = −b × a (35) a × (b + c) = a × b + a × c. (36) The cross product is associative Parallel vectors (θ = 0) If vectors a and b are parallel, then a × b = 0. (37) Orthogonal vectors (θ = π/2) If vectors a and b are orthogonal, then ˆ a × b = a . b n. (38) ˆ a × b = sin θ n. (39) Product of unit vectors If a and b are unit vectors, then An immediate consequence of properties (34), (35), and (37) is that i × i = j × j = k × k = 0, (40)
• 80 Chapter 2 Vectors and Vector Spaces and i × j = k, j × i = −k, j × k = i, k × j = −i, k × i = j, i × k = −j. (41) Only results (35) and (36) require some comment, as the other results are obvious. The change of sign in (35) that makes the cross product anticommutative occurs because when the vectors a and b are interchanged, the right-hand rule ˆ causes the direction of n to be reversed. Result (36) can be proved in several ways, but we shall postpone its proof until a different expression for the cross product has been derived. To obtain a more convenient expression for the cross product that can be used when a and b are known in terms of their components, we proceed as follows. Let a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k, and consider the cross product a × b = (a1 i + a2 j + a3 k) × (b1 i + b2 j + b3 k). Expanding this expression term by term is justiﬁed because of the associative property given in (36), and it leads to the result a × b = a1 b1 i × i + a1 b2 i × j + a1 b3 i × k + a2 b1 j × i + a2 b2 j × j + a2 b3 j × k + a3 b1 k × i + a3 b2 k × j + a3 b3 k × k. cross product in terms of components Results (40) cause three terms on the right-hand side to vanish, and results (41) allow the remaining six terms to be collected into three groups as follows to give a × b = (a2 b3 − a3 b2 )i − (a1 b3 − a3 b1 )j + (a1 b2 − a2 b1 )k. (42) This alternative expression for the cross product in terms of the cartesian components of vectors a and b can be further simpliﬁed by making formal use of the third-order determinant, i a × b = a1 b1 j a2 b2 k a3 , b3 because a formal expansion in terms of elements of the ﬁrst row generates result (42). We take this result as an alternative but equivalent deﬁnition of the cross product. practical deﬁnition of a cross product using a determinant The cross product (cartesian component form) Let a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k. Then i a × b = a1 b1 j a2 b2 k a3 , b3 (43) When expressing a × b as the determinant in (43), purely formal use was made of the method of expansion of a determinant in terms of the elements of its ﬁrst row, because (43) is not a determinant in the ordinary sense as its elements are a mixture of vectors and numbers.
• Section 2.3 EXAMPLE 2.16 The Cross Product 81 ˆ Given that a = 3i − 2 j − k and b = i + 4j + 2k, ﬁnd a × b and a unit vector n normal to the plane containing a and b such that a, b, and n, in this order, obey the right-hand rule. Solution Substitution into expression (43) gives i a×b = 3 1 j −2 4 k −1 2 = [(−2) · 2 − 4 · (−1)]i − [3 · 2 − 1 · (−1)] j + [3 · 4 − 1 · (−2)]k = −7j + 14k. ˆ The required unit vector n is simply the unit vector in the direction of a × b, so √ ˆ n = (a × b)/ a × b = (−7j + 14k)/(7 5). √ √ = (−1/ 5)j + (2/ 5)k. We now return to the proof of the associative property stated in (35) and establish it by means of result (43). Setting a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k, and c = c1 i + c2 j + c3 k, we have a × (b + c) = i a1 (b1 + c1 ) j a2 (b2 + c2 ) k a3 . (b3 + c3 ) Expanding the determinant in terms of elements of its ﬁrst row and grouping terms gives a × (b + c) = (a2 b3 − a3 b2 )i − (a1 b3 − a3 b1 )j + (a1 b2 − a2 b1 )k + (a2 c3 − a3 c2 )i − (a1 c3 − a3 c1 )j + (a1 c2 − a2 c1 )k = a × b + a × c, and the result is proved. Summary This section ﬁrst introduced the vector or cross product of two vectors in geometrical terms and then used the result to show that the vector product is anticommutative, in the sense that a × b = −b × a. Important results involving the vector product are given in terms of the components of the two vectors that are involved. Finally, the vector product was expressed in a form that is most convenient for calculations by writing it in determinantal form, the rows of which contain the unit vectors i, j, and k and the components of the respective vectors. EXERCISES 2.3 In Exercises 1 through 6 use (43) to ﬁnd a × b. 1. 2. 3. 4. 5. For a = 2i − j − 4k, b = 3i − j − k. For a = −3i + 2 j + 4k, b = 2i + j − 2k. For a = 7i + 6k, b = 3j + k. For a = 3i + 7j + 2k, b = i − j + k. For a = 2i + j + k, b = 2i − j + k. 6. For a = 3i − 2 j + 6k, b = 2i + j + 3k. In Exercises 7 through 10 verify the equivalence of the definitions of the cross product in (34) and (43) by ﬁrst using ˆ (43) to calculate a × b, and hence a × b and n, and then calculating a and b directly, using result (17) to ﬁnd cos θ and hence sin θ, and using the results to ﬁnd a × b from (34).
• 82 7. 8. 9. 10. Chapter 2 Vectors and Vector Spaces For a = i + j + 3k and b = 3i + 2 j + k. For a = i + j + k and b = 4i + 2 j + 2k. For a = 2i + j − 3k and b = 5i − 2k. For a = −2i − 3j + k and b = 3i + j + 2k. 21. 22. 23. 24. In Exercises 11 through 14, verify by direct calculation that (b + c) × a = −a × (b + c). 11. 12. 13. 14. a = 3j + 2k, b = i − 4j + k, and c = 5i − 2 j + 3k. a = −i + 5j + 2k, b = 4i + k, and c = −2i − 4j + 3k. a = i + k, b = 3i − j − 2k, and c = 3i + j + k. a = 5i + j + k, b = 2i − j − k, and c = 4i + 2 j + 3k. In Exercises 15 through 18 ﬁnd a unit vector normal to a plane containing the given vectors. 3i + j + k and i + 2 j + k. 2i − j + 2k and 2i + 3j + k. i + j + k and 2i + 3j − k. 2i + 2 j − k and 3i + j + 4k. Find a unit vector normal to a plane containing vectors a + b and a + c, given that a = i + 2 j + k, b = 2i + j − 2k, and c = 3i + 2 j + 4k. 20. Given that a = 3i + j + k, b = 2i − j + 2k, and c = i + j + k, ﬁnd (a) a vector normal to the plane containing the vectors a + (a · b)b and c and, (b) explain why the normal to a plane containing the vectors a and b and the normal to a plane containing the vectors (a · b)a and (b · c)b are parallel. 15. 16. 17. 18. 19. In Exercises 21 through 24, ﬁnd the cartesian equation of the plane that passes through the given points. 2.4 25. 26. 27. 28. 29. (1, 3, 2), (2, 0, −4), and (1, 6, 11). (1, 4, 3), (2, 0, 1), and (3, 4, −6). (1, 2, 3), (2, −4, 1), and (3, 6, −1). (1, 0, 1), (2, 5, 7), and (2, 3, 9). Three points with position vectors a, b, and c will be collinear (lie on a line) if the parallelogram with adjacent sides a − b and a − c has zero geometrical area. Use this result in Exercises 25 through 28 to determine which sets of points are collinear. (2, 2, 3), (6, 1, 5), (−2, 4, 3). (1, 2, 4), (7, 0, 8), (−8, 5, −2). (2, 3, 3), (3, 7, 5), (0, −5, −1). (1, 3, 2), (4, 2, 1), (1, 0, 2). A vector N normal to the plane containing the skew vectors a and b can be found as follows. N is normal to a and b, so a · N = 0 and b · N = 0. If a component of N is assigned an arbitrary nonzero value c, say, the other two components can be found from these two equations as multiples of c, and N will then be determined as a multiple of c. A suitable choice of c will make N a ˆ unit normal N. Apply this method to vectors a and b in ˆ Exercise 7 to ﬁnd a vector N. Compare the result with the unit vector ˆ n = (a × b)/ a × b ˆ ˆ found from (43). Explain why although both n and N are normal to the plane containing a and b they may have opposite senses. Linear Dependence and Independence of Vectors and Triple Products The dot and cross products can be combined to provide a simple test that determines whether or not an arbitrary set of three vectors possesses a property of fundamental importance to the algebra of vectors. First, however, some introductory remarks are necessary. Given a set of n vectors a1 , a2 , . . . , an , and a set of n constants c1 , c2 , . . . , cn , the sum c1 a1 + c2 a2 + · · · + cn an linear combination of vectors basis is called a linear combination of the vectors. Linear combinations of the vectors i, j, and k were used in Section 2.1 to express every vector in three-dimensional space as a linear combination of these three vectors. A triad of vectors such as i, j, and k with the property that all vectors in three-dimensional space can be represented as linear combinations of these three vectors is said to form a basis for the space.
• Section 2.4 Linear Dependence and Independence of Vectors and Triple Products 83 z a3 0 a2 a1 y x FIGURE 2.20 Nonorthogonal triad forming a basis in three-dimensional space. It is a fundamental property of three-dimensional space that a basis for the space comprises a set of three vectors a1 , a2 , and a3 , with the property that the linear combination c1 a1 + c2 a2 + c3 a3 = 0 linear independence and linear dependence (44) is only true when c1 = c2 = c3 = 0. Vectors a1 , a2 , and a3 satisfying this condition are said to be linearly independent vectors, and a vector d of the form d = c1 a1 + c2 a2 + c3 a3 , where not all of c1 , c2 , and c3 are zero, is said to be linearly dependent on the vectors a1 , a2 , and a3 . The vectors i, j, and k that form a basis for three-dimensional space are linearly independent vectors, but the position vector r = 2i − 3j + 5k is linearly dependent on vectors i, j, and k. Clearly, vectors i, j, and k do not form the only basis for three-dimensional space, because any triad of linearly independent vectors a1 , a2 , and a3 will serve equally well, as, for example, the nonorthogonal set of vectors shown in Fig. 2.20. The dot and cross products will now be combined to develop a test for linear dependence and independence based on the elementary geometrical idea of the volume of the parallelepiped shown in Fig. 2.21, three edges a, b, and c of which meet at the origin. z y ^ n V c b θ 0 a FIGURE 2.21 Volume V of a parallelepiped. x
• 84 Chapter 2 Vectors and Vector Spaces The volume V of a parallelepiped is a nonnegative number given by the product of the area of its base and its height. Suppose vectors a and b are chosen to form two sides of the base of the parallelepiped. Then the vector area of this base has already been interpreted as a × b. The vertical height of the parallelepiped is the ˆ projection of vector c in the direction of the unit vector n normal to the base, and ˆ ˆ so is given by n · c. Consequently, as a × b = a · b sin θ n, it follows that V = |(a × b) · c|. (45) The absolute value of the right-hand side of (45) has been taken because a volume must be a nonnegative quantity, whereas the dot product (a × b) · c may be of either sign. If vectors a, b, and c form a basis for three-dimensional space, vector c cannot be linearly dependent on vectors a and b, and so the parallelepiped in Fig. 2.21 with these vectors as its sides must have a nonzero volume. If, however, vectors a, b, and c are coplanar (all lie in the same plane), and so cannot form a basis for the space, the volume of the parallelepiped will be zero. These simple geometrical observations lead to the following test for the linear independence of three vectors in three-dimensional space. THEOREM 2.3 a test for linear independence scalar triple product Test for linear independence of vectors in three-dimensional space Let a, b, and c be any three vectors. Then the vectors are linearly independent if (a × b)· c = 0, and they are linearly dependent if (a × b) · c = 0. A product of the type (a × b) · c is called a scalar triple product. The name arises because the result is a scalar. It is also called a mixed triple product since both · and × appear. Three vectors are involved in this dot (scalar) product, one of which is the vector a × b and the other is the vector c. Scalar triple products are easily evaluated, because taking the dot product of a × b in the form given in (42) with c = c1 i + c2 j + c3 k gives (a × b) · c = (a2 b3 − a3 b2 )c1 − (a1 b3 − a3 b1 )c2 + (a1 b2 − a2 b1 )c3 . The right-hand side of this expression is simply the value of a determinant with successive rows given by the components of a, b, and c, so we have arrived at the following convenient formula for the scalar triple product. scalar triple product as a determinant Scalar triple product Let a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k, and c = c1 i + c2 j + c3 k. Then a1 (a × b) · c = b1 c1 a2 b2 c2 a3 b3 . c3 (46) Interchanging any two rows in a matrix changes the sign but not the value of its determinant. Two such switches in (46) leave the value unchanged, so the dot
• Section 2.4 Linear Dependence and Independence of Vectors and Triple Products 85 product is commutative and so we arrive at the useful result (a × b) · c = a · (b × c). (47) So, in a scalar triple product the dot and cross may be interchanged without altering the result. EXAMPLE 2.17 Given the two sets of vectors (a) a = i + 2j − 5k, b = i + j + 2k, c = i + 4j − 19k and (b) a = 2i + j + k, b = 3i + 4k, c = i + j + k, ﬁnd if the vectors are linearly independent or linearly dependent. Solution We apply Theorem 2.3 to each set, using result (46) to evaluate the scalar triple products. (a) 1 (a × b) · c = 1 1 −5 2 = 0, −19 2 1 4 so the set of three vectors in (a) is linearly dependent. In fact this can be seen from the fact that c = 3a − 2b. (b) 2 (a × b) · c = 3 1 1 0 1 1 4 = −4 = 0, 1 so the set of three vectors in (b) is linearly independent. Although not required, the volume V of the parallelepiped formed by these three vectors is V = |(a × b) · c| = | − 4| = 4. Another notation for the scalar triple product of vectors a, b, and c is [a, b, c], so [a, b, c] = (a × b) · c, (48) or, in terms of a determinant, a1 [a, b, c] = b1 c1 alternative forms of a scalar triple product a2 b2 c2 a3 b3 . c3 (49) Using this deﬁnition of [a, b, c] with the row interchange property of determinants (see Section 1.7) shows that [a, b, c] = [b, c, a] = [c, a, b], (50) because two row interchanges are needed to arrive at [b, c, a] from [a, b, c], leaving the sign of the determinant unchanged, whereas two more are required to arrive at [c, a, b] from [b, c, a], again leaving the sign of the determinant unchanged. The order of the vectors in results (46), or in the equivalent notation of (48), is easily remembered when the results are abbreviated to a b b c c a c a b
• 86 Chapter 2 Vectors and Vector Spaces In this pattern, row two follows from row one when the ﬁrst letter is moved to the end position, and row three follows from row two by means of the same process. The effect of applying this process to the third row is simply to regenerate the ﬁrst row. Rearrangements of this kind are called cyclic permutations of the three vectors. Again making use of the row interchange property of determinants (see Section 1.7), it follows that [a, b, c] = −[a, c, b], because this time only one row interchange is needed to produce the result on the right from the one on the left, so that a sign change is involved. A different product involving the three vectors a, b, and c that this time generates another vector is of the form a × (b × c), and products of this type are called vector triple products since the results are vectors. In these products it is essential to include the brackets because, in general, a × (b × c) = (a × b) × c. The most important results concerning vector triple products are given in the following theorem. THEOREM 2.4 Vector triple products (a) vector triple product If a, b, and c are any three vectors, then a × (b × c) = (a · c)b − (a · b)c and (b) (a × b) × c = (a · c)b − (b · c)a. Proof The proof of the results in Theorem 2.4 both follow in similar fashion, so we only prove result (a) and leave the proof of result (b) as an exercise. We write the cross product a × (b × c) in the form of the determinant in (43), with the components of a in the second row and those of b × c (obtained from (42)) in the third row when we ﬁnd that a × (b × c) = i a1 (b2 c3 − b3 c2 ) j a2 (b3 c1 − b1 c3 ) k a3 . (b1 c2 − b2 c1 ) Expanding this determinant in terms of the elements of its ﬁrst row and grouping terms gives a × (b × c) = [(a2 c2 + a3 c3 )b1 − (a2 b2 + a3 b3 )c1 ]i + [(a1 c1 + a3 c3 )b2 − (a1 b1 + a3 b3 )c2 ] j + [(a1 c1 + a2 c2 )b3 − (a1 b1 + a2 b2 )c3 ]k. As it stands, this result is not yet in the form that is required, but adding and subtracting a1 b1 c1 to the coefﬁcient of i, a2 b2 c2 to the coefﬁcient of j, and a3 b3 c3 to the coefﬁcient of k followed by grouping terms give a × (b × c) = (a · c)b − (a · b)c, and the result is established.
• Section 2.4 EXAMPLE 2.18 Linear Dependence and Independence of Vectors and Triple Products 87 Find a × (b × c) and (a × b) × c, given that a = 3i + j − 4k, b = 2i + j + 3k, and c = i + 5j − k. Solution a · b = −5, a · c = 12, and b · c = 4, so a × (b × c) = (a · c)b − (a · b)c = 12b + 5c = 29i + 37j + 31k, and (a × b) × c = (a · c)b − (b · c)a = 12b − 4a = 12i + 8j + 52k. Accounts of geometrical vectors can be found, for example, in references [2.1], [2.3], [2.6], and [1.6]. Summary This section introduced the two fundamental concepts of linear dependence and independence of vectors. It then showed how the scalar triple product involving three vectors, that gives rise to a scalar quantity, provides a simple test for the linear dependence or independence of the vectors involved. A simple and convenient way of calculating a scalar triple product was shown to be in terms of a determinant with the elements in its rows formed by the components of the three vectors involved in the product. Finally a vector triple product was deﬁned that gives rise to a vector quantity, and it was shown that to avoid ambiguity it is necessary to bracket a pair of vectors in such a product. A rule for the expansion of a vector triple product was derived and shown to involve a linear combination of two of the vectors multiplied by scalar products so that, for example, a × (b × c) = (a · c)b − (a · b)c. EXERCISES 2.4 In Exercises 1 through 4 use the vectors a, b, and c to ﬁnd (a) the scalar triple product a · (b × c), and (b) the volume of the parallelepiped determined by these three vectors directed away from a corner. 1. 2. 3. 4. a = 2i − j − 3k, b = 3i − 2k, c = i + j − 4k. a = i − j + 2k, b = i + j + 3k, c = 2i − j + 3k. a = −i − j + k, b = 2i + 2 j + 3k, c = −4i + j + 3k. a = 5i + 3k, b = 2i − j, c = −2i + 3j − 2k. In Exercises 5 through 10 ﬁnd which sets of vectors are coplanar. 5. 6. 7. 8. 9. 10. i + 3j + 2k, 2i + j + 4k, 4i + 7j + 8k. 2i + j + 4k, i + 2 j + k, 4i + 3j + 6k. 2i + k, i + 4j + 2k, 3i + 12 j + 7k. i + j + k, 2i + j + 2k, 4i + 3j + k. 2i + j − k, 3i + j + 2k, 5i + j + 8k. 2i + j − k, i + 2 j + 2k, 5i + 4j + k. In Exercises 11 through 15 use computer algebra to verify that [a, b, c] = [c, a, b] = −[a, c, b]. 11. a = i + j + k, b = 2i + j − k, c = 3i − j + k. 12. a = i − j − k, b = −5i + 2 j − 3k, c = 2i + 3j − 2k. 13. a = −3i − 4j + k, b = 9i + 12 j − 3k, c = i + 2j + k. 14. a = 3i + 4k, b = i + 5k, c = 2 j + k. 15. Prove that if a, b, c, and d are any four vectors, and λ, μ are arbitrary scalars [λa + μb, c, d] = λ[a, c, d] + μ[b, c, d]. Use computer algebra with vectors a, b, c, d from Exercise 12 with d = 4c − 2 j + 6k, and scalars λ, μ of your choice, to verify this result. In Exercises 16 through 20 ﬁnd (a) the cartesian equation of the plane containing the given points, and (b) a unit vector normal to the plane. 16. 17. 18. 19. 20. 21. (1, 2, 1), (3, 1, −2), (2, 1, 4). (2, 0, 3), (0, 1, 0), (2, 4, 5). (−1, 2, −3), (2, 4, 1), (3, 0, 1). (1, 2, 5), (−2, 1, 0), (0, 2, 0). Prove result (b) of Theorem 2.4. Show that a × (b × c) + b × (c × a) + c × (a × b) = 0. 22. The law of sines for a triangle with angles A, B, and C opposite sides with the respective lengths a, b, and c
• 88 Chapter 2 Vectors and Vector Spaces takes the form b c a = = . sin A sin B sin C Prove this by considering a vector triangle with sides a, b, and c, where c = a + b, and taking the cross product of c = a + b ﬁrst with a, then with b, and ﬁnally with c. In Exercises 23 through 26 use the fact that four points with position vectors p, q, r, and s will be coplanar if the vectors p − q, p − r, and p − s are coplanar to ﬁnd which sets of points all lie in a plane. 23. 24. 25. 26. 27. (1, 1, −1), (−3, 1, 1), (−1, 2, −1), (1, 0, 0). (1, 2, −1), (2, 1, 1), (0, 1, 2), (1, 1, 1). (0, −4, 0), (2, 3, 1), (3, −4, −2), (4, −2, −2). (1, 2, 3), (1, 0, 1), (2, 1, 2), (4, 1, 0). The volume of a tetrahedron is one-third of the product of the area of its base and its vertical height. Show the volume V of the tetrahedron in Fig. 2.22, in which three edges formed by the vectors a, b, and c are directed away from a vertex, is given by V = (1/6)|a · (b × c)| 28. Let a, b, c, and d be vectors and λ, μ, ν be scalars satisfying the equation λ(b × c) + μ(c × a) + ν(a × b) + d = 0. Show that if a, b, and c are linearly independent, then λ = −(a · d)/[a · (b × c)], ν = −(c · d)/[a · (b × c)]. 2.5 μ = −(b · d)/[a · (b × c)], a b c FIGURE 2.22 Tetrahedron. 29. Let a, b, c, and d be vectors and λ, μ, ν be scalars satisfying the equation λa + μb + νc + d = 0. By taking the scalar products of this equation ﬁrst with b × c, then with a × c, and ﬁnally with a × b, show that if a, b, and c are linearly independent, then λ = −d · (b × c)/[a · (b × c)], μ = −d · (c × a)/[a · (b × c)], ν = −d · (a × b)/[a · (b × c)]. 30. Show that a = i + 2 j + k, b = 2i − j − k, and c = 4i + 3j + ik are linearly independent vectors, and use them with a vector d of your choice to verify the results of Exercises 28 and 29. 31. Prove the Lagrange identity (a × b) · (c × d) = (a · c)(b · d) − (a · d)(b · c). n-Vectors and the Vector Space R n n-tuples There are many occasions when it is convenient to generalize a vector and its associated algebra to spaces of more than three dimensions. A typical situation occurs in mechanics, where it is sometimes necessary to consider both the position and the momentum of a particle as functions of time. This leads to the study of a 6-vector, three components of which specify the particle position and three its momentum vector at a time t. Sets of n numbers (x1 , x2 , . . . , xn ) in a given order, that can be thought of either as n-vectors or as the coordinates of a point in n-dimensional space are called ordered n-tuples of real numbers or, simply, n-tuples.
• n-Vectors and the Vector Space R n Section 2.5 n-vector 89 An n-vector If n ≥ 2 is an integer, and x1 , x2 , . . . , xn are real numbers, an n-vector is an ordered n-tuple (x1 , x2 , . . . , xn ). components and dimension norm in R n The numbers x1 , x2 , . . . , xn are called the components of the n-vector, xi is the ith component of the vector, and n is called the dimension of the space to which the n-vector belongs. For any given n, the set of all vectors with n real components is called a real n-space or, simply, an n-space, and it is denoted by the symbol Rn . A corresponding space exists when the n numbers x1 , x2 , . . . , xn are allowed to be complex numbers, leading to a complex n-space denoted by C n . In this notation R3 is the three-dimensional space used in previous sections. In R3 the length of a vector was taken as the deﬁnition of its norm, so if 2 2 2 r = x1 i + x2 j + x3 k, then r = (x1 + x2 + x3 )1/2 . A generalization of this norm to R n leads to the following deﬁnition. The norm in Rn The norm of the n-vector (x1 , x2 , . . . , xn ), denoted by (x1 , x2 , . . . , xn ) is (x1 , x2 , . . . , xn ) = 2 2 2 x1 + x2 + · · · + xn 1/2 n = xi2 (51) . i=1 The laws for the equality, addition, and scaling of vectors in R3 in terms of the components of the vector generalize to R n as follows. Equality of n-vectors algebraic rules for equality, addition, and scaling using components Let (x1 , x2 , . . . , xn ) and (y1 , y2 , . . . , yn ) be two n-vectors. Then the vectors will be equal, written (x1 , x2 , . . . , xn ) = (y1 , y2 , . . . , yn ), if, and only if, corresponding components are equal, so that x1 = y1 , x2 = y2 , . . . , xn = yn . (52) Addition of n-vectors Let (x1 , x2 , . . . , xn ) and (y1 , y2 , . . . , yn ) be any two n-vectors. Then the sum of these vectors, written (x1 , x2 , . . . , xn ) + (y1 , y2 , . . . , yn ), is deﬁned as the vector whose ith component is the sum of the corresponding ith components of the vectors for i = 1, 2, . . . , n, so that (x1 , x2 , . . . , xn ) + (y1 , y2 , . . . , yn ) = (x1 + y1 , x2 + y2 , . . . , xn + yn ). (53)
• 90 Chapter 2 Vectors and Vector Spaces Scaling an n-vector Let (x1 , x2 , . . . , xn ) be an arbitrary n-vector and λ be any scalar. Then the result of scaling the vector by λ, written λ(x1 , x2 , . . . , xn ), is deﬁned as the vector whose ith component is λ times the ith component of the original vector, for i = 1, 2, . . . , n, so that λ(x1 , x2 , . . . , xn ) = (λx1 , λx2 , . . . , λxn ). (54) The null (zero) vector in Rn is the vector 0 in which every component is zero, so that 0 = (0, 0, . . . , 0). (55) As with vectors in R3 , so also with n-vectors in Rn , it is convenient to use a single boldface symbol for a vector and the corresponding italic symbols with sufﬁxes when it is necessary to specify the components. So we will write x = (x1 , x2 , . . . , xn ) and y = (y1 , y2 , . . . , yn ). The reasoning that led to the interpretation of Theorem 2.1 on the algebraic rules for the addition and scaling of vectors in R3 leads also the following theorem for n-vectors. THEOREM 2.5 Algebraic rules for the addition and scaling of n-vectors in R n Let x, y, and z be arbitrary n-vectors, and let λ and μ be arbitrary real numbers. Then: (i) (ii) (iii) (iv) (v) (vi) (vii) x + y = y + x; x + 0 = 0 + x = x; (x + y) + z = x + (y + z); λ(x + y) = λx + λy; (λμ)x = λ(μx) = μ(λx); (λ + μ)x = λx + μx; λx = |λ| x . Because of this similarity between vectors in R3 and in Rn , the space Rn is called a real vector space, though because the symbol R indicates real numbers this is usually abbreviated a vector space. Analogously, when the elements of the n-vectors are allowed to be complex, the resulting space is called the complex vector space C n . So far there would seem to be little difference between vectors in R3 and Rn , but major differences do exist, and they are best appreciated when geometrical analogies are sought for vector operations in Rn . dot product of n-vectors The dot product of n-vectors Let x = (x1 , x2 , . . . , xn ) and y = (y1 , y2 , . . . , yn ) be any two n-vectors. Then the dot product of these two vectors, written x · y and also called their inner
• Section 2.5 n-Vectors and the Vector Space R n 91 product, is deﬁned as the sum of the products of corresponding components, so that (x1 , x2 , . . . , xn ) · (y1 , y2 , . . . , yn ) = x1 y1 + x2 y2 + . . . + xn yn . (56) The following properties of this dot product are strictly analogous to those of the dot product in R3 and can be deduced directly from (56). THEOREM 2.6 Properties of the dot product in R n Let x, y, and z be any three n-vectors and λ be any scalar. Then: (i) (ii) (iii) (iv) (v) (vi) x · y = y · x; x · (y + z) = x · y + x · z; (λx) · y = x · (λy) = λ(x · y); x · x = x 2; x · 0 = 0; x 2 = 0 if, and only if, x = 0. The existence of a dot product in Rn allows the Cauchy–Schwarz and triangle inequalities to be generalized, both of which play a fundamental role in the study of vector spaces. Various forms of proof of these inequalities are possible, but the one given here has been chosen because it makes full use of the properties of the dot product listed in Theorem 2.6. THEOREM 2.7 The Cauchy–Schwarz and triangle inequalities Let x = (x1 , x2 , . . . , xn ) and y = (y1 , y2 , . . . , yn ) be any two n-vectors. Then generalized inequalities for n-vectors (a) |x · y| ≤ x · y (Cauchy–Schwarz inequality), and (b) x+y ≤ x + y (triangle inequality). Proof We start by proving the Cauchy–Schwarz inequality in (a). The inequality is certainly true if x · y = 0, so we need only consider the case x · y = 0. Let x and y be any two n-vectors, and λ be a scalar. Then, using properties (ii) to (iv) of Theorem 2.6, x + λy 2 = (x + λy) · (x + λy), = x 2 + λx · y + λy · x + λ2 y 2 . However, by result (1) of Theorem 2.6, y · x = x · y, so x + λy 2 = x 2 + 2λx · y + λ2 y 2 . We now set λ = − x 2 /(x · y) to obtain x + λy 2 =− x 2 +( x 4 y 2 )/|x · y|2 , where we have used the fact that (x · y)2 = |x · y|2 . As x + λy result is equivalent to − x 2 +( x 4 · y 2 )/|x · y|2 ≥ 0. 2 is nonnegative, this
• 92 Chapter 2 Vectors and Vector Spaces Cancelling the nonnegative number x 2 , which leaves the inequality sign unchanged; rearranging the terms; and taking the square root of the remaining nonnegative result on each side of the inequality yields the Cauchy–Schwarz inequality |x · y| ≤ x · y . To prove the triangle inequality (b) we set λ = 1 and start from the result x+y 2 = x 2 + 2x · y + y 2 . As x · y may be either positive or negative, x · y ≤ |x · y|, so making use of the Cauchy–Schwarz inequality shows that x+y 2 ≤ x 2 +2 x · y + y 2 = ( x + y )2 . The triangle inequality follows from taking the square root of each side of this inequality, which is permitted because both are nonnegative numbers. The dot product in R3 allowed the angle between vectors to be determined and, more importantly, it provided a test for the orthogonality of vectors. These same geometrical ideas can be introduced into the vector space Rn if the Cauchy–Schwarz inequality is written in the form − x · y ≤x·y≤ x · y . After division by the nonnegative number x · y , this becomes −1 ≤ x·y ≤ 1. x · y This enables the angle θ between the two n-vectors x and y to be deﬁned by the result x·y . x · y cos θ = orthogonality of n-vectors unit n-vector On account of this result, two n-vectors x and y in Rn will be said to be orthogonal when x · y = 0. By analogy with R3 we will call x = (x1 , x2 , . . . , xn ) a unit n-vector if x = 1. If we deﬁne the unit n-vectors e1 , e2 , . . . , en as e1 = (1, 0, 0, 0, . . . , 0), e2 = (0, 1, 0, 0, . . . , 0), . . . , en = (0, 0, 0, 0, . . . , 1), we see that ei · e j = 1 0 for i = j for i = j, showing that the ei are mutually orthogonal unit n-vectors in Rn . As a result of this the vectors e1 , e2 , . . . , en play the same role in Rn as the vectors i, j, and k in R3 . This allows the vector x = (x1 , x2 , . . . , xn ) to be written as x = x1 e1 + x2 e2 + · · · + xn en , where xi is the ith component of x.
• Section 2.5 n-Vectors and the Vector Space R n 93 Now suppose that for n > 3, we set u1 = (1, 0, 0, 0, . . . , 0), subspaces u2 = (0, 1, 0, 0, . . . , 0), u3 = (0, 0, 1, 0, . . . , 0), and all other ui identically zero, so that ui = (0, 0, 0, 0, . . . , 0) for i = 4, 5, . . . , n. Then it is not difﬁcult to see that u1 , u2 , and u3 behave like the unit vectors i, j, and k, so that, in some sense the vector space R3 is embedded in the vector space Rn with vectors in both spaces obeying the same algebraic rules for addition and scaling. This is recognized by saying that R3 is a subspace of Rn . Subspace of Rn A subset S of vectors in the vector space Rn is called a subspace of Rn if S is itself a vector space that obeys the rules for the addition and the scaling of vectors in Rn . EXAMPLE 2.19 Find the condition that the set S of vectors of the form (x, mx + c, 0), for any m and all real x forms a subspace of the vector space R3 , and give a geometrical interpretation of the result. Solution The set S can only contain the null vector (0, 0, 0) if c = 0, so if c = 0 the vectors in S cannot form a subspace of R3 . Now let c = 0, so that S contains the null vector. The vector addition law holds, because if (x, mx, 0) and (x , mx , 0) are vectors in S, the sum (x, mx, 0) + (x , mx , 0) = (x + x , m(x + x ), 0) is also a vector in S. The scaling λ(x, mx, 0) = (λx, mλx, 0) also generates a vector in S, so the scaling law for vectors also holds, showing that S is a subspace of R3 provided c = 0. If the three components of vectors in S are regarded as the x-, y-, and z-components of a vector in R3 , the vectors can be interpreted as points on the straight line y = mx passing through the origin and lying in the plane z = 0. This subspace is a one-dimensional vector space embedded in the three-dimensional vector space R3 . EXAMPLE 2.20 Test the following subsets of Rn to determine if they form a subspace. (a) S is the set of vectors (x1 , x1 + 1, . . . , xn ) with all the xi real numbers. (b) S is the set of vectors (x1 , x2 , . . . , xn ) with x1 + x2 + · · · + xn = 0 and all the xi are real numbers. Solution (a) The set S does not contain the null vector and so cannot form a subspace of Rn . This result is sufﬁcient to show that S is not a subspace, but to see what properties of a subspace the set S possesses we consider both the summation and scaling of vectors in S. If (x1 , x1 + 1, . . . , xn ) and (x1 , x1 + 1, . . . , xn ) are two vectors in S, their sum (x1 , x1 + 1, . . . , xn ) + (x1 , x1 + 1, . . . , xn ) = (x1 + x1 , x1 + x1 + 2, . . . , xn + xn ) is not a vector in S, so the summation law is not satisﬁed.
• 94 Chapter 2 Vectors and Vector Spaces The scaling condition for vectors is not satisﬁed, because if λ is an arbitrary scalar, λ(x1 , x1 + 1, . . . , xn ) = (λx1 , λx1 + λ, . . . , λxn ) = (a, a + 1, . . .), (λn1 = a) showing that scaling generates another a vector in S. We have proved that the vectors in S do not form a subspace of Rn . (b) The set S does contain the null vector, because x1 = x2 = · · · = xn = 0 satisﬁes the constraint condition x1 + x2 + · · · + xn = 0. Both the summation law and the scaling law for vectors are easily seen to be satisﬁed, so this set S does form a subspace of Rn . EXAMPLE 2.21 Let C(a, b) be the space of all real functions of a single real variable x that are continuous for a < x < b, and let S(a, b) be the set of all functions belonging to C(a, b) that have a derivative at every point of the interval a < x < b. Show that S(a, b) forms a subspace of C(a, b). Solution In this case a vector in the space is simply any real function of a single real variable x that is continuous in the interval a < x < b. The null vector corresponds to the continuous function that is identically zero in the stated interval, so as the derivative of this function is also zero, it follows that the set S(a, b) must also contain the null vector. The sum of continuous functions in a < x < b is a continuous function, and the sum of differentiable functions in this same interval is a differentiable function, so the summation law for vectors is satisﬁed. Similarly, scaling continuous functions and differentiable functions does not affect either their continuity or their differentiability, so the scaling law for vectors is also satisﬁed. Thus, S(a, b) forms a subspace of C(a, b). Think of the dimension of these spores as inﬁnite; norm and inner product are easy to deﬁne. Summary This section generalized the concept of a three-dimensional vector to a vector with n components in R n . It was shown that the magnitude of a vector in three space dimensions generalizes to the norm of a vector in R n and that in terms of components, the equality, addition, and scaling of vectors in R n follow the same pattern as with three space dimensions. The dot product was generalized and two fundamental inequalities for vectors in R n were derived. The concept of orthogonality of vectors was generalized and the notion of a subspace of R n was introduced. EXERCISES 2.5 In Exercises 1 through 8 ﬁnd the sum of the given pairs of vectors, their norms, and their dot product. 1. 2. 3. 4. 5. 6. 7. 8. (2, 1, 0, 2, 2), (1, −1, 2, 2, 4). (3, −1, −1, 2, −4), (1, 2, 0, 0, 3). (2, 1, −1, 2, 1), (−2, −1, 1, −2, −1). (3, −2, 1, 1, 2, 0, 1), (1, −1, 1, −1, 1, 0, 1). (3, 0, 1, 0), (0, 2, 0, 4). (1, −1, 2, 2, 0, 1), (2, −2, 1, 1, 1, 0). (−1, 2, −4, 0, 1), (2, −1, 1, 0, 2). (3, 1, 2, 4, 1, 1, 1), (1, 2, 3, −1, −2, 1, 3). In Exercises 9 through 12 ﬁnd the angle between the given pairs of n-vectors and the unit n-vector associated with each vector. 9. 10. 11. 12. (3, 1, 2, 1), (1, −1, 2, 2). (4, 1, 0, 2), (2, −1, 2, 1). (2, −2, −2, 4), (1, −1, −1, 2). (2, 1, −1, 1), (1, −2, 2, 2). In Exercises 13 through 18 determine if the set of vectors S forms a subspace of the given vector space. Give reasons why S either is or is not a subspace.
• Section 2.6 13. S is the set of vectors of the form (x1 , x2 , . . . , xn ) in Rn , 4 with the xi real numbers and x2 = x1 . 14. S is the set of vectors of the form (x1 , x2 , . . . , xn ) in Rn , with the xi real numbers and x1 + 2x2 + 3x3 + · · · + nxn = 0. 15. S is the set of vectors of the form (x1 , x2 , . . . , xn ) in Rn , with the xi real numbers and x1 + x2 + x3 + · · · + xn = 2. 16. S is the set of vectors of the form (x1 , x2 , . . . , x6 ) in R 6 , with the xi real numbers and x1 = 0 or x6 = 0. 17. S is the set of vectors of the form (x1 , x2 , . . . , x6 ) in R 6 , with the xi real numbers and x1 − x2 + x3 · · · + x6 = 0. 18. S is the set of vectors of the form (x1 , x2 , . . . , x5 ) in R5 , with the xi real numbers and x2 < x3 . In Exercises 19 to 23 determine if the given set S is a subspace of the space C[0, 1] of all real valued functions that are continuous on the interval 0 ≤ x ≤ 1. Give reasons why either S is a subspace, or it is not. 19. S is the set of all polynomials of degree two. 20. S is the set of all polynomial functions. 21. S is the set of all continuous functions such that f (0) = f (1) = 0. 22. S is the set of all continuous functions such that f (0) = 0 and f (1) = 2. 23. S is the set of all continuous once differentiable functions such that f (0) = 0 and f (x) > 0. 24. Prove that the set S of all vectors lying in any plane in R3 that passes through the origin forms a subspace of R3 . 25. Explain why the set S of all vectors lying in any plane in R3 that does not pass through the origin does not form a subspace of R3 . 2.6 Linear Independence, Basis, and Dimension 95 26. Consider the polynomial P(λ) deﬁned as P(λ) = x + λy 2 , where x and y are vectors in Rn . Show, provided not both x and y are null vectors, that the graph of P(λ) as a function of λ is nonnegative, so P(λ) = 0 cannot have real roots. Use this result to prove the Cauchy–Schwarz inequality |x · y| ≤ x · y . 27. Let x and y be vectors in Rn and λ be a scalar. Prove that x + λy 2 + x − λy 2 = 2( x + λ2 y 2 ). 2 28. If x and y are orthogonal vectors in Rn , prove that the Pythagoras theorem takes the form x+y 2 = x 2 + y 2. 29. What conditions on the components of vectors x and y in the Cauchy–Schwarz inequality cause it to become an equality, so that n 1/2 n xi yi = 1/2 n xi2 i=1 yi2 i=1 ? i=1 30. Modify the method of proof used in Theorem 2.7 to prove the complex form of the Cauchy–Schwarz inequality n 1/2 n |xi |2 xi yi ≤ i=1 i=1 1/2 n + |yi |2 , i=1 where the xi and yi are complex numbers. Linear Independence, Basis, and Dimension The concept of the linear independence of a set of vectors in R3 introduced in Section 2.4 generalizes to Rn and involves a linear combination of n-vectors. Linear combination of n-vectors Let x1 , x2 , . . . , xm be a set of n-vectors in Rn . Then a linear combination of the n-vectors is a sum of the form c1 x1 + c2 x2 + · · · + cmxm, where c1 , c2 , . . . , cm are nonzero scalars.
• 96 Chapter 2 Vectors and Vector Spaces An example of a linear combination of vectors in R5 is provided by the vector sum (m = 3, n = 5) 2x1 + x2 + 3x3 , where x1 = (1, 2, 3, 0, 4), x2 = (2, 1, 4, 1, −3), and x3 = (6, 0, 2, 2, −1). The vector in R5 formed by this linear combination is 2x1 + x2 + 3x3 = 2(1, 2, 3, 0, 4) + (2, 1, 4, 1, −3) + 3(6, 0, 2, 2, −1), = (22, 5, 16, 7, 2). A linear combination of n-vectors is the most general way of combining nvectors, and the deﬁnition of a linear combination of vectors contains within it the deﬁnition of the scaling of a single n-vector as a special case. This can be seen by setting m = 1, because this reduces the linear combination to the single scaled n-vector c1 x1 . linear dependence and independence of n-vectors Linear dependence of n-vectors Let x1 , x2 , . . . , xm be a set of n-vectors in Rn . Then the set is said to be linearly dependent if, and only if, one of the n-vectors can be expressed as a linear combination of the remaining n-vectors. An example of linear dependence in R 4 is provided by the vectors x1 = (1, 0, 2, 5), x2 = (2, 1, 2, 1), x3 = (3, 2, 1, 0), and x4 = (−1, −1, −1, 7), because x4 = 2x1 − 3x2 + x3 . Linear independence of n-vectors Let x1 , x2 , . . . , xm be a set of n-vectors in Rn . Then the set is said to be linearly independent if, and only if, the n-vectors are not linearly dependent. A simple example of a set of linearly independent vectors in R 4 is provided by the vectors e1 = (1, 0, 0, 0), e2 = (0, 1, 0, 0), and e3 = (0, 0, 1, 0). The linear independence of these 4-vectors can be seen from the fact that for no choice of c1 and c2 can the vector c1 e1 + c2 e2 be made equal to e3 . To make effective use of the concept of linear independence, and to understand the notion of the basis and dimension of a vector space, it is necessary to have a test for linear independence. Such a test is provided by the following theorem. THEOREM 2.8 Linear dependence and independence Let S be a set of non-zero n-vectors x1 , x2 , . . . , xm, with m ≥ 2. Then: (a) Set S is linearly dependent if the vector equation c1 x1 + c2 x2 + · · · + cmxm = 0 is true for some set of scalars (constants) c1 , c2 , . . . , cm that are not all zero;
• Section 2.6 Linear Independence, Basis, and Dimension 97 (b) Set S is linearly independent if the vector equation c1 x1 + c2 x2 + · · · + cmxm = 0 is only true when c1 = c2 = · · · = cm = 0. Proof To establish result (a) it is necessary to show that the conditions of the deﬁnition of linear dependence are satisﬁed. First, if the set S of n-vectors is linearly dependent, scalars d1 , d2 , . . . , dm exist such that d1 x1 + d2 x2 + · · · + dmxm = 0. There is no loss of generality in assuming that d1 = 0, because if this is not the case a renumbering of the vectors can always make this possible. Consequently, x1 = (−d2 /d1 )x2 + (−d3 /d1 )x3 + · · · + (−dm/d1 )xm, which shows, as claimed, that the set S is linearly dependent, because x1 is linearly dependent on x2 , x3 , . . . , xm. A similar argument applies to show that xr is linearly dependent on the remaining n-vectors in S provided dr = 0, for r = 2, 3, . . . , m. Conversely, if one of the n-vectors in set S, say x1 , is linearly dependent on the remaining n-vectors in the set, scalars d1 , d2 , . . . , dm can be found such that x1 = d2 x2 + · · · + dmxm, so that x1 − d2 x2 − · · · − dmxm = 0. This result is of the form given in deﬁnition of linear dependence with c1 = 1, c2 = −d2 , . . . , cm = −dm, not all of which constants are zero, so again the set of n-vectors in S is seen to be linearly dependent. To establish result (b), suppose, if possible, that the set S of vectors is linearly independent, but that some scalars d1 , d2 , . . . , dm that are not all zero can be found such that d1 x1 + d2 x2 + · · · + dmxm = 0. Then if d1 = 0, say, is one of these scalars, it follows that x1 = (−d2 /d1 )x2 + (−d3 /d1 )x3 + · · · + (−dm/d1 )xm, which is impossible because this shows that, contrary to the hypothesis, x1 is linearly dependent on the remaining n-vectors in S. So we must have c1 = c2 = · · · = cm = 0. A systematic and efﬁcient computational method for the application of Theorem 2.8 to vectors in Rn will be developed in the next chapter for the three separate cases that arise, (a) m < n, (b) m = n, and (c) m > n. However, when n and m are small, a straightforward approach is possible, as illustrated in the next example. EXAMPLE 2.22 Test the following sets of vectors in R4 for linear dependence or independence. (a) x1 = (2, 1, 1, 0), x2 = (0, 2, 0, 1), x3 = (1, 1, 0, 2), x4 = (0, 2, 1, 1). (b) x1 = (4, 0, 2), x2 = (2, 2, 0), x3 = (1, 1, 0), x4 = (5, 1, 2).
• 98 Chapter 2 Vectors and Vector Spaces Solution In both (a) and (b) it is necessary to consider the vector equation c1 x1 + · · · + cmxm = 0. If the equation is only satisﬁed when c1 = c2 = · · · = cm = 0, the set of vectors will be linearly independent, whereas if a solution can be found in which not all of the constants c1 , c2 , c3 , c4 vanish, the set of vectors will be linearly dependent. (a) Substituting for x1 , x2 , x3 , x4 in the preceding equation and equating corresponding components show the coefﬁcients ci must satisfy the following equations 2c1 + c3 = 0 c1 + 2c2 + c3 + 2c4 = 0 c1 + c 4 = 0 c2 + 2c3 + c4 = 0. The third equation shows that c4 = −c1 , so the equations can be rewritten as 2c1 + c3 = 0 −c1 + 2c2 + c3 = 0 c2 − c1 + 2c3 = 0. Adding twice the third equation to the ﬁrst equation shows that c3 = 0, so c1 = 0, and it then follows that c2 = c3 = c4 = 0. This has established the linear independence of the set of vectors in (a). (b) Proceeding in the same manner with the set of vectors in (b) leads to the following equations for the coefﬁcients ci : 4c1 + 2c2 + c3 + 5c4 = 0 2c2 + c3 + c4 = 0 2c1 + 2c4 = 0. The third equation shows that c4 = −c1 , so using this result in the ﬁrst two equations reduces the ﬁrst one to −c1 + 2c2 + c3 = 0 and the second to −c1 + 2c2 + c3 = 0. There is only one equation connecting c1 , c2 , and c3 , and hence also c4 . This means that if c2 and c3 are given arbitrary values, not both of which are zero, the constants c1 and c4 will be determined in terms of them. Thus, a set of constants c1 , c2 , c3 , c4 that are not all zero can be found that satisfy the vector equation, showing that the set of vectors in (b) is linearly dependent. This set of constants is not unique, but this does affect the conclusion that the set of vectors is linearly dependent, because to establish linear dependence it is sufﬁcient that at least one such set of constants can be found.
• Section 2.6 Linear Independence, Basis, and Dimension 99 Example 2.22 has shown one way in which Theorem 2.8 can be implemented for vectors in Rn , but it also illustrates the need for a systematic approach to the solution of the system of equations for the coefﬁcients when n is large. A trivial case of Theorem 2.8 arises when the set of vectors S contains the null vector 0, because then the set of vectors in S is always linearly dependent. This can be seen by assuming that x1 = 0, because then the vector equation in the theorem becomes c1 0 + c2 x2 + · · · + cmxm = 0. This vector equation is satisﬁed if c1 = 0 (arbitrary) and c2 = c3 = · · · = cm = 0, so, as not all of the coefﬁcients are zero, the set of vectors must be linearly dependent. We conclude this introduction to the vector space Rn by deﬁning the span, a basis, and the dimension of a vector space. span of a vector space Span of a vector space Let the set of non-zero vectors x1 , x2 , . . . , xm belonging to a vector space V have the property that every vector in V can be expressed as a linear combination of these vectors. Then the vectors x1 , x2 , . . . , xm are said to span the vector space V. EXAMPLE 2.23 All vectors v in the (x, y)-plane are spanned by the vectors i and j, because any vector v = (v1 , v2 ) can always be written v = v1 i + v2 j. This is an example of vectors spanning the space R2 . EXAMPLE 2.24 The vector space Rn is spanned by the unit n-vectors e1 = (1, 0, 0, 0, . . . , 0), e2 = (0, 1, 0, 0, . . . , 0), . . . , en = (0, 0, 0, 0, . . . , 1). EXAMPLE 2.25 The subspace R3 of the vector space R5 is spanned by the unit vectors e1 = (1, 0, 0, 0, 0), e2 = (0, 1, 0, 0, 0), e3 = (0, 0, 1, 0, 0), because all vectors v = (v1 , v2 , v3 ) in R3 can be written in the form of the linear combination v = v1 e1 + v1 e2 + v3 e3 . basis of a vector space in R n Basis of a vector space Let x1 , x2 , . . . , xn be vectors in Rn . Then the vectors are said to form a basis for the vector space Rn if: (i) The vectors x1 , x2 , . . . , xn are linearly independent. (ii) Every vector in Rn can be expressed as a linear combination of the vectors x1 , x2 , . . . , xn . dimension of a vector space Dimension of a vector space The dimension of a vector space is the number of vectors in its basis.
• 100 Chapter 2 Vectors and Vector Spaces EXAMPLE 2.26 A basis for the space of ordinary vectors in three dimensions is provided by the vectors i, j, and k, so the dimension of the space is 3. EXAMPLE 2.27 A basis for Rn is provided by the n vectors e1 = (1, 0, 0, 0, . . . , 0), e2 = (0, 1, 0, 0, . . . , 0), . . . , en = (0, 0, 0, 0, . . . , 1), so its dimension is n. EXAMPLE 2.28 It was shown in Example 2.20 (b) that the set S of vectors (x1 , x2 , . . . , xn ) with x1 + x2 + · · · + xn = 0 forms a subspace of Rn . The dimension of Rn is n, but the constraint condition x1 + x2 + · · · + xn = 0 implies that only n − 1 of the components x1 , x2 , . . . , xn can be speciﬁed independently, because the constraint itself determines the value of the remaining component. This in turn implies that the basis for the subspace S can only contain n − 1 linearly independent vectors, so S must have dimension n − 1. More information on linear vector spaces can be found in references [2.1] and [2.5] to [2.12]. Summary In this section the concepts of linear dependence and independence were generalized to vectors in R n , and the span of a vector space was deﬁned as a set of vectors in R n with the property that every vector in R n can be expressed as a linear combination of these vectors. Naturally in R n , as in R 3 , a set of vectors spanning the space is not unique. The smallest set of n vectors spanning a vector space is said to form a basis for the vector space, and the dimension of a vector space is the number of vectors in its basis. This corresponds to the fact that the unit vectors i, j, and k form a basis for the ordinary three-dimensional space R 3 , because every vector in this space can be represented as a linear combination of i, j, and k. EXERCISES 2.6 In Exercises 1 through 12 determine if the set of m vectors in three-dimensional space is linearly independent by solving for the scalars c1 , c2 . . . cm in Theorem 2.8. Where appropriate, verify the result by using Theorem 2.3. a = i + 2 j + k, b = i − j + k, c = 2i + k. a = 3i − j + k, b = i + 3k, c = 5i − j + 7k. a = 2i − j + k, b = 3i + j − k, c = 8i + j + 7k. a = 3i + 2k, b = i + j + 2k, c = 11i + 2 j − 2k. a = 4i − j + 3k, b = i + 4j − 2k, c = 3i − j − k. a = i + j − k, b = i − j + k, c = −i + j + k. a = i + 2 j + k, b = i + 3j − k, c = 3i + 10j − 5k. a = 2i + 3j + k, b = i − 3j + 2k, c = i + 15j − 4k. a = 3i − j + 2k, b = i + j + k (m = 2). a = i + j + k, b = i + 2 j + k, c = i + 3j + k, d = i − 4j + k (m = 4). 11. a = i − j + 3k, b = 2i − j + 2k, c = i + k, d = 3i + j + k (m = 4). 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 12. a = i + j, b = j + k, c = i − k. In Exercises 13 through 16, determine if the set of vectors in R 4 is linearly independent by using the method of Example 2.22. 13. 14. 15. 16. (1, 3, −1, 0), (1, 2, 0, 1), (0, 1, 0, −1), (1, 1, 0, 1). (1, −2, 1, 2), (4, −1, 0, 2), (2, 1, −1, 1), (1, 0, 0, −1). (2, 1, 0, 1), (1, 0, 1, 1), (4, 1, 2, −1), (1, 0, 1, −1). (1, 2, 1, 1), (1, −2, 0, −1), (1, 1, 1, 2), (1, −1, 0, 0). In Exercises 17 through 20, ﬁnd a basis and the dimension of the given subspace S. 17. The subspace S of vectors in R5 of the form (x1 , x2 , x3 , x4 , x5 ) with x1 = x2 . 18. The subspace S of vectors in R 4 of the form (x1 , x2 , x3 , x4 ) with x1 = 2x2 . 19. The subspace S of vectors in R5 of the form (x1 , x2 , x3 , x4 , x5 ) with x1 = x2 = 2x3 .
• Section 2.7 20. The subspace S of vectors in R 6 of the form (x1 , x2 , x3 , x4 , x5 , x6 ) with x1 = 2x2 and x3 = −x4 . 21. Let u = cos2 x and v = sin2 x form a basis for a vector space V. Find which of the following can be represented in terms of u and v, and so lie in V. 2.7 Gram–Schmidt Orthogonalization Process 101 (a) 2. (b) sin 2x. (c) 0. (d) cos 2x. (e) 2 + 3x. (f) 3 − 4 cos 2x. 22. Given that r ≤ n, prove that any subset S of r vectors selected from a set of n linearly independent vectors is linearly independent. Gram–Schmidt Orthogonalization Process A set of vectors forming a basis for a vector space is not unique, and having obtained a basis by some means, it is often useful to replace it by an equivalent set of orthogonal vectors. The Gram–Schmidt orthogonalization process accomplishes this by means of a sequence of simple steps that have a convenient geometrical interpretation. We now develop the Gram–Schmidt orthogonalization process for geometrical vectors in R3 , though in Section 4.2 the method will be extended to vectors in Rn to enable orthogonal matrices to be constructed from a set of eigenvectors associated with a symmetric matrix. Let us now show how any basis for R3 , comprising three nonorthogonal linearly independent vectors a1 , a2 , and a3 , can be used to construct an equivalent basis involving three linearly independent orthogonal vectors u1 , u2 , and u3 . It is essential that the vectors a1 , a2 , and a3 be linearly independent, because if not, the vectors u1 , u2 , and u3 generated by the Gram–Schmidt orthogonalization process will be linearly dependent and so cannot form a basis for R3 . The derivation of the method starts by setting u1 = a1 , where the choice of a1 instead of a2 or a3 is arbitrary. The component of a2 in the direction of u1 is u1 · a2 , so the vector component of a2 in this direction is (u1 · a2 )u1 = and this always exists because u1 vector u2 that is normal to u1 , so 2 (u1 · a2 )u1 , u1 2 > 0. Subtracting this vector from a2 gives a u2 = a2 − (u1 · a2 )u1 . u1 2 Similarly, to ﬁnd a vector normal to both u1 and u2 involving a3 , it is necessary to subtract from a3 the components of vector a3 in the direction of u1 and also in the direction of u2 , so that u3 = a3 − (u1 · a3 )u1 (u2 · a3 )u2 − , u1 2 u2 2 and this also always exists, because u1 2 > 0 and u2 2 > 0. If an orthonormal basis is required, it is necessary to normalize the vectors u1 , u2 , and u3 by dividing each by its norm.
• 102 Chapter 2 Vectors and Vector Spaces Rule for the Gram–Schmidt orthogonalization process in R3 A set of nonorthogonal linearly independent vectors a1 , a2 , and a3 that form a basis in R3 can be used to generate an equivalent orthogonal basis involving the vectors, u1 , u2 , and u3 by setting u1 = a1 , u2 = a2 − u3 = a3 − (u1 · a2 )u1 , u1 2 and (u1 · a3 )u1 (u2 · a3 )u2 − . u1 2 u2 2 As already remarked, the choice of a1 as the vector with which to start the orthogonalization process was arbitrary, and the process could equally well have been started by setting u1 = a2 or u1 = a3 . Using a different vector will produce a different set of orthogonal vectors u1 , u2 , and u3 , but any basis for R3 is equivalent to any other basis, so unless there is a practical reason for starting with a particular vector, the choice is immaterial. EXAMPLE 2.29 Given the nonorthogonal basis a1 = i − j − k, a2 = i + j + k, and a3 = −i + 2k, use the Gram–Schmidt orthogonalization process to ﬁnd an equivalent orthogonal basis, and then ﬁnd the corresponding orthonormal basis. Solution Using the preceding rule we start with u1 = i − j − k, and to ﬁnd u2 we need to use the results u1 · a2 = −1 and u1 2 = 3, so that u2 = i + j + k − (−1/3)(i − j − k) = (4/3)i + (2/3)j + (2/3)k. To ﬁnd u3 we need to use the results u1 · a3 = −3, u2 2 = 24/9, so that u1 2 = 3, u2 · a3 = 0, and u3 = −i + 2k − (−3/3)(i − j − k) = −j + k. So the required equivalent orthogonal basis is u1 = i − j − k, u2 = (4/3)i + (2/3)j + 2/3k, and u3 = −j + k. The corresponding orthonormal basis obtained by dividing each of these vectors by its norm (modulus) is √ ˆ u1 = (1/ 3)u1 , ˆ u2 = (1/2) (3/2)u2 and √ ˆ u3 = (1/ 2)u3 . Other accounts of the Gram–Schmidt orthogonalization process are to be found in references [2.1] and [2.7] to [2.12].
• Section 2.7 Summary Gram–Schmidt Orthogonalization Process 103 In this section it is shown how in R 3 the Gram–Schmidt orthogonalization process converts any three nonorthogonal linearly independent vectors a1 , a2 , and a3 into three orthogonal vectors u1 , u2 , and u3 . If necessary, the vectors u1 , u2 , and u3 can then be normalized in the usual manner to form an orthogonal set of unit vectors. EXERCISES 2.7 In Exercises 1 through 6, use the given nonorthogonal basis for vectors in R3 to ﬁnd an equivalent orthogonal basis by means of the Gram–Schmidt orthogonalization process. 1. 2. 3. 4. 5. a1 a1 a1 a1 a1 = i + 2 j + k, a2 = i − j, a3 = 2 j − k. = j + 3k, a2 = i + j − k, a3 = i + 2k. = 2i + j, a2 = 2 j + k, a3 = k. = i + 3k, a2 = i − j + k, a3 = 2i + j. = −i + k, a2 = 2 j + k, a3 = i + j + k. 6. a1 = i + k, a2 = −j + k, a3 = i + j + 2k. In Exercises 7 and 8, ﬁnd two different but equivalent sets of orthogonal vectors by arranging the same three nonorthogonal vectors in the orders indicated. 7. (a) a1 = 3j − k, a2 = i + j, a3 = i + 2k. (b) a1 = i + j, a2 = 3j − k, a3 = i + 2k. 8. (a) a1 = j − k, a2 = i + k, a3 = −i − j + k. (b) a1 = −i − j + k, a2 = i + k, a3 = j − k.
• C H A P T E R 3 Matrices and Systems of Linear Equations M any types of problems that arise in engineering and physics give rise to linear algebraic simultaneous equations. A typical engineering example involves the determination of the forces acting in the struts of a pin-jointed structure like a truss that forms the side of a bridge supporting a load. The determination of the forces in a strut is important in order to know when it is in compression or tension, and to ensure that no truss exceeds its safe load. The analysis of the forces in structures of this type gives rise to a set of linear simultaneous equations that relate the forces in the struts and the external load. It is necessary to know when systems of linear equations are consistent so a solution exists, when they are inconsistent so there is no solution, and whether when a solution exists it is unique or nonunique in the sense that it involves a number of arbitrary parameters. In practical problems all of these mathematical possibilities have physical meaning, and in the case of a truss, the inability to determine the forces acting in a particular strut indicates that it is redundant and so can be removed without compromising the integrity of the structure. A more complicated though very similar situation occurs when linearly vibrating systems are coupled together, as may happen when an active vibration damper is attached to a spring-mounted motor. However, in this case it is a system of simultaneous linear ordinary differential equations determining the amplitudes of the vibrations of the motor and vibration damper that are coupled together. The analysis of this problem, which will be considered later, also gives rise to a linear system of simultaneous algebraic equations. Linear ordinary differential equations are also coupled together when working with linear control systems involving feedback. When such systems are solved by means of the Laplace transform to be described later, linear algebraic systems again arise and the nature of the zeros of the determinant of a certain quantity then determines the stability of the control system. Linear systems of simultaneous algebraic equations also play an essential role in computer graphics, where at the simplest level they are used to transform images by translating, rotating, and stretching them by differing amounts in different directions. Although each equation in a system of linear algebraic equations can be considered separately, such can be discovered about the properties of the physical problem that gave rise to the equations if the system of equations can be studied as a whole. This can be accomplished by using the algebra of matrices that provides a way of analyzing systems 105
• 106 Chapter 3 Matrices and Systems of Linear Equations as a single entity, and it is the purpose of this chapter to introduce and develop this aspect of what is called linear algebra. After deﬁning the notion of a matrix, this chapter develops the fundamental matrix operations of equality, addition, scaling, transposition, and multiplication. Various applications of matrices are given, and the brief review of determinants given in Chapter 1 is developed in greater detail, prior to its use when considering the solution of systems of linear algebraic equations. The concept of elementary row operations is introduced and used to reduce systems of linear algebraic equations to a form that shows whether or not a unique solution exists. When a solution does exist, which is either unique or determined in terms of some of the remaining variables, this reduction enables the solution to be found immediately. The inverse of an n × n matrix is deﬁned and shown only to exist when the determinant of the matrix is nonvanishing, and, ﬁnally, the derivative of a matrix whose elements are functions of a variable is introduced and some of its most important properties are derived. 3.1 Matrices M atrices arise naturally in many different ways, one of the most common being in the study of systems of linear equations such as a11 x1 + a12 x2 + · · · + a1n xn = b1 a21 x1 + a22 x2 + · · · + a2n xn = b2 · · · · · · · · · am1 x1 + am2 x2 + · · · + amn xn = bm. (1) In system (1) the numbers ai j are the coefﬁcients of the equations, the numbers bi are the nonhomogeneous terms, and the number of equations m may equal, exceed, or be less than n, the number of unknowns x1 , x2 , . . . , xn . System (1) is said to be homogeneous when b1 = b2 = · · · = bm = 0, and to be nonhomogeneous when at least one of the bi is nonvanishing. The algebraic properties of the system are determined by the array of coefﬁcients ai j , the nonhomogeneous terms bi and the numbers m and n. From now on, the array of coefﬁcients and the nonhomogeneous terms on the right will be denoted by the single symbols A and b, respectively, where ⎡ a11 ⎢ a21 A=⎢ ⎣ am1 a12 a22 . . . am2 . . . . . . . . ⎤ . a1n . a2n ⎥ ⎥ ⎦ . . amn ⎡ and ⎤ b1 ⎢b ⎥ b = ⎢ 2⎥. ⎣ · ⎦ bm (2) The array of mn coefﬁcients ai j in m rows and n columns that form A is an example of an m × n matrix, where m × n is read “m by n.” The array b is an example of an m × 1 matrix, and it is called an m element column vector. We will use the convention that an array such as A, with two or more rows and two or more columns, will be denoted by a boldface capital letter. An array with a single row, or a column such as b, will be denoted by a boldface lowercase letter. Each entry in a matrix is called an element of the matrix, and entries may be numbers, functions, or even matrices themselves. The sufﬁxes associated with an element show its position in the matrix, because the ﬁrst sufﬁx is the row number
• Section 3.1 Matrices 107 and the second is the column number. Because of this convention, the element a35 in a matrix belongs to the third row and the ﬁfth column of the matrix. So, for example, if A is a 3 × 2 matrix and its general element ai j = i + 3 j, then as i may only take the values 1, 2, and 3, and j the values 1 and 2, it follows that ⎡ ⎤ 4 7 A = ⎣5 8⎦ . 6 9 In a column vector c with elements c11 , c21 , c31 , . . . , cm1 , as only a single column is involved, it is usual to vary the sufﬁx convention by omitting the second sufﬁx and instead numbering the elements sequentially as c1 , c2 , c3 , . . . , cm, so that ⎡ ⎤ c1 ⎢ c2 ⎥ ⎢ ⎥ ⎢.⎥ c = ⎢ . ⎥. ⎢.⎥ ⎢.⎥ ⎣.⎦ . cm Later it will be necessary to introduce row vectors, and in an s element row vector r with elements r11 , r12 , r13 , . . . , r1s , the notation is again simpliﬁed, this time by omitting the ﬁrst sufﬁx and numbering the elements sequentially as r1 , r2 , . . . , rs , so r = [r1 , r2 , . . . , rs ]. (3) In general, row and column vectors will be denoted by boldface lowercase letters such as a, b, c, and x, and matrices such as the coefﬁcient matrix in (2) will be denoted by boldface capital letters such as A, B, P, and Q. A different convention that is also used to denote a matrix involves enclosing the array between curved brackets instead of the square ones used here. Thus, 1 −3 5 2 9 4 1 −3 and 5 2 9 4 (4) denote the same 2 × 3 matrix. A matrix should never be enclosed between two vertical rules in order to avoid confusion with the determinant notation because 3 5 −4 2 is a matrix, but 3 5 −4 = 26 is a determinant. 2 Deﬁnition of a matrix An m × n matrix is an array of mn entries, called elements, arranged in m rows and n columns. If a matrix is denoted by A, then the element in its ith row and jth column is denoted by ai j and ⎡ a11 ⎢ a21 A = [ai j ] = ⎢ ⎣ . am1 a12 a22 . . . am2 . . . . . . . . . . . . . ⎤ a1n a2n ⎥ ⎥. . ⎦ amn
• 108 Chapter 3 Matrices and Systems of Linear Equations some typical matrices The following are typical examples of matrices: A 1 × 1 matrix: [3]; a single element may be regarded as a matrix. ⎡ ⎤ 1 3 5 0 A 3 × 4 matrix: ⎣2 −1 4 3⎦ ; a matrix with real numbers as elements. 7 2 1 6 A 2 × 2 matrix: 1+i 3 + 4i 1−i ; 2 − 3i a matrix with complex numbers as elements. cosθ sin θ ; a matrix with functions as elements. −sinθ cos θ A 1 × 3 matrix: [2, −5, 7]; a three-element row vector. 11 A 2 × 1 matrix: ; a two-element column vector. 9 A 2 × 2 matrix: A square matrix is a matrix in which the number of rows m equals the number of columns n. A typical square matrix is the 3 × 3 matrix ⎡ ⎤ 2 0 5 ⎣1 −3 4⎦ . 3 1 7 Deﬁnition of the equality of matrices Let A = [ai j ] be an m × n matrix and B = [bi j ] be a p × q matrix. Then matrices A and B will be equal, written A = B, if, and only if: (a) A and B have the same number of rows, and the same number of columns, so that m = p and n = q, and (b) ai j = bi j , for each i and j. Equality of matrices means that if A and B are equal, then each is an identical copy of the other. ⎡ EXAMPLE 3.1 2 3 a If A = , b 6 1 2 B= −3 3 6 9 , 1 and 2 C = ⎣−3 0 3 6 0 ⎤ 9 1⎦ , 0 then A = B if and only if a = 9 and b = −3, but A = C and B = C. Deﬁnition of matrix addition The addition of matrices A and B is only deﬁned if the matrices each have the same number of rows and the same number of columns. Let A = [ai j ] and B = [bi j ] be m × n matrices. Then the the m × n matrix formed by adding A and B, called the sum of A and B and written A + B, is the matrix whose element in the ith row and jth column is ai j + bi j , for each i and j, so that A + B = [ai j + bi j ]. Matrices that can be added are said to be conformable for addition.
• Section 3.1 Matrices 109 It is an immediate consequence of this deﬁnition that A + B = B + A, so matrix addition is commutative. Deﬁnition of the transpose of a matrix Let A = [ai j ] be an m × n matrix. Then the transpose of A, denoted by AT (and sometimes by A ), is the matrix obtained from A by interchanging rows and columns to produce the n × m matrix AT = [ai j ]T = [a ji ]. The deﬁnition of the transpose of a matrix means that the ﬁrst row of A becomes the ﬁrst column of AT , the second row of A becomes the second column of AT , . . . . , and, ﬁnally, the mth row of A becomes the mth column of AT . In particular, if A is a row vector, then its transpose is a column vector, and conversely. EXAMPLE 3.2 ⎡ 2 If A = 1 6 0 3 4 2 then AT = ⎣6 3 ⎤ ⎡ ⎤ 1 7 0⎦ , and if A = [7, 3, 2] then AT = ⎣3⎦ . 4 2 Deﬁnition of scaling a matrix by a number Let A = [ai j ] be an m × n matrix and λ be a scalar (real or complex). Then if A is scaled by λ, written λA, every element of A is multiplied by λ to yield the m × n matrix λA = [λai j ]. EXAMPLE 3.3 If λ = 2 and A= 2 1 −6 4 7 , 15 then λA = 2A = 4 2 −12 8 14 , 30 and if λ = −1, then λA = (−1)A = −A = difference (subtraction) of matrices −2 −1 6 −7 . −4 −15 Taken together, the deﬁnitions of the addition and scaling of matrices show that if the matrices A and B are conformable for addition, then the subtraction of matrix B from A, called their difference and written A − B, is to be interpreted as A − B = A + (−1)B. EXAMPLE 3.4 If A = 2 5 8 1 −4 5 and B= 2 2 4 −4 5 , 1 then A − B = 0 −1 1 0 3 . 4
• 110 Chapter 3 Matrices and Systems of Linear Equations negative of a matrix The null or zero matrix 0 is deﬁned as any matrix in which every element is zero. The introduction of the null matrix makes it appropriate to call −A the negative of A, because A − A = A + (−1)A = 0. When working with the null matrix the number of its rows and columns is never stated, because these are always taken to be whatever is appropriate for the equation that is involved. Deﬁnition of the product of a row and a column vector Let a = [a1 , a2 , . . . , ar ] be an r-element row vector, and b = [b1 , b2 , . . . , br ]T be an r -element column vector. Then the product ab, in this order, is the number deﬁned as ab = a1 b1 + a2 b2 · · · + ar br . Notice that this product is only deﬁned when the number of elements in the row vector A equals the number of elements in the column vector B. EXAMPLE 3.5 Find the product ab given that a = [1, 4, −3, 10] and b = [2, 1, 4, −2]T . Solution ab = [1, 4, −3, 10] ⎡ ⎤ 2 ⎢ 1⎥ ⎢ ⎥ ⎣ 4⎦ −2 = (1) · (2) + (4) · (1) + (−3) · (4) + (10) · (−2) = −26. Deﬁnition of the product of matrices Let A = [ai j ] be an m × n matrix in which the r th row is the row vector ar , and let B = [bi j ] be a p × q matrix in which the sth column is the column vector bs . The matrix product AB, in this order, is only deﬁned if the number of columns in A equals the number of rows in B, so that n = p. The product is then an m × q matrix with the element in its r th row and sth column deﬁned as ar bs . Thus, if cr s = ar bs , as cr s = ar 1 b1s + ar 2 b2s + · · · + ar n bns , AB = [cr s ] = [ar 1 b1s + ar 2 b2s + · · · + ar n bns ], for 1 ≤ r ≤ m and 1 ≤ s ≤ q, or, equivalently, ⎡ a1 b1 a1 b2 a1 b3 ⎢ a2 b1 a2 b2 a2 b3 AB = ⎢ . . . . . . . . . . ⎣ amb1 amb2 amb3 ⎤ . . . a1 bq . . . a2 bq ⎥ ⎥. . . . . . ⎦ . . . ambq
• Section 3.1 Matrices 111 When a matrix product AB is deﬁned, the matrices are said to be conformable for matrix multiplication in the given order. in general, matrix multiplication is noncommutative EXAMPLE 3.6 It is important to notice that when the product AB is deﬁned, the product BA may or may not be deﬁned, and even when BA is deﬁned, in general AB = BA. This situation is recognized by saying that, in general, matrix multiplication is noncommutative. Provided matrices A and B are conformable for multiplication, the above rule for ﬁnding their product AB, in this order, is best remembered by saying that the element in the ith row and jth column of AB is the product of the ith row of A and the jth column of B. Form the matrix products AB and BA given that A= 1 2 4 5 −3 4 and ⎡ 4 B = ⎣2 0 ⎤ 1 6⎦ . 3 Solution Let us calculate the matrix product AB. The ﬁrst and second row vectors of A are a1 = [1, 4, −3] and a2 = [2, 5, 4], and the ﬁrst and second column vectors of B are b1 = [4, 2, 0]T and b2 = [1, 6, 3]T . As A is a 2 × 3 matrix and B is a 3 × 2 matrix, the product AB is conformable for multiplication and yields a 2 × 2 matrix AB = = (1 · 4 + 4 · 2 + (−3) · 0) (1 · 1 + 4 · 6 + (−3) · 3) a1 b1 a1 b2 = a2 b1 a2 b2 (2 · 4 + 5 · 2 + 4 · 0) (2 · 1 + 5 · 6 + 4 · 3) 12 18 16 . 44 The product BA is also conformable for multiplication and yields a 3 × 3 matrix, where now we must use the row vectors of B that with an obvious change of notation are b1 = [4, 1], b2 = [2, 6], b3 = [0, 3], and the column vectors of A that are a1 = [1, 2]T , a2 = [4, 5]T , and a3 = [−3, 4]T , so that ⎡ ⎤ ⎡ ⎤ b1 a1 b1 a2 b1 a3 (4 · 1 + 1 · 2) (4 · 4 + 1 · 5) (4 · (−3) + 1 · 4) BA = ⎣b2 a1 b2 a2 b2 a3 ⎦ = ⎣(2 · 1 + 6 · 2) (2 · 4 + 6 · 5) (2 · (−3) + 6 · 4)⎦ (0 · 1 + 3 · 2) (0 · 4 + 3 · 5) (0 · (−3) + 3 · 4) b3 a1 b3 a2 b3 a3 ⎡ ⎤ 6 21 −8 = ⎣14 38 18⎦ . 6 15 12 This is an example of two matrices A and B that can be combined to form the products AB and BA, but AB = BA. EXAMPLE 3.7 Write the system of simultaneous equations (1) in matrix form. Solution Using the matrices A and b in (2) and setting x = [x1 , x2 , . . . , xn ]T allows the system of equations (1) to be written Ax = b. Here, as is usual, to save space the transpose operation has been used to display the elements of column vector x in the more convenient form x = [x1 , x2 , . . . , xn ]T .
• 112 Chapter 3 Matrices and Systems of Linear Equations The deﬁnitions of matrix multiplication and addition lead almost immediately to the results of the following theorem, so the proof is left as an exercise. THEOREM 3.1 some important properties of matrices Associative and distributive properties of matrices Let A, B, and C be matrices that are conformable for the operations that follow, and let λ be a scalar. Then: If AB and BA are both deﬁned, in general AB = BA; (ii) A(BC) = (AB)C = ABC; (iii) (λA)B = A(λB) = λAB; (iv) A(B + C) = AB + AC; (v) THEOREM 3.2 (i) (A + B)C = AC + BC. Transposition of a product If matrices A and B are conformable to form the product AB, then (AB)T = BT AT . Proof The products (AB)T and BT AT are both deﬁned, and each is an m × q matrix. Introduce the notation [M]i j to denote the element of M in row i and column j. Then from the transpose operation and the rule for matrix multiplication, for all permissible i, j, n T [AB]i, j = [AB] j,i = (product of jth row of A with ith column of B) = a jkbki . k=1 Similarly, [BT AT ]i, j = (product of ith row of BT with jth column of AT ) n = (product of ith column of B with jth row of A) = a jkbki . k=1 So [AB]iTj = [BT AT ]i j for all permissible i, j, showing that (AB)T = BT AT . raising a matrix to a power It is an immediate consequence of Theorem 3.1(ii) that if A is a square matrix and m and n are positive integers, A · A · A · . . . · A = An and Am · An = Am+n . n times A useful result from the deﬁnition of addition is (A + B)T = AT + BT , while from Theorem 3.2 (ABC)T = CT BT AT .
• Section 3.1 pre- and postmultiplication of matrices Matrices 113 As the order in which a sequence of permissible matrix multiplications is performed inﬂuences the product, it is necessary to introduce a form of words that makes the order unambiguous. This is accomplished by saying that if matrix A multiplies matrix B from the left, as in AB, then B is premultiplied by A, while if A multiplies B from the right, as in BA, then B is postmultiplied by A. Equivalently, in the product AB, we can say that A is postmultiplied by B, or that B is premultiplied by A. Important Differences Between Ordinary Algebraic Equations and Matrix Equations (i) The algebraic equation ab = 0, in which a and b are numbers, not both of which are zero, implies that either a = 0 or b = 0. However, if the matrix product AB is deﬁned and is such that AB = 0, then it does not necessarily follow that either A = 0 or B = 0. (ii) The algebraic equation ab = ac in which a, b, and c are numbers, with a = 0, allows cancellation of the factor a leading to the conclusion that b = c. However, if the matrix products AB and AC are deﬁned and are such that AB = AC, this does not necessarily imply that B = C, so that cancellation of matrix factors is not permissible. The validity of these two statements can be seen by considering the following simple examples. EXAMPLE 3.8 Consider matrices A and B given by A= 1 3 4 12 B= and 4 −8 . −1 2 Then AB = 0, but neither A nor B is a null matrix. EXAMPLE 3.9 Consider the matrices A, B, and C given by A= 1 2 −1 , −2 B= 2 2 4 3 6 , 4 and C= 3 3 6 5 8 . 6 Then AB = AC = 0 0 1 2 2 , 4 but B = C. leading diagonal and trace of a matrix In a square n × n matrix A = [ai j ], the elements on a line extending from top left to bottom right is called the leading diagonal of A, and it contains the n elements a11 , a22 , . . . , ann . So the leading diagonal of the 2 × 2 matrix A in Example 3.8 contains the elements 1 and 12, and the leading diagonal of the 2 × 2 matrix B contains the elements 4 and 2. Symbolically, the leading diagonal of the n × n matrix A = [ai j ] shown below comprises the n elements in the shaded diagonal strip, though these
• 114 Chapter 3 Matrices and Systems of Linear Equations n elements do not form an n element vector. ⎡ a11 ⎢a21 ⎢ ⎢a A = ⎢ 31 ⎢ · ⎢ ⎣ · an1 a12 a22 a32 · · an2 a13 a23 a33 · · an3 · · · · · · · · · · · · · · · · · · ⎤ a1n a2n ⎥ ⎥ a3n ⎥ ⎥. · ⎥ ⎥ · ⎦ ann The trace of a square matrix A, written tr(A), is the sum of the terms on its leading diagonal, so for the foregoing matrix tr(A) = a11 + a22 + · · · + ann . Square matrices in which all elements away from the leading diagonal are zero, but not every element on the leading diagonal is zero, are called diagonal matrices. Of the class of diagonal matrices, the most important are the unit matrices, also called identity matrices, in which every element on the leading diagonal is the number 1. These n × n matrices are usually all denoted by the symbol I, with the value of n being understood to be appropriate to the context in which they arise. If, however, the value of n needs to be indicated, the symbol I can be replaced by In . It is easily seen from the deﬁnition of matrix multiplication that for any m × n matrix A it follows that ImA = AIn or, more simply, IA = AI = A, and that when A is an n × n matrix, IA = AI = A. identity or unit matrix When working with matrices, the unit matrix I plays the part of the unit real number, and it is because of this that I is called either the unit or the identity matrix. An example of a 4 × 4 diagonal matrix is ⎡ ⎤ 3 0 0 0 ⎢0 2 0 0⎥ ⎥ D=⎢ ⎣0 0 0 0⎦ , with the trace given by tr(D) = 3 + 2 + 0 + 1 = 6. 0 0 0 1 The 3 × 3 unit matrix is the diagonal matrix ⎡ ⎤ 1 0 0 I = I3 = ⎣0 1 0⎦ , and its trace is tr(I) = 1 + 1 + 1 = 3. 0 0 1 Various special square n × n matrices occur sufﬁciently frequently for them to be given names, and some of the most important of these are the following: some special matrices Upper triangular matrices are matrices in which all elements below the leading diagonal are zero. A typical example of a 4 × 4 upper triangular matrix is ⎡ ⎤ 1 3 −1 0 ⎢0 2 −6 1⎥ ⎥ U=⎢ ⎣0 0 −3 2⎦ . 0 0 0 4
• Section 3.1 Matrices 115 Lower triangular matrices are matrices in which all elements above the leading diagonal are zero. A typical example of a 4 × 4 lower triangular matrix is ⎡ ⎤ 2 0 0 0 ⎢ 1 0 0 0⎥ ⎥ L=⎢ ⎣ 3 −2 5 0⎦ . −2 4 7 3 Symmetric matrices A = [ai j ] are matrices in which ai j = a ji for all i and j. If A is symmetric, then A = AT . A typical example of a symmetric matrix is ⎡ ⎤ 1 5 −3 2⎦ . M=⎣ 5 4 −3 2 7 Skew-symmetric matrices A = [a ji ] are matrices in which ai j = −a ji for all i and j. From the deﬁnition of an n × n skew-symmetric matrix we have aii = −aii for i = 1, 2, . . . , n, so the elements on the leading diagonal must all be zero. An equivalent deﬁnition of a skew-symmetric matrix A is that AT = −A. A typical example of a skew-symmetric matrix is ⎡ ⎤ 0 3 −5 6 ⎢−3 0 2 −4⎥ ⎥. S=⎢ ⎣ 5 −2 0 −1⎦ −6 4 1 0 An orthogonal matrix Q is a matrix such that QQT = QT Q = I. A typical orthogonal matrix is ⎡ 1 1 ⎤ −√ √ ⎢ 2 2⎥ ⎥. Q=⎢ ⎣ 1 1 ⎦ √ √ 2 2 More special than the preceding real valued matrices are matrices A = [ai j ] in which the elements ai j are complex numbers. We will write A to denote the matrix obtained from A by replacing each of its elements ai j by its complex conjugate a i j , so that A = [a i j ]. Then matrix A is said to be Hermitian if T A = A. A typical Hermitian matrix is 7 1 + 4i A= 1 − 4i . 3 The matrix A is said to be skew-Hermitian if T A = −A. A typical skew-Hermitian matrix is A= 3i −5 + 2i 5 + 2i . 0
• 116 Chapter 3 Matrices and Systems of Linear Equations block matrices More will be said later about some of these special square matrices and the ways in which they arise. Finally, we mention that every m × n matrix A can be represented differently as a block matrix, in which each element is itself a matrix. This is accomplished by partitioning the matrix A into submatrices by considering horizontal and vertical lines to be drawn through A between some of its rows and columns, and then identifying each group of elements so deﬁned as a submatrix of A. Clearly there is more than one way in which a matrix can be partitioned. As an example of matrix partitioning, let us consider the 3 × 3 matrix ⎤ ⎡ 3 −1 2 2 0⎦ . A = ⎣1 2 1 0 One way in which this matrix can be partitioned is as follows: ⎡ ⎤ 3 −1 2 ⎢ ⎥ A = ⎣1 2 0 ⎦. 2 0 1 This can now be written in block matrix form as A= A12 , A22 A11 A21 where the submatrices are A11 = [3 −1], A12 = [2], A21 = 1 2 2 , 1 and A22 = 0 . 0 The addition and scaling of block matrices follow the same rules as those for ordinary matrices, but care must be exercised when multiplying block matrices. To see how multiplication of block matrices can be performed, let us consider the product of matrix A above and the 3 × 4 matrix ⎡ ⎤ 1 2 2 1 ⎥ ⎢ B = ⎣3 1 1 0 ⎦ , 2 3 0 2 which are conformable for the product AB that is itself a 3 × 4 matrix. If B is partitioned as indicated by the dashed lines, it can be written as B= B11 B21 B12 , B22 where the submatrices are B11 = 1 , 3 B12 = 2 1 2 1 1 , 0 B21 = [2], and B22 = [3, 0, 2]. Consideration of the deﬁnition of the product of matrices shows that we may now write the matrix product AB in the condensed form AB = A11 B11 + A12 B21 A21 B11 + A22 B21 A11 B12 + A12 B22 , A21 B12 + A22 B22
• Section 3.1 Matrices 117 where the partitioned matrices have been multiplied as though their elements were ordinary elements. This result follows because of correct partitioning, as each product of submatrices is conformable for multiplication and all of the matrix sums are conformable for addition. In this illustration, routine calculations show that A11 B11 + A12 B21 = [4], A11 B12 + A12 B22 = [11, 5, 7], 7 4 A21 B11 + A22 B21 = , and A21 B12 + A22 B22 = 5 5 4 5 1 , 2 so ⎡ [4] AB = ⎣ 7 5 ⎤ ⎡ [11, 5, 7] 4 ⎦ = ⎣7 4 4 1 5 5 5 2 11 4 5 ⎤ 5 7 4 1⎦. 5 2 This result is easily conﬁrmed by direct matrix multiplication. The calculation of a matrix product AB using partitioned matrices applies in general, provided the partitioning of A and B is performed in such a way that the products of all the submatrices involved are deﬁned. Matrix partitioning has various uses, one of which arises in machine computation when a very large ﬁxed matrix A needs to be multiplied by a sequence of very large matrices P, Q, R, . . . . If it happens that A can be partitioned in such a way that some of its submatrices are null matrices, the computational time involved can be drastically reduced, because the product of a submatrix and a null matrix is a null matrix, and so need not be computed. The economy follows from the fact that in machine computation multiplications occupy most of the time, so any reduction in their number produces a signiﬁcant reduction in the time taken to evaluate a matrix product, and the result is even more signiﬁcant when the same partitioned matrix with null blocks is involved in a sequence of calculations. Block matrices are also of signiﬁcance when describing complex oscillation problems governed by a large system of simultaneous ordinary differential equations. Their importance arises from the fact that the matrix of coefﬁcients of the equations often contains many null submatrices, and when this happens the structure of the nonnull blocks provides useful information about the fundamental modes of oscillation that are possible, and also about their interconnections. For other accounts of elementary matrices see the appropriate chapters in references [2.1], [2.5], and [2.7] to [2.12]. Summary This section deﬁned m × n matrices, and the special cases of column and row vectors, and it introduced the fundamental algebraic operations of equality, addition, scaling, transposition, and multiplication of matrices. It was shown that, in general, matrix multiplication is not commutative, so that even when both of the products AB and BA are deﬁned, it is usually the case that AB = BA. Pre- and postmultiplication of matrices was deﬁned, and some important special types of matrices were introduced, such as the unit matrix I. It was also shown how a matrix A can be subdivided into blocks, and that a matrix operation performed on A can be interpreted in terms of matrix operations performed on block matrices obtained by subdivision of A.
• 118 Chapter 3 Matrices and Systems of Linear Equations EXERCISES 3.1 In Exercises 1 through 4 ﬁnd the values of the constants a, b, and c in order that A = B. −a 1 4 2 b −1 ⎤ ⎡ 1 4 3 B = ⎣2 2 4⎦. b 1 0 ⎤ ⎡ 2 a2 a 1 a 1 2⎦ , B = ⎣ 3 3. A = ⎣ b 1+a 2+c 6 2 ⎡ ⎤ ⎡ 1 3+a 2 1 a 5 ⎦, B = ⎣4 4. A = ⎣1 + b b2 b2 1 a2 2 a 2 ⎡ 1 2. A = ⎣a 9 ⎡ 1. A = 1 c , 3 a ⎤ 4 3 2 4⎦ , 1 c B= a 1 4 ⎤ 1 2⎦. 6 ⎤ −1 c a 5 ⎦. 1 a2 In Exercises 9 through 12 form the sum λA + μB. ⎤ ⎡ 1 4 2 9. λ = 1, μ = 3, A = ⎣2 1 4⎦, 3 2 2 ⎤ ⎡ 2 3 −1 4⎦. B = ⎣1 2 1 0 3 μ = 2, A= 1 2 4 4 1 , 0 B= 2 0 1 2 1 . 4 11. λ = 4, μ = −2, . In Exercises 5 through 8 ﬁnd A + B and A − B. ⎤ ⎤ ⎡ ⎡ 2 0 1 −2 1 4 3 6 1⎦. 1 0 2⎦ , B = ⎣1 1 −3 5. A = ⎣2 0 1 1 0 1 −1 0 1 ⎤ ⎤ ⎡ ⎡ 1 7 6 2 −1 6 6. A = ⎣ 0 2 4⎦ , B = ⎣1 −2 3⎦. −1 0 1 2 1 2 ⎤ ⎤ ⎡ ⎡ 0 2 3 1 2 4 ⎢3 −1 1⎥ ⎢3 1 0⎥ ⎥. ⎥ ⎢ 7. A = ⎢ ⎣1 1 0⎦ , B = ⎣0 1 1⎦ 1 3 2 2 2 4 ⎤ ⎤ ⎡ ⎡ 1 0 0 0 1 4 3 6 ⎢ 3 1 0 0⎥ ⎢0 2 1 4⎥ ⎥ ⎥ ⎢ 8. A = ⎢ ⎣0 0 3 1⎦ , B = ⎣1 2 4 0⎦. 1 1 1 3 0 0 0 2 10. λ = −1, ⎡ 12. λ = 3, μ = −3, 4 A = ⎣2 1 ⎡ 6 B = ⎣2 1 ⎡ 3 A = ⎣2 3 ⎡ 3 B = ⎣4 2 3 1 2 1 4 1 1 2 6 2 2 1 ⎤ 1 1⎦, 1 ⎤ 0 2⎦. 2 ⎤ 4 1⎦, 2 ⎤ 1 3⎦. 1 In Exercises 13 through 16 ﬁnd the product AB. 13. A = [1, 14. A = [2, 15. A = [1, B = [2, 16. A = [1, B = [−1, 4, 3, 4, 2, 3, 2, −2, 3], B = [2, 1, −1, 2]T . 1, 4], B = [3, 1, 1, 3]T . 3, 7, 5], −1, −1, 3]T . −1, 2, 0], 13, 4, 1]T . In Exercises 17 through 22 ﬁnd the product AB and, when it exists, the product BA. 17. A = 1 4 , 2 0 B= 2 1 1 4 3 . 1 18. A = [1, 4, 6, −7], B = [2, 3, −2, 3]T . ⎤ ⎤ ⎡ ⎡ 1 0 0 3 1 4 19. A = ⎣ 0 1 0 ⎦ , B = ⎣ 2 1 −5 ⎦ . 0 0 1 7 2 0 ⎤ ⎤ ⎡ ⎡ 2 0 0 9 −1 4 6 −2 ⎦ . 20. A = ⎣ 0 −3 0 ⎦ , B = ⎣ 1 0 0 5 2 2 3 ⎤ ⎡ ⎤ ⎡ 2 3 1 5 2 3 ⎢4 1 2⎥ ⎥, B = ⎣2 0 4⎦. ⎢ 21. A = ⎣ 2 2 6⎦ 1 4 7 1 5 2 ⎤ ⎤ ⎡ ⎡ 3 1 1 2 1 0 ⎢ 4 ⎢2 1 1 4⎥ 2⎥ ⎥ ⎥ ⎢ 22. A = ⎢ ⎣ 1 0 2 1 ⎦ , B = ⎣ 6 −2 ⎦ . −1 4 1 1 2 1 23. Given ⎤ ⎤ ⎡ ⎡ 2 5 −3 4 2 1 4⎦ and B = ⎣2 5 6⎦ , A=⎣ 5 1 −3 4 6 1 6 3 show that (AB)T = BA.
• Section 3.1 In Exercises 24 through 28 write the given systems of equations in the matrix form Ax = b, where A is the coefﬁcient matrix, x is the vector of unknowns, and b is the nonhomogeneous vector term. 26. 5x + 3y − 6z = 14 6x − 5y + 11z = 20 x − 4y + 3z = 2 9x − 3y + 2z = 35. 24. 3x + 5y − 6z = 7 x − 7y + 4z = −3 2x + 4y − 5z = 4. 25. 4u + 5v − w + 7z = 25 3u + 2v + 3z = 6 v + 6w − 7z = 0. 27. 3x + 4y − 2z = λx 2x − 7y + 6z = λy 8x + 3y + 5z = λz. 28. 2x + 3y + 6z = λ(3x + 2y + 3z) 3x − 4y + 2z = λ(x − 5y + 2z) 4x + 9y + 2z = λ(x − 2y + 4z). 29. If ⎡ 1 A = ⎣1 0 ⎤ 6 0⎦ , 3 ⎤ ⎡ 2 0 1 2 3⎦ , B = ⎣4 0 −1 1 ⎡ ⎤ x1 x2 x3 X = ⎣ y1 y2 y3 ⎦ , z1 z2 z3 3 2 1 and solve for X given that Matrices 119 33. Prove the second result in Theorem 3.1 that A(BC) = (AB)C = ABC. 34. Prove the third result in Theorem 3.1 that (λA)B = A(λB) = λAB. 35. Prove the fourth result in Theorem 3.1 that A(B + C) = AB + AC. In Exercises 36 through 39 verify that (AB)T = BT AT . ⎤ ⎤ ⎡ ⎡ 2 1 3 3 1 4 36. A = ⎣2 1 2⎦ , B = ⎣1 2 5⎦ . 0 2 1 4 2 3 ⎤ ⎡ ⎤ ⎡ 1 4 3 2 1 4 3 ⎢ 2 1 5⎥ ⎥ 2 1⎦ , B = ⎢ 37. A = ⎣1 6 ⎣−1 3 2⎦ . 1 1 −2 4 1 7 3 ⎤ ⎤ ⎡ ⎡ 3 1 −5 1 4 2 4⎦ . 38. A = ⎣7 3 −1⎦ , B = ⎣1 3 2 0 8 0 2 5 ⎤ ⎡ ⎤ ⎡ 1 2 1 1 4 6 2 ⎢−2 1 4⎥ ⎥ 39. A = ⎣2 1 4 1⎦ , B = ⎢ ⎣ 2 2 5⎦ . 3 0 0 2 1 1 1 40. Verify that (ABC)T = CT BT AT given that 3X + A = A B − X + 3B. T 30. If ⎡ 2 A = ⎣1 3 1 2 0 ⎤ ⎤ ⎡ 1 4 1 4 1⎦ , B = ⎣2 1 2⎦ , 1 1 2 2 ⎡ ⎤ x1 x2 x3 X = ⎣ y1 y2 y3 ⎦ , z1 z2 z3 solve for X given that 2ABT + X − 2I = 3X + 4B − 2A. 31. Given that ⎡ 3 A = ⎣2 2 2 2 0 ⎤ 2 0⎦ , 4 show that A3 − 9A2 + 18A = 0. 32. Given that ⎡ 0 A = ⎣0 2 1 0 1 ⎤ 0 1⎦ , −2 show that A3 + 2A2 − A − 2I = 0. A= and 1 5 , 3 1 B= 3 −2 , 4 5 C= and −2 5 41. Prove that if D is the n × n diagonal matrix ⎡ ⎤ k1 0 0 · · · 0 ⎢ 0 k2 0 · · · 0 ⎥ ⎢ ⎥ D = ⎢ 0 0 k3 · · · 0 ⎥ , then ⎢ ⎥ ⎣. . . . . . . . . . . . ⎦ 0 0 0 · · · kn ⎡ m ⎤ k1 0 0 · · · 0 m ⎢ 0 k2 0 · · · 0⎥ ⎢ ⎥ m 0 k3 · · · 0 ⎥ , Dm = ⎢ 0 ⎢ ⎥ ⎣. . . . . . . . . . . . . ⎦ m 0 0 0 · · · kn where m is a positive integer. 42. Find A2 , A3 , and A4 , given that ⎤ ⎡ 1 2 7 6⎦ . A = ⎣2 5 1 0 −1 43. Find A2 , A4 , and A6 , given that 1/2 A= √ ( 3)/2 √ −( 3)/2 1/2 . 3 . 7
• 120 Chapter 3 Matrices and Systems of Linear Equations 44. Use the matrix A in Exercise 42 to ﬁnd A3 , A5 , and A7 . 45. A square matrix A such that A2 = A is said to be idempotent. Find the three idempotent matrices of the form A= 1 q p . r 46. A square matrix A such that for some positive integer n has the property that An−1 = 0, but An = 0 is said to be nilpotent of index n (n ≥ 2). Show that the matrix ⎤ ⎡ 0 0 0 0 0⎦ A = ⎣4 1 −1 0 is nilpotent and ﬁnd its index. 47. A quadratic form in the variables x1 , x2 , x3 , . . . , xn is 2 2 an expression of the form ax1 + bx1 x2 + cx2 + dx1 x3 + 2 · · · + f xn−1 xn + gxn in which some of the coefﬁcients a, b, c, d, . . . , f, g may be zero. Explain why xT Ax is a quadratic form and ﬁnd the quadratic form for which ⎤ ⎡ ⎡ ⎤ x1 3 4 0 3 ⎢x2 ⎥ ⎢4 2 2 6⎥ ⎥ ⎢ ⎥ A=⎢ ⎣0 2 5 1⎦ and x = ⎣x3 ⎦ . 3 6 1 7 x4 48. Find the quadratic form xT Ax when ⎤ ⎡ ⎤ ⎡ x1 4 1 3 6 ⎢x2 ⎥ ⎢2 3 5 4⎥ ⎥ ⎢ ⎥ A=⎢ ⎣1 4 1 2⎦ and x = ⎣x3 ⎦ . 2 0 4 1 x4 49. Explain why the matrix A in the general expression for 3.2 a quadratic form xT Ax can always be written as a symmetric matrix. In Exercises 50 through 52 ﬁnd the symmetric matrix A for the given quadratic form when written xT Ax, with x = [x, y, z]T . 50. 51. 52. 53. x 2 + 3xy − 4y2 + 4xz + 6yz − z2 . 2x 2 + 4xy + 6y2 + 7xz − 9z2 . 7x 2 + 7xy − 5y2 + 4xz + 2yz − 9z2 . A square matrix P is called a stochastic matrix if all its elements are nonnegative and the sum of the elements in each row is 1. Thus, the matrix ⎡ ⎤ p11 p12 · · · p1n ⎢p p22 · · · p2n ⎥ ⎥ P = ⎢ 21 ⎣. . . . . . . . . . . ⎦ pn1 pn2 · · · pnn will be a stochastic matrix if pi j ≥ 0 for 0 ≤ i ≤ n, 0 ≤ j ≤ n, and n pi j = 1 for i = 1, 2, . . . , n. j=1 Let the n element column vector E = [1, 1, 1, . . . , 1]T . By considering the matrix product PE, and using mathematical induction, prove that Pm is a stochastic matrix for all positive integral values of m. 54. Construct a 3 × 3 stochastic matrix P. Find P2 and P3 , and by showing that all elements of these matrices are nonnegative and that all their row-sums are 1, verify the result of Exercise 53 that each of these matrices is a stochastic matrix. Some Problems That Give Rise to Matrices (a) Electric Circuits with Resistors and Applied Voltages A simple electric circuit involving ﬁve resistors and three applied voltages is shown in Fig. 3.1. The directions of the currents i 1 , i 2 , and i 3 ﬂowing in each branch of the circuit are shown by arrows. The currents themselves can be determined by an application of Ohm’s law and the Kirchhoff laws that can be stated as follows: (a) Voltage = current × resistance (Ohm’s law); (b) The algebraic sum of the potential drops around each closed circuit is zero (Kirchhoff’s second law); (c) The current entering each junction must equal the algebraic sum of the currents leaving it (Kirchhoff’s ﬁrst law). equations and matrices for electric circuits An application of these laws to the circuit in Fig. 3.1, where the potentials are in volts, the resistances are in ohms, and the currents are in amps, leads to the following
• Section 3.2 Some Problems That Give Rise to Matrices 8 121 12 i1 10 8 i2 4 i3 6 4 6 FIGURE 3.1 An electric circuit with resistors and applied voltages. set of simultaneous equations: 8 = 12i 1 + 10(i 1 − i 2 ) + 8(i 1 − i 3 ) 4 = 10(i 2 − i 1 ) + 6(i 2 − i 3 ) 6 = 8(i 3 − i 1 ) + 6(i 3 − i 2 ) + 4i 3 . After collecting terms this system can be written as the matrix equation Ax = b, with ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 8 30 −10 −8 i1 16 −6⎦ , x = ⎣i 2 ⎦ , b = ⎣4⎦ . A = ⎣−10 6 −8 −6 18 i3 The directions assumed for the currents ir for r = 1, 2, 3 are shown by the arrows in Fig. 3.1, but if after the system of equations is solved, the value of the current is found to be negative, the direction of its arrow must be reversed. The circuit in Fig. 3.1 is simple, so in this example the currents can be found by routine elimination between the three equations. When many coupled circuits are involved a matrix approach is useful, and it then becomes necessary to develop a method for solving for x the matrix equation Ax = b, the elements of which are the required currents. If the number of equations is small, x can be found by making use of the matrix A−1 , inverse to A, that will be introduced later, though the most computationally efﬁcient approach is to use one of the numerical methods for solving systems of linear simultaneous equations described in Chapter 19. (b) Combinatorial Problems: Graph Theory Matrices play an important role in combinatorial problems of many different types and, in particular, in graph theory. The purpose of the brief account offered here will be to illustrate a particular application of matrices, and no attempt will be made to discuss their subsequent use in the solution of the associated problems. Combinatorial problems involve dealing with the possible arrangements of situations of various different kinds, and computing the number and properties of such arrangements. The arrangements may be of very diverse types, involving at one extreme the ordering of matches that are to take place in a tennis tournament,
• Section 3.2 FIGURE 3.3 A typical digraph. digraph Some Problems That Give Rise to Matrices 123 FIGURE 3.4 The Konigsberg bridge problem. ¨ directions. A graph of this type is called a digraph (directed graph). The rules for the entries in the adjacency matrix A = [ai j ] of a digraph are that ai j = 1, 0, if vertices i and j are joined by an edge with an arrow from i to j otherwise. A typical digraph is shown in Fig. 3.3, and it has the associated adjacency matrix ⎡ ⎤ 0 1 0 1 ⎢0 0 1 0⎥ ⎥ A=⎢ ⎣1 0 0 0⎦ . 0 0 1 0 K¨ nigsberg o bridge problem The adjacency matrix A characterizes all the possible interconnections between the four vertices and, as with the previous example, an analysis of the properties of any situation capable of representation in terms of this digraph can be performed using the matrix A. Problems of this type can arise in transportation problems in cities with one-way streets, and in chemical processes where a ﬂuid is piped to different parts of a plant through an interconnecting network of pipes through which ﬂuid may only ﬂow in a given direction. Before closing this brief introduction to graph theory, mention should be made of a problem of historical signiﬁcance, since it represented the start of graph theory ¨ as it is known today. The problem is called the Konigsberg bridge problem, and it was solved by Euler (1707–1783). During the early 18th century the Prussian town of Konigsberg was established on two adjacent islands in the middle of the river ¨ Pregel. The islands were linked to the land on either side of the river, and to one another, by seven bridges, as shown in Fig. 3.4a. It was suggested to Euler that he should resolve the conjecture that it ought to be possible to walk through the town, starting and ending at the same place, while crossing each of the seven bridges only once.
• 124 Chapter 3 Matrices and Systems of Linear Equations Euler replaced the picture in Fig. 3.4a by the graph in Fig. 3.4b, though it was not until much later that the term graph in the sense used here was introduced. In Fig. 3.4b the vertices S and Q represent the two islands and, using the same lettering, P and R represent the riverbanks. The number of edges incident on each vertex represents the number of bridges connected to the corresponding land mass. Euler introduced the concept of a connected graph, in which each pair of vertices is linked by a set of edges, and also what is now called an eulerian circuit, comprising a path through all vertices that starts and ends at the same vertex and uses every edge only once. He called the number of edges incident upon a vertex the degree of the vertex, and by using these ideas he was able to prove the impossibility of the conjecture. The arguments involved are not difﬁcult, but their details would be out of place here. Many more practical problems are capable of solution by graph theory, which itself belongs to the branch of mathematics called combinatorics. In elementary accounts, graph theory and related combinatorial issues are usually called discrete mathematics. More information about combinatorics and its connection with matrices can be found in References [2.2] and [2.13]. (c) Translations, Rotations, and Scaling of Graphs: Computer Graphics matrices and computer graphics The simplest operations in computer graphics involve copying a picture to a different location, rotating a picture about a ﬁxed point, and scaling a picture, where the scaling can be different in the horizontal and vertical directions. These operations are called, respectively, a translation, a rotation, and a scaling of the picture. Operations of this nature can all be represented in terms of matrices, and they involve what are called linear transformations of the original picture. Translation A translation of a two-dimensional picture involves copying it to a different location without either rotating it or changing its horizontal and vertical scales. Figure 3.5 shows the original cartesian axes O(x, y) and the shifted axes O (x , y ), where the respective axes remain parallel to their original directions and the origin O is located at the point (h, k) relative to the O(x, y) axes. The relationship between the two sets of coordinates is given by x = x +h y k O and y = y + k. y′ O′ x′ h FIGURE 3.5 A translation. x
• Section 3.2 Some Problems That Give Rise to Matrices 125 y y P x φ Q θ x O FIGURE 3.6 A rotation through an angle θ. If x = [x, y]T , x = [x , y ]T , and b = [h, k]T , the coordinate transformation can be written in matrix form as x = x + b, where matrix b represents the translation. Rotation A rotation of the coordinate axes through an angle θ is shown in Fig. 3.6, where P(x, y) is an arbitrary point. The coordinates of P in the (x, y) reference frame and the (x , y ) reference frame are related as x = OR = OP cos(φ + θ ) = OP cos φ cos θ − OP sin φ sin θ = OQ cos θ − PQ sin θ = x cos θ − y sin θ, and y = P R = OP sin(φ + θ) = OP sin φ cos θ + OP cos φ sin θ = PQ cos θ + OQ sin θ = y cos θ + x sin θ, so x = x cos θ − y sin θ and y = y cos θ + x sin θ. Deﬁning the matrices x, x , and R as x= x , y x = x , y R= and cos θ sin θ allows the coordinate transformation to be written as x = Rx . Scaling If S is a matrix of the form S= kx 0 0 , ky −sin θ cos θ
• 126 Chapter 3 Matrices and Systems of Linear Equations where kx and ky are positive constants, and x = Sx, it follows that x = kx x and y = ky y , showing that x is obtained by scaling x by kx , while y is obtained by scaling y by ky . This form of scaling is represented by premultiplication of x by S, and if, for example, 4 0 S= 0 , 3 the effect of this transformation on a circle of radius a will be to map it into an ellipse with semimajor axis of length 4a parallel to the x-axis and a semiminor axis of length 3a parallel to the y-axis. Composite transformations By combining the preceding matrix operations to form a composite transformation, it is possible to carry out several transformations simultaneously. As an example, the effect of a rotation R followed by a translation b when performed on a vector x are seen to be described by the matrix equation x = Rx + b, the effect of which is shown in Fig. 3.7. If a scaling S is performed before the rotation and translation, the effect on a vector x is described by the matrix equation x = RSx + b. This is illustrated in Fig. 3.8b, which shows the effect when a transformation of this type is performed on the circle of radius a centered on the origin shown in Fig. 3.8a, with b= h , k R= cos π/3 sin π/3 −sin π/3 , cos π/3 and S= 3 0 0 . 2 It is seen that the circle has ﬁrst been scaled to become an ellipse with semiaxes 3a and 2a, after which the ellipse has been rotated through an angle π/3, and ﬁnally its center has been translated to the point (h, k). y P(x, y) y′ φ k O x′ θ O′ h FIGURE 3.7 A rotation and a translation. x
• Section 3.2 Some Problems That Give Rise to Matrices 127 y x′ 3a y y′ x′ 2a y′ 0 a π /3 k π/3 −2a x 0 h x −3a (a) (b) FIGURE 3.8 The composite transformation x = RSx + b. It is essential to remember that the order in which transformations are performed will, in general, inﬂuence the result. This is easily seen by considering the two transformations x = RSx + b and x = SRx + b. If the ﬁrst of these is performed on the circle in Fig. 3.8a, it produces Fig. 3.8b, but when the second is performed on the same circle, it ﬁrst converts it into an ellipse with its major axis horizontal, and then translates the center of the ellipse to the point (h, k). In this case the effect of the rotation cannot be seen, because the circle is symmetric with respect to rotations. A relationship of the form x = F(x ) can be interpreted geometrically in two distinct ways which are equally valid: 1. As the change in the way we describe the location of a point P. Then the relationship is called a transformation of coordinates (Figs. 3.5, 3.6, 3.7). 2. As a mapping of a point P from one location to a new one. (d) Matrix Analysis of Framed Structures A framed structure is a network of straight struts joined at their ends to form a rigid three-dimensional structure. A typical framed structure is the steel work for a large building before the walls and ﬂoors have been added. A simple example of a framed structure, called a truss, is a plane construction in which the struts are joined together to form triangles, as in the side section of the small bridge shown in Fig. 3.9. For safety, to ensure that no strut fails when the bridge carries the largest permitted load, it is necessary to determine the force experienced by each strut in the truss when the bridge supports its maximum load in several different positions. Typically, the largest load could be due to a heavy truck crossing the bridge. The analysis of trusses is usually simpliﬁed by making the following assumptions: r The structure is in the vertical plane; r The weight of each strut can be neglected; r Struts are rigid and so remain straight;
• 128 Chapter 3 Matrices and Systems of Linear Equations F4 B B F1 D A A C E FIGURE 3.9 A typical truss found in a side section of a bridge. π /3 F3 F5 π /3 F2 L /2 L /2 D π /3 C F7 F6 π /3 L R1 E R2 3m FIGURE 3.10 A truss supporting a concentrated load. r Each joint is considered to be hinged, so the only forces acting at a joint are along the struts meeting at the joint if forces are applied at joints only. r There are no redundant struts, so that removing a strut will cause the truss to collapse. We now write down the simultaneous equations that must be solved to ﬁnd the forces acting in the seven struts of length L that form the truss shown in Fig. 3.10, when a concentrated load 3m is located at point C midway between A and E. This load could be considered to be a heavily laden truck standing in the center of the bridge. To determine the reactions at the support points A and E, we use the fact that for equilibrium the turning moments about these two points must be zero. The turning moment of the load 3m about the point A must be cancelled by the turning moment of the reaction R2 at E, so 3m(L) = R2 (2L), showing that R2 = 3m/2. Similarly, the turning moment of the load 3m about the point E must be cancelled by the turning moment of the reaction R1 at A, so 3m(L) = R1 (2L), showing that R1 = 3m/2. The directions in which the forces F1 to F7 are assumed to act are shown by arrows, and if later a force is found to be negative, the direction of the associated arrows must be reversed. For equilibrium the sum of the vertical components of all forces acting at each joint must be zero, as must be the sum of the horizontal components of all forces acting at each joint. The equations representing the balance of forces at each joint are as follows, where when resolving the forces acting at joint C, the effect of the load 3m which acts vertically downwards must be taken into account: equations and matrices for a framed structure Joint A(vertical) Joint A(horizontal) Joint B(vertical) Joint B(horizontal) Joint C(vertical) Joint C(horizontal) Joint D(vertical) F1 sin π/3 − 3m/2 = 0 F1 cos π/3 + F2 = 0 F1 sin π/3 + F3 sin π/3 = 0 F1 cos π/3 − F3 cos π/3 − F4 = 0 F3 sin π/3 + F5 sin π/3 + 3m = 0 F2 + F3 cos π/3 − F5 cos π/3 − F6 = 0 F5 sin π/3 + F7 sin π/3 = 0
• Section 3.2 Some Problems That Give Rise to Matrices 129 F4 + F5 cos π/3 − F7 cos π/3 = 0 F7 sin π/3 − 3m/2 = 0 F6 + F7 cos π/3 = 0. Joint D(horizontal) Joint E(vertical) Joint E(horizontal) After substituting for sin π/3 and cos π/3, these equations can be written in the matrix form Ax = b, where ⎤ ⎡1√ 3 0 0 0 0 0 0 2 ⎢ 1 ⎥ ⎤ ⎡ 1 0 0 0 0 0 ⎥ ⎢ 2 3m/2 ⎡ ⎤ ⎥ ⎢ √ √ F1 ⎢ 0 ⎥ ⎢1 3 0 1 3 0 0 0 0 ⎥ ⎥ ⎢ ⎥ ⎢2 2 ⎢ ⎥ ⎢ 0 ⎥ ⎥ ⎢ 1 ⎢ F2 ⎥ 1 ⎥ ⎢ ⎢ 0 − 2 −1 0 0 0 ⎥ ⎢ ⎥ ⎢ 0 ⎥ ⎥ ⎢ 2 ⎢ F3 ⎥ √ √ ⎥ ⎢ ⎥ ⎢ 1 1 ⎢ ⎥ 0 2 3 0 2 3 0 0 ⎥ ⎥ ⎢ ⎢ 0 ⎥ , x = ⎢ F4 ⎥ , b = ⎢ −3m ⎥ . ⎢ A=⎢ ⎢ ⎥ 1 ⎢ 0 ⎥ ⎢ ⎥ 0 1 0 − 1 −1 0 ⎥ ⎥ ⎢ ⎢ 2 2 ⎢ F5 ⎥ √ √ ⎥ ⎢ 0 ⎥ ⎥ ⎢ ⎢ ⎥ ⎥ ⎢ ⎢ 0 0 1 3⎥ 0 0 0 1 3 ⎢ ⎥ 2 2 ⎢ 0 ⎥ ⎥ ⎢ ⎣ F6 ⎦ ⎥ ⎢ ⎢ 0 1 1 ⎥ 0 −2 ⎥ 0 0 1 ⎢ ⎣3m/2⎦ 2 F7 ⎢ √ ⎥ ⎢ 0 0 0 0 0 0 0 1 3⎥ ⎦ ⎣ 2 0 0 0 0 0 1 1 2 These are 10 equations for the 7 unknown forces F1 to F7 , so unless 3 of the equations represented in Ax = b are combinations of the remaining 7 equations, we cannot expect there to be a solution. When the rank of a matrix is introduced in Section 3.6, we will see how systems of this type can be checked for consistency and, when appropriate, simpliﬁed and solved. In this case the equations are sufﬁciently simple that they can be solved sequentially, without the use of matrices. The solution is seen to be √ √ √ √ F1 = m 3, F2 = −m/( 3/2), F3 = −m 3, F4 = m 3, √ √ √ F5 = −m 3, F6 = −m( 3/2), F7 = m 3. k1 m1 x1 The signs show that the arrows in Fig. 3.10 associated with forces F2 , F3 , F5 , and F6 should be reversed, so these struts are in tension, while the others are in compression. Notice that matrix A is determined by the geometry of the truss, and so does not change when forces are applied to more than one of the joints on the truss (bridge). This means that after the 10 equations have been reduced to seven, the same modiﬁed matrix A can be use to ﬁnd the forces in the struts for any form of concentrated loading. Had a more complicated struss been involved, many more equations would have been involved, so that a matrix approach becomes necessary. This approach also identiﬁes any redundant struts in a structure, because the force in a redundant strut is indeterminate. (e) A Compound Mass–Spring System k2 x2 m2 FIGURE 3.11 A compound mass–spring system. Matrices can have variables as elements, and an analysis of the compound mass– spring system shown in Fig. 3.11 shows one way in which this can arise. Figure 3.11 represents a mass m1 suspended from a rigid support by a spring of negligible mass with spring constant k1 , and a mass m2 suspended from mass m1 by a spring of negligible mass with spring constant k2 . The vertical displacement of m1 from its
• 130 Chapter 3 Matrices and Systems of Linear Equations equations of motion of a coupled mass–spring system equilibrium position is x1 , and the vertical displacement of m2 from its equilibrium position is x2 . Each spring is assumed to be linearly elastic, so the restoring force exerted by a spring is equal to the product of the displacement from its equilibrium position and the spring constant. The product of the mass m1 and its acceleration is m1 d2 x1 /dt 2 , and the restoring force due to spring k1 is k1 x1 , while the restoring force due to spring k2 is k2 (x1 − x2 ), so the equation of motion of m1 is d2 x1 = −k1 x1 − k2 (x1 − x2 ). dt 2 Similarly, the equation of motion of m2 is m1 d2 x2 = −k2 (x2 − x1 ), dt 2 where the negative signs are necessary because the springs act to restore the masses to their original positions. ¨ This system can be written as the matrix differential equation x + Ax = 0, by deﬁning A and x as ⎡ 2 ⎤ ⎡ (k + k ) k2 ⎤ d x1 1 2 − ⎢ dt 2 ⎥ ⎥ ⎢ m1 m1 ⎥ x ⎢ ⎥ ¨ A=⎢ , x = 1 , and x = ⎢ ⎥. ⎣ x2 ⎣ d2 x2 ⎦ k2 ⎦ k2 − m2 m2 dt 2 The solution of this system will not be considered here as ordinary differential equations and systems of the type derived here are discussed in detail in Chapter 6, where matrix methods are also developed. Chapter 7 develops Laplace transform methods for the solution of differential equations and systems. It will sufﬁce to mention here that the dynamical behavior of the compound mass–spring system in Fig. 3.11 is completely characterized by matrix A. m2 (f) Stochastic Processes Certain problems arise that are not of a deterministic nature, so that both the formulation of the problem and its outcome must be expressed in terms of probabilities. The probability p that a certain event occurs is a number in the interval 0 ≤ p ≤ 1. An event with probability p = 0 is one that never occurs, and an event with probability p = 1 is one that is certain to occur. So, for example, when tossing a coin N times and recording each outcome as an H (head) or a T (tail), if the number of heads is NH and the number of tails is NT , so that N = NH + NT , the numbers NH /N and NT /N will be approximations to the respective probabilities that a head or a tail occurs when the coin is tossed. If the coin is unbiased, it is reasonable to expect that as N increases both NH /N and NT /N will approach the value 1/2. This will mean, of course, that the chances of either a head or a tail occurring on each occasion are equal. The example we now outline is called a stochastic process and is illustrated by considering a process that evolves with time and is such that at any given moment it may be in precisely one of N different situations, usually called states, where N is ﬁnite. We shall denote the N states in which the process may ﬁnd itself at any given time tm by S1 , S2 , . . . , SN , with m = 0, 1, 2, . . . , and tm−1 < tm, it being assumed that the outcome at each time depends on probabilities, and so is not deterministic.
• Section 3.2 Some Problems That Give Rise to Matrices 131 To formulate the problem we assume that what are called the conditional probabilities pki (also called transition probabilities) that determine the probability with which the process will be in state S j at time tm are all known, given that it was in state Sk at time tm−1 , and that these probabilities are the same from t1 to t2 as from tm−1 to tm for m = 0, 1, 2, . . . . This last assumption means that the probability with which the transition from state Sk to S j occurs is independent of the time at which the process was in state Sk. The conditional probabilities can be arranged as the N × N matrix P = [ p jk], so as probabilities are involved, all the p jk are nonnegative, and as each stage must have an outcome, the sum of the elements in every row of matrix P must equal 1. A matrix P with these properties, namely that N 0 ≤ p jk ≤ 1, 0 ≤ j ≤ N, 0 ≤ k ≤ N, p jk = 1, and j=1 stochastic matrix and a Markov process is called a stochastic matrix (see Exercise 53, Section 3.1). Processes like these, whose condition at any subsequent instant does not depend on how the process arrived at its present state, are called Markov processes. Simple but typical examples of such processes involving only two states are gambling wins and losses, the reliability of machines that may either be operational or under repair, shells ﬁred from a gun that either hit or miss the target and errors that introduce an incorrect digit 1 or 0 when transferring binary coded information. To develop the argument a little further, let us now consider a process that can be in one of two states, and that the matrix P describing its transitions is given by P= 2/3 1/3 . 1/4 3/4 Now suppose that initially the probability distribution is given by the row matrix E(0) = [ p, q], where, of course, p + q = 1. Then if E(m) denotes the probability distribution of the states at time tm, it follows that E(1) = E(0)P, but as P is independent of the time we conclude that after m transitions the general result must be E(m) = E(0)Pm, so in this case E(m) = [ p, q] 2/3 1/3 1/4 3/4 m . Direct calculation shows that E(3) = [0.470 p + 0.398q, 0.530 p + 0.602q], E(6) = [0.432 p + 0.426q, 0.568 p + 0.574q], and E(10) = [0.429 p + 0.429q, 0.571 p + 0.571q], so it is reasonable to ask if E(m) tends to a limiting vector as m → ∞ and, if so, what this is? As this problem is simple, an analytical answer is possible, though it involves using a diagonalizing matrix P which will be discussed later.
• 132 Chapter 3 Matrices and Systems of Linear Equations We will see later that P can be written as ADB, where D is a diagonal matrix and AB = I. In this case 1 1 A= 4 , −3 1 0 D= 0 , 5/12 and B= 3/7 1/7 1 0 0 5/12 3/7 1/7 4/7 , −1/7 4/7 . −1/7 so P= 1 1 4 −3 In what follows we will need to make repeated use of the fact that BA = 3/7 1/7 4/7 −1/7 1 1 4 1 0 = = I. −3 0 1 Using this last property we ﬁnd that P2 = = 1 1 4 −3 1 4 1 −3 1 0 0 5/12 1 0 0 5/12 3/7 1/7 2 4/7 −1/7 3/7 1/7 1 1 4 −3 1 0 0 5/12 3/7 1/7 4/7 −1/7 4/7 . −1/7 However, when a diagonal matrix is raised to a power, each of its elements is raised to that same power (see Problem 41, Section 3.1), so P2 = 1 1 4 −3 1 0 0 (5/12)2 3/7 1/7 4/7 −1/7 Pm = 1 1 4 −3 1 0 0 (5/12)m 3/7 1/7 4/7 . −1/7 and, in general, Thus, ⎡ 3 + 4(5/12)m ⎢ 7 Pm = ⎢ ⎣ 3 − 3(5/12)m 7 ⎤ 4 − 4(5/12)m ⎥ 7 ⎥, 4 + 3(5/12)m ⎦ 7 showing that as m → ∞, so lim E(m)Pm = [3( p + q)/7, 4( p + q)/7] = [3/7, 4/7], m→∞ and we have found the limiting state of the system. Stochastic processes also occur that involve more than two states. The problem of determining the probability with which such processes will be in a given state, and when a limiting state exists, the limiting values of the probabilities involved, is of considerable practical importance. An introduction to stochastic process can be found in reference [2.4]. Summary This section has introduced some of the many areas in which matrices play an essential role. These range from electric circuits needing the application of Kirchhoff’s laws, through routing problems involving the concepts of directed graphs and adjacency matrices, to the classical K¨ nigsberg bridge problem, computer graphic operations performed by linear o transformations, the matrix analysis of forces in a framed structure, the oscillations of a coupled mass–spring system, and stochastic processes.
• Section 3.3 Determinants 133 EXERCISES 3.2 1. State which of the following matrices is a stochastic matrix, giving a reason when this is not the case. ⎤ ⎡ ⎤ ⎡ 0.5 0.2 0.3 0.5 0.3 0.2 (c) ⎣0.7 0.3 0.2⎦ . (a) ⎣0.25 0 0.75⎦. 0.4 0.2 0.4 0.5 0.5 0 ⎤ ⎤ ⎡ ⎡ 0.3 0.1 0.6 1.2 0 −0.2 0.2⎦ . (d) ⎣0.8 0 0.2⎦ . (b) ⎣ 0 0.8 0 1 0 0.6 0.3 0.1 4. 2. Given the stochastic matrix 5. 3/4 P= 1/2 4 3 1 2 FIGURE 3.13 1/4 1/2 4 5 and the initial probability distribution E(0) = [ p, q], with p, q ≥ 0 and p + q = 1, the probability distribution of the two states at time tm is given by 3 E(m) = E(0)Pm. Find E(2), E(4), and E(6), together with their values when p = 1/4, q = 3/4. In Exercises 3 through 6 ﬁnd the adjacency matrices for the given graphs and digraphs. FIGURE 3.14 6. 4 3. 5 1 5 6 6 3 4 2 3 2 FIGURE 3.15 FIGURE 3.12 3.3 1 Determinants Every square matrix A with numbers as elements has associated with it a single unique number called the determinant of A, which is written detA. If A is an n × n matrix, the determinant of A is indicated by displaying the elements ai j of A between two vertical bars as follows: notation for a determinant a11 a21 detA = ··· an1 a12 a22 ··· an2 · · · a1n · · · a2n . ··· ··· · · · ann (5) The number n is called the order of determinant A, and in (5) the vertical bars are used to distinguish detA, that is a number, from matrix A that is an n × n array of numbers.
• 134 Chapter 3 Matrices and Systems of Linear Equations A general deﬁnition of the value of detA in terms of its elements ai j will be given later, so for the moment we deﬁne only the value of ﬁrst and second order determinants (see Section 1.7). If A only contains a single element a11 so A = [a11 ] then, by deﬁnition, detA = a11 , and if A is the 2 × 2 matrix A= a11 a21 a12 , a22 then, by deﬁnition, a11 a21 det A = a12 = a11 a22 − a21 a12 . a22 (6) Notice that in (6) the numerical value of detA is obtained by forming the product of the two terms a11 and a22 on the leading diagonal, and subtracting from it the product of the two terms a21 and a12 on the cross diagonal. This process, called expanding the determinant, is easily remembered by representing the method by which the determinant is expanded as a12  = a11 a22 − a21 a12 , d  a21  d a22  a11 where the product involving the downward arrow generates the ﬁrst pair of terms on the right and the product involving the upward arrow indicates that the product of the associated pair of terms is to be subtracted. EXAMPLE 3.10 Find detA given 3 −1 (a) detA = 2 6 and (b) detA = 1+i −3i i . 2 Solution (a) Using (5) we have detA = 3 2 −1 = 3 · 6 − 2 · (−1) = 20. 6 (b) Again using (5) we have detA = 1+i −3i i = (1 + i) · 2 − (−3i) · i = −1 + 2i. 2 To provide some motivation for the introduction of determinants, we solve by elimination the two linear simultaneous algebraic equations a11 x1 + a12 x2 = b1 a21 x1 + a22 x2 = b2 . (7) To eliminate x2 we multiply the ﬁrst equation by a22 and the second equation by a12 , and then subtract the results to obtain (a11 a22 − a21 a12 )x1 = a22 b1 − a12 b2 . This shows that when a11 a22 − a21 a12 = 0, x1 = a22 b1 − a12 b2 . a11 a22 − a21 a12
• Section 3.3 Determinants 135 This result can be expressed in terms of detA as x1 = (a22 b1 − a12 b2 )/detA. (8) Similarly, when x1 is eliminated from equations (7) we ﬁnd that x2 = (a11 b2 − a21 b1 )/detA. Cramer’s rule for a system of two equations (9) Examination of (8) and (9) shows that their numerators can be written in terms of determinants that are closely related to detA, because x1 = D1 D and D1 = b1 b2 a12 , a22 x2 = D2 , D (10) where D = detA, and D2 = a11 a21 b1 . b2 (11) The form of solution of equations (7) in terms of the determinants in (10) and (11) is called Cramer’s rule. The rule itself says that xi = Di /D for i = 1, 2, where determinant D1 is obtained from D = detA by replacing the ﬁrst column of A by the nonhomogeneous terms b1 and b2 on the right of equations (7), and determinant D2 is obtained from D by replacing the second column of A by these same two terms. EXAMPLE 3.11 Use Cramer’s rule to solve the equations 3x1 + 5x2 = 4 2x1 − 4x2 = 1. Solution The three determinants required by Cramer’s are D = detA = 3 2 5 = −22, −4 D1 = 4 1 5 = −21, −4 D2 = 3 2 4 = −5, 1 so x1 = D1 /D = 21/22 and x2 = D2 /D = 5/22. minors and cofactors This example shows how determinants enter naturally into the solution of a system of equations. As determinants of order n > 2 occur in the study of differential equations, analytical geometry, throughout linear algebra, and elsewhere, it is necessary to generalize the deﬁnition of a determinant of order 2 given in (6) to determinants of any order n. With this objective in mind, we ﬁrst deﬁne the minors and cofactors of a determinant of order n. The minor Mi j associated with the element ai j in the ith row and jth column of the nth order determinant in (5) is the determinant of order n − 1 formed from detA by deleting the elements in the ith row and jth column. As each element of detA has an associated minor, a determinant of order n has n2 minors. By way of example, the minor M3 j of the nth order determinant in (5) is the determinant of order n − 1 a11 a21 M3 j = a41 ··· an1 a12 a22 a42 ··· an2 · · · a1 j−1 · · · a2 j−1 · · · a4 j−1 ··· ··· · · · anj−1 a1 j+1 a2 j+1 a4 j+1 ··· anj+1 · · · a1n · · · a2n · · · a4n . ··· ··· · · · ann (12)
• 136 Chapter 3 Matrices and Systems of Linear Equations The cofactor Ci j associated with the element ai j in determinant (5) is deﬁned in terms of the minor Mi j as Ci j = (−1)i+ j Mi j for i, j = 1, 2, . . . , n, (13) so an nth order determinant has n2 cofactors. EXAMPLE 3.12 Find the minors and cofactors of detA = 2 1 −3 . 4 Solution Inspection shows that M11 = 4, M12 = 1, M21 = −3, and M22 = 2. Using deﬁnition (12), the cofactors are seen to be C11 = (−1)1+1 M11 = 4, C12 = (−1)1+2 M12 = −1, C21 = (−1)2+1 M21 = 3, and C22 = (−1)2+2 M22 = 2. Recognizing that the cofactors of the second order determinant detA = expanding a second order determinant in terms of rows or columns a11 a21 a12 a22 are C11 = a22 , C12 = −a21 , C21 = −a12 , and C22 = a11 , we see from the deﬁnition detA = a11 a22 − a21 a12 that detA can be expressed in terms of these cofactors in four different ways: detA = a11 C11 + a12 C12 , using elements and cofactors from the ﬁrst row of A; detA = a21 C21 + a22 C22 , using elements and cofactors from the second row of A; detA = a11 C11 + a21 C21 , using elements and cofactors from the ﬁrst column of A; detA = a12 C12 + a22 C22 , using elements and cofactors from the second column of A. This has proved by direct calculation that the value of the general second order determinant detA is given by the sum of the products of the elements and their associated cofactors in any row or column of the determinant. When the deﬁnition of a determinant is extended to the case n > 2 it will be seen that this same property remains true. There are various ways of deﬁning an nth order determinant, and from among these we have chosen to use one that involves a recursive process. More will be said about this recursive process, and how it can be used to evaluate the determinant, once the deﬁnition has been formulated. Deﬁnition of a determinant of order n The nth order determinant detA in which the element ai j has the associated cofactor Ci j for i, j = 1, 2, . . . , n is deﬁned as a11 a21 detA = ··· an1 a12 a22 ··· an2 · · · a1n · · · a2n = ··· ··· · · · ann n a1 j C1 j . j=1 (14)
• Section 3.3 Determinants 137 Recalling the different ways in which a second order determinant can be evaluated, we see that the expansion of detA in (14) is in terms of the elements and cofactors of the ﬁrst row, so for conciseness this expansion is said to be in terms of the elements of the ﬁrst row. The recursive process enters this deﬁnition through the fact that each cofactor C1 j is a determinant of order n − 1, as can be seen from (12), so each cofactor in turn can be expanded in terms of determinants of order n − 2, and the process continued until determinants of order 2 are obtained that can then be calculated using (6). EXAMPLE 3.13 Expand 1 detA = 2 1 4 −1 0 3 . 2 1 Solution To expand this third order determinant using (14), we must ﬁnd the cofactors of the elements of the ﬁrst row, so to do this we ﬁrst ﬁnd the minors and then use (13) to ﬁnd the cofactors, as a result of which we ﬁnd that M11 = 0 2 3 = −6, 1 so C11 = (−1)1+1 (−6) = −6 M12 = 2 1 3 = −1, 1 so C12 = (−1)1+2 (−1) = 1 M13 = 2 1 0 = 4, 2 so C13 = (−1)1+3 (4) = 4. As the elements of the ﬁrst row are a11 = 1, a12 = 4, and a13 = −1, we ﬁnd from (12) that detA = (1)C11 + (4)C12 + (−1)C13 = (1)(−6) + (4)(1) + (−1)(4) = −6. The determinant associated with either an upper or a lower triangular matrix A of any order is easily expanded, because repeated application of (12) shows that it reduces to the product of the terms on the leading diagonal, so the expansion of the nth order upper triangular determinant with elements a11 , a22 , . . . , ann on its leading diagonal a11 0 detA = 0 0 a12 a22 0 0 · · · a1n · · · a2n = a11 a22 . . . ann , ··· ··· 0 ann (15) and a corresponding result is true for a lower triangular matrix. Deﬁnition (14) can be used to prove that nth order determinants, like second order determinants, have the property that their value is given by the sum of the products of the elements and their cofactors in any row or column. This result, together with a generalization concerning the vanishing of the sum of the products of the elements in any row (or column) and the corresponding cofactors in a different row (or column), forms the next theorem. The details of the proof can be found in linear algebra texts, for example, [2.1], [2.5], [2.7], [2.9], but the method used has no other application in what is to follow, so the proof will be omitted.
• 138 Chapter 3 Matrices and Systems of Linear Equations THEOREM 3.3 Laplace expansion theorem Laplace expansion theorem and an extension Let A be an n × n matrix with elements ai j . Then, (i) detA can be expanded in terms of elements of its ith row and the cofactors Ci j of the ith row as n detA = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin = ai j Ci j j=1 for any ﬁxed i with 1 ≤ i ≤ n. (ii) detA can be expanded in terms of elements of its jth column and the cofactors Ci j of the jth column as n detA = a1 j C1 j + a2 j C2 j + · · · + anj Cnj = ai j Ci j j=1 for any ﬁxed j with 1 ≤ j ≤ n. (iii) The sum of the products of the elements of the ith row with the corresponding cofactors of the kth row is zero when i = k. (iv) The sum of the products of the elements in the jth column with the corresponding cofactors of the kth column is zero when j = k. Results (i) and (ii) are often used to advantage when a row or column contains many zeros, because if the determinant is expanded in terms of the elements of that row or column, the cofactors associated with each zero element need not be calculated. Results (iii) and (iv) simply say that the sum of the products of the elements in any row (or column) with the corresponding cofactors in a different row (or column) is zero. PIERRE SIMON LAPLACE (1749–1827) A French mathematician of remarkable ability who made contributions to analysis, differential equations, probability, and celestial mechanics. He used mathematics as a tool with which to investigate physical phenomena, and made fundamental contributions to hydrodynamics, the propagation of sound, surface tension in liquids, and many other topics. His many contributions had a wide-ranging effect on the development of mathematics. EXAMPLE 3.14 Verify Theorem 3.3(i) by expanding the determinant in Example 3.13 in terms of the elements of its second row. Use the determinant to check the result of Theorem 3.3(iii). Solution The second row contains a zero element in its mid position, so the cofactor C22 associated with the zero element need not be calculated. The necessary cofactors in the second row that follow from the minors are M21 = 4 2 −1 =6 1 so C21 = (−1)2+1 (6) = −6 M23 = 1 1 4 = −2 2 so C23 = (−1)2+3 (−2) = 2.
• Section 3.3 Determinants 139 As a21 = 2 and a23 = 3, it follows from Theorem 3.3(i) that when detA is expanded in terms of elements of its second row, detA = (2)(−6) + (3)(2) = −6, conﬁrming the result obtained in Example 3.13. As a particular case of Theorem 3.3(iii), let us show that the sum of the products of the cofactors in the ﬁrst row of detA and the corresponding elements in the third row is zero. In Example 3.13 it was found that C11 = −6, C12 = 1, and C13 = 4, so as the elements of the third row are a31 = 1, a32 = 2, and a33 = 1, we have a31 C11 + a32 C12 + a33 C13 = (−6)(1) + (2)(1) + (1)(4) = 0, conﬁrming the result of Theorem 3.3(iii) when the elements of row 3 and the cofactors of row 1 are used. Determinants have a number of special properties that can be used to simplify their expansion, though their main uses are found elsewhere in mathematics, where determinants often characterize some important theoretical feature of a problem. The most important and useful of these properties are contained in the next theorem. THEOREM 3.4 basic properties of determinants Properties of determinants A determinant detA has the following properties: (i) If any row or column of a determinant detA only contains zero elements, then detA = 0. (ii) If A is a square matrix with the transpose AT , then detA = detAT . (iii) If each element of a row or column of a square matrix A is multiplied by a constant k, then the value of the determinant is kdetA. (iv) If two rows (or columns) of a square matrix are interchanged, the sign of the determinant is changed. (v) If any two rows or columns of a square matrix A are proportional, then detA = 0. (vi) Let the square matrix A be such that each element ai j of the ith row (or the (1) (2) jth column) can be written as ai j = ai j + ai j . Then if A1 is the matrix derived from (1) A by replacing its ith row (or jth column) by the elements ai j and A2 is the matrix (2) derived from A by replacing its ith row (or jth column) by the elements ai j , detA = detA1 + detA2 . (vii) The addition of a multiple of a row (or column) of a determinant to another row (or column) of the determinant leaves the value of the determinant unchanged. (viii) Let A and B be two n × n matrix, then det(AB) = detA detB. Proof (i) The result follows by expanding the determinant in terms of the row or column that only contains zero elements.
• 140 Chapter 3 Matrices and Systems of Linear Equations (ii) The result follows from the fact that expanding detA in terms of the elements of its ﬁrst row is the same as expanding detAT in terms of the elements of its ﬁrst column. (iii) The result follows by expanding the determinant in terms of the row or column in which each element has been multiplied by the constant k, because k appears as a factor in each term, so the result becomes kdetA. (iv) The proof is by induction, starting with a second order determinant for which the result can be seen to be true from deﬁnition (6). To proceed with an inductive proof we assume the results to be true for a determinant of order r − 1, and show it must be true for a determinant of order r. Expand a row of a determinant of order r in terms of the elements of a row (or column) that has not been interchanged. Then, by hypothesis, as the cofactors are determinants of order r − 1, their signs will all be reversed. This establishes that if the hypothesis is true for a determinant of order r − 1 it must also be true for a determinant of order r . As the result is true for r = 2, it follows by induction that it is true for all integers r > 2, and the result is proved. (v) If the value of the determinant is detA, and one row is k times another, then from (ii) by removing the factor k from the row the value of the determinant will be kdetA1 , where A1 is now a determinant with two identical rows. From (ii), interchanging two rows changes the sign of the determinant, but the rows are identical, leaving the determinant invariant, so detA1 = 0. A similar argument shows the result to be true when two columns are proportional, so the result is proved. (vi) The result is proved directly by expanding the determinant in terms of the elements of the ith row (or the jth column). (vii) Let the square matrix B be obtained from A by adding k times the ith row (or a column) to the jth row (or column). Then from (iii) and (vi), detB = detA + kdetC, where C is obtained from A by replacing the ith row (or column) by the jth row (or column). As detC has two identical rows (or columns), it follows from (v) that detC = 0, so detB = detA and the result is proved. (viii) A proof of this result will be given later after the introduction of elementary row operation matrices. Cramer’s rule, which was ﬁrst encountered when seeking the solution of the two equations in (7), can be extended to a system of n equations in a very straightforward manner, and it takes the following form. Cramer’s rule Cramer’s rule for a system of n equations in n unknowns The solution of the system of n equations in the n unknowns x1 , x2 , . . . , xn a11 x1 + a12 x2 + · a21 x1 + a22 x2 + · · · an1 x1 + an2 x2 + · · · · · · + a1n xn = b1 · + a2n xn = b2 · · · · + ann xn = bn
• Section 3.3 Determinants 141 is given by xi = detAi /detA for i = 1, 2, . . . , n, where detA is the determinant of the coefﬁcient matrix with elements ai j , and detAi is the determinant obtained from the coefﬁcient matrix by replacing its ith column by the column containing the number b1 , b2 , . . . , bn . The justiﬁcation for Cramer’s rule in this more general form will be postponed until after the introduction of inverse matrices, when a simple proof can be given. Cramer’s rule is mainly of theoretical importance and, in general, it should not be used to solve equations when n > 3. This is because the number of multiplications required to evaluate a determinant of order n is (n − 1)n!, so to solve for n unknowns (n + 1) determinants must be evaluated leading to a total of (n2 − 1)n! multiplications, and this calculation becomes excessive when n > 3. An efﬁcient way of solving large systems by means of elimination is given in Chapter 19. EXAMPLE 3.15 Use Cramer’s rule to solve x1 − 2x2 + x3 = 1 2x1 + x2 − 2x3 = 3 −x1 + 3x2 + 4x3 = −2. Solution The determinants involved are detA = detA2 = −2 1 3 1 −2 = 29, 4 detA1 = 1 3 −2 −2 1 1 −2 = 37 3 4 1 1 1 2 3 −2 = 1, −1 −2 4 detA3 = 1 2 −1 −2 1 1 3 = −6, 3 −2 1 2 −1 so x1 = 37/29, x2 = 1/29, and x3 = −6/29. A purely algebraic approach to the study of determinants and their properties is to be found in reference [2.8], and many examples of their applications are given in references [2.11] and [2.12]. Summary This section has extended to an nth order determinant the basic notion of a second order determinant that was reviewed in Chapter 1, and then established its most important properties. The Laplace expansion formulas that were established are of theoretical importance, but it will be seen later that the practical evaluation of a determinant is most easily performed by reducing the n × n matrix associated with a determinant to its echelon form.
• 142 Chapter 3 Matrices and Systems of Linear Equations EXERCISES 3.3 In Exercises 1 through 4 ﬁnd detA. 2 1. detA = 0 3 2. detA = −1 3. −2 1 4 2 −1 1 −4 2 3. detA = −2 5 10. 2 3 1 detA = 0 − sin x . cos x −3 1 4 2 −1 5 = 87, 4 2 5 conﬁrm by direct calculation that (a) interchanging the ﬁrst and last rows changes the sign of detA and (b) interchanging the second and third columns changes the sign of detA. 6. Given that detA = 2 5 −1 1 3 −2 2 = −24, 1 3 conﬁrm by direct calculation that (a) adding twice row two to row three leaves detA unchanged and (b) subtracting three times column three from column one leaves detA unchanged. Establish the results in Exercises 7 through 12 without a direct expansion of the determinant by using the properties listed in Theorem 3.4. 7. 1+a b c a a 1+b b = (1 + a + b + c). c 1+c 1 a b+c 8. 1 b c + a = 0. 1 c a+b 2 9. a a 1 2 b b 1 2 ac bc = x 4 (x 2 + a 2 + b2 + c2 ). 2 2 x +c c c = (a − b)(a − c)(b − c). 1 a b 1 b = (a + b + 1)(a − 1)(b − 1). b 1 k 1 12. 1 1 −3 0. 4 4 0 4. detA = −2 cos x 5 sin x 5. Given that ab x 2 + b2 cb 1 11. a a 1 2. 2 4 1 −2 x2 + a 2 ab ac 1 k 1 1 1 1 k 1 1 1 = (k + 3)(k − 1)3 . 1 k In Exercises 13 and 14 use Cramer’s rule to solve the system of equations. 13. 2x1 − 3x2 + x3 = 4 x1 + 2x2 − 2x3 = 1 3x1 + x2 − 2x3 = −2. 14. 3x1 + x2 + 2x3 = 5 2x1 − 4x2 + 3x3 = −3 x1 + 2x2 + 4x3 = 2. 15. Let P(λ) be given by P(λ) = 3−λ 2 4 0 1 2−λ 2 , 2 1−λ where λ is a parameter. Expand the determinant to ﬁnd the form of the polynomial P(λ) and use the result to ﬁnd for what values of λ the determinant vanishes. 16. Let P(λ) be given by P(λ) = 4−λ 1 −1 0 1 −λ 1 , −2 2 − λ where λ is a parameter. Expand the determinant to ﬁnd the form of the polynomial P(λ) and use the result to ﬁnd for what values of λ the determinant vanishes. 17. Given that ⎡ −3 A=⎣ 1 1 ⎤ 0 4 2 −1⎦ 0 1 ⎡ and 1 2 B = ⎣2 3 3 1 ⎤ 3 1⎦ , 2 calculate det(AB), detA, detB, and hence verify that det(AB) = detAdetB.
• Section 3.4 3.4 Elementary Row Operations, Elementary Matrices, and Their Connection with Matrix Multiplication 143 Elementary Row Operations, Elementary Matrices, and Their Connection with Matrix Multiplication To motivate what is to follow we will examine the processes involved when solving by elimination the system of linear equations a11 x1 + a12 x2 + · · · + a1n xn = b1 a21 x1 + a22 x2 + · · · + a2n xn = b2 · · · · · · · · · am1 x1 + am2 x2 + · · · + amn xn = bm, (16) though later more will need to be said about the details of this important problem, and how it is inﬂuenced by the number of equations m and the number of unknowns n. Elementary Row Operations the three basic types of elementary row operation The three types of elementary row operations used when solving equations (16) by elimination are: TYPE I The interchange of two equations TYPE II The scaling of an equation by a nonzero constant TYPE III The addition of a scalar multiple of an equation to another equation In matrix notation the system of equations (16) becomes Ax = b, (17) where A = [ai j ] is an m × n matrix, x = [x1 , x2 , . . . , xn ]T , and b = [b1 , b2 , . . . , bm]T . The three elementary row operations of types I to III that can be performed on the equations in (16) can be interpreted as the corresponding operations performed on the rows of the matrices A and b. This is equivalent to performing these same operations on the rows of the new matrix denoted by (A, b), deﬁned as ⎡ a11 a12 . . . ⎢ a21 a22 . . . ⎢ (A, b) = ⎢ · · · · · · ⎣ am1 the augmented matrix am2 a1n a2n . . . amn ⎤ b1 b2 ⎥ ⎥ ⎥, ⎦ (18) bm that has m rows and n + 1 columns and is obtained by inserting the column vector b containing the nonhomogeneous terms on the right of matrix A. When considering the system of linear equations in (16), matrix (A, b) is called the augmented matrix associated with the system. The separation of the last column in (18) by a vertical dashed line is to indicate partitioning of the matrix to show that the elements of the last column are not elements of the coefﬁcient matrix A.
• 144 Chapter 3 Matrices and Systems of Linear Equations We are now in a position to introduce a notation for the three elementary row operations that are necessary when using an elimination process to ﬁnd the solution of a system of equations in matrix form (ordinary or augmented). Elementary row operations The three elementary row operations that may be performed on a matrix are: (i) The interchange of the ith and jth rows, which will be denoted by R{i → j, j → i}. (ii) The replacement of each element in the ith row by its product with a nonzero constant α, which will be denoted by R{(α)i → i}. (iii) The replacement of each element of the jth row by the sum of β times the corresponding element in the ith row and the element in the jth row, which will be denoted by R{(β)i + j → j}. EXAMPLE 3.16 To illustrate the elementary row operations, we consider the matrix ⎡ ⎤ 1 6 4 −3 2 7 4⎦ . A = ⎣2 0 1 5 2 8 2 3 An example of an elementary row operation of type (i) performed on A is provided by R{1 → 3, 3 → 1}. This requires rows 1 and 3 to be interchanged to give the new matrix ⎡ ⎤ 5 2 8 2 3 7 4⎦ . R{1 → 3, 3 → 1}A = ⎣2 0 1 1 6 4 −3 2 An example of an elementary row operation of type (ii) performed on A is provided by R{(−3)1 → 1}. This requires each element in row 1 to be multiplied by −3 to give the new matrix ⎡ ⎤ −3 −18 −12 9 −6 0 1 7 4⎦ . R{(−3)1 → 1}A = ⎣ 2 5 2 8 2 3 An example of an elementary row operation of type (iii) performed on A is provided by R{(4)1 + 2 → 2}, which requires the elements of row 1 to be multiplied by 4 and then added to the corresponding elements of row 2 to give the new matrix ⎡ ⎤ 1 6 4 −3 2 R{(4)1 + 2 → 2}A = ⎣6 24 17 −5 12⎦ . 5 2 8 2 3 A sequence of elementary row operations performed on the augmented matrix (A, b) will lead to a different augmented matrix (A , b ). However, as this is equivalent to performing the corresponding sequence of operations on the actual equations in (16), although (A, b) and (A , b ) will look different, the interpretation of (A , b ) in terms of the solution of the system of equations in (16) will, of course, be the same as that of (A, b). It will be seen later that the purpose of carrying out these operations on a matrix is to simplify it while leaving its essential
• Section 3.4 Elementary Row Operations, Elementary Matrices, and Their Connection with Matrix Multiplication 145 algebraic structure unaltered, e.g., without changing the solution x1 , . . . , xn of the corresponding system of equations. The deﬁnition that now follows is a consequence of the equivalence, in terms of equations (16), of matrix (A, b) and any matrix (A , b ) that can be derived from it by means of a sequence of elementary row operations, though the deﬁnition applies to matrices in general, and not only to augmented matrices. Row equivalence of matrices Two m × n matrices will be said to be row equivalent if one can be obtained from the other by means of a sequence of elementary row operations. Row equivalence between matrices A and B is denoted by writing A ∼ B. The row equivalence of matrices has the useful properties listed in the following theorem. THEOREM 3.5 Reﬂexive, symmetric, and transitive properties of row equivalence (i) Every m × n matrix A is row equivalent to itself (reﬂexive property). (ii) Let A and B be m × n matrices. Then if A is row equivalent to B, B is row equivalent to A (symmetric property). (iii) Let A, B, and C be m × n matrices. Then if matrix A is row equivalent to B and B is row equivalent to C, A is row equivalent to C (transitive property). Proof (i) The property is self-evident. (ii) To establish this property we must show the three elementary row operations involved are reversible. In the case of elementary row operations of type (i) the result follows from the fact that if an application of the operation R{i → j, j → i} to matrix A yields a new matrix B, an application of the operation R{ j → i, i → j} to matrix B generates the original matrix A. Similarly, in the case of elementary row operations of type (ii), if an application of the operation R{(α)i → i} to matrix A yields a new matrix B, an application of the operation R{(1/α)i → i} to matrix B reproduces the original matrix A. Finally we consider the case of elementary row operations of type (iii). If an application of the operation R{(β)i + j → j} to matrix A yields a new matrix B, an application of the operation R{(−β)i + j → j} to B returns the original matrix A. Taken together these results establish property (ii). (iii) Using property (ii) in (iii) establishes the row equivalence ﬁrst of A and B, and then of B and C, and hence of A and C, so property (iii) is proved. Let us now deﬁne what are called elementary matrices and examine the effect they have when used to premultiply a matrix. Elementary matrices An n × n elementary matrix is any matrix that is obtained from an n × n unit matrix I by performing a single elementary row operation.
• 146 Chapter 3 Matrices and Systems of Linear Equations The following concise notation will be used to identify the elementary matrices that correspond to each of the three elementary row operations. the three basic types of elementary matrix EXAMPLE 3.17 TYPE I Ei j will denote the elementary matrix obtained from the unit matrix I by interchanging its ith and jth rows. TYPE II Ei (c) will denote the matrix obtained from the unit matrix I by multiplying its ith row by the nonzero scalar c. TYPE III Ei j (c) will denote the matrix obtained from the unit matrix I by adding c times its ith row to its jth row. Let I be the 3 × 3 unit matrix. Then ⎡ ⎤ ⎡ 1 0 0 1 0 I = ⎣0 1 0⎦ , E23 = ⎣0 0 0 0 1 0 1 ⎤ ⎡ 0 1 1⎦ , E3 (4) = ⎣0 0 0 ⎡ ⎤ 1 0 0 E13 (5) = ⎣0 1 0⎦ . 5 0 1 0 1 0 ⎤ 0 0⎦ , 4 and Determinants of Elementary Matrices It follows directly from the deﬁnitions of elementary matrices that: (a) The determinant of an elementary matrix of Type I is −1, because two rows of a unit matrix have been interchanged so, in terms of Ei j , we have det(Ei j ) = −1. (b) The determinant of an elementary matrix of Type II in which a row is multiplied by a nonzero constant c is c, because a row of a unit matrix has been multiplied by c so, in terms of Ei (c), we have det(Ei (c)) = c. (c) The determinant of an elementary matrix of Type III in which c times one row has been added to another row is 1, because the addition of a multiple of a row of a unit matrix to another row leaves its value unchanged so, in terms of Ei j (c), we have det(Ei j (c)) = 1. The next theorem shows that premultiplication of a matrix A by an elementary matrix E that is conformable for multiplication performs on A the same elementary row operation that was used to generate E from I. THEOREM 3.6 Row operations performed by elementary matrices Let E be an m × m elementary matrix produced by performing an elementary row operation on the unit matrix I, and let A be an m × n matrix. Then the matrix product EA is the matrix that is obtained when the row operation that generated E from I is performed on A. Proof The proof of the theorem follows directly from the deﬁnition of a matrix product and the fact that, with the exception of the ith element in the ith row of I, which is 1, all the other elements in that row are zero. So if E is the elementary matrix obtained from I by replacing the element 1 in its ith row by α, the result of the matrix product EA will be that the elements in the ith row of A will be multiplied by α. As the form of argument used to establish the effect on A of premultiplication by P to form PA can also be employed when the other two elementary row operations are used to generate an elementary matrix E, the details will be left as an exercise.
• Section 3.5 EXAMPLE 3.18 The Echelon and Row-Reduced Echelon Forms of a Matrix Let A be the matrix ⎡ 2 A = ⎣1 6 4 3 1 147 ⎤ 5 7⎦ . 2 If we use the notation for elementary matrices, and introduce the elementary matrix E23 from Example 3.17 obtained by interchanging the last two rows of I3 , a routine calculation shows that ⎡ ⎤⎡ ⎤ ⎡ ⎤ 1 0 0 2 4 5 2 4 5 E23 A = ⎣0 0 1⎦ ⎣1 3 7⎦ = ⎣6 1 2⎦ , 0 1 0 6 1 2 1 3 7 so the product E23 A has indeed interchanged the last two rows of A. Similarly, again using the elementary matrices in Example 3.17, it is easily checked that E3 (4)A multiplies the elements in the third row of A by 4, while E13 (5)A adds ﬁve times the ﬁrst row of A to the last row. The main use of Theorem 3.6 is to be found in the theory of matrix algebra, and in the justiﬁcation it provides for various practical methods that are used when working with matrices. This is because when solving purely numerical problems the necessary row operations need only be performed on the rows of the augmented matrix instead of on the system of equations itself. Typical uses of the theorem will occur later after a discussion of the linear independence of equations, the deﬁnition of what is called the rank of a matrix, and the introduction of the inverse of an n × n matrix A. In this last case, the results of the theorem will be used to provide an elementary method by which what is called the inverse matrix of an n × n matrix can be obtained when n is small. Summary 3.5 This section introduced the three types of elementary row operations that are used when manipulating matrices together with the corresponding three types of elementary matrix that can be used to perform elementary row operations. The Echelon and Row-Reduced Echelon Forms of a Matrix We now use the row equivalence of matrices to reduce a matrix A to one of two slightly different but related standard forms called, respectively, its echelon form and its row-reduced echelon form. It is helpful to introduce these two new concepts by considering the solution of the system of m equations in n unknowns introduced in (16) and written in an equivalent but more condensed form as (A, b), where ⎡ a11 ⎢ a21 ⎢ (A, b) = ⎣ am1 a12 a22 · · am2 . . . a1n . . . a2n · · · · . . . amn ⎤ b1 b2 ⎥ ⎥, ⎦ bm because this is equivalent to the full matrix equation Ax = b. (19)
• 148 Chapter 3 Matrices and Systems of Linear Equations Echelon and row-reduced echelon forms of a matrix echelon and row-reduced echelon forms A matrix A is said to be in echelon form if: (i) The ﬁrst nonzero element in each row, called its leading entry, is 1; (ii) In any two successive rows i and i + 1 that do not consist entirely of zeros the leading element in the (i + 1)th row lies to the right of the leading element in ith row; (iii) Any rows that consist entirely of zeros lie at the bottom of the matrix. Matrix A is said to be in row-reduced echelon form if, in addition to conditions (i) to (iii), it is also true that (iv) In a column that contains the leading entry of a row, all the other elements are zero. In summary, this deﬁnition means that a matrix A is in echelon form if the ﬁrst nonzero entry in any row is a 1, the entry appears to the right of the ﬁrst nonzero entry in the row above, and all rows of zeros lie at the bottom of the matrix. Furthermore, matrix A is in row-reduced echelon form if, in addition to these conditions, the ﬁrst nonzero entry in any row is the only nonzero entry in the column containing that entry. EXAMPLE 3.19 The following matrices are in echelon form: ⎡ 1 ⎡ ⎤ ⎢0 1 0 5 7 ⎢ ⎣0 0 1 0⎦ and ⎢0 ⎢ ⎣0 0 0 0 0 0 The matrices ⎡ 0 1 0 ⎢0 0 1 ⎢ ⎣0 0 0 0 0 0 2 1 0 0 0 0 1 0 5 3 1 0 ⎤ 0 2⎥ ⎥, 0⎦ 0 ⎡ 1 ⎣0 0 0 1 0 0 0 1 1 0 0 0 0 9 2 1 1 1 0 0 0 1 2 1 0 0 ⎤ 2 3⎦ , 0 1 0 5 1 0 ⎤ 1 1⎥ ⎥ 2⎥ . ⎥ 3⎦ 0 ⎡ and 1 ⎣0 0 0 1 0 0 0 1 ⎤ 5 2⎦ 1 are in row-reduced echelon form. Rules for the reduction of a matrix to echelon form rules for ﬁnding the echelon form The reduction of the m × n matrix to its echelon form is accomplished by means of the following steps: 1. Find the row whose ﬁrst nonzero element is furthest to the left and, if necessary, move it into row 1; if there is more than one such row, choose the row whose ﬁrst nonzero element has the largest absolute value. 2. Scale row 1 to make its leading entry 1. 3. Subtract multiples of row 1 from the m − 1 rows below it to reduce to zero all entries that lie below the leading entry in the ﬁrst column. 4. In the m − 1 rows below row 1, ﬁnd the row whose ﬁrst nonzero entry is furthest to the left and, if necessary, move it into row 2; if there is more
• Section 3.5 5. 6. 7. 8. The Echelon and Row-Reduced Echelon Forms of a Matrix 149 than one such row, choose the row whose ﬁrst nonzero entry has the largest absolute value. Scale row 2 to make its leading entry 1. Subtract multiples of row 2 from the m − 2 rows below it to reduce to zero all entries in the column below the leading entry in row 2. Continue this process until either the ﬁrst nonzero entry in the mth row is 1, or a stage is reached at which all subsequent rows consist entirely of zeros. The matrix is then in its echelon form. Remark The selection in Step 1, and the steps corresponding to Step 4, of a row whose ﬁrst nonzero entry has the largest magnitude is made to reduce computational errors, and is not necessary mathematically. This criterion is introduced to ensure that the elimination procedure does not use an unnecessary scaling of a nonzero entry of small absolute magnitude to reduce to zero an entry of large absolute magnitude. rules for ﬁnding the row-reduced echelon form Rules for the reduction of a matrix to row-reduced echelon form 1. Proceed as in the reduction of a matrix to echelon form, but when steps equivalent to Step 6 are reached, in addition to subtracting multiples of the row containing a leading entry 1 from the rows below to reduce to zero all elements in the column below the leading entry, this same process must be repeated to reduce to zero all elements in the column above the leading entry. 2. An equivalent approach is ﬁrst to reduce the matrix to echelon form and then, starting with row 2 and working downwards, to subtract multiples of successive rows from the rows above to generate columns with leading entries to ones with the single nonzero entry 1. Each of these methods reduces a matrix to its row-reduced echelon form. The row equivalence of a matrix with either its echelon or its row-reduced echelon form means that the different-looking systems of equations represented by these three matrices all have identical solution sets. The simpliﬁed structure of the row echelon and row-reduced echelon forms of the original augmented matrix makes the solution of the associated system of equations particularly easy, as can be seen from the following examples. EXAMPLE 3.20 Reduce the following matrix to its echelon and its row-reduced echelon form: ⎡ 0 ⎢2 ⎢ ⎣1 1 1 4 2 3 2 8 4 6 0 2 2 1 ⎤ 3 4⎥ ⎥. 2⎦ 5
• 150 Chapter 3 Matrices and Systems of Linear Equations ⎡ 0 ⎢2 Solution ⎢ ⎣1 1 1 4 2 3 2 8 4 6 0 2 2 1 ⎡ 1 ⎢0 ∼ ⎢ divide row 1 by 2 ⎣1 1 ⎤ ⎡ ⎤ 3 2 4 8 2 4 ∼ 4⎥ switch rows ⎢0 1 2 0 3⎥ ⎥ ⎢ ⎥ 2⎦ 2 and 1 ⎣1 2 4 2 2⎦ 5 1 3 6 1 5 ⎤ ⎡ 2 4 1 2 1 ∼ 1 2 0 3⎥ subtract row 1 ⎢0 ⎥ ⎢ 2 4 2 2⎦ from rows 3 and 4 ⎣0 3 6 1 5 0 ⎡ 1 ∼ ⎢0 subtract row 2 ⎢ ⎣0 from row 4 0 2 1 0 0 4 2 0 0 2 1 0 1 4 2 0 2 1 0 1 0 ⎤ 2 3⎥ ⎥ 0⎦ 3 ⎤ 2 3⎥ ⎥ 0⎦ 0 1 0 1 0 and the matrix is now in echelon form. Having already obtained the echelon form of the matrix, we now use it to obtain the row-reduced echelon form. We already have ⎡ ⎤ ⎡ ⎤ 0 1 2 0 3 1 2 4 1 2 ∼ ⎢2 4 8 2 4⎥ ⎢0 1 2 0 3⎥ ⎢ ⎥ ⎢ ⎥subtract twice row 2 ⎣1 2 4 2 2⎦ ∼ ⎣0 0 0 1 0⎦ from row 1 1 3 6 1 5 0 0 0 0 0 ⎤ ⎡ ⎡ ⎤ 1 0 0 0 −4 1 0 0 1 −4 ∼ ⎢0 1 2 0 3⎥ 3⎥ subtract row 3 ⎢0 1 2 0 ⎥, ⎢ ⎢ ⎥ ⎦ ⎣0 0 0 1 ⎣0 0 0 1 0⎦ 0 from row 1 0 0 0 0 0 0 0 0 0 0 and the matrix is now in its row-reduced echelon form. EXAMPLE 3.21 Solve the system of equations x2 + 2x3 = 3 2x1 + 4x2 + 8x3 + 2x4 = 4 x1 + 2x2 + 4x3 + 2x4 = 2 x1 + 3x2 + 6x3 + x4 = 5. Solution The augmented matrix (A, b) for this system is the matrix in Example 3.20 that was shown to be equivalent to the row-reduced echelon form ⎤ ⎡ 1 0 0 0 −4 ⎢ ⎥ ⎢0 1 2 0 3⎥ ⎢ ⎥. ⎢0 0 0 1 0⎥ ⎦ ⎣ 0 0 0 0 0 If we recall that the ﬁrst four columns of this matrix contain the coefﬁcients of x1 , x2 , x3 , and x4 , while the last column contains the nonhomogeneous terms, the matrix implies the much simpler system of equations x4 = 0, x2 + 2x3 = 3, and x1 = −4.
• Section 3.5 The Echelon and Row-Reduced Echelon Forms of a Matrix 151 As there are only three equations connecting four unknowns, it follows that in the second equation either x2 or x3 can be assigned arbitrarily, so if we choose to set x3 = k (an arbitrary number), the solution set of the system in terms of the parameter k becomes x1 = −4, x2 = 3 − 2k, x3 = k, and x4 = 0. The same solution could have been obtained from the echelon form of the matrix ⎡ ⎤ 1 2 4 1 2 ⎢ ⎥ ⎢0 1 2 0 3⎥ ⎢ ⎥ ⎢0 0 0 1 0⎥ , ⎣ ⎦ 0 0 0 0 0 because this implies the system of equations x1 + 2x2 + 4x3 + x4 = 2, x2 + 2x3 = 3, and x4 = 0. Starting from the last equation we ﬁnd x4 = 0, and setting x3 = k in the middle equation gives, as before, x2 = 3 − 2k. Finally, substituting x2 , x3 , and x4 in the ﬁrst equation gives x1 = −4. This process of arriving at a solution of a system of equations whose coefﬁcient matrix is in upper triangular form is called back substitution. It should be noticed that the system of equations would have had no solution if the row-reduced echelon form had been ⎤ ⎡ 1 0 0 0 −4 ⎢ ⎥ ⎢0 1 2 0 3⎥ ⎥. ⎢ ⎢0 0 0 1 0⎥ ⎦ ⎣ 5 0 0 0 0 back substitution This is because although the equations corresponding to the ﬁrst three rows of this matrix would have been the same as before, the fourth row implies 0 = 5, which is impossible. This corresponds to a system of equations where one equation contradicts the others, so that no solution is possible. Summary This section deﬁned two related types of fundamental matrix that can be obtained from a general matrix by means of elementary row operations. The ﬁrst was a reduction to echelon form and the second, derived from the ﬁrst form, was a reduction to row-reduced echelon form. Each of the reduced forms retains the essential properties of the original matrix, while simplifying the task of solving the associated system of linear algebraic equations. EXERCISES 3.5 Let P, Q, and R be the matrices ⎡ 3 P = ⎣0 0 0 1 0 ⎤ 0 0⎦ , 1 ⎡ 0 Q = ⎣0 1 0 1 0 ⎤ 1 0⎦ , 0 ⎡ 1 R = ⎣0 0 2 1 0 ⎤ 0 0⎦ . 1 In Exercises 1 through 4 verify by direct calculation that (a) premultiplication by P multiplies row 1 by 3; (b) premultiplication by Q interchanges rows 1 and 3; and (c) premultiplication by R adds twice row 2 to row 1. ⎤ ⎤ ⎡ ⎡ 4 0 1 2 1 1 3. ⎣2 0 3⎦. 1. ⎣1 3 0⎦. 1 2 5 1 2 4 ⎡ ⎤ ⎡ ⎤ 9 1 3 1 −1 2 1 3⎦. 4. ⎣2 4 7⎦. 2. ⎣2 1 2 2 3 0 7
• 152 Chapter 3 Matrices and Systems of Linear Equations In Exercises 5 and 6 write down the required elementary matrices. 5. When I is the 3 × 3 unit matrix, write down E12 , E2 (3), and E12 (6). 6. When I is the 4 × 4 unit matrix, write down E41 , E4 (3), and E23 (4). In Exercises 7 through 12, reduce the given matrices to their row-reduced echelon form. ⎤ ⎡ ⎤ ⎡ 3 2 1 1 0 3 4 1 ⎢ 2 5 1 2⎥ 7. ⎣3 1 2 2⎦. ⎥ ⎢ 1 5 2 1 10. ⎢3 1 1 3⎥. ⎥ ⎢ ⎣ 0 1 3 4⎦ ⎤ ⎡ 4 1 3 1 3 2 1 3 1 8. ⎣2 1 1 2 0⎦. ⎤ ⎡ 2 2 4 1 4 3 2 1 1 0 ⎢1 1 3 2 1⎥ ⎤ ⎡ ⎥ 11. ⎢ 4 −2 2 3 1 ⎣3 2 5 1 4⎦. ⎦. ⎣2 0 0 3 2 9. 1 0 3 1 2 4 1 2 5 1 ⎤ ⎡ 3 2 3 2 12. ⎣3 7 1 −1⎦. 5 1 1 3 3.6 In Exercises 13 through 18, reduce the given augmented matrices to their row-reduced echelon form and, where appropriate, use the result to solve the related system of equations in terms of an appropriate number of the unknowns x1 , x2 , . . . . ⎤ ⎤ ⎡ ⎡ 2 3 1 0 2 1 0 2 1 ⎥ ⎥ ⎢ ⎢ ⎢1 3 1 4 2⎥ 13. ⎢1 3 1 4⎥. ⎥ ⎢ ⎦ ⎣ 16. ⎢ ⎥. ⎢2 1 2 3 1⎥ 6 9 4 8 ⎦ ⎣ ⎤ ⎡ 4 7 4 11 7 2 1 1 0 ⎥ ⎢ ⎤ ⎡ 14. ⎢2 3 1 4⎥. 1 0 1 0 2 0 ⎦ ⎣ ⎥ ⎢ ⎢2 2 6 0 6 0⎥ 4 9 4 8 ⎥ ⎢ 17. ⎢ ⎥. ⎡ ⎤ ⎢1 0 1 1 6 0⎥ 0 2 1 1 1 ⎦ ⎣ ⎢ ⎥ ⎢1 3 1 2 1⎥. 3 2 7 0 8 2 15. ⎣ ⎦ ⎡ ⎤ 3 9 4 3 0 3 0 6 0 6 ⎢ ⎥ 18. ⎢1 1 5 1 9 ⎥. ⎣ ⎦ 2 0 4 2 10 Row and Column Spaces and Rank The reduction of an m × n matrix A to either its echelon or its row-reduced echelon form will produce a row of zeros whenever the row is a linear combination of some (or all) of the rows above it. So if an echelon form contains r ≤ m nonzero rows, it follows that these r rows are linearly independent, and hence that the remaining m − r rows are linearly dependent on the ﬁrst r rows. The number r is called the row rank of matrix A. This means that if the r nonzero rows of an echelon form u1 , u2 , . . . , ur are regarded as n element row vectors belonging to a vector space Rn , the r vectors will span a subspace of Rn . Consequently, as these vectors form a basis for this subspace, every vector in it can be expressed as a linear combination of the form a1 u1 + a2 u2 + · · · + ar ur , row and column ranks and spaces where the a1 , a2 , . . . , ar are scalar constants. This subspace of Rn is called the row space of matrix A. It should be remembered that the vectors forming a basis for a space are not unique, and that any basis can be transformed to any other one by means of suitable linear combinations of the vectors involved. So although the r nonzero rows of the echelon form of A and those of its row-reduced echelon form look different, they are equivalent, and each forms a basis for the row space of A. Just as there may be linear dependence between the rows of A, so also may there be linear dependence between its columns. If s of the n columns of an m × n matrix A are linearly independent, the number s is called the column rank of matrix A. When the s nonzero columns v1 , v2 , . . . , vs are regarded as m element column vectors belonging to a vector space Rm, these vectors will span a subspace of Rm.
• Section 3.6 Row and Column Spaces and Rank 153 Consequently, as these vectors form a basis for this subspace, every vector in it can be expressed as a linear combination of the form b1 v1 + b2 v2 + · · · + bs vs , where the b1 , b2 , . . . , bs are scalar constants. This subspace of Rm is called the column space of matrix A. The connection between the row and column ranks of a matrix is provided by the following theorem. THEOREM 3.7 equality of the rank of a matrix and its transpose The equality of the row and column ranks Let A be any matrix. Then the row rank and column rank of A are equal. Proof Let an m × n matrix A have row rank r . Then in its row-reduced echelon form it must contain r columns v1 , v2 , . . . , vr , in each of which only the single nonzero entry 1 appears. Call these columns the leading columns of the row-reduced echelon form, and let them be arranged so that in the ith column v, the entry 1 appears in the ith row. The row-reduced echelon form of A will comprise the leading columns arranged in numerical order with, possibly, columns between the ith and the (i + 1)th leading columns in which zero elements lie below the ith row but nonzero elements may occur above it. Furthermore, there may be columns to the right of column vr in which zero elements lie below the r th row but nonzero elements may lie above it. By subtracting suitable multiples of the leading columns from any columns that lie between them or to the right of vr , it is possible to reduce all entries in such columns to zero. Consequently, at the end of this process, the only remaining nonzero columns will be the r linearly independent leading columns v1 , v2 , . . . , vr . This establishes the equality of the row and column ranks. Rank The rank of matrix A, denoted by rank (A), is the value common to the row and column ranks of A. THEOREM 3.8 Rank of A and AT Let A be any matrix. Then rank (A) = rank (AT ). Proof The columns of A are the rows of AT , so the column rank of A is the row rank of AT . However, by Theorem 3.7 these two ranks are equal, so the result is proved. EXAMPLE 3.22 Let ⎡ 1 ⎢2 A=⎢ ⎣1 1 0 1 0 0 3 7 3 3 0 0 2 0 4 10 6 4 ⎤ 0 1⎥ ⎥. 4⎦ 0
• 154 Chapter 3 Matrices and Systems of Linear Equations Then the row-reduced echelon form of A is B (B ∼ A) ⎡ ⎤ 1 0 3 0 4 0 ⎢0 1 1 0 2 1⎥ ⎥ B=⎢ ⎣0 0 0 1 1 2⎦, 0 0 0 0 0 0 showing that the number of leading columns is 3, so the row rank of A is 3, and hence its rank is 3. Three row vectors spanning a subspace of R6 , and so forming a basis for this subspace, are the three nonzero row vectors in this 4 × 6 matrix, u1 = [1, 0, 3, 0, 4, 0], u2 = [0, 1, 1, 0, 2, 1], The row-reduced echelon form of AT is ⎡ 1 0 ⎢0 1 ⎢ ⎢0 0 ⎢ ⎢0 0 ⎢ ⎣0 0 0 0 and u3 = [0, 0, 0, 1, 1, 2]. ⎤ 1 0⎥ ⎥ 0⎥ ⎥, 0⎥ ⎥ 0⎦ 0 0 0 1 0 0 0 showing that the number of leading columns is 3, conﬁrming as would be expected that the column rank of A (the row rank of AT ) is 3. The three row vectors of AT spanning a subspace of R4 , and so forming a basis for this subspace, are the three nonzero rows in this 6 × 4 matrix, namely, [1, 0, 0, 1], [0, 1, 0, 0], and [0, 0, 1, 0]. The three linearly independent column vectors of A are obtained by transposing these vectors to obtain ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 0 0 ⎢0⎥ ⎢1⎥ ⎢0⎥ v1 = ⎢ ⎥ , v2 = ⎢ ⎥ , and v3 = ⎢ ⎥ . ⎣0⎦ ⎣0⎦ ⎣1⎦ 1 0 0 Summary This section introduced the important algebraic concepts of the rank of a matrix, and of the row and column spaces of a matrix. The equality of the row and column ranks of a matrix was then proved. It will be seen later that the rank of a matrix plays a fundamental role when we seek a solution of a linear algebraic system of equations. EXERCISES 3.6 In Exercises 1 through 14 ﬁnd the row-reduced echelon form of the given matrix, its rank, a basis for its row space, and a basis for its column space. ⎤ ⎡ ⎡ ⎤ 1 3 2 1 1 3 1 0 1 1 ⎢2 0 2 1⎥ 1. ⎣2 2 1 0 0 1⎦ . ⎥ 2. ⎢ ⎣1 0 4 5⎦ . 0 2 1 4 1 3 0 1 2 4 ⎡ 3 ⎢4 3. ⎢ ⎣2 3 4. 2 1 0 2 1 0 0 2 0 0 3 2 0 0 ⎡ ⎤ 6 0 11 3⎥ ⎥. 4 0⎦ 6 3 1 1 0 4 2 1 4 . 2 1 5. ⎣2 3 ⎡ 3 6. ⎣1 8 2 3 2 2 2 8 ⎤ 3 1⎦ . 1 ⎤ 4 2 ⎦. 12
• Section 3.7 ⎡ 1 ⎢3 7. ⎢ ⎣2 0 ⎡ 2 ⎢1 ⎢ 8. ⎢1 ⎢ ⎣3 2 3.7 3 0 3 3 ⎤ 4 4⎥ ⎥. 1⎦ 5 1 1 2 3 3 3 0 1 4 1 ⎤ 1 3⎥ ⎥ 0⎥ . ⎥ 1⎦ 3 ⎡ 1 9. ⎣2 3 2 1 3 1 0 1 4 5 1 2 5 7 ⎡ 2 10. ⎣0 2 4 2 6 0 1 1 10 3 13 The Solution of Homogeneous Systems of Linear Equations ⎤ 7 1⎦ . 8 ⎤ 8 1⎦ . 9 ⎡ 0 −1 ⎢ 0 ⎢0 11. ⎢ ⎢0 0 ⎣ 1 0 ⎡ 1 0 ⎢ 1 −1 ⎢ 12. ⎢ ⎢2 5 ⎣ 1 3 4 3 ⎤ ⎥ 1 2⎥ ⎥. 0 −1 ⎥ ⎦ 0 0 ⎤ 0 0 ⎥ 0 0⎥ ⎥. −1 0 ⎥ ⎦ 2 1 ⎡ 1 ⎤ 7 2 0 5 7⎥. ⎦ 0 0 3 5 0 3 1 1 2 3 3 4 4 5 7 ⎢ 13. ⎣ 0 0 ⎡ 1 ⎢2 ⎢ ⎢ 14. ⎢ 1 ⎢ ⎢3 ⎣ 4 155 ⎤ 1⎥ ⎥ ⎥ 2⎥. ⎥ 3⎥ ⎦ 5 The Solution of Homogeneous Systems of Linear Equations Having now introduced the echelon and row-reduced echelon forms of an m × n matrix A, we are in a position to discuss the nature of the solution set of the system of linear equations homogeneous and nonhomogeneous systems of equations a11 x1 + a12 x2 + · · · + a1n xn = b1 a21 x1 + a22 x2 + · · · + a2n xn = b2 · · · · · · · · · am1 x1 + am2 x2 + · · · + amn xn = bm, (20) which will be nonhomogeneous when at least one of the terms bi on the right is nonzero, and homogeneous when b1 = b2 = · · · = bm = 0. In this section we will only consider homogeneous systems. Rather than working with the full system of homogeneous equations corresponding to bi = 0, i = 1, 2, . . . , m in (20), it is more convenient to work with its coefﬁcient matrix ⎡ a11 ⎢ a21 A=⎢ ⎣ am1 a12 a22 . . . am2 . . . . . . . . ⎤ . a1n . a2n ⎥ ⎥, ⎦ . . amn (21) which contains all the information about the system. The coefﬁcients in the ﬁrst column of A are multipliers of x1 , those in the second column are multipliers of x2 , . . . , and those in the nth column are multipliers of xn . Denote by AE either the echelon or the row-reduced echelon form of the coefﬁcient matrix A. Then, as elementary row operations performed on a coefﬁcient matrix are equivalent in all respects to performing the same operations on the corresponding full system of equations, the solution set of the matrix equation Ax = 0 (22) will be the same as the solution set of an echelon form of the homogeneous equations AE x = 0. (23)
• 156 Chapter 3 Matrices and Systems of Linear Equations trivial solution It is obvious that x = 0, corresponding to x = [0, 0, . . . , 0]T , is always a solution of (22) and, of course of (23), and it is called the trivial solution of the homogeneous system of equations. To discover when nontrivial solutions exist it is necessary to work with the equivalent echelon form of the equations given in (23). If rank(A) = r , the ﬁrst r rows of AE will be nonzero rows, and the last m − r rows will be zero rows. As there are m rows in A, we must consider the three separate cases (a) m < n, (b) m = n, and (c) m > n. Case (a): m < n. In this case there are more variables than equations. As rank(A) = r , and there are m equations, it follows that r = rank(A) ≤ m. The system in (22) will thus contain only r linearly independent equations corresponding to the ﬁrst r rows of AE . So working with system (23), we see that r of the variables x1 , x2 , . . . , xn will be determined in terms of the remaining m − r variables regarded as parameters (see Example 3.23). Case (b): m = n. In this case the number of variables equals the number of equations. If rank(A) = r < n we have the same situation as in Case (a), and the variables x1 , x2 , . . . , xn will be determined by the system of equations in (23) in terms of the remaining m − r variables regarded as parameters. However, if r = n, only the trivial solution x = 0 is possible, because in this case AE becomes the unit matrix In , from which it follows directly that x = 0. Case (c): m > n. In this case the number of equations exceeds the number of variables and r = rank(A) ≤ n. This is essentially the same situation as in Case (b), because if r = rank(A) < n, the variables x1 , x2 , . . . , xn will be determined by the system of equations in (22) in terms of the remaining m − r variables regarded as parameters, while if rank(A) = n only the trivial solution x = 0 is possible. The practical determination of solution sets to homogeneous systems of linear equations is illustrated in the next example. EXAMPLE 3.23 Find the solution sets of the homogeneous systems of linear equations with coefﬁcient matrices given by: ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 2 3 6 1 1 2 1 7 0 1 3 2 ⎢1 4 2 2⎥ ⎥ (a) A = ⎣3 6 4 24 3⎦, (b) A = ⎣2 1 0⎦, (c) A = ⎢ ⎣4 11 10 5⎦, 1 4 4 12 3 1 2 1 1 0 1 1 ⎡ ⎤ 1 4 1 2 ⎡ ⎤ ⎢1 3 0 1⎥ 1 2 3 1 4 3 ⎢ ⎥ (d) A = ⎢2 1 1 1⎥ , (e) A = ⎣0 1 3 0 1 5⎦ ⎢ ⎥ ⎣4 9 3 5⎦ 3 1 2 3 1 4 5 5 2 3 Solution (a) The row-reduced echelon form of the matrix is ⎡ ⎤ 1 0 0 8 3 AE = ⎣0 1 0 −2 −3⎦ , 0 0 1 3 3
• Section 3.7 The Solution of Homogeneous Systems of Linear Equations 157 showing that rank(A) = 3. This corresponds to the following three equations between the ﬁve variables x1 , x2 , x3 , x4 , and x5 : x1 + 8x4 + 3x5 = 0, x2 − 2x4 − 3x5 = 0, x3 + 3x4 + 3x5 = 0. and Letting x4 = α and x5 = β be arbitrary numbers (parameters) allows the solution set to be written x1 = −8α − 3β, x2 = 2α + 3β, x3 = −3α − 3β, x4 = α, x5 = β. (b) The row-reduced echelon form of the matrix is ⎡ ⎤ 1 0 0 AE = ⎣0 1 0⎦ , 0 0 1 showing that rank(A) = 3. This corresponds to the trivial solution x1 = x2 = x3 = 0. (c) The row-reduced echelon form of the matrix is ⎡ ⎤ 1 0 0 20/13 ⎢0 1 0 5/13 ⎥ ⎥ AE = ⎢ ⎣0 0 1 −7/13 ⎦ , 0 0 0 0 showing that rank(A) = 3. This corresponds to the solution set x1 + (20/13)x4 = 0, x2 + (5/13)x4 = 0, and x3 − (7/13)x4 = 0. Setting x4 = k, an arbitrary number (a parameter), shows the solution set to be given by x1 = −(20/13)k, x2 = −(5/13)k, x3 = (7/13)k, and x4 = k. (d) The row-reduced echelon form of the matrix is ⎡ ⎤ 1 0 0 0 ⎢0 1 0 1/3⎥ ⎢ ⎥ AE = ⎢0 0 1 2/3⎥ , ⎢ ⎥ ⎣0 0 0 0 ⎦ 0 0 0 0 showing that rank(A) = 3. This corresponds to the following three equations for the four variables x1 , x2 , x3 , and x4 : x1 = 0, x2 + (1/3)x4 = 0, and x3 + (2/3)x4 = 0. Setting x4 = k, an arbitrary number (a parameter), shows the solution set to be given by x1 = 0, x2 = −k/3 = 0, x3 = −2k/3, and x4 = k. (e) The row-reduced echelon form of the matrix is ⎡ ⎤ 1 0 0 1 −1/4 1/2 AE = ⎣0 1 0 0 13/4 −5/2⎦ , 0 0 1 0 −3/4 5/2 showing that rank(A) = 3. This corresponds to the following three equations for the six variables x1 to x6 : x1 + x4 − (1/4)x5 + (1/2)x6 = 0, x2 + (13/4)x5 − (5/2)x6 = 0 x3 − (3/4)x5 + (5/2)x6 = 0.
• 158 Chapter 3 Matrices and Systems of Linear Equations Setting x4 = α, x5 = β, and x6 = γ , where α, β, and γ are arbitrary numbers (parameters), shows the solution set to be given by x1 = −α + (1/4)β − (1/2)γ , x2 = −(13/4)β + (5/2)γ , x4 = α, x5 = β, and x6 = γ . Summary x3 = (3/4)β − (5/2)γ This section made use of the rank of a matrix to determine when a nontrivial solution of a linear system of homogeneous linear algebraic equations exists and, when it does, its precise form. EXERCISES 3.7 In Exercises 1 through 10, use the given form of the matrix A to ﬁnd the solution set of the associated homogeneous linear system of equations Ax = 0. ⎤ ⎡ ⎤ ⎡ 1 2 4 1 1 3 2 1 1 ⎢0 3 1 3⎥ 1. ⎣1 1 0 1 2⎦. ⎥ 3. ⎢ ⎣1 4 1 3⎦. 0 1 2 1 3 ⎤ ⎡ 2 6 5 4 1 2 0 1 1 ⎤ ⎡ ⎢0 3 1 0 1⎥ 1 2 1 0 ⎥ 2. ⎢ ⎣2 0 2 0 1⎦. ⎢2 1 0 1⎥ ⎥ 4. ⎢ 1 0 3 1 1 ⎣0 3 5 1⎦. 1 0 1 5 3.8 ⎡ 1 ⎢2 ⎢ 5. ⎢1 ⎢ ⎣3 2 ⎡ 2 ⎢1 ⎢ 6. ⎢0 ⎢ ⎣1 0 ⎡ 1 ⎢0 ⎢ 7. ⎣ 1 2 3 1 0 1 3 ⎡ ⎤ 4 3⎥ ⎥ 2⎥. ⎥ 1⎦ 1 1 2 1 3 4 1 3 4 1 1 ⎤ 3 0⎥ ⎥ 2⎥. ⎥ 2⎦ 1 5 1 2 3 2 4 1 0 2 1 0 1 1 0 0 1 3 1 2 0 ⎤ 2 1⎥ ⎥. 0⎦ 2 1 ⎢2 ⎢ 8. ⎣ 5 2 ⎡ 1 9. ⎣2 0 ⎡ 1 ⎢2 10. ⎢ ⎣0 1 4 1 6 1 1 3 7 0 ⎤ 0 1⎥ ⎥. 2⎦ 1 1 3 1 5 1 0 0 2 1 3 5 1 0 2 1 2 3 1 0 0 1 ⎤ 0 1 1 3⎦. 3 0 ⎤ 1 2⎥ ⎥. 3⎦ 2 The Solution of Nonhomogeneous Systems of Linear Equations We now turn our attention to the solution of the nonhomogeneous system of equations in (20) that may be written in the matrix form Ax = b, (24) where A is an m × n matrix and b is an m × 1 nonzero column vector. In many respects the arguments we now use parallel the ones used when seeking the form of the solution set for a homogeneous system, but there are important differences. This time, rather than working with the matrix A, we must work with the augmented matrix (A, b) and use elementary row operations to transform it into either an echelon or a row-reduced echelon form that will be denoted by (A, b)E . When this is done, system (24) and the echelon form corresponding to (A, b)E will, of course, each have the same solution set. It is important to recognize that rank(A) is not necessarily equal to rank (A, b)E , so that in general rank(A) ≤ rank((A, b)E ). The signiﬁcance of this observation will become clear when we seek solutions of systems like (24).
• Section 3.8 The Solution of Nonhomogeneous Systems of Linear Equations 159 Case (a): m < n. In this case there are more variables than equations, and it must follow that rank((A, b)E ) ≤ m. If rank(A) = rank((A, b)E ) = r , it follows that r of the equations in (24) are linearly independent and m − r are linear combinations of these r equations. This means that the ﬁrst r rows of (A, b)E are linearly independent while the last m − r rows are rows of zeros. Thus, r of the variables x1 to xn will be determined by the equations corresponding to these r nonzero rows, in terms of the remaining m − r variables as parameters. It can happen, however, that rank(A) = r < rank((A, b)E ), and then the situation is different, because one or more of the rows following the r th row will have zeros in its ﬁrst n entries and nonzero numbers for their last entries. When interpreted as equations, these will imply contradictions, because they will assert expressions such as 0 = c with c = 0 that are impossible. Thus, no solution will exist if rank(A) = rank((A, b)E ). Case (b): m = n. In this case the number of variables equals the number of equations, and it must follow that rank((A, b)E ) ≤ n. The situation now parallels that of Case (a), because if rank(A) = rank((A, b)E ) = r < m, then r of the equations in (24) will be linearly independent, while m − r will be linear combinations of these r equations. So, as before, the ﬁrst r rows of (A, b)E will be linearly independent while the last m − r rows will be rows of zeros. Thus, r of the variables x1 to xn will be determined by the equations corresponding to these r nonzero rows in terms of the remaining m − r variables as parameters. In the case r = n, the solution will be unique, because then AE = I. Finally, if rank(A) = rank((A, b)E ), it follows, as in Case (a), that no solution will exist. Case (c): m > n. In this case there are more equations than variables, and it must follow that rank((A, b)E ) ≤ n. If rank(A) = rank((A, b)E ) = r , it follows, as in Case (b), that r of the equations in (24) are linearly independent while m − r are linear combinations of these r equations. Thus, again, the ﬁrst r rows of (A, b)E will be linearly independent while the last m − r rows will be rows of zeros. Consequently, r of the variables x1 to xn will be determined by the equations corresponding to these r nonzero rows in terms of the remaining m − r variables as parameters. If rank(A) = rank((A, b)E ), then as before no solution will exist. These considerations bring us to the deﬁnition of consistent and inconsistent systems of nonhomogeneous equations, with consistent systems having solutions, sometimes in terms of parameters, and inconsistent systems have no solution. consistent and inconsistent systems Consistent and inconsistent nonhomogeneous systems The nonhomogeneous system Ax = b is said to be consistent when it has a solution; otherwise, it is said to be inconsistent. As with homogeneous systems, the practical determination of solution sets of nonhomogeneous systems of linear equations will be illustrated by means of examples.
• 160 Chapter 3 Matrices and Systems of Linear Equations EXAMPLE 3.24 Find the solution sets for each of the following augmented matrices (A, b), where the matrices A are those given in Example 3.23. ⎡ ⎤ ⎡ ⎤ 1 2 1 7 0 1 1 3 2 2 ⎥ ⎢ ⎥ ⎢ 1⎦ (a) (A, b) = ⎣3 6 4 24 3 0⎦ (b) (A, b) = ⎣2 1 0 1 2 1 −3 1 4 4 12 3 3 ⎤ ⎡ 1 4 1 2 2 ⎡ ⎤ 2 3 6 1 2 ⎢1 3 0 1 0⎥ ⎥ ⎢ ⎥ ⎢ ⎢1 4 ⎥ ⎢ 2 2 3⎥ ⎢ ⎥ (d) (A, b) = ⎢2 1 1 1 3⎥ (c) (A, b) = ⎢ ⎥ ⎥ ⎢ ⎥ ⎢ ⎣4 11 10 5 1⎦ ⎣4 9 3 5 7⎦ 1 0 1 1 2 5 5 2 3 0 ⎡ ⎤ 1 2 3 1 4 3 −2 ⎢ ⎥ 0⎦ . (e) (A, b) = ⎣0 1 3 0 1 5 3 1 2 3 1 4 1 Solution (a) In this case, ⎡ 1 0 0 ⎢ (A, b)E = ⎣0 1 0 0 0 1 8 3 −2 −3 3 3 −7 ⎤ ⎥ 11/2⎦ . −3 As rank(A, b)E = 3, and the rank of matrix A is the rank of the matrix formed by deleting the last column of (A, b)E , it follows that rank(A) = 3. So rank(A, b)E = rank(A), showing the equations to be consistent, so they have a solution. If we remember that the ﬁrst column contains the coefﬁcients of x1 , the second column the coefﬁcients of x2 , . . . , and the ﬁfth column the coefﬁcients of x5 , while the last column contains the nonhomogeneous terms, we can see that the matrix (A, b)E is equivalent to the three equations x1 + 8x4 + 3x5 = −7, x2 − 2x4 − 3x5 = 11/2, x3 + 3x4 + 3x5 = −3. So, if we set x4 = α and x5 = β, with α and β arbitrary numbers (parameters), the solution set becomes x1 = −8α − 3β − 7, x2 = 2α + 3β + 11/2, x4 = α and x5 = β. x3 = −3α − 3β − 3, (b) In this case, ⎡ 1 ⎢ (A, b)E = ⎣0 0 0 0 1 0 0 1 ⎤ 9 ⎥ −17⎦ . 22 Here A is a 3 × 3 matrix and rank(A) = rank((A, b)E ) = 3, so the equations are consistent and the solution is unique. The solution set is seen to be x1 = 9, x2 = −17, and x3 = 22.
• Section 3.8 The Solution of Nonhomogeneous Systems of Linear Equations 161 (c) In this case, ⎡ 1 0 0 ⎢0 1 0 ⎢ (A, b)E = ⎢0 0 0 ⎢ ⎣0 0 0 0 0 0 20/13 5/13 −7/13 0 0 ⎤ 0 0⎥ ⎥ 0⎥ . ⎥ 1⎦ 0 This system has no solution because the equations are inconsistent. This follows from the fact that rank(A) = 3, as can be seen from the ﬁrst four columns, while the ﬁve columns show that rank((A, b)E ) = 4, so that rank(A) = rank((A, b)E ). The inconsistency can be seen from the contradiction contained in the last row, which asserts that 0 = 1. (d) In this case ⎡ ⎤ 1 0 0 0 0 ⎢0 1 0 1/3 0⎥ ⎢ ⎥ (A, b)E = ⎢0 0 1 2/3 0⎥ . ⎢ ⎥ ⎣0 0 0 0 1⎦ 0 0 0 0 0 This system also has no solution because the equations are inconsistent. This follows from the fact that rank(A) = 3 and rank((A, b)E ) = 4, so that rank(A) = rank((A, b)E ). The inconsistency can again be seen from the contradiction in the last row, which again asserts that 0 = 1. (e) In this case ⎡ 1 ⎢ (A, b)E = ⎣0 0 0 0 1 −1/4 1 0 0 13/4 −5/2 0 1 0 −3/4 5/2 1/2 5/8 ⎤ ⎥ −21/8⎦ , 7/8 showing that rank(A) = rank((A, b)E ) = 3, so the equations are consistent. Reasoning as in (a) and setting x4 = α, x5 = β, and x6 = γ , with α, β, and γ arbitrary numbers (parameters), shows the solution set to be given by x1 = −α + (1/4)β − (1/2)γ + 5/8, x2 = −(13/4)β + (5/2)γ − 21/8, x3 = (3/4)β − (5/2)γ + 7/8, x4 = α, x5 = β, x6 = γ . general solution of a nonhomogeneous system A comparison of the corresponding solution sets in Examples 3.23 and 3.24 shows that whenever the nonhomogeneous system has a solution, it comprises the sum of the solution set of the corresponding homogeneous system, containing arbitrary parameters, and numerical constants contributed by the nonhomogeneous terms. This is no coincidence, because it is a fundamental property of nonhomogeneous linear systems of equations. The combination of solutions comprising the sum of a solution of the homogeneous system Ax = 0 containing arbitrary constants, and a particular ﬁxed solution of the nonhomogeneous system Ax = b that is free from arbitrary constants, is called the general solution of a nonhomogeneous system. The result is important, so it will be recorded as a theorem.
• 162 Chapter 3 Matrices and Systems of Linear Equations THEOREM 3.9 General solution of a nonhomogeneous system The nonhomogeneous system of equations Ax = b for which rank(A) = rank ((A, b)E ) has a general solution of the form x = xH + x P , where xH is the general solution of the associated homogeneous system AxH = 0 and xP is a particular (ﬁxed) solution of the nonhomogeneous system AxP = b. Proof Let x be any solution of the nonhomogeneous system Ax = b, and let xP be a solution of the nonhomogeneous system AxP = b that contains no arbitrary constants (a ﬁxed solution). Then, as the equations are linear, A(x − xP ) = Ax − AxP = b − b = 0, showing that the difference xD = x − xP is itself a solution of the homogeneous system. Consequently, all solutions of the nonhomogeneous system are contained in the solution set of the homogeneous system to which xD belongs, and the theorem is proved. Summary This section used the rank of a matrix to determine when a solution of a linear system of nonhomogeneous equations exists and to determine its precise form. If the ranks of a matrix and an augmented matrix are equal, it was shown that a solution exists, furthermore, if there are n equations and the rank r < n, then r unknowns can be expressed in terms of arbitrary values assigned to the remaining n − r unknowns. The system was shown to have a unique solution when r = n, and no solution if the ranks of the matrix and the augmented matrix are different. EXERCISES 3.8 In Exercises 1 through 10 write down a system of equations with an appropriate number of unknowns x1 , x2 , . . . corresponding to the augmented matrix. Find the solution set when the equations are consistent, and state when the equations are inconsistent. ⎤ ⎡ ⎡ ⎤ 1 −2 1 3 11 1 3 1 1 0 ⎥ ⎢ ⎢ ⎥ ⎢1 1 3 2 1 ⎥ ⎢0 3 −2 1 11⎥ ⎢ ⎥ ⎢ ⎥ 3. ⎢ ⎥. ⎢ ⎥ ⎢1 1 0 3 1 ⎥ 1. ⎢2 1 0 4 23⎥ . ⎦ ⎣ ⎢ ⎥ ⎢ ⎥ 2 −1 2 21⎦ 2 0 2 1 0 ⎣3 1 −1 ⎡ 2 1 3 ⎢ 2. ⎢0 1 4 ⎣ 3 0 0 3 1 1 2 2 4 ⎤ 1 ⎥ 1⎥ . ⎦ 1 ⎡ 1 ⎢ 4. ⎢2 ⎣ 5 4 2 3 0 3 1 4 8 5 4 ⎤ ⎥ 2⎥ . ⎦ 8 ⎤ ⎡ 1 −1 2 −1 −4 ⎥ ⎢ 3 1 2 12⎥ ⎢2 ⎥ ⎢ ⎥ ⎢ 2 −2 3 15⎥ . 5. ⎢1 ⎢ ⎥ ⎢ ⎥ 1 −1 1 11⎦ ⎣3 1 ⎡ 1 ⎢ ⎢2 ⎢ ⎢ 6. ⎢0 ⎢ ⎢ ⎣2 ⎡ 1 −1 2 3 1 2 ⎤ 1 1 2 1 6 7 ⎥ 3⎥ ⎥ ⎥ 3⎥ . ⎥ ⎥ 5⎦ 1 −2 1 0 ⎤ 1 2 3 0 1 ⎢ ⎥ ⎢0 1 0 2 1 ⎥ ⎢ ⎥ 7. ⎢ ⎥. ⎢2 1 3 1 0 ⎥ ⎣ ⎦ 1 4 1 5 2 ⎡ 2 1 0 0 3 1 ⎤ ⎢ ⎥ 8. ⎢1 2 1 1 3 0⎥ . ⎣ ⎦ 0 1 2 5 1 2 3 ⎡ 1 ⎢ ⎢1 ⎢ 9. ⎢ ⎢2 ⎣ 0 ⎡ 1 2 1 1 2 1 1 3 5 4 ⎤ ⎥ 0⎥ ⎥ ⎥. 4⎥ ⎦ 1 3 1 1 2 1 ⎤ ⎢ ⎥ 10. ⎢1 −2 1 3 1 0⎥. ⎣ ⎦ 2 0 1 0 3 0
• Section 3.9 3.9 The Inverse Matrix 163 The Inverse Matrix multiplicative inverse matrix The operation of division is not deﬁned for matrices. However, we will see that n × n matrices A for which detA = 0 have associated with them an n × n matrix B, called its multiplicative inverse, with the property that AB = BA = I. The purpose of this section will be to develop ways of ﬁnding the multiplicative inverse of a matrix, which for simplicity is usually called the inverse matrix, but ﬁrst we give a formal deﬁnition of the inverse of a matrix. The inverse of a matrix Let A and B be two n × n matrices. Then matrix A is said to be invertible and to have an associated inverse matrix B if AB = BA = I. Interchanging the order of A and B in this deﬁnition shows that if B is the inverse of A, then A must be the inverse of B. To see that not all n × n matrices have inverses, it will be sufﬁcient to try to ﬁnd a matrix B such that the product AB = I, where A= 1 1 1 1 2 2 2 2 and B= a c b . d The product AB is AB = b a + 2c = d a + 2c a c b + 2d , b + 2d so if this product is to equal the 2 × 2 unit matrix I, it is necessary that a + 2c a + 2c b + 2d 1 = b + 2d 0 0 . 1 Equating corresponding elements in the ﬁrst columns shows that this can only hold if a + 2c = 1 and a + 2c = 0, while equating corresponding elements in the second columns shows that b + 2d = 0 and b + 2d = 1, which is impossible, so matrix A has no inverse. In this case detA = 0, and we will see later why the nonvanishing of detA is necessary if A is to have an inverse. Nonsingular and singular matrices singular and nonsingular n × n matrices EXAMPLE 3.25 An n × n matrix is said to be nonsingular when its inverse exists, and to be singular when it has no inverse. We have already seen that the matrix A= 1 1 2 , 2
• 164 Chapter 3 Matrices and Systems of Linear Equations for which detA = 0, has no inverse and so is singular. However, in the case of matrix A that follows, a simple matrix multiplication conﬁrms that it has associated with it an inverse B, where ⎡ ⎤ ⎡ ⎤ 1 0 1 2 1 −2 1 −1⎦ , A = ⎣−1 2 0⎦ and B = ⎣ 1 0 1 1 −1 −1 2 because AB = BA = I. Furthermore, detA = 0, so A is nonsingular, as is B, and each is the inverse of the other. Before proceeding further it is necessary to establish that, when it exists, the inverse matrix is unique. THEOREM 3.10 Uniqueness of the inverse matrix A nonsingular matrix A has a unique inverse. Proof Suppose, if possible, that the nonsingular n × n matrix A has the two different inverses B and C. Then as AC = I, we have B = BI = B(AC) = (BA)C = IC = C, showing that B = C, so the inverse matrix is unique. It is convenient to denote the inverse of a nonsingular n × n matrix A by the symbol A−1 . This is suggested by the exponentation notation (raising to a power), because if for the moment we write A = A1 , then AA−1 = A1 A−1 = I, showing that exponents may be combined in the usual way, with the understanding that A1 A−1 = A(1−1) = A0 = I. THEOREM 3.11 basic properties of the inverse matrix Basic properties of inverse matrices (i) (ii) (iii) (iv) The unit matrix I is its own inverse, so I = I−1 . If A is nonsingular, so also is A−1 , and (A−1 )−1 = A. If A is nonsingular, so also is AT , and (A−1 )T = (AT )−1 . If A and B are nonsingular n × n matrices, so is AB, and (AB)−1 = B−1 A−1 . (v) If A is nonsingular, then (A−1 )m = (Am)−1 for m = 1, 2, . . . . Proof We prove only (i) and (iv), and leave the proofs of (ii), (iii), and (v) as exercises. The proof of (i) is almost immediate, because I2 = I, showing that I = I−1 . To prove (iv) we premultiply B−1 A−1 by AB to obtain ABB−1 A−1 = AIA−1 = AA−1 = I, which shows that (AB)−1 is B−1 A−1 , so the proof is complete. A simple method of ﬁnding the inverse of an n × n matrix is by means of elementary row operations, but to justify the method we ﬁrst need the following theorem.
• Section 3.9 THEOREM 3.12 The Inverse Matrix 165 Elementary row operation matrices are nonsingular Every n × n matrix E that represents an elementary row operation is nonsingular. Proof Every n × n matrix E that represents an elementary row operation is derived from the unit matrix I by means of one of the three operations deﬁned at the start of Section 3.4. So, as rank(I) = n and E and I are row similar, it follows that rank(E) = n, and so E is also nonsingular. ﬁnding an inverse matrix using elementary row operations We can now describe an elementary way of ﬁnding an inverse matrix by means of elementary row transformations. Let A be a nonsingular n × n matrix, and let E1 , E2 , . . . , Em represent a sequence of elementary row operations of Types I, II, and III that reduces A to I, so that EmEm−1 . . . E2 E1 A = I. Then postmultiplying this result by A−1 gives EmEm−1 . . . E2 E1 I = A−1 , so A−1 is given by A−1 = EmEm−1 · · · E2 E1 I, where the product of the ﬁrst m matrices on the right is nonsingular because of Theorem 3.11. Expressed in words, this result states that when a sequence of elementary row operations is used to reduce a nonsingular matrix A to the unit matrix I, performing the same sequence of elementary row operations on I, in the same order, will generate the inverse matrix A−1 . If matrix A is singular, this will be indicated by the generation of either a complete row or a complete column of zeros before I is reached. If A is nonsingular, it is reducible to the unit matrix I, and clearly detA = 0. However, if A is singular, the attempt to reduce it to I will generate either a row or a column of zeros, so that then detA = 0. The vanishing or nonvanishing of detA provides a simple and convenient test for the singularity or nonsingularity of A whenever n is small, say n ≤ 3, because only then is it a simple matter to calculate detA. The practical way in which to implement this result is not to use the matrices Ei to reduce A to I, but to perform the operations directly on the rows of the partitioned matrix (A, I), because when A in the left half of the partitioned matrix has been reduced to I, the matrix I in the right half will have been transformed into A−1 . EXAMPLE 3.26 Use elementary row operations to ﬁnd A−1 given that ⎡ 1 A = ⎣ −1 0 0 2 1 ⎤ 1 0⎦. 1
• 166 Chapter 3 Matrices and Systems of Linear Equations Solution We form the augmented matrix (A, I) and proceed as described earlier. ⎡ ⎤ ⎤ ⎡ 1 0 1 1 0 0 1 0 1 1 0 0 ∼ ⎥ ⎢ ⎥ ⎢ (A, I) = ⎣−1 2 0 0 1 0⎦ add row 1 ⎣0 2 1 1 1 0⎦ to row 2 0 1 1 0 0 1 0 1 1 0 0 1 ⎡ 1 0 1 ∼ subtract row 3 ⎢0 1 0 ⎣ from row 2 0 1 1 1 0 1 1 0 0 ⎤ ⎡ 1 0 1 0 ∼ ⎥ subtract row 2 ⎢ −1⎦ ⎣0 1 0 from row 3 1 0 0 1 ⎡ 1 ∼ subtract row 3 ⎢0 ⎣ from row 1 0 0 0 2 1 1 0 1 1 0 1 −1 −2 −1 1 0 1 1 −1 −1 ⎤ 0 ⎥ −1⎦ 2 ⎤ ⎥ −1⎦ . 2 The 3 × 3 matrix on the left of this row-equivalent partitioned matrix is now the unit matrix I, so the required inverse matrix is the one to the right of the partition, namely, ⎡ ⎤ 2 1 −2 1 −1⎦ . A−1 = ⎣ 1 −1 −1 2 Once A−1 has been obtained, it is always advisable to check the result by verifying that AA−1 = I. Before proceeding further we will use elementary matrices to provide the promised proof of Theorem 3.4(viii). the proof that det(AB) = detA detB Proof that det(AB) = detA detB Let E1 be a row matrix of Type I. Then if A is a nonsingular matrix, det(EI A) = −detA, because only a row interchange is involved. However, det(EI ) = −1, so det(EI A) = detEI detA. Similar arguments show this to be true for elementary row operation matrices of the other two types, so if E is an elementary row operation of any type, then det(EA) = detEdetA. If detA = 0, premultiplication by a sequence of elementary row operation matrices E1 , E2 , . . . , Er will reduce A to I, so performing them on I in the reverse order allows us to write A = E1 E2 . . . Er I = E1 E2 . . . Er . A repetition of the result det(EA) = detEdetA shows that detA = detE1 detE2 . . . detEr . If B is conformable for multiplication with A, using the preceding result we have det(AB) = det(E1 E2 . . . Er B) = detE1 detE2 . . . detEr detB, but detE1 detE2 . . . detEn = detA, and so det(AB) = detAdetB.
• Section 3.9 The Inverse Matrix 167 To complete the proof we must show this result remains true if A is singular, in which case detA = 0. When detA = 0, the attempt to reduce it to the unit matrix I by elementary row operation matrices will fail because at one stage it will produce a determinant in which a row will contain only zero elements. Consequently, a determinant detEm, say, will be zero, which is impossible, so det(AB) = 0. However, if detA = 0, then detAdetB = 0, so that once again det(AB) = detAdetB, and the result is proved. EXAMPLE 3.27 Use (a) elementary row operations and (b) the determinant test to show matrix A is singular, given that ⎡ ⎤ 1 1 0 A = ⎣1 0 1⎦ . 4 3 1 Solution (a) Using elementary row operations on the augmented matrix gives ⎡ ⎤ ⎡ ⎤ 1 1 0 1 0 0 1 1 0 1 0 0 ∼ ⎢ ⎥ ⎥ ⎢ (A, I) = ⎣1 0 1 0 1 0⎦ subtract row 1 ⎣0 −1 1 −1 1 0⎦ from row 2 4 3 1 0 0 1 0 0 1 4 3 1 ⎡ 1 1 0 ∼ subtract 4 times ⎢0 −1 1 ⎣ row 1 from row 3 0 −1 1 ⎡ 1 1 0 ∼ subtract row 2 ⎢0 −1 1 ⎣ from row 3 0 0 0 0 −1 1 ⎥ 0⎦ −4 0 1 1 0 −1 1 −3 −1 0 ⎤ 1 0 ⎤ ⎥ 0⎦ . 1 The reduction is terminated at this stage by the appearance of a row of zeros on the matrix to the left of the partition, showing that A cannot be reduced to I, and hence that A is singular. (b) Applying the determinant test to A, we ﬁnd that detA = 0, showing that A is singular. Although in this case this is by far the quickest way to establish the singularity of A, this would not have been so had the order of detA been much greater than 3. This is because when n > 3, the effort involved in performing the elementary row operations in an attempt to reduce A to I is considerably less than the effort involved when calculating detA. The following very different way of ﬁnding the inverse of an n × n matrix A is mainly of theoretical importance, though it is a practical method when n is small. The method is based on the properties of the sum of products of elements and cofactors of a determinant. Let A = [ai j ] be an n × n matrix, C = [Ci j ] be the associated n × n matrix of cofactors and form the matrix product ⎡ ⎤⎡ ⎤ a11 a12 . . . a1n C11 C21 . . . Cn1 ⎢a a22 . . . a2n ⎥ ⎢C12 C22 . . . Cn2 ⎥ ⎥⎢ ⎥ ACT = ⎢ 21 ⎣ . . . . . . . . . ⎦⎣ . . . . . . . . . ⎦. an1 an2 . . . ann C1n C2n . . . Cnn
• 168 Chapter 3 Matrices and Systems of Linear Equations If we write B = ACT , with B = [bi j ], it follows from the rule for matrix multiplication that bi j = ai1 C1 j + ai2 C2 j + · · · + ain Cnj . Thus, bi j is seen to be the sum of the product of the elements of the ith row of A and the corresponding cofactors of the elements of the jth row of A. It then follows from the Laplace expansion theorem for determinants that bi j = detA, for i = j = 1, 2, . . . , n and bi j = 0, for i = j. Using these results in the matrix product, we ﬁnd that ⎡ ⎤ detA 0 0 . . . 0 ⎢ 0 ⎥ detA 0 ⎢ ⎥ 0 detA . . . 0 ⎥ ACT = ⎢ 0 ⎢ ⎥ ⎣ ⎦ . . . . . . . . . . . . 0 0 0 . . . detA = detA I. Consequently, provided detA = 0, it follows that (1/detA)ACT = I. Writing this as A{(1/detA)CT } = I shows that A−1 = (1/detA)CT . adjoint matrix The matrix CT , called the adjoint of A and written adjA, is the transpose of the matrix of cofactors of A. So the formula for the inverse of A becomes A−1 = (1/detA)adjA. (25) We have arrived at the following deﬁnition and theorem. Adjoint matrix If A is an n × n matrix, and C is the associated matrix of cofactors, the transpose CT of the matrix of cofactors is called the adjoint of A and is written adjA. THEOREM 3.13 formal deﬁnition of an inverse matrix The inverse matrix in terms of the adjoint of A Let A be a nonsingular n × n matrix. Then the inverse of A is given by A−1 = (1/detA)adjA.
• Section 3.9 EXAMPLE 3.28 Use Theorem 3.13 to ﬁnd A−1 , given that ⎡ 1 3 A = ⎣2 1 1 0 Solution The matrix of cofactors ⎡ ⎤ 1 −1 −1 1 3⎦ , C = ⎣−3 3 −1 −5 The Inverse Matrix 169 ⎤ 0 1⎦ . 1 ⎡ 1 so CT = ⎣−1 −1 ⎤ −3 3 1 −1⎦ . 3 −5 Expanding detA in terms of the elements of its ﬁrst row (we already have its associated cofactors in the ﬁrst row of C) gives detA = 1 · 1 + (−1) · 3 + 1 · 0 = −2, so from Theorem 3.13, ⎤ ⎡ 1 3 −3 −2 2 2 ⎥ ⎢ 1 1⎥ . A−1 = (−1/2)CT = ⎢ 2 − 1 2 2⎦ ⎣ 5 3 1 −2 2 2 Although the result of Theorem 3.13 is of considerable theoretical importance, unless n is small, the task of evaluating the determinants involved makes it impractical for the determination of inverse matrices. In general, for large n, an inverse matrix is found by means of a computer using elementary row operations to reduce A to I. General Proof of Cramer’s Rule proof of Cramer’s rule for a system of n equations In conclusion, we will use Theorem 3.13 to arrive at a simple proof of Cramer’s rule for the system of equations a11 x1 + a12 x2 + · · · + a1n xn = b1 a21 x1 + a22 x2 + · · · + a2n xn = b2 · · · · · · · · · an1 x1 + an2 x2 + · · · + ann xn = bn . If we write the system as Ax = b, then, provided detA = 0, the solution can be written x = A−1 b = (1/detA)(adjA)b = (1/detA)CT b, where CT is the transpose of the matrix of cofactors of A. If x = (x1 , x2 , . . . , xn )T and b = (b1 , b2 , . . . , bn )T , the ith element of x is given by xi = (1/detA)(C1i b1 + C2i b2 + · · · + Cni bn ) for i = 1, 2, . . . , n. This is simply the expansion of detAi in terms of the elements of its ith column, where Ai is the matrix obtained from A by replacing the elements of the ith column by the elements of b. This has established that xi = detAi /detA, and the proof is complete. for i = 1, 2, . . . , n,
• 170 Chapter 3 Matrices and Systems of Linear Equations More information about the material in Sections 3.4 to 3.9 is to be found in the appropriate chapters of references [2.1], [2.5], and [2.7] to [2.12]. GABRIEL CRAMER (1704–1752): A Swiss mathematician who made many contributions to algebra and geometry. The result called Cramer’s rule was, in fact, ﬁrst formulated by Maclaurin around 1729 and published posthumously in his Treatise on Algebra (1748). The form of the rule attributed to Cramer appeared in his book Traite des courbes algebraiques (1750), which became a standard reference work during the remainder of the century. The work was so well written and so often quoted that after his death Cramer was, on occasions, considered to be the originator of the rule. Summary Division by matrices is not deﬁned, but the introduction of a multiplicative inverse A−1 of a nonsingular n × n matrix A, called the inverse of A, enables certain operations that in some sense are similar to matrix division to be performed. This section gave the formal deﬁnition of the inverse of a matrix and established its most important algebraic properties. The inverse matrix was used to prove Cramer’s rule for a general system of n nonhomogeneous linear algebraic equations when the determinant of the coefﬁcient matrix is nonsingular. EXERCISES 3.9 In Exercises 1 through 8, construct a suitable augmented matrix and ﬁnd the inverse of the given matrix using elementary row operations. ⎤ ⎡ ⎤ ⎡ 2 3 1 1 3 7 5. ⎣1 2 0⎦ . 1. ⎣2 1 −1⎦ . 2 4 1 2 1 5 ⎤ ⎡ ⎤ ⎡ −4 1 0 3 0 1 2. ⎣ 1 −3 1⎦ . 6. ⎣1 −1 1⎦ . 2 1 4 0 4 5 ⎤ ⎡ ⎤ ⎡ 1 1 3 1 2 0 1 ⎢1 3. ⎣5 2 1⎦ . 0 −3 4⎥ ⎥. 7. ⎢ 1 6 2 ⎣0 1 2 5⎦ ⎤ ⎡ 2 −1 2 2 2 −6 1 ⎤ ⎡ ⎦. ⎣1 3 4 4. 0 1 2 3 ⎢2 2 −4 2⎥ 0 −2 1 ⎥. 8. ⎢ ⎣1 3 0 1⎦ 3 1 1 0 9. Given that ⎡ 3 −1 4 A = ⎣1 2 1 ⎤ 1 0⎦ −3 and ⎡ 1 B = ⎣2 3 verify that (AB)−1 = B−1 A−1 . ⎤ −3 1 0 5⎦ , 1 2 10. Given that ⎡ 4 1 A = ⎣3 1 3 2 ⎤ 2 0⎦, verify that (A−1 )T = (AT )−1 and 1 (A−1 )2 = (A2 )−1 . In Exercises 11 through 16, use Theorem 3.13 to ﬁnd the inverse of the given matrix, and check the result by showing that AA−1 = I. ⎤ ⎡ ⎤ ⎡ −3 2 6 2 4 −5 7⎦ . 1⎦ . 14. ⎣ 2 −1 11. ⎣2 7 5 4 −2 1 3 4 ⎤ ⎡ ⎤ ⎡ 2 0 1 2 3 −7 8 ⎢3 1 3 4⎥ 4 3⎦ . 12. ⎣1 ⎥. 15. ⎢ ⎣1 0 −2 3⎦ 0 −5 1 ⎤ ⎡ 1 −2 2 7 9 2 1 ⎤ ⎡ 0 1 −4 1 13. ⎣1 4 10⎦ . ⎢3 7 5 2⎥ 3 1 2 ⎥. 16. ⎢ ⎣1 −2 6 0⎦ 0 1 3 1 In the following two exercises, use the determinant test to show the given matrix is singular, and then verify this by using elementary row operations applied to a suitable augmented matrix, as in Example 3.27. Compare the effort involved in each case. ⎤ ⎤ ⎡ ⎡ 0 2 1 0 1 3 0 1 ⎢ 1 1 3 0⎥ ⎢1 1 2 1⎥ ⎥. ⎥ 17. ⎢ 18. ⎢ ⎣2 1 4 2 ⎦ . ⎣1 1 2 5⎦ 4 3 10 2 0 −1 1 2
• Section 3.10 3.10 Derivative of a Matrix 171 Derivative of a Matrix When the elements of matrix A are differentiable functions of a single variable, say t, so that A = A[ai j (t)], calculus can be performed on matrices, so it becomes necessary to deﬁne the derivative of a matrix. An illustration of the need for this ¨ was given in Section 3.2(e), where the matrix differential equation x + Ax = 0 was obtained as the system of second order differential equations determining the motion of a compound mass–spring system. Derivative of a matrix fundamental deﬁnition of dA/dt Let the m × n matrix A have elements ai j (t) that are differentiable functions of the variable t. Then the ﬁrst order derivative of A with respect to t, written dA/dt, is deﬁned as dA/dt = [d(ai j )/dt], and its nth order derivative with respect to t is deﬁned recursively as dn A/dt n = d/dt[dn−1 A/dt n−1 ], for n = 1, 2, . . . , with the convention that d0 (ai j )/dt0 = ai j , so that d0 A/dt 0 = A. The derivative of a constant matrix is the null (zero) matrix 0. EXAMPLE 3.29 Find dA/dt and d2 A/dt 2 given that (a) A = t2 2t + 1 3t et cosh t , sin 2t (b) A = tet . cos 3t Solution (a) By deﬁnition, dA/dt = (b) dA/dt = THEOREM 3.14 derivative of a sum, a product, and an inverse matrix 2t 2 3 et et + tet −3 sin 3t sinh t 2 cos 2t and and d2 A/dt 2 = d2 A/dt 2 = 2 0 0 et cosh t . −4 sin 2t 2et + tet . −9 cos 3t Derivative of the sum of two matrices Let A(t) and B(t) be an m × n matrices, each with differentiable elements. Then d/dt{A + B} = dA/dt + dB/dt. Proof The result follows immediately from the deﬁnition of the sum of two matrices.
• 172 Chapter 3 Matrices and Systems of Linear Equations THEOREM 3.15 Derivative of a matrix product Let A(t) be an m × n matrix and B(t) be an n × q matrix, each with differentiable elements. Then, if the m × q matrix C(t) = A(t) B(t), dC/dt = {dA/dt}B + A{dB/dt}. Proof It follows from the deﬁnition of the matrix product of two matrices A and B that are conformable for multiplication that cr s = ar 1 b1s + ar 2 b2s + · · · + ar n bns , so each term in cr s is a product of two differentiable functions. Differentiating cr s establishes the theorem in which the order of the matrix products must be as shown. THEOREM 3.16 Derivative of an inverse matrix Let A(t) be an n × n nonsingular matrix with differentiable elements. Then dA−1/dt = −A−1 {dA/dt}A−1 . Proof As A is nonsingular, its inverse A−1 exists and AA−1 = I. Differentiating the matrix product AA−1 = I gives {dA/dt}A−1 + AdA−1 /dt = 0. Premultiplication by A−1 followed by a rearrangement establishes the theorem. EXAMPLE 3.30 Find dA−1 /dt given that A= cos t sin t −sin t . cos t Solution We have dA/dt = −sin t cos t −cos t −sin t and A−1 = cos t −sin t sin t , cos t so from Theorem 3.16 dA−1 /dt = −A−1 {dA/dt}A−1 = −sin t −cos t cos t . −sin t In this case the result is easily checked by direct differentiation of A−1 . Applications of the derivative of a matrix are to be found in reference [2.11] and, for example, in connection with systems of ordinary differential equations in reference [3.15]. Summary Matrices can occur with functions as their elements as, for example, when a matrix describes a rotation through an angle θ about the origin of a cartesian coordinate system O{x, y}, or when a column vector contains the unknown functions u1 (t), u2 (t), . . . , un (t) that form the solution set of a system of linear differential equations with independent variable t. Because of this, it is necessary to understand how to differentiate a matrix with respect to an independent variable that is present in functions forming its elements. This section addressed this matter by ﬁrst deﬁning the fundamental operation of differentiation
• Section 3.10 Derivative of a Matrix 173 of a matrix, and then establishing the way in which it is to be applied to the sum and product of two matrices and to the inverse matrix. EXERCISES 3.10 5. A = 3 t t2 1. C = A + B, where A = B= 1 t 2 B= 2 t 2t t 3. C = A − 2B, where A = B= e2t 1 t t2 2t e t 1 sin t t sin t t tan t cos 3t t + 2 2t 3 3t t3 e2t and and t2 ln t cosh t sinh t cos t sin t and B= 1 + 2t 2 2 sin t . cos t and B= ln(2t) t t . cos t In Exercises 7 and 8 ﬁnd dA−1 /dt by means of Theorem 3.16 and then verify the result by direct differentiation of A−1 . ⎤ ⎡ cos t sin t 0 ⎥ ⎢ 7. A = ⎣ −sin t cos t 0 ⎦ . 2 t 1 t t 2 2t . −t 3t 9. Find an expression for 8. A = t3 . sinh t t 1 −cos 3t sin t and sinh t . sin t (t + 1)2 4. C = A + 3B, where A = 2t B= t sin t sin 2t cosh t . cos t 2t 3 2. C = A − B, where A = t cos t sin t cos t 6. A = In Exercises 1 through 4, ﬁnd dC/dt and d2 C/dt 2 . d2 {A−1 }/dt 2 and 4 t . t cosh t In Exercises 5 and 6, use Theorem 3.15 to ﬁnd dC/dt, where C = AB, and check the result by direct differentiation of C. in terms of A−1 , dA/dt, and d2 A/dt 2 . Apply the result to A= cos t sin t −sin t cos t and verify it by direct differentiation of A−1 .
• CHAPTER 3 TECHNOLOGY PROJECTS Project 1 Simpliﬁcation of det C When C = [ci j + di j ] The purpose of this project is to provide practice with the computer algebra of determinants and to extend the result of Theorem 3.4(vi) to the case when each element of a determinant is the sum of two numbers. 1. Let a1 , a2 , a3 , b1 , b2 , b3 be arbitrary 3 ϫ 1 element column vectors. Then, by repeated application of Theorem 3.4(vi), extend its result to the case when C ϭ [a1 ϩ b1 , a2 ϩ b2 , a3 ϩ b3 ] by expressing det C as a sum of 3 ϫ 3 determinants with columns formed from a1 , a2 , a3 , b1 , b2 , and b3 . 2. Deﬁne an arbitrary matrix C of the form C ϭ [a1 ϩ b1 , a2 ϩ b2 , a3 ϩ b3 ], and with the aid of a computer algebra determinant package ﬁnd det C by using the result of Step 1. Conﬁrm the result by applying the computer algebra package directly to ﬁnd det C. Project 2 The Row-Reduced Echelon Form of a Matrix and Its Rank The purpose of this project is to provide practice with elementary row operations performed by me