Advanced Engineering Mathematics (Alan Jeffrey)


Alan Jeffrey
University of Newcastle-upon-Tyne
San Diego · San Francisco · New York · Boston · London · Toronto · Sydney · Tokyo
Sponsoring Editor: Barbara Holland
Production Editor: Julie Bolduc
Promotions Manager: Stephanie Stevens
Cover Design: Monty Lewis Design
Text Design: Thompson Steele Production Services
Front Matter Design: Perspectives
Copyeditor: Kristin Landon
Composition: TechBooks
Printer: RR Donnelley & Sons, Inc.

This book is printed on acid-free paper.

Copyright © 2002 by Harcourt/Academic Press

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Requests for permission to make copies of any part of the work should be mailed to: Permissions Department, Harcourt, Inc., 6277 Sea Harbor Drive, Orlando, Florida 32887-6777.

Harcourt/Academic Press, A Harcourt Science and Technology Company, 200 Wheeler Road, Burlington, Massachusetts 01803, USA. http://www.harcourt-ap.com
Academic Press, A Harcourt Science and Technology Company, 525 B Street, Suite 1900, San Diego, California 92101-4495, USA. http://www.academicpress.com
Academic Press, Harcourt Place, 32 Jamestown Road, London NW1 7BY, UK. http://www.academicpress.com

Library of Congress Catalog Card Number: 00-108262
International Standard Book Number: 0-12-382592-X

Printed in the United States of America
01 02 03 04 05 06 DOC 9 8 7 6 5 4 3 2 1
To Lisl and our family
CONTENTS

Preface  xv

PART ONE  REVIEW MATERIAL  1

CHAPTER 1  Review of Prerequisites  3
  1.1  Real Numbers, Mathematical Induction, and Mathematical Conventions  4
  1.2  Complex Numbers  10
  1.3  The Complex Plane  15
  1.4  Modulus and Argument Representation of Complex Numbers  18
  1.5  Roots of Complex Numbers  22
  1.6  Partial Fractions  27
  1.7  Fundamentals of Determinants  31
  1.8  Continuity in One or More Variables  35
  1.9  Differentiability of Functions of One or More Variables  38
  1.10  Tangent Line and Tangent Plane Approximations to Functions  40
  1.11  Integrals  41
  1.12  Taylor and Maclaurin Theorems  43
  1.13  Cylindrical and Spherical Polar Coordinates and Change of Variables in Partial Differentiation  46
  1.14  Inverse Functions and the Inverse Function Theorem  49

PART TWO  VECTORS AND MATRICES  53

CHAPTER 2  Vectors and Vector Spaces  55
  2.1  Vectors, Geometry, and Algebra  56
  2.2  The Dot Product (Scalar Product)  70
  2.3  The Cross Product (Vector Product)  77
  2.4  Linear Dependence and Independence of Vectors and Triple Products  82
  2.5  n-Vectors and the Vector Space R^n  88
  2.6  Linear Independence, Basis, and Dimension  95
  2.7  Gram–Schmidt Orthogonalization Process  101

CHAPTER 3  Matrices and Systems of Linear Equations  105
  3.1  Matrices  106
  3.2  Some Problems That Give Rise to Matrices  120
  3.3  Determinants  133
  3.4  Elementary Row Operations, Elementary Matrices, and Their Connection with Matrix Multiplication  143
  3.5  The Echelon and Row-Reduced Echelon Forms of a Matrix  147
  3.6  Row and Column Spaces and Rank  152
  3.7  The Solution of Homogeneous Systems of Linear Equations  155
  3.8  The Solution of Nonhomogeneous Systems of Linear Equations  158
  3.9  The Inverse Matrix  163
  3.10  Derivative of a Matrix  171

CHAPTER 4  Eigenvalues, Eigenvectors, and Diagonalization  177
  4.1  Characteristic Polynomial, Eigenvalues, and Eigenvectors  178
  4.2  Diagonalization of Matrices  196
  4.3  Special Matrices with Complex Elements  205
  4.4  Quadratic Forms  210
  4.5  The Matrix Exponential  215

PART THREE  ORDINARY DIFFERENTIAL EQUATIONS  225

CHAPTER 5  First Order Differential Equations  227
  5.1  Background to Ordinary Differential Equations  228
  5.2  Some Problems Leading to Ordinary Differential Equations  233
  5.3  Direction Fields  240
  5.4  Separable Equations  242
  5.5  Homogeneous Equations  247
  5.6  Exact Equations  250
  5.7  Linear First Order Equations  253
  5.8  The Bernoulli Equation  259
  5.9  The Riccati Equation  262
  5.10  Existence and Uniqueness of Solutions  264

CHAPTER 6  Second and Higher Order Linear Differential Equations and Systems  269
  6.1  Homogeneous Linear Constant Coefficient Second Order Equations  270
  6.2  Oscillatory Solutions  280
  6.3  Homogeneous Linear Higher Order Constant Coefficient Equations  291
  6.4  Undetermined Coefficients: Particular Integrals  302
  6.5  Cauchy–Euler Equation  309
  6.6  Variation of Parameters and the Green's Function  311
  6.7  Finding a Second Linearly Independent Solution from a Known Solution: The Reduction of Order Method  321
  6.8  Reduction to the Standard Form u″ + f(x)u = 0  324
  6.9  Systems of Ordinary Differential Equations: An Introduction  326
  6.10  A Matrix Approach to Linear Systems of Differential Equations  333
  6.11  Nonhomogeneous Systems  338
  6.12  Autonomous Systems of Equations  351

CHAPTER 7  The Laplace Transform  379
  7.1  Laplace Transform: Fundamental Ideas  379
  7.2  Operational Properties of the Laplace Transform  390
  7.3  Systems of Equations and Applications of the Laplace Transform  415
  7.4  The Transfer Function, Control Systems, and Time Lags  437

CHAPTER 8  Series Solutions of Differential Equations, Special Functions, and Sturm–Liouville Equations  443
  8.1  A First Approach to Power Series Solutions of Differential Equations  443
  8.2  A General Approach to Power Series Solutions of Homogeneous Equations  447
  8.3  Singular Points of Linear Differential Equations  461
  8.4  The Frobenius Method  463
  8.5  The Gamma Function Revisited  480
  8.6  Bessel Function of the First Kind J_n(x)  485
  8.7  Bessel Functions of the Second Kind Y_ν(x)  495
  8.8  Modified Bessel Functions I_ν(x) and K_ν(x)  501
  8.9  A Critical Bending Problem: Is There a Tallest Flagpole?  504
  8.10  Sturm–Liouville Problems, Eigenfunctions, and Orthogonality  509
  8.11  Eigenfunction Expansions and Completeness  526

PART FOUR  FOURIER SERIES, INTEGRALS, AND THE FOURIER TRANSFORM  543

CHAPTER 9  Fourier Series  545
  9.1  Introduction to Fourier Series  545
  9.2  Convergence of Fourier Series and Their Integration and Differentiation  559
  9.3  Fourier Sine and Cosine Series on 0 ≤ x ≤ L  568
  9.4  Other Forms of Fourier Series  572
  9.5  Frequency and Amplitude Spectra of a Function  577
  9.6  Double Fourier Series  581

CHAPTER 10  Fourier Integrals and the Fourier Transform  589
  10.1  The Fourier Integral  589
  10.2  The Fourier Transform  595
  10.3  Fourier Cosine and Sine Transforms  611

PART FIVE  VECTOR CALCULUS  623

CHAPTER 11  Vector Differential Calculus  625
  11.1  Scalar and Vector Fields, Limits, Continuity, and Differentiability  626
  11.2  Integration of Scalar and Vector Functions of a Single Real Variable  636
  11.3  Directional Derivatives and the Gradient Operator  644
  11.4  Conservative Fields and Potential Functions  650
  11.5  Divergence and Curl of a Vector  659
  11.6  Orthogonal Curvilinear Coordinates  665

CHAPTER 12  Vector Integral Calculus  677
  12.1  Background to Vector Integral Theorems  678
  12.2  Integral Theorems  680
  12.3  Transport Theorems  697
  12.4  Fluid Mechanics Applications of Transport Theorems  704

PART SIX  COMPLEX ANALYSIS  709

CHAPTER 13  Analytic Functions  711
  13.1  Complex Functions and Mappings  711
  13.2  Limits, Derivatives, and Analytic Functions  717
  13.3  Harmonic Functions and Laplace's Equation  730
  13.4  Elementary Functions, Inverse Functions, and Branches  735

CHAPTER 14  Complex Integration  745
  14.1  Complex Integrals  745
  14.2  Contours, the Cauchy–Goursat Theorem, and Contour Integrals  755
  14.3  The Cauchy Integral Formulas  769
  14.4  Some Properties of Analytic Functions  775

CHAPTER 15  Laurent Series, Residues, and Contour Integration  791
  15.1  Complex Power Series and Taylor Series  791
  15.2  Uniform Convergence  811
  15.3  Laurent Series and the Classification of Singularities  816
  15.4  Residues and the Residue Theorem  830
  15.5  Evaluation of Real Integrals by Means of Residues  839

CHAPTER 16  The Laplace Inversion Integral  863
  16.1  The Inversion Integral for the Laplace Transform  863

CHAPTER 17  Conformal Mapping and Applications to Boundary Value Problems  877
  17.1  Conformal Mapping  877
  17.2  Conformal Mapping and Boundary Value Problems  904

PART SEVEN  PARTIAL DIFFERENTIAL EQUATIONS  925

CHAPTER 18  Partial Differential Equations  927
  18.1  What Is a Partial Differential Equation?  927
  18.2  The Method of Characteristics  934
  18.3  Wave Propagation and First Order PDEs  942
  18.4  Generalizing Solutions: Conservation Laws and Shocks  951
  18.5  The Three Fundamental Types of Linear Second Order PDE  956
  18.6  Classification and Reduction to Standard Form of a Second Order Constant Coefficient Partial Differential Equation for u(x, y)  964
  18.7  Boundary Conditions and Initial Conditions  975
  18.8  Waves and the One-Dimensional Wave Equation  978
  18.9  The D'Alembert Solution of the Wave Equation and Applications  981
  18.10  Separation of Variables  988
  18.11  Some General Results for the Heat and Laplace Equation  1025
  18.12  An Introduction to Laplace and Fourier Transform Methods for PDEs  1030

PART EIGHT  NUMERICAL MATHEMATICS  1043

CHAPTER 19  Numerical Mathematics  1045
  19.1  Decimal Places and Significant Figures  1046
  19.2  Roots of Nonlinear Functions  1047
  19.3  Interpolation and Extrapolation  1058
  19.4  Numerical Integration  1065
  19.5  Numerical Solution of Linear Systems of Equations  1077
  19.6  Eigenvalues and Eigenvectors  1090
  19.7  Numerical Solution of Differential Equations  1095

Answers  1109
References  1143
Index  1147
  • 13. P R E F A C E T his book has evolved from lectures on engineering mathematics given regularly over many years to students at all levels in the United States, England, and elsewhere. It covers the more advanced aspects of engineering mathematics that are common to all first engineering degrees, and it differs from texts with similar names by the emphasis it places on certain topics, the systematic development of the underlying theory before making applications, and the inclusion of new material. Its special features are as follows. Prerequisites T he opening chapter, which reviews mathematical prerequisites, serves two purposes. The first is to refresh ideas from previous courses and to provide basic self-contained reference material. The second is to remove from the main body of the text certain elementary material that by tradition is usually reviewed when first used in the text, thereby allowing the development of more advanced ideas to proceed without interruption. Worked Examples T he numerous worked examples that follow the introduction of each new idea serve in the earlier chapters to illustrate applications that require relatively little background knowledge. The ability to formulate physical problems in mathematical terms is an essential part of all mathematics applications. Although this is not a text on mathematical modeling, where more complicated physical applications are considered, the essential background is first developed to the point at which the physical nature of the problem becomes clear. Some examples, such as the ones involving the determination of the forces acting in the struts of a framed structure, the damping of vibrations caused by a generator and the vibrational modes of clamped membranes, illustrate important mathematical ideas in the context of practical applications. Other examples occur without specific applications and their purpose is to reinforce new mathematical ideas and techniques as they arise. A different type of example is the one that seeks to determine the height of the tallest flagpole, where the height limitation is due to the phenomenon of xv
  • 14. buckling. Although the model used does not give an accurate answer, it provides a typical example of how a mathematical model is constructed. It also illustrates the reasoning used to select a physical solution from a scenario in which other purely mathematical solutions are possible. In addition, the example demonstrates how the choice of a unique physically meaningful solution from a set of mathematically possible ones can sometimes depend on physical considerations that did not enter into the formulation of the original problem. Exercise Sets T he need for engineering students to have a sound understanding of mathematics is recognized by the systematic development of the underlying theory and the provision of many carefully selected fully worked examples, coupled with their reinforcement through the provision of large sets of exercises at the ends of sections. These sets, to which answers to odd-numbered exercises are listed at the end of the book, contain many routine exercises intended to provide practice when dealing with the various special cases that can arise, and also more challenging exercises, each of which is starred, that extend the subject matter of the text in different ways. Although many of these exercises can be solved quickly by using standard computer algebra packages, the author believes the fundamental mathematical ideas involved are only properly understood once a significant number of exercises have first been solved by hand. Computer algebra can then be used with advantage to confirm the results, as is required in various exercise sets. Where computer algebra is either required or can be used to advantage, the exercise numbers are in blue. A comparison of computer-based solutions with those obtained by hand not only confirms the correctness of hand calculations, but also serves to illustrate how the method of solution often determines its form, and that transforming one form of solution to another is sometimes difficult. It is the author’s belief that only when fundamental ideas are fully understood is it safe to make routine use of computer algebra, or to use a numerical package to solve more complicated problems where the manipulation involved is prohibitive, or where a numerical result may be the only form of solution that is possible. New Material T ypical of some of the new material to be found in the book is the matrix exponential and its application to the solution of linear systems of ordinary differential equations, and the use of the Green’s function. The introductory discussion of the development of discontinuous solutions of first order quasilinear equations, which are essential in the study of supersonic gas flow and in various other physical applications, is also new and is not to be found elsewhere. The account of the Laplace transform contains more detail than usual. While the Laplace transform is applied to standard engineering problems, including xvi
  • 15. control theory, various nonstandard problems are also considered, such as the solution of a boundary value problem for the equation that describes the bending of a beam and the derivation of the Laplace transform of a function from its differential equation. The chapter on vector integral calculus first derives and then applies two fundamental vector transport theorems that are not found in similar texts, but which are of considerable importance in many branches of engineering. Series Solutions of Differential Equations U nderstanding the derivation of series solutions of ordinary differential equations is often difficult for students. This is recognized by the provision of detailed examples, followed by carefully chosen sets of exercises. The worked examples illustrate all of the special cases that can arise. The chapter then builds on this by deriving the most important properties of Legendre polynomials and Bessel functions, which are essential when solving partial differential equations involving cylindrical and spherical polar coordinates. Complex Analysis B ecause of its importance in so many different applications, the chapters on complex analysis contain more topics than are found in similar texts. In particular, the inclusion of an account of the inversion integral for the Laplace transform makes it possible to introduce transform methods for the solution of problems involving ordinary and partial differential equations for which tables of transform pairs are inadequate. To avoid unnecessary complication, and to restrict the material to a reasonable length, some topics are not developed with full mathematical rigor, though where this occurs the arguments used will suffice for all practical purposes. If required, the account of complex analysis is sufficiently detailed for it to serve as a basis for a single subject course. Conformal Mapping and Boundary Value Problems S ufficient information is provided about conformal transformations for them to be used to provide geometrical insight into the solution of some fundamental two-dimensional boundary value problems for the Laplace equation. Physical applications are made to steady-state temperature distributions, electrostatic problems, and fluid mechanics. The conformal mapping chapter also provides a quite different approach to the solution of certain two-dimensional boundary value problems that in the subsequent chapter on partial differential equations are solved by the very different method of separation of variables. xvii
  • 16. Partial Differential Equations A n understanding of partial differential equations is essential in all branches of engineering, but accounts in engineering mathematics texts often fall short of what is required. This is because of their tendency to focus on the three standard types of linear second order partial differential equations, and their solution by means of separation of variables, to the virtual exclusion of first order equations and the systems from which these fundamental linear second order equations are derived. Often very little is said about the types of boundary and initial conditions that are appropriate for the different types of partial differential equations. Mention is seldom if ever made of the important part played by nonlinearity in first order equations and the way it influences the properties of their solutions. The account given here approaches these matters by starting with first order linear and quasilinear equations, where the way initial and boundary conditions and nonlinearity influence solutions is easily understood. The discussion of the effects of nonlinearity is introduced at a comparatively early stage in the study of partial differential equations because of its importance in subjects like fluid mechanics and chemical engineering. The account of nonlinearity also includes a brief discussion of shock wave solutions that are of fundamental importance in both supersonic gas flow and elsewhere. Linear and nonlinear wave propagation is examined in some detail because of its considerable practical importance; in addition, the way integral transform methods can be used to solve linear partial differential equations is described. From a rigorous mathematical point of view, the solution of a partial differential equation by the method of separation of variables only yields a formal solution, which only becomes a rigorous solution once the completeness of any set of eigenfunctions that arises has been established. To develop the subject in this manner would take the text far beyond the level for which it is intended and so the completeness of any set of eigenfunctions that occurs will always be assumed. This assumption can be fully justified when applying separation of variables to the applications considered here and also in virtually all other practical cases. Technology Projects T o encourage the use of technology and computer algebra and to broaden the range of problems that can be considered, technology-based projects have been added wherever appropriate; in addition, standard sets of exercises of a theoretical nature have been included at the ends of sections. These projects are not linked to a particular computer algebra package: Some projects illustrating standard results are intended to make use of simple computer skills while others provide insight into more advanced and physically important theoretical questions. Typical of the projects designed to introduce new ideas are those at the end of the chapter on partial differential equations, which offer a brief introduction to the special nonlinear wave solutions called solitons. xviii
  • 17. Numerical Mathematics A lthough an understanding of basic numerical mathematics is essential for all engineering students, in a book such as this it is impossible to provide a systematic account of this important discipline. The aim of this chapter is to provide a general idea of how to approach and deal with some of the most important and frequently encountered numerical operations, using only basic numerical techniques, and thereafter to encourage the use of standard numerical packages. The routines available in numerical packages are sophisticated, highly optimized and efficient, but the general ideas that are involved are easily understood once the material in the chapter has been assimilated. The accounts that are given here purposely avoid going into great detail as this can be found in the quoted references. However, the chapter does indicate when it is best to use certain types of routine and those circumstances where routines might be inappropriate. The details of references to literature contained in square brackets at the ends of sections are listed at the back of the book with suggestions for additional reading. An instructor’s Solutions Manual that gives outline solutions for the technology projects is also available. Acknowledgments I wish to express my sincere thanks to the reviewers and accuracy readers, those cited below and many who remain anonymous, whose critical comments and suggestions were so valuable, and also to my many students whose questions when studying the material in this book have contributed so fundamentally to its development. Particular thanks go to: Chun Liu, Pennsylvania State University William F. Moss, Clemson University Donald Hartig, California Polytechnic State University at San Luis Obispo Howard A. Stone, Harvard University Donald Estep, Georgia Institute of Technology Preetham B. Kumar, California State University at Sacramento Anthony L. Peressini, University of Illinois at Urbana-Champaign Eutiquio C. Young, Florida State University Colin H. Marks, University of Maryland Ronald Jodoin, Rochester Institute of Technology Edgar Pechlaner, Simon Fraser University Ronald B. Guenther, Oregon State University Mattias Kawski, Arizona State University L. F. Shampine, Southern Methodist University In conclusion, I also wish to thank my editor, Barbara Holland, for her invaluable help and advice on presentation; Julie Bolduc, senior production editor, for her patience and guidance; Mike Sugarman, for his comments during the early stages of writing; and, finally, Chuck Glaser, for encouraging me to write the book in the first place. xix
PART ONE

REVIEW MATERIAL

Chapter 1  Review of Prerequisites
  • 21. 1 C H A P T E R Review of Prerequisites E very account of advanced engineering mathematics must rely on earlier mathematics courses to provide the necessary background. The essentials are a first course in calculus and some knowledge of elementary algebraic concepts and techniques. The purpose of the present chapter is to review the most important of these ideas that have already been encountered, and to provide for convenient reference results and techniques that can be consulted later, thereby avoiding the need to interrupt the development of subsequent chapters by the inclusion of review material prior to its use. Some basic mathematical conventions are reviewed in Section 1.1, together with the method of proof by mathematical induction that will be required in later chapters. The essential algebraic operations involving complex numbers are summarized in Section 1.2, the complex plane is introduced in Section 1.3, the modulus and argument representation of complex numbers is reviewed in Section 1.4, and roots of complex numbers are considered in Section 1.5. Some of this material is required throughout the book, though its main use will be in Part 5 when developing the theory of analytic functions. The use of partial fractions is reviewed in Section 1.6 because of the part they play in Chapter 7 in developing the Laplace transform. As the most basic properties of determinants are often required, the expansion of determinants is summarized in Section 1.7, though a somewhat fuller account of determinants is to be found later in Section 3.3 of Chapter 3. The related concepts of limit, continuity, and differentiability of functions of one or more independent variables are fundamental to the calculus, and to the use that will be made of them throughout the book, so these ideas are reviewed in Sections 1.8 and 1.9. Tangent line and tangent plane approximations are illustrated in Section 1.10, and improper integrals that play an essential role in the Laplace and Fourier transforms, and also in complex analysis, are discussed in Section 1.11. The importance of Taylor series expansions of functions involving one or more independent variables is recognized by their inclusion in Section 1.12. A brief mention is also made of the two most frequently used tests for the convergence of series, and of the differentiation and integration of power series that is used in Chapter 8 when considering series solutions of linear ordinary differential equations. These topics are considered again in Part 5 when the theory of analytic functions is developed. The solution of many problems involving partial differential equations can be simplified by a convenient choice of coordinate system, so Section 1.13 reviews the theorem for the 3
  • 22. 4 Chapter 1 Review of Prerequisites change of variable in partial differentiation, and describes the cylindrical polar and spherical polar coordinate systems that are the two that occur most frequently in practical problems. Because of its fundamental importance, the implicit function theorem is stated without proof in Section 1.14, though it is not usually mentioned in first calculus courses. 1.1 Real Numbers, Mathematical Induction, and Mathematical Conventions N umbers are fundamental to all mathematics, and real numbers are a subset of complex numbers. A real number can be classified as being an integer, a rational number, or an irrational number. From the set of positive and negative integers, and zero, the set of positive integers 1, 2, 3, . . . is called the set of natural numbers. The rational numbers are those that can be expressed in the form m/n, √ where m and n are integers with n = 0. Irrational numbers such as π , 2, and sin 2 are numbers that cannot be expressed in rational form, so, for example, for no √ integers m and n is it true that 2 is equal to m/n. Practical calculations can only be performed using rational numbers, so all irrational numbers that arise must be approximated arbitrarily closely by rational numbers. Collectively, the sets of integers and rational and irrational numbers form what is called the set of all real numbers, and this set is denoted by R. When it is necessary to indicate that an arbitrary number a is a real number a shorthand notation is adopted involving the symbol ∈, and we will write a ∈ R. The symbol ∈ is to be read “belongs to” or, more formally, as “is an element of the set.” If a is not a member of set R, the symbol ∈ is negated by writing ∈, and we will write a ∈ R where, of / / course, the symbol ∈ is to be read as “does not belong to,” or “is not an element / of the set.” As real numbers can be identified in a unique manner with points on a line, the set of all real numbers R is often called the real line. The set of all complex numbers C to which R belongs will be introduced later. One of the most important properties of real numbers that distinguishes them from other complex numbers is that they can be arranged in numerical order. This fundamental property is expressed by saying that the real numbers possess the order property. This simply means that if x, y ∈ R, with x = y, then either x < y or x > y, where the symbol < is to be read “is less than” and the symbol > is to be read “is greater than.” When the foregoing results are expressed differently, though equivalently, if x, y ∈ R, with x = y, then either x − y < 0 absolute value or x − y > 0. It is the order property that enables the graph of a real function f of a real variable x to be constructed. This follows because once length scales have been chosen for the axes together with a common origin, a real number can be made to correspond to a unique point on an axis. The graph of f follows by plotting all possible points (x, f (x)) in the plane, with x measured along one axis and f (x) along the other axis. The absolute value |x| of a real number x is defined by the formula |x| = x if x ≥ 0 −x if x < 0.
  • 23. Section 1.1 Real Numbers, Mathematical Induction, and Mathematical Conventions 5 This form of definition is in reality a concise way of expressing two separate statements. One statement is obtained by reading |x| with the top condition on the right and the other by reading it with the bottom condition on the right. The absolute value of a real number provides a measure of its magnitude without regard to its sign so, for example, |3| = 3, |−7.41| = 7.41, and |0| = 0. Sometimes the form of a general mathematical result that only depends on an arbitrary natural number n can be found by experiment or by conjecture, and then the problem that remains is how to prove that the result is either true or false for all n. A typical example is the proposition that the product (1 − 1/4)(1 − 1/9)(1 − 1/16) . . . [1 − 1/(n + 1)2 ] = (n + 2)/(2n + 2), mathematical induction for n = 1, 2, . . . . This assertion is easily checked for any specific positive integer n, but this does not amount to a proof that the result is true for all natural numbers. A powerful method by which such propositions can often be shown to be either true or false involves using a form of argument called mathematical induction. This type of proof depends for its success on the order property of numbers and the fact that if n is a natural number, then so also is n + 1. The steps involved in an inductive proof can be summarized as follows. Proof by Mathematical Induction Let P(n) be a proposition depending on a positive integer n. STEP 1 STEP 2 STEP 3 STEP 4 Show, if possible, that P(n) is true for some positive integer n0 . Show, if possible, that if P(n) is true for an arbitrary integer n = k ≥ n0 , then the proposition P(k + 1) follows from proposition P(k). If Step 2 is true, the fact that P(n0 ) is true implies that P(n0 + 1) is true, and then that P(n0 + 2) is true, and hence that P(n) is true for all n ≥ n0 . If no number n = n0 can be found for which Step 1 is true, or if in Step 2 it can be shown that P(k) does not imply P(k + 1), the proposition P(n) is false. The example that follows is typical of the situation where an inductive proof is used. It arises when determining the nth term in the Maclaurin series for sin ax that involves finding the nth derivative of sin ax. A result such as this may be found intuitively by inspection of the first few derivatives, though this does not amount to a formal proof that the result is true for all natural numbers n. EXAMPLE 1.1 Prove by mathematical induction that dn /dx n [sin ax] = a n sin(ax + nπ/2), for n = 1, 2, . . . . Solution The proposition P(n) is that dn /dx n [sin ax] = a n sin(ax + nπ/2), STEP 1 for n = 1, 2, . . . . Differentiation gives d/dx[sin ax] = a cos ax,
but setting n = 1 in P(n) leads to the result

d/dx[sin ax] = a sin(ax + π/2) = a cos ax,

showing that proposition P(n) is true for n = 1 (so in this case n0 = 1).

STEP 2  Assuming P(k) to be true for k > 1, differentiation gives

d/dx{d^k/dx^k [sin ax]} = d/dx[a^k sin(ax + kπ/2)],

so

d^{k+1}/dx^{k+1} [sin ax] = a^{k+1} cos(ax + kπ/2).

However, replacing k by k + 1 in P(k) gives

d^{k+1}/dx^{k+1} [sin ax] = a^{k+1} sin[ax + (k + 1)π/2] = a^{k+1} sin[(ax + kπ/2) + π/2] = a^{k+1} cos(ax + kπ/2),

showing, as required, that proposition P(k) implies proposition P(k + 1), so Step 2 is true.

STEP 3  As P(n) is true for n = 1, and P(k) implies P(k + 1), it follows that the result is true for n = 1, 2, . . . , and the proof is complete.

The binomial theorem finds applications throughout mathematics at all levels, so we quote it first when the exponent n is a positive integer, and then in its more general form when the exponent α involved is any real number.

Binomial theorem when n is a positive integer
If a, b are real numbers and n is a positive integer, then

(a + b)^n = a^n + n a^{n-1} b + \frac{n(n-1)}{2!} a^{n-2} b^2 + \frac{n(n-1)(n-2)}{3!} a^{n-3} b^3 + \cdots + b^n,

or more concisely, in terms of the binomial coefficient

\binom{n}{r} = \frac{n!}{(n-r)!\, r!},

we have

(a + b)^n = \sum_{r=0}^{n} \binom{n}{r} a^{n-r} b^r,

where m! is the factorial function defined as m! = 1 · 2 · 3 · · · m with m > 0 an integer, and 0! is defined as 0! = 1. It follows at once that

\binom{n}{0} = \binom{n}{n} = 1.
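As an editorial aside (this check is not part of Jeffrey's text), the integer-exponent form of the theorem is easy to verify numerically. The short Python sketch below evaluates the finite sum with the standard library's math.comb and compares it with direct evaluation of (a + b)^n for a few sample values.

    from math import comb, isclose

    def binomial_expansion(a: float, b: float, n: int) -> float:
        """Evaluate (a + b)**n term by term from the binomial theorem:
        the sum over r of C(n, r) * a**(n - r) * b**r."""
        return sum(comb(n, r) * a**(n - r) * b**r for r in range(n + 1))

    # Compare the expanded sum with direct evaluation for a few cases.
    for a, b, n in [(2.0, 3.0, 5), (1.5, -0.25, 8), (-1.0, 4.0, 6)]:
        assert isclose(binomial_expansion(a, b, n), (a + b)**n)

    # The edge coefficients C(n, 0) and C(n, n) are both 1, as stated above.
    assert comb(7, 0) == comb(7, 7) == 1
    print("binomial theorem verified for the sample cases")

The same loop can be extended to any other exponents of interest; only the positive-integer case terminates, which is why the general form that follows needs the convergence condition |b/a| < 1.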
The binomial theorem involving the expression (a + b)^α, where a and b are real numbers with |b/a| < 1 and α is an arbitrary real number, takes the following form.

General form of the binomial theorem when α is an arbitrary real number
If a and b are real numbers such that |b/a| < 1 and α is an arbitrary real number, then

(a + b)^α = a^α \left(1 + \frac{b}{a}\right)^α = a^α \left[1 + \frac{α}{1!}\,\frac{b}{a} + \frac{α(α-1)}{2!}\left(\frac{b}{a}\right)^2 + \frac{α(α-1)(α-2)}{3!}\left(\frac{b}{a}\right)^3 + \cdots\right].

The series on the right only terminates after a finite number of terms if α is a positive integer, in which case the result reduces to the one just given. If α is a negative integer, or a nonintegral real number, the expression on the right becomes an infinite series that diverges if |b/a| > 1.

EXAMPLE 1.2  Expand (3 + x)^{-1/2} by the binomial theorem, stating for what values of x the series converges.

Solution  Setting b/a = x/3 in the general form of the binomial theorem gives

(3 + x)^{-1/2} = 3^{-1/2}\left(1 + \frac{x}{3}\right)^{-1/2} = \frac{1}{\sqrt{3}}\left(1 - \frac{x}{6} + \frac{x^2}{24} - \frac{5x^3}{432} + \cdots\right).

The series only converges if |x/3| < 1, and so it is convergent provided |x| < 3.

Some standard mathematical conventions

Use of combinations of the ± and ∓ signs
The occurrence of two or more of the symbols ± and ∓ in an expression is to be taken to imply two separate results, the first obtained by taking the upper signs and the second by taking the lower signs. Thus, the expression a ± b sin θ ∓ c cos θ is an abbreviation for the two separate expressions a + b sin θ − c cos θ and a − b sin θ + c cos θ.

Multi-statements
When a function is defined sectionally on n different intervals of the real line, instead of formulating n separate definitions these are usually simplified by being combined into what can be considered to be a single multi-statement. The following example is typical of a multi-statement:

f(x) = \begin{cases} \sin x, & x < π \\ 0, & π ≤ x ≤ 3π/2 \\ -1, & x > 3π/2. \end{cases}
  • 26. 8 Chapter 1 Review of Prerequisites It is, in fact, three statements. The first is obtained by reading f (x) in conjunction with the top line on the right, the second by reading it in conjunction with the second line on the right, and the third by reading it in conjunction with the third line on the right. An example of a multi-statement has already been encountered in the definition of the absolute value |x| of a number x. Frequent use of multi-statements will be made in Chapter 9 on Fourier series, and elsewhere. Polynomials polynomials A polynomial is an expression of the form P(x) = a0 x n + a1 x n−1 + · · · + an−1 x + an . The integer n is called the degree of the polynomial, and the numbers ai are called its coefficients. The fundamental theorem of algebra that is proved in Chapter 14 asserts that P(x) = 0 has n roots that may be either real or complex, though some of them may be repeated. (a0 = 0 is assumed.) Notation for ordinary and partial derivatives If f (x) is an n times differentiable function then f (n) (x) will, on occasion, be used to signify dn f/dx n , so that f (n) (x) = suffix notation for partial derivatives dn f . dx n If f (x, y) is a suitably differentiable function of x and y, a concise notation used to signify partial differentiation involves using suffixes, so that fx = ∂f ∂ , fyx = ( fy )x = ∂x ∂x ∂f ∂y = ∂2 f ∂2 f , fyy = ,..., ∂ y∂ x ∂ y2 with similar results when f is a function of more than two independent variables. Inverse trigonometric functions The periodicity of the real variable trigonometric sine, cosine, and tangent functions means that the corresponding general inverse trigonometric functions are many val√ ued. So, for example, if y = sin x and we ask for what values of x is y = 1/ 2, we find this is true for x = π/4 ± 2nπ and x = 3π/4 ± 2nπ for n = 0, 1, 2, . . . . To overcome this ambiguity, we introduce the single valued inverses, denoted respectively by x = Arcsin y, x = Arccos y, and x = Arctan y by restricting the domain and range of the sine, cosine, and tangent functions to one where they are either strictly increasing or strictly decreasing functions, because then one value of x corresponds to one value of y and, conversely, one value of y corresponds to one value of x. In the case of the function y = sin x, by restricting the argument x to the interval −π/2 ≤ x ≤ π/2 the function becomes a strictly increasing function of x. The corresponding single valued inverse function is denoted by x = Arcsin y, where y is a number in the domain of definition [−1, 1] of the Arcsine function and x is a number in its range [−π/2, π/2]. Similarly, when considering the function y = cos x, the argument is restricted to 0 ≤ x ≤ π to make cos x a strictly decreasing function of x. The corresponding single valued inverse function is denoted by x = Arccos y, where y is a number in the domain of definition [−1, 1] of the Arccosine function and x is a number in its range [0, π]. Finally, in the case of the function y = tan x, restricting
  • 27. Section 1.1 Real Numbers, Mathematical Induction, and Mathematical Conventions 9 the argument to the interval −π/2 < x < π/2 makes the tangent function a strictly increasing function of x. The corresponding single valued inverse function is denoted by x = Arctan y where y is a number in the domain of definition (−∞, ∞) of the Arctangent function and x is a number in its range (−π/2, π/2). As the inverse trigonometric functions are important in their own right, the variables x and y in the preceding definitions are interchanged to allow consideration of the inverse functions y = Arcsin x, y = Arccos x, and y = Arctan x, so that now x is the independent variable and y is the dependent variable. With this interchange of variables the expression y = arcsin x will be used to refer to any single valued inverse function with the same domain of definition as Arcsin x, but with a different range. Similar definitions apply to the functions y = arccos x and y = arctan x. Double summations An expression involving a double summation like ∞ ∞ amn sin mx sin ny, m=1 n=1 double summation means sum the terms amn sin mx sin ny over all possible values of m and n, so that ∞ ∞ amn sin mx sin ny = a11 sin x sin y + a12 sin x sin 2y m=1 n=1 + a21 sin 2x sin y + a22 sin 2x sin 2y + · · · . A more concise notation also in use involves writing the double summation as ∞ amn sin mx sin nx. m=1,n=1 The signum function signum function The signum function, usually written sign(x), and sometimes sgn(x), is defined as sign(x) = 1 if x > 0 −1 if x < 0. We have, for example, sign(cos x) = 1 for 0 < x < π/2, and sign(cos x) = −1 for π/2 < x < π or, equivalently, sign(cos x) = 1, 0 < x < 1 π 2 −1, 1 π < x < π. 2 Products Let {uk}n be a sequence of numbers or functions u1 , u2 , . . . ; then the product of k=1 the n members of this sequence is denoted by n uk, so that k=1 n uk = u1 u2 · · · un . k=1 infinite product When the sequence is infinite, n uk = lim n→∞ k=1 ∞ uk k=1
  • 28. 10 Chapter 1 Review of Prerequisites is called an infinite product involving the sequence {uk}. Typical examples of infinite products are ∞ 1− k=2 1 k2 = 1 2 ∞ and k=1 1− x2 k2 π 2 = sin x . x More background information and examples can be found in the appropriate sections in any of references [1.1], [1.2], and [1.5]. Logarithmic functions the functions ln and Log The notation ln x is used to denote the natural logarithm of a real number x, that is, the logarithm of x to the base e, and in some books this is written loge x. In this book logarithms to the base 10 are not used, and when working with functions of a complex variable the notation Log z, with z = r eiθ means Log z = ln r + iθ . EXERCISES 1.1 √ √ 1. Prove √that if a > 0, b > 0, then a/ b + b/ a ≥ √ a + b. Prove Exercises 2 through 6 by mathematical induction. 2. 3. 4. 5. 6. n−1 k=0 (a + kd) = (n/2)[2a + (n − 1)d] (sum of an arithmetic series). n−1 k r = (1 − r n )/(1 − r ) (r = 1) k=0 (sum of a geometric series). n k2 = (1/6)n(n + 1)(2n + 1) (sum of squares). k=1 dn /dx n [cos ax] = a n cos(ax + nπ/2), with n a natural number. dn /dx n [ln(1 + x)] = (−1)n+1 (n − 1)!/(1 + x)n , with n a natural number. 1.2 7. Use the binomial theorem to expand (3 + 2x)4 . 8. Use the binomial theorem and multiplication to expand (1 − x 2 )(2 + 3x)3 . In Exercises 9 through 12 find the first four terms of the binomial expansion of the function and state conditions for the convergence of the series. 9. 10. 11. 12. (3 + 2x)−2 . (2 − x 2 )1/3 . (4 + 2x 2 )−1/2 . (1 − 3x 2 )3/4 . Complex Numbers Mathematical operations can lead to numbers that do not belong to the real number system R introduced in Section 1.1. In the simplest case this occurs when finding the roots of the quadratic equation ax 2 + bx + c = 0 with a, b, c ∈ R, a = 0 by means of the quadratic formula x= discriminant of a quadratic −b ± √ b2 − 4ac . 2a The discriminant of the equation is b2 − 4ac, and if b2 − 4ac < 0 the formula involves the square root of a negative real number; so, if the formula is to have meaning, numbers must be allowed that lie outside the real number system. The inadequacy of the real number system when considering different mathematical operations can be illustrated in other ways by asking, for example, how to find the three roots that are expected of a third degree algebraic equation as
  • 29. Section 1.2 Complex Numbers 11 simple as x 3 − 1 = 0, where only the real root 1 can be found using y = x 3 − 1, or by seeking to give meaning to ln(−1), both of which questions will arise later. Difficulties such as these can all be overcome if the real number system is extended by introducing the imaginary unit i defined as i 2 = −1, so expressions like (−k2 ) where k a positive real number may be written (−1) (k2 ) = ±ik. Notice that as the real number k only scales the imaginary unit i, it is immaterial whether the result is written as ik or as ki. The extension to the real number system that is required to resolve problems of the type just illustrated involves the introduction of complex numbers, denoted collectively by C, in which the general complex number, usually denoted by z, has the form z = α + iβ, real and imaginary part notation with α, β real numbers. The real number α is called the real part of the complex number z, and the real number β is called its imaginary part. When these need to be identified separately, we write Re{z} = α and Im{z} = β, so if z = 3 − 7i, Re{z} = 3 and Im{z} = −7. If Im{z} = β = 0 the complex number z reduces to a real number, and if Re{z} = α = 0 it becomes a purely imaginary number, so, for example, z = 5i is a purely imaginary number. When a complex number z is considered as a variable it is usual to write it as z = x + i y, where x and y are now real variables. If it is necessary to indicate that z is a general complex number we write z ∈ C. When solving the quadratic equation az2 + bz + c = 0 with a, b, and c real numbers and a discriminant b2 − 4ac < 0, by setting 4ac − b2 = k2 in the quadratic formula, with k > 0, the two roots z1 and z2 are given by the complex numbers z1 = −(b/2a) + i(k/2a) and z2 = −(b/2a) − i(k/2a). Algebraic rules for complex numbers Let the complex numbers z1 and z2 be defined as z1 = a + ib and z2 = c + id, with a, b, c, and d arbitrary real numbers. Then the following rules govern the arithmetic manipulation of complex numbers. Equality of complex numbers The complex numbers z1 and z2 are equal, written z1 = z2 if, and only if, Re{z1 } = Re{z2 } and Im{z1 } = Im{z2 }. So a + ib = c + id if, and only if, a = c and b = d.
  • 30. 12 Chapter 1 Review of Prerequisites EXAMPLE 1.3 (a) 3 − 9i = 3 + bi if, and only if, b = −9. (b) If u = −2 + 5i, v = 3 + 5i, w = a + 5i, then u = w if, and only if, a = −2 but u = v, and v = w if, and only if, a = 3. Zero complex number The zero complex number, also called the null complex number, is the number 0 + 0i that, for simplicity, is usually written as an ordinary zero 0. EXAMPLE 1.4 If a + ib = 0, then a = 0 and b = 0. Addition and subtraction of complex numbers The addition (sum) and subtraction (difference) of the complex numbers z1 and z2 is defined as z1 + z2 = Re{z1 } + Re{z2 } + i[Im{z1 } + Im{z2 }] and z1 − z2 = Re{z1 } − Re{z2 } + i[Im{z1 } − Im{z2 }]. So, if z1 = a + ib and z2 = c + id, then z1 + z2 = (a + ib) + (c + id) = (a + c) + i(b + d), and z1 − z2 = (a + ib) − (c + id) = (a − c) + i(b − d). EXAMPLE 1.5 If z1 = 3 + 7i and z2 = 3 + 2i, then the sum z1 + z2 = (3 + 3) + (7 + 2)i = 6 + 9i, and the difference z1 − z2 = (3 − 3) + (7 − 2)i = 5i. Multiplication of complex numbers The multiplication (product) of the two complex numbers z1 = a + ib and z2 = c + id is defined by the rule z1 z2 = (a + ib)(c + id) = (ac − bd) + i(ad + bc). An immediate consequence of this definition is that if k is a real number, then kz1 = k(a + ib) = ka + ikb. This operation involving multiplication of a complex
  • 31. Section 1.2 Complex Numbers 13 number by a real number is called scaling a complex number. Thus, if z1 = 3 + 7i and z2 = 3 + 2i, then 2z1 − 3z2 = (6 + 14i) − (9 + 6i) = −3 + 8i. In particular, if z = a + ib, then −z = (−1)z = −a − ib. This is as would be expected, because it leads to the result z − z = 0. In practice, instead of using this formal definition of multiplication, it is more convenient to perform multiplication of complex numbers by multiplying the bracketed quantities in the usual algebraic manner, replacing every product i 2 by −1, and then combining separately the real and imaginary terms to arrive at the required product. EXAMPLE 1.6 (a) 5i(−4 + 3i) = −15 − 20i. (b) (3 − 2i)(−1 + 4i)(1 + i) = (−3 + 12i + 2i − 8i 2 )(1 + i) = [(−3 + 8) + (12 + 2)i](1 + i) = (5 + 14i)(1 + i) = 5 + 14i + 5i + 14i 2 = (5 − 14) + (5 + 14)i = −9 + 19i. Complex conjugate If z = a + ib, then the complex conjugate of z, usually denoted by z and read “z bar,” is defined as z = a − ib. It follows directly that (z) = z and zz = a 2 + b2 . In words, the complex conjugate operation has the property that taking the complex conjugate of a complex conjugate returns the original complex number, whereas the product of a complex number and its complex conjugate always yields a real number. If z = a + ib, then adding and subtracting z and z gives the useful results z + z = 2Re{z} = 2a and z − z = 2i Im{z} = 2ib. These can be written in the equivalent form Re{z} = a = 1 (z + z) 2 and Im{z} = b = 1 (z − z). 2i Quotient (division) of complex numbers Let z1 = a + ib and z2 = c + id. Then the quotient z1 /z2 is defined as z1 (ac + bd) + i(bc − ad) = , z2 = 0. z2 c2 + d2
  • 32. 14 Chapter 1 Review of Prerequisites In practice, division of complex numbers is not carried out using this definition. Instead, the quotient is written in the form z1 z1 z2 = , z2 z2 z2 where the denominator is now seen to be a real number. The quotient is then found by multiplying out and simplifying the numerator in the usual manner and dividing the real and imaginary parts of the numerator by the real number z2 z2 . EXAMPLE 1.7 Find z1 /z2 given that z1 = (3 + 2i) and z2 = 1 + 3i. Solution (3 + 2i)(1 − 3i) 3 − 9i + 2i − 6i 2 9 7i 3 + 2i = = = − . 1 + 3i (1 + 3i)(1 − 3i) 10 10 10 Modulus of a complex number The modulus of the complex number z = a + ib denoted by |z|, and also called its magnitude, is defined as |z| = (a 2 + b2 )1/2 = (zz)1/2 . It follows directly from the definitions of the modulus and division that |z| = |z| = (a 2 + b2 )1/2 , and z1 /z2 = z1 z2 /|z2 |2 . EXAMPLE 1.8 If z = 3 + 7i, then |z| = |3 + 7i| = (32 + 72 )1/2 = √ 58. It is seen that the foregoing rules for the arithmetic manipulation of complex numbers reduce to the ordinary arithmetic rules for the algebraic manipulation of real numbers when all the complex numbers involved are real numbers. Complex numbers are the most general numbers that need to be used in mathematics, and they contain the real numbers as a special case. There is, however, a fundamental difference between real and complex numbers to which attention will be drawn after their common properties have been listed. Properties shared by real and complex numbers Let z, u, and w be arbitrary real or complex numbers. Then the following properties are true: z + u = u + z. This means that the order in which complex numbers are added does not affect their sum. 2. zu = uz. This means that the order in which complex numbers are multiplied does not affect their product. 1.
  • 33. Section 1.3 The Complex Plane 3. (z + u) + w = z + (u + w). This means that the order in which brackets are inserted into a sum of finitely many complex numbers does not affect the sum. 4. z(uw) = (zu)w. This means that the terms in a product of complex numbers may be grouped and multiplied in any order without affecting the resulting product. 5. z(u + w) = zu + zw. This means that the product of z and a sum of complex numbers equals the sum of the products of z and the individual complex numbers involved in the sum. 6. z + 0 = 0 + z = z. This result means that the addition of zero to any complex number leaves it unchanged. 7. 15 z · 1 = 1 · z = z. This result means that multiplication of any complex number by unity leaves the complex number unchanged. Despite the properties common to real and complex numbers just listed, there remains a fundamental difference because, unlike real numbers, complex numbers have no natural order. So if z1 and z2 are any complex numbers, a statement such as z1 < z2 has no meaning. EXERCISES 1.2 Find the roots of the equations in Exercises 1 through 6. 1. z2 + z + 1 = 0. 2. 2z2 + 5z + 4 = 0. 3. z2 + z + 6 = 0. 4. 3z2 + 2z + 1 = 0. 5. 3z2 + 3z + 1 = 0. 6. 2z2 − 2z + 3 = 0. 7. Given that z = 1 is a root, find the other two roots of 2z3 − z2 + 3z − 4 = 0. 8. Given that z = −2 is a root, find the other two roots of 4z3 + 11z2 + 10z + 8 = 0. 1.3 9. Given u = 4 − 2i, v = 3 − 4i, w = −5i and a + ib = (u + iv)w, find a and b. 10. Given u = −4 + 3i, v = 2 + 4i, and a + ib = uv2 , find a and b. 11. Given u = 2 + 3i, v = 1 − 2i, w = −3 − 6i, find |u + v|, u + 2v, u − 3v + 2w, uv, uvw, |u/v|, v/w. 12. Given u = 1 + 3i, v = 2 − i, w = −3 + 4i, find uv/w, uw/v and |v|w/u. The Complex Plane cartesian representation of z Complex numbers can be represented geometrically either as points, or as directed line segments (vectors), in the complex plane. The complex plane is also called the z-plane because of the representation of complex numbers in the form z = x + i y. Both of these representations are accomplished by using rectangular cartesian coordinates and plotting the complex number z = a + ib as the point (a, b) in the plane, so the x-coordinate of z is a = Re{z} and its y-coordinate is b = Im{z}. Because of this geometrical representation, a complex number written in the form z = a + ib is said to be expressed in cartesian form. To acknowledge the Swiss amateur mathematician Jean-Robert Argand, who introduced the concept of the complex plane in 1806, and who by profession was a bookkeeper, this representation is also called the Argand diagram.
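Before moving to the geometry of the complex plane, here is a quick numerical cross-check of the arithmetic rules of Section 1.2. This sketch is an editorial addition rather than part of the original text; it uses Python's built-in complex type to reproduce Examples 1.6(b), 1.7, and 1.8 and to confirm the commutativity properties listed above.

    import cmath

    # Example 1.6(b): (3 - 2i)(-1 + 4i)(1 + i) should equal -9 + 19i.
    product = (3 - 2j) * (-1 + 4j) * (1 + 1j)
    assert product == -9 + 19j

    # Example 1.7: (3 + 2i)/(1 + 3i) should equal 9/10 - 7i/10.
    quotient = (3 + 2j) / (1 + 3j)
    assert cmath.isclose(quotient, 0.9 - 0.7j)

    # Conjugate and modulus: z * conj(z) = |z|**2, and |3 + 7i| = sqrt(58) (Example 1.8).
    z = 3 + 7j
    assert cmath.isclose(z * z.conjugate(), abs(z) ** 2)
    assert cmath.isclose(abs(z), 58 ** 0.5)

    # Commutativity of addition and multiplication (properties 1 and 2).
    u, w = 2 + 3j, -1 + 4j
    assert u + w == w + u and u * w == w * u
    print("all identities check out")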
[FIGURE 1.1  (a) Complex numbers as points. (b) Complex numbers as vectors. The figure shows the numbers 4, 3i, 2 + 2i, and 2 − 2i in the complex plane.]

For obvious reasons, the x-axis is called the real axis and the y-axis the imaginary axis. Purely real numbers are represented by points on the real axis and purely imaginary ones by points on the imaginary axis. Examples of the representation of typical points in the complex plane are given in Fig. 1.1a, where the numbers 4, 3i, 2 + 2i, and 2 − 2i are plotted as points. These same complex numbers are shown again in Fig. 1.1b as directed line segments drawn from the origin (vectors). The arrow shows the sense along the line, that is, the direction from the origin to the tip of the vector representing the complex number. It can be seen from both figures that, when represented in the complex plane, a complex number and its complex conjugate (in this case 2 + 2i and 2 − 2i) lie symmetrically above and below the real axis. Another way of expressing this result is by saying that a complex number and its complex conjugate appear as reflections of each other in the real axis, which acts like a mirror.

The addition and subtraction of two complex numbers have convenient geometrical interpretations that follow from the definitions given in Section 1.2. When complex numbers are added, their respective real and imaginary parts are added, whereas when they are subtracted, their respective real and imaginary parts are subtracted. This leads at once to the triangle law for addition illustrated in Fig. 1.2a, in which the directed line segment (vector) representing z2 is translated without rotation or change of scale, to bring its base (the end opposite to the arrow) into coincidence with the tip of the directed line segment representing z1 (the end at which the arrow is located). The sum z1 + z2 of the two complex numbers is then represented by the directed line segment from the base of the line segment representing z1 to the tip of the newly positioned line segment representing z2. The name triangle law comes from the triangle that is constructed in the complex plane during this geometrical process of addition. Notice that an immediate consequence of this law is that addition is commutative, because both z1 + z2 and z2 + z1 are seen to lead to the same directed line segment in the complex plane. For this reason the addition of complex numbers is also said to obey the parallelogram law for addition, because the commutative property generates the parallelogram shown in Fig. 1.2a.
[FIGURE 1.2  Addition and subtraction of complex numbers using the triangle/parallelogram law: (a) the sum z1 + z2; (b) the difference z1 − z2.]

The geometrical interpretation of the subtraction of z2 from z1 follows similarly, by adding to z1 the directed line segment −z2 that is obtained by reversing the sense (arrow) along z2, as shown in Fig. 1.2b.

It is an elementary fact from Euclidean geometry that the sum of the lengths of the two sides |u| and |v| of the triangle in Fig. 1.3 is greater than or equal to the length of the hypotenuse |u + v|, so from geometrical considerations we can write

|u + v| ≤ |u| + |v|.

This result involving the moduli of the complex numbers u and v is called the triangle inequality for complex numbers, and it has many applications. An algebraic proof of the triangle inequality proceeds as follows:

|u + v|^2 = (u + v)(\overline{u + v}) = u\bar{u} + v\bar{u} + u\bar{v} + v\bar{v} = |u|^2 + |v|^2 + (u\bar{v} + \overline{u\bar{v}}) ≤ |u|^2 + |v|^2 + 2|u\bar{v}| = (|u| + |v|)^2.

The required result now follows from taking the positive square root. A similar argument, the proof of which is left as an exercise, can be used to show that ||u| − |v|| ≤ |u + v|, so when combined with the triangle inequality we have

||u| − |v|| ≤ |u + v| ≤ |u| + |v|.

[FIGURE 1.3  The triangle inequality.]
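As an editorial aside (not in the original text), the double inequality just derived can be exercised numerically. The Python sketch below samples random pairs u, v and confirms that both bounds hold in every case, up to floating-point rounding.

    import random

    # Randomized check of the two bounds derived above:
    #   ||u| - |v||  <=  |u + v|  <=  |u| + |v|
    random.seed(1)
    for _ in range(10_000):
        u = complex(random.uniform(-10, 10), random.uniform(-10, 10))
        v = complex(random.uniform(-10, 10), random.uniform(-10, 10))
        lower, middle, upper = abs(abs(u) - abs(v)), abs(u + v), abs(u) + abs(v)
        assert lower <= middle + 1e-12 and middle <= upper + 1e-12
    print("triangle inequality held in every sampled case")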
  • 36. 18 Chapter 1 Review of Prerequisites EXERCISES 1.3 In Exercises 1 through 8 use the parallelogram law to form the sum and difference of the given complex numbers and then verify the results by direct addition and subtraction. 1. 2. 3. 4. u = 2 + 3i, v = 1 − 2i. u = 4 + 7i, v = −2 − 3i. u = −3, v = −3 − 4i. u = 4 + 3i, v = 3 + 4i. 1.4 5. 6. 7. 8. u = 3 + 6i, v = −4 + 2i. u = −3 + 2i, v = 6i. u = −4 + 2i, v = −4 − 10i. u = 4 + 7i, v = −3 + 5i. In Exercises 9 through 11 use the parallelogram law to verify the triangle inequality |u + v| ≤ |u| + |v| for the given complex numbers u and v. 9. u = −4 + 2i, v = 3 + 5i. 10. u = 2 + 5i, v = 3 − 2i. 11. u = −3 + 5i, v = 2 + 6i. Modulus and Argument Representation of Complex Numbers polar representation of z When representing z = x + i y in the complex plane by a point P with coordinates (x, y), a natural alternative to the cartesian representation is to give the polar coordinates (r, θ) of P. This polar representation of z is shown in Fig. 1.4, where OP = r = |z| = (x 2 + y2 )1/2 and tan θ = y/x. (1) The radial distance OP is the modulus of z, so r = |z|, and the angle θ measured counterclockwise from the positive real axis is called the argument of z. Because of this, a complex number expressed in terms of the polar coordinates (r, θ ) is said to be in modulus–argument form. The argument θ is indeterminate up to a multiple of 2π, because the polar coordinates (r, θ ), and (r, θ + 2kπ ), with k = ±1, ±2, . . . , identify the same point P. By convention, the the angle θ is called the principal value of the argument of z when it lies in the interval −π < θ ≤ π . To distinguish the principal value of the argument from all of its other values, we write Arg z = θ, when −π < θ ≤ π. (2) The values of the argument of z that differ from this value of θ by a multiple of 2π are denoted by arg z, so that arg z = θ + 2kπ, with k = ± 1, ±2, . . . . Imaginary axis P(r, θ) y = r sin θ r= θ O z ⎢⎥ z x = r cos θ Real axis FIGURE 1.4 The complex plane and the (r, θ ) representation of z. (3)
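Passing between the cartesian form x + iy and the polar pair (r, θ) is exactly what the cmath.polar and cmath.rect functions of Python's standard library do, so they give a quick illustrative check of (1) for the numbers plotted in Fig. 1.1.

```python
import cmath

# Modulus r = |z| and principal argument Arg z (in (-pi, pi]) for a few numbers.
for z in (4 + 0j, 3j, 2 + 2j, 2 - 2j):
    r, theta = cmath.polar(z)              # r = |z|, theta = Arg z
    z_back = cmath.rect(r, theta)          # r(cos(theta) + i sin(theta))
    print(z, "-> r =", round(r, 4), " Arg z =", round(theta, 4), " back:", z_back)
```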
The significance of the multivalued nature of arg z will become apparent later when the roots of complex numbers are determined.
The connection between the cartesian coordinates (x, y) and the polar coordinates (r, θ) of the point P corresponding to z = x + iy is easily seen to be given by

x = r cos θ   and   y = r sin θ.

modulus–argument representation of z   This leads immediately to the representation of z = x + iy in the alternative modulus–argument form

z = r(cos θ + i sin θ).   (4)

A routine calculation using elementary trigonometric identities shows that

(cos θ + i sin θ)^2 = (cos 2θ + i sin 2θ).

An inductive argument using the above result as its first step then establishes the following simple but important theorem.

THEOREM 1.1   De Moivre's theorem
(cos θ + i sin θ)^n = (cos nθ + i sin nθ),   for n a natural number.

EXAMPLE 1.9   Use de Moivre's theorem to express cos 4θ and sin 4θ in terms of powers of cos θ and sin θ.

Solution   The result is obtained by first setting n = 4 in de Moivre's theorem and expanding (cos θ + i sin θ)^4 to obtain

cos^4 θ + 4i cos^3 θ sin θ − 6 cos^2 θ sin^2 θ − 4i cos θ sin^3 θ + sin^4 θ = cos 4θ + i sin 4θ.

Equating the respective real and imaginary parts on either side of this identity gives the required results

cos 4θ = cos^4 θ − 6 cos^2 θ sin^2 θ + sin^4 θ   and   sin 4θ = 4 cos^3 θ sin θ − 4 cos θ sin^3 θ.

As the complex number z = cos θ + i sin θ has unit modulus, it follows that all numbers of this form lie on the unit circle (a circle of radius 1) centered on the origin, as shown in Fig. 1.5. Combining (4) with de Moivre's theorem, we see that if z = r(cos θ + i sin θ), then

z^n = r^n(cos nθ + i sin nθ),   for n a natural number.   (5)

The relationship between e^θ, sin θ, and cos θ can be seen from the following well-known series expansions of the functions

e^θ = Σ_{n=0}^∞ θ^n/n! = 1 + θ + θ^2/2! + θ^3/3! + θ^4/4! + θ^5/5! + θ^6/6! + ··· ;

sin θ = Σ_{n=0}^∞ (−1)^n θ^{2n+1}/(2n + 1)! = θ − θ^3/3! + θ^5/5! − θ^7/7! + ··· ;

cos θ = Σ_{n=0}^∞ (−1)^n θ^{2n}/(2n)! = 1 − θ^2/2! + θ^4/4! − θ^6/6! + ··· .
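The identities of Example 1.9 can be verified symbolically. The sketch below is illustrative only and assumes the sympy library; both printed differences should simplify to zero.

```python
import sympy as sp

theta = sp.symbols('theta', real=True)
expansion = sp.expand((sp.cos(theta) + sp.I*sp.sin(theta))**4)

# By de Moivre's theorem the real part equals cos(4*theta) and the imaginary
# part equals sin(4*theta), exactly as found in Example 1.9.
print(sp.simplify(sp.re(expansion) - sp.cos(4*theta)))   # expected: 0
print(sp.simplify(sp.im(expansion) - sp.sin(4*theta)))   # expected: 0
```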
[FIGURE 1.5 Point z = cos θ + i sin θ on the unit circle centered on the origin.]

Euler formula   By making a formal power series expansion of the function e^{iθ}, simplifying powers of i, grouping together the real and imaginary terms, and using the series representations for cos θ and sin θ, we arrive at what is called the real variable form of the Euler formula

e^{iθ} = cos θ + i sin θ,   for any real θ.   (6)

This immediately implies that if z = r e^{iθ}, then

z^α = r^α e^{iαθ},   for any real α.   (7)

When θ is restricted to the interval −π < θ ≤ π, formula (6) leads to the useful results

1 = e^{i0},   i = e^{iπ/2},   −1 = e^{iπ},   −i = e^{−iπ/2}

and, in particular, to 1 = e^{2kπi} for k = 0, ±1, ±2, . . . .
The Euler form for complex numbers makes their multiplication and division very simple. To see this we set z1 = r1 e^{iα} and z2 = r2 e^{iβ} and then use the results

z1 z2 = r1 r2 e^{i(α+β)}   and   z1/z2 = (r1/r2) e^{i(α−β)}.   (8)

These show that when complex numbers are multiplied, their moduli are multiplied and their arguments are added, whereas when complex numbers are divided, their moduli are divided and their arguments are subtracted.

EXAMPLE 1.10   Find uv, u/v, and u^25 given that u = 1 + i, v = √3 − i.

Solution   u = 1 + i = √2 e^{iπ/4} and v = √3 − i = 2e^{−iπ/6}, so uv = 2√2 e^{iπ/12} and u/v = (1/√2) e^{i5π/12}, while

u^25 = (√2)^25 (e^{iπ/4})^25 = 4096√2 e^{i(6+1/4)π} = 4096√2 (e^{i6π})(e^{iπ/4}) = 4096√2 e^{iπ/4} = 4096(1 + i).

To find the principal value of the argument of a given complex number z, namely Arg z, use should be made of the signs of x = Re{z} and y = Im{z} together
with the results listed below, all of which follow by inspection of Fig. 1.5.

Signs of x and y        Arg z = θ
x < 0, y < 0            −π < θ < −π/2
x > 0, y < 0            −π/2 < θ < 0
x > 0, y > 0            0 < θ < π/2
x < 0, y > 0            π/2 < θ < π

EXAMPLE 1.11   Find r = |z|, Arg z, arg z, and the modulus–argument form of the following values of z.
(a) −2√3 − 2i   (b) −1 + i√3   (c) 1 + i   (d) 2 − i2√3.

Solution
(a) r = {(−2√3)^2 + (−2)^2}^{1/2} = 4, Arg z = θ = −5π/6 and arg z = −5π/6 + 2kπ, k = ±1, ±2, . . . , z = 4(cos(−5π/6) + i sin(−5π/6)).
(b) r = {(−1)^2 + (√3)^2}^{1/2} = 2, Arg z = θ = 2π/3 and arg z = 2π/3 + 2kπ, k = ±1, ±2, . . . , z = 2(cos(2π/3) + i sin(2π/3)).
(c) r = {(1)^2 + (1)^2}^{1/2} = √2, Arg z = θ = π/4 and arg z = π/4 + 2kπ, k = ±1, ±2, . . . , z = √2(cos(π/4) + i sin(π/4)).
(d) r = {(2)^2 + (−2√3)^2}^{1/2} = 4, Arg z = θ = −π/3 and arg z = −π/3 + 2kπ, k = ±1, ±2, . . . , z = 4(cos(−π/3) + i sin(−π/3)).

EXERCISES 1.4

1. Expand (cos θ + i sin θ)^2 and then use trigonometric identities to show that (cos θ + i sin θ)^2 = (cos 2θ + i sin 2θ).
2. Give an inductive proof of de Moivre's theorem (cos θ + i sin θ)^n = (cos nθ + i sin nθ), for n a natural number.
3. Use de Moivre's theorem to express cos 5θ and sin 5θ in terms of powers of cos θ and sin θ.
4. Use de Moivre's theorem to express cos 6θ and sin 6θ in terms of powers of cos θ and sin θ.
5. Show by expanding (cos α + i sin α)(cos β + i sin β) and using trigonometric identities that (cos α + i sin α)(cos β + i sin β) = cos(α + β) + i sin(α + β).
6. Show by expanding (cos α + i sin α)/(cos β + i sin β) and using trigonometric identities that (cos α + i sin α)/(cos β + i sin β) = cos(α − β) + i sin(α − β).
7. If z = cos θ + i sin θ = e^{iθ}, show that when n is a natural number,
   cos(nθ) = (1/2)(z^n + 1/z^n)   and   sin(nθ) = (1/(2i))(z^n − 1/z^n).
   Use these results to express cos^3 θ sin^3 θ in terms of multiple angles of θ. Hint: \overline{z} = 1/z.
8. Use the method of Exercise 7 to express sin^6 θ in terms of multiple angles of θ.
9. By expanding (z + 1/z)^4, grouping terms, and using the method of Exercise 7, show that cos^4 θ = (1/8)(3 + 4 cos 2θ + cos 4θ).
10. By expanding (z − 1/z)^5, grouping terms, and using the method of Exercise 7, show that sin^5 θ = (1/16)(sin 5θ − 5 sin 3θ + 10 sin θ).
11. Use the method of Exercise 7 to show that cos^3 θ + sin^3 θ = (1/4)(cos 3θ + 3 cos θ − sin 3θ + 3 sin θ).
  • 40. 22 Chapter 1 Review of Prerequisites In Exercises 12 through 15 express the functions of u, v, and w in modulus-argument form. √ 12. uv, u/v, and v5 , given that u = 2 − 2i and v = 3 + i3 3. √ 13. uv, u/v, and u7 , given that u = −1 − i 3, v = −4 + 4i. √ 14. uv, u/v, and v6 , given that u = 2 − 2i, v = 2 − i2 3. 15. uvw, √ uw/v, and w 3 /u4 , given that u = 2 − 2i, v = 3 − i3 3 and w = 1 + i. √ 16. Express [(−8 + i8 3)/(−1 − i)]2 in modulus–argument form. √ 17. Find in modulus–argument form [(1 + i 3)3 / (−1 + i)2 ]3 . 18. Use the factorization (1 − zn+1 ) = (1 − z)(1 + z + z2 + · · · + zn ) 1.5 with z = eiθ = exp(iθ) to show that n exp(ikθ) = k=1 exp(inθ) − 1 . 1 − exp(−iθ) 19. Use the final result of Exercise 18 to show that n exp(ikθ) = k=1 exp[i(n + 1/2)θ] − exp(iθ/2) , exp(iθ/2) − exp(−iθ/2) and then use the result to deduce the Lagrange identity 1 + cos θ + cos 2θ + · · · + cos nθ sin[(n + 1/2)θ] = 1/2 + , for 0 < θ < 2π. 2 sin(θ/2) (z = 1) Roots of Complex Numbers It is often necessary to find the n values of z1/n when n is a positive integer and z is an arbitrary complex number. This process is called finding the nth roots of z. To determine these roots we start by setting w = z1/n , which is equivalent to w n = z. Then, after defining w and z in modulus–argument form as w = ρeiφ and z = reiθ , (9) we substitute for w and z in w n = z to obtain ρ n einφ = r eiθ . It is at this stage, in order to find all n roots, that use must be made of the manyvalued nature of the argument of a complex number by recognizing that 1 = e2kπi for k = 0, ±1, ±2, . . . . Using this result we now multiply the right-hand side of the foregoing result by by e2kπi (that is, by 1) to obtain ρ n einφ = reiθ e2kπi = rei(θ +2kπ) . Equality of complex numbers in modulus–argument form means the equality of their moduli and, correspondingly, the equality of their arguments, so applying this to the last result we have ρn = r and nφ = θ + 2kπ, ρ = r 1/n and φ = (θ + 2kπ)/n. showing that Here r 1/n is simply the nth positive root of r : ρ = √ n r.
[FIGURE 1.6 Location of the roots of z^{1/n}.]

nth roots of a complex number z   Finally, when we substitute these results into the expression for w, we see that the n values of the roots denoted by w0, w1, . . . , wn−1 are given by

wk = r^{1/n}{cos[(θ + 2kπ)/n] + i sin[(θ + 2kπ)/n]},   for k = 0, 1, . . . , n − 1.   (10)

Notice that it is only necessary to allow k to run through the successive integers 0, 1, . . . , n − 1, because the period of the sine and cosine functions is 2π, so allowing k to increase beyond the value n − 1 will simply repeat this same set of roots. An identical argument shows that allowing k to run through successive negative integers can again only generate the same n roots w0, w1, . . . , wn−1.
Examination of the arguments of the roots shows them to be spaced uniformly around a circle of radius r^{1/n} centered on the origin. The angle between the radial lines drawn from the origin to each successive root is 2π/n, with the radial line from the origin to the first root w0 making an angle θ/n to the positive real axis, as shown in Fig. 1.6. This means that if the location on the circle of any one root is known, then the locations of the others follow immediately.
Writing unity in the form 1 = e^{i0} shows its modulus to be r = 1 and the principal value of its argument to be θ = 0. Substitution in formula (10) then shows the n roots of 1^{1/n}, called the nth roots of unity, to be

w0 = 1,  w1 = e^{i2π/n},  w2 = e^{i4π/n}, . . . ,  wn−1 = e^{i2(n−1)π/n}.   (11)

By way of example, the fifth roots of unity are located around the unit circle as shown in Fig. 1.7. If we set ω = w1, it follows that the nth roots of unity can be written in the form 1, ω, ω^2, . . . , ω^{n−1}. As ω^n = 1 and

ω^n − 1 = (ω − 1)(1 + ω + ω^2 + · · · + ω^{n−1}) = 0,

and as ω ≠ 1, we see that the nth roots of unity satisfy

1 + ω + ω^2 + · · · + ω^{n−1} = 0.   (12)
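Formula (10) translates directly into a few lines of code. The sketch below is illustrative only; it computes the fifth roots of unity from (10) and checks property (12), that the roots sum to zero.

```python
import cmath

def nth_roots(z, n):
    """The n distinct nth roots of z, computed from formula (10)."""
    r, theta = cmath.polar(z)                     # r = |z|, theta = Arg z
    return [r**(1.0/n) * cmath.exp(1j*(theta + 2*cmath.pi*k)/n) for k in range(n)]

roots = nth_roots(1, 5)                           # the fifth roots of unity, as in Fig. 1.7
for w in roots:
    print(round(w.real, 6), round(w.imag, 6))

# Property (12): the nth roots of unity sum to zero.
print(abs(sum(roots)))                            # essentially 0, up to rounding error
```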
  • 42. 24 Chapter 1 Review of Prerequisites Imaginary axis i w1 w2 2π /5 2π /5 w0 2π/5 1 0 Real axis 2π /5 2π /5 w3 w4 FIGURE 1.7 The fifth roots of unity. This result remains true if ω is replaced by any one of the other nth roots of unity, with the exception of 1 itself. EXAMPLE 1.12 Find w = (1 + i)1/3 . √ √ Solution Setting z = 1 + i = 2eiπ/4 shows that r = |z| = 2 and θ = π/4. Substituting these results into formula (1) gives wk = 21/6 {cos[(1/12)(1 + 8k)π ] + i sin[(1/12)(1 + 8k)π ]}, for k = 0, 1, 2. The square root of a complex number ζ = α + iβ is often required, so we now derive a useful formula for its two roots in terms of |ζ |, α and the sign of β. To obtain the result we consider the equation z2 = ζ, where ζ = α + iβ, and let Arg ζ = θ . Then we may write z2 = |ζ |eiθ , and taking the square root of this result we find the two square roots z− and z+ are given by z± = ±|ζ |1/2 eiθ/2 = ±|ζ |1/2 {cos(θ/2) + i sin(θ/2)}. Now cos θ = α/|ζ |, but cos2 (θ/2) = (1/2)(1 + cos θ ), and sin2 (θ/2) = (1/2)(1 − cos θ),
so

cos^2(θ/2) = (1/2)(1 + α/|ζ|)   and   sin^2(θ/2) = (1/2)(1 − α/|ζ|).

As −π < θ ≤ π, it follows that in this interval cos(θ/2) is nonnegative, so taking the square root of cos^2(θ/2) we obtain

cos(θ/2) = [(|ζ| + α)/(2|ζ|)]^{1/2}.

However, the function sin(θ/2) is negative in the interval −π < θ < 0 and positive in the interval 0 < θ < π, and so has the same sign as β. Thus, the square root of sin^2(θ/2) can be written in the form

sin(θ/2) = sign(β)[(|ζ| − α)/(2|ζ|)]^{1/2}.

Using these expressions for cos(θ/2) and sin(θ/2) in the square roots z± brings us to the following useful rule.

Rule for finding the square root of a complex number
Let z^2 = ζ, with ζ = α + iβ. Then the square roots z+ and z− of ζ are given by

z+ = [(|ζ| + α)/2]^{1/2} + i sign(β)[(|ζ| − α)/2]^{1/2}

and

z− = −[(|ζ| + α)/2]^{1/2} − i sign(β)[(|ζ| − α)/2]^{1/2}.

EXAMPLE 1.13   Find the square roots of (a) ζ = 1 + i and (b) ζ = 1 − i.

Solution   (a) ζ = 1 + i, so |ζ| = √2, α = 1 and sign(β) = 1, so the square roots of ζ = 1 + i are

z± = ±{[(√2 + 1)/2]^{1/2} + i[(√2 − 1)/2]^{1/2}}.

(b) ζ = 1 − i, so |ζ| = √2, α = 1 and sign(β) = −1, from which it follows that the square roots of ζ = 1 − i are

z± = ±{[(√2 + 1)/2]^{1/2} − i[(√2 − 1)/2]^{1/2}}.

The theorem that follows provides information about the roots of polynomials with real coefficients that proves to be useful in a variety of ways.
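The rule above is easily implemented and compared with a library square root. The sketch below is illustrative only; the helper names are chosen here for convenience and are not part of the text.

```python
import cmath

def sign(x):
    return 1.0 if x >= 0 else -1.0

def square_roots(zeta):
    """Square roots of zeta = alpha + i*beta via the rule of this section."""
    alpha, beta = zeta.real, zeta.imag
    mod = abs(zeta)                              # |zeta|
    z_plus = complex(((mod + alpha) / 2) ** 0.5,
                     sign(beta) * ((mod - alpha) / 2) ** 0.5)
    return z_plus, -z_plus

for zeta in (1 + 1j, 1 - 1j):                    # Example 1.13 (a) and (b)
    z_plus, z_minus = square_roots(zeta)
    print(zeta, z_plus, cmath.sqrt(zeta))        # z_plus agrees with cmath.sqrt
```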
  • 44. 26 Chapter 1 Review of Prerequisites THEOREM 1.2 Roots of a polynomial with real coefficients Let P(z) = zn + a1 zn−1 + a2 zn−2 + · · · an−1 z + an be a polynomial of degree n in which all the coefficients a1 , a2 , . . . , an are real. Then either all the n roots of P(z) = 0 are real, that is, the n zeros of P(z) are all real, or any that are complex must occur in complex conjugate pairs. Proof The proof uses the following simple properties of the complex conjugate operation. 1. If a is real, then a = a. This result follows directly from the definition of the complex conjugate operation. 2. If b and c are any two complex numbers, then b + c = b + c. This result also follows directly from the definition of the complex conjugate operation. 3. If b and c are any two complex numbers, then bc = bc and br = (b)r . We now proceed to the proof. Taking the complex conjugate of P(z) = 0 gives zn + a1 zn−1 + a2 zn−2 + · · · + an−1 z + an = 0, but the ar are all real so ar zn−r = ar zn−r = ar zn−r = ar (z)n−r , allowing the preceding equation to be rewritten as (z)n + a1 (z)n−1 + a2 (z)n−2 + · · · + an−1 z + an = 0. This result is simply P(z) = 0, showing that if z is a complex root of P(z), then so also is z. Equivalently, z and z are both zeros of P(z). If, however, z is a real root, then z = z and the result remains true, so the first part of the theorem is proved. The second part follows from the fact that if z = α + iβ is a root, then so also is z = α − iβ, and so (z − α − iβ) and (z − α + iβ) are factors of P(z). The product of these factors must also be a factor of P(z), but (z − α − iβ)(z − α + iβ) = z2 − 2αz + α 2 + β 2 , and the expression on the right is a quadratic in z with real coefficients, so the final result of the theorem is established. EXAMPLE 1.14 Find the roots of z3 − z2 − z − 2 = 0, given that z = 2 is a root. Solution If z = 2 is a root of P(z) = 0, then z − 2 is a factor of P(z), so dividing P(z) by z − 2 we obtain z2 + z + 1. The remaining two roots of P(z) = 0 are the (−1 roots of z2 + z + 1 = 0. Solving this quadratic equation we find that z =√ ± √ √ i 3)/2, so the three roots of the equation are 2, (−1 + i 3)/2, and (−1 − i 3)/2. For more background information and examples on complex numbers, the complex plane and roots of complex numbers, see Chapter 1 of reference [6.1], Sections 1.1 to 1.5 of reference [6.4], and Chapter 1 of reference [6.6].
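Theorem 1.2 and Example 1.14 can be confirmed numerically. The sketch below is illustrative and assumes numpy; it computes the roots of z^3 − z^2 − z − 2 and checks that the two complex roots are conjugates of each other.

```python
import numpy as np

# Roots of P(z) = z^3 - z^2 - z - 2, a polynomial with real coefficients (Example 1.14).
roots = np.roots([1, -1, -1, -2])
print(roots)      # expected: 2, (-1 + i*sqrt(3))/2 and (-1 - i*sqrt(3))/2

# Theorem 1.2: the complex zeros occur as a conjugate pair.
complex_roots = [r for r in roots if abs(r.imag) > 1e-9]
assert len(complex_roots) == 2
assert np.isclose(complex_roots[0], np.conj(complex_roots[1]))
```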
  • 45. Section 1.6 Partial Fractions 27 EXERCISES 1.5 In Exercises 1 through 8 find the square roots of the given complex number by using result (10), and then confirm the result by using the formula for finding the square root of a complex mumber. 1. 2. 3. 4. −1 + i. 3 + 2i. i. −1 + 4i. 5. 6. 7. 8. 1 + cos(2π/n) + cos(4π/n) + · · · + cos[(2(n − 1)π/n)] = 0 and 2 − 3i. −2 − i. 4 − 3i. −5 + i. sin(2π/n) + sin(4π/n) + · · · + sin[(2(n − 1)π/n)] = 0. In Exercises 9 through 14 find the roots of the given complex number. √ 12. (−1 − i)1/3 . 9. (1 + i 3)1/3 . 1/4 13. (−i)1/3 . 10. i . 1/4 14. (4 + 4i)1/4 . 11. (−1) . 15. Find the roots of z3 + z(i − 1) = 0. 16. Find the roots of z3 + i z/(1 + i) = 0. 1.6 17. Use result (12) to show that 18. Use Theorem 1.1 and the representation z = r eiθ to prove that if a and b are any two arbitrary complex numbers, then ab = ab and (a r ) = (a)r . 19. Given z = 1 is a zero of the polynomial P(z) = z3 − 5z2 + 17z − 13, find its other two zeros and verify that they are complex conjugates. 20. Given that z = −2 is a zero of the polynomial P(z) = z5 + 2z4 − 4z − 8, find its other four zeros and verify that they occur in complex conjugate pairs. 21. Find the two zeros of the quadratic P(z) = z2 − 1 + i, and explain why they do not occur as a complex conjugate pair. Partial Fractions Let N(x) and D(x) be two polynomials. Then a rational function of x is any function of the form N(x)/D(x). The method of partial fractions involves the decomposition of rational functions into an equivalent sum of simpler terms of the type P2 P1 , ,... ax + b (ax + b)2 and Q2 x + R2 Q1 x + R1 , ,..., 2 + Bx + C 2 + Bx + C)2 Ax (Ax where the coefficients are all real together with, possibly, a polynomial in x. The steps in the reduction of a rational function to its partial fraction representation are as follows: STEP 1 Factorize D(x) into a product of linear factors and quadratic factors with real coefficients with complex roots, called irreducible factors. This is the hardest step, and real quadratic factors will only arise when D(x) = 0 has pairs of complex conjugate roots (see Theorem 1.2). Use the result to express D(x) in the form D(x) = (a1 x + b1 )r1 . . . (am x + bm)rm A1 x 2 + B1 x + C1 . . . Ak x 2 + Bk x + Ck sk s1 , where ri is the number of times the linear factor (ai x + bi ) occurs in the factorization of D(x), called its multiplicity, and s j is the corresponding multiplicity of the quadratic factor (Aj x 2 + Bj x + C j ).
  • 46. 28 Chapter 1 Review of Prerequisites STEP 2 Suppose first that the degree n of the numerator is less than the degree d of the denominator. Then, to every different linear factor (ax + b) with multiplicity r , include in the partial fraction expansion the terms P1 P2 Pr + + ··· + , (ax + b) (ax + b)2 (ax + b)r partial fraction undetermined coefficients where the constant coefficients Pi are unknown at this stage, and so are called undetermined coefficients. STEP 3 To every quadratic factor ( Ax 2 + Bx + C)s with multiplicity s include in the partial fraction expansion the terms Q1 x + R1 Q2 x + R2 Qs x + Rs + + ··· + , 2 + Bx + C) 2 + Bx + C)2 2 + Bx + C)s (Ax (Ax (Ax where the Q j and Rj for j = 1, 2, . . . , s are undetermined coefficients. STEP 4 Take as the partial fraction representation of N(x)/D(x) the sum of all the terms in Steps 2 and 3. STEP 5 Multiply the expression N(x)/D(x) = Partial fraction representation in Step 4 by D(x), and determine the unknown coefficients by equating the coefficients of corresponding powers of x on either side of this expression to make it an identity (that is, true for all x). STEP 6 Substitute the values of the coefficients determined in Step 5 into the expression in Step 4 to obtain the required partial fraction representation. STEP 7 If n ≥ d, use long division to divide the denominator into the numerator to obtain the sum of a polynomial of degree n − d of the form T0 + T1 x + T2 x 2 + · · · + Tn−d x n−d , together with a remainder term in the form of a rational function R(x) of the type just considered. Find the partial fraction representation of the rational function R(x) using Steps 1 to 6. The required partial fraction representation is then the sum of the polynomial found by long division and the partial fraction representation of R(x). EXAMPLE 1.15 Find the partial fraction representations of (a) F(x) = x2 (x + 1)(x − 2)(x + 3) and (b) F(x) = 2x 3 − 4x 2 + 3x + 1 . (x − 1)2 Solution (a) All terms in the denominator are linear factors, so by Step 1 the appropriate form of partial fraction representation is A B C x2 = + + . (x + 1)(x − 2)(x + 3) x+1 x−2 x+3 Cross multiplying, we obtain x 2 = A(x − 2)(x + 3) + B(x + 1)(x + 3) + C(x + 1)(x − 2).
  • 47. Section 1.6 Partial Fractions 29 Setting x = −1 makes the terms in B and C vanish and gives A = −1/6. Setting x = 2 makes the terms in A and C vanish and gives B = 4/15, whereas setting x = −3 makes the terms in A and B vanish and gives C = 9/10, so x2 −1 4 9 = + + . (x + 1)(x − 2)(x + 3) 6(x + 1) 15(x − 2) 10(x + 3) (b) The degree of the numerator exceeds that of the denominator, so from Step 7 it is necessary to start by dividing the denominator into the numerator longhand to obtain 2x 3 − 4x 2 + x + 3 3−x = 2x + . 2 (x − 1) (x − 1)2 We now seek a partial fraction representation of (3 − x)/(x − 1)2 by using Step 1 and writing 3−x B A + = . 2 (x − 1) x − 1 (x − 1)2 When we multiply by (x − 1)2 , this becomes 3 − x = A(x − 1) + B. Equating the constant terms gives 3 = −A+ B, whereas equating the coefficients of x gives −1 = Aso that B = 2. Thus, the required partial fraction representation is 2x 3 − 4x 2 + x + 3 2 1 + = 2x + . (x − 1)2 1 − x (x − 1)2 An examination of the way the undetermined coefficients were obtained in (a) earlier, where the degree of the numerator is less than that of the denominator and linear factors occur in the denominator, leads to a simple rule for finding the undetermined coefficients called the “cover-up rule.” The cover-up rule Let a partial fraction decomposition be required for a rational function N(x)/D(x) in which the degree of the numerator N(x) is less than that of the denominator D(x) and, when factored, let D(x) contain some linear factors (factors of degree 1). Let (x − α) be a linear factor of D(x). Then the unknown coefficient K in the term K/(x − α) in the partial fraction decomposition of N(x)/D(x) is obtained by “covering up” (ignoring) all of the other terms in the partial fraction expansion, multiplying the remaining expression N(x)/D(x) = K/(x − α) by (x − α), and then determining K by setting x = α in the result. To illustrate the use of this rule we use it in case (a) given earlier to find Afrom the representation x2 A B C = + + . (x + 1)(x − 2)(x + 3) x+1 x−2 x+3
  • 48. 30 Chapter 1 Review of Prerequisites We “cover up” (ignore) the terms involving B and C, multiply through by (x + 1), and find A from the result x2 =A (x − 2)(x + 3) completing the square by setting x = −1, when we obtain A = −1/6. The undetermined coefficients B and C follow in similar fashion. Once a partial fraction representation of a function has been obtained, it is often necessary to express any quadratic x 2 + px + q that occurs in a denominator in the form (x + A)2 + B, where A and B may be either positive or negative real numbers. This is called completing the square, and it is used, for example, when integrating rational functions and when finding inverse Laplace transforms. To find A and B we set x 2 + px + q = (x + A)2 + B = x 2 + 2Ax + A2 + B, and to make this an identity we now equate the coefficients of corresponding powers of x on either side of this expression: (coefficients of x 2 ) (coefficients of x) (constant terms) 1 = 1 (this tells us nothing) p = 2A q = A2 + B. Consequently A = (1/2) p and B = q − (1/4) p2 , and so the result obtained by completing the square is x 2 + px + q = [x + (1/2) p]2 + q − (1/4) p2 . If the more general quadratic ax 2 + bx + c occurs, all that is necessary to reduce it to this same form is to write it as ax 2 + bx + c = a[x 2 + (b/a)x + c/a], and then to complete the square using p = b/a and q = c/a. EXAMPLE 1.16 Complete the square in the following expressions: (a) x 2 + x + 1. (b) x 2 + 4x. (c) 3x 2 + 2x + 1. Solution (a) p = 1, q = 1, so A = 1/2, B = 3/4, and hence x 2 + x + 1 = (x + 1/2)2 + 3/4. (b) p = 4, q = 0, so A = 2, B = −4, and hence x 2 + 4x = (x + 2)2 − 4. (c) 3x 2 + 2x + 1 = 3[x 2 + (2/3)x + 1/3] and so p = 2/3, q = 1/3, from which it follows that A = 1/3 and B = 2/9, so 3x 2 + 2x + 1 = 3{(x + 1/3)2 + 2/9}. Further information and examples of partial fractions can be found in any one of references [1.1] to [1.7].
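Both procedures of this section can be checked with computer algebra. The sketch below is illustrative only and assumes sympy; it reproduces the partial fraction expansions of Example 1.15 (using the numerator 2x^3 − 4x^2 + x + 3 that appears in the worked solution of part (b)) and the completed square of Example 1.16(a).

```python
import sympy as sp

x = sp.symbols('x')

# Partial fractions for Example 1.15(a) and for the rational function worked in (b).
Fa = x**2 / ((x + 1)*(x - 2)*(x + 3))
Fb = (2*x**3 - 4*x**2 + x + 3) / (x - 1)**2
print(sp.apart(Fa, x))      # -1/(6*(x + 1)) + 4/(15*(x - 2)) + 9/(10*(x + 3))
print(sp.apart(Fb, x))      # 2*x - 1/(x - 1) + 2/(x - 1)**2

# Completing the square, Example 1.16(a): x^2 + x + 1 = (x + 1/2)^2 + 3/4.
p, q = 1, 1
A = sp.Rational(p, 2)
B = q - A**2
print(sp.expand((x + A)**2 + B - (x**2 + p*x + q)))   # expected: 0
```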
  • 49. Section 1.7 Fundamentals of Determinants 31 EXERCISES 1.6 6. (x 2 − 1)/(x 2 + x + 1). 7. (x 3 + x 2 + x + 1)/{(x + 2)2 (x + 1)}. 8. (x 2 + 4)/(x 3 + 3x 2 + 3x + 1). Express the rational functions in Exercises 1 through 8 in terms of partial fractions using the method of Section 1.6, and verify the results by using computer algebra to determine the partial fractions. 1. 2. 3. 4. 5. Complete the square in Exercises 9 through 14. (3x + 4)/(2x 2 + 5x + 2). (x 2 + 3x + 5)/(2x 2 + 5x + 3). (3x − 7)/(2x 2 + 9x + 10). (x 2 + 3x + 2)/(x 2 + 2x − 3). (x 3 + x 2 + x + 1)/[(x + 2)2 (x 2 + 1)]. 1.7 9. x 2 + 4x + 5. 10. x 2 + 6x + 7. 11. 2x 2 + 3x − 6. 12. 4x 2 − 4x − 3. 13. 2 − 2x + 9x 2 . 14. 2 + 2x − x 2 . Fundamentals of Determinants A determinant of order n is a single number associated with an array A of n2 numbers arranged in n rows and n columns. If the number in the ith row and jth column of a determinant is ai j , the determinant of A, denoted by det A and sometimes by |A|, is written a11 a21 det A = |A| = ... an1 a12 a22 ... an2 . . . a1n . . . a2n . ... ... . . . ann (13) It is customary to refer to the entries ai j in a determinant as its elements. Notice the use of vertical bars enclosing the array A in the notation |A| for the determinant of A, as opposed to the use of the square brackets in [A] that will be used later to denote the matrix associated with an array A of quantities in which the number of rows need not be equal to the number of columns. The value of a first order determinant det A with the single element a11 is defined as a11 so that det[a11 ] = a11 or, in terms of the alternative notation for a determinant, |a11 | = a11 . This use of the notation |.| to signify a determinant should not be confused with the notation used to signify the absolute value of a number. The second order determinant associated with an array of elements containing two rows and two columns is defined as det A = a11 a21 a12 = a11 a22 − a12 a21 , a22 (14) so, for example, using the alternative notation for a determinant we have 9 −7 3 = 9(−4) − (−7)3 = −15. −4 Notice that interchanging two rows or columns of a determinant changes its sign. We now introduce the terms minor and cofactor that are used in connection with determinants of all orders, and to do so we consider the third order determinant a11 det A = a21 a31 a12 a22 a32 a13 a23 . a33 (15)
  • 50. 32 Chapter 1 Review of Prerequisites minors and cofactors The minor Mi j associated with ai j , the element in the ith row and jth column of det A, is defined as the second order determinant obtained from det A by deleting the elements (numbers) in its ith row and jth column. The cofactor Ci j of an element in the ith row and jth column of the det A in (15) is defined as the signed minor using the rule Ci j = (−1)i+ j Mi j . (16) With these ideas in mind, the determinant det A in (15) is defined as 3 det A = a1 j (−1)1+ j det M1 j j=1 = a11 M11 − a12 M12 + a13 M13 . If we introduce the cofactors Ci j , this last result can be written det A = a11 C11 + a12 C12 + a13 C13 , (17) and more concisely as 3 det A = a1 j C1 j . (18) j=1 Result (18), or equivalently (17), will be taken as the definition of a third order determinant. EXAMPLE 1.17 Evaluate the determinant 1 2 −2 3 −3 1 0 . 1 1 Solution The minor M11 = 1 0 = (1)(1) − (0)(1) = 1, so the cofactor 1 1 C11 = (−1)(1+1) M11 = 1. 2 The minor M12 = −2 0 = (2)(1) − (0)(−2) = 2, so the cofactor 1 C12 = (−1)(1+2) M12 = −2. 2 The minor M13 = −2 1 = (2)(1) − (1)(−2) = 4, so the cofactor 1 C13 = (−1)(1+3) M13 = 4. Using (17) we have 1 2 −2 3 −3 1 0 = (1)C11 + (3)C12 + (−3)C13 = (1)(1) + (3)(−2) + (−3)(4) = −17. 1 1 When expanded, (17) becomes det A = a11 a22 a33 − a11 a32 a23 − a12 a21 a33 + a12 a31 a23 + a13 a21 a32 − a13 a31 a22 ,
  • 51. Section 1.7 Fundamentals of Determinants 33 and after regrouping these terms in the form det A = −a21 a12 a33 + a21 a32 a13 + a22 a11 a33 − a22 a31 a13 − a23 a11 a32 + a23 a31 a12 , we find that det A = a21 C21 + a22 C22 + a23 C23 . Proceeding in this manner, we can easily show that det A may be obtained by forming the sum of the products of the elements of A and their cofactors in any row or column of det A. These results can be expressed symbolically as follows. Expanding in terms of the elements of the ith row: 3 det A = ai1 Ci1 + ai2 Ci2 + ai3 Ci3 = ai j Ci j . (19) j=1 Laplace expansion theorem Expanding in terms of the elements of the jth column: 3 det A = a1 j C1 j + a2 j C2 j + a3 j C3 j = ai j Ci j . (20) i=1 Results (19) and (20) are the form taken by the Laplace expansion theorem when applied to a third order determinant. The extension of the theorem to determinants of any order will be made later in Chapter 3, Section 3.3. EXAMPLE 1.18 Expand the following determinant (a) in terms of elements of its first row, and (b) in terms of elements of its third column: 1 |A| = 1 1 2 0 2 4 2 . 1 Solution (a) Expanding in terms of the elements of the first row requires the three cofactors C11 = M11 , C12 = −M12 , and C13 = M13 , where M11 = 0 2 2 = −4, 1 M12 = 1 1 2 = −1, 1 M13 = 1 1 0 = 2, 2 so C11 = (−1)(1+1) (−4) = −4, C12 = (−1)(1+2) (−1) = 1, C13 = (−1)(1+3) (2) = 2, and so |A| = (1)(−4) + (2)(1) + (4)(2) = 6. (b) Expanding in terms of the elements of the third column requires the three cofactors C13 = M13 , C23 = −M23 , and C33 = M33 , where M13 = 1 1 0 = 2, 2 M23 = 1 1 2 = 0, 2 M33 = 1 1 2 = −2, 0 so C13 = (−1)(1+3) (2) = 2, C23 = 0, C33 = (−1)(3+3) (−2) = −2 and so |A| = (4)(2) + (2)(0) + (1)(−2) = 6.
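The expansion by cofactors used in Example 1.18 is short enough to code directly. The sketch below is illustrative only; it expands a third order determinant along its first row using (14) and (17) and recovers the value 6.

```python
def det2(m):
    """Second order determinant, formula (14)."""
    return m[0][0]*m[1][1] - m[0][1]*m[1][0]

def det3_first_row(a):
    """Expand a 3x3 determinant along its first row, as in (17) and (18)."""
    total = 0
    for j in range(3):
        # Minor M1j: delete row 1 and column j+1; cofactor carries the sign (-1)**j.
        minor = [[a[i][k] for k in range(3) if k != j] for i in (1, 2)]
        total += (-1)**j * a[0][j] * det2(minor)
    return total

A = [[1, 2, 4],
     [1, 0, 2],
     [1, 2, 1]]
print(det3_first_row(A))   # expected 6, the value found in Example 1.18
```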
  • 52. 34 Chapter 1 Review of Prerequisites Two especially simple third order determinants are of the form a11 det A = 0 0 a12 a22 0 a13 a23 a33 a11 det A = a21 a31 and 0 a22 a32 0 0 . a33 The first of these determinants has only zero elements below the diagonal line drawn from its top left element to its bottom right one, and the second determinant has only zero elements above this line. This diagonal line in every determinant is called the leading diagonal. The value of each of the preceding determinants is easily seen to be given by the product a11 a22 a33 of the terms on its leading diagonal. Simpler still in form is the third order determinant a11 det A = 0 0 0 a22 0 0 0 = a11 a22 a33 , a33 whose value a11 a22 a33 is again the product of the elements on the leading diagonal. For another approach to the elementary properties of determinants, see Appendix A16 of reference [1.2], and Chapter 2 of reference [2.1]. EXERCISES 1.7 Evaluate the determinants in Exercises 1 through 6 (a) in terms of elements of the first row and (b) in terms of elements of the second column. −1 2 −1 1 1. 1 1 5 7 −1 1 . 2 1 2 2. 2 5 1 6 1 −1 −1 . −1 1 5. 2 4 5 3. 1 3 2 2 1 4 1 . 5 6. 4. 3 1 3 0 1 3 6 4 . 1 −6 3 . 21 1 5 −1 2 1 −3 . −4 1 1 7. On occasion the elements of a matrix may be functions, in which case the determinant may be a function. Evaluate the functional determinant 1 0 0 0 sin x cos x 0 − cos x . sin x 8. Determine the values of λ that make the following determinant vanish: 3−λ 2 2 2 2 2−λ 0 . 0 4−λ Hint: This is a polynomial in λ of degree 3. 9. A matrix is said to be transposed if its first row is written as its first column, its second row is written as its second column . . . , and its last row is written as its last column. If the determinant is |A|, the determinant of AT , the transpose matrix A, is denoted by |AT |. Write out the expansion of |A| using (17) and reorder the terms to show that |A| = |AT |. 10. Use elimination to solve the system of linear equations a11 x1 + a12 x2 = b1 a21 x1 + a22 x2 = b2 for x1 and x2 , in which not both b1 and b2 are zero, and show that the solution can be written in the form x1 = D1 /|A| and x2 = D2 /|A|, provided |A| = 0, where |A| is the determinant of the matrix of coefficients of the system |A| = a11 a12 b a a b , D1 = 1 12 , and D2 = 11 1 . a21 a22 b2 a22 a21 b2 Notice that D1 is obtained from |A| by replacing its first column by b1 and b2 , whereas D2 is obtained from |A| by replacing its second column by b1 and b2 . This is Cramer’s rule for a system of two simultaneous equations. Use this method to find the solution of x1 + 5x2 = 3 7x1 − 3x2 = −1.
  • 53. Section 1.8 Continuity in One or More Variables where |A| is the determinant of the matrix of coefficients and Di is the determinant obtained from |A| by replacing its ith column by b1 , b2 , and b3 for i = 1, 2, 3. This is Cramer’s rule for a system of three simultaneous equations, and the method generalizes to a system of n linear equations in n unknowns. Use this method to find the solution of 11. Repeat the calculation in Exercise 10 using the system of equations a11 x1 + a12 x2 + a13 x3 = b1 a21 x1 + a22 x2 + a23 x3 = b2 a31 x1 + a32 x2 + a33 x3 = b3 , in which not all of b1 , b2 , and b3 are zero, and show that provided |A| = 0, x1 = D1 /|A|, 1.8 x2 = D2 /|A|, and 35 x1 + 2x2 − x3 = 2 x1 − 3x2 − 2x3 = −1 2x1 + x2 + 2x3 = 1. x3 = D3 /|A|, Continuity in One or More Variables If the function y = f (x) is defined in the interval a ≤ x ≤ b, the interval is called the domain of definition of the function. The function f is said to have a limit at a point c in a ≤ x ≤ b, written limx→c f (x) = L, if for every arbitrarily small number ε > 0 there exists a number δ > 0 such that | f (x) − L| < ε when |x − c| < δ. (21) This technical definition means that as x either increases toward c and becomes arbitrarily close to it, or decreases toward c and becomes arbitrarily close to it, so f (x) approaches arbitrarily close to the value L. Notice that it is not necessary for f (x) to be defined at x = c, or, if it is, that f (c) assumes the value L. If f (x) has a limit L as x → c and in addition f (c) = L, so that lim f (x) = f (c) = L, x→c continuity from the right then the function f is said to be continuous at c. It must be emphasized that in this definition of continuity the limiting operation x → c must be true as x tends to c from both the left and right. It is convenient to say that x approaches c from the left when it increases toward c and, correspondingly, to say that x approaches c from the right when it decreases toward it. The function f is continuous from the right at x = c if lim f (x) = f (c), x→c+ continuity from the left lim f (x) = f (c), (24) where now x → c− means that x increases toward c, causing x to tend to c from the left. The relationship among definitions (22), (23), and (24) is that f is continuous at the point c if lim f (x) = lim+ f (x) = f (c). x→c− continuous function (23) where the notation x → c+ means that x decreases toward c, causing x to tend to c from the right. Similarly, f is continuous from the left at x = c if x→c− continuity at x = c (22) x→c (25) When expressed in words, this says that f is continuous at x = c if the limits of f as x tends to c from both the left and right exist and, furthermore, the limits equal the functional value f (c). A function f that is continuous at all points of a ≤ x ≤ b is said to be a continuous function on that interval. Graphically, a continuous function on a ≤ x ≤ b is a function whose graph is unbroken but not necessarily smooth. A function f is said
  • 54. 36 Chapter 1 Review of Prerequisites Discontinuous Continuous from the left at x = d y Continuous at x = c y Discontinuous at x = c k2 f (c) k1 y = f(x ) y = f (x ) Continuous from the right 0 a c d b x 0 a c (a) b x (b) FIGURE 1.8 (a) A continuous function for a < x < b. (b) A discontinuous function. smooth function continuous and piecewise smooth function discontinuous function to be smooth over an interval if at each point of the graph the tangent lines to the left and right of the point are the same. Figure 1.8a shows the graph of a continuous function that is smooth over the intervals a ≤ x < c and c < x < b but has different tangent lines to the immediate left and right of x = c where the function is not smooth. A function such as this is said to be continuous and piecewise smooth over the interval a ≤ x ≤ b. A function f is said to be discontinuous at a point c if it is not continuous there. For a jump discontinuity we have lim f (x) = k1 x→c− piecewise continuity and lim f (x) = k2 , x→c+ but k1 = k2 . (26) A function f is said to have a removable discontinuity at a point c if k1 = k2 in (26), but f (c) = k1 , as at the point c2 in Fig. 1.9. An example of a discontinuous function is shown in Fig. 1.8b where a jump discontinuity occurs at x = c. A function f is said to be piecewise continuous on an interval a ≤ x ≤ b if it is continuous on a finite number of adjacent subintervals, but discontinuous at the end points of the subintervals, as shown in Fig. 1.9. The notion of continuity of a function of several variables is best illustrated by considering a function f (x, y) of the two independent variables x and y. The function f defined in some region of the (x, y)-plane D, say, is said to be continuous y Discontinuous at c1 Discontinuous at c3 y = f (x ) Discontinuous at c2 0 a c1 c2 FIGURE 1.9 A piecewise continuous function. c3 b x
  • 55. Section 1.8 Continuity in One or More Variables 37 at the point (a, b) in D if continuity of f(x, y) lim x→a,y→b discontinuity of f(x, y) f (x, y) = f (a, b), (27) and to be discontinuous otherwise. In this definition of continuity, it is important to recognize that a general point P at (x, y) is allowed to tend to the point (a, b) in D along any path in the (x, y)-plane that lies in D. Expressed differently, f will only be continuous at (a, b) if the limit in (27) is independent of the way in which the point (x, y) approaches the point (a, b). When this is true for all points in D, the function f is said to be continuous in D. The function f is, for instance, discontinuous at (a, b) if lim x→a,y→b f (x, y) = k, but f (a, b) = k. Sufficient for showing that a function f is discontinuous at a point (a, b) is by demonstrating that two different limiting values of f are obtained if the point P at (x, y) is allowed to tend to (a, b) along two different straight-line paths. This approach can be used to show that the function xy f (x, y) = 2 x + a 2 y2 has no limit at the origin. If we allow the point P at (x, y) to tend to the origin along the straight line y = kx, with k an arbitrary constant, the function f becomes f (x, kx) = k , 1 + a 2 k2 and it is seen from this that f is constant along each such line. However, the value of f on each line, and hence at the origin, depends on k, so f has no limit at the origin and so is discontinuous at that point, though f is defined and continuous at all other points of the (x, y)-plane. An example of a function f (x, y) that is continuous everywhere except at points along a curve in the (x, y)-plane is shown in Fig. 1.10. z = f (x, y ) us o tinu con Dis long Γ a z y 0 Γ x D FIGURE 1.10 A function f (x, y) continuous everywhere except at points on .
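The path dependence just described is easy to observe numerically. The sketch below is illustrative only and takes a = 1; approaching the origin along different lines y = kx gives different limiting values k/(1 + a^2 k^2), so no limit exists at the origin.

```python
def f(x, y, a=1.0):
    return x*y / (x**2 + a**2 * y**2)

# Approach the origin along y = k*x for several slopes k: the value of f is
# constant on each line and depends on k, so f has no limit at (0, 0).
for k in (0.0, 0.5, 1.0, 2.0):
    values = [f(t, k*t) for t in (1e-2, 1e-4, 1e-6)]
    print(f"k = {k}: f along y = kx -> {values[-1]:.6f}   (k/(1+k^2) = {k/(1+k**2):.6f})")
```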
  • 56. 38 Chapter 1 Review of Prerequisites The extension of these definitions to functions of n variables is immediate and so will not be discussed. Discussions on continuity and its consequences can be found in any one of references [1.1] to [1.7]. 1.9 Differentiability of Functions of One or More Variables The function f (x) defined in a ≤ x ≤ b is said to be differentiable with the derivative f (c) at a point c inside the interval if the following limit exists: lim h→0 differentiability of f(x) left- and right-hand derivatives of f(x) f (c + h) − f (c) = f (c). h (28) Here, as in the definition of continuity, for f to be differentiable at point c the limit must remain unchanged as h tends to zero through both positive and negative values. The function f is said to be differentiable in the interval a ≤ x ≤ b if it is differentiable at every point in the interval. When f is differentiable at a point c with derivative f (c), the number f (c) is the gradient, or slope, of the tangent line to the graph at the point (c, f (c)). A function with a continuous derivative throughout an interval is said to be a smooth function over the interval. The function f will be said to be nondifferentiable at any point c where the limit in (28) does not exist. Even when a function f is nondifferentiable at a point, it is possible that a special form of derivative can still be defined to the left and right of the point if the requirement that the limit in (28) exists as h → 0 through both positive and negative values is relaxed. The function f has a right-hand derivative at a if the limit lim+ h→0 f (a + h) − f (a) h (29) exists, and a left-hand derivative at b if the limit lim h→0− first order partial derivatives of f(x, y) f (b + h) − f (b) . h (30) exists. When c is a specific point, f (c) is a number, but when x is a variable, f (x) becomes a function. Left- and right-hand derivatives are illustrated in Fig. 1.11. An important consequence of differentiability is that differentiability implies continuity, but the converse is not true. The first order partial derivative with respect to x of the function f (x, y) of the two independent variables x and y at the point (a, b) is the number defined by lim h→0 f (a + h, b) − f (a, b) , h (31)
  • 57. Section 1.9 Differentiability of Functions of One or More Variables 39 left-hand derivative equal to slope of line y f (b) f (c) y = f (x) right-hand derivative equal to slope of line right-hand derivative equal to slope of line left-hand derivative equal to slope of line f (a) 0 a c b x FIGURE 1.11 Left- and right-hand derivatives as tangent lines. provided the limit exists. The value of this partial derivative is denoted either by ∂ f/∂ x at (a, b), or by fx (a, b). The corresponding partial derivative at a general point (x, y) is the function fx (x, y). Similarly, the first order partial derivative with respect to y of the function f (x, y) at the point (a, b) is the number defined by the limit lim k→0 second order partial derivatives of f(x, y) f (a, b + k) − f (a, b) , k (32) provided the limit exists. The value of this partial derivative is denoted either by ∂ f/∂ y at (a, b), or by fy (a, b). At a general point (x, y) this partial derivative becomes the function fy (x, y). Higher order partial derivatives are defined in a similar fashion leading, for example, to the second order partial derivatives ∂ 2 f/∂ x 2 = ∂/∂ x(∂ f/∂ x), ∂ 2 f/∂ y2 = ∂/∂ y(∂ f/∂ y), ∂ 2 f/∂ x∂ y = ∂/∂ y(∂ f/∂ x), and ∂ 2 f/∂ y∂ x = ∂/∂ x(∂ f/∂ y). A more compact notation for these same derivatives is fxx , fyy , fxy , and fyx , so that, for example fyx = ∂ 2 f/∂ y∂ x and fyy = ∂ 2 f/∂ y2 . mixed partial derivatives THEOREM 1.3 The derivatives fxy and fyx are called mixed partial derivatives, and their relationship forms the statement of the next theorem, the proof of which can be found in any one of references [1.1] to [1.7]. Equality of mixed partial derivatives Let f, fx , fxy , and fyx all be defined and continuous at a point (a, b) in a region. Then fxy (a, b) = fyx (a, b).
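Theorem 1.3 can be illustrated with any function having continuous second order partial derivatives. The sketch below is illustrative only and assumes sympy; the printed difference fxy − fyx should simplify to zero.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * sp.sin(x*y)            # any function with continuous second derivatives

f_xy = sp.diff(f, x, y)           # differentiate with respect to x, then y
f_yx = sp.diff(f, y, x)           # differentiate with respect to y, then x
print(sp.simplify(f_xy - f_yx))   # expected: 0, as Theorem 1.3 asserts
```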
  • 58. 40 Chapter 1 Review of Prerequisites total differential This result, given conditions for the equality of mixed partial derivatives, is an important one, and use will be made of it on numerous occasions as, for example, in Chapter 18 when second order partial differential equations are considered. If z = f (x, y), the total differential dz of f is defined as dz = (∂ f/∂ x) dx + (∂ f/∂ y) dy, (33) where dz, dx, and dy are differentials. Here, a differential means a small quantity, and the differential dz is determined by (33) when the differentials dx and dy are specified. When ∂ f/∂ x and ∂ f/∂ y are evaluated at a specific point (a, b), result (33) provides a linear approximation to f (x, y) near to the point (a, b). Although finite, the limits of the quotients of the differentials dz ÷ dx and dy ÷ dx as the differential dx → 0 are such that they become the values of the derivatives dz/dx and dy/dx, respectively, at a point (x, y) where ∂ f/∂ x and ∂ f/∂ y are evaluated. 1.10 Tangent Line and Tangent Plane Approximations to Functions tangent line approximation Let y = f (x) be defined in the interval a ≤ x ≤ b and be differentiable throughout it. Then a tangent line (linear) approximation to f near a point x0 in the interval is given by yT = f (x0 ) + (x − x0 ) f (x0 ). (34) This linear expression approximates the function f close to x0 by the tangent to the graph of y = f (x) at the point (x0 , f (x0 )). This simple approximation has many uses; one will be in the Euler and modified Euler methods for solving initial value problems for ordinary differential equations developed in Chapter 19. EXAMPLE 1.19 Find a tangent line approximation to y = 1 + x 2 + sin x near the point x = α. Solution Setting x0 = α and substituting into (34) gives y ≈ 1 + α 2 + sin α + (x − α)(2α + cos α) for x close to α. tangent plane approximation Let the function z = f (x, y) be defined in a region Dof the (x, y)-plane where it possesses continuous first order partial derivatives ∂ f/∂ x and ∂ f/∂ y. Then a tangent plane (linear) approximation to f near any point (x0 , y0 ) in D is given by zT = f (x0 , y0 ) + (x − x0 ) fx (x0 , y0 ) + (y − y0 ) fy (x0 , y0 ). (35) This linear expression approximates the function f close to the point (x0 , y0 ) by a plane that is tangent to the surface z = f (x, y) at the point (x0 , y0 , f (x0 , y0 )). The tangent plane approximation in (35) is an immediate extension to functions of two variables of the tangent line approximation in (34), to which it simplifies when only one independent variable is involved.
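Both approximations can be tested numerically near the point of tangency. The sketch below is illustrative only; it uses the function of Example 1.19 with the particular choice α = 0.5, and the tangent plane found in Example 1.20.

```python
import math

# Tangent line approximation (34) to y = 1 + x^2 + sin(x) near x0 = alpha.
alpha = 0.5
f = lambda x: 1 + x**2 + math.sin(x)
fprime = lambda x: 2*x + math.cos(x)
yT = lambda x: f(alpha) + (x - alpha)*fprime(alpha)
print(f(0.51), yT(0.51))               # nearly equal close to x = alpha

# Tangent plane approximation (35) to z = x^2 - 3y^2 near (1, 2), Example 1.20.
z  = lambda x, y: x**2 - 3*y**2
zT = lambda x, y: -11 + 2*(x - 1) - 12*(y - 2)
print(z(1.05, 2.02), zT(1.05, 2.02))   # nearly equal close to (1, 2)
```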
  • 59. Section 1.11 Integrals 41 Both of these approximations are derived from the appropriate Taylor series expansions of functions discussed in Section 1.12 by retaining only the linear terms. EXAMPLE 1.20 Find the tangent plane approximation to the function z = x 2 − 3y2 near the point (1, 2). Solution Setting x0 = 1, y0 = 2 and substituting into (35) gives z ≈ −11 + 2(x − 1) − 12(y − 2) for (x, y) close to (1, 2). 1.11 Integrals indefinite and definite integrals A differentiable function F(x) is called an antiderivative of the function f (x) on some interval if at each point of the interval dF/dx = f (x). If F(x) is any antiderivative of f (x), the indefinite integral of f (x), written f (x) dx, is f (x) dx = F(x) + c, where c is an arbitrary constant called the constant of integration. The function f (x) is called the integrand of the integral. Thus, an indefinite integral is a function, and an antiderivative and an indefinite integral can only differ by an arbitrary additive constant. b The expression a f (x) dx, called a definite integral, is a number and may be interpreted geometrically as the area between the graph of f (x) and the lines x = a and x = b, for b > a, with areas above the x-axis counted as positive and those below it as negative. The relationship between definite integrals that are numbers and indefinite integrals that are functions is given in the next theorem, included in which is also the mean value theorem for integrals. See the references at the end of the chapter for proofs and further information. THEOREM 1.4 Fundamental theorem of integral calculus and the mean value theorem for integrals If F (x) is continuous in the interval a ≤ x ≤ b, throughout which F (x) = f (x), then b f (x) dx = F(b) − F(a). a Another result is b f (x) dx = (b − a) f (ξ ), a if f is differentiable, where the number ξ , although unknown, lies in the interval a < ξ < b. In this form the result is called the mean value theorem for integrals.
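As a purely illustrative numerical check of Theorem 1.4, the sketch below compares a midpoint-rule estimate of the integral of f(x) = 3x^2 over [0, 1] with F(1) − F(0) for the antiderivative F(x) = x^3, and exhibits a point ξ satisfying the mean value theorem for integrals.

```python
from math import isclose, sqrt

# Fundamental theorem: for f(x) = 3x^2 on [0, 1], F(x) = x^3 gives F(1) - F(0) = 1.
f = lambda x: 3*x**2
n = 100_000
h = 1.0 / n
midpoint_estimate = h * sum(f((i + 0.5) * h) for i in range(n))
print(midpoint_estimate, isclose(midpoint_estimate, 1.0, rel_tol=1e-6))

# Mean value theorem for integrals: (b - a) f(xi) equals the integral for some xi;
# here 3*xi**2 = 1, so xi = 1/sqrt(3), which indeed lies in (0, 1).
xi = 1 / sqrt(3)
print((1 - 0) * f(xi))
```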
  • 60. 42 Chapter 1 Review of Prerequisites An improper integral is a definite integral in which one or more of the following cases arises: (a) the integrand becomes infinite inside or at the end of the interval of integration, or (b) one (or both) of the limits of integration is infinite. Types of Improper Integrals Case (a) convergence and divergence of improper integrals If the integrand of an integral becomes infinite at a point c inside the interval of integration a ≤ x ≤ b as shown in Fig. 1.12a, the improper integral is said to exist if the limits in (36) exist. When the improper integral exists it is said to converge to the (finite) value of the following limit: b h→0 a a Cauchy principal value c−h f (x) dx = lim f (x) dx + lim b k→0 c+k f (x) dx. (36) In this definition h > 0 and k > 0 are allowed to tend to zero independently of each other. If, when the limit is taken, the integral is either infinite or indeterminate, the integral is said to diverge. Some integrals of this type diverge when h and k are allowed to tend to zero independently of each other, but converge when the limit is taken with h = k, in which case the result of the limit is called the Cauchy principal value of the integral. Integrals of this type arise frequently when certain types of definite integral are evaluated in the complex plane by means of contour integration (see Chapter 15, Section 15.5). Case (b) If a limit of integration in a definite integral is infinite, say the upper limit as shown in Fig. 1.12b, then, when it exists, the improper integral is said to converge to the value of the limit ∞ R f (x) dx = lim R→∞ a a y f (x) dx, (37) y y = f (x) y = f(x) 0 a c (a) b x 0 x a (b) FIGURE 1.12 (a) f (x) is infinite inside the interval of integration. (b) The interval of integration is infinite in length.
and the integral is divergent if the limit is either infinite or indeterminate. If both limits are infinite, the improper integral is said to converge to the value of the limit

∫_{−∞}^{∞} f(x) dx = lim_{R→∞, S→∞} ∫_{−S}^{R} f(x) dx   (38)

when it exists, and the integral is said to be divergent if the limit is either infinite or indeterminate. In (38) R and S are allowed to tend to infinity independently of each other. Integrals of this type also have Cauchy principal values if the foregoing process leads to divergence, but the integrals are convergent when the limit is taken with R = S. Integrals of this type also occur when certain real integrals are evaluated by means of contour integration (see Chapter 15, Section 15.5).
Elementary examples of convergent improper integrals of the types shown in (36) to (38) are

∫_0^1 (x^p − x^{−p})/(x − 1) dx = 1/p − π cot pπ   (p^2 < 1),

∫_0^∞ exp(−x) sin x dx = 1/2,   and   ∫_{−∞}^{∞} dx/(1 + x^2) = π.

THEOREM 1.5   Differentiation under the integral sign — Leibniz' rule
If ξ(t), η(t), dξ/dt, dη/dt, f(x, t), and ∂f/∂t are continuous for t0 ≤ t ≤ t1 and for x in the interval of integration, then

d/dt ∫_{ξ(t)}^{η(t)} f(x, t) dx = ∫_{ξ(t)}^{η(t)} ∂f(x, t)/∂t dx + f(η(t), t) dη/dt − f(ξ(t), t) dξ/dt.

This theorem is used, for example, in Chapter 18 when discussing discontinuous solutions of a class of partial differential equations called conservation laws. Extensions of the theorem to functions of more variables are developed in Chapter 12, Section 12.3, where certain vector integral theorems are developed, and applications of the results of that section to fluid mechanics are to be found in Chapter 12, Section 12.4.
An application of Theorem 1.5 that is easily checked by direct calculation is

d/dt ∫_{2t}^{t^2} (x^2 + t) dx = ∫_{2t}^{t^2} dx + (t^4 + t) · 2t − (4t^2 + t) · 2 = 2t^5 − 5t^2 − 4t.

A proof of Leibniz' rule can be found, for example, in Chapter 12 of reference [1.6].

1.12   Taylor and Maclaurin Theorems

THEOREM 1.6   Taylor's theorem for a function of one variable
Let a function f(x) have derivatives of all orders in the interval a < x < b. Then for each positive integer n and
  • 62. 44 Chapter 1 Review of Prerequisites each x0 in the interval f (x) = f (x0 ) + (x − x0 ) f (1) (x0 ) + + (x − x0 )2 (2) f (x0 ) + · · · 2! (x − x0 )n (n) f (x0 ) + Rn+1 (x), n! where f (r ) (x) = dr f/dxr , and the remainder term Rn+1 (x) is given by Rn+1 (x) = (x − x0 )n+1 (n+1) f (ξ ), (n + 1)! for some ξ between x0 and x. Taylor polynomial Maclaurin’s theorem Taylor’s theorem becomes the Taylor series for f (x) when n is allowed to become infinite, and if the remainder term is neglected in Taylor’s theorem the result is called the Taylor polynomial approximation to f (x) of degree n. The Taylor polynomial of degree 1 is simply the tangent line approximation to f at x0 given in (34). Taylor’s theorem reduces to Maclaurin’s theorem if x0 = 0, and if we allow n to become infinite in Maclaurin’s theorem, it becomes the Maclaurin series for f (x). A special case of Theorem 1.6 arises when Taylor’s theorem is terminated with the term R1 (x), corresponding to n = 0, because the result can be written f (x) − f (x0 ) = f (ξ ), x − x0 mean value theorem (39) with ξ between x0 and x, and in this form it is called the mean value theorem for derivatives (see the last result of Theorem 1.4). A Taylor series is an example of an infinite series called a power series, the general form of which is ∞ an (x − x0 )n = a0 + a1 (x − x0 ) + a2 (x − x0 )2 + · · · . (40) n=0 In (40) the quantity x is a variable, the numbers ai are the coefficients of the power series, the constant x0 is called the center of the series, or the point about which the series is expanded, and unless otherwise stated, x, x0 , and the ai are real numbers, so the power series is a function of x. A power series is said to converge for a given value of x if the sum of the infinite series for this value of x is finite. If the sum is infinite, or is not defined, the power series will be said to diverge for that value of x. Power series converge in an interval x0 − R < x < x0 + R, where the number R is called the radius of convergence of the series. Expressions for R are derived in Section 15.1. The interval x0 − R < x < x0 + R is called the interval of convergence of the power series. A power series converges for all x inside the interval of convergence and diverges for all x outside it, and the series may, or may not, converge at the end points of the interval. The convergence properties of power series are shown diagramatically in Fig. 1.13, and results (40) and combining expressions for R with
  • 63. Section 1.12 Taylor and Maclaurin Theorems Interval of Convergence Divergence x0 − R x0 45 Divergence x0 + R FIGURE 1.13 Interval of convergence of a power series with center x0 . (40) gives the following theorem (see the references at the end of the chapter for real variable proofs of the following results and for more information). THEOREM 1.7 Ratio test and nth root test for the convergence of power series The power series ∞ an (x − x0 )n = a0 + a1 (x − x0 ) + a2 (x − x0 )2 + · · · n=0 radius and interval of convergence converges in the interval of convergence x0 − R < x < x0 + R, where the radius of convergence R is determined by either of the formulas (a) R = 1/ lim |an+1 /an | n→∞ or (b) R = 1/ lim |an |1/n . n→∞ The power series will diverge outside the interval of convergence, and its behavior at the ends of the interval of convergence must be determined separately. A simple result on the convergence of a series that is often useful is the alternating series test. An alternating series is so named because the signs of successive terms of the series alternate in sign. THEOREM 1.8 The alternating series test for convergence The alternating series converges if an > 0 and an+1 < an for all n and limn→∞ an = 0. ∞ n+1 an n=1 (−1) The following theorem on the differentiation and integration of power series is often needed, and it is a real variable form of a result proved later in Chapter 15 when complex power series are studied. THEOREM 1.9 Differentiation and integration of power series Let a power series have an interval of convergence x0 − R < x < x0 + R. Then the series may be differentiated and integrated term by term, and in each case the resulting series will have the same interval of convergence as the original series. In addition, within an interval of convergence common to any two power series, the series may be scaled by a constant and added or subtracted term by term and the resulting power series will have the same common interval of convergence. The simplest form of Taylor’s theorem for a function of two variables that finds many applications is given in the next theorem. THEOREM 1.10 Taylor’s theorem for a function of two variables Let f (x, y) be defined for a < x < b and c < y < d and have continuous partial derivatives up to and including
  • 64. 46 Chapter 1 Review of Prerequisites those of order 2. Then for x0 and y0 any points such that a < x0 < b and c < y0 < d, f (x, y) = f (x0 , y0 ) + (x − x0 ) fx (x0 , y0 ) + (y − y0 ) fy (x0 , y0 ) 1 (x − x0 )2 fxx (x0 + ξ, y0 + η) + 2(x − x0 )(y − y0 ) + 2! × fxy (x0 + ξ, y0 + η)(y − y0 )2 fyy (x0 + ξ, y0 + η) , where the numbers ξ and η are unknown, but ξ lies between x0 and x and η lies between y0 and y. The group of second order partial derivatives in Theorem 1.10 forms the remainder term, and when these derivatives are ignored, the result reduces to the tangent plane approximation to f (x, y) at the point (x0 , y0 ) given in (35). More information on Taylor’s theorem and series can be found, for example, in reference [1.2]. 1.13 Cylindrical and Spherical Polar Coordinates and Change of Variables in Partial Differentiation Mathematical problems formulated using a particular coordinate system, such as cartesian coordinates, often need to be reexpressed in terms of a different coordinate system in order to simplify the task of finding a solution. When partial derivatives occur in the formulation of problems, it becomes necessary to know how they transform when a different coordinate system is used. The fundamental theorem governing the transformation of partial derivatives under a change of variables takes the following form (see the references at the end of the chapter for the proof of Theorem 1.11 and for more examples of its use). THEOREM 1.11 Change of variables in partial differentiation Let f (x1 , x2 , . . . , xn ) be a differentiable function with respect to the n independent variables x1 , x2 , . . . , xn , and let the n new independent variables u1 , u2 , . . . , un be determined in terms of x1 , x2 , . . . , xn by x1 = X1 (u1 , u2 , . . . , un ), x2 = X2 (u1 , u2 , . . . , un ), . . . , xn = Xn (u1 , u2 , . . . , un ), where X1 , X2 , . . . , Xn are differentiable functions of their arguments. Then, if as a result of the change of variables the function f (x1 , x2 , . . . , xn ) becomes the function F(X1 , X2 , . . . , Xn ), and using chain rules we have ∂F ∂ f ∂ X1 ∂ f ∂ X2 ∂ f ∂ Xn = + +···+ ∂u1 ∂ x1 ∂u1 ∂ x2 ∂u1 ∂ xn ∂u1 ∂F ∂ f ∂ X1 ∂ f ∂ X2 ∂ f ∂ Xn = + +···+ ∂u2 ∂ x1 ∂u2 ∂ x2 ∂u2 ∂ xn ∂u2 .................................... ∂ f ∂ X1 ∂ f ∂ X2 ∂ f ∂ Xn ∂F = + +···+ . ∂un ∂ x1 ∂un ∂ x2 ∂un ∂ xn ∂un (41)
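The chain-rule formulas (41) are easy to confirm with a computer algebra system. The short SymPy sketch below is only an illustration: the change of variables is taken to be plane polar coordinates x = r cos θ, y = r sin θ, and the test function f is an arbitrary choice, not one appearing in the text.

import sympy as sp

# Check the chain-rule relations (41) for n = 2 with the change of
# variables x = r*cos(theta), y = r*sin(theta); f is a sample function.
x, y, r, theta = sp.symbols('x y r theta')
f = x**2 * y + sp.sin(x * y)

X1 = r * sp.cos(theta)
X2 = r * sp.sin(theta)
F = f.subs({x: X1, y: X2})            # F(r, theta) = f(X1, X2)

# Right-hand sides of (41), evaluated at x = X1, y = X2
dF_dr = (sp.diff(f, x) * sp.diff(X1, r) + sp.diff(f, y) * sp.diff(X2, r)).subs({x: X1, y: X2})
dF_dtheta = (sp.diff(f, x) * sp.diff(X1, theta) + sp.diff(f, y) * sp.diff(X2, theta)).subs({x: X1, y: X2})

print(sp.simplify(sp.diff(F, r) - dF_dr))          # 0
print(sp.simplify(sp.diff(F, theta) - dF_dtheta))  # 0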
  • 65. Section 1.13 Cylindrical and Spherical Polar Coordinates and Change of Variables in Partial Differentiation 47 To find higher order partial derivatives it is necessary to express the relationships between the operations of differentiation in the two coordinate systems, rather than between the actual derivatives themseves. This can be accomplished by rewriting the results of Theorem 1.11 in the form of partial differential operators as follows: ∂ ∂ X1 ∂ ∂ X2 ∂ ∂ Xn ∂ ≡ + + ··· + ∂u1 ∂u1 ∂ x1 ∂u1 ∂ x2 ∂u1 ∂ xn ∂ ∂ X1 ∂ ∂ X2 ∂ ∂ Xn ∂ ≡ + + ··· + ∂u2 ∂u2 ∂ x1 ∂u2 ∂ x2 ∂u2 ∂ xn .................................... (42) ∂ ∂ X1 ∂ ∂ X2 ∂ ∂ Xn ∂ ≡ + + ··· + . ∂un ∂un ∂ x1 ∂un ∂ x2 ∂un ∂ xn When expressed in this form the relationships between the partial differentiation operations ∂/∂ x1 , ∂/∂ x2 , . . . , ∂/∂ xn and ∂/∂u1 , ∂/∂u2 , . . . , ∂/∂un become clear. This interpretation is needed when finding higher order partial derivatives such as ∂ 2 F/∂u2 ∂u1 , because ∂ ∂2 F = ∂u2 ∂u1 ∂u1 ∂F ∂u2 = ∂ X1 ∂ ∂ X2 ∂ ∂ Xn ∂ + + ··· + ∂u1 ∂ x1 ∂u1 ∂ x2 ∂u1 ∂ xn ∂F ∂u2 . An important combination of partial derivatives that occurs throughout physics and engineering is called the Laplacian of a function. When a twice differentiable function f (x, y, z) of the cartesian coordinates x, y, and z is involved, the Laplacian of f , denoted by f and sometimes by ∇ 2 f , read “del squared f ,” takes the form f = ∇2 f = ∂2 f ∂2 f ∂2 f + 2 + 2. 2 ∂x ∂y ∂z (43) Cylindrical Polar Coordinates (r, θ, z) The cylindrical polar coordinate system (r, θ, z) is illustrated in Fig. 1.14, and its relationship to cartesian coordinates is given by x = r cos θ, y = r sin θ, z = z, with 0 ≤ θ < 2π and r ≥ 0. (44) Spherical Polar Coordinates (r, φ, θ) The spherical polar coordinate system (r, φ, θ ) shown in Fig. 1.15 is related to cartesian coordinates by x = r sin θ cos φ, y = r sin θ sin φ, z = r cos θ, with 0 ≤ θ ≤ π, 0 ≤ φ < 2π. (45) The derivation of the formulas for the change of variables in functions of several variables can be found in any one of references [1.1] to [1.7], where cylindrical and
  • 66. 48 Chapter 1 Review of Prerequisites z z P (r, θ, z) P (r, φ, θ) θ z r 0 0 θ r y y x P Q P y x φ y x z x FIGURE 1.14 Cylindrical polar coordinates (r, θ, z). FIGURE 1.15 Spherical polar coordinates (r, φ, θ ). spherical polar coordinates are also discussed. Information on general orthogonal coordinate systems can be found in references [G.3] and [2.3]. EXERCISES 1.13 1. By making the change of variables x = r cos θ, y = r sin θ, z = z, in the function f (x, y, z), when it becomes the function F(r, θ, z), show that in cylindrical polar coordinates ∂F ∂f ∂f = cos θ + sin θ , ∂r ∂x ∂y ∂f ∂f ∂F ∂f ∂F = −r sin θ + r cos θ , = . ∂θ ∂x ∂y ∂z ∂z 2. Use the results of Exercise 1 to show that in cylindrical polar coordinates the Laplacian ∂ f ∂ f ∂ f + + 2 becomes ∂ x2 ∂ y2 ∂z 2 ∂ F 1 ∂2 F 1 ∂F ∂2 F F = + 2 + + , 2 2 ∂r r ∂r r ∂θ ∂z2 and hence that an equivalent form of F is f = F= 1 ∂ r ∂r 2 r ∂F ∂r 2 + 2 1 ∂ r ∂θ ∂F ∂θ + ∂ ∂F r ∂z ∂z becomes F(r, φ, θ ), show that in spherical polar coordinates ∂F ∂f ∂f ∂f = sin θ cos φ + sin φ sin θ + cos φ ∂r ∂x ∂y ∂z ∂f ∂f ∂f ∂F = r cos φ cos θ + r cos φ sin θ − r sin φ ∂φ ∂x ∂y ∂z ∂f ∂f ∂F = −r sin φ sin θ + r sin φ cos θ . ∂z ∂x ∂y 4. Use the results of Exercise 3 to show that in spherical polar coordinates the Laplacian f = ∂2 f ∂2 f ∂2 f + + 2 2 2 ∂x ∂y ∂z becomes . 3. By making the change of variable x = r sin θ cos φ, y = r sin θ sin φ, z = r cos θ in the function f (x, y, z), when it F = 1 1 ∂ ∂F r2 + r 2 ∂r ∂r r 2 sin2 θ 1 ∂F ∂ + 2 sin θ . r sin θ ∂θ ∂θ ∂2 F ∂φ 2
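The formulas asked for in Exercises 1 and 2 can be checked symbolically. The following SymPy sketch is an illustration only; the test function f is an arbitrary choice, and the cylindrical polar form of the Laplacian quoted in Exercise 2 is compared with the cartesian Laplacian (43) after the change of variables (44).

import sympy as sp

# Verify the cylindrical polar form of the Laplacian (Exercise 2) for a
# sample function f(x, y, z); the choice of f is arbitrary.
x, y, z, r, theta = sp.symbols('x y z r theta', positive=True)
f = x**3 * y + y * z**2

lap_cart = sp.diff(f, x, 2) + sp.diff(f, y, 2) + sp.diff(f, z, 2)   # (43)

sub = {x: r * sp.cos(theta), y: r * sp.sin(theta)}                  # (44); z is unchanged
F = f.subs(sub)

lap_cyl = (sp.diff(F, r, 2) + sp.diff(F, r) / r
           + sp.diff(F, theta, 2) / r**2 + sp.diff(F, z, 2))

print(sp.simplify(lap_cyl - lap_cart.subs(sub)))                    # 0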
  • 67. Section 1.14 1.14 Inverse Functions and the Inverse Function Theorem 49 Inverse Functions and the Inverse Function Theorem In mathematics and its applications it is often necessary to find the inverse of a function y = f (x) so x can be expressed in the form x = g(y), and when this can be done the function g is called the inverse of f and is such that y = f (g(y)). When f is an arbitrary function its inverse is often denoted by f −1 , and this superscript notation is also used to denote the inverse of trigonometric functions so if, for example, y = sin x, the inverse sine function is written sin−1 , so that x = sin−1 y. However, the notation y = arcsin y is also used with the understanding that the notations arcsin and sin−1 are equivalent. A trivial example of a function whose inverse can be found unambiguously is y = ax + b, because provided a = 0 we can write x = (y − b)/a for all x and y. This is not the case, however, when trigonometric functions are involved, because the function y = sin x will give a unique value of y for any given x, but given y there are infinitely many values of x for which y = sin x. This and similar inverse trigonometric functions are considered in elementary calculus courses. There the multivalued nature of the inverse sine function is resolved by restricting it to make y lie in a specific interval chosen so that one y corresponds to one x and, conversely, one x corresponds to one y. This situation is described by saying that the relationship between x and y is one-to-one. Specifically, in the case of the sine function, this is accomplished by requiring that if x = sin y, the inverse function y = Arcsin x is restricted so its principal value lies in the interval −π/2 ≤ Arcsinx ≤ π/2, where the domain of definition of the inverse function is −1 ≤ x ≤ 1. A different possibility that arises frequently is when x and y are related by an equation of the form f (x, y) = 0 from which it is impossible to extract either x as a function of y, or y as a function of x in terms of known functions. A typical example of this type is f (x, y) = x 2 − 2y2 − sin xy. To make matters precise, if x and y are related by an equation f (x, y) = 0, then if a function y = g(x) exists such that f (x, g(x)) = 0, the function y = g(x) is said to be defined implicitly by f (x, y) = 0. Although it is often not possible to find the function g(x), it is still necessary to know when, in a neighborhood of a point (x0 , y0 ), given a value of x, a unique value of y can be found, sometimes only numerically. The implicit function theorem that follows is seldom mentioned in first calculus courses because its proof involves certain technicalities, but it is quoted here in the simplest possible form because of its fundamental importance and the fact that is it frequently used by implication. THEOREM 1.12 The implicit function theorem Let f (x, y) and fy (x, y) be continuous in a region D of the (x, y)-plane and let (x0 , y0 ) be a point inside D, where f (x0 , y0 ) = 0 and fy (x0 , y0 ) = 0. Then (i) There is a rectangle R inside D containing (x0 , y0 ) at all points of which there can be found a unique y such that f (x, y) = 0. (ii) If the value of y is denoted by g(x), then y0 = g(x0 ), with f (x, g(x)) = 0, and g(x) is continuous inside R.
  • 68. 50 Chapter 1 Review of Prerequisites (iii) If, in addition, fx (x, y) is continuous in D then g(x) is differentiable in R and g (x) = − fx (x,g(x)) . fy (x,g(x)) In general terms, the implicit function theorem gives conditions that ensure the existence of an inverse function that is continuous and smooth enough to be differentiable. The theorem has a more general form involving functions f (x1 , x2 , . . . , xn ) of n variables, though this will not be given here. The interested reader can find accounts of the implicit function theorem and some of its generalizations in references [1.4], [1.6], and [5.1].
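A numerical illustration may help fix ideas. In the sketch below (Python; the starting value and tolerance are arbitrary choices, and the function is the example f(x, y) = x² − 2y² − sin xy quoted above) a value of y with f(x, y) = 0 is found by Newton's method for x near 2, and the slope of the implicitly defined function is compared with g′(x) = −fx/fy from part (iii).

import math

# f(x, y) = x**2 - 2*y**2 - sin(x*y) and its partial derivatives
def f(x, y):  return x**2 - 2*y**2 - math.sin(x*y)
def fx(x, y): return 2*x - y*math.cos(x*y)
def fy(x, y): return -4*y - x*math.cos(x*y)

def g(x, y0=1.3, tol=1e-12):
    """Solve f(x, y) = 0 for y by Newton's method, starting from y0."""
    y = y0
    for _ in range(50):
        step = f(x, y) / fy(x, y)
        y -= step
        if abs(step) < tol:
            break
    return y

x0 = 2.0
y0 = g(x0)
print(y0, f(x0, y0))                         # f(x0, y0) is essentially zero

h = 1e-6
slope_fd  = (g(x0 + h) - g(x0 - h)) / (2*h)  # slope of the implicit function
slope_ift = -fx(x0, y0) / fy(x0, y0)         # part (iii) of Theorem 1.12
print(slope_fd, slope_ift)                   # the two values agree closely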
  • 69. CHAPTER 1 TECHNOLOGY PROJECTS Project 1 Linear Difference Equations and the Fibonacci Sequence In Italy in 1202, Leonardo of Pisa, also known as Fibonacci, posed the following question. Let a newly born pair of rabbits produce two offspring each month, with breeding starting when they are 2 months old. Assuming that the pair of offspring start breeding in the same fashion when 2 months old, and that the process continues thereafter in a similar manner with no deaths, how many pairs of rabbits will there be after n months? If u_n is the number of pairs of rabbits after n months, the production of rabbits can be represented by the linear difference equation, or recurrence relation, u_{n+2} = u_{n+1} + u_n, where the sequence of numbers u_r with r = 1, 2, . . . is generated by setting u_1 = 1 and u_2 = 1, since this represents the initial pair of rabbits that began the breeding process. A simple calculation using this difference equation shows that the sequence of numbers generated in this manner, representing the number of pairs of rabbits present each month, is 1, 1, 2, 3, 5, 8, . . . , and this is called the Fibonacci sequence. This sequence is found to occur in the study of regular solids, in numerical analysis, and elsewhere in mathematics. A linear difference equation of the form u_{n+2} = a u_{n+1} + b u_n, with a and b real numbers, can be solved by substituting u_n = Aλ^n into the difference equation and finding the two roots λ_1 and λ_2 of the resulting quadratic equation in λ. When λ_1 ≠ λ_2, the general solution is u_n = A_1 λ_1^n + A_2 λ_2^n, and when λ_1 = λ_2 = μ, say, the general solution is u_n = (A_1 + n A_2) μ^n. The arbitrary constants A_1 and A_2 are found by requiring u_n to satisfy some given conditions of the form u_1 = α and u_2 = β, where the numbers α and β specify the way the sequence starts (the initial conditions). Use this method to show that the solution u_n for the Fibonacci sequence is u_n = (1/√5) [ ((1 + √5)/2)^n − ((1 − √5)/2)^n ], for n = 1, 2, . . . . Make use of computer algebra to generate the first 30 terms of the Fibonacci sequence directly from the difference equation, and verify that the results are in agreement with the preceding formula. Use computer algebra to show that lim_{n→∞} (u_n / u_{n−1}) = (1/2)(√5 + 1). This number is called the golden mean, and in art and architecture it represents the ratio of the sides of a rectangle that is considered to have the most pleasing appearance. Project 2 Erratic Behavior of a Sequence Generated by a Difference Equation 1. Not all difference equations generate sequences of numbers that evolve steadily as happens with the Fibonacci sequence. Use computer algebra to generate the first 20 terms of the sequence produced by the difference equation u_{n+2} = 2u_{n+1} − 5u_n with u_1 = 1, u_2 = 3, and observe its erratic behavior. Use the method of Project 1 to determine the analytical solution, and by means of computer algebra confirm that the two results are in agreement. Examine the analytical solution and explain why the behavior of the sequence of terms is so erratic. 2. Construct a difference equation of your own in which the roots λ_1 and λ_2 are equal. Find the analytical solution and use computer algebra to determine the first 20 terms of the sequence. Verify that these terms are in agreement with the ones generated directly from the difference equation. 51
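A sketch of the computations asked for in Project 1, written here in Python rather than in a computer algebra package (the variable names and the loop arrangement are our own):

from math import sqrt

# Fibonacci terms from the recurrence, the closed-form solution, and the
# ratio u_n / u_(n-1), which approaches the golden mean.
N = 30
u = [0, 1, 1]                      # u[1] = u[2] = 1; index 0 is unused
for n in range(3, N + 1):
    u.append(u[n - 1] + u[n - 2])  # u_n = u_(n-1) + u_(n-2)

phi = (1 + sqrt(5)) / 2
psi = (1 - sqrt(5)) / 2
closed_form = [round((phi**n - psi**n) / sqrt(5)) for n in range(1, N + 1)]

print(u[1:] == closed_form)        # True: recurrence and formula agree
print(u[N] / u[N - 1], phi)        # the ratio is already close to the golden mean

The same loop, started from u_1 = 1, u_2 = 3 and with the update u.append(2*u[n - 1] - 5*u[n - 2]), produces the erratic sequence of Project 2.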
  • 71. PART TWO VECTORS AND MATRICES Chapter 2 Vectors and Vector Spaces Chapter 3 Matrices and Systems of Linear Equations Chapter 4 Eigenvalues, Eigenvectors, and Diagonalization 53
  • 73. 2 C H A P T E R Vectors and Vector Spaces E ngineers, scientists, and physicists need to work with systems involving physical quantities that, unlike the density of a solid, cannot be characterized by a single number. This chapter is about the algebra of important and useful quantities called vectors that arise naturally when studying physical systems, and are defined by an ordered group of three numbers (a, b, c). Vectors are of fundamental importance and they play an essential role when the laws governing engineering and physics are expressed in mathematical terms. A scalar quantity is one that is completely described when its magnitude is known, such as pressure, temperature, and area. A vector is a quantity that is completely specified when both its magnitude and direction are given, such as force, velocity, and momentum. A vector can be described geometrically as a directed straight line segment, with its length proportional to the magnitude of the vector, the line representing the vector parallel to the line of action of the vector, and an arrow on the line showing the direction along the line, or the sense, in which the vector acts. This geometrical interpretation of a vector is valuable in many ways, as it can be used to add and subtract vectors and to multiply them by a scalar, since this merely involves changing their magnitude and sense, while leaving the line to which they are parallel unchanged. However, to perform more general algebraic operations on vectors some other form of representation is required. The one that is used most frequently involves describing a vector in terms of what are called its components along a set of three mutually orthogonal axes, which are usually taken to be the axes O{x, y, z} in the cartesian coordinate system. Here, by the component of a vector along a given line l , we mean the length of the perpendicular projection of the vector onto the line l . We will see later that this cartesian representation of a vector identifies it completely in terms of three components and enables algebraic operations to be performed on it. In particular, it allows the introduction of the scalar product, or dot product, of two vectors that results in a scalar, and a vector product, or cross product, of two vectors that leads to a vector. Finally, vectors and their algebra will be generalized to n space dimensions, leading to the concept of a vector space and to some related ideas. 55
  • 74. 56 Chapter 2 2.1 Vectors and Vector Spaces Vectors, Geometry, and Algebra M scalar vector directed straight line segment translation any quantities are completely described once their magnitude is known. A typical example of a physical quantity of this type is provided by the temperature at a given point in a room that is determined by the number specifying its value measured on a temperature scale, such as degrees F or degrees C. A quantity such as this is called a scalar quantity, and different examples of mathematical and physical scalar quantities are real numbers, length, area, volume, mass, speed, pressure, chemical concentration, electrical resistance, electric potential, and energy. Other physical quantities are only fully specified when both their magnitude and direction are given. Quantities like this are called vector quantities, and a typical example of a vector quantity arises when specifying the instantaneous motion of a fluid particle in a river. In this case both the particle speed and its direction must be given if the description of its motion is to be complete. Speed in a given direction is called velocity, and velocity is a vector quantity. Some other examples of vector quantities are force, acceleration, momentum, the heat flow vector at a point in a block of metal, the earth’s magnetic field at a given location, and a mathematical quantity called the gradient of a scalar function of position that will be defined later. By definition, the magnitude of a vector quantity is a nonnegative number (a scalar) that measures its size without regard to its direction, so, for example, the magnitude of a velocity is a speed. A convenient geometrical representation of a vector is provided by a straight line segment drawn in space parallel to the required direction, with an arrowhead indicating the sense in which the vector acts along the line segment, and the length of the line segment proportional to the magnitude of the vector. This is called a directed straight line segment, and by definition all directed straight line segments that are parallel to one another and have the same sense and length are regarded as equal. Expressed differently, moving a directed straight line segment parallel to itself so that its length remains the same and its arrow still points in the same direction leaves the vector it represents unchanged. A shift of a directed straight line segment of this type is called a translation of the vector it represents. For this reason the terms directed straight line segment and vector can be used interchangeably. Some examples of vectors that are equal through translation are shown in Fig. 2.1. It must be emphasized that geometrical representations of vectors as directed straight line segments in space are defined without reference to a specific coordinate system. This purely geometrical interpretation of vectors finds many applications, though a different form of representation is necessary if an effective vector algebra is to be developed for use with the calculus. An analytical representation of vectors that allows a vector algebra to be constructed with this purpose in mind can be based on a general coordinate system. However, throughout this chapter only rectangular cartesian coordinates will be used because they provide a simple and natural way of representing vectors. FIGURE 2.1 Equal geometrical vectors.
  • 75. Section 2.1 Vectors, Geometry, and Algebra 57 z y 0 x FIGURE 2.2 A right-handed rectangular cartesian coordinate system. right-handed system In rectangular cartesian coordinates the x-, y-, and z-axes are all mutually orthogonal (perpendicular), and the positive sense along the axes is taken to be in the direction of increasing x, y, and z. The orientation of the axes will always be such that the positive direction along the z-axis is the one in which a right-handed screw (such as a corkscrew) aligned with the z-axis will advance when rotated from the positive x-axis to the positive y-axis, as shown in Fig. 2.2. A system of axes with this property is called a right-handed system. The end of a vector toward which the arrow points will be called the tip of the vector, and the other end its base. Because a vector is invariant under a translation, there is no loss of generality in taking its base to be located at the origin O of the coordinate system, and its tip at a point P with the coordinates (a1 , a2 , a3 ), say, as shown in Fig. 2.3. An application of the Pythagoras theorem to the triangle OPP z a3 OP = 2 + (a 1 1 /2 2) 2 + a3 a2 P (a1, a2, a3) a2 O y OP '= (a 1 a1 2 +a 2 2) 1/ 2 P x FIGURE 2.3 The vector from O to P and its components a1 , a2 , and a3 in the x-, y-, z-coordinate system.
  • 76. 58 Chapter 2 Vectors and Vector Spaces magnitude, unit vector, and components ordered number triple norm and modulus 2 2 2 shows the length of the line from O to P to be (a1 + a2 + a3 )1/2 . This length is proportional to the magnitude of the vector it represents, and as the base of the vector is at O, the sense of the vector is from O to P. For convenience, the constant of proportionality will be taken to be 1, so a directed straight line segment of unit length will represent a vector of magnitude 1 and so will be called a unit vector. Using this convention, the vector represented by the line from O to P in Fig. 2.3 has 2 2 2 magnitude (a1 + a2 + a3 )1/2 . The three numbers a1 , a2 , and a3 , in this order, that define the vector from O to P are called its components in the x, y, and z directions, respectively. A set of three numbers a1 , a2 , and a3 in a given order, written (a1 , a2 , a3 ), is called an ordered number triple. As the coordinates (a1 , a2 , a3 ) of point P in Fig. 2.3 completely define the vector from O to P, this ordered number triple may be taken as the definition of the vector itself. In general, changing the order of the numbers in an ordered number triple changes the vector it defines. Sometimes it is necessary to consider a vector whose base does not coincide with the origin. Suppose that when this occurs the base C is at the point (c1 , c2 , c3 ) and the tip D is at the point (d1 , d2 , d3 ). Then Fig. 2.4 shows the components of this vector in the x, y, and z directions to be d1 − c1 , d2 − c2 , and d3 − c3 . These components determine both the magnitude and direction of the vector. The vector is described by the ordered number triple (d1 − c1 , d2 − c2 , d3 − c3 ), and the length of CD that is equal to the magnitude of the vector is [(d1 − c1 )2 + (d2 − c2 )2 + (d3 − c3 )2 ]1/2 . For convenience, it is usual to represent a vector by a single boldface character such as a, and its magnitude (length) by a , called the norm of a. It is necessary to say here that in applications of vectors to mechanics, and in some purely geometrical applications of vectors, the norm of vector r is often called its modulus and written |r|. When this convention is used, because |r| is a scalar it is usual to denote it by the corresponding ordinary italic letter r , so that r = |r|. If the base and tip of a vector need to be identified by letters, a vector such as the one from C to D in Fig. 2.4 is written CD, with underlining used to indicate that a vector is involved, and the ordering of the letters is such that the first shows the z d3 D c3 C 0 c2 d2 y c1 d1 C' D x FIGURE 2.4 Vector directed from point C at (c1 , c2 , c3 ) to point D at (d1 , d2 , d3 ).
  • 77. Section 2.1 Vectors, Geometry, and Algebra 59 base and the second the tip of the vector. Thus, CD and DC are vectors of equal magnitude but opposite sense, and when these vectors are represented by arrows, the arrows are parallel and of equal length, but point in opposite directions. EXAMPLE 2.1 If, in Fig. 2.4, C is the point (−3, 4, 9) and D the point (2, 5, 7), the vector CD has components 2 − (−3) = 5, 5 − 4 = 1, and 7 − 9 = −2, and so is represented by the ordered number triple (5, 1, −2), whereas vector DC has components −5, −1, and 2 and is represented by the ordered number triple (−5, −1, 2). Having illustrated the concepts of scalars and vectors using some familiar examples, we now develop the algebra of vectors in rather more general terms. Vectors A vector quantity a is an ordered number triple (a1 , a2 , a3 ) in which a1 , a2 , and a3 are real numbers, and we shall write a = (a1 , a2 , a3 ). The numbers a1 , a2 , and a3 , in this order, are called the first, second, and third components of vector a or, equivalently, its x-, y-, and z-components. Null vector The null (zero) vector, written 0, has neither magnitude nor direction and is the ordered number triple 0 = (0, 0, 0). Equality of vectors Two vectors a = (a1 , a2 , a3 ) and b = (b1 , b2 , b3 ) are equal, written a = b, if, and only if, a1 = b1 , a2 = b2 , and a3 = b3 . EXAMPLE 2.2 If a = (a1 , −5, 6), b = (3, b2 , b3 ) and c = (3, −5, 1), then a = b if a1 = 3, b2 = −5 and b3 = 6, and b = c if b2 = −5 and b3 = 1, but a = c for any choice of a1 because 6 = 1. Norm of a vector The norm of vector a = (a1 , a2 , a3 ), denoted by a , is the non-negative real number 2 2 2 a = a1 + a2 + a3 1/2 , and in geometrical terms a is the length of vector a. The norm of the null vector 0 is 0 = 0. For example, if a is in m/sec, “length” of a is in m/sec. EXAMPLE 2.3 If a = (1, −3, 2), then a = [12 + (−3)2 + 22 ]1/2 = √ 14, as illustrated in Fig. 2.5.
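The component definitions translate directly into a few lines of code. The NumPy sketch below simply recomputes Examples 2.1 and 2.3; it is an illustration, not part of the text.

import numpy as np

# Example 2.1: components of CD and DC from the coordinates of C and D.
C = np.array([-3.0, 4.0, 9.0])
D = np.array([2.0, 5.0, 7.0])
print(D - C)                              # [ 5.  1. -2.], the ordered triple (5, 1, -2)
print(C - D)                              # [-5. -1.  2.], the ordered triple (-5, -1, 2)

# Example 2.3: norm of a = (1, -3, 2).
a = np.array([1.0, -3.0, 2.0])
print(np.linalg.norm(a), np.sqrt(14.0))   # both give 3.7416...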
  • 78. 60 Chapter 2 Vectors and Vector Spaces z 2 A OA =⎥ ⎢a ⎥ a ⎢ −3 y 0 1 A x FIGURE 2.5 Vector a and its norm a . The sum of two vectors If a = (a1 , a2 , a3 ) and b = (b1 , b2 , b3 ) have the same dimensions, say, both are m/sec, their sum, written a + b, is defined as the ordered number triple (vector) obtained by adding corresponding components of a and b to give a + b = (a1 + b1 , a2 + b2 , a3 + b3 ). EXAMPLE 2.4 If a = (1, 2, −5) and b = (−2, 2, 4), then a + b = (1 + (−2), 2 + 2, −5 + 4) = (−1, 4, −1). Multiplying a vector by a scalar Let a = (a1 , a2 , a3 ) and λ be an arbitrary real number. Then the product λa is defined as the vector λa = (λa1 , λa2 , λa3 ). EXAMPLE 2.5 Let a = (2, −3, 5), b = (−1, 2, 4). Then 2a = (4, −6, 10), 4b = (−4, 8, 16), and 2a + 4b = (4 + (−4), −6 + 8, 10 + 16) = (0, 2, 26). This definition of the product of a vector and a scalar, called scaling a vector, shows that when vector a is multiplied by a scalar λ, the norm of a is multiplied by |λ|, because 2 2 2 λa = λ2 a1 + λ2 a2 + λ2 a3 1/2 = |λ| · a . It also follows from the definition that the sense of vector a is reversed when it is multiplied by −1, though its norm is left unaltered. The definition of the difference
  • 79. Section 2.1 Vectors, Geometry, and Algebra 61 y y a2 + b2 b2 a+ b a2 b b a2 a a 0 b1 a1 x a1 a1 + b1 x 0 FIGURE 2.6 The vector sum a + b. of two vectors is seen to be contained in the definition of their sum, because a − b = a + (−b). In particular, when a = b, we find that that a − a = 0, showing that −a is the additive inverse of a. The geometrical interpretations of the sum a + b, the difference a − b, and the scaled vector λa in terms of their components are shown in Figs. 2.6 to 2.8, though to simplify the diagrams only the two-dimensional cases are illustrated. This involves no loss of generality, because it is always possible to choose the (x, y)-plane to coincide with the plane containing the vectors a and b. Vector Addition by the Triangle Rule triangle rule for addition Consideration of Fig. 2.6 shows that the addition of vector b to vector a is obtained geometrically by translating vector b until its base is located at the tip of vector a, and then the vector representing the sum a + b has its base at the base of vector a and its tip at the tip of the repositioned vector b. Because of the triangle involving vectors a, b, and a + b, this geometrical interpretation of a vector sum is called the triangle rule for vector addition. The triangle rule also applies to the difference of two vectors, as may be seen by considering Fig. 2.7, because after obtaining −b from b by reversing its sense, the difference a − b can be written as the vector sum a + (−b), where −b is added to vector a by means of the triangle rule. The algebraic results discussed so far concerning the addition and scaling of vectors, together with some of their consequences, are combined to form the following theorem. y y b2 a2 a2 b −b1 −b a a1 − b1 a 0 b1 a1 x −b2 FIGURE 2.7 The vector difference a − b. 0 a2 − b2 a−b −b a1 x
  • 80. 62 Chapter 2 Vectors and Vector Spaces y y 2a2 a2 1/2 a 2a 1/2 a2 a a1 0 y y x 0 x 2a1 0 (k = 2) −2a1 1/2 a1 x x (k = 1/2) −2a −2a2 (k = −2) FIGURE 2.8 The vector ka for different values of k. THEOREM 2.1 Addition and scaling of vectors Let P, Q, and R be arbitrary vectors and let α and β be arbitrary real numbers. Then: 1. P+Q=Q+P 2. P+0=0+P=P 3. (P + Q) + R = P + (Q + R) 4. α(P + Q) = αP + αQ 5. (αβ)P = α(βP) = β(αP) 6. (α + β)P = αP + βP 7. αP = |α| · P (vector addition is commutative); (0 is the identity element in vector addition); (vector addition is associative); (multiplication by a scalar is distributive over vector addition); (multiplication of a vector by a product of scalars is associative); (multiplication of a vector by a sum of scalars is distributive); (scaling P by α scales the norm of P by |α|). Proof The results of this theorem are all immediate consequences of the above definitions so as the proofs of results 1 to 6 are all very similar, and result 7 has already been established, we only prove result 4. Let P = ( p1 , p2 , p3 ) and Q = (q1 , q2 , q3 ); then α(P + Q) = α( p1 + q1 , p2 + q2 , p3 + q3 ) = α[( p1 , p2 , p3 ) + (q1 , q2 , q3 )] = α( p1 , p2 , p3 ) + α(q1 , q2 , q3 ) = αP + αQ, as was to be shown. The Representation of Vectors in Terms of the Unit Vectors i, j, and k The components of a vector, together with vector addition, can be used to describe vectors in a very convenient way. The idea is simple, and it involves using the standard convention that i, j, and k are vectors of unit length that point in the positive sense along the x-, y-, and z-axes, respectively. Vectors such as i, j, and k that have a unit norm (length) are called unit vectors, so i = j = k = 1.
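Results 1 to 7 of Theorem 2.1 can be spot-checked numerically. The short NumPy sketch below confirms the distributive law of result 4 and the norm-scaling rule of result 7; the vectors P and Q and the scalar alpha are arbitrary test values chosen only for this illustration.

import numpy as np

# Numerical spot-check of results 4 and 7 of Theorem 2.1.
P = np.array([2.0, -1.0, 3.0])
Q = np.array([0.5, 4.0, -2.0])
alpha = -1.5

print(np.allclose(alpha * (P + Q), alpha * P + alpha * Q))   # result 4: alpha(P + Q) = alphaP + alphaQ
print(np.isclose(np.linalg.norm(alpha * P),
                 abs(alpha) * np.linalg.norm(P)))            # result 7: norm scales by |alpha|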
  • 81. Section 2.1 Vectors, Geometry, and Algebra 63 z a3 k j a= i a 1i j + a2 A(a1, a2, a3) k + a3 a3k 0 a2 y a1i a1 a2 j x FIGURE 2.9 Vector a in terms of the unit vectors i, j, and k. An arbitrary vector a can be represented by an “arrow,” with its base at the origin and its tip at the point A with cartesian coordinates (a1 , a2 , a3 ) where, of course, a1 , a2 , and a3 are also the components of a. Consequently, scaling the unit vectors i, j, and k by the respective x, y, and z components a1 , a2 , and a3 of a, followed by vector addition of these three vectors, shows that a can be written a = a1 i + a2 j + a3 k, (1) as can be seen from Fig. 2.9. The representation of vector a in terms of the unit vectors i, j, and k in (1), and the ordered triple notation, are equivalent, so a = a1 i + a2 j + a3 k = (a1 , a2 , a3 ). position vector (2) In some applications a vector defines a point in space, so vectors of this type are called position vectors. The symbol r is normally used for a position vector, so if point P with coordinates (x, y, z) is a general point in space, as in Fig. 2.10, its z k j r= i xi j+ +y P (x, y, z) zk zk 0 y xi yj 1/ OP = ⎥⎢r⎥⎢ = (x2 + y2 + z2) 2 x FIGURE 2.10 Position vector of a general point P in space.
  • 82. 64 Chapter 2 Vectors and Vector Spaces position vector relative to the origin is r = xi + yj + zk, (3) r = (x 2 + y2 + z2 )1/2 . (4) and its norm (length) is EXAMPLE 2.6 (a) Find the distance of point P from the origin given that its position vector is r = 2i + 4j − 3k. (b) If a general point P in space has position vector r = xi + yj + zk, describe the surface defined by r = 3 and find its cartesian equation. Solution (a) As r is the position vector of P relative to the origin, the distance of √ point P from the origin is r = [22 + 42 + (−3)2 ]1/2 = 29. (b) As r = 3 (constant), it follows that the required surface is one for which every point lies at a distance 3 from the origin, so the surface must be a sphere of radius 3 centered on the origin. As r = xi + yj + zk is the general position vector of a point on this sphere, the result r = 3 is equivalent to (x 2 + y2 + z2 )1/2 = 3, so the cartesian equation of the sphere is x 2 + y2 + z2 = 9. Because of the equivalence of the ordered number triple notation and the representation of vectors in terms of the unit vectors i, j, and k given in (2), both systems obey the same rules governing the addition and scaling of vectors in terms of their components. Thus, the following rules apply to the combination of any two vectors a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k expressed in terms of i, j, and k, and an arbitrary real number λ. The sum a + b is given by a + b = (a1 + b1 )i + (a2 + b2 )j + (a3 + b3 )k. (5) The product λ a is given by λ a = λa1 i + λa2 j + λa3 k. (6) The norm of scaled vector λ a is given by λ a = |λ| · a 2 2 2 = |λ| a1 + a2 + a3 EXAMPLE 2.7 1/2 . (7) If a = 5i + j − 3k and b = 2i − 2j − 7k, find (a) a + b, (b) a − b, (c) 2a + b, and (d) |−2a|. Solution (a) a + b = (5i + j − 3k) + (2i − 2j − 7k) = (5 + 2)i + (1 − 2)j + (−3 − 7)k = 7i − j − 10k. (b) a − b = (5i + j − 3k) − (2i − 2j − 7k) = (5 − 2)i + (1 − (−2))j + (−3 − (−7))k = 3i + 3j + 4k.
  • 83. Section 2.1 Vectors, Geometry, and Algebra 65 2a + b = 2(5i + j − 3k) + (2i − 2j − 7k) = (10i + 2j − 6k) + (2i − 2j − 7k) = (10 + 2)i + (2 + (−2))j + (−6 + (−7))k = 12i − 13k. √ 1/2 |−2a| = [(−10)2 + (−2)2 + 62 ] = 2 35 (c) (d) or, equivalently, √ |−2a| = |−2| · a = 2 a = 2[52 + 12 + (−3)2 ]1/2 = 2 35. Finding a Unit Vector in the Direction of an Arbitrary Vector It is often necessary to find a unit vector in the direction of an arbitrary vector a = a1 i + a2 j + a3 k. This is accomplished by dividing a by its norm a , because the vector a/ a has the same sense as a and its norm is 1. It is convenient to use a symbol related to an arbitrary vector a to indicate the unit vector in its direction, so from ˆ now on such a vector will be denoted by a, read “a hat.” So if a = a1 i + a2 j + a3 k, 2 2 2 ˆ a = a/ a = (a1 i + a2 j + a3 k)/ a1 + a2 + a3 = (a1 /a)i + (a2 /a)j + (a3 /a)k, 1/2 2 2 2 with a = a1 + a2 + a3 1/2 . (8) As the symbols i, j, and k are used exclusively for the unit vectors in the x-, y-, and ˆ z-directions, it is not necessary to write ˆ j, and k. i, ˆ ˆ The relationship between a, a, and a can be put in the useful form ˆ a = a a, (9) ˆ showing that a general vector a can always be written as the unit vector a scaled by a . Unless otherwise stated, a = 0. EXAMPLE 2.8 Find a unit vector in the direction of a = 3i + 2j + 5k. √ Solution As a = (32 + 22 + 52 )1/2 = 38, it follows that √ √ √ ˆ a = a/ a = (3/ 38)i + (2/ 38)j + (5/ 38)k. EXAMPLE 2.9 It is known from experiments in mechanics that forces are vector quantities and so combine according to the laws of vector algebra. Use this fact to find the sum and difference of a force of 9 units in the direction of 2i + j − 2k and a force of 10 units in the direction of 4i − 3j, and determine the magnitudes of these forces. Solution We will use the convention that a unit vector represents a force of 1 unit. Let F be the force of 9 units. Then as 2i + j − 2k = [22 + 12 + (−2)2 ]1/2 = 3, the unit vector in the direction of F is ˆ F = (1/3)(2i + j − 2k) = (2/3)i + (1/3)j − (2/3)k, ˆ so F = 9F = 6i + 3j − 6k units.
  • 84. 66 Chapter 2 Vectors and Vector Spaces Similarly, let G be the force of 10 units. Then as 4i − 3j = 5, the unit vector in the direction of G is ˆ G = (1/5)(4i − 3j) = (4/5)i − (3/5)j, ˆ so G = 10G = 8i − 6j units. Combining these results shows that F + G = 14i − 3j − 6k units, and F − G = −2i + 9j − 6k units, from which it follows that the magnitudes of the forces are given by √ F + G = 241 units and F − G = 11 units. Equality of vectors expressed in terms of unit vectors As the difference of two equal and opposite vectors is the null vector 0, this shows that if a = b, where a = a1 i + a2 j + a3 k, and b = b1 i + b2 j + b3 k, then the respective components of vectors a and b must be equal, leading to the result that (10) a = b if, and only if, a1 = b1 , a2 = b2 , and a3 = b3 . Simple Geometrical Applications of Vectors Although our use of vectors will be mainly in connection with the calculus, the following simple geometrical applications are helpful because they illustrate basic vector arguments and properties. Although we have seen how an arbitrary vector can be expressed in terms of unit vectors associated with a cartesian coordinate system, it must be remembered that the fundamental concept of a vector and its algebra is independent of a coordinate system. Because of this, it is often possible to use the rules governing elementary vector algebra given in Theorem 2.1 to establish equations in a purely vectorial manner, without the need to appeal to any coordinate system. Once a general vector equation has been established, the representation of the vectors involved in terms of their components and the unit vectors i, j, and k can be used to convert the vector equation into the equivalent cartesian equations. The purely vectorial approach to geometrical problems is well illustrated by finding the vector AB in terms of the position vectors of points A and B, and then using the result to find the position vector of the mid-point of AB. After this, the purely vectorial derivation of a geometrical result followed by its interpretation in cartesian form will be illustrated by finding the equation of a straight line in three space dimensions. Vector AB in terms of the position vectors of A and B Let a and b be the position vectors of points A and B relative to an origin O, as shown in Fig. 2.11. An application of the triangle rule for the addition of vectors gives OA + AB = OB, but OA = a and OB = b, so a + AB = b,
  • 85. Section 2.1 Vectors, Geometry, and Algebra 67 B AB A b a O FIGURE 2.11 Vectors a, b, and AB. giving AB = b − a. (11) When expressed in words, this simple but useful result asserts that vector AB is obtained by subtracting the position vector a of point A from the position vector b of point B. EXAMPLE 2.10 Find the position vector of the mid-point of AB if point A has position vector a and point B has position vector b relative to an origin O. Solution Let point C, with position vector c relative to origin O, be the mid-point of AB, as shown in Fig. 2.12. By the triangle rule, OA + AC = OC, but OA = a, and from (11) AC = (1/2)(b − a), so OC = a + (1/2)(b − a), so the required result is c = OC = (1/2)(b + a). B C AC A c b a O FIGURE 2.12 C is the mid-point of AB.
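Result (11) and the mid-point formula of Example 2.10 are easily evaluated numerically; in the NumPy sketch below the position vectors a and b are arbitrary test values chosen for illustration.

import numpy as np

# AB = b - a (result (11)) and the mid-point (1/2)(a + b) of Example 2.10.
a = np.array([-3.0, 4.0, 9.0])         # position vector of A
b = np.array([2.0, 5.0, 7.0])          # position vector of B

AB = b - a
c = 0.5 * (a + b)                      # position vector of the mid-point C

print(AB)                              # [ 5.  1. -2.]
print(c)                               # [-0.5  4.5  8. ]
print(np.allclose(a + 0.5 * AB, c))    # True: C lies half-way from A to B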
  • 86. 68 Chapter 2 Vectors and Vector Spaces b A a λb AP = L P r O FIGURE 2.13 The straight line L. The vector and cartesian equations of a straight line Let line L be a straight line through point A with position vector a relative to an origin O, and let the line be parallel to a vector b. If P is an arbitrary point on line L with position vector r relative to O, an application of the triangle rule for vector addition to the vectors shown in Fig. 2.13 gives r = OA + AP. vector equation of straight line But OA = a, and as AP is parallel to b, a number λ can always be found such that AP = λb, so the vector equation of line L becomes r = a + λb. (12) Notice that result (12) determines all points P on L if λ is taken to be a number in the interval −∞ < λ < ∞. The cartesian equations of line L follow by setting a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k, and r = xi + yj + zk in result (12), and then using the definition of equality of vectors given in (10) to obtain the corresponding three scalar cartesian equations. Proceeding in this way we find that xi + yj + zk = a1 i + a2 j + a3 k + λ(b1 i + b2 j + b3 k), cartesian and standard form of straight line so equating corresponding components of i, j, and k on each side of this equation brings us to the required cartesian equations for L in the form x1 = a1 + λb1 , x2 = a2 + λb2 , x3 = a3 + λb3 . (13) An equivalent form of these equations is obtained by solving each equation for λ and equating the results to get x − a1 y − a2 z − a3 = = = λ. b1 b2 b3 (14)
  • 87. Section 2.1 Vectors, Geometry, and Algebra 69 This is the standard form (also called the canonical form) of the cartesian equations of a straight line. It is important to notice that when written in standard form the coefficients of x, y, and z are all unity. Once the equation of a straight line is written in standard form, equating each numerator to zero determines the components (a1 , a2 , a3 ) of a position vector of a point on the line, while the denominators in the order (b1 , b2 , b3 ) determine the components of a vector parallel to the line. EXAMPLE 2.11 A straight line L is given in the form 3−y z+ 1 2x − 3 = = . 4 2 3 Find the position vector of a point on L and a vector parallel to L. Solution When the equation is written in standard form it becomes x − 3/2 y−3 z+ 1 = = = λ. 2 −2 3 Comparing these equations with (14) shows that (a1 , a2 , a3 ) = (3/2, 3, −1) and b = (b1 , b2 , b3 ) = (2, −2, 3). So the position vector of a point on the line is a = (3/2)i + 3j − k, and a vector parallel to the line is b = 2i − 2j + 3k. Neither of these results is unique, because μb is also parallel to the line for any scalar μ = 0, and any other point on L would suffice. For example, the vector 14i − 14j + 21k is also parallel to the line, while setting λ = 2 leads to the result (a1 , a2 , a3 ) = (11/2, −1, 5), corresponding to a different point on the same line, this time with position vector a = (11/2)i − j + 5k. Summary This section has introduced vectors both as geometrical quantities that can be represented by directed line segments and, using a right-handed system of cartesian axes, as ordered number triples. Definitions of the scaling, addition, and subtraction of vectors have been given, and a general vector has been defined in terms of the set of three unit vectors i, j, and k that lie along the orthogonal cartesian axes O{x, y, z}. Finally, the vector and cartesian equations of a straight line in space have been derived, and the standard form of the cartesian equations has been introduced from which a vector parallel to the line may be found by inspection. EXERCISES 2.1 1. Prove Results 1, 3, and 6 of Theorem 2.1. 2. Given that a = 2i + 3j − k, b = i − j + 2k, and c = 3i + 4j + k, find (a) a + 2b − c, (b) a vector d such that a + b + c + d = 0, and (c) a vector d such that a − b + c + 3d = 0. 3. Given a = i + 2j + 3k, b = 2i − 2j + k, find (a) a vector c such that 2a + b + 2c = i + k, (b) a vector c such that 3a − 2b + c = i + j − 2k. 4. Given that a = 3i + 2j − 3k, b = 2i − j + 5k, and c = 2i + 5j + 2k, find (a) 2a + 3b − 3c, (b) a vector d such that a + 3b − 2c + 3d = 0, and (c) a vector d such that 2a − 3d = b + 4c. 5. Given that Aand B have the respective position vectors 2i + 3j − k and i + 2j + 4k, find the vector AB and a unit vector in the direction of AB. 6. Given that A and B have the respective position vectors 3i − j + 4k and 2i + j + k, find the vector AB and the position vector c of the mid-point of AB. 7. Given that Aand B have the respective position vectors a and b, find the position vector of a point P on the line AB located between A and B such that (length AP)/(length PB) = m/n, where m, n > 0 are any two real numbers.
  • 88. 70 Chapter 2 Vectors and Vector Spaces 8. Find the position vector r of a point P on the straight line joining point Aat (1, 2, 1) and point B at (3, −1, 2) and between A and B such that 13. A straight line L is given in the form 2x + 1 3y + 2 2 − 4z = = . 3 4 −1 (length AP)/(length PB) = 3/2. 9. It is known from Euclidean geometry that the medians of a triangle (lines drawn from a vertex to the mid-point of the opposite side) all meet at a single point P, and that P is two-thirds of the distance along each median from the vertex through which it passes. If the vertices A, B, and C of a triangle have the respective position vectors a, b, and c, show that the position vector of P is (1/3)(a + b + c). 10. Forces of 1, 2, and 3 units act through the origin along, and in the positive directions of, the respective x-, y-, and z-axes. Find the vector sum S of these forces, the magnitude S of the sum of the vectors, and a unit vector in the direction of S. 11. Forces of 2, 1, and 4 units act through the origin along, and in the positive directions of, the respective x-, y-, and z-axes. Find the vector sum S of these forces, the magnitude S of the sum of the vectors, and a unit vector in the direction of S. 12. A straight line L is given in the form 3x − 1 2y + 3 2 − 3z = = . 4 2 1 Find the position vectors of two different points on L and a unit vector parallel to L. 2.2 14. 15. 16. 17. 18. Find position vectors of two different points on L and a unit vector parallel to L. Given that a straight line L1 passes through the points (−2, 3, 1) and (1, 4, 6), find (a) the position vector of a point on the line and a vector parallel to it, and (b) a straight line L2 parallel to L1 that passes through the point (1, 2, 1). Given that a straight line L1 passes through the points (3, 2, 4) and (2, 1, 6), find (a) the position vector of a point on the line and a vector parallel to it, and (b) a straight line L2 parallel to L1 that passes through the point (−2, 1, 2). A straight line has the vector equation r = a + λb, where a = 3j + 2k, and b = 2i + j + 2k. Find the cartesian equations of the line and the coordinates of three points that lie on it. A straight line passes through the point (3, 2, −3) parallel to the vector 2i + 3j − 3k. Find the cartesian equations of the line and the coordinates of three points that lie on it. In mechanics, if a point A moves with velocity vA and point B moves with velocity vB , the velocity vR of A relative to B (the relative velocity of A with respect to B) is defined as vR = vA − vB . Power boat A moves northeast at 20 knots and power boat B moves southeast at 30 knots. Find the velocity of boat Arelative to boat B, and a unit vector in the direction of the relative velocity. The Dot Product (Scalar Product) A product of two vectors a and b can be formed in such a way that the result is a scalar. The result is written a · b and called the dot product of a and b. The names scalar product and inner product are also used in place of the term dot product. Dot Product dot or scalar product Let a and b be any two vectors that after a translation to bring their bases into coincidence are inclined to one another at an angle θ , as shown in Fig. 2.14, where 0 ≤ θ ≤ π. Then the dot product of a and b is defined as the number a · b = a · b cos θ. This geometrical definition of the dot product has many uses, but when working with vectors a and b that are expressed in terms of their components in the i, j, and
  • 89. Section 2.2 The Dot Product (Scalar Product) 71 a θ b FIGURE 2.14 Vectors a and b inclined at an angle θ. k directions, a more convenient form is needed. An equivalent definition that is easier to use is given later in (23). properties of the dot product Properties of the dot product The following results, in which a and b are any two vectors and λ and μ are any two scalars, are all immediate consequences of the definition of the dot product. The dot product is commutative a · b = b · a and λa · μb = μa · λb = λμa · b (15) The dot product is distributive and linear a · (b + c) = a · b + a · c and a · (λb + μc) = λa · b + μa · c. (16) The angle between two vectors The angle θ between vectors a and b is given by cos θ = a·b , a · b with 0 ≤ θ ≤ π. (17) Parallel vectors (θ = 0) If vectors a and b are parallel, then a·b= a · b and, in particular, a · a = a 2. (18) Orthogonal vectors (θ = π/2) If vectors a and b are orthogonal, then a · b = 0. (19)
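These properties are easy to spot-check numerically. In the NumPy sketch below the vectors and scalars are arbitrary test values, and the dot product is computed with numpy.dot; the sketch is an illustration, not part of the text.

import numpy as np

# Spot-check of dot product properties (15), (16), and (19).
a = np.array([2.0, 3.0, -1.0])
b = np.array([1.0, -1.0, 2.0])
c = np.array([0.5, 4.0, -2.0])
lam, mu = 3.0, -2.0

print(np.isclose(np.dot(a, b), np.dot(b, a)))                        # (15): a.b = b.a
print(np.isclose(np.dot(lam * a, mu * b), lam * mu * np.dot(a, b)))  # (15), scaled vectors
print(np.isclose(np.dot(a, b + c), np.dot(a, b) + np.dot(a, c)))     # (16): distributive
print(np.dot(np.array([3.0, 2.0, -6.0]), np.array([2.0, 0.0, 1.0]))) # 0.0: orthogonal, as in (19)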
  • 90. 72 Chapter 2 Vectors and Vector Spaces Product of unit vectors ˆ ˆ If a and b are unit vectors, then ˆ ˆ a · b = cos θ, with 0 ≤ θ ≤ π. (20) An immediate consequence of properties (15), (19), and (20) is that i · i = j · j = k · k = 1, (21) i · j = j · i = i · k = k · i = j · k = k · j = 0. (22) and We now use results (21) and (22) to arrive at a simple expression for the dot product in terms of the components of a and b. To arrive at the result we set a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k and form the dot product a · b = (a1 i + a2 j + a3 k) · (b1 i + b2 j + b3 k). dot product in terms of components Expanding this product using (15) and (16) and making use of results (21) and (22) brings us to the following alternative definition of the dot product expressed in terms of the components of a and b: a · b = a1 b1 + a2 b2 + a3 b3 . (23) Using (23) in (17) produces the following useful expression that can be used to find the angle θ between a and b: cos θ = EXAMPLE 2.12 a1 b1 + a2 b2 + a3 b3 2 a1 + 2 a2 2 + a3 1/2 2 2 2 b1 + b2 + b3 1/2 where 0 ≤ θ ≤ π. (24) Find a · b and the angle between the vectors a and b, given that a = i + 2j + 3k and b = 2i − j − 2k. √ Solution a = 14, b = 3, and a · b = 1 · 2 + 2 · (−1) + 3 · (−2) = −6. Using these results in (24) gives √ √ cos θ = −6/(3 14) = −2/ 14, so as 0 ≤ θ ≤ π we see that θ = 2.1347 radians, or θ = 122.3◦ . projecting a vector onto a line The projection of a vector onto the line of another vector The projection of vector a onto the line of vector b is a scalar, and it is the signed length of the geometrical projection of vector a onto a line parallel to b, with the sign positive for 0 ≤ θ < π/2 and negative for π/2 < θ ≤ π . This is illustrated in Fig. 2.15, from which it is seen that the signed length of the projection of a onto the line of vector b is ON, where ON = a cos θ.
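A quick numerical version of Example 2.12, using the component form (23) of the dot product and the angle formula (24) (a NumPy sketch, not part of the text):

import numpy as np

# Example 2.12: a = i + 2j + 3k, b = 2i - j - 2k.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, -1.0, -2.0])

dot = np.dot(a, b)                                                   # (23): a1*b1 + a2*b2 + a3*b3
theta = np.arccos(dot / (np.linalg.norm(a) * np.linalg.norm(b)))     # (24)

print(dot)                          # -6.0
print(theta, np.degrees(theta))     # about 2.1347 radians, about 122.3 degrees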
  • 91. Section 2.2 OA = ⎥⎢a⎥⎢ a A The Dot Product (Scalar Product) A a OA = ⎥⎢a⎥⎢ a a θ b 73 O N N b ^ b θ O ^ b π/2 < θ ≤ π 0 ≤ θ < π/2 FIGURE 2.15 The projection of vector a onto the line of vector b. ˆ ˆ ˆ ˆ If b is the unit vector along b, then as a = a a , and a · b = cos θ , the projection ON = a cos θ can be written as the dot product ˆ ˆ ˆ ON = a a · b = a · b = EXAMPLE 2.13 a·b b (25) Find the strength of the magnetic field vector H = 5i + 3j + 7k in the direction of 2i − j + 2k, where a unit vector represents one unit of magnetic flux. Solution We are required to find the projection of vector H in the direction of ˆ the vector 2i − j + 2k. Setting b = 2i − j + 2k, b = 3, so b = (1/3)(2i − j + 2k), so the strength of the vector H in the direction of b is ˆ H · b = (1/3)(5i + 3j + 7k) · (2i − j + 2k) = 7. Direction cosines and direction ratios ˆ If a = a1 i + a2 j + a3 k is an arbitrary vector, the unit vector a in the direction of a is ˆ a = (a1 i + a2 j + a3 k)/ a 1/2 2 2 2 = (a1 i + a2 j + a3 k)/ a1 + a2 + a3 . (26) Taking the dot product of a with i, j, and k, and setting l = a1 / 2 2 2 2 2 2 2 2 2 (a1 + a2 + a3 )1/2 , m = a2 /(a1 + a2 + a3 )1/2 , and n = a3 /(a1 + a2 + a3 )1/2 gives ˆ l = i · a, ˆ m = j · a, and ˆ n = k · a, so we may write ˆ a = li + mj + nk. 2 2 2 ˆ ˆ The dot product a · a = l 2 + m2 + n2 = (a1 + a2 + a3 )/ a 2 , but 2 2 2 a1 + a2 + a3 , so l 2 + m2 + n2 = 1. (27) a 2 = (28) The number l is the cosine of the angle β1 between a and the x-axis, the number m is the cosine of the angle β2 between a and the y-axis, and the number n is the cosine of the angle β3 between a and the z-axis, as shown in
  • 92. 74 Chapter 2 Vectors and Vector Spaces z β3 O a⎥ ⎢ = ⎥⎢ OA a A β2 β1 y x FIGURE 2.16 The angles β1 , β2 , and β3 . direction cosines Fig. 2.16. The numbers (l, m, n) are called the direction cosines of a, because ˆ they determine the direction of the unit vector a that is parallel to a. Notice that when any two of the three direction cosines l, m, and n of a vector a are given, the third is related to them by l 2 + m2 + n2 = 1. Because of result (27) it is always possible to write a = a (li + mj + nk), direction ratios EXAMPLE 2.14 (29) where l, m, and n are the direction cosines of a. As the components a1 , a2 , and a3 of a are proportional to the direction cosines, they are called the direction ratios of a. Find the direction cosines and direction ratios of a = 3i + j − 2k. √ √ √ Solution√ As a = 14, the direction cosines are l = 3/ 14, m = 1/ 14, and n = −2/ 14. The direction ratios of a are 3, 1, and −2, or any nonnegative multiple √ √ √ of these three numbers such as 15/ 14, 5/ 14, and −10/ 14. The triangle inequality The following result will be needed in the proof of the triangle inequality that is to follow. The absolute value of a · b = a · b cos θ is |a · b| = a · b |cos θ|, but | cos θ | ≤ 1, so using this in the above result we obtain the Cauchy–Schwarz inequality, |a · b| ≤ a · b . (30)
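The projection formula (25), the direction cosines of (26) to (28), and the Cauchy–Schwarz inequality (30) can all be confirmed numerically. The NumPy sketch below redoes Examples 2.13 and 2.14 and checks (30) for the same pair of vectors; it is an illustration only.

import numpy as np

# Example 2.13: projection of H = 5i + 3j + 7k onto the line of b = 2i - j + 2k.
H = np.array([5.0, 3.0, 7.0])
b = np.array([2.0, -1.0, 2.0])
print(np.dot(H, b) / np.linalg.norm(b))            # 7.0, using (25)

# Example 2.14: direction cosines of a = 3i + j - 2k.
a = np.array([3.0, 1.0, -2.0])
l, m, n = a / np.linalg.norm(a)
print(l, m, n, l**2 + m**2 + n**2)                 # the squares sum to 1, as in (28)

# Cauchy-Schwarz inequality (30) for a and b.
print(abs(np.dot(a, b)) <= np.linalg.norm(a) * np.linalg.norm(b))    # True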
  • 93. Section 2.2 THEOREM 2.2 The Dot Product (Scalar Product) 75 The triangle inequality If a and b are any two vectors, then a+b ≤ a + b . Proof From (18) we have a+b 2 = (a + b) · (a + b) = a · a + 2a · b + b · b = a 2 + 2a · b + b 2 , but a · b ≤ |a · b|, so from the Cauchy–Schwarz inequality (30) a+b 2 ≤ a 2 +2 a · b + b 2 = ( a + b )2 . Taking the positive square root of this last result, we obtain the triangle inequality a+b ≤ a + b . The triangle inequality will be generalized in Section 2.5, but in its present form it is the vector equivalent of the Euclidean theorem that “the sum of the lengths of any two sides of a triangle is greater than or equal to the length of the third side,” and it is from this theorem that the inequality derives its name. Equation of a Plane vector equation of a plane When working with the vector calculus it is sometimes necessary to consider a plane that is locally tangent to a point on a surface in space so it will be useful to derive the general equation of a plane in both its vector and cartesian forms. A plane can be defined by specifying a fixed point belonging to the plane and a vector n that is perpendicular to the plane. This follows because if n is perpendicular at a point on the plane, it must be perpendicular at every point on the plane. Any vector n that is perpendicular to a plane is called a normal to the plane. Clearly a normal to a plane is not unique, because a plane has two sides, so if a normal n is directed away from one side of the plane, the vector −n is a normal directed away from the other side. Both n and −n can be scaled by any nonzero number and still remain normals; consequently, if n is a normal to a plane, so also are all vectors of the form λn, with λ = 0 any real number. Let a fixed point Aon plane with normal n have position vector a relative to an origin O, and let P be a general point on plane with position vector r relative to O. Then, as may be seen from Fig. 2.17, the vector r − a lies in the plane, and so is perpendicular (normal) to n. Forming the dot product of n and r − a, and using (19), shows that the vector equation of plane is n · (r − a) = 0, (31) n · r = n · a. (32) or, equivalently, cartesian equation of a plane The cartesian form of this equation follows by considering a general point with coordinates (x, y, z) on plane , setting r = xi + yj + zk, a = a1 i + a2 j + a3 k,
  • 94. 76 Chapter 2 Vectors and Vector Spaces n π A r−a AP = P a ^.a = ^.r n n ^ n r O FIGURE 2.17 Plane through point A. with normal n passing and n = n1 i + n2 j + n3 k, and then substituting into (32) to get (n1 i + n2 j + n3 k) · (xi + yj + zk) = (n1 i + n2 j + n3 k) · (a1 i + a2 j + a3 k). Taking the dot products and using results (21) and (22) show the cartesian equation of plane to be n1 x + n2 y + n3 z = n1 a1 + n2 a2 + n3 a3 = d, a constant. EXAMPLE 2.15 (33) Find the cartesian equation of the plane through the point (2, 5, 3) with normal 3i + 2j − 7k. Solution Here n1 = 3, n2 = 2, n3 = −7 and a1 = 2, a2 = 5, and a3 = 3, so substituting into (33) shows the plane has the equation 3x + 2y − 7z = −5. Summary This section has introduced the dot or scalar product of two vectors in geometrical terms and, more conveniently for calculations, in terms of the components of the two vectors involved. The applications given include the important operation of projecting a vector onto the line of another vector and the derivation of the vector equation and cartesian equation of a plane. EXERCISES 2.2 1. Find the dot products of the following pairs of vectors: (a) i − j + 3k, 2i + 3j + k. (b) 2i − j + 4k, −i + 2 j + 2k. (c) i + j − 3k, 2i + j + k. 2. Find the dot products of the following pairs of vectors: (a) i − 2 j + 4k, i + 2 j + 3k. (b) 3i + j + 2k, 4i − 3j + k. (c) 5i − 3j + 3k, 2i − 3j + 5k. 3. Find which of the following pairs of vectors are orthogonal: (a) 3i + 2 j − 6k, −9i − 6j + 18k. (b) 3i − j + 7k, 3i + 2 j + k. (c) 2i + j + k, i + j − k. (d) i + j − 3k, 2i + j + k. 4. Find which, if any, of the following pairs of vectors are orthogonal: (a) 2i + j + k, 8i + 2 j + 2k. (b) i + 2 j + 3k, 2i − 2 j − 3k. (c) i + 2 j + 4k, 2i + j + 3k. (d) i + j, 2 j + 3k. 5. Given that a = 2i + 3j − 2k, b = i + 3j + k and c = 3i + j − k, find (a) (a + b) · c. (b) (2b − 3c) · a. (c) a · a. (d) c · (a − 2b).
  • 95. Section 2.3 6. Given that a = 3i + 2 j − 3k, b = 2i + j + 2k, and c = 5i + 2 j − 2k, find (a) b · (b + (a · c)c). (b) (a + 2b) · (2b − 3c). (c) (c · c)b − (a · a)c. 7. Find the angle between the following pairs of vectors: (a) i + j + k, 2i + j − k. (b) 2i − j + 3k, 2i + j + 3k. (c) 3i − j + k, i − 2 j + 3k. (d) i − 2 j + k, 4i − 8j + 16k. 8. Given a = 2i − 3j − 3k, b = i + j + 2k, and c = 3i − 2 j − k, find the angles between the following pairs of vectors: (a) a + b, b − 2c. (b) 2a − c, a + b − c. (c) b + 3c, a − 2c. 9. Find the component of the force F = 4i + 3j + 2k in the direction of the vector i + j + k. 10. Find the component of the force F = 2i + 5j − 3k in the direction of the vector 2i + j − 2k. 11. Given that a = i + 2 j + 2k and b = 2i − 3j + k, find (a) the projection of a onto the line of b, and (b) the projection of b onto the line of a. 12. Given that a = 3i + 6j + 9k and b = i + 2 j + 3k, (a) find the projection of a onto the line of b and (b) compare the magnitude of a with the result found in (a) and comment on the result. 13. Find the direction cosines and corresponding angles for the following vectors: (a) i + j + k. (b) i − 2 j + 2k. (c) 4i − 2 j + 3k. 14. Find the direction cosines and corresponding angles for the following vectors: (a) i − j − k. (b) 2i + 2 j − 5k. (c) −4j − k. 15. Verify the triangle inequality for vectors a = i + 2 j + 3k and b = 2i + j + 7k. 16. Verify the triangle inequality for vectors a = 2i − j − 2k and 3i + 2 j + 3k. 17. Find the equation of the plane with normal 2i − 3j + k that contains the point (1, 0, 1). 18. Find the equation of the plane with normal i − 2 j + 2k that contains the point (2, −3, 4). 19. Given that a plane passes through the point (2, 3, −5), and the vector 2i + k is normal to the plane, find the cartesian form of its equation. 2.3 The Cross Product 77 20. The equation of a plane is 3x + 2y − 5z = 4. Find a vector that is normal to the plane, and the position vector of a point on the plane. 21. Explain why if the vector equation of plane in (32) is divided by n to bring it into the form r · n = a · n, the number |a · n| is the perpendicular distance of origin O from the plane. Explain also why if a · n > 0 the plane lies to the side of O toward which n is directed, as in Fig. 2.15, but that if a · n < 0 it lies on the opposite side of O toward which −n is directed. 22. Use the result of Exercise 21 to find the perpendicular distance of the plane 2x − 4y − 5z = 5 from the origin. 23. The angle between two planes is defined as the angle between their normals. Find the angle between the two planes x + 3y + 2z = 4 and 2x − 5y + z = 2. 24. Find the angle between the two planes 3x + 2y − 2z = 4 and 2x + y + 2z = 1. 25. Let a and b be two arbitrary skew (nonparallel) vectors, and set a = ab + ap , where ab is parallel to b and ap is perpendicular to b and lies in the plane of a and b. Find ab and ap in terms of a and b. 26. The law of cosines for a triangle with sides of length a, b, and c, in which the angle opposite the side of length c is C, takes the form c2 = a 2 + b2 − 2ab cos C. Prove this by taking vectors a, b, and c such that c = a − b and considering the dot product c · c = (a − b) · (a − b). 27. The work units W done by a constant force F when moving its point of application along a straight line L parallel to a vector a are defined as the product of the component of F in the direction of a and the distance d moved along line L. Express W in terms of F, a, and d. 28. 
If a and b are arbitrary vectors and λ and μ are any two scalars, prove that λa + μb 2 ≤ λ2 a 2 + 2λμa · b + μ2 b 2 . 29. Verify the result of Exercise 28 by setting λ = 2, μ = −3, a = 3i + j − 4k, and b = 2i + 3j + k. The Cross Product A product of two vectors a and b can be defined in such a way that the result is a vector. The result is written a × b and called the cross product of a and b. The name vector product is also used in place of the term cross product. Before defining the cross product we first formulate what is called the right-hand rule. Given any two skew vectors a and b, the right-hand rule is used to determine
  • 96. 78 Chapter 2 Vectors and Vector Spaces the sense of a third vector c that is required to be normal to the plane containing vectors a and b. right-hand rule The Right-Hand Rule Let a and b be two arbitrary skew vectors with the same base point, with c a vector normal to the plane containing them. If the fingers of the right hand are curled in such a way that they point from vector a to vector b through the angle θ between them, with 0 < θ < π , then when the thumb is extended away from the palm it will point in the direction of vector c. When applying the right-hand rule, the order of the vectors is important. If vectors a, b, and c obey the right-hand rule, they will always be written in the order a, b, c, with the understanding that c is normal to the plane of a and b, with its sense determined by the right-hand rule. Figure 2.18 illustrates the right-hand rule. An important special case of the right-hand rule has already been encountered in connection with the unit vectors i, j, and k that obey the rule, and because the vectors are mutually orthogonal the vectors j, k, i and k, i, j also obey the right-hand rule. geometrical definition of a cross product The cross product (a geometrical interpretation) ˆ Let a and b be two arbitrary vectors, with n a unit vector normal to the plane of ˆ a and b chosen so that a, b, and n, in this order, obey the right-hand rule. Then the cross product of vectors a and b, written a × b, is defined as the vector ˆ a × b = a . b sin θ n. (34) This geometrical definition of the cross product is useful in many situations, but when the vectors a and b are specified in terms of their cartesian components a different form of the definition will be needed. The cross product can be interpreted as a vector area, in the sense that it can ˆ be written a × b = Sn, where S = OA· BN = a · b sin θ is the geometrical area c θ b a FIGURE 2.18 The right-hand rule.
  • 97. Section 2.3 The Cross Product 79
FIGURE 2.19 The cross product interpreted as the vector area of a parallelogram.
of the parallelogram in Fig. 2.19, and the unit vector n̂ is normal to the area. This shows that the geometrical area S of the vector parallelogram with sides a and b is simply the modulus of the cross product a × b, so S = ‖a × b‖.
properties of the cross product
Properties of the cross product
The following results are consequences of the definition of the cross product.
The cross product is anticommutative:
a × b = −b × a.    (35)
The cross product is distributive over addition:
a × (b + c) = a × b + a × c.    (36)
Parallel vectors (θ = 0): If vectors a and b are parallel, then
a × b = 0.    (37)
Orthogonal vectors (θ = π/2): If vectors a and b are orthogonal, then
a × b = ‖a‖ ‖b‖ n̂.    (38)
Product of unit vectors: If a and b are unit vectors, then
a × b = sin θ n̂.    (39)
An immediate consequence of properties (34), (35), and (37) is that
i × i = j × j = k × k = 0,    (40)
  • 98. 80 Chapter 2 Vectors and Vector Spaces
and
i × j = k,  j × i = −k,  j × k = i,  k × j = −i,  k × i = j,  i × k = −j.    (41)
Only results (35) and (36) require some comment, as the other results are obvious. The change of sign in (35) that makes the cross product anticommutative occurs because when the vectors a and b are interchanged, the right-hand rule causes the direction of n̂ to be reversed. Result (36) can be proved in several ways, but we shall postpone its proof until a different expression for the cross product has been derived.
To obtain a more convenient expression for the cross product that can be used when a and b are known in terms of their components, we proceed as follows. Let a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k, and consider the cross product
a × b = (a1 i + a2 j + a3 k) × (b1 i + b2 j + b3 k).
Expanding this expression term by term is justified because of the distributive property given in (36), and it leads to the result
a × b = a1 b1 i × i + a1 b2 i × j + a1 b3 i × k + a2 b1 j × i + a2 b2 j × j + a2 b3 j × k + a3 b1 k × i + a3 b2 k × j + a3 b3 k × k.
cross product in terms of components
Results (40) cause three terms on the right-hand side to vanish, and results (41) allow the remaining six terms to be collected into three groups as follows to give
a × b = (a2 b3 − a3 b2)i − (a1 b3 − a3 b1)j + (a1 b2 − a2 b1)k.    (42)
This alternative expression for the cross product in terms of the cartesian components of vectors a and b can be further simplified by making formal use of the third-order determinant
        | i   j   k  |
a × b = | a1  a2  a3 |,
        | b1  b2  b3 |
because a formal expansion in terms of elements of the first row generates result (42). We take this result as an alternative but equivalent definition of the cross product.
practical definition of a cross product using a determinant
The cross product (cartesian component form)
Let a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k. Then
        | i   j   k  |
a × b = | a1  a2  a3 |.    (43)
        | b1  b2  b3 |
When expressing a × b as the determinant in (43), purely formal use was made of the method of expansion of a determinant in terms of the elements of its first row, because (43) is not a determinant in the ordinary sense, as its elements are a mixture of vectors and numbers.
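For readers who wish to check component calculations of this kind numerically, the following short sketch (not part of the original text) implements formula (42) directly and compares it with NumPy's built-in cross product; the function name cross and the sample vectors are arbitrary choices for illustration, and NumPy is assumed to be available.

```python
# Illustrative sketch (not from the text): the component formula (42) for the
# cross product, checked against NumPy's built-in cross product.
import numpy as np

def cross(a, b):
    """Cross product of two 3-vectors a and b using formula (42)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return np.array([a2 * b3 - a3 * b2,
                     -(a1 * b3 - a3 * b1),
                     a1 * b2 - a2 * b1])

a = np.array([3.0, -2.0, -1.0])   # a = 3i - 2j - k
b = np.array([1.0, 4.0, 2.0])     # b = i + 4j + 2k

print(cross(a, b))                               # [ 0. -7. 14.], i.e. -7j + 14k
print(np.allclose(cross(a, b), np.cross(a, b)))  # True
```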
  • 99. Section 2.3 The Cross Product 81
EXAMPLE 2.16
Given that a = 3i − 2j − k and b = i + 4j + 2k, find a × b and a unit vector n̂ normal to the plane containing a and b such that a, b, and n̂, in this order, obey the right-hand rule.
Solution Substitution into expression (43) gives
        | i   j   k  |
a × b = | 3  −2  −1 |
        | 1   4   2  |
= [(−2) · 2 − 4 · (−1)]i − [3 · 2 − 1 · (−1)]j + [3 · 4 − 1 · (−2)]k = −7j + 14k.
The required unit vector n̂ is simply the unit vector in the direction of a × b, so
n̂ = (a × b)/‖a × b‖ = (−7j + 14k)/(7√5) = (−1/√5)j + (2/√5)k.
We now return to the proof of the distributive property stated in (36) and establish it by means of result (43). Setting a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k, and c = c1 i + c2 j + c3 k, we have
              | i          j          k         |
a × (b + c) = | a1         a2         a3        |.
              | (b1 + c1)  (b2 + c2)  (b3 + c3) |
Expanding the determinant in terms of elements of its first row and grouping terms gives
a × (b + c) = (a2 b3 − a3 b2)i − (a1 b3 − a3 b1)j + (a1 b2 − a2 b1)k + (a2 c3 − a3 c2)i − (a1 c3 − a3 c1)j + (a1 c2 − a2 c1)k = a × b + a × c,
and the result is proved.
Summary This section first introduced the vector or cross product of two vectors in geometrical terms and then used the result to show that the vector product is anticommutative, in the sense that a × b = −b × a. Important results involving the vector product were given in terms of the components of the two vectors involved. Finally, the vector product was expressed in the form that is most convenient for calculations by writing it as a determinant, the rows of which contain the unit vectors i, j, and k and the components of the respective vectors.
EXERCISES 2.3
In Exercises 1 through 6 use (43) to find a × b.
1. For a = 2i − j − 4k, b = 3i − j − k.
2. For a = −3i + 2j + 4k, b = 2i + j − 2k.
3. For a = 7i + 6k, b = 3j + k.
4. For a = 3i + 7j + 2k, b = i − j + k.
5. For a = 2i + j + k, b = 2i − j + k.
6. For a = 3i − 2j + 6k, b = 2i + j + 3k.
In Exercises 7 through 10 verify the equivalence of the definitions of the cross product in (34) and (43) by first using (43) to calculate a × b, and hence ‖a × b‖ and n̂, and then calculating ‖a‖ and ‖b‖ directly, using result (17) to find cos θ and hence sin θ, and using the results to find a × b from (34).
  • 100. 82 7. 8. 9. 10. Chapter 2 Vectors and Vector Spaces For a = i + j + 3k and b = 3i + 2 j + k. For a = i + j + k and b = 4i + 2 j + 2k. For a = 2i + j − 3k and b = 5i − 2k. For a = −2i − 3j + k and b = 3i + j + 2k. 21. 22. 23. 24. In Exercises 11 through 14, verify by direct calculation that (b + c) × a = −a × (b + c). 11. 12. 13. 14. a = 3j + 2k, b = i − 4j + k, and c = 5i − 2 j + 3k. a = −i + 5j + 2k, b = 4i + k, and c = −2i − 4j + 3k. a = i + k, b = 3i − j − 2k, and c = 3i + j + k. a = 5i + j + k, b = 2i − j − k, and c = 4i + 2 j + 3k. In Exercises 15 through 18 find a unit vector normal to a plane containing the given vectors. 3i + j + k and i + 2 j + k. 2i − j + 2k and 2i + 3j + k. i + j + k and 2i + 3j − k. 2i + 2 j − k and 3i + j + 4k. Find a unit vector normal to a plane containing vectors a + b and a + c, given that a = i + 2 j + k, b = 2i + j − 2k, and c = 3i + 2 j + 4k. 20. Given that a = 3i + j + k, b = 2i − j + 2k, and c = i + j + k, find (a) a vector normal to the plane containing the vectors a + (a · b)b and c and, (b) explain why the normal to a plane containing the vectors a and b and the normal to a plane containing the vectors (a · b)a and (b · c)b are parallel. 15. 16. 17. 18. 19. In Exercises 21 through 24, find the cartesian equation of the plane that passes through the given points. 2.4 25. 26. 27. 28. 29. (1, 3, 2), (2, 0, −4), and (1, 6, 11). (1, 4, 3), (2, 0, 1), and (3, 4, −6). (1, 2, 3), (2, −4, 1), and (3, 6, −1). (1, 0, 1), (2, 5, 7), and (2, 3, 9). Three points with position vectors a, b, and c will be collinear (lie on a line) if the parallelogram with adjacent sides a − b and a − c has zero geometrical area. Use this result in Exercises 25 through 28 to determine which sets of points are collinear. (2, 2, 3), (6, 1, 5), (−2, 4, 3). (1, 2, 4), (7, 0, 8), (−8, 5, −2). (2, 3, 3), (3, 7, 5), (0, −5, −1). (1, 3, 2), (4, 2, 1), (1, 0, 2). A vector N normal to the plane containing the skew vectors a and b can be found as follows. N is normal to a and b, so a · N = 0 and b · N = 0. If a component of N is assigned an arbitrary nonzero value c, say, the other two components can be found from these two equations as multiples of c, and N will then be determined as a multiple of c. A suitable choice of c will make N a ˆ unit normal N. Apply this method to vectors a and b in ˆ Exercise 7 to find a vector N. Compare the result with the unit vector ˆ n = (a × b)/ a × b ˆ ˆ found from (43). Explain why although both n and N are normal to the plane containing a and b they may have opposite senses. Linear Dependence and Independence of Vectors and Triple Products The dot and cross products can be combined to provide a simple test that determines whether or not an arbitrary set of three vectors possesses a property of fundamental importance to the algebra of vectors. First, however, some introductory remarks are necessary. Given a set of n vectors a1 , a2 , . . . , an , and a set of n constants c1 , c2 , . . . , cn , the sum c1 a1 + c2 a2 + · · · + cn an linear combination of vectors basis is called a linear combination of the vectors. Linear combinations of the vectors i, j, and k were used in Section 2.1 to express every vector in three-dimensional space as a linear combination of these three vectors. A triad of vectors such as i, j, and k with the property that all vectors in three-dimensional space can be represented as linear combinations of these three vectors is said to form a basis for the space.
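The fact that any triad of linearly independent vectors serves as a basis can also be checked numerically: writing the basis vectors as the columns of a matrix, the coefficients of the linear combination representing a given vector follow from a 3 × 3 linear system. The sketch below is not part of the original text; it assumes NumPy is available, and the nonorthogonal basis and target vector are arbitrary choices made only for illustration.

```python
# Illustrative sketch (not from the text): expressing a vector as the linear
# combination c1*a1 + c2*a2 + c3*a3 of a nonorthogonal basis of three-dimensional
# space by solving a 3x3 linear system.
import numpy as np

a1 = np.array([1.0, 0.0, 1.0])
a2 = np.array([0.0, 1.0, 1.0])
a3 = np.array([1.0, 1.0, 0.0])
v  = np.array([2.0, 3.0, 5.0])       # vector to be represented

# Columns of A are the basis vectors, so A @ c = v determines the coefficients.
A = np.column_stack([a1, a2, a3])
c = np.linalg.solve(A, v)

print(c)                                              # [2. 3. 0.]
print(np.allclose(c[0]*a1 + c[1]*a2 + c[2]*a3, v))    # True
```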
  • 101. Section 2.4 Linear Dependence and Independence of Vectors and Triple Products 83 z a3 0 a2 a1 y x FIGURE 2.20 Nonorthogonal triad forming a basis in three-dimensional space. It is a fundamental property of three-dimensional space that a basis for the space comprises a set of three vectors a1 , a2 , and a3 , with the property that the linear combination c1 a1 + c2 a2 + c3 a3 = 0 linear independence and linear dependence (44) is only true when c1 = c2 = c3 = 0. Vectors a1 , a2 , and a3 satisfying this condition are said to be linearly independent vectors, and a vector d of the form d = c1 a1 + c2 a2 + c3 a3 , where not all of c1 , c2 , and c3 are zero, is said to be linearly dependent on the vectors a1 , a2 , and a3 . The vectors i, j, and k that form a basis for three-dimensional space are linearly independent vectors, but the position vector r = 2i − 3j + 5k is linearly dependent on vectors i, j, and k. Clearly, vectors i, j, and k do not form the only basis for three-dimensional space, because any triad of linearly independent vectors a1 , a2 , and a3 will serve equally well, as, for example, the nonorthogonal set of vectors shown in Fig. 2.20. The dot and cross products will now be combined to develop a test for linear dependence and independence based on the elementary geometrical idea of the volume of the parallelepiped shown in Fig. 2.21, three edges a, b, and c of which meet at the origin. z y ^ n V c b θ 0 a FIGURE 2.21 Volume V of a parallelepiped. x
  • 102. 84 Chapter 2 Vectors and Vector Spaces The volume V of a parallelepiped is a nonnegative number given by the product of the area of its base and its height. Suppose vectors a and b are chosen to form two sides of the base of the parallelepiped. Then the vector area of this base has already been interpreted as a × b. The vertical height of the parallelepiped is the ˆ projection of vector c in the direction of the unit vector n normal to the base, and ˆ ˆ so is given by n · c. Consequently, as a × b = a · b sin θ n, it follows that V = |(a × b) · c|. (45) The absolute value of the right-hand side of (45) has been taken because a volume must be a nonnegative quantity, whereas the dot product (a × b) · c may be of either sign. If vectors a, b, and c form a basis for three-dimensional space, vector c cannot be linearly dependent on vectors a and b, and so the parallelepiped in Fig. 2.21 with these vectors as its sides must have a nonzero volume. If, however, vectors a, b, and c are coplanar (all lie in the same plane), and so cannot form a basis for the space, the volume of the parallelepiped will be zero. These simple geometrical observations lead to the following test for the linear independence of three vectors in three-dimensional space. THEOREM 2.3 a test for linear independence scalar triple product Test for linear independence of vectors in three-dimensional space Let a, b, and c be any three vectors. Then the vectors are linearly independent if (a × b)· c = 0, and they are linearly dependent if (a × b) · c = 0. A product of the type (a × b) · c is called a scalar triple product. The name arises because the result is a scalar. It is also called a mixed triple product since both · and × appear. Three vectors are involved in this dot (scalar) product, one of which is the vector a × b and the other is the vector c. Scalar triple products are easily evaluated, because taking the dot product of a × b in the form given in (42) with c = c1 i + c2 j + c3 k gives (a × b) · c = (a2 b3 − a3 b2 )c1 − (a1 b3 − a3 b1 )c2 + (a1 b2 − a2 b1 )c3 . The right-hand side of this expression is simply the value of a determinant with successive rows given by the components of a, b, and c, so we have arrived at the following convenient formula for the scalar triple product. scalar triple product as a determinant Scalar triple product Let a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k, and c = c1 i + c2 j + c3 k. Then a1 (a × b) · c = b1 c1 a2 b2 c2 a3 b3 . c3 (46) Interchanging any two rows in a matrix changes the sign but not the value of its determinant. Two such switches in (46) leave the value unchanged, so the dot
  • 103. Section 2.4 Linear Dependence and Independence of Vectors and Triple Products 85 product is commutative and so we arrive at the useful result (a × b) · c = a · (b × c). (47) So, in a scalar triple product the dot and cross may be interchanged without altering the result. EXAMPLE 2.17 Given the two sets of vectors (a) a = i + 2j − 5k, b = i + j + 2k, c = i + 4j − 19k and (b) a = 2i + j + k, b = 3i + 4k, c = i + j + k, find if the vectors are linearly independent or linearly dependent. Solution We apply Theorem 2.3 to each set, using result (46) to evaluate the scalar triple products. (a) 1 (a × b) · c = 1 1 −5 2 = 0, −19 2 1 4 so the set of three vectors in (a) is linearly dependent. In fact this can be seen from the fact that c = 3a − 2b. (b) 2 (a × b) · c = 3 1 1 0 1 1 4 = −4 = 0, 1 so the set of three vectors in (b) is linearly independent. Although not required, the volume V of the parallelepiped formed by these three vectors is V = |(a × b) · c| = | − 4| = 4. Another notation for the scalar triple product of vectors a, b, and c is [a, b, c], so [a, b, c] = (a × b) · c, (48) or, in terms of a determinant, a1 [a, b, c] = b1 c1 alternative forms of a scalar triple product a2 b2 c2 a3 b3 . c3 (49) Using this definition of [a, b, c] with the row interchange property of determinants (see Section 1.7) shows that [a, b, c] = [b, c, a] = [c, a, b], (50) because two row interchanges are needed to arrive at [b, c, a] from [a, b, c], leaving the sign of the determinant unchanged, whereas two more are required to arrive at [c, a, b] from [b, c, a], again leaving the sign of the determinant unchanged. The order of the vectors in results (46), or in the equivalent notation of (48), is easily remembered when the results are abbreviated to a b b c c a c a b
  • 104. 86 Chapter 2 Vectors and Vector Spaces In this pattern, row two follows from row one when the first letter is moved to the end position, and row three follows from row two by means of the same process. The effect of applying this process to the third row is simply to regenerate the first row. Rearrangements of this kind are called cyclic permutations of the three vectors. Again making use of the row interchange property of determinants (see Section 1.7), it follows that [a, b, c] = −[a, c, b], because this time only one row interchange is needed to produce the result on the right from the one on the left, so that a sign change is involved. A different product involving the three vectors a, b, and c that this time generates another vector is of the form a × (b × c), and products of this type are called vector triple products since the results are vectors. In these products it is essential to include the brackets because, in general, a × (b × c) = (a × b) × c. The most important results concerning vector triple products are given in the following theorem. THEOREM 2.4 Vector triple products (a) vector triple product If a, b, and c are any three vectors, then a × (b × c) = (a · c)b − (a · b)c and (b) (a × b) × c = (a · c)b − (b · c)a. Proof The proof of the results in Theorem 2.4 both follow in similar fashion, so we only prove result (a) and leave the proof of result (b) as an exercise. We write the cross product a × (b × c) in the form of the determinant in (43), with the components of a in the second row and those of b × c (obtained from (42)) in the third row when we find that a × (b × c) = i a1 (b2 c3 − b3 c2 ) j a2 (b3 c1 − b1 c3 ) k a3 . (b1 c2 − b2 c1 ) Expanding this determinant in terms of the elements of its first row and grouping terms gives a × (b × c) = [(a2 c2 + a3 c3 )b1 − (a2 b2 + a3 b3 )c1 ]i + [(a1 c1 + a3 c3 )b2 − (a1 b1 + a3 b3 )c2 ] j + [(a1 c1 + a2 c2 )b3 − (a1 b1 + a2 b2 )c3 ]k. As it stands, this result is not yet in the form that is required, but adding and subtracting a1 b1 c1 to the coefficient of i, a2 b2 c2 to the coefficient of j, and a3 b3 c3 to the coefficient of k followed by grouping terms give a × (b × c) = (a · c)b − (a · b)c, and the result is established.
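As a quick numerical check of these results, the following sketch (not part of the original text, and assuming NumPy is available) evaluates the scalar triple product as the determinant in (46) and verifies the expansion of Theorem 2.4(a); the sample vectors are arbitrary choices for illustration.

```python
# Illustrative sketch (not from the text): numerical check of the scalar triple
# product formula (46) and of the identity a x (b x c) = (a.c)b - (a.b)c of
# Theorem 2.4(a), using arbitrary vectors.
import numpy as np

a = np.array([2.0, -1.0, 3.0])
b = np.array([1.0, 4.0, -2.0])
c = np.array([0.0, 5.0, 1.0])

# Scalar triple product (a x b) . c as the determinant with rows a, b, c.
triple = np.linalg.det(np.vstack([a, b, c]))
print(np.isclose(triple, np.dot(np.cross(a, b), c)))   # True

# Vector triple product identity of Theorem 2.4(a).
lhs = np.cross(a, np.cross(b, c))
rhs = np.dot(a, c) * b - np.dot(a, b) * c
print(np.allclose(lhs, rhs))                           # True
```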
  • 105. Section 2.4 EXAMPLE 2.18 Linear Dependence and Independence of Vectors and Triple Products 87 Find a × (b × c) and (a × b) × c, given that a = 3i + j − 4k, b = 2i + j + 3k, and c = i + 5j − k. Solution a · b = −5, a · c = 12, and b · c = 4, so a × (b × c) = (a · c)b − (a · b)c = 12b + 5c = 29i + 37j + 31k, and (a × b) × c = (a · c)b − (b · c)a = 12b − 4a = 12i + 8j + 52k. Accounts of geometrical vectors can be found, for example, in references [2.1], [2.3], [2.6], and [1.6]. Summary This section introduced the two fundamental concepts of linear dependence and independence of vectors. It then showed how the scalar triple product involving three vectors, that gives rise to a scalar quantity, provides a simple test for the linear dependence or independence of the vectors involved. A simple and convenient way of calculating a scalar triple product was shown to be in terms of a determinant with the elements in its rows formed by the components of the three vectors involved in the product. Finally a vector triple product was defined that gives rise to a vector quantity, and it was shown that to avoid ambiguity it is necessary to bracket a pair of vectors in such a product. A rule for the expansion of a vector triple product was derived and shown to involve a linear combination of two of the vectors multiplied by scalar products so that, for example, a × (b × c) = (a · c)b − (a · b)c. EXERCISES 2.4 In Exercises 1 through 4 use the vectors a, b, and c to find (a) the scalar triple product a · (b × c), and (b) the volume of the parallelepiped determined by these three vectors directed away from a corner. 1. 2. 3. 4. a = 2i − j − 3k, b = 3i − 2k, c = i + j − 4k. a = i − j + 2k, b = i + j + 3k, c = 2i − j + 3k. a = −i − j + k, b = 2i + 2 j + 3k, c = −4i + j + 3k. a = 5i + 3k, b = 2i − j, c = −2i + 3j − 2k. In Exercises 5 through 10 find which sets of vectors are coplanar. 5. 6. 7. 8. 9. 10. i + 3j + 2k, 2i + j + 4k, 4i + 7j + 8k. 2i + j + 4k, i + 2 j + k, 4i + 3j + 6k. 2i + k, i + 4j + 2k, 3i + 12 j + 7k. i + j + k, 2i + j + 2k, 4i + 3j + k. 2i + j − k, 3i + j + 2k, 5i + j + 8k. 2i + j − k, i + 2 j + 2k, 5i + 4j + k. In Exercises 11 through 15 use computer algebra to verify that [a, b, c] = [c, a, b] = −[a, c, b]. 11. a = i + j + k, b = 2i + j − k, c = 3i − j + k. 12. a = i − j − k, b = −5i + 2 j − 3k, c = 2i + 3j − 2k. 13. a = −3i − 4j + k, b = 9i + 12 j − 3k, c = i + 2j + k. 14. a = 3i + 4k, b = i + 5k, c = 2 j + k. 15. Prove that if a, b, c, and d are any four vectors, and λ, μ are arbitrary scalars [λa + μb, c, d] = λ[a, c, d] + μ[b, c, d]. Use computer algebra with vectors a, b, c, d from Exercise 12 with d = 4c − 2 j + 6k, and scalars λ, μ of your choice, to verify this result. In Exercises 16 through 20 find (a) the cartesian equation of the plane containing the given points, and (b) a unit vector normal to the plane. 16. 17. 18. 19. 20. 21. (1, 2, 1), (3, 1, −2), (2, 1, 4). (2, 0, 3), (0, 1, 0), (2, 4, 5). (−1, 2, −3), (2, 4, 1), (3, 0, 1). (1, 2, 5), (−2, 1, 0), (0, 2, 0). Prove result (b) of Theorem 2.4. Show that a × (b × c) + b × (c × a) + c × (a × b) = 0. 22. The law of sines for a triangle with angles A, B, and C opposite sides with the respective lengths a, b, and c
  • 106. 88 Chapter 2 Vectors and Vector Spaces takes the form b c a = = . sin A sin B sin C Prove this by considering a vector triangle with sides a, b, and c, where c = a + b, and taking the cross product of c = a + b first with a, then with b, and finally with c. In Exercises 23 through 26 use the fact that four points with position vectors p, q, r, and s will be coplanar if the vectors p − q, p − r, and p − s are coplanar to find which sets of points all lie in a plane. 23. 24. 25. 26. 27. (1, 1, −1), (−3, 1, 1), (−1, 2, −1), (1, 0, 0). (1, 2, −1), (2, 1, 1), (0, 1, 2), (1, 1, 1). (0, −4, 0), (2, 3, 1), (3, −4, −2), (4, −2, −2). (1, 2, 3), (1, 0, 1), (2, 1, 2), (4, 1, 0). The volume of a tetrahedron is one-third of the product of the area of its base and its vertical height. Show the volume V of the tetrahedron in Fig. 2.22, in which three edges formed by the vectors a, b, and c are directed away from a vertex, is given by V = (1/6)|a · (b × c)| 28. Let a, b, c, and d be vectors and λ, μ, ν be scalars satisfying the equation λ(b × c) + μ(c × a) + ν(a × b) + d = 0. Show that if a, b, and c are linearly independent, then λ = −(a · d)/[a · (b × c)], ν = −(c · d)/[a · (b × c)]. 2.5 μ = −(b · d)/[a · (b × c)], a b c FIGURE 2.22 Tetrahedron. 29. Let a, b, c, and d be vectors and λ, μ, ν be scalars satisfying the equation λa + μb + νc + d = 0. By taking the scalar products of this equation first with b × c, then with a × c, and finally with a × b, show that if a, b, and c are linearly independent, then λ = −d · (b × c)/[a · (b × c)], μ = −d · (c × a)/[a · (b × c)], ν = −d · (a × b)/[a · (b × c)]. 30. Show that a = i + 2 j + k, b = 2i − j − k, and c = 4i + 3j + ik are linearly independent vectors, and use them with a vector d of your choice to verify the results of Exercises 28 and 29. 31. Prove the Lagrange identity (a × b) · (c × d) = (a · c)(b · d) − (a · d)(b · c). n-Vectors and the Vector Space R n n-tuples There are many occasions when it is convenient to generalize a vector and its associated algebra to spaces of more than three dimensions. A typical situation occurs in mechanics, where it is sometimes necessary to consider both the position and the momentum of a particle as functions of time. This leads to the study of a 6-vector, three components of which specify the particle position and three its momentum vector at a time t. Sets of n numbers (x1 , x2 , . . . , xn ) in a given order, that can be thought of either as n-vectors or as the coordinates of a point in n-dimensional space are called ordered n-tuples of real numbers or, simply, n-tuples.
  • 107. n-Vectors and the Vector Space R n Section 2.5 n-vector 89 An n-vector If n ≥ 2 is an integer, and x1 , x2 , . . . , xn are real numbers, an n-vector is an ordered n-tuple (x1 , x2 , . . . , xn ). components and dimension norm in R n The numbers x1 , x2 , . . . , xn are called the components of the n-vector, xi is the ith component of the vector, and n is called the dimension of the space to which the n-vector belongs. For any given n, the set of all vectors with n real components is called a real n-space or, simply, an n-space, and it is denoted by the symbol Rn . A corresponding space exists when the n numbers x1 , x2 , . . . , xn are allowed to be complex numbers, leading to a complex n-space denoted by C n . In this notation R3 is the three-dimensional space used in previous sections. In R3 the length of a vector was taken as the definition of its norm, so if 2 2 2 r = x1 i + x2 j + x3 k, then r = (x1 + x2 + x3 )1/2 . A generalization of this norm to R n leads to the following definition. The norm in Rn The norm of the n-vector (x1 , x2 , . . . , xn ), denoted by (x1 , x2 , . . . , xn ) is (x1 , x2 , . . . , xn ) = 2 2 2 x1 + x2 + · · · + xn 1/2 n = xi2 (51) . i=1 The laws for the equality, addition, and scaling of vectors in R3 in terms of the components of the vector generalize to R n as follows. Equality of n-vectors algebraic rules for equality, addition, and scaling using components Let (x1 , x2 , . . . , xn ) and (y1 , y2 , . . . , yn ) be two n-vectors. Then the vectors will be equal, written (x1 , x2 , . . . , xn ) = (y1 , y2 , . . . , yn ), if, and only if, corresponding components are equal, so that x1 = y1 , x2 = y2 , . . . , xn = yn . (52) Addition of n-vectors Let (x1 , x2 , . . . , xn ) and (y1 , y2 , . . . , yn ) be any two n-vectors. Then the sum of these vectors, written (x1 , x2 , . . . , xn ) + (y1 , y2 , . . . , yn ), is defined as the vector whose ith component is the sum of the corresponding ith components of the vectors for i = 1, 2, . . . , n, so that (x1 , x2 , . . . , xn ) + (y1 , y2 , . . . , yn ) = (x1 + y1 , x2 + y2 , . . . , xn + yn ). (53)
  • 108. 90 Chapter 2 Vectors and Vector Spaces Scaling an n-vector Let (x1 , x2 , . . . , xn ) be an arbitrary n-vector and λ be any scalar. Then the result of scaling the vector by λ, written λ(x1 , x2 , . . . , xn ), is defined as the vector whose ith component is λ times the ith component of the original vector, for i = 1, 2, . . . , n, so that λ(x1 , x2 , . . . , xn ) = (λx1 , λx2 , . . . , λxn ). (54) The null (zero) vector in Rn is the vector 0 in which every component is zero, so that 0 = (0, 0, . . . , 0). (55) As with vectors in R3 , so also with n-vectors in Rn , it is convenient to use a single boldface symbol for a vector and the corresponding italic symbols with suffixes when it is necessary to specify the components. So we will write x = (x1 , x2 , . . . , xn ) and y = (y1 , y2 , . . . , yn ). The reasoning that led to the interpretation of Theorem 2.1 on the algebraic rules for the addition and scaling of vectors in R3 leads also the following theorem for n-vectors. THEOREM 2.5 Algebraic rules for the addition and scaling of n-vectors in R n Let x, y, and z be arbitrary n-vectors, and let λ and μ be arbitrary real numbers. Then: (i) (ii) (iii) (iv) (v) (vi) (vii) x + y = y + x; x + 0 = 0 + x = x; (x + y) + z = x + (y + z); λ(x + y) = λx + λy; (λμ)x = λ(μx) = μ(λx); (λ + μ)x = λx + μx; λx = |λ| x . Because of this similarity between vectors in R3 and in Rn , the space Rn is called a real vector space, though because the symbol R indicates real numbers this is usually abbreviated a vector space. Analogously, when the elements of the n-vectors are allowed to be complex, the resulting space is called the complex vector space C n . So far there would seem to be little difference between vectors in R3 and Rn , but major differences do exist, and they are best appreciated when geometrical analogies are sought for vector operations in Rn . dot product of n-vectors The dot product of n-vectors Let x = (x1 , x2 , . . . , xn ) and y = (y1 , y2 , . . . , yn ) be any two n-vectors. Then the dot product of these two vectors, written x · y and also called their inner
  • 109. Section 2.5 n-Vectors and the Vector Space R n 91 product, is defined as the sum of the products of corresponding components, so that (x1 , x2 , . . . , xn ) · (y1 , y2 , . . . , yn ) = x1 y1 + x2 y2 + . . . + xn yn . (56) The following properties of this dot product are strictly analogous to those of the dot product in R3 and can be deduced directly from (56). THEOREM 2.6 Properties of the dot product in R n Let x, y, and z be any three n-vectors and λ be any scalar. Then: (i) (ii) (iii) (iv) (v) (vi) x · y = y · x; x · (y + z) = x · y + x · z; (λx) · y = x · (λy) = λ(x · y); x · x = x 2; x · 0 = 0; x 2 = 0 if, and only if, x = 0. The existence of a dot product in Rn allows the Cauchy–Schwarz and triangle inequalities to be generalized, both of which play a fundamental role in the study of vector spaces. Various forms of proof of these inequalities are possible, but the one given here has been chosen because it makes full use of the properties of the dot product listed in Theorem 2.6. THEOREM 2.7 The Cauchy–Schwarz and triangle inequalities Let x = (x1 , x2 , . . . , xn ) and y = (y1 , y2 , . . . , yn ) be any two n-vectors. Then generalized inequalities for n-vectors (a) |x · y| ≤ x · y (Cauchy–Schwarz inequality), and (b) x+y ≤ x + y (triangle inequality). Proof We start by proving the Cauchy–Schwarz inequality in (a). The inequality is certainly true if x · y = 0, so we need only consider the case x · y = 0. Let x and y be any two n-vectors, and λ be a scalar. Then, using properties (ii) to (iv) of Theorem 2.6, x + λy 2 = (x + λy) · (x + λy), = x 2 + λx · y + λy · x + λ2 y 2 . However, by result (1) of Theorem 2.6, y · x = x · y, so x + λy 2 = x 2 + 2λx · y + λ2 y 2 . We now set λ = − x 2 /(x · y) to obtain x + λy 2 =− x 2 +( x 4 y 2 )/|x · y|2 , where we have used the fact that (x · y)2 = |x · y|2 . As x + λy result is equivalent to − x 2 +( x 4 · y 2 )/|x · y|2 ≥ 0. 2 is nonnegative, this
  • 110. 92 Chapter 2 Vectors and Vector Spaces Cancelling the nonnegative number x 2 , which leaves the inequality sign unchanged; rearranging the terms; and taking the square root of the remaining nonnegative result on each side of the inequality yields the Cauchy–Schwarz inequality |x · y| ≤ x · y . To prove the triangle inequality (b) we set λ = 1 and start from the result x+y 2 = x 2 + 2x · y + y 2 . As x · y may be either positive or negative, x · y ≤ |x · y|, so making use of the Cauchy–Schwarz inequality shows that x+y 2 ≤ x 2 +2 x · y + y 2 = ( x + y )2 . The triangle inequality follows from taking the square root of each side of this inequality, which is permitted because both are nonnegative numbers. The dot product in R3 allowed the angle between vectors to be determined and, more importantly, it provided a test for the orthogonality of vectors. These same geometrical ideas can be introduced into the vector space Rn if the Cauchy–Schwarz inequality is written in the form − x · y ≤x·y≤ x · y . After division by the nonnegative number x · y , this becomes −1 ≤ x·y ≤ 1. x · y This enables the angle θ between the two n-vectors x and y to be defined by the result x·y . x · y cos θ = orthogonality of n-vectors unit n-vector On account of this result, two n-vectors x and y in Rn will be said to be orthogonal when x · y = 0. By analogy with R3 we will call x = (x1 , x2 , . . . , xn ) a unit n-vector if x = 1. If we define the unit n-vectors e1 , e2 , . . . , en as e1 = (1, 0, 0, 0, . . . , 0), e2 = (0, 1, 0, 0, . . . , 0), . . . , en = (0, 0, 0, 0, . . . , 1), we see that ei · e j = 1 0 for i = j for i = j, showing that the ei are mutually orthogonal unit n-vectors in Rn . As a result of this the vectors e1 , e2 , . . . , en play the same role in Rn as the vectors i, j, and k in R3 . This allows the vector x = (x1 , x2 , . . . , xn ) to be written as x = x1 e1 + x2 e2 + · · · + xn en , where xi is the ith component of x.
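The definitions just given translate directly into a few lines of code. The sketch below is not part of the original text and assumes NumPy is available; it computes the norm (51), the dot product (56), and the angle defined above for the pair of 5-vectors that appears in Exercise 1 at the end of this section, and confirms the Cauchy–Schwarz and triangle inequalities of Theorem 2.7.

```python
# Illustrative sketch (not from the text): norm, dot product, angle, and the
# inequalities of Theorem 2.7 for a pair of 5-vectors.
import numpy as np

x = np.array([2.0, 1.0, 0.0, 2.0, 2.0])
y = np.array([1.0, -1.0, 2.0, 2.0, 4.0])

norm_x = np.sqrt(np.sum(x**2))          # the norm (51); same as np.linalg.norm(x)
dot_xy = np.sum(x * y)                  # the dot product (56); same as np.dot(x, y)

# Angle between the two n-vectors, as defined from the Cauchy-Schwarz inequality.
theta = np.arccos(dot_xy / (np.linalg.norm(x) * np.linalg.norm(y)))
print(np.degrees(theta))                # 45.0

print(abs(dot_xy) <= np.linalg.norm(x) * np.linalg.norm(y))            # Cauchy-Schwarz: True
print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))  # triangle inequality: True
```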
  • 111. Section 2.5 n-Vectors and the Vector Space R n 93 Now suppose that for n > 3, we set u1 = (1, 0, 0, 0, . . . , 0), subspaces u2 = (0, 1, 0, 0, . . . , 0), u3 = (0, 0, 1, 0, . . . , 0), and all other ui identically zero, so that ui = (0, 0, 0, 0, . . . , 0) for i = 4, 5, . . . , n. Then it is not difficult to see that u1 , u2 , and u3 behave like the unit vectors i, j, and k, so that, in some sense the vector space R3 is embedded in the vector space Rn with vectors in both spaces obeying the same algebraic rules for addition and scaling. This is recognized by saying that R3 is a subspace of Rn . Subspace of Rn A subset S of vectors in the vector space Rn is called a subspace of Rn if S is itself a vector space that obeys the rules for the addition and the scaling of vectors in Rn . EXAMPLE 2.19 Find the condition that the set S of vectors of the form (x, mx + c, 0), for any m and all real x forms a subspace of the vector space R3 , and give a geometrical interpretation of the result. Solution The set S can only contain the null vector (0, 0, 0) if c = 0, so if c = 0 the vectors in S cannot form a subspace of R3 . Now let c = 0, so that S contains the null vector. The vector addition law holds, because if (x, mx, 0) and (x , mx , 0) are vectors in S, the sum (x, mx, 0) + (x , mx , 0) = (x + x , m(x + x ), 0) is also a vector in S. The scaling λ(x, mx, 0) = (λx, mλx, 0) also generates a vector in S, so the scaling law for vectors also holds, showing that S is a subspace of R3 provided c = 0. If the three components of vectors in S are regarded as the x-, y-, and z-components of a vector in R3 , the vectors can be interpreted as points on the straight line y = mx passing through the origin and lying in the plane z = 0. This subspace is a one-dimensional vector space embedded in the three-dimensional vector space R3 . EXAMPLE 2.20 Test the following subsets of Rn to determine if they form a subspace. (a) S is the set of vectors (x1 , x1 + 1, . . . , xn ) with all the xi real numbers. (b) S is the set of vectors (x1 , x2 , . . . , xn ) with x1 + x2 + · · · + xn = 0 and all the xi are real numbers. Solution (a) The set S does not contain the null vector and so cannot form a subspace of Rn . This result is sufficient to show that S is not a subspace, but to see what properties of a subspace the set S possesses we consider both the summation and scaling of vectors in S. If (x1 , x1 + 1, . . . , xn ) and (x1 , x1 + 1, . . . , xn ) are two vectors in S, their sum (x1 , x1 + 1, . . . , xn ) + (x1 , x1 + 1, . . . , xn ) = (x1 + x1 , x1 + x1 + 2, . . . , xn + xn ) is not a vector in S, so the summation law is not satisfied.
  • 112. 94 Chapter 2 Vectors and Vector Spaces The scaling condition for vectors is not satisfied, because if λ is an arbitrary scalar, λ(x1 , x1 + 1, . . . , xn ) = (λx1 , λx1 + λ, . . . , λxn ) = (a, a + 1, . . .), (λn1 = a) showing that scaling generates another a vector in S. We have proved that the vectors in S do not form a subspace of Rn . (b) The set S does contain the null vector, because x1 = x2 = · · · = xn = 0 satisfies the constraint condition x1 + x2 + · · · + xn = 0. Both the summation law and the scaling law for vectors are easily seen to be satisfied, so this set S does form a subspace of Rn . EXAMPLE 2.21 Let C(a, b) be the space of all real functions of a single real variable x that are continuous for a < x < b, and let S(a, b) be the set of all functions belonging to C(a, b) that have a derivative at every point of the interval a < x < b. Show that S(a, b) forms a subspace of C(a, b). Solution In this case a vector in the space is simply any real function of a single real variable x that is continuous in the interval a < x < b. The null vector corresponds to the continuous function that is identically zero in the stated interval, so as the derivative of this function is also zero, it follows that the set S(a, b) must also contain the null vector. The sum of continuous functions in a < x < b is a continuous function, and the sum of differentiable functions in this same interval is a differentiable function, so the summation law for vectors is satisfied. Similarly, scaling continuous functions and differentiable functions does not affect either their continuity or their differentiability, so the scaling law for vectors is also satisfied. Thus, S(a, b) forms a subspace of C(a, b). Think of the dimension of these spores as infinite; norm and inner product are easy to define. Summary This section generalized the concept of a three-dimensional vector to a vector with n components in R n . It was shown that the magnitude of a vector in three space dimensions generalizes to the norm of a vector in R n and that in terms of components, the equality, addition, and scaling of vectors in R n follow the same pattern as with three space dimensions. The dot product was generalized and two fundamental inequalities for vectors in R n were derived. The concept of orthogonality of vectors was generalized and the notion of a subspace of R n was introduced. EXERCISES 2.5 In Exercises 1 through 8 find the sum of the given pairs of vectors, their norms, and their dot product. 1. 2. 3. 4. 5. 6. 7. 8. (2, 1, 0, 2, 2), (1, −1, 2, 2, 4). (3, −1, −1, 2, −4), (1, 2, 0, 0, 3). (2, 1, −1, 2, 1), (−2, −1, 1, −2, −1). (3, −2, 1, 1, 2, 0, 1), (1, −1, 1, −1, 1, 0, 1). (3, 0, 1, 0), (0, 2, 0, 4). (1, −1, 2, 2, 0, 1), (2, −2, 1, 1, 1, 0). (−1, 2, −4, 0, 1), (2, −1, 1, 0, 2). (3, 1, 2, 4, 1, 1, 1), (1, 2, 3, −1, −2, 1, 3). In Exercises 9 through 12 find the angle between the given pairs of n-vectors and the unit n-vector associated with each vector. 9. 10. 11. 12. (3, 1, 2, 1), (1, −1, 2, 2). (4, 1, 0, 2), (2, −1, 2, 1). (2, −2, −2, 4), (1, −1, −1, 2). (2, 1, −1, 1), (1, −2, 2, 2). In Exercises 13 through 18 determine if the set of vectors S forms a subspace of the given vector space. Give reasons why S either is or is not a subspace.
  • 113. Section 2.6 13. S is the set of vectors of the form (x1 , x2 , . . . , xn ) in Rn , 4 with the xi real numbers and x2 = x1 . 14. S is the set of vectors of the form (x1 , x2 , . . . , xn ) in Rn , with the xi real numbers and x1 + 2x2 + 3x3 + · · · + nxn = 0. 15. S is the set of vectors of the form (x1 , x2 , . . . , xn ) in Rn , with the xi real numbers and x1 + x2 + x3 + · · · + xn = 2. 16. S is the set of vectors of the form (x1 , x2 , . . . , x6 ) in R 6 , with the xi real numbers and x1 = 0 or x6 = 0. 17. S is the set of vectors of the form (x1 , x2 , . . . , x6 ) in R 6 , with the xi real numbers and x1 − x2 + x3 · · · + x6 = 0. 18. S is the set of vectors of the form (x1 , x2 , . . . , x5 ) in R5 , with the xi real numbers and x2 < x3 . In Exercises 19 to 23 determine if the given set S is a subspace of the space C[0, 1] of all real valued functions that are continuous on the interval 0 ≤ x ≤ 1. Give reasons why either S is a subspace, or it is not. 19. S is the set of all polynomials of degree two. 20. S is the set of all polynomial functions. 21. S is the set of all continuous functions such that f (0) = f (1) = 0. 22. S is the set of all continuous functions such that f (0) = 0 and f (1) = 2. 23. S is the set of all continuous once differentiable functions such that f (0) = 0 and f (x) > 0. 24. Prove that the set S of all vectors lying in any plane in R3 that passes through the origin forms a subspace of R3 . 25. Explain why the set S of all vectors lying in any plane in R3 that does not pass through the origin does not form a subspace of R3 . 2.6 Linear Independence, Basis, and Dimension 95 26. Consider the polynomial P(λ) defined as P(λ) = x + λy 2 , where x and y are vectors in Rn . Show, provided not both x and y are null vectors, that the graph of P(λ) as a function of λ is nonnegative, so P(λ) = 0 cannot have real roots. Use this result to prove the Cauchy–Schwarz inequality |x · y| ≤ x · y . 27. Let x and y be vectors in Rn and λ be a scalar. Prove that x + λy 2 + x − λy 2 = 2( x + λ2 y 2 ). 2 28. If x and y are orthogonal vectors in Rn , prove that the Pythagoras theorem takes the form x+y 2 = x 2 + y 2. 29. What conditions on the components of vectors x and y in the Cauchy–Schwarz inequality cause it to become an equality, so that n 1/2 n xi yi = 1/2 n xi2 i=1 yi2 i=1 ? i=1 30. Modify the method of proof used in Theorem 2.7 to prove the complex form of the Cauchy–Schwarz inequality n 1/2 n |xi |2 xi yi ≤ i=1 i=1 1/2 n + |yi |2 , i=1 where the xi and yi are complex numbers. Linear Independence, Basis, and Dimension The concept of the linear independence of a set of vectors in R3 introduced in Section 2.4 generalizes to Rn and involves a linear combination of n-vectors. Linear combination of n-vectors Let x1 , x2 , . . . , xm be a set of n-vectors in Rn . Then a linear combination of the n-vectors is a sum of the form c1 x1 + c2 x2 + · · · + cmxm, where c1 , c2 , . . . , cm are nonzero scalars.
  • 114. 96 Chapter 2 Vectors and Vector Spaces An example of a linear combination of vectors in R5 is provided by the vector sum (m = 3, n = 5) 2x1 + x2 + 3x3 , where x1 = (1, 2, 3, 0, 4), x2 = (2, 1, 4, 1, −3), and x3 = (6, 0, 2, 2, −1). The vector in R5 formed by this linear combination is 2x1 + x2 + 3x3 = 2(1, 2, 3, 0, 4) + (2, 1, 4, 1, −3) + 3(6, 0, 2, 2, −1), = (22, 5, 16, 7, 2). A linear combination of n-vectors is the most general way of combining nvectors, and the definition of a linear combination of vectors contains within it the definition of the scaling of a single n-vector as a special case. This can be seen by setting m = 1, because this reduces the linear combination to the single scaled n-vector c1 x1 . linear dependence and independence of n-vectors Linear dependence of n-vectors Let x1 , x2 , . . . , xm be a set of n-vectors in Rn . Then the set is said to be linearly dependent if, and only if, one of the n-vectors can be expressed as a linear combination of the remaining n-vectors. An example of linear dependence in R 4 is provided by the vectors x1 = (1, 0, 2, 5), x2 = (2, 1, 2, 1), x3 = (3, 2, 1, 0), and x4 = (−1, −1, −1, 7), because x4 = 2x1 − 3x2 + x3 . Linear independence of n-vectors Let x1 , x2 , . . . , xm be a set of n-vectors in Rn . Then the set is said to be linearly independent if, and only if, the n-vectors are not linearly dependent. A simple example of a set of linearly independent vectors in R 4 is provided by the vectors e1 = (1, 0, 0, 0), e2 = (0, 1, 0, 0), and e3 = (0, 0, 1, 0). The linear independence of these 4-vectors can be seen from the fact that for no choice of c1 and c2 can the vector c1 e1 + c2 e2 be made equal to e3 . To make effective use of the concept of linear independence, and to understand the notion of the basis and dimension of a vector space, it is necessary to have a test for linear independence. Such a test is provided by the following theorem. THEOREM 2.8 Linear dependence and independence Let S be a set of non-zero n-vectors x1 , x2 , . . . , xm, with m ≥ 2. Then: (a) Set S is linearly dependent if the vector equation c1 x1 + c2 x2 + · · · + cmxm = 0 is true for some set of scalars (constants) c1 , c2 , . . . , cm that are not all zero;
  • 115. Section 2.6 Linear Independence, Basis, and Dimension 97 (b) Set S is linearly independent if the vector equation c1 x1 + c2 x2 + · · · + cmxm = 0 is only true when c1 = c2 = · · · = cm = 0. Proof To establish result (a) it is necessary to show that the conditions of the definition of linear dependence are satisfied. First, if the set S of n-vectors is linearly dependent, scalars d1 , d2 , . . . , dm exist such that d1 x1 + d2 x2 + · · · + dmxm = 0. There is no loss of generality in assuming that d1 = 0, because if this is not the case a renumbering of the vectors can always make this possible. Consequently, x1 = (−d2 /d1 )x2 + (−d3 /d1 )x3 + · · · + (−dm/d1 )xm, which shows, as claimed, that the set S is linearly dependent, because x1 is linearly dependent on x2 , x3 , . . . , xm. A similar argument applies to show that xr is linearly dependent on the remaining n-vectors in S provided dr = 0, for r = 2, 3, . . . , m. Conversely, if one of the n-vectors in set S, say x1 , is linearly dependent on the remaining n-vectors in the set, scalars d1 , d2 , . . . , dm can be found such that x1 = d2 x2 + · · · + dmxm, so that x1 − d2 x2 − · · · − dmxm = 0. This result is of the form given in definition of linear dependence with c1 = 1, c2 = −d2 , . . . , cm = −dm, not all of which constants are zero, so again the set of n-vectors in S is seen to be linearly dependent. To establish result (b), suppose, if possible, that the set S of vectors is linearly independent, but that some scalars d1 , d2 , . . . , dm that are not all zero can be found such that d1 x1 + d2 x2 + · · · + dmxm = 0. Then if d1 = 0, say, is one of these scalars, it follows that x1 = (−d2 /d1 )x2 + (−d3 /d1 )x3 + · · · + (−dm/d1 )xm, which is impossible because this shows that, contrary to the hypothesis, x1 is linearly dependent on the remaining n-vectors in S. So we must have c1 = c2 = · · · = cm = 0. A systematic and efficient computational method for the application of Theorem 2.8 to vectors in Rn will be developed in the next chapter for the three separate cases that arise, (a) m < n, (b) m = n, and (c) m > n. However, when n and m are small, a straightforward approach is possible, as illustrated in the next example. EXAMPLE 2.22 Test the following sets of vectors in R4 for linear dependence or independence. (a) x1 = (2, 1, 1, 0), x2 = (0, 2, 0, 1), x3 = (1, 1, 0, 2), x4 = (0, 2, 1, 1). (b) x1 = (4, 0, 2), x2 = (2, 2, 0), x3 = (1, 1, 0), x4 = (5, 1, 2).
  • 116. 98 Chapter 2 Vectors and Vector Spaces
Solution In both (a) and (b) it is necessary to consider the vector equation c1 x1 + · · · + cm xm = 0. If the equation is only satisfied when c1 = c2 = · · · = cm = 0, the set of vectors will be linearly independent, whereas if a solution can be found in which not all of the constants c1, c2, c3, c4 vanish, the set of vectors will be linearly dependent.
(a) Substituting for x1, x2, x3, x4 in the preceding equation and equating corresponding components shows that the coefficients ci must satisfy the following equations:
2c1 + c3 = 0
c1 + 2c2 + c3 + 2c4 = 0
c1 + c4 = 0
c2 + 2c3 + c4 = 0.
The third equation shows that c4 = −c1, so the equations can be rewritten as
2c1 + c3 = 0
−c1 + 2c2 + c3 = 0
−c1 + c2 + 2c3 = 0.
Subtracting the third of these equations from the second shows that c2 = c3, while adding twice the third equation to the first gives 2c2 + 5c3 = 0, and hence 7c3 = 0. Thus c3 = 0, so the first equation gives c1 = 0, and it then follows that c2 = c4 = 0. This has established the linear independence of the set of vectors in (a).
(b) Proceeding in the same manner with the set of vectors in (b) leads to the following equations for the coefficients ci:
4c1 + 2c2 + c3 + 5c4 = 0
2c2 + c3 + c4 = 0
2c1 + 2c4 = 0.
The third equation shows that c4 = −c1, so using this result in the first two equations reduces the first one to −c1 + 2c2 + c3 = 0 and the second to −c1 + 2c2 + c3 = 0. There is only one equation connecting c1, c2, and c3, and hence also c4. This means that if c2 and c3 are given arbitrary values, not both of which are zero, the constants c1 and c4 will be determined in terms of them. Thus, a set of constants c1, c2, c3, c4 that are not all zero can be found that satisfies the vector equation, showing that the set of vectors in (b) is linearly dependent. This set of constants is not unique, but that does not affect the conclusion that the set of vectors is linearly dependent, because to establish linear dependence it is sufficient that at least one such set of constants can be found.
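The conclusions of Example 2.22 can also be confirmed numerically: a set of m vectors is linearly independent precisely when the matrix having the vectors as its columns has rank m. The sketch below is not part of the original text; it uses NumPy's rank computation (an idea treated systematically in the next chapter) purely as an illustrative check.

```python
# Illustrative sketch (not from the text): checking the conclusions of
# Example 2.22. The vectors are linearly independent exactly when the matrix
# whose columns are the vectors has rank equal to the number of vectors.
import numpy as np

# Set (a): four vectors in R^4.
A = np.column_stack([[2, 1, 1, 0], [0, 2, 0, 1], [1, 1, 0, 2], [0, 2, 1, 1]])
print(np.linalg.matrix_rank(A))   # 4 -> linearly independent

# Set (b): four vectors in R^3, so at most three of them can be independent.
B = np.column_stack([[4, 0, 2], [2, 2, 0], [1, 1, 0], [5, 1, 2]])
print(np.linalg.matrix_rank(B))   # 2 -> linearly dependent (rank < 4)
```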
  • 117. Section 2.6 Linear Independence, Basis, and Dimension 99
Example 2.22 has shown one way in which Theorem 2.8 can be implemented for vectors in Rn, but it also illustrates the need for a systematic approach to the solution of the system of equations for the coefficients when n is large.
A trivial case of Theorem 2.8 arises when the set of vectors S contains the null vector 0, because then the set of vectors in S is always linearly dependent. This can be seen by assuming that x1 = 0, because then the vector equation in the theorem becomes
c1 0 + c2 x2 + · · · + cm xm = 0.
This vector equation is satisfied if c1 ≠ 0 (arbitrary) and c2 = c3 = · · · = cm = 0, so, as not all of the coefficients are zero, the set of vectors must be linearly dependent.
We conclude this introduction to the vector space Rn by defining the span, a basis, and the dimension of a vector space.
span of a vector space
Span of a vector space
Let the set of non-zero vectors x1, x2, . . . , xm belonging to a vector space V have the property that every vector in V can be expressed as a linear combination of these vectors. Then the vectors x1, x2, . . . , xm are said to span the vector space V.
EXAMPLE 2.23 All vectors v in the (x, y)-plane are spanned by the vectors i and j, because any vector v = (v1, v2) can always be written v = v1 i + v2 j. This is an example of vectors spanning the space R2.
EXAMPLE 2.24 The vector space Rn is spanned by the unit n-vectors e1 = (1, 0, 0, 0, . . . , 0), e2 = (0, 1, 0, 0, . . . , 0), . . . , en = (0, 0, 0, 0, . . . , 1).
EXAMPLE 2.25 The subspace R3 of the vector space R5 is spanned by the unit vectors e1 = (1, 0, 0, 0, 0), e2 = (0, 1, 0, 0, 0), e3 = (0, 0, 1, 0, 0), because all vectors v = (v1, v2, v3) in R3 can be written in the form of the linear combination v = v1 e1 + v2 e2 + v3 e3.
basis of a vector space in R n
Basis of a vector space
Let x1, x2, . . . , xn be vectors in Rn. Then the vectors are said to form a basis for the vector space Rn if:
(i) The vectors x1, x2, . . . , xn are linearly independent.
(ii) Every vector in Rn can be expressed as a linear combination of the vectors x1, x2, . . . , xn.
dimension of a vector space
Dimension of a vector space
The dimension of a vector space is the number of vectors in its basis.
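Because a basis for Rn must consist of exactly n linearly independent vectors, whether a given set of n vectors forms a basis can be tested numerically by checking that the n × n matrix with the vectors as its columns is nonsingular. The following sketch is not part of the original text and assumes NumPy; the helper name is_basis is a hypothetical choice made only for illustration.

```python
# Illustrative sketch (not from the text): n vectors form a basis for R^n
# exactly when the n x n matrix with the vectors as its columns is nonsingular.
import numpy as np

def is_basis(*vectors):
    """Return True if the given n-vectors form a basis for R^n, n = len(vectors)."""
    M = np.column_stack(vectors)
    return M.shape[0] == M.shape[1] and not np.isclose(np.linalg.det(M), 0.0)

# The standard basis e1, ..., e4 of R^4.
print(is_basis([1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]))  # True

# Replacing e4 by e1 + e2 destroys linear independence, so no basis.
print(is_basis([1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [1, 1, 0, 0]))  # False
```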
  • 118. 100 Chapter 2 Vectors and Vector Spaces EXAMPLE 2.26 A basis for the space of ordinary vectors in three dimensions is provided by the vectors i, j, and k, so the dimension of the space is 3. EXAMPLE 2.27 A basis for Rn is provided by the n vectors e1 = (1, 0, 0, 0, . . . , 0), e2 = (0, 1, 0, 0, . . . , 0), . . . , en = (0, 0, 0, 0, . . . , 1), so its dimension is n. EXAMPLE 2.28 It was shown in Example 2.20 (b) that the set S of vectors (x1 , x2 , . . . , xn ) with x1 + x2 + · · · + xn = 0 forms a subspace of Rn . The dimension of Rn is n, but the constraint condition x1 + x2 + · · · + xn = 0 implies that only n − 1 of the components x1 , x2 , . . . , xn can be specified independently, because the constraint itself determines the value of the remaining component. This in turn implies that the basis for the subspace S can only contain n − 1 linearly independent vectors, so S must have dimension n − 1. More information on linear vector spaces can be found in references [2.1] and [2.5] to [2.12]. Summary In this section the concepts of linear dependence and independence were generalized to vectors in R n , and the span of a vector space was defined as a set of vectors in R n with the property that every vector in R n can be expressed as a linear combination of these vectors. Naturally in R n , as in R 3 , a set of vectors spanning the space is not unique. The smallest set of n vectors spanning a vector space is said to form a basis for the vector space, and the dimension of a vector space is the number of vectors in its basis. This corresponds to the fact that the unit vectors i, j, and k form a basis for the ordinary three-dimensional space R 3 , because every vector in this space can be represented as a linear combination of i, j, and k. EXERCISES 2.6 In Exercises 1 through 12 determine if the set of m vectors in three-dimensional space is linearly independent by solving for the scalars c1 , c2 . . . cm in Theorem 2.8. Where appropriate, verify the result by using Theorem 2.3. a = i + 2 j + k, b = i − j + k, c = 2i + k. a = 3i − j + k, b = i + 3k, c = 5i − j + 7k. a = 2i − j + k, b = 3i + j − k, c = 8i + j + 7k. a = 3i + 2k, b = i + j + 2k, c = 11i + 2 j − 2k. a = 4i − j + 3k, b = i + 4j − 2k, c = 3i − j − k. a = i + j − k, b = i − j + k, c = −i + j + k. a = i + 2 j + k, b = i + 3j − k, c = 3i + 10j − 5k. a = 2i + 3j + k, b = i − 3j + 2k, c = i + 15j − 4k. a = 3i − j + 2k, b = i + j + k (m = 2). a = i + j + k, b = i + 2 j + k, c = i + 3j + k, d = i − 4j + k (m = 4). 11. a = i − j + 3k, b = 2i − j + 2k, c = i + k, d = 3i + j + k (m = 4). 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 12. a = i + j, b = j + k, c = i − k. In Exercises 13 through 16, determine if the set of vectors in R 4 is linearly independent by using the method of Example 2.22. 13. 14. 15. 16. (1, 3, −1, 0), (1, 2, 0, 1), (0, 1, 0, −1), (1, 1, 0, 1). (1, −2, 1, 2), (4, −1, 0, 2), (2, 1, −1, 1), (1, 0, 0, −1). (2, 1, 0, 1), (1, 0, 1, 1), (4, 1, 2, −1), (1, 0, 1, −1). (1, 2, 1, 1), (1, −2, 0, −1), (1, 1, 1, 2), (1, −1, 0, 0). In Exercises 17 through 20, find a basis and the dimension of the given subspace S. 17. The subspace S of vectors in R5 of the form (x1 , x2 , x3 , x4 , x5 ) with x1 = x2 . 18. The subspace S of vectors in R 4 of the form (x1 , x2 , x3 , x4 ) with x1 = 2x2 . 19. The subspace S of vectors in R5 of the form (x1 , x2 , x3 , x4 , x5 ) with x1 = x2 = 2x3 .
  • 119. Section 2.7 Gram–Schmidt Orthogonalization Process 101 20. The subspace S of vectors in R 6 of the form (x1 , x2 , x3 , x4 , x5 , x6 ) with x1 = 2x2 and x3 = −x4 . 21. Let u = cos2 x and v = sin2 x form a basis for a vector space V. Find which of the following can be represented in terms of u and v, and so lie in V. (a) 2. (b) sin 2x. (c) 0. (d) cos 2x. (e) 2 + 3x. (f) 3 − 4 cos 2x. 22. Given that r ≤ n, prove that any subset S of r vectors selected from a set of n linearly independent vectors is linearly independent. 2.7 Gram–Schmidt Orthogonalization Process A set of vectors forming a basis for a vector space is not unique, and having obtained a basis by some means, it is often useful to replace it by an equivalent set of orthogonal vectors. The Gram–Schmidt orthogonalization process accomplishes this by means of a sequence of simple steps that have a convenient geometrical interpretation. We now develop the Gram–Schmidt orthogonalization process for geometrical vectors in R3 , though in Section 4.2 the method will be extended to vectors in Rn to enable orthogonal matrices to be constructed from a set of eigenvectors associated with a symmetric matrix. Let us now show how any basis for R3 , comprising three nonorthogonal linearly independent vectors a1 , a2 , and a3 , can be used to construct an equivalent basis involving three linearly independent orthogonal vectors u1 , u2 , and u3 . It is essential that the vectors a1 , a2 , and a3 be linearly independent, because if not, the vectors u1 , u2 , and u3 generated by the Gram–Schmidt orthogonalization process will be linearly dependent and so cannot form a basis for R3 . The derivation of the method starts by setting u1 = a1 , where the choice of a1 instead of a2 or a3 is arbitrary. The component of a2 in the direction of u1 is (u1 · a2 )/‖u1 ‖, so the vector component of a2 in this direction is (u1 · a2 )u1 /‖u1 ‖2 , and this always exists because ‖u1 ‖2 > 0. Subtracting this vector from a2 gives a vector u2 that is normal to u1 , so u2 = a2 − (u1 · a2 )u1 /‖u1 ‖2 . Similarly, to find a vector normal to both u1 and u2 involving a3 , it is necessary to subtract from a3 the components of vector a3 in the direction of u1 and also in the direction of u2 , so that u3 = a3 − (u1 · a3 )u1 /‖u1 ‖2 − (u2 · a3 )u2 /‖u2 ‖2 , and this also always exists, because ‖u1 ‖2 > 0 and ‖u2 ‖2 > 0. If an orthonormal basis is required, it is necessary to normalize the vectors u1 , u2 , and u3 by dividing each by its norm.
  • 120. 102 Chapter 2 Vectors and Vector Spaces Rule for the Gram–Schmidt orthogonalization process in R3 A set of nonorthogonal linearly independent vectors a1 , a2 , and a3 that form a basis in R3 can be used to generate an equivalent orthogonal basis involving the vectors u1 , u2 , and u3 by setting u1 = a1 , u2 = a2 − (u1 · a2 )u1 /‖u1 ‖2 , and u3 = a3 − (u1 · a3 )u1 /‖u1 ‖2 − (u2 · a3 )u2 /‖u2 ‖2 . As already remarked, the choice of a1 as the vector with which to start the orthogonalization process was arbitrary, and the process could equally well have been started by setting u1 = a2 or u1 = a3 . Using a different vector will produce a different set of orthogonal vectors u1 , u2 , and u3 , but any basis for R3 is equivalent to any other basis, so unless there is a practical reason for starting with a particular vector, the choice is immaterial. EXAMPLE 2.29 Given the nonorthogonal basis a1 = i − j − k, a2 = i + j + k, and a3 = −i + 2k, use the Gram–Schmidt orthogonalization process to find an equivalent orthogonal basis, and then find the corresponding orthonormal basis. Solution Using the preceding rule we start with u1 = i − j − k, and to find u2 we need to use the results u1 · a2 = −1 and ‖u1 ‖2 = 3, so that u2 = i + j + k − (−1/3)(i − j − k) = (4/3)i + (2/3)j + (2/3)k. To find u3 we need to use the results u1 · a3 = −3, ‖u1 ‖2 = 3, u2 · a3 = 0, and ‖u2 ‖2 = 24/9, so that u3 = −i + 2k − (−3/3)(i − j − k) = −j + k. So the required equivalent orthogonal basis is u1 = i − j − k, u2 = (4/3)i + (2/3)j + (2/3)k, and u3 = −j + k. The corresponding orthonormal basis obtained by dividing each of these vectors by its norm (modulus) is û1 = (1/√3)u1 , û2 = (1/2)√(3/2)u2 , and û3 = (1/√2)u3 . Other accounts of the Gram–Schmidt orthogonalization process are to be found in references [2.1] and [2.7] to [2.12].
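The arithmetic of Example 2.29 can be checked numerically. The following minimal sketch is an illustrative aside (assuming Python with NumPy; the function name gram_schmidt is my own choice) that implements the rule stated above and reproduces the orthogonal basis found in the example.

```python
import numpy as np

def gram_schmidt(vectors, normalize=False):
    """Orthogonalize a list of linearly independent vectors.

    Implements u1 = a1 and, for each later vector a,
    u = a - sum over previous uj of (uj . a) uj / ||uj||^2,
    exactly as in the rule stated above.
    """
    us = []
    for a in vectors:
        a = np.array(a, dtype=float)
        u = a.copy()
        for prev in us:
            u -= (prev @ a) * prev / (prev @ prev)
        us.append(u)
    if normalize:
        us = [u / np.linalg.norm(u) for u in us]
    return us

# The basis of Example 2.29: a1 = i - j - k, a2 = i + j + k, a3 = -i + 2k.
u1, u2, u3 = gram_schmidt([[1, -1, -1], [1, 1, 1], [-1, 0, 2]])
print(u1, u2, u3)                 # [1 -1 -1], [4/3 2/3 2/3], [0 -1 1]
print(u1 @ u2, u1 @ u3, u2 @ u3)  # all zero, confirming orthogonality
```

Passing normalize=True returns the corresponding orthonormal basis found at the end of the example.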
  • 121. Section 2.7 Summary Gram–Schmidt Orthogonalization Process 103 In this section it is shown how in R 3 the Gram–Schmidt orthogonalization process converts any three nonorthogonal linearly independent vectors a1 , a2 , and a3 into three orthogonal vectors u1 , u2 , and u3 . If necessary, the vectors u1 , u2 , and u3 can then be normalized in the usual manner to form an orthogonal set of unit vectors. EXERCISES 2.7 In Exercises 1 through 6, use the given nonorthogonal basis for vectors in R3 to find an equivalent orthogonal basis by means of the Gram–Schmidt orthogonalization process. 1. 2. 3. 4. 5. a1 a1 a1 a1 a1 = i + 2 j + k, a2 = i − j, a3 = 2 j − k. = j + 3k, a2 = i + j − k, a3 = i + 2k. = 2i + j, a2 = 2 j + k, a3 = k. = i + 3k, a2 = i − j + k, a3 = 2i + j. = −i + k, a2 = 2 j + k, a3 = i + j + k. 6. a1 = i + k, a2 = −j + k, a3 = i + j + 2k. In Exercises 7 and 8, find two different but equivalent sets of orthogonal vectors by arranging the same three nonorthogonal vectors in the orders indicated. 7. (a) a1 = 3j − k, a2 = i + j, a3 = i + 2k. (b) a1 = i + j, a2 = 3j − k, a3 = i + 2k. 8. (a) a1 = j − k, a2 = i + k, a3 = −i − j + k. (b) a1 = −i − j + k, a2 = i + k, a3 = j − k.
  • 123. C H A P T E R 3 Matrices and Systems of Linear Equations M any types of problems that arise in engineering and physics give rise to linear algebraic simultaneous equations. A typical engineering example involves the determination of the forces acting in the struts of a pin-jointed structure like a truss that forms the side of a bridge supporting a load. The determination of the forces in a strut is important in order to know when it is in compression or tension, and to ensure that no truss exceeds its safe load. The analysis of the forces in structures of this type gives rise to a set of linear simultaneous equations that relate the forces in the struts and the external load. It is necessary to know when systems of linear equations are consistent so a solution exists, when they are inconsistent so there is no solution, and whether when a solution exists it is unique or nonunique in the sense that it involves a number of arbitrary parameters. In practical problems all of these mathematical possibilities have physical meaning, and in the case of a truss, the inability to determine the forces acting in a particular strut indicates that it is redundant and so can be removed without compromising the integrity of the structure. A more complicated though very similar situation occurs when linearly vibrating systems are coupled together, as may happen when an active vibration damper is attached to a spring-mounted motor. However, in this case it is a system of simultaneous linear ordinary differential equations determining the amplitudes of the vibrations of the motor and vibration damper that are coupled together. The analysis of this problem, which will be considered later, also gives rise to a linear system of simultaneous algebraic equations. Linear ordinary differential equations are also coupled together when working with linear control systems involving feedback. When such systems are solved by means of the Laplace transform to be described later, linear algebraic systems again arise and the nature of the zeros of the determinant of a certain quantity then determines the stability of the control system. Linear systems of simultaneous algebraic equations also play an essential role in computer graphics, where at the simplest level they are used to transform images by translating, rotating, and stretching them by differing amounts in different directions. Although each equation in a system of linear algebraic equations can be considered separately, such can be discovered about the properties of the physical problem that gave rise to the equations if the system of equations can be studied as a whole. This can be accomplished by using the algebra of matrices that provides a way of analyzing systems 105
  • 124. 106 Chapter 3 Matrices and Systems of Linear Equations as a single entity, and it is the purpose of this chapter to introduce and develop this aspect of what is called linear algebra. After defining the notion of a matrix, this chapter develops the fundamental matrix operations of equality, addition, scaling, transposition, and multiplication. Various applications of matrices are given, and the brief review of determinants given in Chapter 1 is developed in greater detail, prior to its use when considering the solution of systems of linear algebraic equations. The concept of elementary row operations is introduced and used to reduce systems of linear algebraic equations to a form that shows whether or not a unique solution exists. When a solution does exist, which is either unique or determined in terms of some of the remaining variables, this reduction enables the solution to be found immediately. The inverse of an n × n matrix is defined and shown only to exist when the determinant of the matrix is nonvanishing, and, finally, the derivative of a matrix whose elements are functions of a variable is introduced and some of its most important properties are derived. 3.1 Matrices M atrices arise naturally in many different ways, one of the most common being in the study of systems of linear equations such as a11 x1 + a12 x2 + · · · + a1n xn = b1 a21 x1 + a22 x2 + · · · + a2n xn = b2 · · · · · · · · · am1 x1 + am2 x2 + · · · + amn xn = bm. (1) In system (1) the numbers ai j are the coefficients of the equations, the numbers bi are the nonhomogeneous terms, and the number of equations m may equal, exceed, or be less than n, the number of unknowns x1 , x2 , . . . , xn . System (1) is said to be homogeneous when b1 = b2 = · · · = bm = 0, and to be nonhomogeneous when at least one of the bi is nonvanishing. The algebraic properties of the system are determined by the array of coefficients ai j , the nonhomogeneous terms bi and the numbers m and n. From now on, the array of coefficients and the nonhomogeneous terms on the right will be denoted by the single symbols A and b, respectively, where ⎡ a11 ⎢ a21 A=⎢ ⎣ am1 a12 a22 . . . am2 . . . . . . . . ⎤ . a1n . a2n ⎥ ⎥ ⎦ . . amn ⎡ and ⎤ b1 ⎢b ⎥ b = ⎢ 2⎥. ⎣ · ⎦ bm (2) The array of mn coefficients ai j in m rows and n columns that form A is an example of an m × n matrix, where m × n is read “m by n.” The array b is an example of an m × 1 matrix, and it is called an m element column vector. We will use the convention that an array such as A, with two or more rows and two or more columns, will be denoted by a boldface capital letter. An array with a single row, or a column such as b, will be denoted by a boldface lowercase letter. Each entry in a matrix is called an element of the matrix, and entries may be numbers, functions, or even matrices themselves. The suffixes associated with an element show its position in the matrix, because the first suffix is the row number
  • 125. Section 3.1 Matrices 107 and the second is the column number. Because of this convention, the element a35 in a matrix belongs to the third row and the fifth column of the matrix. So, for example, if A is a 3 × 2 matrix and its general element ai j = i + 3 j, then as i may only take the values 1, 2, and 3, and j the values 1 and 2, it follows that ⎡ ⎤ 4 7 A = ⎣5 8⎦ . 6 9 In a column vector c with elements c11 , c21 , c31 , . . . , cm1 , as only a single column is involved, it is usual to vary the suffix convention by omitting the second suffix and instead numbering the elements sequentially as c1 , c2 , c3 , . . . , cm, so that ⎡ ⎤ c1 ⎢ c2 ⎥ ⎢ ⎥ ⎢.⎥ c = ⎢ . ⎥. ⎢.⎥ ⎢.⎥ ⎣.⎦ . cm Later it will be necessary to introduce row vectors, and in an s element row vector r with elements r11 , r12 , r13 , . . . , r1s , the notation is again simplified, this time by omitting the first suffix and numbering the elements sequentially as r1 , r2 , . . . , rs , so r = [r1 , r2 , . . . , rs ]. (3) In general, row and column vectors will be denoted by boldface lowercase letters such as a, b, c, and x, and matrices such as the coefficient matrix in (2) will be denoted by boldface capital letters such as A, B, P, and Q. A different convention that is also used to denote a matrix involves enclosing the array between curved brackets instead of the square ones used here. Thus, 1 −3 5 2 9 4 1 −3 and 5 2 9 4 (4) denote the same 2 × 3 matrix. A matrix should never be enclosed between two vertical rules in order to avoid confusion with the determinant notation because 3 5 −4 2 is a matrix, but 3 5 −4 = 26 is a determinant. 2 Definition of a matrix An m × n matrix is an array of mn entries, called elements, arranged in m rows and n columns. If a matrix is denoted by A, then the element in its ith row and jth column is denoted by ai j and ⎡ a11 ⎢ a21 A = [ai j ] = ⎢ ⎣ . am1 a12 a22 . . . am2 . . . . . . . . . . . . . ⎤ a1n a2n ⎥ ⎥. . ⎦ amn
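As a brief illustrative aside (not part of the original text), the index convention just described can be tried out in code. The sketch below, assuming Python with NumPy, builds the 3 × 2 matrix with general element ai j = i + 3 j discussed above; note that the book counts rows and columns from 1, whereas NumPy indexes arrays from 0.

```python
import numpy as np

# Build the 3 x 2 matrix whose general element is a_ij = i + 3j, using the
# book's 1-based row index i = 1, 2, 3 and column index j = 1, 2.
A = np.array([[i + 3 * j for j in range(1, 3)] for i in range(1, 4)])
print(A)
# [[4 7]
#  [5 8]
#  [6 9]]

# The element the book calls a_31 (row 3, column 1) is A[2, 0] in NumPy.
print(A[2, 0])   # 6
```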
  • 126. 108 Chapter 3 Matrices and Systems of Linear Equations some typical matrices The following are typical examples of matrices: A 1 × 1 matrix: [3]; a single element may be regarded as a matrix. ⎡ ⎤ 1 3 5 0 A 3 × 4 matrix: ⎣2 −1 4 3⎦ ; a matrix with real numbers as elements. 7 2 1 6 A 2 × 2 matrix: 1+i 3 + 4i 1−i ; 2 − 3i a matrix with complex numbers as elements. cosθ sin θ ; a matrix with functions as elements. −sinθ cos θ A 1 × 3 matrix: [2, −5, 7]; a three-element row vector. 11 A 2 × 1 matrix: ; a two-element column vector. 9 A 2 × 2 matrix: A square matrix is a matrix in which the number of rows m equals the number of columns n. A typical square matrix is the 3 × 3 matrix ⎡ ⎤ 2 0 5 ⎣1 −3 4⎦ . 3 1 7 Definition of the equality of matrices Let A = [ai j ] be an m × n matrix and B = [bi j ] be a p × q matrix. Then matrices A and B will be equal, written A = B, if, and only if: (a) A and B have the same number of rows, and the same number of columns, so that m = p and n = q, and (b) ai j = bi j , for each i and j. Equality of matrices means that if A and B are equal, then each is an identical copy of the other. ⎡ EXAMPLE 3.1 2 3 a If A = , b 6 1 2 B= −3 3 6 9 , 1 and 2 C = ⎣−3 0 3 6 0 ⎤ 9 1⎦ , 0 then A = B if and only if a = 9 and b = −3, but A = C and B = C. Definition of matrix addition The addition of matrices A and B is only defined if the matrices each have the same number of rows and the same number of columns. Let A = [ai j ] and B = [bi j ] be m × n matrices. Then the the m × n matrix formed by adding A and B, called the sum of A and B and written A + B, is the matrix whose element in the ith row and jth column is ai j + bi j , for each i and j, so that A + B = [ai j + bi j ]. Matrices that can be added are said to be conformable for addition.
  • 127. Section 3.1 Matrices 109 It is an immediate consequence of this definition that A + B = B + A, so matrix addition is commutative. Definition of the transpose of a matrix Let A = [ai j ] be an m × n matrix. Then the transpose of A, denoted by AT (and sometimes by A ), is the matrix obtained from A by interchanging rows and columns to produce the n × m matrix AT = [ai j ]T = [a ji ]. The definition of the transpose of a matrix means that the first row of A becomes the first column of AT , the second row of A becomes the second column of AT , . . . . , and, finally, the mth row of A becomes the mth column of AT . In particular, if A is a row vector, then its transpose is a column vector, and conversely. EXAMPLE 3.2 ⎡ 2 If A = 1 6 0 3 4 2 then AT = ⎣6 3 ⎤ ⎡ ⎤ 1 7 0⎦ , and if A = [7, 3, 2] then AT = ⎣3⎦ . 4 2 Definition of scaling a matrix by a number Let A = [ai j ] be an m × n matrix and λ be a scalar (real or complex). Then if A is scaled by λ, written λA, every element of A is multiplied by λ to yield the m × n matrix λA = [λai j ]. EXAMPLE 3.3 If λ = 2 and A= 2 1 −6 4 7 , 15 then λA = 2A = 4 2 −12 8 14 , 30 and if λ = −1, then λA = (−1)A = −A = difference (subtraction) of matrices −2 −1 6 −7 . −4 −15 Taken together, the definitions of the addition and scaling of matrices show that if the matrices A and B are conformable for addition, then the subtraction of matrix B from A, called their difference and written A − B, is to be interpreted as A − B = A + (−1)B. EXAMPLE 3.4 If A = 2 5 8 1 −4 5 and B= 2 2 4 −4 5 , 1 then A − B = 0 −1 1 0 3 . 4
  • 128. 110 Chapter 3 Matrices and Systems of Linear Equations negative of a matrix The null or zero matrix 0 is defined as any matrix in which every element is zero. The introduction of the null matrix makes it appropriate to call −A the negative of A, because A − A = A + (−1)A = 0. When working with the null matrix the number of its rows and columns is never stated, because these are always taken to be whatever is appropriate for the equation that is involved. Definition of the product of a row and a column vector Let a = [a1 , a2 , . . . , ar ] be an r-element row vector, and b = [b1 , b2 , . . . , br ]T be an r -element column vector. Then the product ab, in this order, is the number defined as ab = a1 b1 + a2 b2 · · · + ar br . Notice that this product is only defined when the number of elements in the row vector A equals the number of elements in the column vector B. EXAMPLE 3.5 Find the product ab given that a = [1, 4, −3, 10] and b = [2, 1, 4, −2]T . Solution ab = [1, 4, −3, 10] ⎡ ⎤ 2 ⎢ 1⎥ ⎢ ⎥ ⎣ 4⎦ −2 = (1) · (2) + (4) · (1) + (−3) · (4) + (10) · (−2) = −26. Definition of the product of matrices Let A = [ai j ] be an m × n matrix in which the r th row is the row vector ar , and let B = [bi j ] be a p × q matrix in which the sth column is the column vector bs . The matrix product AB, in this order, is only defined if the number of columns in A equals the number of rows in B, so that n = p. The product is then an m × q matrix with the element in its r th row and sth column defined as ar bs . Thus, if cr s = ar bs , as cr s = ar 1 b1s + ar 2 b2s + · · · + ar n bns , AB = [cr s ] = [ar 1 b1s + ar 2 b2s + · · · + ar n bns ], for 1 ≤ r ≤ m and 1 ≤ s ≤ q, or, equivalently, ⎡ a1 b1 a1 b2 a1 b3 ⎢ a2 b1 a2 b2 a2 b3 AB = ⎢ . . . . . . . . . . ⎣ amb1 amb2 amb3 ⎤ . . . a1 bq . . . a2 bq ⎥ ⎥. . . . . . ⎦ . . . ambq
  • 129. Section 3.1 Matrices 111 When a matrix product AB is defined, the matrices are said to be conformable for matrix multiplication in the given order. in general, matrix multiplication is noncommutative EXAMPLE 3.6 It is important to notice that when the product AB is defined, the product BA may or may not be defined, and even when BA is defined, in general AB = BA. This situation is recognized by saying that, in general, matrix multiplication is noncommutative. Provided matrices A and B are conformable for multiplication, the above rule for finding their product AB, in this order, is best remembered by saying that the element in the ith row and jth column of AB is the product of the ith row of A and the jth column of B. Form the matrix products AB and BA given that A= 1 2 4 5 −3 4 and ⎡ 4 B = ⎣2 0 ⎤ 1 6⎦ . 3 Solution Let us calculate the matrix product AB. The first and second row vectors of A are a1 = [1, 4, −3] and a2 = [2, 5, 4], and the first and second column vectors of B are b1 = [4, 2, 0]T and b2 = [1, 6, 3]T . As A is a 2 × 3 matrix and B is a 3 × 2 matrix, the product AB is conformable for multiplication and yields a 2 × 2 matrix AB = = (1 · 4 + 4 · 2 + (−3) · 0) (1 · 1 + 4 · 6 + (−3) · 3) a1 b1 a1 b2 = a2 b1 a2 b2 (2 · 4 + 5 · 2 + 4 · 0) (2 · 1 + 5 · 6 + 4 · 3) 12 18 16 . 44 The product BA is also conformable for multiplication and yields a 3 × 3 matrix, where now we must use the row vectors of B that with an obvious change of notation are b1 = [4, 1], b2 = [2, 6], b3 = [0, 3], and the column vectors of A that are a1 = [1, 2]T , a2 = [4, 5]T , and a3 = [−3, 4]T , so that ⎡ ⎤ ⎡ ⎤ b1 a1 b1 a2 b1 a3 (4 · 1 + 1 · 2) (4 · 4 + 1 · 5) (4 · (−3) + 1 · 4) BA = ⎣b2 a1 b2 a2 b2 a3 ⎦ = ⎣(2 · 1 + 6 · 2) (2 · 4 + 6 · 5) (2 · (−3) + 6 · 4)⎦ (0 · 1 + 3 · 2) (0 · 4 + 3 · 5) (0 · (−3) + 3 · 4) b3 a1 b3 a2 b3 a3 ⎡ ⎤ 6 21 −8 = ⎣14 38 18⎦ . 6 15 12 This is an example of two matrices A and B that can be combined to form the products AB and BA, but AB = BA. EXAMPLE 3.7 Write the system of simultaneous equations (1) in matrix form. Solution Using the matrices A and b in (2) and setting x = [x1 , x2 , . . . , xn ]T allows the system of equations (1) to be written Ax = b. Here, as is usual, to save space the transpose operation has been used to display the elements of column vector x in the more convenient form x = [x1 , x2 , . . . , xn ]T .
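The products of Example 3.6 are easily verified numerically, and doing so makes the noncommutativity plain: AB is 2 × 2 while BA is 3 × 3, so the two can never be equal. The following minimal sketch is an illustrative aside, assuming Python with NumPy is available.

```python
import numpy as np

# The matrices of Example 3.6.
A = np.array([[1, 4, -3],
              [2, 5,  4]])        # 2 x 3
B = np.array([[4, 1],
              [2, 6],
              [0, 3]])            # 3 x 2

print(A @ B)                      # the 2 x 2 product AB
# [[12 16]
#  [18 44]]
print(B @ A)                      # the 3 x 3 product BA
# [[ 6 21 -8]
#  [14 38 18]
#  [ 6 15 12]]
```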
  • 130. 112 Chapter 3 Matrices and Systems of Linear Equations The definitions of matrix multiplication and addition lead almost immediately to the results of the following theorem, so the proof is left as an exercise. THEOREM 3.1 some important properties of matrices Associative and distributive properties of matrices Let A, B, and C be matrices that are conformable for the operations that follow, and let λ be a scalar. Then: (i) If AB and BA are both defined, in general AB ≠ BA; (ii) A(BC) = (AB)C = ABC; (iii) (λA)B = A(λB) = λAB; (iv) A(B + C) = AB + AC; (v) (A + B)C = AC + BC. THEOREM 3.2 Transposition of a product If matrices A and B are conformable to form the product AB, then (AB)T = BT AT . Proof Let A = [ai j ] be an m × n matrix and B = [bi j ] be an n × q matrix, so that AB is an m × q matrix. The products (AB)T and BT AT are then both defined, and each is a q × m matrix. Introduce the notation [M]i j to denote the element of M in row i and column j. Then from the transpose operation and the rule for matrix multiplication, for all permissible i, j, [(AB)T ]i j = [AB] ji = (product of jth row of A with ith column of B) = a j1 b1i + a j2 b2i + · · · + a jn bni . Similarly, [BT AT ]i j = (product of ith row of BT with jth column of AT ) = (product of ith column of B with jth row of A) = a j1 b1i + a j2 b2i + · · · + a jn bni . So [(AB)T ]i j = [BT AT ]i j for all permissible i, j, showing that (AB)T = BT AT . raising a matrix to a power It is an immediate consequence of Theorem 3.1(ii) that if A is a square matrix and m and n are positive integers, A · A · A · . . . · A (n times) = An and Am · An = Am+n . A useful result from the definition of addition is (A + B)T = AT + BT , while from Theorem 3.2 (ABC)T = CT BT AT .
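Theorem 3.2 and the remarks following it can also be confirmed numerically for particular matrices. In the illustrative sketch below (assuming Python with NumPy), A and B are the matrices of Example 3.6, while C and D are small matrices chosen here only so that the triple product and the sum are defined.

```python
import numpy as np

A = np.array([[1, 4, -3], [2, 5, 4]])     # 2 x 3
B = np.array([[4, 1], [2, 6], [0, 3]])    # 3 x 2
C = np.array([[1, 0], [2, -1]])           # 2 x 2, so that ABC is defined
D = np.array([[0, 1, 2], [3, 4, 5]])      # 2 x 3, the same shape as A

# Transposition reverses the order of the factors in a product ...
print(np.array_equal((A @ B).T, B.T @ A.T))              # True
print(np.array_equal((A @ B @ C).T, C.T @ B.T @ A.T))    # True

# ... but the transpose of a sum needs no reversal.
print(np.array_equal((A + D).T, A.T + D.T))              # True
```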
  • 131. Section 3.1 pre- and postmultiplication of matrices Matrices 113 As the order in which a sequence of permissible matrix multiplications is performed influences the product, it is necessary to introduce a form of words that makes the order unambiguous. This is accomplished by saying that if matrix A multiplies matrix B from the left, as in AB, then B is premultiplied by A, while if A multiplies B from the right, as in BA, then B is postmultiplied by A. Equivalently, in the product AB, we can say that A is postmultiplied by B, or that B is premultiplied by A. Important Differences Between Ordinary Algebraic Equations and Matrix Equations (i) The algebraic equation ab = 0, in which a and b are numbers, not both of which are zero, implies that either a = 0 or b = 0. However, if the matrix product AB is defined and is such that AB = 0, then it does not necessarily follow that either A = 0 or B = 0. (ii) The algebraic equation ab = ac in which a, b, and c are numbers, with a = 0, allows cancellation of the factor a leading to the conclusion that b = c. However, if the matrix products AB and AC are defined and are such that AB = AC, this does not necessarily imply that B = C, so that cancellation of matrix factors is not permissible. The validity of these two statements can be seen by considering the following simple examples. EXAMPLE 3.8 Consider matrices A and B given by A= 1 3 4 12 B= and 4 −8 . −1 2 Then AB = 0, but neither A nor B is a null matrix. EXAMPLE 3.9 Consider the matrices A, B, and C given by A= 1 2 −1 , −2 B= 2 2 4 3 6 , 4 and C= 3 3 6 5 8 . 6 Then AB = AC = 0 0 1 2 2 , 4 but B = C. leading diagonal and trace of a matrix In a square n × n matrix A = [ai j ], the elements on a line extending from top left to bottom right is called the leading diagonal of A, and it contains the n elements a11 , a22 , . . . , ann . So the leading diagonal of the 2 × 2 matrix A in Example 3.8 contains the elements 1 and 12, and the leading diagonal of the 2 × 2 matrix B contains the elements 4 and 2. Symbolically, the leading diagonal of the n × n matrix A = [ai j ] shown below comprises the n elements in the shaded diagonal strip, though these
  • 132. 114 Chapter 3 Matrices and Systems of Linear Equations n elements do not form an n element vector. ⎡ a11 ⎢a21 ⎢ ⎢a A = ⎢ 31 ⎢ · ⎢ ⎣ · an1 a12 a22 a32 · · an2 a13 a23 a33 · · an3 · · · · · · · · · · · · · · · · · · ⎤ a1n a2n ⎥ ⎥ a3n ⎥ ⎥. · ⎥ ⎥ · ⎦ ann The trace of a square matrix A, written tr(A), is the sum of the terms on its leading diagonal, so for the foregoing matrix tr(A) = a11 + a22 + · · · + ann . Square matrices in which all elements away from the leading diagonal are zero, but not every element on the leading diagonal is zero, are called diagonal matrices. Of the class of diagonal matrices, the most important are the unit matrices, also called identity matrices, in which every element on the leading diagonal is the number 1. These n × n matrices are usually all denoted by the symbol I, with the value of n being understood to be appropriate to the context in which they arise. If, however, the value of n needs to be indicated, the symbol I can be replaced by In . It is easily seen from the definition of matrix multiplication that for any m × n matrix A it follows that ImA = AIn or, more simply, IA = AI = A, and that when A is an n × n matrix, IA = AI = A. identity or unit matrix When working with matrices, the unit matrix I plays the part of the unit real number, and it is because of this that I is called either the unit or the identity matrix. An example of a 4 × 4 diagonal matrix is ⎡ ⎤ 3 0 0 0 ⎢0 2 0 0⎥ ⎥ D=⎢ ⎣0 0 0 0⎦ , with the trace given by tr(D) = 3 + 2 + 0 + 1 = 6. 0 0 0 1 The 3 × 3 unit matrix is the diagonal matrix ⎡ ⎤ 1 0 0 I = I3 = ⎣0 1 0⎦ , and its trace is tr(I) = 1 + 1 + 1 = 3. 0 0 1 Various special square n × n matrices occur sufficiently frequently for them to be given names, and some of the most important of these are the following: some special matrices Upper triangular matrices are matrices in which all elements below the leading diagonal are zero. A typical example of a 4 × 4 upper triangular matrix is ⎡ ⎤ 1 3 −1 0 ⎢0 2 −6 1⎥ ⎥ U=⎢ ⎣0 0 −3 2⎦ . 0 0 0 4
  • 133. Section 3.1 Matrices 115 Lower triangular matrices are matrices in which all elements above the leading diagonal are zero. A typical example of a 4 × 4 lower triangular matrix is ⎡ ⎤ 2 0 0 0 ⎢ 1 0 0 0⎥ ⎥ L=⎢ ⎣ 3 −2 5 0⎦ . −2 4 7 3 Symmetric matrices A = [ai j ] are matrices in which ai j = a ji for all i and j. If A is symmetric, then A = AT . A typical example of a symmetric matrix is ⎡ ⎤ 1 5 −3 2⎦ . M=⎣ 5 4 −3 2 7 Skew-symmetric matrices A = [a ji ] are matrices in which ai j = −a ji for all i and j. From the definition of an n × n skew-symmetric matrix we have aii = −aii for i = 1, 2, . . . , n, so the elements on the leading diagonal must all be zero. An equivalent definition of a skew-symmetric matrix A is that AT = −A. A typical example of a skew-symmetric matrix is ⎡ ⎤ 0 3 −5 6 ⎢−3 0 2 −4⎥ ⎥. S=⎢ ⎣ 5 −2 0 −1⎦ −6 4 1 0 An orthogonal matrix Q is a matrix such that QQT = QT Q = I. A typical orthogonal matrix is ⎡ 1 1 ⎤ −√ √ ⎢ 2 2⎥ ⎥. Q=⎢ ⎣ 1 1 ⎦ √ √ 2 2 More special than the preceding real valued matrices are matrices A = [ai j ] in which the elements ai j are complex numbers. We will write A to denote the matrix obtained from A by replacing each of its elements ai j by its complex conjugate a i j , so that A = [a i j ]. Then matrix A is said to be Hermitian if T A = A. A typical Hermitian matrix is 7 1 + 4i A= 1 − 4i . 3 The matrix A is said to be skew-Hermitian if T A = −A. A typical skew-Hermitian matrix is A= 3i −5 + 2i 5 + 2i . 0
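Each defining property listed above translates directly into a one-line numerical test. The sketch below is an illustrative aside (assuming Python with NumPy) that checks the symmetric, skew-symmetric, orthogonal, and Hermitian examples given in the text.

```python
import numpy as np

M = np.array([[1, 5, -3], [5, 4, 2], [-3, 2, 7]])        # symmetric
S = np.array([[0, 3, -5, 6], [-3, 0, 2, -4],
              [5, -2, 0, -1], [-6, 4, 1, 0]])             # skew-symmetric
Q = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2)      # orthogonal
H = np.array([[7, 1 + 4j], [1 - 4j, 3]])                  # Hermitian

print(np.array_equal(M.T, M))              # symmetric:      M^T = M
print(np.array_equal(S.T, -S))             # skew-symmetric: S^T = -S
print(np.allclose(Q @ Q.T, np.eye(2)))     # orthogonal:     Q Q^T = I
print(np.array_equal(H.conj().T, H))       # Hermitian: conjugate transpose = H
```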
  • 134. 116 Chapter 3 Matrices and Systems of Linear Equations block matrices More will be said later about some of these special square matrices and the ways in which they arise. Finally, we mention that every m × n matrix A can be represented differently as a block matrix, in which each element is itself a matrix. This is accomplished by partitioning the matrix A into submatrices by considering horizontal and vertical lines to be drawn through A between some of its rows and columns, and then identifying each group of elements so defined as a submatrix of A. Clearly there is more than one way in which a matrix can be partitioned. As an example of matrix partitioning, let us consider the 3 × 3 matrix ⎤ ⎡ 3 −1 2 2 0⎦ . A = ⎣1 2 1 0 One way in which this matrix can be partitioned is as follows: ⎡ ⎤ 3 −1 2 ⎢ ⎥ A = ⎣1 2 0 ⎦. 2 0 1 This can now be written in block matrix form as A= A12 , A22 A11 A21 where the submatrices are A11 = [3 −1], A12 = [2], A21 = 1 2 2 , 1 and A22 = 0 . 0 The addition and scaling of block matrices follow the same rules as those for ordinary matrices, but care must be exercised when multiplying block matrices. To see how multiplication of block matrices can be performed, let us consider the product of matrix A above and the 3 × 4 matrix ⎡ ⎤ 1 2 2 1 ⎥ ⎢ B = ⎣3 1 1 0 ⎦ , 2 3 0 2 which are conformable for the product AB that is itself a 3 × 4 matrix. If B is partitioned as indicated by the dashed lines, it can be written as B= B11 B21 B12 , B22 where the submatrices are B11 = 1 , 3 B12 = 2 1 2 1 1 , 0 B21 = [2], and B22 = [3, 0, 2]. Consideration of the definition of the product of matrices shows that we may now write the matrix product AB in the condensed form AB = A11 B11 + A12 B21 A21 B11 + A22 B21 A11 B12 + A12 B22 , A21 B12 + A22 B22
  • 135. Section 3.1 Matrices 117 where the partitioned matrices have been multiplied as though their elements were ordinary elements. This result follows because of correct partitioning, as each product of submatrices is conformable for multiplication and all of the matrix sums are conformable for addition. In this illustration, routine calculations show that A11 B11 + A12 B21 = [4], A11 B12 + A12 B22 = [11, 5, 7], 7 4 A21 B11 + A22 B21 = , and A21 B12 + A22 B22 = 5 5 4 5 1 , 2 so ⎡ [4] AB = ⎣ 7 5 ⎤ ⎡ [11, 5, 7] 4 ⎦ = ⎣7 4 4 1 5 5 5 2 11 4 5 ⎤ 5 7 4 1⎦. 5 2 This result is easily confirmed by direct matrix multiplication. The calculation of a matrix product AB using partitioned matrices applies in general, provided the partitioning of A and B is performed in such a way that the products of all the submatrices involved are defined. Matrix partitioning has various uses, one of which arises in machine computation when a very large fixed matrix A needs to be multiplied by a sequence of very large matrices P, Q, R, . . . . If it happens that A can be partitioned in such a way that some of its submatrices are null matrices, the computational time involved can be drastically reduced, because the product of a submatrix and a null matrix is a null matrix, and so need not be computed. The economy follows from the fact that in machine computation multiplications occupy most of the time, so any reduction in their number produces a significant reduction in the time taken to evaluate a matrix product, and the result is even more significant when the same partitioned matrix with null blocks is involved in a sequence of calculations. Block matrices are also of significance when describing complex oscillation problems governed by a large system of simultaneous ordinary differential equations. Their importance arises from the fact that the matrix of coefficients of the equations often contains many null submatrices, and when this happens the structure of the nonnull blocks provides useful information about the fundamental modes of oscillation that are possible, and also about their interconnections. For other accounts of elementary matrices see the appropriate chapters in references [2.1], [2.5], and [2.7] to [2.12]. Summary This section defined m × n matrices, and the special cases of column and row vectors, and it introduced the fundamental algebraic operations of equality, addition, scaling, transposition, and multiplication of matrices. It was shown that, in general, matrix multiplication is not commutative, so that even when both of the products AB and BA are defined, it is usually the case that AB = BA. Pre- and postmultiplication of matrices was defined, and some important special types of matrices were introduced, such as the unit matrix I. It was also shown how a matrix A can be subdivided into blocks, and that a matrix operation performed on A can be interpreted in terms of matrix operations performed on block matrices obtained by subdivision of A.
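The partitioned product worked out above is easily checked by machine. The following minimal sketch (an illustrative aside, assuming Python with NumPy) slices A and B into the same blocks used in the text, assembles AB from the four block products, and confirms that the result agrees with ordinary matrix multiplication.

```python
import numpy as np

A = np.array([[3, -1, 2], [1, 2, 0], [2, 1, 0]])
B = np.array([[1, 2, 2, 1], [3, 1, 1, 0], [2, 3, 0, 2]])

# Partition A after its first row and second column, and B after its
# second row and first column, exactly as in the text.
A11, A12, A21, A22 = A[:1, :2], A[:1, 2:], A[1:, :2], A[1:, 2:]
B11, B12, B21, B22 = B[:2, :1], B[:2, 1:], B[2:, :1], B[2:, 1:]

blocked = np.block([[A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
                    [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22]])

print(np.array_equal(blocked, A @ B))   # True: block and direct products agree
print(A @ B)
# [[ 4 11  5  7]
#  [ 7  4  4  1]
#  [ 5  5  5  2]]
```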
  • 136. 118 Chapter 3 Matrices and Systems of Linear Equations EXERCISES 3.1 In Exercises 1 through 4 find the values of the constants a, b, and c in order that A = B. −a 1 4 2 b −1 ⎤ ⎡ 1 4 3 B = ⎣2 2 4⎦. b 1 0 ⎤ ⎡ 2 a2 a 1 a 1 2⎦ , B = ⎣ 3 3. A = ⎣ b 1+a 2+c 6 2 ⎡ ⎤ ⎡ 1 3+a 2 1 a 5 ⎦, B = ⎣4 4. A = ⎣1 + b b2 b2 1 a2 2 a 2 ⎡ 1 2. A = ⎣a 9 ⎡ 1. A = 1 c , 3 a ⎤ 4 3 2 4⎦ , 1 c B= a 1 4 ⎤ 1 2⎦. 6 ⎤ −1 c a 5 ⎦. 1 a2 In Exercises 9 through 12 form the sum λA + μB. ⎤ ⎡ 1 4 2 9. λ = 1, μ = 3, A = ⎣2 1 4⎦, 3 2 2 ⎤ ⎡ 2 3 −1 4⎦. B = ⎣1 2 1 0 3 μ = 2, A= 1 2 4 4 1 , 0 B= 2 0 1 2 1 . 4 11. λ = 4, μ = −2, . In Exercises 5 through 8 find A + B and A − B. ⎤ ⎤ ⎡ ⎡ 2 0 1 −2 1 4 3 6 1⎦. 1 0 2⎦ , B = ⎣1 1 −3 5. A = ⎣2 0 1 1 0 1 −1 0 1 ⎤ ⎤ ⎡ ⎡ 1 7 6 2 −1 6 6. A = ⎣ 0 2 4⎦ , B = ⎣1 −2 3⎦. −1 0 1 2 1 2 ⎤ ⎤ ⎡ ⎡ 0 2 3 1 2 4 ⎢3 −1 1⎥ ⎢3 1 0⎥ ⎥. ⎥ ⎢ 7. A = ⎢ ⎣1 1 0⎦ , B = ⎣0 1 1⎦ 1 3 2 2 2 4 ⎤ ⎤ ⎡ ⎡ 1 0 0 0 1 4 3 6 ⎢ 3 1 0 0⎥ ⎢0 2 1 4⎥ ⎥ ⎥ ⎢ 8. A = ⎢ ⎣0 0 3 1⎦ , B = ⎣1 2 4 0⎦. 1 1 1 3 0 0 0 2 10. λ = −1, ⎡ 12. λ = 3, μ = −3, 4 A = ⎣2 1 ⎡ 6 B = ⎣2 1 ⎡ 3 A = ⎣2 3 ⎡ 3 B = ⎣4 2 3 1 2 1 4 1 1 2 6 2 2 1 ⎤ 1 1⎦, 1 ⎤ 0 2⎦. 2 ⎤ 4 1⎦, 2 ⎤ 1 3⎦. 1 In Exercises 13 through 16 find the product AB. 13. A = [1, 14. A = [2, 15. A = [1, B = [2, 16. A = [1, B = [−1, 4, 3, 4, 2, 3, 2, −2, 3], B = [2, 1, −1, 2]T . 1, 4], B = [3, 1, 1, 3]T . 3, 7, 5], −1, −1, 3]T . −1, 2, 0], 13, 4, 1]T . In Exercises 17 through 22 find the product AB and, when it exists, the product BA. 17. A = 1 4 , 2 0 B= 2 1 1 4 3 . 1 18. A = [1, 4, 6, −7], B = [2, 3, −2, 3]T . ⎤ ⎤ ⎡ ⎡ 1 0 0 3 1 4 19. A = ⎣ 0 1 0 ⎦ , B = ⎣ 2 1 −5 ⎦ . 0 0 1 7 2 0 ⎤ ⎤ ⎡ ⎡ 2 0 0 9 −1 4 6 −2 ⎦ . 20. A = ⎣ 0 −3 0 ⎦ , B = ⎣ 1 0 0 5 2 2 3 ⎤ ⎡ ⎤ ⎡ 2 3 1 5 2 3 ⎢4 1 2⎥ ⎥, B = ⎣2 0 4⎦. ⎢ 21. A = ⎣ 2 2 6⎦ 1 4 7 1 5 2 ⎤ ⎤ ⎡ ⎡ 3 1 1 2 1 0 ⎢ 4 ⎢2 1 1 4⎥ 2⎥ ⎥ ⎥ ⎢ 22. A = ⎢ ⎣ 1 0 2 1 ⎦ , B = ⎣ 6 −2 ⎦ . −1 4 1 1 2 1 23. Given ⎤ ⎤ ⎡ ⎡ 2 5 −3 4 2 1 4⎦ and B = ⎣2 5 6⎦ , A=⎣ 5 1 −3 4 6 1 6 3 show that (AB)T = BA.
  • 137. Section 3.1 In Exercises 24 through 28 write the given systems of equations in the matrix form Ax = b, where A is the coefficient matrix, x is the vector of unknowns, and b is the nonhomogeneous vector term. 26. 5x + 3y − 6z = 14 6x − 5y + 11z = 20 x − 4y + 3z = 2 9x − 3y + 2z = 35. 24. 3x + 5y − 6z = 7 x − 7y + 4z = −3 2x + 4y − 5z = 4. 25. 4u + 5v − w + 7z = 25 3u + 2v + 3z = 6 v + 6w − 7z = 0. 27. 3x + 4y − 2z = λx 2x − 7y + 6z = λy 8x + 3y + 5z = λz. 28. 2x + 3y + 6z = λ(3x + 2y + 3z) 3x − 4y + 2z = λ(x − 5y + 2z) 4x + 9y + 2z = λ(x − 2y + 4z). 29. If ⎡ 1 A = ⎣1 0 ⎤ 6 0⎦ , 3 ⎤ ⎡ 2 0 1 2 3⎦ , B = ⎣4 0 −1 1 ⎡ ⎤ x1 x2 x3 X = ⎣ y1 y2 y3 ⎦ , z1 z2 z3 3 2 1 and solve for X given that Matrices 119 33. Prove the second result in Theorem 3.1 that A(BC) = (AB)C = ABC. 34. Prove the third result in Theorem 3.1 that (λA)B = A(λB) = λAB. 35. Prove the fourth result in Theorem 3.1 that A(B + C) = AB + AC. In Exercises 36 through 39 verify that (AB)T = BT AT . ⎤ ⎤ ⎡ ⎡ 2 1 3 3 1 4 36. A = ⎣2 1 2⎦ , B = ⎣1 2 5⎦ . 0 2 1 4 2 3 ⎤ ⎡ ⎤ ⎡ 1 4 3 2 1 4 3 ⎢ 2 1 5⎥ ⎥ 2 1⎦ , B = ⎢ 37. A = ⎣1 6 ⎣−1 3 2⎦ . 1 1 −2 4 1 7 3 ⎤ ⎤ ⎡ ⎡ 3 1 −5 1 4 2 4⎦ . 38. A = ⎣7 3 −1⎦ , B = ⎣1 3 2 0 8 0 2 5 ⎤ ⎡ ⎤ ⎡ 1 2 1 1 4 6 2 ⎢−2 1 4⎥ ⎥ 39. A = ⎣2 1 4 1⎦ , B = ⎢ ⎣ 2 2 5⎦ . 3 0 0 2 1 1 1 40. Verify that (ABC)T = CT BT AT given that 3X + A = A B − X + 3B. T 30. If ⎡ 2 A = ⎣1 3 1 2 0 ⎤ ⎤ ⎡ 1 4 1 4 1⎦ , B = ⎣2 1 2⎦ , 1 1 2 2 ⎡ ⎤ x1 x2 x3 X = ⎣ y1 y2 y3 ⎦ , z1 z2 z3 solve for X given that 2ABT + X − 2I = 3X + 4B − 2A. 31. Given that ⎡ 3 A = ⎣2 2 2 2 0 ⎤ 2 0⎦ , 4 show that A3 − 9A2 + 18A = 0. 32. Given that ⎡ 0 A = ⎣0 2 1 0 1 ⎤ 0 1⎦ , −2 show that A3 + 2A2 − A − 2I = 0. A= and 1 5 , 3 1 B= 3 −2 , 4 5 C= and −2 5 41. Prove that if D is the n × n diagonal matrix ⎡ ⎤ k1 0 0 · · · 0 ⎢ 0 k2 0 · · · 0 ⎥ ⎢ ⎥ D = ⎢ 0 0 k3 · · · 0 ⎥ , then ⎢ ⎥ ⎣. . . . . . . . . . . . ⎦ 0 0 0 · · · kn ⎡ m ⎤ k1 0 0 · · · 0 m ⎢ 0 k2 0 · · · 0⎥ ⎢ ⎥ m 0 k3 · · · 0 ⎥ , Dm = ⎢ 0 ⎢ ⎥ ⎣. . . . . . . . . . . . . ⎦ m 0 0 0 · · · kn where m is a positive integer. 42. Find A2 , A3 , and A4 , given that ⎤ ⎡ 1 2 7 6⎦ . A = ⎣2 5 1 0 −1 43. Find A2 , A4 , and A6 , given that 1/2 A= √ ( 3)/2 √ −( 3)/2 1/2 . 3 . 7
  • 138. 120 Chapter 3 Matrices and Systems of Linear Equations 44. Use the matrix A in Exercise 42 to find A3 , A5 , and A7 . 45. A square matrix A such that A2 = A is said to be idempotent. Find the three idempotent matrices of the form A= 1 q p . r 46. A square matrix A such that for some positive integer n has the property that An−1 = 0, but An = 0 is said to be nilpotent of index n (n ≥ 2). Show that the matrix ⎤ ⎡ 0 0 0 0 0⎦ A = ⎣4 1 −1 0 is nilpotent and find its index. 47. A quadratic form in the variables x1 , x2 , x3 , . . . , xn is 2 2 an expression of the form ax1 + bx1 x2 + cx2 + dx1 x3 + 2 · · · + f xn−1 xn + gxn in which some of the coefficients a, b, c, d, . . . , f, g may be zero. Explain why xT Ax is a quadratic form and find the quadratic form for which ⎤ ⎡ ⎡ ⎤ x1 3 4 0 3 ⎢x2 ⎥ ⎢4 2 2 6⎥ ⎥ ⎢ ⎥ A=⎢ ⎣0 2 5 1⎦ and x = ⎣x3 ⎦ . 3 6 1 7 x4 48. Find the quadratic form xT Ax when ⎤ ⎡ ⎤ ⎡ x1 4 1 3 6 ⎢x2 ⎥ ⎢2 3 5 4⎥ ⎥ ⎢ ⎥ A=⎢ ⎣1 4 1 2⎦ and x = ⎣x3 ⎦ . 2 0 4 1 x4 49. Explain why the matrix A in the general expression for 3.2 a quadratic form xT Ax can always be written as a symmetric matrix. In Exercises 50 through 52 find the symmetric matrix A for the given quadratic form when written xT Ax, with x = [x, y, z]T . 50. 51. 52. 53. x 2 + 3xy − 4y2 + 4xz + 6yz − z2 . 2x 2 + 4xy + 6y2 + 7xz − 9z2 . 7x 2 + 7xy − 5y2 + 4xz + 2yz − 9z2 . A square matrix P is called a stochastic matrix if all its elements are nonnegative and the sum of the elements in each row is 1. Thus, the matrix ⎡ ⎤ p11 p12 · · · p1n ⎢p p22 · · · p2n ⎥ ⎥ P = ⎢ 21 ⎣. . . . . . . . . . . ⎦ pn1 pn2 · · · pnn will be a stochastic matrix if pi j ≥ 0 for 0 ≤ i ≤ n, 0 ≤ j ≤ n, and n pi j = 1 for i = 1, 2, . . . , n. j=1 Let the n element column vector E = [1, 1, 1, . . . , 1]T . By considering the matrix product PE, and using mathematical induction, prove that Pm is a stochastic matrix for all positive integral values of m. 54. Construct a 3 × 3 stochastic matrix P. Find P2 and P3 , and by showing that all elements of these matrices are nonnegative and that all their row-sums are 1, verify the result of Exercise 53 that each of these matrices is a stochastic matrix. Some Problems That Give Rise to Matrices (a) Electric Circuits with Resistors and Applied Voltages A simple electric circuit involving five resistors and three applied voltages is shown in Fig. 3.1. The directions of the currents i 1 , i 2 , and i 3 flowing in each branch of the circuit are shown by arrows. The currents themselves can be determined by an application of Ohm’s law and the Kirchhoff laws that can be stated as follows: (a) Voltage = current × resistance (Ohm’s law); (b) The algebraic sum of the potential drops around each closed circuit is zero (Kirchhoff’s second law); (c) The current entering each junction must equal the algebraic sum of the currents leaving it (Kirchhoff’s first law). equations and matrices for electric circuits An application of these laws to the circuit in Fig. 3.1, where the potentials are in volts, the resistances are in ohms, and the currents are in amps, leads to the following
  • 139. Section 3.2 Some Problems That Give Rise to Matrices 8 121 12 i1 10 8 i2 4 i3 6 4 6 FIGURE 3.1 An electric circuit with resistors and applied voltages. set of simultaneous equations: 8 = 12i 1 + 10(i 1 − i 2 ) + 8(i 1 − i 3 ) 4 = 10(i 2 − i 1 ) + 6(i 2 − i 3 ) 6 = 8(i 3 − i 1 ) + 6(i 3 − i 2 ) + 4i 3 . After collecting terms this system can be written as the matrix equation Ax = b, with ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 8 30 −10 −8 i1 16 −6⎦ , x = ⎣i 2 ⎦ , b = ⎣4⎦ . A = ⎣−10 6 −8 −6 18 i3 The directions assumed for the currents ir for r = 1, 2, 3 are shown by the arrows in Fig. 3.1, but if after the system of equations is solved, the value of the current is found to be negative, the direction of its arrow must be reversed. The circuit in Fig. 3.1 is simple, so in this example the currents can be found by routine elimination between the three equations. When many coupled circuits are involved a matrix approach is useful, and it then becomes necessary to develop a method for solving for x the matrix equation Ax = b, the elements of which are the required currents. If the number of equations is small, x can be found by making use of the matrix A−1 , inverse to A, that will be introduced later, though the most computationally efficient approach is to use one of the numerical methods for solving systems of linear simultaneous equations described in Chapter 19. (b) Combinatorial Problems: Graph Theory Matrices play an important role in combinatorial problems of many different types and, in particular, in graph theory. The purpose of the brief account offered here will be to illustrate a particular application of matrices, and no attempt will be made to discuss their subsequent use in the solution of the associated problems. Combinatorial problems involve dealing with the possible arrangements of situations of various different kinds, and computing the number and properties of such arrangements. The arrangements may be of very diverse types, involving at one extreme the ordering of matches that are to take place in a tennis tournament,
  • 140. 122 Chapter 3 Matrices and Systems of Linear Equations 4 5 3 1 2 FIGURE 3.2 The graph representing routes. graphs, vertices, edges, and adjacency matrix and at the other extreme finding an optimum route for a delivery truck or for the most efficient routing of work through a machine shop. The ideas involved are most easily illustrated by means of examples, the first of which involves the delivery from a storage depot of a consumable product to a group of supermarkets in a large city where it is important that daily deliveries be made as rapidly as possible. One possibility involves a delivery truck making a delivery to each supermarket in turn and returning to the storage depot between each delivery before setting out on the next delivery. An alternative is to travel between supermarkets after each delivery without returning to the storage depot. The question that then arises is which approach to routing is the best, and how it is to be determined. A typical situation is illustrated in Fig. 3.2, in which supermarkets numbered 1 to 5 are involved, with circles representing supermarkets and lines and arcs representing the routes. The representation in Fig. 3.2 is called a graph, and it is to be regarded as a set of points represented by the circles called vertices of the graph, and edges of the graph represented by the lines and arcs. In Fig. 3.2 the vertices are the circles 1, 2, . . . , 5 and the seven edges are the lines and arcs connecting the vertices. A special type of matrix associated with such a graph is an adjacency matrix, that is, a matrix whose only entries are 0 or 1. The rules for the entries in an adjacency matrix A = [ai j ] are that ai j = 1, if vertices i and j are joined by an edge 0, otherwise. The adjacency matrix for the graph in Fig. 3.2 is seen to be the symmetric matrix ⎤ ⎡ 0 1 0 1 1 ⎢1 0 1 0 0⎥ ⎥ ⎢ A = ⎢0 1 0 1 1⎥ . ⎥ ⎢ ⎣1 0 1 0 1⎦ 1 0 1 1 0 It is to be expected that an adjacency matrix is symmetric, because if i is adjacent to j, then j is adjacent to i. Although we shall not attempt to do so here, the interconnection properties of the problem represented by the graph in Fig. 3.2 can be analyzed in terms of its adjacency matrix A. The optimum routing problem can then be resolved once the traveling times along roads (lines or arcs) are known. Sometimes it happens that the edges in a graph represent connections that only operate in one direction, so then arrows must be added to the graph to indicate these
  • 141. Section 3.2 FIGURE 3.3 A typical digraph. digraph Some Problems That Give Rise to Matrices 123 FIGURE 3.4 The Konigsberg bridge problem. ¨ directions. A graph of this type is called a digraph (directed graph). The rules for the entries in the adjacency matrix A = [ai j ] of a digraph are that ai j = 1, 0, if vertices i and j are joined by an edge with an arrow from i to j otherwise. A typical digraph is shown in Fig. 3.3, and it has the associated adjacency matrix ⎡ ⎤ 0 1 0 1 ⎢0 0 1 0⎥ ⎥ A=⎢ ⎣1 0 0 0⎦ . 0 0 1 0 K¨ nigsberg o bridge problem The adjacency matrix A characterizes all the possible interconnections between the four vertices and, as with the previous example, an analysis of the properties of any situation capable of representation in terms of this digraph can be performed using the matrix A. Problems of this type can arise in transportation problems in cities with one-way streets, and in chemical processes where a fluid is piped to different parts of a plant through an interconnecting network of pipes through which fluid may only flow in a given direction. Before closing this brief introduction to graph theory, mention should be made of a problem of historical significance, since it represented the start of graph theory ¨ as it is known today. The problem is called the Konigsberg bridge problem, and it was solved by Euler (1707–1783). During the early 18th century the Prussian town of Konigsberg was established on two adjacent islands in the middle of the river ¨ Pregel. The islands were linked to the land on either side of the river, and to one another, by seven bridges, as shown in Fig. 3.4a. It was suggested to Euler that he should resolve the conjecture that it ought to be possible to walk through the town, starting and ending at the same place, while crossing each of the seven bridges only once.
  • 142. 124 Chapter 3 Matrices and Systems of Linear Equations Euler replaced the picture in Fig. 3.4a by the graph in Fig. 3.4b, though it was not until much later that the term graph in the sense used here was introduced. In Fig. 3.4b the vertices S and Q represent the two islands and, using the same lettering, P and R represent the riverbanks. The number of edges incident on each vertex represents the number of bridges connected to the corresponding land mass. Euler introduced the concept of a connected graph, in which each pair of vertices is linked by a set of edges, and also what is now called an eulerian circuit, comprising a path through all vertices that starts and ends at the same vertex and uses every edge only once. He called the number of edges incident upon a vertex the degree of the vertex, and by using these ideas he was able to prove the impossibility of the conjecture. The arguments involved are not difficult, but their details would be out of place here. Many more practical problems are capable of solution by graph theory, which itself belongs to the branch of mathematics called combinatorics. In elementary accounts, graph theory and related combinatorial issues are usually called discrete mathematics. More information about combinatorics and its connection with matrices can be found in References [2.2] and [2.13]. (c) Translations, Rotations, and Scaling of Graphs: Computer Graphics matrices and computer graphics The simplest operations in computer graphics involve copying a picture to a different location, rotating a picture about a fixed point, and scaling a picture, where the scaling can be different in the horizontal and vertical directions. These operations are called, respectively, a translation, a rotation, and a scaling of the picture. Operations of this nature can all be represented in terms of matrices, and they involve what are called linear transformations of the original picture. Translation A translation of a two-dimensional picture involves copying it to a different location without either rotating it or changing its horizontal and vertical scales. Figure 3.5 shows the original cartesian axes O(x, y) and the shifted axes O (x , y ), where the respective axes remain parallel to their original directions and the origin O is located at the point (h, k) relative to the O(x, y) axes. The relationship between the two sets of coordinates is given by x = x +h y k O and y = y + k. y′ O′ x′ h FIGURE 3.5 A translation. x
  • 143. Section 3.2 Some Problems That Give Rise to Matrices 125 y y P x φ Q θ x O FIGURE 3.6 A rotation through an angle θ. If x = [x, y]T , x = [x , y ]T , and b = [h, k]T , the coordinate transformation can be written in matrix form as x = x + b, where matrix b represents the translation. Rotation A rotation of the coordinate axes through an angle θ is shown in Fig. 3.6, where P(x, y) is an arbitrary point. The coordinates of P in the (x, y) reference frame and the (x , y ) reference frame are related as x = OR = OP cos(φ + θ ) = OP cos φ cos θ − OP sin φ sin θ = OQ cos θ − PQ sin θ = x cos θ − y sin θ, and y = P R = OP sin(φ + θ) = OP sin φ cos θ + OP cos φ sin θ = PQ cos θ + OQ sin θ = y cos θ + x sin θ, so x = x cos θ − y sin θ and y = y cos θ + x sin θ. Defining the matrices x, x , and R as x= x , y x = x , y R= and cos θ sin θ allows the coordinate transformation to be written as x = Rx . Scaling If S is a matrix of the form S= kx 0 0 , ky −sin θ cos θ
  • 144. 126 Chapter 3 Matrices and Systems of Linear Equations where kx and ky are positive constants, and x = Sx, it follows that x = kx x and y = ky y , showing that x is obtained by scaling x by kx , while y is obtained by scaling y by ky . This form of scaling is represented by premultiplication of x by S, and if, for example, 4 0 S= 0 , 3 the effect of this transformation on a circle of radius a will be to map it into an ellipse with semimajor axis of length 4a parallel to the x-axis and a semiminor axis of length 3a parallel to the y-axis. Composite transformations By combining the preceding matrix operations to form a composite transformation, it is possible to carry out several transformations simultaneously. As an example, the effect of a rotation R followed by a translation b when performed on a vector x are seen to be described by the matrix equation x = Rx + b, the effect of which is shown in Fig. 3.7. If a scaling S is performed before the rotation and translation, the effect on a vector x is described by the matrix equation x = RSx + b. This is illustrated in Fig. 3.8b, which shows the effect when a transformation of this type is performed on the circle of radius a centered on the origin shown in Fig. 3.8a, with b= h , k R= cos π/3 sin π/3 −sin π/3 , cos π/3 and S= 3 0 0 . 2 It is seen that the circle has first been scaled to become an ellipse with semiaxes 3a and 2a, after which the ellipse has been rotated through an angle π/3, and finally its center has been translated to the point (h, k). y P(x, y) y′ φ k O x′ θ O′ h FIGURE 3.7 A rotation and a translation. x
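The composite transformation x = RSx + b applied to the circle is straightforward to reproduce numerically. The sketch below is an illustrative aside (assuming Python with NumPy); the radius a and the translation (h, k) are arbitrary values chosen only for the demonstration.

```python
import numpy as np

theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])    # rotation through pi/3
S = np.diag([3.0, 2.0])                             # scale x by 3 and y by 2
a, h, k = 1.0, 4.0, 5.0                             # radius and translation
b = np.array([h, k])

# Sample points on the circle of radius a centred at the origin.
t = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
circle = a * np.vstack([np.cos(t), np.sin(t)])      # shape (2, 8)

# Scale first, then rotate, then translate: x = R S x' + b.
image = R @ S @ circle + b[:, None]
print(image.T.round(3))   # points on the scaled, rotated, translated ellipse
```

Reversing the order of R and S in the last line gives the transformation x = SRx + b discussed next, which in general produces a different image.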
  • 145. Section 3.2 Some Problems That Give Rise to Matrices 127 y x′ 3a y y′ x′ 2a y′ 0 a π /3 k π/3 −2a x 0 h x −3a (a) (b) FIGURE 3.8 The composite transformation x = RSx + b. It is essential to remember that the order in which transformations are performed will, in general, influence the result. This is easily seen by considering the two transformations x = RSx + b and x = SRx + b. If the first of these is performed on the circle in Fig. 3.8a, it produces Fig. 3.8b, but when the second is performed on the same circle, it first converts it into an ellipse with its major axis horizontal, and then translates the center of the ellipse to the point (h, k). In this case the effect of the rotation cannot be seen, because the circle is symmetric with respect to rotations. A relationship of the form x = F(x ) can be interpreted geometrically in two distinct ways which are equally valid: 1. As the change in the way we describe the location of a point P. Then the relationship is called a transformation of coordinates (Figs. 3.5, 3.6, 3.7). 2. As a mapping of a point P from one location to a new one. (d) Matrix Analysis of Framed Structures A framed structure is a network of straight struts joined at their ends to form a rigid three-dimensional structure. A typical framed structure is the steel work for a large building before the walls and floors have been added. A simple example of a framed structure, called a truss, is a plane construction in which the struts are joined together to form triangles, as in the side section of the small bridge shown in Fig. 3.9. For safety, to ensure that no strut fails when the bridge carries the largest permitted load, it is necessary to determine the force experienced by each strut in the truss when the bridge supports its maximum load in several different positions. Typically, the largest load could be due to a heavy truck crossing the bridge. The analysis of trusses is usually simplified by making the following assumptions: r The structure is in the vertical plane; r The weight of each strut can be neglected; r Struts are rigid and so remain straight;
  • 146. 128 Chapter 3 Matrices and Systems of Linear Equations F4 B B F1 D A A C E FIGURE 3.9 A typical truss found in a side section of a bridge. π /3 F3 F5 π /3 F2 L /2 L /2 D π /3 C F7 F6 π /3 L R1 E R2 3m FIGURE 3.10 A truss supporting a concentrated load. r Each joint is considered to be hinged, so the only forces acting at a joint are along the struts meeting at the joint if forces are applied at joints only. r There are no redundant struts, so that removing a strut will cause the truss to collapse. We now write down the simultaneous equations that must be solved to find the forces acting in the seven struts of length L that form the truss shown in Fig. 3.10, when a concentrated load 3m is located at point C midway between A and E. This load could be considered to be a heavily laden truck standing in the center of the bridge. To determine the reactions at the support points A and E, we use the fact that for equilibrium the turning moments about these two points must be zero. The turning moment of the load 3m about the point A must be cancelled by the turning moment of the reaction R2 at E, so 3m(L) = R2 (2L), showing that R2 = 3m/2. Similarly, the turning moment of the load 3m about the point E must be cancelled by the turning moment of the reaction R1 at A, so 3m(L) = R1 (2L), showing that R1 = 3m/2. The directions in which the forces F1 to F7 are assumed to act are shown by arrows, and if later a force is found to be negative, the direction of the associated arrows must be reversed. For equilibrium the sum of the vertical components of all forces acting at each joint must be zero, as must be the sum of the horizontal components of all forces acting at each joint. The equations representing the balance of forces at each joint are as follows, where when resolving the forces acting at joint C, the effect of the load 3m which acts vertically downwards must be taken into account: equations and matrices for a framed structure Joint A(vertical) Joint A(horizontal) Joint B(vertical) Joint B(horizontal) Joint C(vertical) Joint C(horizontal) Joint D(vertical) F1 sin π/3 − 3m/2 = 0 F1 cos π/3 + F2 = 0 F1 sin π/3 + F3 sin π/3 = 0 F1 cos π/3 − F3 cos π/3 − F4 = 0 F3 sin π/3 + F5 sin π/3 + 3m = 0 F2 + F3 cos π/3 − F5 cos π/3 − F6 = 0 F5 sin π/3 + F7 sin π/3 = 0
  • 147. Section 3.2 Some Problems That Give Rise to Matrices 129 F4 + F5 cos π/3 − F7 cos π/3 = 0 F7 sin π/3 − 3m/2 = 0 F6 + F7 cos π/3 = 0. Joint D(horizontal) Joint E(vertical) Joint E(horizontal) After substituting for sin π/3 and cos π/3, these equations can be written in the matrix form Ax = b, where ⎤ ⎡1√ 3 0 0 0 0 0 0 2 ⎢ 1 ⎥ ⎤ ⎡ 1 0 0 0 0 0 ⎥ ⎢ 2 3m/2 ⎡ ⎤ ⎥ ⎢ √ √ F1 ⎢ 0 ⎥ ⎢1 3 0 1 3 0 0 0 0 ⎥ ⎥ ⎢ ⎥ ⎢2 2 ⎢ ⎥ ⎢ 0 ⎥ ⎥ ⎢ 1 ⎢ F2 ⎥ 1 ⎥ ⎢ ⎢ 0 − 2 −1 0 0 0 ⎥ ⎢ ⎥ ⎢ 0 ⎥ ⎥ ⎢ 2 ⎢ F3 ⎥ √ √ ⎥ ⎢ ⎥ ⎢ 1 1 ⎢ ⎥ 0 2 3 0 2 3 0 0 ⎥ ⎥ ⎢ ⎢ 0 ⎥ , x = ⎢ F4 ⎥ , b = ⎢ −3m ⎥ . ⎢ A=⎢ ⎢ ⎥ 1 ⎢ 0 ⎥ ⎢ ⎥ 0 1 0 − 1 −1 0 ⎥ ⎥ ⎢ ⎢ 2 2 ⎢ F5 ⎥ √ √ ⎥ ⎢ 0 ⎥ ⎥ ⎢ ⎢ ⎥ ⎥ ⎢ ⎢ 0 0 1 3⎥ 0 0 0 1 3 ⎢ ⎥ 2 2 ⎢ 0 ⎥ ⎥ ⎢ ⎣ F6 ⎦ ⎥ ⎢ ⎢ 0 1 1 ⎥ 0 −2 ⎥ 0 0 1 ⎢ ⎣3m/2⎦ 2 F7 ⎢ √ ⎥ ⎢ 0 0 0 0 0 0 0 1 3⎥ ⎦ ⎣ 2 0 0 0 0 0 1 1 2 These are 10 equations for the 7 unknown forces F1 to F7 , so unless 3 of the equations represented in Ax = b are combinations of the remaining 7 equations, we cannot expect there to be a solution. When the rank of a matrix is introduced in Section 3.6, we will see how systems of this type can be checked for consistency and, when appropriate, simplified and solved. In this case the equations are sufficiently simple that they can be solved sequentially, without the use of matrices. The solution is seen to be √ √ √ √ F1 = m 3, F2 = −m/( 3/2), F3 = −m 3, F4 = m 3, √ √ √ F5 = −m 3, F6 = −m( 3/2), F7 = m 3. k1 m1 x1 The signs show that the arrows in Fig. 3.10 associated with forces F2 , F3 , F5 , and F6 should be reversed, so these struts are in tension, while the others are in compression. Notice that matrix A is determined by the geometry of the truss, and so does not change when forces are applied to more than one of the joints on the truss (bridge). This means that after the 10 equations have been reduced to seven, the same modified matrix A can be use to find the forces in the struts for any form of concentrated loading. Had a more complicated struss been involved, many more equations would have been involved, so that a matrix approach becomes necessary. This approach also identifies any redundant struts in a structure, because the force in a redundant strut is indeterminate. (e) A Compound Mass–Spring System k2 x2 m2 FIGURE 3.11 A compound mass–spring system. Matrices can have variables as elements, and an analysis of the compound mass– spring system shown in Fig. 3.11 shows one way in which this can arise. Figure 3.11 represents a mass m1 suspended from a rigid support by a spring of negligible mass with spring constant k1 , and a mass m2 suspended from mass m1 by a spring of negligible mass with spring constant k2 . The vertical displacement of m1 from its
  • 148. 130 Chapter 3 Matrices and Systems of Linear Equations equations of motion of a coupled mass–spring system equilibrium position is x1 , and the vertical displacement of m2 from its equilibrium position is x2 . Each spring is assumed to be linearly elastic, so the restoring force exerted by a spring is equal to the product of the displacement from its equilibrium position and the spring constant. The product of the mass m1 and its acceleration is m1 d2 x1 /dt 2 , and the restoring force due to spring k1 is k1 x1 , while the restoring force due to spring k2 is k2 (x1 − x2 ), so the equation of motion of m1 is d2 x1 = −k1 x1 − k2 (x1 − x2 ). dt 2 Similarly, the equation of motion of m2 is m1 d2 x2 = −k2 (x2 − x1 ), dt 2 where the negative signs are necessary because the springs act to restore the masses to their original positions. ¨ This system can be written as the matrix differential equation x + Ax = 0, by defining A and x as ⎡ 2 ⎤ ⎡ (k + k ) k2 ⎤ d x1 1 2 − ⎢ dt 2 ⎥ ⎥ ⎢ m1 m1 ⎥ x ⎢ ⎥ ¨ A=⎢ , x = 1 , and x = ⎢ ⎥. ⎣ x2 ⎣ d2 x2 ⎦ k2 ⎦ k2 − m2 m2 dt 2 The solution of this system will not be considered here as ordinary differential equations and systems of the type derived here are discussed in detail in Chapter 6, where matrix methods are also developed. Chapter 7 develops Laplace transform methods for the solution of differential equations and systems. It will suffice to mention here that the dynamical behavior of the compound mass–spring system in Fig. 3.11 is completely characterized by matrix A. m2 (f) Stochastic Processes Certain problems arise that are not of a deterministic nature, so that both the formulation of the problem and its outcome must be expressed in terms of probabilities. The probability p that a certain event occurs is a number in the interval 0 ≤ p ≤ 1. An event with probability p = 0 is one that never occurs, and an event with probability p = 1 is one that is certain to occur. So, for example, when tossing a coin N times and recording each outcome as an H (head) or a T (tail), if the number of heads is NH and the number of tails is NT , so that N = NH + NT , the numbers NH /N and NT /N will be approximations to the respective probabilities that a head or a tail occurs when the coin is tossed. If the coin is unbiased, it is reasonable to expect that as N increases both NH /N and NT /N will approach the value 1/2. This will mean, of course, that the chances of either a head or a tail occurring on each occasion are equal. The example we now outline is called a stochastic process and is illustrated by considering a process that evolves with time and is such that at any given moment it may be in precisely one of N different situations, usually called states, where N is finite. We shall denote the N states in which the process may find itself at any given time tm by S1 , S2 , . . . , SN , with m = 0, 1, 2, . . . , and tm−1 < tm, it being assumed that the outcome at each time depends on probabilities, and so is not deterministic.
  • 149. Section 3.2 Some Problems That Give Rise to Matrices 131 To formulate the problem we assume that what are called the conditional probabilities pki (also called transition probabilities) that determine the probability with which the process will be in state S j at time tm are all known, given that it was in state Sk at time tm−1 , and that these probabilities are the same from t1 to t2 as from tm−1 to tm for m = 0, 1, 2, . . . . This last assumption means that the probability with which the transition from state Sk to S j occurs is independent of the time at which the process was in state Sk. The conditional probabilities can be arranged as the N × N matrix P = [ p jk], so as probabilities are involved, all the p jk are nonnegative, and as each stage must have an outcome, the sum of the elements in every row of matrix P must equal 1. A matrix P with these properties, namely that N 0 ≤ p jk ≤ 1, 0 ≤ j ≤ N, 0 ≤ k ≤ N, p jk = 1, and j=1 stochastic matrix and a Markov process is called a stochastic matrix (see Exercise 53, Section 3.1). Processes like these, whose condition at any subsequent instant does not depend on how the process arrived at its present state, are called Markov processes. Simple but typical examples of such processes involving only two states are gambling wins and losses, the reliability of machines that may either be operational or under repair, shells fired from a gun that either hit or miss the target and errors that introduce an incorrect digit 1 or 0 when transferring binary coded information. To develop the argument a little further, let us now consider a process that can be in one of two states, and that the matrix P describing its transitions is given by P= 2/3 1/3 . 1/4 3/4 Now suppose that initially the probability distribution is given by the row matrix E(0) = [ p, q], where, of course, p + q = 1. Then if E(m) denotes the probability distribution of the states at time tm, it follows that E(1) = E(0)P, but as P is independent of the time we conclude that after m transitions the general result must be E(m) = E(0)Pm, so in this case E(m) = [ p, q] 2/3 1/3 1/4 3/4 m . Direct calculation shows that E(3) = [0.470 p + 0.398q, 0.530 p + 0.602q], E(6) = [0.432 p + 0.426q, 0.568 p + 0.574q], and E(10) = [0.429 p + 0.429q, 0.571 p + 0.571q], so it is reasonable to ask if E(m) tends to a limiting vector as m → ∞ and, if so, what this is? As this problem is simple, an analytical answer is possible, though it involves using a diagonalizing matrix P which will be discussed later.
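The values of E(3), E(6), and E(10) quoted above are easy to check numerically before any analytical machinery is introduced. The following short sketch uses Python with the numpy library (our choice of tool, not part of the text); it simply forms E(m) = E(0)P^m for a sample initial distribution with p + q = 1.

```python
import numpy as np

# Transition matrix of the two-state process discussed above.
P = np.array([[2/3, 1/3],
              [1/4, 3/4]])

p, q = 0.25, 0.75           # any initial distribution with p + q = 1
E0 = np.array([p, q])       # E(0) as a row vector

for m in (3, 6, 10, 50):
    Pm = np.linalg.matrix_power(P, m)   # P raised to the mth power
    Em = E0 @ Pm                        # E(m) = E(0) P^m
    print(m, Pm.round(3), Em.round(4))
```

For m = 3, 6, and 10 the entries of P^m reproduce the coefficients 0.470, 0.398, 0.530, 0.602, and so on, quoted above, and for m = 50 both rows of P^m agree with [3/7, 4/7] ≈ [0.4286, 0.5714] to many decimal places, anticipating the limiting distribution found analytically on the next page.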
  • 150. 132 Chapter 3 Matrices and Systems of Linear Equations We will see later that P can be written as ADB, where D is a diagonal matrix and AB = I. In this case 1 1 A= 4 , −3 1 0 D= 0 , 5/12 and B= 3/7 1/7 1 0 0 5/12 3/7 1/7 4/7 , −1/7 4/7 . −1/7 so P= 1 1 4 −3 In what follows we will need to make repeated use of the fact that BA = 3/7 1/7 4/7 −1/7 1 1 4 1 0 = = I. −3 0 1 Using this last property we find that P2 = = 1 1 4 −3 1 4 1 −3 1 0 0 5/12 1 0 0 5/12 3/7 1/7 2 4/7 −1/7 3/7 1/7 1 1 4 −3 1 0 0 5/12 3/7 1/7 4/7 −1/7 4/7 . −1/7 However, when a diagonal matrix is raised to a power, each of its elements is raised to that same power (see Problem 41, Section 3.1), so P2 = 1 1 4 −3 1 0 0 (5/12)2 3/7 1/7 4/7 −1/7 Pm = 1 1 4 −3 1 0 0 (5/12)m 3/7 1/7 4/7 . −1/7 and, in general, Thus, ⎡ 3 + 4(5/12)m ⎢ 7 Pm = ⎢ ⎣ 3 − 3(5/12)m 7 ⎤ 4 − 4(5/12)m ⎥ 7 ⎥, 4 + 3(5/12)m ⎦ 7 showing that as m → ∞, so lim E(m)Pm = [3( p + q)/7, 4( p + q)/7] = [3/7, 4/7], m→∞ and we have found the limiting state of the system. Stochastic processes also occur that involve more than two states. The problem of determining the probability with which such processes will be in a given state, and when a limiting state exists, the limiting values of the probabilities involved, is of considerable practical importance. An introduction to stochastic process can be found in reference [2.4]. Summary This section has introduced some of the many areas in which matrices play an essential role. These range from electric circuits needing the application of Kirchhoff’s laws, through routing problems involving the concepts of directed graphs and adjacency matrices, to the classical K¨ nigsberg bridge problem, computer graphic operations performed by linear o transformations, the matrix analysis of forces in a framed structure, the oscillations of a coupled mass–spring system, and stochastic processes.
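The decomposition P = ADB and the limit just obtained can also be confirmed numerically. The sketch below (Python with numpy, our choice of tool) rebuilds P from the matrices A, D, and B quoted above, checks that BA = I, and evaluates A D^m B for a large m to exhibit the limiting matrix whose rows are [3/7, 4/7].

```python
import numpy as np

A = np.array([[1.0,  4.0],
              [1.0, -3.0]])
D = np.diag([1.0, 5/12])
B = np.array([[3/7,  4/7],
              [1/7, -1/7]])

P = A @ D @ B                                      # should reproduce P
print(np.allclose(P, [[2/3, 1/3], [1/4, 3/4]]))    # True
print(np.allclose(B @ A, np.eye(2)))               # True: BA = I

# P^m = A D^m B, and D^m -> diag(1, 0) as m -> infinity.
Pm = A @ np.diag([1.0, (5/12)**60]) @ B
print(Pm)   # both rows are effectively [3/7, 4/7] = [0.4286..., 0.5714...]
```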
  • 151. Section 3.3 Determinants 133 EXERCISES 3.2 1. State which of the following matrices is a stochastic matrix, giving a reason when this is not the case. ⎤ ⎡ ⎤ ⎡ 0.5 0.2 0.3 0.5 0.3 0.2 (c) ⎣0.7 0.3 0.2⎦ . (a) ⎣0.25 0 0.75⎦. 0.4 0.2 0.4 0.5 0.5 0 ⎤ ⎤ ⎡ ⎡ 0.3 0.1 0.6 1.2 0 −0.2 0.2⎦ . (d) ⎣0.8 0 0.2⎦ . (b) ⎣ 0 0.8 0 1 0 0.6 0.3 0.1 4. 2. Given the stochastic matrix 5. 3/4 P= 1/2 4 3 1 2 FIGURE 3.13 1/4 1/2 4 5 and the initial probability distribution E(0) = [ p, q], with p, q ≥ 0 and p + q = 1, the probability distribution of the two states at time tm is given by 3 E(m) = E(0)Pm. Find E(2), E(4), and E(6), together with their values when p = 1/4, q = 3/4. In Exercises 3 through 6 find the adjacency matrices for the given graphs and digraphs. FIGURE 3.14 6. 4 3. 5 1 5 6 6 3 4 2 3 2 FIGURE 3.15 FIGURE 3.12 3.3 1 Determinants Every square matrix A with numbers as elements has associated with it a single unique number called the determinant of A, which is written detA. If A is an n × n matrix, the determinant of A is indicated by displaying the elements ai j of A between two vertical bars as follows: notation for a determinant a11 a21 detA = ··· an1 a12 a22 ··· an2 · · · a1n · · · a2n . ··· ··· · · · ann (5) The number n is called the order of determinant A, and in (5) the vertical bars are used to distinguish detA, that is a number, from matrix A that is an n × n array of numbers.
134 Chapter 3 Matrices and Systems of Linear Equations

A general definition of the value of detA in terms of its elements aij will be given later, so for the moment we define only the value of first and second order determinants (see Section 1.7). If A only contains a single element a11, so A = [a11], then, by definition, detA = a11, and if A is the 2 × 2 matrix

    A = [ a11  a12 ]
        [ a21  a22 ] ,

then, by definition,

    detA = | a11  a12 | = a11 a22 − a21 a12.                     (6)
           | a21  a22 |

Notice that in (6) the numerical value of detA is obtained by forming the product of the two terms a11 and a22 on the leading diagonal, and subtracting from it the product of the two terms a21 and a12 on the cross diagonal. This process, called expanding the determinant, is easily remembered by drawing a downward arrow through a11 and a22 and an upward arrow through a21 and a12: the product along the downward arrow, a11 a22, generates the first term on the right, and the product along the upward arrow, a21 a12, is the term to be subtracted.

EXAMPLE 3.10

Find detA given

    (a) detA = | 3  −1 |        and    (b) detA = | 1 + i    i |
               | 2   6 |                          |  −3i     2 | .

Solution
(a) Using (5) we have

    detA = | 3  −1 | = 3 · 6 − 2 · (−1) = 20.
           | 2   6 |

(b) Again using (5) we have

    detA = | 1 + i    i | = (1 + i) · 2 − (−3i) · i = −1 + 2i.
           |  −3i     2 |

To provide some motivation for the introduction of determinants, we solve by elimination the two linear simultaneous algebraic equations

    a11 x1 + a12 x2 = b1
    a21 x1 + a22 x2 = b2.                                        (7)

To eliminate x2 we multiply the first equation by a22 and the second equation by a12, and then subtract the results to obtain

    (a11 a22 − a21 a12) x1 = a22 b1 − a12 b2.

This shows that when a11 a22 − a21 a12 ≠ 0,

    x1 = (a22 b1 − a12 b2) / (a11 a22 − a21 a12).
Section 3.3 Determinants 135

This result can be expressed in terms of detA as

    x1 = (a22 b1 − a12 b2)/detA.                                 (8)

Similarly, when x1 is eliminated from equations (7) we find that

    x2 = (a11 b2 − a21 b1)/detA.                                 (9)

Cramer's rule for a system of two equations

Examination of (8) and (9) shows that their numerators can be written in terms of determinants that are closely related to detA, because

    x1 = D1/D,    x2 = D2/D,                                     (10)

where

    D = detA,    D1 = | b1  a12 | ,    D2 = | a11  b1 | .        (11)
                      | b2  a22 |           | a21  b2 |

The form of solution of equations (7) in terms of the determinants in (10) and (11) is called Cramer's rule. The rule itself says that xi = Di/D for i = 1, 2, where determinant D1 is obtained from D = detA by replacing the first column of A by the nonhomogeneous terms b1 and b2 on the right of equations (7), and determinant D2 is obtained from D by replacing the second column of A by these same two terms.

EXAMPLE 3.11

Use Cramer's rule to solve the equations

    3x1 + 5x2 = 4
    2x1 − 4x2 = 1.

Solution  The three determinants required by Cramer's rule are

    D = detA = | 3   5 | = −22,    D1 = | 4   5 | = −21,    D2 = | 3  4 | = −5,
               | 2  −4 |                | 1  −4 |                | 2  1 |

so x1 = D1/D = 21/22 and x2 = D2/D = 5/22.

minors and cofactors

This example shows how determinants enter naturally into the solution of a system of equations. As determinants of order n > 2 occur in the study of differential equations, analytical geometry, throughout linear algebra, and elsewhere, it is necessary to generalize the definition of a determinant of order 2 given in (6) to determinants of any order n. With this objective in mind, we first define the minors and cofactors of a determinant of order n.

The minor Mij associated with the element aij in the ith row and jth column of the nth order determinant in (5) is the determinant of order n − 1 formed from detA by deleting the elements in the ith row and jth column. As each element of detA has an associated minor, a determinant of order n has n² minors. By way of example, the minor M3j of the nth order determinant in (5) is the determinant of order n − 1

           | a11  a12  · · ·  a1 j−1   a1 j+1  · · ·  a1n |
           | a21  a22  · · ·  a2 j−1   a2 j+1  · · ·  a2n |
    M3j =  | a41  a42  · · ·  a4 j−1   a4 j+1  · · ·  a4n |       (12)
           | ···  ···  · · ·   ···      ···    · · ·  ··· |
           | an1  an2  · · ·  an j−1   an j+1  · · ·  ann |
  • 154. 136 Chapter 3 Matrices and Systems of Linear Equations The cofactor Ci j associated with the element ai j in determinant (5) is defined in terms of the minor Mi j as Ci j = (−1)i+ j Mi j for i, j = 1, 2, . . . , n, (13) so an nth order determinant has n2 cofactors. EXAMPLE 3.12 Find the minors and cofactors of detA = 2 1 −3 . 4 Solution Inspection shows that M11 = 4, M12 = 1, M21 = −3, and M22 = 2. Using definition (12), the cofactors are seen to be C11 = (−1)1+1 M11 = 4, C12 = (−1)1+2 M12 = −1, C21 = (−1)2+1 M21 = 3, and C22 = (−1)2+2 M22 = 2. Recognizing that the cofactors of the second order determinant detA = expanding a second order determinant in terms of rows or columns a11 a21 a12 a22 are C11 = a22 , C12 = −a21 , C21 = −a12 , and C22 = a11 , we see from the definition detA = a11 a22 − a21 a12 that detA can be expressed in terms of these cofactors in four different ways: detA = a11 C11 + a12 C12 , using elements and cofactors from the first row of A; detA = a21 C21 + a22 C22 , using elements and cofactors from the second row of A; detA = a11 C11 + a21 C21 , using elements and cofactors from the first column of A; detA = a12 C12 + a22 C22 , using elements and cofactors from the second column of A. This has proved by direct calculation that the value of the general second order determinant detA is given by the sum of the products of the elements and their associated cofactors in any row or column of the determinant. When the definition of a determinant is extended to the case n > 2 it will be seen that this same property remains true. There are various ways of defining an nth order determinant, and from among these we have chosen to use one that involves a recursive process. More will be said about this recursive process, and how it can be used to evaluate the determinant, once the definition has been formulated. Definition of a determinant of order n The nth order determinant detA in which the element ai j has the associated cofactor Ci j for i, j = 1, 2, . . . , n is defined as a11 a21 detA = ··· an1 a12 a22 ··· an2 · · · a1n · · · a2n = ··· ··· · · · ann n a1 j C1 j . j=1 (14)
Section 3.3 Determinants 137

Recalling the different ways in which a second order determinant can be evaluated, we see that the expansion of detA in (14) is in terms of the elements and cofactors of the first row, so for conciseness this expansion is said to be in terms of the elements of the first row. The recursive process enters this definition through the fact that each cofactor C1j is a determinant of order n − 1, as can be seen from (12), so each cofactor in turn can be expanded in terms of determinants of order n − 2, and the process continued until determinants of order 2 are obtained that can then be calculated using (6).

EXAMPLE 3.13

Expand

            | 1   4  −1 |
    detA =  | 2   0   3 | .
            | 1   2   1 |

Solution  To expand this third order determinant using (14), we must find the cofactors of the elements of the first row, so to do this we first find the minors and then use (13) to find the cofactors, as a result of which we find that

    M11 = | 0  3 | = −6,    so C11 = (−1)^(1+1)(−6) = −6
          | 2  1 |

    M12 = | 2  3 | = −1,    so C12 = (−1)^(1+2)(−1) = 1
          | 1  1 |

    M13 = | 2  0 | = 4,     so C13 = (−1)^(1+3)(4) = 4.
          | 1  2 |

As the elements of the first row are a11 = 1, a12 = 4, and a13 = −1, we find from (14) that

    detA = (1)C11 + (4)C12 + (−1)C13 = (1)(−6) + (4)(1) + (−1)(4) = −6.

The determinant associated with either an upper or a lower triangular matrix A of any order is easily expanded, because repeated application of (14) shows that it reduces to the product of the terms on the leading diagonal, so the expansion of the nth order upper triangular determinant with elements a11, a22, . . . , ann on its leading diagonal is

            | a11  a12  · · ·  a1n |
    detA =  |  0   a22  · · ·  a2n |  = a11 a22 . . . ann,        (15)
            |  0    0   · · ·   ·  |
            |  0    0   · · ·  ann |

and a corresponding result is true for a lower triangular matrix.

Definition (14) can be used to prove that nth order determinants, like second order determinants, have the property that their value is given by the sum of the products of the elements and their cofactors in any row or column. This result, together with a generalization concerning the vanishing of the sum of the products of the elements in any row (or column) and the corresponding cofactors in a different row (or column), forms the next theorem. The details of the proof can be found in linear algebra texts, for example, [2.1], [2.5], [2.7], [2.9], but the method used has no other application in what is to follow, so the proof will be omitted.
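Because definition (14) is recursive, it translates directly into a short recursive routine. The following sketch (Python, our own illustrative code, not part of the text) expands a determinant along its first row exactly as in (14); applied to the matrix of Example 3.13 it returns −6. It is meant only to mirror the definition: for matrices of any size this cofactor expansion is far too slow, a point returned to later in connection with Cramer's rule.

```python
def det(a):
    """Determinant by repeated first-row cofactor expansion, as in (14)."""
    n = len(a)
    if n == 1:                  # detA = a11 for a 1 x 1 matrix
        return a[0][0]
    total = 0
    for j in range(n):
        # Minor M_{1,j+1}: delete row 1 and column j+1 of the matrix.
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        # Cofactor C_{1,j+1} = (-1)^(1 + (j+1)) M_{1,j+1}.
        total += (-1) ** j * a[0][j] * det(minor)
    return total

print(det([[1, 4, -1],
           [2, 0,  3],
           [1, 2,  1]]))        # -6, as in Example 3.13
```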
  • 156. 138 Chapter 3 Matrices and Systems of Linear Equations THEOREM 3.3 Laplace expansion theorem Laplace expansion theorem and an extension Let A be an n × n matrix with elements ai j . Then, (i) detA can be expanded in terms of elements of its ith row and the cofactors Ci j of the ith row as n detA = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin = ai j Ci j j=1 for any fixed i with 1 ≤ i ≤ n. (ii) detA can be expanded in terms of elements of its jth column and the cofactors Ci j of the jth column as n detA = a1 j C1 j + a2 j C2 j + · · · + anj Cnj = ai j Ci j j=1 for any fixed j with 1 ≤ j ≤ n. (iii) The sum of the products of the elements of the ith row with the corresponding cofactors of the kth row is zero when i = k. (iv) The sum of the products of the elements in the jth column with the corresponding cofactors of the kth column is zero when j = k. Results (i) and (ii) are often used to advantage when a row or column contains many zeros, because if the determinant is expanded in terms of the elements of that row or column, the cofactors associated with each zero element need not be calculated. Results (iii) and (iv) simply say that the sum of the products of the elements in any row (or column) with the corresponding cofactors in a different row (or column) is zero. PIERRE SIMON LAPLACE (1749–1827) A French mathematician of remarkable ability who made contributions to analysis, differential equations, probability, and celestial mechanics. He used mathematics as a tool with which to investigate physical phenomena, and made fundamental contributions to hydrodynamics, the propagation of sound, surface tension in liquids, and many other topics. His many contributions had a wide-ranging effect on the development of mathematics. EXAMPLE 3.14 Verify Theorem 3.3(i) by expanding the determinant in Example 3.13 in terms of the elements of its second row. Use the determinant to check the result of Theorem 3.3(iii). Solution The second row contains a zero element in its mid position, so the cofactor C22 associated with the zero element need not be calculated. The necessary cofactors in the second row that follow from the minors are M21 = 4 2 −1 =6 1 so C21 = (−1)2+1 (6) = −6 M23 = 1 1 4 = −2 2 so C23 = (−1)2+3 (−2) = 2.
  • 157. Section 3.3 Determinants 139 As a21 = 2 and a23 = 3, it follows from Theorem 3.3(i) that when detA is expanded in terms of elements of its second row, detA = (2)(−6) + (3)(2) = −6, confirming the result obtained in Example 3.13. As a particular case of Theorem 3.3(iii), let us show that the sum of the products of the cofactors in the first row of detA and the corresponding elements in the third row is zero. In Example 3.13 it was found that C11 = −6, C12 = 1, and C13 = 4, so as the elements of the third row are a31 = 1, a32 = 2, and a33 = 1, we have a31 C11 + a32 C12 + a33 C13 = (−6)(1) + (2)(1) + (1)(4) = 0, confirming the result of Theorem 3.3(iii) when the elements of row 3 and the cofactors of row 1 are used. Determinants have a number of special properties that can be used to simplify their expansion, though their main uses are found elsewhere in mathematics, where determinants often characterize some important theoretical feature of a problem. The most important and useful of these properties are contained in the next theorem. THEOREM 3.4 basic properties of determinants Properties of determinants A determinant detA has the following properties: (i) If any row or column of a determinant detA only contains zero elements, then detA = 0. (ii) If A is a square matrix with the transpose AT , then detA = detAT . (iii) If each element of a row or column of a square matrix A is multiplied by a constant k, then the value of the determinant is kdetA. (iv) If two rows (or columns) of a square matrix are interchanged, the sign of the determinant is changed. (v) If any two rows or columns of a square matrix A are proportional, then detA = 0. (vi) Let the square matrix A be such that each element ai j of the ith row (or the (1) (2) jth column) can be written as ai j = ai j + ai j . Then if A1 is the matrix derived from (1) A by replacing its ith row (or jth column) by the elements ai j and A2 is the matrix (2) derived from A by replacing its ith row (or jth column) by the elements ai j , detA = detA1 + detA2 . (vii) The addition of a multiple of a row (or column) of a determinant to another row (or column) of the determinant leaves the value of the determinant unchanged. (viii) Let A and B be two n × n matrix, then det(AB) = detA detB. Proof (i) The result follows by expanding the determinant in terms of the row or column that only contains zero elements.
  • 158. 140 Chapter 3 Matrices and Systems of Linear Equations (ii) The result follows from the fact that expanding detA in terms of the elements of its first row is the same as expanding detAT in terms of the elements of its first column. (iii) The result follows by expanding the determinant in terms of the row or column in which each element has been multiplied by the constant k, because k appears as a factor in each term, so the result becomes kdetA. (iv) The proof is by induction, starting with a second order determinant for which the result can be seen to be true from definition (6). To proceed with an inductive proof we assume the results to be true for a determinant of order r − 1, and show it must be true for a determinant of order r. Expand a row of a determinant of order r in terms of the elements of a row (or column) that has not been interchanged. Then, by hypothesis, as the cofactors are determinants of order r − 1, their signs will all be reversed. This establishes that if the hypothesis is true for a determinant of order r − 1 it must also be true for a determinant of order r . As the result is true for r = 2, it follows by induction that it is true for all integers r > 2, and the result is proved. (v) If the value of the determinant is detA, and one row is k times another, then from (ii) by removing the factor k from the row the value of the determinant will be kdetA1 , where A1 is now a determinant with two identical rows. From (ii), interchanging two rows changes the sign of the determinant, but the rows are identical, leaving the determinant invariant, so detA1 = 0. A similar argument shows the result to be true when two columns are proportional, so the result is proved. (vi) The result is proved directly by expanding the determinant in terms of the elements of the ith row (or the jth column). (vii) Let the square matrix B be obtained from A by adding k times the ith row (or a column) to the jth row (or column). Then from (iii) and (vi), detB = detA + kdetC, where C is obtained from A by replacing the ith row (or column) by the jth row (or column). As detC has two identical rows (or columns), it follows from (v) that detC = 0, so detB = detA and the result is proved. (viii) A proof of this result will be given later after the introduction of elementary row operation matrices. Cramer’s rule, which was first encountered when seeking the solution of the two equations in (7), can be extended to a system of n equations in a very straightforward manner, and it takes the following form. Cramer’s rule Cramer’s rule for a system of n equations in n unknowns The solution of the system of n equations in the n unknowns x1 , x2 , . . . , xn a11 x1 + a12 x2 + · a21 x1 + a22 x2 + · · · an1 x1 + an2 x2 + · · · · · · + a1n xn = b1 · + a2n xn = b2 · · · · + ann xn = bn
Section 3.3 Determinants 141

is given by

    xi = detAi /detA    for i = 1, 2, . . . , n,

where detA is the determinant of the coefficient matrix with elements aij, and detAi is the determinant obtained from the coefficient matrix by replacing its ith column by the column containing the numbers b1, b2, . . . , bn.

The justification for Cramer's rule in this more general form will be postponed until after the introduction of inverse matrices, when a simple proof can be given. Cramer's rule is mainly of theoretical importance and, in general, it should not be used to solve equations when n > 3. This is because the number of multiplications required to evaluate a determinant of order n is (n − 1)n!, so to solve for n unknowns (n + 1) determinants must be evaluated, leading to a total of (n² − 1)n! multiplications, and this calculation becomes excessive when n > 3. An efficient way of solving large systems by means of elimination is given in Chapter 19.

EXAMPLE 3.15

Use Cramer's rule to solve

    x1 − 2x2 + x3 = 1
    2x1 + x2 − 2x3 = 3
    −x1 + 3x2 + 4x3 = −2.

Solution  The determinants involved are

            |  1  −2   1 |                    |  1  −2   1 |
    detA =  |  2   1  −2 |  = 29,    detA1 =  |  3   1  −2 |  = 37,
            | −1   3   4 |                    | −2   3   4 |

            |  1   1   1 |                    |  1  −2   1 |
    detA2 = |  2   3  −2 |  = 1,     detA3 =  |  2   1   3 |  = −6,
            | −1  −2   4 |                    | −1   3  −2 |

so x1 = 37/29, x2 = 1/29, and x3 = −6/29.

A purely algebraic approach to the study of determinants and their properties is to be found in reference [2.8], and many examples of their applications are given in references [2.11] and [2.12].

Summary  This section has extended to an nth order determinant the basic notion of a second order determinant that was reviewed in Chapter 1, and then established its most important properties. The Laplace expansion formulas that were established are of theoretical importance, but it will be seen later that the practical evaluation of a determinant is most easily performed by reducing the n × n matrix associated with a determinant to its echelon form.
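As a quick numerical check of Example 3.15 above, Cramer's rule can be applied mechanically in a few lines of code. The sketch below uses Python with numpy (our choice of tool; any determinant routine would serve) to form detA and each detAi by column replacement.

```python
import numpy as np

A = np.array([[ 1, -2,  1],
              [ 2,  1, -2],
              [-1,  3,  4]], dtype=float)
b = np.array([1, 3, -2], dtype=float)

detA = np.linalg.det(A)                  # 29
x = []
for i in range(3):
    Ai = A.copy()
    Ai[:, i] = b                         # replace the ith column by b
    x.append(np.linalg.det(Ai) / detA)   # xi = detAi / detA

print(np.round(x, 6))   # [1.275862, 0.034483, -0.206897] = [37/29, 1/29, -6/29]
```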
  • 160. 142 Chapter 3 Matrices and Systems of Linear Equations EXERCISES 3.3 In Exercises 1 through 4 find detA. 2 1. detA = 0 3 2. detA = −1 3. −2 1 4 2 −1 1 −4 2 3. detA = −2 5 10. 2 3 1 detA = 0 − sin x . cos x −3 1 4 2 −1 5 = 87, 4 2 5 confirm by direct calculation that (a) interchanging the first and last rows changes the sign of detA and (b) interchanging the second and third columns changes the sign of detA. 6. Given that detA = 2 5 −1 1 3 −2 2 = −24, 1 3 confirm by direct calculation that (a) adding twice row two to row three leaves detA unchanged and (b) subtracting three times column three from column one leaves detA unchanged. Establish the results in Exercises 7 through 12 without a direct expansion of the determinant by using the properties listed in Theorem 3.4. 7. 1+a b c a a 1+b b = (1 + a + b + c). c 1+c 1 a b+c 8. 1 b c + a = 0. 1 c a+b 2 9. a a 1 2 b b 1 2 ac bc = x 4 (x 2 + a 2 + b2 + c2 ). 2 2 x +c c c = (a − b)(a − c)(b − c). 1 a b 1 b = (a + b + 1)(a − 1)(b − 1). b 1 k 1 12. 1 1 −3 0. 4 4 0 4. detA = −2 cos x 5 sin x 5. Given that ab x 2 + b2 cb 1 11. a a 1 2. 2 4 1 −2 x2 + a 2 ab ac 1 k 1 1 1 1 k 1 1 1 = (k + 3)(k − 1)3 . 1 k In Exercises 13 and 14 use Cramer’s rule to solve the system of equations. 13. 2x1 − 3x2 + x3 = 4 x1 + 2x2 − 2x3 = 1 3x1 + x2 − 2x3 = −2. 14. 3x1 + x2 + 2x3 = 5 2x1 − 4x2 + 3x3 = −3 x1 + 2x2 + 4x3 = 2. 15. Let P(λ) be given by P(λ) = 3−λ 2 4 0 1 2−λ 2 , 2 1−λ where λ is a parameter. Expand the determinant to find the form of the polynomial P(λ) and use the result to find for what values of λ the determinant vanishes. 16. Let P(λ) be given by P(λ) = 4−λ 1 −1 0 1 −λ 1 , −2 2 − λ where λ is a parameter. Expand the determinant to find the form of the polynomial P(λ) and use the result to find for what values of λ the determinant vanishes. 17. Given that ⎡ −3 A=⎣ 1 1 ⎤ 0 4 2 −1⎦ 0 1 ⎡ and 1 2 B = ⎣2 3 3 1 ⎤ 3 1⎦ , 2 calculate det(AB), detA, detB, and hence verify that det(AB) = detAdetB.
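The product rule det(AB) = detA detB of Theorem 3.4(viii) is also easy to test numerically for matrices of one's own choosing. A minimal sketch (Python with numpy, our choice of tool; the random matrices below are arbitrary examples, not those of Exercise 17):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(3, 3)).astype(float)   # arbitrary 3 x 3 matrices
B = rng.integers(-3, 4, size=(3, 3)).astype(float)

lhs = np.linalg.det(A @ B)
rhs = np.linalg.det(A) * np.linalg.det(B)
print(np.isclose(lhs, rhs))   # True, illustrating det(AB) = detA detB
```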
  • 161. Section 3.4 3.4 Elementary Row Operations, Elementary Matrices, and Their Connection with Matrix Multiplication 143 Elementary Row Operations, Elementary Matrices, and Their Connection with Matrix Multiplication To motivate what is to follow we will examine the processes involved when solving by elimination the system of linear equations a11 x1 + a12 x2 + · · · + a1n xn = b1 a21 x1 + a22 x2 + · · · + a2n xn = b2 · · · · · · · · · am1 x1 + am2 x2 + · · · + amn xn = bm, (16) though later more will need to be said about the details of this important problem, and how it is influenced by the number of equations m and the number of unknowns n. Elementary Row Operations the three basic types of elementary row operation The three types of elementary row operations used when solving equations (16) by elimination are: TYPE I The interchange of two equations TYPE II The scaling of an equation by a nonzero constant TYPE III The addition of a scalar multiple of an equation to another equation In matrix notation the system of equations (16) becomes Ax = b, (17) where A = [ai j ] is an m × n matrix, x = [x1 , x2 , . . . , xn ]T , and b = [b1 , b2 , . . . , bm]T . The three elementary row operations of types I to III that can be performed on the equations in (16) can be interpreted as the corresponding operations performed on the rows of the matrices A and b. This is equivalent to performing these same operations on the rows of the new matrix denoted by (A, b), defined as ⎡ a11 a12 . . . ⎢ a21 a22 . . . ⎢ (A, b) = ⎢ · · · · · · ⎣ am1 the augmented matrix am2 a1n a2n . . . amn ⎤ b1 b2 ⎥ ⎥ ⎥, ⎦ (18) bm that has m rows and n + 1 columns and is obtained by inserting the column vector b containing the nonhomogeneous terms on the right of matrix A. When considering the system of linear equations in (16), matrix (A, b) is called the augmented matrix associated with the system. The separation of the last column in (18) by a vertical dashed line is to indicate partitioning of the matrix to show that the elements of the last column are not elements of the coefficient matrix A.
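In computations the augmented matrix (18) is usually formed by simply appending the column b to A. A minimal sketch (Python with numpy, our choice of tool; the numbers are an arbitrary illustration, not a system from the text):

```python
import numpy as np

A = np.array([[2, 1, -1],
              [1, 3,  2]], dtype=float)   # coefficient matrix (m = 2, n = 3)
b = np.array([[4], [1]], dtype=float)     # nonhomogeneous terms as a column

Ab = np.hstack((A, b))    # the augmented matrix (A, b), of shape (2, 4)
print(Ab)
```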
  • 162. 144 Chapter 3 Matrices and Systems of Linear Equations We are now in a position to introduce a notation for the three elementary row operations that are necessary when using an elimination process to find the solution of a system of equations in matrix form (ordinary or augmented). Elementary row operations The three elementary row operations that may be performed on a matrix are: (i) The interchange of the ith and jth rows, which will be denoted by R{i → j, j → i}. (ii) The replacement of each element in the ith row by its product with a nonzero constant α, which will be denoted by R{(α)i → i}. (iii) The replacement of each element of the jth row by the sum of β times the corresponding element in the ith row and the element in the jth row, which will be denoted by R{(β)i + j → j}. EXAMPLE 3.16 To illustrate the elementary row operations, we consider the matrix ⎡ ⎤ 1 6 4 −3 2 7 4⎦ . A = ⎣2 0 1 5 2 8 2 3 An example of an elementary row operation of type (i) performed on A is provided by R{1 → 3, 3 → 1}. This requires rows 1 and 3 to be interchanged to give the new matrix ⎡ ⎤ 5 2 8 2 3 7 4⎦ . R{1 → 3, 3 → 1}A = ⎣2 0 1 1 6 4 −3 2 An example of an elementary row operation of type (ii) performed on A is provided by R{(−3)1 → 1}. This requires each element in row 1 to be multiplied by −3 to give the new matrix ⎡ ⎤ −3 −18 −12 9 −6 0 1 7 4⎦ . R{(−3)1 → 1}A = ⎣ 2 5 2 8 2 3 An example of an elementary row operation of type (iii) performed on A is provided by R{(4)1 + 2 → 2}, which requires the elements of row 1 to be multiplied by 4 and then added to the corresponding elements of row 2 to give the new matrix ⎡ ⎤ 1 6 4 −3 2 R{(4)1 + 2 → 2}A = ⎣6 24 17 −5 12⎦ . 5 2 8 2 3 A sequence of elementary row operations performed on the augmented matrix (A, b) will lead to a different augmented matrix (A , b ). However, as this is equivalent to performing the corresponding sequence of operations on the actual equations in (16), although (A, b) and (A , b ) will look different, the interpretation of (A , b ) in terms of the solution of the system of equations in (16) will, of course, be the same as that of (A, b). It will be seen later that the purpose of carrying out these operations on a matrix is to simplify it while leaving its essential
  • 163. Section 3.4 Elementary Row Operations, Elementary Matrices, and Their Connection with Matrix Multiplication 145 algebraic structure unaltered, e.g., without changing the solution x1 , . . . , xn of the corresponding system of equations. The definition that now follows is a consequence of the equivalence, in terms of equations (16), of matrix (A, b) and any matrix (A , b ) that can be derived from it by means of a sequence of elementary row operations, though the definition applies to matrices in general, and not only to augmented matrices. Row equivalence of matrices Two m × n matrices will be said to be row equivalent if one can be obtained from the other by means of a sequence of elementary row operations. Row equivalence between matrices A and B is denoted by writing A ∼ B. The row equivalence of matrices has the useful properties listed in the following theorem. THEOREM 3.5 Reflexive, symmetric, and transitive properties of row equivalence (i) Every m × n matrix A is row equivalent to itself (reflexive property). (ii) Let A and B be m × n matrices. Then if A is row equivalent to B, B is row equivalent to A (symmetric property). (iii) Let A, B, and C be m × n matrices. Then if matrix A is row equivalent to B and B is row equivalent to C, A is row equivalent to C (transitive property). Proof (i) The property is self-evident. (ii) To establish this property we must show the three elementary row operations involved are reversible. In the case of elementary row operations of type (i) the result follows from the fact that if an application of the operation R{i → j, j → i} to matrix A yields a new matrix B, an application of the operation R{ j → i, i → j} to matrix B generates the original matrix A. Similarly, in the case of elementary row operations of type (ii), if an application of the operation R{(α)i → i} to matrix A yields a new matrix B, an application of the operation R{(1/α)i → i} to matrix B reproduces the original matrix A. Finally we consider the case of elementary row operations of type (iii). If an application of the operation R{(β)i + j → j} to matrix A yields a new matrix B, an application of the operation R{(−β)i + j → j} to B returns the original matrix A. Taken together these results establish property (ii). (iii) Using property (ii) in (iii) establishes the row equivalence first of A and B, and then of B and C, and hence of A and C, so property (iii) is proved. Let us now define what are called elementary matrices and examine the effect they have when used to premultiply a matrix. Elementary matrices An n × n elementary matrix is any matrix that is obtained from an n × n unit matrix I by performing a single elementary row operation.
  • 164. 146 Chapter 3 Matrices and Systems of Linear Equations The following concise notation will be used to identify the elementary matrices that correspond to each of the three elementary row operations. the three basic types of elementary matrix EXAMPLE 3.17 TYPE I Ei j will denote the elementary matrix obtained from the unit matrix I by interchanging its ith and jth rows. TYPE II Ei (c) will denote the matrix obtained from the unit matrix I by multiplying its ith row by the nonzero scalar c. TYPE III Ei j (c) will denote the matrix obtained from the unit matrix I by adding c times its ith row to its jth row. Let I be the 3 × 3 unit matrix. Then ⎡ ⎤ ⎡ 1 0 0 1 0 I = ⎣0 1 0⎦ , E23 = ⎣0 0 0 0 1 0 1 ⎤ ⎡ 0 1 1⎦ , E3 (4) = ⎣0 0 0 ⎡ ⎤ 1 0 0 E13 (5) = ⎣0 1 0⎦ . 5 0 1 0 1 0 ⎤ 0 0⎦ , 4 and Determinants of Elementary Matrices It follows directly from the definitions of elementary matrices that: (a) The determinant of an elementary matrix of Type I is −1, because two rows of a unit matrix have been interchanged so, in terms of Ei j , we have det(Ei j ) = −1. (b) The determinant of an elementary matrix of Type II in which a row is multiplied by a nonzero constant c is c, because a row of a unit matrix has been multiplied by c so, in terms of Ei (c), we have det(Ei (c)) = c. (c) The determinant of an elementary matrix of Type III in which c times one row has been added to another row is 1, because the addition of a multiple of a row of a unit matrix to another row leaves its value unchanged so, in terms of Ei j (c), we have det(Ei j (c)) = 1. The next theorem shows that premultiplication of a matrix A by an elementary matrix E that is conformable for multiplication performs on A the same elementary row operation that was used to generate E from I. THEOREM 3.6 Row operations performed by elementary matrices Let E be an m × m elementary matrix produced by performing an elementary row operation on the unit matrix I, and let A be an m × n matrix. Then the matrix product EA is the matrix that is obtained when the row operation that generated E from I is performed on A. Proof The proof of the theorem follows directly from the definition of a matrix product and the fact that, with the exception of the ith element in the ith row of I, which is 1, all the other elements in that row are zero. So if E is the elementary matrix obtained from I by replacing the element 1 in its ith row by α, the result of the matrix product EA will be that the elements in the ith row of A will be multiplied by α. As the form of argument used to establish the effect on A of premultiplication by P to form PA can also be employed when the other two elementary row operations are used to generate an elementary matrix E, the details will be left as an exercise.
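Both Theorem 3.6 and the determinant values just listed are easily illustrated numerically. The sketch below (Python with numpy, our choice of tool) builds the elementary matrices E23, E3(4), and E13(5) of Example 3.17 by performing the corresponding row operations on the 3 × 3 unit matrix, checks their determinants, and then verifies their effect on an arbitrary 3 × 3 matrix by premultiplication.

```python
import numpy as np

I = np.eye(3)

E23 = I[[0, 2, 1], :]                 # interchange rows 2 and 3 of I
E3_4 = I.copy();  E3_4[2, 2] = 4      # multiply row 3 of I by 4
E13_5 = I.copy(); E13_5[2, 0] = 5     # add 5 times row 1 of I to row 3

# Determinants: -1, 4, and 1, as stated above.
print(np.linalg.det(E23), np.linalg.det(E3_4), np.linalg.det(E13_5))

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])          # an arbitrary matrix for illustration

print(E23 @ A)     # rows 2 and 3 of A interchanged
print(E3_4 @ A)    # row 3 of A multiplied by 4
print(E13_5 @ A)   # 5 times row 1 of A added to row 3 of A
```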
  • 165. Section 3.5 EXAMPLE 3.18 The Echelon and Row-Reduced Echelon Forms of a Matrix Let A be the matrix ⎡ 2 A = ⎣1 6 4 3 1 147 ⎤ 5 7⎦ . 2 If we use the notation for elementary matrices, and introduce the elementary matrix E23 from Example 3.17 obtained by interchanging the last two rows of I3 , a routine calculation shows that ⎡ ⎤⎡ ⎤ ⎡ ⎤ 1 0 0 2 4 5 2 4 5 E23 A = ⎣0 0 1⎦ ⎣1 3 7⎦ = ⎣6 1 2⎦ , 0 1 0 6 1 2 1 3 7 so the product E23 A has indeed interchanged the last two rows of A. Similarly, again using the elementary matrices in Example 3.17, it is easily checked that E3 (4)A multiplies the elements in the third row of A by 4, while E13 (5)A adds five times the first row of A to the last row. The main use of Theorem 3.6 is to be found in the theory of matrix algebra, and in the justification it provides for various practical methods that are used when working with matrices. This is because when solving purely numerical problems the necessary row operations need only be performed on the rows of the augmented matrix instead of on the system of equations itself. Typical uses of the theorem will occur later after a discussion of the linear independence of equations, the definition of what is called the rank of a matrix, and the introduction of the inverse of an n × n matrix A. In this last case, the results of the theorem will be used to provide an elementary method by which what is called the inverse matrix of an n × n matrix can be obtained when n is small. Summary 3.5 This section introduced the three types of elementary row operations that are used when manipulating matrices together with the corresponding three types of elementary matrix that can be used to perform elementary row operations. The Echelon and Row-Reduced Echelon Forms of a Matrix We now use the row equivalence of matrices to reduce a matrix A to one of two slightly different but related standard forms called, respectively, its echelon form and its row-reduced echelon form. It is helpful to introduce these two new concepts by considering the solution of the system of m equations in n unknowns introduced in (16) and written in an equivalent but more condensed form as (A, b), where ⎡ a11 ⎢ a21 ⎢ (A, b) = ⎣ am1 a12 a22 · · am2 . . . a1n . . . a2n · · · · . . . amn ⎤ b1 b2 ⎥ ⎥, ⎦ bm because this is equivalent to the full matrix equation Ax = b. (19)
  • 166. 148 Chapter 3 Matrices and Systems of Linear Equations Echelon and row-reduced echelon forms of a matrix echelon and row-reduced echelon forms A matrix A is said to be in echelon form if: (i) The first nonzero element in each row, called its leading entry, is 1; (ii) In any two successive rows i and i + 1 that do not consist entirely of zeros the leading element in the (i + 1)th row lies to the right of the leading element in ith row; (iii) Any rows that consist entirely of zeros lie at the bottom of the matrix. Matrix A is said to be in row-reduced echelon form if, in addition to conditions (i) to (iii), it is also true that (iv) In a column that contains the leading entry of a row, all the other elements are zero. In summary, this definition means that a matrix A is in echelon form if the first nonzero entry in any row is a 1, the entry appears to the right of the first nonzero entry in the row above, and all rows of zeros lie at the bottom of the matrix. Furthermore, matrix A is in row-reduced echelon form if, in addition to these conditions, the first nonzero entry in any row is the only nonzero entry in the column containing that entry. EXAMPLE 3.19 The following matrices are in echelon form: ⎡ 1 ⎡ ⎤ ⎢0 1 0 5 7 ⎢ ⎣0 0 1 0⎦ and ⎢0 ⎢ ⎣0 0 0 0 0 0 The matrices ⎡ 0 1 0 ⎢0 0 1 ⎢ ⎣0 0 0 0 0 0 2 1 0 0 0 0 1 0 5 3 1 0 ⎤ 0 2⎥ ⎥, 0⎦ 0 ⎡ 1 ⎣0 0 0 1 0 0 0 1 1 0 0 0 0 9 2 1 1 1 0 0 0 1 2 1 0 0 ⎤ 2 3⎦ , 0 1 0 5 1 0 ⎤ 1 1⎥ ⎥ 2⎥ . ⎥ 3⎦ 0 ⎡ and 1 ⎣0 0 0 1 0 0 0 1 ⎤ 5 2⎦ 1 are in row-reduced echelon form. Rules for the reduction of a matrix to echelon form rules for finding the echelon form The reduction of the m × n matrix to its echelon form is accomplished by means of the following steps: 1. Find the row whose first nonzero element is furthest to the left and, if necessary, move it into row 1; if there is more than one such row, choose the row whose first nonzero element has the largest absolute value. 2. Scale row 1 to make its leading entry 1. 3. Subtract multiples of row 1 from the m − 1 rows below it to reduce to zero all entries that lie below the leading entry in the first column. 4. In the m − 1 rows below row 1, find the row whose first nonzero entry is furthest to the left and, if necessary, move it into row 2; if there is more
  • 167. Section 3.5 5. 6. 7. 8. The Echelon and Row-Reduced Echelon Forms of a Matrix 149 than one such row, choose the row whose first nonzero entry has the largest absolute value. Scale row 2 to make its leading entry 1. Subtract multiples of row 2 from the m − 2 rows below it to reduce to zero all entries in the column below the leading entry in row 2. Continue this process until either the first nonzero entry in the mth row is 1, or a stage is reached at which all subsequent rows consist entirely of zeros. The matrix is then in its echelon form. Remark The selection in Step 1, and the steps corresponding to Step 4, of a row whose first nonzero entry has the largest magnitude is made to reduce computational errors, and is not necessary mathematically. This criterion is introduced to ensure that the elimination procedure does not use an unnecessary scaling of a nonzero entry of small absolute magnitude to reduce to zero an entry of large absolute magnitude. rules for finding the row-reduced echelon form Rules for the reduction of a matrix to row-reduced echelon form 1. Proceed as in the reduction of a matrix to echelon form, but when steps equivalent to Step 6 are reached, in addition to subtracting multiples of the row containing a leading entry 1 from the rows below to reduce to zero all elements in the column below the leading entry, this same process must be repeated to reduce to zero all elements in the column above the leading entry. 2. An equivalent approach is first to reduce the matrix to echelon form and then, starting with row 2 and working downwards, to subtract multiples of successive rows from the rows above to generate columns with leading entries to ones with the single nonzero entry 1. Each of these methods reduces a matrix to its row-reduced echelon form. The row equivalence of a matrix with either its echelon or its row-reduced echelon form means that the different-looking systems of equations represented by these three matrices all have identical solution sets. The simplified structure of the row echelon and row-reduced echelon forms of the original augmented matrix makes the solution of the associated system of equations particularly easy, as can be seen from the following examples. EXAMPLE 3.20 Reduce the following matrix to its echelon and its row-reduced echelon form: ⎡ 0 ⎢2 ⎢ ⎣1 1 1 4 2 3 2 8 4 6 0 2 2 1 ⎤ 3 4⎥ ⎥. 2⎦ 5
150 Chapter 3 Matrices and Systems of Linear Equations

Solution  Starting from the given matrix and applying the operation indicated at each step:

    [ 0  1  2  0  3 ]                            [ 2  4  8  2  4 ]
    [ 2  4  8  2  4 ]    interchange rows        [ 0  1  2  0  3 ]
    [ 1  2  4  2  2 ]    1 and 2             ∼   [ 1  2  4  2  2 ]
    [ 1  3  6  1  5 ]                            [ 1  3  6  1  5 ]

divide row 1 by 2:

    [ 1  2  4  1  2 ]
    [ 0  1  2  0  3 ]
    [ 1  2  4  2  2 ]
    [ 1  3  6  1  5 ]

subtract row 1 from rows 3 and 4:

    [ 1  2  4  1  2 ]
    [ 0  1  2  0  3 ]
    [ 0  0  0  1  0 ]
    [ 0  1  2  0  3 ]

subtract row 2 from row 4:

    [ 1  2  4  1  2 ]
    [ 0  1  2  0  3 ]
    [ 0  0  0  1  0 ]
    [ 0  0  0  0  0 ]

and the matrix is now in echelon form. Having already obtained the echelon form of the matrix, we now use it to obtain the row-reduced echelon form. We already have the row equivalence of the original matrix with its echelon form, so continuing from the echelon form, subtract twice row 2 from row 1:

    [ 1  0  0  1  −4 ]
    [ 0  1  2  0   3 ]
    [ 0  0  0  1   0 ]
    [ 0  0  0  0   0 ]

then subtract row 3 from row 1:

    [ 1  0  0  0  −4 ]
    [ 0  1  2  0   3 ]
    [ 0  0  0  1   0 ]
    [ 0  0  0  0   0 ]

and the matrix is now in its row-reduced echelon form.

EXAMPLE 3.21

Solve the system of equations

    x2 + 2x3 = 3
    2x1 + 4x2 + 8x3 + 2x4 = 4
    x1 + 2x2 + 4x3 + 2x4 = 2
    x1 + 3x2 + 6x3 + x4 = 5.

Solution  The augmented matrix (A, b) for this system is the matrix in Example 3.20 that was shown to be equivalent to the row-reduced echelon form

    [ 1  0  0  0  −4 ]
    [ 0  1  2  0   3 ]
    [ 0  0  0  1   0 ]
    [ 0  0  0  0   0 ]

If we recall that the first four columns of this matrix contain the coefficients of x1, x2, x3, and x4, while the last column contains the nonhomogeneous terms, the matrix implies the much simpler system of equations x4 = 0, x2 + 2x3 = 3, and x1 = −4.
  • 169. Section 3.5 The Echelon and Row-Reduced Echelon Forms of a Matrix 151 As there are only three equations connecting four unknowns, it follows that in the second equation either x2 or x3 can be assigned arbitrarily, so if we choose to set x3 = k (an arbitrary number), the solution set of the system in terms of the parameter k becomes x1 = −4, x2 = 3 − 2k, x3 = k, and x4 = 0. The same solution could have been obtained from the echelon form of the matrix ⎡ ⎤ 1 2 4 1 2 ⎢ ⎥ ⎢0 1 2 0 3⎥ ⎢ ⎥ ⎢0 0 0 1 0⎥ , ⎣ ⎦ 0 0 0 0 0 because this implies the system of equations x1 + 2x2 + 4x3 + x4 = 2, x2 + 2x3 = 3, and x4 = 0. Starting from the last equation we find x4 = 0, and setting x3 = k in the middle equation gives, as before, x2 = 3 − 2k. Finally, substituting x2 , x3 , and x4 in the first equation gives x1 = −4. This process of arriving at a solution of a system of equations whose coefficient matrix is in upper triangular form is called back substitution. It should be noticed that the system of equations would have had no solution if the row-reduced echelon form had been ⎤ ⎡ 1 0 0 0 −4 ⎢ ⎥ ⎢0 1 2 0 3⎥ ⎥. ⎢ ⎢0 0 0 1 0⎥ ⎦ ⎣ 5 0 0 0 0 back substitution This is because although the equations corresponding to the first three rows of this matrix would have been the same as before, the fourth row implies 0 = 5, which is impossible. This corresponds to a system of equations where one equation contradicts the others, so that no solution is possible. Summary This section defined two related types of fundamental matrix that can be obtained from a general matrix by means of elementary row operations. The first was a reduction to echelon form and the second, derived from the first form, was a reduction to row-reduced echelon form. Each of the reduced forms retains the essential properties of the original matrix, while simplifying the task of solving the associated system of linear algebraic equations. EXERCISES 3.5 Let P, Q, and R be the matrices ⎡ 3 P = ⎣0 0 0 1 0 ⎤ 0 0⎦ , 1 ⎡ 0 Q = ⎣0 1 0 1 0 ⎤ 1 0⎦ , 0 ⎡ 1 R = ⎣0 0 2 1 0 ⎤ 0 0⎦ . 1 In Exercises 1 through 4 verify by direct calculation that (a) premultiplication by P multiplies row 1 by 3; (b) premultiplication by Q interchanges rows 1 and 3; and (c) premultiplication by R adds twice row 2 to row 1. ⎤ ⎤ ⎡ ⎡ 4 0 1 2 1 1 3. ⎣2 0 3⎦. 1. ⎣1 3 0⎦. 1 2 5 1 2 4 ⎡ ⎤ ⎡ ⎤ 9 1 3 1 −1 2 1 3⎦. 4. ⎣2 4 7⎦. 2. ⎣2 1 2 2 3 0 7
  • 170. 152 Chapter 3 Matrices and Systems of Linear Equations In Exercises 5 and 6 write down the required elementary matrices. 5. When I is the 3 × 3 unit matrix, write down E12 , E2 (3), and E12 (6). 6. When I is the 4 × 4 unit matrix, write down E41 , E4 (3), and E23 (4). In Exercises 7 through 12, reduce the given matrices to their row-reduced echelon form. ⎤ ⎡ ⎤ ⎡ 3 2 1 1 0 3 4 1 ⎢ 2 5 1 2⎥ 7. ⎣3 1 2 2⎦. ⎥ ⎢ 1 5 2 1 10. ⎢3 1 1 3⎥. ⎥ ⎢ ⎣ 0 1 3 4⎦ ⎤ ⎡ 4 1 3 1 3 2 1 3 1 8. ⎣2 1 1 2 0⎦. ⎤ ⎡ 2 2 4 1 4 3 2 1 1 0 ⎢1 1 3 2 1⎥ ⎤ ⎡ ⎥ 11. ⎢ 4 −2 2 3 1 ⎣3 2 5 1 4⎦. ⎦. ⎣2 0 0 3 2 9. 1 0 3 1 2 4 1 2 5 1 ⎤ ⎡ 3 2 3 2 12. ⎣3 7 1 −1⎦. 5 1 1 3 3.6 In Exercises 13 through 18, reduce the given augmented matrices to their row-reduced echelon form and, where appropriate, use the result to solve the related system of equations in terms of an appropriate number of the unknowns x1 , x2 , . . . . ⎤ ⎤ ⎡ ⎡ 2 3 1 0 2 1 0 2 1 ⎥ ⎥ ⎢ ⎢ ⎢1 3 1 4 2⎥ 13. ⎢1 3 1 4⎥. ⎥ ⎢ ⎦ ⎣ 16. ⎢ ⎥. ⎢2 1 2 3 1⎥ 6 9 4 8 ⎦ ⎣ ⎤ ⎡ 4 7 4 11 7 2 1 1 0 ⎥ ⎢ ⎤ ⎡ 14. ⎢2 3 1 4⎥. 1 0 1 0 2 0 ⎦ ⎣ ⎥ ⎢ ⎢2 2 6 0 6 0⎥ 4 9 4 8 ⎥ ⎢ 17. ⎢ ⎥. ⎡ ⎤ ⎢1 0 1 1 6 0⎥ 0 2 1 1 1 ⎦ ⎣ ⎢ ⎥ ⎢1 3 1 2 1⎥. 3 2 7 0 8 2 15. ⎣ ⎦ ⎡ ⎤ 3 9 4 3 0 3 0 6 0 6 ⎢ ⎥ 18. ⎢1 1 5 1 9 ⎥. ⎣ ⎦ 2 0 4 2 10 Row and Column Spaces and Rank The reduction of an m × n matrix A to either its echelon or its row-reduced echelon form will produce a row of zeros whenever the row is a linear combination of some (or all) of the rows above it. So if an echelon form contains r ≤ m nonzero rows, it follows that these r rows are linearly independent, and hence that the remaining m − r rows are linearly dependent on the first r rows. The number r is called the row rank of matrix A. This means that if the r nonzero rows of an echelon form u1 , u2 , . . . , ur are regarded as n element row vectors belonging to a vector space Rn , the r vectors will span a subspace of Rn . Consequently, as these vectors form a basis for this subspace, every vector in it can be expressed as a linear combination of the form a1 u1 + a2 u2 + · · · + ar ur , row and column ranks and spaces where the a1 , a2 , . . . , ar are scalar constants. This subspace of Rn is called the row space of matrix A. It should be remembered that the vectors forming a basis for a space are not unique, and that any basis can be transformed to any other one by means of suitable linear combinations of the vectors involved. So although the r nonzero rows of the echelon form of A and those of its row-reduced echelon form look different, they are equivalent, and each forms a basis for the row space of A. Just as there may be linear dependence between the rows of A, so also may there be linear dependence between its columns. If s of the n columns of an m × n matrix A are linearly independent, the number s is called the column rank of matrix A. When the s nonzero columns v1 , v2 , . . . , vs are regarded as m element column vectors belonging to a vector space Rm, these vectors will span a subspace of Rm.
  • 171. Section 3.6 Row and Column Spaces and Rank 153 Consequently, as these vectors form a basis for this subspace, every vector in it can be expressed as a linear combination of the form b1 v1 + b2 v2 + · · · + bs vs , where the b1 , b2 , . . . , bs are scalar constants. This subspace of Rm is called the column space of matrix A. The connection between the row and column ranks of a matrix is provided by the following theorem. THEOREM 3.7 equality of the rank of a matrix and its transpose The equality of the row and column ranks Let A be any matrix. Then the row rank and column rank of A are equal. Proof Let an m × n matrix A have row rank r . Then in its row-reduced echelon form it must contain r columns v1 , v2 , . . . , vr , in each of which only the single nonzero entry 1 appears. Call these columns the leading columns of the row-reduced echelon form, and let them be arranged so that in the ith column v, the entry 1 appears in the ith row. The row-reduced echelon form of A will comprise the leading columns arranged in numerical order with, possibly, columns between the ith and the (i + 1)th leading columns in which zero elements lie below the ith row but nonzero elements may occur above it. Furthermore, there may be columns to the right of column vr in which zero elements lie below the r th row but nonzero elements may lie above it. By subtracting suitable multiples of the leading columns from any columns that lie between them or to the right of vr , it is possible to reduce all entries in such columns to zero. Consequently, at the end of this process, the only remaining nonzero columns will be the r linearly independent leading columns v1 , v2 , . . . , vr . This establishes the equality of the row and column ranks. Rank The rank of matrix A, denoted by rank (A), is the value common to the row and column ranks of A. THEOREM 3.8 Rank of A and AT Let A be any matrix. Then rank (A) = rank (AT ). Proof The columns of A are the rows of AT , so the column rank of A is the row rank of AT . However, by Theorem 3.7 these two ranks are equal, so the result is proved. EXAMPLE 3.22 Let ⎡ 1 ⎢2 A=⎢ ⎣1 1 0 1 0 0 3 7 3 3 0 0 2 0 4 10 6 4 ⎤ 0 1⎥ ⎥. 4⎦ 0
  • 172. 154 Chapter 3 Matrices and Systems of Linear Equations Then the row-reduced echelon form of A is B (B ∼ A) ⎡ ⎤ 1 0 3 0 4 0 ⎢0 1 1 0 2 1⎥ ⎥ B=⎢ ⎣0 0 0 1 1 2⎦, 0 0 0 0 0 0 showing that the number of leading columns is 3, so the row rank of A is 3, and hence its rank is 3. Three row vectors spanning a subspace of R6 , and so forming a basis for this subspace, are the three nonzero row vectors in this 4 × 6 matrix, u1 = [1, 0, 3, 0, 4, 0], u2 = [0, 1, 1, 0, 2, 1], The row-reduced echelon form of AT is ⎡ 1 0 ⎢0 1 ⎢ ⎢0 0 ⎢ ⎢0 0 ⎢ ⎣0 0 0 0 and u3 = [0, 0, 0, 1, 1, 2]. ⎤ 1 0⎥ ⎥ 0⎥ ⎥, 0⎥ ⎥ 0⎦ 0 0 0 1 0 0 0 showing that the number of leading columns is 3, confirming as would be expected that the column rank of A (the row rank of AT ) is 3. The three row vectors of AT spanning a subspace of R4 , and so forming a basis for this subspace, are the three nonzero rows in this 6 × 4 matrix, namely, [1, 0, 0, 1], [0, 1, 0, 0], and [0, 0, 1, 0]. The three linearly independent column vectors of A are obtained by transposing these vectors to obtain ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 0 0 ⎢0⎥ ⎢1⎥ ⎢0⎥ v1 = ⎢ ⎥ , v2 = ⎢ ⎥ , and v3 = ⎢ ⎥ . ⎣0⎦ ⎣0⎦ ⎣1⎦ 1 0 0 Summary This section introduced the important algebraic concepts of the rank of a matrix, and of the row and column spaces of a matrix. The equality of the row and column ranks of a matrix was then proved. It will be seen later that the rank of a matrix plays a fundamental role when we seek a solution of a linear algebraic system of equations. EXERCISES 3.6 In Exercises 1 through 14 find the row-reduced echelon form of the given matrix, its rank, a basis for its row space, and a basis for its column space. ⎤ ⎡ ⎡ ⎤ 1 3 2 1 1 3 1 0 1 1 ⎢2 0 2 1⎥ 1. ⎣2 2 1 0 0 1⎦ . ⎥ 2. ⎢ ⎣1 0 4 5⎦ . 0 2 1 4 1 3 0 1 2 4 ⎡ 3 ⎢4 3. ⎢ ⎣2 3 4. 2 1 0 2 1 0 0 2 0 0 3 2 0 0 ⎡ ⎤ 6 0 11 3⎥ ⎥. 4 0⎦ 6 3 1 1 0 4 2 1 4 . 2 1 5. ⎣2 3 ⎡ 3 6. ⎣1 8 2 3 2 2 2 8 ⎤ 3 1⎦ . 1 ⎤ 4 2 ⎦. 12
  • 173. Section 3.7 ⎡ 1 ⎢3 7. ⎢ ⎣2 0 ⎡ 2 ⎢1 ⎢ 8. ⎢1 ⎢ ⎣3 2 3.7 3 0 3 3 ⎤ 4 4⎥ ⎥. 1⎦ 5 1 1 2 3 3 3 0 1 4 1 ⎤ 1 3⎥ ⎥ 0⎥ . ⎥ 1⎦ 3 ⎡ 1 9. ⎣2 3 2 1 3 1 0 1 4 5 1 2 5 7 ⎡ 2 10. ⎣0 2 4 2 6 0 1 1 10 3 13 The Solution of Homogeneous Systems of Linear Equations ⎤ 7 1⎦ . 8 ⎤ 8 1⎦ . 9 ⎡ 0 −1 ⎢ 0 ⎢0 11. ⎢ ⎢0 0 ⎣ 1 0 ⎡ 1 0 ⎢ 1 −1 ⎢ 12. ⎢ ⎢2 5 ⎣ 1 3 4 3 ⎤ ⎥ 1 2⎥ ⎥. 0 −1 ⎥ ⎦ 0 0 ⎤ 0 0 ⎥ 0 0⎥ ⎥. −1 0 ⎥ ⎦ 2 1 ⎡ 1 ⎤ 7 2 0 5 7⎥. ⎦ 0 0 3 5 0 3 1 1 2 3 3 4 4 5 7 ⎢ 13. ⎣ 0 0 ⎡ 1 ⎢2 ⎢ ⎢ 14. ⎢ 1 ⎢ ⎢3 ⎣ 4 155 ⎤ 1⎥ ⎥ ⎥ 2⎥. ⎥ 3⎥ ⎦ 5 The Solution of Homogeneous Systems of Linear Equations Having now introduced the echelon and row-reduced echelon forms of an m × n matrix A, we are in a position to discuss the nature of the solution set of the system of linear equations homogeneous and nonhomogeneous systems of equations a11 x1 + a12 x2 + · · · + a1n xn = b1 a21 x1 + a22 x2 + · · · + a2n xn = b2 · · · · · · · · · am1 x1 + am2 x2 + · · · + amn xn = bm, (20) which will be nonhomogeneous when at least one of the terms bi on the right is nonzero, and homogeneous when b1 = b2 = · · · = bm = 0. In this section we will only consider homogeneous systems. Rather than working with the full system of homogeneous equations corresponding to bi = 0, i = 1, 2, . . . , m in (20), it is more convenient to work with its coefficient matrix ⎡ a11 ⎢ a21 A=⎢ ⎣ am1 a12 a22 . . . am2 . . . . . . . . ⎤ . a1n . a2n ⎥ ⎥, ⎦ . . amn (21) which contains all the information about the system. The coefficients in the first column of A are multipliers of x1 , those in the second column are multipliers of x2 , . . . , and those in the nth column are multipliers of xn . Denote by AE either the echelon or the row-reduced echelon form of the coefficient matrix A. Then, as elementary row operations performed on a coefficient matrix are equivalent in all respects to performing the same operations on the corresponding full system of equations, the solution set of the matrix equation Ax = 0 (22) will be the same as the solution set of an echelon form of the homogeneous equations AE x = 0. (23)
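In practice the echelon form AE in (23) is produced by the row reduction of Section 3.5, and computer algebra systems provide it directly, together with the rank and a basis for the solution set. A minimal sketch using Python's sympy library (our choice of tool; the matrix below is an arbitrary illustration, not one from the text):

```python
from sympy import Matrix

# An arbitrary homogeneous system Ax = 0 used only for illustration.
A = Matrix([[1, 2, 3],
            [2, 4, 6],
            [1, 1, 1]])

AE, pivots = A.rref()    # row-reduced echelon form and its pivot columns
print(AE)                # Matrix([[1, 0, -1], [0, 1, 2], [0, 0, 0]])
print(A.rank())          # 2, so one variable can be chosen as a parameter
print(A.nullspace())     # [Matrix([[1], [-2], [1]])]: x = k [1, -2, 1]^T
```

Here rank(A) = 2 is less than the number of unknowns n = 3, so one variable remains free and the nontrivial solutions form the one-parameter family shown in the last line; this is exactly the situation analysed in the cases that follow.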
trivial solution

It is obvious that x = 0, corresponding to x = [0, 0, . . . , 0]T, is always a solution of (22) and, of course, of (23); it is called the trivial solution of the homogeneous system of equations. To discover when nontrivial solutions exist it is necessary to work with the equivalent echelon form of the equations given in (23). If rank(A) = r, the first r rows of AE will be nonzero rows, and the last m − r rows will be zero rows. As there are m rows in A, we must consider the three separate cases (a) m < n, (b) m = n, and (c) m > n.

Case (a): m < n. In this case there are more variables than equations. As rank(A) = r, and there are m equations, it follows that r = rank(A) ≤ m. The system in (22) will thus contain only r linearly independent equations, corresponding to the first r rows of AE. So, working with system (23), we see that r of the variables x1, x2, . . . , xn will be determined in terms of the remaining n − r variables regarded as parameters (see Example 3.23).

Case (b): m = n. In this case the number of variables equals the number of equations. If rank(A) = r < n we have the same situation as in Case (a), and r of the variables x1, x2, . . . , xn will be determined by the system of equations in (23) in terms of the remaining n − r variables regarded as parameters. However, if r = n, only the trivial solution x = 0 is possible, because in this case AE becomes the unit matrix In, from which it follows directly that x = 0.

Case (c): m > n. In this case the number of equations exceeds the number of variables and r = rank(A) ≤ n. This is essentially the same situation as in Case (b), because if r = rank(A) < n, then r of the variables x1, x2, . . . , xn will be determined by the system of equations in (22) in terms of the remaining n − r variables regarded as parameters, while if rank(A) = n only the trivial solution x = 0 is possible.

The practical determination of solution sets of homogeneous systems of linear equations is illustrated in the next example.

EXAMPLE 3.23

Find the solution sets of the homogeneous systems of linear equations with coefficient matrices given by:

(a) $A = \begin{bmatrix} 1 & 2 & 1 & 7 & 0 \\ 3 & 6 & 4 & 24 & 3 \\ 1 & 4 & 4 & 12 & 3 \end{bmatrix}$,  (b) $A = \begin{bmatrix} 1 & 3 & 2 \\ 2 & 1 & 0 \\ 1 & 2 & 1 \end{bmatrix}$,  (c) $A = \begin{bmatrix} 2 & 3 & 6 & 1 \\ 1 & 4 & 2 & 2 \\ 4 & 11 & 10 & 5 \\ 1 & 0 & 1 & 1 \end{bmatrix}$,

(d) $A = \begin{bmatrix} 1 & 4 & 1 & 2 \\ 1 & 3 & 0 & 1 \\ 2 & 1 & 1 & 1 \\ 4 & 9 & 3 & 5 \\ 5 & 5 & 2 & 3 \end{bmatrix}$,  (e) $A = \begin{bmatrix} 1 & 2 & 3 & 1 & 4 & 3 \\ 0 & 1 & 3 & 0 & 1 & 5 \\ 3 & 1 & 2 & 3 & 1 & 4 \end{bmatrix}$.

Solution

(a) The row-reduced echelon form of the matrix is

$A_E = \begin{bmatrix} 1 & 0 & 0 & 8 & 3 \\ 0 & 1 & 0 & -2 & -3 \\ 0 & 0 & 1 & 3 & 3 \end{bmatrix}$,
  • 175. Section 3.7 The Solution of Homogeneous Systems of Linear Equations 157 showing that rank(A) = 3. This corresponds to the following three equations between the five variables x1 , x2 , x3 , x4 , and x5 : x1 + 8x4 + 3x5 = 0, x2 − 2x4 − 3x5 = 0, x3 + 3x4 + 3x5 = 0. and Letting x4 = α and x5 = β be arbitrary numbers (parameters) allows the solution set to be written x1 = −8α − 3β, x2 = 2α + 3β, x3 = −3α − 3β, x4 = α, x5 = β. (b) The row-reduced echelon form of the matrix is ⎡ ⎤ 1 0 0 AE = ⎣0 1 0⎦ , 0 0 1 showing that rank(A) = 3. This corresponds to the trivial solution x1 = x2 = x3 = 0. (c) The row-reduced echelon form of the matrix is ⎡ ⎤ 1 0 0 20/13 ⎢0 1 0 5/13 ⎥ ⎥ AE = ⎢ ⎣0 0 1 −7/13 ⎦ , 0 0 0 0 showing that rank(A) = 3. This corresponds to the solution set x1 + (20/13)x4 = 0, x2 + (5/13)x4 = 0, and x3 − (7/13)x4 = 0. Setting x4 = k, an arbitrary number (a parameter), shows the solution set to be given by x1 = −(20/13)k, x2 = −(5/13)k, x3 = (7/13)k, and x4 = k. (d) The row-reduced echelon form of the matrix is ⎡ ⎤ 1 0 0 0 ⎢0 1 0 1/3⎥ ⎢ ⎥ AE = ⎢0 0 1 2/3⎥ , ⎢ ⎥ ⎣0 0 0 0 ⎦ 0 0 0 0 showing that rank(A) = 3. This corresponds to the following three equations for the four variables x1 , x2 , x3 , and x4 : x1 = 0, x2 + (1/3)x4 = 0, and x3 + (2/3)x4 = 0. Setting x4 = k, an arbitrary number (a parameter), shows the solution set to be given by x1 = 0, x2 = −k/3 = 0, x3 = −2k/3, and x4 = k. (e) The row-reduced echelon form of the matrix is ⎡ ⎤ 1 0 0 1 −1/4 1/2 AE = ⎣0 1 0 0 13/4 −5/2⎦ , 0 0 1 0 −3/4 5/2 showing that rank(A) = 3. This corresponds to the following three equations for the six variables x1 to x6 : x1 + x4 − (1/4)x5 + (1/2)x6 = 0, x2 + (13/4)x5 − (5/2)x6 = 0 x3 − (3/4)x5 + (5/2)x6 = 0.
  • 176. 158 Chapter 3 Matrices and Systems of Linear Equations Setting x4 = α, x5 = β, and x6 = γ , where α, β, and γ are arbitrary numbers (parameters), shows the solution set to be given by x1 = −α + (1/4)β − (1/2)γ , x2 = −(13/4)β + (5/2)γ , x4 = α, x5 = β, and x6 = γ . Summary x3 = (3/4)β − (5/2)γ This section made use of the rank of a matrix to determine when a nontrivial solution of a linear system of homogeneous linear algebraic equations exists and, when it does, its precise form. EXERCISES 3.7 In Exercises 1 through 10, use the given form of the matrix A to find the solution set of the associated homogeneous linear system of equations Ax = 0. ⎤ ⎡ ⎤ ⎡ 1 2 4 1 1 3 2 1 1 ⎢0 3 1 3⎥ 1. ⎣1 1 0 1 2⎦. ⎥ 3. ⎢ ⎣1 4 1 3⎦. 0 1 2 1 3 ⎤ ⎡ 2 6 5 4 1 2 0 1 1 ⎤ ⎡ ⎢0 3 1 0 1⎥ 1 2 1 0 ⎥ 2. ⎢ ⎣2 0 2 0 1⎦. ⎢2 1 0 1⎥ ⎥ 4. ⎢ 1 0 3 1 1 ⎣0 3 5 1⎦. 1 0 1 5 3.8 ⎡ 1 ⎢2 ⎢ 5. ⎢1 ⎢ ⎣3 2 ⎡ 2 ⎢1 ⎢ 6. ⎢0 ⎢ ⎣1 0 ⎡ 1 ⎢0 ⎢ 7. ⎣ 1 2 3 1 0 1 3 ⎡ ⎤ 4 3⎥ ⎥ 2⎥. ⎥ 1⎦ 1 1 2 1 3 4 1 3 4 1 1 ⎤ 3 0⎥ ⎥ 2⎥. ⎥ 2⎦ 1 5 1 2 3 2 4 1 0 2 1 0 1 1 0 0 1 3 1 2 0 ⎤ 2 1⎥ ⎥. 0⎦ 2 1 ⎢2 ⎢ 8. ⎣ 5 2 ⎡ 1 9. ⎣2 0 ⎡ 1 ⎢2 10. ⎢ ⎣0 1 4 1 6 1 1 3 7 0 ⎤ 0 1⎥ ⎥. 2⎦ 1 1 3 1 5 1 0 0 2 1 3 5 1 0 2 1 2 3 1 0 0 1 ⎤ 0 1 1 3⎦. 3 0 ⎤ 1 2⎥ ⎥. 3⎦ 2 The Solution of Nonhomogeneous Systems of Linear Equations We now turn our attention to the solution of the nonhomogeneous system of equations in (20) that may be written in the matrix form Ax = b, (24) where A is an m × n matrix and b is an m × 1 nonzero column vector. In many respects the arguments we now use parallel the ones used when seeking the form of the solution set for a homogeneous system, but there are important differences. This time, rather than working with the matrix A, we must work with the augmented matrix (A, b) and use elementary row operations to transform it into either an echelon or a row-reduced echelon form that will be denoted by (A, b)E . When this is done, system (24) and the echelon form corresponding to (A, b)E will, of course, each have the same solution set. It is important to recognize that rank(A) is not necessarily equal to rank (A, b)E , so that in general rank(A) ≤ rank((A, b)E ). The significance of this observation will become clear when we seek solutions of systems like (24).
Case (a): m < n. In this case there are more variables than equations, and it must follow that rank((A, b)E) ≤ m. If rank(A) = rank((A, b)E) = r, it follows that r of the equations in (24) are linearly independent and m − r are linear combinations of these r equations. This means that the first r rows of (A, b)E are linearly independent, while the last m − r rows are rows of zeros. Thus, r of the variables x1 to xn will be determined by the equations corresponding to these r nonzero rows, in terms of the remaining n − r variables as parameters. It can happen, however, that rank(A) = r < rank((A, b)E), and then the situation is different, because one or more of the rows following the r th row will have zeros in their first n entries and a nonzero number as their last entry. When interpreted as equations, these rows imply contradictions, because they assert expressions such as 0 = c with c ≠ 0, which are impossible. Thus, no solution will exist if rank(A) ≠ rank((A, b)E).

Case (b): m = n. In this case the number of variables equals the number of equations, and it must follow that rank((A, b)E) ≤ n. The situation now parallels that of Case (a), because if rank(A) = rank((A, b)E) = r < n, then r of the equations in (24) will be linearly independent, while m − r will be linear combinations of these r equations. So, as before, the first r rows of (A, b)E will be linearly independent while the last m − r rows will be rows of zeros. Thus, r of the variables x1 to xn will be determined by the equations corresponding to these r nonzero rows in terms of the remaining n − r variables as parameters. In the case r = n, the solution will be unique, because then AE = I. Finally, if rank(A) ≠ rank((A, b)E), it follows, as in Case (a), that no solution will exist.

Case (c): m > n. In this case there are more equations than variables, and it must follow that rank(A) ≤ n. If rank(A) = rank((A, b)E) = r, it follows, as in Case (b), that r of the equations in (24) are linearly independent while m − r are linear combinations of these r equations. Thus, again, the first r rows of (A, b)E will be linearly independent while the last m − r rows will be rows of zeros. Consequently, r of the variables x1 to xn will be determined by the equations corresponding to these r nonzero rows in terms of the remaining n − r variables as parameters. If rank(A) ≠ rank((A, b)E), then as before no solution will exist.

These considerations bring us to the definition of consistent and inconsistent systems of nonhomogeneous equations, with consistent systems having solutions, sometimes in terms of parameters, and inconsistent systems having no solution.

consistent and inconsistent systems

Consistent and inconsistent nonhomogeneous systems
The nonhomogeneous system Ax = b is said to be consistent when it has a solution; otherwise, it is said to be inconsistent.

As with homogeneous systems, the practical determination of solution sets of nonhomogeneous systems of linear equations will be illustrated by means of examples.
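Before turning to those examples, the rank test just described can be summarized computationally. The sketch below is not part of the text: it assumes Python with sympy, the helper name classify is purely illustrative, and the data are those of part (b) of Example 3.24, which follows.

```python
# A hedged sketch of the consistency test: compare rank(A) with the rank of the
# augmented matrix (A, b), then count free parameters.  sympy is assumed.
from sympy import Matrix

def classify(A: Matrix, b: Matrix) -> str:
    n = A.cols                             # number of unknowns
    r_A = A.rank()
    r_Ab = Matrix.hstack(A, b).rank()      # rank of the augmented matrix (A, b)
    if r_A != r_Ab:
        return "inconsistent: no solution"
    if r_A == n:
        return "consistent: unique solution"
    return f"consistent: solution with {n - r_A} free parameter(s)"

# Data of Example 3.24(b): three equations in three unknowns.
A = Matrix([[1, 3, 2], [2, 1, 0], [1, 2, 1]])
b = Matrix([2, 1, -3])
print(classify(A, b))                      # expected: consistent: unique solution
```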
  • 178. 160 Chapter 3 Matrices and Systems of Linear Equations EXAMPLE 3.24 Find the solution sets for each of the following augmented matrices (A, b), where the matrices A are those given in Example 3.23. ⎡ ⎤ ⎡ ⎤ 1 2 1 7 0 1 1 3 2 2 ⎥ ⎢ ⎥ ⎢ 1⎦ (a) (A, b) = ⎣3 6 4 24 3 0⎦ (b) (A, b) = ⎣2 1 0 1 2 1 −3 1 4 4 12 3 3 ⎤ ⎡ 1 4 1 2 2 ⎡ ⎤ 2 3 6 1 2 ⎢1 3 0 1 0⎥ ⎥ ⎢ ⎥ ⎢ ⎢1 4 ⎥ ⎢ 2 2 3⎥ ⎢ ⎥ (d) (A, b) = ⎢2 1 1 1 3⎥ (c) (A, b) = ⎢ ⎥ ⎥ ⎢ ⎥ ⎢ ⎣4 11 10 5 1⎦ ⎣4 9 3 5 7⎦ 1 0 1 1 2 5 5 2 3 0 ⎡ ⎤ 1 2 3 1 4 3 −2 ⎢ ⎥ 0⎦ . (e) (A, b) = ⎣0 1 3 0 1 5 3 1 2 3 1 4 1 Solution (a) In this case, ⎡ 1 0 0 ⎢ (A, b)E = ⎣0 1 0 0 0 1 8 3 −2 −3 3 3 −7 ⎤ ⎥ 11/2⎦ . −3 As rank(A, b)E = 3, and the rank of matrix A is the rank of the matrix formed by deleting the last column of (A, b)E , it follows that rank(A) = 3. So rank(A, b)E = rank(A), showing the equations to be consistent, so they have a solution. If we remember that the first column contains the coefficients of x1 , the second column the coefficients of x2 , . . . , and the fifth column the coefficients of x5 , while the last column contains the nonhomogeneous terms, we can see that the matrix (A, b)E is equivalent to the three equations x1 + 8x4 + 3x5 = −7, x2 − 2x4 − 3x5 = 11/2, x3 + 3x4 + 3x5 = −3. So, if we set x4 = α and x5 = β, with α and β arbitrary numbers (parameters), the solution set becomes x1 = −8α − 3β − 7, x2 = 2α + 3β + 11/2, x4 = α and x5 = β. x3 = −3α − 3β − 3, (b) In this case, ⎡ 1 ⎢ (A, b)E = ⎣0 0 0 0 1 0 0 1 ⎤ 9 ⎥ −17⎦ . 22 Here A is a 3 × 3 matrix and rank(A) = rank((A, b)E ) = 3, so the equations are consistent and the solution is unique. The solution set is seen to be x1 = 9, x2 = −17, and x3 = 22.
  • 179. Section 3.8 The Solution of Nonhomogeneous Systems of Linear Equations 161 (c) In this case, ⎡ 1 0 0 ⎢0 1 0 ⎢ (A, b)E = ⎢0 0 0 ⎢ ⎣0 0 0 0 0 0 20/13 5/13 −7/13 0 0 ⎤ 0 0⎥ ⎥ 0⎥ . ⎥ 1⎦ 0 This system has no solution because the equations are inconsistent. This follows from the fact that rank(A) = 3, as can be seen from the first four columns, while the five columns show that rank((A, b)E ) = 4, so that rank(A) = rank((A, b)E ). The inconsistency can be seen from the contradiction contained in the last row, which asserts that 0 = 1. (d) In this case ⎡ ⎤ 1 0 0 0 0 ⎢0 1 0 1/3 0⎥ ⎢ ⎥ (A, b)E = ⎢0 0 1 2/3 0⎥ . ⎢ ⎥ ⎣0 0 0 0 1⎦ 0 0 0 0 0 This system also has no solution because the equations are inconsistent. This follows from the fact that rank(A) = 3 and rank((A, b)E ) = 4, so that rank(A) = rank((A, b)E ). The inconsistency can again be seen from the contradiction in the last row, which again asserts that 0 = 1. (e) In this case ⎡ 1 ⎢ (A, b)E = ⎣0 0 0 0 1 −1/4 1 0 0 13/4 −5/2 0 1 0 −3/4 5/2 1/2 5/8 ⎤ ⎥ −21/8⎦ , 7/8 showing that rank(A) = rank((A, b)E ) = 3, so the equations are consistent. Reasoning as in (a) and setting x4 = α, x5 = β, and x6 = γ , with α, β, and γ arbitrary numbers (parameters), shows the solution set to be given by x1 = −α + (1/4)β − (1/2)γ + 5/8, x2 = −(13/4)β + (5/2)γ − 21/8, x3 = (3/4)β − (5/2)γ + 7/8, x4 = α, x5 = β, x6 = γ . general solution of a nonhomogeneous system A comparison of the corresponding solution sets in Examples 3.23 and 3.24 shows that whenever the nonhomogeneous system has a solution, it comprises the sum of the solution set of the corresponding homogeneous system, containing arbitrary parameters, and numerical constants contributed by the nonhomogeneous terms. This is no coincidence, because it is a fundamental property of nonhomogeneous linear systems of equations. The combination of solutions comprising the sum of a solution of the homogeneous system Ax = 0 containing arbitrary constants, and a particular fixed solution of the nonhomogeneous system Ax = b that is free from arbitrary constants, is called the general solution of a nonhomogeneous system. The result is important, so it will be recorded as a theorem.
  • 180. 162 Chapter 3 Matrices and Systems of Linear Equations THEOREM 3.9 General solution of a nonhomogeneous system The nonhomogeneous system of equations Ax = b for which rank(A) = rank ((A, b)E ) has a general solution of the form x = xH + x P , where xH is the general solution of the associated homogeneous system AxH = 0 and xP is a particular (fixed) solution of the nonhomogeneous system AxP = b. Proof Let x be any solution of the nonhomogeneous system Ax = b, and let xP be a solution of the nonhomogeneous system AxP = b that contains no arbitrary constants (a fixed solution). Then, as the equations are linear, A(x − xP ) = Ax − AxP = b − b = 0, showing that the difference xD = x − xP is itself a solution of the homogeneous system. Consequently, all solutions of the nonhomogeneous system are contained in the solution set of the homogeneous system to which xD belongs, and the theorem is proved. Summary This section used the rank of a matrix to determine when a solution of a linear system of nonhomogeneous equations exists and to determine its precise form. If the ranks of a matrix and an augmented matrix are equal, it was shown that a solution exists, furthermore, if there are n equations and the rank r < n, then r unknowns can be expressed in terms of arbitrary values assigned to the remaining n − r unknowns. The system was shown to have a unique solution when r = n, and no solution if the ranks of the matrix and the augmented matrix are different. EXERCISES 3.8 In Exercises 1 through 10 write down a system of equations with an appropriate number of unknowns x1 , x2 , . . . corresponding to the augmented matrix. Find the solution set when the equations are consistent, and state when the equations are inconsistent. ⎤ ⎡ ⎡ ⎤ 1 −2 1 3 11 1 3 1 1 0 ⎥ ⎢ ⎢ ⎥ ⎢1 1 3 2 1 ⎥ ⎢0 3 −2 1 11⎥ ⎢ ⎥ ⎢ ⎥ 3. ⎢ ⎥. ⎢ ⎥ ⎢1 1 0 3 1 ⎥ 1. ⎢2 1 0 4 23⎥ . ⎦ ⎣ ⎢ ⎥ ⎢ ⎥ 2 −1 2 21⎦ 2 0 2 1 0 ⎣3 1 −1 ⎡ 2 1 3 ⎢ 2. ⎢0 1 4 ⎣ 3 0 0 3 1 1 2 2 4 ⎤ 1 ⎥ 1⎥ . ⎦ 1 ⎡ 1 ⎢ 4. ⎢2 ⎣ 5 4 2 3 0 3 1 4 8 5 4 ⎤ ⎥ 2⎥ . ⎦ 8 ⎤ ⎡ 1 −1 2 −1 −4 ⎥ ⎢ 3 1 2 12⎥ ⎢2 ⎥ ⎢ ⎥ ⎢ 2 −2 3 15⎥ . 5. ⎢1 ⎢ ⎥ ⎢ ⎥ 1 −1 1 11⎦ ⎣3 1 ⎡ 1 ⎢ ⎢2 ⎢ ⎢ 6. ⎢0 ⎢ ⎢ ⎣2 ⎡ 1 −1 2 3 1 2 ⎤ 1 1 2 1 6 7 ⎥ 3⎥ ⎥ ⎥ 3⎥ . ⎥ ⎥ 5⎦ 1 −2 1 0 ⎤ 1 2 3 0 1 ⎢ ⎥ ⎢0 1 0 2 1 ⎥ ⎢ ⎥ 7. ⎢ ⎥. ⎢2 1 3 1 0 ⎥ ⎣ ⎦ 1 4 1 5 2 ⎡ 2 1 0 0 3 1 ⎤ ⎢ ⎥ 8. ⎢1 2 1 1 3 0⎥ . ⎣ ⎦ 0 1 2 5 1 2 3 ⎡ 1 ⎢ ⎢1 ⎢ 9. ⎢ ⎢2 ⎣ 0 ⎡ 1 2 1 1 2 1 1 3 5 4 ⎤ ⎥ 0⎥ ⎥ ⎥. 4⎥ ⎦ 1 3 1 1 2 1 ⎤ ⎢ ⎥ 10. ⎢1 −2 1 3 1 0⎥. ⎣ ⎦ 2 0 1 0 3 0
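The structure described by Theorem 3.9 can also be checked computationally. The following sketch is not part of the text; it assumes Python with sympy and combines the data of Examples 3.23(a) and 3.24(a): the terms containing the parameters in the solution of Example 3.24(a) form the general homogeneous solution xH, and the remaining constants form a particular solution xP.

```python
# A sketch (sympy assumed) of Theorem 3.9: every solution of A x = b has the
# form x = x_H + x_P, where A x_H = 0 and x_P is one fixed solution of A x = b.
from sympy import Matrix, Rational

A = Matrix([[1, 2, 1,  7, 0],
            [3, 6, 4, 24, 3],
            [1, 4, 4, 12, 3]])
b = Matrix([1, 0, 3])

# Particular solution: Example 3.24(a) with the parameters alpha = beta = 0.
x_P = Matrix([-7, Rational(11, 2), -3, 0, 0])
assert A * x_P == b

# Homogeneous part: the nullspace basis vectors, i.e. the parametric part of the
# solution found in Example 3.23(a).
n1, n2 = A.nullspace()
assert A * n1 == Matrix([0, 0, 0]) and A * n2 == Matrix([0, 0, 0])

# By linearity, A(x_P + alpha*n1 + beta*n2) = b + 0 + 0 = b for every alpha and
# beta, so x = x_H + x_P is the general solution, exactly as Theorem 3.9 states.
print("general solution: x = x_P + alpha*n1 + beta*n2")
```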
3.9 The Inverse Matrix

multiplicative inverse matrix

The operation of division is not defined for matrices. However, we will see that n × n matrices A for which detA ≠ 0 have associated with them an n × n matrix B, called the multiplicative inverse of A, with the property that AB = BA = I. The purpose of this section will be to develop ways of finding the multiplicative inverse of a matrix, which for simplicity is usually called the inverse matrix, but first we give a formal definition of the inverse of a matrix.

The inverse of a matrix
Let A and B be two n × n matrices. Then matrix A is said to be invertible and to have an associated inverse matrix B if AB = BA = I.

Interchanging the order of A and B in this definition shows that if B is the inverse of A, then A must be the inverse of B. To see that not all n × n matrices have inverses, it will be sufficient to try to find a matrix B such that the product AB = I, where

$A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.

The product AB is

$AB = \begin{bmatrix} a + 2c & b + 2d \\ a + 2c & b + 2d \end{bmatrix}$,

so if this product is to equal the 2 × 2 unit matrix I, it is necessary that

$\begin{bmatrix} a + 2c & b + 2d \\ a + 2c & b + 2d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.

Equating corresponding elements in the first columns shows that this can only hold if a + 2c = 1 and a + 2c = 0, while equating corresponding elements in the second columns shows that b + 2d = 0 and b + 2d = 1, which is impossible, so matrix A has no inverse. In this case detA = 0, and we will see later why the nonvanishing of detA is necessary if A is to have an inverse.

Nonsingular and singular matrices

singular and nonsingular n × n matrices

An n × n matrix is said to be nonsingular when its inverse exists, and to be singular when it has no inverse.

EXAMPLE 3.25

We have already seen that the matrix

$A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}$,
for which detA = 0, has no inverse and so is singular. However, in the case of the matrix A that follows, a simple matrix multiplication confirms that it has associated with it an inverse B, where

$A = \begin{bmatrix} 1 & 0 & 1 \\ -1 & 2 & 0 \\ 0 & 1 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 1 & -2 \\ 1 & 1 & -1 \\ -1 & -1 & 2 \end{bmatrix}$,

because AB = BA = I. Furthermore, detA ≠ 0, so A is nonsingular, as is B, and each is the inverse of the other.

Before proceeding further it is necessary to establish that, when it exists, the inverse matrix is unique.

THEOREM 3.10 Uniqueness of the inverse matrix
A nonsingular matrix A has a unique inverse.

Proof Suppose, if possible, that the nonsingular n × n matrix A has the two different inverses B and C. Then as AC = I, we have

B = BI = B(AC) = (BA)C = IC = C,

showing that B = C, so the inverse matrix is unique.

It is convenient to denote the inverse of a nonsingular n × n matrix A by the symbol A−1. This is suggested by the exponentiation notation (raising to a power), because if for the moment we write A = A1, then AA−1 = A1A−1 = I, showing that exponents may be combined in the usual way, with the understanding that A1A−1 = A(1−1) = A0 = I.

THEOREM 3.11 Basic properties of inverse matrices

basic properties of the inverse matrix

(i) The unit matrix I is its own inverse, so I = I−1.
(ii) If A is nonsingular, so also is A−1, and (A−1)−1 = A.
(iii) If A is nonsingular, so also is AT, and (A−1)T = (AT)−1.
(iv) If A and B are nonsingular n × n matrices, so is AB, and (AB)−1 = B−1A−1.
(v) If A is nonsingular, then (A−1)m = (Am)−1 for m = 1, 2, . . . .

Proof We prove only (i) and (iv), and leave the proofs of (ii), (iii), and (v) as exercises. The proof of (i) is almost immediate, because I2 = I, showing that I = I−1. To prove (iv) we premultiply B−1A−1 by AB to obtain

ABB−1A−1 = AIA−1 = AA−1 = I,

which shows that (AB)−1 is B−1A−1, so the proof is complete.

A simple method of finding the inverse of an n × n matrix is by means of elementary row operations, but to justify the method we first need the following theorem.
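As a brief computational aside before that theorem: the properties listed in Theorem 3.11 are easy to spot-check with a computer algebra system. The sketch below is not part of the text; it assumes Python with sympy, uses the matrices of Example 3.25, and the matrix C is an arbitrarily chosen nonsingular matrix introduced only for the check of property (iv).

```python
# A sketch (sympy assumed) spot-checking properties (iii) and (iv) of Theorem 3.11.
from sympy import Matrix, eye

A = Matrix([[1, 0, 1], [-1, 2, 0], [0, 1, 1]])     # the matrix of Example 3.25
B = A.inv()
assert B == Matrix([[2, 1, -2], [1, 1, -1], [-1, -1, 2]])   # the inverse given in the example
assert A * B == eye(3) and B * A == eye(3)

# (iii): the inverse of the transpose equals the transpose of the inverse.
assert (A.T).inv() == (A.inv()).T

# (iv): (A C)^(-1) = C^(-1) A^(-1) for another nonsingular matrix C.
C = Matrix([[1, 2, 0], [0, 1, 1], [1, 0, 1]])      # illustrative choice, det C = 3
assert (A * C).inv() == C.inv() * A.inv()
print("properties (iii) and (iv) hold for these matrices")
```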
  • 183. Section 3.9 THEOREM 3.12 The Inverse Matrix 165 Elementary row operation matrices are nonsingular Every n × n matrix E that represents an elementary row operation is nonsingular. Proof Every n × n matrix E that represents an elementary row operation is derived from the unit matrix I by means of one of the three operations defined at the start of Section 3.4. So, as rank(I) = n and E and I are row similar, it follows that rank(E) = n, and so E is also nonsingular. finding an inverse matrix using elementary row operations We can now describe an elementary way of finding an inverse matrix by means of elementary row transformations. Let A be a nonsingular n × n matrix, and let E1 , E2 , . . . , Em represent a sequence of elementary row operations of Types I, II, and III that reduces A to I, so that EmEm−1 . . . E2 E1 A = I. Then postmultiplying this result by A−1 gives EmEm−1 . . . E2 E1 I = A−1 , so A−1 is given by A−1 = EmEm−1 · · · E2 E1 I, where the product of the first m matrices on the right is nonsingular because of Theorem 3.11. Expressed in words, this result states that when a sequence of elementary row operations is used to reduce a nonsingular matrix A to the unit matrix I, performing the same sequence of elementary row operations on I, in the same order, will generate the inverse matrix A−1 . If matrix A is singular, this will be indicated by the generation of either a complete row or a complete column of zeros before I is reached. If A is nonsingular, it is reducible to the unit matrix I, and clearly detA = 0. However, if A is singular, the attempt to reduce it to I will generate either a row or a column of zeros, so that then detA = 0. The vanishing or nonvanishing of detA provides a simple and convenient test for the singularity or nonsingularity of A whenever n is small, say n ≤ 3, because only then is it a simple matter to calculate detA. The practical way in which to implement this result is not to use the matrices Ei to reduce A to I, but to perform the operations directly on the rows of the partitioned matrix (A, I), because when A in the left half of the partitioned matrix has been reduced to I, the matrix I in the right half will have been transformed into A−1 . EXAMPLE 3.26 Use elementary row operations to find A−1 given that ⎡ 1 A = ⎣ −1 0 0 2 1 ⎤ 1 0⎦. 1
  • 184. 166 Chapter 3 Matrices and Systems of Linear Equations Solution We form the augmented matrix (A, I) and proceed as described earlier. ⎡ ⎤ ⎤ ⎡ 1 0 1 1 0 0 1 0 1 1 0 0 ∼ ⎥ ⎢ ⎥ ⎢ (A, I) = ⎣−1 2 0 0 1 0⎦ add row 1 ⎣0 2 1 1 1 0⎦ to row 2 0 1 1 0 0 1 0 1 1 0 0 1 ⎡ 1 0 1 ∼ subtract row 3 ⎢0 1 0 ⎣ from row 2 0 1 1 1 0 1 1 0 0 ⎤ ⎡ 1 0 1 0 ∼ ⎥ subtract row 2 ⎢ −1⎦ ⎣0 1 0 from row 3 1 0 0 1 ⎡ 1 ∼ subtract row 3 ⎢0 ⎣ from row 1 0 0 0 2 1 1 0 1 1 0 1 −1 −2 −1 1 0 1 1 −1 −1 ⎤ 0 ⎥ −1⎦ 2 ⎤ ⎥ −1⎦ . 2 The 3 × 3 matrix on the left of this row-equivalent partitioned matrix is now the unit matrix I, so the required inverse matrix is the one to the right of the partition, namely, ⎡ ⎤ 2 1 −2 1 −1⎦ . A−1 = ⎣ 1 −1 −1 2 Once A−1 has been obtained, it is always advisable to check the result by verifying that AA−1 = I. Before proceeding further we will use elementary matrices to provide the promised proof of Theorem 3.4(viii). the proof that det(AB) = detA detB Proof that det(AB) = detA detB Let E1 be a row matrix of Type I. Then if A is a nonsingular matrix, det(EI A) = −detA, because only a row interchange is involved. However, det(EI ) = −1, so det(EI A) = detEI detA. Similar arguments show this to be true for elementary row operation matrices of the other two types, so if E is an elementary row operation of any type, then det(EA) = detEdetA. If detA = 0, premultiplication by a sequence of elementary row operation matrices E1 , E2 , . . . , Er will reduce A to I, so performing them on I in the reverse order allows us to write A = E1 E2 . . . Er I = E1 E2 . . . Er . A repetition of the result det(EA) = detEdetA shows that detA = detE1 detE2 . . . detEr . If B is conformable for multiplication with A, using the preceding result we have det(AB) = det(E1 E2 . . . Er B) = detE1 detE2 . . . detEr detB, but detE1 detE2 . . . detEn = detA, and so det(AB) = detAdetB.
  • 185. Section 3.9 The Inverse Matrix 167 To complete the proof we must show this result remains true if A is singular, in which case detA = 0. When detA = 0, the attempt to reduce it to the unit matrix I by elementary row operation matrices will fail because at one stage it will produce a determinant in which a