JinHo Kwak Sungpyo Hong
Linear Algebra Second Edition
Springer Science+Business Media, LLC
JinHo Kwak Department of Mathematics Pohang University of Science and Technology Pohang, Kyungbuk 790-784 South Korea
Sungpyo Hong Department of Mathematics Pohang University of Science and Technology Pohang, Kyungbuk 790-784 South Korea
Library of Congress Cataloging-in-Publication Data Kwak, Jin Ho, 1948- Linear algebra / Jin Ho Kwak, Sungpyo Hong. -- 2nd ed. p. cm. Includes bibliographical references and index. ISBN 978-0-8176-4294-5 ISBN 978-0-8176-8194-4 (eBook) DOI 10.1007/978-0-8176-8194-4 1. Algebras, Linear. I. Hong, Sungpyo, 1948- II. Title.
QA184.2.K93 2004 512'.5--dc22
2004043751 CIP
AMS Subject Classifications: 15-01 ISBN 978-0-8176-4294-5
Printed on acid-free paper.
© 2004 Springer Science+Business Media New York. Originally published by Birkhäuser Boston in 2004. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher, Springer Science+Business Media, LLC, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to property rights.
987654321
SPIN 10979327
www.birkhauser-science.com
Preface to the Second Edition
This second edition is based on many valuable comments and suggestions from readers of the first edition. In this edition, the last two chapters are interchanged and also several new sections have been added. The following diagram illustrates the dependencies of the chapters.
(Diagram of chapter dependencies among Chapter 1 "Linear Equations and Matrices", Chapter 2 "Determinants", Chapter 4 "Linear Transformations", Chapter 5 "Inner Product Spaces", Chapter 6 "Diagonalization", Chapter 7 "Complex Vector Spaces", and Chapter 8 "Jordan Canonical Forms".)
The major changes from the first edition are the following.
(1) In Chapter 2, Section 2.5.1 "Miscellaneous examples for determinants" is added as an application.
(2) In Chapter 4, "A homogeneous coordinate system" is introduced for an application in computer graphics.
(3) In Chapter 5, Section 5.7 "Relations of fundamental subspaces" and Section 5.8 "Orthogonal matrices and isometries" are interchanged. "Least squares solutions," "Polynomial approximations," and "Orthogonal projection matrices" are collected together in Section 5.9, Applications.
(4) Chapter 6 is entitled "Diagonalization" instead of "Eigenvectors and Eigenvalues." In Chapters 6 and 8, "Recurrence relations," "Linear difference equations," and "Linear differential equations" are described in more detail as applications of diagonalization and the Jordan canonical forms of matrices.
(5) In Chapter 8, Section 8.5 "The minimal polynomial of a matrix" has been added to introduce more easily accessible computational methods for Aⁿ and eᴬ, with complete solutions of linear difference equations and linear differential equations.
(6) Chapter 8 "Jordan Canonical Forms" and Chapter 9 "Quadratic Forms" are interchanged for a smooth continuation of the diagonalization problem of matrices. Chapter 9 "Quadratic Forms" is extended to the complex case and includes many new figures.
(7) The errors and typos found to date in the first edition have been corrected.
(8) Problems are refined to supplement the worked-out illustrative examples and to enable the reader to check his or her understanding of new definitions or theorems. Additional problems are added in the last exercise section of each chapter. More answers, sometimes with brief hints, are added, including some corrections.
(9) In most examples, we begin with a brief explanatory phrase to enhance the reader's understanding.
This textbook can be used for a one- or two-semester course in linear algebra.
A theory-oriented one-semester course may cover Chapter 1, Sections 1.1-1.4, 1.6-1.7; Chapter 2, Sections 2.1-2.3; Chapter 3, Sections 3.1-3.6; Chapter 4, Sections 4.1-4.6; Chapter 5, Sections 5.1-5.4; Chapter 6, Sections 6.1-6.2; Chapter 7, Sections 7.1-7.4, with possible additions from Sections 1.8, 2.4, or 9.1-9.4. Selected applications are included in each chapter as appropriate. For a beginning applied algebra course, an instructor might include some of them in the syllabus at his or her discretion, depending on which area is to be emphasized or considered more interesting to the students. In definitions, we use boldface for the word being defined, and sometimes italics or a shadowbox to emphasize a sentence or undefined or post-defined terminology.
Acknowledgement: The authors would like to express our sincere appreciation for the many opinions and suggestions from the readers of the first edition, including many of our colleagues at POSTECH. The authors are also indebted to Ki Hang Kim and Fred Roush at Alabama State University and Christoph Dalitz at Hochschule Niederrhein for improving the manuscript and selecting the newly added subjects in this edition. Our thanks again go to Mrs. Kathleen Roush for grammatical corrections in the final manuscript, and also to the editing staff of Birkhäuser for gladly accepting the second edition for publication.
JinHo Kwak, Sungpyo Hong
E-mail: [email protected], [email protected]
January 2004, Pohang, South Korea
Preface to the First Edition
Linear algebra is one of the most important subjects in the study of science and engineering because of its widespread applications in social and natural science, computer science, physics, and economics. As one of the most useful courses in undergraduate mathematics, it has provided essential tools for industrial scientists. The basic concepts of linear algebra are vector spaces, linear transformations, matrices, and determinants, and they serve as an abstract language for stating ideas and solving problems. This book is based on lectures delivered over several years in a sophomore-level linear algebra course designed for science and engineering students. The primary purpose of this book is to give a careful presentation of the basic concepts of linear algebra as a coherent part of mathematics, and to illustrate its power and utility through applications to other disciplines. We have tried to emphasize computational skills along with mathematical abstractions, which have an integrity and beauty of their own. The book includes a variety of interesting applications, with many examples, not only to help students understand new concepts but also to practice wide applications of the subject in such areas as differential equations, statistics, geometry, and physics. Some of those applications may not be central to the mathematical development and may be omitted or selected in a syllabus at the discretion of the instructor. Most basic concepts and introductory motivations begin with examples in Euclidean space or solving a system of linear equations, and are gradually examined from different points of view to derive general principles. For students who have finished a year of calculus, linear algebra may be the first course in which the subject is developed in an abstract way, and we often find that many students struggle with the abstractions and miss the applications.
Our experience is that, to understand the material, students should practice with many problems, which are sometimes omitted. To encourage repeated practice, we placed in the middle of the text not only many examples but also some carefully selected problems, with answers or helpful hints. We have tried to make this book as easily accessible and clear as possible, but certainly there may be some awkward expressions in several ways. Any criticism or comment from the readers will be appreciated.
We are very grateful to many colleagues in Korea, especially to the faculty members in the mathematics department at Pohang University of Science and Technology (POSTECH), who helped us over the years with various aspects of this book. For their valuable suggestions and comments, we would like to thank the students at POSTECH, who have used photocopied versions of the text over the past several years. We would also like to acknowledge the invaluable assistance we have received from the teaching assistants who have checked and added some answers or hints for the problems and exercises in this book. Our thanks also go to Mrs. Kathleen Roush, who made this book much more readable with grammatical corrections in the final manuscript. Our thanks finally go to the editing staff of Birkhäuser for gladly accepting our book for publication.

Jin Ho Kwak, Sungpyo Hong
April 1997, Pohang, South Korea
Contents

Preface to the Second Edition
Preface to the First Edition

1  Linear Equations and Matrices
   1.1  Systems of linear equations
   1.2  Gaussian elimination
   1.3  Sums and scalar multiplications of matrices
   1.4  Products of matrices
   1.5  Block matrices
   1.6  Inverse matrices
   1.7  Elementary matrices and finding A⁻¹
   1.8  LDU factorization
   1.9  Applications
        1.9.1  Cryptography
        1.9.2  Electrical network
        1.9.3  Leontief model
   1.10 Exercises

2  Determinants
   2.1  Basic properties of the determinant
   2.2  Existence and uniqueness of the determinant
   2.3  Cofactor expansion
   2.4  Cramer's rule
   2.5  Applications
        2.5.1  Miscellaneous examples for determinants
        2.5.2  Area and volume
   2.6  Exercises

3  Vector Spaces
   3.1  The n-space ℝⁿ and vector spaces
   3.2  Subspaces
   3.3  Bases
   3.4  Dimensions
   3.5  Row and column spaces
   3.6  Rank and nullity
   3.7  Bases for subspaces
   3.8  Invertibility
   3.9  Applications
        3.9.1  Interpolation
        3.9.2  The Wronskian
   3.10 Exercises

4  Linear Transformations
   4.1  Basic properties of linear transformations
   4.2  Invertible linear transformations
   4.3  Matrices of linear transformations
   4.4  Vector spaces of linear transformations
   4.5  Change of bases
   4.6  Similarity
   4.7  Applications
        4.7.1  Dual spaces and adjoint
        4.7.2  Computer graphics
   4.8  Exercises

5  Inner Product Spaces
   5.1  Dot products and inner products
   5.2  The lengths and angles of vectors
   5.3  Matrix representations of inner products
   5.4  Gram-Schmidt orthogonalization
   5.5  Projections
   5.6  Orthogonal projections
   5.7  Relations of fundamental subspaces
   5.8  Orthogonal matrices and isometries
   5.9  Applications
        5.9.1  Least squares solutions
        5.9.2  Polynomial approximations
        5.9.3  Orthogonal projection matrices
   5.10 Exercises

6  Diagonalization
   6.1  Eigenvalues and eigenvectors
   6.2  Diagonalization of matrices
   6.3  Applications
        6.3.1  Linear recurrence relations
        6.3.2  Linear difference equations
        6.3.3  Linear differential equations I
   6.4  Exponential matrices
   6.5  Applications continued
        6.5.1  Linear differential equations II
   6.6  Diagonalization of linear transformations
   6.7  Exercises

7  Complex Vector Spaces
   7.1  The n-space ℂⁿ and complex vector spaces
   7.2  Hermitian and unitary matrices
   7.3  Unitarily diagonalizable matrices
   7.4  Normal matrices
   7.5  Application
        7.5.1  The spectral theorem
   7.6  Exercises

8  Jordan Canonical Forms
   8.1  Basic properties of Jordan canonical forms
   8.2  Generalized eigenvectors
   8.3  The power Aᵏ and the exponential eᴬ
   8.4  Cayley-Hamilton theorem
   8.5  The minimal polynomial of a matrix
   8.6  Applications
        8.6.1  The power matrix Aᵏ again
        8.6.2  The exponential matrix eᴬ again
        8.6.3  Linear difference equations again
        8.6.4  Linear differential equations again
   8.7  Exercises

9  Quadratic Forms
   9.1  Basic properties of quadratic forms
   9.2  Diagonalization of quadratic forms
   9.3  A classification of level surfaces
   9.4  Characterizations of definite forms
   9.5  Congruence relation
   9.6  Bilinear and Hermitian forms
   9.7  Diagonalization of bilinear or Hermitian forms
   9.8  Applications
        9.8.1  Extrema of real-valued functions on ℝⁿ
        9.8.2  Constrained quadratic optimization
   9.9  Exercises

Selected Answers and Hints
Bibliography
Index
Linear Algebra
1 Linear Equations and Matrices
1.1 Systems of linear equations

One of the central motivations for linear algebra is solving a system of linear equations. We begin with the problem of finding the solutions of a system of m linear equations in n unknowns of the following form:
    a11x1 + a12x2 + · · · + a1nxn = b1
    a21x1 + a22x2 + · · · + a2nxn = b2
                  ...
    am1x1 + am2x2 + · · · + amnxn = bm,
where x1, x2, ..., xn are the unknowns and the aij's and bi's denote constant (real or complex) numbers. A sequence of numbers (s1, s2, ..., sn) is called a solution of the system if x1 = s1, x2 = s2, ..., xn = sn satisfy each equation in the system simultaneously. When b1 = b2 = · · · = bm = 0, we say that the system is homogeneous. The central topic of this chapter is to examine whether or not a given system has a solution, and to find the solution if it has one. For instance, every homogeneous system has at least one solution, x1 = x2 = · · · = xn = 0, called the trivial solution. Naturally, one may ask whether such a homogeneous system has a nontrivial solution or not. If so, we would like to have a systematic method of finding all the solutions. A system of linear equations is said to be consistent if it has at least one solution, and inconsistent if it has no solution. For example, suppose that the system has only one linear equation

    a1x1 + a2x2 + · · · + anxn = b.
If ai = 0 for i = 1, ..., n, then the equation becomes 0 = b. Thus it has no solution if b ≠ 0 (nonhomogeneous), or has infinitely many solutions (any n numbers xi's can be a solution) if b = 0 (homogeneous). In any case, if all the coefficients of an equation in a system are zero, the equation is vacuously trivial. In this book, when we speak of a system of linear equations, we
always assume that not all the coefficients in each equation of the system are zero unless otherwise specified.

Example 1.1 The system of one equation in two unknowns x and y is

    ax + by = c,

in which at least one of a and b is nonzero. Geometrically, this equation represents a straight line in the xy-plane. Therefore, a point P = (x, y) (actually, the coordinates x and y) is a solution if and only if the point P lies on the line. Thus there are infinitely many solutions, namely all the points on the line.

Example 1.2 The system of two equations in two unknowns x and y is

    a1x + b1y = c1
    a2x + b2y = c2.
Solution: (I) Geometric method. Since the equations represent two straight lines in the xy-plane, only three types are possible, as shown in Figure 1.1:

    Case (1): x - y = -1, 2x - 2y = -2;   Case (2): x - y = -1, x - y = 0;   Case (3): x + y = 1, x - y = 0.

Figure 1.1. Three types of solution sets
Since a solution is a point lying on both lines simultaneously, by looking at the graphs in Figure 1.1, one can see that only the following three types of solution sets are possible: (1) the straight line itself if the lines coincide, (2) the empty set if the lines are parallel and distinct, (3) only one point if they cross at a point.
(II) Algebraic method. Case (1) (two lines coincide): Let the two equations represent the same straight line, that is, one equation is a nonzero constant multiple of the other. This condition is equivalent to

    a2 = λa1,  b2 = λb1,  c2 = λc1  for some nonzero constant λ.
In this case, if a point (s, t) satisfies one equation, then it automatically satisfies the other too. Thus, there are infinitely many solutions, namely all the points on the line.
Case (2) (two lines are parallel but distinct): In this case, a2 = λa1, b2 = λb1, but c2 ≠ λc1 for λ ≠ 0. (Note that the first two equalities are equivalent to a1b2 - a2b1 = 0.) Then no point (s, t) can satisfy both equations simultaneously, so that there are no solutions.
Case (3) (two lines cross at a point): Let the two lines have distinct slopes, which means a1b2 - a2b1 ≠ 0. In this case, they cross at a point (the only solution), which can be found by the elementary method of elimination and substitution. The following computation shows how to do this:
Without loss of generality, one may assume a1 ≠ 0 by interchanging the two equations if necessary. (If both a1 and a2 are zero, the system reduces to a system of one variable.)
(1) Elimination: The variable x can be eliminated from the second equation by adding -a2/a1 times the first equation to the second, to get

    ((a1b2 - a2b1)/a1) y = (a1c2 - a2c1)/a1.

(2) Since a1b2 - a2b1 ≠ 0, y can be found by multiplying the second equation by the nonzero number a1/(a1b2 - a2b1), to get

    y = (a1c2 - a2c1)/(a1b2 - a2b1).

(3) Substitution: Now, x is solved by substituting the value of y into the first equation, and we obtain the solution to the problem:

    x = (b2c1 - b1c2)/(a1b2 - a2b1),   y = (a1c2 - a2c1)/(a1b2 - a2b1).
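The closed-form solution of the 2×2 case can be sketched as a small routine. This is our illustration, not code from the text; the function name `solve_2x2` is our own, and the formulas are the ones derived in steps (2) and (3) above, valid only when a1b2 - a2b1 ≠ 0.

```python
# Hypothetical helper (ours, not the book's): solves
#   a1*x + b1*y = c1,   a2*x + b2*y = c2
# using the closed-form formulas of Example 1.2.
def solve_2x2(a1, b1, c1, a2, b2, c2):
    d = a1 * b2 - a2 * b1          # the quantity a1b2 - a2b1 from the text
    if d == 0:
        raise ValueError("lines coincide or are parallel: no unique solution")
    x = (b2 * c1 - b1 * c2) / d    # formula from step (3)
    y = (a1 * c2 - a2 * c1) / d    # formula from step (2)
    return x, y

# Case (3) of Figure 1.1: x + y = 1, x - y = 0 cross at (1/2, 1/2).
print(solve_2x2(1, 1, 1, 1, -1, 0))  # -> (0.5, 0.5)
```

When d = 0 the routine cannot distinguish Case (1) from Case (2); that requires checking the proportionality of c1 and c2 as in the algebraic method above.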
Note that the condition a1b2 - a2b1 ≠ 0 is necessary for the system to have only one solution. □
In Example 1.2, the original system of equations has been transformed into a simpler one through certain operations, called elimination and substitution, which is
just the solution of the given system. That is, if (x, y) satisfies the original system of equations, then it also satisfies the simpler system in (3), and vice versa. As in Example 1.2, we will see later that any system of linear equations may have either no solution, exactly one solution, or infinitely many solutions. (See Theorem 1.6.)
Note that an equation ax + by + cz = d, (a, b, c) ≠ (0, 0, 0), in three unknowns represents a plane in the 3-space ℝ³. The solution set includes
    {(x, y, 0) | ax + by = d}   in the xy-plane,
    {(x, 0, z) | ax + cz = d}   in the xz-plane,
    {(0, y, z) | by + cz = d}   in the yz-plane.
One can also examine the various possible types of the solution set of a system of three equations in three unknowns. Figure 1.2 illustrates three possible cases.
Figure 1.2. Three planes in ℝ³: infinitely many solutions, only one solution, no solutions
Problem 1.1 For a system of three linear equations in three unknowns,

    a11x + a12y + a13z = b1
    a21x + a22y + a23z = b2
    a31x + a32y + a33z = b3,

describe all the possible types of the solution set in the 3-space ℝ³.
1.2 Gaussian elimination

A basic idea for solving a system of linear equations is to transform the given system into a simpler one, keeping the solution set unchanged, and Example 1.2 shows an idea of how to do it. In fact, the basic operations used in Example 1.2 are essentially only the following three operations, called elementary operations:
(1) multiply a nonzero constant throughout an equation,
(2) interchange two equations,
(3) add a constant multiple of an equation to another equation.

It is not hard to see that none of these operations alters the solutions. That is, if the xi's satisfy the original equations, then they also satisfy those equations altered by the three operations, and vice versa. Moreover, each of the three elementary operations has its inverse operation, which is also an elementary operation:

(1') multiply the equation by the reciprocal of the same nonzero constant,
(2') interchange the two equations again,
(3') add the negative of the same constant multiple of the equation to the other.
Therefore, by applying a finite sequence of the elementary operations to the given original system, one obtains another new system, and by applying these inverse operations in reverse order to the new system, one can recover the original system. Since none of the three elementary operations alters the solutions, the two systems have the same set of solutions. In fact, a system may be solved by transforming it into a simpler system using the three elementary operations finitely many times. These arguments can be formalized in mathematical language.
Observe that in performing any of these three elementary operations, only the coefficients of the variables are involved, while the variables x1, x2, ..., xn and the equal sign "=" are simply repeated. Thus, keeping the places of the variables and "=" in mind, we just pick up the coefficients from the given system of equations and make a rectangular array of numbers as follows:

    [ a11  a12  ...  a1n  b1 ]
    [ a21  a22  ...  a2n  b2 ]
    [  .    .         .   .  ]
    [ am1  am2  ...  amn  bm ]

This matrix is called the augmented matrix of the system. The term matrix means just any rectangular array of numbers, and the numbers in this array are called the entries of the matrix. In the following sections, we shall discuss matrices in general. For the moment, we restrict our attention to the augmented matrix of a system. Within an augmented matrix, the horizontal and vertical subarrays
    [ ai1  ai2  ...  ain  bi ]    and    [ a1j ]
                                         [ a2j ]
                                         [  .  ]
                                         [ amj ]

are called the i-th row (matrix), which represents the i-th equation, and the j-th column (matrix), which consists of the coefficients of the j-th variable xj, of the augmented
matrix, respectively. The matrix consisting of the first n columns of the augmented matrix
    [ a11  a12  ...  a1n ]
    [ a21  a22  ...  a2n ]
    [  .    .         .  ]
    [ am1  am2  ...  amn ]

is called the coefficient matrix of the system. One can easily see that there is a one-to-one correspondence between the columns of the coefficient matrix and the variables of the system. Note also that the last column [b1 b2 ... bm]ᵀ of the augmented matrix represents the homogeneity of the system, and so no variable corresponds to it.
Since each row of the augmented matrix contains all the information of the corresponding equation of the system, we may deal with this augmented matrix instead of handling the whole system of linear equations, and the elementary operations may be applied to an augmented matrix just as they are applied to a system of equations. But in this case, the elementary operations are rephrased as the elementary row operations for the augmented matrix:

(1st kind) multiply a nonzero constant throughout a row,
(2nd kind) interchange two rows,
(3rd kind) add a constant multiple of a row to another row.
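The three kinds of elementary row operations can be sketched as small routines on an augmented matrix stored as a list of rows. This is our illustration, not code from the text; the function names `scale`, `swap`, and `add_multiple` are our own.

```python
# A minimal sketch (ours) of the three elementary row operations.
def scale(M, i, c):            # 1st kind: multiply row i by a nonzero constant c
    M[i] = [c * a for a in M[i]]

def swap(M, i, j):             # 2nd kind: interchange rows i and j
    M[i], M[j] = M[j], M[i]

def add_multiple(M, i, j, c):  # 3rd kind: add c times row j to row i
    M[i] = [a + c * b for a, b in zip(M[i], M[j])]

# Each operation is undone by its inverse operation of the same kind,
# so applying an operation and then its inverse recovers the matrix:
M = [[0.0, 2.0, 4.0, 2.0], [1.0, 2.0, 2.0, 3.0]]
swap(M, 0, 1); swap(M, 0, 1)            # interchange twice
scale(M, 0, 2.0); scale(M, 0, 1 / 2.0)  # scale by c, then by 1/c
add_multiple(M, 1, 0, -3.0); add_multiple(M, 1, 0, 3.0)
print(M)  # -> [[0.0, 2.0, 4.0, 2.0], [1.0, 2.0, 2.0, 3.0]]
```

This mirrors the observation in the text that every elementary row operation has an inverse of the same kind, which is why row-equivalent augmented matrices represent systems with the same solutions.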
The inverse row operations, which are also elementary row operations, are

(1st kind) multiply the row by the reciprocal of the same constant,
(2nd kind) interchange the two rows again,
(3rd kind) add the negative of the same constant multiple of the row to the other.
Definition 1.1 Two augmented matrices (or systems of linear equations) are said to be row-equivalent if one can be transformed to the other by a finite sequence of elementary row operations.
Note that, if a matrix B can be obtained from a matrix A by these elementary row operations, then one can obviously recover A from B by applying the inverse elementary row operations to B in reverse order. Therefore, the two corresponding systems have the same solutions:

Theorem 1.1 If two systems of linear equations are row-equivalent, then they have the same set of solutions.

The general procedure for finding the solutions is illustrated in the following example:
Example 1.3 Solve the system of linear equations:

         2y + 4z =  2
    x  + 2y + 2z =  3
    3x + 4y + 6z = -1.

Solution: One could work with the augmented matrix only. However, to compare the operations on the system of linear equations with those on the augmented matrix, we work on the system and the augmented matrix in parallel. Note that the associated augmented matrix for the system is

    [ 0  2  4   2 ]
    [ 1  2  2   3 ]
    [ 3  4  6  -1 ].
(1) Since the coefficient of x in the first equation is zero while that in the second equation is not zero, we interchange these two equations:

    x  + 2y + 2z =  3          [ 1  2  2   3 ]
         2y + 4z =  2          [ 0  2  4   2 ]
    3x + 4y + 6z = -1          [ 3  4  6  -1 ].
(2) Add -3 times the first equation to the third equation:

    x + 2y + 2z =   3          [ 1   2  2    3 ]
        2y + 4z =   2          [ 0   2  4    2 ]
      - 2y      = -10          [ 0  -2  0  -10 ].

Thus, the first variable x is eliminated from the second and the third equations. In this process, the coefficient 1 of the first unknown x in the first equation (row) is called the first pivot. Consequently, the second and the third equations have only the two unknowns y and z. Leave the first equation (row) alone; the same elimination procedure can be applied to the second and the third equations (rows). The pivot to eliminate y from the last equation is the coefficient 2 of y in the second equation (row).
(3) Add 1 times the second equation (row) to the third equation (row):
    x + 2y + 2z =  3          [ 1  2  2   3 ]
        2y + 4z =  2          [ 0  2  4   2 ]
             4z = -8          [ 0  0  4  -8 ].
The elimination process (i.e., (1): row interchange, (2): elimination of x from the last two equations (rows), and then (3): elimination of y from the last equation (row)) done so far to obtain this result is called forward elimination. After this forward elimination, the leftmost nonzero entries in the nonzero rows are called the pivots. Thus the pivots of the second and third rows are 2 and 4, respectively.
(4) Normalize the nonzero rows by dividing them by their pivots. Then the pivots are replaced by 1:
    x + 2y + 2z =  3          [ 1  2  2   3 ]
         y + 2z =  1          [ 0  1  2   1 ]
              z = -2          [ 0  0  1  -2 ].
The resulting matrix on the right-hand side is called a row-echelon form of the augmented matrix, and the 1's at the pivotal positions are called the leading 1's. The process so far is called Gaussian elimination.
The last equation gives z = -2. Substituting z = -2 into the second equation gives y = 5. Now, putting these two values into the first equation, we get x = -3. This process, i.e., eliminating the numbers above the leading 1's, is called back substitution. The computation is shown below:
(5) Add -2 times the third row to the second and the first rows:

    x + 2y      =  7          [ 1  2  0   7 ]
         y      =  5          [ 0  1  0   5 ]
              z = -2          [ 0  0  1  -2 ].

(6) Add -2 times the second row to the first row:

    x           = -3          [ 1  0  0  -3 ]
         y      =  5          [ 0  1  0   5 ]
              z = -2          [ 0  0  1  -2 ].
This resulting matrix is called the reduced row-echelon form of the augmented matrix; it is row-equivalent to the original augmented matrix and gives the solution to the system. The whole process used to obtain the reduced row-echelon form is called Gauss-Jordan elimination. □
In summary, by applying a finite sequence of elementary row operations, the augmented matrix for a system of linear equations can be transformed into its reduced row-echelon form, which is row-equivalent to the original one. Hence the two corresponding systems have the same solutions. From the reduced row-echelon form, one can easily decide whether the system has a solution or not, and find the solution of the given system if it is consistent.

Definition 1.2 A row-echelon form of an augmented matrix is of the following form:
(1) The zero rows, if they exist, come last in the order of rows.
(2) The first nonzero entries in the nonzero rows are 1, called the leading 1's.
(3) Below each leading 1 is a column of zeros. Thus, in any two consecutive nonzero rows, the leading 1 in the lower row appears farther to the right than the leading 1 in the upper row.
The reduced row-echelon form of an augmented matrix is of the form:
(4) Above each leading 1 is a column of zeros, in addition to a row-echelon form.
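The steps of Gauss-Jordan elimination can be sketched as a small routine on an augmented matrix stored as a list of rows. This is our sketch, not the book's code; it combines the row interchange, normalization by the pivot, and elimination of entries above and below each leading 1, as in steps (1)-(6) of Example 1.3.

```python
# A sketch (ours) of Gauss-Jordan elimination producing a reduced
# row-echelon form of an augmented matrix M (a list of rows).
def gauss_jordan(M):
    m, n = len(M), len(M[0])
    row = 0
    for col in range(n - 1):               # the last column holds the b's
        # find a row at or below `row` with a nonzero entry in this column
        pivot = next((r for r in range(row, m) if M[r][col] != 0), None)
        if pivot is None:
            continue                       # no pivot in this column
        M[row], M[pivot] = M[pivot], M[row]        # 2nd kind: interchange
        p = M[row][col]
        M[row] = [a / p for a in M[row]]           # 1st kind: make leading 1
        for r in range(m):                          # 3rd kind: clear column
            if r != row and M[r][col] != 0:
                c = M[r][col]
                M[r] = [a - c * b for a, b in zip(M[r], M[row])]
        row += 1
    return M

# Example 1.3:  2y + 4z = 2,  x + 2y + 2z = 3,  3x + 4y + 6z = -1
M = [[0, 2, 4, 2], [1, 2, 2, 3], [3, 4, 6, -1]]
R = gauss_jordan([r[:] for r in M])
print([row[-1] for row in R])  # -> [-3.0, 5.0, -2.0]
```

The last column of the reduced row-echelon form reproduces the solution x = -3, y = 5, z = -2 obtained in Example 1.3.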
Example 1.4 The first three augmented matrices below are in reduced row-echelon form, and the last one is just in row-echelon form.

    [ 1  0  2 ]   [ 1  2  0  0  1 ]   [ 1  0  2  0 ]   [ 1  2  0  1 ]
    [ 0  1  1 ]   [ 0  0  1  3  2 ]   [ 0  1  1  0 ]   [ 0  1  1  2 ]
    [ 0  0  0 ]   [ 0  0  0  0  0 ]   [ 0  0  0  1 ]   [ 0  0  1  3 ]
Recall that in an augmented matrix [A b], the last column b does not correspond to any variable. Thus , if the reduced row-echelon form of an augmented matrix for a nonhomogeneous system has a row of the form [ 0 0 ... 0 b ] with b f:. 0, then the associated equation is OXj + OX2 + . .. + OXn = b with b f:. 0, which means the system is inconsistent. If b = 0, then it has a row containing only O's, which can be neglected and deleted . In this example, the third matrix shows the former case, and the first two matrices show the latter case. 0 In the following example, we use Gauss-Jordan elimination again to solve a system which has infinitely many solutions . Example 1.5 Solve the following system of linear equations by Gauss-Jordan elimination. Xj + 3X2 2X3 = 3 2xj + 6X2 2X3 + 4X4 = 18 X2 + X3 + 3X4 = 10.
Solution: The augmented matrix for the system is
    [ 1 3 -2 0 |  3 ]
    [ 2 6 -2 4 | 18 ]
    [ 0 1  1 3 | 10 ] .
The Gaussian elimination begins with: (1) Adding -2 times the first row to the second produces
    [ 1 3 -2 0 |  3 ]
    [ 0 0  2 4 | 12 ]
    [ 0 1  1 3 | 10 ] .
(2) Note that the coefficient of x2 in the second equation is zero and that in the third equation is not. Thus, interchanging the second and the third rows produces
    [ 1 3 -2 0 |  3 ]
    [ 0 1  1 3 | 10 ]
    [ 0 0  2 4 | 12 ] .
(3) Dividing the third row by the pivot 2 produces a row-echelon form:

    [ 1 3 -2 0 |  3 ]
    [ 0 1  1 3 | 10 ]
    [ 0 0  1 2 |  6 ] .
We now continue with the back-substitution:
(4) Adding -1 times the third row to the second, and 2 times the third row to the first, produces

    [ 1 3 0 4 | 15 ]
    [ 0 1 0 1 |  4 ]
    [ 0 0 1 2 |  6 ] .
(5) Finally, adding -3 times the second row to the first produces the reduced row-echelon form:
    [ 1 0 0 1 | 3 ]
    [ 0 1 0 1 | 4 ]
    [ 0 0 1 2 | 6 ] .
The corresponding system of equations is

    x1           +  x4 = 3
         x2      +  x4 = 4
              x3 + 2x4 = 6.
This system can be rewritten as follows:

    x1 = 3 -  x4
    x2 = 4 -  x4
    x3 = 6 - 2x4.
Since there is no other condition on x4, all the other variables x1, x2, and x3 are uniquely determined once an arbitrary real value t ∈ R is assigned to x4 (R denotes the set of real numbers): thus the solutions can be written as

    (x1, x2, x3, x4) = (3 - t, 4 - t, 6 - 2t, t),  t ∈ R.  □
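The elimination performed in Example 1.5 can be sketched in code. The following is a minimal Gauss-Jordan routine written in plain Python for illustration (it is not from the text, and it ignores the pivoting refinements a numerical library would use); applied to the augmented matrix of Example 1.5 it reproduces the reduced row-echelon form found above.

```python
def rref(M):
    """Reduce an augmented matrix (list of rows) to reduced row-echelon form
    by Gauss-Jordan elimination."""
    M = [row[:] for row in M]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pr is None:
            continue
        M[pivot_row], M[pr] = M[pr], M[pivot_row]      # interchange rows
        p = M[pivot_row][col]
        M[pivot_row] = [x / p for x in M[pivot_row]]   # make the leading entry 1
        for r in range(rows):                          # clear the column elsewhere
            if r != pivot_row and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return M

# Augmented matrix of Example 1.5.
aug = [[1, 3, -2, 0, 3],
       [2, 6, -2, 4, 18],
       [0, 1, 1, 3, 10]]
R = rref(aug)   # rows [1,0,0,1,3], [0,1,0,1,4], [0,0,1,2,6]
```

The rows of `R` encode x1 + x4 = 3, x2 + x4 = 4, x3 + 2x4 = 6, exactly the reduced system obtained above.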
Note that if we look at the reduced row-echelon form in Example 1.5, the variables x1, x2, and x3 correspond to the columns containing leading 1's, while the column corresponding to x4 contains no leading 1. An augmented matrix of a system of linear equations may have more than one row-echelon form, but it has only one reduced row-echelon form (see Remark (2) on page 97 for a concrete proof). Thus the number of leading 1's in a system does not depend on the Gaussian elimination.
Definition 1.3 Among the variables in a system, the ones corresponding to the columns containing leading 1's are called the basic variables, and the ones corresponding to the columns without leading 1's, if there are any, are called the free variables.
Clearly the sum of the number of basic variables and that of free variables equals the total number of unknowns: the number of columns for the variables. In Example 1.4, the first and the last augmented matrices have only basic variables but no free variables, while the second one has two basic variables x1 and x3, and two
free variables x2 and x4. The third one has two basic variables x1 and x2, and only one free variable x3. In general, as we have seen in Example 1.5, a consistent system has infinitely many solutions if it has at least one free variable, and has a unique solution if it has no free variable. In fact, if a consistent system has a free variable (which always happens when the number of equations is less than that of unknowns), then by assigning an arbitrary value to the free variable, one always obtains infinitely many solutions.
Theorem 1.2 If a homogeneous system has more unknowns than equations, then it has infinitely many solutions.
Problem 1.2 Suppose that the augmented matrices for some systems of linear equations have been reduced to the following reduced row-echelon forms by elementary row operations. Solve the systems:
    (1) [ 1 0 0 |  5 ]      (2) [ 1 0 0 | -1 ]
        [ 0 1 0 | -2 ] ,        [ 0 1 3 |  6 ]
        [ 0 0 1 |  4 ]          [ 0 0 0 |  2 ] .
Problem 1.3 Solve the following systems of equations by Gaussian elimination. What are the pivots?

    (1)  -x +  y + 2z = 0        (2)  4x +  2y -  z = 1
         3x + 4y +  z = 0             3x + 10y + 3z = 5
         2x + 5y + 3z = 0                   3y +  z = 6.

    (3)   w +   x +  y + 2z = 3
        -3w + 17x + 8y - 5z = 1
         4w - 17x + 2y +  z = 1.
Problem 1.4 Determine the conditions on the bi so that each of the following systems has a solution.

    (1)   x + 2y + 6z = b1       (2)   x + 3y - 2z = b1
         3x - 3y - 2z = b2            4x + 2y + 3z = b2
               y + 4z = b3.                 y +  z = b3.
1.3 Sums and scalar multiplications of matrices

Rectangular arrays of real numbers arise in many real-world problems. Historically, it was the English mathematician J. J. Sylvester who first introduced the word "matrix" in the year 1848, as a name for an array of numbers; it is the Latin word for womb.
Definition 1.4 An m by n (written m x n) matrix is a rectangular array of numbers arranged into m (horizontal) rows and n (vertical) columns. The size of a matrix is specified by the number m of rows and the number n of columns.
In general, a matrix is written in the following form:

    A = [ a11 a12 ... a1n ]
        [ a21 a22 ... a2n ]
        [  :   :        :  ]
        [ am1 am2 ... amn ] ,

or just A = [aij] if the size of the matrix is clear from the context. The number aij is called the (i, j)-entry of the matrix A, and is written as aij = [A]ij. An m x 1 matrix is called a column (matrix) or sometimes a column vector, and a 1 x n matrix is called a row (matrix), or a row vector. In general, we use capital letters like A, B, C for matrices and small boldface letters like x, y, z for column or row vectors.
Definition 1.5 Let A = [aij] be an m x n matrix. The transpose of A is the n x m matrix, denoted by A^T, whose j-th column is taken from the j-th row of A: that is, [A^T]ij = [A]ji.
Example 1.6 (1) If

    A = [ 1 3 5 ] ,  then  A^T = [ 1 2 ]
        [ 2 4 6 ]                [ 3 4 ]
                                 [ 5 6 ] .

(2) The transpose of a column vector is a row vector and vice-versa:

    x = [ x1 ]
        [ x2 ]   <=>   x^T = [ x1 x2 ... xn ] .   □
        [  : ]
        [ xn ]
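Definition 1.5 translates directly into code. The sketch below uses plain Python with nested lists as matrices; this representation and the helper name are illustrative conventions, not anything prescribed by the text.

```python
def transpose(A):
    """Return A^T: the (i, j)-entry of the result is the (j, i)-entry of A."""
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

A = [[1, 3, 5],
     [2, 4, 6]]
At = transpose(A)   # [[1, 2], [3, 4], [5, 6]], as in Example 1.6
```

Applying `transpose` twice returns the original matrix, the identity (A^T)^T = A noted below.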
Definition 1.6 A matrix A = [aij] is called a square matrix of order n if the number of rows and the number of columns are both equal to n.
Definition 1.7 Let A be a square matrix of order n.
(1) The entries a11, a22, ..., ann are called the diagonal entries of A.
(2) A is called a diagonal matrix if all the entries except for the diagonal entries are zero.
(3) A is called an upper (lower) triangular matrix if all the entries below (above, respectively) the diagonal are zero.
The following matrices U and L are the general forms of the upper triangular and lower triangular matrices, respectively:

    U = [ a11 a12 ... a1n ]        L = [ a11  0  ...  0  ]
        [  0  a22 ... a2n ] ,          [ a21 a22 ...  0  ] .
        [  :   :        :  ]           [  :   :        :  ]
        [  0   0  ... ann ]            [ an1 an2 ... ann ]
1.3. Sums and scalar multiplications of matrices
13
Note that a matrix which is both upper and lower triangular must be a diagonal matrix, and the transpose of an upper (lower, respectively) triangular matrix is lower (upper, respectively) triangular. Definition 1.8 Two matrices A and B are said to be equal, written A = B , if their sizes are the same and their corresponding entries are equal : i.e., [A]ij = [B]ij for all i and j.
This definition allows us to write a matrix equation. A simple example is (A^T)^T = A, which holds by definition. Let M_{m x n}(R) denote the set of all m x n matrices with real entries. On M_{m x n}(R), one can define two operations, called the scalar multiplication and the sum of matrices, as follows:
Definition 1.9 (1) (Scalar multiplication) For an m x n matrix A = [aij] in M_{m x n}(R) and a scalar k in R (which is simply a real number), the scalar multiplication of k and A is defined to be the matrix kA such that [kA]ij = k[A]ij for all i and j: i.e., in expanded form,

    k [ a11 ... a1n ]   [ ka11 ... ka1n ]
      [  :        :  ] = [   :         :  ] .
      [ am1 ... amn ]   [ kam1 ... kamn ]

(2) (Sum of matrices) For two matrices A = [aij] and B = [bij] in M_{m x n}(R), the sum of A and B is defined to be the matrix A + B such that [A + B]ij = [A]ij + [B]ij for all i and j: i.e., in expanded form,

    [ a11 ... a1n ]   [ b11 ... b1n ]   [ a11+b11 ... a1n+b1n ]
    [  :        :  ] + [  :        :  ] = [    :            :    ] .
    [ am1 ... amn ]   [ bm1 ... bmn ]   [ am1+bm1 ... amn+bmn ]
The resulting matrices kA and A + B from these two operations again belong to M_{m x n}(R). In this sense, we say that M_{m x n}(R) is closed under the two operations. Note that matrices of different sizes cannot be added; for example, a sum

    [ 1 2 4 ] + [ a b c ]
                [ d e f ]

cannot be defined. If B is any matrix, then -B is by definition the scalar multiple (-1)B. Moreover, if A and B are two matrices of the same size, then the subtraction A - B is by definition the sum A + (-1)B. A matrix whose entries are all zeros is called a zero matrix, denoted by the symbol 0 (or 0_{m x n} when the size is emphasized). Clearly, the matrix sum has the same properties as the sum of real numbers. The real numbers in this context are traditionally called scalars, even though "numbers" is a perfectly good name and "scalar" sounds more technical. The following theorem lists the basic arithmetic properties of the sum and scalar multiplication of matrices.
Theorem 1.3 Suppose that the sizes of A, B and C are the same, and let k and l be scalars. Then the following arithmetic rules of matrices are valid:
(1) (A + B) + C = A + (B + C), (written as A + B + C)  (Associativity),
(2) A + 0 = 0 + A = A,
(3) A + (-A) = (-A) + A = 0,
(4) A + B = B + A  (Commutativity),
(5) k(A + B) = kA + kB,
(6) (k + l)A = kA + lA,
(7) (kl)A = k(lA).
Proof: We prove only equality (5); the remaining ones are left as exercises. For any (i, j),

    [k(A + B)]ij = k[A + B]ij = k([A]ij + [B]ij)
                 = k[A]ij + k[B]ij = [kA]ij + [kB]ij = [kA + kB]ij.  □

In particular, A + A = 2A, A + (A + A) = 3A, and inductively nA = (n - 1)A + A for any positive integer n.
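Entrywise definitions like Definition 1.9 are easy to check numerically. The following plain-Python sketch (helper names are illustrative, not from the text) implements the two operations and verifies rule (5) of Theorem 1.3 on sample matrices.

```python
def mat_add(A, B):
    """Entrywise sum: [A + B]_ij = [A]_ij + [B]_ij (sizes must agree)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_mul(k, A):
    """Scalar multiple: [kA]_ij = k [A]_ij."""
    return [[k * a for a in row] for row in A]

A = [[1, -2], [0, 3]]
B = [[4, 5], [-1, 2]]
k = 7

# Rule (5) of Theorem 1.3: k(A + B) == kA + kB.
assert scalar_mul(k, mat_add(A, B)) == mat_add(scalar_mul(k, A), scalar_mul(k, B))
# Rule (4): matrix sum is commutative.
assert mat_add(A, B) == mat_add(B, A)
```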
Definition 1.10 A square matrix A is said to be symmetric if A^T = A, or skew-symmetric if A^T = -A.
For example, the matrices

    A = [ a b ] ,   B = [  0  1  2 ]
        [ b c ]         [ -1  0  3 ]
                        [ -2 -3  0 ]

are symmetric and skew-symmetric, respectively. Notice that all the diagonal entries of a skew-symmetric matrix must be zero, since aii = -aii. By a direct computation, one can easily verify the following properties of the transpose of matrices:
Theorem 1.4 Let A and B be m x n matrices. Then

    (kA)^T = kA^T   and   (A + B)^T = A^T + B^T.
Problem 1.5 Prove the remaining parts of Theorem 1.3.
Problem 1.6 Find a matrix B such that A + B^T = (A - B)^T, where

    A = [  2 -3  0 ]
        [  4 -1  3 ]
        [ -1  0  1 ] .

Problem 1.7 Find a, b, c and d such that

    [ a b ]     [ a    2 ]   [ 2+b a+9 ]
    [ c d ] = 2 [ a+c  d ] + [ c+d  b  ] .
1.4 Products of matrices

The sum and the scalar multiplication of matrices were introduced in Section 1.3. In this section, we introduce the product of matrices. Unlike the sum of two matrices, the product of matrices is a little more complicated, in the sense that it can be defined for two matrices of different sizes. The product of matrices will be defined in three steps:
Step (1) Product of vectors: For a 1 x n row vector a = [a1 a2 ... an] and an n x 1 column vector x = [x1 x2 ... xn]^T, the product ax is a 1 x 1 matrix (i.e., just a number) defined by the rule

    ax = [a1 a2 ... an] [ x1 ]
                        [ x2 ] = [a1x1 + a2x2 + ... + anxn] = [ sum_{i=1}^n ai xi ] .
                        [  : ]
                        [ xn ]
Note that the number of entries of the first row vector is equal to the number of entries of the second column vector, so that the entry-wise multiplication is possible.
Step (2) Product of a matrix and a vector: For an m x n matrix

    A = [ a11 a12 ... a1n ]
        [  :   :        :  ]
        [ am1 am2 ... amn ]
with the row vectors ai's, and for an n x 1 column vector x = [x1 ... xn]^T, the product Ax is by definition an m x 1 matrix, whose m rows are computed according to Step (1):
    Ax = [ a11 a12 ... a1n ] [ x1 ]   [ a1x ]   [ sum_{i=1}^n a1i xi ]
         [  :   :        :  ] [  : ] = [  :  ] = [          :         ] .
         [ am1 am2 ... amn ] [ xn ]   [ amx ]   [ sum_{i=1}^n ami xi ]
Therefore, for a system of m linear equations in n unknowns, by writing the n unknowns as an n x 1 column matrix x and the coefficients as an m x n matrix A, the system may be expressed as a matrix equation Ax = b.
Step (3) Product of matrices: Let A be an m x n matrix and B an n x r matrix with columns b1, b2, ..., br, written as B = [b1 b2 ... br]. The product AB is defined to be an m x r matrix whose r columns are the products of A and the r columns of B, each computed according to Step (2) in corresponding order. That is,

    AB = [ Ab1 Ab2 ... Abr ] ,
which is an m x r matrix. Therefore, the (i, j)-entry [AB]ij of AB is

    [AB]ij = ai bj = ai1 b1j + ai2 b2j + ... + ain bnj = sum_{k=1}^n aik bkj .
This can be easily memorized as the sum of entry-wise multiplications of the boxed vectors in Figure 1.3.
Figure 1.3. The entry [AB]ij: the boxed i-th row of A paired with the boxed j-th column of B.
Example 1.7 Consider the matrices

    A = [ 2 3 ] ,   B = [ 1  2 0 ]
        [ 4 0 ]         [ 5 -1 0 ] .

The columns of AB are the products of A and each column of B:

    [ 2 3 ] [ 1 ] = [ 2·1 + 3·5 ] = [ 17 ] ,   [ 2 3 ] [  2 ] = [ 2·2 + 3·(-1) ] = [ 1 ] ,
    [ 4 0 ] [ 5 ]   [ 4·1 + 0·5 ]   [  4 ]     [ 4 0 ] [ -1 ]   [ 4·2 + 0·(-1) ]   [ 8 ]

    [ 2 3 ] [ 0 ] = [ 2·0 + 3·0 ] = [ 0 ] .
    [ 4 0 ] [ 0 ]   [ 4·0 + 0·0 ]   [ 0 ]

Therefore, AB is

    [ 2 3 ] [ 1  2 0 ] = [ 17 1 0 ]
    [ 4 0 ] [ 5 -1 0 ]   [  4 8 0 ] .

Since A is a 2 x 2 matrix and B is a 2 x 3 matrix, the product AB is a 2 x 3 matrix. If we concentrate, for example, on the (2, 1)-entry of AB, we single out the second row from A and the first column from B, and then multiply corresponding entries together and add them up, i.e., 4·1 + 0·5 = 4. □
Note that the product AB of A and B is not defined if the number of columns of A and the number of rows of B are not equal.
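The three-step definition reduces to the entry formula [AB]ij = sum_k aik bkj, which a short plain-Python sketch (illustrative, not from the text) can confirm on the matrices of Example 1.7.

```python
def mat_mul(A, B):
    """Product of an m x n and an n x r matrix by the entry formula
    [AB]_ij = sum_k A[i][k] * B[k][j]."""
    n = len(B)
    assert len(A[0]) == n, "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2, 3], [4, 0]]
B = [[1, 2, 0], [5, -1, 0]]
print(mat_mul(A, B))   # [[17, 1, 0], [4, 8, 0]], as computed in Example 1.7
```

The size assertion mirrors the closing remark: the product is undefined when the inner dimensions disagree.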
Remark: In Step (2), instead of defining the product of a matrix and a vector, one can alternatively define the product of a 1 x n row matrix a and an n x r matrix B, using the same rule as in Step (1), to obtain a 1 x r row matrix aB. Accordingly, an appropriate modification in Step (3) produces the same definition of the product of two matrices. We suggest that readers complete the details. (See Example 1.10.)
The identity matrix of order n, denoted by In (or I if the order is clear from the context), is the diagonal matrix whose diagonal entries are all 1, i.e.,

    In = [ 1 0 ... 0 ]
         [ 0 1 ... 0 ]
         [ :  :     : ]
         [ 0 0 ... 1 ] .

By a direct computation, one can easily see that A In = A = In A for any n x n matrix A.
The operations of scalar multiplication, sum and product of matrices satisfy many, but not all, of the arithmetic rules that real or complex numbers enjoy. The matrix 0_{m x n} plays the role of the number 0, and In plays that of the number 1 in the set of usual numbers. The rule that does not hold for matrices in general is commutativity AB = BA of the product, while commutativity of the matrix sum A + B = B + A always holds. The following example illustrates noncommutativity of the product of matrices.
Example 1.8 (Noncommutativity of the matrix product) Let

    A = [ 1  0 ]   and   B = [ 0 1 ]
        [ 0 -1 ]             [ 1 0 ] .

Then

    AB = [  0 1 ] ,   BA = [ 0 -1 ] ,
         [ -1 0 ]          [ 1  0 ]

which shows AB ≠ BA. □
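Noncommutativity is easy to exhibit numerically. The plain-Python check below uses a sample pair of 2 x 2 matrices chosen here for illustration (they are an assumption of this sketch, not necessarily the pair printed in the book's example).

```python
def mat_mul(A, B):
    # (i, j)-entry of AB: sum of entrywise products of row i of A and column j of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 0], [0, -1]]
B = [[0, 1], [1, 0]]
print(mat_mul(A, B))   # [[0, 1], [-1, 0]]
print(mat_mul(B, A))   # [[0, -1], [1, 0]]  -- so AB != BA
```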
The following theorem lists the basic arithmetic rules that hold for the matrix product.
Theorem 1.5 Let A, B, C be arbitrary matrices for which the matrix operations below can be defined, and let k be an arbitrary scalar. Then
(1) A(BC) = (AB)C, (written as ABC)  (Associativity),
(2) A(B + C) = AB + AC, and (A + B)C = AC + BC  (Distributivity),
(3) IA = A = AI,
(4) k(BC) = (kB)C = B(kC),
(5) (AB)^T = B^T A^T.
Proof: Each equality can be shown by direct computation of each entry on both sides. We illustrate this by proving (1) only, and leave the others to the reader. Assume that A = [aij] is an m x n matrix, B = [bkl] is an n x p matrix, and C = [cst] is a p x r matrix. We now compute the (i, j)-entry of each side of the equation. Note that BC is an n x r matrix whose (u, j)-entry is [BC]uj = sum_{v=1}^p buv cvj. Thus

    [A(BC)]ij = sum_{u=1}^n aiu [BC]uj = sum_{u=1}^n aiu sum_{v=1}^p buv cvj
              = sum_{u=1}^n sum_{v=1}^p aiu buv cvj.

Similarly, AB is an m x p matrix with the (i, v)-entry [AB]iv = sum_{u=1}^n aiu buv, and

    [(AB)C]ij = sum_{v=1}^p [AB]iv cvj = sum_{v=1}^p sum_{u=1}^n aiu buv cvj
              = sum_{u=1}^n sum_{v=1}^p aiu buv cvj.

This clearly shows that [A(BC)]ij = [(AB)C]ij for all i, j, and consequently A(BC) = (AB)C as desired. □
Problem 1.8 Give an example of matrices A and B such that (AB)^T ≠ A^T B^T.
Problem 1.9 Prove or disprove: if A is not a zero matrix and AB = AC, then B = C. Similarly, is it true or not that AB = 0 implies A = 0 or B = 0?
Problem 1.10 Show that any triangular matrix A satisfying A A^T = A^T A is a diagonal matrix.
Problem 1.11 For a square matrix A, show that
(1) A A^T and A + A^T are symmetric,
(2) A - A^T is skew-symmetric, and
(3) A can be expressed as the sum of a symmetric part B = (1/2)(A + A^T) and a skew-symmetric part C = (1/2)(A - A^T), so that A = B + C.
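The decomposition in Problem 1.11 (3) can be carried out numerically. A plain-Python sketch (helper names are illustrative):

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def sym_skew_parts(A):
    """Split a square matrix into B = (A + A^T)/2 (symmetric)
    and C = (A - A^T)/2 (skew-symmetric), so that A = B + C."""
    At = transpose(A)
    n = range(len(A))
    B = [[(A[i][j] + At[i][j]) / 2 for j in n] for i in n]
    C = [[(A[i][j] - At[i][j]) / 2 for j in n] for i in n]
    return B, C

A = [[1, 4], [-2, 5]]
B, C = sym_skew_parts(A)
# B == [[1, 1], [1, 5]] (symmetric), C == [[0, 3], [-3, 0]] (skew), and B + C == A.
```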
As an application of our results on matrix operations, one can prove the following important theorem:
Theorem 1.6 Any system of linear equations has either no solution, exactly one solution, or infinitely many solutions.
Proof: We have seen that a system of linear equations may be written in matrix form as Ax = b. This system may have either no solution or a solution. If it has only one solution, then there is nothing to prove. Suppose that the system has more than one solution, and let x1 and x2 be two different solutions, so that Ax1 = b and Ax2 = b. Let x0 = x1 - x2. Then x0 ≠ 0, and Ax0 = A(x1 - x2) = 0. Thus
    A(x1 + k x0) = Ax1 + k Ax0 = b + k0 = b.

This means that x1 + k x0 is also a solution of Ax = b for any k. Since there are infinitely many choices for k, Ax = b has infinitely many solutions. □
Problem 1.12 For which values of a does each of the following systems have no solution, exactly one solution, or infinitely many solutions?

    (1)  x + 2y -        3z = 4        (2)  x +  y +  z = 1
        3x -  y +        5z = 2            2x + 3y + az = 2
        4x +  y + (a^2-14)z = a + 2.        x + ay + 3z = 3.
1.5 Block matrices

In this section we introduce some techniques that are often helpful in manipulating matrices. A submatrix of a matrix A is a matrix obtained from A by deleting certain rows and/or columns of A. Using some horizontal and vertical lines, one can partition a matrix A into submatrices, called blocks, of A as follows: Consider a matrix

    A = [ a11 a12 a13 | a14 ]
        [ a21 a22 a23 | a24 ]
        [ a31 a32 a33 | a34 ] ,

divided up into four blocks by the vertical line shown and a horizontal line between the second and third rows. Now, if we write

    A11 = [ a11 a12 a13 ] ,  A12 = [ a14 ] ,  A21 = [ a31 a32 a33 ] ,  A22 = [ a34 ] ,
          [ a21 a22 a23 ]          [ a24 ]

then A can be written as

    A = [ A11 A12 ]
        [ A21 A22 ] ,

called a block matrix. The product of matrices partitioned into blocks also follows the matrix product formula, as if the blocks Aij were numbers: If

    A = [ A11 A12 ]   and   B = [ B11 B12 ]
        [ A21 A22 ]             [ B21 B22 ]

are block matrices and the number of columns in Aik is equal to the number of rows in Bkj, then
    AB = [ A11 B11 + A12 B21   A11 B12 + A12 B22 ]
         [ A21 B11 + A22 B21   A21 B12 + A22 B22 ] .
This will be true only if the columns of A are partitioned in the same way as the rows of B. It is not hard to see that the matrix product by blocks is correct. Suppose, for example, that we have a 3x3 matrix A and partition it as
    A = [ a11 a12 | a13 ]
        [ a21 a22 | a23 ] = [ A11 A12 ]
        [ a31 a32 | a33 ]   [ A21 A22 ] ,
and a 3 x 2 matrix B which we partition as
    B = [ b11 b12 ]
        [ b21 b22 ] = [ B11 ]
        [ b31 b32 ]   [ B21 ] .
Then the entries of C = [cij] = AB are

    cij = (ai1 b1j + ai2 b2j) + ai3 b3j.
The quantity ai1 b1j + ai2 b2j is simply the (i, j)-entry of A11 B11 if i <= 2, and is the (i, j)-entry of A21 B11 if i = 3. Similarly, ai3 b3j is the (i, j)-entry of A12 B21 if i <= 2, and of A22 B21 if i = 3. Thus AB can be written as
    AB = [ C11 ] = [ A11 B11 + A12 B21 ]
         [ C21 ]   [ A21 B11 + A22 B21 ] .
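Block multiplication can be checked against ordinary multiplication directly. The plain-Python sketch below partitions a sample 3 x 3 matrix A (rows and columns split after index 2) and a 3 x 2 matrix B (rows split after index 2), as in the discussion above; the particular entries and split points are assumptions of this illustration.

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
B = [[1, 0], [0, 1], [2, 3]]

# Partition: A = [[A11, A12], [A21, A22]], B = [[B11], [B21]] (split after row/col 2).
A11 = [r[:2] for r in A[:2]]; A12 = [r[2:] for r in A[:2]]
A21 = [r[:2] for r in A[2:]]; A22 = [r[2:] for r in A[2:]]
B11 = B[:2]; B21 = B[2:]

top = mat_add(mat_mul(A11, B11), mat_mul(A12, B21))   # C11 = A11 B11 + A12 B21
bot = mat_add(mat_mul(A21, B11), mat_mul(A22, B21))   # C21 = A21 B11 + A22 B21
assert top + bot == mat_mul(A, B)                     # blockwise product agrees
```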
Example 1.9 If an m x n matrix A is partitioned into blocks of column vectors: i.e., A = [c1 c2 ... cn], where each block cj is the j-th column, then the product Ax with x = [x1 ... xn]^T is the sum of the block matrices (or column vectors) with coefficients xj:

    Ax = [c1 c2 ... cn] [ x1 ]
                        [  : ] = x1 c1 + x2 c2 + ... + xn cn,
                        [ xn ]

where xj cj = xj [a1j a2j ... amj]^T. Hence, a matrix equation Ax = b is nothing but the vector equation x1 c1 + x2 c2 + ... + xn cn = b. □
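Example 1.9 says that Ax is the linear combination of the columns of A with the entries of x as coefficients; a plain-Python check on a sample matrix (chosen for illustration):

```python
A = [[1, 2], [3, 4], [5, 6]]
x = [10, -1]

# Ax computed row by row via the entry formula.
Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]

# The same vector as the combination x1*c1 + x2*c2 of the columns of A.
c1, c2 = zip(*A)                      # c1 = (1, 3, 5), c2 = (2, 4, 6)
combo = [x[0] * u + x[1] * v for u, v in zip(c1, c2)]

assert Ax == combo == [8, 26, 44]
```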
Example 1.10 Let A be an m x n matrix partitioned into the row vectors a1, a2, ..., am as its blocks, and let B be an n x r matrix, so that their product AB is well defined. Considering the matrix B as a single block, the product AB can be written as

    AB = [ a1 ]       [ a1 B ]   [ a1 b1 a1 b2 ... a1 br ]
         [ a2 ] B  =  [ a2 B ] = [ a2 b1 a2 b2 ... a2 br ]
         [  : ]       [   :  ]   [   :     :          :   ]
         [ am ]       [ am B ]   [ am b1 am b2 ... am br ] ,
where b1, b2, ..., br denote the columns of B. Hence, the row vectors of AB are the products of the row vectors of A and the column vectors of B. □
Problem 1.13 Compute AB using block multiplication, where

    A = [  1 2 | 1  1 ]        B = [  0 1 2 ]
        [ -3 1 | 0  2 ]            [  3 1 0 ]
        [  0 0 | 2 -1 ]            [ -2 1 1 ]
        [  0 0 | 1  1 ] ,          [  1 0 1 ] ,

with A partitioned into 2 x 2 blocks as indicated and B partitioned into its first two rows and its last two rows.
1.6 Inverse matrices

As shown in Section 1.4, a system of linear equations can be written as Ax = b in matrix form. This form resembles one of the simplest linear equations in one variable, ax = b, whose solution is simply x = a^{-1} b when a ≠ 0. Thus it is tempting to write the solution of the system as x = A^{-1} b. However, in the case of matrices we first have to assign a meaning to A^{-1}. To discuss this, we begin with the following definition.
Definition 1.11 For an m x n matrix A, an n x m matrix B is called a left inverse of A if BA = In, and an n x m matrix C is called a right inverse of A if AC = Im.
Example 1.11 (One-sided inverse) From a direct calculation for the two matrices

    A = [  1 0 -1 ]   and   B = [ 1 1 ]
        [ -2 1  0 ]             [ 2 3 ]
                                [ 0 1 ] ,

we have

    AB = [ 1 0 ] = I2,   and   BA = [ -1 1 -1 ]
         [ 0 1 ]                    [ -4 3 -2 ]  ≠  I3.
                                    [ -2 1  0 ]

Thus, the matrix B is a right inverse but not a left inverse of A, while A is a left inverse but not a right inverse of B. □
A matrix A has a right inverse if and only if A^T has a left inverse, since (AB)^T = B^T A^T and I^T = I. In general, a matrix with a left (right) inverse need not have a right (left, respectively) inverse. However, the following lemma shows that if a matrix has both a left inverse and a right inverse, then they must be equal:
Lemma 1.7 If an n x n square matrix A has a left inverse B and a right inverse C, then B and C are equal, i.e., B = C.
Proof: A direct calculation shows that B = B In = B(AC) = (BA)C = In C = C. □
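One-sided inverses are easy to test numerically. The pair below is an illustrative choice for this sketch (an assumption, not necessarily the pair in the book's example): AB = I2 holds, so B is a right inverse of A, while BA is a 3 x 3 matrix different from I3.

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 0, -1],
     [-2, 1, 0]]         # 2 x 3
B = [[1, 1],
     [2, 3],
     [0, 1]]             # 3 x 2

assert mat_mul(A, B) == [[1, 0], [0, 1]]                    # B is a right inverse of A
assert mat_mul(B, A) != [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # but not a left inverse
```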
By Lemma 1.7, one can say that if a matrix A has both left and right inverses, then any two left inverses must both equal a right inverse C, and hence each other. For the same reason, any two right inverses must both equal a left inverse B, and hence each other. So there exists only one left and only one right inverse, and they must be equal. We will show later (Theorem 1.9) that if A is a square matrix and has a left inverse, then it also has a right inverse, and vice-versa. Moreover, Lemma 1.7 says that the left inverse and the right inverse must then be equal. However, we shall also show in Chapter 3 that a non-square matrix A cannot have both a right inverse and a left inverse: that is, a non-square matrix may have only a one-sided inverse. The following example shows that such a matrix may have infinitely many one-sided inverses.
Example 1.12 (Infinitely many one-sided inverses) A non-square matrix

    A = [ 1 0 ]
        [ 0 1 ]
        [ 0 0 ]

can have more than one left inverse. In fact, for any x, y ∈ R, the matrix

    B = [ 1 0 x ]
        [ 0 1 y ]

is a left inverse of A. □
Definition 1.12 An n x n square matrix A is said to be invertible (or nonsingular) if there exists a square matrix B of the same size such that

    AB = In = BA.

Such a matrix B is called the inverse of A, and is denoted by A^{-1}. A matrix A is said to be singular if it is not invertible.
Lemma 1.7 implies that the inverse matrix of a square matrix is unique. That is why we call B 'the' inverse of A. For instance, consider a 2 x 2 matrix

    A = [ a b ]
        [ c d ] .

If ad - bc ≠ 0, then it is easy to verify that

    A^{-1} = 1/(ad - bc) [  d -b ] ,
                         [ -c  a ]

since A A^{-1} = I2 = A^{-1} A. Note that any zero matrix is singular.
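The 2 x 2 formula above can be checked directly; a plain-Python sketch (the sample matrix is an illustrative assumption):

```python
def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]] via the adjugate formula, valid when ad - bc != 0."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[4, 7], [2, 6]]          # det = 24 - 14 = 10
Ainv = inverse_2x2(A)         # [[0.6, -0.7], [-0.2, 0.4]]
```

Multiplying A by `Ainv` (in either order) gives the identity up to floating-point rounding.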
Problem 1.14 Let A be an invertible matrix and k any nonzero scalar. Show that
(1) A^{-1} is invertible and (A^{-1})^{-1} = A;
(2) the matrix kA is invertible and (kA)^{-1} = (1/k) A^{-1};
(3) A^T is invertible and (A^T)^{-1} = (A^{-1})^T.
Theorem 1.8 The product of invertible matrices is also invertible, and its inverse is the product of the individual inverses in reversed order:

    (AB)^{-1} = B^{-1} A^{-1}.
Proof: Suppose that A and B are invertible matrices of the same size. Then

    (AB)(B^{-1} A^{-1}) = A(B B^{-1})A^{-1} = A I A^{-1} = A A^{-1} = I,

and similarly (B^{-1} A^{-1})(AB) = I. Thus, AB has the inverse B^{-1} A^{-1}. □
The inverse of A is written as 'A to the power -1', so one can give a meaning to A^k for any integer k. Let A be a square matrix. Define A^0 = I. Then, for any positive integer k, we define the power A^k of A inductively as

    A^k = A^{k-1} A.

Moreover, if A is invertible, then the negative integer power is defined as

    A^{-k} = (A^{-1})^k.
It is easy to check that A^{k+l} = A^k A^l whenever the right-hand side is defined. (If A is not invertible, A^{3+(-1)} is defined but A^{-1} is not.)
Problem 1.15 Prove:
(1) If A has a zero row, so does AB.
(2) If B has a zero column, so does AB.
(3) Any matrix with a zero row or a zero column cannot be invertible.
Problem 1.16 Let A be an invertible matrix. Is it true that (A^k)^T = (A^T)^k for any integer k? Justify your answer.
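The inductive definition of matrix powers, sketched in plain Python (illustrative helpers); the sample matrix is an assumption of this sketch, and the final assertion checks the law A^{k+l} = A^k A^l for positive exponents.

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_pow(A, k):
    """A^k for k >= 0, with A^0 = I, computed inductively by A^k = A^(k-1) A."""
    n = len(A)
    P = [[int(i == j) for j in range(n)] for i in range(n)]   # A^0 = I
    for _ in range(k):
        P = mat_mul(P, A)
    return P

A = [[1, 1], [0, 1]]
assert mat_pow(A, 5) == [[1, 5], [0, 1]]
assert mat_pow(A, 5) == mat_mul(mat_pow(A, 2), mat_pow(A, 3))   # A^(2+3) = A^2 A^3
```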
1.7 Elementary matrices and finding A^{-1}

We now return to the system of linear equations Ax = b. If A has a right inverse B so that AB = Im, then x = Bb is a solution of the system, since

    Ax = A(Bb) = (AB)b = b.

(Compare with Problem 1.23.) In particular, if A is an invertible square matrix, then it has only one inverse A^{-1}, and x = A^{-1} b is the only solution of the system. In this section, we discuss how to compute A^{-1} when A is invertible.
Recall that Gaussian elimination is a process in which the augmented matrix is transformed into its row-echelon form by a finite number of elementary row operations. In the following, one can see that each elementary row operation can be expressed as a nonsingular matrix, called an elementary matrix, so that the process of Gaussian elimination is the same as multiplying the augmented matrix by a finite number of corresponding elementary matrices.
Definition 1.13 An elementary matrix is a matrix obtained from the identity matrix In by executing only one elementary row operation.
For example, the following matrices are three elementary matrices corresponding to the three types of elementary row operations:

    (1st kind)  [ 1  0 ] : the second row of I2 is multiplied by -5;
                [ 0 -5 ]

    (2nd kind)  [ 1 0 0 0 ]
                [ 0 0 0 1 ] : the second and the fourth rows of I4 are interchanged;
                [ 0 0 1 0 ]
                [ 0 1 0 0 ]

    (3rd kind)  [ 1 0 3 ]
                [ 0 1 0 ] : 3 times the third row is added to the first row of I3.
                [ 0 0 1 ]
It is an interesting fact that, if E is an elementary matrix obtained by executing a certain elementary row operation on the identity matrix Im, then for any m x n matrix A, the product EA is exactly the matrix obtained when the same elementary row operation as in E is executed on A. The following example illustrates this. (Note that AE is not what we want; for this, see Problem 1.18.)
Example 1.13 (Elementary operation by an elementary matrix) Let b = [b1 b2 b3]^T be a 3 x 1 column matrix. Suppose that we want to execute a third kind of elementary operation, 'adding (-2) x the first row to the second row', on the matrix b. First, we execute this operation on the identity matrix I3 to get an elementary matrix E:
    E = [  1 0 0 ]
        [ -2 1 0 ]
        [  0 0 1 ] .
Multiplying b on the left by this elementary matrix E produces the desired result:

    Eb = [  1 0 0 ] [ b1 ]   [ b1       ]
         [ -2 1 0 ] [ b2 ] = [ b2 - 2b1 ]
         [  0 0 1 ] [ b3 ]   [ b3       ] .
Similarly, the second kind of elementary operation, 'interchanging the first and the third rows', on the matrix b can be achieved by multiplying b on the left by the elementary matrix P obtained from I3 by interchanging those two rows:

    Pb = [ 0 0 1 ] [ b1 ]   [ b3 ]
         [ 0 1 0 ] [ b2 ] = [ b2 ] .   □
         [ 1 0 0 ] [ b3 ]   [ b1 ]

Recall that each elementary row operation has an inverse operation, which is also an elementary operation, that brings the elementary matrix back to the original identity matrix. In other words, if E denotes an elementary matrix and if E' denotes the elementary matrix corresponding to the 'inverse' elementary row operation of E, then E'E = I, because
(1) if E multiplies a row by c ≠ 0, then E' multiplies the same row by 1/c;
(2) if E interchanges two rows, then E' interchanges them again;
(3) if E adds a multiple of one row to another, then E' subtracts it from the same row.
Furthermore, E'E = I = EE': every elementary matrix is invertible, and its inverse E^{-1} = E' is also an elementary matrix.
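The fact that EA performs the row operation encoded in E can be verified directly. The plain-Python sketch below uses the elementary matrix of Example 1.13 and a sample numeric column standing in for [b1 b2 b3]^T (the numbers are an assumption of this illustration).

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# E encodes "add (-2) x row 1 to row 2" (the operation performed on I_3).
E = [[1, 0, 0],
     [-2, 1, 0],
     [0, 0, 1]]
b = [[7], [10], [4]]            # a sample 3 x 1 column

print(mat_mul(E, b))            # [[7], [-4], [4]]: row 2 became b2 - 2*b1
```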
Example 1.14 (Inverse of an elementary matrix) If

    E1 = [ 1 0 0 ]      E2 = [ 1 0 0 ]      E3 = [ 0 1 0 ]
         [ 0 1 0 ] ,         [ 0 c 0 ] ,         [ 1 0 0 ] ,
         [ 3 0 1 ]           [ 0 0 1 ]           [ 0 0 1 ]

then

    E1^{-1} = [  1 0 0 ]      E2^{-1} = [ 1  0  0 ]      E3^{-1} = E3.   □
              [  0 1 0 ] ,              [ 0 1/c 0 ] ,
              [ -3 0 1 ]                [ 0  0  1 ]

Definition 1.14 A permutation matrix is a square matrix obtained from the identity matrix by permuting the rows. In Example 1.14, E3 is a permutation matrix, but E2 is not.
Problem 1.17 Prove:
(1) A permutation matrix is the product of a finite number of elementary matrices, each of which corresponds to the 'row-interchanging' elementary row operation.
(2) Every permutation matrix P is invertible and P^{-1} = P^T.
(3) The product of any two permutation matrices is a permutation matrix.
(4) The transpose of a permutation matrix is also a permutation matrix.
Problem 1.18 Define the elementary column operations for a matrix by just replacing 'row' by 'column' in the definition of the elementary row operations. Show that if A is an m x n matrix and if E is a matrix obtained by executing an elementary column operation on In, then AE is exactly the matrix obtained from A when the same column operation is executed on A. In particular, if D is an n x n diagonal matrix with diagonal entries d1, d2, ..., dn, then AD is obtained by multiplying the columns of A by d1, d2, ..., dn, while DA is obtained by multiplying the rows of A by d1, d2, ..., dn.
The next theorem establishes some fundamental relations between n x n square matrices and systems of n linear equations in n unknowns.
Theorem 1.9 Let A be an n x n matrix. The following are equivalent:
(1) A has a left inverse;
(2) Ax = 0 has only the trivial solution x = 0;
(3) A is row-equivalent to In;
(4) A is a product of elementary matrices;
(5) A is invertible;
(6) A has a right inverse.
Proof: (1) ⇒ (2): Let x be a solution of the homogeneous system Ax = 0, and let B be a left inverse of A. Then

    x = In x = (BA)x = B(Ax) = B0 = 0.

(2) ⇒ (3): Suppose that the homogeneous system Ax = 0 has only the trivial solution x = 0:

    x1 = 0,  x2 = 0,  ...,  xn = 0.

This means that the augmented matrix [A 0] of the system Ax = 0 is reduced to [In 0] by Gauss-Jordan elimination. Hence, A is row-equivalent to In.
(3) ⇒ (4): Assume A is row-equivalent to In, so that A can be reduced to In by a finite sequence of elementary row operations. Thus, one can find elementary matrices E1, E2, ..., Ek such that

    Ek ... E2 E1 A = In.

By multiplying both sides of this equation successively by Ek^{-1}, ..., E2^{-1}, E1^{-1} on the left, we obtain

    A = E1^{-1} E2^{-1} ... Ek^{-1} In = E1^{-1} E2^{-1} ... Ek^{-1},

which expresses A as a product of elementary matrices.
(4) ⇒ (5) is trivial, because any elementary matrix is invertible. In fact, A^{-1} = Ek ... E2 E1.
(5) ⇒ (1) and (5) ⇒ (6) are trivial.
(6) ⇒ (5): If B is a right inverse of A, then A is a left inverse of B, and one can apply (1) ⇒ (2) ⇒ (3) ⇒ (4) ⇒ (5) to B and conclude that B is invertible, with A as its unique inverse by Lemma 1.7. That is, B is the inverse of A, and so A is invertible. □
If a triangular matrix A has a zero diagonal entry, then the system Ax = 0 has at least one free variable, so that it has infinitely many solutions. Hence, one has the following corollary.
Corollary 1.10 A triangular matrix is invertible if and only if it has no zero diagonal entry.
From Theorem 1.9, one can see that a square matrix is invertible if it has a one-sided inverse. In particular, if a square matrix A is invertible, then x = A^{-1} b is the unique solution to the system Ax = b.
Problem 1.19 Find the inverse of the product

    [ 1  0 0 ] [  1 0 0 ] [  1 0 0 ]
    [ 0  1 0 ] [  0 1 0 ] [ -a 1 0 ] .
    [ 0 -c 1 ] [ -b 0 1 ] [  0 0 1 ]
As an application of Theorem 1.9, one can derive a practical method for finding the inverse A^{-1} of an invertible n x n matrix A. If A is invertible, then A is row-equivalent to In, and so there are elementary matrices E1, E2, ..., Ek such that Ek ... E2 E1 A = In. Hence,

    A^{-1} = Ek ... E2 E1 = Ek ... E2 E1 In.

This means that A^{-1} can be obtained by performing on In the same sequence of elementary row operations that reduces A to In. In practice, one first constructs the n x 2n augmented matrix [A | In] and then performs a Gauss-Jordan elimination, reducing A to In, on [A | In] to get [In | A^{-1}]: that is,

    [A | In] → [El ... E1 A | El ... E1 I] = [U | K]
             → [Fk ... F1 U | Fk ... F1 K] = [I | A^{-1}],

where El ... E1 represents a Gaussian elimination that reduces A to a row-echelon form U, and Fk ... F1 represents the back substitution. The following example illustrates the computation of an inverse matrix.
Example 1.15 (Computing A^{-1} by Gauss-Jordan elimination) Find the inverse of

    A = [ 1 2 3 ]
        [ 2 3 5 ]
        [ 1 0 2 ] .

Solution: Apply Gauss-Jordan elimination to

    [A | I] = [ 1 2 3 | 1 0 0 ]
              [ 2 3 5 | 0 1 0 ]
              [ 1 0 2 | 0 0 1 ]

            → [ 1  2  3 |  1 0 0 ]    (-2)row 1 + row 2
              [ 0 -1 -1 | -2 1 0 ]    (-1)row 1 + row 3
              [ 0 -2 -1 | -1 0 1 ]

            → [ 1  2  3 |  1  0 0 ]   (-1)row 2
              [ 0  1  1 |  2 -1 0 ]
              [ 0 -2 -1 | -1  0 1 ]

            → [ 1 2 3 | 1  0 0 ]      (2)row 2 + row 3
              [ 0 1 1 | 2 -1 0 ]
              [ 0 0 1 | 3 -2 1 ] .
This is [U I K] obtained by Gaussian elimination. Now continue the back substitution to reduce [U I K] to [I I A -I] .
[U
1 00]
2 3 1 1 0 1
U U U
I K] =
-+
-+
2 0 1 0 0 1
-8
0 0 1 0 0 1
-6
6
(-2)row 2 + row 1
-3 ]
-1 1 - 1 3 -2 1
4
-1 1 3 -2
Thus, we get A-I
(-l)row 3 + row 2 (-3)row 3 + row 1
2 -1 0 3 -2 1
=
-i-1 ] =
[IIA-
I].
[-6 4-1 ] -1 1 -1 3 -2 1
(The reader should verify that AA- I = I
.
= A-I A .)
0
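The [A | I] → [I | A⁻¹] procedure can also be carried out mechanically. The following Python helper is an illustrative sketch (the function name and structure are ours, not the book's); it uses exact rational arithmetic so that the inverse of an integer matrix comes out exactly:

```python
from fractions import Fraction

def inverse_gauss_jordan(A):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I]."""
    n = len(A)
    # Augmented matrix [A | I] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # Find a row with a nonzero pivot (a 2nd-kind operation if a swap is needed).
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            raise ValueError("matrix is not invertible")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]             # 1st kind: scale the pivot row
        for r in range(n):                          # 3rd kind: clear the rest of the column
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]                   # the right half is A^(-1)

A = [[1, 2, 3], [2, 3, 5], [1, 0, 2]]
inv = [[int(x) for x in row] for row in inverse_gauss_jordan(A)]
print(inv)   # [[-6, 4, -1], [-1, 1, -1], [3, -2, 1]], as in Example 1.15
```

If the elimination runs out of nonzero pivots in some column, the matrix is not invertible, exactly as noted in the text.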
Note that if A is not invertible, then, at some step in the Gaussian elimination, a zero row will show up on the left-hand side in [U | K]. For example, the matrix

A = [1 6 4; 2 4 -1; -1 2 5]

is row-equivalent to

[1 6 4; 0 -8 -9; 0 0 0],

which has a zero row, and so A is not invertible.
By Theorem 1.9, a square matrix A is invertible if and only if Ax = 0 has only the trivial solution. That is, a square matrix A is noninvertible if and only if Ax = 0 has a nontrivial solution, say x0. Now, for any column vector b = [b1 ⋯ bn]ᵀ, if x1 is a solution of Ax = b for a noninvertible matrix A, then so is kx0 + x1 for any k, since

A(kx0 + x1) = k(Ax0) + Ax1 = k0 + b = b.

This argument strengthens Theorem 1.6 as follows when A is a square matrix:
Theorem 1.11 If A is an invertible n × n matrix, then for any column vector b = [b1 ⋯ bn]ᵀ, the system Ax = b has exactly one solution x = A⁻¹b. If A is not invertible, then the system has either no solution or infinitely many solutions, according to the consistency of the system.

Problem 1.20 Express A⁻¹ as a product of elementary matrices for A given in Example 1.15.
Problem 1.21 When is a diagonal matrix D = [d1 0 ⋯ 0; 0 d2 ⋯ 0; ⋯; 0 0 ⋯ dn] nonsingular, and what is D⁻¹?
Problem 1.22 Write the system of linear equations

x + 2y + 2z = 10
2x + 2y + 3z = 1
4x - 3y + 5z = 4

in matrix form Ax = b and solve it by finding A⁻¹b.
Problem 1.23 True or false: If the matrix A has a left inverse C, so that CA = In, then x = Cb is a solution of the system Ax = b. Justify your answer.
1.8 LDU factorization

In this section, we show that the forward elimination for solving a system of linear equations Ax = b can be expressed by some invertible lower triangular matrix, so that the matrix A can be factored as a product of two or more triangular matrices. We first assume that no permutations of rows (the 2nd kind of operation) are necessary throughout the whole process of forward elimination on the augmented matrix [A b]. Then the forward elimination is just multiplication of the augmented matrix [A b] by finitely many elementary matrices Ek, ..., E1: that is,

Ek ⋯ E2E1 [A b] = [U y],

where each Ei is a lower triangular elementary matrix whose diagonal entries are all 1's, and [U y] is a row-echelon form of [A b] obtained without dividing the rows by the pivots. (Note that if A is a square matrix, then U must be an upper triangular matrix.) Therefore, if we set L = (Ek ⋯ E1)⁻¹ = E1⁻¹ ⋯ Ek⁻¹, then we have A = LU, where L is a lower triangular matrix whose diagonal entries are all 1's. (In fact, each Ei⁻¹ is also a lower triangular matrix, and a product of lower triangular matrices is again lower triangular; see Problem 1.25.) Such a factorization A = LU is called an LU factorization or an LU decomposition of A. For example, for a 3 × 3 matrix,

A = [1 0 0; * 1 0; * * 1] [d1 * *; 0 d2 *; 0 0 d3] = LU,

where the di's are the pivots. Now, let A = LU be an LU factorization. Then the system Ax = b can be written as LUx = b. Let Ux = y. Thus, the system
Ax = LUx = b can be solved by the following two steps:

Step 1. Solve Ly = b for y.
Step 2. Solve Ux = y by back substitution.

The following example illustrates the convenience of an LU factorization of a matrix A for solving the system Ax = b.

Example 1.16 Solve the system of linear equations

Ax = [2 1 1 0; 4 1 0 1; -2 2 1 1] [x1; x2; x3; x4] = [1; -2; 7] = b

by using an LU factorization of A.

Solution: The elementary matrices for the forward elimination on the augmented matrix [A b] are easily found to be

E1 = [1 0 0; -2 1 0; 0 0 1],  E2 = [1 0 0; 0 1 0; 1 0 1],  E3 = [1 0 0; 0 1 0; 0 3 1],

so that

E3E2E1A = [2 1 1 0; 0 -1 -2 1; 0 0 -4 4] = U.

Thus, if we set

L = E1⁻¹E2⁻¹E3⁻¹ = [1 0 0; 2 1 0; -1 -3 1],

which is a lower triangular matrix with 1's on the diagonal, then

A = LU = [1 0 0; 2 1 0; -1 -3 1] [2 1 1 0; 0 -1 -2 1; 0 0 -4 4].

Now, the system

Ly = b :  y1 = 1,  2y1 + y2 = -2,  -y1 - 3y2 + y3 = 7

can easily be solved inductively to get y = (1, -4, -4), and the system

Ux = y :  2x1 + x2 + x3 = 1,  -x2 - 2x3 + x4 = -4,  -4x3 + 4x4 = -4

can be solved by back substitution to get

x = (-1, 2, 1, 0) + t(0, -1, 1, 1)

for t ∈ ℝ, which is the solution of the original system. □

As shown in Example 1.16, it is a simple computation to solve the systems Ly = b and Ux = y, because the matrix L is lower triangular and the matrix U is obtained from A by forward elimination, so that most entries on the lower-left side are zero.
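The two steps above are easy to code. The sketch below uses our own illustrative helpers (not the book's notation), assumes no row interchanges are needed and all pivots are nonzero, and takes as sample data the 3 × 3 leading block of the matrix in Example 1.16:

```python
def lu_factor(A):
    """Doolittle-style LU factorization without row interchanges.
    Returns unit lower triangular L and upper triangular U with A = LU."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [row[:] for row in A]
    for c in range(n):
        for r in range(c + 1, n):
            m = U[r][c] / U[c][c]          # elimination multiplier, recorded in L
            L[r][c] = m
            U[r] = [x - m * y for x, y in zip(U[r], U[c])]
    return L, U

def lu_solve(L, U, b):
    """Step 1: solve Ly = b (forward substitution); Step 2: solve Ux = y (back substitution)."""
    n = len(b)
    y = []
    for i in range(n):
        y.append(b[i] - sum(L[i][j] * y[j] for j in range(i)))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[2, 1, 1], [4, 1, 0], [-2, 2, 1]]
L, U = lu_factor(A)
print(lu_solve(L, U, [1, -2, 7]))   # [-1.0, 2.0, 1.0]
```

Once L and U are stored, each additional right-hand side costs only the two substitution sweeps, which is the point of the Remark that follows.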
Remark: For a system Ax = b, Gaussian elimination may be described as an LU factorization of the matrix A. Suppose that one needs to solve several systems of linear equations Ax = bi for i = 1, 2, ..., ℓ with the same coefficient matrix A. Instead of performing the Gaussian elimination process ℓ times, one can compute an LU factorization of A once, solve Ly i = bi for i = 1, 2, ..., ℓ to get the vectors yi, and then obtain the solutions of Ax = bi as those of Ux = yi. From an algorithmic point of view, this method based on the LU factorization of A is much more efficient than repeating the Gaussian elimination, in particular when ℓ is large.

Problem 1.24 Determine an LU factorization of the matrix

A = [1 -1 0; -1 2 -1; 0 -1 2],

and use it to solve Ax = b for (1) b = [1 1 1]ᵀ and (2) b = [2 0 -1]ᵀ.
Problem 1.25 Let A and B be two lower triangular matrices. Prove that
(1) their product AB is also a lower triangular matrix;
(2) if A is invertible, then its inverse is also a lower triangular matrix;
(3) if the diagonal entries of A and B are all 1's, then the same holds for their product AB and their inverses.
Note that the same holds for upper triangular matrices, and for products of more than two matrices.
The matrix U in the decomposition A = LU of A can be further factored as the product U = DŪ, where D is a diagonal matrix whose diagonal entries are the pivots of U or zeros, and Ū is a row-echelon form of A with leading 1's, so that A = LDŪ. For example,

A = L [d1 * * * *; 0 d2 * * *; 0 0 0 d3 *; 0 0 0 0 0] = LU
  = L [d1 0 0 0; 0 d2 0 0; 0 0 d3 0; 0 0 0 1] [1 */d1 */d1 */d1 */d1; 0 1 */d2 */d2 */d2; 0 0 0 1 */d3; 0 0 0 0 0] = LDŪ.

For notational convenience, we write U again for Ū, so that A = LDU. This decomposition of A is called an LDU factorization or an LDU decomposition of A.
For example, the matrix A in Example 1.16 was factored as

A = [1 0 0; 2 1 0; -1 -3 1] [2 1 1 0; 0 -1 -2 1; 0 0 -4 4] = LU.

It can be further factored as A = LDU by taking

[2 1 1 0; 0 -1 -2 1; 0 0 -4 4] = [2 0 0; 0 -1 0; 0 0 -4] [1 1/2 1/2 0; 0 1 2 -1; 0 0 1 -1] = DU.
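Extracting D and the normalized Ū from a computed U amounts to dividing each row by its pivot. A small illustrative sketch (helper name ours), applied to the U factor of Example 1.16:

```python
def ldu_split(U):
    """Write U = D * Ubar, where D carries the pivots (assumed nonzero and
    lying on the diagonal) and Ubar is U with each row divided by its pivot."""
    n = len(U)
    D = [[U[i][i] if i == j else 0 for j in range(n)] for i in range(n)]
    Ubar = [[x / U[i][i] for x in row] for i, row in enumerate(U)]
    return D, Ubar

U = [[2, 1, 1, 0], [0, -1, -2, 1], [0, 0, -4, 4]]
D, Ubar = ldu_split(U)
print(D)   # [[2, 0, 0], [0, -1, 0], [0, 0, -4]]
```

Here Ubar reproduces the text's normalized factor [1 1/2 1/2 0; 0 1 2 -1; 0 0 1 -1].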
The LDU factorization of a matrix A is always possible when no row interchange is needed in the forward elimination process. In general, if a permutation matrix for a row interchange is necessary in the forward elimination, then an LDU factorization may not be possible.

Example 1.17 (The LDU factorization cannot exist) Consider a matrix A = [0 1; 1 b]. For forward elimination, it is necessary to interchange the first row with the second row. Without this interchange, A has no LU or LDU factorization. In fact, one can show that it cannot be expressed as a product of any lower triangular matrix L and any upper triangular matrix U. □

Suppose now that a row interchange is necessary during the forward elimination on the augmented matrix [A b]. In this case, one can first perform all the row interchanges before any other type of elementary row operation, since the interchange of rows can be done at any time, before or after the other elementary operations, with the same effect on the solution. These row-interchanging elementary matrices together form a permutation matrix P such that no more row interchanges are needed during the forward elimination on PA. The matrix PA can then have an LDU factorization.
Example 1.18 (An LDU factorization after a row interchange) For the matrix

A = [0 1 4; 0 1 2; 1 0 2],

the forward elimination requires an interchange of the first row with the third row; that is, we need to multiply A by the permutation matrix

P = [0 0 1; 0 1 0; 1 0 0],

so that

PA = [1 0 2; 0 1 2; 0 1 4] = [1 0 0; 0 1 0; 0 1 1] [1 0 0; 0 1 0; 0 0 2] [1 0 2; 0 1 2; 0 0 1] = LDU.

Note that U is a row-echelon form of the matrix A. □
Of course, if we choose a different permutation matrix P′, then the LDU factorization of P′A may be different from that of PA, even if there is another permutation matrix P″ that changes P′A to PA. Moreover, as the following example shows, even if a permutation matrix is not necessary in the Gaussian elimination, the LDU factorization of A need not be unique.
Example 1.19 (Infinitely many LDU factorizations) The matrix

B = [1 1 0; 1 3 0; 0 0 0]

has the LDU factorization

B = [1 0 0; 1 1 0; 0 0 1] [1 0 0; 0 2 0; 0 0 0] [1 1 0; 0 1 0; 0 0 x] = LDU

for any value x. It shows that a singular matrix B may have infinitely many LDU factorizations. □

However, if the matrix A is invertible and the permutation matrix P is fixed when one is necessary, then the matrix PA has a unique LDU factorization.
Theorem 1.12 Let A be an invertible matrix. Then, for a fixed suitable permutation matrix P, the matrix PA has a unique LDU factorization.

Proof: Suppose that PA = L1D1U1 = L2D2U2, where the L's are lower triangular, the U's are upper triangular with all diagonal entries 1, and the D's are diagonal matrices with no zeros on the diagonal. One needs to show L1 = L2, D1 = D2, and U1 = U2 for the uniqueness.

Note that the inverse of a lower triangular matrix is also lower triangular, and the inverse of an upper triangular matrix is also upper triangular. Moreover, the inverse of a diagonal matrix is also diagonal. Therefore, by multiplying (L1D1)⁻¹ = D1⁻¹L1⁻¹ on the left and U2⁻¹ on the right, the equation L1D1U1 = L2D2U2 becomes

U1U2⁻¹ = D1⁻¹L1⁻¹L2D2.

The left-hand side is upper triangular, while the right-hand side is lower triangular. Hence, both sides must be diagonal. However, since the diagonal entries of the upper triangular matrix U1U2⁻¹ are all 1's, it must be the identity matrix I (see Problem 1.25). Thus U1U2⁻¹ = I, i.e., U1 = U2. Similarly, L1⁻¹L2 = D1D2⁻¹ implies that L1 = L2 and D1 = D2. □

In particular, if an invertible matrix A is symmetric (i.e., A = Aᵀ) and can be factored into A = LDU without row interchanges, then we have

A = Aᵀ = (LDU)ᵀ = UᵀDᵀLᵀ = UᵀDLᵀ,

and thus, by the uniqueness of the factorization, we have U = Lᵀ and A = LDLᵀ.

Problem 1.26 Find the factors L, D, and U for

A = [2 -1 0; -1 2 -1; 0 -1 2].

What is the solution to Ax = b for b = [1 0 -1]ᵀ?
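For a symmetric matrix, the factorization A = LDLᵀ can be computed directly, without forming U first. The following Python sketch is illustrative only (the function name and the sample matrix are ours, not the book's); it performs no row interchanges, so nonzero pivots are assumed:

```python
def ldlt(A):
    """LDL^T factorization of a symmetric matrix: returns unit lower
    triangular L (as a matrix) and the diagonal of D (as a list)."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Pivot d_j, then the multipliers below it, column by column.
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
    return L, d

L, d = ldlt([[4.0, 2.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
print(d)   # [4.0, 2.0, 1.5]
```

Multiplying L, diag(d), and Lᵀ back together recovers the original symmetric matrix, in line with Theorem 1.12's uniqueness statement.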
Problem 1.27 For all possible permutation matrices P, find the LDU factorization of PA for

A = [1 2 0; 2 4 2; 1 1 1].
1.9 Applications

1.9.1 Cryptography

Cryptography is the study of sending messages in disguised form (secret codes) so that only the intended recipients can remove the disguise and read the message; modern cryptography uses advanced mathematics. As an application of invertible matrices, we introduce a simple coding. Suppose we associate a prescribed number with every letter in the alphabet; for example,

A ↔ 0, B ↔ 1, C ↔ 2, D ↔ 3, ..., X ↔ 23, Y ↔ 24, Z ↔ 25, Blank ↔ 26, ? ↔ 27, . ↔ 28.
Suppose that we want to send the message "GOOD LUCK." Replace this message by

6, 14, 14, 3, 26, 11, 20, 2, 10

according to the preceding substitution scheme. To use a matrix technique, we first break the message into three vectors in ℝ³, each with three components, by adding extra blanks if necessary:

[6; 14; 14], [3; 26; 11], [20; 2; 10].

Next, choose a nonsingular 3 × 3 matrix A, say

A = [1 0 0; 2 1 0; 1 1 1],

which is supposed to be known to both sender and receiver. Then, as a matrix multiplication, A translates our message into

A[6; 14; 14] = [6; 26; 34],  A[3; 26; 11] = [3; 32; 40],  A[20; 2; 10] = [20; 42; 32].

By putting the components of the resulting vectors consecutively, we transmit

6, 26, 34, 3, 32, 40, 20, 42, 32.

To decode such a message, the receiver may follow the reverse process. Suppose that we received the following reply from our correspondent:
19, 45, 26, 13, 36, 41.

To decode it, first break the message into two vectors in ℝ³ as before:

[19; 45; 26], [13; 36; 41].

We want to find two vectors x1, x2 such that Axi is the i-th vector of the above two: i.e.,

Ax1 = [19; 45; 26],  Ax2 = [13; 36; 41].

Since A is invertible, the vectors x1, x2 can be found by multiplying the inverse of A by the two vectors given in the message. By an easy computation, one can find

A⁻¹ = [1 0 0; -2 1 0; 1 -1 1].

Therefore,
x1 = A⁻¹[19; 45; 26] = [19; 7; 0],  x2 = A⁻¹[13; 36; 41] = [13; 10; 18].

The numbers one obtains are

19, 7, 0, 13, 10, 18.

Using our correspondence between letters and numbers, the message we have received is "THANKS."
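The whole scheme is easy to automate. Below is an illustrative Python sketch of the example's encoding and decoding (the function names and the padding rule are ours; the key matrix A and its inverse are the ones used above):

```python
# Letter <-> number scheme of the text: A = 0, ..., Z = 25, blank = 26.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def encode(message, A):
    """Pad with blanks to blocks of three, then multiply each block by A."""
    nums = [ALPHABET.index(ch) for ch in message]
    while len(nums) % 3:
        nums.append(26)                      # extra blank, as in the text
    out = []
    for k in range(0, len(nums), 3):
        v = nums[k:k + 3]
        out += [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]
    return out

def decode(nums, Ainv):
    """Multiply each received block by A^(-1) and map numbers back to letters."""
    plain = []
    for k in range(0, len(nums), 3):
        v = nums[k:k + 3]
        plain += [sum(Ainv[i][j] * v[j] for j in range(3)) for i in range(3)]
    return "".join(ALPHABET[n] for n in plain)

A    = [[1, 0, 0], [2, 1, 0], [1, 1, 1]]
Ainv = [[1, 0, 0], [-2, 1, 0], [1, -1, 1]]
print(encode("GOOD LUCK", A))                  # [6, 26, 34, 3, 32, 40, 20, 42, 32]
print(decode([19, 45, 26, 13, 36, 41], Ainv))  # THANKS
```

Because AA⁻¹ = I, decoding an encoded message always returns the original (padded) text.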
Problem 1.28 Encode "TAKE UFO" using the same matrix A used in the above example.
1.9.2 Electrical network

In an electrical network, a simple current flow may be illustrated by a diagram like the one below. Such a network involves only voltage sources, like batteries, and resistors, like bulbs, motors, or refrigerators. The voltage is measured in volts, the resistance in ohms, and the current flow in amperes (amps, in short). For such an electrical network, current flow is governed by the following three laws:

Ohm's Law: The voltage drop V across a resistor is the product of the current I and the resistance R: V = IR.
Kirchhoff's Current Law (KCL): The current flow into a node equals the current flow out of the node.
Kirchhoff's Voltage Law (KVL): The algebraic sum of the voltage drops around a closed loop equals the total voltage sources in the loop.

Example 1.20 Determine the currents in the network given in Figure 1.4.
Figure 1.4. A circuit network
Figure 1.5. Two circuit networks
Solution: By applying KCL to nodes P and Q, we get the equations

I1 + I3 = I2  at P,   I2 = I1 + I3  at Q.

Observe that both equations are the same, so one of them is redundant. By applying KVL to each of the loops in the network in the clockwise direction, we get

6I1 + 2I2 = 0   from the left loop,
2I2 + 3I3 = 18  from the right loop.

Collecting all the equations, we get a system of linear equations:

I1 - I2 + I3 = 0
6I1 + 2I2 = 0
2I2 + 3I3 = 18.

By solving it, the currents are I1 = -1 amp, I2 = 3 amps, and I3 = 4 amps. The negative sign for I1 means that the current I1 flows in the direction opposite to that shown in the figure. □
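Small systems like the one in Example 1.20 can be solved mechanically. The sketch below is an illustrative helper (not from the text); the coefficient matrix is built from the KCL equation and the two loop equations of the example:

```python
def gauss_solve(A, b):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]          # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    x = [0.0] * n
    for i in reversed(range(n)):                          # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# I1 - I2 + I3 = 0 (KCL), 6*I1 + 2*I2 = 0 and 2*I2 + 3*I3 = 18 (KVL loops)
currents = gauss_solve([[1, -1, 1], [6, 2, 0], [0, 2, 3]], [0, 0, 18])
```

This recovers I1 = -1, I2 = 3, and I3 = 4 amps up to rounding.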
Problem 1.29 Determine the currents in the networks given in Figure 1.5.
1.9.3 Leontief model

Another significant application of linear algebra is to a mathematical model in economics. In most nations, an economic society may be divided into many sectors that produce goods or services, such as the automobile industry, oil industry, steel industry, communication industry, and so on. A fundamental problem in economics is then to find the equilibrium of the supply and the demand in the economy.

There are two kinds of demand for the goods: the intermediate demand from the industries themselves (or the sectors), which need the goods as inputs for their own production, and the extra demand from the consumer, governmental use, surplus production, or exports. Practically, the interrelation between the sectors is very complicated, and the connection between the extra demand and the production is unclear. A natural question is whether there is a production level such that the total amounts produced (or supply) will exactly balance the total demand for the production, so that the equality

{Total output} = {Total demand} = {Intermediate demand} + {Extra demand}

holds. This problem can be described by a system of linear equations, which is called the Leontief Input-Output Model. To illustrate this, we show a simple example. Suppose that a nation's economy consists of three sectors: I1 = automobile industry, I2 = steel industry, and I3 = oil industry. Let x = [x1 x2 x3]ᵀ denote the production vector (or production level) in ℝ³, where each entry xi denotes the total amount (in a common unit such as 'dollars' rather than quantities such as 'tons' or 'gallons') of the output that the industry Ii produces per year.

The intermediate demand may be explained as follows. Suppose that, of the total output x2 units of the steel industry I2, 20% is contributed by the output of I1, 40% by that of I2, and 20% by that of I3. Then we can write this as a column vector, called the unit consumption vector of I2:

c2 = [0.2; 0.4; 0.2].
For example, if I2 decides to produce 100 units per year, then it will order (or demand) 20 units from I1, 40 units from I2, and 20 units from I3: i.e., the consumption vector of I2 for the production x2 = 100 units can be written as the column vector 100c2 = [20 40 20]ᵀ. From the concept of the consumption vector, it is clear that the sum of the decimal fractions in the column c2 must be ≤ 1. In our example, suppose that the demands (inputs) of the outputs are given by the following matrix, called an input-output matrix:

A = [0.3 0.2 0.3; 0.1 0.4 0.1; 0.3 0.2 0.3],

whose columns c1, c2, c3 are the unit consumption vectors of the industries I1, I2, I3, and whose rows are indexed by the inputs from I1, I2, I3.
In this matrix, an industry looks down a column to see how much it needs from where to produce its total output, and it looks across a row to see how much of its output goes where. For example, the second row says that, as the intermediate demand for the output of the steel industry I2, the automobile industry I1 demands 10% of its output x1, the steel industry I2 demands 40% of its output x2, and the oil industry I3 demands 10% of its output x3. Therefore, it is now easy to see that the intermediate demand of the economy can be written as

Ax = [0.3 0.2 0.3; 0.1 0.4 0.1; 0.3 0.2 0.3] [x1; x2; x3] = [0.3x1 + 0.2x2 + 0.3x3; 0.1x1 + 0.4x2 + 0.1x3; 0.3x1 + 0.2x2 + 0.3x3].
Suppose that the extra demand in our example is given by d = [d1 d2 d3]ᵀ = [30 20 10]ᵀ. Then the problem for this economy is to find the production vector x satisfying the following equation:

x = Ax + d.

Another form of the equation is (I - A)x = d, where the matrix I - A is called the Leontief matrix. If I - A is not invertible, then the equation may have no solution or infinitely many solutions, depending on what d is. If I - A is invertible, then the equation has the unique solution x = (I - A)⁻¹d. Now, our example can be written as

[x1; x2; x3] = [0.3 0.2 0.3; 0.1 0.4 0.1; 0.3 0.2 0.3] [x1; x2; x3] + [30; 20; 10].
In this example, it turns out that the matrix I - A is invertible and

(I - A)⁻¹ = [2.0 1.0 1.0; 0.5 2.0 0.5; 1.0 1.0 2.0].

Therefore,

x = (I - A)⁻¹d = [2.0 1.0 1.0; 0.5 2.0 0.5; 1.0 1.0 2.0] [30; 20; 10] = [90; 60; 70],
which gives the total amount of product xi of the industry Ii for one year needed to meet the required demand.

Remark: (1) Under the usual circumstances, the sum of the entries in a column of the consumption matrix A is less than one, because a sector should require less than one unit's worth of inputs to produce one unit of output. This actually implies that I - A is invertible and the production vector x is feasible, in the sense that the entries in x are all nonnegative, as the following argument shows.
(2) In general, by using induction one can easily verify that for any k = 1, 2, ...,

(I - A)(I + A + A² + ⋯ + Aᵏ) = I - Aᵏ⁺¹.

If the sums of the column entries of A are all strictly less than one, then lim A^k = 0 as k → ∞ (see Section 6.4 for the limit of a sequence of matrices). Thus, we get (I - A)(I + A + ⋯ + Aᵏ + ⋯) = I, that is,

(I - A)⁻¹ = I + A + ⋯ + Aᵏ + ⋯ .
This also shows a practical way of computing (I - A)⁻¹, since by taking k sufficiently large the right-hand side may be made very close to (I - A)⁻¹. In Chapter 6, an easier method of computing Aᵏ will be shown. In summary, if A and d have nonnegative entries and if the sum of the entries in each column of A is less than one, then I - A is invertible and its inverse is given by the above formula. Moreover, as the formula shows, the entries of the inverse are all nonnegative, and so are those of the production vector x = (I - A)⁻¹d.

Problem 1.30 Determine the total demand for industries I1, I2, and I3 for the input-output matrix A and the extra demand vector d given below:

A = [0.1 0.7 0.2; 0.5 0.1 0.6; 0.4 0.2 0.2] with d = [~].
Problem 1.31 Suppose that an economy is divided into three sectors: I1 = services, I2 = manufacturing industries, and I3 = agriculture. For each unit of output, I1 demands no services from I1, 0.4 units from I2, and 0.5 units from I3. For each unit of output, I2 requires 0.1 units from sector I1 of services, 0.7 units from other parts in sector I2, and no product from sector I3. For each unit of output, I3 demands 0.8 units of services from I1, 0.1 units of manufacturing products from I2, and 0.1 units of its own output from I3. Determine the production level to balance the economy when 90 units of services, 10 units of manufacturing, and 30 units of agriculture are required as the extra demand.
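The series formula in the Remark suggests a direct numerical check. The following sketch (our own helpers, not the book's) accumulates (I + A + A² + ⋯)d for the example's input-output matrix and extra demand:

```python
def mat_mul(A, B):
    """Plain triple-loop matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def neumann_solve(A, d, terms=200):
    """Approximate x = (I - A)^(-1) d by the series (I + A + A^2 + ...) d,
    which converges when every column sum of A is strictly less than one."""
    x = d[:]                        # the I*d term
    P = [row[:] for row in A]       # current power A^k
    for _ in range(terms):
        x = [xi + yi for xi, yi in zip(x, mat_vec(P, d))]
        P = mat_mul(P, A)
    return x

A = [[0.3, 0.2, 0.3], [0.1, 0.4, 0.1], [0.3, 0.2, 0.3]]   # the example's matrix
d = [30, 20, 10]
x = neumann_solve(A, d)             # approaches the exact solution [90, 60, 70]
```

With column sums at most 0.8, the truncation error shrinks geometrically, so a few hundred terms already agree with (I - A)⁻¹d to high precision.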
1.10 Exercises 1.1. Which of the following matrices are in row-echelon form or in reduced row-echelon form?
A~U
0 0 0 -3 ] 0 1 0 4 • 2 0 0 1
c~u E=[l
0 0 0 0
0 1 0 0
0 2 1 0
1 0 0 0 1 0 1 0 -2
h[l n-
-n D=[l nF=[l
0 O 0 0
0 1 0 0
0 0 1 0
1 0 0 0 1 1 0 0 1
0 1 0 0
0 0 0 0
0 0 1 0
~l
-H
1.2. Find a row-echelon form of each matrix.

(1) [1 -3 2 1 2; 3 -9 10 2 9; 2 -6 4 2 4; 2 -6 8 1 7],  (2) [1 2 3 4 5; 2 3 4 5 1; 3 4 5 1 2; 4 5 1 2 3; 5 1 2 3 4].

1.3. Find the reduced row-echelon form of the matrices in Exercise 1.2.
1.4. Solve the systems of equations by Gauss-Jordan elimination. (1)
3XI Xl
I~;
(2)
:
12;~ ~ ;~ + +
2X2 X2
;~
2x
;~ +;:
X3 3X3
+ + +
-~ I
X4 3X4
-8 .
1~
z
1.
4z
What are the pivots in each of 3rd kind elementary operations?
1.5. Which of the following systems has a nontrivial solution?

(1) x + 2y + 3z = 0, 2y + 2z = 0, x + 2y + 3z = 0;  (2) 2x + y - z = 0, x + 2y - 3z = 0, 3x + y - 2z = 0.

1.6. Determine all values of the bi that make the following system consistent:

x + y - z = b1, 2y + z = b2, y - z = b3.
1.7. Determine the condition on b; so that the following system has no solution :
1
+ -
Y 2y
2x -
y
2x 6x
+ + +
7z l lz 3z
bl b2
b3.
1.8. Let A and B be matrices of the same size.
(1) Show that, if Ax = 0 for all x, then A is the zero matrix.
(2) Show that, if Ax = Bx for all x, then A = B.
1.9. Compute ABC and CAB for
A = [~ -~ ~], B = [~], C = [~ ~ -1].

1.10. Prove that if A is a 3 × 3 matrix such that AB = BA for every 3 × 3 matrix B, then A = cI for some constant c.

1.11. Let A = [~ ~ ~; ~ ~ ~; 0 0 1]. Find Aᵏ for all integers k.
1.12. Compute (2A - B)C and CC T for
A = [~ ~ ~; 1 0 1],  B = [-~ ~ ~; 0 0 1],  C = [~ ~ ~; -2 2 1].
1.13. Let f(x) = a_n x^n + a_{n-1} x^{n-1} + ⋯ + a_1 x + a_0 be a polynomial. For any square matrix A, the matrix polynomial f(A) is defined as

f(A) = a_n A^n + a_{n-1} A^{n-1} + ⋯ + a_1 A + a_0 I.

For f(x) = 3x³ + x² - 2x + 3, find f(A) for
(1) A = [-~ ~ ~],  (2) A = [~ -~ -~].
1.14. Find the symmetric part and the skew-symmetric part of each of the following matrices:
(1) A = [~ ~ ~; -1 3 2],  (2) A = [~ ~ ~].

1.15. Find AAᵀ and AᵀA for the matrix A = [~ ~ ~; 0 0 ~].
1.16. Let A⁻¹ = [~ ~ ~; 4 2 1; 8 4 0].
(1) Find a matrix B such that AB = [~].
(2) Find a matrix C such that AC = A² + A.
1.17. Find all possible choices of a, b, and c so that A = [a b; b c] has an inverse matrix such that A⁻¹ = A.
1.18. Decide whether or not each of the following matrices is invertible. Find the inverses for invertible ones.
A = [~ ~ ~ ~; 0 0 4],  B = [0 1 1; 1 2 3; 5 5 1].

1.19. Find the inverse of each of the following matrices:

A = [-~ -~ ~],  B = [~ ~ ~ ~; 6 4 1 1; 1 2 4 8],  C = [1 ~ ~ ~; 0 0 1 k]  (k ≠ 0).
1.20. Suppose A is a 2 × 1 matrix and B is a 1 × 2 matrix. Prove that the product AB is not invertible.

1.21. Find three matrices which are row equivalent to A = [~ -1 3 4; 2 -1 -3 4].
1.22. Write the following systems of equations as matrix equations Ax
+
I
=
= b and solve them by
2 XI X2 + X3 5 (2) XI + X2 X3 2x1 + X2 - 2X3 7, 4xI 3X2 + 2x3 1.23. Find the LDU factorization for each of the following matrices: X2 X2 -
2x1 -
(1)
1
(1) A =
[~ ~
3X3 4X3
= =
J.
(2) A =
U~ H
[~ ~
= =
5
-1 -3.
l
1.24. Find the LDL T factorization of the following symmetric matrices : (1) A
~
1.25. Solve Ax
L
(2) A
~
n
[:
= b with A = LU , where Land U are given as
= [ .: o
~ ~], I
-1
U
= [~
-~ -~] ,
n b~Ul
b
= [ -; ] .
0 0 1 4 Forward elimination is the same as Lc = b, and back-substitution is Ux = c.
1.26. For the given matrix A and vector b:
(1) Solve Ax = b by Gauss-Jordan elimination.
(2) Find the LDU factorization of A.
(3) Write A as a product of elementary matrices.
(4) Find the inverse of A.

1.27. A square matrix A is said to be nilpotent if Aᵏ = 0 for a positive integer k.
(1) Show that any invertible matrix is not nilpotent.
(2) Show that any triangular matrix with zero diagonal is nilpotent.
(3) Show that if A is nilpotent with Aᵏ = 0, then I - A is invertible with its inverse I + A + ⋯ + Aᵏ⁻¹.

1.28. A square matrix A is said to be idempotent if A² = A.
(1) Find an example of an idempotent matrix other than O or I.
(2) Show that, if a matrix A is both idempotent and invertible, then A = I.
1.29. Determine whether the following statements are true or false, in general, and justify your answers.
(1) Let A and B be row-equivalent square matrices. Then A is invertible if and only if B is invertible.
(2) Let A be a square matrix such that AA = A. Then A is the identity.
(3) If A and B are invertible matrices such that A² = I and B² = I, then (AB)⁻¹ = BA.
(4) If A and B are invertible matrices, then A + B is also invertible.
(5) If A, B and AB are symmetric, then AB = BA.
(6) If A and B are symmetric and of the same size, then AB is also symmetric.
(7) If A is invertible and symmetric, then A⁻¹ is also symmetric.
(8) Let ABᵀ = I. Then A is invertible if and only if B is invertible.
(9) If a square matrix A is not invertible, then neither is AB for any B.
(10) If E1 and E2 are elementary matrices, then E1E2 = E2E1.
(11) The inverse of an invertible upper triangular matrix is upper triangular.
(12) Any invertible matrix A can be written as A = LU, where L is lower triangular and U is upper triangular.
2 Determinants
2.1 Basic properties of the determinant

Our primary interest in Chapter 1 was in the solvability, or in finding solutions, of a system Ax = b of linear equations. For an invertible matrix A, Theorem 1.9 shows that the system has a unique solution x = A⁻¹b for any b. Now the question is how to determine whether or not a square matrix A is invertible. In this chapter, we introduce the notion of the determinant as a real-valued function of square matrices that satisfies certain axiomatic rules, and then show that a square matrix A is invertible if and only if the determinant of A is not zero. In fact, it was shown in Chapter 1 that a 2 × 2 matrix A = [a b; c d] is invertible if and only if ad - bc ≠ 0. This number is called the determinant of A, written det A, and is defined formally as follows:
Definition 2.1 For a 2 × 2 matrix A = [a b; c d] ∈ M2×2(ℝ), the determinant of A is defined as det A = ad - bc.

Geometrically, it turns out that the determinant of a 2 × 2 matrix A represents, up to sign, the area of a parallelogram in the xy-plane whose edges are constructed from the row vectors of A (see Theorem 2.10). Naturally, one can expect to define a determinant function on higher-order square matrices so that it has a geometric interpretation similar to the 2 × 2 case. However, the formula in Definition 2.1 itself does not provide any clue of how to extend this idea of the determinant to higher-order matrices. Hence, we first examine some fundamental properties of the determinant function defined in Definition 2.1. By a direct computation, one can easily verify that the function det in Definition 2.1 satisfies the following lemma.
Lemma 2.1 For 2 × 2 matrices,
(1) det [1 0; 0 1] = 1;
(2) det [c d; a b] = -det [a b; c d];
(3) det [ka+la′ kb+lb′; c d] = k det [a b; c d] + l det [a′ b′; c d].

J H Kwak et al., Linear Algebra © Birkhauser Boston 2004

Proof: (2) det [c d; a b] = bc - ad = -(ad - bc) = -det [a b; c d].
(3) det [ka+la′ kb+lb′; c d] = (ka + la′)d - (kb + lb′)c = k(ad - bc) + l(a′d - b′c) = k det [a b; c d] + l det [a′ b′; c d]. □
In Lemma 2.5, it will be shown that if a function f : M2×2(ℝ) → ℝ satisfies the properties (1)-(3) in Lemma 2.1, then it must be the function det defined in Definition 2.1; that is, f(A) = ad - bc. The properties (1)-(3) in Lemma 2.1 of the determinant on M2×2(ℝ) enable us to define the determinant function for any square matrix.
Definition 2.2 A real-valued function f : Mn×n(ℝ) → ℝ of all n × n square matrices is called a determinant if it satisfies the following three rules:

(R1) The value of f at the identity matrix is 1, i.e., f(In) = 1;
(R2) the value of f changes sign if any two rows are interchanged;
(R3) f is linear in the first row: that is, by definition,

f([kr1 + l r1′ ; r2 ; ⋯ ; rn]) = k f([r1 ; r2 ; ⋯ ; rn]) + l f([r1′ ; r2 ; ⋯ ; rn]),
where the ri's denote the row vectors [ai1 ⋯ ain] of a matrix.

Remark: (1) To become familiar with the linearity rule (R3), note that all row vectors of any n × n matrix belong to the set M1×n(ℝ), on which a matrix sum and a scalar multiplication are well defined. A real-valued function f : M1×n(ℝ) → ℝ is said to be linear if it preserves these two operations: that is, for any two vectors x, y ∈ M1×n(ℝ) and scalar k,

f(x + y) = f(x) + f(y)  and  f(kx) = kf(x),

or, equivalently, f(kx + ly) = kf(x) + lf(y). Such a linear function will be discussed again in Chapter 4.
(2) The determinant is not defined for a non-square matrix.

It is already shown that det on 2 × 2 matrices satisfies the rules (R1)-(R3). In the next section, one can see that for each positive integer n there always exists a function
f : Mn×n(ℝ) → ℝ satisfying the three rules (R1)-(R3), and such a function is unique (existence and uniqueness). Therefore, we say 'the' determinant and designate it 'det' for each order n. Let us first derive some direct consequences of the rules (R1)-(R3).

Theorem 2.2 The determinant satisfies the following properties.
(1) The determinant is linear in each row.
(2) If A has either a zero row or two identical rows, then det A = 0.
(3) The elementary row operation that adds a constant multiple of one row to another row leaves the determinant unchanged.

Proof: (1) Any row can be placed in the first row by interchanging rows, with a change of sign in the determinant by the rule (R2); then apply the linearity rule (R3), and use (R2) again by interchanging the same rows back.
(2) If A has a zero row, then this row is zero times the zero row, so that det A = 0 by (1). If A has two identical rows, then interchanging those two identical rows does not change the matrix itself, but det A = -det A by the rule (R2), so that det A = 0.
(3) By a direct computation using (1), one can get

det [⋯ ; ri + k rj ; ⋯ ; rj ; ⋯] = det [⋯ ; ri ; ⋯ ; rj ; ⋯] + k det [⋯ ; rj ; ⋯ ; rj ; ⋯],

in which the second term on the right-hand side is zero by (2). □
The rule (R2) of the determinant function is called the alternating property, and property (1) in Theorem 2.2 is called multilinearity. It is now easy to see the effect of the elementary row operations on evaluations of the determinant. The first elementary row operation, which multiplies a row by a constant k, changes the determinant to k times the determinant, by Theorem 2.2(1). The rule (R2) explains the effect of the second elementary row operation, which interchanges two rows. The third elementary row operation, which adds a constant multiple of a row to another, is explained in Theorem 2.2(3). In summary, one can see that

det(EA) = det E det A for any elementary matrix E.

For example, if E is the elementary matrix obtained from the identity matrix by multiplying a row by a constant k, then det(EA) = k det A and det E = k by Theorem 2.2(1), so that det(EA) = det E det A. As a consequence, if two matrices A and B are row-equivalent, then det A = k det B for some nonzero number k.
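In the 2 × 2 case these effects can be checked directly from the ad - bc formula; a quick illustrative sketch:

```python
def det2(M):
    """Determinant of a 2x2 matrix by the ad - bc formula of Definition 2.1."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[1, 2], [3, 4]]
assert det2(A) == -2
assert det2([[3, 4], [1, 2]]) == -det2(A)             # interchanging rows flips the sign
assert det2([[5 * 1, 5 * 2], [3, 4]]) == 5 * det2(A)  # scaling a row scales the determinant
assert det2([[1 + 3, 2 + 4], [3, 4]]) == det2(A)      # adding one row to another leaves it unchanged
print("all three effects verified")
```

Each assertion corresponds to one of the three elementary row operations discussed above.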
Example 2.1 Consider a matrix

A = [1 1 1; a b c; b+c c+a a+b].

If one adds the second row to the third, then the third row becomes

[a+b+c  a+b+c  a+b+c],

which is a scalar multiple of the first row. Thus, det A = 0. □
(1) A
=[
= k n det A.
= 0 for
a + l a+4 a+7] a+2 a+5 a+ 8 • a+3 a+6 a+9
Recall that any square matrix can be transformed into an upper triangular matrix by forward elimination, possibly with row interchanges. Further properties of the determinant are obtained in the following theorem. Theorem 2.3 The determinant satisfies the following properties. (1) (2) (3) (4)
The determinant ofa triangular matrix is the product of the diagonal entries. The matrix A is invertible if and only if det A =1= O. For any two n x n matrices A and B, det(AB) = det A det B. detA T = detA.
=
all ... ann by the Proof: (1) If A is a diagonal matrix, then it is clear that det A multilinearity in Theorem 2.2(1) and rule (Rj). Suppose that A is a lower triangular matrix . If A has a zero diagonal entry, then a forward elimination, which does not change the determinant, produces a zero row, so that det A = O. If A does not have a zero diagonal entry, a forward elimination makes A row equivalent to the diagonal matrix D whose diagonal entries are exactly those of A, so that det A = det D= all ' " ann. Similar arguments can be applied to an upper triangular matrix. (2) A square matrix A is row equivalent to an upper triangular matrix U through a forward elimination possibly with row interchanges : that is, A = PLU for some permutation matrix P and a lower triangular matrix L whose diagonal entries are all 1'S o Thus det A = ± det U , and the invertibility of U and A are equivalent. However, U is invertible if and only if U has no zero diagonal entry by Corollary 1.10, which is equivalent to det U =1= 0 by (1).
(3) If A is not invertible, then neither is AB, and so det(AB) = 0 = det A det B. If A is invertible, it can be written as a product of elementary matrices by Theorem 1.9, say A = E_1 E_2 ⋯ E_k. Then, by induction on k,

    det(AB) = det(E_1 E_2 ⋯ E_k B) = det E_1 det E_2 ⋯ det E_k det B
            = det(E_1 E_2 ⋯ E_k) det B = det A det B.

(4) Clearly, A is not invertible if and only if Aᵀ is not. Thus, for a singular matrix A we have det Aᵀ = 0 = det A. If A is invertible, then write it again as a product of elementary matrices, say A = E_1 E_2 ⋯ E_k. But det E = det Eᵀ for any elementary matrix E: in fact, if E is an elementary matrix obtained from the identity matrix by a row interchange, then det Eᵀ = −1 = det E by (R2), and all elementary matrices of the other types are triangular, so that det E = det Eᵀ. Hence, we have by (3)

    det Aᵀ = det(E_1 E_2 ⋯ E_k)ᵀ = det(E_kᵀ ⋯ E_2ᵀ E_1ᵀ)
           = det E_kᵀ ⋯ det E_2ᵀ det E_1ᵀ
           = det E_k ⋯ det E_2 det E_1 = det A. □
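The rules of Theorem 2.3 are easy to sanity-check numerically. The following minimal sketch does so for 2 × 2 matrices, using the explicit formula ad − bc; the sample entries are arbitrary choices made here, not taken from the text:

```python
def det2(M):
    """Determinant of a 2x2 matrix: ad - bc."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def mul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
B = [[2, -1], [0, 5]]

assert det2(mul2(A, B)) == det2(A) * det2(B)   # Theorem 2.3(3): det(AB) = det A det B
assert det2([[1, 3], [2, 4]]) == det2(A)       # Theorem 2.3(4): det A^T = det A
assert det2(B) == 2 * 5                        # Theorem 2.3(1): B is triangular
```

The same checks pass for any sample matrices, which is what the theorem asserts in general.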
Remark: From the equality det A = det Aᵀ, one could define the determinant in terms of columns instead of rows in Definition 2.2, and Theorem 2.2 remains true with 'columns' in place of 'rows'.

Example 2.2 (Computing det A by a forward elimination) Evaluate the determinant of

        [  2  -4   0   0 ]
    A = [  1  -3   1   0 ] .
        [  0   1  -2  -4 ]
        [ -4   2   3   1 ]

Solution: By a forward elimination, A can be transformed to an upper triangular matrix U. Since the forward elimination does not change the determinant, the determinant of A is simply the product of the diagonal entries of U:

                        [ 2  -4   0   0 ]
    det A = det U = det [ 0  -1   1   0 ] = 2 · (−1)² · 13 = 26.  □
                        [ 0   0  -1  -4 ]
                        [ 0   0   0  13 ]

Problem 2.3 Prove that if A is invertible, then det A⁻¹ = 1/det A.
Problem 2.4 Evaluate the determinant of each of the following matrices:

            [  1  4  2 ]            [ 11 12 13 14 ]
    (1) A = [  3  1  1 ] ,  (2) A = [ 21 22 23 24 ] .
            [ -2  2  3 ]            [ 31 32 33 34 ]
                                    [ 41 42 43 44 ]
2.2 Existence and uniqueness of the determinant

Throughout this section, we prove the following fundamental theorem for the determinant.

Theorem 2.4 For any natural number n,
(1) (Existence) there exists a real-valued function f : M_{n×n}(ℝ) → ℝ which satisfies the three rules (R1)-(R3) in Definition 2.2.
(2) (Uniqueness) Such a function is unique.
Clearly, the theorem is true when n = 1, with det[a] = a.

For 2 × 2 matrices: When n = 2, the existence comes from Lemma 2.1. The next lemma shows that any function f : M_{2×2}(ℝ) → ℝ satisfying the three rules (R1)-(R3) must be the det in Definition 2.1, which implies the uniqueness of the determinant function on M_{2×2}(ℝ).

Lemma 2.5 If a function f : M_{2×2}(ℝ) → ℝ satisfies the rules (R1)-(R3), then

    f [ a  b ] = ad − bc.   That is, f(A) = det A.
      [ c  d ]

Proof: First, note that

    f [ 0  1 ] = −1
      [ 1  0 ]

by the rules (R1) and (R2). Then

    f(A) = f [ a+0  0+b ] = f [ a  0 ] + f [ 0  b ]
             [  c    d  ]     [ c  d ]     [ c  d ]

         = f [ a  0 ] + f [ a  0 ] + f [ 0  b ] + f [ 0  b ]
             [ c  0 ]     [ 0  d ]     [ c  0 ]     [ 0  d ]

         = 0 + ad − bc + 0 = ad − bc,

where the third and fourth equalities come from the multilinearity in Theorem 2.2(1): the first and last terms vanish since those matrices have proportional rows, while the middle two equal ad·f(I) = ad and bc·f[0 1; 1 0] = −bc. □
For 3 × 3 matrices: For n = 3, the same process as in the case n = 2 can be applied. That is, by repeated use of the three rules (R1)-(R3) as in the proof of
Lemma 2.5, one can derive an explicit formula for det A of a matrix A = [a_ij] in M_{3×3}(ℝ) as follows:

        [ a_11  a_12  a_13 ]
    det [ a_21  a_22  a_23 ]
        [ a_31  a_32  a_33 ]

          [ a_11   0     0   ]       [ a_11   0     0   ]       [  0    a_12   0   ]
    = det [  0    a_22   0   ] + det [  0     0    a_23 ] + det [ a_21   0     0   ]
          [  0     0    a_33 ]       [  0    a_32   0   ]       [  0     0    a_33 ]

          [  0    a_12   0   ]       [  0     0    a_13 ]       [  0     0    a_13 ]
    + det [  0     0    a_23 ] + det [ a_21   0     0   ] + det [  0    a_22   0   ]
          [ a_31   0     0   ]       [  0    a_32   0   ]       [ a_31   0     0   ]

    = a_11 a_22 a_33 − a_11 a_23 a_32 − a_12 a_21 a_33
      + a_12 a_23 a_31 + a_13 a_21 a_32 − a_13 a_22 a_31.

The first equality is obtained by the multilinearity in Theorem 2.2(1): first, by applying it to the first row with

    [ a_11  a_12  a_13 ] = [ a_11  0  0 ] + [ 0  a_12  0 ] + [ 0  0  a_13 ],
det A becomes the sum of the determinants of three matrices. Observe that, in each of the three matrices, the first row has just one entry from A and all other entries zero. Subsequently, by applying the same multilinearity to the second and the third rows of each of the three matrices, one gets the sum of the determinants of 3³ = 27 matrices, each of which has exactly three entries from A, one in each of the three rows, and all other entries zero. In each of those 27 matrices, if any two of the three entries from A are in the same column, then the matrix contains a zero column, so that its determinant is zero. Consequently, the determinants of only six matrices are left, which gives the first equality. The second equality is just the computation of the six determinants by using the rule (R2) and Theorem 2.3(1). In fact, in each of those six matrices, no two entries from A are in the same row or in the same column, and thus one can take suitable 'column interchanges' to convert it to a diagonal matrix. Thus the determinant of each of them is just the product of the three entries with a ± sign, which is determined by the number of column interchanges.

Remark: The explicit formula for the determinant of a 3 × 3 matrix can easily be memorized by the following scheme. Copy the first two columns and put them on the right of the matrix, and compute the determinant by multiplying entries on the six diagonals with a + sign or a − sign as in Figure 2.1. This is known as Sarrus's method for 3 × 3 matrices. It has no analogue for matrices of higher order n ≥ 4.

The computation of the explicit formula for det A shows that, if any real-valued function f : M_{3×3}(ℝ) → ℝ satisfies the rules (R1)-(R3), then f(A) = det A for any matrix A = [a_ij] ∈ M_{3×3}(ℝ). This proves the uniqueness theorem when n = 3. On the other hand, one can easily show that the given explicit formula for det A of a matrix A ∈ M_{3×3}(ℝ) satisfies the three rules, which proves the existence when n = 3.
Figure 2.1. Sarrus's method

Therefore, for n = 3, we have both the uniqueness and the existence of the determinant function on M_{3×3}(ℝ), which proves Theorem 2.4 when n = 3.
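Sarrus's scheme transcribes directly into code: three '+' diagonals minus three '−' diagonals. A minimal sketch, tested on the 3 × 3 matrix of Problem 2.6(1) as reconstructed here (the entries are an assumption of this sketch):

```python
def det3_sarrus(A):
    """Sarrus's method: valid for 3x3 matrices only."""
    (a, b, c), (d, e, f), (g, h, i) = A
    plus = a * e * i + b * f * g + c * d * h      # the three '+' diagonals
    minus = g * e * c + h * f * a + i * d * b     # the three '-' diagonals
    return plus - minus

print(det3_sarrus([[1, 4, 2], [3, 1, 1], [-2, 2, 3]]))  # -27
```

Note that the function deliberately unpacks exactly three rows of three entries, so it fails loudly on any other size, matching the remark that Sarrus's method has no analogue for n ≥ 4.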
Problem 2.5 Show that the given explicit formula of the determinant for 3 × 3 matrices satisfies the three rules (R1)-(R3).

Problem 2.6 Use Sarrus's method to evaluate the determinants of

            [  1  4  2 ]            [  4  -2   1 ]
    (1) A = [  3  1  1 ] ,  (2) A = [ -2   4  -2 ] .
            [ -2  2  3 ]            [  1  -2   4 ]
Now, a reader might already see how to prove the uniqueness and the existence of the determinant function on M_{n×n}(ℝ) for n > 3. If so, that reader may skip the continued proof below and concentrate instead on understanding the explicit formula for det A in Theorem 2.6.

For matrices of higher order n > 3: Again, we repeat the same procedure as for the 3 × 3 matrices to get an explicit formula for det A of any square matrix A = [a_ij] of order n.

(Step 1) Just as in the case n = 3, use the multilinearity in each row of A to write det A as the sum of the determinants of nⁿ matrices. Each one of these nⁿ matrices has exactly n entries from A, one in each of the n rows. However, if any two of the n entries from A are in the same column, then the matrix has a zero column, so that its determinant is zero and it can be neglected in the summation. Thus, in each remaining matrix, the n entries from A must lie in different columns: that is, no two of the n entries from A are in the same row or in the same column.

(Step 2) Now we count how many matrices remain. From the observation in Step 1, in each of the remaining matrices, the n entries from A are of the form

    a_{1i}, a_{2j}, a_{3k}, …, a_{nl}

with some column indices i, j, k, …, l. Since no two of these n entries are in the same column, the column indices i, j, k, …, l are just a rearrangement of 1, 2, …, n without repetitions or omissions. It is not hard to see that there are exactly n! such rearrangements, so that n! matrices remain for further consideration. (Here n! = n(n−1) ⋯ 2·1, called n factorial.)
Remark: In fact, the n! remaining matrices can be constructed from the matrix A = [a_ij] as follows. First, choose any one entry from the first row of A, say a_{1i}, in the i-th column. Then the other n−1 entries a_{2j}, a_{3k}, …, a_{nl} should be taken from the columns different from the i-th column; that is, they should be chosen from the submatrix of A obtained by deleting the row and the column containing a_{1i}. If the second entry a_{2j} is taken from the second row, then the third entry a_{3k} should be taken from the submatrix of A obtained by deleting the two rows and the two columns containing a_{1i} and a_{2j}, and so on. Finally, once the first n−1 entries are chosen, there is no alternative choice for the last one a_{nl}, since it is the entry left after deleting n−1 rows and n−1 columns from A.
(Step 3) We now compute the determinant of each of the n! remaining matrices. Since each of those matrices has just n entries from A, no two of which are in the same row or in the same column, one can convert it into a diagonal matrix by 'suitable' column interchanges. Then the determinant is just the product of the n entries from A with a '±' sign, which is determined by the number (actually, the parity) of the column interchanges. To determine the sign, let us once again look back at the case n = 3.

Example 2.3 (Convert into a diagonal matrix by column interchanges) Suppose that one of the six matrices is of the form

    [  0     0    a_13 ]
    [  0    a_22   0   ] .
    [ a_31   0     0   ]

Then one can convert this matrix into a diagonal matrix by interchanging the first and the third columns. That is,

        [  0     0    a_13 ]         [ a_13   0     0   ]
    det [  0    a_22   0   ] = − det [  0    a_22   0   ] .
        [ a_31   0     0   ]         [  0     0    a_31 ]

Note that a column interchange is the same as an interchange of the corresponding column indices. Moreover, in each diagonal entry of a matrix, the row index must be the same as its column index. Hence, to convert such a matrix into a diagonal matrix, one has to convert the given arrangement of the column indices (in the example, 3, 2, 1) to the standard order 1, 2, 3, to be matched with the arrangement 1, 2, 3 of the row indices. There may be several ways of column interchanges to do this. For example, to convert the given arrangement 3, 2, 1 of the column indices to the standard order 1, 2, 3, one can take either just one interchange, of 3 and 1, or three interchanges: 3 and 2, then 3 and 1, and then 2 and 1. In either case the parity is odd, so the '−' sign in the computation of the determinant came from (−1)¹ = (−1)³, where the exponents are the numbers of interchanges of the column indices. □
To formalize our discussion, we introduce a mathematical terminology for a rearrangement of n objects.

Definition 2.3 A permutation of n objects is a one-to-one function from the set of n objects onto itself.

In most cases, we use the set of integers N_n = {1, 2, …, n} as the set of n objects. A permutation σ of N_n assigns a number σ(i) in N_n to each number i in N_n, and this permutation σ is usually denoted by

    σ = (σ(1), σ(2), …, σ(n)) = (   1     2   ⋯    n
                                   σ(1)  σ(2)  ⋯  σ(n) ).

Here, the first row is the usual layout of N_n as the domain set, and the second row is the image set, showing an arrangement of the numbers in N_n in a certain order without repetitions or omissions. If S_n denotes the set of all permutations of N_n, then, as mentioned previously, S_n has exactly n! permutations. For example, S_2 has 2 = 2!, S_3 has 6 = 3!, and S_4 has 24 = 4! permutations.

Definition 2.4 A permutation σ = (j_1, j_2, …, j_n) is said to have an inversion if j_s > j_t for some s < t (i.e., a larger number precedes a smaller one).

For example, the permutation σ = (3, 1, 5, 4, 2) has five inversions, since 3 precedes 1 and 2; 5 precedes 4 and 2; and 4 precedes 2. Note that the identity (1, 2, …, n) is the only permutation without inversions.

Definition 2.5 A permutation is said to be even if it has an even number of inversions, and odd if it has an odd number of inversions. For a permutation σ in S_n, the sign of σ is defined as

    sgn(σ) = {  1  if σ is an even permutation
               −1  if σ is an odd permutation }  = (−1)^k,

where k is the number of inversions of σ.

For example, when n = 3, the permutations (1, 2, 3), (2, 3, 1) and (3, 1, 2) are even, while the permutations (1, 3, 2), (2, 1, 3) and (3, 2, 1) are odd. In general, one can convert a permutation σ = (σ(1), σ(2), …, σ(n)) in S_n into the identity permutation (1, 2, …, n) by transposing each inversion of σ. However, the number of transpositions needed to convert a given permutation into the identity permutation need not be unique, as shown in Example 2.3. An interesting fact is that, even though the number of such transpositions is not unique, its parity (even or odd) is always the same as that of the number of inversions. (This may not be obvious, and the reader is encouraged to work out a couple of examples.)

We now go back to Step 3 to compute the determinants of the remaining n! matrices. Each of them has n entries of the form

    a_{1σ(1)}, a_{2σ(2)}, …, a_{nσ(n)}
for a permutation σ ∈ S_n. Moreover, such a matrix can be converted into a diagonal matrix by column interchanges corresponding to the inversions in the permutation σ = (σ(1), σ(2), …, σ(n)). Hence, its determinant is equal to

    sgn(σ) a_{1σ(1)} a_{2σ(2)} ⋯ a_{nσ(n)}.

This is called a signed elementary product of A. Our discussion can be summarized as follows, giving an explicit formula for det A:

Theorem 2.6 For an n × n matrix A,

    det A = Σ_{σ ∈ S_n} sgn(σ) a_{1σ(1)} a_{2σ(2)} ⋯ a_{nσ(n)}.

That is, det A is the sum of all signed elementary products of A.

This shows that the determinant must be unique if it exists. On the other hand, one can show that the explicit formula for det A in Theorem 2.6 satisfies the three rules (R1)-(R3). Therefore, we have both existence and uniqueness of the determinant function on square matrices of any order n ≥ 1, which proves Theorem 2.4. As the last part of this section, we add an example to demonstrate that any permutation σ can be converted into the identity permutation by the same number of transpositions as the number of inversions in σ.
Example 2.4 (Convert into the identity permutation by transpositions) Consider the permutation σ = (3, 1, 5, 4, 2) in S_5. It has five inversions, and it can be converted to the identity permutation by five successive transpositions:

    σ = (3, 1, 5, 4, 2) → (1, 3, 5, 4, 2) → (1, 3, 5, 2, 4) → (1, 3, 2, 5, 4)
      → (1, 2, 3, 5, 4) → (1, 2, 3, 4, 5).

This was done by moving the number 1 to the first position, then 2 to the second position, and so on; at each of the five steps exactly two numbers are interchanged. The five transpositions used to convert σ into the identity permutation are shown below:

    σ (2, 1, 3, 4, 5)(1, 2, 3, 5, 4)(1, 2, 4, 3, 5)(1, 3, 2, 4, 5)(1, 2, 3, 5, 4) = (1, 2, 3, 4, 5).

Here, two permutations are composed from right to left, by notational convention: i.e., if we denote τ = (2, 1, 3, 4, 5), then στ = σ ∘ τ. For example, στ(2) = σ(1) = 3. Also, note that σ can be converted to the identity permutation by composing the following three transpositions successively:

    σ (2, 1, 3, 4, 5)(1, 5, 3, 4, 2)(1, 2, 5, 4, 3) = (1, 2, 3, 4, 5). □
It is not hard to see that the number of even permutations is equal to the number of odd permutations, so each is n!/2. In the case n = 3, one can notice that there are three terms with + sign and three terms with − sign in det A.

Problem 2.7 Show that the number of even permutations and the number of odd permutations in S_n are equal.

Problem 2.8 Let A = [c_1 ⋯ c_n] be an n × n matrix with column vectors c_j. Show that

    det[c_j c_1 ⋯ c_{j−1} c_{j+1} ⋯ c_n] = (−1)^{j−1} det[c_1 ⋯ c_j ⋯ c_n].

Note that the same kind of equality holds when A is written in row vectors.
2.3 Cofactor expansion

Even if one has found an explicit formula for the determinant, as in Theorem 2.6, it is not much help in computation, because one has to sum up n! terms, which becomes a very large number as n gets large. Thus, we reformulate the formula in an inductive way, by which the computation can be shortened. The first factor a_{1σ(1)} in each of the n! terms is one of a_11, a_12, …, a_1n in the first row of A. Hence, one can divide the n! terms of the expansion of det A into n groups according to the value of σ(1): say,

    det A = Σ_{σ ∈ S_n} sgn(σ) a_{1σ(1)} a_{2σ(2)} ⋯ a_{nσ(n)}
          = a_11 A_11 + a_12 A_12 + ⋯ + a_1n A_1n,

where, for j = 1, 2, …, n, A_1j is defined as

    A_1j = Σ_{σ ∈ S_n, σ(1) = j} sgn(σ) a_{2σ(2)} ⋯ a_{nσ(n)}.

This number A_1j will turn out to be the determinant, up to a ± sign, of the submatrix of A obtained by deleting the first row and the j-th column. This submatrix is denoted by M_1j and called the minor of the entry a_1j.
Remark: If we replace the entries a_11, a_12, …, a_1n in the first row of A by unknown variables x_1, x_2, …, x_n, then det A is a polynomial in the variables x_i, and the number A_1j is the coefficient of the variable x_j in this polynomial.
We now aim to compute A_1j for j = 1, 2, …, n. Clearly, when j = 1,

    A_11 = Σ_{σ ∈ S_n, σ(1) = 1} sgn(σ) a_{2σ(2)} ⋯ a_{nσ(n)} = Σ_τ sgn(τ) a_{2τ(2)} ⋯ a_{nτ(n)},

summing over all permutations τ of the numbers 2, 3, …, n. Note that each term in A_11 contains no entries from the first row or from the first column of A. Hence, all
the (n−1)! terms in the sum A_11 are just the signed elementary products of the submatrix M_11 of A obtained by deleting the first row and the first column of A, so that A_11 = det M_11.

To compute the number A_1j for j > 1, let A = [c_1 ⋯ c_n] with column vectors c_j, and let B = [c_j c_1 ⋯ c_{j−1} c_{j+1} ⋯ c_n] be the matrix obtained from A by interchanging the j-th column with each of its preceding j−1 columns, one by one, up to the first. Then det A = (−1)^{j−1} det B (see Problem 2.8). Write

    det B = b_11 B_11 + b_12 B_12 + ⋯ + b_1n B_1n

as the expansion of det B. Then a_1j = b_11, and the number B_11 is the coefficient of the entry b_11 in the formula for det B. Noting that A_1j is the coefficient of the entry a_1j in the formula for det A, one has A_1j = (−1)^{j−1} B_11. Moreover, the minor M_1j of the entry a_1j is the same as the minor N_11 of the entry b_11. Now, by applying the previous conclusion A_11 = det M_11 to the matrix B, one obtains B_11 = det N_11, and then

    A_1j = (−1)^{j−1} B_11 = (−1)^{j−1} det N_11 = (−1)^{j−1} det M_1j.

In summary, one gets an expansion of det A with respect to the first row:

    det A = a_11 A_11 + a_12 A_12 + ⋯ + a_1n A_1n
          = a_11 det M_11 − a_12 det M_12 + ⋯ + (−1)^{1+n} a_1n det M_1n.

This is called the cofactor expansion of det A along the first row.

There is a similar expansion with respect to any other row, say the i-th row. To show this, first construct a new matrix C from A by moving the i-th row of A up to the first row, interchanging it with each of its preceding i−1 rows, one by one. Then det A = (−1)^{i−1} det C as before. Now, the expansion of det C with respect to its first row [a_i1 ⋯ a_in] is

    det C = a_i1 C_11 + a_i2 C_12 + ⋯ + a_in C_1n,

where C_1j = (−1)^{j−1} det M'_1j and M'_1j denotes the minor of c_1j in the matrix C. Noting that M'_1j = M_ij as minors, we have

    A_ij = (−1)^{i−1} C_1j = (−1)^{i+j} det M_ij

as before, and then

    det A = a_i1 A_i1 + a_i2 A_i2 + ⋯ + a_in A_in.

The submatrix M_ij is called the minor of the entry a_ij, and the number A_ij = (−1)^{i+j} det M_ij is called the cofactor of the entry a_ij. One can also do the same with the column vectors, because det Aᵀ = det A. This gives the following theorem:
Theorem 2.7 Let A be an n × n matrix, and let A_ij be the cofactor of the entry a_ij. Then,
(1) for each 1 ≤ i ≤ n,

    det A = a_i1 A_i1 + a_i2 A_i2 + ⋯ + a_in A_in,

called the cofactor expansion of det A along the i-th row;
(2) for each 1 ≤ j ≤ n,

    det A = a_1j A_1j + a_2j A_2j + ⋯ + a_nj A_nj,

called the cofactor expansion of det A along the j-th column.

This cofactor expansion gives an alternative way of defining the determinant inductively.

Remark: The sign (−1)^{i+j} of the cofactor follows the checkerboard pattern

    [    +          −          +      ⋯  (−1)^{1+n} ]
    [    −          +          −      ⋯  (−1)^{2+n} ]
    [    +          −          +      ⋯  (−1)^{3+n} ]
    [    ⋮          ⋮          ⋮            ⋮       ]
    [ (−1)^{n+1}  (−1)^{n+2}  (−1)^{n+3}  ⋯  (−1)^{n+n} ]

Therefore, the determinant of an n × n matrix A is the sum of the products of the entries in any given row (or any given column) with their cofactors.

Example 2.5 (Computing det A by a cofactor expansion) Let
        [ 1  2  3 ]
    A = [ 4  5  6 ] .
        [ 7  8  9 ]

Then the cofactors of a_11, a_12 and a_13 are

    A_11 = (−1)^{1+1} det [ 5  6 ] = 5·9 − 8·6 = −3,
                          [ 8  9 ]

    A_12 = (−1)^{1+2} det [ 4  6 ] = (−1)(4·9 − 7·6) = 6,
                          [ 7  9 ]

    A_13 = (−1)^{1+3} det [ 4  5 ] = 4·8 − 7·5 = −3,
                          [ 7  8 ]

respectively. Hence the expansion of det A along the first row is

    det A = a_11 A_11 + a_12 A_12 + a_13 A_13 = 1·(−3) + 2·6 + 3·(−3) = 0. □
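The inductive definition given by the cofactor expansion leads to a short recursive routine. A minimal sketch, expanding along the first row and checked against the matrix of Example 2.5:

```python
def det_by_cofactors(A):
    """Cofactor expansion along the first row (Theorem 2.7(1) with i = 1)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]  # delete row 1 and column j+1
        total += (-1) ** j * A[0][j] * det_by_cofactors(minor)
    return total

print(det_by_cofactors([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0, as in Example 2.5
```

The recursion still performs on the order of n! multiplications, so this is a faithful illustration of the definition rather than an efficient algorithm; elimination, as in the next examples, is far cheaper.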
The cofactor expansion formula for det A suggests that the evaluation of A_ij can be skipped whenever a_ij = 0, because the product a_ij A_ij is zero regardless of the value of A_ij. Therefore, the computation of the determinant is simplified by taking the cofactor expansion along a row or a column that contains as many zero entries as possible. Moreover, by using the elementary row (or column) operations that do not alter the determinant, a matrix A may first be simplified into one having more zero entries in a row or in a column, for which the determinant is easier to compute. For example, a forward elimination applied to a square matrix A produces an upper triangular matrix U, and the determinant of A is then just the product of the diagonal entries of U, up to the sign caused by possible row interchanges. The next examples illustrate this method.
Example 2.6 (Computing det A by a forward elimination and a cofactor expansion) Evaluate the determinant of

        [  1  -1   2  -1 ]
    A = [ -3   4   1  -1 ] .
        [  2  -5  -3   8 ]
        [ -2   6  -4   1 ]

Solution: Apply the elementary operations

    3 × row 1 + row 2,   (−2) × row 1 + row 3,   2 × row 1 + row 4

to A; then

                [ 1  -1   2  -1 ]
    det A = det [ 0   1   7  -4 ] = det [  1   7  -4 ]
                [ 0  -3  -7  10 ]       [ -3  -7  10 ] .
                [ 0   4   0  -1 ]       [  4   0  -1 ]

Now apply the operation 1 × row 1 + row 2 to the matrix on the right-hand side, and take the cofactor expansion along the second column to get

        [  1   7  -4 ]       [  1  7  -4 ]
    det [ -3  -7  10 ] = det [ -2  0   6 ] = (−1)^{1+2} · 7 · det [ -2   6 ]
        [  4   0  -1 ]       [  4  0  -1 ]                        [  4  -1 ]

    = −7(2 − 24) = 154.

Thus, det A = 154. □
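The elimination strategy of Example 2.6 can be automated. A minimal sketch using exact rational arithmetic to avoid rounding, applied to the 4 × 4 matrix of that example (entries as reconstructed here):

```python
from fractions import Fraction

def det_by_elimination(rows):
    """Determinant by forward elimination; each row interchange flips the sign."""
    m = [[Fraction(x) for x in row] for row in rows]
    n, sign = len(m), 1
    det = Fraction(1)
    for j in range(n):
        p = next((i for i in range(j, n) if m[i][j] != 0), None)
        if p is None:
            return Fraction(0)       # no pivot in this column: singular matrix
        if p != j:
            m[j], m[p] = m[p], m[j]  # row interchange
            sign = -sign
        for i in range(j + 1, n):
            f = m[i][j] / m[j][j]
            m[i] = [a - f * b for a, b in zip(m[i], m[j])]
        det *= m[j][j]               # product of the pivots
    return sign * det

A = [[1, -1, 2, -1], [-3, 4, 1, -1], [2, -5, -3, 8], [-2, 6, -4, 1]]
print(det_by_elimination(A))  # 154
```

This costs on the order of n³ operations, in contrast to the n! terms of the permutation formula.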
Example 2.7 Show that det A = (x − y)(x − z)(x − w)(y − z)(y − w)(z − w) for the Vandermonde matrix of order 4:

        [ 1  x  x²  x³ ]
    A = [ 1  y  y²  y³ ] .
        [ 1  z  z²  z³ ]
        [ 1  w  w²  w³ ]

Solution: Use Gaussian elimination. To begin with, add (−1) × row 1 to rows 2, 3, and 4 of A:

                [ 1    x     x²       x³    ]
    det A = det [ 0   y−x   y²−x²   y³−x³ ] = det [ y−x  y²−x²  y³−x³ ]
                [ 0   z−x   z²−x²   z³−x³ ]       [ z−x  z²−x²  z³−x³ ]
                [ 0   w−x   w²−x²   w³−x³ ]       [ w−x  w²−x²  w³−x³ ]

                                  [ 1  y+x  y²+xy+x² ]
    = (y−x)(z−x)(w−x) det [ 1  z+x  z²+xz+x² ]
                                  [ 1  w+x  w²+xw+x² ]

                                  [ 1   y+x        y²+xy+x²    ]
    = (y−x)(z−x)(w−x) det [ 0   z−y   (z−y)(z+y+x) ]
                                  [ 0   w−y   (w−y)(w+y+x) ]

    = (y−x)(z−x)(w−x) det [ z−y  (z−y)(z+y+x) ]
                           [ w−y  (w−y)(w+y+x) ]

    = (y−x)(z−x)(w−x)(z−y)(w−y) det [ 1  z+y+x ]
                                     [ 1  w+y+x ]

    = (y−x)(z−x)(w−x)(z−y)(w−y)(w−z)
    = (x − y)(x − z)(x − w)(y − z)(y − w)(z − w). □
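The product formula derived in Example 2.7 is easy to verify numerically for sample values; the sketch below uses exact rational arithmetic and a small cofactor-expansion helper (the test values 2, 3, 5, 7 are arbitrary choices made here):

```python
from itertools import combinations
from math import prod
from fractions import Fraction

def det(A):
    """Cofactor expansion along the first row; exact over Fractions."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([r[:j] + r[j + 1:] for r in A[1:]])
               for j in range(len(A)))

xs = [Fraction(v) for v in (2, 3, 5, 7)]
V = [[x ** k for k in range(4)] for x in xs]     # rows [1, x, x^2, x^3]
formula = prod(xs[j] - xs[i] for i, j in combinations(range(4), 2))
assert det(V) == formula == 240
```

Here `formula` is the product of all differences x_j − x_i with i < j, anticipating the general Vandermonde determinant of Example 2.10.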
Problem 2.9 Use cofactor expansions along a row or a column to evaluate the determinants of the following matrices:

            [ 0  1  2  2 ]            [  a   0  -a   0 ]
    (1) A = [ 1  0  2  2 ] ,  (2) B = [ -b  -d  -c  -e ] .
            [ 2  2  0  1 ]            [  b   e   c   d ]
            [ 2  2  2  0 ]            [  0   1  -1   0 ]

Problem 2.10 Evaluate the determinant of
2.4 Cramer's rule

In Chapter 1, we studied two methods for solving a system of linear equations Ax = b: (i) Gauss-Jordan elimination (or LDU factorization), and (ii) using A⁻¹ when A is invertible. In this section, we introduce another method for solving the system Ax = b for an invertible matrix A.

The cofactor expansion of the determinant gives a method for computing the inverse of an invertible matrix A. For i ≠ j, let A* be the matrix A with the j-th row replaced by the i-th row. Then the determinant of A* must be zero, because the i-th and j-th rows are the same. Moreover, with respect to the j-th row, the cofactors of A* are the same as those of A: that is, A*_jk = A_jk for all k = 1, …, n. Therefore, we have

    0 = det A* = a_i1 A*_j1 + a_i2 A*_j2 + ⋯ + a_in A*_jn
               = a_i1 A_j1 + a_i2 A_j2 + ⋯ + a_in A_jn.

This proves the following lemma.

Lemma 2.8

    a_i1 A_j1 + a_i2 A_j2 + ⋯ + a_in A_jn = { det A  if i = j,
                                               0      if i ≠ j.

Definition 2.6 Let A be an n × n matrix, and let A_ij denote the cofactor of a_ij. The new matrix

    [ A_11  A_12  ⋯  A_1n ]
    [ A_21  A_22  ⋯  A_2n ]
    [  ⋮     ⋮         ⋮  ]
    [ A_n1  A_n2  ⋯  A_nn ]

is called the matrix of cofactors of A. Its transpose is called the adjugate of A and is denoted by adj A.

It follows from Lemma 2.8 that

                [ det A    0    ⋯    0   ]
    A · adj A = [   0    det A  ⋯    0   ] = (det A) I.
                [   ⋮      ⋮          ⋮  ]
                [   0      0    ⋯  det A ]

If A is invertible, then det A ≠ 0, and we may write A ( (1/det A) adj A ) = I. Thus
    A⁻¹ = (1/det A) adj A,   and   A = (det A) adj(A⁻¹)

by replacing A with A⁻¹.

Example 2.8 (Computing A⁻¹ with adj A) For a matrix A = [ a  b ; c  d ],

    adj A = [  d  -b ] ,
            [ -c   a ]

and if det A = ad − bc ≠ 0, then

    A⁻¹ = 1/(ad − bc) [  d  -b ] .
                      [ -c   a ]

Problem 2.11 Compute adj A and A⁻¹ for A
=[ ;
i ~] .
2 -2 1
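Definition 2.6 and Lemma 2.8 translate directly into code: build the cofactors, transpose, and check that A · adj A = (det A) I. A minimal sketch (the sample 3 × 3 matrix is an arbitrary choice made here):

```python
def minor(A, i, j):
    """Delete row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def adj(A):
    """Adjugate: the transpose of the matrix of cofactors."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, i, j)) for i in range(n)]
            for j in range(n)]

A = [[1, 2, 1], [2, 2, 1], [1, 2, 3]]
adjA = adj(A)
d = det(A)
AadjA = [[sum(A[i][k] * adjA[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
assert AadjA == [[d, 0, 0], [0, d, 0], [0, 0, d]]   # A · adj A = (det A) I
```

The off-diagonal zeros in the product are exactly the i ≠ j case of Lemma 2.8.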
Problem 2.12 Show that A is invertible if and only if adj A is invertible, and that if A is invertible, then

    (adj A)⁻¹ = A / det A = adj(A⁻¹).

Problem 2.13 Let A be an n × n invertible matrix with n > 1. Show that
(1) det(adj A) = (det A)^{n−1};
(2) adj(adj A) = (det A)^{n−2} A.

Problem 2.14 For invertible matrices A and B, show that
(1) adj(AB) = adj B · adj A;
(2) adj(QAQ⁻¹) = Q(adj A)Q⁻¹ for any invertible matrix Q;
(3) if AB = BA, then (adj A)B = B(adj A).

In fact, these three properties hold for any two (invertible or not) square matrices A and B. (See Exercise 6.5.)
The next theorem establishes a formula for the solution of a system of n equations in n unknowns. It may not be useful as a practical method, but it can be used to study properties of the solution without solving the system.

Theorem 2.9 (Cramer's rule) Let Ax = b be a system of n linear equations in n unknowns such that det A ≠ 0. Then the system has the unique solution given by

    x_j = det C_j / det A,   j = 1, 2, …, n,

where C_j is the matrix obtained from A by replacing the j-th column with the column matrix b = [b_1 b_2 ⋯ b_n]ᵀ.
Proof: If det A ≠ 0, then A is invertible and x = A⁻¹b is the unique solution of Ax = b. Since

    x = A⁻¹b = (1/det A)(adj A) b,

it follows that

    x_j = (1/det A)(b_1 A_1j + b_2 A_2j + ⋯ + b_n A_nj) = det C_j / det A. □

Example 2.9 Use Cramer's rule to solve

     x_1 + 2x_2 +  x_3 = 50
    2x_1 + 2x_2 +  x_3 = 60
     x_1 + 2x_2 + 3x_3 = 90.

Solution:

        [ 1  2  1 ]         [ 50  2  1 ]         [ 1  50  1 ]         [ 1  2  50 ]
    A = [ 2  2  1 ] , C_1 = [ 60  2  1 ] , C_2 = [ 2  60  1 ] , C_3 = [ 2  2  60 ] .
        [ 1  2  3 ]         [ 90  2  3 ]         [ 1  90  3 ]         [ 1  2  90 ]

Therefore,

    x_1 = det C_1 / det A = 10,   x_2 = det C_2 / det A = 10,   x_3 = det C_3 / det A = 20. □
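Cramer's rule is mechanical enough to code in a few lines: replace each column of A by b and take the ratio of determinants. A minimal sketch, run on the system of Example 2.9:

```python
from fractions import Fraction

def det(A):
    """Cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([r[:j] + r[j + 1:] for r in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    """Solve Ax = b by Cramer's rule; requires det A != 0."""
    d = det(A)
    cols = list(zip(*A))
    xs = []
    for j in range(len(A)):
        # C_j: the matrix A with its j-th column replaced by b
        Cj = [list(row) for row in zip(*(cols[:j] + [tuple(b)] + cols[j + 1:]))]
        xs.append(Fraction(det(Cj), d))
    return xs

A = [[1, 2, 1], [2, 2, 1], [1, 2, 3]]
b = [50, 60, 90]
print(cramer(A, b))  # x1 = 10, x2 = 10, x3 = 20
```

As the text notes next, this evaluates n + 1 determinants of order n, so it illustrates the formula rather than competing with elimination.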
Cramer's rule provides a convenient method for writing down the solution of a system of n linear equations in n unknowns in terms of determinants. To find the solution, however, one must evaluate n + 1 determinants of order n. Evaluating even two of these determinants generally involves more computation than solving the system by Gauss-Jordan elimination.

Problem 2.15 Use Cramer's rule to solve the following systems.
(1)
(2)
I
3Xl - 2x 1 2 x 4 x
4x2 4X2 5X2
+ +
3
+ + 5
- + y z 7 2 + -y + z 2
-
y
-
I
-
z
-2
3X3 5X3 2X3
6
1. 3 0 2.
Problem 2.16 Let A be the matrix obtained from the identity matrix I_n by replacing its i-th column with the column vector x = [x_1 ⋯ x_n]ᵀ. Compute det A.
2.5 Applications

2.5.1 Miscellaneous examples for determinants

Example 2.10 Let A be the Vandermonde matrix of order n:

        [ 1  x_1  x_1²  ⋯  x_1^{n−1} ]
    A = [ 1  x_2  x_2²  ⋯  x_2^{n−1} ] .
        [ ⋮   ⋮     ⋮          ⋮     ]
        [ 1  x_n  x_n²  ⋯  x_n^{n−1} ]

Its determinant can be computed by the same method as in Example 2.7:

    det A = ∏_{1 ≤ i < j ≤ n} (x_j − x_i).
Example 2.11 Let A_ij denote the cofactor of a_ij in an n × n matrix A = [a_ij]. If n > 1, then

        [  0    x_1   x_2   ⋯   x_n  ]
        [ x_1   a_11  a_12  ⋯  a_1n ]
    det [ x_2   a_21  a_22  ⋯  a_2n ] = − Σ_{i=1}^{n} Σ_{j=1}^{n} A_ij x_i x_j.
        [  ⋮     ⋮     ⋮          ⋮  ]
        [ x_n   a_n1  a_n2  ⋯  a_nn ]

Solution: First take the cofactor expansion along the first row, and then compute the cofactor expansion along the first column of each n × n submatrix. □
12 , ... , fl(X) f{(x)
[
fn be n real-valued differentiable functions on JR.
h(x) f 2(x )
f?-:I) (x) j 2(n-
l)( )
x
fn(x)
f~(x)
.. ·
]
f~n-:l\X)
,
its determinant is called the Wronskian for (fl (x), h(x) , . . . , fn(x)} , For example , det
[
X 1
sin X C?S X
o - sm x
cos X ] sin x - cos x -
= - x,
but det
sin X X + sin X ] 1 + C?SX 0 - sm x - sm x
[ X 1
C?SX
= o.
In general, in fl , 12, ..., fn , if one of them is a constant multiple of another or a sum of such multiples, then the Wronskian must be zero.
Example 2.13 An n × n matrix A is called a circulant matrix if the i-th row of A is obtained from the first row of A by a cyclic shift of i − 1 steps; i.e., the general form of a circulant matrix is

        [ a_1      a_2    a_3  ⋯  a_n     ]
    A = [ a_n      a_1    a_2  ⋯  a_{n−1} ] .
        [ a_{n−1}  a_n    a_1  ⋯  a_{n−2} ]
        [   ⋮       ⋮      ⋮        ⋮     ]
        [ a_2      a_3    a_4  ⋯  a_1     ]

For n = 3, one can check directly that

    det A = (a_1 + a_2 + a_3)(a_1 + a_2 ω + a_3 ω²)(a_1 + a_2 ω² + a_3 ω⁴),

where ω = e^{2πi/3} is the primitive root of unity. In general, for n > 1,

    det A = ∏_{j=0}^{n−1} (a_1 + a_2 ω_j + a_3 ω_j² + ⋯ + a_n ω_j^{n−1}),

where ω_j = e^{2πij/n}, j = 0, 1, …, n − 1, are the n-th roots of unity. (See Example 8.18 for the proof.)
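The product formula over the roots of unity can be checked numerically for a small circulant matrix; the sketch below builds the circulant from its first row and compares the two sides (the sample first row [1, 2, 3, 4] is an arbitrary choice made here):

```python
import cmath

def det(A):
    """Cofactor expansion along the first row; works over complex numbers too."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([r[:j] + r[j + 1:] for r in A[1:]])
               for j in range(len(A)))

a = [1, 2, 3, 4]
n = len(a)
C = [[a[(j - i) % n] for j in range(n)] for i in range(n)]   # circulant matrix

prod_eig = 1
for j in range(n):
    w = cmath.exp(2j * cmath.pi * j / n)                     # an n-th root of unity
    prod_eig *= sum(a[k] * w ** k for k in range(n))

assert abs(det(C) - prod_eig) < 1e-9
```

The factors being multiplied are exactly the eigenvalues of the circulant, which is the substance of the proof cited in Example 8.18.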
Example 2.14 A tridiagonal matrix is a square matrix of the form

          [ a_1  b_1   0   ⋯     0       0     ]
          [ c_1  a_2  b_2  ⋯     0       0     ]
    T_n = [  0   c_2  a_3  ⋯     0       0     ] .
          [  ⋮    ⋮    ⋮          ⋮       ⋮    ]
          [  0    0    0   ⋯  a_{n−1}  b_{n−1} ]
          [  0    0    0   ⋯  c_{n−1}  a_n     ]

The determinant of this matrix can be computed by a recurrence relation: set D_0 = 1 and D_k = det T_k for k ≥ 1. By expanding along the k-th row, one obtains the recurrence relation

    D_k = a_k D_{k−1} − b_{k−1} c_{k−1} D_{k−2}.

The following two special cases are interesting.

Case (1) Let all the a_i, b_j, c_k be the same, say a_i = b_j = c_k = b > 0. Then D_1 = b, D_2 = 0 and
    D_n = b D_{n−1} − b² D_{n−2}   for n ≥ 3.

Successively, one finds D_3 = −b³, D_4 = −b⁴, D_5 = 0, …. In general, the n-th term D_n of the recurrence relation is given by

    D_n = bⁿ [ cos(nπ/3) + (1/√3) sin(nπ/3) ].

Later, in Section 6.3.1, it will be discussed how to find the n-th term of a given recurrence relation. (See Exercise 8.6.)
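The closed form for Case (1) can be checked against the recurrence directly; a minimal sketch with b = 2 (an arbitrary sample value chosen here):

```python
from math import cos, sin, pi, sqrt, isclose

b = 2.0
D = [1.0, b, 0.0]                         # D_0 = 1, D_1 = b, D_2 = 0
for n in range(3, 13):
    D.append(b * D[-1] - b * b * D[-2])   # D_n = b D_{n-1} - b^2 D_{n-2}

for n in range(1, 13):
    closed = b ** n * (cos(n * pi / 3) + sin(n * pi / 3) / sqrt(3))
    assert isclose(D[n], closed, abs_tol=1e-6)
```

The check also makes the period-6 pattern visible: D_2, D_5, D_8, … vanish, and the sign flips every three steps.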
Case (2) Let all b_j = 1 and all c_k = −1, and let us write

                          [ a_1   1    0   ⋯    0      0  ]
                          [ −1   a_2   1   ⋯    0      0  ]
    (a_1 a_2 ⋯ a_n) = det [  0   −1   a_3  ⋯    0      0  ] .
                          [  ⋮    ⋮    ⋮         ⋮     ⋮  ]
                          [  0    0    0   ⋯  a_{n−1}  1  ]
                          [  0    0    0   ⋯   −1     a_n ]

Then,

    (a_1 a_2 ⋯ a_n) / (a_2 a_3 ⋯ a_n) = a_1 + 1/(a_2 + 1/(a_3 + ⋯ + 1/a_n)).

Proof: Let us prove it by induction on n. Clearly, a_1 + 1/a_2 = (a_1 a_2)/(a_2). It remains to show that

    a_1 + 1 / ( (a_2 a_3 ⋯ a_n) / (a_3 a_4 ⋯ a_n) ) = (a_1 a_2 ⋯ a_n) / (a_2 a_3 ⋯ a_n),

i.e., a_1 (a_2 ⋯ a_n) + (a_3 ⋯ a_n) = (a_1 a_2 ⋯ a_n). But this identity follows from the previous recurrence relation, since (a_1 a_2 ⋯ a_n) = (a_n ⋯ a_2 a_1). □
Example 2.15 (Binet-Cauchy formula) Let A and B be matrices of size n × m and m × n, respectively, with n ≤ m. Then

    det(AB) = Σ_{1 ≤ k_1 < k_2 < ⋯ < k_n ≤ m} A_{k_1…k_n} B_{k_1…k_n},

where A_{k_1…k_n} is the minor obtained from the columns of A whose numbers are k_1, …, k_n, and B_{k_1…k_n} is the minor obtained from the rows of B whose numbers are k_1, …, k_n. In other words, det(AB) is the sum of the products of the corresponding majors of A and B, where a major of a matrix is, by definition, a determinant of a maximal-order minor in the matrix.

Proof: Let C = AB, so that c_ij = Σ_{k=1}^{m} a_ik b_kj. Then

    det C = Σ_{σ ∈ S_n} sgn(σ) c_{1σ(1)} ⋯ c_{nσ(n)}
          = Σ_{k_1,…,k_n = 1}^{m} a_{1k_1} ⋯ a_{nk_n} Σ_{σ ∈ S_n} sgn(σ) b_{k_1σ(1)} ⋯ b_{k_nσ(n)}
          = Σ_{k_1,…,k_n = 1}^{m} a_{1k_1} ⋯ a_{nk_n} B_{k_1…k_n}.

The minor B_{k_1…k_n} is nonzero only if the numbers k_1, …, k_n are distinct, so the summation can be taken over distinct numbers k_1, …, k_n. Since B_{τ(k_1)…τ(k_n)} = (−1)^τ B_{k_1…k_n} for any permutation τ of the numbers k_1, …, k_n, collecting the terms with the set {k_1, …, k_n} fixed yields the formula above. □
1 -1
For example, if A = [ 2
det(AB) = det [
~
3 3
2-1 2] and B
-1] 2 det [12 -12 ]
+
det [ 21 3] 2 det [11 2] 2
+
det [
=
+
+
~[
,kn are distinct. Thus, the ,kn . Since B,(kIl...,(kn) = ,kn , we have
21 -12 ] _; • then
~
det [12 -13] det [ - 31 2] 1
det [ -12 -13] det [ -23 -1] 1
3] [2 -1] + [3 3] [-3 1]
- 1 2 2
det
1
2
det
-1 2
det
1 2
-167.
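The Binet-Cauchy formula is easy to test on random integer matrices. The following sketch is ours; the sizes n = 2, m = 4 and the random seed are arbitrary choices.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 4                                  # requires n <= m
A = rng.integers(-3, 4, size=(n, m)).astype(float)
B = rng.integers(-3, 4, size=(m, n)).astype(float)

# det(AB) = sum over k_1 < ... < k_n of det(A[:, ks]) * det(B[ks, :])
total = 0.0
for ks in itertools.combinations(range(m), n):
    cols = list(ks)
    total += np.linalg.det(A[:, cols]) * np.linalg.det(B[cols, :])

assert abs(total - np.linalg.det(A @ B)) < 1e-8
```

With n = 2 and m = 4 the sum runs over the C(4, 2) = 6 pairs of column/row indices, exactly as in the worked example.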
2.5.2 Application: Area and volume

In this section, we give a geometric interpretation of the determinant of a square matrix A as the volume (or, for n = 2, the area) of the parallelepiped P(A) spanned by the row vectors of A. For ease of visualization we restrict our attention to the cases n = 2 and n = 3, although a similar argument applies for n > 3.
Chapter 2. Determinants
For an n x n square matrix A, its row vectors r_i = [a_i1 a_i2 ... a_in], i = 1, 2, ..., n, can be considered as elements in R^n. The set

    P(A) = { sum_{i=1}^n t_i r_i : 0 <= t_i <= 1, i = 1, 2, ..., n }

is called a parallelogram if n = 2, or a parallelepiped if n >= 3. Note that the row vectors of A form the edges of P(A), and changing the order of the row vectors does not alter the shape of P(A).
Theorem 2.10 The determinant det A of an n x n matrix A is the volume of P(A) up to sign. In fact, the volume of P(A) is equal to |det A|.

Proof: We give the proof for the case n = 2 only, and leave the case n = 3 to the reader. Let A = [r_1; r_2], where r_1, r_2 are the row vectors of A and [r_1; r_2] denotes the matrix with rows r_1 and r_2. Let Area(A) denote the area of the parallelogram P(A) (see Figure 2.3). Note that

    Area[r_2; r_1] = Area[r_1; r_2],  but  det[r_2; r_1] = - det[r_1; r_2].
Thus, one can expect in general that

    det[r_1; r_2] = +/- Area[r_1; r_2],

which explains why we say 'up to sign'. To determine the sign, we first define the orientation of A = [r_1; r_2] to be

    rho(A) = det A / |det A| = +/- 1   if det A != 0,
    rho(A) = 0                          if det A = 0.

In general, rho(A) = 1 if and only if det A > 0; in this case, we say the ordered pair (r_1, r_2) is positively oriented. Likewise, rho(A) = -1 if and only if det A < 0; in this case, A is negatively oriented. See Figure 2.2 (next page). For example, rho([1 0; 0 1]) = 1, while rho([0 1; 1 0]) = -1.

To finish the proof, it is sufficient to show that the function D(A) = rho(A) Area(A) satisfies the rules (R1)-(R3) of the determinant, so that det = D = +/- Area, or Area(A) = |det A|. Indeed:
Figure 2.2. Orientation of vectors
(1) It is clear that D([1 0; 0 1]) = 1.

(2) D[r_2; r_1] = -D[r_1; r_2], because Area[r_2; r_1] = Area[r_1; r_2] and rho([r_2; r_1]) = -rho([r_1; r_2]).

(3) D[k r_1; r_2] = k D[r_1; r_2] for any k. Indeed, if k = 0, it is clear. Suppose k != 0. Then, as illustrated in Figure 2.3, the bottom edge r_1 of P(A) is elongated by the factor |k| while the height h remains unchanged. Thus

    Area[k r_1; r_2] = |k| Area[r_1; r_2].

Figure 2.3. The parallelogram P([k r_1; r_2])

On the other hand,

    rho([k r_1; r_2]) = (k / |k|) rho([r_1; r_2]).
Therefore, we have
    D[k r_1; r_2] = rho([k r_1; r_2]) Area[k r_1; r_2]
                  = (k / |k|) rho([r_1; r_2]) |k| Area[r_1; r_2]
                  = k D[r_1; r_2].

(4) D[r_1 + r_2; u] = D[r_1; u] + D[r_2; u] for any u, r_1 and r_2 in R^2. If u = 0, there is nothing to prove. Assume that u != 0. Choose any vector v in R^2 such that {u, v} is a basis for R^2 and the pair (u, v) is positively oriented. Then r_i = a_i u + b_i v, i = 1, 2, and

    D[r_1 + r_2; u] = D[(a_1 + a_2)u + (b_1 + b_2)v; u]
                    = D[(b_1 + b_2)v; u]
                    = (b_1 + b_2) D[v; u]
                    = D[a_1 u + b_1 v; u] + D[a_2 u + b_2 v; u]
                    = D[r_1; u] + D[r_2; u].

The second equality, and the fourth read backwards, follow from Figure 2.4: adding a multiple of u to the first row is a shear, which changes neither the area nor the orientation. The third equality follows from (3).
□

Remark: (1) Note that if we had constructed the parallelepiped P(A) using the column vectors of A, its shape would in general be quite different from the one constructed using the row vectors. However, det A = det A^T means that their volumes are the same, which is a nontrivial fact.
(2) For n >= 3, the volume of P(A) can be defined by induction on n, and exactly the same argument as in the proof shows that the volume is |det A|. However, there is another way of looking at this fact. Let {c_1, c_2, ..., c_n} be the n column
vectors of an m x n matrix A. They constitute an n-dimensional parallelepiped in R^m:

    P(A) = { sum_{i=1}^n t_i c_i : 0 <= t_i <= 1, i = 1, 2, ..., n }.

A formula for the volume of this parallelepiped may be derived as follows. First consider a two-dimensional parallelepiped (a parallelogram) determined by the two column vectors c_1 and c_2 of A = [c_1 c_2] in R^3.

Figure 2.5. A parallelogram in R^3

The area of this parallelogram is simply Area(P(A)) = ||c_1|| h, where h = ||c_2|| sin theta and theta is the angle between c_1 and c_2. Therefore, we have

    Area(P(A))^2 = ||c_1||^2 ||c_2||^2 sin^2 theta
                 = ||c_1||^2 ||c_2||^2 (1 - cos^2 theta)
                 = (c_1 . c_1)(c_2 . c_2) ( 1 - (c_1 . c_2)^2 / ((c_1 . c_1)(c_2 . c_2)) )
                 = (c_1 . c_1)(c_2 . c_2) - (c_1 . c_2)^2
                 = det [ c_1 . c_1   c_1 . c_2 ; c_2 . c_1   c_2 . c_2 ]
                 = det( [c_1^T; c_2^T] [c_1 c_2] ) = det(A^T A),

where "." denotes the dot product. In general, let c_1, ..., c_n be the n column vectors of an m x n (not necessarily square) matrix A. Then one can show (for a proof see Exercise 5.17) that the volume of the n-dimensional parallelepiped P(A) determined by those n column vectors c_i in R^m is

    vol(P(A)) = sqrt( det(A^T A) ).

In particular, if A is an m x m square matrix, then vol(P(A)) = sqrt(det(A^T A)) = sqrt(det(A^T) det(A)) = |det A|, as expected.
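The Gram-determinant formula can be checked in R^3 against the familiar cross-product area, and against |det A| in the square case. The sketch below is ours; the sample vectors and matrix are arbitrary.

```python
import numpy as np

# area of the parallelogram spanned by columns c1, c2 in R^3:
# vol(P(A)) = sqrt(det(A^T A)) should match the cross-product area
c1 = np.array([1.0, 2.0, 2.0])
c2 = np.array([3.0, 0.0, 1.0])
A = np.column_stack([c1, c2])                       # a 3 x 2 matrix

gram_area = np.sqrt(np.linalg.det(A.T @ A))
cross_area = np.linalg.norm(np.cross(c1, c2))
assert abs(gram_area - cross_area) < 1e-12

# for a square matrix the formula collapses to |det A|
M = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 1.0]])
assert abs(np.sqrt(np.linalg.det(M.T @ M)) - abs(np.linalg.det(M))) < 1e-9
```

Here det(A^T A) = 65, so both area computations give sqrt(65).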
Problem 2.17 Show that the area of a triangle ABC in the plane R^2, where A = (x_1, y_1), B = (x_2, y_2), C = (x_3, y_3), is equal to the absolute value of

    (1/2) det [ x_1  y_1  1 ; x_2  y_2  1 ; x_3  y_3  1 ].
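One way to build confidence in the formula of Problem 2.17 before proving it is a quick numerical check; the helper name below is ours.

```python
import numpy as np

def triangle_area(A, B, C):
    # | (1/2) det [[x1, y1, 1], [x2, y2, 1], [x3, y3, 1]] |
    M = np.array([[A[0], A[1], 1.0],
                  [B[0], B[1], 1.0],
                  [C[0], C[1], 1.0]])
    return abs(np.linalg.det(M)) / 2

# a right triangle with legs 3 and 4 has area 6
assert abs(triangle_area((0, 0), (3, 0), (0, 4)) - 6) < 1e-12
```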
2.6 Exercises
2.1. Determine the values of k for which det [~ 2:] = o. 2.2. Evaluate det(A 2 BA -1) and det(B- I A 3) for the following matrices: A = [ - ; -; { ] , B = [~ -~ ; ] . o
1 0
2
I 3
2.3. Evaluate the determinant of
A
=
[=~ -~ -~ =~]. 2 -3 -5
8
2.4. Evaluate det A for an n x n matrix A
I
i
= [aij] when
f
j (2) aij = j J, 2.5. Find all solutions of the equation det(AB) (1) aij = { 0
A=[X;2
. _ .
1-
X~2J.
+
j.
= 0 for
B=[~ x~2l
2.6. Prove that if A is an n x n skew-symmetric matrix and n is odd, then det A = 0. Give an example of a 4 x 4 skew-symmetric matrix A with det A != 0.

2.7. Use the determinant function to find
    (1) the area of the parallelogram with edges determined by (4, 3) and (7, 5),
    (2) the volume of the parallelepiped with edges determined by the vectors (1, 0, 4), (0, -2, 2) and (3, 1, -1).
2.8. Use Cramer 's rule to solve each system . (1) { Xl xl (2)
Xl xl Xl
1
(3)/
Xl Xl Xl
+
X2 x2
+ + +
x2 2X2 3X2 X2
+
x2 X2
= = + + + +
3
-1. X3 X3 X3 X3 x3 X3
=
2 2
-4 .
+
X4
+
X4 X4
= =
-1 3 2
O.
(U: -nx~[ -n
2.9. Use Cramer's rule to solve the given system :
(l)[l ;]x~[n
2J
2.10. Find a constant k so that the system of linear equations

    kx - 2y -        z = 0,
         (k + 1)y + 4z = 0,
              (k - 1)z = 0

has more than one solution. (Is it possible to apply Cramer's rule here?)
2.11. Solve the following system of linear equations by using Cramer's rule and by using Gaussian elimination:
[1 : ~ i]x~[n
2.12 . Solve the following system of equations by using Cramer's rule:
3x
1
+
2y 2z
3x + 3z -
1
= =
=
3z + 8 x -
1 5y
2y.
2.13. Calculate the cofactors A11, A12, A13 and A33 for the matrix A:
(1)A=[~211 ; ; ] , (2)A=[~ ~ ~], (3)A=[-; -~ 312 32
;1]'
2.14. Let A be the n x n matrix whose entries are all 1. Show that
    (1) det(A - nI_n) = 0,
    (2) (A - nI_n)_ij = (-1)^{n-1} n^{n-2} for all i, j, where (A - nI_n)_ij denotes the cofactor of the (i, j)-entry of A - nI_n.
2.15. Show that if A is symmetric, so is adj A. Moreover, if A is invertible, then the inverse of A is also symmetric.

2.16. Use the adjugate formula to compute the inverses of the following matrices:
A=[-~ ~ 4
1
;], -1
B = [ cos theta  0  -sin theta ; 0  1  0 ; sin theta  0  cos theta ].
2.17. Compute adj A, det A, det(adj A), and A^{-1}, and verify that A . adj A = (det A)I, for
(1) A = [-;
~ ~], (2) A = [~1 ~5 ~] . 7
3 -2 1
2.18. Let A, B be invertible matrices. Show that adj(AB) = adj B adj A. (The reader may also try to prove this equality for noninvertible matrices.)

2.19. For an m x n matrix A and an n x m matrix B, show that

    det [ O  A ; -B  I_n ] = det(AB).
2.20. Find the area of the triangle with vertices at (0, 0), (1, 3) and (3, 1) in R^2.
2.21. Find the area of the triangle with vertices at (0, 0, 0), (1, 1, 2) and (2, 2, 1) in R^3.

2.22. For A, B, C, D in M_{n x n}(R), show that det [ A  B ; O  D ] = det A det D. But, in general,

    det [ A  B ; C  D ] != det A det D - det B det C.
2.23. Determine whether or not the following statements are true in general, and justify your answers.
(1) For any square matrices A and B of the same size, det(A + B) = det A + det B.
(2) For any square matrices A and B of the same size, det(AB) = det(BA).
(3) If A is an n x n square matrix, then for any scalar c, det(cI_n - A) = c^n - det A.
(4) If A is an n x n square matrix, then for any scalar c, det(cI_n - A^T) = det(cI_n - A).
(5) If E is an elementary matrix, then det E = +/- 1.
(6) There is no matrix A of order 3 such that A^2 = -I_3.
(7) Let A be a nilpotent matrix, i.e., A^k = 0 for some natural number k. Then det A = 0.
(8) det(kA) = k det A for any square matrix A.
(9) The multilinearity holds in any two rows at the same time:

    det [ a+u  b+v  c+w ; d+x  e+y  f+z ; l  m  n ] = det [ a  b  c ; d  e  f ; l  m  n ] + det [ u  v  w ; x  y  z ; l  m  n ].

(10) Any system Ax = b has a solution if and only if det A != 0.
(11) For any n x 1, n >= 2, column vectors u and v, det(uv^T) = 0.
(12) If A is a square matrix with det A = 1, then adj(adj A) = A.
(13) If the entries of A are all integers and det A = 1 or -1, then the entries of A^{-1} are also integers.
(14) If the entries of A are 0's or 1's, then det A = 1, 0, or -1.
(15) Every system of n linear equations in n unknowns can be solved by Cramer's rule.
(16) If A is a permutation matrix, then A^T = A.
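Several of these statements can be explored numerically before attempting a proof or a counterexample. The sketch below is ours; it spot-checks statements (2), (8) and (11) on random matrices (the seed and sizes are arbitrary), and of course a numerical experiment only suggests, never proves, the general claim.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# (2) det(AB) = det(BA) holds, since both equal det(A) det(B)
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(B @ A))

# (8) det(kA) = k det A fails in general: in fact det(kA) = k^n det A
k, n = 2.0, 3
assert np.isclose(np.linalg.det(k * A), k ** n * np.linalg.det(A))

# (11) det(u v^T) = 0 for n >= 2, because u v^T has rank at most 1
u = rng.standard_normal((3, 1))
v = rng.standard_normal((3, 1))
assert abs(np.linalg.det(u @ v.T)) < 1e-12
```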
3 Vector Spaces
3.1 The n-space R^n and vector spaces

We have seen that Gauss-Jordan elimination is the most basic technique for solving a system Ax = b of linear equations, and that it can be written in matrix notation as an LDU factorization. Moreover, the questions of the existence or the uniqueness of the solution are much easier to answer after Gauss-Jordan elimination. In particular, if det A != 0, then x = 0 is the unique solution of Ax = 0. In general, the set of solutions of Ax = 0 has a kind of mathematical structure, called a vector space, and with this concept one can characterize the uniqueness of the solution of a system Ax = b of linear equations in a more systematic way. In this chapter, we introduce the notion of a vector space, which is an abstraction of the usual algebraic structure of the 3-space R^3, and then extend our study of systems of linear equations to this framework.

Many physical quantities, such as length, area, mass and temperature, are described by real numbers as magnitudes. Other physical quantities, like force or velocity, have directions as well as magnitudes. Such quantities with direction are called vectors, while the numbers are called scalars. For instance, a vector (or a point) x in the 3-space R^3 is usually represented as a triple of real numbers:

    x = (x_1, x_2, x_3),

where x_i in R, i = 1, 2, 3, are called the coordinates of x. This expression provides a rectangular coordinate system in a natural way. On the other hand, pictorially such a point in the 3-space R^3 can also be represented by an arrow from the origin to x. In this way, a point in the 3-space R^3 can be understood as a vector. The direction of the arrow specifies the direction of the vector, and the length of the arrow describes its magnitude. In order to have a more general definition of vectors, we extract the most basic properties of those arrows in R^3. Note that for all vectors (or points) in R^3, there are two algebraic operations: the sum of two vectors and the scalar multiplication of a vector
by a scalar. That is, for two vectors x = (x_1, x_2, x_3), y = (y_1, y_2, y_3) in R^3 and a scalar k, we define

    x + y = (x_1 + y_1, x_2 + y_2, x_3 + y_3),
    kx = (kx_1, kx_2, kx_3).

Then a vector x = (x_1, x_2, x_3) in R^3 may be written as

    x = x_1 i + x_2 j + x_3 k,

where i = (1, 0, 0), j = (0, 1, 0) and k = (0, 0, 1), which were introduced as the rectangular coordinate system in vector calculus. The sum of vectors and the scalar multiplication of vectors in the 3-space R^3 are illustrated in Figure 3.1.
Figure 3.1. A vector sum and a scalar multiplication
Even though our geometric visualization of vectors does not go beyond the 3-space R^3, it is possible to extend these algebraic operations of vectors in the 3-space R^3 to the n-space R^n for any positive integer n. The n-space R^n is defined to be the set of all ordered n-tuples (a_1, a_2, ..., a_n) of real numbers, called vectors:

    R^n = { (a_1, a_2, ..., a_n) : a_i in R, i = 1, 2, ..., n }.

For any two vectors x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) in the n-space R^n, and a scalar k, the sum x + y and the scalar multiplication kx are the vectors in R^n defined by

    x + y = (x_1 + y_1, x_2 + y_2, ..., x_n + y_n),
    kx = (kx_1, kx_2, ..., kx_n).
It is easy to verify the following arithmetical rules of the operations:
Theorem 3.1 For any scalars k and l, and any vectors x = (x_1, x_2, ..., x_n), y = (y_1, y_2, ..., y_n), and z = (z_1, z_2, ..., z_n) in the n-space R^n, the following rules hold:
(1) x + y = y + x,
(2) x + (y + z) = (x + y) + z,
(3) x + 0 = x = 0 + x,
(4) x + (-1)x = 0,
(5) k(x + y) = kx + ky,
(6) (k + l)x = kx + lx,
(7) k(lx) = (kl)x,
(8) 1x = x,

where 0 = (0, 0, ..., 0) is the zero vector.
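Under the usual identification of vectors in R^n with arrays of numbers, a few of these rules can be illustrated directly. The sketch below is ours; the sample vectors and scalars are arbitrary, and floating-point equality happens to be exact here because every operation is componentwise.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0, 0.5])
y = np.array([4.0, 0.0, -1.0, 2.0])
k, l = 2.0, -3.0

# componentwise sum and scalar multiple, as defined for R^n
assert np.array_equal(x + y, np.array([5.0, -2.0, 2.0, 2.5]))

# rules (1), (5) and (6) of Theorem 3.1
assert np.array_equal(x + y, y + x)
assert np.array_equal(k * (x + y), k * x + k * y)
assert np.array_equal((k + l) * x, k * x + l * x)
```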
We usually identify a vector (a_1, a_2, ..., a_n) in the n-space R^n with the n x 1 column matrix whose entries are a_1, a_2, ..., a_n.
Sometimes a vector in R^n is also identified with a 1 x n row vector (see Section 3.5). Then the two operations of matrix sum and scalar multiplication of column matrices coincide with those of vectors in R^n, and Theorem 3.1 rephrases Theorem 1.3. These rules of arithmetic of vectors are the most important ones because they are the only rules that we need to manipulate vectors in the n-space R^n. Hence, an (abstract) vector space can be defined with respect to these rules of operations of vectors in the n-space R^n, so that R^n itself becomes a vector space. In general, a vector space is defined to be a set with two operations, an addition and a scalar multiplication, which satisfy the rules (1)-(8) in Theorem 3.1.

Definition 3.1 A (real) vector space is a nonempty set V of elements, called vectors, with two algebraic operations that satisfy the following rules.
(A) There is an operation called vector addition that associates to every pair x and y of vectors in V a unique vector x + y in V, called the sum of x and y, so that the following rules hold for all vectors x, y, z in V:
(1) x + y = y + x (commutativity of addition),
(2) x + (y + z) = (x + y) + z (= x + y + z) (associativity of addition),
(3) there is a unique vector 0 in V such that x + 0 = x = 0 + x for all x in V (it is called the zero vector),
(4) for any x in V, there is a vector -x in V, called the negative of x, such that x + (-x) = (-x) + x = 0.
(B) There is an operation called scalar multiplication that associates to each vector x in V and each scalar k a unique vector kx in V so that the following rules hold for all vectors x, y in V and all scalars k, l:
(5) k(x + y) = kx + ky (distributivity with respect to vector addition),
(6) (k + l)x = kx + lx (distributivity with respect to scalar addition),
(7) k(lx) = (kl)x (associativity of scalar multiplication),
(8) 1x = x.

Clearly, the n-space R^n is a vector space by Theorem 3.1. A complex vector space is obtained if, instead of real numbers, we take complex numbers for scalars. For example, the set C^n of all ordered n-tuples of complex numbers is a complex vector space. In Chapter 7 we shall discuss complex vector spaces, but until then we will discuss only real vector spaces unless otherwise stated.
Example 3.1 (Miscellaneous examples of vector spaces)
(1) For any two positive integers m and n, the set M_{m x n}(R) of all m x n matrices forms a vector space under the matrix sum and the scalar multiplication defined in Section 1.3. The zero vector in this space is the zero matrix O_{m x n}, and -A is the negative of a matrix A.
(2) Let A be an m x n matrix. Then it is easy to show that the set of solutions of the homogeneous system Ax = 0 is a vector space (under the sum and the scalar multiplication of matrices).
(3) Let C(R) denote the set of real-valued continuous functions defined on the real line R. For two functions f and g, and a real number k, the sum f + g and the scalar multiple kf are defined by

    (f + g)(x) = f(x) + g(x),    (kf)(x) = k f(x).

Then the set C(R) is a vector space under these operations. The zero vector in this space is the constant function whose value at each point is zero.
(4) Let S(R) denote the set of real-valued functions defined on the set of integers. A function f in S(R) can be written as a doubly infinite sequence of real numbers

    ..., x_{-2}, x_{-1}, x_0, x_1, x_2, ...,

where x_k = f(k) for each k. Such sequences appear frequently in engineering, where they are called discrete or digital signals. One can define the sum of two functions and the scalar multiplication of a function by a scalar just as in C(R) in (3), so that S(R) becomes a vector space. □

Theorem 3.2 Let V be a vector space and let x, y be vectors in V. Then
(1) x + y = y implies x = 0,
(2) 0x = 0,
(3) k0 = 0 for any k in R,
(4) -x is unique and -x = (-1)x,
(5) if kx = 0, then k = 0 or x = 0.

Proof: (1) By adding -y to both sides of x + y = y, we have

    x = x + 0 = x + y + (-y) = y + (-y) = 0.
(2) 0x = (0 + 0)x = 0x + 0x implies 0x = 0 by (1).
(3) This is an easy exercise.
(4) The uniqueness of the negative -x of x can be shown by a simple modification of Lemma 1.7. In fact, if x' is another negative of x such that x + x' = 0, then

    -x = -x + 0 = -x + (x + x') = (-x + x) + x' = 0 + x' = x'.

On the other hand, the equation

    x + (-1)x = 1x + (-1)x = (1 - 1)x = 0x = 0

shows that (-1)x is another negative of x, and hence -x = (-1)x by the uniqueness of -x.
(5) Suppose kx = 0 and k != 0. Then x = 1x = (1/k)(kx) = (1/k)0 = 0. □
Problem 3.1 Let V be the set of all pairs (x, y) of real numbers. Suppose that an addition and a scalar multiplication of pairs are defined by

    (x, y) + (u, v) = (x + 2u, y + 2v),    k(x, y) = (kx, ky).

Is the set V a vector space under these operations? Justify your answer.
3.2 Subspaces

Definition 3.2 A subset W of a vector space V is called a subspace of V if W itself is a vector space under the addition and the scalar multiplication defined in V.

In order to show that a subset W is a subspace of a vector space V, it is not necessary to verify all the arithmetic rules of the definition of a vector space. One only needs to check whether the subset is closed under the vector addition and scalar multiplication of V. This is because the remaining rules, being satisfied in the larger space, hold automatically in every subset closed under the operations.

Theorem 3.3 A nonempty subset W of a vector space V is a subspace if and only if x + y and kx are contained in W (or, equivalently, x + ky in W) for any vectors x and y in W and any scalar k in R.

Proof: We need only prove the sufficiency. Assume both conditions hold and let x be any vector in W. Since W is closed under scalar multiplication, 0 = 0x and -x = (-1)x are in W, so rules (3) and (4) for a vector space hold. All the other rules for a vector space are clear. □

A vector space V itself and the zero subspace {0} are trivially subspaces. Some nontrivial subspaces are given in the following examples.
Example 3.2 (Which planes in R^3 can be a subspace?) Let

    W = { (x, y, z) in R^3 : ax + by + cz = 0 },

where a, b, c are constants. If x = (x_1, x_2, x_3), y = (y_1, y_2, y_3) are points in W, then clearly x + y = (x_1 + y_1, x_2 + y_2, x_3 + y_3) is also a point in W, because it satisfies the equation in W. Similarly, kx also lies in W for any scalar k. Hence, W is a subspace of R^3; it is a plane passing through the origin in R^3. □

Example 3.3 (The solutions of Ax = 0 form a subspace) Let A be an m x n matrix. Then, as shown in Example 3.1(2), the set

    W = { x in R^n : Ax = 0 }

of solutions of the homogeneous system Ax = 0 is a vector space. Moreover, since the operations in W and in R^n coincide, W is a subspace of R^n. □

Example 3.4 For a nonnegative integer n, let P_n(R) denote the set of all real polynomials in x of degree <= n. Then P_n(R) is a subspace of the vector space C(R) of all continuous functions on R. □

Example 3.5 (The space of symmetric or skew-symmetric matrices) Let W be the set of all n x n real symmetric matrices. Then W is a subspace of the vector space M_{n x n}(R) of all n x n matrices, because the sum of two symmetric matrices is symmetric and a scalar multiple of a symmetric matrix is also symmetric. Similarly, the set of all n x n skew-symmetric matrices is also a subspace of M_{n x n}(R). □

Problem 3.2 Which of the following sets are subspaces of the 3-space R^3? Justify your answer.
(1) W = { (x, y, z) in R^3 : xyz = 0 },
(2) W = { (2t, 3t, 4t) in R^3 : t in R },
(3) W = { (x, y, z) in R^3 : x^2 + y^2 - z^2 = 0 },
(4) W = { x in R^3 : x^T u = 0 = x^T v }, where u and v are any two fixed nonzero vectors in R^3.
Can you describe all subspaces of the 3-space R^3?

Problem 3.3 Let V = C(R) be the vector space of all continuous functions on R. Which of the following sets W are subspaces of V? Justify your answer.
(1) W is the set of all differentiable functions on R.
(2) W is the set of all bounded continuous functions on R.
(3) W is the set of all continuous nonnegative-valued functions on R, i.e., f(x) >= 0 for any x in R.
(4) W is the set of all continuous odd functions on R, i.e., f(-x) = -f(x) for any x in R.
(5) W is the set of all polynomials with integer coefficients.
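The closure test of Theorem 3.3, applied to the solution set of a homogeneous system as in Example 3.3, can be sketched numerically. The matrix and the two particular solutions below are arbitrary choices of ours.

```python
import numpy as np

A = np.array([[1.0, 2.0, -1.0],
              [2.0, 4.0, -2.0]])      # rank 1, so Ax = 0 has nontrivial solutions

x = np.array([2.0, 0.0, 2.0])         # satisfies A x = 0
y = np.array([-2.0, 1.0, 0.0])        # satisfies A y = 0
assert np.allclose(A @ x, 0) and np.allclose(A @ y, 0)

# closure under addition and scalar multiplication: x + k*y also solves Ax = 0
k = 5.0
assert np.allclose(A @ (x + k * y), 0)
```

Of course the computation only confirms closure for these particular vectors; the general statement is exactly Example 3.3.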
Definition 3.3 Let U and W be two subspaces of a vector space V.
(1) The sum of U and W is defined by

    U + W = { u + w : u in U, w in W }.

(2) A vector space V is called the direct sum of two subspaces U and W, written as V = U ⊕ W, if V = U + W and U ∩ W = {0}.

It is easy to see that U + W and U ∩ W are also subspaces of V. If V = R^2 (the xy-plane), U = { xi : x in R } (the x-axis), and W = { yj : y in R } (the y-axis), then it is easy to see that R^2 = U ⊕ W = R ⊕ R, by considering the x-axis (and also the y-axis) as R. Similarly, one can easily be convinced that R^3 = R^2 ⊕ R^1 = R^1 ⊕ R^1 ⊕ R^1.

Problem 3.4 Let U and W be subspaces of a vector space V.
(1) Suppose that Z is a subspace of V contained in both U and W. Show that Z is also contained in U ∩ W.
(2) Suppose that Z is a subspace of V containing both U and W. Show that Z also contains U + W as a subspace.
Theorem 3.4 A vector space V is the direct sum of subspaces U and W, i.e., V = U ⊕ W, if and only if for any v in V there exist unique u in U and w in W such that v = u + w.

Proof: (⇒) Suppose that V = U ⊕ W. Then, for any v in V, there exist vectors u in U and w in W such that v = u + w, since V = U + W. To show the uniqueness, suppose that v is also expressed as a sum u' + w' for u' in U and w' in W. Then u + w = u' + w' implies

    u - u' = w' - w in U ∩ W = {0}.

Hence, u = u' and w = w'.
(⇐) Clearly, V = U + W. Suppose that there exists a nonzero vector v in U ∩ W. Then v can be written as a sum of vectors in U and W in many different ways:

    v = v + 0 = 0 + v = (1/2)v + (1/2)v = (1/3)v + (2/3)v in U + W,

contradicting the uniqueness. Thus U ∩ W = {0}. □
Example 3.6 (Sum, but not direct sum) In the 3-space R^3, consider the three vectors e_1 = (1, 0, 0), e_2 = (0, 1, 0) and e_3 = (0, 0, 1). These three vectors are also well known as i, j and k, respectively. Let U = { ai + ck : a, c in R } be the xz-plane, and let W = { bj + ck : b, c in R } be the yz-plane, which are both subspaces of R^3. Then a vector in U + W is of the form

    (ai + c_1 k) + (bj + c_2 k) = ai + bj + (c_1 + c_2)k = (a, b, c),

where c = c_1 + c_2, and a, b, c can be arbitrary numbers. Thus U + W = R^3. However, R^3 != U ⊕ W since clearly k in U ∩ W != {0}. In fact, the vector k in R^3 can be written as many linear combinations of vectors in U and W:

    k = (1/2)k + (1/2)k = (1/3)k + (2/3)k in U + W.

Note that if we had taken W = { yj : y in R } to be the y-axis, then it would be easy to see that R^3 = U ⊕ W. Note also that there are many choices for W such that R^3 = U ⊕ W. □
Problem 3.5 Let U and W be the subspaces of the vector space M_{n x n}(R) consisting of all symmetric matrices and of all skew-symmetric matrices, respectively. Show that M_{n x n}(R) = U ⊕ W. Therefore, the decomposition of a square matrix A given in (3) of Problem 1.11 is unique.
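A numerical sketch of the decomposition behind Problem 3.5 (ours, with an arbitrary random matrix): every square matrix splits as the sum of its symmetric part (A + A^T)/2 and its skew-symmetric part (A - A^T)/2.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

S = (A + A.T) / 2          # symmetric part, lies in U
K = (A - A.T) / 2          # skew-symmetric part, lies in W

assert np.allclose(S, S.T)
assert np.allclose(K, -K.T)
assert np.allclose(S + K, A)
# the only matrix that is both symmetric and skew-symmetric is 0,
# which is why the decomposition is unique (U ∩ W = {0})
```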
3.3 Bases

As we know, a vector in the 3-space R^3 is of the form (x_1, x_2, x_3), and it can be written as

    (x_1, x_2, x_3) = x_1 (1, 0, 0) + x_2 (0, 1, 0) + x_3 (0, 0, 1).

That is, any vector in R^3 can be expressed as a sum of scalar multiples of the three vectors e_1 = (1, 0, 0), e_2 = (0, 1, 0) and e_3 = (0, 0, 1). The following definition gives a name to such an expression.

Definition 3.4 Let V be a vector space and let {x_1, x_2, ..., x_m} be a set of vectors in V. Then a vector y in V of the form

    y = a_1 x_1 + a_2 x_2 + ... + a_m x_m,
where a_1, a_2, ..., a_m are scalars, is called a linear combination of the vectors x_1, x_2, ..., x_m.

The next theorem shows that the set of all linear combinations of a finite set of vectors in a vector space forms a subspace.

Theorem 3.5 Let x_1, x_2, ..., x_m be vectors in a vector space V. Then the set

    W = { a_1 x_1 + a_2 x_2 + ... + a_m x_m : a_i in R }

of all linear combinations of x_1, x_2, ..., x_m is a subspace of V. It is called the subspace of V spanned by x_1, x_2, ..., x_m; we also say that x_1, x_2, ..., x_m span the subspace W.

Proof: It is necessary to show that W is closed under vector addition and scalar multiplication. Let u and w be any two vectors in W. Then

    u = a_1 x_1 + a_2 x_2 + ... + a_m x_m,
    w = b_1 x_1 + b_2 x_2 + ... + b_m x_m

for some scalars a_i's and b_i's. Therefore,

    u + w = (a_1 + b_1)x_1 + (a_2 + b_2)x_2 + ... + (a_m + b_m)x_m,

and, for any scalar k,

    ku = (k a_1)x_1 + (k a_2)x_2 + ... + (k a_m)x_m.
Thus, u + w and ku are linear combinations of x_1,
X2, . .. , Xm
and consequently
0
Example 3.7 (A space can be spanned by many different sets)
(1) For a nonzero vector v in a vector space V, a linear combination of v is simply a scalar multiple of v. Thus the subspace W of V spanned by v is W = { kv : k in R }. Note that this subspace W can be spanned by any kv with k != 0.
(2) Consider the three vectors e_1 = (1, 0, 0), e_2 = (0, 1, 0) and v = e_1 + e_2 = (1, 1, 0) in R^3. The subspace W_1 spanned by e_1 and e_2 is

    W_1 = { a_1 e_1 + a_2 e_2 = (a_1, a_2, 0) : a_i in R },

and the subspace W_2 spanned by e_1, e_2 and v is

    W_2 = { a_1 e_1 + a_2 e_2 + a_3 v = (a_1 + a_3, a_2 + a_3, 0) : a_i in R }.

Clearly, W_1 ⊆ W_2. On the other hand, since v = e_1 + e_2 in W_1, we also have W_2 ⊆ W_1. Thus W_1 = W_2, which is the xy-plane in R^3. In general, a subspace of a vector space can have many different spanning sets. □
Example 3.8 (For any m <= n, R^m is a subspace of R^n) Let

    e_1 = (1, 0, 0, ..., 0),  e_2 = (0, 1, 0, ..., 0),  ...,  e_n = (0, 0, 0, ..., 1)

be the n vectors in the n-space R^n (n >= 3). Then a linear combination of e_1, e_2, e_3 is of the form

    a_1 e_1 + a_2 e_2 + a_3 e_3 = (a_1, a_2, a_3, 0, ..., 0).

Hence, the set

    W = { (a_1, a_2, a_3, 0, ..., 0) : a_i in R }

is the subspace of the n-space R^n spanned by the vectors e_1, e_2, e_3. Note that the subspace W can be identified with the 3-space R^3 through the identification

    (a_1, a_2, a_3, 0, ..., 0) = (a_1, a_2, a_3)

with a_i in R. In general, for m <= n, the m-space R^m can be identified with a subspace of the n-space R^n. □
Example 3.9 (All Ax's form the column space) Let A = [c_1 c_2 ... c_n] be an m x n matrix with columns c_i. Then the column vectors c_i are in R^m, and the matrix product Ax is the linear combination of the column vectors of A whose coefficients are the components of x in R^n, i.e., Ax = x_1 c_1 + x_2 c_2 + ... + x_n c_n (see Example 1.9). Therefore, the set W = { Ax in R^m : x in R^n } of all linear combinations of the column vectors of A is a subspace of R^m, called the column space of A. Consequently, Ax = b has a solution (x_1, x_2, ..., x_n) in R^n if and only if the vector b belongs to the column space of A. □
Remark: One can take another point of view on the equation Ax = b. To any vector x in R^n, A assigns the vector b = Ax in R^m. That is, A can be considered as a function from R^n into R^m, and the column space of A is nothing but the image of this function.
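The observation of Example 3.9, that Ax is exactly the linear combination of the columns of A with the entries of x as coefficients, can be verified directly. The sketch below is ours (arbitrary matrix); the rank comparison at the end is one standard numerical way to test column-space membership, not the book's method.

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])
x = np.array([3.0, -1.0, 2.0])

# Ax equals x_1*c_1 + x_2*c_2 + x_3*c_3, the combination of A's columns
combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))
assert np.allclose(A @ x, combo)

# Ax = b is solvable exactly when b lies in the column space of A,
# i.e. appending b to A does not increase the rank
b = A @ x
assert np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)
```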
, Xm be vectors in a vector space V and let W be the sub, Xm. Show that W is the smallest subspace of V containing space spanned by xj , x2, Xl, X2, .. . , X m. Inotherwords,ifU is a subspace of V containing xl, X2, ... , xm,then W £; U.
As we saw in Theorem 3.5 and Example 3.7, any nonempty subset of a vector space V spans a subspace through the linear combinations of the vectors, and two different subsets may span the same subspace. This means that a vector can be written as linear combinations in various ways. However, for some sets of vectors in a vector space V , any vector in V can be expressed uniquely as a linear combination of the set. Such a set of vectors is called a basis for V. In the following we will make this notion clear and show how to find such a basis . Definition 3.5 A set of vectors {Xl, X2, .. " xm} in a vector space V is said to be linearly independent if the vector equation, called the linear dependence of Xi 'S,
has only the trivial solution Cl linearly dependent.
= C2 = ... = Cm = 0. Otherwise,
it is said to be
By definition, a set of vectors {Xl, X2, . . . , xm} is linearly dependent if and only if the linear dependence CIXI
+
C2X2
+ . .. +
CmXm = 0
has a nontrivial solution (cj , C2, . . . , cm). For example , if Cm =1= 0, the equation can be rewritten as ci C2 Cm-l Xm = --Xl - -X2 - ... - --Xm-l. Cm Cm Cm That is, a set ofvectors is linearly dependent if and only if at least one of the vectors in the set can be written as a linear combination of the others. It means that at least
one of the vectors can be expressed as a linear combination of the set in two different ways.

Example 3.10 (Linear independence in R^3) Let x = (1, 2, 3) and y = (3, 2, 1) be two vectors in the 3-space R^3. Then clearly y != Ax for any A in R (or, ax + by = 0 is possible only when a = b = 0). This means that {x, y} is linearly independent in R^3. If w = (3, 6, 9), then {x, w} is linearly dependent since w - 3x = 0.
In general, if x, y are non-collinear vectors in the 3-space R^3, the set of all linear combinations of x and y determines a plane W through the origin in R^3, i.e., W = { ax + by : a, b in R }. Let z be another nonzero vector in the 3-space R^3. If z in W, then there are some scalars a, b in R, not both zero, such that z = ax + by; that is, the set {x, y, z} is linearly dependent. If z is not in W, then ax + by + cz = 0 is possible only when a = b = c = 0 (prove it). Therefore, the set {x, y, z} is linearly independent if and only if z does not lie in W. □

By abuse of language, it is sometimes convenient to say that "the vectors x_1, x_2, ..., x_m are linearly independent," although this is really a property of a set.

Example 3.11 The columns of the matrix

    A = [ 1  -2  -1  0 ; 4  2  6  8 ; 2  -1  1  3 ]

are linearly dependent in the 3-space R^3, since the third column is the sum of the first two. □

As shown in Example 3.11, the concept of linear dependence can be applied to the row or column vectors of any matrix.

Example 3.12 Consider an upper triangular matrix
    A = [ 2  3  5 ; 0  1  6 ; 0  0  4 ].

The linear dependence of the column vectors of A may be written as

    c_1 (2, 0, 0) + c_2 (3, 1, 0) + c_3 (5, 6, 4) = (0, 0, 0),

which, in matrix notation, is the homogeneous system Ac = 0 with c = (c_1, c_2, c_3). From the third row, c_3 = 0; from the second row, c_2 = 0; and substituting these into the first row forces c_1 = 0. That is, the homogeneous system has only the trivial solution, so the column vectors are linearly independent. □
The following theorem can be proved by the same argument. Theorem 3.6 The nonzero rows of a matrix in row-echelon form are linearly independent, and so are the columns that contain leading I's. In particular, the rows of any triangular matrix with nonzero diagonals are linearly independent, and so are the columns. If V = lRm and VI, V2, . .. , Vn are n vectors in lRm , then they form an m x n matrix A = [VI v2 .. . vnl . On the other hand, Example 3.9 shows that the linear dependence ct VI + C2V2 + . .. + Cn Vn = 0 of Vi 'S is nothing but the homogeneous equation Ax = 0, where x = (CI , c2, . .. , cn ) . Thus, one can say
(1) the column vectors vi of A are linearly independent in R^m if and only if the homogeneous system Ax = 0 has only the trivial solution, and (2) they are linearly dependent if and only if Ax = 0 has a nontrivial solution.
If U is the reduced row-echelon form of A, then we know that Ax = 0 and Ux = 0 have the same set of solutions. Moreover, a homogeneous system Ax = 0 with more unknowns than equations always has a nontrivial solution, by Theorem 1.2. This proves the following lemma.

Lemma 3.7 (1) If n > m, any set of n vectors in the m-space R^m is linearly dependent.
(2) If U is the reduced row-echelon form of A, then the columns of U are linearly independent if and only if the columns of A are linearly independent.

Example 3.13 Consider the vectors e1 = (1, 0, 0), e2 = (0, 1, 0) and e3 = (0, 0, 1) in the 3-space R^3. The matrix A = [e1 e2 e3] is the identity matrix, and so Ax = 0 has only the trivial solution. Thus, the set of vectors {e1, e2, e3} is linearly independent and also spans R^3. □

Example 3.14 (The standard basis for R^n) The vectors e1, e2, ..., en in R^n are clearly linearly independent (see Theorem 3.6). Moreover, they span the n-space R^n: in fact, a vector x = (x1, x2, ..., xn) ∈ R^n is a linear combination of the ei's:

    x = x1e1 + x2e2 + ... + xnen.
□

Definition 3.6 Let V be a vector space. A basis for V is a set of linearly independent vectors that spans V.

For example, as in Example 3.14, the set {e1, e2, ..., en} forms a basis, called the standard basis for the n-space R^n. Of course, there are many other bases for R^n.

Example 3.15 (A basis or not) (1) The set of vectors (1, 1, 0), (0, −1, 1), and (1, 0, 1) is not a basis for the 3-space R^3, since this set is linearly dependent (the third is the sum of the first two vectors) and cannot span R^3. (The vector (1, 0, 0) cannot be obtained as a linear combination of them (prove it).) This set does not have enough vectors to span R^3.
(2) The set of vectors (1, 0, 0), (0, 1, 1), (1, 0, 1) and (0, 1, 0) is not a basis either, since these vectors are not linearly independent (the sum of the first two minus the third makes the fourth), even though they span R^3. This set has some redundant vectors for spanning R^3.
(3) The set of vectors (1, 1, 1), (0, 1, 1), and (0, 0, 1) is linearly independent and also spans R^3. That is, it is a basis for R^3, different from the standard basis. This set has the proper number of vectors to span R^3, since the set cannot be reduced to a smaller spanning set, nor does it need any additional vector to span R^3. □

By definition, in order to show that a set of vectors in a vector space is a basis, one needs to show two things: it is linearly independent, and it spans the whole space. The following theorem shows that a basis for a vector space plays the role of a coordinate system, just like the rectangular coordinate system given by the standard basis for R^n.

Theorem 3.8 Let α = {v1, v2, ..., vn} be a basis for a vector space V. Then each vector x in V can be uniquely expressed as a linear combination of v1, v2, ..., vn; i.e., there are unique scalars ai, i = 1, 2, ..., n, such that

    x = a1v1 + a2v2 + ... + anvn.
Proof: If x is also expressed as x = b1v1 + b2v2 + ... + bnvn, then we have

    0 = (a1 − b1)v1 + (a2 − b2)v2 + ... + (an − bn)vn.

By the linear independence of the vi's, ai = bi for all i = 1, 2, ..., n. □
Example 3.16 (Two different bases for R^3) Let α = {e1, e2, e3} be the standard basis for R^3, and let β = {v1, v2, v3} with v1 = (1, 1, 1) = e1 + e2 + e3, v2 = (0, 1, 1) = e2 + e3, v3 = (0, 0, 1) = e3. Then β is also a basis for R^3 (see Example 3.15(3)). For any x = (x1, x2, x3) ∈ R^3, one can easily verify that

    x = x1e1 + x2e2 + x3e3 = x1v1 + (x2 − x1)v2 + (x3 − x2)v3. □

Problem 3.7 Show that the vectors v1 = (1, 2, 1), v2 = (2, 9, 0) and v3 = (3, 3, 4) in the 3-space R^3 form a basis.
Problem 3.8 Show that the set {1, x, x^2, ..., x^n} is a basis for P_n(R), the vector space of all polynomials of degree ≤ n with real coefficients.

Problem 3.9 Let x_k denote the vector in R^n whose first k − 1 coordinates are zero and whose last n − k + 1 coordinates are 1. Show that the set {x_1, x_2, ..., x_n} is a basis for R^n.
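The coordinate formula of Example 3.16 can be checked mechanically. A small sketch (function and variable names are ours, not the book's): it computes the coordinates of a vector relative to the basis β and then reconstructs the vector from them.

```python
def coords_in_beta(x):
    """Coordinates of x = (x1, x2, x3) relative to the basis
    beta = {(1,1,1), (0,1,1), (0,0,1)} of Example 3.16."""
    x1, x2, x3 = x
    return (x1, x2 - x1, x3 - x2)

v = [(1, 1, 1), (0, 1, 1), (0, 0, 1)]   # the basis beta
x = (4, -1, 7)
a = coords_in_beta(x)
# Reconstruct x as a1*v1 + a2*v2 + a3*v3.
recon = tuple(sum(a[k] * v[k][i] for k in range(3)) for i in range(3))
print(a, recon)   # (4, -5, 8) (4, -1, 7)
```

The reconstruction returning the original vector is exactly the uniqueness statement of Theorem 3.8 in action.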
3.4 Dimensions

We often say that the line R^1 is one-dimensional, the plane R^2 is two-dimensional, and the space R^3 is three-dimensional. This is mostly due to the fact that the freedom in choosing coordinates for each element of the space is 1, 2 or 3, respectively. This means that the concept of dimension is closely related to the concept of bases. Note that for a vector space in general there is no unique way of choosing a basis. However, there is something common to all bases, and this is related to the notion of dimension. We first need the following lemma, from which one can define the dimension of a vector space.
Lemma 3.9 Let V be a vector space and let α = {x1, x2, ..., xm} be a set of m vectors in V.

(1) If α spans V, then every set of vectors with more than m vectors cannot be linearly independent.
(2) If α is linearly independent, then any set of vectors with fewer than m vectors cannot span V.

Proof: Since (2) follows from (1) directly, we prove only (1). Let β = {y1, y2, ..., yn} be a set of n vectors in V with n > m. We will show that β is linearly dependent. Indeed, since each vector yj is a linear combination of the vectors in the spanning set α, i.e., for j = 1, 2, ..., n,

    yj = a1j x1 + a2j x2 + ... + amj xm = Σ_{i=1}^{m} aij xi,

we have

    c1y1 + c2y2 + ... + cnyn
      = c1(a11 x1 + a21 x2 + ... + am1 xm) + c2(a12 x1 + a22 x2 + ... + am2 xm)
        + ... + cn(a1n x1 + a2n x2 + ... + amn xm)
      = (a11 c1 + a12 c2 + ... + a1n cn) x1 + (a21 c1 + a22 c2 + ... + a2n cn) x2
        + ... + (am1 c1 + am2 c2 + ... + amn cn) xm.

Thus, β is linearly dependent if and only if the equation

    c1y1 + c2y2 + ... + cnyn = 0

has a nontrivial solution (c1, c2, ..., cn) ≠ (0, 0, ..., 0). This is true if all the coefficients of the xi's are zero but not all of the ci's are zero. It means that the homogeneous system of linear equations in the ci's

    a11 c1 + a12 c2 + ... + a1n cn = 0
    a21 c1 + a22 c2 + ... + a2n cn = 0
      ...
    am1 c1 + am2 c2 + ... + amn cn = 0
must have a nontrivial solution. But this is guaranteed by Lemma 3.7, since m < n. □

It is clear by Lemma 3.9 that if a set α = {x1, x2, ..., xn} of n vectors is a basis for a vector space V, then no other set β = {y1, y2, ..., yr} of r vectors can be a basis for V if r ≠ n. This means that all bases for a vector space V have the same number of vectors, even though there are many different bases for a vector space. Therefore, we obtain the following important result:

Theorem 3.10 If a basis for a vector space V consists of n vectors, so does every other basis.

Definition 3.7 The dimension of a vector space V is the number, say n, of vectors in a basis for V, denoted by dim V = n. When V has a basis of a finite number of vectors, V is said to be finite dimensional.

Example 3.17 (Computing the dimension) The following are trivial:
(1) If V has only the zero vector, V = {0}, then dim V = 0.
(2) If V = R^n, then the standard basis {e1, e2, ..., en} for V implies dim R^n = n.
(3) If V = P_n(R), the space of all polynomials of degree less than or equal to n, then dim P_n(R) = n + 1, since {1, x, x^2, ..., x^n} is a basis for V.
(4) If V = M_{m×n}(R), the space of all m × n matrices, then dim M_{m×n}(R) = mn, since {E_ij : i = 1, ..., m, j = 1, ..., n} is a basis for V, where E_ij is the m × n matrix whose (i, j)-th entry is 1 and all others are zero. □

If V = C(R), the space of all real-valued continuous functions defined on the real line, then one can show that V is not finite dimensional. A vector space V is infinite dimensional if it is not finite dimensional. In this book, we are concerned only with finite-dimensional vector spaces unless otherwise stated.

Theorem 3.11 Let V be a finite-dimensional vector space.
(1) Any linearly independent set in V can be extended to a basis by adding more vectors if necessary.
(2) Any set of vectors that spans V can be reduced to a basis by discarding vectors if necessary.
Proof: We prove (1) and leave (2) as an exercise. Let α = {x1, x2, ..., xk} be a linearly independent set in V. If α spans V, then α is a basis. If α does not span V, then there exists a vector, say x_{k+1}, in V that is not contained in the subspace spanned by the vectors in α. Now {x1, ..., xk, x_{k+1}} is linearly independent (check why). If {x1, ..., xk, x_{k+1}} spans V, then this is a basis for V. If it does not span V, then the
same procedure can be repeated, yielding a linearly independent set that spans V, i.e., a basis for V. This procedure must stop in a finite number of steps, because of Lemma 3.9, for a finite-dimensional vector space V. □

Theorem 3.11 shows that a basis for a vector space V is a set of vectors in V which is maximally independent and minimally spanning in the above sense. In particular, if W is a subspace of V, then any basis for W is linearly independent also in V, and can be extended to a basis for V. Thus dim W ≤ dim V.

Corollary 3.12 Let V be a vector space of dimension n. Then
(1) any set of n vectors that spans V is a basis for V, and
(2) any set of n linearly independent vectors is a basis for V.
Proof: Again we prove (1) only. If a spanning set of n vectors were not linearly independent, then the set could be reduced to a basis having fewer than n vectors, contradicting Theorem 3.10. □

Corollary 3.12 means that if it is known that dim V = n, and a set of n vectors either is linearly independent or spans V, then it is already a basis for the space V.

Example 3.18 (Constructing a basis) Let W be the subspace of R^4 spanned by the vectors

    x1 = (1, −2, 5, −3),  x2 = (0, 1, 1, 4),  x3 = (1, 0, 1, 0).

Find a basis for W and extend it to a basis for R^4.

Solution: Note that dim W ≤ 3, since W is spanned by the three vectors xi. Let A be the 3 × 4 matrix whose rows are x1, x2 and x3:

    A = [ 1 −2 5 −3 ]
        [ 0  1 1  4 ]
        [ 1  0 1  0 ].

Reduce A to a row-echelon form:

    U = [ 1 −2 5 −3  ]
        [ 0  1 1  4  ]
        [ 0  0 1 5/6 ].

The three nonzero row vectors of U are clearly linearly independent, and they also span W, because the vectors x1, x2 and x3 can be expressed as linear combinations of these three nonzero row vectors of U. Hence, the three nonzero row vectors of U provide a basis for W. (Note that this implies dim W = 3, and hence {x1, x2, x3} is
also a basis for W, by Corollary 3.12. The linear independence of the xi's is a by-product of this fact.) To extend it to a basis for R^4, just add any nonzero vector of the form x4 = (0, 0, 0, t) to the rows of U. □

Problem 3.10 Let W be a subspace of a vector space V. Show that if dim W = dim V, then W = V.

Problem 3.11 Find a basis and the dimension of each of the following subspaces of M_{n×n}(R), the space of all n × n matrices.
(1) The space of all n × n diagonal matrices whose traces are zero.
(2) The space of all n × n symmetric matrices.
(3) The space of all n × n skew-symmetric matrices.
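The construction in Example 3.18 amounts to running Gaussian elimination and keeping the nonzero rows. A minimal sketch (the helper name is ours; exact rational arithmetic via `fractions.Fraction`):

```python
from fractions import Fraction

def row_echelon_basis(rows):
    """Gaussian elimination; returns the nonzero rows, a basis for the row space."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n, r = len(M), len(M[0]), 0
    for c in range(n):
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        M[r], M[p] = M[p], M[r]           # bring a pivot row up
        for i in range(r + 1, m):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return [row for row in M if any(row)]

# The spanning vectors of W from Example 3.18.
X = [(1, -2, 5, -3), (0, 1, 1, 4), (1, 0, 1, 0)]
basis_W = row_echelon_basis(X)
print(len(basis_W))                       # dim W = 3

# Adding x4 = (0, 0, 0, 1) gives four independent vectors: a basis for R^4.
extended = X + [(0, 0, 0, 1)]
print(len(row_echelon_basis(extended)))   # 4
```

Counting the nonzero rows before and after appending x4 is precisely the extension step of Theorem 3.11(1).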
As a direct consequence of Theorem 3.11 and the definition of the direct sum of subspaces, one can show the following corollary.

Corollary 3.13 For any subspace U of V, there is a subspace W of V such that V = U ⊕ W.

Proof: Choose a basis {u1, ..., uk} for U, and extend it to a basis {u1, ..., uk, u_{k+1}, ..., un} for V. Then the subspace W spanned by {u_{k+1}, ..., un} satisfies the requirement. □

Problem 3.12 Let {v1, v2, ..., vn} be a basis for a vector space V, and let Wi = {r vi : r ∈ R} be the subspace of V spanned by vi. Show that V = W1 ⊕ W2 ⊕ ... ⊕ Wn.
3.5 Row and column spaces

In this section, we go back to systems of linear equations and study them in terms of the concepts introduced in the previous sections. Note that an m × n matrix A can be abbreviated by its row vectors or column vectors as follows:

    A = [ r1 ]
        [ r2 ]
        [ ...]  =  [ c1 c2 ... cn ],
        [ rm ]

where ri is the i-th row vector of A in R^n, and cj is the j-th column vector of A in R^m.

Definition 3.8 Let A be an m × n matrix with row vectors {r1, r2, ..., rm} and column vectors {c1, c2, ..., cn}.
(1) The row space of A is the subspace of R^n spanned by the row vectors {r1, r2, ..., rm}, denoted by R(A).
(2) The column space of A is the subspace of R^m spanned by the column vectors {c1, c2, ..., cn}, denoted by C(A).
(3) The solution set of the homogeneous equation Ax = 0 is called the null space of A, denoted by N(A).

Note that the null space N(A) is a subspace of the n-space R^n. Its dimension is called the nullity of A. Since the row vectors of A are just the column vectors of its transpose A^T, and the column vectors of A are the row vectors of A^T, the row (column) space of A is just the column (row) space of A^T; that is,

    R(A) = C(A^T)  and  C(A) = R(A^T).

Since Ax = x1c1 + x2c2 + ... + xncn for any vector x = (x1, x2, ..., xn) ∈ R^n, we get

    C(A) = {Ax : x ∈ R^n}.

Thus, for a vector b ∈ R^m, the system Ax = b has a solution if and only if b ∈ C(A) ⊆ R^m. In other words, the column space C(A) is the set of vectors b ∈ R^m for which Ax = b has a solution.

It is quite natural to ask what the dimensions of those subspaces are, and how one can find bases for them. This will help us to understand the structure of all the solutions of the equation Ax = b. Since the set of the row vectors and the set of the column vectors of A are spanning sets for the row space and the column space, respectively, a minimally spanning subset of each of them will be a basis. This is not a difficult problem for a matrix in (reduced) row-echelon form.

Example 3.19 (Find a basis for the null space) Let U be in reduced row-echelon form, given as

    U = [ 1 0 0  2  2 ]
        [ 0 1 0 −1  3 ]
        [ 0 0 1  4 −1 ]
        [ 0 0 0  0  0 ].

Clearly, the first three nonzero row vectors containing leading 1's are linearly independent, and they form a basis for the row space R(U), so that dim R(U) = 3. On the other hand, the first three columns containing leading 1's are also linearly independent (see Theorem 3.6), and the last two column vectors can be expressed as linear combinations of them. Hence, they form a basis for C(U), and dim C(U) = 3. To find a basis for the null space N(U), we first solve the system Ux = 0 to get the solution

    x = s n_s + t n_t,
where n_s = (−2, 1, −4, 1, 0), n_t = (−2, −3, 1, 0, 1), and s and t are arbitrary values for the free variables x4 and x5, respectively. This shows that the two vectors n_s and n_t span the null space N(U), and they are clearly linearly independent (see their last two entries). Hence, the set {n_s, n_t} is a basis for the null space N(U). □

For any matrix A, we first investigate the row space R(A) and the null space N(A) of A by comparing them with those of the reduced row-echelon form U of A. Since Ax = 0 and Ux = 0 have the same solution set by Theorem 1.1, we clearly have N(A) = N(U). Let {r1, ..., rm} be the row vectors of an m × n matrix A. The three elementary row operations change A into matrices Ai of the following three types:

    A1 = [ r1 ; ... ; k ri ; ... ; rm ]  for k ≠ 0,
    A2 = [ r1 ; ... ; rj ; ... ; ri ; ... ; rm ]  for i < j (rows i and j interchanged),
    A3 = [ r1 ; ... ; ri + k rj ; ... ; rm ].
It is clear that the row vectors of the three matrices A1, A2 and A3 are linear combinations of the row vectors of A. On the other hand, by the inverse elementary row operations, these matrices can be changed back into A. Thus, the row vectors of A can also be written as linear combinations of those of the Ai's. This means that if matrices A and B are row equivalent, then their row spaces must be equal, i.e., R(A) = R(B). Now, the nonzero row vectors in the reduced row-echelon form U are always linearly independent and span the row space of U (see Theorem 3.6). Thus they form a basis for the row space R(A) of A. This gives the following theorem.

Theorem 3.14 Let U be a (reduced) row-echelon form of a matrix A. Then R(A) = R(U) and N(A) = N(U). Moreover, if U has r nonzero row vectors containing leading 1's, then they form a basis for the row space R(A), so that the dimension of R(A) is r.
The following example shows how to find bases for the row and the null spaces, and at the same time how to find a basis for the column space.

Example 3.20 (Find bases for the row space and the column space of A) Let A be the matrix given as

    A = [  1  2 0  2  5 ]   [ r1 ]
        [ −2 −5 1 −1 −8 ] = [ r2 ]
        [  0 −3 3  4  1 ]   [ r3 ]
        [  3  6 0 −7  2 ]   [ r4 ].

Find bases for the row space R(A), the null space N(A), and the column space C(A) of A.
Solution: (1) Find a basis for R(A): By Gauss-Jordan elimination on A, we get the reduced row-echelon form U:

    U = [ 1 0  2 0 1 ]
        [ 0 1 −1 0 1 ]
        [ 0 0  0 1 1 ]
        [ 0 0  0 0 0 ].

Since the three nonzero row vectors

    v1 = (1, 0, 2, 0, 1),
    v2 = (0, 1, −1, 0, 1),
    v3 = (0, 0, 0, 1, 1)
of U are linearly independent, they form a basis for the row space R(U) = R(A), so dim R(A) = 3. (Note that in the process of Gaussian elimination we did not use a permutation matrix. This means that the three nonzero rows of U were obtained from the first three row vectors r1, r2, r3 of A, and the fourth row r4 of A turned out to be a linear combination of them. Thus the first three row vectors of A also form a basis for the row space.)

(2) Find a basis for N(A): It is enough to solve the homogeneous system Ux = 0, since N(A) = N(U). That is, neglecting the fourth zero equation, the equation Ux = 0 is the following system of equations:

    x1      + 2x3      + x5 = 0
         x2 −  x3      + x5 = 0
                    x4 + x5 = 0.
Since the first, the second and the fourth columns of U contain the leading 1's, the basic variables are x1, x2, x4, and the free variables are x3, x5. As in Example 3.19, by assigning arbitrary values s and t to the free variables x3 and x5, one can find the solution x of Ux = 0 as

    x = s n_s + t n_t,

where n_s = (−2, 1, 1, 0, 0) and n_t = (−1, −1, 0, −1, 1). In fact, the two vectors n_s and n_t are the solutions when (x3, x5) = (s, t) is (1, 0) and when (x3, x5) = (s, t) is (0, 1), respectively. They must be linearly independent, since (1, 0) and (0, 1), as the (x3, x5)-coordinates of n_s and n_t respectively, are linearly independent. Since any solution of Ux = 0 is a linear combination of them, the set {n_s, n_t} is a basis for the null space N(U) = N(A). Thus dim N(A) = 2 = the number of free variables in Ux = 0.

(3) Find a basis for C(A): Let c1, c2, c3, c4, c5 denote the column vectors of A in the given order. Since these column vectors of A span C(A), we only need to discard those columns that can be expressed as linear combinations of other column vectors. But the linear dependence

    x1c1 + x2c2 + x3c3 + x4c4 + x5c5 = 0,  i.e., Ax = 0,
holds if and only if x = (x1, ..., x5) ∈ N(A). By taking x = n_s = (−2, 1, 1, 0, 0) or x = n_t = (−1, −1, 0, −1, 1), the basis vectors of N(A) given in (2), we obtain two nontrivial linear dependencies among the cj's:

    −2c1 + c2 + c3 = 0,
    −c1 − c2 − c4 + c5 = 0,
respectively. Hence, the column vectors c3 and c5, corresponding to the free variables in Ax = 0, can be written as

    c3 = 2c1 − c2,
    c5 = c1 + c2 + c4.
That is, the column vectors c3, c5 of A are linear combinations of the column vectors c1, c2, c4, which correspond to the basic variables in Ax = 0. Hence, {c1, c2, c4} spans the column space C(A). We claim that {c1, c2, c4} is linearly independent. Let Ā = [c1 c2 c4] and Ū = [u1 u2 u4] be submatrices of A and U, respectively, where uj is the j-th column vector of the reduced row-echelon form U of A obtained in (1):
    Ū = [ 1 0 0 ]           Ā = [  1  2  2 ]
        [ 0 1 0 ]               [ −2 −5 −1 ]
        [ 0 0 1 ]    and        [  0 −3  4 ]
        [ 0 0 0 ]               [  3  6 −7 ].
Then clearly Ū is the reduced row-echelon form of Ā, so that N(Ā) = N(Ū). Since the vectors u1, u2, u4 are just the columns of U containing leading 1's, they are linearly independent by Theorem 3.6, and Ūx = 0 has only the trivial solution. This means that Āx = 0 also has only the trivial solution, so {c1, c2, c4} is linearly independent. Therefore, it is a basis for the column space C(A), and dim C(A) = 3 = the number of basic variables. That is, the column vectors of A corresponding to the basic variables in Ux = 0 form a basis for the column space C(A). □

In summary, given a matrix A, we first find the (reduced) row-echelon form U of A by Gauss-Jordan elimination. Then a basis for R(A) = R(U) is the set of nonzero row vectors of U, and a basis for N(A) = N(U) can be found by solving Ux = 0. For a basis for the column space C(A), notice that C(U) ≠ C(A) in general, since the column space of A is not preserved by Gauss-Jordan elimination. (See Problem 3.16.) However, we have dim C(A) = dim C(U), and a basis for C(A) can be formed by selecting the columns of A, not of U, which correspond to the basic variables (or the leading 1's in U).
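The whole recipe of Example 3.20 — row space from the nonzero rows of U, column space from the pivot columns of A, null space from the free variables — can be carried out mechanically. A self-contained sketch (helper names are ours, not the book's):

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form via Gauss-Jordan elimination, with pivot columns."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue                     # no pivot here: free variable
        M[r], M[p] = M[p], M[r]          # swap a pivot row into place
        M[r] = [x / M[r][c] for x in M[r]]            # scale to a leading 1
        for i in range(m):
            if i != r and M[i][c] != 0:  # clear the rest of the column
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

# The matrix of Example 3.20.
A = [[1, 2, 0, 2, 5], [-2, -5, 1, -1, -8], [0, -3, 3, 4, 1], [3, 6, 0, -7, 2]]
U, pivots = rref(A)

# Basis for R(A): the nonzero rows of U.
row_basis = [row for row in U if any(row)]
# Basis for C(A): the columns of A (not of U!) at the pivot positions.
col_basis = [[A[i][c] for i in range(len(A))] for c in pivots]
# Basis for N(A): one vector per free column, read off by back substitution.
free = [c for c in range(len(A[0])) if c not in pivots]
null_basis = []
for f in free:
    v = [Fraction(0)] * len(A[0])
    v[f] = Fraction(1)
    for i, c in enumerate(pivots):
        v[c] = -U[i][f]
    null_basis.append(v)

print(row_basis)    # three rows: rank A = 3
print(col_basis)    # {c1, c2, c4}
print(null_basis)   # {n_s, n_t}
```

Note how the code mirrors the summary: only the row and null spaces are read off from U, while the column-space basis is taken from the original columns of A at the pivot positions.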
Alternatively, a basis for the column space C(A) can also be found with elementary column operations, which is the same as finding a basis for the row space R(A^T) of A^T.

Problem 3.13 Let A be the matrix given in Example 3.20. Find the conditions on a, b, c, d so that the vector x = (a, b, c, d) belongs to C(A).

Problem 3.14 Find bases for R(A) and N(A) of the matrix

    A = [ 0  0  1 −2 ]
        [ 2 −5 −3 −2 ]
        [ 0  5 15 10 ]
        [ 2  6 18  8 ].

Also find a basis for C(A) by finding a basis for R(A^T).

Problem 3.15 Let A and B be two n × n matrices. Show that AB = O if and only if the column space of B is a subspace of the null space of A.

Problem 3.16 Find an example of a matrix A and its row-echelon form U such that C(A) ≠ C(U). What is wrong in C(A) = R(A^T) = R(U^T) = C(U)?
3.6 Rank and nullity

The argument in Example 3.20 is so general that it can be used to prove the following theorem, which is one of the most fundamental results in linear algebra. The proof given here is just a repetition of the argument in Example 3.20 in a general setting, and so it may be skipped at the reader's discretion.

Theorem 3.15 (The fundamental theorem) For any m × n matrix A, the row space and the column space of A have the same dimension; that is, dim R(A) = dim C(A).

Proof: Let dim R(A) = r, and let U be the reduced row-echelon form of A. Then r is the number of the nonzero row (or column) vectors of U containing leading 1's, which is equal to the number of basic variables in Ux = 0 or Ax = 0. We shall prove that the r columns of A corresponding to the leading 1's (or basic variables) form a basis for C(A), so that dim C(A) = r = dim R(A).

(1) They are linearly independent: Let Ā denote the submatrix of A whose columns are those of A corresponding to the r basic variables (or leading 1's) in U, and let Ū denote the submatrix of U consisting of the r columns containing leading 1's. Then it is clear that Ū is the reduced row-echelon form of Ā, so that Āx = 0 if and only if Ūx = 0. However, Ūx = 0 has only the trivial solution, since the columns
of Ū containing the leading 1's are linearly independent by Theorem 3.6. Therefore, Āx = 0 also has only the trivial solution, so the columns of Ā are linearly independent.

(2) They span C(A): Note that the columns of A corresponding to the free variables are not contained in Ā, and each of these column vectors of A can be written as a linear combination of the column vectors of Ā (see Example 3.20). To show this, let {c_{i1}, c_{i2}, ..., c_{ik}} be the columns of A (not contained in Ā) corresponding to the free variables {x_{i1}, x_{i2}, ..., x_{ik}}, and let x_{ij} be any of these free variables. Then, by assigning the value 1 to x_{ij} and 0 to all the other free variables, one can get a nontrivial solution of Ax = x1c1 + x2c2 + ... + xncn = 0. When such a solution is substituted into this equation, one can see that the column c_{ij} of A corresponding to x_{ij} = 1 is written as a linear combination of the columns of Ā. This can be done for each free variable x_{ij}, j = 1, 2, ..., k, so the columns of A corresponding to those free variables are redundant in the spanning set of C(A). □
Remark: (1) In the proof of Theorem 3.15, once we have shown that the columns of Ā are linearly independent as in step (1), we may replace step (2) by the following argument: One can easily see that dim C(A) ≥ dim R(A) by Theorem 3.11. On the other hand, since this inequality holds for arbitrary matrices, applying it to A^T in particular we get dim C(A^T) ≥ dim R(A^T). Moreover, C(A^T) = R(A) and R(A^T) = C(A) imply dim C(A) ≤ dim R(A), which means dim C(A) = dim R(A). This also means that the column vectors of Ā span C(A), and so form a basis.

(2) The proof (2) of Theorem 3.15 also shows that the reduced row-echelon form of a system is unique, which was stated on page 10. In fact, if U1 and U2 are two reduced row-echelon forms of an m × n matrix A, then the columns of U1 and U2 corresponding to the basic variables (i.e., containing leading 1's) must be the same, and of the form [0 ... 0 1 0 ... 0]^T, by the definition of the reduced row-echelon form. If there are no free variables, then it is quite clear that

         [ 1 0 ... 0 ]
         [ 0 1 ... 0 ]
         [ ... ...   ]
    U1 = [ 0 0 ... 1 ] = U2.
         [ 0 0 ... 0 ]
         [ ... ...   ]
         [ 0 0 ... 0 ]

Suppose that there is a free variable. Since U1x = 0 if and only if U2x = 0, one can easily check that the columns of U1 and U2 corresponding to each free variable must also be the same, so that U1 = U2.
In summary, the following equalities are now clear from Theorems 3.14 and 3.15:

    dim N(A) = dim N(U)
             = the number of free variables in Ux = 0;

    dim R(A) = dim R(U)
             = the number of nonzero row vectors of U
             = the number of basic variables in Ux = 0
             = the maximal number of linearly independent row vectors of A
             = the maximal number of linearly independent column vectors of A
             = dim C(A).
Definition 3.9 For an m × n matrix A, the rank of A is defined to be the dimension of the row space (or the column space), denoted by rank A.

Clearly, rank I_n = n and rank A = rank A^T. And for an m × n matrix A, since dim R(A) ≤ m and dim C(A) ≤ n, we have the following corollary:

Corollary 3.16 If A is an m × n matrix, then rank A ≤ min{m, n}.

Since dim R(A) = dim C(A) = rank A is the number of basic variables in Ax = 0, and dim N(A) = nullity of A is the number of free variables in Ax = 0, we have the following theorem.

Theorem 3.17 (Rank Theorem) For any m × n matrix A,

    dim R(A) + dim N(A)   = rank A + nullity of A   = n,
    dim C(A) + dim N(A^T) = rank A + nullity of A^T = m.

If dim N(A) = 0 (or N(A) = {0}), then dim R(A) = n (or R(A) = R^n), which means that A has exactly n linearly independent rows and n linearly independent columns. In particular, if A is a square matrix of order n, then the row vectors are linearly independent if and only if the column vectors are linearly independent. Therefore, Ax = 0 has only the trivial solution, and by Theorem 1.9 we get the following corollary.
Corollary 3.18 Let A be an n × n square matrix. Then A is invertible if and only if rank A = n.

Example 3.21 (Find the rank and the nullity) For a 4 × 5 matrix A, find the rank and the nullity of A.

Solution: Gaussian elimination gives a row-echelon form U of A with three nonzero rows. The first three nonzero rows containing leading 1's in U form a basis for R(U) = R(A). Therefore,

    rank A = dim R(A) = dim C(A) = 3,
    the nullity of A = dim N(A) = 5 − dim R(A) = 2. □
Problem 3.17 Find the nullity and the rank of each of the following matrices. For each of the matrices, show that dim R(A) = dim C(A) directly by finding their bases.

Problem 3.18 Show that a system of linear equations Ax = b has a solution if and only if rank A = rank [A b], where [A b] denotes the augmented matrix for Ax = b.
Theorem 3.19 For any two matrices A and B for which AB can be defined,

(1) N(AB) ⊇ N(B),
(2) N((AB)^T) ⊇ N(A^T),
(3) C(AB) ⊆ C(A),
(4) R(AB) ⊆ R(B).

Proof: (1) and (2) are clear, since Bx = 0 implies (AB)x = A(Bx) = 0.
(3) For an m × n matrix A and an n × p matrix B,

    C(AB) = {ABx : x ∈ R^p} ⊆ {Ay : y ∈ R^n} = C(A),

because Bx ∈ R^n for any x ∈ R^p. (See Example 3.9.)
(4) R(AB) = C((AB)^T) = C(B^T A^T) ⊆ C(B^T) = R(B). □
Corollary 3.20 rank(AB) ≤ min{rank A, rank B}.

In some particular cases, the equality holds. In fact, it will be shown later in Theorem 5.25 that for any square matrix A, rank(A^T A) = rank A = rank(A A^T). The following problem illustrates another such case.

Problem 3.19 Let A be an invertible square matrix. Show that, for any matrix B, rank(AB) = rank B = rank(BA).

Theorem 3.21 Let A be an m × n matrix of rank r. Then
(1) for every submatrix C of A, rank C ≤ r, and
(2) the matrix A has at least one r × r submatrix of rank r; that is, A has an invertible submatrix of order r.

Proof: (1) Consider an intermediate matrix B which is obtained from A by removing the rows that are not wanted in C. Then clearly R(B) ⊆ R(A), and hence rank B ≤ rank A. Moreover, since the columns of C are taken from those of B, C(C) ⊆ C(B) and rank C ≤ rank B.
(2) Note that one can find r linearly independent row vectors of A, which form a basis for the row space of A. Let B be the matrix whose row vectors consist of these vectors. Then rank B = r, and the column space of B must be of dimension r. By taking r linearly independent column vectors of B, one can find an r × r submatrix C of A with rank r. □

Problem 3.20 Prove that the rank of a matrix is equal to the largest order of its invertible submatrices.

Problem 3.21 For each of the matrices given in Problem 3.17, find an invertible submatrix of the largest order.
3.7 Bases for subspaces

In this section, we introduce two ways of finding bases for V + W and V ∩ W for two subspaces V and W of the n-space R^n, and then derive an important relationship between the dimensions of those subspaces in terms of the dimensions of V and W. Let α = {v1, v2, ..., vk} and β = {w1, w2, ..., wℓ} be bases for V and W, respectively. Let Q be the n × (k + ℓ) matrix whose columns are those basis vectors:

    Q = [ v1 ... vk w1 ... wℓ ]  (an n × (k + ℓ) matrix).

Theorem 3.22 Let V and W be two subspaces of R^n, and let Q be the matrix defined above.
(1) C(Q) = V + W, so that a basis for the column space C(Q) is a basis for V + W.
(2) N(Q) can be identified with V ∩ W, so that dim(V ∩ W) = dim N(Q).
Proof: (1) It is clear that C(Q) = V + W.
(2) Let x = (a1, ..., ak, b1, ..., bℓ) ∈ N(Q) ⊆ R^{k+ℓ}. Then

    Qx = a1v1 + ... + akvk + b1w1 + ... + bℓwℓ = 0,

from which we get

    a1v1 + ... + akvk = −(b1w1 + ... + bℓwℓ).

If we set

    y = a1v1 + ... + akvk = −(b1w1 + ... + bℓwℓ),

then y ∈ V ∩ W, since the first right-hand side a1v1 + ... + akvk is in V as a linear combination of the basis vectors in α, and the second right-hand side −(b1w1 + ... + bℓwℓ) is in W as a linear combination of the basis vectors in β. That is, to each x ∈ N(Q), there corresponds a vector y in V ∩ W.

On the other hand, if y ∈ V ∩ W, then y can be written in two linear combinations, by the bases for V and W separately, as

    y = a1v1 + ... + akvk ∈ V,
    y = b1w1 + ... + bℓwℓ ∈ W,

for some a1, ..., ak and b1, ..., bℓ. Let x = (a1, ..., ak, −b1, ..., −bℓ) ∈ R^{k+ℓ}. Then it is quite clear that Qx = 0, i.e., x ∈ N(Q). Therefore, the correspondence of x in N(Q) ⊆ R^{k+ℓ} to a vector y in V ∩ W ⊆ R^n gives a one-to-one correspondence between the sets N(Q) and V ∩ W. Moreover, if xi, i = 1, 2, correspond to yi, then one can easily check that x1 + x2 corresponds to y1 + y2, and kx1 corresponds to ky1. This means that the two vector spaces N(Q) and V ∩ W can be identified as vector spaces (see Section 4.2 for an exact meaning of this identification). In particular, for a basis for N(Q), the corresponding set in V ∩ W is a basis for V ∩ W; that is, if the set of vectors

    x1 = (a11, ..., a1k, b11, ..., b1ℓ),
    ...
    xs = (as1, ..., ask, bs1, ..., bsℓ)

is a basis for N(Q), then the set of vectors

    y1 = a11v1 + ... + a1kvk,            y1 = −(b11w1 + ... + b1ℓwℓ),
    ...                       or         ...
    ys = as1v1 + ... + askvk,            ys = −(bs1w1 + ... + bsℓwℓ)

is a basis for V ∩ W, and vice versa. This implies that

    dim N(Q) = dim(V ∩ W). □

Note that dim(V + W) ≠ dim V + dim W in general. The following theorem gives a relation between them.
Theorem 3.23 For any subspaces V and W of the n-space R^n,

    dim(V + W) + dim(V ∩ W) = dim V + dim W.

Proof: Let α = {v1, v2, ..., vk} and β = {w1, w2, ..., wℓ} be bases for V and W, respectively. Let Q be the n × (k + ℓ) matrix whose columns are the previous basis vectors: Q = [v1 ... vk w1 ... wℓ]. Then, by the Rank Theorem and Theorem 3.22, we have

    k + ℓ = dim C(Q) + dim N(Q) = dim(V + W) + dim(V ∩ W). □

In particular, dim(V + W) = dim V + dim W if and only if V ∩ W = {0}. In this case, V + W = V ⊕ W.
Example 3.22 (Find a basis for a subspace) Let V and W be two subspaces of ℝ⁵ with bases

    v1 = (1, 3, −2, 2, 3),      w1 = (2, 3, −1, −2, 9),
    v2 = (1, 4, −3, 4, 2),      w2 = (1, 5, −6, 6, 1),
    v3 = (1, 3, 0, 2, 3),       w3 = (2, 4, 4, 2, 8),

respectively. Find bases for V + W and V ∩ W.
Solution: The matrix Q = [v1 v2 v3 w1 w2 w3] takes the following form:

    Q = [  1   1   1   2   1   2
           3   4   3   3   5   4
          −2  −3   0  −1  −6   4
           2   4   2  −2   6   2
           3   2   3   9   1   8 ].

The Gauss-Jordan elimination gives

    U = [ 1  0  0   5   0  0
          0  1  0  −3   2  0
          0  0  1   0  −1  0
          0  0  0   0   0  1
          0  0  0   0   0  0 ].

From this, one can directly see that dim(V + W) = 4, and the columns v1, v2, v3, w3 corresponding to the basic variables in Qx = 0 (or to the leading 1's in U) form a basis for C(Q) = V + W. Moreover, dim N(Q) = dim(V ∩ W) = 2, corresponding to the two free variables x4 and x5 in Qx = 0. To find a basis for V ∩ W, we solve Ux = 0 for (x4, x5) = (1, 0) and (x4, x5) = (0, 1), respectively, to obtain a basis for N(Q):
    x1 = (−5, 3, 0, 1, 0, 0)   and   x2 = (0, −2, 1, 0, 1, 0).

From Qxi = 0, we obtain two equations:

    −5v1 + 3v2 + w1 = 0,
    −2v2 + v3 + w2 = 0.

Therefore, {y1, y2} is a basis for V ∩ W, where

    y1 = −5v1 + 3v2 = −w1 = (−2, −3, 1, 2, −9),
    y2 = −2v2 + v3 = −w2 = (−1, −5, 6, −6, −1).

Clearly, one can check that

    dim(V + W) + dim(V ∩ W) = 4 + 2 = 3 + 3 = dim V + dim W.                □
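The column-matrix method of Example 3.22 is easy to automate. The following sketch is our own illustration, not the book's, and assumes the SymPy library: the pivot columns of Q give a basis for C(Q) = V + W, and each null-space vector of Q yields a vector of V ∩ W.

```python
import sympy as sp

# Basis vectors of V and W from Example 3.22.
v1, v2, v3 = [1, 3, -2, 2, 3], [1, 4, -3, 4, 2], [1, 3, 0, 2, 3]
w1, w2, w3 = [2, 3, -1, -2, 9], [1, 5, -6, 6, 1], [2, 4, 4, 2, 8]

# Q is the 5 x 6 matrix whose columns are the basis vectors.
Q = sp.Matrix([v1, v2, v3, w1, w2, w3]).T

# Pivot columns of Q form a basis for C(Q) = V + W.
_, pivots = Q.rref()
basis_sum = [Q.col(j) for j in pivots]

# A null-space vector x = (a1, a2, a3, b1, b2, b3) satisfies
# a1*v1 + a2*v2 + a3*v3 = -(b1*w1 + b2*w2 + b3*w3), a vector of V ∩ W.
basis_int = [x[0] * Q.col(0) + x[1] * Q.col(1) + x[2] * Q.col(2)
             for x in Q.nullspace()]

print(len(basis_sum), len(basis_int))  # dim(V + W) = 4, dim(V ∩ W) = 2
```

Here 4 + 2 = 3 + 3, as Theorem 3.23 predicts.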
Remark: (Another method for finding bases) Example 3.22 illustrates a method for finding bases for V + W and V ∩ W for given subspaces V and W of ℝⁿ by constructing a matrix Q whose columns are basis vectors for V and basis vectors for W. There is another method, which constructs a matrix Q whose rows are basis vectors for V and basis vectors for W. In this case, clearly V + W = R(Q), so by finding a basis for the row space R(Q) one gets a basis for V + W. On the other hand, a basis for V ∩ W can be found as follows. Let A be the k × n matrix whose rows are basis vectors for V, and B the ℓ × n matrix whose rows are basis vectors for W, so that V = R(A) and W = R(B). Let Ā denote the matrix A with an additional unknown vector x = (x1, x2, ..., xn) ∈ ℝⁿ attached as the bottom row, and define B̄ similarly. Then it is clear that R(Ā) = R(A) and R(B̄) = R(B) if and only if x ∈ V ∩ W = R(A) ∩ R(B); in that case the row-echelon form of Ā, obtained by the same Gaussian elimination that reduces A, has the same nonzero rows as that of A.

Thus, by comparing the row vectors of the row-echelon form of Ā with those of A, one can obtain a system of linear equations for x = (x1, x2, ..., xn). By the same argument applied to B and B̄, one gets another system of linear equations for the same x. The common solutions of these two systems provide a basis for V ∩ W. The following example illustrates how to apply this argument to find bases for V + W and V ∩ W.
Example 3.23 (Find a basis for a subspace) Let V be the subspace of ℝ⁵ spanned by

    v1 = (1, 3, −2, 2, 3),
    v2 = (1, 4, −3, 4, 2),
    v3 = (2, 3, −1, −2, 10),

and W the subspace spanned by

    w1 = (1, 3, 0, 2, 1),
    w2 = (1, 5, −6, 6, 3),
    w3 = (2, 5, 3, 2, 1).

Find a basis for V + W and for V ∩ W.
Solution: Note that the matrix A whose row vectors are the vi's is reduced to a row-echelon form

    [ 1  3  −2  2   3
      0  1  −1  2  −1
      0  0   0  0   1 ],

so that dim V = 3. Similarly, the matrix B whose row vectors are the wi's is reduced to a row-echelon form

    [ 1  3   0  2  1
      0  2  −6  4  2
      0  0   0  0  0 ],

so that dim W = 2. Now, if Q denotes the 6 × 5 matrix whose row vectors are the vi's and wi's, then V + W = R(Q). By Gaussian elimination, Q is reduced to a row-echelon form whose nonzero row vectors

    (1, 3, −2, 2, 3),  (0, 1, −1, 2, −1),  (0, 0, 1, 0, −1),  (0, 0, 0, 0, 1)

form a basis for V + W, so that dim(V + W) = 4.

We now find a basis for V ∩ W. A vector x = (x1, x2, x3, x4, x5) ∈ ℝ⁵ is contained in V ∩ W if and only if x is contained in both the row space of A and that of B. Let Ā be A with x attached as the last row:

    Ā = [ 1   3   −2   2    3
          1   4   −3   4    2
          2   3   −1  −2   10
          x1  x2   x3  x4   x5 ].
Then, by the same Gaussian elimination that reduces A to its row-echelon form, Ā is reduced to

    [ 1  3  −2  2   3
      0  1  −1  2  −1
      0  0   0  0   1
      0  0  −x1+x2+x3  4x1−2x2+x4  0 ].
Therefore, x ∈ R(A) = V if and only if R(Ā) = R(A). By comparing the row vectors of the row-echelon form of Ā with those of A, one sees that x ∈ R(A) if and only if the last row vector of the row-echelon form of Ā is the zero vector, that is, x is a solution of the homogeneous system of equations

    { −x1 +  x2 + x3      = 0
    {  4x1 − 2x2 + x4     = 0.

The same calculation with B̄ gives another homogeneous system of linear equations for x:

    { −9x1 + 3x2 + x3     = 0
    {  4x1 − 2x2 + x4     = 0
    {  2x1 −  x2 + x5     = 0.

Solving these two homogeneous systems together yields

    V ∩ W = {t(1, 4, −3, 4, 2) : t ∈ ℝ}.

Hence, {(1, 4, −3, 4, 2)} is a basis for V ∩ W and dim(V ∩ W) = 1.            □
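The row-space test used in Example 3.23 can also be phrased in terms of ranks: x lies in R(M) exactly when appending x as a row does not increase the rank of M. The following is our own SymPy sketch of that test, using the data of Example 3.23.

```python
import sympy as sp

# Rows of A span V; rows of B span W (Example 3.23).
A = sp.Matrix([[1, 3, -2, 2, 3], [1, 4, -3, 4, 2], [2, 3, -1, -2, 10]])
B = sp.Matrix([[1, 3, 0, 2, 1], [1, 5, -6, 6, 3], [2, 5, 3, 2, 1]])

# V + W is the row space of the stacked matrix.
print(A.col_join(B).rank())          # dim(V + W) = 4

def in_row_space(M, row):
    # x ∈ R(M) iff rank([M; x]) = rank(M), i.e. R(M-bar) = R(M).
    return M.col_join(sp.Matrix([row])).rank() == M.rank()

x = [1, 4, -3, 4, 2]                 # the basis vector found in the example
print(in_row_space(A, x) and in_row_space(B, x))   # True: x ∈ V ∩ W
```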
Problem 3.22 Let V and W be the subspaces of the vector space P3(ℝ) spanned by

    v1(x) = 3 + x + 4x² + x³,
    v2(x) = 5 + x + 5x² + x³,
    v3(x) = 5 + 5x + 10x² + 3x³

and

    w1(x) = 9 − 3x + 3x² + 2x³,
    w2(x) = 5 − x + 4x² + x³,
    w3(x) = 6 + 2x² + x³,

respectively. Find the dimensions of, and bases for, V + W and V ∩ W.
Problem 3.23 Let

    V = {(x, y, z, u) ∈ ℝ⁴ : y + z + u = 0},
    W = {(x, y, z, u) ∈ ℝ⁴ : x + y = 0, z = 2u}

be two subspaces of ℝ⁴. Find bases for V, W, V + W, and V ∩ W.
3.8 Invertibility

In Chapter 1, we have seen that a non-square matrix A may have only one-sided (right or left) inverses. In this section, it will be shown that the existence of a one-sided inverse (right or left) of A implies the existence or the uniqueness of the solutions of a system Ax = b.

Theorem 3.24 (Existence) Let A be an m × n matrix. Then the following statements are equivalent.
(1) For each b ∈ ℝᵐ, Ax = b has at least one solution x in ℝⁿ.
(2) The column vectors of A span ℝᵐ, i.e., C(A) = ℝᵐ.
(3) rank A = m (hence m ≤ n).
(4) A has a right inverse (i.e., a matrix B such that AB = Im).
Proof: (1) ⇔ (2): In general, C(A) ⊆ ℝᵐ. For any b ∈ ℝᵐ, there is a solution x ∈ ℝⁿ of Ax = b if and only if b is a linear combination of the column vectors of A, i.e., b ∈ C(A). Thus (1) holds if and only if ℝᵐ = C(A).
(2) ⇔ (3): C(A) = ℝᵐ if and only if dim C(A) = m (see Problem 3.10), and dim C(A) = rank A = dim R(A) ≤ min{m, n}, so that m ≤ n.
(1) ⇒ (4): Let e1, e2, ..., em be the standard basis for ℝᵐ. Then for each ei, one can find an xi ∈ ℝⁿ such that Axi = ei by hypothesis (1). If B is the n × m matrix whose columns are these xi's, i.e., B = [x1 x2 ··· xm], then, by matrix multiplication,

    AB = [Ax1 Ax2 ··· Axm] = [e1 e2 ··· em] = Im.

(4) ⇒ (1): If B is a right inverse of A, then for any b ∈ ℝᵐ, x = Bb is a solution of Ax = b, since A(Bb) = (AB)b = Im b = b.            □

Condition (2) means that A has m linearly independent column vectors, and condition (3) implies that there exist m linearly independent row vectors of A, since rank A = m = dim R(A). Note that if C(A) ⊊ ℝᵐ, then Ax = b has no solution for b ∉ C(A).
Theorem 3.25 (Uniqueness) Let A be an m × n matrix. Then the following statements are equivalent.
(1) For each b ∈ ℝᵐ, Ax = b has at most one solution x in ℝⁿ.
(2) The column vectors of A are linearly independent.
(3) dim C(A) = rank A = n (hence n ≤ m).
(4) R(A) = ℝⁿ.
(5) N(A) = {0}.
(6) A has a left inverse (i.e., a matrix C such that CA = In).
Proof: (1) ⇒ (2): Note that the column vectors of A are linearly independent if and only if the homogeneous equation Ax = 0 has only the trivial solution. But Ax = 0 always has the trivial solution x = 0, and statement (1) implies that it is the only one.
(2) ⇔ (3): Clear, because the column vectors are all linearly independent if and only if they form a basis for C(A), i.e., dim C(A) = n ≤ m.
(3) ⇔ (4): Clear, because dim R(A) = rank A = dim C(A) = n if and only if R(A) = ℝⁿ (see Problem 3.10).
(4) ⇔ (5): Clear, since dim R(A) + dim N(A) = n.
(2) ⇒ (6): Suppose that the columns of A are linearly independent, so that rank A = n. Extend these column vectors of A to a basis for ℝᵐ by adding m − n additional independent vectors to them. Construct an m × m matrix S with those basis vectors as its columns. Then the matrix S has rank m, and hence it is invertible. Let C be the n × m matrix obtained from S⁻¹ by discarding the last m − n rows. Since the first n columns of S constitute the matrix A, we have CA = In.
(6) ⇒ (1): Let C be a left inverse of A. If Ax = b has no solution, then we are done. Suppose that Ax = b has two solutions, say x1 and x2. Then

    x1 = CAx1 = Cb = CAx2 = x2.

Hence, the system can have at most one solution.            □
Remark: (1) We have proved that an m × n matrix A has a right inverse if and only if rank A = m, while A has a left inverse if and only if rank A = n. Therefore, if m ≠ n, then A cannot have both a left and a right inverse.
(2) For a practical way of finding a right or a left inverse of an m × n matrix A, we will show later (see Remark (1) below Theorem 5.26) that if rank A = m, then (AAᵀ)⁻¹ exists and Aᵀ(AAᵀ)⁻¹ is a right inverse of A, and that if rank A = n, then (AᵀA)⁻¹ exists and (AᵀA)⁻¹Aᵀ is a left inverse of A (see Theorem 5.26).
(3) Note that if m = n, so that A is a square matrix, then A has a right inverse (and a left inverse) if and only if rank A = m = n. Moreover, in this case the one-sided inverses coincide (see Theorem 1.9). Therefore, a square matrix A has rank n if and only if A is invertible. This means that for a square matrix "Existence ⇔ Uniqueness," and the ten statements listed in Theorems 3.24-3.25 are all equivalent. In particular, for the invertibility of a square matrix it is enough to show the existence of a one-sided inverse.
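Remark (2) can be checked numerically. The sketch below is our own illustration (the matrix is made up, and NumPy is assumed); it applies the formulas Aᵀ(AAᵀ)⁻¹ and (AᵀA)⁻¹Aᵀ.

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [0., 1., 1.]])               # 2 x 3 with rank A = 2 = m

# rank A = m: A^T (A A^T)^{-1} is a right inverse of A.
right = A.T @ np.linalg.inv(A @ A.T)
print(np.allclose(A @ right, np.eye(2)))   # AB = I_m

# rank B = n: (B^T B)^{-1} B^T is a left inverse of B.
B = A.T                                     # 3 x 2 with rank B = 2 = n
left = np.linalg.inv(B.T @ B) @ B.T
print(np.allclose(left @ B, np.eye(2)))    # CB = I_n
```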
Problem 3.24 For each of the following matrices, find all vectors b such that the system of linear equations Ax = b has at least one solution . Also, discuss the uniqueness of the solution.
108
Chapter 3. Vector Spaces (I) A
(3) A
~
[J
~
[l
3 -2 5 4 1 3 7 - 3 6 13
~l
(2) A
-3]
2 -3 -2 0 -4 3 -2 8 -7 -2 _~; 1 -9 -10
, (4) A
~[ =
;
-;
-6
[
l ;l
1
1 1 ~ 5
2 -2
Summarizing all the results obtained so far about the solvability of a system, one can obtain several characterizations of the invertibility of a square matrix. The following theorem is a collection of the results proved in Theorems 1.9, 3.24, and 3.25.
Theorem 3.26 For a square matrix A of order n, the following statements are equivalent.
(1) A is invertible.
(2) det A ≠ 0.
(3) A is row equivalent to In.
(4) A is a product of elementary matrices.
(5) Elimination can be completed: PA = LDU, with all di ≠ 0.
(6) Ax = b has a solution for every b ∈ ℝⁿ.
(7) Ax = 0 has only the trivial solution, i.e., N(A) = {0}.
(8) The columns of A are linearly independent.
(9) The columns of A span ℝⁿ, i.e., C(A) = ℝⁿ.
(10) A has a left inverse.
(11) rank A = n.
(12) The rows of A are linearly independent.
(13) The rows of A span ℝⁿ, i.e., R(A) = ℝⁿ.
(14) A has a right inverse.
(15)* The linear transformation A : ℝⁿ → ℝⁿ defined by A(x) = Ax is injective.
(16)* The linear transformation A : ℝⁿ → ℝⁿ is surjective.
(17)* Zero is not an eigenvalue of A.
Proof: Exercise — check where each claim has been proved, and prove any that are not covered. The statements marked with asterisks will be explained in the following places: (15) and (16) in the Remark on page 133, and (17) in Theorem 6.1.            □
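A few of the seventeen conditions are easy to test numerically for a concrete matrix (the matrix below is our own example, not from the text); as the theorem asserts, the tests must all agree.

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])
n = A.shape[0]

cond_2  = abs(np.linalg.det(A)) > 1e-12                     # (2)  det A != 0
cond_11 = np.linalg.matrix_rank(A) == n                     # (11) rank A = n
cond_7  = np.allclose(np.linalg.solve(A, np.zeros(n)), 0)   # (7)  Ax = 0 only trivially
cond_17 = np.all(np.abs(np.linalg.eigvals(A)) > 1e-12)      # (17)* 0 is not an eigenvalue

print(cond_2, cond_11, cond_7, cond_17)   # all agree for this A
```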
3.9 Applications

3.9.1 Interpolation

In many scientific experiments, a scientist wants to find the precise functional relationship between input data and output data. That is, in his experiment, he puts various input values into his experimental device and obtains output values corresponding to those input values. After his experiment, what he has is a table of inputs and outputs. The precise functional relationship might be very complicated, and sometimes it might be very hard or almost impossible to find the precise function. In this case, one thing he can do is to find a polynomial whose graph passes through each of the data points and comes very close to the function he wants to find. That is, he is looking for a polynomial that approximates the precise function. Such a polynomial is called an interpolating polynomial. This problem is closely related to systems of linear equations.

Let us begin with a set of given data. Suppose that for n + 1 distinct experimental input values x0, x1, ..., xn, we obtained n + 1 output values y0 = f(x0), y1 = f(x1), ..., yn = f(xn). The output values are supposed to be related to the inputs by a certain (unknown) function f. We wish to construct a polynomial p(x) of degree less than or equal to n which interpolates f(x) at x0, x1, ..., xn; i.e., p(xi) = yi = f(xi) for i = 0, 1, ..., n. Note that if there is such a polynomial, it must be unique. Indeed, if q(x) is another such polynomial, then h(x) = p(x) − q(x) is also a polynomial of degree less than or equal to n vanishing at the n + 1 distinct points x0, x1, ..., xn. Hence h(x) must be the identically zero polynomial, so that p(x) = q(x) for all x ∈ ℝ. To find such a polynomial p(x), let

    p(x) = a0 + a1x + ··· + anxⁿ
with n + 1 unknowns ai. Then

    p(xi) = a0 + a1xi + ··· + anxiⁿ = yi = f(xi)

for i = 0, 1, ..., n. In matrix notation,

    [ 1  x0  ···  x0ⁿ ] [ a0 ]   [ y0 ]
    [ 1  x1  ···  x1ⁿ ] [ a1 ] = [ y1 ]
    [ ⋮   ⋮        ⋮  ] [  ⋮ ]   [  ⋮ ]
    [ 1  xn  ···  xnⁿ ] [ an ]   [ yn ].

The coefficient matrix A is a square matrix of order n + 1, known as Vandermonde's matrix (see Example 2.10), whose determinant is

    det A = ∏ (xj − xi),   the product taken over 0 ≤ i < j ≤ n.

Since the xi's are all distinct, det A ≠ 0. Hence, A is nonsingular, and Ax = b has a unique solution, which determines the unique polynomial p(x) of degree ≤ n passing through the given n + 1 points (x0, y0), (x1, y1), ..., (xn, yn) in the plane ℝ².
Example 3.24 (Finding an interpolating polynomial) Given the four points

    (0, 3), (1, 0), (−1, 2), (3, 6)

in the plane ℝ², let p(x) = a0 + a1x + a2x² + a3x³ be the polynomial passing through the given four points. Then we have a system of equations

    a0                          = 3
    a0 +  a1 +  a2 +   a3       = 0
    a0 −  a1 +  a2 −   a3       = 2
    a0 + 3a1 + 9a2 + 27a3       = 6.

Solving this system, one can get a0 = 3, a1 = −2, a2 = −2, a3 = 1, and the unique polynomial is p(x) = 3 − 2x − 2x² + x³.            □

Problem 3.25 Let f(x)
= sin x. Then at x = 0, π/4, π/2, 3π/4, π, the values of f are y = 0, √2/2, 1, √2/2, 0. Find the polynomial p(x) of degree ≤ 4 that passes through these five points. (One may need to use a computer to avoid messy computation.)

Problem 3.26 Find a polynomial p(x) = a + bx + cx² + dx³ that satisfies p(0) = 1, p′(0) = 2, p(1) = 4, p′(1) = 4.

Problem 3.27 Find the equation of a circle that passes through the three points (2, −2), (3, 5), and (−4, 6) in the plane ℝ².
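The Vandermonde system of this section is easy to solve by machine — which is also the hint given in Problem 3.25. The sketch below (our own, assuming NumPy) redoes Example 3.24.

```python
import numpy as np

# Example 3.24: the four points (0,3), (1,0), (-1,2), (3,6).
xs = np.array([0., 1., -1., 3.])
ys = np.array([3., 0., 2., 6.])

# Row i of the Vandermonde matrix is (1, x_i, x_i^2, x_i^3).
V = np.vander(xs, N=4, increasing=True)
coeffs = np.linalg.solve(V, ys)            # the unknowns (a0, a1, a2, a3)

print(coeffs)   # [ 3. -2. -2.  1.] -> p(x) = 3 - 2x - 2x^2 + x^3
```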
Remark: Note that the interpolating polynomial p(x) of degree ≤ n is uniquely determined when we have the correct data, i.e., when we are given precisely n + 1 values of y at n + 1 distinct points x0, x1, ..., xn. However, if we are given fewer data, then the polynomial is under-determined: i.e., if we have m values of y with m < n + 1 at m distinct points x1, x2, ..., xm, then there are as many interpolating polynomials as vectors in the null space of A, since in this case A is an m × (n + 1) matrix with m < n + 1. (See the Existence Theorem 3.24.) On the other hand, if we are given more than n + 1 data, then the polynomial is over-determined: i.e., if we have m values of y with m > n + 1 at m distinct points x1, x2, ..., xm, then there may not exist an interpolating polynomial, since the system could be inconsistent. (See the Uniqueness Theorem 3.25.) In this case, the best one can do is to find a polynomial of degree ≤ n to which the data is closest, called the least squares solution. It will be reviewed again in Sections 5.9-5.9.2.
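For the over-determined case of this Remark, NumPy's least squares routine returns exactly the kind of "closest" polynomial mentioned. The data points below are our own illustration.

```python
import numpy as np

# Four data points but only a degree-1 polynomial: over-determined.
xs = np.array([0., 1., 2., 3.])
ys = np.array([0., 1., 1., 2.])

V = np.vander(xs, N=2, increasing=True)    # columns: 1, x
coeffs, *_ = np.linalg.lstsq(V, ys, rcond=None)

print(coeffs)   # least squares fit a0 + a1*x, here [0.1, 0.6]
```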
3.9.2 The Wronskian

Let y1, y2, ..., yn be n vectors in an m-dimensional vector space V. To check the linear independence of the vectors yj, consider a linear dependence

    c1y1 + c2y2 + ··· + cnyn = 0.

Let α = {x1, x2, ..., xm} be a basis for V. By expressing each yj as a linear combination of the basis vectors, yj = a1jx1 + a2jx2 + ··· + amjxm, the linear dependence of the yj's can be written as a linear combination of the basis vectors:

    0 = c1y1 + c2y2 + ··· + cnyn
      = (a11c1 + a12c2 + ··· + a1ncn)x1 + (a21c1 + a22c2 + ··· + a2ncn)x2
        + ··· + (am1c1 + am2c2 + ··· + amncn)xm,

so that all of the coefficients (which are themselves linear combinations of the ci's) must be zero. This gives a homogeneous system of linear equations in the ci's, say Ac = 0 with the m × n matrix A = [aij], as in the proof of Lemma 3.9.

Recall that the vectors yj are linearly independent if and only if the system Ac = 0 has only the trivial solution. Hence, the linear independence of a set of vectors in a finite-dimensional vector space can be tested by solving a homogeneous system of linear equations. If V is not finite dimensional, this test cannot be applied. In this section, we introduce a test for the linear independence of a set of functions.

For our purpose, let V be the vector space of all functions on ℝ which are differentiable infinitely many times. One can easily see that V is an infinite-dimensional vector space. Let f1, f2, ..., fn be n functions in V. The n functions are linearly independent in V if

    c1f1 + c2f2 + ··· + cnfn = 0

implies that all ci = 0. Note that the zero function 0 takes the value zero at every point of the domain. Thus the functions are linearly independent if

    c1f1(x) + c2f2(x) + ··· + cnfn(x) = 0   for all x ∈ ℝ

implies that all ci = 0. By differentiating n − 1 times, one obtains n equations

    c1f1^(k)(x) + c2f2^(k)(x) + ··· + cnfn^(k)(x) = 0,   k = 0, 1, ..., n − 1,

for all x ∈ ℝ, or, in matrix form,

    [ f1(x)        f2(x)        ···  fn(x)       ] [ c1 ]   [ 0 ]
    [ f1′(x)       f2′(x)       ···  fn′(x)      ] [ c2 ]   [ 0 ]
    [   ⋮            ⋮                  ⋮        ] [  ⋮ ] = [ ⋮ ]
    [ f1^(n−1)(x)  f2^(n−1)(x)  ···  fn^(n−1)(x) ] [ cn ]   [ 0 ].
The determinant of the coefficient matrix is called the Wronskian for {f1(x), f2(x), ..., fn(x)} and is denoted by W(x). If there is a point x0 ∈ ℝ such that W(x0) ≠ 0, then the coefficient matrix is nonsingular at x = x0, and so all ci = 0. Therefore,

    if the Wronskian W(x) ≠ 0 for at least one x ∈ ℝ, then f1, f2, ..., fn are linearly independent.
However, the Wronskian W(x) = 0 for all x does not imply linear dependence of the given functions. In fact, W(x) = 0 means only that the functions are linearly dependent at each point x ∈ ℝ, and the constants ci giving a nontrivial linear dependence may vary as x varies in the domain. (See Example 3.25 (2).)

Example 3.25 (Test the linear independence of functions by the Wronskian)
(1) For the sets of functions F1 = {x, cos x, sin x} and F2 = {x, eˣ, e⁻ˣ}, the Wronskians are

    W1(x) = det [ x    cos x    sin x
                  1   −sin x    cos x
                  0   −cos x   −sin x ] = x

and

    W2(x) = det [ x   eˣ    e⁻ˣ
                  1   eˣ   −e⁻ˣ
                  0   eˣ    e⁻ˣ ] = 2x.

Since Wi(x) ≠ 0 for x ≠ 0, both F1 and F2 are linearly independent.
(2) For the set of functions {x|x|, x²} on ℝ, the Wronskian is

    W(x) = det [ x|x|   x²
                 2|x|   2x ] = 2x²|x| − 2x²|x| = 0

for all x. These two functions are linearly dependent on each of (−∞, 0] and [0, ∞), since x|x| = −x² on (−∞, 0] and x|x| = x² on [0, ∞). But they are clearly linearly independent functions on ℝ.            □

Problem 3.28 Show that 1, x, x², ..., xⁿ are linearly independent in the vector space C(ℝ) of continuous functions.
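The Wronskian test is convenient to run with a computer algebra system. The following small sketch is ours (it assumes SymPy) and checks the two sets of Example 3.25 (1).

```python
import sympy as sp

x = sp.symbols('x')

def wronskian(fs):
    # Entry (i, j) is the i-th derivative of f_j; W(x) is the determinant.
    n = len(fs)
    M = sp.Matrix(n, n, lambda i, j: sp.diff(fs[j], x, i))
    return sp.simplify(M.det())

print(wronskian([x, sp.cos(x), sp.sin(x)]))    # W1(x) = x
print(wronskian([x, sp.exp(x), sp.exp(-x)]))   # W2(x) = 2*x
```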
3.10 Exercises

3.1. Let V be the set of all pairs (x, y) of real numbers. Define

    (x, y) + (x1, y1) = (x + x1, y + y1),
    k(x, y) = (kx, y).

Is V a vector space with these operations?

3.2. For x, y ∈ ℝⁿ and k ∈ ℝ, define two operations as

    x ⊕ y = x − y,      k · x = −kx.

The operations on the right-hand sides are the usual ones. Which of the rules in the definition of a vector space are satisfied for (ℝⁿ, ⊕, ·)?
3.3. Determine whether the given set is a vector space with the usual addition and scalar multiplication of functions.
(1) The set of all continuous functions f defined on the interval [−1, 1] such that f(0) = 1.
(2) The set of all continuous functions f defined on the real line ℝ such that lim x→∞ f(x) = 0.
(3) The set of all twice differentiable functions f defined on ℝ such that f″(x) + f(x) = 0.

3.4. Let C²[−1, 1] be the vector space of all functions with continuous second derivatives on the domain [−1, 1]. Which of the following subsets is a subspace of C²[−1, 1]?
(1) W = {f(x) ∈ C²[−1, 1] : f″(x) + f(x) = 0, −1 ≤ x ≤ 1}.
(2) W = {f(x) ∈ C²[−1, 1] : f″(x) + f(x) = x², −1 ≤ x ≤ 1}.

3.5. Which of the following subsets of C[−1, 1] is a subspace of the vector space C[−1, 1] of continuous functions on [−1, 1]?
(1) W = {f(x) ∈ C[−1, 1] : f(−1) = −f(1)}.
(2) W = {f(x) ∈ C[−1, 1] : f(x) ≥ 0 for all x in [−1, 1]}.
(3) W = {f(x) ∈ C[−1, 1] : f(−1) = −2 and f(1) = 2}.
(4) W = {f(x) ∈ C[−1, 1] : f(½) = 0}.
3.6. Show that the set of all matrices of the form AB − BA cannot span the vector space Mn×n(ℝ).

3.7. Does the vector (3, −1, 0, −1) belong to the subspace of ℝ⁴ spanned by the vectors (2, −1, 3, 2), (−1, 1, 1, −3) and (1, 1, 9, −5)?

3.8. Express the given function as a linear combination of functions in the given set Q.
(1) p(x) = −1 − 3x + 3x² and Q = {p1(x), p2(x), p3(x)}, where p1(x) = 1 + 2x + x², p2(x) = 2 + 5x, p3(x) = 3 + 8x − 2x².
(2) p(x) = −2 − 4x + x² and Q = {p1(x), p2(x), p3(x), p4(x)}, where p1(x) = 1 + 2x² + x³, p2(x) = 1 + x + 2x³, p3(x) = −1 − 3x − 4x³, p4(x) = 1 + 2x − x² + x³.

3.9. Is {cos² x, sin² x, 1, eˣ} linearly independent in the vector space C(ℝ)?

3.10. In the n-space ℝⁿ, determine whether or not the given set
is linearly dependent. 3.11. Show that the given sets of functions are linearly independent in the vector space C[-]f, ]fl. (1) {I , x, x 2, x 3 , x 4}
114
Chapter 3. Vector Spaces (2) {I, eX, e2x , e3x }
(3) {I, sinx, cosx, .. . , sinkx , coskx} 3.12. Are the vectors VI = (I, I, 2, 4), V3=(1, -1, -4, 0),
V2 = (2, -I, -5,2), V4=(2, 1, 1,6)
linearly independent in the 4-space ]R4?
3.13. In the 3-space ℝ³, let W be the set of all vectors (x1, x2, x3) that satisfy the equation x1 − x2 − x3 = 0. Prove that W is a subspace of ℝ³, and find a basis for the subspace W.

3.14. Let W be the subspace of C[−π, π] consisting of functions of the form f(x) = a sin x + b cos x. Determine the dimension of W.

3.15. Let V denote the set of all infinite sequences of real numbers:

    V = {x : x = {xi}, i = 1, 2, ..., with each xi ∈ ℝ}.

If x = {xi} and y = {yi} are in V, then x + y is the sequence {xi + yi}, and if c is a real number, then cx is the sequence {cxi}.
(1) Prove that V is a vector space.
(2) Prove that V is not finite dimensional.
3.16. For two matrices A and B for which AB is defined, prove the following statements:
(1) If both A and B have linearly independent column vectors, then the column vectors of AB are also linearly independent.
(2) If both A and B have linearly independent row vectors, then the row vectors of AB are also linearly independent.
(3) If the column vectors of B are linearly dependent, then the column vectors of AB are also linearly dependent.
(4) If the row vectors of A are linearly dependent, then the row vectors of AB are also linearly dependent.

3.17. Let U = {(x, y, z) : 2x + 3y + z = 0} and V = {(x, y, z) : x + 2y − z = 0} be subspaces of ℝ³.
(1) Find a basis for U ∩ V.
(2) Determine the dimension of U + V.
(3) Describe U, V, U ∩ V and U + V geometrically.

3.18. How many 5 × 5 permutation matrices are there? Are they linearly independent? Do they span the vector space M5×5(ℝ)?
(1) A
(3) C
[i IS] ~ [i ~
2 4 -3 0 2 -I 1
3 6 9
n
,
(2) B
(4) D
~
[!
~1 [
2 1 -52 ] , 1 -2 5 0 0 1 -1 1 -1 -23 '1 ] 1 -1 8 3 . 0 -2 2 1 5 -5 5 10
n
3.10. Exercises
3.20. Find the rank of A as a function of x.
3.21. Find the rank and the largest invertible submatrix of each of the following matrices.

3.22. For any nonzero column vectors u and v, show that the matrix A = uvᵀ has rank 1. Conversely, show that every matrix of rank 1 can be written as uvᵀ for some column vectors u and v.
3.23. Determine whether the following statements are true or false, and justify your answers.
(1) The set of all n × n matrices A such that Aᵀ = A⁻¹ is a subspace of the vector space Mn×n(ℝ).
(2) If α and β are linearly independent subsets of a vector space V, so is their union α ∪ β.
(3) If U and W are subspaces of a vector space V with bases α and β respectively, then the intersection α ∩ β is a basis for U ∩ W.
(4) Let U be the row-echelon form of a square matrix A. If the first r columns of U are linearly independent, so are the first r columns of A.
(5) Any two row-equivalent matrices have the same column space.
(6) Let A be an m × n matrix with rank m. Then the column vectors of A span ℝᵐ.
(7) Let A be an m × n matrix with rank n. Then Ax = b has at most one solution.
(8) If U is a subspace of V and x, y are vectors in V such that x + y is contained in U, then x ∈ U and y ∈ U.
(9) Let U and V be vector spaces. Then U is a subspace of V if and only if dim U ≤ dim V.
(10) For any m × n matrix A, dim C(Aᵀ) + dim N(Aᵀ) = m.
4
Linear Transformations
4.1 Basic properties of linear transformations

As shown in Chapter 3, there are many different vector spaces even with the same dimension. The question now is how one can determine whether or not two given vector spaces have the 'same' structure as vector spaces, or can be identified as the same vector space. To answer the question, one has to compare them first as sets, and then see whether their arithmetic rules are the same or not. A usual way of comparing two sets is to define a function between them. When a function f is given between the underlying sets of vector spaces, one can compare the arithmetic rules of the vector spaces by examining whether the function f preserves the two algebraic operations, the vector addition and the scalar multiplication: that is, f(x + y) = f(x) + f(y) and f(kx) = kf(x) for any vectors x, y and any scalar k. In this chapter, we discuss this kind of function between vector spaces.

Definition 4.1 Let V and W be vector spaces. A function T : V → W is called a linear transformation from V to W if for all x, y ∈ V and any scalar k the following conditions hold:
(1) T(x + y) = T(x) + T(y),
(2) T(kx) = kT(x).

We often call T simply linear. It is easy to see that the two conditions for a linear transformation can be combined into a single requirement

    T(x + ky) = T(x) + kT(y).

Geometrically, the linearity is just the requirement that a straight line be transformed to a straight line, since x + ky represents a straight line through x in the direction of y in V, and its image T(x) + kT(y) also represents a straight line through T(x) in the direction of T(y) in W.

J.H. Kwak et al., Linear Algebra © Birkhäuser Boston 2004
Example 4.1 (Linear or not) Consider the following functions:
(1) f : ℝ → ℝ defined by f(x) = 2x;
(2) g : ℝ → ℝ defined by g(x) = x² − x;
(3) h : ℝ² → ℝ² defined by h(x, y) = (x − y, 2x);
(4) k : ℝ² → ℝ² defined by k(x, y) = (xy, x² + 1).

One can easily see that g and k are not linear, while f and h are linear. Moreover, on the 1-space ℝ, all polynomials of degree greater than one are not linear.            □

Example 4.2 (A matrix A as a linear transformation) (1) For an m × n matrix A, the transformation T : ℝⁿ → ℝᵐ defined by the matrix product T(x) = Ax is a linear transformation, by the distributive law A(x + ky) = Ax + kAy for any x, y ∈ ℝⁿ and any scalar k ∈ ℝ. Therefore, a matrix A, identified with the transformation T, may be considered as a linear transformation from ℝⁿ to ℝᵐ.
(2) For a vector space V, the identity transformation id : V → V is defined by id(x) = x for all x ∈ V. If W is another vector space, the zero transformation T0 : V → W is defined by T0(x) = 0 (the zero vector) for all x ∈ V. Clearly, both transformations are linear.            □

The following theorem is a direct consequence of the definition, and the proof is left as an exercise.

Theorem 4.1 Let T : V → W be a linear transformation. Then
(1) T(0) = 0.
(2) For any x1, x2, ..., xn ∈ V and scalars k1, k2, ..., kn,

    T(k1x1 + k2x2 + ··· + knxn) = k1T(x1) + k2T(x2) + ··· + knT(xn).
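Linearity can also be probed numerically by testing the single combined requirement T(x + ky) = T(x) + kT(y) on random samples — such a test can refute linearity but never prove it. The sketch below is our own (assuming NumPy), applied to the maps h and k of Example 4.1.

```python
import numpy as np

rng = np.random.default_rng(0)

def looks_linear(T, dim, trials=100):
    # Check T(x + k*y) == T(x) + k*T(y) on random samples.
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        k = rng.normal()
        if not np.allclose(T(x + k * y), T(x) + k * T(y)):
            return False
    return True

h = lambda v: np.array([v[0] - v[1], 2 * v[0]])         # h(x,y) = (x-y, 2x)
k_map = lambda v: np.array([v[0] * v[1], v[0]**2 + 1])  # k(x,y) = (xy, x^2+1)

print(looks_linear(h, 2), looks_linear(k_map, 2))  # h passes, k fails
```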
Nontrivial and important examples of linear transformations are the rotations, reflections, and projections in geometry.

Example 4.3 (Rotations, reflections and projections) (1) Let θ denote the angle between the x-axis and a fixed vector in ℝ². Then the matrix

    Rθ = [ cos θ  −sin θ
           sin θ   cos θ ]

defines a linear transformation on ℝ² that rotates any vector in ℝ² through the angle θ about the origin. It is called the rotation by the angle θ.
(2) The projection on the x-axis is the linear transformation P : ℝ² → ℝ² defined by, for x = (x, y) ∈ ℝ²,

    P(x) = [ 1  0 ] [ x ]   [ x ]
           [ 0  0 ] [ y ] = [ 0 ].
Figure 4.1. Three linear transformations on ]R2
(3) The linear transformation T : ℝ² → ℝ² defined by, for x = (x, y),

    T(x) = [ 1   0 ] [ x ]   [  x ]
           [ 0  −1 ] [ y ] = [ −y ]

is the reflection about the x-axis.            □
= x in the plane ]R2 .
Example 4.4 (Differentiations and integrations in calculus) In calculus , it is well known that two transformations
defined by differentiation and integration, D(f)(x) = !,(x),
I(f)(x) =
i
X
f(t)dt,
satisfy linearity, and so they are linear transformations . Many problems related with differential and integral equations may be reformulated in terms of linear transformations . 0 Definition 4.2 Let V and W be two vector spaces, and let T : V -+ W be a linear transformation from V into W. (1) Ker(T) = (v E V : T(v) = O} ~ V is called the kernel of T. (2) Im(T) = (T(v) E W : v E V} = T(V) ~ W is called the image of T . Example 4.5 Let V and W be vector spaces and let id : V -+ V and To : V -+ W be the identity and the zero transformations, respectively. Then it is easy to see that Ker(id) = (OJ, Im(id) = V, Ker(To) = V and Im(To) = to}. 0 Theorem 4.2 Let T : V -+ W be a linear transformation from a vector space V to a vector space W. Then the kernel Ker(T) and the image Im(T) are subspaces of V and W, respectively.
120
Chapter 4. Linear Transformations
Proof: Since T(O) = 0, each of Ker(T) and Im (T) is nonempty having O. (1) For any x, y E Ker(T) and for any scalar k , T(x
+ ky) = T(x) + kT(y) = 0 + kO = O.
Hence x + ky E Ker(T), so that Ker(T) is a subspace of V . (2) If v, W E Im(T), then there exist x and y in V such that T(x) = v and T(y) = w. Thus, for any scalar k,
v + kw = T(x) + kT(y) = T(x + ky) . Thus v
+ kw
E Im(T), so that Im(T) is a subspace of
o
W.
=
Example 4.6 (Ker(A) N(A) and Im(A) = C(A)forany matrix A) Let A : JRn --+ JRm be the linear transformation defined by an m x n matrix A as in Example 4.2(1). The kernel Ker(A) of A consists of all solutions of the homogeneous system Ax = O. Hence, the kernel Ker(A) of A is nothing but the null space N(A) of the matrix A, and the image Im(A) of A is just the column space C(A) = Im(A) ~ JRm of the matrix A. Recall that Ax is a linear combination of the column vectors of A. 0 Example 4.7 (The trace is linear) A trace is a function tr : M nxn (JR) --+ JR defined by the sum of diagonal entries n
tr(A) = au
+ a22 + . . . + ann =
Laii i=1
for A = [aij] E M n xn (JR), and tr(A) is called the trace of the matrix A. It is easy to show that tr(A + B) = tr(A) + tr(B) and tr(kA) = k tr(A) for any two matrices A and B in Mnxn(JR), which means that 'tr' is a linear transformation from Mnxn(JR) to the l -space R In addition, one can easily show that the set of all nxn matrices with trace 0 is a subspace of the vector space Mnxn(JR). 0 Problem 4.2 Let W = (A basis for W .
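The linearity of the trace, and the identity tr(AB) = tr(BA) asked for in Problem 4.3 below, are quick to sanity-check numerically; the random matrices here are our own sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-5, 5, size=(3, 3)).astype(float)
B = rng.integers(-5, 5, size=(3, 3)).astype(float)
k = 2.0

# Linearity of the trace ...
print(np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)))  # True
print(np.isclose(np.trace(k * A), k * np.trace(A)))            # True
# ... and the identity of Problem 4.3:
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))            # True
```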
E
Mnxn(R) : tr(A) = OJ. Show that W is a subspace, and find a
Problem 4.3 Showthat, for any matrices A and B in Mn xn (R) , tr(AB) = tr(BA) . One of the most important properties of linear transformations is that they are completely determined by their values on a basis.
Theorem 4.3 Let V and W be vector spaces. Let {VI , V2, ... , vnl be a bas is for V and let WI, W2, . . . , Wn be any vectors (possibly repeated) in W. Then there exists a unique linear transformation T : V --+ W such that T (Vi) Wi for i = 1, 2, . . . , n.
=
Proof: Let x E V . Then it has a unique expression: x = 2:7=1 aiv, for some scalars aI, a2, .. . , an. Define
4.1. Basic properties of linear transformation
121
    T(x) = ∑_{i=1}^n a_i w_i.

In particular, T(v_i) = w_i for i = 1, 2, ..., n.
Linearity: For x = ∑_{i=1}^n a_i v_i, y = ∑_{i=1}^n b_i v_i ∈ V and a scalar k, we have x + ky = ∑_{i=1}^n (a_i + kb_i)v_i. Then

    T(x + ky) = ∑_{i=1}^n (a_i + kb_i)w_i = ∑_{i=1}^n a_i w_i + k ∑_{i=1}^n b_i w_i = T(x) + kT(y).

Uniqueness: Suppose that S : V → W is linear and S(v_i) = w_i for i = 1, 2, ..., n. Then for any x ∈ V with x = ∑_{i=1}^n a_i v_i, we have

    S(x) = ∑_{i=1}^n a_i S(v_i) = ∑_{i=1}^n a_i w_i = T(x).

Hence S = T.  □

The uniqueness in Theorem 4.3 may be rephrased as the following corollary.

Corollary 4.4 Let V and W be vector spaces and let {v_1, v_2, ..., v_n} be a basis for V. If S, T : V → W are linear transformations and S(v_i) = T(v_i) for i = 1, 2, ..., n, then S = T, i.e., S(x) = T(x) for all x ∈ V.
Example 4.8 (Linear extension of a transformation defined on a basis) Let w_1 = (1, 0), w_2 = (2, -1), w_3 = (4, 3) be three vectors in ℝ².
(1) Let α = {e_1, e_2, e_3} be the standard basis for the 3-space ℝ³, and let T : ℝ³ → ℝ² be the linear transformation defined by T(e_i) = w_i for i = 1, 2, 3. Find a formula for T(x_1, x_2, x_3), and then use it to compute T(2, -3, 5).
(2) Let β = {v_1, v_2, v_3} be another basis for ℝ³, where v_1 = (1, 1, 1), v_2 = (1, 1, 0), v_3 = (1, 0, 0), and let T : ℝ³ → ℝ² be the linear transformation defined by T(v_i) = w_i for i = 1, 2, 3. Find a formula for T(x_1, x_2, x_3), and then use it to compute T(2, -3, 5).

Solution: (1) For x = (x_1, x_2, x_3) = ∑_{i=1}^3 x_i e_i in ℝ³,

    T(x) = ∑_{i=1}^3 x_i T(e_i) = ∑_{i=1}^3 x_i w_i
         = x_1(1, 0) + x_2(2, -1) + x_3(4, 3)
         = (x_1 + 2x_2 + 4x_3, -x_2 + 3x_3).

Thus T(2, -3, 5) = (16, 18). In matrix notation, this can be written as

    [ 1  2  4 ] [ x_1 ]   [ x_1 + 2x_2 + 4x_3 ]
    [ 0 -1  3 ] [ x_2 ] = [    -x_2 + 3x_3    ].
                [ x_3 ]

(2) In this case, we need to express x = (x_1, x_2, x_3) as a linear combination of v_1, v_2, v_3, i.e.,

    (x_1, x_2, x_3) = ∑_{i=1}^3 k_i v_i = k_1(1, 1, 1) + k_2(1, 1, 0) + k_3(1, 0, 0).

By equating corresponding components, we obtain the system of equations

    k_1 + k_2 + k_3 = x_1
    k_1 + k_2       = x_2
    k_1             = x_3.

The solution is k_1 = x_3, k_2 = x_2 - x_3, k_3 = x_1 - x_2. Therefore,

    (x_1, x_2, x_3) = x_3 v_1 + (x_2 - x_3)v_2 + (x_1 - x_2)v_3,

and

    T(x_1, x_2, x_3) = x_3 T(v_1) + (x_2 - x_3)T(v_2) + (x_1 - x_2)T(v_3)
                     = x_3(1, 0) + (x_2 - x_3)(2, -1) + (x_1 - x_2)(4, 3)
                     = (4x_1 - 2x_2 - x_3, 3x_1 - 4x_2 + x_3).

From this formula, we obtain T(2, -3, 5) = (9, 23).  □
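The two computations in Example 4.8 can be reproduced numerically. A minimal sketch (numpy assumed, not part of the text): in part (1) the matrix of T has the w_i as columns, and in part (2) the coordinates k_i are found by solving a linear system:

```python
import numpy as np

# The images w1, w2, w3 of the basis vectors, from Example 4.8 (as rows).
w = np.array([[1, 0], [2, -1], [4, 3]], dtype=float)

# (1) T(e_i) = w_i: the matrix of T has the w_i as columns.
T1 = w.T
x = np.array([2, -3, 5], dtype=float)
assert np.allclose(T1 @ x, [16, 18])

# (2) T(v_i) = w_i for v1 = (1,1,1), v2 = (1,1,0), v3 = (1,0,0):
# solve V k = x for the coordinates k, then T(x) = k1*w1 + k2*w2 + k3*w3.
V = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0]], dtype=float).T  # v_i as columns
k = np.linalg.solve(V, x)       # here k = (5, -8, 5): x3, x2-x3, x1-x2
assert np.allclose(w.T @ k, [9, 23])
```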
Problem 4.4 Is there a linear transformation T : ℝ³ → ℝ² such that T(3, 1, 0) = (1, 1) and T(-6, -2, 0) = (2, 1)? If yes, can you find an expression for T(x), x = (x_1, x_2, x_3) ∈ ℝ³?

Problem 4.5 Let V and W be vector spaces and let T : V → W be linear. Let {w_1, w_2, ..., w_k} be a linearly independent subset of the image Im(T) ⊆ W. Suppose that α = {v_1, v_2, ..., v_k} is chosen so that T(v_i) = w_i for i = 1, 2, ..., k. Prove that α is linearly independent.
4.2 Invertible linear transformations

A function f from a set X to a set Y is said to be invertible if there is a function g from Y to X such that their compositions satisfy g ∘ f = id and f ∘ g = id. Such a function g is called the inverse function of f and is denoted by g = f^{-1}. Notice that if there exists an invertible function from a set X to another set Y, then it gives a one-to-one correspondence between these two sets, so that they can be identified as sets. A useful criterion for a function between two given sets to be invertible is that it is one-to-one and onto. Recall that a function f : X → Y is one-to-one (or injective) if f(u) = f(v) in Y implies u = v in X, and is onto (or surjective) if for each element y in Y there exists an element x in X such that f(x) = y. A function is said to be bijective if it is both one-to-one and onto, that is, if for each element y in Y there is a unique element x in X such that f(x) = y.

Lemma 4.5 A function f : X → Y is invertible if and only if it is bijective (i.e., one-to-one and onto).
Proof: Suppose f : X → Y is invertible, and let g : Y → X be its inverse. If f(u) = f(v), then u = g(f(u)) = g(f(v)) = v; thus f is one-to-one. For each y ∈ Y, let g(y) = x in X. Then f(x) = f(g(y)) = y; thus f is onto.
Conversely, suppose f is bijective. Then, for each y ∈ Y, there is a unique x ∈ X such that f(x) = y. Now for each y ∈ Y define g(y) = x. One can easily check that g : Y → X is well defined, and that f ∘ g = id and g ∘ f = id, i.e., g is the inverse function of f.  □

If T : V → W and S : W → Z are linear transformations, then it is easy to show that their composition (S ∘ T)(v) = S(T(v)) is also a linear transformation. In particular, if two linear transformations are defined by matrices A : ℝ^n → ℝ^m and B : ℝ^m → ℝ^k as in Example 4.2(1), then their composition is nothing but the matrix product BA, i.e., (B ∘ A)(x) = B(Ax) = (BA)x.

The following lemma shows that if a given function is an invertible linear transformation from one vector space to another, then linearity is preserved by inversion.

Lemma 4.6 Let V and W be vector spaces. If T : V → W is an invertible linear transformation, then its inverse T^{-1} : W → V is also linear.

Proof: Let w_1, w_2 ∈ W, and let k be any scalar. Since T is invertible, there exist unique vectors v_1 and v_2 in V such that T(v_1) = w_1 and T(v_2) = w_2. Then

    T^{-1}(w_1 + kw_2) = T^{-1}(T(v_1) + kT(v_2)) = T^{-1}(T(v_1 + kv_2))
                       = v_1 + kv_2 = T^{-1}(w_1) + kT^{-1}(w_2).  □
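That the composition of matrix-defined linear maps is the matrix product BA can be checked numerically. A quick sketch on random matrices (numpy assumed, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))   # A : R^4 -> R^3
B = rng.standard_normal((2, 3))   # B : R^3 -> R^2
x = rng.standard_normal(4)

# (B o A)(x) = B(Ax) = (BA)x
assert np.allclose(B @ (A @ x), (B @ A) @ x)
```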
Definition 4.3 A linear transformation T : V → W from a vector space V to another W is called an isomorphism if it is invertible (equivalently, one-to-one and onto). In this case, we say that V and W are isomorphic to each other.

Example 4.9 (The vector space P_n(ℝ) is isomorphic to ℝ^{n+1}) Consider the vector space P_2(ℝ) = {a + bx + cx² : a, b, c ∈ ℝ} of all polynomials of degree ≤ 2 with real coefficients. To each polynomial a + bx + cx² in the space P_2(ℝ), one can assign the column vector [a b c]^T in ℝ³. It is not hard to see that this assignment is an isomorphism from the vector space P_2(ℝ) to the 3-space ℝ³, by which one can identify the polynomial a + bx + cx² with the column vector [a b c]^T. It means that these two vector spaces can be considered as the same vector space through the isomorphism. In this sense, one often says that a vector space can be identified with another if they are isomorphic to each other. In general, the vector space P_n(ℝ) can be identified with the (n+1)-space ℝ^{n+1}.  □

It is clear from Lemma 4.6 that if T is an isomorphism, then its inverse T^{-1} is also an isomorphism, with (T^{-1})^{-1} = T. In particular, if a linear transformation A : ℝ^n → ℝ^n is defined by an invertible n×n matrix A as in Example 4.2(1), then the inverse matrix A^{-1} gives the inverse linear transformation, so it is also an isomorphism of ℝ^n. That is, a linear transformation A : ℝ^n → ℝ^n defined by an n×n square matrix A is an isomorphism if and only if A is invertible, that is, rank A = n.
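The identification in Example 4.9 can be sketched in code. Here a polynomial a + bx + cx² is represented by its coefficient triple, which is an assumption of this sketch rather than notation from the text:

```python
import numpy as np

# Coordinate isomorphism P2(R) -> R^3:  a + bx + cx^2  |->  [a, b, c]^T.
# Polynomials are represented by coefficient triples (a, b, c).
def coord(p):
    return np.array(p, dtype=float)

p, q, k = (1.0, 2.0, 3.0), (4.0, 0.0, -1.0), 5.0
kp_plus_q = tuple(k * a + b for a, b in zip(p, q))   # the polynomial k*p + q

# The map is linear: coord(k*p + q) = k*coord(p) + coord(q) ...
assert np.allclose(coord(kp_plus_q), k * coord(p) + coord(q))
# ... and the coefficients can be read back off the vector, so the map is
# one-to-one and onto, i.e., an isomorphism.
```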
Problem 4.6 Suppose that S and T are linear transformations whose composition S ∘ T is well defined. Prove that
(1) if S ∘ T is one-to-one, so is T;
(2) if S ∘ T is onto, so is S;
(3) if S and T are isomorphisms, so is S ∘ T;
(4) if A and B are two n×n matrices of rank n, so is AB.

Problem 4.7 Let T : V → W be a linear transformation. Prove that
(1) T is one-to-one if and only if Ker(T) = {0};
(2) if V = W, then T is one-to-one if and only if T is onto.

Theorem 4.7 Two vector spaces V and W are isomorphic if and only if dim V = dim W.
Proof: Let T : V → W be an isomorphism, and let {v_1, v_2, ..., v_n} be a basis for V. We show that the set {T(v_1), T(v_2), ..., T(v_n)} is a basis for W, so that dim W = n = dim V.
(1) It is linearly independent: Since T is one-to-one, the equation

    0 = c_1 T(v_1) + c_2 T(v_2) + ... + c_n T(v_n) = T(c_1 v_1 + c_2 v_2 + ... + c_n v_n)

implies that 0 = c_1 v_1 + c_2 v_2 + ... + c_n v_n. Since the v_i's are linearly independent, we have c_i = 0 for all i = 1, 2, ..., n.
(2) It spans W: Since T is onto, for any y ∈ W there exists an x ∈ V such that T(x) = y. Write x = ∑_{i=1}^n a_i v_i. Then

    y = T(x) = ∑_{i=1}^n a_i T(v_i),

i.e., y is a linear combination of T(v_1), T(v_2), ..., T(v_n).
Conversely, if dim V = dim W = n, then one can choose bases {v_1, v_2, ..., v_n} and {w_1, w_2, ..., w_n} for V and W, respectively. By Theorem 4.3, there exist linear transformations T : V → W and S : W → V such that T(v_i) = w_i and S(w_i) = v_i for i = 1, 2, ..., n. Clearly, (S ∘ T)(v_i) = v_i and (T ∘ S)(w_i) = w_i for i = 1, 2, ..., n, which implies that S ∘ T and T ∘ S are the identity transformations on V and W, respectively, by the uniqueness in Corollary 4.4. Hence, T and S are isomorphisms, and consequently V and W are isomorphic.  □

Corollary 4.8 Let V and W be vector spaces.
(1) If dim V = n, then V is isomorphic to the n-space ℝ^n.
(2) If dim V = dim W, then any bijective function from a basis for V to a basis for W can be extended to an isomorphism from V to W.
An isomorphism between a vector space V and ℝ^n in Corollary 4.8 depends on the choices of bases for the two spaces, as shown in Theorem 4.7. However, an isomorphism is uniquely determined if we fix the bases in which the order of the vectors is also fixed. An ordered basis for a vector space is a basis endowed with a specific order. For example, in the 3-space ℝ³, the two bases {e_1, e_2, e_3} with the order e_1, e_2, e_3 and {e_2, e_1, e_3} with the order e_2, e_1, e_3 are clearly different as ordered bases, but the same as unordered ones. The basis {e_1, e_2, ..., e_n} with the order e_1, e_2, ..., e_n is called the standard ordered basis for ℝ^n. However, we often simply say a basis for an ordered basis if there is no ambiguity in the context.

Let V be a vector space of dimension n with an ordered basis α = {v_1, v_2, ..., v_n}, and let β = {e_1, e_2, ..., e_n} be the standard ordered basis for ℝ^n. Then the isomorphism Φ : V → ℝ^n defined by Φ(v_i) = e_i is called the natural isomorphism with respect to the basis α. By this isomorphism, a vector in V can be identified with a column vector in ℝ^n. In fact, for any x = ∑_{i=1}^n a_i v_i ∈ V, the image of x under this natural isomorphism is

    Φ(x) = ∑_{i=1}^n a_i Φ(v_i) = ∑_{i=1}^n a_i e_i = (a_1, ..., a_n) ∈ ℝ^n,

which is called the coordinate vector of x with respect to the ordered basis α, and is denoted by [x]_α. Clearly [v_i]_α = e_i.

Example 4.10 (1) Recall from Example 4.3 that the rotation through the angle θ of ℝ² is given by the matrix

    R_θ = [ cos θ  -sin θ ]
          [ sin θ   cos θ ].

Clearly, it is invertible and hence is an isomorphism of ℝ². In fact, the inverse R_θ^{-1} is another rotation, R_{-θ}.
(2) Let α = {e_1, e_2} be the standard ordered basis, and let β = {v_1, v_2}, where v_i = R_θ e_i, i = 1, 2. Then β is also a basis for ℝ². The coordinate vectors of v_i with respect to α are

    [v_1]_α = [ cos θ ],   [v_2]_α = [ -sin θ ],
              [ sin θ ]               [  cos θ ]

while

    [v_1]_β = [ 1 ],   [v_2]_β = [ 0 ].
              [ 0 ]               [ 1 ]

If we choose α' = {e_2, e_1} as a different ordered basis for ℝ², then the coordinate vectors of v_i with respect to α' are

    [v_1]_{α'} = [ sin θ ],   [v_2]_{α'} = [  cos θ ].  □
                 [ cos θ ]                 [ -sin θ ]
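Example 4.10's coordinate vectors can be reproduced numerically: the coordinates of a vector with respect to the rotated basis β solve a 2×2 linear system (a sketch, numpy assumed):

```python
import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
v1, v2 = R[:, 0], R[:, 1]          # beta = {v1, v2}, where v_i = R_theta e_i

# [x]_beta solves  x = x'*v1 + y'*v2,  i.e.  R [x]_beta = x.
x = np.array([3.0, -1.0])
x_beta = np.linalg.solve(R, x)
assert np.allclose(R @ x_beta, x)

# The basis vectors themselves have standard coordinates with respect to beta:
assert np.allclose(np.linalg.solve(R, v1), [1, 0])
assert np.allclose(np.linalg.solve(R, v2), [0, 1])
```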
Example 4.11 (All reflections are of the form R_θ ∘ T ∘ R_{-θ}) In the plane ℝ², the reflection about the line y = x can be obtained as the composition of the rotation through -π/4, the reflection about the x-axis, and the rotation through π/4. Actually, it is the product of the matrices given in (1) and (3) of Example 4.3 with θ = π/4. Note that the rotation through π/4 is

    R_{π/4} = [ cos(π/4)  -sin(π/4) ] = (1/√2) [ 1  -1 ],
              [ sin(π/4)   cos(π/4) ]          [ 1   1 ]

the reflection about the x-axis is

    [ 1   0 ],
    [ 0  -1 ]

and R_{-π/4} = R_{π/4}^{-1}. Hence, the matrix for the reflection about the line y = x is

    R_{π/4} [ 1  0 ] R_{π/4}^{-1} = (1/√2)[ 1 -1 ] [ 1  0 ] (1/√2)[  1  1 ] = [ 0  1 ].
            [ 0 -1 ]                      [ 1  1 ] [ 0 -1 ]       [ -1  1 ]   [ 1  0 ]

In general, the reflection about a line ℓ through the origin in the plane can be expressed as the composition R_θ ∘ T ∘ R_{-θ}, where T is the reflection about the x-axis and θ is the angle between the x-axis and the line ℓ (see Figure 4.2).  □
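A quick numerical check of Example 4.11 (numpy assumed, not part of the text): conjugating the x-axis reflection by R_{π/4} gives the coordinate-swap matrix, and for any angle θ the construction yields an involution, as a reflection must be:

```python
import numpy as np

def rotation(t):
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

T = np.array([[1.0, 0.0], [0.0, -1.0]])   # reflection about the x-axis

# Reflection about the line y = x: theta = pi/4.
F = rotation(np.pi / 4) @ T @ rotation(-np.pi / 4)
assert np.allclose(F, [[0, 1], [1, 0]])

# Sanity check: R_theta T R_{-theta} squares to the identity for any theta.
G = rotation(1.2) @ T @ rotation(-1.2)
assert np.allclose(G @ G, np.eye(2))
```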
In general, the reflection about a line .e in the plane can be expressed as the composition Re 0 T 0 R-e , where T is the reflection about the x-axis and 0 is the 0 angle between the x-axis and the line .e (see Figure 4.2). Problem4.8 Find the matrix of reflectionabout the line y
= ,J3x in ]RZ.
Problem4.9 Find the coordinatevectorof 5 + 2x + 3x z with respectto the givenorderedbasis ex for PZ(]R) : (l)ex={l , x ,x z}; (2)ex={I+x ,l+x z, x+ x z}.
4.3 Matrices of linear transformations

We have seen that the product of an m×n matrix A and an n×1 column matrix x gives rise to a linear transformation from ℝ^n to ℝ^m. Conversely, one can show that a linear transformation of one vector space into another can be represented by a matrix via
[Figure 4.2. The reflection R_θ ∘ T ∘ R_{-θ}: rotate x by -θ, reflect about the x-axis, then rotate back by θ.]
the natural isomorphism between an n-dimensional vector space V and the n-space ℝ^n, which will be shown in this section.

Let T : V → W be a linear transformation from an n-dimensional vector space V to an m-dimensional vector space W, and let α = {v_1, ..., v_n} and β = {w_1, ..., w_m} be any ordered bases for V and W, respectively, which will be fixed throughout this section. Then by Theorem 4.3 the linear transformation T is completely determined by its values on the basis α. Write them as

    T(v_1) = a_{11} w_1 + a_{21} w_2 + ... + a_{m1} w_m
    T(v_2) = a_{12} w_1 + a_{22} w_2 + ... + a_{m2} w_m
       ⋮
    T(v_n) = a_{1n} w_1 + a_{2n} w_2 + ... + a_{mn} w_m,

or, in short form,

    T(v_j) = ∑_{i=1}^m a_{ij} w_i   for 1 ≤ j ≤ n,

for some scalars a_{ij} (i = 1, 2, ..., m; j = 1, 2, ..., n). Now, for any vector x = ∑_{j=1}^n x_j v_j ∈ V,

    T(x) = ∑_{j=1}^n x_j T(v_j) = ∑_{j=1}^n x_j ∑_{i=1}^m a_{ij} w_i = ∑_{i=1}^m (∑_{j=1}^n a_{ij} x_j) w_i.
Equivalently, the coordinate vector of T(x) with respect to the basis β in W is

    [T(x)]_β = [ ∑_{j=1}^n a_{1j} x_j ]   [ a_{11} ... a_{1n} ] [ x_1 ]
               [         ⋮           ] = [   ⋮     ⋱    ⋮   ] [  ⋮  ] = A [x]_α.
               [ ∑_{j=1}^n a_{mj} x_j ]   [ a_{m1} ... a_{mn} ] [ x_n ]

That is, for any x ∈ V the coordinate vector [T(x)]_β of T(x) in W is just the product of a fixed matrix A and the coordinate vector [x]_α of x. This situation can be incorporated
[Figure 4.3. The associated matrix for T: the diagram sending x ↦ T(x) on top and [x]_α ↦ A[x]_α = [T(x)]_β below commutes.]
in the commutative diagram in Figure 4.3, with the natural isomorphisms Φ and Ψ defined in Section 4.2. Note that the commutativity of the diagram means that A ∘ Φ = Ψ ∘ T, and that A is the matrix whose column vectors are just the coordinate vectors [T(v_j)]_β of T(v_j) with respect to the basis β. In fact, A = [a_{ij}] is just the transpose of the coefficient matrix in the expression of the T(v_j) with respect to the basis β in W. Note that this matrix [T]^β_α is unique, since the coordinate expression of a vector with respect to a fixed basis is unique.
Definition 4.4 The matrix A is called the associated matrix for T (or the matrix representation of T) with respect to the ordered bases α and β, and is denoted by A = [T]^β_α. When V = W and α = β, we simply write [T]_α for [T]^α_α.

Now, the argument so far can be summarized in the following theorem.

Theorem 4.9 Let T : V → W be a linear transformation from an n-dimensional vector space V to an m-dimensional vector space W. For fixed ordered bases α = {v_1, v_2, ..., v_n} for V and β for W, there corresponds a unique associated m×n matrix [T]^β_α for T such that for any vector x ∈ V the coordinate vector [T(x)]_β of T(x) with respect to β is given as the matrix product of the associated matrix [T]^β_α for T and the coordinate vector [x]_α, i.e., [T(x)]_β = [T]^β_α [x]_α. The associated matrix [T]^β_α is given as

    [T]^β_α = [ [T(v_1)]_β  [T(v_2)]_β  ...  [T(v_n)]_β ].
The following examples illustrate the computation of the associated matrices for linear transformations.

Example 4.12 (The associated matrix [id]_α) Let id : V → V be the identity transformation on a vector space V. Then for any ordered basis α for V, the matrix [id]_α = I, the identity matrix, because if α = {v_1, v_2, ..., v_n}, then

    id(v_1) = 1v_1 + 0v_2 + ... + 0v_n
    id(v_2) = 0v_1 + 1v_2 + ... + 0v_n
       ⋮
    id(v_n) = 0v_1 + 0v_2 + ... + 1v_n.  □
Example 4.13 (The associated matrix [T]~) Let T : PI (lR) ~ P2(lR) be the linear transformation defined by (T(p»)(x) = xp(x) . Find the associated matrix [T]~ with respected to ordered bases a = {I, x} and f3 = {I , x, x 2 } for PI(lR) and P2(lR),respectiveiy. Solution: Clearly,
(T(I»(x) = x = O· I { (T(x»(x) = x 2 = O· I
[! n.
+ +
Ix
Ox
+ +
Ox2
Ix 2.
Hence, the associated matrix for T is the transpose of the coefficient matrix in this expression, that is,
[T]~ ~
0
Example 4.14 (The associated matrices [T]^β_α and [T]^{β'}_α) Let T : ℝ² → ℝ³ be the linear transformation defined by T(x, y) = (x + 2y, 0, 2x + 3y), with the standard bases α and β for ℝ² and ℝ³, respectively. Then

    T(e_1) = T(1, 0) = (1, 0, 2) = 1e_1 + 0e_2 + 2e_3
    T(e_2) = T(0, 1) = (2, 0, 3) = 2e_1 + 0e_2 + 3e_3.

Hence,

    [T]^β_α = [ 1  2 ].
              [ 0  0 ]
              [ 2  3 ]

If β' = {e_2, e_3, e_1}, then

    [T]^{β'}_α = [ 0  0 ].  □
                 [ 2  3 ]
                 [ 1  2 ]
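The recipe of Theorem 4.9 applied to Example 4.14, as a sketch in numpy (the library is an assumption, not part of the text): the columns of the associated matrix are the images of the basis vectors, and reordering the codomain basis permutes the rows:

```python
import numpy as np

def T(v):                       # T(x, y) = (x + 2y, 0, 2x + 3y)
    x, y = v
    return np.array([x + 2 * y, 0.0, 2 * x + 3 * y])

# Columns of the associated matrix are the images of the basis vectors.
A = np.column_stack([T(np.array([1.0, 0.0])), T(np.array([0.0, 1.0]))])
assert np.allclose(A, [[1, 2], [0, 0], [2, 3]])

# [T(x)]_beta = A [x]_alpha for every x.
v = np.array([-2.0, 5.0])
assert np.allclose(T(v), A @ v)

# Reordering the codomain basis to beta' = {e2, e3, e1} permutes the rows.
A_perm = A[[1, 2, 0], :]
assert np.allclose(A_perm, [[0, 0], [2, 3], [1, 2]])
```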
Example 4.15 (The associated matrix [T]_α for the standard basis α) Let T : ℝ² → ℝ² be the linear transformation given by T(1, 1) = (0, 1) and T(-1, 1) = (2, 3). Find the matrix representation [T]_α of T with respect to the standard basis α = {e_1, e_2}.

Solution: Note that (a, b) = ae_1 + be_2 for any (a, b) ∈ ℝ². Thus, the definition of T shows

     T(e_1) + T(e_2) = T(e_1 + e_2)  = T(1, 1)  = (0, 1) = e_2,
    -T(e_1) + T(e_2) = T(-e_1 + e_2) = T(-1, 1) = (2, 3) = 2e_1 + 3e_2.

By solving these equations, we obtain T(e_1) = -e_1 - e_2 and T(e_2) = e_1 + 2e_2. Therefore,

    [T]_α = [ -1  1 ].  □
            [ -1  2 ]
Example 4.16 (The associated matrix [T]_β for a non-standard basis β) Let T be the linear transformation given in Example 4.15. Find [T]_β for the basis β = {v_1, v_2}, where v_1 = (0, 1) and v_2 = (2, 3).

Solution: From Example 4.15,

    [T(v_1)]_α = [ -1  1 ] [ 0 ] = [ 1 ],   [T(v_2)]_α = [ -1  1 ] [ 2 ] = [ 1 ].
                 [ -1  2 ] [ 1 ]   [ 2 ]                 [ -1  2 ] [ 3 ]   [ 4 ]

To write these vectors as linear combinations of the basis vectors in β, we put

    [ 1 ] = a v_1 + b v_2 = [ 2b     ],   [ 1 ] = c v_1 + d v_2 = [ 2d     ].
    [ 2 ]                   [ a + 3b ]    [ 4 ]                   [ c + 3d ]

Solving for a, b, c and d, we obtain

    [T(v_1)]_β = [ a ] = (1/2)[ 1 ],   [T(v_2)]_β = [ c ] = (1/2)[ 5 ].
                 [ b ]        [ 1 ]                 [ d ]        [ 1 ]

Therefore,

    [T]_β = (1/2)[ 1  5 ].  □
                 [ 1  1 ]
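Examples 4.15 and 4.16 can be verified together in a few lines (numpy assumed, not part of the text). Here the change to the basis β = {v_1, v_2} is carried out by solving B X = [T]_α B with B = [v_1 v_2], which simply packages the two linear systems solved by hand in Example 4.16; Section 4.6 treats this relation systematically as a similarity:

```python
import numpy as np

T_alpha = np.array([[-1.0, 1.0], [-1.0, 2.0]])      # [T]_alpha, Example 4.15

# It reproduces the defining data T(1,1) = (0,1) and T(-1,1) = (2,3):
assert np.allclose(T_alpha @ [1, 1], [0, 1])
assert np.allclose(T_alpha @ [-1, 1], [2, 3])

# [T]_beta for beta = {v1, v2}: columns of B are v1 = (0,1) and v2 = (2,3).
B = np.array([[0.0, 2.0], [1.0, 3.0]])
T_beta = np.linalg.solve(B, T_alpha @ B)            # solves B X = T_alpha B
assert np.allclose(T_beta, np.array([[1, 5], [1, 1]]) / 2)
```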
Remark: (1) Recall that any m×n matrix A can be considered as a linear transformation from the n-space ℝ^n to the m-space ℝ^m via x ↦ Ax. Clearly, its matrix representation with respect to the standard bases α for ℝ^n and β for ℝ^m is the matrix A itself, i.e., A = [A]^β_α. (Note that Ae_j is just the j-th column vector of A.) In particular, if A is an invertible n×n square matrix, then the column vectors c_1, c_2, ..., c_n form another basis γ for ℝ^n. Thus, A is simply the linear transformation on ℝ^n that takes the standard basis α to γ; in fact, Ae_j = c_j is the j-th column of A, so that its matrix representation [A]^γ_α is the identity matrix.

(2) Let V and W be vector spaces with bases α and β, respectively, and let T : V → W be a linear transformation with the matrix representation [T]^β_α = A. Then it is clear that Ker(T) and Im(T) are isomorphic to the null space N(A) and the column space C(A), respectively, via the natural isomorphisms. In particular, if V = ℝ^n and W = ℝ^m with the standard bases, then Ker(T) = N(A) and Im(T) = C(A). Therefore, from Theorem 3.17, we have

    dim Ker(T) + dim Im(T) = dim V.
(3) Let Ax = b be a system of linear equations with an m×n coefficient matrix A. By considering the matrix A as a linear transformation from ℝ^n to ℝ^m, one obtains further conditions equivalent to those mentioned in Theorems 3.24 and 3.25: the conditions in Theorem 3.24 (e.g., C(A) = ℝ^m) are equivalent to the condition that A is surjective, and those in Theorem 3.25 (e.g., N(A) = {0}) are equivalent to the condition that A is one-to-one. This observation gives the proof of (15)-(16) in Theorem 3.26.

Problem 4.10 Find the matrix representations [T]_α and [T]_β of each of the following linear transformations T on ℝ³ with respect to the standard basis α = {e_1, e_2, e_3} and another ordered basis β = {e_3, e_2, e_1}:
(1) T(x, y, z) = (2x - 3y + 4z, 5x - y + 2z, 4x + 7y);
(2) T(x, y, z) = (2y + z, x - 4y, 3x).
Also, find the matrix representation [T]^β_α of each of the linear transformations T.

Problem 4.11 Let T : ℝ⁴ → ℝ³ be the linear transformation defined by T(x, y, z, u) = (x + 2y, x - 3z + u, 2y + 3z + 4u). Let α and β be the standard bases for ℝ⁴ and ℝ³, respectively. Find [T]^β_α.

Problem 4.12 Let id : ℝ^n → ℝ^n be the identity transformation. Let x_k denote the vector in ℝ^n whose first k-1 coordinates are zero and whose last n-k+1 coordinates are 1. Then clearly β = {x_1, ..., x_n} is a basis for ℝ^n (see Problem 3.9). Let α = {e_1, ..., e_n} be the standard basis for ℝ^n. Find the matrix representations [id]^β_α and [id]^α_β.
4.4 Vector spaces of linear transformations

Let V and W be two vector spaces of dimensions n and m. Let L(V; W) denote the set of all linear transformations from V to W, i.e.,

    L(V; W) = {T : T is a linear transformation from V to W}.

For any two linear transformations S and T in L(V; W) and λ ∈ ℝ, we define the sum S + T and the scalar multiplication λS by

    (S + T)(v) = S(v) + T(v)   and   (λS)(v) = λ(S(v))

for any v ∈ V. Clearly, the sum S + T and the scalar multiple λS are also linear and satisfy the operational rules of a vector space, so that L(V; W) becomes a vector space.

Let α and β be two ordered bases for V and W, respectively, and let T : V → W be a linear transformation. Then the associated matrix [T]^β_α of T with respect to these bases is uniquely determined by Theorem 4.9. That is, the function φ : L(V; W) → M_{m×n}(ℝ) defined by φ(T) = [T]^β_α is well defined. Moreover, it is a one-to-one correspondence.

Proof: (1) It is one-to-one: if [S]^β_α = [T]^β_α for S and T in L(V; W), then S = T by Corollary 4.4.
(2) It is onto: for any m×n matrix A (considered as a linear transformation from ℝ^n to ℝ^m), define a linear transformation T : V → W by T = Ψ^{-1} ∘ A ∘ Φ, the composition of A with the natural isomorphisms Φ : V → ℝ^n and Ψ : W → ℝ^m. Then clearly [T]^β_α = A, i.e., φ(T) = A.  □

Furthermore, the following lemma shows that φ preserves the vector space operations:

    [S + T]^β_α = [S]^β_α + [T]^β_α   and   [kS]^β_α = k[S]^β_α.
Proof: Let α = {v_1, ..., v_n} and β = {w_1, ..., w_m}. Then we have unique expressions S(v_j) = ∑_{i=1}^m a_{ij} w_i and T(v_j) = ∑_{i=1}^m b_{ij} w_i for each 1 ≤ j ≤ n, so that [S]^β_α = [a_{ij}] and [T]^β_α = [b_{ij}]. Hence

    (S + T)(v_j) = ∑_{i=1}^m a_{ij} w_i + ∑_{i=1}^m b_{ij} w_i = ∑_{i=1}^m (a_{ij} + b_{ij}) w_i.

Thus [S + T]^β_α = [S]^β_α + [T]^β_α. The proof of the second equality, [kS]^β_α = k[S]^β_α, is similar and left as an exercise.  □

In particular, if V = ℝ^n and W = ℝ^m, then the vector space M_{m×n}(ℝ) of m×n matrices may be identified with the vector space L(ℝ^n; ℝ^m), since such a matrix A is
a linear transformation and A itself is the matrix representation of itself with respect to the standard bases of ℝ^n and ℝ^m. One can summarize our discussions in the following theorem.

Theorem 4.12 For vector spaces V of dimension n and W of dimension m, the vector space L(V; W) of all linear transformations from V to W is isomorphic to the vector space M_{m×n}(ℝ) of all m×n matrices, and

    dim L(V; W) = dim M_{m×n}(ℝ) = mn = dim V · dim W.
Remark: With an isomorphism from the vector space L(V; W) to the vector space M_{m×n}(ℝ) as mentioned in Theorem 4.12, one can prove that the following conditions for a linear transformation T on a vector space V are equivalent, as mentioned in Theorem 3.26:
(1) T is an isomorphism;
(2) T is one-to-one;
(3) T is surjective.
(One can also prove this directly by using the definition of a basis for V; see Problem 4.7.)

The next theorem shows that the one-to-one correspondence between L(V; W) and M_{m×n}(ℝ) preserves not only the vector space structure but also the compositions of linear transformations. Let V, W and Z be vector spaces. Suppose that S : V → W and T : W → Z are linear transformations. Then the composition T ∘ S : V → Z is also linear.

Theorem 4.13 Let V, W and Z be vector spaces with ordered bases α, β and γ, respectively. Suppose that S : V → W and T : W → Z are linear transformations. Then

    [T ∘ S]^γ_α = [T]^γ_β [S]^β_α.

Proof: Let α = {v_1, ..., v_n}, β = {w_1, ..., w_m} and γ = {z_1, ..., z_ℓ}. Let [T]^γ_β = [a_{ij}] and [S]^β_α = [b_{pq}]. Then, for 1 ≤ i ≤ n,

    (T ∘ S)(v_i) = T(S(v_i)) = T(∑_{k=1}^m b_{ki} w_k) = ∑_{k=1}^m b_{ki} T(w_k)
                 = ∑_{k=1}^m b_{ki} (∑_{j=1}^ℓ a_{jk} z_j) = ∑_{j=1}^ℓ (∑_{k=1}^m a_{jk} b_{ki}) z_j.

It shows that [T ∘ S]^γ_α = [T]^γ_β [S]^β_α.  □
Problem 4.13 Let α be the standard basis for ℝ³, and let S, T : ℝ³ → ℝ³ be two linear transformations given by

    S(e_1) = (2, 2, 1),  S(e_2) = (0, 1, 1),  S(e_3) = (-1, 2, 1),
    T(e_1) = (1, 0, 1),  T(e_2) = (0, 1, 2),  T(e_3) = (1, 1, 2).

Compute [S + T]_α, [2T - S]_α and [T ∘ S]_α.

Problem 4.14 Let T : P_2(ℝ) → P_2(ℝ) be the linear transformation defined by T(f) = (3 + x)f' + 2f, and let S : P_2(ℝ) → ℝ³ be the one defined by S(a + bx + cx²) = (a - b, a + b, c). For the basis α = {1, x, x²} for P_2(ℝ) and the standard basis β = {e_1, e_2, e_3} for ℝ³, compute [S]^β_α, [T]_α and [S ∘ T]^β_α.
Theorem 4.14 Let V and W be vector spaces with ordered bases α and β, respectively, and let T : V → W be an isomorphism. Then

    [T^{-1}]^α_β = ([T]^β_α)^{-1}.

Proof: Since T is invertible, dim V = dim W, and the matrices [T]^β_α and [T^{-1}]^α_β are square and of the same size. Thus,

    [T]^β_α [T^{-1}]^α_β = [T ∘ T^{-1}]_β = [id]_β = I

is the identity matrix. Hence, [T^{-1}]^α_β = ([T]^β_α)^{-1}.  □

In particular, if a linear transformation T : V → W is an isomorphism, then [T]^β_α is an invertible matrix for any bases α for V and β for W.

Problem 4.15 For the vector spaces P_1(ℝ) and ℝ², choose the bases α = {1, x} for P_1(ℝ) and β = {e_1, e_2} for ℝ², respectively. Let T : P_1(ℝ) → ℝ² be the linear transformation defined by T(a + bx) = (a, a + b).
(1) Show that T is invertible.  (2) Find [T]^β_α and [T^{-1}]^α_β.
4.5 Change of bases

In Section 4.2, we have seen that, in an n-dimensional vector space V with a fixed basis α, any vector x can be identified with a column vector [x]_α in the n-space ℝ^n via the natural isomorphism Φ. Of course, one may get a different column vector [x]_β if another basis β is given instead of α. Thus, one may naturally ask what the relation between [x]_α and [x]_β is for two different bases α and β. To answer this question, let us begin with an example in the plane ℝ². The coordinate expression of x = (x, y) ∈ ℝ² with respect to the standard basis α = {e_1, e_2} is

    x = x e_1 + y e_2,  so that  [x]_α = [ x ].
                                         [ y ]
[Figure 4.4. Coordinates with respect to {e_1, e_2} and {e_1', e_2'}.]

Now let β = {e_1', e_2'} be another basis for ℝ², obtained by rotating α counterclockwise through an angle θ as in Figure 4.4, and suppose that the coordinate expression of x ∈ ℝ² with respect to β is written as

    x = x' e_1' + y' e_2',  or  [x]_β = [ x' ].
                                        [ y' ]

Then the expressions of the vectors in β with respect to α are

    e_1' = id(e_1') =  cos θ e_1 + sin θ e_2
    e_2' = id(e_2') = -sin θ e_1 + cos θ e_2,

so

    [e_1']_α = [ cos θ ],   [e_2']_α = [ -sin θ ].
               [ sin θ ]               [  cos θ ]

Therefore, from x = x e_1 + y e_2 and

    x = x' e_1' + y' e_2' = (x' cos θ - y' sin θ) e_1 + (x' sin θ + y' cos θ) e_2,

one obtains the matrix equation

    [ x ] = [ cos θ  -sin θ ] [ x' ],   or   [x]_α = [id]^α_β [x]_β,
    [ y ]   [ sin θ   cos θ ] [ y' ]

where

    [id]^α_β = [ [e_1']_α  [e_2']_α ] = [ cos θ  -sin θ ].
                                        [ sin θ   cos θ ]

It means that the two coordinate vectors [x]_α and [x]_β in the 2-space ℝ² are related by the associated matrix [id]^α_β of the identity transformation id on ℝ². Note that

    [id]^β_α = ([id]^α_β)^{-1} = [  cos θ  sin θ ]
                                 [ -sin θ  cos θ ]

by Theorem 4.14.

In general, if α = {v_1, v_2, ..., v_n} and β = {w_1, w_2, ..., w_n} are two ordered bases for an n-dimensional vector space V, then any vector x ∈ V has two expressions:

    x = ∑_{i=1}^n x_i v_i = ∑_{j=1}^n y_j w_j.

In particular, each vector in β is expressed as a linear combination of the vectors in α: say, w_j = id(w_j) = ∑_{i=1}^n q_{ij} v_i for j = 1, 2, ..., n. Then for any x ∈ V,

    x = ∑_{i=1}^n x_i v_i = ∑_{j=1}^n y_j w_j = ∑_{j=1}^n y_j ∑_{i=1}^n q_{ij} v_i = ∑_{i=1}^n (∑_{j=1}^n q_{ij} y_j) v_i.

This is equivalent to the matrix equation

    [x]_α = [ ∑_{j=1}^n q_{1j} y_j ]
            [          ⋮          ] = [id]^α_β [x]_β,
            [ ∑_{j=1}^n q_{nj} y_j ]

where

    [id]^α_β = [ q_{11} ... q_{1n} ]
               [   ⋮     ⋱    ⋮  ] = [ [w_1]_α  ...  [w_n]_α ].
               [ q_{n1} ... q_{nn} ]
This means that any two coordinate vectors of a vector in V with respect to two different ordered bases α and β are related by the matrix representation [id]^α_β of the identity transformation on V, and this can be incorporated in the commutative diagram in Figure 4.5.

Definition 4.5 The matrix representation [id]^α_β of the identity transformation id : V → V with respect to any two ordered bases β and α is called the basis-change matrix (or the coordinate-change matrix) from β to α.

Since the identity transformation id : V → V is invertible, the basis-change matrix Q = [id]^α_β is also invertible, by Theorem 4.14. If we had instead taken the expressions of the vectors in the basis α with respect to the basis β, say v_j = id(v_j) = ∑_{i=1}^n p_{ij} w_i for j = 1, 2, ..., n, then [p_{ij}] = [id]^β_α = Q^{-1} and [x]_β = [id]^β_α [x]_α = ([id]^α_β)^{-1} [x]_α.
[Figure 4.5. The basis-change matrix [id]^α_β: the identity map on V corresponds on coordinates to Q = [id]^α_β, so that [x]_α = Q[x]_β.]
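A numerical sketch of the general construction above (numpy assumed; the two bases are random, chosen only for illustration): column j of Q = [id]^α_β is [w_j]_α, and then [x]_α = Q[x]_β for every x:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))   # columns: a basis alpha (invertible a.s.)
B = rng.standard_normal((3, 3))   # columns: another basis beta

# Column j of Q = [id]^alpha_beta is [w_j]_alpha, i.e., solves A q_j = w_j.
Q = np.linalg.solve(A, B)

# Check [x]_alpha = Q [x]_beta on an arbitrary coordinate vector.
x_beta = np.array([1.0, -2.0, 0.5])
x = B @ x_beta                     # the actual vector in R^3
x_alpha = np.linalg.solve(A, x)
assert np.allclose(x_alpha, Q @ x_beta)
```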
Example 4.17 (Analytic interpretation of a basis-change matrix) Consider the curve xy = 1 in the plane ℝ². Find the quadric equation of the curve obtained from xy = 1 by rotating it about the origin clockwise through the angle π/4.

Solution: Let β = {e_1', e_2'} be the basis for ℝ² obtained by rotating the standard basis α = {e_1, e_2} counterclockwise through the angle π/4, and let

    [x]_α = [ x ],   [x]_β = [ x' ].
            [ y ]            [ y' ]

Then

    [ x ] = [id]^α_β [ x' ] = [ cos(π/4)  -sin(π/4) ] [ x' ] = [ (x' - y')/√2 ].
    [ y ]            [ y' ]   [ sin(π/4)   cos(π/4) ] [ y' ]   [ (x' + y')/√2 ]

Hence, the equation xy = 1 is transformed into

    1 = xy = ((x' - y')/√2)((x' + y')/√2) = (x')²/2 - (y')²/2,

which is a hyperbola (see Figure 4.6).  □

[Figure 4.6. The graphs of xy = 1 and (x')²/2 - (y')²/2 = 1.]

Example 4.18 (Computing a basis-change matrix) Let the 3-space ℝ³ be equipped with the standard xyz-coordinate system, i.e., with the standard basis α = {e_1, e_2, e_3}. Take a new x'y'z'-coordinate system by rotating the xyz-system about its z-axis counterclockwise through an angle θ; i.e., take the new basis β = {e_1', e_2', e_3'} obtained by rotating the basis α about the z-axis through θ. Then we get

    [e_1']_α = [ cos θ ],   [e_2']_α = [ -sin θ ],   [e_3']_α = [ 0 ].
               [ sin θ ]               [  cos θ ]               [ 0 ]
               [   0   ]               [    0   ]               [ 1 ]

Hence, the basis-change matrix from β to α is

    Q = [id]^α_β = [ cos θ  -sin θ  0 ]
                   [ sin θ   cos θ  0 ],
                   [   0       0    1 ]

so that

    [ x ]     [ x' ]
    [ y ] = Q [ y' ],   i.e.,  [x]_α = Q[x]_β.
    [ z ]     [ z' ]

Moreover, Q = [id]^α_β is invertible, and the basis-change matrix from α to β is

    Q^{-1} = [id]^β_α = [  cos θ  sin θ  0 ]
                        [ -sin θ  cos θ  0 ],
                        [    0      0    1 ]

so that

    [ x' ]   [  cos θ  sin θ  0 ] [ x ]
    [ y' ] = [ -sin θ  cos θ  0 ] [ y ].  □
    [ z' ]   [    0      0    1 ] [ z ]
Problem 4.16 Find the basis-change matrix from the basis α to the basis β for the 3-space ℝ³, where α = {(1, 0, 1), (1, 1, 0), (0, 1, 1)} and β = {(2, 3, 1), (1, 2, 0), (2, 0, 3)}.
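The rotation of Example 4.18 above can be checked numerically in a few lines (numpy assumed, not part of the text): the basis-change matrix about the z-axis and its inverse, the rotation by -θ, recover the original coordinates:

```python
import numpy as np

theta = 0.9
c, s = np.cos(theta), np.sin(theta)
Q = np.array([[c, -s, 0.0],
              [s,  c, 0.0],
              [0.0, 0.0, 1.0]])     # Q = [id]^alpha_beta from Example 4.18

# Its inverse is the rotation by -theta, as computed in the example:
Q_inv = np.array([[c,  s, 0.0],
                  [-s, c, 0.0],
                  [0.0, 0.0, 1.0]])
assert np.allclose(Q @ Q_inv, np.eye(3))

# New coordinates of a point in the rotated frame, and back again:
x = np.array([1.0, 2.0, 3.0])
x_beta = Q_inv @ x
assert np.allclose(Q @ x_beta, x)
```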
4.6 Similarity

The coordinate expression of a vector in a vector space V depends on the choice of an ordered basis. Hence, the matrix representation of a linear transformation also depends on the choice of bases. Let V and W be two vector spaces of dimensions n and m with two ordered bases α and β, respectively, and let T : V → W be a linear transformation. In Section 4.3, we discussed how to find the associated matrix [T]^β_α. If one takes different bases α' and β' for V and W, respectively, then one may get another associated matrix [T]^{β'}_{α'} of T. In fact, we have two different expressions

    [x]_α and [x]_{α'} in ℝ^n           for each x ∈ V,
    [T(x)]_β and [T(x)]_{β'} in ℝ^m     for T(x) ∈ W.

They are related by the basis-change matrices as follows:

    [x]_{α'} = [id_V]^{α'}_α [x]_α,   and   [T(x)]_{β'} = [id_W]^{β'}_β [T(x)]_β.

On the other hand, by Theorem 4.9, we have

    [T(x)]_β = [T]^β_α [x]_α,   and   [T(x)]_{β'} = [T]^{β'}_{α'} [x]_{α'}.

Therefore, we get

    [T]^{β'}_{α'} [x]_{α'} = [T(x)]_{β'} = [id_W]^{β'}_β [T(x)]_β
                           = [id_W]^{β'}_β [T]^β_α [x]_α = [id_W]^{β'}_β [T]^β_α [id_V]^α_{α'} [x]_{α'}.

This equation looks messy. However, by Theorem 4.13, the relation can be obtained directly from T = id_W ∘ T ∘ id_V as

    [T]^{β'}_{α'} = [id_W ∘ T ∘ id_V]^{β'}_{α'} = [id_W]^{β'}_β [T]^β_α [id_V]^α_{α'}.

Note that [T]^β_α and [T]^{β'}_{α'} are m×n matrices, [id_V]^α_{α'} is an n×n matrix, and [id_W]^{β'}_β is an m×m matrix. The relation can also be incorporated in the diagram in Figure 4.7, in which all rectangles are commutative.
[Figure 4.7. Relating the two associated matrices [T]^β_α and [T]^{β'}_{α'} through the basis-change matrices [id_V]^α_{α'} and [id_W]^{β'}_β.]
Our discussion is summarized in the following theorem.
Theorem 4.15 Let T : V -+ W be a linear transformation from a vector space V with bases a and a ' to another vector space W with bases 13 and 13'. Then
[T]~: where Q
=
P-I[T]~Q ,
= [i dv J: , and P = [i dw ]~, are the basis-change matrices.
In particular, if we take W the following corollary.
= V, a = 13 and a' = 13', then P = Q and we get to
Corollary 4.16 Let T : V -+ V be a linear transformation on a vector space V and let a and 13 be ordered bases for V. Let Q = [id]Jj be the basis-change matrixfrom 13 to a . Then (1) Q is invertible, and Q-I = [id]~. (2) For any x E V, [x]a = Q[x]p. (3) [T]p = Q-l[T]aQ.
The relation (3) between [T]_β and [T]_α in Corollary 4.16 is called a similarity. In general, we have the following definition.
Definition 4.6 For any square matrices A and B, A is said to be similar to B if there exists a nonsingular matrix Q such that B = Q⁻¹AQ. Note that if A is similar to B, then B is also similar to A; thus we simply say that A and B are similar. We saw in Corollary 4.16 that if A and B are n × n matrices representing the same linear transformation T on a vector space V, then A and B are similar.
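As a quick numerical illustration of Definition 4.6 (with an arbitrary, hypothetical choice of A and Q, not taken from the text), the sketch below forms B = Q⁻¹AQ and checks that the relation is symmetric, i.e., that A = QBQ⁻¹ is recovered:

```python
# A small numerical check (with hypothetical 2x2 matrices A and Q) that
# similarity is symmetric: if B = Q^{-1} A Q, then A = Q B Q^{-1}.

def matmul(X, Y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(Q):
    """Inverse of a 2x2 matrix (assumes det Q != 0)."""
    a, b = Q[0]
    c, d = Q[1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2.0, 1.0], [0.0, 3.0]]   # hypothetical matrix
Q = [[1.0, 1.0], [1.0, 2.0]]   # nonsingular: det Q = 1

B = matmul(inv2(Q), matmul(A, Q))        # B = Q^{-1} A Q, similar to A
A_back = matmul(Q, matmul(B, inv2(Q)))   # recover A = Q B Q^{-1}
```

Note that B also shares the trace of A, in accordance with Problem 4.18 below.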
Example 4.19 (Two similar associated matrices) Let β = {v₁, v₂, v₃} be a basis for the 3-space ℝ³ consisting of v₁ = (1, 1, 0), v₂ = (1, 0, 1) and v₃ = (0, 1, 1). Let T be the linear transformation on ℝ³ given by the matrix

    [T]_β = [ 2  1 −1
              1  2 −1
              1  3  1 ].

Let α = {e₁, e₂, e₃} be the standard basis. Find the basis-change matrix [id]_β^α and [T]_α.

Solution: Since v₁ = e₁ + e₂, v₂ = e₁ + e₃ and v₃ = e₂ + e₃, we have

    [id]_β^α = [ 1 1 0          [id]_α^β = ([id]_β^α)⁻¹ = 1/2 [  1  1 −1
                 1 0 1    and                                    1 −1  1
                 0 1 1 ],                                       −1  1  1 ].

Therefore,

    [T]_α = [id]_β^α [T]_β [id]_α^β = 1/2 [ 8 −2 −2
                                            7 −1  1
                                            7 −3  3 ].   □
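The basis-change matrix of Example 4.19 can be inverted numerically. The sketch below (plain Python; the Gauss-Jordan helper `inverse` is our own, not from the text) takes the matrix M = [id]_β^α whose columns are v₁ = (1, 1, 0), v₂ = (1, 0, 1), v₃ = (0, 1, 1), and confirms that M⁻¹ = 1/2 [1 1 −1; 1 −1 1; −1 1 1]:

```python
def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial
    pivoting (assumes the matrix is nonsingular)."""
    n = len(M)
    # Augment M with the identity matrix on the right.
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

# Columns are the coordinates of v1, v2, v3 in the standard basis.
M = [[1.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
Minv = inverse(M)   # should equal (1/2) [[1,1,-1],[1,-1,1],[-1,1,1]]
```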
Example 4.20 (Computing an associated matrix) Let T : ℝ³ → ℝ³ be the linear transformation defined by

    T(x₁, x₂, x₃) = (2x₁ + x₂, x₁ + x₂ + 3x₃, −x₂).

Let α = {e₁, e₂, e₃} be the standard ordered basis for ℝ³, and let β = {v₁, v₂, v₃} be another ordered basis consisting of v₁ = (−1, 0, 0), v₂ = (2, 1, 0) and v₃ = (1, 1, 1). Find the associated matrices [T]_α and [T]_β for T. Also, show that T(vⱼ) is the linear combination of the basis vectors in β with the entries of the j-th column of [T]_β as its coefficients, for j = 1, 2, 3.

Solution: One can easily show that

    [T]_α = [ 2  1 0          [id]_β^α = [ −1 2 1
              1  1 3    and                 0 1 1
              0 −1 0 ]                      0 0 1 ].

Thus, with the inverse

    [id]_α^β = ([id]_β^α)⁻¹ = [ −1 2 −1
                                 0 1 −1
                                 0 0  1 ],

it follows that

    [T]_β = [id]_α^β [T]_α [id]_β^α = [  0  2  8
                                        −1  4  6
                                         0 −1 −1 ].

To show the second statement, let j = 2. Then T(v₂) = T(2, 1, 0) = (5, 3, −1). On the other hand, the coefficients of [T(v₂)]_β are just the entries of the second column of [T]_β. Therefore,

    T(v₂) = 2v₁ + 4v₂ − v₃ = 2(−1, 0, 0) + 4(2, 1, 0) − (1, 1, 1) = (5, 3, −1),

as expected.   □
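The column-coefficient claim of Example 4.20 is easy to verify directly in code; the sketch below evaluates T(v₂) and the combination 2v₁ + 4v₂ − v₃ and checks that both equal (5, 3, −1):

```python
# Numerical check of Example 4.20: T(v2) equals 2*v1 + 4*v2 - v3 = (5, 3, -1),
# the combination read off from the second column of [T]_beta.

def T(x):
    """The transformation T(x1, x2, x3) = (2x1 + x2, x1 + x2 + 3x3, -x2)."""
    x1, x2, x3 = x
    return (2*x1 + x2, x1 + x2 + 3*x3, -x2)

v1, v2, v3 = (-1, 0, 0), (2, 1, 0), (1, 1, 1)

lhs = T(v2)                                              # T applied to v2
rhs = tuple(2*a + 4*b - c for a, b, c in zip(v1, v2, v3))  # 2v1 + 4v2 - v3
```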
The next theorem shows that two similar matrices can be matrix representations of the same linear transformation.

Theorem 4.17 Suppose that an n × n matrix A represents a linear transformation T : V → V on a vector space V with respect to an ordered basis α = {v₁, v₂, ..., vₙ}, i.e., [T]_α = A. If B = Q⁻¹AQ for some nonsingular matrix Q, then there exists a basis β for V such that B = [T]_β and Q = [id]_β^α.

Proof: Let Q = [q_ij], and let w₁, w₂, ..., wₙ be the vectors in V defined by

    w₁ = q₁₁v₁ + q₂₁v₂ + ··· + q_n1 vₙ,
    w₂ = q₁₂v₁ + q₂₂v₂ + ··· + q_n2 vₙ,
    ⋮
    wₙ = q₁ₙv₁ + q₂ₙv₂ + ··· + q_nn vₙ.

Then the nonsingularity of Q = [q_ij] implies that β = {w₁, w₂, ..., wₙ} is an ordered basis for V, and Corollary 4.16(3) shows that [T]_β = Q⁻¹[T]_α Q = Q⁻¹AQ = B with Q = [id]_β^α.   □

Example 4.21 (A matrix similar to an associated matrix is also an associated matrix) Let D be the differential operator on the vector space P₂(ℝ). Given the ordered basis α = {1, x, x²}, first note that

    D(1)  = 0  = 0·1 + 0·x + 0·x²,
    D(x)  = 1  = 1·1 + 0·x + 0·x²,
    D(x²) = 2x = 0·1 + 2·x + 0·x².

Hence, the matrix representation of D with respect to α is given by

    [D]_α = [ 0 1 0
              0 0 2
              0 0 0 ].
Choose a nonsingular matrix

    Q = [ 1 0 −2
          0 2  0
          0 0  4 ].

Let

    B = Q⁻¹[D]_α Q = [ 0 2 0
                       0 0 4
                       0 0 0 ].

Now, we are going to find a basis β = {v₁, v₂, v₃} so that B = [D]_β. If there is such a basis, the matrix Q must be the basis-change matrix [id]_β^α, and then

    v₁ = 1·1 + 0·x + 0·x² = 1,
    v₂ = 0·1 + 2·x + 0·x² = 2x,
    v₃ = −2·1 + 0·x + 4·x² = −2 + 4x².

Clearly, one can obtain

    D(1)        = 0  = 0·1 + 0·(2x) + 0·(−2 + 4x²),
    D(2x)       = 2  = 2·1 + 0·(2x) + 0·(−2 + 4x²),
    D(−2 + 4x²) = 8x = 0·1 + 4·(2x) + 0·(−2 + 4x²),

and

    [D]_β = [ 0 2 0
              0 0 4
              0 0 0 ];

thus, as expected, [D]_β = B = Q⁻¹[D]_α Q.   □
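The matrix identity of Example 4.21 can be checked with exact arithmetic; the sketch below forms Q⁻¹[D]_α Q (writing down the inverse of the upper-triangular Q directly) and confirms it equals the matrix of D in the basis {1, 2x, −2 + 4x²}:

```python
# Check Example 4.21: with [D]_alpha and Q as given, Q^{-1} [D]_alpha Q
# equals [[0,2,0],[0,0,4],[0,0,0]], the matrix of D in the new basis.
from fractions import Fraction

def matmul(X, Y):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

D_alpha = [[0, 1, 0], [0, 0, 2], [0, 0, 0]]
Q       = [[1, 0, -2], [0, 2, 0], [0, 0, 4]]
# Inverse of the upper-triangular Q, written down directly.
Q_inv   = [[Fraction(1), Fraction(0),    Fraction(1, 2)],
           [Fraction(0), Fraction(1, 2), Fraction(0)],
           [Fraction(0), Fraction(0),    Fraction(1, 4)]]

B = matmul(Q_inv, matmul(D_alpha, Q))   # expected: [[0,2,0],[0,0,4],[0,0,0]]
```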
Problem 4.17 Let T : ℝ³ → ℝ³ be the linear transformation defined by ... Let α be the standard basis, and let β = {v₁, v₂, v₃} be another ordered basis consisting of v₁ = (1, 0, 0), v₂ = (1, 1, 0) and v₃ = (1, 1, 1) for ℝ³. Find the associated matrix of T with respect to α and the associated matrix of T with respect to β. Are they similar?
Problem 4.18 Suppose that A and B are similar n × n matrices. Show that
(1) det A = det B, (2) tr(A) = tr(B), (3) rank A = rank B.

Problem 4.19 Let A and B be n × n matrices. Show that if A is similar to B, then A² is similar to B². In general, Aⁿ is similar to Bⁿ for all n ≥ 2.
4.7 Applications

4.7.1 Dual spaces and adjoint

Note that the space of all scalars is a one-dimensional vector space ℝ, and the set of all linear transformations from V to ℝ is the vector space L(V; ℝ), whose dimension is equal to the dimension of V (see Theorem 4.12), so that the two vector spaces L(V; ℝ) and V are isomorphic. In this section, we are concerned exclusively with such linear transformations from V to the scalar space ℝ.

Definition 4.7 Let V be a vector space.
(1) The vector space L(V; ℝ) of all linear transformations from V to ℝ is called the dual space of V and is denoted by V*.
(2) An element (i.e., a linear transformation) in the dual space L(V; ℝ) is called a linear functional of V.

From the definition, one can say that any vector space is isomorphic to its dual space.

Example 4.22 The trace function tr : M_n×n(ℝ) → ℝ is a linear functional of M_n×n(ℝ).   □

The definite integral of continuous functions is one of the most important examples of linear functionals in mathematics.

Example 4.23 (Fourier coefficients are linear functionals) Let C[a, b] be the vector space of all continuous real-valued functions on the interval [a, b]. The definite integral I : C[a, b] → ℝ defined by
    I(f) = ∫_a^b f(t) dt

is a linear functional of C[a, b]. In particular, if the interval is [0, 2π] and n is an integer, then

    Aₙ(f) = (1/π) ∫_0^2π f(t) cos nt dt   and   Bₙ(f) = (1/π) ∫_0^2π f(t) sin nt dt

are linear functionals, called the n-th Fourier coefficients of f.   □
For a matrix A regarded as a linear transformation A : ℝⁿ → ℝᵐ, the transpose Aᵀ of A is another linear transformation Aᵀ : ℝᵐ → ℝⁿ. For a linear transformation T : V → W from a vector space V to W, one can naturally ask what its transpose is and how it should be defined. In this section, we discuss this problem.

Recall that a linear functional T : V → ℝ is completely determined by its values on a basis for V. Let α = {v₁, v₂, ..., vₙ} be a basis for a vector space V. For each i = 1, 2, ..., n, define a linear functional vᵢ* : V → ℝ by vᵢ*(vⱼ) = δᵢⱼ for each j = 1, 2, ..., n. Then, for any x = Σⱼ aⱼvⱼ ∈ V, we have vᵢ*(x) = aᵢ, which is the i-th coordinate of x with respect to α. Thus, the functional vᵢ* is called the i-th coordinate function with respect to the basis α.

Theorem 4.18 The set α* = {v₁*, v₂*, ..., vₙ*} of coordinate functions forms a basis for the dual space V*, and for any T ∈ V* we have

    T = Σ_{j=1}^n T(vⱼ) vⱼ*.

Proof: Clearly, the set α* = {v₁*, v₂*, ..., vₙ*} is linearly independent, since 0 = Σ_{i=1}^n cᵢvᵢ* implies 0 = Σ_{i=1}^n cᵢvᵢ*(vⱼ) = cⱼ for each j = 1, 2, ..., n. Because dim V* = dim V = n, these n linearly independent vectors in α* must form a basis. Now, for any T ∈ V*, let T = Σ_{i=1}^n cᵢvᵢ*. Then T(vⱼ) = Σ_{i=1}^n cᵢvᵢ*(vⱼ) = cⱼ. It gives T = Σ_{j=1}^n T(vⱼ)vⱼ*.   □
Definition 4.8 For a basis α = {v₁, v₂, ..., vₙ} for a vector space V, the basis α* = {v₁*, v₂*, ..., vₙ*} for V* is called the dual basis of α.

Example 4.24 (Computing a dual basis) Let α = {v₁, v₂} be a basis for ℝ², where v₁ = (1, 2) and v₂ = (1, 3). To find the dual basis α* = {v₁*, v₂*} of α, we consider the equations

    1 = v₁*(v₁) = v₁*(e₁) + 2v₁*(e₂),
    0 = v₁*(v₂) = v₁*(e₁) + 3v₁*(e₂).

Solving these equations, we obtain v₁*(e₁) = 3 and v₁*(e₂) = −1. Thus v₁*(x, y) = 3x − y. Similarly, it can be shown that v₂*(x, y) = −2x + y.   □
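The defining property vᵢ*(vⱼ) = δᵢⱼ of a dual basis is easy to verify mechanically; the sketch below checks it for the functionals found in Example 4.24:

```python
# Check Example 4.24: the dual basis functionals v1*(x,y) = 3x - y and
# v2*(x,y) = -2x + y satisfy v_i*(v_j) = delta_ij for v1 = (1,2), v2 = (1,3).

def v1_star(x, y):
    return 3*x - y

def v2_star(x, y):
    return -2*x + y

v1, v2 = (1, 2), (1, 3)
values = [v1_star(*v1), v1_star(*v2), v2_star(*v1), v2_star(*v2)]
# expected pattern: [1, 0, 0, 1]
```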
The following example shows a natural isomorphism between the n-space ℝⁿ and its dual space ℝⁿ*.

Example 4.25 (The dual basis vector eᵢ* is the i-th coordinate function) For the standard basis α = {e₁, e₂, ..., eₙ} for the n-space ℝⁿ, its dual basis vector eᵢ* is just the i-th coordinate function. In fact, for any vector x = (x₁, x₂, ..., xₙ) = x₁e₁ + x₂e₂ + ··· + xₙeₙ ∈ ℝⁿ, we have eᵢ*(x) = eᵢ*(x₁e₁ + x₂e₂ + ··· + xₙeₙ) = xᵢ for all i. On the other hand, when we write a vector in ℝⁿ as x = (x₁, x₂, ..., xₙ) with variables xᵢ, it means that for a given point a = (a₁, a₂, ..., aₙ) ∈ ℝⁿ, each xᵢ gives us the i-th coordinate of a, that is, xᵢ(a) = aᵢ for all i. In this sense, one can identify eᵢ* = xᵢ for i = 1, 2, ..., n, so that ℝⁿ* = ℝⁿ, and the eᵢ* are called coordinate functions.   □

Problem 4.20 Let α = {(1, 0, 1), (1, 2, 1), (0, 0, 1)} be a basis for ℝ³. Find the dual basis α*.
For a given linear transformation T : V → W, one can define T* : W* → V* by T*(g) = g ∘ T for any g ∈ W*. In fact, for any linear functional g ∈ W*, i.e., g : W → ℝ, the composition g ∘ T : V → ℝ given by (g ∘ T)(x) = g(T(x)) for x ∈ V defines a linear functional on V, i.e., T*(g) = g ∘ T ∈ V*.
Lemma 4.19 The transformation T* : W* → V* defined by T*(g) = g ∘ T for g ∈ W* is a linear transformation. It is called the adjoint (or transpose) of T.

Proof: For any f, g ∈ W*, a, b ∈ ℝ and x ∈ V,

    T*(af + bg)(x) = (af + bg)(T(x)) = af(T(x)) + bg(T(x)) = (aT*(f) + bT*(g))(x).   □
Example 4.26 ((id_V)* = id_{V*} and (T ∘ S)* = S* ∘ T*)
(1) Let id : V → V be the identity transformation on a vector space V. Then for any g ∈ V*, id*(g) = g ∘ id = g. Hence, the adjoint id* : V* → V* is the identity transformation on V*, i.e., id* = id.
(2) Let S : U → V and T : V → W be two linear transformations. Then for any g ∈ W*, we have

    (T ∘ S)*(g) = g ∘ (T ∘ S) = (g ∘ T) ∘ S = T*(g) ∘ S = S*(T*(g)) = (S* ∘ T*)(g).

It shows that (T ∘ S)* = S* ∘ T*.   □

Now, if S : V → W is an isomorphism, then (S⁻¹)* ∘ S* = (S ∘ S⁻¹)* = id* = id shows that S* : W* → V* is also an isomorphism.
Note that the linear transformation * : V → V* defined by assigning to a basis for V its dual basis is an isomorphism, so that the composition ** : V → V** is also an isomorphism. However, an isomorphism between V and V** can be defined without choosing a basis for V. In fact, for each x ∈ V, one can first define x̂ : V* → ℝ by x̂(f) = f(x) for every f ∈ V*. It is easy to verify that x̂ is a linear functional on V*, so that x̂ ∈ V**. The following theorem shows that the mapping Φ : V → V** defined by Φ(x) = x̂ is an isomorphism, and it does not depend on the choice of a basis for V.

Theorem 4.20 The mapping Φ : V → V** defined by Φ(x) = x̂ is an isomorphism from V to V**.
Proof: To show the linearity of Φ, let x, y ∈ V and let k be a scalar. Then, for any f ∈ V*,

    Φ(x + ky)(f) = f(x + ky) = f(x) + kf(y) = x̂(f) + k ŷ(f) = (Φ(x) + kΦ(y))(f).

Hence, Φ(x + ky) = Φ(x) + kΦ(y).

To show that Φ is injective, suppose x ∈ Ker(Φ). Then Φ(x) = x̂ = 0 in V**, i.e., x̂(f) = 0 for all f ∈ V*. It implies that x = 0: in fact, if x ≠ 0, one can choose a basis α = {v₁, v₂, ..., vₙ} for V such that v₁ = x. Let α* = {v₁*, v₂*, ..., vₙ*} be the dual basis of α. Then

    0 = x̂(v₁*) = v₁*(x) = v₁*(v₁) = 1,

which is a contradiction. Thus, x = 0 and Ker(Φ) = {0}. Since dim V = dim V**, Φ is an isomorphism.   □
Problem 4.21 Let V = ℝ³ and define f₁, f₂, f₃ ∈ V* as follows:

    f₁(x, y, z) = x − 2y,   f₂(x, y, z) = x + y + z,   f₃(x, y, z) = y − 3z.

Prove that {f₁, f₂, f₃} is a basis for V*, and then find a basis for V for which it is the dual.
We now consider the matrix representation of the transpose S* : W* → V* of a linear transformation S : V → W. Let α = {v₁, v₂, ..., vₙ} and β = {w₁, w₂, ..., w_m} be bases for V and W, with dual bases α* = {v₁*, v₂*, ..., vₙ*} and β* = {w₁*, w₂*, ..., w_m*}, respectively.

Theorem 4.21 The matrix representation of the transpose S* : W* → V* is the transpose of the matrix representation of S : V → W; that is,

    [S*]_β*^α* = ([S]_α^β)ᵀ.

Proof: Let S(vᵢ) = Σ_{k=1}^m a_ki w_k, so that [S]_α^β = [a_ij]. Then

    [S*]_β*^α* = [ [S*(w₁*)]_α* ··· [S*(w_m*)]_α* ].

Note that, for 1 ≤ j ≤ m,

    S*(wⱼ*) = Σ_{i=1}^n S*(wⱼ*)(vᵢ) vᵢ* = Σ_{i=1}^n a_ji vᵢ*,

since

    S*(wⱼ*)(vᵢ) = (wⱼ* ∘ S)(vᵢ) = wⱼ*(S(vᵢ)) = wⱼ*(Σ_{k=1}^m a_ki w_k) = Σ_{k=1}^m a_ki wⱼ*(w_k) = a_ji.

Hence, we get [S*]_β*^α* = ([S]_α^β)ᵀ.   □
Example 4.27 (The transpose Aᵀ is the adjoint transformation of A) Let A : ℝⁿ → ℝᵐ be the linear transformation defined by an m × n matrix A. Let α and β be the standard bases for ℝⁿ and ℝᵐ, respectively. Then [A]_α^β = A. By Theorem 4.21, we have [A*]_β*^α* = ([A]_α^β)ᵀ. Thus, with the identification ℝᵏ* = ℝᵏ via α* = α and β* = β as in Example 4.25, we have [A*]_β*^α* = A* and A* = Aᵀ. In this sense, we see that the transpose Aᵀ is the adjoint transformation of A.   □
As the final part of the section, we consider the dual space of a subspace. Let V be a vector space of dimension n, and let U be a subspace of V of dimension k. Then U* = {T : U → ℝ : T is linear on U} is not a subspace of V*. However, one can extend each T ∈ U* to a linear functional on V as follows. Choose a basis α = {u₁, u₂, ..., u_k} for U. Then by definition α* = {u₁*, u₂*, ..., u_k*} is its dual basis for U*. Now extend α to a basis β = {u₁, u₂, ..., u_k, u_{k+1}, ..., uₙ} for V. For each T ∈ U*, let T̄ : V → ℝ be the linear functional on V defined by

    T̄(uᵢ) = T(uᵢ)  if i ≤ k,   and   T̄(uᵢ) = 0  if k + 1 ≤ i ≤ n.

Then clearly T̄ ∈ V*, and the restriction T̄|_U of T̄ to U is simply T; i.e., T̄|_U = T ∈ U*. It is easy to see that the extension of T + kS is T̄ + kS̄. In particular, it is also easy to see that {ū₁*, ū₂*, ..., ū_k*} is linearly independent in V* and ūᵢ*|_U = uᵢ* ∈ U* for i = 1, 2, ..., k. Therefore, one obtains a one-to-one linear transformation

    φ : U* → V*

given by φ(T) = T̄ for all T ∈ U*. The image φ(U*) is now a subspace of V*. By identifying U* with the image φ(U*), one can say U* is a subspace of V*.

Problem 4.22 Let U and W be subspaces of a vector space V. Show that U ⊆ W if and only if W* ⊆ U*.
Let S be an arbitrary subset of V, and let ⟨S⟩ denote the subspace of V spanned by the vectors in S. Let S⊥ = {f ∈ V* : f(x) = 0 for any x ∈ S}. Then it is easy to show that S⊥ is a subspace of V*, S⊥ = ⟨S⟩⊥, and dim⟨S⟩ + dim S⊥ = n. Let R be a subset of V*. Then R⊥ = {x ∈ V : f(x) = 0 for any f ∈ R} is again a subspace of V such that R⊥ = ⟨R⟩⊥ and dim R⊥ + dim⟨R⟩ = n.

Problem 4.23 For subspaces U and W of a vector space V, show that
(1) (U + W)⊥ = U⊥ ∩ W⊥;
(2) (U ∩ W)⊥ = U⊥ + W⊥.
4.7.2 Computer graphics

One of the simplest applications of a linear transformation is the animation or graphical display of pictures on a computer screen. For a simple display of the idea, let us consider a picture in the 2-plane ℝ². Note that a picture or an image on a screen usually consists of a number of points, lines or curves connecting some of them, and information about how to fill the regions bounded by the lines and curves. Assuming that the computer has information about how to connect the points and curves, a figure can be defined by a list of points.

For example, consider the capital letters 'LA' as in Figure 4.8.

Figure 4.8. The letters 'LA' on a screen

They can be represented by a matrix with the coordinates of the vertices. For example, the coordinates of the 6 vertices of 'L' form a matrix:

                 vertex     1    2    3     4     5    6
    A =   x-coordinate  [ 0.0  0.0  0.5   0.5   2.0  2.0
          y-coordinate    0.0  2.0  2.0   0.5   0.5  0.0 ].

Of course, we assume that the computer knows which vertices are connected to which by lines via some algorithm. We know that line segments are transformed to other line segments by a matrix, considered as a linear transformation. Thus, by multiplying A by a matrix, the vertices are transformed to a new set of vertices, and the line segments connecting the vertices are preserved. For example, the matrix B = [ 1 0.25; 0 1 ] transforms the matrix A to the following form, which represents the new coordinates of the vertices:
4.7.2. Application: Computer graphics
149
                 vertex     1    2    3      4      5    6
    BA =                [ 0.0  0.5  1.0  0.625  2.125  2.0
                          0.0  2.0  2.0  0.5    0.5    0.0 ].

Now, the computer connects these vertices properly by lines according to the given algorithm and displays the changed figure on the screen, as on the left-hand side of Figure 4.9. Multiplying BA by the matrix C = [ 0.5 0; 0 1 ] shrinks the width of BA by half, producing the right-hand side of Figure 4.9.

Figure 4.9. Tilting and shrinking

Thus, changes in the shape of a figure may be obtained by compositions of appropriate linear transformations. The reader is invited to try various matrices, such as reflections, rotations, or any other linear transformations, and to multiply A by them to see how the shape of the figure changes.
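The tilt-then-shrink computation on the letter 'L' can be reproduced directly; the sketch below applies B and then C to the vertex matrix A and recovers the coordinates shown above (the helper `apply` is our own):

```python
# Apply the tilt B = [[1, 0.25], [0, 1]] and then the shrink C = [[0.5, 0], [0, 1]]
# to the 2x6 vertex matrix A of the letter 'L'.

def apply(M, pts):
    """Multiply a 2x2 matrix M with a 2xN matrix of column-vertices."""
    n = len(pts[0])
    return [[sum(M[i][k] * pts[k][j] for k in range(2)) for j in range(n)]
            for i in range(2)]

A = [[0.0, 0.0, 0.5, 0.5, 2.0, 2.0],
     [0.0, 2.0, 2.0, 0.5, 0.5, 0.0]]
B = [[1.0, 0.25], [0.0, 1.0]]   # xy-shear (tilt)
C = [[0.5, 0.0], [0.0, 1.0]]    # horizontal scaling (shrink)

BA  = apply(B, A)    # x-row becomes [0.0, 0.5, 1.0, 0.625, 2.125, 2.0]
CBA = apply(C, BA)   # all widths halved
```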
Problem 4.24 For the given matrices A and B = [ 1 0.25; 0 1 ] above, what kind of figure do you get from BᵀA instead of BA?
Remark: Incidentally, one can see that the composition of a rotation by π followed by a reflection about the x-axis is the same as the composition of the reflection followed by the rotation (see Figure 4.10). In general, a rotation and a reflection are not commutative, and neither are two reflections. The above argument applies generally to a figure in any dimension. For instance, a 3 × 3 matrix may be used to transform a figure in ℝ³, since each point has three components.

Example 4.28 (Classifying all rotations in ℝ³) It is easy to see that the matrices
    R_(x,α) = [ 1    0       0
                0  cos α  −sin α
                0  sin α   cos α ],

    R_(y,β) = [ cos β  0  −sin β
                  0    1    0
                sin β  0   cos β ],

    R_(z,γ) = [ cos γ  −sin γ  0
                sin γ   cos γ  0
                  0       0    1 ]

are the rotations about the x-, y- and z-axes by the angles α, β and γ, respectively.
Figure 4.10. Commutativity of a rotation by π and a reflection about the x-axis
In general, the matrix that rotates ℝ³ about a given axis appears frequently in many applications. One can easily express such a general rotation as a composition of the basic rotations R_(x,α), R_(y,β) and R_(z,γ). First, note that by choosing a unit vector u in ℝ³, one can determine an axis for a rotation by taking the line passing through u and the origin O. (In fact, the vectors u and −u determine the same line.) Let u = (cos α cos β, cos α sin β, sin α), −π/2 ≤ α ≤ π/2, 0 ≤ β ≤ 2π, in spherical coordinates. To find the matrix R_(u,θ) of the rotation about the u-axis by θ, we first rotate the u-axis about the z-axis into the xz-plane by R_(z,−β), and then into the x-axis by the rotation R_(y,−α) about the y-axis. The rotation about the u-axis is then the rotation about the x-axis followed by the inverses of the above rotations: that is, take the rotation R_(x,θ) about the x-axis, and then return to the rotation about the u-axis via R_(y,α) and R_(z,β). In summary,

    R_(u,θ) = R_(z,β) R_(y,α) R_(x,θ) R_(y,−α) R_(z,−β).

Figure 4.11. A rotation about the u-axis
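A quick numerical sanity check of this composition is that it must fix the axis vector u itself, whatever the rotation angle θ. The sketch below (using the sign convention for R_(y,β) displayed in Example 4.28, an arbitrary test angle, and helper names of our own) verifies this:

```python
# Sketch: verify that R(u,theta) = Rz(beta) Ry(alpha) Rx(theta) Ry(-alpha) Rz(-beta)
# fixes the axis vector u = (cos a cos b, cos a sin b, sin a).
from math import cos, sin, pi

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]

def Rx(t): return [[1, 0, 0], [0, cos(t), -sin(t)], [0, sin(t), cos(t)]]
def Ry(t): return [[cos(t), 0, -sin(t)], [0, 1, 0], [sin(t), 0, cos(t)]]  # book's convention
def Rz(t): return [[cos(t), -sin(t), 0], [sin(t), cos(t), 0], [0, 0, 1]]

def R_u(alpha, beta, theta):
    """Compose Rz(beta) Ry(alpha) Rx(theta) Ry(-alpha) Rz(-beta)."""
    M = Rz(beta)
    for F in (Ry(alpha), Rx(theta), Ry(-alpha), Rz(-beta)):
        M = matmul(M, F)
    return M

alpha, beta, theta = pi / 6, pi / 4, pi / 3   # an arbitrary test case
u = [cos(alpha) * cos(beta), cos(alpha) * sin(beta), sin(alpha)]
fixed = matvec(R_u(alpha, beta, theta), u)    # should equal u (up to rounding)
```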
Problem 4.25 Find the matrix R_(u, π/4) for the rotation by π/4 about the line determined by u = (1, 1, 1)ᵀ.
So far, we have seen rotations, reflections, tilting (shear) and scaling (shrinking or enlargement), together with their compositions, as linear transformations on the plane ℝ² or the space ℝ³ for computer graphics. However, another indispensable transformation for computer graphics is a translation: a translation is by definition a transformation T : ℝⁿ → ℝⁿ defined by T(x) = x + x₀ for any x ∈ ℝⁿ, where x₀ is a fixed vector in ℝⁿ. Unfortunately, a translation is not linear if x₀ ≠ 0, and hence it cannot be represented by a matrix. To escape this disadvantage, we introduce a new coordinate system, called homogeneous coordinates. For brevity, we will consider only the 3-space ℝ³. A point x = (x, y, z) ∈ ℝ³ in rectangular coordinates can be viewed as the set of vectors (hx, hy, hz, h), h ≠ 0, in the 4-space ℝ⁴ in homogeneous coordinates. Most of the time, we use (x, y, z, 1) as a representative of this set. Conversely, a point (hx, hy, hz, h), h ≠ 0, in the 4-space ℝ⁴ in homogeneous coordinates corresponds to the point (x/h, y/h, z/h) ∈ ℝ³ in rectangular coordinates. Now it is possible to represent all of our transformations, including translations, as 4 × 4 matrices by using homogeneous coordinates, as will be shown case by case below.
(1) Translations: A translation T : ℝ³ → ℝ³ defined by T(x) = x + x₀, where x₀ = (x₀, y₀, z₀), can be represented by a matrix multiplication in homogeneous coordinates as

    [ 1 0 0 x₀     [ x       [ x + x₀
      0 1 0 y₀  ·    y    =    y + y₀
      0 0 1 z₀       z         z + z₀
      0 0 0 1  ]     1 ]         1     ].

(2) Rotations: With the notation of Example 4.28,

    R_(x,α) = [ 1    0       0    0        R_(y,β) = [ cos β  0  −sin β  0
                0  cos α  −sin α  0                      0    1    0     0
                0  sin α   cos α  0                    sin β  0   cos β  0
                0    0       0    1 ],                   0    0    0     1 ],

    R_(z,γ) = [ cos γ  −sin γ  0  0
                sin γ   cos γ  0  0
                  0       0    1  0
                  0       0    0  1 ]

are the rotations about the x-, y- and z-axes by the angles α, β and γ in homogeneous coordinates, respectively.

(3) Reflections: An xy-reflection is represented by a matrix multiplication in homogeneous coordinates as

    [ 1 0  0 0     [ x       [  x
      0 1  0 0  ·    y    =     y
      0 0 −1 0       z         −z
      0 0  0 1 ]     1 ]        1 ].
Similarly, one can have an xz-reflection and a yz-reflection.

(4) Shear: An xy-shear can be represented by a matrix multiplication in homogeneous coordinates as

    [ 1 a 0 0     [ x       [ x + ay
      b 1 0 0  ·    y    =    bx + y
      0 0 1 0       z           z
      0 0 0 1 ]     1 ]         1    ].

Similarly, one can have an xz-shear and a yz-shear with matrices of the same form.

(5) Scaling: A scaling is represented by a matrix multiplication in homogeneous coordinates as

    [ s₁ 0  0  0     [ x       [ s₁x
      0  s₂ 0  0  ·    y    =    s₂y
      0  0  s₃ 0       z         s₃z
      0  0  0  1 ]     1 ]        1  ].

In summary, all of these transformations can be represented by matrix multiplications in homogeneous coordinates, and their compositions can also be carried out by the corresponding matrix multiplications.
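The point of homogeneous coordinates is that a rigid motion "rotate, then translate" becomes a single 4 × 4 matrix product. The sketch below (with an arbitrary translation vector and angle of our own choosing) composes a z-rotation by π/2 with a translation by (1, 2, 3) and applies the result to the point (1, 0, 0):

```python
# Sketch: compose a homogeneous 4x4 translation with a z-rotation and apply
# the product to the point (1, 0, 0, 1). Rotation by pi/2 sends (1,0,0) to
# (0,1,0); the translation then shifts by (1, 2, 3), giving (1, 3, 3).
from math import cos, sin, pi

def matmul4(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x0, y0, z0):
    return [[1, 0, 0, x0], [0, 1, 0, y0], [0, 0, 1, z0], [0, 0, 0, 1]]

def rotation_z(g):
    return [[cos(g), -sin(g), 0, 0], [sin(g), cos(g), 0, 0],
            [0, 0, 1, 0], [0, 0, 0, 1]]

M = matmul4(translation(1, 2, 3), rotation_z(pi / 2))  # rotate first, then translate
p = [1, 0, 0, 1]                                       # (1, 0, 0) in homogeneous coords
q = [sum(M[i][k] * p[k] for k in range(4)) for i in range(4)]
```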
4.8 Exercises

4.1. Which of the following functions T are linear transformations?
(1) T(x, y) = (x², x² + y²).
(2) T(x, y, z) = (x + y, 0, 2x + 4z).
(3) T(x, y) = (sin x, y).
(4) T(x, y) = (x + 1, 2y, x + y).
(5) T(x, y, z) = (|x|, 0).
4.2. Let T : P₂(ℝ) → P₃(ℝ) be a linear transformation such that T(1) = 1, T(x) = x² and T(x²) = x³ + x. Find T(ax² + bx + c).

4.3. Find S ∘ T and/or T ∘ S whenever it is defined.
(1) T(x, y, z) = (x − y + z, x + z), S(x, y) = (x, x − y, y);
(2) T(x, y) = (x, 3y + x, 2x − 4y, y), S(x, y, z) = (2x, y).

4.4. Let S : C(ℝ) → C(ℝ) be the function on the vector space C(ℝ) defined by, for f ∈ C(ℝ),

    S(f)(x) = f(x) − ∫_0^x u f(u) du.
Show that S is a linear transformation on the vector space C(ℝ).

4.5. Let T be a linear transformation on a vector space V such that T² = id and T ≠ id. Let U = {v ∈ V : T(v) = v} and W = {v ∈ V : T(v) = −v}. Show that
(1) at least one of U and W is a nonzero subspace of V;
(2) U ∩ W = {0};
(3) V = U + W.

4.6. If T : ℝ³ → ℝ³ is defined by T(x, y, z) = (2x − z, 3x − 2y, x − 2y + z),
(1) determine the null space N(T) of T,
(2) determine whether T is one-to-one,
(3) find a basis for N(T).

4.7. Show that each of the following linear transformations T on ℝ³ is invertible, and find a formula for T⁻¹:
(1) T(x, y, z) = (3x, x − y, 2x + y + z).
(2) T(x, y, z) = (2x, 4x − y, 2x + 3y − z).
4.8. Let S, T : V → V be linear transformations on a vector space V.
(1) Show that if T ∘ S is one-to-one, then T is an isomorphism.
(2) Show that if T ∘ S is onto, then T is an isomorphism.
(3) Show that if Tᵏ is an isomorphism for some positive integer k, then T is an isomorphism.

4.9. Let T be a linear transformation from ℝ³ to ℝ², and let S be a linear transformation from ℝ² to ℝ³. Prove that the composition S ∘ T is not invertible.

4.10. Let T be a linear transformation on a vector space V satisfying T − T² = id. Show that T is invertible.
4.11. Let A be an n × n matrix, regarded as a linear transformation on the n-space ℝⁿ by the matrix multiplication Ax for any x ∈ ℝⁿ. Suppose that r₁, r₂, ..., rₙ are linearly independent vectors in ℝⁿ constituting a parallelepiped (see Remark (2) on page 70). Then A transforms this parallelepiped into another parallelepiped determined by Ar₁, Ar₂, ..., Arₙ. Denote by B the n × n matrix whose j-th column is rⱼ, and by C the n × n matrix whose j-th column is Arⱼ. Prove that

    vol(P(C)) = |det A| vol(P(B)).

(This means that, for a square matrix A considered as a linear transformation, the absolute value of the determinant of A is the ratio between the volumes of a parallelepiped P(B) and its image parallelepiped P(C) under the transformation by A. If det A = 0, then the image P(C) is a parallelepiped in a subspace of dimension less than n.)

4.12. Let T : ℝ³ → ℝ³ be the linear transformation given by

    T(x, y, z) = (x + y, y + z, x + z).

Let C denote the unit cube in ℝ³ determined by the standard basis e₁, e₂, e₃. Find the volume of the image parallelepiped T(C) of C under T.
4.13. With respect to the ordered basis α = {1, x, x²} for the vector space P₂(ℝ), find the coordinate vectors of the following polynomials:
(1) f(x) = x² − x + 1, (2) f(x) = x² + 4x − 1, (3) f(x) = 2x + 5.
4.14. Let T : P₃(ℝ) → P₃(ℝ) be the linear transformation defined by

    Tf(x) = f''(x) − 4f'(x) + f(x).

Find the matrix [T]_α for the basis α = {x, 1 + x, x + x², x³}.

4.15. Let T be the linear transformation on ℝ² defined by T(x, y) = (−y, x).
(1) What is the matrix of T with respect to an ordered basis α = {v₁, v₂}, where v₁ = (1, 2), v₂ = (1, −1)?
(2) Show that for every real number c the linear transformation T − c·id is invertible.
4.16. Find the matrix representation of each of the following linear transformations T on ℝ² with respect to the standard basis {e₁, e₂}.
(1) T(x, y) = (2y, 3x − y).
(2) T(x, y) = (3x − 4y, x + 5y).

4.17. Let M = [ ... ] be the given 2 × 3 matrix.
(1) Find the unique linear transformation T : ℝ³ → ℝ² so that M is the associated matrix of T with respect to the bases α₁ = { ... } for ℝ³ and α₂ = { ... } for ℝ².
(2) Find T(x, y, z).
4.18. Find the matrix representation of each of the following linear transformations T on P₂(ℝ) with respect to the basis {1, x, x²}.
(1) T : p(x) ↦ p(x + 1).
(2) T : p(x) ↦ p'(x).
(3) T : p(x) ↦ p(0)x.
(4) T : p(x) ↦ p(x) − p(0).
4.19. Consider the following ordered bases of ℝ³: α = {e₁, e₂, e₃}, the standard basis, and β = {u₁ = (1, 1, 1), u₂ = (1, 1, 0), u₃ = (1, 0, 0)}.
(1) Find the basis-change matrix P from α to β.
(2) Find the basis-change matrix Q from β to α.
(3) Verify that Q = P⁻¹.
(4) Show that [v]_β = P[v]_α for any vector v ∈ ℝ³.
(5) Show that [T]_β = Q⁻¹[T]_α Q for the linear transformation T defined by T(x, y, z) = (2y + x, x − 4y, 3x).

4.20. Show that there are no matrices A and B in M_n×n(ℝ) such that AB − BA = Iₙ.
4.21. Let T : ℝ³ → ℝ² be the linear transformation defined by

    T(x, y, z) = (3x + 2y − 4z, x − 5y + 3z),

and let α = {(1, 1, 1), (1, 1, 0), (1, 0, 0)} and β = {(2, 3), (2, 5)} be bases for ℝ³ and ℝ², respectively.
(1) Find the associated matrix [T]_α^β for T.
(2) Verify that [T]_α^β [v]_α = [T(v)]_β for any v ∈ ℝ³.
4.22. Find the basis-change matrix [id]_α^β from α to β, when
(1) α = {(1, 0), (0, 1)}, β = {(6, 4), (4, 8)};
(2) α = {(5, 1), (1, 2)}, β = {(1, 0), (0, 1)};
(3) α = {(1, 1, 1), (1, 1, 0), (1, 0, 0)}, β = {(2, 0, 3), (−1, 4, 1), (3, 2, 5)};
(4) α = {t, 1, t²}, β = {3 + 2t + t², t² − 4, 2 + t}.
4.23. Show that all matrices of the form A_θ = [ cos θ  sin θ; sin θ  −cos θ ] are similar.

4.24. Show that the matrix A = [ 1 1; 0 1 ] cannot be similar to a diagonal matrix.

4.25. Are the matrices [ ... ] and [ ... ] similar?

4.26. For a linear transformation T on a vector space V, show that T is one-to-one if and only if its transpose T* is one-to-one.
4.27. Let T : ℝ³ → ℝ³ be the linear transformation defined by

    T(x, y, z) = (2y + z, −x + 4y + z, x + z).

Compute [T]_α and [T*]_α* for the standard basis α = {e₁, e₂, e₃}.

4.28. Let T be the linear transformation from ℝ³ into ℝ² defined by T(x₁, x₂, x₃) = (x₁ + x₂, 2x₃ − x₁).
(1) For the standard ordered bases α and β for ℝ³ and ℝ², respectively, find the associated matrix for T with respect to the bases α and β.
(2) Let α = {x₁, x₂, x₃} and β = {y₁, y₂}, where x₁ = (1, 0, −1), x₂ = (1, 1, 1), x₃ = (1, 0, 0), and y₁ = (0, 1), y₂ = (1, 0). Find the associated matrices [T]_α^β and [T*]_β*^α*.
4.29. Let T be the linear transformation from ℝ³ to ℝ⁴ defined by

    T(x, y, z) = (2x + y + 4z, x + y + 2z, y + 2z, x + y + 3z).

Find the image and the kernel of T. What is the dimension of Im(T)? Find [T]_α^β and [T*]_β*^α*, where

    α = {(1, 0, 0), (0, 1, 0), (0, 0, 1)},
    β = {(1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1)}.

4.30. Let T be the linear transformation on V = ℝ³ for which the associated matrix with respect to the standard ordered basis is the given matrix A = [ ... ]. Find bases for the kernel and the image of the transpose T* on V*.
4.31. Define three linear functionals on the vector space V = P₂(ℝ) by

    f₁(p) = ∫_0^1 p(x) dx,   f₂(p) = ∫_0^2 p(x) dx,   f₃(p) = ∫_0^−1 p(x) dx.

Show that {f₁, f₂, f₃} is a basis for V* by finding its dual basis for V.
4.32. Determine whether or not the following statements are true in general, and justify your answers.
(1) For a linear transformation T : ℝⁿ → ℝᵐ, Ker(T) = {0} if m > n.
(2) For a linear transformation T : ℝⁿ → ℝᵐ, Ker(T) ≠ {0} if m < n.
(3) A linear transformation T : ℝⁿ → ℝᵐ is one-to-one if and only if the null space of [T]_α^β is {0} for any basis α for ℝⁿ and any basis β for ℝᵐ.
(4) For any linear transformation T on ℝⁿ, the dimension of the image of T is equal to that of the row space of [T]_α for any basis α for ℝⁿ.
(5) For any two linear transformations T : V → W and S : W → Z, if Ker(S ∘ T) = {0}, then Ker(T) = {0}.
(6) Any polynomial p(x) is linear if and only if the degree of p(x) is less than or equal to 1.
(7) Let T : ℝ³ → ℝ² be a function given by T(x) = (T₁(x), T₂(x)) for any x ∈ ℝ³. Then T is linear if and only if its coordinate functions Tᵢ, i = 1, 2, are linear.
(8) For a linear transformation T : ℝⁿ → ℝⁿ, if [T]_α^β = Iₙ for some bases α and β of ℝⁿ, then T must be the identity transformation.
(9) If a linear transformation T : ℝⁿ → ℝⁿ is one-to-one, then any matrix representation of T is nonsingular.
(10) Any m × n matrix A can be a matrix representation of a linear transformation T : ℝⁿ → ℝᵐ.
(11) Every basis-change matrix is invertible.
(12) A matrix similar to a basis-change matrix is also a basis-change matrix.
(13) det : M_n×n(ℝ) → ℝ is a linear functional.
(14) Every translation in ℝⁿ is a linear transformation.
5
Inner Product Spaces
5.1 Dot products and inner products

To study the geometry of a vector space, we go back to the case of the 3-space ℝ³. The dot (or Euclidean inner) product of two vectors x = (x₁, x₂, x₃) and y = (y₁, y₂, y₃) in ℝ³ is the number defined by the formula

    x · y = x₁y₁ + x₂y₂ + x₃y₃ = [x₁ x₂ x₃] [ y₁
                                              y₂
                                              y₃ ] = xᵀy,

where xᵀy is the matrix product of xᵀ and y, which is also a number identified with the 1 × 1 matrix xᵀy. Using the dot product, the length (or magnitude) of a vector x = (x₁, x₂, x₃) is defined by

    ‖x‖ = (x · x)^(1/2) = √(x₁² + x₂² + x₃²),

and the Euclidean distance between two vectors x and y in ℝ³ is defined by d(x, y) = ‖x − y‖.
In this way, the dot product can be considered to be a ruler for measuring the length of a line segment in the 3-space ℝ³. Furthermore, it can also be used to measure the angle between two nonzero vectors: in fact, the angle θ between two vectors x and y in ℝ³ is measured by the formula involving the dot product

    cos θ = (x · y) / (‖x‖ ‖y‖),   0 ≤ θ ≤ π,

since the dot product satisfies the formula

    x · y = ‖x‖ ‖y‖ cos θ.

In particular, two vectors x and y are orthogonal (i.e., they form a right angle θ = π/2) if and only if the Pythagorean theorem holds:

    ‖x + y‖² = ‖x‖² + ‖y‖².
J H Kwak et al., Linear Algebra © Birkhauser Boston 2004
By rewriting this formula in terms of the dot product, we obtain another equivalent condition: x · y = x₁y₁ + x₂y₂ + x₃y₃ = 0. In fact, this dot product is one of the most important structures with which ℝ³ is equipped. Euclidean geometry begins with the vector space ℝ³ together with the dot product, because the Euclidean distance can be defined by the dot product. The dot product has a direct extension to the n-space ℝⁿ of any dimension n: for any two vectors x = (x₁, x₂, …, xₙ) and y = (y₁, y₂, …, yₙ) in ℝⁿ, their dot product, also called the Euclidean inner product, and the length (or magnitude) of a vector are defined similarly as

x · y = x₁y₁ + x₂y₂ + ⋯ + xₙyₙ = xᵀy,
‖x‖ = (x · x)^{1/2} = √(x₁² + x₂² + ⋯ + xₙ²).
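These definitions translate directly into code. A minimal sketch in pure Python (the function names `dot`, `norm`, `distance`, and `angle` are our own, not from the text):

```python
import math

def dot(x, y):
    # x . y = x1*y1 + ... + xn*yn
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    # ||x|| = sqrt(x . x)
    return math.sqrt(dot(x, x))

def distance(x, y):
    # d(x, y) = ||x - y||
    return norm([xi - yi for xi, yi in zip(x, y)])

def angle(x, y):
    # cos(theta) = (x . y) / (||x|| ||y||), theta in [0, pi]
    return math.acos(dot(x, y) / (norm(x) * norm(y)))

x, y = (1.0, 1.0, 0.0), (1.0, 0.0, 0.0)
print(dot(x, y))              # 1.0
print(round(angle(x, y), 4))  # pi/4 = 0.7854
```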
To extend this notion of the dot product to a (real) vector space, we extract the most essential properties that the dot product in ℝⁿ satisfies and take these properties as axioms for an inner product on a vector space V.

Definition 5.1 An inner product on a real vector space V is a function that associates a real number ⟨x, y⟩ to each pair of vectors x and y in V in such a way that the following rules are satisfied: For any vectors x, y and z in V and any scalar k in ℝ,

(1) ⟨x, y⟩ = ⟨y, x⟩ (symmetry),
(2) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ (additivity),
(3) ⟨kx, y⟩ = k⟨x, y⟩ (homogeneity),
(4) ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 ⟺ x = 0 (positive definiteness).

A pair (V, ⟨ , ⟩) of a (real) vector space V and an inner product ⟨ , ⟩ is called a (real) inner product space. In particular, the pair (ℝⁿ, ·) is called the Euclidean n-space. Note that by symmetry (1), additivity (2) and homogeneity (3) also hold for the second variable: i.e.,

(2′) ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩,
(3′) ⟨x, ky⟩ = k⟨x, y⟩.
It is easy to show that ⟨0, y⟩ = 0⟨0, y⟩ = 0 and also ⟨x, 0⟩ = 0.

Remark: In Definition 5.1, the rules (2) and (3) mean that the inner product is linear in the first variable, and the rules (2′) and (3′) above mean that the inner product is also linear in the second variable. In this sense, the inner product is called bilinear.

Example 5.1 (Non-Euclidean inner product on ℝ²) For any two vectors x = (x₁, x₂) and y = (y₁, y₂) in ℝ², define
⟨x, y⟩ = ax₁y₁ + c(x₁y₂ + x₂y₁) + bx₂y₂ = [x₁ x₂] [a c ; c b] [y₁ ; y₂] = xᵀAy,

where a, b and c are arbitrary real numbers. Then this function ⟨ , ⟩ clearly satisfies the first three rules of the inner product, i.e., ⟨ , ⟩ is symmetric and bilinear. Moreover, if a > 0 and det A = ab − c² > 0 hold, then it also satisfies rule (4), the positive definiteness of the inner product. (Hint: ⟨x, x⟩ = ax₁² + 2cx₁x₂ + bx₂² ≥ 0 if and only if either x₂ = 0 or the discriminant of ⟨x, x⟩/x₂², as a quadratic in x₁/x₂, is nonpositive.) In the case of c = 0, this reduces to ⟨x, y⟩ = ax₁y₁ + bx₂y₂. Notice also that a = ⟨e₁, e₁⟩, b = ⟨e₂, e₂⟩ and c = ⟨e₁, e₂⟩ = ⟨e₂, e₁⟩. □
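One can spot-check the claims of Example 5.1 numerically. The sketch below (the helper `ip` is our own, not from the text) verifies symmetry, the identities a = ⟨e₁, e₁⟩, b = ⟨e₂, e₂⟩, c = ⟨e₁, e₂⟩, and positive definiteness on a small grid, for one choice of a, b, c with a > 0 and ab − c² > 0:

```python
def ip(x, y, a, b, c):
    # <x, y> = a*x1*y1 + c*(x1*y2 + x2*y1) + b*x2*y2 = x^T A y, A = [[a, c], [c, b]]
    return a * x[0] * y[0] + c * (x[0] * y[1] + x[1] * y[0]) + b * x[1] * y[1]

a, b, c = 2.0, 3.0, 1.0          # a > 0 and ab - c^2 = 5 > 0
x, y = (1.0, -2.0), (3.0, 0.5)

# symmetry
assert ip(x, y, a, b, c) == ip(y, x, a, b, c)
# a = <e1,e1>, b = <e2,e2>, c = <e1,e2>
e1, e2 = (1.0, 0.0), (0.0, 1.0)
assert ip(e1, e1, a, b, c) == a and ip(e2, e2, a, b, c) == b
assert ip(e1, e2, a, b, c) == c
# positive definiteness: <x,x> > 0 for x != 0 (spot check on a grid)
assert all(ip((s, t), (s, t), a, b, c) > 0
           for s in (-2, -1, 1, 2) for t in (-2, -1, 0, 1, 2))
print("all axioms spot-checked")
```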
Problem 5.1 In Example 5.1, the converse is also true: Prove that if ⟨x, y⟩ = xᵀAy is an inner product on ℝ², then a > 0 and ab − c² > 0.
Example 5.2 (Case of x ≠ 0 ≠ y but ⟨x, y⟩ = 0) Let V = C[0, 1] be the vector space of all real-valued continuous functions on [0, 1]. For any two functions f and g in V, define

⟨f, g⟩ = ∫₀¹ f(x)g(x) dx.

Then ⟨ , ⟩ is an inner product on V (verify this). Let

f(x) = 1 − 2x if 0 ≤ x ≤ 1/2, and f(x) = 0 if 1/2 ≤ x ≤ 1;
g(x) = 0 if 0 ≤ x ≤ 1/2, and g(x) = 2x − 1 if 1/2 ≤ x ≤ 1.

Then f ≠ 0 ≠ g, but ⟨f, g⟩ = 0. □
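The orthogonality in Example 5.2 can be checked numerically. A sketch using a simple midpoint-rule approximation of the integral (the helper `inner` and the step count are our own choices, not from the text):

```python
def inner(f, g, n=2000):
    # <f, g> = integral_0^1 f(x) g(x) dx, approximated by the midpoint rule
    h = 1.0 / n
    return sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(n)) * h

f = lambda x: 1 - 2 * x if x <= 0.5 else 0.0
g = lambda x: 0.0 if x <= 0.5 else 2 * x - 1

print(inner(f, f))   # > 0, so f != 0
print(inner(g, g))   # > 0, so g != 0
print(inner(f, g))   # 0.0: f and g are orthogonal, since f*g vanishes pointwise
```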
By a subspace W of an inner product space V, we mean a subspace of the vector space V together with the inner product that is the restriction of the inner product on V to W.

Example 5.3 (A subspace as an inner product space) The set W = D¹[0, 1] of all real-valued differentiable functions on [0, 1] is a subspace of V = C[0, 1]. The restriction to W of the inner product on V defined in Example 5.2 makes W an inner product subspace of V. However, one can define another inner product on W by the following formula: For any two functions f(x) and g(x) in W,

⟨⟨f, g⟩⟩ = ∫₀¹ f(x)g(x) dx + ∫₀¹ f′(x)g′(x) dx.

Then ⟨⟨ , ⟩⟩ is also an inner product on W, which is different from the restriction to W of the inner product of V, and hence W with this new inner product is not a subspace of the inner product space V. □
Remark: From vector calculus, most readers may already be familiar with the dot product (or inner product) and the cross product (or outer product) in the 3-space ℝ³. The concept of the dot product extends to the higher dimensional Euclidean space ℝⁿ as in this section. However, it is known in advanced mathematics that the cross product in the 3-space ℝ³ cannot be extended to an arbitrary higher dimensional Euclidean space ℝⁿ. In fact, if there is a bilinear function f : ℝⁿ × ℝⁿ → ℝⁿ, f(x, y) = x × y, satisfying the properties that x × y is perpendicular to both x and y and that ‖x × y‖² = ‖x‖²‖y‖² − (x · y)², then n = 3 or 7. Hence, the cross product (outer product) will not be introduced in linear algebra.
5.2 The lengths and angles of vectors

In this section, we study the geometry of an inner product space by introducing a length, an angle and a distance between two vectors. The following inequality will enable us to define an angle between two vectors in an inner product space V.

Theorem 5.1 (Cauchy–Schwarz inequality) If x and y are vectors in an inner product space V, then

⟨x, y⟩² ≤ ⟨x, x⟩ ⟨y, y⟩.

Proof: If x = 0, it is clear. Assume x ≠ 0. For any scalar t, we have

0 ≤ ⟨tx + y, tx + y⟩ = ⟨x, x⟩t² + 2⟨x, y⟩t + ⟨y, y⟩.

This inequality implies that the polynomial ⟨x, x⟩t² + 2⟨x, y⟩t + ⟨y, y⟩ in t has either no real roots or a repeated real root. Therefore, its discriminant must be nonpositive:

⟨x, y⟩² − ⟨x, x⟩⟨y, y⟩ ≤ 0,

which implies the inequality. □
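The Cauchy–Schwarz inequality is easy to probe numerically. A sketch that checks it on random vectors in ℝ⁴, together with the equality case of Problem 5.2 (helper names are our own):

```python
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

random.seed(1)
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(4)]
    y = [random.uniform(-1, 1) for _ in range(4)]
    # <x,y>^2 <= <x,x><y,y>  (small tolerance for rounding)
    assert dot(x, y) ** 2 <= dot(x, x) * dot(y, y) + 1e-12

# equality holds when x and y are linearly dependent (Problem 5.2)
x = [1.0, 2.0, -3.0, 0.5]
y = [-2 * t for t in x]
assert abs(dot(x, y) ** 2 - dot(x, x) * dot(y, y)) < 1e-9
print("Cauchy-Schwarz verified on random samples")
```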
Problem 5.2 Prove that the equality in the Cauchy–Schwarz inequality holds if and only if the vectors x and y are linearly dependent.
The lengths of vectors and angles between two vectors in an inner product space are defined in a similar way to the case of the Euclidean n-space.

Definition 5.2 Let V be an inner product space.
(1) The magnitude ‖x‖ (or the length) of a vector x is defined by ‖x‖ = √⟨x, x⟩.
(2) The distance d(x, y) between two vectors x and y is defined by d(x, y) = ‖x − y‖.
(3) From the Cauchy–Schwarz inequality, we have −1 ≤ ⟨x, y⟩/(‖x‖ ‖y‖) ≤ 1 for any two nonzero vectors x and y. Hence, there is a unique number θ ∈ [0, π] such that

cos θ = ⟨x, y⟩ / (‖x‖ ‖y‖), or ⟨x, y⟩ = ‖x‖ ‖y‖ cos θ.
Such a number θ is called the angle between x and y.

For example, the dot product in the Euclidean 3-space ℝ³ defines the Euclidean distance in ℝ³. However, one can define infinitely many non-Euclidean distances in ℝ³, as shown in the following example.

Example 5.4 (Infinitely many different inner products on ℝ² or ℝ³)
(1) In ℝ² equipped with the inner product ⟨x, y⟩ = 2x₁y₁ + 3x₂y₂, the angle between x = (1, 1) and y = (1, 0) is computed as

cos θ = ⟨x, y⟩ / (‖x‖ ‖y‖) = 2 / (√5 √2) = 0.6324….

Thus θ = cos⁻¹(2/√10). Notice that in the Euclidean 2-space ℝ² with the dot product, the angle between x = (1, 1) and y = (1, 0) is clearly π/4 and cos(π/4) = 1/√2 = 0.7071…. This shows that the angle between two vectors actually depends on the choice of an inner product on a vector space.
(2) For any diagonal matrix A = diag(d₁, d₂, d₃) with all dᵢ > 0, the formula ⟨x, y⟩ = xᵀAy defines an inner product on ℝ³. Thus, there are infinitely many different inner products on ℝ³. Moreover, an inner product on the 3-space ℝ³ may play the roles of a ruler and a protractor in our physical world ℝ³. □

Problem 5.3 In Example 5.4(2), show that xᵀAy cannot be an inner product if A has a negative diagonal entry dᵢ < 0.

Problem 5.4 Prove the following properties of length in an inner product space V: For any vectors x, y ∈ V and any scalar k,
(1) ‖x‖ ≥ 0,
(2) ‖x‖ = 0 if and only if x = 0,
(3) ‖kx‖ = |k| ‖x‖,
(4) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality).
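Example 5.4(1) can be reproduced in a few lines. The sketch below (our own helpers, assuming a diagonal weight vector w with positive entries) computes the angle between (1, 1) and (1, 0) under the dot product and under ⟨x, y⟩ = 2x₁y₁ + 3x₂y₂:

```python
import math

def ip(x, y, w):
    # weighted inner product <x, y> = sum_i w_i * x_i * y_i  (all w_i > 0)
    return sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))

def angle(x, y, w):
    return math.acos(ip(x, y, w) / math.sqrt(ip(x, x, w) * ip(y, y, w)))

x, y = (1.0, 1.0), (1.0, 0.0)
euclid = angle(x, y, (1.0, 1.0))   # dot product: theta = pi/4
skewed = angle(x, y, (2.0, 3.0))   # <x,y> = 2*x1*y1 + 3*x2*y2
print(round(math.cos(euclid), 4))  # 1/sqrt(2)  = 0.7071
print(round(math.cos(skewed), 4))  # 2/sqrt(10) = 0.6325
```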
Problem 5.5 Let V be an inner product space. Show that for any vectors x, y and z in V,
(1) d(x, y) ≥ 0,
(2) d(x, y) = 0 if and only if x = y,
(3) d(x, y) = d(y, x),
(4) d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality).
Definition 5.3 Two vectors x and y in an inner product space are said to be orthogonal (or perpendicular) if ⟨x, y⟩ = 0.

Note that for nonzero vectors x and y, ⟨x, y⟩ = 0 if and only if θ = π/2.

Lemma 5.2 Let V be an inner product space and let x ∈ V. Then the vector x is orthogonal to every vector y in V (i.e., ⟨x, y⟩ = 0 for all y in V) if and only if x = 0.

Proof: If x = 0, clearly ⟨x, y⟩ = 0 for all y in V. Conversely, suppose that ⟨x, y⟩ = 0 for all y in V. Then ⟨x, x⟩ = 0, implying x = 0 by positive definiteness. □
Corollary 5.3 Let V be an inner product space, and let α = {v₁, …, vₙ} be a basis for V. Then a vector x in V is orthogonal to every basis vector vᵢ in α if and only if x = 0.

Proof: If ⟨x, vᵢ⟩ = 0 for i = 1, 2, …, n, then ⟨x, y⟩ = Σᵢ₌₁ⁿ yᵢ⟨x, vᵢ⟩ = 0 for any y = Σᵢ₌₁ⁿ yᵢvᵢ ∈ V. □
Example 5.5 (Pythagorean theorem) Let V be an inner product space, and let x and y be any two nonzero vectors in V with angle θ. Then ⟨x, y⟩ = ‖x‖ ‖y‖ cos θ gives the equality

‖x + y‖² = ‖x‖² + ‖y‖² + 2‖x‖ ‖y‖ cos θ.

Moreover, it yields the Pythagorean theorem: ‖x + y‖² = ‖x‖² + ‖y‖² for any orthogonal vectors x and y. □

Theorem 5.4 If x₁, x₂, …, x_k are nonzero mutually orthogonal vectors in an inner product space V (i.e., each vector is orthogonal to every other vector), then they are linearly independent.

Proof: Suppose c₁x₁ + c₂x₂ + ⋯ + c_k x_k = 0. Then for each i = 1, 2, …, k,

0 = ⟨0, x_i⟩ = ⟨c₁x₁ + ⋯ + c_k x_k, x_i⟩ = c₁⟨x₁, x_i⟩ + ⋯ + c_i⟨x_i, x_i⟩ + ⋯ + c_k⟨x_k, x_i⟩ = c_i ‖x_i‖²,

because x₁, x₂, …, x_k are mutually orthogonal. Since each x_i is not the zero vector, ‖x_i‖ ≠ 0; so c_i = 0 for i = 1, 2, …, k. □
Problem 5.6 Let f(x) and g(x) be continuous real-valued functions on [0, 1]. Prove
(1) [∫₀¹ f(x)g(x) dx]² ≤ [∫₀¹ f²(x) dx] [∫₀¹ g²(x) dx],
(2) [∫₀¹ (f(x) + g(x))² dx]^{1/2} ≤ [∫₀¹ f²(x) dx]^{1/2} + [∫₀¹ g²(x) dx]^{1/2}.
Problem 5.7 Let V = C[0, 1] be the inner product space of all real-valued continuous functions on [0, 1] equipped with the inner product

⟨f, g⟩ = ∫₀¹ f(x)g(x) dx for any f and g in V.

For the following two functions f and g in V, compute the angle between them: For any natural numbers k, ℓ,
(1) f(x) = kx and g(x) = ℓx,
(2) f(x) = sin 2πkx and g(x) = sin 2πℓx,
(3) f(x) = cos 2πkx and g(x) = cos 2πℓx.
5.3 Matrix representations of inner products

Let A be an n × n diagonal matrix with positive diagonal entries. Then one can show that ⟨x, y⟩ = xᵀAy defines an inner product on the n-space ℝⁿ, as shown in Example 5.4(2). The converse is also true: every inner product on a vector space can be expressed in such a matrix product form. Let (V, ⟨ , ⟩) be an inner product space, and let α = {v₁, v₂, …, vₙ} be a fixed ordered basis for V. Then for any x = Σᵢ₌₁ⁿ xᵢvᵢ and y = Σⱼ₌₁ⁿ yⱼvⱼ in V,

⟨x, y⟩ = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ xᵢyⱼ ⟨vᵢ, vⱼ⟩

holds. If we set aᵢⱼ = ⟨vᵢ, vⱼ⟩ for i, j = 1, 2, …, n, then these numbers constitute a symmetric matrix A = [aᵢⱼ], since ⟨vᵢ, vⱼ⟩ = ⟨vⱼ, vᵢ⟩. Thus, in matrix notation, the inner product may be written as

⟨x, y⟩ = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ xᵢyⱼ aᵢⱼ = [x]_αᵀ A [y]_α.

The matrix A is called the matrix representation of the inner product ⟨ , ⟩ with respect to the basis α.
Example 5.6 (Matrix representation of an inner product)
(1) With respect to the standard basis {e₁, e₂, …, eₙ} for the Euclidean n-space ℝⁿ, the matrix representation of the dot product is the identity matrix, since eᵢ · eⱼ = δᵢⱼ. Thus, for x = Σᵢ xᵢeᵢ and y = Σⱼ yⱼeⱼ ∈ ℝⁿ, the dot product is just the matrix product xᵀy.
(2) On V = P₂([0, 1]), we define an inner product on V as

⟨f, g⟩ = ∫₀¹ f(x)g(x) dx.

Then for the basis α = {f₁(x) = 1, f₂(x) = x, f₃(x) = x²} for V, one can easily find its matrix representation A = [aᵢⱼ]: For instance,

a₂₃ = ⟨f₂, f₃⟩ = ∫₀¹ f₂(x)f₃(x) dx = ∫₀¹ x · x² dx = 1/4. □
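The Gram matrix in Example 5.6(2) can be computed exactly: since ⟨xⁱ, xʲ⟩ = ∫₀¹ x^{i+j} dx = 1/(i + j + 1), the matrix representation of this inner product with respect to {1, x, x²} is the 3 × 3 Hilbert matrix. A sketch with exact rational arithmetic:

```python
from fractions import Fraction

# <x^i, x^j> = integral_0^1 x^(i+j) dx = 1/(i+j+1), so the matrix
# representation of the inner product w.r.t. {1, x, x^2} is the Hilbert matrix.
A = [[Fraction(1, i + j + 1) for j in range(3)] for i in range(3)]
for row in A:
    print(row)
# a_23 = <x, x^2> = 1/4 (1-based indices, matching the example)
print(A[1][2])   # 1/4
```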
The expression of the dot product as a matrix product is very useful in stating or proving theorems in the Euclidean space. For any symmetric matrix A and a fixed basis α, the formula ⟨x, y⟩ = [x]_αᵀ A [y]_α seems to give rise to an inner product on V. In fact, the formula clearly is symmetric and bilinear, but it does not necessarily satisfy the fourth rule, positive definiteness. The following theorem gives a necessary condition for a symmetric matrix A to give rise to an inner product. Some necessary and sufficient conditions will be discussed in Chapter 8.

Theorem 5.5 The matrix representation A of an inner product (with respect to any basis) on a vector space V is invertible. That is, det A ≠ 0.

Proof: Let ⟨x, y⟩ = [x]_αᵀ A [y]_α be an inner product on a vector space V with respect to a basis α, and suppose A[y]_α = 0 as a homogeneous system of linear equations. Then

⟨y, y⟩ = [y]_αᵀ A [y]_α = 0.

It implies that A[y]_α = 0 has only the trivial solution y = 0, or equivalently A is invertible by Theorem 1.9. □

Recall that the conditions a > 0 and det A = ab − c² > 0 in Example 5.1 are sufficient for A to give rise to an inner product on ℝ².
5.4 Gram–Schmidt orthogonalization

The standard basis for the Euclidean n-space ℝⁿ has a special property: the basis vectors are mutually orthogonal and are of length 1. In this sense, it is called the rectangular coordinate system for ℝⁿ. In an inner product space, a vector of length 1 is called a unit vector. If x is a nonzero vector in an inner product space V, the vector x/‖x‖ is a unit vector. The process of obtaining a unit vector from a nonzero vector by multiplying it by the reciprocal of its length is called normalization. Thus, if there is a set of mutually orthogonal vectors (or a basis) in an inner product space, then the vectors can be converted to unit vectors by normalizing them without losing their mutual orthogonality.

Definition 5.4 A set of vectors x₁, x₂, …, x_k in an inner product space V is said to be orthonormal if

⟨x_i, x_j⟩ = 0 for i ≠ j (orthogonality),
‖x_i‖ = 1 for each i (normality).

A set {x₁, x₂, …, xₙ} of vectors is called an orthonormal basis for V if it is a basis and orthonormal.

Problem 5.8 Determine whether each of the following sets of vectors in ℝ² is orthogonal, orthonormal, or neither with respect to the Euclidean inner product:
(1) { … }, (2) { … }, (3) { … }, (4) {(1/√2, 1/√2), (−1/√2, 1/√2)}.
It will be shown later in Theorem 5.6 that every inner product space has an orthonormal basis, just like the standard basis for the Euclidean n-space ℝⁿ. The following example illustrates how to construct such an orthonormal basis.

Example 5.7 (How to construct an orthonormal basis?) For the matrix

A = [ 1 1 2
      1 2 2
      1 0 4
      1 1 0 ],

find an orthonormal basis for the column space C(A) of A.

Solution: Let c₁, c₂ and c₃ be the column vectors of A, in order from left to right. It is easily verified that they are linearly independent, so they form a basis for the column space C(A) of dimension 3 in ℝ⁴. For notational convenience, we denote by Span{x₁, …, x_k} the subspace spanned by {x₁, …, x_k}.

(1) First normalize c₁ to get

u₁ = c₁/‖c₁‖ = c₁/2 = (1/2, 1/2, 1/2, 1/2),

which is a unit vector. Then Span{u₁} = Span{c₁}, because one is a scalar multiple of the other.

(2) Noting that the vector c₂ − ⟨u₁, c₂⟩u₁ = c₂ − 2u₁ = (0, 1, −1, 0) is a nonzero vector orthogonal to u₁, we set

u₂ = (0, 1, −1, 0)/√2 = (0, 1/√2, −1/√2, 0).

Then {u₁, u₂} is orthonormal and Span{u₁, u₂} = Span{c₁, c₂}, because each uᵢ is a linear combination of c₁ and c₂, and the converse is also true.

(3) Finally, note that c₃ − ⟨u₁, c₃⟩u₁ − ⟨u₂, c₃⟩u₂ = c₃ − 4u₁ + √2 u₂ = (0, 1, 1, −2) is also a nonzero vector orthogonal to both u₁ and u₂. In fact,

⟨u₁, c₃ − 4u₁ + √2 u₂⟩ = ⟨u₁, c₃⟩ − 4⟨u₁, u₁⟩ + √2⟨u₁, u₂⟩ = 0,
⟨u₂, c₃ − 4u₁ + √2 u₂⟩ = ⟨u₂, c₃⟩ − 4⟨u₂, u₁⟩ + √2⟨u₂, u₂⟩ = 0.

By normalization, the vector

u₃ = (0, 1, 1, −2)/√6

is a unit vector, and one can also show that Span{u₁, u₂, u₃} = Span{c₁, c₂, c₃} = C(A). Consequently, {u₁, u₂, u₃} is an orthonormal basis for C(A). □
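The three steps of Example 5.7 are exactly one run of the Gram–Schmidt process. A minimal sketch in pure Python (the function `gram_schmidt` is our own, applied to the column vectors of A as read off from the example):

```python
import math

def gram_schmidt(vectors):
    # Orthonormalize linearly independent vectors w.r.t. the dot product.
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = sum(ui * wi for ui, wi in zip(u, w))   # <u, v>
            w = [wi - c * ui for ui, wi in zip(u, w)]  # subtract projection
        n = math.sqrt(sum(wi * wi for wi in w))        # normalize
        basis.append([wi / n for wi in w])
    return basis

# columns c1, c2, c3 of the matrix A in Example 5.7
c1, c2, c3 = [1, 1, 1, 1], [1, 2, 0, 1], [2, 2, 4, 0]
u1, u2, u3 = gram_schmidt([c1, c2, c3])
print([round(t, 4) for t in u1])   # (1/2, 1/2, 1/2, 1/2)
print([round(t, 4) for t in u2])   # (0, 1, -1, 0)/sqrt(2)
print([round(t, 4) for t in u3])   # (0, 1, 1, -2)/sqrt(6)
```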
The orthonormalization process in Example 5.7 indicates how to prove the following general result, called the Gram–Schmidt orthogonalization.

Theorem 5.6 Every inner product space has an orthonormal basis.

Proof: [Gram–Schmidt orthogonalization process] Let {x₁, x₂, …, xₙ} be a basis for an n-dimensional inner product space V. Let

u₁ = x₁/‖x₁‖.

Of course, x₂ − ⟨u₁, x₂⟩u₁ ≠ 0, because {x₁, x₂} is linearly independent. Generally, one can define by induction on k = 1, 2, …, n,

u_k = (x_k − Σᵢ₌₁^{k−1} ⟨u_i, x_k⟩u_i) / ‖x_k − Σᵢ₌₁^{k−1} ⟨u_i, x_k⟩u_i‖.

Then, as Example 5.7 shows, the vectors u₁, u₂, …, uₙ are orthonormal in the n-dimensional vector space V. Since every orthonormal set is linearly independent, it is an orthonormal basis for V. □
Problem 5.9 Use the Gram–Schmidt orthogonalization on the Euclidean space ℝ⁴ to transform the basis

{(0, 1, 1, 0), (−1, 1, 0, 0), (1, 2, 0, −1), (−1, 0, 0, −1)}

into an orthonormal basis.

Problem 5.10 Find an orthonormal basis for the subspace W of the Euclidean space ℝ³ given by x + 2y − z = 0.

Problem 5.11 Let V = C[0, 1] with the inner product

⟨f, g⟩ = ∫₀¹ f(x)g(x) dx for any f and g in V.

Find an orthonormal basis for the subspace spanned by 1, x and x².
The next theorem shows that an orthonormal basis acts just like the standard basis for the Euclidean n-space ℝⁿ.

Theorem 5.7 Let {u₁, u₂, …, u_k} be an orthonormal basis for a subspace U in an inner product space V. Then, for any vector x in U,

x = ⟨u₁, x⟩u₁ + ⟨u₂, x⟩u₂ + ⋯ + ⟨u_k, x⟩u_k.

Proof: For any vector x ∈ U, one can write x = x₁u₁ + x₂u₂ + ⋯ + x_k u_k as a linear combination of the basis vectors. However, for each i = 1, …, k,

⟨u_i, x⟩ = ⟨u_i, x₁u₁ + ⋯ + x_k u_k⟩ = x₁⟨u_i, u₁⟩ + ⋯ + x_i⟨u_i, u_i⟩ + ⋯ + x_k⟨u_i, u_k⟩ = x_i,

because {u₁, u₂, …, u_k} is orthonormal. □
In particular, if α = {v₁, v₂, …, vₙ} is an orthonormal basis for V, then any vector x in V can be written uniquely as

x = ⟨v₁, x⟩v₁ + ⟨v₂, x⟩v₂ + ⋯ + ⟨vₙ, x⟩vₙ.

Moreover, one can identify an n-dimensional inner product space V with the Euclidean n-space ℝⁿ. Let α = {v₁, v₂, …, vₙ} be an orthonormal basis for the space V. With this orthonormal basis α, the natural isomorphism Φ : V → ℝⁿ given by Φ(vᵢ) = [vᵢ]_α = eᵢ, i = 1, 2, …, n, preserves the inner product of vectors: For a vector x = Σᵢ₌₁ⁿ xᵢvᵢ in V with xᵢ = ⟨x, vᵢ⟩, the coordinate vector of x with respect to α is the column matrix [x]_α = (x₁, x₂, …, xₙ)ᵀ. Moreover, for another vector y = Σᵢ₌₁ⁿ yᵢvᵢ in V,

⟨x, y⟩ = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ xᵢyⱼ⟨vᵢ, vⱼ⟩ = Σᵢ₌₁ⁿ xᵢyᵢ = [x]_αᵀ [y]_α.

The right-hand side of this equation is just the dot product of vectors in the Euclidean space ℝⁿ. That is, ⟨x, y⟩ = [x]_αᵀ[y]_α = Φ(x) · Φ(y) for any x, y ∈ V. Hence, the natural isomorphism Φ preserves the inner product, and we have the following theorem (compare with Corollary 4.8(1)).

Theorem 5.8 Any n-dimensional inner product space V with an inner product ⟨ , ⟩ is isomorphic to the Euclidean n-space ℝⁿ with the dot product, via an isomorphism that preserves the inner product.

In this sense, one may restrict the study of an inner product space to the case of the Euclidean n-space ℝⁿ with the dot product. A special kind of linear transformation that preserves the inner product, such as the natural isomorphism from V to ℝⁿ, plays an important role in linear algebra, and it will be studied in detail in Section 5.8.
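Theorem 5.7 says that coordinates with respect to an orthonormal basis are just inner products, with no linear system to solve. A small sketch (the basis and the vector are our own example, not from the text):

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# Orthonormal basis of a plane U in R^3
u1 = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
u2 = [0.0, 0.0, 1.0]

x = [3.0, 3.0, 5.0]   # a vector lying in U = span{u1, u2}
# Theorem 5.7: x = <u1, x> u1 + <u2, x> u2 -- the coefficients come for free
coeffs = [dot(u1, x), dot(u2, x)]
rebuilt = [coeffs[0] * a + coeffs[1] * b for a, b in zip(u1, u2)]
print(coeffs)                             # [6/sqrt(2), 5.0]
print([round(t, 6) for t in rebuilt])     # [3.0, 3.0, 5.0]
```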
5.5 Projections

Let U be a subspace of a vector space V. Then, by Corollary 3.13 there is another subspace W of V such that V = U ⊕ W, so that any x ∈ V has a unique expression x = u + w with u ∈ U and w ∈ W. As an easy exercise, one can show that the function T : V → V defined by T(x) = T(u + w) = u is a linear transformation, whose image Im(T) = T(V) is the subspace U and whose kernel Ker(T) is the subspace W.

Definition 5.5 Let U and W be subspaces of a vector space V. A linear transformation T : V → V is called the projection of V onto the subspace U along W if V = U ⊕ W and T(x) = u for x = u + w ∈ U ⊕ W.

Example 5.8 (Infinitely many different projections of ℝ² onto the x-axis) Let X, Y and Z be the 1-dimensional subspaces of the Euclidean 2-space ℝ² spanned by the vectors e₁, e₂, and v = e₁ + e₂ = (1, 1), respectively:

X = {re₁ : r ∈ ℝ} = x-axis, Y = {re₂ : r ∈ ℝ} = y-axis, Z = {r(e₁ + e₂) : r ∈ ℝ}.
Since the pairs {e₁, e₂} and {e₁, v} are linearly independent, the space ℝ² can be expressed as a direct sum in two ways: ℝ² = X ⊕ Y = X ⊕ Z.

[Figure 5.1. Two decompositions of ℝ².]

Thus, the vector x = (2, 1) ∈ ℝ² may be written in two ways:

x = (2, 1) = 2(1, 0) + (0, 1) ∈ X ⊕ Y = ℝ², or
x = (2, 1) = (1, 0) + (1, 1) ∈ X ⊕ Z = ℝ².

Let T_X and S_X denote the projections of ℝ² onto X along Y and Z, respectively. Then

T_X(x) = 2(1, 0) = (2, 0) ∈ X, T_Y(x) = (0, 1) ∈ Y, and
S_X(x) = (1, 0) ∈ X, S_Z(x) = (1, 1) ∈ Z.

This shows that a projection of ℝ² onto the subspace X depends on the choice of a complementary subspace of X. For example, by choosing Z_n = {r(n, 1) : r ∈ ℝ} for any integer n as a complementary subspace, one can construct infinitely many different projections of ℝ² onto the x-axis. □

Note that for a given subspace U of V, a projection T of V onto U depends on the choice of a complementary subspace W of U, as shown in Example 5.8. However, by definition, T(u) = u for any u ∈ U and for any choice of W. That is, T ∘ T = T for every projection T of V. The following theorem gives an algebraic characterization of a linear transformation being a projection.
Theorem 5.9 A linear transformation T : V → V is a projection if and only if T = T² (= T ∘ T by definition).

Proof: The necessity is clear, because T ∘ T = T for any projection T. For the sufficiency, suppose T² = T. It suffices to show that V = Im(T) ⊕ Ker(T) and T(u + w) = u for any u + w ∈ Im(T) ⊕ Ker(T). First, one needs to prove Im(T) ∩ Ker(T) = {0} and V = Im(T) + Ker(T). Indeed, if y ∈ Im(T) ∩ Ker(T), then there exists x ∈ V such that T(x) = y and T(y) = 0. It implies
y = T(x) = T²(x) = T(T(x)) = T(y) = 0.

The hypothesis T² = T also shows that T(v) ∈ Im(T) and v − T(v) ∈ Ker(T) for any v ∈ V. It implies V = Im(T) + Ker(T). Finally, note that T(u + w) = T(u) + T(w) = T(u) = u for any u + w ∈ Im(T) ⊕ Ker(T). □

Let T : V → V be a projection, so that V = Im(T) ⊕ Ker(T). It is not difficult to show that Im(id_V − T) = Ker(T) and Ker(id_V − T) = Im(T) for the identity transformation id_V on V.

Corollary 5.10 A linear transformation T : V → V is a projection if and only if id_V − T is a projection. Moreover, if T is the projection of V onto a subspace U along W, then id_V − T is the projection of V onto W along U.

Proof: It is enough to show that (id_V − T) ∘ (id_V − T) = id_V − T. But

(id_V − T) ∘ (id_V − T) = (id_V − T) − (T − T²) = id_V − T. □
Problem 5.12 For V = U ⊕ W, let T_U denote the projection of V onto U along W, and let T_W denote the projection of V onto W along U. Prove the following:
(1) For any x ∈ V, x = T_U(x) + T_W(x).
(2) T_U ∘ (id_V − T_U) = 0.
(3) T_U ∘ T_W = T_W ∘ T_U = 0.
(4) For any projection T : V → V, Im(id_V − T) = Ker(T) and Ker(id_V − T) = Im(T).
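Theorem 5.9 is easy to check on the projections of Example 5.8. The sketch below (matrices written by hand from the example; `matmul` is our own helper) verifies T² = T both for the projection T_X onto the x-axis along Y and for the oblique projection S_X along Z:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# S_X: projection of R^2 onto the x-axis X along Z = span{(1, 1)}.
# (a, b) = (a - b)(1, 0) + b(1, 1), so S_X(a, b) = (a - b, 0).
S = [[1.0, -1.0],
     [0.0,  0.0]]
# T_X: projection onto X along Y = span{(0, 1)}
T = [[1.0, 0.0],
     [0.0, 0.0]]

assert matmul(S, S) == S     # Theorem 5.9: a projection satisfies T^2 = T
assert matmul(T, T) == T
x = [[2.0], [1.0]]
print(matmul(S, x))   # [[1.0], [0.0]] = S_X(2, 1), as in Example 5.8
print(matmul(T, x))   # [[2.0], [0.0]] = T_X(2, 1)
```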
5.6 Orthogonal projections

Let U be a subspace of a vector space V. As shown in Example 5.8, there are infinitely many projections of V onto U, depending on the choice of a complementary subspace W of U. However, if V is an inner product space, there is a particular choice of complementary subspace W, called the orthogonal complement of U, along which the projection onto U is called the orthogonal projection, defined below. To show this, we first extend the orthogonality of two vectors to an orthogonality of two subspaces.

Definition 5.6 Let U and W be subspaces of an inner product space V.
(1) Two subspaces U and W are said to be orthogonal, written U ⊥ W, if ⟨u, w⟩ = 0 for each u ∈ U and w ∈ W.
(2) The set of all vectors in V that are orthogonal to every vector in U is called the orthogonal complement of U, denoted by U^⊥, i.e.,

U^⊥ = {v ∈ V : ⟨v, u⟩ = 0 for all u ∈ U}.
One can easily show that U^⊥ is a subspace of V, and v ∈ U^⊥ if and only if ⟨v, u⟩ = 0 for every u ∈ β, where β is a basis for U. Moreover, W ⊥ U if and only if W ⊆ U^⊥.

Problem 5.13 Let U and W be subspaces of an inner product space V. Show that
(1) If U ⊥ W, then U ∩ W = {0}.
(2) U ⊆ W if and only if W^⊥ ⊆ U^⊥.

Theorem 5.11 Let U be a subspace of an inner product space V. Then
(1) (U^⊥)^⊥ = U.
(2) V = U ⊕ U^⊥: that is, for each x ∈ V, there exist unique vectors x_U ∈ U and x_{U^⊥} ∈ U^⊥ such that x = x_U + x_{U^⊥}. This is called the orthogonal decomposition of V (or of x) by U.

Proof: Let dim U = k. To show (U^⊥)^⊥ = U, take an orthonormal basis for U, say α = {v₁, v₂, …, v_k}, by the Gram–Schmidt orthogonalization, and then extend it to an orthonormal basis for V, say β = {v₁, v₂, …, v_k, v_{k+1}, …, vₙ}, which is always possible. Then, clearly γ = {v_{k+1}, …, vₙ} forms an (orthonormal) basis for U^⊥, which means that (U^⊥)^⊥ = U and V = U ⊕ U^⊥. □
Definition 5.7 Let U be a subspace of an inner product space V, and let {u₁, u₂, …, u_m} be an orthonormal basis for U. The orthogonal projection Proj_U from V onto the subspace U is defined by

Proj_U(x) = ⟨u₁, x⟩u₁ + ⟨u₂, x⟩u₂ + ⋯ + ⟨u_m, x⟩u_m

for any x ∈ V.

Clearly, Proj_U is linear and a projection, because Proj_U ∘ Proj_U = Proj_U. Moreover, Proj_U(x) ∈ U and x − Proj_U(x) ∈ U^⊥, because

⟨x − Proj_U(x), u_i⟩ = ⟨x, u_i⟩ − ⟨Proj_U(x), u_i⟩ = ⟨x, u_i⟩ − ⟨u_i, x⟩ = 0

for every basis vector u_i. Hence, by Theorem 5.7, we have

Corollary 5.12 The orthogonal projection Proj_U is the projection of V onto the subspace U along its orthogonal complement U^⊥.

Therefore, in Definition 5.7, the projection Proj_U(x) is independent of the choice of an orthonormal basis for the subspace U. In this sense, it is called the orthogonal projection from the inner product space V onto the subspace U. Almost all projections used in linear algebra are orthogonal projections.
Example 5.9 (The orthogonal projection from ℝ³ onto the xy-plane) In the Euclidean 3-space ℝ³, let U be the xy-plane with the orthonormal basis α = {e₁, e₂}. Then the orthogonal projection

Proj_U(x) = ⟨e₁, x⟩e₁ + ⟨e₂, x⟩e₂

is the orthogonal projection onto the xy-plane in the usual geometric sense, and x − Proj_U(x) ∈ U^⊥, which is the z-axis. It means that Proj_U(x₁, x₂, x₃) = (x₁, x₂, 0) for any x = (x₁, x₂, x₃) ∈ ℝ³. □

Example 5.10 (The orthogonal projection from ℝ² onto the x-axis) As in Example 5.8, let X, Y and Z be the 1-dimensional subspaces of the Euclidean 2-space ℝ² spanned by the vectors e₁, e₂, and v = e₁ + e₂ = (1, 1), respectively. Then clearly Y = X^⊥ and Z ≠ X^⊥. And, for the projections T_X and S_X of ℝ² given in Example 5.8, T_X is the orthogonal projection, but S_X is not, so that T_X = Proj_X and S_X ≠ Proj_X. □
Theorem 5.13 Let U be a subspace of an inner product space V, and let x ∈ V. Then the orthogonal projection Proj_U(x) of x satisfies

‖x − Proj_U(x)‖ ≤ ‖x − y‖ for all y ∈ U.

The equality holds if and only if y = Proj_U(x).

Proof: First, note that for any vector x ∈ V, we have Proj_U(x) ∈ U and x − Proj_U(x) ∈ U^⊥. Thus, for all y ∈ U,

‖x − y‖² = ‖(x − Proj_U(x)) + (Proj_U(x) − y)‖²
         = ‖x − Proj_U(x)‖² + ‖Proj_U(x) − y‖²
         ≥ ‖x − Proj_U(x)‖²,

where the second equality comes from the Pythagorean theorem for the orthogonality (x − Proj_U(x)) ⊥ (Proj_U(x) − y). (See Figure 5.2.) □

It follows from Theorem 5.13 that the orthogonal projection Proj_U(x) of x is the unique vector in U that is closest to x, in the sense that it minimizes the distance from x to the vectors in U. It also shows that in Definition 5.7, the vector Proj_U(x) is independent of the choice of an orthonormal basis for the subspace U. Geometrically, Figure 5.2 depicts the vector Proj_U(x).
Problem 5.14 Find the point on the plane x − y − z = 0 that is closest to p = (1, 2, 0).
Problem 5.15 Let U ⊂ ℝ⁴ be the subspace of the Euclidean 4-space ℝ⁴ spanned by (1, 1, 0, 0) and (1, 0, 1, 0), and let W ⊂ ℝ⁴ be the subspace spanned by (0, 1, 0, 1) and (0, 0, 1, 1). Find a basis for, and the dimension of, each of the following subspaces:
(1) U + W, (2) U^⊥, (3) U^⊥ + W^⊥, (4) U ∩ W.
[Figure 5.2. Orthogonal projection Proj_U: the vectors x, Proj_U(x) ∈ U, and x − Proj_U(x) ∈ U^⊥.]
Problem 5.16 Let U and W be subspaces of an inner product space V. Show that
(1) (U + W)^⊥ = U^⊥ ∩ W^⊥, (2) (U ∩ W)^⊥ = U^⊥ + W^⊥.
As a particular case, let V = ℝⁿ be the Euclidean n-space with the dot product, and let U = {ru : r ∈ ℝ} be a 1-dimensional subspace determined by a unit vector u. Then for a vector x in ℝⁿ, the orthogonal projection of x onto U is

Proj_U(x) = (u · x)u = (uᵀx)u = u(uᵀx) = (uuᵀ)x.

(Here, the last two equalities come from the facts that u · x = uᵀx is a scalar and that the matrix product uuᵀx is associative, respectively.) This equation shows that the matrix representation of the orthogonal projection Proj_U with respect to the standard basis α is [Proj_U]_α = uuᵀ.

If U is an m-dimensional subspace of ℝⁿ with an orthonormal basis {u₁, u₂, …, u_m}, then for any x ∈ ℝⁿ,

Proj_U(x) = (u₁ · x)u₁ + (u₂ · x)u₂ + ⋯ + (u_m · x)u_m
          = u₁(u₁ᵀx) + u₂(u₂ᵀx) + ⋯ + u_m(u_mᵀx)
          = (u₁u₁ᵀ + u₂u₂ᵀ + ⋯ + u_mu_mᵀ)x.

Thus, the matrix representation of the orthogonal projection Proj_U with respect to the standard basis α is

[Proj_U]_α = u₁u₁ᵀ + u₂u₂ᵀ + ⋯ + u_mu_mᵀ.

Definition 5.8 The matrix representation [Proj_U]_α of the orthogonal projection Proj_U : ℝⁿ → ℝⁿ of ℝⁿ onto a subspace U with respect to the standard basis α is called the (orthogonal) projection matrix on U. Further discussion of orthogonal projection matrices will continue in Section 5.9.3.
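The formula [Proj_U]_α = u₁u₁ᵀ + ⋯ + u_mu_mᵀ can be verified directly. A sketch for a plane in ℝ³ (our own example subspace; all helper names are ours) that checks that the resulting matrix P satisfies P² = P and P = Pᵀ:

```python
import math

def outer(u):
    # rank-one matrix u u^T
    return [[a * b for b in u] for a in u]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Orthonormal basis of the plane U = span{(1,1,0), (0,0,1)} in R^3
u1 = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
u2 = [0.0, 0.0, 1.0]
P = add(outer(u1), outer(u2))    # P = u1 u1^T + u2 u2^T

P2 = matmul(P, P)
assert all(abs(P2[i][j] - P[i][j]) < 1e-12 for i in range(3) for j in range(3))  # P^2 = P
assert P == [list(r) for r in zip(*P)]                                           # P = P^T
x = [[1.0], [3.0], [5.0]]
print([[round(v, 6) for v in row] for row in matmul(P, x)])   # [[2.0], [2.0], [5.0]]
```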
Example 5.11 (Distance from a point to a line) Let ax + by + c = 0 be a line L in the plane ℝ². (Note that the line L cannot be a subspace of ℝ² if c ≠ 0.) For any two points Q = (x₁, y₁) and R = (x₂, y₂) on the line, the equality a(x₂ − x₁) + b(y₂ − y₁) = 0 implies that the nonzero vector n = (a, b) is perpendicular to the line L, that is, QR ⊥ n.

Let P = (x₀, y₀) be any point in the plane ℝ². Then the distance d between the point P and the line L is simply the length of the orthogonal projection of QP onto n, for any point Q = (x₁, y₁) on the line. Thus,

d = ‖Proj_n(QP)‖
  = |QP · n/‖n‖| (the dot product)
  = |a(x₀ − x₁) + b(y₀ − y₁)| / √(a² + b²)
  = |ax₀ + by₀ + c| / √(a² + b²).

[Figure 5.3. Distance from a point P = (x₀, y₀) to a line.]

Note that the last equality is due to the fact that the point Q is on the line (i.e., ax₁ + by₁ + c = 0). To find the orthogonal projection matrix, let u = n/‖n‖ = (a, b)/√(a² + b²). Then the orthogonal projection matrix onto U = {ru : r ∈ ℝ} is

uuᵀ = 1/(a² + b²) [ a² ab ; ab b² ].

Thus, if x = (1, 1) ∈ ℝ², then

Proj_U(x) = (uuᵀ)x = ((a + b)/(a² + b²)) (a, b). □
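The distance formula of Example 5.11 in code. A minimal sketch (the function name is ours):

```python
import math

def dist_point_line(p, a, b, c):
    # distance from p = (x0, y0) to the line a*x + b*y + c = 0:
    # d = |a*x0 + b*y0 + c| / sqrt(a^2 + b^2)
    return abs(a * p[0] + b * p[1] + c) / math.sqrt(a * a + b * b)

# the x-axis is y = 0, i.e. a = 0, b = 1, c = 0
print(dist_point_line((3.0, 4.0), 0, 1, 0))              # 4.0
# the line x + y - 1 = 0 is at distance 1/sqrt(2) from the origin
print(round(dist_point_line((0.0, 0.0), 1, 1, -1), 4))   # 0.7071
```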
Problem 5.17 Let V = P₃(ℝ) be the vector space of polynomials of degree ≤ 3 equipped with the inner product

⟨f, g⟩ = ∫₀¹ f(x)g(x) dx for any f and g in V.

Let W be the subspace of V spanned by {1, x}, and define f(x) = x². Find the orthogonal projection Proj_W(f) of f onto W.
5.7 Relations of fundamental subspaces

We now go back to the study of a system Ax = b of linear equations with an m × n matrix A. One of the most important applications of the orthogonal projection of vectors onto a subspace is the decomposition of the domain space and the image space of A by the four fundamental subspaces N(A), R(A) in ℝⁿ and C(A), N(Aᵀ) in ℝᵐ (see Theorem 5.16). From these decompositions, one can completely determine the solution set of a consistent system Ax = b.
Lemma 5.14 For an m × n matrix A, the null space N(A) and the row space R(A) are orthogonal: i.e., N(A) ⊥ R(A) in ℝⁿ. Similarly, N(Aᵀ) ⊥ C(A) in ℝᵐ.

Proof: Note that w ∈ N(A) if and only if Aw = 0, i.e., for every row vector r in A, r · w = 0. For the second statement, do the same with Aᵀ. □

From Lemma 5.14, it is clear that

N(A) ⊆ R(A)^⊥ (or R(A) ⊆ N(A)^⊥), and
N(Aᵀ) ⊆ C(A)^⊥ (or C(A) ⊆ N(Aᵀ)^⊥).
Moreover, by comparing the dimensions of these subspaces and by using Theorem 5.11 and the Rank Theorem 3.17, we have

n = dim R(A) + dim N(A) = dim R(A) + dim R(A)^⊥,
m = dim C(A) + dim N(Aᵀ) = dim C(A) + dim C(A)^⊥.

This means that the inclusions are actually equalities.

Lemma 5.15 (1) N(A) = R(A)^⊥ (or R(A) = N(A)^⊥).
(2) N(Aᵀ) = C(A)^⊥ (or C(A) = N(Aᵀ)^⊥).

This shows that the row space R(A) is the orthogonal complement of the null space N(A) in ℝⁿ, and vice versa. Similarly, the same holds for the column space C(A) and the null space N(Aᵀ) of Aᵀ in ℝᵐ. Hence, by Theorem 5.11, we have the following orthogonal decomposition.
Chapter 5. Inner Product Spaces
Theorem 5.16 For any m x n matrix A,
(1) N(A) ⊕ R(A) = ℝⁿ,  (2) N(Aᵀ) ⊕ C(A) = ℝᵐ.

Note that if rank A = r, so that dim R(A) = r = dim C(A), then dim N(A) = n − r and dim N(Aᵀ) = m − r. Considering the matrix A as a linear transformation A : ℝⁿ → ℝᵐ, Figure 5.4 depicts Theorem 5.16.
Figure 5.4. Relations of four fundamental subspaces
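Theorem 5.16 can be verified numerically: the SVD of A yields orthonormal bases for R(A) and N(A) that split ℝⁿ. A sketch (the matrix here is a hypothetical rank-1 example, not from the text):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])     # hypothetical rank-1 matrix

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))       # numerical rank
row_basis = Vt[:r]               # orthonormal basis for R(A)
null_basis = Vt[r:]              # orthonormal basis for N(A)

# R(A) and N(A) are orthogonal and their dimensions add up to n.
print(np.round(row_basis @ null_basis.T, 12))
print(r + null_basis.shape[0] == A.shape[1])
```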
Corollary 5.17 The set of solutions of a consistent system Ax = b is precisely x₀ + N(A), where x₀ is any solution of Ax = b.
Proof: Let x₀ ∈ ℝⁿ be a solution of the system Ax = b, and consider the set x₀ + N(A), which is just a translation of N(A) by x₀. (1) Any vector x₀ + n in x₀ + N(A) is also a solution, because A(x₀ + n) = Ax₀ + An = Ax₀ = b. (2) If x is another solution, then clearly x − x₀ is in the null space N(A), so that x = x₀ + n for some n ∈ N(A), i.e., x ∈ x₀ + N(A). □

In particular, if rank A = m (so that m ≤ n), then C(A) = ℝᵐ. Thus, for any b ∈ ℝᵐ, the system Ax = b has a solution in ℝⁿ. (This is the case of the existence Theorem 3.24.) On the other hand, if rank A = n (so that n ≤ m), then N(A) = {0} and R(A) = ℝⁿ. Therefore, the system Ax = b has at most one solution: it has a unique solution x in R(A) if b ∈ C(A), and has no solution if b ∉ C(A). (This is the case of the uniqueness Theorem 3.25.) The latter case may occur when m > r = rank A, that is, when N(Aᵀ) is a nontrivial subspace of ℝᵐ; it will be discussed later in Section 5.9.1.
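Corollary 5.17 is easy to confirm numerically: every translate x₀ + tn of a particular solution by a null-space vector solves the system again. A sketch (the system here is hypothetical):

```python
import numpy as np

A = np.array([[1., 1., 1.],
              [0., 1., 2.]])     # hypothetical consistent system, rank 2
b = np.array([6., 8.])

x0, *_ = np.linalg.lstsq(A, b, rcond=None)   # one particular solution
n = np.linalg.svd(A)[2][-1]                  # spans the 1-dimensional N(A)

for t in (-2.0, 0.0, 5.0):                   # x0 + t*n is again a solution
    print(np.allclose(A @ (x0 + t * n), b))
```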
Problem 5.18 Prove the following statements.
(1) If Ax = b and Aᵀy = 0, then yᵀb = 0, i.e., y ⊥ b.
(2) If Ax = 0 and Aᵀy = c, then xᵀc = 0, i.e., x ⊥ c.
Problem 5.19 Given two vectors (1, 2, 1, 2) and (0, −1, −1, 1) in ℝ⁴, find all vectors in ℝ⁴ that are perpendicular to them.

Problem 5.20 Find a basis for the orthogonal complement of the row space of A:

(1) A = [1 2 8; 2 3 0; −1 6 1],  (2) A = [0 0 1; 0 0 1; 1 1 1].
5.8 Orthogonal matrices and isometries

In Chapter 4, we saw that a linear transformation can be associated with a matrix, and vice versa. In this section, we are mainly interested in those linear transformations (or matrices) that preserve the length of a vector in an inner product space. Let A = [c₁ ⋯ cₙ] be an n × n square matrix with columns c₁, …, cₙ. Then a simple computation shows that
AᵀA = [c₁ᵀ; c₂ᵀ; ⋯; cₙᵀ][c₁ c₂ ⋯ cₙ] = [cᵢᵀcⱼ].

Hence, if the column vectors are orthonormal, cᵢᵀcⱼ = δᵢⱼ, then AᵀA = Iₙ; that is, Aᵀ is a left inverse of A, and vice versa. Since A is a square matrix, this left inverse must also be the right inverse of A, i.e., AAᵀ = Iₙ. Equivalently, the row vectors of A are also orthonormal. This argument can be summarized as follows.
Lemma 5.18 For an n × n matrix A, the following are equivalent.
(1) The column vectors of A are orthonormal.
(2) AᵀA = Iₙ.
(3) Aᵀ = A⁻¹.
(4) AAᵀ = Iₙ.
(5) The row vectors of A are orthonormal.
Definition 5.9 A square matrix A is called an orthogonal matrix if A satisfies one (and hence all) of the statements in Lemma 5.18. Clearly, A is orthogonal if and only if Aᵀ is orthogonal.
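The equivalences of Lemma 5.18 can be spot-checked numerically. A sketch with a rotation matrix (the angle is an arbitrary choice):

```python
import numpy as np

theta = 0.7                                     # arbitrary angle
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(A.T @ A, np.eye(2)))          # (2): columns orthonormal
print(np.allclose(A @ A.T, np.eye(2)))          # (4): rows orthonormal
print(np.allclose(np.linalg.inv(A), A.T))       # (3): A^{-1} = A^T
```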
Example 5.12 (Rotations and reflections are orthogonal) The matrices

A = [cos θ −sin θ; sin θ cos θ],  B = [cos θ sin θ; sin θ −cos θ]

are orthogonal, and satisfy

A⁻¹ = Aᵀ = [cos θ sin θ; −sin θ cos θ],  B⁻¹ = Bᵀ = [cos θ sin θ; sin θ −cos θ].

Note that the linear transformation T : ℝ² → ℝ² defined by T(x) = Ax is a rotation through the angle θ, while S : ℝ² → ℝ² defined by S(x) = Bx is the reflection about the line through the origin that forms an angle θ/2 with the positive x-axis. □

Example 5.13 (All 2 × 2 orthogonal matrices) Show that every 2 × 2 orthogonal matrix must be one of the forms

[cos θ −sin θ; sin θ cos θ]  or  [cos θ sin θ; sin θ −cos θ].

Solution: Suppose that A = [a b; c d] is an orthogonal matrix, so that AAᵀ = I₂ = AᵀA. The first equality gives a² + b² = 1, ac + bd = 0, and c² + d² = 1. The second gives a² + c² = 1, ab + cd = 0, and b² + d² = 1. Thus b = ±c. If b = −c, then a = d. If b = c, then a = −d. Now choose θ so that a = cos θ and b = sin θ. □
Problem 5.21 Find the inverse of each of the following matrices.

(1) [1 0 0; 0 cos θ −sin θ; 0 sin θ cos θ],  (2) [1/√2 −1/√2 0; −1/√2 −1/√2 0; 0 0 1].

What are they as linear transformations on ℝ³: rotations, reflections, or other?
Problem 5.22 Find eight 2 × 2 orthogonal matrices which transform the square −1 ≤ x, y ≤ 1 onto itself.
As shown in Examples 5.12 and 5.13, all rotations and reflections on the Euclidean 2-space ℝ² are orthogonal and intuitively preserve both the lengths of vectors and the angle between two vectors. In fact, every orthogonal matrix A preserves the lengths of vectors:

‖Ax‖² = Ax · Ax = (Ax)ᵀ(Ax) = xᵀAᵀAx = xᵀx = ‖x‖².
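This length-preserving property (and the inner-product preservation established in the theorems below) can be checked on a random orthogonal matrix. A sketch, using the QR factorization of a random matrix to produce an orthogonal Q:

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrix

x = rng.standard_normal(4)
y = rng.standard_normal(4)
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # ||Qx|| = ||x||
print(np.isclose((Q @ x) @ (Q @ y), x @ y))                  # Qx . Qy = x . y
```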
Definition 5.10 Let V and W be two inner product spaces. A linear transformation T : V → W is called an isometry, or an orthogonal transformation, if it preserves the lengths of vectors; that is, for every vector x ∈ V,

‖T(x)‖ = ‖x‖.

Clearly, any orthogonal matrix is an isometry as a linear transformation. If T : V → W is an isometry, then T is one-to-one, since the kernel of T is trivial: T(x) = 0 implies ‖x‖ = ‖T(x)‖ = 0. Thus, if dim V = dim W, then an isometry is also an isomorphism. The following theorem gives an interesting characterization of an isometry.

Theorem 5.19 Let T : V → W be a linear transformation from an inner product space V to another W. Then T is an isometry if and only if T preserves inner products; that is, ⟨T(x), T(y)⟩ = ⟨x, y⟩ for any vectors x, y in V.
Proof: Let T be an isometry. Then ‖T(x)‖² = ‖x‖² for any x ∈ V. Hence,

⟨T(x + y), T(x + y)⟩ = ‖T(x + y)‖² = ‖x + y‖² = ⟨x + y, x + y⟩

for any x, y ∈ V. On the other hand,

⟨T(x + y), T(x + y)⟩ = ⟨T(x), T(x)⟩ + 2⟨T(x), T(y)⟩ + ⟨T(y), T(y)⟩,
⟨x + y, x + y⟩ = ⟨x, x⟩ + 2⟨x, y⟩ + ⟨y, y⟩,

from which we get ⟨T(x), T(y)⟩ = ⟨x, y⟩. The converse is quite clear by choosing y = x. □
Theorem 5.20 Let A be an n × n matrix. Then A is an orthogonal matrix if and only if A : ℝⁿ → ℝⁿ, as a linear transformation, preserves the dot product; that is, for any vectors x, y ∈ ℝⁿ, Ax · Ay = x · y.

Proof: The necessity is clear. For the sufficiency, suppose that A preserves the dot product. Then for any vectors x, y ∈ ℝⁿ,

xᵀAᵀAy = Ax · Ay = x · y = xᵀy.

Take x = eᵢ and y = eⱼ. Then this equation says [AᵀA]ᵢⱼ = δᵢⱼ, i.e., AᵀA = Iₙ. □
Since d(x, y) = ‖x − y‖ for any x and y in V, one can easily derive the following corollary.
Corollary 5.21 A linear transformation T : V → W is an isometry if and only if

d(T(x), T(y)) = d(x, y)

for any x and y in V.

Recall that if θ is the angle between two nonzero vectors x and y in an inner product space V, then for any isometry T : V → V,

cos θ = ⟨x, y⟩/(‖x‖‖y‖) = ⟨Tx, Ty⟩/(‖Tx‖‖Ty‖).

Hence, we have

Corollary 5.22 An isometry preserves the angle.

The converse of Corollary 5.22 is not true in general. The linear transformation T(x) = 2x on the Euclidean space ℝⁿ preserves the angle but not the lengths of vectors (i.e., it is not an isometry). Such a linear transformation is called a dilation. We have seen that any orthogonal matrix is an isometry as the linear transformation T(x) = Ax. The following theorem says that the converse is also true; that is, the matrix representation of an isometry with respect to orthonormal bases is an orthogonal matrix.
Theorem 5.23 Let T : V → W be an isometry from an inner product space V to another W of the same dimension. Let α = {v₁, …, vₙ} and β = {w₁, …, wₙ} be orthonormal bases for V and W, respectively. Then the matrix [T]ᵦᵅ for T with respect to the bases α and β is an orthogonal matrix.

Proof: Note that the k-th column vector of the matrix [T]ᵦᵅ is just [T(vₖ)]ᵦ. Since T preserves inner products and α, β are orthonormal, we get

[T(vₖ)]ᵦ · [T(vₗ)]ᵦ = ⟨T(vₖ), T(vₗ)⟩ = ⟨vₖ, vₗ⟩ = δₖₗ,

which shows that the column vectors of [T]ᵦᵅ are orthonormal. □

Remark: In summary, for a linear transformation T : V → W, the following are equivalent:
(1) T is an isometry; that is, T preserves the lengths of vectors.
(2) T preserves the inner product.
(3) T preserves the distance.
(4) [T]ᵦᵅ with respect to orthonormal bases α and β is an orthogonal matrix.

Any one (hence all) of these conditions implies that T preserves the angle, but the converse is not true.
Problem 5.23 Find values r > 0, s > 0, a > 0, b and c such that the matrix

Q = [0 2s a; r s b; r −s c]

is orthogonal.
Problem 5.24 (Bessel's inequality) Let V be an inner product space, and let {v₁, …, vₘ} be a set of orthonormal vectors in V (not necessarily a basis for V). Prove that for any x in V, ‖x‖² ≥ Σᵢ₌₁ᵐ |⟨x, vᵢ⟩|².
Problem 5.25 Determine whether the following linear transformations on the Euclidean space ℝ³ are orthogonal.

(1) T(x, y, z) = (z, (√3/2)x + (1/2)y, (1/2)x − (√3/2)y).
(2) T(x, y, z) = ((5/13)x + (12/13)z, (12/13)y − (5/13)z, x).
5.9 Applications

5.9.1 Least squares solutions

In the previous section, we completely determined the solution set of a system Ax = b when b ∈ C(A). In this section, we discuss what can be done when the system Ax = b is inconsistent, that is, when b ∉ C(A) ⊆ ℝᵐ. Certainly, there exists no solution in this case, but one can find a 'pseudo'-solution in the following sense. Note that for any vector x in ℝⁿ, Ax ∈ C(A). Hence, the best we can do is to find a vector x₀ ∈ ℝⁿ so that Ax₀ is the closest to the given vector b ∈ ℝᵐ; i.e., ‖Ax₀ − b‖ is as small as possible. Such a vector x₀ will give us the best approximation Ax to b over all vectors x in ℝⁿ, and it is called a least squares solution of Ax = b.

To find a least squares solution, we first need to find a vector in C(A) that is closest to b. From the orthogonal decomposition ℝᵐ = C(A) ⊕ N(Aᵀ), any b ∈ ℝᵐ has the unique orthogonal decomposition

b = b_c + b_n ∈ C(A) ⊕ N(Aᵀ) = ℝᵐ,

where b_c = Proj_C(A)(b) ∈ C(A) and b_n = b − b_c ∈ N(Aᵀ). Here, the vector b_c = Proj_C(A)(b) ∈ C(A) has two basic properties:

(1) There always exists a solution x₀ ∈ ℝⁿ of Ax = b_c, since b_c ∈ C(A).
(2) b_c is the closest vector to b among the vectors in C(A) (see Theorem 5.13).

Therefore, a least squares solution x₀ ∈ ℝⁿ of Ax = b is just a solution of Ax = b_c. Furthermore, if x₀ ∈ ℝⁿ is a least squares solution, then the set of all least squares solutions is x₀ + N(A) by Corollary 5.17. In particular, if b ∈ C(A), then b = b_c, so that the least squares solutions are just the 'true' solutions of Ax = b. The second property of b_c means that a least squares
solution x₀ ∈ ℝⁿ of Ax = b gives the best approximation Ax₀ = b_c to b; i.e., for any vector x in ℝⁿ, ‖Ax₀ − b‖ ≤ ‖Ax − b‖.

In summary, to find a least squares solution of Ax = b, the first step is to find the orthogonal projection b_c = Proj_C(A)(b) ∈ C(A) of b, and then solve Ax = b_c as usual. One can find b_c from b ∈ ℝᵐ by using the orthogonal projection if we have an orthonormal basis for C(A). But such a computation of b_c could be uncomfortable, because the only way we know so far to find an orthonormal basis for C(A) is the Gram-Schmidt orthogonalization (whose computation may be cumbersome). However, there is a bypass that avoids the Gram-Schmidt orthogonalization. For this, let us examine a least squares solution once again. If x₀ ∈ ℝⁿ is a least squares solution of Ax = b, then

Ax₀ − b = b_c − b = −b_n ∈ N(Aᵀ)

holds, since Ax₀ = b_c. Thus, Aᵀ(Ax₀ − b) = Aᵀ(−b_n) = 0, or equivalently AᵀAx₀ = Aᵀb; that is, x₀ is a solution of the equation

AᵀAx = Aᵀb.
This equation is very interesting because it is also a sufficient condition for a least squares solution, as the next theorem shows; it is called the normal equation of Ax = b.

Theorem 5.24 Let A be an m × n matrix, and let b ∈ ℝᵐ be any vector. Then a vector x₀ ∈ ℝⁿ is a least squares solution of Ax = b if and only if x₀ is a solution of the normal equation AᵀAx = Aᵀb.

Proof: We only need to show the sufficiency. Let x₀ be a solution of the normal equation AᵀAx = Aᵀb. Then Aᵀ(Ax₀ − b) = 0, so Ax₀ − b ∈ N(Aᵀ). Say Ax₀ − b = n ∈ N(Aᵀ), and let b = b_c + b_n ∈ C(A) ⊕ N(Aᵀ). Then Ax₀ − b_c = n + b_n ∈ N(Aᵀ). Since Ax₀ − b_c is also contained in C(A) and N(Aᵀ) ∩ C(A) = {0}, Ax₀ = b_c = Proj_C(A)(b); i.e., x₀ is a least squares solution of Ax = b. □

Example 5.14 (The best approximated solution of an inconsistent system Ax = b) Find all the least squares solutions of Ax = b, and then determine the orthogonal projection b_c of b onto the column space C(A), where
A = [1 −2 1; 2 −3 −1; −1 1 2; 3 −5 0]  and  b = [2; −1; 0; 0].
Solution: (The reader may check that Ax = b has no solutions.) Here

AᵀA = [15 −24 −3; −24 39 3; −3 3 6]  and  Aᵀb = [0; −1; 3].

From the normal equation, a least squares solution of Ax = b is a solution of AᵀAx = Aᵀb, i.e.,

[15 −24 −3; −24 39 3; −3 3 6][x₁; x₂; x₃] = [0; −1; 3].

By solving this system of equations (left for an exercise), one can obtain all the least squares solutions, which are of the form

x = [−8/3; −5/3; 0] + t[5; 3; 1]

for any number t ∈ ℝ. Moreover, b_c = Ax₀ = [2/3; −1/3; 1; 1/3].
Note that the set of least squares solutions is x₀ + N(A), where x₀ = [−8/3; −5/3; 0] and N(A) = {t[5; 3; 1] : t ∈ ℝ}. □

Problem 5.26 Find all least squares solutions x in ℝ³ of Ax = b, where

A = [1 0 2; 0 −1 −1; 2 1 2; 2 −1 0]  and  b = [3; −3; 0; −3].
Note that the normal equation is always consistent by construction, and, as Example 5.14 shows, a least squares solution can be found by Gauss-Jordan elimination even though AᵀA is not invertible. If b ∈ C(A) or, even better, if the rows of A are linearly independent (thus, rank A = m and C(A) = ℝᵐ), then b_c = b, so that
the system Ax = b is always consistent and the least squares solutions coincide with the true solutions. Therefore, for any given system Ax = b, consistent or inconsistent, by solving the normal equation AᵀAx = Aᵀb one can obtain either the true solutions or the least squares solutions. If the square matrix AᵀA is invertible, then the normal equation AᵀAx = Aᵀb of the system Ax = b resolves to x = (AᵀA)⁻¹Aᵀb, which is a least squares solution. In particular, if AᵀA = Iₙ, or equivalently the columns of A are orthonormal (see Lemma 5.18), then the normal equation reduces to the least squares solution x = Aᵀb. The following theorem gives a condition for AᵀA to be invertible.

Theorem 5.25 For any m × n matrix A, AᵀA is a symmetric n × n square matrix and rank(AᵀA) = rank A.

Proof: Clearly, AᵀA is square and symmetric. Since the numbers of columns of A and of AᵀA are both n, we have
rank A + dim N(A) = n = rank(AᵀA) + dim N(AᵀA).

Hence, it suffices to show that N(A) = N(AᵀA), so that dim N(A) = dim N(AᵀA). It is trivial to see that N(A) ⊆ N(AᵀA), since Ax = 0 implies AᵀAx = 0. Conversely, suppose that AᵀAx = 0. Then

Ax · Ax = (Ax)ᵀ(Ax) = xᵀ(AᵀAx) = xᵀ0 = 0.

Hence Ax = 0, and x ∈ N(A). □
It follows from Theorem 5.25 that AᵀA is invertible if and only if rank A = n, that is, the columns of A are linearly independent. In this case, N(A) = {0}, and so the system Ax = b has a unique least squares solution x₀ in R(A) = ℝⁿ, which is

x₀ = (AᵀA)⁻¹Aᵀb.

This can be summarized in the following theorem:

Theorem 5.26 Let A be an m × n matrix. If rank A = n, or equivalently the columns of A are linearly independent, then
(1) AᵀA is invertible, so that (AᵀA)⁻¹Aᵀ is a left inverse of A;
(2) the vector x₀ = (AᵀA)⁻¹Aᵀb is the unique least squares solution of a system Ax = b; and
(3) Ax₀ = A(AᵀA)⁻¹Aᵀb = b_c = Proj_C(A)(b); that is, the orthogonal projection of ℝᵐ onto C(A) is Proj_C(A) = A(AᵀA)⁻¹Aᵀ.

Remark: (1) For an m × n matrix A, by applying Theorem 5.26 to Aᵀ, one can say that rank A = m if and only if AAᵀ is invertible. In this case Aᵀ(AAᵀ)⁻¹ is a right inverse of A (cf. the Remark after Theorem 3.25). Moreover, AAᵀ is invertible if and only if the rows of A are linearly independent, by Theorem 5.25.
(2) If the columns u₁, …, uₙ of A are orthonormal, then they form an orthonormal basis for the column space C(A), so that for any b ∈ ℝᵐ,

b_c = (u₁ · b)u₁ + ⋯ + (uₙ · b)uₙ = (u₁u₁ᵀ + ⋯ + uₙuₙᵀ)b,

and the projection matrix is

Proj_C(A) = u₁u₁ᵀ + ⋯ + uₙuₙᵀ.

In fact, this result coincides with Theorem 5.26: if AᵀA = Iₙ, then

Proj_C(A) = A(AᵀA)⁻¹Aᵀ = AAᵀ = u₁u₁ᵀ + ⋯ + uₙuₙᵀ,

and the least squares solution is

x₀ = (AᵀA)⁻¹Aᵀb = Aᵀb = [u₁ · b; ⋯; uₙ · b],

which is the coordinate expression of Ax₀ = b_c = Proj_C(A)(b) with respect to the orthonormal basis {u₁, …, uₙ} for C(A). In general, the columns of A need not be orthonormal, in which case the above formula is not available. In Section 5.9.3, we will discuss more about this general case.
which is the coordinate expression ofAXo = be = PrOk(A)(b) with respect to the orthonormal basis {UI , ... , un} forC(A). In general, the columns of A need not be orthonormal, in which case the above formula is not possible. In Section 5.9.3, we will discuss more about this general case. (3) If rank A = r < n, one can reduce the columns of A to a basis for the column space and work with this reduced matrix A (thus, AT A is invertible) to find the orthogonal projection PrOk(A) = A (A T A)-I AT of R" onto the column space C(A) . However, the least squares solutions of Ax = b should be found from the original normal equation directly, since the least squares solution Xo = (AT A)-I ,4Tb of Ax = b has only r components so that it cannot be a solution of Ax = be. Example 5.15 (Solving an inconsistent system Ax = b by thenormalequation) Find the least squares solutions of the system:
Determine also the orthogonal projection b_c of b in the column space C(A).

Solution: Clearly, the two columns of A are linearly independent and C(A) is the xy-plane, so b ∉ C(A). Note that AᵀA is invertible. By a simple computation one can obtain

x₀ = (AᵀA)⁻¹Aᵀb = [14/3; −1/3],

which is a least squares solution, and it is unique since N(A) = {0}. The orthogonal projection of b in C(A) is then b_c = Ax₀ ∈ C(A). □
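The normal-equation recipe can be cross-checked against a library least squares routine. A sketch (the system here is hypothetical, with b deliberately chosen outside C(A)):

```python
import numpy as np

A = np.array([[1., 2.],
              [1., 3.],
              [0., 0.]])          # hypothetical matrix with independent columns
b = np.array([4., 5., 6.])        # third component lies outside C(A)

x0 = np.linalg.solve(A.T @ A, A.T @ b)       # solve A^T A x = A^T b
x1, *_ = np.linalg.lstsq(A, b, rcond=None)   # library least squares
print(np.allclose(x0, x1))

bc = A @ x0                                  # orthogonal projection of b onto C(A)
print(np.allclose(A.T @ (b - bc), 0))        # residual b - bc lies in N(A^T)
```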
Problem 5.27 Find all the least squares solutions of the following inconsistent system of linear equations:
5.9.2 Polynomial approximations

In this section, one can find a reason for the name of the "least squares" solutions, and the following example illustrates an application of the least squares solution to the determination of the spring constants in physics.

Example 5.16 Hooke's law for springs in physics says that for a uniform spring, the length stretched or compressed is a linear function of the force applied; that is, the force F applied to the spring is related to the length x stretched or compressed by the equation

F = a + kx,

where a and k are constants determined by the spring. Suppose now that, given a spring of length 6.1 inches, we want to determine the constants a and k from the experimental data: the lengths are measured to be 7.6, 8.7 and 10.4 inches when forces of 2, 4 and 6 kilograms, respectively, are applied to the spring. However, by plotting these data

(x, F) = (6.1, 0), (7.6, 2), (8.7, 4), (10.4, 6)
in the xF-plane, one can easily recognize that they are not on a straight line of the form F = a + kx, which may be caused by experimental errors. This means that the system of linear equations

F₁ = a + 6.1k = 0
F₂ = a + 7.6k = 2
F₃ = a + 8.7k = 4
F₄ = a + 10.4k = 6

is inconsistent (i.e., it has no solutions, so the second equality in each equation may not be a true equality). It means that if we put b = (0, 2, 4, 6) and F = (F₁, F₂, F₃, F₄) as vectors in ℝ⁴ representing the data and the points on the line at the xᵢ's, respectively, then ‖b − F‖ is not zero. Thus, the best thing one can do is to determine the straight line a + kx = F that 'fits' the data best; that is, to minimize the sum of the squares of the vertical distances from the line to the data (xᵢ, yᵢ) for i = 1, 2, 3, 4 (see Figure 5.5) (this is the reason why we say least squares):

(0 − F₁)² + (2 − F₂)² + (4 − F₃)² + (6 − F₄)² = ‖b − F‖².

Thus, for the original inconsistent system

Ax = [1 6.1; 1 7.6; 1 8.7; 1 10.4][a; k] = [0; 2; 4; 6] = b ∉ C(A),

we are looking for F ∈ C(A), which is the projection of b onto the column space C(A) of A, and the least squares solution x₀, which satisfies Ax₀ = F.

Figure 5.5. Least squares fitting

It is now easily computed (by solving the normal equation AᵀAx = Aᵀb) that

[a; k] = x = (AᵀA)⁻¹Aᵀb = [−8.6; 1.4].

It gives F = −8.6 + 1.4x. □
In general, a common problem in experimental work is to obtain a polynomial in two variables x and y that best 'fits' the data of various values of y determined experimentally for inputs x, say y = f(x), plotted in the xy-plane. Some possible fitting polynomials are

(1) a straight line: y = a + bx,
(2) a quadratic polynomial: y = a + bx + cx², or
(3) a polynomial of degree k: y = a₀ + a₁x + ⋯ + aₖxᵏ, etc.

As a general case, suppose that we are looking for a polynomial y = f(x) = a₀ + a₁x + a₂x² + ⋯ + aₖxᵏ of degree k that passes through the given data. Then we obtain a system of linear equations

f(x₁) = a₀ + a₁x₁ + a₂x₁² + ⋯ + aₖx₁ᵏ = y₁
f(x₂) = a₀ + a₁x₂ + a₂x₂² + ⋯ + aₖx₂ᵏ = y₂
  ⋮
f(xₙ) = a₀ + a₁xₙ + a₂xₙ² + ⋯ + aₖxₙᵏ = yₙ,

or, in matrix form, the system may be written as Ax = b:

[1 x₁ x₁² ⋯ x₁ᵏ; 1 x₂ x₂² ⋯ x₂ᵏ; ⋯; 1 xₙ xₙ² ⋯ xₙᵏ][a₀; a₁; ⋯; aₖ] = [y₁; y₂; ⋯; yₙ].
The left-hand side Ax represents the values of the polynomial at the xᵢ's, and the right-hand side represents the data obtained from the inputs xᵢ's in the experiment. If n ≤ k + 1, then the cases have already been discussed in Section 3.9.1. If n > k + 1, this kind of system may be inconsistent. Therefore, the best thing one can do is to find the polynomial f(x) that minimizes the sum of the squares of the vertical distances between the graph of the polynomial and the data. But this is equivalent to finding the least squares solution of the system Ax = b, because for any c ∈ C(A) of the form
c = [1 x₁ ⋯ x₁ᵏ; 1 x₂ ⋯ x₂ᵏ; ⋯; 1 xₙ ⋯ xₙᵏ][a₀; a₁; ⋯; aₖ] = [a₀ + a₁x₁ + ⋯ + aₖx₁ᵏ; a₀ + a₁x₂ + ⋯ + aₖx₂ᵏ; ⋯; a₀ + a₁xₙ + ⋯ + aₖxₙᵏ],

we have

‖b − c‖² = (y₁ − a₀ − a₁x₁ − ⋯ − aₖx₁ᵏ)² + ⋯ + (yₙ − a₀ − a₁xₙ − ⋯ − aₖxₙᵏ)².
The previous theory says that the orthogonal projection b_c of b onto the column space of A minimizes this quantity, and shows how to find b_c and a least squares solution x₀.

Example 5.17 Find the straight line y = a + bx that best fits the given experimental data (1, 0), (2, 3), (3, 4) and (4, 4).

Solution: We are looking for a line y = a + bx that minimizes the sum of squares of the vertical distances |yᵢ − a − bxᵢ| from the line y = a + bx to the data (xᵢ, yᵢ). In matrix notation, with

A = [1 1; 1 2; 1 3; 1 4], x = [a; b], b = [0; 3; 4; 4],

we have Ax = b and want to find a least squares solution of Ax = b. The columns of A are linearly independent, so the least squares solution is x = (AᵀA)⁻¹Aᵀb. Now,

AᵀA = [4 10; 10 30], (AᵀA)⁻¹ = [3/2 −1/2; −1/2 1/5], Aᵀb = [11; 34].

Hence, we have

[a; b] = (AᵀA)⁻¹Aᵀb = [−1/2; 13/10],

that is, y = −1/2 + (13/10)x. □
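The computation of Example 5.17 can be reproduced with a few lines of NumPy (a sketch):

```python
import numpy as np

# Data from Example 5.17: (1,0), (2,3), (3,4), (4,4).
x = np.array([1., 2., 3., 4.])
y = np.array([0., 3., 4., 4.])

A = np.column_stack([np.ones_like(x), x])   # columns: 1 and x
a, b = np.linalg.solve(A.T @ A, A.T @ y)    # normal equation A^T A [a,b]^T = A^T y
print(a, b)                                 # approximately -0.5 and 1.3
```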
Problem 5.28 From Newton's second law of motion, a body near the surface of the earth falls vertically downward according to the equation

s(t) = s₀ + v₀t + (1/2)gt²,

where s(t) is the distance the body has travelled in time t, s₀ and v₀ are the initial displacement and velocity, respectively, of the body, and g is the gravitational acceleration at the earth's surface. Suppose a weight is released, and the distances that the body has fallen from some reference point were measured to be s = −0.18, 0.31, 1.03, 2.48, 3.73 feet at times t = 0.1, 0.2, 0.3, 0.4, 0.5 seconds, respectively. Determine approximate values of s₀, v₀, g using these data.
5.9.3 Orthogonal projection matrices

In Section 5.9.1, we have seen that the orthogonal projection Proj_C(A) of the Euclidean space ℝᵐ onto the column space C(A) of an m × n matrix A plays an important role in finding a least squares solution of an inconsistent system Ax = b. Also, the orthogonal projection Proj_C(A) is the main tool in the Gram-Schmidt orthogonalization. In general, for a given subspace U of ℝᵐ, the computation of the orthogonal projection Proj_U of ℝᵐ onto U appears quite often in applied science and engineering problems. The least squares solution method can also be used to find the orthogonal projection: indeed, by first taking a basis for U and then forming an m × n matrix A with these basis vectors as columns, one clearly gets U = C(A), and so by Theorem 5.26

Proj_U = Proj_C(A) = A(AᵀA)⁻¹Aᵀ.

In fact, this projection itself is the orthogonal projection matrix, that is, the matrix representation of Proj_U with respect to the standard basis α:

[Proj_U]_α = A(AᵀA)⁻¹Aᵀ.
Note that this projection matrix is independent of the choice of a basis for U, due to the uniqueness of the matrix representation of a linear transformation with respect to a fixed basis.

Example 5.18 Find the projection matrix P on the plane 2x − y − 3z = 0 in the space ℝ³ and calculate Pb for b = (1, 0, 1).

Solution: Choose any basis for the plane 2x − y − 3z = 0, say

v₁ = (0, 3, −1) and v₂ = (1, 2, 0).

Let A = [0 1; 3 2; −1 0] be the matrix with v₁ and v₂ as columns. Then

AᵀA = [10 6; 6 5] and (AᵀA)⁻¹ = (1/14)[5 −6; −6 10].

The orthogonal projection matrix P = Proj_C(A) is

P = A(AᵀA)⁻¹Aᵀ = (1/14)[0 1; 3 2; −1 0][5 −6; −6 10][0 3 −1; 1 2 0] = (1/14)[10 2 6; 2 13 −3; 6 −3 5],

and

Pb = (1/14)[10 2 6; 2 13 −3; 6 −3 5][1; 0; 1] = (1/14)[16; −1; 11]. □
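The projection matrix of Example 5.18 can be recomputed directly from the basis matrix A (a sketch):

```python
import numpy as np

A = np.array([[ 0., 1.],
              [ 3., 2.],
              [-1., 0.]])        # columns v1, v2 span the plane 2x - y - 3z = 0

P = A @ np.linalg.inv(A.T @ A) @ A.T        # P = A (A^T A)^{-1} A^T
b = np.array([1., 0., 1.])
print(np.round(14 * P, 6))                  # 14P = [[10,2,6],[2,13,-3],[6,-3,5]]
print(np.round(14 * (P @ b), 6))            # 14 Pb = (16, -1, 11)
```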
If an orthonormal basis for U is known, then the computation of Proj_U = A(AᵀA)⁻¹Aᵀ is easy, as shown in Remark (2) on page 185: for an orthonormal basis β = {u₁, u₂, …, uₙ} for U, the orthogonal projection matrix onto the subspace U is given as

Proj_U = u₁u₁ᵀ + u₂u₂ᵀ + ⋯ + uₙuₙᵀ.

Example 5.19 If A = [c₁ c₂], where c₁ = (1, 0, 0) and c₂ = (0, 1, 0), then the column vectors of A are orthonormal, C(A) is the xy-plane, and the projection of b = (x, y, z) ∈ ℝ³ onto C(A) is b_c = (x, y, 0). In fact,

Proj_C(A) = A(AᵀA)⁻¹Aᵀ = AAᵀ = [1 0 0; 0 1 0; 0 0 0],

which is equal to c₁c₁ᵀ + c₂c₂ᵀ. □
Note that, if we denote by Proj_{uᵢ} the orthogonal projection of ℝᵐ onto the subspace spanned by the basis vector uᵢ for each i, then its projection matrix is uᵢuᵢᵀ, and so

Proj_U = Proj_{u₁} + Proj_{u₂} + ⋯ + Proj_{uₙ},

and

Proj_{uᵢ} Proj_{uⱼ} = 0 if i ≠ j, and Proj_{uᵢ} Proj_{uⱼ} = Proj_{uᵢ} if i = j.

Problem 5.29 Let u = (1/√2, 1/√2) be the vector in ℝ² which determines the 1-dimensional subspace U = {au = (a/√2, a/√2) : a ∈ ℝ}. Show that the matrix

P = (1/2)[1 1; 1 1],

considered as a linear transformation on ℝ², is an orthogonal projection onto the subspace U.
Problem 5.30 Show that if {v₁, v₂, …, vₘ} is an orthonormal basis for ℝᵐ, then v₁v₁ᵀ + v₂v₂ᵀ + ⋯ + vₘvₘᵀ = Iₘ.
In general, if a basis {c₁, c₂, …, cₙ} for U is given but not orthonormal, then one has to compute Proj_U = A(AᵀA)⁻¹Aᵀ directly, where A = [c₁ c₂ ⋯ cₙ] is the m × n matrix whose columns are the given basis vectors cᵢ. Sometimes it is necessary to compute an orthonormal basis from the given basis for U by the Gram-Schmidt orthogonalization. This computation gives us a decomposition of the matrix A into an orthogonal part and an upper triangular part, from which the computation of the projection matrix might be easier.

QR decomposition method: Let {c₁, c₂, …, cₙ} be an arbitrary basis for a subspace U. The Gram-Schmidt orthogonalization process applied to this basis may be written as the following steps:

(1) From the basis {c₁, c₂, …, cₙ}, find an orthogonal basis {q₁, q₂, …, qₙ} for U by
q₁ = c₁,
q₂ = c₂ − (⟨q₁, c₂⟩/⟨q₁, q₁⟩)q₁,
  ⋮
qₙ = cₙ − (⟨qₙ₋₁, cₙ⟩/⟨qₙ₋₁, qₙ₋₁⟩)qₙ₋₁ − ⋯ − (⟨q₁, cₙ⟩/⟨q₁, q₁⟩)q₁.
(2) By normalizing these vectors, uᵢ = qᵢ/‖qᵢ‖, one obtains an orthonormal basis {u₁, …, uₙ} for U.

(3) By rewriting the equations in (1), one gets

c₁ = q₁ = b₁₁u₁,
c₂ = a₁₂q₁ + q₂ = b₁₂u₁ + b₂₂u₂,
  ⋮
cₙ = a₁ₙq₁ + ⋯ + aₙ₋₁,ₙqₙ₋₁ + qₙ = b₁ₙu₁ + ⋯ + bₙₙuₙ,

where aᵢⱼ = ⟨qᵢ, cⱼ⟩/⟨qᵢ, qᵢ⟩ for i < j, aᵢᵢ = 1, and

bᵢⱼ = aᵢⱼ‖qᵢ‖ = (⟨qᵢ, cⱼ⟩/⟨qᵢ, qᵢ⟩)‖qᵢ‖ = ⟨uᵢ, cⱼ⟩ for i ≤ j,

which is just the component of cⱼ in the uᵢ direction.
(4) Let A = [c₁ c₂ ⋯ cₙ]. Then the equations in (3) can be written in matrix notation as

A = [c₁ c₂ ⋯ cₙ] = [u₁ u₂ ⋯ uₙ][b₁₁ b₁₂ ⋯ b₁ₙ; 0 b₂₂ ⋯ b₂ₙ; ⋯; 0 0 ⋯ bₙₙ] = QR,
where Q = [u₁ u₂ ⋯ uₙ] is the m × n matrix whose orthonormal columns are obtained from the cⱼ's by the Gram-Schmidt orthogonalization, and R is the n × n upper triangular matrix.

(5) Note that rank A = rank Q = n ≤ m, and C(Q) = U = C(A), which is of dimension n in ℝᵐ. Moreover, the matrix R is an invertible n × n matrix, since each bⱼⱼ = ⟨uⱼ, cⱼ⟩ is equal to ‖cⱼ − Proj_{U_{j−1}}(cⱼ)‖, where U_{j−1} is the subspace of ℝᵐ spanned by {c₁, c₂, …, c_{j−1}} (or equivalently, by {u₁, u₂, …, u_{j−1}}), and so bⱼⱼ ≠ 0 for all j because cⱼ ∉ U_{j−1}.

One of the byproducts of this computation is the following theorem.

Theorem 5.27 Any m × n matrix of rank n can be factored into a product QR, where Q is an m × n matrix with orthonormal columns and R is an n × n invertible upper triangular matrix.

Definition 5.11 The decomposition A = QR is called the QR factorization or the QR decomposition of an m × n matrix A (rank A = n), where the matrix Q = [u₁ u₂ ⋯ uₙ] is called the orthogonal part of A, and the matrix R = [bᵢⱼ] is called the upper triangular part of A.
Remark: In the QR factorization A = QR, the orthonormality of the column vectors of Q means QᵀQ = Iₙ, and the j-th column of the matrix R is simply the coordinate vector of cⱼ with respect to the orthonormal basis β = {u₁, u₂, …, uₙ} for U; i.e.,

[cⱼ]_β = [b₁ⱼ; ⋯; bⱼⱼ; 0; ⋯; 0], and so R = [[c₁]_β [c₂]_β ⋯ [cₙ]_β].
P
Xo
= =
A(A T A)-I AT (AT A)-I ATb
= QR(R T QT QR)-I R T QT = QQT, = (R T QT QR)-I R T QTb = R- 1 QTb.
Corollary 5.28 Let A be an m x n matrix of rank n and let A = Q R be its Q R factorization. Then, (1) the projection matrix on the column space of A is [Prok(A)]a = Q QT . (2) The least squares solution of the system Ax = b is given by XQ = R - I QTb, which can be solved by using back substitution to the system Rx = QTb.
Example 5.20 (QR decomposition of A) Find the QR factorization A = QR and the orthogonal projection matrix P = [Proj_C(A)]_α for

A = [c₁ c₂ c₃] = [1 1 0; 1 0 1; 0 1 1; 0 0 1].

Solution: We first find the decomposition of A into Q and R, the orthogonal part and the upper triangular part. Use the Gram-Schmidt orthogonalization to get the column vectors of Q:

q₁ = c₁ = (1, 1, 0, 0),
q₂ = c₂ − (c₂ · q₁/q₁ · q₁)q₁ = (1/2, −1/2, 1, 0),
q₃ = c₃ − (c₃ · q₂/q₂ · q₂)q₂ − (c₃ · q₁/q₁ · q₁)q₁ = (−2/3, 2/3, 2/3, 1),

and ‖q₁‖ = √2, ‖q₂‖ = √(3/2), ‖q₃‖ = √(7/3). Hence,

u₁ = q₁/‖q₁‖ = (1/√2, 1/√2, 0, 0),
u₂ = q₂/‖q₂‖ = (1/√6, −1/√6, 2/√6, 0),
u₃ = q₃/‖q₃‖ = (−2/√21, 2/√21, 2/√21, 3/√21).

Then c₁ = √2 u₁, c₂ = (1/√2)u₁ + √(3/2) u₂, and c₃ = (1/√2)u₁ + (1/√6)u₂ + √(7/3) u₃. (In fact, these equations can also be derived from A = QR with an upper triangular matrix R, whose (i, j)-entry is bᵢⱼ = ⟨uᵢ, cⱼ⟩.) Therefore,

A = [1/√2 1/√6 −2/√21; 1/√2 −1/√6 2/√21; 0 2/√6 2/√21; 0 0 3/√21][√2 1/√2 1/√2; 0 √(3/2) 1/√6; 0 0 √(7/3)] = QR,

and

P = QQᵀ = (1/7)[6 1 1 −2; 1 6 −1 2; 1 −1 6 2; −2 2 2 3]. □
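The factorization and the projection of Example 5.20 can be checked with a library QR routine. A sketch (note that np.linalg.qr may flip the signs of columns of Q, but the product QQᵀ is unaffected):

```python
import numpy as np

A = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.],
              [0., 0., 1.]])

Q, R = np.linalg.qr(A)                       # reduced QR: Q is 4x3, R is 3x3
print(np.allclose(Q @ R, A))
print(np.allclose(Q.T @ Q, np.eye(3)))       # orthonormal columns

P1 = Q @ Q.T                                 # projection via Corollary 5.28
P2 = A @ np.linalg.inv(A.T @ A) @ A.T        # projection via Theorem 5.26
print(np.allclose(P1, P2))                   # both give Proj_C(A)
```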
5.9.3. Application: Orthogonal projection matrices Problem 5.31 Find the 2 x 2 matrix P that projects the xy -plane onto the line y
195
=x .
Problem 5.32 Find the projection matrix P of the Euclidean 3-space ℝ³ onto the column space C(A) of A.
Problem 5.33 Find the projection matrix P on the x₁, x₂, x₄ coordinate subspace of the Euclidean 4-space ℝ⁴.

Problem 5.34 Find the QR factorization of the matrix [cos θ sin θ; sin θ cos θ].
As the last part of this section, we introduce a characterization of the orthogonal projection matrices.

Theorem 5.29 A square matrix P is an orthogonal projection matrix if and only if it is symmetric and idempotent, i.e., Pᵀ = P and P² = P.

Proof: Let P be an orthogonal projection matrix. Then the matrix P can be written as P = A(AᵀA)⁻¹Aᵀ for a matrix A whose column vectors form a basis for the column space of P. It gives
Pᵀ = (A(AᵀA)⁻¹Aᵀ)ᵀ = A((AᵀA)ᵀ)⁻¹Aᵀ = A(AᵀA)⁻¹Aᵀ = P,
P² = PP = (A(AᵀA)⁻¹Aᵀ)(A(AᵀA)⁻¹Aᵀ) = A(AᵀA)⁻¹(AᵀA)(AᵀA)⁻¹Aᵀ = A(AᵀA)⁻¹Aᵀ = P.
In fact, this second equation was already shown in Theorem 5.9. Conversely, by Theorem 5.16, one has the orthogonal decomposition jRm = C(P)EB N(p T). But, N(p T) = N(P) since p T = P. Thus , for any u + 0 E C(P) EB N(p T) = jRm, P(u+o) = Pu+ Po = Pu = u, because p 2 = P implies Pu = u for u E C(P) . It shows that P is an orthogonal projection matrix. (Alternatively, one can use directly Theorem 5.9). 0 From Corollary 5.10, if P is a projection matrix on C(P), then 1- P is a projection matrix on the null space N(p) (= C(l - P)), which is orthogonal to C(P) (=
N(l - P)). Example 5.21 Let Pi : jRm --+ jRm be defined by
P_i(x1, ..., xm) = (0, ..., 0, x_i, 0, ..., 0), for i = 1, 2, ..., m. Then, each P_i is the projection of ℝᵐ onto the i-th axis, whose matrix form is the diagonal matrix with a single 1 in the (i, i)-entry and 0's elsewhere, while I − P_i is the diagonal matrix with a single 0 in the (i, i)-entry and 1's elsewhere. When we restrict the image to ℝ, P_i is an element in the dual space (ℝᵐ)*, and is usually denoted by x_i as the i-th coordinate function (see Example 4.25). □

Problem 5.35 Show that any square matrix P that satisfies P^T P = P is a projection matrix.
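Theorem 5.29 can be checked with exact arithmetic on a concrete 4 × 4 projection matrix of the form P = QQ^T (the one with entries in sevenths appearing in this section); the sketch below assumes those entries and verifies symmetry and idempotency:

```python
# Check P^T = P and P^2 = P exactly, using rational arithmetic.
from fractions import Fraction as F

P = [[F(c, 7) for c in row] for row in
     [[ 6,  1,  1, -2],
      [ 1,  6, -1,  2],
      [ 1, -1,  6,  2],
      [-2,  2,  2,  3]]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

symmetric = all(P[i][j] == P[j][i] for i in range(4) for j in range(4))
idempotent = matmul(P, P) == P
```

Since Fraction arithmetic is exact, both checks are genuine equalities, not floating-point approximations.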
5.10 Exercises

5.1. Decide which of the following functions on ℝ² are inner products and which are not. For x = (x1, x2), y = (y1, y2) in ℝ²,
(1) (x, y) = x1 y1 x2 y2,
(2) (x, y) = 4x1 y1 + 4x2 y2 − x1 y2 − x2 y1,
(3) (x, y) = x1 y2 − x2 y1,
(4) (x, y) = x1 y1 + 3x2 y2,
(5) (x, y) = x1 y1 − x1 y2 − x2 y1 + 3x2 y2.
5.2. Show that the function (A, B) = tr(A^T B) for A, B ∈ M_{n×n}(ℝ) defines an inner product on M_{n×n}(ℝ).
5.3. Find the angle between the vectors (4, 7, 9, 1, 3) and (2, 1, 1, 6, 8) in ℝ⁵.
5.4. Determine the values of k so that the given vectors are orthogonal with respect to the Euclidean inner product in ℝ⁴.
(l)IUlun (2)IUH -nl
5.5. Consider the space C[0, 1] with the inner product defined by

(f, g) = ∫₀¹ f(x)g(x) dx.

Compute the length of each vector and the cosine of the angle between each pair of vectors in each of the following:
(1) f(x) = 1, g(x) = x;
(2) f(x) = x^m, g(x) = x^n, where m, n are positive integers;
(3) f(x) = sin mπx, g(x) = cos nπx, where m, n are positive integers.
5.6. Prove that

(a1 + a2 + ··· + an)² ≤ n(a1² + a2² + ··· + an²)

for any real numbers a1, a2, ..., an. When does equality hold?
5.7. Let V = P₂([0, 1]) be the vector space of polynomials of degree ≤ 2 on [0, 1] equipped with the inner product

(f, g) = ∫₀¹ f(t)g(t) dt.

(1) Compute (f, g) and ||f|| for f(x) = x + 2 and g(x) = x² − 2x − 3.
(2) Find the orthogonal complement of the subspace of scalar polynomials.
5.8. Find an orthonormal basis for the Euclidean 3-space ℝ³ by applying the Gram-Schmidt orthogonalization to the three vectors x1 = (1, 0, 1), x2 = (1, 0, −1), x3 = (0, 3, 4).
5.9. Let W be the subspace of the Euclidean space ℝ³ spanned by the vectors v1 = (1, 1, 2) and v2 = (1, 1, −1). Find Proj_W(b) for b = (1, 3, −2).
5.10. Show that if u is orthogonal to v, then every scalar multiple of u is also orthogonal to v. Find a unit vector orthogonal to v1 = (1, 1, 2) and v2 = (0, 1, 3) in the Euclidean 3-space ℝ³.
5.11. Determine the orthogonal projection of v1 onto v2 for the following vectors in the n-space ℝⁿ with the Euclidean inner product.
(1) v1 = (1, 2, 3), v2 = (1, 1, 2),
(2) v1 = (1, 2, 1), v2 = (2, 1, −1),
(3) v1 = (1, 0, 1, 0), v2 = (0, 2, 2, 0).
5.12. Let S = {v_i}, where the v_i's are given below. For each S, find a basis for S⊥ with respect to the Euclidean inner product on ℝⁿ.
(1) v1 = (0, 1, 0), v2 = (0, 0, 1),
(2) v1 = (1, 1, 0), v2 = (1, 1, 1),
(3) v1 = (1, 0, 1, 2), v2 = (1, 1, 1, 1), v3 = (2, 2, 0, 1).
5.13. Which of the following matrices are orthogonal?

(1) [  1/2  −1/3 ]     (2) [  4/5  −3/5 ]
    [ −1/2   1/3 ],        [ −3/5   4/5 ],

(3) [   0     1/√2    0   ]     (4) [ 1/√2   1/√3  −1/√6 ]
    [ −1/√2  −1/√2   1/√2 ],        [ 1/√2  −1/√3   1/√6 ]
    [ −1/√2   1/√2    0   ]         [  0     1/√3   2/√6 ].
5.14. Let W be the subspace of the Euclidean 4-space ℝ⁴ consisting of all vectors that are orthogonal to both x = (1, 0, −1, 1) and y = (2, 3, −1, 2). Find a basis for the subspace W.
5.15. Let V be an inner product space. For vectors x and y in V, establish the following identities:
(1) (x, y) = (1/4)||x + y||² − (1/4)||x − y||²  (polarization identity),
(2) (x, y) = (1/2)(||x + y||² − ||x||² − ||y||²)  (polarization identity),
(3) ||x + y||² + ||x − y||² = 2(||x||² + ||y||²)  (parallelogram equality).
5.16. Show that x + y is perpendicular to x − y if and only if ||x|| = ||y||.
Figure 5.6. n-dimensional parallelepiped P(A)
5.17. Let A be the m × n matrix whose columns are c1, c2, ..., cn in the Euclidean m-space ℝᵐ. Prove that the volume of the n-dimensional parallelepiped P(A) determined by those vectors c_i's in ℝᵐ is given by

vol(A) = √det(A^T A).

(Note that the volume of the n-dimensional parallelepiped determined by the vectors c1, c2, ..., cn in ℝᵐ is by definition the product of the volume of the (n − 1)-dimensional parallelepiped (base) determined by c2, ..., cn and the height of c1 from the plane W which is spanned by c2, ..., cn. Here, the height is the length of the vector c = c1 − Proj_W(c1), which is orthogonal to W. (See Figure 5.6.) If the vectors are linearly dependent, then the parallelepiped is degenerate, i.e., it is contained in a subspace of dimension less than n.)
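The formula vol(A) = √det(AᵀA) is easy to evaluate directly; as a sketch (using, for illustration, the three edge vectors from the origin of the tetrahedron in Exercise 5.18):

```python
# Parallelepiped volume in R^4 via the Gram matrix G = A^T A.
import math

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

cols = [(1, 0, 0, 0), (0, 1, 2, 2), (0, 0, 1, 2)]          # edge vectors
G = [[sum(a * b for a, b in zip(u, v)) for v in cols] for u in cols]
vol = math.sqrt(det3(G))                                   # parallelepiped volume
```

Here G = AᵀA is the 3 × 3 Gram matrix of the edge vectors; the tetrahedron of Exercise 5.18 has volume vol/3! since a simplex occupies 1/n! of its parallelepiped.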
5.18. Find the volume of the three-dimensional tetrahedron in the Euclidean 4-space ℝ⁴ whose vertices are at (0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 2, 2), and (0, 0, 1, 2).
5.19. For an orthogonal matrix A, show that det A = ±1. Give an example of an orthogonal matrix A for which det A = −1.
5.20. Find orthonormal bases for the row space and the null space of each of the following matrices.

(1) [ 2  4  3 ]     (2) [  1   4  0 ]
    [ 1  1  1 ],        [ −2  −3  1 ]
    [ 2  0  1 ]         [  0   0  2 ].

5.21. Let A be an m × n matrix of rank r. Find a relation among m, n, and r so that Ax = b has infinitely many solutions for every b ∈ ℝᵐ.
5.22. Find the equation of the straight line that best fits the data of the four points (0, 1), (1, 3), (2, 4), and (3, 4).
5.23. Find the cubic polynomial that best fits the data of the five points (−1, −14), (0, −5), (1, −4), (2, 1), and (3, 22).
5.24. Let W be the subspace of the Euclidean 4-space ℝ⁴ spanned by the vectors x_i's given in each of the following problems. Find the projection matrix P for the subspace W and the null space N(P) of P. Compute Pb for b given in each problem.
(1) x1 = (1, 1, 1, 1), x2 = (1, −1, 1, −1), x3 = (−1, 1, 1, 0), and b = (1, 2, 1, 1).
(2) x1 = (0, −2, 2, 1), x2 = (2, 0, −1, 2), and b = (1, 1, 1, 1).
(3) x1 = (2, 0, 3, −6), x2 = (−3, 6, 8, 0), and b = (−1, 2, −1, 1).
5.25. Find the matrix for the orthogonal projection from the Euclidean 3-space ℝ³ to the plane spanned by the vectors (1, 1, 1) and (1, 0, 2).
5.26. Find the projection matrix for the row space and the null space of each of the following matrices:

(1) [ 1/√5  −2/√5 ]     (2) [ 1  1 ]     (3) [ 1  4   0 ]
    [ 2/√5   1/√5 ],        [ 1  1 ]         [ 0  0   2 ]
                            [ 2  4 ],        [ 2  3  −1 ].
5.27. Consider the space C[−1, 1] with the inner product defined by

(f, g) = ∫₋₁¹ f(x)g(x) dx.

A function f ∈ C[−1, 1] is even if f(−x) = f(x), or odd if f(−x) = −f(x). Let U and V be the sets of all even functions and odd functions in C[−1, 1], respectively.
(1) Prove that U and V are subspaces and C[−1, 1] = U ⊕ V.
(2) Prove that U ⊥ V.
(3) Prove that for any f ∈ C[−1, 1], ||f||² = ||h||² + ||g||², where f = h + g ∈ U ⊕ V.
5.28. Determine whether the following statements are true or false, in general, and justify your answers.
(1) An inner product can be defined on any vector space.
(2) Two nonzero vectors x and y in an inner product space are linearly independent if and only if the angle between x and y is not zero.
(3) If V is perpendicular to W, then V⊥ is perpendicular to W⊥.
(4) Let V be an inner product space. Then ||x − y|| ≥ ||x|| − ||y|| for any vectors x and y in V.
(5) Every permutation matrix is an orthogonal matrix.
(6) For any n × n symmetric matrix A, x^T Ay defines an inner product on ℝⁿ.
(7) A square matrix A is a projection matrix if and only if A² = A.
(8) For a linear transformation T : ℝⁿ → ℝⁿ, T is an orthogonal projection if and only if id_{ℝⁿ} − T is an orthogonal projection.
(9) For any m × n matrix A, the row space R(A) and the column space C(A) are orthogonal.
(10) A linear transformation T is an isomorphism if and only if it is an isometry.
(11) For any m × n matrix A and b ∈ ℝᵐ, A^T Ax = A^T b always has a solution.
(12) The least squares solution of Ax = b is unique for any matrix A.
(13) The least squares solution of Ax = b is the orthogonal projection of b onto the column space of A.
6 Diagonalization

6.1 Eigenvalues and eigenvectors

Gaussian elimination plays a fundamental role in solving a system Ax = b of linear equations. In general, instead of solving the given system, one could try to solve the normal equation A^T Ax = A^T b, whose solutions are the true solutions or the least squares solutions depending on whether or not the given system is consistent. Note that the matrix A^T A is a symmetric square matrix, and so one may assume that the matrix in the system is a square matrix. For this kind of reason, we focus on a square matrix, or a linear transformation from a vector space to itself, throughout this chapter.

Recall that a square matrix A, as a linear transformation on ℝⁿ, may have various matrix representations depending on the choice of the bases, which are all in similar relations. In particular, A itself is the matrix representation with respect to the standard basis. One may now ask whether or not there exists a basis β with respect to which the matrix representation [A]_β of A is diagonal. But then A and a diagonal matrix D = [A]_β are similar: i.e., there is an invertible matrix Q such that D = Q⁻¹AQ. In this chapter, we will see which matrices can have diagonal matrix representations and how one can find such representations. For this we introduce eigenvalues and eigenvectors, which play important roles in their own right in mathematics and have far-reaching applications not only in mathematics, but also in other fields of science and engineering. Some specific applications of the diagonalization of a square matrix A are to
(1) solving a system Ax = b of linear equations,
(2) checking the invertibility of A or estimating det A,
(3) calculating a power Aⁿ or the limit of a matrix series ∑_{n=1}^∞ Aⁿ,
(4) solving systems of linear differential equations or difference equations,
(5) finding a simple form of the matrix representation of a linear transformation,
etc. One might notice that some of these problems are easy if A is diagonal.

Definition 6.1 Let A be an n × n square matrix. A nonzero vector x in the n-space ℝⁿ is called an eigenvector (or characteristic vector) of A if there is a scalar λ in ℝ
such that Ax = λx. The scalar λ is called an eigenvalue (or characteristic value) of A, and we say x belongs to λ.

Geometrically, an eigenvector of a matrix A is a nonzero vector x in the n-space ℝⁿ such that the vectors x and Ax are parallel. In other words, the subspace W spanned by x is invariant under the linear transformation A : ℝⁿ → ℝⁿ in the sense A(W) ⊆ W. Algebraically, an eigenvector x is a nontrivial solution of the homogeneous system (λI − A)x = 0 of linear equations; that is, an eigenvector x is a nonzero vector in the null space N(λI − A).

There are two unknowns in the system (λI − A)x = 0: an eigenvalue λ and an eigenvector x. To find those unknowns, first we should determine an eigenvalue λ by using the fact that the equation (λI − A)x = 0 has a nontrivial solution x if and only if λ satisfies the equation det(λI − A) = 0, called the characteristic equation of A. Note that det(λI − A) is a polynomial of degree n in λ, and it will be called the characteristic polynomial of A. Thus, the eigenvalues are just the roots of the characteristic equation det(λI − A) = 0. Next, the eigenvectors of A can be determined by solving the homogeneous system (λI − A)x = 0 for each eigenvalue λ. In summary, by referring to Theorem 3.26 we have the following theorem.

Theorem 6.1 For any square matrix A, the following are equivalent:
(1) λ is an eigenvalue of A;
(2) det(λI − A) = 0 (or det(A − λI) = 0);
(3) λI − A is singular;
(4) the homogeneous system (λI − A)x = 0 has a nontrivial solution.
Recall that the eigenvectors of A belonging to an eigenvalue λ are just the nonzero vectors x in the null space N(λI − A). This null space is called the eigenspace of A belonging to λ, and denoted by E(λ).

Example 6.1 (Matrix having distinct eigenvalues) Find the eigenvalues and eigenvectors of

A = [  2   √2 ]
    [ √2    1 ].

Solution: The characteristic polynomial is

det(λI − A) = det [ λ − 2   −√2  ] = λ² − 3λ = λ(λ − 3).
                  [  −√2   λ − 1 ]

Thus the eigenvalues are λ1 = 0 and λ2 = 3. To determine the eigenvectors belonging to the λ_i's, we should solve the homogeneous system of equations (λ_i I − A)x = 0. Let us take λ1 = 0 first; then the system of equations (λ1 I − A)x = 0 becomes

{ −2x1 − √2 x2 = 0,
{ −√2 x1 − x2 = 0,      or   x2 = −√2 x1.

Hence, x1 = (x1, x2) = (−1, √2) is an eigenvector belonging to λ1 = 0, and E(0) = {t x1 : t ∈ ℝ}. (Here, one can take any nonzero solution (x1, x2) as an eigenvector x1 belonging to λ1 = 0.) For λ2 = 3, the system of equations (λ2 I − A)x = 0 becomes

{ x1 − √2 x2 = 0,
{ −√2 x1 + 2x2 = 0,     or   x1 = √2 x2.

Thus, by a similar calculation, x2 = (√2, 1) is one of the eigenvectors belonging to λ2 = 3, and E(3) = {t x2 : t ∈ ℝ}. Note that the eigenvectors x1 and x2 belonging to the eigenvalues λ1 and λ2, respectively, are linearly independent. □

Example 6.2 (Matrix having a repeated eigenvalue but full eigenvectors) Find a basis for the eigenspaces of A =
[  3  −2  0 ]
[ −2   3  0 ]
[  0   0  5 ].

Solution: The characteristic polynomial of A is (λ − 1)(λ − 5)², so that the eigenvalues of A are λ1 = 1 and λ2 = 5 with multiplicity 2. Thus, there are two eigenspaces of A. By definition, x = (x1, x2, x3) is an eigenvector of A belonging to λ if and only if x is a nontrivial solution of the homogeneous system (λI − A)x = 0:

[ λ − 3    2      0   ] [ x1 ]   [ 0 ]
[   2    λ − 3    0   ] [ x2 ] = [ 0 ]
[   0      0    λ − 5 ] [ x3 ]   [ 0 ].

If λ1 = 1, then the system becomes

[ −2   2   0 ] [ x1 ]   [ 0 ]
[  2  −2   0 ] [ x2 ] = [ 0 ]
[  0   0  −4 ] [ x3 ]   [ 0 ].

Solving this system yields x1 = t, x2 = t, x3 = 0 for t ∈ ℝ. Thus, the eigenvectors belonging to λ1 = 1 are nonzero vectors of the form

x = t(1, 1, 0),

so that (1, 1, 0) is a basis for the eigenspace E(λ1) belonging to λ1 = 1. If λ2 = 5, then the system becomes

[ 2  2  0 ] [ x1 ]   [ 0 ]
[ 2  2  0 ] [ x2 ] = [ 0 ]
[ 0  0  0 ] [ x3 ]   [ 0 ].
Solving this system yields x1 = −s, x2 = s, x3 = t for s, t ∈ ℝ. Thus, the eigenvectors of A belonging to λ2 = 5 are nonzero vectors of the form

x = s(−1, 1, 0) + t(0, 0, 1)

for s, t ∈ ℝ. Since (−1, 1, 0) and (0, 0, 1) are linearly independent, they form a basis for the eigenspace E(λ2) belonging to λ2 = 5. □

For each eigenvalue λ of A in Examples 6.1 and 6.2, one can see that the dimension of the eigenspace E(λ) is equal to the multiplicity of λ as a root of the equation det(λI − A) = 0. But, in general, this is not true, as the next example shows.

Example 6.3 (Matrix having a repeated eigenvalue with insufficient eigenvectors) Consider the matrix

A = [ 2  1  0  0  0 ]
    [ 0  2  1  0  0 ]
    [ 0  0  2  1  0 ]
    [ 0  0  0  2  1 ]
    [ 0  0  0  0  2 ].

A simple computation shows that the characteristic polynomial of the matrix A is (λ − 2)⁵, so that the eigenvalue λ = 2 is of multiplicity 5. However, there is only one linearly independent eigenvector e1 = (1, 0, 0, 0, 0) belonging to λ = 2, because rank(2I − A) = 4, which shows that dim E(λ) = dim N(2I − A) = 1 is less than the multiplicity of λ. This kind of matrix will be discussed later in Chapter 8. □

Note that the equation det(λI − A) = 0 may have complex roots, which are called complex eigenvalues. However, the complex numbers are not scalars of the real vector space. In many cases, it is necessary to deal with those complex numbers; that is, we need to expand the set of scalars to the set of complex numbers. This expansion of the set of scalars leads us to work with complex vector spaces, which will be treated in Chapter 7. In this chapter, we restrict our discussion to the case of real eigenvalues, even though the entire discussion in this chapter applies in the same way to complex vector spaces.

Example 6.4 (Matrix having complex eigenvalues) The characteristic polynomial of the matrix

A = [ cos θ  −sin θ ]
    [ sin θ   cos θ ]

is λ² − 2 cos θ λ + (cos²θ + sin²θ) = λ² − 2 cos θ λ + 1. Thus, the eigenvalues are λ = cos θ ± i sin θ, which are complex numbers, so this matrix, as a rotation of ℝ², has no real eigenvalues unless θ = nπ, n = 0, ±1, ±2, .... □
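As a numerical sketch of Example 6.1 (an illustration, not the book's method): for a 2 × 2 matrix the characteristic polynomial is t² − tr(A)t + det(A), so the eigenvalues come from the quadratic formula, and each eigenvector can be checked against Ax = λx:

```python
# Eigenvalues of A = [[2, sqrt(2)], [sqrt(2), 1]] from the characteristic polynomial,
# and verification that the eigenvectors found in Example 6.1 satisfy A x = lambda x.
import math

a, b, c, d = 2.0, math.sqrt(2), math.sqrt(2), 1.0
tr, det = a + d, a * d - b * c
disc = math.sqrt(tr * tr - 4 * det)
lam1, lam2 = (tr - disc) / 2, (tr + disc) / 2    # should be 0 and 3

def apply(x):
    return (a * x[0] + b * x[1], c * x[0] + d * x[1])

x1 = (-1.0, math.sqrt(2))   # eigenvector for lambda = 0
x2 = (math.sqrt(2), 1.0)    # eigenvector for lambda = 3
```

Within floating-point tolerance, apply(x1) is the zero vector and apply(x2) equals 3·x2, matching the example.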
Problem 6.1 Let λ be an eigenvalue of A and let x be an eigenvector belonging to λ. Use mathematical induction to show that λᵐ is an eigenvalue of Aᵐ and x is an eigenvector of Aᵐ belonging to λᵐ for each m = 1, 2, ....
In the following, we derive some basic properties of the eigenvalues and eigenvectors.

Lemma 6.2 (1) If A is a triangular matrix, then the diagonal entries are exactly the eigenvalues of A.
(2) If A and B are square matrices similar to each other, then they have the same characteristic polynomial.

Proof: (1) The characteristic equation of an upper triangular matrix is

det(λI − A) = (λ − a11) ··· (λ − ann) = 0.

(2) Since there exists a nonsingular matrix Q such that B = Q⁻¹AQ,

det(λI − B) = det(Q⁻¹(λI)Q − Q⁻¹AQ)
            = det(Q⁻¹(λI − A)Q)
            = det Q⁻¹ det(λI − A) det Q
            = det(λI − A).  □
Lemma 6.2(2) says that similar matrices have the same eigenvalues, i.e., the eigenvalues are invariant under similarity. However, their eigenvectors might be different: in fact, x is an eigenvector of B belonging to λ if and only if Qx is an eigenvector of A belonging to λ, since AQ = QB and A(Qx) = Q(Bx) = λ(Qx).
Theorem 6.3 Let an n × n matrix A have n eigenvalues λ1, λ2, ..., λn, possibly with repetition. Then,
(1) det A = λ1 λ2 ··· λn (the product of the n eigenvalues),
(2) tr(A) = λ1 + λ2 + ··· + λn (the sum of the n eigenvalues).

Proof: (1) Since the eigenvalues λ1, λ2, ..., λn are the zeros of the characteristic polynomial of A, we have

det(λI − A) = (λ − λ1)(λ − λ2) ··· (λ − λn).

If we take λ = 0 on both sides, then we get det(−A) = (−1)ⁿ λ1 λ2 ··· λn, that is, det A = λ1 λ2 ··· λn.

(2) On the other hand,

(λ − λ1)(λ − λ2) ··· (λ − λn) = det(λI − A) = det [ λ − a11   −a12   ···    −a1n  ]
                                                  [  −a21   λ − a22 ···    −a2n  ]
                                                  [    ⋮        ⋮     ⋱      ⋮   ]
                                                  [  −an1    −an2   ···  λ − ann ],

which is a polynomial of the form p(λ) = λⁿ + c_{n−1}λⁿ⁻¹ + ··· + c1λ + c0 in λ. One can compute the coefficient c_{n−1} of λⁿ⁻¹ in two ways by expanding both sides, and get λ1 + λ2 + ··· + λn = a11 + a22 + ··· + ann = tr(A).  □
Problem 6.2 Show that
(1) for any 2 × 2 matrix A, det(λI − A) = λ² − tr(A)λ + det A;
(2) for any 3 × 3 matrix A, det(λI − A) = λ³ − tr(A)λ² + (1/2) ∑_{i≠j} (a_ii a_jj − a_ij a_ji) λ − det A.
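Theorem 6.3 can be checked numerically on a small example (a sketch of my own, not from the text): by Problem 6.2(1), the two roots of t² − tr(A)t + det(A) must multiply to det A and sum to tr(A):

```python
# Eigenvalues of a symmetric 2x2 matrix via the quadratic formula, then the
# product/sum checks of Theorem 6.3.
import math

A = [[2.0, 1.0], [1.0, 2.0]]            # eigenvalues 1 and 3
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = math.sqrt(tr * tr - 4 * det)
lam1, lam2 = (tr - disc) / 2, (tr + disc) / 2
```

For this matrix, tr(A) = 4 and det A = 3, so the roots are 1 and 3; their product is det A and their sum is tr(A), as the theorem asserts.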
In Theorem 6.3, we assume that the matrix A has n (real) eigenvalues counting multiplicities. But, by allowing the scalars to be complex numbers, which will be done in the next chapter, every n × n matrix has n eigenvalues counting multiplicities, so that Theorem 6.3 remains true for any square matrix.

Corollary 6.4 The determinant and the trace of A are invariant under similarity.

Recall that a square matrix A is singular if and only if det A = 0. However, det A is the product of its n eigenvalues. Thus a square matrix A is singular if and only if zero is an eigenvalue of A, or A is invertible if and only if zero is not an eigenvalue of A. The following corollaries are easy consequences of this fact.

Corollary 6.5 For any n × n matrices A and B, the following are equivalent.
(1) Zero is an eigenvalue of AB.
(2) A or B is singular.
(3) Zero is an eigenvalue of BA.
Corollary 6.6 For any n × n matrices A and B, the matrices AB and BA have the same eigenvalues.

Proof: By Corollary 6.5, zero is an eigenvalue of AB if and only if it is an eigenvalue of BA. Let λ be a nonzero eigenvalue of AB with (AB)x = λx for a nonzero vector x. Then the vector Bx is not zero, since λ ≠ 0, and

(BA)(Bx) = B(AB)x = B(λx) = λ(Bx).

This means that Bx is an eigenvector of BA belonging to the eigenvalue λ, and λ is an eigenvalue of BA. Similarly, any nonzero eigenvalue of BA is also an eigenvalue of AB. □
Problem 6.3 Find matrices A and B such that det A = det B and tr(A) = tr(B), but A is not similar to B.

Problem 6.4 Show that A and A^T have the same eigenvalues. Do they necessarily have the same eigenvectors?

Problem 6.5 Let λ1, λ2, ..., λn be the eigenvalues of an n × n matrix A. Then
(1) A is invertible if and only if λ_i ≠ 0 for all i = 1, 2, ..., n.
(2) If A is invertible, then the inverse A⁻¹ has eigenvalues 1/λ1, 1/λ2, ..., 1/λn.

Problem 6.6 For any n × n matrices A and B, show that AB and BA are similar if A or B is nonsingular. Is it true for two singular matrices A and B?
6.2 Diagonalization of matrices

In this section, we are going to show what kinds of square matrices are similar to diagonal matrices. That is, given a square matrix A, we want to know whether there exists an invertible matrix Q such that Q⁻¹AQ is a diagonal matrix, and if so, how one can find such a matrix Q.

Definition 6.2 A square matrix A is said to be diagonalizable if there exists an invertible matrix Q such that Q⁻¹AQ is a diagonal matrix (i.e., A is similar to a diagonal matrix).

If a square matrix A is diagonalizable, then the similarity D = Q⁻¹AQ gives an easy way to solve some problems related to the matrix A, like (1)–(5) on page 201. For instance, let Ax = b be a system of linear equations with a square matrix A, and suppose that there is an invertible matrix Q such that Q⁻¹AQ is a diagonal matrix D. Then the system Ax = b can be written as QDQ⁻¹x = b, or equivalently DQ⁻¹x = Q⁻¹b. Hence, for c = Q⁻¹b, the solution y of Dy = c yields the solution x = Qy of the system Ax = b. Note that Dy = c can be solved easily.

The next theorem characterizes a diagonalizable matrix, and the proof shows a practical way of diagonalizing a matrix.
Theorem 6.7 Let A be an n × n matrix. Then A is diagonalizable if and only if A has n linearly independent eigenvectors.

Proof: (⇒) Suppose A is diagonalizable. Then there is an invertible matrix Q such that Q⁻¹AQ is a diagonal matrix D, say

Q⁻¹AQ = D = [ λ1   0  ···   0 ]
            [  0  λ2  ···   0 ]
            [  ⋮   ⋮    ⋱   ⋮ ]
            [  0   0  ···  λn ],

or, equivalently, AQ = QD. Let x1, ..., xn denote the column vectors of Q. Since

AQ = [Ax1 Ax2 ··· Axn],   QD = [λ1x1 λ2x2 ··· λnxn],

the matrix equation AQ = QD implies Ax_i = λ_i x_i for i = 1, ..., n. Moreover, since Q is invertible, its column vectors are nonzero and linearly independent; that is, the x_i's are n linearly independent eigenvectors of A.

(⇐) Assume that A has n linearly independent eigenvectors x1, ..., xn belonging to the eigenvalues λ1, ..., λn, respectively, so that Ax_i = λ_i x_i for i = 1, ..., n. If we define a matrix Q as

Q = [x1 x2 ··· xn]

with x_j as the j-th column vector, then the same equation shows AQ = QD, where D is the diagonal matrix having the eigenvalues λ1, ..., λn on the diagonal. Since the column vectors of Q are assumed to be linearly independent, Q is invertible, so Q⁻¹AQ = D.  □

Remark: (1) The proof of Theorem 6.7 reveals how to diagonalize an n × n matrix A.
Step 1 Find n linearly independent eigenvectors x1, x2, ..., xn of A.
Step 2 Form the matrix Q = [x1 x2 ··· xn].
Step 3 The matrix Q⁻¹AQ will be a diagonal matrix with λ1, ..., λn as its successive diagonal entries, where λ_j is the eigenvalue associated with the eigenvector x_j, j = 1, 2, ..., n.
(2) Let α denote the standard basis for ℝⁿ and let β = {x1, x2, ..., xn} be the basis for ℝⁿ consisting of n linearly independent eigenvectors of A. Then the matrix

Q = [x1 x2 ··· xn] = [[x1]_α [x2]_α ··· [xn]_α] = [id]^α_β

is the basis-change matrix from β to α, and the matrix representation of A, as a linear transformation, with respect to β, is

[A]_β = [id]^β_α [A]_α [id]^α_β = Q⁻¹AQ = [ λ1       0 ]
                                          [     ⋱      ]
                                          [ 0       λn ].

Note that the diagonal entries λ_i's are the eigenvalues of A.
(3) Not all matrices are diagonalizable. A standard example is

A = [ 0  1 ]
    [ 0  0 ].

Its eigenvalues are λ1 = λ2 = 0. Hence, if A were diagonalizable, then Q⁻¹AQ would be the zero matrix for some invertible matrix Q, and then A itself must be the zero matrix. Since A is not the zero matrix, no invertible matrix Q can be obtained so that Q⁻¹AQ is diagonal.

Example 6.5 (Several different types of diagonalization) Diagonalize the matrix
A = [ 1  −3  3 ]
    [ 0  −5  6 ]
    [ 0  −3  4 ].

Solution: A direct calculation gives that the eigenvalues of A are λ1 = λ2 = 1 and λ3 = −2, and their associated eigenvectors are

x1 = (1, 0, 0),  x2 = (0, 1, 1)  and  x3 = (1, 2, 1),

respectively. They are linearly independent, and the first two vectors x1, x2 form a basis for the eigenspace E(1) belonging to λ1 = λ2 = 1, while x3 forms a basis for the eigenspace E(−2) belonging to λ3 = −2. Thus, the matrix

P = [ 1  0  1 ]
    [ 0  1  2 ]
    [ 0  1  1 ]

diagonalizes A. In fact, one can verify that

P⁻¹AP = [ 1  0   0 ]
        [ 0  1   0 ]
        [ 0  0  −2 ].

What would happen if one chose different eigenvectors belonging to the eigenvalues 1 and −2? According to the proof of Theorem 6.7, nothing would happen: any matrix whose columns are linearly independent eigenvectors will diagonalize A. For example, {(−1, 0, 0), (0, −1, −1)} is another basis for E(1), and {(2, 4, 2)} is also a basis for E(−2). The matrix

Q = [ −1   0  2 ]
    [  0  −1  4 ]
    [  0  −1  2 ]

also diagonalizes A as

Q⁻¹AQ = [ 1  0   0 ]
        [ 0  1   0 ]
        [ 0  0  −2 ].

A change in the order of the eigenvectors in constructing a basis-change matrix does not change the diagonalizability of A, but the eigenvalues appearing on the main diagonal of the resulting diagonal matrix appear in accordance with the order of the eigenvectors in the basis-change matrix. For example, let

S = [ 1  1  0 ]
    [ 0  1  1 ]
    [ 0  2  1 ].

Then, S will diagonalize A, because it has linearly independent eigenvectors as columns. In fact, one can show that

S⁻¹AS = [ 1   0  0 ]
        [ 0  −2  0 ]
        [ 0   0  1 ].  □
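The diagonalization in Example 6.5 can be verified with exact integer arithmetic: instead of inverting P, it is enough to check that AP = PD (which is equivalent to P⁻¹AP = D once det P ≠ 0). A sketch:

```python
# Verify A P = P D and det(P) != 0 for the matrices of Example 6.5.
A = [[1, -3, 3], [0, -5, 6], [0, -3, 4]]
P = [[1, 0, 1], [0, 1, 2], [0, 1, 1]]       # columns: x1, x2, x3
D = [[1, 0, 0], [0, 1, 0], [0, 0, -2]]      # matching eigenvalues

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]

lhs, rhs = matmul(A, P), matmul(P, D)

detP = (P[0][0] * (P[1][1] * P[2][2] - P[1][2] * P[2][1])
      - P[0][1] * (P[1][0] * P[2][2] - P[1][2] * P[2][0])
      + P[0][2] * (P[1][0] * P[2][1] - P[1][1] * P[2][0]))
```

Checking AP = PD column by column is exactly the statement Ax_j = λ_j x_j for each eigenvector column x_j.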
Problem 6.7 Show that the following matrices are not diagonalizable for any scalar λ.

(1) A = [ λ  1  0 ]     (2) B = [ λ  0  0 ]
        [ 0  λ  1 ],            [ 1  λ  0 ]
        [ 0  0  λ ]             [ 0  1  λ ].
Problem 6.8 Construct a 2 x 2 matrix A whose eigenvalues are 2 and 3, and whose eigenvectors are (2, 1) and (3, 2), respectively.
From Theorem 6.7, we learn how to diagonalize a matrix and what the diagonal matrix is when the matrix has a full set of linearly independent eigenvectors. The next question is when a square matrix A can have a full set of linearly independent eigenvectors. The following theorem shows that this happens if an n × n matrix has n distinct (real) eigenvalues.

Theorem 6.8 Let λ1, λ2, ..., λk be distinct eigenvalues of a matrix A and x1, x2, ..., xk eigenvectors belonging to them, respectively. Then {x1, x2, ..., xk} is linearly independent.

Proof: Let r be the largest integer such that {x1, ..., xr} is linearly independent. If r = k, then there is nothing to prove. Suppose not, i.e., 1 ≤ r < k. Then {x1, ..., x_{r+1}} is linearly dependent. Thus, there exist scalars c1, c2, ..., c_{r+1} with c_{r+1} ≠ 0 such that

c1 x1 + c2 x2 + ··· + c_{r+1} x_{r+1} = 0.   (1)

Multiplying both sides by A and using Ax_i = λ_i x_i, one can get

c1 λ1 x1 + c2 λ2 x2 + ··· + c_{r+1} λ_{r+1} x_{r+1} = 0.   (2)

Multiplying both sides of (1) by λ_{r+1} and subtracting the resulting equation from (2) yields

c1 (λ1 − λ_{r+1}) x1 + ··· + cr (λr − λ_{r+1}) xr = 0.

Since {x1, x2, ..., xr} is linearly independent and λ1, λ2, ..., λ_{r+1} are all distinct, it follows that c1 = c2 = ··· = cr = 0. Substituting these values in (1) yields c_{r+1} = 0, which is a contradiction to the assumption.  □

As a consequence of Theorems 6.7 and 6.8, we obtain the following.
Example 6.6 Compute A 100 for A = [ ;
i].
Solution: Its eigenvalues are 5 and -2 with associated eigenvectors (1, 1) and (-4, 3), respectively. Hence Q =
[~
- : ] diagonalizes A, i.e.,
212
Chapter 6. Diagonalization
Therefore,
o Problem 6.9 For the matrix A =
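Because Python handles big integers exactly, the closed form for A¹⁰⁰ can be checked against plain repeated multiplication (a sketch verifying the computation above):

```python
# Compare A^100 by repeated multiplication with Q D^100 Q^{-1} in closed form.
A = [[1, 4], [3, 2]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]

An = [[1, 0], [0, 1]]
for _ in range(100):
    An = matmul(An, A)

p, q = 5 ** 100, (-2) ** 100
closed = [[(3 * p + 4 * q) // 7, (4 * p - 4 * q) // 7],
          [(3 * p - 3 * q) // 7, (4 * p + 3 * q) // 7]]
```

Every entry of the closed form is an integer (each numerator is divisible by 7), and it agrees with the brute-force power entry by entry.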
[
5 -4 4]
12 -11 12 , 4 -4 5 (1) diagonalize the matrix A; and (2) find the eigenvalues of A 10 + A7 + 5A.
6.3 Applications 6.3.1 Linear recurrence relations Early in the thirteenth century, Fibonacci posed the following problem: "Suppose that a newly born pair of rabbits produces no offspring during the first month of their lives, but each pair gives birth to a new pair once a month from the second month onward. Starting with one (= XI) newly born pair in the first month, how many pairs of rabbits can be bred in a given time, assuming no rabbit dies?" Initially, there is one pair. After one month there is still one pair, but two months later it gives a birth, so there are two pairs. If at the end of n months there are X n pairs, then after n + 1 months the number will be the X n pairs plus the number of offspring of the Xn-I pairs who were alive at n - 1 months. Therefore, we have for n ::: 2, Xn+1
Here , if we assume become
Xo
= Xn +
Xn- I ·
= 0 and XI = 1, then the first several terms of the sequence
0, I, 1, 2, 3, 5, 8, 13, 21, 34, 55, .. .. This sequence is called the Fibonacci sequence and each term is called a Fibonacci number. Example 6.7 Find the 2000 th Fibonacci number. Solution: A standard trick is to consider a trivial extra equation X n = with the given equation: Xn+1 {
Equivalently in matrix notation ,
Xn
= =
+
Xn-I
Xn
together
6.3.1. Application: Recurrence relations
213
which is of the form
= AXn_1 = Anxo,
Xn
where Xn
= [ X::I
],
Xo
= [~]
n
and A
= 1,
2, . .. ,
= [~ ~] .
Thus, the problem is
reduced to computing An . However, a simple computation gives the eigenvalues Al = + ../5), A2 = ../5) of A and their associated eigenvectors VI = (AI, 1), V2 = (A2, 1), respectively. Moreover, the basis-change matrix and its inverse are found to be
to
to -
Q-I_ _ 1 [
-../5
With D = [
1+J5 ~
0
1
-1
_1-J5] 2 1+J5 . 2
]
12J5'
e-2oJ5f ] Q-. I
For instance, if n = 2000, then X2001 ] [ X2000
= X2000
It gives
In general, the Fibonacci numbers satisfy
for n ::: O. Note that since huge number
Js (1+J5f ooo, because e-2J5t is actually very small for large k . X2000
2
must be an integer, we look for the nearest integer to the
Historically, the number (1 + √5)/2, which is very close to the ratio x_{n+1}/x_n, is called the golden mean.  □

Remark: The golden mean is one of the mysterious naturally occurring numbers, like e = 2.71828182··· or π = 3.14159265···, and it is denoted by φ. Its decimal representation is φ = 1.61803398···. It is also described as φ = 1/s for the number 0 < s < 1 satisfying s² = 1 − s.
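The two routes of Example 6.7 can be compared in a short sketch (my own illustration; for moderate n, double-precision floating point is enough for the nearest-integer formula):

```python
# Fibonacci numbers: iterate the companion matrix A = [[1, 1], [1, 0]] on
# (x_{n+1}, x_n), and compare with the nearest integer to phi^n / sqrt(5).
import math

def fib_matrix(n):
    a, b = 1, 0                 # (x_1, x_0); each step multiplies by A
    for _ in range(n):
        a, b = a + b, a
    return b                    # x_n

phi = (1 + math.sqrt(5)) / 2

def fib_binet(n):
    return round(phi ** n / math.sqrt(5))
```

The rounding works because |(1 − √5)/2|ⁿ/√5 < 1/2 for every n ≥ 0; for very large n (such as n = 2000), exact integer arithmetic with fib_matrix avoids floating-point overflow and rounding error.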
Definition 6.3 A sequence {Xn : n ~ O} of numbers is said to satisfy a linear recurrence relation of order k if there exist k constants aj , i = 1, ... , k with al and ak -ronzero such that Xn
= alXn-1
+ azxn-z + ... +
akXn-k
for all n ~ k ,
For example, the relation $x_n = ax_{n-1}$ of order 1 for $n = 1, 2, \ldots$ gives a geometric sequence $x_n = a^nx_0$, and the relation $x_{n+1} = x_n + x_{n-1}$ of order 2 with $x_0 = 0$, $x_1 = 1$ for $n = 1, 2, \ldots$ gives the Fibonacci sequence. A solution to the recurrence relation is any sequence $\{x_n : n \ge 0\}$ of numbers that satisfies the equation. Of course, a solution can be found by simply writing out enough terms of the sequence if the $k$ beginning values $x_0, x_1, \ldots, x_{k-1}$, called the initial values, are given. As in the case of the Fibonacci sequence, one can write the recurrence relation

$$x_n = a_1x_{n-1} + a_2x_{n-2} + \cdots + a_kx_{n-k} \quad\text{for all } n \ge k,$$

or equivalently,

$$x_{n+k-1} = a_1x_{n+k-2} + a_2x_{n+k-3} + \cdots + a_kx_{n-1} \quad\text{for all } n \ge 1.$$
Its matrix form, with some trivial extra equations, is

$$\mathbf{x}_n = \begin{bmatrix} x_{n+k-1} \\ x_{n+k-2} \\ x_{n+k-3} \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{k-1} & a_k \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{bmatrix}\begin{bmatrix} x_{n+k-2} \\ x_{n+k-3} \\ x_{n+k-4} \\ \vdots \\ x_{n-1} \end{bmatrix} = A\mathbf{x}_{n-1}$$

for $n \ge 1$, or simply $\mathbf{x}_n = A\mathbf{x}_{n-1}$. The matrix $A$ is called the companion matrix of the recurrence relation $x_n = a_1x_{n-1} + \cdots + a_kx_{n-k}$. To solve a recurrence relation $\mathbf{x}_n = A\mathbf{x}_{n-1}$, we first compute the characteristic polynomial of the companion matrix $A$.
Lemma 6.10 For a companion matrix

$$A = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{k-1} & a_k \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$

with $a_1$ and $a_k$ nonzero,

(1) the characteristic polynomial of $A$ is $\lambda^k - a_1\lambda^{k-1} - \cdots - a_{k-1}\lambda - a_k$;
(2) all eigenvalues of $A$ are nonzero, and for any eigenvalue $\lambda$ of $A$, $x_n = \lambda^n$ is a solution of the recurrence relation $\mathbf{x}_n = A\mathbf{x}_{n-1}$.
Proof: (1) Use induction on $k$. The claim is clearly true for $k = 1$. Assume the equality holds for $k = n - 1$, and let $k = n$. Taking the cofactor expansion of $\det(\lambda I - A)$ along the last column, the induction hypothesis gives

$$\det(\lambda I - A) = \lambda(\lambda^{n-1} - a_1\lambda^{n-2} - \cdots - a_{n-1}) + (-1)^{2n-1}a_n = \lambda^n - a_1\lambda^{n-1} - \cdots - a_{n-1}\lambda - a_n.$$

(2) Clearly all eigenvalues are nonzero, because $a_k \ne 0$. It follows from (1) that for any eigenvalue $\lambda$ of $A$, $x_n = \lambda^n$ satisfies the recurrence relation. $\square$
Remark: (1) By Lemma 6.10(1), every monic polynomial (a polynomial whose coefficient of the highest-degree term is 1) can be expressed as the characteristic polynomial of some matrix $A$. This matrix $A$ is also called the companion matrix of the monic polynomial $p(\lambda) = \lambda^k - a_1\lambda^{k-1} - \cdots - a_{k-1}\lambda - a_k$.

(2) From Lemma 6.10(1), one can see that if a recurrence relation is given, then the characteristic equation of the associated companion matrix $A$ can be obtained from the recurrence relation by replacing each $x_i$ with $\lambda^i$ and dividing the resulting equation by $\lambda^{n-k}$. This relation between the recurrence relation and the characteristic equation of the matrix $A$ explains why $\{\lambda^n : n \ge 0\}$ is a solution for each eigenvalue $\lambda$.

Lemma 6.11 If $\lambda_0$ is an eigenvalue of the companion matrix $A$ of a linear difference equation of order $k$, then the eigenspace $E(\lambda_0)$ is a 1-dimensional subspace and contains $[\lambda_0^{k-1} \;\cdots\; \lambda_0 \;\; 1]^T$.

Proof: An entry-wise comparison of $A\mathbf{x} = \lambda_0\mathbf{x}$ shows that $x_{i-1} = \lambda_0x_i$ for $i = 2, \ldots, k$, so that $x_i = \lambda_0^{k-i}x_k$ and every eigenvector is a scalar multiple of $[\lambda_0^{k-1} \;\cdots\; \lambda_0 \;\; 1]^T$. $\square$
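Lemma 6.10(1) can be spot-checked numerically. The sketch below (not from the text; function names are illustrative) builds a companion matrix and evaluates $\det(\lambda I - A)$ by Gaussian elimination, comparing it to $\lambda^k - a_1\lambda^{k-1} - \cdots - a_k$ at a few sample values of $\lambda$.

```python
def companion(coeffs):
    # Companion matrix of x_n = a1 x_{n-1} + ... + ak x_{n-k}:
    # first row (a1, ..., ak), then a shifted identity below it.
    k = len(coeffs)
    A = [list(coeffs)] + [[0.0] * k for _ in range(k - 1)]
    for i in range(1, k):
        A[i][i - 1] = 1.0
    return A

def det(M):
    # Determinant by Gaussian elimination with partial pivoting.
    M = [row[:] for row in M]
    n, d = len(M), 1.0
    for j in range(n):
        p = max(range(j, n), key=lambda i: abs(M[i][j]))
        if abs(M[p][j]) < 1e-12:
            return 0.0
        if p != j:
            M[j], M[p] = M[p], M[j]
            d = -d
        d *= M[j][j]
        for i in range(j + 1, n):
            f = M[i][j] / M[j][j]
            for c in range(j, n):
                M[i][c] -= f * M[j][c]
    return d

def charpoly_at(coeffs, lam):
    # det(lam*I - A) for the companion matrix A of the given coefficients.
    A = companion(coeffs)
    k = len(coeffs)
    return det([[lam * (i == j) - A[i][j] for j in range(k)] for i in range(k)])

a = [6.0, -11.0, 6.0]   # x_n = 6x_{n-1} - 11x_{n-2} + 6x_{n-3}
for lam in (0.5, 1.7, -2.3):
    direct = lam**3 - 6*lam**2 + 11*lam - 6
    assert abs(charpoly_at(a, lam) - direct) < 1e-9
```

Since two monic degree-$k$ polynomials agreeing at more than $k$ points are equal, a few sample points already make the identity convincing.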
The recurrence relation $\mathbf{x}_n = A\mathbf{x}_{n-1}$ can be solved explicitly when the companion matrix $A$ is diagonalizable.
Example 6.8 (Recurrence relation with diagonalizable $A$) Solve the recurrence relation

$$x_n = 6x_{n-1} - 11x_{n-2} + 6x_{n-3} \quad\text{for } n \ge 3$$

with initial values $x_0 = 0$, $x_1 = 1$, $x_2 = -1$.

Solution: In matrix form, it is

$$\mathbf{x}_n = \begin{bmatrix} x_{n+2} \\ x_{n+1} \\ x_n \end{bmatrix} = \begin{bmatrix} 6 & -11 & 6 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} x_{n+1} \\ x_n \\ x_{n-1} \end{bmatrix} = A\mathbf{x}_{n-1}.$$

The characteristic polynomial of $A$ is $\det(\lambda I - A) = \lambda^3 - 6\lambda^2 + 11\lambda - 6 = (\lambda-1)(\lambda-2)(\lambda-3)$, by Lemma 6.10. Hence, the eigenvalues are $\lambda_1 = 1$, $\lambda_2 = 2$, $\lambda_3 = 3$, and their associated eigenvectors are $\mathbf{v}_1 = (1, 1, 1)$, $\mathbf{v}_2 = (4, 2, 1)$, $\mathbf{v}_3 = (9, 3, 1)$, respectively, by Lemma 6.11. Moreover, the basis-change matrix and its inverse can be found to be

$$Q = [\mathbf{v}_1\ \mathbf{v}_2\ \mathbf{v}_3] = \begin{bmatrix} 1 & 4 & 9 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{bmatrix}, \qquad Q^{-1} = \frac12\begin{bmatrix} 1 & -5 & 6 \\ -2 & 8 & -6 \\ 1 & -3 & 2 \end{bmatrix}.$$

With

$$D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix},$$

one can get

$$\mathbf{x}_n = A^n\mathbf{x}_0 = QD^nQ^{-1}\mathbf{x}_0 = -3\cdot 1^n\,\mathbf{v}_1 + 5\cdot 2^n\,\mathbf{v}_2 - 2\cdot 3^n\,\mathbf{v}_3.$$

It implies that the solution is $x_n = -3 + 5\cdot 2^n - 2\cdot 3^n$. $\square$
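As a sanity check (not part of the text), the closed form of Example 6.8 can be compared against direct iteration of the recurrence; the function names below are illustrative.

```python
def x_rec(n, memo={0: 0, 1: 1, 2: -1}):
    # Recurrence of Example 6.8: x_n = 6x_{n-1} - 11x_{n-2} + 6x_{n-3}.
    if n not in memo:
        memo[n] = 6*x_rec(n-1) - 11*x_rec(n-2) + 6*x_rec(n-3)
    return memo[n]

def x_closed(n):
    # Closed form from the diagonalization: x_n = -3 + 5*2^n - 2*3^n.
    return -3 + 5 * 2**n - 2 * 3**n

assert all(x_rec(n) == x_closed(n) for n in range(30))
print(x_closed(5))  # -3 + 160 - 486 = -329
```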
As a generalization of a recurrence relation $\mathbf{x}_n = A\mathbf{x}_{n-1}$ with a companion matrix $A$, let us consider a sequence $\{\mathbf{x}_n\}$ of vectors in $\mathbb{R}^k$ defined by a matrix equation $\mathbf{x}_n = A\mathbf{x}_{n-1}$ with an arbitrary $k \times k$ square matrix $A$ (not necessarily a companion matrix). Such an equation $\mathbf{x}_n = A\mathbf{x}_{n-1}$ is called a linear difference equation. A solution to the linear difference equation $\mathbf{x}_n = A\mathbf{x}_{n-1}$ is any sequence $\{\mathbf{x}_n \in \mathbb{R}^k : n \ge 0\}$ of vectors that satisfies the equation. In fact, solving a linear difference equation reduces to a simple computation of $A^n$ once the starting vector $\mathbf{x}_0$, called the initial value, is given. We first examine the set of its solutions.

Theorem 6.12 For any $k \times k$ matrix $A$, the set of all solutions of the linear difference equation $\mathbf{x}_n = A\mathbf{x}_{n-1}$ is a $k$-dimensional vector space. In particular, the set of solutions of the recurrence relation of order $k$, $x_n = a_1x_{n-1} + \cdots + a_kx_{n-k}$, with nonzero $a_1$ and $a_k$, is a $k$-dimensional vector space.
Proof: Since the proofs are similar, we prove this only for the recurrence relation. Let $W$ be the set of solutions $\{x_n\}$ of the recurrence relation. Clearly, a sum of two solutions and any scalar multiple of a solution are again solutions. Hence, the solutions form a vector space, as shown in Example 3.1(4). One can show that the function $f : W \to \mathbb{R}^k$ defined by $f(\{x_n\}) = (x_{k-1}, \ldots, x_1, x_0)$ is a linear transformation. Clearly, it is bijective, because any given $k$ initial values $x_0, x_1, \ldots, x_{k-1}$ generate recursively a unique sequence $\{x_n : n \ge 0\}$ of numbers that satisfies the equation. Hence, $\dim W = \dim\mathbb{R}^k = k$. (For the linear difference equation $\mathbf{x}_n = A\mathbf{x}_{n-1}$, see Problem 6.10.) $\square$

Problem 6.10 Let $\mathbf{x}_n = A\mathbf{x}_{n-1}$ be any linear difference equation with a $k \times k$ matrix $A$. For each basis vector $\mathbf{e}_j$ in $\{\mathbf{e}_1, \ldots, \mathbf{e}_k\}$ in $\mathbb{R}^k$, there is a unique solution of $\mathbf{x}_n = A\mathbf{x}_{n-1}$ with initial value $\mathbf{e}_j$. Show that these $k$ solutions form a basis for the solution space of $\mathbf{x}_n = A\mathbf{x}_{n-1}$.
Definition 6.4 A basis for the space of solutions of a linear difference equation or a recurrence relation is called a fundamental set of solutions, and a general solution is described as a linear combination of its members. If the initial value is specified, the solution is uniquely determined, and it is called a particular solution.

By Theorem 6.12, it is enough to find $k$ linearly independent solutions in order to solve a given linear difference equation or a recurrence relation of order $k$, and its general solution is just a linear combination of those linearly independent solutions. First, we assume that the square matrix $A$ is diagonalizable with $k$ linearly independent eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ belonging to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$,
respectively. Since $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a basis for $\mathbb{R}^k$, any initial vector $\mathbf{x}_0$ can be written as $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k$. Since $A\mathbf{v}_j = \lambda_j\mathbf{v}_j$, we have

$$\mathbf{x}_1 = A\mathbf{x}_0 = c_1\lambda_1\mathbf{v}_1 + c_2\lambda_2\mathbf{v}_2 + \cdots + c_k\lambda_k\mathbf{v}_k,$$

and, in general, for all $n = 1, 2, \ldots$,

$$\mathbf{x}_n = A^n\mathbf{x}_0 = c_1\lambda_1^n\mathbf{v}_1 + c_2\lambda_2^n\mathbf{v}_2 + \cdots + c_k\lambda_k^n\mathbf{v}_k.$$

In particular, if the companion matrix $A$ of the recurrence relation $x_n = a_1x_{n-1} + a_2x_{n-2} + \cdots + a_kx_{n-k}$ of order $k$ has $k$ distinct eigenvalues $\lambda_1, \ldots, \lambda_k$, then its solution $x_n$ is (as the $(k,1)$-entry of the vector $\mathbf{x}_n$) a linear combination of $\lambda_1^n, \lambda_2^n, \ldots, \lambda_k^n$. In fact, for each $1 \le j \le k$, $x_n = \lambda_j^n$ is a solution of the recurrence relation, and these $k$ solutions are linearly independent, so they form a fundamental set of solutions by Theorem 6.12. (One can also directly show that $x_n = \lambda_j^n$ satisfies the recurrence relation.) Note that all these solutions are geometric sequences. We can summarize as follows.

Theorem 6.13 Let $A$ be a $k \times k$ diagonalizable matrix with $k$ linearly independent eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ belonging to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$, respectively. Then, a general solution of a linear difference equation $\mathbf{x}_n = A\mathbf{x}_{n-1}$ can be written as

$$\mathbf{x}_n = c_1\lambda_1^n\mathbf{v}_1 + c_2\lambda_2^n\mathbf{v}_2 + \cdots + c_k\lambda_k^n\mathbf{v}_k$$

for some constants $c_1, c_2, \ldots, c_k$. In particular, for the recurrence relation

$$x_n = a_1x_{n-1} + a_2x_{n-2} + \cdots + a_kx_{n-k}, \quad n \ge k,$$

with nonzero $a_1$ and $a_k$, if the associated companion matrix $A$ has $k$ distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$, then its general solution is $x_n = c_1\lambda_1^n + c_2\lambda_2^n + \cdots + c_k\lambda_k^n$ with constants $c_i$'s.
Example 6.9 (Recurrence relation with distinct eigenvalues) Solve the recurrence relation $x_n = x_{n-1} + 7x_{n-2} - x_{n-3} - 6x_{n-4}$ for $n \ge 4$, and also find its particular solution satisfying the initial conditions $x_0 = 0$, $x_1 = 1$, $x_2 = -1$, $x_3 = 2$.

Solution: By Lemma 6.10, the characteristic polynomial of the companion matrix $A$ associated with the given recurrence relation is

$$\det(\lambda I - A) = \lambda^4 - \lambda^3 - 7\lambda^2 + \lambda + 6 = (\lambda-1)(\lambda+1)(\lambda+2)(\lambda-3),$$

so that $A$ has four distinct eigenvalues $\lambda_1 = 1$, $\lambda_2 = -1$, $\lambda_3 = -2$, $\lambda_4 = 3$. Hence, the geometric sequences $\{1^n\}$, $\{(-1)^n\}$, $\{(-2)^n\}$, $\{3^n\}$ are linearly independent, and a general solution is a linear combination of them by Theorem 6.13:

$$x_n = c_1 1^n + c_2(-1)^n + c_3(-2)^n + c_4 3^n$$

with constants $c_1, c_2, c_3, c_4$. And the initial values give

$$\begin{cases} c_1 + c_2 + c_3 + c_4 = 0 & \text{if } n = 0 \\ c_1 - c_2 - 2c_3 + 3c_4 = 1 & \text{if } n = 1 \\ c_1 + c_2 + 4c_3 + 9c_4 = -1 & \text{if } n = 2 \\ c_1 - c_2 - 8c_3 + 27c_4 = 2 & \text{if } n = 3, \end{cases}$$

which is a system of linear equations with a $4 \times 4$ Vandermonde matrix as its coefficient matrix. Hence, one can solve it to get $c_1 = \frac{5}{12}$, $c_2 = -\frac18$, $c_3 = -\frac{4}{15}$, $c_4 = -\frac{1}{40}$, and then its particular solution is

$$x_n = \frac{5}{12}1^n - \frac18(-1)^n - \frac{4}{15}(-2)^n - \frac{1}{40}3^n. \qquad \square$$
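Because the coefficients of Example 6.9 are fractions, an exact check (not part of the text) is easiest with rational arithmetic; the function names below are illustrative.

```python
from fractions import Fraction as F

def x_rec(n, memo={0: 0, 1: 1, 2: -1, 3: 2}):
    # Recurrence of Example 6.9: x_n = x_{n-1} + 7x_{n-2} - x_{n-3} - 6x_{n-4}.
    if n not in memo:
        memo[n] = x_rec(n-1) + 7*x_rec(n-2) - x_rec(n-3) - 6*x_rec(n-4)
    return memo[n]

def x_closed(n):
    # Particular solution: (5/12)1^n - (1/8)(-1)^n - (4/15)(-2)^n - (1/40)3^n.
    return F(5, 12) - F(1, 8)*(-1)**n - F(4, 15)*(-2)**n - F(1, 40)*3**n

assert all(x_rec(n) == x_closed(n) for n in range(25))
```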
Problem 6.11 Let $\{a_n\}$ be a sequence with $a_0 = 1$, $a_1 = 2$, $a_2 = 0$, and the recurrence relation $a_n = 2a_{n-1} + a_{n-2} - 2a_{n-3}$ for $n \ge 3$. Find the $n$-th term $a_n$.
The next example illustrates how to solve a recurrence relation when its associated companion matrix $A$ has a repeated eigenvalue.

Example 6.10 (Recurrence relation with a repeated eigenvalue) Solve the recurrence relation $x_n = -2x_{n-1} - x_{n-2}$ for $n \ge 2$, and also find its particular solution satisfying the initial conditions $x_0 = 1$, $x_1 = 2$.

Solution: Its characteristic polynomial is

$$\det(\lambda I - A) = \lambda^2 + 2\lambda + 1 = (\lambda + 1)^2,$$

and $\lambda = -1$ is an eigenvalue of multiplicity 2. Hence, the geometric sequence $\{x_n\} = \{(-1)^n\}$ is a solution of the recurrence relation. Since its solution space is of dimension 2 by Theorem 6.12, we should find one more solution that is independent of $\{(-1)^n\}$. But, in this case, $\{x_n\} = \{n(-1)^n\}$ is also a solution of the recurrence relation. In fact, for $n \ge 2$,

$$-2x_{n-1} - x_{n-2} = -2(n-1)(-1)^{n-1} - (n-2)(-1)^{n-2} = n(-1)^n = x_n.$$

Clearly, the two solutions $\{(-1)^n\}$ and $\{n(-1)^n\}$ are linearly independent, and so $x_n = c_1(-1)^n + c_2n(-1)^n$ is a general solution. The initial conditions give $c_1 = 1$, $c_2 = -3$, and $x_n = (-1)^n - 3n(-1)^n$ is the particular solution of the recurrence relation. $\square$

In Example 6.10, we showed that $\lambda = -1$ is an eigenvalue of multiplicity 2, and the two sequences $\{(-1)^n\}$ and $\{n(-1)^n\}$ are linearly independent solutions of the recurrence relation. As a general case, let us consider a recurrence relation
$$x_n = a_1x_{n-1} + a_2x_{n-2} + \cdots + a_kx_{n-k}$$

with nonzero $a_1$ and $a_k$, and let the associated companion matrix $A$ have $s$ distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_s$ with multiplicities $m_1, m_2, \ldots, m_s$, respectively. For each eigenvalue $\lambda_i$ with multiplicity $m_i > 1$, we have $f(\lambda_i) = f'(\lambda_i) = \cdots = f^{(m_i-1)}(\lambda_i) = 0$, where $f(\lambda) = \lambda^k - a_1\lambda^{k-1} - a_2\lambda^{k-2} - \cdots - a_{k-1}\lambda - a_k$ is the characteristic polynomial of $A$. Hence, for a new function $F_1(\lambda)$ defined by

$$F_1(\lambda) = \lambda^{n-k}f(\lambda) = \lambda^n - a_1\lambda^{n-1} - \cdots - a_{k-1}\lambda^{n-k+1} - a_k\lambda^{n-k},$$

one can see that the derivative $F_1'(\lambda) = 0$ at $\lambda = \lambda_i$, and then $F_2(\lambda) = \lambda F_1'(\lambda) = 0$ at $\lambda = \lambda_i$. That is,

$$n\lambda^n - a_1(n-1)\lambda^{n-1} - \cdots - a_{k-1}(n-k+1)\lambda^{n-k+1} - a_k(n-k)\lambda^{n-k} = 0 \quad\text{at } \lambda = \lambda_i.$$

It shows that $x_n = n\lambda_i^n$ is also a solution of the recurrence relation. Inductively, $F_j(\lambda) = \lambda F_{j-1}'(\lambda) = 0$ at $\lambda = \lambda_i$ shows that $x_n = n^{j-1}\lambda_i^n$ is also a solution for $j = 1, \ldots, m_i$. Therefore, one can conclude that $x_n = \lambda_i^n,\ n\lambda_i^n,\ \ldots,\ n^{m_i-1}\lambda_i^n$ are $m_i$ linearly independent solutions of the recurrence relation. Collecting such $m_i$ linearly independent solutions for each eigenvalue $\lambda_i$, one can get a fundamental set of solutions of the recurrence relation. In summary, we have the following theorem.

Theorem 6.14 For any given recurrence relation
$$x_n = a_1x_{n-1} + a_2x_{n-2} + \cdots + a_kx_{n-k}$$

with nonzero $a_1$ and $a_k$, let the associated companion matrix $A$ have $s$ distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_s$ with multiplicities $m_1, m_2, \ldots, m_s$, respectively. Then

$$\left\{\{\lambda_i^n\},\ \{n\lambda_i^n\},\ \ldots,\ \{n^{m_i-1}\lambda_i^n\} \;\middle|\; i = 1, 2, \ldots, s\right\}$$

forms a fundamental set of solutions, and a general solution is a linear combination of them.
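Theorem 6.14 can be illustrated numerically. The sketch below (not from the text) uses an illustrative recurrence $x_n = 4x_{n-1} - 4x_{n-2}$, whose characteristic polynomial $(\lambda - 2)^2$ has the eigenvalue 2 with multiplicity 2, so $\{2^n\}$ and $\{n\,2^n\}$ should both satisfy it while $\{n^2 2^n\}$ should not.

```python
def satisfies(seq, coeffs, upto=20):
    # Check x_n = a1 x_{n-1} + ... + ak x_{n-k} for n = k, ..., upto-1.
    k = len(coeffs)
    return all(
        seq(n) == sum(a * seq(n - i - 1) for i, a in enumerate(coeffs))
        for n in range(k, upto)
    )

coeffs = [4, -4]  # x_n = 4x_{n-1} - 4x_{n-2}, char. poly (lam - 2)^2
assert satisfies(lambda n: 2**n, coeffs)          # {lam^n} is a solution
assert satisfies(lambda n: n * 2**n, coeffs)      # {n lam^n} is a solution
assert not satisfies(lambda n: n**2 * 2**n, coeffs)  # multiplicity is only 2
```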
Problem 6.12 Prove that if $\lambda = q$ is an eigenvalue of a recurrence relation with multiplicity $m$, then the $m$ solutions $\{q^n\}, \{nq^n\}, \ldots, \{n^{m-1}q^n\}$ are linearly independent.

Problem 6.13 Solve the recurrence relation $x_n = 3x_{n-1} - 4x_{n-3}$ for $n \ge 3$. What is it if $x_0 = 1$, $x_1 = x_2 = 1$?
6.3.2 Linear difference equations

A linear difference equation $\mathbf{x}_n = A\mathbf{x}_{n-1}$ sometimes represents a mathematical model of a dynamic process that changes over time; such models are widely used in areas such as economics, electrical engineering, and ecology. In this case, the vectors $\mathbf{x}_n$ give information about the dynamic process as time $n$ passes. In this context, a linear difference equation

$$\mathbf{x}_n = A\mathbf{x}_{n-1}, \quad n = 1, 2, \ldots$$

with a square matrix $A$ is also called a discrete dynamical system. If the matrix $A$ is a companion matrix, then it is nothing but a recurrence relation. If the matrix $A$ is diagonal with diagonal entries $\lambda_1, \ldots, \lambda_k$, then, by Theorem 6.13, a general solution of $\mathbf{x}_n = A\mathbf{x}_{n-1}$ is

$$\mathbf{x}_n = c_1\lambda_1^n\mathbf{e}_1 + c_2\lambda_2^n\mathbf{e}_2 + \cdots + c_k\lambda_k^n\mathbf{e}_k$$

with some constants $c_1, c_2, \ldots, c_k$. Throughout this section, we are concerned only with linear difference equations $\mathbf{x}_n = A\mathbf{x}_{n-1}$, $n = 1, 2, \ldots$, for a diagonalizable matrix $A$, because if $A$ is not diagonalizable, it is not easy in general to solve. However, it can be done after reducing $A$ to a simpler form called the Jordan canonical form, and this case will be discussed again in Chapter 8.

Let $A$ be a $k \times k$ diagonalizable matrix with $k$ linearly independent eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ belonging to the eigenvalues $\lambda_1, \ldots, \lambda_k$, respectively. Then, by Theorem 6.13 again, a general solution of $\mathbf{x}_n = A\mathbf{x}_{n-1}$ is

$$\mathbf{x}_n = c_1\lambda_1^n\mathbf{v}_1 + c_2\lambda_2^n\mathbf{v}_2 + \cdots + c_k\lambda_k^n\mathbf{v}_k$$

with some constants $c_1, c_2, \ldots, c_k$. Hence, if $|\lambda_i| < 1$ for all $i$, then the vector $\mathbf{x}_n$ must approach the zero vector as $n$ increases. On the other hand, if there exists an eigenvalue $\lambda_i$ with $|\lambda_i| > 1$, this vector $\mathbf{x}_n$ may grow exponentially in magnitude. Therefore, we have three possible cases for a dynamic process given by $\mathbf{x}_n = A\mathbf{x}_{n-1}$, $n = 1, 2, \ldots$. The process is said to be

(1) unstable if $A$ has an eigenvalue $\lambda$ with $|\lambda| > 1$,
(2) stable if $|\lambda| < 1$ for all eigenvalues $\lambda$ of $A$,
(3) neutrally stable if the maximum of the absolute values of the eigenvalues of $A$ is 1.
To determine the stability of a dynamic process, it is often necessary to estimate an upper bound for the absolute values of the eigenvalues of a square matrix $A$. To do this, for any square matrix $A = [a_{ij}]$ of order $k$, let

$$R(A) = \max\Big\{R_i(A) = \sum_{j=1}^{k}|a_{ij}| : 1 \le i \le k\Big\}, \qquad c(A) = \max\Big\{c_j(A) = \sum_{i=1}^{k}|a_{ij}| : 1 \le j \le k\Big\},$$

and

$$s_i = R_i(A) - |a_{ii}|.$$
Theorem 6.15 (Gerschgorin's Theorem) For any square matrix $A$ of order $k$, every eigenvalue $\lambda$ of $A$ satisfies $|\lambda - a_{\ell\ell}| \le s_\ell$ for some $1 \le \ell \le k$.

Proof: Let $\lambda$ be an eigenvalue with eigenvector $\mathbf{x} = [x_1\ x_2\ \cdots\ x_k]^T$. Then $\sum_{j=1}^{k}a_{ij}x_j = \lambda x_i$ for $i = 1, \ldots, k$. Take a coordinate $x_\ell$ of $\mathbf{x}$ with the largest absolute value. Then clearly $x_\ell \ne 0$, and

$$|\lambda - a_{\ell\ell}||x_\ell| = |\lambda x_\ell - a_{\ell\ell}x_\ell| = \Big|\sum_{j\ne\ell}a_{\ell j}x_j\Big| \le \sum_{j\ne\ell}|a_{\ell j}||x_\ell| = s_\ell|x_\ell|.$$

Since $|x_\ell| > 0$, $|\lambda - a_{\ell\ell}| \le s_\ell$. $\square$
Corollary 6.16 For any square matrix $A$ of order $k$, every eigenvalue $\lambda$ of $A$ satisfies $|\lambda| \le \min\{R(A), c(A)\}$.

Proof: Note that $|\lambda| \le |\lambda - a_{\ell\ell}| + |a_{\ell\ell}| \le s_\ell + |a_{\ell\ell}| = R_\ell(A) \le R(A)$. Moreover, since $\lambda$ is also an eigenvalue of $A^T$, $|\lambda| \le R(A^T) = c(A)$. $\square$
Example 6.11 (Stable or unstable) Solve the discrete dynamical system $\mathbf{x}_n = A\mathbf{x}_{n-1}$, $n = 1, 2, \ldots$, where

$$(1)\ A = \begin{bmatrix} 0.8 & 0.0 \\ 0.0 & 0.5 \end{bmatrix}, \qquad (2)\ A = \begin{bmatrix} 1.2 & 0.0 \\ 0.0 & 0.6 \end{bmatrix}, \qquad (3)\ A = \frac12\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}.$$

Solution: (1) Clearly, the eigenvalues of $A$ are 0.8 and 0.5 with eigenvectors $\mathbf{v}_1 = \begin{bmatrix}1\\0\end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix}0\\1\end{bmatrix}$, respectively. Hence, its general solution is

$$\mathbf{x}_n = c_1(0.8)^n\begin{bmatrix}1\\0\end{bmatrix} + c_2(0.5)^n\begin{bmatrix}0\\1\end{bmatrix}.$$

It follows that the system $\mathbf{x}_n = A\mathbf{x}_{n-1}$ is stable. (See Figure 6.1.)
Figure 6.1. A stable dynamical system
(2) Similarly, one can show that

$$\mathbf{x}_n = c_1(1.2)^n\begin{bmatrix}1\\0\end{bmatrix} + c_2(0.6)^n\begin{bmatrix}0\\1\end{bmatrix},$$

in which the system is unstable if $c_1 \ne 0$. (See Figure 6.2.)

Figure 6.2. An unstable dynamical system
(3) The eigenvalues of $A$ are 1 and 2 with eigenvectors $\mathbf{v}_1 = \begin{bmatrix}1\\-1\end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix}1\\1\end{bmatrix}$, respectively. Hence, a general solution of $\mathbf{x}_n = A\mathbf{x}_{n-1}$ is

$$\mathbf{x}_n = c_1 1^n\begin{bmatrix}1\\-1\end{bmatrix} + c_2 2^n\begin{bmatrix}1\\1\end{bmatrix}.$$

It is unstable if $c_2 \ne 0$: for example, if $c_1 = -1$, $c_2 = 1$, then

$$\mathbf{x}_0 = \begin{bmatrix}0\\2\end{bmatrix},\ \mathbf{x}_1 = \begin{bmatrix}1\\3\end{bmatrix},\ \mathbf{x}_2 = \begin{bmatrix}3\\5\end{bmatrix},\ \mathbf{x}_3 = \begin{bmatrix}7\\9\end{bmatrix},\ \mathbf{x}_4 = \begin{bmatrix}15\\17\end{bmatrix},\ \ldots. \qquad \square$$
The following example is a special type of a discrete dynamical system, called a Markov process.

Example 6.12 (Markov process with distinct eigenvalues) Suppose that the population of a certain metropolitan area starts with $x_0$ people outside a big city and $y_0$ people inside the city. Suppose that each year 20% of the people outside the city move in, and 10% of the people inside move out. What is the 'eventual' distribution of the population?

Solution: At the end of the first year, the distribution of the population will be

$$\begin{cases} x_1 = 0.8x_0 + 0.1y_0 \\ y_1 = 0.2x_0 + 0.9y_0. \end{cases}$$

Or, in matrix form,

$$\mathbf{x}_1 = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = A\mathbf{x}_0.$$

Thus, if $\mathbf{x}_n = (x_n, y_n)$ denotes the distribution of the population in the metropolitan area after $n$ years, we get $\mathbf{x}_n = A^n\mathbf{x}_0$.

In this formulation, the problem can be summarized as follows: (1) the entries of $A$ are all nonnegative, because the entries of each column of $A$ represent the probabilities of residing in one of the two locations in the next year; (2) the entries of each column of $A$ add up to 1, because the total population of the metropolitan area remains constant.
Now, to solve the problem, we first find the eigenvalues and eigenvectors of $A$. They are $\lambda_1 = 1$, $\lambda_2 = 0.7$ and $\mathbf{v}_1 = (1, 2)$, $\mathbf{v}_2 = (-1, 1)$, respectively, so that its general solution is

$$\mathbf{x}_n = c_1(1)^n\begin{bmatrix}1\\2\end{bmatrix} + c_2(0.7)^n\begin{bmatrix}-1\\1\end{bmatrix} = \begin{bmatrix} c_1 - c_2(0.7)^n \\ 2c_1 + c_2(0.7)^n \end{bmatrix}.$$

But the initial condition $\mathbf{x}_0 = (x_0, y_0)$ gives $c_1 = \frac{x_0 + y_0}{3}$ and $c_2 = \frac{-2x_0 + y_0}{3}$, so that

$$\mathbf{x}_n = \left(\frac{x_0}{3} + \frac{y_0}{3}\right)\begin{bmatrix}1\\2\end{bmatrix} + \left(-\frac{2x_0}{3} + \frac{y_0}{3}\right)(0.7)^n\begin{bmatrix}-1\\1\end{bmatrix} \longrightarrow \left(\frac{x_0}{3} + \frac{y_0}{3}\right)\begin{bmatrix}1\\2\end{bmatrix} \quad\text{as } n \to \infty.$$

Note that, since $x_n + y_n = a$ is fixed for all $n$, the process in time remains on the straight line $x + y = a$. Thus, for a given initial total population $a$, the eventual ratio $x_n : y_n$ of the populations tends to $1 : 2$, which is independent of the initial distribution. For initial populations of $a = 3, 4, 5, 6$ million people, the processes are shown in Figure 6.3. $\square$
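The convergence to the $1 : 2$ ratio can be observed by simply iterating the process. The sketch below (not part of the text; names are illustrative) starts with 3 million people, all outside the city.

```python
def step(A, v):
    # One year of the Markov process: v -> A v.
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

A = [[0.8, 0.1],
     [0.2, 0.9]]
v = [3.0, 0.0]          # 3 million people, all initially outside the city
for _ in range(100):
    v = step(A, v)

# The distribution tends to (a/3)(1, 2), here (1, 2), regardless of the start.
assert abs(v[0] - 1.0) < 1e-6 and abs(v[1] - 2.0) < 1e-6
```

The error decays like $(0.7)^n$, the second eigenvalue, so 100 iterations are far more than enough.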
Figure 6.3. A Markov process
Recall that the matrix $A$ in Example 6.12 satisfies the following two conditions: (1) all entries of $A$ are nonnegative; (2) the entries of each column of $A$ add up to 1. Such a matrix $A$ is called a Markov matrix (or a stochastic matrix). In general, a dynamical system $\mathbf{x}_n = A\mathbf{x}_{n-1}$ with a Markov matrix $A$ is called a Markov process. The next theorem follows directly from Gerschgorin's Theorem 6.15.

Theorem 6.17 If $\lambda$ is an eigenvalue of any Markov matrix $A$, then $|\lambda| \le 1$.

In fact, every Markov matrix has an eigenvalue $\lambda = 1$. To show this, let $A$ be any Markov matrix. Then the entries of each column of $A$ add up to 1. It means that the sum of each column of $A - I$ is 0, or equivalently $\mathbf{r}_1 + \mathbf{r}_2 + \cdots + \mathbf{r}_n = \mathbf{0}$ for the row vectors $\mathbf{r}_i$ of $A - I$. This is a nontrivial linear combination of the row vectors equal to zero, so these row vectors are linearly dependent, and hence $\det(A - I) = 0$. Consequently, $\lambda = 1$ is an eigenvalue of $A$. If $\mathbf{x}$ is an eigenvector of $A$ belonging to $\lambda = 1$, then $A\mathbf{x} = \mathbf{x}$. Such an $\mathbf{x}$ is called an equilibrium state.

Theorem 6.18 If $A$ is a stochastic matrix, then

(1) $\lambda = 1$ is an eigenvalue of $A$;
(2) there exists an equilibrium state $\mathbf{x}$ that remains fixed by the Markov process.
(1) A 1 is an eigenvalue of A, (2) there exists an equilibrium state x that remains fixed by the Markov process. Problem 6.14 Suppose that a land use in a city in 2000 is
Residential Commercial Industrial
XO = 30%, YO 20%,
Zo
= = 50%.
226
Chapter 6. Diagonalization
Denote by Xb Yb Zk the percentage of residential, commercial, and industrial, respectively, after k years, and assume that the stochastic matrix is given as follows:
[
Xk+l] Yk+l Zk+l
=
[ 0.8
0.1 0.0] [ Xk ] Yk . 0.1 0.7 0.1 Zk 0.1 0.2 0.9
Find the land use in the city after 50 years.
Problem 6.15 A car rental company has three branch offices in different cities. A car rented at one of the offices may be returned to any of the three offices. The company started business with 900 cars, and initially an equal number of cars was distributed to each office. If the week-by-week distribution of cars is governed by the stochastic matrix

$$A = \begin{bmatrix} 0.6 & 0.1 & 0.2 \\ 0.2 & 0.2 & 0.2 \\ 0.2 & 0.7 & 0.6 \end{bmatrix},$$

determine the number of cars at each office in the $k$-th week. Also, find $\lim_{k\to\infty} A^k$.
6.3.3 Linear differential equations I

A first-order differential equation is a relation between a real-valued differentiable function $y(t)$ of time $t$ and its first derivative $y'(t)$, and it can be written in the form

$$y'(t) = \frac{dy(t)}{dt} = f(t, y(t)).$$
As a special case, if it can be written as $y'(t) = g(t)$ for an integrable function $g(t)$, then its solution is $y(t) = \int g(t)\,dt + c$ for a constant $c$. However, it is difficult to solve in most other cases, such as $y'(t) = \sin(ty^2)$. As another case, if $y'(t) = 5y(t)$, then it has a general solution $y = ce^{5t}$, where $c$ is an arbitrary constant. If an additional condition $y(0) = 3$, called an initial condition, is given, then its solution is $y = 3e^{5t}$, called a particular solution.

The second case can be generalized to a system of $n$ linear differential equations with constant coefficients, which is by definition of the form

$$\begin{cases} y_1' = a_{11}y_1 + a_{12}y_2 + \cdots + a_{1n}y_n \\ y_2' = a_{21}y_1 + a_{22}y_2 + \cdots + a_{2n}y_n \\ \quad\vdots \\ y_n' = a_{n1}y_1 + a_{n2}y_2 + \cdots + a_{nn}y_n, \end{cases}$$

where $y_i = f_i(t)$ for $i = 1, 2, \ldots, n$ are real-valued differentiable functions on an interval $I = (a, b)$. In most cases, one may assume that the interval $I$ contains 0, and initial conditions are given as $f_i(0) = d_i$ at $0 \in I$.

Let $\mathbf{y} = [f_1\ f_2\ \cdots\ f_n]^T$ denote the vector whose entries are the differentiable functions $y_i = f_i$ defined on the interval $I = (a, b)$; thus, for each $t \in I$, $\mathbf{y}(t) = [f_1(t)\ f_2(t)\ \cdots\ f_n(t)]^T$ is a vector in $\mathbb{R}^n$. Its derivative is defined entry-wise:
$$\mathbf{y}' = \begin{bmatrix} f_1' \\ \vdots \\ f_n' \end{bmatrix}, \qquad\text{or}\qquad \mathbf{y}'(t) = \begin{bmatrix} f_1'(t) \\ \vdots \\ f_n'(t) \end{bmatrix}.$$

If $A$ denotes the coefficient matrix of the system of linear differential equations, the matrix form of the system can be written as

$$\mathbf{y}' = A\mathbf{y}, \qquad\text{or}\qquad \mathbf{y}'(t) = A\mathbf{y}(t) \quad\text{for all } t \in I.$$

An initial condition is given by $\mathbf{y}_0 = \mathbf{y}(0) = (d_1, \ldots, d_n) \in \mathbb{R}^n$. A differentiable vector function $\mathbf{y}(t)$ is called a solution of the system $\mathbf{y}'(t) = A\mathbf{y}(t)$ if it satisfies the equation. In general, the entries of the coefficient matrix $A$ could be functions. However, in this book, we restrict our attention to systems with constant coefficients.

Example 6.13 Consider the following three systems:
$$(1)\ \begin{cases} y_1' = 2y_1 - 3y_2 \\ y_2' = 2y_1 + y_2 \end{cases} \qquad (2)\ \begin{cases} y_1' = ty_1 + 2y_2 \\ y_2' = 2y_1 + t^3y_2 \end{cases} \qquad (3)\ \begin{cases} y_1' = 3y_1^2 \\ y_2' = 2y_1\sin y_1 + 5y_2 \end{cases}$$

The first two systems are linear, but the coefficients of the second are functions of $t$. The third is not linear because of the terms $y_1^2$ and $\sin y_1$. $\square$
Example 6.14 (Population model) Let $p(t)$ denote the population of a given species, like bacteria, at time $t$, and let $r(t, p)$ denote the difference between its birth rate and its death rate at time $t$. If $r(t, p)$ is independent of time $t$, i.e., it is a constant $r$, then $\frac{dp(t)}{dt} = rp(t)$ is the rate of change of the population, and its general solution is $p(t) = p(0)e^{rt}$. $\square$

Some basic facts about a system $\mathbf{y}'(t) = A\mathbf{y}(t)$ of linear differential equations defined on $I = (a, b)$, where $A$ is any $n \times n$ matrix, are listed below.

(I) (The fundamental theorem for a system of linear differential equations) The system $\mathbf{y}'(t) = A\mathbf{y}(t)$ always has a solution. In addition, if an initial condition $\mathbf{y}_0$ is given, then there is a unique solution $\mathbf{y}(t)$ on $I$ which satisfies the initial condition. If $\mathbf{y} = [y_1\ y_2\ \cdots\ y_n]^T$ is a solution on $I$, then it draws a curve in $\mathbb{R}^n$ passing through the initial vector $\mathbf{y}_0 = \mathbf{y}(0) = (d_1, \ldots, d_n)$ as $t$ varies in the interval $I$.

(II) (Linear independence of solutions) Let $\{\mathbf{y}_1, \ldots, \mathbf{y}_n\}$ be a set of $n$ solutions of the system $\mathbf{y}' = A\mathbf{y}$ on $I$. The linear independence of the solutions $\mathbf{y}_1, \ldots, \mathbf{y}_n$ on $I$ is defined as usual: $c_1\mathbf{y}_1 + \cdots + c_n\mathbf{y}_n = \mathbf{0}$ implies $c_1 = \cdots = c_n = 0$. Equivalently, they are linearly dependent if and only if one of them can be written as a linear combination of the others. Define
$$Y(t) = [\mathbf{y}_1(t)\ \cdots\ \mathbf{y}_n(t)] = \begin{bmatrix} y_{11}(t) & y_{12}(t) & \cdots & y_{1n}(t) \\ y_{21}(t) & y_{22}(t) & \cdots & y_{2n}(t) \\ \vdots & \vdots & & \vdots \\ y_{n1}(t) & y_{n2}(t) & \cdots & y_{nn}(t) \end{bmatrix} \quad\text{for } t \in I.$$

If the $n$ solutions are linearly dependent, then $\det Y(t) = 0$ for all $t \in I$. Or, equivalently, if $\det Y(t) \ne 0$ for at least one point $t \in I$, then the solutions are linearly independent. Moreover, the next lemma shows that $\det Y(t) \ne 0$ for all $t \in I$ if and only if $\det Y(t) \ne 0$ at one point $t \in I$. The determinant of $Y(t)$ is called the Wronskian of the solutions, denoted by $W(t) = \det Y(t)$ for $t \in I$. Note that the Wronskian $W(t)$ is a real-valued differentiable function on $I$.
Lemma 6.19 $W'(t) = \operatorname{tr}(A)\,W(t)$.

Proof:

$$\begin{aligned} W'(t) = (\det Y(t))' &= \Big(\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\,y_{1\sigma(1)}\cdots y_{n\sigma(n)}\Big)' \\ &= \sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\,y_{1\sigma(1)}'\cdots y_{n\sigma(n)} + \cdots + \sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\,y_{1\sigma(1)}\cdots y_{n\sigma(n)}' \\ &= \sum_{i}\sum_{j} y_{ij}'\,Y_{ij} = \sum_{i}\sum_{j} y_{ij}'\,[\operatorname{adj}Y]_{ji} = \sum_{i}[Y'\cdot\operatorname{adj}Y]_{ii} \\ &= \operatorname{tr}(Y'\cdot\operatorname{adj}Y) = \operatorname{tr}(A\cdot Y\cdot\operatorname{adj}Y) = \operatorname{tr}(\det Y(t)\,A) = \operatorname{tr}(A)\,W(t), \end{aligned}$$

where $Y_{ij}(t)$ is the cofactor of $y_{ij}$, and the equalities in the last line are due to the facts that

$$Y'(t) = [\mathbf{y}_1'(t)\ \cdots\ \mathbf{y}_n'(t)] = A[\mathbf{y}_1(t)\ \cdots\ \mathbf{y}_n(t)] = AY(t), \qquad Y(t)\operatorname{adj}Y(t) = \det Y(t)\,I_n = W(t)\,I_n. \qquad \square$$
From Lemma 6.19, it is clear that the Wronskian $W(t)$ is an exponential function of the form $W(t) = ce^{\operatorname{tr}(A)t}$ with initial condition $W(0) = c$. It implies that $W(t)$ is either zero for all $t$ or never zero on $I$, depending on whether or not $c = 0$. Thus, we have the following lemma.

Lemma 6.20 Let $\{\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n\}$ be a set of $n$ solutions of the system $\mathbf{y}' = A\mathbf{y}$ on $I$, where $A$ is any $n \times n$ matrix. Then the following are equivalent.

(1) The solutions $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n$ are linearly independent.
(2) $W(t) \ne 0$ for some $t$; that is, $\mathbf{y}_1(t), \mathbf{y}_2(t), \ldots, \mathbf{y}_n(t)$ are linearly independent in $\mathbb{R}^n$ for some $t$.
(3) $W(t) \ne 0$ for all $t$; that is, $\mathbf{y}_1(t), \mathbf{y}_2(t), \ldots, \mathbf{y}_n(t)$ are linearly independent in $\mathbb{R}^n$ for all $t$.
(III) (Dimension of the solution space) Clearly, the set of all solutions of $\mathbf{y}'(t) = A\mathbf{y}(t)$ is a vector space. In fact, for any two solutions $\mathbf{y}_1, \mathbf{y}_2$ of the system, we have

$$(c_1\mathbf{y}_1 + c_2\mathbf{y}_2)' = c_1\mathbf{y}_1' + c_2\mathbf{y}_2' = c_1A\mathbf{y}_1 + c_2A\mathbf{y}_2 = A(c_1\mathbf{y}_1 + c_2\mathbf{y}_2).$$

Thus, $c_1\mathbf{y}_1 + c_2\mathbf{y}_2$ is also a solution for any constants $c_i$'s.

Let $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ be the standard basis for $\mathbb{R}^n$. For each $\mathbf{e}_i$, there exists a unique solution $\mathbf{y}_i$ of $\mathbf{y}'(t) = A\mathbf{y}(t)$ such that $\mathbf{y}_i(0) = \mathbf{e}_i$, by (I). All such solutions $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n$ are linearly independent by Lemma 6.20. Moreover, they generate the vector space of solutions. To show this, let $\mathbf{y}$ be any solution. Then the vector $\mathbf{y}(0)$ can be written as a linear combination of the standard basis vectors, say $\mathbf{y}_0 = c_1\mathbf{e}_1 + c_2\mathbf{e}_2 + \cdots + c_n\mathbf{e}_n$. Then, by the uniqueness of the solution in (I), we have $\mathbf{y}(t) = c_1\mathbf{y}_1(t) + c_2\mathbf{y}_2(t) + \cdots + c_n\mathbf{y}_n(t)$. This proves the following theorem.

Theorem 6.21 For any $n \times n$ matrix $A$, the set of solutions of a system $\mathbf{y}' = A\mathbf{y}$ on $I$ is an $n$-dimensional vector space.

Definition 6.5 A basis for the solution space is called a fundamental set of solutions. The solution expressed as a linear combination of a fundamental set is called a general solution of the system. The solution determined by a given initial condition is called a particular solution.

By Theorem 6.21, it is enough to find $n$ linearly independent solutions to solve a system $\mathbf{y}' = A\mathbf{y}$ on $I$, and then its general solution is just a linear combination of those linearly independent solutions. This may be considered in three steps: (1) $A$ is diagonal, (2) $A$ is diagonalizable, and finally (3) $A$ is any square matrix.

(1) First suppose that $A$ is a diagonal matrix $D$. Then $\mathbf{y}'(t) = A\mathbf{y}(t)$ is
$$\begin{bmatrix} y_1'(t) \\ \vdots \\ y_n'(t) \end{bmatrix} = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}\begin{bmatrix} y_1(t) \\ \vdots \\ y_n(t) \end{bmatrix}.$$

This system is just $n$ simple linear differential equations of the first order:

$$y_i'(t) = \lambda_iy_i(t), \quad i = 1, 2, \ldots, n,$$

and their solutions are trivial: $y_i(t) = c_ie^{\lambda_it}$ with a constant $c_i$ for $i = 1, 2, \ldots, n$. On the other hand, the diagonal matrix $A$ has $n$ linearly independent eigenvectors $\mathbf{e}_1, \ldots, \mathbf{e}_n$ belonging to the eigenvalues $\lambda_1, \ldots, \lambda_n$, respectively. One can see that $\mathbf{y}_i(t) = e^{\lambda_it}\mathbf{e}_i$ is a solution of the system $\mathbf{y}'(t) = A\mathbf{y}(t)$ for $i = 1, 2, \ldots, n$. Moreover, at $t = 0$, the solution set $\{\mathbf{y}_1(0), \ldots, \mathbf{y}_n(0)\} = \{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ is linearly independent. Hence, by Lemma 6.20, a general solution of the system $\mathbf{y}'(t) = A\mathbf{y}(t)$ is

$$\mathbf{y}(t) = c_1\mathbf{y}_1(t) + \cdots + c_n\mathbf{y}_n(t) = c_1e^{\lambda_1t}\mathbf{e}_1 + \cdots + c_ne^{\lambda_nt}\mathbf{e}_n$$

with constants $c_i$'s. Or, in matrix notation,

$$\mathbf{y}(t) = \begin{bmatrix} y_1(t) \\ \vdots \\ y_n(t) \end{bmatrix} = \begin{bmatrix} e^{\lambda_1t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_nt} \end{bmatrix}\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = e^{tD}\mathbf{y}_0,$$

where $e^{tD}$ is by definition

$$e^{tD} = \begin{bmatrix} e^{\lambda_1t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_nt} \end{bmatrix}.$$
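The diagonal case can be checked numerically. The sketch below (not from the text; the eigenvalues, constants, and function name are illustrative) compares a central-difference approximation of $\mathbf{y}'(t)$ against $D\mathbf{y}(t)$ for the solution $y_i(t) = c_ie^{\lambda_it}$.

```python
from math import exp

def y(t, lams, c):
    # Solution of y' = D y for diagonal D: y_i(t) = c_i * e^{lam_i t}.
    return [ci * exp(li * t) for li, ci in zip(lams, c)]

lams, c = [0.5, -1.0, 2.0], [1.0, 3.0, -2.0]
t, h = 0.7, 1e-6
# Central-difference approximation of y'(t) should match D y(t) = lam_i y_i(t).
yp = [(a - b) / (2 * h) for a, b in zip(y(t + h, lams, c), y(t - h, lams, c))]
Dy = [li * yi for li, yi in zip(lams, y(t, lams, c))]
assert all(abs(p - q) < 1e-4 for p, q in zip(yp, Dy))
```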
Example 6.15 (A predator-prey problem as a model of a system of differential equations) One of the fundamental problems of mathematical ecology is the predator-prey problem. Let $x(t)$ and $y(t)$ denote the populations at time $t$ of two species in a specified region, one of which, $x$, preys upon the other, $y$. For example, $x(t)$ and $y(t)$ may be the numbers of sharks and small fish, respectively, in a restricted region of the ocean. Without the small fish (prey), the population of the sharks (predators) will decrease, and without the sharks, the population of the fish will increase. A mathematical model showing their interactions, and whether an ecological balance exists, can be written as the following system of differential equations:

$$\begin{cases} x'(t) = a\,x(t) - b\,x(t)y(t) \\ y'(t) = -c\,y(t) + d\,x(t)y(t). \end{cases}$$

In this equation, the coefficients $a$ and $c$ are the birth rate of $x$ and the death rate of $y$, respectively. The nonlinear $x(t)y(t)$ terms in the two equations represent the interaction of the two species, such as the number of contacts per unit time between predators and prey, so the coefficients $b$ and $d$ are measures of the effect of the interaction between them. A study of this general system of differential equations leads to very interesting developments in the theory of dynamical systems and can be found in any book on ordinary differential equations. Here, we restrict our study to the case of $x$ and $y$ very small, i.e., near the origin in the plane. In this case, one can neglect the nonlinear terms in the equations, so the system is assumed to be given as follows:

$$\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} a & 0 \\ 0 & -c \end{bmatrix}\begin{bmatrix} x(t) \\ y(t) \end{bmatrix}.$$

Thus, the eigenvalues are $\lambda_1 = a$ and $\lambda_2 = -c$ with associated eigenvectors $\mathbf{e}_1$ and $\mathbf{e}_2$, respectively. Therefore, its general solution is

$$\begin{bmatrix} x(t) \\ y(t) \end{bmatrix} = \begin{bmatrix} c_1e^{at} \\ c_2e^{-ct} \end{bmatrix} = \begin{bmatrix} e^{at} & 0 \\ 0 & e^{-ct} \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = c_1e^{at}\mathbf{e}_1 + c_2e^{-ct}\mathbf{e}_2. \qquad \square$$
(2) We next assume that the matrix $A$ in the system $\mathbf{y}'(t) = A\mathbf{y}(t)$ is diagonalizable; that is, it has $n$ linearly independent eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$ belonging to the eigenvalues $\lambda_1, \ldots, \lambda_n$, respectively. Then the basis-change matrix $Q = [\mathbf{v}_1\ \cdots\ \mathbf{v}_n]$ diagonalizes $A$ and

$$A = QDQ^{-1} = Q\begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}Q^{-1}.$$

Thus, the system becomes $Q^{-1}\mathbf{y}' = DQ^{-1}\mathbf{y}$. If we take a change of variables by the new vector $\mathbf{x} = Q^{-1}\mathbf{y}$ (or $\mathbf{y} = Q\mathbf{x}$), then we obtain a new system

$$\mathbf{x}' = D\mathbf{x}$$

with an initial condition $\mathbf{x}_0 = Q^{-1}\mathbf{y}_0 = (c_1, \ldots, c_n)$. Since $D$ is diagonal, its general solution is

$$\mathbf{x} = e^{tD}\mathbf{x}_0 = c_1e^{\lambda_1t}\mathbf{e}_1 + \cdots + c_ne^{\lambda_nt}\mathbf{e}_n.$$

Now, a general solution of the original system $\mathbf{y}' = A\mathbf{y}$ is

$$\mathbf{y} = Q\mathbf{x} = Qe^{tD}Q^{-1}\mathbf{y}_0 = [\mathbf{v}_1\ \cdots\ \mathbf{v}_n]\begin{bmatrix} e^{\lambda_1t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_nt} \end{bmatrix}\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = c_1e^{\lambda_1t}\mathbf{v}_1 + c_2e^{\lambda_2t}\mathbf{v}_2 + \cdots + c_ne^{\lambda_nt}\mathbf{v}_n.$$

Remark: One can check directly that each vector function $\mathbf{y}_i(t) = e^{\lambda_it}\mathbf{v}_i$ is the particular solution of the system with the initial condition $\mathbf{y}_i(0) = \mathbf{v}_i$ for $i = 1, \ldots, n$. Since the initial vectors $\mathbf{y}_i(0) = \mathbf{v}_i$, $i = 1, \ldots, n$, are linearly independent, the solutions $e^{\lambda_it}\mathbf{v}_i$, $i = 1, \ldots, n$, form a fundamental set of solutions. Thus, we have obtained the following theorem.

Theorem 6.22 Let $A$ be a diagonalizable $n \times n$ matrix with $n$ linearly independent eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ belonging to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, respectively. Then, a general solution of the system of linear differential equations $\mathbf{y}'(t) = A\mathbf{y}(t)$ is

$$\mathbf{y}(t) = c_1e^{\lambda_1t}\mathbf{v}_1 + c_2e^{\lambda_2t}\mathbf{v}_2 + \cdots + c_ne^{\lambda_nt}\mathbf{v}_n$$

with constants $c_1, c_2, \ldots, c_n$. Note that a particular solution can be obtained from a general solution by determining the coefficients from the given initial condition.
Example 6.16 (y' = Ay with a diagonalizable matrix A) Solve the system of linear differential equations
4Y2 IIY2 4Y2
+ + +
4Y3 12Y3 5Y3.
232
Chapter 6. Eigenvectors and Eigenvalues
Solution: In matrix form, the system may be written as $\mathbf{y}' = A\mathbf{y}$ with
$$A = \begin{bmatrix} 5 & -4 & 4 \\ 12 & -11 & 12 \\ 4 & -4 & 5 \end{bmatrix}.$$
The eigenvalues of $A$ are $\lambda_1 = \lambda_2 = 1$ and $\lambda_3 = -3$, and their associated eigenvectors are $\mathbf{v}_1 = (1, 1, 0)$, $\mathbf{v}_2 = (-1, 0, 1)$ and $\mathbf{v}_3 = (1, 3, 1)$, respectively, which are linearly independent (see Problem 6.9). Hence, by Theorem 6.22, its general solution is
$$\mathbf{y}(t) = c_1 e^{t}\mathbf{v}_1 + c_2 e^{t}\mathbf{v}_2 + c_3 e^{-3t}\mathbf{v}_3. \qquad\square$$

(3) A system $\mathbf{y}' = A\mathbf{y}$ of linear differential equations with a non-diagonalizable matrix $A$ will be discussed in Section 6.5.
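Since the solution in Example 6.16 rests entirely on the eigenpairs of $A$, here is a small numerical check (a sketch added for illustration, not part of the text) that $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ really are eigenvectors with eigenvalues $1, 1, -3$:

```python
# Verify the eigenpairs used in Example 6.16, so that
# y(t) = c1*e^t*v1 + c2*e^t*v2 + c3*e^(-3t)*v3 solves y' = Ay.
A = [[5, -4, 4], [12, -11, 12], [4, -4, 5]]

def matvec(M, v):
    # plain matrix-vector product
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

pairs = [(1, [1, 1, 0]), (1, [-1, 0, 1]), (-3, [1, 3, 1])]
for lam, v in pairs:
    Av = matvec(A, v)
    assert all(Av[i] == lam * v[i] for i in range(3))
print("all eigenpairs of Example 6.16 verified")
```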
6.4 Exponential matrices

Just like the Maclaurin series of the exponential function $e^x$, we define the exponential of a matrix.

Definition 6.6 For any square matrix $A$, the exponential matrix of $A$ is defined as the series
$$e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots.$$
That is, the exponential matrix $e^A$ is defined to be the (entry-wise) limit of the sequence:
$$[e^A]_{ij} = \lim_{m \to \infty} \left[\sum_{k=0}^{m} \frac{A^k}{k!}\right]_{ij} \quad \text{for all } i, j.$$
Example 6.17 If
$$D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}, \quad \text{then} \quad D^k = \begin{bmatrix} \lambda_1^k & & 0 \\ & \ddots & \\ 0 & & \lambda_n^k \end{bmatrix}$$
for any $k \ge 0$. Thus, the exponential matrix $e^D$ is
$$e^D = \sum_{k=0}^{\infty} \frac{D^k}{k!} = \begin{bmatrix} \sum_{k=0}^{\infty} \frac{\lambda_1^k}{k!} & & 0 \\ & \ddots & \\ 0 & & \sum_{k=0}^{\infty} \frac{\lambda_n^k}{k!} \end{bmatrix} = \begin{bmatrix} e^{\lambda_1} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n} \end{bmatrix},$$
which coincides with the definition given on page 230. $\square$
Practically, the computation of $e^A$ involves the computation of the powers $A^k$ for all $k \ge 1$, and hence it is not easy in general. Nevertheless, one can show that the limit $e^A$ exists for any square matrix $A$.

Theorem 6.23 For any square matrix $A$, the matrix $e^A$ exists. In other words, each $(i,j)$-entry of $e^A$ is convergent.

Proof: Since $A$ has only $n^2$ entries, there is a number $M$ such that $|a_{ij}| \le M$ for all $(i,j)$-entries $a_{ij}$ of $A$. Then one can easily show that $\left|[A^k]_{ij}\right| \le n^{k-1}M^k$ for all $k$ and $i, j$. Thus
$$\left|[e^A]_{ij}\right| \le \sum_{k=0}^{\infty} \frac{1}{k!}\, n^{k-1} M^k = \frac{1}{n}\, e^{nM},$$
so by the comparison test, each entry of $e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!}$ is absolutely convergent for any square matrix $A$. $\square$

Example 6.18 For a concrete $2 \times 2$ matrix $A$, one computes
$$e^A = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots$$
by summing the series in each entry of the partial sums. It is a good exercise to calculate the entries of $e^A$ directly from the definition in this way. $\square$

Problem 6.16 Let $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. Find $\lim_{k \to \infty} A^k$ if it exists. (Note that the matrix $A$ is not diagonalizable.)
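The entry-wise limit in Definition 6.6 and the convergence claimed in Theorem 6.23 can be watched numerically. The following sketch (our own illustration; the diagonal test matrix is an assumption, not one from the text) sums partial sums of the series:

```python
import math

# Partial sums of e^A = sum_k A^k / k!, summed entry by entry.
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def exp_series(A, terms=30):
    n = len(A)
    S = [[float(i == j) for j in range(n)] for i in range(n)]  # k = 0 term: I
    P = [row[:] for row in S]                                  # running power A^k
    fact = 1.0
    for k in range(1, terms):
        P = matmul(P, A)
        fact *= k
        S = [[S[i][j] + P[i][j] / fact for j in range(n)] for i in range(n)]
    return S

# For a diagonal matrix the limit is known exactly (Example 6.17).
E = exp_series([[2.0, 0.0], [0.0, 3.0]])
assert abs(E[0][0] - math.exp(2)) < 1e-9 and abs(E[1][1] - math.exp(3)) < 1e-9
print("series converged entry-wise to diag(e^2, e^3)")
```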
The following theorem is sometimes helpful in computing $e^A$.

Theorem 6.24 Let $A_1, A_2, A_3, \ldots$ be a sequence of $m \times n$ matrices such that $\lim_{k \to \infty} A_k = L$. Then
$$\lim_{k \to \infty} BA_k = BL \quad \text{and} \quad \lim_{k \to \infty} A_k C = LC$$
for any matrices $B$ and $C$ for which the products can be defined.
Proof: By comparing the $(i,j)$-entries of both sides,
$$\lim_{k \to \infty} [BA_k]_{ij} = \lim_{k \to \infty} \left( \sum_{\ell=1}^{m} [B]_{i\ell}[A_k]_{\ell j} \right) = \sum_{\ell=1}^{m} [B]_{i\ell} \lim_{k \to \infty} [A_k]_{\ell j} = \sum_{\ell=1}^{m} [B]_{i\ell}[L]_{\ell j} = [BL]_{ij},$$
we get $\lim_{k \to \infty} BA_k = BL$. Similarly, $\lim_{k \to \infty} A_k C = LC$. $\square$
For example, if $A$ is a diagonalizable matrix and $Q^{-1}AQ = D$ is diagonal for some invertible matrix $Q$, then, for each integer $k \ge 0$, $A^k = QD^kQ^{-1}$ and
$$\lim_{k \to \infty} A^k = Q \left( \lim_{k \to \infty} D^k \right) Q^{-1} = Q \begin{bmatrix} \lim_{k \to \infty} \lambda_1^k & & 0 \\ & \ddots & \\ 0 & & \lim_{k \to \infty} \lambda_n^k \end{bmatrix} Q^{-1}.$$
Thus, $\lim_{k \to \infty} A^k$ exists if and only if $\lim_{k \to \infty} \lambda_i^k$ exists for $i = 1, 2, \ldots, n$. Also, by Theorem 6.24,
$$e^A = \sum_{k=0}^{\infty} \frac{QD^kQ^{-1}}{k!} = Q \left( \sum_{k=0}^{\infty} \frac{D^k}{k!} \right) Q^{-1} = Qe^DQ^{-1},$$
whose computation is easy.
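The identity $e^A = Qe^DQ^{-1}$ can be checked numerically. In the sketch below, $Q$ and $D$ are a toy choice of ours (not taken from the text), and the result is compared against the series definition:

```python
import math

# For A = Q D Q^{-1} with D = diag(1, 2), compare Q e^D Q^{-1} with the series for e^A.
def matmul(X, Y):
    n, m, p = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(m)) for j in range(p)] for i in range(n)]

Q    = [[1.0, 1.0], [1.0, 2.0]]
Qinv = [[2.0, -1.0], [-1.0, 1.0]]                 # det Q = 1
eD   = [[math.exp(1), 0.0], [0.0, math.exp(2)]]
eA   = matmul(matmul(Q, eD), Qinv)                # e^A = Q e^D Q^{-1}

# A = Q diag(1,2) Q^{-1} = [[0,1],[-2,3]]; sum its series directly:
A = matmul(matmul(Q, [[1.0, 0.0], [0.0, 2.0]]), Qinv)
S = [[1.0, 0.0], [0.0, 1.0]]; P = [[1.0, 0.0], [0.0, 1.0]]; f = 1.0
for k in range(1, 30):
    P = matmul(P, A); f *= k
    S = [[S[i][j] + P[i][j] / f for j in range(2)] for i in range(2)]
assert all(abs(S[i][j] - eA[i][j]) < 1e-8 for i in range(2) for j in range(2))
print("Q e^D Q^{-1} matches the series for e^A")
```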
Example 6.19 (Computing $e^A$ for a diagonalizable matrix $A$) Let $A$ be a $2 \times 2$ diagonalizable matrix whose eigenvalues are $1$ and $2$, with associated eigenvectors $\mathbf{u}_1$ and $\mathbf{u}_2$, respectively. Then $A = QDQ^{-1}$ with
$$D = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \quad \text{and} \quad Q = [\mathbf{u}_1\ \mathbf{u}_2].$$
Therefore,
$$e^A = Qe^DQ^{-1} = Q \begin{bmatrix} e & 0 \\ 0 & e^2 \end{bmatrix} Q^{-1}. \qquad\square$$
The following theorem shows some basic properties of exponential matrices, whose proofs are easy and are left as exercises.

Theorem 6.25 (1) $e^{A+B} = e^A e^B$ provided that $AB = BA$.
(2) $e^A$ is invertible for any square matrix $A$, and $(e^A)^{-1} = e^{-A}$.
(3) $e^{Q^{-1}AQ} = Q^{-1}e^AQ$ for any invertible matrix $Q$.
(4) If $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of a matrix $A$ with their associated eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$, then the $e^{\lambda_i}$'s are the eigenvalues of $e^A$ with the same associated eigenvectors $\mathbf{v}_i$'s for $i = 1, 2, \ldots, n$. Moreover, $\det e^A = e^{\lambda_1} \cdots e^{\lambda_n} = e^{\operatorname{tr}(A)} \ne 0$ for any square matrix $A$.

Problem 6.17 Prove Theorem 6.25.

Problem 6.18 Finish the computation of $e^A$ for the matrix $A$ in Example 6.18.

Problem 6.19 Prove that if $A$ is skew-symmetric, then $e^A$ is orthogonal.
In general, the computation of $e^A$ is not easy at all if $A$ is not diagonalizable. However, if $A$ is a triangular matrix, it is relatively easy, as shown in the following example.

Example 6.20 (Computing $e^A$ for a triangular matrix of the form $A = \lambda I + N$) For $A = \begin{bmatrix} 2 & 3 \\ 0 & 2 \end{bmatrix}$, compute $e^A$.

Solution: Write $A = 2I + N$ with $N = \begin{bmatrix} 0 & 3 \\ 0 & 0 \end{bmatrix}$. Since $(2I)N = N(2I)$, by Theorem 6.25(1), $e^A = e^{2I}e^N$. From the direct computation of the series expansion, we get $e^{2I} = e^2 I$. Moreover, since $N^k = 0$ for $k \ge 2$,
$$e^N = I + N + \frac{N^2}{2!} + \cdots = I + N = \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}.$$
Thus,
$$e^A = e^2(I + N) = e^2 \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} e^2 & 3e^2 \\ 0 & e^2 \end{bmatrix}. \qquad\square$$

Problem 6.20 Compute $e^A$ for $A = \begin{bmatrix} 2 & 3 & 0 \\ 0 & 2 & 3 \\ 0 & 0 & 2 \end{bmatrix}$.
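A quick numerical confirmation of Example 6.20 (an added sketch, not part of the text): summing the defining series for $A = \begin{bmatrix} 2 & 3 \\ 0 & 2 \end{bmatrix}$ reproduces $e^2(I+N)$:

```python
import math

# Series check that e^A = e^2 (I + N) = [[e^2, 3e^2], [0, e^2]] for A = 2I + N.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

A = [[2.0, 3.0], [0.0, 2.0]]
S = [[1.0, 0.0], [0.0, 1.0]]; P = [[1.0, 0.0], [0.0, 1.0]]; f = 1.0
for k in range(1, 40):
    P = matmul(P, A); f *= k
    S = [[S[i][j] + P[i][j] / f for j in range(2)] for i in range(2)]

e2 = math.exp(2)
expected = [[e2, 3 * e2], [0.0, e2]]
assert all(abs(S[i][j] - expected[i][j]) < 1e-8 for i in range(2) for j in range(2))
print("e^A = e^2 (I + N) confirmed")
```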
6.5 Applications continued

6.5.1 Linear differential equations II

One of the most prominent applications of exponential matrices is to the theory of linear differential equations. In this section, we show that a general solution of $\mathbf{y}'(t) = A\mathbf{y}(t)$ is of the form $\mathbf{y}(t) = e^{tA}\mathbf{y}_0$.
236
Chapter 6. Diagonalization
Lemma 6.26 For any $t \in \mathbb{R}$ and any square matrix $A$, the exponential matrix
$$e^{tA} = I + tA + \frac{t^2}{2!}A^2 + \frac{t^3}{3!}A^3 + \cdots$$
is a differentiable function of $t$, and $\dfrac{d}{dt}\, e^{tA} = Ae^{tA}$.

Proof: By the absolute convergence of the series expansion of $e^{tA}$, one can use term-by-term differentiation, i.e.,
$$\frac{d}{dt}\, e^{tA} = \frac{d}{dt}\left( I + tA + \frac{t^2}{2!}A^2 + \frac{t^3}{3!}A^3 + \cdots \right) = A + tA^2 + \frac{t^2}{2!}A^3 + \cdots = Ae^{tA}. \qquad\square$$
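Lemma 6.26 can be probed with a finite-difference quotient. The sketch below (our own illustration; the matrix, time, and step size are arbitrary choices, not from the text) compares $\bigl(e^{(t+h)A} - e^{(t-h)A}\bigr)/2h$ with $Ae^{tA}$:

```python
import math

# Central-difference check of d/dt e^{tA} = A e^{tA} for a toy 2x2 matrix.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def expm(A, t, terms=40):
    # partial sums of the series for e^{tA}
    tA = [[t * a for a in row] for row in A]
    S = [[1.0, 0.0], [0.0, 1.0]]; P = [[1.0, 0.0], [0.0, 1.0]]; f = 1.0
    for k in range(1, terms):
        P = matmul(P, tA); f *= k
        S = [[S[i][j] + P[i][j] / f for j in range(2)] for i in range(2)]
    return S

A, t, h = [[0.0, 1.0], [-2.0, 3.0]], 0.5, 1e-6
E1, E2 = expm(A, t - h), expm(A, t + h)
deriv = [[(E2[i][j] - E1[i][j]) / (2 * h) for j in range(2)] for i in range(2)]
AE = matmul(A, expm(A, t))
assert all(abs(deriv[i][j] - AE[i][j]) < 1e-4 for i in range(2) for j in range(2))
print("d/dt e^{tA} matches A e^{tA} to finite-difference accuracy")
```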
As a direct consequence of Lemma 6.26, one can see that $\mathbf{y}(t) = e^{tA}\mathbf{y}_0$ is a solution of the linear differential equation $\mathbf{y}' = A\mathbf{y}$. In fact, by taking the initial vector $\mathbf{y}_0$ to be the standard basis vector $\mathbf{e}_i$, $1 \le i \le n$, the $n$ columns of $e^{tA}$ become solutions, and they are clearly linearly independent, so they form a fundamental set of solutions. Hence, we have the following.
Theorem 6.27 For any $n \times n$ matrix $A$, the linear differential equation $\mathbf{y}' = A\mathbf{y}$ has a general solution $\mathbf{y}(t) = e^{tA}\mathbf{y}_0$.

In particular, if $A$ is diagonalizable, say $Q^{-1}AQ = D$ is diagonal with a basis-change matrix $Q = [\mathbf{v}_1 \cdots \mathbf{v}_n]$ consisting of $n$ linearly independent eigenvectors of $A$ belonging to the eigenvalues $\lambda_i$'s, then a general solution of the system $\mathbf{y}' = A\mathbf{y}$ is
$$\mathbf{y}(t) = e^{tA}\mathbf{y}_0 = e^{tQDQ^{-1}}\mathbf{y}_0 = Qe^{tD}Q^{-1}\mathbf{y}_0 = [\mathbf{v}_1 \cdots \mathbf{v}_n] \begin{bmatrix} e^{\lambda_1 t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n t} \end{bmatrix} \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = c_1 e^{\lambda_1 t}\mathbf{v}_1 + c_2 e^{\lambda_2 t}\mathbf{v}_2 + \cdots + c_n e^{\lambda_n t}\mathbf{v}_n.$$
In fact, $\{\mathbf{y}_i(t) = e^{\lambda_i t}\mathbf{v}_i : i = 1, \ldots, n\}$ forms a fundamental set of solutions, and the constants $(c_1, \ldots, c_n) = Q^{-1}\mathbf{y}_0$ can be determined if an initial condition is given. Note that this just rephrases Theorem 6.22.
Example 6.21 ($\mathbf{y}' = A\mathbf{y}$ for a diagonalizable matrix $A$) Solve the system
$$\begin{bmatrix} y_1'(t) \\ y_2'(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} y_1(t) \\ y_2(t) \end{bmatrix}$$
with initial conditions $y_1(0) = 1$, $y_2(0) = 0$.

Solution: (1) The eigenvalues of $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ are $\lambda_1 = 1$ and $\lambda_2 = -1$ with associated eigenvectors $\mathbf{v}_1 = [1\ \ 1]^T$ and $\mathbf{v}_2 = [1\ -1]^T$, respectively.

(2) By setting $Q = [\mathbf{v}_1\ \mathbf{v}_2] = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$, we get $Q^{-1}AQ = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = D$.

(3) A general solution $\mathbf{y}(t) = e^{tA}\mathbf{y}_0$ is
$$\mathbf{y}(t) = e^{tQDQ^{-1}}\mathbf{y}_0 = Qe^{tD}Q^{-1}\mathbf{y}_0 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} e^t & 0 \\ 0 & e^{-t} \end{bmatrix} Q^{-1}\mathbf{y}_0 = c_1 e^t \begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2 e^{-t} \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
with constants $c_1, c_2$. The initial conditions $y_1(0) = 1$, $y_2(0) = 0$ determine $c_1 = c_2 = \frac{1}{2}$, so that
$$\mathbf{y}(t) = \frac{1}{2}e^t \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \frac{1}{2}e^{-t} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} \cosh t \\ \sinh t \end{bmatrix}. \qquad\square$$
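As a cross-check of Example 6.21 (an added sketch, not from the text): the components $y_1 = \cosh t$, $y_2 = \sinh t$ satisfy $y_1' = y_2$ and $y_2' = y_1$, which is exactly $\mathbf{y}' = A\mathbf{y}$ for $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$:

```python
import math

# y(t) = (cosh t, sinh t) solves y' = Ay for A = [[0,1],[1,0]], y(0) = (1,0).
for t in [0.0, 0.3, 1.7]:
    y  = (math.cosh(t), math.sinh(t))
    dy = (math.sinh(t), math.cosh(t))   # exact derivatives of cosh, sinh
    Ay = (y[1], y[0])                   # A swaps the two components
    assert abs(dy[0] - Ay[0]) < 1e-12 and abs(dy[1] - Ay[1]) < 1e-12
assert (math.cosh(0.0), math.sinh(0.0)) == (1.0, 0.0)
print("Example 6.21 solution verified")
```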
Problem 6.21 Solvethe system { Y}
Y2
Problem 6.22 Solve the system
I
y'
4YI
Y3
- 2YI -2YI
Y~
+
Y2
+
Y3
+
Y3,
and find the particular solution of the system satisfying the initial conditions YI (0) -I, Y2(0)
= I,
Y3(0)
= o.
If $A$ is not diagonalizable, then it is not easy in general to compute $e^{tA}$ directly. However, one can still reduce $A$ to a simpler form called the Jordan canonical form, which will be introduced in Chapter 8, and then the computation of $e^{tA}$ becomes relatively easy. The following example shows that the computation of $e^{tA}$ is possible for some triangular matrices $A$ even if they are not diagonalizable. The general case will be treated again in Chapter 8.
Example 6.22 ($\mathbf{y}' = A\mathbf{y}$ for a triangular matrix $A = \lambda I + N$) Solve the system $\mathbf{y}' = A\mathbf{y}$ of linear differential equations with initial condition $\mathbf{y}(0) = \mathbf{y}_0$, where
$$A = \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}, \quad \mathbf{y}_0 = \begin{bmatrix} a \\ b \end{bmatrix}.$$

Solution: First note that $A$ has an eigenvalue $\lambda$ of multiplicity $2$ and is not diagonalizable. One can rewrite $A$ as $A = \lambda I + N$ with $N = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$. Then, by the same argument as in Example 6.20,
$$e^{tA} = e^{t(\lambda I + N)} = e^{\lambda t}e^{tN} = e^{\lambda t} \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}.$$
Therefore, the solution is
$$\mathbf{y} = e^{tA}\mathbf{y}_0 = e^{\lambda t} \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} (a + bt)e^{\lambda t} \\ be^{\lambda t} \end{bmatrix} = e^{\lambda t} \begin{bmatrix} a \\ b \end{bmatrix} + te^{\lambda t} \begin{bmatrix} b \\ 0 \end{bmatrix}.$$
In terms of components, $y_1 = (a + bt)e^{\lambda t}$, $y_2 = be^{\lambda t}$. $\square$
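The component formulas of Example 6.22 can be verified by the product rule. In this sketch (added for illustration; the numerical values of $\lambda$, $a$, $b$ are arbitrary test values, not from the text), we check $y_1' = \lambda y_1 + y_2$ and $y_2' = \lambda y_2$:

```python
import math

# With A = [[lam,1],[0,lam]]: y1 = (a + b*t)e^{lam t}, y2 = b e^{lam t}.
lam, a, b = 0.7, 2.0, -1.5          # arbitrary test values
for t in [0.0, 0.4, 2.1]:
    e = math.exp(lam * t)
    y1, y2 = (a + b * t) * e, b * e
    dy1 = b * e + (a + b * t) * lam * e   # product rule on y1
    dy2 = b * lam * e
    assert abs(dy1 - (lam * y1 + y2)) < 1e-12
    assert abs(dy2 - lam * y2) < 1e-12
print("Example 6.22 components verified")
```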
Example 6.23 ($\mathbf{y}' = A\mathbf{y}$ with $A$ having complex eigenvalues) Find a general solution of the system $\mathbf{y}' = A\mathbf{y}$, where
$$A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}.$$

Solution: Note that the eigenvalues of $A$ are $a \pm ib$, which are not real. However, one can compute $e^{tA}$ directly without using diagonalization. We first write $A$ as
$$A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} = a \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + b \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = aI + bJ.$$
Then clearly $IJ = JI$ and $e^{tA} = e^{atI + btJ} = e^{at}e^{btJ}$. Since
$$J^2 = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} = -I, \qquad J^3 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = -J,$$
one can deduce $J^k = J^{k+4}$ for all $k = 1, 2, \ldots$, and
$$e^{btJ} = I + \frac{bt}{1!}J + \frac{(bt)^2}{2!}J^2 + \frac{(bt)^3}{3!}J^3 + \frac{(bt)^4}{4!}J^4 + \cdots = \begin{bmatrix} 1 - \frac{(bt)^2}{2!} + \frac{(bt)^4}{4!} - \cdots & -\left( bt - \frac{(bt)^3}{3!} + \frac{(bt)^5}{5!} - \cdots \right) \\[4pt] bt - \frac{(bt)^3}{3!} + \frac{(bt)^5}{5!} - \cdots & 1 - \frac{(bt)^2}{2!} + \frac{(bt)^4}{4!} - \cdots \end{bmatrix} = \begin{bmatrix} \cos bt & -\sin bt \\ \sin bt & \cos bt \end{bmatrix}$$
for any constants $b$ and $t$. Thus, a general solution of $\mathbf{y}' = A\mathbf{y}$ is
$$\mathbf{y} = e^{tA}\mathbf{c} = e^{at}e^{btJ}\mathbf{c} = e^{at} \begin{bmatrix} \cos bt & -\sin bt \\ \sin bt & \cos bt \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.$$
In terms of components,
$$\begin{cases} y_1 = e^{at}(c_1 \cos bt - c_2 \sin bt) \\ y_2 = e^{at}(c_1 \sin bt + c_2 \cos bt). \end{cases} \qquad\square$$
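A numerical check of Example 6.23 (an added sketch, not part of the text; the values of $a$, $b$, $t$ are our own choices): the series for $e^{tA}$ with $A = aI + bJ$ should reproduce $e^{at}$ times a rotation:

```python
import math

# Compare the series for e^{tA}, A = [[a,-b],[b,a]], with the closed form
# e^{at} [[cos bt, -sin bt],[sin bt, cos bt]].
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

a, b, t = 0.5, 1.3, 0.9
tA = [[a * t, -b * t], [b * t, a * t]]
S = [[1.0, 0.0], [0.0, 1.0]]; P = [[1.0, 0.0], [0.0, 1.0]]; f = 1.0
for k in range(1, 40):
    P = matmul(P, tA); f *= k
    S = [[S[i][j] + P[i][j] / f for j in range(2)] for i in range(2)]

ea, c, s = math.exp(a * t), math.cos(b * t), math.sin(b * t)
closed = [[ea * c, -ea * s], [ea * s, ea * c]]
assert all(abs(S[i][j] - closed[i][j]) < 1e-9 for i in range(2) for j in range(2))
print("rotation-scaling form of e^{tA} confirmed")
```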
Problem 6.23 Solve the system $\mathbf{y}' = A\mathbf{y}$ with initial condition $\mathbf{y}(0) = \mathbf{y}_0$ by computing $e^{tA}\mathbf{y}_0$ for
(1) $A$ = [~ =;], $\mathbf{y}_0$ = [:], and (2) $A$ = [U ~ j], $\mathbf{y}_0$ = [:].
Remark: Consider the $n$-th order homogeneous linear differential equation
$$\frac{d^n y}{dt^n} + a_1 \frac{d^{n-1} y}{dt^{n-1}} + a_2 \frac{d^{n-2} y}{dt^{n-2}} + \cdots + a_n y = 0,$$
where the $a_i$ are constants and $y(t)$ is a differentiable function on an interval $I = (a, b)$. A fundamental theorem of differential equations says that such a differential equation has a unique solution $y(t)$ on $I$ satisfying the given initial condition: For a point $t_0$ in $I$ and arbitrary constants $c_0, \ldots, c_{n-1}$, there is a unique solution $y = y(t)$ of the equation such that $y(t_0) = c_0$, $y'(t_0) = c_1$, $\ldots$, $y^{(n-1)}(t_0) = c_{n-1}$. This can be confirmed as follows: Let
$$y_1 = y, \quad y_2 = y' = \frac{dy_1}{dt}, \quad y_3 = y'' = \frac{dy_2}{dt}, \quad \ldots, \quad y_n = y^{(n-1)} = \frac{dy_{n-1}}{dt}.$$
Then the original homogeneous linear differential equation is nothing but
$$\frac{dy_n}{dt} = \frac{d^n y}{dt^n} = -a_1 y_n - a_2 y_{n-1} - \cdots - a_{n-1} y_2 - a_n y_1.$$
In matrix notation,
$$\mathbf{y}'(t) = \begin{bmatrix} y_1' \\ y_2' \\ \vdots \\ y_{n-1}' \\ y_n' \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_n & -a_{n-1} & -a_{n-2} & \cdots & -a_1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix} = A\mathbf{y}(t),$$
which is just a system of linear differential equations with a companion matrix $A$. It is treated in Section 6.3.3 (see Theorem 6.22). Therefore, the solution of the original differential equation is just the solution of $\mathbf{y}'(t) = A\mathbf{y}(t)$, which is of the form
$$y(t) = c_1 e^{\lambda_1 t} + \cdots + c_n e^{\lambda_n t}$$
if $A$ has distinct eigenvalues $\lambda_1, \ldots, \lambda_n$. In Chapter 8, we will discuss the case of eigenvalues with multiplicity.
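To illustrate the Remark (a sketch of our own; the particular third-order equation is an assumption, not one from the text), take $y''' - 6y'' + 11y' - 6y = 0$: its companion matrix has characteristic polynomial $t^3 - 6t^2 + 11t - 6 = (t-1)(t-2)(t-3)$, so the general solution is $c_1e^{t} + c_2e^{2t} + c_3e^{3t}$:

```python
# Companion matrix of y''' + a1 y'' + a2 y' + a3 y = 0 with a1, a2, a3 = -6, 11, -6.
a1, a2, a3 = -6.0, 11.0, -6.0
A = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [-a3, -a2, -a1]]           # last row carries the (negated) coefficients

def det3(M):
    # cofactor expansion along the first row
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

for lam in (1.0, 2.0, 3.0):     # the roots of (t-1)(t-2)(t-3), i.e. the eigenvalues
    lamI_A = [[lam * (i == j) - A[i][j] for j in range(3)] for i in range(3)]
    assert abs(det3(lamI_A)) < 1e-9
print("eigenvalues of the companion matrix are 1, 2, 3")
```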
6.6 Diagonalization of linear transformations

Recall that two matrices are similar if and only if they can be the matrix representations of the same linear transformation, and similar matrices have the same eigenvalues. In this section, we aim to find a basis $\alpha$ so that the matrix representation of a linear transformation with respect to $\alpha$ is a diagonal matrix. First, we start with the eigenvalues and eigenvectors of a linear transformation.

Definition 6.7 Let $V$ be an $n$-dimensional vector space, and let $T : V \to V$ be a linear transformation on $V$. Then the eigenvalues and eigenvectors of $T$ are defined by the same equation, $T\mathbf{x} = \lambda\mathbf{x}$, with a nonzero vector $\mathbf{x} \in V$.

Practically, the eigenvalues and eigenvectors of $T$ can be computed as follows: Let $\alpha = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for $V$. Then the natural isomorphism $\Phi : V \to \mathbb{R}^n$ identifies the associated matrix $A = [T]_\alpha : \mathbb{R}^n \to \mathbb{R}^n$ with the linear transformation $T : V \to V$ via the following commutative diagram.

                T
          V ---------> V
          |            |
       Phi|            |Phi
          v            v
         R^n --------> R^n
           A = [T]_alpha

Now, the eigenvalues of $T$ are those of its matrix representation $A = [T]_\alpha$, because $[T]_\alpha$ is similar to $[T]_\beta$ for any other basis $\beta$ for $V$, so their eigenvalues are the same by Theorem 6.3. For the eigenvectors of $T$, note that $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ is an eigenvector of $A$ belonging to $\lambda$ ($A\mathbf{x} = \lambda\mathbf{x}$) if and only if $\Phi^{-1}(\mathbf{x}) = \mathbf{v} = x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \cdots + x_n\mathbf{v}_n \in V$ is an eigenvector of $T$ ($T(\mathbf{v}) = \lambda\mathbf{v}$), because the commutativity of the diagram shows
$$[T(\mathbf{v})]_\alpha = [T]_\alpha[\mathbf{v}]_\alpha = A\mathbf{x} = \lambda\mathbf{x} = [\lambda\mathbf{v}]_\alpha.$$
Therefore, if $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$ are linearly independent eigenvectors of $A = [T]_\alpha$, then $\Phi^{-1}(\mathbf{x}_1), \Phi^{-1}(\mathbf{x}_2), \ldots, \Phi^{-1}(\mathbf{x}_k)$ are linearly independent eigenvectors of $T$. Hence, the linear transformation $T$ has a diagonal matrix representation if and only if it has $n$ linearly independent eigenvectors, by Theorem 6.7.
The following example illustrates how to find a diagonal matrix representation of a linear transformation on a vector space.

Example 6.24 Let $T : P_2(\mathbb{R}) \to P_2(\mathbb{R})$ be the linear transformation defined by
$$(Tf)(x) = f(x) + xf'(x) + f'(x).$$
Find a basis for $P_2(\mathbb{R})$ with respect to which the matrix of $T$ is diagonal.

Solution: First of all, we find the eigenvalues and the eigenvectors of $T$. Take a basis for the vector space $P_2(\mathbb{R})$, say $\alpha = \{1, x, x^2\}$. Then the matrix of $T$ with respect to $\alpha$ is
$$[T]_\alpha = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{bmatrix},$$
which is upper triangular. Hence, the eigenvalues of $T$ are $\lambda_1 = 1$, $\lambda_2 = 2$ and $\lambda_3 = 3$. By a simple computation, one can verify that the vectors $\mathbf{x}_1 = (1, 0, 0)$, $\mathbf{x}_2 = (1, 1, 0)$ and $\mathbf{x}_3 = (1, 2, 1)$ are eigenvectors of $[T]_\alpha$ in $\mathbb{R}^3$ belonging to the eigenvalues $\lambda_1, \lambda_2, \lambda_3$, respectively. Their associated eigenvectors of $T$ in $P_2(\mathbb{R})$ are $f_1(x) = 1$, $f_2(x) = 1 + x$, $f_3(x) = 1 + 2x + x^2$, respectively. Since the eigenvalues $\lambda_1, \lambda_2, \lambda_3$ are all distinct, the eigenvectors $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$ of $[T]_\alpha$ are linearly independent, and so is $\beta = \{f_1, f_2, f_3\}$ in $P_2(\mathbb{R})$. Thus, each $f_i$ is a basis for the eigenspace $E(\lambda_i)$ of $T$ belonging to $\lambda_i$ for $i = 1, 2, 3$, and the basis-change matrix is
$$Q = [\mathrm{id}]_\beta^\alpha = [\mathbf{x}_1\ \mathbf{x}_2\ \mathbf{x}_3] = \bigl[[f_1]_\alpha\ [f_2]_\alpha\ [f_3]_\alpha\bigr] = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}.$$
Hence, by changing the basis $\alpha$ to $\beta$, the matrix representation of $T$ is a diagonal matrix:
$$[T]_\beta = [\mathrm{id}]_\alpha^\beta [T]_\alpha [\mathrm{id}]_\beta^\alpha = Q^{-1}[T]_\alpha Q = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} = D. \qquad\square$$
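A direct numerical check of Example 6.24 (added sketch, not part of the text): conjugating $[T]_\alpha$ by the basis-change matrix $Q$ indeed yields the diagonal matrix $D$:

```python
# Q^{-1} [T]_alpha Q for [T]_alpha = [[1,1,0],[0,2,2],[0,0,3]], Q = [[1,1,1],[0,1,2],[0,0,1]].
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

Ta   = [[1, 1, 0], [0, 2, 2], [0, 0, 3]]
Q    = [[1, 1, 1], [0, 1, 2], [0, 0, 1]]
Qinv = [[1, -1, 1], [0, 1, -2], [0, 0, 1]]   # inverse of the unit upper-triangular Q
assert matmul(Q, Qinv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

D = matmul(matmul(Qinv, Ta), Q)
assert D == [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
print("T is diagonalized by the basis {1, 1+x, 1+2x+x^2}")
```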
Note that, if $T = A$ is an $n \times n$ square matrix written in column vectors, $A = [\mathbf{c}_1 \cdots \mathbf{c}_n]$, then the linear transformation $A : \mathbb{R}^n \to \mathbb{R}^n$ is given by $A(\mathbf{e}_i) = \mathbf{c}_i$, $i = 1, \ldots, n$, so that $A$ itself is just the matrix representation with respect to the standard basis $\alpha = \{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ for $\mathbb{R}^n$, say $A = [A]_\alpha$. Now if there is a basis $\beta = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ of $n$ linearly independent eigenvectors of $A$, then the natural isomorphism $\Phi : \mathbb{R}^n \to \mathbb{R}^n$ defined by $\Phi(\mathbf{x}_j) = \mathbf{e}_j$ is simply a change of basis by the basis-change matrix $Q = [\mathrm{id}]_\beta^\alpha = [\mathbf{x}_1 \cdots \mathbf{x}_n]$, and the matrix representation of $A$ with respect to $\beta$ is a diagonal matrix:
$$[A]_\beta = Q^{-1}[A]_\alpha Q = Q^{-1}AQ = D.$$
Problem 6.24 Let $T$ be the linear transformation on $\mathbb{R}^3$ defined by
$$T(x, y, z) = (4x + z,\ 2x + 3y + 2z,\ x + 4z).$$
Find all the eigenvalues and their eigenvectors of $T$, and diagonalize $T$.

Problem 6.25 Let $M_{2\times 2}(\mathbb{R})$ be the vector space of all real $2 \times 2$ matrices, and let $T$ be the linear transformation on $M_{2\times 2}(\mathbb{R})$ defined by
$$T \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a+b+d & a+b+c \\ b+c+d & a+c+d \end{bmatrix}.$$
Find the eigenvalues and a basis for each of the eigenspaces of $T$, and diagonalize $T$.

Problem 6.26 Let $T : P_2(\mathbb{R}) \to P_2(\mathbb{R})$ be the linear transformation defined by $T(f(x)) = f(x) + xf'(x)$. Find all the eigenvalues of $T$ and find a basis $\alpha$ for $P_2(\mathbb{R})$ so that $[T]_\alpha$ is a diagonal matrix.
6.7 Exercises

6.1. Find the eigenvalues and eigenvectors for the given matrix, if they exist.
(1)
(3)
[_~ ~
l
(2)
[
311
1 - 33 ] ,
l
[! ~ ! ~], [~1 ~ ~ ~], [-! ~~ -! ~~], [i -1 ~ j]. (4)
1010
(5)
-i
III
(6)
-1
0 -1
2
0
0 2
1 .
n
6.2. Find the characteristic polynomial, eigenvalues and eigenvectors of the matrix
A = [ -2 0 3 2 4 -1 ].

6.3. Show that a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ has
(1) two distinct real eigenvalues if $(a - d)^2 + 4bc > 0$,
(2) one eigenvalue if $(a - d)^2 + 4bc = 0$,
(3) no real eigenvalues if $(a - d)^2 + 4bc < 0$,
(4) only real eigenvalues if it is symmetric (i.e., $b = c$).
6.4. Suppose that a $3 \times 3$ matrix $A$ has eigenvalues $-1, 0, 1$ with eigenvectors $\mathbf{u}, \mathbf{v}, \mathbf{w}$, respectively. Describe the null space $\mathcal{N}(A)$ and the column space $\mathcal{C}(A)$.

6.5. For any two matrices $A$ and $B$, show that
(1) $\operatorname{adj} AB = \operatorname{adj} B \cdot \operatorname{adj} A$;
(2) $\operatorname{adj} QAQ^{-1} = Q(\operatorname{adj} A)Q^{-1}$ for any invertible matrix $Q$;
(3) if $AB = BA$, then $(\operatorname{adj} A)B = B(\operatorname{adj} A)$.
(Hint: It was mentioned for any two invertible matrices $A$ and $B$ in Problem 2.14.)

6.6. If a $3 \times 3$ matrix $A$ has eigenvalues $1, 2, 3$, what are the eigenvectors of $B = (A - I)(A - 2I)(A - 3I)$?

6.7. Show that any $2 \times 2$ skew-symmetric nonzero matrix has no real eigenvalues.

6.8. Find a $3 \times 3$ matrix that has the eigenvalues $\lambda_1 = 1$, $\lambda_2 = 2$, $\lambda_3 = 3$ with the associated eigenvectors $\mathbf{x}_1 = (2, -1, 0)$, $\mathbf{x}_2 = (-1, 2, -1)$, $\mathbf{x}_3 = (0, -1, 2)$.

6.9. Let $P$ be the projection matrix that projects $\mathbb{R}^n$ onto a subspace $W$. Find the eigenvalues and the eigenspaces of $P$.

6.10. Let $\mathbf{u}, \mathbf{v}$ be $n \times 1$ column vectors, and let $A = \mathbf{u}\mathbf{v}^T$. Show that $\mathbf{u}$ is an eigenvector of $A$, and find the eigenvalues and the eigenvectors of $A$.

6.11. Show that if $\lambda$ is an eigenvalue of an idempotent $n \times n$ matrix $A$ (i.e., $A^2 = A$), then $\lambda$ must be either $0$ or $1$.

6.12. Prove that if $A$ is an idempotent matrix, then $\operatorname{tr}(A) = \operatorname{rank} A$.

6.13. Let $A = [a_{ij}]$ be an $n \times n$ matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$. Show that
$$\lambda_j = a_{jj} + \sum_{i \ne j} (a_{ii} - \lambda_i) \quad \text{for } j = 1, \ldots, n.$$

6.14. Prove that if two diagonalizable matrices $A$ and $B$ have the same eigenvectors (i.e., there exists an invertible matrix $Q$ such that both $Q^{-1}AQ$ and $Q^{-1}BQ$ are diagonal; such matrices $A$ and $B$ are said to be simultaneously diagonalizable), then $AB = BA$. In fact, the converse is also true. (See Exercise 7.17.) Prove the converse with the assumption that the eigenvalues of $A$ are all distinct.

6.15. Let $D : P_3(\mathbb{R}) \to P_3(\mathbb{R})$ be the differentiation defined by $Df(x) = f'(x)$ for $f \in P_3(\mathbb{R})$. Find all eigenvalues and eigenvectors of $D$ and of $D^2$.
6.16. Let $T : P_2(\mathbb{R}) \to P_2(\mathbb{R})$ be the linear transformation defined by
$$T(a_2 x^2 + a_1 x + a_0) = (a_0 + a_1)x^2 + (a_1 + a_2)x + (a_0 + a_2).$$
Find a basis for $P_2(\mathbb{R})$ with respect to which the matrix representation for $T$ is diagonal.
6.17. Determine whether or not each of the following matrices is diagonalizable. (1) [
i b -;],
-I
2
(2)
3
[i ~ ~] , 0
I
~ ~ ~]
(3) [
2
.
-2 0 -I
6.18. Find an orthogonal matrix Q and a diagonal matrix D such that Q T A Q (1) A
= [-;
4
-~ ~ ] , (2) A = [; 2 -3
6.19. Calculate A lOx for A =
[bo ~
=;] ,
6 -2
;
~],
(3) A =
0 2 3 x= [
[b
= D for
~ ~] .
0 1 1
~] . 7
6.20. For $n \ge 1$, let $a_n$ denote the number of subsets of $\{1, 2, \ldots, n\}$ that contain no consecutive integers. Find the number $a_n$ for all $n \ge 1$.

6.21. Find a general solution of each of the following recurrence relations.
(1) $x_n = 6x_{n-1} - 11x_{n-2} + 6x_{n-3}$, $n \ge 3$,
(2) $x_n = 3x_{n-1} - 4x_{n-2} + 2x_{n-3}$, $n \ge 3$,
(3) $x_n = 4x_{n-1} - 6x_{n-2} + 4x_{n-3} - x_{n-4}$, $n \ge 4$.
6.22.
= LetA = [0~6
0;3] . Find a value X so that A has an eigenvalue A = 1. For X() = (1,1),
calculate lim Xb where Xk
k-..oo 6.23 . Compute e A for
(1) A
= [~ ~
l
(2) A
= AXk-l , k = 1, 2, . ...
=
[i
~
l
6.24. In 2000, the initial status of the car owners in a city was reported as follows: 40% of the car owners drove large cars, 20% drove medium-sized cars, and 40% drove small cars. In 2005, 70% of the large-car owners in 2000 still owned large cars, but 30% had changed to a medium-sized car. Of those who owned medium-sized cars in 2000, 10% had changed to large cars, 70% continued to drive medium-sized cars, and 20% had changed to small cars. Finally, of those who owned small cars in 2000, 10% had changed to medium-sized cars and 90% still owned small cars in 2005. Assuming that these trends continue, and that no car owners are born, die or otherwise add realism to the problem, determine the percentage of car owners who will own cars of each size in 2035.

6.25. Let A = [~ ;]. (1) Compute $e^A$ directly from the expansion. (2) Compute $e^A$ by diagonalizing $A$.
(1) :t (A(t)3) ,
6.27. Solve y'
= Ay, where
(1)A=
-6 24 -1 8 [ 2 -12
(2) A
-:J
= [; -~]
I
and
y'
6.28. Solve the system
-
Y~:: Y3 =
with initial conditions Yl (0) 6.29 . Let f(A) (1) A
= det(Al -
= [; I
dt
m l
ODd y(l)
y(O) = [ Yl
~
Y2
3Yl
~
+ +
2Y3 4Y3
2Yl + Y2 = 0, Y2(0) = 2, Y3(0) = 1.
A) be the characteristic polynomial of A . Evaluate f(A) for
~ ~] , 1 3
~(A(t)-l).
(2) A
=[
~
-1
; 1
i]. 4
In fact, $f(A) = 0$ for any square matrix $A$ and its characteristic polynomial $f(\lambda)$. (This is the Cayley–Hamilton theorem.)
6.30. Determine whether the following statements are true or false, in general, and justify your answers.
(1) If $B$ is obtained from $A$ by interchanging two rows, then $B$ is similar to $A$.
(2) If $\lambda$ is an eigenvalue of $A$ of multiplicity $k$, then there exist $k$ linearly independent eigenvectors belonging to $\lambda$.
(3) If $A$ and $B$ are diagonalizable, so is $AB$.
(4) Every invertible matrix is diagonalizable.
(5) Every diagonalizable matrix is invertible.
(6) Interchanging the rows of a $2 \times 2$ matrix reverses the signs of its eigenvalues.
(7) A matrix $A$ cannot be similar to $A + I$.
(8) Each eigenvalue of $A + B$ is a sum of an eigenvalue of $A$ and one of $B$.
(9) The total sum of the eigenvalues of $A + B$ equals the sum of all the eigenvalues of $A$ and of those of $B$.
(10) A sum of two eigenvectors of $A$ is also an eigenvector of $A$.
(11) Any two similar matrices have the same eigenvectors.
(12) For any square matrix $A$, $\det e^A = e^{\det A}$.
7 Complex Vector Spaces
7.1 The n-space $\mathbb{C}^n$ and complex vector spaces
So far, we have been dealing with matrices having only real entries and vector spaces with real scalars. Also, in any system of linear (difference or differential) equations, we assumed that the coefficients of an equation are all real. However, for many applications of linear algebra, it is desirable to extend the scalars to complex numbers. For example, by allowing complex scalars, any polynomial of degree $n$ (even with complex coefficients) has $n$ complex roots counting multiplicity. (This is well known as the fundamental theorem of algebra.) By applying it to the characteristic polynomial of a matrix, one can say that every square matrix of order $n$ has $n$ eigenvalues counting multiplicity. For instance, the matrix $A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$ has no real eigenvalues, but it has two complex eigenvalues $\lambda = 1 \pm i$. Thus, it is indispensable to work with complex numbers to find the full set of eigenvalues and eigenvectors. Therefore, it is natural to extend the concept of real vector spaces to that of complex vector spaces, and develop the basic properties of complex vector spaces.

The complex $n$-space $\mathbb{C}^n$ is the set of all ordered $n$-tuples $(z_1, z_2, \ldots, z_n)$ of complex numbers:
$$\mathbb{C}^n = \{(z_1, z_2, \ldots, z_n) : z_i \in \mathbb{C},\ i = 1, 2, \ldots, n\},$$
and it is clearly a complex vector space with addition and scalar multiplication defined as follows:
$$(z_1, z_2, \ldots, z_n) + (z_1', z_2', \ldots, z_n') = (z_1 + z_1',\ z_2 + z_2',\ \ldots,\ z_n + z_n'),$$
$$k(z_1, z_2, \ldots, z_n) = (kz_1, kz_2, \ldots, kz_n) \quad \text{for } k \in \mathbb{C}.$$
The standard basis for the space $\mathbb{C}^n$ is again $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ as in the real case, but the scalars are now complex numbers, so that any vector $\mathbf{z}$ in $\mathbb{C}^n$ is of the form $\mathbf{z} = \sum_{k=1}^{n} z_k \mathbf{e}_k$ with $z_k = x_k + iy_k \in \mathbb{C}$, i.e., $\mathbf{z} = \mathbf{x} + i\mathbf{y}$ with $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$. In a complex vector space, linear combinations are defined in the same way as in the real case except that scalars are allowed to be complex numbers. Thus the same
is true for linear independence, spanning sets, bases, dimension, and subspaces. For complex matrices, whose entries are complex numbers, the matrix sum and product follow the same rules as for real matrices. The same is true for the concept of a linear transformation $T : V \to W$ from a complex vector space $V$ to a complex vector space $W$. The definitions of the kernel and the image of a linear transformation remain the same as those in the real case, as do the facts about null spaces, column spaces, matrix representations of linear transformations, similarity, and so on.

However, if we are concerned about the inner product, there should be a modification from the real case. Note that the absolute value (or modulus) of a complex number $z = x + iy$ is defined as the nonnegative real number $|z| = (\bar{z}z)^{1/2} = \sqrt{x^2 + y^2}$, where $\bar{z}$ is the complex conjugate of $z$. Accordingly, the length of a vector $\mathbf{z} = (z_1, z_2, \ldots, z_n)$ in the $n$-space $\mathbb{C}^n$ with $z_k = x_k + iy_k \in \mathbb{C}$ has to be modified: if one took an inner product in $\mathbb{C}^n$ as $\|\mathbf{z}\|^2 = z_1^2 + \cdots + z_n^2$, then the nonzero vector $(1, i)$ in $\mathbb{C}^2$ would have zero length: $1^2 + i^2 = 0$. In any case, a modified definition should coincide with the old definition when the vectors and matrices are real. The following is the definition of the usual inner product on the $n$-space $\mathbb{C}^n$.
Definition 7.1 For two vectors $\mathbf{u} = [u_1\ u_2\ \cdots\ u_n]^T$ and $\mathbf{v} = [v_1\ v_2\ \cdots\ v_n]^T$ in $\mathbb{C}^n$, $u_k, v_k \in \mathbb{C}$, the dot (or Euclidean inner) product $\mathbf{u} \cdot \mathbf{v}$ of $\mathbf{u}$ and $\mathbf{v}$ is defined by
$$\mathbf{u} \cdot \mathbf{v} = \bar{u}_1 v_1 + \bar{u}_2 v_2 + \cdots + \bar{u}_n v_n = \bar{\mathbf{u}}^T \mathbf{v},$$
where $\bar{\mathbf{u}} = [\bar{u}_1\ \bar{u}_2\ \cdots\ \bar{u}_n]^T$, the conjugate of $\mathbf{u}$. The Euclidean length (or magnitude) of a vector $\mathbf{u}$ in $\mathbb{C}^n$ is defined by
$$\|\mathbf{u}\| = \sqrt{\mathbf{u} \cdot \mathbf{u}} = \sqrt{|u_1|^2 + |u_2|^2 + \cdots + |u_n|^2},$$
where $|u_k|^2 = \bar{u}_k u_k$, and the distance between two vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{C}^n$ is defined by
$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|.$$
In an (abstract) complex vector space, one can also define an inner product by adopting the basic properties of the Euclidean inner product on $\mathbb{C}^n$ as axioms.
Definition 7.2 A (complex) inner product (or Hermitian inner product) on a complex vector space $V$ is a function that associates a complex number $\langle \mathbf{u}, \mathbf{v} \rangle$ with each pair of vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$ in such a way that the following rules are satisfied: For all vectors $\mathbf{u}, \mathbf{v}$ and $\mathbf{w}$ in $V$ and all scalars $k$ in $\mathbb{C}$,
(1) $\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}$ (conjugate symmetry),
(2) $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$ (additivity),
(3) $\langle k\mathbf{u}, \mathbf{v} \rangle = \bar{k}\langle \mathbf{u}, \mathbf{v} \rangle$ (antilinearity),
(4) $\langle \mathbf{v}, \mathbf{v} \rangle \ge 0$, and $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ if and only if $\mathbf{v} = \mathbf{0}$ (positive definiteness).

A complex vector space together with an inner product is called a complex inner product space or a unitary space. In particular, the $n$-space $\mathbb{C}^n$ with the dot product is called the Euclidean (complex) $n$-space.

The following properties are immediate from the definition of an inner product:
(5) $\langle \mathbf{0}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{0} \rangle = 0$,
(6) $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$,
(7) $\langle \mathbf{u}, k\mathbf{v} \rangle = k\langle \mathbf{u}, \mathbf{v} \rangle$.
Remark: There is another way to define an inner product on a complex vector space. If we redefine the dot product $\mathbf{u} \cdot \mathbf{v}$ on the $n$-space $\mathbb{C}^n$ by
$$\mathbf{u} \cdot \mathbf{v} = u_1\bar{v}_1 + u_2\bar{v}_2 + \cdots + u_n\bar{v}_n,$$
then the third rule in Definition 7.2 should be modified to be
$$(3')\ \langle \mathbf{u}, k\mathbf{v} \rangle = \bar{k}\langle \mathbf{u}, \mathbf{v} \rangle, \quad \text{so that} \quad \langle k\mathbf{u}, \mathbf{v} \rangle = k\langle \mathbf{u}, \mathbf{v} \rangle.$$
But these two different definitions do not induce any essential difference in a complex vector space. In a complex inner product space, as in the real case, the length (or magnitude) of a vector $\mathbf{u}$ and the distance between two vectors $\mathbf{u}$ and $\mathbf{v}$ are defined by
$$\|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{\frac{1}{2}}, \qquad d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|,$$
respectively.

Example 7.1 (A complex inner product on a function space) Let $C_{\mathbb{C}}[a, b]$ denote the set of all complex-valued continuous functions defined on $[a, b]$. Thus an element in $C_{\mathbb{C}}[a, b]$ is of the form $f(x) = f_1(x) + if_2(x)$, where $f_1(x)$ and $f_2(x)$ are real-valued and continuous on $[a, b]$. Note that $f$ is continuous if and only if each component function $f_i$ is continuous. Clearly, the set $C_{\mathbb{C}}[a, b]$ is a complex vector space under the sum and scalar multiplication of functions. For a vector $f(x) = f_1(x) + if_2(x)$ in $C_{\mathbb{C}}[a, b]$, its integral is defined as follows:
$$\int_a^b f(x)\,dx = \int_a^b [f_1(x) + if_2(x)]\,dx = \int_a^b f_1(x)\,dx + i\int_a^b f_2(x)\,dx.$$
It is an elementary exercise to show that, for vectors $f(x) = f_1(x) + if_2(x)$ and $g(x) = g_1(x) + ig_2(x)$ in the complex vector space $C_{\mathbb{C}}[a, b]$, the following formula defines an inner product on $C_{\mathbb{C}}[a, b]$:
$$\langle f, g \rangle = \int_a^b \overline{f(x)}\, g(x)\,dx = \int_a^b [f_1(x) - if_2(x)][g_1(x) + ig_2(x)]\,dx = \int_a^b [f_1(x)g_1(x) + f_2(x)g_2(x)]\,dx + i\int_a^b [f_1(x)g_2(x) - f_2(x)g_1(x)]\,dx. \qquad\square$$
Problem 7.1 Show that the Euclidean inner product on $\mathbb{C}^n$ satisfies all the inner product axioms.
The definitions of such terms as orthogonal sets, orthogonal complements, orthonormal sets, and orthonormal bases remain the same in complex inner product spaces as in real inner product spaces. Moreover, the Gram–Schmidt orthogonalization is still valid in complex inner product spaces, and can be used to convert an arbitrary basis into an orthonormal basis. If $V$ is an $n$-dimensional complex vector space, then by taking an orthonormal basis for $V$, there is a natural isometry from $V$ to $\mathbb{C}^n$ that preserves the inner product, as in the real case. Hence, without loss of generality, one may work only in $\mathbb{C}^n$ with the Euclidean inner product, and we use $\cdot$ and $\langle\ ,\ \rangle$ interchangeably.

On the other hand, one may consider the set $\mathbb{C}^n$ as a real vector space by defining addition and scalar multiplication as
$$(z_1, z_2, \ldots, z_n) + (z_1', z_2', \ldots, z_n') = (z_1 + z_1',\ z_2 + z_2',\ \ldots,\ z_n + z_n'),$$
$$r(z_1, z_2, \ldots, z_n) = (rz_1, rz_2, \ldots, rz_n) \quad \text{for } r \in \mathbb{R}.$$
The two vectors $\mathbf{e}_1 = (1, 0, \ldots, 0)$ and $i\mathbf{e}_1 = (i, 0, \ldots, 0)$ are linearly dependent when the space $\mathbb{C}^n$ is considered as a complex vector space. However, they are linearly independent if $\mathbb{C}^n$ is considered as a real vector space. In general,
$$\{\mathbf{e}_1, \ldots, \mathbf{e}_n,\ i\mathbf{e}_1, \ldots, i\mathbf{e}_n\}$$
forms a basis for $\mathbb{C}^n$ considered as a real vector space. In this way, $\mathbb{C}^n$ is naturally identified with the $2n$-dimensional real vector space $\mathbb{R}^{2n}$. That is, $\dim \mathbb{C}^n = n$ when $\mathbb{C}^n$ is considered as a complex vector space, but $\dim \mathbb{C}^n = 2n$ when $\mathbb{C}^n$ is considered as a real vector space. Note that when $\mathbb{C}^n$ is considered as a $2n$-dimensional real vector space, the space $\mathbb{R}^n = \{(x_1, x_2, \ldots, x_n) : x_i \in \mathbb{R}\}$ is a subspace of $\mathbb{C}^n$, but not when $\mathbb{C}^n$ is considered as an $n$-dimensional complex vector space.
Example 7.2 (Gram–Schmidt orthogonalization on a complex vector space) Consider the complex vector space $\mathbb{C}^3$ with the Euclidean inner product. Apply the Gram–Schmidt orthogonalization to convert the basis $\mathbf{x}_1 = (i, i, i)$, $\mathbf{x}_2 = (0, i, i)$, $\mathbf{x}_3 = (0, 0, i)$ into an orthonormal basis.

Solution: Step 1: Set
$$\mathbf{u}_1 = \frac{\mathbf{x}_1}{\|\mathbf{x}_1\|} = \frac{(i, i, i)}{\sqrt{3}} = \left( \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}} \right).$$
Step 2: Let $W_1$ denote the subspace spanned by $\mathbf{u}_1$. Then
$$\mathbf{x}_2 - \operatorname{Proj}_{W_1}\mathbf{x}_2 = \mathbf{x}_2 - \langle \mathbf{u}_1, \mathbf{x}_2 \rangle \mathbf{u}_1 = (0, i, i) - \frac{2}{\sqrt{3}}\left( \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}} \right) = \left( -\frac{2i}{3}, \frac{i}{3}, \frac{i}{3} \right).$$
Therefore,
$$\mathbf{u}_2 = \frac{\mathbf{x}_2 - \operatorname{Proj}_{W_1}\mathbf{x}_2}{\|\mathbf{x}_2 - \operatorname{Proj}_{W_1}\mathbf{x}_2\|} = \frac{3}{\sqrt{6}}\left( -\frac{2i}{3}, \frac{i}{3}, \frac{i}{3} \right) = \left( -\frac{2i}{\sqrt{6}}, \frac{i}{\sqrt{6}}, \frac{i}{\sqrt{6}} \right).$$
Step 3: Let $W_2$ denote the subspace spanned by $\{\mathbf{u}_1, \mathbf{u}_2\}$. Then
$$\mathbf{x}_3 - \operatorname{Proj}_{W_2}\mathbf{x}_3 = \mathbf{x}_3 - \langle \mathbf{u}_1, \mathbf{x}_3 \rangle \mathbf{u}_1 - \langle \mathbf{u}_2, \mathbf{x}_3 \rangle \mathbf{u}_2 = (0, 0, i) - \frac{1}{\sqrt{3}}\left( \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}} \right) - \frac{1}{\sqrt{6}}\left( -\frac{2i}{\sqrt{6}}, \frac{i}{\sqrt{6}}, \frac{i}{\sqrt{6}} \right) = \left( 0, -\frac{i}{2}, \frac{i}{2} \right).$$
Therefore,
$$\mathbf{u}_3 = \sqrt{2}\left( 0, -\frac{i}{2}, \frac{i}{2} \right) = \left( 0, -\frac{i}{\sqrt{2}}, \frac{i}{\sqrt{2}} \right).$$
Thus,
$$\mathbf{u}_1 = \left( \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}} \right), \quad \mathbf{u}_2 = \left( -\frac{2i}{\sqrt{6}}, \frac{i}{\sqrt{6}}, \frac{i}{\sqrt{6}} \right), \quad \mathbf{u}_3 = \left( 0, -\frac{i}{\sqrt{2}}, \frac{i}{\sqrt{2}} \right)$$
form an orthonormal basis for $\mathbb{C}^3$. $\square$
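The steps of Example 7.2 can be replayed numerically with Python's built-in complex numbers (an added sketch, not part of the text):

```python
# Gram-Schmidt in C^3 with the Hermitian inner product <u, v> = sum conj(u_k) v_k.
def hdot(u, v):
    return sum(uk.conjugate() * vk for uk, vk in zip(u, v))

def normalize(u):
    n = abs(hdot(u, u)) ** 0.5
    return [uk / n for uk in u]

xs = [[1j, 1j, 1j], [0, 1j, 1j], [0, 0, 1j]]
us = []
for x in xs:
    w = x[:]
    for u in us:                      # subtract projections onto earlier u's
        c = hdot(u, x)
        w = [wk - c * uk for wk, uk in zip(w, u)]
    us.append(normalize(w))

# Orthonormality: <u_i, u_j> should be the Kronecker delta.
for i in range(3):
    for j in range(3):
        val = hdot(us[i], us[j])
        assert abs(val - (1.0 if i == j else 0.0)) < 1e-12
print("orthonormal basis of Example 7.2 recovered")
```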
Example 7.3 (An orthonormal set in the complex-valued function space $C_{\mathbb{C}}[0, 2\pi]$) Let $C_{\mathbb{C}}[0, 2\pi]$ be the complex vector space with the inner product given in Example 7.1, and let $W$ be the set of vectors in $C_{\mathbb{C}}[0, 2\pi]$ of the form $e^{ikx} = \cos kx + i\sin kx$, where $k$ is an integer. The set $W$ is orthogonal. In fact, if $g_k(x) = e^{ikx}$ and $g_\ell(x) = e^{i\ell x}$ are vectors in $W$, then
$$\langle g_k, g_\ell \rangle = \int_0^{2\pi} \overline{e^{ikx}}\, e^{i\ell x}\,dx = \int_0^{2\pi} e^{-ikx}e^{i\ell x}\,dx = \int_0^{2\pi} e^{i(\ell-k)x}\,dx = \int_0^{2\pi} \cos(\ell-k)x\,dx + i\int_0^{2\pi} \sin(\ell-k)x\,dx.$$
If $k \ne \ell$, this equals
$$\left[ \frac{1}{\ell-k}\sin(\ell-k)x \right]_0^{2\pi} + i\left[ -\frac{1}{\ell-k}\cos(\ell-k)x \right]_0^{2\pi} = 0,$$
while if $k = \ell$, it equals $\int_0^{2\pi} dx = 2\pi$. Thus, the vectors in $W$ are orthogonal, and each vector has length $\sqrt{2\pi}$. By normalizing each vector in the orthogonal set $W$, one can get an orthonormal set. Therefore, the vectors
$$f_k(x) = \frac{1}{\sqrt{2\pi}}\,e^{ikx}, \quad k = 0, \pm 1, \pm 2, \ldots$$
form an orthonormal set in the complex vector space $C_{\mathbb{C}}[0, 2\pi]$. $\square$
Problem 7.2 Prove that in a complex inner product space $V$,
(1) $|\langle \mathbf{x}, \mathbf{y} \rangle|^2 \le \langle \mathbf{x}, \mathbf{x} \rangle \langle \mathbf{y}, \mathbf{y} \rangle$ (Cauchy–Schwarz inequality),
(2) $\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$ (triangle inequality),
(3) $\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2$ if $\mathbf{x}$ and $\mathbf{y}$ are orthogonal (Pythagorean theorem).
The definitions of eigenvalues and eigenvectors in a complex vector space are the same as in the real case, but the eigenvalues can now be complex numbers. Hence, for any n × n (real or complex) matrix A, the characteristic polynomial det(λI − A) always has n complex roots (i.e., eigenvalues), counting multiplicities. For example, consider a rotation matrix

A = [ cos θ  −sin θ ]
    [ sin θ   cos θ ]

with real entries. This matrix has two complex eigenvalues for any θ ∈ ℝ, but no real eigenvalues unless θ = kπ for an integer k. Therefore, all theorems and corollaries in Chapter 6 regarding eigenvalues and eigenvectors remain true without requiring the existence of n eigenvalues explicitly, and exactly the same proofs as in the real case are valid, since the arguments in the proofs are not concerned with what the scalars are. For example, one has theorems like "for an n × n matrix A, the eigenvectors belonging to distinct eigenvalues are linearly independent", and "if the n eigenvalues of A are distinct, then the eigenvectors belonging to them form a basis for ℂⁿ, so that A is diagonalizable".
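As a quick numerical illustration (a sketch of ours, not from the book), the eigenvalues of the rotation matrix are the roots of λ² − (2 cos θ)λ + 1 = 0, namely cos θ ± i sin θ = e^{±iθ}:

```python
import cmath, math

def rotation_eigenvalues(theta):
    # Characteristic polynomial of [[cos t, -sin t], [sin t, cos t]]:
    # λ² − (2 cos t)λ + 1 = 0, so λ = cos t ± i sin t.
    tr, det = 2 * math.cos(theta), 1.0
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

l1, l2 = rotation_eigenvalues(math.pi / 3)
print(abs(l1 - cmath.exp(1j * math.pi / 3)) < 1e-12)    # True: λ₁ = e^{iθ}
print(abs(l2 - cmath.exp(-1j * math.pi / 3)) < 1e-12)   # True: λ₂ = e^{−iθ}
```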
An n × n real matrix A can be considered as a linear transformation on both ℝⁿ and ℂⁿ:

T : ℝⁿ → ℝⁿ defined by T(x) = Ax,
S : ℂⁿ → ℂⁿ defined by S(x) = Ax.
Since the entries are all real, the coefficients of the characteristic polynomial f(λ) = det(λI − A) of A are all real. Thus, if λ is a root of f(λ) = 0, then its conjugate λ̄ is also a root, because f(λ̄) = \overline{f(λ)} = 0. In other words, if λ is an eigenvalue of a real matrix A, then λ̄ is also an eigenvalue. In particular, any n × n real matrix A has at least one real eigenvalue if n is odd. Moreover, if x is an eigenvector belonging to a complex eigenvalue λ, then the complex conjugate x̄ is an eigenvector belonging to λ̄. In fact, if Ax = λx with x ≠ 0, then

A x̄ = \overline{Ax} = \overline{λx} = λ̄ x̄,

where x̄ denotes the vector whose entries are the complex conjugates of the corresponding entries of x. Using this fact, the following example shows that any 2 × 2 matrix with no real eigenvalues is similar to a scalar multiple of a rotation.

Example 7.4 Show that if A is a 2 × 2 real matrix having no real eigenvalues, then A is similar to a matrix of the form

[  r cos θ   r sin θ ]
[ −r sin θ   r cos θ ].
Solution: Let A be a 2 × 2 real matrix having no real eigenvalues, and let λ = a + ib and λ̄ = a − ib with a, b ∈ ℝ and b ≠ 0 be the two complex eigenvalues of A, with associated eigenvectors x = u + iv and x̄ = u − iv with u, v ∈ ℝ², respectively. It follows immediately that

u = ½(x + x̄),   v = −(i/2)(x − x̄),   a = ½(λ + λ̄),   b = −(i/2)(λ − λ̄).

Since λ ≠ λ̄, the eigenvectors x and x̄ are linearly independent in the complex vector space ℂ², as they are when ℂ² is considered as a real vector space. It implies that the vectors x + x̄ and x − x̄ are linearly independent in the real vector space ℂ² (see Problem 7.3 below), so that the real vectors u and v are linearly independent in the subspace ℝ² of the real vector space ℂ². Thus α = {u, v} is a basis for the real vector space ℝ², and
Au = A · ½(x + x̄) = ½(Ax + A x̄) = ½(λx + λ̄ x̄) = λ (u + iv)/2 + λ̄ (u − iv)/2 = au − bv.
Similarly, one can get Av = bu + av, implying that the matrix representation of the linear transformation A : ℝ² → ℝ² with respect to the basis α is

[A]_α = [  a  b ]
        [ −b  a ].

That is, any 2 × 2 matrix that has no real eigenvalues is similar to a matrix of this form. Now, by setting r = √(a² + b²) > 0, one can get a = r cos θ and b = r sin θ for some θ ∈ ℝ, so

[A]_α = [  r cos θ   r sin θ ]
        [ −r sin θ   r cos θ ]. □
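A concrete check of this change of basis, with a hypothetical matrix of ours (not one from the book): A = [[1, −2], [1, −1]] has eigenvalues ±i (so a = 0, b = 1, r = 1) and eigenvector x = (2, 1 − i) = u + iv with u = (2, 1), v = (0, −1); the basis α = {u, v} should turn A into the pure rotation [[0, 1], [−1, 0]].

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inv2(M):
    # Inverse of a 2×2 matrix via the adjugate formula.
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, -2], [1, -1]]
Q = [[2, 0], [1, -1]]          # columns u = (2, 1) and v = (0, -1)
R = matmul(inv2(Q), matmul(A, Q))
print(R)   # [[0.0, 1.0], [-1.0, 0.0]]  — a rotation with r = 1
```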
Problem 7.3 Let x and y be two vectors in a vector space V . Show that x and y are linearly independent if and only if x + y and x - y are linearly independent.
Problem 7.4 Find the eigenvalues and the eigenvectors of

(1) [ 1  ~  ~ ]        (2) [ −i    2    0 ]
    [ 2  0  1 ]            [  1    0   −i ]
    [ 0  i  0 ],           [ 1−i   0    1 ].

Problem 7.5 Prove that an n × n complex matrix A is diagonalizable if and only if A has n linearly independent eigenvectors in the complex vector space ℂⁿ.
7.2 Hermitian and unitary matrices
Recall that the dot product of real vectors x, y ∈ ℝⁿ is given in matrix form by x · y = xᵀy. For complex vectors u, v ∈ ℂⁿ, the Euclidean inner product is defined by u · v = ū₁v₁ + ⋯ + ūₙvₙ = ūᵀv, which involves the conjugate transpose, not just the transpose.

Definition 7.3 For a complex matrix A, its complex conjugate transpose A^H = Āᵀ is called the adjoint of A.

Note that Ā is the matrix whose entries are the complex conjugates of the corresponding entries in A. Thus, [a_{ij}]^H = [ā_{ji}]. With this notation, the Euclidean inner product on ℂⁿ can be written as

u · v = ūᵀv = u^H v.

Problem 7.6 Show that (AB)^H = B^H A^H when AB can be defined.

Problem 7.7 Prove that if A is invertible, so is A^H, and (A^H)^{−1} = (A^{−1})^H.
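A small sketch of ours (helper names `adjoint` and `matmul` are assumptions) of the conjugate transpose and the identity (AB)^H = B^H A^H of Problem 7.6, checked on arbitrary sample matrices:

```python
def adjoint(M):
    # Conjugate transpose: (M^H)[j][i] = conj(M[i][j]).
    rows, cols = len(M), len(M[0])
    return [[complex(M[i][j]).conjugate() for i in range(rows)] for j in range(cols)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1 + 2j, 3], [0, 4 - 1j]]
B = [[2j, 1 - 1j], [5, 1j]]
print(adjoint(matmul(A, B)) == matmul(adjoint(B), adjoint(A)))   # True
```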
For complex matrices, the notions of symmetric and skew-symmetric real matrices are replaced by Hermitian and skew-Hermitian matrices, respectively.

Definition 7.4 A complex square matrix A is said to be Hermitian (or self-adjoint) if A^H = A, or skew-Hermitian if A^H = −A.

For the matrices

A = [ 4  −i ]        B = [  i    −1+i ]
    [ i   4 ],           [ 1+i     i  ],

one can see that A is Hermitian and B is skew-Hermitian. A Hermitian matrix with real entries is just a real symmetric matrix, and conversely, any real symmetric matrix is Hermitian. Like real matrices, any m × n (complex) matrix A can be considered as a linear transformation from ℂⁿ to ℂᵐ, and

(Ax) · y = (Ax)^H y = x^H A^H y = x · (A^H y)

for any x ∈ ℂⁿ and y ∈ ℂᵐ. The following theorem lists some important properties of Hermitian matrices.
Theorem 7.1 Let A be a Hermitian matrix.
(1) For any (complex) vector x ∈ ℂⁿ, x^H A x is real.
(2) All (complex) eigenvalues of A are real. In particular, an n × n real symmetric matrix has precisely n real eigenvalues.
(3) The eigenvectors of A belonging to distinct eigenvalues are mutually orthogonal.

Proof: (1) Since x^H A x is a 1 × 1 matrix, \overline{x^H A x} = (x^H A x)^H = x^H A^H x = x^H A x.
(2) If Ax = λx, then x^H A x = x^H λx = λ x^H x = λ‖x‖². The left-hand side is real, and ‖x‖² is real and positive because x ≠ 0. Therefore, λ must be real.
(3) Let x and y be eigenvectors of A belonging to the eigenvalues λ and μ, respectively, with λ ≠ μ. Because A = A^H and λ is real, it follows that

λ(x · y) = (λx) · y = (Ax) · y = x · (Ay) = x · (μy) = μ(x · y).

Since λ ≠ μ, this gives x · y = x^H y = 0, i.e., x is orthogonal to y. □
In particular, eigenvectors belonging to distinct eigenvalues of a real symmetric matrix are orthogonal.
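These two properties can be sketched numerically for a sample Hermitian matrix (the code and helper arrangement are ours, not the book's): x^H A x comes out real for any sample x, and both eigenvalues, the roots of λ² − tr(A)λ + det(A) = 0, come out real.

```python
import cmath

A = [[2, 1 - 1j], [1 + 1j, 1]]     # a sample Hermitian matrix
x = [1 + 3j, 2 - 1j]               # an arbitrary complex vector

Ax = [sum(A[i][k] * x[k] for k in range(2)) for i in range(2)]
xHAx = sum(x[i].conjugate() * Ax[i] for i in range(2))
print(xHAx.imag == 0.0)            # True: x^H A x is real

tr = A[0][0] + A[1][1]                        # trace = 3
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # det = 2 − |1+i|² = 0
disc = cmath.sqrt(tr * tr - 4 * det)
e1, e2 = (tr + disc) / 2, (tr - disc) / 2
print(abs(e1 - 3) < 1e-12 and abs(e2) < 1e-12)   # True: eigenvalues 3 and 0, both real
```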
Remark: Condition (1) in Theorem 7.1 (i.e., x^H A x is real for any complex vector x ∈ ℂⁿ) is equivalent to saying that the diagonal entries of A are real:

x^H A x = [x̄₁ ⋯ x̄ₙ] [a_{ij}] [x₁ ⋮ xₙ] = Σ_{i,j} a_{ij} x̄_i x_j = Σ_i a_{ii} |x_i|² + C + C̄,

where C = Σ_{i<j} a_{ij} x̄_i x_j. Since C + C̄ is real, all a_{ii} ∈ ℝ if and only if x^H A x ∈ ℝ.
Problem 7.8 Prove that the determinant of any Hermitian matrix is real.

Problem 7.9 Let x be a nonzero vector in the complex vector space ℂⁿ, and let A = xx^H. Show that A is Hermitian, and find all the eigenvalues and their eigenspaces for A.
It is easy to see that if A is Hermitian, then the matrix i A is skew-Hermitian; similarly, if A is skew-Hermitian, then i A is Hermitian. Therefore, the following theorem is a direct consequence of this fact and Theorem 7.1. The proof is left for an exercise.
Theorem 7.2 Let A be a skew-Hermitian matrix.
(1) For any complex vector x ≠ 0, x^H A x is purely imaginary, and the diagonal entries of A are purely imaginary.
(2) All eigenvalues of A are purely imaginary. In particular, an n × n real skew-symmetric matrix has n purely imaginary eigenvalues.
(3) The eigenvectors of A belonging to distinct eigenvalues are mutually orthogonal.
Problem 7.10 Prove Theorem 7.2 by using Theorem 7.1, and prove (3) directly.

Problem 7.11 Show that A = B + iC (B and C real matrices) is skew-Hermitian if and only if B is skew-symmetric and C is symmetric.

Problem 7.12 Let A and B be either both Hermitian or both skew-Hermitian. Show that
(1) AB is Hermitian if and only if AB = BA;
(2) AB is skew-Hermitian if and only if AB = −BA.
Recall that a square matrix Q with real entries is orthogonal if its column vectors are orthonormal (i.e., QᵀQ = I). The same is true for complex matrices (compare with Lemma 5.18).

Lemma 7.3 For a complex square matrix U, the following are equivalent:
(1) the column vectors of U are orthonormal;
(2) U^H U = I;
(3) U^{−1} = U^H;
(4) U U^H = I;
(5) the row vectors of U are orthonormal.
Like a real orthogonal matrix, any unitary matrix preserves the lengths of vectors. Theorem 7.4 Let U be an n x n unitary matrix. (1) U preserves the dot product on
en: i.e., for all x and y in en,
(2) If A is an eigenvalue ofU, then IAI = 1. (3) The eigenvectors ofU belonging to distinct eigenvalues are mutually orthogonal.
Proof: (1) (Ux)^H(Uy) = x^H U^H U y = x^H y.
(2) For Ux = λx with x ≠ 0, x^H x = (Ux)^H(Ux) = |λ|² x^H x, so |λ| = 1.
(3) Let Ux = λx, Uy = μy, and λ ≠ μ. Since U is unitary, we have λλ̄ = 1 = μμ̄, and U^{−1}y = μ^{−1}y = μ̄y. Therefore,

λ̄ x^H y = (λx)^H y = (Ux)^H y = x^H U^{−1} y = x^H (μ̄ y) = μ̄ x^H y

holds, and λ ≠ μ implies x^H y = 0. □
From the same argument as in the proof of Theorem 5.19, U preserves the dot product if and only if it preserves the lengths of vectors: ‖Ux‖ = ‖x‖ for all x in ℂⁿ. Thus, a unitary matrix is an isometry.
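The isometry property can be sketched with a sample unitary matrix of our choosing (not one from the book), U = (1/√2)[[1, 1], [i, −i]]:

```python
import math

s = 1 / math.sqrt(2)
U = [[s, s], [1j * s, -1j * s]]    # columns are orthonormal, so U is unitary
x = [3 - 1j, 2 + 5j]

Ux = [sum(U[i][k] * x[k] for k in range(2)) for i in range(2)]
norm = lambda v: math.sqrt(sum(abs(z) ** 2 for z in v))
print(abs(norm(Ux) - norm(x)) < 1e-9)   # True: ‖Ux‖ = ‖x‖
```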
Theorem 7.5 A basis-change matrix from one orthonormal basis to another in a complex vector space is unitary.

Proof: Let α = {v₁, ..., vₙ} and β = {w₁, ..., wₙ} be two orthonormal bases, and let Q = [q_{ij}] be the basis-change matrix from the basis β to the basis α. By definition,

w_j = Σ_{i=1}^n q_{ij} v_i.

Thus,

δ_{ij} = ⟨w_i, w_j⟩ = ⟨ Σ_{k=1}^n q_{ki} v_k, Σ_{l=1}^n q_{lj} v_l ⟩ = Σ_{k=1}^n q̄_{ki} q_{kj} = Σ_{k=1}^n [Q^H]_{ik} [Q]_{kj} = [Q^H Q]_{ij}.

This means that the columns of Q are orthonormal and Q is unitary. □
Just as in the real case, it is true that two matrices representing the same linear transformation on a complex vector space with respect to different bases are similar. If the two bases are both orthonormal, then the basis-change matrix is unitary (or orthogonal).

Problem 7.13 Show that |det U| = 1 for any unitary matrix U.

Problem 7.14 Show that

A = (1/2) [ 1+i  −1+i ]
          [ 1+i   1−i ]

is unitary but neither Hermitian nor skew-Hermitian.
Problem 7.15 Show that the adjoint of a unitary matrix is unitary, and the product of two unitary matrices is unitary. Problem 7.16 Describe all 3 x 3 matrices that are simultaneously Hermitian, unitary, and diagonal. How many are there?
7.3 Unitarily diagonalizable matrices

In the previous section, it was shown that if an n × n square matrix A is Hermitian, skew-Hermitian or unitary, then the eigenvectors belonging to distinct eigenvalues are mutually orthogonal. Hence, if such a matrix A has n distinct eigenvalues, then there exists an orthonormal basis α for ℂⁿ consisting of eigenvectors of A, so that the matrix representation [A]_α is diagonal, i.e., A is diagonalizable by a unitary matrix. In this section, it will be shown that any Hermitian, skew-Hermitian or unitary matrix has n orthonormal eigenvectors even if the eigenvalues are not all distinct. In particular, it is always diagonalizable by a unitary matrix.
Definition 7.6 (1) Two real matrices A and B are orthogonally similar if there exists an orthogonal matrix P such that P^{−1} A P = B. A matrix is orthogonally diagonalizable if it is orthogonally similar to a diagonal matrix.
(2) Two complex matrices A and B are unitarily similar if there exists a unitary matrix U such that U^{−1} A U = B. A matrix is unitarily diagonalizable if it is unitarily similar to a diagonal matrix.

We begin with a classical theorem due to Schur (1909) concerning orthogonal and unitary similarity.
ues, then A is orthogonallysimilar to an upper triangularmatrix. (2) Every n x n complexmatrix is unitarilysimilar to an upper triangularmatrix. Proof: We prove only the second assertion (2) by mathematical induction on n, because (1) can be done in a similar way. Clearly, it is true for n 1. Assume now that the assertion (2) holds for n = r - 1. Let A be any r x r complex matrix and let)", be an eigenvalue of A with a normalized eigenvector x. Extend it to an orthonormal basis by the Gram-Schmidt orthogonalization, say {x, U2, .. • , u,} for C", Set a unitary matrix U, = [x U2 ... u, 1 with these basis vectors as its columns. A direct computation of the product AU, shows
=
o;'
Ui'AU, =
=
[
uf AU, = Uf[Ax AU2 .. . Aurl
---
iT
u-T 2
u-T r
A, =
0 0
I
+
~~H+ *
I
I I
B
I
AU2
I
...
+J
where B is an (r − 1) × (r − 1) matrix. By the induction hypothesis there exists an (r − 1) × (r − 1) unitary matrix U₂ such that U₂^{−1} B U₂ is an upper triangular matrix with diagonal entries λ₂, λ₃, ..., λ_r. Define

U = U₁ [ 1   0 ]
       [ 0  U₂ ].

Then it is easy to check that U is also a unitary matrix, and U^{−1} A U is upper triangular. □

Schur's lemma is a cornerstone in the study of complex matrices.
Theorem 7.7 If A is either a Hermitian, a skew-Hermitian or a unitary matrix, then it is unitarily diagonalizable.

Proof: By Schur's lemma, U^H A U = B is an upper triangular matrix for some unitary matrix U. However,

B^H = (U^H A U)^H = U^H A^H U = U^H A U = B,   −U^H A U = −B,   or   U^H A^{−1} U = B^{−1},

where the right-hand sides of the equalities depend on whether A is Hermitian, skew-Hermitian or unitary. This means that the upper triangular matrix B is of the same type (Hermitian, skew-Hermitian or unitary) as A. Note that B^H is a lower triangular matrix and B^{−1} is an upper triangular matrix, because B is upper triangular. Therefore, the upper triangular matrix B must be a diagonal matrix in each of the cases Hermitian, skew-Hermitian and unitary. □

Note that, in the similarity condition U^{−1} A U (= U^H A U) = D of A to a diagonal matrix D through a unitary matrix U, the equation AU = UD shows that the column vectors of U constitute a set of n orthonormal eigenvectors of A, while the diagonal entries of D are eigenvalues of A, as shown in Theorem 6.7. Therefore, by Theorems 7.1, 7.2 and 7.4, all the diagonal entries of D are real, purely imaginary or of unit length depending on the type (Hermitian, skew-Hermitian or unitary, respectively) of the matrix A.
Example 7.5 (A Hermitian matrix is unitarily diagonalizable) Diagonalize the matrix

A = [  2    1−i ]
    [ 1+i    1  ]

by a unitary matrix.

Solution: Since A is Hermitian, it is unitarily diagonalizable. One can show that A has the eigenvalues λ₁ = 3 and λ₂ = 0 with associated eigenvectors x₁ = (1−i, 1) and x₂ = (−1, 1+i), respectively. Let

u₁ = x₁/‖x₁‖ = (1/√3)(1−i, 1)   and   u₂ = x₂/‖x₂‖ = (1/√3)(−1, 1+i),

and let

U = (1/√3) [ 1−i   −1  ]
           [  1    1+i ].

Then U is a unitary matrix and it diagonalizes A:

U^H A U = (1/3) [ 1+i  1 ; −1  1−i ] [ 2  1−i ; 1+i  1 ] [ 1−i  −1 ; 1  1+i ] = [ 3  0 ; 0  0 ]. □
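This diagonalization can be verified numerically (a sketch of ours; helper names are assumptions):

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def adjoint(M):
    return [[M[j][i].conjugate() for j in range(2)] for i in range(2)]

s = 1 / math.sqrt(3)
A = [[2, 1 - 1j], [1 + 1j, 1]]
U = [[s * (1 - 1j), -s], [s, s * (1 + 1j)]]

D = matmul(adjoint(U), matmul(A, U))
err = max(abs(D[i][j] - [[3, 0], [0, 0]][i][j]) for i in range(2) for j in range(2))
print(err < 1e-9)   # True: U^H A U ≈ diag(3, 0)
```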
Since all real symmetric matrices are Hermitian, they are unitarily diagonalizable by Theorem 7.7. However, the following theorem says more than that.

Theorem 7.8 For any n × n real matrix A, the following are equivalent:
(1) A is symmetric;
(2) A is orthogonally diagonalizable;
(3) A has a full set of n orthonormal eigenvectors.
Proof: (1) ⇒ (2): If A is real and symmetric, then it is a Hermitian matrix, so it has only real eigenvalues. By Schur's lemma 7.6, A is orthogonally similar to an upper triangular matrix, which must be already diagonal. Hence it is orthogonally diagonalizable.
(2) ⇒ (3): If A is diagonalized by an orthogonal matrix Q, then the column vectors of Q are eigenvectors of A. Hence A has a full set of n orthonormal eigenvectors.
(3) ⇒ (1): If A has a full set of n orthonormal eigenvectors, then these eigenvectors form an orthogonal basis-change matrix Q such that AQ = QD. It is now trivial to show that A = QDQ^{−1} = QDQᵀ is symmetric. □

Corollary 7.9 Let A be a real symmetric matrix, and let λ be an eigenvalue of A of multiplicity m_λ. Then dim E(λ) = dim N(λI − A) = m_λ.

By Theorem 7.8, all real symmetric matrices are always diagonalizable, even more, orthogonally. Moreover, they are all that can be "orthogonally" diagonalized. Even though not all matrices are diagonalizable, certain non-symmetric matrices may still have a full set of linearly independent eigenvectors so that they are diagonalizable, but in this case the eigenvectors cannot be orthogonal. That is, the basis-change matrix Q cannot be an orthogonal matrix. For example, any triangular matrix having all distinct diagonal entries is diagonalizable because its eigenvalues are all distinct, but it cannot be orthogonally diagonalizable if it is not diagonal (i.e., not symmetric).

Problem 7.17 Show that the non-symmetric matrix
A = [ 1  0  −1 ]
    [ 0  0   1 ]
    [ 0  0   2 ]

is diagonalizable, but not orthogonally.
Remark: The procedure for orthogonal diagonalization of a symmetric matrix A can be summarized as follows. Step 1 Find a basis for each eigenspace of A. Step 2 Apply the Gram-Schmidt orthogonalization to each of these bases to obtain an orthonormal basis for each eigenspace. Step 3 Form the matrix Q whose columns are the basis vectors constructed in Step 2; this matrix orthogonally diagonalizes A. The justification of this procedure should be clear, because eigenvectors belonging to distinct eigenvalues are orthogonal, while an application of the Gram-Schmidt orthogonalization assures that the eigenvectors obtained within the same eigenspace are orthonormal. Thus, the entire set of eigenvectors obtained by this procedure is orthonormal. Example 7.6 (A symmetric matrix is orthogonally diagonalizable) Find an orthogonal matrix Q that diagonalizes the symmetric matrix A=
[ 4  2  2 ]
[ 2  4  2 ]
[ 2  2  4 ].
Solution: The characteristic polynomial of A is

det(λI − A) = det [ λ−4  −2   −2  ]
                  [ −2   λ−4  −2  ] = (λ − 2)²(λ − 8).
                  [ −2   −2   λ−4 ]

Thus, the eigenvalues of A are λ = 2 and λ = 8. By the method used in Example 6.2, it can be shown that

x₁ = (−1, 1, 0)   and   x₂ = (−1, 0, 1)

form a basis for the eigenspace belonging to λ = 2. Applying the Gram-Schmidt orthogonalization to {x₁, x₂} yields the following orthonormal eigenvectors (verify):
u₁ = (1/√2)(−1, 1, 0)   and   u₂ = (1/√6)(−1, −1, 2).

The eigenspace belonging to λ = 8 has x₃ = (1, 1, 1) as a basis. The normalization of x₃ yields u₃ = (1/√3)(1, 1, 1). Finally, using u₁, u₂, and u₃ as column vectors, one can obtain

Q = [ −1/√2  −1/√6  1/√3 ]
    [  1/√2  −1/√6  1/√3 ]
    [   0     2/√6  1/√3 ],

which orthogonally diagonalizes A. (It is suggested that readers verify that Qᵀ A Q is actually a diagonal matrix.) □
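The suggested verification can be sketched numerically (our code, not the book's):

```python
import math

A = [[4, 2, 2], [2, 4, 2], [2, 2, 4]]
r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
Q = [[-1 / r2, -1 / r6, 1 / r3],
     [ 1 / r2, -1 / r6, 1 / r3],
     [ 0,       2 / r6, 1 / r3]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

QT = [list(col) for col in zip(*Q)]            # Q^T
D = matmul(QT, matmul(A, Q))
expected = [[2, 0, 0], [0, 2, 0], [0, 0, 8]]
err = max(abs(D[i][j] - expected[i][j]) for i in range(3) for j in range(3))
print(err < 1e-9)   # True: Q^T A Q ≈ diag(2, 2, 8)
```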
Example 7.7 (Diagonal, but neither Hermitian, skew-Hermitian, nor unitary) A matrix

A = [ 1  0  0 ]
    [ 0  i  0 ]
    [ 0  0  x ],   x ∈ ℝ with |x| ≠ 1,

is neither Hermitian, skew-Hermitian, nor unitary, but is a diagonal matrix. Hence, there are infinitely many unitarily diagonalizable matrices which are neither Hermitian, skew-Hermitian, nor unitary. □

Problem 7.18 For each of the matrices
(1) [ ~  ~ ],        (2) [ 2i  −1  −1 ]
                         [  i   0   0 ]
                         [  0  −i  2i ],

find a unitary matrix U such that U^{−1} A U is an upper triangular matrix.
7.4 Normal matrices

We have seen that Hermitian, skew-Hermitian and unitary matrices are all unitarily diagonalizable. However, it turns out that they do not constitute the entire class of unitarily diagonalizable matrices, whereas in the class of real matrices the real symmetric matrices are the only matrices with real entries that are orthogonally diagonalizable. That is, there are infinitely many unitarily diagonalizable matrices that belong to none of the above-mentioned classes of matrices (see Example 7.7). Actually, all unitarily diagonalizable matrices belong to the following class of matrices, called normal matrices.

Definition 7.7 A complex square matrix A is called normal if A A^H = A^H A.
7.4. Normal matrices
263
Note that all Hermitian, skew-Hermitian and unitary matrices are normal. But there are infinitely many matrices that are normal but are none of these, as shown in Example 7.7. Moreover, there exist such matrices which are not diagonal.

Example 7.8 (Normal, but neither Hermitian, skew-Hermitian, unitary, nor diagonal) The 2 × 2 matrix

is normal, but is neither Hermitian, skew-Hermitian, unitary, nor diagonal. However, one can easily check that this matrix is unitarily diagonalizable.
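A matrix of this kind can be sketched numerically; the matrix below is a hypothetical stand-in of ours (not necessarily the book's example): A = [[1, 1], [−1, 1]] is normal (A A^H = A^H A = 2I) yet neither Hermitian, skew-Hermitian, unitary, nor diagonal.

```python
def adjoint(M):
    return [[complex(M[j][i]).conjugate() for j in range(2)] for i in range(2)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

A = [[1, 1], [-1, 1]]      # hypothetical stand-in
AH = adjoint(A)
print(matmul(A, AH) == matmul(AH, A))                        # True: normal (both = 2I)
print(AH != A and AH != [[-z for z in row] for row in A])    # True: not (skew-)Hermitian
```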
Problem 7.19 Which of the following matrices are Hermitian, skew-Hermitian, unitary or normal?

(1) [ ~  ~ ],  (2) [ ~  ~ ],  (3) [ ~  ~ ],  (4) [ ~   3 ; −i  3 ],  (5) [ ~  ~ ],  (6) [ 3  1−i ; 1+i  3 ].
As a matter of fact, it will be shown that the normal matrices are exactly the unitarily diagonalizable matrices. We begin with a lemma.

Lemma 7.10 If an upper triangular matrix T is normal, then it must be a diagonal matrix.

Proof: Use induction on k, comparing the diagonal (k, k)-entries of both sides of T T^H = T^H T:

[ t₁₁ ⋯ t₁ₙ ; 0 ⋱ ⋮ ; 0 0 tₙₙ ] [ t̄₁₁ 0 0 ; ⋮ ⋱ 0 ; t̄₁ₙ ⋯ t̄ₙₙ ] = [ t̄₁₁ 0 0 ; ⋮ ⋱ 0 ; t̄₁ₙ ⋯ t̄ₙₙ ] [ t₁₁ ⋯ t₁ₙ ; 0 ⋱ ⋮ ; 0 0 tₙₙ ].

If k = 1, the equality of [T T^H]₁₁ = |t₁₁|² + ⋯ + |t₁ₙ|² and [T^H T]₁₁ = |t₁₁|² implies t₁₂ = ⋯ = t₁ₙ = 0. Inductively, assume that t_{i,i+1} = ⋯ = t_{i,n} = 0 has been shown for i = 1, ..., k − 1. Then

[T T^H]_{kk} = |t_{kk}|² + |t_{k,k+1}|² + ⋯ + |t_{k,n}|²

and

[T^H T]_{kk} = |t_{1k}|² + ⋯ + |t_{k−1,k}|² + |t_{kk}|² = |t_{kk}|²,

because t_{1k} = ⋯ = t_{k−1,k} = 0 by the induction hypothesis. Then T T^H = T^H T yields t_{k,k+1} = ⋯ = t_{k,n} = 0. It follows that t_{k,k+1} = ⋯ = t_{k,n} = 0 for all k = 1, ..., n, which shows that all the entries of T off the diagonal are zero, i.e., T is diagonal. □
Theorem 7.11 For any n × n complex matrix A, the following are equivalent:
(1) A is normal;
(2) A is unitarily diagonalizable;
(3) A has a full set of n orthonormal eigenvectors.

Proof: (1) ⇒ (2): Suppose that A is normal. By Schur's lemma, there exists a unitary matrix U such that T = U^H A U is an upper triangular matrix. Then T is also normal, since

T T^H = U^H A U U^H A^H U = U^H A A^H U = U^H A^H A U = U^H A^H U U^H A U = T^H T.

Thus, by Lemma 7.10, T is already diagonal, so that A is unitarily diagonalizable.
(2) ⇒ (3): It is clear that the columns of the basis-change matrix U are n orthonormal eigenvectors of A.
(3) ⇒ (1): Let U be the unitary matrix whose columns are the n orthonormal eigenvectors. Then AU = UD, or A = U D U^H, and

A A^H = U D U^H U D^H U^H = U D D^H U^H = U D^H D U^H = U D^H U^H U D U^H = A^H A.

That is, A is normal. □
Note that there exist infinitely many non-normal complex matrices that are still diagonalizable, but of course not unitarily. One can find such examples among the triangular matrices having distinct diagonal entries.

Recall that any n × n real matrix A can be written as the sum S + T of a symmetric matrix S = ½(A + Aᵀ) and a skew-symmetric matrix T = ½(A − Aᵀ). (See Problem 1.11.) The same kind of expression is also possible for a complex matrix. A complex matrix A can be written as the sum A = H₁ + iH₂, where

H₁ = ½(A + A^H),   H₂ = −(i/2)(A − A^H);   or   iH₂ = ½(A − A^H).

Clearly both H₁ and H₂ are Hermitian, and iH₂ is skew-Hermitian.

Problem 7.20 Show that the matrix A = [ ~  ~ ], x ∈ ℝ, is not normal, so it cannot be unitarily diagonalizable. But it is diagonalizable.
Problem 7.21 Determine whether or not the matrix

A = [ 1  i  i ]
    [ i  1  i ]
    [ i  i  1 ]

is unitarily diagonalizable. If it is, find a unitary matrix U that diagonalizes A.
Problem 7.22 Let H₁ and H₂ be two Hermitian matrices. Show that A = H₁ + iH₂ is normal if and only if H₁H₂ = H₂H₁.

Problem 7.23 For any unitarily diagonalizable matrix A, prove that
(1) A is Hermitian if and only if A has only real eigenvalues;
(2) A is skew-Hermitian if and only if A has only purely imaginary eigenvalues;
(3) A is unitary if and only if |λ| = 1 for any eigenvalue λ of A.
7.5 Application

7.5.1 The spectral theorem

As shown in the previous section, the normal matrices are the only matrices that can be unitarily diagonalized. That is, A is normal if and only if there exists a basis α for ℂⁿ consisting of orthonormal eigenvectors of A such that the matrix representation [A]_α of A with respect to α is diagonal.
Theorem 7.12 (Spectral theorem) Let A be a normal matrix, and let {u₁, u₂, ..., uₙ} be a set of orthonormal eigenvectors belonging to the eigenvalues λ₁, λ₂, ..., λₙ of A, respectively. Then A can be written as

A = U D U^H = λ₁ u₁u₁^H + λ₂ u₂u₂^H + ⋯ + λₙ uₙuₙ^H,

and uᵢuᵢ^H is the orthogonal projection matrix onto the subspace spanned by the eigenvector uᵢ for i = 1, ..., n.

Proof: Note that U = [u₁ u₂ ⋯ uₙ] is a unitary matrix that transforms A into a diagonal matrix D, i.e., U^{−1} A U = U^H A U = D. Then

A = U D U^H = [u₁ u₂ ⋯ uₙ] diag(λ₁, λ₂, ..., λₙ) [u₁^H ; u₂^H ; ⋯ ; uₙ^H]
  = λ₁ u₁u₁^H + λ₂ u₂u₂^H + ⋯ + λₙ uₙuₙ^H
  = λ₁P₁ + λ₂P₂ + ⋯ + λₙPₙ,
where

Pᵢ = uᵢuᵢ^H = [ u_{1i} ; ⋮ ; u_{ni} ] [ ū_{1i} ⋯ ū_{ni} ] = [ |u_{1i}|²  ⋯  u_{1i}ū_{ni} ; ⋮ ⋱ ⋮ ; u_{ni}ū_{1i}  ⋯  |u_{ni}|² ],

which is a Hermitian matrix. Now, for any x ∈ ℂⁿ and i, j = 1, ..., n,
Pᵢx = uᵢuᵢ^H x = ⟨uᵢ, x⟩uᵢ,
PᵢPⱼ = uᵢuᵢ^H uⱼuⱼ^H = ⟨uᵢ, uⱼ⟩uᵢuⱼ^H = { uᵢuᵢ^H = Pᵢ if i = j,   0 if i ≠ j },
(P₁ + ⋯ + Pₙ)x = P₁x + ⋯ + Pₙx = Σ_{i=1}^n ⟨uᵢ, x⟩uᵢ = x = id(x).

Therefore, each Pᵢ is nothing but the orthogonal projection onto the subspace spanned by the eigenvector uᵢ. □

Note that the equation P₁ + ⋯ + Pₙ = id means that if one restricts the image of each Pᵢ to the subspace spanned by uᵢ, which is isomorphic to ℂ, then (P₁, ..., Pₙ) defines another orthogonal coordinate system with respect to the orthonormal basis {u₁, ..., uₙ}, just like (z₁, ..., zₙ) of ℂⁿ (see Sections 5.6 and 5.9.3). Note that any x ∈ ℂⁿ has the unique expression x = Σ_{i=1}^n ⟨uᵢ, x⟩uᵢ, and

Ax = λ₁P₁x + λ₂P₂x + ⋯ + λₙPₙx = λ₁u₁(u₁^H x) + ⋯ + λₙuₙ(uₙ^H x) = λ₁⟨u₁, x⟩u₁ + ⋯ + λₙ⟨uₙ, x⟩uₙ.

If an eigenvalue λ has multiplicity ℓ, i.e., λ = λ_{i₁} = ⋯ = λ_{i_ℓ}, with a set of ℓ orthonormal eigenvectors u_{i₁}, ..., u_{i_ℓ}, then they form an orthonormal basis for the eigenspace E(λ), and

P_λ = u_{i₁}u_{i₁}^H + ⋯ + u_{i_ℓ}u_{i_ℓ}^H

is the orthogonal projection matrix onto E(λ) = N(λI − A). Therefore, counting the multiplicity of each eigenvalue, every normal matrix A has the unique spectral decomposition into the projections

A = λ₁P_{λ₁} + λ₂P_{λ₂} + ⋯ + λ_kP_{λ_k}

for k ≤ n, where the λᵢ's are all distinct.
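The building block of the decomposition, P = u u^H for a unit vector u, is indeed a projection (P² = P); this can be sketched numerically with an arbitrary sample vector (our code, not the book's):

```python
import math

u_raw = [1 + 1j, 2, -1j]
nrm = math.sqrt(sum(abs(z) ** 2 for z in u_raw))
u = [z / nrm for z in u_raw]                      # normalize: ‖u‖ = 1

P = [[u[i] * u[j].conjugate() for j in range(3)] for i in range(3)]   # P = u u^H
P2 = [[sum(P[i][k] * P[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
err = max(abs(P2[i][j] - P[i][j]) for i in range(3) for j in range(3))
print(err < 1e-12)   # True: P² = P, since u^H u = 1
```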
Corollary 7.13 Let A be a normal matrix.
(1) The eigenvectors of A belonging to distinct eigenvalues are mutually orthogonal.
(2) If an eigenvalue λ of A has multiplicity k, then the eigenspace N(A − λI) belonging to λ is of dimension k.

Corollary 7.14 Let A be a normal matrix with the spectral decomposition A = λ₁P_{λ₁} + ⋯ + λ_kP_{λ_k}. Then, for any positive integer m,

A^m = λ₁^m P_{λ₁} + ⋯ + λ_k^m P_{λ_k}.

Moreover, if A is invertible, then for any positive integer ℓ,

A^{−ℓ} = (1/λ₁^ℓ) P_{λ₁} + ⋯ + (1/λ_k^ℓ) P_{λ_k}.
Example 7.9 (Spectral decomposition of a symmetric matrix) Find the spectral decomposition of

A = [ 4  2  2 ]
    [ 2  4  2 ]
    [ 2  2  4 ].

Solution: From Example 7.6, the spectral decomposition is

A = 2P₁ + 2P₂ + 8P₃,

where the projections are

P₁ = u₁u₁ᵀ = (1/2)[ 1 −1 0 ; −1 1 0 ; 0 0 0 ],   P₂ = u₂u₂ᵀ = (1/6)[ 1 1 −2 ; 1 1 −2 ; −2 −2 4 ],   P₃ = u₃u₃ᵀ = (1/3)[ 1 1 1 ; 1 1 1 ; 1 1 1 ].

Hence,

P_{E(2)} = P₁ + P₂ = (1/3)[ 2 −1 −1 ; −1 2 −1 ; −1 −1 2 ]

is the projection onto the eigenspace E(2) belonging to λ = 2, P₃ is the projection onto the eigenspace E(8) belonging to λ = 8, and

A = [ 4 2 2 ; 2 4 2 ; 2 2 4 ] = 2 · (1/3)[ 2 −1 −1 ; −1 2 −1 ; −1 −1 2 ] + 8 · (1/3)[ 1 1 1 ; 1 1 1 ; 1 1 1 ]. □
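The final identity can be checked entrywise (a sketch of ours): 2·P_{E(2)} + 8·P₃ should reproduce A exactly.

```python
A = [[4, 2, 2], [2, 4, 2], [2, 2, 4]]
P2_ = [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]   # 3 × (projection onto E(2))
P8_ = [[1,  1,  1], [ 1, 1,  1], [ 1,  1, 1]]   # 3 × (projection onto E(8))

recon = [[(2 * P2_[i][j] + 8 * P8_[i][j]) / 3 for j in range(3)] for i in range(3)]
print(recon == [[float(v) for v in row] for row in A])   # True
```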
Problem 7.24 Given

A = [  0   2  −1 ]
    [  2   3  −2 ]
    [ −1  −2   0 ],

find an orthogonal matrix Q that diagonalizes A, and find the spectral decomposition of A.

Example 7.10 (Spectral decomposition of a normal matrix) Find the spectral decomposition of the normal matrix

A = [ 0  0  i ]
    [ 0  i  0 ]
    [ i  0  0 ].

Solution: Since A is normal (A A^H = A^H A), it is unitarily diagonalizable. The characteristic polynomial of A is

det(λI − A) = det [ λ   0  −i ; 0  λ−i  0 ; −i  0  λ ] = (λ − i)²(λ + i).

Hence, the eigenvalues are λ₁ = λ₂ = i of multiplicity 2 and λ₃ = −i of multiplicity 1. By a simple computation using the Gram-Schmidt orthogonalization, one can find that

u₁ = (1/√2)(1, 0, 1),   u₂ = (0, 1, 0),   u₃ = (1/√2)(1, 0, −1)

are orthonormal eigenvectors of A belonging to the eigenvalues λ₁, λ₂ and λ₃, respectively. Now, the spectral decomposition is A = i(P₁ + P₂) − iP₃, where the projection matrices are

P₁ = (1/2)[ 1 0 1 ; 0 0 0 ; 1 0 1 ],   P₂ = [ 0 0 0 ; 0 1 0 ; 0 0 0 ],   P₃ = (1/2)[ 1 0 −1 ; 0 0 0 ; −1 0 1 ].

Hence,

A = i · (1/2)[ 1 0 1 ; 0 2 0 ; 1 0 1 ] − i · (1/2)[ 1 0 −1 ; 0 0 0 ; −1 0 1 ]. □
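A numerical check of this decomposition (our sketch): i·(P₁ + P₂) − i·P₃ should reconstruct A entrywise.

```python
A  = [[0, 0, 1j], [0, 1j, 0], [1j, 0, 0]]
Pi = [[1, 0, 1], [0, 2, 0], [1, 0, 1]]      # 2 × (projection onto E(i))
Pm = [[1, 0, -1], [0, 0, 0], [-1, 0, 1]]    # 2 × (projection onto E(−i))

recon = [[(1j * Pi[r][c] - 1j * Pm[r][c]) / 2 for c in range(3)] for r in range(3)]
print(all(recon[r][c] == A[r][c] for r in range(3) for c in range(3)))   # True
```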
Problem 7.25 Find the spectral decomposition of each of the following matrices:

(1) A = [ ~  ~ ],   (2) B = [ 2  2−i ; 2+i  3 ],   (3) C = [ 1 0 0 ; 0 2 0 ; 0 0 ~ ],   (4) D = [ 1 0 1 0 ; 0 ~ 0 ~ ; 1 0 ~ ~ ; 0 0 0 2 ].
7.6 Exercises

7.1. Calculate ‖x‖ for (1) x = ( 1+i, 2 ), (2) x = ( 1, −2i, 3+i ).

7.2. Construct an orthonormal basis for ℂ² from {(i, 4+2i), (5+6i, 1)} by applying the Gram-Schmidt orthogonalization.

7.3. Find the rank of the matrix

A = [   i     1    1−i    1+i  ]
    [  1−i   1+i    1     2+i  ]
    [ 1+3i   1−i   2−i   1+4i  ].

7.4. Find the eigenvalues and eigenvectors for each of the following matrices:
(1) [ ~   ~ ]        (3) [ ~   ~ ]
    [ 3  −5 ],           [ ~  −3 ].
7.5. Find the third column vector v so that

U = [ 1/√3    ~    | ]
    [ 1/√3  −1/√2  v ]
    [ 1/√3    ~    | ]

is unitary. How much freedom is there in this choice?

7.6. Find a real matrix A such that A + rI is invertible for all r ∈ ℝ. Does there exist a square matrix A such that A + cI is invertible for all c ∈ ℂ?

7.7. Find a unitary matrix whose first row is (1) k(1, 1−i), where k is a number, (2) (1, ~, ~).

7.8. Let V = ℂ² with the Euclidean inner product. Let T be the linear transformation on V with the matrix representation A = [ ~  ~ ] with respect to the standard basis. Show that T is normal and find a set of orthonormal eigenvectors of T.
7.9. Prove that the following matrices are unitarily similar:

[ cos θ  −sin θ ]        [ e^{iθ}    0     ]
[ sin θ   cos θ ]  and   [   0     e^{−iθ} ],

where θ is a real number.
7.10. For each of the following real symmetric matrices A, find a real orthogonal matrix Q such that Qᵀ A Q is diagonal:

(1) [ ~  ~ ],   (2) [ ~  ~ ].
7.11. For each of the following Hermitian matrices A, find a unitary matrix U such that U^H A U is diagonal:

(2) [   1    2+3i ]        (3) [  1     i    2+i ]
    [ 2−3i   −1   ],           [ −i     2    1−i ]
                               [ 2−i   1+i    2  ].
7.12. Find the diagonal matrices to which the following matrices are unitarily similar. Determine whether each of them is Hermitian, unitary or orthogonal.

(1) (1/2)[ 1+i  1−i ; 1−i  1+i ],   (2) [ 0.6  −0.8 ; 0.8  0.6 ],   (3) [ 1  −i  0 ; i  1  i ; 0  −i  1 ].
7.13. For a skew-Hermitian matrix A, show that (1) A − I is invertible, (2) e^A is unitary.

7.14. Let U be a unitary matrix. Prove that U and Uᵀ have the same set of eigenvalues.

7.15. Verify that A = [ ~  ~ ] is normal. Diagonalize A by a unitary matrix U.
7.16. Show that the non-symmetric real matrix

A = [  ~   ~   ~ ]
    [  ~   ~   ~ ]
    [ −2  −4  −5 ]

can be diagonalized.
7.17. Suppose that A, B are diagonalizable n × n matrices. Prove that AB = BA if and only if A and B can be diagonalized simultaneously by the same matrix Q, i.e., Q^{−1}AQ and Q^{−1}BQ are both diagonal.

7.18. Find the spectral decomposition of A = [ ~ ~ ~ ; ~ ~ ~ ; 1 1 2 ].

7.19. Let A and B be 2 × 2 symmetric matrices. Prove that A and B are similar if and only if det A = det B and tr(A) = tr(B).
7.20. Let A be a real symmetric n × n matrix and λ an eigenvalue of A with multiplicity m. Show that dim N(A − λI) = m.

7.21. Show that a matrix A is nilpotent, i.e., Aⁿ = 0 for some integer n ≥ 1, if and only if its eigenvalues are all zero.
7.22. Determine whether the following statements are true or false, in general, and justify your answers.
(1) Every Hermitian matrix is unitarily similar to a diagonal matrix.
(2) An orthogonal matrix is always unitarily similar to a real diagonal matrix.
(3) For any square matrix A, AA^H and A^H A have the same eigenvalues.
(4) If a triangular matrix is similar to a diagonal matrix, it is already diagonal.
(5) If all the columns of a square matrix A are orthonormal, then A is diagonalizable.
(6) Every permutation matrix is diagonalizable.
(7) Every permutation matrix is Hermitian.
(8) A nonzero nilpotent matrix cannot be Hermitian.
(9) Every square matrix is similar to a triangular matrix.
(10) If A is a Hermitian matrix, then A + iI is invertible.
(11) If A is a real matrix, then A + iI is invertible.
(12) If A is an orthogonal matrix, then A + ½I is invertible.
(13) Every unitarily diagonalizable matrix with real eigenvalues is Hermitian.
(14) Every diagonalizable matrix is normal.
(15) Every invertible matrix is similar to a unitary matrix.
8 Jordan Canonical Forms
8.1 Basic properties of Jordan canonical forms Most problems related to a (complex) matrix A can be easily solved if the matrix is diagonalizable, as shown in previous chapters. For example, this is true in computing the power An, in solving a linear difference equation Xn = AXn-J or a linear differential equation y' (t) = Ay(t) . In this chapter, we discuss how to solve the same problems for a non-diagonalizable matrix A by introducing the Jordan canonical form of a square matrix. Recall that an n x n matrix A is diagonalizable if and only if A has a full set of n linearly independent eigenvectors, or equivalently, the dimension of each eigenspace E(A) = N(Al - A) is equal to the multiplicity of the eigenvalue A. Hence, itA J, . . . , AI are distinct eigenvalues of A with multiplicities mAl' .. . , mAt' respectively, then dim E(Ad
+ ... +
dim EP.·I)
= mAl + . ..+
rnA,
=n
and On the other hand, a matrix A is not diagonalizable if and only if A has an eigenvalue A with multiplicity rn A > 1 such that 1 :::: dim E(A) < rnA,
so that the number of linearly independent eigenvectors belonging to A must be less than ms, However, even if a matrix A is not diagonalizable, one may try to find a matrix similar to A which has as many zero entries as possible except diagonals. Schur's lemma says that any square matrix is (unitarily) similar to an upper triangular matrix. But, it is a fact that any square matrix A can be similar to a matrix much "closer" to a diagonal matrix, called a Jordan canonical form . Its diagonal entries are the eigenvalues of A, the entry just above each diagonal entry is 0 or I, and all other J H Kwak et al., Linear Algebra © Birkhauser Boston 2004
entries are 0. In this case, the columns of a basis-change matrix Q are something like eigenvectors, but not the same in general. They are called generalized eigenvectors.

Theorem 8.1 For any square matrix A, A is similar to a matrix J of the following form, called the Jordan canonical form of A or a Jordan canonical matrix,

    J = [ J1            ]
        [     J2        ]
        [         ...   ]
        [            Js ],

in which
(1) s is the number of linearly independent eigenvectors of A, and
(2) each Ji is an upper triangular matrix of the form

    Ji = [ λi  1            ]
         [     λi  1        ]
         [         ...  1   ]
         [             λi   ],

where λi is a single eigenvalue of Ji with only one linearly independent associated eigenvector. Such a Ji is called a Jordan block belonging to the eigenvalue λi.

The proof of Theorem 8.1 may be beyond a beginning linear algebra course. Hence, we leave it to more advanced books, and in this book we are concerned only with how to find the Jordan canonical form J of A and a basis-change matrix Q. First, note that if an n x n matrix A has a full set of n linearly independent eigenvectors (that is, s = n), then there have to be n Jordan blocks, so that each Jordan block is just a 1 x 1 matrix, and an eigenvalue λ appears as many times as its multiplicity. In this case, the Jordan canonical form J of A is just the diagonal matrix with the eigenvalues on the diagonal, and a basis-change matrix Q is obtained by taking the n linearly independent eigenvectors as its columns. Hence, a diagonal matrix is a particular case of the Jordan canonical form.

Remark: (1) For a given Jordan canonical form J of A, one can get another one by permuting the Jordan blocks of J, and this new one is also similar to A. It will be shown later that no two Jordan canonical forms of A can be similar except for this possibility. In other words, any (complex) square matrix A is similar to only one Jordan canonical matrix up to permutations of the Jordan blocks. In this sense, it is called the Jordan canonical form of the matrix A.
(2) As an alternative way to define the Jordan canonical form of A, one can take the transpose J^T of the Jordan canonical matrix J given in Theorem 8.1. In this case, each Jordan block becomes a lower triangular matrix with a single eigenvalue. But this alternative definition does not induce any essential difference from the original one.
If J is the Jordan canonical form of a matrix A, then they have the same eigenvalues and the same number of linearly independent eigenvectors, but not the same set of them in general. (Note that x is an eigenvector of J = Q^{-1}AQ if and only if Qx is an eigenvector of A.) Actually, for a given matrix A, its Jordan canonical form J is completely determined by the number s of linearly independent eigenvectors of A and their ranks (which will be defined in Section 8.2): each eigenvector corresponds to a Jordan block in J, and the rank of an eigenvector determines the size of the corresponding block. The following example illustrates which matrices A have the Jordan canonical form J and how the s linearly independent eigenvectors of A (or J) correspond to the Jordan blocks in J.

Example 8.1 (Each Jordan block to each eigenvector) Let J be a Jordan canonical matrix of the form

    J = [ 6 1 0 0 0 ]
        [ 0 6 0 0 0 ]      [ J1        ]
        [ 0 0 2 1 0 ]  =   [    J2     ],
        [ 0 0 0 2 0 ]      [       J3  ]
        [ 0 0 0 0 2 ]

where J1 = [ 6 1; 0 6 ], J2 = [ 2 1; 0 2 ] and J3 = [ 2 ]. Find the number of linearly independent eigenvectors of J, and determine all matrices whose Jordan canonical forms are J.

Solution: Since J is a triangular matrix, the eigenvalues of J are the diagonal entries 6 and 2, with multiplicities 2 and 3, respectively. The eigenspace E(6) has a single linearly independent eigenvector, because dim E(6) = dim N(J - 6I) = 5 - rank(J - 6I) = 1 by the Rank Theorem 3.17. In fact, e1 = (1, 0, 0, 0, 0) is such a vector, and λ = 6 appears in only a single block J1. Similarly, one can see that the eigenspace E(2) has two linearly independent eigenvectors e3 and e5, with dim E(2) = 5 - rank(J - 2I) = 2, and λ = 2 appears in two blocks J2 and J3.

Hence, one can conclude that if a matrix A is similar to J, then A is a 5 x 5 matrix whose eigenvalues are 6 and 2 with multiplicities 2 and 3, respectively, but there is only one linearly independent eigenvector belonging to 6 (i.e., dim E(6) = 1) and there are only two linearly independent eigenvectors belonging to 2 (i.e., dim E(2) = 2). Moreover, the converse is also true by Theorem 8.1. In general, one can say that if a matrix A is similar to a Jordan canonical matrix J, then both matrices have the same eigenvalues of the same multiplicities and dim N(λI - A) = dim N(λI - J) for each eigenvalue λ. □

As shown in Example 8.1, the standard basis vectors ej associated with the first columns of the Jordan blocks Jj of J form linearly independent eigenvectors of the matrix J, and then the vectors Qej form linearly independent eigenvectors of the matrix A, where Q^{-1}AQ = J.
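The dimension counts of Example 8.1 can be verified mechanically. Below is a minimal sketch in Python using only the standard library; the `rank` and `shifted` helpers are our own names, not part of any linear-algebra package:

```python
from fractions import Fraction

def rank(M):
    """Rank over the rationals by Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, r = len(M), 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# The Jordan canonical matrix J of Example 8.1
J = [[6, 1, 0, 0, 0],
     [0, 6, 0, 0, 0],
     [0, 0, 2, 1, 0],
     [0, 0, 0, 2, 0],
     [0, 0, 0, 0, 2]]

def shifted(A, lam):  # A - lam*I
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

dim_E6 = 5 - rank(shifted(J, 6))  # one Jordan block belongs to 6
dim_E2 = 5 - rank(shifted(J, 2))  # two Jordan blocks belong to 2
print(dim_E6, dim_E2)             # prints: 1 2
```

Each eigenspace dimension equals the number of Jordan blocks belonging to that eigenvalue, as the example asserts.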
Problem 8.1 Note that the matrix

    A = [ 6 -1 -1  0  1 ]
        [ 0  6  4  1 -4 ]
        [ 0  0  2 -1  0 ]
        [ 0  0  0  2  0 ]
        [ 0  0  0  0  2 ]

has the eigenvalues 6 and 2 with multiplicities 2 and 3, respectively. Moreover, there are two linearly independent eigenvectors u1 = (0, -1, 1, 0, 0) and u2 = (0, 1, 0, 0, 1) belonging to λ = 2, and v1 = (-1, 0, 0, 0, 0) is an eigenvector belonging to λ = 6. Show that the Jordan canonical form of A is the matrix J given in Example 8.1 by showing Q^{-1}AQ = J with the invertible matrix

    Q = [ -1 0  0  0 0 ]
        [  0 1 -1  0 1 ]
        [  0 0  1  0 0 ]
        [  0 0  0 -1 0 ]
        [  0 0  0  0 1 ].
Problem 8.2 Show that for any Jordan block J, the order of J is the smallest positive integer k such that (J - λI)^k = 0, where λ is the eigenvalue of J.
An eigenvalue λ may appear in several blocks. In fact, the number of Jordan blocks belonging to an eigenvalue λ is equal to the number of linearly independent eigenvectors of A belonging to λ, which is the dimension of the eigenspace E(λ) = N(λI - A). Moreover, the sum of the orders of all Jordan blocks belonging to an eigenvalue λ is equal to the multiplicity m_λ of λ. Next, one might ask how to determine the Jordan blocks belonging to a given eigenvalue λ. The following example shows all possible cases of Jordan blocks belonging to an eigenvalue λ when its multiplicity m_λ is fixed.
Example 8.2 (Classifying Jordan canonical matrices having a single eigenvalue) Classify all possible Jordan canonical forms of a 5 x 5 matrix A that has a single eigenvalue λ of multiplicity 5 (up to permutations of the Jordan blocks).

Solution: There are seven possible Jordan canonical forms, as follows.
(1) Suppose A has only one linearly independent eigenvector belonging to λ. Then the Jordan canonical form of A is of the form

    J(1) = Q^{-1}AQ = [ λ 1 0 0 0 ]
                      [ 0 λ 1 0 0 ]
                      [ 0 0 λ 1 0 ]
                      [ 0 0 0 λ 1 ]
                      [ 0 0 0 0 λ ],

which consists of only one Jordan block with eigenvalue λ on the diagonal. And both A and J(1) have only one linearly independent eigenvector belonging to λ. (Note that rank(J(1) - λI) = 4.)
(2) Suppose A has two linearly independent eigenvectors belonging to λ. Then the Jordan canonical form of A is either one of the forms

    J(2) = [ λ 1 0 0 0 ]        J(3) = [ λ 1 0 0 0 ]
           [ 0 λ 1 0 0 ]               [ 0 λ 1 0 0 ]
           [ 0 0 λ 0 0 ]               [ 0 0 λ 1 0 ]
           [ 0 0 0 λ 1 ]               [ 0 0 0 λ 0 ]
           [ 0 0 0 0 λ ],              [ 0 0 0 0 λ ],

each of which consists of two Jordan blocks belonging to the eigenvalue λ. Note that J(2) has two linearly independent eigenvectors e1 and e4, while J(3) has e1 and e5. These two matrices J(2) and J(3) cannot be similar, because (J(2) - λI)^3 = 0 but (J(3) - λI)^3 ≠ 0. (One can justify this by a direct computation.)

(3) Suppose A has three linearly independent eigenvectors belonging to λ. Then the Jordan canonical form of A is either one of the forms

    J(4) = [ λ 1 0 0 0 ]        J(5) = [ λ 1 0 0 0 ]
           [ 0 λ 0 0 0 ]               [ 0 λ 1 0 0 ]
           [ 0 0 λ 1 0 ]               [ 0 0 λ 0 0 ]
           [ 0 0 0 λ 0 ]               [ 0 0 0 λ 0 ]
           [ 0 0 0 0 λ ],              [ 0 0 0 0 λ ],

each of which consists of three Jordan blocks belonging to the eigenvalue λ. Note that J(4) has three linearly independent eigenvectors e1, e3 and e5, while J(5) has e1, e4 and e5. These two matrices J(4) and J(5) are not similar, because (J(4) - λI)^2 = 0 but (J(5) - λI)^2 ≠ 0.

(4) Suppose A has four linearly independent eigenvectors belonging to λ. Then the Jordan canonical form of A is of the form

    J(6) = [ λ 1 0 0 0 ]
           [ 0 λ 0 0 0 ]
           [ 0 0 λ 0 0 ]
           [ 0 0 0 λ 0 ]
           [ 0 0 0 0 λ ],
which consists of four Jordan blocks with eigenvalue λ.
(5) Suppose A has five linearly independent eigenvectors belonging to λ. Then the Jordan canonical form of A is the diagonal matrix J(7) with diagonal entries λ.
Note that all of these seven possible Jordan canonical matrices have the same trace, the same determinant and the same characteristic polynomial, but no two of them are similar to each other. □

As shown in case (2) (and also in case (3)) of Example 8.2, the two Jordan canonical matrices J(2) and J(3) have the same eigenvalue of the same multiplicity, and they also have the same number of linearly independent eigenvectors, but they are not similar. The problem of choosing one of the two possible Jordan canonical forms
which is similar to the given matrix A depends on the sequence of rank(A - λI)^ℓ for ℓ = 1, 2, ..., n.
The next example illustrates how to determine the orders of the Jordan blocks belonging to the same eigenvalue λ.
Example 8.3 (Determining the Jordan blocks belonging to the same eigenvalue) Let J be a Jordan canonical matrix with a single eigenvalue λ:

    J = [ λ 1 0 0 0 0 0 0 ]
        [ 0 λ 1 0 0 0 0 0 ]
        [ 0 0 λ 0 0 0 0 0 ]
        [ 0 0 0 λ 1 0 0 0 ]
        [ 0 0 0 0 λ 0 0 0 ]
        [ 0 0 0 0 0 λ 1 0 ]
        [ 0 0 0 0 0 0 λ 0 ]
        [ 0 0 0 0 0 0 0 λ ].

Then (J - λI)^2 ≠ 0 and (J - λI)^3 = 0. Thus, one can get rank(J - λI) = 4, rank(J - λI)^2 = 1 and rank(J - λI)^3 = 0. This sequence of ranks, rank(J - λI)^ℓ for ℓ = 1, 2, 3, determines completely the orders of the blocks of J. In fact, one can notice that

(i) the fact that (J - λI)^3 = 0 but (J - λI)^2 ≠ 0 implies that the largest block has order 3,
(ii) rank(J - λI)^2 = 1 is equal to the number of blocks of order 3,
(iii) rank(J - λI) = 4 is equal to twice the number of blocks of order 3 plus the number of blocks of order 2, so there are two blocks of order 2,
(iv) the number of blocks of order 1 is 8 - (2 x 2) - (3 x 1) = 1.
Recall that two similar matrices have the same rank. Hence, if J is the Jordan canonical form of A, then for any eigenvalue λ and for any positive integer ℓ, we have rank(A - λI)^ℓ = rank(J - λI)^ℓ. Furthermore, in the sequence of matrices J - λI, (J - λI)^2, (J - λI)^3, ..., all Jordan blocks belonging to the eigenvalue λ eventually become zero matrices, but all other blocks (belonging to an eigenvalue different from λ) remain upper triangular matrices with nonzero diagonal entries. Hence, the sequence of rank(J - λI)^k must stop decreasing at n - m_λ. (Note: rank(J - λI)^k = n - m_λ when k = m_λ.)
Let c_λ = n - m_λ for simplicity. Then, the decreasing sequence

    { rank(A - λI)^k - c_λ : k = 1, ..., m_λ }

determines completely the orders of the blocks of J belonging to λ, as shown in Example 8.3:

(i) The order of the largest block belonging to λ is the smallest positive integer k such that rank(A - λI)^k - c_λ = 0. And the number of such largest blocks is equal to rank(A - λI)^{k-1} - c_λ (say = ℓ1).
(ii) The number of blocks of order k - 1 is equal to rank(A - λI)^{k-2} - c_λ - 2ℓ1 (say = ℓ2).
(iii) The number of blocks of order k - 2 is equal to rank(A - λI)^{k-3} - c_λ - 3ℓ1 - 2ℓ2 (say = ℓ3), and so on. In general, if ℓ1, ℓ2, ..., ℓj are given, one can determine ℓ_{j+1} inductively as follows:
(iv) The number of blocks of order k - j is equal to rank(A - λI)^{k-(j+1)} - c_λ - (j + 1)ℓ1 - jℓ2 - ... - 2ℓj (say = ℓ_{j+1}), with ℓ0 = 0, for j = 0, ..., k - 1.

In summary, one can determine the Jordan canonical form J of an n x n matrix
A by the following procedure.

Step 1 Find all distinct eigenvalues λ1, ..., λt of A. Let their multiplicities be m_λ1, ..., m_λt, respectively, so that m_λ1 + ... + m_λt = n, and let c_λj = n - m_λj.

Step 2 For each eigenvalue λ, the Jordan blocks belonging to the eigenvalue λ are determined by the following criteria:
(i) the order of the largest block belonging to λ is the smallest positive integer k such that rank(A - λI)^k - c_λ = 0, and
(ii) the number of blocks of order k - j is inductively determined as rank(A - λI)^{k-(j+1)} - c_λ - (j + 1)ℓ1 - jℓ2 - ... - 2ℓj (say = ℓ_{j+1}), with ℓ0 = 0, for j = 0, ..., k - 1.

This is a general guide for determining the Jordan canonical form of a matrix. However, for a matrix of large order, the evaluation of rank(A - λI)^k might not be easy at all, while for matrices of lower order or relatively simple matrices, the computations are accessible.

Example 8.4 (Jordan canonical form of a triangular matrix) Find the Jordan canonical form J of the matrix
    A = [ 2 1  4 ]
        [ 0 2 -1 ]
        [ 0 0  3 ].
Solution: Since A is triangular, the eigenvalues of A are the diagonal entries λ1 = λ2 = 2 and λ3 = 3. Hence, there are two possibilities for the Jordan canonical form of A:
    J(1) = [ 2 0 0 ]            J(2) = [ 2 1 0 ]
           [ 0 2 0 ]     or            [ 0 2 0 ]
           [ 0 0 3 ]                   [ 0 0 3 ].

But one can see that rank(A - 2I) = 2. It implies that rank(J - 2I) = 2, and so the Jordan canonical form of A must be J(2). □
Example 8.5 (Jordan canonical form of a companion matrix) Find the Jordan canonical form J of the matrix

    A = [  0 1  0 0 ]
        [  0 0  1 0 ]
        [  0 0  0 1 ]
        [ -1 4 -6 4 ].

Solution: The characteristic polynomial of the matrix A is

    det(λI - A) = λ^4 - 4λ^3 + 6λ^2 - 4λ + 1 = (λ - 1)^4.

The eigenvalue of A is λ = 1, of multiplicity 4. Note that the rank of the matrix

    A - I = [ -1  1  0 0 ]
            [  0 -1  1 0 ]
            [  0  0 -1 1 ]
            [ -1  4 -6 3 ]

is 3, by noting that the first three rows are linearly independent, or that the determinant of the 3 x 3 principal submatrix in the upper left corner is not zero. Hence, the rank of J - I is also 3, and so J - I must have three 1's above the diagonal entries. It means that

    J = [ 1 1 0 0 ]
        [ 0 1 1 0 ]
        [ 0 0 1 1 ]
        [ 0 0 0 1 ].

Also, one can check the following equations:

    (A - I)^3 = [ -1 3 -3 1 ]
                [ -1 3 -3 1 ]
                [ -1 3 -3 1 ]
                [ -1 3 -3 1 ]  ≠ 0,    but    (A - I)^4 = 0,

which says that the order of the largest Jordan block is 4. □
Problem 8.3 Let A be a 5 x 5 matrix with two distinct eigenvalues: λ of multiplicity 3 and μ of multiplicity 2. Find all possible Jordan canonical forms of A up to permutations of the Jordan blocks.
Problem 8.4 Find the Jordan canonical form for each of the following matrices:

    (2) [ 4 1 2 ]
        [ 0 4 2 ]
        [ 0 0 4 ].
8.2 Generalized eigenvectors

In Section 8.1, assuming Theorem 8.1, we have shown how to determine the Jordan canonical form J of a matrix A. In this section, we discuss how to determine a basis-change matrix Q so that Q^{-1}AQ = J is the Jordan canonical form of A. In fact, if J is given, then a basis-change matrix Q is a nonsingular solution of the matrix equation AQ = QJ.
The following example illustrates how to determine a basis-change matrix Q when a matrix A and its Jordan canonical form J are given.

Example 8.6 (Each Jordan block to each chain of generalized eigenvectors) Let A be a 5 x 5 matrix similar to a single Jordan block of the form

    Q^{-1}AQ = J = [ λ 1 0 0 0 ]
                   [ 0 λ 1 0 0 ]
                   [ 0 0 λ 1 0 ]
                   [ 0 0 0 λ 1 ]
                   [ 0 0 0 0 λ ].

Determine a basis-change matrix Q = [x1 x2 x3 x4 x5].
Solution: Clearly, the two similar matrices A and J have the same eigenvalue λ of multiplicity 5. Since rank(J - λI) = 4, dim N(J - λI) = dim E(λ) = 1. In fact, J has only one linearly independent eigenvector, which is e1 = (1, 0, 0, 0, 0). Thus Qe1 = x1 is a linearly independent eigenvector of A. Also, note that the smallest positive integer k such that (A - λI)^k = (J - λI)^k = 0 is 5, which is the order of the block J. To see what the other columns of Q are, we expand AQ = QJ as

    [Ax1 Ax2 Ax3 Ax4 Ax5] = [λx1  x1 + λx2  x2 + λx3  x3 + λx4  x4 + λx5].

By comparing the column vectors, we have

    Ax5 = x4 + λx5,   or   (A - λI)x5 = x4,
    Ax4 = x3 + λx4,   or   (A - λI)x4 = (A - λI)^2 x5 = x3,
    Ax3 = x2 + λx3,   or   (A - λI)x3 = (A - λI)^3 x5 = x2,
    Ax2 = x1 + λx2,   or   (A - λI)x2 = (A - λI)^4 x5 = x1,
    Ax1 = λx1,        or   (A - λI)x1 = (A - λI)^5 x5 = 0.
Note that the vector x5 satisfies (A - λI)^5 x5 = 0 but (A - λI)^4 x5 = x1 ≠ 0. (Indeed, (A - λI)^5 = (J - λI)^5 = 0.) Hence, if one finds x5 as a solution of (A - λI)^5 x = 0 with (A - λI)^4 x ≠ 0, then all the other xi's can be obtained by x4 = (A - λI)x5, x3 = (A - λI)x4, x2 = (A - λI)x3, etc. Such a vector x5 is called a generalized eigenvector of rank 5, and the (ordered) set {x1, ..., x5} is called a chain of generalized eigenvectors belonging to λ. Therefore, the columns of the basis-change matrix Q form a chain of generalized eigenvectors. □
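This chain construction is mechanical once a top-rank generalized eigenvector is known. A small sketch with our own helper names; as a worked instance we take the companion matrix of Example 8.5 (λ = 1), for which x = (-1, 0, 0, 0) is a generalized eigenvector of rank 4:

```python
from fractions import Fraction

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def chain(A, lam, x, k):
    """Return the chain x_1, ..., x_k generated by a rank-k generalized eigenvector x."""
    n = len(A)
    M = [[Fraction(A[i][j]) - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    out = [[Fraction(v) for v in x]]        # x_k
    for _ in range(k - 1):
        out.append(matvec(M, out[-1]))      # x_{j-1} = (A - lam*I) x_j
    return out[::-1]                        # reorder as x_1, ..., x_k

A = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [-1, 4, -6, 4]]
x1, x2, x3, x4 = chain(A, 1, [-1, 0, 0, 0], 4)
print([int(v) for v in x1])   # the initial eigenvector; prints: [1, 1, 1, 1]
```

The first vector of the chain is a genuine eigenvector: applying A - λI to it gives zero.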
In general, by expanding AQ = QJ, one can see that the columns of Q corresponding to the first columns of the Jordan blocks of J form a maximal set of linearly independent eigenvectors of A, and the remaining columns of Q are generalized eigenvectors.
Definition 8.1 A nonzero vector x is said to be a generalized eigenvector of A of rank k belonging to an eigenvalue λ if

    (A - λI)^k x = 0    and    (A - λI)^{k-1} x ≠ 0.
Note that if k = 1, this is the usual definition of an eigenvector. For a generalized eigenvector x of rank k ≥ 1 belonging to an eigenvalue λ, define

    x_k     = x,
    x_{k-1} = (A - λI)x_k     = (A - λI)x,
    x_{k-2} = (A - λI)x_{k-1} = (A - λI)^2 x,
        ...
    x_2     = (A - λI)x_3     = (A - λI)^{k-2} x,
    x_1     = (A - λI)x_2     = (A - λI)^{k-1} x.

Thus, for each i, 1 ≤ i ≤ k, we have (A - λI)^i x_i = (A - λI)^k x = 0 and (A - λI)^{i-1} x_i = x_1 ≠ 0. Note also that (A - λI)^i x_j = 0 for i ≥ j. Hence, the vector x_ℓ = (A - λI)^{k-ℓ} x is a generalized eigenvector of A of rank ℓ. See Figure 8.1.
Definition 8.2 The set of vectors {x1, x2, ..., xk} is called a chain of generalized eigenvectors belonging to the eigenvalue λ. The eigenvector x1 is called the initial eigenvector of the chain.

The following three successive theorems show that a basis-change matrix Q can be constructed from the chains of generalized eigenvectors initiated from the s linearly independent eigenvectors of A, and they also justify the invertibility of Q. (A reader may omit their proofs below if not interested in the details.)
Theorem 8.2 A chain of generalized eigenvectors {x1, x2, ..., xk} belonging to an eigenvalue λ is linearly independent.

Proof: Let c1x1 + c2x2 + ... + ck xk = 0 with constants ci, i = 1, ..., k. If we multiply (on the left) both sides of this equation by (A - λI)^{k-1}, then for i = 1, ..., k - 1,
Figure 8.1. A chain of generalized eigenvectors (each arrow is multiplication by A - λI)
    (A - λI)^{k-1} x_i = (A - λI)^{k-(i+1)} (A - λI)^i x_i = 0.

Thus, ck (A - λI)^{k-1} xk = 0, and hence ck = 0. Doing the same to the equation c1x1 + ... + c_{k-1}x_{k-1} = 0 with (A - λI)^{k-2}, one gets c_{k-1} = 0. Proceeding successively, one can show that ci = 0 for all i = 1, ..., k. □
Theorem 8.3 The union of chains of generalized eigenvectors of a square matrix A belonging to distinct eigenvalues is linearly independent.

Proof: For brevity, we prove this theorem for only two distinct eigenvalues. Let {x1, x2, ..., xk} and {y1, y2, ..., yℓ} be chains of generalized eigenvectors of A belonging to the eigenvalues λ and μ, respectively, and let λ ≠ μ. In order to show that the set of vectors {x1, ..., xk, y1, ..., yℓ} is linearly independent, let

    c1x1 + ... + ck xk + d1y1 + ... + dℓ yℓ = 0

with constants ci's and dj's. We multiply both sides of the equation by (A - λI)^k and note that (A - λI)^k xi = 0 for all i = 1, ..., k. Thus we have

    (A - λI)^k (d1y1 + d2y2 + ... + dℓ yℓ) = 0.

Again, multiply this equation by (A - μI)^{ℓ-1} and note that

    (A - μI)^{ℓ-1}(A - λI)^k = (A - λI)^k (A - μI)^{ℓ-1},
    (A - μI)^{ℓ-1} yℓ = y1,
    (A - μI)^{ℓ-1} yi = 0

for i = 1, ..., ℓ - 1. Thus we obtain

    dℓ (A - λI)^k y1 = 0.

Because (A - μI)y1 = 0 (or Ay1 = μy1), this reduces to

    dℓ (μ - λ)^k y1 = 0,
which implies that dℓ = 0 by the assumption λ ≠ μ and y1 ≠ 0. Proceeding successively, one can show that di = 0 for i = ℓ, ℓ - 1, ..., 2, 1, so we are left with

    c1x1 + ... + ck xk = 0.

Since {x1, ..., xk} is already linearly independent by Theorem 8.2, ci = 0 for all i = 1, ..., k. Thus the set of generalized eigenvectors {x1, ..., xk, y1, ..., yℓ} is linearly independent. □

The next step in determining Q such that AQ = QJ is how to choose chains of generalized eigenvectors from a generalized eigenspace, which is defined below, so that the union of the chains is linearly independent.
Definition 8.3 Let λ be an eigenvalue of A. The generalized eigenspace of A belonging to λ, denoted by K_λ, is the set

    K_λ = { x ∈ C^n : (A - λI)^p x = 0 for some positive integer p }.

It turns out that dim K_λ is the multiplicity of λ, and it contains the usual eigenspace N(A - λI). The following theorem enables us to choose a basis for K_λ, but we omit the proof even though it can be proved by induction on the number of vectors in S ∪ T.
Theorem 8.4 Let S = {x1, x2, ..., xk} and T = {y1, y2, ..., yℓ} be two chains of generalized eigenvectors of A belonging to the same eigenvalue λ. If the initial eigenvectors x1 and y1 are linearly independent, then the union S ∪ T is also linearly independent.

Note that Theorem 8.4 extends easily to any finite number of chains of generalized eigenvectors of A belonging to an eigenvalue λ, and the union of such chains will form a basis for K_λ, so that the matrix Q may be constructed from these bases for each eigenvalue as usual.
Example 8.7 (Basis-change matrix for a triangular matrix) For the matrix

    A = [ 2 1  4 ]
        [ 0 2 -1 ]
        [ 0 0  3 ],

find a basis-change matrix Q so that Q^{-1}AQ is a Jordan canonical matrix.
Solution: Method 1: In Example 8.4, the Jordan canonical form J of A was determined to be

    J = [ 2 1 0 ]
        [ 0 2 0 ]
        [ 0 0 3 ].

By comparing the column vectors of AQ = QJ with Q = [x1 x2 x3], one can get

    Ax1 = 2x1,   Ax2 = x1 + 2x2,   Ax3 = 3x3.

Since x1 and x3 are eigenvectors of A belonging to λ = 2 and λ = 3, one can take x1 = (1, 0, 0) and x3 = (3, -1, 1). Also, from the equation Ax2 = 2x2 + x1, one can conclude that x2 = (a, 1, 0) with any constant a, so that

    Q = [ 1 a  3 ]
        [ 0 1 -1 ]
        [ 0 0  1 ].

One may check the equality AQ = QJ by a direct computation.
Method 2: This is a direct method to compute Q without using the Jordan canonical form J. Clearly, the eigenvalues of A are λ1 = λ2 = 2 and λ3 = 3. Since rank(A - λ1 I) = 2, the dimension of the eigenspace N(A - λ1 I) is 1. Thus there is only one linearly independent eigenvector belonging to λ1 = λ2 = 2, and an eigenvector belonging to λ3 = 3 is found to be x3 = (3, -1, 1). We need to find a generalized eigenvector x2 of rank 2 belonging to λ = 2, which is a solution of the following systems:

    (A - 2I)x = [ 0 1  4 ] x ≠ 0,      (A - 2I)^2 x = [ 0 0  3 ] x = 0.
                [ 0 0 -1 ]                            [ 0 0 -1 ]
                [ 0 0  1 ]                            [ 0 0  1 ]

From the second equation, x2 has to be of the form (a, b, 0), and from the first equation we must have b ≠ 0. Let us take x2 = (0, 1, 0) as a generalized eigenvector of rank 2. Then we have x1 = (A - 2I)x2 = (1, 0, 0). Now, one can set

    Q = [x1 x2 x3] = [ 1 0  3 ]
                     [ 0 1 -1 ]
                     [ 0 0  1 ].

The reader may check by a direct computation that

    Q^{-1}AQ = [ 2 1 0 ]
               [ 0 2 0 ]   = [ J1  0 ]
               [ 0 0 3 ]     [ 0  J2 ],

where J1 = [ 2 1; 0 2 ] and J2 = [3].
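The two defining conditions on x2 are easy to check in code; a tiny sketch for this example (λ = 2, candidate x2 = (0, 1, 0)):

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[2, 1, 4], [0, 2, -1], [0, 0, 3]]
M = [[A[i][j] - 2 * (i == j) for j in range(3)] for i in range(3)]   # A - 2I

x2 = [0, 1, 0]        # candidate generalized eigenvector of rank 2
x1 = matvec(M, x2)    # (A - 2I)x2 must be nonzero, and must be killed by A - 2I
print(x1, matvec(M, x1))   # prints: [1, 0, 0] [0, 0, 0]
```

So x1 = (1, 0, 0) is indeed an eigenvector and x2 has rank 2, as claimed.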
Example 8.8 (Basis-change matrix for a companion matrix) Find a basis-change matrix Q so that Q^{-1}AQ = J is the Jordan canonical form of the matrix

    A = [  0 1  0 0 ]
        [  0 0  1 0 ]
        [  0 0  0 1 ]
        [ -1 4 -6 4 ].
Solution: Method 1: In Example 8.5, we computed

    J = [ 1 1 0 0 ]
        [ 0 1 1 0 ]
        [ 0 0 1 1 ]
        [ 0 0 0 1 ].

Now, one can find a basis-change matrix Q = [x1 x2 x3 x4] by comparing the column vectors of AQ = QJ:

    Ax1 = x1,   Ax2 = x2 + x1,   Ax3 = x3 + x2,   Ax4 = x4 + x3.

By computing an eigenvector of A belonging to λ = 1, one can get x1 = (1, 1, 1, 1). The equation Ax2 = x2 + x1 gives x2 = (a, a + 1, a + 2, a + 3) for any a; take x2 = (0, 1, 2, 3). Similarly, from the equations Ax3 = x3 + x2 and Ax4 = x4 + x3, one can get x3 = (b, b, b + 1, b + 3) for any b and set x3 = (0, 0, 1, 3), and successively one can take x4 = (0, 0, 0, 1). We conclude that

    Q = [ 1 0 0 0 ]
        [ 1 1 0 0 ]
        [ 1 2 1 0 ]
        [ 1 3 3 1 ].

One may check AQ = QJ by a direct matrix multiplication.
One may check A Q = QJ by a direct matrix multiplication. Method 2: The characteristic polynomial of the matrix A is
det(A - >..I) = >..4 - 4>..3 + 6>..2 - 4>" + 1 = (>.. _1)4 . The only eigenvalue of A is x = 1 of multiplicity 4. Note that dimN(A - J) = 1 because the rank of the matrix A _ J _
-
[
-1 1 00] 0 -1 1 0 0 0 -1 1 -1 4 -6 3
is 3. Thus, a basis-change matrix Q = lxr X2 X3 X4] consists of a chain of generalized eigenvectors. First, we find a generalized eigenvector X4 of rank 4, which is a solution Xof the following equations:
(A _1)3 x =
[-1 -3 1]
(A _1)4 x =
O.
3 -1 3 -3 1 -1 3 -3 1 -1 3 -3 1
X
f:
0,
But, a direct computation shows that the matrix (A - 1)4 = O. Hence, one can take any vector that satisfies the first equation as a generalized eigenvector of rank 4: Take X4 = (-I, 0, 0, 0). Then ,
    x3 = (A - I)x4 = (1, 0, 0, 1),
    x2 = (A - I)x3 = (-1, 0, 1, 2),
    x1 = (A - I)x2 = (1, 1, 1, 1).

Now, one can set

    Q = [x1 x2 x3 x4] = [ 1 -1 1 -1 ]
                        [ 1  0 0  0 ]
                        [ 1  1 0  0 ]
                        [ 1  2 1  0 ].
Then,

    Q^{-1}AQ = [ 1 1 0 0 ]
               [ 0 1 1 0 ]
               [ 0 0 1 1 ]
               [ 0 0 0 1 ] = J,

with

    Q^{-1} = [  0  1  0 0 ]
             [  0 -1  1 0 ]
             [  0  1 -2 1 ]
             [ -1  3 -3 1 ].  □
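Rather than inverting Q by hand, one can check the equivalent identity AQ = QJ; a one-line sketch in Python, with the chain above as the columns of Q:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [-1, 4, -6, 4]]
Q = [[1, -1, 1, -1], [1, 0, 0, 0], [1, 1, 0, 0], [1, 2, 1, 0]]
J = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]

print(matmul(A, Q) == matmul(Q, J))   # prints: True
```

Since Q is invertible, AQ = QJ is equivalent to Q^{-1}AQ = J.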
Remark: In Examples 8.7-8.8, we used two different methods to determine a basis-change matrix. In Method 1, we first find an initial eigenvector x1 in order to get a chain {x1, x2, ..., xk} of generalized eigenvectors belonging to λ. After that, we find x2 as a solution of (A - λI)x2 = x1, and x3 as a solution of (A - λI)x3 = x2, and so on. In this method, we need not compute the power matrices (A - λI)^2, (A - λI)^3, etc. But this method may sometimes fail: since A - λI is singular, the system (A - λI)x2 = x1 can be inconsistent for a badly chosen eigenvector x1. See the next Example 8.9. In Method 2, we first find a generalized eigenvector xk of rank k as a solution of (A - λI)^k x = 0 with (A - λI)^{k-1} x ≠ 0. With this xk, one can get x_{k-1} = (A - λI)xk, then x_{k-2} = (A - λI)x_{k-1}, and so on. This method always works, but we need to compute the powers (A - λI)^k.

The next example shows that a chain of generalized eigenvectors sometimes cannot be obtained starting from an arbitrarily chosen initial eigenvector of the chain.

Example 8.9 For the matrix

    A = [  5 -3 -2 ]
        [  8 -5 -4 ]
        [ -4  3  3 ],

find a basis-change matrix Q so that Q^{-1}AQ is a Jordan canonical matrix.
Solution: Method 1: The eigenvalue of A is λ = 1, of multiplicity 3, and the rank of the matrix

    A - I = [  4 -3 -2 ]
            [  8 -6 -4 ]
            [ -4  3  2 ]
is 1. (Note that the second and the third rows are scalar multiples of the first row.) Hence, there are two linearly independent eigenvectors belonging to λ = 1, and the Jordan canonical form J of A is determined as

    J = [ 1 1 0 ]
        [ 0 1 0 ]
        [ 0 0 1 ].
Now, one may find a basis-change matrix Q = [x1 x2 x3] by comparing the column vectors of AQ = QJ:

    Ax1 = x1,   Ax2 = x2 + x1,   Ax3 = x3.

By computing the eigenvectors of A belonging to λ = 1, one may get two linearly independent eigenvectors: take u1 = (1, 0, 2) and u2 = (0, 2, -3). If we take the eigenvector x1 as u1 or u2, then a generalized eigenvector x2 must be a solution of

    (A - I)x = x1.

But this system is inconsistent for either choice, and one cannot get a generalized eigenvector x2 in this way. It means that we must choose an eigenvector x1 ∈ E(1) carefully in order to get x2 as a solution of (A - I)x = x1.
Method 2: First, note that A has the eigenvalue λ = 1 of multiplicity 3, and there are two linearly independent eigenvectors belonging to λ = 1. Hence, we need to find a generalized eigenvector of rank 2, which is a solution x of the following equations:

    (A - I)x = [  4 -3 -2 ] x ≠ 0,      (A - I)^2 x = 0.
               [  8 -6 -4 ]
               [ -4  3  2 ]

But a direct computation shows that the matrix (A - I)^2 = 0. Hence, one can take any vector that satisfies the first condition as a generalized eigenvector of rank 2: take x2 = (0, 0, -1). Then,

    x1 = (A - I)x2 = (2, 4, -2).
Now, by taking another eigenvector x3 = (1, 0, 2), so that x1 and x3 are linearly independent, one can get

    Q = [x1 x2 x3] = [  2  0 1 ]
                     [  4  0 0 ]
                     [ -2 -1 2 ].

One may check by a direct computation that

    Q^{-1}AQ = [ 1 1 0 ]
               [ 0 1 0 ]
               [ 0 0 1 ] = J.  □
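Why Method 1 failed here can be phrased as a solvability test: (A - I)x = x1 is consistent exactly when the augmented matrix [A - I | x1] has the same rank as A - I. A short sketch with exact arithmetic (standard library only; `rank` and `solvable` are our own helpers):

```python
from fractions import Fraction

def rank(M):
    """Rank over the rationals by Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, r = len(M), 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

A = [[5, -3, -2], [8, -5, -4], [-4, 3, 3]]
M = [[A[i][j] - (i == j) for j in range(3)] for i in range(3)]   # A - I

def solvable(b):   # is (A - I)x = b consistent?
    aug = [row + [v] for row, v in zip(M, b)]
    return rank(aug) == rank(M)

# u1 and u2 fail; the chain-generated eigenvector x1 = (2, 4, -2) succeeds
print(solvable([1, 0, 2]), solvable([0, 2, -3]), solvable([2, 4, -2]))
# prints: False False True
```

This is exactly the point of the example: only the eigenvector produced by Method 2 sits in the column space of A - I.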
Problem 8.5 (From Problem 8.4) Find a full set of generalized eigenvectors of the matrix

    (2) [ 4 1 2 ]
        [ 0 4 2 ]
        [ 0 0 4 ].
8.3 The power A^k and the exponential e^A

In this section, we discuss how to compute the power A^k and the exponential matrix e^A, which are necessary for solving linear difference or differential equations, as mentioned in Sections 6.3 and 6.5. This can be done in two ways. First, it can be done by computing the power J^k and the exponential matrix e^J for the Jordan canonical form J of A, as shown in this section. The second method is based on the Cayley-Hamilton theorem (or the minimal polynomial), and it will be discussed later in Sections 8.6.1-8.6.2.

Let J be the Jordan canonical form of a square matrix A, and let

    A = QJQ^{-1}

with a basis-change matrix Q. Since

    A^k = QJ^kQ^{-1}

for k = 1, 2, ..., and J^k is block diagonal with blocks Ji^k, it is enough to compute J^k for a single Jordan block J. Now an n x n Jordan block J belonging to an eigenvalue λ of A can be written as

    J = λI + N,

where N is the nilpotent matrix with 1's just above the diagonal and 0's elsewhere.
Since I is the identity matrix, clearly IN = NI, and hence

    J^k = (λI + N)^k = Σ_{ℓ=0}^{k} C(k, ℓ) λ^{k-ℓ} N^ℓ,

where C(k, ℓ) denotes the binomial coefficient. Note that N^ℓ = 0 for ℓ ≥ n. Thus, with the convention that C(k, ℓ) = 0 if k < ℓ,

    J^k = [ λ^k  C(k,1)λ^{k-1}  C(k,2)λ^{k-2}  ...  C(k,n-1)λ^{k-n+1} ]
          [ 0    λ^k            C(k,1)λ^{k-1}  ...  C(k,n-2)λ^{k-n+2} ]
          [ ...                     ...             ...               ]
          [ 0    0              ...  λ^k            C(k,1)λ^{k-1}     ]
          [ 0    0              ...  0              λ^k               ].
Thus, it is enough to compute e J for a single Jordan block J. Let J = AI before. Then, N k = 0 for k ~ nand
1 0 eJ =eJ..leN = eA
n-l
Nk
k=O
k!
L-
I 21
-
1
I
1
(n - 2)!
(n - I)!
-
1
(n - 2)!
2!
= eA
1
1
-
2!
1 0
+
0
1
N, as
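The closed form for e^J can be sanity-checked against the defining series Σ_{k≥0} J^k / k!. A floating-point sketch for a single 3 x 3 Jordan block with λ = 2 (30 series terms are more than enough here):

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

n, lam = 3, 2.0
J = [[lam if i == j else (1.0 if j == i + 1 else 0.0) for j in range(n)] for i in range(n)]

# Closed form: entry (i, j) of e^J is e^lam / (j - i)! for j >= i, else 0
closed = [[math.exp(lam) / math.factorial(j - i) if j >= i else 0.0
           for j in range(n)] for i in range(n)]

# Truncated power series sum_{k=0}^{30} J^k / k!
series = [[float(i == j) for j in range(n)] for i in range(n)]
P = [[float(i == j) for j in range(n)] for i in range(n)]
for k in range(1, 31):
    P = matmul(P, J)
    series = [[series[i][j] + P[i][j] / math.factorial(k) for j in range(n)] for i in range(n)]

err = max(abs(series[i][j] - closed[i][j]) for i in range(n) for j in range(n))
print(err < 1e-9)   # prints: True
```

The agreement is up to round-off only; the truncation error of the series is far below the tolerance used.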
Example 8.10 (Computing A^k and e^A by using the Jordan canonical form) Compute the power A^k and the exponential matrix e^A by using the Jordan canonical form of A for

    (1) A = [ 2 1  4 ]        (2) A = [  0 1  0 0 ]
            [ 0 2 -1 ]                [  0 0  1 0 ]
            [ 0 0  3 ],               [  0 0  0 1 ]
                                      [ -1 4 -6 4 ].
Solution: (1) From Examples 8.4 and 8.7, one can see that

    A = QJQ^{-1} = [ 1 0  3 ] [ 2 1 0 ] [ 1 0 -3 ]
                   [ 0 1 -1 ] [ 0 2 0 ] [ 0 1  1 ]
                   [ 0 0  1 ] [ 0 0 3 ] [ 0 0  1 ],

and

    J^k = [ 2^k  k2^{k-1}  0   ]
          [ 0    2^k       0   ]
          [ 0    0         3^k ].

Hence,

    A^k = QJ^kQ^{-1} = [ 2^k  k2^{k-1}  -3·2^k + k2^{k-1} + 3^{k+1} ]
                       [ 0    2^k       2^k - 3^k                   ]
                       [ 0    0         3^k                         ]

and

    e^A = Qe^JQ^{-1} = [ e^2  e^2  -2e^2 + 3e^3 ]
                       [ 0    e^2  e^2 - e^3    ]
                       [ 0    0    e^3          ].
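The entries of A^k obtained from the Jordan decomposition can be cross-checked by repeated multiplication; a brief sketch in exact integer arithmetic (the closed form coded below is the one derived for this particular A, stated here as an assumption to be tested):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[2, 1, 4], [0, 2, -1], [0, 0, 3]]

def A_power(k):   # closed form via the Jordan canonical form of A
    return [[2**k, k * 2**(k - 1), -3 * 2**k + k * 2**(k - 1) + 3**(k + 1)],
            [0,    2**k,           2**k - 3**k],
            [0,    0,              3**k]]

P, ok = A, True
for k in range(1, 8):
    ok = ok and (P == A_power(k))
    P = matmul(P, A)
print(ok)   # prints: True
```

Seven powers agree exactly, which is a strong check on the decomposition.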
(2) Do the same process as in (1). From Examples 8.5 and 8.8,

    A = QJQ^{-1} = [ 1 -1 1 -1 ] [ 1 1 0 0 ] [  0  1  0 0 ]
                   [ 1  0 0  0 ] [ 0 1 1 0 ] [  0 -1  1 0 ]
                   [ 1  1 0  0 ] [ 0 0 1 1 ] [  0  1 -2 1 ]
                   [ 1  2 1  0 ] [ 0 0 0 1 ] [ -1  3 -3 1 ],

and

    J^k = [ 1  C(k,1)  C(k,2)  C(k,3) ]          e^J = e [ 1  1  1/2!  1/3! ]
          [ 0  1       C(k,1)  C(k,2) ],                 [ 0  1  1     1/2! ]
          [ 0  0       1       C(k,1) ]                  [ 0  0  1     1    ]
          [ 0  0       0       1      ]                  [ 0  0  0     1    ].

Now, one can compute A^k = QJ^kQ^{-1} and e^A = Qe^JQ^{-1}. For example,

    e^A = Qe^JQ^{-1} = e [  1/3   1/2   0      1/6  ]
                         [ -1/6   1    -1/2    2/3  ]
                         [ -2/3   5/2  -3      13/6 ]
                         [ -13/6  8    -21/2   17/3 ].  □
Example 8.11 (Computing A^k and e^A by using the Jordan canonical form) Compute the power A^k and the exponential matrix e^A by using the Jordan canonical form of A for

    A = [  1 1 1 1 ]
        [  0 2 2 0 ]
        [  0 0 2 0 ]
        [ -1 1 0 3 ].
Solution: (1) The characteristic polynomial of A is det(λI - A) = λ^4 - 8λ^3 + 24λ^2 - 32λ + 16 = (λ - 2)^4, and λ = 2 is an eigenvalue of multiplicity 4. Since

    rank(A - 2I) = rank [ -1 1 1 1 ]
                        [  0 0 2 0 ]
                        [  0 0 0 0 ]
                        [ -1 1 0 1 ]  = 2

and

    rank(A - 2I)^2 = rank [ 0 0 1 0 ]
                          [ 0 0 0 0 ]
                          [ 0 0 0 0 ]
                          [ 0 0 1 0 ]  = 1,

the Jordan canonical form J of A must be of the form

    J = [ 2 1 0 0 ]
        [ 0 2 1 0 ]
        [ 0 0 2 0 ]
        [ 0 0 0 2 ],

and a basis-change matrix Q such that Q^{-1}AQ = J is

    Q = [ 1 1 0 1 ]              Q^{-1} = [  0  0 0  1 ]
        [ 0 2 0 1 ],    and then          [ -1  1 0  1 ]
        [ 0 0 1 0 ]                       [  0  0 1  0 ]
        [ 1 0 0 0 ]                       [  2 -1 0 -2 ].

Therefore,
    A^k = QJ^kQ^{-1},   where   J^k = [ 2^k  k2^{k-1}  C(k,2)2^{k-2}  0   ]
                                      [ 0    2^k       k2^{k-1}       0   ]
                                      [ 0    0         2^k            0   ]
                                      [ 0    0         0              2^k ].

Hence,

    A^k = [ 2^k - k2^{k-1}  k2^{k-1}  k2^{k-1} + C(k,2)2^{k-2}  k2^{k-1}        ]
          [ 0               2^k       k2^k                      0               ]
          [ 0               0         2^k                       0               ]
          [ -k2^{k-1}       k2^{k-1}  C(k,2)2^{k-2}             2^k + k2^{k-1}  ].
(2) With the same notation,

    e^A = Qe^JQ^{-1},   where   e^J = e^{2I} e^N = e^2 [ 1 1 1/2 0 ]
                                                       [ 0 1 1   0 ]
                                                       [ 0 0 1   0 ]
                                                       [ 0 0 0   1 ].

Thus we have

    e^A = Qe^JQ^{-1} = e^2 [  0  1  3/2  1 ]
                           [  0  1  2    0 ]
                           [  0  0  1    0 ]
                           [ -1  1  1/2  2 ].  □

Problem 8.6 Compute A^k and e^A by using the Jordan canonical form for

    (1) A = [ 2 1 2 0 ]        (2) A = [  2  1 -1 1 ]
            [ 0 1 0 0 ]                [  2  1 -1 2 ]
            [ 0 0 2 0 ],               [ -2 -3  1 4 ]
            [ 0 0 0 1 ]                [ -2 -2  0 0 ].
8.4 Cayley-Hamilton theorem

As we saw in earlier chapters, the association of the characteristic polynomial with a matrix is very useful in studying matrices. In this section, using this association of polynomials with matrices, we prove one more useful theorem, called the Cayley-Hamilton theorem, which makes the calculation of matrix polynomials simple and has many applications to real problems.

Let f(x) = a_m x^m + a_{m-1} x^{m-1} + ... + a_1 x + a_0 be a polynomial, and let A be an n x n square matrix. The matrix defined by

    f(A) = a_m A^m + a_{m-1} A^{m-1} + ... + a_1 A + a_0 I

is called a matrix polynomial of A. For example, if f(x) = x^2 - 2x + 2, then f(A) = A^2 - 2A + 2I_2 for a 2 x 2 matrix A.
Problem 8.7 Let λ be an eigenvalue of a matrix A. For any polynomial f(x), show that f(λ) is an eigenvalue of the matrix polynomial f(A).

Theorem 8.5 (Cayley-Hamilton) For any n x n matrix A, if f(λ) = det(λI - A) is the characteristic polynomial of A, then f(A) = 0.

Proof: If A is diagonal, the proof is an easy exercise. For an arbitrary square matrix A, let its Jordan canonical form be

    J = Q^{-1}AQ = [ J1        0  ]
                   [     ...      ],
                   [ 0         Js ]

so that f(A) = Qf(J)Q^{-1}. Since

    f(J) = [ f(J1)        0     ]
           [       ...          ],
           [ 0            f(Js) ]

it is sufficient to show that f(Ji) = 0 for each Jordan block Ji. Let Ji = λ0 I + N, with eigenvalue λ0 of multiplicity m, in which N^m = 0. Since f(λ) = det(λI - A) = det(λI - J) = (λ - λ0)^m g(λ) for some polynomial g(λ), we have

    f(Ji) = (Ji - λ0 I)^m g(Ji) = (λ0 I + N - λ0 I)^m g(Ji) = N^m g(Ji) = 0 · g(Ji) = 0.  □
Remark: For a Jordan block

\[
J = \begin{bmatrix} \lambda_0 & 1 & & \\ & \lambda_0 & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_0 \end{bmatrix}
\]

with a single eigenvalue \lambda_0 of multiplicity m, we have

\[
J^k = \begin{bmatrix} \lambda_0^k & k\lambda_0^{k-1} & \cdots & \binom{k}{m-1}\lambda_0^{k-m+1} \\ & \lambda_0^k & \ddots & \vdots \\ & & \ddots & k\lambda_0^{k-1} \\ 0 & & & \lambda_0^k \end{bmatrix}.
\]

Hence, for any polynomial p(\lambda),

\[
p(J) = \begin{bmatrix} p(\lambda_0) & \partial_\lambda p(\lambda_0) & \cdots & \dfrac{\partial_\lambda^{m-1} p(\lambda_0)}{(m-1)!} \\ & p(\lambda_0) & \ddots & \vdots \\ & & \ddots & \partial_\lambda p(\lambda_0) \\ 0 & & & p(\lambda_0) \end{bmatrix},
\]

where \partial_\lambda denotes the derivative with respect to \lambda. In particular, for the characteristic polynomial f(\lambda) = \det(\lambda I - J) = (\lambda - \lambda_0)^m, we have f(\lambda_0) = \partial_\lambda f(\lambda_0) = \cdots = \partial_\lambda^{m-1} f(\lambda_0) = 0, and hence f(J) = 0.

Example 8.12 The characteristic polynomial of

\[
A = \begin{bmatrix} 3 & 6 & 6 \\ 0 & 2 & 0 \\ -3 & -12 & -6 \end{bmatrix}
\]

is f(\lambda) = \det(\lambda I - A) = \lambda^3 + \lambda^2 - 6\lambda, and

\[
f(A) = A^3 + A^2 - 6A
= \begin{bmatrix} 27 & 78 & 54 \\ 0 & 8 & 0 \\ -27 & -102 & -54 \end{bmatrix} + \begin{bmatrix} -9 & -42 & -18 \\ 0 & 4 & 0 \\ 9 & 30 & 18 \end{bmatrix} - 6\begin{bmatrix} 3 & 6 & 6 \\ 0 & 2 & 0 \\ -3 & -12 & -6 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \qquad \square
\]
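Example 8.12 can also be verified numerically. This short NumPy check (not from the book) evaluates the characteristic polynomial at A itself:

```python
import numpy as np

# Matrix from Example 8.12; its characteristic polynomial is λ^3 + λ^2 - 6λ.
A = np.array([[3., 6., 6.],
              [0., 2., 0.],
              [-3., -12., -6.]])

# np.poly returns the characteristic polynomial coefficients (highest first).
coeffs = np.poly(A)
assert np.allclose(coeffs, [1., 1., -6., 0.])

# Cayley-Hamilton: f(A) = A^3 + A^2 - 6A must be the zero matrix.
f_A = np.linalg.matrix_power(A, 3) + np.linalg.matrix_power(A, 2) - 6 * A
assert np.allclose(f_A, np.zeros((3, 3)))
```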
Problem 8.8 Let us prove the Cayley-Hamilton Theorem 8.5 as follows: by setting \lambda = A, f(A) = \det(AI - A) = \det 0 = 0. Is it correct or not? If not, what is the wrong step?

Problem 8.9 Prove the Cayley-Hamilton theorem for a diagonal matrix A by computing f(A) directly. By using this, do the same for a diagonalizable matrix A.
The Cayley-Hamilton theorem can be used to find the inverse of a nonsingular matrix. If f(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 is the characteristic polynomial of a matrix A, then

\[
0 = f(A) = A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I,
\]

or

\[
-a_0 I = (A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1 I)A.
\]

Since a_0 = f(0) = \det(0I - A) = \det(-A) = (-1)^n \det A, A is nonsingular if and only if a_0 = (-1)^n \det A \neq 0. Therefore, if A is nonsingular,

\[
A^{-1} = -\frac{1}{a_0}\left(A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1 I\right).
\]
Example 8.13 (Compute A^{-1} by the Cayley-Hamilton theorem) The characteristic polynomial of the matrix

\[
A = \begin{bmatrix} 4 & 2 & -2 \\ -5 & 3 & 2 \\ -2 & 4 & 1 \end{bmatrix}
\]

is f(\lambda) = \det(\lambda I_3 - A) = \lambda^3 - 8\lambda^2 + 17\lambda - 10, and the Cayley-Hamilton theorem yields

\[
A^3 - 8A^2 + 17A - 10I = 0.
\]

Hence,

\[
A^{-1} = \frac{1}{10}\left(A^2 - 8A + 17I\right) = \frac{1}{10}\begin{bmatrix} -5 & -10 & 10 \\ 1 & 0 & 2 \\ -14 & -20 & 22 \end{bmatrix}. \qquad \square
\]
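The inverse formula of Example 8.13 is easy to sanity-check. The following sketch (not from the book) assumes the matrix A as reconstructed above:

```python
import numpy as np

# Matrix from Example 8.13 with characteristic polynomial λ^3 - 8λ^2 + 17λ - 10.
A = np.array([[4., 2., -2.],
              [-5., 3., 2.],
              [-2., 4., 1.]])

assert np.allclose(np.poly(A), [1., -8., 17., -10.])

# Cayley-Hamilton inverse: A^{-1} = (A^2 - 8A + 17I)/10.
A_inv = (A @ A - 8 * A + 17 * np.eye(3)) / 10
assert np.allclose(A @ A_inv, np.eye(3))
assert np.allclose(A_inv, np.linalg.inv(A))
```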
Problem 8.10 Let A and B be square matrices, not necessarily of the same order, and let f(\lambda) = \det(\lambda I - A) be the characteristic polynomial of A. Show that f(B) is invertible if and only if A has no eigenvalue in common with B.
The Cayley-Hamilton theorem can also be used to simplify the calculation of matrix polynomials. Let p(\lambda) be any polynomial and let f(\lambda) be the characteristic polynomial of a square matrix A. A theorem of algebra tells us that there are polynomials q(\lambda) and r(\lambda) such that

\[
p(\lambda) = q(\lambda)f(\lambda) + r(\lambda),
\]

where the degree of r(\lambda) is less than the degree of f(\lambda). Then

\[
p(A) = q(A)f(A) + r(A).
\]

By the Cayley-Hamilton theorem, f(A) = 0 and p(A) = r(A).
Thus, the problem of evaluating a polynomial of an n x n matrix A, or in particular a power A^k, can be reduced to the problem of evaluating a matrix polynomial of degree less than n.

Example 8.14 The characteristic polynomial of the matrix A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} is f(\lambda) = \lambda^2 - 2\lambda - 3. Let p(\lambda) = \lambda^4 - 7\lambda^3 - 3\lambda^2 + \lambda + 4 be a polynomial. A division by f(\lambda) gives that

\[
p(\lambda) = (\lambda^2 - 5\lambda - 10)f(\lambda) - 34\lambda - 26.
\]

Therefore,

\[
p(A) = (A^2 - 5A - 10I)f(A) - 34A - 26I = -34A - 26I
= -34\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} - \begin{bmatrix} 26 & 0 \\ 0 & 26 \end{bmatrix}
= \begin{bmatrix} -60 & -68 \\ -68 & -60 \end{bmatrix}. \qquad \square
\]
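The polynomial division in Example 8.14 can be reproduced with NumPy's legacy polynomial routines; this check is not part of the book:

```python
import numpy as np

# Example 8.14: reduce p(A), deg p = 4, to a degree-1 polynomial in A.
A = np.array([[1., 2.],
              [2., 1.]])
p = [1., -7., -3., 1., 4.]     # p(λ) = λ^4 - 7λ^3 - 3λ^2 + λ + 4
f = [1., -2., -3.]             # f(λ) = λ^2 - 2λ - 3 = det(λI - A)

q, r = np.polydiv(p, f)        # p = q·f + r, with deg r < deg f
assert np.allclose(q, [1., -5., -10.])
assert np.allclose(r, [-34., -26.])

# By Cayley-Hamilton, p(A) = r(A) = -34A - 26I.
pA = sum(c * np.linalg.matrix_power(A, i) for i, c in enumerate(reversed(p)))
assert np.allclose(pA, -34 * A - 26 * np.eye(2))
```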
Example 8.15 (Computing A^k by the Cayley-Hamilton theorem) Compute the power A^{10} by using the Cayley-Hamilton theorem for

\[
A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 0 \\ 0 & 0 & 2 & 0 \\ -1 & 1 & 0 & 3 \end{bmatrix}.
\]

Solution: The characteristic polynomial of A is f(\lambda) = \det(\lambda I - A) = \lambda^4 - 8\lambda^3 + 24\lambda^2 - 32\lambda + 16 = (\lambda - 2)^4; see Example 8.11. A division by f(\lambda) gives

\[
\lambda^{10} = q(\lambda)f(\lambda) + r(\lambda)
\]

with quotient polynomial q(\lambda) = \lambda^6 + 8\lambda^5 + 40\lambda^4 + 160\lambda^3 + 560\lambda^2 + 1792\lambda + 5376 and remainder polynomial r(\lambda) = 15360\lambda^3 - 80640\lambda^2 + 143360\lambda - 86016. Hence,

\[
A^{10} = r(A) = 15360A^3 - 80640A^2 + 143360A - 86016I
= \begin{bmatrix} -4096 & 5120 & 16640 & 5120 \\ 0 & 1024 & 10240 & 0 \\ 0 & 0 & 1024 & 0 \\ -5120 & 5120 & 11520 & 6144 \end{bmatrix}.
\]

(Compare this result with A^k given in Example 8.11.) One may notice that this computational method for A^n becomes increasingly complicated as n grows. A simpler method will be shown later in Example 8.22. \square
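The remainder-based computation of A^{10} in Example 8.15 can be verified as follows (a numerical sketch, not from the book):

```python
import numpy as np

# Example 8.15: A^10 via the remainder of λ^10 divided by (λ - 2)^4.
A = np.array([[1., 1., 1., 1.],
              [0., 2., 2., 0.],
              [0., 0., 2., 0.],
              [-1., 1., 0., 3.]])

p = np.zeros(11); p[0] = 1.0          # λ^10, coefficients highest first
f = np.poly([2., 2., 2., 2.])         # (λ - 2)^4
q, r = np.polydiv(p, f)
assert np.allclose(r, [15360., -80640., 143360., -86016.])

A10 = sum(c * np.linalg.matrix_power(A, i) for i, c in enumerate(reversed(r)))
assert np.allclose(A10, np.linalg.matrix_power(A, 10))
```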
Problem 8.11 For the matrix A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}, (1) evaluate the power matrix A^{10} and the inverse A^{-1}; (2) evaluate the matrix polynomial A^5 + 3A^4 + A^3 - A^2 + 4A + 6I.
8.5 The minimal polynomial of a matrix

Let A be a square matrix of order n and let

\[
f(\lambda) = \det(\lambda I - A) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 = \prod_{i=1}^{t}(\lambda - \lambda_i)^{m_{\lambda_i}}
\]

be the characteristic polynomial of A, where m_{\lambda_i} is the multiplicity of the eigenvalue \lambda_i. Then, the Cayley-Hamilton theorem says that

\[
A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I = \prod_{i=1}^{t}(A - \lambda_i I)^{m_{\lambda_i}} = 0.
\]

The minimal polynomial of a matrix A is the monic polynomial m(\lambda) = \sum_{i=0}^{m} c_i\lambda^i of smallest degree m such that

\[
m(A) = \sum_{i=0}^{m} c_i A^i = 0.
\]

Clearly, the minimal polynomial m(\lambda) divides any polynomial p(\lambda) satisfying p(A) = 0. In fact, if p(\lambda) = q(\lambda)m(\lambda) + r(\lambda) as on page 297, where the degree of r(\lambda) is less than the degree of m(\lambda), then 0 = p(A) = q(A)m(A) + r(A) = r(A), and the minimality of m(\lambda) implies that r(\lambda) = 0. In particular, the minimal polynomial m(\lambda) divides the characteristic polynomial, so that

\[
m(\lambda) = \prod_{i=1}^{t}(\lambda - \lambda_i)^{k_i}
\]

with k_i \le m_{\lambda_i}. For example, the characteristic polynomial of the n x n zero matrix is \lambda^n and its minimal polynomial is just \lambda. Clearly, any two similar matrices have the same minimal polynomial, because Q^{-1}m(A)Q = m(Q^{-1}AQ) for any invertible matrix Q.

Example 8.16 For a diagonal matrix

\[
A = \begin{bmatrix} 2 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 5 & 0 \\ 0 & 0 & 0 & 0 & 5 \end{bmatrix},
\]

its characteristic polynomial is f(\lambda) = (\lambda - 2)^3(\lambda - 5)^2. However, for the polynomial m(\lambda) = (\lambda - 2)(\lambda - 5), the matrix m(A) is

\[
m(A) = (A - 2I)(A - 5I) = \begin{bmatrix} 0 & & & & \\ & 0 & & & \\ & & 0 & & \\ & & & 3 & \\ & & & & 3 \end{bmatrix}\begin{bmatrix} -3 & & & & \\ & -3 & & & \\ & & -3 & & \\ & & & 0 & \\ & & & & 0 \end{bmatrix} = 0.
\]

Hence, the minimal polynomial of A is m(\lambda) = (\lambda - 2)(\lambda - 5). \square
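Example 8.16 can be checked directly; the sketch below (not from the book) confirms that the product of the two distinct linear factors already annihilates A:

```python
import numpy as np

# Example 8.16: A = diag(2,2,2,5,5) has minimal polynomial (λ-2)(λ-5),
# even though its characteristic polynomial is (λ-2)^3 (λ-5)^2.
A = np.diag([2., 2., 2., 5., 5.])
I = np.eye(5)

m_A = (A - 2 * I) @ (A - 5 * I)
assert np.allclose(m_A, np.zeros((5, 5)))

# Neither linear factor alone annihilates A.
assert not np.allclose(A - 2 * I, 0)
assert not np.allclose(A - 5 * I, 0)
```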
Example 8.17 (The minimal polynomial of a diagonal matrix) (1) Any diagonal matrix of the form \lambda_0 I has the minimal polynomial \lambda - \lambda_0. In particular, the minimal polynomial of the zero matrix is the monomial \lambda, and the minimal polynomial of the identity matrix is \lambda - 1. (2) If an n x n matrix A has n distinct eigenvalues \lambda_1, \ldots, \lambda_n, then its minimal polynomial coincides with the characteristic polynomial f(\lambda) = \prod_{i=1}^{n}(\lambda - \lambda_i). In fact, for the diagonal matrix D having distinct diagonal entries \lambda_1, \ldots, \lambda_n successively and for any given j, the (j, j)-entry of \prod_{i \neq j}(D - \lambda_i I) is equal to \prod_{i \neq j}(\lambda_j - \lambda_i), which is not zero. \square
Example 8.18 (The minimal polynomial of a Jordan canonical matrix J having one or two blocks)

(1) For any 5 x 5 matrix A similar to a Jordan block of the form

\[
Q^{-1}AQ = J = \begin{bmatrix} \lambda_0 & 1 & 0 & 0 & 0 \\ 0 & \lambda_0 & 1 & 0 & 0 \\ 0 & 0 & \lambda_0 & 1 & 0 \\ 0 & 0 & 0 & \lambda_0 & 1 \\ 0 & 0 & 0 & 0 & \lambda_0 \end{bmatrix},
\]

its minimal polynomial is equal to the characteristic polynomial f(\lambda) = (\lambda - \lambda_0)^5, because (J - \lambda_0 I)^4 \neq 0 but (J - \lambda_0 I)^5 = 0.

(2) For a matrix J having two Jordan blocks belonging to a single eigenvalue \lambda_0, say

\[
J = \begin{bmatrix} J_1 & 0 \\ 0 & J_2 \end{bmatrix} \quad \text{with} \quad J_1 = \begin{bmatrix} \lambda_0 & 1 & 0 \\ 0 & \lambda_0 & 1 \\ 0 & 0 & \lambda_0 \end{bmatrix}, \quad J_2 = \begin{bmatrix} \lambda_0 & 1 \\ 0 & \lambda_0 \end{bmatrix},
\]

the minimal polynomial of the smaller block J_2 is a divisor of the minimal polynomial of the larger block J_1. In general, if a matrix A or its Jordan canonical form J has a single eigenvalue \lambda_0, then its minimal polynomial is (\lambda - \lambda_0)^k, where k is the smallest positive integer \ell such that (A - \lambda_0 I)^{\ell} = 0. In fact, such a number k is the order of the largest Jordan block belonging to \lambda_0.
(3) For a matrix J having two Jordan blocks belonging to two different eigenvalues \lambda_0 \neq \lambda_1, respectively, say

\[
J = \begin{bmatrix} J_1 & 0 \\ 0 & J_2 \end{bmatrix} \quad \text{with} \quad J_1 = \begin{bmatrix} \lambda_0 & 1 & 0 & 0 & 0 \\ 0 & \lambda_0 & 1 & 0 & 0 \\ 0 & 0 & \lambda_0 & 1 & 0 \\ 0 & 0 & 0 & \lambda_0 & 1 \\ 0 & 0 & 0 & 0 & \lambda_0 \end{bmatrix}, \quad J_2 = \begin{bmatrix} \lambda_1 & 1 & 0 \\ 0 & \lambda_1 & 1 \\ 0 & 0 & \lambda_1 \end{bmatrix},
\]

the minimal polynomial of J is the product of the minimal polynomials of J_1 and J_2, which is (\lambda - \lambda_0)^5(\lambda - \lambda_1)^3.

In general, for a Jordan canonical matrix J, let J_\lambda denote the direct sum of the Jordan blocks belonging to the eigenvalue \lambda. Then J = \bigoplus_{i=1}^{t} J_{\lambda_i}, where t is the number of the distinct eigenvalues of J. In this case, the minimal polynomial of J is the product of the minimal polynomials of the summands J_{\lambda_i}'s. \square

By a method similar to Example 8.18 (2) and (3), with Step 2(i) on page 279, one can obtain the following theorem.

Theorem 8.6 For any n x n matrix A or its Jordan canonical matrix J, its minimal polynomial is \prod_{i=1}^{t}(\lambda - \lambda_i)^{k_i}, where \lambda_1, \ldots, \lambda_t are the distinct eigenvalues of A and k_i is the order of the largest Jordan block in J belonging to the eigenvalue \lambda_i. Or equivalently, k_i is the smallest positive integer \ell such that \operatorname{rank}(A - \lambda_i I)^{\ell} + m_{\lambda_i} = n, where m_{\lambda_i} is the multiplicity of the eigenvalue \lambda_i.
Corollary 8.7 A matrix A is diagonalizable if and only if its minimal polynomial is equal to \prod_{i=1}^{t}(\lambda - \lambda_i), where \lambda_1, \ldots, \lambda_t are the distinct eigenvalues of A.

Example 8.19 (Computing the minimal polynomial) Compute the minimal polynomial of A for

\[
(1)\ A = \begin{bmatrix} 2 & 1 & -1 \\ 0 & 2 & 4 \\ 0 & 0 & 3 \end{bmatrix}, \qquad
(2)\ A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & 4 & -6 & 4 \end{bmatrix}.
\]

Solution: (1) Since A is triangular, its eigenvalues are \lambda_1 = \lambda_2 = 2 and \lambda_3 = 3. But rank(A - 2I) = 2, and so A is not diagonalizable. Hence, its minimal polynomial is m(\lambda) = (\lambda - 2)^2(\lambda - 3).

(2) Recalling Example 8.5, we know that the eigenvalue of A is \lambda = 1 of multiplicity 4, and rank(A - I) = 3. It implies that the Jordan canonical form of A is

\[
J = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix},
\]

and the minimal polynomial of A is m(\lambda) = (\lambda - 1)^4, by Theorem 8.6. \square
Example 8.20 Compute the minimal polynomial of A for

\[
A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 0 \\ 0 & 0 & 2 & 0 \\ -1 & 1 & 0 & 3 \end{bmatrix}.
\]

Solution: In Example 8.11, we showed that the characteristic polynomial of A is f(\lambda) = (\lambda - 2)^4, rank(A - 2I) = 2, rank(A - 2I)^2 = 1, and the Jordan canonical form J of A is

\[
J = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}.
\]

So, the minimal polynomial of A is m(\lambda) = (\lambda - 2)^3, by Theorem 8.6. \square

Problem 8.12 In Example 8.2, we have seen that there are seven possible (nonsimilar) Jordan canonical matrices of order 5 that have a single eigenvalue. Compute the minimal polynomial of each of them.
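For a single eigenvalue, the exponent in the minimal polynomial of Example 8.20 is just the first power at which A - 2I vanishes. A quick NumPy check (not from the book):

```python
import numpy as np

# Example 8.20: the exponent k in m(λ) = (λ-2)^k is the smallest k
# with (A - 2I)^k = 0, i.e. the size of the largest Jordan block.
A = np.array([[1., 1., 1., 1.],
              [0., 2., 2., 0.],
              [0., 0., 2., 0.],
              [-1., 1., 0., 3.]])
N = A - 2 * np.eye(4)

ranks = [np.linalg.matrix_rank(np.linalg.matrix_power(N, j)) for j in (1, 2, 3)]
assert ranks == [2, 1, 0]      # rank N = 2, rank N^2 = 1, N^3 = 0

k = next(j for j in range(1, 5) if np.allclose(np.linalg.matrix_power(N, j), 0))
assert k == 3                  # minimal polynomial is (λ - 2)^3
```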
8.6 Applications

8.6.1 The power matrix A^k again

We already know how to compute a power matrix A^k in two ways: one is by using the Jordan canonical form J of A with a basis-change matrix Q, as shown in Section 8.3; the other is by using the Cayley-Hamilton theorem with the characteristic polynomial of A, as shown in Section 8.4. In this section, we introduce a third method using the minimal polynomial, as a possibly simpler method. The following example demonstrates how to compute a power A^k and A^{-1} for a diagonalizable matrix A by using the minimal polynomial instead of the characteristic polynomial, and also without using the Jordan canonical form J of A.
Example 8.21 (Computing A^k by the minimal polynomial when A is diagonalizable) Compute the power A^k and A^{-1} by using the minimal polynomial for the symmetric matrix

\[
A = \begin{bmatrix} 4 & 0 & 1 & -1 \\ 0 & 4 & 1 & 1 \\ 1 & 1 & 5 & 0 \\ -1 & 1 & 0 & 5 \end{bmatrix}.
\]

Solution: Its characteristic polynomial is f(\lambda) = (\lambda - 3)^2(\lambda - 6)^2. Since A is symmetric and so diagonalizable, its minimal polynomial is m(\lambda) = (\lambda - 3)(\lambda - 6), or equivalently m(A) = A^2 - 9A + 18I = 0. Hence,

\[
A^{-1} = -\frac{1}{18}(A - 9I) = -\frac{1}{18}\begin{bmatrix} -5 & 0 & 1 & -1 \\ 0 & -5 & 1 & 1 \\ 1 & 1 & -4 & 0 \\ -1 & 1 & 0 & -4 \end{bmatrix}.
\]

To compute A^k for any natural number k, first note that the power A^k with k \ge 2 can be written as a linear combination of I and A, because A^2 = 9A - 18I. (See page 297.) Hence, one can write

\[
A^k = x_0 I + x_1 A
\]

with unknown coefficients x_i's. Now, by multiplying an eigenvector x of A belonging to each eigenvalue \lambda in this equation, that is, by computing A^k x = (x_0 I + x_1 A)x, we have a system of equations:

as \lambda = 3: \quad 3^k = x_0 + 3x_1,
as \lambda = 6: \quad 6^k = x_0 + 6x_1.

Its solution is x_0 = 2 \cdot 3^k - 6^k and x_1 = \tfrac{1}{3}(6^k - 3^k). Hence,

\[
A^k = (2 \cdot 3^k - 6^k)I + \frac{6^k - 3^k}{3}A
= \frac{1}{3}\begin{bmatrix} 2 \cdot 3^k + 6^k & 0 & 6^k - 3^k & 3^k - 6^k \\ 0 & 2 \cdot 3^k + 6^k & 6^k - 3^k & 6^k - 3^k \\ 6^k - 3^k & 6^k - 3^k & 3^k + 2 \cdot 6^k & 0 \\ 3^k - 6^k & 6^k - 3^k & 0 & 3^k + 2 \cdot 6^k \end{bmatrix}. \qquad \square
\]
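The two formulas of Example 8.21 can be tested numerically; this sketch (not from the book) assumes the symmetric A reconstructed above:

```python
import numpy as np

# Example 8.21: symmetric A with minimal polynomial (λ-3)(λ-6).
A = np.array([[4., 0., 1., -1.],
              [0., 4., 1., 1.],
              [1., 1., 5., 0.],
              [-1., 1., 0., 5.]])
I = np.eye(4)

assert np.allclose(A @ A - 9 * A + 18 * I, 0)          # m(A) = 0
assert np.allclose(np.linalg.inv(A), -(A - 9 * I) / 18)

def A_power(k):
    x0 = 2 * 3.0**k - 6.0**k
    x1 = (6.0**k - 3.0**k) / 3
    return x0 * I + x1 * A

assert np.allclose(A_power(5), np.linalg.matrix_power(A, 5))
```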
Now, we discuss how to compute a power A^k in two separate cases depending on the diagonalizability of A.

(1) Firstly, suppose that A is diagonalizable. Then its minimal polynomial is equal to m(\lambda) = \prod_{i=1}^{t}(\lambda - \lambda_i), where \lambda_1, \ldots, \lambda_t are the distinct eigenvalues of A. Now, by using m(A) = \prod_{i=1}^{t}(A - \lambda_i I) = 0 as in Example 8.21, one can write

\[
A^k = x_0 I + x_1 A + \cdots + x_{t-1}A^{t-1}
\]

with unknowns x_i's. Now, by multiplying an eigenvector x of A belonging to each eigenvalue \lambda_i in this equation, one can have a system of equations

\[
\lambda_i^k = x_0 + \lambda_i x_1 + \cdots + \lambda_i^{t-1}x_{t-1}, \qquad i = 1, \ldots, t.
\]

Its coefficient matrix

\[
\begin{bmatrix} 1 & \lambda_1 & \cdots & \lambda_1^{t-1} \\ \vdots & & & \vdots \\ 1 & \lambda_t & \cdots & \lambda_t^{t-1} \end{bmatrix}
\]

is an invertible Vandermonde matrix of order t because the eigenvalues \lambda_i's are all distinct. Hence, the system is consistent and its unique solution x_i's determines the power A^k completely.

(2) Secondly, let A be any n x n matrix, not necessarily diagonalizable. By Theorem 8.6, the minimal polynomial of A is m(\lambda) = \prod_{i=1}^{t}(\lambda - \lambda_i)^{k_i}, where \lambda_1, \ldots, \lambda_t are the distinct eigenvalues of A and k_i is the smallest positive integer \ell such that \operatorname{rank}(A - \lambda_i I)^{\ell} + m_{\lambda_i} = n. Let s = \sum_{i=1}^{t} k_i be the degree of the minimal polynomial m(\lambda) and let

\[
A^k = x_0 I + x_1 A + \cdots + x_{s-1}A^{s-1}
\]
with unknown x_i's as before. Now, by multiplying an eigenvector x of A belonging to each eigenvalue \lambda_i in this equation, we have

\[
\lambda_i^k = x_0 + \lambda_i x_1 + \cdots + \lambda_i^{s-1}x_{s-1}.
\]

For the eigenvalue \lambda_i, let \{x_1, x_2, \ldots, x_{k_i}\} be a chain of generalized eigenvectors belonging to \lambda_i. Then these vectors satisfy

\[
A x_1 = \lambda_i x_1, \qquad A x_2 = \lambda_i x_2 + x_1, \qquad \ldots
\]

By using the first two equations repeatedly, one can get

\[
A^k x_2 = \lambda_i^k x_2 + k\lambda_i^{k-1} x_1.
\]

On the other hand,

\[
A^k x_2 = (x_0 I + x_1 A + \cdots + x_{s-1}A^{s-1})x_2
= x_0 x_2 + x_1(\lambda_i x_2 + x_1) + \cdots + x_{s-1}\left(\lambda_i^{s-1}x_2 + (s-1)\lambda_i^{s-2}x_1\right)
= \left(x_0 + \lambda_i x_1 + \cdots + \lambda_i^{s-1}x_{s-1}\right)x_2 + \left(x_1 + 2\lambda_i x_2 + \cdots + (s-1)\lambda_i^{s-2}x_{s-1}\right)x_1.
\]

In these two expressions for A^k x_2, since the vectors x_1 and x_2 are linearly independent, their coefficients must be the same. It means that

\[
\begin{aligned}
\lambda_i^k &= x_0 + \lambda_i x_1 + \lambda_i^2 x_2 + \cdots + \lambda_i^{s-1}x_{s-1}, \\
k\lambda_i^{k-1} &= x_1 + 2\lambda_i x_2 + \cdots + (s-1)\lambda_i^{s-2}x_{s-1}.
\end{aligned}
\]

Here, the second equation is the derivative of the first equation with respect to \lambda_i. Similarly, one can write A^k x_{k_i} as a linear combination of the linearly independent vectors x_1, x_2, \ldots, x_{k_i} in two different ways, and then a comparison of their coefficients gives the following k_i equations:

\[
\begin{aligned}
\lambda_i^k &= x_0 + \lambda_i x_1 + \lambda_i^2 x_2 + \cdots + \lambda_i^{s-1}x_{s-1}, \\
\binom{k}{1}\lambda_i^{k-1} &= x_1 + \binom{2}{1}\lambda_i x_2 + \cdots + \binom{s-1}{1}\lambda_i^{s-2}x_{s-1}, \\
\binom{k}{2}\lambda_i^{k-2} &= x_2 + \binom{3}{2}\lambda_i x_3 + \cdots + \binom{s-1}{2}\lambda_i^{s-3}x_{s-1}, \\
&\ \,\vdots
\end{aligned}
\]
Note that the last k_i - 1 equations are equivalent to the consecutive derivatives of the first equation with respect to \lambda_i. For example, two times the third equation is just the second derivative of the first equation, and (k_i - 1)! times the last equation is the (k_i - 1)-th derivative of the first equation. Getting together all of such equations for each eigenvalue \lambda_i, i = 1, \ldots, t, one can get a system of s equations with s unknowns x_j's with an invertible coefficient matrix. Therefore, the unknowns x_0, \ldots, x_{s-1} can be uniquely determined, and so can the power A^k.

Example 8.22 (Computing A^k by the minimal polynomial when A is not diagonalizable) (Example 8.11 again) Compute the power A^k and A^{-1} by using the minimal polynomial for

\[
A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 0 \\ 0 & 0 & 2 & 0 \\ -1 & 1 & 0 & 3 \end{bmatrix}.
\]

Solution: In Example 8.20, the minimal polynomial of A was determined as m(\lambda) = (\lambda - 2)^3. Hence, one can write

\[
A^k = x_0 I + x_1 A + x_2 A^2
\]

with unknown coefficients x_i's and

\[
A^2 = \begin{bmatrix} 0 & 4 & 5 & 4 \\ 0 & 4 & 8 & 0 \\ 0 & 0 & 4 & 0 \\ -4 & 4 & 1 & 8 \end{bmatrix}.
\]

Now, at the eigenvalue \lambda = 2, one can have

as \lambda = 2: \quad 2^k = x_0 + 2x_1 + 2^2 x_2,
take \partial_\lambda: \quad k2^{k-1} = x_1 + 2 \cdot 2\, x_2,
take \partial_\lambda^2: \quad k(k-1)2^{k-2} = 2x_2.

Its solution is x_2 = k(k-1)2^{k-3}, x_1 = k(2-k)2^{k-1}, and x_0 = 2^k\left(\tfrac{1}{2}k^2 - \tfrac{3}{2}k + 1\right). Thus, one can have (the same power as in Example 8.11)

\[
A^k = \begin{bmatrix} 2^k - k2^{k-1} & k2^{k-1} & 2^{k-3}(3k + k^2) & k2^{k-1} \\ 0 & 2^k & k2^k & 0 \\ 0 & 0 & 2^k & 0 \\ -k2^{k-1} & k2^{k-1} & k(k-1)2^{k-3} & 2^k + k2^{k-1} \end{bmatrix}.
\]

To find A^{-1}, first note that m(A) = A^3 - 6A^2 + 12A - 8I = 0. Hence,

\[
A^{-1} = \frac{1}{8}\left(A^2 - 6A + 12I\right) = \frac{1}{8}\begin{bmatrix} 6 & -2 & -1 & -2 \\ 0 & 4 & -4 & 0 \\ 0 & 0 & 4 & 0 \\ 2 & -2 & 1 & 2 \end{bmatrix}. \qquad \square
\]

Problem 8.13 (Problem 8.11 again) For the matrix A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}, evaluate the power matrix A^{10} and the inverse A^{-1}.

Problem 8.14 (Problem 8.6 again) Compute A^k by using the minimal polynomial for

\[
(1)\ A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & -3 & 1 \end{bmatrix}, \qquad
(2)\ A = \begin{bmatrix} -2 & 1 & -1 & 2 \\ -2 & 1 & -1 & 2 \\ -2 & -3 & 1 & 4 \\ -2 & -2 & 1 & 2 \end{bmatrix}.
\]
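The closed forms of Example 8.22 above can be checked numerically. This sketch (not from the book) uses the solved coefficients x_0, x_1, x_2 directly:

```python
import numpy as np

# Example 8.22: A^k and A^{-1} from the minimal polynomial m(λ) = (λ-2)^3.
A = np.array([[1., 1., 1., 1.],
              [0., 2., 2., 0.],
              [0., 0., 2., 0.],
              [-1., 1., 0., 3.]])
I = np.eye(4)

def A_power(k):
    x2 = k * (k - 1) * 2.0**(k - 3)
    x1 = k * (2 - k) * 2.0**(k - 1)
    x0 = 2.0**k * (0.5 * k * k - 1.5 * k + 1)
    return x0 * I + x1 * A + x2 * (A @ A)

assert np.allclose(A_power(10), np.linalg.matrix_power(A, 10))
assert np.allclose((A @ A - 6 * A + 12 * I) / 8, np.linalg.inv(A))
```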
8.6.2 The exponential matrix e^A again

As a continuation of computing the exponential matrix e^A in Section 8.3, in which we used the Jordan canonical form J of A and a basis-change matrix Q, we introduce another method using the minimal polynomial. This method is quite similar to computing a power A^k as discussed in Section 8.6.1 (see also Example 8.22). It means that we need neither the Jordan canonical form J of A nor a basis-change matrix Q. Because of the similarity to computing A^k, we state only the differences in computing the exponential matrix e^A in this section. We also discuss how to compute e^A in two cases depending on the diagonalizability of A.

(1) First, let A be diagonalizable. Then its minimal polynomial is equal to m(\lambda) = \prod_{i=1}^{t}(\lambda - \lambda_i), where \lambda_1, \ldots, \lambda_t are the distinct eigenvalues of A. Now, as before, one can set

\[
e^A = x_0 I + x_1 A + \cdots + x_{t-1}A^{t-1}
\]

with unknowns x_i's, and by multiplying an eigenvector x of A belonging to each eigenvalue \lambda_i in this equation, one can get

\[
\begin{bmatrix} 1 & \lambda_1 & \cdots & \lambda_1^{t-1} \\ \vdots & & & \vdots \\ 1 & \lambda_t & \cdots & \lambda_t^{t-1} \end{bmatrix}\begin{bmatrix} x_0 \\ \vdots \\ x_{t-1} \end{bmatrix} = \begin{bmatrix} e^{\lambda_1} \\ \vdots \\ e^{\lambda_t} \end{bmatrix}.
\]

The coefficient matrix is invertible and the unique solution x_i's determines the matrix e^A.

(2) Next, let A be any n x n matrix, not necessarily diagonalizable. Then, the minimal polynomial of A is m(\lambda) = \prod_{i=1}^{t}(\lambda - \lambda_i)^{k_i}, where \lambda_1, \ldots, \lambda_t are the distinct eigenvalues of A and k_i is the smallest positive integer \ell such that \operatorname{rank}(A - \lambda_i I)^{\ell} + m_{\lambda_i} = n. Let s = \sum_{i=1}^{t} k_i be the degree of m(\lambda) and let

\[
e^A = x_0 I + x_1 A + \cdots + x_{s-1}A^{s-1}
\]

with unknown x_i's as before. For each eigenvalue \lambda_i and a chain of generalized eigenvectors \{x_1, x_2, \ldots, x_{k_i}\} belonging to \lambda_i, a procedure for e^A x_{k_i} parallel to that for A^k x_{k_i} gives the following k_i equations:

\[
\begin{aligned}
e^{\lambda_i} &= x_0 + \lambda_i x_1 + \lambda_i^2 x_2 + \cdots + \lambda_i^{s-1}x_{s-1}, \\
e^{\lambda_i} &= x_1 + \binom{2}{1}\lambda_i x_2 + \cdots + \binom{s-1}{1}\lambda_i^{s-2}x_{s-1}, \\
\frac{e^{\lambda_i}}{2!} &= x_2 + \binom{3}{2}\lambda_i x_3 + \cdots + \binom{s-1}{2}\lambda_i^{s-3}x_{s-1}, \\
&\ \,\vdots
\end{aligned}
\]

Note that the last k_i - 1 equations are equivalent to the consecutive derivatives of the first equation with respect to \lambda_i. For example, 2! times the third equation is just the second derivative of the first equation, and (k_i - 1)! times the last equation is the (k_i - 1)-th derivative of the first equation. Getting together all of such equations for each eigenvalue \lambda_i, i = 1, \ldots, t, one can get a system of s equations with s unknowns x_j's with an invertible coefficient matrix. Therefore, the unknowns x_0, \ldots, x_{s-1} can be uniquely determined, and so can the exponential e^A.
Example 8.23 (Computing e^A by the minimal polynomial when A is diagonalizable) Compute the exponential matrix e^A by using the minimal polynomial for

\[
A = \begin{bmatrix} 5 & -4 & 4 \\ 12 & -11 & 12 \\ 4 & -4 & 5 \end{bmatrix}.
\]

Solution: First, recall that the matrix A is the coefficient matrix of the system of linear differential equations given in Example 6.16:

\[
\begin{aligned}
y_1' &= 5y_1 - 4y_2 + 4y_3, \\
y_2' &= 12y_1 - 11y_2 + 12y_3, \\
y_3' &= 4y_1 - 4y_2 + 5y_3.
\end{aligned}
\]

It was known that A is diagonalizable with the eigenvalues \lambda_1 = \lambda_2 = 1 and \lambda_3 = -3. Hence, the minimal polynomial of A is m(\lambda) = (\lambda - 1)(\lambda + 3), and one can write e^A = x_0 I + x_1 A with unknowns x_i's. Then

as \lambda = 1: \quad e = x_0 + x_1,
as \lambda = -3: \quad e^{-3} = x_0 - 3x_1.

By solving it, one can get x_0 = \tfrac{1}{4}(3e + e^{-3}), x_1 = \tfrac{1}{4}(e - e^{-3}), and

\[
e^A = x_0 I + x_1 A = \begin{bmatrix} 2e - e^{-3} & -e + e^{-3} & e - e^{-3} \\ 3e - 3e^{-3} & -2e + 3e^{-3} & 3e - 3e^{-3} \\ e - e^{-3} & -e + e^{-3} & 2e - e^{-3} \end{bmatrix}. \qquad \square
\]
Example 8.24 (Example 8.11 again) (Computing e^A by the minimal polynomial when A is not diagonalizable) Compute the exponential matrix e^A by using the minimal polynomial for

\[
A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 0 \\ 0 & 0 & 2 & 0 \\ -1 & 1 & 0 & 3 \end{bmatrix}.
\]

Solution: In Example 8.20, the minimal polynomial of A was computed as m(\lambda) = (\lambda - 2)^3. Hence, one can set, as in Example 8.22, e^A = x_0 I + x_1 A + x_2 A^2 with unknown coefficients x_i's and

\[
A^2 = \begin{bmatrix} 0 & 4 & 5 & 4 \\ 0 & 4 & 8 & 0 \\ 0 & 0 & 4 & 0 \\ -4 & 4 & 1 & 8 \end{bmatrix}.
\]

Now, at the eigenvalue \lambda = 2, one can have

as \lambda = 2: \quad e^2 = x_0 + 2x_1 + 2^2 x_2,
take \partial_\lambda: \quad e^2 = x_1 + 2 \cdot 2\, x_2,
take \partial_\lambda^2: \quad e^2 = 2x_2.

Its solution is x_2 = \tfrac{1}{2}e^2, x_1 = -e^2, and x_0 = e^2. Hence,

\[
e^A = x_0 I + x_1 A + x_2 A^2 = e^2\left(I - A + \tfrac{1}{2}A^2\right)
= \begin{bmatrix} 0 & e^2 & \tfrac{3}{2}e^2 & e^2 \\ 0 & e^2 & 2e^2 & 0 \\ 0 & 0 & e^2 & 0 \\ -e^2 & e^2 & \tfrac{1}{2}e^2 & 2e^2 \end{bmatrix},
\]

as shown in Example 8.11. \square
Problem 8.15 (Problem 8.6 again) Compute e^A by using the minimal polynomial for

\[
(1)\ A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & -3 & 1 \end{bmatrix}, \qquad
(2)\ A = \begin{bmatrix} -2 & 1 & -1 & 2 \\ -2 & 1 & -1 & 2 \\ -2 & -3 & 1 & 4 \\ -2 & -2 & 1 & 2 \end{bmatrix}.
\]
8.6.3 Linear difference equations again

A linear difference equation is a matrix equation x_n = Ax_{n-1} with a k x k matrix A; if an initial vector x_0 is given, then x_n = A^n x_0 for all n. If the matrix A is diagonalizable with k linearly independent eigenvectors v_1, v_2, \ldots, v_k belonging to the eigenvalues \lambda_1, \lambda_2, \ldots, \lambda_k, respectively, a general solution of the linear difference equation x_n = Ax_{n-1} is known to be

\[
x_n = c_1\lambda_1^n v_1 + c_2\lambda_2^n v_2 + \cdots + c_k\lambda_k^n v_k
\]

with constants c_1, c_2, \ldots, c_k. (See Theorem 6.13.) On the other hand, a linear difference equation x_n = Ax_{n-1} can be solved for any square matrix A (not necessarily diagonalizable) by using the power A^n, whose computation was discussed in Sections 8.3 and 8.5.
Example 8.25 Solve the linear difference equation x_n = Ax_{n-1}, where

\[
A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 0 \\ 0 & 0 & 2 & 0 \\ -1 & 1 & 0 & 3 \end{bmatrix}.
\]

Solution: In Example 8.11, it was shown that the matrix A has the eigenvalue 2 with multiplicity 4 and that A is not diagonalizable. However, the solution of x_n = Ax_{n-1} is x_n = A^n x_0, where

\[
A^n = \begin{bmatrix} 2^n - n2^{n-1} & n2^{n-1} & 2^{n-3}(3n + n^2) & n2^{n-1} \\ 0 & 2^n & n2^n & 0 \\ 0 & 0 & 2^n & 0 \\ -n2^{n-1} & n2^{n-1} & n(n-1)2^{n-3} & 2^n + n2^{n-1} \end{bmatrix},
\]

which is given in Examples 8.11 and 8.22. \square
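Iterating the recurrence directly and comparing with A^n x_0 gives a quick consistency check; the initial vector below is an arbitrary choice, not from the book:

```python
import numpy as np

# Example 8.25: iterate x_n = A x_{n-1} and compare with x_n = A^n x_0.
A = np.array([[1., 1., 1., 1.],
              [0., 2., 2., 0.],
              [0., 0., 2., 0.],
              [-1., 1., 0., 3.]])
x0 = np.array([1., 0., -1., 2.])   # hypothetical initial vector

x = x0.copy()
for _ in range(7):
    x = A @ x                      # one step of the difference equation

assert np.allclose(x, np.linalg.matrix_power(A, 7) @ x0)
```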
Problem 8.16 Solve the linear difference equation x_n = Ax_{n-1} for

\[
(1)\ A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix}, \quad
(2)\ A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix}, \quad
(3)\ A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}.
\]
8.6.4 Linear differential equations again

Now, we go back to a system of linear differential equations

\[
y' = Ay \quad \text{with initial condition} \quad y(0) = y_0.
\]

Its solution is known to be y(t) = e^{tA}y_0. (See Theorem 6.27.) In particular, if A is diagonalizable, a general solution of y' = Ay is known to be

\[
y(t) = e^{tA}y_0 = c_1 e^{\lambda_1 t}v_1 + c_2 e^{\lambda_2 t}v_2 + \cdots + c_n e^{\lambda_n t}v_n,
\]

where v_1, v_2, \ldots, v_n are the eigenvectors belonging to the eigenvalues \lambda_1, \lambda_2, \ldots, \lambda_n of A, respectively.

For any square matrix A (not necessarily diagonalizable), the matrix e^{tA} can be computed in two different ways. Firstly, let Q^{-1}AQ = J be the Jordan canonical form of A. Then the solution y(t) = e^{tA}y_0 is

\[
y(t) = Q e^{tJ} Q^{-1} y_0,
\]

where Q^{-1}y_0 = (c_1, \ldots, c_n) and the columns u_i of Q are generalized eigenvectors of A. In particular, if Q^{-1}AQ = J = \lambda I + N is a single Jordan block of order n with corresponding generalized eigenvectors u_i, then the solution becomes

\[
e^{tA}y_0 = e^{\lambda t} Q e^{tN} Q^{-1} y_0
= e^{\lambda t}\,[u_1\ u_2\ \cdots\ u_n]\begin{bmatrix} 1 & t & \tfrac{t^2}{2!} & \cdots & \tfrac{t^{n-1}}{(n-1)!} \\ & 1 & t & & \vdots \\ & & \ddots & \ddots & \\ & & & 1 & t \\ & & & & 1 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}
= e^{\lambda t}\left(\Big(\sum_{k=1}^{n} c_k \frac{t^{k-1}}{(k-1)!}\Big)u_1 + \Big(\sum_{k=2}^{n} c_k \frac{t^{k-2}}{(k-2)!}\Big)u_2 + \cdots + c_n u_n\right).
\]

As a simpler method to compute e^{tA}, one can use the minimal polynomial of tA as discussed in Section 8.6.2. First, note that the minimal polynomial of tA is determined by that of A: if m(\lambda) = \prod_i(\lambda - \lambda_i)^{k_i} is the minimal polynomial of A, then, for t \neq 0, \prod_i(\lambda - t\lambda_i)^{k_i} is that of the matrix tA.
Example 8.26 Solve the linear differential equation y' = Ay with initial condition y(0) = y_0, where

\[
A = \begin{bmatrix} 4 & -3 & -1 \\ 1 & 0 & -1 \\ -1 & 2 & 3 \end{bmatrix}, \qquad y_0 = \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix}.
\]

Solution: Method 1: (i) Note that the characteristic polynomial of A is \det(\lambda I - A) = \lambda^3 - 7\lambda^2 + 16\lambda - 12 = (\lambda - 3)(\lambda - 2)^2 and A is not diagonalizable (because rank(A - 2I) = 2). By taking x_1 = (-1, -1, 1) and x_3 = (2, 1, -1) as eigenvectors belonging to \lambda = 2 and \lambda = 3, respectively, and x_2 = (1, 1, 0) as a generalized eigenvector with (A - 2I)x_2 = x_1, one can compute the Jordan canonical form of A as follows:

\[
Q^{-1}AQ = J = \begin{bmatrix} J_1 & 0 \\ 0 & J_2 \end{bmatrix},
\]

where

\[
J_1 = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}, \quad J_2 = [3], \quad \text{and} \quad Q = [x_1\ x_2\ x_3] = \begin{bmatrix} -1 & 1 & 2 \\ -1 & 1 & 1 \\ 1 & 0 & -1 \end{bmatrix}.
\]

(ii) Let y = Qx. Then the given system changes to x' = Jx with

\[
x(0) = Q^{-1}y(0) = \begin{bmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & -1 & 0 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix} = \begin{bmatrix} 5 \\ 5 \\ 1 \end{bmatrix},
\]

and its solution is

\[
x(t) = e^{tJ}x(0) = \begin{bmatrix} e^{2t} & te^{2t} & 0 \\ 0 & e^{2t} & 0 \\ 0 & 0 & e^{3t} \end{bmatrix}\begin{bmatrix} 5 \\ 5 \\ 1 \end{bmatrix},
\]

since e^{tJ_1} = e^{2t}\begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix} and e^{tJ_2} = [e^{3t}].

(iii) Thus, we get

\[
y(t) = Qx(t) = \begin{bmatrix} -1 & 1 & 2 \\ -1 & 1 & 1 \\ 1 & 0 & -1 \end{bmatrix}\begin{bmatrix} 5e^{2t} + 5te^{2t} \\ 5e^{2t} \\ e^{3t} \end{bmatrix} = \begin{bmatrix} -5te^{2t} + 2e^{3t} \\ -5te^{2t} + e^{3t} \\ 5e^{2t} + 5te^{2t} - e^{3t} \end{bmatrix}.
\]
Method 2: To use the minimal polynomial of the matrix tA, first recall that the characteristic polynomial of A is f(\lambda) = (\lambda - 3)(\lambda - 2)^2 and A is not diagonalizable. Hence, its minimal polynomial coincides with the characteristic polynomial. Therefore, one can write

\[
e^{tA} = x_0(t)I + x_1(t)A + x_2(t)A^2
\]

with unknown coefficient functions x_i(t)'s and

\[
A^2 = \begin{bmatrix} 14 & -14 & -4 \\ 5 & -5 & -4 \\ -5 & 9 & 8 \end{bmatrix}.
\]

Now, at each eigenvalue \lambda, one can have

as \lambda = 2: \quad e^{2t} = x_0(t) + 2x_1(t) + 2^2 x_2(t),
take \partial_\lambda: \quad te^{2t} = x_1(t) + 2 \cdot 2\, x_2(t),
as \lambda = 3: \quad e^{3t} = x_0(t) + 3x_1(t) + 3^2 x_2(t).

Its solution is x_0(t) = -3e^{2t} - 6te^{2t} + 4e^{3t}, x_1(t) = 5te^{2t} - 4e^{3t} + 4e^{2t}, and x_2(t) = e^{3t} - e^{2t} - te^{2t}. Hence, e^{tA} = x_0(t)I + x_1(t)A + x_2(t)A^2 and y(t) = e^{tA}y(0).

Now, one might compare the value y(t) = e^{tA}y(0) with the solution obtained by Method 1. The reader can easily notice that Method 2 is simpler than Method 1. \square
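That the two methods of Example 8.26 agree can be confirmed numerically; a sketch, not from the book, using the matrices reconstructed above:

```python
import math
import numpy as np

# Example 8.26: compare Method 2's e^{tA} y0 with Method 1's closed form.
A = np.array([[4., -3., -1.],
              [1., 0., -1.],
              [-1., 2., 3.]])
y0 = np.array([2., 1., 4.])

def y_method2(t):
    e2, e3 = math.exp(2 * t), math.exp(3 * t)
    x0 = -3 * e2 - 6 * t * e2 + 4 * e3
    x1 = 5 * t * e2 - 4 * e3 + 4 * e2
    x2 = e3 - e2 - t * e2
    return (x0 * np.eye(3) + x1 * A + x2 * (A @ A)) @ y0

def y_method1(t):
    e2, e3 = math.exp(2 * t), math.exp(3 * t)
    return np.array([-5 * t * e2 + 2 * e3,
                     -5 * t * e2 + e3,
                     5 * e2 + 5 * t * e2 - e3])

for t in (0.0, 0.5, 1.0):
    assert np.allclose(y_method2(t), y_method1(t))
```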
Example 8.27 (Computing e^{tA} when A is diagonalizable) (Example 6.16 again) Solve the system of linear differential equations

\[
\begin{aligned}
y_1' &= 5y_1 - 4y_2 + 4y_3, \\
y_2' &= 12y_1 - 11y_2 + 12y_3, \\
y_3' &= 4y_1 - 4y_2 + 5y_3,
\end{aligned}
\]

and also find its particular solution satisfying the initial conditions y_1(0) = 0, y_2(0) = 3 and y_3(0) = 2.

Solution: The matrix form of the system is y' = Ay with

\[
A = \begin{bmatrix} 5 & -4 & 4 \\ 12 & -11 & 12 \\ 4 & -4 & 5 \end{bmatrix},
\]

and its general solution is y = e^{tA}y_0. It was known that A is diagonalizable and the minimal polynomial of A is m(\lambda) = (\lambda - 1)(\lambda + 3). If we write e^{tA} = x_0(t)I + x_1(t)A with unknown functions x_i(t)'s, then

as \lambda = 1: \quad e^t = x_0(t) + x_1(t),
as \lambda = -3: \quad e^{-3t} = x_0(t) - 3x_1(t).

(Compare with Example 8.23.) By solving it, we have x_0(t) = \tfrac{1}{4}(3e^t + e^{-3t}), x_1(t) = \tfrac{1}{4}(e^t - e^{-3t}), and

\[
e^{tA} = \begin{bmatrix} 2e^t - e^{-3t} & -e^t + e^{-3t} & e^t - e^{-3t} \\ 3e^t - 3e^{-3t} & -2e^t + 3e^{-3t} & 3e^t - 3e^{-3t} \\ e^t - e^{-3t} & -e^t + e^{-3t} & 2e^t - e^{-3t} \end{bmatrix}.
\]

Moreover, with the initial conditions y_1(0) = 0, y_2(0) = 3 and y_3(0) = 2, the particular solution is

\[
y(t) = e^{tA}\begin{bmatrix} 0 \\ 3 \\ 2 \end{bmatrix} = \begin{bmatrix} -e^t + e^{-3t} \\ 3e^{-3t} \\ e^t + e^{-3t} \end{bmatrix}.
\]

One might compare this method with that given in Example 6.16. \square

Note: In Example 8.27, we compute e^{tA} by finding the unknown functions x_i(t)'s in e^{tA} = x_0(t)I + x_1(t)A. However, in Example 8.23, we determined e^A = x_0 I + x_1 A with x_0 = \tfrac{1}{4}(3e + e^{-3}) and x_1 = \tfrac{1}{4}(e - e^{-3}). Hence, it may look true that e^{tA} = x_0 I + x_1(tA) with the same constants x_0 and x_1. But it is not so, because if we put e^{tA} = x_0 I + x_1 A, then x_0 and x_1 must be functions of t.
Example 8.28 (Computing e^{tA} when A is not diagonalizable) Solve the system of linear differential equations y'(t) = Ay(t), where

\[
A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 0 \\ 0 & 0 & 2 & 0 \\ -1 & 1 & 0 & 3 \end{bmatrix}.
\]

Solution: In Example 8.20, the minimal polynomial of A was computed as m(\lambda) = (\lambda - 2)^3. Hence, as in Example 8.24, one can set

\[
e^{tA} = x_0(t)I + x_1(t)A + x_2(t)A^2
\]

with unknown coefficient functions x_i(t)'s. Now, at the eigenvalue \lambda = 2, one can have

as \lambda = 2: \quad e^{2t} = x_0(t) + 2x_1(t) + 2^2 x_2(t),
take \partial_\lambda: \quad te^{2t} = x_1(t) + 2 \cdot 2\, x_2(t),
take \partial_\lambda^2: \quad t^2 e^{2t} = 2x_2(t).

Its solution is x_2(t) = \tfrac{1}{2}t^2 e^{2t}, x_1(t) = te^{2t} - 2t^2 e^{2t}, and x_0(t) = e^{2t} - 2te^{2t} + 2t^2 e^{2t}. Hence,

\[
e^{tA} = x_0(t)I + x_1(t)A + x_2(t)A^2
= \begin{bmatrix} e^{2t} - te^{2t} & te^{2t} & te^{2t} + \tfrac{1}{2}t^2 e^{2t} & te^{2t} \\ 0 & e^{2t} & 2te^{2t} & 0 \\ 0 & 0 & e^{2t} & 0 \\ -te^{2t} & te^{2t} & \tfrac{1}{2}t^2 e^{2t} & e^{2t} + te^{2t} \end{bmatrix},
\]

and the solution of y'(t) = Ay(t) is given by y(t) = e^{tA}y_0. \square
Example 8.29 (Solving y'(t) = Ay(t) by the minimal polynomial) Solve the system of linear differential equations y'(t) = Ay(t), where

\[
A = \begin{bmatrix} 5 & -3 & -2 \\ 8 & -5 & -4 \\ -4 & 3 & 3 \end{bmatrix}.
\]

Solution: (1) The characteristic polynomial of A is \det(\lambda I - A) = \lambda^3 - 3\lambda^2 + 3\lambda - 1 = (\lambda - 1)^3, so that the eigenvalue of A is \lambda = 1 of multiplicity 3.

(2) In the matrix

\[
A - I = \begin{bmatrix} 4 & -3 & -2 \\ 8 & -6 & -4 \\ -4 & 3 & 2 \end{bmatrix},
\]

one can see that the second and the third rows are constant multiples of the first row, and hence rank(A - I) = 1. Hence, A is not diagonalizable, but (A - I)^2 = 0. It means that the minimal polynomial of A is m(\lambda) = (\lambda - 1)^2. Therefore, one can write

\[
e^{tA} = x_0(t)I + x_1(t)A
\]

with unknown coefficient functions x_i(t)'s, and one can have

as \lambda = 1: \quad e^t = x_0(t) + 1 \cdot x_1(t),
take \partial_\lambda: \quad te^t = x_1(t).

Its solution is x_0(t) = e^t(1 - t), x_1(t) = te^t. Hence,

\[
e^{tA} = x_0(t)I + x_1(t)A = \begin{bmatrix} (1 + 4t)e^t & -3te^t & -2te^t \\ 8te^t & (1 - 6t)e^t & -4te^t \\ -4te^t & 3te^t & (1 + 2t)e^t \end{bmatrix},
\]

and a general solution of y'(t) = Ay(t) is given by y(t) = e^{tA}y_0. \square
Problem 8.17 Solve the system of linear differential equations y' = Ay with the initial condition y(0) = y_0, where

\[
A = \begin{bmatrix} 2 & 1 & -1 \\ -3 & -1 & 1 \\ 9 & 3 & -4 \end{bmatrix}, \qquad y_0 = \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}.
\]
Problem 8.18 Solve the system of linear differential equations y' = Ay for

\[
(1)\ A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix}, \quad
(2)\ A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix}, \quad
(3)\ A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}.
\]
8.7 Exercises

8.1. For A = \begin{bmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{bmatrix} (\lambda \neq 0), find A^{-1} and its Jordan canonical form J.

8.2. Show that if A is nonsingular, then A^{-1} has the same block structure in its Jordan canonical form as A does.

8.3. Find the number of linearly independent eigenvectors for each of the following matrices:
(1)
[11000] ['0000] [' 01100 0 0 1 0 0 00031 00003
,(2)
02000 0 0 2 0 0 00051 00005
,(3)
0 0 0 0
1 2 0 0 0
0 0 0 o 0 3 o o] 0 . 0 3 0 0 o 5
8.4. Find the Jordan canonical form for each of the following matrices : (1)
[~ ~].
(2)
[-2o 0-2] -1
1 1
-2 -1
,
(3)
[=~ 3~ -2~ ]. o
Also, find a full set of generalized eigenvectors of each of them.
2
1
316
Chapter 8. Jordan Canonical Forms
8.5. Show that a Jordan block J is similar to its transpose , J T = P -I J P, by the permutation matrix P [en . .. ej]. Deduce that every matrix is similar to its transpose.
=
8.6. Evaluate det An for a tridiagonal matrix
b b 0 0 b b b 0 0 b b b
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
b b b 0 0 b b b 0 0 b b
An =
b > O.
8.7. Solve the system of linear equations
+ +
{ (1- i)x (1 + i)x
(1 (1
+ +
i)y i)y
= 2-i = 1 + 3i .
8.8. Solve the system of three difference equations:
= = =
{ X.+l
Yn+1 Zn+1
for n
= 0 , 1, 2, .. ..
[i
+ +
= AYn-1
for A
=
8.10. Solve Yn
= AYn-1
for A
= [=~
~
:]
2 -12 -6
s' = Ay for A = [~ ~ ] with Yo = [ ~
8.12. Solve y'
= Ay for A = [
=~
2:
+ + +
5Yn Yn Yn
~] with Yo = [ ~
8.9. Solve Yn
8.11. Solve
3xn Xn 2xn
1
with Yo
= (2,
1, 0) .
1
: ] with y(1)
2 -12 -6
2zn Zn 3zn
= (2,
1, 0) .
8.13. Solve the initial value problem YI(O) n(O) Y3(O)
8.14. Consider a 2 x 2 matrix A
= = =
-2 0
- 1.
= [~ ~].
(1) Find a necessary and sufficient condition for A to be diagonalizable.
(2) The characteristic polynomial for A is f()..) that f(A) = O.
= )..2 -
(a
+
d))"
+
(ad - be). Show
8.7. Exercises
317
8.15. For each of the following matrices, find its Jordan canonical form and the minimal polynomial .
2 -3 ]
(2) [ 7
-4
(3)
'
[~ ~ -~]. o
2 -1
8.16. Compute the minimal polynomial of each of the following matrices and from these results compute A -I , if it exists .
(1)
[~ ~J.
(2) [ ;
~
mU
J.
0 2 0
8.17. Compute A-I, An and e A for
(1)
[~ ~ J. (4)
(2)
["
4I 2]
o
4 2 004
1 0 0 2 0 0 o0 1 1 o0 0 2
(5)
,
(3)
n
[1 0 I] 0 2 1 001
[ 10 1 1 0 0 0] 0 0 1 1 0
,
.
-1 0 0 1
8.18 . An n x n matrix A is called a circulant matrix if the i -th row of A is obtained from the first row of A by a cyclic shift of the i - I steps, i.e., the general form of the circulant matrix is
A
=
a2 a3 an al a2 [ "I an al an-I: a2 a3 a4
(1) Show that any circulant matrix is normal.
(2) Find all eigenvalues of the n x n circulant matrix
(3) Find all eigenvalues of the circulant matrix A by showing that n
A
= "L..Jai W;-I . ;=1
(4) Compute det A. (Hint: It is the produc t of all eigenvalues.)
318
Chapter 8. Jordan Canonical Forms (5) Use your answer to find the eigenvalues of
8.19. Determine whether the following statements are true or false, in general, and justify your answers . (1) Any square matrix is similar to a triangular matrix. (2) If a matrix A has exactly k linearly independent eigenvectors, then the Jordan canonical form of A has k Jordan blocks . (3) If a matrix A has k distinct eigenvalues, then the Jordan canonical form of A has k Jordan blocks.
=
(4) If two square matrices A and B have the same characteristic polynomial det(AI- A) det(AI - B) and for each eigenvalue Athe dimensions of their eigenspacesN(AI - A) and N (AI - B) are the same, then A and B are similar.
(5) lfa4x4matrix A has eigenvalues 1 and2,eachofmultiplicity2,suchthatdim E(1) = 2 and dim E(2) I, then the Jordan canonical form of A has three Jordan blocks .
=
(6) If there is an eigenvalue Aof A with multiplicity mj., and dim E(Aj) not diagonalizable. (7) For any Jordan block J with eigenvalue A, det eJ
:F mj."
then A is
= ej.,.
(8) For any square matrix A , A and A T have the same Jordan canonical form. (9) If f(x) is a polynomial and A is a square matrix such that f(A) multiple of the characteristic polynomial of A.
= 0, then f(x) is a
(10) The minimal polynomial of a Jordan canonical matrix J is the product of the minimal polynomials of its Jordan blocks Jj . (11) If the degree of the minimal polynomial of A is equal to the number of the distinct eigenvalues of A , then A is diagonalizable.
9
Quadratic Forms
9.1 Basic properties of quadratic forms

In the beginning of this book, we started with systems of linear equations, one of which can be written as

a₁x₁ + a₂x₂ + ··· + aₙxₙ = b.

The left-hand side a₁x₁ + a₂x₂ + ··· + aₙxₙ = aᵀx of the equation is a (homogeneous) polynomial of degree 1 in n real variables. In this chapter, we study a (homogeneous) polynomial of degree 2 in several variables, called a quadratic form, and show that matrices also play an important role in the study of a quadratic form. Quadratic forms arise in a variety of applications, including geometry, number theory, vibrations of mechanical systems, statistics, electrical engineering, etc. A more general type of a quadratic form is a bilinear form, which will be described in Section 9.6. As a matter of fact, a quadratic form (or bilinear form) can be associated with a real symmetric matrix, and vice-versa.

A quadratic equation in two variables x and y is an equation of the form

ax² + 2bxy + cy² + dx + ey + f = 0,
in which the left-hand side consists of a constant term f, a linear form dx + ey, and a quadratic form ax² + 2bxy + cy². Note that this quadratic form may be written in matrix notation as

ax² + 2bxy + cy² = [x y] [a b; b c] [x; y] = xᵀAx,

where x = [x; y] and A = [a b; b c].

Note also that the matrix A is taken to be a (real) symmetric matrix. Geometrically, the solution set of a quadratic equation in x and y usually represents a conic section, such as an ellipse, a parabola or a hyperbola in the xy-plane. (See Figure 9.1.)
J H Kwak et al., Linear Algebra © Birkhäuser Boston 2004
Definition 9.1 (1) A linear form on the Euclidean space ℝⁿ is a polynomial of degree 1 in n variables x₁, x₂, ..., xₙ of the form

bᵀx = b₁x₁ + b₂x₂ + ··· + bₙxₙ,

where x = [x₁ ··· xₙ]ᵀ and b = [b₁ ··· bₙ]ᵀ in ℝⁿ.
(2) A quadratic equation on ℝⁿ is an equation in n variables x₁, x₂, ..., xₙ of the form

f(x) = Σᵢ,ⱼ aᵢⱼxᵢxⱼ + Σᵢ bᵢxᵢ + c = 0  (sums over i, j = 1, ..., n),

where the aᵢⱼ, bᵢ and c are real constants. In matrix form, it can be written as

f(x) = xᵀAx + bᵀx + c = 0,

where A = [aᵢⱼ], x = [x₁ ··· xₙ]ᵀ and b = [b₁ ··· bₙ]ᵀ in ℝⁿ.
(3) A quadratic form on ℝⁿ is a (homogeneous) polynomial of degree 2 in n variables x₁, x₂, ..., xₙ of the form
q(x) = xᵀAx = [x₁ x₂ ··· xₙ] [aᵢⱼ] [x₁; x₂; ...; xₙ] = Σᵢ,ⱼ aᵢⱼxᵢxⱼ,

where x = [x₁ x₂ ··· xₙ]ᵀ ∈ ℝⁿ and A = [aᵢⱼ] is a real n × n symmetric matrix.

It is also possible to define a linear form and a quadratic form on the Euclidean complex n-space ℂⁿ instead of the Euclidean real n-space ℝⁿ.

Definition 9.2 (1) A linear form on the complex n-space ℂⁿ is a polynomial of degree 1 in n complex variables x₁, x₂, ..., xₙ of the form

bᴴx = b̄₁x₁ + b̄₂x₂ + ··· + b̄ₙxₙ,

where x = [x₁ ··· xₙ]ᵀ and b = [b₁ ··· bₙ]ᵀ in ℂⁿ.
(2) A complex quadratic form on ℂⁿ is a polynomial of degree 2 in n complex variables x₁, x₂, ..., xₙ of the form

q(x) = xᴴAx,

where x ∈ ℂⁿ and A = [aᵢⱼ] is an n × n Hermitian matrix.
The real quadratic form on ℝⁿ and the complex quadratic form on ℂⁿ can be denoted simultaneously as q(x) = ⟨x, Ax⟩ by using the dot product on the real n-space ℝⁿ or on the complex n-space ℂⁿ.

Remark: (1) A quadratic equation f(x) = 0 is said to be consistent if it has a solution, i.e., there is a vector x ∈ ℝⁿ such that f(x) = 0. Otherwise, it is said to be inconsistent. For instance, the equation 2x² + 3y² = −1 in ℝ² is inconsistent. In the following, we will consider only consistent equations.
(2) A linear form is simply the dot product on the real n-space ℝⁿ or on the complex n-space ℂⁿ with a fixed vector b.
(3) The matrix A in the definition of a real quadratic form can be any square matrix. In fact, a square matrix A can be expressed as the sum of a symmetric part B and a skew-symmetric part C, say

A = B + C, where B = ½(A + Aᵀ) and C = ½(A − Aᵀ).

For the skew-symmetric matrix C, we have

xᵀCx = (xᵀCx)ᵀ = xᵀCᵀx = −xᵀCx.

Hence, as a real number, xᵀCx = 0. Therefore,

q(x) = xᵀAx = xᵀ(B + C)x = xᵀBx.

This means that, without loss of generality, one may assume that the matrix A in the definition of a real quadratic form is a symmetric matrix.
(4) For the definition of a complex quadratic form, let A be any n × n complex matrix. Then, for any x ∈ ℂⁿ, the matrix product xᴴAx is a complex number. But, for the matrix A, it is known that there are Hermitian matrices B and C such that A = B + iC. (See page 264.) Hence,

xᴴAx = xᴴ(B + iC)x = xᴴBx + i xᴴCx,

in which xᴴBx and xᴴCx are real numbers. Hence, for a complex quadratic form on ℂⁿ, we are only concerned with a Hermitian matrix A, so that xᴴAx is a real number for any x ∈ ℂⁿ.
The solution set of a consistent quadratic equation f(x) = xᵀAx + bᵀx + c = 0 is a level surface in ℝⁿ, that is, a curved surface that can be parameterized in n − 1 variables. In particular, if n = 2, the solution set of a quadratic equation is called a quadratic curve, or more commonly a conic section. When n = 3, it is called a quadratic surface, which is an ellipsoid, a paraboloid or a hyperboloid.

Example 9.1 (The standard three types of conic sections)
(1) (circle or ellipse) x²/a² + y²/b² = 1 with A = [1/a² 0; 0 1/b²].
(2) (hyperbola) x²/a² − y²/b² = 1 or −x²/a² + y²/b² = 1 with A = [1/a² 0; 0 −1/b²] or A = [−1/a² 0; 0 1/b²].
(3) (parabola) x² = ay or y² = bx with A = [1 0; 0 0] or A = [0 0; 0 1].
All of these cases are illustrated in Figure 9.1 as conic sections. □

Figure 9.1. Conic sections
Example 9.2 (The standard four types of quadratic surfaces)
(1) (ellipsoids) x²/a² + y²/b² + z²/c² = 1 with A = diag(1/a², 1/b², 1/c²).
(2) (hyperboloids of one or two sheets) x²/a² + y²/b² − z²/c² = 1 (of one sheet) or −x²/a² − y²/b² + z²/c² = 1 (of two sheets) with A = diag(1/a², 1/b², −1/c²) or A = diag(−1/a², −1/b², 1/c²).
(3) (cones) x²/a² + y²/b² − z²/c² = 0 with A = diag(1/a², 1/b², −1/c²).
(4) (paraboloids, elliptic or hyperbolic) x²/a² + y²/b² = z/c (elliptic) or x²/a² − y²/b² = z/c (hyperbolic), c > 0, with A = diag(1/a², 1/b², 0) or A = diag(1/a², −1/b², 0).
Figure 9.2. Ellipsoid: x²/a² + y²/b² + z²/c² = 1

Figure 9.3. Hyperboloid of one sheet: x²/a² + y²/b² − z²/c² = 1; and of two sheets: −x²/a² − y²/b² + z²/c² = 1

Figure 9.4. Cone: x²/a² + y²/b² − z²/c² = 0

All of these cases are illustrated in Figures 9.2-9.5. □

Problem 9.1 Find the symmetric matrices representing the quadratic forms
(1) 9x₁² − x₂² + 4x₃² + 6x₁x₂ − 8x₁x₃ + 2x₂x₃,
(2) x₁x₂ + x₁x₃ + x₂x₃,
(3) x₁² + x₂² − x₃² − x₄² + 2x₁x₂ − 10x₁x₄ + 4x₃x₄.
Figure 9.5. Elliptic paraboloid: x²/a² + y²/b² = z/c; and hyperbolic paraboloid: x²/a² − y²/b² = z/c, c > 0
9.2 Diagonalization of quadratic forms

In this section, we discuss how to sketch the level surface of a quadratic equation on ℝⁿ. To do this for a quadratic equation f(x) = xᵀAx + bᵀx + c = 0, we first consider a special case of the type xᵀAx = c without a linear form.

A quadratic form on ℝⁿ without a linear form may be written as the sum of two parts:

q(x) = xᵀAx = Σᵢ aᵢᵢxᵢ² + 2 Σᵢ<ⱼ aᵢⱼxᵢxⱼ,

in which the first part Σᵢ aᵢᵢxᵢ² is called the (perfect) square terms and the second part Σᵢ≠ⱼ aᵢⱼxᵢxⱼ is called the cross-product terms. What actually makes it hard to sketch the level surface of a quadratic equation is the cross-product terms. However, the quadratic form q(x) = xᵀAx can be transformed into a new quadratic form without the cross-product terms by a suitable change of variables. This can be done by computing the eigenvalues of A and their associated eigenvectors. In fact, the symmetric matrix A can be orthogonally diagonalized, i.e., there exists an orthogonal matrix P such that
PᵀAP = P⁻¹AP = D = diag(λ₁, λ₂, ..., λₙ).

Here, the diagonal entries λᵢ are the eigenvalues of A, and the column vectors of P are their associated eigenvectors. Now, by setting x = Py (that is, by a change of variables), we have

q(x) = xᵀAx = (Py)ᵀA(Py) = yᵀ(PᵀAP)y = yᵀDy = λ₁y₁² + λ₂y₂² + ··· + λₙyₙ²,

which is a quadratic form without the cross-product terms. The same is true for a complex quadratic form q(x) = xᴴAx with a Hermitian matrix A. Since every Hermitian matrix is unitarily diagonalizable, there exists a
unitary matrix U such that UᴴAU = D is a diagonal matrix. Hence, by a change of variables x = Uy, the quadratic form q(x) = yᴴDy has only square terms.

In either case, real or complex, we consequently have the following theorem.
Theorem 9.1 (The principal axes theorem)
(1) Let xᵀAx be a quadratic form in x = [x₁ x₂ ··· xₙ]ᵀ ∈ ℝⁿ for a symmetric matrix A. Then, there is a change of coordinates of x into y = Pᵀx = [y₁ y₂ ··· yₙ]ᵀ such that

xᵀAx = yᵀDy = λ₁y₁² + λ₂y₂² + ··· + λₙyₙ²,

where P is an orthogonal matrix and PᵀAP = D is diagonal.
(2) Let xᴴAx be a complex quadratic form on ℂⁿ with a Hermitian matrix A. Then, there is a change of coordinates of x into y = Uᴴx = [y₁ y₂ ··· yₙ]ᵀ such that

xᴴAx = yᴴDy = λ₁|y₁|² + λ₂|y₂|² + ··· + λₙ|yₙ|²,

where U is a unitary matrix and UᴴAU = D is diagonal.

Clearly, the columns of the matrix P (U, respectively) in Theorem 9.1 form an orthonormal basis for ℝⁿ (for ℂⁿ, respectively), and it is called the principal axes of the quadratic form. The vector y is just the coordinate expression of x with respect to the principal axes.

Example 9.3 (Via a diagonalization of a quadratic form) Determine the conic section 3x² + 2xy + 3y² − 8 = 0 on ℝ².

Solution: In matrix form, it is

[x y] [3 1; 1 3] [x; y] = xᵀAx = 8.
The matrix A = [3 1; 1 3] has eigenvalues λ₁ = 2 and λ₂ = 4, with associated unit eigenvectors

v₁ = (1/√2, −1/√2) and v₂ = (1/√2, 1/√2),

respectively, which form an orthonormal basis β. If α denotes the standard basis, then the basis-change matrix is

P = [id]β = [v₁ v₂] = (1/√2) [1 1; −1 1] = [cos 45° sin 45°; −sin 45° cos 45°],

which is a rotation through 45° in the clockwise direction such that Pᵀ = P⁻¹. It gives a change of coordinates, x = Py, i.e.,
[x; y] = (1/√2) [1 1; −1 1] [x′; y′] = [(x′ + y′)/√2; (−x′ + y′)/√2].

It implies that

3x² + 2xy + 3y² = xᵀAx = yᵀPᵀAPy = yᵀ [2 0; 0 4] y = 2(x′)² + 4(y′)² = 8,

or

(x′)²/4 + (y′)²/2 = 1,

which is an ellipse with the principal axes v₁ = Pe₁ and v₂ = Pe₂. □
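The computation in Example 9.3 can be replayed numerically: the closed-form eigenvalues of a 2 × 2 symmetric matrix (the same formula reappears in Example 9.6) together with the 45° rotation are enough to check that the cross term vanishes. A sketch, with helper names of our own:

```python
import math

def sym2x2_eigenvalues(a, b, c):
    # Eigenvalues of [[a, b], [b, c]]: ((a + c) +/- sqrt((a-c)^2 + 4b^2)) / 2
    s = math.sqrt((a - c) ** 2 + 4 * b ** 2)
    return ((a + c + s) / 2, (a + c - s) / 2)

a, b, c = 3.0, 1.0, 3.0                     # q(x, y) = 3x^2 + 2xy + 3y^2
lam1, lam2 = sym2x2_eigenvalues(a, b, c)    # 4.0 and 2.0

def q(x, y):
    return a * x * x + 2 * b * x * y + c * y * y

# Change of variables x = P y from the example:
# x = (x' + y')/sqrt(2), y = (-x' + y')/sqrt(2)
for xp, yp in [(1.0, 0.0), (0.0, 1.0), (0.7, -1.3)]:
    x = (xp + yp) / math.sqrt(2)
    y = (-xp + yp) / math.sqrt(2)
    # cross terms vanish: q = 2 x'^2 + 4 y'^2
    assert abs(q(x, y) - (2 * xp * xp + 4 * yp * yp)) < 1e-12
```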
To determine the type of a quadratic form, we introduce the following definition for a symmetric matrix A or a quadratic form xᵀAx.

Definition 9.3 Let A = [aᵢⱼ] ∈ Mₙₓₙ(ℝ) be a symmetric matrix and let x = (x₁, x₂, ..., xₙ) ∈ ℝⁿ. Then the matrix A, or a quadratic form xᵀAx, is said to be
(1) positive definite if xᵀAx = Σᵢ,ⱼ aᵢⱼxᵢxⱼ > 0 for all nonzero x,
(2) positive semidefinite if xᵀAx ≥ 0 for all x,
(3) negative definite if xᵀAx < 0 for all nonzero x,
(4) negative semidefinite if xᵀAx ≤ 0 for all x,
(5) indefinite if xᵀAx takes both positive and negative values.

Similarly, one can define the same terminology for a Hermitian matrix or for a complex quadratic form xᴴAx on ℂⁿ. For example, the real symmetric matrix

A = [ 2 −1 0
     −1 2 −1
      0 −1 2 ]

is positive definite, because the quadratic form satisfies
xᵀAx = [x₁ x₂ x₃] [ 2 −1 0; −1 2 −1; 0 −1 2 ] [x₁; x₂; x₃]
     = x₁(2x₁ − x₂) + x₂(−x₁ + 2x₂ − x₃) + x₃(−x₂ + 2x₃)
     = 2x₁² − 2x₁x₂ + 2x₂² − 2x₂x₃ + 2x₃²
     = x₁² + (x₁ − x₂)² + (x₂ − x₃)² + x₃²
     > 0

unless x₁ = x₂ = x₃ = 0.
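The sum-of-squares identity above is easy to verify numerically at a few sample points (a sanity check, not a proof; the function names are ours):

```python
def q(x1, x2, x3):
    # x^T A x for A = [[2,-1,0],[-1,2,-1],[0,-1,2]]
    return 2*x1*x1 - 2*x1*x2 + 2*x2*x2 - 2*x2*x3 + 2*x3*x3

def sum_of_squares(x1, x2, x3):
    return x1*x1 + (x1 - x2)**2 + (x2 - x3)**2 + x3*x3

for v in [(1, 0, 0), (1, 1, 1), (-2, 3, 5), (0.5, -1.5, 2.5)]:
    assert abs(q(*v) - sum_of_squares(*v)) < 1e-12
    assert q(*v) > 0          # positive definiteness on these samples
```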
The following characterizations follow from the principal axes theorem.

Corollary 9.2 A real symmetric or a Hermitian matrix A is
(1) positive definite if and only if all the eigenvalues of A are positive,
(2) positive semidefinite if and only if all the eigenvalues of A are nonnegative,
(3) negative definite if and only if all the eigenvalues of A are negative,
(4) negative semidefinite if and only if all the eigenvalues of A are nonpositive,
(5) indefinite if and only if A has both positive and negative eigenvalues.

Note that if A is positive definite, det A > 0 (as the product of all eigenvalues). If the eigenvalues of A are all negative, then −A must be positive definite, and consequently A must be negative definite. If A has eigenvalues that differ in sign, then A is indefinite. Indeed, if λ₁ is a positive eigenvalue of A and x₁ is an eigenvector belonging to λ₁, then

x₁ᵀAx₁ = λ₁x₁ᵀx₁ = λ₁‖x₁‖² > 0,

and if λ₂ is a negative eigenvalue with eigenvector x₂, then

x₂ᵀAx₂ = λ₂‖x₂‖² < 0.

If A is definite, then 0 is the only critical point of a quadratic form q(x) = ⟨x, Ax⟩, and q(0) = 0 is the global minimum if A is positive definite and the global maximum if A is negative definite. If A is indefinite, then 0 is a saddle point.
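Corollary 9.2 turns definiteness into a sign count on the eigenvalues. A small sketch, assuming the eigenvalue list is already known (e.g., the tridiagonal matrix above has eigenvalues 2 and 2 ± √2; a matrix with eigenvalues √2, −√2, 0 is indefinite); the function name is ours:

```python
def classify(eigenvalues, tol=1e-12):
    # Read off the type of the form from the signs of the eigenvalues.
    p = sum(1 for x in eigenvalues if x > tol)     # positive count
    q = sum(1 for x in eigenvalues if x < -tol)    # negative count
    k = len(eigenvalues) - p - q                   # zero count
    if q == 0 and k == 0:
        return "positive definite"
    if q == 0:
        return "positive semidefinite"
    if p == 0 and k == 0:
        return "negative definite"
    if p == 0:
        return "negative semidefinite"
    return "indefinite"

assert classify([2 - 2**0.5, 2.0, 2 + 2**0.5]) == "positive definite"
assert classify([2**0.5, -(2**0.5), 0.0]) == "indefinite"
```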
9.3 A classification of level surfaces

We have seen already in Section 9.1 that the geometric type of the level surface of a quadratic equation xᵀAx = c depends on the signs of the eigenvalues of A. In fact, it is completely determined by the numbers of positive, negative and zero eigenvalues of A.

Definition 9.4 The inertia of a Hermitian (or a symmetric) matrix A is a triple of integers denoted by In(A) = (p, q, k), where p, q and k are the numbers of positive, negative and zero eigenvalues of A, respectively.

The inertia In(A) determines the geometric type of the quadratic surface xᵀAx = c on ℝⁿ in the following sense. Since In(−A) = (q, p, k) if In(A) = (p, q, k), and the equation xᵀAx = c is inconsistent if p = 0 and c > 0, it suffices to consider the cases of c ≥ 0 and p > 0. Excluding those inconsistent cases, we have the following characterization of the solution sets for n = 2 and 3:
For n = 2, there are only three possible cases for In(A):

  In(A) = (p, q, k) | solution of xᵀAx = c, c > 0 | solution for c = 0
  (2, 0, 0)         | ellipse                     | a point
  (1, 1, 0)         | hyperbola                   | two lines crossing at 0
  (1, 0, 1)         | two parallel lines          | a line

For n = 3, there are six possibilities:

  In(A) = (p, q, k) | solution of xᵀAx = c, c > 0 | solution for c = 0
  (3, 0, 0)         | ellipsoid                   | a point
  (2, 1, 0)         | one-sheeted hyperboloid     | elliptic cone
  (2, 0, 1)         | elliptic cylinder           | a line
  (1, 2, 0)         | two-sheeted hyperboloid     | elliptic cone
  (1, 1, 1)         | hyperbolic cylinder         | two planes crossing in a line
  (1, 0, 2)         | two parallel planes         | a plane

In general, for an n × n symmetric matrix A, In(A) will have n(n + 1)/2 possibilities, each characterizing a different geometric type of a quadratic form. For example, if In(A) = (n, 0, 0) and c > 0, i.e., the eigenvalues of A are all positive, then the quadratic form describes an ellipsoid in ℝⁿ, etc.

Example 9.4 (The inertia determines the geometric type of the quadratic form) Determine the quadratic surface for 2xy + 2xz = 1 on ℝ³.

Solution: The matrix for the given quadratic form is
A = [ 0 1 1
      1 0 0
      1 0 0 ]

and the eigenvalues of A can be found to be λ₁ = √2, λ₂ = −√2, λ₃ = 0, with associated orthonormal eigenvectors

v₁ = (1/√2, 1/2, 1/2), v₂ = (−1/√2, 1/2, 1/2), v₃ = (0, −1/√2, 1/√2),

respectively. Hence, an orthogonal matrix P that diagonalizes A is

P = (1/2) [ √2 −√2  0
             1   1 −√2
             1   1  √2 ]
and with the change of coordinates x = Py, that is,

x = (1/√2)(x′ − y′), y = (1/2)(x′ + y′ − √2 z′), z = (1/2)(x′ + y′ + √2 z′),

the equation is transformed to √2(x′)² − √2(y′)² = 1, which is a hyperbolic cylinder as shown in Figure 9.6. Note that In(A) = (1, 1, 1). □
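The orthogonal diagonalization in Example 9.4 can be verified with plain 3 × 3 matrix arithmetic (a sketch; the helper functions are ours):

```python
import math

r2 = math.sqrt(2.0)
A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
P = [[r2/2, -r2/2, 0.0],
     [0.5, 0.5, -r2/2],
     [0.5, 0.5, r2/2]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(X):
    return [list(row) for row in zip(*X)]

# P^T A P should be diag(sqrt(2), -sqrt(2), 0)
D = matmul(matmul(transpose(P), A), P)
expected = [r2, -r2, 0.0]
for i in range(3):
    for j in range(3):
        want = expected[i] if i == j else 0.0
        assert abs(D[i][j] - want) < 1e-12

# P is orthogonal: P^T P = I
G = matmul(transpose(P), P)
assert all(abs(G[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(3) for j in range(3))
```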
Figure 9.6. Hyperbolic cylinder: 2xy + 2xz = 1
Now, consider a general form of a quadratic equation on ℝⁿ:

xᵀAx + bᵀx = c.

(1) If it does not have a linear form, i.e., b = 0, then, as shown already, a parabolic level surface does not appear as a solution of the quadratic equation.
(2) Suppose that it has a nonzero linear form, i.e., b ≠ 0. If the matrix A is invertible, then, by taking a change of variables y = x + ½A⁻¹b (or x = y − ½A⁻¹b), which is a translation, the given quadratic equation is transformed into a new quadratic equation yᵀAy = d without a linear form, where d = c + ¼bᵀA⁻¹b. However, if A is not invertible, the solution of the quadratic equation depends not only on the inertia of A but also on the type of the linear form, and a parabolic level surface appears as the solution of the quadratic equation with a nonzero linear form. For example, the equation x² − z = c has a singular quadratic form for which In(A) = (1, 0, 2), and it also has a nonzero linear form that cannot be removed by any change of variables. The solution of this equation is a parabolic cylinder when n = 3.
Example 9.5 (A quadratic equation having a linear form) Determine the conic section for 3x² − 6xy + 4y² + 2x − 2y = 0.

Solution: The matrix for the quadratic form 3x² − 6xy + 4y² is

A = [3 −3; −3 4].

Its inverse is A⁻¹ = (1/3)[4 3; 3 3] and b = [2; −2]. With the change of variables y = x + ½A⁻¹b, that is,

x′ = x + 1/3, y′ = y,

the equation is transformed to a new equation 3(x′)² − 6x′y′ + 4(y′)² = 1/3. Clearly, the matrix representation of the new quadratic form is also A, and its eigenvalues are ½(7 ± √37). Therefore, In(A) = (2, 0, 0) and the solution of the equation is an ellipse. □
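The translation in Example 9.5 can be checked exactly with rational arithmetic. The sketch below (our own variable names) computes the shift ½A⁻¹b and the new constant d = c + ¼bᵀA⁻¹b, here with c = 0:

```python
from fractions import Fraction as F

# f(x) = x^T A x + b^T x = 0 with A = [[3,-3],[-3,4]], b = [2,-2]
A = [[F(3), F(-3)], [F(-3), F(4)]]
b = [F(2), F(-2)]

det = A[0][0]*A[1][1] - A[0][1]*A[1][0]          # 3
Ainv = [[A[1][1]/det, -A[0][1]/det],
        [-A[1][0]/det, A[0][0]/det]]             # (1/3)[[4,3],[3,3]]

Ainv_b = [Ainv[0][0]*b[0] + Ainv[0][1]*b[1],
          Ainv[1][0]*b[0] + Ainv[1][1]*b[1]]     # [2/3, 0]
shift = [v / 2 for v in Ainv_b]                  # (1/2) A^{-1} b = [1/3, 0]

# After y = x + (1/2) A^{-1} b the equation becomes y^T A y = d with
# d = c + (1/4) b^T A^{-1} b and c = 0 here.
d = (b[0]*Ainv_b[0] + b[1]*Ainv_b[1]) / 4
assert shift == [F(1, 3), F(0)]
assert d == F(1, 3)
```

The exact values reproduce the shift x′ = x + 1/3 and the right-hand side 1/3 found in the example.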
The following is another view of the classification of the conic sections in ℝ², and it can be skipped at the reader's discretion.
Example 9.6 (The classification of the conic sections in ℝ²) Consider a quadratic equation in two variables on ℝ²:

ax² + 2bxy + cy² + dx + ey + f = 0,

or, in matrix form,

xᵀAx + bᵀx + f = 0,

with the symmetric matrix A = [a b; b c], b = [d e]ᵀ and x = [x y]ᵀ in ℝ². We present here the classification of the conic sections according to the coefficients.

(1) If b = 0, then A is already a diagonal matrix with the eigenvalues a and c, and the equation becomes

ax² + cy² + dx + ey + f = 0.

(i) If a = 0 = c, then the conic section is a line in the plane.
(ii) If a ≠ 0 = c, then it is a parabola when e ≠ 0, or one or two lines when e = 0.
(iii) If a ≠ 0 ≠ c, then the quadratic equation becomes

ax² + cy² + dx + ey + f = a(x − p)² + c(y − q)² + h = 0

for some constants p, q, and h. If h = 0, the cases are easily classified (try). Suppose h ≠ 0. Then the conic section is a circle if a = c, an ellipse if ac > 0, or a hyperbola if ac < 0.
(2) Suppose that b ≠ 0. Since A is symmetric, it can be diagonalized by an orthogonal matrix P whose columns are orthonormal eigenvectors, and the diagonal matrix has the eigenvalues λ₁ and λ₂ on the diagonal. By a basis-change by P, the quadratic equation becomes

ax² + 2bxy + cy² + dx + ey + f = λ₁u² + λ₂v² + d′u + e′v + f = 0

for some constants d′ and e′. Hence, the classification of the conic sections is reduced to the case (1) according to the various possible cases of the eigenvalues of A. However, the eigenvalues are given as

λ₁ = ((a + c) + √((a − c)² + 4b²))/2,  λ₂ = ((a + c) − √((a − c)² + 4b²))/2,
which are determined by the coefficients a, b, and c. Hence one can classify the conic section according to the various possible cases of a, b, and c (see Exercise 9.4).

(3) The axes of the conic section are the directions of the eigenvectors, which are orthogonal to each other. Since we only need to find the axis lines, not the direction vectors, we may choose them to be a rotation of the standard coordinate x- and y-axes, which are determined by e₁, e₂. Now, a pair of orthogonal eigenvectors is found to be

vᵢ = [b; −(a − λᵢ)]

for i = 1, 2. The slope of v₁ from the x-axis is

−(a − λ₁)/b = −(a − c)/2b + √(((a − c)/2b)² + 1) = −cot 2θ + cosec 2θ = tan θ,

where cot 2θ = (a − c)/2b for some θ. Since b ≠ 0, one may assume that 0 < 2θ < π, i.e., 0 < θ < π/2; this θ is the rotation angle we were looking for. Therefore, the orthonormal eigenvectors u₁ and u₂ of A may be chosen as the rotation of the standard basis through the angle θ. The basis-change matrix is now

P = [u₁ u₂] = [cos θ −sin θ; sin θ cos θ], and PᵀAP = P⁻¹AP = [λ₁ 0; 0 λ₂].

By a change of coordinates [x; y] = P[x′; y′], the quadratic equation becomes

ax² + 2bxy + cy² + dx + ey + f = λ₁x′² + λ₂y′² + d′x′ + e′y′ + f = 0,

where d′ = d cos θ + e sin θ and e′ = −d sin θ + e cos θ. □

Problem 9.2 Sketch the level surface of each of the following quadratic equations:
(1) 2x² + 2y² + 6yz + 10z² = 9;
(2) x² − 8xy + 16y² − 3z² = 8;
(3) 4x² + 12xy + 9y² + 3x − 4 = 0.
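The rotation-angle recipe in Example 9.6 is easy to test: for any θ with tan 2θ = 2b/(a − c), the uv cross term disappears after the rotation. A sketch (using atan2 for convenience; the function name is ours), run on the quadratic part of Example 9.5:

```python
import math

def rotate_out_cross_term(a, b, c):
    # Eigenvalues of [[a, b], [b, c]] and a rotation angle theta with
    # tan(2*theta) = 2b / (a - c), as in the example (assumes b != 0).
    s = math.sqrt((a - c)**2 + 4*b*b)
    lam1, lam2 = (a + c + s) / 2, (a + c - s) / 2
    theta = 0.5 * math.atan2(2*b, a - c)
    return lam1, lam2, theta

a, b, c = 3.0, -3.0, 4.0            # quadratic part of Example 9.5
lam1, lam2, t = rotate_out_cross_term(a, b, c)
ct, st = math.cos(t), math.sin(t)

# Substituting x = u*ct - v*st, y = u*st + v*ct into
# a x^2 + 2b xy + c y^2 gives a uv coefficient of:
cross = 2*(c - a)*st*ct + 2*b*(ct*ct - st*st)
assert abs(cross) < 1e-12            # the cross term is gone
assert abs(lam1 - (7 + math.sqrt(37))/2) < 1e-12
assert abs(lam2 - (7 - math.sqrt(37))/2) < 1e-12
```

The remaining pure-square coefficients are exactly the two eigenvalues, in some order depending on the quadrant of 2θ.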
9.4 Characterizations of definite forms

In the previous section, we have seen that the geometric type of a quadratic equation ⟨x, Ax⟩ = c depends on the inertia of the matrix A. Hence, it is important to determine whether or not a Hermitian (or symmetric) matrix A is positive definite or negative definite. In most cases, the definition itself does not provide such a criterion. But we have seen that Corollary 9.2 gives us a practical characterization of positive definite matrices: A is positive definite if and only if all eigenvalues of A are positive. We will find some other practical criteria in terms of determinants. For this, we again look at the quadratic form in two real variables, q(x, y) = ax² + 2bxy + cy², which may be rewritten in complete square form as

q(x) = ax² + 2bxy + cy² = a(x + (b/a)y)² + (c − b²/a)y².

We see that q is positive definite, i.e., q(x) = xᵀAx > 0 for any nonzero vector x = (x, y) ∈ ℝ², if and only if a > 0 and ac > b², or equivalently, the determinants of

[a] and [a b; b c]

are positive. A generalization of these conditions will involve all n submatrices of A, called the principal submatrices of A, which are defined as the upper left square submatrices

A₁ = [a₁₁], A₂ = [a₁₁ a₁₂; a₂₁ a₂₂], ..., Aₙ = A.
With this construction, we have the following characterization of positive definite matrices.

Theorem 9.3 The following are equivalent for a Hermitian (or a real symmetric) matrix A:
(1) A is positive definite, i.e., ⟨x, Ax⟩ > 0 for all nonzero vectors x;
(2) all the eigenvalues of A are positive;
(3) all the principal submatrices Aₖ have positive determinants;
(4) A can be reduced to an upper triangular matrix by using only the elementary operation of "adding a constant multiple of a row to another" (without row interchanges), and all the pivots are positive;
(5) there exists a lower triangular matrix L with positive diagonal entries such that A = LLᴴ (= LLᵀ if A is real symmetric), called a Cholesky decomposition or a Cholesky factorization;
(6) there exists a nonsingular matrix W such that A = WᴴW (= WᵀW if A is real symmetric).
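Condition (5) suggests an algorithm: attempt a Cholesky factorization and report failure exactly when some pivot quantity is nonpositive. A minimal pure-Python sketch for the real case (our own helper, not from the text):

```python
import math

def cholesky(A):
    # Returns lower triangular L with A = L L^T; raises ValueError if the
    # symmetric matrix A is not positive definite (Theorem 9.3 (5)).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
L = cholesky(A)
# check the factorization A = L L^T
for i in range(3):
    for j in range(3):
        assert abs(sum(L[i][k] * L[j][k] for k in range(3)) - A[i][j]) < 1e-12
```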
Proof: (1) ⇔ (2) was shown.
(2) ⇒ (3) First, we prove it for the real case with a symmetric matrix A. If A has positive eigenvalues λ₁, λ₂, ..., λₙ, then det A = λ₁λ₂···λₙ > 0. To prove the same result for all the submatrices Aₖ, we claim that if A is positive definite, so is every Aₖ. For each k = 1, ..., n, consider all the vectors whose last n − k components are zero, say x = [x₁ ··· xₖ 0 ··· 0]ᵀ = [xₖ 0]ᵀ, where xₖ is any vector in ℝᵏ. Then

xᵀAx = [xₖᵀ 0] [Aₖ ∗; ∗ ∗] [xₖ; 0] = xₖᵀAₖxₖ.

Since xᵀAx > 0 for all nonzero x, xₖᵀAₖxₖ > 0 for all nonzero xₖ ∈ ℝᵏ; that is, the Aₖ are positive definite, all eigenvalues of Aₖ are positive, and their determinants are positive. The complex case with a Hermitian matrix A can be proved by the same argument with H in place of T.
(3) ⇒ (4) Let A = [aᵢⱼ]. The first principal submatrix A₁ = [a₁₁] has positive determinant, i.e., a₁₁ > 0. Thus, a₁₁ can be used as a pivot for forward elimination to make all other first-column entries below a₁₁ zero. Let a₂₂^(1) denote the (2, 2)-entry of the resulting matrix, so that the principal submatrix A₂ has been transformed into the matrix

[a₁₁ a₁₂; 0 a₂₂^(1)].

Since the elementary operation of "adding a constant multiple of a row to another" does not change the determinant, we have

det A₂ = a₁₁a₂₂^(1), and a₂₂^(1) = det A₂ / a₁₁ = det A₂ / det A₁ > 0.

Since a₂₂^(1) ≠ 0, it can be used as a pivot in the second forward elimination, to transform the principal submatrix A₃ into an upper triangular matrix. Also, one can show that

a₃₃^(2) = det A₃ / (a₁₁a₂₂^(1)) = det A₃ / det A₂ > 0.

A similar process can be repeated to get an upper triangular matrix as a row-echelon form of A with the k-th pivot aₖₖ^(k−1), which is exactly the ratio of det Aₖ to det Aₖ₋₁. Hence, all pivots are positive.
(4) ⇒ (5) First, consider the real case. By hypothesis (4) and the uniqueness of the LDU factorization of a symmetric matrix, the matrix A can be factored as a product LDLᵀ with
D = diag(d₁, d₂, ..., dₙ), dᵢ > 0.

Define √D = diag(√d₁, √d₂, ..., √dₙ). Then, clearly det(√D) > 0, D = √D√D and (√D)ᵀ = √D. Hence,

A = LDLᵀ = (L√D)(√DLᵀ) = (L√D)(L√D)ᵀ,

as desired. A similar process can be repeated for the complex case with H in place of T.
(5) ⇒ (6) is easy.
(6) ⇒ (1) Let A = WᵀW for a nonsingular matrix W. Then, for x ≠ 0,

xᵀAx = xᵀ(WᵀW)x = (Wx)ᵀ(Wx) = ‖Wx‖² > 0,

because Wx ≠ 0. Similarly for the complex case. □
Problem 9.3 Determine which one of the following matrices A and B is positive definite. For the positive definite one, find a nonsingular matrix W such that it equals WᵀW:

A = [ 2 −1 −1
     −1 2 −1
     −1 −1 2 ],  B = [ 2 −1 0
                      −1 2 1
                       0 1 2 ].

Problem 9.4 Let A be a positive definite matrix. Prove that CᵀAC (or CᴴAC in the complex case) is also positive definite for any nonsingular matrix C.
Since a Hermitian matrix A is negative definite if and only if −A is positive definite, one can get the following theorem from Theorem 9.3.

Theorem 9.4 The following statements are equivalent for a Hermitian (or a real symmetric) matrix A:
(1) A is negative definite, i.e., ⟨x, Ax⟩ < 0 for all nonzero vectors x;
(2) all the eigenvalues of A are negative;
(3) the determinants of the principal submatrices Aₖ alternate in sign: i.e., det A₁ < 0, det A₂ > 0, det A₃ < 0, and so on;
(4) A can be reduced to an upper triangular matrix by using only the elementary operation of "adding a constant multiple of a row to another" (without row interchanges), and all the pivots are negative;
(5) there exists a lower triangular matrix L with positive diagonal entries such that A = −LLᴴ (= −LLᵀ if A is real symmetric);
(6) there exists a nonsingular matrix W such that A = −WᴴW (= −WᵀW if A is real symmetric).

Problem 9.5 Show that the determinant of a negative definite n × n symmetric matrix is positive if n is even and negative if n is odd.
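Condition (3) of Theorem 9.4 can be tested directly on a small example: taking the negative of the positive definite matrix from earlier in this section, the leading principal minors alternate in sign starting negative. A sketch with our own helper (cofactor expansion, fine for small n):

```python
def leading_minors(A):
    # det A_1, det A_2, ..., det A_n
    def det(M):
        if len(M) == 1:
            return M[0][0]
        return sum((-1) ** j * M[0][j] *
                   det([row[:j] + row[j + 1:] for row in M[1:]])
                   for j in range(len(M)))
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

# -A for the positive definite A = [[2,-1,0],[-1,2,-1],[0,-1,2]]
N = [[-2, 1, 0], [1, -2, 1], [0, 1, -2]]
minors = leading_minors(N)
assert minors == [-2, 3, -4]
# Theorem 9.4 (3): signs alternate, starting negative
assert all((m < 0) if k % 2 == 0 else (m > 0)
           for k, m in enumerate(minors))
```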
One can easily establish the following analogous theorem for semidefinite matrices.
Theorem 9.5 The following statements are equivalent for a Hermitian (or a real symmetric) matrix A:
(1) A is positive semidefinite, i.e., ⟨x, Ax⟩ ≥ 0 for all vectors x;
(2) all the eigenvalues of A are nonnegative;
(3) A can be reduced to an upper triangular matrix by using only the elementary operation of "adding a constant multiple of a row to another" (without row interchanges), and all the pivots are nonnegative;
(4) there exists a matrix W, possibly singular, such that A = WᴴW (= WᵀW if A is real symmetric).
Problem 9.6 Determine whether the following statement is true or false: a Hermitian matrix A is positive semidefinite if and only if all the principal submatrices Aₖ have nonnegative determinants.

Problem 9.7 State the conditions corresponding to those in Theorem 9.5 for negative semidefinite forms.

Problem 9.8 Which of the following matrices are positive definite? negative definite? indefinite?

(1) [ 1 2 1
      2 1 1
      1 1 2 ],  (2) [ 2 0 0
                      0 5 3
                      0 3 5 ],  (3) [ −2 −1 0
                                     −1 2 1
                                      0 1 3 ].
9.5 Congruence relation

As we have seen already, in a quadratic equation xᵀAx + bᵀx + c = 0 on ℝⁿ, the linear form may be eliminated by a change of variables when A is invertible, and then, by the principal axes theorem, the equation can be transformed into a simple form having only square terms. Hence, the geometric type of the quadratic equation may be easily classified. However, these changes of variables involve basis changes by invertible matrices. Let us now consider a change of basis (or variables) and the relation between two different matrix representations of a quadratic form.

Usually a real or complex quadratic form q(x) = ⟨x, Ax⟩ is expressed in the coordinates of x with respect to the standard basis α = {e₁, e₂, ..., eₙ} for ℝⁿ or for ℂⁿ, depending on the real or complex case. Let β = {e′₁, e′₂, ..., e′ₙ} be another basis. Then any vector x has two coordinate representations [x]α and [x]β through the equations x₁e₁
+ x₂e₂ + ··· + xₙeₙ = x = y₁e′₁ + y₂e′₂ + ··· + yₙe′ₙ.

They are related as [x]α = P[x]β, where P = [id]β is the basis-change matrix from β to α. This is just a change of variables. If we set x = [x]α and y = [x]β, then the quadratic form can be written as

q(x) = ⟨x, Ax⟩ = ⟨Py, APy⟩ = ⟨y, PᴴAPy⟩ = ⟨y, By⟩,

where B = PᴴAP and ⟨y, By⟩ is the expression of q(x) = ⟨x, Ax⟩ in the new basis (or new coordinate system) β.
Remark: (1) Two orthogonally similar real matrices are clearly congruent, but the converse is not true in general. Clearly, a real symmetric matrix A is congruent to a diagonal matrix D by an orthogonal matrix P . However, it can be congruent to infinitely many different diagonal matrices (not necessarily by orthogonal matrices). In fact, if pT AP = D by an orthogonal matrix P, then the matrix Q = kP, k =F 0, also diagonalizes A to a different diagonal matrix via a congruence relation:
which is also diagonal with diagonal entries k 2AJ, k2A2, ... , k 2An. In this case, if =F ±1, Q is not an orthogonal matrix and the resulting diagonal entries are not the eigenvalues of A anymore. (2) Sylvester's law of inertia (Theorem 9.10 in Section 9.6) says that even though a real symmetric matrix A may be congruent to various diagonal matrices, the numbers of positive, negative and zero diagonal entries are invariant under the congruence relation. That is, any two symmetric matrices which are congruent have the same inertia. A similarity holds for Hermitian matrices: any two Hermitian matrices which are Hermitian congruent have the same inertia. (See Corollary 9.11.) k
9.5. Congruence relation
337
Certainly, the inertia of a real symmetric matrix (a Hermitian matrix in a complex case) can be found by computing the eigenvalues. However, there is another practical method of diagonalizing it through the congruence (the Hermitian congruence in a complex case) relation by using the elementary row operation of adding a constant multiple of a row (or a column) to another row (or a column). First, suppose that a real symmetric matrix A is diagonalized by an invertible matrix P through the congruence relation pT AP = D. Since both P and pT are invertible matrices, P T can be written as a product of elementary matrices, say P T Ek '" E2E\. Then we have
=
Recall that for any elementary matrix E, the product E A is exactly the matrix that is obtained from A when the same elementary row operation is executed on A. Clearly, if E is an elementary matrix, so is E T • Moreover, if an elementary matrix E is obtained by executing an elementary operation on the i -th row, then the product E AE T is just the matrix that is obtained from A when the same elementary operation is executed both on the i -th row and on the i -th column. Since A is symmetric, the operation EAE T will have the same effect on the diagonally opposite entries of A simultaneously. For instance, if
A=
112] 1 0 3
[ 236
and E
=[
1 0 0]
- 1 1
0
,
001
which is an elementary matrix adding -1 times the first row to the second row, then EA is the matrix obtained from A by replacing the second row [1 0 3] by [0 - 1 1]. Now, the matrix EAET is obtained from the matrix EA by replacing the second column by [0 - 1 If for the symmetry of the matrix EAE T . In fact,
It implies that the operations performed from the left of A (i.e., the product of Ek . .. E2E\) are nothing but a forward elimination on A to get an upper triangular matrix pT A and those from the right (i.e., the product of Ef EI ... ED are the corresponding column operations to yield a diagonal matrix D. In summary, ifwe take aforward elimination on A to getan uppertriangularmatrixby the elementarymatrices E\, . . . , Ek, then Ek ' " E\AEf . . , E[ = D is diagonaland pT = Ek'" E\. It gives
[A | I] → [E_1 A E_1^T | E_1] → [E_2 E_1 A E_1^T E_2^T | E_2 E_1] → ⋯ → [E_k ⋯ E_1 A E_1^T ⋯ E_k^T | E_k ⋯ E_1] = [D | P^T].
338
Chapter 9. Quadratic Forms
Remark: (1) In the congruence relation P^T A P = D, the matrix P need not be an orthogonal matrix, and the diagonal entries of D need not be eigenvalues of A.
(2) Be careful not to apply the same argument to the diagonalization of a symmetric matrix through the similarity P^{-1} A P = D of A, because multiplying E^{-1} on the right is not the same column operation as E^T, so the operations EAE^T do not work for that diagonalization of A.
To diagonalize a Hermitian matrix A through the Hermitian congruence in the complex case, a similar argument applies in parallel, with H in place of T. The following example shows how to determine the inertia of a real symmetric matrix A through the congruence relation (instead of computing the eigenvalues).

Example 9.7 (Computing In(A) through the congruence relation) Determine the inertia of the symmetric matrix
A = [ 1 1 2
      1 0 3
      2 3 6 ] .

Is it positive definite, negative definite, or indefinite?

Solution: The preceding method produces

[A | I] = [ 1 1 2 | 1 0 0
            1 0 3 | 0 1 0
            2 3 6 | 0 0 1 ]

→ [E_1 A E_1^T | E_1] = [ 1  0 2 |  1 0 0
                          0 -1 1 | -1 1 0
                          2  1 6 |  0 0 1 ]

→ [E_2 E_1 A E_1^T E_2^T | E_2 E_1] = [ 1  0 0 |  1 0 0
                                        0 -1 1 | -1 1 0
                                        0  1 2 | -2 0 1 ]

→ [E_3 E_2 E_1 A E_1^T E_2^T E_3^T | E_3 E_2 E_1] = [ 1  0 0 |  1 0 0
                                                      0 -1 0 | -1 1 0
                                                      0  0 3 | -3 1 1 ] = [D | P^T],
where

E_1 = [  1 0 0      E_2 = [  1 0 0      E_3 = [ 1 0 0
        -1 1 0               0 1 0              0 1 0
         0 0 1 ] ,          -2 0 1 ] ,          0 1 1 ] .
Since the diagonal entries of D are 1, −1 and 3, we get In(A) = (2, 1, 0), and A is indefinite. One can check that P^T A P = D by a direct computation. □
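The elimination scheme [A | I] → [D | P^T] described above is easy to carry out numerically. The following Python/numpy sketch (the function name congruence_diagonalize is ours; it assumes every pivot it encounters is nonzero, which holds for this A) reproduces Example 9.7.

```python
import numpy as np

def congruence_diagonalize(A):
    """Diagonalize a real symmetric A by paired row/column operations,
    returning (D, Pt) with Pt @ A @ Pt.T == D, i.e. P^T A P = D.
    Assumes every pivot encountered is nonzero (no swaps performed)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = A.copy()
    Pt = np.eye(n)                      # accumulates E_k ... E_1 = P^T
    for i in range(n):
        for j in range(i + 1, n):
            if M[i, i] == 0:
                raise ValueError("zero pivot: a more careful variant is needed")
            c = M[j, i] / M[i, i]
            M[j, :] -= c * M[i, :]      # row operation E
            M[:, j] -= c * M[:, i]      # the matching column operation E^T
            Pt[j, :] -= c * Pt[i, :]    # record E on the identity block
    return M, Pt

A = np.array([[1., 1., 2.],
              [1., 0., 3.],
              [2., 3., 6.]])
D, Pt = congruence_diagonalize(A)
print(np.diag(np.round(D)))             # [ 1. -1.  3.]
print(np.allclose(Pt @ A @ Pt.T, D))    # True: P^T A P = D
d = np.diag(D)
inertia = (int((d > 0).sum()), int((d < 0).sum()), int(np.isclose(d, 0).sum()))
print(inertia)                          # (2, 1, 0): A is indefinite
```

The identity block Pt ends up as P^T = E_3 E_2 E_1, exactly as computed by hand.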
Problem 9.9 Find an invertible matrix P such that P^T A P is diagonal for each of the following symmetric matrices:
Problem 9.10 For each of the following Hermitian matrices A, find an invertible matrix P such that P^H A P is diagonal, and determine In(A):

(1) A = [ 0 i        (2) A = [ 1     1-3i  2i       (3) A = [ 0 0 i
         -i 2 ] ,             1+3i   4     5                  0 0 0
                             -2i     5     1 ] ,             -i 0 0 ] .
9.6 Bilinear and Hermitian forms

In this section, we are concerned with two new forms, bilinear and Hermitian, to gain a deeper insight into real and complex quadratic forms, and we prove Sylvester's law of inertia as one of the main results.

Definition 9.6 A bilinear form on a pair of real vector spaces V and W is a real-valued function b : V × W → ℝ satisfying
(1) b(kx + ℓx′, y) = k b(x, y) + ℓ b(x′, y),
(2) b(x, ky + ℓy′) = k b(x, y) + ℓ b(x, y′)

for any x, x′ in V, y, y′ in W and any scalars k, ℓ. In particular, if V = W, b : V × V → ℝ is called a bilinear form on V.
The conditions (1) and (2) say that b is linear in the first variable and also in the second variable. In this sense, the function b : V × W → ℝ is said to be bilinear.

Example 9.8 (Every inner product is a bilinear form) Let A be an m × n real matrix and let b : ℝ^m × ℝ^n → ℝ be defined by b(x, y) = x^T A y for x ∈ ℝ^m, y ∈ ℝ^n. Then b is clearly a bilinear form. In particular, if m = n and A = I_n, the identity matrix, then it shows that
(1) the dot product on ℝ^n is a bilinear form. In general,
(2) any inner product b(x, y) = ⟨x, y⟩ on a real vector space is a bilinear form. □

Example 9.9 Let V be a vector space and V* its dual vector space, that is, V* = L(V; ℝ). Let b : V × V* → ℝ be defined by

b(v, v*) = v*(v) for any v ∈ V, v* ∈ V*.

Then b is clearly a bilinear form on the pair of vector spaces V and V*. □
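Conditions (1) and (2) of Definition 9.6 can be checked numerically for the form b(x, y) = x^T A y of Example 9.8; the matrix A and the test vectors below are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))       # any m x n real matrix
b = lambda x, y: x @ A @ y            # b : R^3 x R^4 -> R, b(x, y) = x^T A y

x, xp = rng.standard_normal(3), rng.standard_normal(3)
y, yp = rng.standard_normal(4), rng.standard_normal(4)
k, l = 2.0, -3.0

print(np.isclose(b(k*x + l*xp, y), k*b(x, y) + l*b(xp, y)))  # True (linearity in x)
print(np.isclose(b(x, k*y + l*yp), k*b(x, y) + l*b(x, yp)))  # True (linearity in y)
```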
Definition 9.7 A bilinear form b on a vector space V is said to be symmetric if b(x, y) = b(y, x) for any x, y ∈ V, and skew-symmetric (or alternating) if b(x, y) = −b(y, x) for any x, y ∈ V.
For example, the bilinear form b : ℝ^n × ℝ^n → ℝ defined by b(x, y) = x^T A y is symmetric (respectively, skew-symmetric) if and only if the matrix A is symmetric (respectively, skew-symmetric). Clearly, the dot product and a real quadratic form on ℝ^n are symmetric bilinear forms.

Problem 9.11 Show that a bilinear form b on ℝ^n is skew-symmetric if and only if b(x, x) = 0 for all x ∈ ℝ^n.
Definition 9.8 A sesquilinear form on a complex vector space V is a complex-valued function b : V × V → ℂ satisfying

(1) b(kx + ℓx′, y) = \bar{k} b(x, y) + \bar{ℓ} b(x′, y)   (semilinear in the 1st variable),
(2) b(x, ky + ℓy′) = k b(x, y) + ℓ b(x, y′)   (linear in the 2nd variable)

for any x, x′, y, y′ in V and any complex scalars k, ℓ. A sesquilinear form is called Hermitian if it satisfies

(3) b(x, y) = \overline{b(y, x)} for any x, y in V.
Example 9.10 (Every complex inner product is a Hermitian form) For any n × n complex matrix A, the function b : ℂ^n × ℂ^n → ℂ defined by b(x, y) = x^H A y for x, y ∈ ℂ^n is a Hermitian form if and only if A is a Hermitian matrix. In fact, b(x, y) = x^H A y is certainly semilinear in the first variable, and \overline{y^H A x} = x^H A y for all x, y ∈ ℂ^n if and only if the matrix A is Hermitian. As a special case, if one takes A = I_n, the identity matrix, then it shows that
(1) the dot product on ℂ^n is a Hermitian form. In general,
(2) any complex inner product b(x, y) = ⟨x, y⟩ on a complex vector space is a Hermitian form. □
Let b : V × V → ℝ be a bilinear form on a real vector space V, and let α = {v_1, v_2, …, v_n} be a basis for V. Such a bilinear form is completely determined by the values b(v_i, v_j) on the vectors v_i, v_j in the basis α because of the bilinearity. In fact, if

x = x_1 v_1 + x_2 v_2 + ⋯ + x_n v_n   and   y = y_1 v_1 + y_2 v_2 + ⋯ + y_n v_n

are vectors in V, then

b(x, y) = Σ_{i,j=1}^n x_i y_j b(v_i, v_j) = [x]_α^T A [y]_α,

where A = [a_ij] with a_ij = b(v_i, v_j). It is called the matrix representation of b with respect to the basis α and is denoted by [b]_α. Let β be another basis for the vector space V and let P = [id]_β be the basis-change matrix from β to α. Then we get
P[x]_β = [id]_β [x]_β = [x]_α
for any x in V, and

b(x, y) = [x]_α^T A [y]_α = [x]_β^T (P^T A P) [y]_β

for any x and y in V. Thus, two matrix representations of a bilinear form b with respect to different bases are congruent, and conversely any two congruent matrices can be matrix representations of the same bilinear form (verify it). Moreover, a bilinear form is symmetric (or skew-symmetric) if and only if its matrix representation is symmetric (or skew-symmetric) for any basis.

A similar process works in the complex case, with H instead of T, to obtain a matrix representation [b]_α of a sesquilinear form b with respect to the basis α. As in the real case, one can show that two matrix representations of a sesquilinear form b with respect to different bases are Hermitian congruent, and conversely any two Hermitian congruent matrices can be matrix representations of the same sesquilinear form. Moreover, a sesquilinear form b is Hermitian if and only if its matrix representation is Hermitian for any basis.

Problem 9.12 Prove:
(1) A bilinear form b is symmetric (or skew-symmetric, resp.) if and only if b(v_i, v_j) = b(v_j, v_i) (or b(v_i, v_j) = −b(v_j, v_i), resp.) for any vectors v_i, v_j in a basis α, or equivalently, the matrix representation [b]_α is symmetric (or skew-symmetric, resp.) for some basis α.
(2) A sesquilinear form is Hermitian if and only if the matrix representation [b]_α is Hermitian for some basis α.
(3) A sesquilinear form b on a complex vector space V is called skew-Hermitian if it satisfies b(x, y) = −\overline{b(y, x)} for any x, y in V. Show that a sesquilinear form b is skew-Hermitian if and only if its matrix representation [b]_α is skew-Hermitian for some basis α.
Note that congruent or Hermitian congruent matrices have the same rank, because the basis-change matrix P is nonsingular and so is P^T (or P^H).

Definition 9.9 The rank of a bilinear or a sesquilinear form b on a vector space V, written rank(b), is defined as the rank of any matrix representation of b.

Example 9.11 (Computing rank(b)) Let b : ℝ² × ℝ² → ℝ be defined by b(x, y) = x_1 y_1 + 3 x_1 y_2 + 2 x_2 y_1 − x_2 y_2 with respect to the standard basis α = {e_1, e_2}. Then b is clearly a bilinear form but not symmetric, and the matrix representation of b with respect to α is

[b]_α = [ 1  3
          2 -1 ] .
If β = {v_1, v_2} with v_1 = (1, 0), v_2 = (1, 1) is another basis for ℝ², then the matrix representation of b with respect to β becomes
[b]_β = [ 1 4
          3 5 ] ,

because b(v_1, v_1) = 1, b(v_1, v_2) = 4, b(v_2, v_1) = 3 and b(v_2, v_2) = 5. Hence, rank[b]_α = rank[b]_β = 2, and the rank of b is also 2. □
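The change of basis in Example 9.11 is the congruence [b]_β = P^T [b]_α P, where the columns of P are v_1 and v_2 written in the standard basis; a quick numpy check:

```python
import numpy as np

Ba = np.array([[1., 3.],
               [2., -1.]])     # [b]_alpha for b(x,y) = x1y1 + 3x1y2 + 2x2y1 - x2y2
P = np.array([[1., 1.],
              [0., 1.]])       # columns are v1 = (1, 0), v2 = (1, 1)
Bb = P.T @ Ba @ P
print(Bb)                      # [[1, 4], [3, 5]] = [b]_beta
print(np.linalg.matrix_rank(Ba), np.linalg.matrix_rank(Bb))  # 2 2
```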
Problem 9.13 (1) Let b : ℝ³ × ℝ³ → ℝ be defined by b(x, y) = x_1 y_1 − 2 x_1 y_2 + x_2 y_1 − x_3 y_3 with respect to the standard basis. Is this a bilinear form? If so, find the matrix representation of b with respect to the basis

α = {v_1 = (1, 0, 1), v_2 = (1, 0, −1), v_3 = (0, 1, 0)}.
Find its rank.
(2) Let V = M_{2×2}(ℝ) be the vector space of 2 × 2 matrices, and let b : V × V → ℝ be defined by b(A, B) = tr(A)·tr(B). Is this a bilinear form? If so, find the matrix representation of b with respect to the basis

α = { E_1 = [ 1 0; 0 0 ], E_2 = [ 0 1; 0 0 ], E_3 = [ 0 0; 1 0 ], E_4 = [ 0 0; 0 1 ] }.

Find its rank.
9.7 Diagonalization of bilinear or Hermitian forms

Every inner product ⟨x, Ay⟩ on a real vector space can be represented by a symmetric matrix A, which is diagonalizable. However, this is not true for a general bilinear form.
Definition 9.10 A bilinear (or sesquilinear) form b on V is diagonalizable if there exists a basis α for V such that the matrix representation [b]_α of b with respect to α is diagonal.

Theorem 9.6 (1) A bilinear form b on a real vector space V is symmetric if and only if it is diagonalizable.
(2) A sesquilinear form b on a complex vector space V is Hermitian if and only if it is diagonalizable with all diagonal entries real.

Proof: We prove only (1) and leave (2) as an exercise. Since every symmetric matrix is orthogonally diagonalizable, we only need to prove the sufficiency. Let a bilinear form b be diagonalizable, so that the matrix representation [b]_α is diagonal for some basis α. Then, for any vectors v_i, v_j in the basis α = {v_1, …, v_n}, we have b(v_i, v_j) = b(v_j, v_i). Now, for any two vectors x and y in V, let x = Σ_{i=1}^n x_i v_i and y = Σ_{j=1}^n y_j v_j. Then

b(x, y) = Σ_{i,j=1}^n x_i y_j b(v_i, v_j) = Σ_{i,j=1}^n y_j x_i b(v_j, v_i) = b(y, x).

Hence, b is symmetric. (See also Problem 9.12(1).) □
Problem 9.14 Prove Theorem 9.6(2): a sesquilinear form b on a complex vector space V is Hermitian if and only if it is diagonalizable with all diagonal entries real.
Example 9.12 (Diagonalizing a symmetric bilinear form) Let b : ℝ³ × ℝ³ → ℝ be the bilinear form defined by

b(x, y) = x_1 y_3 − 2 x_2 y_2 + 2 x_2 y_3 + x_3 y_1 + 2 x_3 y_2 − x_3 y_3.

Clearly, b(x, y) = b(y, x), and the matrix representation of b with respect to the standard basis α = {e_1, e_2, e_3} is

[b]_α = [ 0  0  1
          0 -2  2
          1  2 -1 ] ,
which is symmetric. Hence, the bilinear form b is symmetric. By Theorem 9.6, it is diagonalizable through the congruence. In fact,

[[b]_α | I] = [ 0  0  1 | 1 0 0
                0 -2  2 | 0 1 0
                1  2 -1 | 0 0 1 ]

→ [E_1 [b]_α E_1^T | E_1] = [ 0  0 1 | 1 0 0
                              0 -2 0 | 0 1 0
                              1  0 1 | 0 1 1 ]

→ [E_2 E_1 [b]_α E_1^T E_2^T | E_2 E_1] = [ -1  0 0 | 1 -1 -1
                                             0 -2 0 | 0  1  0
                                             0  0 1 | 0  1  1 ] = [D | P^T],

where

E_1 = [ 1 0 0        E_2 = [ 1 0 -1
        0 1 0                0 1  0
        0 1 1 ] ,            0 0  1 ] .

By a direct computation, one can show that P^T [b]_α P = D. Moreover, if we take another basis β = {c_1, c_2, c_3} consisting of the column vectors of the matrix P, then P = [id]_β and [b]_β = P^T [b]_α P = D. Hence, if we write [x]_β = (x_1′, x_2′, x_3′) and [y]_β = (y_1′, y_2′, y_3′) as new variables, then the bilinear form b becomes

b(x, y) = −x_1′ y_1′ − 2 x_2′ y_2′ + x_3′ y_3′. □
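A numpy check of Example 9.12, with E_1 and E_2 exactly as above:

```python
import numpy as np

B = np.array([[0., 0., 1.],
              [0., -2., 2.],
              [1., 2., -1.]])          # [b]_alpha of Example 9.12
E1 = np.array([[1., 0., 0.],
               [0., 1., 0.],
               [0., 1., 1.]])          # add row 2 to row 3
E2 = np.array([[1., 0., -1.],
               [0., 1., 0.],
               [0., 0., 1.]])          # add -1 times row 3 to row 1
Pt = E2 @ E1                           # P^T = E_2 E_1
D = Pt @ B @ Pt.T
print(np.diag(D))                      # [-1. -2.  1.]
```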
A skew-symmetric matrix is not diagonalizable in general, but the following theorem shows the structure of a skew-symmetric bilinear form. Note that a bilinear form b is skew-symmetric if and only if b(x, x) = 0 for any x in V.
Theorem 9.7 Let b : V × V → ℝ be a skew-symmetric bilinear form. Then there exists a basis α for V with respect to which the matrix representation [b]_α is of the block diagonal form

[b]_α = diag( [ 0 1; -1 0 ], …, [ 0 1; -1 0 ], 0, …, 0 ).
Proof: If b = 0, then [b]_α is the zero matrix. Also, if dim V = 1, then b(x, x) = 0 for any basis vector x in V, so b = 0. Now, we assume that b ≠ 0 and prove it by induction on dim V. Since b ≠ 0, there exist nonzero vectors x and y in V such that b(x, y) ≠ 0. By the bilinearity of b, one can assume that b(x, y) = 1. Such vectors x and y must be linearly independent, because if y = kx, then b(x, y) = k b(x, x) = 0. Let U be the subspace of V spanned by x and y, and let

W = {v ∈ V : b(v, u) = 0 for any u ∈ U}.

Then one can easily show that W is also a subspace of V and U ∩ W = {0}. Moreover, U + W = V. In fact, for a given vector v ∈ V, let u = b(v, y)x − b(v, x)y. It is easy to show that u ∈ U and v − u ∈ W. Thus V = U ⊕ W, where dim W = n − 2. Clearly, the matrix representation of the restriction of b to U with respect to the basis {x, y} is [ 0 1; -1 0 ], and the restriction of b to W is also skew-symmetric. The induction hypothesis can be applied to W, and then one can finish the proof. □

Problem 9.15 Prove that U ∩ W = {0} in the proof of Theorem 9.7.
Example 9.13 (Block diagonalizing a skew-symmetric bilinear form) Let b : ℝ³ × ℝ³ → ℝ be the bilinear form defined by

b(x, y) = x_1 y_2 − x_2 y_1 + x_3 y_1 − x_1 y_3 + x_2 y_3 − x_3 y_2.

Clearly, b(x, y) = −b(y, x), and the matrix representation of b with respect to the standard basis α = {e_1, e_2, e_3} is

[b]_α = [  0  1 -1
          -1  0  1
           1 -1  0 ] ,

which is skew-symmetric. By a simple computation, b(e_1, e_2) = 1 = −b(e_2, e_1). Let U be the subspace of ℝ³ spanned by e_1 and e_2, i.e., the xy-plane. If we set W = {v ∈ V : b(v, u) = 0 for any u ∈ U}, then W = {λz : λ ∈ ℝ}, where z = (1, 1, 1). Clearly, β = {e_1, e_2, z} is a basis for ℝ³ and b(z, z) = 0, so that

[b]_β = [  0 1 0
          -1 0 0
           0 0 0 ] . □
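A numpy check of Example 9.13: with Q the matrix whose columns are e_1, e_2 and z, the representation Q^T [b]_α Q has the block form promised by Theorem 9.7.

```python
import numpy as np

B = np.array([[0., 1., -1.],
              [-1., 0., 1.],
              [1., -1., 0.]])          # skew-symmetric [b]_alpha
Q = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [0., 0., 1.]])           # columns e1, e2, z = (1, 1, 1)
print(Q.T @ B @ Q)                     # [[0, 1, 0], [-1, 0, 0], [0, 0, 0]]
```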
Problem 9.16 Show that any bilinear form b on a vector space V is the sum of a symmetric bilinear form and a skew-symmetric bilinear form.
The following theorem shows how quadratic forms and symmetric bilinear forms are related.
Theorem 9.8 If b is a symmetric bilinear form on ℝ^n, then the function q(x) = b(x, x) for x ∈ ℝ^n is a quadratic form. Conversely, for every quadratic form q, there is a unique symmetric bilinear form b such that q(x) = b(x, x) for all x in ℝ^n.

Proof: If b(x, y) = x^T A y is a symmetric bilinear form, then q(x) = b(x, x) = x^T A x is clearly a quadratic form. Conversely, if b is a symmetric bilinear form, then

b(x + y, x + y) = b(x, x) + 2 b(x, y) + b(y, y),

which is called the polar form of b. Hence, for any given quadratic form q(x) = x^T A x with a symmetric matrix A, a bilinear form b can be defined by

b(x, y) = ½ [q(x + y) − q(x) − q(y)].

This form b is clearly symmetric, bilinear, and b(x, x) = q(x). The uniqueness also comes from this relation. □
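The recovery of b from q in the proof of Theorem 9.8 is the polarization formula b(x, y) = ½[q(x + y) − q(x) − q(y)]; a numerical spot check (random symmetric A and test vectors of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((3, 3))
A = (S + S.T) / 2                       # a symmetric matrix
q = lambda x: x @ A @ x                 # quadratic form q(x) = x^T A x
b = lambda x, y: (q(x + y) - q(x) - q(y)) / 2   # polarization

x, y = rng.standard_normal(3), rng.standard_normal(3)
print(np.isclose(b(x, y), x @ A @ y))   # True: b(x, y) = x^T A y
print(np.isclose(b(x, x), q(x)))        # True: b(x, x) = q(x)
```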
The following theorem shows how complex quadratic forms and Hermitian forms are related.
Theorem 9.9 If b is a Hermitian form on ℂ^n, then the function q(x) = b(x, x) for x ∈ ℂ^n is a complex quadratic form. Conversely, for every complex quadratic form q, there is a unique Hermitian form b such that q(x) = b(x, x) for all x in ℂ^n.
Proof: If b(x, y) = x^H A y is a Hermitian form, then q(x) = b(x, x) = x^H A x is clearly a complex quadratic form. Conversely, if b is a Hermitian form, then

b(x + y, x + y) = b(x, x) + b(y, y) + b(x, y) + \overline{b(x, y)},
b(x − y, x − y) = b(x, x) + b(y, y) − b(x, y) − \overline{b(x, y)}.

Hence, for any given complex quadratic form q(x) = x^H A x with a Hermitian matrix A, a Hermitian form b can be defined by

b(x, y) = ¼ [q(x + y) − q(x − y)] + (i/4) [q(ix + y) − q(ix − y)],

which is called the polar form of the Hermitian form b. This form b is clearly Hermitian and b(x, x) = q(x), which implies the uniqueness of b. □

Now, we prove Sylvester's law of inertia.
Theorem 9.10 (Sylvester's law of inertia) Let b be a symmetric bilinear or a Hermitian form on a vector space V. Then the number of positive diagonal entries and the number of negative diagonal entries of any diagonal representation of b are both independent of the diagonal representation.
Proof: We only prove it for a symmetric bilinear form, because the other case can be proved by a similar method. Let b be a symmetric bilinear form on a vector space V, and let α = {x_1, …, x_p, x_{p+1}, …, x_n} be an ordered basis for V giving a diagonal representation of b, in which

b(x_i, x_i) > 0 for i = 1, 2, …, p, and b(x_i, x_i) ≤ 0 for i = p+1, …, n,

and let β = {y_1, …, y_{p′}, y_{p′+1}, …, y_n} be another such ordered basis for V, in which

b(y_i, y_i) > 0 for i = 1, 2, …, p′, and b(y_i, y_i) ≤ 0 for i = p′+1, …, n.

To show p = p′, let U and W be the subspaces of V spanned by {x_1, …, x_p} and {y_{p′+1}, …, y_n}, respectively. Then b(u, u) > 0 for any nonzero vector u ∈ U and b(w, w) ≤ 0 for any nonzero vector w ∈ W, by the bilinearity of b. Thus, U ∩ W = {0}, and

dim(U + W) = dim U + dim W − dim(U ∩ W) = p + (n − p′) ≤ n,

or p ≤ p′. Similarly, one can show p′ ≤ p, to conclude p = p′. Therefore, any two diagonal matrix representations of b have the same number of positive diagonal entries. By considering the bilinear form −b instead of b, one can also show that any two diagonal matrix representations of b have the same number of negative diagonal entries. □
Corollary 9.11 (1) Any two symmetric matrices which are congruent have the same inertia.
(2) Any two Hermitian matrices which are Hermitian congruent have the same inertia.

Definition 9.11 Let A be a real symmetric or a Hermitian matrix. The number of positive eigenvalues of A is called the index of A. The difference between the number of positive eigenvalues and the number of negative eigenvalues of A is called the signature of A.

Hence, the index and signature, together with the rank of a symmetric or a Hermitian matrix, are invariants under the congruence relation, and any two of these invariants determine the third: that is,

the index = the number of positive eigenvalues,
the rank = the index + the number of negative eigenvalues,
the signature = the index − the number of negative eigenvalues.
We have shown the necessary condition of the following corollary.
Corollary 9.12 (1) Two symmetric matrices are congruent if and only if they have the same invariants: index, signature and rank.
(2) Two Hermitian matrices are Hermitian congruent if and only if they have the same invariants.

Proof: We only prove (1). Suppose that two symmetric matrices A and B have the same invariants, and let D and E be diagonal matrices congruent to A and B, respectively. Without loss of generality, one may choose D and E so that the diagonal entries are in the order of positive, negative and zero. Let p and r denote the index and the rank, respectively, of both D and E. Let d_i denote the i-th diagonal entry of D. Define the diagonal matrix Q whose i-th diagonal entry q_i is given by

q_i = 1/√(d_i)  if 1 ≤ i ≤ p,   q_i = 1/√(−d_i)  if p < i ≤ r,   q_i = 1  if r < i ≤ n.

Then Q^T D Q = J_{pr}, the diagonal matrix whose first p diagonal entries are 1, whose next r − p diagonal entries are −1, and whose remaining diagonal entries are 0. Hence, A is congruent to J_{pr}, and similarly so is B. It follows that A is congruent to B. □
Example 9.14 Determine the index, the signature and the rank for each of the matrices A, B, and C, where

A = [ 1 1 2
      1 0 3
      2 3 6 ]

is the matrix of Example 9.7.
Which are congruent to each other?

Solution: In Example 9.7, we saw that the matrix A is congruent to the diagonal matrix

D = [ 1  0 0
      0 -1 0
      0  0 3 ] .

Therefore, A has rank 3, index 2 and signature 1. The matrix B is already diagonal, and has rank 3, index 3 and signature 3. Using the method of Example 9.7, one can show that C is congruent to the diagonal matrix with diagonal entries 1, 1, −4. Therefore, C has rank 3, index 2 and signature 1. (Note that it is not necessary to find the eigenvalues of C to determine its invariants.) We conclude that A and C are congruent, and that B is congruent to neither A nor C, by Corollary 9.12. □
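By Corollary 9.11, the invariants can equally well be read off from the eigenvalues. The helper below (our own, with a tolerance against rounding) returns the index, signature and rank, and reproduces the invariants of A in Example 9.14.

```python
import numpy as np

def invariants(S, tol=1e-9):
    """(index, signature, rank) of a real symmetric matrix S."""
    ev = np.linalg.eigvalsh(S)
    p = int((ev > tol).sum())           # index = number of positive eigenvalues
    m = int((ev < -tol).sum())          # number of negative eigenvalues
    return p, p - m, p + m

A = np.array([[1., 1., 2.],
              [1., 0., 3.],
              [2., 3., 6.]])
print(invariants(A))                    # (2, 1, 3)
```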
Problem 9.17 Prove that if the diagonal entries of a diagonal matrix are permuted, then the resulting diagonal matrix is congruent to the original one.

Problem 9.18 Prove that the total number of distinct equivalence classes of congruent n × n real symmetric matrices is equal to ½(n + 1)(n + 2).

Problem 9.19 Find the signature, the index and the rank of each of the following matrices:

(1) [ 0  1  2        (2) [ 1 2 3
      1  3 -2              2 4 5
      2 -2  4 ] ,          3 5 6 ] .

Which are congruent to each other?
9.8 Applications

9.8.1 Extrema of real-valued functions on ℝ^n

In calculus, one uses the second derivative test to see whether a given function y = f(x) takes a local maximum or a local minimum at a critical point. In this section, we show a similar test for a function of more than one variable, and also show how quadratic forms arise and how they can be used in this context.
Let f(x) be a real-valued function (not necessarily a quadratic equation) on ℝ^n. A point x_0 in ℝ^n at which either a first partial derivative of f fails to exist or the first partial derivatives of f are all zero is called a critical point of f. If f(x) has either a local maximum or a local minimum at a point x_0 and all the first partial derivatives of f exist at x_0, then all of them must be zero, i.e., f_{x_i}(x_0) = 0 for all i = 1, 2, …, n. Thus, if f(x) has first partial derivatives everywhere, its local maxima and minima will occur at critical points.

Let us first consider a function of two variables: f(x), x = (x, y) ∈ ℝ², which has a critical point x_0 = (x_0, y_0) ∈ ℝ². If f has continuous third partial derivatives in a neighborhood of x_0, it can be expanded in a Taylor series about that point: for x = (x_0 + h, y_0 + k),

f(x) = f(x_0 + h, y_0 + k)
     = f(x_0) + (h f_x(x_0) + k f_y(x_0)) + ½ (h² f_xx(x_0) + 2hk f_xy(x_0) + k² f_yy(x_0)) + R
     = f(x_0) + ½ (ah² + 2bhk + ck²) + R,

where the first-order term vanishes because x_0 is a critical point,

a = f_xx(x_0),   b = f_xy(x_0),   c = f_yy(x_0),

and the remainder R is given by

R = (1/6) (h³ f_xxx(z) + 3h²k f_xxy(z) + 3hk² f_xyy(z) + k³ f_yyy(z)),

with z = (x_0 + θh, y_0 + θk) for some 0 < θ < 1. If h and k are sufficiently small, |R| will be smaller than the absolute value of ½(ah² + 2bhk + ck²), and hence f(x) − f(x_0) and ah² + 2bhk + ck² will have the same sign. Note that the expression
q(h, k) = ah² + 2bhk + ck² = [h k] H [ h; k ]

is a quadratic form in the variables h and k, where

H = H(x_0) = [ a b; b c ] = [ f_xx(x_0)  f_xy(x_0); f_xy(x_0)  f_yy(x_0) ]

is a symmetric matrix, called the Hessian of f at x_0 = (x_0, y_0). Hence, f(x, y) has a local minimum (or maximum) at x_0 if the quadratic form q(h, k) is positive (or negative, respectively) for all sufficiently small (h, k). The critical point x_0 is called a saddle point if q(h, k) takes both positive and negative values; at such a point, f(x, y) has neither a local minimum nor a local maximum. (This is the second derivative test for local extrema of f(x, y).) In particular, a quadratic form

q(x) = x^T A x = [x y] [ a b; b c ] [ x; y ] = ax² + 2bxy + cy²
for x = [x y]^T ∈ ℝ² is itself a function of two variables, and its first partial derivatives are

q_x = 2ax + 2by,
q_y = 2bx + 2cy.
By setting these equal to zero, we see that 0 = (0, 0) is a critical point of q. If ac − b² ≠ 0, this will be the only critical point of q. Note that the Hessian of q is

H = 2A = [ 2a 2b; 2b 2c ],   with det H = 4(ac − b²).

Thus, H is nonsingular if and only if ac − b² ≠ 0. Since q(0) = 0, it follows that the quadratic form q takes the global minimum at 0 if and only if q(x) = x^T A x > 0 for all x ≠ 0, and q takes the global maximum at 0 if and only if q(x) = x^T A x < 0 for all x ≠ 0.
If x^T A x takes both positive and negative values, then 0 is a saddle point. Thus, if A is nonsingular, the quadratic form q will have either the global minimum, the global maximum or a saddle point at 0. In general, if a function f of two variables has a nonsingular Hessian H at a critical point x_0 = (x_0, y_0), with nonzero eigenvalues λ_1 and λ_2, then the second derivative test for f(x) says:

(1) f has a minimum at x_0 if both λ_1 and λ_2 are positive,
(2) f has a maximum at x_0 if both λ_1 and λ_2 are negative,
(3) f has a saddle point at x_0 if λ_1 and λ_2 have different signs.
Example 9.15 (The extrema of a quadratic form f(x, y) = x^T A x can be determined by the inertia of A) For q(x, y) = 2x² − 4xy + 5y², determine the nature of the critical point (0, 0).

Solution: The matrix of the quadratic form is

A = [ 2 -2
     -2  5 ] .

There are two methods:

(1) Similarity method: Solve det(λI − A) = 0 to get the eigenvalues λ_1 = 6 and λ_2 = 1. Since both eigenvalues are positive, A is positive definite and hence (0, 0) is a global minimum.

(2) Congruence method: Diagonalize the matrix A through the congruence relation to get

E A E^T = [ 2 0
            0 3 ] ,   where E = [ 1 0
                                  1 1 ] .

It shows that In(A) = (2, 0, 0), so A is positive definite and hence (0, 0) is a global minimum. □
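Both methods of Example 9.15 are easy to check with numpy:

```python
import numpy as np

A = np.array([[2., -2.],
              [-2., 5.]])
# (1) similarity method: both eigenvalues positive
print(np.round(np.linalg.eigvalsh(A)))  # [1. 6.]
# (2) congruence method: E adds row 1 to row 2
E = np.array([[1., 0.],
              [1., 1.]])
print(E @ A @ E.T)                      # diag(2, 3), so In(A) = (2, 0, 0)
```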
Example 9.16 (The inertia of the Hessian determines the local extrema of any (non-quadratic) f(x, y) at critical points) Find and describe all critical points of the function

f(x, y) = ⅓x³ + xy² − 4xy + 1.

Solution: The first partial derivatives of f are

f_x = x² + y² − 4y,   f_y = 2xy − 4x = 2x(y − 2).

Setting f_y = 0, we get x = 0 or y = 2. Setting f_x = 0, we see that if x = 0, then y must be either 0 or 4, and if y = 2, then x = ±2. Thus, (0, 0), (0, 4), (2, 2), (−2, 2) are the critical points of f. To classify these critical points, we compute the second partial derivatives:

f_xx = 2x,   f_xy = 2y − 4,   f_yy = 2x.

For each critical point (x_0, y_0), one can determine the eigenvalues λ_1 and λ_2 of the Hessian

H = [ 2x_0      2y_0 - 4
      2y_0 - 4  2x_0     ] .

These values are summarized in the following table:

Critical point (x_0, y_0) | λ_1 | λ_2 | Description
(0, 0)                    |  4  | −4  | saddle point
(0, 4)                    |  4  | −4  | saddle point
(2, 2)                    |  4  |  4  | local minimum
(−2, 2)                   | −4  | −4  | local maximum

As an alternative method, one can compute the inertia of the Hessian at each critical point by a congruence relation and get the same description of the nature of the critical points. □

Beyond functions of two variables, the same argument of the second derivative test can be justified for functions of more than two variables with a Taylor series about critical points: Let f(x) = f(x_1, x_2, …, x_n) be a real-valued function whose third partial derivatives are all continuous. If x_0 is a critical point of f, the Hessian of f at x_0 is the n × n symmetric matrix H = H(x_0) = [h_ij] given by
h_ij = ∂²f/∂x_i∂x_j (x_0).

The critical point can be classified as follows:
(1) f has a local minimum at x_0 if H(x_0) is positive definite,
(2) f has a local maximum at x_0 if H(x_0) is negative definite,
(3) x_0 is a saddle point of f if H(x_0) is indefinite.
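This classification is mechanical once the Hessian is known. The sketch below (the helper classify is ours) applies it to the Hessian of Example 9.16 at its four critical points.

```python
import numpy as np

def classify(H, tol=1e-9):
    ev = np.linalg.eigvalsh(H)
    if np.all(ev > tol):
        return "local minimum"
    if np.all(ev < -tol):
        return "local maximum"
    if ev.min() < -tol and ev.max() > tol:
        return "saddle point"
    return "inconclusive (singular Hessian)"

# Hessian of f(x, y) = x^3/3 + x y^2 - 4xy + 1 at (x0, y0)
H = lambda x0, y0: np.array([[2.0 * x0, 2.0 * y0 - 4.0],
                             [2.0 * y0 - 4.0, 2.0 * x0]])
for pt in [(0, 0), (0, 4), (2, 2), (-2, 2)]:
    print(pt, classify(H(*pt)))
# (0, 0) and (0, 4): saddle point; (2, 2): local minimum; (-2, 2): local maximum
```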
Example 9.17 (The inertia of the Hessian determines the extrema of any f(x, y, z) at critical points) Find the local extrema of the function

f(x, y, z) = x² + xz − 3 cos y + z².

Solution: The first partial derivatives of f are

f_x = 2x + z,   f_y = 3 sin y,   f_z = x + 2z.

It follows that (x, y, z) is a critical point of f if and only if x = z = 0 and y = nπ, where n is an integer. Let x_0 = (0, 2kπ, 0). The Hessian of f at x_0 is given by

H(x_0) = [ 2 0 1
           0 3 0
           1 0 2 ] .

It can be diagonalized through the congruence relation to get

H(x_0) = [ 2 0 1        →   [ 2 0  0
           0 3 0              0 3  0
           1 0 2 ]            0 0 3/2 ] .

It shows that In(H(x_0)) = (3, 0, 0), so H(x_0) is positive definite and hence f has a local minimum at x_0. (Alternatively, one can compute the eigenvalues of H(x_0), which are 3, 3, and 1; this also implies that H(x_0) is positive definite.)

On the other hand, at a critical point of the form x_1 = (0, (2k − 1)π, 0), the Hessian will be

H(x_1) = [ 2  0 1
           0 -3 0
           1  0 2 ] .

One can show either that In(H(x_1)) = (2, 1, 0) by using a congruence relation, or that the eigenvalues of H(x_1) are −3, 3, and 1. Either one shows that H(x_1) is indefinite, and hence x_1 is a saddle point of f. □
Problem 9.20 For each of the following functions, determine whether the given critical point corresponds to a local minimum, a local maximum, or a saddle point:

(1) f(x, y) = 3x² − xy + y² at (0, 0);
(2) f(x, y, z) = x³ + xyz + y² − 3x at (1, 0, 0).

Problem 9.21 Show that for a continuous function f(x, y) on ℝ² which has continuous third partial derivatives, a critical point x_0 = (x_0, y_0) ∈ ℝ² is a saddle point if and only if det H(x_0) < 0. Is it also true for such a function f(x, y, z) on ℝ³?
9.8.2 Constrained quadratic optimization
One of the most important problems in applied mathematics is the optimization (minimization or maximization) of a real-valued function f of n variables subject to constraints on the variables. For example, when the function f is a linear form subject to constraints in the form of linear equalities and/or inequalities, the optimization problem is known as linear programming. Such optimization problems are extensively used in military, industrial and governmental planning, among other fields. In this section, we consider an optimization problem for a quadratic form in n variables. If there are no constraints on the variables, then such an optimization problem was discussed in Section 9.8.1. As a quadratic optimization problem with constraints, we consider a very special one: find the maximum and minimum values of a (real or complex) quadratic form q(x) = ⟨x, Ax⟩ subject to the constraint ‖x‖ = 1. Advanced calculus tells us that such constrained extrema of q(x) always exist.

Theorem 9.13 Let A be a symmetric or a Hermitian matrix, and let the eigenvalues of A be λ_min = λ_1 ≤ λ_2 ≤ ⋯ ≤ λ_n = λ_max in increasing order. Then,

(1) λ_min ‖x‖² ≤ ⟨x, Ax⟩ ≤ λ_max ‖x‖² for all x.
(2) ⟨x, Ax⟩ = λ ‖x‖² if x is an eigenvector of A belonging to an eigenvalue λ.
(3) λ_max = max_{x≠0} ⟨x, Ax⟩/⟨x, x⟩ = max_{‖x‖=1} ⟨x, Ax⟩, and for a unit vector x, λ_max = ⟨x, Ax⟩ if and only if x is an eigenvector belonging to the eigenvalue λ_max.
(4) λ_min = min_{x≠0} ⟨x, Ax⟩/⟨x, x⟩ = min_{‖x‖=1} ⟨x, Ax⟩, and for a unit vector x, λ_min = ⟨x, Ax⟩ if and only if x is an eigenvector belonging to the eigenvalue λ_min.

In particular, the maximum and minimum values of a (real or complex) quadratic form q(x) = ⟨x, Ax⟩ subject to the constraint ‖x‖ = 1 are the largest and the smallest eigenvalues of A, respectively.
Proof: We prove it only for a Hermitian matrix A; the case of a real symmetric matrix is left as an exercise. If A is Hermitian, there is a unitary matrix U such that U^H A U = D is a diagonal matrix with λ_1, λ_2, …, λ_n as its diagonal entries. Moreover, with the change of coordinates y = U^H x = [y_1 y_2 ⋯ y_n]^T, we have

⟨x, Ax⟩ = x^H A x = λ_1|y_1|² + λ_2|y_2|² + ⋯ + λ_n|y_n|²

by the principal axes theorem, and ‖x‖ = ‖y‖ because U is unitary. It implies that

⟨x, Ax⟩ = λ_1|y_1|² + λ_2|y_2|² + ⋯ + λ_n|y_n|²
        ≤ λ_n|y_1|² + λ_n|y_2|² + ⋯ + λ_n|y_n|²
        = λ_n (|y_1|² + |y_2|² + ⋯ + |y_n|²) = λ_n ‖y‖² = λ_max ‖x‖²

since λ_n = λ_max is the largest eigenvalue. Similarly, one can show λ_min ‖x‖² ≤ ⟨x, Ax⟩ for all x. It proves (1).

(2) If x is an eigenvector of A belonging to λ, then

⟨x, Ax⟩ = ⟨x, λx⟩ = λ⟨x, x⟩ = λ‖x‖².

In particular, if x is an eigenvector of A belonging to λ_max (λ_min, respectively) and ‖x‖ = 1, then ⟨x, Ax⟩ = λ_max (λ_min, respectively).

(3) We only prove the necessity part of the second assertion, because all other parts are clear from (1) and (2). To show this, suppose that λ_max = ⟨x, Ax⟩ for a Hermitian matrix A and a unit vector x. Let λ_1, λ_2, …, λ_n be the eigenvalues of A with associated eigenvectors v_1, v_2, …, v_n, respectively. One can assume that λ_max = λ_1 and that the eigenvectors v_1, v_2, …, v_n are orthonormal, since A is Hermitian. Let x = Σ_j a_j v_j. Then we have

λ_1 = ⟨x, Ax⟩ = Σ_{j=1}^n |a_j|² λ_j ≤ Σ_{j=1}^n |a_j|² λ_1 = λ_1,

since λ_max = λ_1 and Σ_j |a_j|² = ‖x‖² = 1. Hence, it must be a_j = 0 whenever λ_j ≠ λ_1, which implies that x is an eigenvector belonging to λ_1.

(4) can be proved in a similar way to (3). □
Definition 9.12 The Rayleigh quotient of a symmetric or Hermitian matrix A is the function R_A defined by

R_A(x) = ⟨x, Ax⟩ / ⟨x, x⟩   for x ≠ 0.

It follows from Theorem 9.13 that, subject to the constraint ‖x‖ = 1, the quadratic form ⟨x, Ax⟩ has the maximum value λ_max and the minimum value λ_min. It means that the smallest and the largest eigenvalues of a Hermitian matrix are characterized as the solutions of a constrained minimum and maximum problem of the Rayleigh quotient. This is very important in vibration problems ranging from aerodynamics to particle physics.
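The bounds of Theorem 9.13 can be observed numerically: every value of the Rayleigh quotient lies between the smallest and largest eigenvalues. The random matrix and the sampling below are illustrative choices of ours, not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.standard_normal((4, 4))
A = (S + S.T) / 2                        # a random symmetric matrix
R = lambda x: (x @ A @ x) / (x @ x)      # Rayleigh quotient R_A(x)

ev = np.linalg.eigvalsh(A)               # eigenvalues in increasing order
samples = [R(rng.standard_normal(4)) for _ in range(2000)]
print(ev[0] <= min(samples) + 1e-9 and max(samples) <= ev[-1] + 1e-9)  # True
```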
Example 9.18 (The extreme values of a constrained quadratic form) Find the maximum and minimum values of the quadratic form x_1² + x_2² + 4x_1x_2 subject to the constraint x_1² + x_2² = 1, and determine the values of x_1 and x_2 at which the maximum and minimum occur.

Solution: The quadratic form can be written as

x_1² + x_2² + 4x_1x_2 = x^T A x = [x_1 x_2] [ 1 2; 2 1 ] [ x_1; x_2 ].

The eigenvalues of A are λ = 3 and λ = −1, which are the largest and smallest eigenvalues, respectively. Their associated unit eigenvectors are

x = ±(1/√2, 1/√2)   and   x = ±(1/√2, −1/√2),

respectively. Note that the extreme values of the quadratic form occur at those unit eigenvectors. Thus, subject to the constraint x_1² + x_2² = 1, the maximum value of the quadratic form is λ = 3, which occurs at x = ±(1/√2, 1/√2), and the minimum value is λ = −1, which occurs at x = ±(1/√2, −1/√2). Clearly, the quadratic equation x^T A x = c is a hyperbola, and the extreme values occur where the hyperbola and the unit circle intersect, as in Figure 9.7. □
Figure 9.7. Extreme values of the constrained quadratic form: the curves x_1^2 + x_2^2 + 4x_1x_2 = 3 and x_1^2 + x_2^2 + 4x_1x_2 = -1 meet the unit circle.
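The numbers in Example 9.18 can be double-checked by parametrizing the constraint circle. The short plain-Python sketch below (the 3600-point grid is an arbitrary choice) evaluates the form at (cos t, sin t), where it reduces to 1 + 2 sin 2t.

```python
import math

def q(x1, x2):
    # The quadratic form of Example 9.18.
    return x1 ** 2 + x2 ** 2 + 4.0 * x1 * x2

# On the unit circle x = (cos t, sin t) the form reduces to 1 + 2*sin(2t),
# so its maximum is 3 (at t = pi/4) and its minimum is -1 (at t = 3*pi/4).
vals = [q(math.cos(2 * math.pi * k / 3600), math.sin(2 * math.pi * k / 3600))
        for k in range(3600)]
print(round(max(vals), 6), round(min(vals), 6))  # 3.0 -1.0

# The extremes are attained at the unit eigenvectors +-(1/sqrt 2, +-1/sqrt 2).
r = 1.0 / math.sqrt(2.0)
assert abs(q(r, r) - 3.0) < 1e-12 and abs(q(r, -r) + 1.0) < 1e-12
```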
Remark: In Example 9.18, it is shown that the maximum value of the quadratic form x_1^2 + x_2^2 + 4x_1x_2 subject to the constraint x_1^2 + x_2^2 = 1 is 3. By examining Figure 9.7, one can also see the following dual constrained optimization: the minimum value of the quadratic form x_1^2 + x_2^2 subject to the constraint x_1^2 + x_2^2 + 4x_1x_2 = 3 is 1.
Problem 9.22 Find the maximum and minimum values of the Rayleigh quotient of each of the following matrices:
0 0 I
0 -i
I
Problem 9.23 Find the maximum and minimum values of the quadratic form 2x_1^2 + 2x_2^2 + 3x_1x_2 subject to the constraint x_1^2 + x_2^2 = 1, and determine the values of x_1 and x_2 at which the maximum and minimum occur.
Problem 9.24 Find the maximum and minimum of the following quadratic forms subject to the constraint x_1^2 + x_2^2 + x_3^2 = 1 and determine the values of x_1, x_2, and x_3 at which the maximum and minimum occur:
(1) x_1^2 + x_2^2 + 2x_3^2 + 2x_1x_2 + 4x_1x_3 + 4x_2x_3,
(2) 2x_1^2 + x_2^2 + x_3^2 + 2x_1x_3 + 2x_1x_2.
We have seen that the Rayleigh quotient characterizes the largest and the smallest eigenvalues and their associated eigenvectors of a real symmetric or a Hermitian matrix A in terms of a constrained optimization problem. But all other eigenvalues and their associated eigenvectors can be characterized in a similar way. For example, the second largest eigenvalue can be characterized as the maximum value of the quadratic form ⟨x, Ax⟩ subject to the constraints ‖x‖ = 1 and x^H v_n = 0, where v_n is an eigenvector belonging to the largest eigenvalue λ_max. For a further discussion, one can refer to some advanced linear algebra books.
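The deflation idea just described — maximizing ⟨x, Ax⟩ over unit vectors orthogonal to an eigenvector of λ_max — can be illustrated with power iteration in plain Python. The 3×3 matrix below (eigenvalues 5, 3, 1) is a made-up example for this sketch, not one from the text.

```python
import math
import random

# Hypothetical symmetric matrix with eigenvalues 5, 3, 1, so the
# constrained maximization should return 3 as the second largest.
A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, 5.0]]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def normalize(x):
    n = math.sqrt(sum(c * c for c in x))
    return [c / n for c in x]

def power_iter(A, x, ortho=None, steps=500):
    """Power iteration; if `ortho` is given, keep x orthogonal to it."""
    for _ in range(steps):
        if ortho is not None:
            d = sum(a * b for a, b in zip(x, ortho))
            x = [c - d * o for c, o in zip(x, ortho)]
        x = normalize(matvec(A, x))
    Ax = matvec(A, x)
    return sum(a * b for a, b in zip(x, Ax)), x  # Rayleigh quotient, vector

random.seed(1)
x0 = normalize([random.random() + 0.1 for _ in range(3)])
lam1, v1 = power_iter(A, x0)                # largest eigenvalue
lam2, _ = power_iter(A, x0, ortho=v1)       # largest subject to x . v1 = 0
print(round(lam1, 6), round(lam2, 6))       # 5.0 3.0
```

Repeating the projection against all eigenvectors found so far recovers the remaining eigenvalues in decreasing order, which is the scheme alluded to above.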
9.9 Exercises

9.1. Find the matrix representing each of the following quadratic forms:
(1) x_1^2 + 4x_1x_2 + 3x_2^2,
(2) x_1^2 - x_2^2 + x_3^2 + 4x_1x_3 - 5x_2x_3,
(3) x_1^2 - 2x_2^2 - 3x_3^2 + 4x_1x_2 + 6x_1x_3 - 8x_2x_3,
(4) 3x_1y_1 - 2x_1y_2 + 5x_2y_1 + 7x_2y_2 - 8x_2y_3 + 4x_3y_2 - x_3y_3,
(5) [x_1  x_2] [~ ~] [y_1; y_2].
9.2. Sketch the level surface of each of the following quadratic equations:
(1) xy = 2,
(2) 53x^2 - 72xy + 32y^2 = 80,
(3) 16x^2 - 24xy + 9y^2 - 60x - 80y + 100 = 0.
9.3. Let q be a quadratic form on R^3 and let

A = [ ~  -~  -~ ]
      -5   4   7

be the matrix representing q with respect to the basis

α = {(1, 0, 1), (1, 1, 0), (0, 0, 1)}.

(1) Diagonalize A, i.e., find an orthogonal matrix P so that P^T AP is a diagonal matrix.
(2) Construct a basis β for R^3 such that the elements of β are the principal axes of the quadratic surface q(x) = 0.
9.4. For a given quadratic equation ax^2 + 2bxy + cy^2 + dx + ey + f = 0 with b ≠ 0, classify the conic section according to the various possible cases of a, b, and c (see Example 9.6).
9.5. For a positive definite quadratic form q(x) = ax^2 + 2bxy + cy^2, the curve q(x) = 1 is an ellipse. When a = c = 2 and b = -1, sketch the ellipse.
9.6. Show that if A and B are both positive definite, so are A^2, A^{-1} and A + B.
9.7. Prove that if A and B are symmetric and positive definite, so is A^2 + B^{-1}.
9.8. Find a substitution x = Qy that diagonalizes each of the following quadratic forms, where Q is orthogonal. Also, classify the form as positive definite, positive semidefinite, and so on.
(1) q(x) = 2x^2 + 6xy + 2y^2. (2) q(x) = x^2 + y^2 + z^2 + 2(xy + xz + yz).
9.9. Determine whether or not each of the following matrices is positive definite:

(1) A = [ -~  -~  =~ ]      (2) A = [ ~  ~  ~ ]
          -1  -1   2                  1  0  1

Use the decomposition A = LDL^T to write x^T Ax as the sum of squares.
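The LDL^T hint can be made concrete. Below is a plain-Python sketch of the factorization and of the resulting sum-of-squares identity; the test matrix is a hypothetical positive definite example, since the exercise's own matrices did not survive reproduction cleanly.

```python
def ldl(A):
    """LDL^T factorization (no pivoting) of a symmetric matrix.
    Returns (L, d) with A = L * diag(d) * L^T; valid when all d[j] != 0."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * d[k]
                                     for k in range(j))) / d[j]
    return L, d

# Hypothetical positive definite matrix for illustration:
A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
L, d = ldl(A)
print(all(di > 0 for di in d))  # True -> A is positive definite

# x^T A x = sum_j d[j] * y[j]^2 with y = L^T x: a sum of squares.
x = [1.0, -2.0, 3.0]
y = [sum(L[i][j] * x[i] for i in range(3)) for j in range(3)]  # y = L^T x
quad = sum(A[i][j] * x[i] * x[j] for i in range(3) for j in range(3))
assert abs(quad - sum(d[j] * y[j] ** 2 for j in range(3))) < 1e-9
```

Positive definiteness is equivalent to all pivots d[j] being positive, which is exactly what the printed check tests.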
9.10. Let b be a bilinear form on R^2 defined by

b(x, y) = 2x_1y_1 - 3x_1y_2 + x_2y_2.

(1) Find the matrix A of b with respect to the basis α = {(1, 0), (1, 1)}.
(2) Find the matrix B of b with respect to the basis β = {(2, 1), (1, -1)}.
(3) Find the basis-change matrix Q from the basis β to the basis α and verify that B = Q^T AQ.
9.11. Find the signature, index and rank of each of the following symmetric matrices:

(1) [ 0  1  2 ]      (2) [ 2  3  0 ]      (3) [ 4  -3   2 ]
    [ 1 -1  3 ]          [ 3 -1 -2 ]          [-3   5   1 ]
    [ 2  3  4 ]          [ 0 -2  0 ]          [ 2   1  -6 ]

9.12. Which of the following functions b on R^2 are of bilinear form?
(1) b(x, y) = 1 + x_2y_2,
(2) b(x, y) = (x_1 + y_1)^2,
(3) b(x, y) = (x_1 + y_1)^2 - (x_1 - y_1)^2,
(4) b(x, y) = x_1y_2 - x_2y_1.
9.13. For a bilinear form on R^2 defined by b(x, y) = x_1y_1 + x_2y_2, find the matrix representation of b with respect to each of the following bases:

α = {(1, 0), (0, 1)},  β = {(1, -1), (1, 1)},  γ = {(1, 2), (3, 4)}.
9.14. Which one of the following bilinear forms on R^3 are symmetric or skew-symmetric? For each symmetric one, find its matrix representation of the diagonal form, and for each skew-symmetric one, find its matrix representation of the block form in Theorem 9.7.
(1) b(x, y) = x_1y_3 + x_3y_1,
(2) b(x, y) = x_1y_1 + 2x_1y_3 + 2x_3y_1 - x_2y_2,
(3) b(x, y) = x_1y_2 + 2x_1y_3 - x_2y_3 - x_2y_1 - 2x_3y_1 + x_3y_2,
(4) b(x, y) = Σ_{i,j=1}^3 (i - j) x_i y_j.
9.15. Determine whether each of the following functions takes a local minimum, local maximum or saddle point at the given point:
(1) f(x, y) = -1 + 4(e^x - x) - 5x sin y + 6y^2 at the point (x, y) = (0, 0);
(2) f(x, y) = (x^2 - 2x) cos y at (x, y) = (1, π).
9.16. Show that the quadratic form q(x) = 2x^2 + 4xy + y^2 has a saddle point at the origin, despite the fact that its coefficients are positive. Show that q can be written as the difference of two perfect squares.

9.17. Find the eigenvalues of the following matrices and the maximum value of the associated quadratic forms on the unit sphere.
(1)
[-~
o
.,
~ ],
[-~ -~ ~ ] ,
(2)
1 -1
0
(3)
1 -2
[-~ -~ ~] .
0 0 5

9.18. A bilinear form b : V × W → R on vector spaces V and W is said to be nondegenerate if it satisfies

b(v, w) = 0 for all w ∈ W implies v = 0, and
b(v, w) = 0 for all v ∈ V implies w = 0.
As an example, an inner product on a vector space V is just a symmetric, nondegenerate bilinear form on V. Let b : V × W → R be a nondegenerate bilinear form. For a fixed w ∈ W, we define φ_w : V → R by

φ_w(v) = b(v, w)  for v ∈ V.

Then, the bilinearity of b proves that φ_w ∈ V*, from which we obtain a linear transformation φ : W → V* defined by φ(w) = φ_w. Similarly, we can have a linear transformation ψ : V → W* defined by

ψ(v)(w) = b(v, w)  for v ∈ V and w ∈ W.

Prove the following statements:
(1) If b : V × W → R is a nondegenerate bilinear form, then the linear transformations φ : W → V* and ψ : V → W* are isomorphisms.
(2) If there exists a nondegenerate bilinear form b : V × W → R, then dim V = dim W.
9.19. Determine whether the following statements are true or false, in general, and justify your answers.
(1) For any quadratic form q on R^n, there exists a basis α for R^n with respect to which the matrix representation of q is diagonal.
(2) Any two matrix representations of a quadratic form have the same inertia.
(3) If A is a positive definite symmetric matrix, then every square submatrix of A has positive determinant.
(4) If A is negative definite, det A < 0.
(5) The sum of a positive definite quadratic form and a negative definite quadratic form is indefinite.
(6) If A is a real symmetric positive definite matrix, then the solution set of x^T Ax = 1 is an ellipsoid.
(7) For any nontrivial bilinear form b ≠ 0 on a vector space V, if b(v, v) = 0, then v = 0.
(8) Any symmetric matrix is congruent to a diagonal matrix.
(9) Any two congruent matrices have the same eigenvalues.
(10) Any two congruent matrices have the same determinant.
(11) The sum of two bilinear forms on V is also a bilinear form.
(12) Any matrix representation of a bilinear form is diagonalizable.
(13) If a real symmetric matrix A is both positive semidefinite and negative semidefinite, then A must be the zero matrix.
(14) Any two similar real symmetric matrices have the same signature.
Selected Answers and Hints
Chapter 1 Problems
1.2 (1) Inconsistent. (2) (x_1, x_2, x_3, x_4) = (-1 - 4t, 6 - 2t, 2 - 3t, t) for any t ∈ R.
1.3 (1) (x, y, z) = (1, -1, 1). (3) (w, x, y, z) = (2, 0, 1, 3).
1.4 (1) b_1 + b_2 - b_3 = 0. (2) For any b_j's.
1.7 a = -¥, b = ¥, c = .!j, d = -4.
1.9 Consider the matrices: A = [; :], B = [; ~], C = [~ ~].
1.10 Compare the diagonal entries of AA T and AT A.
1.12 (1) Infinitely many for a = 4, exactly one for a ≠ ±4, and none for a = -4. (2) Infinitely many for a = 2, none for a = -3, and exactly one otherwise.
1.14 (3) I = I^T = (AA^{-1})^T = (A^{-1})^T A^T means by definition (A^T)^{-1} = (A^{-1})^T.
1.16 Use Problem 1.14(3).
1.17 Any permutation on n objects can be obtained by taking a finite number of interchangings of two objects.
1.21 Consider the case that some d_j is zero.
1.22 x = 2, y = 3, z = 1.
1.23 True if Ax = b is consistent, but not true in general.
=[ -~ o
~ ~],
[b -~ -~ ].
U= -I I 0 0 1.25 (1) Consider (i, j)-entries of AB for i < j .
1.24 L
I
(2) A can be written as a product of lower triangular elementary matrices .
1.26 L=
1 01 0] 0 [ -1/2 o -2/3 1
,D=
[20 3~2 ~ 0
o
] ,
4/3
u= [~
0
-~/2 -~/3] . 0
1
1.27 There are four possibilities for P.
1.29 (1) I_1 = 0.5, I_2 = 6, I_3 = 0.55. (2) I_1 = 0, I_2 = I_3 = 1, I_4 = I_5 = 5.
1.30 x = k (0.35, 0.40, 0.25)^T for k > 0.
1.31 A = [0.0 0.1 0.8; 0.4 0.7 0.1; 0.5 0.0 0.1] with d = (90, 10, 30)^T.
Exercises 1.1 Row-echelon forms are A, B, D, F . Reduced row-echelon forms are A, B, F.
1 -3 2
1.2 (1)
[
1
2]
~
~ ~ -1/~ 3/~
o
0 0
0
.
0
1 -3 0 3/2 1/2] 0 0 1 -1/4 3/4 o 0 0 0 0 . [ o 0 0 0 0 1.4 (1) XI = 0, X2 = 1, X3 = -1, X4 = 2. (2) X
1.3 (1)
= 17/2, y = 3, Z = -4.
1.5 (1) and (2). 1.6 For any b_i's. 1.7 b_1 - 2b_2 + 5b_3 ≠ 0. 1.8 (1) Take x the transpose of each row vector of A. 1.10 Try it with several kinds of diagonal matrices for B.
1.11 Ak =
[
o
0
-2227
1
101] -60 . [ o 0 87 1.14 See Problem 1.9. 1.16 (1) A-lAB = B. (2) A-lAC 1.17 a = 0, c- I = b i= O. 1.13 (2)
5 0
1 2k 3k(k - 1) ] 1 3k .
0
8 A-I _ 11 . -
1.19 A-I
=C = A +
I.
1 -1 0 0] [13/8 0 1/2 -1/2 0 B-1 = -15/8 0 0 1/3 -1/3 ' [ o 0 0 1/4 5/4
= -Is
8 -23 -19 2] 4 .
[4
1
-2 1
Selected Answers and Hints
1.22 (I)x=A-Ib=
[-~~; -~~~ ~~~] [ ; ] = [-~~;]. -1/3 -2/3
1/3
7
- 5/ 3
1.23 (1) A =
[~ ~] [~ ~] [~
1.24 (1) A =
[~3 ~1 ~] [~~ ~ ] [~0 ~0 ~] , 1 0 0 -1 1
(2)
[b~a ~][ ~
Ii
d _ ~2/a ][
2]
= LDU ,
~ b/~
(2) L = A, D = U = I.
l
1.25 c=[2 -I3f,x=[423]T.
1.26 (2) A
=
[~1~1~]1[~0 0 ~ ~]2 [~ ~ 4/~] . 001
1.27 (1) (Ak)- I = (A-Ii . (2) An-I = 0 if A E Mnxn. (3) (l- A)(l + A + ... + Ak- I) = I - Ak. 1.28 (1) A =
[~ ~
1
(2) A =
r
l
AZ =
r
l
A= I.
1.29 Exactly seven of them are true. (1) See Theorem 1.9. :::
(2) Consider A =
[~ ~
r)
l
;:n::e~:=[1-1~ T]1f ~:B-: [? tB~=]~ABl = BT AT =BA . 5 3 1
1 3 2 (8) If A-I exists, A-I = A-I (AB T) = B T. (9) If AB has the (right) inverse C, then A-I = BC.
(7) (A-Il = (AT)-I = A-I.
(10) Con sider EI =
[~ ~] and EZ =
(12) Consider a permutation matrix
U~ l
[~ ~] .
Chapter 2 Problems 2.2 (1) Note: 2nd column - 1st column = yd column - 2nd column. 2.4 (1) -27, (2) 0, (3) (1 - x 4)3 . 2.7 Let a be a transposition in Sn. Then the composition of a with an even (odd) permutation in Sn is an odd (even, respectively) permutation. 2.9 (1) -14. (2) O. 2.10 (I)(y-x)(-x+z)(z-y)(w-x)(w-y)(w- z)(w+y+x+z) . (2) (fa - be + cd) (fa + cd - eb) . 2.11 A-I =
[-~
I =1];
2 -!
adjA=
!
[_~ -58 -52] -1
6
I
.
2.12 If A = 0, then clearly adj A = 0. Otherwise, use A · adj A = (det A) I.
=
=
matrix A we have adj A (det A)A- 1, (1) and (2) are obvious. To show (3), let AB BA and A be invertible. Then A-I B = A- 1(BA)A- 1 = A- 1(AB)A-1 = BA- 1, which gives (3).
2.15 (1) xl = 4, X2 = 1, x3 = -2. 10 5 5 (2) x = 23' y = 6' z = 2' 2.16 The solution of the system id(x) = x is Xi = «Ji~Si
= det A.
Exercises 2.1 k = 0 or 2. 2.2 It is not necessary to compute A 2 or A 3 . 2.3 -37. 2.4 (1) det A = (_1)n-1 (n - 1). (2) O.
2.5 -2,0,1,4.
L
2.6 Consider
a1u(l) . .. anu(n)'
ueS.
2.7 (1) 1, (2) 24.
2.8 (3) Xl = 1, X2 = -1, x3 =2, X4 = - 2. 2.9 (2) x = (3,0, 4/11)T.
2.10 k=Oor±1.
2.11 x = (-5, 1,2, 3)T . 2.12 x = 3, y = -1, z = 2. 2.13 (3) All = - 2, A12 = 7, An = -8, A33 = 3. 2.16 A-I
= -h.
[~~ -~ 1~ ] 6
2.17 (1) adj A = [
A-I
14 -18
.
i =~ =~] ,
-4
7
5
det A = -7, det(adj A) = 49,
= -+adj A. (2) adj A = [-1~
~ -~],
7 -3 -1
det A = 2, det(adj A) = 4, A-I = !-adj A .
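The adjugate identity A · adj A = (det A) I that underlies 2.12–2.17 can be verified mechanically. Here is a plain-Python sketch for 3×3 matrices; the sample matrix is hypothetical, not one of the exercise matrices.

```python
def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    a, b, c = A[0]
    d, e, f = A[1]
    g, h, i = A[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def adj3(A):
    """Adjugate: transpose of the cofactor matrix."""
    def minor(r, s):
        rows = [A[i] for i in range(3) if i != r]
        m = [[rows[i][j] for j in range(3) if j != s] for i in range(2)]
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]
    C = [[(-1) ** (r + s) * minor(r, s) for s in range(3)] for r in range(3)]
    return [[C[s][r] for s in range(3)] for r in range(3)]  # transpose of C

# Hypothetical example matrix for illustration:
A = [[1, 2, 0], [0, 1, 1], [1, 0, 2]]
dA, B = det3(A), adj3(A)
# A * adj(A) = (det A) * I, so A^{-1} = adj(A) / det A when det A != 0.
P = [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
     for i in range(3)]
print(dA, P)  # 4 [[4, 0, 0], [0, 4, 0], [0, 0, 4]]
```

This is exactly the computation used in 2.16–2.17: once det A and adj A are known, the inverse is adj A scaled by 1/det A.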
2.19 Multiply
[~ ~].
2.20 If we set A
= [; ~], then the area is 11 det AI = 4.
2.21 If we set A
=
[i ~ 1
then the area is
!.fl "'I(AT A)I ~ 1#.
2.22 Use det A =
L
sgn(a)al 17(I)'"
a n17(n)'
17eS n
2.23 Exactly seven of them are true.
[~
(I) Consider A = [ ; ; ] and B =
;
(2) det(AB) = det A det B = det B det A .
[~ ~ ] .
l
(4) (cIn - A)T = cln - AT .
(3) Consider A = [ ; ; ] and c = 3. (5) Consider E =
(6) and (7) Compare their determinants.
(9) Find its counter example. (8) If A = h ? (10) What happened for A = O? (11) UyT = U[VI . .. vn] = [VIU' " vnu] ; det(uyT) = VI . . . Vn det([u ··· uJ) = O.
: : ::::, U[~(:' ~ 'J': o
A
: :l IT ~ dot A
0'
1 1 (16) At = A -I for any permutation matrix A.
Chapter 3 Problems 3.1 Check the commutativity in addition. 3.2 (2), (4).
3.3 (1), (2), (4). 3.5 See Problem 1.11. 3.6 Note that any vector yin W is of the form aIxI inU .
+
a2x2
+ ... +
amXm
which is a vector
3.7 Use Lemma 3.7(2) . 3.9 Use Theorem 3.6 3.10 Any basis for W must be a basis for V already, by Corollary 3.12. 3.11 (I) dim= n - 1, (2) dim= n(nt) , (3) dim= n(yll.
3.13 63a + 39b - 13c + 5d = O. 3.15 If bj , . .. , b n denote the column vectors of B, then AB = [Abl ... Ab n]. 3.16 Consider the matrix A from Example 3.20. 3.17 (I) rank = 3, nullity = 1. (2) rank = 2, nullity = 2.
3.18 Ax = b has a solution if and only if b E C(A) . 3.19 A-I(AB) = B implies rank B = rank A-1(AB)::: rank(AB) . 3.20 By (2) of Theorem 3.21 and Corollary 3.18, a matrix A of rank r must have an invertible submatrix C of rank r. By (1) of the same theorem, the rank of C must be the largest. 3.22 dim(V
+
W) = 4 and dim(V
n W)
= 1.
3.23 A basis for V is «1 ,0,0,0), (0, -1 , 1,0), (0, -1,0, I)}, for W : «-1 , 1,0,0), (0,0,2, I)}, and for V n W : «3, -3,2, I)}. Thus, dim(V
+
W) = 4 means V
3.26
+
W =]R4 and any basis for]R4 works for V
+
W.
1 0
A=
[
°1 °1 ° 1
Exercises 3.1 Consider 0(1, 1). 3.2 (5). 3.3 (2), (3). For (1), if f(O) = 1, 2f(O) = 2. 3.4 (1). 3.5 (1), (4). 3.6 tr(AB - BA)
3.7 No.
3.8 (1) p(x)
= 0.
= -PI (x) + 3P2(X) -
2P3(X).
3.9 No. 3.10 Linearly dependent.
3.12 No. 3.13 ((1,1,0) , (1,0, I)}. 3.14 2.
3
{}
OO }
.15 Consider (ej = ai i=l where ai =
{I°
if i = j , otherwise.
+ ... +cpAbs = A(qb l + . .. + cpb S ) implies qb l + .. . + cpb P = 0 since N (A) = 0, and this also implies Ci = for all i = 1, . . . , P since columns of Bare linearly independent. (2) B has a right inverse . (3) and (4) : Look at (1) and (2) above.
3.16 (1) 0 = qAb l
°
3.17 (1) {(-5, 3, I)}. (2) 3.
3.18 5f, and dependent. 3.19 (1) 'R,(A) = (1,2,0,3) , (0,0,1 ,2»), C(A) = (5,0,1), (0,5 ,2»), N(A) = (-2, 1,0,0), (-3,0, -2, 1») . (2) 'R,(B) (1 ,1 , -2,2), (0,2,1, -5), (0,0,0, I»), C(B) ((1, -2,0), (0,1,1), (0,0,1»), N(B) = (5, -1,2,0»).
=
=
3.20 rank = 2 when x = -3, rank = 3 when x
=f. - 3.
3.22 See Exercise 2.23: Each column vector of UyT is of the form ViU , that is, U spans the column space . Conversely, if A is of rank I, then the column space is spanned by anyone column of A, say the first column u of A, and the remaining columns are of the form ViU, i = 2, . .. , n . Take v = [1 V2 ... vnf . Then one can easily see that A = UyT. 3.23 Four of them are true. (1) A - A =1 or 2A = 1 (2) In]R2 , let a = [ej , e2} and
f3
= [ej , -e2}.
(3) Even U = W, (X n fJ can be an empty set. (4) How can you find a basis for C(A). See Example 3.20. . (5) Consider A = [0 1 0 0 ] and B = [ 1 0 0 0] . (6) See Theorem 3.24. (7) See Theorem 3.25. (8) Ifx (9) In]R2, Consider U ]R2 x 0 and V 0 x ]R2. (10) Note dim C(A T ) dim'R(A) dim C(A). (II) By the fundamental Theorem and the Rank Theorem.
= =
=
=
= -s.
Chapter 4 Problems 4.1
[~ ~
l
since it is simply the change of coordinates x and y .
4.2 To show W is a subspace, see Theorem 4.2. Let Eij be the matrix with I at the (i, j)-th position and 0 at others . Let Fk be the matrix with 1 at the (k, k) -th position, -I at the (n, n)-thposition and 0 at others. Then theset{Eij, Fk : I :::: i f= j :::: n, k = I, ... , n-I} is a basis for W . Thus dim W n2 - 1. 4.3
4.4 4.5
= tr(AB) = I:i=1 I:k=1 aikbki = I:k=1 I:i=1 bkiaik = tr(BA) . If yes, (2, I) = T(-6 , -2, 0) = -2T(3 , I, 0) = (-2, -2). If aivi + a2v2 + . ..+ akvk = 0, then o = T(al VI + a2v2 + .., + akvk) = al WI + a2w2 + . .. + akwk implies ai = 0 for i=I , ... ,k.
4.6 (I) If T(x)
= T(y) , then S
0
T(x)
= So T(y) implies x = y. (4) They are invertible.
4.7 (1) T(x) = T(y) if and only if T(x - y) = 0, i.e., x - y E Ker(T) . (2) Let{VI, . . . , vn} be a basis for V. 1fT is one-to-one, then the set{T(vI), . .. , T(vn )} is linearly independent as the proof of Theorem 4.7 shows. Corollary 3.12 shows it is a basis for V . Thus, for any y E V, we can write it as y I:I=I aiT(vi) T(I:I=I aivi) . Set x = I:I=I aivi E V . Then clearly T(x) = y so that T is onto. If T is onto, then for each i I , .. . , n there exists xi E V such that T(Xi) Vi.Then the set Ixj , ... ,xn} is linearly 0, then 0 T(I:I=I aixi) I:I=I aiT(xi) = independent in V,since, ifI:l=l aixi I:I=I aivi implies ai = 0 for all i = I, . . . ,n. Thus it is a basis by Corollary 3.12 again. 0 for x I:I=I aixi E V , then 0 T(x) I:I=I aiT(xi) I:I=I aivi If T(x) implies ai 0 for all i I , . . . , n, that is x O.Thus Ker (T) {O}.
=
=
=
=
=
=
=
=
4.8 Use rotation R!f andrefiection
=
=
=
4
4.11
=i ~]'[Tlll=[~ -~ 7 0
[Tl~ = [~o ~2 -~3 4~] .
=
=
[~ _~] about the x-axis.
4.9 (1) (5, 2, 3). (2) (2, 3, 0). 4.10 (I)[Tla=[;
=
4 -3
245 ].
=
=
413
4.14
[S+Tla~
[Sl~ = [~o - 0~ ~] , [Tl a = [~ ~ ~] . 1 0 0 4
4.15 (2) 4.16
[T]~ = [~ ~] [T -1l p = [ - ~ ~
[idlp=~[~ 2
4.17
U~ n[TaS]a~ U~ n
2
-; 1
l
-~]'[idl~=[-; -~ -1~] . 1
1
1
-2
[T]a=[~1 -~0 4~]'[T1P=[-~1 -~1 -~] . 5
4.18 Write B = Q-I AQ with some invertible matrix Q. (1) det B = det(Q-I AQ) = det Q-I det Adet Q = det A. (2) tr(B) = tr(Q-I AQ) tr(QQ-I A) = tr(A) (see Problem 4.3). (3) Use Problem 3.19.
4.20 a* = {fI (x,
y, z)
= x-
!y, !2(x, y, z)
= !y, f3(X, y, z) =
=
-x + z},
4.24 By BA, we get a tilting along the x-axis; while by BT A, one can get a tilting along the y-axis .
Exercises 4.1 (2). 4.2 ax 3 + bx 2 + ax + c. 4.4 S is linear because the integration satisfies the linearity.
4.5 (1) Consider the decomposition ofv = v+I(V)
4.6 (1) {(x, ~x, 2x)
E
lR3
:
+
v-I(v).
x E R}.
4.7 (2) T-1(r, s, t) = (! r, 2r - s, 7r - 3s - r) , 4.8 (1) Since T 0 S is one-to-one from V into V, T 0 S is also onto and so T is onto . Moreover, if S(u) = S(v), then T 0 S(u) = T 0 S(v) implies u = v. Thus, S is one-to-one, and so onto . This implies T is one-to-one. In fact, if T(u) = T(v), then there exist x and y such that S(x) = u and S(y) = v. Thus T 0 S(x) = T 0 S(y) implies x = y and so u = T(x) = T(y) =v.
4.9 Note that T cannot be one-to-one and S cannot be onto. 4.12 vol(T(C» = Idet(A)lvo1(C), for the matrix representation A of T.
4.13 (3) (5, 2, 0). 5 4 -4 -3 4.14 0 0 [ o 0
=~o -:~].
-1/3 ] 4.15 (1) [ -5/3 2/3 1/3 .
1
4.16 (1)
[~
_
i l [i (2)
4.17 (1) T(l , 0, 0) (2) T(x, y, z) 4.18
1
-~
= (4, 0), T(1, I, 0) = (1, 3), T(1 , = (4x - 2y + z, Y+ 2z) .
(1)[~001 ~ ;] '(4)[~0 0~ 0~] .
~1' h (I)
U~: ~n(2)
Q
~
[:
I , I)
i iJ ~
= (4, 3).
tr:' ,
4.20 Compute the trace of AB - BA . 4.21 (1)
[-7 -13] 4
-33 19
8'
-2/3 1/3 4/3] 2/3 -1/3 -1/3 . [ 7/3 -2/3 -8/3 4.23 ' All represents reflection in a line at angle ()/2 to the x -axis . And , any two such reflections 4.22 (2) [~ ;], (4)
are similar (by a rotation).
4.25 Compute their determinants. 4.27 [T]a
= [-~
~ ~] =
([T*Ja*)T .
1 0 1
4.28 (1)
4J9
[_~ ~ ~ l(2)[T]~ =
:~):Il:::
4.31 PI (x)
I , 0, I),
=1 + x -
0,
[-i ; -~ 1
I, I , I) , 14,2,2,3»,
~x2, P2(x)
= -i +
[T)~~ ~1 ~ ~n
~x2, P3(X)
[
= -j. + x -
~x2 .
4.32 Five of them are false.
(1) Consider T : jR2 -+ jR3 defined by T(x , y) = (x, 0, 0) . (2) Note dim Ker(T) + dim 1m(T) = dim jRn = n ,
(3) dim Ker(T) = dim N([T]~). (4) dim Im(T) dimC([T]a) dimjR([TJa). (5) Ker(T) CKer(S 0 T) . (6) P (x) = 2 is not linear. (7) Use the definition of a linear transformation. (8) and (10) See Remark (1) in Section 4.3. (9) T : jRn -+ jRn is one-to-one iff T is an isomorphism. See Remark in Section 4.4 and Theorem 4.14. (11) By Definition 4.5 and Theorem 4.14. (12) Cf. Theorem 4.17. (13) det(A + B) i= det(A) + det(B) in general. (14) T(O) i= (0) in general for a translation T.
=
=
Chapter 5 Problems
5.1 Let (x, y) = XT Ay be an inner product. Then, for x = (1,0), (x, x) = aXIYI + C(XI Y2 + X2YI) + bX2Y2 > 0 implies a > O. Similarly, for any x (x, 1), (x, x) > 0 implies ab - c2 > O.
=
5.2 (x, y)2 = (x , x) (y, y) if and only if [rx + ylj2 = (x, x)t 2 + 2(x, y)t + (y, y) = 0 has a repeated real root to. 5.3 If dl < 0, then for x
= (1,
0, 0), (x, x)
= dl
< 0: Impossible.
5.4 (4) Compute the square of both sides and then use Cauchy-Schwarz inequality. 5.5 (4) Use Problem 5.4(4): triangle inequality in the length.
fJ
5.6 (f, g) = f(x)g(x)dx defines an inner product on C [0, 1]. Use Cauchy-Schwarz inequality.
fJ
5.7 (2)-(3): Use f(x)g(x) dx = 0 if k f. i; and = ~ if k = 5.8 (1): Orthogonal, (2) and (3): None, (4): Orthonormal.
=
e.
=
5.10 Clearly, Xl (1,0,1), x2 (0,1,2) are in W. The Gram-Schmidt orthogonalization gives UI = = (1,0,1), u2 = JJ(-I, 1, 1) which form a basis.
W
5.11 {I, J3(2x - 1), V5(6x 2
-
6x + I)}.
5.12 (4) Im(idv - T) ~ Ker(T) because T(idV - T)(x) = T(x) - T 2(x) = O. Im(idVT) 2 Ker(T) because if T(x) = 0, x = x - 0 = X - T(x) = (idv - T)x.
= 0 for only x = O. (2) Use Definition 5.6(2).
5.13 (1) (x , x)
5.16 1) is just the definition, and use (1) to prove (2).
-~+x.
5.17
5.18 (1) b E C(A) and y E N(A T) . 5.19 The null space of the matrix
[b -i
_~
i]
is
x=t[I - I l O]T +s[-4IOIffort,sElR.
5.20 Note: R(A)l.
= N(A).
5.22 There are 4 rotations and 4 reflections. 5.23 (1) r =
~,
s=
~,
a = JJ ' b = - JJ '
C
= - JJ.
5.24 Extend {VI , ... , Vm} to an orthonormal basis {Vb " " vm, ... , vn}. Then IIxll 2 El=ll(x, vi)12 + E}=m+ll(x, Vj)l2. 5.25 (1) orthogonal. (2) not orthogonal. 5.26 x
= (1,
~~ ]
-1 , 0) + t(2, 1, -1) for any number t .
=x=
(AT A)-IATb=
5.30 For x E IRm, x
= (VI, X)VI + . . . +
5.28 [
=
'!g
[~~3:] . 16.1
(vm, x)vm = (VI vf)x + ... + (Vmv~)x.
5.31 The line is a subspace with an orthonormal basis ~ (1, 1), or is the column space of
A=~[~l
5.32 P
=~ [
7 ; -~ ] .
3
1 - I
2
5.33 Note that (e) , e2, ea} is an orthonormal basis for the subspace. 5.35 Hint: First, show that P is symmetric. Exercises 5.1 Inner products are (2) , (4), (5).
5.2 For the last condition of the definition, note that (A , A) if and only if aij = 0 for all t, j .
at =
0
= 3.
5.4 (I)k 5.5 (3)
= tr(A T A) = Lj,i
= IIgll =../f7'1., The angle is 0 ifn = m, ~ ifn i: m .
11/11
5.6 Use the Cauchy-Schwarz inequality and Problem 5.2 with x y = (1, .. . , 1) in (R", .).
JT97J. = h(% + ~ +
5.7 (1) 37/4,
(2) If (h , g)
and
= 0 with h i: 0 a constant and g(x) = ax 2 + bx + c, %+ ~ + c = 0 in jR3.
c)
then (a , b, c) is on the plane
5.9 Hint: For A
= (a), . . . , an)
= [V) V2], two columns are linearly independent, and its column space
is W.
3
1
5.11 (I ) ZV2, (2) ZV2. 5.13 Orthogonal: (4). Nonorthogonal: (I ), (2), (3). 5.17 Use induction on n. Let B be the matrix A with the first column c) replaced by c = c ) - Projw (C) , and write Projw(c) = a2c2 + ... +ancn for some c. ts. Show that Jdet (AT A )
= J det( B T B ) = Ilcllvol(c2, . . . , cn) = vol(P(A».
=
Then the volume of the tetrahedron is
5.18 Let A
[~ ! ~] .
tJ det(A T A ) = 1.
012
= det A imply det A = ±1. Th e matrix . A = [cos sin e . h i WIt ' h det A = - I . sin e e _ cos e ] ISort ogona
5.19 A T A
5.21 Ax
=I
and det AT
= b has a solution for every b E jRm if r = m. It has infinitely many solutions if
nullity
=n - r =n -
5.22 Find a least squares
.
my
m > O.
<0""00 0' [ 1 3
= a + bx. Then y = x + Z.
i
~ 1-~ !
-I ]
5.23 Follow",,,,,'se 5.22 with A
[
27
.Then y = 2x 3 - 4x 2 + 3x - 5.
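Exercises 5.22–5.23 both come down to the normal equations A^T Ax = A^T b. The plain-Python sketch below fits a least squares line to made-up data points (they are illustrative, not the exercise's own).

```python
# Hypothetical data points (the exercise's own data were lost in extraction):
pts = [(0.0, 1.0), (1.0, 3.0), (2.0, 4.0), (3.0, 4.0)]

# Least squares line y = a + b*x: solve the 2x2 normal equations
#   A^T A [a, b]^T = A^T y,  with rows of A equal to [1, x_i].
n = len(pts)
sx = sum(x for x, _ in pts)
sxx = sum(x * x for x, _ in pts)
sy = sum(y for _, y in pts)
sxy = sum(x * y for x, y in pts)
det = n * sxx - sx * sx          # det(A^T A); nonzero when x_i are not all equal
a = (sy * sxx - sx * sxy) / det
b = (n * sxy - sx * sy) / det
print(round(a, 3), round(b, 3))  # 1.5 1.0
```

The cubic fit of 5.23 works the same way, only with rows [1, x_i, x_i^2, x_i^3] in A and a 4×4 system.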
5.27 (1) Let h ex ) = !-U (x ) + f (- x » and g(x ) = !-U (x ) - fe-x» ~. Then f = h + g . (2) For fEU and g E V , (f, g) = f~1 f (x )g (x)dx =f(-t )g(-t)dt = - f~1 f(t)g (t )dt = -(f,g ), by change of variable x = -rl,
t:'
(3) Expand the length in the inner product.
5.28 Five of them are true. (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)
Possible via a natural isomorphism. See Theorem 5.8. Consider (I , 0) and (-1 , 0). Consider two subspaces U and W of R 3 spanned by el and e2, respectively. IIx - YII + IIYII ::: [x] by the triangle inequality. The columns of any permutation matrix P are {el, ... , en} in some order. See Theorem 5.5. A is a projection iff A 2 A. (See Theorem 5.9.) By Corollaries 5.10 and 5.12. R(A) and C(A) are not subspaces of the same vector space . A dilation is an isomorphism. ATb E C(A T A) always. The solution set is Xo + N(A) by Corollary 5.17.
=
Chapter 6 Problems 6.3 Consider the matrices
[~ ~] and [~ ~
l
l
6.4 Check with A
= [~ ~
6.5 (1) Use det A
= Al .. . An. (2) Ax = AX if and only if x = AA-Ix.
6.6 If A is invertible, then AB and B
= A (B A)A -I. For two singular matrices A = [~ ~]
= [~ ~] , A Band BA are not similar, but they have the same eigenvalues .
=
[XI X2 X3] diagonalizes A, then the diagonal matrix must be AI and 6.7 (1) If Q A Q AQI . Expand this equation and compare the corresponding columns of the equation to find a contradiction on the invertibility of Q.
=
6.8 Q
= [~
~
l
D
= [~
~
J
Then A
= QDQ-I = [=~
:l
6.9 (1) The eigenvalues of A are I , 1, -3, and their associated eigenvectors are (1, 1,0), (-1,0, I) and (1, 3, 1), respectively. (2) If f(x) x lO + x 7 + 5x, then f(1), f(1) and f(-3) are the eigenvalues of A IO + A7 +5A .
=
a~~1
6.11 Note that [
an-I
] [i ~ -~] [a:~1 =
0 1
0
a n -2
] .TheeigenvaluesareI ,2,-Iand
eigenvectors are (1, I, 1), (4, 2,1) and (1, -1 , 1), respectively. It turns out that an 2 2n 2 - (_I)n_ - -.
3
3
6.13 Its characteristic polynomial is fundamental set.
f
=
(A) = (A + 1) (A - 2)2 ; so (- Il , 2n , n2n form a
6.14 The eigenvalues are 0.6, 0.8, and 1. 6.15 The eigenvalues are 0, 0.4, and 1, and their eigenvectors are (1,4, -5) , (1,0, -1) and (3, 2, 5), respectively.
+
6.17 For (1), use (A
B)k
= 2:~=O (~)A i B k- j
if AB
= BA . For (2) and (3), use the
definition of eA . Use 0) for (4).
6.19 Note that e(A
T
= (eA)T by definition (thus , if A is symmetric, so is eA), and use
)
(4).
6.20 A = 2l
6.21 Yl 6.22
+ N with N =
[~o ~ ~]. 0 0
~
2
3 ~ 3 Then N = O. e = e [ 1 3 0 o 1 3
A
2
I
]
.
= CJ e2x - !C2e-3x; Y2 = Cl e2x + C2e-3x . 2x 3x Yl = - C2 e2x + C3 e3x Yl = e - 2e 2x Y2 = CJ eX + 2c2e2x - C3e3x, Y2 = eX - 2e + 2e3x Y3 = 2c2e2x - C3e3x Y3 = - 2e2x + 2e3x .
I
623 (1) [
:=: ],
(2) [
i~;'3l
6.24 With respect to the standard basis a , [T]a
= [~
~
;] with eigenvalues 3, 3, 5
104 and eigenvectors (0, 1,0), (-1,0,1) and 0 ,2,1), respectively. 6.25 With the standard basis for M2x2(lR):
1 1 0 1]
[T]a = A =
[
b~
~ ~
. The eigenvalues are 3, I, I,
-I, and their asso-
1 0 1 1
ciated eigenvectors are 0,1 ,1, I), (-1,0,1,0), (0, -1 , 0, I) , and (-1, I, -1, I) , respectively.
6.26 With tho basis e
~ [I, x, x'), IT]. ~ A ~ [ ~
o2 o
0] 0 3
.
Exercises 6.1 (4) 0 of multiplicity 3, 4 of multiplicity 1. Eigenvectors are ej - e i+l for 1 ::: i ::: 3 and
2:1=1 ej. = (A + 2) (A 2 -
6.2 f(A)
xl
= (-35,
8A + 15), Al = -2, A2 = 3, A3 = 5, 12, 19), x2 = (0, 3, 1), x3 = (0, 1, 1).
6.4 {v} is a basis for N(A) , and {u, w} is a basis for C(A).
6.5 Assume that it is true for invertible matrices. In each of the equations (1)-(3) both sides continuously depend on the elements of A and B. Any matrix A can be approximated by matrices of the form A_ε = A + εI which are invertible for sufficiently small nonzero ε. (Actually, if λ_1, ..., λ_n is the whole set of eigenvalues of A, then A_ε is invertible for all ε ≠ -λ_i.) Besides, if AB = BA, then A_εB = BA_ε.
6.6 Note that the order in the product doesn't matter, and any eigenvector of A is killed by B. Since the eigenvalues are all different, the eigenvectors belonging to 1, 2, 3 form a basis. Thus B = 0, that is, B has only the zero eigenvalue, so all vectors are eigenvectors of B.
6.8 A = QDQ-I =
~ [~ 2
1
-;
2
=~]. 7
6.9 Note that jRn = W E9 w- and pew) = w for w E Wand P(v) = 0 for v Thus, the eigenspace belonging to A= 1 is W , and that to A = 0 is w-.
E
w-.
6.10 Foranyw E jRn,Aw = u(v T w) = (v-w)u. Thus Au = (v-uju.so u is an eigenvector belonging to the eigenvalue A = v . u . The other eigenvectors are those in vi. with eigenvalue zero. Thus, A has either two eigenspaces E(A) that are l-dimensional spanned by u and E(O) = vi. if v . u :/= 0, or just one eigenspace E(O) = jRn if v·u=O. 6.11 AV = Av = A 2v = A2V implies A(A - 1) = O. 6.13 Use tr(A)
= Al + ... +
An = all
+ ... + ann'
6.14 A = QDI Q-I and B = QD2Q-I imply AB = BA since DID2 = D2DI .
=
Conversely, Suppose AB BA and all eigenvalues AI, , An of A are distinct. , n. But if Ax = Aix, Then the eigenspaces E (Ai) are all I-dimensional for i = 1, then ABx = BAx = ABx implies Bx E E(Ai)' Thus Bx = /l-x means x is also an eigenvector of B . By the same reason, any eigenvector of B is also an eigenvector of A. Choose a set of linearly independent eigenvectors of A, which form an invertible matrix Q such that Q-I AQ = DI and Q-I BQ = D2. 6.16 With respect to the basis ex
= {I, x, x 2 }, [Tl a =
[6 ~
~] . The eigenvalues are
1 1 0
2,1, - 1 and the eigenvectors are (1, 1, 1), (-1 , 1,0) and (1, 1, -2) , respectively. 6.17 None is diagonalizable. 6.18 (1) D =
[~o ~70 -7~]
(2) D =
[~I0 0~ ~] 5
(3) D =
[~0 0~ 2~]
6.19 Eigenvalues are I, 1,2 and eigenvectors are (1,0,0), (0,1,2) and (1, 2,3). A lOx = (1025, 2050, 3076). 6.20 Fibonacci sequence: an+1
= an + an-I with al
= 2 and a2
= 3.
6.22 Thecharacteristicequation isA2-xA-0 .I8 = O. Since A = I is a solution, X = 0.82. The eigenvalues are now I, -0.18 and the eigenvectors are (-0.3, -1) and (1, -0.6). 6.23 (l) e A
= [~
e-
~
l
SelectedAnswers and Hints 6.24 The initial status in 1985 is Xo resent the perc[en;:ge] of
=
375
= (xo, YO, zo) = (0.4,0.2,0.4), where x, Y, z rep-
lar[geo.~e~~m, ~d]sm[~.;a]r owners.
In 1995, the sta-
0.3 0.7 0.1 0.2 = Axo. Thus , in 2025, 0 0.2 0.9 0.4 the status is X4 = A4xo. The eigenvalues are 0.5, 0.8, and 1, whose eigenvectors are (-0.41,0.82, -0.41), (0.47,0.47, -0.94), and (-0.17, -0.52, -1.04), respectively.
tus is Xl
=
Yl Zl
6.27 (1)
6.28 Yl
I
=
YI(X) Y2(X)
=
_2e 2(l - x ) +4e2(x - l ) _e 2(I - x) + 2e2(x - l )
( ) Y3X
=
2e
= 0,
Y2
=
2(J-x)
= 2e2t,
Y3
2 2(x-l)
-e
(2) { Yl (x) Y2(X)
.
2x. (co~x.- smx) 2e smx .
=e =
= e2t.
6.29 (1) I (A) A3 - lOA 2 + 28A - 24, eigenvalues are 6, 2, 2, and eigenvectors are (1,2,1), (-1, 1,0) and (-1, 0,1). (A - 1)(A2 - 6A + 9), eigenvalues are 1, 3,3, and eigenvectors are (2) I(A) (2, -1 ,1), (1, 1,0) and (1, 0,1).
=
1
6.30 Two of them are true:
(1) For A = I, if B = Q⁻¹AQ, then B must be the identity.
(2) See Example 6.3.
(3)–(6) and (8) Find suitable 2 × 2 counterexamples (for (3), two matrices that have a different eigenvalue).
(7) For any eigenvalue λ of A, λ + 1 is an eigenvalue of A + I.
(9) tr(A + B) = tr(A) + tr(B). See Theorem 6.3.
(10) If both belong to the same eigenvalue.
(11) In Example 6.6, Q⁻¹AQ is diagonal and its two linearly independent eigenvectors are e₁ and e₂.
(12) Use Theorem 6.25 with A = Iₙ.
Chapter 7 Problems

7.1 (1) u · v = Σᵢ uᵢv̄ᵢ, and taking conjugates termwise gives u · v = conj(v · u).
(3) (ku) · v = Σᵢ kuᵢv̄ᵢ = k Σᵢ uᵢv̄ᵢ = k(u · v).
(4) u · u = Σᵢ |uᵢ|² ≥ 0, and u · u = 0 if and only if uᵢ = 0 for all i.
7.2 (1) If x = 0, the inequality is clear. Suppose x ≠ 0 ≠ y. For any scalar k,
0 ≤ ⟨x − ky, x − ky⟩ = ⟨x, x⟩ − k̄⟨x, y⟩ − k⟨y, x⟩ + kk̄⟨y, y⟩.
Let k = ⟨x, y⟩/⟨y, y⟩ to obtain ⟨x, x⟩⟨y, y⟩ − |⟨x, y⟩|² ≥ 0. Note that equality holds if and only if x = ky for some scalar k.
(2) Expand ‖x + y‖² = ⟨x + y, x + y⟩ and use (1).
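The Cauchy–Schwarz inequality of 7.2 can be sanity-checked numerically for complex vectors; the sample vectors below are arbitrary choices of ours:

```python
import numpy as np

x = np.array([1 + 2j, -1j, 3.0])
y = np.array([2.0, 1 - 1j, -1 + 1j])

# <x, y> = sum_i x_i * conj(y_i); np.vdot(a, b) computes sum conj(a_i) b_i
inner = np.vdot(y, x)
lhs = abs(inner) ** 2
rhs = np.vdot(x, x).real * np.vdot(y, y).real   # <x,x><y,y>, both real
```

Equality holds exactly when x and y are parallel, which is the equality case noted in the proof.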
7.3 Suppose that x and y are linearly independent, and consider a linear dependence a(x + y) + b(x − y) = 0 of x + y and x − y. Then 0 = (a + b)x + (a − b)y. Since x and y are linearly independent, we have a + b = 0 and a − b = 0, which are possible only for a = 0 = b. Thus x + y and x − y are linearly independent. Conversely, if x + y and x − y are linearly independent, then a linear dependence ax + by = 0 of x and y gives ½(a + b)(x + y) + ½(a − b)(x − y) = 0. Thus a = 0 = b, so x and y are linearly independent.
7.4 (1) Eigenvalues are 0, 0, 2, and the eigenvectors include (1, 0, −i) and (0, 1, 0).
(2) One eigenvalue is 3; the remaining eigenvalues and their eigenvectors are complex and are found in the same way.
7.5 Refer to the real case.
7.6 (AB)ᴴ = (A̅B̅)ᵀ = B̅ᵀA̅ᵀ = BᴴAᴴ.
7.7 (Aᴴ)(A⁻¹)ᴴ = (A⁻¹A)ᴴ = I, so (A⁻¹)ᴴ = (Aᴴ)⁻¹.
7.8 The determinant is the product of the eigenvalues, and a Hermitian matrix has only real eigenvalues.
7.9 See Exercise 6.10.
7.10 To prove (3) directly, show that λ(x · y) = μ̄(x · y) by using the fact that Aᴴx = μ̄x when Ax = μx.
7.11 Aᴴ = Bᴴ + (iC)ᴴ = Bᵀ − iCᵀ = −B − iC = −A.
7.12 ±AB = (AB)ᴴ = BᴴAᴴ = (±B)(±A) = BA, with + if they are Hermitian and − if they are skew-Hermitian.
7.13 Note that det Uᴴ = conj(det U), and 1 = det(UᴴU) = |det U|².
7.15 If A⁻¹ = Aᴴ and B⁻¹ = Bᴴ, then (AB)ᴴAB = I.
7.16 Hermitian means the diagonal entries are real, and triangularity forces the off-diagonal entries to be zero; unitary then means the diagonal entries must be ±1.
7.18 (1) and (2) Form a unitary matrix U from the normalized eigenvectors (the entries involve i√3 and 1/√2); then U⁻¹AU is the diagonal matrix of eigenvalues.
7.20 Note that A has two distinct eigenvalues.
7.21 This is a normal matrix. From a direct computation, one can find the eigenvalues 1 − i, 1 − i and 1 + 2i, and the associated eigenvectors (−1, 0, 1), (−1, 1, 0) and (1, 1, 1), respectively, which are not orthogonal. But by an orthonormalization, one can obtain a unitary basis-change matrix, so that A is unitarily diagonalizable.
7.22 AᴴA = (H₁ − H₂)(H₁ + H₂) = (H₁ + H₂)(H₁ − H₂) = AAᴴ if and only if H₁H₂ − H₂H₁ = 0.
7.23 In one direction these are all already proven in the theorems. Suppose that UᴴAU = D for a unitary matrix U and a diagonal matrix D.
(1) and (2) If all the eigenvalues of A are real (or purely imaginary), then the diagonal entries of D are all real (or purely imaginary). Thus Dᴴ = ±D, so that A is Hermitian (or skew-Hermitian).
(3) The diagonal entries of D satisfy |λ| = 1. Thus Dᴴ = D⁻¹, and Aᴴ = UD⁻¹U⁻¹ = A⁻¹.
7.24 Q is the orthogonal matrix whose columns are the normalized eigenvectors; its entries involve the factors 1/√2, 1/√3 and 1/√6.
7.25 Write each matrix in its spectral decomposition A = λ₁P₁ + λ₂P₂, where Pᵢ is the orthogonal projection onto the eigenspace of λᵢ; for B the eigenvalues involve √6 and the projections have entries built from (1 ± √6)(2 ± i).
Exercises

7.1 (1) √6, (2) 4.
7.4 (1) λ = i, x = t(1, −2 − i); λ = −i, x = t(1, −2 + i).
(2) λ = 1, x = t(i, 1); λ = −1, x = t(−i, 1).
(3) Eigenvalues are 2, 2 + i, 2 − i, and eigenvectors are (0, −1, 1), (1, −⅓(2 + i), 1), (1, −⅓(2 − i), 1).
(4) Eigenvalues are 0, −1, 2, and eigenvectors are (1, 0, −1), (1, −i, 1), (1, 2i, 1).
7.6 A + cI is invertible if det(A + cI) ≠ 0. However, det(A + cI), as a complex polynomial in c, always has a (complex) root, so invertibility cannot hold for every complex c. For the real matrix [cos θ −sin θ; sin θ cos θ], A + rI is invertible for every real number r, since A has no real eigenvalues (for sin θ ≠ 0).
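The rotation example in Exercise 7.6 can be verified directly: det(A + rI) = (cos θ + r)² + sin²θ, which never vanishes for real r when sin θ ≠ 0. A small check (the function name and sampled values are ours):

```python
import math

def det_shift(theta, r):
    """det(A + r*I) for the rotation A = [[cos t, -sin t], [sin t, cos t]]."""
    c, s = math.cos(theta), math.sin(theta)
    # det([[c + r, -s], [s, c + r]]) = (c + r)^2 + s^2 = r^2 + 2 r cos t + 1
    return (c + r) ** 2 + s ** 2
```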
7.7 (1) and (2) Normalize the eigenvectors; the entries of the resulting matrices involve 1 ± i, 2i and the factor 1/√3.
7.10 (2) Q = (1/√2)[1 1; 1 −1].
7.12 (1) Unitary; the diagonal entries are {1, i}. (2) Orthogonal; {cos θ + i sin θ, cos θ − i sin θ}, where θ = cos⁻¹(0.6). (3) Hermitian; {1, 1 + √2, 1 − √2}.
7.13 (1) Since the eigenvalues of a skew-Hermitian matrix must always be purely imaginary, 1 cannot be an eigenvalue. (2) Note that, for a skew-Hermitian matrix A, (e^A)ᴴ = e^{Aᴴ} = e^{−A} = (e^A)⁻¹.
7.14 det(U − λI) = det(U − λI)ᵀ = det(Uᵀ − λI).
7.15 U = (1/√2)[1 −1; 1 1], and D = UᴴAU is the diagonal matrix of the (complex) eigenvalues of A.
7.17 See Exercise 6.14.
7.18 The eigenvalues are 1, 1, 4, and the orthonormal eigenvectors are (1/√2, −1/√2, 0), (−1/√6, −1/√6, 2/√6) and (1/√3, 1/√3, 1/√3). Therefore,
A = (1/3)[2 −1 −1; −1 2 −1; −1 −1 2] + (4/3)[1 1 1; 1 1 1; 1 1 1].
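The spectral decomposition in Exercise 7.18 is easy to verify with numpy. The matrix it reassembles, [2 1 1; 1 2 1; 1 1 2], is implied by the stated eigendata rather than quoted from the text:

```python
import numpy as np

# Spectral projections: P1 onto the eigenspace of 1, P2 onto that of 4
P1 = np.array([[ 2, -1, -1],
               [-1,  2, -1],
               [-1, -1,  2]]) / 3.0
P2 = np.ones((3, 3)) / 3.0

A = 1 * P1 + 4 * P2   # A = lambda_1 P1 + lambda_2 P2
```

Both Pᵢ are idempotent and mutually orthogonal, which is exactly what the spectral theorem guarantees for a symmetric matrix.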
7.20 See Theorem 7.8.
7.21 If λ is an eigenvalue of A, then λⁿ is an eigenvalue of Aⁿ. Thus, if Aⁿ = 0, then λⁿ = 0, so λ = 0. Conversely, by Schur's lemma, A is similar to an upper triangular matrix whose diagonal entries are the eigenvalues, which are all supposed to be zero. Then it is easy to conclude that A is nilpotent.
7.22 Ten of them are true.
(1) See Theorem 7.7.
(2) Consider the rotation matrix [cos θ −sin θ; sin θ cos θ] with sin θ ≠ 0.
(3) True: see Corollary 6.6.
(4) Find a 2 × 2 counterexample.
(5) Such a matrix A is unitary.
(6) and (7) A permutation matrix is an orthogonal matrix, but need not be symmetric.
(8) True: if A is Hermitian, by Schur's lemma A is unitarily similar to an upper triangular matrix T. If A is nilpotent, the eigenvalues of A, as the diagonal entries of T, are all zero. By showing Tᴴ = T, one can conclude that such an A must be the zero matrix.
(9) Schur's lemma.
(10) For a Hermitian matrix A, −i cannot be an eigenvalue of A. Hence det(A + iI) ≠ 0.
(11) Find a 2 × 2 counterexample.
(12) Modify (10).
(13) If UᴴAU = D with real diagonal entries, then Aᴴ = A.
(15) |det U| = 1 for any unitary matrix U.
Chapter 8 Problems

8.2 Hint: Let
J = [λ 1 0 0 0; 0 λ 1 0 0; 0 0 λ 1 0; 0 0 0 λ 1; 0 0 0 0 λ]
be a single 5 × 5 Jordan block. Then (J − λI)⁴ ≠ 0 but (J − λI)⁵ = 0.
8.3 Six different possibilities.
8.4 (1)–(3) Write down the Jordan canonical forms determined by the eigenvalues and the numbers of linearly independent eigenvectors.
8.7 See Problem 6.1.
8.8 f(A) ≠ det(AI − A) in general: one cannot simply substitute the matrix A for λ inside det(λI − A).
8.9 For a diagonal D, all diagonal entries of f(D) are zero. For a diagonalizable A = Q⁻¹DQ, f(A) = Q⁻¹f(D)Q.
8.10 Let λ₁, …, λₙ be the eigenvalues of A. Then f(λ) = det(λI − A) = (λ − λ₁) ⋯ (λ − λₙ). Thus f(B) = (B − λ₁Iₘ) ⋯ (B − λₙIₘ) is nonsingular if and only if B − λᵢIₘ, i = 1, …, n, are all nonsingular. That is, none of the λᵢ's is an eigenvalue of B.
8.11 (1) A⁻¹ = [1 0 −1/2; 0 1/2 0; 0 0 1/2] and A¹⁰ = [1 0 1023; 0 1024 0; 0 0 1024].
(2) The characteristic polynomial of A is (λ − 1)(λ − 2)²; divide the given power of λ by it and evaluate the remainder at A.
8.12 For J(2), m(x) = (x − λ)³; for J(3), m(x) = (x − λ)⁴.
8.14 (1) m(λ) = (λ − 1)(λ − 2)³.
(2) m(λ) = λ²(λ − 2); for n ≥ 2, the entries of Aⁿ are built from 0, ±2ⁿ and ±2ⁿ⁺¹.
8.15 (2) Compute e^A = Qe^JQ⁻¹; its entries are combinations of integers and e², such as 3 − 2e² and −1 + 2e².
8.17 The eigenvalue is −1 with multiplicity 3, and it has only one linearly independent eigenvector, (1, 0, 3). The solution is
y(t) = (y₁(t), y₂(t), y₃(t)) = e⁻ᵗ(−1 − 5t + 2t², −1 + 4t, 1 − 15t + 6t²).
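The power A¹⁰ claimed in Problem 8.11(1) can be confirmed by inverting the stated A⁻¹; the matrix A recovered below is therefore an inference from the answer key, not quoted from the text:

```python
import numpy as np

A_inv = np.array([[1.0, 0.0, -0.5],
                  [0.0, 0.5,  0.0],
                  [0.0, 0.0,  0.5]])

A = np.linalg.inv(A_inv)                 # recovers [[1, 0, 1], [0, 2, 0], [0, 0, 2]]
A10 = np.linalg.matrix_power(A, 10)
```

The (1, 3) entry 1023 = 2¹⁰ − 1 comes from the 2 × 2 triangular block [1 1; 0 2], whose n-th power is [1 2ⁿ−1; 0 2ⁿ].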
Exercises
8.2 Find the Jordan canonical form of A as Q⁻¹AQ = J. Since A is nonsingular, all the diagonal entries λᵢ of J, as the eigenvalues of A, are nonzero. Hence, each Jordan block Jᵢ of J is invertible. Now one can easily show that (Q⁻¹AQ)⁻¹ = Q⁻¹A⁻¹Q = J⁻¹, which gives the Jordan structure of A⁻¹, with blocks of the form Jᵢ⁻¹.
8.3 (1) {e₁, e₄}; (3) {e₁, e₃, e₄, e₅}.
8.4 (1) x₁ = (1, −1) and x₂ = (1, 1) for the two eigenvalues.
(2) For λ = −1, x₁ = (−2, 0, 1), x₂ = (0, 1, 1), and for λ = 0, x₁ = (−1, 1, 1).
(3) For λ = 1, x₁ = (2, 0, −1) together with a second generalized eigenvector, and for λ = −1, x₁ = (9, 1, −1).
8.6 Solve the recurrence relation in Example 2.14.
8.7 Compute (x, y) directly.
8.9 Use [3 1; 1 3] = ½ [1 1; −1 1] [2 0; 0 4] [1 −1; 1 1].
8.10 Use A = QJQ⁻¹, where J is the Jordan canonical form of A.
8.12 y(t) = (y₁(t), y₂(t), y₃(t)) = (−2e^{2(1−t)} + 4e^{2(t−1)}, −e^{2(1−t)} + 2e^{2(t−1)}, 2e^{2(1−t)} + 2e^{2(t−1)}).
8.13 y(t) = (2(t − 1)eᵗ, −2teᵗ, (2t − 1)eᵗ).
8.14 (1) (a − d)² + 4bc ≠ 0 or A = aI.
8.15 (1) t² + t − 11, (2) t² + 2t + 13, (3) (t − 1)(t² − 2t − 5).
8.17 (3) and (5) Compute A⁻¹ and Aⁿ block by block from the Jordan form: a diagonal block diag(1, 2) contributes entries such as 2ⁿ and 2ⁿ − 1 to Aⁿ, and a Jordan block with eigenvalue 1 contributes the off-diagonal entry n.
8.18 (2) The characteristic polynomial of W is f(λ) = λⁿ − 1, so its eigenvalues are 1, ω, ω², …, ω^{n−1}, where ω = e^{2πi/n}.
(3) The eigenvalues of A are λₖ = Σ_{i=1}^{n} aᵢω^{(i−1)k} = a₁ + a₂ωᵏ + a₃ω²ᵏ + ⋯ + aₙω^{(n−1)k}, k = 0, 1, …, n − 1.
(4) det A = Π_{k=0}^{n−1} (a₁ + a₂ωᵏ + a₃ω²ᵏ + ⋯ + aₙω^{(n−1)k}), where ω = e^{2πi/n}.
(5) The characteristic polynomial of B is f(λ) = (λ − n + 1)(λ + 1)^{n−1}.
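The circulant eigenvalue formula in 8.18(3) can be checked against a direct eigenvalue computation; the coefficients below are an arbitrary sample of ours:

```python
import numpy as np

a = np.array([5.0, 1.0, 2.0, 3.0])       # first row a_1, ..., a_n (arbitrary)
n = len(a)
# Circulant matrix: each row is the previous one shifted right by one
C = np.array([[a[(j - i) % n] for j in range(n)] for i in range(n)])

w = np.exp(2j * np.pi / n)
key = lambda z: (round(z.real, 6), round(z.imag, 6))
formula = sorted((sum(a[i] * w ** (i * k) for i in range(n)) for k in range(n)), key=key)
eig = sorted(np.linalg.eigvals(C), key=key)
```

Sorting by (real, imaginary) parts only pairs up the two lists for comparison; the formula itself needs no numerics.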
8.19 Six of them are true.
(1) and (2) See Theorem 8.1.
(3) and (4) Check them with Example 8.2.
(5) and (6) See Examples 8.2 and 8.3.
(7) det e^{Iₙ} = eⁿ.
(8) For any Jordan matrix J, J and Jᵀ are similar.
(9) What is the minimal polynomial of Iₙ?
(10) See Example 8.18(2).
(11) See Corollary 8.7.

Chapter 9 Problems

9.1 (1)–(3) The symmetric matrices A = ½(B + Bᵀ) of the given quadratic forms.
9.2 (1) The eigenvalues of A are 1, 2, 11. (2) The eigenvalues are 17, 0, −3, and so it is a hyperbolic cylinder. (3) A is singular and the linear form is present; thus the graph is a parabola.
9.3 B with the eigenvalues 2, 2 + √2 and 2 − √2.
9.5 The determinant is the product of the eigenvalues.
9.6 False; find a 2 × 2 counterexample.
9.8 (1) is indefinite. (2) and (3) are positive definite.
9.13 (2) b₁₁ = b₁₄ = b₄₁ = b₄₄ = 1; all others are zero.
9.15 If u ∈ U ∩ W, then u = αx + βy ∈ W for some scalars α and β. Since x, y ∈ U, b(u, x) = b(u, y) = 0. But b(u, x) = βb(y, x) = −β and b(u, y) = αb(x, y) = α.
9.16 Let c(x, y) = ½(b(x, y) + b(y, x)) and d(x, y) = ½(b(x, y) − b(y, x)). Then b = c + d.
9.17 Let D be a diagonal matrix, and let D′ be obtained from D by interchanging two diagonal entries dᵢᵢ and dⱼⱼ, i ≠ j. Let P be the permutation matrix interchanging the i-th and j-th rows. Then PDPᵀ = D′.
9.18 Count the number of distinct inertias (p, q, k); the number of inertias with p = i is n − i + 1.
9.19 (3) index = 2, signature = 1, and rank = 3.
9.20 (1) local minimum, (2) saddle point.
9.21 Check it with f(x, y, z) = x² − y² − z².
9.22 Note that the maximum value of R(x) is the maximum eigenvalue of A, and similarly for the minimum value.
9.23 The maximum is attained at ±(1/√2, 1/√2) and the minimum at ±(1/√2, −1/√2); the values are the larger and smaller eigenvalues.
9.24 (1) max = 4 at ±(1/√6)(1, 1, 2), min = −2 at ±(1/√3)(−1, −1, 1); (2) max = 3 at ±(1/√6)(2, 1, 1), min = 0 at ±(1/√3)(1, −1, −1).
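The principle behind 9.22–9.24, that the extrema of xᵀAx over the unit sphere are the extreme eigenvalues of A, can be illustrated numerically; the symmetric matrix and the random samples below are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
A = (B + B.T) / 2                       # an arbitrary symmetric matrix

eigvals = np.linalg.eigvalsh(A)         # ascending order

# Rayleigh quotients x^T A x at random unit vectors
samples = rng.standard_normal((1000, 3))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
rayleigh = np.einsum('ij,jk,ik->i', samples, A, samples)
```

Every sampled value lies between the smallest and largest eigenvalue, with equality approached only near the corresponding eigenvectors.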
Exercises

9.1 (1)–(4) The symmetric matrices of the given quadratic equations: write each as xᵀAx + bᵀx + c = 0 and symmetrize the cross terms, A = ½(B + Bᵀ).
2)±r
9.8 (1) Q = 9.10 (1) A
t
Jz [~ _~ 1
= [~ - ~
The formis indefinite witheigenvalues)"
J.
(2) B
= [~ :],
(3) Q =
[~ - i
1
9.11 (2) The signatureis I, the index is 2, and the rank is 3. 9.15 (2)The point (1, rr) is a criticalpoint,and the Hessianis [~
_~
is a local maximum. 9.18 (1) Supposethat lpw = ({Jw' . Then, for all v E V, b(v, w)
= ({J(w)(v) = ((J(w')(v) = b(v,
Wi)
= 5 and)" = -1.
J
or b(v, w -
Hence, JO, iT)
Wi)
= O.
The non-degeneracy of b implies that w = w', that is, ({J is one-to-one. This also implies that dim W ::: dim V". A similar argument shows that the linear transformation 1ft : V ...... woO is also one-to-one, and therefore dim V ::: dim WoO. Since dim V = dim V" and dim W = dim WoO from Theorem 4.18, we have dim V ::: dim WoO = dim W ::: dim V" = dim V. Therefore, ({J and 1ft are surjective, and so are isomorphisms. (2) comes from (1). 9.19 Exactly seven of them are true. (1) See Theorem9.1 (The principalaxes theorem) (1) (2) Any two congru[en; m;tri~e]s have the same inertia. [ -1 (3) Consider A =
~
~
~
.
(4) Consider A =
0
(7) Considera bilinearform b(x, y) = XIYl - X2Y2 on JR2. (9) The identity I is congruentto k 2 I for all k E JR. (10) See (9). (12) Consider a bilinear form b(x, y) diagonalizable.
= XIY2. Its matrix Q
[
o0
1] . 0 IS not
Bibliography
1. M. Artin, Algebra, Prentice-Hall, Englewood Cliffs, NJ, 1991. 2. M. Braun, Differential Equations and Their Applications, 4th Edition , SpringerVerlag, New York, 1993. 3. P.R. Gantmakher, The Theoryof Matrices, I, II, Chelsea, New York, 1959. 4. P.R. Halmos, Finite-dimensional Vector Spaces, Springer-Verlag, New York, 1974. 5. K. Hoffman and R. Kunze, LinearAlgebra, 2nd Edition, Prentice-Hall, Englewood Cliffs, NJ, 1971. 6. R.A. Hom and C.R. Johnson , Matrix Analysis, Cambridge University Press , Cambridge , 1986. 7. G. Strang, Linear Algebra and Its Applications, 3rd Edition , Harcourt Brace Jovanovich, San Diego, CA, 1998. 8. V.V. Prasolov, Problems andTheorems in LinearAlgebra, Translatedfrom theRussian manuscript by D.A. Lettes, American Mathematical Society, Providence , RI, 1994.
Index
LDU decomposition, 32 LDU factorization, 32 LU decomposition, 29 LU factorization, 29 QR decomposition, 193 QR factorization, 193 n-space complex, 247 real,76 Additivity, 158,248 Adjoint, 143, 145,254 Adjugate , 61 Angle, 157, 161 Antilinear, 248 Associated matrix, 128 Augmented matrix, 5 Back substitution, 8 Basic variable, 10 Basis, 86 change of, 134 dual,l44 ordered, 125 orthonormal, 165 standard, 86, 163 Basis-change matrix , 136 Bessel's inequality, 181 Bijective, 123 Bilinear, 158 Bilinear form , 339 alternating, 340 diagonalizable, 342 nondegenerate, 358
rank of, 341 skew-symmetric, 340 symmetric, 340 Binet-Cauchy formula, 66 Block ,19 matrix, 19 Cauchy-Schwarz inequality, 160,252 Cayley-Hamilton theorem, 244, 295 Characteristic equation, 202 Characteristic polynomial, 202 Characteristic value, 202 Characteristic vector, 202 Cholesky decomposition, 332 Cholesky factorization , 332 Circulant matrix, 65, 317 Coefficient matrix, 6 Cofactor, 57 expansion , 58 Column (matrix), 12 Column space , 84, 92 Column vector, 12 Companion matrix, 214, 215 Computer graphics, 148 Congruent matrix, 336 Conic section, 321, 322 Conjugate, 248 Coordinate, 75 homogeneous, 151 rectangular, 165 Coordinate function , 144 Coordinate vector, 125 Coordinate-change matrix, 136 Cramer's rule, 62
386
Index Critical point, 349 Cross-product term, 324 Cryptography, 34 Decomposition LDU ,32 LU,29 QR,193 Cholesky,332 Definite form, 326, 327 negative, 326, 327 positive, 326, 327 Determinant, 46 Diagonal entry, 12 Diagonal matrix, 12 Diagonalizable orthogonally, 258 unitarily,258 Diagonalization of a quadratic form, 324 of linear transformation,240 of matrices, 207 Difference equation linear, 217 Differential equation linear, 226 Dilation, 180 Dimension, 89 finite, 89 infinite, 89 Direct sum, 81 Discrete dynamical system, 221 Distance, 157, 161,249 Dot product, 157, 158 Dual basis, 144 Dual space, 143 Eigenspace, 202 Eigenvalue, 202 Eigenvector,201 Electrical network, 36 Elementary column operation, 25 Elementary matrix, 23 Elementary operations, 4 Elementary product, 55 signed,55 Elementary row operation, 6 Elimination, 3 forward,7
Gauss-Jordan, 8 Entry, 12 Equilibrium state, 225 Euclidean n-space, 158 Euclidean complex n-space, 248 Euclidean inner product, 158 Euclidean length, 248 Exponential matrix, 232, 306 Factorization LDU, 32 LU,29 QR,193 Cholesky, 332 Fibonacci,212 number,212 sequence, 212 Forward elimination, 7 Fourier coefficient, 144 Free variables, 10 Fundamental set, 229 Fundamental theorem, 96 Gauss-Jordan elimination, 8 Gaussian elimination, 8 General solution, 226, 229 Generalized eigenspace, 284 Generalized eigenvector, 282 chain of, 282 Gerschgorin's theorem, 222 Global maximum, 327 Global minimum, 327 Golden mean, 214 Gram-Schmidt orthogonalization, 166 Hermitian congruent matrix, 336 Hermitian form, 340 rank of, 341 Hermitian matrix, 254 Hessian, 349, 351 Homogeneity, 158 Homogeneouscoordinate, 151 Homogeneoussystem, 1 Idempotent matrix, 43, 243 Identity matrix, 17 Identity transformation, 118 Image, 119 Indefinite form, 326, 327 Index, 347
Index Inertia, 327, 336, 346 Initial condition, 226 Initial eigenvector, 282 Injective, 123 Inner product, 158,248 complex, 248 Euclidean, 157, 158 Hermitian, 248 matrix representation of, 163 positive definite, 158,248 real, 158 Input-output model , 38 Interpolating polynomial, 109 Interpolation, 108 Inverse left,21 right, 21 Inverse matrix, 22 Inversion, 54 Invertible matrix, 22 Isometry, 179 Isomorphism, 123 natural, 125 Jordan, 273 block,274 canonical form, 273,274 canonical matrix, 274 Kernel,119 Kirchhoff's Current Law, 36 Kirchhoff's Voltage Law, 36 Leading 1's, 8 Least squares solution, 181 Length, 157, 160,249 Linear combination, 82 Linear dependence, 84 Linear difference equation, 217, 221, 309 Linear differential equation , 226, 235, 310 Linear equations, 1 consistent system of, I homogeneous system of, 1 inconsistent system of, 1 system of, 1 Linear form , 320 Linear functional , 143 Linear programming, 353 Linear transformation, 117
associated matrix of, 128 diagonalization of, 240 dilation , 180 eigenvalue of, 240 eigenvector of, 240 identity, 118 image, 119 invertible , 123 isomorphism, 123 kernel,119 matrix representation of, 128 orthogonal, 179 projection, 168 reflection, 119 rotation, 118 scalar multiplication of, 131 sum of, 131 transpose, 145 zero, 118 Linearly dependent, 84 Linearly independent, 84 Lower triangular matrix, 12 Magnitude, 157, 160,248 Markov matrix, 225 Markov process, 224, 225 Matrix, 11 associated, 128 augmented, 5 basis-change, 136 block,19 circulant, 65, 317 column, 12 congruent, 336 coordinate-change, 136 diagonal, 12 diagonalizable, 207 diagonalization of, 207 elementary, 23 entry of, 12 exponential, 232, 306 Hermitian, 254 Hermitian congruent, 336 idempotent, 43 identity, 17 indefinite, 326, 327 inverse, 22 invertible , 22 Jordan canonical, 274
Nilpotentmatrix, 43 Nonsingularmatrix, 22 Normal equation, 182 Normal matrix, 262 Normalization, 165 Null space, 92 Nullity,92 Ohm's Law,36 One-to-one, 123 Onto, 123 Ordered basis, 125 Orientation,68 Orthogonal, 170 complement, 170 decomposition, 171 matrix, 177 projectionmatrix, 173, 190, 195 transformation, 179 vectors, 162 Orthogonalization, 166 Gram-Schmidt, 166 Orthogonally similar, 258 Orthonormalbasis, 165 Orthonormal vectors, 165 Paraboliccylinder,329 Parallelepiped, 68 Parallelogram, 68 equality,198 Particular solution,226, 229 Permutation, 54 even, 54 inversionof, 54 odd,54 matrix, 25 sign of, 54 Perpendicularvectors, 162 Pivot,7 Polarizationidentity, 198 Polynomialapproximations, 186 Predator-prey problem, 230 Principal submatrix,332 Projection, 168 Pythagoreantheorem, 162, 252 Quadraticequation, 320 Quadraticform, 319, 320 complex,320
389
Stochastic matrix, 225 Submatrix, 19 minor, 57 principal , 332 Subspace, 79 fundamental, 175 spanned ,82 sum of, 81 Substitution, 3 Sum of linear transformations, 131 matrices, 13 subspaces,81 vectors, 77 SuIjective, 123 Sylvester's law of inertia, 336, 346 Symmetric matrix, 14 Trace, 120, 143 Transformation identity, 118 injective, 123 linear, 117 surjective, 123 zero, 118 Transpose, 12, 145 Triangle inequality, 162, 252 Tridiagonal matrix, 65, 316 Unit vector, 165 Unitarily similar, 258 Unitary matrix, 256 Unitary space, 248 Upper triangular matrix, 12 Value characteristic, 202 Vandermonde matrix, 60, 64, 109,304 Vector, 75, 77 characteristic, 202 column, 12 component of, 75 orthogonal, 162 perpendicular, 162 row, 12,91 scalar multiplication of, 77 sum of, 77 unit, 165 zero, 77 Vector addition , 77
Wronskian, 64, 110,228 Zero matrix, 13 Zero transformation, 118 Zero vector, 77