
Math 5327 Spring 2011

The Cayley-Hamilton Theorem and Minimal Polynomials

Here are some notes on the Cayley-Hamilton Theorem, with a few extras thrown in. First, the proof of the Cayley-Hamilton theorem: that the characteristic polynomial is an annihilating polynomial for A. The proof started out this way: given a matrix A, we consider (xI - A) adj(xI - A) = det(xI - A) I = cA(x) I. It would be nice if we could just plug A in for x in this equation. Certainly, we cannot do that directly, because the matrix adj(xI - A) has entries which are polynomials in x, so we would end up with a matrix with matrix entries. As we proceed, we will use an example to illustrate the difficulties.
Suppose that

    A = [ 2  1  1 ]
        [ 1  2  1 ]
        [ 1  1  2 ].

Then

    adj(xI - A) = [ x^2-4x+3   x-1        x-1      ]
                  [ x-1        x^2-4x+3   x-1      ]
                  [ x-1        x-1        x^2-4x+3 ]

and

    (xI - A) adj(xI - A) = (x^3 - 6x^2 + 9x - 4) I.

We write this out as (xI - A)(B0 + xB1 + x^2 B2 + ... + x^{n-1} B_{n-1}) = cA(x) I:

    (      [ 2  1  1 ] ) ( [  3 -1 -1 ]       [ -4  1  1 ]           )
    ( xI - [ 1  2  1 ] ) ( [ -1  3 -1 ]  +  x [  1 -4  1 ]  +  x^2 I ) = (x^3 - 6x^2 + 9x - 4) I.
    (      [ 1  1  2 ] ) ( [ -1 -1  3 ]       [  1  1 -4 ]           )
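This identity is easy to confirm by machine. The following SymPy sketch (added for checking; not part of the original notes, though the names A, B0, B1, B2 follow the display above) verifies both that adj(xI - A) = B0 + xB1 + x^2 B2 and that (xI - A) adj(xI - A) = cA(x) I.

    import sympy as sp

    x = sp.symbols('x')
    A = sp.Matrix([[2, 1, 1], [1, 2, 1], [1, 1, 2]])
    M = x * sp.eye(3) - A            # xI - A

    adj = M.adjugate()               # adj(xI - A)
    cA = sp.expand(M.det())          # cA(x) = det(xI - A)

    B0 = sp.Matrix([[3, -1, -1], [-1, 3, -1], [-1, -1, 3]])
    B1 = sp.Matrix([[-4, 1, 1], [1, -4, 1], [1, 1, -4]])
    B2 = sp.eye(3)

    # adj(xI - A) written as a polynomial in x with matrix coefficients:
    assert (adj - (B0 + x * B1 + x**2 * B2)).expand() == sp.zeros(3, 3)
    # (xI - A) adj(xI - A) = cA(x) I:
    assert (M * adj - cA * sp.eye(3)).expand() == sp.zeros(3, 3)
    print(cA)                        # x**3 - 6*x**2 + 9*x - 4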

At this point, the expressions would still make sense if we replaced x by a matrix, but we are not guaranteed that the resulting equation is valid. For example, suppose we replace x by the matrix

    X = [ 2  1  2 ]
        [ 1  2  1 ]
        [ 1  1  2 ].


Then

    (X - A)(B0 + X B1 + X^2 B2)

        = [ 0  0  1 ] ( [  3 -1 -1 ]   [ -5  0 -5 ]   [ 7  6  9 ] )
          [ 0  0  0 ] ( [ -1  3 -1 ] + [ -1 -6 -1 ] + [ 5  6  6 ] )
          [ 0  0  0 ] ( [ -1 -1  3 ]   [ -1 -1 -6 ]   [ 5  5  7 ] )

        = [ 0  0  1 ] [ 5  5  3 ]   [ 3  3  4 ]
          [ 0  0  0 ] [ 3  3  4 ] = [ 0  0  0 ],
          [ 0  0  0 ] [ 3  3  4 ]   [ 0  0  0 ]

whereas

    cA(X) = -4I + 9X - 6X^2 + X^3

          = [ -4  0  0 ]   [ 18  9 18 ]   [ 42 36 54 ]   [ 29 28 38 ]   [ 1  1  2 ]
            [  0 -4  0 ] + [  9 18  9 ] - [ 30 36 36 ] + [ 22 23 28 ] = [ 1  1  1 ],
            [  0  0 -4 ]   [  9  9 18 ]   [ 30 30 42 ]   [ 22 22 29 ]   [ 1  1  1 ]

a very different answer. Thus, (xI - A)(B0 + xB1 + x^2 B2 + ... + x^{n-1} B_{n-1}) = cA(x) I is correct for scalars x, but does not appear to work if x is a matrix. Now if x is a scalar,

    (xI - A)(B0 + xB1 + x^2 B2 + ... + x^{n-1} B_{n-1})
        = -AB0 + (xB0 - AxB1) + (x^2 B1 - Ax^2 B2) + ... + x^n B_{n-1}
        = -AB0 + x(B0 - AB1) + x^2 (B1 - AB2) + ... + x^n B_{n-1}.
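A short SymPy check of the example above (a sketch added for verification, not part of the original notes; the names follow the text) shows the two computations really do disagree, and that substituting A itself does give 0:

    import sympy as sp

    A = sp.Matrix([[2, 1, 1], [1, 2, 1], [1, 1, 2]])
    X = sp.Matrix([[2, 1, 2], [1, 2, 1], [1, 1, 2]])
    I = sp.eye(3)

    B0 = sp.Matrix([[3, -1, -1], [-1, 3, -1], [-1, -1, 3]])
    B1 = sp.Matrix([[-4, 1, 1], [1, -4, 1], [1, 1, -4]])

    print((X - A) * (B0 + X * B1 + X**2))        # Matrix([[3, 3, 4], [0, 0, 0], [0, 0, 0]])
    print(X**3 - 6 * X**2 + 9 * X - 4 * I)       # cA(X) = Matrix([[1, 1, 2], [1, 1, 1], [1, 1, 1]])
    print(X * A == A * X)                        # False: X does not commute with A

    # Substituting A itself is fine (the Cayley-Hamilton theorem): cA(A) = 0.
    print(A**3 - 6 * A**2 + 9 * A - 4 * I == sp.zeros(3, 3))   # True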


If we denote cA(x) by cA(x) = a0 + a1 x + ... + a_{n-1} x^{n-1} + x^n, then we have that

    -AB0 + x(B0 - AB1) + x^2 (B1 - AB2) + ... + x^n B_{n-1} = a0 I + a1 x I + ... + a_{n-1} x^{n-1} I + x^n I.

Two polynomials are equal if and only if they have the same coefficients (you might think about why this is true, even if the coefficients are matrices), so a0 I = -AB0, a1 I = B0 - AB1, ..., B_{n-1} = I. This means that for scalars AND matrices x,

    -AB0 + x(B0 - AB1) + x^2 (B1 - AB2) + ... + x^n B_{n-1} = cA(x) I.

You might check that the matrix X above can be substituted in for x here, and a correct result follows. So we have the following: For any x, scalar OR matrix,

    cA(x) I = -AB0 + x(B0 - AB1) + x^2 (B1 - AB2) + ... + x^n B_{n-1},

and if x is a scalar,

    -AB0 + x(B0 - AB1) + x^2 (B1 - AB2) + ... + x^n B_{n-1} = (xI - A)(B0 + xB1 + x^2 B2 + ... + x^{n-1} B_{n-1}),

but if x is a matrix, then

    (xI - A)(B0 + xB1 + x^2 B2 + ... + x^{n-1} B_{n-1}) = -AB0 + (xB0 - AxB1) + (x^2 B1 - Ax^2 B2) + ... + x^n B_{n-1}.

If x is a matrix for which


(*)    -AB0 + (xB0 - AxB1) + (x^2 B1 - Ax^2 B2) + ... + x^n B_{n-1} = -AB0 + x(B0 - AB1) + x^2 (B1 - AB2) + ... + x^n B_{n-1},

then we could perform the calculation in this way:

    cA(x) I = -AB0 + x(B0 - AB1) + x^2 (B1 - AB2) + ... + x^n B_{n-1}          (already true)
            = -AB0 + (xB0 - AxB1) + (x^2 B1 - Ax^2 B2) + ... + x^n B_{n-1}
            = (xI - A)(B0 + xB1 + x^2 B2 + ... + x^{n-1} B_{n-1}).

For which matrices will this work? The answer is that we can do this so long as x commutes with any power of A, and this was the problem with X: it does not commute with A. Since A commutes with any power of A, it is legal to substitute A for x into the equation, and we obtain (A - A)(B0 + ... + A^{n-1} B_{n-1}) = cA(A), and in particular, that cA(A) = 0. So the point of this proof is the following:

(**)   (xI - A)(B0 + xB1 + x^2 B2 + ... + x^{n-1} B_{n-1}) = cA(x) I

is true for any x which commutes with A (that is, for which xA = Ax). As an exercise, you might try

    x = [ 1  1  1 ]
        [ 1  1  1 ]
        [ 1  1  1 ],

which commutes with A, and show that (**) works for this x.
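Here is a quick SymPy check of the suggested exercise (a sketch added for verification, not part of the original notes): the all-ones matrix commutes with A, and (**) does hold for it.

    import sympy as sp

    A = sp.Matrix([[2, 1, 1], [1, 2, 1], [1, 1, 2]])
    J = sp.ones(3, 3)                 # the all-ones matrix from the exercise
    I = sp.eye(3)

    B0 = sp.Matrix([[3, -1, -1], [-1, 3, -1], [-1, -1, 3]])
    B1 = sp.Matrix([[-4, 1, 1], [1, -4, 1], [1, 1, -4]])

    print(A * J == J * A)                        # True: J commutes with A

    lhs = (J - A) * (B0 + J * B1 + J**2)         # (xI - A)(B0 + xB1 + x^2 B2) with x = J
    rhs = J**3 - 6 * J**2 + 9 * J - 4 * I        # cA(J)
    print(lhs == rhs, lhs == -4 * I)             # True True: both sides equal -4I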


Here is an extension of the Cayley-Hamilton Theorem. It uses adj(xI - A) to calculate the minimal polynomial of A. Suppose that the greatest common divisor of all the entries in adj(xI - A) is g(x). Then mA(x) = cA(x)/g(x). The proof is very similar to the proof of the Cayley-Hamilton theorem: we can write

    cA(x) I = (xI - A) adj(xI - A) = (xI - A) g(x) (C0 + xC1 + ... + x^m Cm)

for some m. Dividing by g(x), we have

    [cA(x)/g(x)] I = (xI - A)(C0 + xC1 + ... + x^m Cm).

As before, the right hand side of the above can be multiplied out to get a polynomial with matrix coefficients equal to [cA(x)/g(x)] I, and this will all be legal as long as x commutes with A. Thus, (cA/g)(A) = (A - A)(C0 + AC1 + ... + A^m Cm) = 0. Now suppose that f(x) = x^n Cn + x^{n-1} C_{n-1} + ... + xC1 + C0 is ANY polynomial with matrix coefficients. If x is a variable that commutes with A, then we can write f(x) = (xI - A) q(x) + R for some polynomial q(x) with matrix coefficients, where R is some remainder matrix. In particular, if f is an annihilating polynomial for A, then f(A) = 0 = (A - A)q(A) + R shows R = 0. Thus, f(x) = (xI - A)q(x) for some polynomial q(x). If mA(x) is the minimal polynomial of A and cA(x) = h(x)mA(x), we have

    cA(x) I = h(x) mA(x) I = (xI - A) h(x) q(x) = (xI - A) h(x) Q(x),

where q(x) is a polynomial with matrix coefficients and Q(x) is a matrix with polynomial entries. Comparing this to cA(x) I = (xI - A) adj(xI - A),


we have that h(x)Q(x) = adj(xI - A), so h(x) must be a divisor of each entry of adj(xI - A), and hence h(x) divides g(x). On the other hand, (cA/g)(A) = 0 means that mA(x) = cA(x)/h(x) divides cA(x)/g(x), so g(x) divides h(x). This proves that h = g, that is, mA(x) = cA(x)/g(x). As an example, consider the matrix:
    A = [  2  1  1  1 ]            xI - A = [ x-2  -1   -1   -1 ]
        [ -1  0 -1 -1 ]                     [  1    x    1    1 ]
        [  1  1  2  1 ]                     [ -1   -1  x-2   -1 ]
        [ -1 -1 -1  0 ],                    [  1    1    1    x ].

We have (after MUCH work)

    adj(xI - A) = [  x(x-1)^2      (x-1)^2       (x-1)^2       (x-1)^2      ]
                  [ -(x-1)^2      (x-1)^2(x-2)  -(x-1)^2      -(x-1)^2      ]
                  [  (x-1)^2       (x-1)^2       x(x-1)^2      (x-1)^2      ]
                  [ -(x-1)^2      -(x-1)^2      -(x-1)^2       (x-1)^2(x-2) ],

so g(x) = (x - 1)^2, and

    mA(x) = cA(x)/g(x) = (x - 1)^4 / (x - 1)^2 = (x - 1)^2.
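The "MUCH work" can be delegated to SymPy. This sketch (added for verification, not part of the original notes) recomputes adj(xI - A), takes the gcd of its entries, and confirms that cA(x)/g(x) annihilates A while the smaller candidate x - 1 does not:

    from functools import reduce
    import sympy as sp

    x = sp.symbols('x')
    A = sp.Matrix([[ 2,  1,  1,  1],
                   [-1,  0, -1, -1],
                   [ 1,  1,  2,  1],
                   [-1, -1, -1,  0]])
    M = x * sp.eye(4) - A

    cA = sp.factor(M.det())                      # (x - 1)**4
    g = reduce(sp.gcd, list(M.adjugate()))       # gcd of all entries of adj(xI - A)
    mA = sp.cancel(cA / g)                       # candidate minimal polynomial cA(x)/g(x)
    print(cA, sp.factor(g), sp.factor(mA))       # (x - 1)**4, (x - 1)**2, (x - 1)**2

    # mA(A) = (A - I)^2 = 0, but A - I alone is not 0:
    print((A - sp.eye(4))**2 == sp.zeros(4, 4))  # True
    print(A - sp.eye(4) == sp.zeros(4, 4))       # False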

One last thing mentioned in class: matrices over the integers, or matrices with polynomial entries, can be put into something called Smith Normal form: Given A, there are integer (or polynomial) matrices P and Q so that det(P) = 1, det(Q) = 1, and

    PAQ = diag(c1, c2, ..., cn),

a diagonal matrix, with cn divisible by c_{n-1} divisible by ... divisible by c1. This is not quite true: the first several c's will be 0 if det(A) = 0. It is only after the c's become nonzero that they start dividing each other. The c's will be integers if A is an integer matrix, polynomials if A is a polynomial matrix. If det(A) ≠ 0, then the


product of the c's is det(A). Finally, if we do this for xI - A, whose determinant is cA(x), then the c's can be taken to be monic polynomials, and the largest of these, cn, is the minimal polynomial. The proof for this rests on the fact that if B(x) is a matrix with polynomial entries, and we make a single row or column operation on B(x) to get B'(x), then the greatest common divisor of the entries in adj(B'(x)) is the same as for adj(B(x)). Once this is established, one may perform any number of row or column operations without changing the gcd of the entries of the adjoint. Finally, the adjoint of a diagonal matrix is extremely easy to figure out, and one gets that the gcd will be the product c1 c2 ... c_{n-1}, so the minimal polynomial is

    cA(x) / (c1 c2 ... c_{n-1}) = (c1 c2 ... c_{n-1} cn) / (c1 c2 ... c_{n-1}) = cn.

Let's verify that a single row operation does not change the gcd of the adjoint for one case, when B is 3x3. I will use the cofactor matrix rather than the adjoint below (this avoids a transpose). I will use C(B) for the cofactor matrix of B. Suppose

    B = [ a1  a2  a3 ]        B' = [ a1 - y b1   a2 - y b2   a3 - y b3 ]
        [ b1  b2  b3 ]   and       [ b1          b2          b3        ]
        [ c1  c2  c3 ]             [ c1          c2          c3        ].

Now let g(x) be the gcd of the entries in C(B), and h(x) the gcd of the entries in C(B'). When we take cofactors, the top row will be unchanged (the cofactors of the top-row entries only involve the second and third rows, which are the same in B and B'). This means that g(x) and h(x) both divide the entries in the top row of either cofactor matrix. Let's look at the bottom row. This will be (a2 b3 - a3 b2, a3 b1 - a1 b3, a1 b2 - a2 b1) for B and ((a2 - y b2) b3 - (a3 - y b3) b2, something, something) for B'. But (a2 - y b2) b3 - (a3 - y b3) b2 = a2 b3 - a3 b2, so the bottom rows are the same as well. This leaves just the second row to check. The second rows will be (a3 c2 - a2 c3, a1 c3 - a3 c1, a2 c1 - a1 c2) for B, and ((a3 - y b3) c2 - (a2 - y b2) c3, (a1 - y b1) c3 - (a3 - y b3) c1, (a2 - y b2) c1 - (a1 - y b1) c2) for B'. These are clearly different. Now g(x) divides each of the entries for B.


In particular, say, g(x) divides a1 c3 - a3 c1. But g(x) also divides b1 c3 - b3 c1, because this is (up to sign) an entry in C(B) (the (1,2)-cofactor). Since g(x) divides both a1 c3 - a3 c1 and b1 c3 - b3 c1, it also divides a1 c3 - a3 c1 - y(b1 c3 - b3 c1), the (2,2)-cofactor for B'. In a similar way, we see that g(x) divides all the cofactors of B', so it must divide the greatest common divisor of these. That is, h(x) is divisible by g(x). But row operations are reversible. Because of this, the same argument shows that g(x) is divisible by h(x). Since each divides the other, they are, up to scalar multiples, the same polynomial. Here is an example (same as before):
    A = [  2  1  1  1 ]            xI - A = [ x-2  -1   -1   -1 ]
        [ -1  0 -1 -1 ]                     [  1    x    1    1 ]
        [  1  1  2  1 ]                     [ -1   -1  x-2   -1 ]
        [ -1 -1 -1  0 ],                    [  1    1    1    x ].

We have:

    xI - A  ->  [ x-1   0    0    x-1  ]   (row operations, to simplify)
                [  0   x-1   0   -x+1  ]
                [  0    0   x-1   x-1  ]
                [  1    1    1     x   ]

            ->  [ 1    1     1      x        ]   (row operations)
                [ 0   x-1    0    -x+1       ]
                [ 0    0    x-1    x-1       ]
                [ 0  -x+1  -x+1  -x^2+2x-1   ]

            ->  [ 1    0     0      0        ]   (column operations)
                [ 0   x-1    0    -x+1       ]
                [ 0    0    x-1    x-1       ]
                [ 0  -x+1  -x+1  -x^2+2x-1   ]

            ->  [ 1    0     0     0      ]
                [ 0   x-1    0   -x+1     ]
                [ 0    0    x-1   x-1     ]
                [ 0    0   -x+1  -x^2+x   ]

            ->  [ 1    0     0     0      ]
                [ 0   x-1    0     0      ]
                [ 0    0    x-1   x-1     ]
                [ 0    0   -x+1  -x^2+x   ]

            ->  [ 1    0     0      0        ]
                [ 0   x-1    0      0        ]
                [ 0    0    x-1    x-1       ]
                [ 0    0     0   -x^2+2x-1   ]

            ->  [ 1    0     0      0      ]
                [ 0   x-1    0      0      ]
                [ 0    0    x-1     0      ]
                [ 0    0     0   (x-1)^2   ],

so again, the minimal polynomial is (x - 1)^2.
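The reduction above can be replayed step by step in SymPy (a sketch added for verification, not part of the original notes). Each assignment below is one of the row or column operations used; the final matrix is diagonal, the product of its first three entries is the gcd (x - 1)^2 of the adjugate's entries found earlier, and the last entry is the minimal polynomial.

    from functools import reduce
    import sympy as sp

    x = sp.symbols('x')
    A = sp.Matrix([[ 2,  1,  1,  1],
                   [-1,  0, -1, -1],
                   [ 1,  1,  2,  1],
                   [-1, -1, -1,  0]])
    M = x * sp.eye(4) - A

    M[0, :] += M[3, :]; M[1, :] -= M[3, :]; M[2, :] += M[3, :]       # simplify rows 1-3
    M.row_swap(0, 3)                                                 # move the row (1, 1, 1, x) to the top
    M[3, :] -= (x - 1) * M[0, :]                                     # clear the rest of column 1
    M[:, 1] -= M[:, 0]; M[:, 2] -= M[:, 0]; M[:, 3] -= x * M[:, 0]   # clear the rest of row 1
    M[3, :] += M[1, :]                                               # clear the (4, 2) entry
    M[:, 3] += M[:, 1]                                               # clear the (2, 4) entry
    M[3, :] += M[2, :]                                               # clear the (4, 3) entry
    M[:, 3] -= M[:, 2]                                               # clear the (3, 4) entry
    M[:, 3] = -M[:, 3]                                               # make the last entry monic

    print(M.applyfunc(sp.factor))     # diag(1, x - 1, x - 1, (x - 1)**2)

    # Consistency check: c1*c2*c3 equals the gcd of the entries of adj(xI - A).
    adj_gcd = reduce(sp.gcd, list((x * sp.eye(4) - A).adjugate()))
    print(sp.factor(adj_gcd))         # (x - 1)**2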


Finally, for this set of notes, the relationship between minimal polynomials and diagonalization of operators or matrices. First, a result on kernels of compositions of operators (or of products of matrices).

Lemma. If S and T are linear operators on V, then dim(ker(ST)) ≤ dim ker(S) + dim ker(T).

Proof: ker(ST) = {v : ST(v) = 0} = {v : T(v) = 0} ∪ {v : T(v) ≠ 0 but ST(v) = 0}. Suppose that ker(T) has basis {u1, u2, ..., um} and ker(ST) has basis {u1, u2, ..., um} ∪ {v1, v2, ..., vk}.

Then dim(ker(T)) = m, and dim(ker(ST)) = m + k. Now {T(v1), T(v2), ..., T(vk)} is a linearly independent set (you should check that this is true! It is an important fact about linear transformations). How big can k be? Since S(T(vi)) = 0, each T(vi) is in the kernel of S. Since a vector space can't contain more independent vectors than its dimension, k ≤ dim(ker(S)). Thus, m + k ≤ m + dim(ker(S)) = dim(ker(T)) + dim(ker(S)).

Theorem. Let T be a linear operator on a finite dimensional vector space V. Then T is diagonalizable if and only if its minimal polynomial mT(x) factors into distinct linear terms over F.

Proof: One direction was easy: if T is diagonalizable, then in some basis B for V, [T]_B is a diagonal matrix. The minimal polynomial for T is the same as the minimal polynomial for [T]_B, and it is easy to check that the minimal polynomial for a diagonal matrix factors into distinct linear terms.
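Before the harder direction, here is a small numerical sanity check of the lemma (added for illustration, not part of the original notes), using NumPy: for matrices S and T with nontrivial kernels, the nullity of ST is at most the sum of the nullities of S and T.

    import numpy as np

    def nullity(M):
        """dim ker(M) = number of columns minus the rank."""
        return M.shape[1] - np.linalg.matrix_rank(M)

    rng = np.random.default_rng(0)
    S = rng.integers(-3, 4, size=(6, 6)).astype(float)
    T = rng.integers(-3, 4, size=(6, 6)).astype(float)
    S[:, :2] = 0          # force dim ker(S) >= 2
    T[:, :3] = 0          # force dim ker(T) >= 3

    print(nullity(S), nullity(T), nullity(S @ T))
    assert nullity(S @ T) <= nullity(S) + nullity(T)   # the inequality from the lemma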


For the other direction, let mT(x) = (x - a1)(x - a2)...(x - am). Then 0 = mT(T) is the m-fold composition of the linear operators T - a1 I, ..., T - am I.

Now V = ker(0). Consequently, by the lemma, we have

    dim V = dim ker(0) ≤ dim ker(T - a1 I) + ... + dim ker(T - am I) ≤ dim V

(the second inequality holds because the ai are distinct, so eigenvectors for different ai are linearly independent), and this can only be true if dim V = dim ker(T - a1 I) + ... + dim ker(T - am I). Since ker(T - ai I) is the eigenspace of ai, this implies that T is diagonalizable by a previous theorem.
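To tie the theorem back to the matrices used earlier, here is a SymPy sketch (added for illustration, not part of the original notes): the 3x3 matrix A from the first example has minimal polynomial (x - 1)(x - 4), a product of distinct linear factors, so it is diagonalizable, while the 4x4 example has minimal polynomial (x - 1)^2 and is not.

    import sympy as sp

    A3 = sp.Matrix([[2, 1, 1], [1, 2, 1], [1, 1, 2]])
    A4 = sp.Matrix([[ 2,  1,  1,  1],
                    [-1,  0, -1, -1],
                    [ 1,  1,  2,  1],
                    [-1, -1, -1,  0]])

    # (A3 - I)(A3 - 4I) = 0, so mA(x) = (x - 1)(x - 4) has distinct roots.
    print((A3 - sp.eye(3)) * (A3 - 4 * sp.eye(3)) == sp.zeros(3, 3))   # True
    print(A3.is_diagonalizable())                                      # True

    # (A4 - I)^2 = 0 but A4 - I is not 0, so mA(x) = (x - 1)^2 has a repeated root.
    print((A4 - sp.eye(4))**2 == sp.zeros(4, 4))                       # True
    print(A4 - sp.eye(4) == sp.zeros(4, 4))                            # False
    print(A4.is_diagonalizable())                                      # False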
