Matrix calculus
From too much study, and from extreme passion, cometh madnesse.
— Isaac Newton [168, §5]
D.1 Directional derivative, Taylor series

D.1.1 Gradients

The gradient of differentiable real function f(x) : R^K → R with respect to its vector argument is defined

∇f(x) ≜ [ ∂f(x)/∂x_1 ; ∂f(x)/∂x_2 ; ⋮ ; ∂f(x)/∂x_K ] ∈ R^K    (1860)
while the second-order gradient of the twice differentiable real function, with respect to its vector argument, is traditionally called the Hessian;

∇²f(x) ≜
[ ∂²f(x)/∂x_1²     ∂²f(x)/∂x_1∂x_2   …   ∂²f(x)/∂x_1∂x_K
  ∂²f(x)/∂x_2∂x_1  ∂²f(x)/∂x_2²      …   ∂²f(x)/∂x_2∂x_K
  ⋮                                      ⋮
  ∂²f(x)/∂x_K∂x_1  ∂²f(x)/∂x_K∂x_2   …   ∂²f(x)/∂x_K² ] ∈ S^K    (1861)
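The entrywise definitions (1860) and (1861) lend themselves to a finite-difference sketch; here on the quadratic f(x) = x^T A x, whose gradient is (A + A^T)x and whose Hessian is A + A^T. The matrix A and point x below are arbitrary test data, not from the text.

```python
import numpy as np

# Finite-difference check of the gradient (1860) and Hessian (1861)
# definitions on f(x) = x^T A x, whose closed forms are (A + A^T) x
# and A + A^T.  A and x are arbitrary test data (not from the text).
rng = np.random.default_rng(0)
K = 4
A = rng.standard_normal((K, K))
x = rng.standard_normal(K)
f = lambda v: v @ A @ v
e = np.eye(K)

h = 1e-5
grad_fd = np.array([(f(x + h*e[i]) - f(x - h*e[i])) / (2*h) for i in range(K)])
grad_cf = (A + A.T) @ x                     # closed-form gradient

h2 = 1e-4
hess_fd = np.array([[(f(x + h2*(e[i] + e[j])) - f(x + h2*(e[i] - e[j]))
                     - f(x - h2*(e[i] - e[j])) + f(x - h2*(e[i] + e[j])))
                    / (4*h2*h2) for j in range(K)] for i in range(K)])
hess_cf = A + A.T                           # closed-form Hessian
```

Because f is quadratic, the central differences are exact up to rounding, so the agreement is tight.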
Dattorro, Convex Optimization Euclidean Distance Geometry 2, Moo, v2015.07.21.
The gradient of vector-valued function v(x) : R → R^N on real domain is a row vector

∇v(x) ≜ [ ∂v_1(x)/∂x   ∂v_2(x)/∂x   …   ∂v_N(x)/∂x ] ∈ R^N    (1862)

while the second-order gradient is

∇²v(x) ≜ [ ∂²v_1(x)/∂x²   ∂²v_2(x)/∂x²   …   ∂²v_N(x)/∂x² ] ∈ R^N    (1863)
The gradient of vector-valued function h(x) : R^K → R^N on vector domain is

∇h(x) ≜
[ ∂h_1(x)/∂x_1   ∂h_2(x)/∂x_1   …   ∂h_N(x)/∂x_1
  ∂h_1(x)/∂x_2   ∂h_2(x)/∂x_2   …   ∂h_N(x)/∂x_2
  ⋮
  ∂h_1(x)/∂x_K   ∂h_2(x)/∂x_K   …   ∂h_N(x)/∂x_K ]
= [ ∇h_1(x)   ∇h_2(x)   …   ∇h_N(x) ] ∈ R^{K×N}    (1864)
while the second-order gradient has a three-dimensional representation;

∇²h(x) ≜
[ ∇ ∂h_1(x)/∂x_1   ∇ ∂h_2(x)/∂x_1   …   ∇ ∂h_N(x)/∂x_1
  ⋮
  ∇ ∂h_1(x)/∂x_K   ∇ ∂h_2(x)/∂x_K   …   ∇ ∂h_N(x)/∂x_K ]
= [ ∇²h_1(x)   ∇²h_2(x)   …   ∇²h_N(x) ] ∈ R^{K×N×K}    (1865)
where the gradient of each real entry is with respect to vector x as in (1860).
The gradient of real function g(X) : R^{K×L} → R on matrix domain is

∇g(X) ≜
[ ∂g(X)/∂X_11   ∂g(X)/∂X_12   …   ∂g(X)/∂X_1L
  ∂g(X)/∂X_21   ∂g(X)/∂X_22   …   ∂g(X)/∂X_2L
  ⋮
  ∂g(X)/∂X_K1   ∂g(X)/∂X_K2   …   ∂g(X)/∂X_KL ] ∈ R^{K×L}

= [ ∇_{X(:,1)} g(X)   ∇_{X(:,2)} g(X)   …   ∇_{X(:,L)} g(X) ] ∈ R^{K×1×L}    (1866)
where gradient ∇_{X(:,i)} is with respect to the i-th column of X. The strange appearance of (1866) in R^{K×1×L} is meant to suggest a third dimension perpendicular to the page (not a diminished forward or backward dimension).
D.1 The word matrix comes from the Latin for womb; related to the prefix matri- derived from mater, meaning mother.
while the second-order gradient is

∇²g(X) ≜
[ ∇ ∂g(X)/∂X_11   ∇ ∂g(X)/∂X_12   …   ∇ ∂g(X)/∂X_1L
  ∇ ∂g(X)/∂X_21   ∇ ∂g(X)/∂X_22   …   ∇ ∂g(X)/∂X_2L
  ⋮
  ∇ ∂g(X)/∂X_K1   ∇ ∂g(X)/∂X_K2   …   ∇ ∂g(X)/∂X_KL ] ∈ R^{K×L×K×L}

= [ ∇∇_{X(:,1)} g(X)   ∇∇_{X(:,2)} g(X)   …   ∇∇_{X(:,L)} g(X) ] ∈ R^{K×1×L×K×L}    (1867)
The gradient of vector-valued function g(X) : R^{K×L} → R^N on matrix domain is a cubix

∇g(X) ≜ [ ∇g_1(X)   ∇g_2(X)   …   ∇g_N(X) ] ∈ R^{K×N×L}    (1868)

while the second-order gradient is

∇²g(X) ≜ [ ∇²g_1(X)   ∇²g_2(X)   …   ∇²g_N(X) ] ∈ R^{K×N×L×K×L}    (1869)
The gradient of matrix-valued function g(X) : R^{K×L} → R^{M×N} on matrix domain is a quartix

∇g(X) ≜
[ ∇g_11(X)   ∇g_12(X)   …   ∇g_1N(X)
  ∇g_21(X)   ∇g_22(X)   …   ∇g_2N(X)
  ⋮
  ∇g_M1(X)   ∇g_M2(X)   …   ∇g_MN(X) ] ∈ R^{M×N×K×L}    (1870)
while the second-order gradient is

∇²g(X) ≜
[ ∇²g_11(X)   ∇²g_12(X)   …   ∇²g_1N(X)
  ⋮
  ∇²g_M1(X)   ∇²g_M2(X)   …   ∇²g_MN(X) ] ∈ R^{M×N×K×L×K×L}    (1871)

and so on.
D.1.2 Product rules for matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable f(X) and g(X),

∇_X (f(X)^T g(X)) = ∇_X(f) g + ∇_X(g) f    (1872)

while, for a constant matrix Z of compatible dimension,

∇_X tr(f(X)^T g(X) Z) = …    (1873)

D.1.2.0.1 Example. Cubix.
Suppose f(X) : R^{2×2} → R² = X^T a and g(X) : R^{2×2} → R² = Xb. We wish to find the gradient of the product

∇_X (f(X)^T g(X)) = ∇_X (a^T X² b)    (1874)

using the product rule. Formula (1872) calls for

∇_X (a^T X² b) = ∇_X(X^T a) Xb + ∇_X(Xb) X^T a    (1875)

Consider the first of the two terms:

∇_X(f) g = ∇_X(X^T a) Xb = [ ∇(X^T a)_1   ∇(X^T a)_2 ] Xb    (1876)

The gradient of X^T a is a cubix in R^{2×1×2} whose two slices are the 2×2 gradients of the entries of X^T a:

∇(X^T a)_1 = [ ∂(X^T a)_1/∂X_11   ∂(X^T a)_1/∂X_12 ; ∂(X^T a)_1/∂X_21   ∂(X^T a)_1/∂X_22 ] ,
∇(X^T a)_2 = [ ∂(X^T a)_2/∂X_11   ∂(X^T a)_2/∂X_12 ; ∂(X^T a)_2/∂X_21   ∂(X^T a)_2/∂X_22 ]    (1877)

Because gradient of the product (1874) requires total change with respect to change in each entry of matrix X, the Xb vector must make an inner product with each vector in the second dimension of the cubix;

∇_X(X^T a) Xb = [ [ a_1  0 ; a_2  0 ] , [ 0  a_1 ; 0  a_2 ] ] [ b_1 X_11 + b_2 X_12 ; b_1 X_21 + b_2 X_22 ] ∈ R^{2×1×2}
= a b^T X^T ∈ R^{2×2}    (1878)

where the cubix appears as a complete 2×2×2 matrix. In like manner, for the second term ∇_X(g) f,

∇_X(Xb) X^T a = [ [ b_1  b_2 ; 0  0 ] , [ 0  0 ; b_1  b_2 ] ] [ X_11 a_1 + X_21 a_2 ; X_12 a_1 + X_22 a_2 ] ∈ R^{2×1×2}
= X^T a b^T ∈ R^{2×2}    (1879)

The solution:

∇_X (a^T X² b) = a b^T X^T + X^T a b^T    (1880)
□
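The solution (1880) lends itself to a quick entrywise spot-check with central differences; the vectors a, b and matrix X below are arbitrary test data.

```python
import numpy as np

# Entrywise central-difference check of (1880):
# the gradient of a^T X^2 b equals a b^T X^T + X^T a b^T.
rng = np.random.default_rng(1)
n = 3
a = rng.standard_normal(n)
b = rng.standard_normal(n)
X = rng.standard_normal((n, n))
g = lambda M: a @ M @ M @ b
h = 1e-6
grad_fd = np.zeros((n, n))
for k in range(n):
    for l in range(n):
        E = np.zeros((n, n)); E[k, l] = h
        grad_fd[k, l] = (g(X + E) - g(X - E)) / (2*h)   # d(a^T X^2 b)/dX_kl
grad_cf = np.outer(a, b) @ X.T + X.T @ np.outer(a, b)   # (1880)
```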
Kronecker product

A partial remedy for venturing into hyperdimensional matrix representations, such as the cubix or quartix, is to first vectorize matrices as in (37). This device gives rise to the Kronecker product of matrices, a.k.a tensor product. Although its definition sees reversal in the literature, [344, §2.1] we adopt: for A ∈ R^{m×n} and B ∈ R^{p×q}

B ⊗ A ≜
[ B_11 A   B_12 A   …   B_1q A
  ⋮
  B_p1 A   B_p2 A   …   B_pq A ] ∈ R^{pm×qn}    (1881)

One advantage of vectorization is a conventional two-dimensional matrix representation for the second-order gradient of a real function of vectorized matrix. For example, for square A, B, X ∈ R^{n×n},

∇²_{vec X} tr(AXBX^T) = ∇²_{vec X} (vec X)^T (B^T ⊗ A) vec X = B^T ⊗ A + B ⊗ A^T ∈ R^{n²×n²}    (1882)
A disadvantage is a large new but known set of algebraic rules (§A.1.1) and the fact that its mere use does not generally guarantee a two-dimensional matrix representation of gradients.
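Identity (1882) can be checked against a finite-difference Hessian, using column-stacking vectorization and NumPy's Kronecker product (whose block layout matches the B-first convention of (1881)); the test matrices are arbitrary.

```python
import numpy as np

# Check of (1882): the Hessian of tr(A X B X^T) with respect to vec X
# (column-stacking) is kron(B.T, A) + kron(B, A.T) in NumPy's layout,
# matching the B-first Kronecker convention of (1881).
rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
f = lambda M: np.trace(A @ M @ B @ M.T)
unvec = lambda v: v.reshape((n, n), order='F')   # inverse of column-stacking vec
m = n * n
I = np.eye(m)
h = 1e-4
hess_fd = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        Ei, Ej = unvec(I[i]), unvec(I[j])
        hess_fd[i, j] = (f(X + h*(Ei + Ej)) - f(X + h*(Ei - Ej))
                         - f(X - h*(Ei - Ej)) + f(X - h*(Ei + Ej))) / (4*h*h)
hess_cf = np.kron(B.T, A) + np.kron(B, A.T)
```

Since tr(AXBX^T) is quadratic in the entries of X, the finite differences are exact up to rounding.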
Another application of the Kronecker product is to reverse order of appearance in a matrix product: Suppose we wish to weight the columns of a matrix S ∈ R^{M×N}, for example, by respective entries w_i from the main diagonal in

W ≜
[ w_1        0
       ⋱
  0^T       w_N ] ∈ S^N    (1883)

A conventional means is multiplication by the diagonal matrix from the right:

S W = [ S(:,1) w_1   S(:,2) w_2   …   S(:,N) w_N ] ∈ R^{M×N}    (1884)

To reverse product order such that diagonal matrix W instead appears to the left of S: for I ∈ S^M (Law)

S W = (δ(W)^T ⊗ I)
[ S(:,1)   0        …    0
  0        S(:,2)   ⋱
  ⋮         ⋱        ⋱    0
  0        …        0    S(:,N) ] ∈ R^{M×N}    (1885)

where δ(W) ∈ R^N denotes the main diagonal of W. To instead weight the rows of S via diagonal matrix W ∈ S^M,

W S =
[ S(1,:)   0        …    0
  0        S(2,:)   ⋱
  ⋮         ⋱        ⋱
  0        …        0    S(M,:) ] (δ(W) ⊗ I) ∈ R^{M×N}    (1886)

For any matrices of like size, S, Y ∈ R^{M×N},

S ∘ Y = [ δ(Y(:,1))   δ(Y(:,2))   …   δ(Y(:,N)) ]
[ S(:,1)   0        …    0
  0        S(:,2)   ⋱
  ⋮         ⋱        ⋱    0
  0        …        0    S(:,N) ] ∈ R^{M×N}    (1887)

which converts a Hadamard product ∘ into a standard matrix product. In the special case that S = s and Y = y are vectors in R^M,

s ∘ y = δ(s) y    (1888)

s^T ⊗ y = y s^T ,   s ⊗ y^T = s y^T    (1889)
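A small numerical sketch of (1887) and (1888): the Hadamard product is reproduced as an ordinary matrix product of a row of diagonal matrices with the block-diagonal stacking of columns; sizes and data are arbitrary.

```python
import numpy as np

# Check of (1887) and (1888): S∘Y equals [δ(Y(:,1)) ... δ(Y(:,N))]
# times the block-diagonal stacking of the columns of S, and for
# vectors s∘y = δ(s) y, where δ builds a diagonal matrix.
rng = np.random.default_rng(3)
M, N = 4, 3
S = rng.standard_normal((M, N))
Y = rng.standard_normal((M, N))

D = np.hstack([np.diag(Y[:, j]) for j in range(N)])   # M x MN row of diagonals
Blk = np.zeros((M * N, N))                            # MN x N block diagonal
for j in range(N):
    Blk[j*M:(j+1)*M, j] = S[:, j]
had = D @ Blk                                         # should equal S * Y

s = rng.standard_normal(M)
y = rng.standard_normal(M)
sv = np.diag(s) @ y                                   # δ(s) y, should equal s * y
```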
D.1.3 Chain rules for composite matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable f(X) and g(Y), [43, §1.1]

∇_X g(f(X)^T) = ∇_X f^T ∇_f g    (1890)

∇²_X g(f(X)^T) = ∇_X (∇_X f^T ∇_f g) = ∇²_X f ∇_f g + ∇_X f^T ∇²_f g ∇_X f    (1891)
D.1.3.1 Two arguments

∇_X g(f(X)^T, h(X)^T) = ∇_X f^T ∇_f g + ∇_X h^T ∇_h g    (1892)

D.1.3.1.1 Example. Two arguments.

g(f(x)^T, h(x)^T) = (f(x) + h(x))^T A (f(x) + h(x))    (1893)

f(x) = [ x_1 ; εx_2 ] ,   h(x) = [ εx_1 ; x_2 ]    (1894)

∇_x g(f(x)^T, h(x)^T) = [ 1  0 ; 0  ε ] (A + A^T)(f + h) + [ ε  0 ; 0  1 ] (A + A^T)(f + h)    (1895)

= (1 + ε)(A + A^T) ( [ x_1 ; εx_2 ] + [ εx_1 ; x_2 ] )    (1896)

= (1 + ε)² (A + A^T) [ x_1 ; x_2 ]    (1897)
□
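The chain rule (1890) can be sanity-checked on a small composite; the particular f and g below are illustrative choices (not from the text), and the gradient of a vector function is written as the transposed Jacobian per (1864).

```python
import numpy as np

# Check of the chain rule (1890) on an illustrative composite:
# u = f(x) = [x1^2, x1*x2] and g(u) = u^T u.  In this appendix's
# convention, the gradient of a vector function is the transposed
# Jacobian (1864), so grad_x g(f(x)) = (grad_x f)(grad_f g).
rng = np.random.default_rng(4)
x = rng.standard_normal(2)
f = lambda v: np.array([v[0]**2, v[0]*v[1]])
g = lambda u: u @ u
grad_f = np.array([[2*x[0], x[1]],     # column j is the gradient of f_j
                   [0.0,    x[0]]])
grad_g = 2 * f(x)                      # gradient of g at u = f(x)
chain = grad_f @ grad_g                # right-hand side of (1890)

h = 1e-6
e = np.eye(2)
fd = np.array([(g(f(x + h*e[i])) - g(f(x - h*e[i]))) / (2*h) for i in range(2)])
```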
D.1.4
Assume that a differentiable function g(X) : RKL RM N has continuous first- and
second-order gradients g and 2 g over dom g which is an open set. We seek
simple expressions for the first and second directional derivatives in direction Y RKL :
Y
respectively, dg RM N and dg 2 RM N .
Assuming that the limit exists, we may state the partial derivative of the mn th entry
of g with respect to the kl th entry of X ;
gmn (X + t ek eT
gmn (X)
l ) gmn (X)
= lim
R
t0
Xkl
t
(1898)
where e_k is the k-th standard basis vector in R^K while e_l is the l-th standard basis vector in R^L. The total number of partial derivatives equals KLMN, while the gradient is defined in their terms; the mn-th entry of the gradient is

∇g_mn(X) =
[ ∂g_mn(X)/∂X_11   ∂g_mn(X)/∂X_12   …   ∂g_mn(X)/∂X_1L
  ∂g_mn(X)/∂X_21   ∂g_mn(X)/∂X_22   …   ∂g_mn(X)/∂X_2L
  ⋮
  ∂g_mn(X)/∂X_K1   ∂g_mn(X)/∂X_K2   …   ∂g_mn(X)/∂X_KL ] ∈ R^{K×L}    (1899)
while the gradient of g is a quartix

∇g(X) =
[ ∇g_11(X)   ∇g_12(X)   …   ∇g_1N(X)
  ∇g_21(X)   ∇g_22(X)   …   ∇g_2N(X)
  ⋮
  ∇g_M1(X)   ∇g_M2(X)   …   ∇g_MN(X) ] ∈ R^{M×N×K×L}    (1900)
Equivalently, arranging the M×N blocks of partial derivatives ∂g(X)/∂X_kl by the indices of X,

∇g(X)^{T1} =
[ ∂g(X)/∂X_11   ∂g(X)/∂X_12   …   ∂g(X)/∂X_1L
  ∂g(X)/∂X_21   ∂g(X)/∂X_22   …   ∂g(X)/∂X_2L
  ⋮
  ∂g(X)/∂X_K1   ∂g(X)/∂X_K2   …   ∂g(X)/∂X_KL ] ∈ R^{K×L×M×N}    (1901)
Weighting each partial change by the corresponding entry of a direction matrix Y and summing,

tr( ∇g_mn(X)^T Y ) = Σ_{k,l} ∂g_mn(X)/∂X_kl · Y_kl    (1903)
= Σ_{k,l} lim_{t→0} ( g_mn(X + t Y_kl e_k e_l^T) − g_mn(X) ) / t    (1904)
= lim_{t→0} ( g_mn(X + t Y) − g_mn(X) ) / t    (1905)
= d/dt |_{t=0} g_mn(X + t Y)    (1906)
The directional derivative of g in direction Y is accordingly defined entrywise:

→Y dg(X) ≜
[ dg_11(X)   dg_12(X)   …   dg_1N(X)
  dg_21(X)   dg_22(X)   …   dg_2N(X)
  ⋮
  dg_M1(X)   dg_M2(X)   …   dg_MN(X) ] ∈ R^{M×N}    (1907)

=
[ tr(∇g_11(X)^T Y)   tr(∇g_12(X)^T Y)   …   tr(∇g_1N(X)^T Y)
  tr(∇g_21(X)^T Y)   tr(∇g_22(X)^T Y)   …   tr(∇g_2N(X)^T Y)
  ⋮
  tr(∇g_M1(X)^T Y)   tr(∇g_M2(X)^T Y)   …   tr(∇g_MN(X)^T Y) ]

=
[ Σ_{k,l} ∂g_11(X)/∂X_kl Y_kl   …   Σ_{k,l} ∂g_1N(X)/∂X_kl Y_kl
  Σ_{k,l} ∂g_21(X)/∂X_kl Y_kl   …   Σ_{k,l} ∂g_2N(X)/∂X_kl Y_kl
  ⋮
  Σ_{k,l} ∂g_M1(X)/∂X_kl Y_kl   …   Σ_{k,l} ∂g_MN(X)/∂X_kl Y_kl ]
= Σ_{k,l} ∂g(X)/∂X_kl Y_kl    (1908)
Yet for all X ∈ dom g, any Y ∈ R^{K×L}, and some open interval of t ∈ R,

g(X + t Y) = g(X) + t →Y dg(X) + o(t²)    (1909)

which is the first-order Taylor series expansion about X. [235, §18.4] [166, §2.3.4] Differentiation with respect to t and subsequent t-zeroing isolates the second term of the expansion. Thus differentiating and zeroing g(X + t Y) in t is an operation equivalent to individually differentiating and zeroing every entry g_mn(X + t Y) as in (1906). So the directional derivative of g(X) : R^{K×L} → R^{M×N} in any direction Y ∈ R^{K×L}, evaluated at X ∈ dom g, becomes

→Y dg(X) = d/dt |_{t=0} g(X + t Y) ∈ R^{M×N}    (1910)

[294, §2.1, §5.4.5] [35, §6.3.1] which is simplest. In case of a real function g(X) : R^{K×L} → R,

→Y dg(X) = tr( ∇g(X)^T Y )    (1932)
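Equality of the trace form (1932) with the t-derivative form (1910) is easy to confirm numerically; a sketch with the real function g(X) = tr(X^T X), whose gradient 2X is standard (the data are arbitrary):

```python
import numpy as np

# Check that the trace form (1932) of the directional derivative agrees
# with the t-derivative form (1910), for g(X) = tr(X^T X) with
# gradient 2X.  X and the direction Y are arbitrary test data.
rng = np.random.default_rng(5)
X = rng.standard_normal((3, 4))
Y = rng.standard_normal((3, 4))
g = lambda M: np.trace(M.T @ M)
dd_trace = np.trace((2*X).T @ Y)             # tr(grad g(X)^T Y)
h = 1e-6
dd_fd = (g(X + h*Y) - g(X - h*Y)) / (2*h)    # d/dt g(X + tY) at t = 0
```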
[Figure 174: plane slice of a real convex bowl-shaped function f(x) along a line {α + t y | t ∈ R} through its domain; labels include the point (α, f(α)), the one-dimensional function f(α + t y), and gradient ∇_x f(α).]
In case of a real function g(X) : R^K → R having vector argument,

→Y dg(X) = ∇g(X)^T Y    (1935)
D.1.4.1 Interpretation of directional derivative

In the case of any differentiable real function g(X) : R^{K×L} → R, the directional derivative of g(X) at X in any direction Y yields the slope of g along the line {X + t Y | t ∈ R} through its domain, evaluated at t = 0. For higher-dimensional functions, by (1907), this slope interpretation can be applied to each entry of the directional derivative.

Figure 174, for example, shows a plane slice of a real convex bowl-shaped function f(x) along a line {α + t y | t ∈ R} through its domain. The slice reveals a one-dimensional real function of t: f(α + t y). The directional derivative at x = α in direction y is the slope of f(α + t y) with respect to t at t = 0. In the case of a real function having vector argument h(x) : R^K → R, its directional derivative in the normalized direction of its gradient is the gradient magnitude. (1935) For a real function of real variable, the directional derivative evaluated at any point in the function domain is just the slope of that function there scaled by the real direction. (confer §3.6)

Directional derivative generalizes our one-dimensional notion of derivative to a multidimensional domain. When direction Y coincides with a member of the standard Cartesian basis e_k e_l^T (60), then a single partial derivative ∂g(X)/∂X_kl is obtained from directional derivative (1908); such is each entry of gradient ∇g(X) in equalities (1932) and (1935), for example.
D.1.4.1.1 Theorem. Directional derivative optimality condition. [266, §7.4]
Suppose f(X) : R^{K×L} → R is minimized on convex set C ⊆ R^{K×L} by X⋆, and suppose the directional derivative of f exists there. Then, for all X ∈ C,

→(X − X⋆) df(X⋆) ≥ 0    (1911)
⋄
D.1.4.1.2 Example. Simple bowl.
Bowl function (Figure 174)

f(x) ≜ (x − a)^T (x − a) − b : R^K → R    (1912)

has function offset −b ∈ R and axis of revolution at x = a. By (1935), its directional derivative in any direction y is ∇_x f(x)^T y; over normalized directions this is maximized by the normalized gradient. Such a vector is

y = ∇_x f(x) / ‖∇_x f(x)‖    (1913)

well defined wherever x ≠ a, since

∇_x f(x) = 2(x − a)    (1914)

In that direction, the directional derivative equals the gradient magnitude:

→y df(x) = ∇_x f(x)^T ∇_x f(x) / ‖∇_x f(x)‖ = ‖∇_x f(x)‖    (1915)
= 2‖x − a‖    (1916)
□
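A numerical sketch of the bowl example: the slope along the normalized gradient direction matches the gradient magnitude 2‖x − a‖ (all data arbitrary).

```python
import numpy as np

# Numerical sketch of the simple bowl (1912)-(1916): for
# f(x) = (x-a)^T (x-a) - b, the slope along the normalized gradient
# direction equals the gradient magnitude 2||x - a||.
rng = np.random.default_rng(6)
K = 5
a = rng.standard_normal(K)
b = 1.7                                     # arbitrary offset
x = rng.standard_normal(K)
f = lambda v: (v - a) @ (v - a) - b
grad = 2 * (x - a)                          # (1914)
y = grad / np.linalg.norm(grad)             # normalized gradient (1913)
h = 1e-6
slope = (f(x + h*y) - f(x - h*y)) / (2*h)   # slope along {x + t y} at t = 0
```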
D.1.5 Second directional derivative

By similar argument, it so happens that the second directional derivative is equally simple. Given g(X) : R^{K×L} → R^{M×N} on open domain,

∇ ∂g_mn(X)/∂X_kl = ∂∇g_mn(X)/∂X_kl =
[ ∂²g_mn(X)/∂X_kl∂X_11   ∂²g_mn(X)/∂X_kl∂X_12   …   ∂²g_mn(X)/∂X_kl∂X_1L
  ∂²g_mn(X)/∂X_kl∂X_21   ∂²g_mn(X)/∂X_kl∂X_22   …   ∂²g_mn(X)/∂X_kl∂X_2L
  ⋮
  ∂²g_mn(X)/∂X_kl∂X_K1   ∂²g_mn(X)/∂X_kl∂X_K2   …   ∂²g_mn(X)/∂X_kl∂X_KL ] ∈ R^{K×L}    (1917)

∇²g_mn(X) =
[ ∇ ∂g_mn(X)/∂X_11   ∇ ∂g_mn(X)/∂X_12   …   ∇ ∂g_mn(X)/∂X_1L
  ∇ ∂g_mn(X)/∂X_21   ∇ ∂g_mn(X)/∂X_22   …   ∇ ∂g_mn(X)/∂X_2L
  ⋮
  ∇ ∂g_mn(X)/∂X_K1   ∇ ∂g_mn(X)/∂X_K2   …   ∇ ∂g_mn(X)/∂X_KL ] ∈ R^{K×L×K×L}    (1918)
∇²g(X) =
[ ∇²g_11(X)   ∇²g_12(X)   …   ∇²g_1N(X)
  ∇²g_21(X)   ∇²g_22(X)   …   ∇²g_2N(X)
  ⋮
  ∇²g_M1(X)   ∇²g_M2(X)   …   ∇²g_MN(X) ] ∈ R^{M×N×K×L×K×L}    (1919)

∇²g(X)^{T1} =
[ ∂∇g(X)/∂X_11   ∂∇g(X)/∂X_12   …   ∂∇g(X)/∂X_1L
  ∂∇g(X)/∂X_21   ∂∇g(X)/∂X_22   …   ∂∇g(X)/∂X_2L
  ⋮
  ∂∇g(X)/∂X_K1   ∂∇g(X)/∂X_K2   …   ∂∇g(X)/∂X_KL ] ∈ R^{K×L×M×N×K×L}    (1920)

∇²g(X)^{T2} =
[ ∇ ∂g(X)/∂X_11   ∇ ∂g(X)/∂X_12   …   ∇ ∂g(X)/∂X_1L
  ⋮
  ∇ ∂g(X)/∂X_K1   ∇ ∂g(X)/∂X_K2   …   ∇ ∂g(X)/∂X_KL ] ∈ R^{K×L×K×L×M×N}    (1921)
Assuming the limits exist, we may state the partial derivative of the mn-th entry of g with respect to the kl-th and ij-th entries of X:

∂²g_mn(X)/∂X_kl∂X_ij =
lim_{τ,t→0} [ g_mn(X + t e_k e_l^T + τ e_i e_j^T) − g_mn(X + t e_k e_l^T) − ( g_mn(X + τ e_i e_j^T) − g_mn(X) ) ] / (τ t)    (1922)

Weighting by the direction entries and summing, analogously to (1903)-(1906),

Σ_{i,j} Σ_{k,l} ∂²g_mn(X)/∂X_kl∂X_ij Y_kl Y_ij
= Σ_{i,j} Σ_{k,l} lim_{τ,t→0} [ g_mn(X + t Y_kl e_k e_l^T + τ Y_ij e_i e_j^T) − g_mn(X + t Y_kl e_k e_l^T) − ( g_mn(X + τ Y_ij e_i e_j^T) − g_mn(X) ) ] / (τ t)    (1923)
= tr( ∇_X tr( ∇g_mn(X)^T Y )^T Y )    (1924)
= lim_{t→0} [ tr( ∇g_mn(X + t Y)^T Y ) − tr( ∇g_mn(X)^T Y ) ] / t    (1925)
= d²/dt² |_{t=0} g_mn(X + t Y)    (1926)
The second directional derivative of g in direction Y is then defined entrywise:

→Y dg²(X) ≜
[ d²g_11(X)   d²g_12(X)   …   d²g_1N(X)
  d²g_21(X)   d²g_22(X)   …   d²g_2N(X)
  ⋮
  d²g_M1(X)   d²g_M2(X)   …   d²g_MN(X) ] ∈ R^{M×N}    (1927)

=
[ tr(∇_X tr(∇g_11(X)^T Y)^T Y)   …   tr(∇_X tr(∇g_1N(X)^T Y)^T Y)
  tr(∇_X tr(∇g_21(X)^T Y)^T Y)   …   tr(∇_X tr(∇g_2N(X)^T Y)^T Y)
  ⋮
  tr(∇_X tr(∇g_M1(X)^T Y)^T Y)   …   tr(∇_X tr(∇g_MN(X)^T Y)^T Y) ]

=
[ Σ_{i,j} Σ_{k,l} ∂²g_11(X)/∂X_kl∂X_ij Y_kl Y_ij   …   Σ_{i,j} Σ_{k,l} ∂²g_1N(X)/∂X_kl∂X_ij Y_kl Y_ij
  ⋮
  Σ_{i,j} Σ_{k,l} ∂²g_M1(X)/∂X_kl∂X_ij Y_kl Y_ij   …   Σ_{i,j} Σ_{k,l} ∂²g_MN(X)/∂X_kl∂X_ij Y_kl Y_ij ]    (1928)
from which it follows

→Y dg²(X) = Σ_{i,j} Σ_{k,l} ∂²g(X)/∂X_kl∂X_ij Y_kl Y_ij = Σ_{i,j} ∂/∂X_ij ( →Y dg(X) ) Y_ij    (1929)
Yet for all X ∈ dom g, any Y ∈ R^{K×L}, and some open interval of t ∈ R,

g(X + t Y) = g(X) + t →Y dg(X) + (1/2!) t² →Y dg²(X) + o(t³)    (1930)

which is the second-order Taylor series expansion about X. [235, §18.4] [166, §2.3.4] Differentiating twice with respect to t and subsequent t-zeroing isolates the third term of the expansion. Thus twice differentiating and zeroing g(X + t Y) in t is an operation equivalent to individually twice differentiating and zeroing every entry g_mn(X + t Y) as in (1927). So the second directional derivative of g(X) : R^{K×L} → R^{M×N} becomes [294, §2.1, §5.4.5] [35, §6.3.1]
→Y dg²(X) = d²/dt² |_{t=0} g(X + t Y) ∈ R^{M×N}    (1931)

which is again simplest. (confer (1910)) The directional derivative retains the dimensions of g.
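A quick numerical sketch of (1931) for the real function g(X) = tr(X^T A X), whose second directional derivative in direction Y is 2 tr(Y^T A Y) at every X (test data arbitrary):

```python
import numpy as np

# Check of (1931) on the quadratic g(X) = tr(X^T A X): the second
# directional derivative in direction Y is d^2/dt^2 g(X+tY) at t = 0,
# which here equals 2 tr(Y^T A Y) for every X.
rng = np.random.default_rng(7)
n = 3
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))
g = lambda M: np.trace(M.T @ A @ M)
h = 1e-4
d2_fd = (g(X + h*Y) - 2*g(X) + g(X - h*Y)) / (h*h)   # second central difference
d2_cf = 2 * np.trace(Y.T @ A @ Y)
```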
D.1.6 Directional derivative expressions

In the case of a real function g(X) : R^{K×L} → R, all its directional derivatives are in R:

→Y dg(X) = tr( ∇g(X)^T Y )    (1932)
→Y dg²(X) = tr( ∇_X tr( ∇g(X)^T Y )^T Y ) = tr( ∇_X ( →Y dg(X) )^T Y )    (1933)
→Y dg³(X) = tr( ∇_X tr( ∇_X tr( ∇g(X)^T Y )^T Y )^T Y ) = tr( ∇_X ( →Y dg²(X) )^T Y )    (1934)

In the case g(X) : R^K → R has vector argument,

→Y dg(X) = ∇g(X)^T Y    (1935)
→Y dg²(X) = Y^T ∇²g(X) Y    (1936)
→Y dg³(X) = ∇_X ( Y^T ∇²g(X) Y )^T Y    (1937)

and so on.
D.1.7 Taylor series

Series expansion of the differentiable matrix-valued function g(X) about X, in direction Y ∈ R^{K×L}, takes the forms

g(X + μY) = g(X) + μ →Y dg(X) + (μ²/2!) →Y dg²(X) + (μ³/3!) →Y dg³(X) + o(μ⁴)    (1938)

g(X + Y) = g(X) + →Y dg(X) + (1/2!) →Y dg²(X) + (1/3!) →Y dg³(X) + o(‖Y‖⁴)    (1939)
which are third-order expansions about X. The mean value theorem from calculus is what ensures finite order of the series. [235] [43, §1.1] [42, App. A.5] [215, §0.4] These somewhat unbelievable formulae imply that a function can be determined over the whole of its domain by knowing its value and all its directional derivatives at a single point X.
D.1.7.0.1 Example. Inverse-matrix function.
Say g(Y) = Y^{-1}. From the tables in §D.2,

→Y dg(X) = d/dt |_{t=0} g(X + t Y) = −X^{-1} Y X^{-1}    (1940)
→Y dg²(X) = d²/dt² |_{t=0} g(X + t Y) = 2 X^{-1} Y X^{-1} Y X^{-1}    (1941)
→Y dg³(X) = d³/dt³ |_{t=0} g(X + t Y) = −6 X^{-1} Y X^{-1} Y X^{-1} Y X^{-1}    (1942)

Let's find the Taylor series expansion of g about X = I: Since g(I) = I, for ‖Y‖₂ < 1 (μ = 1 in (1938))

g(I + Y) = g(I) + →Y dg(I) + (1/2!) →Y dg²(I) + (1/3!) →Y dg³(I) + …    (1943)
= (I + Y)^{-1} = I − Y + Y² − Y³ + …    (1944)
□

(Had we instead set g(Y) = (I + Y)^{-1}, the equivalent expansion would have been about X = 0.)
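The directional derivative (1940) and the expansion (1944) can be confirmed numerically; the alternating series below is summed to many terms so the comparison is tight (test data arbitrary, scaled so ‖Z‖ < 1).

```python
import numpy as np

# Check of (1940) by central differences, and of the expansion (1944)
# about X = I by summing the alternating series to many terms.
rng = np.random.default_rng(8)
n = 3
inv = np.linalg.inv
X = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # safely invertible
Y = rng.standard_normal((n, n))
h = 1e-5
d1_fd = (inv(X + h*Y) - inv(X - h*Y)) / (2*h)
d1_cf = -inv(X) @ Y @ inv(X)                        # (1940)

Z = 0.1 * rng.standard_normal((n, n))               # small, so series converges
series = np.zeros((n, n))
term = np.eye(n)
for _ in range(60):                                 # I - Z + Z^2 - Z^3 + ...
    series += term
    term = term @ (-Z)
```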
D.1.8 Correspondence of gradient to derivative

From the foregoing expressions for directional derivative, we derive a relationship between gradient with respect to matrix X and derivative with respect to real variable t:

D.1.8.1 first-order

Removing evaluation at t = 0 from (1910), the directional derivative may be evaluated anywhere along a line {X + t Y | t ∈ R} intersecting dom g:

→Y dg(X + t Y) = d/dt g(X + t Y)    (1945)

In the general case g(X) : R^{K×L} → R^{M×N},

tr( ∇_X g_mn(X + t Y)^T Y ) = d/dt g_mn(X + t Y)    (1946)

In case of a real function g(X) : R^{K×L} → R,

tr( ∇_X g(X + t Y)^T Y ) = d/dt g(X + t Y)    (1947)

In case of a real function g(X) : R^K → R having vector argument,

∇_X g(X + t Y)^T Y = d/dt g(X + t Y)    (1948)
D.1.8.1.1 Example. Gradient.
g(X) = w^T X^T X w. Differentiating along the line,

d/dt g(X + t Y) = w^T ( Y^T X + X^T Y + 2t Y^T Y ) w    (1949)
= 2 w^T ( X^T Y + t Y^T Y ) w    (1950)
= 2 tr( w w^T ( X^T + t Y^T ) Y )    (1951)

So, by the equivalence (1947),

tr( ∇_X g(X + t Y)^T Y ) = 2 tr( w w^T ( X^T + t Y^T ) Y )    (1952)

At t = 0,

tr( ∇_X g(X)^T Y ) = 2 tr( w w^T X^T Y )    (1953)

for every direction Y, whence

∇_X g(X) = 2 X w w^T    (1954)
□
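A numerical sketch of the result (1954), checking the gradient of g(X) = w^T X^T X w entrywise (test data arbitrary):

```python
import numpy as np

# Entrywise check of (1954): the gradient of g(X) = w^T X^T X w
# is 2 X w w^T.
rng = np.random.default_rng(9)
n = 3
X = rng.standard_normal((n, n))
w = rng.standard_normal(n)
g = lambda M: w @ M.T @ M @ w
h = 1e-6
grad_fd = np.zeros((n, n))
for k in range(n):
    for l in range(n):
        E = np.zeros((n, n)); E[k, l] = h
        grad_fd[k, l] = (g(X + E) - g(X - E)) / (2*h)
grad_cf = 2 * X @ np.outer(w, w)
```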
D.1.8.2 second-order

Likewise removing the evaluation at t = 0 from (1931),

→Y dg²(X + t Y) = d²/dt² g(X + t Y)    (1955)

we can find a similar relationship between second-order gradient and second derivative: In the general case g(X) : R^{K×L} → R^{M×N}, from (1924) and (1927),

tr( ∇_X tr( ∇_X g_mn(X + t Y)^T Y )^T Y ) = d²/dt² g_mn(X + t Y)    (1956)

In case of a real function g(X) : R^{K×L} → R,

tr( ∇_X tr( ∇_X g(X + t Y)^T Y )^T Y ) = d²/dt² g(X + t Y)    (1957)

From (1936), the simpler case, where real function g(X) : R^K → R has vector argument,

Y^T ∇²_X g(X + t Y) Y = d²/dt² g(X + t Y)    (1958)
D.1.8.2.1 Example. Second-order gradient.
Let h(X) ≜ X^{-1}, with entries h_mn(X). By (1946) and (1910), for each entry,

tr( ∇h_mn(X)^T Y ) = d/dt |_{t=0} h_mn(X + t Y)    (1959)
= ( d/dt |_{t=0} h(X + t Y) )_mn    (1960)
= ( d/dt |_{t=0} (X + t Y)^{-1} )_mn    (1961)
= ( −(X + t Y)^{-1} Y (X + t Y)^{-1} |_{t=0} )_mn    (1962)
= −( X^{-1} Y X^{-1} )_mn    (1963)
Setting Y to a member of the standard basis { e_k e_l^T ∈ R^{K×K} | k, l = 1 … K }, and employing a property (39) of the trace function, we find

∇²g(X)_mnkl = tr( ∇h_mn(X)^T e_k e_l^T ) = ∇h_mn(X)_kl = −( X^{-1} e_k e_l^T X^{-1} )_mn    (1964)

∇²g(X)_kl = ∇h(X)_kl = −X^{-1} e_k e_l^T X^{-1} ∈ R^{K×K}    (1965)
□
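Formula (1965) can be spot-checked per index pair (k, l) with central differences on the matrix inverse (test data arbitrary):

```python
import numpy as np

# Spot-check of (1965) for one index pair (k, l): the derivative of
# X^{-1} with respect to X_kl is -X^{-1} e_k e_l^T X^{-1}.
rng = np.random.default_rng(10)
n = 4
inv = np.linalg.inv
X = np.eye(n) + 0.2 * rng.standard_normal((n, n))   # safely invertible
k, l = 1, 2
h = 1e-6
E = np.zeros((n, n)); E[k, l] = h
fd = (inv(X + E) - inv(X - E)) / (2*h)
ek = np.zeros(n); ek[k] = 1.0
el = np.zeros(n); el[l] = 1.0
cf = -inv(X) @ np.outer(ek, el) @ inv(X)
```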
From all these first- and second-order expressions, we may generate new ones by evaluating both sides at arbitrary t (in some open interval), but only after the differentiation.
D.2 Tables of gradients and derivatives

Throughout, t, μ ∈ R, and

d/dx ≜ [ d/dx_1 ; ⋮ ; d/dx_k ]

The notations e^x, |x|, sgn x, x/y (Hadamard quotient), √x (entrywise square root), etcetera, are maps f : R^k → R^k that maintain dimension; e.g. (§A.1.1),

d/dx x^{-1} ≜ ∇_x 1^T δ(x)^{-1} 1    (1966)
For A a scalar or square matrix, we have the Taylor series [80, §3.6]

e^A ≜ Σ_{k=0}^∞ (1/k!) A^k    (1967)

from which, for symmetric A,

e^A ≻ 0 ,   A ∈ S^m    (1968)

det e^A = e^{tr A}    (1969)
D.2.1 algebraic
∇_x x = ∇_x x^T = I ∈ R^{k×k}
∇_X X = ∇_X X^T ≜ I ∈ R^{K×L×K×L}    (Identity)
∇_x (Ax − b) = A^T
∇_x (x^T A − b^T) = A
∇_x 1^T f(|Ax − b|) = A^T δ(sgn(Ax − b)) df(y)/dy |_{y=|Ax−b|}
∇_x ( x^T A x + 2x^T B y + y^T C y ) = (A + A^T) x + 2B y
∇_x (x + y)^T A (x + y) = (A + A^T)(x + y)
∇²_x ( x^T A x + 2x^T B y + y^T C y ) = A + A^T
∇_x a^T x^T x b = 2 x a^T b
∂X^{-1}/∂X_kl = −X^{-1} e_k e_l^T X^{-1} ,   confer (1901)(1965)
∇_X a^T X^T X b = X ( a b^T + b a^T )
∇_x a^T x^T x a = 2 x a^T a
∇_x a^T x x^T a = 2 a a^T x
∇_X a^T X X^T a = 2 a a^T X
∇_x a^T y x^T b = b a^T y
∇_X a^T Y X^T b = b a^T Y
∇_x a^T y^T x b = y b^T a
∇_X a^T Y^T X b = Y a b^T
∇_x a^T x y^T b = a b^T y
∇_X a^T X Y^T b = a b^T Y
∇_x a^T x^T y b = y a^T b
∇_X a^T X^T Y b = Y b a^T
algebraic continued

d/dt (X + t Y) = Y
d/dt B^T (X + t Y)^{-1} A = −B^T (X + t Y)^{-1} Y (X + t Y)^{-1} A
d/dt B^T (X + t Y)^{-T} A = −B^T (X + t Y)^{-T} Y^T (X + t Y)^{-T} A
d/dt B^T (X + t Y)^μ A = … ,   −1 ≤ μ ≤ 1, X, Y ∈ S^M_+
d²/dt² B^T (X + t Y)^{-1} A = 2 B^T (X + t Y)^{-1} Y (X + t Y)^{-1} Y (X + t Y)^{-1} A
d³/dt³ B^T (X + t Y)^{-1} A = −6 B^T (X + t Y)^{-1} Y (X + t Y)^{-1} Y (X + t Y)^{-1} Y (X + t Y)^{-1} A
d/dt (X + t Y)^T A (X + t Y) = Y^T A X + X^T A Y + 2t Y^T A Y
d²/dt² (X + t Y)^T A (X + t Y) = 2 Y^T A Y
d/dt ( (X + t Y)^T A (X + t Y) )^{-1} = −( (X + t Y)^T A (X + t Y) )^{-1} ( Y^T A X + X^T A Y + 2t Y^T A Y ) ( (X + t Y)^T A (X + t Y) )^{-1}
d/dt ( (X + t Y) A (X + t Y) ) = Y A X + X A Y + 2t Y A Y
d²/dt² ( (X + t Y) A (X + t Y) ) = 2 Y A Y
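One of the derivative identities above, d/dt B^T(X+tY)^{-1}A = −B^T(X+tY)^{-1}Y(X+tY)^{-1}A, checked numerically at an arbitrary t (X is kept well conditioned; all data arbitrary):

```python
import numpy as np

# Check of the table entry
# d/dt B^T (X+tY)^{-1} A = -B^T (X+tY)^{-1} Y (X+tY)^{-1} A.
rng = np.random.default_rng(13)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))
X = 5 * np.eye(n) + rng.standard_normal((n, n))   # well conditioned
inv = np.linalg.inv
t, h = 0.1, 1e-6
f = lambda s: B.T @ inv(X + s*Y) @ A
d_fd = (f(t + h) - f(t - h)) / (2*h)
d_cf = -B.T @ inv(X + t*Y) @ Y @ inv(X + t*Y) @ A
```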
D.2.2 trace Kronecker

∇²_{vec X} tr(AXBX^T) = B^T ⊗ A + B ⊗ A^T    (confer (1882))

D.2.3 trace
∇_x μ x = μ I
∇_X tr μX = ∇_X μ tr X = μ I
∇_x 1^T δ(x)^{-1} 1 = d/dx x^{-1} = −x^{-2}
∇_x 1^T δ(x)^{-1} y = −δ(x)^{-2} y
∇_X tr X^{-1} = −X^{-2T}
∇_X tr(X^{-1} Y) = ∇_X tr(Y X^{-1}) = −X^{-T} Y^T X^{-T}
d/dx x^μ = μ x^{μ−1}
∇_X tr X^μ = μ X^{μ−1} ,   X ∈ S^M
∇_X tr X^j = j X^{(j−1)T}
∇_x (b − a^T x)^{-1} = (b − a^T x)^{-2} a
∇_x (b − a^T x)^μ = −μ (b − a^T x)^{μ−1} a
∇_x x^T y = ∇_x y^T x = y
∇_X tr( (B − AX)^{-1} ) = ( (B − AX)^{-2} A )^T
∇_X tr(X^T Y) = ∇_X tr(Y X^T) = ∇_X tr(Y^T X) = ∇_X tr(X Y^T) = Y
∇_X tr(A X B X^T) = ∇_X tr(X B X^T A) = A^T X B^T + A X B
∇_X tr(A X B X) = ∇_X tr(X B X A) = A^T X^T B^T + B^T X^T A^T
∇_X tr(A X A X A X) = ∇_X tr(X A X A X A) = 3(A X A X A)^T
∇_X tr(Y X^k) = ∇_X tr(X^k Y) = Σ_{i=0}^{k−1} ( X^i Y X^{k−1−i} )^T
∇_X tr( (X + Y)^T (X + Y) ) = 2(X + Y) = ∇_X ‖X + Y‖²_F
∇_X tr( (X + Y)(X + Y) ) = 2(X + Y)^T
∇_X tr(A^T X B) = ∇_X tr(X^T A B^T) = A B^T
∇_X tr(A^T X^{-1} B) = ∇_X tr(X^{-T} A B^T) = −X^{-T} A B^T X^{-T}
∇_X a^T X b = ∇_X tr(b a^T X) = ∇_X tr(X b a^T) = a b^T
∇_X b^T X^T a = ∇_X tr(X^T a b^T) = ∇_X tr(a b^T X^T) = a b^T
∇_X a^T X^{-1} b = ∇_X tr(X^{-T} a b^T) = −X^{-T} a b^T X^{-T}
∇_X a^T X^μ b = …
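A representative numerical check of the trace table, validating ∇_X tr(AXBX^T) = A^T X B^T + A X B by entrywise central differences (test data arbitrary):

```python
import numpy as np

# Check of the trace-table entry:
# grad_X tr(A X B X^T) = A^T X B^T + A X B, entrywise.
rng = np.random.default_rng(14)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
g = lambda M: np.trace(A @ M @ B @ M.T)
h = 1e-6
grad_fd = np.zeros((n, n))
for k in range(n):
    for l in range(n):
        E = np.zeros((n, n)); E[k, l] = h
        grad_fd[k, l] = (g(X + E) - g(X - E)) / (2*h)
grad_cf = A.T @ X @ B.T + A @ X @ B
```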
trace continued

d/dt tr g(X + t Y) = tr( d/dt g(X + t Y) )    [219, p.491]
d/dt tr(X + t Y) = tr Y
d/dt tr^j(X + t Y) = j tr^{j−1}(X + t Y) tr Y
d/dt tr(X + t Y)^j = j tr( (X + t Y)^{j−1} Y )    (∀ j)
d/dt tr( (X + t Y) Y ) = tr Y²
d/dt tr( (X + t Y)^k Y ) = d/dt tr( Y (X + t Y)^k ) = k tr( (X + t Y)^{k−1} Y² ) ,   k ∈ {0, 1, 2}
d/dt tr( (X + t Y)^k Y ) = d/dt tr( Y (X + t Y)^k ) = tr Σ_{i=0}^{k−1} (X + t Y)^i Y (X + t Y)^{k−1−i} Y
d/dt tr( (X + t Y)^{-1} Y ) = −tr( (X + t Y)^{-1} Y (X + t Y)^{-1} Y )
d/dt tr( B^T (X + t Y)^{-1} A ) = −tr( B^T (X + t Y)^{-1} Y (X + t Y)^{-1} A )
d/dt tr( B^T (X + t Y)^{-T} A ) = −tr( B^T (X + t Y)^{-T} Y^T (X + t Y)^{-T} A )
d/dt tr( B^T (X + t Y)^{-k} A ) = … ,   k > 0
d/dt tr( B^T (X + t Y)^μ A ) = … ,   −1 ≤ μ ≤ 1, X, Y ∈ S^M_+
d²/dt² tr( B^T (X + t Y)^{-1} A ) = 2 tr( B^T (X + t Y)^{-1} Y (X + t Y)^{-1} Y (X + t Y)^{-1} A )
d/dt tr( (X + t Y)^T A (X + t Y) ) = tr( Y^T A X + X^T A Y + 2t Y^T A Y )
d²/dt² tr( (X + t Y)^T A (X + t Y) ) = 2 tr( Y^T A Y )
d/dt tr( ( (X + t Y)^T A (X + t Y) )^{-1} ) = −tr( ( (X + t Y)^T A (X + t Y) )^{-1} ( Y^T A X + X^T A Y + 2t Y^T A Y ) ( (X + t Y)^T A (X + t Y) )^{-1} )
d/dt tr( (X + t Y) A (X + t Y) ) = tr( Y A X + X A Y + 2t Y A Y )
d²/dt² tr( (X + t Y) A (X + t Y) ) = 2 tr( Y A Y )
D.2.4 logarithmic determinant

d/dx log x = x^{-1}
∇_X log det X = X^{-T}
∇²_X log det(X)_kl = ∂X^{-T}/∂X_kl = −( X^{-1} e_k e_l^T X^{-1} )^T ,   confer (1918)(1965)
d/dx log x^{-1} = −x^{-1}
∇_X log det X^{-1} = −X^{-T}
d/dx log x^μ = μ x^{-1}
∇_X log det^μ X = μ X^{-T}
∇_X log det X^μ = μ X^{-T}
∇_X log det X^k = ∇_X log det^k X = k X^{-T}
∇_X log det^μ (X + t Y) = μ (X + t Y)^{-T}
∇_x log(a^T x + b) = a / (a^T x + b)
d/dt log det(X + t Y) = tr( (X + t Y)^{-1} Y )
d²/dt² log det(X + t Y) = −tr( (X + t Y)^{-1} Y (X + t Y)^{-1} Y )
d/dt log det(X + t Y)^{-1} = −tr( (X + t Y)^{-1} Y )
d²/dt² log det(X + t Y)^{-1} = tr( (X + t Y)^{-1} Y (X + t Y)^{-1} Y )
d/dt log det( δ(A(x + t y) + a)² + μI ) = tr( ( δ(A(x + t y) + a)² + μI )^{-1} 2 δ(A(x + t y) + a) δ(A y) )
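The log-determinant derivative entries are easy to validate; a sketch for d/dt log det(X + tY) = tr((X + tY)^{-1} Y) with symmetric positive definite X so the determinant stays positive (test data arbitrary):

```python
import numpy as np

# Check of the table entry d/dt log det(X + tY) = tr((X + tY)^{-1} Y)
# at an arbitrary t, with symmetric positive definite X.
rng = np.random.default_rng(11)
n = 4
G = rng.standard_normal((n, n))
X = G @ G.T + n * np.eye(n)                 # positive definite
Y = rng.standard_normal((n, n))
Y = (Y + Y.T) / 2
t, h = 0.05, 1e-6
f = lambda s: np.log(np.linalg.det(X + s*Y))
d_fd = (f(t + h) - f(t - h)) / (2*h)
d_cf = np.trace(np.linalg.inv(X + t*Y) @ Y)
```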
D.2.5 determinant

∇_X det X = ∇_X det X^T = det(X) X^{-T}
for X ∈ R^{2×2} :   ∇_X det X = [ X_22  −X_21 ; −X_12  X_11 ]
∇_X det X^{-1} = −det(X^{-1}) X^{-T} = −det(X)^{-1} X^{-T}
∇_X det^μ X = μ det^μ(X) X^{-T}
∇_X det X^μ = μ det(X^μ) X^{-T}
∇_X det X^k = ∇_X det^k X = k det^k(X) X^{-T}
d/dt det(X + t Y) = det(X + t Y) tr( (X + t Y)^{-1} Y )
d²/dt² det(X + t Y) = det(X + t Y) ( tr²( (X + t Y)^{-1} Y ) − tr( (X + t Y)^{-1} Y (X + t Y)^{-1} Y ) )
d/dt det(X + t Y)^{-1} = −det(X + t Y)^{-1} tr( (X + t Y)^{-1} Y )
d²/dt² det(X + t Y)^{-1} = det(X + t Y)^{-1} ( tr²( (X + t Y)^{-1} Y ) + tr( (X + t Y)^{-1} Y (X + t Y)^{-1} Y ) )
D.2.6 logarithmic

Matrix logarithm.

d/dt log(X + t Y) = Y (X + t Y)^{-1} = (X + t Y)^{-1} Y ,   XY = YX
d/dt log(I − t Y) = −Y (I − t Y)^{-1} = −(I − t Y)^{-1} Y    [219, p.493]
D.2.7 exponential

∇_X det e^{Y^T X^T} = ∇_X e^{tr(Y^T X^T)} = e^{tr(Y^T X^T)} Y^T = det( e^{Y^T X^T} ) Y^T
∇_X tr e^{Y X} = ( e^{Y X} Y )^T = Y^T e^{X^T Y^T}    (∀ X, Y)
∇_x 1^T e^{Ax} = A^T e^{Ax}
∇_x 1^T e^{|Ax|} = A^T δ(sgn(Ax)) e^{|Ax|}    ((Ax)_i ≠ 0)
∇_x log(1^T e^x) = e^x / (1^T e^x)
∇²_x log(1^T e^x) = δ(e^x) / (1^T e^x) − e^x (e^x)^T / (1^T e^x)²
∇_x Π_{i=1}^k x_i^{1/k} = (1/k) ( Π_{i=1}^k x_i^{1/k} ) (1/x)
∇²_x Π_{i=1}^k x_i^{1/k} = (1/k) ( Π_{i=1}^k x_i^{1/k} ) ( (1/k)(1/x)(1/x)^T − δ(x)^{-2} )
d/dt e^{tY} = e^{tY} Y = Y e^{tY}
d/dt e^{X + t Y} = e^{X + t Y} Y = Y e^{X + t Y} ,   XY = YX
d²/dt² e^{X + t Y} = e^{X + t Y} Y² = Y e^{X + t Y} Y = Y² e^{X + t Y} ,   XY = YX
d^j/dt^j e^{tr(X + t Y)} = e^{tr(X + t Y)} tr^j(Y)
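Two entries of this table in a numerical sketch: the derivative of e^{tY} and the log-sum-exp gradient (test data arbitrary; uses scipy.linalg.expm for the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

# Check of two exponential-table entries: d/dt e^{tY} = Y e^{tY}, and
# the log-sum-exp gradient grad_x log(1^T e^x) = e^x / (1^T e^x).
rng = np.random.default_rng(12)
n = 4
Y = rng.standard_normal((n, n))
t, h = 0.3, 1e-6
d_fd = (expm((t + h) * Y) - expm((t - h) * Y)) / (2*h)
d_cf = Y @ expm(t * Y)

x = rng.standard_normal(n)
lse = lambda v: np.log(np.sum(np.exp(v)))
grad_cf = np.exp(x) / np.sum(np.exp(x))     # the softmax of x
e = np.eye(n)
grad_fd = np.array([(lse(x + h*e[i]) - lse(x - h*e[i])) / (2*h)
                    for i in range(n)])
```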