
Supplemental Notes

on

Kriging


GS871 - ADVANCED GRAVIMETRIC GEODESY




Christopher Jekeli

Division of Geodesy and Geospatial Science
School of Earth Sciences

The Ohio State University

















November 2010



A Primer on Kriging
(with an assumed knowledge of least-squares collocation)


Introduction

The following notes roughly follow material presented by R.A. Olea (1999): Geostatistics for
Engineers and Earth Scientists, Kluwer Academic Publ., Boston. The notation is different and
an attempt is made to conform to previous notes in least-squares collocation (lsc). Our
development also differs in that R.A. Olea considers signals in Cartesian space, whereas we may
include signals on a sphere.
Kriging is a kind of optimal estimation of a spatial signal at some points, given observations
of this signal at other points in space. It is closely aligned with lsc (in geodesy), but R.A. Olea's
presentation is more limited with respect to the types of observations and their errors. Kriging is
developed primarily from the point of view of prediction and interpolation of geophysical
signals, rather than a solution to a discrete boundary-value problem. In lsc we begin
fundamentally with a Hilbert space of harmonic functions to which the disturbing potential
belongs and in which we seek its estimate on the basis of observations that are functionals of the
disturbing potential and that include random errors. Ultimately, we are able to estimate any
functional of T from observations of any functionals of T. Observational noise and estimation of
random parameters are fundamentally part of the optimal estimation process. Harmonicity of the
potential is a key element and the covariance function for T can be extended harmonically into
free space, thus allowing the optimal estimation of any functional in free-space. This is not to
say that kriging cannot be (or has not been) extended to the same breadth of applications as lsc in
physical geodesy. The initial foundations and basic assumptions of kriging, however, are much
more primitive. Nevertheless, as we'll see, there is really only a single main difference between
the two methods and the generalization of kriging to the discrete boundary-value problem in
geodesy could easily be accomplished to the same extent as in lsc.
Three types of kriging are discussed by Olea (1999): simple kriging, ordinary kriging, and
universal kriging. The term "kriging" was coined in recognition of developments in this field by
the South African, Daniel G. Krige (1919 - ...), in the 1950s. As in lsc, the signal is assumed to
be a sampling of a stochastic process. Kriging differs in the assumption about the mean value or
systematic trend of the signal being observed and estimated. In lsc, we usually assume that the
signal (i.e., T or any of its linear functionals) has zero mean (or we remove it or may attempt to
estimate it). In kriging, no such assumption is generally made. In ordinary and universal
kriging, we need not assume that the covariance is stationary (although we still require
stationarity of the variability of the process), as we do in lsc; however, it is often not
unreasonable to assume stationarity also in kriging (especially in the universal type, where
systematic trends are removed from the estimation process).
The following sections provide the essential derivations of the three types of kriging (where
the reader may need to fill in some details, now and then). Knowledge of lsc is assumed so that
the kriging formulas then exhibit a kind of familiarity and can easily be compared to the
corresponding lsc formulas.


I. Simple Kriging

Suppose that we have a signal, s, on the sphere that comes from a random process, with mean
and covariance given by

$$
E\!\left( s_P \right) = m_P \;, \qquad (1)
$$


$$
\mathrm{cov}\!\left( s_P , s_Q \right) = C_{P,Q} \;, \qquad (2)
$$

where C is the covariance function of s and points P and Q are on the sphere. Since, for the most
part, we deal here with only one signal, s, we can use very simple notation by letting the
subscripts of the covariance function denote the points for which it is evaluated. If the random
process is wide-sense stationary, then the mean is a constant and the covariance depends only on
the relative position of P and Q; if it is also isotropic, that dependence is simply the spherical
distance between P and Q, namely $\psi_{PQ}$.
In simple kriging, we assume, in fact, that the process is wide-sense stationary; but for the
sake of simplicity, we continue to use the notation in equation (2) that makes no such explicit
assumption. Now define the signal with its (constant) mean removed:

$$
s' = s - m \;. \qquad (3)
$$

To generate this centered signal implies that we actually know the value of m. The signal, $s'$,
has zero mean:

$$
E\!\left( s' \right) = E\!\left( s - m \right) = m - m = 0 \;, \qquad (4)
$$

but the same covariance as that of s:


$$
\begin{aligned}
\mathrm{cov}\!\left( s'_P , s'_Q \right) = C'_{P,Q} &= E\!\left( s'_P s'_Q \right) - E\!\left( s'_P \right) E\!\left( s'_Q \right) \\
&= E\!\left( \left( s_P - m \right)\left( s_Q - m \right) \right) - 0 \cdot 0 \\
&= E\!\left( s_P s_Q \right) - m\, E\!\left( s_P \right) - m\, E\!\left( s_Q \right) + m^2 \\
&= E\!\left( s_P s_Q \right) - E\!\left( s_P \right) E\!\left( s_Q \right) \\
&= C_{P,Q} \;. \qquad (5)
\end{aligned}
$$

We define points, Q, to be points where the signal is observed, and points, P, where the signal is
to be estimated. The simple kriging estimate of ' s is determined, like the lsc estimate, under the
condition of minimum error variance, and thus it is identical to the lsc estimate (pure
collocation):


$$
\hat{s}'_P = \left[ C_{P,Q_j} \right]^{\mathrm{T}} \left[ C_{Q_j,Q_k} \right]^{-1} \left[ s'_{Q_k} \right] \;. \qquad (6)
$$

The square brackets in these equations denote matrices with indicated elements; using a more
compact notation, we have:


$$
\hat{s}'_P = \mathbf{C}_{Q,P}^{\mathrm{T}}\, C_{Q,Q}^{-1}\, \mathbf{s}'_Q \;, \qquad (7)
$$

where $\mathbf{C}_{Q,P} = \left[ C_{Q_j,P} \right]$ and $C_{Q,Q} = \left[ C_{Q_j,Q_k} \right]$. The simple kriging estimate of s is, therefore,


$$
\hat{s}_P = m + \mathbf{C}_{Q,P}^{\mathrm{T}}\, C_{Q,Q}^{-1} \left( \mathbf{s}_Q - m \mathbf{u} \right) \;, \qquad (8)
$$

where $\mathbf{u} = \begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}^{\mathrm{T}}$ is a vector of ones, and the mean, m, is assumed to be known. The simple kriging estimate is unbiased, as in lsc, which is easily proved by taking the expectation of equation (8). We also have for the variance of the estimation error, $s_P - \hat{s}_P$, as in lsc:

$$
\sigma_P^2 = \mathrm{var}\!\left( s_P - \hat{s}_P \right) = C_{P,P} - \mathbf{C}_{Q,P}^{\mathrm{T}}\, C_{Q,Q}^{-1}\, \mathbf{C}_{Q,P} \;, \qquad (9)
$$

where $C_{P,P} = \mathrm{var}\!\left( s_P \right)$. Since one often does not know the mean of the signal, one either has to
estimate it (as in lsc) or try to work around it; that is, perform the estimation by removing or
filtering it from the estimation procedure. That is the idea behind ordinary kriging, considered in
the next section.
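Before moving on, equations (8) and (9) are easily exercised numerically. The following is a minimal sketch in Python/NumPy; the Gaussian covariance model, the planar (rather than spherical) geometry, and all numerical values are illustrative assumptions, not part of these notes.

```python
# A minimal sketch of simple kriging, equations (8) and (9).  The covariance
# model C(d) = sigma2*exp(-(d/L)^2) and all points/values are assumed here
# purely for illustration (Cartesian distances instead of spherical ones).
import numpy as np

def cov(d, sigma2=1.0, L=0.5):
    # assumed isotropic covariance as a function of separation d
    return sigma2 * np.exp(-(d / L) ** 2)

rng = np.random.default_rng(0)
Q = rng.uniform(0.0, 2.0, size=(5, 2))     # observation points Q_j
P = np.array([1.0, 1.0])                   # estimation point P
m = 3.0                                    # known constant mean
s_Q = m + rng.standard_normal(5)           # observed signal values (synthetic)

C_QQ = cov(np.linalg.norm(Q[:, None, :] - Q[None, :, :], axis=2))
C_QP = cov(np.linalg.norm(Q - P, axis=1))

w = np.linalg.solve(C_QQ, C_QP)            # C_{Q,Q}^{-1} C_{Q,P}
s_hat = m + w @ (s_Q - m)                  # equation (8)
var_err = cov(0.0) - C_QP @ w              # equation (9)
print(s_hat, var_err)
```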


II. Ordinary Kriging

Here we make the assumption that the mean of the signal is not known, but that it is constant.
Furthermore, one can proceed either under the assumption of wide-sense stationarity, or under a
more general hypothesis of what Olea calls "intrinsic randomness", which still assumes
stationarity in the variability of the process (but not the variance, for example). We already
know what stationarity implies for the mean and covariance. Intrinsic randomness also implies a
constant mean, but the random similarity between signals at different locations is described by a
variogram, or semi-variogram (which is half of the variogram and more convenient in
applications) instead of the covariance. Intrinsic randomness is less restrictive than stationarity,
in essence, because it is not required to know the value of the mean (as long as it is known that it
is constant; and the variance may not be constant). The semi-variogram for a random signal is
defined as


$$
\gamma_{P,Q} = \frac{1}{2}\, \mathrm{var}\!\left( s_P - s_Q \right) \;, \qquad (10)
$$

where

$$
\begin{aligned}
2 \gamma_{P,Q} = \mathrm{var}\!\left( s_P - s_Q \right) &= E\!\left( \left( s_P - s_Q \right)^2 \right) - \left( E\!\left( s_P - s_Q \right) \right)^2 \\
&= E\!\left( \left( s_P - s_Q \right)^2 \right) \\
&= \mathrm{var}\!\left( s_P \right) + \mathrm{var}\!\left( s_Q \right) - 2\, \mathrm{cov}\!\left( s_P , s_Q \right) . \qquad (11)
\end{aligned}
$$

The second equality above follows because the mean is constant. Note, however, that only if the
random signal is also (wide-sense) stationary are the variances equal; in that case, we have


$$
\gamma_{P,Q} = C_{P,P} - C_{P,Q} \;. \qquad (12)
$$

Note that $\gamma_{P,P} = 0$ in either case.
In general, we can see that as the points, P and Q, become more separated, the covariance
decreases, and the variogram approaches some maximum, which is called the "sill". At the origin,
the variogram is strictly zero; however, a practical computation based on discrete data often
shows a sudden, abrupt jump to a non-zero value at the first non-zero distance. This is due to very
small-scale variability of the discrete data (in essence, due to sampling error) and is called the
"nugget effect".
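The sill and the nugget effect are most easily seen in an empirical semi-variogram. The following sketch estimates it from scattered data by averaging $\frac{1}{2}(s_i - s_j)^2$ over point pairs binned by separation distance; the binning scheme and the synthetic data are assumptions for illustration only.

```python
# A sketch of an empirical semi-variogram: average 0.5*(s_i - s_j)^2 over all
# point pairs whose separation falls in a distance bin.  The binning and the
# synthetic data are illustrative assumptions.
import numpy as np

def empirical_semivariogram(X, s, n_bins=12):
    # X: (n,2) coordinates; s: (n,) signal values
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    g = 0.5 * (s[:, None] - s[None, :]) ** 2
    iu = np.triu_indices(len(s), k=1)          # use each pair once
    d, g = d[iu], g[iu]
    edges = np.linspace(0.0, d.max(), n_bins + 1)
    idx = np.clip(np.digitize(d, edges) - 1, 0, n_bins - 1)
    gamma = np.array([g[idx == k].mean() if np.any(idx == k) else np.nan
                      for k in range(n_bins)])
    return 0.5 * (edges[:-1] + edges[1:]), gamma

# synthetic example: a smooth signal plus small-scale noise; the noise shows
# up as a non-zero value in the first bin (the "nugget"), while the leveling
# off of gamma at large separation indicates the "sill"
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 2.0, size=(200, 2))
s = np.sin(2.0 * X[:, 0]) + np.cos(3.0 * X[:, 1]) + 0.1 * rng.standard_normal(200)
h, gamma = empirical_semivariogram(X, s)
```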
In ordinary kriging, one normalizes the weights of the estimator so that they sum to unity,


$$
\hat{s}_P = \mathbf{H}^{\mathrm{T}} \mathbf{s}_Q \;, \qquad \mathbf{u}^{\mathrm{T}} \mathbf{H} = \sum_{j=1}^{n} H_j = 1 \;. \qquad (13)
$$

In this way, the estimator becomes unbiased:

$$
E\!\left( \hat{s}_P \right) = \sum_{j=1}^{n} H_j\, E\!\left( s_{Q_j} \right) = m \sum_{j=1}^{n} H_j = m \;. \qquad (14)
$$

The forced unbiasedness through this condition on the weights also constrains the minimization
of the estimation error variance. Therefore, the cost function, with a Lagrange multiplier, $\lambda$,
that accounts for the constraint, is set up as follows:

$$
\phi = \frac{1}{2}\, \mathrm{var}\!\left( s_P - \hat{s}_P \right) + \lambda \left( \sum_{j=1}^{n} H_j - 1 \right) = \frac{1}{2}\, \mathrm{var}\!\left( \mathbf{H}^{\mathrm{T}} \mathbf{s}_Q - s_P \right) + \lambda \left( \mathbf{u}^{\mathrm{T}} \mathbf{H} - 1 \right) \;. \qquad (15)
$$

The variance of the error is given by:

$$
\begin{aligned}
\mathrm{var}\!\left( \mathbf{H}^{\mathrm{T}} \mathbf{s}_Q - s_P \right) &= \mathrm{var}\!\left( \mathbf{H}^{\mathrm{T}} \left( \mathbf{s}_Q - s_P \mathbf{u} \right) \right) \\
&= E\!\left( \mathbf{H}^{\mathrm{T}} \left( \mathbf{s}_Q - s_P \mathbf{u} \right) \left( \mathbf{s}_Q - s_P \mathbf{u} \right)^{\mathrm{T}} \mathbf{H} \right) - \left( E\!\left( \mathbf{H}^{\mathrm{T}} \left( \mathbf{s}_Q - s_P \mathbf{u} \right) \right) \right)^2 \\
&= \mathbf{H}^{\mathrm{T}}\, C_{\mathbf{s}_Q - s_P \mathbf{u}}\, \mathbf{H} \;, \qquad (16)
\end{aligned}
$$

where, since the vector, $\mathbf{s}_Q - s_P \mathbf{u}$, has zero mean, $C_{\mathbf{s}_Q - s_P \mathbf{u}}$ is its covariance matrix (the last term
in the second line above thus vanishes). In this case, the notation for the covariance
function departs from the definition in equation (2) and denotes the covariance of the difference
of the signal at two points. An element of this covariance matrix can be written as


$$
E\!\left( \left( s_{Q_j} - s_P \right) \left( s_{Q_k} - s_P \right) \right) = \frac{1}{2}\, E\!\left( \left( s_{Q_j} - s_P \right)^2 + \left( s_{Q_k} - s_P \right)^2 - \left( s_{Q_j} - s_{Q_k} \right)^2 \right) \;, \qquad (17)
$$

which is easily verified by adding and subtracting $s_P$ in the last term of the expectation on the
right side and then expanding. We may thus write each element of the covariance matrix in
terms of semi-variograms:


$$
\left[ C_{\mathbf{s}_Q - s_P \mathbf{u}} \right]_{jk} = \gamma_{Q_j,P} + \gamma_{Q_k,P} - \gamma_{Q_j,Q_k} \;; \qquad (18)
$$

and, the full matrix becomes:


$$
C_{\mathbf{s}_Q - s_P \mathbf{u}} = \left[ \gamma_{Q_j,P} \right] \mathbf{u}^{\mathrm{T}} + \mathbf{u} \left[ \gamma_{Q_j,P} \right]^{\mathrm{T}} - \left[ \gamma_{Q_j,Q_k} \right] = \mathbf{V}_{Q,P}\, \mathbf{u}^{\mathrm{T}} + \mathbf{u}\, \mathbf{V}_{Q,P}^{\mathrm{T}} - V_{Q,Q} \;, \qquad (19)
$$

where $V_{Q,Q} = \left[ \gamma_{Q_j,Q_k} \right]$ and $\mathbf{V}_{Q,P} = \left[ \gamma_{Q_j,P} \right]$ are the semi-variogram matrix and vector, respectively.
Thus, finally, the variance of the error is (with the constraint on the weights, which is satisfied a
priori, regardless of their final values):


$$
\mathrm{var}\!\left( \mathbf{H}^{\mathrm{T}} \mathbf{s}_Q - s_P \right) = \mathbf{H}^{\mathrm{T}} \left( \mathbf{V}_{Q,P}\, \mathbf{u}^{\mathrm{T}} + \mathbf{u}\, \mathbf{V}_{Q,P}^{\mathrm{T}} - V_{Q,Q} \right) \mathbf{H} = 2\, \mathbf{H}^{\mathrm{T}} \mathbf{V}_{Q,P} - \mathbf{H}^{\mathrm{T}} V_{Q,Q} \mathbf{H} \;. \qquad (20)
$$

Substituting equation (20) into the cost function, equation (15), the latter is now:


$$
\phi = \mathbf{H}^{\mathrm{T}} \mathbf{V}_{Q,P} - \frac{1}{2}\, \mathbf{H}^{\mathrm{T}} V_{Q,Q} \mathbf{H} + \lambda \left( \mathbf{u}^{\mathrm{T}} \mathbf{H} - 1 \right) \;. \qquad (21)
$$

Minimizing the cost function requires that its partial derivatives with respect to the unknowns
($H_j$ and $\lambda$) are zero:


$$
\frac{\partial \phi}{\partial \mathbf{H}} = \mathbf{V}_{Q,P} - V_{Q,Q} \mathbf{H} + \lambda \mathbf{u} = \mathbf{0} \;, \qquad (22)
$$


$$
\frac{\partial \phi}{\partial \lambda} = \mathbf{u}^{\mathrm{T}} \mathbf{H} - 1 = 0 \;. \qquad (23)
$$

The last equation is simply a reflection of our constraint. We have the following set of equations
in matrix form


$$
\begin{pmatrix} V_{Q,Q} & -\mathbf{u} \\ \mathbf{u}^{\mathrm{T}} & 0 \end{pmatrix} \begin{pmatrix} \mathbf{H} \\ \lambda \end{pmatrix} = \begin{pmatrix} \mathbf{V}_{Q,P} \\ 1 \end{pmatrix} \;, \qquad (24)
$$

which can be solved for H (and $\lambda$). Therefore,


$$
\begin{pmatrix} \mathbf{H} \\ \lambda \end{pmatrix} = \begin{pmatrix} V_{Q,Q} & -\mathbf{u} \\ \mathbf{u}^{\mathrm{T}} & 0 \end{pmatrix}^{-1} \begin{pmatrix} \mathbf{V}_{Q,P} \\ 1 \end{pmatrix} \;, \qquad (25)
$$

provided the inverse exists. If the inverse of the semi-variogram matrix exists, then the inverse
of the total matrix is


( )
1 1 1 T 1 1 1
, , , ,
T
1 T 1 1
,
0
Q Q Q Q Q Q Q Q
Q Q
V I v V v V V
v V v


_

_



,
,
uu u u
u
u
, (26)

where


$$
v = \mathbf{u}^{\mathrm{T}} V_{Q,Q}^{-1} \mathbf{u} \;. \qquad (27)
$$

The solutions for H and the Lagrange multiplier are thus


$$
\begin{aligned}
\mathbf{H} &= V_{Q,Q}^{-1} \left( I - v^{-1}\, \mathbf{u} \mathbf{u}^{\mathrm{T}} V_{Q,Q}^{-1} \right) \mathbf{V}_{Q,P} + v^{-1}\, V_{Q,Q}^{-1} \mathbf{u} \\
&= V_{Q,Q}^{-1} \mathbf{V}_{Q,P} + v^{-1}\, V_{Q,Q}^{-1} \mathbf{u} \left( 1 - \mathbf{u}^{\mathrm{T}} V_{Q,Q}^{-1} \mathbf{V}_{Q,P} \right) \;, \qquad (28)
\end{aligned}
$$


$$
\lambda = v^{-1} \left( 1 - \mathbf{u}^{\mathrm{T}} V_{Q,Q}^{-1} \mathbf{V}_{Q,P} \right) \;. \qquad (29)
$$

Summing the elements of the weight vector, H, written as a slightly modified version of the
first of equations (28), we confirm that

$$
\mathbf{u}^{\mathrm{T}} \mathbf{H} = \mathbf{u}^{\mathrm{T}} V_{Q,Q}^{-1} \mathbf{V}_{Q,P} + v^{-1}\, \mathbf{u}^{\mathrm{T}} V_{Q,Q}^{-1} \mathbf{u} \left( 1 - \mathbf{u}^{\mathrm{T}} V_{Q,Q}^{-1} \mathbf{V}_{Q,P} \right) = \mathbf{u}^{\mathrm{T}} V_{Q,Q}^{-1} \mathbf{V}_{Q,P} + 1 - \mathbf{u}^{\mathrm{T}} V_{Q,Q}^{-1} \mathbf{V}_{Q,P} = 1 \;. \qquad (30)
$$

Substituting equation (28) into (13), we thus have the (ordinary) kriging estimate:


$$
\hat{s}_P = \left( \mathbf{V}_{Q,P} + v^{-1} \left( 1 - \mathbf{u}^{\mathrm{T}} V_{Q,Q}^{-1} \mathbf{V}_{Q,P} \right) \mathbf{u} \right)^{\mathrm{T}} V_{Q,Q}^{-1}\, \mathbf{s}_Q \;. \qquad (31)
$$

Again, in this estimate there is no assumption of zero mean in the signal, nor does the mean have
to be known, but it is assumed that the mean is constant. As already derived for the cost function
and with the constraint on the weights, the error variance of the estimate is given by


$$
\sigma_P^2 = E\!\left( \left( s_P - \hat{s}_P \right)^2 \right) = 2\, \mathbf{H}^{\mathrm{T}} \mathbf{V}_{Q,P} - \mathbf{H}^{\mathrm{T}} V_{Q,Q} \mathbf{H} \;. \qquad (32)
$$

By nulling the derivatives of the cost function, we had found that


$$
\mathbf{V}_{Q,P} = V_{Q,Q} \mathbf{H} - \lambda \mathbf{u} \;. \qquad (33)
$$

Multiplying both sides by $\mathbf{H}^{\mathrm{T}}$, we obtain for the error variance:


$$
\sigma_P^2 = 2\, \mathbf{H}^{\mathrm{T}} \mathbf{V}_{Q,P} - \mathbf{H}^{\mathrm{T}} V_{Q,Q} \mathbf{H} = \mathbf{H}^{\mathrm{T}} \mathbf{V}_{Q,P} - \lambda \;. \qquad (34)
$$

Substituting equations (28) and (29) for H and $\lambda$, we obtain after some simplification


$$
\sigma_P^2 = \mathbf{V}_{Q,P}^{\mathrm{T}}\, V_{Q,Q}^{-1}\, \mathbf{V}_{Q,P} - v^{-1} \left( 1 - \mathbf{u}^{\mathrm{T}} V_{Q,Q}^{-1} \mathbf{V}_{Q,P} \right)^2 \;. \qquad (35)
$$

Note that we have the reproducing property. For $P = Q_k$, the weight vector reduces to


$$
\mathbf{H} = V_{Q,Q}^{-1}\, \mathbf{V}_{Q,Q_k} + v^{-1}\, V_{Q,Q}^{-1} \mathbf{u} \left( 1 - \mathbf{u}^{\mathrm{T}} V_{Q,Q}^{-1} \mathbf{V}_{Q,Q_k} \right) = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} \;, \qquad (36)
$$

since $\mathbf{V}_{Q,Q_k}$ is the k-th column of $V_{Q,Q}$, so that $V_{Q,Q}^{-1} \mathbf{V}_{Q,Q_k}$ is the k-th unit vector and $\mathbf{u}^{\mathrm{T}} V_{Q,Q}^{-1} \mathbf{V}_{Q,Q_k} = 1$;

and the Lagrange multiplier and error variance both become zero:

$$
\lambda = v^{-1} \left( 1 - 1 \right) = 0 \;, \qquad \sigma_{Q_k}^2 = \begin{pmatrix} 0 & \cdots & 1 & \cdots & 0 \end{pmatrix} \mathbf{V}_{Q,Q_k} - \lambda = \gamma_{Q_k,Q_k} = 0 \;. \qquad (37)
$$
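A compact way to exercise equations (24), (34), and the reproducing property is to assemble and solve the augmented system directly, rather than using the partitioned inverse. The following Python sketch does so; the exponential variogram model and all data are assumptions for illustration.

```python
# A sketch of ordinary kriging in semi-variogram form: solve the augmented
# system of equation (24) and evaluate the error variance, equation (34).
# The model gamma(d) = sigma2*(1 - exp(-(d/L)^2)) and the data are assumed.
import numpy as np

def gamma(d, sigma2=1.0, L=0.5):
    return sigma2 * (1.0 - np.exp(-(d / L) ** 2))

rng = np.random.default_rng(2)
Q = rng.uniform(0.0, 2.0, size=(6, 2))
P = np.array([0.7, 1.2])
s_Q = 10.0 + rng.standard_normal(6)        # constant but unknown mean

n = len(Q)
V_QQ = gamma(np.linalg.norm(Q[:, None, :] - Q[None, :, :], axis=2))
V_QP = gamma(np.linalg.norm(Q - P, axis=1))

# augmented system of equation (24): [[V_QQ, -u], [u^T, 0]] (H, lam) = (V_QP, 1)
A = np.zeros((n + 1, n + 1))
A[:n, :n] = V_QQ
A[:n, n] = -1.0                            # the -u column
A[n, :n] = 1.0                             # the u^T row (sum of weights = 1)
x = np.linalg.solve(A, np.append(V_QP, 1.0))
H, lam = x[:n], x[n]

s_hat = H @ s_Q                            # equation (13)
var_err = H @ V_QP - lam                   # equation (34)
print(H.sum(), s_hat, var_err)             # H.sum() = 1, equation (30)
# setting P equal to one of the Q_j returns that observation exactly, with
# lam = 0 and var_err = 0, per equations (36) and (37)
```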

The previous development for ordinary kriging did not require stationarity in the covariance
function. However, usually we can assume stationarity (and ergodicity), and the estimate can be
written in terms of covariances rather than semi-variograms. We make use of their relationship,
equation (12), when the field is stationary, repeated here for convenience:


$$
\gamma_{P,Q} = C_{P,P} - C_{P,Q} \;. \qquad (38)
$$

Then, clearly


$$
V_{Q,Q} = \left[ \gamma_{Q_j,Q_k} \right] = \sigma^2\, \mathbf{u} \mathbf{u}^{\mathrm{T}} - C_{Q,Q} \;, \qquad (39)
$$

where $\sigma^2 = C_{P,P}$ is the variance of s, and

$$
\mathbf{V}_{Q,P} = \left[ \gamma_{Q_j,P} \right] = \sigma^2\, \mathbf{u} - \mathbf{C}_{Q,P} \;. \qquad (40)
$$

Substituting these into the cost function, equation (21), we obtain:


$$
\begin{aligned}
\phi &= \mathbf{H}^{\mathrm{T}} \left( \sigma^2 \mathbf{u} - \mathbf{C}_{Q,P} \right) - \frac{1}{2}\, \mathbf{H}^{\mathrm{T}} \left( \sigma^2 \mathbf{u} \mathbf{u}^{\mathrm{T}} - C_{Q,Q} \right) \mathbf{H} + \lambda \left( \mathbf{u}^{\mathrm{T}} \mathbf{H} - 1 \right) \\
&= \frac{1}{2}\, \sigma^2 - \mathbf{H}^{\mathrm{T}} \mathbf{C}_{Q,P} + \frac{1}{2}\, \mathbf{H}^{\mathrm{T}} C_{Q,Q} \mathbf{H} + \lambda \left( \mathbf{u}^{\mathrm{T}} \mathbf{H} - 1 \right) \;, \qquad (41)
\end{aligned}
$$

where the second line makes use of the constraint, $\mathbf{u}^{\mathrm{T}} \mathbf{H} = 1$.

Minimization yields a similar set of equations for H and $\lambda$:


$$
\begin{pmatrix} C_{Q,Q} & \mathbf{u} \\ \mathbf{u}^{\mathrm{T}} & 0 \end{pmatrix} \begin{pmatrix} \mathbf{H} \\ \lambda \end{pmatrix} = \begin{pmatrix} \mathbf{C}_{Q,P} \\ 1 \end{pmatrix} \;, \qquad (42)
$$

and the solutions:


$$
\mathbf{H} = C_{Q,Q}^{-1}\, \mathbf{C}_{Q,P} + c^{-1}\, C_{Q,Q}^{-1} \mathbf{u} \left( 1 - \mathbf{u}^{\mathrm{T}} C_{Q,Q}^{-1} \mathbf{C}_{Q,P} \right) \;, \qquad (43)
$$


$$
\lambda = c^{-1} \left( \mathbf{u}^{\mathrm{T}} C_{Q,Q}^{-1} \mathbf{C}_{Q,P} - 1 \right) \;, \qquad (44)
$$

where


$$
c = \mathbf{u}^{\mathrm{T}} C_{Q,Q}^{-1} \mathbf{u} \;. \qquad (45)
$$

The estimate of the signal at P is then given by


$$
\hat{s}_P = \left( \mathbf{C}_{Q,P} + c^{-1} \left( 1 - \mathbf{u}^{\mathrm{T}} C_{Q,Q}^{-1} \mathbf{C}_{Q,P} \right) \mathbf{u} \right)^{\mathrm{T}} C_{Q,Q}^{-1}\, \mathbf{s}_Q \;. \qquad (46)
$$

The estimation error variance (first part of cost function), on the other hand, now becomes


$$
\sigma_P^2 = \sigma^2 - 2\, \mathbf{H}^{\mathrm{T}} \mathbf{C}_{Q,P} + \mathbf{H}^{\mathrm{T}} C_{Q,Q} \mathbf{H} = \sigma^2 - \mathbf{H}^{\mathrm{T}} \mathbf{C}_{Q,P} - \lambda \;. \qquad (47)
$$

With the expressions (43) and (44) for H and $\lambda$, we get after some simplification


$$
\sigma_P^2 = \sigma^2 - \mathbf{C}_{Q,P}^{\mathrm{T}}\, C_{Q,Q}^{-1}\, \mathbf{C}_{Q,P} + c^{-1} \left( 1 - \mathbf{u}^{\mathrm{T}} C_{Q,Q}^{-1} \mathbf{C}_{Q,P} \right)^2 \;. \qquad (48)
$$

The difference between simple kriging (pure lsc) and ordinary kriging can now be seen in the
corresponding error variances. Taking the difference (equations (48) and (9)) we find


$$
\sigma_P^2(\text{ordinary}) - \sigma_P^2(\text{simple}) = c^{-1} \left( 1 - \mathbf{u}^{\mathrm{T}} C_{Q,Q}^{-1} \mathbf{C}_{Q,P} \right)^2 \;, \qquad (49)
$$

which implies, since $c > 0$, that $\sigma_P^2(\text{ordinary}) \geq \sigma_P^2(\text{simple})$. By dealing with an unknown mean in
ordinary kriging, the variance of the error becomes larger; it is the price we pay for not knowing
the mean (and we don't estimate it). But, in those cases when we neither know the mean nor
wish to estimate it, ordinary kriging is the correct estimation procedure (as opposed to lsc, which
assumes a zero mean, or requires that we estimate it).
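A quick numerical check of this comparison can be made by evaluating equations (9), (48), and (49) for the same covariance model. The model and points in the following sketch are assumptions for illustration only.

```python
# A sketch comparing the simple and ordinary kriging error variances,
# equations (9), (48), and (49); covariance model and points are assumed.
import numpy as np

def cov(d, sigma2=1.0, L=0.5):
    return sigma2 * np.exp(-(d / L) ** 2)

rng = np.random.default_rng(3)
Q = rng.uniform(0.0, 2.0, size=(8, 2))
P = np.array([0.4, 0.9])
u = np.ones(len(Q))

C_QQ = cov(np.linalg.norm(Q[:, None, :] - Q[None, :, :], axis=2))
C_QP = cov(np.linalg.norm(Q - P, axis=1))

Ci_CQP = np.linalg.solve(C_QQ, C_QP)
c = u @ np.linalg.solve(C_QQ, u)                          # equation (45)

var_simple = cov(0.0) - C_QP @ Ci_CQP                     # equation (9)
var_ordinary = var_simple + (1.0 - u @ Ci_CQP) ** 2 / c   # equations (48), (49)
assert var_ordinary >= var_simple                         # price of the unknown mean
print(var_simple, var_ordinary)
```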


III. Universal Kriging

In this case, we allow the mean to vary, that is, the random signal is definitely not stationary.
However, we may still assume stationarity in the residuals obtained by removing a trend, where
the trend is defined as the variation of the mean. Let this residual be defined as


$$
y_P = s_P - m_P \;, \qquad (50)
$$

such that its mean is zero for all P. The trend is a deterministic function of P; e.g., Olea (1999)
defines it as a linear combination of polynomials. We will also restrict it somewhat, to the case
that it is a non-zero constant plus some deterministic function:


$$
m_P = m_0 + f_P \;, \qquad (51)
$$

where f does not include a constant and $m_0 \neq 0$. Let the estimate of s be linear as before:


$$
\hat{s}_P = \mathbf{H}^{\mathrm{T}} \mathbf{s}_Q \;. \qquad (52)
$$

In this case, if the estimate is to be unbiased, we require that the weights satisfy

$$
m_P = E\!\left( \hat{s}_P \right) = \mathbf{H}^{\mathrm{T}} E\!\left( \mathbf{s}_Q \right) = \mathbf{H}^{\mathrm{T}} \mathbf{m}_Q \;, \qquad (53)
$$

where $\mathbf{m}_Q$ is the vector of trend values at points, Q. We note that, with the particular definition
of the trend, equation (51), we have:


$$
m_0 + f_P = \mathbf{H}^{\mathrm{T}} \left( m_0 \mathbf{u} + \mathbf{f}_Q \right) \;, \qquad (54)
$$

for arbitrary points, P and Q. The conditions,


$$
\mathbf{u}^{\mathrm{T}} \mathbf{H} = 1 \;, \qquad f_P = \mathbf{H}^{\mathrm{T}} \mathbf{f}_Q \;, \qquad (55)
$$

follow in view of $m_0 \neq 0$.
Because of the defining equation (50) and equation (53), we also note that


$$
\hat{s}_P - s_P = \mathbf{H}^{\mathrm{T}} \left( \mathbf{y}_Q + \mathbf{m}_Q \right) - \left( y_P + m_P \right) = \mathbf{H}^{\mathrm{T}} \mathbf{y}_Q - y_P \;; \qquad (56)
$$

therefore, the variance of the error is given by

$$
\mathrm{var}\!\left( \hat{s}_P - s_P \right) = \mathrm{var}\!\left( \mathbf{H}^{\mathrm{T}} \mathbf{y}_Q - y_P \right) = E\!\left( \left( \mathbf{H}^{\mathrm{T}} \mathbf{y}_Q - y_P \right)^2 \right) \;, \qquad (57)
$$

since $E\!\left( \mathbf{H}^{\mathrm{T}} \mathbf{y}_Q - y_P \right) = 0$. Hence, proceeding as in equations (16) through (19), we obtain


$$
\sigma_P^2 = \mathbf{H}^{\mathrm{T}}\, C_{\mathbf{y}_Q - y_P \mathbf{u}}\, \mathbf{H} \;, \qquad (58)
$$

where the covariance of the difference of residuals can be written in terms of the semi-
variograms of the residuals:

$$
C_{\mathbf{y}_Q - y_P \mathbf{u}} = \left[ \gamma^{(y)}_{Q_j,P} \right] \mathbf{u}^{\mathrm{T}} + \mathbf{u} \left[ \gamma^{(y)}_{Q_j,P} \right]^{\mathrm{T}} - \left[ \gamma^{(y)}_{Q_j,Q_k} \right] \;, \qquad (59)
$$

where the superscript, (y), indicates that the semi-variograms refer to the residuals, y.

Imposing the condition that the weights sum to unity (equation (55)), we have for the variance of
the error:


$$
\sigma_P^2 = \mathbf{H}^{\mathrm{T}} \left( \mathbf{V}^{(y)}_{Q,P}\, \mathbf{u}^{\mathrm{T}} + \mathbf{u} \left( \mathbf{V}^{(y)}_{Q,P} \right)^{\mathrm{T}} - V^{(y)}_{Q,Q} \right) \mathbf{H} = 2\, \mathbf{H}^{\mathrm{T}} \mathbf{V}^{(y)}_{Q,P} - \mathbf{H}^{\mathrm{T}} V^{(y)}_{Q,Q} \mathbf{H} \;. \qquad (60)
$$

With the constraints, equations (55), the cost function to be minimized can now be set up as
follows:


$$
\phi = \mathbf{H}^{\mathrm{T}} \mathbf{V}^{(y)}_{Q,P} - \frac{1}{2}\, \mathbf{H}^{\mathrm{T}} V^{(y)}_{Q,Q} \mathbf{H} + \lambda \left( \mathbf{u}^{\mathrm{T}} \mathbf{H} - 1 \right) + \mu \left( \mathbf{f}_Q^{\mathrm{T}} \mathbf{H} - f_P \right) \;, \qquad (61)
$$

with Lagrange multipliers, $\lambda$ and $\mu$. Setting derivatives with respect to H, $\lambda$, and $\mu$ to zero,
we obtain the solution (in matrix form):


$$
\begin{pmatrix} \mathbf{H} \\ \lambda \\ \mu \end{pmatrix} = \begin{pmatrix} V^{(y)}_{Q,Q} & -\mathbf{u} & -\mathbf{f}_Q \\ \mathbf{u}^{\mathrm{T}} & 0 & 0 \\ \mathbf{f}_Q^{\mathrm{T}} & 0 & 0 \end{pmatrix}^{-1} \begin{pmatrix} \mathbf{V}^{(y)}_{Q,P} \\ 1 \\ f_P \end{pmatrix} \;. \qquad (62)
$$

The inverse is given by


Writing $F_Q = \begin{pmatrix} \mathbf{u} & \mathbf{f}_Q \end{pmatrix}$ for brevity,

$$
\begin{pmatrix} V^{(y)}_{Q,Q} & -F_Q \\ F_Q^{\mathrm{T}} & 0 \end{pmatrix}^{-1} = \begin{pmatrix} \left( V^{(y)}_{Q,Q} \right)^{-1} \left( I - F_Q B^{-1} F_Q^{\mathrm{T}} \left( V^{(y)}_{Q,Q} \right)^{-1} \right) & \left( V^{(y)}_{Q,Q} \right)^{-1} F_Q B^{-1} \\ -B^{-1} F_Q^{\mathrm{T}} \left( V^{(y)}_{Q,Q} \right)^{-1} & B^{-1} \end{pmatrix} \;, \qquad (63)
$$

where the 2×2, presumably non-singular, matrix, B, is defined by:


$$
B = F_Q^{\mathrm{T}} \left( V^{(y)}_{Q,Q} \right)^{-1} F_Q = \begin{pmatrix} \mathbf{u}^{\mathrm{T}} \\ \mathbf{f}_Q^{\mathrm{T}} \end{pmatrix} \left( V^{(y)}_{Q,Q} \right)^{-1} \begin{pmatrix} \mathbf{u} & \mathbf{f}_Q \end{pmatrix} \;. \qquad (64)
$$

Extracting the estimation weights from the solution (62), we have


$$
\mathbf{H} = \left( V^{(y)}_{Q,Q} \right)^{-1} \mathbf{V}^{(y)}_{Q,P} + \left( V^{(y)}_{Q,Q} \right)^{-1} F_Q\, B^{-1} \left( \begin{pmatrix} 1 \\ f_P \end{pmatrix} - F_Q^{\mathrm{T}} \left( V^{(y)}_{Q,Q} \right)^{-1} \mathbf{V}^{(y)}_{Q,P} \right) \;. \qquad (65)
$$

To find the variance of the estimation error we first give the solutions to the Lagrange
multipliers:


$$
\begin{pmatrix} \lambda \\ \mu \end{pmatrix} = B^{-1} \left( \begin{pmatrix} 1 \\ f_P \end{pmatrix} - F_Q^{\mathrm{T}} \left( V^{(y)}_{Q,Q} \right)^{-1} \mathbf{V}^{(y)}_{Q,P} \right) \;. \qquad (66)
$$

Second, by nulling the derivatives of the cost function, equation (61), we have


$$
\mathbf{V}^{(y)}_{Q,P} = V^{(y)}_{Q,Q}\, \mathbf{H} - F_Q \begin{pmatrix} \lambda \\ \mu \end{pmatrix} \;. \qquad (67)
$$

Finally, substituting equation (67) into the error variance (first part of the cost function), and
making use of equations (64) through (66), we find (after some simplifications)


$$
\begin{aligned}
\sigma_P^2 &= 2\, \mathbf{H}^{\mathrm{T}} \mathbf{V}^{(y)}_{Q,P} - \mathbf{H}^{\mathrm{T}} V^{(y)}_{Q,Q} \mathbf{H} \\
&= \mathbf{H}^{\mathrm{T}} \mathbf{V}^{(y)}_{Q,P} - \begin{pmatrix} 1 & f_P \end{pmatrix} \begin{pmatrix} \lambda \\ \mu \end{pmatrix} \\
&= \left( \mathbf{V}^{(y)}_{Q,P} \right)^{\mathrm{T}} \left( V^{(y)}_{Q,Q} \right)^{-1} \mathbf{V}^{(y)}_{Q,P} - \left( \begin{pmatrix} 1 \\ f_P \end{pmatrix} - F_Q^{\mathrm{T}} \left( V^{(y)}_{Q,Q} \right)^{-1} \mathbf{V}^{(y)}_{Q,P} \right)^{\mathrm{T}} B^{-1} \left( \begin{pmatrix} 1 \\ f_P \end{pmatrix} - F_Q^{\mathrm{T}} \left( V^{(y)}_{Q,Q} \right)^{-1} \mathbf{V}^{(y)}_{Q,P} \right) . \qquad (68)
\end{aligned}
$$

If the residual, y, belongs to a wide-sense stationary process, then, as in ordinary kriging, one
can also formulate the optimal estimate and its error variance in terms of covariances (of y, not
s!) by making use of their relationship to semi-variograms, equation (12). Without derivation,
we give the results:


$$
\mathbf{H} = \left( C^{(y)}_{Q,Q} \right)^{-1} \mathbf{C}^{(y)}_{Q,P} + \left( C^{(y)}_{Q,Q} \right)^{-1} F_Q\, A^{-1} \left( \begin{pmatrix} 1 \\ f_P \end{pmatrix} - F_Q^{\mathrm{T}} \left( C^{(y)}_{Q,Q} \right)^{-1} \mathbf{C}^{(y)}_{Q,P} \right) \;, \qquad (69)
$$


$$
\begin{pmatrix} \lambda \\ \mu \end{pmatrix} = A^{-1} \left( F_Q^{\mathrm{T}} \left( C^{(y)}_{Q,Q} \right)^{-1} \mathbf{C}^{(y)}_{Q,P} - \begin{pmatrix} 1 \\ f_P \end{pmatrix} \right) \;, \qquad (70)
$$


$$
A = F_Q^{\mathrm{T}} \left( C^{(y)}_{Q,Q} \right)^{-1} F_Q = \begin{pmatrix} \mathbf{u}^{\mathrm{T}} \\ \mathbf{f}_Q^{\mathrm{T}} \end{pmatrix} \left( C^{(y)}_{Q,Q} \right)^{-1} \begin{pmatrix} \mathbf{u} & \mathbf{f}_Q \end{pmatrix} \;, \qquad (71)
$$

$$
\sigma_P^2 = \sigma_y^2 - \left( \mathbf{C}^{(y)}_{Q,P} \right)^{\mathrm{T}} \left( C^{(y)}_{Q,Q} \right)^{-1} \mathbf{C}^{(y)}_{Q,P} + \left( \begin{pmatrix} 1 \\ f_P \end{pmatrix} - F_Q^{\mathrm{T}} \left( C^{(y)}_{Q,Q} \right)^{-1} \mathbf{C}^{(y)}_{Q,P} \right)^{\mathrm{T}} A^{-1} \left( \begin{pmatrix} 1 \\ f_P \end{pmatrix} - F_Q^{\mathrm{T}} \left( C^{(y)}_{Q,Q} \right)^{-1} \mathbf{C}^{(y)}_{Q,P} \right) \;, \qquad (72)
$$

where $\sigma_y^2 = C^{(y)}_{P,P}$ is the variance of the residuals.

Determining the covariances (or the semi-variograms) of the residuals, y, requires that the
residuals be computable, that is, the trend should be known. But then we are back to the usual
least-squares collocation problem. Kriging is supposed to filter out the trends without having to
know them. Since we do need to know them, this is one of the main disadvantages of universal
kriging. Olea (1999, Ch.6) offers some suggestions to deal with this problem in practical
situations, but it seems that there is no procedure that applies in general. In a way, this makes
sense, because with kriging we wish to avoid estimating a trend even if it exists. However, if
that trend is arbitrary (i.e., its form is unknown), then the estimation procedure does not really
know what to filter out. In ordinary kriging, it is possible to filter out a constant, without
actually knowing the constant, since the variogram of the signal is independent of a constant bias
in the signal. If there is a trend in the signal (even if that trend is a constant slope), the variogram
of the signal would depend on that trend, which means it has to be known in order to proceed.
Thus, if faced with a trend that has some deterministic variability, it is this writer's opinion that it
may be better to model it and/or estimate it (using lsc with parameters).
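For completeness, the covariance form of universal kriging, equations (69) through (72), can be sketched numerically as follows, here with an assumed planar trend carried by two linear basis functions (so the matrix A of equation (71) is 3×3 rather than 2×2; the formulas generalize directly). The residual covariance model and all data are illustrative assumptions. Note that only the form (basis) of the trend is needed, not its coefficients, which is exactly the requirement discussed above.

```python
# A sketch of universal kriging in covariance form, equations (69)-(72),
# assuming a trend m_P = m0 + a*x + b*y; the constraint matrix F_Q = (u f_Q)
# then has three columns and A = F^T C^{-1} F is 3x3.  The residual
# covariance model and all numbers are illustrative assumptions.
import numpy as np

def cov_y(d, sigma2=1.0, L=0.5):
    # assumed covariance of the residuals y
    return sigma2 * np.exp(-(d / L) ** 2)

rng = np.random.default_rng(4)
Q = rng.uniform(0.0, 2.0, size=(10, 2))
P = np.array([1.3, 0.6])
s_Q = 5.0 + 0.8 * Q[:, 0] - 0.3 * Q[:, 1] + 0.2 * rng.standard_normal(10)

n = len(Q)
C_QQ = cov_y(np.linalg.norm(Q[:, None, :] - Q[None, :, :], axis=2))
C_QP = cov_y(np.linalg.norm(Q - P, axis=1))
F = np.column_stack([np.ones(n), Q[:, 0], Q[:, 1]])  # F_Q = (u  f_Q)
h = np.array([1.0, P[0], P[1]])                      # (1, f_P)

Ci_CQP = np.linalg.solve(C_QQ, C_QP)
Ci_F = np.linalg.solve(C_QQ, F)
A = F.T @ Ci_F                                       # equation (71)
r = h - F.T @ Ci_CQP
H = Ci_CQP + Ci_F @ np.linalg.solve(A, r)            # equation (69)

s_hat = H @ s_Q                                      # equation (52)
var_err = cov_y(0.0) - C_QP @ Ci_CQP + r @ np.linalg.solve(A, r)  # equation (72)
print(s_hat, var_err)      # F.T @ H equals h, i.e., the unbiasedness conditions
```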
