J. R. Statist. Soc. B (2011) 73, Part 4, pp.
An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach
Finn Lindgren and Håvard Rue
Norwegian University of Science and Technology, Trondheim, Norway
and Johan Lindström
Lund University, Sweden
[Read before The Royal Statistical Society at a meeting organized by the Research Section on Wednesday, March 16th, 2011, Professor D. M. Titterington in the Chair]
Summary. Continuously indexed Gaussian fields (GFs) are the most important ingredient in spatial statistical modelling and geostatistics. The specification through the covariance function gives an intuitive interpretation of the field properties. On the computational side, GFs are hampered with the big n problem, since the cost of factorizing dense matrices is cubic in the dimension. Although computational power today is at an all-time high, this fact seems still to be a computational bottleneck in many applications. Along with GFs, there is the class of Gaussian Markov random fields (GMRFs), which are discretely indexed. The Markov property makes the precision matrix involved sparse, which enables the use of numerical algorithms for sparse matrices that, for fields in ℝ², use only the square root of the time required by general algorithms. The specification of a GMRF is through its full conditional distributions, but its marginal properties are not transparent in such a parameterization. We show that, using an approximate stochastic weak solution to (linear) stochastic partial differential equations, we can, for some GFs in the Matérn class, provide an explicit link, for any triangulation of ℝ^d, between GFs and GMRFs, formulated as a basis function representation. The consequence is that we can take the best from the two worlds and do the modelling by using GFs but do the computations by using GMRFs. Perhaps more importantly, our approach generalizes to other covariance functions generated by SPDEs, including oscillating and non-stationary GFs, as well as GFs on manifolds. We illustrate our approach by analysing global temperature data with a non-stationary model defined on a sphere.
Keywords: Approximate Bayesian inference; Covariance functions; Gaussian fields; Gaussian Markov random fields; Latent Gaussian models; Sparse matrices; Stochastic partial differential equations
1. Introduction
Gaussian fields (GFs) have a dominant role in spatial statistics and especially in the traditional field of geostatistics (Cressie, 1993; Stein, 1999; Chilès and Delfiner, 1999; Diggle and Ribeiro, 2006) and form an important building block in modern hierarchical spatial models (Banerjee et al., 2004). GFs are one of a few appropriate multivariate models with an explicit and computable normalizing constant and have good analytic properties otherwise. In a domain D ⊆ ℝ^d with co-ordinate s ∈ D, x(s) is a continuously indexed GF if all finite collections {x(s_i)} are jointly
Address for correspondence: Håvard Rue, Department of Mathematical Sciences, Norwegian University of Science and Technology, N-7491 Trondheim, Norway.
E-mail: hrue@math.ntnu.no
2 F. Lindgren, H. Rue and J. Lindström
Gaussian distributed. In most cases, the GF is specified by using a mean function μ(·) and a covariance function C(·, ·), so the mean is μ = (μ(s_i)) and the covariance matrix is Σ = (C(s_i, s_j)).
Often the covariance function is only a function of the relative position of two locations, in which case it is said to be stationary, and it is isotropic if the covariance function depends only on the Euclidean distance between the locations. Since a regular covariance matrix is positive definite, the covariance function must be a positive definite function. This restriction makes it difficult to invent covariance functions stated as closed form expressions. Bochner's theorem can be used in this context, as it characterizes all continuous positive definite functions in ℝ^d.
Although GFs are convenient from both an analytical and a practical point of view, the computational issues have always been a bottleneck. This is due to the general cost of O(n³) to factorize dense n × n (covariance) matrices. Although the computational power today is at an all-time high, the tendency seems to be that the dimension n is always set, or we want to set it, a little higher than the value that gives a reasonable computation time. The increasing popularity of hierarchical Bayesian models has made this issue more important, as repeated computations (as for simulation-based model fitting) can be very slow, perhaps infeasible (Banerjee et al. (2004), page 387), and the situation is informally referred to as the big n problem.
There are several approaches to try to overcome or avoid the big n problem. The spectral representation approach for the likelihood (Whittle, 1954) makes it possible to estimate the (power) spectrum (using discrete Fourier transform calculations) and to compute the log-likelihood from it (Guyon, 1982; Dahlhaus and Künsch, 1987; Fuentes, 2008), but this is only possible for directly observed stationary GFs on a (near) regular lattice. Vecchia (1988) and Stein et al. (2004) proposed to use an approximate likelihood constructed through a sequential representation and then to simplify the conditioning set, and similar ideas also apply when computing conditional expectations (kriging). An alternative approach is to do exact computations on a simplified Gaussian model of low rank (Banerjee et al., 2008; Cressie and Johannesson, 2008; Eidsvik et al., 2010). Furrer et al. (2006) applied covariance tapering to zero out parts of the covariance matrix to gain a computational speed-up. However, the sparsity pattern will depend on the range of the GFs, and the potential in a related approach, named lattice methods by Banerjee et al. (2004), section A.5.3, is superior to the covariance tapering idea. In this approach the GF is replaced by a Gaussian Markov random field (GMRF); see Rue and Held (2005) for a detailed introduction and Rue et al. (2009), section 2.1, for a condensed review. A GMRF is a discretely indexed Gaussian field x, where the full conditionals π(x_i | x_{−i}), i = 1, . . . , n, depend only on a set of neighbours ∂i of each site i (where consistency requirements imply that if i ∈ ∂j then also j ∈ ∂i). The computational gain comes from the fact that the zero pattern of the precision matrix Q (the inverse covariance matrix) relates directly to the notion of neighbours: Q_ij ≠ 0 ⟺ i ∈ ∂j ∪ {j}; see, for example, Rue and Held (2005), section 2.2. Algorithms for Markov chain Monte Carlo sampling will repeatedly update from these simple full conditionals, which explains to a large extent the popularity of GMRFs in recent years, starting already with the seminal papers by Besag (1974, 1975). However, GMRFs also allow for fast direct numerical algorithms (Rue, 2001), as numerical factorization of the matrix Q can be done by using sparse matrix algorithms (George and Liu, 1981; Duff et al., 1989; Davis, 2006) at a typical cost of O(n^{3/2}) for two-dimensional GMRFs; see Rue and Held (2005) for detailed algorithms. GMRFs have very good computational properties, which are of major importance in Bayesian inferential methods. This is further enhanced by the link to integrated nested Laplace approximations (Rue et al., 2009), which allow fast and accurate Bayesian inference for latent GF models.
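The cost contrast between dense covariance and sparse precision computations is easy to illustrate numerically. The following Python sketch (our own toy illustration, not code or a model from the paper; the lattice precision matrix and all parameter values are arbitrary choices) builds a sparse, symmetric positive definite precision matrix for a GMRF on a square lattice and factorizes it with a sparse LU decomposition:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Toy GMRF precision on an m x m lattice: a discrete Laplacian plus a
# diagonal term, giving a sparse symmetric positive definite matrix.
m = 30
n = m * m
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
I = sp.identity(m)
Q = (sp.kron(I, T) + sp.kron(T, I) + 0.5 * sp.identity(n)).tocsc()

# Only a tiny fraction of the n^2 entries are non-zero.
print(Q.nnz / n**2)

# Sparse LU factorization: the factor is computed once and reused for
# repeated solves, far cheaper than dense O(n^3) factorization.
lu = splu(Q)
b = np.ones(n)
x = lu.solve(b)
print(np.abs(Q @ x - b).max())
```

The same factor object can be reused for every right-hand side, which is what makes repeated computations in simulation-based model fitting feasible.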
Although GMRFs have very good computational properties, there are reasons why current
statistical models based on GMRFs are relatively simple, in particular when applied to area data
Link between Gaussian Fields and Gaussian Markov Random Fields 3
from regions or counties. First, there has been no good way to parameterize the precision matrix of a GMRF to achieve predefined behaviour in terms of correlation between two sites and to control marginal variances. In matrix terms, the reason for this is that one must construct a positive definite precision matrix to obtain a positive definite covariance matrix as its inverse, so the conditions for proper covariance matrices are replaced by essentially equivalent conditions for sparse precision matrices. Therefore, simplistic approaches are often taken, like letting Q_ij be related to the reciprocal distance between sites i and j (Besag et al., 1991; Arjas and Gasbarra, 1996; Weir and Pettitt, 2000; Pettitt et al., 2002; Gschlößl and Czado, 2007); however, a more detailed analysis shows that such a rationale is suboptimal (Besag and Kooperberg, 1995; Rue and Tjelmeland, 2002) and can give surprising effects (Wall, 2004). Secondly, it is unclear how large the class of useful GMRF models really is when using only a simple neighbourhood. The complicating issue here is the global positive definiteness constraint, and it might not be evident how this influences the parameterization of the full conditionals.
Rue and Tjelmeland (2002) demonstrated empirically that GMRFs could closely approximate most of the commonly used covariance functions in geostatistics, and they proposed to use them as computational replacements for GFs, for example when doing kriging (Hartman and Hössjer, 2008). However, there were several drawbacks with their approach: first, the fitting of GMRFs to GFs was restricted to a regular lattice (or torus), and the fit itself had to be precomputed for a discrete set of parameter values (like smoothness and range), using a time-consuming numerical optimization. Despite these proof-of-concept results, several researchers have followed up this idea without any notable progress in the methodology (Hrafnkelsson and Cressie, 2003; Song et al., 2008; Cressie and Verzelen, 2008), but the approach itself has proved useful even for spatiotemporal models (Allcroft and Glasbey, 2003).
The discussion so far suggests the following modelling and computational strategy for approaching the big n problem.

(a) Do the modelling by using a GF on a set of locations {s_i}, to construct a discretized GF with covariance matrix Σ.
(b) Find a GMRF with local neighbourhood and precision matrix Q that represents the GF in the best possible way, i.e. Q^{−1} is close to Σ in some norm. (We deliberately use the word 'represents' instead of 'approximates'.)
(c) Do the computations using the GMRF representation by using numerical methods for sparse matrices.
Such an approach relies on several assumptions. First, the GF must be of such a type that there is a GMRF with local neighbourhood that can represent it sufficiently accurately to maintain the interpretation of the parameters and the results. Secondly, we must be able to compute the GMRF representation from the GF, at any collection of locations, so fast that we still achieve a considerable speed-up compared with treating the GF directly.

The purpose of this paper is to demonstrate that these requirements can indeed be met for certain members of the class of GFs with the Matérn covariance function in ℝ^d, where the GMRF representation is available explicitly. Although these results are seemingly restrictive at first sight, they cover the most important and most used covariance model in spatial statistics; see Stein (1999), page 14, which concluded a detailed theoretical analysis with 'Use the Matérn model'. The GMRF representation can be constructed explicitly by using a certain stochastic partial differential equation (SPDE) which has GFs with the Matérn covariance function as its solution when driven by Gaussian white noise. The result is a basis function representation with piecewise linear basis functions, and Gaussian weights with Markov dependences determined by a general triangulation of the domain.
Rather surprisingly, extending this basic result seems to open new doors and opportunities, and to provide quite simple answers to rather difficult modelling problems. In particular, we shall show how this approach extends to Matérn fields on manifolds, non-stationary fields and fields with oscillating covariance functions. Further, we shall discuss the link to the deformation method of Sampson and Guttorp (1992) for non-stationary covariances for non-isotropic models, and how our approach naturally extends to non-separable space-time models. Our basic task, to do the modelling by using GFs and the computations by using the GMRF representation, still holds for these extensions, as the GMRF representation is still available explicitly. An important observation is that the resulting modelling strategy does not involve having to construct explicit formulae for the covariance functions, which are instead only defined implicitly through the SPDE specifications.

The plan of the rest of this paper is as follows. In Section 2, we discuss the relationship between Matérn covariances and a specific SPDE, and we present the two main results for explicitly constructing the precision matrices for GMRFs based on this relationship. In Section 3, the results are extended to fields on triangulated manifolds, non-stationary and oscillating models, and non-separable space-time models. The extensions are illustrated with a non-stationary analysis of global temperature data in Section 4, and we conclude the main part of the paper with a discussion in Section 5. There then follow four technical appendices, with explicit representation results (A), theory for random fields on manifolds (B), the Hilbert space representation details (C) and proofs of the technical details (D).
2. Preliminaries and main results
This section will introduce the Matérn covariance model and discuss its representation through an SPDE. We shall state explicit results for the GMRF representation of Matérn fields on a regular lattice and give an informal summary of the main results.
2.1. Matérn covariance model and its stochastic partial differential equation
Let ‖·‖ denote the Euclidean distance in ℝ^d. The Matérn covariance function between locations u, v ∈ ℝ^d is defined as

    r(u, v) = σ² / {2^{ν−1} Γ(ν)} (κ‖v − u‖)^ν K_ν(κ‖v − u‖).    (1)

Here K_ν is the modified Bessel function of the second kind and order ν > 0, κ > 0 is a scaling parameter and σ² is the marginal variance. The integer value of ν determines the mean-square differentiability of the underlying process, which matters for predictions that are made by using such a model. However, ν is usually fixed since it is poorly identified in typical applications. A more natural interpretation of the scaling parameter κ is as a range parameter ρ: the Euclidean distance at which x(u) and x(v) are almost independent. Lacking a simple relationship, we shall throughout this paper use the empirically derived definition ρ = √(8ν)/κ, corresponding to correlations near 0.1 at the distance ρ, for all ν.
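The correlation implied by (1) and the empirical range rule ρ = √(8ν)/κ are straightforward to evaluate numerically. The following Python sketch (our own illustration, not code from the paper; the helper name and the value κ = 3 are arbitrary) checks that the correlation at distance ρ is indeed near 0.1 for several values of ν:

```python
import numpy as np
from scipy.special import gamma, kv

def matern_corr(dist, kappa, nu):
    """Matern correlation 2^(1-nu)/Gamma(nu) (kappa d)^nu K_nu(kappa d), dist > 0."""
    kd = kappa * dist
    return 2.0 ** (1.0 - nu) / gamma(nu) * kd ** nu * kv(nu, kd)

kappa = 3.0
for nu in (0.5, 1.0, 2.0):
    rho = np.sqrt(8.0 * nu) / kappa          # empirically derived range
    print(nu, float(matern_corr(rho, kappa, nu)))
```

For ν = 1/2 the formula reduces to the exponential correlation exp(−κd), for which the value at ρ is exp(−√4) ≈ 0.135, illustrating why the rule is stated as "correlations near 0.1" rather than exactly 0.1.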
The Matérn covariance function appears naturally in various scientific fields (Guttorp and Gneiting, 2006), but the important relationship that we shall make use of is that a GF x(u) with the Matérn covariance is a solution to the linear fractional SPDE

    (κ² − Δ)^{α/2} x(u) = W(u),   u ∈ ℝ^d,   α = ν + d/2,   κ > 0,   ν > 0,    (2)

where (κ² − Δ)^{α/2} is a pseudo-differential operator that we shall define later in equation (4) through its spectral properties (Whittle, 1954, 1963). The innovation process W is spatial
Gaussian white noise with unit variance, Δ is the Laplacian

    Δ = Σ_{i=1}^{d} ∂²/∂x_i²,

and the marginal variance is

    σ² = Γ(ν) / {Γ(ν + d/2) (4π)^{d/2} κ^{2ν}}.
We shall name any solution to equation (2) a Matérn field in what follows. However, the limiting solutions to the SPDE (2) as κ → 0 or ν → 0 do not have Matérn covariance functions, but the SPDE still has solutions when κ = 0 or ν = 0 which are well-defined random measures. We shall return to this issue in Appendix C.3. Further, there is an implicit assumption of appropriate boundary conditions for the SPDE, as for α ≥ 2 the null space of the differential operator is non-trivial, containing, for example, the functions exp(κ e^T u), for all ‖e‖ = 1. The Matérn fields are the only stationary solutions to the SPDE.
The proof that was given by Whittle (1954, 1963) is to show that the wave number spectrum of a stationary solution is

    R(k) = (2π)^{−d} (κ² + ‖k‖²)^{−α},    (3)

using the Fourier transform definition of the fractional Laplacian in ℝ^d,

    {F (κ² − Δ)^{α/2} φ}(k) = (κ² + ‖k‖²)^{α/2} (Fφ)(k),    (4)

where φ is a function on ℝ^d for which the right-hand side of the definition has a well-defined inverse Fourier transform.
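The relation between equation (3) and the Matérn covariance can be checked numerically in the simplest case d = 1, ν = 1/2 (so α = 1), where (1) reduces to σ² exp(−κ|h|) with σ² = 1/(2κ) from the marginal variance formula. The sketch below (our own check; the value κ = 2 and the truncation of the integral are arbitrary choices) inverts the Fourier transform of the spectrum numerically and compares with the closed form:

```python
import numpy as np

kappa = 2.0
# d = 1, nu = 1/2, alpha = 1: r(h) = sigma^2 exp(-kappa |h|), sigma^2 = 1/(2 kappa).
sigma2 = 1.0 / (2.0 * kappa)

def spectrum(k):
    # Equation (3) with d = 1 and alpha = 1.
    return (2.0 * np.pi) ** (-1) / (kappa ** 2 + k ** 2)

# Invert the Fourier transform numerically: r(h) = int R(k) exp(i k h) dk,
# truncating the integral to a large symmetric interval.
k = np.linspace(-2000.0, 2000.0, 2_000_001)
dk = k[1] - k[0]
results = {}
for h in (0.0, 0.5, 1.0):
    r_num = float(np.sum(spectrum(k) * np.cos(k * h)) * dk)
    r_exact = sigma2 * np.exp(-kappa * abs(h))
    results[h] = (r_num, r_exact)
    print(h, r_num, r_exact)
```

The small residual discrepancy comes from truncating the integration interval; the tail of (3) decays like ‖k‖^{−2α}.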
2.2. Main results
This section contains our main results, stated in a loose and imprecise form. In the appendices, our statements are made precise and the proofs are given. In the discussion we shall restrict ourselves to dimension d = 2, although our results are general.
2.2.1. Main result 1
For our first result, we shall use some hand-waving arguments and a simple but powerful consequence of a partly analytic result of Besag (1981). We shall show that these results are true in the appendices. Let x be a GMRF on a regular (tending to infinite) two-dimensional lattice indexed by ij, where the Gaussian full conditionals are

    E(x_ij | x_{−ij}) = (1/a)(x_{i−1,j} + x_{i+1,j} + x_{i,j−1} + x_{i,j+1}),
    var(x_ij | x_{−ij}) = 1/a,    (5)

and |a| > 4. To simplify the notation, we write this particular model as

    | −1
    |  a  −1    (6)

which displays the elements of the precision matrix related to a single location (section 3.4.2 in Rue and Held (2005) uses a related graphical notation). Owing to symmetry, we display only the upper right quadrant, with a as the central element. The approximate result (Besag (1981),
equation (14)) is that

    cov(x_ij, x_{i′j′}) ≈ a/(2π) K_0{l √(a − 4)},   l ≠ 0,

where l is the Euclidean distance between ij and i′j′. Evaluated for continuous distances, this is a generalized covariance function, which is obtained from equation (1) in the limit ν → 0, with κ² = a − 4 and σ² = a/(4π), even though equation (1) requires ν > 0. Informally, this means that the discrete model defined by expression (5) generates approximate solutions to the SPDE (2) on a unit distance regular grid, with ν = 0.
Solving equation (2) for α = 1 gives a generalized random field with spectrum

    R_1(k) ∝ (a − 4 + ‖k‖²)^{−1},

meaning that (some discretized version of) the SPDE acts like a linear filter with squared transfer function equal to R_1. If we replace the noise term on the right-hand side of equation (2) by Gaussian noise with spectrum R_1, the resulting solution has spectrum R_2 = R_1², and so on. The consequence is GMRF representations for the Matérn fields for ν = 1 and ν = 2, as convolutions of the coefficients in (6):

ν = 1:
    | 1
    | −2a     2
    | 4+a²   −2a   1

ν = 2:
    | −1
    | 3a        −3
    | −3(a²+3)   6a        −3
    | a(a²+12)  −3(a²+3)   3a   −1

The marginal variance is 1/{4πν(a − 4)^ν}.
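The claim that the ν = 1 and ν = 2 coefficients arise as convolutions of the stencil in (6) is easy to verify numerically. The sketch below (our own check; the value a = 5 is an arbitrary choice with |a| > 4) convolves the full α = 1 precision stencil with itself and reads off the resulting coefficients:

```python
import numpy as np
from scipy.signal import convolve2d

a = 5.0  # any |a| > 4
S1 = np.array([[0., -1., 0.],
               [-1.,  a, -1.],
               [0., -1., 0.]])   # full precision stencil of model (5)/(6)

S2 = convolve2d(S1, S1)   # nu = 1 stencil (5 x 5)
S3 = convolve2d(S2, S1)   # nu = 2 stencil (7 x 7)

print(S2[2, 2], a**2 + 4)        # centre: 4 + a^2
print(S2[2, 1], -2 * a)          # nearest neighbour: -2a
print(S2[1, 1], 2.0)             # diagonal: 2
print(S3[3, 3], a * (a**2 + 12)) # centre of the nu = 2 stencil
```

Each extra convolution widens the stencil by one lattice step in each direction, matching the growing GMRF neighbourhood as the field becomes smoother.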
2.3. Main result 2
The inner product between functions f and g is defined as

    ⟨f, g⟩ = ∫ f(u) g(u) du,    (7)

where the integral is over the region of interest. The stochastic weak solution of the SPDE is found by requiring that

    {⟨φ_j, (κ² − Δ)^{α/2} x⟩, j = 1, . . . , m}  =_d  {⟨φ_j, W⟩, j = 1, . . . , m}    (8)
Fig. 2. (a) Locations of leukaemia survival observations, (b) triangulation using 3446 triangles and (c) a stationary correlation function (full curve) and the corresponding GMRF approximation (dots) for ν = 1 and approximate range 0.26
for every appropriate finite set of test functions {φ_j(u), j = 1, . . . , m}, where =_d denotes equality in distribution.
The next step is to construct a finite element representation of the solution to the SPDE (Brenner and Scott, 2007) as

    x(u) = Σ_{k=1}^{n} ψ_k(u) w_k    (9)

for some chosen basis functions {ψ_k} and Gaussian-distributed weights {w_k}. Here, n is the number of vertices in the triangulation. We choose to use functions ψ_k that are piecewise linear in each triangle, defined such that ψ_k is 1 at vertex k and 0 at all other vertices. An interpretation of the representation (9) with this choice of basis functions is that the weights determine the values of the field at the vertices, and the values in the interior of the triangles are determined by linear interpolation. The full distribution of the continuously indexed solution is determined by the joint distribution of the weights.
The finite dimensional solution is obtained by finding the distribution for the representation weights in equation (9) that fulfils the stochastic weak SPDE formulation (8) for only a specific set of test functions, with m = n. The choice of test functions, in relation to the basis functions, governs the approximation properties of the resulting model representation. We choose φ_k = (κ² − Δ)^{1/2} ψ_k for α = 1 and φ_k = ψ_k for α = 2. These two approximations are denoted the least squares and the Galerkin solution respectively. For α ≥ 3, we let α = 2 on the left-hand side of equation (2) and replace the right-hand side with a field generated by α − 2, and let φ_k = ψ_k. In essence, this generates a recursive Galerkin formulation, terminating in either α = 1 or α = 2; see Appendix C for details.
Define the n × n matrices C, G and K_{κ²} with entries

    C_ij = ⟨ψ_i, ψ_j⟩,
    G_ij = ⟨∇ψ_i, ∇ψ_j⟩,
    (K_{κ²})_ij = κ² C_ij + G_ij.
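In ℝ¹, with hat functions on a uniform mesh of spacing h, these entries have simple closed forms from standard finite element theory: in the interior, C_ii = 2h/3, C_{i,i±1} = h/6, G_ii = 2/h and G_{i,i±1} = −1/h. The sketch below (our own one-dimensional illustration; the paper's explicit formulae for the triangulated case are in Appendix A) verifies these values by numerical quadrature:

```python
import numpy as np

h = 0.5
nodes = np.arange(0.0, 2.0 + h / 2, h)   # uniform 1-D mesh on [0, 2]

def psi(k, u):
    """Piecewise linear hat function centred at node k."""
    return np.maximum(0.0, 1.0 - np.abs(u - nodes[k]) / h)

def dpsi(k, u):
    """Derivative of the hat function (piecewise constant +-1/h)."""
    left = (u > nodes[k] - h) & (u < nodes[k])
    right = (u >= nodes[k]) & (u < nodes[k] + h)
    return left / h - right / h

# Riemann-sum quadrature on a fine grid.
u = np.linspace(0.0, 2.0, 400_001)
du = u[1] - u[0]
C_ii = np.sum(psi(2, u) * psi(2, u)) * du      # ~ 2h/3
C_i1 = np.sum(psi(2, u) * psi(3, u)) * du      # ~ h/6
G_ii = np.sum(dpsi(2, u) * dpsi(2, u)) * du    # ~ 2/h
G_i1 = np.sum(dpsi(2, u) * dpsi(3, u)) * du    # ~ -1/h
print(C_ii, 2 * h / 3)
print(C_i1, h / 6)
print(G_ii, 2 / h)
print(G_i1, -1 / h)
```

Because the basis functions overlap only with their neighbours, every other entry of C and G is exactly zero, which is what produces sparsity.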
Using Neumann boundary conditions (a zero normal derivative at the boundary), we obtain our second main result, expressed here for ℝ¹ and ℝ².
Result 2. Let Q_{α,κ²} be the precision matrix for the Gaussian weights w as defined in equation (9) for α = 1, 2, . . ., as a function of κ². Then the finite dimensional representations of the solutions to equation (2) have precisions

    Q_{1,κ²} = K_{κ²},
    Q_{2,κ²} = K_{κ²} C^{−1} K_{κ²},
    Q_{α,κ²} = K_{κ²} C^{−1} Q_{α−2,κ²} C^{−1} K_{κ²},   for α = 3, 4, . . . .    (10)
Some remarks concerning this result are as follows.

(a) The matrices C and G are easy to compute as their elements are non-zero only for pairs of basis functions which share common triangles (a line segment in ℝ¹), and their values do not depend on κ². Explicit formulae are given in Appendix A.
(b) The matrix C^{−1} is dense, which makes the precision matrix dense as well. In Appendix C.5, we show that C can be replaced by the diagonal matrix C̃, where C̃_ii = ⟨ψ_i, 1⟩, which makes the precision matrices sparse, and hence we obtain GMRF models.
(c) A consequence of the previous remarks is that we have an explicit mapping from the parameters of the GF model to the elements of a GMRF precision matrix, with computational cost O(n) for any triangulation.
(d) For the special case where all the vertices are points on a regular lattice, using a regular triangularization reduces main result 2 to main result 1. Note that the neighbourhood of the corresponding GMRF in ℝ² is 3 × 3 for α = 1, is 5 × 5 for α = 2, and so on. Increased smoothness of the random field induces a larger neighbourhood in the GMRF representation.
(e) In terms of the smoothness parameter ν in the Matérn covariance function, these results correspond to ν = 1/2, 3/2, 5/2, . . . in ℝ¹ and ν = 0, 1, 2, . . . in ℝ².
(f) We are currently unable to provide results for other values of α; the main obstacle is the fractional derivative in the SPDE, which is defined by using the Fourier transform (4). A result of Rozanov (1982), chapter 3.1, for the continuously indexed random field, says that a random field has a Markov property if and only if the reciprocal of the spectrum is a polynomial. For our SPDE (2) this corresponds to α = 1, 2, 3, . . .; see equation (3). This result indicates that a different approach may be needed to provide representation results when α is not an integer, such as approximating the spectrum itself. Given approximations for general 0 < α ≤ 2, the recursive approach could then be used for general α > 2.
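Result 2 together with remark (b) can be sanity checked end to end in ℝ¹, where α = 2 corresponds to ν = 3/2 and the Matérn correlation has the closed form (1 + κt) exp(−κt). The sketch below (our own construction; the mesh spacing, domain length and κ = 2 are arbitrary choices, and the lumped C̃ of remark (b) is used throughout) assembles K_{κ²}, forms Q_{2,κ²} = K_{κ²} C̃^{−1} K_{κ²} and compares the implied covariance, far from the boundary, with the exact Matérn values:

```python
import numpy as np

h, L, kappa = 0.05, 20.0, 2.0
n = int(round(L / h)) + 1

# 1-D FEM matrices for hat functions: lumped mass C~ and stiffness G,
# with Neumann boundary conditions.
Ct = np.full(n, h); Ct[0] = Ct[-1] = h / 2            # C~_ii = <psi_i, 1>
G = (np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h
G[0, 0] = G[-1, -1] = 1.0 / h
K = kappa**2 * np.diag(Ct) + G                        # K = kappa^2 C~ + G

Q2 = K @ np.diag(1.0 / Ct) @ K                        # Q_{2,kappa^2}
Sigma = np.linalg.inv(Q2)

m = n // 2                                            # node far from the boundary
var_num = Sigma[m, m]
var_exact = 1.0 / (4.0 * kappa**3)                    # sigma^2 for nu = 3/2, d = 1
t = 0.5
j = m + int(round(t / h))
corr_num = Sigma[m, j] / np.sqrt(Sigma[m, m] * Sigma[j, j])
corr_exact = (1.0 + kappa * t) * np.exp(-kappa * t)   # Matern correlation, nu = 3/2
print(var_num, var_exact)
print(corr_num, corr_exact)
```

The dense inverse here is only for checking; in practice one would keep Q sparse and never form Σ.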
Although the approach does give a GMRF representation of the Matérn field on the triangulated region, it is truly an approximation to the stochastic weak solution, as we use only a subset of the possible test functions. However, for a given triangulation, it is the best possible approximation in the sense that is made explicit in Appendix C, where we also show weak convergence to the full SPDE solutions. Using standard results from the finite element literature (Brenner and Scott, 2007), it is also possible to derive rates of convergence results, like, for α = 2,

    sup_{f ∈ H¹; ‖f‖_{H¹} ≤ 1} [E{⟨f, x_n − x⟩²_{H¹}}] ≤ c h².    (11)
Here, x_n is the GMRF representation of the SPDE solution x, h is the diameter of the largest circle that can be inscribed in a triangle in the triangulation, and c is some constant. The Hilbert space scalar product and norm are defined in definition 2 in Appendix B, which also includes the values and the gradients of the field. The result holds for general d ≥ 1, with h proportional to the edge lengths between the vertices, when the minimal mesh angles are bounded away from zero.
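An informal numerical proxy for this O(h²) behaviour is to track a pointwise quantity, such as the nodal variance at the centre of a one-dimensional domain, as the mesh is refined (our own check, not the H¹ statement of (11) itself; mesh sizes, domain and κ are arbitrary choices, and the lumped C̃ of remark (b) is used):

```python
import numpy as np

def centre_variance(h, L=20.0, kappa=2.0):
    """Nodal variance at the domain centre for the alpha = 2, d = 1 GMRF of result 2."""
    n = int(round(L / h)) + 1
    Ct = np.full(n, h); Ct[0] = Ct[-1] = h / 2
    G = (np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h
    G[0, 0] = G[-1, -1] = 1.0 / h
    K = kappa**2 * np.diag(Ct) + G
    Sigma = np.linalg.inv(K @ np.diag(1.0 / Ct) @ K)
    return Sigma[n // 2, n // 2]

exact = 1.0 / (4.0 * 2.0**3)             # 1/(4 kappa^3): nu = 3/2 variance in R^1
err_coarse = abs(centre_variance(0.2) - exact)
err_fine = abs(centre_variance(0.1) - exact)
print(err_coarse, err_fine)              # error shrinks roughly like h^2
```

Halving h should reduce the error by roughly a factor of four if the second-order rate holds.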
To see how well we can approximate the Matérn covariance, Fig. 2(c) displays the empirical correlation function (dots) and the theoretical function for ν = 1 with approximate range 0.26, using the triangulation in Fig. 2(b). The match is quite good. Some dots show a discrepancy from the true correlations, but these can be attributed to the rather rough triangulation outside the area of interest, which is included to reduce edge effects. In practice there is a trade-off between accuracy of the GMRF representation and the number of vertices used. In Fig. 2(b) we chose to use a fine resolution in the study area and a reduced resolution outside. A minor drawback in using these GMRFs in place of given stationary covariance models is the boundary effects due to the boundary conditions of the SPDE. In main result 2 we used Neumann conditions that inflate the variance near the boundary (see Appendix A.4 for details), but other choices are also possible (see Rue and Held (2005), chapter 5).
2.4. Leukaemia example
We shall now return to the example from Henderson et al. (2002) at the beginning of Section 2.3, which models spatial variation in leukaemia survival data in north-west England. The specification, in (pseudo) Wilkinson-Rogers notation (McCullagh and Nelder (1989), section 3.4), is

    survival(time, censoring) ~ intercept + sex + age + wbc + tpi + spatial(location),

using a Weibull likelihood for the survival times, and where wbc is the white blood cell count at diagnosis, tpi is the Townsend deprivation index (which is a measure of economic deprivation for the related district) and spatial is the spatial component depending on the spatial location for each measurement. The hyperparameters in this model are the marginal variance and range for the spatial component and the shape parameter in the Weibull distribution.

Kneib and Fahrmeir (2007) reanalysed the same data set by using a Cox proportional hazards model but, for computational reasons, used a low rank approximation for the spatial component. With our GMRF representation we easily work with a sparse 1749 × 1749 precision matrix for the spatial component. We ran the model in R-inla (www.r-inla.org) using integrated nested Laplace approximations to do the full Bayesian analysis (Rue et al., 2009). Fig. 3 displays the posterior mean and standard deviation of the spatial component. A full Bayesian analysis
Fig. 3. (a) Posterior mean and (b) standard deviation of the spatial effect on survival by using the GMRF representation
took about 16 s on a quad-core laptop, and factorizing the 2797 × 2797 (total) precision matrix took about 0.016 s on average.
3. Extensions: beyond classical Matérn models
In this section we shall discuss five extensions to the SPDE, widening the usefulness of the GMRF construction results in various ways. The first extension is to consider solutions to the SPDE on a manifold, which allows us to define Matérn fields on domains such as a sphere. The second extension is to allow for space-varying parameters in the SPDE, which allows us to construct non-stationary locally isotropic GFs. The third extension is to study a complex version of equation (2), which makes it possible to construct oscillating fields. The fourth extension generalizes the non-stationary SPDE to a more general class of non-isotropic fields. Finally, the fifth extension shows how the SPDE generalizes to non-separable space-time models.

An important feature of our approach is that all these extensions still give explicit GMRF representations similar to expressions (9) and (10), even if all the extensions are combined. The rather amazing consequence is that we can construct the GMRF representations of non-stationary oscillating GFs on the sphere, still not requiring any computation beyond the geometric properties of the triangulation. In Section 4 we shall illustrate the use of these extensions with a non-stationary model for global temperatures.
3.1. Matérn fields on manifolds
We shall now move away from ℝ² and consider Matérn fields on manifolds. GFs on manifolds are a well-studied subject with important applications to excursion sets in brain mapping (Adler and Taylor, 2007; Bansal et al., 2007; Adler, 2009). Our main objective is to construct Matérn fields on the sphere, which is important for the analysis of global spatial and spatiotemporal models. To simplify the current discussion we shall therefore restrict the construction of Matérn fields to a unit radius sphere S² in three dimensions, leaving the general case for the appendices.

Just as for ℝ^d, models on a sphere can be constructed via a spectral approach (Jones, 1963). A more direct way of defining covariance models on a sphere is to interpret the two-dimensional space S² as a surface embedded in ℝ³. Any three-dimensional covariance function can then be used to define the model on the sphere, considering only the restriction of the function to the surface. This has the interpretational disadvantage of using chordal distances to determine the correlation between points. Using the great circle distances in the original covariance function would not work in general, since for differentiable fields this does not yield a valid positive definite covariance function (this follows from Gneiting (1998), theorem 2). Thus, the Matérn covariance function in ℝ^d cannot be used to define GFs on a unit sphere embedded in ℝ³ with distance naturally defined with respect to distances within the surface. However, we can still use its origin, the SPDE! For this purpose, we simply reinterpret the SPDE to be defined on S² instead of ℝ^d, and the solution is still what we mean by a Matérn field, but defined directly for the given manifold. The Gaussian white noise which drives the SPDE can easily be defined on S² as a (zero-mean) random GF W(·) with the property that the covariance between W(A) and W(B), for any subsets A and B of S², is proportional to the surface integral over A ∩ B. Any regular 2-manifold behaves locally like ℝ², which heuristically explains why the GMRF representation of the weak solution only needs the definition of the inner product (7) changed to a surface integral on S². The theory in Appendices B-D covers the general manifold setting.
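The geometric ingredient that changes on the sphere is thus the surface integral in the inner product; for a triangulated sphere this amounts to working with spherical triangle areas. The sketch below (our own illustration; the icosahedral mesh is an arbitrary example) computes the exact solid angle of each triangle of a triangulated S² with the Van Oosterom-Strackee formula and checks that the areas sum to the total surface area 4π:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Vertices of a regular icosahedron, projected onto the unit sphere S^2.
p = (1 + np.sqrt(5)) / 2
V = np.array([[-1, p, 0], [1, p, 0], [-1, -p, 0], [1, -p, 0],
              [0, -1, p], [0, 1, p], [0, -1, -p], [0, 1, -p],
              [p, 0, -1], [p, 0, 1], [-p, 0, -1], [-p, 0, 1]], float)
V /= np.linalg.norm(V, axis=1, keepdims=True)

faces = ConvexHull(V).simplices          # the 20 triangular faces

def spherical_area(a, b, c):
    """Solid angle of a spherical triangle (Van Oosterom & Strackee formula)."""
    num = abs(np.linalg.det(np.stack([a, b, c])))
    den = 1.0 + a @ b + b @ c + c @ a
    return 2.0 * np.arctan2(num, den)

areas = np.array([spherical_area(*V[f]) for f in faces])
print(len(faces), areas.sum(), 4 * np.pi)   # areas sum to the sphere's area
```

Such per-triangle areas are exactly the kind of geometric quantity from which lumped mass entries ⟨ψ_i, 1⟩ on the sphere can be accumulated.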
To illustrate the continuous index definition and the Markov representation of Matérn fields on a sphere, Fig. 4 shows the locations of 7280 meteorological measurement stations on the globe, together with an irregular triangulation. The triangulation was constrained to have minimal angles 21°.

Fig. 4. (a), (b) Data locations and (c), (d) triangulation for the global temperature data set analysed in Section 4, with a coastline map superimposed
Fig. 6. (a) Structure of the (reordered) 15182 × 15182 precision matrix and (b) a visual representation of the reordering: the indices of each triangulation node have been mapped to grey scales showing the governing principle of the reordering algorithm, recursively dividing the graph into conditionally independent sets
3.2. Non-stationary fields
From a traditional point of view, the most surprising extension within the SPDE framework is how we can model non-stationarity. Many applications require non-stationarity in the correlation function and there is a vast literature on this subject (Sampson and Guttorp, 1992; Higdon, 1998; Hughes-Oliver et al., 1998; Cressie and Huang, 1999; Higdon et al., 1999; Fuentes, 2001; Gneiting, 2002; Stein, 2005; Paciorek and Schervish, 2006; Jun and Stein, 2008; Yue and Speckman, 2010). The SPDE approach has the additional huge advantage that the resulting
(non-stationary) GF is a GMRF, which allows for swift computations and can additionally be defined on a manifold.
In the SPDE defined in equation (2), the parameters κ² and the innovation variance are constant in space. In general, we can allow both parameters to depend on the coordinate u, and we write

{κ²(u) − Δ}^{α/2} {τ(u) x(u)} = W(u).    (12)
For simplicity, we choose to keep the variance for the innovation constant and instead scale the resulting process x(u) with a scaling parameter τ(u). Non-stationarity is achieved when one or both parameters are non-constant. Of particular interest is the case where they vary slowly with u, e.g. in a low dimensional representation like

log{κ²(u)} = Σ_i β_i^{(κ²)} B_i^{(κ²)}(u)   and   log{τ(u)} = Σ_i β_i^{(τ)} B_i^{(τ)}(u),

where the basis functions {B_i^{(·)}(·)} are smooth over the domain of interest. With slowly varying parameters κ²(u) and τ(u), the appealing local interpretation of equation (12) as a Matérn field remains unchanged, whereas the actual form of the non-stationary correlation function achieved is unknown. The actual process of combining all local Matérn fields into a consistent global field is done automatically by the SPDE.
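As an illustrative sketch (not from the paper), the log-linear parameterization can be evaluated as follows; the Gaussian-bump basis functions and all numerical values below are hypothetical choices, since the paper only requires the B_i^{(·)} to be smooth over the domain:

```python
import math

def make_log_linear_field(betas, centers, width):
    """Evaluate a positive parameter field such as kappa^2(u) or tau(u)
    through log{theta(u)} = sum_i beta_i * B_i(u), with smooth basis
    functions B_i (here: illustrative Gaussian bumps)."""
    def theta(u):
        log_val = sum(b * math.exp(-((u - c) / width) ** 2)
                      for b, c in zip(betas, centers))
        return math.exp(log_val)   # positivity is automatic on the log scale
    return theta

kappa2 = make_log_linear_field([0.5, -0.5], centers=[0.0, 1.0], width=0.5)
```

Because the expansion is on the log scale, the resulting field is positive by construction, and smoothness of the basis functions carries over to slow variation of the parameter.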
The GMRF representation of equation (12) is found by using the same approach as for the stationary case, with minor changes. For convenience, we assume that both κ² and τ can be considered as constant within the support of the basis functions {ψ_k}, and hence

(ψ_i, κ² ψ_j)_Ω = ∫ ψ_i(u) ψ_j(u) κ²(u) du ≈ C_ij κ²(u_j*)    (13)

for a naturally defined u_j* in the support of ψ_i and ψ_j. The consequence is a simple scaling of the matrices in expression (10) at no additional cost; see Appendix A.3. If we improve the integral approximation (13) from considering κ²(u) locally constant to locally planar, the computational preprocessing cost increases but is still O(1) for each element in the precision matrix Q.
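The locally constant approximation in equation (13) can be checked numerically in one dimension; the piecewise linear basis, the node spacing and the choice κ²(u) = exp(0.3u) below are illustrative assumptions:

```python
import math

def hat(u, c, h):
    """Piecewise linear basis function centred at node c with spacing h."""
    return max(0.0, 1.0 - abs(u - c) / h)

def quad(f, a, b, n=2000):
    """Midpoint-rule quadrature, accurate enough for this illustration."""
    w = (b - a) / n
    return sum(f(a + (k + 0.5) * w) for k in range(n)) * w

h = 0.1
kappa2 = lambda u: math.exp(0.3 * u)      # a slowly varying kappa^2(u)
ci, cj = 0.5, 0.6                         # two neighbouring nodes
# the two hats overlap only on [ci, cj]
exact = quad(lambda u: hat(u, ci, h) * hat(u, cj, h) * kappa2(u), ci, cj)
C_ij = quad(lambda u: hat(u, ci, h) * hat(u, cj, h), ci, cj)
approx = C_ij * kappa2(0.5 * (ci + cj))   # C_ij * kappa^2(u*), overlap midpoint
```

The relative error is second order in the variation of κ² over the overlap, which is exactly why the approximation is adequate for slowly varying parameters.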
3.3. Oscillating covariance functions
Another extension is to consider a complex version of the basic equation (2). For simplicity, we consider only the case α = 2. With innovation processes W₁ and W₂ as two independent white noise fields, and an oscillation parameter θ, the complex version becomes

{κ² exp(iπθ) − Δ}{x₁(u) + i x₂(u)} = W₁(u) + i W₂(u),   0 ≤ θ < 1.    (14)

The real and imaginary stationary solution components x₁ and x₂ are independent, with spectral densities

R(k) = (2π)^{−d} {κ⁴ + 2 cos(πθ) κ² |k|² + |k|⁴}^{−1}

on R^d. The corresponding covariance functions for R and R² are given in Appendix A. For general manifolds, no closed form expression can be found. In Fig. 7, we illustrate the resonance effects obtained for compact domains by comparing oscillating covariances for R² and the unit sphere, S². The precision matrices for the resulting fields are obtained by a simple modification of the construction for the regular case, with the precise expression given in Appendix A. The details
of the construction for the regular case, the precise expression given in Appendix A. The details
Link between Gaussian Fields and Gaussian Markov Random Fields 15
0 1 2 3 4 5
0
.
5
0
.
0
0
.
5
1
.
0
Distance
C
o
r
r
e
l
a
t
i
o
n
0.0 0.5 1.0 1.5 2.0 2.5 3.0
1
.
0
0
.
5
0
.
0
0
.
5
1
.
0
Distance
C
o
r
r
e
l
a
t
i
o
n
(a) (b)
Fig. 7. Correlation functions from oscillating SPDE models, for D0, 0:1, . . . , 1, on (b) R
2
and (b) S
2
, with
2
D12, D1
of the construction, which are given in Appendix C.4, also reveal the possibility of multivariate fields, similar to Gneiting et al. (2010).
For θ = 0, the regular Matérn covariance with ν = 2 − d/2 is recovered, with oscillations increasing with θ. The limiting case θ = 1 generates intrinsic stationary random fields, on R^d invariant to addition of cosine functions of arbitrary direction, with wave number κ.
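A quick numerical sanity check of the spectral density above: for θ = 0 the denominator factors as (κ² + |k|²)², recovering the α = 2 Matérn spectrum (a sketch in Python; the parameter values are arbitrary):

```python
import math

def osc_spectrum(k, kappa, theta, d=2):
    """Spectral density of the oscillating model (alpha = 2):
    R(k) = (2 pi)^-d {kappa^4 + 2 cos(pi theta) kappa^2 |k|^2 + |k|^4}^-1."""
    q = kappa**4 + 2 * math.cos(math.pi * theta) * kappa**2 * k**2 + k**4
    return (2 * math.pi) ** (-d) / q

def matern_spectrum(k, kappa, d=2):
    """Matern spectrum for alpha = 2: R(k) = (2 pi)^-d (kappa^2 + |k|^2)^-2."""
    return (2 * math.pi) ** (-d) * (kappa**2 + k**2) ** (-2)

# theta = 0 gives kappa^4 + 2 kappa^2 |k|^2 + |k|^4 = (kappa^2 + |k|^2)^2,
# so the two spectra coincide exactly
```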
3.4. Non-isotropic models and spatial deformations
The non-stationary model that was defined in Section 3.2 has locally isotropic correlations, despite having globally non-stationary correlations. This can be relaxed by widening the class of SPDEs considered, allowing a non-isotropic Laplacian, and also by including a directional derivative term. This also provides a link to the deformation method for non-stationary covariances that was introduced by Sampson and Guttorp (1992).
In the deformation method, the domain is deformed into a space where the field is stationary, resulting in a non-stationary covariance model in the original domain. Using the link to SPDE models, the resulting model can be interpreted as a non-stationary SPDE in the original domain.
For notational simplicity, assume that the deformation is between two d-manifolds, Ω ⊂ R^d and Ω̃ ⊂ R^d, with ũ = f(u), u ∈ Ω, ũ ∈ Ω̃. Restricting to the case α = 2, consider the stationary SPDE on the deformed space Ω̃,

(κ² − Δ̃) x̃(ũ) = W̃(ũ),    (15)
generating a stationary Matérn field. A change of variables onto the undeformed space yields (Smith, 1934)

(1/det{F(u)}) [κ² det{F(u)} − ∇ · (F(u) F(u)ᵀ / det{F(u)}) ∇] x(u) = det{F(u)}^{−1/2} W(u),    (16)

where F(u) is the Jacobian of the deformation function f. This non-stationary SPDE exactly reproduces the deformation method with Matérn covariances (Sampson and Guttorp, 1992). A sparse GMRF approximation can be constructed by using the same principles as for the simpler non-stationary model in Section 3.2.
An important remark is that the parameters of the resulting SPDE do not depend directly on the deformation function itself, but only on its Jacobian. A possible option for parameterizing the model without explicit construction of a deformation function is to control the major axis of the local deformation given by F(u) through a vector field, given either from covariate information or as a weighted sum of vector basis functions. Addition or subtraction of a directional derivative term further generalizes the model. Allowing all parameters, including the variance of the white noise, to vary across the domain results in a very general non-stationary model that includes both the deformation method and the model in Section 3.2. The model class can be interpreted as changes of metric in Riemannian manifolds, which is a natural generalization of deformation between domains embedded in Euclidean spaces. A full analysis is beyond the scope of this paper, but the technical appendices cover much of the necessary theory.
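A minimal sketch of the deformation idea, assuming a hypothetical one-dimensional deformation f(u) = arctan(u) and the ν = 1/2 (exponential) member of the Matérn family; neither choice comes from the paper:

```python
import math

def r_exp(h):
    """Stationary Matern correlation with nu = 1/2 and kappa = 1 (exponential),
    used as the stationary model in the deformed space."""
    return math.exp(-abs(h))

def f(u):
    """A hypothetical deformation function; only its Jacobian
    F(u) = f'(u) = 1/(1 + u^2) enters the equivalent SPDE."""
    return math.atan(u)

def deformed_corr(u, v):
    """Deformation method: a stationary correlation in f-coordinates
    induces a non-stationary correlation in the original coordinates."""
    return r_exp(f(v) - f(u))
```

The same coordinate separation |u − v| = 1 yields weaker correlation near the origin, where this particular deformation stretches distances, than far from it, where it compresses them.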
3.5. Non-separable space–time models
A separable space–time covariance function can be characterized as having a spectrum that can be written as a product or sum of spectra in only space or time. In contrast, a non-separable model can have interaction between the space and time dependence structures. Whereas it is difficult to construct non-separable non-stationary covariance functions explicitly, non-separable SPDE models can be obtained with relative ease, using locally specified parameters. Arguably, the most simple non-separable SPDE that can be applied to the GMRF method is the transport and diffusion equation

{∂/∂t + (κ² + m·∇ − ∇·H∇)} x(u, t) = E(u, t),

where m is a transport direction vector, H a diffusion matrix and E(u, t) a noise process.

The model parameters are collected in θ = {..., θ_s, θ_ε}, and we denote the yearly temperature fields x = {x_t} and the yearly observations y = {y_t},
with t = 1970, ..., 1989. Using basis function matrices, the prior distribution for the climate field μ is chosen as approximate solutions to the SPDE (κ² − Δ)μ(u) = W(u). The observations are modelled as

(y_t | x, θ) ∼ N(A_t x_t + S_t θ_s,  Q_{y|x,θ}⁻¹),

where S_t θ_s are station-specific effects and Q_{y|x,θ} = I exp(θ_ε) is the observation precision. Since we use the data only for illustrative purposes here, we shall ignore all station-specific effects except for elevation. We also ignore any remaining residual dependences between consecutive years, analysing only the marginal distribution properties of each year.
The Bayesian analysis draws all its conclusions from the properties of the posterior distributions π(θ|y) and π(x|y), so all uncertainty about the weather x_t is included in the distribution for the model parameters θ, and conversely for θ and x_t. One of the most important steps is how to determine the conditional distribution for the weather given observations and model parameters,
π(x_t | y_t, θ) = N{μ_{x|θ} + Q_{x|y,θ}⁻¹ A_tᵀ Q_{y|x,θ} (y_t − A_t μ_{x|θ} − S_t θ_s),  Q_{x|y,θ}⁻¹},

where Q_{x|y,θ} = Q_{x|θ} + A_tᵀ Q_{y|x,θ} A_t is the conditional precision, and the expectation is the kriging estimator of x_t. Owing to the compact support of the basis functions, which is determined by the triangulation, each observation depends on at most three neighbouring nodes in x_t, which makes the conditional precision have the same sparsity structure as the field precisions Q_{x|θ}.
The computational cost of the kriging estimates is O(n) in the number of observations, and approximately O(n^{3/2}) in the number of basis functions. If basis functions with non-compact support had been used, such as a Fourier basis, the posterior precisions would have been fully dense matrices, with computational cost O(n³) in the number of basis functions, regardless of the sparsity of the prior precisions. This shows that, when constructing computationally efficient models, it is not enough to consider the theoretical properties of the prior model; instead the whole sequence of computations needs to be taken into account.
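The sparsity argument can be illustrated with a toy computation; the tridiagonal prior, the observation weights and the precision value below are all arbitrary illustration values, not taken from the paper:

```python
def matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def nnz(M, tol=1e-12):
    return sum(abs(x) > tol for row in M for x in row)

n = 8
# tridiagonal prior precision Q_x, as for a 1-d GMRF
Qx = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
       for j in range(n)] for i in range(n)]
# each observation touches only 2 neighbouring nodes (compact basis support)
A = [[0.0] * n for _ in range(3)]
A[0][1], A[0][2] = 0.4, 0.6
A[1][4], A[1][5] = 0.7, 0.3
A[2][6], A[2][7] = 0.4, 0.6
tau = 4.0                                   # observation precision, tau * I
AtA = matmul(transpose(A), A)
Q_post = [[Qx[i][j] + tau * AtA[i][j] for j in range(n)] for i in range(n)]
# A^T Q_{y|x} A only couples pairs of nodes hit by a common observation,
# so the posterior precision keeps the sparsity pattern of the prior
```

With a non-compact (e.g. Fourier) basis, every row of A would be dense and AᵀA, hence the posterior precision, would fill in completely.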
4.3. Results
We implemented the model by using R-inla. Since π(x|y, θ) is Gaussian, the results are only approximate with regard to the numerical integration over the covariance parameters θ. Owing to the large size of the data set, this initial analysis is based on data only from the period 1970–1989, requiring 336 960 nodes in a joint model for the yearly temperature fields, measurements and linear covariate parameters, with 15 182 nodes in each field, and the number of observations in each year ranging between approximately 1300 and 1900, for each year including all stations with no missing monthly values. The full Bayesian analysis took about 1 h to compute on a 12-core computer, with a peak memory use of about 50 Gbytes during the parallel numerical integration phase. This is a notable improvement over earlier work by Das (2000), where partial estimation of the parameters in a deformation-based covariance model of the type in Section 3.4 took more than a week on a supercomputer.
The 95% credible interval for the measurement standard deviation, including local unmodelled effects, was calculated as (0.628, 0.650) °C, with posterior expectation 0.634 °C. The spatial covariance parameters are more difficult to interpret individually, but we instead show the resulting spatially varying field standard deviations and correlation ranges in Fig. 8, including pointwise 95% credible intervals. Both curves show a clear dependence on latitude, with both larger variance and correlation range near the poles, compared with the equator. The standard deviations range between 1.2 and 2.6 °C, and the correlation ranges vary between 1175 and 2825 km. There is an asymmetric north–south pole effect for the variances, but a symmetric curve is admissible in the credible intervals.
Evaluating the estimated climate and weather for a period of only 20 years is difficult, since climate is typically defined as averages over periods of 30 years. Also, the spherical harmonics that were used for the climate model are not of sufficiently high order to capture all regional effects. To alleviate these problems, we base the presentation on what can reasonably be called the empirical climate and weather anomalies for the period 1970–1989, in effect using the period average as reference. Thus, instead of evaluating the distributions of π(μ|y) and π(x_t|y), we consider π(x̄|y) and π(x_t − x̄|y), where x̄ = Σ_{t=1970}^{1989} x_t / 20. In Figs 9(a) and 9(b), the posterior expectation of the empirical climate, E(x̄|y), is shown (including the estimated effect of elevation), together with the posterior expectation of the temperature anomaly for 1980, E(x_{1980} − x̄|y). The corresponding standard deviations are shown in Figs 9(c) and 9(d). As expected, the temperatures are low near the poles and high near the equator, and some of the relative warming effect of the thermohaline circulation on the Alaska and northern European climates can also be seen. There is a clear effect of regional topography, showing cold areas for high elevations such as in the Himalayas, Andes and Rocky Mountains, as indicated by an
Fig. 9. Posterior means for (a) the empirical 1970–1989 climate and (b) the empirical mean anomaly 1980, with (c) and (d) the corresponding posterior standard deviations respectively: the climate includes the estimated effect of elevation; an area-preserving cylindrical projection is used
estimated cooling effect of 5.2 °C per kilometre of increased elevation. It is clear from Figs 9(c) and 9(d) that including ocean-based measurements is vital for analysis of regional ocean climate and weather, in particular for the south-east Pacific Ocean.
With this in mind, we might expect that the period of analysis and data coverage are too restricted to allow detection of global trends, especially since the simple model that we use a priori assumes a constant climate. However, the present analysis, including the effects of all parameter uncertainties, still yields a 95% Bayesian prediction interval (0.87, 2.18) °C per century (expectation 1.52 °C) for the global average temperature trend over the 20-year period analysed. The posterior standard deviation for each global average temperature anomaly was calculated to about 0.09 °C.
Since d = 1, we can write H = H₁₁ ≥ 0, and the elements on row i, around the diagonal, of the precision are given by

Q₁: s_i [−a_i   c_i   −b_i],
Q₂: s_i [a_i a_{i−1},   −a_i(c_{i−1} + c_i),   a_i b_{i−1} + c_i² + b_i a_{i+1},   −b_i(c_i + c_{i+1}),   b_i b_{i+1}],

where a_i = H/(s_i δ_{i−1}), b_i = H/(s_i δ_i) and c_i = κ² + a_i + b_i. If the spacing is regular, with s_i = δ_i = δ, then a = a_i = b_i = H/δ² and c = c_i = κ² + 2a. The special case α = 2 with θ = 0 and irregular spacing is a generalization of Lindgren and Rue (2008).
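For regular spacing, the pentadiagonal Q₂ stencil above can be cross-checked against Q₂ = Q₁ C̃⁻¹ Q₁ with C̃ = diag(s), i.e. s times the square of the tridiagonal Q₁ stencil. A sketch with arbitrary parameter values:

```python
def tridiag(n, lo, di, hi):
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = di
        if i > 0:
            M[i][i - 1] = lo
        if i < n - 1:
            M[i][i + 1] = hi
    return M

def matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

kappa2, H, delta = 2.0, 1.0, 0.5
s = delta                       # regular spacing
a = H / delta**2
c = kappa2 + 2 * a
n, i = 9, 4
T = tridiag(n, -a, c, -a)       # row of Q1 is s * [-a, c, -a]
Q2 = [[s * x for x in row] for row in matmul(T, T)]   # Q2 = s * T^2
row = Q2[i][i - 2:i + 3]        # interior pentadiagonal stencil
expected = [s * a * a, -2 * s * a * c, s * (2 * a * a + c * c),
            -2 * s * a * c, s * a * a]
```

The interior row of s T² is exactly s [a², −2ac, 2a² + c², −2ac, a²], matching the regular-spacing limit of the pentadiagonal stencil.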
For R², assume a given regular grid discretization, with horizontal (coordinate component 1) distances γ and vertical (coordinate component 2) distances δ. Let s = γδ, a = H₁₁/γ², b = H₂₂/δ² and c = κ² + 2a + 2b. The precision elements are then given by the corresponding two-dimensional stencils; for Q₁, the interior rows combine −a for the two horizontal neighbours, −b for the two vertical neighbours and c on the diagonal, all scaled by s. If the grid distances are proportional to the square root of the corresponding diagonal elements of H (such as in the isotropic case γ = δ and H₁₁ = H₂₂), the expressions simplify to s = γδ, a = b = H₁₁/γ² = H₂₂/δ² and c = κ² + 4a.
A.2. Triangulated domains
In this section, we derive explicit expressions for the building blocks for the precision matrices, for general triangulated domains with piecewise linear basis functions. For implementation of the theory in Appendix C, we need to calculate

C̃_{ii} = ⟨ψ_i, 1⟩_Ω,
C_{ij} = ⟨ψ_i, ψ_j⟩_Ω,
G_{ij} = ⟨∇ψ_i, ∇ψ_j⟩_Ω,
B_{ij} = ⟨ψ_i, ∂_n ψ_j⟩_{∂Ω}.    (19)
For 2-manifolds such as regions in R² or on S², we require a triangulation with a set of vertices v₁, ..., v_n, embedded in R³. Each vertex v_k is assigned a continuous piecewise linear basis function ψ_k with support on the triangles attached to v_k. To obtain explicit expressions for equation (19), we need to introduce some notation for the geometry of an arbitrary triangle. For notational convenience, we number the corner vertices
of a given triangle T = (v₀, v₁, v₂). The edge vectors opposite each corner are

e₀ = v₂ − v₁,
e₁ = v₀ − v₂,
e₂ = v₁ − v₀,

and the corner angles are θ₀, θ₁ and θ₂.
The triangle area |T| can be obtained from the formula |T| = ‖e₀ × e₁‖/2, i.e. half the length of the vector product in R³. The contributions from the triangle to the C̃ and C matrices are given by

[C̃_{i,i}(T)]_{i=0,1,2} = (|T|/3) (1  1  1),

[C_{i,j}(T)]_{i,j=0,1,2} = (|T|/12) [2 1 1; 1 2 1; 1 1 2].
The contribution to G_{0,1} from the triangle T is

G_{0,1}(T) = |T| (∇ψ₀)ᵀ(∇ψ₁) = −cot(θ₂)/2 = (1/(4|T|)) e₀ᵀe₁,

and the entire contribution from the triangle is

[G_{i,j}(T)]_{i,j=0,1,2} = (1/(4|T|)) [|e₀|² e₀ᵀe₁ e₀ᵀe₂; e₁ᵀe₀ |e₁|² e₁ᵀe₂; e₂ᵀe₀ e₂ᵀe₁ |e₂|²] = (1/(4|T|)) (e₀ e₁ e₂)ᵀ (e₀ e₁ e₂).
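The triangle contributions can be computed directly from the edge vectors; the sketch below uses a hypothetical right triangle and checks two structural properties: the rows of G sum to zero (because e₀ + e₁ + e₂ = 0) and the entries of C sum to the triangle area:

```python
import math

def sub(p, q): return [p[k] - q[k] for k in range(3)]
def dot(p, q): return sum(p[k] * q[k] for k in range(3))
def cross(p, q):
    return [p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0]]

# a triangle embedded in R^3, with edge e_k opposite corner v_k
v = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
e = [sub(v[2], v[1]), sub(v[0], v[2]), sub(v[1], v[0])]
nvec = cross(e[0], e[1])
area = 0.5 * math.sqrt(dot(nvec, nvec))         # |T| = |e0 x e1| / 2
G = [[dot(e[i], e[j]) / (4 * area) for j in range(3)] for i in range(3)]
C = [[area / 12 * (2.0 if i == j else 1.0) for j in range(3)] for i in range(3)]
Ct = [area / 3.0] * 3                           # lumped diagonal contributions
```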
For the boundary integrals in expression (19), the contribution from the triangle is

[B_{i,j}(T)]_{i,j=0,1,2} = (1/(4|T|)) [0 e₀ e₀; e₁ 0 e₁; e₂ e₂ 0]ᵀ [b₀I; b₁I; b₂I] (e₀ e₁ e₂),

where b_k = I(edge k in T lies on ∂Ω). Summing the contributions from all the triangles yields the complete C̃, C, G and B matrices.
For the anisotropic version, parameterized as in Appendix A.1 and Appendix C.4, the modified G matrix elements are given by

[G_{i,j}(T)]_{i,j=0,1,2} = (1/(4|T|)) (e₀ e₁ e₂)ᵀ adj(H) (e₀ e₁ e₂),    (20)

where adj(H) is the adjugate matrix of H, for non-singular matrices defined as det(H) H⁻¹.
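For the 2 × 2 case relevant to 2-manifolds, the adjugate can be written down explicitly; below is a small check that adj(H) H = det(H) I, with illustrative matrix values:

```python
def adj2(H):
    """Adjugate of a 2x2 matrix; equals det(H) * H^{-1} when H is invertible,
    but is defined (and cheap to compute) even when det(H) = 0."""
    (a, b), (c, d) = H
    return [[d, -b], [-c, a]]

H = [[2.0, 0.5], [0.5, 1.0]]
detH = H[0][0] * H[1][1] - H[0][1] * H[1][0]
A = adj2(H)
prod = [[A[i][0] * H[0][j] + A[i][1] * H[1][j] for j in range(2)]
        for i in range(2)]
# prod equals det(H) * I
```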
A.3. Non-stationary and oscillating models
For easy reference, we give specific precision matrix expressions for the case α = 2 for arbitrary triangulated manifold domains Ω. The stationary and simple oscillating models for α = 2 have precision matrices given by

Q₂(κ², θ) = κ⁴ C + 2κ² cos(πθ) G + G C⁻¹ G,    (21)

where θ = 0 corresponds to the regular Matérn case and 0 < θ < 1 are oscillating models. Using the approximation from expression (13), the non-stationary model (12) with α = 2 has precision matrix given by

Q₂{κ²(·), τ(·)} = τ (κ² C κ² + κ² G + G κ² + G C⁻¹ G) τ,    (22)

where κ² and τ are diagonal matrices, with κ²_{ii} = κ²(u_i) and τ_{ii} = τ(u_i). As shown in Appendix C.5, all the C should be replaced by C̃ to obtain a Markov model.
A.4. Neumann boundary effects
The effects on the covariance functions resulting from using Neumann boundary conditions can be explicitly expressed as a folding effect. When the full SPDE is

(κ² − Δ)^{α/2} x(u) = W(u),   u ∈ Ω,
∂_n (κ² − Δ)^j x(u) = 0,   u ∈ ∂Ω,   j = 0, 1, ..., (α − 1)/2,    (23)

the following theorem provides a direct answer, in terms of the Matérn covariance function.

Theorem 1. If x is a solution to the boundary value problem (23) for Ω = [0, L] and a positive integer α, then

cov{x(u), x(v)} = Σ_{k=−∞}^{∞} {r_M(u, v − 2kL) + r_M(u, 2kL − v)},

where r_M is the Matérn covariance as defined on the whole of R.

Theorem 1, which extends naturally to arbitrary generalized rectangles in R^d, is proved in Appendix D.1. In practice, when the effective range is small compared with L, only the three main terms need to be included for a very close approximation:

cov{x(u), x(v)} ≈ r_M(u, v) + r_M(u, −v) + r_M(u, 2L − v)    (24)
= r_M(0, v − u) + r_M(0, v + u) + r_M{0, 2L − (v + u)}.    (25)

Moreover, the resulting covariance is nearly indistinguishable from the stationary Matérn covariance at distances greater than twice the range away from the borders of the domain.
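Theorem 1 and the approximation (24)–(25) are easy to verify numerically; the ν = 1/2 (exponential) member of the Matérn family is used below only because it is cheap to evaluate, and all parameter values are arbitrary:

```python
import math

def r_m(h, kappa=1.0):
    """Matern covariance with nu = 1/2 (exponential), sigma^2 = 1."""
    return math.exp(-kappa * abs(h))

def folded_cov(u, v, L, K=50):
    """Theorem 1: covariance on [0, L] under Neumann boundary conditions,
    as a truncated folding sum over image terms."""
    return sum(r_m(v - u - 2 * k * L) + r_m(2 * k * L - v - u)
               for k in range(-K, K + 1))

def three_term(u, v, L):
    """Approximation (24)-(25): the three dominant folding terms."""
    return r_m(v - u) + r_m(v + u) + r_m(2 * L - (v + u))
```

Near the boundary the folded variance exceeds the stationary one, while well inside the domain the folded covariance is essentially the stationary Matérn covariance.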
A.5. Oscillating covariance functions
The covariances for the oscillating model can be calculated explicitly for R and R², from the spectrum. On R, complex analysis gives

r(u, v) = [1/{2κ³ sin(πθ)}] exp{−κ cos(πθ/2)|v − u|} sin{πθ/2 + κ sin(πθ/2)|v − u|},    (26)

which has variance {4κ³ cos(πθ/2)}⁻¹. On R², involved Bessel function integrals yield

r(u, v) = [1/{4πκ² sin(πθ) i}] [K₀{κ|v − u| exp(−iπθ/2)} − K₀{κ|v − u| exp(iπθ/2)}],    (27)

which has variance {4πκ² sinc(θ)}⁻¹.
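The stated variance can be checked against equation (26) as reconstructed here: since sin(πθ) = 2 sin(πθ/2) cos(πθ/2), the lag-0 value collapses to {4κ³ cos(πθ/2)}⁻¹. A numerical sketch with arbitrary parameter values:

```python
import math

def r_osc(h, kappa, theta):
    """Oscillating covariance on R, equation (26) as reconstructed here."""
    return (math.exp(-kappa * math.cos(math.pi * theta / 2) * abs(h))
            * math.sin(math.pi * theta / 2
                       + kappa * math.sin(math.pi * theta / 2) * abs(h))
            / (2 * kappa**3 * math.sin(math.pi * theta)))

def var_osc(kappa, theta):
    """The stated variance {4 kappa^3 cos(pi theta / 2)}^{-1}."""
    return 1.0 / (4 * kappa**3 * math.cos(math.pi * theta / 2))
```

For θ close to 1 the damped sine factor makes the covariance change sign with distance, which is the oscillation effect shown in Fig. 7.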
Appendix B: Manifolds, random fields and operator identities
B.1. Manifold calculus
To state concisely the theory needed for constructing solutions to SPDEs on more general spaces than R^d, we need to introduce some concepts from differential geometry and manifolds. A main point is that, loosely speaking, for statisticians who are familiar with measure theory and stochastic calculus on R^d, many of the familiar rules of calculus for random processes and fields still apply, as long as all expressions are defined in a coordinate-free manner. Here, we give a brief overview of the concepts that are used in the subsequent appendices. For more details on manifolds, differential calculus and geometric measure theory see for example Auslander and MacKenzie (1977), Federer (1978) and Krantz and Parks (2008).
Loosely, we say that a space Ω is a d-manifold if it locally behaves as R^d. We consider only manifolds with well-behaved boundaries, in the sense that the boundary ∂Ω of a manifold, if present, is required to be a piecewise smooth (d−1)-manifold. We also require the manifolds to be metric manifolds, so that distances between points and angles between vectors are well defined.
A bounded manifold has a finite maximal distance between points. If such a manifold is complete in the set sense, it is called compact. Finally, if the manifold is compact but has no boundary, it is closed. The most common metric manifolds are subsets of R^d equipped with the Euclidean metric. The prime
example of a closed manifold is the unit sphere S² embedded in R³. In Fourier analysis for images, the flat torus commonly appears, when considering periodic continuations of a rectangular region. Topologically, this is equivalent to a torus, but with a different metric compared with a torus that is embedded in R³. The d-dimensional hypercube [0, 1]^d is a compact manifold with a closed boundary.
From the metric that is associated with the manifold it is possible to define differential operators. Let φ denote a function φ: Ω → R. The gradient of φ at u is a vector ∇φ(u) defined indirectly via directional derivatives. In R^d with Euclidean metric, the gradient operator is formally given by the column vector ∇ = (∂/∂u₁, ..., ∂/∂u_d)ᵀ. The Laplacian of φ at u (or the Laplace–Beltrami operator) can be defined as the sum of the second-order directional derivatives, with respect to a local orthonormal basis, and is denoted Δφ(u) = ∇·∇φ(u). In Euclidean metric on R^d, we can write Δ = ∂²/∂u₁² + ... + ∂²/∂u_d². At the boundary of Ω, the vector n_{∂Ω}(u) denotes the unit length outward normal vector at the point u on the boundary ∂Ω. The normal derivative of a function φ is the directional derivative ∂_n φ(u) = n_{∂Ω}(u) · ∇φ(u).
An alternative to defining integration on general manifolds through mapping subsets into R^d is to replace Lebesgue integration with integrals defined through normalized Hausdorff measures (Federer, 1951, 1978), here denoted H^d_A = H^d(1_A), and the Hausdorff integral of a (measurable) function φ as H^d_Ω(φ). An inner product between scalar- or vector-valued functions φ and ψ is defined through

⟨φ, ψ⟩_Ω = H^d_Ω(φ·ψ) = ∫_{u∈Ω} φ(u) · ψ(u) H^d(du).

A function φ: Ω → R^m, m ≥ 1, is said to be square integrable if and only if ‖φ‖²_Ω = ⟨φ, φ⟩_Ω < ∞, which is denoted φ ∈ L₂(Ω).
A fundamental relationship, that corresponds to integration by parts for functions on R, is Green's first identity,

⟨φ, −Δψ⟩_Ω = ⟨∇φ, ∇ψ⟩_Ω − ⟨φ, ∂_n ψ⟩_{∂Ω}.

Typical statements of the identity require φ ∈ C¹(Ω) and ψ ∈ C²(Ω), but we shall relax these requirements considerably in lemma 1.
We also need to define Fourier transforms on general manifolds, where the usual cosine and sine functions do not exist.

Definition 1 (generalized Fourier representation). The Fourier transform pair for functions {φ ∈ L₂: R^d → R} is given by

φ̂(k) = (Fφ)(k) = (1/(2π)^d) ⟨φ(u), exp(ikᵀu)⟩_{R^d},
φ(u) = (F⁻¹φ̂)(u) = ⟨φ̂(k), exp(−ikᵀu)⟩_{R^d}.

(Here, we briefly abuse our notation by including complex functions in the inner products.)
If Ω is a compact manifold, a countable subset {E_k, k = 0, 1, 2, ...} of orthogonal and normalized eigenfunctions to the negated Laplacian, −ΔE_k = λ_k E_k, can be chosen as basis, and the Fourier representation for a function φ ∈ L₂: Ω → R is given by

φ̂(k) = (Fφ)(k) = ⟨φ, E_k⟩_Ω,
φ(u) = (F⁻¹φ̂)(u) = Σ_{k=0}^∞ φ̂(k) E_k(u).
Finally, we define a subspace of L₂-functions, with inner product adapted to the differential operators that we shall study in the remainder of this paper.

Definition 2. The Hilbert space H¹(Ω, κ), for a given κ ≥ 0, is the space of functions {φ: Ω → R} with ∇φ ∈ L₂(Ω), equipped with inner product

⟨φ, ψ⟩_{H¹(Ω,κ)} = κ² ⟨φ, ψ⟩_Ω + ⟨∇φ, ∇ψ⟩_Ω.
The inner product induces a norm, which is given by ‖φ‖_{H¹(Ω,κ)} = ⟨φ, φ⟩^{1/2}_{H¹(Ω,κ)}. The boundary case κ = 0 is also well defined, since ‖·‖_{H¹(Ω,0)} is a seminorm, and H¹(Ω, 0) is a space of equivalence classes of functions, that can be identified by functions with ⟨φ, 1⟩_Ω = 0.
Note that, for κ > 0, the norms are equivalent, and that the Hilbert space H¹ is a quintessential Sobolev space.
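For piecewise linear hat functions in one dimension, the H¹(Ω, κ) inner product of two neighbouring basis functions is exactly κ²⟨ψ_i, ψ_j⟩_Ω + ⟨∇ψ_i, ∇ψ_j⟩_Ω = κ²C_ij + G_ij, i.e. the entry K_ij of the finite element matrices used elsewhere in the appendices. A numerical sketch, with node locations and κ chosen arbitrarily:

```python
import math

def hat(u, c, h):
    """Piecewise linear hat function centred at c with spacing h."""
    return max(0.0, 1.0 - abs(u - c) / h)

def dhat(u, c, h):
    """Derivative of the hat function (piecewise constant)."""
    if c - h < u < c:
        return 1.0 / h
    if c < u < c + h:
        return -1.0 / h
    return 0.0

def quad(f, a, b, n=4000):
    w = (b - a) / n
    return sum(f(a + (k + 0.5) * w) for k in range(n)) * w

kappa, h = 1.5, 0.2
ci, cj = 1.0, 1.2                  # neighbouring nodes
L2_part = quad(lambda u: hat(u, ci, h) * hat(u, cj, h), ci - h, cj + h)
grad_part = quad(lambda u: dhat(u, ci, h) * dhat(u, cj, h), ci - h, cj + h)
H1_inner = kappa**2 * L2_part + grad_part
# exact values for neighbouring hats: <psi_i, psi_j> = h/6, <grad,grad> = -1/h
```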
B.2. Generalized Gaussian random fields
We now turn to the problem of characterizing random fields on Ω. We restrict ourselves to GFs that are at most as irregular as white noise. The distributions of such fields are determined by the properties of expectations and covariances of integrals of functions with respect to random measures: the so-called finite dimensional distributions.
In classical theory for GFs, the following definition can be used.

Definition 3 (GF). A random function x: Ω → R on a manifold Ω is a GF if {x(u_k), k = 1, ..., n} are jointly Gaussian random vectors for every finite set of points {u_k, k = 1, ..., n}. If there is a constant b ≥ 0 such that E{x(u)²} ≤ b for all u ∈ Ω, the random field has bounded second moments.

The complicating issue in dealing with the fractional SPDEs that are considered in this paper is that, for some parameter values, the solutions themselves are discontinuous everywhere, although still more regular than white noise. Thus, since the solutions do not necessarily have well-defined pointwise meaning, the above definition is not applicable, and the driving white noise itself is also not a regular random field. Inspired by Adler and Taylor (2007), we solve this by using a generalized definition based on generalized functions.

Definition 4 (generalized function). For a given function space F, an F-generalized function x: Ω → R is defined through an associated generating additive measure x*, with integrals ⟨f, x⟩_Ω defined for all x*-measurable functions f ∈ F.

Definition 5 (generalized GF). When the integrals ⟨f_i, x⟩_Ω, i = 1, ..., n, are jointly Gaussian for every finite set of functions f_i ∈ F, x is a generalized GF. If there is a constant b ≥ 0 such that E(⟨f, x⟩²_Ω) ≤ b‖f‖²_Ω for every f ∈ L₂(Ω), the generalized field x has L₂(Ω)-bounded second moments, abbreviated as L₂(Ω) bounded.
Of particular importance is the fact that white noise can be defined directly as a generalized GF.

Definition 6 (Gaussian white noise). Gaussian white noise W on a manifold Ω is an L₂(Ω)-bounded generalized GF such that, for any set of test functions {f_i ∈ L₂(Ω), i = 1, ..., n}, the integrals ⟨f_i, W⟩_Ω, i = 1, ..., n, are jointly Gaussian, with expectation and covariance measures given by

E(⟨f_i, W⟩_Ω) = 0,
cov(⟨f_i, W⟩_Ω, ⟨f_j, W⟩_Ω) = ⟨f_i, f_j⟩_Ω.
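A discrete caricature of definition 6 (our illustration, not the paper's construction): partition the domain into disjoint cells and assign each an independent N(0, area) integral of W; the covariance between integrals of indicator test functions is then the total area of the shared cells, i.e. |A ∩ B|, by construction:

```python
# cell -> area, a hypothetical partition of a unit-area domain
cells = {"c1": 0.5, "c2": 0.25, "c3": 0.25}
A = {"c1", "c2"}
B = {"c2", "c3"}

def cov_indicator_integrals(A, B):
    """cov(<1_A, W>, <1_B, W>) = sum of areas of cells shared by A and B,
    since integrals over disjoint cells are independent with variance = area."""
    return sum(area for cell, area in cells.items()
               if cell in A and cell in B)
```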
In particular, the covariance measure of W over two subregions A, B ⊆ Ω is equal to the area measure of their intersection, |A ∩ B|_Ω, so the variance measure of W over a region is equal to the area of the region. We note that the popular approach to defining white noise on R^d via a Brownian sheet is not applicable for general manifolds, since the notion of globally orthogonal directions is not present. The closest equivalent would be to define a set-indexed Gaussian random function W*(·), with cov{W*(A), W*(B)} = |A ∩ B|_Ω, which is much too restrictive to be applied to generalized functions and random fields.
B.3. Operator identities
Here, we present the two fundamental identities that are needed for the subsequent SPDE analysis: Green's first identity and a scalar product characterization of the half-Laplacian.
B.3.1. Stochastic Green's first identity
We here state a generalization of Green's first identity, showing that the identity applies to generalized fields, as opposed to only differentiable functions.

Lemma 1. If ∇f ∈ L₂(Ω) and Δx is L₂(Ω) bounded, then (with probability 1)

⟨f, −Δx⟩_Ω = ⟨∇f, ∇x⟩_Ω − ⟨f, ∂_n x⟩_{∂Ω}.

If ∇x is L₂(Ω) bounded and Δf ∈ L₂(Ω), then (with probability 1)

⟨∇x, ∇f⟩_Ω = ⟨x, −Δf⟩_Ω + ⟨x, ∂_n f⟩_{∂Ω}.
For brevity, we include only a sketch of the proof.

Proof. The requirements imply that each integrand can be approximated arbitrarily closely in the L₂ sense using C^q functions f̃ and x̃, where q in each case is sufficiently large for the regular Green's identity to hold for f̃ and x̃. Using the triangle inequality, it follows that the expectation of the squared difference between the left- and right-hand sides of the identity can be bounded by an arbitrarily small positive constant. Hence, the difference is zero in quadratic mean, and the identity holds with probability 1.
B.3.2. Half-Laplacian
In defining and solving the SPDEs considered, the half-Laplacian operator needs to be characterized in a way that permits practical calculations on general manifolds. The fractional modified Laplacian operators (κ² − Δ)^{α/2}, κ, α ≥ 0, are commonly (Samko et al. (1992), page 483) defined through the Fourier transform, as defined above:

{F(κ² − Δ)^{α/2} φ}(k) = (κ² + |k|²)^{α/2} (Fφ)(k)   on R^d;
{F(κ² − Δ)^{α/2} φ}(k) = (κ² + λ_k)^{α/2} (Fφ)(k)   on compact Ω,

where λ_k, k = 0, 1, 2, ..., are the eigenvalues of −Δ. The formal definition is mostly of theoretical interest since, in practice, the generalized Fourier basis and eigenvalues for the Laplacian are unknown. In addition, even if the functions are known, working directly in the Fourier basis is computationally expensive for general observation models, since the basis functions do not have compact support, which leads to dense covariance and precision matrices. The following lemma provides an integration identity that allows practical calculations involving the half-Laplacian.
Lemma 2. Let φ and ψ be functions in H¹(Ω, κ). Then, the Fourier-based modified half-Laplacians satisfy

⟨(κ² − Δ)^{1/2} φ, (κ² − Δ)^{1/2} ψ⟩_Ω = ⟨φ, ψ⟩_{H¹(Ω,κ)}

whenever either
(a) Ω = R^d,
(b) Ω is closed or
(c) Ω is compact and ⟨φ, ∂_n ψ⟩_{∂Ω} = ⟨∂_n φ, ψ⟩_{∂Ω} = 0.
For a proof, see Appendix D.2. Lemma 2 shows that, for functions fulfilling the requirements, we can use the Hilbert space inner product as a definition of the half-Laplacian. This also generalizes in a natural way to random fields x with L₂(Ω)-bounded ∇x, as well as to suitably well-behaved unbounded manifolds.
It would be tempting to eliminate the qualifiers in part (c) of lemma 2 by subtracting the average of the two boundary integrals from the relationship, and to extend lemma 2 to a complete equivalence relationship. However, the motivation may be problematic, since the half-Laplacian is defined for a wider class of functions than the Laplacian, and it is unclear whether such a generalization necessarily yields the same half-Laplacian as the Fourier definition for functions that are not of the class L₂(Ω). See Ilić et al. (2008) for a partial result.
Appendix C: Hilbert space approximation
We are now ready to formulate the main results of the paper in more technical detail. The idea is to approximate the full SPDE solutions with functions in finite Hilbert spaces, showing that the approximations converge to the true solutions as the finite Hilbert space approaches the full space. In Appendix C.1, we state the convergence and stochastic FEM definitions that are needed. The main result for Matérn covariance models is stated in Appendix C.2, followed by generalizations to intrinsic and oscillating fields in Appendix C.3 and Appendix C.4. Finally, the full finite element constructions are modified to Markov models in Appendix C.5.

C.1. Weak convergence and stochastic finite element methods
We start by stating formal definitions of convergence of Hilbert spaces and of random fields in such spaces (definitions 7 and 8) as well as the definition of the finite element constructions that will be used (definition 9).
Definition 7 (dense subspace sequences). A finite subspace H¹_n(Ω, κ) ⊂ H¹(Ω, κ) is spanned by a finite set of basis functions Ψ_n = {ψ₁, ..., ψ_n}. We say that a sequence of subspaces {H¹_n} is dense in H¹ if for every f ∈ H¹ there is a sequence {f_n}, f_n ∈ H¹_n, such that lim_{n→∞} ‖f − f_n‖_{H¹(Ω,κ)} = 0.
If the subspace sequence is nested, there is a monotonely convergent sequence {f
n
}, but that is not a
requirement here. For given H
1
n
, we can choose the projection of f H
1
onto H
1
n
, i.e. the f
n
that minimizes
|f f
n
|
H
1 . The error f f
n
is orthogonal to H
1
n
, and the basis co-ordinates can be determined via the
system of equations
k
, f
n
)
H
1
., /
=
k
, f)
H
1
., /
, for all k =1, . . . , n.
Definition 8 (weak convergence). A sequence of $L^2(\Omega)$-bounded generalized GFs $\{x_n\}$ is said to converge weakly to an $L^2(\Omega)$-bounded generalized GF $x$ if, for all $f, g \in L^2(\Omega)$,

$\mathrm{E}(\langle f, x_n \rangle_\Omega) \to \mathrm{E}(\langle f, x \rangle_\Omega)$,
$\mathrm{cov}(\langle f, x_n \rangle_\Omega, \langle g, x_n \rangle_\Omega) \to \mathrm{cov}(\langle f, x \rangle_\Omega, \langle g, x \rangle_\Omega)$,

as $n \to \infty$. We denote such convergence by

$x_n \xrightarrow{D(L^2(\Omega))} x$.
Definition 9 (finite element approximations). Let $\mathcal{L}$ be a second-order elliptic differential operator, and let $\mathcal{E}$ be a generalized GF on $\Omega$. Let $x_n = \sum_j \psi_j w_j \in H^1_n(\Omega)$ denote approximate weak solutions to the SPDE $\mathcal{L} x = \mathcal{E}$ on $\Omega$.

(a) The weak Galerkin solutions are given by Gaussian $w = \{w_1, \ldots, w_n\}$ such that

$\mathrm{E}(\langle f_n, \mathcal{L} x_n \rangle_\Omega) = \mathrm{E}(\langle f_n, \mathcal{E} \rangle_\Omega)$,
$\mathrm{cov}(\langle f_n, \mathcal{L} x_n \rangle_\Omega, \langle g_n, \mathcal{L} x_n \rangle_\Omega) = \mathrm{cov}(\langle f_n, \mathcal{E} \rangle_\Omega, \langle g_n, \mathcal{E} \rangle_\Omega)$

for every pair of test functions $f_n, g_n \in H^1_n(\Omega)$.

(b) The weak least squares solutions are given by Gaussian $w = \{w_1, \ldots, w_n\}$ such that

$\mathrm{E}(\langle \mathcal{L} f_n, \mathcal{L} x_n \rangle_\Omega) = \mathrm{E}(\langle \mathcal{L} f_n, \mathcal{E} \rangle_\Omega)$,
$\mathrm{cov}(\langle \mathcal{L} f_n, \mathcal{L} x_n \rangle_\Omega, \langle \mathcal{L} g_n, \mathcal{L} x_n \rangle_\Omega) = \mathrm{cov}(\langle \mathcal{L} f_n, \mathcal{E} \rangle_\Omega, \langle \mathcal{L} g_n, \mathcal{E} \rangle_\Omega)$

for every pair of test functions $f_n, g_n \in H^1_n(\Omega)$.
C.2. Basic Matérn-like cases
In the remainder of the appendices, we let $\mathcal{L} = (\kappa^2 - \Delta)$. In the classic Matérn case, the SPDE $\mathcal{L}^{\alpha/2} x = \mathcal{W}$ can, for integer $\alpha$-values, be unravelled into an iterative formulation

$\mathcal{L}^{1/2} y_1 = \mathcal{W}$,
$\mathcal{L} y_2 = \mathcal{W}$,
$\mathcal{L} y_k = y_{k-2}$, $k = 3, 4, \ldots, \alpha$.

For integers $\alpha = 1, 2, 3, \ldots$, $y_\alpha$ is a solution to the original SPDE.

Theorem 2 (finite element precisions). Define the matrices $C$, $G$ and $K$ through

$C_{i,j} = \langle \psi_i, \psi_j \rangle_\Omega$,
$G_{i,j} = \langle \nabla \psi_i, \nabla \psi_j \rangle_\Omega$,
$K = \kappa^2 C + G$,

and denote the distribution for $w$ with $\mathrm{N}(0, Q^{-1})$, where the precision matrix $Q$ is the inverse of the covariance matrix, and let $x_n = \sum_k \psi_k w_k$ be a weak $H^1_n(\Omega)$ approximation to $\mathcal{L}^{\alpha/2} x = \mathcal{E}$, $\mathcal{L} = (\kappa^2 - \Delta)$, with Neumann boundaries, $\partial_n \psi_k = 0$ on $\partial\Omega$.

(a) When $\alpha = 2$ and $\mathcal{E} = \mathcal{W}$, the weak Galerkin solution is obtained for $Q = K^{\mathrm{T}} C^{-1} K$.
(b) When $\alpha = 1$ and $\mathcal{E} = \mathcal{W}$, the weak least squares solution is obtained for $Q = K$.
(c) When $\alpha = 2$ and $\mathcal{E}$ is an $L^2(\Omega)$-bounded GF in $H^1_n(\Omega)$ with mean 0 and precision $Q_{\mathcal{E},n}$, the weak Galerkin solution is obtained for $Q = K^{\mathrm{T}} C^{-1} Q_{\mathcal{E},n} C^{-1} K$.
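The matrices in theorem 2 are easy to assemble on a concrete mesh, and identity (a) can be checked directly: with $\mathrm{cov}(w, w) = Q^{-1}$ and $Q = K^{\mathrm{T}} C^{-1} K$, the Galerkin matching condition requires $K Q^{-1} K^{\mathrm{T}} = C$. A sketch on a regular 1-D hat-function mesh ($n$, $\kappa$ and the interval are illustrative choices):

```python
import numpy as np

n, kappa = 40, 2.0
h = 1.0 / (n - 1)

# Hat-function mass matrix C and stiffness matrix G on a regular mesh of [0, 1]
C = np.zeros((n, n)); G = np.zeros((n, n))
for i in range(n - 1):
    C[i:i+2, i:i+2] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    G[i:i+2, i:i+2] += 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])

K = kappa**2 * C + G
Q_alpha1 = K                               # theorem 2(b): alpha = 1, least squares
Q_alpha2 = K.T @ np.linalg.solve(C, K)     # theorem 2(a): alpha = 2, Galerkin

# Check (a): K Q^{-1} K^T must reproduce the white-noise covariance matrix C
lhs = K @ np.linalg.solve(Q_alpha2, K.T)
print(np.max(np.abs(lhs - C)))             # ~ machine precision
```

Note that $Q_{\alpha=1}$ is sparse (tridiagonal here), whereas $Q_{\alpha=2}$ involves the dense $C^{-1}$; the Markov approximation of Appendix C.5 restores sparsity.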
Theorem 3 (convergence). Let $x$ be a weak solution to the SPDE $\mathcal{L}^{\alpha/2} x = \mathcal{W}$, $\mathcal{L} = (\kappa^2 - \Delta)$, with Neumann boundaries on a manifold $\Omega$, and let $x_n$ be a weak $H^1_n(\Omega)$ approximation, when $\mathcal{W}$ is Gaussian white noise. Then,

$x_n \xrightarrow{D(L^2(\Omega))} x$,  (28)
$\mathcal{L}^{\alpha/2} x_n \xrightarrow{D(L^2(\Omega))} \mathcal{L}^{\alpha/2} x$,  (29)

if the sequence $\{H^1_n(\Omega),\; n \to \infty\}$ is dense in $H^1(\Omega)$, and either

(a) $\alpha = 2$, and $x_n$ is the Galerkin solution, or
(b) $\alpha = 1$ and $x_n$ is the least squares solution.
Theorem 4 (iterative convergence). Let $y$ be a weak solution to the linear SPDE $\mathcal{L}_y y = \mathcal{E}$ on a manifold $\Omega$, for some $L^2(\Omega)$-bounded random field $\mathcal{E}$, and let $x$ be a weak solution to the SPDE $\mathcal{L}_y \mathcal{L} x = \mathcal{E}$, where $\mathcal{L} = \kappa^2 - \Delta$. Further, let $y_n$ be a weak $H^1_n(\Omega)$ approximation to $y$ such that

$y_n \xrightarrow{D(L^2(\Omega))} y$,  (30)

and let $x_n$ be the weak Galerkin solution in $H^1_n(\Omega)$ to the SPDE $\mathcal{L} x = y_n$ on $\Omega$. Then,

$x_n \xrightarrow{D(L^2(\Omega))} x$,  (31)
$\mathcal{L} x_n \xrightarrow{D(L^2(\Omega))} \mathcal{L} x$.  (32)
For proofs of the three theorems, see Appendix D.3.
C.3. Intrinsic cases
When $\kappa = 0$, the Hilbert space from definition 2 is a space of equivalence classes of functions, corresponding to SPDE solutions where arbitrary functions in the null space of $(-\Delta)^{\alpha/2}$ can be added. Such solution fields are known as intrinsic fields and have well-defined properties. With piecewise linear basis functions, the intrinsicness can be exactly reproduced for $\alpha = 1$ for all manifolds, and partially for $\alpha = 2$ on subsets of $\mathbb{R}^2$, by relaxing the boundary constraints to free boundaries. For larger $\alpha$ or more general manifolds, the intrinsicness will only be approximately represented. How to construct models with more fine-tuned control of the null space is a subject for further research.
To approximate intrinsic fields with $\alpha = 2$ and free boundaries, the matrix $K$ in theorem 2 should be replaced by $G - B$ (owing to Green's identity), where the elements of the (possibly asymmetric) boundary integral matrix $B$ are given by $B_{i,j} = \langle \psi_i, \partial_n \psi_j \rangle_{\partial\Omega}$. The formulations and proofs of theorem 3 and theorem 4 remain unchanged, but with the convergence defined only with respect to test functions $f$ and $g$ orthogonal to the null space of the linear SPDE operator.
The notion of non-null-space convergence allows us to formulate a simple proof of the result from Besag and Mondal (2005), which says that a first-order intrinsic conditional auto-regressive model on infinite lattices in $\mathbb{R}^2$ converges to the de Wijs process, which is an intrinsic generalized Gaussian random field. As can be seen in Appendix A.1, for $\alpha = 1$ and $\kappa = 0$, the $Q$-matrix (equal to $G$) for a triangulated regular grid matches the ordinary intrinsic first-order conditional auto-regressive model. The null space of the half-Laplacian consists of the constant functions. Choose non-trivial test functions $f$ and $g$ that integrate to 0 and apply theorem 3 and definition 8. This shows that the regular conditional auto-regressive model, seen as a Hilbert space representation with linear basis functions, converges to the de Wijs process, which is the special SPDE case $\alpha = 1$, $\kappa = 0$, in $\mathbb{R}^2$.
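The statement that $Q = G$ on a triangulated regular grid matches the intrinsic first-order conditional auto-regression can be verified by assembling the stiffness matrix directly. A sketch for a unit-spacing square lattice split into right triangles (the grid size and node numbering are illustrative assumptions); interior rows reproduce the familiar five-point stencil, with 4 on the diagonal and $-1$ on the four nearest neighbours:

```python
import numpy as np

m = 6                                   # m x m regular grid with unit spacing
pts = np.array([(i, j) for j in range(m) for i in range(m)], dtype=float)
idx = lambda i, j: j * m + i

# Split every unit square into two right triangles
tris = []
for j in range(m - 1):
    for i in range(m - 1):
        tris.append((idx(i, j), idx(i + 1, j), idx(i, j + 1)))
        tris.append((idx(i + 1, j), idx(i + 1, j + 1), idx(i, j + 1)))

# Assemble the stiffness matrix G_{kl} = <grad psi_k, grad psi_l>
G = np.zeros((m * m, m * m))
for (a, b, c) in tris:
    p = pts[[a, b, c]]
    J = np.array([p[1] - p[0], p[2] - p[0]]).T       # 2x2 element Jacobian
    area = abs(np.linalg.det(J)) / 2.0
    # gradients of the linear barycentric basis functions on this triangle
    grads = np.linalg.inv(J).T @ np.array([[-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]])
    for r, k in enumerate((a, b, c)):
        for s, l in enumerate((a, b, c)):
            G[k, l] += area * grads[:, r] @ grads[:, s]

# An interior node: its row matches the intrinsic first-order CAR precision
k = idx(3, 3)
row = {tuple(pts[l] - pts[k]): G[k, l]
       for l in range(m * m) if abs(G[k, l]) > 1e-12}
print(row)   # centre 4, nearest neighbours -1, diagonal neighbours absent
```

The rows sum to 0, reflecting the constant null space of the intrinsic model.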
C.4. Oscillating and non-isotropic cases
To construct the Hilbert space approximation for the oscillating model that was introduced in Section 3.3, as well as non-isotropic versions, we introduce a coupled system of SPDEs for $\alpha = 2$,

$\begin{bmatrix} h_1 - \nabla \cdot H_1 \nabla & -(h_2 - \nabla \cdot H_2 \nabla) \\ h_2 - \nabla \cdot H_2 \nabla & h_1 - \nabla \cdot H_1 \nabla \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \mathcal{E}_1 \\ \mathcal{E}_2 \end{bmatrix}$,  (33)

which is equivalent to the complex SPDE

$\{h_1 + \mathrm{i} h_2 - \nabla \cdot (H_1 + \mathrm{i} H_2) \nabla\}\{x_1(u) + \mathrm{i}\, x_2(u)\} = \mathcal{E}_1(u) + \mathrm{i}\, \mathcal{E}_2(u)$.  (34)

The model in Section 3.3 corresponds to $h_1 = \kappa^2 \cos(\pi\theta)$, $h_2 = \kappa^2 \sin(\pi\theta)$, $H_1 = I$ and $H_2 = 0$.
To solve the coupled SPDE system (33) we take a set $\{\psi_k,\; k = 1, \ldots, n\}$ of basis functions for $H^1_n(\Omega)$ and construct a basis for the solution space for $(x_1\; x_2)^{\mathrm{T}}$ as

$\begin{bmatrix} \psi_1 \\ 0 \end{bmatrix}, \ldots, \begin{bmatrix} \psi_n \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ \psi_1 \end{bmatrix}, \ldots, \begin{bmatrix} 0 \\ \psi_n \end{bmatrix}$.

The definitions of the $G$- and $K$-matrices are modified as follows:

$(G_k)_{i,j} = \langle H_k^{1/2} \nabla \psi_i, H_k^{1/2} \nabla \psi_j \rangle_\Omega$, $k = 1, 2$,
$K_k = h_k C + G_k$, $k = 1, 2$.

Using the same construction as in the regular case, the precision for the solutions is given by

$\begin{bmatrix} K_1 & -K_2 \\ K_2 & K_1 \end{bmatrix}^{\mathrm{T}} \begin{bmatrix} C & 0 \\ 0 & C \end{bmatrix}^{-1} \begin{bmatrix} Q_{\mathcal{E}} & 0 \\ 0 & Q_{\mathcal{E}} \end{bmatrix} \begin{bmatrix} C & 0 \\ 0 & C \end{bmatrix}^{-1} \begin{bmatrix} K_1 & -K_2 \\ K_2 & K_1 \end{bmatrix} = \begin{bmatrix} Q & 0 \\ 0 & Q \end{bmatrix}$,

where $Q = Q(h_1, H_1) + Q(h_2, H_2)$, and $Q(\cdot, \cdot)$ is the precision that is generated for the regular iterated model with the given parameters. Surprisingly, regardless of the choice of parameters, the solution components are independent.
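The block identity above, and the independence of the two solution components, can be checked numerically for the Section 3.3 parameterization ($H_1 = I$, $H_2 = 0$, so $G_1 = G$ and $G_2 = 0$). A sketch on a 1-D hat-function mesh with white-noise forcing, $Q_{\mathcal{E}} = C$ (the mesh and parameter values are illustrative choices):

```python
import numpy as np

n, kappa, theta = 30, 1.5, 0.4
h = 1.0 / (n - 1)
C = np.zeros((n, n)); G = np.zeros((n, n))
for i in range(n - 1):
    C[i:i+2, i:i+2] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    G[i:i+2, i:i+2] += 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])

# Oscillating model: h1 = kappa^2 cos(pi theta), h2 = kappa^2 sin(pi theta)
h1, h2 = kappa**2 * np.cos(np.pi * theta), kappa**2 * np.sin(np.pi * theta)
K1, K2 = h1 * C + G, h2 * C              # G1 = G, G2 = 0

Z = np.zeros((n, n))
Cinv = np.linalg.inv(C)
M  = np.block([[K1, -K2], [K2, K1]])
CB = np.block([[Cinv, Z], [Z, Cinv]])
QE = np.block([[C, Z], [Z, C]])          # white noise: Q_E = C

Qbig = M.T @ CB @ QE @ CB @ M
Q = K1.T @ Cinv @ K1 + K2.T @ Cinv @ K2  # Q(h1, H1) + Q(h2, H2)

print(np.max(np.abs(Qbig[:n, n:])))      # off-diagonal block ~ 0: independence
print(np.max(np.abs(Qbig[:n, :n] - Q)))  # diagonal block = sum of precisions
```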
C.5. Markov approximation
By choosing piecewise linear basis functions, the practical calculation of the matrix elements in the construction of the precision is straightforward, and the local support makes the basic matrices sparse. Since the basis functions are not orthogonal, the $C$-matrix will be non-diagonal, and therefore the FEM construction does not directly yield Markov fields for $\alpha \geq 2$, since $C^{-1}$ is not sparse. However, following standard practice in FEMs, $C$ can be approximated with a diagonal matrix as follows. Let $\tilde{C}$ be a diagonal matrix, with $\tilde{C}_{ii} = \sum_j C_{ij} = \langle \psi_i, 1 \rangle_\Omega$, and note that this preserves the interpretation of the matrix as an integration matrix. Substituting $C^{-1}$ with $\tilde{C}^{-1}$ yields a Markov approximation to the FEM solution.
The convergence rate for the Markov approximation is the same as for the full FEM model, which can be shown by adapting the details of the proofs of convergence. Let $f$ and $g$ be test functions in $H^1(\Omega)$ and let $f_n$ and $g_n$ be their projections onto $H^1_n(\Omega)$, with basis weights $w_f$ and $w_g$. Taking the difference between the covariances for the Markov solution ($\tilde{x}_n$) and the full FEM solution ($x_n$) for $\alpha = 2$ yields the error

$\mathrm{cov}(\langle f, \mathcal{L} \tilde{x}_n \rangle_\Omega, \langle g, \mathcal{L} \tilde{x}_n \rangle_\Omega) - \mathrm{cov}(\langle f, \mathcal{L} x_n \rangle_\Omega, \langle g, \mathcal{L} x_n \rangle_\Omega) = w_f^{\mathrm{T}} (\tilde{C} - C) w_g$.

Requiring $\|f\|_{H^1(\Omega)}, \|g\|_{H^1(\Omega)} \leq 1$, it follows from lemma 1 in Chen and Thomée (1985) that the covariance error is bounded by $c h^2$, where $c$ is some constant and $h$ is the diameter of the largest circle that can be inscribed in a triangle of the triangulation. This shows that the convergence rate from expression (11) will not be affected by the Markov approximation. In practice, the $C$ matrix in $K$ should also be replaced by $\tilde{C}$. This improves the approximation when either $h$ or $\kappa$ is large, with numerical comparisons showing a covariance error reduction of as much as a factor 3. See Bolin and Lindgren (2009) for a comparison of the resulting kriging errors for various methods, showing negligible differences between the exact FEM representation and the Markov approximation.
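A sketch of the lumping step on a 1-D hat-function mesh, contrasting the non-sparse exact precision $K^{\mathrm{T}} C^{-1} K$ with the banded Markov approximation $\tilde{K}^{\mathrm{T}} \tilde{C}^{-1} \tilde{K}$, where $\tilde{K}$ also uses the lumped $\tilde{C}$ as recommended above (the mesh size and $\kappa$ are illustrative choices):

```python
import numpy as np

n, kappa = 50, 2.0
h = 1.0 / (n - 1)
C = np.zeros((n, n)); G = np.zeros((n, n))
for i in range(n - 1):
    C[i:i+2, i:i+2] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    G[i:i+2, i:i+2] += 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])

Ct = np.diag(C.sum(axis=1))                  # lumped diagonal C~: row sums of C

Kf = kappa**2 * C + G                        # exact K
Q_exact = Kf.T @ np.linalg.solve(C, Kf)      # alpha = 2: K^T C^{-1} K, not banded

Kt = kappa**2 * Ct + G                       # C replaced by C~ in K as well
Q_markov = Kt.T @ np.linalg.inv(Ct) @ Kt     # banded: a Markov precision

bandwidth = lambda Q: max(abs(i - j) for i in range(n) for j in range(n)
                          if abs(Q[i, j]) > 1e-9)
print(bandwidth(Q_markov), bandwidth(Q_exact))
```

The lumped precision couples each node only to its second-order mesh neighbours, which is exactly the sparsity that makes GMRF computations fast.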
Appendix D: Proofs
D.1. Folded covariance: proof of theorem 1
Writing the covariance of the SPDE solutions on the interval $\Omega = [0, L] \subset \mathbb{R}$ in terms of the spectral representation gives an infinite series,

$\mathrm{cov}\{x(u), x(v)\} = \lambda_0 + \sum_{k=1}^{\infty} \lambda_k \cos(\pi k u / L) \cos(\pi k v / L)$,  (35)

where $\lambda_0 = (\kappa^{2\alpha} L)^{-1}$ and $\lambda_k = 2 L^{-1} \{\kappa^2 + (\pi k / L)^2\}^{-\alpha}$, and the Matérn covariance has the spectral representation

$r_M(u, v) = \frac{1}{2\pi} \int_{-\infty}^{\infty} (\kappa^2 + \omega^2)^{-\alpha} \cos\{\omega (v - u)\}\, \mathrm{d}\omega$.

Thus, with $r(u, v)$ denoting the folded covariance in the statement of theorem 1,

$r(u, v) = \sum_{k=-\infty}^{\infty} \{r_M(u, v - 2kL) + r_M(u, 2kL - v)\}$
$= \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \int_{-\infty}^{\infty} (\kappa^2 + \omega^2)^{-\alpha} [\cos\{\omega(v - u - 2kL)\} + \cos\{\omega(v + u - 2kL)\}]\, \mathrm{d}\omega$
$= \frac{1}{2\pi} \int_{-\infty}^{\infty} (\kappa^2 + \omega^2)^{-\alpha} \sum_{k=-\infty}^{\infty} [\cos\{\omega(v - u - 2kL)\} + \cos\{\omega(v + u - 2kL)\}]\, \mathrm{d}\omega$.

Rewriting the cosines via Euler's formulae, we obtain

$\sum_{k=-\infty}^{\infty} [\cos\{\omega(v - u - 2kL)\} + \cos\{\omega(v + u - 2kL)\}]$
$= \frac{1}{2} \sum_{k=-\infty}^{\infty} \{\exp(\mathrm{i}\omega u) + \exp(-\mathrm{i}\omega u)\}[\exp\{\mathrm{i}\omega(v - 2kL)\} + \exp\{-\mathrm{i}\omega(v - 2kL)\}]$
$= \cos(\omega u) \Big\{ \exp(\mathrm{i}\omega v) \sum_{k=-\infty}^{\infty} \exp(-2\mathrm{i}k\omega L) + \exp(-\mathrm{i}\omega v) \sum_{k=-\infty}^{\infty} \exp(2\mathrm{i}k\omega L) \Big\}$
$= 2\pi \cos(\omega u) \{\exp(\mathrm{i}\omega v) + \exp(-\mathrm{i}\omega v)\} \sum_{k=-\infty}^{\infty} \delta(2\omega L - 2\pi k)$
$= \frac{2\pi}{L} \cos(\omega u) \cos(\omega v) \sum_{k=-\infty}^{\infty} \delta\Big(\omega - \frac{\pi k}{L}\Big)$,

where we used the Dirac measure representation

$\sum_{k=-\infty}^{\infty} \exp(\mathrm{i}ks) = 2\pi \sum_{k=-\infty}^{\infty} \delta(s - 2\pi k)$.

Finally, combining the results yields

$r(u, v) = \frac{1}{L} \int_{-\infty}^{\infty} (\kappa^2 + \omega^2)^{-\alpha} \cos(\omega u) \cos(\omega v) \sum_{k=-\infty}^{\infty} \delta\Big(\omega - \frac{\pi k}{L}\Big)\, \mathrm{d}\omega$
$= \frac{1}{L} \sum_{k=-\infty}^{\infty} \Big\{\kappa^2 + \Big(\frac{\pi k}{L}\Big)^2\Big\}^{-\alpha} \cos\Big(\frac{\pi k u}{L}\Big) \cos\Big(\frac{\pi k v}{L}\Big)$
$= \frac{1}{\kappa^{2\alpha} L} + \frac{2}{L} \sum_{k=1}^{\infty} \Big\{\kappa^2 + \Big(\frac{\pi k}{L}\Big)^2\Big\}^{-\alpha} \cos\Big(\frac{\pi k u}{L}\Big) \cos\Big(\frac{\pi k v}{L}\Big)$,

which is precisely the expression sought in equation (35).
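Theorem 1 can be checked numerically. For $\alpha = 2$ in $d = 1$, the spectral density $(2\pi)^{-1}(\kappa^2 + \omega^2)^{-2}$ corresponds to the closed-form Matérn covariance $r_M(h) = (1 + \kappa|h|) \exp(-\kappa|h|) / (4\kappa^3)$, and the folded sum should agree with the series (35). A minimal sketch ($\kappa$, $L$ and the evaluation points are arbitrary choices):

```python
import numpy as np

kappa, L, alpha = 2.0, 1.0, 2
u, v = 0.3, 0.7

# 1-D Matern covariance for alpha = 2 (nu = 3/2), matching the spectral
# density (1/2pi) (kappa^2 + w^2)^(-2)
def r_M(d):
    d = abs(d)
    return (1.0 + kappa * d) * np.exp(-kappa * d) / (4.0 * kappa**3)

# Folded covariance: the mirrored sum in the statement of theorem 1
folded = sum(r_M(v - u - 2 * k * L) + r_M(2 * k * L - v - u)
             for k in range(-50, 51))

# Spectral series (35), truncated far into its (fast-decaying) tail
k = np.arange(1, 20001)
lam0 = 1.0 / (kappa**(2 * alpha) * L)
lam = (2.0 / L) * (kappa**2 + (np.pi * k / L)**2)**(-alpha)
series = lam0 + np.sum(lam * np.cos(np.pi * k * u / L) * np.cos(np.pi * k * v / L))

print(folded, series)   # the two values agree
```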
D.2. Modified half-Laplacian equivalence: proof of lemma 2
For brevity, we present only the proof for compact manifolds, as the proof for $\Omega = \mathbb{R}^d$ follows the same principle but without the boundary complications. The main difference is that the Fourier representation is discrete for compact manifolds and continuous for $\mathbb{R}^d$.
Let $\lambda_k \geq 0$, $k = 0, 1, 2, \ldots$, be the eigenvalue corresponding to eigenfunction $E_k$ of $-\Delta$ (definition 1). Then, with $\hat{\phi}(k) = (\mathcal{F}\phi)(k)$, the modified half-Laplacian from Appendix B.3.2 is defined through $\mathcal{F}\{(\kappa^2 - \Delta)^{1/2} \phi\}(k) = (\kappa^2 + \lambda_k)^{1/2} \hat{\phi}(k)$, and we obtain

$\langle (\kappa^2 - \Delta)^{1/2} \phi, (\kappa^2 - \Delta)^{1/2} \psi \rangle_\Omega = \Big\langle \sum_{k=0}^{\infty} (\kappa^2 + \lambda_k)^{1/2} \hat{\phi}(k) E_k, \sum_{k'=0}^{\infty} (\kappa^2 + \lambda_{k'})^{1/2} \hat{\psi}(k') E_{k'} \Big\rangle_\Omega$,

and, since $\phi, \psi \in H^1(\Omega)$, we can change the order of integration and summation,

$\langle (\kappa^2 - \Delta)^{1/2} \phi, (\kappa^2 - \Delta)^{1/2} \psi \rangle_\Omega = \sum_{k=0}^{\infty} (\kappa^2 + \lambda_k) \hat{\phi}(k) \hat{\psi}(k)$,

since the eigenfunctions $E_k$ and $E_{k'}$ are orthonormal.
Now, starting from the Hilbert space inner product,

$\langle \phi, \psi \rangle_{H^1(\Omega)} = \kappa^2 \langle \phi, \psi \rangle_\Omega + \langle \nabla \phi, \nabla \psi \rangle_\Omega$
$= \kappa^2 \Big\langle \sum_{k=0}^{\infty} \hat{\phi}(k) E_k, \sum_{k'=0}^{\infty} \hat{\psi}(k') E_{k'} \Big\rangle_\Omega + \Big\langle \nabla \sum_{k=0}^{\infty} \hat{\phi}(k) E_k, \nabla \sum_{k'=0}^{\infty} \hat{\psi}(k') E_{k'} \Big\rangle_\Omega$
$= \kappa^2 \sum_{k=0}^{\infty} \sum_{k'=0}^{\infty} \hat{\phi}(k) \hat{\psi}(k') \langle E_k, E_{k'} \rangle_\Omega + \sum_{k=0}^{\infty} \sum_{k'=0}^{\infty} \hat{\phi}(k) \hat{\psi}(k') \langle \nabla E_k, \nabla E_{k'} \rangle_\Omega$.

Further, Green's identity for $\langle \nabla E_k, \nabla E_{k'} \rangle_\Omega$ yields

$\langle \nabla E_k, \nabla E_{k'} \rangle_\Omega = -\langle E_k, \Delta E_{k'} \rangle_\Omega + \langle E_k, \partial_n E_{k'} \rangle_{\partial\Omega} = \lambda_{k'} \langle E_k, E_{k'} \rangle_\Omega + \langle E_k, \partial_n E_{k'} \rangle_{\partial\Omega}$.

Since $\nabla\phi, \nabla\psi \in L^2(\Omega)$ we can change the order of summation, integration and differentiation for the boundary integrals,

$\sum_{k=0}^{\infty} \sum_{k'=0}^{\infty} \hat{\phi}(k) \hat{\psi}(k') \langle E_k, \partial_n E_{k'} \rangle_{\partial\Omega} = \langle \phi, \partial_n \psi \rangle_{\partial\Omega}$.

By the boundary requirements in lemma 2, whenever Green's identity holds, the boundary integral vanishes, either because the boundary is empty (if the manifold is closed), or because the integrand is 0, so collecting all the terms we obtain

$\langle \phi, \psi \rangle_{H^1(\Omega)} = \sum_{k=0}^{\infty} \sum_{k'=0}^{\infty} (\kappa^2 + \lambda_{k'}) \hat{\phi}(k) \hat{\psi}(k') \langle E_k, E_{k'} \rangle_\Omega = \sum_{k=0}^{\infty} (\kappa^2 + \lambda_k) \hat{\phi}(k) \hat{\psi}(k)$,

and the proof is complete.
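On a closed manifold the boundary term is absent, and the identity $\langle \phi, \psi \rangle_{H^1(\Omega)} = \sum_k (\kappa^2 + \lambda_k) \hat{\phi}(k) \hat{\psi}(k)$ can be checked directly. A sketch on the unit circle, where the eigenfunctions are $\cos(ku)$ and $\sin(ku)$ with eigenvalue $\lambda = k^2$ and squared norm $\pi$ (the trigonometric test functions and $\kappa$ are arbitrary choices):

```python
import numpy as np

kappa = 1.3
u = np.linspace(0.0, 2 * np.pi, 200001)

phi  = 2 * np.cos(2 * u) + np.sin(3 * u)       # coefficients 2 on cos2u, 1 on sin3u
psi  =     np.cos(2 * u) + 4 * np.sin(3 * u)   # coefficients 1 on cos2u, 4 on sin3u
dphi = -4 * np.sin(2 * u) + 3 * np.cos(3 * u)
dpsi = -2 * np.sin(2 * u) + 12 * np.cos(3 * u)

integ = lambda y: np.sum((y[1:] + y[:-1]) / 2 * np.diff(u))   # trapezoidal rule

# Left-hand side: <phi, psi>_{H^1} = kappa^2 <phi, psi> + <phi', psi'>
lhs = kappa**2 * integ(phi * psi) + integ(dphi * dpsi)

# Right-hand side: sum over eigenpairs, lambda = k^2, norm pi on the circle
rhs = (kappa**2 + 2**2) * np.pi * (2 * 1) + (kappa**2 + 3**2) * np.pi * (1 * 4)

print(lhs, rhs)   # equal up to quadrature error
```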
D.3. Hilbert space convergence
D.3.1. Proof of theorem 2 (finite element precisions)
The proofs for theorem 2 are straightforward applications of the definitions. Let $w_f$ and $w_g$ be the Hilbert space co-ordinates of two test functions $f_n, g_n \in H^1_n(\Omega)$, and let $\mathcal{L} = (\kappa^2 - \Delta)$.
For case (a), $\alpha = 2$ and $\mathcal{E} = \mathcal{W}$, so

$\langle f_n, \mathcal{L} x_n \rangle_\Omega = \sum_{i,j} w_{f,i} \langle \psi_i, \mathcal{L} \psi_j \rangle_\Omega w_j = \sum_{i,j} w_{f,i} (\kappa^2 C_{i,j} + G_{i,j}) w_j = w_f^{\mathrm{T}} K w$

owing to Green's identity, and

$\mathrm{cov}(\langle f_n, \mathcal{L} x_n \rangle_\Omega, \langle g_n, \mathcal{L} x_n \rangle_\Omega) = w_f^{\mathrm{T}} K \,\mathrm{cov}(w, w)\, K^{\mathrm{T}} w_g$.

This covariance is equal to

$\mathrm{cov}(\langle f_n, \mathcal{W} \rangle_\Omega, \langle g_n, \mathcal{W} \rangle_\Omega) = \langle f_n, g_n \rangle_\Omega = \sum_{i,j} w_{f,i} \langle \psi_i, \psi_j \rangle_\Omega w_{g,j} = \sum_{i,j} w_{f,i} C_{i,j} w_{g,j} = w_f^{\mathrm{T}} C w_g$

for every pair of test functions $f_n, g_n$ when $Q = \mathrm{cov}(w, w)^{-1} = K^{\mathrm{T}} C^{-1} K$.
For case (b), $\alpha = 1$ and $\mathcal{E} = \mathcal{W}$. Using the same technique as in (a), but with lemma 2 instead of Green's identity, $\langle \mathcal{L}^{1/2} f_n, \mathcal{L}^{1/2} x_n \rangle_\Omega = \langle f_n, x_n \rangle_{H^1(\Omega)} = w_f^{\mathrm{T}} K w$ and

$\mathrm{cov}(\langle \mathcal{L}^{1/2} f_n, \mathcal{W} \rangle_\Omega, \langle \mathcal{L}^{1/2} g_n, \mathcal{W} \rangle_\Omega) = \langle \mathcal{L}^{1/2} f_n, \mathcal{L}^{1/2} g_n \rangle_\Omega = \langle f_n, g_n \rangle_{H^1(\Omega)} = w_f^{\mathrm{T}} K w_g$,

so $Q = K^{\mathrm{T}} K^{-1} K = K$, noting that $K$ is a symmetric matrix since both $C$ and $G$ are symmetric.
Finally, for case (c), $\alpha = 2$ and $\mathcal{E} = \mathcal{E}_n$ is a GF on $H^1_n(\Omega)$ with precision $Q_{\mathcal{E},n}$. Using the same technique as for (a),

$\mathrm{cov}(\langle f_n, \mathcal{L} x_n \rangle_\Omega, \langle g_n, \mathcal{L} x_n \rangle_\Omega) = w_f^{\mathrm{T}} K \,\mathrm{cov}(w, w)\, K^{\mathrm{T}} w_g$,

and the finite basis representation of the noise $\mathcal{E}_n$ gives

$\mathrm{cov}(\langle f_n, \mathcal{E}_n \rangle_\Omega, \langle g_n, \mathcal{E}_n \rangle_\Omega) = w_f^{\mathrm{T}} C Q_{\mathcal{E},n}^{-1} C w_g$.

Requiring equality for all pairs of test functions yields $Q = K^{\mathrm{T}} C^{-1} Q_{\mathcal{E},n} C^{-1} K$. Here, keeping the transposes allows the proof to apply also to the intrinsic free-boundary cases.
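Case (b) can also be illustrated by simulation: sampling $w \sim \mathrm{N}(0, K^{-1})$ and evaluating $\langle f_n, x_n \rangle_{H^1(\Omega)} = w_f^{\mathrm{T}} K w$ per sample, the empirical covariance should approach $w_f^{\mathrm{T}} K w_g$. A Monte Carlo sketch on a 1-D hat-function mesh (the sample size, mesh and test weight vectors are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, kappa = 25, 2.0
h = 1.0 / (n - 1)
C = np.zeros((n, n)); G = np.zeros((n, n))
for i in range(n - 1):
    C[i:i+2, i:i+2] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    G[i:i+2, i:i+2] += 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
K = kappa**2 * C + G                       # alpha = 1: Q = K

# Sample w ~ N(0, K^{-1}) through the Cholesky factor K = Lc Lc^T
Lc = np.linalg.cholesky(K)
N = 100000
w = np.linalg.solve(Lc.T, rng.standard_normal((n, N)))

# Fixed test-function weight vectors (illustrative)
wf = np.sin(np.linspace(0.0, np.pi, n))
wg = np.cos(np.linspace(0.0, np.pi, n))

sf = wf @ K @ w                            # <f_n, x_n>_{H^1} = w_f^T K w per sample
sg = wg @ K @ w
emp = np.cov(sf, sg)[0, 1]
theory = wf @ K @ wg                       # exact covariance when Q = K
print(emp, theory)                         # close for large N
```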
D.3.2. Proof of theorem 3 (convergence)
First, we show that expression (28) follows from expression (29). Let $\mathcal{L} = (\kappa^2 - \Delta)$, let $f$ and $g$ be functions in $H^1(\Omega)$, and let $\tilde{f}$ be the solution to the PDE

$\mathcal{L} \tilde{f}(u) = f(u)$, $u \in \Omega$,
$\partial_n \tilde{f}(u) = 0$, $u \in \partial\Omega$,

and correspondingly for $\tilde{g}$. Then $\tilde{f}$ and $\tilde{g}$ are in $H^1(\Omega)$ and further fulfil the requirements of lemma 1 and lemma 2. Therefore,

$\langle f, x_n \rangle_\Omega = \langle \mathcal{L} \tilde{f}, x_n \rangle_\Omega = \langle \tilde{f}, x_n \rangle_{H^1(\Omega)} = \langle \tilde{f}, \mathcal{L} x_n \rangle_\Omega$,

and

$\langle f, x \rangle_\Omega = \langle \mathcal{L} \tilde{f}, x \rangle_\Omega = \langle \tilde{f}, x \rangle_{H^1(\Omega)} = \langle \tilde{f}, \mathcal{L} x \rangle_\Omega$,

where the last equality holds when $\alpha = 2$, since $\mathcal{W}$ is $L^2(\Omega)$ bounded. The convergence of $x_n$ to $x$ follows from expression (29). In the Galerkin case (a), we have

$\mathrm{cov}(\langle f, x_n \rangle_\Omega, \langle g, x_n \rangle_\Omega) = \mathrm{cov}(\langle \tilde{f}, \mathcal{L} x_n \rangle_\Omega, \langle \tilde{g}, \mathcal{L} x_n \rangle_\Omega)$
$\to \mathrm{cov}(\langle \tilde{f}, \mathcal{L} x \rangle_\Omega, \langle \tilde{g}, \mathcal{L} x \rangle_\Omega) = \mathrm{cov}(\langle f, x \rangle_\Omega, \langle g, x \rangle_\Omega)$,

and similarly for the least squares case (b).
For expression (29), let $f_n = \sum_k \psi_k w_{f,k}$ and $g_n = \sum_k \psi_k w_{g,k}$ be the orthogonal projections of $f$ and $g$ onto $H^1_n(\Omega)$. In case (a), then

$\langle f, \mathcal{L} x_n \rangle_\Omega = \langle f, x_n \rangle_{H^1(\Omega)} = \langle f - f_n, x_n \rangle_{H^1(\Omega)} + \langle f_n, x_n \rangle_{H^1(\Omega)} = \langle f_n, x_n \rangle_{H^1(\Omega)}$,

and

$\mathrm{cov}(\langle f, \mathcal{L} x_n \rangle_\Omega, \langle g, \mathcal{L} x_n \rangle_\Omega) = \mathrm{cov}(\langle f_n, x_n \rangle_{H^1(\Omega)}, \langle g_n, x_n \rangle_{H^1(\Omega)})$
$= \mathrm{cov}(\langle f_n, \mathcal{W} \rangle_\Omega, \langle g_n, \mathcal{W} \rangle_\Omega) = \langle f_n, g_n \rangle_\Omega \to \langle f, g \rangle_\Omega = \mathrm{cov}(\langle f, \mathcal{W} \rangle_\Omega, \langle g, \mathcal{W} \rangle_\Omega)$

as $n \to \infty$. Similarly in case (b), for any $f \in H^1(\Omega)$ fulfilling the requirements of lemma 2,

$\langle \mathcal{L}^{1/2} f, \mathcal{L}^{1/2} x_n \rangle_\Omega = \langle f, x_n \rangle_{H^1(\Omega)} = \langle f_n, x_n \rangle_{H^1(\Omega)}$,

and

$\mathrm{cov}(\langle \mathcal{L}^{1/2} f, \mathcal{L}^{1/2} x_n \rangle_\Omega, \langle \mathcal{L}^{1/2} g, \mathcal{L}^{1/2} x_n \rangle_\Omega) = \mathrm{cov}(\langle f_n, x_n \rangle_{H^1(\Omega)}, \langle g_n, x_n \rangle_{H^1(\Omega)})$
$= \mathrm{cov}(\langle \mathcal{L}^{1/2} f_n, \mathcal{W} \rangle_\Omega, \langle \mathcal{L}^{1/2} g_n, \mathcal{W} \rangle_\Omega) = \langle f_n, g_n \rangle_{H^1(\Omega)}$
$\to \langle f, g \rangle_{H^1(\Omega)} = \langle \mathcal{L}^{1/2} f, \mathcal{L}^{1/2} g \rangle_\Omega = \mathrm{cov}(\langle \mathcal{L}^{1/2} f, \mathcal{W} \rangle_\Omega, \langle \mathcal{L}^{1/2} g, \mathcal{W} \rangle_\Omega)$

as $n \to \infty$.
D.3.3. Proof of theorem 4 (iterative convergence)
First, we show that expression (31) follows from expression (32). Let $\tilde{f}$ and $\tilde{g}$ be defined as in the proof of theorem 3. Then, since $\mathcal{L} = \kappa^2 - \Delta$,

$\langle f, x_n \rangle_\Omega = \langle \tilde{f}, \mathcal{L} x_n \rangle_\Omega$ and $\langle f, x \rangle_\Omega = \langle \tilde{f}, \mathcal{L} x \rangle_\Omega$,

and the convergence of $x_n$ to $x$ follows from expression (32). For expression (32), as in the proof of theorem 3, $\langle f, \mathcal{L} x_n \rangle_\Omega = \langle f_n, x_n \rangle_{H^1(\Omega)}$, and

$\mathrm{cov}(\langle f, \mathcal{L} x_n \rangle_\Omega, \langle g, \mathcal{L} x_n \rangle_\Omega) = \mathrm{cov}(\langle f_n, x_n \rangle_{H^1(\Omega)}, \langle g_n, x_n \rangle_{H^1(\Omega)})$
$= \mathrm{cov}(\langle f_n, y_n \rangle_\Omega, \langle g_n, y_n \rangle_\Omega) = \mathrm{cov}(\langle f, y_n \rangle_\Omega, \langle g, y_n \rangle_\Omega)$
$\to \mathrm{cov}(\langle f, y \rangle_\Omega, \langle g, y \rangle_\Omega) = \mathrm{cov}(\langle f, \mathcal{L} x \rangle_\Omega, \langle g, \mathcal{L} x \rangle_\Omega)$

as $n \to \infty$, due to requirement (30).
References
Adler, R. J. (2009) The Geometry of Random Fields. Philadelphia: Society for Industrial and Applied Mathematics.
Adler, R. J. and Taylor, J. (2007) Random Fields and Geometry. New York: Springer.
Allcroft, D. J. and Glasbey, C. A. (2003) A latent Gaussian Markov random-field model for spatiotemporal rainfall disaggregation. Appl. Statist., 52, 487–498.
Arjas, E. and Gasbarra, D. (1996) Bayesian inference of survival probabilities, under stochastic ordering constraints. J. Am. Statist. Ass., 91, 1101–1109.
Auslander, L. and MacKenzie, R. E. (1977) Introduction to Differentiable Manifolds. New York: Dover Publications.
Banerjee, S., Carlin, B. P. and Gelfand, A. E. (2004) Hierarchical Modeling and Analysis for Spatial Data. Boca Raton: Chapman and Hall.
Banerjee, S., Gelfand, A. E., Finley, A. O. and Sang, H. (2008) Gaussian predictive process models for large spatial data sets. J. R. Statist. Soc. B, 70, 825–848.
Bansal, R., Staib, L. H., Xu, D., Zhu, H. and Peterson, B. S. (2007) Statistical analyses of brain surfaces using Gaussian random fields on 2-D manifolds. IEEE Trans. Med. Imgng, 26, 46–57.
Besag, J. (1974) Spatial interaction and the statistical analysis of lattice systems (with discussion). J. R. Statist. Soc. B, 36, 192–236.
Besag, J. (1975) Statistical analysis of non-lattice data. Statistician, 24, 179–195.
Besag, J. (1981) On a system of two-dimensional recurrence equations. J. R. Statist. Soc. B, 43, 302–309.
Besag, J. and Kooperberg, C. (1995) On conditional and intrinsic autoregressions. Biometrika, 82, 733–746.
Besag, J. and Mondal, D. (2005) First-order intrinsic autoregressions and the de Wijs process. Biometrika, 92, 909–920.
Besag, J., York, J. and Mollié, A. (1991) Bayesian image restoration with two applications in spatial statistics (with discussion). Ann. Inst. Statist. Math., 43, 1–59.
Bolin, D. and Lindgren, F. (2009) Wavelet Markov models as efficient alternatives to tapering and convolution fields. Preprint 2009:13. Lund University, Lund.
Bolin, D. and Lindgren, F. (2011) Spatial models generated by nested stochastic partial differential equations. Ann. Appl. Statist., to be published.
Brenner, S. C. and Scott, R. (2007) The Mathematical Theory of Finite Element Methods, 3rd edn. New York: Springer.
Brohan, P., Kennedy, J., Harris, I., Tett, S. and Jones, P. (2006) Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850. J. Geophys. Res., 111.
Chen, C. M. and Thomée, V. (1985) The lumped mass finite element method for a parabolic problem. J. Aust. Math. Soc. B, 26, 329–354.
Chilès, J. P. and Delfiner, P. (1999) Geostatistics: Modeling Spatial Uncertainty. Chichester: Wiley.
Ciarlet, P. G. (1978) The Finite Element Method for Elliptic Problems. Amsterdam: North-Holland.
Cressie, N. A. C. (1993) Statistics for Spatial Data. New York: Wiley.
Cressie, N. and Huang, H. C. (1999) Classes of nonseparable, spatio-temporal stationary covariance functions. J. Am. Statist. Ass., 94, 1330–1340.
Cressie, N. and Johannesson, G. (2008) Fixed rank kriging for very large spatial data sets. J. R. Statist. Soc. B, 70, 209–226.
Cressie, N. and Verzelen, N. (2008) Conditional-mean least-squares fitting of Gaussian Markov random fields to Gaussian fields. Computnl Statist. Data Anal., 52, 2794–2807.
Dahlhaus, R. and Künsch, H. R. (1987) Edge effects and efficient parameter estimation for stationary random fields. Biometrika, 74, 877–882.
Das, B. (2000) Global covariance modeling: a deformation approach to anisotropy. PhD Thesis. Department of Statistics, University of Washington, Seattle.
Davis, T. A. (2006) Direct Methods for Sparse Linear Systems. Philadelphia: Society for Industrial and Applied Mathematics.
Diggle, P. J. and Ribeiro, P. J. (2006) Model-based Geostatistics. New York: Springer.
Duff, I. S., Erisman, A. M. and Reid, J. K. (1989) Direct Methods for Sparse Matrices, 2nd edn. New York: Clarendon.
Edelsbrunner, H. (2001) Geometry and Topology for Mesh Generation. Cambridge: Cambridge University Press.
Eidsvik, J., Finley, A. O., Banerjee, S. and Rue, H. (2010) Approximate Bayesian inference for large spatial datasets using predictive process models. Technical Report 9. Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim.
Federer, H. (1951) Hausdorff measure and Lebesgue area. Proc. Natn. Acad. Sci. USA, 37, 90–94.
Federer, H. (1978) Colloquium lectures on geometric measure theory. Bull. Am. Math. Soc., 84, 291–338.
Fuentes, M. (2001) High frequency kriging for nonstationary environmental processes. Environmetrics, 12, 469–483.
Fuentes, M. (2008) Approximate likelihood for large irregularly spaced spatial data. J. Am. Statist. Ass., 102, 321–331.
Furrer, R., Genton, M. G. and Nychka, D. (2006) Covariance tapering for interpolation of large spatial datasets. J. Computnl Graph. Statist., 15, 502–523.
George, A. and Liu, J. W. H. (1981) Computer Solution of Large Sparse Positive Definite Systems. Englewood Cliffs: Prentice Hall.
Gneiting, T. (1998) Simple tests for the validity of correlation function models on the circle. Statist. Probab. Lett., 39, 119–122.
Gneiting, T. (2002) Nonseparable, stationary covariance functions for space–time data. J. Am. Statist. Ass., 97, 590–600.
Gneiting, T., Kleiber, W. and Schlather, M. (2010) Matérn cross-covariance functions for multivariate random fields. J. Am. Statist. Ass., 105, 1167–1177.
Gschlößl, S. and Czado, C. (2007) Modelling count data with overdispersion and spatial effects. Statistical Papers. (Available from http://dx.doi.org/10.1007/s00362-006-0031-6.)
Guttorp, P. and Gneiting, T. (2006) Studies in the history of probability and statistics XLIX: on the Matérn correlation family. Biometrika, 93, 989–995.
Guyon, X. (1982) Parameter estimation for a stationary process on a d-dimensional lattice. Biometrika, 69, 95–105.
Hansen, J., Ruedy, R., Glascoe, J. and Sato, M. (1999) GISS analysis of surface temperature change. J. Geophys. Res., 104, 30997–31022.
Hansen, J., Ruedy, R., Sato, M., Imhoff, M., Lawrence, W., Easterling, D., Peterson, T. and Karl, T. (2001) A closer look at United States and global surface temperature change. J. Geophys. Res., 106, 23947–23963.
Hartman, L. and Hössjer, O. (2008) Fast kriging of large data sets with Gaussian Markov random fields. Computnl Statist. Data Anal., 52, 2331–2349.
Heine, V. (1955) Models for two-dimensional stationary stochastic processes. Biometrika, 42, 170–178.
Henderson, R., Shimakura, S. and Gorst, D. (2002) Modelling spatial variation in leukemia survival data. J. Am. Statist. Ass., 97, 965–972.
Higdon, D. (1998) A process-convolution approach to modelling temperatures in the North Atlantic Ocean. Environ. Ecol. Statist., 5, 173–190.
Higdon, D., Swall, J. and Kern, J. (1999) Non-stationary spatial modelling. In Bayesian Statistics 6 (eds J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith), pp. 761–768. New York: Oxford University Press.
Hjelle, Ø. and Dæhlen, M. (2006) Triangulations and Applications. Berlin: Springer.
Hrafnkelsson, B. and Cressie, N. A. C. (2003) Hierarchical modeling of count data with application to nuclear fall-out. Environ. Ecol. Statist., 10, 179–200.
Hughes-Oliver, J. M., Gonzalez-Farias, G., Lu, J. C. and Chen, D. (1998) Parametric nonstationary correlation models. Statist. Probab. Lett., 40, 267–278.
Ilić, M., Turner, I. W. and Anh, V. (2008) A numerical solution using an adaptively preconditioned Lanczos method for a class of linear systems related with the fractional Poisson equation. J. Appl. Math. Stoch. Anal., 104525.
Jones, R. H. (1963) Stochastic processes on a sphere. Ann. Math. Statist., 34, 213–218.
Jun, M. and Stein, M. L. (2008) Nonstationary covariance models for global data. Ann. Appl. Statist., 2, 1271–1289.
Karypis, G. and Kumar, V. (1998) METIS: a Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-reducing Orderings of Sparse Matrices, Version 4.0. Minneapolis: University of Minnesota. (Available from http://www-users.cs.umn.edu/karypis/metis/index.html.)
Kneib, T. and Fahrmeir, L. (2007) A mixed model approach for geoadditive hazard regression. Scand. J. Statist., 34, 207–228.
Krantz, S. G. and Parks, H. R. (2008) Geometric Integration Theory. Boston: Birkhäuser.
Lindgren, F. and Rue, H. (2008) A note on the second order random walk model for irregular locations. Scand. J. Statist., 35, 691–700.
McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models, 2nd edn. London: Chapman and Hall.
Paciorek, C. and Schervish, M. (2006) Spatial modelling using a new class of nonstationary covariance functions. Environmetrics, 17, 483–506.
Peterson, T. and Vose, R. (1997) An overview of the Global Historical Climatology Network temperature database. Bull. Am. Meteorol. Soc., 78, 2837–2849.
Pettitt, A. N., Weir, I. S. and Hart, A. G. (2002) A conditional autoregressive Gaussian process for irregularly spaced multivariate data with application to modelling large sets of binary data. Statist. Comput., 12, 353–367.
Quarteroni, A. M. and Valli, A. (2008) Numerical Approximation of Partial Differential Equations, 2nd edn. New York: Springer.
Rozanov, A. (1982) Markov Random Fields. New York: Springer.
Rue, H. (2001) Fast sampling of Gaussian Markov random fields. J. R. Statist. Soc. B, 63, 325–338.
Rue, H. and Held, L. (2005) Gaussian Markov Random Fields: Theory and Applications. London: Chapman and Hall.
Rue, H., Martino, S. and Chopin, N. (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations (with discussion). J. R. Statist. Soc. B, 71, 319–392.
Rue, H. and Tjelmeland, H. (2002) Fitting Gaussian Markov random fields to Gaussian fields. Scand. J. Statist., 29, 31–50.
Samko, S. G., Kilbas, A. A. and Maričev, O. I. (1992) Fractional Integrals and Derivatives: Theory and Applications. Yverdon: Gordon and Breach.
Sampson, P. D. and Guttorp, P. (1992) Nonparametric estimation of nonstationary spatial covariance structure. J. Am. Statist. Ass., 87, 108–119.
Smith, T. (1934) Change of variables in Laplace's and other second-order differential equations. Proc. Phys. Soc., 46, 344–349.
Song, H., Fuentes, M. and Ghosh, S. (2008) A comparative study of Gaussian geostatistical models and Gaussian Markov random field models. J. Multiv. Anal., 99, 1681–1697.
Stein, M. (2005) Space–time covariance functions. J. Am. Statist. Ass., 100, 310–321.
Stein, M. L. (1999) Interpolation of Spatial Data: Some Theory for Kriging. New York: Springer.
Stein, M. L., Chi, Z. and Welty, L. J. (2004) Approximating likelihoods for large spatial data sets. J. R. Statist. Soc. B, 66, 275–296.
Vecchia, A. V. (1988) Estimation and model identification for continuous spatial processes. J. R. Statist. Soc. B, 50, 297–312.
Wahba, G. (1981) Spline interpolation and smoothing on the sphere. SIAM J. Scient. Statist. Comput., 2, 5–16.
Wall, M. M. (2004) A close look at the spatial structure implied by the CAR and SAR models. J. Statist. Planng Inf., 121, 311–324.
Weir, I. S. and Pettitt, A. N. (2000) Binary probability maps using a hidden conditional autoregressive Gaussian process with an application to Finnish common toad data. Appl. Statist., 49, 473–484.
Whittle, P. (1954) On stationary processes in the plane. Biometrika, 41, 434–449.
Whittle, P. (1963) Stochastic processes in several dimensions. Bull. Inst. Int. Statist., 40, 974–994.
Yue, Y. and Speckman, P. (2010) Nonstationary spatial Gaussian Markov random fields. J. Computnl Graph. Statist., 19, 96–116.