J. R. Statist. Soc. B (2011) 73, Part 4, pp.
An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach
Finn Lindgren and Håvard Rue
Norwegian University of Science and Technology, Trondheim, Norway
and Johan Lindström
Lund University, Sweden
[Read before The Royal Statistical Society at a meeting organized by the Research Section on Wednesday, March 16th, 2011, Professor D. M. Titterington in the Chair]
Summary. Continuously indexed Gaussian fields (GFs) are the most important ingredient in spatial statistical modelling and geostatistics. The specification through the covariance function gives an intuitive interpretation of the field properties. On the computational side, GFs are hampered with the big n problem, since the cost of factorizing dense matrices is cubic in the dimension. Although computational power today is at an all-time high, this fact seems still to be a computational bottleneck in many applications. Along with GFs, there is the class of Gaussian Markov random fields (GMRFs), which are discretely indexed. The Markov property makes the precision matrix involved sparse, which enables the use of numerical algorithms for sparse matrices that, for fields in ℝ², use only the square root of the time required by general algorithms. The specification of a GMRF is through its full conditional distributions, but its marginal properties are not transparent in such a parameterization. We show that, using an approximate stochastic weak solution to (linear) stochastic partial differential equations, we can, for some GFs in the Matérn class, provide an explicit link, for any triangulation of ℝ^d, between GFs and GMRFs, formulated as a basis function representation. The consequence is that we can take the best from the two worlds and do the modelling by using GFs but do the computations by using GMRFs. Perhaps more importantly, our approach generalizes to other covariance functions generated by SPDEs, including oscillating and non-stationary GFs, as well as GFs on manifolds. We illustrate our approach by analysing global temperature data with a non-stationary model defined on a sphere.
Keywords: Approximate Bayesian inference; Covariance functions; Gaussian fields; Gaussian Markov random fields; Latent Gaussian models; Sparse matrices; Stochastic partial differential equations
1. Introduction
Gaussian fields (GFs) have a dominant role in spatial statistics and especially in the traditional field of geostatistics (Cressie, 1993; Stein, 1999; Chilès and Delfiner, 1999; Diggle and Ribeiro, 2006) and form an important building block in modern hierarchical spatial models (Banerjee et al., 2004). GFs are one of a few appropriate multivariate models with an explicit and computable normalizing constant and have good analytic properties otherwise. In a domain D ⊆ ℝ^d with co-ordinate s ∈ D, x(s) is a continuously indexed GF if all finite collections {x(s_i)} are jointly
Address for correspondence: Håvard Rue, Department of Mathematical Sciences, Norwegian University of Science and Technology, N-7491 Trondheim, Norway.
E-mail: hrue@math.ntnu.no
2 F. Lindgren, H. Rue and J. Lindström
Gaussian distributed. In most cases, the GF is specified by using a mean function μ(·) and a covariance function C(·, ·), so the mean is μ = (μ(s_i)) and the covariance matrix is Σ = (C(s_i, s_j)).
Often the covariance function is only a function of the relative position of two locations, in which case it is said to be stationary, and it is isotropic if the covariance function depends only on the Euclidean distance between the locations. Since a regular covariance matrix is positive definite, the covariance function must be a positive definite function. This restriction makes it difficult to invent covariance functions stated as closed form expressions. Bochner's theorem can be used in this context, as it characterizes all continuous positive definite functions in ℝ^d.
Although GFs are convenient from both an analytical and a practical point of view, the computational issues have always been a bottleneck. This is due to the general cost of O(n³) to factorize dense n × n (covariance) matrices. Although the computational power today is at an all-time high, the tendency seems to be that the dimension n is always set, or we want to set it, a little higher than the value that gives a reasonable computation time. The increasing popularity of hierarchical Bayesian models has made this issue more important, as repeated computations (as for simulation-based model fitting) can be very slow, perhaps infeasible (Banerjee et al. (2004), page 387), and the situation is informally referred to as the big n problem.
There are several approaches to try to overcome or avoid the big n problem. The spectral representation approach for the likelihood (Whittle, 1954) makes it possible to estimate the (power) spectrum (using discrete Fourier transform calculations) and to compute the log-likelihood from it (Guyon, 1982; Dahlhaus and Künsch, 1987; Fuentes, 2008), but this is only possible for directly observed stationary GFs on a (near) regular lattice. Vecchia (1988) and Stein et al. (2004) proposed to use an approximate likelihood constructed through a sequential representation and then to simplify the conditioning set, and similar ideas also apply when computing conditional expectations (kriging). An alternative approach is to do exact computations on a simplified Gaussian model of low rank (Banerjee et al., 2008; Cressie and Johannesson, 2008; Eidsvik et al., 2010). Furrer et al. (2006) applied covariance tapering to zero out parts of the covariance matrix to gain a computational speed-up. However, the sparsity pattern will depend on the range of the GFs, and the potential in a related approach, named lattice methods by Banerjee et al. (2004), section A.5.3, is superior to the covariance tapering idea. In this approach the GF is replaced by a Gaussian Markov random field (GMRF); see Rue and Held (2005) for a detailed introduction and Rue et al. (2009), section 2.1, for a condensed review. A GMRF is a discretely indexed Gaussian field x, where the full conditionals π(x_i | x_{−i}), i = 1, . . . , n, depend only on a set of neighbours ∂i of each site i (where consistency requirements imply that if i ∈ ∂j then also j ∈ ∂i). The computational gain comes from the fact that the zero pattern of the precision matrix Q (the inverse covariance matrix) relates directly to the notion of neighbours: Q_ij ≠ 0 ⟺ i ∈ ∂j ∪ {j}; see, for example, Rue and Held (2005), section 2.2. Algorithms for Markov chain Monte Carlo sampling will repeatedly update from these simple full conditionals, which explains to a large extent the popularity of GMRFs in recent years, starting already with the seminal papers by Besag (1974, 1975). However, GMRFs also allow for fast direct numerical algorithms (Rue, 2001), as numerical factorization of the matrix Q can be done by using sparse matrix algorithms (George and Liu, 1981; Duff et al., 1989; Davis, 2006) at a typical cost of O(n^{3/2}) for two-dimensional GMRFs; see Rue and Held (2005) for detailed algorithms. GMRFs have very good computational properties, which are of major importance in Bayesian inferential methods. This is further enhanced by the link to integrated nested Laplace approximations (Rue et al., 2009), which allow fast and accurate Bayesian inference for latent GF models.
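The cost contrast between dense covariance and sparse precision computations is easy to illustrate numerically. The following Python sketch (our own toy illustration, not code or a model from the paper; the lattice precision matrix and all parameter values are arbitrary choices) builds a sparse, symmetric positive definite precision matrix for a GMRF on a square lattice and factorizes it with a sparse LU decomposition:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Toy GMRF precision on an m x m lattice: a discrete Laplacian plus a
# diagonal term, giving a sparse symmetric positive definite matrix.
m = 30
n = m * m
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
I = sp.identity(m)
Q = (sp.kron(I, T) + sp.kron(T, I) + 0.5 * sp.identity(n)).tocsc()

# Only a tiny fraction of the n^2 entries are non-zero.
print(Q.nnz / n**2)

# Sparse LU factorization: the factor is computed once and reused for
# repeated solves, far cheaper than dense O(n^3) factorization.
lu = splu(Q)
b = np.ones(n)
x = lu.solve(b)
print(np.abs(Q @ x - b).max())
```

The same factor object can be reused for every right-hand side, which is what makes repeated computations in simulation-based model fitting feasible.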
Although GMRFs have very good computational properties, there are reasons why current
statistical models based on GMRFs are relatively simple, in particular when applied to area data
Link between Gaussian Fields and Gaussian Markov Random Fields 3
from regions or counties. First, there has been no good way to parameterize the precision matrix of a GMRF to achieve predefined behaviour in terms of correlation between two sites and to control marginal variances. In matrix terms, the reason for this is that one must construct a positive definite precision matrix to obtain a positive definite covariance matrix as its inverse, so the conditions for proper covariance matrices are replaced by essentially equivalent conditions for sparse precision matrices. Therefore, simplistic approaches are often taken, like letting Q_ij be related to the reciprocal distance between sites i and j (Besag et al., 1991; Arjas and Gasbarra, 1996; Weir and Pettitt, 2000; Pettitt et al., 2002; Gschlößl and Czado, 2007); however, a more detailed analysis shows that such a rationale is suboptimal (Besag and Kooperberg, 1995; Rue and Tjelmeland, 2002) and can give surprising effects (Wall, 2004). Secondly, it is unclear how large the class of useful GMRF models really is when using only a simple neighbourhood. The complicating issue here is the global positive definiteness constraint, and it might not be evident how this influences the parameterization of the full conditionals.
Rue and Tjelmeland (2002) demonstrated empirically that GMRFs could closely approximate most of the commonly used covariance functions in geostatistics, and they proposed to use them as computational replacements for GFs, for example when doing kriging (Hartman and Hössjer, 2008). However, there were several drawbacks with their approach: first, the fitting of GMRFs to GFs was restricted to a regular lattice (or torus), and the fit itself had to be precomputed for a discrete set of parameter values (like smoothness and range), using a time-consuming numerical optimization. Despite these proof-of-concept results, several researchers have followed up this idea without any notable progress in the methodology (Hrafnkelsson and Cressie, 2003; Song et al., 2008; Cressie and Verzelen, 2008), but the approach itself has proved useful even for spatiotemporal models (Allcroft and Glasbey, 2003).
The discussion so far suggests the following modelling and computational strategy for approaching the big n problem.

(a) Do the modelling by using a GF on a set of locations {s_i}, to construct a discretized GF with covariance matrix Σ.
(b) Find a GMRF with local neighbourhood and precision matrix Q that represents the GF in the best possible way, i.e. Q^{−1} is close to Σ in some norm. (We deliberately use the word 'represents' instead of 'approximates'.)
(c) Do the computations using the GMRF representation by using numerical methods for sparse matrices.
Such an approach relies on several assumptions. First, the GF must be of such a type that there is a GMRF with local neighbourhood that can represent it sufficiently accurately to maintain the interpretation of the parameters and the results. Secondly, we must be able to compute the GMRF representation from the GF, at any collection of locations, so fast that we still achieve a considerable speed-up compared with treating the GF directly.

The purpose of this paper is to demonstrate that these requirements can indeed be met for certain members of the class of GFs with the Matérn covariance function in ℝ^d, where the GMRF representation is available explicitly. Although these results are seemingly restrictive at first sight, they cover the most important and most used covariance model in spatial statistics; see Stein (1999), page 14, which concluded a detailed theoretical analysis with 'Use the Matérn model'. The GMRF representation can be constructed explicitly by using a certain stochastic partial differential equation (SPDE) which has GFs with the Matérn covariance function as its solution when driven by Gaussian white noise. The result is a basis function representation with piecewise linear basis functions, and Gaussian weights with Markov dependences determined by a general triangulation of the domain.
Rather surprisingly, extending this basic result seems to open new doors and opportunities, and to provide quite simple answers to rather difficult modelling problems. In particular, we shall show how this approach extends to Matérn fields on manifolds, non-stationary fields and fields with oscillating covariance functions. Further, we shall discuss the link to the deformation method of Sampson and Guttorp (1992) for non-stationary covariances for non-isotropic models, and how our approach naturally extends to non-separable space-time models. Our basic task, to do the modelling by using GFs and the computations by using the GMRF representation, still holds for these extensions, as the GMRF representation is still available explicitly. An important observation is that the resulting modelling strategy does not involve having to construct explicit formulae for the covariance functions, which are instead only defined implicitly through the SPDE specifications.

The plan of the rest of this paper is as follows. In Section 2, we discuss the relationship between Matérn covariances and a specific SPDE, and we present the two main results for explicitly constructing the precision matrices for GMRFs based on this relationship. In Section 3, the results are extended to fields on triangulated manifolds, non-stationary and oscillating models, and non-separable space-time models. The extensions are illustrated with a non-stationary analysis of global temperature data in Section 4, and we conclude the main part of the paper with a discussion in Section 5. There then follow four technical appendices, with explicit representation results (A), theory for random fields on manifolds (B), the Hilbert space representation details (C) and proofs of the technical details (D).
2. Preliminaries and main results
This section will introduce the Matérn covariance model and discuss its representation through an SPDE. We shall state explicit results for the GMRF representation of Matérn fields on a regular lattice and give an informal summary of the main results.
2.1. Matérn covariance model and its stochastic partial differential equation
Let ‖·‖ denote the Euclidean distance in ℝ^d. The Matérn covariance function between locations u, v ∈ ℝ^d is defined as

    r(u, v) = σ² / {2^{ν−1} Γ(ν)} (κ‖v − u‖)^ν K_ν(κ‖v − u‖).    (1)

Here K_ν is the modified Bessel function of the second kind and order ν > 0, κ > 0 is a scaling parameter and σ² is the marginal variance. The integer value of ν determines the mean-square differentiability of the underlying process, which matters for predictions that are made by using such a model. However, ν is usually fixed since it is poorly identified in typical applications. A more natural interpretation of the scaling parameter κ is as a range parameter ρ: the Euclidean distance at which x(u) and x(v) are almost independent. Lacking a simple relationship, we shall throughout this paper use the empirically derived definition ρ = √(8ν)/κ, corresponding to correlations near 0.1 at the distance ρ, for all ν.
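The correlation implied by (1) and the empirical range rule ρ = √(8ν)/κ are straightforward to evaluate numerically. The following Python sketch (our own illustration, not code from the paper; the helper name and the value κ = 3 are arbitrary) checks that the correlation at distance ρ is indeed near 0.1 for several values of ν:

```python
import numpy as np
from scipy.special import gamma, kv

def matern_corr(dist, kappa, nu):
    """Matern correlation 2^(1-nu)/Gamma(nu) (kappa d)^nu K_nu(kappa d), dist > 0."""
    kd = kappa * dist
    return 2.0 ** (1.0 - nu) / gamma(nu) * kd ** nu * kv(nu, kd)

kappa = 3.0
for nu in (0.5, 1.0, 2.0):
    rho = np.sqrt(8.0 * nu) / kappa          # empirically derived range
    print(nu, float(matern_corr(rho, kappa, nu)))
```

For ν = 1/2 the formula reduces to the exponential correlation exp(−κd), for which the value at ρ is exp(−√4) ≈ 0.135, illustrating why the rule is stated as "correlations near 0.1" rather than exactly 0.1.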
The Matérn covariance function appears naturally in various scientific fields (Guttorp and Gneiting, 2006), but the important relationship that we shall make use of is that a GF x(u) with the Matérn covariance is a solution to the linear fractional SPDE

    (κ² − Δ)^{α/2} x(u) = W(u),   u ∈ ℝ^d,   α = ν + d/2,   κ > 0,   ν > 0,    (2)

where (κ² − Δ)^{α/2} is a pseudo-differential operator that we shall define later in equation (4) through its spectral properties (Whittle, 1954, 1963). The innovation process W is spatial
Gaussian white noise with unit variance, Δ is the Laplacian

    Δ = Σ_{i=1}^{d} ∂²/∂x_i²,

and the marginal variance is

    σ² = Γ(ν) / {Γ(ν + d/2) (4π)^{d/2} κ^{2ν}}.
We shall name any solution to equation (2) a Matérn field in what follows. However, the limiting solutions to the SPDE (2) as κ → 0 or ν → 0 do not have Matérn covariance functions, but the SPDE still has solutions when κ = 0 or ν = 0 which are well-defined random measures. We shall return to this issue in Appendix C.3. Further, there is an implicit assumption of appropriate boundary conditions for the SPDE, as for α ≥ 2 the null space of the differential operator is non-trivial, containing, for example, the functions exp(κ e^T u), for all ‖e‖ = 1. The Matérn fields are the only stationary solutions to the SPDE.
The proof that was given by Whittle (1954, 1963) is to show that the wave number spectrum of a stationary solution is

    R(k) = (2π)^{−d} (κ² + ‖k‖²)^{−α},    (3)

using the Fourier transform definition of the fractional Laplacian in ℝ^d,

    {F (κ² − Δ)^{α/2} φ}(k) = (κ² + ‖k‖²)^{α/2} (Fφ)(k),    (4)

where φ is a function on ℝ^d for which the right-hand side of the definition has a well-defined inverse Fourier transform.
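The relation between equation (3) and the Matérn covariance can be checked numerically in the simplest case d = 1, ν = 1/2 (so α = 1), where (1) reduces to σ² exp(−κ|h|) with σ² = 1/(2κ) from the marginal variance formula. The sketch below (our own check; the value κ = 2 and the truncation of the integral are arbitrary choices) inverts the Fourier transform of the spectrum numerically and compares with the closed form:

```python
import numpy as np

kappa = 2.0
# d = 1, nu = 1/2, alpha = 1: r(h) = sigma^2 exp(-kappa |h|), sigma^2 = 1/(2 kappa).
sigma2 = 1.0 / (2.0 * kappa)

def spectrum(k):
    # Equation (3) with d = 1 and alpha = 1.
    return (2.0 * np.pi) ** (-1) / (kappa ** 2 + k ** 2)

# Invert the Fourier transform numerically: r(h) = int R(k) exp(i k h) dk,
# truncating the integral to a large symmetric interval.
k = np.linspace(-2000.0, 2000.0, 2_000_001)
dk = k[1] - k[0]
results = {}
for h in (0.0, 0.5, 1.0):
    r_num = float(np.sum(spectrum(k) * np.cos(k * h)) * dk)
    r_exact = sigma2 * np.exp(-kappa * abs(h))
    results[h] = (r_num, r_exact)
    print(h, r_num, r_exact)
```

The small residual discrepancy comes from truncating the integration interval; the tail of (3) decays like ‖k‖^{−2α}.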
2.2. Main results
This section contains our main results, stated in a loose and imprecise form. In the appendices, our statements are made precise and the proofs are given. In the discussion we shall restrict ourselves to dimension d = 2, although our results are general.
2.2.1. Main result 1
For our first result, we shall use some hand-waving arguments and a simple but powerful consequence of a partly analytic result of Besag (1981). We shall show that these results are true in the appendices. Let x be a GMRF on a regular (tending to infinite) two-dimensional lattice indexed by ij, where the Gaussian full conditionals are

    E(x_ij | x_{−ij}) = (1/a)(x_{i−1,j} + x_{i+1,j} + x_{i,j−1} + x_{i,j+1}),
    var(x_ij | x_{−ij}) = 1/a,    (5)

and |a| > 4. To simplify the notation, we write this particular model as

    | −1
    |  a  −1    (6)

which displays the elements of the precision matrix related to a single location (section 3.4.2 in Rue and Held (2005) uses a related graphical notation). Owing to symmetry, we display only the upper right quadrant, with a as the central element. The approximate result (Besag (1981),
equation (14)) is that

    cov(x_ij, x_{i′j′}) ≈ a/(2π) K_0{l √(a − 4)},   l ≠ 0,

where l is the Euclidean distance between ij and i′j′. Evaluated for continuous distances, this is a generalized covariance function, which is obtained from equation (1) in the limit ν → 0, with κ² = a − 4 and σ² = a/(4π), even though equation (1) requires ν > 0. Informally, this means that the discrete model defined by expression (5) generates approximate solutions to the SPDE (2) on a unit distance regular grid, with ν = 0.
Solving equation (2) for α = 1 gives a generalized random field with spectrum

    R_1(k) ∝ (a − 4 + ‖k‖²)^{−1},

meaning that (some discretized version of) the SPDE acts like a linear filter with squared transfer function equal to R_1. If we replace the noise term on the right-hand side of equation (2) by Gaussian noise with spectrum R_1, the resulting solution has spectrum R_2 = R_1², and so on. The consequence is GMRF representations for the Matérn fields for ν = 1 and ν = 2, as convolutions of the coefficients in (6):

ν = 1:
    | 1
    | −2a     2
    | 4+a²   −2a   1

ν = 2:
    | −1
    | 3a        −3
    | −3(a²+3)   6a        −3
    | a(a²+12)  −3(a²+3)   3a   −1

The marginal variance is 1/{4πν(a − 4)^ν}.
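The claim that the ν = 1 and ν = 2 coefficients arise as convolutions of the stencil in (6) is easy to verify numerically. The sketch below (our own check; the value a = 5 is an arbitrary choice with |a| > 4) convolves the full α = 1 precision stencil with itself and reads off the resulting coefficients:

```python
import numpy as np
from scipy.signal import convolve2d

a = 5.0  # any |a| > 4
S1 = np.array([[0., -1., 0.],
               [-1.,  a, -1.],
               [0., -1., 0.]])   # full precision stencil of model (5)/(6)

S2 = convolve2d(S1, S1)   # nu = 1 stencil (5 x 5)
S3 = convolve2d(S2, S1)   # nu = 2 stencil (7 x 7)

print(S2[2, 2], a**2 + 4)        # centre: 4 + a^2
print(S2[2, 1], -2 * a)          # nearest neighbour: -2a
print(S2[1, 1], 2.0)             # diagonal: 2
print(S3[3, 3], a * (a**2 + 12)) # centre of the nu = 2 stencil
```

Each extra convolution widens the stencil by one lattice step in each direction, matching the growing GMRF neighbourhood as the field becomes smoother.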
2.3. Main result 2
The inner product between functions f and g is defined as

    ⟨f, g⟩ = ∫ f(u) g(u) du,    (7)

where the integral is over the region of interest. The stochastic weak solution of the SPDE is found by requiring that

    {⟨φ_j, (κ² − Δ)^{α/2} x⟩, j = 1, . . . , m}  =_d  {⟨φ_j, W⟩, j = 1, . . . , m}    (8)
Fig. 2. (a) Locations of leukaemia survival observations, (b) triangulation using 3446 triangles and (c) a stationary correlation function (full curve) and the corresponding GMRF approximation (dots) for ν = 1 and approximate range 0.26
for every appropriate finite set of test functions {φ_j(u), j = 1, . . . , m}, where =_d denotes equality in distribution.
The next step is to construct a finite element representation of the solution to the SPDE (Brenner and Scott, 2007) as

    x(u) = Σ_{k=1}^{n} ψ_k(u) w_k    (9)

for some chosen basis functions {ψ_k} and Gaussian-distributed weights {w_k}. Here, n is the number of vertices in the triangulation. We choose to use functions ψ_k that are piecewise linear in each triangle, defined such that ψ_k is 1 at vertex k and 0 at all other vertices. An interpretation of the representation (9) with this choice of basis functions is that the weights determine the values of the field at the vertices, and the values in the interior of the triangles are determined by linear interpolation. The full distribution of the continuously indexed solution is determined by the joint distribution of the weights.
The finite dimensional solution is obtained by finding the distribution for the representation weights in equation (9) that fulfils the stochastic weak SPDE formulation (8) for only a specific set of test functions, with m = n. The choice of test functions, in relation to the basis functions, governs the approximation properties of the resulting model representation. We choose φ_k = (κ² − Δ)^{1/2} ψ_k for α = 1 and φ_k = ψ_k for α = 2. These two approximations are denoted the least squares and the Galerkin solution respectively. For α ≥ 3, we let α = 2 on the left-hand side of equation (2) and replace the right-hand side with a field generated by α − 2, and let φ_k = ψ_k. In essence, this generates a recursive Galerkin formulation, terminating in either α = 1 or α = 2; see Appendix C for details.
Define the n × n matrices C, G and K_{κ²} with entries

    C_ij = ⟨ψ_i, ψ_j⟩,
    G_ij = ⟨∇ψ_i, ∇ψ_j⟩,
    (K_{κ²})_ij = κ² C_ij + G_ij.
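In ℝ¹, with hat functions on a uniform mesh of spacing h, these entries have simple closed forms from standard finite element theory: in the interior, C_ii = 2h/3, C_{i,i±1} = h/6, G_ii = 2/h and G_{i,i±1} = −1/h. The sketch below (our own one-dimensional illustration; the paper's explicit formulae for the triangulated case are in Appendix A) verifies these values by numerical quadrature:

```python
import numpy as np

h = 0.5
nodes = np.arange(0.0, 2.0 + h / 2, h)   # uniform 1-D mesh on [0, 2]

def psi(k, u):
    """Piecewise linear hat function centred at node k."""
    return np.maximum(0.0, 1.0 - np.abs(u - nodes[k]) / h)

def dpsi(k, u):
    """Derivative of the hat function (piecewise constant +-1/h)."""
    left = (u > nodes[k] - h) & (u < nodes[k])
    right = (u >= nodes[k]) & (u < nodes[k] + h)
    return left / h - right / h

# Riemann-sum quadrature on a fine grid.
u = np.linspace(0.0, 2.0, 400_001)
du = u[1] - u[0]
C_ii = np.sum(psi(2, u) * psi(2, u)) * du      # ~ 2h/3
C_i1 = np.sum(psi(2, u) * psi(3, u)) * du      # ~ h/6
G_ii = np.sum(dpsi(2, u) * dpsi(2, u)) * du    # ~ 2/h
G_i1 = np.sum(dpsi(2, u) * dpsi(3, u)) * du    # ~ -1/h
print(C_ii, 2 * h / 3)
print(C_i1, h / 6)
print(G_ii, 2 / h)
print(G_i1, -1 / h)
```

Because the basis functions overlap only with their neighbours, every other entry of C and G is exactly zero, which is what produces sparsity.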
Using Neumann boundary conditions (a zero normal derivative at the boundary), we obtain our second main result, expressed here for ℝ¹ and ℝ².
Result 2. Let Q_{α,κ²} be the precision matrix for the Gaussian weights w as defined in equation (9) for α = 1, 2, . . ., as a function of κ². Then the finite dimensional representations of the solutions to equation (2) have precisions

    Q_{1,κ²} = K_{κ²},
    Q_{2,κ²} = K_{κ²} C^{−1} K_{κ²},
    Q_{α,κ²} = K_{κ²} C^{−1} Q_{α−2,κ²} C^{−1} K_{κ²},   for α = 3, 4, . . . .    (10)
Some remarks concerning this result are as follows.

(a) The matrices C and G are easy to compute as their elements are non-zero only for pairs of basis functions which share common triangles (a line segment in ℝ¹), and their values do not depend on κ². Explicit formulae are given in Appendix A.
(b) The matrix C^{−1} is dense, which makes the precision matrix dense as well. In Appendix C.5, we show that C can be replaced by the diagonal matrix C̃, where C̃_ii = ⟨ψ_i, 1⟩, which makes the precision matrices sparse, and hence we obtain GMRF models.
(c) A consequence of the previous remarks is that we have an explicit mapping from the parameters of the GF model to the elements of a GMRF precision matrix, with computational cost O(n) for any triangulation.
(d) For the special case where all the vertices are points on a regular lattice, using a regular triangularization reduces main result 2 to main result 1. Note that the neighbourhood of the corresponding GMRF in ℝ² is 3 × 3 for α = 1, is 5 × 5 for α = 2, and so on. Increased smoothness of the random field induces a larger neighbourhood in the GMRF representation.
(e) In terms of the smoothness parameter ν in the Matérn covariance function, these results correspond to ν = 1/2, 3/2, 5/2, . . . in ℝ¹ and ν = 0, 1, 2, . . . in ℝ².
(f) We are currently unable to provide results for other values of α; the main obstacle is the fractional derivative in the SPDE, which is defined by using the Fourier transform (4). A result of Rozanov (1982), chapter 3.1, for the continuously indexed random field, says that a random field has a Markov property if and only if the reciprocal of the spectrum is a polynomial. For our SPDE (2) this corresponds to α = 1, 2, 3, . . .; see equation (3). This result indicates that a different approach may be needed to provide representation results when α is not an integer, such as approximating the spectrum itself. Given approximations for general 0 < α ≤ 2, the recursive approach could then be used for general α > 2.
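Result 2 together with remark (b) can be sanity checked end to end in ℝ¹, where α = 2 corresponds to ν = 3/2 and the Matérn correlation has the closed form (1 + κt) exp(−κt). The sketch below (our own construction; the mesh spacing, domain length and κ = 2 are arbitrary choices, and the lumped C̃ of remark (b) is used throughout) assembles K_{κ²}, forms Q_{2,κ²} = K_{κ²} C̃^{−1} K_{κ²} and compares the implied covariance, far from the boundary, with the exact Matérn values:

```python
import numpy as np

h, L, kappa = 0.05, 20.0, 2.0
n = int(round(L / h)) + 1

# 1-D FEM matrices for hat functions: lumped mass C~ and stiffness G,
# with Neumann boundary conditions.
Ct = np.full(n, h); Ct[0] = Ct[-1] = h / 2            # C~_ii = <psi_i, 1>
G = (np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h
G[0, 0] = G[-1, -1] = 1.0 / h
K = kappa**2 * np.diag(Ct) + G                        # K = kappa^2 C~ + G

Q2 = K @ np.diag(1.0 / Ct) @ K                        # Q_{2,kappa^2}
Sigma = np.linalg.inv(Q2)

m = n // 2                                            # node far from the boundary
var_num = Sigma[m, m]
var_exact = 1.0 / (4.0 * kappa**3)                    # sigma^2 for nu = 3/2, d = 1
t = 0.5
j = m + int(round(t / h))
corr_num = Sigma[m, j] / np.sqrt(Sigma[m, m] * Sigma[j, j])
corr_exact = (1.0 + kappa * t) * np.exp(-kappa * t)   # Matern correlation, nu = 3/2
print(var_num, var_exact)
print(corr_num, corr_exact)
```

The dense inverse here is only for checking; in practice one would keep Q sparse and never form Σ.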
Although the approach does give a GMRF representation of the Matérn field on the triangulated region, it is truly an approximation to the stochastic weak solution, as we use only a subset of the possible test functions. However, for a given triangulation, it is the best possible approximation in the sense that is made explicit in Appendix C, where we also show weak convergence to the full SPDE solutions. Using standard results from the finite element literature (Brenner and Scott, 2007), it is also possible to derive rates of convergence results, like, for α = 2,

    sup_{f ∈ H¹; ‖f‖_{H¹} ≤ 1} [E{⟨f, x_n − x⟩²_{H¹}}] ≤ c h².    (11)
Here, x_n is the GMRF representation of the SPDE solution x, h is the diameter of the largest circle that can be inscribed in a triangle in the triangulation, and c is some constant. The Hilbert space scalar product and norm are defined in definition 2 in Appendix B, which also includes the values and the gradients of the field. The result holds for general d ≥ 1, with h proportional to the edge lengths between the vertices, when the minimal mesh angles are bounded away from zero.
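An informal numerical proxy for this O(h²) behaviour is to track a pointwise quantity, such as the nodal variance at the centre of a one-dimensional domain, as the mesh is refined (our own check, not the H¹ statement of (11) itself; mesh sizes, domain and κ are arbitrary choices, and the lumped C̃ of remark (b) is used):

```python
import numpy as np

def centre_variance(h, L=20.0, kappa=2.0):
    """Nodal variance at the domain centre for the alpha = 2, d = 1 GMRF of result 2."""
    n = int(round(L / h)) + 1
    Ct = np.full(n, h); Ct[0] = Ct[-1] = h / 2
    G = (np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h
    G[0, 0] = G[-1, -1] = 1.0 / h
    K = kappa**2 * np.diag(Ct) + G
    Sigma = np.linalg.inv(K @ np.diag(1.0 / Ct) @ K)
    return Sigma[n // 2, n // 2]

exact = 1.0 / (4.0 * 2.0**3)             # 1/(4 kappa^3): nu = 3/2 variance in R^1
err_coarse = abs(centre_variance(0.2) - exact)
err_fine = abs(centre_variance(0.1) - exact)
print(err_coarse, err_fine)              # error shrinks roughly like h^2
```

Halving h should reduce the error by roughly a factor of four if the second-order rate holds.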
To see how well we can approximate the Matérn covariance, Fig. 2(c) displays the empirical correlation function (dots) and the theoretical function for ν = 1 with approximate range 0.26, using the triangulation in Fig. 2(b). The match is quite good. Some dots show a discrepancy from the true correlations, but these can be attributed to the rather rough triangulation outside the area of interest, which is included to reduce edge effects. In practice there is a trade-off between accuracy of the GMRF representation and the number of vertices used. In Fig. 2(b) we chose to use a fine resolution in the study area and a reduced resolution outside. A minor drawback in using these GMRFs in place of given stationary covariance models is the boundary effects due to the boundary conditions of the SPDE. In main result 2 we used Neumann conditions that inflate the variance near the boundary (see Appendix A.4 for details), but other choices are also possible (see Rue and Held (2005), chapter 5).
2.4. Leukaemia example
We shall now return to the example from Henderson et al. (2002) at the beginning of Section 2.3, which models spatial variation in leukaemia survival data in north-west England. The specification, in (pseudo) Wilkinson-Rogers notation (McCullagh and Nelder (1989), section 3.4), is

    survival(time, censoring) ~ intercept + sex + age + wbc + tpi + spatial(location),

using a Weibull likelihood for the survival times, and where wbc is the white blood cell count at diagnosis, tpi is the Townsend deprivation index (which is a measure of economic deprivation for the related district) and spatial is the spatial component depending on the spatial location for each measurement. The hyperparameters in this model are the marginal variance and range for the spatial component and the shape parameter in the Weibull distribution.

Kneib and Fahrmeir (2007) reanalysed the same data set by using a Cox proportional hazards model but, for computational reasons, used a low rank approximation for the spatial component. With our GMRF representation we easily work with a sparse 1749 × 1749 precision matrix for the spatial component. We ran the model in R-inla (www.r-inla.org) using integrated nested Laplace approximations to do the full Bayesian analysis (Rue et al., 2009). Fig. 3 displays the posterior mean and standard deviation of the spatial component. A full Bayesian analysis
Fig. 3. (a) Posterior mean and (b) standard deviation of the spatial effect on survival by using the GMRF representation
took about 16 s on a quad-core laptop, and factorizing the 2797 × 2797 (total) precision matrix took about 0.016 s on average.
3. Extensions: beyond classical Matérn models
In this section we shall discuss five extensions to the SPDE, widening the usefulness of the GMRF construction results in various ways. The first extension is to consider solutions to the SPDE on a manifold, which allows us to define Matérn fields on domains such as a sphere. The second extension is to allow for space-varying parameters in the SPDE, which allows us to construct non-stationary locally isotropic GFs. The third extension is to study a complex version of equation (2), which makes it possible to construct oscillating fields. The fourth extension generalizes the non-stationary SPDE to a more general class of non-isotropic fields. Finally, the fifth extension shows how the SPDE generalizes to non-separable space-time models.

An important feature of our approach is that all these extensions still give explicit GMRF representations similar to expressions (9) and (10), even if all the extensions are combined. The rather amazing consequence is that we can construct the GMRF representations of non-stationary oscillating GFs on the sphere, still not requiring any computation beyond the geometric properties of the triangulation. In Section 4 we shall illustrate the use of these extensions with a non-stationary model for global temperatures.
3.1. Matérn fields on manifolds
We shall now move away from ℝ² and consider Matérn fields on manifolds. GFs on manifolds are a well-studied subject with important applications to excursion sets in brain mapping (Adler and Taylor, 2007; Bansal et al., 2007; Adler, 2009). Our main objective is to construct Matérn fields on the sphere, which is important for the analysis of global spatial and spatiotemporal models. To simplify the current discussion we shall therefore restrict the construction of Matérn fields to a unit radius sphere S² in three dimensions, leaving the general case for the appendices.

Just as for ℝ^d, models on a sphere can be constructed via a spectral approach (Jones, 1963). A more direct way of defining covariance models on a sphere is to interpret the two-dimensional space S² as a surface embedded in ℝ³. Any three-dimensional covariance function can then be used to define the model on the sphere, considering only the restriction of the function to the surface. This has the interpretational disadvantage of using chordal distances to determine the correlation between points. Using the great circle distances in the original covariance function would not work in general, since for differentiable fields this does not yield a valid positive definite covariance function (this follows from Gneiting (1998), theorem 2). Thus, the Matérn covariance function in ℝ^d cannot be used to define GFs on a unit sphere embedded in ℝ³ with distance naturally defined with respect to distances within the surface. However, we can still use its origin, the SPDE! For this purpose, we simply reinterpret the SPDE to be defined on S² instead of ℝ^d, and the solution is still what we mean by a Matérn field, but defined directly for the given manifold. The Gaussian white noise which drives the SPDE can easily be defined on S² as a (zero-mean) random GF W(·) with the property that the covariance between W(A) and W(B), for any subsets A and B of S², is proportional to the surface integral over A ∩ B. Any regular 2-manifold behaves locally like ℝ², which heuristically explains why the GMRF representation of the weak solution only needs the definition of the inner product (7) changed to a surface integral on S². The theory in Appendices B-D covers the general manifold setting.
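The geometric ingredient that changes on the sphere is thus the surface integral in the inner product; for a triangulated sphere this amounts to working with spherical triangle areas. The sketch below (our own illustration; the icosahedral mesh is an arbitrary example) computes the exact solid angle of each triangle of a triangulated S² with the Van Oosterom-Strackee formula and checks that the areas sum to the total surface area 4π:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Vertices of a regular icosahedron, projected onto the unit sphere S^2.
p = (1 + np.sqrt(5)) / 2
V = np.array([[-1, p, 0], [1, p, 0], [-1, -p, 0], [1, -p, 0],
              [0, -1, p], [0, 1, p], [0, -1, -p], [0, 1, -p],
              [p, 0, -1], [p, 0, 1], [-p, 0, -1], [-p, 0, 1]], float)
V /= np.linalg.norm(V, axis=1, keepdims=True)

faces = ConvexHull(V).simplices          # the 20 triangular faces

def spherical_area(a, b, c):
    """Solid angle of a spherical triangle (Van Oosterom & Strackee formula)."""
    num = abs(np.linalg.det(np.stack([a, b, c])))
    den = 1.0 + a @ b + b @ c + c @ a
    return 2.0 * np.arctan2(num, den)

areas = np.array([spherical_area(*V[f]) for f in faces])
print(len(faces), areas.sum(), 4 * np.pi)   # areas sum to the sphere's area
```

Such per-triangle areas are exactly the kind of geometric quantity from which lumped mass entries ⟨ψ_i, 1⟩ on the sphere can be accumulated.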
To illustrate the continuous index definition and the Markov representation of Matérn fields on a sphere, Fig. 4 shows the locations of 7280 meteorological measurement stations on the globe, together with an irregular triangulation. The triangulation was constrained to have minimal angles 21°.

Fig. 4. (a), (b) Data locations and (c), (d) triangulation for the global temperature data set analysed in Section 4, with a coastline map superimposed
Fig. 6. (a) Structure of the (reordered) 15182 × 15182 precision matrix and (b) a visual representation of the reordering: the indices of each triangulation node have been mapped to grey scales showing the governing principle of the reordering algorithm, recursively dividing the graph into conditionally independent sets
3.2. Non-stationary fields
From a traditional point of view, the most surprising extension within the SPDE framework is how we can model non-stationarity. Many applications require non-stationarity in the correlation function and there is a vast literature on this subject (Sampson and Guttorp, 1992; Higdon, 1998; Hughes-Oliver et al., 1998; Cressie and Huang, 1999; Higdon et al., 1999; Fuentes, 2001; Gneiting, 2002; Stein, 2005; Paciorek and Schervish, 2006; Jun and Stein, 2008; Yue and Speckman, 2010). The SPDE approach has the additional huge advantage that the resulting
(non-stationary) GF is a GMRF, which allows for swift computations and can additionally be defined on a manifold.
In the SPDE defined in equation (2), the parameters κ² and the innovation variance are constant in space. In general, we can allow both parameters to depend on the coordinate u, and we write

{κ²(u) − Δ}^{α/2} {τ(u) x(u)} = W(u).    (12)
For simplicity, we choose to keep the variance for the innovation constant and instead scale the resulting process x(u) with a scaling parameter τ(u). Non-stationarity is achieved when one or both parameters are non-constant. Of particular interest is the case where they vary slowly with u, e.g. in a low dimensional representation like

log{κ²(u)} = Σ_i β_i^{(κ²)} B_i^{(κ²)}(u)   and   log{τ(u)} = Σ_i β_i^{(τ)} B_i^{(τ)}(u),

where the basis functions {B_i^{(·)}(·)} are smooth over the domain of interest. With slowly varying parameters κ²(u) and τ(u), the appealing local interpretation of equation (12) as a Matérn field remains unchanged, whereas the actual form of the non-stationary correlation function achieved is unknown. The actual process of combining all local Matérn fields into a consistent global field is done automatically by the SPDE.
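As an illustrative sketch (not from the paper), the log-linear parameterization can be evaluated as follows; the Gaussian-bump basis functions and all numerical values below are hypothetical choices, since the paper only requires the B_i^{(·)} to be smooth over the domain:

```python
import math

def make_log_linear_field(betas, centers, width):
    """Evaluate a positive parameter field such as kappa^2(u) or tau(u)
    through log{theta(u)} = sum_i beta_i * B_i(u), with smooth basis
    functions B_i (here: illustrative Gaussian bumps)."""
    def theta(u):
        log_val = sum(b * math.exp(-((u - c) / width) ** 2)
                      for b, c in zip(betas, centers))
        return math.exp(log_val)   # positivity is automatic on the log scale
    return theta

kappa2 = make_log_linear_field([0.5, -0.5], centers=[0.0, 1.0], width=0.5)
```

Because the expansion is on the log scale, the resulting field is positive by construction, and smoothness of the basis functions carries over to slow variation of the parameter.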
The GMRF representation of equation (12) is found by using the same approach as for the stationary case, with minor changes. For convenience, we assume that both κ² and τ can be considered as constant within the support of the basis functions {ψ_k}, and hence

(ψ_i, κ² ψ_j)_Ω = ∫ ψ_i(u) ψ_j(u) κ²(u) du ≈ C_ij κ²(u_j*)    (13)

for a naturally defined u_j* in the support of ψ_i and ψ_j. The consequence is a simple scaling of the matrices in expression (10) at no additional cost; see Appendix A.3. If we improve the integral approximation (13) from considering κ²(u) locally constant to locally planar, the computational preprocessing cost increases but is still O(1) for each element in the precision matrix Q.
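The locally constant approximation in equation (13) can be checked numerically in one dimension; the piecewise linear basis, the node spacing and the choice κ²(u) = exp(0.3u) below are illustrative assumptions:

```python
import math

def hat(u, c, h):
    """Piecewise linear basis function centred at node c with spacing h."""
    return max(0.0, 1.0 - abs(u - c) / h)

def quad(f, a, b, n=2000):
    """Midpoint-rule quadrature, accurate enough for this illustration."""
    w = (b - a) / n
    return sum(f(a + (k + 0.5) * w) for k in range(n)) * w

h = 0.1
kappa2 = lambda u: math.exp(0.3 * u)      # a slowly varying kappa^2(u)
ci, cj = 0.5, 0.6                         # two neighbouring nodes
# the two hats overlap only on [ci, cj]
exact = quad(lambda u: hat(u, ci, h) * hat(u, cj, h) * kappa2(u), ci, cj)
C_ij = quad(lambda u: hat(u, ci, h) * hat(u, cj, h), ci, cj)
approx = C_ij * kappa2(0.5 * (ci + cj))   # C_ij * kappa^2(u*), overlap midpoint
```

The relative error is second order in the variation of κ² over the overlap, which is exactly why the approximation is adequate for slowly varying parameters.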
3.3. Oscillating covariance functions
Another extension is to consider a complex version of the basic equation (2). For simplicity, we consider only the case α = 2. With innovation processes W₁ and W₂ as two independent white noise fields, and an oscillation parameter θ, the complex version becomes

{κ² exp(iπθ) − Δ}{x₁(u) + i x₂(u)} = W₁(u) + i W₂(u),   0 ≤ θ < 1.    (14)

The real and imaginary stationary solution components x₁ and x₂ are independent, with spectral densities

R(k) = (2π)^{−d} {κ⁴ + 2 cos(πθ) κ² |k|² + |k|⁴}^{−1}

on R^d. The corresponding covariance functions for R and R² are given in Appendix A. For general manifolds, no closed form expression can be found. In Fig. 7, we illustrate the resonance effects obtained for compact domains by comparing oscillating covariances for R² and the unit sphere, S². The precision matrices for the resulting fields are obtained by a simple modification of the construction for the regular case, with the precise expression given in Appendix A. The details
of the construction for the regular case, the precise expression given in Appendix A. The details
Link between Gaussian Fields and Gaussian Markov Random Fields 15
0 1 2 3 4 5
0
.
5
0
.
0
0
.
5
1
.
0
Distance
C
o
r
r
e
l
a
t
i
o
n
0.0 0.5 1.0 1.5 2.0 2.5 3.0
1
.
0
0
.
5
0
.
0
0
.
5
1
.
0
Distance
C
o
r
r
e
l
a
t
i
o
n
(a) (b)
Fig. 7. Correlation functions from oscillating SPDE models, for D0, 0:1, . . . , 1, on (b) R
2
and (b) S
2
, with
2
D12, D1
of the construction, which are given in Appendix C.4, also reveal the possibility of multivariate fields, similar to Gneiting et al. (2010).
For θ = 0, the regular Matérn covariance with ν = 2 − d/2 is recovered, with oscillations increasing with θ. The limiting case θ = 1 generates intrinsic stationary random fields, on R^d invariant to addition of cosine functions of arbitrary direction, with wave number κ.
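A quick numerical sanity check of the spectral density above: for θ = 0 the denominator factors as (κ² + |k|²)², recovering the α = 2 Matérn spectrum (a sketch in Python; the parameter values are arbitrary):

```python
import math

def osc_spectrum(k, kappa, theta, d=2):
    """Spectral density of the oscillating model (alpha = 2):
    R(k) = (2 pi)^-d {kappa^4 + 2 cos(pi theta) kappa^2 |k|^2 + |k|^4}^-1."""
    q = kappa**4 + 2 * math.cos(math.pi * theta) * kappa**2 * k**2 + k**4
    return (2 * math.pi) ** (-d) / q

def matern_spectrum(k, kappa, d=2):
    """Matern spectrum for alpha = 2: R(k) = (2 pi)^-d (kappa^2 + |k|^2)^-2."""
    return (2 * math.pi) ** (-d) * (kappa**2 + k**2) ** (-2)

# theta = 0 gives kappa^4 + 2 kappa^2 |k|^2 + |k|^4 = (kappa^2 + |k|^2)^2,
# so the two spectra coincide exactly
```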
3.4. Non-isotropic models and spatial deformations
The non-stationary model that was defined in Section 3.2 has locally isotropic correlations, despite having globally non-stationary correlations. This can be relaxed by widening the class of SPDEs considered, allowing a non-isotropic Laplacian, and also by including a directional derivative term. This also provides a link to the deformation method for non-stationary covariances that was introduced by Sampson and Guttorp (1992).
In the deformation method, the domain is deformed into a space where the field is stationary, resulting in a non-stationary covariance model in the original domain. Using the link to SPDE models, the resulting model can be interpreted as a non-stationary SPDE in the original domain.
For notational simplicity, assume that the deformation is between two d-manifolds, Ω ⊂ R^d and Ω̃ ⊂ R^d, with ũ = f(u), u ∈ Ω, ũ ∈ Ω̃. Restricting to the case α = 2, consider the stationary SPDE on the deformed space Ω̃,

(κ² − Δ̃) x̃(ũ) = W̃(ũ),    (15)
generating a stationary Matérn field. A change of variables onto the undeformed space yields (Smith, 1934)

(1/det{F(u)}) [κ² det{F(u)} − ∇ · (F(u) F(u)ᵀ / det{F(u)}) ∇] x(u) = det{F(u)}^{−1/2} W(u),    (16)

where F(u) is the Jacobian of the deformation function f. This non-stationary SPDE exactly reproduces the deformation method with Matérn covariances (Sampson and Guttorp, 1992). A sparse GMRF approximation can be constructed by using the same principles as for the simpler non-stationary model in Section 3.2.
An important remark is that the parameters of the resulting SPDE do not depend directly on the deformation function itself, but only on its Jacobian. A possible option for parameterizing the model without explicit construction of a deformation function is to control the major axis of the local deformation given by F(u) through a vector field, given either from covariate information or as a weighted sum of vector basis functions. Addition or subtraction of a directional derivative term further generalizes the model. Allowing all parameters, including the variance of the white noise, to vary across the domain results in a very general non-stationary model that includes both the deformation method and the model in Section 3.2. The model class can be interpreted as changes of metric in Riemannian manifolds, which is a natural generalization of deformation between domains embedded in Euclidean spaces. A full analysis is beyond the scope of this paper, but the technical appendices cover much of the necessary theory.
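A minimal sketch of the deformation idea, assuming a hypothetical one-dimensional deformation f(u) = arctan(u) and the ν = 1/2 (exponential) member of the Matérn family; neither choice comes from the paper:

```python
import math

def r_exp(h):
    """Stationary Matern correlation with nu = 1/2 and kappa = 1 (exponential),
    used as the stationary model in the deformed space."""
    return math.exp(-abs(h))

def f(u):
    """A hypothetical deformation function; only its Jacobian
    F(u) = f'(u) = 1/(1 + u^2) enters the equivalent SPDE."""
    return math.atan(u)

def deformed_corr(u, v):
    """Deformation method: a stationary correlation in f-coordinates
    induces a non-stationary correlation in the original coordinates."""
    return r_exp(f(v) - f(u))
```

The same coordinate separation |u − v| = 1 yields weaker correlation near the origin, where this particular deformation stretches distances, than far from it, where it compresses them.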
3.5. Non-separable space–time models
A separable space–time covariance function can be characterized as having a spectrum that can be written as a product or sum of spectra in only space or time. In contrast, a non-separable model can have interaction between the space and time dependence structures. Whereas it is difficult to construct non-separable non-stationary covariance functions explicitly, non-separable SPDE models can be obtained with relative ease, using locally specified parameters. Arguably, the most simple non-separable SPDE that can be applied to the GMRF method is the transport and diffusion equation

{∂/∂t + (κ² + m·∇ − ∇·H∇)} x(u, t) = E(u, t),

where m is a transport direction vector, H a diffusion matrix and E(u, t) a noise process.

The model parameters are collected in θ = {..., θ_s, θ_ε}, and we denote the yearly temperature fields x = {x_t} and the yearly observations y = {y_t},
with t = 1970, ..., 1989. Using basis function matrices, the prior distribution for the climate field μ is chosen as approximate solutions to the SPDE (κ² − Δ)μ(u) = W(u). The observations are modelled as

(y_t | x, θ) ∼ N(A_t x_t + S_t θ_s,  Q_{y|x,θ}⁻¹),

where S_t θ_s are station-specific effects and Q_{y|x,θ} = I exp(θ_ε) is the observation precision. Since we use the data only for illustrative purposes here, we shall ignore all station-specific effects except for elevation. We also ignore any remaining residual dependences between consecutive years, analysing only the marginal distribution properties of each year.
The Bayesian analysis draws all its conclusions from the properties of the posterior distributions π(θ|y) and π(x|y), so all uncertainty about the weather x_t is included in the distribution for the model parameters θ, and conversely for θ and x_t. One of the most important steps is how to determine the conditional distribution for the weather given observations and model parameters,
π(x_t | y_t, θ) = N{μ_{x|θ} + Q_{x|y,θ}⁻¹ A_tᵀ Q_{y|x,θ} (y_t − A_t μ_{x|θ} − S_t θ_s),  Q_{x|y,θ}⁻¹},

where Q_{x|y,θ} = Q_{x|θ} + A_tᵀ Q_{y|x,θ} A_t is the conditional precision, and the expectation is the kriging estimator of x_t. Owing to the compact support of the basis functions, which is determined by the triangulation, each observation depends on at most three neighbouring nodes in x_t, which makes the conditional precision have the same sparsity structure as the field precisions Q_{x|θ}.
The computational cost of the kriging estimates is O(n) in the number of observations, and approximately O(n^{3/2}) in the number of basis functions. If basis functions with non-compact support had been used, such as a Fourier basis, the posterior precisions would have been fully dense matrices, with computational cost O(n³) in the number of basis functions, regardless of the sparsity of the prior precisions. This shows that, when constructing computationally efficient models, it is not enough to consider the theoretical properties of the prior model; instead the whole sequence of computations needs to be taken into account.
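The sparsity argument can be illustrated with a toy computation; the tridiagonal prior, the observation weights and the precision value below are all arbitrary illustration values, not taken from the paper:

```python
def matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def nnz(M, tol=1e-12):
    return sum(abs(x) > tol for row in M for x in row)

n = 8
# tridiagonal prior precision Q_x, as for a 1-d GMRF
Qx = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
       for j in range(n)] for i in range(n)]
# each observation touches only 2 neighbouring nodes (compact basis support)
A = [[0.0] * n for _ in range(3)]
A[0][1], A[0][2] = 0.4, 0.6
A[1][4], A[1][5] = 0.7, 0.3
A[2][6], A[2][7] = 0.4, 0.6
tau = 4.0                                   # observation precision, tau * I
AtA = matmul(transpose(A), A)
Q_post = [[Qx[i][j] + tau * AtA[i][j] for j in range(n)] for i in range(n)]
# A^T Q_{y|x} A only couples pairs of nodes hit by a common observation,
# so the posterior precision keeps the sparsity pattern of the prior
```

With a non-compact (e.g. Fourier) basis, every row of A would be dense and AᵀA, hence the posterior precision, would fill in completely.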
4.3. Results
We implemented the model by using R-inla. Since π(x|y, θ) is Gaussian, the results are only approximate with regard to the numerical integration over the covariance parameters θ. Owing to the large size of the data set, this initial analysis is based on data only from the period 1970–1989, requiring 336 960 nodes in a joint model for the yearly temperature fields, measurements and linear covariate parameters, with 15 182 nodes in each field, and the number of observations in each year ranging between approximately 1300 and 1900, for each year including all stations with no missing monthly values. The full Bayesian analysis took about 1 h to compute on a 12-core computer, with a peak memory use of about 50 Gbytes during the parallel numerical integration phase. This is a notable improvement over earlier work by Das (2000), where partial estimation of the parameters in a deformation-based covariance model of the type in Section 3.4 took more than a week on a supercomputer.
The 95% credible interval for the measurement standard deviation, including local unmodelled effects, was calculated as (0.628, 0.650) °C, with posterior expectation 0.634 °C. The spatial covariance parameters are more difficult to interpret individually, but we instead show the resulting spatially varying field standard deviations and correlation ranges in Fig. 8, including pointwise 95% credible intervals. Both curves show a clear dependence on latitude, with both larger variance and correlation range near the poles, compared with the equator. The standard deviations range between 1.2 and 2.6 °C, and the correlation ranges vary between 1175 and 2825 km. There is an asymmetric north–south pole effect for the variances, but a symmetric curve is admissible in the credible intervals.
Evaluating the estimated climate and weather for a period of only 20 years is difficult, since climate is typically defined as averages over periods of 30 years. Also, the spherical harmonics that were used for the climate model are not of sufficiently high order to capture all regional effects. To alleviate these problems, we base the presentation on what can reasonably be called the empirical climate and weather anomalies for the period 1970–1989, in effect using the period average as reference. Thus, instead of evaluating the distributions of π(μ|y) and π(x_t|y), we consider π(x̄|y) and π(x_t − x̄|y), where x̄ = Σ_{t=1970}^{1989} x_t / 20. In Figs 9(a) and 9(b), the posterior expectation of the empirical climate, E(x̄|y), is shown (including the estimated effect of elevation), together with the posterior expectation of the temperature anomaly for 1980, E(x_{1980} − x̄|y). The corresponding standard deviations are shown in Figs 9(c) and 9(d). As expected, the temperatures are low near the poles and high near the equator, and some of the relative warming effect of the thermohaline circulation on the Alaska and northern European climates can also be seen. There is a clear effect of regional topography, showing cold areas for high elevations such as in the Himalayas, Andes and Rocky Mountains, as indicated by an
Fig. 9. Posterior means for (a) the empirical 1970–1989 climate and (b) the empirical mean anomaly 1980, with (c) and (d) the corresponding posterior standard deviations respectively: the climate includes the estimated effect of elevation; an area-preserving cylindrical projection is used
estimated cooling effect of 5.2 °C per kilometre of increased elevation. It is clear from Figs 9(c) and 9(d) that including ocean-based measurements is vital for analysis of regional ocean climate and weather, in particular for the south-east Pacific Ocean.
With this in mind, we might expect that the period of analysis and data coverage are too restricted to allow detection of global trends, especially since the simple model that we use a priori assumes a constant climate. However, the present analysis, including the effects of all parameter uncertainties, still yields a 95% Bayesian prediction interval (0.87, 2.18) °C per century (expectation 1.52 °C) for the global average temperature trend over the 20-year period analysed. The posterior standard deviation for each global average temperature anomaly was calculated to about 0.09 °C.
Since d = 1, we can write H = H₁₁ ≥ 0, and the elements on row i, around the diagonal, of the precision are given by

Q₁: s_i [−a_i   c_i   −b_i],
Q₂: s_i [a_i a_{i−1},   −a_i(c_{i−1} + c_i),   a_i b_{i−1} + c_i² + b_i a_{i+1},   −b_i(c_i + c_{i+1}),   b_i b_{i+1}],

where a_i = H/(s_i δ_{i−1}), b_i = H/(s_i δ_i) and c_i = κ² + a_i + b_i. If the spacing is regular, with s_i = δ_i = δ, then a = a_i = b_i = H/δ² and c = c_i = κ² + 2a. The special case α = 2 with θ = 0 and irregular spacing is a generalization of Lindgren and Rue (2008).
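For regular spacing, the pentadiagonal Q₂ stencil above can be cross-checked against Q₂ = Q₁ C̃⁻¹ Q₁ with C̃ = diag(s), i.e. s times the square of the tridiagonal Q₁ stencil. A sketch with arbitrary parameter values:

```python
def tridiag(n, lo, di, hi):
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = di
        if i > 0:
            M[i][i - 1] = lo
        if i < n - 1:
            M[i][i + 1] = hi
    return M

def matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

kappa2, H, delta = 2.0, 1.0, 0.5
s = delta                       # regular spacing
a = H / delta**2
c = kappa2 + 2 * a
n, i = 9, 4
T = tridiag(n, -a, c, -a)       # row of Q1 is s * [-a, c, -a]
Q2 = [[s * x for x in row] for row in matmul(T, T)]   # Q2 = s * T^2
row = Q2[i][i - 2:i + 3]        # interior pentadiagonal stencil
expected = [s * a * a, -2 * s * a * c, s * (2 * a * a + c * c),
            -2 * s * a * c, s * a * a]
```

The interior row of s T² is exactly s [a², −2ac, 2a² + c², −2ac, a²], matching the regular-spacing limit of the pentadiagonal stencil.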
For R², assume a given regular grid discretization, with horizontal (coordinate component 1) distances γ and vertical (coordinate component 2) distances δ. Let s = γδ, a = H₁₁/γ², b = H₂₂/δ² and c = κ² + 2a + 2b. The precision elements are then given by the corresponding two-dimensional stencils; for Q₁, the interior rows combine −a for the two horizontal neighbours, −b for the two vertical neighbours and c on the diagonal, all scaled by s. If the grid distances are proportional to the square root of the corresponding diagonal elements of H (such as in the isotropic case γ = δ and H₁₁ = H₂₂), the expressions simplify to s = γδ, a = b = H₁₁/γ² = H₂₂/δ² and c = κ² + 4a.
A.2. Triangulated domains
In this section, we derive explicit expressions for the building blocks for the precision matrices, for general triangulated domains with piecewise linear basis functions. For implementation of the theory in Appendix C, we need to calculate

C̃_{ii} = ⟨ψ_i, 1⟩_Ω,
C_{ij} = ⟨ψ_i, ψ_j⟩_Ω,
G_{ij} = ⟨∇ψ_i, ∇ψ_j⟩_Ω,
B_{ij} = ⟨ψ_i, ∂_n ψ_j⟩_{∂Ω}.    (19)
For 2-manifolds such as regions in R² or on S², we require a triangulation with a set of vertices v₁, ..., v_n, embedded in R³. Each vertex v_k is assigned a continuous piecewise linear basis function ψ_k with support on the triangles attached to v_k. To obtain explicit expressions for equation (19), we need to introduce some notation for the geometry of an arbitrary triangle. For notational convenience, we number the corner vertices
of a given triangle T = (v₀, v₁, v₂). The edge vectors opposite each corner are

e₀ = v₂ − v₁,
e₁ = v₀ − v₂,
e₂ = v₁ − v₀,

and the corner angles are θ₀, θ₁ and θ₂.
The triangle area |T| can be obtained from the formula |T| = ‖e₀ × e₁‖/2, i.e. half the length of the vector product in R³. The contributions from the triangle to the C̃ and C matrices are given by

[C̃_{i,i}(T)]_{i=0,1,2} = (|T|/3) (1  1  1),

[C_{i,j}(T)]_{i,j=0,1,2} = (|T|/12) [2 1 1; 1 2 1; 1 1 2].
The contribution to G_{0,1} from the triangle T is

G_{0,1}(T) = |T| (∇ψ₀)ᵀ(∇ψ₁) = −cot(θ₂)/2 = (1/(4|T|)) e₀ᵀe₁,

and the entire contribution from the triangle is

[G_{i,j}(T)]_{i,j=0,1,2} = (1/(4|T|)) [|e₀|² e₀ᵀe₁ e₀ᵀe₂; e₁ᵀe₀ |e₁|² e₁ᵀe₂; e₂ᵀe₀ e₂ᵀe₁ |e₂|²] = (1/(4|T|)) (e₀ e₁ e₂)ᵀ (e₀ e₁ e₂).
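The triangle contributions can be computed directly from the edge vectors; the sketch below uses a hypothetical right triangle and checks two structural properties: the rows of G sum to zero (because e₀ + e₁ + e₂ = 0) and the entries of C sum to the triangle area:

```python
import math

def sub(p, q): return [p[k] - q[k] for k in range(3)]
def dot(p, q): return sum(p[k] * q[k] for k in range(3))
def cross(p, q):
    return [p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0]]

# a triangle embedded in R^3, with edge e_k opposite corner v_k
v = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
e = [sub(v[2], v[1]), sub(v[0], v[2]), sub(v[1], v[0])]
nvec = cross(e[0], e[1])
area = 0.5 * math.sqrt(dot(nvec, nvec))         # |T| = |e0 x e1| / 2
G = [[dot(e[i], e[j]) / (4 * area) for j in range(3)] for i in range(3)]
C = [[area / 12 * (2.0 if i == j else 1.0) for j in range(3)] for i in range(3)]
Ct = [area / 3.0] * 3                           # lumped diagonal contributions
```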
For the boundary integrals in expression (19), the contribution from the triangle is

[B_{i,j}(T)]_{i,j=0,1,2} = (1/(4|T|)) [0 e₀ e₀; e₁ 0 e₁; e₂ e₂ 0]ᵀ [b₀I; b₁I; b₂I] (e₀ e₁ e₂),

where b_k = I(edge k in T lies on ∂Ω). Summing the contributions from all the triangles yields the complete C̃, C, G and B matrices.
For the anisotropic version, parameterized as in Appendix A.1 and Appendix C.4, the modified G matrix elements are given by

[G_{i,j}(T)]_{i,j=0,1,2} = (1/(4|T|)) (e₀ e₁ e₂)ᵀ adj(H) (e₀ e₁ e₂),    (20)

where adj(H) is the adjugate matrix of H, for non-singular matrices defined as det(H) H⁻¹.
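For the 2 × 2 case relevant to 2-manifolds, the adjugate can be written down explicitly; below is a small check that adj(H) H = det(H) I, with illustrative matrix values:

```python
def adj2(H):
    """Adjugate of a 2x2 matrix; equals det(H) * H^{-1} when H is invertible,
    but is defined (and cheap to compute) even when det(H) = 0."""
    (a, b), (c, d) = H
    return [[d, -b], [-c, a]]

H = [[2.0, 0.5], [0.5, 1.0]]
detH = H[0][0] * H[1][1] - H[0][1] * H[1][0]
A = adj2(H)
prod = [[A[i][0] * H[0][j] + A[i][1] * H[1][j] for j in range(2)]
        for i in range(2)]
# prod equals det(H) * I
```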
A.3. Non-stationary and oscillating models
For easy reference, we give specific precision matrix expressions for the case α = 2 for arbitrary triangulated manifold domains Ω. The stationary and simple oscillating models for α = 2 have precision matrices given by

Q₂(κ², θ) = κ⁴ C + 2κ² cos(πθ) G + G C⁻¹ G,    (21)

where θ = 0 corresponds to the regular Matérn case and 0 < θ < 1 are oscillating models. Using the approximation from expression (13), the non-stationary model (12) with α = 2 has precision matrix given by

Q₂{κ²(·), τ(·)} = τ (κ² C κ² + κ² G + G κ² + G C⁻¹ G) τ,    (22)

where κ² and τ are diagonal matrices, with κ²_{ii} = κ²(u_i) and τ_{ii} = τ(u_i). As shown in Appendix C.5, all the C should be replaced by C̃ to obtain a Markov model.
A.4. Neumann boundary effects
The effects on the covariance functions resulting from using Neumann boundary conditions can be explicitly expressed as a folding effect. When the full SPDE is

(κ² − Δ)^{α/2} x(u) = W(u),   u ∈ Ω,
∂_n (κ² − Δ)^j x(u) = 0,   u ∈ ∂Ω,   j = 0, 1, ..., (α − 1)/2,    (23)

the following theorem provides a direct answer, in terms of the Matérn covariance function.

Theorem 1. If x is a solution to the boundary value problem (23) for Ω = [0, L] and a positive integer α, then

cov{x(u), x(v)} = Σ_{k=−∞}^{∞} {r_M(u, v − 2kL) + r_M(u, 2kL − v)},

where r_M is the Matérn covariance as defined on the whole of R.

Theorem 1, which extends naturally to arbitrary generalized rectangles in R^d, is proved in Appendix D.1. In practice, when the effective range is small compared with L, only the three main terms need to be included for a very close approximation:

cov{x(u), x(v)} ≈ r_M(u, v) + r_M(u, −v) + r_M(u, 2L − v)    (24)
= r_M(0, v − u) + r_M(0, v + u) + r_M{0, 2L − (v + u)}.    (25)

Moreover, the resulting covariance is nearly indistinguishable from the stationary Matérn covariance at distances greater than twice the range away from the borders of the domain.
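Theorem 1 and the approximation (24)–(25) are easy to verify numerically; the ν = 1/2 (exponential) member of the Matérn family is used below only because it is cheap to evaluate, and all parameter values are arbitrary:

```python
import math

def r_m(h, kappa=1.0):
    """Matern covariance with nu = 1/2 (exponential), sigma^2 = 1."""
    return math.exp(-kappa * abs(h))

def folded_cov(u, v, L, K=50):
    """Theorem 1: covariance on [0, L] under Neumann boundary conditions,
    as a truncated folding sum over image terms."""
    return sum(r_m(v - u - 2 * k * L) + r_m(2 * k * L - v - u)
               for k in range(-K, K + 1))

def three_term(u, v, L):
    """Approximation (24)-(25): the three dominant folding terms."""
    return r_m(v - u) + r_m(v + u) + r_m(2 * L - (v + u))
```

Near the boundary the folded variance exceeds the stationary one, while well inside the domain the folded covariance is essentially the stationary Matérn covariance.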
A.5. Oscillating covariance functions
The covariances for the oscillating model can be calculated explicitly for R and R², from the spectrum. On R, complex analysis gives

r(u, v) = [1/{2κ³ sin(πθ)}] exp{−κ cos(πθ/2)|v − u|} sin{πθ/2 + κ sin(πθ/2)|v − u|},    (26)

which has variance {4κ³ cos(πθ/2)}⁻¹. On R², involved Bessel function integrals yield

r(u, v) = [1/{4πκ² sin(πθ) i}] [K₀{κ|v − u| exp(−iπθ/2)} − K₀{κ|v − u| exp(iπθ/2)}],    (27)

which has variance {4πκ² sinc(θ)}⁻¹.
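The stated variance can be checked against equation (26) as reconstructed here: since sin(πθ) = 2 sin(πθ/2) cos(πθ/2), the lag-0 value collapses to {4κ³ cos(πθ/2)}⁻¹. A numerical sketch with arbitrary parameter values:

```python
import math

def r_osc(h, kappa, theta):
    """Oscillating covariance on R, equation (26) as reconstructed here."""
    return (math.exp(-kappa * math.cos(math.pi * theta / 2) * abs(h))
            * math.sin(math.pi * theta / 2
                       + kappa * math.sin(math.pi * theta / 2) * abs(h))
            / (2 * kappa**3 * math.sin(math.pi * theta)))

def var_osc(kappa, theta):
    """The stated variance {4 kappa^3 cos(pi theta / 2)}^{-1}."""
    return 1.0 / (4 * kappa**3 * math.cos(math.pi * theta / 2))
```

For θ close to 1 the damped sine factor makes the covariance change sign with distance, which is the oscillation effect shown in Fig. 7.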
Appendix B: Manifolds, random fields and operator identities
B.1. Manifold calculus
To state concisely the theory needed for constructing solutions to SPDEs on more general spaces than R^d, we need to introduce some concepts from differential geometry and manifolds. A main point is that, loosely speaking, for statisticians who are familiar with measure theory and stochastic calculus on R^d, many of the familiar rules of calculus for random processes and fields still apply, as long as all expressions are defined in a coordinate-free manner. Here, we give a brief overview of the concepts that are used in the subsequent appendices. For more details on manifolds, differential calculus and geometric measure theory see for example Auslander and MacKenzie (1977), Federer (1978) and Krantz and Parks (2008).
Loosely, we say that a space Ω is a d-manifold if it locally behaves as R^d. We consider only manifolds with well-behaved boundaries, in the sense that the boundary ∂Ω of a manifold, if present, is required to be a piecewise smooth (d−1)-manifold. We also require the manifolds to be metric manifolds, so that distances between points and angles between vectors are well defined.
A bounded manifold has a finite maximal distance between points. If such a manifold is complete in the set sense, it is called compact. Finally, if the manifold is compact but has no boundary, it is closed. The most common metric manifolds are subsets of R^d equipped with the Euclidean metric. The prime
example of a closed manifold is the unit sphere S² embedded in R³. In Fourier analysis for images, the flat torus commonly appears, when considering periodic continuations of a rectangular region. Topologically, this is equivalent to a torus, but with a different metric compared with a torus that is embedded in R³. The d-dimensional hypercube [0, 1]^d is a compact manifold with a closed boundary.
From the metric that is associated with the manifold it is possible to define differential operators. Let φ denote a function φ: Ω → R. The gradient of φ at u is a vector ∇φ(u) defined indirectly via directional derivatives. In R^d with Euclidean metric, the gradient operator is formally given by the column vector ∇ = (∂/∂u₁, ..., ∂/∂u_d)ᵀ. The Laplacian of φ at u (or the Laplace–Beltrami operator) can be defined as the sum of the second-order directional derivatives, with respect to a local orthonormal basis, and is denoted Δφ(u) = ∇·∇φ(u). In Euclidean metric on R^d, we can write Δ = ∂²/∂u₁² + ... + ∂²/∂u_d². At the boundary of Ω, the vector n_{∂Ω}(u) denotes the unit length outward normal vector at the point u on the boundary ∂Ω. The normal derivative of a function φ is the directional derivative ∂_n φ(u) = n_{∂Ω}(u) · ∇φ(u).
An alternative to defining integration on general manifolds through mapping subsets into R^d is to replace Lebesgue integration with integrals defined through normalized Hausdorff measures (Federer, 1951, 1978), here denoted H^d_A = H^d(1_A), and the Hausdorff integral of a (measurable) function φ as H^d_Ω(φ). An inner product between scalar- or vector-valued functions φ and ψ is defined through

⟨φ, ψ⟩_Ω = H^d_Ω(φ·ψ) = ∫_{u∈Ω} φ(u) · ψ(u) H^d(du).

A function φ: Ω → R^m, m ≥ 1, is said to be square integrable if and only if ‖φ‖²_Ω = ⟨φ, φ⟩_Ω < ∞, which is denoted φ ∈ L₂(Ω).
A fundamental relationship, that corresponds to integration by parts for functions on R, is Green's first identity,

⟨φ, −Δψ⟩_Ω = ⟨∇φ, ∇ψ⟩_Ω − ⟨φ, ∂_n ψ⟩_{∂Ω}.

Typical statements of the identity require φ ∈ C¹(Ω) and ψ ∈ C²(Ω), but we shall relax these requirements considerably in lemma 1.
We also need to define Fourier transforms on general manifolds, where the usual cosine and sine functions do not exist.

Definition 1 (generalized Fourier representation). The Fourier transform pair for functions {φ ∈ L₂: R^d → R} is given by

φ̂(k) = (Fφ)(k) = (1/(2π)^d) ⟨φ(u), exp(ikᵀu)⟩_{R^d},
φ(u) = (F⁻¹φ̂)(u) = ⟨φ̂(k), exp(−ikᵀu)⟩_{R^d}.

(Here, we briefly abuse our notation by including complex functions in the inner products.)
If Ω is a compact manifold, a countable subset {E_k, k = 0, 1, 2, ...} of orthogonal and normalized eigenfunctions to the negated Laplacian, −ΔE_k = λ_k E_k, can be chosen as basis, and the Fourier representation for a function φ ∈ L₂: Ω → R is given by

φ̂(k) = (Fφ)(k) = ⟨φ, E_k⟩_Ω,
φ(u) = (F⁻¹φ̂)(u) = Σ_{k=0}^∞ φ̂(k) E_k(u).
Finally, we define a subspace of L₂-functions, with inner product adapted to the differential operators that we shall study in the remainder of this paper.

Definition 2. The Hilbert space H¹(Ω, κ), for a given κ ≥ 0, is the space of functions {φ: Ω → R} with ∇φ ∈ L₂(Ω), equipped with inner product

⟨φ, ψ⟩_{H¹(Ω,κ)} = κ² ⟨φ, ψ⟩_Ω + ⟨∇φ, ∇ψ⟩_Ω.
The inner product induces a norm, which is given by ‖φ‖_{H¹(Ω,κ)} = ⟨φ, φ⟩^{1/2}_{H¹(Ω,κ)}. The boundary case κ = 0 is also well defined, since ‖·‖_{H¹(Ω,0)} is a seminorm, and H¹(Ω, 0) is a space of equivalence classes of functions, that can be identified by functions with ⟨φ, 1⟩_Ω = 0.
Note that, for κ > 0, the norms are equivalent, and that the Hilbert space H¹ is a quintessential Sobolev space.
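For piecewise linear hat functions in one dimension, the H¹(Ω, κ) inner product of two neighbouring basis functions is exactly κ²⟨ψ_i, ψ_j⟩_Ω + ⟨∇ψ_i, ∇ψ_j⟩_Ω = κ²C_ij + G_ij, i.e. the entry K_ij of the finite element matrices used elsewhere in the appendices. A numerical sketch, with node locations and κ chosen arbitrarily:

```python
import math

def hat(u, c, h):
    """Piecewise linear hat function centred at c with spacing h."""
    return max(0.0, 1.0 - abs(u - c) / h)

def dhat(u, c, h):
    """Derivative of the hat function (piecewise constant)."""
    if c - h < u < c:
        return 1.0 / h
    if c < u < c + h:
        return -1.0 / h
    return 0.0

def quad(f, a, b, n=4000):
    w = (b - a) / n
    return sum(f(a + (k + 0.5) * w) for k in range(n)) * w

kappa, h = 1.5, 0.2
ci, cj = 1.0, 1.2                  # neighbouring nodes
L2_part = quad(lambda u: hat(u, ci, h) * hat(u, cj, h), ci - h, cj + h)
grad_part = quad(lambda u: dhat(u, ci, h) * dhat(u, cj, h), ci - h, cj + h)
H1_inner = kappa**2 * L2_part + grad_part
# exact values for neighbouring hats: <psi_i, psi_j> = h/6, <grad,grad> = -1/h
```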
B.2. Generalized Gaussian random fields
We now turn to the problem of characterizing random fields on Ω. We restrict ourselves to GFs that are at most as irregular as white noise. The distributions of such fields are determined by the properties of expectations and covariances of integrals of functions with respect to random measures: the so-called finite dimensional distributions.
In classical theory for GFs, the following definition can be used.

Definition 3 (GF). A random function x: Ω → R on a manifold Ω is a GF if {x(u_k), k = 1, ..., n} are jointly Gaussian random vectors for every finite set of points {u_k, k = 1, ..., n}. If there is a constant b ≥ 0 such that E{x(u)²} ≤ b for all u ∈ Ω, the random field has bounded second moments.

The complicating issue in dealing with the fractional SPDEs that are considered in this paper is that, for some parameter values, the solutions themselves are discontinuous everywhere, although still more regular than white noise. Thus, since the solutions do not necessarily have well-defined pointwise meaning, the above definition is not applicable, and the driving white noise itself is also not a regular random field. Inspired by Adler and Taylor (2007), we solve this by using a generalized definition based on generalized functions.

Definition 4 (generalized function). For a given function space F, an F-generalized function x: Ω → R is defined through an associated generating additive measure x*, with integrals ⟨f, x⟩_Ω defined for all x*-measurable functions f ∈ F.

Definition 5 (generalized GF). When the integrals ⟨f_i, x⟩_Ω, i = 1, ..., n, are jointly Gaussian for every finite set of functions f_i ∈ F, x is a generalized GF. If there is a constant b ≥ 0 such that E(⟨f, x⟩²_Ω) ≤ b‖f‖²_Ω for every f ∈ L₂(Ω), the generalized field x has L₂(Ω)-bounded second moments, abbreviated as L₂(Ω) bounded.
Of particular importance is the fact that white noise can be defined directly as a generalized GF.

Definition 6 (Gaussian white noise). Gaussian white noise W on a manifold Ω is an L₂(Ω)-bounded generalized GF such that, for any set of test functions {f_i ∈ L₂(Ω), i = 1, ..., n}, the integrals ⟨f_i, W⟩_Ω, i = 1, ..., n, are jointly Gaussian, with expectation and covariance measures given by

E(⟨f_i, W⟩_Ω) = 0,
cov(⟨f_i, W⟩_Ω, ⟨f_j, W⟩_Ω) = ⟨f_i, f_j⟩_Ω.
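A discrete caricature of definition 6 (our illustration, not the paper's construction): partition the domain into disjoint cells and assign each an independent N(0, area) integral of W; the covariance between integrals of indicator test functions is then the total area of the shared cells, i.e. |A ∩ B|, by construction:

```python
# cell -> area, a hypothetical partition of a unit-area domain
cells = {"c1": 0.5, "c2": 0.25, "c3": 0.25}
A = {"c1", "c2"}
B = {"c2", "c3"}

def cov_indicator_integrals(A, B):
    """cov(<1_A, W>, <1_B, W>) = sum of areas of cells shared by A and B,
    since integrals over disjoint cells are independent with variance = area."""
    return sum(area for cell, area in cells.items()
               if cell in A and cell in B)
```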
In particular, the covariance measure of W over two subregions A, B ⊆ Ω is equal to the area measure of their intersection, |A ∩ B|_Ω, so the variance measure of W over a region is equal to the area of the region. We note that the popular approach to defining white noise on R^d via a Brownian sheet is not applicable for general manifolds, since the notion of globally orthogonal directions is not present. The closest equivalent would be to define a set-indexed Gaussian random function W*(·), with cov{W*(A), W*(B)} = |A ∩ B|_Ω, which is much too restrictive to be applied to generalized functions and random fields.
B.3. Operator identities
Here, we present the two fundamental identities that are needed for the subsequent SPDE analysis: Green's first identity and a scalar product characterization of the half-Laplacian.
B.3.1. Stochastic Green's first identity
We here state a generalization of Green's first identity, showing that the identity applies to generalized fields, as opposed to only differentiable functions.

Lemma 1. If ∇f ∈ L₂(Ω) and Δx is L₂(Ω) bounded, then (with probability 1)

⟨f, −Δx⟩_Ω = ⟨∇f, ∇x⟩_Ω − ⟨f, ∂_n x⟩_{∂Ω}.

If ∇x is L₂(Ω) bounded and Δf ∈ L₂(Ω), then (with probability 1)

⟨∇x, ∇f⟩_Ω = ⟨x, −Δf⟩_Ω + ⟨x, ∂_n f⟩_{∂Ω}.
For brevity, we include only a sketch of the proof.

Proof. The requirements imply that each integrand can be approximated arbitrarily closely in the L₂ sense using C^q functions f̃ and x̃, where q in each case is sufficiently large for the regular Green's identity to hold for f̃ and x̃. Using the triangle inequality, it follows that the expectation of the squared difference between the left- and right-hand sides of the identity can be bounded by an arbitrarily small positive constant. Hence, the difference is zero in quadratic mean, and the identity holds with probability 1.
B.3.2. Half-Laplacian
In defining and solving the SPDEs considered, the half-Laplacian operator needs to be characterized in a way that permits practical calculations on general manifolds. The fractional modified Laplacian operators (κ² − Δ)^{α/2}, κ, α ≥ 0, are commonly (Samko et al. (1992), page 483) defined through the Fourier transform, as defined above:

{F(κ² − Δ)^{α/2} φ}(k) = (κ² + |k|²)^{α/2} (Fφ)(k)   on R^d;
{F(κ² − Δ)^{α/2} φ}(k) = (κ² + λ_k)^{α/2} (Fφ)(k)   on compact Ω,

where λ_k, k = 0, 1, 2, ..., are the eigenvalues of −Δ. The formal definition is mostly of theoretical interest since, in practice, the generalized Fourier basis and eigenvalues for the Laplacian are unknown. In addition, even if the functions are known, working directly in the Fourier basis is computationally expensive for general observation models, since the basis functions do not have compact support, which leads to dense covariance and precision matrices. The following lemma provides an integration identity that allows practical calculations involving the half-Laplacian.
Lemma 2. Let φ and ψ be functions in H¹(Ω, κ). Then, the Fourier-based modified half-Laplacians satisfy

⟨(κ² − Δ)^{1/2} φ, (κ² − Δ)^{1/2} ψ⟩_Ω = ⟨φ, ψ⟩_{H¹(Ω,κ)}

whenever either
(a) Ω = R^d,
(b) Ω is closed or
(c) Ω is compact and ⟨φ, ∂_n ψ⟩_{∂Ω} = ⟨∂_n φ, ψ⟩_{∂Ω} = 0.
For a proof, see Appendix D.2. Lemma 2 shows that, for functions fulfilling the requirements, we can use the Hilbert space inner product as a definition of the half-Laplacian. This also generalizes in a natural way to random fields x with L₂(Ω)-bounded ∇x, as well as to suitably well-behaved unbounded manifolds.
It would be tempting to eliminate the qualifiers in part (c) of lemma 2 by subtracting the average of the two boundary integrals from the relationship, and to extend lemma 2 to a complete equivalence relationship. However, the motivation may be problematic, since the half-Laplacian is defined for a wider class of functions than the Laplacian, and it is unclear whether such a generalization necessarily yields the same half-Laplacian as the Fourier definition for functions that are not of the class L₂(Ω). See Ilić et al. (2008) for a partial result.
Appendix C: Hilbert space approximation
We are now ready to formulate the main results of the paper in more technical detail. The idea is to approximate the full SPDE solutions with functions in finite Hilbert spaces, showing that the approximations converge to the true solutions as the finite Hilbert space approaches the full space. In Appendix C.1, we state the convergence and stochastic FEM definitions that are needed. The main result for Matérn covariance models is stated in Appendix C.2, followed by generalizations to intrinsic and oscillating fields in Appendix C.3 and Appendix C.4. Finally, the full finite element constructions are modified to Markov models in Appendix C.5.

C.1. Weak convergence and stochastic finite element methods
We start by stating formal definitions of convergence of Hilbert spaces and of random fields in such spaces (definitions 7 and 8) as well as the definition of the finite element constructions that will be used (definition 9).
Definition 7 (dense subspace sequences). A finite subspace H¹_n(Ω, κ) ⊂ H¹(Ω, κ) is spanned by a finite set of basis functions Ψ_n = {ψ₁, ..., ψ_n}. We say that a sequence of subspaces {H¹_n} is dense in H¹ if for every f ∈ H¹ there is a sequence {f_n}, f_n ∈ H¹_n, such that lim_{n→∞} ‖f − f_n‖_{H¹(Ω,κ)} = 0.
If the subspace sequence is nested, there is a monotonely convergent sequence {f
n
}, but that is not a
requirement here. For given H
1
n
, we can choose the projection of f H
1
onto H
1
n
, i.e. the f
n
that minimizes
|f f
n
|
H
1 . The error f f
n
is orthogonal to H
1
n
, and the basis co-ordinates can be determined via the
system of equations
k
, f
n
)
H
1
., /
=
k
, f)
H
1
., /
, for all k =1, . . . , n.
Definition 8 (weak convergence). A sequence of $L^2(\Omega)$-bounded generalized GFs $\{x_n\}$ is said to converge weakly to an $L^2(\Omega)$-bounded generalized GF $x$ if, for all $f, g \in L^2(\Omega)$,

$\mathrm{E}(\langle f, x_n \rangle_\Omega) \to \mathrm{E}(\langle f, x \rangle_\Omega)$,
$\mathrm{cov}(\langle f, x_n \rangle_\Omega, \langle g, x_n \rangle_\Omega) \to \mathrm{cov}(\langle f, x \rangle_\Omega, \langle g, x \rangle_\Omega)$,

as $n \to \infty$. We denote such convergence by

$x_n \xrightarrow{D(L^2(\Omega))} x$.
Definition 9 (finite element approximations). Let $\mathcal{L}$ be a second-order elliptic differential operator, and let $\mathcal{E}$ be a generalized GF on $\Omega$. Let $x_n = \sum_j \psi_j w_j \in H^1_n(\Omega)$ denote approximate weak solutions to the SPDE $\mathcal{L} x = \mathcal{E}$ on $\Omega$.

(a) The weak Galerkin solutions are given by Gaussian $w = \{w_1, \ldots, w_n\}$ such that

$\mathrm{E}(\langle f_n, \mathcal{L} x_n \rangle_\Omega) = \mathrm{E}(\langle f_n, \mathcal{E} \rangle_\Omega)$,
$\mathrm{cov}(\langle f_n, \mathcal{L} x_n \rangle_\Omega, \langle g_n, \mathcal{L} x_n \rangle_\Omega) = \mathrm{cov}(\langle f_n, \mathcal{E} \rangle_\Omega, \langle g_n, \mathcal{E} \rangle_\Omega)$

for every pair of test functions $f_n, g_n \in H^1_n(\Omega)$.

(b) The weak least squares solutions are given by Gaussian $w = \{w_1, \ldots, w_n\}$ such that

$\mathrm{E}(\langle \mathcal{L} f_n, \mathcal{L} x_n \rangle_\Omega) = \mathrm{E}(\langle \mathcal{L} f_n, \mathcal{E} \rangle_\Omega)$,
$\mathrm{cov}(\langle \mathcal{L} f_n, \mathcal{L} x_n \rangle_\Omega, \langle \mathcal{L} g_n, \mathcal{L} x_n \rangle_\Omega) = \mathrm{cov}(\langle \mathcal{L} f_n, \mathcal{E} \rangle_\Omega, \langle \mathcal{L} g_n, \mathcal{E} \rangle_\Omega)$

for every pair of test functions $f_n, g_n \in H^1_n(\Omega)$.
C.2. Basic Matérn-like cases
In the remainder of the appendices, we let $\mathcal{L} = (\kappa^2 - \Delta)$. In the classic Matérn case, the SPDE $\mathcal{L}^{\alpha/2} x = \mathcal{W}$ can, for integer $\alpha$-values, be unravelled into an iterative formulation

$\mathcal{L}^{1/2} y_1 = \mathcal{W}$,
$\mathcal{L} y_2 = \mathcal{W}$,
$\mathcal{L} y_k = y_{k-2}$, $k = 3, 4, \ldots, \alpha$.

For integers $\alpha = 1, 2, 3, \ldots$, $y_\alpha$ is a solution to the original SPDE.

Theorem 2 (finite element precisions). Define the matrices $C$, $G$ and $K$ through

$C_{i,j} = \langle \psi_i, \psi_j \rangle_\Omega$,
$G_{i,j} = \langle \nabla \psi_i, \nabla \psi_j \rangle_\Omega$,
$K = \kappa^2 C + G$,

and denote the distribution for $w$ with $\mathrm{N}(0, Q^{-1})$, where the precision matrix $Q$ is the inverse of the covariance matrix, and let $x_n = \sum_k \psi_k w_k$ be a weak $H^1_n(\Omega)$ approximation to $\mathcal{L}^{\alpha/2} x = \mathcal{E}$, $\mathcal{L} = (\kappa^2 - \Delta)$, with Neumann boundaries, $\partial_n \psi_k = 0$ on $\partial\Omega$.

(a) When $\alpha = 2$ and $\mathcal{E} = \mathcal{W}$, the weak Galerkin solution is obtained for $Q = K^{\mathrm{T}} C^{-1} K$.
(b) When $\alpha = 1$ and $\mathcal{E} = \mathcal{W}$, the weak least squares solution is obtained for $Q = K$.
(c) When $\alpha = 2$ and $\mathcal{E}$ is an $L^2(\Omega)$-bounded GF in $H^1_n(\Omega)$ with mean 0 and precision $Q_{\mathcal{E},n}$, the weak Galerkin solution is obtained for $Q = K^{\mathrm{T}} C^{-1} Q_{\mathcal{E},n} C^{-1} K$.
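The matrices in theorem 2 are easy to assemble on a concrete mesh, and identity (a) can be checked directly: with $\mathrm{cov}(w, w) = Q^{-1}$ and $Q = K^{\mathrm{T}} C^{-1} K$, the Galerkin matching condition requires $K Q^{-1} K^{\mathrm{T}} = C$. A sketch on a regular 1-D hat-function mesh ($n$, $\kappa$ and the interval are illustrative choices):

```python
import numpy as np

n, kappa = 40, 2.0
h = 1.0 / (n - 1)

# Hat-function mass matrix C and stiffness matrix G on a regular mesh of [0, 1]
C = np.zeros((n, n)); G = np.zeros((n, n))
for i in range(n - 1):
    C[i:i+2, i:i+2] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    G[i:i+2, i:i+2] += 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])

K = kappa**2 * C + G
Q_alpha1 = K                               # theorem 2(b): alpha = 1, least squares
Q_alpha2 = K.T @ np.linalg.solve(C, K)     # theorem 2(a): alpha = 2, Galerkin

# Check (a): K Q^{-1} K^T must reproduce the white-noise covariance matrix C
lhs = K @ np.linalg.solve(Q_alpha2, K.T)
print(np.max(np.abs(lhs - C)))             # ~ machine precision
```

Note that $Q_{\alpha=1}$ is sparse (tridiagonal here), whereas $Q_{\alpha=2}$ involves the dense $C^{-1}$; the Markov approximation of Appendix C.5 restores sparsity.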
Theorem 3 (convergence). Let $x$ be a weak solution to the SPDE $\mathcal{L}^{\alpha/2} x = \mathcal{W}$, $\mathcal{L} = (\kappa^2 - \Delta)$, with Neumann boundaries on a manifold $\Omega$, and let $x_n$ be a weak $H^1_n(\Omega)$ approximation, when $\mathcal{W}$ is Gaussian white noise. Then,

$x_n \xrightarrow{D(L^2(\Omega))} x$,  (28)
$\mathcal{L}^{\alpha/2} x_n \xrightarrow{D(L^2(\Omega))} \mathcal{L}^{\alpha/2} x$,  (29)

if the sequence $\{H^1_n(\Omega),\; n \to \infty\}$ is dense in $H^1(\Omega)$, and either

(a) $\alpha = 2$, and $x_n$ is the Galerkin solution, or
(b) $\alpha = 1$ and $x_n$ is the least squares solution.
Theorem 4 (iterative convergence). Let $y$ be a weak solution to the linear SPDE $\mathcal{L}_y y = \mathcal{E}$ on a manifold $\Omega$, for some $L^2(\Omega)$-bounded random field $\mathcal{E}$, and let $x$ be a weak solution to the SPDE $\mathcal{L}_y \mathcal{L} x = \mathcal{E}$, where $\mathcal{L} = \kappa^2 - \Delta$. Further, let $y_n$ be a weak $H^1_n(\Omega)$ approximation to $y$ such that

$y_n \xrightarrow{D(L^2(\Omega))} y$,  (30)

and let $x_n$ be the weak Galerkin solution in $H^1_n(\Omega)$ to the SPDE $\mathcal{L} x = y_n$ on $\Omega$. Then,

$x_n \xrightarrow{D(L^2(\Omega))} x$,  (31)
$\mathcal{L} x_n \xrightarrow{D(L^2(\Omega))} \mathcal{L} x$.  (32)
For proofs of the three theorems, see Appendix D.3.
C.3. Intrinsic cases
When $\kappa = 0$, the Hilbert space from definition 2 is a space of equivalence classes of functions, corresponding to SPDE solutions where arbitrary functions in the null space of $(-\Delta)^{\alpha/2}$ can be added. Such solution fields are known as intrinsic fields and have well-defined properties. With piecewise linear basis functions, the intrinsicness can be exactly reproduced for $\alpha = 1$ for all manifolds, and partially for $\alpha = 2$ on subsets of $\mathbb{R}^2$, by relaxing the boundary constraints to free boundaries. For larger $\alpha$ or more general manifolds, the intrinsicness will only be approximately represented. How to construct models with more fine-tuned control of the null space is a subject for further research.
To approximate intrinsic fields with $\alpha = 2$ and free boundaries, the matrix $K$ in theorem 2 should be replaced by $G - B$ (owing to Green's identity), where the elements of the (possibly asymmetric) boundary integral matrix $B$ are given by $B_{i,j} = \langle \psi_i, \partial_n \psi_j \rangle_{\partial\Omega}$. The formulations and proofs of theorem 3 and theorem 4 remain unchanged, but with the convergence defined only with respect to test functions $f$ and $g$ orthogonal to the null space of the linear SPDE operator.
The notion of non-null-space convergence allows us to formulate a simple proof of the result from Besag and Mondal (2005), which says that a first-order intrinsic conditional auto-regressive model on infinite lattices in $\mathbb{R}^2$ converges to the de Wijs process, which is an intrinsic generalized Gaussian random field. As can be seen in Appendix A.1, for $\alpha = 1$ and $\kappa = 0$, the $Q$-matrix (equal to $G$) for a triangulated regular grid matches the ordinary intrinsic first-order conditional auto-regressive model. The null space of the half-Laplacian consists of the constant functions. Choose non-trivial test functions $f$ and $g$ that integrate to 0 and apply theorem 3 and definition 8. This shows that the regular conditional auto-regressive model, seen as a Hilbert space representation with linear basis functions, converges to the de Wijs process, which is the special SPDE case $\alpha = 1$, $\kappa = 0$, in $\mathbb{R}^2$.
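The statement that $Q = G$ on a triangulated regular grid matches the intrinsic first-order conditional auto-regression can be verified by assembling the stiffness matrix directly. A sketch for a unit-spacing square lattice split into right triangles (the grid size and node numbering are illustrative assumptions); interior rows reproduce the familiar five-point stencil, with 4 on the diagonal and $-1$ on the four nearest neighbours:

```python
import numpy as np

m = 6                                   # m x m regular grid with unit spacing
pts = np.array([(i, j) for j in range(m) for i in range(m)], dtype=float)
idx = lambda i, j: j * m + i

# Split every unit square into two right triangles
tris = []
for j in range(m - 1):
    for i in range(m - 1):
        tris.append((idx(i, j), idx(i + 1, j), idx(i, j + 1)))
        tris.append((idx(i + 1, j), idx(i + 1, j + 1), idx(i, j + 1)))

# Assemble the stiffness matrix G_{kl} = <grad psi_k, grad psi_l>
G = np.zeros((m * m, m * m))
for (a, b, c) in tris:
    p = pts[[a, b, c]]
    J = np.array([p[1] - p[0], p[2] - p[0]]).T       # 2x2 element Jacobian
    area = abs(np.linalg.det(J)) / 2.0
    # gradients of the linear barycentric basis functions on this triangle
    grads = np.linalg.inv(J).T @ np.array([[-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]])
    for r, k in enumerate((a, b, c)):
        for s, l in enumerate((a, b, c)):
            G[k, l] += area * grads[:, r] @ grads[:, s]

# An interior node: its row matches the intrinsic first-order CAR precision
k = idx(3, 3)
row = {tuple(pts[l] - pts[k]): G[k, l]
       for l in range(m * m) if abs(G[k, l]) > 1e-12}
print(row)   # centre 4, nearest neighbours -1, diagonal neighbours absent
```

The rows sum to 0, reflecting the constant null space of the intrinsic model.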
C.4. Oscillating and non-isotropic cases
To construct the Hilbert space approximation for the oscillating model that was introduced in Section 3.3, as well as non-isotropic versions, we introduce a coupled system of SPDEs for $\alpha = 2$,

$\begin{bmatrix} h_1 - \nabla \cdot H_1 \nabla & -(h_2 - \nabla \cdot H_2 \nabla) \\ h_2 - \nabla \cdot H_2 \nabla & h_1 - \nabla \cdot H_1 \nabla \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \mathcal{E}_1 \\ \mathcal{E}_2 \end{bmatrix}$,  (33)

which is equivalent to the complex SPDE

$\{h_1 + \mathrm{i} h_2 - \nabla \cdot (H_1 + \mathrm{i} H_2) \nabla\}\{x_1(u) + \mathrm{i}\, x_2(u)\} = \mathcal{E}_1(u) + \mathrm{i}\, \mathcal{E}_2(u)$.  (34)

The model in Section 3.3 corresponds to $h_1 = \kappa^2 \cos(\pi\theta)$, $h_2 = \kappa^2 \sin(\pi\theta)$, $H_1 = I$ and $H_2 = 0$.
To solve the coupled SPDE system (33) we take a set $\{\psi_k,\; k = 1, \ldots, n\}$ of basis functions for $H^1_n(\Omega)$ and construct a basis for the solution space for $(x_1\; x_2)^{\mathrm{T}}$ as

$\begin{bmatrix} \psi_1 \\ 0 \end{bmatrix}, \ldots, \begin{bmatrix} \psi_n \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ \psi_1 \end{bmatrix}, \ldots, \begin{bmatrix} 0 \\ \psi_n \end{bmatrix}$.

The definitions of the $G$- and $K$-matrices are modified as follows:

$(G_k)_{i,j} = \langle H_k^{1/2} \nabla \psi_i, H_k^{1/2} \nabla \psi_j \rangle_\Omega$, $k = 1, 2$,
$K_k = h_k C + G_k$, $k = 1, 2$.

Using the same construction as in the regular case, the precision for the solutions is given by

$\begin{bmatrix} K_1 & -K_2 \\ K_2 & K_1 \end{bmatrix}^{\mathrm{T}} \begin{bmatrix} C & 0 \\ 0 & C \end{bmatrix}^{-1} \begin{bmatrix} Q_{\mathcal{E}} & 0 \\ 0 & Q_{\mathcal{E}} \end{bmatrix} \begin{bmatrix} C & 0 \\ 0 & C \end{bmatrix}^{-1} \begin{bmatrix} K_1 & -K_2 \\ K_2 & K_1 \end{bmatrix} = \begin{bmatrix} Q & 0 \\ 0 & Q \end{bmatrix}$,

where $Q = Q(h_1, H_1) + Q(h_2, H_2)$, and $Q(\cdot, \cdot)$ is the precision that is generated for the regular iterated model with the given parameters. Surprisingly, regardless of the choice of parameters, the solution components are independent.
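The block identity above, and the independence of the two solution components, can be checked numerically for the Section 3.3 parameterization ($H_1 = I$, $H_2 = 0$, so $G_1 = G$ and $G_2 = 0$). A sketch on a 1-D hat-function mesh with white-noise forcing, $Q_{\mathcal{E}} = C$ (the mesh and parameter values are illustrative choices):

```python
import numpy as np

n, kappa, theta = 30, 1.5, 0.4
h = 1.0 / (n - 1)
C = np.zeros((n, n)); G = np.zeros((n, n))
for i in range(n - 1):
    C[i:i+2, i:i+2] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    G[i:i+2, i:i+2] += 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])

# Oscillating model: h1 = kappa^2 cos(pi theta), h2 = kappa^2 sin(pi theta)
h1, h2 = kappa**2 * np.cos(np.pi * theta), kappa**2 * np.sin(np.pi * theta)
K1, K2 = h1 * C + G, h2 * C              # G1 = G, G2 = 0

Z = np.zeros((n, n))
Cinv = np.linalg.inv(C)
M  = np.block([[K1, -K2], [K2, K1]])
CB = np.block([[Cinv, Z], [Z, Cinv]])
QE = np.block([[C, Z], [Z, C]])          # white noise: Q_E = C

Qbig = M.T @ CB @ QE @ CB @ M
Q = K1.T @ Cinv @ K1 + K2.T @ Cinv @ K2  # Q(h1, H1) + Q(h2, H2)

print(np.max(np.abs(Qbig[:n, n:])))      # off-diagonal block ~ 0: independence
print(np.max(np.abs(Qbig[:n, :n] - Q)))  # diagonal block = sum of precisions
```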
C.5. Markov approximation
By choosing piecewise linear basis functions, the practical calculation of the matrix elements in the construction of the precision is straightforward, and the local support makes the basic matrices sparse. Since the basis functions are not orthogonal, the $C$-matrix will be non-diagonal, and therefore the FEM construction does not directly yield Markov fields for $\alpha \geq 2$, since $C^{-1}$ is not sparse. However, following standard practice in FEMs, $C$ can be approximated with a diagonal matrix as follows. Let $\tilde{C}$ be a diagonal matrix, with $\tilde{C}_{ii} = \sum_j C_{ij} = \langle \psi_i, 1 \rangle_\Omega$, and note that this preserves the interpretation of the matrix as an integration matrix. Substituting $C^{-1}$ with $\tilde{C}^{-1}$ yields a Markov approximation to the FEM solution.
The convergence rate for the Markov approximation is the same as for the full FEM model, which can be shown by adapting the details of the proofs of convergence. Let $f$ and $g$ be test functions in $H^1(\Omega)$ and let $f_n$ and $g_n$ be their projections onto $H^1_n(\Omega)$, with basis weights $w_f$ and $w_g$. Taking the difference between the covariances for the Markov solution ($\tilde{x}_n$) and the full FEM solution ($x_n$) for $\alpha = 2$ yields the error

$\mathrm{cov}(\langle f, \mathcal{L} \tilde{x}_n \rangle_\Omega, \langle g, \mathcal{L} \tilde{x}_n \rangle_\Omega) - \mathrm{cov}(\langle f, \mathcal{L} x_n \rangle_\Omega, \langle g, \mathcal{L} x_n \rangle_\Omega) = w_f^{\mathrm{T}} (\tilde{C} - C) w_g$.

Requiring $\|f\|_{H^1(\Omega)}, \|g\|_{H^1(\Omega)} \leq 1$, it follows from lemma 1 in Chen and Thomée (1985) that the covariance error is bounded by $c h^2$, where $c$ is some constant and $h$ is the diameter of the largest circle that can be inscribed in a triangle of the triangulation. This shows that the convergence rate from expression (11) will not be affected by the Markov approximation. In practice, the $C$ matrix in $K$ should also be replaced by $\tilde{C}$. This improves the approximation when either $h$ or $\kappa$ is large, with numerical comparisons showing a covariance error reduction of as much as a factor 3. See Bolin and Lindgren (2009) for a comparison of the resulting kriging errors for various methods, showing negligible differences between the exact FEM representation and the Markov approximation.
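A sketch of the lumping step on a 1-D hat-function mesh, contrasting the non-sparse exact precision $K^{\mathrm{T}} C^{-1} K$ with the banded Markov approximation $\tilde{K}^{\mathrm{T}} \tilde{C}^{-1} \tilde{K}$, where $\tilde{K}$ also uses the lumped $\tilde{C}$ as recommended above (the mesh size and $\kappa$ are illustrative choices):

```python
import numpy as np

n, kappa = 50, 2.0
h = 1.0 / (n - 1)
C = np.zeros((n, n)); G = np.zeros((n, n))
for i in range(n - 1):
    C[i:i+2, i:i+2] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    G[i:i+2, i:i+2] += 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])

Ct = np.diag(C.sum(axis=1))                  # lumped diagonal C~: row sums of C

Kf = kappa**2 * C + G                        # exact K
Q_exact = Kf.T @ np.linalg.solve(C, Kf)      # alpha = 2: K^T C^{-1} K, not banded

Kt = kappa**2 * Ct + G                       # C replaced by C~ in K as well
Q_markov = Kt.T @ np.linalg.inv(Ct) @ Kt     # banded: a Markov precision

bandwidth = lambda Q: max(abs(i - j) for i in range(n) for j in range(n)
                          if abs(Q[i, j]) > 1e-9)
print(bandwidth(Q_markov), bandwidth(Q_exact))
```

The lumped precision couples each node only to its second-order mesh neighbours, which is exactly the sparsity that makes GMRF computations fast.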
Appendix D: Proofs
D.1. Folded covariance: proof of theorem 1
Writing the covariance of the SPDE solutions on the interval $\Omega = [0, L] \subset \mathbb{R}$ in terms of the spectral representation gives an infinite series,

$\mathrm{cov}\{x(u), x(v)\} = \lambda_0 + \sum_{k=1}^{\infty} \lambda_k \cos(\pi k u / L) \cos(\pi k v / L)$,  (35)

where $\lambda_0 = (\kappa^{2\alpha} L)^{-1}$ and $\lambda_k = 2 L^{-1} \{\kappa^2 + (\pi k / L)^2\}^{-\alpha}$, and the Matérn covariance has the spectral representation

$r_M(u, v) = \frac{1}{2\pi} \int_{-\infty}^{\infty} (\kappa^2 + \omega^2)^{-\alpha} \cos\{\omega (v - u)\}\, \mathrm{d}\omega$.

Thus, with $r(u, v)$ denoting the folded covariance in the statement of theorem 1,

$r(u, v) = \sum_{k=-\infty}^{\infty} \{r_M(u, v - 2kL) + r_M(u, 2kL - v)\}$
$= \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \int_{-\infty}^{\infty} (\kappa^2 + \omega^2)^{-\alpha} [\cos\{\omega(v - u - 2kL)\} + \cos\{\omega(v + u - 2kL)\}]\, \mathrm{d}\omega$
$= \frac{1}{2\pi} \int_{-\infty}^{\infty} (\kappa^2 + \omega^2)^{-\alpha} \sum_{k=-\infty}^{\infty} [\cos\{\omega(v - u - 2kL)\} + \cos\{\omega(v + u - 2kL)\}]\, \mathrm{d}\omega$.

Rewriting the cosines via Euler's formulae, we obtain

$\sum_{k=-\infty}^{\infty} [\cos\{\omega(v - u - 2kL)\} + \cos\{\omega(v + u - 2kL)\}]$
$= \frac{1}{2} \sum_{k=-\infty}^{\infty} \{\exp(\mathrm{i}\omega u) + \exp(-\mathrm{i}\omega u)\}[\exp\{\mathrm{i}\omega(v - 2kL)\} + \exp\{-\mathrm{i}\omega(v - 2kL)\}]$
$= \cos(\omega u) \Big\{ \exp(\mathrm{i}\omega v) \sum_{k=-\infty}^{\infty} \exp(-2\mathrm{i}k\omega L) + \exp(-\mathrm{i}\omega v) \sum_{k=-\infty}^{\infty} \exp(2\mathrm{i}k\omega L) \Big\}$
$= 2\pi \cos(\omega u) \{\exp(\mathrm{i}\omega v) + \exp(-\mathrm{i}\omega v)\} \sum_{k=-\infty}^{\infty} \delta(2\omega L - 2\pi k)$
$= \frac{2\pi}{L} \cos(\omega u) \cos(\omega v) \sum_{k=-\infty}^{\infty} \delta\Big(\omega - \frac{\pi k}{L}\Big)$,

where we used the Dirac measure representation

$\sum_{k=-\infty}^{\infty} \exp(\mathrm{i}ks) = 2\pi \sum_{k=-\infty}^{\infty} \delta(s - 2\pi k)$.

Finally, combining the results yields

$r(u, v) = \frac{1}{L} \int_{-\infty}^{\infty} (\kappa^2 + \omega^2)^{-\alpha} \cos(\omega u) \cos(\omega v) \sum_{k=-\infty}^{\infty} \delta\Big(\omega - \frac{\pi k}{L}\Big)\, \mathrm{d}\omega$
$= \frac{1}{L} \sum_{k=-\infty}^{\infty} \Big\{\kappa^2 + \Big(\frac{\pi k}{L}\Big)^2\Big\}^{-\alpha} \cos\Big(\frac{\pi k u}{L}\Big) \cos\Big(\frac{\pi k v}{L}\Big)$
$= \frac{1}{\kappa^{2\alpha} L} + \frac{2}{L} \sum_{k=1}^{\infty} \Big\{\kappa^2 + \Big(\frac{\pi k}{L}\Big)^2\Big\}^{-\alpha} \cos\Big(\frac{\pi k u}{L}\Big) \cos\Big(\frac{\pi k v}{L}\Big)$,

which is precisely the expression sought in equation (35).
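Theorem 1 can be checked numerically. For $\alpha = 2$ in $d = 1$, the spectral density $(2\pi)^{-1}(\kappa^2 + \omega^2)^{-2}$ corresponds to the closed-form Matérn covariance $r_M(h) = (1 + \kappa|h|) \exp(-\kappa|h|) / (4\kappa^3)$, and the folded sum should agree with the series (35). A minimal sketch ($\kappa$, $L$ and the evaluation points are arbitrary choices):

```python
import numpy as np

kappa, L, alpha = 2.0, 1.0, 2
u, v = 0.3, 0.7

# 1-D Matern covariance for alpha = 2 (nu = 3/2), matching the spectral
# density (1/2pi) (kappa^2 + w^2)^(-2)
def r_M(d):
    d = abs(d)
    return (1.0 + kappa * d) * np.exp(-kappa * d) / (4.0 * kappa**3)

# Folded covariance: the mirrored sum in the statement of theorem 1
folded = sum(r_M(v - u - 2 * k * L) + r_M(2 * k * L - v - u)
             for k in range(-50, 51))

# Spectral series (35), truncated far into its (fast-decaying) tail
k = np.arange(1, 20001)
lam0 = 1.0 / (kappa**(2 * alpha) * L)
lam = (2.0 / L) * (kappa**2 + (np.pi * k / L)**2)**(-alpha)
series = lam0 + np.sum(lam * np.cos(np.pi * k * u / L) * np.cos(np.pi * k * v / L))

print(folded, series)   # the two values agree
```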
D.2. Modified half-Laplacian equivalence: proof of lemma 2
For brevity, we present only the proof for compact manifolds, as the proof for $\Omega = \mathbb{R}^d$ follows the same principle but without the boundary complications. The main difference is that the Fourier representation is discrete for compact manifolds and continuous for $\mathbb{R}^d$.
Let $\lambda_k \geq 0$, $k = 0, 1, 2, \ldots$, be the eigenvalue corresponding to eigenfunction $E_k$ of $-\Delta$ (definition 1). Then, with $\hat{\phi}(k) = (\mathcal{F}\phi)(k)$, the modified half-Laplacian from Appendix B.3.2 is defined through $\mathcal{F}\{(\kappa^2 - \Delta)^{1/2} \phi\}(k) = (\kappa^2 + \lambda_k)^{1/2} \hat{\phi}(k)$, and we obtain

$\langle (\kappa^2 - \Delta)^{1/2} \phi, (\kappa^2 - \Delta)^{1/2} \psi \rangle_\Omega = \Big\langle \sum_{k=0}^{\infty} (\kappa^2 + \lambda_k)^{1/2} \hat{\phi}(k) E_k, \sum_{k'=0}^{\infty} (\kappa^2 + \lambda_{k'})^{1/2} \hat{\psi}(k') E_{k'} \Big\rangle_\Omega$,

and, since $\phi, \psi \in H^1(\Omega)$, we can change the order of integration and summation,

$\langle (\kappa^2 - \Delta)^{1/2} \phi, (\kappa^2 - \Delta)^{1/2} \psi \rangle_\Omega = \sum_{k=0}^{\infty} (\kappa^2 + \lambda_k) \hat{\phi}(k) \hat{\psi}(k)$,

since the eigenfunctions $E_k$ and $E_{k'}$ are orthonormal.
Now, starting from the Hilbert space inner product,

$\langle \phi, \psi \rangle_{H^1(\Omega)} = \kappa^2 \langle \phi, \psi \rangle_\Omega + \langle \nabla \phi, \nabla \psi \rangle_\Omega$
$= \kappa^2 \Big\langle \sum_{k=0}^{\infty} \hat{\phi}(k) E_k, \sum_{k'=0}^{\infty} \hat{\psi}(k') E_{k'} \Big\rangle_\Omega + \Big\langle \nabla \sum_{k=0}^{\infty} \hat{\phi}(k) E_k, \nabla \sum_{k'=0}^{\infty} \hat{\psi}(k') E_{k'} \Big\rangle_\Omega$
$= \kappa^2 \sum_{k=0}^{\infty} \sum_{k'=0}^{\infty} \hat{\phi}(k) \hat{\psi}(k') \langle E_k, E_{k'} \rangle_\Omega + \sum_{k=0}^{\infty} \sum_{k'=0}^{\infty} \hat{\phi}(k) \hat{\psi}(k') \langle \nabla E_k, \nabla E_{k'} \rangle_\Omega$.

Further, Green's identity for $\langle \nabla E_k, \nabla E_{k'} \rangle_\Omega$ yields

$\langle \nabla E_k, \nabla E_{k'} \rangle_\Omega = -\langle E_k, \Delta E_{k'} \rangle_\Omega + \langle E_k, \partial_n E_{k'} \rangle_{\partial\Omega} = \lambda_{k'} \langle E_k, E_{k'} \rangle_\Omega + \langle E_k, \partial_n E_{k'} \rangle_{\partial\Omega}$.

Since $\nabla\phi, \nabla\psi \in L^2(\Omega)$ we can change the order of summation, integration and differentiation for the boundary integrals,

$\sum_{k=0}^{\infty} \sum_{k'=0}^{\infty} \hat{\phi}(k) \hat{\psi}(k') \langle E_k, \partial_n E_{k'} \rangle_{\partial\Omega} = \langle \phi, \partial_n \psi \rangle_{\partial\Omega}$.

By the boundary requirements in lemma 2, whenever Green's identity holds, the boundary integral vanishes, either because the boundary is empty (if the manifold is closed), or because the integrand is 0, so collecting all the terms we obtain

$\langle \phi, \psi \rangle_{H^1(\Omega)} = \sum_{k=0}^{\infty} \sum_{k'=0}^{\infty} (\kappa^2 + \lambda_{k'}) \hat{\phi}(k) \hat{\psi}(k') \langle E_k, E_{k'} \rangle_\Omega = \sum_{k=0}^{\infty} (\kappa^2 + \lambda_k) \hat{\phi}(k) \hat{\psi}(k)$,

and the proof is complete.
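On a closed manifold the boundary term is absent, and the identity $\langle \phi, \psi \rangle_{H^1(\Omega)} = \sum_k (\kappa^2 + \lambda_k) \hat{\phi}(k) \hat{\psi}(k)$ can be checked directly. A sketch on the unit circle, where the eigenfunctions are $\cos(ku)$ and $\sin(ku)$ with eigenvalue $\lambda = k^2$ and squared norm $\pi$ (the trigonometric test functions and $\kappa$ are arbitrary choices):

```python
import numpy as np

kappa = 1.3
u = np.linspace(0.0, 2 * np.pi, 200001)

phi  = 2 * np.cos(2 * u) + np.sin(3 * u)       # coefficients 2 on cos2u, 1 on sin3u
psi  =     np.cos(2 * u) + 4 * np.sin(3 * u)   # coefficients 1 on cos2u, 4 on sin3u
dphi = -4 * np.sin(2 * u) + 3 * np.cos(3 * u)
dpsi = -2 * np.sin(2 * u) + 12 * np.cos(3 * u)

integ = lambda y: np.sum((y[1:] + y[:-1]) / 2 * np.diff(u))   # trapezoidal rule

# Left-hand side: <phi, psi>_{H^1} = kappa^2 <phi, psi> + <phi', psi'>
lhs = kappa**2 * integ(phi * psi) + integ(dphi * dpsi)

# Right-hand side: sum over eigenpairs, lambda = k^2, norm pi on the circle
rhs = (kappa**2 + 2**2) * np.pi * (2 * 1) + (kappa**2 + 3**2) * np.pi * (1 * 4)

print(lhs, rhs)   # equal up to quadrature error
```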
D.3. Hilbert space convergence
D.3.1. Proof of theorem 2 (finite element precisions)
The proofs for theorem 2 are straightforward applications of the definitions. Let $w_f$ and $w_g$ be the Hilbert space co-ordinates of two test functions $f_n, g_n \in H^1_n(\Omega)$, and let $\mathcal{L} = (\kappa^2 - \Delta)$.
For case (a), $\alpha = 2$ and $\mathcal{E} = \mathcal{W}$, so

$\langle f_n, \mathcal{L} x_n \rangle_\Omega = \sum_{i,j} w_{f,i} \langle \psi_i, \mathcal{L} \psi_j \rangle_\Omega w_j = \sum_{i,j} w_{f,i} (\kappa^2 C_{i,j} + G_{i,j}) w_j = w_f^{\mathrm{T}} K w$

owing to Green's identity, and

$\mathrm{cov}(\langle f_n, \mathcal{L} x_n \rangle_\Omega, \langle g_n, \mathcal{L} x_n \rangle_\Omega) = w_f^{\mathrm{T}} K \,\mathrm{cov}(w, w)\, K^{\mathrm{T}} w_g$.

This covariance is equal to

$\mathrm{cov}(\langle f_n, \mathcal{W} \rangle_\Omega, \langle g_n, \mathcal{W} \rangle_\Omega) = \langle f_n, g_n \rangle_\Omega = \sum_{i,j} w_{f,i} \langle \psi_i, \psi_j \rangle_\Omega w_{g,j} = \sum_{i,j} w_{f,i} C_{i,j} w_{g,j} = w_f^{\mathrm{T}} C w_g$

for every pair of test functions $f_n, g_n$ when $Q = \mathrm{cov}(w, w)^{-1} = K^{\mathrm{T}} C^{-1} K$.
For case (b), $\alpha = 1$ and $\mathcal{E} = \mathcal{W}$. Using the same technique as in (a), but with lemma 2 instead of Green's identity, $\langle \mathcal{L}^{1/2} f_n, \mathcal{L}^{1/2} x_n \rangle_\Omega = \langle f_n, x_n \rangle_{H^1(\Omega)} = w_f^{\mathrm{T}} K w$ and

$\mathrm{cov}(\langle \mathcal{L}^{1/2} f_n, \mathcal{W} \rangle_\Omega, \langle \mathcal{L}^{1/2} g_n, \mathcal{W} \rangle_\Omega) = \langle \mathcal{L}^{1/2} f_n, \mathcal{L}^{1/2} g_n \rangle_\Omega = \langle f_n, g_n \rangle_{H^1(\Omega)} = w_f^{\mathrm{T}} K w_g$,

so $Q = K^{\mathrm{T}} K^{-1} K = K$, noting that $K$ is a symmetric matrix since both $C$ and $G$ are symmetric.
Finally, for case (c), $\alpha = 2$ and $\mathcal{E} = \mathcal{E}_n$ is a GF on $H^1_n(\Omega)$ with precision $Q_{\mathcal{E},n}$. Using the same technique as for (a),

$\mathrm{cov}(\langle f_n, \mathcal{L} x_n \rangle_\Omega, \langle g_n, \mathcal{L} x_n \rangle_\Omega) = w_f^{\mathrm{T}} K \,\mathrm{cov}(w, w)\, K^{\mathrm{T}} w_g$,

and the finite basis representation of the noise $\mathcal{E}_n$ gives

$\mathrm{cov}(\langle f_n, \mathcal{E}_n \rangle_\Omega, \langle g_n, \mathcal{E}_n \rangle_\Omega) = w_f^{\mathrm{T}} C Q_{\mathcal{E},n}^{-1} C w_g$.

Requiring equality for all pairs of test functions yields $Q = K^{\mathrm{T}} C^{-1} Q_{\mathcal{E},n} C^{-1} K$. Here, keeping the transposes allows the proof to apply also to the intrinsic free-boundary cases.
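Case (b) can also be illustrated by simulation: sampling $w \sim \mathrm{N}(0, K^{-1})$ and evaluating $\langle f_n, x_n \rangle_{H^1(\Omega)} = w_f^{\mathrm{T}} K w$ per sample, the empirical covariance should approach $w_f^{\mathrm{T}} K w_g$. A Monte Carlo sketch on a 1-D hat-function mesh (the sample size, mesh and test weight vectors are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, kappa = 25, 2.0
h = 1.0 / (n - 1)
C = np.zeros((n, n)); G = np.zeros((n, n))
for i in range(n - 1):
    C[i:i+2, i:i+2] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    G[i:i+2, i:i+2] += 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
K = kappa**2 * C + G                       # alpha = 1: Q = K

# Sample w ~ N(0, K^{-1}) through the Cholesky factor K = Lc Lc^T
Lc = np.linalg.cholesky(K)
N = 100000
w = np.linalg.solve(Lc.T, rng.standard_normal((n, N)))

# Fixed test-function weight vectors (illustrative)
wf = np.sin(np.linspace(0.0, np.pi, n))
wg = np.cos(np.linspace(0.0, np.pi, n))

sf = wf @ K @ w                            # <f_n, x_n>_{H^1} = w_f^T K w per sample
sg = wg @ K @ w
emp = np.cov(sf, sg)[0, 1]
theory = wf @ K @ wg                       # exact covariance when Q = K
print(emp, theory)                         # close for large N
```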
D.3.2. Proof of theorem 3 (convergence)
First, we show that expression (28) follows from expression (29). Let $\mathcal{L} = (\kappa^2 - \Delta)$, let $f$ and $g$ be functions in $H^1(\Omega)$, and let $\tilde{f}$ be the solution to the PDE

$\mathcal{L} \tilde{f}(u) = f(u)$, $u \in \Omega$,
$\partial_n \tilde{f}(u) = 0$, $u \in \partial\Omega$,

and correspondingly for $\tilde{g}$. Then $\tilde{f}$ and $\tilde{g}$ are in $H^1(\Omega)$ and further fulfil the requirements of lemma 1 and lemma 2. Therefore,

$\langle f, x_n \rangle_\Omega = \langle \mathcal{L} \tilde{f}, x_n \rangle_\Omega = \langle \tilde{f}, x_n \rangle_{H^1(\Omega)} = \langle \tilde{f}, \mathcal{L} x_n \rangle_\Omega$,

and

$\langle f, x \rangle_\Omega = \langle \mathcal{L} \tilde{f}, x \rangle_\Omega = \langle \tilde{f}, x \rangle_{H^1(\Omega)} = \langle \tilde{f}, \mathcal{L} x \rangle_\Omega$,

where the last equality holds when $\alpha = 2$, since $\mathcal{W}$ is $L^2(\Omega)$ bounded. The convergence of $x_n$ to $x$ follows from expression (29). In the Galerkin case (a), we have

$\mathrm{cov}(\langle f, x_n \rangle_\Omega, \langle g, x_n \rangle_\Omega) = \mathrm{cov}(\langle \tilde{f}, \mathcal{L} x_n \rangle_\Omega, \langle \tilde{g}, \mathcal{L} x_n \rangle_\Omega)$
$\to \mathrm{cov}(\langle \tilde{f}, \mathcal{L} x \rangle_\Omega, \langle \tilde{g}, \mathcal{L} x \rangle_\Omega) = \mathrm{cov}(\langle f, x \rangle_\Omega, \langle g, x \rangle_\Omega)$,

and similarly for the least squares case (b).
For expression (29), let $f_n = \sum_k \psi_k w_{f,k}$ and $g_n = \sum_k \psi_k w_{g,k}$ be the orthogonal projections of $f$ and $g$ onto $H^1_n(\Omega)$. In case (a), then

$\langle f, \mathcal{L} x_n \rangle_\Omega = \langle f, x_n \rangle_{H^1(\Omega)} = \langle f - f_n, x_n \rangle_{H^1(\Omega)} + \langle f_n, x_n \rangle_{H^1(\Omega)} = \langle f_n, x_n \rangle_{H^1(\Omega)}$,

and

$\mathrm{cov}(\langle f, \mathcal{L} x_n \rangle_\Omega, \langle g, \mathcal{L} x_n \rangle_\Omega) = \mathrm{cov}(\langle f_n, x_n \rangle_{H^1(\Omega)}, \langle g_n, x_n \rangle_{H^1(\Omega)})$
$= \mathrm{cov}(\langle f_n, \mathcal{W} \rangle_\Omega, \langle g_n, \mathcal{W} \rangle_\Omega) = \langle f_n, g_n \rangle_\Omega \to \langle f, g \rangle_\Omega = \mathrm{cov}(\langle f, \mathcal{W} \rangle_\Omega, \langle g, \mathcal{W} \rangle_\Omega)$

as $n \to \infty$. Similarly in case (b), for any $f \in H^1(\Omega)$ fulfilling the requirements of lemma 2,

$\langle \mathcal{L}^{1/2} f, \mathcal{L}^{1/2} x_n \rangle_\Omega = \langle f, x_n \rangle_{H^1(\Omega)} = \langle f_n, x_n \rangle_{H^1(\Omega)}$,

and

$\mathrm{cov}(\langle \mathcal{L}^{1/2} f, \mathcal{L}^{1/2} x_n \rangle_\Omega, \langle \mathcal{L}^{1/2} g, \mathcal{L}^{1/2} x_n \rangle_\Omega) = \mathrm{cov}(\langle f_n, x_n \rangle_{H^1(\Omega)}, \langle g_n, x_n \rangle_{H^1(\Omega)})$
$= \mathrm{cov}(\langle \mathcal{L}^{1/2} f_n, \mathcal{W} \rangle_\Omega, \langle \mathcal{L}^{1/2} g_n, \mathcal{W} \rangle_\Omega) = \langle f_n, g_n \rangle_{H^1(\Omega)}$
$\to \langle f, g \rangle_{H^1(\Omega)} = \langle \mathcal{L}^{1/2} f, \mathcal{L}^{1/2} g \rangle_\Omega = \mathrm{cov}(\langle \mathcal{L}^{1/2} f, \mathcal{W} \rangle_\Omega, \langle \mathcal{L}^{1/2} g, \mathcal{W} \rangle_\Omega)$

as $n \to \infty$.
D.3.3. Proof of theorem 4 (iterative convergence)
First, we show that expression (31) follows from expression (32). Let $\tilde{f}$ and $\tilde{g}$ be defined as in the proof of theorem 3. Then, since $\mathcal{L} = \kappa^2 - \Delta$,

$\langle f, x_n \rangle_\Omega = \langle \tilde{f}, \mathcal{L} x_n \rangle_\Omega$ and $\langle f, x \rangle_\Omega = \langle \tilde{f}, \mathcal{L} x \rangle_\Omega$,

and the convergence of $x_n$ to $x$ follows from expression (32). For expression (32), as in the proof of theorem 3, $\langle f, \mathcal{L} x_n \rangle_\Omega = \langle f_n, x_n \rangle_{H^1(\Omega)}$, and

$\mathrm{cov}(\langle f, \mathcal{L} x_n \rangle_\Omega, \langle g, \mathcal{L} x_n \rangle_\Omega) = \mathrm{cov}(\langle f_n, x_n \rangle_{H^1(\Omega)}, \langle g_n, x_n \rangle_{H^1(\Omega)})$
$= \mathrm{cov}(\langle f_n, y_n \rangle_\Omega, \langle g_n, y_n \rangle_\Omega) = \mathrm{cov}(\langle f, y_n \rangle_\Omega, \langle g, y_n \rangle_\Omega)$
$\to \mathrm{cov}(\langle f, y \rangle_\Omega, \langle g, y \rangle_\Omega) = \mathrm{cov}(\langle f, \mathcal{L} x \rangle_\Omega, \langle g, \mathcal{L} x \rangle_\Omega)$

as $n \to \infty$, due to requirement (30).
References
Adler, R. J. (2009) The Geometry of Random Fields. Philadelphia: Society for Industrial and Applied Mathematics.
Adler, R. J. and Taylor, J. (2007) Random Fields and Geometry. New York: Springer.
Allcroft, D. J. and Glasbey, C. A. (2003) A latent Gaussian Markov random-field model for spatiotemporal rainfall disaggregation. Appl. Statist., 52, 487–498.
Arjas, E. and Gasbarra, D. (1996) Bayesian inference of survival probabilities, under stochastic ordering constraints. J. Am. Statist. Ass., 91, 1101–1109.
Auslander, L. and MacKenzie, R. E. (1977) Introduction to Differentiable Manifolds. New York: Dover Publications.
Banerjee, S., Carlin, B. P. and Gelfand, A. E. (2004) Hierarchical Modeling and Analysis for Spatial Data. Boca Raton: Chapman and Hall.
Banerjee, S., Gelfand, A. E., Finley, A. O. and Sang, H. (2008) Gaussian predictive process models for large spatial data sets. J. R. Statist. Soc. B, 70, 825–848.
Bansal, R., Staib, L. H., Xu, D., Zhu, H. and Peterson, B. S. (2007) Statistical analyses of brain surfaces using Gaussian random fields on 2-D manifolds. IEEE Trans. Med. Imgng, 26, 46–57.
Besag, J. (1974) Spatial interaction and the statistical analysis of lattice systems (with discussion). J. R. Statist. Soc. B, 36, 192–236.
Besag, J. (1975) Statistical analysis of non-lattice data. Statistician, 24, 179–195.
Besag, J. (1981) On a system of two-dimensional recurrence equations. J. R. Statist. Soc. B, 43, 302–309.
Besag, J. and Kooperberg, C. (1995) On conditional and intrinsic autoregressions. Biometrika, 82, 733–746.
Besag, J. and Mondal, D. (2005) First-order intrinsic autoregressions and the de Wijs process. Biometrika, 92, 909–920.
Besag, J., York, J. and Mollié, A. (1991) Bayesian image restoration with two applications in spatial statistics (with discussion). Ann. Inst. Statist. Math., 43, 1–59.
Bolin, D. and Lindgren, F. (2009) Wavelet Markov models as efficient alternatives to tapering and convolution fields. Preprint 2009:13. Lund University, Lund.
Bolin, D. and Lindgren, F. (2011) Spatial models generated by nested stochastic partial differential equations. Ann. Appl. Statist., to be published.
Brenner, S. C. and Scott, R. (2007) The Mathematical Theory of Finite Element Methods, 3rd edn. New York: Springer.
Brohan, P., Kennedy, J., Harris, I., Tett, S. and Jones, P. (2006) Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850. J. Geophys. Res., 111.
Chen, C. M. and Thomée, V. (1985) The lumped mass finite element method for a parabolic problem. J. Aust. Math. Soc. B, 26, 329–354.
Chilès, J. P. and Delfiner, P. (1999) Geostatistics: Modeling Spatial Uncertainty. Chichester: Wiley.
Ciarlet, P. G. (1978) The Finite Element Method for Elliptic Problems. Amsterdam: North-Holland.
Cressie, N. A. C. (1993) Statistics for Spatial Data. New York: Wiley.
Cressie, N. and Huang, H. C. (1999) Classes of nonseparable, spatio-temporal stationary covariance functions. J. Am. Statist. Ass., 94, 1330–1340.
Cressie, N. and Johannesson, G. (2008) Fixed rank kriging for very large spatial data sets. J. R. Statist. Soc. B, 70, 209–226.
Cressie, N. and Verzelen, N. (2008) Conditional-mean least-squares fitting of Gaussian Markov random fields to Gaussian fields. Computnl Statist. Data Anal., 52, 2794–2807.
Dahlhaus, R. and Künsch, H. R. (1987) Edge effects and efficient parameter estimation for stationary random fields. Biometrika, 74, 877–882.
Das, B. (2000) Global covariance modeling: a deformation approach to anisotropy. PhD Thesis. Department of Statistics, University of Washington, Seattle.
Davis, T. A. (2006) Direct Methods for Sparse Linear Systems. Philadelphia: Society for Industrial and Applied Mathematics.
Diggle, P. J. and Ribeiro, P. J. (2006) Model-based Geostatistics. New York: Springer.
Duff, I. S., Erisman, A. M. and Reid, J. K. (1989) Direct Methods for Sparse Matrices, 2nd edn. New York: Clarendon.
Edelsbrunner, H. (2001) Geometry and Topology for Mesh Generation. Cambridge: Cambridge University Press.
Eidsvik, J., Finley, A. O., Banerjee, S. and Rue, H. (2010) Approximate Bayesian inference for large spatial datasets using predictive process models. Technical Report 9. Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim.
Federer, H. (1951) Hausdorff measure and Lebesgue area. Proc. Natn. Acad. Sci. USA, 37, 90–94.
Federer, H. (1978) Colloquium lectures on geometric measure theory. Bull. Am. Math. Soc., 84, 291–338.
Fuentes, M. (2001) High frequency kriging for nonstationary environmental processes. Environmetrics, 12, 469–483.
Fuentes, M. (2008) Approximate likelihood for large irregularly spaced spatial data. J. Am. Statist. Ass., 102, 321–331.
Furrer, R., Genton, M. G. and Nychka, D. (2006) Covariance tapering for interpolation of large spatial datasets. J. Computnl Graph. Statist., 15, 502–523.
George, A. and Liu, J. W. H. (1981) Computer Solution of Large Sparse Positive Definite Systems. Englewood Cliffs: Prentice Hall.
Gneiting, T. (1998) Simple tests for the validity of correlation function models on the circle. Statist. Probab. Lett., 39, 119–122.
Gneiting, T. (2002) Nonseparable, stationary covariance functions for space–time data. J. Am. Statist. Ass., 97, 590–600.
Gneiting, T., Kleiber, W. and Schlather, M. (2010) Matérn cross-covariance functions for multivariate random fields. J. Am. Statist. Ass., 105, 1167–1177.
Gschlößl, S. and Czado, C. (2007) Modelling count data with overdispersion and spatial effects. Statistical Papers. (Available from http://dx.doi.org/10.1007/s00362-006-0031-6.)
Guttorp, P. and Gneiting, T. (2006) Studies in the history of probability and statistics XLIX: on the Matérn correlation family. Biometrika, 93, 989–995.
Guyon, X. (1982) Parameter estimation for a stationary process on a d-dimensional lattice. Biometrika, 69, 95–105.
Hansen, J., Ruedy, R., Glascoe, J. and Sato, M. (1999) GISS analysis of surface temperature change. J. Geophys. Res., 104, 30997–31022.
Hansen, J., Ruedy, R., Sato, M., Imhoff, M., Lawrence, W., Easterling, D., Peterson, T. and Karl, T. (2001) A closer look at United States and global surface temperature change. J. Geophys. Res., 106, 23947–23963.
Hartman, L. and Hössjer, O. (2008) Fast kriging of large data sets with Gaussian Markov random fields. Computnl Statist. Data Anal., 52, 2331–2349.
Heine, V. (1955) Models for two-dimensional stationary stochastic processes. Biometrika, 42, 170–178.
Henderson, R., Shimakura, S. and Gorst, D. (2002) Modelling spatial variation in leukemia survival data. J. Am. Statist. Ass., 97, 965–972.
Higdon, D. (1998) A process-convolution approach to modelling temperatures in the North Atlantic Ocean. Environ. Ecol. Statist., 5, 173–190.
Higdon, D., Swall, J. and Kern, J. (1999) Non-stationary spatial modelling. In Bayesian Statistics 6 (eds J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith), pp. 761–768. New York: Oxford University Press.
Hjelle, Ø. and Dæhlen, M. (2006) Triangulations and Applications. Berlin: Springer.
Hrafnkelsson, B. and Cressie, N. A. C. (2003) Hierarchical modeling of count data with application to nuclear fall-out. Environ. Ecol. Statist., 10, 179–200.
Hughes-Oliver, J. M., Gonzalez-Farias, G., Lu, J. C. and Chen, D. (1998) Parametric nonstationary correlation models. Statist. Probab. Lett., 40, 267–278.
Ilić, M., Turner, I. W. and Anh, V. (2008) A numerical solution using an adaptively preconditioned Lanczos method for a class of linear systems related with the fractional Poisson equation. J. Appl. Math. Stoch. Anal., 104525.
Jones, R. H. (1963) Stochastic processes on a sphere. Ann. Math. Statist., 34, 213–218.
Jun, M. and Stein, M. L. (2008) Nonstationary covariance models for global data. Ann. Appl. Statist., 2, 1271–1289.
Karypis, G. and Kumar, V. (1998) METIS: a Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-reducing Orderings of Sparse Matrices, Version 4.0. Minneapolis: University of Minnesota. (Available from http://www-users.cs.umn.edu/karypis/metis/index.html.)
Kneib, T. and Fahrmeir, L. (2007) A mixed model approach for geoadditive hazard regression. Scand. J. Statist., 34, 207–228.
Krantz, S. G. and Parks, H. R. (2008) Geometric Integration Theory. Boston: Birkhäuser.
Lindgren, F. and Rue, H. (2008) A note on the second order random walk model for irregular locations. Scand. J. Statist., 35, 691–700.
McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models, 2nd edn. London: Chapman and Hall.
Paciorek, C. and Schervish, M. (2006) Spatial modelling using a new class of nonstationary covariance functions. Environmetrics, 17, 483–506.
Peterson, T. and Vose, R. (1997) An overview of the Global Historical Climatology Network temperature database. Bull. Am. Meteorol. Soc., 78, 2837–2849.
Pettitt, A. N., Weir, I. S. and Hart, A. G. (2002) A conditional autoregressive Gaussian process for irregularly spaced multivariate data with application to modelling large sets of binary data. Statist. Comput., 12, 353–367.
Quarteroni, A. M. and Valli, A. (2008) Numerical Approximation of Partial Differential Equations, 2nd edn. New York: Springer.
Rozanov, A. (1982) Markov Random Fields. New York: Springer.
Rue, H. (2001) Fast sampling of Gaussian Markov random fields. J. R. Statist. Soc. B, 63, 325–338.
Rue, H. and Held, L. (2005) Gaussian Markov Random Fields: Theory and Applications. London: Chapman and Hall.
Rue, H., Martino, S. and Chopin, N. (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations (with discussion). J. R. Statist. Soc. B, 71, 319–392.
Rue, H. and Tjelmeland, H. (2002) Fitting Gaussian Markov random fields to Gaussian fields. Scand. J. Statist., 29, 31–50.
Samko, S. G., Kilbas, A. A. and Maričev, O. I. (1992) Fractional Integrals and Derivatives: Theory and Applications. Yverdon: Gordon and Breach.
Sampson, P. D. and Guttorp, P. (1992) Nonparametric estimation of nonstationary spatial covariance structure. J. Am. Statist. Ass., 87, 108–119.
Smith, T. (1934) Change of variables in Laplace's and other second-order differential equations. Proc. Phys. Soc., 46, 344–349.
Song, H., Fuentes, M. and Ghosh, S. (2008) A comparative study of Gaussian geostatistical models and Gaussian Markov random field models. J. Multiv. Anal., 99, 1681–1697.
Stein, M. (2005) Space–time covariance functions. J. Am. Statist. Ass., 100, 310–321.
Stein, M. L. (1999) Interpolation of Spatial Data: Some Theory for Kriging. New York: Springer.
Stein, M. L., Chi, Z. and Welty, L. J. (2004) Approximating likelihoods for large spatial data sets. J. R. Statist. Soc. B, 66, 275–296.
Vecchia, A. V. (1988) Estimation and model identification for continuous spatial processes. J. R. Statist. Soc. B, 50, 297–312.
Wahba, G. (1981) Spline interpolation and smoothing on the sphere. SIAM J. Scient. Statist. Comput., 2, 5–16.
Wall, M. M. (2004) A close look at the spatial structure implied by the CAR and SAR models. J. Statist. Planng Inf., 121, 311–324.
Weir, I. S. and Pettitt, A. N. (2000) Binary probability maps using a hidden conditional autoregressive Gaussian process with an application to Finnish common toad data. Appl. Statist., 49, 473–484.
Whittle, P. (1954) On stationary processes in the plane. Biometrika, 41, 434–449.
Whittle, P. (1963) Stochastic processes in several dimensions. Bull. Inst. Int. Statist., 40, 974–994.
Yue, Y. and Speckman, P. (2010) Nonstationary spatial Gaussian Markov random fields. J. Computnl Graph. Statist., 19, 96–116.