Cover Page
Anil K Bera
abera@uiuc.edu
University of Illinois, Urbana-Champaign, IL
Pradosh Simlai
pradosh.simlai@business.und.edu
University of North Dakota, Grand Forks, ND
Jun Yan
jun.yan@uconn.edu
University of Connecticut, Storrs, CT
Contents
Moran's Test: The first formal result
Developments in the Statistics literature
3.1.2 Fundamentals
3.1.2.1 Variation Decomposition
A general model for geostatistical data decomposes the observation
into large-scale variation and small-scale variation (?):
3.1.2.2 Stationarity
To proceed with the spatial process {Z(si)} in (3.4), some form of stationarity
is needed. Otherwise, no inference can be made, since the data are only an
incomplete sampling of a single realization. A random field {Z(s)} in
D ⊂ R^r is strictly stationary if all finite-dimensional distributions are
invariant under translations of the index set D. That is, for any k ≥ 1
and any h ∈ R^r, the distribution of [Z(s1), . . . , Z(sk)] is the same as
that of [Z(s1 + h), . . . , Z(sk + h)]. A more practically useful notion
is weak, or second-order, stationarity, which constrains only
the first two moments. A random field {Z(s)} in D ⊂ R^r is weakly
stationary if for any s, t ∈ D, E[Z(s)] = µ and Cov[Z(s), Z(t)] =
C(s − t). The function C(·) is also known as the covariogram. If C(s − t) is
invariant under rotations of the index set D, i.e., C(s − t) is a function
only of the Euclidean distance ∥s − t∥, then C(·) is isotropic. Strict
stationarity (with finite second moments) is stronger than weak stationarity,
except for Gaussian fields, where the first two moments identify the whole distribution.
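As a concrete illustration of these definitions, the covariogram of a weakly stationary field can be estimated by the method of moments. The sketch below (the exponential covariance and the regular 1-D transect are assumptions for the example, not from the text) simulates such a field and checks that the estimated lag-5 covariance lies below the estimated variance C(0):

```python
import numpy as np

# Illustrative sketch: the moment estimator of the covariogram C(h) on a
# regular 1-D transect, C_hat(h) = average of centered products at lag h.
# The exponential covariance used to simulate the field is an assumption.
rng = np.random.default_rng(0)
n = 500
s = np.arange(n, dtype=float)
true_cov = np.exp(-np.abs(s[:, None] - s[None, :]) / 10.0)   # C(h) = e^{-|h|/10}
z = rng.multivariate_normal(np.zeros(n), true_cov)

def c_hat(h):
    """Moment estimator of C(h) from one realization on a transect."""
    zc = z - z.mean()
    return np.mean(zc * zc) if h == 0 else np.mean(zc[:-h] * zc[h:])

c0, c5 = c_hat(0), c_hat(5)
# under weak stationarity C(h) depends on the lag only and decays from C(0)
```

Averaging over all pairs at the same lag is exactly what stationarity licenses: without it, each pair would estimate a different quantity.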
The quantity C(0), or γ(∞), is called the sill of the semivariogram. The
infimum of ∥h∥ such that γ(h) = C(0) is called the range of the
semivariogram in the direction of h. The discussion of (3.5) and (3.6) also
reveals that weak stationarity is stronger than intrinsic stationarity.
Note that the covariogram or variogram characterizes only the second
moment, not the full distribution of the random field. Moment-based
estimates of these functions suffice for the conventional prediction
task in (3.2).
3.1. Geostatistical Data 9
3.1.3 Inferences
where N (h) is the set of pairs (si , sj ) such that si − sj = h, and |N (h)|
is the number of pairs in N (h). For irregularly spaced data, the set
N (h) in (3.8) needs to be binned:
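The binning step can be sketched as follows. This is a minimal illustration only: the irregular locations, the white-noise field, and the bin edges are assumptions, and for white noise the semivariogram should be flat at the variance.

```python
import numpy as np

# Illustrative sketch of binning: average 0.5*(Z(si) - Z(sj))^2 over all
# pairs whose separation falls in each distance bin.
rng = np.random.default_rng(1)
n = 200
coords = rng.uniform(0.0, 10.0, size=(n, 2))   # irregularly spaced sites
z = rng.normal(size=n)                          # white-noise field

d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
sq = 0.5 * (z[:, None] - z[None, :]) ** 2
iu = np.triu_indices(n, k=1)                    # each pair (si, sj) once

edges = np.linspace(0.0, 5.0, 6)                # assumed bin edges
gamma = np.array([sq[iu][(d[iu] >= lo) & (d[iu] < hi)].mean()
                  for lo, hi in zip(edges[:-1], edges[1:])])
# for white noise every bin should sit near the variance (about 1)
```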
more densely in a fixed domain. Instead, they are used because the
asymptotic results may be useful for the specific problem, estimation
or prediction, to be solved (?, p.62).
Ŷ(s0) = x0⊤β̂ + c⊤Σ⁻¹(Y − Xβ̂),   (3.14)
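A small numerical sketch of the predictor (3.14) follows, under assumed data: a 1-D layout, an exponential covariance, and c taken to be the covariance vector between Y(s0) and the observations (standard kriging conventions, not reproduced from the text):

```python
import numpy as np

# Illustrative sketch of (3.14): GLS estimate of beta, then
# Yhat(s0) = x0' beta_hat + c' Sigma^{-1} (Y - X beta_hat).
rng = np.random.default_rng(2)
s = np.linspace(0.0, 10.0, 40)
Sigma = np.exp(-np.abs(s[:, None] - s[None, :]))     # Cov[Y(si), Y(sj)]
X = np.column_stack([np.ones_like(s), s])            # large-scale trend
Y = X @ np.array([1.0, 0.5]) + np.linalg.cholesky(Sigma) @ rng.normal(size=s.size)

Si = np.linalg.inv(Sigma)
beta_hat = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ Y)   # GLS estimate

def predict(s0):
    x0 = np.array([1.0, s0])
    c = np.exp(-np.abs(s - s0))                      # Cov[Y(s0), Y]
    return x0 @ beta_hat + c @ Si @ (Y - X @ beta_hat)

y_hat = predict(5.05)                                # an unsampled location
# at an observed location the predictor reproduces the observation exactly
```

The exact-interpolation property in the last comment is a useful sanity check on any kriging implementation.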
3.1.4 Simulation
Lattice data differs from geostatistical data in that the spatial index
s of a lattice varies discretely, instead of continuously, over the spatial
index collection D.
Depending on whether the spatial locations in D form a regular grid,
lattices are classified as regular or irregular. Each individual location
may be a spatial point or a region. In the former case, the data are
a special form of geostatistical data. In the latter case, the data are
also referred to as areal data. Regular lattices often arise in controlled
agricultural or ecological experiments where the system of sites forms a
regular grid. Irregular lattice data are collected at natural locations or
from regions with administrative boundaries in a geographical context.
Lattice data are often supplemented with a system of neighbor
information. Neighbors of a location si ∈ D can be defined based on
distance measures or similarity measures. Let N(si) = {sk : sk ∼ si},
where a ∼ b means that a is a neighbor of b. Then a lattice D
supplemented with neighborhood information can be expressed as
DN = {[si, N(si)] : i = 1, . . . , n}. Different ways of defining neighbors
are appropriate under different circumstances. For example, in geographical
studies such as disease mapping, two regions are neighbors if they
share a boundary, or if their centers are within a certain distance.
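A minimal sketch of the distance-based definition, encoding the neighbor relation as a 0/1 contiguity matrix and the structure DN = {[si, N(si)]}; the four coordinates and the cutoff are assumptions for the example:

```python
import numpy as np

# Illustrative sketch: distance-based neighbors encoded as a contiguity
# matrix W, plus the lattice-with-neighbors structure DN.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

cutoff = 1.5                                          # assumed distance cutoff
W = ((d > 0) & (d <= cutoff)).astype(int)             # 1 iff s_k ~ s_i
DN = {i: np.flatnonzero(W[i]).tolist() for i in range(len(coords))}
# sites 0, 1, 2 are mutual neighbors; site 3 at (3, 3) is isolated
```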
In many applications, the main scientific interest in lattice data
analysis is to assess the influence of a set of covariates on a response
variable, with the spatial dependence appropriately accounted for. The
spatial dependence may or may not be of direct concern. Even when it is
a nuisance, a solid treatment is necessary for valid and efficient inference
on regression coefficients. Spatial statistics for lattice data has found
applications in a wide range of disciplines, including the most fruitful
area of disease mapping in epidemiology and public health studies (see,
3.2. Lattice Data 17
3.2.2 Models
A general model for lattice data accommodates both large scale vari-
ation and small scale variation. Large scale variation can incorporate
covariates. Small scale variation captures the spatial dependence. We
start from Gaussian (SAR and CAR) models for spatial dependence,
extend them to general Markov random fields, and finally add covari-
ates for spatial trends. For SAR and CAR Gaussian models, without
loss of generality, we assume {Z(s) : s ∈ D} has mean zero.
Z = BZ + u, (3.17)
Note that the spatial lag dependence model and the spatial error
dependence model in the econometrics literature are both simultaneously
specified models. The most notable feature of simultaneously specified
models is that they facilitate likelihood-based inferences.
A frequently used choice for the spatial-dependence matrix B is
B = ρW, where ρ is a spatial autoregression parameter, and W is a
contiguity matrix with the (i, j)th entry being 1 if i and j are neighbors
and 0 otherwise. In this case, Z(si) = ρ Σ_{sj∼si} Z(sj) + ui. One
limitation of this specification is that, in order for I − B to be full rank,
in a consistent way such that they uniquely determine a valid joint dis-
tribution? This question is to be answered with the theory of Markov
random fields later.
The consistency of conditional specification can be ensured in a
simple CAR Gaussian model. Let M = diag{τ1², . . . , τn²}. Let C be a
spatial-dependence matrix with cij τj² = cji τi² and cii = 0. A CAR
model specifies a full conditional Gaussian model for each Z(si), with
conditional mean Σ_{j≠i} cij Z(sj) and conditional variance τi². The joint
distribution of Z is then

Z ∼ N(0, (I − C)⁻¹M),   (3.21)
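A quick numerical check of the symmetry condition cij τj² = cji τi² follows: it makes the joint precision M⁻¹(I − C) in (3.21) symmetric, so the joint distribution is a proper Gaussian whenever that precision is positive definite. The numbers are an assumed toy example.

```python
import numpy as np

# Illustrative check of the CAR compatibility condition and the implied
# joint distribution Z ~ N(0, (I - C)^{-1} M).
tau2 = np.array([1.0, 2.0, 0.5])         # conditional variances tau_i^2
M = np.diag(tau2)
C = np.array([[0.0,  0.2,   0.1],
              [0.4,  0.0,   0.3],
              [0.05, 0.075, 0.0]])       # satisfies c_ij tau_j^2 = c_ji tau_i^2

lhs = C * tau2[None, :]                  # c_ij * tau_j^2
Q = np.linalg.inv(M) @ (np.eye(3) - C)   # joint precision of Z
Sigma = np.linalg.inv(Q)                 # joint covariance (I - C)^{-1} M
```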
where π is the joint density function of Z. Note that the overall level
of Z is not specified here. A constraint Σ_{i=1}^{n} Zi = 0 would lead to a
proper distribution. This specification is referred to as an intrinsically
autoregressive (IAR) model (?). The power of τ in the normalizing
constant is −(n − 1)/2 instead of −n/2 (?), which is important in
estimating τ. This specification is widely used as a prior for spatially
correlated random effects in Bayesian hierarchical modeling (?).
The SG model dominates in the econometrics literature, while the
CG model is much more popular in the statistics literature. Compared
to the CG model, the SG model may lead to misspecification due to
the “edge effect” when applied to a subregion of a region. For example,
consider Z(s) measured at all counties in the US. Suppose that the
true model for Z is an SG model (3.18), and that we only have data
from counties in one state, for example, Iowa. Simply extracting the
corresponding rows from (3.18) gives a misspecified model for Iowa,
because the surrounding counties outside of Iowa would need to be involved.
For a CG model, although edge sites result in a complicated likelihood,
the model specification problem is less serious.
When they are compatible, Brook’s lemma (?) gives the shape of the joint density:

π(z) = π(y) ∏_{i=1}^{n} π[z(si) | z(s1), . . . , z(si−1), y(si+1), . . . , y(sn)] / π[y(si) | z(s1), . . . , z(si−1), y(si+1), . . . , y(sn)],   (3.25)
π(z) = exp[Q(z)] / ∫_{y∈ζ} exp[Q(y)] dy,   (3.26)
3.2.2.5 Auto-Models
Auto-models are derived from the general MRF under appropriate as-
sumptions (?). The CAR Gaussian model (3.19) is also known as auto-
normal model. We now discuss two widely used auto-models: auto-
logistic model and the auto-Poisson model. Two other auto-models for
discrete data, auto-binomial and auto-negative-binomial, can be found
in ?.
Assume conditional exponential distributions for Z(si):
an additive constant,

Q(z) = Σ_{i=1}^{n} αi z(si) + Σ_{1≤i<j≤n} θij z(si)z(sj) − Σ_{i=1}^{n} log z(si)!,   (3.33)
Note that model (3.35) is a conditional GLM given the latent random
effect Z(si ), while model (3.34) is a marginal GLM.
For non-Gaussian data, ? adds another source of random effects
ϵ(si ), which is independent of Z(si ), into model (3.35):
where {ϵ(si)} are independent N(0, σϵ²). The random effects Z capture the
spatial clustering effect, while ϵ captures spatial heterogeneity. This
model has been adopted in a wide variety of applications, for instance
disease mapping and image restoration.
3.2.3 Inferences
3.2.3.1 Likelihood Method
The exact likelihood for an MRF model for lattice data is constructed
from the joint density (3.26). The normalizing constant of this joint
density is very difficult to evaluate when the dimension n is large. All
likelihood methods have to address this problem.
For a Gaussian model, either SG or CG, assume that Z(s) ∼
N [x(s)β, Σ(γ)], where β and γ are respectively mean and covariance
parameter vectors. Evaluating the loglikelihood demands the inverse
and the determinant of Σ, which are both available from the Cholesky
decomposition of Σ. A CG model gets around the inversion since Σ−1
is specified, but the determinant |Σ| is needed for both SG and CG.
Some computational simplifications are possible. For example, when
C = ρW in a CG model, ? shows that |I − ρW| = ∏_{i=1}^{n}(1 − ρωi),
where the ωi's are the eigenvalues of W. Alternatively, one can maximize the
profiled likelihood over a grid of ρ values in (−1, 1). ? give the increasing-domain
asymptotics: under certain regularity conditions, the MLEs are
consistent and asymptotically normal.
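A quick numerical check of this eigenvalue identity, useful because the eigenvalues need be computed only once for a whole grid of ρ values; the small symmetric 0/1 contiguity matrix is an assumption for the example:

```python
import numpy as np

# Illustrative check: |I - rho*W| equals the product of (1 - rho*omega_i)
# over the eigenvalues omega_i of W.
rng = np.random.default_rng(4)
n, rho = 6, 0.1
A = (rng.uniform(size=(n, n)) < 0.5).astype(float)
W = np.triu(A, 1)
W = W + W.T                                  # symmetric 0/1 contiguity matrix

omega = np.linalg.eigvalsh(W)
logdet_eig = np.sum(np.log(1.0 - rho * omega))
sign, logdet_dir = np.linalg.slogdet(np.eye(n) - rho * W)
# the two log-determinants agree to numerical precision
```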
For an auto-logistic model, it is impractical to get the normalizing
constant except for very small n. The normalizing constant is a
summation of 2^n terms. It is very hard to approximate even using the
bootstrap method (?). An auto-Poisson model shares the same difficulty.
Instead of approximating the likelihood itself, it is possible to approximate
the MLEs using Monte Carlo methods (??). The idea is to use the
Metropolis algorithm and its variants to simulate data without knowing
the normalizing constant and to make inference using the simulated data.
These methods in general require a reasonably good starting value,
for example the maximum pseudo-likelihood estimator, and many
iterations to achieve convergence. That means the methods are
computationally intensive and convergence monitoring is important.
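The maximum pseudo-likelihood idea mentioned above can be sketched for the auto-logistic model: maximize the product of the conditional probabilities P(zi = 1 | rest) = logistic(α + θ · sum of neighboring z values), which avoids the 2^n-term constant entirely. Everything below is an assumed toy setup (torus lattice, checkerboard Gibbs data, grid search in place of a proper optimizer, α treated as known):

```python
import numpy as np

# Illustrative maximum pseudo-likelihood for an auto-logistic model.
rng = np.random.default_rng(5)
m, alpha, theta = 20, -1.0, 0.6            # m x m torus, true parameters

def nbr_sum(z):
    """Sum of the four nearest neighbors on a torus."""
    return (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
            np.roll(z, 1, 1) + np.roll(z, -1, 1))

mask = np.add.outer(np.arange(m), np.arange(m)) % 2 == 0
z = (rng.uniform(size=(m, m)) < 0.5).astype(float)
for _ in range(300):                        # Gibbs sweeps over the two colors
    for sub in (mask, ~mask):
        p = 1.0 / (1.0 + np.exp(-(alpha + theta * nbr_sum(z))))
        z[sub] = (rng.uniform(size=(m, m)) < p)[sub]

def neg_pll(t):
    """Negative log pseudo-likelihood in theta (alpha treated as known)."""
    eta = alpha + t * nbr_sum(z)
    return -np.sum(z * eta - np.log1p(np.exp(eta)))

grid = np.linspace(0.0, 1.2, 121)
theta_hat = grid[np.argmin([neg_pll(t) for t in grid])]
# theta_hat recovers the sign and rough magnitude of the true theta
```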
For hierarchical models with random effects, estimation is in general
a hard problem, since the random effects need to be integrated out to
obtain the marginal likelihood. MCMC methods have been used in a full
Bayesian framework (?). Alternatively, ? proposed a penalized quasi-
likelihood (PQL) approach, which is based on a Laplace approximation
to integrate out the random effects.
Estimating function theory for dependent data (?) gives the optimal
linear unbiased estimating functions

Ūn(β) = (1/n) D⊤V⁻¹(Y − µ) = 0,   (3.39)
where µ = E(Y ), V = Var(Y ), and D = ∂µ/∂β. However, since Var(Y )
is in general unknown, a reasonable approximation W to V −1 will also
give good results. The pseudo likelihood method is a special case in
which the estimating functions are the pseudo score functions.
The estimator β̂n under regularity conditions is consistent and
asymptotically normal, with √n(β̂n − β0) → N(0, Ξ), where β0 is the
true parameter value. The asymptotic variance matrix Ξ can be
approximated by a sandwich form Hn⁻¹ Σn Hn⁻¹, where Hn = ∂Ūn/∂β and
Σn = n Var[Ūn(β)]. The outer terms Hn can be estimated by their
empirical means. However, the middle term Σn can be very hard to
estimate. To appreciate the difficulty, rewrite Var[Ūn(β)] as

Var[Ūn(β)] = (1/n) Σ_{i,j∈Dn} Eβ0[Ui(β) Uj⊤(β)].   (3.40)
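For independent data the cross terms vanish and the sandwich reduces to the familiar heteroskedasticity-robust variance; the sketch below (with an assumed scalar regression, not from the text) illustrates that special case, and it is precisely the i ≠ j terms in (3.40), dropped here, that make Σn hard to estimate spatially.

```python
import numpy as np

# Illustrative sandwich form H^{-1} Sigma H^{-1} for the scalar estimating
# function U_i(beta) = x_i (y_i - x_i beta), using only the i = j terms.
rng = np.random.default_rng(6)
n = 400
x = rng.normal(size=n)
y = 2.0 * x + (1.0 + np.abs(x)) * rng.normal(size=n)  # heteroskedastic errors

beta_hat = (x @ y) / (x @ x)             # solves sum_i U_i(beta) = 0
U = x * (y - x * beta_hat)               # estimated scores
H = -(x @ x) / n                         # H_n = d(Ubar_n)/d(beta)
Sigma = np.mean(U ** 2)                  # ignores the i != j cross terms
var_hat = Sigma / (H * H) / n            # sandwich approximation of Var(beta_hat)
```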
3.2.4 Simulation
Fig. 3.1 Spatial point pattern examples. Left: complete spatial randomness; Center: spatial
regularity; and Right: spatial clustering.
λ(s) = lim_{|ds|→0} E[Y(ds)]/|ds|,   (3.42)

g(si, sj) = γ(si, sj)/[λ(si)λ(sj)].   (3.44)
Stationarity and isotropy can also be defined for spatial point
processes. A spatial point process is stationary if the distribution of Y(B)
is invariant to translation of the coordinate system. If, furthermore, the
distribution of Y(B) is invariant to rotation of the coordinate system,
then the process is isotropic. For a stationary and isotropic spatial point
process, the intensity function is constant, and the second-order intensity
γ(si, sj) and the pair correlation g(si, sj) are functions of ∥si − sj∥
only.
3.3.3 Inferences
λ̂b(s) = wb⁻¹(s) Σ_{i=1}^{n} kb(s − si),   (3.51)

where kb is a kernel with bandwidth b > 0 and wb(s) = ∫_S kb(t − s) dt is
an edge correction factor. The kernel choice is usually not important.
The bandwidth b, however, can be very influential on the estimate.
The bias-variance trade-off here is similar to that for kernel density
estimation. The MSE is used to measure the quality of the estimator; see ?
for more details.
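A 1-D sketch of the estimator (3.51) with a Gaussian kernel, where the edge-correction factor wb(s) = ∫_S kb(t − s) dt is approximated by a Riemann sum; the window S = [0, 1], the bandwidth, and the simulated homogeneous pattern are all assumptions:

```python
import numpy as np

# Illustrative kernel intensity estimate with edge correction on S = [0, 1].
rng = np.random.default_rng(7)
pts = rng.uniform(0.0, 1.0, size=200)        # observed events on S
b = 0.1                                       # assumed bandwidth

def kb(u):
    """Gaussian kernel with bandwidth b."""
    return np.exp(-0.5 * (u / b) ** 2) / (b * np.sqrt(2.0 * np.pi))

tgrid = np.linspace(0.0, 1.0, 2001)
dt = tgrid[1] - tgrid[0]

def lam_hat(s):
    wb = kb(tgrid - s).sum() * dt            # edge correction: int_S kb(t - s) dt
    return kb(s - pts).sum() / wb

lam_mid, lam_edge = lam_hat(0.5), lam_hat(0.0)
# for 200 points on [0, 1], lam_hat should fluctuate around 200, including
# at the boundary, where the correction roughly doubles the raw kernel sum
```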
3.3. Point Pattern 35
For CSR, the theoretical L function is zero. If L̂(h) is significantly
positive (negative), then there are more (fewer) points than
expected from CSR at distance h, suggesting spatial clustering
(regularity). Therefore, L̂(h) provides an intuitive way to test CSR.
log L(θ; s1, . . . , sn) = Σ_{i=1}^{n} log λ(si; θ) − ∫_S λ(u; θ) du.   (3.56)
The difficulty of (3.56) lies in the second term. This integral is the
total intensity over the whole study region S, requiring information for
each s ∈ S, which can be impractical, if not impossible, to obtain when
λ incorporates covariates. Kriging from geostatistical methods can
be used to predict covariates at unsampled locations from
a collection of sampling sites. These predictions are then used to
approximate the integral in (3.56) (?).
For a Cox process, the likelihood function is complicated by the fact
that the latent intensity needs to be integrated out. By treating the
latent intensity as missing data, Monte Carlo methods for computing
missing data likelihoods can be applied (?, section 10.3).
For a Markov point process, the difficulty comes from the unknown
normalizing constant of the joint density. Earlier work (??) used
expansion methods for estimating the normalizing constant, which may
not be reliable in cases with strong interaction (?). The normalizing
constant can be estimated quite accurately with Monte Carlo methods
(?); see ? for a recent summary. A computationally simpler alternative
is the maximum pseudo-likelihood method (?), which has been
implemented in the spatstat package (?). A composite likelihood approach
is proposed by ?.
3.3.4 Simulation
Simulation from a homogeneous Poisson process with intensity λ on a
region can be done from the definition. We first generate the count Y(S)
from a Poisson distribution with mean λ|S| and then place Y(S)
independent, uniformly distributed points on S. This algorithm is
simplest when S is a box. When S is a circle (or a ball in higher
dimensions), ? propose an alternative radial simulation procedure using
polar coordinates. For irregular shapes of S, we simply simulate on a
box or circle that covers S and discard the points outside of S.
Simulation of an inhomogeneous Poisson process with intensity λ(s)
can be done by independently thinning a homogeneous Poisson process
(?). Let λ0 = sup_{s∈S} λ(s). We first simulate a homogeneous Poisson
process with intensity λ0 and then independently retain each event s
with probability λ(s)/λ0.
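Both recipes above can be sketched together: a homogeneous process on a box S (Poisson count, then uniform locations), followed by independent thinning with retention probability λ(s)/λ0. The box and the target intensity λ(s) below are assumptions for the example.

```python
import numpy as np

# Illustrative simulation: homogeneous Poisson on a box, then thinning.
rng = np.random.default_rng(8)
x0, x1, y0, y1 = 0.0, 2.0, 0.0, 1.0          # the box S
area = (x1 - x0) * (y1 - y0)
lam0 = 300.0                                  # dominating constant intensity

n = rng.poisson(lam0 * area)                  # Y(S) ~ Poisson(lam0 |S|)
pts = np.column_stack([rng.uniform(x0, x1, n), rng.uniform(y0, y1, n)])

def lam(p):
    """Assumed target intensity, increasing in x; sup over S equals lam0."""
    return lam0 * p[:, 0] / x1

keep = rng.uniform(size=n) < lam(pts) / lam0  # independent thinning
thinned = pts[keep]
# thinned realizes an inhomogeneous Poisson process with intensity lam(s);
# its expected total count is the integral of lam over S
```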
Simulation of Cox processes can be done by utilizing their two-stage
random mechanism and discretization: first generate a step intensity
function from a discretized Gaussian field and then simulate from an
inhomogeneous Poisson point process with the realized intensity.
Simulation of Markov point processes is more difficult since it involves the
Even though spatial statistics has been well established for quite some
time, the literature in econometrics is fairly new. The main objective
of this chapter is to explore the key developments in the spatial
econometrics literature during the last 30 years. There exists a wide variety of
alternative econometric approaches for undertaking estimation and
inference using spatial regression models. A number of surveys and papers
provide an excellent overview of the methodological developments that
prove to be relevant for actual practice, e.g., Anselin (1988, 2001, 2006),
Anselin and Bera (1998), Anselin, Bera, Florax and Yoon (1996), Kelejian
and Prucha (2007), and Lee and Yu (2010). Also, several journals have
recently devoted an entire issue to spatial econometrics, e.g., Empirical
Economics (Vol. 34(1), Feb 2008), Journal of Econometrics (Vol. 140(1),
Sep 2007), and Regional Science and Urban Economics (Vol. 40(5), Sep
2010). The enormity of the literature makes it difficult to cover
all the theoretical developments and published applications in a single
survey. Instead, we emphasize some of the influential estimation and
inference techniques that can elucidate the appeal of the economic
applications of spatial econometric modeling. Both theoretical and applied
econometricians can reasonably benefit from our focus on different im-
40 Key Developments in Spatial Econometrics
In this subsection we first describe the basic MLE method for various
simultaneously specified spatial models. We start with the simplest
possible pure endogenous model:
Yn = ρWn Yn + ϵn (4.1)
where Yn is an n × 1 vector of values of the dependent variable, ϵn is
an n × 1 vector of disturbances, Wn Yn is the spatially lagged dependent
variable for the weights matrix Wn, and ρ is the spatial autoregressive
parameter. For notational simplicity, from now on we keep the
4.1. Estimation of Spatial Econometric Models 41
subscript n only for model specification purposes. Even though the
application of this model is limited in practice, a discussion of (4.1) is
important, as it underlies the formulation of most simultaneous spatial
regression models.
If we assume that ϵ ∼ N(0, σ²I), then the log-likelihood function for
ρ and σ², given that Y = y, takes the following form:

l(ρ, σ²) = −(n/2) ln(2π) − (n/2) ln(σ²) − (y − ρWy)′(y − ρWy)/(2σ²) + ln|I − ρW|   (4.2)
As mentioned by Anselin and Bera (1998), in contrast to time series,
the spatial Jacobian is not the determinant of a triangular matrix, but of
a full matrix. However, one can simplify the calculation of the Jacobian
of the transformation by using a result of Cliff and Ord (1973, p.165)
[see also Rao (1973, p.40)]. If W has ω1, . . . , ωn as its eigenvalues,
then by the definition of the characteristic equation,
|ωI − W| = ∏_{i=1}^{n} (ω − ωi),

and hence

|I − ρW| = ∏_{i=1}^{n} (1 − ρωi).
σ̂²ML = (y − ρWy)′(y − ρWy)/n = y′A′Ay/n,

where A = I − ρW.
1 Let W be a real symmetric matrix. Then the determinantal equation (also called the
characteristic equation) |W − λI| = 0 is of degree ≤ n in λ if W is n × n. Corresponding to any
eigenvalue λi of W, there exists a non-null column vector Pi such that W Pi = λi Pi and
Pi′Pi = 1.
lc = constant − (n/2) ln σ̂² + Σ_{i=1}^{n} ln(1 − ρωi)
The asymptotic variance matrix follows as the inverse of the information
matrix (Ord 1975):

AsyVar(ρ, σ²) = [ tr(B′B) − α    trB/σ²
                  trB/σ²         n/(2σ⁴) ]⁻¹   (4.3)

where B = WA⁻¹ and α = −Σ_{i=1}^{n} ωi²/(1 − ρωi)². It is evident that the
existence of the spatially lagged variable makes it difficult to obtain the
usual asymptotic properties under approaches such as ordinary least
squares (OLS), since OLS ignores the Jacobian term. For example, the OLS
estimator for model (4.1) is the solution of the estimating equation
0 = y ′ W y − ρy ′ W ′ W y = y ′ (I − ρW ′ )W y = e′ W y (4.4)
where e = Ay is the vector of observed disturbances. Equation (4.4) will be
an unbiased estimating equation only if E(ϵ′Wy) = 0, implying that
tr[W′(I − ρW)⁻¹] = 0. For consistency, we need
Yn = ρWn Yn + Xn β + ϵn (4.5)
where Xn is an n × k matrix of fixed regressors (exogenous variables)
and β is a k × 1 vector of parameters. It is clear that for ρ known, the
OLS estimators for β are best linear unbiased (BLUE). For ρ unknown,
one option is to use the MLE procedure to estimate model parameters.
l(ρ, β, σ²) = −(n/2) ln(2π) − (n/2) ln(σ²) − (y − ρWy − Xβ)′(y − ρWy − Xβ)/(2σ²) + ln|I − ρW|   (4.6)
Hence, from the first-order conditions we obtain the ML estimates for
β and σ² as follows:

βML = (X′X)⁻¹X′(I − ρW)y = (X′X)⁻¹X′z

and

σ̂²ML = (y − ρWy − XβML)′(y − ρWy − XβML)/n,
where z = Ay is the spatially filtered dependent variable. If ρ were
known, then these estimates are simply given by OLS of z on X. To
estimate ρ, similar to model (4.1), for spatial lag model (4.5), we use
the concentrated log-likelihood function:
lc = −(n/2) ln[(e0 − ρeL)′(e0 − ρeL)/n] + Σ_{i=1}^{n} ln(1 − ρωi),
where e0 and eL are the residuals from regressions of y on X and of Wy on
X, respectively (Anselin 1980, ch. 4). The asymptotic variance of the
estimators is given by
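A minimal numerical sketch of this concentrated-likelihood approach follows, under assumed data (a row-standardized circular-lattice W and simulated y; this is an illustration, not the authors' code):

```python
import numpy as np

# Illustrative grid search over the concentrated log-likelihood of the
# spatial lag model: e0, eL are OLS residuals, the Jacobian term uses the
# eigenvalues of W.
rng = np.random.default_rng(9)
n, rho_true = 100, 0.5
W = np.zeros((n, n))
for i in range(n):                           # circular-lattice contiguity
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - rho_true * W,
                    X @ np.array([1.0, 2.0]) + rng.normal(size=n))

omega = np.linalg.eigvalsh(W)                # W is symmetric here
P = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix of X
e0 = y - P @ y                               # residuals of y on X
eL = W @ y - P @ (W @ y)                     # residuals of Wy on X

def lc(rho):
    e = e0 - rho * eL
    return -(n / 2) * np.log(e @ e / n) + np.sum(np.log(1 - rho * omega))

grid = np.linspace(-0.9, 0.9, 181)
rho_hat = grid[np.argmax([lc(r) for r in grid])]
# rho_hat approximates the MLE of the spatial autoregressive parameter
```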
Another attractive model used in the literature is the linear regression
model with spatial error autocorrelation:
Yn = Xn β + ϵn , ϵn = λWn ϵn + un (4.8)
Y = λW Y + Xβ − λW Xβ + u
However, when λ is unknown (which is usually the case), MLE is one
of the possible alternative methods. Note that model (4.8) leads to an
error covariance structure given by
−(1/(2σu²)){n − σu⁻²(y − Xβ)′(I − λW)′(I − λW)(y − Xβ)} = 0
−Σ_{i=1}^{n} ωi/(1 − λωi) + (1/(2σu²))(y − Xβ)′{(I − λW)′W + W′(I − λW)}(y − Xβ) = 0
lc = −(n/2) ln[(yL′yL − yL′XL(XL′XL)⁻¹XL′yL)/n] + Σ_{i=1}^{n} ln(1 − λωi)
uit = λ Σ_{j=1}^{N} wij ujt + νit,   (4.14)

ϵit = λ Σ_{j=1}^{N} wij ϵjt + uit,   (4.15)
The ML-based estimation and inference for spatial panel data models
can be done by exploiting the two-dimensional nature of the data
and related matrix algebra results. For example, consider the random
effects model with spatially dependent variable (4.13) and (4.14),
expressed in matrix form:
y = ρ(IT ⊗ W )y + Xβ + ϵ, (4.17)
ϵ = (iT ⊗ µ) + u, (4.18)
where y = (y1′, . . . , yT′)′, X = (X1′, . . . , XT′)′, u = (u1′, . . . , uT′)′, iT is a
T × 1 vector of ones, and ⊗ is the Kronecker product. Assuming u ∼
N(0, σu²INT), and defining A = IN − ρW and θ = σu²/(T σµ² + σu²), the
log-likelihood function of model (4.17) and (4.18) is:
l(ρ, β, θ, σu²) = −(NT/2) ln(2πσu²) + (N/2) ln θ + T ln|A| − (1/(2σu²)) ϵ′[INT − (1 − θ)(iT iT′/T) ⊗ IN]ϵ,
y = Xβ + ϵ, (4.19)
becomes
l(β, λ, η) = −(NT/2) ln(2πσu²) + (T − 1) ln|B| − (1/2) ln|TηIN + (B′B)⁻¹|
− (1/(2σu²)) ϵ′[(iT iT′/T) ⊗ [TηIN + (B′B)⁻¹]⁻¹]ϵ − (1/(2σu²)) ϵ′[(IT − iT iT′/T) ⊗ (B′B)]ϵ,
where ϵ = y − Xβ and B = IN − λW. Solving the first-order conditions
gives the estimates of β and σu². Substituting these estimates
into the likelihood function yields the concentrated likelihood
function for λ and η. The final ML estimates can be obtained by a two-stage
iterative procedure that alternates between the estimates of β and σu² on
one side, and λ and η on the other.
ϵ′ H[H ′ ΩH]−1 H ′ ϵ
results in the following estimator
γ̂GMM = (Z′D̂Z)⁻¹Z′D̂y,   (4.21)

where D̂ = H(H′Ω̂H)⁻¹H′, and the estimated asymptotic covariance
matrix becomes

Var(γ̂GMM) = [Z′H(H′Ω̂H)⁻¹H′Z]⁻¹,
where Ω̂ is a consistent estimator of Ω. In other words, Kelejian and
Robinson obtained their efficient GMM estimator by using the set of
instruments H in a two-stage least squares (2SLS) procedure to obtain
a consistent preliminary estimate of γ. They then utilize the estimate
of γ in y = Zγ + u to obtain û, and utilize û to obtain
consistent estimates of σ̂η² and σ̂ψ². Finally, equation (4.21) can
be used to obtain γ̂GMM. Recently, in the context of both the spatial lag
and spatial error models, various suggestions have been made regarding
optimal instruments (see, e.g., Lee (2003), Kelejian and Prucha (2004)).
In the formulation of the spatial autoregressive error process, Kelejian
and Prucha (1998, 1999) suggested a generalized moments (GM)
approach, which has become very popular since its publication.
The idea is to utilize the empirical counterparts of three moment
conditions on u and Wu for the second part of model (4.8):

E[(1/n) u′u] = σ²
E[(1/n) u′W′Wu] = σ² (1/n) tr(W′W)
E[(1/n) u′W′u] = 0
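A minimal numerical sketch of this moment-based idea follows (not the original Kelejian-Prucha implementation: the circular-lattice W, the grid search in place of their nonlinear least-squares step, and the simulated residuals are all assumptions). With ϵ = λWϵ + u, the implied innovations u(λ) = ϵ − λWϵ should satisfy the three moment conditions, so λ can be chosen to make the empirical deviations small:

```python
import numpy as np

# Illustrative GM-style estimation of lambda from the moment conditions.
rng = np.random.default_rng(10)
n, lam_true = 200, 0.4
W = np.zeros((n, n))
for i in range(n):                       # row-standardized circular lattice
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
u_true = rng.normal(size=n)
eps = np.linalg.solve(np.eye(n) - lam_true * W, u_true)

We, trWW = W @ eps, np.trace(W.T @ W)

def crit(lam):
    u = eps - lam * We                   # implied innovations u(lambda)
    Wu = W @ u
    s2 = u @ u / n                       # sigma^2 from the first moment
    m2 = Wu @ Wu / n - s2 * trWW / n     # deviation in the second moment
    m3 = u @ Wu / n                      # deviation in the third moment
    return m2 ** 2 + m3 ** 2

grid = np.linspace(-0.9, 0.9, 361)
lam_hat = grid[np.argmin([crit(l) for l in grid])]
# lam_hat should land near the true value lam_true
```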
was for the spatial error model, we can use it in the presence of a spatial
lag structure in the mean specification. In order to deal with both the spatial
lag and the spatial error model, one can utilize the following three steps (see
Kelejian and Prucha (1998) for details):
ϵ = λ(IT ⊗ W )ϵ + u, (4.22)
u = (iT ⊗ IN )µ + ν, (4.23)
So the innovations uit are autocorrelated over time only and are not
spatially correlated across units. The variance-covariance matrix of u
is

where ū = (IT ⊗ W)u. In some sense, the above setup generalizes the
moment structure originally introduced in Kelejian and Prucha (1998,
1999). For example, if T = 1, we have Q0 = 0, the first three
moment conditions become uninformative, and the last three equations
reduce to those of Kelejian and Prucha (1998, 1999). Utilizing
the estimates of the spatial autoregressive parameter and the variance
components of the disturbance process, we use an FGLS-type procedure
for the regression parameters.
Yn = Yn B + Xn C + Ȳn Λ + Un
Un = Ūn R + En   (4.24)

where Yn = (y1,n, . . . , ym,n), Xn = (x1,n, . . . , xk,n), Un =
(u1,n, . . . , um,n), ȳj,n = Wn yj,n, j = 1, . . . , m, Ȳn = (ȳ1,n, . . . , ȳm,n), Ūn =
(ū1,n, . . . , ūm,n), ūj,n = Wn uj,n, En = (ϵ1,n, . . . , ϵm,n), and R = diag_{j=1}^{m}(ρj).
Note that yj,n, uj,n and ϵj,n are all n × 1 vectors in the jth equation,
xl,n is the n × 1 vector of the lth exogenous variable, and the ith element of ȳj,n
is ȳij,n = Σ_{r=1}^{n} wir,n yrj,n. From this given setup we derive the next set
of equations
yn = Bn∗ yn + Cn∗ xn + un
un = Rn∗ un + ϵn   (4.25)

where yn = vec(Yn), xn = vec(Xn), un = vec(Un), ϵn = vec(En),
Bn∗ = [(B′ ⊗ In) + (Λ ⊗ Wn)], Cn∗ = (C′ ⊗ In), and Rn∗ = (R ⊗ Wn) =
diag_{j=1}^{m}(ρj Wn).
Finally for j = 1, . . . , m, we can express the entire system by the
following common form
(1) diag(Wn) = 0;
(2) I − ρWn is nonsingular;
(3) the row and column sums of Wn and (I − ρWn)⁻¹ are bounded in
absolute value;
(4) the regressors have full column rank and their elements are uniformly
bounded, and the same is true for the instruments, if any;
(5) the ϵi,n's are iid with finite fourth moments;
(6) the smallest eigenvalue of their moment matrix is positive.
2 These assumptions originated in Anselin and Kelejian (1997). Later, Kelejian and Prucha
modified and extended them in their subsequent papers.
4.2. Asymptotics in Spatial Econometrics: Methods of Analysis 57
For consistency of the 2SLS estimator δ̃n, Kelejian and Prucha use the
following theorem on triangular arrays.
A.1: If {νi,n, 1 ≤ i ≤ n, n ≥ 1} ∼ iid(0, σ²) and An = {aij,n, 1 ≤
i ≤ n, n ≥ 1} are both triangular arrays, then n⁻¹/² An′νn →D
N(0, σ²Q).
The idea is to break the 2SLS estimator into two parts, and apply the
theorem to the second part to derive the required result.
For consistency of 2SLS estimators with spatial correlation, Kelejian
and Prucha use the following result.
A.2: For a triangular array {ξi,n, 1 ≤ i ≤ n, n ≥ 1}, a sufficient
condition for n⁻¹ Σ_{i=1}^{n} ∥ξi,n∥^s = Op(1), s > 0, is E|ξi,n|^s ≤ cξ < ∞.
Since one can decompose the estimated residuals from a regression with
a spatial error term as

ũn = un + Zn(δ − δ̃n),

Kelejian and Prucha showed that (δ − δ̃n) = Op(n⁻¹/²) and, following A.2,
that E|zij,n|³ ≤ cz. If Z includes only non-random x's, then this is not a problem.
But since in their case it includes the endogenous lagged regressors, they
expand ȳn = Wn yn and use the triangle inequality to prove that E|ȳi,n|³ ≤
cz.
and showed that each term ∆in is Op(1). So, using the assumptions
on the weights matrix and the error distribution together with Chebyshev's
inequality, the required result of error variance consistency follows.
Note that, in order to replicate this large-sample distribution in
small samples, one needs to check various model assumptions and
conduct a Monte Carlo study to see how this generalized 2SLS (GS2SLS,
as they call it) procedure performs in comparison to the infill strategy of Lahiri
(1996), the traditional ML-based estimators of Anselin (1988), the QMLE
approach of Lee, and Conley (1999)-type GMM procedures. The
approximation in small samples is not difficult provided that the weights
matrix and data follow common features like symmetry, sparseness, etc.
A comparison of bias and RMSE should shed some light on the effects
of sample size and the number of neighbors on the small-sample properties
of alternative estimators. From the literature it is still not clear
which methods are robust when it comes to their bias properties, exact
parametrization of the model, or small-sample efficiency. Also, it would
be interesting to compare three different error structures, as in Kelejian and
Prucha (1999) and Anselin and Moreno (2003), to see the effects of
non-normality, if any.
The limiting distribution of the estimated θ's is obtained using a mean
value expansion of g(Xsi; θ). Then, given the assumptions, he applied a
central limit theorem due to Bolthausen (1982) for stationary, mixing
random fields on regular lattices. Finally, the use of Slutsky's theorem
and the Cramér-Wold device gives the result.
So basically he shows that GMM estimators remain consistent under
spatially dependent structures, and the distribution theory can be obtained
provided one can handle the complications of spatially dependent data
appropriately.
Conveniently, one can implement this procedure for real data in two
ways: the structure of spatial dependence can affect point estimates,
or it can have some impact on inference procedures. Even though
in his paper Conley never implemented the first option, one can try to
approximate it by the same algorithm as Hansen's (1982, 1984) two-step GMM
approach. It would be interesting to see how the over-identified case differs
from ML, 2SLS, and GM-based approaches when we allow spatial
dependence. For covariance matrix estimation he suggested a two-dimensional
Bartlett-window-type weight function.
Lee mentioned that for models where each unit has a few neighboring
units, {hn} should be bounded. Also, for the LS approach, whether {hn}
is bounded or divergent has important implications. For example, for a
SAR model like Yn = Xnβ + λWnYn + Un, the LS estimators of β and
λ are consistent when {hn} is divergent and inconsistent when {hn}
is bounded (Lee 2002). As an example where {hn} → ∞ while satisfying
the above two assumptions, he considers that of Case (1991).
Basically, assumption 2 excludes the case where {hn} diverges
to ∞ at a rate ≥ n, the rate of the sample size, because the
MLE would be inconsistent otherwise. Also, when {hn} is a divergent
sequence, the MLEs λ̂n and σ̂n² become asymptotically independent.
So it becomes easy to bypass the discussion of Anselin and Bera (1998)
on the impact of their asymptotic dependence on statistical inference,
described in equation (4.7).
3 To make the comparison with Conley simple, here we denote s as some unit in the
population; Andrews used γ to denote them, but both are basically
the same.
field, which highlights the importance of, and attention to, this particular area
of research. In this section we present a brief overview of the
existing tests for spatial dependence.
ϵ = λW ϵ + u (4.29)
and the null hypothesis for our test is H0 : λ = 0. The general formula
for the RS test is

RS = d(θ̃)′ I(θ̃)⁻¹ d(θ̃),   (4.30)

where d(θ) = ∂l(θ)/∂θ is the score vector, I(θ) = −E[∂²l(θ)/∂θ∂θ′]
is the information matrix, l(θ) is the log-likelihood function, and θ̃ is
and the inverse of the information matrix is given by (4.7). Even though
under ρ = 0 the information matrix is not block diagonal, the asymptotic
variance of dρ is obtained from the reciprocal of the (1, 1) element of

[ tr(W² + W′W) + (WXβ)′(WXβ)/σ²    (X′WXβ)′/σ²
  X′WXβ/σ²                          X′X/σ² ]⁻¹
Finally, the RS test against a spatial lag alternative takes the following form:

RSρ = d̃ρ²/Ĩ(θ̃) = [e′We/σ̃²]² / {(WXβ̃)′(I − X(X′X)⁻¹X′)(WXβ̃)/σ̃² + tr(W² + W′W)}   (4.36)

Similar to RSλ, under the null H0 : ρ = 0, the RSρ statistic also has a
χ²₁ distribution asymptotically.
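As an illustration, the simpler RS statistic against the spatial error alternative, RSλ = [e′We/σ̃²]²/tr(W² + W′W), can be computed directly from OLS residuals. The circular-lattice W and the simulated data satisfying the null are assumptions for the sketch.

```python
import numpy as np

# Illustrative RS (LM) statistic for spatial error dependence from OLS
# residuals; under H0 it is asymptotically chi-squared with 1 df.
rng = np.random.default_rng(11)
n = 200
W = np.zeros((n, n))
for i in range(n):                           # row-standardized circular lattice
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)   # H0: no spatial error

b = np.linalg.solve(X.T @ X, X.T @ y)               # OLS estimate
e = y - X @ b
sig2 = e @ e / n                                    # ML error variance
RS = (e @ (W @ e) / sig2) ** 2 / np.trace(W @ W + W.T @ W)
# compare RS with the 5% chi-squared(1) critical value 3.84
```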
4.4 References
In this section, we intend to provide an up-to-date and almost complete
list of references.