Cover Page
Anil K Bera
abera@uiuc.edu
University of Illinois, Urbana-Champaign, IL
Pradosh Simlai
pradosh.simlai@business.und.edu
University of North Dakota, Grand Forks, ND
Jun Yan
jun.yan@uconn.edu
University of Connecticut, Storrs, CT
Contents
Moran's Test: The first formal result
Developments in the Statistics literature
3.1.2 Fundamentals
3.1.2.1 Variation Decomposition
A general model for geostatistical data decomposes the observation
into large-scale variation and small-scale variation (?):
3.1.2.2 Stationarity
To proceed with the spatial process {Z(si)} in (3.4), some form of stationarity
is needed. Otherwise, no inference can be made, since the data are only an
incomplete sampling of a single realization. A random field {Z(s)} in
D ⊂ R^r is strictly stationary if all finite-dimensional distributions are
invariant under translations of the index set D. That is, for any k ≥ 1
and any h ∈ R^r, the distribution of [Z(s1), . . . , Z(sk)] is the same as
that of [Z(s1 + h), . . . , Z(sk + h)]. A more practically useful notion
is weak, or second-order, stationarity, which constrains only
the first two moments. A random field {Z(s)} in D ⊂ R^r is weakly
stationary if for any s, t ∈ D, E[Z(s)] = µ and Cov[Z(s), Z(t)] =
C(s − t). The function C(·) is also known as the covariogram. If C(s − t) is
invariant under rotations of the index set D, i.e., C(s − t) is a function
only of the Euclidean distance ∥s − t∥, then C(·) is isotropic. Strict
stationarity (with finite second moments) is stronger than weak stationarity,
except for Gaussian fields, where the first two moments identify the whole distribution.
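As a concrete illustration of these definitions, the covariogram of a weakly stationary field can be estimated by the method of moments. The sketch below (the exponential covariance and the regular 1-D transect are assumptions for the example, not from the text) simulates such a field and checks that the estimated lag-5 covariance lies below the estimated variance C(0):

```python
import numpy as np

# Illustrative sketch: the moment estimator of the covariogram C(h) on a
# regular 1-D transect, C_hat(h) = average of centered products at lag h.
# The exponential covariance used to simulate the field is an assumption.
rng = np.random.default_rng(0)
n = 500
s = np.arange(n, dtype=float)
true_cov = np.exp(-np.abs(s[:, None] - s[None, :]) / 10.0)   # C(h) = e^{-|h|/10}
z = rng.multivariate_normal(np.zeros(n), true_cov)

def c_hat(h):
    """Moment estimator of C(h) from one realization on a transect."""
    zc = z - z.mean()
    return np.mean(zc * zc) if h == 0 else np.mean(zc[:-h] * zc[h:])

c0, c5 = c_hat(0), c_hat(5)
# under weak stationarity C(h) depends on the lag only and decays from C(0)
```

Averaging over all pairs at the same lag is exactly what stationarity licenses: without it, each pair would estimate a different quantity.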
The quantity C(0), or γ(∞), is called the sill of the semivariogram. The
infimum of ∥h∥ such that γ(h) = C(0) is called the range of the
semivariogram in the direction of h. The discussion of (3.5) and (3.6) also
reveals that weak stationarity is stronger than intrinsic stationarity.
Note that the covariogram or variogram characterizes only the second
moment, not the full distribution of the random field. Moment-based
estimates of these functions suffice for the conventional prediction
task in (3.2).
3.1. Geostatistical Data 9
3.1.3 Inferences
where N (h) is the set of pairs (si , sj ) such that si − sj = h, and |N (h)|
is the number of pairs in N (h). For irregularly spaced data, the set
N (h) in (3.8) needs to be binned:
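The binning step can be sketched as follows. This is a minimal illustration only: the irregular locations, the white-noise field, and the bin edges are assumptions, and for white noise the semivariogram should be flat at the variance.

```python
import numpy as np

# Illustrative sketch of binning: average 0.5*(Z(si) - Z(sj))^2 over all
# pairs whose separation falls in each distance bin.
rng = np.random.default_rng(1)
n = 200
coords = rng.uniform(0.0, 10.0, size=(n, 2))   # irregularly spaced sites
z = rng.normal(size=n)                          # white-noise field

d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
sq = 0.5 * (z[:, None] - z[None, :]) ** 2
iu = np.triu_indices(n, k=1)                    # each pair (si, sj) once

edges = np.linspace(0.0, 5.0, 6)                # assumed bin edges
gamma = np.array([sq[iu][(d[iu] >= lo) & (d[iu] < hi)].mean()
                  for lo, hi in zip(edges[:-1], edges[1:])])
# for white noise every bin should sit near the variance (about 1)
```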
more densely in a fixed domain. Instead, they are used because the
asymptotic results may be useful for the specific problem, estimation
or prediction, to be solved (?, p.62).
Ŷ(s0) = x0⊤β̂ + c⊤Σ⁻¹(Y − Xβ̂),   (3.14)
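A small numerical sketch of the predictor (3.14) follows, under assumed data: a 1-D layout, an exponential covariance, and c taken to be the covariance vector between Y(s0) and the observations (standard kriging conventions, not reproduced from the text):

```python
import numpy as np

# Illustrative sketch of (3.14): GLS estimate of beta, then
# Yhat(s0) = x0' beta_hat + c' Sigma^{-1} (Y - X beta_hat).
rng = np.random.default_rng(2)
s = np.linspace(0.0, 10.0, 40)
Sigma = np.exp(-np.abs(s[:, None] - s[None, :]))     # Cov[Y(si), Y(sj)]
X = np.column_stack([np.ones_like(s), s])            # large-scale trend
Y = X @ np.array([1.0, 0.5]) + np.linalg.cholesky(Sigma) @ rng.normal(size=s.size)

Si = np.linalg.inv(Sigma)
beta_hat = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ Y)   # GLS estimate

def predict(s0):
    x0 = np.array([1.0, s0])
    c = np.exp(-np.abs(s - s0))                      # Cov[Y(s0), Y]
    return x0 @ beta_hat + c @ Si @ (Y - X @ beta_hat)

y_hat = predict(5.05)                                # an unsampled location
# at an observed location the predictor reproduces the observation exactly
```

The exact-interpolation property in the last comment is a useful sanity check on any kriging implementation.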
3.1.4 Simulation
Lattice data differs from geostatistical data in that the spatial index
s of a lattice varies discretely, instead of continuously, over the spatial
index collection D.
Depending on whether the spatial locations in D form a regular grid,
lattices are classified as regular or irregular. Each individual location
may be a spatial point or a region. In the former case, the data are
a special form of geostatistical data. In the latter case, the data are
also referred to as areal data. Regular lattices often arise in controlled
agricultural or ecological experiments where the system of sites forms a
regular grid. Irregular lattice data are collected at natural locations or
from regions with administrative boundaries in a geographical context.
Lattice data are often supplemented with a system of neighbor
information. Neighbors of a location si ∈ D can be defined based on
distance measures or similarity measures. Let N(si) = {sk : sk ∼ si},
where a ∼ b means that a is a neighbor of b. Then a lattice D
supplemented with neighborhood information can be expressed as
DN = {[si, N(si)] : i = 1, . . . , n}. Different ways of defining neighbors
are appropriate under different circumstances. For example, in geographical
studies such as disease mapping, two regions are neighbors if they
share a boundary, or if their centers are within a certain distance.
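A minimal sketch of the distance-based definition, encoding the neighbor relation as a 0/1 contiguity matrix and the structure DN = {[si, N(si)]}; the four coordinates and the cutoff are assumptions for the example:

```python
import numpy as np

# Illustrative sketch: distance-based neighbors encoded as a contiguity
# matrix W, plus the lattice-with-neighbors structure DN.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

cutoff = 1.5                                          # assumed distance cutoff
W = ((d > 0) & (d <= cutoff)).astype(int)             # 1 iff s_k ~ s_i
DN = {i: np.flatnonzero(W[i]).tolist() for i in range(len(coords))}
# sites 0, 1, 2 are mutual neighbors; site 3 at (3, 3) is isolated
```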
In many applications, the main scientific interest in lattice data
analysis is to assess the influence of a set of covariates on a response
variable, with the spatial dependence appropriately accounted for. The
spatial dependence may or may not be of direct concern. Even when it is
a nuisance, a solid treatment is necessary for valid and efficient inference
on regression coefficients. Spatial statistics for lattice data has found
applications in a wide range of disciplines, including the most fruitful
area of disease mapping in epidemiology and public health studies (see,
3.2. Lattice Data 17
3.2.2 Models
A general model for lattice data accommodates both large scale vari-
ation and small scale variation. Large scale variation can incorporate
covariates. Small scale variation captures the spatial dependence. We
start from Gaussian (SAR and CAR) models for spatial dependence,
extend them to general Markov random fields, and finally add covari-
ates for spatial trends. For SAR and CAR Gaussian models, without
loss of generality, we assume {Z(s) : s ∈ D} has mean zero.
Z = BZ + u, (3.17)
Note that the spatial lag dependence model and the spatial error
dependence model in the econometrics literature are both simultaneously
specified models. The most notable feature of simultaneously specified
models is that they facilitate likelihood-based inferences.
A frequently used choice for the spatial-dependence matrix B is
B = ρW, where ρ is a spatial autoregression parameter, and W is a
contiguity matrix with the (i, j)th entry being 1 if i and j are neighbors
and 0 otherwise. In this case, Z(si) = ρ Σ_{sj∼si} Z(sj) + ui. One
limitation of this specification is that, in order for I − B to be full rank,
in a consistent way such that they uniquely determine a valid joint dis-
tribution? This question is to be answered with the theory of Markov
random fields later.
The consistency of conditional specification can be ensured in a
simple CAR Gaussian model. Let M = diag{τ1², . . . , τn²}. Let C be a
spatial-dependence matrix with cij τj² = cji τi² and cii = 0. A CAR
model specifies a full conditional Gaussian model for each Z(si), with
conditional mean Σ_{j≠i} cij Z(sj) and conditional variance τi². The joint
distribution of Z is then

Z ∼ N(0, (I − C)⁻¹M),   (3.21)
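A quick numerical check of the symmetry condition cij τj² = cji τi² follows: it makes the joint precision M⁻¹(I − C) in (3.21) symmetric, so the joint distribution is a proper Gaussian whenever that precision is positive definite. The numbers are an assumed toy example.

```python
import numpy as np

# Illustrative check of the CAR compatibility condition and the implied
# joint distribution Z ~ N(0, (I - C)^{-1} M).
tau2 = np.array([1.0, 2.0, 0.5])         # conditional variances tau_i^2
M = np.diag(tau2)
C = np.array([[0.0,  0.2,   0.1],
              [0.4,  0.0,   0.3],
              [0.05, 0.075, 0.0]])       # satisfies c_ij tau_j^2 = c_ji tau_i^2

lhs = C * tau2[None, :]                  # c_ij * tau_j^2
Q = np.linalg.inv(M) @ (np.eye(3) - C)   # joint precision of Z
Sigma = np.linalg.inv(Q)                 # joint covariance (I - C)^{-1} M
```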
where π is the joint density function of Z. Note that the overall level
of Z is not specified here. A constraint Σ_{i=1}^{n} Zi = 0 would lead to a
proper distribution. This specification is referred to as an intrinsically
autoregressive (IAR) model (?). The power of τ in the normalizing
constant is −(n − 1)/2 instead of −n/2 (?), which is important in
estimating τ. This specification is widely used as a prior for spatially
correlated random effects in Bayesian hierarchical modeling (?).
The SG model dominates in the econometrics literature, while the
CG model is much more popular in the statistics literature. Compared
to the CG model, the SG model may lead to misspecification due to
the “edge effect” when applied to a subregion of a region. For example,
consider Z(s) measured at all counties in the US. Suppose that the
true model for Z is an SG model (3.18), and that we only have data
from counties in one state, for example, Iowa. Simply extracting the
corresponding rows from (3.18) gives a misspecified model for Iowa,
because the surrounding counties outside of Iowa would need to be involved.
For a CG model, although edge sites result in a complicated likelihood,
the model specification problem is less serious.
When they are compatible, Brook’s lemma (?) gives the shape of the joint density:

π(z) = π(y) ∏_{i=1}^{n} π[z(si) | z(s1), . . . , z(si−1), y(si+1), . . . , y(sn)] / π[y(si) | z(s1), . . . , z(si−1), y(si+1), . . . , y(sn)],   (3.25)
π(z) = exp[Q(z)] / ∫_{y∈ζ} exp[Q(y)] dy,   (3.26)
3.2.2.5 Auto-Models
Auto-models are derived from the general MRF under appropriate as-
sumptions (?). The CAR Gaussian model (3.19) is also known as auto-
normal model. We now discuss two widely used auto-models: auto-
logistic model and the auto-Poisson model. Two other auto-models for
discrete data, auto-binomial and auto-negative-binomial, can be found
in ?.
Assume conditional exponential distributions for Z(si):
an additive constant,

Q(z) = Σ_{i=1}^{n} αi z(si) + Σ_{1≤i<j≤n} θij z(si)z(sj) − Σ_{i=1}^{n} log z(si)!,   (3.33)
Note that model (3.35) is a conditional GLM given the latent random
effect Z(si ), while model (3.34) is a marginal GLM.
For non-Gaussian data, ? adds another source of random effects
ϵ(si ), which is independent of Z(si ), into model (3.35):
where {ϵ(si)} are independent N(0, σϵ²). The random effects Z capture the
spatial clustering effect, while ϵ captures spatial heterogeneity. This
model has been adopted in a wide variety of applications, for instance
disease mapping and image restoration.
3.2.3 Inferences
3.2.3.1 Likelihood Method
The exact likelihood for an MRF model for lattice data is constructed
from the joint density (3.26). The normalizing constant of this joint
density is very difficult to evaluate when the dimension n is large. All
likelihood methods have to address this problem.
For a Gaussian model, either SG or CG, assume that Z(s) ∼
N [x(s)β, Σ(γ)], where β and γ are respectively mean and covariance
parameter vectors. Evaluating the loglikelihood demands the inverse
and the determinant of Σ, which are both available from the Cholesky
decomposition of Σ. A CG model gets around the inversion since Σ−1
is specified, but the determinant |Σ| is needed for both SG and CG.
Some computational simplifications are possible. For example, when
C = ρW in a CG model, ? shows that |I − ρW| = ∏_{i=1}^{n}(1 − ρωi),
where the ωi's are the eigenvalues of W. Alternatively, one can maximize the
profiled likelihood over a grid of ρ values in (−1, 1). ? give the increasing-domain
asymptotics: under certain regularity conditions, the MLEs are
consistent and asymptotically normal.
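A quick numerical check of this eigenvalue identity, useful because the eigenvalues need be computed only once for a whole grid of ρ values; the small symmetric 0/1 contiguity matrix is an assumption for the example:

```python
import numpy as np

# Illustrative check: |I - rho*W| equals the product of (1 - rho*omega_i)
# over the eigenvalues omega_i of W.
rng = np.random.default_rng(4)
n, rho = 6, 0.1
A = (rng.uniform(size=(n, n)) < 0.5).astype(float)
W = np.triu(A, 1)
W = W + W.T                                  # symmetric 0/1 contiguity matrix

omega = np.linalg.eigvalsh(W)
logdet_eig = np.sum(np.log(1.0 - rho * omega))
sign, logdet_dir = np.linalg.slogdet(np.eye(n) - rho * W)
# the two log-determinants agree to numerical precision
```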
For an auto-logistic model, it is impractical to get the normalizing
constant except for very small n. The normalizing constant is a
summation of 2^n terms. It is very hard to approximate even using the
bootstrap method (?). An auto-Poisson model shares the same difficulty.
Instead of approximating the likelihood itself, it is possible to approximate
the MLEs using Monte Carlo methods (??). The idea is to use the
Metropolis algorithm and its variants to simulate data without knowing
the normalizing constant and to make inference using the simulated data.
These methods in general require a reasonably good starting value,
for example the maximum pseudo-likelihood estimator, and many
iterations to achieve convergence. That means the methods are
computationally intensive and convergence monitoring is important.
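The maximum pseudo-likelihood idea mentioned above can be sketched for the auto-logistic model: maximize the product of the conditional probabilities P(zi = 1 | rest) = logistic(α + θ · sum of neighboring z values), which avoids the 2^n-term constant entirely. Everything below is an assumed toy setup (torus lattice, checkerboard Gibbs data, grid search in place of a proper optimizer, α treated as known):

```python
import numpy as np

# Illustrative maximum pseudo-likelihood for an auto-logistic model.
rng = np.random.default_rng(5)
m, alpha, theta = 20, -1.0, 0.6            # m x m torus, true parameters

def nbr_sum(z):
    """Sum of the four nearest neighbors on a torus."""
    return (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
            np.roll(z, 1, 1) + np.roll(z, -1, 1))

mask = np.add.outer(np.arange(m), np.arange(m)) % 2 == 0
z = (rng.uniform(size=(m, m)) < 0.5).astype(float)
for _ in range(300):                        # Gibbs sweeps over the two colors
    for sub in (mask, ~mask):
        p = 1.0 / (1.0 + np.exp(-(alpha + theta * nbr_sum(z))))
        z[sub] = (rng.uniform(size=(m, m)) < p)[sub]

def neg_pll(t):
    """Negative log pseudo-likelihood in theta (alpha treated as known)."""
    eta = alpha + t * nbr_sum(z)
    return -np.sum(z * eta - np.log1p(np.exp(eta)))

grid = np.linspace(0.0, 1.2, 121)
theta_hat = grid[np.argmin([neg_pll(t) for t in grid])]
# theta_hat recovers the sign and rough magnitude of the true theta
```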
For hierarchical models with random effects, estimation is in general
a hard problem, since the random effects need to be integrated out to
obtain the marginal likelihood. MCMC methods have been used in a full
Bayesian framework (?). Alternatively, ? proposed a penalized quasi-
likelihood (PQL) approach, which is based on a Laplace approximation
to integrate out the random effects.
Estimating function theory for dependent data (?) gives the optimal
linear unbiased estimating functions

Ūn(β) = (1/n) D⊤V⁻¹(Y − µ) = 0,   (3.39)
where µ = E(Y ), V = Var(Y ), and D = ∂µ/∂β. However, since Var(Y )
is in general unknown, a reasonable approximation W to V −1 will also
give good results. The pseudo likelihood method is a special case in
which the estimating functions are the pseudo score functions.
The estimator β̂n under regularity conditions is consistent and
asymptotically normal, with √n(β̂n − β0) → N(0, Ξ), where β0 is the
true parameter value. The asymptotic variance matrix Ξ can be
approximated by a sandwich form Hn⁻¹ Σn Hn⁻¹, where Hn = ∂Ūn/∂β and
Σn = n Var[Ūn(β)]. The outer terms Hn can be estimated by their
empirical means. However, the middle term Σn can be very hard to
estimate. To appreciate the difficulty, rewrite Var[Ūn(β)] as

Var[Ūn(β)] = (1/n) Σ_{i,j∈Dn} Eβ0[Ui(β) Uj⊤(β)].   (3.40)
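For independent data the cross terms vanish and the sandwich reduces to the familiar heteroskedasticity-robust variance; the sketch below (with an assumed scalar regression, not from the text) illustrates that special case, and it is precisely the i ≠ j terms in (3.40), dropped here, that make Σn hard to estimate spatially.

```python
import numpy as np

# Illustrative sandwich form H^{-1} Sigma H^{-1} for the scalar estimating
# function U_i(beta) = x_i (y_i - x_i beta), using only the i = j terms.
rng = np.random.default_rng(6)
n = 400
x = rng.normal(size=n)
y = 2.0 * x + (1.0 + np.abs(x)) * rng.normal(size=n)  # heteroskedastic errors

beta_hat = (x @ y) / (x @ x)             # solves sum_i U_i(beta) = 0
U = x * (y - x * beta_hat)               # estimated scores
H = -(x @ x) / n                         # H_n = d(Ubar_n)/d(beta)
Sigma = np.mean(U ** 2)                  # ignores the i != j cross terms
var_hat = Sigma / (H * H) / n            # sandwich approximation of Var(beta_hat)
```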
3.2.4 Simulation
Fig. 3.1 Spatial point pattern examples. Left: complete spatial randomness; Center: spatial
regularity; and Right: spatial clustering.
λ(s) = lim_{|ds|→0} E[Y(ds)]/|ds|,   (3.42)

g(si, sj) = γ(si, sj)/[λ(si)λ(sj)].   (3.44)
Stationarity and isotropy can also be defined for spatial point
processes. A spatial point process is stationary if the distribution of Y(B)
is invariant to translation of the coordinate system. If, furthermore, the
distribution of Y(B) is invariant to rotation of the coordinate system,
then the process is isotropic. For a stationary and isotropic spatial point
process, the intensity function is constant, and the second-order intensity
γ(si, sj) and the pair correlation g(si, sj) are functions of ∥si − sj∥
only.
3.3.3 Inferences
λ̂b(s) = wb⁻¹(s) Σ_{i=1}^{n} kb(s − si),   (3.51)

where kb is a kernel with bandwidth b > 0 and wb(s) = ∫_S kb(t − s) dt is
an edge correction factor. The kernel choice is usually not important.
The bandwidth b, however, can be very influential on the estimate.
The bias-variance trade-off here is similar to that for kernel density
estimation. The MSE is used to measure the quality of the estimator; see ?
for more details.
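A 1-D sketch of the estimator (3.51) with a Gaussian kernel, where the edge-correction factor wb(s) = ∫_S kb(t − s) dt is approximated by a Riemann sum; the window S = [0, 1], the bandwidth, and the simulated homogeneous pattern are all assumptions:

```python
import numpy as np

# Illustrative kernel intensity estimate with edge correction on S = [0, 1].
rng = np.random.default_rng(7)
pts = rng.uniform(0.0, 1.0, size=200)        # observed events on S
b = 0.1                                       # assumed bandwidth

def kb(u):
    """Gaussian kernel with bandwidth b."""
    return np.exp(-0.5 * (u / b) ** 2) / (b * np.sqrt(2.0 * np.pi))

tgrid = np.linspace(0.0, 1.0, 2001)
dt = tgrid[1] - tgrid[0]

def lam_hat(s):
    wb = kb(tgrid - s).sum() * dt            # edge correction: int_S kb(t - s) dt
    return kb(s - pts).sum() / wb

lam_mid, lam_edge = lam_hat(0.5), lam_hat(0.0)
# for 200 points on [0, 1], lam_hat should fluctuate around 200, including
# at the boundary, where the correction roughly doubles the raw kernel sum
```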
3.3. Point Pattern 35
For CSR, the theoretical L function is zero. If L̂(h) is significantly
positive (negative), then there are more (fewer) points than
expected from CSR at distance h, suggesting spatial clustering
(regularity). Therefore, L̂(h) provides an intuitive way to test CSR.
log L(θ; s1, . . . , sn) = Σ_{i=1}^{n} log λ(si; θ) − ∫_S λ(u; θ) du.   (3.56)
The difficulty of (3.56) lies in the second term. This integral is the
total intensity over the whole study region S, requiring information for
each s ∈ S, which can be impractical, if not impossible, to obtain when
λ incorporates covariates. Kriging from geostatistical methods can
be used to predict covariates at unsampled locations from
a collection of sampling sites. These predictions are then used to
approximate the integral in (3.56) (?).
For a Cox process, the likelihood function is complicated by the fact
that the latent intensity needs to be integrated out. By treating the
latent intensity as missing data, Monte Carlo methods for computing
missing data likelihoods can be applied (?, section 10.3).
For a Markov point process, the difficulty comes from the unknown
normalizing constant of the joint density. Earlier work (??) used
expansion methods for estimating the normalizing constant, which may
not be reliable in cases with strong interaction (?). The normalizing
constant can be estimated quite accurately with Monte Carlo methods
(?); see ? for a recent summary. A computationally simpler alternative
is the maximum pseudo-likelihood method (?), which has been
implemented in the spatstat package (?). A composite likelihood approach
is proposed by ?.
3.3.4 Simulation
Simulation from a homogeneous Poisson process with intensity λ on a
region can be done from the definition. We first generate the count Y(S)
from a Poisson distribution with mean λ|S| and then place Y(S)
independent, uniformly distributed points on S. This algorithm is
simplest when S is a box. When S is a circle (or a ball in higher
dimensions), ? propose an alternative radial simulation procedure using
polar coordinates. For irregular shapes of S, we simply simulate on a
box or circle that covers S and discard the points outside of S.
Simulation of an inhomogeneous Poisson process with intensity λ(s)
can be done by independently thinning a homogeneous Poisson process
(?). Let λ0 = sup_{s∈S} λ(s). We first simulate a homogeneous Poisson
process with intensity λ0 and then independently retain each event s
with probability λ(s)/λ0.
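Both recipes above can be sketched together: a homogeneous process on a box S (Poisson count, then uniform locations), followed by independent thinning with retention probability λ(s)/λ0. The box and the target intensity λ(s) below are assumptions for the example.

```python
import numpy as np

# Illustrative simulation: homogeneous Poisson on a box, then thinning.
rng = np.random.default_rng(8)
x0, x1, y0, y1 = 0.0, 2.0, 0.0, 1.0          # the box S
area = (x1 - x0) * (y1 - y0)
lam0 = 300.0                                  # dominating constant intensity

n = rng.poisson(lam0 * area)                  # Y(S) ~ Poisson(lam0 |S|)
pts = np.column_stack([rng.uniform(x0, x1, n), rng.uniform(y0, y1, n)])

def lam(p):
    """Assumed target intensity, increasing in x; sup over S equals lam0."""
    return lam0 * p[:, 0] / x1

keep = rng.uniform(size=n) < lam(pts) / lam0  # independent thinning
thinned = pts[keep]
# thinned realizes an inhomogeneous Poisson process with intensity lam(s);
# its expected total count is the integral of lam over S
```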
Simulation of Cox processes can be done by utilizing their two-stage
random mechanism and discretization: first generate a step intensity
function from a discretized Gaussian field and then simulate from an
inhomogeneous Poisson point process with the realized intensity.
Simulation of Markov point processes is more difficult since it involves the
Even though spatial statistics has been well established for quite some
time, the literature in econometrics is fairly new. The main objective
of this chapter is to explore the key developments in the spatial
econometrics literature during the last 30 years. There exists a wide variety of
alternative econometric approaches for undertaking estimation and
inference using spatial regression models. A number of surveys and papers
provide an excellent overview of the methodological developments that
prove to be relevant for actual practice, e.g., Anselin (1988, 2001, 2006),
Anselin and Bera (1998), Anselin, Bera, Florax and Yoon (1996), Kelejian
and Prucha (2007), and Lee and Yu (2010). Also, several journals have
recently devoted an entire issue to spatial econometrics, e.g., Empirical
Economics (Vol. 34(1), Feb 2008), Journal of Econometrics (Vol. 140(1),
Sep 2007), and Regional Science and Urban Economics (Vol. 40(5), Sep
2010). The enormity of the literature makes it difficult to cover
all the theoretical developments and published applications in a single
survey. Instead, we emphasize some of the influential estimation and
inference techniques that can elucidate the appeal of the economic
applications of spatial econometric modeling. Both theoretical and applied
econometricians can reasonably benefit from our focus on different im-
40 Key Developments in Spatial Econometrics
In this subsection we first describe the basic MLE method for various
simultaneously specified spatial models. We start with the simplest
possible pure endogenous model:
Yn = ρWn Yn + ϵn (4.1)
where Yn is an n × 1 vector of values of the dependent variable, ϵn is
an n × 1 vector of disturbances, Wn Yn is the spatially lagged dependent
variable for the weights matrix Wn, and ρ is the spatial autoregressive
parameter. For notational simplicity, from now on we keep the
4.1. Estimation of Spatial Econometric Models 41
subscript n only for model specification purposes. Even though the
application of this model is limited in practice, a discussion of (4.1) is
important, as it underlies the formulation of most simultaneous spatial
regression models.
If we assume that ϵ ∼ N(0, σ²I), then the log-likelihood function for
ρ and σ², given that Y = y, takes the following form:

l(ρ, σ²) = −(n/2) ln(2π) − (n/2) ln(σ²) − (y − ρWy)′(y − ρWy)/(2σ²) + ln|I − ρW|   (4.2)
As mentioned by Anselin and Bera (1998), in contrast to time series,
the spatial Jacobian is not the determinant of a triangular matrix, but of
a full matrix. However, one can simplify the calculation of the Jacobian
of the transformation by using a result of Cliff and Ord (1973, p.165)
[see also Rao (1973, p.40)]. If W has ω1, . . . , ωn as its eigenvalues,
then by the definition of the characteristic equation,
|ωI − W| = ∏_{i=1}^{n} (ω − ωi),

and hence

|I − ρW| = ∏_{i=1}^{n} (1 − ρωi).
σ̂²ML = (y − ρWy)′(y − ρWy)/n = y′A′Ay/n,

where A = I − ρW.
1 Let W be a real symmetric matrix. Then the determinantal equation (also called the
characteristic equation) |W − λI| = 0 is of degree ≤ n in λ if W is n × n. Corresponding to any
eigenvalue λi of W, there exists a non-null column vector Pi such that W Pi = λi Pi and
Pi′Pi = 1.
lc = constant − (n/2) ln σ̂² + Σ_{i=1}^{n} ln(1 − ρωi)
The asymptotic variance matrix follows as the inverse of the information
matrix (Ord 1975):

AsyVar(ρ, σ²) = [ tr(B′B) − α    trB/σ²
                  trB/σ²         n/(2σ⁴) ]⁻¹   (4.3)

where B = WA⁻¹ and α = −Σ_{i=1}^{n} ωi²/(1 − ρωi)². It is evident that the
existence of the spatially lagged variable makes it difficult to obtain the
usual asymptotic properties under approaches such as ordinary least
squares (OLS), since OLS ignores the Jacobian term. For example, the OLS
estimator for model (4.1) is the solution of the estimating equation
0 = y ′ W y − ρy ′ W ′ W y = y ′ (I − ρW ′ )W y = e′ W y (4.4)
where e = Ay is the vector of observed disturbances. Equation (4.4) will be
an unbiased estimating equation only if E(ϵ′Wy) = 0, implying that
tr[W′(I − ρW)⁻¹] = 0. For consistency, we need
Yn = ρWn Yn + Xn β + ϵn (4.5)
where Xn is an n × k matrix of fixed regressors (exogenous variables)
and β is a k × 1 vector of parameters. It is clear that for ρ known, the
OLS estimators for β are best linear unbiased (BLUE). For ρ unknown,
one option is to use the MLE procedure to estimate model parameters.
l(ρ, β, σ²) = −(n/2) ln(2π) − (n/2) ln(σ²) − (y − ρWy − Xβ)′(y − ρWy − Xβ)/(2σ²) + ln|I − ρW|   (4.6)
Hence, from the first-order conditions we obtain the ML estimates for
β and σ² as follows:

βML = (X′X)⁻¹X′(I − ρW)y = (X′X)⁻¹X′z

and

σ̂²ML = (y − ρWy − XβML)′(y − ρWy − XβML)/n,
where z = Ay is the spatially filtered dependent variable. If ρ were
known, then these estimates are simply given by OLS of z on X. To
estimate ρ, similar to model (4.1), for spatial lag model (4.5), we use
the concentrated log-likelihood function:
lc = −(n/2) ln[(e0 − ρeL)′(e0 − ρeL)/n] + Σ_{i=1}^{n} ln(1 − ρωi),
where e0 and eL are the residuals from regressions of y on X and of Wy on
X, respectively (Anselin 1980, ch. 4). The asymptotic variance of the
estimators is given by
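A minimal numerical sketch of this concentrated-likelihood approach follows, under assumed data (a row-standardized circular-lattice W and simulated y; this is an illustration, not the authors' code):

```python
import numpy as np

# Illustrative grid search over the concentrated log-likelihood of the
# spatial lag model: e0, eL are OLS residuals, the Jacobian term uses the
# eigenvalues of W.
rng = np.random.default_rng(9)
n, rho_true = 100, 0.5
W = np.zeros((n, n))
for i in range(n):                           # circular-lattice contiguity
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - rho_true * W,
                    X @ np.array([1.0, 2.0]) + rng.normal(size=n))

omega = np.linalg.eigvalsh(W)                # W is symmetric here
P = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix of X
e0 = y - P @ y                               # residuals of y on X
eL = W @ y - P @ (W @ y)                     # residuals of Wy on X

def lc(rho):
    e = e0 - rho * eL
    return -(n / 2) * np.log(e @ e / n) + np.sum(np.log(1 - rho * omega))

grid = np.linspace(-0.9, 0.9, 181)
rho_hat = grid[np.argmax([lc(r) for r in grid])]
# rho_hat approximates the MLE of the spatial autoregressive parameter
```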
Another attractive model used in the literature is the linear regression
model with spatial error autocorrelation:
Yn = Xn β + ϵn , ϵn = λWn ϵn + un (4.8)
Y = λW Y + Xβ − λW Xβ + u
However, when λ is unknown (which is usually the case), MLE is one
of the possible alternative methods. Note that model (4.8) leads to an
error covariance structure given by
−(1/(2σu²)){n − σu⁻²(y − Xβ)′(I − λW)′(I − λW)(y − Xβ)} = 0
−Σ_{i=1}^{n} ωi/(1 − λωi) + (1/(2σu²))(y − Xβ)′{(I − λW)′W + W′(I − λW)}(y − Xβ) = 0
lc = −(n/2) ln[(yL′yL − yL′XL(XL′XL)⁻¹XL′yL)/n] + Σ_{i=1}^{n} ln(1 − λωi)
uit = λ Σ_{j=1}^{N} wij ujt + νit,   (4.14)

ϵit = λ Σ_{j=1}^{N} wij ϵjt + uit,   (4.15)
The ML-based estimation and inference for spatial panel data models
can be done by exploiting the two-dimensional nature of the data
and related matrix algebra results. For example, consider the random
effects model with spatially dependent variable (4.13) and (4.14),
expressed in matrix form:
y = ρ(IT ⊗ W )y + Xβ + ϵ, (4.17)
ϵ = (iT ⊗ µ) + u, (4.18)
where y = (y1′, . . . , yT′)′, X = (X1′, . . . , XT′)′, u = (u1′, . . . , uT′)′, iT is a
T × 1 vector of ones, and ⊗ is the Kronecker product. Assuming u ∼
N(0, σu²INT), and defining A = IN − ρW and θ = σu²/(T σµ² + σu²), the
log-likelihood function of model (4.17) and (4.18) is:
l(ρ, β, θ, σu²) = −(NT/2) ln(2πσu²) + (N/2) ln θ + T ln|A| − (1/(2σu²)) ϵ′[INT − (1 − θ)(iT iT′/T) ⊗ IN]ϵ,
y = Xβ + ϵ, (4.19)
becomes
l(β, λ, η) = −(NT/2) ln(2πσu²) + (T − 1) ln|B| − (1/2) ln|TηIN + (B′B)⁻¹|
− (1/(2σu²)) ϵ′[(iT iT′/T) ⊗ [TηIN + (B′B)⁻¹]⁻¹]ϵ − (1/(2σu²)) ϵ′[(IT − iT iT′/T) ⊗ (B′B)]ϵ,
where ϵ = y − Xβ and B = IN − λW. Solving the first-order conditions
gives the estimates of β and σu². Substituting these estimates
into the likelihood function yields the concentrated likelihood
function for λ and η. The final ML estimates can be obtained by a two-stage
iterative procedure that alternates between the estimates of β and σu² on
one side, and λ and η on the other.
ϵ′ H[H ′ ΩH]−1 H ′ ϵ
results in the following estimator
γ̂GMM = (Z′D̂Z)⁻¹Z′D̂y,   (4.21)

where D̂ = H(H′Ω̂H)⁻¹H′, and the estimated asymptotic covariance
matrix becomes

Var(γ̂GMM) = [Z′H(H′Ω̂H)⁻¹H′Z]⁻¹,
where Ω̂ is a consistent estimator of Ω. In other words, Kelejian and
Robinson obtained their efficient GMM estimator by using the set of
instruments H in a two-stage least squares (2SLS) procedure to obtain
a consistent preliminary estimate of γ. They then utilize the estimate
of γ in y = Zγ + u to obtain û, and utilize û to obtain
consistent estimates of σ̂η² and σ̂ψ². Finally, equation (4.21) can
be used to obtain γ̂GMM. Recently, in the context of both the spatial lag
and spatial error models, various suggestions have been made regarding
optimal instruments (see, e.g., Lee (2003), Kelejian and Prucha (2004)).
In the formulation of the spatial autoregressive error process, Kelejian
and Prucha (1998, 1999) suggested a generalized moments (GM)
approach, which has become very popular since its publication.
The idea is to utilize the empirical counterparts of three moment
conditions on u and Wu for the second part of model (4.8):

E[(1/n) u′u] = σ²
E[(1/n) u′W′Wu] = σ² (1/n) tr(W′W)
E[(1/n) u′W′u] = 0
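A minimal numerical sketch of this moment-based idea follows (not the original Kelejian-Prucha implementation: the circular-lattice W, the grid search in place of their nonlinear least-squares step, and the simulated residuals are all assumptions). With ϵ = λWϵ + u, the implied innovations u(λ) = ϵ − λWϵ should satisfy the three moment conditions, so λ can be chosen to make the empirical deviations small:

```python
import numpy as np

# Illustrative GM-style estimation of lambda from the moment conditions.
rng = np.random.default_rng(10)
n, lam_true = 200, 0.4
W = np.zeros((n, n))
for i in range(n):                       # row-standardized circular lattice
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
u_true = rng.normal(size=n)
eps = np.linalg.solve(np.eye(n) - lam_true * W, u_true)

We, trWW = W @ eps, np.trace(W.T @ W)

def crit(lam):
    u = eps - lam * We                   # implied innovations u(lambda)
    Wu = W @ u
    s2 = u @ u / n                       # sigma^2 from the first moment
    m2 = Wu @ Wu / n - s2 * trWW / n     # deviation in the second moment
    m3 = u @ Wu / n                      # deviation in the third moment
    return m2 ** 2 + m3 ** 2

grid = np.linspace(-0.9, 0.9, 361)
lam_hat = grid[np.argmin([crit(l) for l in grid])]
# lam_hat should land near the true value lam_true
```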
was for the spatial error model, we can use it in the presence of a spatial
lag structure in the mean specification. In order to deal with both the spatial
lag and the spatial error model, one can utilize the following three steps (see
Kelejian and Prucha (1998) for details):
ϵ = λ(IT ⊗ W )ϵ + u, (4.22)
u = (iT ⊗ IN )µ + ν, (4.23)
So the innovations uit are autocorrelated over time only and are not
spatially correlated across units. The variance-covariance matrix of u
is

where ū = (IT ⊗ W)u. In some sense, the above setup generalizes the
moment structure originally introduced in Kelejian and Prucha (1998,
1999). For example, if T = 1, we have Q0 = 0, the first three
moment conditions become uninformative, and the last three equations
reduce to those of Kelejian and Prucha (1998, 1999). Utilizing
the estimates of the spatial autoregressive parameter and the variance
components of the disturbance process, we use an FGLS-type procedure
for the regression parameters.
Yn = Yn B + Xn C + Ȳn Λ + Un
Un = Ūn R + En   (4.24)

where Yn = (y1,n, . . . , ym,n), Xn = (x1,n, . . . , xk,n), Un =
(u1,n, . . . , um,n), ȳj,n = Wn yj,n, j = 1, . . . , m, Ȳn = (ȳ1,n, . . . , ȳm,n), Ūn =
(ū1,n, . . . , ūm,n), ūj,n = Wn uj,n, En = (ϵ1,n, . . . , ϵm,n), and R = diag_{j=1}^{m}(ρj).
Note that yj,n, uj,n and ϵj,n are all n × 1 vectors in the jth equation,
xl,n is the n × 1 vector of the lth exogenous variable, and the ith element of ȳj,n
is ȳij,n = Σ_{r=1}^{n} wir,n yrj,n. From this given setup we derive the next set
of equations
yn = Bn∗ yn + Cn∗ xn + un
un = Rn∗ un + ϵn   (4.25)

where yn = vec(Yn), xn = vec(Xn), un = vec(Un), ϵn = vec(En),
Bn∗ = [(B′ ⊗ In) + (Λ ⊗ Wn)], Cn∗ = (C′ ⊗ In), and Rn∗ = (R ⊗ Wn) =
diag_{j=1}^{m}(ρj Wn).
Finally for j = 1, . . . , m, we can express the entire system by the
following common form
(1) diag(Wn) = 0;
(2) I − ρWn is nonsingular;
(3) the row and column sums of Wn and (I − ρWn)⁻¹ are bounded in
absolute value;
(4) the regressors have full column rank and their elements are uniformly
bounded, and the same is true for the instruments, if any;
(5) the ϵi,n's are iid with finite fourth moments;
(6) the smallest eigenvalue of their moment matrix is positive.
2 These assumptions originated in Anselin and Kelejian (1997). Later, Kelejian and Prucha
modified and extended them in their subsequent papers.
4.2. Asymptotics in Spatial Econometrics: Methods of Analysis 57
For consistency of the 2SLS estimator δ̃n, Kelejian and Prucha use the
following theorem on triangular arrays.
A.1: If {νi,n, 1 ≤ i ≤ n, n ≥ 1} ∼ iid(0, σ²) and An = {aij,n, 1 ≤
i ≤ n, n ≥ 1} are both triangular arrays, then n⁻¹/² An′νn →D
N(0, σ²Q).
The idea is to break the 2SLS estimator into two parts, and apply the
theorem to the second part to derive the required result.
For consistency of 2SLS estimators with spatial correlation, Kelejian
and Prucha use the following result.
A.2: For a triangular array {ξi,n, 1 ≤ i ≤ n, n ≥ 1}, a sufficient
condition for n⁻¹ Σ_{i=1}^{n} ∥ξi,n∥^s = Op(1), s > 0, is E|ξi,n|^s ≤ cξ < ∞.
Since one can decompose the estimated residuals from a regression with
a spatial error term as

ũn = un + Zn(δ − δ̃n),

Kelejian and Prucha showed that (δ − δ̃n) = Op(n⁻¹/²) and, following A.2,
that E|zij,n|³ ≤ cz. If Z includes only non-random x's, then this is not a problem.
But since in their case it includes the endogenous lagged regressors, they
expand ȳn = Wn yn and use the triangle inequality to prove that E|ȳi,n|³ ≤
cz.
and showed that each term ∆in is Op(1). So, using the assumptions
on the weights matrix and the error distribution together with Chebyshev's
inequality, the required result of error variance consistency follows.
Note that, in order to replicate this large-sample distribution in
small samples, one needs to check various model assumptions and
conduct a Monte Carlo study to see how this generalized 2SLS (GS2SLS,
as they call it) procedure performs in comparison to the infill strategy of Lahiri
(1996), the traditional ML-based estimators of Anselin (1988), the QMLE
approach of Lee, and Conley (1999)-type GMM procedures. The
approximation in small samples is not difficult provided that the weights
matrix and data follow common features like symmetry, sparseness, etc.
A comparison of bias and RMSE should shed some light on the effects
of sample size and the number of neighbors on the small-sample properties
of alternative estimators. From the literature it is still not clear
which methods are robust when it comes to their bias properties, exact
parametrization of the model, or small-sample efficiency. Also, it would
be interesting to compare three different error structures, as in Kelejian and
Prucha (1999) and Anselin and Moreno (2003), to see the effects of
non-normality, if any.
The limiting distribution of the estimated θ's is obtained using a mean
value expansion of g(Xsi; θ). Then, given the assumptions, he applied a
central limit theorem due to Bolthausen (1982) for stationary, mixing
random fields on regular lattices. Finally, the use of Slutsky's theorem
and the Cramér-Wold device gives the result.
So basically he shows that GMM estimators remain consistent under
spatially dependent structures, and the distribution theory can be obtained
provided one can handle the complications of spatially dependent data
appropriately.
Conveniently, one can implement this procedure for real data in two
ways: the structure of spatial dependence can affect point estimates,
or it can have some impact on inference procedures. Even though
in his paper Conley never implemented the first option, one can try to
approximate it by the same algorithm as Hansen's (1982, 1984) two-step GMM
approach. It would be interesting to see how the over-identified case differs
from ML, 2SLS, and GM-based approaches when we allow spatial
dependence. For covariance matrix estimation he suggested a two-dimensional
Bartlett-window-type weight function.
Lee mentioned that for models where each unit has a few neighboring
units, {hn} should be bounded. Also, for the LS approach, whether {hn}
is bounded or divergent has important implications. For example, for a
SAR model like Yn = Xnβ + λWnYn + Un, the LS estimators of β and
λ are consistent when {hn} is divergent and inconsistent when {hn}
is bounded (Lee 2002). As an example where {hn} → ∞ while satisfying
the above two assumptions, he considers that of Case (1991).
Basically, assumption 2 excludes the case where {hn} diverges
to ∞ at a rate ≥ n, the rate of the sample size, because the
MLE would be inconsistent otherwise. Also, when {hn} is a divergent
sequence, the MLEs λ̂n and σ̂n² become asymptotically independent.
So it becomes easy to bypass the discussion of Anselin and Bera (1998)
on the impact of their asymptotic dependence on statistical inference,
described in equation (4.7).
3 To make the comparison with Conley simple, here we denote s as some unit in the
population; Andrews used γ to denote them, but both are basically
the same.
field, which highlights the importance of, and attention to, this particular area
of research. In this section we present a brief overview of the
existing tests for spatial dependence.
ϵ = λW ϵ + u (4.29)
and the null hypothesis for our test is H0 : λ = 0. The general formula
for the RS test is

RS = d(θ̃)′ I(θ̃)⁻¹ d(θ̃),   (4.30)

where d(θ) = ∂l(θ)/∂θ is the score vector, I(θ) = −E[∂²l(θ)/∂θ∂θ′]
is the information matrix, l(θ) is the log-likelihood function, and θ̃ is
and the inverse of the information matrix is given by (4.7). Even though
under ρ = 0 the information matrix is not block diagonal, the asymptotic
variance of dρ is obtained from the reciprocal of the (1, 1) element of

[ tr(W² + W′W) + (WXβ)′(WXβ)/σ²    (X′WXβ)′/σ²
  X′WXβ/σ²                          X′X/σ² ]⁻¹
Finally, the RS test against a spatial lag alternative takes the following form:

RSρ = d̃ρ²/Ĩ(θ̃) = [e′We/σ̃²]² / {(WXβ̃)′(I − X(X′X)⁻¹X′)(WXβ̃)/σ̃² + tr(W² + W′W)}   (4.36)

Similar to RSλ, under the null H0 : ρ = 0, the RSρ statistic also has a
χ²₁ distribution asymptotically.
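As an illustration, the simpler RS statistic against the spatial error alternative, RSλ = [e′We/σ̃²]²/tr(W² + W′W), can be computed directly from OLS residuals. The circular-lattice W and the simulated data satisfying the null are assumptions for the sketch.

```python
import numpy as np

# Illustrative RS (LM) statistic for spatial error dependence from OLS
# residuals; under H0 it is asymptotically chi-squared with 1 df.
rng = np.random.default_rng(11)
n = 200
W = np.zeros((n, n))
for i in range(n):                           # row-standardized circular lattice
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)   # H0: no spatial error

b = np.linalg.solve(X.T @ X, X.T @ y)               # OLS estimate
e = y - X @ b
sig2 = e @ e / n                                    # ML error variance
RS = (e @ (W @ e) / sig2) ** 2 / np.trace(W @ W + W.T @ W)
# compare RS with the 5% chi-squared(1) critical value 3.84
```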
4.4 References
In this section, we intend to provide an up-to-date and almost complete
list of references.