Sunteți pe pagina 1din 38

Smoothed Generalized Method of Moments Estimation of Threshold Spatial Autoregressive Model

Zheng-Yu Zhang Center for Econometric Study Shanghai Academy of Social Sciences zyzhang@sass.org.cn Manuscript

Abstract We propose a class of threshold spatial autoregressive (TSAR) models that allow for heterogenous spatial dependence over dierent subpopulations. We develop for TSAR model a generalized method of moments (GMM) estimator that combines the essential features of Seo and Linton (2007)s smoothed least squares estimator for conventional threshold regression and Lee (2007)s GMM framework for linear SAR model. The resulting estimator is shown to be consistent and asymptotically normal. Choice of the best moment condition is discussed to motivate a more ecient feasible iterated GMM estimator with an initial estimator based on certain non-best moment condition. Simulation results indicate that the suggested estimators perform well in nite samples.

JEL Classication: C13; C14; C21;


Keywords: Threshold model; Spatial autoregressive model; Generalized method of moments; Kernel smoothing;

Introduction
Recently, topics concerning spatial dependence have received increasing attention in analyzing

economic problems using cross sectional or panel data. Economic models underpinning empirical work in urban, environmental, industrial organization and growth convergence often suggest that observed outcome variables under investigation are not independent of each other. Among the various models characterizing spatial dependence, the most popular one is perhaps the spatial autoregressive (SAR) model of Cli and Ord (1973), in which the value of the dependent variable corresponding to each cross-sectional unit is assumed, in part, to depend on a weighted average of that dependent variable corresponding to neighboring cross-sectional units, i.e.,
n

yi = 0
j=i

wij yj + xi 0 + ui , i = 1, , n,

(1)

where n is the number of total cross sectional units, yi is the outcome variable for i, 0 is the SAR parameter, [wij ]i=j;i,j=1, ,n s are termed as the spatial weights, which determine the structure of dependence among cross sectional units, xi is a kx -dimensional column vector of exogenous regressors (including a constant term), 0 is a kx -dimensional vector of slope coecients and ui s are i.i.d. error terms with zero mean and nite variance.
1

A number of methods have been proposed to estimate SAR model, including the method of maximum likelihood (ML) by Ord (1975), the method of moments (MM) by Kelejian and Prucha (1999, 2010), the method of quasi-maximum likelihood estimation (QMLE) by Lee (2004), the method of two-stage least squares (2SLS) by Kelejian and Prucha (1998), Lee (2003) and the generalized method of moments (GMM) by Lee (2007), Lin and Lee (2010). A common feature of these methods is that they are all developed to estimate the model with linear SAR structure and the SAR parameter held constant over the entire population. In this paper, we consider estimation of an alternative specication of the spatial model that aims to relaxing the linearity of SAR structure and constancy of SAR parameter. Particularly, we consider the following threshold SAR (TSAR) model, 0 y i + xi 0 + ui , if ti 0 ; yi = i = 1, , n,
d (0 + d )y i + xi (0 + 0 ) + ui , if ti > 0 ; 0

(2)

where y i =
1

j=i wij yj

is the spatial lag, t is the observed threshold variable, which can be an

Throughout the paper, any parameter with subscript zero represents the true parameter that generates the data.

d element of x and 0 is the threshold parameter.2 The coecients d and 0 then capture the 0

threshold eects for spatially lagged term and exogenous regressors. In general, consideration of TSAR model like (2) can be motivated from the following two aspects: First, the term spatial eects originally incorporates the notions of both spatial dependence and spatial heterogeneity. While modeling spatial dependence has received a lot of attention, spatial heterogeneity has not been adequately accounted for especially in analyzing cross sectional data.3 For example, spatial units are often heterogeneous in individual characteristics like size and location; members of social group interact with the strength and structure of social interaction changing across groups. See, e.g., Glaeser et al. (1996), LeSage (1999) for more discussions on spatial heterogeneity. Recent theoretical consideration of spatial heterogeneity includes Lin and Lee (2010), who extend the GMM method to allow for heteroscedasticity in the SAR model, Kelejian and Prucha (2010), who consider the GMM estimation with heteroscedasticity for a more general spatial model, and among others. On the other hand, linear SAR model such as (1) assumes both the direction and magnitude of spatial dependence to be constant over the entire population. Such spatial homogeneity, sometimes, is fairly restrictive. Recognizing this, Su and Yang (2011) allow for quantile-dependent SAR parameter and develop an instrumental variable quantile regression estimator for SAR model. Unlike the specication of quantile-dependent SAR parameter in Su and Yang (2011), we allow the SAR parameter in model (2) to be dependent on the threshold variable. The practical relevance of this TSAR model naturally arises when both spatial and threshold eects should be taken into account in analyzing a problem. Taking the housing pricing for example, it would be very reasonable to think that the way the prices of houses spatially correlated to each other in wealthy regions (with population proportion of low income status below some threshold) to be dierent from the way in poor regions (with population proportion of low income status above some threshold). Second, the threshold model (also called change-point model, two-phase regression, or sample splitting) has wide application in economics. Typical applications include modeling multiple equilibria in cross sectional growth convergence (Durlauf and Johnson, 1995), considering a countrys
2

To x the idea, we consider model (2), which contains rst-order spatial autoregression, only one threshold

variable and one threshold parameter. Some notationally complicated but essentially parallel work may extend our analysis to the TSAR model with higher-order spatial lags, multivariate thresholds and more than one threshold parameters. One may refer to Lee and Liu (2010) for a treatment of high order SAR model within the GMM framework. And Seo and Linton (2007) consider the threshold regression with multiple threshold variables. 3 For panel data models, this inadequacy is mitigated as one may allow for xed or random individual eects to capture unobserved spatial heterogeneity.

growth behavior is determined by its initial condition. Using the initial per capital output as the threshold variable, Hansen (2000) nds reasonable evidence for a two-regime specication in explaining cross-country real GDP growth. Meanwhile, the spatial issue has been widely explored in explaining cross sectional growth in recent years. Although economies are assumed to be independent in the neoclassical growth theory, technological advances in one economy might be transmitted to other economies. Consequently, the closed economy assumption might not be valid, which justies inclusion of the spatial dependence as an important factor in explaining growth convergence. For example, Yu and Lee (2009) adopt a spatial dynamic panel data approach to study regional growth convergence in the U.S. economy although they do not consider the threshold eect in their model. It can be argued that, if one considers both the threshold eect and the spatial dependence together, then the resulting model will be the TSAR model. The literature on threshold regression is vast. It has been studied for time series autoregressive models (Chan, 1993), for linear regression models (Hansen, 2000; Koul and Qian, 2002), for nonparametric models (Delgado and Hidalgo, 2000), for transformation models (Pons, 2003) and for binary choice model (Lee and Seo, 2008). However, none of these prior studies has explicitly taken the element of spatial dependence or more generally, cross sectional dependence into consideration and this paper serves to ll the gap. Compared with conventional threshold regression models, the setting of our STAR model complicates the econometric analysis in two ways. First, for TSAR model, the spatially lagged variable y i is generally correlated with the error term, which precludes any LS-based estimation procedures such as Hansen (2000) or Seo and Linton (2007). Although inclusion of an endogenous regressor in threshold regression is not new, e.g., Caner and Hansen (2004), naive two stage least squares estimator that replaces the endogenous part with its predicted value, and then implements the LS procedure by minimizing the sum of squared errors is not applicable to SAR model. The major reason seems to be that for SAR model like (1)-(2), observed variables yi , y i , xi are unlikely to be independently distributed across i, thus the residuals produced by the rst-stage regression are generally spatially correlated. As a result, the second stage estimation based on least sum of squared errors will be problematic. Second, in the prior studies of threshold model, the regressors are treated as random variables and are assumed to be identically and independently distributed across individuals. In contrast, most literature on spatial econometrics treats the regressors non-stochastic (Kelejian and Prucha, 1998, 1999, 2001, 2010; Lee, 2004, 2007; Su and Jin, 2009; Su, 2011, and among others). Therefore, some technical modication is needed to adapt the asymptotic analysis that was developed for

conventional threshold regression to the spatial context.4 The main theoretical concern for our TSAR model seems to be whether a desirable estimator of SAR parameter can be achieved in the presence of threshold eect, and vice versa, whether a desirable estimator of threshold parameter can be obtained in the presence of spatial dependence. To this end, we begin with the GMM framework for spatial model as systematically developed by Lee (2007). To make the standard Taylor expansion applicable in both calculation and asymptotic analysis, we smooth the objective function of GMM estimation in the spirit of Seo and Linton (2007)s smoothed LS estimator. The resulting smoothed GMM estimator (SGMME) is shown to be consistent and asymptotically normally distributed. As our SGMME is introduced based on a general set of moment conditions, we derive the best moment condition to motivate a more ecient feasible iterated GMM estimator with an initial estimator based on certain non-best moment condition. Simulation results indicate that the suggested SGMME performs well in nite samples. The rest of the paper is organized as follows: Section 2 introduces the smoothed GMM estimator and the asymptotic properties are established in Section 3. Section 4 discusses choice of the best moment condition and denes the feasible best SGMME. Section 5 reports some Monte Carlo results and Section 6 concludes. The proofs of propositions are collected in an Appendix.

Smoothed GMM Estimator


Dene di () = 1(ti > ), y d () = y i di (), xd () = xi di (), y d = y d (0 ) and xd = xd (0 ). i i i i i i
5

Write model (2) as


d yi = 0 y i + xi 0 + d y d + xd 0 + ui , i = 1, , n. 0 i i

(3)

Stacking the variables yi , xi and ui s, i = 1, , n, (3) can be further written in its matrix form,
d Yn = 0 Wn Yn + Xn 0 + d Dn Wn Yn + Dn Xn 0 + Un , 0

(4)

where Yn = (y1 , , yn ) , Wn = [wij ]i,j=1, ,n with wii = 0, i = 1, , n, Xn = (x1 , , xn ) ,


Dn () = diag{d1 (), , dn ()} and Dn = Dn (0 ). Dene Zn () = [Wn Yn , Xn , Dn ()Wn Yn , Dn ()Xn ], = (, , d , d ) , = ( , ) , Un () = Yn Zn () . Observe that for model (4), the spatial

lag terms Wn Yn and Dn Wn Yn are generally correlated with the error term Un , which implies that
4

For example, a central limit theorem type result in Horowitz (1992) is adapted to accommodate our spatial case.

See Lemma 6 in Appendix. 5 Since the notations are fairly heavy for this paper, we collect them in Appendix A.

the LS-based estimator of that intends to minimize the following sum of squared errors, Sn () = 1 U ()Un () n n (5)

is generally inconsistent. To establish a consistent estimation procedure for (4), lets follow Lee (2007)s GMM framework for linear SAR model. Assume that there exists a n kq matrix Qn constructed by some functions of Xn and Wn . The moment functions corresponding to the orthogonality condition of Qn and Un () can be written as mn () = 1 Q Un (). n n (6)

As long as the components that construct Qn are non-stochastic or independent of the error term Un ,
there must hold the moment conditions (6) since at 0 = (0 , 0 ) , E(Qn Un (0 )) = E(Qn Un ) = 0.

The GMM estimator based on (6) can be viewed as the solution to the following minimization problem, n = arg min Sn () = arg min mn ()An mn ()

(7)

where is some compact set of R2kx +3 that contains 0 as the interior. An is some kq kq nonnegative denite matrix and is assumed to converge to a constant matrix A. This design corresponds to Hansen (1982)s GMM setting, which can be used to illustrate the optimal weighting issue. One undesirable aspect of the objective function Sn () is that it is not continuous in because enters Sn () through the factor 1(ti > ) and this property precludes asymptotic analysis using standard Taylor expansion. In the spirit of the smoothed maximum score estimator of Horowitz (1992), Seo and Linton (2007) suggest replacement of this discontinues step function with some integrated kernel function K(s) with
s

lim K(s) = 0, and

lim K(s) = 1.

Dene the n n matrix Kn (, h) = diag K( t1h ), , K( tnh ) . The smoothed version of s Zn () and Un () then can be dened as Zn (, hn ) = [Wn Yn , Xn , Kn (, hn )Wn Yn , Kn (, hn )Xn ], s s Un (, hn ) = Yn Zn (, hn ) , where the bandwidth hn goes to zero as n . The proposed

smoothed GMM estimator (SGMME) is the minimizer to the following smoothed quadratic criterion function, or,
s n = arg min Sn (, hn ) = arg min ms (, hn )An ms (, hn ) n n

(8)

where ms (, hn ) = n 1 Q U s (, hn ). n n n 6 (9)

At rst glance, the minimization problem posed by (8) will be computationally demanding as it involves optimization over a (2kx + 3)-dimensional space, specically when kx is large. Observing that Un () is linear in conditional on , the computationally easiest method to calculate this SGMME is through proling. Given , the linear GMM estimator of is given by
s s n () = Zn (, hn )Qn An Qn Zn (, hn ) 1 s Zn (, hn )Qn An Qn Yn .

(10)

Then n can be dened as the minimizer to the concentrated quadratic criterion


s,c n = arg min Sn (, hn ) = arg min ms,c (, hn )An ms,c (, hn ) n n

(11)

where ms,c (, hn ) = n 1 s Q Yn Zn (, hn )n () . n n (12)

Given n , then can be estimated by n (n ).

For the moment, no specic form of this IV matrix Qn has been given. The best IV matrix will be derived and the feasible best SGMME will be introduced in Section 4. In Section 3, we develop a general asymptotic theory for this GMM estimation as long as Qn meets some regularity conditions.6 As will become evident, the best chosen moment condition derived in Section 4 will meet these regularity conditions automatically. The purpose of this plan is two-fold. First, since the best SGMME introduced in Section 4 should rely on some initial consistent estimate, developing a GMM framework bases on a generic set of (non-best) moment conditions seems to be necessary. Second, this plan parallels Lee (2007) in developing the GMM estimation for linear SAR model.
7

To make our SGMME work well, the IV matrix Qn is required not to contain any unknown parameters that need to be estimated. In other words, if Qn contains some unknown parameters, they usually have to be replaced by their consistent estimators. This feature of IV matrix is also implicitly maintained in Kelejian and Prucha (1998), Lee (2003) and Lee (2007) among others, to develop their own ecient two-stage or multi-stage estimators, respectively. Similarly, implementing the best SGMME calls for a consistent but not necessarily ecient SGMME perhaps based on certain non-best Qn . Surely, such choice of Qn is not unique. According to our experience in the Monte Carlo experiment,8 given , one may set Qn (, hn ) = [Q (, hn ), Q (, hn )], where n n

Q (, hn ) = [Q1n (, hn ), Kn (, hn )Q1n (, hn )], n


6 7

(13)

See Assumption 5 in Section 3. Lee (2007) also adopts this plan. In other words, he rst establishes the asymptotic theory for the spatial GMM

estimator based on a general set of moment conditions and then discusses choice of the best moment condition. 8 See Section 5 for a detailed description of Monte Carlo design and various estimation procedures dened there.

Q1n (, hn ) is composed of linearly independent columns chosen from


ds 2 ds In , Wn , Wn (, hn ), Wn , Wn (, hn ) 2 ds p ds , Wn Wn (, hn ), , Wn Wn (, hn ) q

, Xn ,

ds p and q are nite positive integers, Wn (, hn ) = Kn (, hn )Wn and

Q (, hn ) = h1 [K(1) (, hn )Q1n (, hn ), K(1) (, hn )ln , K(1) (, hn )Xn ], n n Kn (, hn ) = diag K (1)


(1) t1 hn

(14)

, , K (1)

tn hn

with K (1) () being the rst derivative of K().


s [Zn (,hn ) ]

The intuition underlying such choice of Qn given by (13)-(14) can be argued as follows. First, this Qn is divided into Q and Q , where Q intends to instrument n n n

s = Zn (, hn )

and Q is designed to instrument n

s [Zn (,hn ) ] .

Second, as will become evident later, here Qn is

intentionally dened as an approximation to the best IV matrix derived in Section 4.9 Last but not least, one may note that even the initial SGMME dened above has to rely on some subjectively chosen parameter . In practice, the compact set is reasonably set as large as [mini=1, ,n {ti }, maxi=1, ,n {ti }]. However, it can be argued that not each point within this is appropriate for being used as . In fact, should be such chosen that the resulting Qn (, hn ) can meet the identication condition in Section 3.2. Although we defer formal analysis of identication to the next section, an illustrative example can be given here to facilitate understanding. If is chosen suciently near the boundary of , say, = mini=1, ,n {ti } or = maxi=1, ,n {ti }, then Kn (, hn ) will be a zero matrix or an identity matrix. As a result, Qn unlikely has the full column rank, thus violating the rank condition. All these considerations above seem to make it necessary to explore in the Monte Carlo simulation whether our SGMME is sensitive to dierent choices of with dierent distances from 0 . Our simulation results in Section 5 indicate that even in case that is chosen substantially distant from the true 0 , the estimator performs fairly robustly after iterating enough times.

3
3.1

Asymptotic Properties
Assumptions

To analyze the asymptotic properties of the SGMME, lets introduce the following regularity conditions.
2 Assumption 1. (i) The ui s, i = 1, , n, are i.i.d. with zero mean, variance 0 and the nite
9

The accuracy of the initial estimator will not aect the asymptotic eciency of the best SGMME, but may aect

its nite sample performance.

absolute moment of order three. (ii) The regressor matrix Xn is non-stochastic and its elements are uniformly bounded by some positive constant. (iii) is a compact subset of R2kx +3 that includes 0 as the interior. Existence of the absolute moment of order three for ui s is assumed so that some central limit theorem of simple form (Lemma 6 in Appendix B) can be applied. Since our key idea to derive this CLT is to check the Liapounov condition, this assumption can be relaxed to the existence of some absolute moment of order higher than two. The non-stochastic design assumption of Xn is made for several reasons. First, it parallels that of Kelejian and Prucha (1998, 1999, 2001, 2010), Lee (2004, 2007) and Lin and Lee (2010). Second, it allows us to avoid the use of trimming factors (e.g. Robinson, 1988). As noted by Lee (2007), non-stochastic regressor design and its uniform boundedness condition are made for technical convenience. If the elements of Xn are stochastic and have unbounded ranges, conditions in Assumption 1-(ii) can be replaced by some nite moment conditions. Assumption 1-(iii) is often maintained for asymptotic analysis of a nonlinear extremum estimator (Amemiya, 1985). Furthermore, for the proof of consistency of the extremum estimator, the uniform convergence argument will usually require a compact parameter space. Assumption 2. The scalar threshold variable ti s, i = 1, , n are non-stochastic and there exists some positive and continuous function f () such that 1 lim n n
n +

(ti ) =
i=1

(t)f (t)dt

(15)

for any bounded real function (), where |f (t)| cf , its rst derivative f (1) (t) exists, is continuous and bounded. Eqn. (15) is frequently seen in spatial econometrics literature involving nonparametric techniques (Su and Jin, 2010; Su, 2011), or more generally, in nonparametric regression literature with xed regressor design (Linton 1995). Essentially, Eqn. (15) relates the xed design to an implicit random generation mechanism. Here f (t) can be interpreted as the underlying probability density function that generates ti , i = 1, , n. If one lets 1, Eqn. (15) is reduced to the fact that f (t) integrates to one over the real line. Essentially, ti s are required to have everywhere positive density with respect to Lebesgue measure. Furthermore, the smoothness condition that f (1) is continuous and bounded is not necessary for the consistency but will be used to establish the asymptotic distribution of SGMME. Finally, even though we focus on the xed regressor case, our analysis holds with probability one if ti s are generated randomly, and in this case, we can interpret our analysis as being conditional on ti s. Dene Sn (, d , ) = In Wn d Dn ()Wn , Sn = Sn (0 , d , 0 ). 0 Assumption 3. (i) The spatial weights matrix Wn has zero diagonals. (ii) The matrix 9

1 Sn = In 0 Wn d Dn Wn is nonsingular. (iii) The row and column sums of Sn and Wn are 0

uniformly bounded in absolute value.

10

Assumption 3 concerns the essential features of spatial weights matrix. Assumption 3-(i) implies that each unit is not a neighbor of itself. Assumption 3-(ii) implies that the TSAR model considered is well dened, that is, the dependent variable Yn is uniquely determined in terms of
1 the disturbances Un . The uniform boundedness assumptions on Wn and Sn are originated in a

series of papers by Kelejian and Prucha, see, e.g., Kelejian and Prucha (1998, 1999), in order to limit spatial dependence across units to a manageable degree. Assumption 4. (i) lims K(s) = 0, lims K(s) = 1. (ii) K() is twice dierentiable everywhere, K (1) (s) is symmetric around zero and both K (1) () and K (2) () are uniformly bounded.(iii) |sl K (1) (s)|ds < for l = 0, 1 and 2. (iv)hn 0, nh2 and nh3 0. n n These conditions are similar to those maintained by Seo and Linton (2007). As noted by Seo and Linton, a kernel that satises these conditions is K(s) = (s) + s(s), where (s) and (s) are the cumulative distribution and probability density function of the standard normal distribution, respectively. Finally, hn = ln n n1/2 satises Assumption 4-(iv). Assumption 5.
(1)

The n kq matrix Qn = [Q , Q ], where Q is n kq and Q is n kq . n n n n


Q = h1 Kn (, hn )Q , where Kn (, hn ) is dened below (14). Furthermore, the elements of n n 1n Q and Q are non-stochastic and uniformly bounded. n 1n Assumption 5 provides the regularity conditions for a general Qn to meet. There are several features of these conditions that need to be noted. First, Qn can be divided into two parts, where Q is implicitly assumed to identify and Q is to identify . Second, both Q and Q are n n n 1n

(1)

assumed to be uniformly bounded. Given the uniform boundedness condition of Xn and Wn by Assumption 1 and 3, it is reasonable to construct Q and Q based on some combinations of n 1n

Xn and Wn . Third, it seems controversial at rst glance that Q is required to contain a scaling n factor h1 Kn (, hn ). As a matter of fact, similar to Seo and Linton (2007), it can be shown in n the next section that the SGMME of has the usual convergence rate n while the estimator of converges at a rate nh1 , faster than n. Given dierent convergence rates for and , n
10

(1)

The row and column sums of an n n matrix Pn are said to be uniformly bounded if we have for all n, there
Pn
j=1

exists a positive constant c independent of n with maxi of an n n matrix Pn is dened as Pn dened as Pn

|Pn,ij | < c and maxj

Pn

i=1

|Pn,ij | < c. This notion of


1

uniform boundedness can be dened in terms of some matrix norms. The maximum column sum matrix norm = maxi
1}

= maxj

|Pn,ij |, and the maximum row sum matrix norm

is

|Pn,ij |. Thus the uniform boundedness of {Pn } in column or row sums is equivalent to
}

the sequence { Pn

or { Pn

being bounded.

10

it might be expected that the moment conditions may have dierent stochastic orders to identify dierent parameters. Fourth, it can be seen from our proofs of propositions that the kernel function
s K() in Assumption 5 is not necessarily the same as the one that enters Un (, hn ) in (9), as long as

both K()s satisfy Assumption 4. (13)-(14) satises Assumption 5.

11

As an example, one may easily verify that the Qn dened by

3.2

Consistency

First consider the identication issue based on the moment functions (9). In the GMM framework, the identication condition requires the unique solution to the limiting equations, lim E (ms (, hn )) = 0 n (16)

1 1 implies that = 0 (Hansen, 1982). Denote Gn (, d , ) = Wn Sn (, d , ), Gn = Wn Sn , Xn =

(Xn , Dn Xn ) and = ( , d ) . The following assumption provides a sucient condition for identication uniqueness. Assumption 6. There exists some compact set that contains 0 as interior and for any , the limit of 1 1 (1) Q Gn Xn 0 , Xn , Dn Gn Xn 0 , Dn Xn , Kn (, hn )(d Gn Xn 0 + Xn d ) n n hn has the full column rank 2kx + 3, where Kn (, hn ) is dened below Eqn. (14). A necessary condition for (17) to have the full column rank is that the columns of
Gn Xn 0 , Xn , Dn Gn Xn 0 , Dn Xn , (1)

(17)

1 (1) K (, hn )(d Gn Xn 0 + Xn d ) hn n

should be linearly independent, which also implies the following regularity conditions: (i) Xn has the
d full column rank kx . (ii) At least one element of 0 = (0 , 0 ) is nonzero. Otherwise, Gn Xn 0 0. d d (iii) At least one element of 0 = (d , 0 ) is nonzero. Otherwise, 0 (1) 1 d d hn Kn (, hn )( Gn Xn 0 +Xn )

d is zero at 0 , violating the requirement that (17) has the full rank for any . Note that this

assumption of nonzero threshold eect is also maintained by Hansen (2000, Assumption 1-(6)) and Seo and Linton (2007, Assumption 1-(c)). Proposition 1.(Identication) Suppose that Assumption 1-4 and 6 hold, is uniquely identied relative to from the moment conditions (9).
11 s Since there are no reason for that we must choose dierent kernels for Un (, hn ) and Qn , we use the same kernel

throughout the estimation procedure for convenience.

11

Observe that by Assumption 4-(ii), K(1) () should be continuous. As a result, 1 Q K(1) (, hn )(d Gn Xn 0 + Xn d ) nhn n n is continuous in , d and d . If (17) has the full column rank at 0 , there must exist a neighborhood B(0 , ) centering on 0 with radius , such that (17) has the full rank over B(0 , ) too, which gives the following local identication condition: Corollary 1.(Local identication) If the limit of 1 1 (1) d Q K (0 , hn )(d Gn Xn 0 + Xn 0 ), Gn Xn 0 , Xn , Dn Gn Xn 0 , Dn Xn 0 n n hn n 0 from the moment conditions (9). Lets dene two important scaling matrices that will be used throughout the rest of the paper, namely, R1n = diag 1, , 1, hn and R2n = diag 1, , 1, hn , , hn . The following
2kx +2
kq kq

(18)

has the full column rank 2kx + 3, then is uniquely identied relative to a neighborhood containing

proposition establishes the consistency of the SGMME. Proposition 2.(Consistency) Suppose that Assumption 1-6 hold, An is a positive semi-denite
1 1 matrix, An = R2n An R2n has the order O(1) and it converges in probability to a positive denite

matrix A, then the SGMME n as a solution to the minimization problem (8) is a consistent estimator of 0 .
1 1 At rst glance, the condition R2n An R2n = O(1) seems unusual. There are two ways to under-

stand this arrangement. First, multiplication of An by scaling matrices can oset the unbalanced stochastic order in the moment conditions since Q and Q have dierent orders by Assumption n n

5. Second, as will become evident later, for eciency consideration, the optimal weighting matrix Ao,n derived in Section 4, satises this condition automatically.

3.3

Asymptotic normality

For GMM estimators, their limiting distributions usually involve the expected rst order derivative and covariance-variance (CV) matrix of the moment functions evaluated at the true parameters. For our SGMME, the expected derivative of moment functions (9) evaluated at 0 is given by E
ms (,hn ) n =0

= n =

1 n

d Q Gn Xn 0 Qn Xn Q Kn Gn Xn 0 Q Kn Xn Q h1 Kn (d Gn Xn 0 + Xn 0 ) n n n n n 0

(1)

, (19)

Q Gn Xn 0 n

Q Xn n

Q Kn Gn Xn 0 n

Q Kn Xn n

d Q h1 Kn (d Gn Xn 0 + Xn 0 ) n n 0

(1)

12

where Kn = Kn (0 , hn ) and Kn = Kn (0 , hn ). In Appendix C, we show that R2n n R1n is O(1) and is asymptotically equivalent to n = 1 n
Q Gn Xn 0 Q Xn Q Kn Gn Xn 0 Q Kn Xn n n n n

(1)

(1)

0
(1)

d Q Kn (d Gn Xn 0 + Xn 0 ) n 0 (20)

Similarly, in Appendix C, we also show that the re-scaled CV matrix of the unsmoothed moment conditions evaluated at the true parameters, Var nR2n mn (0 ) is asymptotically equivalent to 2 0 0 Q Q n n . n = (21) n 0 hn Qn Qn and that n = O(1). The following proposition provides the asymptotic distribution of the SGMME. Proposition 3.(Asymptotic distribution) Suppose that Assumption 1-6 hold, then the SGMME has the limiting distribution
n(n 0 )

nh1 (n n

0 )

d N (0, )

where = lim
n

n An n

n An n An n n An n

(22)

1 1 provided that the limits of n , n and An = R2n An R2n exist, where n , n are given by (20) and

(21). Similar to Seo and Linton (2007), smoothing accelerates the convergence rate of n . Given our usual choice of the bandwidth hn = ln n n1/2 , this rate can be almost as fast as n. Second, unlike Seo and Linton, where the smoothed LSE of the slope coecients and the threshold parameter are
asymptotically independent, with general choice of An , one may not conrm that n and n are

asymptotically independent. But this property is restored if we use the optimal weighting matrix discussed in the next section. Third, although we do not explicitly treat it, the case of small
d threshold in Hansen (2000) can be analyzed within the same framework. Specically, when 0 is d replaced by 0 n , one still obtains asymptotic normality, provided that is not too large, but at

a slower rate of convergence reecting the presence of n in calculating n .

13

Best SGMME
From Proposition 3, given the moment functions (9), the optimal choice of a weighting matrix
1

will be Ao,n = n by the generalized Schwartz inequality, or equivalently, Ao,n = 1 with n = n s 1 1 R2n n R2n = AsyVar nmn (0 , hn ) . Then the resulting SGMME with the optimal weighting matrix will have the asymptotic variance, = lim
n 1

n n n

(23)

Applying the generalized Schwartz inequality to this asymptotic variance matrix, we see that the best IV matrix Qb,n within the class of IV matrices satisfying Assumption 5 is
(1) d Qb,n = Gn Xn 0 , Xn , Kn (0 , hn )Gn Xn 0 , Kn (0 , hn )Xn , h1 Kn (0 , hn )(d Gn Xn 0 + Xn 0 ) , n 0

(24) and the best SGMME based on Qb,n will have the asymptotic variance matrix, 2 2 0 0 0 Q Q b,n b,n R1n Qb,n Qb,n R1n = b = lim b,n = lim n n n n 0 hn Q Q
b,n

, (25)

b,n

with Q = Gn Xn 0 , Xn , Kn (0 , hn )Gn Xn 0 , Kn (0 , hn )Xn and Q = h1 Kn (0 , hn )(d Gn Xn 0 + n n 0 b,n d Xn 0 ).

(1)

In practice, with initial consistent estimates , Qb,n can be estimated by Qb,n = [Q , Q ], b,n b,n

with Q = n Gs (, d , , hn )(Xn +Kn (, hn )Xn d ), Xn , Kn (, hn )Gs (, d , , hn )(Xn +Kn (, hn )Xn d ), Kn (, hn )Xn n n (26) and
(1) Q = h1 Kn (, hn )(d Gs (, d , , hn )(Xn + Kn (, hn )Xn d ) + Xn d ), n n b,n

(27)

s1 s in which Gs (, d , , hn ) = Wn Sn (, d , , hn ), Sn (, d , , hn ) = In Wn d Kn (, hn )Wn . n

To summarize the asymptotic properties of the feasible best SGMME, lets make the following high-level assumption.
s s1 Assumption 7. Sn (0 , d , 0 , hn ) is nonsingular and the row and column sums of Sn (0 , d , 0 , hn ) 0 0

are uniformly bounded in absolute value. Assumption 7 guarantees our feasible best Qb,n dened by (26)-(27) performs as regularly as its unfeasible Qb,n . Apparently, in the presence of Assumption 3, Assumption 7 seems to be unnecessary. Let be the maximum column sum norm or the maximum row sum norm dened by footnote 14

s1 10. Given Kn (0 , hn ) as a smoothed approximation to Dn , one may think of Sn (0 , d , 0 , hn ) 0 s1 1 s1 being nite since Sn (0 , d , 0 , hn ) is close to Sn . Writing Sn (0 , d , 0 , hn ) = 0 0 1 Sn In d (Kn (0 , hn ) Dn )Gn 0 1

, we have In d (Kn (0 , hn ) Dn )Gn 0 Gn


k 1

s1 1 Sn (0 , d , 0 , hn ) Sn 0

cs
k=1 n

|d |k Kn (0 , hn ) Dn 0 |cg d |k max 0
k=1 i

= cs

ti 0 hn

1(ti > 0 )

(28)

1 where Sn = cs and Gn = cg . Since ti s have a continuous density function by Assumption 2,

for given hn , there always with probability 1 exists at least one ti within (0 hn n , 0 + hn n ) with some > 0, such that K
ti 0 hn

1(ti > 0 )

1 2

= 0, as n . Therefore, the uniform

1 boundedness of the row or column sums of Sn and Gn by Assumption 3, does not necessarily s1 assure that the row or column sums of Sn (0 , d , 0 , hn ) are also uniformly bounded. In the 0

special case of small threshold, |2cg d | < 1 as long as n is suciently large, then the summation 0 on the righthand side of (28) is nite. Then Assumption 7 holds automatically. The following proposition shows that the feasible best SGMME has the same limiting distribution as the (unfeasible) best SGMME. Proposition 4.(Feasible best SGMME) Suppose that Assumption 1-4, 7 hold, is a
consistent estimator of n , is a 2 of 0 , then the feasible SGMME

n-

nh1 -consistent estimator of 0 and 2 is a consistent estimator n

b,n = arg min ms (, hn )1 ms (, hn ), b,n b,n b,n


1 s with ms (, hn ) = n Qb,n Un (, hn ), b,n

b,n = 2 n

Q Q b,n b,n

0 hn Q Q b,n b,n

is consistent and has the limiting distribution n(b,n 0 ) d N (0, b ) 1 nhn (b,n 0 ) where b is given by (25), Qb,n is dened by (26)-(27).

15

Since we assume out spatial autocorrelation and heteroscedasticity in the disturbance Un , consistent estimate of the CV matrix is straightforward, i.e., by replacing unknown parameters in the denition of b with their consistent estimators. In the presence of unknown spatial autocorrelation and heteroscedasticity across ui s, Kelejian and Prucha (2007)s nonparametric HAC estimator may be applicable, but permitting non i.i.d. disturbances will complicate the proofs of the asymptotic analysis and more technical assumptions should be imposed.

Monte Carlo Simulation


We conduct a small-scale Monte Carlo experiment to evaluate the nite sample performance of

the SGMME for TSAR model. A larger Monte Carlo study relating to a wider set of experiments than those described below is left for future research. The design is yi = (0 + d 1(ti > 0 )) 0
j=i d d wij yj + (10 + 10 1(ti > 0 )) + xi (20 + 20 1(ti > 0 )) + ui , i = 1, , n,

(29)
d d where 0 = 0.3, d = 0.8, 10 = 0, 10 = 20 = 1 and 20 = 0.5. Like Su and Yang (2011) and Su 0

(2010), we generate the spatial weight matrix Wn = [wij ]i,j=1, ,n,i=j according to the principle of Rook contiguity, by randomly allocating the n spatial units on a lattice of n n squares, nding the neighbors for each unit, and then row normalizing. The ui s are identically and independelty drawn from a standard normal distribution N (0, 1), both the scalar regressor xi and the threshold variable ti are i.i.d. drawn from a normal distribution N (1, 1). The threshold parameter 0 is set to be 0.5. We consider two iterated SGMMEs and their corresponding feasible best SGMME. These estimators are dened below. The IV matrix used to compute initial SGMME is given by Qn (, hn ) = [Q (, hn ), Q (, hn )], n n

where Q (, hn ) = [Q1n (, hn ), Kn (, hn )Q1n (, hn )], n


d 2 d Q1n (, hn ) = ln , In , Wn , Wn (, hn ), Wn , Wn (, hn ) 2 d , Wn Wn (, hn ) Xn ,

d ln is a n-dimensional column vector of ones, Wn (, hn ) = Kn (, hn )Wn and

Q (, hn ) = h1 [K(1) (, hn )Q1n (, hn ), K(1) (, hn )ln , K(1) (, hn )Xn ]. n n

(30)

The SGMME is computed with K(s) = (s) + s(s), where and are the standard Gaussian c.d.f. and density functions respectively, hn = (ln n)n1/2 and the GMM weighting matrix An = (Qn (, hn )Qn (, hn ))1 . The resulting initial estimator n will be substituted into Qn (, hn ) 1 16

it again to compute the iterated SGMME. After it times of iteration, the resulting estimator n = n (it , it , it , d,it , d,it , d,it , it ) is used to construct the feasible best instrumental matrix dened
n 1n 2n 1n 2n n

in Section 4. To illustrate how sensitively our SGMME relies on an initial choice of , we explore three cases where is chosen to be mildly ( = 1), substantially ( = 1.5) or extremely ( = 2) distant from the true value 0 = 0.5. We compute four estimators, that is, SGMME with it = 5 iterations and its corresponding feasible best SGMME, SGMME with it = 10 iterations and its corresponding feasible best SGMME. For each estimator, we carry out 500 repetitions, with sample size n = 196, 400, 900 and 1600, and report their median and the interquartile range (the dierence between 3/4 quantile and 1/4 quantile) divided by 1.35, which is a robust estimate of the standard deviation of the estimates. Table 1 summarizes the simulation results and there are several main observations: 1. Standard deviation of these estimators declines with the sample size. The magnitude of such decline is generally consistent to n1 -asymptotics for and n-asymptotics for other parameters, which is consistent with the theoretical predictions. 2. The estimate of is essentially unbiased. Except for , other estimators incur a bias of certain degree but the bias seems to decline with the sample size. Since our SGMME of has a reasonably good performance in the presence of spatial autoregression in all the cases, the biasedness of other estimators does not seem to be specic to this TSAR model, and may be existing in other spatial models. Indeed, one can nd in Lee (2007), that for linear SAR models, 2SLSE and GMME there all have a certain degree of bias. 3. In most cases, feasible best SGMMEs usually have a smaller bias and standard deviation relative to their non-best counterparts, which is consistent to our theory. 4. Our SGMME seems quite robust with respect to the initial choice of . The estimators based on = 2 have a competitive performance relative to other estimators, which is important for applied researchers. We also explore the case where the threshold variable and the regressor has some common element, by letting ti = xi in the design above and replicating the simulations for n = 196 and 400. Table 2 summarizes these results. Overall, the resulting estimators perform similarly to those in Table 1 except that they have a larger standard deviation relative to their counterparts in Table 1, which is also observed by Seo and Linton (2006). Finally, we remain concerned with whether our SGMME is sensitive to the distribution of the threshold variable t, especially when t has a heavy tail distribution or is asymmetrically distributed. 17

We replicate the simulations for design (29) but for now t has a uniform distribution or chi square distribution. To make the results comparable, we let these distributions have the same mean and variance with N (1, 1), that is, t Unif[1 3, 1 + 3] or t 2 (1)/ 2. The results summarized in Table 3 seem to conrm that our SGMME is robust both to choice of the initial threshold parameter and to the distribution of the threshold variable.

Conclusions
Linear spatial autoregressive model assumes spatial dependence constant over the whole pop-

ulation, which can be fairly restrictive in practice. Unlike Su and Yang (2011)s spatial quantile autoregression model that allows for quantile-dependent spatial coecient, the threshold spatial autoregressive model considered in this paper assumes possible heterogenous spatial dependence across dierent subpopulations. By extending Lee (2007)s GMM framework for linear SAR model, we propose a smoothed GMM estimator and analyze its asymptotic properties. Similar to Seo and Linton (2007), it turns out for our TSAR model that both the spatial autoregressive coecient and the slope coecients have a usual convergence rate of n while the threshold parameter converges at a faster rate nh1 . We also briey discuss choice of the best moment condition, to motin vate a more ecient feasible iterated GMM estimator starting with an initial consistent estimator. One aspect in that our SGMME diers from Seo and Lintons smoothed LSE lies in that even in computing the initial estimator, some threshold parameter needs to be subjectively chosen. Given this, we are much concerned with whether our estimator is sensitive to such choice of initial value. Simulation results indicate that our SGMME is promising. They perform quite robust and stably both to the choice of initial value and to the distribution of the threshold variable. Monte Carlo simulation results also show that except for the threshold parameter, the estimators of other parameters incur a bias of certain degree. To reduce the bias, one may follow Lee (2007) to employ a more comprehensive set of moment conditions based on not only the orthogonality condition but also the quadratic moment functions originated for the estimation of pure SAR processes. Since this paper severs as a rst attempt to address the threshold spatial autoregressive model, it also leaves many topics for future research. First, it is interesting to develop a test of the threshold eect in the presence of the spatial dependence or develop a test of the spatial dependence in the presence of the threshold eect. Second, extending the TSAR model to a more general case, i.e., including high order spatial autoregression, multiple threshold variables, or even multi-regime may be more theoretically challenged but is useful for applied researchers. 18

Appendix A. Summary of Notations


yi =
n j=i

wij yj , di () = 1(ti > ), y d () = y i di (), xd () = xi di (), y d = y d (0 ), xd = xd (0 ). i i i i i i

Dn () = diag{d1 (), , dn ()}, Dn = Dn (0 ), Zn () = [Wn Yn , Xn , Dn ()Wn Yn , Dn ()Xn ]. = (, , d , d ) , = ( , ) , = ( , d ) , = (, d ) , d = (d , d ) , Un () = Yn Zn () . Kn (, h) = diag K( t1h ), , K( tnh ) , Kn (, h) = diag K (1) ( t1h ), , K (1) ( tnh ) , s s s d Zn (, hn ) = [Wn Yn , Xn , Kn (, hn )Wn Yn , Kn (, hn )Xn ], Un (, hn ) = Yn Zn (, hn ) , Wn () = Dn ()Wn , ds Wn (, hn ) = Kn (, hn )Wn . 1 Sn (, d , ) = In Wn d Dn ()Wn , Sn = Sn (0 , d , 0 ), Gn (, d , ) = Wn Sn (, d , ), Gn = 0 s s1 Gn (0 , d , 0 ), Sn (, d , , hn ) = In Wn d Kn (, hn )Wn , Gs (, d , , hn ) = Wn Sn (, d , , hn ). n 0 R1n = diag 1, , 1, hn , R2n = diag 1, , 1, hn , , hn . 2kx +2
kq kq

(1)

Appendix B. Lemmas
In this appendix, we list some lemmas which are useful for the proofs of the results in the text. Throughout the appendix, let c or C be some generic positive constant that is independent of n and is determined
2 case by case. The ui s, i = 1, , n, are i.i.d. with zero mean, variance 0 and some nite absolute moment

of order three. Lemma 1. Under Assumption 2 and 4, we have (i) 1 n


n

(1(ti > ) K((ti )/hn )) = o(1),


i=1

uniformly over , where is some compact set containing 0 as the interior; (ii) 1 n
n i=1

(1(ti > ) K((ti )/hn )) = O( nh2 ), n

for any ; (iii) For any Bn = [bij ]i,j=1, ,n that is a n n matrix with its column sums being uniformly bounded in absolute value, 1 n
n n

bij (1(ti > ) K((ti )/hn ))(1(tj > ) K((tj )/hn )) = O( nh2 ); n

i=1 j=1

(iv) Furthermore, 1 nhn uniformly over .

K (1) ((ti )/hn )(1(ti > ) K((ti )/hn )) = o(1),


i=1

Proof. For (i), the proof essentially follows Lemma 4 of Horowitz (1992). Given any > 0, 1 n
n

(1(ti > ) K((ti )/hn )) C1n + C2n


i=1

19

where C1n = and C2n =

1 n 1 n

|(1(ti > ) K((ti )/hn ))| 1(|ti | )


i=1 n

|(1(ti > ) K((ti )/hn ))| 1(|ti | < ).


i=1

Assumption 4-(i) with hn 0 together imply that for each > 0, C1n 0 uniformly over . Since K() is bounded, there exists some nite constant c such that C2n c n
n +

1(|ti | < ) = c
i=1

1(|t | < )f (t)dt = c

f (t)dt 2ccf ,

(A.1.1)

where the rst equality follows from Assumption 2, the inequality is implied by f () is bounded by some positive constant cf . Then C2n uniformly converges to zero since can be made suciently small and 2ccf is independent of . For (ii), by Assumption 2 we have 1 n = =
n

(1(ti > ) K((ti )/hn )) =


i=1 +

(1(t > ) K((t )/hn ))f (t)dt


+

nhn

(1(hn s > 0) K(s))f (hn s + )ds =


+

nhn

(1(s > 0) K(s))(f () + f (1) ()hn s)ds

nhn f ()

(1(s > 0) K(s))ds +

nh2 n

(1(s > 0) K(s))sf (1) ()ds

where and are due to change of variables and mean value theorem, respectively, and lies between and + hn s. Note that that
+ (1(s + (1(s

> 0) K(s))ds = 0, since K (1) is symmetric by Assumption 4 and for


0

any s > 0, 1(s > 0) K(s) + 1(s > 0) K(s) = 1(s > 0) K(s) + 1(s < 0) (1 K(s)) = 0. Also note > 0) K(s))sds = 2
1 n

K(s)sds = K(s)s2 |0 +

4-(iii). As a result, we see bounded by Assumption 2. For (iii), note that 1 n = 1 n


n

n i=1 (1(ti

> ) K((ti )/hn )) =

0 s2 K (1) (s)ds < by O( nh2 ) = o(1) since f (1) n

Assumption is uniformly

bij (1(ti > ) K((ti )/hn ))(1(tj > ) K((tj )/hn ))


i=1 j=1 n i=1

n j=1

(1(tj > ) K((tj )/hn ))bij ,

(1(ti > ) K((ti )/hn ))

where )/hn ))

n j=1 (1(tj n and j=1

> ) K((tj )/hn ))bij

has the uniform order O(1) since both (1(tj > ) K((tj

|bij | are bounded and these bounds are independent of i, j and n. Then the result follows

from Lemma 1-(ii).

20

For (iv), similarly we have 1 nhn = =


+ n

K (1) ((ti )/hn )(1(ti > ) K((ti )/hn ))


i=1 +

1 hn

K (1) ((t )/hn )(1(t > ) K((t )/hn ))f (t)dt

K (1) (s)(1(s > 0) K(s))f ( + hn s)ds


+

K (1) (s)(1(s > 0) K(s))f ()ds + hn

K (1) (s)(1(s > 0) K(s))sf (1) ()ds = O(hn ),

since K (1) (s)(1(s > 0) K(s)) is an odd function, both (1(s > 0) K(s)) and f (1) are bounded and the integral |K (1) (s)s| is nite. Lemma 2. Let {gn,i (), i = 1, , n}, be a sequence of stochastic real valued functions on Rk . If (i) gn,i () p 0 for each Rk , (ii) |gn,i (2 ) gn,i (1 )| Bn ( 1 2 ) almost surely for all 1 , 2 , where () is a non-stochastic function and (s) 0 as s 0, and Bn is stochastically uniformly bounded for all , then sup |gn,i ()| p 0. Proof. Davidson (1994)s Theorem 21.9 and 21.10. Lemma 3. For any two n n matrices B1n and B2n whose row and column sums are bounded uniformly bounded by a constant, (i) the row and column sums of B1n B2n are also uniformly bounded by a constant. (ii) For some nk matrices C1n and C2n whose elements are bounded by a constant, the elements of B1n C1n ,
1 n C1n C2n

and

1 n C1n B1n C2n

are uniformly bounded by a constant. Furthermore, (iii) assume that the row
1 n C3n B3n Un

and column sums of n n matrix B3n are uniformly bounded in probability, the elements of n k matrix C3n are uniformly bounded in probability, then are allowed to be correlated with Un . Proof. We only prove (iii). Others are trivial. Without loss of generality, let k = 1. n1 C3n B3n Un n1
n i=1 n j=1

= Op (1), where the elements of B3n and C3n

|ci ||uj ||bij | n1

n j=1

|uj | (

n i=1

|ci ||bij |) = n1

n j=1

|uj | Op (1) = Op (1), by the assump-

tions of Lemma 3 and the law of large numbers, where ci and bij are the elements of C3n , B3n , respectively. Lemma 4. For any two n k matrices C1n and C2n whose elements are uniformly bounded by a constant, (i) the elements of diag K (1)
(2) t1 hn

(1) 1 nhn C1n Kn (, hn )C2n is also bounded, uniformly over , where (2) 1 , K (1) tn . Similarly, nhn C1n Kn (, hn )C2n = O(1), where hn t1 hn

Kn (, hn ) =

(1)

Kn (, hn ) = diag K (2)

, , K (2)

tn hn

. Furthermore, (iii) assume that the elements of n k


(1) 1 nhn C3n Kn (, hn )Un

matrix C3n are uniformly bounded in probability, then

= Op (1), uniformly over ,

where the elements of C3n are allowed to be correlated with Un . Proof. Without loss of generality, let k = 1. For (i), we have 1 1 C K(1) (, hn )C2n = nhn 1n n nhn c hn K (1) t hn f (t)dt = c
n

c1n,i c2n,i K (1)


i=1

ti hn K (1) (s) ds.

K (1) (s) f (hn s + r)ds ccf

21

Then the desired result follows by Assumption 4. The result (ii) can be veried similarly. Given the existence of the absolute moment of order higher than two, the elements of Un are Op (1) by Markovs inequality. Then the result (iii) can be veried by analogous argument to that for (i). Lemma 5. Suppose that Bn is a n n matrix with its column sums being uniformly bounded in absolute value, elements of the n k matrix Cn are uniformly bounded, and the components of Un = (u1 , , un ) 2 are i.i.d. with zero mean and nite variance 0 . Then, 1/ nCn Bn Un = Op (1), 1/nCn Bn Un = op (1) and 1 1 2 1/ nCn Bn Un d N 0, 0 limn n Cn Bn Bn Cn if the limit of n Cn Bn Bn Cn exists and is positive denite. Proof. See Lee (2004). Lemma 6. Suppose that Bn is a n n matrix with its column sums being uniformly bounded in absolute value, elements of the n k matrix Cn are uniformly bounded, and the components of Un = (u1 , , un )
2 are i.i.d. with zero mean and nite variance 0 . Then, (1) 1 C Kn (, hn )Bn Un nhn n

= Op (1) uniformly over

. Furthermore, if limn 1 C K(1) (, hn )Bn Un d N nhn n n


2 0, 0 lim n

1 (1) C K(1) (, hn )Bn Bn Kn (, hn )Cn , nhn n n

(1) (1) 1 nhn Cn Kn (, hn )Bn Bn Kn (, hn )Cn exists and is positive nite. (1) 1 Proof. Since E nh Cn Kn (, hn )Bn Un = 0, the result follows by Chebyshevs inequality if we can show n (1) 1 that Var nh Cn Kn (, hn )Bn Un is bounded uniformly over Without loss of generality, letting
n

k = 1 and bij be the (i, j)-th element of Bn Bn , we have Var = =


2 0 nhn

(1) 1 C Kn (, hn )Bn Un nhn n

2 (1) (1) 0 nhn Cn Kn (, hn )Bn Bn Kn (, hn )Cn

n i=1

n (1) j=1 ci cj K

ti hn

K (1) K (1)

tj hn

bij bij =

2 2 C 1 0 nhn

n i=1

n j=1

K (1)

ti hn

K (1) ,

tj hn

bij

2 2 C 1 0 nhn

n i=1

K (1)

ti hn

n j=1

tj hn

2 2 C1 C2 C3 0 nhn

n i=1

K (1)

ti hn (1)

(A.1.2) where C1 , C2 are the uniform bounds of the elements of the matrix Cn and K
1 nhn n i=1

() respectively, and C3

is the uniform bound of the row sums of Bn Bn . Finally the desired result follows since we can show K (1)
ti hn

= O(1) by analogous argument to Lemma 4.


1 hn n i=1

To establish the CLT result, assume that k = 1 without loss of generality, otherwise we can always employ the Cramer-Wold device. Lets check the Liapounov condition for aj uj , where aj =

22

Cn Kn (, hn )Bnj and Bnj is the j-th column of Bn . 1 E aj uj hn j=1


n n 3 n

(1)

=
j=1

1 aj hn

E |uj | = 3
j=1 3 n

1 aj hn
n

= 3
j=1 n

1 hn
n

ci K (1) ((ti )/hn )bij


i=1

3 c
j=1 n i=1 i=1

|h1/2 K (1) ((ti )/hn )||bij | n


n

3 c
j=1 i=1

|h1/2 K (1) ((ti )/hn )|3 |bij |3 = 3 c n


n

|h1/2 K (1) ((ti )/hn )|3 n


j=1

|bij |3

3 c2 c2 ch3/2 1 n
i=1 3

|K (1) ((ti )/hn )| = O(nh1/2 ),


n j=1

where 3 = E |uj | , c, c1 and c2 are the uniform bounds of ci , K (1) and


1 nhn n i=1

|bij |3 respectively, and


1 hn n i=1

|K (1) ((ti )/hn )| = O(1) by analogous argument to (A.1.2). Then together with Var
3 3/2

aj uj =

O(n) by (A.1.2), the Liapounov condition


n j=1

E Var

1 aj uj hn n i=1

= O((nhn )1/2 ) 0

1 hn

aj uj

holds. Lemma 7-10 are used to complete the proof of Proposition 4. Suppose that all the assumptions for Proposition 4 hold for the following lemmas. Lemma 7. Assume that Cn is a n k matrix whose elements are uniformly bounded by a positive constant, then (i) (ii) (iii)
1 n 1 n 1 n

Gs (, d , , hn )Xn Gn Xn 0 Cn = op (1), n

d Gs (, d , , hn )Kn (, hn )Xn d Gn Dn Xn 0 Cn = op (1), n

Kn (, hn )Gs (, d , , hn )Xn Kn (0 , hn )Gn Xn 0 Cn = op (1), n


1 n d Kn (, hn )Gs (, d , , hn )Kn (, hn )Xn d Kn (0 , hn )Gn Dn Xn 0 Cn = op (1). n

and (iv) Proof.


1 n

We only prove (i), and (ii)-(iv) can be proved analogously. Since (


1 0 ) n Xn Gn Cn

Without loss of generality, lets

assume that k = kx = 1.

= op (1) by Lemma 3, it suces to show that

s1 1 Wn (Sn (, d , , hn ) Sn )Xn

Cn = op (1). Write

1 1 s1 s s1 Sn (, d , , hn ) Sn = Sn (, d , , hn ) Sn Sn (, d , , hn ) Sn s1 = Sn (, d , , hn ) ( 0 )Wn + (d d )Kn (, hn )Wn 0 1 + d Kn (, hn ) Kn (0 , hn ) Wn + d (Kn (0 , hn ) Dn )Wn Sn . 0 0 s1 s Since the row and column sums of Sn (0 , d , 0 , hn ) are uniformly bounded in absolute value and Sn (, d , , hn ) 0 s1 is continuous in , d and , the row and column sums of Sn (, d , , hn ) are also uniformly bounded in

(A.1.3)

absolute value in probability. Lemma 3 then implies that ( 0 ) 1 s1 1 Wn Sn (, d , , hn )Wn Sn Xn n Cn = op (1),

23

(d d ) 0

1 s1 1 Wn Sn (, d , , hn )Kn (, hn )Wn Sn Xn n

Cn = op (1).

Further, Lemma 4 implies that d 0 s1 1 Wn Sn (, d , , hn ) Kn (, hn ) Kn (0 , hn ) Wn Sn Xn Cn n d s1 (1) 1 = ( 0 ) 0 Wn Sn (, d , , hn )Kn (, hn )Wn Sn Xn Cn = op (1), nhn where lies between 0 and , and Lemma 1 implies that d 0 s1 1 Wn Sn (, d , , hn )(Kn (0 , hn ) Dn )Wn Sn Xn n which completes the proof. Lemma 8. Assume that Bn is a nk matrix whose row and column sums are uniformly bounded in absolute value, then (i) (ii) (iii)
1 n 1 n 1 n

Cn = op (1),

Gs (, d , , hn )Xn Gn Xn 0 Bn Un = op (1), n

d Gs (, d , , hn )Kn (, hn )Xn d Gn Dn Xn 0 Bn Un = op (1), n

Kn (, hn )Gs (, d , , hn )Xn Kn (0 , hn )Gn Xn 0 Bn Un = op (1), n


1 n d Kn (, hn )Gs (, d , , hn )Kn (, hn )Xn d Kn (0 , hn )Gn Dn Xn 0 Bn Un = op (1). n

and (iv)

Proof. We only prove (i), and (ii)-(iv) can be proved analogously. Without loss of generality, lets as1 sume that k = kx = 1. Since ( 0 ) n Xn Gn Bn Un = op (1) by Lemma 5, it suces to show that 1 n 1 s1 Wn (Sn (, d , , hn ) Sn )Xn

Bn Un = op (1). According to (A.1.3), rst we show that Bn Un = op (1). (A.1.4)

1 ( 0 ) Gs (, d , , hn )Gn Xn n n Given Gs (, d , , hn ) = n

Gn + Gs (, d , , hn ) ( 0 ) + (d d )Kn (, hn ) + d Kn (, hn ) Kn (0 , hn ) + d (Kn (0 , hn ) Dn ) Gn n 0 0 0 (A.1.5) by (A.1.3), we can show that 1 ( 0 ) G2 Xn n n Bn Un = op (1)

by Lemma 5, 1 ( 0 )2 Gs (, d , , hn )Gn Xn Bn Un n n 1 Gs (, d , , hn )Gn Xn = ( 0 )( n( 0 )) n n

Bn Un = op (1),

1 ( 0 )(d d ) Gs (, d , , hn )Kn (, hn )Gn Xn Bn Un n 0 n 1 Gs (, d , , hn )Kn (, hn )Gn Xn Bn Un = op (1) = ( 0 )( n(d d )) n 0 n

24

by Lemma 3, d ( 0 ) 0 Gs (, d , , hn ) Kn (, hn ) Kn (0 , hn ) Gn Xn Bn Un n n d (1) = ( 0 )( 0 ) 0 Gs (, d , , hn )Kn (, hn )Gn Xn Bn Un n nhn d (1) = ( n( 0 ))( 0 ) 0 Gs (, d , , hn )Kn (, hn )Gn Xn Bn Un = op (1) n nhn by Lemma 4, and d ( 0 ) 0 Gs (, d , , hn ) Kn (0 , hn ) Dn Gn Xn Bn Un n n d = ( n( 0 )) 0 Gs (, d , , hn ) Kn (0 , hn ) Dn Gn Xn Bn Un = op (1) n n by Lemma 1, noting that the elements of both Gn Xn and Gs (, d , , hn )Bn Un are uniformly bounded (in n probability), which completes the proof of (A.1.4). Similar arguments are applicable to show that 1 (d d ) Gs (, d , , hn )Kn (, hn )Gn Xn 0 n n Bn Un = op (1), Bn Un = op (1), (A.1.6) (A.1.7)

1 Gs (, d , , hn ) Kn (, hn ) Kn (0 , hn ) Gn Xn n n and 1 Gs (, d , , hn ) Kn (0 , hn ) Dn Gn Xn n n

Bn Un = op (1),

(A.1.8)

by Lemma 1, 3-6, which completes the proof of (i). Lemma 9. Assume that Cn is a n k matrix whose elements are uniformly bounded by a positive constant, Bn is a n k matrix whose row and column sums are uniformly bounded in absolute value, then (i)
1 nhn Kn (, hn )d Gs (, d , , hn )(Xn + Kn (, hn )Xn d ) Kn (0 , hn )d Gn Xn 0 Cn = op (1), n 0 1 nhn d Kn (, hn )Xn d Kn (0 , hn )Xn 0 Cn = op (1), Kn (, hn )d Gs (, d , , hn )(Xn + Kn (, hn )Xn d ) Kn (0 , hn )d Gn Xn 0 Bn Un = op (1), n 0 d Kn (, hn )Xn d Kn (0 , hn )Xn 0 Bn Un = op (1). (1) (1) (1) (1) (1) (1) (1) (1)

(ii) (iii)

1 nhn

and (iv)

1 nhn

Proof. We only prove (i) and (iii). Similar arguments are applicable to (ii) and (iv). Without loss of generality, let k = kx = 1. For (i), 1 nhn 1 nhn 1 nhn 1 nhn 1 nhn
(1) (1) Kn (, hn )d Gs (, d , , hn )Xn Kn (0 , hn )d Gn Xn 0 Cn n 0 (1) (1) Kn (, hn ) Kn (0 , hn ) d Gs (, d , , hn )Xn n (1) Kn (0 , hn )(d d )Gs (, d , , hn )Xn 0 n (1) Kn (0 , hn )d (Gs (, d , , hn ) Gn )Xn 0 n (1) Kn (0 , hn )d Gn Xn ( 0 ) 0

=1a +1b +1c +1d

Cn

Cn Cn

Cn .

25

By Lemma 4, the terms (1b) and (1d) are $o_p(1)$. The term (1a) equals
$$(\gamma-\gamma_0)\frac1{nh_n^2}\bigl[K_n^{(2)}(\bar\gamma,h_n)\lambda^dG_n^s(\cdot)X_n\beta\bigr]'C_n,$$
which is $o_p(1)$ by Assumption 4 and Lemma 4. The term (1c) can be shown to be $o_p(1)$ by an argument analogous to that for Lemma 7. The result (i) then follows since
$$\frac1{nh_n}\bigl[K_n^{(1)}(\gamma,h_n)\lambda^dG_n^s(\cdot)K_n(\gamma,h_n)X_n\beta^d-K_n^{(1)}(\gamma_0,h_n)\lambda_0^dG_nK_n(\gamma_0,h_n)X_n\beta_0^d\bigr]'C_n=o_p(1)$$
by a similar argument.

For (iii), it suffices to show that
$$\begin{aligned}
&\frac1{nh_n}\bigl[K_n^{(1)}(\gamma,h_n)\lambda^dG_n^s(\cdot)X_n\beta-K_n^{(1)}(\gamma_0,h_n)\lambda_0^dG_nX_n\beta_0\bigr]'B_nU_n\\
&\quad=\frac1{nh_n}\bigl[\bigl(K_n^{(1)}(\gamma,h_n)-K_n^{(1)}(\gamma_0,h_n)\bigr)\lambda^dG_n^s(\cdot)X_n\beta\bigr]'B_nU_n &\text{(2a)}\\
&\qquad+\frac1{nh_n}\bigl[K_n^{(1)}(\gamma_0,h_n)(\lambda^d-\lambda_0^d)G_n^s(\cdot)X_n\beta\bigr]'B_nU_n &\text{(2b)}\\
&\qquad+\frac1{nh_n}\bigl[K_n^{(1)}(\gamma_0,h_n)\lambda_0^d\bigl(G_n^s(\cdot)-G_n\bigr)X_n\beta\bigr]'B_nU_n &\text{(2c)}\\
&\qquad+\frac1{nh_n}\bigl[K_n^{(1)}(\gamma_0,h_n)\lambda_0^dG_nX_n(\beta-\beta_0)\bigr]'B_nU_n &\text{(2d)}\\
&\quad=o_p(1).
\end{aligned}$$
The term (2d) is $o_p(1)$ by Lemma 6. The term (2c) can be shown to be $o_p(1)$ by an argument analogous to that for Lemma 8. For the term (2b), substituting (A.1.5) for $G_n^s(\cdot)$ decomposes it into terms whose prefactors involve $\sqrt n(\lambda-\lambda_0)$, $\sqrt n(\lambda^d-\lambda_0^d)$, $n(\lambda^d-\lambda_0^d)^2$ and $\sqrt nh_n(\gamma-\gamma_0)$, each multiplying a normalized quadratic form of the type controlled
by Lemmas 1 and 4; hence (2b) is $o_p(1)$. Similar arguments are applicable to the term (2a) to show that it is $o_p(1)$.

Lemma 10. Assume that $C_n$ is an $n\times k$ matrix whose elements are uniformly bounded by a positive constant and $B_n$ is an $n\times k$ matrix whose row and column sums are uniformly bounded in absolute value. Then, writing $G_n^s(\cdot)$ for $G_n^s(\lambda,\lambda^d,\gamma,h_n)$,
(i) $\frac1n\bigl[G_n^s(\cdot)X_n\beta-G_nX_n\beta_0\bigr]'\bigl(K_n(\gamma_0,h_n)-D_n\bigr)C_n=o_p(1)$;
(ii) $\frac1n\bigl[\lambda^dG_n^s(\cdot)K_n(\gamma,h_n)X_n\beta^d-\lambda_0^dG_nD_nX_n\beta_0^d\bigr]'\bigl(K_n(\gamma_0,h_n)-D_n\bigr)C_n=o_p(1)$;
(iii) $\frac1n\bigl[K_n(\gamma,h_n)G_n^s(\cdot)X_n\beta-K_n(\gamma_0,h_n)G_nX_n\beta_0\bigr]'\bigl(K_n(\gamma_0,h_n)-D_n\bigr)C_n=o_p(1)$;
(iv) $\frac1n\bigl[\lambda^dK_n(\gamma,h_n)G_n^s(\cdot)K_n(\gamma,h_n)X_n\beta^d-\lambda_0^dK_n(\gamma_0,h_n)G_nD_nX_n\beta_0^d\bigr]'\bigl(K_n(\gamma_0,h_n)-D_n\bigr)C_n=o_p(1)$;
(v) $\frac1{nh_n}\bigl[K_n^{(1)}(\gamma,h_n)\lambda^dG_n^s(\cdot)\bigl(X_n\beta+K_n(\gamma,h_n)X_n\beta^d\bigr)-K_n^{(1)}(\gamma_0,h_n)\lambda_0^dG_n\bigl(X_n\beta_0+K_n(\gamma_0,h_n)X_n\beta_0^d\bigr)\bigr]'\bigl(K_n(\gamma_0,h_n)-D_n\bigr)C_n=o_p(1)$;
and (vi) $\frac1{nh_n}\bigl[K_n^{(1)}(\gamma,h_n)X_n\beta^d-K_n^{(1)}(\gamma_0,h_n)X_n\beta_0^d\bigr]'\bigl(K_n(\gamma_0,h_n)-D_n\bigr)C_n=o_p(1)$.

Proof. We only prove (v); the others can be proved analogously. Without loss of generality, let $k=k_x=1$. For (v), it suffices to show that
$$\frac1{nh_n}\bigl[K_n^{(1)}(\gamma,h_n)\lambda^dG_n^s(\cdot)X_n\beta-K_n^{(1)}(\gamma_0,h_n)\lambda_0^dG_nX_n\beta_0\bigr]'\bigl(K_n(\gamma_0,h_n)-D_n\bigr)C_n=\text{(3a)}+\text{(3b)}+\text{(3c)}+\text{(3d)}=o_p(1),$$
where (3a)-(3d) are defined exactly as (1a)-(1d) in the proof of Lemma 9, with $C_n$ replaced by $(K_n(\gamma_0,h_n)-D_n)C_n$. The term (3d) equals
$$\bigl(\sqrt n(\beta-\beta_0)\bigr)\frac{\lambda_0^d}{n^{3/2}h_n}\bigl[K_n^{(1)}(\gamma_0,h_n)G_nX_n\bigr]'\bigl(K_n(\gamma_0,h_n)-D_n\bigr)C_n$$
and can be shown to be $o_p(1)$ by Lemma 1-(iv) and the uniform boundedness of $G_nX_n$ and $C_n$. Arguments similar to that for (3d) and to Lemma 7 can be used to show that the terms (3a)-(3c) are all $o_p(1)$.
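Throughout Lemmas 1-10, $D_n$, $K_n(\gamma,h_n)$ and $K_n^{(1)}(\gamma,h_n)$ are the sharp-indicator, smoothed-indicator and smoothing-derivative diagonal matrices. For concreteness, a minimal numerical sketch is given below; the Gaussian CDF $\Phi$ and the argument convention $(t_i-\gamma)/h_n$ are purely illustrative assumptions, since the paper's kernel $\mathcal K$ is a generic smooth distribution-like function.

```python
import numpy as np
from scipy.stats import norm

def threshold_matrices(t, gamma, h):
    """The three diagonal matrices recurring in Lemmas 1-10:
    D_n  = diag{1(t_i > gamma)}        (sharp regime indicator),
    K_n  = diag{K((t_i - gamma)/h)}    (its kernel-smoothed version),
    K_n1 = diag{K'((t_i - gamma)/h)}   (the derivative matrix K_n^{(1)}).
    Phi/phi stand in for the generic kernel K and its derivative."""
    t = np.asarray(t, dtype=float)
    u = (t - gamma) / h
    Dn = np.diag((t > gamma).astype(float))   # sharp indicator
    Kn = np.diag(norm.cdf(u))                 # smoothed indicator
    Kn1 = np.diag(norm.pdf(u))                # derivative matrix
    return Dn, Kn, Kn1
```

As $h_n\to0$, the diagonal of $K_n(\gamma_0,h_n)$ converges to that of $D_n$ except at observations with $t_i$ in an $h_n$-neighborhood of $\gamma_0$, which is the content exploited by Lemma 1.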

Appendix C. Proofs
Proof of Proposition 1. Direct computation gives
$$E\Bigl(\frac1nQ_n'U_n^s(\theta,h_n)\Bigr)=E\Bigl(\frac1nQ_n'\bigl(Z_n\theta_0-Z_n^s(\gamma,h_n)\theta\bigr)\Bigr)=\frac1nE\bigl(Q_n'(Z_n-Z_n^s(\gamma,h_n))\bigr)\theta+\frac1nE(Q_n'Z_n)(\theta_0-\theta).$$
Observing that $E(W_nY_n)=G_nX_n\beta_0$ by Assumption 1, the equality above continues as
$$\begin{aligned}
&\frac1nE\bigl(Q_n'(Z_n-Z_n^s(\gamma,h_n))\bigr)\theta+\frac1nE(Q_n'Z_n)(\theta_0-\theta)\\
&\quad=\frac1nQ_n'\bigl[D_n-K_n(\gamma_0,h_n)\bigr]\bigl(\lambda^dG_nX_n\beta_0+X_n\beta^d\bigr)+\frac1nQ_n'\bigl[K_n(\gamma_0,h_n)-K_n(\gamma,h_n)\bigr]\bigl(\lambda^dG_nX_n\beta_0+X_n\beta^d\bigr)\\
&\qquad+\frac1nQ_n'\bigl(G_nX_n\beta_0,\;X_n,\;D_nG_nX_n\beta_0,\;D_nX_n\bigr)(\theta_0-\theta)\\
&\quad=(\gamma_0-\gamma)\frac1{nh_n}Q_n'K_n^{(1)}(\bar\gamma,h_n)\bigl(\lambda^dG_nX_n\beta_0+X_n\beta^d\bigr)+\frac1nQ_n'\bigl(G_nX_n\beta_0,\;X_n,\;D_nG_nX_n\beta_0,\;D_nX_n\bigr)(\theta_0-\theta)+o(1),
\end{aligned}\tag{A.2.1}$$
where the last equality follows from Lemma 1 and the uniform boundedness of $Q_n$, $G_nX_n\beta_0$ and $X_n$, and $\bar\gamma$ lies between $\gamma$ and $\gamma_0$. Under Assumption 6, for the system of linear equations defined by (A.2.1), one must have $\gamma=\gamma_0$ and $\theta=\theta_0$.
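Before turning to Proposition 2, a minimal numerical sketch of the smoothed moment vector that these proofs study may help fix ideas. The function below assembles $U_n^s(\theta,h_n)$ and $m_n^s(\theta,h_n)=n^{-1}Q_n'U_n^s(\theta,h_n)$; the Gaussian-CDF kernel and all variable names are our own illustration, not the paper's code.

```python
import numpy as np
from scipy.stats import norm

def smoothed_moments(theta, Y, X, W, t, Q, h):
    """Smoothed GMM moment vector m_n^s(theta, h) = Q'U_n^s(theta, h)/n.

    theta = (lam, lam_d, beta (k,), beta_d (k,), gamma); the smoothed
    indicator Phi((t - gamma)/h) stands in for 1(t > gamma).  A sketch
    under our own naming conventions."""
    lam, lam_d, beta, beta_d, gamma = theta
    k = norm.cdf((t - gamma) / h)     # smoothed regime indicator
    Wy = W @ Y                        # spatial lag of the outcome
    U_s = (Y - lam * Wy - X @ beta
           - lam_d * k * Wy - (k[:, None] * X) @ beta_d)
    return Q.T @ U_s / len(Y)
```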
Proof of Proposition 2. The consistency follows from the identification uniqueness together with uniform convergence (White, 1994). As the identification uniqueness has been established by Proposition 1 under Assumption 6, it suffices to show that
$$R_{2n}\Bigl[\frac1nQ_n'U_n^s(\theta,h_n)-E\Bigl(\frac1nQ_n'U_n^s(\theta,h_n)\Bigr)\Bigr]\stackrel p\to0$$
uniformly over $\theta\in\Theta$. Direct computation gives
$$\begin{aligned}
&R_{2n}\Bigl[\frac1nQ_n'U_n^s(\theta,h_n)-E\Bigl(\frac1nQ_n'U_n^s(\theta,h_n)\Bigr)\Bigr]\\
&\quad=\frac1nR_{2n}Q_n'U_n+(\lambda_0-\lambda)\frac1nR_{2n}Q_n'G_nU_n+(\lambda_0^d-\lambda^d)\frac1nR_{2n}Q_n'D_nG_nU_n+\frac1nR_{2n}Q_n'\bigl[D_n-K_n(\gamma,h_n)\bigr]\lambda^dG_nU_n.
\end{aligned}$$
To complete the proof, it suffices to show that (i) $(\lambda^d-\lambda_0^d)\frac1nR_{2n}Q_n'D_nG_nU_n$ converges to zero uniformly over $\lambda^d\in\Lambda^d$, where $\Lambda^d$ is some compact set containing $\lambda_0^d$ as an interior point, and (ii)
$$\frac1nR_{2n}Q_n'\bigl[D_n-K_n(\gamma,h_n)\bigr]\lambda^dG_nU_n\stackrel p\to0$$
uniformly over $(\lambda^d,\gamma)\in\Lambda^d\times\Gamma$, where $\Lambda^d$ and $\Gamma$ are some compact sets containing $\lambda_0^d$ and $\gamma_0$ as interior points, respectively; the other uniform convergences can be established analogously. For (i), by Assumption 5, we have
$$\frac1nR_{2n}Q_n'D_nG_nU_n=\begin{pmatrix}\frac1n\tilde Q_n'D_nG_nU_n\\[4pt]\frac1{nh_n}Q_{1n}'K_n^{(1)}(\gamma,h_n)D_nG_nU_n\end{pmatrix}.$$
Since the elements of $\tilde Q_n$ are uniformly bounded and the row and column sums of both $G_n$ and $D_n$ are uniformly bounded, $\frac1n\tilde Q_n'D_nG_nU_n=o_p(1)$ by Lemmas 3 and 5. Furthermore, it can be shown that $\frac1{nh_n}Q_{1n}'K_n^{(1)}(\gamma,h_n)D_nG_nU_n=o_p(1)$ by Lemma 6. The uniform convergence then follows by applying Lemma 2. For (ii), by Lemma 7 and the mean value theorem, it suffices to show that
$$(\gamma-\gamma_0)\frac1nR_{2n}Q_n'K_n^{(1)}(\bar\gamma,h_n)\lambda^dG_nU_n=(\gamma-\gamma_0)\lambda^d\begin{pmatrix}\frac1{nh_n}\tilde Q_n'K_n^{(1)}(\bar\gamma,h_n)G_nU_n\\[4pt]\frac1{nh_n^{3/2}}Q_{1n}'K_n^{(1)}(\bar\gamma,h_n)G_nU_n\end{pmatrix}\stackrel p\to0 \tag{A.2.2}$$
uniformly over $(\lambda^d,\gamma)\in\Lambda^d\times\Gamma$. The pointwise stochastic convergence follows from Lemma 6 and Assumption 4-(iv). Finally, the uniform stochastic convergence can be shown by applying Lemma 2, since $\Lambda^d\times\Gamma$ is compact.

Proof of (20). The result that $R_{2n}\bigl(\partial m_n^s(\theta_0,h_n)/\partial\theta'\bigr)R_{1n}$ has order $O(1)$ and is asymptotically equivalent to the expression given in (20) can be established by Lemmas 3 and 4. For example, by Assumption 5 and letting $C_n=\lambda_0^dG_nX_n\beta_0+X_n\beta_0^d$,
$$\frac1{nh_n}Q_{1n}'K_n^{(1)}(\gamma_0,h_n)\bigl(\lambda_0^dG_nX_n\beta_0+X_n\beta_0^d\bigr)=\frac1{nh_n}Q_{1n}'K_n^{(1)}(\gamma_0,h_n)C_n=O(1)$$
by noting that the elements of $K_n^{(1)}(\gamma_0,h_n)$, $C_n$ and $Q_{1n}$ are uniformly bounded, and by Lemma 4. The other entries can be proved similarly.

Proof of (21). Since
$$\mathrm{Var}\bigl(\sqrt nR_{2n}m_n(\theta_0)\bigr)=\mathrm{Var}\Bigl(\frac1{\sqrt n}R_{2n}Q_n'U_n\Bigr)=\sigma_0^2\,R_{2n}\Bigl(\frac1nQ_n'Q_n\Bigr)R_{2n},$$
whose blocks involve $\frac1n\tilde Q_n'\tilde Q_n$ together with the $Q_{1n}$-blocks carrying $\sqrt{h_n}$ and $h_n$ scalings, the result can be established, similarly to the proof of (20), by Lemmas 3 and 4.
Proof of Proposition 3. For the asymptotic distribution of $\hat\theta_n$, a Taylor expansion of the first-order condition $\frac{\partial m_n^{s\prime}(\hat\theta_n,h_n)}{\partial\theta}A_nm_n^s(\hat\theta_n,h_n)=0$ at $\theta_0$ gives
$$\sqrt nR_{1n}^{-1}(\hat\theta_n-\theta_0)=-\bigl(\hat\Gamma_n'\tilde A_n\bar\Gamma_n\bigr)^{-1}\hat\Gamma_n'\tilde A_n\,\sqrt nR_{2n}m_n^s(\theta_0,h_n),$$
where $\hat\Gamma_n=R_{2n}\frac{\partial m_n^s(\hat\theta_n,h_n)}{\partial\theta'}R_{1n}$, $\bar\Gamma_n=R_{2n}\frac{\partial m_n^s(\bar\theta_n,h_n)}{\partial\theta'}R_{1n}$ with $\bar\theta_n$ lying between $\hat\theta_n$ and $\theta_0$, and $\tilde A_n=R_{2n}^{-1}A_nR_{2n}^{-1}$ is the correspondingly scaled weighting matrix. Since $\hat\Gamma_n$, $\bar\Gamma_n$ and $\tilde A_n$ are all proved or assumed to be $O(1)$, and $\hat\theta_n$, $\bar\theta_n$ converge to $\theta_0$ by Proposition 2, the limiting distribution (22) follows if one can establish the following two results in sequence: (i)
$$R_{2n}\frac{\partial m_n^s(\theta,h_n)}{\partial\theta'}R_{1n}-E\Bigl(R_{2n}\frac{\partial m_n^s(\theta,h_n)}{\partial\theta'}R_{1n}\Bigr)=o_p(1)\quad\text{uniformly over }\theta\in\Theta,$$
and (ii) $\sqrt nR_{2n}m_n^s(\theta_0,h_n)\stackrel d\to N(0,\Omega)$, with $\Omega$ the limit established in (21).

For (i), some algebraic computation shows that the entries of $R_{2n}\frac{\partial m_n^s(\theta,h_n)}{\partial\theta'}R_{1n}$ are, up to sign, of the forms
$$\frac1n\tilde Q_n'W_nY_n,\quad\frac1n\tilde Q_n'X_n,\quad\frac1n\tilde Q_n'K_n(\gamma,h_n)W_nY_n,\quad\frac1n\tilde Q_n'K_n(\gamma,h_n)X_n,\quad\frac1{nh_n}\tilde Q_n'K_n^{(1)}(\gamma,h_n)\bigl(\lambda^dW_nY_n+X_n\beta^d\bigr), \tag{A.2.3}$$
together with the corresponding $Q_{1n}$-blocks carrying an extra $\sqrt{h_n}$ scaling. First,
$$\frac1n\tilde Q_n'W_nY_n-E\Bigl(\frac1n\tilde Q_n'W_nY_n\Bigr)=\frac1n\tilde Q_n'W_nS_n^{-1}U_n=o_p(1)$$
by Lemma 3. Second,
$$\frac1{nh_n}\tilde Q_n'K_n^{(1)}(\gamma,h_n)\bigl(\lambda^dW_nY_n+X_n\beta^d\bigr)-E\Bigl(\frac1{nh_n}\tilde Q_n'K_n^{(1)}(\gamma,h_n)\bigl(\lambda^dW_nY_n+X_n\beta^d\bigr)\Bigr)=\frac1{nh_n}\tilde Q_n'K_n^{(1)}(\gamma,h_n)\lambda^dG_nU_n=O_p(n^{-1/2})$$
uniformly over $(\lambda^d,\gamma)\in\Lambda^d\times\Gamma$ by Lemma 6. Similarly, we can also show that
$$\frac1{nh_n}Q_{1n}'K_n^{(1)}(\gamma,h_n)\bigl(\lambda^dW_nY_n+X_n\beta^d\bigr)-E\Bigl(\frac1{nh_n}Q_{1n}'K_n^{(1)}(\gamma,h_n)\bigl(\lambda^dW_nY_n+X_n\beta^d\bigr)\Bigr)=\frac1{nh_n}Q_{1n}'K_n^{(1)}(\gamma,h_n)\lambda^dG_nU_n=O_p\bigl((nh_n)^{-1/2}\bigr)$$
uniformly over $(\lambda^d,\gamma)\in\Lambda^d\times\Gamma$, and the uniform convergence of the other elements of (A.2.3) can be established similarly.
For (ii), first we show that $\sqrt nR_{2n}\bigl(m_n^s(\theta_0,h_n)-m_n(\theta_0)\bigr)=o_p(1)$. Some computation gives
$$\sqrt nR_{2n}\bigl(m_n^s(\theta_0,h_n)-m_n(\theta_0)\bigr)=\frac1{\sqrt n}R_{2n}Q_n'\bigl(Z_n(\gamma_0)-Z_n^s(\gamma_0,h_n)\bigr)\theta_0=\begin{pmatrix}\frac1{\sqrt n}\tilde Q_n'\bigl[D_n-K_n(\gamma_0,h_n)\bigr]\bigl(\lambda_0^dG_nX_n\beta_0+\lambda_0^dG_nU_n+X_n\beta_0^d\bigr)\\[4pt]\frac1{\sqrt{nh_n}}Q_{1n}'\bigl[D_n-K_n(\gamma_0,h_n)\bigr]\bigl(\lambda_0^dG_nX_n\beta_0+\lambda_0^dG_nU_n+X_n\beta_0^d\bigr)\end{pmatrix}.$$
By Lemma 1-(ii) and the uniform boundedness of $Q_{1n}$, $G_nX_n\beta_0$ and $X_n$,
$$\frac1{\sqrt{nh_n}}Q_{1n}'\bigl[D_n-K_n(\gamma_0,h_n)\bigr]\bigl(\lambda_0^dG_nX_n\beta_0+X_n\beta_0^d\bigr)=O\bigl(\sqrt{nh_n^3}\bigr)=o(1)$$
by Assumption 4-(iv). Moreover, $E\bigl(\frac1{\sqrt{nh_n}}Q_{1n}'[D_n-K_n(\gamma_0,h_n)]\lambda_0^dG_nU_n\bigr)=0$ and
$$\mathrm{Var}\Bigl(\frac1{\sqrt{nh_n}}Q_{1n}'\bigl[D_n-K_n(\gamma_0,h_n)\bigr]\lambda_0^dG_nU_n\Bigr)=\frac{\sigma_0^2(\lambda_0^d)^2}{nh_n}Q_{1n}'\bigl[D_n-K_n(\gamma_0,h_n)\bigr]G_nG_n'\bigl[D_n-K_n(\gamma_0,h_n)\bigr]Q_{1n}=O(h_n)=o(1),$$
since $Q_{1n}$ is uniformly bounded, the row and column sums of $G_nG_n'$ are uniformly bounded, and by Lemma 1-(iii); the $\tilde Q_n$ block is handled in the same way. Finally, the limiting distribution of $\sqrt nR_{2n}m_n^s(\theta_0,h_n)=\sqrt nR_{2n}m_n(\theta_0)+o_p(1)$ follows from Lemmas 5-6.

Proof of Proposition 4. For consistency, note that since Corollary 1 holds for the true (unfeasible) best IV matrix $Q_{b,n}$, and $Q_{b,n}$ satisfies Assumption 5 automatically, the consistency of the unfeasible best SGMME follows by Proposition 2. As
$$\hat m_{b,n}^{s\prime}(\theta,h_n)\hat\Omega_{b,n}^{-1}\hat m_{b,n}^s(\theta,h_n)=m_{b,n}^{s\prime}(\theta,h_n)\Omega_{b,n}^{-1}m_{b,n}^s(\theta,h_n)+\bigl(\hat m_{b,n}^{s\prime}(\theta,h_n)\hat\Omega_{b,n}^{-1}\hat m_{b,n}^s(\theta,h_n)-m_{b,n}^{s\prime}(\theta,h_n)\Omega_{b,n}^{-1}m_{b,n}^s(\theta,h_n)\bigr),$$
the consistency of the feasible best SGMME follows if we can show that the difference in parentheses is $o_p(1)$ uniformly over $\theta\in\Theta$. This is equivalent to showing that (i)
$$\frac1nR_{1n}\hat Q_{b,n}'U_n^s(\theta,h_n)-\frac1nR_{1n}Q_{b,n}'U_n^s(\theta,h_n)\stackrel p\to0$$
uniformly over $\theta\in\Theta$, and (ii) $R_{1n}(\hat\Omega_{b,n}-\Omega_{b,n})R_{1n}=o_p(1)$.

For (i), given
$$U_n^s(\theta,h_n)=U_n+(\lambda_0-\lambda)W_nY_n+X_n(\beta_0-\beta)+\bigl(\lambda_0^dD_n-\lambda^dK_n(\gamma,h_n)\bigr)W_nY_n+\bigl(D_nX_n\beta_0^d-K_n(\gamma,h_n)X_n\beta^d\bigr), \tag{A.2.4}$$
and given that the blocks of $\hat Q_{b,n}-Q_{b,n}$ are
$$G_n^s(\cdot)\bigl(X_n\beta+K_n(\gamma,h_n)X_n\beta^d\bigr)-G_n\bigl(X_n\beta_0+D_nX_n\beta_0^d\bigr),$$
$$K_n(\gamma,h_n)G_n^s(\cdot)\bigl(X_n\beta+K_n(\gamma,h_n)X_n\beta^d\bigr)-K_n(\gamma_0,h_n)G_n\bigl(X_n\beta_0+D_nX_n\beta_0^d\bigr),$$
$$\bigl(K_n(\gamma,h_n)-K_n(\gamma_0,h_n)\bigr)X_n,$$
and
$$h_n^{-1}K_n^{(1)}(\gamma,h_n)\bigl(\lambda^dG_n^s(\cdot)\bigl(X_n\beta+K_n(\gamma,h_n)X_n\beta^d\bigr)+X_n\beta^d\bigr)-h_n^{-1}K_n^{(1)}(\gamma_0,h_n)\bigl(\lambda_0^dG_nX_n\beta_0+X_n\beta_0^d\bigr),$$
the pointwise convergence of $\frac1n\hat Q_{b,n}'U_n^s(\theta,h_n)-\frac1nQ_{b,n}'U_n^s(\theta,h_n)\stackrel p\to0$ follows from Lemmas 7-8, while the pointwise convergence of $\frac{h_n}n\hat Q_{b,n}'U_n^s(\theta,h_n)-\frac{h_n}nQ_{b,n}'U_n^s(\theta,h_n)\stackrel p\to0$ (for the $h_n^{-1}$-scaled block) follows from Lemma 9. Their uniform convergence follows from the boundedness of $\mathcal K^{(1)}(\cdot)$, the compactness of the parameter space, and Lemma 2.

For (ii), by referring to (25), it suffices to show that $\frac1n\hat Q_{b,n}'\hat Q_{b,n}-\frac1nQ_{b,n}'Q_{b,n}=o_p(1)$ and $\frac{h_n}n\hat Q_{b,n}'\hat Q_{b,n}-\frac{h_n}nQ_{b,n}'Q_{b,n}=o_p(1)$ for the corresponding blocks, and all these results can be verified by making use of Lemmas 7 and 9 repeatedly.

For the limiting distribution of $\hat\theta_{b,n}$, a Taylor expansion similar to that in the proof of Proposition 3 gives
$$\sqrt nR_{1n}^{-1}(\hat\theta_{b,n}-\theta_0)=-\bigl(\hat\Gamma_{b,n}'\tilde\Omega_{b,n}^{-1}\bar\Gamma_{b,n}\bigr)^{-1}\hat\Gamma_{b,n}'\tilde\Omega_{b,n}^{-1}\,\sqrt nR_{1n}\hat m_{b,n}^s(\theta_0,h_n),$$
where $\hat\Gamma_{b,n}=R_{1n}\frac{\partial\hat m_{b,n}^s(\hat\theta_{b,n},h_n)}{\partial\theta'}R_{1n}$, $\bar\Gamma_{b,n}$ is its analogue at a mean value $\bar\theta_{b,n}$, and $\tilde\Omega_{b,n}=R_{1n}\hat\Omega_{b,n}R_{1n}$. Denote the true (unfeasible) best moment conditions corresponding to $\hat m_{b,n}^s(\theta,h_n)$ by $m_{b,n}^s(\theta,h_n)=n^{-1}Q_{b,n}'U_n^s(\theta,h_n)$. Since we have essentially shown, in proving (20), (21) and Proposition 3, that $R_{1n}\frac{\partial m_{b,n}^s(\theta,h_n)}{\partial\theta'}R_{1n}$, $R_{1n}\Omega_{b,n}R_{1n}$ and $\sqrt nR_{1n}m_{b,n}^s(\theta_0,h_n)$ all have the stochastic order $O_p(1)$, and $R_{1n}(\hat\Omega_{b,n}-\Omega_{b,n})R_{1n}=o_p(1)$ in proving the consistency, the desired result follows if we can show that (i)
$$R_{1n}\Bigl(\frac{\partial\hat m_{b,n}^s(\theta,h_n)}{\partial\theta'}-\frac{\partial m_{b,n}^s(\theta,h_n)}{\partial\theta'}\Bigr)R_{1n}=o_p(1)$$
uniformly over $\theta\in\Theta$, and (ii) $\sqrt nR_{1n}\bigl(\hat m_{b,n}^s(\theta_0,h_n)-m_{b,n}^s(\theta_0,h_n)\bigr)=o_p(1)$.

For (i), some algebraic computation shows that the entries of the difference are, up to sign and scaling, of the forms
$$\frac1n(\hat Q_{b,n}-Q_{b,n})'W_nY_n,\quad\frac1n(\hat Q_{b,n}-Q_{b,n})'X_n,\quad\frac1n(\hat Q_{b,n}-Q_{b,n})'K_n(\gamma,h_n)W_nY_n,\quad\frac1n(\hat Q_{b,n}-Q_{b,n})'K_n(\gamma,h_n)X_n,$$
and
$$\frac1{nh_n}(\hat Q_{b,n}-Q_{b,n})'K_n^{(1)}(\gamma,h_n)\bigl(\lambda^dW_nY_n+X_n\beta^d\bigr).$$
By Lemmas 7-9 and arguments analogous to those following (A.2.4), we can show (i) to be true. For (ii),
$$\sqrt nR_{1n}\bigl(\hat m_{b,n}^s(\theta_0,h_n)-m_{b,n}^s(\theta_0,h_n)\bigr)=\frac1{\sqrt n}R_{1n}(\hat Q_{b,n}-Q_{b,n})'\Bigl(U_n+\bigl[D_n-K_n(\gamma_0,h_n)\bigr]\bigl(\lambda_0^dW_nY_n+X_n\beta_0^d\bigr)\Bigr),$$
where Lemmas 8-9 can be used to show that $\frac1{\sqrt n}R_{1n}(\hat Q_{b,n}-Q_{b,n})'U_n=o_p(1)$ and Lemmas 8-10 can be used to show that $\frac1{\sqrt n}R_{1n}(\hat Q_{b,n}-Q_{b,n})'\bigl[D_n-K_n(\gamma_0,h_n)\bigr]\bigl(\lambda_0^dW_nY_n+X_n\beta_0^d\bigr)=o_p(1)$.
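To connect the feasible best SGMME analyzed above with the iterated estimators reported in the Monte Carlo tables, a schematic of the iteration loop is sketched below. It reuses the smoothed_moments sketch given after Proposition 1 (with $k_x=1$, so theta is a plain 5-vector), and simple_iv is our own hypothetical stand-in for the paper's $\hat Q_{b,n}$ construction; the optimizer choice is likewise illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def simple_iv(theta, Y, X, W, t, h):
    """Crude stand-in IV matrix rebuilt at the current estimate: X, WX and
    their smoothed-regime interactions.  The paper's best IV Q_{b,n} uses
    G_n-based columns instead; this is only a sketch."""
    k = norm.cdf((t - theta[4]) / h)
    return np.column_stack([X, W @ X, k[:, None] * X, W @ (k[:, None] * X)])

def iterated_sgmm(theta0, Y, X, W, t, h, n_iter=5):
    """Iterated SGMM loop: alternate between (a) updating the IV matrix at
    the current estimate and (b) re-minimizing the quadratic moment
    criterion.  n_iter = 5 or 10 mirrors the estimators labelled (5) and
    (10) in Tables 1-4."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        Q = simple_iv(theta, Y, X, W, t, h)            # step (a)
        def criterion(th):
            m = smoothed_moments((th[0], th[1], th[2:3], th[3:4], th[4]),
                                 Y, X, W, t, Q, h)
            return float(m @ m)                        # unweighted form
        theta = minimize(criterion, theta, method="Nelder-Mead").x  # step (b)
    return theta
```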

References
Amemiya, T., 1985, Advanced Econometrics, Harvard University Press, Cambridge, MA.
Caner, M. and Hansen, B. E., 2004, Instrumental variable estimation of a threshold model, Econometric Theory, 20, 813-843.
Chan, K.S., 1993, Consistency and limiting distribution of the least squares estimator of a threshold autoregressive model, The Annals of Statistics, 21, 520-533.
Cliff, A.D. and Ord, J.K., 1973, Spatial Autocorrelation, Pion Ltd., London.
Davidson, J., 1994, Stochastic Limit Theory: An Introduction for Econometricians, Oxford University Press, Oxford.
Delgado, M. A. and Hidalgo, J., 2000, Nonparametric inference on structural breaks, Journal of Econometrics, 96, 113-144.
Durlauf, S. N. and Johnson, P. A., 1995, Multiple regimes and cross-country growth behavior, Journal of Applied Econometrics, 10, 365-384.
Glaeser, E. L., Sacerdote, B. and Scheinkman, J. A., 1996, Crime and social interactions, Quarterly Journal of Economics, 111, 507-548.
Hansen, B. E., 2000, Sample splitting and threshold estimation, Econometrica, 68, 575-603.
Hansen, L. P., 1982, Large sample properties of generalized method of moments estimators, Econometrica, 50, 1029-1054.
Horowitz, J.L., 1992, A smoothed maximum score estimator for the binary response model, Econometrica, 60, 505-531.
Kelejian, H.H. and Prucha, I.R., 1998, A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbance, Journal of Real Estate Finance and Economics, 17, 99-121.
Kelejian, H.H. and Prucha, I.R., 1999, A generalized moments estimator for the autoregressive parameter in a spatial model, International Economic Review, 40, 509-533.
Kelejian, H.H. and Prucha, I.R., 2001, On the asymptotic distribution of the Moran I test statistic with applications, Journal of Econometrics, 104, 219-257.
Kelejian, H.H. and Prucha, I.R., 2007, HAC estimation in a spatial framework, Journal of Econometrics, 140, 131-154.
Kelejian, H.H. and Prucha, I.R., 2010, Specification and estimation of spatial autoregressive models with autoregressive and heteroscedastic disturbances, Journal of Econometrics, 157, 53-67.
Koul, H. L. and Qian, L., 2002, Asymptotics of maximum likelihood estimator in a two-phase linear regression model, Journal of Statistical Planning and Inference, 108, 99-119.
Lee, L.F., 2003, Best spatial two stage least squares estimators for a spatial autoregressive model with autoregressive disturbances, Econometric Reviews, 22, 307-335.
Lee, L.F., 2004, Asymptotic distribution of quasi-maximum likelihood estimators for spatial autoregressive models, Econometrica, 72, 1899-1925.
Lee, L.F., 2007, GMM and 2SLS estimation of mixed regressive, spatial autoregressive models, Journal of Econometrics, 137, 489-514.
Lee, L.F. and Liu, X., 2010, Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances, Econometric Theory, 26, 187-230.
Lee, S. and Seo, M., 2008, Semi-parametric estimation of a binary response model with a change-point due to a covariate threshold, Journal of Econometrics, 144, 492-499.
LeSage, J. P., 1999, The Theory and Practice of Spatial Econometrics, http://www.spatial-econometrics.com.
Lin, X. and Lee, L.F., 2010, GMM estimation of spatial autoregressive models with unknown heteroscedasticity, Journal of Econometrics, 157, 34-52.
Linton, O., 1995, Second order approximation in the partially linear regression model, Econometrica, 63, 1079-1112.
Ord, J.K., 1975, Estimation methods for models of spatial interaction, Journal of the American Statistical Association, 70, 120-126.
Pons, O., 2003, Estimation in a Cox regression model with a change-point according to a threshold in a covariate, The Annals of Statistics, 31, 442-463.
Seo, M. and Linton, O., 2007, A smoothed least squares estimator for the threshold regression, Journal of Econometrics, 141, 704-735.
Su, L., 2011, Semi-parametric GMM estimation of spatial autoregressive models, Journal of Econometrics, forthcoming.
Su, L. and Jin, S., 2010, Profile quasi-maximum likelihood estimation of spatial autoregressive models, Journal of Econometrics, 157, 18-33.
Su, L. and Yang, Z., 2011, Instrumental variable quantile estimation of spatial autoregressive models, Working paper, Singapore Management University.
White, H., 1994, Estimation, Inference and Specification Analysis, Cambridge University Press, New York.
Yu, J. and Lee, L.F., 2009, Convergence: a spatial dynamic panel data approach, Working paper.
Table 1: Threshold variable $t\sim N(1,1)$, independent $t$ and $x$, $(\lambda_0,\lambda_0^d,\beta_{20},\beta_{20}^d,\gamma_0)=(0.3,0.8,1,0.5,0.5)$

n = 196 (entries are Med (Std)):

 σ_e  Est.     γ              λ              λ^d            β_2            β_2^d
 1    (5)    0.488 (0.043)  0.264 (0.168)  0.744 (0.187)  1.019 (0.149)  0.458 (0.173)
 1    (5)b   0.488 (0.042)  0.278 (0.165)  0.750 (0.178)  1.019 (0.147)  0.458 (0.168)
 1    (10)   0.487 (0.044)  0.266 (0.167)  0.738 (0.186)  1.018 (0.149)  0.468 (0.172)
 1    (10)b  0.487 (0.042)  0.279 (0.165)  0.748 (0.178)  1.020 (0.147)  0.470 (0.165)
 1.5  (5)    0.486 (0.043)  0.265 (0.168)  0.737 (0.186)  1.025 (0.150)  0.456 (0.174)
 1.5  (5)b   0.488 (0.043)  0.279 (0.166)  0.748 (0.177)  1.018 (0.148)  0.458 (0.168)
 1.5  (10)   0.490 (0.044)  0.263 (0.167)  0.738 (0.185)  1.024 (0.149)  0.465 (0.173)
 1.5  (10)b  0.488 (0.042)  0.274 (0.167)  0.750 (0.177)  1.020 (0.148)  0.470 (0.167)
 2    (5)    0.485 (0.044)  0.257 (0.170)  0.740 (0.190)  1.035 (0.152)  0.450 (0.175)
 2    (5)b   0.486 (0.043)  0.276 (0.168)  0.746 (0.180)  1.020 (0.149)  0.456 (0.170)
 2    (10)   0.484 (0.045)  0.258 (0.170)  0.740 (0.187)  1.033 (0.149)  0.453 (0.173)
 2    (10)b  0.490 (0.042)  0.280 (0.166)  0.747 (0.180)  1.022 (0.147)  0.465 (0.168)

n = 400 (entries are Med (Std)):

 σ_e  Est.     γ              λ              λ^d            β_2            β_2^d
 1    (5)    0.494 (0.024)  0.274 (0.119)  0.752 (0.128)  1.010 (0.104)  0.477 (0.122)
 1    (5)b   0.495 (0.023)  0.290 (0.115)  0.760 (0.125)  1.008 (0.102)  0.480 (0.118)
 1    (10)   0.495 (0.024)  0.275 (0.118)  0.752 (0.130)  1.010 (0.103)  0.477 (0.122)
 1    (10)b  0.495 (0.024)  0.292 (0.115)  0.758 (0.125)  1.007 (0.102)  0.481 (0.118)
 1.5  (5)    0.494 (0.025)  0.276 (0.118)  0.752 (0.130)  1.008 (0.104)  0.475 (0.123)
 1.5  (5)b   0.493 (0.024)  0.282 (0.112)  0.760 (0.126)  1.008 (0.104)  0.479 (0.119)
 1.5  (10)   0.493 (0.025)  0.271 (0.116)  0.751 (0.131)  1.008 (0.104)  0.477 (0.123)
 1.5  (10)b  0.493 (0.024)  0.287 (0.112)  0.758 (0.126)  1.007 (0.104)  0.481 (0.121)
 2    (5)    0.493 (0.026)  0.274 (0.117)  0.753 (0.129)  1.008 (0.105)  0.478 (0.122)
 2    (5)b   0.493 (0.025)  0.290 (0.113)  0.760 (0.125)  1.007 (0.104)  0.481 (0.121)
 2    (10)   0.493 (0.025)  0.274 (0.115)  0.753 (0.129)  1.008 (0.105)  0.475 (0.121)
 2    (10)b  0.493 (0.024)  0.290 (0.112)  0.760 (0.125)  1.007 (0.104)  0.480 (0.120)

Note: The estimators (5), (5)b, (10) and (10)b refer to the SGMME with five iterations, its corresponding feasible best SGMME, the SGMME with ten iterations, and its corresponding feasible best SGMME, respectively. The empirical standard deviation is computed as the interquartile range (the difference between the 3/4 quantile and the 1/4 quantile) divided by 1.35.
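The "Std" entries described in the note can be reproduced from simulation draws as follows (a small sketch; 1.35 is approximately the interquartile range of a standard normal):

```python
import numpy as np

def robust_std(draws):
    """Empirical 'Std' used in Tables 1-4: interquartile range / 1.35,
    a robust proxy for the standard deviation across Monte Carlo draws."""
    q75, q25 = np.percentile(draws, [75, 25])
    return (q75 - q25) / 1.35
```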

Table 1: (continued)

n = 900 (entries are Med (Std)):

 σ_e  Est.     γ              λ              λ^d            β_2            β_2^d
 1    (5)    0.495 (0.017)  0.281 (0.085)  0.772 (0.095)  1.007 (0.075)  0.483 (0.086)
 1    (5)b   0.497 (0.014)  0.292 (0.076)  0.774 (0.086)  1.005 (0.067)  0.486 (0.077)
 1    (10)   0.495 (0.017)  0.284 (0.084)  0.772 (0.094)  1.007 (0.074)  0.483 (0.086)
 1    (10)b  0.497 (0.015)  0.292 (0.076)  0.777 (0.085)  1.006 (0.067)  0.486 (0.078)
 1.5  (5)    0.494 (0.018)  0.283 (0.086)  0.773 (0.096)  1.006 (0.075)  0.482 (0.087)
 1.5  (5)b   0.497 (0.015)  0.292 (0.077)  0.776 (0.087)  1.005 (0.068)  0.486 (0.078)
 1.5  (10)   0.494 (0.018)  0.282 (0.085)  0.773 (0.095)  1.006 (0.074)  0.482 (0.087)
 1.5  (10)b  0.497 (0.016)  0.294 (0.077)  0.776 (0.086)  1.005 (0.069)  0.486 (0.079)
 2    (5)    0.496 (0.018)  0.280 (0.085)  0.772 (0.095)  1.007 (0.073)  0.480 (0.088)
 2    (5)b   0.497 (0.015)  0.291 (0.077)  0.776 (0.086)  1.007 (0.067)  0.485 (0.078)
 2    (10)   0.495 (0.017)  0.282 (0.085)  0.772 (0.096)  1.007 (0.073)  0.481 (0.087)
 2    (10)b  0.497 (0.015)  0.292 (0.076)  0.775 (0.085)  1.006 (0.067)  0.485 (0.078)

n = 1600 (entries are Med (Std)):

 σ_e  Est.     γ              λ              λ^d            β_2            β_2^d
 1    (5)    0.498 (0.010)  0.294 (0.060)  0.780 (0.065)  1.005 (0.053)  0.486 (0.060)
 1    (5)b   0.499 (0.009)  0.296 (0.056)  0.787 (0.058)  1.004 (0.046)  0.490 (0.056)
 1    (10)   0.498 (0.010)  0.294 (0.061)  0.781 (0.064)  1.005 (0.052)  0.486 (0.059)
 1    (10)b  0.500 (0.008)  0.295 (0.056)  0.787 (0.057)  1.004 (0.044)  0.489 (0.055)
 1.5  (5)    0.499 (0.009)  0.295 (0.061)  0.780 (0.065)  1.007 (0.053)  0.485 (0.060)
 1.5  (5)b   0.499 (0.008)  0.297 (0.057)  0.786 (0.059)  1.004 (0.046)  0.491 (0.056)
 1.5  (10)   0.498 (0.009)  0.296 (0.061)  0.780 (0.064)  1.006 (0.052)  0.487 (0.060)
 1.5  (10)b  0.499 (0.008)  0.296 (0.057)  0.786 (0.057)  1.004 (0.046)  0.491 (0.055)
 2    (5)    0.498 (0.010)  0.295 (0.060)  0.781 (0.066)  1.007 (0.055)  0.484 (0.061)
 2    (5)b   0.499 (0.008)  0.295 (0.057)  0.786 (0.059)  1.004 (0.047)  0.490 (0.057)
 2    (10)   0.498 (0.010)  0.295 (0.060)  0.780 (0.066)  1.007 (0.054)  0.484 (0.060)
 2    (10)b  0.498 (0.008)  0.295 (0.056)  0.785 (0.058)  1.004 (0.047)  0.490 (0.057)

Note: The estimators (5), (5)b, (10) and (10)b refer to the SGMME with five iterations, its corresponding feasible best SGMME, the SGMME with ten iterations, and its corresponding feasible best SGMME, respectively. The empirical standard deviation is computed as the interquartile range (the difference between the 3/4 quantile and the 1/4 quantile) divided by 1.35.

Table 2: Threshold variable $t\sim N(1,1)$, $t=x$, $(\lambda_0,\lambda_0^d,\beta_{20},\beta_{20}^d,\gamma_0)=(0.3,0.8,1,0.5,0.5)$

n = 196 (entries are Med (Std)):

 σ_e  Est.     γ              λ              λ^d            β_2            β_2^d
 1    (5)    0.484 (0.062)  0.262 (0.228)  0.727 (0.248)  1.020 (0.320)  0.564 (0.354)
 1    (5)b   0.488 (0.058)  0.268 (0.219)  0.736 (0.240)  1.018 (0.302)  0.557 (0.344)
 1    (10)   0.485 (0.061)  0.263 (0.225)  0.729 (0.246)  1.019 (0.316)  0.565 (0.350)
 1    (10)b  0.489 (0.056)  0.270 (0.216)  0.738 (0.238)  1.016 (0.295)  0.554 (0.340)
 1.5  (5)    0.484 (0.062)  0.262 (0.227)  0.728 (0.248)  1.020 (0.329)  0.567 (0.356)
 1.5  (5)b   0.489 (0.058)  0.267 (0.219)  0.736 (0.241)  1.018 (0.312)  0.556 (0.346)
 1.5  (10)   0.485 (0.061)  0.263 (0.226)  0.729 (0.245)  1.019 (0.326)  0.560 (0.355)
 1.5  (10)b  0.488 (0.057)  0.269 (0.217)  0.737 (0.238)  1.018 (0.310)  0.555 (0.348)
 2    (5)    0.480 (0.064)  0.260 (0.228)  0.728 (0.250)  1.021 (0.332)  0.570 (0.357)
 2    (5)b   0.486 (0.060)  0.265 (0.220)  0.735 (0.243)  1.020 (0.321)  0.557 (0.346)
 2    (10)   0.481 (0.063)  0.261 (0.226)  0.727 (0.249)  1.020 (0.330)  0.568 (0.356)
 2    (10)b  0.486 (0.058)  0.266 (0.217)  0.735 (0.243)  1.018 (0.319)  0.557 (0.348)

n = 400 (entries are Med (Std)):

 σ_e  Est.     γ              λ              λ^d            β_2            β_2^d
 1    (5)    0.498 (0.029)  0.276 (0.154)  0.750 (0.175)  1.015 (0.240)  0.540 (0.254)
 1    (5)b   0.499 (0.029)  0.285 (0.150)  0.758 (0.163)  1.013 (0.229)  0.534 (0.246)
 1    (10)   0.498 (0.029)  0.277 (0.153)  0.751 (0.174)  1.015 (0.238)  0.537 (0.252)
 1    (10)b  0.499 (0.028)  0.286 (0.149)  0.757 (0.160)  1.014 (0.228)  0.532 (0.243)
 1.5  (5)    0.497 (0.029)  0.277 (0.154)  0.751 (0.175)  1.017 (0.240)  0.538 (0.254)
 1.5  (5)b   0.497 (0.030)  0.286 (0.151)  0.758 (0.163)  1.015 (0.228)  0.533 (0.247)
 1.5  (10)   0.497 (0.030)  0.276 (0.154)  0.751 (0.173)  1.016 (0.239)  0.538 (0.254)
 1.5  (10)b  0.498 (0.030)  0.286 (0.151)  0.757 (0.162)  1.016 (0.228)  0.534 (0.245)
 2    (5)    0.497 (0.031)  0.274 (0.157)  0.747 (0.175)  1.018 (0.243)  0.540 (0.258)
 2    (5)b   0.495 (0.030)  0.280 (0.154)  0.757 (0.164)  1.016 (0.231)  0.534 (0.248)
 2    (10)   0.497 (0.030)  0.273 (0.157)  0.746 (0.176)  1.019 (0.243)  0.540 (0.258)
 2    (10)b  0.496 (0.029)  0.283 (0.153)  0.757 (0.165)  1.015 (0.230)  0.535 (0.250)

Note: The estimators (5), (5)b, (10) and (10)b refer to the SGMME with five iterations, its corresponding feasible best SGMME, the SGMME with ten iterations, and its corresponding feasible best SGMME, respectively. The empirical standard deviation is computed as the interquartile range (the difference between the 3/4 quantile and the 1/4 quantile) divided by 1.35.

Table 3: Threshold variable $t\sim\mathrm{Uniform}(1-\sqrt3,\,1+\sqrt3)$, independent $t$ and $x$, $(\lambda_0,\lambda_0^d,\beta_{20},\beta_{20}^d,\gamma_0)=(0.3,0.8,1,0.5,0.5)$

n = 196 (entries are Med (Std)):

 σ_e  Est.     γ              λ              λ^d            β_2            β_2^d
 1    (5)    0.496 (0.050)  0.250 (0.156)  0.742 (0.184)  1.018 (0.147)  0.480 (0.173)
 1    (5)b   0.496 (0.048)  0.265 (0.153)  0.747 (0.176)  1.014 (0.144)  0.485 (0.165)
 1    (10)   0.496 (0.050)  0.254 (0.156)  0.743 (0.183)  1.016 (0.146)  0.481 (0.173)
 1    (10)b  0.496 (0.049)  0.267 (0.152)  0.750 (0.175)  1.014 (0.143)  0.486 (0.164)
 1.5  (5)    0.496 (0.050)  0.252 (0.156)  0.742 (0.185)  1.018 (0.147)  0.481 (0.174)
 1.5  (5)b   0.495 (0.048)  0.265 (0.153)  0.748 (0.176)  1.015 (0.145)  0.486 (0.166)
 1.5  (10)   0.496 (0.050)  0.253 (0.156)  0.742 (0.184)  1.017 (0.147)  0.481 (0.176)
 1.5  (10)b  0.496 (0.049)  0.266 (0.153)  0.749 (0.177)  1.014 (0.146)  0.486 (0.168)
 2    (5)    0.495 (0.051)  0.254 (0.157)  0.743 (0.186)  1.019 (0.147)  0.480 (0.175)
 2    (5)b   0.495 (0.049)  0.260 (0.154)  0.748 (0.178)  1.015 (0.147)  0.484 (0.167)
 2    (10)   0.495 (0.050)  0.254 (0.157)  0.744 (0.185)  1.019 (0.148)  0.480 (0.174)
 2    (10)b  0.495 (0.049)  0.265 (0.153)  0.747 (0.177)  1.015 (0.144)  0.483 (0.169)

n = 400 (entries are Med (Std)):

 σ_e  Est.     γ              λ              λ^d            β_2            β_2^d
 1    (5)    0.498 (0.026)  0.265 (0.109)  0.760 (0.120)  1.009 (0.095)  0.486 (0.112)
 1    (5)b   0.499 (0.025)  0.276 (0.100)  0.767 (0.112)  1.008 (0.088)  0.488 (0.108)
 1    (10)   0.499 (0.025)  0.266 (0.107)  0.762 (0.120)  1.009 (0.094)  0.485 (0.111)
 1    (10)b  0.499 (0.025)  0.278 (0.099)  0.769 (0.113)  1.007 (0.087)  0.488 (0.108)
 1.5  (5)    0.498 (0.026)  0.265 (0.109)  0.761 (0.121)  1.010 (0.096)  0.486 (0.112)
 1.5  (5)b   0.498 (0.026)  0.275 (0.101)  0.767 (0.114)  1.007 (0.090)  0.486 (0.109)
 1.5  (10)   0.498 (0.026)  0.264 (0.107)  0.763 (0.121)  1.009 (0.095)  0.486 (0.111)
 1.5  (10)b  0.499 (0.025)  0.275 (0.100)  0.769 (0.114)  1.007 (0.090)  0.487 (0.109)
 2    (5)    0.498 (0.026)  0.262 (0.110)  0.760 (0.122)  1.011 (0.097)  0.487 (0.114)
 2    (5)b   0.498 (0.026)  0.273 (0.104)  0.765 (0.115)  1.009 (0.090)  0.486 (0.110)
 2    (10)   0.498 (0.026)  0.262 (0.110)  0.758 (0.124)  1.010 (0.097)  0.485 (0.115)
 2    (10)b  0.498 (0.026)  0.273 (0.102)  0.765 (0.118)  1.011 (0.090)  0.485 (0.109)

Note: The estimators (5), (5)b, (10) and (10)b refer to the SGMME with five iterations, its corresponding feasible best SGMME, the SGMME with ten iterations, and its corresponding feasible best SGMME, respectively. The empirical standard deviation is computed as the interquartile range (the difference between the 3/4 quantile and the 1/4 quantile) divided by 1.35.

Table 4: Threshold variable $t\sim\chi^2(1)/\sqrt2$, independent $t$ and $x$, $(\lambda_0,\lambda_0^d,\beta_{20},\beta_{20}^d,\gamma_0)=(0.3,0.8,1,0.5,0.5)$

n = 196 (entries are Med (Std)):

 σ_e  Est.     γ              λ              λ^d            β_2            β_2^d
 1    (5)    0.536 (0.053)  0.260 (0.144)  0.736 (0.204)  1.035 (0.158)  0.452 (0.168)
 1    (5)b   0.534 (0.050)  0.261 (0.137)  0.740 (0.197)  1.032 (0.156)  0.458 (0.164)
 1    (10)   0.535 (0.052)  0.261 (0.144)  0.737 (0.203)  1.032 (0.157)  0.458 (0.167)
 1    (10)b  0.534 (0.050)  0.266 (0.138)  0.741 (0.194)  1.030 (0.155)  0.460 (0.163)
 1.5  (5)    0.535 (0.053)  0.261 (0.147)  0.735 (0.204)  1.037 (0.158)  0.452 (0.168)
 1.5  (5)b   0.536 (0.051)  0.265 (0.139)  0.740 (0.196)  1.034 (0.155)  0.459 (0.165)
 1.5  (10)   0.536 (0.053)  0.260 (0.147)  0.736 (0.204)  1.037 (0.158)  0.453 (0.168)
 1.5  (10)b  0.539 (0.051)  0.264 (0.139)  0.739 (0.195)  1.035 (0.155)  0.457 (0.165)
 2    (5)    0.539 (0.054)  0.256 (0.150)  0.734 (0.205)  1.039 (0.158)  0.451 (0.171)
 2    (5)b   0.540 (0.051)  0.255 (0.144)  0.738 (0.199)  1.036 (0.157)  0.458 (0.167)
 2    (10)   0.538 (0.054)  0.257 (0.151)  0.732 (0.205)  1.039 (0.159)  0.452 (0.170)
 2    (10)b  0.539 (0.054)  0.255 (0.145)  0.736 (0.198)  1.037 (0.157)  0.457 (0.167)

n = 400 (entries are Med (Std)):

 σ_e  Est.     γ              λ              λ^d            β_2            β_2^d
 1    (5)    0.524 (0.026)  0.275 (0.095)  0.755 (0.130)  1.024 (0.115)  0.475 (0.114)
 1    (5)b   0.523 (0.025)  0.280 (0.092)  0.761 (0.127)  1.018 (0.114)  0.478 (0.112)
 1    (10)   0.524 (0.025)  0.276 (0.095)  0.758 (0.128)  1.023 (0.113)  0.476 (0.113)
 1    (10)b  0.522 (0.024)  0.279 (0.091)  0.765 (0.124)  1.015 (0.109)  0.479 (0.111)
 1.5  (5)    0.524 (0.027)  0.270 (0.096)  0.755 (0.130)  1.025 (0.117)  0.472 (0.116)
 1.5  (5)b   0.527 (0.025)  0.272 (0.094)  0.763 (0.127)  1.019 (0.116)  0.474 (0.114)
 1.5  (10)   0.527 (0.027)  0.268 (0.096)  0.754 (0.129)  1.027 (0.117)  0.472 (0.115)
 1.5  (10)b  0.529 (0.026)  0.271 (0.094)  0.760 (0.124)  1.018 (0.115)  0.472 (0.115)
 2    (5)    0.530 (0.028)  0.260 (0.098)  0.749 (0.132)  1.028 (0.120)  0.478 (0.122)
 2    (5)b   0.531 (0.027)  0.265 (0.097)  0.751 (0.128)  1.021 (0.118)  0.478 (0.120)
 2    (10)   0.532 (0.030)  0.268 (0.098)  0.749 (0.132)  1.028 (0.121)  0.477 (0.122)
 2    (10)b  0.532 (0.030)  0.261 (0.096)  0.751 (0.129)  1.023 (0.118)  0.476 (0.122)

Note: The estimators (5), (5)b, (10) and (10)b refer to the SGMME with five iterations, its corresponding feasible best SGMME, the SGMME with ten iterations, and its corresponding feasible best SGMME, respectively. The empirical standard deviation is computed as the interquartile range (the difference between the 3/4 quantile and the 1/4 quantile) divided by 1.35.
