standardised to have a mean of zero and a standard deviation of one so that all variables
contribute evenly to the model, to aid in comparing the variables, and to make it easier to interpret
the results of the logistic regression (Hosmer & Lemeshow, 2000).
Advice
In summary, my personal advice (and I have respect for conflicting opinions) is:
I never worry about whether (1) is true. I assume the logit link is OK.
If I think the model is reasonably specified, I use the ML variance estimator for logistic
regression.
Only if I have good reason to believe that the model is poorly specified would I use the robust
variance estimator. That is, if the model fails goodness-of-fit tests, etc. Sometimes one just has
to live with missing predictors and badly fitting models because data were collected for only a
few predictors. In this case, I'd use the robust variance estimator.
And, obviously, I'd use the robust variance estimator if I had clustered data.
This recommendation is in contrast to the advice I'd give for linear regression, for which I'd
say always use the robust variance estimator.
The basic idea of logistic regression is to use the mechanism already developed for linear
regression by modeling the probability pi using a linear predictor function, i.e. a linear combination of
the explanatory variables and a set of regression coefficients that are specific to the model at hand but
the same for all trials. The linear predictor function for a particular data point i is written as:

f(i) = β0 + β1x1,i + ... + βmxm,i,

where the βj are regression coefficients indicating the relative effect of a particular explanatory variable
on the outcome.
The regression coefficients β0, β1, ..., βm are grouped into a single vector β of size m + 1.
For each data point i, an additional explanatory pseudo-variable x0,i is added, with a fixed value
of 1, corresponding to the intercept coefficient β0.
The resulting explanatory variables x0,i, x1,i, ..., xm,i are then grouped into a single vector Xi of
size m + 1.
Written using the more compact notation described above, this is:

logit(pi) = ln[pi/(1 − pi)] = β · Xi
The intuition for transforming pi using the logit function (the natural log of the odds) was
explained above. It also has the practical effect of converting the probability (which is
bounded to be between 0 and 1) to a variable that ranges over (−∞, +∞), thereby matching
the potential range of the linear prediction function on the right side of the equation.
Note that both the probabilities pi and the regression coefficients β are unobserved, and
the means of determining them is not part of the model itself. They are typically
determined by some sort of optimization procedure, e.g. maximum likelihood
estimation, that finds values that best fit the observed data (i.e. that give the most
accurate predictions for the data already observed), usually subject
to regularization conditions that seek to exclude unlikely values, e.g. extremely large
values for any of the regression coefficients. The use of a regularization condition is
equivalent to doing maximum a posteriori (MAP) estimation, an extension of
maximum likelihood. (Regularization is most commonly done using a squared
regularizing function, which is equivalent to placing a zero-mean Gaussian prior
distribution on the coefficients, but other regularizers are also possible.) Whether or
not regularization is used, it is usually not possible to find a closed-form solution;
instead, an iterative numerical method must be used, such as iteratively reweighted
least squares (IRLS) or, more commonly these days, a quasi-Newton method such as
the L-BFGS method.
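As a concrete sketch of the fitting step, the Newton-Raphson updates (which coincide with IRLS for the logit link) can be written out by hand for a single predictor. The data below are invented purely for illustration:

```python
import math

# Toy data (hypothetical): one predictor and a binary outcome.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0,   0,   1,   0,   1,   0,   1,   1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Newton-Raphson on the log-likelihood: beta <- beta + H^{-1} g, where
# g is the score vector and H the negative Hessian. For the logit link
# this is exactly iteratively reweighted least squares (IRLS).
b0, b1 = 0.0, 0.0
for _ in range(25):
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        w = p * (1.0 - p)        # IRLS weight for this observation
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (h00 * g1 - h01 * g0) / det

print(round(b0, 3), round(b1, 3))
```

At convergence the score is (numerically) zero, which is the first-order condition the quoted text refers to as "values that best fit the observed data".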
The interpretation of the βj parameter estimates is as the additive effect on the log of
the odds for a unit change in the jth explanatory variable. In the case of a
dichotomous explanatory variable, for instance gender, e^β is the estimate of the odds ratio of
having the outcome for, say, males compared with females.
An equivalent formula uses the inverse of the logit function, which is the logistic
function, i.e.:

pi = 1 / (1 + e^(−β · Xi))
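Both readings of a coefficient, as an additive effect on the log-odds and as a multiplicative odds ratio via e^β, can be checked numerically. The intercept and coefficient below are invented values, not estimates from any dataset:

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(z):              # the logistic function
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical estimates: intercept a, coefficient b on a 0/1 dummy.
a, b = -1.2, 0.847

p0 = inv_logit(a)              # probability when the dummy is 0
p1 = inv_logit(a + b)          # probability when the dummy is 1

# b is the additive effect on the log-odds ...
diff = logit(p1) - logit(p0)

# ... equivalently, exp(b) is the odds ratio.
odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
```

The two identities hold exactly (up to floating point) because the logistic function is the inverse of the logit.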
The linear probability model (LPM) is by far the simplest way of dealing with binary
dependent variables, and it is based on an assumption that the probability of an
event occurring, Pi, is linearly related to a set of explanatory variables x2i, x3i, ..., xki (Brooks, 2008:512):

Pi = p(yi = 1) = β1 + β2x2i + β3x3i + ... + βkxki + ui,   i = 1, ..., N   (11.1)

The actual probabilities cannot be observed, so we would estimate a
model where the outcomes, yi (the series of zeros and ones), would be the dependent
variable. This is then a linear regression model and would be estimated by OLS. The
set of explanatory variables could include either quantitative variables or dummies or
both. The fitted values from this regression are the estimated probabilities for yi = 1
for each observation i. The slope estimates for the linear probability model can be
interpreted as the change in the probability that the dependent variable will equal 1
for a one-unit change in a given explanatory variable, holding the effect of all other
explanatory variables fixed. Suppose, for example, that we wanted to model the
probability that a firm i will pay a dividend (yi = 1) as a function of its market
capitalisation (x2i, measured in millions of US dollars).
While the linear probability model is simple to estimate and intuitive to interpret, the
diagram should immediately signal a problem with this setup. For any firm whose
value is less than $25m, the model-predicted probability of dividend payment is
negative, while for any firm worth more than $88m, the probability is greater than
one. Clearly, such predictions cannot be allowed to stand, since the probabilities
should lie within the range (0,1). An obvious solution is to truncate the probabilities at
0 or 1, so that a probability of 0.3, say, would be set to zero, and a probability of,
say, 1.2 would be set to 1. However, there are at least two reasons why this is still not
adequate: (1) The process of truncation will result in too many observations for which
the estimated probabilities are exactly zero or one. (2) More importantly, it is simply
not plausible to suggest that the firm's probability of paying a dividend is either
exactly zero or exactly one. Are we really certain that very small firms will definitely
never pay a dividend and that large firms will always make a payout? Probably not, so
a different kind of model is usually used for binary dependent variables -- either a
logit or a probit specification. These
approaches will be discussed in the following sections. But before moving on, it is
worth noting that the LPM also suffers from a couple of more standard econometric
problems that we have examined in previous chapters. First, since the dependent
variable takes on only one of two values, for given (fixed in repeated samples) values of
the explanatory variables, the disturbance term will also take on only one of two
values. Consider again equation (11.1). If yi = 1, then by definition

ui = 1 − β1 − β2x2i − ... − βkxki,

while if yi = 0,

ui = −β1 − β2x2i − ... − βkxki.
Hence the error term cannot plausibly be assumed to be normally distributed. Since
ui changes systematically with the explanatory variables, the disturbances will also
be heteroscedastic. It is therefore essential that heteroscedasticity-robust standard
errors are always used in the context of limited dependent variable models.
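The two-valued disturbance and the resulting heteroscedasticity can be verified directly: given Pi, the disturbance equals 1 − Pi with probability Pi and −Pi with probability 1 − Pi, so its mean is zero but its variance Pi(1 − Pi) moves with the explanatory variables.

```python
# Moments of the LPM disturbance for a given probability p.
def disturbance_moments(p):
    mean = p * (1 - p) + (1 - p) * (-p)           # E[u]
    var = p * (1 - p) ** 2 + (1 - p) * p ** 2     # Var(u)
    return mean, var

for p in (0.1, 0.3, 0.5, 0.8):
    mean, var = disturbance_moments(p)
    # mean is zero, but var = p*(1-p) changes with p: heteroscedasticity
    print(p, round(mean, 12), round(var, 4))
```

Since the variance simplifies algebraically to p(1 − p), it is maximal at p = 0.5 and shrinks toward the boundaries.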
Both the logit and probit model approaches are able to overcome the limitation of the
LPM that it can produce estimated probabilities that are negative or greater than one.
They do this by using a function that effectively transforms the regression model so
that the fitted values are bounded within the (0,1) interval. Visually, the fitted
regression model will appear as an S-shape rather than a straight line, as was the
case for the LPM. This is shown in figure 11.2. Brooks, 514
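The boundary problem described above is easy to reproduce. Fitting the LPM by OLS on made-up firm data (the numbers are hypothetical, not Brooks's actual example) yields fitted "probabilities" outside [0, 1]:

```python
# Hypothetical data: market capitalisation ($m) and a dividend dummy.
caps = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
paid = [0,  0,  0,  0,  1,  0,  1,  1,  1,  1]

n = len(caps)
mean_x = sum(caps) / n
mean_y = sum(paid) / n

# OLS slope and intercept for a single regressor
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(caps, paid))
         / sum((x - mean_x) ** 2 for x in caps))
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * x for x in caps]
# Small firms get a negative "probability", big firms one above 1.
print(round(min(fitted), 3), round(max(fitted), 3))
```

The smallest fitted value is negative and the largest exceeds one, which is exactly the failure the logit and probit transformations are designed to rule out.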
Log [p/(1-p)] = a + bx
As Wooldridge (533) puts it, for estimating limited dependent variable models,
maximum likelihood methods are indispensable.
Yes, I do get grumpy about some of the things I see so-called "applied econometricians" doing
all of the time. For instance, see my Gripe of the Day post back in 2011. Sometimes I feel as if I
could produce a post with that title almost every day!
The following facts are widely known (e.g., check any recent edition of Greene's text) and it's
hard to believe that anyone could get through a grad. level course in econometrics and not be
aware of them:
In the case of a linear regression model, heteroskedastic errors render the OLS estimator, b, of
the coefficient vector, β, inefficient. However, this estimator is still unbiased and weakly consistent.
In this same linear model, and still using OLS, the usual estimator of the covariance matrix of b
is an inconsistent estimator of the true covariance matrix of b. Consequently, if the standard errors of
the elements of b are computed in the usual way, they will be inconsistent estimators of the true standard
deviations of the elements of b.
For this reason, we often use White's "heteroskedasticity-consistent" estimator for the covariance
matrix of b, if the presence of heteroskedastic errors is suspected.
This covariance estimator is still consistent, even if the errors are actually homoskedastic.
In the case of the linear regression model, this makes sense. Whether the errors are
homoskedastic or heteroskedastic, both the OLS coefficient estimators and White's standard errors are
consistent.
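For the linear case, the usual and the White (HC0) variance estimators of the slope differ only in how the squared residuals enter. A minimal sketch for one regressor, on invented data whose scatter widens with x:

```python
import math

# Invented data with residual spread that grows with x.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1.1, 2.3, 2.8, 4.5, 4.6, 6.9, 6.5, 8.8, 8.4, 11.0]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
intercept = my - slope * mx
resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Usual (homoskedastic) variance of the slope: s^2 / Sxx
s2 = sum(e ** 2 for e in resid) / (n - 2)
se_usual = math.sqrt(s2 / sxx)

# White HC0 "sandwich": sum((x - mx)^2 * e^2) / Sxx^2
se_hc0 = math.sqrt(sum((x - mx) ** 2 * e ** 2
                       for x, e in zip(xs, resid)) / sxx ** 2)

print(round(se_usual, 4), round(se_hc0, 4))
```

Both are consistent for the true standard deviation under homoskedasticity; only the sandwich version remains consistent when the error variance depends on x. Crucially, in this linear case the slope estimate itself stays consistent either way, which is the contrast with the nonlinear models discussed next.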
However, in the case of a model that is nonlinear in the parameters:
The MLE of the parameter vector is biased and inconsistent if the errors are heteroskedastic
(unless the likelihood function is modified to correctly take into account the precise form of
heteroskedasticity).
This stands in stark contrast to the situation above, for the linear model.
The MLE of the asymptotic covariance matrix of the MLE of the parameter vector is also
inconsistent, as in the case of the linear model.
Obvious examples of this are Logit and Probit models, which are nonlinear in the parameters, and
are usually estimated by MLE.
I've made this point in at least one previous post. The results relating to nonlinear models are really
well-known, and this is why it's extremely important to test for model mis-specification (such as
heteroskedasticity) when estimating models such as Logit, Probit, Tobit, etc. Then, if need be, the model
can be modified to take the heteroskedasticity into account before we estimate the parameters. For
more information on such tests, and the associated references, see this page on my professional
website.
Unfortunately, it's unusual to see "applied econometricians" pay any attention to this! They tend to just
do one of two things. They either
1. use Logit or Probit, but report the "heteroskedasticity-consistent" standard errors that their
favourite econometrics package conveniently (but misleadingly) computes for them. This involves a
covariance estimator along the lines of White's "sandwich estimator". Or, they
2. estimate a "linear probability model" (i.e., just use OLS, even though the dependent variable is a
binary dummy variable) and report the "het.-consistent" standard errors.
If they follow approach 2, these folks defend themselves by saying that "you get essentially the same
estimated marginal effects if you use OLS as opposed to Probit or Logit." I've said my piece about this
attitude previously (here, here, here, and here), and I won't go over it again here.
The "robust" standard errors are being reported to cover the possibility that the model's errors may be
heteroskedastic. But if that's the case, the parameter estimates are inconsistent. What use is a
consistent standard error when the point estimate is inconsistent? Not much!!
This point is laid out pretty clearly in Greene (2012, pp. 692-693), for example. Here's what he
has to say:
"...the probit (Q-) maximum likelihood estimator is not consistent in the presence of any form
of heteroscedasticity, unmeasured heterogeneity, omitted variables (even if they are
orthogonal to the included ones), nonlinearity of the form of the index, or an error in the
distributional assumption [ with some narrow exceptions as described by Ruud (198)]. Thus, in
almost any case, the sandwich estimator provides an appropriate asymptotic covariance
matrix for an estimator that is biased in an unknown direction." (My underlining; DG.) "White
raises this issue explicitly, although it seems to receive very little attention in the
literature." ... "His very useful result is that if the QMLE converges to a probability limit,
then the sandwich estimator can, under certain circumstances, be used to estimate the
asymptotic covariance matrix of that estimator. But there is no guarantee that the
QMLE will converge to anything interesting or useful. Simply computing a robust covariance
matrix for an otherwise inconsistent estimator does not give it redemption. Consequently, the
virtue of a robust covariance matrix in this setting is unclear."
Back in July 2006, on the R Help list, Robert Duval had this to say:
"This discussion leads to another point which is more subtle, but more important...
You can always get Huber-White (a.k.a robust) estimators of the standard errors even in non-linear
models like the logistic regression. However, if you believe your errors do not satisfy the standard
assumptions of the model, then you should not be running that model as this might lead to biased
parameter estimates.
For instance, in the linear regression model you have consistent parameter estimates independently of
whether the errors are heteroskedastic or not. However, in the case of non-linear models it is usually the
case that heteroskedasticity will lead to biased parameter estimates (unless you fix it explicitly
somehow).
Stata is famous for providing Huber-White std. errors in most of their regression estimates, whether
linear or non-linear. But this is nonsensical in the non-linear models since in these cases you would be
consistently estimating the standard errors of inconsistent parameters.
This point, and potential solutions to this problem, is nicely discussed in Wooldridge's Econometric Analysis
of Cross Section and Panel Data."
Amen to that!
Regrettably, it's not just Stata that encourages questionable practices in this respect. These
same options are also available in EViews, for example.
Reference
Greene, W. H., 2012. Econometric Analysis. Prentice Hall, Upper Saddle River, NJ.
x* = (x-m)/sd
To illustrate the process of standardization, we will use the High School and Beyond
dataset (hsb2). We will create standardized versions of three variables, math,
science, and socst. These variables contain students' scores on tests of knowledge of
mathematics (math), science (science), and social studies (socst). First, we will use the
summarize command (abbreviated as sum below) to get the mean and standard
deviation for each variable.
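The same two steps, get the mean and standard deviation, then generate z-scores, look like this outside Stata. The scores below are made up stand-ins, not the actual hsb2 values:

```python
import statistics

# Hypothetical test scores standing in for math, science and socst.
scores = {
    "math":    [41, 53, 54, 47, 57, 51, 42, 45, 54, 52],
    "science": [47, 63, 58, 53, 50, 53, 39, 58, 55, 50],
    "socst":   [57, 61, 31, 56, 46, 61, 36, 51, 51, 56],
}

def standardize(values):
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample sd, as -summarize- reports
    return [(v - mean) / sd for v in values]

# Standardized versions of all three variables.
z = {name: standardize(vals) for name, vals in scores.items()}
```

By construction, each standardized variable has mean 0 and standard deviation 1, which is what makes the three tests directly comparable.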
There are many reasons for transformation. The list here is not
comprehensive.
1. Convenience
2. Reducing skewness
3. Equal spreads
4. Linear relationships
5. Additive relationships
If you are looking at just one variable, 1, 2 and 3 are relevant, while
if you are looking at two or more variables, 4 and 5 are more important.
However, transformations that achieve 4 and 5 very often achieve 2 and 3.
value - level
standardised value = -------------.
spread
Standardised values have level 0 and spread 1 and have no units: hence
standardisation is useful for comparing variables expressed in different
units. Most commonly a standard score is calculated using the mean and
standard deviation (sd) of a variable:
x - mean of x
z = -------------.
sd of x
Standardisation makes no difference to the shape of a distribution.
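That shape-invariance is easy to confirm: any location- and scale-free statistic, such as moment skewness, is unchanged by standardisation. A small check on made-up right-skewed data:

```python
import statistics

def skewness(values):
    # Population moment skewness: E[(x - m)^3] / sd^3
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    n = len(values)
    return sum((v - m) ** 3 for v in values) / (n * sd ** 3)

data = [1, 1, 2, 2, 3, 3, 3, 4, 10, 15]   # right-skewed toy data

m, sd = statistics.mean(data), statistics.pstdev(data)
z_scores = [(v - m) / sd for v in data]

# Same skewness before and after standardising: the shape is untouched.
print(round(skewness(data), 6), round(skewness(z_scores), 6))
```

Only the level and spread change; anything a transformation like the log does to reduce skewness, standardisation deliberately does not do.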
Dear hyojoung,
What you could do is estimate a model with -hetprob- and -probit- and
do a likelihood ratio test (-lrtest-). This is an test for
heteroscedasticity in probit regression, which is very close to
logisitic regression, except you don't get the nice odds ratios.
Logistic and probit regression are so close that the choice between
them is often based on practical grounds and tradition within the
discipline and not on substantial grounds. The pressence of -hetprob-
would be such a practical reason why you might want to switch to
probit in this case. If you realy want to use logit, and want to put
up with indirect evidence than the comparison of -hetprob- and -
probit- would in my eyes be more convincing indirect evidence than -
hettest- on a linear probability model.
Maarten
http://www.statalist.org/forums/forum/general-stata-discussion/general/5564-heteroskedasticity-test-for-logit-and-logistic-models
http://www.stata.com/manuals13/rhetprobit.pdf
ln[p/(1-p)] = a + BX + e, or
[p/(1-p)] = exp(a + BX + e)
where p is the probability of the event, a is the intercept, B is the vector of slope coefficients, X is the set of explanatory variables, and e is the error term.