Suppose you were given a random sample of observations from a normal distribution, and you wish to use the sample data to estimate the population mean. This is a simple example of an estimation problem. The number you are seeking to estimate is a population parameter. A parameter is a number, generally unknown, that describes some interesting characteristic of the population. In a more general setting, the generic notation for an unknown population parameter is $\theta$. Suppose you decide to use the sample mean of the data, $\bar{X}$, to guess the true value of $\theta$. $\bar{X}$ is an example of an estimator; the generic notation for an estimator is $\hat{\theta}$. More generally, an estimator is a function of sample data used to guess an unknown population parameter. The difficulty arises because there are many plausible ways to use the sample data to guess the same unknown population parameter. Since the normal distribution is symmetric, the population mean is the same as the population median. Therefore it makes sense to believe you could also guess using the sample median. For that matter, because the normal is symmetric, you could equally plausibly use the sample midrange, the average of the largest and smallest observations in the data. Which of these alternatives is best, and what exactly does one mean by "best"?
Desirable Properties of Estimators: What do we mean by best?
There are three properties of estimators that are commonly used to judge their quality.
1) Unbiasedness. An unbiased estimator has no tendency to over- or underestimate the truth. The mathematical statement of unbiasedness is that $E(\hat{\theta}) = \theta$. $E(\hat{\theta})$ is the average guess, and unbiasedness means the average guess is correct. Average over what, you might ask? The answer is the average over all the possible samples of size $n$ that might be drawn to construct the estimate. If an estimator is biased, the size of that bias is given by $\text{Bias} = E(\hat{\theta}) - \theta$.
2) Consistency. A consistent estimator has the property that as the sample size goes to infinity, the estimator homes in on the true parameter value. To state this property more precisely: for any small $\delta > 0$, $P\big(|\hat{\theta}_n - \theta| < \delta\big) \to 1$ as $n \to \infty$. In English, the probability that the guess lands within any given small distance of the truth goes to one as the sample size grows (a simulation illustrating both of these properties follows this list).
3) Efficiency. An efficient estimator is one with a small mean squared error; this criterion is developed below, once the MSE has been defined.
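As a quick numerical illustration of the first two properties, the sketch below (my own, with an assumed true mean of 5 and standard deviation of 2) draws many samples at several sample sizes: the average of the sample means stays near the truth at every $n$ (unbiasedness), while their spread shrinks as $n$ grows (consistency).

import numpy as np

rng = np.random.default_rng(seed=2)
mu = 5.0  # assumed true population mean for this demo

for n in (10, 100, 1000):
    # 10,000 replications of X-bar at sample size n.
    xbars = rng.normal(mu, 2.0, size=(10_000, n)).mean(axis=1)
    print(f"n={n:5d}  average X-bar = {xbars.mean():.3f}  "
          f"sd of X-bar = {xbars.std():.3f}")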
Not every estimator behaves so well. The classic counterexample is the Cauchy distribution, whose density is bell-shaped like the normal but whose mean does not exist: $\int_{0}^{\infty} x f(x)\,dx = +\infty$ and $\int_{-\infty}^{0} x f(x)\,dx = -\infty$. According to the Lebesgue definition of the integral (which is the correct one for this branch of statistics), the mean would be $\infty - \infty$, which is undefined.
On account of this, the Cauchy has other bizarre properties. If you take a random sample of size $n$ and compute $\bar{X}$, $\bar{X}$ has the same distribution as any of the individual observations. In other words, if you try to discover the center of the distribution by using $\bar{X}$, increasing the sample size will do you no good at all: your estimator will have the same distribution whether you use one observation or a million observations. Suppose we call the center of the Cauchy bell curve $\theta$. This means $\bar{X}$ is not a consistent estimator of $\theta$. You can't even say $\bar{X}$ is an unbiased estimator of $\theta$, because the definition of unbiasedness is $\text{bias} = E(\hat{\theta}) - \theta = 0$, and for the Cauchy $E(\bar{X})$ does not exist.
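This pathology is easy to see numerically. The sketch below is my own illustration (standard Cauchy, arbitrary seed); it tracks the interquartile range of $\bar{X}$ rather than its standard deviation, since the Cauchy has no finite variance either.

import numpy as np

rng = np.random.default_rng(seed=3)
reps = 10_000

for n in (1, 100, 10_000):
    # Many replications of X-bar from Cauchy samples of size n.
    xbars = np.array([rng.standard_cauchy(n).mean() for _ in range(reps)])
    q1, q3 = np.percentile(xbars, [25, 75])
    # The spread refuses to shrink as n grows: X-bar has the same
    # Cauchy distribution at every sample size.
    print(f"n={n:6d}  IQR of X-bar = {q3 - q1:.2f}")

The interquartile range hovers around 2 (the IQR of a standard Cauchy) no matter how large the sample gets.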
Consider next two estimators of the population variance: the usual $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$, and the alternative $\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}$, which divides by $n$ instead of $n-1$.
How do $s^2$ and $\hat{\sigma}^2$ compare according to the criteria introduced here? If $n$ is large, the two estimators are almost the same, so to keep the difference appreciable, I have simulated 5000 samples, each with $n = 3$, and computed 5000 sample estimates using both $s^2$ and $\hat{\sigma}^2$. The sample is taken from a normal distribution with $\sigma^2 = 4$, so we want our estimators to give answers close to four. Here are histograms of the actual results.
[Figure: two histograms of the simulated estimates.
Left panel, "Observed values of s-squared": density of the 5000 values of $s^2$; x-axis "Variances", 0 to 40.
Right panel, "Observed values of sigma hat squared": density of the 5000 values of $\hat{\sigma}^2$; x-axis "sigma_hat_sq", 0 to 25.]
Since these distributions aren't symmetric, it is hard to eyeball the average value. However, here are the summary statistics.
. summarize s_squared sigma_hat_sq

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   s_squared |      5000    3.970821    3.932645      .0002    34.6301
sigma_hat_sq |      5000    2.647214    2.621763   .0001333   23.08673
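The output above appears to come from Stata's summarize command. The experiment is easy to replicate; here is a sketch in Python (the population mean and seed are my own choices, since the notes fix only $n = 3$ and $\sigma^2 = 4$; exact figures will differ from the table by simulation noise).

import numpy as np

rng = np.random.default_rng(seed=4)
n, reps, sigma2 = 3, 5000, 4.0

# 5000 samples of size 3 from a normal with variance 4.
samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))

s_squared = samples.var(axis=1, ddof=1)      # divisor n-1
sigma_hat_sq = samples.var(axis=1, ddof=0)   # divisor n

for name, est in (("s_squared", s_squared), ("sigma_hat_sq", sigma_hat_sq)):
    print(f"{name:>12}  mean = {est.mean():.4f}  sd = {est.std(ddof=1):.4f}")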
Recall that the true value of $\sigma^2$ is four, and note that the average guess when one uses $s^2$ is 3.97; when one uses $\hat{\sigma}^2$ the average guess is 2.65. This suggests (correctly) that $s^2$ is unbiased but that $\hat{\sigma}^2$ is downwardly biased, so that it tends to underestimate the true value of $\sigma^2$. Both of these estimators are consistent, however. It is easy to see that if one of them homes in on $\sigma^2$ as the sample size goes to infinity, the other must also, because the two differ by a factor of $(n-1)/n$, and $(n-1)/n \to 1$ as $n \to \infty$. If one looks at the summary table and the histograms, one can see a weakness in $s^2$: namely, that while it is correct on average, it sometimes overestimates the true value by a large margin and is considerably more variable than $\hat{\sigma}^2$, with a standard deviation of 3.93 compared to just 2.62 for $\hat{\sigma}^2$. Which estimator has the smaller mean squared error?
$\text{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big] = \text{var}(\hat{\theta}) + \big[\text{bias}(\hat{\theta})\big]^2$
Applying this to the estimator with the n-1 divisor, we can approximate the MSE by using the
results of our simulation.
$\text{MSE}(s^2) \approx (3.932)^2 + (3.971 - 4)^2 = 15.46$
Applying this to the estimator with the n divisor, we can approximate its MSE.
$\text{MSE}(\hat{\sigma}^2) \approx (2.621)^2 + (2.647 - 4)^2 = 8.70$
For efficiency, measured by MSE, we actually did better dividing by $n$ than dividing by $n-1$. As a matter of fact, there is a theorem that says using a divisor of $n+1$ actually gives the best MSE when the draws come from a normal distribution. This is a good example of a case where our criteria diverge: using a divisor of $n-1$ gives an unbiased estimate, but using a divisor of $n$ (or $n+1$) gives a more efficient estimate.
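That claim is easy to check by simulation. The sketch below (my own, using the same assumed setup as before: normal draws, $n = 3$, $\sigma^2 = 4$) estimates the MSE for divisors $n-1$, $n$, and $n+1$; the $n+1$ divisor should come out smallest.

import numpy as np

rng = np.random.default_rng(seed=5)
n, reps, sigma2 = 3, 200_000, 4.0

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
# Sum of squared deviations about each sample's mean.
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

for divisor in (n - 1, n, n + 1):
    est = ss / divisor
    mse = ((est - sigma2) ** 2).mean()
    print(f"divisor {divisor}:  MSE = {mse:.2f}")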
Actually, insisting on an unbiased estimator is often of limited value. In this example, it might really be the standard deviation that is of interest, not the variance. Recall that $\sigma = \sqrt{\sigma^2}$ and $s = \sqrt{s^2}$. In general, if $Y = f(X)$, then $E(Y) \neq f\big(E(X)\big)$ unless the function $f$ is linear. In this very example, $\sigma = 2$, but the average value of $s$ is 1.769.
. summarize stdevs

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      stdevs |      5000     1.76923    .9169594      .0157    5.88473
In other words, even though $s^2$ is an unbiased estimator of $\sigma^2$, $s$ is not an unbiased estimator of $\sigma$. However, $s$ is a consistent estimator of $\sigma$.
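As a final check, the bias in $s$ is easy to reproduce by simulation; the sketch below (my own, same assumed setup: normal draws with $\sigma = 2$, $n = 3$) shows the average $s$ landing well below 2, in line with the 1.769 reported above.

import numpy as np

rng = np.random.default_rng(seed=6)
samples = rng.normal(loc=0.0, scale=2.0, size=(5000, 3))

# 5000 sample standard deviations, each from a sample of size 3.
s = np.sqrt(samples.var(axis=1, ddof=1))
print(f"average s = {s.mean():.3f}  (true sigma = 2)")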