Documente Academic
Documente Profesional
Documente Cultură
CVEN2002/2702
Week 6
This lecture
CVEN2002/2702 (Statistics)
Dr Justin Wishart
2 / 43
1
2
(x)2
2 2
( SX = R)
e 22 dy
F (x) =
2
CVEN2002/2702 (Statistics)
Dr Justin Wishart
3 / 43
fX(x)
FX(x)
22
1/2
0
2
+ 2
+ 2
cdf F (x)
CVEN2002/2702 (Statistics)
Dr Justin Wishart
4 / 43
f (x) dx =
e 22 dx = 1
2
SX
1
2
1
2
xe
(x)2
2 2
dx =
(x )2 e
(x)2
2 2
dx = 2
and
Var(X ) = 2
Dr Justin Wishart
( sd(X ) = )
Session 2, 2012 - Week 6
5 / 43
CVEN2002/2702 (Statistics)
and
Var(Z ) = 1
Dr Justin Wishart
(= sd(Z ))
6 / 43
(x)
(x)
1/2
cdf (x)
CVEN2002/2702 (Statistics)
Dr Justin Wishart
7 / 43
0.00
0.0
35
40
N(70,10)
N(10,5)
45
0.06
f(x)
0.02
0.00
0.00
0.01
0.04
0.02
0.03
0.08
0.04
f(x)
0.10
f(x)
0.05
0.2
0.1
f(x)
0.3
0.15
0.4
N(0,1)
40
50
60
70
80
90
100
CVEN2002/2702 (Statistics)
10
15
20
25
Dr Justin Wishart
8 / 43
CVEN2002/2702 (Statistics)
Dr Justin Wishart
9 / 43
Property: Standardisation
If X N (, ), then
Z =
X
N (0, 1)
Dr Justin Wishart
10 / 43
Dr Justin Wishart
11 / 43
Dr Justin Wishart
12 / 43
CVEN2002/2702 (Statistics)
Dr Justin Wishart
13 / 43
1.25
2
1
ex /2 dx
2
Dr Justin Wishart
14 / 43
Example
Suppose that Z N (0, 1). What is P(Z < 1.25)? What is P(Z > 1.25)?
What is P(0.38 Z < 1.25)?
As Z is a continuous random variable, P(Z < z) = P(Z z) for any z
table
CVEN2002/2702 (Statistics)
Dr Justin Wishart
15 / 43
P(X 1.75) = P
0.46
0.46
table
CVEN2002/2702 (Statistics)
Dr Justin Wishart
16 / 43
0.02 = P(X 4) = P
4
X
0.04
0.04
#
$
4
=P Z
0.04
In the standard normal table, we can find that P(Z 2.05) 0.02
We conclude that
4
0.04
Dr Justin Wishart
17 / 43
0.4
P(Z < z ) = ,
for Z N (0, 1)
(x)
0.2
0.1
i.e.,
1
0.0
P(Z > z ) = 1 .
0.3
2z
Dr Justin Wishart
18 / 43
0.4
0.3
0.1
0.025
1.645
0.005
0.0
0.0
0.0
0.05
0.2
( x)
0.3
0.1
0.2
( x)
0.3
0.2
0.1
( x)
0.4
0.4
1.96
2.575
Dr Justin Wishart
19 / 43
Property
Suppose X1 N (1 , 1 ), X2 N (2 , 2 ) and X1 and X2 are
independent. Then, for any real values a and b,
#
$
%
2
2
2
2
aX1 + bX2 N a1 + b2 , a 1 + b 2
CVEN2002/2702 (Statistics)
Dr Justin Wishart
20 / 43
Example
Let X1 , X2 , X3 represent the times necessary to perform three successive
repair tasks at a certain service facility. Suppose they are independent normal
random variables with expected values 1 , 2 and 3 and variances 12 , 22
and 32 , respectively. What can be said about the distribution of X1 + X2 + X3 ?
From the previous property, we can conclude that
$
#
%
X1 + X2 + X3 N 1 + 2 + 3 , 12 + 22 + 32
CVEN2002/2702 (Statistics)
Dr Justin Wishart
21 / 43
Hence,
&
'
.
X = X1 + X2 + X3 N 150, 36 = 6
160 150
P(X 160) = P Z
6
#
CVEN2002/2702 (Statistics)
table
Dr Justin Wishart
22 / 43
Dr Justin Wishart
23 / 43
Dr Justin Wishart
24 / 43
Density histogram
35
40
45
35
40
45
0.10
0.00
0.00
0.05
0.05
f(x)
Density
0.10
0.15
0.15
density
35
40
45
Dr Justin Wishart
25 / 43
Density
Density
0.00
0.00
0.02
0.05
0.04
0.10
0.06
0.08
0.15
10
15
20
20
40
60
80
Dr Justin Wishart
26 / 43
Quantile plots
Density histograms are easy to use; however, they are usually not
really reliable indicators of the distributional form unless the number of
observations is very large
another special graph, called a normal quantile plot, is more
effective in detecting departure from normality
The plot essentially compares the data ordered from smallest to
largest with what to expect to get for the smallest to largest in a sample
of that size if the theoretical distribution from which the data have come
is normal
if the data were effectively selected from the normal distribution, the
two sets of values should be reasonably close to one another
Note: a quantile plot is also sometimes called qq-plot
( command in Matlab: qqplot)
CVEN2002/2702 (Statistics)
Dr Justin Wishart
27 / 43
Quantile plots
Procedure for building a quantile plot:
observations {x1 , x2 , . . . , xn }
ordered observations: x(1) x(2) . . . x(n)
cumulative probabilities i = i0.5
n , for all i = 1, . . . , n
standard normal quantiles of level i : for all i = 1, . . . , n, zi
chosen such that P(Z zi ) = i , where Z N (0, 1)
Quantile plot: plot the n pairs (x(i) , zi )
If the sample comes from the Standard Normal distribution, x(i) zi
and the points would fall close to a 45 straight line passing by (0,0)
If the sample comes from some other normal distribution, the points
would still fall around a straight line, as there is a linear relationship
between the quantiles of N (, ) and the standard normal quantiles
Fact
If the sample comes from some normal distribution, the points should
follow (at least approximately) a straight line
CVEN2002/2702 (Statistics)
Dr Justin Wishart
28 / 43
quantile plot
Theoretical Quantiles
Theoretical Quantiles
10
15
Sample Quantiles
20
40
60
80
Sample Quantiles
the normality assumption appears acceptable for the first data set,
not at all for the second
CVEN2002/2702 (Statistics)
Dr Justin Wishart
29 / 43
Transforming observations
When the density histogram and the qq-plot indicate that the
assumption of a normal distribution is invalid, transformations of the
data can often improve the agreement with normality (see lab classes
2 & 4)
Scientists regularly express their observations in natural logs
Lets look again at the seeded clouds rainfall data (Slide 9, Week 1)
CVEN2002/2702 (Statistics)
Dr Justin Wishart
30 / 43
Transforming observations
Apart from log, other transformations may be useful:
1
,
x
x,
x,
x 2,
x3
CVEN2002/2702 (Statistics)
Dr Justin Wishart
31 / 43
6. Sampling distributions
6. Sampling distributions
CVEN2002/2702 (Statistics)
Dr Justin Wishart
32 / 43
6. Sampling distributions
6.1 Introduction
Dr Justin Wishart
33 / 43
6. Sampling distributions
6.1 Introduction
Dr Justin Wishart
34 / 43
6. Sampling distributions
6.1 Introduction
Random sampling
The importance of random sampling has been emphasised in
Lecture 1:
to assure that a sample is representative of the population from
which it is obtained, and
to provide a framework for the application of probability theory to
problems of sampling
As we said, the assumption of random sampling is very important: if
the sample is not random and is based on judgement or flawed in
some other way, then statistical methods will not work properly and will
lead to incorrect decisions
Therefore, we should now properly define a random sample
CVEN2002/2702 (Statistics)
Dr Justin Wishart
35 / 43
6. Sampling distributions
6.1 Introduction
Random sampling
Before a sample of size n is selected at random from the population,
the observations are modelled as random variables X1 , X2 , . . . , Xn
Definition
The set of observations X1 , X2 , . . . , Xn constitutes a random sample if
1
Dr Justin Wishart
36 / 43
6. Sampling distributions
Dr Justin Wishart
37 / 43
6. Sampling distributions
CVEN2002/2702 (Statistics)
Dr Justin Wishart
38 / 43
6. Sampling distributions
CVEN2002/2702 (Statistics)
Dr Justin Wishart
39 / 43
6. Sampling distributions
E X =E
Xi =
E(Xi ) =
=
n
n
n
i=1
and variance
* +
= Var
Var X
CVEN2002/2702 (Statistics)
1)
Xi
n
i=1
i=1
i.
i=1
n
n
)
1 )
2
i.d. 1
2
Var(X
)
=
=
i
n
n2
n2
i=1
Dr Justin Wishart
i=1
40 / 43
6. Sampling distributions
X N ,
n
if the population is normal
2.0
1.5
1.0
0.5
n=1
n=2
n=4
n=10
n=25
0.0
3 2
+ 2 + 3
CVEN2002/2702 (Statistics)
Dr Justin Wishart
41 / 43
6. Sampling distributions
Example
A sample of n = 28 heavy smoking men aged between 35 and 57, had a
sample mean testosterone level of 795.1 nanograms per decilitre. In the
general population of non smoking men of this age, we can assume that
testosterone is normally distributed with mean 620.6 and standard deviation
of 201.5. Are the results for the sample of smokers consistent with the
distribution for non-smokers?
CVEN2002/2702 (Statistics)
Dr Justin Wishart
42 / 43
Objectives
Objectives
Now you should be able to:
Calculate probabilities, determine mean, variance and standard
deviation for normal distributions
Standardise normal random variables, and understand why this is
useful
Use the table for the cdf of a standard normal distribution to
determine probabilities of interest
Explain the general concepts of estimating the parameters of a
population, in particular the difference between estimator and
estimate, and the role played by the sampling distribution of an
estimator
Illustrate those concepts with the particular case of the estimation
of the mean in a normal population
Recommended exercises Q31, Q33 p.41, Q35, Q37 p.42, Q73 p.58,
Q27 p.78, Q47 p.93, Q65 p.97, Q51, Q53 p.237
CVEN2002/2702 (Statistics)
Dr Justin Wishart
43 / 43