
DEPARTMENT OF MARKETING AND STATISTICS

AARHUS SCHOOL OF BUSINESS UNIVERSITY OF AARHUS

INTERNAL TEACHING MATERIAL E309 (REPLACES E281)

Lecture Notes in Business Statistics
BSc(B)/(IM) part 1
Steen Andersen and Morten Berg Jensen, 2009

Table of Contents
1. Expected Value and Variance of Random Variables ............ 1
2. Various Probability Distributions .......................... 7
   A) The hypergeometric distribution ......................... 9
   B) The k-dimensional hypergeometric distribution ........... 10
   C) The binomial distribution ............................... 11
   D) The multinomial distribution ............................ 12
   E) The geometric distribution .............................. 13
   F) The Poisson distribution ................................ 14
   G) The exponential distribution ............................ 15
   H) The uniform distribution ................................ 16
   I) The normal distribution ................................. 17
   J) The T-distribution ...................................... 19
   K) The χ²-distribution ..................................... 21
   L) The F-distribution ...................................... 23
3. Choice of Test Statistic by Test for Expected Values and Proportions ... 24
4. Sampling Methods ........................................... 29
5. Poisson Distribution, Confidence Intervals and Hypothesis Testing ...... 31
6. Overview ................................................... 35
   A) Descriptive measures .................................... 35
   B) Construction of confidence intervals .................... 36
   C) Application of hypotheses ............................... 41

1. Expected Value and Variance of Random Variables


A) Definition
A random variable is a function or rule that assigns a number to each outcome of an experiment. Random variables are separated into discrete and continuous random variables. A throw of a die, for example, has a discrete outcome, whereas reaction time has a continuous outcome.

Discrete random variables
If we look at the die-throwing example, the throw itself is the experiment, the random variable is the set of possible outcomes the throw may result in, the outcome is the number of pips the throw shows, and the probability distribution gives the probabilities of each of the possible outcomes.

P(x) = P(X = x) is read as the probability that the random variable X equals x. All probabilities must be larger than or equal to zero, and the probabilities must sum to 1:

$P(x) \geq 0$  and  $\sum_x P(x) = 1$.

Random variables are written in capital letters and their values in small letters.

E(X) is read as the expected value of the random variable X. For a throw of a die, the expected value is 3.5, expressing the average number of pips you would expect per throw in the long run. The expected value is defined as

$E(X) = \mu_X = \sum_x x \cdot P(x)$,

the sum of the single outcomes, each weighted by the probability of the outcome.

V(X) is read as the variance of the random variable X. The variance is an expression of the spread of the outcomes that the throw of the die may result in. The variance is defined as:

$V(X) = \sigma_X^2 = E\big[(X - \mu_X)^2\big] = \sum_x (x - \mu_X)^2 P(x) = \sum_x x^2 P(x) - \mu_X^2 = E(X^2) - E(X)^2$.

$E\big[(X - \mu_X)^2\big]$ is the expected value of the squared deviations, and $E(X^2)$ is the expected value of the random variable X².

Example
Let the random variable X be defined by the following probability distribution: $P(x) = x/10$ for $x \in \{1, 2, 3, 4\}$, where {1, 2, 3, 4} are the possible outcomes for X.

[Figure: bar chart of P(x) for x = 1, 2, 3, 4.]

The probability distribution can appropriately be presented in a table in which further calculations can be made.

x     P(x)   x·P(x)        x²·P(x)         (x − μ_X)²   (x − μ_X)²·P(x)
1     0.1    0.1           0.1             4            0.4
2     0.2    0.4           0.8             1            0.2
3     0.3    0.9           2.7             0            0.0
4     0.4    1.6           6.4             1            0.4
sum   1.0    3.0 = E(X)    10.0 = E(X²)    -            1.0 = V(X)
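As a quick check (a sketch, not part of the original notes), the table's calculations can be reproduced in a few lines of Python using the distribution P(x) = x/10 defined above:

```python
# Sketch: expected value and variance of the discrete distribution P(x) = x/10
xs = [1, 2, 3, 4]
probs = [x / 10 for x in xs]                               # P(x) = x/10

mean = sum(x * p for x, p in zip(xs, probs))               # E(X)   = 3.0
second_moment = sum(x**2 * p for x, p in zip(xs, probs))   # E(X^2) = 10.0
variance = second_moment - mean**2                         # V(X)   = 1.0

print(mean, variance)   # 3.0 1.0
```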

Continuous random variables
In the continuous case the point probability is 0, and the probability distribution is replaced by f(x), called the probability density function of X, provided f(x) is non-negative and exhaustive, i.e.

$f(x) \geq 0$  and  $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

The probability that X assumes a value within the interval [a; b] is

$P(a \leq X \leq b) = \int_a^b f(x)\,dx = F(b) - F(a)$,

where $F(b) = \int_{-\infty}^{b} f(x)\,dx$ and $F(a) = \int_{-\infty}^{a} f(x)\,dx$.

The expected value is defined as $E(X) = \mu_X = \int_{-\infty}^{\infty} x\, f(x)\,dx$.

The variance is defined as

$V(X) = \sigma_X^2 = E\big[(X-\mu_X)^2\big] = \int_{-\infty}^{\infty} (x-\mu_X)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu_X^2 = E(X^2) - E(X)^2$.

B) Laws of expected value and variance

1. Laws of expected value
(1) E(b) = b                              where b is an arbitrary constant
(2) E(a·X) = a·E(X)                       where a is an arbitrary constant
(3) E(aX + b) = a·E(X) + b
(4) E(X + Y) = E(X) + E(Y)                general addition
(5) E(X − Y) = E(X) − E(Y)
(6) E(aX + bY) = a·E(X) + b·E(Y)

2. Laws of variance
(1) V(b) = 0
(2) V(a·X) = a²·V(X)
(3) V(aX + b) = a²·V(X)
(4) V(X + Y) = V(X) + V(Y)                if X and Y are independent
(5) V(X − Y) = V(X) + V(Y)                if X and Y are independent
(6) V(aX + bY) = a²·V(X) + b²·V(Y)        if X and Y are independent

3. Covariance and coefficient of correlation
If the random variables are not independent, then
(4a) V(X + Y) = V(X) + V(Y) + 2·COV(X,Y)
(5a) V(X − Y) = V(X) + V(Y) − 2·COV(X,Y)
(6a) V(aX + bY) = a²·V(X) + b²·V(Y) + 2ab·COV(X,Y)

Let the random variable R be a linear combination of k random variables. The expected value and variance of R are determined by the following expressions based on the expected values, variances and covariances. R could be the turnover, a_i the price of item i and X_i the number of units of item i sold.

$R = \sum_{i=1}^{k} a_i X_i$

$E(R) = \sum_{i=1}^{k} a_i\,E(X_i)$

$V(R) = \sum_{i=1}^{k} a_i^2\,V(X_i) + 2\sum_{i=1}^{k-1}\sum_{j>i} a_i a_j\,COV(X_i, X_j)$.

The sample mean $\bar{X}$ is a linear combination of n independent random variables, all weighted by 1/n:

$\bar{X} = \sum_{i=1}^{n} \frac{1}{n} X_i$

$E(\bar{X}) = \sum_{i=1}^{n} \frac{1}{n}\,E(X_i) = n\cdot\frac{1}{n}\,\mu = \mu$,  when $E(X_i) = \mu$

$V(\bar{X}) = \sum_{i=1}^{n}\left(\frac{1}{n}\right)^2 V(X_i) + 2\sum_{i=1}^{n-1}\sum_{j>i}\frac{1}{n}\cdot\frac{1}{n}\,COV(X_i, X_j) = n\cdot\frac{1}{n^2}\,\sigma^2 + 0 = \frac{\sigma^2}{n}$,  when $V(X_i) = \sigma^2$ and $COV(X_i, X_j) = 0$.
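The result V(X̄) = σ²/n can also be checked numerically; a minimal simulation sketch (not part of the original notes) using throws of a fair die, whose variance is 35/12:

```python
# Sketch: check V(X-bar) = sigma^2 / n by simulating means of n dice throws.
import random

n, reps = 10, 100_000
sigma2 = 35 / 12                      # variance of a single fair die
means = [sum(random.randint(1, 6) for _ in range(n)) / n for _ in range(reps)]
m = sum(means) / reps
v = sum((x - m) ** 2 for x in means) / reps
print(round(v, 3), round(sigma2 / n, 3))   # both ≈ 0.292
```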

The covariance is an expression of linear dependence. If the covariance is 0, there may be independence, but there may also be another kind of dependence than linear dependence.

$COV(X, Y) = E\big[(X-\mu_X)(Y-\mu_Y)\big] = E(X \cdot Y) - \mu_X \mu_Y$,

where

$E(X \cdot Y) = \sum_x \sum_y x\,y\,P(x,y)$  or  $E(X \cdot Y) = \int_x \int_y x\,y\,f(x,y)\,dx\,dy$.

The coefficient of correlation, ρ, is defined from the covariance:

$\rho = \dfrac{COV(X,Y)}{\sigma_X\,\sigma_Y}$,  where $-1 \leq \rho \leq 1$.

ρ² is an expression of the proportion of the variation in Y which is explained by X. ρ can be regarded as a kind of index figure for the strength and direction of the relationship.

As a supplement to $V(\bar{X})$ it can be proved that if $V(X_i) = \sigma^2$ and $\rho_{ij} = \rho$, then

$V(\bar{X}) = \sum_{i=1}^{n}\left(\frac{1}{n}\right)^2 V(X_i) + 2\sum_{i<j}\frac{1}{n}\cdot\frac{1}{n}\,\rho\,\sigma^2 = \sigma^2\left(\frac{1}{n} + \left(1-\frac{1}{n}\right)\rho\right)$.

C) Example

Let the random variables X and Y denote the number of computers sold per day from store X and store Y, respectively. The two random variables are defined by the following simultaneous probability distribution:

        x = 1   x = 2   x = 3   x = 4   P(y)
y = 0   0.10    0.05    0.05    0.00    0.2
y = 1   0.00    0.10    0.10    0.10    0.3
y = 2   0.00    0.05    0.15    0.30    0.5
P(x)    0.1     0.2     0.3     0.4     1.0

The table states the probability of every possible combination (x, y), e.g. $P(X = 3 \cap Y = 2) = P(3, 2) = 0.15$. This means that there is a 15% probability that on one day 3 computers are sold from store X while 2 computers are sold from store Y. Also available are the probability distributions for X and Y separately, called the marginal probability distributions; e.g. there is a 10% probability that on one day 1 computer will be sold from store X.

Calculating the expected value and variance for X and Y, respectively, leads to the following results:

$E(X) = \mu_X = 3.0$ and $V(X) = \sigma_X^2 = 1.0$
$E(Y) = \mu_Y = 1.3$ and $V(Y) = \sigma_Y^2 = 0.61$.

Let us define a random variable S = X + Y, the sale of computers from the two stores per day. What are E(S) and V(S)?

$E(S) = E(X + Y) = E(X) + E(Y) = 3.0 + 1.3 = 4.3$

$V(S) = V(X) + V(Y) + 2\,COV(X,Y) = 1.0 + 0.61 + 2 \cdot 0.50 = 2.61$,

where

$COV(X, Y) = E(X \cdot Y) - \mu_X \mu_Y = 4.40 - 3.0 \cdot 1.3 = 0.50$

$E(X \cdot Y) = \sum_x \sum_y x\,y\,P(x,y) = 0\cdot\sum_x x\,P(x,0) + 1\cdot\sum_x x\,P(x,1) + 2\cdot\sum_x x\,P(x,2)$
$= 0 + 1\,(1\cdot 0.00 + 2\cdot 0.10 + 3\cdot 0.10 + 4\cdot 0.10) + 2\,(1\cdot 0.00 + 2\cdot 0.05 + 3\cdot 0.15 + 4\cdot 0.30) = 0 + 0.90 + 3.50 = 4.40$.

The coefficient of correlation is

$\rho = \dfrac{COV(X,Y)}{\sigma_X \sigma_Y} = \dfrac{0.50}{1.0\cdot\sqrt{0.61}} = 0.6402$  and  $\rho^2 = 0.4098$.

A more laborious method would be to determine the probability distribution of S = X + Y and carry out the calculations on the basis of this.

s (from x + y)        P(s)   s·P(s)        s²·P(s)          (s − μ_S)²   (s − μ_S)²·P(s)
1 (1+0)               0.10   0.10          0.10             10.89        1.0890
2 (2+0, 1+1)          0.05   0.10          0.20             5.29         0.2645
3 (3+0, 2+1, 1+2)     0.15   0.45          1.35             1.69         0.2535
4 (4+0, 3+1, 2+2)     0.15   0.60          2.40             0.09         0.0135
5 (4+1, 3+2)          0.25   1.25          6.25             0.49         0.1225
6 (4+2)               0.30   1.80          10.80            2.89         0.8670
sum                   1.00   4.30 = E(S)   21.10 = E(S²)    -            2.6100 = V(S)

E(S) = 4.30 and V(S) = E(S²) − E(S)² = 21.10 − 4.30² = 21.10 − 18.49 = 2.61, the same result as obtained when applying the laws of expected value and variance.

Suppose that the profit on a computer sold in store X is 1,000 DKK and that the profit on a computer sold in store Y is 1,300 DKK. What is the expected profit per day for the two stores, and what are the variance and standard deviation of the profit?

$F = a\,X + b\,Y$, where a = 1,000 and b = 1,300

$E(F) = a\,E(X) + b\,E(Y) = 1{,}000 \cdot 3.0 + 1{,}300 \cdot 1.3 = 3{,}000 + 1{,}690 = 4{,}690$

$V(F) = a^2\,V(X) + b^2\,V(Y) + 2ab\,COV(X,Y) = 1{,}000^2 \cdot 1.0 + 1{,}300^2 \cdot 0.61 + 2 \cdot 1{,}000 \cdot 1{,}300 \cdot 0.50 = 1{,}000{,}000 + 1{,}030{,}900 + 1{,}300{,}000 = 3{,}330{,}900$

The standard deviation of the profit is $\sqrt{V(F)} = \sqrt{3{,}330{,}900} = 1{,}825$ DKK.
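A sketch (not part of the original notes) reproducing the whole two-store example from the joint distribution in the table above:

```python
# Sketch: expected values, variances, covariance and profit for the two-store example.
joint = {  # P(x, y) from the table above
    (1, 0): 0.10, (2, 0): 0.05, (3, 0): 0.05, (4, 0): 0.00,
    (1, 1): 0.00, (2, 1): 0.10, (3, 1): 0.10, (4, 1): 0.10,
    (1, 2): 0.00, (2, 2): 0.05, (3, 2): 0.15, (4, 2): 0.30,
}
E = lambda g: sum(g(x, y) * p for (x, y), p in joint.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)                         # 3.0, 1.3
VX = E(lambda x, y: x**2) - EX**2                                      # 1.0
VY = E(lambda x, y: y**2) - EY**2                                      # 0.61
cov = E(lambda x, y: x * y) - EX * EY                                  # 0.50
ES, VS = EX + EY, VX + VY + 2 * cov                                    # 4.3, 2.61
EF = 1000 * EX + 1300 * EY                                             # 4690
VF = 1000**2 * VX + 1300**2 * VY + 2 * 1000 * 1300 * cov               # 3,330,900
print(EX, EY, VX, round(VY, 2), cov, ES, round(VS, 2), EF, round(VF))
```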

2. Various Probability Distributions

[Figures: example probability plots of the distributions discussed below —
geometric distribution (X = number of throws until the first six, p = 1/6);
binomial distribution (X = number of sixes in n = 6 throws, p = 1/6);
hypergeometric distribution (X = number of clubs in a hand of n = 13 cards, N = 52, S = 13);
Poisson distribution (μ = 360/198);
T-distribution with 9 degrees of freedom compared with the standard normal distribution N(0;1);
exponential distribution (density with λ = 360/(198·90) = 2/99);
F-distribution with various degrees of freedom;
χ²-distribution with 1, 2, 3 and 4 degrees of freedom.]

Relationships between the different distributions, with some rules of thumb for approximation. The letter in parentheses indicates the section below where further information about each distribution can be found.

- The k-dimensional hypergeometric distribution (B) and the multinomial distribution (D) are generalizations of the hypergeometric (A) and binomial (C) distributions, respectively; the geometric distribution (E) is based on the same Bernoulli trials as the binomial distribution (C).
- Hypergeometric (A) → binomial (C): if N > 50 and n/N < 0.05.
- Binomial (C) → Poisson (F): if n ≥ 20 and p ≤ 0.05.
- Binomial (C) → normal (I): if n·p > 5 and n·(1 − p) > 5.
- Hypergeometric (A) → normal (I): if $n\frac{S}{N}\left(1-\frac{S}{N}\right)\frac{N-n}{N-1} > 5$.
- Poisson (F) → normal (I): if μ > 10.
- The exponential distribution (G) describes waiting times in a Poisson process (F).
- T-distribution (J) → normal (I): if ν > 30.
- χ²-distribution (K) → normal (I): if ν > 50.
- The F-distribution (L) is defined from ratios of independent χ²-distributed variables (K).

A) The hypergeometric distribution ¹

[Figure: hypergeometric distribution, X = number of clubs, n = 13, N = 52 and S = 13.]

Assumptions
1) Finite population consisting of N elements.
2) Population consisting of 2 alternative groups (A / Ā).
3) A simple random sample consisting of n elements is drawn.

Point probability

$P(x) = P(X = x) = h(x; N, n, S) = \dfrac{\binom{S}{x}\binom{N-S}{n-x}}{\binom{N}{n}}$,

where
N     = number of elements in the population
S     = number of elements in the population with the distinctive mark A
N − S = number of elements in the population with the distinctive mark Ā
n     = number of elements in the sample
x     = number of elements in the sample with the distinctive mark A
n − x = number of elements in the sample with the distinctive mark Ā

$\binom{N}{n}$ is the number of ways in which n elements can be drawn among N elements:

$\binom{N}{n} = \dfrac{N!}{n!\,(N-n)!}$,  where $n! = n\,(n-1)\cdots 2 \cdot 1$.
Expected value:  $E(X) = n\,\dfrac{S}{N}$

Variance:  $V(X) = n\,\dfrac{S}{N}\left(1-\dfrac{S}{N}\right)\dfrac{N-n}{N-1}$

Skewness:  $\gamma_1 = \dfrac{(N-2S)(N-2n)}{N\,(N-2)\,\sqrt{n\,\frac{S}{N}\left(1-\frac{S}{N}\right)\frac{N-n}{N-1}}}$

¹ Keller, CD appendix F

Approximations
To the binomial distribution, if N > 50 and n/N < 0.05. At the approximation, p = S/N.

To the normal distribution, if $n\frac{S}{N}\left(1-\frac{S}{N}\right)\frac{N-n}{N-1} > 5$. At the approximation, μ = E(X) and σ² = V(X):

$X \sim h(x; N, n, S) \quad\Rightarrow\quad \dfrac{X - n\frac{S}{N}}{\sqrt{n\frac{S}{N}\left(1-\frac{S}{N}\right)\frac{N-n}{N-1}}} \approx Z \sim N(0,1)$

Correction for the approximation from a discrete to a continuous distribution is done as follows:

If $P(X \leq x \mid N, n, S)$ is required, then use $P\left(Z < \dfrac{x + 0.5 - E(X)}{\sqrt{V(X)}}\right)$.

If $P(X \geq x \mid N, n, S)$ is required, then use $1 - P(X \leq x - 1 \mid N, n, S)$.

For illustration, see Applet 13, Keller p. 312.
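A sketch (not part of the original notes) of the clubs example with scipy; note that scipy's parameter order differs from the notation h(x; N, n, S):

```python
# Sketch: hypergeometric probabilities for X = number of clubs (N = 52, S = 13, n = 13).
# scipy uses hypergeom(M = population size, n = successes in population, N = sample size).
from scipy.stats import hypergeom

rv = hypergeom(M=52, n=13, N=13)
print(round(rv.pmf(3), 3))                    # P(X = 3) ≈ 0.286
print(rv.mean(), round(rv.var(), 3))          # E(X) = 3.25, V(X) ≈ 1.864
```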

B) The k-dimensional hypergeometric distribution

Assumptions
1) Finite population consisting of N elements.
2) Population consisting of k alternative groups (A₁, A₂, …, A_k).
3) A simple random sample consisting of n elements is drawn.

Expected value:  $E(X_i) = n\,\dfrac{S_i}{N}$  for i = {1, 2, …, k}

Variance:  $V(X_i) = n\,\dfrac{S_i}{N}\left(1-\dfrac{S_i}{N}\right)\dfrac{N-n}{N-1}$  for i = {1, 2, …, k}

Covariance:  $Cov(X_i, X_j) = -n\,\dfrac{S_i}{N}\,\dfrac{S_j}{N}\,\dfrac{N-n}{N-1}$  for i ≠ j

C) The binomial distribution

[Figure: binomial distribution, X = number of sixes, n = 6 throws, p = 1/6.]

Assumptions
1) Each trial results in one of two mutually exclusive outcomes (A / Ā).
2) P(A) = p is constant from trial to trial.
3) The individual trials are independent.
4) n trials are carried out.

Point probability ²

$P(X = x) = P(x) = b(x; n, p) = \binom{n}{x}\,p^x (1-p)^{n-x}$,

where X = the number of trials with the outcome A.

Expected value:  $E(X) = n\,p$

Variance:  $V(X) = n\,p\,(1-p)$

Skewness:  $\gamma_1 = \dfrac{1-2p}{\sqrt{n\,p\,(1-p)}}$

Approximations ³
1) To the normal distribution if n·p > 5 and n·(1 − p) > 5:

$X \sim b(x; n, p) \quad\Rightarrow\quad \dfrac{X - n\,p}{\sqrt{n\,p\,(1-p)}} \approx Z$  or alternatively  $\dfrac{\hat{P} - p}{\sqrt{p\,(1-p)/n}} \approx Z$,  where $\hat{P} = \dfrac{X}{n}$.

Regarding correction for the approximation from a discrete to a continuous distribution, see p. 10 and Keller p. 311.

2) To the Poisson distribution if n > 20 and p < 0.05. At the approximation, μ = n·p.

² Cf. Keller p. 236
³ Cf. Keller p. 310
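A sketch (not part of the original notes) of the "number of sixes" example, including the rule-of-thumb check for the normal approximation:

```python
# Sketch: binomial pmf for X = number of sixes in n = 6 throws with p = 1/6.
from scipy.stats import binom

n, p = 6, 1 / 6
print([round(binom.pmf(k, n, p), 3) for k in range(7)])
# [0.335, 0.402, 0.201, 0.054, 0.008, 0.001, 0.0]

print(n * p > 5 and n * (1 - p) > 5)   # False: too few trials for the normal approximation
```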

D) The multinomial distribution

The multinomial distribution is a generalization of the binomial distribution where, instead of 2 alternative outcomes, each trial has k possible outcomes (mutually exclusive and collectively exhaustive).

Assumptions
1) Each trial results in one of k mutually exclusive outcomes, A_i, which are collectively exhaustive, i = {1, 2, …, k}.
2) P(A_i) = p_i is constant from trial to trial, i = {1, 2, …, k}.
3) The individual trials are independent.
4) n trials are carried out.

Assumptions 1) and 2) mean that $\sum_{i=1}^{k} p_i = 1$.

Simultaneous probability distribution

$P(x_1, x_2, \ldots, x_k) = P(X_1 = x_1 \cap X_2 = x_2 \cap \ldots \cap X_k = x_k) = p(x_1, \ldots, x_k; n, p_1, \ldots, p_k) = \dfrac{n!}{x_1!\,x_2!\cdots x_k!}\, p_1^{x_1} p_2^{x_2}\cdots p_k^{x_k}$,

where X_i = number of trials with the outcome A_i.

From this it follows that X_i is binomially distributed, b(x_i; n, p_i).

Expected value:  $E(X_i) = n\,p_i$  for i = {1, 2, …, k}

Variance:  $V(X_i) = n\,p_i\,(1-p_i)$  for i = {1, 2, …, k}

Covariance:  $Cov(X_i, X_j) = -n\,p_i\,p_j$  for i ≠ j

E) The geometric distribution

[Figure: geometric distribution, X = number of throws until you get a six, p = 1/6.]

In the geometric distribution the random variable X is the number of trials until A is observed for the first time in a Bernoulli process (= binomial experiment).

Assumptions: as for the binomial distribution.

Point probability:  $P(x) = P(X = x) = p\,(1-p)^{x-1}$,  where X = number of trials until the first trial with the outcome A.

Expected value:  $E(X) = \dfrac{1}{p}$

Variance:  $V(X) = \dfrac{1-p}{p^2}$

Skewness:  $\gamma_1 = \dfrac{2-p}{\sqrt{1-p}}$
E-suppl.) The negative binomial distribution

In the negative binomial distribution the random variable X is the number of trials until A is observed for the s-th time in a Bernoulli process (= binomial experiment).

Assumptions: as for the binomial distribution.

Point probability

$P(x) = P(X = x) = p \cdot b(s-1;\, x-1,\, p) = p\binom{x-1}{s-1} p^{s-1}(1-p)^{x-s} = \binom{x-1}{s-1} p^{s}(1-p)^{x-s}$,

where X = number of trials until the s-th trial with the outcome A.

Expected value:  $E(X) = s\,\dfrac{1}{p}$

Variance:  $V(X) = s\,\dfrac{1-p}{p^2}$

Skewness:  $\gamma_1 = \dfrac{2-p}{\sqrt{s\,(1-p)}}$
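A sketch (not part of the original notes) of the geometric "throws until the first six" example:

```python
# Sketch: geometric-distribution probabilities for throws until the first six (p = 1/6).
p = 1 / 6
pmf = lambda x: p * (1 - p) ** (x - 1)

print(round(pmf(1), 3), round(pmf(2), 3))          # 0.167, 0.139 (cf. the figure above)
print(round(sum(pmf(x) for x in range(1, 7)), 3))  # P(X <= 6) ≈ 0.665
print(round(1 / p), round((1 - p) / p**2, 1))      # E(X) = 6, V(X) = 30.0
```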

F) The Poisson distribution

[Figure: Poisson distribution, μ = 360/198.]

The random variable X indicates the number of occurrences in a particular interval of time (or space). Example: 360 goals scored in 198 matches, where X is the number of goals scored per match by the home team.

Assumptions
1) The number of occurrences within an interval of time (e.g. one minute) is independent of the number of occurrences within other, non-overlapping intervals of time.
2) The expected number of occurrences within an interval of time (e.g. one minute) is constant during the whole lapse of time (e.g. one hour or one day). The process is said to be stationary.
3) The probability that exactly one occurrence takes place within a very small interval of time is proportional to the length of the interval.
4) The probability that more than one occurrence takes place within a very small interval of time is negligible in relation to the probability that exactly one occurrence takes place.

Point probability ⁴

$P(x) = P(X = x) = p(x; \mu) = \dfrac{\mu^x e^{-\mu}}{x!}$,

where μ = the intensity (average number of occurrences within a certain interval of time). Generally μ = λ·t, where λ is the intensity per unit of time and t is the time.

Expected value:  $E(X) = \mu$

Variance:  $V(X) = \mu$

Skewness:  $\gamma_1 = \dfrac{1}{\sqrt{\mu}}$

Use
Primarily in connection with queuing-theory problems. The Poisson process gives a good description of a series of situations where arrivals occur at random over time.

Approximations
To the normal distribution if μ > 10:  $\dfrac{X - \mu}{\sqrt{\mu}} \approx Z$.

Correction for the approximation from a discrete to a continuous distribution: see p. 10 and Keller p. 311.

⁴ Cf. Keller p. 243
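A sketch (not part of the original notes) of the goals example, μ = 360/198 goals per match:

```python
# Sketch: Poisson probabilities for the number of home-team goals per match.
from math import exp, factorial

mu = 360 / 198
pmf = lambda x: mu**x * exp(-mu) / factorial(x)

print([round(pmf(x), 2) for x in range(5)])   # [0.16, 0.30, 0.27, 0.16, 0.07]
print(mu > 10)                                # False: too small for the normal approximation
```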

G) The exponential distribution

[Figures: distribution function and density function with λ = 360/(198·90) = 2/99.]

Continuous distribution: point probabilities = 0. In this example T states the time until the first goal is scored by the home team.

Assumptions: as for the Poisson distribution.

Density function ⁵

$f(t) = \lambda e^{-\lambda t}$,

where T states the time between 2 occurrences, or the time one activity takes (operation time), and the parameter λ states the intensity (average number of occurrences) per unit of time.

Distribution function (the cumulative probability function)

$F(t) = P(T \leq t) = 1 - e^{-\lambda t}$

states the probability that the next activity in a Poisson process occurs at time t at the latest. The distribution function can be derived from the Poisson distribution, since the expected number of occurrences in t units of time is μ = λ·t. The probability that the next activity occurs before time t in the exponential distribution corresponds to at least one activity occurring during an interval of t units of time in the Poisson distribution, which is

$P(T \leq t) = P(X \geq 1 \mid \mu = \lambda t) = 1 - P(X = 0 \mid \mu = \lambda t) = 1 - \dfrac{(\lambda t)^0 e^{-\lambda t}}{0!} = 1 - e^{-\lambda t}$.

Expected value:  $E(T) = \dfrac{1}{\lambda}$

Variance:  $V(T) = \dfrac{1}{\lambda^2}$

Skewness:  $\gamma_1 = 2$

⁵ Cf. Keller p. 277
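A sketch (not part of the original notes) of the waiting time until the first home goal, with λ = 2/99 per minute:

```python
# Sketch: exponential waiting time for the goals example, lambda = 2/99 per minute.
from math import exp

lam = 2 / 99
F = lambda t: 1 - exp(-lam * t)   # P(T <= t)

print(round(F(45), 3))    # ≈ 0.597: probability of a home goal within the first 45 minutes
print(round(1 / lam, 1))  # E(T) = 49.5 minutes until the first home goal
```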

H) The uniform distribution

Discrete or continuous. Range of definition and variation: a ≤ X ≤ b.

Discrete version (e.g. number of pips after a throw of a die):

Point probability:  $P(x) = P(X = x) = \dfrac{1}{b-a+1}$,  a ≤ X ≤ b

Expected value:  $E(X) = \dfrac{b+a}{2}$

Variance:  $V(X) = \dfrac{(b-a)^2}{12} + \dfrac{b-a}{6}$  or  $\dfrac{(b-a)(b-a+2)}{12}$

Skewness:  $\gamma_1 = 0$

Continuous version (e.g. the waiting time for a bus arriving at intervals of 10 minutes):

Density function ⁶:  $f(x) = \dfrac{1}{b-a}$,  a ≤ X ≤ b

Expected value:  $E(X) = \dfrac{b+a}{2}$

Variance:  $V(X) = \dfrac{(b-a)^2}{12}$

Skewness:  $\gamma_1 = 0$

⁶ Cf. Keller p. 255

I) The normal distribution ⁷

[Figure: the standard normal distribution, N(0;1).]

$X \sim N(\mu, \sigma^2)$, i.e. X follows a normal distribution with expected value μ and standard deviation σ.

$\dfrac{X-\mu}{\sigma} = Z$ follows a standard normal distribution with expected value 0 and standard deviation 1: $Z \sim N(0,1)$.

The normal distribution is continuous. The point probability is always 0. Thus the density function does not indicate the point probability but the density.

Density function ⁸

$f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$,  −∞ < x < ∞

                    Normal distribution    Standard normal distribution
Expected value      E(X) = μ               E(Z) = 0
Variance            V(X) = σ²              V(Z) = 1
Skewness            γ₁ = 0                 γ₁ = 0

Convolution property
If X₁ and X₂ are independent and normally distributed, the following applies:

$Y = a + b_1 X_1 + b_2 X_2 \sim N\big(a + b_1\mu_1 + b_2\mu_2,\; b_1^2\sigma_1^2 + b_2^2\sigma_2^2\big)$,

where a, b₁ and b₂ are arbitrary constants.

⁷ Keller p. 259 and CD Applet 5
⁸ Cf. Keller p. 259

Use
1) Tests and confidence intervals for μ.
2) Tests and confidence intervals for comparison of expected values in 2 populations (μ₁ − μ₂).
3) Tests and confidence intervals for p = S/N in the hypergeometric distribution.
4) Tests and confidence intervals for p in the binomial distribution.
5) Tests and confidence intervals for comparison of proportions in 2 populations (S₁/N₁ − S₂/N₂) = (p₁ − p₂).
6) Tests and confidence intervals for comparison of processes in 2 populations (p₁ − p₂).
7) Tests and confidence intervals for μ and (μ₁ − μ₂) in the Poisson distribution.

Ad 1
The normal distribution can only be applied if σ is known and X̄ is normally distributed, either because the variable (X) is normally distributed, or because of the central limit theorem, where the sample size, n, is assumed to be sufficiently large according to the form (skewness) of X, cf. Keller p. 300. On pp. 24, 36 and 43 the procedure is shown for deciding the test procedure.

Ad 2
The normal distribution can only be applied if X̄₁ − X̄₂ is normally distributed and σ₁² and σ₂² are known, or if the samples are sufficiently large (cf. the central limit theorem). On pp. 25, 37 and 44 you can see the choice of test variable/test statistic when 2 population averages are to be compared.

Ad 3, 4, 5 and 6
The normal distribution can only be applied if X₁ and X₂ in P̂₁ = X₁/n₁ and P̂₂ = X₂/n₂ are approximately normally distributed, cf. the approximation from the binomial distribution or the hypergeometric distribution to the normal distribution. Correction for the finite population should be included if nᵢ/Nᵢ > 0.05.

Ad 7
See pages 32-33.
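A sketch (not part of the original notes) showing how a normal probability is obtained by standardisation; the values μ = 50 and σ = 4 are made-up illustration numbers:

```python
# Sketch: P(X <= 55) for X ~ N(mu = 50, sigma = 4), via Z and directly.
from scipy.stats import norm

mu, sigma = 50, 4
print(round(norm.cdf((55 - mu) / sigma), 4))        # via Z = (x - mu)/sigma ≈ 0.8944
print(round(norm.cdf(55, loc=mu, scale=sigma), 4))  # same result, scipy standardises internally
```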

J) The T-distribution ⁹

[Figure: T-distribution with 9 degrees of freedom compared with the standard normal distribution.]

Continuous distribution: point probability = 0.

Defined by:

$\dfrac{Z}{\sqrt{\chi_\nu^2/\nu}} \sim T_\nu$,  where $Z \sim N(0,1)$, $\chi_\nu^2$ is $\chi^2$-distributed with ν degrees of freedom, and Z is independent of $\chi_\nu^2$.

Density function

$f(t) = \dfrac{[(\nu-1)/2]!}{[(\nu-2)/2]!\,\sqrt{\nu\pi}}\left(1 + \dfrac{t^2}{\nu}\right)^{-(\nu+1)/2} \;\longrightarrow\; \dfrac{1}{\sqrt{2\pi}}\,e^{-t^2/2}$  for ν → ∞.

The gamma function Γ(n), which is equal to (n−1)! for positive integers, is defined for all non-negative real numbers. This means that the expression for the density function can also be written as

$f(t) = \dfrac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}}\left(1 + \dfrac{t^2}{\nu}\right)^{-(\nu+1)/2}$

Expected value:  $E(T_\nu) = 0$,  only defined for ν > 1

Variance:  $V(T_\nu) = \dfrac{\nu}{\nu-2}$,  only defined for ν > 2

Skewness:  $\gamma_1 = 0$,  only defined for ν > 3

⁹ Keller p. 281 and CD Applet 6

Use
1) Tests and confidence intervals for μ in a normally distributed population, where σ is unknown. According to W. C. Guenther, empirical trials have shown that the T-distribution with a good approximation can be used if n is assumed to be sufficiently large (cf. Keller p. 389).
2) Comparison of two expected values from normally distributed populations, where σ₁² and σ₂² are unknown.
3) Tests and confidence intervals in regression and correlation analyses.

Approximations
When ν > 30, the standard normal distribution can be applied as an approximation of the T-distribution. However, it will always give a more precise result to use the T-distribution when σ is unknown, irrespective of the size of n.

K) The χ²-distribution ¹⁰

[Figures: χ²-distributions with 1, 2, 3 and 4 degrees of freedom; χ²-distributions with 10, 20 and 50 degrees of freedom compared with corresponding normal distributions.]

Continuous distribution: point probability = 0.

Defined by:

$Z_1^2 + Z_2^2 + \ldots + Z_\nu^2 \sim \chi_\nu^2$,  where $Z_i \sim N(0,1)$ for i = {1, …, ν} and $Z_i$ is independent of $Z_j$ for all i ≠ j.

Density function

$f(\chi^2) = \dfrac{1}{[(\nu-2)/2]!\;2^{\nu/2}}\,(\chi^2)^{(\nu-2)/2}\,e^{-\chi^2/2}$

Expected value:  $E(\chi_\nu^2) = \nu$

Variance:  $V(\chi_\nu^2) = 2\nu$

Skewness:  $\gamma_1 = \sqrt{\dfrac{8}{\nu}}$

Use
1) Confidence intervals for and tests of $\sigma_X^2$, when X is normally distributed or the sample is sufficiently large. ¹¹
2) Goodness-of-fit (test showing whether or not some given data follow a given distribution).
3) Test for independence and homogeneity.

¹⁰ Keller p. 286 and CD Applet 7
¹¹ Keller pp. 402

Approximations
If ν > 50, the normal distribution can be applied as an approximation. This results in $\chi_\nu^2 \approx N(\nu, 2\nu)$. A probability in the χ²-distribution can then be calculated as

$P(\chi_\nu^2 > a) \approx P\left(Z > \dfrac{a - E(\chi_\nu^2)}{\sqrt{V(\chi_\nu^2)}}\right) = P\left(Z > \dfrac{a - \nu}{\sqrt{2\nu}}\right)$.

By construction of the confidence interval for σ² (see page 38) we get:

$\dfrac{(n-1)s^2}{(n-1) + z_{\alpha/2}\sqrt{2(n-1)}} \;\leq\; \sigma^2 \;\leq\; \dfrac{(n-1)s^2}{(n-1) - z_{\alpha/2}\sqrt{2(n-1)}}$

or, equivalently,

$\dfrac{s^2}{1 + z_{\alpha/2}\sqrt{\frac{2}{n-1}}} \;\leq\; \sigma^2 \;\leq\; \dfrac{s^2}{1 - z_{\alpha/2}\sqrt{\frac{2}{n-1}}}$.

By test of σ² (see p. 45) we get:

$H_0: \sigma^2 = \sigma_0^2$,  $H_1: \sigma^2 > \sigma_0^2$

$\chi_{Obs}^2 = \dfrac{(n-1)s^2}{\sigma_0^2}$

$p\text{-value} = P(\chi_{n-1}^2 > \chi_{Obs}^2) \approx P\left(Z > \dfrac{\chi_{Obs}^2 - (n-1)}{\sqrt{2(n-1)}}\right)$.
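A sketch (not part of the original notes) comparing the exact χ² confidence interval for σ² with the normal approximation above; n and s² are made-up illustration values:

```python
# Sketch: 95% CI for sigma^2, exact chi-square fractiles vs. the normal approximation.
from math import sqrt
from scipy.stats import chi2, norm

n, s2, alpha = 101, 4.0, 0.05
z = norm.ppf(1 - alpha / 2)

exact = ((n - 1) * s2 / chi2.ppf(1 - alpha / 2, n - 1),
         (n - 1) * s2 / chi2.ppf(alpha / 2, n - 1))
approx = (s2 / (1 + z * sqrt(2 / (n - 1))),
          s2 / (1 - z * sqrt(2 / (n - 1))))
print([round(v, 2) for v in exact], [round(v, 2) for v in approx])
```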

L) The F-distribution ¹²

[Figure: F-distribution with various degrees of freedom.]

Continuous distribution: point probability = 0.

Defined by:

$\dfrac{\chi_{\nu_1}^2/\nu_1}{\chi_{\nu_2}^2/\nu_2} \sim F_{(\nu_1,\nu_2)}$,

provided that there is independence between numerator and denominator.

Density function

$f(f) = \dfrac{[(\nu_1+\nu_2-2)/2]!}{[(\nu_1-2)/2]!\,[(\nu_2-2)/2]!}\left(\dfrac{\nu_1}{\nu_2}\right)^{\nu_1/2} f^{(\nu_1-2)/2}\left(1+\dfrac{\nu_1}{\nu_2}f\right)^{-(\nu_1+\nu_2)/2}$

Expected value:  $E\big(F_{(\nu_1,\nu_2)}\big) = \dfrac{\nu_2}{\nu_2-2}$  for ν₂ > 2; if ν₂ is large: ≈ 1. Please note: independent of ν₁.

Variance:  $V\big(F_{(\nu_1,\nu_2)}\big) = \dfrac{2\nu_2^2(\nu_1+\nu_2-2)}{\nu_1(\nu_2-2)^2(\nu_2-4)}$  for ν₂ > 4; if ν₁ ≈ ν₂ = ν and large: ≈ 4/ν; if ν₁ small and ν₂ large: ≈ 2/ν₁.

Skewness:  $\gamma_1 = \dfrac{(2\nu_1+\nu_2-2)\sqrt{8(\nu_2-4)}}{(\nu_2-6)\sqrt{\nu_1(\nu_1+\nu_2-2)}}$  for ν₂ > 6; if ν₁ ≈ ν₂ = ν and large: ≈ 6/√ν; if ν₁ small and ν₂ large: ≈ √(8/ν₁).

Use
1) Comparison of two variances from normally distributed populations.
2) Analysis of variance.

Please note that $\dfrac{\chi_1^2/1}{\chi_\nu^2/\nu} = \dfrac{Z^2}{\chi_\nu^2/\nu} \sim F_{(1;\nu)} \sim T_\nu^2$,  i.e. $T_\nu^2 \sim F_{(1,\nu)}$.

¹² Keller p. 289 and CD Applet 8

3. Choice of Test Statistic by Test of Expected Values and Proportions

A) Test for expected values

1. Test for one expected value (one μ-value)

$H_0: \mu = \mu_0$    $H_1: \mu \neq \mu_0$

- Is σ known? If yes, and $\bar{X}$ is approximately normally distributed ¹, use the normal test:
  $\dfrac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} \sim Z$
- If σ is unknown, and $\dfrac{\bar{X}-\mu_0}{S/\sqrt{n}}$ is approximately T-distributed ², use the T-test:
  $\dfrac{\bar{X}-\mu_0}{S/\sqrt{n}} \sim T_{n-1}$
- Otherwise, the normal test and the T-test for μ cannot be used.

¹ According to the central limit theorem, the sample size must be sufficiently large relative to the distribution of X. If f(x) is symmetric: n > 5-10. If f(x) is moderately skewed: n > 20-30. If f(x) is, on the contrary, extremely skewed, the demand on n can be extremely great. Rule of thumb: $n > 25\,\gamma_1^2$ (which requires knowledge or an assumption of the size of γ₁ — see p. 35).
² A stricter demand on the sample size, often referred to as the extended central limit theorem. Rule of thumb: $n > 100\,\gamma_1^2$.

If the sample makes up more than 5% of the population, make a correction for the finite population:

$\dfrac{\bar{X}-\mu_0}{\dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}} \sim Z$  and  $\dfrac{\bar{X}-\mu_0}{\dfrac{S}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}} \sim T_{n-1}$.

If X can be assumed to follow a Poisson distribution and μ > 10, the following normal test can be used: $\dfrac{\bar{X}-\mu_0}{\sqrt{\mu_0/n}} \approx Z$, since V(X) = μ for a Poisson-distributed random variable.
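A sketch (not part of the original notes) of the "σ unknown" branch of the decision tree on made-up data:

```python
# Sketch: one-sample T-test of H0: mu = 12 on illustration data (n = 25).
from math import sqrt
from scipy import stats

x = [12.1, 11.8, 12.4, 12.0, 11.9] * 5      # made-up observations
n, mu0 = len(x), 12.0
xbar = sum(x) / n
s = sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))

t_obs = (xbar - mu0) / (s / sqrt(n))        # sigma unknown -> T-test with n - 1 df
p_val = 2 * stats.t.sf(abs(t_obs), n - 1)
print(round(t_obs, 3), round(p_val, 3))
print(stats.ttest_1samp(x, mu0))            # same test via scipy
```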

2. Comparison of 2 μ-values

$H_0: \mu_1 - \mu_2 = (\mu_1-\mu_2)_0$    $H_1: \mu_1 - \mu_2 \neq (\mu_1-\mu_2)_0$

- Paired comparison? If yes, and $\dfrac{\bar{D}-D_0}{S_D/\sqrt{n}}$ is approximately T-distributed ¹, use the T-test:
  $\dfrac{\bar{D}-D_0}{S_D/\sqrt{n}} \sim T_{n-1}$
- Independent samples with σ₁² and σ₂² known: if $\bar{X}_1 - \bar{X}_2$ is approximately normally distributed ², use the normal test:
  $\dfrac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)_0}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}} \sim Z$
- Independent samples with unknown variances and σ₁² = σ₂² ³: if the test statistic is approximately T-distributed ⁴, use the pooled (equal-variance) T-test:
  $\dfrac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)_0}{\sqrt{S_p^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}} \sim T_{n_1+n_2-2}$
- Independent samples with unknown and unequal variances: if the test statistic is approximately T-distributed ⁵, use the unequal-variance T-test:
  $\dfrac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)_0}{\sqrt{\dfrac{S_1^2}{n_1}+\dfrac{S_2^2}{n_2}}} \approx T_\nu$
- Otherwise, the normal test or the T-test for μ₁ − μ₂ cannot be used ⁶.

¹ Identical to the test of μ with unknown σ. See Keller chapters 13.1-3 to distinguish between paired and independent samples.
² If X₁ and X₂ are not extremely skewed, $\bar{X}_1 - \bar{X}_2$ is approximately normally distributed even for relatively small samples. Usually n₁ ≥ 30 and n₂ ≥ 30 will be sufficient, with the caveat that the distributions must not be very unequal; see the reference to the central limit theorem on page 24.
³ Tested by an F-test, see page 45: $f_{Obs} = \dfrac{s_1^2}{s_2^2}$ under $H_0: \sigma_1^2 = \sigma_2^2$, if X₁ and X₂ are independent and normally distributed or n₁ and n₂ are sufficiently large. $S_p^2$ is stated on p. 37.
⁴ If X₁ and X₂ are independent and not skewed, the test statistic is approximately T-distributed even for relatively small samples. Usually n₁ ≥ 30 and n₂ ≥ 30 will be sufficient, with the caveat that the distributions must not be very unequal; see the reference to the central limit theorem on page 24. The test statistic will be approximately T-distributed with n₁ + n₂ − 2 degrees of freedom.
⁵ If X₁ and X₂ are independent and not skewed, the test statistic is approximately T-distributed even for relatively small samples. Usually n₁ ≥ 30 and n₂ ≥ 30 will be sufficient, with the caveat that the distributions must not be very unequal; see the reference to the central limit theorem on page 24. The test statistic will be approximately T-distributed with ν degrees of freedom, where

$\nu = \dfrac{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1}+\dfrac{(s_2^2/n_2)^2}{n_2-1}}$.

For n₁ = n₂ we get: $\nu = \dfrac{(n_2-1)\left(s_1^2+s_2^2\right)^2}{s_1^4+s_2^4}$.

On page 20 it is stated that when ν > 30, the standard normal distribution can be used as a reasonable approximation to the T-distribution, which means that in this case we can assume that

$\dfrac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)_0}{\sqrt{\dfrac{S_1^2}{n_1}+\dfrac{S_2^2}{n_2}}} \approx Z$

and hence omit the calculation of ν.
⁶ See Keller ch. 19.1.
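A sketch (not part of the original notes) of the two independent-sample branches on made-up data:

```python
# Sketch: pooled and unequal-variance T-tests for two independent samples.
from scipy import stats

x1 = [23.1, 24.5, 22.8, 25.0, 23.9, 24.2]
x2 = [21.9, 22.4, 23.0, 21.5, 22.8, 22.1]

# A check of equal variances (a robust alternative to the F-test on page 45)
print(stats.levene(x1, x2))

print(stats.ttest_ind(x1, x2, equal_var=True))   # pooled T-test, df = n1 + n2 - 2
print(stats.ttest_ind(x1, x2, equal_var=False))  # Welch T-test with the nu of footnote 5
```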

B) Test for proportions

1. Test for one proportion

$H_0: p = p_0$    $H_1: p \neq p_0$

- Is X approximately normally distributed? ¹ If yes, use the normal test:
  $\dfrac{X - n\,p_0}{\sqrt{n\,p_0(1-p_0)}} \approx Z$  or  $\dfrac{\hat{P}-p_0}{\sqrt{p_0(1-p_0)/n}} \approx Z$
- If no, solve by exact calculations. ²

¹ Evaluated on the basis of the approximation from either the hypergeometric distribution or the binomial distribution to the normal distribution.
² Examples:

a) Hypergeometric distribution:  $H_0: p = \dfrac{4}{52}$,  N = 52, n = 5, S = 4, x = 3

$p\text{-value} = 2\,P(X \geq 3 \mid N=52, n=5, S=4) = 2\left(\dfrac{\binom{4}{3}\binom{48}{2}}{\binom{52}{5}} + \dfrac{\binom{4}{4}\binom{48}{1}}{\binom{52}{5}}\right) = 2\,(0.001736 + 0.000018) = 0.00351$

b) Binomial distribution:  $H_0: p \leq 0.05$,  n = 50, x = 1

$p\text{-value} = P(X \leq 1 \mid n=50, p=0.05) = 0.279$

If the sample makes up more than 5% of a population, a correction for the finite population is made:

$\dfrac{X - n\,p_0}{\sqrt{n\,p_0(1-p_0)\dfrac{N-n}{N-1}}} \approx Z$  or  $\dfrac{\hat{P}-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}\cdot\dfrac{N-n}{N-1}}} \approx Z$.
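A sketch (not part of the original notes) reproducing the two exact p-value calculations above:

```python
# Sketch: exact p-values for the hypergeometric and binomial examples.
from scipy.stats import hypergeom, binom

p_a = 2 * hypergeom.sf(2, 52, 4, 5)   # 2 * P(X >= 3 | N=52, S=4, n=5)
p_b = binom.cdf(1, 50, 0.05)          # P(X <= 1 | n=50, p=0.05)
print(round(p_a, 5), round(p_b, 3))   # ≈ 0.00351  0.279
```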

2. Comparison of 2 proportions

$H_0: p_1 - p_2 = (p_1-p_2)_0$    $H_1: p_1 - p_2 \neq (p_1-p_2)_0$

- Are X₁ and X₂ approximately normally distributed? ¹ If not, there are no immediate solutions within the curriculum.
- If yes and $(p_1-p_2)_0 = 0$:

  $\dfrac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{P}(1-\hat{P})\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}} \approx Z$,  where  $\hat{P} = \dfrac{n_1\hat{P}_1 + n_2\hat{P}_2}{n_1+n_2} = \dfrac{X_1+X_2}{n_1+n_2}$

- If yes and $(p_1-p_2)_0 \neq 0$:

  $\dfrac{(\hat{P}_1 - \hat{P}_2) - (p_1-p_2)_0}{\sqrt{\dfrac{\hat{P}_1(1-\hat{P}_1)}{n_1}+\dfrac{\hat{P}_2(1-\hat{P}_2)}{n_2}}} \approx Z$

¹ Evaluated on the basis of the approximation from either the hypergeometric distribution or the binomial distribution to the normal distribution.

If one or both samples make up more than 5% of a population, a correction for the finite population is made:

$\dfrac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{P}(1-\hat{P})\left(\dfrac{1}{n_1}\cdot\dfrac{N_1-n_1}{N_1-1}+\dfrac{1}{n_2}\cdot\dfrac{N_2-n_2}{N_2-1}\right)}} \approx Z$

or

$\dfrac{(\hat{P}_1 - \hat{P}_2) - (p_1-p_2)_0}{\sqrt{\dfrac{\hat{P}_1(1-\hat{P}_1)}{n_1}\cdot\dfrac{N_1-n_1}{N_1-1}+\dfrac{\hat{P}_2(1-\hat{P}_2)}{n_2}\cdot\dfrac{N_2-n_2}{N_2-1}}} \approx Z$.

4. Sampling Methods 13
There are various methods of drawing random samples. One of the methods is to draw a simple random sample: you choose n among N elements. All N elements have the same probability of being drawn, and all $\binom{N}{n} = \dfrac{N!}{n!\,(N-n)!}$ sample combinations have the same probability. Metaphorically speaking, imagine that each of the N elements in the population has a numbered slip put in a hat, from which you draw n at random.

Sometimes you can or should apply other sampling methods. In the following we describe when and how.

If it is assumed that a given population is very different (heterogeneous) with respect to income across a specific variable, e.g. kind of housing, it is advantageous to apply stratified sampling. The variable, type of housing, is a stratification criterion, and the sub-populations, in this example e.g. house owners and tenants, are the strata. In this context heterogeneous means that the population consists of sub-populations, each of which is similar (homogeneous) internally, but across which there are differences. If you divide the population into the strata, owners and tenants, you will in each of these strata see a higher degree of homogeneity with respect to income than if you consider the population as a whole.

Because each stratum is relatively homogeneous, a smaller sample from each stratum is required in order to reach a reliable estimate of the mean income of that sub-population. You may furthermore allocate the sample among the strata so that you are certain that the sample is representative (proportional allocation), or so that the marginal information of the last sample unit drawn is the same for each stratum (optimal allocation). Concerning the calculation of the average and the variance, weights should be included, i.e. the population proportions made up by the sub-populations; the weight for stratum no. 1 is N₁/N. Metaphorically speaking it corresponds to all N₁ elements in stratum no. 1 having a numbered slip in hat no. 1, from which you draw n₁ at random, and so on for the other strata.
If we have the opposite situation, where each of the sub-populations is heterogeneous and homogeneity exists across the sub-populations, it is advantageous to use cluster sampling. If we assume that in a lot of 1,000 bags of potatoes there is no great variation from one bag to another, but there are large as well as small potatoes in every bag, we have homogeneity across the bags and heterogeneity within the bags.

Cluster sampling can also be an advantage in some other cases. It is often the case that you do not know the population (there is no register) from which you can draw a sample. In such a situation you can choose some sub-populations and then draw a simple random sample, or maybe make a total count. Large distances between the sub-populations, with a consequently long transportation time for an interviewer, as well as small marginal costs of choosing larger samples from a sub-population already selected, are indications in favour of cluster sampling. Metaphorically speaking it means that first you choose from among the hats, and then from among the slips in the selected hats.

¹³ Keller ch. 5.3

The fourth and last method to be discussed here is systematic sampling. Systematic sampling means that if you choose n among N elements, which are numbered from 1 to N, then you choose every (N/n)-th element. Since N/n is seldom an integer, you round off to the nearest integer. So all you have to do is choose a random element, j, from the first N/n elements; the chosen elements will be j, j+N/n, j+2N/n, j+3N/n, …, j+(n−1)N/n (see the sketch after this section). If there is no relationship between the order of the elements and the variable to be examined, the arithmetic for simple random sampling can be used. It should be noted, however, that not all sample combinations have the same probability, since only N/n of the $\binom{N}{n}$ combinations are possible.

If there is some kind of relationship between the order of the elements and the variable examined, you can, if possible, make a sorting resulting in a random order for the variable to be examined. If it is not possible to make such a sorting (e.g. move the passengers about in a bus, change the completion times of some mass-produced items), the methods of stratified sampling or of cluster sampling can be applied.

As an example, the Aarhus School of Business may wish to examine the degree of satisfaction with the tutorials in statistics. If the tutorials are the same for all groups, it would be advantageous to use cluster sampling. The tutorial groups are alike (homogeneous) in relation to satisfaction, whereas the students within each group differ (are heterogeneous). This means that first you choose some groups (clusters), and then some, or maybe all, students are chosen from each of the groups chosen. If the tutorials differ from one group to another (different teachers, times, rooms etc.), it would be advantageous to use stratified sampling. The tutorial groups differ (are heterogeneous) as far as satisfaction is concerned, whereas the students of a single group are more alike (homogeneous) their fellow students in the group than their other fellow students. This means that from each tutorial group (stratum) a sample is drawn. The size of the sample depends on the size of the group (the larger the group, the larger the sample) and, if the groups differ greatly with regard to variance, on the variance (the greater the variance, the larger the sample).

It should be noted that none of the four sampling methods discussed above is universally superior. For a given problem you may often find that one or more of the methods will be superior to the others with regard to efficiency. By efficiency is meant the least margin of error/uncertainty for a given sampling expenditure.
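A minimal sketch (not part of the original notes) of the systematic selection rule j, j+N/n, …, j+(n−1)N/n:

```python
# Sketch: indices chosen by systematic sampling of n = 5 from N = 52 elements.
import random

N, n = 52, 5
step = round(N / n)                 # N/n rounded to an integer, here 10
j = random.randint(1, step)         # random start among the first N/n elements
indices = [j + i * step for i in range(n)]
print(indices)                      # e.g. [4, 14, 24, 34, 44]
```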


5. Poisson Distribution, Confidence Intervals and Hypothesis Testing

If the random variable X follows a Poisson distribution, where E(X) = μ and V(X) = μ, the best estimate of the parameter μ is the number of occurrences in an interval of time divided by the number of time units (named m in the following, and not necessarily an integer).

If, on the basis of the number of occurrences that have taken place within one time unit, you want to construct a confidence interval for μ, the lower (μ_L) and upper (μ_H) limits can be obtained from

$P(X \geq x \mid \mu_L) = \alpha/2$  and  $P(X \leq x \mid \mu_H) = \alpha/2$.

As a solution it can be shown that ¹⁴

$\mu_L = \dfrac{\chi^2_{2x;\,1-\alpha/2}}{2}$  and  $\mu_H = \dfrac{\chi^2_{2(x+1);\,\alpha/2}}{2}$.

Assume that the number of orders a department within a company receives per day is Poisson distributed and that you one day have received 15 orders. The best estimate of μ would be 15. A 95% confidence interval for μ would be constructed as follows:

$\mu_L = \dfrac{\chi^2_{2x;\,1-\alpha/2}}{2} = \dfrac{\chi^2_{30;\,0.975}}{2} = \dfrac{16.79}{2} = 8.395$  and  $\mu_H = \dfrac{\chi^2_{2(x+1);\,\alpha/2}}{2} = \dfrac{\chi^2_{32;\,0.025}}{2} = \dfrac{49.48}{2} = 24.74$.

Assume instead that the number of daily orders follows a Poisson distribution and that within a year of 210 (= m) working days you have received in total 3150 (= x) orders. The best estimate of μ would then be

$\hat{\mu} = \dfrac{x}{m} = \dfrac{3150}{210} = 15$.

A 95% confidence interval for μ would be constructed, with approximation to the normal distribution (see p. 22), as follows:

$m\,\mu_L = \dfrac{\chi^2_{6300;\,0.975}}{2} \approx \dfrac{6300 - 1.96\sqrt{12600}}{2} = 3040$  and  $m\,\mu_H = \dfrac{\chi^2_{6302;\,0.025}}{2} \approx \dfrac{6302 + 1.96\sqrt{12604}}{2} = 3261$

$\mu_L = \dfrac{3040}{210} = 14.48$  and  $\mu_H = \dfrac{3261}{210} = 15.53$.

¹⁴ Johnson NL, Kotz S. Discrete Distributions. Boston: Houghton Mifflin Company, 1969; and Stuart A, Ord JK. Kendall's Advanced Theory of Statistics (6th edition). London: Edward Arnold, 1994.
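A sketch (not part of the original notes) of the exact interval above; note that the notes' χ² subscripts are upper-tail probabilities, while scipy's ppf takes the lower-tail probability:

```python
# Sketch: exact 95% Poisson confidence interval for x = 15 orders in one day.
from scipy.stats import chi2

x, alpha = 15, 0.05
mu_L = chi2.ppf(alpha / 2, 2 * x) / 2            # chi2_{30; 0.975} / 2  ≈ 8.40
mu_H = chi2.ppf(1 - alpha / 2, 2 * (x + 1)) / 2  # chi2_{32; 0.025} / 2  ≈ 24.74
print(round(mu_L, 3), round(mu_H, 2))
```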

If m·μ is larger than 10, the interval could alternatively be calculated as follows:

$\dfrac{x}{m} \pm z_{\alpha/2}\,\dfrac{\sqrt{x}}{m} = 15 \pm 1.96\cdot\dfrac{\sqrt{3150}}{210} = 15 \pm 1.96 \cdot 0.267 = 15 \pm 0.52$.

Testing whether μ = 16 can be done as follows:

$H_0: \mu = 16$

With one day's observation (x = 15):  $z_{Obs} = \dfrac{x - \mu_0}{\sqrt{\mu_0}} = \dfrac{15-16}{\sqrt{16}} = -\dfrac{1}{4} = -0.25$

With a year's observations (x = 3150, m = 210):  $z_{Obs} = \dfrac{\dfrac{x}{m} - \mu_0}{\sqrt{\dfrac{\mu_0}{m}}} = \dfrac{\dfrac{3150}{210} - 16}{\sqrt{\dfrac{16}{210}}} = \dfrac{15-16}{0.276} = -3.62$.

Alternatively by a goodness-of-fit test:

$\chi^2_{Obs} = \dfrac{(x - \mu_0)^2}{\mu_0}$  and  $\chi^2_{Obs} = \dfrac{(x - m\,\mu_0)^2}{m\,\mu_0}$,

which are to be compared to the critical point $\chi^2_{1;\,\alpha}$.

With reference to the source below ¹⁵, two independent Poisson processes can be compared as follows, when $m_i\,\mu_i$ can be assumed to be larger than 10:

Confidence interval:

$\mu_1 - \mu_2 = \dfrac{x_1}{m_1} - \dfrac{x_2}{m_2} \pm z_{\alpha/2}\sqrt{\dfrac{x_1}{m_1^2} + \dfrac{x_2}{m_2^2}}$

Hypothesis test, $H_0: \mu_1 - \mu_2 = (\mu_1-\mu_2)_0$:

If $(\mu_1-\mu_2)_0 = 0$:  $z_{Obs} = \dfrac{\dfrac{x_1}{m_1} - \dfrac{x_2}{m_2}}{\sqrt{\dfrac{x_1+x_2}{m_1\,m_2}}}$;  if in addition m₁ = m₂:  $z_{Obs} = \dfrac{x_1 - x_2}{\sqrt{x_1+x_2}}$

If $(\mu_1-\mu_2)_0 \neq 0$:  $z_{Obs} = \dfrac{\dfrac{x_1}{m_1} - \dfrac{x_2}{m_2} - (\mu_1-\mu_2)_0}{\sqrt{\dfrac{x_1}{m_1^2} + \dfrac{x_2}{m_2^2}}}$;  if m₁ = m₂:  $z_{Obs} = \dfrac{(x_1 - x_2) - m_1(\mu_1-\mu_2)_0}{\sqrt{x_1+x_2}}$

¹⁵ Sahai H, Khurshid A. Statistics in Epidemiology: Methods, Techniques and Applications. CRC Press, 1996.

Assume that the same assumptions hold for another, independent department of the company. This department has one day received 14 orders, and within 210 working days it has received 2940 orders in total. Confidence intervals for the difference in the two cases would, by approximation to the normal distribution, be as follows:

One day:  $\mu_1 - \mu_2 = (x_1 - x_2) \pm z_{\alpha/2}\sqrt{x_1+x_2} = (15-14) \pm 1.96\sqrt{15+14} = 1 \pm 1.96\cdot 5.39 = 1 \pm 10.55$

One year:  $\mu_1 - \mu_2 = \dfrac{x_1}{m_1} - \dfrac{x_2}{m_2} \pm z_{\alpha/2}\sqrt{\dfrac{x_1}{m_1^2}+\dfrac{x_2}{m_2^2}} = (15-14) \pm 1.96\sqrt{\dfrac{3150}{210^2}+\dfrac{2940}{210^2}} = 1 \pm 1.96\cdot 0.372 = 1 \pm 0.73$.

Hypothesis tests for the difference in the two cases ($H_0: \mu_1 - \mu_2 = 0$) are as follows:

One day:  $z_{Obs} = \dfrac{(x_1 - x_2) - (\mu_1-\mu_2)_0}{\sqrt{x_1+x_2}} = \dfrac{(15-14) - 0}{\sqrt{15+14}} = \dfrac{1}{5.39} = 0.19$

One year:  $z_{Obs} = \dfrac{\dfrac{x_1}{m_1} - \dfrac{x_2}{m_2} - (\mu_1-\mu_2)_0}{\sqrt{\dfrac{x_1+x_2}{m_1\,m_2}}} = \dfrac{\dfrac{3150-2940}{210}}{\dfrac{\sqrt{3150+2940}}{210}} = \dfrac{210}{\sqrt{6090}} = \dfrac{210}{78.04} = 2.69$.

For testing 3 or more independent Poisson processes, $H_0: \mu_1 = \mu_2 = \ldots = \mu_k$, a χ²-test is applied:

$\chi^2_{Obs} = \sum_{i=1}^{k}\dfrac{(f_i - e_i)^2}{e_i} \approx \chi^2_{k-1}$,  where $f_i = x_i$ and $e_i = m_i\,\dfrac{\sum_{i=1}^{k} x_i}{\sum_{i=1}^{k} m_i}$.

If $m_i = m$ we get

$\chi^2_{Obs} = \sum_{i=1}^{k}\dfrac{(x_i - \hat{\mu})^2}{\hat{\mu}} = \dfrac{(k-1)\,s_X^2}{\hat{\mu}}$,  where $\hat{\mu} = \dfrac{\sum_{i=1}^{k} x_i}{k}$,  i.e.  $\chi^2_{Obs} = \dfrac{\sum_{i=1}^{k} x_i^2}{\sum_{i=1}^{k} x_i / k} - \sum_{i=1}^{k} x_i$.

Assume that a third, independent department has one day received 19 orders and within 210 working days has received 3045 orders in total. The three departments can now be compared with each other.

One day (mᵢ = 1):

Department   mᵢ   fᵢ   eᵢ    (fᵢ − eᵢ)²/eᵢ
1            1    15   16    1/16
2            1    14   16    4/16
3            1    19   16    9/16
Sum          3    48   48    χ²_Obs = 0.875

One year (mᵢ = 210):

Department   mᵢ    fᵢ     eᵢ     (fᵢ − eᵢ)²/eᵢ
1            210   3150   3045   11025/3045
2            210   2940   3045   11025/3045
3            210   3045   3045   0
Sum          630   9135   9135   χ²_Obs = 7.241

In both cases the χ²_Obs value is to be evaluated against the critical point $\chi^2_{2;\,0.05} = 5.99$.

Using the alternative formula:

$\chi^2_{Obs} = \dfrac{15^2 + 14^2 + 19^2}{48/3} - 48 = 0.875$  and  $\chi^2_{Obs} = \dfrac{3150^2 + 2940^2 + 3045^2}{9135/3} - 9135 = 7.241$.
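A sketch (not part of the original notes) reproducing the yearly comparison of the three departments:

```python
# Sketch: chi-square comparison of the three departments' yearly order counts.
from scipy.stats import chi2

x = [3150, 2940, 3045]          # observed orders per department (f_i)
m = [210, 210, 210]             # working days per department
mu_hat = sum(x) / sum(m)        # pooled intensity, 14.5 orders per day
e = [mi * mu_hat for mi in m]   # expected counts, 3045 each

chi2_obs = sum((fi - ei) ** 2 / ei for fi, ei in zip(x, e))
p_value = chi2.sf(chi2_obs, df=len(x) - 1)
print(round(chi2_obs, 3), round(p_value, 4))   # 7.241, ≈ 0.0268
```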

6. Overview

A) Descriptive measures

Parameter μ — estimate: $\bar{x} = \sum_{i=1}^{n} x_i / n$. Average (Mean, Keller p. 98).

Parameter σ² — estimate: $s^2 = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1} = \dfrac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}$. Variance (Keller p. 107).

Parameter σ — estimate: $s = \sqrt{s^2}$. Standard deviation (Std., Keller p. 110).

Parameter γ₁ — estimate: $g_1 = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^3}{n\,s^3}$. Skewness (Keller p. 36): skewed to the left γ₁ < 0, symmetrical γ₁ = 0, skewed to the right γ₁ > 0.

Parameter γ₂ — estimate: $g_2 = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^4}{n\,s^4} - 3$. Kurtosis: less peaked than the normal distribution γ₂ < 0, normal distribution γ₂ = 0, more peaked than the normal distribution γ₂ > 0.

$\sum_{i=1}^{n} x_i^2$ — sum of squared values.

$SS_X = \sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = (n-1)\,s_X^2$ — sum of squares of the deviations (sum of squared deviations, Keller p. 107).

Coefficient of variation: $cv = \dfrac{s}{\bar{x}}$ (CV, Keller p. 113).

Standard error: $s_{\bar{X}} = \dfrac{s}{\sqrt{n}}$, or $s_{\bar{X}} = \dfrac{s}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}$ if $\dfrac{n}{N} > 5\%$ (Std. Error, Keller p. 300).
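A sketch (not part of the original notes) computing the descriptive measures for a small made-up sample:

```python
# Sketch: descriptive measures from the overview for an illustration sample.
from math import sqrt

x = [4, 7, 8, 5, 6, 9, 3, 6]
n = len(x)
mean = sum(x) / n
ss_x = sum((v - mean) ** 2 for v in x)              # sum of squared deviations
s2 = ss_x / (n - 1)                                 # variance
s = sqrt(s2)                                        # standard deviation
g1 = sum((v - mean) ** 3 for v in x) / (n * s**3)   # skewness
cv = s / mean                                       # coefficient of variation
se = s / sqrt(n)                                    # standard error of the mean
print(round(mean, 2), round(s2, 2), round(g1, 3), round(cv, 3), round(se, 3))
```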

B) Construction of confidence intervals

Generally it is assumed that the sample is drawn at random and that the responses are reliable. A confidence interval with a confidence level of 1 − α expresses that with a certainty of 1 − α you can say that the parameter is included in the interval. In case the sample makes up more than 5% of the population, the variance of the test statistic is to be corrected by the finite population correction factor $\frac{N-n}{N-1}$, e.g. $\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$. The parameter/starting-point combinations listed below cover the most common cases.

θ (generally), known or assumed variance, estimate θ̂:
  $\theta = \hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$
  Assumption: the test statistic θ̂ follows a normal distribution.

μ, one sample, known or assumed variance, estimate $\bar{x} = \sum_{i=1}^{n} x_i / n$:
  $\mu = \bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
  Assumption: $\bar{X}$ follows a normal distribution re the central limit theorem: irrespective of the distribution of X, $\bar{X}$ is approximately normally distributed when the sample is sufficiently large. Rule of thumb: the sample should exceed 25 times the square of the skewness.

μ, one sample, unknown variance, estimates $\bar{x}$ and $s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$:
  $\mu = \bar{x} \pm t_{n-1;\,\alpha/2}\,\dfrac{s}{\sqrt{n}}$
  Assumption: X follows a normal distribution, or the extended central limit theorem is fulfilled. If X is not extremely skewed, $\dfrac{\bar{X}-\mu}{S/\sqrt{n}}$ is approximately $T_{n-1}$-distributed when the sample is sufficiently large. Rule of thumb: the sample should exceed 100 times the square of the skewness (see pp. 24 and 35).

μ, occurrences per time unit, X ~ Poisson distributed (see pp. 14, 31 and 32); x = the number of occurrences, m = the number of time units (m > 0):
  $\mu = \dfrac{x}{m} \pm z_{\alpha/2}\,\sqrt{\dfrac{x}{m^2}}$
  Assumptions: X is approximately normally distributed; m·μ is assumed to be larger than 10. Note that x is used for calculating the margin of error.

μ_D, paired samples, unknown variance, estimates $\bar{d} = \sum_{i=1}^{n} d_i / n$ (with $d_i = x_{1i} - x_{2i}$) and $s_D = \sqrt{\dfrac{\sum_{i=1}^{n}(d_i-\bar{d})^2}{n-1}}$:
  $\mu_D = \bar{d} \pm t_{n-1;\,\alpha/2}\,\dfrac{s_D}{\sqrt{n}}$
  Assumptions: D follows a normal distribution, or the extended central limit theorem is fulfilled. If D is not extremely skewed, $\dfrac{\bar{D}-\mu_D}{S_D/\sqrt{n}}$ is approximately $T_{n-1}$-distributed when the sample is sufficiently large.

μ₁ − μ₂, two independent samples, known variances, estimates $\bar{x}_1$ and $\bar{x}_2$:
  $\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}$
  Assumption: $\bar{X}_1 - \bar{X}_2$ is approximately normally distributed.

μ₁ − μ₂, two independent samples, unknown but equal variances (see p. 25), estimate $s_p^2 = \dfrac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$:
  $\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2;\,\alpha/2}\sqrt{s_p^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}$
  Assumption: $\dfrac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{S_p^2(1/n_1+1/n_2)}}$ is approximately $T_{n_1+n_2-2}$-distributed.

μ₁ − μ₂, two independent samples, unknown and unequal variances, estimates $s_1^2 = \dfrac{\sum_{i=1}^{n_1}(x_{1i}-\bar{x}_1)^2}{n_1-1}$ and correspondingly $s_2^2$:
  $\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{\nu;\,\alpha/2}\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}$
  Assumption: $\dfrac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{S_1^2/n_1+S_2^2/n_2}}$ is approximately $T_\nu$-distributed; for the calculation of ν see p. 26.

μ₁ − μ₂, occurrences per time unit, X₁ and X₂ ~ Poisson distributed (see pp. 14, 32 and 33); x₁ = number of occurrences in Poisson process 1, etc.; two time spans of m₁ and m₂ time units:
  $\mu_1 - \mu_2 = \dfrac{x_1}{m_1} - \dfrac{x_2}{m_2} \pm z_{\alpha/2}\sqrt{\dfrac{x_1}{m_1^2}+\dfrac{x_2}{m_2^2}}$
  Assumptions: X₁ − X₂ is approximately normally distributed, X₁ and X₂ are independent, and m₁μ₁ and m₂μ₂ are both larger than 10. Note that x₁ and x₂ are used for calculating the margin of error.

μᵢ − μⱼ, three or more independent samples, simultaneous confidence intervals, unknown but equal variances, estimate $s_p^2 = \dfrac{\sum_{j=1}^{k}(n_j-1)s_j^2}{n-k}$:
  $\mu_i - \mu_j = (\bar{x}_i - \bar{x}_j) \pm t_{n-k;\,\alpha^*/2}\sqrt{s_p^2\left(\dfrac{1}{n_i}+\dfrac{1}{n_j}\right)}$,  where $\alpha^* = \dfrac{2\alpha}{k(k-1)}$
  Assumption: $\dfrac{(\bar{X}_i-\bar{X}_j)-(\mu_i-\mu_j)}{\sqrt{S_p^2(1/n_i+1/n_j)}}$ is approximately $T_{n-k}$-distributed.

σ², one sample, estimate $s^2 = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$:
  $\dfrac{(n-1)s^2}{\chi^2_{n-1;\,\alpha/2}} \;\leq\; \sigma^2 \;\leq\; \dfrac{(n-1)s^2}{\chi^2_{n-1;\,1-\alpha/2}}$
  Assumption: X is normally distributed; alternatively, the sample is sufficiently large.

σ₁²/σ₂², two independent samples:
  $\dfrac{s_1^2/s_2^2}{f_{n_1-1;\,n_2-1;\,\alpha/2}} \;\leq\; \dfrac{\sigma_1^2}{\sigma_2^2} \;\leq\; \dfrac{s_1^2/s_2^2}{f_{n_1-1;\,n_2-1;\,1-\alpha/2}}$
  Assumption: X₁ and X₂ are both normally distributed; alternatively, the samples are sufficiently large.

p, one sample, X binomially or hypergeometrically distributed, estimate $\hat{p} = \dfrac{x}{n}$ (x = number of successes):
  $p = \hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
  Assumption: X is approximately normally distributed; this means that $n\hat{p}$ and $n(1-\hat{p})$ can be assumed to be larger than 5, or that V(X) can be assumed to be larger than 5.

p₁ − p₂, two independent samples, X₁ and X₂ binomially or hypergeometrically distributed, estimates $\hat{p}_1 = \dfrac{x_1}{n_1}$, etc.:
  $p_1 - p_2 = (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
  Assumption: X₁ − X₂ is approximately normally distributed; this means that $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$ and $n_2(1-\hat{p}_2)$ can be assumed to be larger than 5, or that V(X₁) and V(X₂) can be assumed to be larger than 5.

pᵢ − pⱼ, one sample, X multinomially or k-dimensionally hypergeometrically distributed, estimates $\hat{p}_i = \dfrac{x_i}{n}$ and $\hat{p}_j = \dfrac{x_j}{n}$ (xᵢ = number of outcomes with characteristic i):
  $p_i - p_j = (\hat{p}_i - \hat{p}_j) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_i + \hat{p}_j - (\hat{p}_i - \hat{p}_j)^2}{n}}$
  Assumption: Xᵢ − Xⱼ is approximately normally distributed; this means that $n\hat{p}_i$, $n(1-\hat{p}_i)$, $n\hat{p}_j$ and $n(1-\hat{p}_j)$ can be assumed to be larger than 5, or that V(Xᵢ) and V(Xⱼ) can be assumed to be larger than 5.

β₁, regression analysis, estimate $b_1 = \dfrac{cov(X,Y)}{s_X^2}$:
  $\beta_1 = b_1 \pm t_{n-2;\,\alpha/2}\,s_{b_1}$
  Assumption: the error component ε is approximately normally distributed, E(ε) = 0 and V(ε) = σ_ε², constant.

β₀, regression analysis, estimate $b_0 = \bar{y} - b_1\bar{x}$:
  $\beta_0 = b_0 \pm t_{n-2;\,\alpha/2}\,s_{b_0}$
  Assumptions: as for β₁, and X = 0 should be close to the observed X values.

μ_{Y|X}, confidence interval for the expected value of Y given X = x (regression analysis):
  $E(Y \mid X = x) = (b_0 + b_1 x) \pm t_{n-2;\,\alpha/2}\,s_\varepsilon\sqrt{\dfrac{1}{n}+\dfrac{(x-\bar{x})^2}{SS_X}}$
  Assumptions: as for β₁, and X = x should be close to the observed X values.

Y|X, prediction interval for a single value of Y given X = x (regression analysis):
  $Y \mid X = x = (b_0 + b_1 x) \pm t_{n-2;\,\alpha/2}\,s_\varepsilon\sqrt{1+\dfrac{1}{n}+\dfrac{(x-\bar{x})^2}{SS_X}}$
  Assumptions: as for β₁, and X = x should be close to the observed X values.

ρ, analysis of correlation (degree of linear relationship), when ρ is assumed to be ≈ 0, estimate $r = \dfrac{cov(X,Y)}{s_X\,s_Y}$:
  $\rho = r \pm t_{n-2;\,\alpha/2}\sqrt{\dfrac{1-r^2}{n-2}}$
  Assumptions: corresponding to the assumptions concerning β₁.

ρ, analysis of correlation, when ρ is assumed to be ≠ 0:
  $\rho_Z = r_Z \pm z_{\alpha/2}/\sqrt{n-3}$, transformed back to the interval [ρ_Lower; ρ_Upper], with
  $r_Z = \big(\ln(1+r) - \ln(1-r)\big)/2$,
  $\rho_{Lower} = \dfrac{e^{2(r_Z - z_{\alpha/2}/\sqrt{n-3})}-1}{e^{2(r_Z - z_{\alpha/2}/\sqrt{n-3})}+1}$  and  $\rho_{Upper} = \dfrac{e^{2(r_Z + z_{\alpha/2}/\sqrt{n-3})}-1}{e^{2(r_Z + z_{\alpha/2}/\sqrt{n-3})}+1}$
  Assumptions: corresponding to the assumptions concerning β₁.
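A sketch (not part of the original notes) of the confidence interval for the regression slope β₁ on made-up (x, y) data:

```python
# Sketch: 95% CI for the slope b1 ± t_{n-2; 0.025} * s_b1.
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2]
res = stats.linregress(x, y)                 # slope b1, intercept b0, r, p-value, s_b1
n = len(x)
t = stats.t.ppf(0.975, n - 2)
ci = (res.slope - t * res.stderr, res.slope + t * res.stderr)
print(round(res.slope, 3), [round(v, 3) for v in ci])
```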

C) Application of hypotheses

Generally it is assumed that the sample is a simple random sample and that the responses are reliable. The 7-point approach below refers to a hypothesis on a μ-value. The approach can also be used, with other notation, for tests of proportions, differences between two means/proportions, as well as for tests on one and two variances, respectively.

X: definition of the random variable. Given information: n, x̄ and σ² or s².

1) Hypothesis formulation
H₀: μ = μ₀. The null hypothesis is status quo (write down the action taken if H₀ is chosen).
H₁: μ ≠ μ₀. The alternative hypothesis is what we wish or fear to show (write down the action taken if H₁ is chosen).
H₁ could possibly be one-sided, μ > μ₀ or μ < μ₀, given that this can be reasoned for through a theory or an earlier survey within the given area. If the test is one-sided, this affects the critical limit (point 5) and the calculation of the p-value (point 6).

2) Choice of significance level α, typically 0.05 unless otherwise stated (see Keller p. 346)
The maximum uncertainty allowed for a type I error, i.e. choosing H₁ when H₀ is true. If the consequence of committing a type I error is crucial, the level should be lowered.

3) Choice of test statistic (cf. pp. 24, 25-28 and 43-48)
If the population variance (σ²) is known, or assumed, a Z test statistic is used (see Keller chapter 11.2): $\dfrac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} \sim Z$. If the variance is estimated (s²) on the basis of a simple random sample, the T test statistic is used: $\dfrac{\bar{X}-\mu_0}{S/\sqrt{n}} \sim T_{n-1}$ (see Keller chapter 12.1).
Assumptions:
- H₀ is true — hypothesis testing is conducted under the assumption that H₀ is true.
- $\bar{X}$ is approximately normally distributed, or $\dfrac{\bar{X}-\mu_0}{S/\sqrt{n}} \sim T_{n-1}$ — must be discussed, see p. 24 and Keller p. 300 and p. 389.
- n/N < 0.05 — or else the finite population correction factor must be added.

4) Calculation of the value of the test statistic (see Keller p. 351 and p. 383)
$z_{Obs} = \dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$  or  $t_{Obs} = \dfrac{\bar{x}-\mu_0}{s/\sqrt{n}}$:
the number of standard deviations of $\bar{X}$ ($\sigma_{\bar{X}}$ or $s_{\bar{X}}$) that the sample result ($\bar{x}$) deviates from μ₀.
If the sample represents more than 5% of the population, the variance of the estimator is corrected with the finite population correction factor, e.g. $\sigma^2_{\bar{X}} = \dfrac{\sigma^2}{n}\cdot\dfrac{N-n}{N-1}$.

5) Determination of critical values and choice between H₀ and H₁ (see Keller pp. 350-351 and p. 386)
−z_{α/2} and z_{α/2} — add a graphic illustration.
If |z_obs| ≤ z_{α/2}: maintain H₀ (H₀ cannot be rejected, H₀ is "accepted", H₀ is chosen).
If |z_obs| > z_{α/2}: reject H₀ (making H₁ plausible, H₁ is chosen).
At a lower one-sided hypothesis test, H₁: μ < μ₀, the critical limit is −z_α; at an upper one-sided hypothesis test, H₁: μ > μ₀, the critical limit is z_α.
If the variance is estimated on the basis of the sample, the fractiles of the T_{n−1}-distribution must be applied.

6) Calculation of the p-value (see Keller p. 353 and p. 386)
The probability of a sample result just as extreme or more extreme than the observed one, if the null hypothesis is true, i.e. 2·P(Z ≤ −|z_obs|) or 2·P(T_{n−1} ≤ −|t_obs|) for a two-sided test.
At a lower one-sided hypothesis test, H₁: μ < μ₀, the p-value is P(Z ≤ z_obs). At an upper one-sided hypothesis test, H₁: μ > μ₀, the p-value is P(Z ≥ z_obs).

7) Conclusion (see Keller p. 355 and p. 358)
Must contain a choice: either you choose H₀ or H₁, together with the certainty with which the choice has been made.
It is a very certain conclusion when you choose H₀ and the p-value is >> α.
It is an uncertain conclusion when you choose H₀ and the p-value is > α but close to α.
It is an uncertain conclusion when you choose H₁ and the p-value is < α but close to α.
It is a very certain conclusion when you choose H₁ and the p-value is << α.
Reservations you may have about the assumptions have hardly any influence on the choice if it is a very certain conclusion, but they may be of importance for the choice between H₀ and H₁ when dealing with an uncertain conclusion.
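A sketch (not part of the original notes) of steps 3-7 for H₀: μ = 500 against H₁: μ ≠ 500, using made-up summary numbers:

```python
# Sketch: T-test for one mu with n = 36, xbar = 506, s = 18 (illustration values).
from math import sqrt
from scipy import stats

n, xbar, s, mu0, alpha = 36, 506.0, 18.0, 500.0, 0.05
t_obs = (xbar - mu0) / (s / sqrt(n))           # step 4: 2.0
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)     # step 5: ≈ 2.03
p_value = 2 * stats.t.sf(abs(t_obs), n - 1)    # step 6: ≈ 0.053
print(round(t_obs, 2), round(t_crit, 2), round(p_value, 3))
# |t_obs| < t_crit and p-value > alpha, but only barely: an uncertain choice of H0.
```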

The null hypotheses listed below cover the most common cases; for each, the test statistic, its observed value and the assumptions are stated.

H₀: θ = θ₀ (generally), σ_θ̂ known or assumed:
  Test statistic: $\dfrac{\hat{\theta}-\theta_0}{\sigma_{\hat{\theta}}} \sim Z$;  observed value: $z_{Obs} = \dfrac{\hat{\theta}-\theta_0}{\sigma_{\hat{\theta}}}$
  Assumption: the test statistic θ̂ follows a normal distribution.

H₀: μ = μ₀, σ² known or assumed:
  Test statistic: $\dfrac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} \sim Z$;  observed value: $z_{Obs} = \dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$
  Assumption: $\bar{X}$ follows a normal distribution re the central limit theorem: irrespective of the distribution of X, $\bar{X}$ is approximately normally distributed when the sample is sufficiently large. Rule of thumb: the sample should exceed 25 times the square of the skewness.

H₀: μ = μ₀, unknown variance:
  Test statistic: $\dfrac{\bar{X}-\mu_0}{S/\sqrt{n}} \sim T_{n-1}$;  observed value: $t_{Obs} = \dfrac{\bar{x}-\mu_0}{s/\sqrt{n}}$
  Assumption: X follows a normal distribution or the extended central limit theorem is fulfilled. If X is not extremely skewed, $\dfrac{\bar{X}-\mu_0}{S/\sqrt{n}}$ is approximately $T_{n-1}$-distributed when the sample is sufficiently large. Rule of thumb: the sample should exceed 100 times the estimated square of the skewness (see pp. 24 and 35).

H₀: μ = μ₀, X ~ Poisson, occurrences per time unit:
  Test statistic: $\dfrac{\bar{X}-\mu_0}{\sqrt{\mu_0/m}} \sim Z$;  observed value: $z_{Obs} = \dfrac{x/m-\mu_0}{\sqrt{\mu_0/m}}$
  Assumptions: X is approximately normally distributed; m·μ₀ is assumed to be larger than 10. Note that μ₀ is used for calculating the standard deviation of the test statistic. m is the number of time units, m > 0.

H₀: μ_D = D₀, paired samples:
  Test statistic: $\dfrac{\bar{D}-D_0}{S_D/\sqrt{n}} \sim T_{n-1}$;  observed value: $t_{Obs} = \dfrac{\bar{d}-D_0}{s_D/\sqrt{n}}$
  Assumption: D follows a normal distribution or the extended central limit theorem is fulfilled. If D is not extremely skewed, $\dfrac{\bar{D}-D_0}{S_D/\sqrt{n}}$ is approximately $T_{n-1}$-distributed when the sample is sufficiently large.

H₀: μ₁ − μ₂ = (μ₁ − μ₂)₀, two independent samples, σⱼ² known or assumed:
  Test statistic: $\dfrac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)_0}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}} \sim Z$;  observed value: $z_{Obs} = \dfrac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)_0}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}}$
  Assumption: $\bar{X}_1 - \bar{X}_2$ is approximately normally distributed.

H₀: μ₁ − μ₂ = (μ₁ − μ₂)₀, two independent samples, unknown variances, given σ₁² = σ₂²:
  Test statistic: $\dfrac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)_0}{\sqrt{S_p^2(1/n_1+1/n_2)}} \sim T_{n_1+n_2-2}$;  observed value: $t_{Obs} = \dfrac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)_0}{\sqrt{s_p^2(1/n_1+1/n_2)}}$
  Assumption: the test statistic is approximately $T_{n_1+n_2-2}$-distributed.

H₀: μ₁ − μ₂ = (μ₁ − μ₂)₀, two independent samples, unknown and unequal variances:
  Test statistic: $\dfrac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)_0}{\sqrt{S_1^2/n_1+S_2^2/n_2}} \approx T_\nu$;  observed value: $t_{Obs} = \dfrac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)_0}{\sqrt{s_1^2/n_1+s_2^2/n_2}}$
  Assumption: the test statistic is approximately $T_\nu$-distributed; for the calculation of ν see p. 26.

H₀: μ₁ − μ₂ = 0, Xⱼ ~ Poisson, occurrences per time unit:
  Test statistic: $\dfrac{\dfrac{X_1}{m_1}-\dfrac{X_2}{m_2}}{\sqrt{\dfrac{X_1+X_2}{m_1 m_2}}} \sim Z$;  observed value: $z_{Obs} = \dfrac{\dfrac{x_1}{m_1}-\dfrac{x_2}{m_2}}{\sqrt{\dfrac{x_1+x_2}{m_1 m_2}}}$
  Assumptions: X₁ − X₂ is approximately normally distributed; X₁ and X₂ are independent; m₁μ₁ and m₂μ₂ are both larger than 10. Note that x₁ and x₂ are used for calculating the standard deviation of the test statistic. Two time spans of m₁ and m₂ time units, respectively.

H₀: μ₁ = μ₂ = … = μ_k (analysis of variance):
  Test statistic: $\dfrac{MSTR}{MSE} \sim F_{k-1;\,n-k}$;  observed value: $f_{Obs} = \dfrac{\sum_{j=1}^{k} n_j(\bar{x}_j-\bar{x})^2/(k-1)}{\sum_{j=1}^{k}(n_j-1)s_j^2/(n-k)}$
  Assumptions: the Xⱼ's are approximately normally distributed, have equal variances and are independent; alternatively, the samples are sufficiently large.

H₀: σ² = σ₀²:
  Test statistic: $\dfrac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1}$;  observed value: $\chi^2_{Obs} = \dfrac{(n-1)s^2}{\sigma_0^2}$
  Assumption: X is normally distributed; alternatively, the sample is sufficiently large.

H₀: σ₁² = σ₂²:
  Test statistic: $\dfrac{S_1^2}{S_2^2} \sim F_{n_1-1;\,n_2-1}$;  observed value: $f_{Obs} = \dfrac{s_1^2}{s_2^2}$
  Assumption: both X₁ and X₂ are normally distributed; alternatively, the samples are sufficiently large.

H₀: σ₁² = σ₂² = … = σ_k² (the "false" F-test):
  Observed value: $f_{Obs} = \dfrac{s_{Max}^2}{s_{Min}^2}$, evaluated against $f_{n_i-1;\,n_j-1;\,\alpha^*/2}$ with $\alpha^* = \dfrac{2\alpha}{k(k-1)}$
  Assumptions: Xᵢ and Xⱼ are both normally distributed and the nⱼ are equal; alternatively, the samples are sufficiently large.
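A sketch (not part of the original notes) of the analysis-of-variance row above, on made-up samples:

```python
# Sketch: one-way analysis of variance, H0: mu1 = mu2 = mu3.
from scipy import stats

g1 = [23, 25, 21, 24, 22]
g2 = [28, 27, 26, 29, 25]
g3 = [24, 23, 25, 22, 26]
print(stats.f_oneway(g1, g2, g3))   # F_obs and p-value, df = (k-1, n-k) = (2, 12)
```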

H₀: p = p₀ (with $\hat{P} = X/n$):
  Test statistic: $\dfrac{\hat{P}-p_0}{\sqrt{p_0(1-p_0)/n}} \sim Z$;  observed value: $z_{Obs} = \dfrac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}$
  Assumptions: X is binomially or hypergeometrically distributed and approximately normally distributed; this means that n·p₀ and n·(1 − p₀) are larger than 5, or that V(X) is larger than 5.

H₀: p₁ − p₂ = 0, two independent samples:
  Test statistic: $\dfrac{\hat{P}_1-\hat{P}_2}{\sqrt{\hat{P}(1-\hat{P})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} \sim Z$;  observed value: $z_{Obs} = \dfrac{\hat{p}_1-\hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}$,  where $\hat{p} = \dfrac{x_1+x_2}{n_1+n_2}$
  Assumption: X₁ − X₂ is approximately normally distributed.

H₀: p₁ − p₂ = (p₁ − p₂)₀, two independent samples:
  Test statistic: $\dfrac{(\hat{P}_1-\hat{P}_2)-(p_1-p_2)_0}{\sqrt{\dfrac{\hat{P}_1(1-\hat{P}_1)}{n_1}+\dfrac{\hat{P}_2(1-\hat{P}_2)}{n_2}}} \sim Z$;  observed value: $z_{Obs} = \dfrac{(\hat{p}_1-\hat{p}_2)-(p_1-p_2)_0}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$
  Assumption: X₁ − X₂ is approximately normally distributed.

H₀: pᵢ − pⱼ = 0, one sample, each trial has 2 of the k possible outcomes of interest:
  Test statistic: $\dfrac{\hat{P}_i-\hat{P}_j}{\sqrt{\dfrac{\hat{P}_i+\hat{P}_j}{n}}} \sim Z$;  observed value: $z_{Obs} = \dfrac{\hat{p}_i-\hat{p}_j}{\sqrt{\dfrac{\hat{p}_i+\hat{p}_j}{n}}} = \dfrac{x_i-x_j}{\sqrt{x_i+x_j}}$
  Assumption: Xᵢ − Xⱼ is approximately normally distributed; this means that $n\hat{p}_i$, $n(1-\hat{p}_i)$, $n\hat{p}_j$ and $n(1-\hat{p}_j)$ can be assumed to be larger than 5, or that V(Xᵢ) and V(Xⱼ) can be assumed to be larger than 5.

H₀: pᵢ − pⱼ = (pᵢ − pⱼ)₀, one sample, each trial has 2 of the k possible outcomes of interest:
  Test statistic: $\dfrac{(\hat{P}_i-\hat{P}_j)-(p_i-p_j)_0}{\sqrt{\dfrac{\hat{P}_i+\hat{P}_j-(p_i-p_j)_0^2}{n}}} \sim Z$;  observed value: $z_{Obs} = \dfrac{(\hat{p}_i-\hat{p}_j)-(p_i-p_j)_0}{\sqrt{\dfrac{\hat{p}_i+\hat{p}_j-(p_i-p_j)_0^2}{n}}}$
  Assumption: as in the preceding row.

H₀: p₁ = p₁⁰, …, p_k = p_k⁰ with $\sum_{i=1}^{k} p_i^0 = 1$ (test of distribution, goodness-of-fit):
  Test statistic: $\sum_{i=1}^{k}\dfrac{(F_i-E_i)^2}{E_i} \sim \chi^2_{k-m-1}$;  observed value: $\chi^2_{Obs} = \sum_{i=1}^{k}\dfrac{(f_i-e_i)^2}{e_i}$,  where $e_i = n\,p_i^0$
  Assumptions: Xᵢ is approximately normally distributed, which means that all eᵢ must be larger than 5. m = number of estimated parameters.

H₀: independence or homogeneity:
  Test statistic: $\sum_{i=1}^{r}\sum_{j=1}^{c}\dfrac{(F_{ij}-E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)}$;  observed value: $\chi^2_{Obs} = \sum_{i=1}^{r}\sum_{j=1}^{c}\dfrac{(f_{ij}-e_{ij})^2}{e_{ij}}$,  where $e_{ij} = \dfrac{n_{i\cdot}\,n_{\cdot j}}{n}$, $n_{\cdot j} = \sum_{i=1}^{r} f_{ij}$ and $n_{i\cdot} = \sum_{j=1}^{c} f_{ij}$
  Assumption: X_{ij} is approximately normally distributed, which means that all e_{ij} must be larger than 5.

H₀: β₁ = β₁⁰ (regression analysis):
  Test statistic: $\dfrac{b_1-\beta_1^0}{S_{b_1}} \sim T_{n-2}$;  observed value: $t_{Obs} = \dfrac{b_1-\beta_1^0}{s_{b_1}}$
  Assumption: the error component ε is approximately normally distributed, E(ε) = 0 and V(ε) = σ_ε², constant.

H₀: β₀ = β₀⁰ (regression analysis):
  Test statistic: $\dfrac{b_0-\beta_0^0}{S_{b_0}} \sim T_{n-2}$;  observed value: $t_{Obs} = \dfrac{b_0-\beta_0^0}{s_{b_0}}$
  Assumptions: as for β₁, and X = 0 should be close to the observed X values.

H₀: μ_{Y|X} = μ_Y⁰ (expected value of Y given X = x):
  Test statistic: $\dfrac{(b_0+b_1 x)-\mu_Y^0}{S_\varepsilon\sqrt{\dfrac{1}{n}+\dfrac{(x-\bar{x})^2}{SS_X}}} \sim T_{n-2}$;  observed value: $t_{Obs} = \dfrac{(b_0+b_1 x)-\mu_Y^0}{s_\varepsilon\sqrt{\dfrac{1}{n}+\dfrac{(x-\bar{x})^2}{SS_X}}}$
  Assumptions: as for β₁, and X = x should be close to the observed X values.

H₀: Y|X = Y⁰ (a single value of Y given X = x):
  Test statistic: $\dfrac{(b_0+b_1 x)-Y^0}{S_\varepsilon\sqrt{1+\dfrac{1}{n}+\dfrac{(x-\bar{x})^2}{SS_X}}} \sim T_{n-2}$;  observed value: $t_{Obs} = \dfrac{(b_0+b_1 x)-Y^0}{s_\varepsilon\sqrt{1+\dfrac{1}{n}+\dfrac{(x-\bar{x})^2}{SS_X}}}$
  Assumptions: as for β₁, and X = x should be close to the observed X values.
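A sketch (not part of the original notes) of the independence test above, on a made-up contingency table of counts:

```python
# Sketch: chi-square test of independence for a 2x3 table (all expected counts > 5).
from scipy.stats import chi2_contingency

table = [[30, 45, 25],
         [20, 35, 45]]
chi2_obs, p, dof, expected = chi2_contingency(table)
print(round(chi2_obs, 2), round(p, 4), dof)   # dof = (r-1)(c-1) = 2
```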

H₀: ρ = 0:
  Test statistic: $R\sqrt{\dfrac{n-2}{1-R^2}} \sim T_{n-2}$;  observed value: $t_{Obs} = r\sqrt{\dfrac{n-2}{1-r^2}}$
  Assumptions: corresponding to the assumptions concerning β₁.

H₀: ρ = ρ₀:
  Test statistic: $\dfrac{R_Z-\rho_Z^0}{\sqrt{\dfrac{1}{n-3}}} \sim Z$;  observed value: $z_{Obs} = \dfrac{r_Z-\rho_Z^0}{\sqrt{\dfrac{1}{n-3}}}$
  Transformations: $r_Z = \big(\ln(1+r)-\ln(1-r)\big)/2$ and $\rho_Z^0 = \big(\ln(1+\rho_0)-\ln(1-\rho_0)\big)/2$.
  Assumptions: corresponding to the assumptions concerning β₁.

H₀: ρ₁ = ρ₂, two independent samples:
  Test statistic: $\dfrac{R_{Z_1}-R_{Z_2}}{\sqrt{\dfrac{1}{n_1-3}+\dfrac{1}{n_2-3}}} \sim Z$;  observed value: $z_{Obs} = \dfrac{r_{Z_1}-r_{Z_2}}{\sqrt{\dfrac{1}{n_1-3}+\dfrac{1}{n_2-3}}}$
  Transformations: $r_{Z_1} = \big(\ln(1+r_1)-\ln(1-r_1)\big)/2$ and $r_{Z_2} = \big(\ln(1+r_2)-\ln(1-r_2)\big)/2$.
  Assumptions: corresponding to the assumptions concerning β₁.

H₀: ρ_{A,C} = ρ_{B,C} (overlapping correlations, where the correlation between A and C is tested against the correlation between B and C, taking the correlation between A and B into consideration):
  Test statistic: $\dfrac{(R_{A,C}-R_{B,C})\sqrt{(n-3)(1+R_{A,B})}}{\sqrt{2\,(1-R_{A,B}^2-R_{A,C}^2-R_{B,C}^2+2R_{A,B}R_{A,C}R_{B,C})}} \sim T_{n-3}$;
  observed value: $t_{Obs} = \dfrac{(r_{A,C}-r_{B,C})\sqrt{(n-3)(1+r_{A,B})}}{\sqrt{2\,(1-r_{A,B}^2-r_{A,C}^2-r_{B,C}^2+2\,r_{A,B}\,r_{A,C}\,r_{B,C})}}$
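A sketch (not part of the original notes) of the Fisher z-transformation test for H₀: ρ = ρ₀, using made-up numbers:

```python
# Sketch: test of H0: rho = 0.40 given an observed r = 0.62 from n = 50 pairs.
from math import log, sqrt
from scipy.stats import norm

r, rho0, n = 0.62, 0.40, 50
rz = (log(1 + r) - log(1 - r)) / 2
rho_z = (log(1 + rho0) - log(1 - rho0)) / 2
z_obs = (rz - rho_z) / sqrt(1 / (n - 3))
p_value = 2 * norm.sf(abs(z_obs))
print(round(z_obs, 2), round(p_value, 3))   # ≈ 2.07, ≈ 0.039
```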
