
Research Methods

Part 2: Statistics

If you have no brain to guess
Then enjoy this utter mess!

1
Frequency Distribution
 Definition: Frequency is the number of times a certain event occurs
 Example: Assume that a pediatrician is interested in the weights of newborn babies at Gandhi Hospital. The frequency graph of the birthweights in grams is given in the next slide
 What is the probability that the birthweight of any baby born falls in the 2440 g class?

2
Frequency distribution

p=0.0677

3
Birthweight < 3000 g
How many of them?
Frequency distribution

28.57%

4
Frequency distribution

We sampled from an unknown population of babies and found that the very first individual sampled had a birthweight of 6000 g.
Is this the same population?

5
Frequency distribution

P(Birthweight=6000) → 0

6
Frequency distribution

We would reject any hypothesis that the unknown population was the same as the one sampled.
We would conclude that, most probably, the unknown population we sampled from is different in mean and possibly in variance.

7
Frequency distribution
We have used the empirical frequency distribution to make certain predictions or to make judgments and decisions. In many cases we will make such predictions not from empirical distributions, but on the basis of theoretical considerations.
 We may feel that the data should be distributed in a certain way. If our observed data do not sufficiently conform to the values expected on the basis of these assumptions, we will have serious doubts about our assumptions.
8
Probability distribution

The assumptions being tested generally lead to a theoretical frequency distribution, known also as a probability distribution.

 Definition: The probability of an event is a numerical value that represents the proportion of times the event is expected to happen when the experiment is repeated under identical conditions.

9
Discrete probability distributions
 Binomial distribution
 Negative binomial distribution
 Multinomial distribution
 Geometric distribution
 Hypergeometric distribution
 Poisson distribution
10
The binomial distribution
 Suppose that n independent experiments, or trials, are performed, where n is a fixed number, and that each experiment results in "success" with probability p and "failure" with probability q = 1 - p.
 The total number of successes, X, is a binomial random variable with parameters n and p.
11
The Binomial Distribution
The probability that X = k, or p(k), can be found in the following way:
 Any particular sequence of k successes occurs with probability p^k (1-p)^{n-k}
 The total number of such sequences is \binom{n}{k}

p(k) = \binom{n}{k} p^k (1-p)^{n-k} = \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}
12
The binomial distribution

The expected value is equal to:

E(X) = n p

and the variance can be obtained from:

Var(X) = n p q
13
The binomial distribution

N=10, p=0.1

N=10, p=0.5

14
Example of binomial distribution
 Tay-Sachs disease is a rare but fatal disease of genetic origin occurring chiefly in infants and children. If a couple are both carriers of Tay-Sachs disease, their child has probability 0.25 of being born with the disease.
 If such a couple has four children, what is the frequency function for the number of children that will have the disease?

15
Solution of the problem
(a) P(X = 0) = \binom{4}{0} 0.25^0 \cdot 0.75^4 = 1 \cdot 1 \cdot 0.3164 ≈ 0.316
(b) P(X = 1) = \binom{4}{1} 0.25^1 \cdot 0.75^3 = 4 \cdot 0.25 \cdot 0.421875 ≈ 0.422
(c) P(X = 2) = \binom{4}{2} 0.25^2 \cdot 0.75^2 = 6 \cdot 0.0625 \cdot 0.5625 ≈ 0.211
(d) P(X = 3) = \binom{4}{3} 0.25^3 \cdot 0.75^1 = 4 \cdot 0.015625 \cdot 0.75 ≈ 0.047
(e) P(X = 4) = \binom{4}{4} 0.25^4 \cdot 0.75^0 = 1 \cdot 0.00391 \cdot 1 ≈ 0.004

16
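These values can be checked numerically. A minimal sketch using scipy.stats; the distribution call is standard, and only the variable names are ours:

```python
from scipy.stats import binom

# Number of children and probability that each child has the disease
n, p = 4, 0.25

# P(X = k) for k = 0, ..., 4; reproduces 0.316, 0.422, 0.211, 0.047, 0.004
for k in range(n + 1):
    print(k, round(binom.pmf(k, n, p), 3))

# Mean and variance: E(X) = np = 1.0, Var(X) = npq = 0.75
print(binom.mean(n, p), binom.var(n, p))
```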
The binomial distribution
[Figure: bar chart of the frequency function: p(0) = 0.316, p(1) = 0.422, p(2) = 0.211, p(3) = 0.047, p(4) = 0.004]

17
The Geometric Distribution
 The geometric distribution is also constructed from independent Bernoulli trials like the binomial, but from an infinite sequence. On each trial, a success occurs with probability p, and X is the total number of trials up to and including the first success. In order that X = k, there must be k - 1 failures followed by a success:

p(k) = (1-p)^{k-1} p
18
The geometric distribution

The expected value is equal to:

E(X) = \frac{1}{p}

and the variance can be obtained from:

Var(X) = \frac{1-p}{p^2}
19
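A quick numerical check with scipy.stats.geom, whose pmf uses the same support k = 1, 2, … as the definition above; the success probability p = 0.25 is just an illustrative choice of ours:

```python
from scipy.stats import geom

p = 0.25  # illustrative success probability (our assumption)

# P(X = 3): two failures followed by the first success
print(geom.pmf(3, p))            # (1 - 0.25)**2 * 0.25 = 0.140625

# E(X) = 1/p = 4.0 and Var(X) = (1 - p)/p**2 = 12.0
print(geom.mean(p), geom.var(p))
```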
Example

20
The hypergeometric distribution
 Suppose that an urn contains n balls, of which r are black and n - r are white.
 Let X denote the number of black balls drawn when taking m balls without replacement. So

P(X = k) = p(k) = \frac{\binom{r}{k}\binom{n-r}{m-k}}{\binom{n}{m}}
21
The hypergeometric distribution

The expected value is equal to:

E(X) = \frac{r m}{n}

and the variance can be obtained from:

Var(X) = \frac{r m (n-r)(n-m)}{n^2 (n-1)}
22
Example
Suppose there are 100 floppy disks and we know that 20 of them are defective. What is the probability of drawing zero to two defective floppies if you select 10 at random?
n = 100, r = 20, m = 10

23
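A minimal check with scipy.stats.hypergeom; note that scipy's argument order (population size, number of marked items, number drawn) differs from the n, r, m notation above:

```python
from scipy.stats import hypergeom

n, r, m = 100, 20, 10   # population, defective disks, disks drawn

# scipy's hypergeom takes (M=population, n=# marked, N=# drawn)
rv = hypergeom(n, r, m)

# P(X = 0), P(X = 1), P(X = 2) and the cumulative P(X <= 2)
print([round(rv.pmf(k), 4) for k in range(3)])
print(round(rv.cdf(2), 4))

# E(X) = r*m/n = 2.0
print(rv.mean())
```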
Example

24
The Poisson distribution
 The Poisson distribution can be derived as the limit of a binomial distribution as the number of trials n approaches infinity and the probability of success on each trial p approaches zero in such a way that np = λ:

p(k) = \frac{\lambda^k e^{-\lambda}}{k!}
25
The Poisson distribution

The expected value is equal to:

E( X )  

and variance can be obtained from:

Var ( X )  
26
The Poisson distribution

27
Example
 Two dice are rolled 100 times, and the
number of double sixes X is counted.
The distribution of X is binomial with
n=100 and p=1/36=0.0278. Since n is
large and p is small, we can
approximate the binomial probabilities
by Poisson probabilities with λ=np=2.78

28
Example

29
Another example
 Suppose that an office receives telephone calls as a Poisson process with λ = 0.5 per minute. The number of calls in a 5-min interval follows a Poisson distribution with parameter ω = 5λ = 2.5. Thus, the probability of no calls in a 5-min interval is

p(k = 0) = e^{-2.5} ≈ 0.082
30
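Both examples can be checked with scipy.stats; a minimal sketch, not part of the original slides:

```python
from scipy.stats import poisson, binom

# Double sixes in 100 rolls: binomial vs. Poisson approximation
n, p = 100, 1 / 36
lam = n * p                      # λ = np ≈ 2.78
for k in range(4):
    print(k, round(binom.pmf(k, n, p), 4), round(poisson.pmf(k, lam), 4))

# Office phone calls: no calls in 5 minutes, ω = 5 * 0.5 = 2.5
print(round(poisson.pmf(0, 2.5), 3))   # e**-2.5 ≈ 0.082
```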
Continuous density functions
For a continuous random variable, the role of the frequency function is taken by a density function f(x), which has the properties:

f(x) ≥ 0 and \int_{-\infty}^{\infty} f(x)\,dx = 1

and

P(a ≤ X ≤ b) = \int_{a}^{b} f(x)\,dx
31
Probability Density Function

32
Examples of Continuous Density Functions

 The uniform density
 The exponential density
 The normal distribution
 Chi-square distribution
 F distribution
 t distribution
33
The uniform density function
 A distribution which has constant probability is called the uniform distribution.
 The probability density function (pdf):

f(x) = \begin{cases} 0 & \text{for } x < a \\ \frac{1}{b-a} & \text{for } a ≤ x ≤ b \\ 0 & \text{for } x > b \end{cases}
34
Uniform Density Function…
A function f(x) is called a probability density function over the range a ≤ x ≤ b if it meets the following requirements:
1) f(x) ≥ 0 for all x between a and b, and
2) The total area under the curve between a and b is 1.0

[Figure: density f(x) with area = 1 between a and b]
35
Uniform Distribution…
Consider the uniform probability distribution (sometimes called the rectangular probability distribution).
It is described by the function:

f(x) = \frac{1}{b-a}, \quad a ≤ x ≤ b

[Figure: constant density between a and b]

area = width × height = (b - a) × \frac{1}{b-a} = 1
36
The uniform density function

F(x) = P(X ≤ x) = \int_{a}^{x} f(t)\,dt = \frac{x-a}{b-a} \quad \text{for } a ≤ x ≤ b
37

Example …
The amount of gasoline sold daily at a service station is uniformly distributed with a minimum of 2,000 liters and a maximum of 5,000 liters.

[Figure: uniform density between 2,000 and 5,000]

What is the probability that the service station will sell at least 4,000 liters?
Algebraically: what is P(X ≥ 4,000)?
P(X ≥ 4,000) = (5,000 - 4,000) × (1/3,000) = 0.3333
38
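The same answer with scipy.stats.uniform; a minimal sketch, where scipy's loc/scale parameterization is the only thing beyond the slide:

```python
from scipy.stats import uniform

# scipy parameterizes uniform by loc (lower end) and scale (width)
sales = uniform(loc=2000, scale=3000)     # U(2000, 5000)

# P(X >= 4000) via the survival function
print(round(sales.sf(4000), 4))           # 0.3333
```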
The exponential density function
 The exponential random variable can be used to describe the lifetime of a machine, an industrial product, or a human being. It can also be used to describe the waiting time of a customer for some service.
 The probability density function (pdf):

f(x) = \frac{1}{\lambda} e^{-x/\lambda}, \quad \text{for } 0 ≤ x < \infty \text{ and } \lambda > 0
39
Exponential Distribution (another form)…
Another important continuous distribution is the exponential distribution, which has this probability density function (here λ is the rate, the inverse of the mean used on the previous slide):

f(x) = \lambda e^{-\lambda x}, \quad x ≥ 0

Note that x ≥ 0. Time (for example) is a non-negative quantity; the exponential distribution is often used for time-related phenomena such as the length of time between phone calls or between parts arriving at an assembly station.
Note also that the mean and standard deviation are equal to each other and to the inverse of the parameter of the distribution (lambda λ).
40
The exponential density function

The expected value is equal to:

E(X) = \int_{0}^{\infty} x \frac{1}{\lambda} e^{-x/\lambda}\,dx = \lambda

and the variance can be obtained from:

Var(X) = \int_{0}^{\infty} (x - \lambda)^2 \frac{1}{\lambda} e^{-x/\lambda}\,dx = \lambda^2
41
Exponential Distribution…

The exponential distribution depends upon the value of λ. Smaller values of λ "flatten" the curve.
(E.g. exponential distributions for λ = 0.5, 1, 2)

42
The exponential density function

  15

43
The exponential density function
P(X ≤ x_0) = 1 - e^{-x_0/\lambda}

and

S(x_0) = P(X > x_0) = 1 - P(X ≤ x_0) = e^{-x_0/\lambda}

is called the survival function.


44
The exponential density function

45
Example
Let X represent the lifetime of a washing machine. Suppose the average lifetime for this type of washing machine is 15 years.
a. What is the probability that this washing machine can be used for less than 6 years?
b. Also, what is the probability that this washing machine can be used for more than 18 years?
46
Example
Reading f(0) ≈ 0.0667 and f(6) ≈ 0.0447 off the density graph and approximating the area under the curve from 0 to 6 by a trapezoid:

P(X ≤ 6) ≈ 0.0447 × 6 + (0.0667 - 0.0447) × 6/2 = 0.3342
47
The exponential density function

P(X ≤ 6) = 1 - e^{-6/15} ≈ 0.3297

P(X > 18) = e^{-18/15} ≈ 0.3012
48
Example

Thus, for this washing machine, there is about a 30% chance that it fails within 6 years and about a 30% chance that it lasts more than 18 years.

49
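The exact values above can be reproduced with scipy.stats.expon; a minimal sketch, using scipy's scale parameter for the mean:

```python
from scipy.stats import expon

# Mean lifetime 15 years; scipy's scale parameter is the mean
lifetime = expon(scale=15)

print(round(lifetime.cdf(6), 4))   # P(X <= 6) = 1 - e**(-6/15)  ≈ 0.3297
print(round(lifetime.sf(18), 4))   # P(X > 18) = e**(-18/15)     ≈ 0.3012
```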
Poisson and exponential ...
Let Y be a Poisson random variable representing the number of occurrences in the unit time interval, with the probability distribution

P(Y = k) = \frac{\mu^k e^{-\mu}}{k!},

where μ is the mean number of occurrences in this time interval. Then, if X represents the time to one occurrence, X has the exponential density function with mean

E(X) = \frac{1}{\mu} = \lambda
50
Example
 The average number of car accidents
on the highway in two days is 8. What
is the probability of no accident for
more than 3 days?

51
Example
 The average number of car accidents on the highway in one day is 4. Thus, the mean time to one occurrence is 0.25 (day).
 Let Y be the Poisson random variable with mean 4 representing the number of car accidents in one day, while X is the exponential random variable with mean 0.25 representing the time to one accident occurrence. 52
Example

P(no accident for more than 3 days) =
P(the time to the first occurrence is larger than 3)

P(X > 3) = e^{-3/0.25} = e^{-12} ≈ 0
53
The Normal distribution
 The normal distribution plays a central role in
probability and statistics.
 This distribution is also called the Gaussian
distribution after Carl Friedrich Gauss, who
proposed it as a model for measurement
errors (in 1809).
 The normal distribution has been used as a
model for such diverse phenomena as a
person’s height, the distribution of IQ scores,
and the velocity of gas molecules.
54
The Normal Distribution…
The frequency curve of the normal probability distribution looks like the following graph. As can be seen, it has the shape of a bell. It is also symmetrical around the mean: the left side of the curve is a mirror image of the right side.

55
The normal distribution
 The density function of the normal distribution depends on two parameters: μ, called the mean, and σ, named the standard deviation of the normal density (where -∞ < μ < ∞ and σ > 0):

f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/2\sigma^2}
56
The Normal Distribution…

Important things to note:
 The normal distribution is fully defined by two parameters: its standard deviation and mean
 The normal distribution is bell shaped and symmetrical about the mean
 Unlike the range of the uniform distribution (a ≤ x ≤ b), normal distributions range from minus infinity to plus infinity
57
The normal distribution

[Figure: two normal curves, μ = 0 and μ = 4]
Same variance but different means

58
The normal distribution

[Figure: normal curves with σ = 1, σ = 2, σ = 3]
Same mean but different variances


59
The normal distribution
 The curve is symmetrical around the mean. Therefore the mean, median, and mode of the normal distribution are the same.
 The following percentages of individuals in a normal distribution lie within the indicated limits:
μ ± 1σ contains 68.27% of individuals
μ ± 2σ contains 95.45% of individuals
μ ± 3σ contains 99.73% of individuals
 These are depicted in the following graph
60
The normal distribution
[Figure: normal curve with bands covering 68.27%, 95.45%, and 99.73% of the area]

61
The normal distribution

[Figure: cumulative normal distribution function and normal probability density function, with areas 50.00% below the mean, 34.13% between the mean and 1σ, 13.59% between 1σ and 2σ, 2.14% between 2σ and 3σ, and 2.28% beyond 2σ]

62
Standard normal distribution
 The special case for which =0 and
=1 is called the standard normal
density.

 Its cumulative density function is


denoted by  and its density by .

63
Standard normal distribution
 Probabilities for general normal random variables can be evaluated in terms of probabilities for standard normal variables.
 To demonstrate it we will use the following property:

If X ~ N(\mu, \sigma^2) and Y = aX + b, then Y ~ N(a\mu + b, a^2\sigma^2)
64
Standard normal distribution
 Suppose that X~N(,2) and we wish to
find P(x0<X<x1) for some numbers x0
and x1. Consider the random variable:
X  X 
Z  
  
where a=1/ and b=-/. We see that
 
Z ~ N (a  b, a  )  N (  , ( )  )  N (0,1)
2 2
 
1 2

2

65
Standard Normal Distribution…

A normal distribution whose mean is zero and standard deviation is one is called the standard normal distribution.

[Figure: standard normal curve with mean 0 and standard deviation 1]

As we shall see shortly, any normal distribution can be converted to a standard normal distribution with simple algebra. This makes calculations much easier.

66
Standard normal distribution
 Therefore

F_X(x) = P(X ≤ x) = P\!\left(\frac{X-\mu}{\sigma} ≤ \frac{x-\mu}{\sigma}\right) = P\!\left(Z ≤ \frac{x-\mu}{\sigma}\right) = \Phi\!\left(\frac{x-\mu}{\sigma}\right)

We thus have

P(x_0 < X < x_1) = F_X(x_1) - F_X(x_0) = \Phi\!\left(\frac{x_1-\mu}{\sigma}\right) - \Phi\!\left(\frac{x_0-\mu}{\sigma}\right)
67
Normal Distribution…
The normal distribution is described by two parameters: its mean μ and its standard deviation σ. Increasing the mean shifts the curve to the right…

68
Normal Distribution…
The normal distribution is described by two parameters: its mean μ and its standard deviation σ. Increasing the standard deviation "flattens" the curve…

69
Calculating Normal Probabilities…

Example: The time required to build a computer is normally distributed with a mean of 50 minutes and a standard deviation of 10 minutes.

What is the probability that a computer is assembled in a time between 45 and 60 minutes?
Algebraically speaking, what is P(45 < X < 60)?
70
Calculating Normal Probabilities…
…mean of 50 minutes and a
P(45 < X < 60) ? standard deviation of 10 minutes…

71
Calculating Normal Probabilities…

P(–0.5 < Z < 1) looks like this: the probability is the area under the curve between –0.5 and 1.

We will add up the two sections:
P(–0.5 < Z < 0) and P(0 < Z < 1)

[Figure: standard normal curve with the area between –0.5 and 1 shaded]

72
Calculating Normal Probabilities…

How to use the table of the Z distribution:
This table gives probabilities P(0 < Z < z).
First column = integer + first decimal
Top row = second decimal place

P(0 < Z < 0.5) = 0.1915
P(0 < Z < 1) = 0.3413

P(–0.5 < Z < 1) = 0.1915 + 0.3413 = 0.5328

The probability that the time is between 45 and 60 minutes = 0.5328
73
Using the Normal Table (Table 3)…
What is P(Z < 1.52)?

P(Z < 0) = 0.5
P(0 < Z < 1.52) = 0.4357

P(Z < 1.52) = 0.5 + P(0 < Z < 1.52) = 0.5 + 0.4357 = 0.9357
74
Example 2: Return To Investment
The return on investment is normally
distributed with a mean of 10% and a standard
deviation of 5%. What is the probability of losing
money?
We want to determine P(X < 0). Thus,

P(X < 0) = P\!\left(\frac{X-\mu}{\sigma} < \frac{0-10}{5}\right) = P(Z < -2) = 0.5 - P(0 < Z < 2) = 0.5 - 0.4772 = 0.0228
8.75
Finding Values of ZA…
Often we’re asked to find some value of Z for
a given probability, i.e. given an area (A)
under the curve, what is the corresponding
value of z (zA) on the horizontal axis that gives
us this area?
That is:

P(Z > zA) = A

76
Finding Values of Z…
What value of z corresponds to an area under the curve of 2.5%? That is, what is z.025?

Area = 0.50  Area = 0.025
Area = 0.50 – 0.025 = 0.4750

If you do a "reverse look-up" in the z table for 0.4750, you will get the corresponding zA = 1.96.
Since P(z > 1.96) = 0.025, we say: z.025 = 1.96
77
Finding Values of Z…
 Other Z values are
 Z.05 = 1.645
 Z.01 = 2.33

We will show you shortly how to use the t-tables with infinite degrees of freedom to find a bunch of these standard values for Zα.
Note that the t and z values nearly coincide once the degrees of freedom exceed about 30.

78
Using the values of Z

Because z.025 = 1.96 and -z.025 = -1.96, it follows that we can state

P(-1.96 < Z < 1.96) = 0.95

The old empirical rule stated that about 95% are within ± 2σ:
P(-2 < Z < 2) ≈ 0.95
From now on we will use the 1.96 number for this statement unless we are just talking in general terms about how much of a population is within ± 2σ.

Similarly,
P(-1.645 < Z < 1.645) = 0.90
79
Example
 Scores on a certain standardized test, IQ scores, are approximately normally distributed with mean μ = 100 and standard deviation σ = 15.
 If an individual is selected at random, what is the probability that his score X satisfies 120 < X < 130?

80
Example
 We can calculate this probability by using the standard normal distribution as follows:

P(120 < X < 130) = P\!\left(\frac{120-100}{15} < \frac{X-100}{15} < \frac{130-100}{15}\right)
= P(1.33 < Z < 2) = \Phi(2) - \Phi(1.33)
= 0.9772 - 0.9082 = 0.069

81
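Both normal-distribution examples above (the IQ scores and the earlier investment-return slide) can be verified with scipy.stats.norm; a minimal sketch:

```python
from scipy.stats import norm

# IQ example: X ~ N(100, 15**2)
iq = norm(loc=100, scale=15)
print(round(iq.cdf(130) - iq.cdf(120), 3))  # ≈ 0.068; the slides get 0.069
                                            # after rounding z to 1.33

# Investment example: X ~ N(10, 5**2); probability of losing money
print(round(norm(loc=10, scale=5).cdf(0), 4))   # P(X < 0) = 0.0228
```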
Symmetry and kurtosis
 In many cases an observed frequency
distribution departs obviously from
normality; thus statistics that measure the
nature and amount of departure are
useful.
 We will focus on two types of departures
from normality: skewness and kurtosis.

82
Skewness
 Skewness, which is another name for asymmetry, means that one tail of the curve is drawn out more than the other.
 In such curves the mean and the median do not coincide.
 Curves are called skewed to the right or left, depending upon whether the right or left tail is drawn out.
83
Skewness

84
Kurtosis
 If a symmetrical distribution is considered to
have a center, two shoulders, and two tails, the
kurtosis describes the proportions in the center
and in the tails with relation to those in the
shoulders.
 There are three types of kurtosis: leptokurtic,
mesokurtic and platykurtic
 We shall define the leptokurtic and platykurtic
curves.
85
Kurtosis
 A leptokurtic curve has more items near
the center and at the tails, with fewer
items in the shoulders relative to a
normal distribution with the same mean
and variance.

86
Kurtosis
 A platykurtic curve has fewer items at
the center and at the tails than the
normal curve, but has more items in the
shoulders. A bimodal distribution is an
extreme case of platykurtic distribution.

87
Skewness and kurtosis
 The sample statistics for measuring skewness and kurtosis are called g1 and g2; they estimate the population parameters G1 and G2.

g_1 = \frac{n \sum (X_i - \bar{X})^3}{(n-1)(n-2)\,s^3}

g_2 = \frac{\frac{n(n+1)}{n-1} \sum (X_i - \bar{X})^4 - 3\left(\sum (X_i - \bar{X})^2\right)^2}{(n-2)(n-3)\,s^4}
88
Skewness and kurtosis
 In a normal frequency distribution both
G1 and G2 are zero.
 A negative g1 indicates skewness to the
left, a positive g1 skewness to the right.
 A negative g2 indicates platykurtosis,
while a positive g2 shows leptokurtosis.
 The absolute magnitudes of g1 and g2
do not mean so much.
89
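scipy's bias-corrected skewness and kurtosis correspond to the g1 and g2 formulas above; a minimal sketch on made-up data (the exponential sample is our illustrative choice):

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1000)   # right-skewed sample (illustrative)

# bias=False applies the (n-1)(n-2)... corrections used in g1 and g2
print(round(skew(x, bias=False), 3))                   # positive: skewed right
print(round(kurtosis(x, fisher=True, bias=False), 3))  # positive: leptokurtic
```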
Quantile measures of symmetry and kurtosis

 Denoting the ith quartile as Qi, we can define the Bowley coefficient of skewness:

skewness = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}

 It is a measure that may range from -1, for a distribution with extreme left skewness; through 0, for a symmetrical distribution; to 1, for a distribution with extreme right skewness.

90
Quantile measures of kurtosis
 A kurtosis measure based on octiles Oi (12.5%, 25%, 37.5%, and so on) was proposed in 1988 by Moors:

kurtosis = \frac{(O_7 - O_5) + (O_3 - O_1)}{Q_3 - Q_1}

 It ranges from zero, for extreme platykurtosis; through 1.233, for the normal distribution; to infinity, for extreme leptokurtosis.
91
Graphic test for normality

 Quantile-quantile (Q-Q) plots are useful for comparing distribution functions in general. In Q-Q plots, the quantiles of one distribution are plotted against those of another.

92
Graphic test for normality

93
Graphic test for normality

94
Chi-Square Distribution and Chi-square Tests

Chi-square Goodness of Fit Test

Multinomial Experiments

95
Multinomial Experiment
A multinomial experiment is a probability experiment
consisting of a fixed number of trials in which there are
more than two possible outcomes for each independent
trial. (Unlike the binomial experiment in which there
were only two possible outcomes.)
Example:
A researcher claims that the distribution of favorite pizza
toppings among teenagers is as shown below.
Each outcome is classified into categories, and the probability for each possible outcome is fixed.

Topping | Frequency, f
Cheese | 41%
Pepperoni | 25%
Sausage | 15%
Mushrooms | 10%
Onions | 9%
Uses of Chi-Square Goodness-of-Fit Test
A Chi-Square Goodness-of-Fit Test is used to test whether a
frequency distribution fits an expected distribution.
To calculate the test statistic for the chi-square goodness-of-fit test,
the observed frequencies and the expected frequencies are used.
The observed frequency O of a category is the frequency for the
category observed in the sample data.
The expected frequency E of a category is the calculated
frequency for the category. Expected frequencies are obtained
assuming the specified (or hypothesized) distribution. The expected
frequency for the ith category is
Ei = npi
where n is the number of trials (the sample size) and pi is the
assumed probability of the ith category.
Observed and Expected Frequencies
Example:

200 teenagers are randomly selected and asked what their favorite
pizza topping is. The results are shown below.
Find the observed frequencies and the expected frequencies.

Topping | Results (n = 200) | % of teenagers | Observed Frequency | Expected Frequency
Cheese | 78 | 41% | 78 | 200(0.41) = 82
Pepperoni | 52 | 25% | 52 | 200(0.25) = 50
Sausage | 30 | 15% | 30 | 200(0.15) = 30
Mushrooms | 25 | 10% | 25 | 200(0.10) = 20
Onions | 15 | 9% | 15 | 200(0.09) = 18
Chi-Square Goodness-of-Fit Test
For the chi-square goodness-of-fit test to be used, the following must be
true.
1. The observed frequencies must be obtained by using a random
sample.
2. Each expected frequency must be greater than or equal to 5.

The Chi-Square Goodness-of-Fit Test
If the conditions listed above are satisfied, then the sampling distribution for the goodness-of-fit test is approximated by a chi-square distribution with k – 1 degrees of freedom, where k is the number of categories. The test statistic for the chi-square goodness-of-fit test is

\chi^2 = \sum \frac{(O - E)^2}{E}

The test is always a right-tailed test. Here O represents the observed frequency of each category and E represents the expected frequency of each category.
Chi-Square Goodness-of-Fit Test
Performing a Chi-Square Goodness-of-Fit Test
In Words | In Symbols
1. Identify the claim. State the null and alternative hypotheses. | State H0 and Ha.
2. Specify the level of significance. | Identify α.
3. Identify the degrees of freedom. | d.f. = k – 1
4. Determine the critical value. | Use the table in the Appendix.
5. Determine the rejection region. |

Continued.
100
Chi-Square Goodness-of-Fit Test
Performing a Chi-Square Goodness-of-Fit Test
In Words | In Symbols
6. Calculate the test statistic. | \chi^2 = \sum \frac{(O - E)^2}{E}
7. Make a decision to reject or fail to reject the null hypothesis. | If χ2 is in the rejection region, reject H0. Otherwise, fail to reject H0.
8. Interpret the decision in the context of the original claim. |

101
Chi-Square Goodness-of-Fit Test
Example:
A researcher claims that the distribution of favorite pizza
toppings among teenagers is as shown below. 200
randomly selected teenagers are surveyed.
Topping | Frequency, f
Cheese | 41%
Pepperoni | 25%
Sausage | 15%
Mushrooms | 10%
Onions | 9%

Using α = 0.01, and the observed and expected values previously calculated, test the surveyor's claim using a chi-square goodness-of-fit test.
Continued.
Chi-Square Goodness-of-Fit Test
Example continued:
H0: The distribution of pizza toppings is 41% cheese, 25% pepperoni, 15% sausage, 10% mushrooms, and 9% onions. (Claim)
Ha: The distribution of pizza toppings differs from the claimed or expected distribution.

Because there are 5 categories, the chi-square distribution has k – 1 = 5 – 1 = 4 degrees of freedom.

With d.f. = 4 and α = 0.01, the critical value is χ20 = 13.277.
Continued.
Chi-Square Goodness-of-Fit Test
Example continued:
Rejection region: α = 0.01, χ20 = 13.277

Topping | Observed Frequency | Expected Frequency
Cheese | 78 | 82
Pepperoni | 52 | 50
Sausage | 30 | 30
Mushrooms | 25 | 20
Onions | 15 | 18

\chi^2 = \sum \frac{(O-E)^2}{E} = \frac{(78-82)^2}{82} + \frac{(52-50)^2}{50} + \frac{(30-30)^2}{30} + \frac{(25-20)^2}{20} + \frac{(15-18)^2}{18} ≈ 2.025
Conclusion: Fail to reject H0.
There is not enough evidence at the 1% level to reject the
surveyor’s claim.
104
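The same test in one call with scipy.stats.chisquare; a minimal sketch using the observed and expected counts from the slides:

```python
from scipy.stats import chisquare

observed = [78, 52, 30, 25, 15]
expected = [82, 50, 30, 20, 18]          # 200 * (0.41, 0.25, 0.15, 0.10, 0.09)

stat, pvalue = chisquare(f_obs=observed, f_exp=expected)
print(round(stat, 3), round(pvalue, 3))  # ≈ 2.025, p ≈ 0.731: fail to reject H0
```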
Chi-Square Test of Independence

Contingency Tables

105
Contingency Tables
An r  c contingency table shows the observed
frequencies for two variables. The observed frequencies
are arranged in r rows and c columns. The intersection of
a row and a column is called a cell.

The following contingency table shows a random sample of


321 fatally injured passenger vehicle drivers by age and
gender.

Age
Gender 16 – 20 21 – 30 31 – 40 41 – 50 51 – 60 61 and older
Male 32 51 52 43 28 10
Female 13 22 33 21 10 6
Expected Frequency

Assuming the two variables are independent, you can use the contingency table to find the expected frequency for each cell.

Finding the Expected Frequency for Contingency Table Cells
The expected frequency for a cell Er,c in a contingency table is

E_{r,c} = \frac{(\text{Sum of row } r) \times (\text{Sum of column } c)}{\text{Sample size}}
Expected Frequency
Example:
Find the expected frequency for each “Male” cell in the
contingency table for the sample of 321 fatally injured drivers.
Assume that the variables, age and gender, are independent.

Gender | 16–20 | 21–30 | 31–40 | 41–50 | 51–60 | 61 and older | Total
Male | 32 | 51 | 52 | 43 | 28 | 10 | 216
Female | 13 | 22 | 33 | 21 | 10 | 6 | 105
Total | 45 | 73 | 85 | 64 | 38 | 16 | 321

Continued.
108
Expected Frequency
Example continued:
Gender | 16–20 | 21–30 | 31–40 | 41–50 | 51–60 | 61 and older | Total
Male | 32 | 51 | 52 | 43 | 28 | 10 | 216
Female | 13 | 22 | 33 | 21 | 10 | 6 | 105
Total | 45 | 73 | 85 | 64 | 38 | 16 | 321

E_{r,c} = \frac{(\text{Sum of row } r) \times (\text{Sum of column } c)}{\text{Sample size}}

E_{1,1} = 216 × 45 / 321 ≈ 30.28    E_{1,2} = 216 × 73 / 321 ≈ 49.12    E_{1,3} = 216 × 85 / 321 ≈ 57.20
E_{1,4} = 216 × 64 / 321 ≈ 43.07    E_{1,5} = 216 × 38 / 321 ≈ 25.57    E_{1,6} = 216 × 16 / 321 ≈ 10.77
109
Chi-Square Independence Test

A chi-square independence test is used to test the independence of two variables. Using a chi-square test, you can determine whether the occurrence of one variable affects the probability of the occurrence of the other variable.
For the chi-square independence test to be used, the following must be true.
1. The observed frequencies must be obtained by using a random sample.
2. Each expected frequency must be greater than or equal to 5.
Chi-Square Independence Test
The Chi-Square Test of Independence
If the conditions listed are satisfied, then the sampling
distribution for the chi-square independence test is
approximated by a chi-square distribution with
(r – 1)(c – 1)
degrees of freedom, where r and c are the number of rows
and columns, respectively, of a contingency table. The test
statistic for the chi-square independence test is
2 (O  E )2 The test is always a right-
χ � tailed test.
E
where O represents the observed frequencies and E
represents the expected frequencies.
Chi-Square Independence Test
Performing a Chi-Square Independence Test
In Words | In Symbols
1. Identify the claim. State the null and alternative hypotheses. | State H0 and Ha.
2. Specify the level of significance. | Identify α.
3. Identify the degrees of freedom. | d.f. = (r – 1)(c – 1)
4. Determine the critical value. | Use the table in the Appendix.
5. Determine the rejection region. |

Continued.
112
Chi-Square Independence Test
Performing a Chi-Square Independence Test

In Words | In Symbols
6. Calculate the test statistic. | \chi^2 = \sum \frac{(O - E)^2}{E}
7. Make a decision to reject or fail to reject the null hypothesis. | If χ2 is in the rejection region, reject H0. Otherwise, fail to reject H0.
8. Interpret the decision in the context of the original claim. |

113
Chi-Square Independence Test
Example:
The following contingency table shows a random sample of 321 fatally injured passenger vehicle drivers by age and gender. The expected frequencies are displayed in parentheses. At α = 0.05, can you conclude that the drivers' ages are related to gender in such accidents?

Gender | 16–20 | 21–30 | 31–40 | 41–50 | 51–60 | 61 and older | Total
Male | 32 (30.28) | 51 (49.12) | 52 (57.20) | 43 (43.07) | 28 (25.57) | 10 (10.77) | 216
Female | 13 (14.72) | 22 (23.88) | 33 (27.80) | 21 (20.93) | 10 (12.43) | 6 (5.23) | 105
Total | 45 | 73 | 85 | 64 | 38 | 16 | 321
Chi-Square Independence Test
Example continued:

Because each expected frequency is at least 5 and the drivers were randomly selected, the chi-square independence test can be used to test whether the variables are independent.

H0: The drivers' ages are independent of gender.
Ha: The drivers' ages are dependent on gender. (Claim)

d.f. = (r – 1)(c – 1) = (2 – 1)(6 – 1) = (1)(5) = 5

With d.f. = 5 and α = 0.05, the critical value is χ20 = 11.071.


Continued.
Chi-Square Independence Test
Example continued: O E O–E (O – E)2 (O  E )2
Rejection
E
32 30.28 1.72 2.9584 0.0977
region
51 49.12 1.88 3.5344 0.072
  0.05 52 57.20 5.2 27.04 0.4727
43 43.07 0.07 0.0049 0.0001
X2 28 25.57 2.43 5.9049 0.2309
10 10.77 0.77 0.5929 0.0551
χ20 = 11.071
13 14.72 1.72 2.9584 0.201
(O  E )2 22 23.88 1.88 3.5344 0.148
2
χ �  2.84 33 27.80 5.2 27.04 0.9727
E
21 20.93 0.07 0.0049 0.0002
Decision: Fail to 10 12.43 2.43 5.9049 0.4751
reject H0. 6 5.23 0.77 0.5929 0.1134

There is not enough evidence at the 5% level to conclude


that age is dependent on gender in such accidents.
116
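The whole computation, including the expected frequencies, can be reproduced with scipy.stats.chi2_contingency; a minimal sketch:

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[32, 51, 52, 43, 28, 10],
                  [13, 22, 33, 21, 10,  6]])

# correction=False gives the plain Pearson chi-square used in the slides
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(round(chi2, 2), dof, round(p, 2))   # ≈ 2.84, 5 d.f., p ≈ 0.72
```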
117
Student’s t distribution
Let Z and V be two independent random variables with:
1. Z having a standard normal distribution, and
2. V having a χ2 distribution with ν degrees of freedom.

Next we shall find the formula for t and then its density function.
8.118
Student t distribution
Let Z ~ N (0,1) V ~  k2 Z , V independent
Z
Define : t 
V /k

Application : X 1 ,..., X n ~ iid N  ,  2 
X   X  
Z  n   ~ N (0,1)

/ n   
(n  1) S 2
V ~  2
n 1 Z , V Independent
 2

 X 
n  

    X   X 
t  n  
 


(n  1) S  S  S n
2

2
(n  1)
119
Student’s t distribution

g(t) = K\left(\frac{t^2}{\nu} + 1\right)^{-\frac{\nu+1}{2}}

where

K = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\!\left(\frac{\nu}{2}\right)}

and Γ denotes the gamma function; for a positive integer k, Γ(k + 1) = k × (k – 1) × (k – 2) × ⋯ × 2 × 1 = k!
8.120
Relationship between Normal & t Distributions

t distribution standard normal distribution

121
Section 3
COMPARING TWO VARIANCES

THE F DISTRIBUTION

122
The F Distribution
V ~  k21 W ~  k22 V , W independent
V k1
F
W k2
Application ({ X i } and {Y j } independent) :

X 1 ,... X n1 ~ iid N 1 ,  12  
Y1 ,...Yn2 ~ iid N  2 ,  22 
( n1  1) S12 ( n2  1) S 22
V  ~  2
n1 1 W  ~  2
n2 1
 12  22
V , W independent
( n1  1) S12
( n1  1)
V k1 12
S12  12
F   2
W k2 ( n2  1) S 22
S 2  22
n2  1
22

123
F-Distribution
Let s12 and s22 represent the sample variances of two
different populations. If both populations are normal and the
population variances σ12 and σ22 are equal, then the sampling
distribution of
s12
F  2
s2 is called an F-distribution.
The frequency function is given by

h(f) = \frac{\Gamma\!\left(\frac{k_1+k_2}{2}\right)}{\Gamma\!\left(\frac{k_1}{2}\right)\Gamma\!\left(\frac{k_2}{2}\right)} \left(\frac{k_1}{k_2}\right)^{k_1/2} f^{\frac{k_1}{2}-1} \left(1 + \frac{k_1 f}{k_2}\right)^{-\frac{k_1+k_2}{2}}, \quad f > 0
There are several properties of this distribution.
Continued.
F-Distribution
1. The F-distribution is a family of curves each of which is
determined by two types of degrees of freedom: the degrees
of freedom corresponding to the variance in the numerator,
denoted d.f.N, and the degrees of freedom corresponding to
the variance in the denominator, denoted d.f.D.

2. F-distributions are positively skewed.


3. The total area under each curve of an F-distribution is equal
to 1.

125
F-Distribution

4. F-values are always greater than or equal to 0.
5. For all F-distributions, the mean value of F is approximately equal to 1.

[Figure: F density curves for d.f.N = 1, d.f.D = 8; d.f.N = 8, d.f.D = 26; d.f.N = 16, d.f.D = 7; d.f.N = 3, d.f.D = 11]
The F-Distribution
Finding Critical Values for the F-Distribution
1. Specify the level of significance α.
2. Determine the degrees of freedom for the numerator, d.f.N.
3. Determine the degrees of freedom for the denominator, d.f.D.
4. Use Table 7 in Appendix B to find the critical value. If the hypothesis test is
a. one-tailed, use the α F-table.
b. two-tailed, use the ½α F-table.

127
Critical Values for the F-Distribution
Example:
Find the critical F-value for a right-tailed test when α = 0.05, d.f.N = 5 and d.f.D = 28.

Appendix B: Table 7: F-Distribution (α = 0.05)
d.f.D \ d.f.N | 1 | 2 | 3 | 4 | 5 | 6
1 | 161.4 | 199.5 | 215.7 | 224.6 | 230.2 | 234.0
2 | 18.51 | 19.00 | 19.16 | 19.25 | 19.30 | 19.33
27 | 4.21 | 3.35 | 2.96 | 2.73 | 2.57 | 2.46
28 | 4.20 | 3.34 | 2.95 | 2.71 | 2.56 | 2.45
29 | 4.18 | 3.33 | 2.93 | 2.70 | 2.55 | 2.43

The critical value is F0 = 2.56.


Critical Values for the F-Distribution
Example:
Find the critical F-value for a two-tailed test when α = 0.10, d.f.N = 4 and d.f.D = 6. Use ½α = ½(0.10) = 0.05.

Appendix B: Table 7: F-Distribution (α = 0.05)
d.f.D \ d.f.N | 1 | 2 | 3 | 4 | 5 | 6
1 | 161.4 | 199.5 | 215.7 | 224.6 | 230.2 | 234.0
2 | 18.51 | 19.00 | 19.16 | 19.25 | 19.30 | 19.33
3 | 10.13 | 9.55 | 9.28 | 9.12 | 9.01 | 8.94
4 | 7.71 | 6.94 | 6.59 | 6.39 | 6.26 | 6.16
5 | 6.61 | 5.79 | 5.41 | 5.19 | 5.05 | 4.95
6 | 5.99 | 5.14 | 4.76 | 4.53 | 4.39 | 4.28
7 | 5.59 | 4.74 | 4.35 | 4.12 | 3.97 | 3.87

The critical value is F0 = 4.53.
Two-Sample F-Test for Variances
Two-Sample F-Test for Variances
A two-sample F-test is used to compare two population variances σ12 and σ22 when a sample is randomly selected from each population. The populations must be independent and normally distributed. The test statistic is

F = \frac{s_1^2}{s_2^2}

where s12 and s22 represent the sample variances with s12 ≥ s22. The degrees of freedom for the numerator is d.f.N = n1 – 1 and the degrees of freedom for the denominator is d.f.D = n2 – 1, where n1 is the size of the sample having variance s12 and n2 is the size of the sample having variance s22.
Two-Sample F-Test for Variances
Using a Two-Sample F-Test to Compare σ12 and σ22
In Words | In Symbols
1. Identify the claim. State the null and alternative hypotheses. | State H0 and Ha.
2. Specify the level of significance. | Identify α.
3. Identify the degrees of freedom. | d.f.N = n1 – 1, d.f.D = n2 – 1
4. Determine the critical value. | Use Table 7 in the Appendix.
Continued.
131
Two-Sample F-Test for Variances
Using a Two-Sample F-Test to Compare σ12 and σ22
In Words | In Symbols
5. Determine the rejection region. |
6. Calculate the test statistic. | F = s12/s22
7. Make a decision to reject or fail to reject the null hypothesis. | If F is in the rejection region, reject H0. Otherwise, fail to reject H0.
8. Interpret the decision in the context of the original claim. |

132
Two-Sample F-Test
Example:
A travel agency’s marketing brochure indicates that
the standard deviations of hotel room rates for two cities are
the same. A random sample of 13 hotel room rates in one
city has a standard deviation of $27.50 and a random
sample of 16 hotel room rates in the other city has a
standard deviation of $29.75. Can you reject the agency’s
claim at  = 0.01?
Because 29.75 > 27.50, s12 = 885.06 and s22 = 756.25.

H0: σ12 = σ22 (Claim)
Ha: σ12 ≠ σ22


Continued.
133
Two-Sample F-Test
Example continued:
This is a two-tailed test with ½α = ½(0.01) = 0.005, d.f.N = 15 and d.f.D = 12.
The critical value is F0 = 4.72.

The test statistic is

F = \frac{s_1^2}{s_2^2} = \frac{885.06}{756.25} ≈ 1.17

Decision: Fail to reject H0.
There is not enough evidence at the 1% level to reject the claim that the standard deviations of the hotel room rates for the two cities are the same.
134
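scipy has no canned two-sample F-test, so a minimal sketch computes the statistic and critical value directly with scipy.stats.f; the numbers are from the hotel-rate example:

```python
from scipy.stats import f

s1_sq, n1 = 29.75**2, 16   # larger sample variance goes in the numerator
s2_sq, n2 = 27.50**2, 13

F = s1_sq / s2_sq                              # ≈ 1.17
F_crit = f.ppf(1 - 0.01 / 2, n1 - 1, n2 - 1)   # two-tailed, α = 0.01 → ≈ 4.72
print(round(F, 2), round(F_crit, 2), F > F_crit)   # fail to reject H0
```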
Analysis of Variance

More often than not, not all values observed are the same. This is because there are various sources of error which create differences between observations gathered by a researcher. The analysis of variance is the breakdown of such variability into its component parts.
The technique of the analysis of variance is credited to the famous British statistician Sir Ronald A. Fisher.
135
One-Way ANOVA
One-way analysis of variance is a hypothesis-testing
technique that is used to compare means from three or
more populations. Analysis of variance is usually
abbreviated ANOVA.

In a one-way ANOVA test, the following must be true.

1. Each sample must be randomly selected from a normal,


or approximately normal, population.
2. The samples must be independent of each other.
3. Each population must have the same variance.
One-Way ANOVA

1. The variance between samples MSB measures the


differences related to the treatment given to each
sample and is sometimes called the mean square
between.
2. The variance within samples MSW measures the
differences related to entries within the same sample.
This variance, sometimes called the mean square
within, is usually due to sampling error.
One-Way ANOVA
One-Way Analysis of Variance Test
If the conditions listed are satisfied, then the sampling distribution for the test is approximated by the F-distribution. The test statistic is

F = \frac{MS_B}{MS_W}

The degrees of freedom for the F-test are d.f.N = k – 1 and d.f.D = N – k, where k is the number of samples and N is the sum of the sample sizes.
Test Statistic for a One-Way ANOVA
Finding the Test Statistic for a One-Way ANOVA Test
In Words | In Symbols
1. Find the mean and variance of each sample. | \bar{x} = \frac{\sum x}{n}, \quad s^2 = \frac{\sum (x - \bar{x})^2}{n-1}
2. Find the mean of all entries in all samples (the grand mean). | \bar{\bar{x}} = \frac{\sum x}{N}
3. Find the sum of squares between the samples. | SS_B = \sum n_i (\bar{x}_i - \bar{\bar{x}})^2
4. Find the sum of squares within the samples. | SS_W = \sum (n_i - 1) s_i^2
139
Test Statistic for a One-Way ANOVA
Finding the Test Statistic for a One-Way ANOVA Test
In Words | In Symbols
5. Find the variance between the samples. | MS_B = \frac{SS_B}{k-1} = \frac{SS_B}{d.f.N}
6. Find the variance within the samples. | MS_W = \frac{SS_W}{N-k} = \frac{SS_W}{d.f.D}
7. Find the test statistic. | F = \frac{MS_B}{MS_W}
140
Performing a One-Way ANOVA Test

In Words | In Symbols
1. Identify the claim. State the null and alternative hypotheses. | State H0 and Ha.
2. Specify the level of significance. | Identify α.
3. Identify the degrees of freedom. | d.f.N = k – 1, d.f.D = N – k
4. Determine the critical value. | Use Table 7 in the Appendix.
Continued.
141
Performing a One-Way ANOVA Test

In Words | In Symbols
5. Determine the rejection region. |
6. Calculate the test statistic. | F = \frac{MS_B}{MS_W}
7. Make a decision to reject or fail to reject the null hypothesis. | If F is in the rejection region, reject H0. Otherwise, fail to reject H0.
8. Interpret the decision in the context of the original claim. |

142
ANOVA Summary Table
A table is a convenient way to summarize the results in a
one-way ANOVA test.

Variation | Sum of squares | Degrees of freedom | Mean squares | F
Between | SSB | d.f.N | MS_B = SS_B / d.f.N | MS_B / MS_W
Within | SSW | d.f.D | MS_W = SS_W / d.f.D |
143
Performing a One-Way ANOVA Test
Example:
The following table shows the salaries of randomly
selected individuals from four large metropolitan areas. At 
= 0.05, can you conclude that the mean salary is different in
at least one of the areas?

City | Salaries
Los Angeles | 27,800  28,000  25,500  29,150  30,295
Tokyo | 30,000  33,900  29,750  25,000  34,055
London | 32,000  35,800  28,000  38,900  27,245
Paris | 30,000  40,000  35,000  33,000  29,805
Continued.
Performing a One-Way ANOVA Test
Example continued:
H0: μ1 = μ2 = μ3 = μ4
Ha: At least one mean is different from the others. (Claim)

Because there are k = 4 samples, d.f.N = k – 1 = 4 – 1 = 3.
The sum of the sample sizes is N = n1 + n2 + n3 + n4 = 5 + 5 + 5 + 5 = 20.
d.f.D = N – k = 20 – 4 = 16

Using α = 0.05, d.f.N = 3, and d.f.D = 16, the critical value is F0 = 3.24.
Continued.
Performing a One-Way ANOVA Test
Example continued:
To find the test statistic, the following must be calculated.

\bar{\bar{x}} = \frac{\sum x}{N} = \frac{140745 + 152705 + 161945 + 167805}{20} = 31160

MS_B = \frac{SS_B}{d.f.N} = \frac{\sum n_i (\bar{x}_i - \bar{\bar{x}})^2}{k - 1}
= \frac{5(28149 - 31160)^2 + 5(30541 - 31160)^2 + 5(32389 - 31160)^2 + 5(33561 - 31160)^2}{4 - 1}
≈ 27874206.67
Continued.
Performing a One-Way ANOVA Test
Example continued:
SSW �(ni  1)si2
MSW 
d.f.D  N  k
(5  1)(3192128.94)  (5  1)(13813030.08)
� 
20  4
(5  1)(24975855.83)  (5  1)(17658605.02)
20  4
 14909904.97 Test statistic Critical
value
MS B 27874206.67 �1.870
F   1.870 < 3.24.
MSW 14909904.34
Fail to reject H0.
There is not enough evidence at the 5% level to conclude that the
mean salary is different in at least one of the areas.
147
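The whole test collapses to one call with scipy.stats.f_oneway; a minimal sketch on the salary data:

```python
from scipy.stats import f_oneway

la     = [27800, 28000, 25500, 29150, 30295]
tokyo  = [30000, 33900, 29750, 25000, 34055]
london = [32000, 35800, 28000, 38900, 27245]
paris  = [30000, 40000, 35000, 33000, 29805]

F, p = f_oneway(la, tokyo, london, paris)
print(round(F, 3), round(p, 3))   # F ≈ 1.870; p > 0.05, so fail to reject H0
```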
The Analysis of Variance Table
Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | F Value | Critical value
Between Cities | 3 | 83622620.01 | 27874206.67 | 1.870 | 3.24
Within Cities | 16 | 238558479.52 | 14909904.97 | |
Total | 19 | | | |

148
End of Probability

Distributions and
Statistical Analysis

149
CHAPTER

DIFFERENT TYPES OF
HYPOTHESIS TESTING
CONTENTS OF THE CHAPTER
• Examples on Hypothesis Testing
• One-tailed and Two-tailed Hypotheses
• Test of One Mean
• Test of two means which are Independent
• Test of two means when samples are Dependent
• Test for a Single Proportion
• Test for Two Proportions
• Cross Tabulation Hypothesis (Chi-square test)

13 | 151
Hypotheses Testing

 Oversimplified or incorrect assumptions must


be subjected to more formal hypothesis
testing

13 | 152
Some Important
Hypotheses
 Bankers assumed high-income earners are more
profitable than low-income earners
 Clients who carefully balance their checkbooks every
month and minimize fees due to overdrafts are
unprofitable checking account customers
 Old clients are more likely to diminish CD balances
by large amounts compared to younger clients
 This was non-intuitive because conventional wisdom
suggested that older clients have a larger portfolio of
assets and seek less risky investments

13 | 153
Data Analysis
 Descriptive
 Computing measures of central tendency and dispersion, as well as constructing one-way tables
 Inferential
 Data analysis aimed at testing specific
hypotheses is usually called inferential
analysis

13 | 154
Null and Alternative
Hypotheses
H0: Null Hypothesis
Ha: Alternative Hypothesis

 Hypotheses always pertain to population parameters or characteristics rather than to sample characteristics. It is the population, not the sample, that we want to make an inference about from limited data.

13 | 155
Steps in Conducting a
Hypothesis Test
 Step 1. Set up H0 and Ha
 Step 2. Identify the nature of the sampling
distribution curve and specify the appropriate
test statistic
 Step 3. Determine whether the hypothesis
test is one-tailed or two-tailed

13 | 156
Steps in Conducting a
Hypothesis Test (Cont’d)
 Step 4. Taking into account the specified significance level, determine the critical value (two critical values for a two-tailed test) for the test statistic from the appropriate statistical table
 Step 5. State the decision rule for rejecting H0
 Step 6. Compute the value for the test statistic from the sample data
 Step 7. Using the decision rule specified in step 5, either reject H0 or fail to reject H0

13 | 157
Launching a Product Line
Into a
New Market Area
 Martha, product manager for a line of apparel, wishes
to introduce the product line into a new market area
 Survey of a random sample of 400 households in that
market showed a mean income per household of
$30,000. Martha strongly believes the product line
will be adequately profitable only in markets where
the mean household income is greater than $29,000.
Should Martha introduce the product line into the new
market?

13 | 158
Martha’s Criterion for
Decision Making
 To reach a final decision, Martha has to make a general inference (about the population) from the sample data
 Criterion: mean income across all households in the market area under consideration
 If the mean population household income is greater than $29,000, then Martha should introduce the product line into the new market

13 | 159
Martha’s Hypothesis
 Martha’s decision making is equivalent to
either accepting or rejecting the hypothesis:
 The population mean household income in the
new market area is greater than $29,000

13 | 160
One-Tailed Hypothesis Test
 The term one-tailed signifies that all of the z-values that would cause Martha to reject H0 are in just one tail of the sampling distribution
 μ is the population mean
H0: μ ≤ $29,000
Ha: μ > $29,000

13 | 161
Type One and Type Two Errors
 Type I error occurs if the null hypothesis is
rejected when it is true
 Type II error occurs if the null hypothesis is
not rejected when it is false

13 | 162
Significance Level

 α is the significance level: the upper-bound probability of a Type I error
 1 - α is the confidence level: the complement of the significance level

13 | 163
Summary of Errors Involved in
Hypothesis Testing
Inference Based on Sample Data | Real State: H0 is True | Real State: H0 is False
H0 is True (accept H0) | Correct decision: Confidence level = 1 - α | Type II error: P(Type II error) = β
H0 is False (reject H0) | Type I error: Significance level = α* | Correct decision: Power = 1 - β

*The term α represents the maximum probability of committing a Type I error

13 | 164
Level of Risk

 Two firms considering introducing a new product that


radically differs from their current product line
 Firm ABC
 Well-established customer base, distinct reputation for its
existing product line
 Firm XYZ
 No loyal clientele, no distinct image for its present
products
 Which of these two firms should be more cautious in
making a decision to introduce the new product?

13 | 165
Scenario - Firms ABC & XYZ

 Firm ABC
 ABC should be more cautious
 Firm XYZ
 XYZ should be less cautious

13 | 166
Exhibit 1 Identifying the Critical Sample Mean
Value – Sampling Distribution

Sample mean (x̄) values greater than $29,000--that is, x̄-values on the right-hand side of the sampling distribution centered on μ = $29,000--suggest that H0 may be false. More important, the farther to the right x̄ is, the stronger is the evidence against H0.
13 | 167
Martha’s Decision Rule for Rejecting
the Null Hypothesis

 Reject H0 if the sample mean x̄ exceeds the critical value x̄c

13 | 168
Criterion Value
Every sample mean x̄ has a corresponding standard normal deviate. The expression for z:

z = \frac{\bar{x} - \mu}{s_{\bar{x}}}

equivalently, x̄ = μ + z s_x̄. Substituting x̄c for x̄ and zc for z:

x̄c = μ + zc s_x̄, where zc is the critical standard normal deviate
13 | 169
Computing the Criterion Value
The standard deviation for the sample of 400 households is $8,000. The standard error of the mean (s_x̄) is given by

s_x̄ = S/√n = 8,000/√400 = $400

The critical mean household income x̄c is found through the following two steps:
1. Determine the critical z-value, zc. For α = .05, from Appendix 1, zc = 1.645.
2. Substitute the values of zc, s_x̄, and μ (under the assumption that H0 is "just" true, i.e., μ = $29,000):
x̄c = μ + zc s_x̄ = 29,000 + 1.645 × 400 = $29,658
13 | 170
Martha’s Decision Rule

 If the sample mean household income is


greater than $29,658, reject the null
hypothesis and introduce the product line into
the new market area.

13 | 171
Test Statistic

The value of the test statistic is simply the z-value corresponding to x̄ = $30,000:

z = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{30,000 - 29,000}{400} = 2.5

13 | 172
Exhibit 2 Critical Value for Rejecting
the Null Hypothesis

13 | 173
P - Value – Actual Significance Level

 The probability of obtaining an x̄ value as high as $30,000 or more when μ is only $29,000 = .0062
 This value is sometimes called the actual significance level, or the p-value
 The actual significance level of .0062 in this case means the odds are less than 62 out of 10,000 that the sample mean income of $30,000 would have occurred entirely due to chance (when the population mean income is $29,000 or less)

13 | 174
T-test

Conduct the t-test when the sample is small.
Let the sample size n = 25, x̄ = $30,000, s = $8,000.
From the t-table in Appendix 3, tc = 1.71 for α = .05 and d.f. = 24.

Decision rule: "Reject H0 if t ≥ 1.71."

13 | 175
T-test (Cont’d)

The value of t from the sample data:

s_x̄ = 8,000/√25 = $1,600

t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{30,000 - 29,000}{1,600} = 0.625

The computed value of t is less than 1.71, so H0 cannot be rejected.
Martha should not introduce the product line into the new market area.

13 | 176
Two-Tailed Hypothesis Test

 A two-tailed test is one in which values of the test statistic leading to rejection of the null hypothesis fall in both tails of the sampling distribution curve. The following hypothesis is a two-tailed hypothesis:
H0: μ = $29,000
Ha: μ ≠ $29,000

13 | 177
Test of Two Means

 A health service agency has designed a public


service campaign to promote physical fitness and the
importance of regular exercise. Since the campaign
is a major one, the agency wants to make sure of its
potential effectiveness before running it on a national
scale
 To conduct a controlled test of the campaign’s
effectiveness, the agency needs two similar cities
 The agency identified two similar cities
 city 1 will serve as the test city
 city 2 will serve as a control city

13 | 178
Test of Two Means (Cont’d)

 Random survey was conducted to measure the


average time per day a typical adult in each city
spent on some form of exercise
 300 adults in city 1,
 200 adults in city 2
 Results of the survey :
 average was 30 minutes per day (with a standard
deviation of 22 minutes) in city 1
 Average was 35 minutes per day (with a standard
deviation of 25 minutes) in city 2
 Question
 From these results, can the agency conclude
confidently that the two cities are well matched for the
controlled test?
13 | 179
Basic Statistics and Hypotheses

City 1: n1 = 300, x̄1 = 30, s1 = 22
City 2: n2 = 200, x̄2 = 35, s2 = 25

The hypotheses are
H0: μ1 = μ2, or μ1 - μ2 = 0
Ha: μ1 ≠ μ2, or μ1 - μ2 ≠ 0 13 | 180


Test Statistic

The test statistic is the z-statistic, given by

z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}

n1 and n2 are greater than 30, so the z-statistic can be used as the test statistic.
13 | 181
Decision – Two-Tailed Test

 For two-tailed tests:
 Identify two critical values of z, one for each tail of the sampling distribution
 The probability corresponding to each tail is .025, since α = .05
 From the normal table, the z-value for α/2 = .025 is 1.96
 Decision rule: "Reject H0 if z ≤ -1.96 or if z ≥ 1.96."
13 | 182
Computing Z-value – Two-Tailed Test

Computing the value of z from the survey results, under the customary assumption that the null hypothesis is true (i.e., μ1 - μ2 = 0):

z = \frac{(30 - 35) - 0}{\sqrt{22^2/300 + 25^2/200}} = -2.29

Since z ≤ -1.96, we should reject H0.
13 | 183
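The same two-sample z computation in code; a minimal sketch with scipy.stats.norm:

```python
from math import sqrt
from scipy.stats import norm

n1, x1, s1 = 300, 30, 22
n2, x2, s2 = 200, 35, 25

se = sqrt(s1**2 / n1 + s2**2 / n2)
z = (x1 - x2) / se                 # ≈ -2.30; the slides truncate to -2.29
p_value = 2 * norm.sf(abs(z))      # two-tailed p ≈ 0.022
print(round(z, 2), round(p_value, 3))
```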
Exhibit 5 Hypothesis Test Related to Mean
Exercising in Two Cities

13 | 184
The t- test for Independent Samples

Test statistic:

t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s^* \sqrt{1/n_1 + 1/n_2}}

with d.f. = n1 + n2 - 2. In this expression, s* is the pooled standard deviation, given by

s^* = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
13 | 185
The t- test for Independent Samples - Two Cities

n1 = 20, x̄1 = 30, s1 = 22
n2 = 10, x̄2 = 35, s2 = 25

The degrees of freedom for the t-statistic are d.f. = 28.

The critical value of t with 28 d.f. for a tail probability of .025 is 2.05.

Decision rule: "Reject H0 if t ≤ -2.05 or if t ≥ 2.05."

The pooled standard deviation is s* = √529 (approximately) = 23 13 | 186
The t - test for Independent Samples

The test statistic is

t = \frac{(30 - 35) - 0}{23\sqrt{1/20 + 1/10}} = -0.56

Since t is neither less than -2.05 nor greater than 2.05, we cannot reject H0.

The sample evidence is not strong enough to conclude that the two cities differ in terms of levels of exercising activity of their residents.
13 | 187
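The same pooled t-test from the summary statistics; a minimal sketch with scipy.stats.ttest_ind_from_stats:

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics from the two-city example
result = ttest_ind_from_stats(mean1=30, std1=22, nobs1=20,
                              mean2=35, std2=25, nobs2=10,
                              equal_var=True)      # pooled-variance t-test
print(round(result.statistic, 2), round(result.pvalue, 2))  # ≈ -0.56, p ≈ 0.58
```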
National Insurance Company Study – Perceived Service
Quality Differences Between Males and Females

 Test of Two Means Using any software


program
 On the 10-point scale,
males gave a mean rating of approximately
7.87
females gave a mean rating of approximately
7.83.

13 | 188
National Insurance Company Study – Perceived Service
Quality Differences Between Males and Females

Group Statistics

gender | N | Mean | Std. Deviation | Std. Error Mean
male | 137 | 7.87 | 2.26 | .19
female | 126 | 7.83 | 2.31 | .21
13 | 189
Test of Two Means When Samples Are Dependent

 The need to check for significant differences


between two mean values when the samples
are not independent

13 | 190
Test of Two Means When Samples Are Dependent
(Cont’d)

 A retail chain ran a special promotion in a


representative sample of 10 of its stores to
boost sales
 Weekly sales per store before and after the
introduction of the special promotion are
shown
 Did the special promotion lead to a significant
increase in sales?

13 | 191
Sales Per Store Before and After a
Promotional Campaign
Sales per Store (In Thousands)

Store Number (i) | Before Promotion (x_bi) | After Promotion (x_ai) | Change in Sales x_di = x_ai - x_bi
1 | 250 | 260 | 10
2 | 235 | 240 | 5
3 | 150 | 151 | 1
4 | 145 | 140 | -5
5 | 120 | 124 | 4
6 | 98 | 100 | 2
7 | 75 | 70 | -5
8 | 85 | 95 | 10
9 | 180 | 200 | 20
10 | 212 | 220 | 8
Total | | | 50

13 | 192
Test of Two Means When Samples Are Dependent
(Cont’d)

One-tailed hypothesis test:
H0: μd ≤ 0; Ha: μd > 0.

The sample estimate of μd is x̄d, given by

\bar{x}_d = \frac{\sum_{i=1}^{n} x_{di}}{n}

where n is the sample size.

x̄d = 50/10 = 5
13 | 193
Test of Two Means When Samples Are Dependent
(Cont’d)

Test statistic:

t = \frac{\bar{x}_d - 0}{s/\sqrt{n}} = \frac{5}{7.53/\sqrt{10}} = 2.10

13 | 194
Test of Two Means When Samples Are Dependent
(Cont’d)

Standard deviation (s) = 7.53,  = 0.05,


tc for 9 d.f = 1.83 from the Appendix 3

Decision rule: “Reject H0 if t  1.83.”

Test Statistic, t  1.83, we reject H0 and


conclude that the mean change in sales per
store was significantly greater than zero.

The special promotion was indeed effective.


13 | 195
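The same paired test in one call; a minimal sketch with scipy.stats.ttest_rel on the before/after data:

```python
from scipy.stats import ttest_rel

before = [250, 235, 150, 145, 120, 98, 75, 85, 180, 212]
after  = [260, 240, 151, 140, 124, 100, 70, 95, 200, 220]

# One-sided paired t-test: H0: mean difference <= 0
result = ttest_rel(after, before, alternative='greater')
print(round(result.statistic, 2), round(result.pvalue, 3))  # ≈ 2.10, p ≈ 0.033
```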
Exhibit 6 Hypothesis Test Related to Change in
Weekly Sales Per Store

13 | 196
Test for a Single Proportion

 Ms. Jones wants to substantially increase her firm's advertising budget.
 The firm sells a variety of personal computer accessories
 Random sample: 20/100 (20%) know the brand name
 Concern: the true awareness rate for the brand name across all personal computer owners is less than .3
 Should Ms. Jones increase the advertising budget on the basis of the survey results?

13 | 197
Test for a Single Proportion (Cont’d)

 We need to test the population proportion of personal computer owners who are aware of the brand:

H0: p ≥ 0.3
Ha: p < 0.3
(p is the symbol for the population proportion)

13 | 198
Test for a Single Proportion (Cont’d)

The test statistic:

z = \frac{\bar{p} - p}{\sqrt{p(1-p)/n}}

where p̄ is the sample proportion.

From the normal table, zc = -1.645 for α = .05.

Decision rule: "Reject H0 if z ≤ -1.645."
13 | 199
Test for a Single Proportion

Since -2.174  -1.645, we reject H0;


The sample awareness rate of 0.2 is too low to
support the hypothesis that the population awareness
rate is 0.3 or more.

The actual significance level (p-value) corresponding


to
z = -2.174 is approximately 0.015 (from Appendix 1).

Level of significance implies that the odds are lower


than 15 in 1,000 that the sample awareness rate of
0.2 would have occurred entirely by chance (that is,
when the population awareness rate is 0.3 or higher).13 | 200
Exhibit 13.4 Hypothesis Test Related to
Proportion of Personal Computer Owners

13 | 201
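The one-proportion z-test in code; a minimal sketch with scipy.stats.norm:

```python
from math import sqrt
from scipy.stats import norm

n, p_hat, p0 = 100, 0.20, 0.30

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # ≈ -2.18
p_value = norm.cdf(z)                        # lower-tailed, ≈ 0.015
print(round(z, 2), round(p_value, 3))
```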
Test of Two Proportions: Choosing Between
Commercial X & Commercial Y For a New Product

 Tom, advertising manager for a frozen-foods company, is in the process of deciding between two TV commercials (X and Y) for a new frozen food to be introduced
 Commercial X
• Runs for 20 seconds
• Random sample: 20 % awareness out of 200
respondents
 Commercial Y
• Runs for 30 seconds
• Random sample: 25 % awareness out of 200
respondents

13 | 202
Test of Two Proportions (Cont’d)

 Question
 Can Tom conclude that commercial Y will be
more effective in the total market for the new
product?

13 | 203
Criterion for Decision Making

 To reach a final decision, Tom has to make a


general inference (about the population) from
the sample data
 Criterion: relative degrees of awareness likely
to be created by the 2 commercials in the
population of all adult consumers
 Tom should conclude that commercial Y is
more effective than commercial X only if the
anticipated population awareness rate for
commercial Y is greater than that for X

13 | 204
Hypothesis

 Tom’s decision-making is equivalent to either accepting or rejecting the hypothesis:
 The potential awareness rate that commercial
Y can generate among the population of
consumers is greater than that which
commercial X can generate

13 | 205
Null and Alternative Hypotheses

                     Commercial Y    Commercial X
Sample sizes:        n1 = 200        n2 = 200
Sample proportions:  p1 = 0.25       p2 = 0.20

The hypotheses are:

H0: p1 ≤ p2, or p1 − p2 ≤ 0

Ha: p1 > p2, or p1 − p2 > 0
13 | 206
Test of Two Proportions – Sample Standard Error

      (p̂1 − p̂2) − (p1 − p2)
z = --------------------------
           σp1 − p2

σp1 − p2 is estimated by the sample standard error formula:

sp1 − p2 = √( p̄ q̄ (1/n1 + 1/n2) )

where

p̄ = (n1p̂1 + n2p̂2) / (n1 + n2)

q̄ = 1 − p̄
13 | 207
Test of Two Proportions

For α = 0.05, the critical value of z (from
Appendix 1) is 1.645.

Decision rule: “Reject H0 if z ≥ 1.645.”

First compute p̄ and q̄, then sp1 − p2 and z:

p̄ = [200(0.25) + 200(0.2)] / (200 + 200) = 0.225

q̄ = 1 − 0.225 = 0.775

13 | 208
Test of Two Proportions

sp1 - p2 = (0.225)(0.775) (1/200 + 1/200)

=0.042

(0.25 - 0.20) - (0)


z= ---------------------- = 1.19
0.042

Since z  1.645, we cannot reject H0.

The sample evidence is not strong enough to suggest


that commercial Y will be more effective than
commercial X. 13 | 209
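
The pooled two-proportion test can be checked with the short Python sketch
below (scipy assumed available; the function name is ours). With the
unrounded standard error it gives a z of about 1.20; the slide rounds the
standard error to 0.042 and reports 1.19.

from math import sqrt
from scipy import stats

def two_proportion_z(p1, n1, p2, n2):
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)             # pooled proportion
    se = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))  # pooled standard error
    z = (p1 - p2) / se                                  # hypothesized diff = 0
    return z, stats.norm.sf(z)                          # upper-tailed p-value

z, p_value = two_proportion_z(0.25, 200, 0.20, 200)
print(round(z, 2), round(p_value, 2))  # ~1.20, p ~ 0.12 > 0.05: cannot reject H0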
Hypothesis Test Related to Awareness
Generated by Two Commercials

13 | 210
Cross-Tabulations: Chi-square Contingency Test

 Technique used for determining whether there


is a statistically significant relationship
between two categorical (nominal or ordinal)
variables

13 | 211
Telecommunications Company

 Marketing manager of a telecommunication company


is reviewing the results of a study of potential users of
a new cell phone
 Random sample of 200 respondents
• A cross-tabulation of data on whether target consumers
would buy the phone (Yes or No) and whether the cell
phone had Bluetooth wireless technology (Yes or No)
 Question
 Can the marketing manager infer that an association
exists between Bluetooth technology and buying the
cell phone?

13 | 212
Table 3 Two-Way Tabulation of Bluetooth Technology and
Whether Customers Would Buy Cell Phone

13 | 213
Cross Tabulations - Hypotheses

H0: There is no association between wireless


technology and buying the cell phone (the
two variables are independent of each
other).

Ha: There is some association between the


Bluetooth feature and buying the cell phone
(the two variables are not independent of
each other).

13 | 214
Conducting the Test

• Test involves comparing the actual, or


observed, cell frequencies in the cross-
tabulation with a corresponding set of
expected cell frequencies (Eij)

13 | 215
Expected Values

Eij = (ni × nj) / n

Where ni and nj are the marginal frequencies,


that is, the total number of sample units in
category i of the row variable and category j
of the column variable, respectively

13 | 216
Computing Expected Values

The expected frequency for the first-row, first-column
cell is given by

E11 = (100 × 100) / 200 = 50

13 | 217
Table 4 Observed and Expected Cell Frequencies

13 | 218
Chi-square Test Statistic

χ² = Σi Σj (Oij − Eij)² / Eij = 72.00

where Oij and Eij are the observed and expected cell frequencies,
and r and c are the number of rows and columns,
respectively, in the contingency table. The number
of degrees of freedom associated with this
chi-square statistic is given by the product (r − 1)(c − 1).
13 | 219
Chi-square Test Statistic in a Contingency Test

For d.f. = 1, assuming α = 0.05, from Appendix 2,
the critical chi-square value (χ²c) = 3.84.

Decision rule: “Reject H0 if χ² ≥ 3.84.”

Computed χ² = 72.00
Since the computed chi-square value is greater
than the critical value of 3.84, reject H0.
The apparent relationship between “Bluetooth
technology” and “would buy the cellular phone”
revealed by the sample data is unlikely to have
occurred because of chance.
13 | 220
Interpretation

 The actual significance level associated with


a chi-square value of 72 is less than 0.001
(from Appendix 2). Thus, the chances of
getting a Chi-square value as high as 72
when there is no relationship between
Bluetooth technology and purchase of cell
phones are less than 1 in 1,000.
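
The slides never print the observed two-way table, but observed counts of
[[80, 20], [20, 80]] are the only ones consistent with n = 200, marginal
totals of 100, E11 = 50, and chi-square = 72; the Python sketch below uses
those reconstructed (hence partly assumed) counts with scipy.

import numpy as np
from scipy import stats

observed = np.array([[80, 20],   # assumed counts: buy yes/no ...
                     [20, 80]])  # ... by Bluetooth yes/no

chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(chi2, dof, p)  # 72.0 with 1 d.f.; p is far below 0.001
print(expected)      # every expected cell frequency equals 50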

13 | 221
Cross-Tabulation Using SPSS
for National Insurance Company

 One crucial issue in the customer survey of


National Insurance Company was how a
customer's education was associated with
whether or not she or he would recommend
National to a friend.

13 | 222
Need to Conduct Chi-square
Test to Reach a Conclusion

 The hypotheses are


 H0:There is no association between
educational level and willingness to
recommend National to a friend (the two
variables are independent of each other)
 Ha: There is some association between
educational level and willingness to
recommend National to a friend (the two
variables are not independent of each other)

13 | 223
Association Between Education and Customer’s
Willingness to recommend National to a Friend

For two-way tabulation:


1. Select ANALYZE on the SPSS menu.
2. Click on DESCRIPTIVE STATISTICS.
3. Select CROSS-TABS.
4. Move the “highest level of schooling” variable to the ROW(S) box.
5. Move the “rec” variable to the COLUMN(S) box.
6. Click on CELLS.
7. Select OBSERVED and ROW PERCENTAGES.
8. Click CONTINUE.
9. Click OK.

13 | 224
National Insurance Company Study -
Chi-Square Test
For Chi-Square Assessment:
1. Select ANALYZE.
2. Click on DESCRIPTIVE STATISTICS.
3. Select CROSS-TABS.
4. Move the variable “highest level of schooling” to the ROW(S) box.
5. Move “rec” to the COLUMN(S) box.
6. Click on STATISTICS.
7. Select CHI-SQUARE, CONTINGENCY COEFFICIENT, and
CRAMER’S V.
8. Click on CELLS.
9. Select OBSERVED and EXPECTED FREQUENCIES.
10. Click CONTINUE.
11. Click OK.

13 | 225
National Insurance Company Study –
P-Value Significance
 The actual significance level (p-value) = 0.019
 The chances of getting a chi-square value as
high as 10.007 when there is no relationship
between education and recommendation are
about 19 in 1,000
 The apparent relationship between education
and recommendation revealed by the sample
data is unlikely to have occurred because of
chance
 Jill and Tom can safely reject the null hypothesis

13 | 226
Precautions in Interpreting
Cross Tabulation Results
 Two-way tables cannot show conclusive
evidence of a causal relationship
 Watch out for small cell sizes
 The risk of drawing erroneous inferences
increases when more than two variables are
involved but only two are tabulated

13 | 227
Exercise: Two-way Table Based on a Survey of
200 Hospital Patients:

                                 Patients who jog   Patients who do not jog

Patients with heart disease           20                     40

Patients without heart disease        80                     60

Total                                100                    100

Is there a causal relationship between jogging
and heart disease?
13 | 228
NON-PARAMETRIC STATISTICAL
METHODS

Slide
229
Nonparametric Methods
 Sign Test
 Wilcoxon Signed-Rank Test
 Mann-Whitney-Wilcoxon Test
 Kruskal-Wallis Test
 Rank Correlation

Slide
230
Nonparametric Methods

 Most of the statistical methods referred to as


parametric require the use of interval- or ratio-scaled
data.
 Nonparametric methods are often the only way to
analyze nominal or ordinal data and draw statistical
conclusions.
 Nonparametric methods require no assumptions
about the population probability distributions.
 Nonparametric methods are often called distribution-
free methods.

Slide
231
Nonparametric Methods

 In general, for a statistical method to be classified as


nonparametric, it must satisfy at least one of the
following conditions.
• The method can be used with nominal data.
• The method can be used with ordinal data.
• The method can be used with interval or ratio data
when no assumption can be made about the
population probability distribution.

Slide
232
Sign Test

 A common application of the sign test involves using


a sample of n potential customers to identify a
preference for one of two brands of a product.
 The objective is to determine whether there is a
difference in preference between the two items being
compared.
 To record the preference data, we use a plus sign if
the individual prefers one brand and a minus sign if
the individual prefers the other brand.
 Because the data are recorded as plus and minus
signs, this test is called the sign test.

Slide
233
Example: Peanut Butter Taste Test

 Sign Test: Large-Sample Case


As part of a market research study, a sample of 36
consumers was asked to taste two brands of peanut
butter and indicate a preference. Do the data shown
below indicate a significant difference in the consumer
preferences for the two brands?
18 preferred Hoppy Peanut Butter (+ sign recorded)
12 preferred Pokey Peanut Butter (− sign recorded)
6 had no preference
The analysis is based on a sample size of 18 + 12 = 30.

Slide
234
Example: Peanut Butter Taste Test

 Hypotheses
H0: No preference for one brand over the other exists
Ha: A preference for one brand over the other exists
 Sampling Distribution
Sampling distribution of the number of “+” values
if there is no brand preference: approximately
normal with μ = 0.5(30) = 15 and
σ = √(30 × 0.5 × 0.5) = 2.74
Slide
235
Example: Peanut Butter Taste Test

 Rejection Rule
Using 0.05 level of significance,
Reject H0 if z < -1.96 or z > 1.96
 Test Statistic
z = (18 - 15)/2.74 = 3/2.74 = 1.095
 Conclusion
Do not reject H0. There is insufficient evidence in
the sample to conclude that a difference in preference
exists for the two brands of peanut butter.
Fewer than 10 or more than 20 individuals would
have to have a preference for a particular brand in
order for us to reject H0.
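
A minimal Python sketch of this large-sample sign test (scipy assumed
available; the function name is ours):

from math import sqrt
from scipy import stats

def sign_test_z(plus, n):
    mu = 0.5 * n            # mean under H0 (no preference)
    sigma = sqrt(0.25 * n)  # binomial standard deviation with p = q = 0.5
    z = (plus - mu) / sigma
    return z, 2 * stats.norm.sf(abs(z))  # two-tailed p-value

z, p = sign_test_z(18, 30)
print(round(z, 3), round(p, 2))  # 1.095, p ~ 0.27: do not reject H0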

Slide
236
Wilcoxon Signed-Rank Test

 This test is the nonparametric alternative to the
parametric matched-sample test presented earlier.
 The methodology of the parametric matched-sample
analysis requires:
• interval data, and
• the assumption that the population of differences
between the pairs of observations is normally
distributed.
 If the assumption of normally distributed differences
is not appropriate, the Wilcoxon signed-rank test can
be used.

Slide
237
Example: Express Deliveries

 Wilcoxon Signed-Rank Test


A firm has decided to select one of two express
delivery services to provide next-day deliveries to the
district offices.
To test the delivery times of the two services, the firm
sends two reports to a sample of 10 district offices, with
one report carried by one service and the other report
carried by the second service.
Do the data (delivery times in hours) on the next
slide indicate a difference in the two services?

Slide
238
Example: Express Deliveries

District Office Overnight NiteFlite


Seattle 32 hrs. 25 hrs.
Los Angeles 30 24
Boston 19 15
Cleveland 16 15
New York 15 13
Houston 18 15
Atlanta 14 15
St. Louis 10 8
Milwaukee 7 9
Denver 16 11

Slide
239
Wilcoxon Signed-Rank Test

 Preliminary Steps of the Test


• Compute the differences between the paired
observations.
• Discard any differences of zero.
• Rank the absolute value of the differences from
lowest to highest. Tied differences are assigned
the average ranking of their positions.
• Give the ranks the sign of the original difference in
the data.
• Sum the signed ranks.
. . . next we will determine whether the sum is
significantly different from zero.

Slide
240
Example: Express Deliveries

District Office Differ. |Diff.| Rank Sign. Rank


Seattle 7 10 +10
Los Angeles 6 9 +9
Boston 4 7 +7
Cleveland 1 1.5 +1.5
New York 2 4 +4
Houston 3 6 +6
Atlanta -1 1.5 -1.5
St. Louis 2 4 +4
Milwaukee -2 4 -4
Denver 5 8 +8
+44

Slide
241
Example: Express Deliveries

 Hypotheses
H0: The delivery times of the two services are the
same; neither offers faster service than the other.
Ha: Delivery times differ between the two services;
recommend the one with the smaller times.
 Sampling Distribution
Sampling distribution of T if the populations are
identical: approximately normal with μT = 0 and
σT = √( n(n + 1)(2n + 1)/6 ) = 19.62
Slide
242
Example: Express Deliveries

 Rejection Rule
Using 0.05 level of significance,
Reject H0 if z < -1.96 or z > 1.96
 Test Statistic
z = (T − μT)/σT = (44 − 0)/19.62 = 2.24
 Conclusion
Reject H0. There is sufficient evidence in the
sample to conclude that a difference exists in the
delivery times provided by the two services.
Recommend using the NiteFlite service.
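
The signed-rank computation can be reproduced with the Python sketch below,
which follows the slides' large-sample procedure (sum of signed ranks T with
σT = √(n(n + 1)(2n + 1)/6)); scipy's rankdata handles the tied ranks.

from math import sqrt
from scipy.stats import rankdata, norm

overnight = [32, 30, 19, 16, 15, 18, 14, 10, 7, 16]
niteflite = [25, 24, 15, 15, 13, 15, 15, 8, 9, 11]

d = [a - b for a, b in zip(overnight, niteflite) if a != b]  # drop zero diffs
ranks = rankdata([abs(x) for x in d])                  # ties get average ranks
T = sum(r if x > 0 else -r for x, r in zip(d, ranks))  # sum of signed ranks
n = len(d)
sigma_T = sqrt(n * (n + 1) * (2 * n + 1) / 6)
z = T / sigma_T
print(T, round(sigma_T, 2), round(z, 2))  # 44.0, 19.62, 2.24
print(2 * norm.sf(abs(z)))                # two-tailed p ~ 0.025 < 0.05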

Slide
243
Mann-Whitney-Wilcoxon Test

 This test is another nonparametric method for


determining whether there is a difference between
two populations.
 This test, unlike the Wilcoxon signed-rank test, is not
based on a matched sample.
 This test does not require interval data or the
assumption that both populations are normally
distributed.
 The only requirement is that the measurement scale
for the data is at least ordinal.

Slide
244
Mann-Whitney-Wilcoxon Test

 Instead of testing for the difference between the


means of two populations, this method tests to
determine whether the two populations are identical.
 The hypotheses are:
H0: The two populations are identical
Ha: The two populations are not identical

Slide
245
Example: Westin House Freezers

 Mann-Whitney-Wilcoxon Test (Large-Sample Case)


Manufacturer labels indicate the annual energy
cost associated with operating home appliances such
as freezers.
The energy costs for a sample of 10 Westin
freezers and a sample of 10 Brand-X Freezers are
shown on the next slide. Do the data indicate, using
α = 0.05, that a difference exists in the annual energy
costs associated with the two brands of freezers?

Slide
246
Example: Westin Freezers

Westin Freezers Brand-X Freezers


$55.10 $56.10
54.50 54.70
53.20 54.40
53.00 55.40
55.50 54.10
54.90 56.00
55.80 55.50
54.00 55.00
54.20 54.30
55.20 57.00

Slide
247
Example: Westin Freezers

 Mann-Whitney-Wilcoxon Test (Large-Sample Case)


• Hypotheses
H0: Annual energy costs for Westin freezers
and Brand-X freezers are the same.
Ha: Annual energy costs differ for the
two brands of freezers.

Slide
248
Mann-Whitney-Wilcoxon Test:
Large-Sample Case
 First, rank the combined data from the lowest to
the highest values, with tied values being assigned
the average of the tied rankings.
 Then, compute T, the sum of the ranks for the first
sample.
 Then, compare the observed value of T to the
sampling distribution of T for identical populations.
The value of the standardized test statistic z will
provide the basis for deciding whether to reject H0.

Slide
249
Mann-Whitney-Wilcoxon Test:
Large-Sample Case
 Sampling Distribution of T for Identical Populations
• Mean
μT = (1/2) n1 (n1 + n2 + 1)

• Standard Deviation
σT = √( (1/12) n1 n2 (n1 + n2 + 1) )

• Distribution Form
Approximately normal, provided
n1 ≥ 10 and n2 ≥ 10

Slide
250
Example: Westin Freezers

Westin Freezers Rank Brand-X Freezers Rank


$55.10 12 $56.10 19
54.50 8 54.70 9
53.20 2 54.40 7
53.00 1 55.40 14
55.50 15.5 54.10 4
54.90 10 56.00 18
55.80 17 55.50 15.5
54.00 3 55.00 11
54.20 5 54.30 6
55.20 13 57.00 20
Sum of Ranks 86.5 Sum of Ranks 123.5

Slide
251
Example: Westin Freezers

 Mann-Whitney-Wilcoxon Test (Large-Sample Case)


• Sampling Distribution
Sampling distribution of T if the populations are
identical: approximately normal with
μT = (1/2)(10)(21) = 105 and σT = 13.23

Slide
252
Example: Westin Freezers

 Rejection Rule
Using .05 level of significance,
Reject H0 if z < -1.96 or z > 1.96
 Test Statistic
z = (T − μT)/σT = (86.5 − 105)/13.23 = −1.40
 Conclusion
Do not reject H0. There is insufficient evidence in
the sample data to conclude that there is a difference
in the annual energy cost associated with the two
brands of freezers.
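
A Python sketch of this large-sample MWW test, following the slides'
procedure (rank the pooled data, sum the first sample's ranks, then
standardize):

from math import sqrt
from scipy.stats import rankdata, norm

westin = [55.10, 54.50, 53.20, 53.00, 55.50, 54.90, 55.80, 54.00, 54.20, 55.20]
brandx = [56.10, 54.70, 54.40, 55.40, 54.10, 56.00, 55.50, 55.00, 54.30, 57.00]

ranks = rankdata(westin + brandx)  # ties get average ranks
n1, n2 = len(westin), len(brandx)
T = ranks[:n1].sum()               # sum of ranks for the first sample
mu_T = n1 * (n1 + n2 + 1) / 2
sigma_T = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (T - mu_T) / sigma_T
print(T, mu_T, round(sigma_T, 2), round(z, 2))  # 86.5, 105.0, 13.23, -1.4
print(2 * norm.sf(abs(z)))                      # two-tailed p ~ 0.16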

Slide
253
Kruskal-Wallis Test

 The Mann-Whitney-Wilcoxon test can be used to test


whether two populations are identical.
 The MWW test has been extended by Kruskal and
Wallis for cases of three or more populations.
 The Kruskal-Wallis test can be used with ordinal data
as well as with interval or ratio data.
 Also, the Kruskal-Wallis test does not require the
assumption of normally distributed populations.
 The hypotheses are:
H0: All populations are identical
Ha: Not all populations are identical
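
In code, the test is one call in scipy; the three small samples below are
purely hypothetical, for illustration only.

from scipy.stats import kruskal

group_a = [68, 72, 77, 42, 53]   # hypothetical ordinal/ratio data
group_b = [60, 65, 71, 80, 84]
group_c = [44, 70, 83, 91, 88]

H, p = kruskal(group_a, group_b, group_c)
print(round(H, 2), round(p, 3))  # reject H0 only if p is below the chosen alpha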

Slide
254
Rank Correlation

 The Pearson correlation coefficient, r, is a measure of


the linear association between two variables for
which interval or ratio data are available.
 The Spearman rank-correlation coefficient, rs , is a
measure of association between two variables when
only ordinal data are available.
 Values of rs can range from –1.0 to +1.0, where
• values near 1.0 indicate a strong positive
association between the rankings, and
• values near -1.0 indicate a strong negative
association between the rankings.

Slide
255
Rank Correlation

 Spearman Rank-Correlation Coefficient, rs

rs = 1 − (6 Σ di²) / ( n(n² − 1) )

where: n = number of items being ranked
xi = rank of item i with respect to one variable
yi = rank of item i with respect to a second
variable
di = xi − yi

Slide
256
Test for Significant Rank Correlation

 We may want to use sample results to make an
inference about the population rank correlation ρs.
 To do so, we must test the hypotheses:
H0: ρs = 0
Ha: ρs ≠ 0

Slide
257
Rank Correlation

 Sampling Distribution of rs when ρs = 0
• Mean
μrs = 0

• Standard Deviation
σrs = √( 1/(n − 1) )

• Distribution Form
Approximately normal, provided n ≥ 10

Slide
258
Example: Connor Investors

 Rank Correlation
Connor Investors provides a portfolio
management service for its clients. Two of Connor’s
analysts ranked ten investments from low (1) to high
(10) risk as shown below. Use rank correlation, with
α = .10, to comment on the agreement of the two
analysts’ ratings.

Investment A B C D E F G H I J
Analyst #1 1 4 9 8 6 3 5 7 2 10
Analyst #2 1 5 6 2 9 7 3 10 4 8

Slide
259
Example: Connor Investors

Analyst #1 Analyst #2
Investment Rating Rating Differ. (Differ.)2
A 1 1 0 0
B 4 5 -1 1
C 9 6 3 9
D 8 2 6 36
E 6 9 -3 9
F 3 7 -4 16
G 5 3 2 4
H 7 10 -3 9
I 2 4 -2 4
J 10 8 2 4
Sum = 92

Slide
260
Example: Connor Investors

 Hypotheses
H0: ρs = 0 (No rank correlation exists.)
Ha: ρs ≠ 0 (Rank correlation exists.)

 Sampling Distribution
Sampling distribution of rs under the assumption
of no rank correlation: approximately normal with
μrs = 0 and σrs = √( 1/(10 − 1) ) = 0.333
Slide
261
Example: Connor Investors

 Rejection Rule
Using the .10 level of significance,
Reject H0 if z < -1.645 or z > 1.645
 Test Statistic
rs = 1 − 6(92) / ( 10(10² − 1) ) = 1 − 552/990 = 0.4424
z = (rs − μrs)/σrs = (0.4424 − 0)/0.3333 = 1.33
 Conclusion
Do not reject H0. There is not a significant rank
correlation. The two analysts are not showing
agreement in their rating of the risk associated with
the different investments.
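
A Python sketch reproducing the Spearman computation from the two analysts'
ratings (the norm import is only for the approximate two-tailed p-value):

from math import sqrt
from scipy.stats import norm

analyst1 = [1, 4, 9, 8, 6, 3, 5, 7, 2, 10]
analyst2 = [1, 5, 6, 2, 9, 7, 3, 10, 4, 8]

n = len(analyst1)
d2 = sum((x - y) ** 2 for x, y in zip(analyst1, analyst2))  # sum of d_i^2
r_s = 1 - 6 * d2 / (n * (n ** 2 - 1))
z = r_s / (1 / sqrt(n - 1))            # sigma_rs = 1/sqrt(n - 1) under H0
print(d2, round(r_s, 4), round(z, 2))  # 92, 0.4424, 1.33
print(2 * norm.sf(abs(z)))             # two-tailed p ~ 0.18 > 0.10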

Slide
262
MULTIVARIATE STATISTICAL
TECHNIQUES

263
DEFINITION OF MULTIVARIATE TECHNIQUES
 All statistical techniques which simultaneously analyze more
than two variables on a sample of observations can be
categorized as multivariate techniques.
 Multivariate analysis is a collection of methods for analyzing
data in which a number of observations are available for each
object.
 In the analysis of many problems, it is helpful to have a
number of scores for each object. For instance, in the field of
intelligence testing, we may start with the theory that general
intelligence is reflected in a variety of specific performance
measures.
 Then, to study intelligence in the context of this theory, one
must administer many tests of mental skills, such as
vocabulary, speed of recall, mental arithmetic, verbal
analogies and so on.
264
Example of Multivariate Variables
• The score on each test is one variable, Xi, and there are
several, k, of such scores for each object, represented
as X1, X2,…, Xk.
• Most of the research studies involve more than two
variables in which situation analysis is desired of the
association between one (at times many) criterion
variable and several independent variables
• Or we may be required to study the association
between variables having no dependency relationships.
• All such analyses are termed as multivariate analyses or
multivariate techniques.
• In brief, techniques that take account of the various
relationships among variables are termed multivariate
analyses or multivariate techniques.
265
GROWTH OF MULTIVARIATE TECHNIQUES
• Of late, multivariate techniques have emerged as a powerful
tool to analyze data represented in terms of many variables.
• The main reason is that a series of univariate analyses
carried out separately for each variable may, at times, lead to
incorrect interpretation of the results.
• This is so because univariate analysis does not consider the
correlation or inter-dependence among the variables.
• As a result, during the last fifty years, a number of statisticians
have contributed to the development of several multivariate
techniques.
• Today these techniques are being applied in many fields such
as economics, sociology, psychology, agriculture,
anthropology, biology and medicine.
266
GROWTH OF MULTIVARIATE TECHNIQUES

• These techniques are used in analyzing social,


psychological, medical and economic data, specially
when the variables concerning research studies of
these fields are supposed to be correlated with
each other and when rigorous probabilistic models
cannot be appropriately used.
• Applications of multivariate techniques in practice
have been accelerated in modern times because of
the advent of high speed electronic computers.

267
CHARACTERISTICS AND APPLICATIONS

• Multivariate techniques are largely empirical and deal with the


reality; they possess the ability to analyze complex data.
• Accordingly, in most applied and behavioral research, we
generally resort to multivariate analysis techniques for realistic
results. Besides being a tool for analyzing the data, the techniques
also help in various types of decision-making.
• For example, take the case of college entrance examination wherein
a number of tests are administered to candidates, and the candidates
scoring high total marks based on many subjects are admitted.
• This system, though apparently fair, may at times be biased in favor of
some subjects with the larger standard deviations.
• Multivariate techniques may be appropriately used in such situations
for developing norms as to who should be admitted in college
268
CHARACTERISTICS AND APPLICATIONS
 We may also cite an example from medical field.
 Many medical examinations such as blood pressure and cholesterol
tests are administered to patients
 Each of the results of such examinations has significance of its own,
but it is also important to consider relationships between different
test results or results of the same tests at different occasions in order
to draw proper diagnostic conclusions and to determine an
appropriate therapy.
 Multivariate techniques can assist us in such a situation. In view of all
this, we can state that “if the researcher is interested in making
probability statements on the basis of sampled multiple
measurements, then the best strategy of data analysis is to use some
suitable multivariate statistical technique.”

269
CHARACTERISTICS AND APPLICATIONS
 The basic objective underlying multivariate techniques is to
represent a collection of massive data in a simplified way.
 In other words, multivariate techniques transform a mass of
observations into a smaller number of composite scores in
such a way that they may reflect as much information as
possible contained in the raw data obtained concerning a
research study.
 Thus, the main contribution of these techniques is in
arranging a large amount of complex information involved
in the real data into a simplified visible form.
 Mathematically, multivariate techniques consist in “forming
a linear composite vector in a vector subspace, which can
be represented in terms of projection of a vector onto
certain specified subspaces.”
270
CHARACTERISTICS AND APPLICATIONS
 For better appreciation and understanding of
multivariate techniques, one must be familiar with
fundamental concepts of linear algebra, vector spaces,
orthogonal and oblique projections and univariate
analysis.
 Even then before applying multivariate techniques for
meaningful results, one must consider the nature and
structure of the data and the real aim of the analysis.
 We should also not forget that multivariate techniques
involve several complex mathematical computations
and as such can be utilized largely only where
computing facilities are available.
271
CLASSIFICATION OF MULTIVARIATE TECHNIQUES
 Today, there exist a great variety of multivariate techniques which can
be conveniently classified into two broad categories viz., dependence
methods and interdependence methods.
 This sort of classification depends upon the question: Are some of the
involved variables dependent upon others? If the answer is ‘yes’, we
have dependence methods; but in case the answer is ‘no’, we have
interdependence methods.
 Two more questions are relevant for understanding the nature of
multivariate techniques. Firstly, in case some variables are
dependent, the question is how many variables are dependent?
 The other question is, whether the data are metric or non-metric?
This means whether the data are quantitative, collected on interval
or ratio scale, or whether the data are qualitative, collected on
nominal or ordinal scale.
272
CLASSIFICATION OF MULTIVARIATE TECHNIQUES

 The technique to be used for a given situation depends


upon the answers to all these very questions.
 Sheth in his article on “The multivariate revolution in
marketing research” has given the flow chart that clearly
exhibits the nature of some important multivariate
techniques as shown below.
 Thus, we have two types of multivariate techniques: one
type for data containing both dependent and independent
variables, and the other type for data containing several
variables without dependency relationship.

273
CLASSIFICATION OF MULTIVARIATE TECHNIQUES

 In the former category are included techniques like


• multiple regression analysis,
• multiple discriminant analysis,
• multivariate analysis of variance and
• canonical analysis,

 In the latter category we put techniques like


• factor analysis,
• cluster analysis,
• multidimensional scaling or MDS (both metric and non-
metric), and
• latent structure analysis.
274
[Flow chart (after Sheth) classifying multivariate techniques – image not reproduced]
275
VARIABLES IN MULTIVARIATE ANALYSIS
 Before we describe the various multivariate techniques, it seems
appropriate to have a clear idea about the term “variables” as used in the
context of multivariate analysis.
 Many variables used in multivariate analysis can be classified into
different categories from several points of view. Important ones are
as under:
 Explanatory variable and criterion variable: If X is considered to be
the cause of Y, then X is described as explanatory variable (also
termed as causal or independent variable) and Y is described as
criterion variable (also termed as resultant or dependent variable).
 In some cases both explanatory variable and criterion variable may
consist of a set of many variables in which case set (X1, X2, X3, …., Xp)
may be called a set of explanatory variables
276
VARIABLES IN MULTIVARIATE ANALYSIS

and the set


(Y1, Y2, Y3, …., Yq )
may be called a set of criterion variables if the
variation of the former may be supposed to cause
the variation of the latter as a whole.
 In economics, the explanatory variables are called
external or exogenous variables and the criterion
variables are called endogenous variables. Some
people use the term external criterion for
explanatory variable and the term internal criterion
for criterion variable.
277
VARIABLES IN MULTIVARIATE ANALYSIS

 Observable variables and latent variables: Explanatory


variables described above are supposed to be observable
directly in some situations, and if this is so, the same are
termed as observable variables.
 However, there are some unobservable variables which may
influence the criterion variables. We call such unobservable
variables as latent variables.
 Discrete variable and continuous variable: Discrete variable is
a variable which may take only integer values whereas
continuous variable is one which can assume any real value
(even in decimal points).
 Dummy variable (or Pseudo variable): This term is being used
in a technical sense and is useful in algebraic manipulations in
context of multivariate analysis. We call Xi ( i = 1, …., m) a
dummy variable, if only one of the Xi is 1 and the others are all
zero.
278
IMPORTANT MULTIVARIATE TECHNIQUES

 A brief description of the various multivariate techniques named


above (with special emphasis on factor analysis) is as under:
 Multiple regression: In multiple regression we form a linear
composite of explanatory variables in such a way that it has maximum
correlation with a criterion variable.
 This technique is appropriate when the researcher has a single,
metric criterion variable, which is supposed to be a function of other
explanatory variables.
 The main objective in using this technique is to predict the variability
in the dependent variable based on its covariance with all the
independent variables.
 One can predict the level of the dependent phenomenon through the
multiple regression analysis model, given the levels of the independent
variables.
279
MULTIPLE REGRESSION
 Given a dependent variable, the linear-multiple regression
problem is to estimate constants β1, β2, ... βk and α such that
the expression
Y = α + β1 X1 + β2 X2 + ... + βk Xk
provides a good estimate of an individual’s Y score based on
his X scores.
 In practice, Y and the several X variables are converted to
standard scores: zy, z1, z2, ... zk; each z has a mean of 0 and
standard deviation of 1.
 Then the problem is to estimate constants, bi, such that

z'y = b1z1 + b2z2 + ... + bkzk

where z'y stands for the predicted value of the standardized Y
score, zy.
280
MULTIPLE REGRESSION
 The expression on the right side of the above equation is the linear
combination of explanatory variables. The constant α is eliminated in
the process of converting X’s to z’s.
 The least-squares method is used to estimate the beta weights in
such a way that the sum of the squared prediction errors is kept as
small as possible, i.e., Σ(zy − z'y)² is minimized.
 The predictive adequacy of a set of beta weights is indicated by the
size of the correlation coefficient between the predicted z'y
scores and the actual zy scores.
 This special (Karl Pearson) correlation coefficient is termed the
multiple correlation coefficient (R).
 The squared multiple correlation, R2, represents the proportion of
criterion (zy) variance accounted for by the explanatory variables, i.e.,
the proportion of total variance that is ‘Common Variance’.
281
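
A minimal numpy sketch of this standardized multiple regression (the data
are randomly generated, purely for illustration; lstsq stands in for solving
the least-squares normal equations):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                  # three explanatory variables
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=50)

zX = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize: mean 0, sd 1
zy = (y - y.mean()) / y.std()

b, *_ = np.linalg.lstsq(zX, zy, rcond=None)   # beta weights (no intercept)
zy_hat = zX @ b
R2 = 1 - ((zy - zy_hat) ** 2).sum() / ((zy ** 2).sum())
print(np.round(b, 3), round(R2, 3))           # standardized betas and R^2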
MULTIPLE REGRESSION

 Sometimes the researcher may use step-wise


regression techniques to have a better idea of the
independent contribution of each explanatory
variable.
 Under these techniques, the investigator adds the
independent contribution of each explanatory
variable into the prediction equation one by one,
computing betas and R2 at each step.
 Formal computerized techniques are available for
the purpose and the same can be used in the
context of a particular problem being studied by the
researcher.
282
MULTIPLE DISCRIMINANT ANALYSIS:
 Through discriminant analysis technique, a researcher
may classify individuals or objects into one of two or
more mutually exclusive and exhaustive groups on the
basis of a set of independent variables.
 Discriminant analysis requires interval independent
variables and a nominal dependent variable.
 For example, suppose that brand preference (say brand
x or y) is the dependent variable of interest and its
relationship to an individual’s income, age, education,
etc. is being investigated, then we should use the
technique of discriminant analysis.
 Regression analysis in such a situation is not suitable
because the dependent variable is not intervally
scaled.
283
MULTIPLE DISCRIMINANT ANALYSIS:
 Thus discriminant analysis is considered an appropriate
technique when the single dependent variable happens to be
non-metric and is to be classified into two or more groups,
depending upon its relationship with several independent
variables which all happen to be metric
 The objective in discriminant analysis happens to be to
predict an object’s likelihood of belonging to a particular
group based on several independent variables
 In case we classify the dependent variable in more than two
groups, then we use the name multiple discriminant analysis
 But in case only two groups are to be formed, we simply use
the term discriminant analysis

284
MULTIPLE DISCRIMINANT ANALYSIS:
 We may briefly refer to the technical aspects relating to
discriminant analysis
 There happens to be a simple scoring system that assigns a
score to each individual or object.
 This score is a weighted average of the individual’s numerical
values of his independent variables.
 On the basis of this score, the individual is assigned to the ‘most
likely’ category.
 For example, an individual is 20 years old, has an annual income
of USD 12,000, and has 10 years of formal education.
 Let b1, b2, and b3 be the weights attached to the independent
variables of age, income and education respectively.
285
MULTIPLE DISCRIMINANT ANALYSIS:
• The individual’s score (z), assuming linear score, would
be:
z = b1 (20) + b2 (12000) + b3 (10)
This numerical value of z can then be transformed into the
probability that the individual is an early user, a late user
or a non-user of the newly marketed consumer product
(here we are making three categories viz. early user, late
user or a non-user).
 The numerical values and signs of the b’s indicate
the importance of the independent variables in
their ability to discriminate among the different
classes of individuals.
286
MULTIPLE DISCRIMINANT ANALYSIS:
• Thus, through the discriminant analysis, the researcher can as
well determine which independent variables are most useful in
predicting whether the respondent is to be put into one group or
the other.
• In other words, discriminant analysis reveals which specific
variables in the profile account for the largest proportion of inter-
group differences.
 In case only two groups of the individuals are to be formed on
the basis of several independent variables, we can then have a
model like this

zi = b0 + b1X1i + b2X2i + ... + bnXni

where Xji = the ith individual’s value of the jth independent


variable;

287


MULTIPLE DISCRIMINANT ANALYSIS:
• bj = the discriminant coefficient of the jth variable;
zi = the ith individual’s discriminant score;
zcrit. = the critical value for the discriminant score.
• The classification procedure in such a case would be

a) If zi > zcrit., classify individual i as belonging to Group I


b) If zi < zcrit, classify individual i as belonging to Group II.

• When n (the number of independent variables) is equal to 2,


we have a straight line classification boundary.
• Every individual on one side of the line is classified as Group I
and on the other side, every one is classified as belonging to
Group II
288
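
A tiny Python sketch of this scoring rule; the weights b and the cutoff
z_crit are entirely hypothetical, standing in for values a fitted
discriminant function would supply.

def classify(age, income, education, b=(0.05, 0.0004, 0.3), z_crit=7.0):
    z = b[0] * age + b[1] * income + b[2] * education  # discriminant score
    return "Group I" if z > z_crit else "Group II"     # compare with cutoff

print(classify(20, 12000, 10))  # scores the slide's example individual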
MULTIPLE DISCRIMINANT ANALYSIS:
 When n = 3, the classification boundary is a two-dimensional
plane in 3 space and in general the classification boundary is an
n – 1 dimensional hyper-plane in n space.
 In n-group discriminant analysis, a discriminant function is
formed for each pair of groups.
 If there are 6 groups to be formed, we would have 6(6 – 1)/2 =
15 pairs of groups, and hence 15 discriminant functions.
 The b values for each function tell which variables are important
for discriminating between particular pairs of groups.
 The z score for each discriminant function tells in which of these
two groups the individual is more likely to belong.
 Then use is made of the transitivity of the relation “more likely
than”.
289
MULTIPLE DISCRIMINANT ANALYSIS:
 For example, if group II is more likely than group I and group III
is more likely than group II, then group III is also more likely
than group I.
 This way all necessary comparisons are made and the individual
is assigned to the most likely of all the groups.
 Thus, the multiple-group discriminant analysis is just like the
two-group discriminant analysis for the multiple groups are
simply examined two at a time.
 For judging the statistical significance between two groups, we
work out the Mahalanobis statistic, D2, which happens to be a
generalized distance between two groups, where each group is
characterized by the same set of n variables and where it is
assumed that variance-covariance structure is identical for both
groups.
290
MULTIPLE DISCRIMINANT ANALYSIS:
 It is worked out thus:

D² = (U1 − U2)′ V⁻¹ (U1 − U2)

where U1 = the mean vector for group I
U2 = the mean vector for group II
V = the common variance-covariance matrix

 By transformation procedure, this D2 statistic becomes an F


statistic which can be used to see if the two groups are
statistically different from each other.
 From all this, we can conclude that the discriminant analysis
provides a predictive equation, measures the relative
importance of each variable and is also a measure of the ability
of the equation to predict actual class-groups (two or more)
concerning the dependent variable.
291
MULTIVARIATE ANALYSIS OF VARIANCE:
 Multivariate analysis of variance is an extension of bivariate
analysis of variance in which the ratio of among-groups
variance to within-groups variance is calculated on a set of
variables instead of a single variable.
 This technique is considered appropriate when several
metric dependent variables are involved in a research study
along with many non-metric explanatory variables.
 (But if the study has only one metric dependent variable
and several nonmetric explanatory variables, then we use
the ANOVA technique as explained earlier.)
 In other words, multivariate analysis of variance is specially
applied whenever the researcher wants to test hypotheses
concerning multivariate differences in group responses to
experimental manipulation.
292
MULTIVARIATE ANALYSIS OF VARIANCE:
 For instance, the market researcher may be interested in
using one test market and one control market to examine
the effect of an advertising campaign on sales as well as
awareness, knowledge and attitudes.
 In that case he should use the technique of multivariate
analysis of variance for meeting his objective
 Canonical correlation analysis: This technique was first
developed by Hotelling wherein an effort is made to
simultaneously predict a set of criterion variables from their
joint co-variance with a set of explanatory variables.
 Both metric and non-metric data can be used in the context
of this multivariate technique
293
CANONICAL CORRELATION ANALYSIS:

 The procedure followed is to obtain a set of weights for


the dependent and independent variables in such a
way that linear composite of the criterion variables has
a maximum correlation with the linear composite of
the explanatory variables.
 For example, if we want to relate grade school
adjustment to health and physical maturity of the child,
we can then use canonical correlation analysis,
provided we have for each child a number of
adjustment scores
 (such as tests, teacher’s ratings, parent’s ratings and so
on) and also we have for each child a number of health
and physical maturity scores (such as heart rate, height,
weight, index of intensity of illness and so on).
294
 The main objective of canonical correlation analysis is
to discover factors separately in the two sets of
variables such that the multiple correlation between
sets of factors will be the maximum possible.
 Mathematically, in canonical correlation analysis, the
weights of the two sets, viz., a1, a2, … ak and y1, y2, y3, ...
yj, are so determined that the variables X = a1X1 + a2X2
+ ... + akXk + a and Y = y1Y1 + y2Y2 + … + yjYj + y have
maximum common variance.
 The process of finding the weights requires factor
analyses with two matrices. The resulting canonical
correlation solution then gives an overall description of
the presence or absence of a relationship between the
two sets of variables.
295
FACTOR ANALYSIS:
 Factor analysis is by far the most often used multivariate
technique of research studies, specially pertaining to
social and behavioral sciences.
 It is a technique applicable when there is a systematic
interdependence among a set of observed or manifest
variables and the researcher is interested in finding out
something more fundamental or latent which creates
this commonality.
 For instance, we might have data, say, about an
individual’s income, education, occupation and dwelling
area and want to infer from these some factor (such as
social class) which summarizes the commonality of all
the said four variables.
 The technique used for such a purpose is generally
described as factor analysis.
296
FACTOR ANALYSIS:
 Factor analysis, thus, seeks to resolve a large set of
measured variables in terms of relatively few
categories, known as factors.
 This technique allows the researcher to group variables
into factors (based on correlation between variables)
and the factors so derived may be treated as new
variables (often termed as latent variables) and their
value derived by summing the values of the original
variables which have been grouped into the factor.
 The meaning and name of such new variable is
subjectively determined by the researcher. Since the
factors happen to be linear combinations of the data, the
coordinates of each observation or variable are
measured to obtain what are called factor loadings.
297
FACTOR ANALYSIS:
 Such factor loadings represent the correlation between the
particular variable and the factor, and are usually placed in
a matrix of correlations between the variable and the
factors
 The mathematical basis of factor analysis concerns a data
matrix (also termed a score matrix), symbolized as S.
 The matrix contains the scores of N persons on k measures.
 Thus a1 is the score of person 1 on measure a, a2 is the score
of person 2 on measure a, and kN is the score of person N
on measure k. The score matrix then takes the form
shown below:

298
FACTOR ANALYSIS:
 SCORE MATRIX (or Matrix S)

299
FACTOR ANALYSIS:
 It is assumed that scores on each measure are standardized
[i.e., xi = (Xi − X̄)/si].
 This being so, the sum of scores in any column of the matrix,
S, is zero and the variance of scores in any column is 1.0.
 Then factors (a factor is any linear combination of the
variables in a data matrix and can be stated in a general way
like:
A = Waa + Wbb + … + Wkk)
are obtained (by any method of factoring). After this, we
work out factor loadings (i.e., factor-variable correlations).
 Then communality, symbolized as h2, the eigen value and the
total sum of squares are obtained and the results interpreted.
300
FACTOR ANALYSIS:
• For realistic results, we resort to the technique of rotation,
because such rotations reveal different structures in the
data.
• Finally, factor scores are obtained which help in explaining
what the factors mean. They also facilitate comparison
among groups of items as groups.
• With factor scores, one can also perform several other
multivariate analyses such as multiple regression, cluster
analysis, multiple discriminant analysis, etc

301
IMPORTANT METHODS OF FACTOR ANALYSIS

 There are several methods of factor analysis, but they
do not necessarily give the same results. As such, factor
analysis is not a single unique method but a set of
techniques. Important methods of factor analysis are:
• the centroid method;
• the principal components method;
• the maximum likelihood method.
 Before we describe these different methods of factor
analysis, it seems appropriate that some basic terms
relating to factor analysis be well understood
302
IMPORTANT METHODS OF FACTOR ANALYSIS (Cont’d)
 Definitions
 Factor: A factor is an underlying dimension that accounts for
several observed variables.
• There can be one or more factors, depending upon the nature
of the study and the number of variables involved in it.
 Factor-loadings: Factor-loadings are those values which
explain how closely the variables are related to each one of
the factors discovered.
• They are also known as factor-variable correlations. In fact,
factor-loadings work as key to understanding what the factors
mean.
• It is the absolute size (rather than the signs, plus or minus) of
the loadings that is important in the interpretation of a factor.
303
IMPORTANT METHODS OF FACTOR ANALYSIS (Cont’d)
 Communality (h2): Communality, symbolized as h2, shows
how much of each variable is accounted for by the underlying
factor taken together.
• A high value of communality means that not much of the
variable is left over after whatever the factors represent is
taken into consideration. It is worked out in respect of each
variable as under:
h2 of the ith variable = (ith factor loading of factor A)2
+ (ith factor loading of factor B)2 + …
 Eigen value (or latent root): When we take the sum of
squared values of factor loadings relating to a factor, then
such sum is referred to as Eigen Value or latent root.
• Eigen value indicates the relative importance of each factor in
accounting for the particular set of variables being analyzed.
304
IMPORTANT METHODS OF FACTOR ANALYSIS (Cont’d)
 Total sum of squares: When eigen values of all factors
are totaled, the resulting value is termed as the total
sum of squares.
• This value, when divided by the number of variables
(involved in a study), results in an index that shows how
the particular solution accounts for what all the
variables taken together represent.
• If the variables are all very different from each other,
this index will be low.
• If they fall into one or more highly redundant groups,
and if the extracted factors account for all the groups,
the index will then approach unity.
305
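
The quantities just defined are one line each in numpy; the loadings matrix
below (3 variables by 2 factors) is hypothetical.

import numpy as np

loadings = np.array([[0.9, 0.1],
                     [0.8, 0.3],
                     [0.2, 0.7]])   # rows: variables; columns: factors A, B

communality = (loadings ** 2).sum(axis=1)  # h^2 for each variable
eigenvalues = (loadings ** 2).sum(axis=0)  # eigen value (latent root) per factor
total_ss = eigenvalues.sum()               # total sum of squares
index = total_ss / loadings.shape[0]       # proportion of variance accounted for
print(communality, eigenvalues, round(index, 3))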
IMPORTANT METHODS OF FACTOR ANALYSIS (Cont’d)
 Rotation: Rotation, in the context of factor analysis, is
something like staining a microscope slide. Just as different
stains on it reveal different structures in the tissue, different
rotations reveal different structures in the data.
• Though different rotations give results that appear to be
entirely different, from a statistical point of view all
results are taken as equal, none superior or inferior to the others.
• However, from the standpoint of making sense of the results
of factor analysis, one must select the right rotation.
• If the factors are independent, orthogonal rotation is done
and if the factors are correlated, an oblique rotation is made.

306
IMPORTANT METHODS OF FACTOR ANALYSIS (Cont’d)
• Communality for each variable will remain
undisturbed regardless of rotation, but the eigen
values will change as a result of rotation.
 Factor scores: Factor score represents the degree to
which each respondent gets high scores on the
group of items that load high on each factor.
• Factor scores can help explain what the factors
mean. With such scores, several other multivariate
analyses can be performed.
• We can now take up the important methods of
factor analysis 307
CENTROID METHOD OF FACTOR ANALYSIS
 This method of factor analysis, developed by L.L. Thurstone,
was quite frequently used until about 1950, before the
advent of large-capacity high-speed computers.
 The centroid method tends to maximize the sum of
loadings, disregarding signs; it is the method which
extracts the largest sum of absolute loadings for each
factor in turn.
 It is defined by linear combinations in which all weights are
either + 1.0 or – 1.0.
 The main merit of this method is that it is relatively simple,
can be easily understood and involves simpler
computations.
308
CENTROID METHOD OF FACTOR ANALYSIS
• If one understands this method, it becomes easy to
understand the mechanics involved in other methods of
factor analysis.
• Various steps involved in this method are as follows:
1) This method starts with the computation of a matrix of
correlations, R, wherein units are placed in the diagonal
spaces. The product moment formula is used for
working out the correlation coefficients.
2) If the correlation matrix so obtained happens to be
a positive manifold (i.e., disregarding the diagonal
elements, each variable has a larger sum of positive
correlations than of negative
309
CENTROID METHOD OF FACTOR ANALYSIS
correlations), the centroid method requires that the weights for
all variables be +1.0. In other words, the variables are not
weighted; they are simply summed.
• But in case the correlation matrix is not a positive manifold,
then reflections must be made before the first centroid factor is
obtained.
3) The first centroid factor is determined as under:
a) The sum of the coefficients (including the diagonal unity) in
each column of the correlation matrix is worked out.
b) Then the sum of these column sums (T) is obtained.
c) The sum of each column obtained as per (a) above is divided
by the square root of T obtained in (b) above, resulting in
what are called centroid loadings.
d) This way each centroid loading (one loading for one variable)
is computed. The full set of loadings so obtained constitute
the first centroid factor (say A).
310
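
Step 3 is short enough to sketch directly in numpy; the correlation matrix R
below is hypothetical (a positive manifold, so no reflections are needed).

import numpy as np

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])      # correlations with unities in the diagonal

col_sums = R.sum(axis=0)             # step (a): column sums
T = col_sums.sum()                   # step (b): sum of the column sums
loadings_A = col_sums / np.sqrt(T)   # step (c): first centroid loadings
print(np.round(loadings_A, 3))       # the first centroid factor, A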
CENTROID METHOD OF FACTOR ANALYSIS
4) To obtain second centroid factor (say B), one must first
obtain a matrix of residual coefficients. For this purpose,
the loadings for the two variables on the first centroid
factor are multiplied.
 This is done for all possible pairs of variables (in each
diagonal space is the square of the particular factor
loading).
 The resulting matrix of factor cross products may be
named as Q1. Then Q1 is subtracted element by element
from the original matrix of correlation, R, and the result
is the first matrix of residual coefficients, R1.
 After obtaining R1, one must reflect some of the
variables in it, meaning thereby that some of the
variables are given negative signs in the sum [This is
usually done by inspection.
311
CENTROID METHOD OF FACTOR ANALYSIS
 The aim in doing this should be to obtain a reflected matrix,
R'1, which will have the highest possible sum of coefficients
(T)].
 For any variable which is so reflected, the signs of all
coefficients in that column and row of the residual matrix are
changed.
 When this is done, the matrix is named the ‘reflected matrix’,
from which the loadings are obtained in the usual way
(already explained in the context of the first centroid factor), but
the loadings of the variables which were reflected must be
given negative signs.
 The full set of loadings so obtained constitutes the second
centroid factor (say B).
 Thus loadings on the second centroid factor are obtained
from R'1.
312
CENTROID METHOD OF FACTOR ANALYSIS
5) For subsequent factors (C, D, etc.) the same process outlined
above is repeated.
 After the second centroid factor is obtained, cross products
are computed, forming matrix Q2. This is then subtracted
from R1 (and not from R'1), resulting in R2.
 To obtain a third factor (C), one should operate on R2 in the
same way as on R1.
 First, some of the variables would have to be reflected to
maximize the sum of loadings, which would produce R'2.
 Loadings would be computed from R'2 as they were from R'1.
Again, it would be necessary to give negative signs to the
loadings of variables which were reflected which would
result in third centroid factor (C).
313
PRINCIPAL-COMPONENTS METHOD OF FACTOR ANALYSIS

 Principal-components method (or simply P.C. method) of


factor analysis, developed by H. Hotelling, seeks to
maximize the sum of squared loadings of each factor
extracted in turn.
 Accordingly, PC factor explains more variance than would
the loadings obtained from any other method of factoring.
 The aim of the principal components method is the
construction out of a given set of variables Xj’s (j = 1, 2, …,
k) of new variables (pi), called principal components which
are linear combinations of the Xs

314
PRINCIPAL-COMPONENTS METHOD OF FACTOR ANALYSIS

 The method is applied mostly by using standardized
variables, i.e.,

pi = ai1x1 + ai2x2 + … + aikxk,  where each xj = (Xj − X̄j)/sj

 The aij’s are called loadings and are worked out in such a
way that the extracted principal components satisfy two
conditions:
315
PRINCIPAL-COMPONENTS METHOD OF FACTOR ANALYSIS

(i) principal components are uncorrelated (orthogonal) and


(ii) the first principal component (p1) has the maximum
variance, the second principal component (p2) has the
next maximum variance and so on.
 Following steps are usually involved in principal
components method
1) Estimates of aij’s are obtained with which X’s are
transformed into orthogonal variables i.e., the
principal components.
 A decision is also taken with regard to the question:
how many of the components should be retained in the
analysis?
316
PRINCIPAL-COMPONENTS METHOD OF FACTOR ANALYSIS

 We then proceed with the regression of Y on these
principal components, i.e., Y = γ1p1 + γ2p2 + … + γmpm
(for the m retained components).
 From the aij’s and the γij’s, we may find the bij of the original model,
transferring back from the p’s into the standardized X’s.
 Alternative method for finding the factor loadings is as
under
1) Correlation coefficients (by the product moment
method) between the pairs of k variables are worked
out and may be arranged in the form of a correlation
matrix, R, as under
317
ALTERNATIVE METHOD FOR FINDING THE FACTOR LOADINGS

• The main diagonal spaces include unities since such


elements are self-correlations. The correlation matrix
happens to be a symmetrical matrix.
2) Presuming the correlation matrix to be positive manifold (if
this is not so, then reflections as mentioned in the case of
centroid method must be made), the first step is to obtain
the sum of coefficients in each column, including the
diagonal element.
 The vector of column sums is referred to as Ua1 and when
Ua1 is normalized, we call it Va1. This is done by squaring
and summing the column sums in Ua1 and then dividing
each element in Ua1 by the square root of the sum of
squares (which may be termed the normalizing factor).
318
ALTERNATIVE METHOD FOR FINDING THE FACTOR LOADINGS

 Then elements in Va1 are accumulatively multiplied by


the first row of R to obtain the first element in a new
vector Ua2.
 For instance, in multiplying Va1 by the first row of R, the
first element in Va1 would be multiplied by the r11 value
and this would be added to the product of the second
element in Va1 multiplied by the r12 value.
 To this would be added the product of the third element
in Va1 multiplied by the r13 value, and so on for all the
corresponding elements in Va1 and the first row of R.
 To obtain the second element of Ua2, the same process
would be repeated i.e., the elements in Va1 are
accumulatively multiplied by the 2nd row of R.
 The same process would be repeated for each row of R
and the result would be a new vector Ua2.
319
ALTERNATIVE METHOD FOR FINDING THE FACTOR LOADINGS

 Then Ua2 would be normalized to obtain Va2. One


would then compare Va1 and Va2.
 If they are nearly identical, then convergence is said to
have occurred (If convergence does not occur, one
should go on using these trial vectors again and again
till convergence occurs).
 Suppose the convergence occurs when we work out
Va8 in which case Va7 will be taken as Va (the
characteristic vector) which can be converted into
loadings on the first principal component when we
multiply the said vector (i.e., each element of Va) by
the square root of the number we obtain for
normalizing Ua8.
320
ALTERNATIVE METHOD FOR FINDING THE FACTOR LOADINGS

3) To obtain factor B, one seeks solutions for Vb, and the


actual factor loadings for second component factor, B.
 The same procedures are used as we had adopted for
finding the first factor, except that one operates off the
first residual matrix, R1 rather than the original correlation
matrix R (We operate on R1 in just the same way as we did
in case of centroid method stated earlier).
4) This very procedure is repeated over and over again to
obtain the successive PC factors.

Other steps involved in factor analysis:
 Next the question is: How many principal components to
retain in a particular study? Various criteria for this
purpose have been suggested, but one often used is
Kaiser’s criterion. 321
ALTERNATIVE METHOD FOR FINDING THE FACTOR LOADINGS

 According to this criterion only the principal components,


having latent root greater than one, are considered as
essential and should be retained.
 The principal components so extracted and retained are then
rotated from their beginning position to enhance the
interpretability of the factors.
 Communality, symbolized, h2, is then worked out which
shows how much of each variable is accounted for by the
underlying factors taken together.
 A high communality figure means that not much of the
variable is left over after whatever the factors represent is
taken into consideration. It is worked out in respect of each
variable as under:

h2 of the ith variable = (ith factor loading of factor A)2


+ (ith factor loading of factor B)2 + …
322
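
A numpy sketch of principal-components extraction with Kaiser's criterion;
the correlation matrix R is hypothetical, and loadings are taken as
eigenvectors scaled by the square roots of their latent roots.

import numpy as np

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

eigvals, eigvecs = np.linalg.eigh(R)   # eigh suits the symmetric matrix R
order = np.argsort(eigvals)[::-1]      # sort components by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

keep = eigvals > 1.0                              # Kaiser: latent root > 1
loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
communality = (loadings ** 2).sum(axis=1)         # h^2 per variable
print(np.round(eigvals, 3))                       # all latent roots
print(np.round(loadings, 3), np.round(communality, 3))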
ALTERNATIVE METHOD FOR FINDING THE FACTOR LOADINGS

Then follows the task of interpretation. The amount of


variance explained (sum of squared loadings) by each
PC factor is equal to the corresponding characteristic
root.
When these roots are divided by the number of
variables, they show the characteristic roots as
proportions of the total variance explained.

 The variables are then regressed against each factor loading, and the resulting regression coefficients are used to generate what are known as factor scores, which are then used in further analysis and can also serve as inputs to several other multivariate analyses.
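A sketch of one common regression approach to factor scores (Thurstone's regression method; Xs, L and R are assumed to be available as NumPy arrays):

```python
import numpy as np

# R: correlation matrix (p x p), L: retained loadings (p x k),
# Xs: standardized data (n x p)
W = np.linalg.solve(R, L)   # regression weights, i.e. R^{-1} L
scores = Xs @ W             # one row of factor scores per observation
```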
MAXIMUM LIKELIHOOD (ML) METHOD OF FACTOR ANALYSIS
 The ML method consists in obtaining sets of factor loadings successively, in such a way that each, in turn, explains as much as possible of the population correlation matrix as estimated from the sample correlation matrix.
 If Rs stands for the correlation matrix actually obtained from the data in a sample, and Rp for the correlation matrix that would be obtained if the entire population were tested, then the ML method seeks to extrapolate what is known from Rs in the best possible way to estimate Rp (whereas the PC method only maximizes the variance explained in Rs).
 Thus, the ML method is a statistical approach in which one maximizes some relationship between the sample of data and the population from which the sample was drawn.
 The arithmetic underlying the ML method is relatively difficult in comparison to that involved in the PC method, and as such is accessible only with an adequate grounding in calculus, higher algebra and matrix algebra in particular.
 An iterative approach is employed in the ML method as well to find each factor, but the iterative procedures have proved much more difficult than those in the case of the PC method.
 Hence the ML method is generally not used for factor analysis in practice.
 The loadings obtained on the first factor are employed in the usual way to obtain a matrix of the residual coefficients.
 A significance test is then applied to indicate whether
it would be reasonable to extract a second factor.
 This goes on repeatedly in search of one factor after
another. One stops factoring after the significance test
fails to reject the null hypothesis for the residual
matrix.
 The final product is a matrix of factor loadings. The ML factor loadings can be interpreted in a similar fashion to that explained in the case of the centroid or the PC method.
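As a hedged illustration, scikit-learn's FactorAnalysis estimator fits such a factor model by maximum likelihood (the data here are hypothetical):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))    # hypothetical sample: 100 respondents, 6 variables

fa = FactorAnalysis(n_components=2)  # ML-based fit with two factors
fa.fit(X)
loadings = fa.components_.T          # variables x factors loading matrix
```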
ROTATION IN FACTOR ANALYSIS
 One often talks about the rotated solutions in the
context of factor analysis. This is done (i.e., a factor
matrix is subjected to rotation) to attain what is
technically called “simple structure” in data.
 Simple structure according to L.L. Thurstone is
obtained by rotating the axes until:
1) Each row of the factor matrix has at least one zero.
2) Each column of the factor matrix has p zeros, where p is
the number of factors.
3) For each pair of factors, there are several variables for
which the loading on one is virtually zero and the
loading on the other is substantial.
4) If there are many factors, then for each pair of
factors there are many variables for which both
loadings are zero.
5) For every pair of factors, the number of variables
with non-vanishing loadings on both of them is
small.
 All these criteria simply imply that the factor analysis
should reduce the complexity of all the variables.
 There are several methods of rotating the initial factor
matrix (obtained by any of the methods of factor
analysis) to attain this simple structure.
 Varimax rotation is one such method that maximizes (simultaneously for all factors) the variance of the loadings within each factor.
 The variance of a factor is largest when its smallest
loadings tend towards zero and its largest loadings tend
towards unity.
 In essence, the solution obtained through varimax
rotation produces factors that are characterized by
large loadings on relatively few variables.
 The other method of rotation is known as quartimax
rotation wherein the factor loadings are transformed
until the variance of the squared factor loadings
throughout the matrix is maximized.
 As a result, the solution obtained through this method permits a general factor to emerge, whereas in the case of the varimax solution such a thing is not possible.
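A compact sketch of varimax rotation, in the well-known SVD-based form of Kaiser's algorithm (function and variable names are illustrative):

```python
import numpy as np

def varimax(L, tol=1e-6, max_iter=100):
    """Rotate a loading matrix L (variables x factors) toward simple structure."""
    p, k = L.shape
    T = np.eye(k)                          # accumulated orthogonal rotation
    obj_old = 0.0
    for _ in range(max_iter):
        Lr = L @ T                          # current rotated loadings
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p))
        T = u @ vt                          # update the rotation from the SVD
        obj = s.sum()                       # value of the varimax criterion
        if obj - obj_old < tol * obj:       # stop when the criterion stabilizes
            break
        obj_old = obj
    return L @ T
```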
 Both solutions, however, produce orthogonal, i.e., uncorrelated, factors. It should be emphasized that the right rotation must be selected for making sense of the results of factor analysis.
R-TYPE AND Q-TYPE FACTOR ANALYSES
 Factor analysis may be R-type factor analysis or it may be Q-type
factor analysis. In R-type factor analysis, high correlations occur when
respondents who score high on variable 1 also score high on variable
2 and respondents who score low on variable 1 also score low on
variable 2.
 Factors emerge when there are high correlations within groups of
variables. In Q-type factor analysis, the correlations are computed
between pairs of respondents instead of pairs of variables.
 High correlations occur when respondent 1’s pattern of responses on
all the variables is much like respondent 2’s pattern of responses.
• Factors emerge when there are high correlations within
groups of people. Q-type analysis is useful when the
object is to sort out people into groups based on their
simultaneous responses to all the variables.
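The computational contrast is simply which axis of the data matrix gets correlated; a hedged NumPy sketch with hypothetical data (rows = respondents, columns = variables):

```python
import numpy as np

X = np.random.default_rng(0).standard_normal((10, 4))  # 10 respondents, 4 variables
R_type = np.corrcoef(X, rowvar=False)  # variable-by-variable correlations (R-type)
Q_type = np.corrcoef(X)                # respondent-by-respondent correlations (Q-type)
```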
• Factor analysis has been mainly used in developing
psychological tests (such as IQ tests, personality tests,
and the like) in the realm of psychology.
• In marketing, this technique has been used to look at
media readership profiles of people.
• Merits: The main merits of factor analysis can be stated
thus:
1) The technique of factor analysis is quite useful
when we want to condense and simplify the
multivariate data.
2) The technique is helpful in pointing out important and interesting relationships among observed data that were there all the time but not easy to see from the data alone.
3) The technique can reveal the latent factors (i.e., underlying factors
not directly observed) that determine relationships among several
variables concerning a research study.
 For example, if people are asked to rate different cold drinks (say,
Limca, Nova-cola, Gold Spot and so on) according to preference, a
factor analysis may reveal some salient characteristics of cold drinks
that underlie the relative preferences.
4) The technique may be used in the context of empirical clustering of
products, media or people i.e., for providing a classification scheme
when data scored on various rating scales have to be grouped
together.
LIMITATIONS OF FACTOR ANALYSIS
 One should also be aware of several limitations of factor analysis. The important ones are as follows:
1) Factor analysis, like all multivariate techniques, involves laborious computations and a heavy cost burden.
 With the computer facilities available these days, there is no doubt that factor analysis has become relatively faster and easier, but the cost factor continues to be the same, i.e., large factor analyses are still bound to be quite expensive.
2) The results of a single factor analysis are generally considered less reliable and dependable, for very often a factor analysis starts with a set of imperfect data.
 “The factors are nothing but blurred averages, difficult to
be identified.”
 To overcome this difficulty, it has been realised that the analysis should be done at least twice. If we get more or less similar results from all rounds of analysis, our confidence in such results increases.
3) Factor analysis is a complicated decision tool that should be used only when one has thorough knowledge and enough experience of handling it. Even then, at times it may not work well and may even disappoint the user.
 To conclude, we can state that in spite of all the said limitations
“when it works well, factor analysis helps the investigator make
sense of large bodies of intertwined data.
 When it works unusually well, it also points out some interesting relationships that might not have been obvious from examination of the input data alone”.
CLUSTER ANALYSIS
 Cluster analysis consists of methods of classifying
variables into clusters.
 Technically, a cluster consists of variables that correlate
highly with one another and have comparatively low
correlations with variables in other clusters.
 The basic objective of cluster analysis is to determine how many mutually exclusive and exhaustive groups or clusters, based on the similarities of profiles among entities, really exist in the population, and then to state the composition of such groups.
 The various groups to be determined in cluster analysis are not predefined, as happens to be the case in discriminant analysis.
 In general, cluster analysis contains the following steps to
be performed:
a) First of all, if some variables have a negative sum of
correlations in the correlation matrix, one must reflect
variables so as to obtain a maximum sum of positive
correlations for the matrix as a whole.
b) The second step consists in finding the highest correlation in the correlation matrix; the two variables involved form the nucleus of the first cluster.
c) Then one looks for those variables that correlate highly
with the said two variables and includes them in the cluster.
This is how the first cluster is formed.
d) To obtain the nucleus of the second cluster, we find two
variables that correlate highly but have low correlations
with members of the first cluster.
 Variables that correlate highly with the said two variables are then found. Such variables, along with the said two variables, constitute the second cluster.
e) One proceeds on similar lines to search for a third cluster, and so on (a sketch of the nucleus-forming steps (b) and (c) is given below).
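A minimal sketch of steps (b) and (c) under an assumed inclusion cutoff (the threshold value and all names are illustrative):

```python
import numpy as np

def nucleus_cluster(R, names, threshold=0.5):
    """Seed a cluster with the most highly correlated pair of variables,
    then add every variable correlating above `threshold` with both seeds."""
    A = np.abs(R - np.eye(len(R)))                  # ignore the diagonal
    i, j = np.unravel_index(np.argmax(A), A.shape)  # step (b): nucleus pair
    cluster = {i, j}
    for v in range(len(R)):                         # step (c): grow the cluster
        if v not in cluster and A[v, i] > threshold and A[v, j] > threshold:
            cluster.add(v)
    return [names[v] for v in sorted(cluster)]
```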
 From the above description we find that clustering
methods in general are judgmental and are devoid of
statistical inferences.
 For problems concerning a large number of variables, various cut-and-try methods have been proposed for locating clusters.
 McQuitty has specially developed a number of rather elaborate computational routines for that purpose.
 In spite of the above-stated limitation, cluster analysis has been found useful in the context of market research studies.
 Through the use of this technique we can segment the market for a product on the basis of several characteristics of the customers, such as personality, socio-economic considerations, psychological factors and purchasing habits.
Multidimensional Scaling
 Multidimensional scaling (MDS) allows a researcher to
measure an item in more than one dimension at a time.
 The basic assumption is that people perceive a set of
objects as being more or less similar to one another on
a number of dimensions (usually uncorrelated with one
another) instead of only one.
 There are several MDS techniques (also known as
techniques for dimensional reduction) often used for
the purpose of revealing patterns of one sort or
another in interdependent data structures.
 If data happen to be non-metric, MDS involves rank
ordering each pair of objects in terms of similarity.
 Then the judged similarities are transformed into
distances through statistical manipulations and are
consequently shown in n-dimensional space in a way
that the interpoint distances best preserve the original
interpoint proximities.
 After this sort of mapping is performed, the dimensions
are usually interpreted and labeled by the researcher.
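A hedged sketch of non-metric MDS using scikit-learn (the dissimilarity matrix below is hypothetical; only the rank order of its entries is used):

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical dissimilarities among 4 objects (symmetric, zero diagonal)
D = np.array([[0., 2., 5., 6.],
              [2., 0., 4., 5.],
              [5., 4., 0., 1.],
              [6., 5., 1., 0.]])

mds = MDS(n_components=2, metric=False, dissimilarity='precomputed',
          random_state=0)
coords = mds.fit_transform(D)   # 2-D configuration preserving the proximities
```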
 The significance of MDS lies in the fact that it enables
the researcher to study
 “The perceptual structure of a set of stimuli and the
cognitive processes underlying the development of this
structure.... MDS provides a mechanism for
determining the truly salient attributes without forcing the judge to appear irrational.”
• With MDS, one can scale objects, individuals or both with a
minimum of information.
• The MDS analysis will reveal the most salient attributes
which happen to be the primary determinants for making a
specific decision.
LATENT STRUCTURE ANALYSIS
 This type of analysis shares both of the objectives of factor analysis, viz., to extract latent factors and express the relationship of the observed (manifest) variables with these factors as their indicators, and to classify a population of respondents into pure types.
 This type of analysis is appropriate when the variables
involved in a study do not possess dependency relationship
and happen to be non-metric.
 In addition to the above stated multivariate techniques, we
may also describe the salient features of what is known as
“Path analysis”, a technique useful for decomposing the
total correlation between any two variables in a causal system.
PATH ANALYSIS
 The term ‘path analysis’ was first introduced by the biologist
Sewall Wright in 1934 in connection with decomposing the
total correlation between any two variables in a causal
system.
 The technique of path analysis is based on a series of multiple
regression analyses with the added assumption of causal
relationship between independent and dependent variables.
 This technique lays relatively heavier emphasis on the heuristic use of a visual diagram, technically described as a path diagram.
 An illustrative path diagram would show the interrelationships between Fathers’ education, Fathers’ occupation, Sons’ education, Sons’ first occupation and Sons’ present occupation (path diagram not reproduced here).
 Path analysis makes use of standardized partial regression
coefficients (known as beta weights) as effect coefficients.
 If linear additive effects are assumed, then through path analysis a simple set of equations can be built up showing how each variable depends on the preceding variables.
 “The main principle of path analysis is that any correlation
coefficient between two variables, or a gross or overall
measure of empirical relationship can be decomposed into a
series of parts: separate paths of influence leading through
chronologically intermediate variable to which both the
correlated variables have links.”
 The merit of path analysis in comparison to correlational analysis is that it makes possible the assessment of the relative influence of each antecedent or explanatory variable on the consequent variable, first by making explicit the assumptions underlying the causal connections and then by elucidating the indirect effects of the explanatory variables.
 The use of the path analysis technique requires the assumption that there are linear, additive, asymmetric relationships among a set of variables which can be measured at least on a quasi-interval scale.
 Each dependent variable is regarded as determined by the
variables preceding it in the path diagram, and a residual
variable, defined as uncorrelated with the other variables, is
postulated to account for the unexplained portion of the
variance in the dependent variable.
 The determining variables are assumed for the analysis to be given (exogenous in the model).
 We may illustrate the path analysis technique in connection with a simple problem of testing a causal model with three explicit variables, X1, X2 and X3 (path diagram not reproduced here).
• The structural equations for the above can be written as:

X2 = p21 X1 + e2
X3 = p31 X1 + p32 X2 + e3
where the X variables are measured as deviations
from their respective means.
 p21 may be estimated from the simple regression of X2 on X1, i.e., X2 = b21 X1, and p31 and p32 may be estimated from the regression of X3 on X1 and X2 as under:

X3 = b31.2 X1 + b32.1 X2

where b31.2 means the standardized partial regression coefficient for predicting variable 3 from variable 1 when the effect of variable 2 is held constant.
 In path analysis the beta coefficient indicates the direct
effect of Xj (j = 1, 2, 3, ..., p) on the dependent variable.
 Squaring the direct effect yields the proportion of the variance in the dependent variable Y which is due to each of the p independent variables Xj (j = 1, 2, 3, ..., p).
 After calculating the direct effect, one may then obtain a summary measure of the total indirect effect of Xj on the dependent variable Y by subtracting the beta coefficient bj from the zero-order correlation coefficient ryxj, i.e.,
 Indirect effect of Xj on Y = cjy = ryxj – bj, for all j = 1, 2, ..., p.
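A hedged end-to-end sketch of these computations on simulated data (the data-generating values are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.standard_normal(n)                       # exogenous variable
X2 = 0.6 * X1 + 0.8 * rng.standard_normal(n)      # depends on X1
X3 = 0.4 * X1 + 0.3 * X2 + 0.7 * rng.standard_normal(n)

def z(x):                                         # standardize to mean 0, sd 1
    return (x - x.mean()) / x.std()

X1, X2, X3 = map(z, (X1, X2, X3))

p21 = np.corrcoef(X1, X2)[0, 1]                   # simple regression of X2 on X1
A = np.column_stack([X1, X2])
p31, p32 = np.linalg.lstsq(A, X3, rcond=None)[0]  # standardized partials b31.2, b32.1

r13 = np.corrcoef(X1, X3)[0, 1]
indirect_13 = r13 - p31                           # indirect effect of X1 on X3
```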
 Such indirect effects include the unanalyzed effects and
spurious relationships due to antecedent variables.
 In the end, it may again be emphasized that the main virtue
of path analysis lies in making explicit the assumptions
underlying the causal connections and in elucidating the
indirect effects due to antecedent variables of the given
system.