
Conjoint Analysis

Conjoint Analysis is used by marketers to determine which attributes of a product are most important to a consumer
and how important each attribute is to the consumer.

Step 1 - Make a list of product attributes to be evaluated by the consumer.

Brand   Color   Price
A       Red     $50
B       Blue    $100
C               $150

Step 2 - Make a complete list of all possible attribute combinations.

Card   Brand   Color   Price ($)
1      A       Red     50
2      A       Red     100
3      A       Red     150
4      A       Blue    50
5      A       Blue    100
6      A       Blue    150
7      B       Red     50
8      B       Red     100
9      B       Red     150
10     B       Blue    50
11     B       Blue    100
12     B       Blue    150
13     C       Red     50
14     C       Red     100
15     C       Red     150
16     C       Blue    50
17     C       Blue    100
18     C       Blue    150
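The complete list in Step 2 is the Cartesian product of the attribute levels from Step 1. Below is a minimal Python sketch that generates the same 18 cards in the same order. The names `ATTRIBUTES` and `all_cards` are illustrative.

```python
from itertools import product

# Attribute levels from Step 1; prices are in dollars.
ATTRIBUTES = {
    "Brand": ["A", "B", "C"],
    "Color": ["Red", "Blue"],
    "Price": [50, 100, 150],
}


def all_cards(attributes):
    """Return (card number, combination) pairs for every combination of attribute levels.

    itertools.product varies the last attribute fastest, which reproduces
    the card order of the Step 2 list.
    """
    combos = product(*attributes.values())
    return [(card, dict(zip(attributes, combo))) for card, combo in enumerate(combos, start=1)]


cards = all_cards(ATTRIBUTES)
for card, combo in cards:
    print(card, combo["Brand"], combo["Color"], combo["Price"])
```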

Step 3 - Have the consumer rate each combination on a scale of 0 (worst) to 10 (best).
(Coding: Brand A = 1, B = 2, C = 3; Color Red = 1, Blue = 2.)

Card   Brand   Color   Price   Preference
1      1       1       50      5
2      1       1       100     5
3      1       1       150     0
4      1       2       50      8
5      1       2       100     5
6      1       2       150     2
7      2       1       50      7
8      2       1       100     5
9      2       1       150     3
10     2       2       50      9
11     2       2       100     6
12     2       2       150     5
13     3       1       50      10
14     3       1       100     7
15     3       1       150     5
16     3       2       50      9
17     3       2       100     7
18     3       2       150     8

Step 4 - Final data preparation step prior to running the regression. Give each attribute level its own 0/1 column. Then remove 1 variable from each set of variables with more than 1 choice, so that no variable in a set can be predicted from the others.

Card   A   B   C   Red   Blue   $50   $100   $150   Preference
1      1   0   0   1     0      1     0      0      5
2      1   0   0   1     0      0     1      0      5
3      1   0   0   1     0      0     0      1      0
4      1   0   0   0     1      1     0      0      8
5      1   0   0   0     1      0     1      0      5
6      1   0   0   0     1      0     0      1      2
7      0   1   0   1     0      1     0      0      7
8      0   1   0   1     0      0     1      0      5
9      0   1   0   1     0      0     0      1      3
10     0   1   0   0     1      1     0      0      9
11     0   1   0   0     1      0     1      0      6
12     0   1   0   0     1      0     0      1      5
13     0   0   1   1     0      1     0      0      10
14     0   0   1   1     0      0     1      0      7
15     0   0   1   1     0      0     0      1      5
16     0   0   1   0     1      1     0      0      9
17     0   0   1   0     1      0     1      0      7
18     0   0   1   0     1      0     0      1      8

After removing the Brand A, Red and $50 columns:

Card   B   C   Blue   $100   $150   Preference
1      0   0   0      0      0      5
2      0   0   0      1      0      5
3      0   0   0      0      1      0
4      0   0   1      0      0      8
5      0   0   1      1      0      5
6      0   0   1      0      1      2
7      1   0   0      0      0      7
8      1   0   0      1      0      5
9      1   0   0      0      1      3
10     1   0   1      0      0      9
11     1   0   1      1      0      6
12     1   0   1      0      1      5
13     0   1   0      0      0      10
14     0   1   0      1      0      7
15     0   1   0      0      1      5
16     0   1   1      0      0      9
17     0   1   1      1      0      7
18     0   1   1      0      1      8
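Step 4 is dummy (0/1) coding with one reference level dropped for each attribute. Below is a minimal sketch that assumes the reference levels are Brand A, Red and $50, as in this example. The names `LEVELS`, `COLUMNS` and `dummy_code` are illustrative. When the regression includes an intercept, keeping every column would make the columns of each attribute add up to the intercept column. That is the exact dependence Step 4 removes.

```python
from itertools import product

LEVELS = {"Brand": ["A", "B", "C"], "Color": ["Red", "Blue"], "Price": [50, 100, 150]}

# Column names for the non-reference levels: B, C, Blue, $100, $150.
COLUMNS = [f"{attribute}={value}" for attribute, values in LEVELS.items() for value in values[1:]]


def dummy_code(combo, levels):
    """Return 0/1 indicators for every level except the first (reference) level of each attribute."""
    row = []
    for attribute, values in levels.items():
        for value in values[1:]:  # values[0] is the dropped reference level
            row.append(1 if combo[attribute] == value else 0)
    return row


# Cards in Step 2 order, coded as in the reduced Step 4 table.
cards = [dict(zip(LEVELS, combo)) for combo in product(*LEVELS.values())]
X = [dummy_code(card, LEVELS) for card in cards]
for number, row in enumerate(X, start=1):
    print(number, row)
```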

Conjoint is an analysis that gives a marketer a method to predict how much more or less a consumer will value one combination of product attributes over another. The degree to which a consumer likes a product attribute is called the "utility" of that attribute. For example, a product might come in three brands, two colors, and three price levels. Each color, brand, and price level will have its own utility calculated during the conjoint analysis.
Conjoint is done using Multiple Regression. Each product attribute variation is assigned as one of the independent variable inputs to the Multiple Regression equation. For example, the color red is represented by one independent variable, while the color blue is represented by another. The resulting regression equation assigns a coefficient to each independent variable. These coefficients are the utilities of the attributes. The more positive an individual coefficient is, the more highly valued the associated product attribute is.

In this conjoint exercise, we are going to determine the utilities of eight product attributes. They are as follows: Brands A, B and C; the colors Red and Blue; and the price levels $50, $100 and $150.

There are 18 possible combinations of these attributes (3 brands x 2 colors x 3 prices). The consumer rates each combination on a scale of 0 to 10 (10 being the best). The consumer test results are coded for the regression equation and then run through the regression. The regression analysis calculates a coefficient for each independent variable as part of the regression output equation. Each coefficient measures the value (utility) that the consumer places on the associated product attribute.

The Step 2 chart lists the choices that the consumer had to analyze. The consumer was given 18 separate cards. Each card contained one of the 18 possible variations of product attributes. The consumer had to rate their overall preference for each combination of attributes on a scale of 0 to 10.

The Step 3 chart shows the consumer's stated preference for each combination of attributes. Non-numerical attributes were assigned numbers. Brand A and Red are shown as 1's in their respective columns. Brand B and Blue are shown as 2's in their respective columns. Brand C was assigned a 3 in its column.

The chart is then further prepared for Regression Analysis (Step 4). Each individual product attribute is given its own column. Each product attribute now has a value of either 1 or 0.

One problem must be corrected before this data can be submitted for Regression Analysis. Independent variables, or combinations of independent variables, should not be able to predict each other. Using independent variables that are highly correlated with each other (either positively or negatively) produces a regression problem known as multicollinearity. Here the dependence is exact. For example, if the color is either red or blue, knowing the state of one color tells us the state of the other (if Blue = 1, Red must = 0). The same condition occurs when there are 3 variables: if you know the states of 2, you know the state of the remaining one.
These error conditions are solved by removing one column of data from each type of variation. The columns for Brand A, Red, and Price level $50 were removed. As the output below shows, this does not reduce the accuracy of the Regression output.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.933190299
R Square            0.8708441342
Adjusted R Square   0.8121369224
Standard Error      1.1413191612
Observations        17

ANOVA
             df   SS               MS           F             Significance F
Regression   5    96.6124727669    19.3224946   14.83368241   0.000143011
Residual     11   14.3287037037    1.30260943
Total        16   110.9411764706

            Coefficients    Standard Error   t Stat        P-value        Lower 95%       Upper 95%
Intercept   5.9166666667    0.8070345183     7.33136753    1.482774E-05   4.1403956692    7.692937664
Brand B     1.5138888889    0.6989123946     2.16606387    0.0531402239   -0.0244069189   3.052184697
Brand C     3.3472222222    0.6989123946     4.7891871     0.0005630386   1.8089264144    4.88551803
Blue        1.2314814815    0.5599921057     2.19910507    0.0501644572   -0.0010528323   2.464015795
$100        -2.3194444444   0.6989123946     -3.31864832   0.0068476873   -3.8577402522   -0.781148637
$150        -4.3194444444   0.6989123946     -6.18023729   6.905826E-05   -5.8577402522   -2.781148637

Regression Equation:
Combination Preference = 5.91666666666667 + (1.51388888888889)*(Brand B) + (3.34722222222222)*(Brand C) + (1.23148148148148)*(Blue) + (-2.31944444444445)*($100) + (-4.319444444)*($150)

Removing the columns for Brand A, Red, and Price level $50 did not hurt the output accuracy. These product attributes can still be considered part of the Regression equation, with coefficients of 0.

The coefficient attached to each product attribute shows the consumer's utility for that attribute. The utilities are relative to each other. For example, Price level $50 is the most preferred price, with a utility of 0, while Price level $150 has the lowest utility, -4.319444444. Blue has a utility of 1.231481481, which is that much higher than the utility of Red (0). Brand C was the most liked brand, with a utility of 3.347222222, while Brand A was liked the least, with a utility of 0.

The resulting Regression Equation still does a good job of predicting overall preference. For example, the consumer rated the combination of attributes on card 13 (Brand C, Red, $50) a 10. The predicted Combination Preference for card 13 is:
(5.9167) + (3.3472)(1) = 9.264, which is very close to the consumer's rating of 10.

The regression appears to be a good one because Adjusted R Square is high (close to 1). Adjusted R Square is the proportion of the variance in preference that the model explains, adjusted for the number of independent variables. Here, Adjusted R Square is 0.812.
Brand C, $100 and $150 have low p-Values (below 0.01) and are therefore significant predictors. Brand B (0.053) and Blue (0.050) sit just above the 0.05 level of significance, so they are borderline.
The absolute value of each coefficient indicates the effect that attribute has on the consumer's overall liking of the product. For example, Brand C (coefficient = 3.347) produced the highest positive influence, while the $150 price (coefficient = -4.319) reduces consumer liking the most.

The very low Significance F (0.000143) indicates that the regression, overall, is statistically significant (valid).
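The Excel regression can be reproduced with ordinary least squares. The stdlib-only sketch below solves the normal equations (X^T X) b = X^T y by Gaussian elimination; its helper names are illustrative. The Excel summary reports 17 observations even though there are 18 cards. Its coefficients are exactly those obtained when card 1 is left out of the input range, and the sketch checks this. Fitting all 18 cards gives an intercept of 5.611 and these utilities:

- Brand B: 1.667
- Brand C: 3.500
- Blue: 1.333
- $100: -2.167
- $150: -4.167

The ranking of the attributes is unchanged.

```python
from itertools import product

LEVELS = {"Brand": ["A", "B", "C"], "Color": ["Red", "Blue"], "Price": [50, 100, 150]}
# Consumer ratings for cards 1-18, in the Step 2 card order.
PREFERENCE = [5, 5, 0, 8, 5, 2, 7, 5, 3, 9, 6, 5, 10, 7, 5, 9, 7, 8]
NAMES = ["Intercept", "Brand B", "Brand C", "Blue", "$100", "$150"]


def design_row(combo):
    """Intercept term, then dummies for B, C, Blue, $100, $150 (A, Red, $50 are the reference levels)."""
    row = [1.0]
    for attribute, values in LEVELS.items():
        row += [1.0 if combo[attribute] == value else 0.0 for value in values[1:]]
    return row


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """Least squares coefficients from the normal equations (X^T X) b = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


cards = [dict(zip(LEVELS, combo)) for combo in product(*LEVELS.values())]
X = [design_row(card) for card in cards]

coef_all = ols(X, PREFERENCE)            # all 18 cards
coef_excel = ols(X[1:], PREFERENCE[1:])  # cards 2-18, which matches the Excel output
for name, full, excel in zip(NAMES, coef_all, coef_excel):
    print(f"{name:9s} all 18 cards: {full:7.4f}   cards 2-18: {excel:7.4f}")
```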

Regression

Regression is a statistical technique used to create predictive models. The models receive input (independent) variables and predict the outcome of the dependent variable.

When performing Multiple Regression, Correlation Analysis should first be performed on the independent and dependent variables, as below.

Monthly Rates of Return

Date        S&P       Viacom     AT&T      GM         Coke
1/30/1998   0.8799    0.7541     2.1407    -4.6296    -18.8406
2/27/1998   7.5187    14.9701    -2.5948   18.986     6.6964
3/31/1998   5.558     11.9792    7.7869    -1.7226    -3.3473
4/30/1998   1.3716    7.907      -8.5551   -0.5535    5.8442
5/29/1998   -1.6289   -5.1724    1.2474    6.679      1.9427
6/27/1998   2.4171    3.4091     0.8214    1.8261     2.1063

Correlation Analysis
Tools / Data Analysis / Correlation

         S&P            Viacom          AT&T            GM            Coke
S&P      1
Viacom   0.9386616468   1
AT&T     0.1285583787   -0.0989328142   1
GM       0.4703491066   0.3504379667    -0.2637108598   1
Coke     0.2550526617   0.3423373581    -0.5014902082   0.627513676   1
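Excel's Correlation tool reports Pearson correlation coefficients. The stdlib-only sketch below recomputes the matrix from the six monthly returns. The names `RETURNS` and `pearson` are illustrative.

```python
from math import sqrt

# Monthly returns, January-June 1998, from the table above.
RETURNS = {
    "S&P": [0.8799, 7.5187, 5.558, 1.3716, -1.6289, 2.4171],
    "Viacom": [0.7541, 14.9701, 11.9792, 7.907, -5.1724, 3.4091],
    "AT&T": [2.1407, -2.5948, 7.7869, -8.5551, 1.2474, 0.8214],
    "GM": [-4.6296, 18.986, -1.7226, -0.5535, 6.679, 1.8261],
    "Coke": [-18.8406, 6.6964, -3.3473, 5.8442, 1.9427, 2.1063],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


names = list(RETURNS)
matrix = {(a, b): pearson(RETURNS[a], RETURNS[b]) for a in names for b in names}
for a in names:
    print(f"{a:7s}", " ".join(f"{matrix[(a, b)]:7.3f}" for b in names))
```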

Coke has a low correlation with the S&P (0.26) and is therefore not a good predictor of the S&P. AT&T's correlation with the S&P (0.13) is even lower, and the regression below confirms that AT&T is a weak predictor.
Also, if two of the independent variables are highly correlated with each other, only one of them should be used in the Multiple Regression below. This is not the case here: among Viacom, AT&T and GM, the largest correlation is 0.35 (Viacom and GM). Using highly correlated variables as inputs to a Multiple Regression causes a problem called Multicollinearity (unstable coefficients with inflated standard errors) and should be avoided. Multiple Regressions should be built up by adding one new independent variable at a time and evaluating the results. Good new independent variables noticeably raise R Square and lower the Standard Error without changing the existing Coefficients much. Poor new independent variables don't change R Square much but have unpredictable effects on the Coefficients. Build regressions up one variable at a time and evaluate after adding each new variable.

Multiple Regression

Predicting S&P returns from returns of other investments

Tools / Data Analysis / Regression

Coke was not used because it has a low correlation with the S&P and is therefore not a good predictor of the S&P.
Viacom, AT&T and GM were used. They have low correlations with each other; of the three, only Viacom is strongly correlated with the S&P.
Regressions are Predictive, not Forecasting. New values of the independent variables used for prediction must lie within the range of the previously sampled independent variable values.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.9877323112
R Square            0.9756151185
Adjusted R Square   0.9390377963
Standard Error      0.8210012659
Observations        6

ANOVA
             df   SS              MS             F               Significance F
Regression   3    53.9356008961   17.978533632   26.6726774628   0.0363534242
Residual     2    1.3480861572    0.6740430786
Total        5    55.2836870533

            Coefficients   Standard Error   t Stat         P-value        Lower 95%       Upper 95%
Intercept   0.1250621001   0.4416975598     0.2831396672   0.8036858951   -1.7754091113   2.0255333115
Viacom      0.3942208055   0.0525631859     7.4999412397   0.0173175912   0.1680596703    0.6203819408
AT&T        0.1701350642   0.0701416328     2.4255931513   0.1361101668   -0.1316600237   0.4719301521
GM          0.0912674536   0.0474978751     1.9215060332   0.1946174186   -0.1130994085   0.2956343158

Adjusted R Square (0.939) states that 94% of the variance of the S&P return is explained by the model, after adjusting for the number of independent variables. This is good.
Viacom's coefficient (0.394) and t Stat (7.50) are the largest, indicating that it is the biggest predictor of the S&P. Its high correlation with the S&P (0.94) indicates this as well.
The standard error of the regression is used to determine confidence intervals:
approximate 95% confidence interval = Predicted S&P Value +/- z(95%) * (Standard Error). With only 2 residual degrees of freedom, the two-tailed t value TINV(0.05, 2) = 4.30 is more appropriate than z = 1.96.
MS (Mean Square) shows a high ratio of explained (regression) to unexplained (residual) variance.
F Ratio = Explained variance (17.98) / Unexplained variance (0.674) = 26.67. This is high, which is good. The low p-value (Significance F = 0.036) shows that the regression model is statistically significant.

Regression Equation:
S&P = (0.125062100111188) + (0.39422080554261)*(Viacom) + (0.170135064181028)*(AT&T) + (0.0912674536429872)*(GM)

Interpreting the Regression:

Low Significance F (0.036) - indicates that, overall, the regression output is statistically significant (valid), at least at the 0.05 level of significance.

p-Values for each variable - The lower the p-Value, the better the variable is as a predictor.
Viacom returns (p = 0.017) are a good predictor of the S&P.
AT&T and GM returns (p = 0.136 and 0.195) are much less effective predictors of the S&P return. They would not be valid predictors at a 0.05 level of significance.
The smaller coefficients of these two company returns are consistent with this.
Adding new independent variables to a regression equation never decreases R Square, even when the new variables are useless.

Adjusted R Square increases only when a newly added independent variable improves the prediction of the dependent variable enough to offset the lost degree of freedom.
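Excel's Regression tool fits the coefficients by ordinary least squares. The stdlib-only sketch below solves the normal equations for Viacom, AT&T and GM; its helper names are illustrative. It then recomputes these statistics:

- R Square
- Adjusted R Square = 1 - (1 - R Square)(n - 1)/(n - k - 1)
- Standard Error = sqrt(SS Residual / df Residual)
- the F ratio, MS Regression / MS Residual

```python
from math import sqrt

# Monthly returns, January-June 1998, from the table above.
SP = [0.8799, 7.5187, 5.558, 1.3716, -1.6289, 2.4171]
PREDICTORS = {
    "Viacom": [0.7541, 14.9701, 11.9792, 7.907, -5.1724, 3.4091],
    "AT&T": [2.1407, -2.5948, 7.7869, -8.5551, 1.2474, 0.8214],
    "GM": [-4.6296, 18.986, -1.7226, -0.5535, 6.679, 1.8261],
}


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def regression_summary(columns, y):
    """OLS with an intercept, plus the main statistics of Excel's SUMMARY OUTPUT."""
    n, k = len(y), len(columns)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k + 1)] for i in range(k + 1)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k + 1)]
    b = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_bar = sum(y) / n
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    df_res = n - k - 1
    r2 = 1 - ss_res / ss_tot
    return {
        "coefficients": b,
        "R Square": r2,
        "Adjusted R Square": 1 - (1 - r2) * (n - 1) / df_res,
        "Standard Error": sqrt(ss_res / df_res),
        "F": ((ss_tot - ss_res) / k) / (ss_res / df_res),
    }


summary = regression_summary(list(PREDICTORS.values()), SP)
for key, value in summary.items():
    print(key, value)
```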

Testing Two Population Means To Determine If Change Occurred

The Confidence Interval or the t-Test can be used to determine if a population mean has changed.

Testing to determine if a change has occurred (for example, after an ad campaign is run) using the Confidence Interval

         BEFORE                AFTER
Dealer   Average Daily Sales   Average Daily Sales   Difference
A        100                   110                   10
B        130                   135                   5
C        120                   122                   2
D        140                   157                   17
E        155                   160                   5
F        200                   206                   6
G        300                   309                   9
H        260                   283                   23
I        190                   202                   12
J        185                   192                   7
K        100                   110                   10
L        130                   135                   5
M        120                   122                   2
N        140                   157                   17
O        155                   160                   5
P        200                   206                   6
Q        300                   309                   9
R        260                   283                   23
S        190                   202                   12
T        185                   192                   7
U        100                   110                   10
V        130                   135                   5
W        120                   122                   2
X        140                   157                   17
Y        155                   160                   5
Z        200                   206                   6
A1       300                   309                   9
B1       260                   283                   23
C1       190                   202                   12
D1       185                   192                   7

In this case, we want to determine with 95% certainty whether an advertising campaign increased average daily sales in our large dealer network. To determine this, we must take Before and After samples of average daily sales from at least 30 dealers.
The keys to success of this sampling are the following:
1) At least 30 dealers must be sampled.
2) Before and After samples must be taken from the same dealers.
3) The samples must be AVERAGE sales, for example, average daily sales over a week or a month. They cannot be just one sample of one day's sales.
4) The dealers sampled must be random and representative of the overall population.

We are trying to determine whether the Mean Difference falls inside or outside the 95% Confidence Interval around a Mean Difference of 0.
If the Mean Difference falls within this 95% Confidence Interval, we cannot conclude that the Mean Difference differs from 0: no significant change is detected.
If the Mean Difference falls outside this Confidence Interval, we can conclude with 95% confidence that average daily sales for the whole network have changed.

To determine the 95% Confidence Interval for a Mean Difference of 0, we need the following information:
Sample size (COUNT) = 30   (Need at least 30 samples of daily averages from the same dealers.)
Sample Standard Deviation (STDEV) = 6.11
Sample Standard Error = 1.11   (Sample Standard Error = (Sample Standard Deviation) / (Square Root of Sample Size))
Sample Mean (AVERAGE) = 9.60
α (1 - Confidence Level) = 0.05   (for a 95% Confidence Interval, α = 0.05)

The 95% confidence interval contains 95% of the area under the Normal curve. The remaining 5% (α) is split between the two outer tails of the Normal curve. The Z Score marks the right outer edge of the confidence interval. For a 95% two-tailed confidence interval, the total area under the Normal curve to the left of this Z value is 97.5%. The Z Score for this is 1.96. This means that 97.5% of the total area under the Normal curve lies to the left of the point 1.96 standard deviations to the right of the mean.

Z Score (two-tailed) for 95% CI = 1.96 = NORMSINV(0.975)

The 95% Confidence Interval around a Sample Mean of 0 = 0 +/- (Z Score for 95% CI) * (Sample Standard Error)
= 0 +/- (1.96) x (1.11)

The 95% Confidence Interval for a Mean of 0 runs from -2.18 to +2.18.

If the Sample Mean (9.60) is outside the 95% Confidence Interval for a Mean Difference of 0, we can say with 95% certainty that Average Daily Sales throughout the entire population of dealers have increased.

This is the case, because the Mean of 9.60 is outside the confidence interval of -2.18 to +2.18. We can now state with 95% certainty that average daily sales across the dealer network changed after the advertising campaign.

Testing to determine if a change has occurred, using the t-Test

t-Test - Paired Means

Sample the same thing before and after to determine if something has changed.
We are trying to determine if the "after" samples are statistically different from the "before" samples.
30 samples should always be taken, unless the population is known to be normally distributed.
(Here only 6 samples are taken for brevity.)
In this case, we want to determine with 95% certainty whether or not there has been a change from before to after. The Null Hypothesis is that the mean difference is 0, and α = 0.05 (1 - 0.95).

t-Test: Paired Two Sample for Means

Before     After
0.7541     -4.6296
14.9701    18.986
11.9792    -1.7226
7.907      -0.5535
-5.1724    6.679
3.4091     1.8261

                               Before          After
Mean                           5.6411833333    3.4309
Variance                       55.6264861257   72.498467704
Observations                   6               6
Pearson Correlation            0.3504379667
Hypothesized Mean Difference   0
df                             5
t Stat                         0.5920775727
P(T<=t) one-tail               0.28977785
t Critical one-tail            2.0150483721
P(T<=t) two-tail               0.5795557
t Critical two-tail            2.5705818347

P(T<=t) one-tail (0.289) is greater than α (0.05), so there has been no significant increase.
P(T<=t) two-tail (0.579) is greater than α (0.05), so there has been no significant change at all.
Because α is less than both p-values, we cannot reject the Null Hypothesis in either case. The Null Hypothesis states that there is no change in the mean.

Problem:
A tire manufacturer wants to determine if a new rubber formulation will improve tire wear. 12 sets of tires were made with the old rubber formula and 12 sets with the new rubber formulation. They were placed on the following cars and driven until they wore out. Determine at a 0.05 level of significance whether the new rubber produces longer tread life.

Car   Tire Location   Old Rubber   New Rubber
1     Front           37661        31902
1     Rear            42342        41203
2     Front           31108        38816
2     Rear            41239        43305
3     Front           32903        35375
3     Rear            42658        52353
4     Front           29829        30883
4     Rear            39616        49424
5     Front           34625        38724
5     Rear            42650        43234
6     Front           31923        34565
6     Rear            39990        43861

t-Test: Paired Two Sample for Means

                               Old Rubber      New Rubber
Mean                           37212           40303.75
Variance                       23678506        43699518.3864
Observations                   12              12
Pearson Correlation            0.7364904091
Hypothesized Mean Difference   0
df                             11
t Stat                         -2.3950919344
P(T<=t) one-tail               0.0177699241
t Critical one-tail            1.7958848142
P(T<=t) two-tail               0.0355398482
t Critical two-tail            2.2009851587

The NULL Hypothesis here is that the mean tread wear of the old rubber equals the mean tread wear of the new rubber. The p-Values for both the one-tailed test (0.018) and the two-tailed test (0.036) are less than the level of significance (0.05), so the NULL Hypothesis is rejected. Therefore, we can say with 95% certainty that the new rubber compound increases tread life.

Problem:
Evaluate the returns of these two stocks to determine if there is a real difference. Use a 0.05 level of significance.

Viacom     GM
0.7541     -4.6296
14.9701    18.986
11.9792    -1.7226
7.907      -0.5535
-5.1724    6.679
3.4091     1.8261
0.7541     -4.6296
14.9701    18.986
11.9792    -1.7226
7.907      -0.5535
-5.1724    6.679
3.4091     1.8261
0.7541     -4.6296
14.9701    18.986
11.9792    -1.7226
7.907      -0.5535
-5.1724    6.679
3.4091     1.8261
3.4091     1.8261
0.7541     -4.6296
14.9701    18.986
11.9792    -1.7226
7.907      -0.5535
-5.1724    6.679
3.4091     1.8261
0.7541     -4.6296
14.9701    18.986
11.9792    -1.7226
7.907      -0.5535
-5.1724    6.679

t-Test: Two-Sample Assuming Unequal Variances

                               Viacom          GM
Mean                           5.6411833333    3.4309
Variance                       47.9538673497   62.498679055
Observations                   30              30
Hypothesized Mean Difference   0
df                             57
t Stat                         1.1519157329
P(T<=t) one-tail               0.1270821737
t Critical one-tail            1.6720288889
P(T<=t) two-tail               0.2541643475
t Critical two-tail            2.0024654439

The p-Values for both the one-tailed and two-tailed tests (0.127 and 0.254) are greater than the stated level of significance (0.05). So we cannot state with 95% certainty that there is a difference in the returns of these companies. The NULL Hypothesis that the means of both returns are equal is not rejected.

Problem:
A company is testing light bulbs from 2 suppliers. Listed below are the hours of usage before each sample burned out. Determine at a 0.05 level of significance whether the new supplier's light bulbs really last longer than the old supplier's.

Light Bulb Suppliers
Old   New
42    55
46    45
64    58
53    52
38    54
44    47
61    51
44    61
50    49
60    56
39    52
51    49
42
37
45
65
54
46
42
44
26
52

t-Test: Two-Sample Assuming Equal Variances

                               Old             New
Mean                           47.5            52.416666667
Variance                       90.5476190476   21.537878788
Observations                   22              12
Pooled Variance                66.8255208333
Hypothesized Mean Difference   0
df                             32
t Stat                         -1.675954
P(T<=t) one-tail               0.051746314
t Critical one-tail            1.6938887026
P(T<=t) two-tail               0.103492628
t Critical two-tail            2.0369333344

The one-tailed p-value is used because we are only testing whether one supplier's bulbs last longer. It is 0.0517: very close to, but slightly greater than, the stated level of significance (0.05). So we cannot reject the NULL Hypothesis, which states that the mean light bulb life is the same for both suppliers.
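The calculations in this section can be checked with a stdlib-only Python sketch; its function names are illustrative. It reproduces the dealer confidence interval, using `statistics.NormalDist().inv_cdf(0.975)` in place of Excel's NORMSINV(0.975). It also reproduces the t Stat (and df) values of the three Excel t-Test tools. Python's standard library has no t distribution, so the sketch does not compute the P(T<=t) values. Compare |t Stat| with the t Critical values in the Excel output instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev, variance


def mean_difference_ci(before, after, confidence=0.95):
    """Mean of (after - before), its standard error, and the z-based CI half-width around 0."""
    diffs = [a - b for a, b in zip(after, before)]
    se = stdev(diffs) / sqrt(len(diffs))                 # sample standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # 1.96 for 95%, like NORMSINV(0.975)
    return mean(diffs), se, z * se


def paired_t(x, y):
    """t Stat of Excel's 't-Test: Paired Two Sample for Means' (Variable 1 = x)."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


def welch_t(x, y):
    """t Stat and Welch-Satterthwaite df ('t-Test: Two-Sample Assuming Unequal Variances')."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def pooled_t(x, y):
    """t Stat, pooled variance and df ('t-Test: Two-Sample Assuming Equal Variances')."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))
    return t, pooled, nx + ny - 2


# The 30 dealer rows repeat the same 10 before/after pairs three times.
BEFORE = [100, 130, 120, 140, 155, 200, 300, 260, 190, 185] * 3
AFTER = [110, 135, 122, 157, 160, 206, 309, 283, 202, 192] * 3
# The paired example's "Before" and "After" columns are these Viacom and GM returns.
VIACOM = [0.7541, 14.9701, 11.9792, 7.907, -5.1724, 3.4091]
GM = [-4.6296, 18.986, -1.7226, -0.5535, 6.679, 1.8261]
OLD_RUBBER = [37661, 42342, 31108, 41239, 32903, 42658, 29829, 39616, 34625, 42650, 31923, 39990]
NEW_RUBBER = [31902, 41203, 38816, 43305, 35375, 52353, 30883, 49424, 38724, 43234, 34565, 43861]
OLD_BULBS = [42, 46, 64, 53, 38, 44, 61, 44, 50, 60, 39, 51, 42, 37, 45, 65, 54, 46, 42, 44, 26, 52]
NEW_BULBS = [55, 45, 58, 52, 54, 47, 51, 61, 49, 56, 52, 49]

mean_diff, se, half_width = mean_difference_ci(BEFORE, AFTER)
print(f"Mean difference {mean_diff:.2f}; 95% CI around 0: -{half_width:.2f} to +{half_width:.2f}")
print("Outside the interval (significant change):", abs(mean_diff) > half_width)

t_paired = paired_t(VIACOM, GM)
t_tires = paired_t(OLD_RUBBER, NEW_RUBBER)
# The 30-value Viacom/GM problem uses the six monthly returns five times each.
t_welch, df_welch = welch_t(VIACOM * 5, GM * 5)
t_bulbs, pooled_bulbs, df_bulbs = pooled_t(OLD_BULBS, NEW_BULBS)
print(f"paired (Viacom/GM) t = {t_paired:.4f}, tires t = {t_tires:.4f}")
print(f"Welch t = {t_welch:.4f}, df = {df_welch:.1f}; pooled t = {t_bulbs:.4f}, df = {df_bulbs}")
```

For example, the tire test gives |t Stat| = 2.395. That exceeds both t Critical values (1.796 one-tail, 2.201 two-tail), which is consistent with rejecting the NULL Hypothesis.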

Analysis of Variance - ANOVA

ANOVA is a technique for testing the equality of different population means. ANOVA is very useful because it can be
extened to any number of populations. All ANOVA test the NULL Hypothesis - that is - all samples drawn have the sam

ANOVA is often used by markets to tests whether different marketing campaigns with multiple varying elements actua

The NULL Hypothesis is rejected - that is - there are real differences between the means - if the p-Value pertaining to t
item being evaluated is less than the desired level of significance. For example, in the 1st ANOVA below, the p-Value
petaining to "Between Methods (Groups) is less than the desired lever of significance - So there is a difference betwe

Anova: Single Factor - Single Factor Analysis Calculated by Excel


The Hand Calculation of this ANOVA is performed at the bottom of this worksheet

Students
Problem: 3 different sale training methods are used. Three groups of
four randomly chosen new saleppeople are chosen. Each
group is trained using one of the methods. After the course
is completed, sales totals of each salesperson over the
next two weeks is collected.

1
2
3
4

Determine within a 0.05 level of significance whether there


is a difference in the effectiveness of the courses.

Anova: Single Factor


SUMMARY
Groups
Method 1
Method 2
Method 3

Count

Sum
4
4
4

68
80
92

ANOVA
Source of Variation
Between Groups
Within Groups
Total

SS

df
72
46

2
9

118

11

The p-Value for Methods (Between Groups, which are the Methods) (0.011419201) is much less than the level of signi
so there is a difference between the effectiveness of the teaching methods..

The p-Value calculated by Excel agrees with the hand-calculated p-Value, which is less than the level of significance.
difference in the effectiveness between the courses.

Anova: Two Factor - Two Factor Without Replication


Two factors are being evaluated and each test is performed only once.

Problem:

Here are 3 different types of typing keyboards.


5 Typists each got to use all three keyboards. Here
are the typing speeds of each typist on of of the 3
keyboard types. Determine at a 0.01 level of
significance (99% certainty) whether typing speed
differs between the 3 keyboard type.

Typist 1
Typist 2
Typist 3
Typist 4
Typist 5

In this example, the two factors that influence the speed of typing are 1) the keyboard, and 2) the typing ability of each

Anova: Two-Factor Without Replication


SUMMARY
Typist 1
Typist 2
Typist 3
Typist 4
Typist 5

Count

Keyboard A
Keyboard B
Keyboard C

Sum
3
3
3
3
3

180
338
141
303
216

5
5
5

375
379
424

ANOVA
Source of Variation
Rows
Columns
Error
Total

SS

df
9151.0666666667
296.1333333333
94.5333333333

4
2
8

9541.7333333333

14

The p-Value for the Rows (5.42004E-08) is much less than the level of significance (0.05) so there is a difference betwe

The p-Value for columns (0.003428581) is much less than the level of significance (0.05) so there is a difference betwe

Anova: Two Factor - Two Factor With Replication

Two factors are being evaluated and the tests are performed more than once (in this case, each test is performed in tw

Problem

A Perfume company was testing a product using


3 different advertising focuses (Sophisticated, Athletic, PopularDesign 1
3 different package Designs, and testing 2 separate
markets. Using a 0.05 level of significance,
Design 2
determine 1) Advertising Focus, 2) Package Design,
or 3) the Interaction between them had any affect
Design 3
on sales. The chart shows the sales with each
combination in each of the two markets.

Anova: Two-Factor With Replication


SUMMARY

Sophisticated

Athletic

Design 1

Count
Sum
Average
Variance

2
5.53
2.765
0.00245

2
3.37
1.685
0.25205

2
5.97
2.985
0.18605

2
2.9
1.45
0.005

2
5.13
2.565
0.00125

2
6.03
3.015
0.03645

6
16.63
2.7716666667
0.0732566667

6
12.3
2.05
0.62848

Design 2

Count
Sum
Average
Variance
Design 3

Count
Sum
Average
Variance
Total

Count
Sum
Average
Variance

ANOVA
Source of Variation
Sample
Columns
Interaction
Within

SS

Total

df
0.8072111111
4.9910777778
2.2771222222
1.0447

2
2
4
9

9.1201111111

17

The p-Value for Sample (0.076062) is more than the level of significance (0.05). We cannot reject the NULL Hypothesis

The p-value for Columns (0.00037339) is less than the level of significance (0.05). This indicates that that overall adve
The p-Value for Interaction (0.022409) is less than the level of significance. This indicates that different combinations

Anova: Single Factor - Single Factor Analysis Calculated by Hand


( Excel calculation of Single Factor ANOVA is shown at the top of this Worksheet)
Problem:

3 different sale training methods are used. Three groups of


four randomly chosen new saleppeople are chosen. Each
group is trained using one of the methods. After the course
is completed, sales totals of each salesperson over the
next two weeks is collected.
Determine within a 0.05 level of significance whether there
is a difference in the effectiveness of the courses.

Column Total

Method 1
16
21
18
13
68

Column Mean

17

Grand Mean = (17 + 20 + 23) / 3

Grand Mean =

20

Column Mean - Grand Mean

-3

(Column Mean - Grand Mean)^2

# Rows x [ (Column Mean - Grand Mean)^2 ]

36

Sum of Squares Between Groups = 36 + 0 + 36 =

72

Degrees of Freedom
Between Groups DOF = # groups - 1 = c - 1 = 3 - 1 =

Within Groups DOF = C(r-1) = 3 (4 - 1) =

Total Degrees of Freedom =

11

Sum of Squares
Between Groups Sum of the Squares
Sum of Squares Within Groups

72
46

Total Sum of the Squares

118

Mean Squares
MS = Mean Square = Sum of Square / degrees of freedom
SS
72
46

df
2
9

F Statistic
F Statistic = (MS Between Group) / (MS Within Groups)
F Statistic = 36 / 5.111111 =

7.0434782609

p Value
p-Value = FDIST(F Statistic,DOF Between Groups,DOF Within Groups) =

p-Value = FDIST(7.043478,2,9) =

0.0144192029

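The p-value of 0.014419 is less than the designated level of significance of 0.05. This indicates that there is less than a 5% chance that this result could have occurred
if there were no difference in effectiveness between the courses. Therefore, at the 0.05 level of significance, we conclude that there is a real difference in the effectiveness of the courses.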
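The hand calculation above can be sketched in a few lines of standard-library Python. This is an illustrative reimplementation of the same steps (SS between, SS within, degrees of freedom, MS, F). It is not Excel's own routine, and the function name is hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_stat)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between groups: group size x squared distance of the group mean from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: squared deviation of each value from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


methods = [[16, 21, 18, 13], [19, 20, 21, 20], [24, 21, 22, 25]]
ssb, ssw, dfb, dfw, f = one_way_anova(methods)
print(ssb, ssw, dfb, dfw, round(f, 4))
```

The standard library has no F distribution. If SciPy is available, `scipy.stats.f.sf(f, dfb, dfw)` returns the same right-tail area that FDIST reports.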


Teaching Method
Method 1   Method 2   Method 3
16         19         24
21         20         21
18         21         22
13         20         25

SUMMARY
Groups     Average   Variance
Method 1   17        11.3333333
Method 2   20        0.66666667
Method 3   23        3.33333333

ANOVA
Source of Variation   MS             F            P-value       F crit
Between Groups        36             7.04347826   0.014419201   4.2564947291
Within Groups         5.1111111111

The p-value (0.014419) is less than the level of significance (0.05). This indicates that there is a real difference between the teaching methods.

Keyboard A   Keyboard B   Keyboard C
51           57           72
109          112          117
47           43           51
98           98           107
70           69           77


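SUMMARY       Average       Variance
Row 1         60            117
Row 2         112.6666667   16.3333333
Row 3         47            16
Row 4         101           27
Row 5         72            19

Keyboard A    75            767.5
Keyboard B    75.8          819.7
Keyboard C    84.8          724.2

ANOVA
Source of Variation   MS            F            P-value       F crit
Rows                  2287.766667   193.605078   5.42004E-08   7.0060766231
Columns               148.0666667   12.5303244   0.003428581   8.6491106407
Error                 11.81666667

The p-value for Rows (5.42E-08) is far below the level of significance, which indicates that there is a difference between the speed of each typist.
The p-value for Columns (0.003429) is also below the level of significance, which indicates that there is a difference between keyboards regarding typing speed.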
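The keyboard results follow the Two-Factor Without Replication layout: one observation per typist (row) and keyboard (column). The sketch below reproduces the MS and F values in the ANOVA table with standard-library Python. The function name is illustrative, and the error term is obtained by subtraction.

```python
from statistics import mean

# Rows = typists, columns = Keyboards A, B, C (one observation per cell)
typing = [
    [51, 57, 72],
    [109, 112, 117],
    [47, 43, 51],
    [98, 98, 107],
    [70, 69, 77],
]


def two_factor_without_replication(table):
    """Return (MS rows, MS columns, MS error, F rows, F columns)."""
    r, c = len(table), len(table[0])
    values = [x for row in table for x in row]
    grand = mean(values)
    row_means = [mean(row) for row in table]
    col_means = [mean(row[j] for row in table) for j in range(c)]
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_error = ss_total - ss_rows - ss_cols  # what the two factors do not explain
    df_rows, df_cols = r - 1, c - 1
    ms_rows = ss_rows / df_rows
    ms_cols = ss_cols / df_cols
    ms_error = ss_error / (df_rows * df_cols)
    return ms_rows, ms_cols, ms_error, ms_rows / ms_error, ms_cols / ms_error


ms_rows, ms_cols, ms_error, f_rows, f_cols = two_factor_without_replication(typing)
print(round(ms_rows, 6), round(ms_cols, 6), round(ms_error, 6))
print(round(f_rows, 4), round(f_cols, 4))
```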


Sophisticated   Popular   Athletic
2.80            1.58      2.04
2.73            1.26      1.33
3.29            1.00      1.50
2.68            1.82      1.40
2.54            1.92      3.15
2.59            1.33      2.88

SUMMARY (Popular and Total columns)
Rows        Column    Count   Sum     Average      Variance
Rows 1-2    Popular   2       2.84    1.42         0.0512
Rows 1-2    Total     6       11.74   1.95666667   0.46722667
Rows 3-4    Popular   2       2.82    1.41         0.3362
Rows 3-4    Total     6       11.69   1.94833333   0.75057667
Rows 5-6    Popular   2       3.25    1.625        0.17405
Rows 5-6    Total     6       14.41   2.40166667   0.44477667
Total       Popular   6       8.91    1.485        0.12407

Use "2 Rows Per Sample"

ANOVA (continued)
Source of Variation   MS            F            P-value       F crit
Sample                0.403605556   3.4770269    0.076062669   4.2564947291
Columns               2.495538889   21.4988513   0.00037339    4.2564947291
Interaction           0.569280556   4.90430267   0.022409688   3.6330885115
Within                0.116077778



The p-Value represents the proportion of area under the F Distribution curve to the right of the given F value.
If this p-Value is less than the stated level of significance, this demonstrates that there is a difference
in the objects or processes being analyzed - in other words, the group means are not all equal.


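Sum of Squares Within Groups - working

Method 1   Method 2   Method 3
16         19         24
21         20         21
18         21         22
13         20         25
Column Total   68     80     92
Column Mean    17     20     23

Subtract each column mean from each value in that column:
Method 1        Method 2        Method 3
16 - 17 = -1    19 - 20 = -1    24 - 23 =  1
21 - 17 =  4    20 - 20 =  0    21 - 23 = -2
18 - 17 =  1    21 - 20 =  1    22 - 23 = -1
13 - 17 = -4    20 - 20 =  0    25 - 23 =  2

Square each:
Method 1   Method 2   Method 3
1          1          1
16         0          4
1          1          1
16         0          4
Sum: 34    2          10

Sum of Squares Within Groups = 34 + 2 + 10 = 46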

Determining if Population Variance Has Changed - Uses Chi Squared Distribution

Quality control people use the Chi Square test to determine if a process's variance levels are staying within given limits.

The Chi Square Distribution is used to determine if a population's variance has changed. The Chi Square Distribution is skewed to the right. Its mean equals the number of degrees of freedom (n - 1 --> Sample Size - 1), and its high point lies just to the left of that value. The total area under the Chi Square curve is 1.0.
The area under the curve to the left or right of the outer limits determines whether it can be said with a certain degree of confidence that the population variance has changed.
If the area outside the Chi Square Statistic (the p value) is less than the desired level of significance, then we conclude that the population variance has changed.

If the Sample Standard Deviation, s, is greater than the Population Standard Deviation, σ, then the Chi Square Statistic will be to the right of (greater than) the degrees of freedom point,
and the p value produced by CHIDIST(Chi Square Statistic, degrees of freedom) will be the p value of the right tail.

If the Sample Standard Deviation, s, is less than the Population Standard Deviation, σ, then the Chi Square Statistic will be to the left of (less than) the degrees of freedom point,
and the p value produced by CHIDIST(Chi Square Statistic, degrees of freedom) will still be the area under the Chi Square curve to the right of the Chi Square Statistic point.
To get the area under the left tail (the area to the left of the Chi Square point), use p-value = 1 - CHIDIST(Chi Square Statistic, degrees of freedom).

Test on Whether a Population Variance Has Increased Above a Given Value

Problem: A manufacturer wants to check if the variance of a process has changed. A machine drills a hole as part of the manufacturing process.
The standard deviation of the hole diameter has historically been 1.6 ml.
A random sample of 50 hole diameters was checked in one batch. The measured sample standard deviation was 1.9 ml.
At a 0.05 level of significance, has the population standard deviation increased above 1.6 ml?
Givens:
n = 50
Degrees of Freedom = n - 1 = 49
Level of Significance, α = 0.05
Population Standard Deviation, σ = 1.6
Sample Standard Deviation, s = 1.9

Use the Chi Squared Test to determine if there has been a change in variance.
1) Calculate the Chi Square Statistic, χ² = [ (n-1)*(s*s) ] / (σ*σ) = 69.09766

2) Obtain the p value from the Chi Square Statistic
Upper p value = CHIDIST(69.09766,49) = 0.030749

This p value states the portion of the total area under the Chi Square distribution curve for 49 degrees of freedom to the right of the Chi Square Statistic.
The Chi Square Statistic is calculated from the degrees of freedom (n - 1), the population standard deviation, and the sample standard deviation.
If the p value (the area under the Chi Square distribution curve to the right of the Chi Square Statistic on that curve) is
greater than the level of significance we are evaluating (α = 0.05 on a one-tailed test), then we fail to reject the NULL Hypothesis.
In this case the p value (0.030749) is less than the desired level of significance (α = 0.05), and we reject the NULL Hypothesis.

It appears that the population standard deviation has increased above 1.6 ml.

Test on Whether a Population Variance Has Decreased Below a Given Value

Problem: A manufacturer wants to check if the variance of a process has changed. A machine drills a hole as part of the manufacturing process.
The standard deviation of the hole diameter has historically been 1.6 ml. The engineers believe that they have improved the process.
A random sample of 50 hole diameters was checked in one batch. The measured sample standard deviation was 1.375 ml.
At a 0.05 level of significance, has the population standard deviation decreased below 1.6 ml?
Givens:
n = 50
Degrees of Freedom = n - 1 = 49
Level of Significance, α = 0.05
Population Standard Deviation, σ = 1.6
Sample Standard Deviation, s = 1.375

Use the Chi Squared Test to determine if there has been a change in variance.
1) Calculate the Chi Square Statistic, χ² = [ (n-1)*(s*s) ] / (σ*σ) = 36.18774

2) Obtain the p value from the Chi Square Statistic
Area under curve to the right = CHIDIST(36.18774,49) = 0.912951

p value = Area to the left of the Chi Square point = 1 - CHIDIST(36.18774,49) = 0.087049

This p value states the portion of the total area under the Chi Square distribution curve for 49 degrees of freedom to the left of the Chi Square Statistic.
The Chi Square Statistic is calculated from the degrees of freedom (n - 1), the population standard deviation, and the sample standard deviation.
If the p value (the area under the Chi Square distribution curve to the left of the Chi Square Statistic on that curve) is
greater than the level of significance we are evaluating (α = 0.05 on a one-tailed test), then we fail to reject the NULL Hypothesis.
In this case the p value (0.087049) is greater than the desired level of significance (α = 0.05), and we do not reject the NULL Hypothesis that there has been no change.
It appears that the population standard deviation has not decreased below 1.6 ml.
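The two variance tests can be sketched in standard-library Python. The standard library has no chi-square function, so the left-tail area below is computed from the power series for the regularized lower incomplete gamma function, P(df/2, x/2). This is an assumed stand-in for Excel's CHIDIST (CHIDIST corresponds to `1 - left area`), adequate for moderate arguments like the ones here. The function names are hypothetical.

```python
import math


def chi_square_statistic(n, s, sigma):
    """Chi Square Statistic = (n - 1) * s**2 / sigma**2."""
    return (n - 1) * s ** 2 / sigma ** 2


def chi_square_left_area(x, df):
    """Area under the Chi Square curve (df degrees of freedom) to the LEFT of x.

    Series for the regularized lower incomplete gamma function P(df/2, x/2).
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > total * 1e-16:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi_square_right_area(x, df):
    """Right-tail area - the quantity CHIDIST(x, df) reports."""
    return 1.0 - chi_square_left_area(x, df)


# Increase test: s = 1.9 vs sigma = 1.6, n = 50 -> use the right tail
up = chi_square_statistic(50, 1.9, 1.6)
p_up = chi_square_right_area(up, 49)

# Decrease test: s = 1.375 vs sigma = 1.6, n = 50 -> use the left tail
down = chi_square_statistic(50, 1.375, 1.6)
p_down = chi_square_left_area(down, 49)

print(round(up, 5), round(p_up, 6))
print(round(down, 5), round(p_down, 6))
```

With SciPy installed, `scipy.stats.chi2.sf(x, df)` and `scipy.stats.chi2.cdf(x, df)` give the same right- and left-tail areas.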


Normal Distribution

The Normal distribution is a continuous distribution, as opposed to a discrete distribution such as the binomial distribution, which is a set of discrete points.
Any Normal distribution can be identified by two parameters - the mean and the standard deviation.
The area under the entire density function = 1.

Most problems involving the Normal distribution fall into two categories:
1) Determining the probability of a normally distributed random variable having a value within a given interval

2) Determining a Confidence Interval - that is - determining an interval within which the value of a normally distributed random variable will fall with a given probability

To be able to apply the Normal distribution, it is extremely important that the underlying population can be proven to be normally distributed. This is often not the case.

For any population, whether Normally distributed or not, the distribution of x bar (the average of each sample) will be approximately
Normally distributed if the sample size is large (30 or more).
This is a basic tenet of the Central Limit Theorem - Statistics' most fundamental rule.
It is important to note that the problems on this page do not deal with samples. These problems only use parameters of the entire populations.

z = number of standard deviations that a point lies from the mean

Population Mean = μ = "mu"
Population Standard Deviation = σ = "sigma"
z = (x - μ) / σ
  = ( x - mean ) / ( Length of 1 Standard Deviation )

The z distribution, sometimes called the standard normal distribution, is a normal distribution with the mean, μ, = 0 and the standard deviation, σ, = 1.

Population parameters are generally described with Greek letters, such as μ (population mean) and σ (population standard deviation),
while sample statistics are generally described with Roman letters, such as x bar (sample mean) and s (sample standard deviation).
Statistical Function NORMSDIST(z) tells what percentage of the total area of the standardized normal curve (mean = 0 and standard deviation length = 1)
is to the left of a point z standard deviations from the mean, which is 0.
NORMSDIST(0) = 0.5
This means that half of the area under the standardized normal curve exists to the left of z when z = 0 (z is exactly on top of the mean, that is, 0 standard deviations away from the mean).

NORMSDIST(1.96) = 0.975
This means that 97.5% of the total area under the standardized normal curve is to the left of z when z is 1.96 standard deviations from the mean.
This point of z = 1.96 is often used to calculate the 95% Confidence Interval. That is, the section under the normal curve that starts 1.96
standard deviations to the left of the mean and extends to 1.96 standard deviations to the right of the mean will contain
95% of the total area under the bell shaped Normal curve.

Statistical Function NORMSINV() works in the opposite direction. Given a percentage (expressed as a decimal) as its argument, it tells how many standard deviations from the mean a point on the standardized normal curve must lie so that the area under the curve to the left of that point equals that percentage.
NORMSINV(0.975) = 1.96

This means that 97.5% of the total area under the normal curve is to the left of the point 1.96 standard deviations from the mean.

Statistical Function NORMDIST(x, mean, standard dev, TRUE) will calculate the area under the curve to the left of point x on a normal curve with the given mean and standard deviation.
(The TRUE argument requests the cumulative area - this is nearly always TRUE.)
NORMDIST(1.96,0,1,TRUE) = 0.975

Setting the mean to 0 and the standard deviation to 1 makes it a standardized Normal curve, like the above problem.
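Outside Excel, Python's `statistics.NormalDist` (Python 3.8+) covers the same ground. The mapping in the comments below is an assumed correspondence for illustration, not an official equivalence table. The helper `z_score` is hypothetical.

```python
from statistics import NormalDist

std = NormalDist()  # mean 0, standard deviation 1: the standardized Normal curve

# Assumed correspondence with the worksheet functions:
#   NORMSDIST(z)               -> std.cdf(z)
#   NORMSINV(p)                -> std.inv_cdf(p)
#   NORMDIST(x, mu, sd, TRUE)  -> NormalDist(mu, sd).cdf(x)
#   NORMINV(p, mu, sd)         -> NormalDist(mu, sd).inv_cdf(p)


def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean: z = (x - mu) / sigma."""
    return (x - mu) / sigma


print(std.cdf(0))                    # area to the left of the mean
print(round(std.cdf(1.96), 4))       # like NORMSDIST(1.96)
print(round(std.inv_cdf(0.975), 2))  # like NORMSINV(0.975)
print(z_score(1000, 2000, 500))      # x = 1,000 is 2 standard deviations below a mean of 2,000
```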

Problem: A store has normally distributed daily sales. The average daily sales = $2,000 and the daily sales standard deviation = $500.
What is the probability that the sales of one random day will be below $1,000?
Population Mean = μ = "mu" = $2,000
Population Standard Deviation = σ = "sigma" = $500
x = $1,000
NORMDIST(1000,2000,500,TRUE) = 0.02275 = 2.28%

This can be interpreted by saying that only 2.28% of the total area under this particular Normal curve falls to the left of x = 1,000.

Problem: A brand of car has a mean fuel consumption of 27 mpg with a standard deviation of 5 mpg.
What percentage of the cars can be expected to have a fuel consumption of between 25 mpg and 30 mpg?
Fuel consumption is normally distributed for this population.
Percentage of cars with fuel efficiency between 25 mpg and 30 mpg =
Percentage of cars with fuel efficiency less than 30 mpg - Percentage of cars with fuel efficiency less than 25 mpg =
NORMDIST(30,27,5,TRUE) - NORMDIST(25,27,5,TRUE) = 0.725747 - 0.344578 = 0.381169 = 38.12%
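Both problems above reduce to cumulative areas under a Normal curve with a given mean and standard deviation. Here is a minimal sketch using `statistics.NormalDist`, assuming (as the worksheet does) that both populations are normally distributed.

```python
from statistics import NormalDist

# Store sales: mean $2,000, standard deviation $500; P(one day's sales < $1,000)
p_low_sales = NormalDist(2000, 500).cdf(1000)

# Fuel consumption: mean 27 mpg, standard deviation 5 mpg; P(25 mpg < x < 30 mpg)
fuel = NormalDist(27, 5)
p_between = fuel.cdf(30) - fuel.cdf(25)

print(f"{p_low_sales:.2%}")   # about 2.28%
print(f"{p_between:.2%}")     # about 38.12%
```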

For the regular Normal curve, x = μ + zσ.

The standardized Normal curve has μ = 0 and σ = 1.

Statistical Function NORMINV(probability, mean, standard_dev) does the same job as NORMSINV() for a Normal curve with any mean and standard deviation. It returns the point x such that the area under that curve to the left of x equals the given probability.
NORMINV(0.975,0,1) = 1.96

This means that 97.5% of the total area under the standardized normal curve is to the left of the point 1.96 standard deviations from the mean.

Problem: A company's package delivery time is normally distributed with a mean of 10 hours and a standard deviation of 3 hours.
What delivery time will be beaten by only 2.5% of all deliveries?
μ = 10
σ = 3
NORMINV(0.025,10,3) = 4.12

Meaning that only 2.5% of all package delivery times will be quicker than 4.12 hours.

Problem: A tire company makes a tire with a normally distributed tread life that has a mean of 39,000 miles and a standard deviation of 5,300 miles.
What tread life would be exceeded by 98% of all tires?
μ = 39,000
σ = 5,300
NORMINV(0.02,39000,5300) = 28115

Meaning that only 2% of all tires will wear out before 28,115 miles.

Problem: A tire company makes a tire with a normally distributed tread life that has a mean of 39,000 miles and a standard deviation of 5,300 miles.
What would the range of tread life be that 95% of all tires would wear out in?
μ = 39,000
σ = 5,300
Calculation of the left boundary:
NORMINV(0.025,39000,5300) = 28612

Meaning that only 2.5% of all tires will wear out before 28,612 miles.

Calculation of the right boundary:
NORMINV(0.975,39000,5300) = 49388

Meaning that only 2.5% of all tires will wear out after 49,388 miles.

So, 95% of tires will wear out in the range of 28,612 miles to 49,388 miles.
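The three NORMINV problems invert the cumulative area. `NormalDist.inv_cdf` plays that role in this sketch. The parameters are the worksheet's (10 ± 3 hours, and 39,000 ± 5,300 miles).

```python
from statistics import NormalDist

delivery = NormalDist(10, 3)      # hours
tires = NormalDist(39000, 5300)   # miles

fast_cutoff = delivery.inv_cdf(0.025)                   # beaten by only 2.5% of deliveries
worn_2pct = tires.inv_cdf(0.02)                         # exceeded by 98% of tires
low, high = tires.inv_cdf(0.025), tires.inv_cdf(0.975)  # middle 95% of tread lives

print(round(fast_cutoff, 2), round(worn_2pct), round(low), round(high))
```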


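Confidence Intervals
Collection of 40 individual test scores
210   340   490   610
220   370   500   640
230   370   500   640
240   380   510   640
270   400   510   650
300   410   540   660
300   410   540   660
320   450   580   750
320   470   580   750
320   470   610   790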

Calculate with 95% certainty an interval in which the population mean must fall,
based upon a random sample of 40 test scores taken from that population.

In other words, calculate a 95% Confidence Interval for the population mean.

The sample must be random and representative of the population, and its size must be at least 30 (needed to be able to use the z score for the Normal Distribution).
Sample size (COUNT) = 40
Sample Standard Deviation (STDEV) = 159.48
α = (1 - Confidence Level) = 0.05 (for a 95% Confidence Interval)
Mean (AVERAGE) = 473.75

Excel calculates the half-width of the Confidence Interval to be 49.42 using the following statistical function: CONFIDENCE(alpha, standard_dev, size)
Inputs for this function are CONFIDENCE(0.05,159.48,40) = 49.42

Let's see how Excel's calculation holds up to the manual calculation of the Confidence Interval from this sample
(Excel hits it just about right on):

The 95% Confidence Interval around a Sample Mean of x bar = x bar +/- (Z Score for 95% Confidence Interval) * (Sample Standard Error)
Z Score for 95% Confidence Interval (two sided) = Z(0.975) = NORMSINV(0.975) = 1.96   (Insert / Function / NORMSINV)
Sample Standard Error = (Sample Standard Deviation) / (Square Root of Sample Size)
Sample Standard Error = (159.48) / (Square Root [40]) = 25.21
Confidence Interval = Sample Mean +/- Z Score(95% Confidence Interval) * (Sample Standard Error)
Confidence Interval = 473.75 +/- (1.96) x (25.21) = 473.75 +/- 49.41 = 424.34 to 523.16
(Excel's answer of 49.42 is pretty close to the manual calculation of 49.41.)

This means that we can be 95% confident that the mean of the entire population
is between the endpoints of this 95% Confidence Interval.

Statistically this is written as:


Confidence Interval = Sample Mean +/- Z α/2 * (Sample Standard Deviation / Square root of Sample Size)
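The formula above can be checked against the 40 scores with the sketch below. It follows the worksheet's large-sample z approach (not a t interval), and the helper name is hypothetical. The computed half-width corresponds to what CONFIDENCE returns.

```python
import math
from statistics import NormalDist, mean, stdev

# The 40 test scores from the worksheet, row by row
scores = [
    210, 340, 490, 610, 220, 370, 500, 640, 230, 370, 500, 640,
    240, 380, 510, 640, 270, 400, 510, 650, 300, 410, 540, 660,
    300, 410, 540, 660, 320, 450, 580, 750, 320, 470, 580, 750,
    320, 470, 610, 790,
]


def mean_confidence_interval(data, confidence=0.95):
    """Large-sample z interval: x bar +/- z * s / sqrt(n)."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation, like STDEV
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed z, like NORMSINV(0.975)
    margin = z * s / math.sqrt(n)  # the half-width CONFIDENCE reports
    return x_bar, s, margin, (x_bar - margin, x_bar + margin)


x_bar, s, margin, (lo, hi) = mean_confidence_interval(scores)
print(x_bar, round(s, 2), round(margin, 2), round(lo, 2), round(hi, 2))
```

The unrounded z (1.95996...) gives a half-width of about 49.42, matching Excel rather than the hand value of 49.41 that used z = 1.96 and a rounded standard error.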

Getting Z Score for Two-Tailed 95% Confidence Interval


A two-tailed 95% confidence interval will have 2.5% of the total curve area in each tail.
Therefore this Z Score corresponds to 97.5% of the total area to the left of Z.

Z Score for two-tailed 95% confidence interval = NORMSINV(0.975) = 1.96


(NORMSINV - Input is the percentage (expressed as a decimal)
of area under the standardized normal curve to the left of Z = 0.975)
Standardized normal curve --> Mean = 0, Standard Deviation Length = 1

Getting Z Score for One-Tailed 95% Confidence Interval


A one-tailed 95% confidence interval will have 5% of the total curve area in the right tail.
Therefore this Z Score corresponds to 95% of the total area to the left of Z.

Z Score for one-tailed 95% confidence interval = NORMSINV(0.95) = 1.645


(NORMSINV - Input is the percentage (expressed as a decimal)
of area under the standardized normal curve to the left of Z = 0.95)

Determining Sample Size (n) for a Given Confidence Level and Bound (B)
n = number of samples needed to establish a specified confidence interval of width B on either side of the mean
e.g. How many samples must be taken to estimate the population diameter (of, for example,
holes drilled by a machine) to within 0.05 mm of the mean sample diameter with 99% confidence?
The standard deviation (determined from previous sampling) is 0.75 mm.
n = [ (Z score of two-tailed 99% confidence)**2 x (sample standard deviation)**2 ] / [Interval**2]
Z score of two-tailed 99% confidence = NORMSINV(0.995) = 2.5758
n = [ (2.5758)**2 x (0.75)**2 ] / [ (0.05)**2 ] = 1,492.8, which rounds up to 1,493 samples

Problem: A restaurant owner wants to estimate within $2.00 the average amount that customers spend during lunch.
From experience, the standard deviation of the population is $5.00. How many samples need to be taken to get a sample average lunch expenditure
that is 92% certain of being within $2.00 of the population mean?
Z score of two-tailed 92% confidence = NORMSINV(0.96) = 1.751
Population Standard Deviation = 5.00
Interval = 2.00

n = [ (Z score of two-tailed 92% confidence)**2 x (population standard deviation)**2 ] / [Interval**2]


n = [ (1.751)**2 x (5.00)**2 ] / [ (2.00)**2 ] = 19.16, which rounds up to 20 samples

However, at least 30 samples should be taken unless you know for certain that the underlying population is normally distributed.
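Both sample-size problems use n = (z * σ / B)**2, rounded up to the next whole sample. The sketch below (helper name hypothetical) uses the unrounded z from `NormalDist`. That is why the drilled-hole case comes out at 1,493.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(confidence, sigma, bound):
    """n = z**2 * sigma**2 / B**2 for a two-tailed interval, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / bound) ** 2)


holes = sample_size_for_mean(0.99, 0.75, 0.05)   # drilled holes, 99% confidence
lunch = sample_size_for_mean(0.92, 5.00, 2.00)   # restaurant lunches, 92% confidence
print(holes, lunch)
```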


Binomial Distribution

Binomial distributions are collections of discrete values as opposed to, for example, the Normal distribution, which is continuous.

Any binomial distribution can be identified by the values of two of its parameters - the number of trials (n) and the probability of success on a single trial (p).

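Random Number Generator

Tools / Data Analysis / Random Number Generator

In this case, generate 5 random numbers, each with possible outcomes of 2 or 3. Each event has a 20% probability of a "2" outcome and an 80% probability of a "3" outcome.
(You could easily do the same thing with outputs of 1 and 0 - measuring something occurring or not occurring.)
Output: 3, 3, 3, 2, 2

Number of variables = 1
(In the 1 / 0 version, the value of the 1 variable is 1 or 0)
Number of random numbers = 5

Outcome   Probability
2         0.2   This = p - the probability that the outcome of the event will be a success ("1" and not "0" in the 1 / 0 version)
3         0.8   This = q - the probability that the outcome of the event will not be a success ("0" and not "1")

Distribution type is Discrete

Value and probability input range - the yellow highlighted cells
Output range - highlight the tan range

Sum of 2's = 2
Statistical function COUNTIF - select the range of outputs to be counted and then select the cell that has the output to be counted (where outcome = 2).
The sum is the number of successes in 5 random trials, each having a 0.20 chance of a "2" outcome.

This sum is a binomially distributed random variable.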
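The Random Number Generator setup can be mimicked with Python's `random` module. This is a sketch of the same experiment, not the Analysis ToolPak itself: each trial yields 2 with probability 0.2, otherwise 3, and counting the 2s plays the role of COUNTIF. The function name and the seed are arbitrary choices.

```python
import random


def simulate_trials(n_trials, p_two, seed=None):
    """Each trial returns 2 with probability p_two, otherwise 3."""
    rng = random.Random(seed)
    return [2 if rng.random() < p_two else 3 for _ in range(n_trials)]


outcomes = simulate_trials(5, 0.2, seed=1)
successes = outcomes.count(2)  # like COUNTIF(output range, 2)
print(outcomes, successes)
```

Repeating the experiment many times and averaging the count of 2s should approach n * p = 5 * 0.2 = 1, the mean of this binomial random variable.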

Calculating the probability of a certain number of a given outcome occurring
in a certain number of trials
if the probability of that outcome on a single trial is known.

Problem: What is the probability of 3 successful outcomes in 5 trials if the probability of a successful outcome in 1 trial is 20%?
s = number of successes = 3
n = number of trials = 5
p = probability of successful outcome on 1 trial = 0.2
Find cumulative distribution? (NO) - Use 0

Probability of this is = BINOMDIST(3,5,0.2,0) = 0.0512
Statistical Function / BINOMDIST (in this case, you don't want the cumulative distribution - use 0 as the last argument)
Which is = 5.12%   (Format / Cells / Percentage)
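BINOMDIST computes the binomial formula P(s) = C(n, s) p^s (1 - p)^(n - s). The minimal sketch below uses `math.comb` (Python 3.8+). The function names are hypothetical, and the argument order mirrors BINOMDIST(s, n, p, ...).

```python
from math import comb


def binom_pmf(s, n, p):
    """P(exactly s successes in n trials) - like BINOMDIST(s, n, p, FALSE)."""
    return comb(n, s) * p ** s * (1 - p) ** (n - s)


def binom_cdf(s, n, p):
    """P(s or fewer successes) - like BINOMDIST(s, n, p, TRUE)."""
    return sum(binom_pmf(k, n, p) for k in range(s + 1))


print(f"{binom_pmf(3, 5, 0.2):.2%}")  # 3 successes in 5 trials with p = 0.2
```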

Problem - In 12 trials (n = 12), what is the probability that at least 10 of them (the sum of the probabilities that s = 10, s = 11, and s = 12)
will have the one of the 2 possible outcomes that has a probability of occurring of 65%?
The probabilities of each outcome need to be added up.
s            10         11         12
p            0.65
n            12
Cumulative   0 (FALSE)
P(s)         0.108846   0.036753   0.005688009
Sum = 0.151288. This represents a combined probability of 15.13%.
Statistical function BINOMDIST(s,n,p,FALSE)
BINOMDIST(10,12,0.65,0) + BINOMDIST(11,12,0.65,0) + BINOMDIST(12,12,0.65,0)
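"At least 10 of 12" is the sum of three individual binomial probabilities, exactly as the three BINOMDIST calls above add them. Here is a short standard-library sketch (helper name hypothetical).

```python
from math import comb


def binom_pmf(s, n, p):
    """Exact binomial probability of s successes in n trials."""
    return comb(n, s) * p ** s * (1 - p) ** (n - s)


# P(10) + P(11) + P(12) with n = 12 and p = 0.65
at_least_10 = sum(binom_pmf(s, 12, 0.65) for s in (10, 11, 12))
print(f"{at_least_10:.4f}")  # about 0.1513 (15.13%)
```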

Problem - What is the probability of getting between 4 and 6 heads on 10 flips of a fair coin?


Probability of getting between 4 and 6 heads = P(4) + P(5) + P(6)
This also equals [ P(0) + P(1) + P(2) + P(3) + P(4) + P(5) + P(6) ] - [ P(0) + P(1) + P(2) + P(3) ]
This equals [ Cumulative probability of P(6) ] - [ Cumulative probability of P(3) ]

s = 6, p = 0.5, n = 10, cumulative = 1 (TRUE):   BINOMDIST(6,10,0.5,1) = 0.828125
s = 3, p = 0.5, n = 10, cumulative = 1 (TRUE):   BINOMDIST(3,10,0.5,1) = 0.171875

Equals 0.828125 - 0.171875 = 0.65625 = 65.63%
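The difference of two cumulative probabilities can be sketched directly. The helper below is a hypothetical stand-in for BINOMDIST(s, n, p, TRUE).

```python
from math import comb


def binom_cdf(s, n, p):
    """Cumulative probability of s or fewer successes (like BINOMDIST(s, n, p, TRUE))."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(s + 1))


# 4 to 6 heads in 10 fair flips = P(X <= 6) - P(X <= 3)
p_4_to_6 = binom_cdf(6, 10, 0.5) - binom_cdf(3, 10, 0.5)
print(p_4_to_6)  # 0.65625
```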

Problem - If 10% of products require servicing, what is the probability that fewer than 15 of 200 products will need servicing?
The problem actually asks what is the probability that up to 14 products will need servicing.
Therefore, you are solving for the cumulative probability that up to 14 products need servicing.
s = 14
p = 0.10
n = 200
TRUE = 1

BINOMDIST(14,200,0.10,1) = 0.092946 = 9.29%
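"Fewer than 15" becomes a cumulative probability up to 14. The same hypothetical cumulative helper handles it. Python integers keep `comb(200, k)` exact before the float multiplication.

```python
from math import comb


def binom_cdf(s, n, p):
    """Cumulative probability of s or fewer successes (like BINOMDIST(s, n, p, TRUE))."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(s + 1))


p_under_15 = binom_cdf(14, 200, 0.10)  # P(X <= 14) when n = 200, p = 0.10
print(f"{p_under_15:.2%}")
```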


Population Proportions

When a sample of size n is used to estimate a population proportion, e.g. the proportion of a population who would vote for a certain candidate,
it can be analyzed using the binomial distribution.
The population proportion of success will be the same as p, the probability of success of a single trial.
The following relationships hold true for population proportions:
The mean of sample proportions = μ = p
The standard deviation of sample proportions = σ = SQRT { [ p (1 - p) ] / n }
The confidence interval of a population proportion would be = p ± zσ = p ± z SQRT { [ p (1 - p) ] / n }

Problem: A random sample of 350 people was chosen and each person was asked if they recognized a particular brand.
112 people recognized the brand. Calculate a 95% confidence interval of the proportion of the total population
who recognize the brand.
Givens:
n = 350
p = 112 / 350 = 0.32

Confidence level = 0.95 - This means that 2.5% of the area under the Normal curve exists in each tail above and below the confidence interval.

z = NORMSINV(0.975) = 1.96 - 97.5% of the total area under the normal curve is to the left of a point 1.96 standard deviations from the mean.

The confidence interval = p ± z SQRT { [ p (1 - p) ] / n } = 0.32 ± 0.04887
The confidence interval = 0.27113 to 0.36887

Which means that we can be 95% confident that between 27.1% and 36.9% of the total population
are aware of the brand.
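The brand-recognition interval applies p ± z SQRT{ p (1 - p) / n } directly. In this sketch the helper name is hypothetical, and z comes from `NormalDist` rather than the rounded 1.96.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation interval for a population proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, margin, (p - margin, p + margin)


p, margin, (low, high) = proportion_interval(112, 350)
print(p, round(margin, 5), round(low, 5), round(high, 5))
```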

Determining Sample Size for a Desired Sampling Error

The minimum number of samples needed, n, to obtain a confidence interval of a certain width, e (or given sample error), is:
n = p (1-p) (z/e)**2

It is better to use the binomial distribution to calculate the p value when dealing with a proportion.
(Here the p value is the tail area of the distribution beyond the observed x - NOT p, the probability of a successful trial.)
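The worksheet gives no worked number for n = p (1-p) (z/e)**2. The sketch below therefore uses a hypothetical target: estimating brand recognition (p about 0.32, from the problem above) to within e = 0.03 at 95% confidence. The function name is also illustrative.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, e, confidence=0.95):
    """n = p (1 - p) (z / e)**2, rounded up to the next whole sample."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(p * (1 - p) * (z / e) ** 2)


n_needed = sample_size_for_proportion(0.32, 0.03)  # hypothetical 3-point margin
print(n_needed)
```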

Problem: A manufacturer of circuit boards wants to keep the proportion of defective boards at 0.098.
The manufacturer tested 156 randomly chosen boards and found 20 to be defective.
Determine with 95% certainty (0.05 level of significance) whether the defective proportion has increased above 0.098.
n = 156
p = 0.098
x = 19 (one less than the 20 defective boards found)

The probability that 20 or more boards are defective =
1 - the probability that 19 or fewer are defective = 1 - Cumulative probability of 19 defective = 1 - BINOMDIST(19,156,0.098,1)
= 1 - 0.870142 = 0.129858

This p-value of 0.129858 is greater than 0.05 (the level of significance - the proportion of area under the curve to the right of the critical value).
We therefore conclude that the large x value could have happened by chance, and we fail to reject the NULL Hypothesis.

To determine whether a known population has changed, take a sample of the population and use the binomial distribution to
calculate the probability of that sampling event (the number of successes, x, per given sample size, n, given p - the previously known probability of success in a single trial)
and compare that probability to the desired level of significance.
If this probability is less than the level of significance you have established (α for a one-tailed test and α/2 for a two-tailed test),
then the NULL Hypothesis is rejected.
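The procedure just described is a one-tailed exact binomial test. Here is a minimal sketch for the circuit-board numbers (n = 156, p = 0.098, 20 defective, α = 0.05). The helper name is hypothetical.

```python
from math import comb


def binom_cdf(s, n, p):
    """Cumulative probability of s or fewer successes."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(s + 1))


n, p, defective, alpha = 156, 0.098, 20, 0.05
# Probability of 20 or more defective boards if the true proportion is still 0.098
p_value = 1 - binom_cdf(defective - 1, n, p)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(p_value, 6), decision)
```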


Histograms, Charting, and Descriptive Statistics

Civilian Labor Force (1,000)

Year   Males    Females
1948   40,619   14,974
1949   40,803   15,580
1950   41,129   16,285
1951   40,831   17,000
1952   40,712   17,593
1953   41,334   17,957
1954   41,496   17,492
1955   41,749   18,266
1956   42,645   19,456
1957   42,625   19,591
1958   42,833   20,093
1959   43,053   20,455
1960   43,563   20,689
1961   43,907   21,608
1962   43,589   21,758
1963   44,025   22,134
1964   44,397   22,734
1965   44,837   23,351
1966   44,698   24,043
1967   45,086   25,003
1968   45,671   25,642
1969   46,081   26,770
1970   46,842   27,954
1971   47,627   28,810
1972   48,542   29,580
1973   49,389   30,148
1974   50,862   31,491
1975   51,213   32,972
1976   51,753   34,214
1977   52,784   35,399
1978   54,077   37,323
1979   55,349   38,959
1980   56,225   40,747
1981   56,860   41,866
1982   57,461   42,952
1983   58,105   44,255
1984   59,250   44,994
1985   59,949   46,740
1986   61,126   47,852
1987   61,899   49,085
1988   62,423   50,436


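1st - Highlight the Males and Females columns of data to create the chart. Do not highlight the Year column.

2nd - In the 2nd step of creating the chart, click the Series tab and highlight the Year column as the x-axis.

Descriptive Statistics - Tools / Data Analysis / Descriptive Statistics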


Males
Mean
Standard Error
Median
Mode
Standard Deviation
Sample Variance
Kurtosis
Skewness
Range
Minimum
Maximum
Sum
Count

Year   Males    Females
1989   63,375   51,996
1990   64,805   52,925
1991   65,149   53,328
1992   65,767   54,356
1993   66,329   54,982
1994   66,788   56,322
1995   67,516   56,871
1996   67,434   57,503
1997   68,884   58,788
1998   69,547   59,583
1999   70,295   60,718

Measures of Dispersion - Standard Deviation and Variance


x      x bar (x mean)
20     118
30     118
42     118
40     118
55     118
521    118
Sum ( (x - x bar)**2 ) = 195,586

# of points (Statistical Function COUNT) = 6

n - 1 = 5

Sum (Arithmetic Function SUM) = 708

Mean (Statistical Function AVERAGE) = 118

Individual Function Calculation of Standard Deviation & Variance


Variance (Statistical Function VAR) = 39117.2   (= 195,586 / (n - 1))

Standard Deviation (Statistical Function STDEV) = 197.7807
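The manual route (sum of squared deviations divided by n - 1) and the built-in route give the same sample variance. Python's `statistics.variance` and `statistics.stdev` use the n - 1 divisor, like VAR and STDEV. A short sketch:

```python
from statistics import mean, stdev, variance

x = [20, 30, 42, 40, 55, 521]
x_bar = mean(x)                             # like AVERAGE
sum_sq = sum((v - x_bar) ** 2 for v in x)   # Sum((x - x bar)**2)
manual_var = sum_sq / (len(x) - 1)          # divide by n - 1 for a sample
print(x_bar, sum_sq, manual_var, variance(x), round(stdev(x), 4))
```

The single outlier (521) dominates the sum of squares, which is why the standard deviation is larger than the mean here.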

Histogram and Descriptive Statistics


State                  Median Value Owner Occupied
Alabama                $53,700
Alaska                 $94,400
Arizona                $80,100
Arkansas               $46,300
California             $195,500
Colorado               $82,700
Connecticut            $177,800
Delaware               $100,100
District of Columbia   $123,900
Florida                $77,100
Georgia                $71,300
Hawaii                 $245,300
Idaho                  $58,200
Illinois               $80,900
Indiana                $53,900
Iowa                   $45,900
Kansas                 $52,200

Kentucky               $50,500
Louisiana              $58,500
Maine                  $87,400
Maryland               $116,500
Massachusetts          $162,800
Michigan               $60,600
Minnesota              $74,000
Mississippi            $45,600
Missouri               $59,800

Descriptive Statistics
Median Value Owner Occupied
Mean
Standard Error
Median
Mode
Standard Deviation
Sample Variance
Kurtosis
Skewness
Range
Minimum
Maximum
Sum
Count

Histogram
Bin Range Requested By Histogram (in Yellow)
Interval
1
2
3
4
5
6

Montana
Nebraska
Nevada

$
$
$

56,600
50,400
95,700

7
8

New Hampshire
New Jersey
New Mexico
New York
North Carolina
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Washington
West Virginia
Wisconsin
Wyoming

$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$

129,400
162,300
70,100
131,600
65,800
50,800
63,500
48,100
67,100
69,700
133,500
61,100
45,200
58,400
59,600
68,900
95,500
91,000
93,400
47,900
62,500
61,600

Histogram - Tools / Data Analysis / Histogram


45000
70000
95000
120000
145000
170000
195000
220000

Frequency

More

[Chart: Histogram - Median Income (x-axis: 45000 - Starting (25000 blocks); y-axis: Frequency)]
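Excel's Histogram tool counts each value in the first bin whose upper bound is greater than or equal to it. Anything above the last bound goes into a "More" row. Below is a minimal Python sketch of that counting rule, applied to the 51 state values using `bisect` from the standard library. The helper name `excel_histogram` is hypothetical.

```python
from bisect import bisect_left

def excel_histogram(values, bins):
    """Count values per bin: a value lands in the first bin whose upper
    bound is >= the value; values above the last bound go to "More"."""
    counts = [0] * (len(bins) + 1)  # last slot is the "More" row
    for v in values:
        counts[bisect_left(bins, v)] += 1
    return counts

# Median Value Owner Occupied, in the same state order as the data
home_values = [
    53700, 94400, 80100, 46300, 195500, 82700, 177800, 100100, 123900,
    77100, 71300, 245300, 58200, 80900, 53900, 45900, 52200,          # Alabama-Kansas
    50500, 58500, 87400, 116500, 162800, 60600, 74000, 45600, 59800,  # Kentucky-Missouri
    56600, 50400, 95700,                                              # Montana-Nevada
    129400, 162300, 70100, 131600, 65800, 50800, 63500, 48100, 67100,
    69700, 133500, 61100, 45200, 58400, 59600, 68900, 95500, 91000,
    93400, 47900, 62500, 61600,                                       # New Hampshire-Wyoming
]
bins = [45000, 70000, 95000, 120000, 145000, 170000, 195000, 220000]

labels = [str(b) for b in bins] + ["More"]
for label, count in zip(labels, excel_histogram(home_values, bins)):
    print(f"{label:>8} {count}")
```

The first row comes out 0 because no value is at or below 45,000. The remaining rows (27, 11, 4, 4, 2, 1, 1, 1) match the Frequency values reported for this histogram, with Hawaii in the More row.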

Sorting Data and Histogram To Find Patterns

Gross Domestic Product Per Capita using Purchasing Power Parity 1991

Original Data
Country          per capita GDP (dollars)
Australia        $16,085
Austria          $17,280
Belgium          $17,454
Canada           $19,178
Denmark          $17,621
Finland          $15,997
France           $18,227
Germany          $19,500
Greece           $7,775
Iceland          $17,237
Ireland          $11,507
Italy            $16,896
Japan            $19,107
Luxembourg       $21,372
Netherlands      $16,530
New Zealand      $13,883
Norway           $16,904
Portugal         $9,191
Spain            $12,719
Sweden           $16,729
Switzerland      $21,747
Turkey           $3,491
United Kingdom   $15,720
United States    $22,204

Sorted Data - The data needs to be copied here and then sorted (Data / Sort).
Country          per capita GDP (dollars)
Turkey           $3,491
Greece           $7,775
Portugal         $9,191
Ireland          $11,507
Spain            $12,719
New Zealand      $13,883
United Kingdom   $15,720
Finland          $15,997
Australia        $16,085
Netherlands      $16,530
Sweden           $16,729
Italy            $16,896
Norway           $16,904
Iceland          $17,237
Austria          $17,280
Belgium          $17,454
Denmark          $17,621
France           $18,227
Japan            $19,107
Canada           $19,178
Germany          $19,500
Luxembourg       $21,372
Switzerland      $21,747
United States    $22,204

Histogram - Allowing Excel to pick bin size (leave bin range blank)
Bin        Frequency
3491       1
8169.25    1
12847.5    3
17525.75   11
More       8

[Charts: Histogram of per Cap GDP (y-axis: Frequency)]
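When the bin range is left blank, the Excel-chosen bins shown for this data start at the minimum (3,491) and are evenly spaced. Their width is (22,204 - 3,491) / 4 = 4,678.25. The Python sketch below sorts the data (the equivalent of Data / Sort) and rebuilds those bins. It takes the four intervals as an assumption read from that output; it does not claim to reproduce Excel's own rule for how many bins to create. The helper name `equal_width_bins` is hypothetical.

```python
from bisect import bisect_left

gdp_1991 = {
    "Australia": 16085, "Austria": 17280, "Belgium": 17454, "Canada": 19178,
    "Denmark": 17621, "Finland": 15997, "France": 18227, "Germany": 19500,
    "Greece": 7775, "Iceland": 17237, "Ireland": 11507, "Italy": 16896,
    "Japan": 19107, "Luxembourg": 21372, "Netherlands": 16530,
    "New Zealand": 13883, "Norway": 16904, "Portugal": 9191, "Spain": 12719,
    "Sweden": 16729, "Switzerland": 21747, "Turkey": 3491,
    "United Kingdom": 15720, "United States": 22204,
}

# Equivalent of Data / Sort, ascending on the GDP column
sorted_rows = sorted(gdp_1991.items(), key=lambda item: item[1])

def equal_width_bins(values, intervals):
    """Bin upper bounds starting at the minimum, evenly spaced toward the maximum.
    The number of intervals is supplied by the caller (an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / intervals
    return [lo + k * width for k in range(intervals)]

values = [v for _, v in sorted_rows]
bins = equal_width_bins(values, 4)
counts = [0] * (len(bins) + 1)  # last slot is the "More" row
for v in values:
    counts[bisect_left(bins, v)] += 1  # first bin whose bound is >= v

print(bins)    # bin upper bounds
print(counts)  # frequencies; the last one is "More"
```

This prints the bins 3491, 8169.25, 12847.5, 17525.75 and the frequencies 1, 1, 3, 11, 8. Sorting first makes it easy to see which countries fall in each bin.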

Hypothesis Testing of a Population Mean

Hypothesis testing is a type of statistical test used to determine whether a population mean has changed.

Overall, two hypotheses are created and tested.

The first hypothesis, the NULL Hypothesis, is usually stated in terms such as "There has been no change in the population mean."
This will normally involve an equal sign.

The second hypothesis, the Alternative Hypothesis, states that the population mean has changed in one of three ways:
1) The population mean has changed (increased OR decreased) - This involves a two-tailed test.
2) The population mean has decreased - This involves a one-tailed test with the left tail.
3) The population mean has increased - This involves a one-tailed test with the right tail.
In summary, hypothesis testing involves:

1) Determining the NULL hypothesis. This is normally that the original population mean has not changed.
2) Determining the level of certainty to which that NULL Hypothesis will be tested. If you want to establish a 95% certainty level, then α ("alpha") = 0.05.
3) Taking a sample of the population.
4) Calculating the sample mean. This value will be called x̄ (x bar).
5) Graphing this sample mean on the normal curve created from the original population mean.
6) Rejecting or not rejecting the NULL Hypothesis based upon the results of either of the following tests (which are equivalent to each other):

6a) The critical value test - The significance level, α, is converted to a "critical value." This critical value is the number of standard deviations between the mean and the edge of the required level of certainty. For example, on a two-tailed test, an α of 0.05 translates to a 95% level of certainty.
On a two-tailed test, this results in 2.5% of the total area under the Normal curve lying above the right critical value, and 2.5% of the area lying below the left critical value. Each critical value is 1.96 standard deviations from the mean on the normal curve: NORMSINV(0.975) = 1.96
The z value of the sample mean is then calculated. The z value is the number of standard deviations that the sample mean is from the population mean, on a Normal curve derived from the population mean.
If the z value of the sample mean is farther from the mean than the critical value (the z value of that level of certainty), then the NULL Hypothesis is normally rejected.

6b) The p-value test - This is equivalent to the test above. A Normal curve is constructed based upon the population mean.
α is the significance level. The significance level represents the percentage of the area under the normal curve that is outside the required level of certainty.
For example, on a two-tailed test with a 95% required level of certainty, α = 0.05. The test is two-tailed, so 2.5% of the total area will be in the tail above the 95% certainty level, and 2.5% will be in the tail below it.
The p value is equal to the percentage of the area under the normal curve that lies beyond x̄, in the tail on x̄'s side of the mean.
If the p value is less than the percentage of the area under the normal curve corresponding to α (α/2 on a two-tailed test, α on a one-tailed test), the NULL Hypothesis is normally rejected.
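The NORMSINV and NORMSDIST worksheet functions used in these tests have a counterpart in Python's standard library, `statistics.NormalDist` (Python 3.8 or later). Here is a minimal sketch of the critical-value lookup for both kinds of test. The function name `critical_value` is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def critical_value(alpha, two_tailed=True):
    """z critical value for significance level alpha.
    Two-tailed: alpha is split into alpha/2 per tail, like NORMSINV(1 - alpha/2).
    One-tailed: all of alpha sits in one tail, like NORMSINV(1 - alpha)."""
    tail_area = alpha / 2 if two_tailed else alpha
    return STANDARD_NORMAL.inv_cdf(1 - tail_area)

print(round(critical_value(0.05, two_tailed=True), 2))   # 1.96
print(round(critical_value(0.02, two_tailed=False), 4))  # right-tail cutoff for alpha = 0.02
```

`inv_cdf` plays the role of NORMSINV, and `cdf` plays the role of NORMSDIST.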

Two-tailed test - Testing whether a population mean changed in either direction

Problem: A manufacturer claims that the average thickness of metal sheets is 15 mls., and that the population standard deviation, σ, is 0.1 mls.
A sample of 50 sheets has a sample mean of 14.982 mls. At the 0.05 significance level (95% confidence level), determine whether the manufacturer's claim that the average thickness is 15 mls. is correct.
Givens:
n = 50
α = 0.05
σ = 0.1
x̄ = 14.982
μ = 15

The NULL Hypothesis is that the population mean, μ, = 15 mls.

The ALTERNATE Hypothesis is that μ ≠ 15 mls. (Since we are testing whether a difference exists in either direction, this is a two-tailed test.)
1) Calculate Sample Standard Error

Sample Standard Error = σ / SQRT(n) = 0.014142

2) Calculate z value for sample -

Z value = (x̄ - μ) / (Sample Standard Error) = -1.272792

3) Calculate p value - the area under the Normal curve beyond the sample z value.
NORMSDIST(-1.272792) = 0.101546
This states that about 10.15% of the total area under the Normal curve lies more than 1.27 standard deviations from the mean, in each tail of the Normal curve.
THE P TEST CAN BE PERFORMED AT THIS POINT

The NULL Hypothesis is rejected if the p-value (the percentage of area under the Normal curve beyond point x̄) is less than α/2 (in a two-tailed test) or α (in a one-tailed test).

The p-value = 0.101546 and is much larger than α/2 (0.025), so the NULL Hypothesis is not rejected - The manufacturer's claim appears to be valid.

TO PERFORM THE EQUIVALENT CRITICAL VALUE TEST, DO THE FOLLOWING:

1) Calculate the critical value of α - NORMSINV(0.975) = 1.96

This states that an α of 0.05 on a two-tailed test produces a confidence interval that goes from 1.96 standard deviations below the mean to 1.96 standard deviations above the mean.
If x̄ is outside of this range (the absolute value of the z value of x̄ is greater than 1.96), then the NULL Hypothesis is rejected.

In this case, the absolute value of the z value of x̄ (1.27279) is less than the critical value (1.96). Therefore x̄ is closer to the mean than the critical value, and we do not reject the NULL Hypothesis.
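The metal-sheet calculation can be reproduced in a few lines. Below is a minimal Python sketch using the standard library's `statistics.NormalDist` in place of NORMSDIST and NORMSINV. It runs both the p test and the critical value test on the givens above.

```python
import math
from statistics import NormalDist

# Givens from the metal-sheet problem
n, alpha, sigma, x_bar, mu = 50, 0.05, 0.1, 14.982, 15

standard_error = sigma / math.sqrt(n)         # sigma / SQRT(n)
z = (x_bar - mu) / standard_error             # (x bar - mu) / standard error
p = NormalDist().cdf(-abs(z))                 # area in one tail beyond |z|
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # like NORMSINV(0.975)

reject_by_p = p < alpha / 2                   # two-tailed: compare to alpha/2
reject_by_critical_value = abs(z) > z_crit

print(f"SE = {standard_error:.6f}, z = {z:.6f}, p = {p:.6f}")
print("Reject NULL" if reject_by_p else "Do not reject NULL")
```

Both tests agree here: p is above 0.025, and |z| is below the critical value.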

One-tailed test - Testing whether a population mean changed in only one direction

Problem: A furniture company states that its average delivery time is 15 days, with a (population) standard deviation of 4 days.
A random sample of 50 deliveries showed an average delivery time of 17 days.
Determine with 98% certainty (0.02 significance level) whether delivery time has increased.
Givens:
n = 50
α = 0.02
σ = 4
x̄ = 17
μ = 15

This is a one-tailed test because we are checking whether delivery time increased.
NULL Hypothesis - μ = 15
ALTERNATE Hypothesis - μ > 15

Using the p test, we will determine whether the p value (the area above x̄ under the normal curve) is less than α (since this is a one-tailed test).
1) Calculate Sample Standard Error

Sample Standard Error = σ / SQRT(n) = 0.565685

2) Calculate z value for sample -

Z value = (x̄ - μ) / (Sample Standard Error) = 3.535534

3) Calculate p value - the area under the Normal curve beyond the sample z value:
1 - NORMSDIST(3.535534) = 1 - 0.999797 = 0.000203
This states that 0.000203 (about 0.02%) of the total area under the Normal curve lies more than 3.535534 standard deviations above the mean.

This p-value (0.000203) is less than α (0.02), so the NULL Hypothesis is rejected - It appears likely that delivery time has increased.
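Below is the delivery-time calculation as a minimal Python sketch, again using `statistics.NormalDist` in place of the worksheet functions. Because the alternate hypothesis is μ > 15, the p value is the right-tail area, and it is compared with α rather than α/2.

```python
import math
from statistics import NormalDist

# Givens from the delivery-time problem
n, alpha, sigma, x_bar, mu = 50, 0.02, 4, 17, 15

standard_error = sigma / math.sqrt(n)     # sigma / SQRT(n)
z = (x_bar - mu) / standard_error
p = 1 - NormalDist().cdf(z)               # right tail: 1 - NORMSDIST(z)
z_crit = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value

print(f"SE = {standard_error:.6f}, z = {z:.6f}, p = {p:.6f}")
print("Reject NULL" if p < alpha else "Do not reject NULL")
```

The equivalent critical value test gives the same answer: z (about 3.54) lies beyond the one-tailed cutoff for α = 0.02 (about 2.05).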


Discrete Variables
Calculating the Mean, Variance, and Standard Deviation of the Probability Distribution of a Discrete Variable

x (Grade)   P(x) (Probability)   x * P(x)
4           0.1                  0.4
3           0.2                  0.6
2           0.35                 0.7
1           0.25                 0.25
0           0.1                  0
                                 Sum = 1.95

Expected Value = mean = μ = Sum [ x * P(x) ] = 1.95

x (Grade)   P(x)   Mean   (x - Mean)   Square of (x - Mean)   { Square of (x - Mean) } * P(x)
4           0.1    1.95    2.05        4.2025                 0.42025
3           0.2    1.95    1.05        1.1025                 0.2205
2           0.35   1.95    0.05        0.0025                 0.000875
1           0.25   1.95   -0.95        0.9025                 0.225625
0           0.1    1.95   -1.95        3.8025                 0.38025

Variance = SUM [ { Square of (x - Mean) } * P(x) ] = 1.2475

Standard Deviation = SQRT (Variance) = 1.116915   (Mathematical Function SQRT)

These are the variance and standard deviation of the probability distribution of x (the distribution of the grades).
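Below is a minimal Python sketch of the same three calculations for the grade distribution. Each value is weighted by its probability P(x), rather than each observation being weighted equally as in the sample formulas used earlier.

```python
import math

# Grade distribution: x -> P(x)
distribution = {4: 0.1, 3: 0.2, 2: 0.35, 1: 0.25, 0: 0.1}

# A valid probability distribution must sum to 1
assert abs(sum(distribution.values()) - 1) < 1e-9

mean = sum(x * p for x, p in distribution.items())                    # Sum [ x * P(x) ]
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())  # Sum [ (x - Mean)^2 * P(x) ]
std_dev = math.sqrt(variance)                                         # SQRT (Variance)

print(f"mean = {mean:.2f}, variance = {variance:.4f}, std dev = {std_dev:.6f}")
```

This prints mean = 1.95, variance = 1.2475, std dev = 1.116915.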
