
2009 Spring TUNG, Yik-Man

ISOM111 Business Statistics
Tutorial Set 2


Z-score and Outlier

Z-score: For each observation $X_i$, the Relative Standing or Z-score or Standardized
observation in a sample is

$$Z_i = \frac{X_i - \bar{X}}{S_X}.$$

Outlier: An observation is an Outlier if it is unusually large or small relative to the other
observations in the data set.
For a symmetrically distributed data set, an outlier is defined as an observation whose
Z-score has absolute value greater than 2.
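
The definitions above can be sketched in Python. This is a minimal illustration, not part of the original tutorial: the function names `z_scores` and `outliers` and the sample values are assumptions chosen for the example, and the threshold of 2 is the rule stated above.

```python
from math import sqrt

def z_scores(data):
    """Return the Z-score (X_i - mean) / S_X of every observation."""
    n = len(data)
    mean = sum(data) / n
    # The sample standard deviation uses the n - 1 divisor.
    s = sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return [(x - mean) / s for x in data]

def outliers(data, threshold=2.0):
    """Return the observations whose |Z-score| exceeds the threshold."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > threshold]

# Hypothetical sample: nine values near 10 and one extreme value.
sample = [9, 10, 10, 11, 10, 9, 11, 10, 10, 30]
print(outliers(sample))
```

Only the value 30 lies more than two sample standard deviations from the mean of this sample, so it is the only observation flagged.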

Covariance and Correlation

Covariance: A measure of the degree to which two data sets of equal size vary together linearly.

$$Cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

$$Var(X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{n - 1} = Cov(X, X)$$

Positive covariance means the two sets of sample data vary in the same direction.
Negative covariance means the two sets of sample data vary in opposite
directions.
Population Covariance:

$$\sigma_{XY} = \frac{\sum_{i=1}^{N} (X_i - \mu_X)(Y_i - \mu_Y)}{N}$$
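
As a rough sketch of the two covariance formulas, the following Python code computes the sample covariance (divisor n - 1) and the population covariance (divisor N). The function names and the four-point data set are hypothetical, introduced only for illustration.

```python
def sample_cov(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("both data sets must have the same size")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def population_cov(xs, ys):
    """Population covariance: same numerator, divided by N."""
    if len(xs) != len(ys):
        raise ValueError("both data sets must have the same size")
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n

# Hypothetical data: ys rises with xs, ys_rev falls as xs rises.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
ys_rev = [8, 6, 4, 2]

print(sample_cov(xs, ys))      # positive: same direction
print(sample_cov(xs, ys_rev))  # negative: opposite directions
print(sample_cov(xs, xs))      # equals the sample variance of xs
print(population_cov(xs, ys))
```

Calling `sample_cov(xs, xs)` illustrates the identity Var(X) = Cov(X, X).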

Correlation or Correlation coefficient: A measure of the strength of linear relationship
between two sets of equal size data.


$$Corr(X, Y) = \frac{Cov(X, Y)}{S_X S_Y} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

$$-1 \le Corr(X, Y) \le 1$$

$Corr(X, Y) \approx 1$ means the two sets of data have a strong positive linear relationship.
$Corr(X, Y) \approx -1$ means the two sets of data have a strong negative linear relationship.
$Corr(X, Y) \approx 0$ means the two sets of data have no linear relationship, but they
may have a nonlinear relationship.
Population Correlation coefficient:

$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$
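
The three cases of the sample correlation coefficient can be illustrated with a short Python sketch. The function name `corr` and the data below are assumptions made for this example: one exactly increasing line, one exactly decreasing line, and a parabola that is related to X but not linearly.

```python
from math import sqrt

def corr(xs, ys):
    """Sample correlation: cross-deviation sum over the root of the
    product of the two squared-deviation sums (the n - 1 factors cancel)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
r_up = corr(xs, [2 * x + 1 for x in xs])   # increasing straight line
r_down = corr(xs, [-3 * x for x in xs])    # decreasing straight line
r_curve = corr(xs, [x * x for x in xs])    # parabola: nonlinear relation

print(r_up, r_down, r_curve)
```

The parabola case shows why a correlation near 0 rules out only a linear relationship: Y is completely determined by X there, yet the positive and negative cross-deviations cancel.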

Example

1. (PS1Q1) Two hundred employees at a plant take an aptitude test to qualify for promotion.
The frequency distribution of scores is seen to be symmetric bell-shaped. Ms. Jones
receives a score of 70, which turns out to be one standard deviation above the mean score.
Mr. Shaw receives a score of 28, which turns out to be 1.8 standard deviations below the
mean score. If the highest score received is 100, how many standard deviations is the
highest score above the mean score?

Sol: This is to calculate the standard score of 100. Based on the given information, we can set
up the system of equations:

$$70 = \bar{X} + \sigma, \qquad 28 = \bar{X} - 1.8\sigma.$$

Subtracting the second equation from the first gives $42 = 2.8\sigma$. Solving, we get
$\bar{X} = 55$ and $\sigma = 15$. Then the standard score of 100 is given by

$$\frac{100 - 55}{15} = 3.$$

So 100 is an Outlier!
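
The same calculation can be checked numerically. The following Python lines are a small sketch with variable names chosen for this example; they solve the two equations by elimination and then standardize the score of 100.

```python
# Known facts: score = mean + z * sd for each employee.
jones_score, jones_z = 70, 1.0    # one standard deviation above the mean
shaw_score, shaw_z = 28, -1.8     # 1.8 standard deviations below the mean

# Subtracting the two equations eliminates the mean.
sd = (jones_score - shaw_score) / (jones_z - shaw_z)
mean = jones_score - jones_z * sd

# Standard score of the highest mark.
top_z = (100 - mean) / sd

# Rounded to hide floating-point noise.
print(round(mean, 4), round(sd, 4), round(top_z, 4))
```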

2. Two data sets, each containing five observations on two variables, follow.

Data 1
$X_i$    4    6   11    3   16
$Y_i$   50   50   40   60   30

Data 2
$X_i$    6   11   15   21   27
$Y_i$    6    9    6   17   12

a). Compute the sample covariances for the two data sets.
b). Compute the sample correlation coefficients for the two data sets.
c). What do the results of b) indicate about the relationship between the two variables in each
data set?

Sol: a). For both data sets, some descriptive statistics are:
For Data 1: $\bar{X} = 8$, $S_X^2 = 29.5$ and $\bar{Y} = 46$, $S_Y^2 = 130$;
for Data 2: $\bar{X} = 16$, $S_X^2 = 68$ and $\bar{Y} = 10$, $S_Y^2 = 21.5$.
Thus the covariances of Data 1 and Data 2 are respectively -60 and 26.5.

b). For Data 1, $Corr(X, Y) \approx -0.96888$ and for Data 2, $Corr(X, Y) \approx 0.6931$.

c). For Data 1, there is a strong negative linear relationship between the two variables.
For Data 2, there is a moderate positive linear relationship between the two variables.
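
The figures in this solution can be reproduced from the raw observations. The Python sketch below is an illustration only; the helper name `summary` and its dictionary keys are assumptions, while the data are the two data sets given in the question.

```python
from math import sqrt

def summary(xs, ys):
    """Means, sample variances, sample covariance and correlation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return {
        "x_bar": x_bar, "y_bar": y_bar,
        "var_x": var_x, "var_y": var_y,
        "cov": cov,
        "corr": cov / sqrt(var_x * var_y),
    }

data1 = summary([4, 6, 11, 3, 16], [50, 50, 40, 60, 30])
data2 = summary([6, 11, 15, 21, 27], [6, 9, 6, 17, 12])

for name, result in (("Data 1", data1), ("Data 2", data2)):
    print(name, {key: round(value, 5) for key, value in result.items()})
```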

3. (Spring08Exam1Q5) Shown below are the descriptive statistics of the three variables
MKTG, ENGG, and ACCT in the Stockton sample data set discussed in class. A matrix of
sample correlation coefficients among these three variables is also displayed. Use this
information to answer the two questions that follow.

Statistic             MKTG      ENGG      ACCT
Mean                  4.766     5.044     3.652
Standard Error        0.365     0.542     0.885
Median                5.4       4.5       0.8
Mode                  6.2       1.3       0.1
Standard Deviation    2.584     3.835     6.256
Sample Variance       6.677     14.704    39.143
Kurtosis             -0.256    -0.339     6.907
Skewness             -0.103     0.756     2.552
Range                 10.9      14        29.9
Minimum              -13        0.4       0.1
Maximum               11        14.4      30
Sum                   238.3     252.2     182.6
Count                 50        50        50

Correlation Coefficients
MKTG ENGG ACCT
MKTG 1
ENGG -0.0034 1
ACCT -0.1587 -0.1835 1

a). If Z-scores of all the 50 sample observations of each of the three variables are calculated,
which variable (MKTG, ENGG, or ACCT) will contain the largest Z-score in absolute
value? What is that Z-score? (Show your work)
b). Determine the pairwise covariances of these three variables. (Show your work)

Sol: a). For MKTG: Max: $(11 - 4.766) / 2.584 \approx 2.4125$
Min: $(-13 - 4.766) / 2.584 \approx -6.8754$
For ENGG: Max: $(14.4 - 5.044) / 3.835 \approx 2.4396$
Min: $(0.4 - 5.044) / 3.835 \approx -1.211$
For ACCT: Max: $(30 - 3.652) / 6.256 \approx 4.2116$
Min: $(0.1 - 3.652) / 6.256 \approx -0.5678$
So MKTG contains the largest Z-score in absolute value and the corresponding Z-score
is -6.8754.
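
Because the Z-score is an increasing function of the observation, only the minimum and maximum of each variable need to be standardized. A minimal Python sketch of that shortcut follows; the dictionary layout and names are assumptions for illustration, and the numbers are taken from the descriptive statistics table.

```python
# (mean, standard deviation, minimum, maximum) from the table.
stats = {
    "MKTG": (4.766, 2.584, -13, 11),
    "ENGG": (5.044, 3.835, 0.4, 14.4),
    "ACCT": (3.652, 6.256, 0.1, 30),
}

# Z-scores of the two extreme observations of each variable.
extreme_z = {
    name: ((low - mean) / sd, (high - mean) / sd)
    for name, (mean, sd, low, high) in stats.items()
}

# Pick the variable whose extreme Z-score is largest in absolute value.
winner = max(extreme_z, key=lambda name: max(abs(z) for z in extreme_z[name]))
winner_z = max(extreme_z[winner], key=abs)

print(winner, round(winner_z, 4))
```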

b). Since

$$Corr(X, Y) = \frac{Cov(X, Y)}{S_X S_Y} \implies Cov(X, Y) = Corr(X, Y) \, S_X S_Y,$$

then:

$$Cov(MKTG, ENGG) = Corr(MKTG, ENGG) \, S_{MKTG} S_{ENGG} = -0.0034 \times 2.584 \times 3.835 \approx -0.03369$$

$$Cov(ENGG, ACCT) = Corr(ENGG, ACCT) \, S_{ENGG} S_{ACCT} = -0.1835 \times 3.835 \times 6.256 \approx -4.4025$$

$$Cov(MKTG, ACCT) = Corr(MKTG, ACCT) \, S_{MKTG} S_{ACCT} = -0.1587 \times 2.584 \times 6.256 \approx -2.5655$$
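
The conversion from correlation to covariance can also be carried out programmatically. The Python sketch below uses the rounded standard deviations and correlation coefficients printed in the tables; the dictionary names are assumptions for this example, and results computed from unrounded values may differ in the last digits.

```python
# Sample standard deviations from the descriptive statistics table.
sd = {"MKTG": 2.584, "ENGG": 3.835, "ACCT": 6.256}

# Pairwise sample correlations from the correlation matrix.
corr = {
    ("MKTG", "ENGG"): -0.0034,
    ("ENGG", "ACCT"): -0.1835,
    ("MKTG", "ACCT"): -0.1587,
}

# Cov(X, Y) = Corr(X, Y) * S_X * S_Y for every pair.
cov = {pair: r * sd[pair[0]] * sd[pair[1]] for pair, r in corr.items()}

for (a, b), value in cov.items():
    print(f"Cov({a}, {b}) = {value:.4f}")
```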
