
# Covariance and correlation

## Coefficient of Correlation Values

[Figure: scale of $r$ from $-1.0$ (perfect negative correlation) through $-0.5$, $0$ (no linear correlation) and $+0.5$ to $+1.0$ (perfect positive correlation).]

## Covariance and correlation

Covariance and correlation are used to quantify the relationship between pairs of variables.

Caution: a statistical relationship sometimes exists between variables even when a causal relationship is difficult to justify. Be cautious about drawing causal inferences based solely on statistical relationships.

Sample covariance
Sample covariance is a measure of how two variables move together. For a sample of $n$ pairs of data $(x_i, y_i)$, the sample covariance is defined as

$$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
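As a minimal sketch of the definition above, the following computes the sample covariance in plain Python. The function name `sample_covariance` is illustrative; the data are the advertising and sales pairs from the Hasbro example in this deck.

```python
def sample_covariance(xs, ys):
    """Sample covariance s_xy, using the n - 1 denominator."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs of data")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products of deviations from the means
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (n - 1)


ad = [1, 2, 3, 4, 5]      # Ad $ (Hasbro example)
sales = [1, 1, 2, 2, 4]   # Sales (units)
s_xy = sample_covariance(ad, sales)
print(s_xy)  # 1.75
```

A positive value indicates that the two variables tend to move in the same direction; the magnitude depends on the units of both variables, which is why the correlation coefficient is usually reported instead.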

## Sample correlation coefficient

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

where $s_{xy}$ is the sample covariance and $s_x$ and $s_y$ are the sample standard deviations of $x$ and $y$. $r$ varies between $-1$ and $+1$. $R^2$ is the coefficient of determination (CD) and gives the proportion of the variation explained by the relationship.
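The ratio above can be sketched with the standard library's `statistics` module. The helper name `sample_correlation` is an assumption for illustration; the data are the pairs from the Hasbro example in this deck.

```python
import statistics


def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y), with sample (n - 1) statistics throughout."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    # statistics.stdev is the sample standard deviation (n - 1 denominator)
    return s_xy / (statistics.stdev(xs) * statistics.stdev(ys))


ad = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
r = sample_correlation(ad, sales)
r_squared = r ** 2  # coefficient of determination
print(round(r, 3), round(r_squared, 3))  # 0.904 0.817
```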

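Coefficient of Correlation

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$$

where

$$SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$$

$$SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n}$$

$$SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}$$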
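The slides do not show why this sum-of-squares form agrees with $r_{xy} = s_{xy} / (s_x s_y)$. A short derivation, assuming the sample statistics all use the $n - 1$ denominator:

$$
\begin{aligned}
s_{xy} &= \frac{SS_{xy}}{n-1}, \qquad s_x = \sqrt{\frac{SS_{xx}}{n-1}}, \qquad s_y = \sqrt{\frac{SS_{yy}}{n-1}} \\
r_{xy} &= \frac{s_{xy}}{s_x s_y}
        = \frac{SS_{xy}/(n-1)}{\sqrt{SS_{xx}\, SS_{yy}}\,/\,(n-1)}
        = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}
\end{aligned}
$$

The factors of $n - 1$ cancel, so both forms give the same value of $r$.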

## Coefficient of Correlation Example

You're a marketing analyst for Hasbro Toys.

| Ad \$ | Sales (Units) |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 4 |

Calculate the coefficient of correlation.

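Solution Table

| $x_i$ | $y_i$ | $x_i^2$ | $y_i^2$ | $x_i y_i$ |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 4 | 1 | 2 |
| 3 | 2 | 9 | 4 | 6 |
| 4 | 2 | 16 | 4 | 8 |
| 5 | 4 | 25 | 16 | 20 |
| **15** | **10** | **55** | **26** | **37** |

The last row gives the column totals.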

## Coefficient of Correlation Solution

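$$SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 55 - \frac{(15)^2}{5} = 10$$

$$SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 26 - \frac{(10)^2}{5} = 6$$

$$SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 37 - \frac{(15)(10)}{5} = 7$$

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} = \frac{7}{\sqrt{10 \cdot 6}} = .904$$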

## Coefficient of Correlation Thinking Challenge

You're an economist for the county cooperative. You gather the following data:

| Fertilizer (lb.) | Yield (lb.) |
|---|---|
| 4 | 3.0 |
| 6 | 5.5 |
| 10 | 6.5 |
| 12 | 9.0 |

Find the coefficient of correlation.

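Solution Table

| $x_i$ | $y_i$ | $x_i^2$ | $y_i^2$ | $x_i y_i$ |
|---|---|---|---|---|
| 4 | 3.0 | 16 | 9.00 | 12 |
| 6 | 5.5 | 36 | 30.25 | 33 |
| 10 | 6.5 | 100 | 42.25 | 65 |
| 12 | 9.0 | 144 | 81.00 | 108 |
| **32** | **24.0** | **296** | **162.50** | **218** |

The last row gives the column totals.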

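$$SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 296 - \frac{(32)^2}{4} = 40$$

$$SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 162.5 - \frac{(24)^2}{4} = 18.5$$

$$SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 218 - \frac{(32)(24)}{4} = 26$$

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} = \frac{26}{\sqrt{40 \cdot 18.5}} = .956$$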
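The hand calculations in both examples can be checked with a short script. This is a sketch of the sum-of-squares shortcut; the function name `correlation_from_sums` is hypothetical, and the two data sets are the ones given in the examples above.

```python
import math


def correlation_from_sums(xs, ys):
    """Return (SS_xx, SS_yy, SS_xy, r) using the column-total shortcut formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    return ss_xx, ss_yy, ss_xy, r


# Hasbro example: advertising vs. sales
hasbro = correlation_from_sums([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])
print([round(v, 3) for v in hasbro])  # [10.0, 6.0, 7.0, 0.904]

# County cooperative example: fertilizer vs. yield
coop = correlation_from_sums([4, 6, 10, 12], [3.0, 5.5, 6.5, 9.0])
print([round(v, 3) for v in coop])  # [40.0, 18.5, 26.0, 0.956]
```

The function will raise `ZeroDivisionError` if either variable is constant, since $SS_{xx}$ or $SS_{yy}$ is then zero and $r$ is undefined.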


## Sales and Advertising are strongly correlated.

[Table: spreadsheet output for Sales and Advertising. Only the fragments "Sales 30146353" and "Sales 1" survive.]

But we don't know what sales response to expect for a given level of advertising.

Correlation Models
Correlation models answer the question "How strong is the linear relationship between two variables?" The measure used is the coefficient of correlation.
The sample correlation coefficient is denoted $r$. Its values range from $-1$ to $+1$. It measures the degree of association and does not indicate a cause-effect relationship.

Regression Analysis
Sales = 4.694 (advertising) + 49492

This model suggests that sales increase linearly with advertising: each additional unit of advertising is associated with an estimated increase of 4.694 units of sales. Regression analysis (RA) lets us develop models that describe the relationship between a dependent variable and one or more independent variables, use those models for estimation or prediction, and test the significance of the relationship statistically.
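To illustrate how such a model is used for prediction, here is a small sketch. The coefficients come from the model above; the function name `predict_sales` and the advertising level of 10,000 are hypothetical, and the slides do not state the units.

```python
def predict_sales(advertising, slope=4.694, intercept=49492):
    """Estimated sales from the fitted line: sales = slope * advertising + intercept."""
    return slope * advertising + intercept


# Hypothetical advertising level, chosen only for illustration
estimate = predict_sales(10_000)
print(round(estimate, 2))  # 96432.0

# The slope is the estimated change in sales per one-unit change in advertising
delta = predict_sales(10_001) - predict_sales(10_000)
print(round(delta, 3))  # 4.694
```

Predictions are most trustworthy within the range of advertising values used to fit the model.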

## Simple Linear Regression

$$\hat{y} = b_0 + b_1 x$$

where $b_0$ is the estimated y-intercept, $b_1$ is the estimated slope of the regression line, and $\hat{y}$ is the estimated value of the dependent variable.
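The slides do not state how $b_0$ and $b_1$ are obtained. A common choice, assumed here, is the least-squares estimates $b_1 = SS_{xy} / SS_{xx}$ and $b_0 = \bar{y} - b_1 \bar{x}$. The sketch below applies them to the Hasbro data from this deck; `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for y_hat = b0 + b1 * x (assumed method)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx        # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated y-intercept
    return b0, b1


b0, b1 = fit_line([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])
print(round(b0, 3), round(b1, 3))  # -0.1 0.7
```

For this data the fitted line is $\hat{y} = -0.1 + 0.7x$, so each additional unit of advertising is associated with an estimated 0.7 additional units of sales.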


## Standard Regression Statistics

Multiple R is the correlation coefficient. R Square, or the coefficient of determination, measures how well the regression line fits the data. Adjusted R Square adjusts R Square for the sample size and the number of independent variables, which makes it useful when comparing this model with others that include additional variables.
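The slides do not give a formula for Adjusted R Square. A commonly used definition, assumed here, is $1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$, where $n$ is the sample size and $k$ is the number of independent variables. The sketch below uses $R^2 = 49/60$, which is $r^2$ for the Hasbro example ($r = 7/\sqrt{60}$).

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k independent variables (assumed formula)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hasbro example: n = 5 pairs, k = 1 independent variable (advertising)
adj = adjusted_r_squared(49 / 60, n=5, k=1)
print(round(adj, 3))  # 0.756
```

The adjusted value is lower than $R^2 \approx 0.817$ because the penalty term grows as the sample gets small relative to the number of variables.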

$$SSE = \sum (y_i - \hat{y}_i)^2$$

$$SST = \sum (y_i - \bar{y})^2$$

$$SSR = \sum (\hat{y}_i - \bar{y})^2$$

$$R^2 = \frac{SSR}{SST}$$

## Strength of regression relationship

When two variables are not correlated (that is, the slope of the regression line is zero), the best estimate of the dependent variable for any value of the independent variable is simply the mean of the dependent variable. Any variation from the mean is due to random error. If the slope is not zero, then a portion of the deviation from the mean is explained by the regression, and the remainder is due to error.

SSR = sum of squares due to regression:

$$SSR = \sum (\hat{y}_i - \bar{y})^2$$

$SST = SSE + SSR$ and $R^2 = SSR/SST$. $R^2$ indicates the fraction of the total variation in the dependent variable about its mean that is explained by the regression line.
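The decomposition can be checked numerically. This sketch uses the Hasbro data from this deck together with the line $\hat{y} = -0.1 + 0.7x$, which is the least-squares fit for that data (an assumption, since the slides do not fit a line to it). The identity $SST = SSE + SSR$ holds for a least-squares fit with an intercept.

```python
def sums_of_squares(xs, ys, b0, b1):
    """Return (SSE, SSR, SST) for the fitted line y_hat = b0 + b1 * x."""
    y_bar = sum(ys) / len(ys)
    y_hat = [b0 + b1 * x for x in xs]
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained (error)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained by regression
    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation about the mean
    return sse, ssr, sst


sse, ssr, sst = sums_of_squares([1, 2, 3, 4, 5], [1, 1, 2, 2, 4], b0=-0.1, b1=0.7)
r_squared = ssr / sst
print(round(sse, 3), round(ssr, 3), round(sst, 3))  # 1.1 4.9 6.0
print(round(r_squared, 3))  # 0.817
```

Here $R^2 \approx 0.817$ equals the square of $r = .904$ from the correlation example, as expected for simple linear regression.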

$0 \le R^2 \le 1$
$R^2 = 0$: none of the variation is explained by the regression line.
$R^2 = 1$: all of the variation is explained by the regression line, that is, all the data points lie on the regression line.