Regression
Davina Bristow & Angela Quayle
Topics Covered:
Regression
Pearson's r
t test
Relevance to SPM
GLM
CORRELATION ≠ CAUSATION
Scattergrams
[Figure: three scatterplots of y against x, showing positive correlation, negative correlation, and no correlation]
Variance vs Covariance
Variance:
Gives information on the variability of a single variable.

s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
Covariance:
Gives information on the degree to which two variables vary together.
Note how similar the covariance is to the variance: the equation simply multiplies x's error scores by y's error scores, rather than squaring x's error scores.

\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
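The two formulas can be checked directly in Python (a minimal sketch; the function names `variance` and `covariance` are just illustrative):

```python
# Sample variance and covariance, exactly as in the slide formulas
# (note the n - 1 denominator: these are sample statistics).

def variance(x):
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / (n - 1)

def covariance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
```

As the note above suggests, the covariance of a variable with itself is just its variance: `covariance(x, x) == variance(x)`.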
Example Covariance

 x_i    y_i    x_i − x̄   y_i − ȳ   (x_i − x̄)(y_i − ȳ)
  0      3       −3         0              0
  2      2       −1        −1              1
  3      4        0         1              0
  4      0        1        −3             −3
  6      6        3         3              9
x̄ = 3  ȳ = 3                         Sum = 7

\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{7}{4} = 1.75
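The table's arithmetic can be reproduced in a few lines of Python (variable names are illustrative):

```python
x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]
n = len(x)
mx, my = sum(x) / n, sum(y) / n                      # both means are 3
products = [(xi - mx) * (yi - my) for xi, yi in zip(x, y)]
cov_xy = sum(products) / (n - 1)                     # 7 / 4 = 1.75
```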
Example: the problem with covariance

              Dataset 1                          Dataset 2
Subject    x      y    (x−x̄)(y−ȳ)         x      y    (x−x̄)(y−ȳ)
   1      101    100      2500            54     53        9
   2       81     80       900            53     52        4
   3       61     60       100            52     51        1
   4       51     50         0            51     50        0
   5       41     40       100            50     49        1
   6       21     20       900            49     48        4
   7        1      0      2500            48     47        9
Mean       51     50   Sum = 7000         51     50    Sum = 28

Covariance: 7000 / 6 = 1166.67            Covariance: 28 / 6 = 4.67

Both datasets are perfectly linearly related, yet the covariances are very different: covariance depends on the scale of measurement.
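Computing the two covariances in Python shows the scale problem: both datasets lie on a perfect straight line (each y is its x minus 1), yet the covariances differ by a factor of 250 (a minimal sketch; `cov` is an illustrative helper):

```python
def cov(x, y):
    # Sample covariance with an n - 1 denominator
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

x1 = [101, 81, 61, 51, 41, 21, 1]       # dataset 1
y1 = [100, 80, 60, 50, 40, 20, 0]
x2 = [54, 53, 52, 51, 50, 49, 48]       # dataset 2
y2 = [53, 52, 51, 50, 49, 48, 47]

cov1 = cov(x1, y1)                      # 7000 / 6 ≈ 1166.67
cov2 = cov(x2, y2)                      # 28 / 6   ≈ 4.67
```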
Solution: Pearson's r

r_{xy} = \frac{\mathrm{cov}(x, y)}{s_x s_y}

Pearson's r continued

\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
\quad\Rightarrow\quad
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}

r_{xy} = \frac{\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{n - 1}
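The z-score form of the formula can be coded directly (a sketch; `pearson_r` is an illustrative name, not a library function):

```python
import math

def pearson_r(x, y):
    # r = sum of products of z-scores, divided by n - 1
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)
```

Unlike the raw covariance, r is scale-free: both example datasets from the covariance slide give r = 1.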
Limitations of r
When r = 1 or r = −1:
We can predict y from x with certainty
All data points are on a straight line: y = ax + b
Note: r is actually r̂, an estimate of the true population correlation based on our sample.
Regression

Best-fit Line
ŷ = ax + b
a = slope, b = intercept
ŷ = predicted value
y_i = true value
ε = residual error

Residual (ε) = y − ŷ
Sum of squares of residuals = Σ(y − ŷ)²
Finding a and b
We choose the values of a and b that minimise S, the sum of squared residuals; at the minimum the gradient of S is zero (min S, gradient = 0).

The solution:
a = \frac{r\, s_y}{s_x}
Since the line passes through the means, \bar{y} = a\bar{x} + b, so:
b = \bar{y} - a\bar{x}
\hat{y} = ax + b = \frac{r\, s_y}{s_x} x + \bar{y} - \frac{r\, s_y}{s_x} \bar{x}

Rearranges to:

\hat{y} = \frac{r\, s_y}{s_x} (x - \bar{x}) + \bar{y}

If the correlation is zero, we will simply predict the mean of y for every value of x, and our regression line is just a flat straight line crossing the y-axis at ȳ.
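The solution for a and b translates directly into code (a sketch; `regression_line` is an illustrative name):

```python
import math

def regression_line(x, y):
    # Least-squares fit y-hat = a*x + b, with a = r * sy / sx, b = ybar - a * xbar
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / (n - 1))
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / (n - 1))
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)
    a = r * sy / sx
    b = my - a * mx
    return a, b
```

Note the zero-correlation case: with r = 0 the slope is 0 and the intercept is ȳ, i.e. the flat line that just predicts the mean of y.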
We can calculate the regression line for any data, but the important question is: how well does this line fit the data, or how good is it at predicting y from x?

How good is our model?

Total variance of y:
s_y^2 = \frac{\sum (y - \bar{y})^2}{n - 1} = \frac{SS_y}{df_y}

Variance of predicted values ŷ:
s_{\hat{y}}^2 = \frac{\sum (\hat{y} - \bar{y})^2}{n - 1} = \frac{SS_{pred}}{df_{\hat{y}}}

Error variance:
s_{error}^2 = \frac{\sum (y - \hat{y})^2}{n - 2} = \frac{SS_{er}}{df_{er}}
How good is our model cont.

Total variance = predicted variance + error variance:
s_y^2 = s_{\hat{y}}^2 + s_{er}^2

Conveniently, via r:
s_{\hat{y}}^2 = r^2 s_y^2

so r^2 = s_{\hat{y}}^2 / s_y^2: the proportion of variance in y accounted for by the model.

F-statistic:
F_{(df_{\hat{y}},\, df_{er})} = \frac{s_{\hat{y}}^2}{s_{er}^2} = \dots(\text{complicated rearranging})\dots = \frac{r^2 (n - 2)}{1 - r^2}

So all we need to know are r and n.
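The shortcut can be verified numerically: for simple regression df_ŷ = 1 and df_er = n − 2, and the sums-of-squares version of F agrees with r²(n − 2)/(1 − r²). A sketch on the small example dataset (all names illustrative):

```python
import math

x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((v - mx) ** 2 for v in x)                     # SS of x
syy = sum((v - my) ** 2 for v in y)                     # SS_y
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = sxy / math.sqrt(sxx * syy)

# Fit the line and partition the variance of y
slope = sxy / sxx
intercept = my - slope * mx
yhat = [slope * v + intercept for v in x]
ss_pred = sum((p - my) ** 2 for p in yhat)              # df_pred = 1
ss_err = sum((yi - p) ** 2 for yi, p in zip(y, yhat))   # df_err = n - 2

F_from_ss = (ss_pred / 1) / (ss_err / (n - 2))
F_from_r = r ** 2 * (n - 2) / (1 - r ** 2)              # needs only r and n
```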
Multiple regression
SPM
This is what SPM does and all will be explained next week!