
Where do the linear regression equations come from?

Time out for a calculus break

We want to minimize the sum of the squared residuals:

$$\mathrm{SSE} = \sum_{\text{all data}} (y - \hat{y})^2$$

But $\hat{y} = a + bx$, so we can substitute into SSE to get

$$\mathrm{SSE} = \sum_{\text{all data}} (y - a - bx)^2$$

Since we want to find the values of a and b that make SSE a minimum, a and b are the variables.
Take the derivative of SSE with respect to a and the derivative of SSE with respect to b. Then set the
derivatives equal to 0 to obtain equations, which we will later solve to find the values of a and b.
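
To make the idea that a and b are the variables concrete, here is a minimal Python sketch that treats SSE as a function of a and b for a fixed data set. The function name sse and the four data points are hypothetical, chosen only for illustration; they are not part of the original handout.

```python
def sse(a, b, xs, ys):
    """Sum of squared residuals for the candidate line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data (an assumption, used only for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

# The data stay fixed; only the intercept a and slope b change.
for a, b in [(0.0, 1.5), (0.5, 1.4), (1.0, 1.0)]:
    print(f"a={a}, b={b}, SSE={sse(a, b, xs, ys):.4f}")
```

Different choices of a and b give different values of SSE; the least-squares line is the choice that makes SSE as small as possible.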

$$\frac{\partial}{\partial a} \sum_{\text{all data}} (y - a - bx)^2 = \sum_{\text{all data}} \left[ 2(y - a - bx)(-1) \right] = -2 \sum_{\text{all data}} (y - a - bx) = 0$$

$$\frac{\partial}{\partial b} \sum_{\text{all data}} (y - a - bx)^2 = \sum_{\text{all data}} \left[ 2(y - a - bx)(-x) \right] = -2 \sum_{\text{all data}} \left[ x(y - a - bx) \right] = -2 \sum_{\text{all data}} (xy - ax - bx^2) = 0$$
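
The two partial derivatives can be sanity-checked numerically. The sketch below compares the derivative formulas with central finite differences of SSE at an arbitrary point. The function names, the sample data, the evaluation point (a0, b0) and the step size h are all assumptions made for illustration.

```python
def sse(a, b, xs, ys):
    """Sum of squared residuals for the line y_hat = a + b*x."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


def dsse_da(a, b, xs, ys):
    """Partial derivative of SSE with respect to a: -2 * sum(y - a - b*x)."""
    return -2 * sum(y - a - b * x for x, y in zip(xs, ys))


def dsse_db(a, b, xs, ys):
    """Partial derivative of SSE with respect to b: -2 * sum(x*(y - a - b*x))."""
    return -2 * sum(x * (y - a - b * x) for x, y in zip(xs, ys))


# Hypothetical sample data and evaluation point (assumptions).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
a0, b0, h = 1.0, 2.0, 1e-6

# Central differences approximate each partial derivative.
num_da = (sse(a0 + h, b0, xs, ys) - sse(a0 - h, b0, xs, ys)) / (2 * h)
num_db = (sse(a0, b0 + h, xs, ys) - sse(a0, b0 - h, xs, ys)) / (2 * h)

print(num_da, dsse_da(a0, b0, xs, ys))
print(num_db, dsse_db(a0, b0, xs, ys))
```

Because SSE is quadratic in a and b, the central differences should agree with the formulas up to rounding error.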

By breaking up the sums, we can “simplify” this into the two equations with two unknowns a and b

$$-\sum_{\text{all data}} y + na + b \sum_{\text{all data}} x = 0$$

$$-\sum_{\text{all data}} (xy) + a \sum_{\text{all data}} x + b \sum_{\text{all data}} (x^2) = 0$$
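
Since these are two linear equations in a and b, they can also be solved directly as a 2-by-2 system. The sketch below does this with Cramer's rule, which is one of several possible solution methods and not necessarily the one a calculator uses. The function name solve_normal_equations and the sample data are hypothetical.

```python
def solve_normal_equations(xs, ys):
    """Solve  n*a + b*sum(x) = sum(y)  and  a*sum(x) + b*sum(x^2) = sum(x*y)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Determinant of the coefficient matrix [[n, sum_x], [sum_x, sum_x2]].
    det = n * sum_x2 - sum_x * sum_x
    if det == 0:
        raise ValueError("the x values do not vary, so no unique line exists")

    # Cramer's rule for the two unknowns.
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


# Hypothetical sample data (an assumption, used only for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
a, b = solve_normal_equations(xs, ys)
print(a, b)
```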

These equations are linear in a and b, so they are not “difficult” to solve, although the algebra requires a
lot of care and patience because the coefficients of the variables a and b are sums. Some cleverness in
substituting means for sums helps to further “simplify” the equations to make them easier to work with.
Solving these equations to obtain the values of a and b that will minimize the SSE gives us:
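
$$a = \frac{\sum_{\text{all data}} y - b \sum_{\text{all data}} x}{n} = \bar{y} - b\bar{x}$$

$$-\sum_{\text{all data}} (xy) + (\bar{y} - b\bar{x}) \sum_{\text{all data}} x + b \sum_{\text{all data}} (x^2) = 0$$

$$-\sum_{\text{all data}} (xy) + \bar{y} \sum_{\text{all data}} x - b\bar{x} \sum_{\text{all data}} x + b \sum_{\text{all data}} (x^2) = 0$$

$$-\sum_{\text{all data}} (xy) + n\bar{x}\bar{y} - bn\bar{x}^2 + b \sum_{\text{all data}} (x^2) = 0$$

Finally,

$$b = \frac{\sum_{\text{all data}} (xy) - n\bar{x}\bar{y}}{\sum_{\text{all data}} (x^2) - n\bar{x}^2};$$

after finding b, substitute its value to find a using $a = \bar{y} - b\bar{x}$.
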
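
The closed-form result can be written as a short function. This is a minimal sketch, assuming plain Python lists of numbers; the function name fit_line and the sample data are hypothetical. It computes b first, then a from the means, and then checks that both derivative conditions hold: the residuals sum to 0, and so do the residuals weighted by x.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need at least two (x, y) pairs of equal length")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Denominator of the slope formula: sum(x^2) - n * x_bar^2.
    denom = sum_x2 - n * x_bar ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b = (sum_xy - n * x_bar * y_bar) / denom
    a = y_bar - b * x_bar
    return a, b


# Hypothetical sample data (an assumption, used only for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
a, b = fit_line(xs, ys)

# Both sums should be 0 up to rounding error at the minimum of SSE.
residual_sum = sum(y - a - b * x for x, y in zip(xs, ys))
weighted_residual_sum = sum(x * (y - a - b * x) for x, y in zip(xs, ys))
print(a, b, residual_sum, weighted_residual_sum)
```

The step from sums to means relies on the identity that the sum of the x values equals n times their mean, and likewise for y.
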
Your calculator is very good at this type of tedious, repetitive calculation. It has the formulas
programmed into it and applies them to the data you enter to quickly calculate the values of a and b.

If you want more information about the theory and derivation of the equations for simple linear
regression, correlation and the coefficient of variation, visit the Mathworld website:
http://mathworld.wolfram.com/LeastSquaresFitting.html
