
Multiple Regression
Fitting Models for Multiple Independent Variables

By Ellen Ludlow

If you wanted to predict someone's
weight based on his height, you
would collect data by recording the
heights and weights and fit a model.
Let's say our population is males
ages 16-25, and this is a table of
collected data...

height (in)  60   63   65   66   67   68   68   69   70   70   71   72   72   73   75
weight (lb)  120  135  130  143  137  149  144  150  156  152  154  162  169  163  168

Next, we graph the data...

[Scatterplot: Height vs Weight -- heights (55-80 in) plotted against weights (115-175 lb), showing a roughly linear pattern]

And because the data looks linear, we fit an LSR (least-squares regression) line.
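The presentation uses Minitab, but as an illustration, here is a minimal sketch of the same least-squares fit in Python with NumPy (the snippet and its variable names are ours, not from the slides):

    import numpy as np

    # Data from the table above (heights in inches, weights in pounds).
    height = np.array([60, 63, 65, 66, 67, 68, 68, 69, 70, 70, 71, 72, 72, 73, 75])
    weight = np.array([120, 135, 130, 143, 137, 149, 144, 150, 156, 152, 154, 162, 169, 163, 168])

    # Degree-1 polynomial fit: returns the slope and intercept of the
    # LSR line predicting weight from height.
    slope, intercept = np.polyfit(height, weight, 1)
    print(f"weight = {intercept:.2f} + {slope:.3f} * height")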

But weight isn't the only factor that
has an impact on someone's height.
The height of someone's parents may
be another predictor.
With multiple regression you may have
more than one independent variable,
so you could use someone's weight and
his parents' height to predict his own
height.

Our new table, with the average height
of each subject's parents added, looks
like this:

height (in)           60   63   65   66   67   68   68   69   70   70   71   72   72   73   75
weight (lb)           120  135  130  143  137  149  144  150  156  152  154  162  169  163  168
parents' height (in)  59   67   62   59   71   66   71   67   69   73   69   75   72   69   73

This data can't be graphed like simple
linear regression, because there are two
independent variables.
There is software, however, such as
Minitab, that can analyze data with
multiple independent variables.
Let's take a look at a Minitab output for
our data:

Predictor     Coef      Stdev     t-ratio   p
Constant      25.028    4.326     5.79      0.000
weight        0.24020   0.03140   7.65      0.000
parenth       0.11493   0.09035   1.27      0.227

s = 1.165    R-sq = 92.6%    R-sq(adj) = 91.4%

Analysis of Variance

SOURCE       DF    SS       MS       F       p
Regression    2    205.31   102.65   75.62   0.000
Error        12     16.29     1.36
Total        14    221.60

What does all this mean?

First, let's look at the multiple
regression model.
The general model for multiple
regression is similar to the model for
simple linear regression.
Simple linear regression model:

y = β₀ + β₁x

Multiple regression model:

y = β₀ + β₁x₁ + β₂x₂ + ... + βₖxₖ

Just like linear regression, when you fit a
multiple regression to data, the terms in
the model equation are statistics, not
parameters.
A multiple regression model using
statistical notation looks like...

ŷ = b₀ + b₁x₁ + b₂x₂ + ... + bₖxₖ

where k is the number of independent
variables.

The multiple regression model for our
data is:

height = 25.028 + 0.24020(weight) + 0.11493(parenth)

We get the coefficient values from the
Minitab output:

Predictor     Coef      Stdev     t-ratio   p
Constant      25.028    4.326     5.79      0.000
weight        0.24020   0.03140   7.65      0.000
parenth       0.11493   0.09035   1.27      0.227
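For comparison with the Minitab table, here is a sketch (ours, not the presenter's) of the same fit using NumPy's least-squares solver; the name parenth copies Minitab's label for the parents' average height:

    import numpy as np

    height  = np.array([60, 63, 65, 66, 67, 68, 68, 69, 70, 70, 71, 72, 72, 73, 75], float)
    weight  = np.array([120, 135, 130, 143, 137, 149, 144, 150, 156, 152, 154, 162, 169, 163, 168], float)
    parenth = np.array([59, 67, 62, 59, 71, 66, 71, 67, 69, 73, 69, 75, 72, 69, 73], float)

    # Design matrix with an intercept column:
    # height = b0 + b1*weight + b2*parenth
    X = np.column_stack([np.ones(len(height)), weight, parenth])
    b, *_ = np.linalg.lstsq(X, height, rcond=None)
    print(b)  # should land near Minitab's [25.028, 0.24020, 0.11493]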

Once the regression is fitted, we need
to know how well the model fits the
data.
First, we check and see if there is a
good overall fit.
Then, we test the significance of
each independent variable. You will
notice that this is the same way we
test for significance in a simple linear
regression.

The Overall Test
Hypotheses:

H₀: β₁ = β₂ = β₃ = ... = βₖ = 0

All independent variables are unimportant for
predicting y.

Hₐ: at least one βⱼ ≠ 0

At least one independent variable is useful
for predicting y.

What type of test should be used?
The distribution used is the F
distribution (named after R. A. Fisher).
The F-statistic is used with this
distribution.

[Picture: the F distribution]

How do you calculate the F-statistic?
It can easily be found in the Minitab output,
along with the p-value:

SOURCE       DF    SS       MS       F       p
Regression    2    205.31   102.65   75.62   0.000
Error        12     16.29     1.36
Total        14    221.60

Or you can calculate it by hand.

But, before you can calculate the
F-statistic, you need to be introduced to
some other terms.
Regression sum of squares
(regression SS) - the variation in Y
accounted for by the regression model
with respect to the mean model
Error sum of squares (error SS) - the
variation in Y not accounted for by the
regression model
Total sum of squares (total SS) - the
total variation in Y

Now that we understand these terms, we
need to know how to calculate them:

Regression SS = Σ (Ŷᵢ − Ȳ)²
Error SS      = Σ (Yᵢ − Ŷᵢ)²
Total SS      = Σ (Yᵢ − Ȳ)²

(each sum runs over i = 1, ..., n, where Ŷᵢ is the
fitted value for observation i and Ȳ is the mean of Y)

Total SS = Regression SS + Error SS
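As an illustration (again ours, in Python), the three sums of squares can be computed directly from the fitted values; with the data transcribed above, they should come out near Minitab's 205.31, 16.29, and 221.60:

    import numpy as np

    height  = np.array([60, 63, 65, 66, 67, 68, 68, 69, 70, 70, 71, 72, 72, 73, 75], float)
    weight  = np.array([120, 135, 130, 143, 137, 149, 144, 150, 156, 152, 154, 162, 169, 163, 168], float)
    parenth = np.array([59, 67, 62, 59, 71, 66, 71, 67, 69, 73, 69, 75, 72, 69, 73], float)

    X = np.column_stack([np.ones(len(height)), weight, parenth])
    b, *_ = np.linalg.lstsq(X, height, rcond=None)

    y_hat = X @ b              # fitted values Ŷᵢ
    y_bar = height.mean()      # mean Ȳ

    reg_ss   = ((y_hat - y_bar) ** 2).sum()    # variation explained by the model
    error_ss = ((height - y_hat) ** 2).sum()   # residual variation
    total_ss = ((height - y_bar) ** 2).sum()   # total variation in Y
    print(reg_ss, error_ss, total_ss)          # total_ss = reg_ss + error_ss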

There are also the regression mean
square, error mean square, and total
mean square (abbreviated MS).
To calculate these terms, you divide each
sum of squares by its respective degrees
of freedom:
Regression d.f. = k
Error d.f. = n−k−1
Total d.f. = n−1
where k is the number of independent variables
and n is the total number of observations used
to calculate the regression.

So:

Regression MS = Regression SS / k       = Σ (Ŷᵢ − Ȳ)² / k
Error MS      = Error SS / (n−k−1)      = Σ (Yᵢ − Ŷᵢ)² / (n−k−1)
Total MS      = Total SS / (n−1)        = Σ (Yᵢ − Ȳ)² / (n−1)

For our data, Regression MS = 205.31 / 2 = 102.65 and
Error MS = 16.29 / 12 ≈ 1.36, matching the Minitab output.
Note that, unlike the sums of squares, the mean squares are
not additive: in general, Regression MS + Error MS ≠ Total MS.

Both sum of squares and mean square
values can be found in Minitab:

SOURCE       DF    SS       MS       F       p
Regression    2    205.31   102.65   75.62   0.000
Error        12     16.29     1.36
Total        14    221.60
Now we can calculate the F-statistic.

Test Statistic and Distribution

Test statistic:

F = model mean square / error mean square
  = 102.65 / 1.36
  = 75.48

which is very close to the F-statistic from
Minitab (75.62); the small difference is just
rounding in the mean squares.
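The same arithmetic, plus the upper-tail p-value, can be sketched in Python (SciPy is assumed to be available for the F distribution; the variable names are ours):

    from scipy.stats import f

    k, n = 2, 15                       # independent variables, observations
    reg_ms   = 205.31 / k              # regression SS / regression d.f. = 102.655
    error_ms = 16.29 / (n - k - 1)     # error SS / error d.f. ≈ 1.3575

    F = reg_ms / error_ms              # ≈ 75.62, matching Minitab
    p = f.sf(F, k, n - k - 1)          # upper-tail area of F(2, 12), ≈ 0
    print(F, p)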

The p-value for the F-statistic is then
found in an F-distribution table. As you
saw before, it can also be easily
calculated by software.
A small p-value rejects the null
hypothesis that none of the independent
variables is significant. That is to say,
at least one of the independent
variables is significant.

The conclusion in the context of our
data is:
We have strong evidence (p ≈ 0) to
reject the null hypothesis. That is to
say, either someone's weight or his
parents' average height (or both) is
significant in predicting his height.
Once you know that at least one
independent variable is significant, you
can go on to test each independent
variable separately.

Testing Individual Terms
If an independent variable does not contribute
significantly to predicting the value of Y, the
coefficient of that variable will be 0.
The test of these hypotheses determines
whether the estimated coefficient is significantly
different from 0.
From this, we can tell whether an independent
variable is important for predicting the dependent
variable.

Test for Individual Terms:

H₀: βⱼ = 0

The independent variable, xⱼ, is not important
for predicting y.

Hₐ: βⱼ ≠ 0 (or βⱼ > 0, or βⱼ < 0)

The independent variable, xⱼ, is important for
predicting y.

where j identifies a specified independent variable

Test Statistic:

t = bⱼ / SE(bⱼ)    (the coefficient estimate divided by its standard error)

d.f. = n−k−1

Remember, this test is only to be
performed if the overall model test
is significant.
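For illustration, the individual t tests can be sketched in Python straight from the Coef and Stdev columns of the Minitab output (SciPy assumed; the snippet is ours):

    from scipy.stats import t as t_dist

    n, k = 15, 2
    df = n - k - 1                            # 12 degrees of freedom

    for name, b, se in [("weight", 0.24020, 0.03140),
                        ("parenth", 0.11493, 0.09035)]:
        t_stat = b / se                       # t = coefficient / standard error
        p = 2 * t_dist.sf(abs(t_stat), df)    # two-sided p-value
        print(f"{name}: t = {t_stat:.2f}, p = {p:.3f}")
        # reproduces Minitab's t-ratios and p-values:
        # weight:  t = 7.65, p = 0.000
        # parenth: t = 1.27, p = 0.227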

T-distribution

[Picture: the t distribution]

Tests of individual terms for
significance are the same as tests of
significance in simple linear regression.

A small p-value means that the independent
variable is significant.

Predictor     Coef      Stdev     t-ratio   p
Constant      25.028    4.326     5.79      0.000
weight        0.24020   0.03140   7.65      0.000
parenth       0.11493   0.09035   1.27      0.227

This test of significance shows that
weight is a significant independent
variable for predicting height, but
average parent height is not (p = 0.227).

Now that you know how to do tests of
significance for multiple regression,
there are many other things that you
can learn, such as:
- How to create confidence intervals
- How to use categorical variables in
multiple regression
- How to test for significance in groups
of independent variables
