
Published on STAT 501 (https://onlinecourses.science.psu.edu/stat501)

1.2 What is the "Best Fitting Line"?
Since we are interested in summarizing the trend between two quantitative variables, the natural question arises: "what is the best fitting line?" At some point in your education, you were probably shown a scatter plot of (x, y) data and were asked to draw the "most appropriate" line through the data. Even if you weren't, you can try it now on a set of heights (x) and weights (y) of 10 students (student_height_weight.txt) [1]. Looking at the plot below, which line, the solid line or the dashed line, do you think best summarizes the trend between height and weight?

Hold on to your answer! In order to examine which of the two lines is a better fit, we first need to introduce some common notation:
\(y_i\) denotes the observed response for experimental unit i
\(x_i\) denotes the predictor value for experimental unit i
\(\hat{y}_i\) is the predicted response (or fitted value) for experimental unit i

Then, the equation for the best fitting line is:

\[\hat{y}_i = b_0 + b_1 x_i\]

Incidentally, recall that an "experimental unit" is the object or person on which the measurement is made. In our height and weight example, the experimental units are students.
Let's try out the notation on our example with the trend summarized by the line w = -266.53 + 6.1376 h. (Note that this line is just a more precise version of the above solid line, w = -266.5 + 6.1 h.) The first data point in the list indicates that student 1 is 63 inches tall and weighs 127 pounds. That is, \(x_1 = 63\) and \(y_1 = 127\). Do you see this point on the plot? If we know this student's height but not his or her weight, we could use the equation of the line to predict his or her weight. We'd predict the student's weight to be -266.53 + 6.1376(63) or 120.1 pounds. That is, \(\hat{y}_1 = 120.1\). Clearly, our prediction wouldn't be perfectly correct; it has some "prediction error" (or "residual error"). In fact, the size of its prediction error is 127 - 120.1 or 6.9 pounds.
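As a quick check of this arithmetic, the following minimal sketch recomputes the predicted weight and the prediction error for student 1. It assumes the intercept is -266.53 (the sign that reproduces the 120.1-pound prediction); the function name `predict_weight` is illustrative and does not come from the course materials.

```python
# Coefficients of the solid line, w = -266.53 + 6.1376 h
# (negative intercept assumed; it is the value consistent with the text's result).
b0 = -266.53
b1 = 6.1376


def predict_weight(height_in):
    """Predicted weight in pounds for a height given in inches."""
    return b0 + b1 * height_in


x1, y1 = 63, 127             # student 1: height and observed weight
y1_hat = predict_weight(x1)  # fitted value for student 1
e1 = y1 - y1_hat             # prediction (residual) error for student 1

print(round(y1_hat, 1))  # 120.1
print(round(e1, 1))      # 6.9
```

The same function can be applied to any of the other nine heights to obtain their fitted values.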
Review each of the 10 data points to make sure you understand the notation used to keep track of the predictor values, the observed responses and the predicted responses:

 i    x_i    y_i    \hat{y}_i
 1     63    127    120.1
 2     64    121    126.3
 3     66    142    138.5
 4     69    157    157.0
 5     69    162    157.0
 6     71    156    169.2
 7     71    169    169.2
 8     72    165    175.4
 9     73    181    181.5
10     75    208    193.8
As you can see, the size of the prediction error depends on the data point. If we didn't know the weight of student 5, the equation of the line would predict his or her weight to be -266.53 + 6.1376(69) or 157 pounds. The size of the prediction error here is 162 - 157, or 5 pounds.
In general, when we use \(\hat{y}_i = b_0 + b_1 x_i\) to predict the actual response \(y_i\), we make a prediction error (or residual error) of size:

\[e_i = y_i - \hat{y}_i\]

A line that fits the data "best" will be one for which the n prediction errors, one for each observed data point, are as small as possible in some overall sense. One way to achieve this goal is to invoke the "least squares criterion," which says to "minimize the sum of the squared prediction errors." That is:
The equation of the best fitting line is: \(\hat{y}_i = b_0 + b_1 x_i\)
We just need to find the values \(b_0\) and \(b_1\) that make the sum of the squared prediction errors the smallest it can be.
That is, we need to find the values \(b_0\) and \(b_1\) that minimize:

\[Q = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\]

Here's how you might think about this quantity Q:
The quantity \(e_i = y_i - \hat{y}_i\) is the prediction error for data point i.
The quantity \(e_i^2 = (y_i - \hat{y}_i)^2\) is the squared prediction error for data point i.
And, the symbol \(\sum_{i=1}^{n}\) tells us to add up the squared prediction errors for all n data points.
Incidentally, if we didn't square the prediction error \(e_i = y_i - \hat{y}_i\) to get \(e_i^2 = (y_i - \hat{y}_i)^2\), the positive and negative prediction errors would tend to cancel each other out when summed; for the least squares line, they always sum to exactly 0.
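To see why the errors are squared before summing, here is a small sketch that uses the 10 heights and weights from the example and the line w = -266.53 + 6.1376 h (negative intercept assumed, as the predictions require). The variable names are illustrative. Because the coefficients are rounded, the raw errors sum to a value close to, but not exactly, zero, while the squared errors do not cancel.

```python
# Heights (inches) and weights (pounds) of the 10 students in the example.
heights = [63, 64, 66, 69, 69, 71, 71, 72, 73, 75]
weights = [127, 121, 142, 157, 162, 156, 169, 165, 181, 208]

# Rounded coefficients of the solid line (assumed intercept sign: negative).
b0, b1 = -266.53, 6.1376

# Prediction error e_i = y_i - (b0 + b1 * x_i) for every student.
errors = [y - (b0 + b1 * x) for x, y in zip(heights, weights)]

raw_sum = sum(errors)            # positive and negative errors cancel
Q = sum(e ** 2 for e in errors)  # squared errors cannot cancel

print(round(raw_sum, 2))  # -0.06
print(round(Q, 1))        # 597.4
```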
Now, being familiar with the least squares criterion, let's take a fresh look at our plot. In light of the least squares criterion, which line do you now think is the best fitting line?

Let's see how you did! The following two tables illustrate the implementation of the least squares criterion for the two lines up for consideration: the dashed line and the solid line.
w = -331.2 + 7.1 h (the dashed line)

 i    x_i    y_i    \hat{y}_i    (y_i - \hat{y}_i)    (y_i - \hat{y}_i)^2
 1     63    127    116.1         10.9                118.81
 2     64    121    123.2         -2.2                  4.84
 3     66    142    137.4          4.6                 21.16
 4     69    157    158.7         -1.7                  2.89
 5     69    162    158.7          3.3                 10.89
 6     71    156    172.9        -16.9                285.61
 7     71    169    172.9         -3.9                 15.21
 8     72    165    180.0        -15.0                225.00
 9     73    181    187.1         -6.1                 37.21
10     75    208    201.3          6.7                 44.89
                                                      ______
                                               Sum:   766.5

w = -266.53 + 6.1376 h (the solid line)

 i    x_i    y_i    \hat{y}_i    (y_i - \hat{y}_i)    (y_i - \hat{y}_i)^2
 1     63    127    120.139        6.8612              47.076
 2     64    121    126.276       -5.2764              27.840
 3     66    142    138.552        3.4484              11.891
 4     69    157    156.964        0.0356               0.001
 5     69    162    156.964        5.0356              25.357
 6     71    156    169.240      -13.2396             175.287
 7     71    169    169.240       -0.2396               0.057
 8     72    165    175.377      -10.3772             107.686
 9     73    181    181.515       -0.5148               0.265
10     75    208    193.790       14.2100             201.924
                                                      ______
                                               Sum:   597.4

Based on the least squares criterion, which equation best summarizes the data? The sum of the squared prediction errors is 766.5 for the dashed line, while it is only 597.4 for the solid line. Therefore, of the two lines, the solid line, w = -266.53 + 6.1376 h, best summarizes the data. But, is this equation guaranteed to be the best fitting line of all of the possible lines we didn't even consider? Of course not!
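The comparison of the two candidate lines can be reproduced with a short sketch. The helper name `sum_squared_errors` is hypothetical; the data and the two lines (with negative intercepts, as the fitted values in the tables require) are the ones used in this example.

```python
# Heights (inches) and weights (pounds) of the 10 students in the example.
heights = [63, 64, 66, 69, 69, 71, 71, 72, 73, 75]
weights = [127, 121, 142, 157, 162, 156, 169, 165, 181, 208]


def sum_squared_errors(b0, b1, xs, ys):
    """Least squares criterion Q for the candidate line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


q_dashed = sum_squared_errors(-331.2, 7.1, heights, weights)
q_solid = sum_squared_errors(-266.53, 6.1376, heights, weights)

print(round(q_dashed, 1))  # 766.5
print(round(q_solid, 1))   # 597.4

# The line with the smaller Q is the better fit of the two candidates.
better = "solid" if q_solid < q_dashed else "dashed"
print(better)  # solid
```

Trying other values of `b0` and `b1` in this function shows how quickly a search by hand becomes impractical.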
If we used the above approach for finding the equation of the line that minimizes the sum of the squared prediction errors, we'd have our work cut out for us. We'd have to implement the above procedure for an infinite number of possible lines, clearly an impossible task! Fortunately, somebody has done some dirty work for us by figuring out formulas for the intercept \(b_0\) and the slope \(b_1\) for the equation of the line that minimizes the sum of the squared prediction errors.
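The formulas are determined using methods of calculus. We minimize the equation for the sum of the squared prediction errors:

\[Q = \sum_{i=1}^{n} (y_i - (b_0 + b_1 x_i))^2\]

(that is, take the derivative with respect to \(b_0\) and \(b_1\), set to 0, and solve for \(b_0\) and \(b_1\)) and get the "least squares estimates" for \(b_0\) and \(b_1\):

\[b_0 = \bar{y} - b_1 \bar{x}\]

and:

\[b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\]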
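For readers who want the calculus step spelled out, the following is a sketch of the standard derivation. It uses only the symbols already defined here, with \(\bar{x}\) and \(\bar{y}\) denoting the sample means of the \(x_i\) and \(y_i\).

```latex
\begin{align*}
\frac{\partial Q}{\partial b_0}
  &= -2\sum_{i=1}^{n}\bigl(y_i - b_0 - b_1 x_i\bigr) = 0
  \quad\Longrightarrow\quad b_0 = \bar{y} - b_1\bar{x}, \\
\frac{\partial Q}{\partial b_1}
  &= -2\sum_{i=1}^{n} x_i\bigl(y_i - b_0 - b_1 x_i\bigr) = 0 .
\end{align*}
% Substitute b_0 = \bar{y} - b_1 \bar{x} into the second equation:
\begin{align*}
\sum_{i=1}^{n} x_i (y_i - \bar{y}) - b_1 \sum_{i=1}^{n} x_i (x_i - \bar{x}) = 0
  \quad\Longrightarrow\quad
b_1 = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}
    = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} .
\end{align*}
```

The last equality holds because \(\sum_{i=1}^{n} (y_i - \bar{y}) = 0\) and \(\sum_{i=1}^{n} (x_i - \bar{x}) = 0\), so subtracting \(\bar{x}\) inside each sum changes nothing.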

Because the formulas for \(b_0\) and \(b_1\) are derived using the least squares criterion, the resulting equation \(\hat{y}_i = b_0 + b_1 x_i\) is often referred to as the "least squares regression line," or simply the "least squares line." It is also sometimes called the "estimated regression equation." Incidentally, note that in deriving the above formulas, we made no assumptions about the data other than that they follow some sort of linear trend.
We can see from these formulas that the least squares line passes through the point \((\bar{x}, \bar{y})\), since when \(x = \bar{x}\), then \(y = b_0 + b_1 \bar{x} = \bar{y} - b_1 \bar{x} + b_1 \bar{x} = \bar{y}\).
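The formulas for the intercept and slope, and the fact that the fitted line passes through the point of means, can be checked numerically on the height and weight data. This is a minimal sketch in plain Python; the variable names (`sxy`, `sxx`, and so on) are illustrative choices.

```python
# Heights (inches) and weights (pounds) of the 10 students in the example.
heights = [63, 64, 66, 69, 69, 71, 71, 72, 73, 75]
weights = [127, 121, 142, 157, 162, 156, 169, 165, 181, 208]

n = len(heights)
x_bar = sum(heights) / n  # mean height
y_bar = sum(weights) / n  # mean weight

# Numerator and denominator of the slope formula.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
sxx = sum((x - x_bar) ** 2 for x in heights)

b1 = sxy / sxx             # least squares slope
b0 = y_bar - b1 * x_bar    # least squares intercept

print(round(b0, 2), round(b1, 4))  # -266.53 6.1376

# The fitted line evaluated at x_bar returns y_bar (up to floating-point error).
passes_through_means = abs((b0 + b1 * x_bar) - y_bar) < 1e-9
print(passes_through_means)  # True
```

The computed values agree with the solid line w = -266.53 + 6.1376 h used throughout the example.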

In practice, you won't really need to worry about the formulas for \(b_0\) and \(b_1\). Instead, you are going to let statistical software, such as Minitab, find least squares lines for you. But, we can still learn something from the formulas, for \(b_1\) in particular.
If you study the formula for the slope \(b_1\):

\[b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\]

you see that the denominator is necessarily positive since it only involves summing squared terms (provided the x values are not all identical). Therefore, the sign of the slope \(b_1\) is solely determined by the numerator. The numerator tells us, for each data point, to sum up the product of two distances: the distance of the x-value from the mean of all of the x-values and the distance of the y-value from the mean of all of the y-values. Let's see how this determines the sign of the slope \(b_1\) by studying the following two plots.
When is the slope \(b_1 > 0\)? Do you agree that the trend in the following plot is positive, that is, as x increases, y tends to increase? If the trend is positive, then the slope \(b_1\) must be positive. Let's see how!
Consider a data point in the upper right quadrant, where x is above the mean of the x-values and y is above the mean of the y-values. Note that the product of the two distances for this data point is positive. In fact, the product of the two distances is positive for any data point in the upper right quadrant.
Now, consider a data point in the lower left quadrant, where both x and y are below their means. Note that the product of the two distances for this data point is also positive. In fact, the product of the two distances is positive for any data point in the lower left quadrant.

Adding up all of these positive products must necessarily yield a positive number, and hence the slope of the line \(b_1\) will be positive.
When is the slope \(b_1 < 0\)? Now, do you agree that the trend in the following plot is negative, that is, as x increases, y tends to decrease? If the trend is negative, then the slope \(b_1\) must be negative. Let's see how!
Consider a data point in the upper left quadrant, where x is below the mean of the x-values and y is above the mean of the y-values. Note that the product of the two distances for this data point is negative. In fact, the product of the two distances is negative for any data point in the upper left quadrant.
Now, consider a data point in the lower right quadrant, where x is above its mean and y is below its mean. Note that the product of the two distances for this data point is also negative. In fact, the product of the two distances is negative for any data point in the lower right quadrant.

Adding up all of these negative products must necessarily yield a negative number, and hence the slope of the line \(b_1\) will be negative.
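The quadrant argument can be tried on the height and weight data from earlier in this section. The sketch below computes the product of the two distances for each student; the variable names are illustrative. In real data the products are usually a mix of signs, and the slope takes the sign of whichever group dominates the sum.

```python
# Heights (inches) and weights (pounds) of the 10 students in the example.
heights = [63, 64, 66, 69, 69, 71, 71, 72, 73, 75]
weights = [127, 121, 142, 157, 162, 156, 169, 165, 181, 208]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Product of the two distances from the means for each data point.
products = [(x - x_bar) * (y - y_bar) for x, y in zip(heights, weights)]

# Positive products: upper right or lower left quadrant.
n_positive = sum(1 for p in products if p > 0)
# Negative products: upper left or lower right quadrant.
n_negative = sum(1 for p in products if p < 0)

numerator = sum(products)  # numerator of the slope formula

print(n_positive, n_negative)  # 8 2
print(round(numerator, 1))     # 847.6
```

Eight of the ten products are positive and their total outweighs the two negative ones, which is consistent with the positive slope found for these data.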

Now that we finished that investigation, you can just set aside the formulas for \(b_0\) and \(b_1\). Again, in practice, you are going to let statistical software, such as Minitab, find least squares lines for you. We can obtain the estimated regression equation in two different places in Minitab. The following plot illustrates where you can find the least squares line (in box) on Minitab's "fitted line plot."

The following Minitab output illustrates where you can find the least squares line (in box) in Minitab's "standard regression analysis" output.

Note that the estimated values \(b_0\) and \(b_1\) also appear in a table under the columns labeled "Predictor" (the intercept \(b_0\) is always referred to as the "Constant" in Minitab) and "Coef" (for "Coefficients"). Also, note that the value we obtained by minimizing the sum of the squared prediction errors, 597.4, appears in the "Analysis of Variance" table appropriately in a row labeled "Residual Error" and under a column labeled "SS" (for "Sum of Squares").
Although we've learned how to obtain the "estimated regression coefficients" \(b_0\) and \(b_1\), we've not yet discussed what we learn from them. One thing they allow us to do is to predict future responses, one of the most common uses of an estimated regression line. This use is rather straightforward:
A common use of the estimated regression line:

\[\hat{y}_{i,wt} = -267 + 6.14 x_{i,ht}\]

Predict (mean) weight of 66-inch tall people:

\[\hat{y}_{i,wt} = -267 + 6.14(66) = 138.24\]

Predict (mean) weight of 67-inch tall people:

\[\hat{y}_{i,wt} = -267 + 6.14(67) = 144.38\]
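The two predictions above can be reproduced with a one-line function. This sketch uses the rounded coefficients, assuming the intercept is -267 (the sign that yields 138.24 and 144.38); the function name `predicted_mean_weight` is an illustrative choice.

```python
def predicted_mean_weight(height_in):
    """Predicted mean weight (pounds) from the rounded estimated regression line."""
    return -267 + 6.14 * height_in


w66 = predicted_mean_weight(66)
w67 = predicted_mean_weight(67)

print(round(w66, 2))  # 138.24
print(round(w67, 2))  # 144.38
```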

Now, what does \(b_0\) tell us? The answer is obvious when you evaluate the estimated regression equation at x = 0. Here, it tells us that a person who is 0 inches tall is predicted to weigh -267 pounds! Clearly, this prediction is nonsense. This happened because we "extrapolated" beyond the "scope of the model" (the range of the x values). It is not meaningful to have a height of 0 inches, that is, the scope of the model does not include x = 0. So, here the intercept \(b_0\) is not meaningful. In general, if the "scope of the model" includes x = 0, then \(b_0\) is the predicted mean response when x = 0. Otherwise, \(b_0\) is not meaningful.
And, what does \(b_1\) tell us? The answer is obvious when you subtract the predicted weight of 66-inch tall people from the predicted weight of 67-inch tall people. We obtain 144.38 - 138.24 = 6.14 pounds, the value of \(b_1\). Here, it tells us that we predict the mean weight to increase by 6.14 pounds for every additional one-inch increase in height. In general, we can expect the mean response to increase or decrease by \(b_1\) units for every one unit increase in x.
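One way to keep these interpretations honest in code is to refuse predictions outside the scope of the model. The sketch below is an illustration only: it takes the scope to be the observed height range of 63 to 75 inches, uses the rounded coefficients with an intercept of -267, and the names `predict_in_scope`, `X_MIN` and `X_MAX` are hypothetical.

```python
B0, B1 = -267, 6.14    # rounded intercept and slope (intercept sign assumed negative)
X_MIN, X_MAX = 63, 75  # smallest and largest observed heights, in inches


def predict_in_scope(height_in):
    """Predict mean weight, but only for heights within the observed range."""
    if not X_MIN <= height_in <= X_MAX:
        raise ValueError(
            f"height {height_in} is outside the scope of the model "
            f"({X_MIN} to {X_MAX} inches)"
        )
    return B0 + B1 * height_in


# A one-inch increase in height changes the prediction by the slope b1.
slope_check = predict_in_scope(67) - predict_in_scope(66)
print(round(slope_check, 2))  # 6.14

# Evaluating at x = 0 is extrapolation, so the guard rejects it.
try:
    predict_in_scope(0)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

Raising an error is just one design choice; a warning or a flag on the returned value would serve the same purpose.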
Source URL: https://onlinecourses.science.psu.edu/stat501/node/252
Links:
[1] https://onlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/student_height_weight.txt
