
SAS Simple Linear Regression Example

This handout gives examples of how to use SAS to generate a simple linear regression plot, check the
correlation between two variables, fit a simple linear regression model, check the residuals from the model, and
also shows some of the ODS (Output Delivery System) output in SAS.
Read in Raw Data
We first read in the raw data from the werner2.dat raw dataset, and set up the missing value codes using a data
step, and then check descriptive statistics for the numeric variables, using Proc Means.
OPTIONS FORMCHAR="|----|+|---+=|-/\<>*";
libname b510 "C:\Users\kwelch\Desktop\B510";
/* Read the fixed-column raw data; recode the numeric missing-value codes to SAS missing (.) */
DATA b510.werner;
INFILE "C:\Users\kwelch\Desktop\B510\werner2.dat";
/* .1 after a column range requests one implied decimal place */
INPUT ID 1-4 AGE 5-8 HT 9-12 WT 13-16
PILL 17-20 CHOL 21-24 ALB 25-28 .1
CALC 29-32 .1 URIC 33-36 .1;
IF HT = 999 THEN HT = .;
IF WT = 999 THEN WT = .;
IF CHOL = 600 THEN CHOL = .;
IF ALB = 99 THEN ALB = .;
IF CALC = 99 THEN CALC = .;
IF URIC = 99 THEN URIC = .;
run;
/*Check the Data*/
title "DESCRIPTIVE STATISTICS";
proc means data=b510.werner;
run;
DESCRIPTIVE STATISTICS
The MEANS Procedure
Variable     N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
ID         188         1598.96         1057.09       3.0000000         3519.00
AGE        188      33.8191489      10.1126942      19.0000000      55.0000000
HT         186      64.5107527       2.4850673      57.0000000      71.0000000
WT         186     131.6720430      20.6605767      94.0000000     215.0000000
PILL       188       1.5000000       0.5013351       1.0000000       2.0000000
CHOL       187     235.1550802      44.5706219      50.0000000     390.0000000
ALB        186       4.1112903       0.3579694       3.2000000       5.0000000
CALC       185       9.9621622       0.4795556       8.6000000      11.1000000
URIC       187       4.7705882       1.1572312       2.2000000       9.9000000
-------------------------------------------------------------------------------
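As a quick check that the missing-value recoding worked, the number of missing values per variable can be requested directly. This is only a sketch: it assumes the b510 libref and the b510.werner dataset created above, and the title text is made up for the example. Since 188 records are read, N and NMISS should add up to 188 for each variable, and none of the maximum values should still equal a missing-value code (999, 600 or 99).

```sas
/* Sketch: count missing values after the recoding step.            */
/* Assumes the b510 libref and b510.werner from the data step above. */
title "Check missing-value recoding";
proc means data=b510.werner n nmiss min max;
  var ht wt chol alb calc uric;
run;
```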
Correlation
We now check the correlation between the response (or dependent) variable, CHOL, and the predictor (or
independent) variable, AGE. It is positive, and significant (r = .369, p < .0001). Note that there are 188
observations for AGE, but only 187 for CHOL, and that the correlation is based on the 187 observations that
have values for both variables.

title "Pearson Correlation";
proc corr data=b510.werner;
var age chol;
run;
Pearson Correlation
The CORR Procedure
2 Variables: AGE CHOL
Simple Statistics
Variable     N        Mean     Std Dev       Sum     Minimum     Maximum
AGE        188    33.81915    10.11269      6358    19.00000    55.00000
CHOL       187   235.15508    44.57062     43974    50.00000   390.00000
Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations
             AGE        CHOL
AGE      1.00000     0.36923
                      <.0001
             188         187
CHOL     0.36923     1.00000
          <.0001
             187         187
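The p-value reported by Proc Corr comes from a t-test on the correlation. As a rough check, assuming the usual test of H0: Rho = 0 with n - 2 degrees of freedom, and using the n = 187 complete pairs and r = 0.36923 from the output:

```latex
\[
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{0.36923\,\sqrt{185}}{\sqrt{1-0.36923^{2}}}
  \approx \frac{5.022}{0.9293}
  \approx 5.40
\]
```

With 185 degrees of freedom, a t statistic of this size corresponds to p < .0001.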
Scatterplot
We now check a bivariate scatterplot to assess whether the relationship between CHOL and AGE appears to be
linear, and to check for outliers. Although there is not a very tight relationship between these two variables, it
does appear that the relationship is linear and increasing.
title "Scatterplot with Regression Line";
proc sgplot data=b510.werner;
reg y=chol x=age;
run;
Simple Linear Regression
We now fit a linear regression model, with CHOL as the Y (dependent or outcome) variable and AGE as the X
(independent or predictor) variable, using Proc Reg. We first illustrate the most basic Proc Reg syntax, and then
show some useful options. The Quit statement is used to tell SAS that there are no more statements coming for
this run of Proc Reg.
The output shows that there is a positive relationship between these two variables. When age increases by one
year, average cholesterol is predicted to increase by 1.63 units, and this is a significant relationship (t(185) =
5.40, p < .0001). Note that the degrees of freedom for the t-test are 185, the same as the error degrees of
freedom. The model R-square (.1363) is the square of the correlation between the two variables. There were
187 observations used in the regression model.
title "Simple Linear Regression Model with no options";
proc reg data=b510.werner;
model chol = age;
run; quit;
Simple Linear Regression Model with no options
The REG Procedure
Model: MODEL1
Dependent Variable: CHOL
Number of Observations Read                   188
Number of Observations Used                   187
Number of Observations with Missing Values      1
Analysis of Variance
                              Sum of          Mean
Source              DF       Squares        Square    F Value    Pr > F
Model                1         50373         50373      29.20    <.0001
Error              185        319123    1724.99020
Corrected Total    186        369497
Root MSE           41.53300    R-Square    0.1363
Dependent Mean    235.15508    Adj R-Sq    0.1317
Coeff Var          17.66196
Parameter Estimates
                 Parameter     Standard
Variable    DF    Estimate        Error    t Value    Pr > |t|
Intercept    1   179.96174     10.65564      16.89      <.0001
AGE          1     1.62897      0.30144       5.40      <.0001
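The parameter estimates can be read as a prediction equation. As a worked illustration using the values in the output (the age of 25 is chosen only as an example):

```latex
\[
\widehat{\mathrm{CHOL}} = 179.96174 + 1.62897 \times \mathrm{AGE},
\qquad
\widehat{\mathrm{CHOL}}\,\big|_{\mathrm{AGE}=25} = 179.96174 + 1.62897 \times 25 \approx 220.69
\]
\[
R^{2} = r^{2} = 0.36923^{2} \approx 0.1363,
\qquad
t^{2} = 5.40^{2} \approx 29.2 \approx F
\]
```

The second line shows two identities that hold for a regression with a single predictor: R-square equals the squared correlation, and the F statistic equals the square of the t statistic for the slope, up to rounding.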
Simple Linear Regression with Diagnostic Plots
We now include some diagnostic plots using Proc Reg. We also generate a new dataset called OUTREG1 that
contains all of the original variables, plus the predicted value for each observation (PREDICT), the residual
(RESID), the studentized-deleted residual (RSTUD), and Cook's Distance (COOKD).
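ods graphics on;
title "Simple Linear Regression with Diagnostic Plots";
proc reg DATA=B510.werner;
MODEL CHOL=AGE / stb clb;
/* RSTUD and the confidence/prediction limits are named here because the later Proc Print step lists them */
OUTPUT OUT=OUTREG1 P=PREDICT R=RESID RSTUDENT=RSTUD COOKD=COOKD
LCL=LCL UCL=UCL LCLM=LCLM UCLM=UCLM;
run; quit;
ods graphics off;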
The partial output below shows the standardized estimate (obtained with the STB option), which shows the
estimated change in Y (in standard deviation units) when X is increased by one standard deviation. This
estimate is 0.369. We also see the 95% confidence limits for the parameter estimate (obtained with the CLB
option), which are from 1.03 to 2.22.
Parameter Estimates
                 Parameter     Standard                          Standardized
Variable    DF    Estimate        Error    t Value    Pr > |t|       Estimate
Intercept    1   179.96174     10.65564      16.89      <.0001              0
AGE          1     1.62897      0.30144       5.40      <.0001        0.36923
Parameter Estimates
Variable    DF      95% Confidence Limits
Intercept    1    158.93955     200.98392
AGE          1      1.03426       2.22368
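The confidence limits for the AGE slope can be reproduced by hand as estimate plus or minus a t critical value times the standard error. The data step below is a minimal sketch of that arithmetic: the estimate, standard error and degrees of freedom are typed in from the output above, and the variable names (b, se, df, tcrit, lower, upper) are made up for the example.

```sas
/* Sketch: 95% confidence limits for the AGE slope, computed by hand. */
/* Inputs are copied from the Parameter Estimates table above.        */
data _null_;
  b  = 1.62897;              /* parameter estimate for AGE           */
  se = 0.30144;              /* its standard error                   */
  df = 185;                  /* error degrees of freedom             */
  tcrit = tinv(0.975, df);   /* two-sided 95% critical value of t    */
  lower = b - tcrit*se;
  upper = b + tcrit*se;
  put tcrit= lower= upper=;  /* results are written to the SAS log   */
run;
```

The limits written to the log should agree with the CLB limits for AGE (roughly 1.03 to 2.22).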
The diagnostic panel shows a series of diagnostic plots for this regression model.
The residual plot below shows a scatterplot with the residuals on the Y-axis and AGE on the X-axis. We want to
look for a lack of pattern in these residuals. We can see that there is one low outlier, at about age 25.
The fit plot shown below shows the regression model fit, and summarizes some of the statistics for the model.
Check the output dataset
We now check the output dataset, using Proc Print. We also request that Proc Print display the labels for
each variable, by using the Label option. We print selected variables for those observations with the absolute
value of the studentized deleted residuals being greater than or equal to 3, using a Where statement.
title "Partial Listing of Output Dataset";
proc print data=outreg1 label;
where abs(rstud) >=3;
VAR ID AGE CHOL PREDICT RESID RSTUD COOKD LCL UCL LCLM UCLM;
run;
Partial Listing of Output Dataset
Obs    ID   AGE  CHOL  PREDICT     RESID     RSTUD     COOKD      LCL      UCL     LCLM     UCLM
  4  1797    25    50  220.686  -170.686  -4.32214  0.081802  138.358  303.014  212.698  228.674
182  3134    50   390  261.410   128.590   3.20326  0.094792  178.695  344.126  250.106  272.714
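Instead of only printing the extreme cases, they can also be flagged in a dataset so that they are easy to count or exclude later. The following is a sketch under stated assumptions: OUTREG1 and RSTUD are taken from the steps above, while the dataset name FLAGGED and the variable OUTLIER are hypothetical names introduced for this example.

```sas
/* Sketch: flag observations with large studentized deleted residuals. */
/* FLAGGED and OUTLIER are hypothetical names used for illustration.   */
data flagged;
  set outreg1;
  /* 1 if |RSTUD| >= 3, otherwise 0; cases with a missing RSTUD get 0 */
  outlier = (abs(rstud) >= 3);
run;

title "Count of flagged observations";
proc freq data=flagged;
  tables outlier;
run;
```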
Check the residuals for normality
We now check the studentized residuals for normality, using Proc Univariate. This is similar to the output from
the ODS graphics that was shown in the earlier panel.
title "Checking Residuals for Normality";
proc univariate data=outreg1 PLOT NORMAL;
var rstud;
histogram / normal;
qqplot / normal(mu=est sigma=est);
run;
The residuals appear to be fairly normally distributed, but there is at least one very low outlier, which we
identified earlier, when we checked the values in the output dataset.
[Figure: histogram with fitted normal curve, titled "Checking Residuals for Normality". X-axis: Studentized Residual without Current Obs. Y-axis: Percent.]
[Figure: normal Q-Q plot, titled "Checking Residuals for Normality". X-axis: Normal Quantiles. Y-axis: Studentized Residual without Current Obs.]
Refit the regression model without the cases in question
We now refit the model, but without the two outliers being included, by using a Where statement.
ods graphics on;
title "Rerun the model without two obs";
proc reg data=b510.WERNER;
where id not in (1797, 3134);
model chol=age;
run; quit;
ods graphics off;
We can see the changes in the parameter estimates from the output below.
Dependent Variable: CHOL
Number of Observations Read                   186
Number of Observations Used                   185
Number of Observations with Missing Values      1
Analysis of Variance
                              Sum of          Mean
Source              DF       Squares        Square    F Value    Pr > F
Model                1         38478         38478      25.82    <.0001
Error              183        272754    1490.46158
Corrected Total    184        311232
Root MSE           38.60650    R-Square    0.1236
Dependent Mean    235.31892    Adj R-Sq    0.1188
Coeff Var          16.40603
Parameter Estimates
                 Parameter     Standard
Variable    DF    Estimate        Error    t Value    Pr > |t|
Intercept    1   186.70039      9.98091      18.71      <.0001
AGE          1     1.43658      0.28274       5.08      <.0001
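To put the change in the AGE slope in perspective, the two estimates can be compared directly. This is simple arithmetic on the values reported in the two sets of output, not additional SAS output:

```latex
\[
\frac{1.62897 - 1.43658}{1.62897} \approx 0.118
\]
```

Dropping the two cases lowers the estimated slope by about 12 percent. The Root MSE also falls, from 41.53 to 38.61, and R-square changes from 0.1363 to 0.1236.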