
Combining SAS Data Sets

(commands=combine.sas)
There are many ways that SAS data sets can be combined. This handout illustrates
combining data sets vertically by adding more cases (stacking or appending data sets) and
combining data sets horizontally by adding new variables (merging data sets).
Stack Data Sets Vertically (adds new cases):
You can use the set statement to combine data sets vertically. It is not necessary for the
data sets being combined to have their variables in the same order, or even for them to
have the same variables. However, it is critical that if the same variable does appear in
both data sets, it should be of the same type (either character or numeric) in both.
If a variable is present in one data set and not in the other, the values for that variable will
be missing for all cases from the data set that did not have it. The order of variables in the
resulting data set will reflect the order of the first data set listed.
In the example below, the data set BOYS has different variables, which are also in a
different order, than the variables in the data set GIRLS.
data boys;
input name $ sex $ age height teacher $;
cards;
Tom M 12 62 Smith
Bob M 13 57 Green
Joe M 11 59 Green
Harry M 12 53 Green
William M 13 60 Smith
John M 11 57 Smith
Richard M 11 55 Green
;
data girls;
input name $ age sex $ teacher $;
cards;
Sharice 13 F Smith
Mary 12 F Smith
Ellen 11 F Green
Carol 11 F Green
Chris 13 F Smith
Claire 12 F Green
Raye 13 F Smith
;
data allkids;
set boys girls;
run;
proc print data = allkids;
title "printout of allkids data set";
title2 "with boys first in the data set";
run;
printout of allkids data set
with boys first in the data set
OBS  NAME     SEX  AGE  HEIGHT  TEACHER
  1  Tom      M     12      62  Smith
  2  Bob      M     13      57  Green
  3  Joe      M     11      59  Green
  4  Harry    M     12      53  Green
  5  William  M     13      60  Smith
  6  John     M     11      57  Smith
  7  Richard  M     11      55  Green
  8  Sharice  F     13       .  Smith
  9  Mary     F     12       .  Smith
 10  Ellen    F     11       .  Green
 11  Carol    F     11       .  Green
 12  Chris    F     13       .  Smith
 13  Claire   F     12       .  Green
 14  Raye     F     13       .  Smith
data allkids2;
set girls boys;
run;
proc print data = allkids2;
title "printout of allkids data set";
title2 "with girls first in the data set";
run;
printout of allkids data set
with girls first in the data set
OBS  NAME     AGE  SEX  TEACHER  HEIGHT
  1  Sharice   13  F    Smith         .
  2  Mary      12  F    Smith         .
  3  Ellen     11  F    Green         .
  4  Carol     11  F    Green         .
  5  Chris     13  F    Smith         .
  6  Claire    12  F    Green         .
  7  Raye      13  F    Smith         .
  8  Tom       12  M    Smith        62
  9  Bob       13  M    Green        57
 10  Joe       11  M    Green        59
 11  Harry     12  M    Green        53
 12  William   13  M    Smith        60
 13  John      11  M    Smith        57
 14  Richard   11  M    Green        55
Notice that the order of the variables in the final data set is changed, depending on which
data set was listed first in the set statement, but the values in both data sets are the same.
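The type rule for stacked data sets can be seen with a small sketch. The data sets num_age, char_age, and both are hypothetical and are not part of this handout; they exist only to show what happens when the same variable is numeric in one data set and character in the other. In that situation SAS is expected to stop the final data step with an error saying the variable has been defined as both character and numeric.

```sas
/* Sketch only: num_age, char_age, and both are hypothetical names */
data num_age;
input name $ age;        /* age is read as numeric here */
cards;
Tom 12
;
data char_age;
input name $ age $;      /* age is read as character here */
cards;
Mary 12
;
data both;
set num_age char_age;    /* conflicting types for age: this step fails */
run;
```

One way to avoid the conflict is to read the variable with the same type in both input statements before stacking.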
Merge Data Sets Horizontally (adds new variables):
SAS data sets can be merged horizontally in a number of ways. This method of
combining data sets allows you to match based on some key variable(s) such as ID or
household. You must first sort the data sets that are being merged by the key
variable(s), and then merge by the same key variable(s).
The example below shows how to merge two data sets for the same people. The data set
EXAM contains data for a hypothetical group of people on a physical exam. The data set
LAB contains information for some of the same people on their laboratory results.
data exam;
input id examdate mmddyy10. sex age height weight sbp dbp;
format examdate mmddyy10.;
cards;
1 10/18/2000 1 25 72 156 128 89
2 05/29/2000 1 33 68 168 145 96
3 02/21/2000 1 47 65 182 152 98
4 06/17/2000 1 29 69 190 139 91
5 01/11/2000 2 37 62 129 145 93
6 08/15/2000 2 42 64 156 133 94
;
data lab;
input id hgb;
cards;
1 13.2
4 12.1
3 14.5
6 12.8
12 13.0
;
proc sort data=exam;
by id;
run;
proc sort data=lab;
by id;
run;
data exam_lab;
merge exam lab;
by id;
run;
proc print;
title "Printout of Exam_lab Data Set";
run;
Printout of Exam_lab Data Set

Obs  id    examdate  sex  age  height  weight  sbp  dbp   hgb
  1   1  10/18/2000    1   25      72     156  128   89  13.2
  2   2  05/29/2000    1   33      68     168  145   96     .
  3   3  02/21/2000    1   47      65     182  152   98  14.5
  4   4  06/17/2000    1   29      69     190  139   91  12.1
  5   5  01/11/2000    2   37      62     129  145   93     .
  6   6  08/15/2000    2   42      64     156  133   94  12.8
  7  12           .    .    .       .       .    .    .  13.0

By default, SAS will include all observations from both data sets in the merged data.
Notice in the above example, ID numbers 2 and 5 are in the EXAM data set, but not in
the LAB data set, while ID number 12 is in the LAB data set, but not in the EXAM data
set. However, all of these cases are in the merged EXAM_LAB data set.
You can control the observations that get written to the final data set using the in= data
set option. This creates a temporary variable that indicates whether a case is in a
particular data set or not. Then you can control which observations get written out using
subsetting if statements. The examples below show three different ways this could be
done.
/*How to include only cases that are in both data sets*/
data exam_lab2;
merge exam(in=a) lab(in=b);
by id;
if a and b;
run;
proc print data=exam_lab2;
title "Exam_lab2 Data Set Includes Only Those";
title2 "In Both Data Sets";
run;
Exam_lab2 Data Set Includes Only Those
In Both Data Sets
Obs  id    examdate  sex  age  height  weight  sbp  dbp   hgb
  1   1  10/18/2000    1   25      72     156  128   89  13.2
  2   3  02/21/2000    1   47      65     182  152   98  14.5
  3   4  06/17/2000    1   29      69     190  139   91  12.1
  4   6  08/15/2000    2   42      64     156  133   94  12.8
/*How to include cases that are in EXAM, regardless of Lab Data*/
data exam_lab3;
merge exam(in=a) lab(in=b);
by id;
if a;
run;
proc print data=exam_lab3;
title "Exam_lab3 Data Set Includes Those";
title2 "In Exam Data, Regardless of Lab Data";
run;
Exam_lab3 Data Set Includes Those
In Exam Data, Regardless of Lab Data
Obs  id    examdate  sex  age  height  weight  sbp  dbp   hgb
  1   1  10/18/2000    1   25      72     156  128   89  13.2
  2   2  05/29/2000    1   33      68     168  145   96     .
  3   3  02/21/2000    1   47      65     182  152   98  14.5
  4   4  06/17/2000    1   29      69     190  139   91  12.1
  5   5  01/11/2000    2   37      62     129  145   93     .
  6   6  08/15/2000    2   42      64     156  133   94  12.8
/*How to include cases that are in LAB, regardless of Exam Data*/
data exam_lab4;
merge exam(in=a) lab(in=b);
by id;
if b;
run;
proc print data=exam_lab4;
title "Exam_lab4 Data Set Includes Those";
title2 "In Lab Data, Regardless of Exam Data";
run;

Exam_lab4 Data Set Includes Those
In Lab Data, Regardless of Exam Data
Obs  id    examdate  sex  age  height  weight  sbp  dbp   hgb
  1   1  10/18/2000    1   25      72     156  128   89  13.2
  2   3  02/21/2000    1   47      65     182  152   98  14.5
  3   4  06/17/2000    1   29      69     190  139   91  12.1
  4   6  08/15/2000    2   42      64     156  133   94  12.8
  5  12           .    .    .       .       .    .    .  13.0
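The same in= variables can also pick out cases that failed to match. The sketch below is an illustration rather than part of the original handout: the data set name exam_only and the title text are hypothetical, and it assumes EXAM and LAB have already been sorted by id as above. With the data shown earlier, it should keep only ID numbers 2 and 5, the people with an exam record but no lab record.

```sas
/* Sketch only: exam_only is a hypothetical data set name */
/* How to include only cases that are in EXAM but not in LAB */
data exam_only;
merge exam(in=a) lab(in=b);
by id;
if a and not b;
run;
proc print data=exam_only;
title "Cases in Exam Data With No Lab Data";
run;
```

A listing like this is a quick way to see which participants are missing lab results before deciding which subsetting if statement to use.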
How to merge data sets when the variable names are the same:
If the two data sets that you wish to merge have the same variable names, this can be
handled by using the rename data set option for either one or both of the data sets.
data oldsal;
input name $ idnum sex $ age salary jobcat year;
cards;
Roger 518 M 45 7677 2 1989
Martha 321 F 28 5000 1 1989
Zeke 444 M 33 6075 1 1989
Barb 1728 F 40 9023 2 1989
Bill 993 M 36 7739 3 1989
Sandy 1002 F 29 6161 3 1989
;
data newsal;
input name $ idnum salary jobcat year;
cards;
Hank 108 11138 1 1995
Fred 519 10035 2 1995
Zeke 444 9697 1 1995
Martha 321 7987 2 1995
Sandy 1002 6995 2 1995
Bill 993 12400 3 1995
Roxy 773 10119 2 1995
;
/*merging by idnum*/
proc sort data=oldsal;
by idnum;
run;
proc sort data=newsal;
by idnum;
run;
data combine1;
merge oldsal(rename=(salary=salary89 jobcat=jobcat89))
newsal(rename=(salary=salary95 jobcat=jobcat95));
by idnum;
drop year;
run;
proc print data=combine1;
title "printout of combine1 data set";
title2 "matching by id number";
title3 "all cases that were in either data set are included";
run;
printout of combine1 data set
matching by id number
all cases that were in either data set are included
Obs  name    idnum  sex  age  salary89  jobcat89  salary95  jobcat95
  1  Hank      108         .         .         .     11138         1
  2  Martha    321  F     28      5000         1      7987         2
  3  Zeke      444  M     33      6075         1      9697         1
  4  Roger     518  M     45      7677         2         .         .
  5  Fred      519         .         .         .     10035         2
  6  Roxy      773         .         .         .     10119         2
  7  Bill      993  M     36      7739         3     12400         3
  8  Sandy    1002  F     29      6161         3      6995         2
  9  Barb     1728  F     40      9023         2         .         .
You can control the observations that are written to the final data set using in= data set
options for this type of merge also.
/*merging by idnum, but keeping only cases that are in both datasets*/
data combine2;
merge oldsal(in=a rename=(salary=salary89 jobcat=jobcat89))
newsal(in=b rename=(salary=salary95 jobcat=jobcat95));
by idnum;
if a and b;
totsal = sum (salary89,salary95);
format salary89 salary95 totsal dollar12.;
drop year;
run;
proc print data=combine2;
title "printout of combine2 data set";
title2 "matching by id number";
title3 "and only including cases that are in both data sets";
run;
printout of combine2 data set
matching by id number
and only including cases that are in both data sets
Obs  name    idnum  sex  age  salary89  jobcat89  salary95  jobcat95   totsal
  1  Martha    321  F     28      5000         1      7987         2  $12,987
  2  Zeke      444  M     33      6075         1      9697         1  $15,772
  3  Bill      993  M     36      7739         3     12400         3  $20,139
  4  Sandy    1002  F     29      6161         3      6995         2  $13,156
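The combine2 step uses the sum function rather than the + operator. Because of the subsetting if, both salaries are present for every case kept there, so the choice makes no difference in that output. In a merge that keeps unmatched cases it does matter: the sum function ignores missing arguments, while + returns a missing value if either operand is missing. The sketch below illustrates this; the data set name combine3 and the variables tot_sum and tot_plus are hypothetical, and it assumes oldsal and newsal are still sorted by idnum.

```sas
/* Sketch only: combine3, tot_sum, and tot_plus are hypothetical names */
data combine3;
merge oldsal(rename=(salary=salary89 jobcat=jobcat89))
newsal(rename=(salary=salary95 jobcat=jobcat95));
by idnum;
tot_sum = sum(salary89, salary95);   /* ignores a missing salary */
tot_plus = salary89 + salary95;      /* missing if either salary is missing */
drop year;
run;
proc print data=combine3;
var name idnum salary89 salary95 tot_sum tot_plus;
title "sum function compared with the + operator";
run;
```

For someone like Hank, who appears only in newsal, tot_sum should equal salary95 while tot_plus should be missing, so which form is appropriate depends on whether a one-year total is meaningful for the analysis.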
One-to-Many or Many-to-One merges:
One-to-many or many-to-one merges often arise in dealing with complex study designs.
For example, in a longitudinal study we wish to combine longitudinal (time-varying)
information with one-time only (time-invariant) information for the same participant. In a
clustered or hierarchical study design, we may wish to combine cluster-level data with
individual-level data, so that everyone in the same cluster gets the same values. For
example, we may want to merge classroom-level information (Level II data in a
hierarchical sense) with student-level information (Level I data in a hierarchical sense).
In another example, we may wish to combine census tract information with all
individuals in our sample that come from the same census tract. This is a relatively easy
process in SAS.
SAS automatically merges every instance of a keyed variable or variables in one file to
every instance of the keyed variable(s) in the other file. Thus the information in the "one"
file gets attached and "filled down" to every matching case in the "many" file. SAS
doesn't care how many observations there are in the "many" file for each single
observation in the "one" file. You can even have mixtures of one- and many- within the
same file. However, SAS does not properly handle multiple instances of the same key
value in both files (it does not pair every instance with every other), and will produce
a message in the log if this occurs. The examples below
illustrate this process for a hypothetical longitudinal study.
First, we create the time-varying data. Note the use of the : format modifier with the
informat, mmddyy10. This allows SAS to read input dates with 10 or possibly
fewer characters, rather than requiring strictly 10 characters for each date. The format
statement instructs SAS to display the DATE variable with the mmddyy10. format.
/*One-to-Many and Many-to-One Merges*/
title;
data time_varying;
input ID @5 date :mmddyy10. SBP DBP;
format date mmddyy10.;
cards;
4   1/9/2007 117 82
2   3/15/2007 111 74
2   4/25/2007 108 65
1   5/17/2007 145 94
1   11/22/2007 130 90
1   1/12/2008 120 80
3   1/22/2008 128 83
;
title "Time Varying Data";
proc print data=time_varying;
run;
Time Varying Data
Obs  ID        date  SBP  DBP
  1   1  05/17/2007  145   94
  2   1  11/22/2007  130   90
  3   1  01/12/2008  120   80
  4   2  03/15/2007  111   74
  5   2  04/25/2007  108   65
  6   3  01/22/2008  128   83
  7   4  01/09/2007  117   82
We now create the one-per-person data set. Notice that we input date as three separate
variables, MONBIRTH, DAYBIRTH, and YEARBIRTH, and then combine them into a
date variable in SAS using the mdy function. We also use a format to display this date
using the mmddyy10. format. We then drop the individual variables, because they are not
necessary in the final data set.
data one_per;
input ID sex $ monbirth daybirth yearbirth;
dob = mdy(monbirth,daybirth,yearbirth);
format dob mmddyy10.;
drop monbirth daybirth yearbirth;
cards;
3 M 12 1 1980
1 F 1 10 1978
2 F 5 15 1976
4 M 4 11 1981
5 F 7 17 1980
;
title "One-Per-Person Data";
proc print data=one_per;
run;
One-Per-Person Data
Obs  ID  sex         dob
  1   1  F    01/10/1978
  2   2  F    05/15/1976
  3   3  M    12/01/1980
  4   4  M    04/11/1981
  5   5  F    07/17/1980
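A one-to-many merge relies on ID appearing only once in the one-per-person file. A quick check before merging might look like the sketch below, which counts how often each ID occurs and keeps only the IDs that occur more than once. The data set name idcounts and the title text are hypothetical; COUNT is the variable that proc freq writes to its output data set.

```sas
/* Sketch only: idcounts is a hypothetical data set name */
proc freq data=one_per noprint;
tables ID / out=idcounts(where=(count > 1));
run;
proc print data=idcounts;
title "IDs Appearing More Than Once in One-Per-Person Data";
run;
```

If every ID is unique, idcounts should contain no observations and nothing is printed, which is the result you want to see before running the merge.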
We now merge the data sets by ID, being sure to sort each data set first. Note that we
sort the time-varying data set by ID and DATE, which is not necessary, but puts our
final data set in a nicer order. Age is calculated as age in years.
proc sort data=time_varying;
by id date;
run;
proc sort data=one_per;
by id;
run;
data one_to_many;
merge one_per(in=a)
time_varying(in=b);
by id;
age = int((date-dob)/365.25);
run;
title "One-to-Many Merged Data Set";
proc print data=one_to_many;
run;
Here is the log from merging these two data sets. Notice that there are 8 observations in
the final data set. This is more observations than were in either of the original data sets.
Can you tell why this happened?
26 data one_to_many;
27 merge one_per(in=a)
28 time_varying(in=b);
29 by id;
30 age = int((date-dob)/365.25);
31 run;
NOTE: Missing values were generated as a result of performing an operation on missing
values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 30:9 1 at 30:18 1 at 30:23
NOTE: There were 5 observations read from the data set WORK.ONE_PER.
NOTE: There were 7 observations read from the data set WORK.TIME_VARYING.
NOTE: The data set WORK.ONE_TO_MANY has 8 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
One-to-Many Merged Data Set
Obs  ID  sex         dob        date  SBP  DBP  age
  1   1  F    01/10/1978  05/17/2007  145   94   29
  2   1  F    01/10/1978  11/22/2007  130   90   29
  3   1  F    01/10/1978  01/12/2008  120   80   30
  4   2  F    05/15/1976  03/15/2007  111   74   30
  5   2  F    05/15/1976  04/25/2007  108   65   30
  6   3  M    12/01/1980  01/22/2008  128   83   27
  7   4  M    04/11/1981  01/09/2007  117   82   25
  8   5  F    07/17/1980           .    .    .    .

We now re-merge the data sets, but only include cases that were in both files in the final
merged data by using the subsetting if statement: if a and b;
data one_to_many2;
merge one_per(in=a)
time_varying(in=b);
by id;
if a and b;
age = int((date-dob)/365.25);
run;
The log from this merge is shown below. Note that there are now 7 observations in the
final data set, but that not all subjects who were in the one-per data set are included,
because one of them, ID=5, did not have any data in the time-varying data set.
36
37 data one_to_many2;
38 merge one_per(in=a)
39 time_varying(in=b);
40 by id;
41 if a and b;
42 age = int((date-dob)/365.25);
43 run;
NOTE: There were 5 observations read from the data set WORK.ONE_PER.
NOTE: There were 7 observations read from the data set WORK.TIME_VARYING.
NOTE: The data set WORK.ONE_TO_MANY2 has 7 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
title "One-to-Many Merged Data Set";
title2 "With Only Cases from Both Files Included";
proc print data=one_to_many2;
run;
One-to-Many Merged Data Set
With Only Cases from Both Files Included
Obs  ID  sex         dob        date  SBP  DBP  age
  1   1  F    01/10/1978  05/17/2007  145   94   29
  2   1  F    01/10/1978  11/22/2007  130   90   29
  3   1  F    01/10/1978  01/12/2008  120   80   30
  4   2  F    05/15/1976  03/15/2007  111   74   30
  5   2  F    05/15/1976  04/25/2007  108   65   30
  6   3  M    12/01/1980  01/22/2008  128   83   27
  7   4  M    04/11/1981  01/09/2007  117   82   25
We now make a funky graph of this fake data set as an illustration:
goptions reset=all;
goptions device=win target=winprtm;
symbol1 color=black value=dot height=.5
line=1 interpol=join r=10;
proc gplot data=one_to_many2;
plot sbp*date=ID / nolegend;
run; quit;