SAS Programming Basics
SAS is a powerful and flexible statistical package that runs on many platforms, including Windows and Unix. This class is designed for anyone interested in
learning how to write basic SAS programs. Some familiarity with SAS is recommended. If you are new to SAS you may want to review our Introduction to
SAS Seminar. It is expected that those attending this course have the ability to navigate to and access data files on their own operating system. The
students in the class will have hands-on experience using SAS for data manipulation including use of arithmetic operators, conditional processing, using SAS
built-in functions, merging, appending, formatting and different options for modifying SAS output. It is our hope that after this seminar you will be able to:

Comfortably navigate the SAS window environment
Subset and create new data sets
Create new variables
Write and debug basic SAS programs
Use SAS functions for basic data management tasks
Merge and append data
Modify SAS output for presentation

Please note that since we are using data files provided by SAS, we are unable to make these available on our website. Thus, this seminar page includes
output from the SAS procedures used in the seminar.
For clarity all SAS keywords will be in CAPITAL letters in order to distinguish them from the information that you as the user will provide.
Note: This seminar was developed in SAS 9.4

1.0 SAS Refresher
1.1 Libname
We will start by setting our libname, which points SAS to the directory where our SAS data files are stored.

*assign libname;
LIBNAME idre 'C:\';
SAS also allows you to clear a particular libname or use the _ALL_ keyword to clear all assigned libnames.

*clear libname;
LIBNAME idre CLEAR;
LIBNAME _ALL_ CLEAR;
* reassign library;
LIBNAME idre 'C:\';
1.1 SAS Windowing environment
Let's briefly review the SAS windowing environment. The five main windows in SAS are the Explorer, Results, Program Editor, Log, and Output/Results
Viewer windows. In general, when you start SAS, the windows that initially appear are the Log, Editor and Explorer windows. Other windows can be found
under the View menu in the toolbar.
The SAS Explorer window allows you to manage files associated with your current SAS session including viewing, deleting, moving, and copying files. The
Editor window, which is literally just a text editor, permits you to enter, edit, submit and save SAS programs. The Log window allows the user to view
information about their current session including messages about submitted SAS programs such as successful execution, errors or warnings. The Results
window enables you to view a list of results from executed SAS programs. The Results Viewer allows you to view HTML results of executed SAS
procedures. In SAS 9.4, the default output format is HTML.
1.2 Creating new SAS data sets
As we will be using several different data sets in the seminar today, let's also cover how to create new permanent and temporary data sets from the data files
you have been provided.

*permanent dataset;
DATA idre.new;
SET idre.charities;
RUN;
*temporary dataset;
DATA new;
SET idre.charities;
RUN;

1.3 SAS Options
SAS includes a large suite of system options that will affect your SAS session. Specific options are invoked by default when you open SAS. The options can
vary depending on what computing environment you are using (e.g. Windows, Unix). The OPTIONS procedure lists the current settings of SAS system options
in the SAS log.

PROC OPTIONS;
RUN;
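PROC OPTIONS with no arguments prints every option, which is a long list. As a minimal sketch, the OPTION= argument and the GETOPTION function can be used to look up one setting at a time. The two option names are the ones discussed in this section.

```sas
* Show the current setting of a single option in the Log;
PROC OPTIONS OPTION=FMTERR;
RUN;

PROC OPTIONS OPTION=AUTOCORRECT;
RUN;

* Write the current value of an option to the Log from open code;
%PUT Current format error setting: %SYSFUNC(GETOPTION(FMTERR));
```

Checking a setting this way before and after an OPTIONS statement is a quick way to confirm that the change took effect.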
SAS includes two types of options: portable and host. Portable (non-host) options are the same regardless of the operating system. Host options
differ depending on which operating system you are using.
Below are some examples of common options and what they are responsible for doing.
The AUTOCORRECT option is turned on by default and allows SAS to correct syntax with small mistakes like a misspelled keyword. In the first example
below, the DATA keyword is misspelled as DATE. When the option is in effect, you will see in the Log that SAS issues a warning stating that it assumed
the keyword was misspelled, and it continues executing the procedure. However, in the second example, where the automatic correction option is turned
off, SAS issues an error and stops executing the procedure.

*autocorrect option;
OPTIONS AUTOCORRECT; /*default*/
PROC FREQ DATE=idre.charities;
TABLE code;
RUN;
OPTIONS NOAUTOCORRECT;
PROC FREQ DATE=idre.charities;
TABLE code;
RUN;

The FMTERR option controls whether SAS treats a format that cannot be found as an error. The default (FMTERR) is for SAS to issue an error
and stop processing the executed procedure. In the first example, the default option is in effect and SAS issues an error stating that the
format used could not be found. However, in the second example, where we tell SAS not to issue an error (NOFMTERR), SAS ignores the incorrectly used
format and will then execute the command without the format.

*format error;
OPTIONS FMTERR;/*default*/
PROC PRINT DATA=idre.charities;
FORMAT code $code.;
RUN;
OPTIONS NOFMTERR;
PROC PRINT DATA=idre.charities;
FORMAT code $code.;
RUN;


2.0 Diagnosing and Correcting Syntax Errors
A main issue with learning a new programming language is the ability to identify and address coding errors. There are several ways that SAS will notify you
of syntax errors.
2.1 Color Coded Syntax
When writing code in the SAS Enhanced Editor you will notice some color coding. Color coding program components will help you more easily
diagnose syntax errors, and when you first start with SAS you will make many mistakes. Take a look at the example syntax below copied from the Enhanced
Editor window. Here you will see 5 different colors automatically generated by SAS. For example you will see that keywords like DATA, CLASS, MODEL are
all highlighted in blue. If you use the wrong keyword with a procedure, the keyword will often remain black like the variable names because SAS does not
recognize it. Options like SOLUTION are also considered keywords. As we will discuss later, the way to indicate a format is to put a period at the end and,
once you do this, it will turn green. Anything in quotation marks turns red. In the second set of code, you will see that we are missing an end quote, thus all of
the syntax is red. Thus we would know to correct the missing double quote.

2.2 Log File
The log file will also let you know when you have syntax errors. Below is an example using PROC MEANS:

In the syntax shown, we are attempting to run the MEANS procedure with a couple of options. We have added "average" and "min" options to our statement to
indicate that we only want the average and the minimum values for salary. As we described in the previous section, options should be colored in blue, and
in this example "average" remains black, indicating that SAS is not recognizing it as a keyword.
Below we see what happens when we attempt to execute the syntax as written. An error appears in the log file indicating that the keyword "average" was not
a recognized option. Additionally, in this instance, SAS provides a list of alternate options you may have wanted. If you look carefully, you will see that "mean"
is one of them. If we replace the unrecognized keyword "average" with "mean" the procedure will execute as expected.

3.0 Data Step vs. Proc Step
SAS programs are composed of two distinct kinds of steps: data steps and proc steps. Data steps are written by you, while procedures are pre-written programs that
are built in. In general, Data steps are used to read, modify and create data files and always begin with a "DATA" statement. You saw an example of a data
step in section 1.2. From a statistical standpoint a Proc step is typically used to analyze a data set in SAS without making changes to the data. There are
exceptions to this. Proc steps always start with the familiar "PROC" statement. You have seen several examples of Proc steps in the preceding sections
including PROC PRINT, PROC MEANS, and PROC FREQ. Each procedure enables us to analyze and process the data in a specific way.
In the following sections we will demonstrate how to use these two types of steps.

4.0 Manipulating Data Sets
4.1 Operators
An operator in SAS is a symbol representing a comparison, logical operation or mathematical function.
4.1.1 Comparison Operators
These are operators that compare a variable with some specified value or another variable. They are typically represented as symbols such as =, <, > but
also have mnemonic equivalents like EQ, LT, or GT, respectively. The operators can be used within a data or proc step depending on your needs.
One of the simplest ways to use a comparison operator is in a WHERE statement. In the "sales" data file, we have information on sales associates from
Australia (AU) and the United States (US). If we only wanted to output records for Australian sales associates we could use the = or EQ operator. Since the
variable country contains character rather than numeric information, we need to put quotes around 'AU'.

PROC PRINT DATA=idre.sales;


WHERE Country='AU';
RUN;
The IN operator can be used if you are trying to specify a list or range of values, as demonstrated below.

PROC PRINT DATA=idre.sales;


WHERE Country IN ('AU', 'US');
RUN;
We can also tell SAS to output only certain ranges of values for numeric variables. In the first example below we ask SAS to output salary values that
are less than (<) $30,000. In the second example, we output salary values greater than or equal to (GE) $30,000.

PROC PRINT DATA=idre.sales;


WHERE Salary<30000;
RUN;
PROC PRINT DATA=idre.sales;
WHERE Salary ge 30000;
RUN;
One limitation of using a WHERE statement is that more than one cannot be used in the same step, except in special cases. If you attempt to submit the
following syntax, SAS will issue a note in the Log stating "WHERE clause has been replaced." It will then execute the following syntax omitting the first
WHERE statement. However, in the next section we will demonstrate how to combine comparison operators with logical operators to achieve the desired
output.

PROC PRINT DATA=idre.sales;


WHERE Country='AU';
WHERE Salary<30000;
RUN;
For more examples check out the SAS 9.4 Help and Documentation page on comparison operators.
4.1.2 Logical Operators
The logical or Boolean operators include AND, OR, and NOT. They are often used to link a series of comparisons. Just like the comparison operators, these
can be written as either symbols or mnemonics. Below is a table showing each symbol and its mnemonic alternative.

Symbol     Mnemonic
^ ~        NOT
&          AND
|          OR

In the previous section we learned that we cannot use two WHERE statements, but we can use the AND operator to combine the information contained in
those two statements to achieve the desired result.
Below we use AND to output observations representing Australian sales associates that make less than $30,000 a year. Both the symbol and mnemonic are
used and they give the same result.

PROC PRINT DATA=idre.sales;


WHERE Country='AU' AND Salary<30000;
RUN;
PROC PRINT data=idre.sales;
WHERE Country='AU' & Salary<30000;
RUN;

As with comparison operators you can also combine AND, OR, and NOT with the IN operator. In the example below the variable job_title includes seven
different job types. We want to obtain frequencies of all of them except for two, Sales Manager and Sales Rep. IV. So we use NOT combined with IN since
we have more than one value we are trying to exclude.

PROC FREQ DATA=idre.sales;


TABLES Job_Title;
WHERE Job_Title NOT IN ('Sales Manager','Sales Rep. IV');
RUN;
For more examples check out the SAS 9.4 Help and Documentation page on logical operators.
4.1.3 Where Operators
We have just covered several examples using the comparison and logical operators. However, SAS does include a set of special operators that can be used
only in WHERE expressions. Some of these have similar functions to comparison operators.

Operator                  Description                                   Char or Num
Between-And               Allows for an inclusive range                 Both
Contains                  Includes a character string or substring      Character only
Is Null or Is Missing     Identifies missing values                     Both
Like                      Matches a pattern                             Character only
=*                        Sounds like                                   Character only
Same And or Also          Augments an existing WHERE clause without     Both
                          having to retype the original one

For example, here are three ways of specifying that we want SAS to output all sales associate records with salaries that range from $28,000 to $30,000. As
in any good programming language, there are always multiple ways of doing the same thing.

* We can use only comparison operators;


PROC PRINT DATA=idre.sales;
WHERE 28000<=Salary<=30000;
RUN;
*We can use a mix of comparison and logical operators;
PROC PRINT DATA=idre.sales;
WHERE Salary>=28000 & Salary<=30000;
RUN;
*We can use only the special WHERE operators;
PROC PRINT DATA=idre.sales;
WHERE Salary BETWEEN 28000 AND 30000;
RUN;
Earlier, we discussed that, in general, SAS does not allow you to use more than one WHERE statement in the same data or proc step. The exceptions to this
are the special operators "same and" and "also". These allow you to update or augment an existing WHERE statement with additional conditions. In
the example below the first condition subsets the data to Australian sales associates that make less than $26,000, and then we add the additional clause that
they must also be female.

*Using Same and;


PROC PRINT DATA=idre.sales;
WHERE Country='AU' and Salary<26000;
WHERE SAME AND Gender='F';
VAR First_Name Last_Name Gender Salary Country;
RUN;
*Using Also;
PROC PRINT DATA=idre.sales;
WHERE Country='AU' & Salary<26000;
WHERE ALSO Gender='F';
VAR First_Name Last_Name Gender Salary Country;
RUN;
Now while some of these special operators are fairly self-explanatory like "Is Null", some may be less so, such as "=*" and "Like". These operators can be
helpful for identifying issues such as misspelled information, incorrectly entered information, or identifying related names or titles that vary. For example,
below is a data set called "shoes_eclipse" that includes several different product names:

Let's suppose we are interested in identifying product names that include the word "Woman's". How could we do that? The "Like" operator could help us do
this. It works by comparing character values to some given pattern. It uses two special characters, a percent sign (%) and an underscore (_). The percent
denotes that any number of characters may occupy a position. However, the underscore specifies that exactly one character may occupy a position. If we are
only interested in products that start with "Woman's", then we don't care how many characters come after "Woman's":

PROC PRINT DATA=idre.shoes_eclipse;


VAR product_name;
WHERE product_name LIKE "Woman's %";
RUN;

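I could also ask SAS to output any name that includes "Men's" in the middle of the title. This requires multiple % signs because any product name with
"Men's" may have characters before and after it. Note that the pattern as written has a blank on each side of "Men's", so a name that begins or ends with "Men's" would not match.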

PROC PRINT DATA=idre.shoes_eclipse;


VAR product_name;
WHERE product_name LIKE "% Men's %";
RUN;
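The examples above only use the percent sign. As a minimal sketch of how the underscore differs, the step below builds a small hypothetical data set (the data set name "codes" and its values are made up for illustration and are not one of the seminar files) and filters it with each wildcard.

```sas
* Hypothetical data, not one of the seminar data files;
DATA codes;
   INPUT code $;
   DATALINES;
A1X
A12X
B1X
A2X
;
RUN;

* One underscore stands for exactly one character, so this keeps A1X and A2X;
PROC PRINT DATA=codes;
   WHERE code LIKE 'A_X';
RUN;

* Percent stands for any number of characters, so this keeps A1X, A12X and A2X;
PROC PRINT DATA=codes;
   WHERE code LIKE 'A%X';
RUN;
```

The underscore is the better choice when the position and width of the varying part are known, for example a single-digit version code.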

For more information check out the SAS Help and Documentation on special WHERE operators.
4.1.4 Arithmetic operators
Arithmetic operators, as you can probably tell from the name, allow you to perform arithmetic calculations in SAS. Below is a table of the operators and their
symbols used in SAS.

Symbol     Description
**         Exponentiation
*          Multiplication
/          Division
+          Addition
-          Subtraction

A few things to note about using these operators. First, if you are calculating values using a variable(s) with missing data, the resulting value will also be
missing. Second, expressions are evaluated with respect to the traditional order of operations with exponentiation taking the highest priority level, then
multiplication/division and last addition/subtraction. This ordering can be modified by using parentheses. Third, as is the case with the other operators we
have discussed, arithmetic operators can be used in conjunction with both logical and comparison operators.
Let's try a few examples. Below we will use a Data step to create a new temporary data set called "sales_subset" from the "sales" data. This data set will
contain only observations from Australian employees whose job title contains the word "Rep". So we are using a logical and a special WHERE operator.
Additionally, we are creating a new variable called "Bonus" which is calculated by multiplying "Salary" by .10.

DATA sales_subset;
SET idre.sales;
WHERE Country='AU' & Job_Title contains 'Rep';
Bonus=Salary*.10;
RUN;
Below we output the first 20 records of our new data set.

In this second example let's use parentheses to change the order of evaluation of a compound expression (one with more than one operator). We will use a Data step to
create two new variables, profit1 and profit2.

DATA profit;
SET idre.order_fact;
profit1 = total_retail_price - costPrice_per_unit * quantity;
profit2 = (total_retail_price - costPrice_per_unit) * quantity;
RUN;
Let's see how the use of parentheses has changed our values.
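To make the effect concrete, here is a minimal sketch that applies the same two expressions to a single set of made-up numbers (the values 100, 30 and 2 are assumptions chosen for illustration, not values from the order_fact data) and writes the results to the Log. It also shows the missing-value rule described earlier.

```sas
* Hypothetical values chosen for illustration only;
DATA _NULL_;
   total_retail_price = 100;
   costPrice_per_unit = 30;
   quantity = 2;
   * Multiplication is done first, giving 100 - (30*2) = 40;
   profit1 = total_retail_price - costPrice_per_unit * quantity;
   * Parentheses force the subtraction first, giving (100 - 30)*2 = 140;
   profit2 = (total_retail_price - costPrice_per_unit) * quantity;
   * A missing operand makes the whole result missing;
   quantity = .;
   profit3 = total_retail_price - costPrice_per_unit * quantity;
   PUT profit1= profit2= profit3=;
RUN;
```

The Log line should read profit1=40 profit2=140 profit3=. and SAS also adds a note that missing values were generated.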

For more examples check out the SAS 9.4 Help and Documentation page on arithmetic operators.
4.2 Conditional Processing
4.2.1 WHERE and IF statements
Conditional processing in SAS allows the user to manipulate and output portions of data instead of the whole file. In previous sections you have seen several
examples of the WHERE statement. Alternatively, SAS also allows for the use of IF statements. Both can accomplish similar tasks; however, while both
WHERE and IF can be used within a Data step, only WHERE is allowed in a Proc step. For example, if we add an IF statement to the PROC MEANS
command from earlier we will see that the IF turns red. This indicates that the syntax is incorrect.

However, if you use WHERE the statement is blue.

If you attempt to execute the PROC MEANS using the incorrect IF statement SAS will produce an error, but SAS will execute the command using the
WHERE statement.

Data steps will accept both WHERE and IF statements; however, only IF can be combined with THEN to execute another statement conditionally. Below is an
example. Here we take observations with values for salary that are greater than $30,000 and write them, using THEN
OUTPUT, to a new data set called "highsales".

DATA highsales ;
SET idre.sales;
IF salary GT 30000 THEN OUTPUT highsales;
RUN;
You can also combine WHERE and IF in the same Data step as demonstrated below. We use an IF and a WHERE statement to subset the data. Can you think
of equivalent ways of subsetting the data?

DATA emps;
SET idre.sales;
WHERE Country='AU';
Bonus=Salary*.10;
IF Bonus>=3000;
RUN;
Moreover, you will notice that SAS allows you to create a variable and use it in an IF statement in the same Data step. This is something you can accomplish
with an IF but not WHERE. The reason for this has to do with when SAS evaluates each kind of condition. A WHERE condition is applied as observations are read
from the input data set, so only the observations that meet the condition are brought into the Data step, and SAS then continues executing any other operations in the step. This makes for more efficient
processing, especially with large amounts of data. But in this instance, if we had used a WHERE statement to subset the data using "Bonus", SAS
would have given us an error saying the "Bonus" variable is not in the data set. An IF condition, by contrast, is evaluated at the point where it appears in the Data step, after the observation has been read and any earlier statements have run. Thus,
SAS will apply the WHERE statement, create "Bonus", and then assess whether the IF condition is true.
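As one possible answer to the question posed above, the sketch below shows two equivalent ways to get the same subset. It uses a small hypothetical stand-in for the sales data (the data set "sales_demo" and its four rows are made up for illustration) so that it can be run on its own.

```sas
* Hypothetical stand-in for the sales data;
DATA sales_demo;
   INPUT Country $ Salary;
   DATALINES;
AU 25000
AU 32000
US 40000
AU 35000
;
RUN;

* Equivalent 1: a single WHERE that uses only variables already in the input data;
DATA emps_a;
   SET sales_demo;
   WHERE Country='AU' AND Salary*.10>=3000;
   Bonus=Salary*.10;
RUN;

* Equivalent 2: a single subsetting IF, evaluated after each row has been read;
DATA emps_b;
   SET sales_demo;
   Bonus=Salary*.10;
   IF Country='AU' AND Bonus>=3000;
RUN;
```

Both steps keep the two Australian rows with salaries of 32000 and 35000. The WHERE version cannot refer to Bonus, so the calculation is repeated inside the condition. The IF version can use Bonus because it comes after the assignment.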
4.2.2 If-Then statement
An IF-THEN statement is commonly used for conditional assignment and is typically carried out within the context of a Data step. It executes a SAS statement only for
observations that fulfill a certain condition.
We will once again create a variable called "Bonus", but assign the values based on a certain set of conditions that are defined by an employee's job title.

DATA comp1;
SET idre.sales;
IF Job_Title='Sales Rep. IV' THEN Bonus=1000;
IF Job_Title='Sales Manager' THEN Bonus=1500;
IF Job_Title='Senior Sales Manager' THEN Bonus=2000;
IF Job_Title='Chief Sales Officer' THEN Bonus=2500;
RUN;

You will see in the output above that several observations have missing values. This is due to the fact that we did not assign values for "Bonus" for all of the
job titles.
A related statement to IF-THEN is the ELSE statement, which can be used when creating conditional statements around mutually exclusive groups.

DATA comp2;
SET idre.sales;
IF Job_Title='Sales Rep. IV' THEN Bonus=1000;
ELSE IF Job_Title='Sales Manager' THEN Bonus=1500;
ELSE IF Job_Title='Senior Sales Manager' THEN Bonus=2000;
ELSE IF Job_Title='Chief Sales Officer' THEN Bonus=2500;
RUN;
SAS processes the first IF statement and, if it is not true, moves to the next and so on. SAS continues to test the IF-THEN statements until it finds one that is
true, at which point it stops and will not test the remaining conditions. Once again, this can speed up the processing of large data sets. However, as was
the case with the first IF-THEN example, we will end up with a lot of missing values using this syntax.
What if we had a scenario where we wanted to give all the remaining categories, those that did not fulfill the prescribed conditions, one bonus value? We can do
that using a final ELSE statement with no IF-THEN. In the SAS code below, we add an additional ELSE statement assigning all of the remaining job titles a bonus value
of 500.

DATA comp3;
SET idre.sales;
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Bonus=1000;
ELSE IF Job_Title='Sales Manager' THEN Bonus=1500;
ELSE IF Job_Title='Senior Sales Manager' THEN Bonus=2000;
ELSE IF Job_Title='Chief Sales Officer' THEN Bonus=2500;
ELSE Bonus=500;
RUN;

Now, we have complete data for all observations.
A second related statement to IF-THEN is the DELETE statement. In all of the previous examples we have used the IF-THEN statement to add information
but you can also use IF-THEN to delete as well. Using the IF-THEN DELETE syntax we can specify that certain observations fitting our condition be
left out of the data set being created. In the example below, we delete all observations associated with three specific job titles.

DATA drop;
SET idre.sales;
IF Job_Title IN('Sales Manager', 'Senior Sales Manager', 'Chief Sales Officer') THEN DELETE;
RUN;
4.2.3 Using Do
Typically with an IF-THEN statement only one executable statement is allowed. When an expression is true the associated statement is executed. But what
happens if you want more than one statement executed for each expression? For example, let's imagine that for each bonus value, I also want to create a
variable called freq that denotes how many times a year the sales associate can receive the bonus (e.g. once a year, twice a year). So we might try the
following code using a logical operator.

DATA freq1;
SET idre.sales;
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Bonus=1000 & Freq = "once a year";
ELSE Bonus=500 & Freq = "twice a year";
RUN;
While this syntax appears reasonable, SAS will execute the statement and then issue a note in the log that "Variable Freq is uninitialized". SAS prints this
message when a variable is used in a DATA step without ever having been given a value. If you look in the freq1 SAS data set you will see that SAS created the variable but set
all of its values to missing, which is undesirable. It appears that creating "Freq" will require a separate statement instead of just a simple "&". You could try
this:

DATA freq2;
SET idre.sales;
LENGTH Freq $ 12; *declare Freq wide enough for the longest value (12 characters);
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Bonus=1000;
ELSE Bonus=500;
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Freq = "once a year";
ELSE Freq = "twice a year";
RUN;
But this code could get fairly long if you have a lot of variables to create. A better way to do this would be through the use of a DO group, which allows for
multiple statements.

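DATA bonus;
SET idre.sales;
LENGTH Freq $ 12; *without this, Freq takes its length from the first value assigned and the longer value is truncated;
IF Country='US' THEN DO;
Bonus=500;
Freq='Once a Year';
END;

ELSE DO;
Bonus=300;
Freq='Twice a Year';
END;
RUN;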
While the syntax looks similar to a traditional IF-THEN, there are some important differences. First, the IF expression now ends with THEN DO. This is
followed by a set of statements to be executed. Second, each DO block ends with an END statement. Third, instead of just ELSE we now have ELSE DO,
which also has an END statement. If you are missing an END, SAS will report the problem in the log and fail to execute the Data step.
4.3 SAS Functions
Functions accept arguments and then produce a particular value (numeric or character) based on those arguments. Arguments are enclosed within
parentheses and each argument is separated by a comma. SAS has a wide array of different functions depending on the needs of the user, and they can be used
in a Data step. We will cover a few examples of basic mathematical functions, common date functions, and some additional functions useful for specific data
management tasks.
4.3.1 Arithmetic Functions
In the first example, we will use the "Oldbudget" data file to calculate the total and average amount budgeted for business operations over a five-year period.

DATA budget;
SET idre.oldbudget;
sum1 = yr2003 + yr2004 + yr2005 + yr2006 + yr2007;
sum2 = SUM(yr2003, yr2004, yr2005, yr2006, yr2007);
sum3 = SUM( of yr2003-yr2007);
mean1 = (yr2003 + yr2004 + yr2005 + yr2006 + yr2007)/5;
mean2 = MEAN(yr2003, yr2004, yr2005, yr2006, yr2007);
mean3 = MEAN( of yr2003-yr2007);
RUN;
There are many different ways of creating the sum and mean variables that we need. We create "sum1" using an arithmetic operator to add the 5 budget
amounts together. Alternatively, we can use the SUM() function, where the arguments are the variables you wish to sum together. The difference between using the
function versus manually adding together each variable is the treatment of missing values. When we add items using "+", a case with missing values on any of the
variables listed will have a missing value for the resulting variable. If we use the SUM() function, any missing values will be treated as though they were zero,
and the new variable will be equal to missing only if all of the variables listed are missing. Which method is most appropriate depends on the situation and
what you are trying to achieve. Last, if you have a lot of variables to be summed you can specify a SAS variable list after the OF keyword. This syntax works since the variables
being specified share the same prefix and end in consecutive numbers (a numbered range list). Check the SAS documentation page on SAS variable lists on how to use this shortcut in other circumstances. We
also use similar syntax to demonstrate how to calculate the average or mean of the budget variables.

All the values produced for "sum1-sum3" and "mean1-mean3" are the same since we do not have any missing data. SAS has a number of additional
mathematical functions including absolute value, maximum, minimum and square root that can be used in a similar manner.
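Because the seminar data has no missing budget values, the difference between the operator and the functions is not visible in the output. The sketch below uses made-up values with one missing year (the numbers are assumptions for illustration only) and writes each result to the Log.

```sas
* Hypothetical values with one missing budget year;
DATA _NULL_;
   yr2003 = 10;
   yr2004 = .;
   yr2005 = 30;
   * The plus operator returns missing as soon as any operand is missing;
   plus_total = yr2003 + yr2004 + yr2005;
   * SUM skips the missing value;
   func_total = SUM(yr2003, yr2004, yr2005);
   * MEAN divides by the number of nonmissing values, which N reports;
   func_mean = MEAN(OF yr2003-yr2005);
   n_present = N(OF yr2003-yr2005);
   PUT plus_total= func_total= func_mean= n_present=;
RUN;
```

The Log should show plus_total=. func_total=40 func_mean=20 n_present=2. Note that MEAN returns 20 (40 divided by 2), not 40 divided by 3, so for MEAN a missing value is ignored rather than counted as a zero.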
4.3.2 Date Functions
One of the more challenging data types to deal with in any data analysis package is date values. Thankfully, SAS has some built-in functions that can assist
users with managing this data type. SAS stores date information as numeric values representing the number of days before or after Jan 1, 1960. SAS can also
recognize 2 or 4 digit year values. We will use the "Sales" data set, which includes information on date of birth and hiring date for each employee, to
demonstrate some date functions.

DATA comp;
SET idre.sales;
Hire_Month=MONTH(Hire_Date);
Birth_Day = WEEKDAY(Birth_date);
Day_Dif = DATDIF(Birth_date,Hire_Date, 'actual');
Month_dif= INTCK('year',Birth_date,Hire_Date); *despite its name, this variable holds a count of years;
Bonus_1 = INTNX('month', Hire_Date, 6);
RUN;
The MONTH function pulls the month from "Hire_date" and puts it in a variable called "Hire_month". The WEEKDAY function figures out what day of the
week (1-7, where 1 is Sunday) the date would have fallen on and outputs this.
DATDIF calculates the difference in days between two dates given in the first two arguments. The third argument specifies the method for calculating the
days. In this example we specify we want the 'actual' number of days, but we could choose other methods of calculation, such as assuming that each month has
30 days and that a year always has 360 days.
INTCK counts the number of intervals between two dates. In our example we asked SAS to output the number of years between an employee's date of birth
and when they were hired, which approximates the employee's age at the time of hire. By default INTCK counts the interval boundaries crossed (here, each January 1) rather than completed years, so the result can be one more than the age in completed years.

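INTNX is used to calculate the variable bonus_1. Here we want to calculate when an employee will be eligible for their next bonus. The
arguments for this function are the unit of time, the variable representing the start date/time and the number of increments. In our example, employees are
eligible 6 months after their hire date. By default INTNX returns the first day of the resulting interval, so bonus_1 is the first day of the month that falls six months after the hire month.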
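Both INTCK and INTNX accept an optional fourth argument that changes how intervals are counted or aligned. The sketch below uses made-up dates (all values are assumptions for illustration) to show the default behavior next to the 'C' (continuous) method of INTCK and the 'SAME' alignment of INTNX.

```sas
* Hypothetical dates chosen for illustration only;
DATA _NULL_;
   birth = '31DEC1990'D;
   hire  = '01JAN2011'D;
   * Default (discrete) method counts each January 1 crossed;
   yrs_discrete = INTCK('year', birth, hire);
   * Continuous method counts completed years measured from the start date;
   yrs_continuous = INTCK('year', birth, hire, 'C');
   start = '15JAN2010'D;
   * Default alignment returns the first day of the target month;
   elig_begin = INTNX('month', start, 6);
   * SAME alignment keeps the day of the month;
   elig_same = INTNX('month', start, 6, 'SAME');
   PUT yrs_discrete= yrs_continuous=;
   PUT elig_begin= DATE9. elig_same= DATE9.;
RUN;
```

With these dates the Log should show yrs_discrete=21 and yrs_continuous=20, followed by elig_begin=01JUL2010 and elig_same=15JUL2010. The continuous method is usually the closer match to an age, and 'SAME' is the closer match to "exactly six months after the hire date".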
Below is the output of the first 10 observations of the "comp" data set, with and without date formats. As mentioned before, SAS stores date information as
numeric information in days. Thus if you do not format a date with a FORMAT statement (discussed further in the next section), it will display as just a number.

PROC PRINT DATA=comp (OBS=10);


VAR Employee_ID Hire_date Hire_Month Birth_date Birth_Day Day_dif Month_dif Bonus_1;
*FORMAT Hire_date Birth_date Bonus_1 mmddyy10.;
RUN;

More examples of SAS date functions can be found on the SAS Help and Documentation website.
4.3.3 Other Functions
SAS includes several other types of functions designed for specific needs. Many of these functions are helpful for data management of character or
string information. For example, LENGTH tells the user the length of a character string, while COMPRESS will compress string values and remove unwanted
blanks and specific character values like dashes. Additionally, in a similar way to extracting date information with the MONTH function, SAS has several
functions, including SCAN and SUBSTR, that allow you to extract words from a phrase.
Let's demonstrate these. Below is a sample of data from a data set called "Shoes_eclipse" where all the variables have character information. Our task of
interest is to obtain the length of product_name, compress product_name to remove the blanks, and create a variable that extracts the brand name
"Eclipse" from product_group.

DATA shoes;
SET idre.shoes_eclipse;
length_name = LENGTH(product_name);
comp_product = COMPRESS(product_name);
brand = SUBSTR(product_group, 1, 7);

brand2 = SCAN(product_group, 1, " ");


RUN;

You will notice a few things about the output above. First, for the variable length_name, if you counted the number of letters and spaces in product_name
you would end up with the same values displayed above. Second, the compressed version of product_name now includes no spaces. Third, both the SCAN
and SUBSTR functions produced the same output. The SUBSTR function takes 3 arguments: the name of the variable with the information you want to extract,
the character position you want to start from and then the number of characters to extract. In our example we are telling SAS that we want to extract a
character string of length 7 starting at the first character position of "product_group", which would be the "E" in Eclipse. Unfortunately, this means whatever
value we are extracting must always be of the same length. What if we have product names of different lengths? Then you might want to use the SCAN
function, which works very similarly to SUBSTR except that, instead of specifying a start position and a length, the second argument is the number of the word you want and the last argument is the delimiter that separates words. The syntax above
asks for the first word, that is, the characters from the start of the string until a blank/space is encountered. This function works with many types
of delimiters including < ( + & ! $ * ) ^ / , %.
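The following sketch shows the same four functions on two made-up strings (the variable names and values are assumptions for illustration, not values from Shoes_eclipse), including the optional second argument of COMPRESS and a word number other than 1 for SCAN.

```sas
* Hypothetical strings chosen for illustration only;
DATA _NULL_;
   LENGTH phone $ 12 group $ 20;
   phone = '555-12 34';
   group = 'Eclipse Shoes';
   * LENGTH ignores trailing blanks, so this is 9 rather than 12;
   len = LENGTH(phone);
   * The second argument lists the characters to remove, here dash and blank;
   digits = COMPRESS(phone, '- ');
   * SCAN returns the requested word, here the second blank-delimited word;
   word2 = SCAN(group, 2, ' ');
   * SUBSTR needs a start position and a number of characters;
   first7 = SUBSTR(group, 1, 7);
   PUT len= digits= word2= first7=;
RUN;
```

The Log should show len=9 digits=5551234 word2=Shoes first7=Eclipse. SCAN keeps working if the brand name is longer or shorter than seven characters, while the SUBSTR call would need its length argument changed.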
In the previous examples, we were extracting values from a string, but what if we wanted to combine string variables? A useful function would be CATX.
Below we want SAS to combine the character string information in first_name and last_name into one full name variable. The first argument to CATX
is the delimiter to place between the values, which can be given as a variable, as we do here, or as a quoted string. In the first example, the delimiter is just a blank while in the second
example the delimiter is a comma.

DATA salesquiz;
SET idre.salesquiz;
sep = " ";
fullname = CATX(sep, first_name, last_name);
sep1 = ",";
fullname1 = CATX(sep1, last_name, first_name);
RUN;

The new variables are displayed above.
A list of all SAS functions, by category, can be found on the SAS website.
Note: The order in which the variables are specified in the CATX function governs the order in which they will be combined.
4.4 Sorting, Merging and Appending
4.4.1 Sorting
There are many instances when having your data sorted in a particular way will be helpful for visualizing your data. Additionally, certain types of data
management needs, like merging data sets or grouping observations by a particular characteristic, require sorting.
Sorting data by a single variable in SAS is the simplest case. By default SAS sorts data in ascending order with the smaller values first.

PROC SORT DATA=idre.sales OUT=sales; *OUT= is optional;


BY Salary;
RUN;

Sorting can also be done using more than one variable.

PROC SORT DATA=idre.sales OUT=sales;


BY Salary Country;
RUN;

As you can see, the data is sorted in ascending order by "Salary" first and then, when there are tied salaries from different countries, AU comes before US
alphabetically. We can change this sorting behavior by flipping the ordering of our variables and/or adding in the DESCENDING option, which reverses the sort
order for the variable that immediately follows it.

PROC SORT DATA=idre.sales OUT=sales;


BY DESCENDING Salary DESCENDING Country;
RUN;

4.4.2 Merging
One data management task that requires proper sorting is merging. Merging involves matching one observation in a data set to one observation (One-to-One)
or multiple observations (One-to-Many) in a second data set. In order for this to be done properly in SAS, the data sets to be merged must be sorted by the
same variable(s). In the example below, we will merge a data set that has employee payroll information with a second data set with employee addresses.
Since an employee's ID number (employee_id) is a unique identifier of each observation, we will use this variable to match observations.
First, we need to sort each data set by employee_id.

PROC SORT DATA=idre.employee_payroll OUT=payroll;


BY Employee_ID;
RUN;

PROC SORT DATA=idre.employee_addresses OUT=addresses;


BY Employee_ID;
RUN;

A few things to take note of. First, you will notice that data sets "addresses" and "payroll" do not share any of the same variables except Employee_ID. In
general, you do not want to merge data sets that include variables with the same names. SAS can only keep one set of values and will keep the
values from the last data set read. Thus, you should rename variables before attempting the merge. Second, Employee_ID is unique in each data set, so
this will be a One-to-One merge.
Merging is done in a Data step similar to what we have been executing, except instead of the SET statement we now have a MERGE statement. Additionally,
the BY statement is used to tell SAS which variable will be used to match records. The variable after the BY keyword is the same unique identifier that we just used
for sorting.

DATA payadd;
MERGE payroll addresses;
BY Employee_ID;
RUN;
Below is a subset of variables from the newly merged data. As you can see, Employee_Name is from the "addresses" data and Birth_date and Salary are
from the "payroll" data.
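The payroll and address files do not share variable names other than the key, so no renaming was needed here. As a minimal sketch of the renaming advice given earlier, the example below builds two small hypothetical data sets (the names pay_demo and addr_demo and the shared variable Updated are made up for illustration) and uses the RENAME= data set option so that both versions of the shared variable survive the merge.

```sas
* Hypothetical data sets that both carry a variable named Updated;
DATA pay_demo;
   INPUT Employee_ID Salary Updated;
   DATALINES;
1 50000 2010
2 42000 2011
;
RUN;

DATA addr_demo;
   INPUT Employee_ID City $ Updated;
   DATALINES;
1 Sydney 2012
2 Miami 2013
;
RUN;

* RENAME= keeps both versions instead of letting one overwrite the other;
DATA payadd_demo;
   MERGE pay_demo  (RENAME=(Updated=Pay_Updated))
         addr_demo (RENAME=(Updated=Addr_Updated));
   BY Employee_ID;
RUN;

PROC PRINT DATA=payadd_demo;
RUN;
```

Both input data sets are already in Employee_ID order, so no PROC SORT is needed in this small example. Without the RENAME= options the merged data set would hold a single Updated variable containing the values from addr_demo.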

Now let's take a look at an example of a One-to-Many merge.
The first set of data provides information on order and delivery dates. In the second set of data we have information on the product or products ordered.
Because more than one item can be associated with a particular Order_ID, it is not unique in this data set. Thus, we will need to conduct a one-to-many
merge where each row in our "orders" data could be merged with multiple rows in the "order_item" data. Again, we will begin by sorting both sets of data by
Order_ID.

PROC SORT DATA=idre.orders OUT= orders;


BY Order_id;
RUN;

PROC SORT DATA=idre.order_item OUT= order_item;


BY Order_id;
RUN;

Below is our syntax to merge the two data sets. Notice we also used a KEEP statement. This allows us to merge the data and control which variables are
present in the final merged data set.

DATA allorders;
MERGE orders order_item;
BY Order_ID;
KEEP Order_ID Order_Item_Num Order_Type Order_Date Quantity Total_Retail_Price;
RUN;

Above is a selected portion of the merged data. SAS executed the merge without an error but it appears that we have some missing data as a result. The two
variables that have missing information were both from the "orders" data. This is an indication that we perhaps have some non-matches. If we go back and
look at the "orders" data we would see that there is no information for the order identifier "1243854878" but there is information in "order_item", thus when
you merge the data sets together all the variables from "orders" will have missing values for this particular order. There are a couple of ways you can deal with
this issue. First, you can leave the data as is with missing information for non-matches. Alternatively, you can choose to control the observations output to the
new merged data set by using the IN= data set option on the MERGE statement. The IN= option creates a variable indicating whether a data set contributed to forming the
observation in the final merged data set. It is a temporary variable used in the merging process that is given a value of 0 if the data set did not provide information or 1 if it
did. We could then use this variable to select observations in the newly merged data that come from one data set or both. Let's take a look at how we could
apply this option in our previous merge.

DATA allorders2;
MERGE orders (in=a)
order_item (in=b);
BY Order_ID;
KEEP Order_ID Order_Item_Num Order_Type Order_Date Quantity Total_Retail_Price;
IF a;
RUN;
Using the IN= option with a subsetting IF statement selects only observations whose Order_ID is present in "orders". If you have a value for Order_ID that is in "order_item" but not "orders", then it will not be used to construct observations for the "allorders2" data set. Note: Using IF a is equivalent to saying IF a=1. Thus, you will not end up with the missing values that these nonmatches produced.
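The same IN= variables can also route observations to different data sets. The sketch below is only an illustration: "orders_demo", "items_demo" and their values are hypothetical stand-ins for the seminar files. It keeps full matches in one data set and sends item rows without a matching order to another so they can be inspected.

```sas
* Hypothetical toy data in which order 2 has no items and order 3 has no order record;
DATA orders_demo;
  INPUT Order_ID Order_Type;
  DATALINES;
1 1
2 3
;
RUN;

DATA items_demo;
  INPUT Order_ID Quantity;
  DATALINES;
1 2
1 1
3 4
;
RUN;

* a and b are temporary 0/1 flags and are not written to the output data sets;
DATA matched only_items;
  MERGE orders_demo (IN=a)
        items_demo  (IN=b);
  BY Order_ID;
  IF a AND b THEN OUTPUT matched;
  ELSE IF b AND NOT a THEN OUTPUT only_items;
RUN;
```

With this toy data, "matched" should contain the two item rows for order 1, "only_items" the single row for order 3, and order 2 (an order with no items) is written to neither data set.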
Now, one thing you generally want to avoid is many to many merges. When neither data set has a unique identifier that will allow for proper matching of records, the result is a somewhat unpredictable and often undesirable assortment of observations.
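A small sketch may make the problem concrete. The toy data sets below are hypothetical and exist only to show how the Data step behaves when the BY value repeats in both inputs.

```sas
* Hypothetical toy data where id 1 repeats in both data sets;
DATA left_demo;
  INPUT id x $;
  DATALINES;
1 a
1 b
;
RUN;

DATA right_demo;
  INPUT id y $;
  DATALINES;
1 c
1 d
1 e
;
RUN;

DATA mm_demo;
  MERGE left_demo right_demo;
  BY id;
RUN;

PROC PRINT DATA=mm_demo;
RUN;
```

The Data step pairs rows by position within the BY group rather than forming every combination, so "mm_demo" should end up with three rows (a with c, b with d, and b again with e). SAS also writes a note to the log saying that the MERGE statement has more than one data set with repeats of BY values.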
4.4.3 Appending
Appending or concatenating observations is the process of adding rows or observations to a data set, as opposed to merging, which adds variables. This can also be accomplished using a Data step. SAS will stack the columns together by matching the variable names across data sets.
We will append three data sets that include information on orders from 3 consecutive months (July through September) in 2011. Below is a snapshot of the first two records from each of the data sets to be appended.

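The syntax to conduct the append is quite simple. All you have to do is list the data sets to be appended on the SET statement line. You can list any number of data sets, and the order in which you list them only determines the order in which their observations are stacked.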

DATA mnth7_8_9_2011 ;
SET idre.mnth7_2011 idre.mnth8_2011 idre.mnth9_2011;
RUN;
A portion of the newly appended data set is below.

Now you can see that all 3 data sets have been appended or "stacked" together. This example worked perfectly because the three data sets shared the exact same variables. But what happens when you append data sets that do not contain the same variables?
Take a look back at our "shoe" data. Below we have two sets of data, one for Eclipse shoes and one for Tracker shoes. You will notice they share all the same variables except two, product_id and supplier_name.

What will happen when we attempt to append the data?

DATA shoes;
SET idre.shoes_eclipse idre.shoes_tracker;
RUN;

The append still executes without error. However, in the new "shoes" data we created, all the records from the Eclipse data set will be missing on the variables that were only in the Tracker data set.
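When stacking data sets that differ like this, it can help to record which source each row came from. The sketch below is a hedged illustration: "eclipse_demo", "tracker_demo", the variable Source and all values are hypothetical, and only the IN= option and the SET statement come from the material above.

```sas
* Hypothetical toy data where only the Tracker data set has a Supplier variable;
DATA eclipse_demo;
  INPUT Product_Name $ Price;
  DATALINES;
Runner 60
;
RUN;

DATA tracker_demo;
  INPUT Product_Name $ Price Supplier $;
  DATALINES;
Trail 75 Acme
;
RUN;

* from_eclipse is 1 for rows read from eclipse_demo and 0 otherwise;
DATA shoes_demo;
  SET eclipse_demo (IN=from_eclipse) tracker_demo;
  LENGTH Source $ 8;
  IF from_eclipse THEN Source = 'Eclipse';
  ELSE Source = 'Tracker';
RUN;

PROC PRINT DATA=shoes_demo;
RUN;
```

In the printed result the Eclipse row should show a blank Supplier, and the Source column makes it clear that the blank comes from the data set structure rather than from a data entry problem.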

5.0 Modifying SAS Output
5.1 Titles and Footnotes
As you have probably already noticed, SAS provides a lot of output from many of its procedures. As a researcher it is important to know how to manipulate and change your output to convey important information to your audience. One of the procedures we have been using to obtain output from our various Data steps is PROC PRINT. We will begin by exploring some ways of enhancing the output from this procedure.
Whenever you are presenting tables of information, the first item most people look for is a title. This is easily implemented with a TITLE statement in SAS. It is also possible to add multiple titles, as well as footnotes, to output in SAS by adding a numeric suffix to the statement indicating the desired ordering. SAS allows for up to 10 titles and up to 10 footnotes.

TITLE1 'Orion Star Sales Staff';
TITLE2 'Salary Report';
FOOTNOTE1 'Confidential';
PROC PRINT DATA=idre.sales (OBS=5);
VAR Employee_ID Last_Name Salary;
RUN;


5.2 Label Options
You may also find it useful to label your variables for added readability. You can do this with the LABEL statement. You will notice that some of the variable labels are longer than others. When you only have three variables it may not seem important, but if you have to label 10 variables, available space in a table may need to be a consideration. One way to deal with this is to use the SPLIT= option on the PROC PRINT line. This allows the user to control the display of the label so that instead of the label being on one line, you can split it into two lines at the chosen split character.

PROC PRINT DATA=idre.sales (OBS=5) SPLIT='*';
VAR Employee_ID Last_Name Salary;
LABEL Employee_ID = 'Sales ID'
Last_Name = 'Last*Name'
Salary = 'Annual*Salary';
RUN;

The variable names have been replaced with labels. Note that we did not have to resubmit the TITLE and FOOTNOTE statements. These are considered global statements and remain in effect until you cancel them or you end your SAS session. To cancel these you must issue a blank statement for each:

TITLE;
FOOTNOTE;
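Titles can also be replaced selectively rather than cancelled all at once. The following sketch assumes the SASHELP.CLASS sample data set that ships with SAS, and the title text on lines 2 and 3 is made up for illustration. It shows that reissuing a numbered TITLE statement replaces that line and cancels any higher-numbered titles.

```sas
TITLE1 'Orion Star Sales Staff';
TITLE2 'Salary Report';
TITLE3 'Preliminary figures';
PROC PRINT DATA=sashelp.class (OBS=3);
RUN;

* Reissuing TITLE2 replaces line 2 and cancels TITLE3 while TITLE1 stays in effect;
TITLE2 'Height and Weight Report';
PROC PRINT DATA=sashelp.class (OBS=3);
RUN;

* A blank TITLE statement clears every title line;
TITLE;
```

The second PROC PRINT should display only the first title and the new second title. FOOTNOTE statements follow the same numbering rules.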
5.3 Formats
Beyond just labeling variables, you may also want to properly label the values of those variables. This is carried out in SAS using a FORMAT statement. Formatting values changes the appearance of those values in output, but the underlying values do NOT change.
SAS has predefined formats for certain types of variables like dates and allows users to create their own formats for specific situations. Earlier we saw some examples of predefined formats when we covered date functions. Here, we will focus on how to create and apply user defined formats.
In SAS, the FORMAT procedure (PROC FORMAT) is used to define formats. For example, take a look at the syntax below:

PROC FORMAT;
VALUE $ctryfmt 'AU'='Australia'
'US'='United States'
other ='Miscoded';
VALUE tiers 0-49999='Tier 1'
50000-99999='Tier 2'
100000-250000='Tier 3';
RUN;
Each format is defined in a VALUE statement. The name you choose is up to you. Notice that character formats must be defined with a "$" in front of them. You then provide a value label for each level or range of values. The first format we are creating is to label the countries with their full names instead of abbreviations. VALUE statements can also use keywords. In this example, other specifies that any values other than AU or US will be labeled as "Miscoded". For numeric formats, you can label a single value or a range of values. Once the formats are created we can apply them to the variables of interest.
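Ranges written with whole-number endpoints, such as 0-49999 followed by 50000-99999, leave values like 49999.50 without a label. A hedged sketch of one way to close such gaps uses the LOW and HIGH keywords and the "-<" operator, which excludes the upper endpoint. The format name tiersx and the data set "salary_demo" are hypothetical and are not part of the seminar files.

```sas
PROC FORMAT;
  VALUE tiersx LOW -< 50000    = 'Tier 1'
               50000 -< 100000 = 'Tier 2'
               100000 - HIGH   = 'Tier 3'
               .               = 'Missing';
RUN;

* Hypothetical values chosen to fall between or beyond whole-number ranges;
DATA salary_demo;
  INPUT Salary;
  DATALINES;
49999.50
75000
260000
.
;
RUN;

PROC PRINT DATA=salary_demo;
  FORMAT Salary tiersx.;
RUN;
```

For numeric formats, LOW does not include missing values, which is why the missing value gets its own label here.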

In both Data steps and Proc steps, SAS distinguishes formats from variables by ending them in a period, which then turns the text green.

PROC PRINT DATA=idre.sales (OBS=5);
VAR Employee_ID Salary Country Birth_Date Hire_Date;
FORMAT Salary tiers. Birth_Date Hire_Date monyy7. Country $ctryfmt.;
RUN;

Above you can compare the appearance of the table with unformatted values to the one with formatted values. If you only want to use the formatted values for certain procedures in SAS, then you can just add a FORMAT statement as we did above. If you want these formats to be permanently applied to a variable, then you can use the same FORMAT statement in a Data step.
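As a minimal sketch of the permanent approach, the Data step below attaches a format and a label to a variable so that later procedures pick them up automatically. It assumes the SASHELP.CLASS sample data set that ships with SAS. The format name $sexfmt, the label text and the variable Sex_Text are hypothetical choices for this illustration.

```sas
PROC FORMAT;
  VALUE $sexfmt 'F' = 'Female'
                'M' = 'Male';
RUN;

DATA class_fmt;
  SET sashelp.class;
  * FORMAT and LABEL in a Data step are stored with the new data set;
  FORMAT Sex $sexfmt.;
  LABEL Sex = 'Student Sex';
  * PUT creates a new character variable holding the formatted text;
  Sex_Text = PUT(Sex, $sexfmt.);
RUN;

PROC CONTENTS DATA=class_fmt;
RUN;
```

PROC CONTENTS should list the format and label next to Sex. Keep in mind that a data set with a permanently attached user defined format expects that format to be available whenever the data set is used in a later session.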
5.4 Output Delivery System (ODS) Basics
Besides customizing the SAS default output, you may want to output results to different file types. By default SAS 9.4 outputs results as HTML, and this is what you see in the "Results Viewer" window. If you would like to change this behavior, you will need to use an Output Delivery System (ODS) statement. This will allow for output in several different formats including listing/text, rtf, pdf and .xls.
ODS statements are also global statements and are in effect until closed. The statement comes before running the procedure. Once the procedure(s) is executed, you will then close the ODS destination. The basic syntax is shown below:

ODS PDF FILE="&path\example.pdf";
ODS RTF FILE="&path\example.rtf";
PROC FREQ DATA=<data>;
TABLES <variable>;
RUN;
ODS PDF CLOSE;
ODS RTF CLOSE;
In this case, the output from the PROC FREQ will be saved to a pdf file and an rtf file. You must also specify the path or location where you want these documents saved. Once complete you should then close the output destination; otherwise SAS will keep sending your results to these documents.
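The &path reference in the syntax above is a macro variable that must be defined before the ODS statement runs. One possible way to set it up, shown here only as a sketch, is to point it at the folder behind the WORK library, which is writable in most installations. The choice of folder and the use of SASHELP.CLASS are assumptions made for this illustration, and the forward slash is used as the separator on the assumption that your SAS installation accepts it.

```sas
* Assumption that the WORK library folder is an acceptable place for the file;
%LET path = %SYSFUNC(PATHNAME(work));

ODS PDF FILE="&path/example.pdf";
PROC FREQ DATA=sashelp.class;
  TABLES Sex;
RUN;
ODS PDF CLOSE;
```

Because the WORK folder is removed when the session ends, you would normally replace it with a permanent location of your own.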
Before SAS 9.3 the default output destination was listing. With the listing destination open, you can see the results from the PROC FREQ in the Output window and see a new icon in the Results tab.

ODS LISTING;
PROC FREQ DATA=idre.sales;
TABLES gender;
RUN;
ODS LISTING CLOSE;
As a user, you can also customize the style or look of the output when selecting either an html, pdf or rtf destination. This ability is often useful when formatting results for presentations or publications. Below are some examples of the different options available in SAS:

ODS HTML FILE="C:\myreport.html" STYLE=sasweb;
PROC FREQ DATA=idre.sales;
TABLES gender;
RUN;
ODS HTML CLOSE;

ODS PDF FILE="C:\myreport.pdf" STYLE=printer; /*Default*/
ODS PDF FILE="C:\myreport1.pdf" STYLE=journal;
PROC FREQ DATA=idre.sales;
TABLES gender;
RUN;
ODS PDF CLOSE;

Just to be safe we will go ahead and close all of the open destinations. However, be mindful that this will also close the html default, so you will need to reissue the global statement turning it back on. Otherwise the next time you issue a procedure that generates output, SAS will issue the warning "No output destination active".

ODS _ALL_ CLOSE;
ODS HTML;

6.0 Special Issues
6.1 Dealing with Duplicates
An issue that comes up a lot in data management is how to handle duplicates. There are several ways in SAS to identify duplicate records.
One way is to use some of the options available to us with the FREQ procedure. In the "nonsales" data file, we should have 235 unique employee identification numbers. We can use the ORDER=FREQ option to determine if this is true. This option displays the frequency of each unique identification number in descending order of frequency.

PROC FREQ DATA=idre.nonsales ORDER=FREQ;
TABLES Employee_ID;
RUN;

Above you can see that the employee ID #120108 has two records associated with it, indicating that we have a duplicate problem. Another useful option is NLEVELS, which displays the number of distinct values for each variable.

PROC FREQ DATA=idre.nonsales NLEVELS;
TABLES Employee_ID /NOPRINT;
RUN;

There are 235 employee records in the "nonsales" data but only 234 unique levels, meaning that one employee ID # is duplicated.
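With many IDs, scanning a long frequency table is tedious. A hedged alternative is to save the counts with the OUT= option of the TABLES statement and keep only IDs that occur more than once. The data set "ids_demo" and its values below are hypothetical stand-ins for the "nonsales" file.

```sas
* Hypothetical toy data in which ID 120108 appears twice;
DATA ids_demo;
  INPUT Employee_ID;
  DATALINES;
120101
120108
120108
120110
;
RUN;

* OUT= stores one row per ID with the variables COUNT and PERCENT;
PROC FREQ DATA=ids_demo NOPRINT;
  TABLES Employee_ID / OUT=id_counts (WHERE=(COUNT > 1));
RUN;

PROC PRINT DATA=id_counts;
RUN;
```

With this toy data, "id_counts" should contain a single row for ID 120108 with a COUNT of 2.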
Once the presence of duplicate ID numbers has been confirmed, you will most likely want to examine them to determine if they are indeed duplicate records or if the employee ID number is incorrect. In our data set only one ID is duplicated, making assessment fairly easy. However, what do you do when several IDs or records are duplicated? Let's separate the records with unique IDs from the duplicates using an IF statement.

PROC SORT DATA=idre.nonsales OUT=ids2;
BY employee_id;
RUN;

DATA dupes nodupes;
SET ids2;
BY employee_id;
IF NOT (FIRST.employee_id and LAST.employee_id) THEN OUTPUT dupes;
ELSE OUTPUT nodupes;
RUN;
Above we are using the keywords FIRST. and LAST. These keywords identify the first and last record within each group of the variable indicated after the BY statement. When an employee ID is unique, the first and last record will be the same row. Thus our code outputs records for employee IDs where the first and last records are not the same to a data set called "dupes", and all the other unique records are put in a data set called "nodupes".
"Dupes" or duplicated employee ID numbers:

"NoDupes" or unique employee ID numbers:

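A different technique from the FIRST. and LAST. approach above is the NODUPKEY option of PROC SORT, sketched here with hypothetical toy data. Note that it behaves differently: it keeps the first record for every ID in the OUT= data set and writes only the extra records to the DUPOUT= data set, so the first copy of a duplicated ID is not separated out.

```sas
* Hypothetical toy data in which ID 120108 appears twice;
DATA ids_demo2;
  INPUT Employee_ID Salary;
  DATALINES;
120101 30000
120108 27000
120108 27500
120110 28000
;
RUN;

* first_per_id keeps one row per ID and extra_rows receives the rest;
PROC SORT DATA=ids_demo2 OUT=first_per_id NODUPKEY DUPOUT=extra_rows;
  BY Employee_ID;
RUN;

PROC PRINT DATA=extra_rows;
RUN;
```

This is convenient when you have already decided which record to keep, while the Data step approach is better suited to reviewing every record involved in a duplication.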
6.2 Identifying Outliers
Another issue that comes up a lot in dealing with data is outliers. The simplest way in SAS to identify outliers is to use the UNIVARIATE procedure.
By default, the UNIVARIATE procedure outputs the 5 highest and 5 lowest observations. Let's examine outliers for product prices in the "price_new" data set.

PROC UNIVARIATE DATA=idre.price_new;
VAR unit_cost_price;
RUN;

You can override this default by specifying the option NEXTROBS= on the procedure line and indicating the number of extreme observations to display. You can specify any number between 0 and half of the total observations. You will also notice that along with the extreme values, SAS also provides an observation or row number that corresponds to each value. Additionally, you can use the ID statement to identify observations. This statement specifies one or more variables to be included in the extreme observations table. Let's try adding the product identifier Product_ID to each of our extreme values.

PROC UNIVARIATE DATA=idre.price_new NEXTROBS=3;
VAR unit_cost_price;
ID Product_ID;
RUN;

Now each extreme value is associated with its ID number.
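If you want to work with the extreme observations as data rather than just view them, one option is to capture the table with an ODS OUTPUT statement. The sketch below assumes the SASHELP.CLASS sample data set that ships with SAS and uses ExtremeObs, the ODS table name PROC UNIVARIATE gives to this table. The output data set name "extremes_demo" is a hypothetical choice.

```sas
* Request that the extreme observations table be saved as a data set;
ODS OUTPUT ExtremeObs=extremes_demo;
PROC UNIVARIATE DATA=sashelp.class NEXTROBS=3;
  VAR Height;
  ID Name;
RUN;

PROC PRINT DATA=extremes_demo;
RUN;
```

The saved data set can then be filtered, merged or printed like any other SAS data set.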

7.0 Wrapping things up
As we stated in the beginning, SAS is a very flexible program with great features for data management.
This seminar only scratches the surface on describing all of the programming options available to users.
For more information on the topics discussed here please explore our website.
Additionally, SAS has a host of courses designed to improve your programming skills aimed at users of all levels.

2016 UC Regents
