
-------------- Creating a library using a macro variable

%let path=/folders/myfolders/ecprg193;
libname orion "&path";
DATA steps -- read input data to create a SAS data set.
PROC steps -- procedure steps process a SAS data set, usually to produce a report.
RUN; -- explicitly ends each step.
Global statements, e.g. TITLE, sit outside DATA and PROC steps.
SAS statements are free-format.
Unbalanced quotation marks: SAS may show no error or warning messages. You have to stop execution manually and then correct the code.
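Accessing SAS data sets.
Work is a temporary library; its data sets are deleted at the end of the session.
A data set name has the form libref.data-set-name.
work.newssalesman -------- two-level name; newssalesman on its own is a one-level name and refers to the temporary Work library.
LIBNAME libref 'sas_library' <options>; the libref must be 8 characters or fewer.
You can specify the LIBNAME statement at the top of the code.
EG: libname orion 'filepath';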
A macro variable is referenced with "&".
To list the contents of a library:
proc contents data = libref._ALL_;
run;
The NODS option suppresses the descriptor details of the individual data sets. It is valid only with _ALL_ and is separated from it by a space: data = libref._ALL_ nods
Make sure you have assigned the libref before the code that uses it.
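A minimal sketch of the library steps above. It assumes the folder used earlier in these notes exists in your session and holds SAS data sets; change the path otherwise.

```sas
/* Assumption: this folder exists and contains SAS data sets. */
%let path=/folders/myfolders/ecprg193;
libname orion "&path";

/* List the members of the library; NODS drops the per-data-set details. */
proc contents data=orion._all_ nods;
run;
```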
A row is called an observation in SAS,
a column a variable,
a table a data set.
Missing values are shown by default as a blank for character variables and as a period (.) for numeric variables.
Variable attributes include the format, informat, and label.
SAS variable names can start with a letter (A to Z) or an underscore (_), but not with a digit.
By default PROC PRINT displays all variables, but you can override this with the VAR statement.
sum Salary; adds a total for Salary; for several variables use sum var1 var2 var3;
Subset observations by using WHERE.
not equal ^=
** exponentiation
Two WHERE statements in one step ------ the second WHERE replaces the first. Use logical operators (AND, OR) to combine conditions instead.
We can add NOOBS, as in proc print data = orion.sales noobs; to suppress the observation numbers.
The contains operator ? selects values that include a 'substring':
where country = 'au' and
job ? 'rep';
We can replace the Obs column with a unique ID by using
var var1 var2;
ID Customer_Id;
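The PROC PRINT statements above can be combined in one step. This is only a sketch: work.sales and its values are hypothetical stand-ins for orion.sales, whose real layout is not shown in these notes.

```sas
/* Hypothetical sample data standing in for orion.sales. */
data work.sales;
   input Customer_Id Country $ Job $ Salary;
   datalines;
101 au rep 26000
102 au manager 48000
103 us rep 27500
;
run;

proc print data=work.sales;
   id Customer_Id;                        /* replaces the Obs column */
   var Job Salary;                        /* only these variables are printed */
   where Country = 'au' and Job ? 'rep';  /* ? is the contains operator */
   sum Salary;                            /* adds a total line for Salary */
run;
```

Character comparisons are case-sensitive, so 'au' does not match 'AU'.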

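PROC SORT sorts and replaces the original data set unless you use OUT=<output-SAS-data-set>:
proc sort data=<input-SAS-data-set> out=<output-SAS-data-set>;
BY <DESCENDING> var;
Ascending order is the default; DESCENDING goes in front of the variable it applies to.
BY COUNTRY DESCENDING SALARY; sorts by country and, within each country, by descending salary (like grouping by country).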
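A short sketch of sorting into a new data set so the original is left untouched. The data set names and values here are made up for illustration.

```sas
/* Hypothetical sample data. */
data work.sales;
   input Country $ Salary;
   datalines;
US 27500
AU 26000
AU 48000
;
run;

/* OUT= keeps work.sales as it is and writes the sorted copy elsewhere. */
proc sort data=work.sales out=work.sales_sorted;
   by Country descending Salary;
run;

proc print data=work.sales_sorted noobs;
run;
```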
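The default title is "The SAS System".
TITLE and FOOTNOTE statements are global.
A new title1 issued after title3 replaces title1 and cancels titles 2 and 3.
At the end you can clear all titles with title;
and all footnotes with footnote;
Use label var = 'newnameofvariable'; and add the LABEL option to the PROC PRINT statement to display it.
split='*' in the PROC PRINT statement breaks the labels at each * character.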
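SAS formats:
Used for date formats, adding $ signs, and so on.
Use the FORMAT statement.
format salary dollar8. Hire_date mmddyy10.;
Format names have the form <$>format<w>.<d>, where $ marks a character format, w is the total width, and d is the number of decimal places.
Character values are truncated if the width is too small.
SAS dates count days from Jan 1 1960, which is stored as 0; earlier dates are stored as negative numbers.
mmddyy6.   010160
mmddyy8.   01/01/60
mmddyy10.  01/01/1960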
Create your own format with PROC FORMAT:
proc format;
value format-name value-or-range-1 = 'formatted-value-1'
value-or-range-2 = 'formatted-value-2';
run;
The keyword OTHER covers all remaining values, e.g. other = 'AUSTRALIA' (no quotation marks around OTHER).
PROC FORMAT;
VALUE $CONTRYFMT 'AU' = 'AUSTRALIA'
'US' = 'USA'
OTHER = 'MISCODE';
VALUE $SPORTS
'FB' = 'FOOTBALL';
RUN;
proc print data = orion.sales label;
format salary dollar10. birth_date Hire_date monyy7. country $contryfmt.;
run;
The keywords LOW and HIGH can be used in numeric ranges, e.g. in a tiers format.
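As an illustration of LOW and HIGH, here is a sketch of a numeric tiers format. The cut-off values, the tier labels, and the sample data are assumptions made for the example, not values from the course.

```sas
/* Hypothetical salary tiers; -< excludes the upper bound of a range. */
proc format;
   value tiers
      low   -< 30000 = 'Tier 1'
      30000 -< 50000 = 'Tier 2'
      50000 -  high  = 'Tier 3';
run;

/* Hypothetical sample data. */
data work.sales;
   input Country $ Salary;
   datalines;
AU 26000
US 48000
AU 75000
;
run;

/* The format changes only how Salary is displayed, not the stored value. */
proc print data=work.sales noobs;
   format Salary tiers.;
run;
```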
Reading a SAS data set.
Creating a subset in a DATA step:
data work.subset1;
set orion.sales;
where country = 'AU';
run;
Now work.subset1 holds only the observations with country = 'AU'.
SAS date constant ----- hire_date < '01jan2000'd; for subsetting on a date variable.
If an operand in an arithmetic expression has a missing value, the result is a missing value.
DROP and KEEP: list whichever set of variables is shorter (if fewer variables are kept than dropped, use KEEP).
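The subsetting notes above can be sketched in one DATA step: a date constant in WHERE, an arithmetic expression with a missing operand, and KEEP. The data set, the Bonus variable, and all values are hypothetical.

```sas
/* Hypothetical sample data; the second record has a missing Bonus. */
data work.sales;
   input Country $ Salary Bonus Hire_Date :date9.;
   format Hire_Date date9.;
   datalines;
AU 26000 500 15MAR1995
AU 48000 . 01JUN1997
US 27500 300 20OCT1998
;
run;

data work.subset1;
   set work.sales;
   where Country = 'AU' and Hire_Date < '01jan2000'd;
   Total = Salary + Bonus;       /* missing whenever Salary or Bonus is missing */
   keep Country Hire_Date Total; /* shorter than listing the dropped variables */
run;

proc print data=work.subset1 noobs;
run;
```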
Compilation phase:
SAS scans the step for syntax errors.
It creates the program data vector (PDV) --- an area of memory where SAS builds one observation at a time; it includes the automatic variables _N_ and _ERROR_.
For each variable a slot is added to the PDV.
It creates the descriptor portion of the output data set, with the data set and variable names.
Execution phase (after successful compilation):
SAS reads each observation into the PDV and writes it from the PDV to the output data set.
The PDV contains only one observation at any time.
The WHERE statement selects observations as they are read from the input data set into the PDV.

Subsetting IF statements:
if expression;
if expr2;
if expr3;
The subsetting IF cannot be used in a PROC step; WHERE can be used in a PROC step.
In a DATA step we can use both IF and WHERE. WHERE can only reference variables that exist in the input data set; it cannot reference variables created by assignment statements, because they are not in the input data set. Use the subsetting IF for those.
If you use LABEL in a PROC step the labels are temporary; to write the labels to the descriptor portion, define them in the DATA step.
Similarly, we can permanently assign a format to a variable in the DATA step.
To display the labels, add the LABEL option to the PROC PRINT statement.
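A sketch of the difference between WHERE and the subsetting IF, with a permanent label and format. The 10% bonus rate, the data set names, and the values are assumptions for illustration only.

```sas
/* Hypothetical sample data. */
data work.sales;
   input Country $ Salary;
   datalines;
AU 26000
AU 48000
US 27500
;
run;

data work.highbonus;
   set work.sales;
   where Country = 'AU';         /* Country exists in the input data set */
   Bonus = Salary * 0.10;        /* created in this step (assumed rate) */
   if Bonus > 3000;              /* the subsetting IF can test the new variable */
   label Bonus = 'Annual Bonus'; /* stored in the descriptor portion */
   format Bonus dollar8.;        /* stored permanently as well */
run;

/* LABEL is still needed here for PROC PRINT to display the label. */
proc print data=work.highbonus label noobs;
run;
```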
Reading spreadsheet data and printing it with PROC PRINT.
This uses the SAS/ACCESS Interface to PC Files.
With the SAS/ACCESS LIBNAME statement we can also connect to third-party data sources, e.g. xls workbooks or Oracle.
libname libref <engine> "workbook-name" <options>;
The bitness of SAS and of Microsoft Office should be the same (both 32-bit or both 64-bit).
If not, we should specify the PCFILES engine:
libname orionx pcfiles path="&path/sales.xls";
proc contents data = orionx._all_;
run;
This lists everything in the library orionx.
If a member name ends with $, it is a worksheet from the xls file.
libref.'worksheetname$'n is the correct way to reference a worksheet; the name literal allows the special character in the worksheet name.
Printing the worksheet ------ proc print data = orionx.'worksheet$'n noobs;
It is important to disassociate the data source when finished, because the SAS libref puts a lock on the xls file.
libname orionx clear; disassociates it.
Creating a SAS data set from a worksheet:
data work.subset;
set orionx.'Australia$'n;
run;

Accessing a database by using SAS/ACCESS:
libname libref engine <SAS/ACCESS-options>;
libname oralib oracle user=orclusername password=pas path=oracledriver schema=schemaname;
proc print data = oralib.supervisors;
run;
(We can use the database tables as if they were SAS data sets.)
Reading raw data files (delimited or fixed-column), which have no column headings.
We have to provide the names of the variables and their data types.
Techniques:
list input --- values separated by a delimiter (standard or nonstandard data).
column input --- values in fixed columns; standard data only.
formatted input --- reads standard and nonstandard data in fixed columns.
standard --- data SAS can read without special instructions.
nonstandard --- data with special characters.
csv --- when the data is standard we use list input.
Instead of a SET statement, the DATA step uses INFILE and INPUT statements to read raw data.
INFILE specifies the file location and INPUT names the SAS variables to create.
INFILE "filpath/filena.csv" dlm = ',';
Use double quotation marks when the path references a macro variable, here &path.
By default the delimiter in a raw data file is a blank.
INPUT gives the name and type of each variable; for standard data that is enough.
input emp_Id first_name $;
Date values in a raw file are nonstandard.
A libref is used to access SAS data sets in a SAS data library. The INFILE statement references the raw data file, so you do not need to use a libref to point to it.
Character values are truncated if you don't specify the length of the variable: with list input it gets only 8 bytes by default.
SAS creates an input buffer only when reading raw data, not when reading a SAS data set.
Then SAS creates the PDV, the area of memory where it builds an observation, with _N_ and _ERROR_,
and adds a slot for each variable we create, 8 bytes by default.
SAS initializes the PDV at the start of the execution phase.
length variable <$> 40;
All these attributes are assigned in the PDV.
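Order of statements:
data work.sales2;
length First_Name $ 12;
infile 'raw-data-file-name';
input First_Name $;
run;
LENGTH comes before INPUT so that the length is set when the variable is first added to the PDV.
We can add VARNUM to PROC CONTENTS to list the variables in their order in the data set instead of alphabetically.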
Reading nonstandard and standard values from raw data.
We can use modified list input ---- a colon followed by an informat.
Informat -- input First_Name :$12.;
Informat -- input Birth_date :date.; reads DDMMMYY (or DDMMMYYYY)
INPUT HIREDATE :MMDDYY.;
We can add DROP, KEEP, IF, LABEL, and FORMAT statements in the DATA step. WHERE applies only when the step reads a SAS data set, not raw data.
We have to add LABEL to the PROC PRINT statement to display the labels.
We can read instream data with the DATALINES keyword; it must be the last statement in the DATA step.
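data work.newemps;
input First_Name $;
datalines;
steve
john
;
run;
For delimited instream data, name DATALINES in the INFILE statement:
data work.newemps;
infile datalines dlm = ',';
input First_Name $;
datalines;
steve
john
;
run;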
If any value in a record is missing, we can use
infile 'raw-data-file-name' dlm=',' DSD; (delimiter-sensitive data)
DSD treats two consecutive delimiters as a missing value. Use it when the missing values are still marked by delimiters; if values are missing at the end of a record with no delimiters, use the MISSOVER option instead of DSD.
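The raw-data notes above can be sketched with instream data, so no external file is needed. The variable names and records are hypothetical; the second record has two consecutive commas to show what DSD does.

```sas
data work.newemps;
   length First_Name $ 12;                 /* without this, list input gives 8 bytes */
   infile datalines dlm=',' dsd;           /* DSD: ,, is read as a missing value */
   input First_Name $ Salary Hire_Date :mmddyy10.;  /* colon = modified list input */
   format Hire_Date mmddyy10.;
   datalines;
Christopher,26000,03/15/1995
Ann,,06/01/2004
;
run;

/* VARNUM lists the variables in creation order. */
proc contents data=work.newemps varnum;
run;
```

Because LENGTH comes first, First_Name is the first variable in the data set and holds up to 12 characters.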
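sum(salary,bonus,money,general) ---- adds the arguments, ignoring missing values
YEAR(SAS-DATE), QTR(SAS-DATE), MONTH(SAS-DATE)
TODAY() ---- a date function that gives the current date
DATE() ---------- same as above
MDY(MONTH,DAY,YEAR) ---- builds a SAS date
IF expression THEN statement;
ELSE statement; (using ELSE improves performance, because later conditions are not tested once one is true)
To run several statements for one condition, use a DO group:
do;
statements;
end;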
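A sketch that puts the functions and the IF-THEN/ELSE DO groups above together. The data, the Region and Rate variables, and their values are invented for the example.

```sas
/* Hypothetical sample data; the second record has a missing Bonus. */
data work.sales;
   input Country $ Salary Bonus Hire_Date :date9.;
   datalines;
AU 26000 500 15MAR1995
US 27500 . 20OCT1998
;
run;

data work.comp;
   set work.sales;
   Total = sum(Salary, Bonus);      /* SUM ignores the missing Bonus */
   Hire_Year  = year(Hire_Date);
   Hire_Month = month(Hire_Date);
   /* First day of the hire month in the current year. */
   Anniversary = mdy(Hire_Month, 1, year(today()));
   if Country = 'AU' then do;
      Region = 'Pacific';           /* first assignment sets the length of Region */
      Rate = 0.10;
   end;
   else do;
      Region = 'Other';
      Rate = 0.05;
   end;
   format Hire_Date Anniversary date9.;
run;
```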
Concatenating data sets:
data sas-dataset;
set sas-data-set1 sas-data-set2;
run;
SAS does not reinitialize the PDV between observations read from the same SAS data set, but it does reinitialize it when it starts reading the second data set while concatenating.
Use the RENAME= data set option to concatenate data sets whose variables have different names:
datasettobechanged(RENAME=(oldname-1=newname-1
oldname-2=newname-2));
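A small sketch of concatenation with RENAME=. Both input data sets and the variable name FName are hypothetical.

```sas
data work.emps_au;
   input First_Name $ Salary;
   datalines;
steve 26000
;
run;

/* Same information, but the name variable is called FName here. */
data work.emps_us;
   input FName $ Salary;
   datalines;
john 27500
;
run;

/* RENAME= aligns the variable names as the second data set is read. */
data work.emps_all;
   set work.emps_au
       work.emps_us(rename=(FName=First_Name));
run;
```

Without the rename, the result would have both First_Name and FName, each missing for half of the observations.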
Joins in SAS are done by merging.
Match-merging is based on one or more common variables and covers:
one-to-one relationships (the same observations in both data sets)
many-to-one relationships
many-to-many relationships
nonmatches

We do this with the MERGE statement:
data sasdataset;
merge sasdataset1 sasdataset2;
by <descending> commonvariable;
run;
Both data sets must be sorted by the common variable.
SAS does not reinitialize the PDV within a BY group while merging.
When the BY value changes, as it does for a nonmatching record, SAS reinitializes the PDV before building that observation.
You can use the IN= data set option, e.g. sasdataset1(in=in1), to flag which data set contributed to an observation.
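A sketch of a match-merge that separates matches from nonmatches with IN=. The data sets, the phone numbers, and the flag names inEmps and inPhones are all made up for illustration.

```sas
/* Hypothetical inputs, already in Employee_Id order. */
data work.emps;
   input Employee_Id First_Name $;
   datalines;
1 steve
2 john
3 ann
;
run;

data work.phones;
   input Employee_Id Phone $;
   datalines;
1 555-0100
3 555-0199
4 555-0142
;
run;

data work.matched work.no_phone;
   merge work.emps(in=inEmps) work.phones(in=inPhones);
   by Employee_Id;
   if inEmps and inPhones then output work.matched;  /* IDs found in both */
   else if inEmps then output work.no_phone;         /* employees without a phone */
run;
```

The IN= variables are temporary: they are available during the step but are not written to the output data sets.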
SAS reporting:
You can create reports with PROC FREQ, PROC MEANS, and PROC UNIVARIATE.
To send output to external files, use ODS (Output Delivery System) statements.
proc freq data = orion.sales;
tables gender country;
where country = 'AU';
run;
By default each table has four columns: frequency, percent, cumulative frequency, and cumulative percent.
You can suppress the cumulative columns by using NOCUM in the TABLES statement:
tables gender / nocum;
To suppress the percentages as well, we use
tables gender / nocum nopercent;
We can use formats to group numeric values, e.g. tiers.
proc freq data = orion.sales;
tables salary;
format salary tiers.;
run;
proc freq data = orion.sales;
tables salary gender;
run;
You will get two separate tables for the whole data set, which is not very useful. So we add a BY statement after the TABLES statement:
proc freq data = orion.sales;
tables salary gender;
by country;
run;
But before this works we need to sort the data set by country. We have to do this whenever we use a BY statement.
eg:
proc sort data = orion.sales out = sorted;
by country;
run;
proc freq data = sorted;
tables salary gender;
by country;
run;
Cross-tabulation by using * in PROC FREQ:
proc freq data = orion.sales;
tables gender*country;
run;
Here gender specifies the rows and country specifies the columns of the output table.
In cross-tabulation results each cell has four values: frequency, percent, row percent, and column percent.
If you want them in separate columns, use
tables gender*country / crosslist;
We cannot use NOCUM for cross-tabulations, but we can use NOPERCENT.
We can use /nopercent;
/nofreq;
/nocol;
/no
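The TABLES options above can be tried on a small sample. This is a sketch with hypothetical data; it uses only options named in these notes.

```sas
/* Hypothetical sample data. */
data work.sales;
   input Gender $ Country $;
   datalines;
F AU
M AU
M US
F US
F AU
;
run;

proc freq data=work.sales;
   tables Gender / nocum nopercent;          /* one-way table, frequency only */
   tables Gender*Country / nopercent nocol;  /* crosstab without those cell values */
   tables Gender*Country / crosslist;        /* cell values in separate columns */
run;
```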
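We can apply PROC FORMAT formats in PROC FREQ tables.
We can validate data with PROC FREQ.
To find any duplicates:
proc freq data = orion.nonsales2 order=freq;
tables employee_Id / nocum nopercent;
run;
ORDER=FREQ lists the most frequent values first, so duplicated IDs appear at the top.
We can also specify NLEVELS instead of ORDER=FREQ to find duplicates; it reports the number of distinct values, which can be compared with the number of observations.
proc freq data = orion.nonsales2 nlevels;
tables employee_Id / nocum nopercent;
run;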
STATISTICS:
Using PROC MEANS:
proc means data = sasdataset;
var analysis-variables;
run;
By default you get N, mean, standard deviation, minimum, and maximum.
We can group the analysis by using CLASS:
proc means data = orion.sales;
var salary;
class gender country;
run;
Requesting specific statistics, e.g. just N and mean: list them in the PROC MEANS statement.
proc means data = sasdataset n mean;
var analysis-variables;
run;
proc means data = sasdataset min max sum;
var analysis-variables;
run;
You can control the number of decimal places with MAXDEC=, e.g. maxdec=0.
We can use NMISS to identify missing data.
proc means data = sasdataset min nmiss max;
run;
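A sketch combining the PROC MEANS options above: chosen statistics, NMISS, MAXDEC=, and CLASS. The sample data is hypothetical and includes one missing salary.

```sas
/* Hypothetical sample data; the last record has a missing Salary. */
data work.sales;
   input Gender $ Country $ Salary;
   datalines;
F AU 26000
M AU 48000
M US 27500
F US .
;
run;

/* Statistics and MAXDEC= go in the PROC MEANS statement, not in VAR. */
proc means data=work.sales n nmiss mean min max maxdec=0;
   var Salary;
   class Gender Country;
run;
```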
Detecting data outliers by using PROC UNIVARIATE:
proc univariate data = orion.sales2;
var salary;
run;
In the Extreme Observations table, Obs gives the observation number, not the number of observations.
SAS Output Delivery System, by using ODS statements:
ODS destination FILE="filename" <options>;
<SAS code to generate the report>
ODS destination CLOSE;
ods pdf file="path/salaries.pdf";
<SAS code to generate the report>
ods pdf close;
For csv and rtf:
ods csvall file="path/salaries.csv";
ods rtf file="path/salaries.rtf";
<SAS code to generate the report>
ods csvall close;
ods rtf close;
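A sketch of the open, report, close pattern with two destinations at once. It assumes the session may write files to the physical folder behind the Work library, and it prints sashelp.class, a sample data set shipped with SAS, in place of the course data. The macro variable outdir is a name chosen for this example.

```sas
/* Assumption: the folder behind the Work library is writable. */
%let outdir=%sysfunc(pathname(work));

ods pdf    file="&outdir/salaries.pdf";
ods csvall file="&outdir/salaries.csv";

/* Any report code goes here; its output is sent to every open destination. */
proc print data=sashelp.class noobs;
run;

/* The files are complete only after the destinations are closed. */
ods csvall close;
ods pdf close;
```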
