
Introduction

SAS is a collection of modules that are used to process and analyze data. It began in the late 1960s and early 1970s as a statistical package (the name SAS originally stood for Statistical Analysis System). However, unlike many competing statistical packages, SAS is also an extremely powerful, general-purpose programming language. It has since been enhanced to provide state-of-the-art data mining tools and programs for Web development and analysis.

Getting Data into SAS


SAS can read data from almost any source. Common sources of data are raw text files, Microsoft Office Excel spreadsheets, Access databases, and most of the common database systems such as DB2 and Oracle. In SAS terminology, each piece of information is called a variable. (Other database systems, and sometimes SAS, use the term column.)

File c:\books\learning\veggies.txt

Cucumber 50104-A 55 30 195
Cucumber 51789-A 56 30 225
Carrot 50179-A 68 1500 395
Carrot 50872-A 65 1500 225
Corn 57224-A 75 200 295
Corn 62471-A 80 200 395
Corn 57828-A 66 200 295
Eggplant 52233-A 70 30 225

In this example, each line of data produces what SAS calls an observation (also referred to as a row in other systems).

Program 1-1 A sample SAS program

*SAS Program to read veggie data file and to produce
 several reports;
options nocenter nonumber;

data veg;
   infile "c:\books\learning\veggies.txt";
   input Name $ Code $ Days Number Price;
   CostPerSeed = Price / Number;
run;

title "List of the Raw Data";
proc print data=veg;
run;

title "Frequency Distribution of Vegetable Names";
proc freq data=veg;
   tables Name;
run;

title "Average Cost of Seeds";
proc means data=veg;
   var Price Days;
run;

SAS programs often contain DATA steps and PROC steps. DATA steps are parts of the program where you can read or write the data, manipulate the data, and perform calculations. PROC (short for procedure) steps are parts of your program where you ask SAS to run one or more of its procedures to produce reports, summarize the data, generate graphs, and much more. DATA steps begin with the word DATA and PROC steps begin with the word PROC. Most DATA and PROC steps end with a RUN statement (more on this later). SAS processes each DATA or PROC step completely and then goes on to the next step.

SAS also contains global statements that affect the entire SAS environment and remain in effect from one DATA or PROC step to another. In the program above, the OPTIONS and TITLE statements are examples of global statements. It is important to keep in mind that the actions of global statements remain in effect until they are changed by another global statement or until you end your SAS session.

All SAS programs, whether part of DATA or PROC steps, are made up of statements. Here is the rule: all SAS statements end with semicolons. Because a semicolon determines the end of a SAS statement, you can place more than one statement on a single line (although this is not recommended as a matter of style).

Another thing to notice about this program is that SAS is not case sensitive. Well, this is almost true: references to external files must follow the rules of your operating system, so if you are running SAS under UNIX or Linux, file names will be case-sensitive. Variable names are a second, smaller exception. Although SAS doesn't care whether you write variable names in uppercase, lowercase, or mixed case, it does remember the case of each variable name the first time it encounters that variable and uses that form of the name when producing printed reports.
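The points above about steps, statements, global statements, and case can be illustrated with a minimal sketch. The data set name demo, the variables X, Y, and Total, and the in-stream data values are made up for illustration; DATALINES is used only so the example does not depend on an external file.

```sas
/* Global statement: stays in effect for every step that follows */
title "Demo of Steps and Statements";

/* DATA step: keywords and names may be typed in any case */
DATA demo;
   Input X Y;
   Total = x + y;   /* x and X refer to the same variable */
datalines;
1 2
3 4
;
run;

/* PROC step: the TITLE above still applies here.          */
/* Column headings should appear as X, Y, and Total, the   */
/* case used the first time each name was encountered.     */
proc print data=demo;
run;
```

Each statement ends with a semicolon, and SAS finishes the DATA step before it starts the PROC PRINT step.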

SAS Names
SAS names follow a simple naming rule: all SAS variable names and data set names can be no longer than 32 characters and must begin with a letter or the underscore ( _ ) character. The remaining characters in the name may be letters, digits, or the underscore character. Characters such as dashes and spaces are not allowed.

Valid names:
Parts
LastName
First_Name
Ques5
Cost_per_Pound
DATE
time
X12Y34Z56

Invalid names:
8_is_enough        Begins with a number
Price per Pound    Contains blanks
Month-total        Contains an invalid character (-)
Num%               Contains an invalid character (%)
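As a small sketch of the naming rule in use, the step below assigns values to several of the valid names listed above. The data set name names_demo and the assigned values are hypothetical. The invalid names appear only inside a comment, because using them under the naming rule described here would produce errors.

```sas
data names_demo;
   /* Valid: each name starts with a letter or underscore */
   LastName       = "Smith";
   First_Name     = "Ann";
   Ques5          = 3;
   Cost_per_Pound = 2.50;
   _temp          = 1;

   /* Invalid under the rule above, so left commented out:
      8_is_enough = 1     (begins with a number)
      Month-total = 1     (contains a dash)                */
run;

proc print data=names_demo;
run;
```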

SAS Data Sets and SAS Data Types


Wherever SAS reads data from (for example, raw data or spreadsheets), it stores the data in its own special form called a SAS data set. Only SAS can read and write SAS data sets. Even if SAS is reading data from Oracle tables or DB2, it is actually converting the data into SAS data set format in the background. SAS data sets contain two parts: a descriptor portion and a data portion.

SAS has only two types of variables: character and numeric. SAS determines a fixed storage length for every variable. Numeric values are stored in 8 bytes unless you specify otherwise. Each character value (data stored as letters, special characters, and numerals) is also assigned a fixed storage length.
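One way to look at the descriptor portion and the storage lengths described above is PROC CONTENTS, which lists each variable's name, type, and length. The sketch below is illustrative only: the data set name types_demo and the choice of a 12-byte length for Name are assumptions made for this example.

```sas
data types_demo;
   length Name $ 12;     /* character variable, 12 bytes   */
   Name = "Cucumber";
   Days = 55;            /* numeric variable, 8 bytes      */
run;

/* Displays the descriptor portion: variable names, */
/* types (Char or Num), and storage lengths         */
proc contents data=types_demo;
run;
```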

The SAS Display Manager and SAS Enterprise Guide


The SAS Display Manager is the environment where you write your program in the Enhanced Editor (Editor window), see any error messages and comments about your program and the data in the Log window, and view your output in the Output window. In addition to the Enhanced Editor, an older program, simply called the Program Editor, is available for Windows and UNIX users. As an alternative to the Display Manager, you may enter the SAS environment using SAS Enterprise Guide, which is a front end to SAS that allows you to use a menu-driven system to write SAS programs and produce reports.

Writing Your First SAS Program


The task: you have data values in a text file. These values represent Gender (M or F), Age, Height, and Weight. Each data value is separated from the next by one or more blanks. You want to produce two reports: one showing the frequencies for Gender (how many Ms and Fs); the other showing the average age, height, and weight for all the subjects. (This example follows Learning SAS by Example: A Programmer's Guide.)

Here is a listing of the raw data file that you want to analyze:

File c:\books\learning\mydata.txt

M 50 68 155
F 23 60 101
M 65 72 220
F 35 65 133
M 15 71 166

Here is the program:


data demographic;
   infile "c:\books\learning\mydata.txt";
   input Gender $ Age Height Weight;
run;

title "Gender Frequencies";
proc freq data=demographic;
   tables Gender;
run;

title "Summary Statistics";
proc means data=demographic;
   var Age Height Weight;
run;

Notice that this program consists of one DATA step followed by two PROC steps. The DATA step begins with the word DATA. In this program, the name of the SAS data set being created is Demographic. The next line (the INFILE statement) tells SAS where the data values are coming from. In this example, the text file mydata.txt is in the folder c:\books\learning on a Windows-based system.

The INPUT statement shown here uses one of four different methods that SAS has for reading raw data. This program uses the list input method, appropriate for data values separated by delimiters. The default data delimiter for SAS is the blank. SAS can also read data separated by any other delimiter (for example, commas or tabs) with a minor change to the INFILE statement. When you use the list input method for reading data, you only need to list the names you want to give each data value. SAS calls these variable names.

Notice the dollar sign ($) following the variable name Gender. A dollar sign following a variable name tells SAS that the values for that variable are character values. Without a dollar sign, SAS assumes values are numbers and should be stored as SAS numeric values. Finally, the DATA step ends with a RUN statement. You will see later that, depending on the platform on which you are running your SAS program, RUN statements are not always necessary.

The text following the word TITLE (placed in single or double quotes) is printed at the top of each page of SAS output. Statements such as the TITLE statement are called global statements. The term global refers to the fact that the operations these statements perform are not tied to one single DATA or PROC step. They affect the entire SAS environment.

In addition, the operations performed by these global statements remain in effect until they are changed. For example, if you have a single TITLE statement in the beginning of your program, that title will head every page of output from that point on until you write a new TITLE statement. It is a good practice to place a TITLE statement before every procedure that produces output to make it easy for someone to read and understand the information on the page. If you exit your SAS session, your titles are all reset and you need to submit new TITLE statements if you want them to appear.

The FREQ procedure (also called PROC FREQ) is one of the many built-in SAS procedures. As the name implies, this procedure counts frequencies of data values. To tell this procedure which variables to count frequencies on, you add an additional statement, the TABLES (or TABLE) statement. Following the word TABLES, you list those variables for which you want frequency counts. You could actually omit this statement but, if you did, PROC FREQ would compute frequencies for every variable in your data set.

PROC MEANS is another built-in SAS procedure that computes means (averages) as well as some other statistics such as the minimum and maximum value of each variable. A VAR (short for variables) statement supplies PROC MEANS with a list of analysis variables (which must be numeric) for which you want to compute these statistics. Without a VAR statement, PROC MEANS computes statistics on every numeric variable in your data set.

You can submit your program by using the menu system (Run->Submit), by pressing the appropriate function key (F3 in Windows), or by clicking the Submit icon (picture of a running person). Here is what your screen would look like after you have typed this program into the editor:

What you see are the Output and Log windows. (The exact appearance of these windows will vary, depending on how you have set up SAS.) The Output window (the top one) shows part of the output. To see it all, you can click on this window to make it active (alternatives: use a function key or select an item from the View menu on the Menu bar) and then scroll up or down. You can also click the Print icon to send this output to your printer. The output from this program is shown next:

Gender Frequencies

The FREQ Procedure

                                 Cumulative   Cumulative
Gender   Frequency   Percent     Frequency    Percent
--------------------------------------------------------
F        2           40.00       2            40.00
M        3           60.00       5            100.00

Summary Statistics

The MEANS Procedure

Variable   N   Mean          Std Dev      Minimum       Maximum
----------------------------------------------------------------
Age        5   37.6000000    20.2188031   15.0000000    65.0000000
Height     5   67.2000000    4.8682646    60.0000000    72.0000000
Weight     5   155.0000000   44.0056815   101.0000000   220.0000000
----------------------------------------------------------------
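A few of the variations mentioned in the discussion of this program can be sketched as follows. This is an illustration under stated assumptions, not part of the original program: the data set name demographic_csv is hypothetical, the same five records are supplied in-stream with DATALINES (in comma-separated form) so the example does not need an external file, and the statistics keywords and MAXDEC= option on PROC MEANS are one possible choice among many.

```sas
/* Variation 1: comma-delimited data. The DSD option on */
/* INFILE is the "minor change" that makes the comma    */
/* the delimiter for list input.                        */
data demographic_csv;
   infile datalines dsd;
   input Gender $ Age Height Weight;
datalines;
M,50,68,155
F,23,60,101
M,65,72,220
F,35,65,133
M,15,71,166
;
run;

/* Variation 2: no TABLES statement, so PROC FREQ    */
/* produces a frequency table for every variable     */
title "Frequencies for Every Variable";
proc freq data=demographic_csv;
run;

/* Variation 3: request specific statistics and      */
/* limit the printed values to one decimal place     */
title "Selected Summary Statistics";
proc means data=demographic_csv n mean min max maxdec=1;
   var Age Height Weight;
run;
```

For tab-delimited data, a common approach is to name the delimiter explicitly on the INFILE statement with the DLM= option (for example, dlm='09'x on ASCII systems) instead of relying on DSD's default comma.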
