
Very Basic

1. What SAS statements would you code to read an external raw data file to a DATA step?

Ans. The INFILE and INPUT statements. Ex:
    DATA EMP;
        INFILE 'E:\ABC\EMPLOYEE.TXT' MISSOVER;
        INPUT VAR1 VAR2 ... VARN;
    RUN;

2. How do you read in the variables that you need?
Ans. Use the INPUT statement with column pointer controls and column ranges, such as @5 or 12-17.

3. Are you familiar with special input delimiters? How are they used?
Ans. DLM= and DSD are the ones that I've used. They are specified in the INFILE statement. Comma-separated values (CSV) files are a common type of file that can be read with the DSD option. DSD treats two consecutive delimiters as a missing value and ignores delimiters enclosed in quotation marks.

4. If reading a variable length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value?
Ans. Use the MISSOVER option in the INFILE statement. If some data lines are shorter than others, use the TRUNCOVER option in the INFILE statement.

5. What is the difference between an informat and a format? Name three informats or formats.
Ans. Informats read the data; formats write the data. Many formats have the same names as informats.
Informats: COMMAw., DOLLARw., MMDDYYw., DATEw., TIMEw., PERCENTw.
Formats: WORDDATE18., WEEKDATEw.

6. Name and describe three SAS functions that you have used, if any?
Ans.
LENGTH: returns the length of an argument, not counting trailing blanks (a missing value has a length of 1). Ex: a='my cat'; x=LENGTH(a); Result: x=6.
SUBSTR: SUBSTR(arg, position, n) extracts a substring from an argument, starting at position, for n characters or until the end if n is omitted. Ex: a='(916)734-6241'; x=SUBSTR(a,2,3); Result: x='916'.
TRIM: removes trailing blanks from a character expression. Ex: a='my '; b='cat'; x=TRIM(a)||b; Result: x='mycat'.
SUM: returns the sum of the nonmissing values. Ex: x=SUM(3,5,1); Result: x=9.
INT: returns the integer portion of the argument.

7. How would you code the criteria to restrict the output to be produced?
Ans. Use the NOPRINT option, which suppresses a procedure's printed output.

8. What is the purpose of the trailing @? The @@? How would you use them?
Ans. A single trailing @ holds the current record for the next INPUT statement in the same iteration of the DATA step. A double trailing @@ holds the record across iterations of the DATA step, until the end of the line is reached.
Double trailing @@: when a line of raw data contains multiple observations, put @@ at the end of the INPUT statement. This line-hold specifier is like a stop sign telling SAS, "Stop, hold that line of raw data."
Trailing @: using @ without specifying a column is like telling SAS, "Stay tuned for more information. Don't touch that dial." SAS holds the line of data until it reaches either the end of the DATA step or an INPUT statement that does not end with a trailing @.

9. Under what circumstances would you code a SELECT construct instead of IF statements?
Ans. When you have a long series of mutually exclusive conditions and the comparison is numeric, a SELECT group is slightly more efficient than IF-THEN or IF-THEN-ELSE statements because CPU time is reduced. A SELECT group has these parts:
SELECT: begins a SELECT group.
WHEN: identifies SAS statements that are executed when a particular condition is true.
OTHERWISE (optional): specifies a statement to be executed if no WHEN condition is met.
END: ends a SELECT group.

10. What statement do you code to tell SAS that it is to write to an external file? What statement do you code to write the record to the file?
Ans. The FILE statement names the external file, and the PUT statement writes the record to it.

11. If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record?

12. If you're not wanting any SAS output from a data step, how would you code the data statement to prevent SAS from producing a set?
Ans. DATA _NULL_;

13. What is the one statement to set the criteria of data that can be coded in any step?
Ans. The WHERE statement, which can be used in both DATA and PROC steps to select the observations to process. (The OPTIONS statement can also appear anywhere in a SAS program, but it sets system options that affect all steps that follow it rather than selecting data.)

14. Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.

15. How would you include common or reuse code to be processed along with your statements?
Ans. By using SAS macros, or the %INCLUDE statement for code stored in an external file.
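As an illustration of the DSD and MISSOVER options mentioned above, here is a minimal sketch that reads comma-delimited records from in-stream data. The data set name WORK.EMP, the variable names and the sample records are made up for this example.

```sas
/* Sketch only: data set, variables and records are hypothetical. */
data work.emp;
    /* DSD: comma is the default delimiter, two commas in a row give a   */
    /* missing value, and commas inside quotation marks are kept as data. */
    /* MISSOVER: a short record sets the remaining variables to missing   */
    /* instead of reading them from the next record.                      */
    infile datalines dsd missover;
    length name $ 20 city $ 20;
    input name $ city $ salary;
    datalines;
"Smith, John",Boston,52000
Lee,,48000
Gray,Austin
;
run;

proc print data=work.emp;
run;
```

In this sketch the second record has a missing city and the third record has a missing salary, and each record still produces exactly one observation.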

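The trailing @ and @@ line-hold specifiers described above can be sketched as follows. The data set names, variable names, column positions and records are assumptions chosen for illustration.

```sas
/* Double trailing @@: several observations on one line of raw data. */
data work.scores;
    input name $ score @@;
    datalines;
Ann 90 Bob 85 Cy 77
Dee 88
;
run;

/* Single trailing @: read part of the record, test it, then read    */
/* the rest of the same record only if it is wanted.                 */
data work.sales_only;
    input type $ 1 @;        /* read column 1 and hold the line      */
    if type = 'S';           /* keep only records that start with S  */
    input amount 3-8;        /* read the rest of the held line       */
    datalines;
S   1200
R    300
S    950
;
run;
```

The first step builds one observation per name and score pair, while the second step avoids reading the amount from records it is going to discard.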

16. When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?
Ans. INDEX, which returns the position of the first occurrence of a substring. (SCAN extracts the nth word from a string, and INDEXC returns the position of the first occurrence of any one of a set of characters.)

17. If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables?
Ans. Use the KEEP= data set option or the KEEP statement.

18. Code a PROC SORT on a data set containing State, District and County as the primary variables, along with several numeric variables.
Ans.
    proc sort data=one;
        by State District County;
    run;

19. How would you delete duplicate observations?
Ans. Use the NODUPLICATES option in PROC SORT.

20. How would you delete observations with duplicate keys?
Ans. Use the NODUPKEY option in PROC SORT.

21. How would you code a merge that will keep only the observations that have matches from both sets?
Ans. Use the IN= data set option on each data set in the MERGE statement and test the IN= variables with an IF statement.

22. How would you code a merge that will write the matches of both to one data set, the non-matches from the left-most data set to a second data set, and the non-matches of the right-most data set to a third data set?
Ans.
Step 1: Name three output data sets in the DATA statement.
Step 2: Assign an IN= variable to each of the two data sets in the MERGE statement.
Step 3: Test the IN= variables with IF statements, and output the matches to the first data set and the non-matches to the other two data sets.
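The three steps for splitting matches and non-matches can be sketched like this. The data set names (WORK.ONE, WORK.TWO and the three outputs), the BY variable ID and the sample rows are hypothetical. Both inputs are already in ID order here; otherwise they would need a PROC SORT first.

```sas
/* Hypothetical input data sets, already sorted by id. */
data work.one;
    input id a $;
    datalines;
1 A1
2 A2
3 A3
;
run;

data work.two;
    input id b $;
    datalines;
2 B2
3 B3
4 B4
;
run;

/* Step 1: three output data sets in the DATA statement. */
data work.matched work.one_only work.two_only;
    /* Step 2: an IN= variable for each input data set. */
    merge work.one(in=in_one) work.two(in=in_two);
    by id;
    /* Step 3: route each observation according to its source. */
    if in_one and in_two then output work.matched;
    else if in_one then output work.one_only;
    else output work.two_only;
run;
```

Keeping only the first OUTPUT branch, or replacing the three branches with the subsetting statement `if in_one and in_two;`, gives the matches-only merge asked about in the preceding question.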

Internals

23. What is the Program Data Vector (PDV)? What are its functions?
Ans. Function: to store the current observation. The PDV is a logical area in memory where SAS builds a data set, one observation at a time. When SAS processes a DATA step it goes through two phases: the compilation phase and the execution phase. During the compilation phase the input buffer is created to hold a record from the external file, and after the input buffer is created the PDV is created. The PDV contains two automatic variables, _N_ and _ERROR_. It is a set of buffers that includes all variables referenced either explicitly or implicitly in the DATA step. It is created at compile time, then used at execution time as the location where the working values of variables are stored as they are processed by the DATA step program.

24. Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.
Ans. SAS compiles the code: each DATA step is compiled first and then executed.
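One way to look at the contents of the PDV described above is to write it to the log on each iteration with PUT _ALL_. The following is a small sketch. The data set name WORK.PDV_DEMO and its variables are invented for the example.

```sas
/* Sketch only: names and data are hypothetical. */
data work.pdv_demo;
    input id score;
    /* _ALL_ writes every variable in the PDV to the log, including */
    /* the automatic variables _N_ and _ERROR_.                     */
    put 'PDV after INPUT: ' _all_;
    datalines;
1 90
2 85
;
run;
```

The log then shows one line per iteration of the DATA step, which makes it easy to see _N_ increasing while _N_ and _ERROR_ themselves are not written to the output data set.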

25. At compile time when a SAS data set is read, what items are created?
Ans. The input buffer, the PDV (including the automatic variables) and the descriptor information.

26. Name statements that are recognized at compile time only?
Ans. Declarative statements such as DROP, KEEP, RENAME, LABEL, FORMAT, INFORMAT, LENGTH, ARRAY and RETAIN. (PUT, by contrast, is an executable statement.)

27. Name statements that are execution only.
Ans. INFILE and INPUT are executable statements, although INPUT also defines its variables at compile time (see question 29).

28. Identify statements whose placement in the DATA step is critical.
Ans. DATA, INPUT, RUN.

29. Name statements that function at both compile and execution time.
Ans. INPUT.

30. In the flow of DATA step processing, what is the first action in a typical DATA step?
Ans. The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.

31. What is _N_?
Ans. It is an automatic counter variable that SAS creates during DATA step processing. Both _N_ and _ERROR_ are always available in the DATA step, but not in PROCs. _N_ indicates the number of times SAS has looped through the DATA step. This is not necessarily equal to the observation number, since a simple subsetting IF statement can change the relationship between the observation number and the number of iterations of the DATA step. _ERROR_ has a value of 1 if there is an error in the data for that observation and 0 if there is not.
Ex: to select every third record in a data set (records 1, 4, 7 and so on), use _N_ as follows:
    data new;
        set old;
        if mod(_n_, 3) = 1;
    run;
Note: if a WHERE clause is used to subset the input, _N_ will not give the required result, because it counts only the observations that pass the WHERE condition.

32. How do I convert a numeric variable to a character variable?
Ans. You must create a differently-named variable using the PUT function.

33. How do I convert a character variable to a numeric variable?
Ans. You must create a differently-named variable using the INPUT function.

34. How can I compute the age of something?
Ans. Given two SAS date variables born and calc:
    age = int(intck('month',born,calc) / 12);
    if month(born) = month(calc) then age = age - (day(born) > day(calc));
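To see the age formula above at work, here is a minimal sketch with two made-up dates one day short of a 40th birthday. The date values are assumptions chosen to exercise the same-month adjustment.

```sas
data _null_;
    born = '15MAR1980'd;   /* hypothetical date of birth       */
    calc = '14MAR2020'd;   /* hypothetical date of calculation */
    /* INTCK counts 480 month boundaries, so INT(480/12) is 40. */
    age = int(intck('month', born, calc) / 12);
    /* Same month, but the birthday has not yet been reached:   */
    /* the comparison is true (1), so 1 is subtracted.          */
    if month(born) = month(calc) then age = age - (day(born) > day(calc));
    put born= date9. calc= date9. age=;   /* age=39 */
run;
```

Changing calc to '15MAR2020'd makes the comparison false (0), and the result becomes 40.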

35. How can I compute the number of months between two dates?
Ans. Given two SAS date variables begin and end:
    months = intck('month',begin,end) - (day(end) < day(begin));

Base SAS

36. What is the effect of the OPTIONS statement ERRORS=1?
37. What's the difference between VAR A1 - A4 and VAR A1 -- A4?
38. What do the SAS log messages "numeric values have been converted to character" mean? What are the implications?
39. Why is a STOP statement needed for the POINT= option on a SET statement?
40. How do you control the number of observations and/or variables read or written?
41. Approximately what date is represented by the SAS date value of 730?
42. How would you remove a format that has been permanently associated with a variable?
43. What does the RUN statement do?
44. Why is SAS considered self-documenting?
45. What areas of SAS are you most interested in?
46. Briefly describe 5 ways to do a "table lookup" in SAS.
47. What versions of SAS have you used (on which platforms)?
48. What are some good SAS programming practices for processing very large data sets?
49. What are some problems you might encounter in processing missing values?
    * In DATA steps? Arithmetic? Comparisons? Functions? Classifying data?
50. How would you create a data set with 1 observation and 30 variables from a data set with 30 observations and 1 variable?
51. What is the difference between functions and PROCs that calculate the same simple descriptive statistics?
52. If you were told to create many records from one record, show how you would do this using arrays and with PROC TRANSPOSE?
53. What are _numeric_ and _character_ and what do they do?
54. How would you create multiple observations from a single observation?
55. For what purpose would you use the RETAIN statement?
56. What is a method for assigning first.VAR and last.VAR to the BY group variable on unsorted data?
57. What is the order of application for output data set options, input data set options and SAS statements?
58. What is the order of evaluation of the arithmetic operators: + - * / ** ( ) ?

Testing, debugging

59. How could you generate test data with no input data?
60. How do you debug and test your SAS programs?
61. What can you learn from the SAS log when debugging?
62. What is the purpose of _error_?
63. How can you put a "trace" in your program?
64. Are you sensitive to code walk-throughs, peer review, or QC review?
65. Have you ever used the SAS Debugger?
66. What other SAS features do you use for error trapping and data validation?

Missing values

67. How does SAS handle missing values in: assignment statements, functions, a merge, an update, sort order, formats, PROCs?
68. How many missing values are available? When might you use them?

69. How do you test for missing values?
70. How are numeric and character missing values represented internally?

General

71. What has been your most common programming mistake?
72. What is your favorite programming language and why?
73. What is your favorite operating system? Why?
74. Do you observe any coding standards? What is your opinion of them?
75. What percent of your program code is usually original and what percent copied and modified?
76. Have you ever had to follow SOPs or programming guidelines?
77. Which is worse: not testing your programs or not commenting your programs?
78. Name several ways to achieve efficiency in your program. Explain trade-offs.
79. What other SAS products have you used and consider yourself proficient in using?

Functions

80. How do you make use of functions?
81. When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?
82. What is the significance of the 'OF' in X=SUM(OF a1-a4, a6, a9);?
83. What do the PUT and INPUT functions do?
84. Which date function advances a date, time or date/time value by a given interval?
85. What do the MOD and INT functions do?
86. How might you use MOD and INT on numerics to mimic SUBSTR on character strings?
87. In ARRAY processing, what does the DIM function do?
88. How would you determine the number of missing or nonmissing values in computations?
89. What is the difference between: x=a+b+c+d; and x=SUM(a,b,c,d);?
90. There is a field containing a date. It needs to be displayed in the format "ddmonyy" if it's before 1975, "dd mon ccyy" if it's after 1985, and as 'Disco Years' if it's between 1975 and 1985. How would you accomplish this in DATA step code? Using only PROC FORMAT?
91. In the following DATA step, what is needed for 'fraction' to print to the log?
    data _null_;
        x=1/3;
        if x=.3333 then put 'fraction';
    run;
92. What is the difference between calculating the 'mean' using the mean function and PROC MEANS?

PROCs

93. Have you ever used "Proc Merge"? (be prepared for surprising answers..)
94. If you were given several SAS data sets you were unfamiliar with, how would you find out the variable names and formats of each data set?
95. What SAS PROCs have you used and consider yourself proficient in using?
96. How would you keep SAS from overlaying a SAS data set with its sorted version?
97. In PROC PRINT, can you print only variables that begin with the letter "A"?
98. What are some differences between PROC SUMMARY and PROC MEANS?
    PROC FREQ:
    * Code the TABLES statement for a single-level (most common) frequency.
    * Code the TABLES statement to produce a multi-level frequency.
    * Name the option to produce a frequency as line items rather than a table.
    * Produce output from a frequency. Restrict the printing of the table.
    PROC MEANS:
    * Code a PROC MEANS that shows both summed and averaged output of the data.
    * Code the option that will allow MEANS to include missing numeric data in the report.
    * Code the MEANS to produce output to be used later.

99. Do you use PROC REPORT or PROC TABULATE? Which do you prefer? Explain.

Merging/Updating

100. What happens in a one-on-one merge? When would you use one?
101. How would you combine 3 or more tables with different structures?
102. What is a problem with merging two data sets that have variables with the same name but different data?
103. When would you choose to MERGE two data sets together and when would you SET two data sets?
104. Which data set is the controlling data set in the MERGE statement?
105. How do the IN= variables improve the capability of a MERGE?
106. Explain the message "MERGE HAS ONE OR MORE DATASETS WITH REPEATS OF BY VARIABLES".

Simple statistics

107. How would you generate 1000 observations from a normal distribution with a mean of 50 and standard deviation of 20? How would you use PROC CHART to look at the distribution? Describe the shape of the distribution.
108. How do you generate random samples?

Customized Report Writing

109. What is the purpose of the statement DATA _NULL_ ;?
110. What is the pound sign used for in the DATA _NULL_?
111. What would you use the trailing @ sign for?
112. For what purpose(s) would you use the RETURN statement?
113. How would you determine how far down on a page you have printed in order to print out footnotes?
114. What is the purpose of using the N=PS option?
