
Traditional SAS Programming versus SQL

Rick Andrews
Office of the Actuary
Centers for Medicare and Medicaid Services
Background

• SQL - Structured Query Language
  • 1975 - Created by IBM®
  • 1986 - Standardized by the American National Standards Institute (ANSI)
  • 1999 - Major revision, adopted as FIPS 127-2
  • 2003 - Ordered Analytical (a.k.a. Window) Functions

• SAS - Statistical Analysis System
  • 1976 - Incorporated by Jim
  • 1990 - SAS implements SQL in version 6
  • 2000 - SAS implements SQL:1999 in version 8
  • SAS does not yet utilize Window Functions
Background (cont.)

• ANSI SQL:1999 (SAS 9)
  Clauses: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY

• ANSI SQL:2003
  • Ordered Analytical Functions
  • a.k.a. Window Functions
  • a.k.a. OLAP Functions
  Clauses: OVER, PARTITION BY, ORDER BY, ROWS, QUALIFY

• ANSI SQL:2008
  • Not discussed
Terminology

SQL SAS
Table Data Set
Row Observation
Column Variable
Join Merge
Query Program
CREATING TABLES

• SAS: DATALINES (a.k.a. CARDS), RUN
• SQL: CREATE TABLE, INSERT INTO, QUIT

SAS DATA step:

    DATA table1a;
      LENGTH var1 $3. var2 $2.
             var3 $8. numvar 8.;
      INPUT var1 $ var2 $
            var3 $ numvar;
      DATALINES;
    SAS is Great 1
    SAS is Good 2
    Let us Thank 4
    Jim we Should 8
    ;
    RUN;

PROC SQL:

    PROC SQL;
      CREATE TABLE table1b
        ( var1 CHAR(3), var2 CHAR(2),
          var3 CHAR(8), numvar NUM );
      INSERT INTO table1b
        VALUES('SAS','is','Great ',1)
        VALUES('SAS','is','Good ',2)
        VALUES('Let','us','Thank ',4)
        VALUES('Jim','we','Should',8);
    QUIT;
Example Table / Data Set

Var1 Var2 Var3 NumVar

SAS is Great 1
SAS is Good 2
Let us Thank 4
Jim we Should 8
SUB-SETTING TABLES

• SAS: WHERE with <= var <=
• SQL: WHERE with BETWEEN

SAS DATA step:

    DATA table2a;
      SET table1a;
      WHERE 2 <= numvar <= 4;
    RUN;

PROC SQL:

    PROC SQL;
      CREATE TABLE table2b AS
      SELECT * FROM table1b
      WHERE numvar BETWEEN 2 AND 4;
    QUIT;

• The WHERE clause is interchangeable


SORTING

• SAS: PROC SORT
• SQL: ORDER BY

SAS:

    PROC SORT DATA= table1a;
      BY var1;
    RUN;

PROC SQL:

    PROC SQL;
      CREATE TABLE table1b AS
      SELECT *
      FROM table1b
      ORDER BY var1;
    QUIT;
ELIMINATING DUPLICATES

• SAS: NODUPS
• SQL: DISTINCT

SAS:

    PROC SORT
      DATA= table1a
      OUT= table2a
      NODUPS ;
      BY var1;
    RUN;

PROC SQL:

    PROC SQL;
      CREATE TABLE table2b AS
      SELECT DISTINCT *
      FROM table1b
      ORDER BY var1;
    QUIT;
ELIMINATING DUPLICATE KEYS

• SAS: NODUPKEY (similar to FIRST (dot))
• SQL: GROUP BY (no equivalent in SQL:1999)

SAS:

    PROC SORT
      DATA= table1a
      OUT= table3a
      NODUPKEY ;
      BY var1;
    RUN;

PROC SQL:

    PROC SQL;
      CREATE TABLE table3b AS
      SELECT MIN (var1) AS var1,
             MIN (var2) AS var2,
             MIN (var3) AS var3,
             MIN (numvar) AS numvar
      FROM table1b
      GROUP BY var1
      ORDER BY var1;
    QUIT;
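
One caveat of the MIN-per-column workaround is that each MIN is evaluated independently, so the result can be a row that never existed in the table. The sketch below illustrates this with the example data, using Python's built-in sqlite3 module as an assumed stand-in for a database that supports SQL:2003 window functions (PROC SQL does not). Ordering by numvar to decide which row counts as "first" is an assumption made for this illustration, and trailing blanks in var3 are omitted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1b (var1 TEXT, var2 TEXT, var3 TEXT, numvar INT)")
con.executemany(
    "INSERT INTO table1b VALUES (?, ?, ?, ?)",
    [("SAS", "is", "Great", 1), ("SAS", "is", "Good", 2),
     ("Let", "us", "Thank", 4), ("Jim", "we", "Should", 8)],
)

# Slide's approach: one MIN per column, grouped by the key.
min_rows = con.execute("""
    SELECT MIN(var1), MIN(var2), MIN(var3), MIN(numvar)
    FROM table1b
    GROUP BY var1
    ORDER BY 1
""").fetchall()

# Window-function approach: number the rows within each key and keep row 1.
# ORDER BY numvar is an assumed definition of "first" for this sketch.
first_rows = con.execute("""
    SELECT var1, var2, var3, numvar
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY var1 ORDER BY numvar) AS rn
          FROM table1b)
    WHERE rn = 1
    ORDER BY var1
""").fetchall()

print(min_rows)    # the SAS row mixes 'Good' (from numvar 2) with numvar 1
print(first_rows)  # the SAS row is the intact record ('SAS', 'is', 'Great', 1)
```

For keys that occur only once (Jim, Let) the two approaches agree; they differ only where a key has several rows.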
GROUPING

• SAS: FIRST (dot), LAST (dot)
• SQL: Aggregate Functions, GROUP BY

SAS:

    PROC SORT DATA= table1a;
      BY var1;
    RUN;

    DATA table5a ( DROP= numvar );
      SET table1a ( KEEP= var1 numvar);
      BY var1;
      IF FIRST.var1 THEN DO;
        cnt_var = 0;
        sum_var = 0;
      END;
      cnt_var + 1;
      sum_var + numvar;
      IF LAST.var1 THEN OUTPUT;
    RUN;

PROC SQL:

    PROC SQL;
      CREATE TABLE table5b AS
      SELECT var1,
             COUNT(*) AS cnt_var,
             SUM(numvar) AS sum_var
      FROM table1b
      GROUP BY var1;
    QUIT;

Result:

Var1  cnt_var  sum_var
Jim   1        8
Let   1        4
SAS   2        3
First (dot) Example

    DATA table5a;
      SET table1a ( KEEP= var1 numvar );
      BY var1;

      first_var = FIRST.var1;
      last_var = LAST.var1;

      IF FIRST.var1 = 1 THEN DO;
        cntvar = 0;
        sumvar = 0;
      END;

      cntvar + 1;
      sumvar + numvar;

      IF LAST.var1 = 1 THEN keep_it = 'yes';
    RUN;

Input:

Var1  Var2  Var3    NumVar
SAS   is    Great   1
SAS   is    Good    2
Let   us    Thank   4
Jim   we    Should  8

Output:

Var1  NumVar  first_var  last_var  cntvar  sumvar  keep_it
Jim   8       1          1         1       8       yes
Let   4       1          1         1       4       yes
SAS   1       1          0         1       1
SAS   2       0          1         2       3       yes
GROUPING using PROC SUMMARY

• PROC SUMMARY
• NWAY MISSING
• CLASS, VAR
• OUTPUT with N= and SUM=

SAS:

    PROC SUMMARY DATA= table1a
      NWAY MISSING;
      CLASS var1;
      VAR numvar;
      OUTPUT OUT= table5c
        N= cntvar
        SUM= sumvar;
    RUN;

PROC SQL:

    PROC SQL;
      CREATE TABLE table5b AS
      SELECT var1,
             COUNT(*) AS cntvar,
             SUM(numvar) AS sumvar
      FROM table1b
      GROUP BY var1;
    QUIT;
Type of Joins

• INNER JOIN
• LEFT OUTER JOIN
• RIGHT OUTER JOIN
• FULL OUTER JOIN


Join Examples

Table A        Table B
Var1 Var2      Var1 Var3
A    1         A    W
B    2         B    X
C    3         C    Y
D    4         E    Z

Inner Join        Left Outer Join   Right Outer Join  Full Outer Join
Var1 Var2 Var3    Var1 Var2 Var3    Var1 Var2 Var3    Var1 Var2 Var3
A    1    W       A    1    W       A    1    W       A    1    W
B    2    X       B    2    X       B    2    X       B    2    X
C    3    Y       C    3    Y       C    3    Y       C    3    Y
                  D    4            E         Z       D    4
                                                      E         Z
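
The join results above can be reproduced with a short script. This is a minimal sketch that uses Python's built-in sqlite3 module as an assumed stand-in database; the table names table_a and table_b are hypothetical. Because older SQLite releases lack FULL OUTER JOIN, the full join is emulated here as a left join plus the unmatched Table B rows, which is a swapped-in technique rather than the syntax shown later in the deck.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table_a (var1 TEXT, var2 INT);
    CREATE TABLE table_b (var1 TEXT, var3 TEXT);
    INSERT INTO table_a VALUES ('A', 1), ('B', 2), ('C', 3), ('D', 4);
    INSERT INTO table_b VALUES ('A', 'W'), ('B', 'X'), ('C', 'Y'), ('E', 'Z');
""")

# Inner join: only keys present in both tables.
inner_rows = con.execute("""
    SELECT a.var1, a.var2, b.var3
    FROM table_a AS a INNER JOIN table_b AS b ON a.var1 = b.var1
    ORDER BY a.var1
""").fetchall()

# Left outer join: every Table A row, with NULL where Table B has no match.
left_rows = con.execute("""
    SELECT a.var1, a.var2, b.var3
    FROM table_a AS a LEFT JOIN table_b AS b ON a.var1 = b.var1
    ORDER BY a.var1
""").fetchall()

# Full outer join, emulated: left join plus Table B rows with no match in Table A.
full_rows = con.execute("""
    SELECT a.var1, a.var2, b.var3
    FROM table_a AS a LEFT JOIN table_b AS b ON a.var1 = b.var1
    UNION ALL
    SELECT b.var1, NULL, b.var3
    FROM table_b AS b
    WHERE NOT EXISTS (SELECT 1 FROM table_a AS a WHERE a.var1 = b.var1)
    ORDER BY 1
""").fetchall()

for name, rows in [("inner", inner_rows), ("left", left_rows), ("full", full_rows)]:
    print(name, rows)
```

Blank cells in the slide's result tables appear as None (SQL NULL) in the Python output.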
INNER JOIN using Comma

• SAS: MERGE BY, IN= left, IN= right, IF left AND right
• SQL: FROM (notice the comma), WHERE

SAS:

    PROC SORT DATA=table1d; BY var1;
    PROC SORT DATA=table2d; BY var1;

    DATA table3d;
      MERGE table1d ( IN= left )
            table2d ( IN= right );
      BY var1;
      IF left AND right;
    RUN;

PROC SQL:

    PROC SQL;
      CREATE TABLE table3e AS
      SELECT table1e.var1,
             table2e.var2,
             table2e.var3,
             table2e.numvar
      FROM table1e , table2e
      WHERE table1e.var1 = table2e.var1;
    QUIT;
INNER JOIN using ON clause

• SAS: MERGE BY, IN= left, IN= right, IF left AND right
• SQL: INNER JOIN, ON

SAS:

    PROC SORT DATA=table1d; BY var1;
    PROC SORT DATA=table2d; BY var1;

    DATA table3d;
      MERGE table1d ( IN= left )
            table2d ( IN= right );
      BY var1;
      IF left AND right;
    RUN;

PROC SQL (no comma between the tables):

    PROC SQL;
      CREATE TABLE table3g AS
      SELECT T1.var1, T2.var2,
             T2.var3, T2.numvar
      FROM table1e T1
      INNER JOIN
           table2e T2
      ON T1.var1 = T2.var1;
    QUIT;
LEFT JOIN

• SAS: IN=, IF left
• SQL: FROM, LEFT JOIN, ON

SAS:

    PROC SORT DATA=table1f; BY var1;
    PROC SORT DATA=table2f; BY var1;

    DATA table3f;
      MERGE table1f ( IN= left )
            table2f;
      BY var1;
      IF left;
    RUN;

PROC SQL (no comma between the tables):

    PROC SQL;
      CREATE TABLE table3g AS
      SELECT T1.var1, T2.var2,
             T2.var3, T2.numvar
      FROM table1g T1
      LEFT JOIN
           table2g T2
      ON T1.var1 = T2.var1;
    QUIT;
FULL JOIN

• SAS: MERGE, BY
• SQL: FULL JOIN, ON

SAS:

    PROC SORT DATA=table1h; BY var1;
    PROC SORT DATA=table2h; BY var1;

    DATA table3h;
      MERGE table1h
            table2h;
      BY var1;
    RUN;

PROC SQL:

    PROC SQL;
      CREATE TABLE table3i AS
      SELECT T1.var1, T2.var2,
             T2.var3, T2.numvar
      FROM table1i T1
      FULL JOIN
           table2i T2
      ON T1.var1 = T2.var1;
    QUIT;
APPENDING DATA

• SAS: SET
• SQL: UNION ALL

SAS:

    DATA table1j;
      SET table1j
          table2j;
    RUN;

PROC SQL:

    PROC SQL;
      CREATE TABLE table1k AS
      SELECT * FROM table1k
      UNION ALL
      SELECT * FROM table2k;
    QUIT;

• Duplicate records are allowed


APPENDING – NO DUPLICATES

• SAS: PROC APPEND, NODUPS
• SQL: UNION without ALL

SAS:

    PROC APPEND
      DATA=table2l
      OUT=table1l;
    RUN;

    PROC SORT DATA=table1l
      NODUPS;
      BY var1;
    RUN;

PROC SQL:

    PROC SQL;
      CREATE TABLE table1m AS
      SELECT * FROM table1m
      UNION
      SELECT * FROM table2m;
    QUIT;
IDENTIFYING DUPLICATES

• SAS: FIRST (dot) cntvar = 0, LAST (dot) cntvar > 1
• SQL: COUNT (*), HAVING cntvar > 1

SAS:

    PROC SORT DATA=table1; BY var1;

    DATA find_dups_datastep;
      SET table1;
      BY var1;
      IF FIRST.var1 THEN cntvar = 0;
      cntvar + 1;
      IF LAST.var1 AND cntvar > 1;
    RUN;

PROC SQL:

    PROC SQL;
      CREATE TABLE find_dups_sql AS
      SELECT var1,
             COUNT(*) AS cntvar
      FROM table1
      GROUP BY var1
      HAVING cntvar > 1;
    QUIT;
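
As a quick check of the GROUP BY / HAVING pattern, the sketch below runs the same query against the example data. It uses Python's built-in sqlite3 module as an assumed stand-in database. The HAVING clause repeats COUNT(*) rather than the cntvar alias, since referring to a select-list alias in HAVING is an extension that not every database accepts.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (var1 TEXT, numvar INT)")
con.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("SAS", 1), ("SAS", 2), ("Let", 4), ("Jim", 8)],
)

# Keep only the key values that occur more than once.
dups = con.execute("""
    SELECT var1, COUNT(*) AS cntvar
    FROM table1
    GROUP BY var1
    HAVING COUNT(*) > 1
""").fetchall()

print(dups)  # [('SAS', 2)]
```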
CASE vs IF

• SAS: IF, THEN, ELSE
• SQL: CASE, WHEN THEN ELSE, END

SAS:

    DATA table1n;
      SET table1a;
      IF itis = 'Snowing'
        THEN life = 'Good';
        ELSE life = 'Ok';
    RUN;

PROC SQL:

    PROC SQL;
      CREATE TABLE table2n AS
      SELECT *,
        CASE
          WHEN itis = 'Snowing'
          THEN 'Good'
          ELSE 'Ok'
        END AS life
      FROM table1b;
    QUIT;
SUB-QUERY Example

SAS:

    DATA t2 ;
      SET t1;
      WHERE life = 'Good';
      KEEP var1;
    RUN;

    PROC SORT DATA= t2 NODUPS ; BY var1;
    PROC SORT DATA= t3 ; BY var1;

    DATA t4;
      MERGE t2 ( IN= left )
            t3 ( IN= right) ;
      BY var1;
      IF left AND right;
    RUN;

PROC SQL:

    PROC SQL;
      CREATE TABLE t4 AS
      SELECT *
      FROM t3
      INNER JOIN
        (
          SELECT DISTINCT var1
          FROM t1
          WHERE life = 'Good'
        ) AS t2
      ON t2.var1 = t3.var1;
    QUIT;
MACRO VARIABLES

• SAS: DATA _NULL_, CALL SYMPUT
• SQL: PROC SQL NOPRINT, INTO : macvar

SAS:

    DATA _NULL_;
      SET table1a;
      CALL SYMPUT('macvar1', numvar);
    RUN;

PROC SQL:

    PROC SQL NOPRINT;
      SELECT numvar
      INTO : macvar2
      FROM table1b;
    QUIT;
DYNAMIC IN (LIST)

• SELECT
• Concatenate ||
• SEPARATED BY

PROC SQL NOPRINT;


SELECT "'" || cptcode || "'"
INTO : mylist
SEPARATED BY ','
FROM table2;
QUIT;

%PUT &mylist;
*RESULT = '21081','21082','21083','21084','21085';
Implicit Pass-thru Query using a DATA Step

• Using a DATA Step against a database is not recommended

LIBNAME enroll ORACLE SCHEMA=enroll PATH='hcis' ... ;


LIBNAME ref ORACLE SCHEMA=ref PATH='hcis' ... ;

DATA hcis1;
MERGE enroll.bene_smry (IN=left WHERE=(year='2004'))
ref.state_tbl (RENAME=(state_cd=bene_state));
BY bene_state;
KEEP bene_state state_name bene_cnt_tot;
IF FIRST.bene_state THEN bene_cnt_tot = 0;
bene_cnt_tot + bene_cnt;
IF left AND LAST.bene_state THEN OUTPUT;
RUN;
Implicit Pass-thru Query using SQL Procedure

LIBNAME enroll ORACLE SCHEMA=enroll PATH='hcis' ... ;


LIBNAME ref ORACLE SCHEMA=ref PATH='hcis' ... ;

PROC SQL;
CREATE TABLE hcis2 AS
SELECT T1.bene_state, T2.state_name,
SUM(T1.bene_cnt) as bene_cnt_tot
FROM enroll.bene_smry T1
LEFT JOIN
ref.state_tbl T2
ON T1.bene_state = T2.state_cd
WHERE T1.year ='2004'
GROUP by
T1.bene_state, T2.state_name;
QUIT;
Explicit Pass-thru Query using SQL Procedure

PROC SQL;
CONNECT TO ORACLE ( PATH='hcisprd.world' ... );
CREATE TABLE hcis3 AS
SELECT * FROM CONNECTION TO ORACLE
(
SELECT T1.bene_state, T2.state_name,
SUM(T1.bene_cnt) as bene_cnt_tot
FROM enroll.bene_smry T1,
ref.state_tbl T2
WHERE T2.state_cd = T1.bene_state (+)
AND T1.year = '2004'
GROUP BY T1.bene_state, T2.state_name
);
DISCONNECT FROM ORACLE;
QUIT;
Ordered Analytical Functions

• ANSI SQL: 2003 Window Functions


- AVG - RANK - REGR_SXY
- CORR - REGR_AVGX - REGR_SYY
- COUNT - REGR_AVGY - ROW_NUMBER
- COVAR_POP - REGR_COUNT - STDDEV_POP
- COVAR_SAMP - REGR_INTERCEPT - STDDEV_SAMP
- MAX - REGR_R2 - SUM
- MIN - REGR_SLOPE - VAR_POP
- PERCENT_RANK - REGR_SXX - VAR_SAMP

Ordered Analytical Functions (cont.)

• Partial Window Function Diagram

    AVG( value_expression ) OVER (
        [ PARTITION BY column_reference [, ...] ]
        [ ORDER BY value_expression [ ASC | DESC ] [, ...] ]
        [ ROWS BETWEEN { UNBOUNDED PRECEDING | value PRECEDING | CURRENT ROW | value FOLLOWING }
                   AND { UNBOUNDED FOLLOWING | value FOLLOWING | CURRENT ROW | value PRECEDING } ]
    )
OVER ( ) vs. “End Of File”

• OVER Clause - Window Function (this Window is the entire table)

    SELECT
      Bene_SK, Thru_Dt, Clm_Type, Paid_Amt,
      SUM(Paid_Amt) OVER ( ) AS Total_Paid
    FROM Mdcr_Clm

• SAS Equivalent

    /* KEEP= stops the last claim's values from overwriting row 1 in the next step */
    DATA Derived_Table ( KEEP= Total_Paid );
      SET Mdcr_Clm END=EndOfFile;
      Total_Paid + Paid_Amt;
      IF EndOfFile THEN OUTPUT;
    RUN;

    DATA Mdcr_Final;
      SET Mdcr_Clm;
      IF _N_=1 THEN SET Derived_Table;
    RUN;

Result:

Bene ID  Thru Date  Clm Type  Paid Amt  Total Paid
1        01-Jan-11  IP        $500      $2300
1        05-Jan-11  SNF       $300      $2300
1        30-Jan-11  PHY       $175      $2300
2        02-Feb-11  IP        $750      $2300
2        14-Feb-11  SNF       $300      $2300
2        21-Feb-11  PHY       $125      $2300
2        17-Mar-11  HHA       $150      $2300
OVER ( ) vs. PROC SUMMARY

• OVER Clause - Window Function (this Window is the entire table)

    SELECT
      Bene_SK, Thru_Dt, Clm_Type, Paid_Amt,
      SUM(Paid_Amt) OVER ( ) AS Total_Paid
    FROM Mdcr_Clm

• SAS Equivalent

    PROC SUMMARY DATA=Mdcr_Clm;
      VAR Paid_Amt;
      OUTPUT OUT=Derived_Table
        SUM=Total_Paid;
    RUN;

    PROC SQL;
      CREATE TABLE Mdcr_Final2 AS
      SELECT t1.*, t2.Total_Paid
      FROM Mdcr_Clm AS t1,
           Derived_Table AS t2;
    QUIT;

Result:

Bene ID  Thru Date  Clm Type  Paid Amt  Total Paid
1        01-Jan-11  IP        $500      $2300
1        05-Jan-11  SNF       $300      $2300
1        30-Jan-11  PHY       $175      $2300
2        02-Feb-11  IP        $750      $2300
2        14-Feb-11  SNF       $300      $2300
2        21-Feb-11  PHY       $125      $2300
2        17-Mar-11  HHA       $150      $2300
OVER ( ) vs. Derived Table

• OVER Clause - Window Function (this Window is the entire table)

    SELECT
      Bene_SK, Thru_Dt, Clm_Type, Paid_Amt,
      SUM(Paid_Amt) OVER ( ) AS Total_Paid
    FROM Mdcr_Clm

• Derived Table Equivalent

    PROC SQL;
      CREATE TABLE Mdcr_Final3 AS
      SELECT t1.*, q1.Total_Paid
      FROM Mdcr_Clm AS t1,
        (
          SELECT SUM(Paid_Amt) AS Total_Paid
          FROM Mdcr_Clm AS t2
        ) AS q1 ;   /* derived table; the alias must match q1.Total_Paid above */
    QUIT;

Result:

Bene ID  Thru Date  Clm Type  Paid Amt  Total Paid
1        01-Jan-11  IP        $500      $2300
1        05-Jan-11  SNF       $300      $2300
1        30-Jan-11  PHY       $175      $2300
2        02-Feb-11  IP        $750      $2300
2        14-Feb-11  SNF       $300      $2300
2        21-Feb-11  PHY       $125      $2300
2        17-Mar-11  HHA       $150      $2300
BY-Group Example by Claim Type

• Similar to Partition By

    PROC SORT DATA=Mdcr_Clm;
      BY Clm_Type;
    RUN;

    /* KEEP= stops the last claim of each type from overwriting the first one in the MERGE */
    DATA Derived_Table ( KEEP= Clm_Type Total_Paid );
      SET Mdcr_Clm;
      BY Clm_Type;
      IF FIRST.Clm_Type THEN Total_Paid=0;
      Total_Paid + Paid_Amt;
      IF LAST.Clm_Type THEN OUTPUT;
    RUN;

    DATA Final_Table;
      MERGE Mdcr_Clm
            Derived_Table;
      BY Clm_Type;
    RUN;

FIRST (dot) and LAST (dot) values:

Example 1                  Example 2
Clm Type  First  Last      Clm Type  First  Last
HHA       1      1         DME       1      0
IP        1      0         DME       0      0
IP        0      1         DME       0      0
PHY       1      0         DME       0      1
PHY       0      1         HOS       1      0
SNF       1      0         HOS       0      0
SNF       0      1         HOS       0      1

Result:

Bene ID  Thru Date  Clm Type  Paid Amt  Total Paid
2        17-Mar-11  HHA       $150      $150
1        01-Jan-11  IP        $500      $1,250
2        02-Feb-11  IP        $750      $1,250
1        30-Jan-11  PHY       $175      $300
2        21-Feb-11  PHY       $125      $300
1        05-Jan-11  SNF       $300      $600
2        14-Feb-11  SNF       $300      $600
PARTITION BY Phrase by Claim Type

• Similar to By-Group Processing
• This Window is by “Clm Type”

    SELECT
      Bene_SK, Thru_Dt, Clm_Type, Paid_Amt,
      SUM(Paid_Amt)
        OVER ( PARTITION BY Clm_Type )
        AS Total_Paid
    FROM Mdcr_Clm

Result (“Total Paid” reflects each Window):

Bene ID  Thru Date  Clm Type  Paid Amt  Total Paid
2        17-Mar-11  HHA       $150      $150
1        01-Jan-11  IP        $500      $1,250
2        02-Feb-11  IP        $750      $1,250
1        30-Jan-11  PHY       $175      $300
2        21-Feb-11  PHY       $125      $300
1        05-Jan-11  SNF       $300      $600
2        14-Feb-11  SNF       $300      $600
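
The PARTITION BY query can be tried outside SAS with a short script. This is a minimal sketch that uses Python's built-in sqlite3 module as an assumed stand-in for a database with window functions. The rows are transcribed from the slide's Mdcr_Clm table, dates are stored as ISO text, and the helper name totals_by_claim_type is hypothetical.

```python
import sqlite3

# Sample rows transcribed from the slide: (Bene_SK, Thru_Dt, Clm_Type, Paid_Amt).
ROWS = [
    (1, "2011-01-01", "IP", 500),
    (1, "2011-01-05", "SNF", 300),
    (1, "2011-01-30", "PHY", 175),
    (2, "2011-02-02", "IP", 750),
    (2, "2011-02-14", "SNF", 300),
    (2, "2011-02-21", "PHY", 125),
    (2, "2011-03-17", "HHA", 150),
]


def totals_by_claim_type():
    """Return every claim row with the total paid for its claim type."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Mdcr_Clm (Bene_SK INT, Thru_Dt TEXT, Clm_Type TEXT, Paid_Amt INT)"
    )
    con.executemany("INSERT INTO Mdcr_Clm VALUES (?, ?, ?, ?)", ROWS)
    sql = """
        SELECT Bene_SK, Thru_Dt, Clm_Type, Paid_Amt,
               SUM(Paid_Amt) OVER (PARTITION BY Clm_Type) AS Total_Paid
        FROM Mdcr_Clm
        ORDER BY Clm_Type, Thru_Dt
    """
    return con.execute(sql).fetchall()


for row in totals_by_claim_type():
    print(row)
```

Unlike GROUP BY, the window function keeps all seven detail rows and repeats the per-type total on each of them, which is what the two-step DATA step and MERGE approach produces in SAS.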
Without ORDER BY Phrase

• Partition by Bene_SK
• Thru_Dt out of order
• This Window is by “Bene ID”

    SELECT
      Bene_SK, Thru_Dt, Clm_Type, Paid_Amt,
      SUM(Paid_Amt)
        OVER ( PARTITION BY Bene_SK )
        AS Total_Paid
    FROM Mdcr_Clm

Result (in “Paid Amt” order by chance):

Bene ID  Thru Date  Clm Type  Paid Amt  Total Paid
1        30-Jan-11  PHY       $175      $975
1        05-Jan-11  SNF       $300      $975
1        01-Jan-11  IP        $500      $975
2        21-Feb-11  PHY       $125      $1325
2        17-Mar-11  HHA       $150      $1325
2        14-Feb-11  SNF       $300      $1325
2        02-Feb-11  IP        $750      $1325
With ORDER BY Phrase

• Partition by Bene ID
• Order by Thru_Dt
• Window is now Ordered By Thru Dt

    SELECT
      Bene_SK, Thru_Dt, Clm_Type, Paid_Amt,
      SUM(Paid_Amt)
        OVER (
          PARTITION BY Bene_SK
          ORDER BY Thru_Dt )
        AS Total_Paid
    FROM Mdcr_Clm

Result:

Bene ID  Thru Date  Clm Type  Paid Amt  Total Paid
1        01-Jan-11  IP        $500      $975
1        05-Jan-11  SNF       $300      $975
1        30-Jan-11  PHY       $175      $975
2        02-Feb-11  IP        $750      $1325
2        14-Feb-11  SNF       $300      $1325
2        21-Feb-11  PHY       $125      $1325
2        17-Mar-11  HHA       $150      $1325
ROWS Phrase

• ROWS
  – sets the starting point of the aggregation group, counted from the first record in the partition
  – the aggregation group always ends at the current row
• ROWS BETWEEN
  – sets both the start and the end of the aggregation group
  – defines a set of rows relative to the current row
  – the group start must precede the row specified by the group end
• UNBOUNDED PRECEDING – entire partition preceding current row
• UNBOUNDED FOLLOWING – entire partition following current row
• CURRENT ROW – start or end of aggregation group is the current row
• value PRECEDING – number of rows preceding current row
• value FOLLOWING – number of rows following current row
Cumulative Summary

• OVER clause contains only the ROWS BETWEEN phrase
• UNBOUNDED PRECEDING & CURRENT ROW
• The Window is the entire table
• Creates a cumulative summary

    SELECT
      Bene_SK, Thru_Dt, Clm_Type, Paid_Amt,
      SUM(Paid_Amt)
        OVER (
          ROWS BETWEEN UNBOUNDED PRECEDING
          AND CURRENT ROW )
        AS Total_Paid
    FROM Mdcr_Clm

Result:

Bene ID  Thru Date  Clm Type  Paid Amt  Total Paid
1        30-Jan-11  PHY       $175      $175
1        05-Jan-11  SNF       $300      $475
1        01-Jan-11  IP        $500      $975
2        21-Feb-11  PHY       $125      $1,100
2        17-Mar-11  HHA       $150      $1,250
2        14-Feb-11  SNF       $300      $1,550
2        02-Feb-11  IP        $750      $2,300
Cumulative Summary by Bene

• This Window is by Bene ID
• Ordered by Thru Dt
• Summary restarts for each Bene ID

    SELECT
      Bene_SK, Thru_Dt, Clm_Type, Paid_Amt,
      SUM(Paid_Amt)
        OVER (
          PARTITION BY Bene_SK
          ORDER BY Thru_Dt
          ROWS BETWEEN UNBOUNDED PRECEDING
          AND CURRENT ROW )
        AS Total_Paid
    FROM Mdcr_Clm

Cumulative Summary by Bene ID:

Bene ID  Thru Date  Clm Type  Paid Amt  Total Paid
1        01-Jan-11  IP        $500      $500
1        05-Jan-11  SNF       $300      $800
1        30-Jan-11  PHY       $175      $975
2        02-Feb-11  IP        $750      $750
2        14-Feb-11  SNF       $300      $1050
2        21-Feb-11  PHY       $125      $1175
2        17-Mar-11  HHA       $150      $1325
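
The effect of the ROWS phrase can be compared directly by running the same SUM with different frames. This is a minimal sketch that uses Python's built-in sqlite3 module as an assumed stand-in database; the helper total_paid is hypothetical. One point to watch: when ORDER BY is present and no ROWS phrase is given, the SQL standard's default frame runs from the start of the partition to the current row, and SQLite follows that default. The whole-partition totals on the "With ORDER BY Phrase" slide therefore correspond to a database whose default frame is the entire partition, so the frame is spelled out explicitly below to reproduce them.

```python
import sqlite3

# Sample rows transcribed from the slide: (Bene_SK, Thru_Dt, Clm_Type, Paid_Amt).
ROWS = [
    (1, "2011-01-01", "IP", 500),
    (1, "2011-01-05", "SNF", 300),
    (1, "2011-01-30", "PHY", 175),
    (2, "2011-02-02", "IP", 750),
    (2, "2011-02-14", "SNF", 300),
    (2, "2011-02-21", "PHY", 125),
    (2, "2011-03-17", "HHA", 150),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Mdcr_Clm (Bene_SK INT, Thru_Dt TEXT, Clm_Type TEXT, Paid_Amt INT)"
)
con.executemany("INSERT INTO Mdcr_Clm VALUES (?, ?, ?, ?)", ROWS)


def total_paid(over_clause):
    """Return the Total_Paid column for the given OVER (...) body, in Bene_SK, Thru_Dt order."""
    sql = f"""
        SELECT SUM(Paid_Amt) OVER ({over_clause}) AS Total_Paid
        FROM Mdcr_Clm
        ORDER BY Bene_SK, Thru_Dt
    """
    return [row[0] for row in con.execute(sql)]


base = "PARTITION BY Bene_SK ORDER BY Thru_Dt"

# No ROWS phrase: SQLite applies the standard default frame (a running total).
default_frame = total_paid(base)

# Explicit whole-partition frame: matches the "With ORDER BY Phrase" slide.
whole_partition = total_paid(
    base + " ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"
)

# Explicit cumulative frame: matches the "Cumulative Summary by Bene" slide.
running = total_paid(base + " ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW")

print(default_frame)
print(whole_partition)
print(running)
```

Writing the ROWS phrase explicitly, as the cumulative slides do, avoids depending on which default a given database applies.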
Previous Row Example

• Obtain previous service date
• New episode after 5 days
• Create Episode Indicator

    COALESCE(
      MAX(Svc_Dt)
        OVER (
          PARTITION BY BENE_SK
          ORDER BY Svc_Dt
          ROWS BETWEEN 1 PRECEDING AND
                       1 PRECEDING ),
      DATE '1980-01-01')
    AS Prev_Svc_Dt,

    Svc_Dt - Prev_Svc_Dt AS Days_After,

    CASE
      WHEN Days_After > 5
      THEN 1 ELSE 0
    END AS Begin_Episode

Result:

Bene SK  Svc Dt     Prev Svc Dt  Days After  Begin Episode
1        11Apr2016  01Jan1980    13250       1
1        12Apr2016  11Apr2016    1           0
1        13Apr2016  12Apr2016    1           0
1        14Apr2016  13Apr2016    1           0
1        15Apr2016  14Apr2016    1           0
1        18Apr2016  15Apr2016    3           0
1        19Apr2016  18Apr2016    1           0
1        02May2016  22Apr2016    10          1
1        04May2016  02May2016    2           0
1        06May2016  04May2016    2           0
1        09May2016  06May2016    3           0
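
The previous-row technique can be sketched end to end as follows, again using Python's built-in sqlite3 module as an assumed stand-in database. Several details are assumptions for illustration: the table name Clm is hypothetical, dates are ISO text and differenced with SQLite's julianday function, and nested subqueries replace the slide's reuse of the Days_After alias within one SELECT, which SQLite does not allow. The service dates are the ones listed in the slide's Svc Dt column. The slide's 02May2016 row shows a previous date of 22Apr2016 that is not among those listed dates, so this sketch reports 13 days for that row rather than 10.

```python
import sqlite3

# Service dates taken from the slide's "Svc Dt" column, as ISO text.
SVC_DATES = [
    "2016-04-11", "2016-04-12", "2016-04-13", "2016-04-14", "2016-04-15",
    "2016-04-18", "2016-04-19", "2016-05-02", "2016-05-04", "2016-05-06",
    "2016-05-09",
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Clm (Bene_SK INT, Svc_Dt TEXT)")
con.executemany("INSERT INTO Clm VALUES (1, ?)", [(d,) for d in SVC_DATES])

episodes = con.execute("""
    SELECT Bene_SK, Svc_Dt, Prev_Svc_Dt, Days_After,
           CASE WHEN Days_After > 5 THEN 1 ELSE 0 END AS Begin_Episode
    FROM (
        SELECT Bene_SK, Svc_Dt, Prev_Svc_Dt,
               CAST(julianday(Svc_Dt) - julianday(Prev_Svc_Dt) AS INTEGER)
                   AS Days_After
        FROM (
            -- A one-row frame holding only the preceding row; it is empty
            -- (NULL) for the first row, which COALESCE replaces with 1980-01-01.
            SELECT Bene_SK, Svc_Dt,
                   COALESCE(
                       MAX(Svc_Dt) OVER (
                           PARTITION BY Bene_SK
                           ORDER BY Svc_Dt
                           ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING),
                       '1980-01-01') AS Prev_Svc_Dt
            FROM Clm
        )
    )
    ORDER BY Bene_SK, Svc_Dt
""").fetchall()

for row in episodes:
    print(row)
```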
Keep Last Record in a Window

• ROW_NUMBER used to keep last record
• Remove the DESC keyword to keep first record

    select
       PRVDR_OSCAR_NUM
      ,PRVDR_TOT_PASS_THRU_AMT
    from V2_MDCR_PRVDR_PSF
    qualify
      row_number()
        over (
          partition by PRVDR_OSCAR_NUM
          order by PRVDR_OSCAR_NUM, PRVDR_PSF_EFCTV_DT desc ) = 1

Source data:

PRVDR OSCAR NUM  PRVDR PSF EFCTV DT  PRVDR TOT PASS THRU AMT
123456           01Oct2016           991.06
123456           30Sep2016           991.06
123456           20Sep2016           991.06
123456           16Aug2016           764.43
123456           15Jun2016           764.43
123456           15May2016           668.30
123456           01Oct2015           668.30
123456           25Sep2015           668.30
123456           01Jun2015           750.25
HAVING Clause Example

• Keep Benes with 12 months of coverage

    SELECT
      Bene_SK,
      COUNT(*) AS Mth_Cnt
    FROM Bene_Fact
    GROUP BY Bene_SK
    HAVING COUNT(*) = 12

Cumulative Count by “Bene ID”:

Bene ID  Elig Mth    Mth Cnt
1        Jan-2011..  1
1        Feb-2011..  2
1        Mar-2011..  3
1        Apr-2011..  4
1        May-2011..  5
1        Jun-2011..  6
1        Jul-2011..  7
1        Aug-2011..  8
1        Sep-2011..  9
1        Oct-2011..  10
1        Nov-2011..  11
1        Dec-2011..  12
2        Jan-2011..  1
2        Feb-2011..  2

Query Results:

Bene ID  Mth Cnt
1        12
Snowbird Example

• Determine state where bene lived the longest
• Keep beneficiary with 12 months of coverage
• Bene number 1 lived in FL 4 months and in NY 8 months

Note the cumulative counts:

Bene ID  State Cd  Elig Mth    Mth Cnt  State Cnt
1        NY        Jan-2011..  1        1
1        NY        Feb-2011..  2        2
1        NY        Mar-2011..  3        3
1        NY        Apr-2011..  4        4
1        NY        May-2011..  5        5
1        NY        Jun-2011..  6        6
1        NY        Jul-2011..  7        7
1        NY        Aug-2011..  8        8
1        FL        Sep-2011..  9        1
1        FL        Oct-2011..  10       2
1        FL        Nov-2011..  11       3
1        FL        Dec-2011..  12       4
2        CA        Jan-2011..  1        1
2        CA        Feb-2011..  2        2

Desired Results:

Bene ID  State Cd  Mth Cnt
1        NY        12
SAS By-Group Example

    PROC SORT DATA=work.Sample_Data;
      BY Bene_SK State_Cd;
    RUN;

    DATA work.Twelve_Months (
           KEEP= Bene_SK Mth_Cnt )
         work.State_Counts (
           /* Elig_Mth is kept because the next step sorts State_Counts on it */
           KEEP= Bene_SK State_Cd Elig_Mth State_Cnt );
      SET work.Sample_Data;
      BY Bene_SK State_Cd;
      IF FIRST.Bene_SK THEN Mth_Cnt = 0;
      Mth_Cnt + 1;
      IF FIRST.State_Cd THEN State_Cnt = 0;
      State_Cnt + 1;
      IF LAST.Bene_SK AND Mth_Cnt = 12
        THEN OUTPUT work.Twelve_Months;
      IF LAST.State_Cd
        THEN OUTPUT work.State_Counts;
    RUN;

Twelve Months:

Bene ID  Mth Cnt
1        12

State Counts:

Bene ID  State Cd  State Cnt
1        FL        4
1        NY        8
2        CA        2
Create Final Output with SAS

    PROC SORT DATA=work.State_Counts;
      /* State_Cnt ahead of Elig_Mth: LAST.BENE_SK is then the largest count, */
      /* with the latest month deciding a tie                                 */
      BY BENE_SK State_Cnt Elig_Mth;
    RUN;

    DATA work.Max_State;
      SET work.State_Counts;
      BY BENE_SK State_Cnt Elig_Mth;
      IF LAST.BENE_SK;
    RUN;

    DATA work.Final;
      MERGE
        work.Max_State
        work.Twelve_Months (IN=Keep_It);
      BY BENE_SK;
      KEEP BENE_SK State_Cd Mth_Cnt;
      IF Keep_It THEN OUTPUT;
    RUN;

State Counts:

Bene ID  State Cd  Elig Mth   State Cnt
1        FL        01-OCT...  4
1        NY        01-DEC...  8
2        CA        01-FEB...  2

Max State:

Bene ID  State Cd  State Cnt
1        NY        8
2        CA        2

Twelve Months:

Bene ID  Mth Cnt
1        12

Final:

Bene ID  State Cd  Mth Cnt
1        NY        12
SQL Qualify Clause Example

• Keep “Bene ID” with 12 months of coverage
• Mth Cnt: Window by Bene ID
• State Cnt: Window by Bene ID & State Cd

    SELECT
      Bene_SK, State_Cd, Elig_Mth,
      COUNT(*) OVER ( PARTITION BY Bene_SK )
        AS Mth_Cnt,
      COUNT(*) OVER ( PARTITION BY Bene_SK, State_Cd )
        AS State_Cnt
    FROM Bene_Fact
    QUALIFY Mth_Cnt = 12

Before QUALIFY is applied (Bene 2 will be eliminated):

Bene SK  State Cd  Elig Mth    Mth Cnt  State Cnt
1        NY        Jan-2011..  12       8
1        NY        Feb-2011..  12       8
1        NY        Mar-2011..  12       8
1        NY        Apr-2011..  12       8
1        NY        May-2011..  12       8
1        NY        Jun-2011..  12       8
1        NY        Jul-2011..  12       8
1        NY        Aug-2011..  12       8
1        FL        Sep-2011..  12       4
1        FL        Oct-2011..  12       4
1        FL        Nov-2011..  12       4
1        FL        Dec-2011..  12       4
2        CA        Jan-2011..  2        2
2        CA        Feb-2011..  2        2


QUALIFY Using RANK Function

• Determine state where bene lived the longest

    SELECT Bene_SK, State_Cd, State_Cnt
    FROM ( ... QUALIFY ... = 12 ) AS Derived_Table
    GROUP BY Bene_SK, State_Cd, State_Cnt
    QUALIFY
      RANK()
        OVER (
          PARTITION BY Bene_SK
          ORDER BY State_cnt DESC ) = 1

Rank Value of 1 is Desired:

Bene SK  State Cd  Mth Cnt  State Cnt  Rank Value
1        NY        12       8          1
1        FL        12       4          2

Result:

Bene Id  State Cd  State Cnt
1        NY        8

• RANK is used to identify state with largest count
• Will not work if there is a tie
QUALIFY Using MAX Function

• Determine state where bene lived the longest

    SELECT Bene_SK, State_Cd, State_Cnt
    FROM ( ... QUALIFY ... = 12 ) AS Derived_Table
    GROUP BY Bene_SK, State_Cd, State_Cnt
    QUALIFY
      MAX(State_Cnt)
        OVER (
          PARTITION BY Bene_SK ) = State_Cnt

Max Value of State is Desired:

Bene SK  State Cd  Mth Cnt  State Cnt
1        NY        12       8
1        FL        12       4

Result:

Bene Id  State Cd  State Cnt
1        NY        8

• MAX is used to identify state with largest count
• Will not work if there is a tie
Example of a Tie

• Bene lived in VA and MD for same number of months
• Spec is to keep state where Bene lived last

Bene SK  State Cd  Elig Mth    Mth Cnt  State Cnt
1        NY        Jan-2011..  1        1
1        NY        Feb-2011..  2        2
1        NY        Mar-2011..  3        3
1        NY        Apr-2011..  4        4
1        NY        May-2011..  5        5
1        NY        Jun-2011..  6        6
1        NY        Jul-2011..  7        7
1        NY        Aug-2011..  8        8
1        FL        Sep-2011..  9        1
1        FL        Oct-2011..  10       2
1        FL        Nov-2011..  11       3
1        FL        Dec-2011..  12       4
2        CA        Jan-2011..  1        1
2        CA        Feb-2011..  2        2
3        MD        Jan-2011..  1        1
3        MD        Feb-2011..  2        2
3        MD        Mar-2011..  3        3
3        MD        Apr-2011..  4        4
3        MD        May-2011..  5        5
3        MD        Jun-2011..  6        6
3        VA        Jul-2011..  7        1
3        VA        Aug-2011..  8        2
3        VA        Sep-2011..  9        3
3        VA        Oct-2011..  10       4
3        VA        Nov-2011..  11       5
3        VA        Dec-2011..  12       6
QUALIFY Using ROW_NUMBER

• Determine state where bene lived the longest
• Or, if there is a tie, the state where the bene lived last

    SELECT Bene_SK, State_Cd, State_Cnt
    FROM ( ... QUALIFY ... = 12 ) AS Derived_Table
    GROUP BY Bene_SK, State_Cd, Elig_Mth, State_Cnt
    QUALIFY
      ROW_NUMBER()
        OVER (
          PARTITION BY Bene_SK
          ORDER BY State_cnt DESC,
                   Elig_Mth DESC ) = 1

Row Number of 1 is Desired:

Bene SK  State Cd  Elig Mth  Mth Cnt  State Cnt  Row Nbr
1        NY        Aug-2011  12       8          1
1        FL        Dec-2011  12       4          2
3        VA        Dec-2011  12       6          1
3        MD        Jun-2011  12       6          2

Result:

Bene Id  State Cd  State Cnt
1        NY        8
3        VA        6

• ROW_NUMBER is used to identify the state with the largest count AND the last month of residence for a tie.
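
The whole snowbird pipeline, including the tie-break, can be sketched in one query. This example uses Python's built-in sqlite3 module as an assumed stand-in database. SQLite has no QUALIFY clause, so each QUALIFY is emulated by filtering in an outer query, which is a swapped-in technique. The helper months, the CTE names, and the ISO-text Elig_Mth values are assumptions for illustration; the rows follow the "Example of a Tie" slide.

```python
import sqlite3


def months(bene, state, first, last):
    """Build one Bene_Fact row per eligibility month of 2011 (hypothetical helper)."""
    return [(bene, state, f"2011-{m:02d}") for m in range(first, last + 1)]


rows = (
    months(1, "NY", 1, 8) + months(1, "FL", 9, 12)    # bene 1: NY 8 months, FL 4
    + months(2, "CA", 1, 2)                           # bene 2: only 2 months
    + months(3, "MD", 1, 6) + months(3, "VA", 7, 12)  # bene 3: 6-6 tie, VA last
)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Bene_Fact (Bene_SK INT, State_Cd TEXT, Elig_Mth TEXT)")
con.executemany("INSERT INTO Bene_Fact VALUES (?, ?, ?)", rows)

longest_state = con.execute("""
    WITH counted AS (
        SELECT Bene_SK, State_Cd, Elig_Mth,
               COUNT(*) OVER (PARTITION BY Bene_SK) AS Mth_Cnt,
               COUNT(*) OVER (PARTITION BY Bene_SK, State_Cd) AS State_Cnt
        FROM Bene_Fact
    ),
    ranked AS (
        SELECT Bene_SK, State_Cd, State_Cnt,
               ROW_NUMBER() OVER (
                   PARTITION BY Bene_SK
                   ORDER BY State_Cnt DESC, Elig_Mth DESC) AS rn
        FROM counted
        WHERE Mth_Cnt = 12          -- stands in for QUALIFY Mth_Cnt = 12
    )
    SELECT Bene_SK, State_Cd, State_Cnt
    FROM ranked
    WHERE rn = 1                    -- stands in for QUALIFY ROW_NUMBER() ... = 1
    ORDER BY Bene_SK
""").fetchall()

print(longest_state)  # [(1, 'NY', 8), (3, 'VA', 6)]
```

Bene 2 drops out at the 12-month filter, and bene 3's tie between MD and VA resolves to VA because its latest eligibility month is later.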
