Documente Academic
Documente Profesional
Documente Cultură
PREDICTOR VARIABLES
USING SAS® ENTERPRISE GUIDE®
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
AGENDA HOW TO SELECT THE BEST PREDICTOR VARIABLES
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
WHAT? VARIABLE SELECTION OR VARIABLE REDUCTION
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
WHY? VARIABLE SELECTION
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
WHY VARIABLE
SELECTION?
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
BEFORE VARIABLE
THINGS TO CONSIDER
SELECTION
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
BEFORE VARIABLE
TASKS TO COMPLETE
SELECTION
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
VARIABLE
CONCEPTS
SELECTION
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
VARIABLE SELECTION CONCEPTS
Variable
Selection Cluster Best
Input based on Scores Variables
variables correlation based on based on
and with Variable Variable Principal
TARGET: TARGET: Clustering: Clustering: Components:
VAR01 VAR01 VAR01 VAR01
VAR02 VAR01 VAR02 VAR02
VAR02 VAR02
CLUS1 VAR02
VAR03 VAR03 VAR03
VAR04 VAR03 VAR04 VAR04
VAR04 VAR04
VAR05
CLUS2 VAR05 VAR05 … VAR05
PC01 PC10
VAR06 VAR05 VAR06 VAR06
VAR07 VAR06 VAR07 VAR07
VAR07 VAR07
VAR08 VAR08 VAR08
VAR09 VAR08
CLUS3 VAR09 VAR09
VAR09 VAR09 VAR09
VAR10 VAR10 VAR10
TARGET VAR10
TARGET
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
VARIABLE
ALL POSSIBLE VS AUTOMATIC VS
SELECTION WHICH ONE IS BEST?
CRITERION BASED
METHODS
• All possible
• Best subset selection methods
• Automatic
• Stepwise, Backward, Forward
• Criterion Based
• Variable Ranking, Correlations
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
VARIABLE
SELECTION MORE DETAIL
METHODS
• Stepwise selection considers adding and deleting predictors at each step of the process
• Forward selection begins with a simple regression model and adds, one at a time. However,
once a predictor is in the equation, it is never deleted.
• Backward selection begins with the multiple regression model including all possible
predictors and deletes, one at a time. Once a variable is deleted, it is never reconsidered for
inclusion.
• Best subsets estimates one regression model for all possible combinations of the predictor
variables and chooses the best model among them.
When five predictors are available for estimation, there are: 5 simple
regression models, 10 different two-predictor models, 10 different
three-predictor models, 5 different four-predictor models, and 1 five-
predictor model, totaling 31 regressions. When ten predictors are
available, there are 1,023 possible subsets.
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
DATA DONOR_RAW_DATA
• Y=TARGET_B
• N = 19,372
• Variables = 50 (47
Inputs)
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® Enterprise Guide®
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
METHODS AVAILABLE
GUIDE®
• Regression
• Variable Screening
• Correlation
• Variable Clustering
• Principle Components
• Weight of Evidence (WOE) and Information Value (IV)
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
FIRST THINGS FIRST
GUIDE®
Categorical
Set Gender to Unknown, SES to
Level 5 (Unknown), Urbanity to M
(missing)
Continuous
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
LOGISTIC REGRESSION
GUIDE®
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
LOGISTIC REGRESSION
GUIDE®
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
LOGISTIC REGRESSION
GUIDE®
• Selection=None
• Selection=Forward
• Selection=Backward
• Selection=Stepwise
• Selection=Score
Ones in BLUE available in SAS
Enterprise Guide
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
LOGISTIC REGRESSION
GUIDE®
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE VARIABLE SELECTION METHODS IN SAS/STAT
GUIDE® PROC LOGISTIC & PROC REG
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
LOGISTIC REGRESSION – USE SELECTION=SCORE
GUIDE®
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
VARIABLE SELECTION METHODS IN SAS/STAT PROC LOGISTIC
GUIDE®
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
VARIABLE SELECTION METHODS IN SAS/STAT PROC LOGISTIC
GUIDE®
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
VARIABLE SELECTION METHODS IN SAS/STAT PROC LOGISTIC
GUIDE®
SELECTION = SCORE BEST=1 (continued)
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
VARIABLE SELECTION METHODS IN SAS/STAT PROC REG
GUIDE®
• Forward
• Backward
• Stepwise
• Lasso
• Lars
• MAXR
• MINR
• RSQUARE
• CP
• ADJRSQ
• SCORE
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
VARIABLE SCREENING - CORRELATIONS
GUIDE®
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
Only for numeric variables
SAS® ENTERPRISE
VARIABLE SCREENING - CORRELATIONS
GUIDE®
3 Correlation Types
• Pearson
• Spearman
• Hoeffding’s D
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
VARIABLE SCREENING - CORRELATIONS
GUIDE®
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
LIST HIGH CORRELATION VALUES
GUIDE®
Stack Filter to
Sort to
Run Columns NOT
DEDUP
Correlation (keep BETWEEN
other ½ of
Task _TYPE_= -.5 and .5
matrix
‘CORR’ and NOT=1
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
LIST HIGH CORRELATION VALUES
GUIDE®
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
VARIABLE CLUSTERING
GUIDE®
• Finds groups of variables that are as correlated as possible with each other
• And as uncorrelated as possible with other variables
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
VARIABLE CLUSTERING
GUIDE®
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
VARIABLE CLUSTERING
GUIDE®
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
PRINCIPAL COMPONENTS
GUIDE®
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
PRINCIPAL COMPONENTS
GUIDE®
5%
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
PRINCIPAL COMPONENTS
GUIDE®
4. Interpretability criterion
1. Are there at least three variables (items) with significant loadings on each retained component?
2. Do the variables that load on a given component share the same conceptual meaning?
3. Do the variables that load on different components seem to be measuring different constructs?
4. Does the rotated factor pattern demonstrate “simple structure?”
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
WEIGHT OF EVIDENCE (WOE) AND INFORMATION VALUE (IV)
GUIDE®
Weight of Evidence (WoE) is a measure of how much an attribute in the data is related to the outcome.
𝐼𝑉 = (𝑅𝑒𝑙𝑎𝑡𝑖𝑣𝑒 𝐹𝑟𝑒𝑞 𝑜𝑓 𝑁𝑜𝑡 𝑑𝑖𝑠𝑐𝑜𝑛𝑡𝑖𝑛𝑢𝑖𝑛𝑔 𝑆𝑡𝑎𝑡𝑖𝑛𝑖 - 𝑅𝑒𝑙𝑎𝑡𝑖𝑣𝑒 𝐹𝑟𝑒𝑞 𝑜𝑓 𝐷𝑖𝑠𝑐𝑜𝑛𝑡𝑖𝑛𝑢𝑖𝑛𝑔 𝑆𝑡𝑎𝑡𝑖𝑛𝑠𝑖 ) * WOE
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
WEIGHT OF EVIDENCE (WOE) AND INFORMATION VALUE (IV)
GUIDE®
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
Available in SAS 9.4
SAS® ENTERPRISE
WEIGHT OF EVIDENCE (WOE) & INFORMATION VALUE
GUIDE®
Information Value
0.02-0.1 Weak
0.1-0.3 Medium
0.3-0.5 Strong
Variable Reduction in SAS by Using Weight of Evidence and Information Value >0.5 Suspicious
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
Available in SAS 9.4
SAS® Enterprise Miner™
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
METHODS AVAILABLE
MINER™
• Regression
• Decision Trees
• Random Forest
• Variable Selection
• Stat Explore
• Variable Selection
• LARS/LASSO
• Variable Clustering
• Principle Components
• Weight of Evidence (WOE)
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
REGRESSION
MINER™
• Full
• Stepwise
• Backward
• Forward
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
DECISION TREE
MINER™
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
RANDOM FOREST
MINER™
Available in EM 13.1
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
STATEXPLORE
MINER™
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
VARIABLE SELECTION
MINER™
• R-square
• Chi-square
• Both
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
LARS
MINER™
• LASSO
• LAR
• Adaptive LASSO Adjusted LASSO
• None
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
VARIABLE CLUSTERING
MINER™
Cluster components
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
PRINCIPLE COMPONENTS
MINER™
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
Metadata node for combining
MINER™
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
SAS® ENTERPRISE
Metadata node for combining
MINER™
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
Resources for Variable Selection
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
RESOURCES ADDITIONAL READING
• Graphs Useful For Variable Selection in Predictive Modeling Predictive Models Based on Reduced Input
Space That Uses Rejected Variables
• Variable Reduction in SAS by Using Weight of Evidence and Information Value
• Variable Reduction for Modeling using PROC VARCLUS
• Applications of the GLMSELECT Procedure for Megamodel Selection
• Variable Selection in Data Mining: Building a Predictive Model for Bankruptcy
• Model Variable Selection Using Bootstrapped Decision Tree in Base SAS®
• SAS® Code for Variable Selection in Multiple Linear Regression Models Using Information Criteria
Methods with Explicit Enumeration for a Large Number of Independent Regressors
• On Bayesian Model and Variable Selection using MCMC
• Recreating the SELECTION=SCORE Model Specification with the BEST=n Effect Selection Option for
PROC SURVEYLOGISTIC
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
RESOURCES SAS COURSES
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
RESOURCES VIDEOS
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
RESOURCES ADDITIONAL READING
• Graphs Useful For Variable Selection in Predictive Modeling Predictive Models Based on Reduced Input
Space That Uses Rejected Variables
• Variable Reduction in SAS by Using Weight of Evidence and Information Value
• Combining Decision Trees with Regression in Predictive Modeling with SAS® Enterprise Miner™
• Variable Reduction for Modeling using PROC VARCLUS
• Applications of the GLMSELECT Procedure for Megamodel Selection
• Variable Selection in Data Mining: Building a Predictive Model for Bankruptcy
• Model Variable Selection Using Bootstrapped Decision Tree in Base SAS®
• SAS® Code for Variable Selection in Multiple Linear Regression Models Using Information Criteria
Methods with Explicit Enumeration for a Large Number of Independent Regressors
• On Bayesian Model and Variable Selection using MCMC
• Recreating the SELECTION=SCORE Model Specification with the BEST=n Effect Selection Option for
PROC SURVEYLOGISTIC
• An Overview of Machine Learning with SAS Enterprise Miner
• How to Apply the VIF Regression Algorithm in SAS Enterprise Miner
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
RESOURCES SAS COURSES
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
RESOURCES VIDEOS
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
RESOURCES LIVE WEBINARS FROM CUSTOMER LOYALTY
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
RESOURCES LIVE WEBINARS FROM CUSTOMER LOYALTY
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed.
Connect with me:
LinkedIn: https://www.linkedin.com/in/melodierush
Twitter: @SAS_MelodieRush
QUESTIONS?
• Thank you for your time and attention!
Cop yrig ht © 2014, SAS Institute Inc. All rig hts reserv ed. sas.com