10/2/2018
PGP 2018-20 A Global Optimization Framework
Decision Sciences - II for SDA Slide2 2
Gender Discrimination
► Consider the ‘Gender Discrimination’ case
► Take a look at the data
200 observations (134 Female, 66 Male)
Gender, Job Level, Education Level, Work Ex, Prior Work Ex,
Analytical Skills, Age
Salary
► A simple analysis: Collect the male and female employees into
two columns and find the average salaries:
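A minimal sketch of this first-pass comparison in Python, using only the standard library. The six records and the field names `gender` and `salary` are made up for illustration; they are not the case data.

```python
from statistics import mean

# Hypothetical toy records; the real case has 200 employees.
employees = [
    {"gender": "Male", "salary": 560000},
    {"gender": "Male", "salary": 530000},
    {"gender": "Female", "salary": 450000},
    {"gender": "Female", "salary": 430000},
    {"gender": "Female", "salary": 455000},
    {"gender": "Male", "salary": 545000},
]

def average_salary_by_gender(rows):
    """Group salaries by gender and return the mean of each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row["gender"], []).append(row["salary"])
    return {gender: mean(salaries) for gender, salaries in groups.items()}

averages = average_salary_by_gender(employees)
gap = averages["Male"] - averages["Female"]
print(averages, gap)
```

A raw difference in averages like this ignores every other variable (experience, education, job level), which is why the later models add them one by one.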
► Identify the numerical and categorical variables
► How would you model Gender?
Create a dummy (categorical/indicator) variable
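A dummy variable can be built with a one-line mapping. The sketch below is illustrative only: the label strings are hypothetical, and the choice Female = 1, Male = 0 is an assumption (either category can serve as the baseline).

```python
# Hypothetical column of gender labels, as they might appear in the raw data.
gender_labels = ["Female", "Male", "Female", "Female", "Male"]

def to_dummy(labels, coded_as_one="Female"):
    """Return 1 for the chosen category and 0 for the baseline category."""
    return [1 if label == coded_as_one else 0 for label in labels]

gender_dummy = to_dummy(gender_labels)
print(gender_dummy)
```

With this coding, the regression constant describes the baseline group (coded 0) and the dummy's coefficient is the average difference of the other group from that baseline.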
Model 1 (Salary vs Gender)
Variables Entered/Removed (a)
Model   Variables Entered   Variables Removed   Method
1       Gender              .                   Enter
a. Dependent Variable: SalaryRs

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .351 (a)   .123       .119                127558.101
a. Predictors: (Constant), Gender

ANOVA
Model 1      Sum of Squares       df    Mean Square       F        Sig.
Regression   452670537977.227     1     452670537977.2    27.821   .000
Residual     3221671689990.777    198   16271069141.36
Total        3674342227968.004    199

Coefficients (B and Std. Error: unstandardized; Beta: standardized)
Model 1      B             Std. Error   Beta    t        Sig.
(Constant)   545158.182    15701.317            34.721   .000
Gender       -101176.988   19182.212    -.351   -5.275   .000
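The figures in the Model 1 output are linked by simple arithmetic, which the sketch below re-derives from the reported values. It assumes Gender is coded 0 for male and 1 for female, so that the constant is the male average.

```python
# Values copied from the Model 1 output.
ss_regression = 452670537977.227
ss_residual = 3221671689990.777
ss_total = 3674342227968.004
df_regression, df_residual = 1, 198
intercept = 545158.182
gender_coef, gender_se = -101176.988, 19182.212

r_square = ss_regression / ss_total               # share of variance explained
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
t_gender = gender_coef / gender_se                # t = B / Std. Error

# Assumed coding: Gender = 0 for male, 1 for female.
predicted_male = intercept + gender_coef * 0
predicted_female = intercept + gender_coef * 1

print(round(r_square, 3), round(f_stat, 3), round(t_gender, 3))
print(round(predicted_male), round(predicted_female))
```

With a single predictor, the overall F statistic equals the square of that predictor's t statistic, so the F-test and the t-test on Gender carry the same information here.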
Model 1 (Salary vs Gender)
► Draw the regression plot
[Figure: regression line of Salary (Rs) on Gender; fitted values Male: 545158, Female: 443981]
Model 2 (Salary vs Gender and YrsExp)
Variables Entered/Removed (a)
Model   Variables Entered    Variables Removed   Method
1       ExpYears, Gender     .                   Enter
a. Dependent Variable: SalaryRs

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .702 (a)   .493       .488                97212.725
a. Predictors: (Constant), ExpYears, Gender

ANOVA
Model 1      Sum of Squares       df    Mean Square        F        Sig.
Regression   1812630387807.858    2     906315193903.929   95.903   .000
Residual     1861711840160.146    197   9450313909.442
Total        3674342227968.004    199

Coefficients (B and Std. Error: unstandardized; Beta: standardized)
Model 1      B            Std. Error   Beta    t        Sig.
(Constant)   440769.577   14795.584            29.791   .000
Gender       -96986.767   14623.041    -.336   -6.632   .000
ExpYears     11757.078    980.075      .609    11.996   .000
Model 2 (Salary vs Gender and YrsExp)
[Figure: fitted Salary (Rs) against years of experience (0 to 10), one line per gender; intercepts Male: 440769, Female: 343783]
Conclusion:
• Women are discriminated against at the entry level, but there is no discrimination on the job!
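To see why Model 2 leads to this reading, the fitted equation can be evaluated for both genders at a few experience levels. This is a sketch using the Model 2 coefficients from the SPSS output; the coding female = 1, male = 0 is an assumption that matches the plotted intercepts.

```python
# Model 2 coefficients from the SPSS output.
INTERCEPT = 440769.577
GENDER_COEF = -96986.767      # applies when gender == 1 (assumed: female)
EXP_COEF = 11757.078          # salary increase per year of experience

def predict_salary(gender, years):
    """Fitted salary under the additive model (no interaction term)."""
    return INTERCEPT + GENDER_COEF * gender + EXP_COEF * years

for years in (0, 5, 10):
    male = predict_salary(0, years)
    female = predict_salary(1, years)
    print(years, round(male), round(female), round(male - female))
```

Because the model has no interaction term, the gap is the same at every experience level, so the two fitted lines are parallel. The model cannot show any change in the gap over time, whether or not one exists in the data.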
Model 2 (Salary vs Gender and YrsExp)
► Examine the residual plot(s)
Residuals vs. Years Experience
Check for any patterns/correlation
[Figure: residuals (-400000 to 400000) plotted against years of experience (0 to 40)]
Model 2 (Salary vs Gender and YrsExp)
► Examine the residual plot(s)
Residuals vs. Years Experience for Males/Females separately
What about patterns/correlations?
What is your conclusion? (Need interactive effects)
[Figure: two residual plots against years of experience, one for males and one for females; a fitted trend line on one panel reads e = -57559 + 6482.8 yrs]
Model 3 (Salary vs Gender, YrsExp, and interaction)
Interaction Variable
Variables Entered/Removed (a)
Model   Variables Entered                  Variables Removed   Method
1       GenderYearsExp, ExpYears, Gender   .                   Enter
a. Dependent Variable: SalaryRs

Model Summary
Model   R   R Square   Adjusted R Square   Std. Error of the Estimate

ANOVA
Model        Sum of Squares       df   Mean Square        F         Sig.
Regression   2367390744669.802    3    789130248223.267   118.344   .000

Coefficients
Model   Unstandardized Coefficients   Standardized Coefficients   t   Sig.
Model 3 (Salary vs Gender, YrsExp, and interaction)
► What we really need is not only the effect of gender and
experience, but their interactive effect
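One way to read an interaction model is as two separate lines, one per gender. The sketch below builds the interaction column and shows how group-specific intercepts and slopes follow from the four coefficients. The coefficient values are hypothetical placeholders, not the fitted Model 3 values, and the coding female = 1 is an assumption.

```python
# Hypothetical coefficients for
#   Salary = b0 + b1*Gender + b2*ExpYears + b3*(Gender*ExpYears)
b0, b1, b2, b3 = 380000.0, 35000.0, 15000.0, -8000.0

def interaction_column(gender, exp_years):
    """GenderYearsExp: the product of the dummy and the experience column."""
    return [g * x for g, x in zip(gender, exp_years)]

def line_for(gender):
    """Intercept and slope of the fitted line for one gender (0 or 1)."""
    return b0 + b1 * gender, b2 + b3 * gender

male_intercept, male_slope = line_for(0)
female_intercept, female_slope = line_for(1)

# Experience at which the two fitted lines cross (requires b3 != 0).
crossover_years = -b1 / b3

print(interaction_column([0, 1, 1], [4, 4, 10]))
print(male_intercept, male_slope, female_intercept, female_slope, crossover_years)
```

The Gender coefficient now shifts only the intercept, while the interaction coefficient shifts the slope, so the model can represent a gap that changes with experience.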
[Figure: fitted Salary (Rs) against years of experience (0 to 10), one line per gender; intercepts Female: 417910, Male: 383210]
Conclusion:
• Men are discriminated against at the entry level…
• Women are discriminated against on the job!
Model 4 (Education Levels)
Variables Entered/Removed
Model   Variables Entered                                                               Variables Removed   Method
1       PostGraduate, ExpYears, Science, Gender, Commerce, Technology, GenderYearsExp   .                   Enter

Model Summary
Model   R   R Square   Adjusted R Square   Std. Error of the Estimate

ANOVA
Model        Sum of Squares       df   Mean Square        F        Sig.
Regression   2700356036352.063    7    385765148050.295   76.045   .000
Coefficients
Model   Unstandardized Coefficients   Standardized Coefficients   t   Sig.
Stepwise Regression
► Suppose we wish to build a regression model from scratch
► We have identified the dependent variable and all possible
independent variables
► In stepwise regression, we add one independent variable to the
model at a time
► The criterion used for selection of the independent variable to
be added to the model is based on part (semi-partial)
correlation and significance level (p-in value)
Recap:
Partial Correlation: Correlation between Y and X2 after the effect
of X1 has been removed from both Y and X2
Semi-Partial (Part) Correlation: Correlation between Y and X2 after
the effect of X1 has been removed from only X2
► When a new variable is added to the model, an existing
variable might become insignificant and can be removed from
the model (p-out value)
► Process stops when:
No more variables are left to be considered
No variables can be added based on significance level (p-in)
No variables can be removed based on significance level (p-out)
Stepwise Regression
► Begin with the variable that has the largest partial (and also part)
correlation with the response variable
► Initially, this is the same as pairwise correlation (Identify)
► In our example, it is Org Level F (What is R2?)
► Run a simple linear regression with salary as the response
variable and Org Level F as the predictor variable
► Having entered a variable, do a partial F-test to test its
significance
► Initially, partial F-test is the same as F-test (obtained from
output) (only for the first variable)
► The F-test shows that the overall model is significant; so,
Org Level F stays
Stepwise Regression
Excluded Variables (Model 1)
Variable           Beta In   t        Sig.   Partial Correlation   Tolerance (Collinearity Statistics)
Gender             -.143     -2.602   .010   -.182                 .889
ExpYears           .328      5.283    .000   .352                  .629
GenderYearsExp     -.060     -1.124   .262   -.080                 .980
Commerce           -.131     -2.514   .013   -.176                 .986
Science            -.004     -.066    .947   -.005                 .998
Technology         -.092     -1.754   .081   -.124                 .993
PostGraduate       .290      5.761    .000   .380                  .936
AnalyticalSkills   -.056     -1.070   .286   -.076                 .994
ORGLevelA          -.323     -6.682   .000   -.430                 .971
ORGLevelB          -.145     -2.774   .006   -.194                 .982
ORGLevelC          .032      .600     .549   .043                  .984
ORGLevelD          .212      4.178    .000   .285                  .989
ORGLevelE          .384      8.473    .000   .517                  .992
Arts               -.096     -1.821   .070   -.129                 .986
BirthYear          .123      2.164    .032   .152                  .835
PriorExpYears      .042      .797     .426   .057                  .974
Stepwise Regression
► Amongst the excluded variables, select the variable that has the
largest partial correlation with the response variable
► Value of Org Level E partial correlation = 0.517
► What does this mean?
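The selection step can be sketched directly from the "Excluded variables" table. Ranking by absolute value is an assumption of this sketch: a strongly negative partial correlation is as informative as a positive one.

```python
# Partial correlations copied from the "Excluded variables" table.
partial_correlations = {
    "Gender": -0.182, "ExpYears": 0.352, "GenderYearsExp": -0.080,
    "Commerce": -0.176, "Science": -0.005, "Technology": -0.124,
    "PostGraduate": 0.380, "AnalyticalSkills": -0.076,
    "ORGLevelA": -0.430, "ORGLevelB": -0.194, "ORGLevelC": 0.043,
    "ORGLevelD": 0.285, "ORGLevelE": 0.517, "Arts": -0.129,
    "BirthYear": 0.152, "PriorExpYears": 0.057,
}

# Rank candidates by the magnitude of their partial correlation.
ranked = sorted(partial_correlations,
                key=lambda name: abs(partial_correlations[name]),
                reverse=True)
next_variable = ranked[0]

# Squared partial correlation: share of the still-unexplained variability
# that the candidate would account for.
share_of_residual = partial_correlations[next_variable] ** 2
print(next_variable, round(share_of_residual, 3))
```

The squared value is computed here from the rounded table entry, so it can differ in the later decimals from a figure based on the unrounded correlation.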
Stepwise Regression
► Value of Org Level E partial correlation = 0.517
► Square of this partial correlation is 0.26789
► Interpret this value:
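This implies that 26.789% of the variability left unexplained by Model 1 is explained by the part of Org Level E not already captured by Org Level F (the residual of Org Level E regressed on Org Level F)
Model 1 (Explanatory Variable: Org Level F):
SSR = 1664645006771.126 (variability explained by Model 1)
SSE = 2009697221196.878 (variability unexplained by Model 1)
Org Level E accounts for 26.79% of this SSE; the remaining 73.21% stays unexplained
Semi-partial correlation of Org Level E is analogous to: sqrt( Orange / (Orange + Red + Blue) ) = 0.3828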
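The link between these numbers can be checked with a few lines of arithmetic. The sketch below uses the sums of squares and the squared partial correlation quoted on the slide; the variable names are illustrative.

```python
from math import sqrt

# Sums of squares for Model 1 (predictor: Org Level F), from the slide.
sse_model1 = 2009697221196.878   # unexplained by Model 1
ssr_model1 = 1664645006771.126   # explained by Model 1
sst = sse_model1 + ssr_model1    # total variability in salary

partial_r_squared = 0.26789      # squared partial correlation of Org Level E

r2_model1 = ssr_model1 / sst
# Share of TOTAL variability that Org Level E adds on top of Org Level F.
semi_partial_squared = partial_r_squared * (1 - r2_model1)
semi_partial = sqrt(semi_partial_squared)
r2_model2 = r2_model1 + semi_partial_squared

print(round(r2_model1, 3), round(semi_partial, 4), round(r2_model2, 4))
```

The squared semi-partial correlation is the increase in R2 when the variable enters, so the two-variable model should have an R2 of about 0.453 plus 0.147, roughly 0.60.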
Stepwise Regression (Method 2: Using R2)
► Value of Org Level E partial correlation = 0.517
► Square of this partial correlation is 0.26789
► Interpret this value:
This implies that 26.789% of the variability left unexplained by Model 1 is explained by the part of Org Level E not already captured by Org Level F (the residual of Org Level E regressed on Org Level F)
Explanatory Variable: Org Level F
R2 = 0.453: variability explained by Model 1
1 - R2 = 0.547: variability unexplained by Model 1, of which Org Level E explains 26.79% and 73.21% remains unexplained
Stepwise Regression
Explanatory Variables: Org Level F, Org Level E
R2 = 0.599: variability explained by Model 2
1 - R2 = 0.401: variability unexplained by Model 2
Stepwise Regression
► Now, Org Level E has also been added to the model as the
second explanatory variable
► Does the presence of Org Level F impact the significance of Org
Level E as a predictor?
► Do a “partial F-test” for Org Level E
As the p-value is less than 0.05 (p-in value), Org Level E stays
► Repeat the analysis for Org Level F:
If its p-value is greater than 0.10 (p-out value), remove Org Level F
► Note that p-in < p-out (always)
► Default values used by SPSS: p-in = 0.05 and p-out = 0.10
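A partial F-test compares the error sum of squares with and without the variable in question. The sketch below applies the standard formula to the figures quoted on the slides. The two-variable SSE is only approximated from the rounded value 1 - R2 = 0.401, so the result is indicative; the function and variable names are illustrative.

```python
def partial_f(sse_reduced, sse_full, n_obs, n_predictors_full, n_added=1):
    """Partial F statistic for the predictors added to the reduced model."""
    df_error = n_obs - n_predictors_full - 1
    numerator = (sse_reduced - sse_full) / n_added
    denominator = sse_full / df_error
    return numerator / denominator

# SST and the one-variable SSE are reported on the slides; the two-variable
# SSE is approximated from the rounded 1 - R2 = 0.401.
sst = 3674342227968.004
sse_org_f_only = 2009697221196.878
sse_org_f_and_e = 0.401 * sst

f_org_level_e = partial_f(sse_org_f_only, sse_org_f_and_e,
                          n_obs=200, n_predictors_full=2)
print(round(f_org_level_e, 1), round(f_org_level_e ** 0.5, 2))
```

For a single added variable the partial F equals the square of its t statistic, so the square root printed here should be close to the t of 8.473 reported for ORGLevelE in the excluded-variables table. A value this large is far above the 5% critical value of roughly 3.9 for (1, 197) degrees of freedom, so Org Level E stays.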
Stepwise Regression
[Diagram: total variability split into the part explained by X1, the part explained by X2, the part explained by X3, the part explained by X4, …, and the part not explained by the variables (SSE)]
Comments on Stepwise Regression
► Step 0: Find the pairwise correlations between the dependent variable and each independent variable; select the variable with the largest correlation (in absolute value) and add it to the “included variables” list. Go to Step 1
► Step 1: Fit a linear regression model of the dependent variable on all variables in the “included variables” list. Do a partial F-test to determine whether the last included variable is insignificant (p-in value). If it is, go to Step 4; else, go to Step 2
► Step 2: Do a partial F-test for each included variable and check whether any of them is insignificant (p-out value). If so, remove the most insignificant variable from the list and return it to the “excluded variables” list. Go to Step 3
► Step 3: Look at the “excluded variables” list; if it is empty, stop and return the current model as the final model. Else, select the variable with the largest partial correlation (in absolute value) and go to Step 5
► Step 4: Stop; drop the insignificant last variable and return the remaining model as the final model
► Step 5: Add the selected variable to the “included variables” list and go to Step 1
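The steps above can be sketched in code. This is a simplified variant, not SPSS's implementation: it tests a candidate before adding it rather than after, and it uses F-to-enter and F-to-remove thresholds instead of p-values because the Python standard library has no F distribution. The thresholds 3.84 and 2.71 roughly correspond to p = 0.05 and p = 0.10 for one numerator degree of freedom in large samples. The data and all names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def sse(data, y, names):
    """Residual sum of squares of y regressed on a constant plus `names`."""
    rows = [[1.0] + [data[v][i] for v in names] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def partial_f(data, y, names, var):
    """Partial F for `var`, given the other variables in `names`."""
    full = sse(data, y, names)
    reduced = sse(data, y, [v for v in names if v != var])
    return (reduced - full) / (full / (len(y) - len(names) - 1))

def stepwise(data, y, f_in=3.84, f_out=2.71):
    included, excluded = [], list(data)
    while excluded:
        # Entry: the largest partial F is also the largest |partial correlation|.
        best = max(excluded, key=lambda v: partial_f(data, y, included + [v], v))
        if partial_f(data, y, included + [best], best) < f_in:
            break
        included.append(best)
        excluded.remove(best)
        # Removal: drop the weakest variable while it fails the f_out threshold.
        while included:
            weakest = min(included, key=lambda v: partial_f(data, y, included, v))
            if partial_f(data, y, included, weakest) >= f_out:
                break
            included.remove(weakest)
            excluded.append(weakest)
    return included

# Hypothetical toy data: y depends on x1 and x2 plus a small disturbance.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    "x2": [5, 3, 8, 1, 9, 2, 7, 4, 6, 10, 0, 11],
    "x3": [2, 7, 1, 8, 3, 9, 0, 6, 4, 5, 11, 10],
}
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.15, 0.0, 0.1, -0.05, -0.1, 0.0]
y = [3 + 2 * a + 0.5 * b + e for a, b, e in zip(data["x1"], data["x2"], noise)]

selected = stepwise(data, y)
print(selected)
```

Keeping the entry threshold stricter than the removal threshold (f_in above f_out, equivalent to p-in below p-out) prevents a variable from being added and removed in an endless cycle.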
Variable Selection Methods
Backward Regression