
Lecture 9: Qualitative Independent Variables

Comparing means using Regression


(I don't need no stinkin' ANOVA)
In linear regression analysis, the dependent variable should always be a continuous variable. The same
restriction does not apply to the independent variables, however.

This lecture shows how qualitative variables (variables whose values represent different groups of people, not different quantities) are incorporated into regression analyses, allowing comparison of the means of the groups.

We'll discover that IVs with only 2 values can be treated as if they were continuous IVs in any regression. But IVs with 3 or more values must be treated specially. Once that's done, they also can be included in regression analyses.

Regression with a single two-valued (dichotomous) predictor


Any two-valued independent variable can be included in a simple or multiple regression analysis. The regression can be used to compare the means of the two groups, yielding the same conclusion as the equal-variances independent groups t-test.

Example: Suppose the performance of two groups trained using different methods is being compared. Group 1 was trained using a Lecture-only method. Group 2 was trained using a Lecture+CAI method. Performance was measured using scores on a final exam covering the material being taught. So the dependent variable is PERF, performance on the final exam. The independent variable is TP, Training Program: Lecture only vs. Lecture+CAI.
The data follow:

ID TP PERF    ID TP PERF
 1  1  37     13  1  57
 2  1  69     14  1  50
 3  1  64     15  1  58
 4  1  43     16  1  65
 5  1  37     17  1  48
 6  1  54     18  1  34
 7  1  52     19  1  44
 8  1  40     20  1  58
 9  1  61     21  1  45
10  1  48     22  1  35
11  1  44     23  1  45
12  1  65     24  1  52



ID TP PERF    ID TP PERF
25  1  37     38  2  56
26  2  53     39  2  61
27  2  62     40  2  62
28  2  56     41  2  72
29  2  61     42  2  46
30  2  63     43  2  64
31  2  34     44  2  60
32  2  56     45  2  58
33  2  54     46  2  73
34  2  60     47  2  57
35  2  59     48  2  53
36  2  67     49  2  43
37  2  42     50  2  61

How should the groups be coded?


In the example data, Training Program (TP) was coded as 1 for the Lecture method and 2 for the L+CAI method. But any two values could have been used. For example, 0 and 1 could have been used. Or 3 and 47 could have been used. When the IV is a dichotomy, the specific values used to represent the two groups formed by the two values of the IV are completely arbitrary.

When one of the groups has whatever the other has plus something else, my practice is to give it the larger of
the two values, often 0 for the group with less and 1 for the group with more.

When one is a control and the other is an experimental group, my practice is to use 0 for the control and 1 for
the experimental.
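Recoding to whatever scheme you prefer is easy in SPSS syntax. Here's a minimal sketch, assuming the 1/2-coded TP variable from the example; the new name TP01 is mine, not part of the original data:

* Recode TP from 1/2 to 0/1, so 0 = Lecture only and 1 = Lecture+CAI.
RECODE TP (1=0) (2=1) INTO TP01.
EXECUTE.

With 0/1 coding, the y-intercept of the regression becomes the expected value of Y for the group coded 0, a point we'll return to below.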



Visualizing regressions when the independent variable is a dichotomy.

When an IV is a dichotomy, the scatterplot takes on an unusual appearance: two columns of points, one over each of the two values of the IV. It can be interpreted in the way all scatterplots are interpreted, although if the values of the IV are arbitrary, the sign of the relationship may not be a meaningful characteristic. For example, in the following scatterplot, it would not make any sense to say that performance was positively related to training program. It would make sense, however, to say that performance was higher in the Lecture+CAI program than in the Lecture-only program.

In the graph of the example data, the best-fitting straight line has been drawn through the scatterplot. When the independent variable is a dichotomy, the line will always go through the mean value of the dependent variable at each of the two independent variable values.

We'll notice that the regression coefficient, the B value, for Training Program is equal to the difference between the means of performance in the two programs. This will always be the case if the values used to code the two groups differ by one (1 vs. 2 in this example).

[Scatterplot: PERF (30 to 80) on the vertical axis against TP (L Only, L+CAI) on the horizontal axis. The best-fitting straight line passes through the mean of PERF for Method 1 and the mean of PERF for Method 2.]

SPSS Output and its interpretation.


Regression

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .374a   .140       .122                9.67
a. Predictors: (Constant), TP

R-square is the proportion of variance in Y related to differences between the groups. Some say that R-square is the proportion of variance related to group membership. So in this example, 14% of the variance of Y is related to group membership.



ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   729.620          1    729.620       7.795   .007a
Residual     4492.880         48   93.602
Total        5222.500         49
a. Predictors: (Constant), TP
b. Dependent Variable: PERF

As was the case with simple regression with a continuous predictor, the information in the ANOVA summary table is redundant with the information in the Coefficients box below.
Coefficients(a)
                Unstandardized Coefficients      Standardized Coefficients
Model           B        Std. Error               Beta      t       Sig.
1 (Constant)    42.040   4.327                              9.716   .000
  TP            7.640    2.736                    .374      2.792   .007
a. Dependent Variable: PERF

Interpretation of (Constant): This is the expected value of the dependent variable when the independent
variable = 0. If one of the groups had been coded as 0, then the y-intercept would have been the expected value
of Y in that group. In this example, neither group is coded 0, so the value of the y-intercept has no special
meaning.

Interpretation of B when the IV has only two values . . .

B = (Mean of Group 2 - Mean of Group 1) / (X2 - X1), i.e., the difference in group means divided by the difference in the X values used to code the two groups.

If the X values for the groups differ by 1, as they do here, then B = the difference in group means. In this example, B = (57.32 - 49.68) / (2 - 1) = 7.64.

The sign of the B coefficient.

The sign of the B coefficient associated with a dichotomous variable depends on how the groups were labeled. In this case, the L Only group was labeled 1 and the L+CAI group was labeled 2.

If the sign of the B coefficient is positive, this means that the group with the larger IV value had a larger
mean.
If the sign of the B coefficient is negative, this means that the group with the larger IV value had a
SMALLER mean.
The fact that B is positive means that the L+CAI group mean (coded 2) was larger than the L group mean (coded 1). If the labeling had been reversed, with L+CAI coded as 1 and L-only coded as 2, the sign of the B coefficient would have been negative.

The t-value
The t values test the hypothesis that each coefficient equals 0. In the case of the Constant, we don't care.
In the case of the B coefficient, the t value tells us whether the B coefficient, and equivalently, the
difference in means, is significantly different from 0. The p-value of .007 suggests that the B value is
significantly different from 0.
The bottom line
This means that when the independent variable is a dichotomy, regression of the dependent variable onto that dichotomy is a comparison of the means of the two groups.
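In SPSS syntax, the regression just described might be run as follows. This is a sketch, assuming the variable names PERF and TP used in the example:

* Regress performance onto the dichotomous training-program variable.
REGRESSION
/STATISTICS COEFF OUTS R ANOVA
/DEPENDENT PERF
/METHOD=ENTER TP.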

Qualitative Independent Variables - 4 8/27/2017


Relationship to independent groups t.

You may be thinking that another way to compare the performance in the two groups would be to perform an
independent groups t-test. This might then lead you to ask whether you'd get a result different from the
regression analysis.

The t-test on the data follows.

T-Test


Group Statistics
       TP            N    Mean      Std. Deviation   Std. Error Mean
PERF   1.00 L Only   25   49.6800   10.3952          2.0790
       2.00 L+CAI    25   57.3200   8.8963           1.7793

Compare the t below with the t from the Coefficients table on the previous page.

Note that the difference in means is 57.32 - 49.68 = 7.64.


Independent Samples Test

Levene's Test for Equality of Variances:   F = 1.974   Sig. = .166

t-test for Equality of Means:
                                                                                          95% Confidence Interval
                              t        df       Sig. (2-tailed)   Mean Diff.   SE Diff.   Lower      Upper
Equal variances assumed       -2.792   48       .007              -7.6400      2.7364     -13.1420   -2.1380
Equal variances not assumed   -2.792   46.881   .008              -7.6400      2.7364     -13.1454   -2.1346

Note that the t value is 2.792 in absolute value (the sign simply reflects the order in which the means were subtracted), the same as the t value from the regression analysis. This indicates a very important relationship between the independent groups t-test and simple regression analysis:

When the independent variable is a dichotomy, the simple regression of Y onto the dichotomy gives the
same test of difference in group means as the equal variances assumed independent groups t-test.
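For comparison, the matching t-test command might look like this (a sketch, again assuming TP is coded 1 and 2):

* Independent groups t-test comparing PERF across the two training programs.
T-TEST GROUPS=TP(1 2)
/VARIABLES=PERF.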

As we'll see when we get to multiple regression, when independent variables represent several groups, the regression of Y onto those independent variables gives the same test of differences in group means as does the analysis of variance. That is, every test that can be conducted using analysis of variance can be conducted using multiple regression analysis.

Analysis of variance: a dinosaur methodology?

Yes, it is. No self-respecting computer program would use the ANOVA formulae taught in many (but fewer each year) older statistical textbooks. All well-written computer programs convert the problem to a regression analysis and conduct the analysis as if it were a regression, using the techniques to be shown in the following.

But statistics is littered with dinosaurs. Among many analysts, regression analysis itself has been replaced by structural equation modeling, a much more inclusive technique.

Among other analysts, the kinds of regression analyses we're doing have been replaced by multilevel analyses, again a more inclusive technique in a different context.

When you wake up tomorrow, statistical analysis will have changed.



Comparing Three Group Means using Regression
The problem

Consider comparing mean religiosity scores among three religious groups: Protestants, Catholics, and Jews.

Suppose you had the following data:

Religion   Naïve RelCode   Religiosity
Prot 1 6
Prot 1 12
Prot 1 13
Prot 1 11
Prot 1 9
Prot 1 14
Prot 1 12
Cath 2 5
Cath 2 7
Cath 2 8
Cath 2 9
Cath 2 10
Cath 2 8
Cath 2 9
Jew 3 4
Jew 3 3
Jew 3 6
Jew 3 5
Jew 3 7
Jew 3 8
Jew 3 2

Obviously, we could compare the means using traditional ANOVA formulas.

But suppose you wished to analyze these data using regression.

One seemingly logical approach would be to assign the successive integers to the religion groups and perform a
simple regression.

In the above, the variable RELCODE is a numeric variable representing the 3 religions.

Because it is NOT the appropriate way to represent a three-category variable in a regression analysis, we'll call it the Naïve RELCODE.

The simple regression follows:



Scatterplot of Religiosity vs. Naïve RELCODE

Below is a scatterplot of the relationship of Religiosity (labeled STRENGTH in the output) to the Naïve RELCODE.

[Scatterplot: STRENGTH (0 to 16) against Naïve RELCODE (1, 2, 3), with the best-fitting line.]

This is mostly a page of crap because the analysis is completely inappropriate.

Regression

Variables Entered/Removed(b)
Model 1:   Variables Entered: RELCODE(a)   Method: Enter
a. All requested variables entered.
b. Dependent Variable: STRENGTH

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .767a   .589       .567                2.152
a. Predictors: (Constant), RELCODE

Coefficients(a)
Model          B        Std. Error   Beta    t        Sig.
1 (Constant)   14.000   1.243                11.267   .000
  RELCODE      -3.000   .575         -.767   -5.216   .000
a. Dependent Variable: STRENGTH

Looks like a strong negative relationship.

But wait!! Something's wrong. <===== Not crap.

For this analysis, I assigned the numbers 1, 2, and 3 to the religions Prot, Cath, and Jew, respectively.

But I could just as well have used a different assignment. How about Cath = 1, Prot = 2, and Jew = 3?



The data would now look like this. (This is another page of crap because the analysis is completely inappropriate.)

Religion   New Naïve RelCode   Religiosity
Prot       2                   6
Prot       2                   12
Prot       2                   13
Prot       2                   11
Prot       2                   9
Prot       2                   14
Prot       2                   12
Cath       1                   5
Cath       1                   7
Cath       1                   8
Cath       1                   9
Cath       1                   10
Cath       1                   8
Cath       1                   9
Jew        3                   4
Jew        3                   3
Jew        3                   6
Jew        3                   5
Jew        3                   7
Jew        3                   8
Jew        3                   2

[Scatterplot: STRENGTH (0 to 16) against the New Naïve RELCODE (1, 2, 3).]

The analysis would be:

Regression

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .384a   .147       .102                3.099
a. Predictors: (Constant), RELCODE

Coefficients(a)
Model          B        Std. Error   Beta    t        Sig.
1 (Constant)   11.000   1.789                6.148    .000
  RELCODE      -1.500   .828         -.384   -1.811   .086
a. Dependent Variable: STRENGTH

Whoops! What's going on? Two analyses of the same data yield two VERY different results. Which is correct? Answer: Neither is correct. In fact, there is nothing of use in either analysis.

This is a great example of how a statistical analysis can go completely wrong.



The problem

Qualitative factors, such as religion, race, type of graduate program, etc., with 3 or more values cannot be analyzed using simple regression techniques in which the factor is used as-is as a predictor.

That's because the numbers assigned to qualitative factors are simply names; any set of numbers will do. The problem is that each different set of numbers will yield a different result in a simple regression.

Note: If the qualitative factor has only 2 values, i.e., it's a dichotomy, it CAN be used as-is in the regression. (So everything on the first couple of pages of this lecture is still true.) But if it has 3 or more values, it cannot.

Does this mean that regression analysis is useful only for continuous or dichotomous variables? How limiting!!

The solution (thanks, Mathematicians)

1. Represent each value of the qualitative factor with a combination of values on two or more specially selected Group Coding Variables.

They're called group coding variables because each value of a qualitative factor represents a group of people.

If there are K groups, then K-1 group coding variables are required.

2. Regress the dependent variable onto the set of group coding variables in a multiple regression.

Group Coding Variables

The question arises: What actually are the group coding variables? How are they created?

There are 3 common types of group coding variables. (There are several other, less common types.)

1. Dummy Coding.
2. Effects Coding.
3. Contrast Coding. (We won't cover this technique this semester. It is covered in Advanced SPSS.)



Dummy Variable Coding
In Dummy Variable Coding, one group is designated as the Comparison/Reference group. As a byproduct of
the analysis, its mean is compared with the means of all the other groups.

If K is the number of groups, then K-1 Dummy variables are created.

The comparison group is assigned the value 0 on all Dummy Variables.

Each other group is assigned the value 1 on one Dummy Variable and 0 on the remaining.
Each group is assigned the value 1 on a different Dummy Variable.

Examples . . .

Two Groups (Even though we don't actually need special techniques for two groups.)
Group DV1
G1 1
G2 0 = The Comparison Group

Three Groups
Group DV1 DV2
G1 1 0
G2 0 1
G3 0 0 The Comparison Group

Four Groups
Group DV1 DV2 DV3
G1 1 0 0
G2 0 1 0
G3 0 0 1
G4 0 0 0 The Comparison Group

Five Groups
Group DV1 DV2 DV3 DV4
G1 1 0 0 0
G2 0 1 0 0
G3 0 0 1 0
G4 0 0 0 1
G5 0 0 0 0 The Comparison Group

Etc.

Because, as will be shown below, the regression results in a comparison of the means of the groups coded 1 with the mean of the Comparison Group, this coding scheme is most often used in situations in which there is a natural comparison group, for example, a control group to be compared with several experimental groups.



Example Regression Using Dummy Variable Coding
The hypothetical data are job satisfaction scores (JS) of three groups of employees.
JS   JOB   DC1   DC2
 6    1     1     0
 7    1     1     0
 8    1     1     0    Group 1
11    1     1     0
 9    1     1     0
 7    1     1     0
 7    1     1     0
 5    2     0     1
 7    2     0     1
 8    2     0     1    Group 2
 9    2     0     1
10    2     0     1
 8    2     0     1
 9    2     0     1
 4    3     0     0
 3    3     0     0
 6    3     0     0    Group 3, the Comparison Group
 5    3     0     0
 7    3     0     0
 8    3     0     0
 2    3     0     0

The REGRESSION Dialog [dialog screenshot not reproduced here]
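If you prefer syntax to the dialog, the dummy variables can be computed and the regression run along these lines. A sketch: the logical expressions (JOB = 1) and (JOB = 2) evaluate to 1 when true and 0 when false, which is exactly the dummy coding shown above.

* Create the dummy variables; Group 3, the comparison group, gets 0 on both.
COMPUTE DC1 = (JOB = 1).
COMPUTE DC2 = (JOB = 2).
EXECUTE.

* Regress JS onto the set of group coding variables.
REGRESSION
/STATISTICS COEFF OUTS R ANOVA
/DEPENDENT JS
/METHOD=ENTER DC1 DC2.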



Regression

Variables Entered/Removed(b)
Model 1:   Variables Entered: DC2, DC1(a)   Method: Enter
a. All requested variables entered.
b. Dependent Variable: JS

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .630a   .397       .330                1.84
a. Predictors: (Constant), DC2, DC1

When the predictors are group coding variables, we often say that R-square is the proportion of variance related to group membership.

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   40.095           2    20.048        5.930   .011a
Residual     60.857           18   3.381
Total        100.952          20
a. Predictors: (Constant), DC2, DC1
b. Dependent Variable: JS

This F tests the overall null hypothesis that there are no differences between the 3 population means. It's the same value we would have obtained had we conducted an ANOVA. The F is significant, so reject the hypothesis that the population means are equal.
Interpretation of the Coefficients Box.

Each Dummy Variable compares the mean of the group coded 1 on that variable to the mean of the Comparison
group. The value of the B coefficient is the difference in means.

So, for DC1, the B of 2.857 means that the mean of Group 1 was 2.857 larger than the Comparison group mean. For DC2, the B of 3.000 means that the mean of Group 2 was 3.000 larger than the Comparison group mean.
Coefficients(a)
Model          B       Std. Error   Beta   t       Sig.
1 (Constant)   5.000   .695                7.194   .000
  DC1          2.857   .983         .614   2.907   .009
  DC2          3.000   .983         .645   3.052   .007
a. Dependent Variable: JS

When is dummy coding used? When one of the groups is a natural control group for all the other groups.
Each t tests the significance of the difference between a group mean and the reference group mean.
t = 2.907 tests the significance of the difference between the Group 1 mean and the Reference group mean.
t = 3.052 tests the significance of the difference between the Group 2 mean and the Reference group mean.
So the mean of Group 1 is significantly different from the Reference group mean, and the mean of Group 2 is also significantly different from the Reference group mean.



Effects Coding (called Deviation coding in SPSS)
Effects coding is basically the same as Dummy Variable Coding, with the exception that the comparison group's codes are switched from all 0s to all -1s.

Two Groups (Remember, special coding is not actually needed, since there are two groups.)
Group Code
G1 1
G2 -1

Three Groups (Special coding IS needed when you are comparing means of 3 or more groups.)
Group GCV1 GCV2
G1 1 0
G2 0 1
G3 -1 -1

Four Groups
Group GCV1 GCV2 GCV3
G1 1 0 0
G2 0 1 0
G3 0 0 1
G4 -1 -1 -1

Etc.

The coding switch changes the interpretation of the B coefficients.

Now, rather than representing a comparison of the mean of a group coded 1 with the mean of a comparison group, the B coefficient represents a comparison of the mean of a group coded 1 with the mean of ALL groups. A syntax sketch for creating these variables follows.
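In syntax, effects coding variables for the three-group example might be created like this (a sketch; each RECODE maps the JOB values onto the codes in the three-group table above, using the EC1/EC2 names from the example that follows):

* Effects coding: Group 3, the comparison group, gets -1 on both variables.
RECODE JOB (1=1) (2=0) (3=-1) INTO EC1.
RECODE JOB (1=0) (2=1) (3=-1) INTO EC2.
EXECUTE.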



Regression Example Using Effects Coding
JS   JOB   EC1   EC2
 6    1     1     0
 7    1     1     0
 8    1     1     0    Group 1
11    1     1     0
 9    1     1     0
 7    1     1     0
 7    1     1     0
 5    2     0     1
 7    2     0     1
 8    2     0     1    Group 2
 9    2     0     1
10    2     0     1
 8    2     0     1
 9    2     0     1
 4    3    -1    -1
 3    3    -1    -1
 6    3    -1    -1    Group 3: Comparison Group
 5    3    -1    -1
 7    3    -1    -1
 8    3    -1    -1
 2    3    -1    -1

Report
JS
JOB              Mean   N    Std. Deviation
1 Clerks         7.86   7    1.68
2 Receptionist   8.00   7    1.63
3 Mailroom       5.00   7    2.16
Total            6.95   21   2.25

Alas, we can use REGRESSION to compare means, but it won't report them for us. We have to use some other procedure, such as the REPORT procedure, if we want to actually see the values of the means.
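One such option is the MEANS procedure. A sketch, assuming the JS and JOB variables from the example:

* Print the mean, N, and standard deviation of JS for each JOB group.
MEANS TABLES=JS BY JOB
/CELLS=MEAN COUNT STDDEV.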



Regression

Variables Entered/Removed(b)
Model 1:   Variables Entered: EC1, EC2(a)   Method: Enter
a. All requested variables entered.
b. Dependent Variable: JS

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .630a   .397       .330                1.84
a. Predictors: (Constant), EC1, EC2

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   40.095           2    20.048        5.930   .011a
Residual     60.857           18   3.381
Total        100.952          20
a. Predictors: (Constant), EC2, EC1
b. Dependent Variable: JS

Everything in the top three boxes is the same as in the dummy variable analysis. The means are still significantly different. The F of 5.930 is EXACTLY the same value as we obtained using dummy coding and EXACTLY the same value we'd have obtained had we done an Analysis of Variance.
Interpretation of the Coefficients Box.
In Effects coding, each B coefficient represents a comparison of the mean of the group coded 1 on the variable
with the mean of ALL the groups.
So, for EC1, the B of .905 indicates that the mean of Group 1 was .905 larger than the mean of all the groups.
For EC2, the B of 1.048 indicates that the mean of Group 2 was 1.048 larger than the mean of all the groups.
There is no B coefficient for Group 3.
Coefficients(a)
Model          B       Std. Error   Beta   t        Sig.
1 (Constant)   6.952   .401                17.327   .000
  EC1          .905    .567         .337   1.594    .128
  EC2          1.048   .567         .390   1.846    .081
a. Dependent Variable: JS

The t of 1.594 indicates that the mean of Group 1 was not significantly different from the mean of all groups.
The t of 1.846 indicates that the mean of Group 2 was not significantly different from the mean of all groups.
Remember that these are the same data as above. This shows that one form of analysis of the data may be more informative than another form. In this case, the Dummy Variable analysis was more informative.



Perspective
You may recall that we considered a procedure for comparing means in the fall semester: the analysis of variance. It was a lot easier than creating group-coding variables and performing the regression analyses we've done here. Furthermore, using the analysis of variance procedure in SPSS automatically provided means and standard deviations of the groups, something we had to do as an extra step when using REGRESSION. Plus, the analysis of variance provides post hoc tests that aren't available in regression.
Here's the output of SPSS's ONEWAY analysis of variance procedure for the above data . . .
ANOVA
JS
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   40.095           2    20.048        5.930   .011
Within Groups    60.857           18   3.381
Total            100.952          20

Note that the F value (5.930) is exactly the same as the F value from the ANOVA table from the regression
procedure.
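For reference, a ONEWAY command along these lines would produce that table. This is a sketch; the BTUKEY keyword requests the Tukey-B post hoc test that also appears in the GLM output later:

* One-way ANOVA of JS by JOB with descriptives and a Tukey-B post hoc test.
ONEWAY JS BY JOB
/STATISTICS DESCRIPTIVES
/POSTHOC=BTUKEY ALPHA(0.05).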

So why bother to use the regression procedure to compare group means?

The answer is that if the comparison of a single set of group means were all that there was to the analysis, you would NOT use the regression procedure - you'd use the analysis of variance procedure.

But here are four reasons for using, or at least being familiar with, regression-based means comparisons and the group coding variable schemes upon which they're based.

1. Whenever you have a mixture of qualitative and quantitative variables in the analysis, regression procedures are the overwhelming choice. Example: Are there differences in the means of three groups controlling for cognitive ability? You can't answer that without including cognitive ability, a quantitative variable, in the analysis. Traditional analysis of variance formulas don't easily incorporate quantitative variables. Once you're familiar with group coding schemes, it's pretty easy to perform analyses with both quantitative and qualitative variables.

2. Most statistical packages perform ALL analyses (of qualitative variables, quantitative variables, and mixtures of the two) using regression formulas. When analyzing only qualitative variables, they will print output that looks like they've used the analysis of variance formulas, but behind your back, they've actually done regression analyses. Some of that output may reference the behind-your-back regression that was actually performed. So knowing about the regression approach to comparison of group means will help you understand the output of statistical packages performing analysis of variance. We'll see that in the GLM procedure below.

3. Other analyses, for example Logistic Regression and Survival Analysis, to name two in SPSS, have very regression-like output when qualitative factors are analyzed. That is, they're quite up-front about the fact that they do regression analyses. If you don't understand the regression approach to analysis of variance, it'll be very hard for you to understand the output of these procedures.

4. It's just cool to know how to do this.



Doing the analyses using the GLM procedure.
JS   JOB
 6    1
 7    1
 8    1    Group 1
11    1
 9    1
 7    1
 7    1
 5    2
 7    2
 8    2    Group 2
 9    2
10    2
 8    2
 9    2
 4    3
 3    3
 6    3    Group 3: Comparison Group
 5    3
 7    3
 8    3
 2    3

Note that there are no group-coding variables in the data that must be submitted to GLM.

Hurray! Hurray!! Don't need no stinkin' GCVs.



[GLM dialog screenshot omitted.]

Put the names of qualitative factors in the Fixed Factor(s) field. Put the names of quantitative factors in the Covariates field.



SAVE OUTFILE='C:\Users\Michael\Documents\JSExampleFor513.sav'
/COMPRESSED.
UNIANOVA JS BY JOB
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/POSTHOC=JOB(BTUKEY)
/PLOT=PROFILE(JOB)
/PRINT=ETASQ HOMOGENEITY DESCRIPTIVE OPOWER
/CRITERIA=ALPHA(.05)
/DESIGN=JOB.
[DataSet0] C:\Users\Michael\Documents\JSExampleFor513.sav

Between-Subjects Factors
         N
JOB   1  7
      2  7
      3  7

Descriptive Statistics
Dependent Variable: JS
Job     Mean   Std. Deviation   N
1       7.86   1.676            7
2       8.00   1.633            7
3       5.00   2.160            7
Total   6.95   2.247            21

Levene's Test of Equality of Error Variances(a)
Dependent Variable: JS
F      df1   df2   Sig.
.572   2     18    .574
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + JOB



Tests of Between-Subjects Effects
Dependent Variable: JS
Source            Type III Sum of Squares   df   Mean Square   F         Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power(b)
Corrected Model   40.095a                   2    20.048        5.930     .011   .397                  11.859               .815
Intercept         1015.048                  1    1015.048      300.225   .000   .943                  300.225              1.000
JOB               40.095                    2    20.048        5.930     .011   .397                  11.859               .815
Error             60.857                    18   3.381
Total              1116.000                 21
Corrected Total   100.952                   20
a. R Squared = .397 (Adjusted R Squared = .330)
b. Computed using alpha = .05

What's this? (The rows and columns are explained below.)

Corrected Model: This is what is in the ANOVA box in regression.

GLM regresses the dependent variable onto ALL of the group coding variables and quantitative
variables, if there are any. This is the report of the significance of that regression.

Intercept: This is the report on the Y-intercept of the all-predictors regression reported on in the line immediately above.

These are signs of the behind-your-back regression analysis that's actually been conducted.

JOB: The overall F again, this time for job.

Note that no mention is made of the fact that two group-coding variables were created to represent JOB.
The only indication that something is up is the 2 in the df column. That 2 is the number of actual
independent variables used to represent the JOB factor.

Error: The denominator of the F statistic.

Partial Eta squared: A measure of effect size appropriate for analysis of variance.

See 5100 notes for interpretation of eta squared.

Observed Power: Probability of a significant F if experiment were conducted again with population means
equal to these sample means.



Profile Plots

[Profile plot omitted.]

Post Hoc Tests

JOB
Homogeneous Subsets
JS
Tukey B
              Subset
JOB   N    1       2
3     7    5.00
1     7            7.86
2     7            8.00
Means for groups in homogeneous subsets are displayed.
Based on observed means.
The error term is Mean Square(Error) = 3.381.



Having your cake and eating it too - Specifying Coding Schemes in GLM
What if you just miss group coding variables? Is there a way to see them one last time in GLM?

Click the Contrasts button in the GLM dialog to work with group coding variables. [Dialog screenshot omitted.]

Here are the SPSS names for the coding schemes we're using:

Our name    SPSS's name
Dummy       Simple
Effects     Deviation



(I should have checked the homogeneity box here. Thanks, Stephanie.)

UNIANOVA JS BY Job
/CONTRAST(Job)=Deviation
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/PRINT=OPOWER ETASQ DESCRIPTIVE PARAMETER
/CRITERIA=ALPHA(.05)
/DESIGN=Job.

Checking the Parameter Estimates box tells GLM to print out any regression parameters it might have computed. These are regression parameters for any quantitative independent variables and for the group-coding variables that are created automatically by GLM.

Univariate Analysis of Variance

Between-Subjects Factors
         N
Job   1  7
      2  7
      3  7
Descriptive Statistics
Dependent Variable:JS

Job Mean Std. Deviation N

1 7.86 1.676 7
2 8.00 1.633 7
3 5.00 2.160 7
Total 6.95 2.247 21



Tests of Between-Subjects Effects
Dependent Variable: JS
Source            Type III Sum of Squares   df   Mean Square   F         Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power(b)
Corrected Model   40.095a                   2    20.048        5.930     .011   .397                  11.859               .815
Intercept         1015.048                  1    1015.048      300.225   .000   .943                  300.225              1.000
Job               40.095                    2    20.048        5.930     .011   .397                  11.859               .815
Error             60.857                    18   3.381
Total             1116.000                  21
Corrected Total   100.952                   20
a. R Squared = .397 (Adjusted R Squared = .330)
b. Computed using alpha = .05

These results are from the default dummy coding that SPSS always does automatically. Note that they are identical to those obtained earlier.

Parameter Estimates
Dependent Variable: JS
                                                95% Confidence Interval
Parameter   B       Std. Error   t       Sig.   Lower Bound   Upper Bound   Partial Eta Squared   Noncent. Parameter   Observed Power(a)
Intercept   5.000   .695         7.194   .000   3.540         6.460         .742                  7.194                1.000
[Job=1]     2.857   .983         2.907   .009   .792          4.922         .319                  2.907                .785
[Job=2]     3.000   .983         3.052   .007   .935          5.065         .341                  3.052                .823
[Job=3]     0b      .            .       .      .             .             .                     .                    .
a. Computed using alpha = .05
b. This parameter is set to zero because it is redundant.

Custom Hypothesis Tests

These are the results for the deviation group coding scheme we asked for. The p-values are the same as those obtained using the REGRESSION procedure on p. 14.

Contrast Results (K Matrix)
Dependent Variable: JS
Job Deviation Contrast(a)
Level 1 vs. Mean
  Contrast Estimate                       .905
  Hypothesized Value                      0
  Difference (Estimate - Hypothesized)    .905
  Std. Error                              .567
  Sig.                                    .128
  95% Confidence Interval for Difference: Lower Bound -.287, Upper Bound 2.097
Level 2 vs. Mean
  Contrast Estimate                       1.048
  Hypothesized Value                      0
  Difference (Estimate - Hypothesized)    1.048
  Std. Error                              .567
  Sig.                                    .081
  95% Confidence Interval for Difference: Lower Bound -.145, Upper Bound 2.240
a. Omitted category = 3

What's this???
Test Results
Dependent Variable: JS
Source     Sum of Squares   df   Mean Square   F       Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power(a)
Contrast   40.095           2    20.048        5.930   .011   .397                  11.859               .815
Error      60.857           18   3.381
a. Computed using alpha = .05

