CASE 3 DATA
All predictors are metric (interval-scaled). You can use two types of analysis:
1) Discriminant Analysis: has more discriminating power; used when all the
assumptions of regression are satisfied.
2) Regression (here, logistic regression): can be used in more general situations (non-normal data).
y = a + b1x1 + b2x2 + … + b9x9
Here y is categorical. y can also have more than two categories (a yes / no / can't say type).
PROCESS:
3) Classification Table
This gives the percentage of correctly predicted values. For example, how many
observed "preferred" values are equal to the predicted "preferred" values? The
diagonal entries give the correctly predicted values (the hit ratio).
Hit ratio = 87% (in this case)
Rule of thumb: at least 70%
MODEL
y: 0 -------- 0.5 -------- 1
0 = not preferred, 1 = preferred
Hit ratio = sum of diagonal entries / total number of observations
Predicted values above 0.5 are classified as preferred (1) and values below 0.5 as
not preferred (0). A misclassified case thus lies in the group that actually preferred
the diaper while the model predicted not preferred.
PREDICTORS ANALYSIS
Wald statistic = [(B - 0) / SE(B)]²
It tests H0: β = 0, i.e., whether the predictor plays a significant role.
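To make these steps concrete, here is a minimal Python sketch (using statsmodels) that fits a logistic regression, computes the Wald statistics, and builds the classification table and hit ratio. The predictors, sample size, and coefficients are made-up illustrations, not the case data.

```python
import numpy as np
import statsmodels.api as sm

# Made-up data: three metric predictors, binary outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
p = 1 / (1 + np.exp(-(X @ np.array([1.5, -1.0, 0.5]))))
y = rng.binomial(1, p)                 # 0 = not preferred, 1 = preferred

model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)

# Wald statistic per coefficient: [(B - 0) / SE(B)]^2
wald = (model.params / model.bse) ** 2
print("Wald statistics:", wald.round(2))

# Classification table with the 0.5 cutoff; diagonal = correct predictions
pred = (model.predict(sm.add_constant(X)) > 0.5).astype(int)
table = np.zeros((2, 2), dtype=int)
for obs, prd in zip(y, pred):
    table[obs, prd] += 1
print("Classification table:\n", table)

# Hit ratio = sum of diagonal entries / total number of observations
hit_ratio = np.trace(table) / table.sum()
print(f"Hit ratio: {hit_ratio:.0%}  (rule of thumb: at least 70%)")
```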
ASSUMPTION
Correlation matrix: used for checking multicollinearity among the predictors.
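A quick way to run this check outside SPSS: compute the predictors' correlation matrix and scan the off-diagonal entries. The DataFrame and the 0.8 flag threshold below are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(1).normal(size=(300, 3)),
                  columns=["x1", "x2", "x3"])
corr = df.corr()
print(corr.round(2))

# Flag predictor pairs correlated strongly enough to suggest multicollinearity
high = [(a, b, round(corr.loc[a, b], 2))
        for i, a in enumerate(corr.columns)
        for b in corr.columns[i + 1:]
        if abs(corr.loc[a, b]) > 0.8]
print("Highly correlated pairs:", high or "none")
```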
FACTOR ANALYSIS
CASE 3(C) DATA
F1 = k1x1 + k2x2 + … + k9x9
F2 = l1x1 + l2x2 + … + l9x9
F3 = m1x1 + m2x2 + … + m9x9
F4 = n1x1 + n2x2 + … + n9x9
For x2 the coefficients are k2 = .71, l2 = .11, m2 = .32, n2 = .13 (x2 loads mainly on
F1); for x3 they are k3 = .06, l3 = .73, m3 = .19, n3 = .42 (x3 loads mainly on F2).
Thus, by deductive reasoning, we can conclude that variables which load heavily on the
same factor (say x1, x2, and x3) are mutually related, i.e., dependent.
The same variable will never appear in different factors.
Now, instead of the traditional explained variance, we will use shared variance.
PROCESS
Analyze > Data Reduction > Factor
Enter all the variables you want to run the factor analysis on.
Selection Variable: used when you want separate results for different categories. For
example, the factors for males and females might differ, or those for mothers and
fathers might differ.
Descriptives: select Coefficients, KMO and Bartlett's test of sphericity, Significance
levels, Univariate descriptives.
Extraction: Method = Principal Components (see the note on PCA below; ask about the
other methods); select Scree Plot; extract eigenvalues over 1.
Rotation: select Varimax (important).
Scores: Save as variables; Method = Regression (other options: Bartlett,
Anderson-Rubin; ask); display the factor score coefficient matrix.
Options: no changes.
If you run the factor analysis you get three new columns: FAC1_1, FAC2_1, FAC3_1.
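For reference, here is a rough Python analogue of this SPSS run, assuming the third-party factor_analyzer package (its principal-components method and varimax rotation mirror the choices above); the data frame is synthetic stand-in data, not the case data.

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Synthetic stand-in for the nine rating variables x1..x9
rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(300, 9)),
                  columns=[f"x{i}" for i in range(1, 10)])

fa = FactorAnalyzer(n_factors=3,         # Extraction > Number of factors = 3
                    rotation="varimax",  # Rotation > Varimax
                    method="principal")  # Method > Principal components
fa.fit(df)

loadings = pd.DataFrame(fa.loadings_, index=df.columns,
                        columns=["F1", "F2", "F3"])
print(loadings.round(3))                 # cf. the rotated component matrix

# Factor scores saved as new columns, like SPSS's FAC1_1, FAC2_1, FAC3_1
df[["FAC1_1", "FAC2_1", "FAC3_1"]] = fa.transform(df)
```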
OUTPUT ANALYSIS
Correlation Matrix
If all off-diagonal values are below 0.2 or 0.3, the variables are essentially
unrelated: no factor analysis is required, and the data are not suitable for factor
analysis.
Note: this is only indicative/suggestive of whether the data are suitable for factor
analysis. You only get an impression, which you have to confirm through the KMO and
Bartlett's tests.
1) Bartlett's Test
To confirm whether the nine variables as a whole are suitable for factor analysis,
i.e., whether there is a perceptual dimension beneath the data, we use Bartlett's test
of sphericity. Only if Bartlett's test is rejected (at 0.05 significance) is there a
significant relationship between the variables (data suitable for factor analysis).
Here the significance is .000, so we reject H0.
If the above two tests give conflicting results, use the first one.
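Bartlett's test of sphericity can also be computed by hand. This sketch uses the standard chi-square approximation (n observations, p variables) on made-up data.

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(data):
    """H0: the correlation matrix is an identity matrix (variables unrelated)."""
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    dof = p * (p - 1) / 2
    return chi2, stats.chi2.sf(chi2, dof)

rng = np.random.default_rng(3)
data = rng.normal(size=(300, 9))          # illustrative data
chi2, pval = bartlett_sphericity(data)
print(f"chi2 = {chi2:.2f}, p = {pval:.3f}")  # p < .05 -> reject H0: suitable
```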
NOTE: The above tests show whether all nine variables as a whole are suitable or not.
To see which variables are related, look at the COMMUNALITIES.
Note (PCA): Principal component analysis (PCA) involves a mathematical procedure that
transforms a number of possibly correlated variables into a smaller number of
uncorrelated variables called principal components. PCA was invented in 1901 by Karl
Pearson.
COMMUNALITIES
In this analysis we selected Extraction > Eigenvalues greater than 1.
Now suppose the eigenvalue of the fourth component is 0.999. This component should
then also be a factor, since it is nearly as important, but it is not included. In that
case, rerun the factor analysis selecting Extraction > Number of factors = 4.
Here 9 variables have been reduced to three factors, capturing 84% of the variation
and sacrificing 16% of it.
There is always a trade-off: the variance extracted should not be too low.
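The eigenvalue-greater-than-1 (Kaiser) rule and the variance trade-off can be checked directly from the correlation matrix; here is a small sketch on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(size=(300, 9))           # illustrative data
R = np.corrcoef(data, rowvar=False)

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest first
kept = eigvals > 1                               # Kaiser criterion
explained = eigvals / eigvals.sum()              # each component's share
print("Eigenvalues:", eigvals.round(3))
print(f"Factors retained: {kept.sum()}, "
      f"variance captured: {explained[kept].sum():.0%}")
```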
SCREE PLOT
A steep slope shows that more variance is being captured. The scree plot suggests how
many factors there should be; for this purpose the elbow criterion is used (the number
of factors to retain coincides with the elbow).
[Scree plot: Eigenvalue vs. Component Number (1-9)]
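A matplotlib version of the scree plot, reusing eigenvalues computed as above (synthetic data again):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)
R = np.corrcoef(rng.normal(size=(300, 9)), rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

plt.plot(range(1, len(eigvals) + 1), eigvals, marker="o")
plt.axhline(1.0, linestyle="--")     # Kaiser cutoff for reference
plt.xlabel("Component Number")
plt.ylabel("Eigenvalue")
plt.title("Scree Plot")
plt.show()                           # look for the elbow
```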
These tables show the correlation between each factor and each variable. The principal
component method puts the maximum shared variation into the first factor; thus most
variables get assigned to the first factor.
In the (unrotated) component matrix, since principal components extraction is used, the
first factor always captures the maximum variation, so the maximum number of variables
is included in it and interpretation of the factors becomes difficult. Thus the
component matrix should not be used for analysis.
Initially the axes are orthogonal: F1 and F2 are perpendicular. The axes are then
rotated in such a way that the maximum variation is captured. V1 and V3 belonged to F1
along with V6 and V9, but as the axes are rotated, V6 and V9 move to F2.
Also note that in the variance explained table, the total variance shared remains the
same for the component matrix and the rotated component matrix, but the variance
extracted by each factor changes with rotation.
Rotated Component Matrix(a)

                          Component
                          1      2      3
Count Per Box            .224   .865   .251
Price Charged            .193   .891   .243
Value                    .183   .862   .105
Unisex vs. Separate Sex  .244   .266   .902
Style                    .237   .220   .916
Absorbency               .850   .232   .256
Leakage                  .879   .182   .257
Comfort                  .863   .177   .157
Taping                   .768   .145   .079

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 4 iterations.
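To read such a matrix programmatically, one can assign each variable to the factor on which it loads highest; the loadings below are copied from the table above.

```python
import pandas as pd

loadings = pd.DataFrame(
    {"F1": [.224, .193, .183, .244, .237, .850, .879, .863, .768],
     "F2": [.865, .891, .862, .266, .220, .232, .182, .177, .145],
     "F3": [.251, .243, .105, .902, .916, .256, .257, .157, .079]},
    index=["Count Per Box", "Price Charged", "Value",
           "Unisex vs. Separate Sex", "Style", "Absorbency",
           "Leakage", "Comfort", "Taping"])

# Each variable goes to the factor with its highest absolute loading
assignment = loadings.abs().idxmax(axis=1)
for factor, vars_ in assignment.groupby(assignment):
    print(factor, "->", list(vars_.index))
```

This reproduces the grouping visible in the table: the absorbency/leakage/comfort/taping variables on one factor, the price/value variables on another, and unisex/style on the third.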
BEHAVIORAL ANALYSIS
D = b1x1 + b2x2 + … + b9x9
where D = discriminant score and the b's = discriminant loadings/coefficients (weights)
Classify > Discriminant
Statistics:
  Means
  Univariate ANOVAs
  Box's M
  Fisher's
  Unstandardized
  Within-groups correlation
Classify:
  Casewise results
  Summary table
Save:
  Predicted group membership
  Discriminant scores
  Probabilities of group membership
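A scikit-learn counterpart of this run on synthetic two-group data (the group sizes, means, and variable count are illustrative assumptions):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0.0, 1.0, size=(150, 9)),   # group 0: not preferred
               rng.normal(0.8, 1.0, size=(150, 9))])  # group 1: preferred
y = np.repeat([0, 1], 150)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

D = lda.decision_function(X)     # discriminant scores (like Save > Scores)
pred = lda.predict(X)            # predicted group membership
prob = lda.predict_proba(X)      # probabilities of group membership
print("Coefficients (b's):", lda.coef_.round(3))
print(f"Hit ratio: {(pred == y).mean():.0%}")
```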
Let's see what discriminant analysis does.
Count: Preferred = 4.12, Not Preferred = 4.08
Style: Preferred = 4.10, Not Preferred = 1.36
The group statistics table suggests which predictors are playing a role in
discriminating: if there is a large difference between the means of the two groups,
that variable is playing a role.
If p < .05, the predictor is playing a role; here all the predictors are playing a
role together.
WILKS' LAMBDA
Total variance = between-group variance + within-group variance
Wilks' lambda = within-group variance / total variance; smaller values mean the
predictor discriminates better.
Tests of Equality of Group Means

                          Wilks' Lambda     F       df1   df2   Sig.
Count Per Box                 .651       159.897     1    298   .000
Price Charged                 .702       126.733     1    298   .000
Value                         .780        83.930     1    298   .000
Unisex vs. Separate Sex       .610       190.575     1    298   .000
Style                         .699       128.296     1    298   .000
Absorbency                    .670       146.588     1    298   .000
Leakage                       .690       133.692     1    298   .000
Comfort                       .784        82.343     1    298   .000
Taping                        .880        40.477     1    298   .000
Arranging the predictors by discriminating power (smallest Wilks' lambda first):
1. Unisex vs. Separate Sex
2. Count Per Box
3. Absorbency
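The univariate Wilks' lambda behind this ranking is simply within-group sum of squares over total sum of squares; here is a sketch for a single predictor on made-up group data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
g0 = rng.normal(4.0, 1.0, 150)     # a predictor in the not-preferred group
g1 = rng.normal(4.8, 1.0, 150)     # the same predictor in the preferred group
x = np.concatenate([g0, g1])

ss_total = ((x - x.mean()) ** 2).sum()
ss_within = ((g0 - g0.mean()) ** 2).sum() + ((g1 - g1.mean()) ** 2).sum()
wilks = ss_within / ss_total       # smaller -> better discriminator
F, p = stats.f_oneway(g0, g1)      # matches the table's F and Sig. columns
print(f"Wilks' lambda = {wilks:.3f}, F = {F:.1f}, p = {p:.4f}")
```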
There are two assumptions that need to be satisfied for discriminant analysis:
1. The predictors should follow a normal distribution.
2. The groups should have equal variance-covariance matrices.
Test Results

Box's M            48.212
F     Approx.       1.036
      df1              45
      df2       242753.94
      Sig.           .405

Tests the null hypothesis of equal population covariance matrices.
Since .405 > .05, we accept H0: the groups have equal covariance matrices, and the
second assumption is satisfied.
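Box's M is not shipped with scipy or statsmodels, so here is a hand-rolled sketch of the usual chi-square approximation (formulas per the standard test; the two synthetic groups are illustrative).

```python
import numpy as np
from scipy import stats

def box_m(groups):
    """groups: list of (n_i x p) arrays, one per group."""
    k = len(groups)
    p = groups[0].shape[1]
    ns = np.array([g.shape[0] for g in groups])
    covs = [np.cov(g, rowvar=False) for g in groups]          # unbiased S_i
    pooled = sum((n - 1) * S for n, S in zip(ns, covs)) / (ns.sum() - k)

    # M = (N - k) ln|S_pooled| - sum (n_i - 1) ln|S_i|
    M = (ns.sum() - k) * np.log(np.linalg.det(pooled)) \
        - sum((n - 1) * np.log(np.linalg.det(S)) for n, S in zip(ns, covs))
    # Correction factor for the chi-square approximation
    c = ((2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (k - 1))) \
        * (np.sum(1 / (ns - 1)) - 1 / (ns.sum() - k))
    chi2 = M * (1 - c)
    dof = p * (p + 1) * (k - 1) / 2
    return chi2, stats.chi2.sf(chi2, dof)

rng = np.random.default_rng(8)
groups = [rng.normal(size=(150, 9)), rng.normal(size=(150, 9))]
chi2, pval = box_m(groups)
print(f"chi2 = {chi2:.2f}, p = {pval:.3f}")   # p > .05 -> accept H0
```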