
Factor Analysis

Communalities

Communalities indicate the amount of variance in each variable that is
accounted for by the extracted components.
Initial communalities are estimates of the variance in each variable
accounted for by all components or factors. For principal components
extraction, this is always equal to 1.0 for correlation analyses.
Extraction communalities are estimates of the variance in each variable
accounted for by the components. The communalities in this table are
all high, which indicates that the extracted components represent the
variables well. If any communalities are very low in a principal components extraction, you may need to extract another
component.
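
The tables themselves come from SPSS, but the arithmetic behind an extraction communality is simple: it is the sum of a variable's squared loadings across the retained components. A minimal Python sketch (the data matrix and the choice of three components are made up here, not the tutorial's car data):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Hypothetical data matrix: rows = cases, columns = variables.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))

    Z = StandardScaler().fit_transform(X)      # work with standardized variables
    pca = PCA(n_components=3).fit(Z)

    # Loadings: correlation of each variable with each retained component.
    loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

    # Extraction communality of a variable = sum of its squared loadings
    # across the retained components (the initial communality is 1.0 here).
    communalities = (loadings ** 2).sum(axis=1)
    print(communalities)
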
Total Variance Explained

The variance explained by the initial solution, extracted
components, and rotated components is displayed. This
first section of the table shows the Initial Eigenvalues.
The Total column gives the eigenvalue, or amount of
variance in the original variables accounted for by each
component.
The % of Variance column gives the ratio, expressed as a
percentage, of the variance accounted for by each
component to the total variance in all of the variables.
The Cumulative % column gives the percentage of
variance accounted for by the first n components. For
example, the cumulative percentage for the second
component is the sum of the percentage of variance for
the first and second components.
For the initial solution, there are as many components as variables, and in a correlation analysis, the sum of the
eigenvalues equals the number of components. You have requested that eigenvalues greater than 1 be extracted, so the
first three principal components form the extracted solution.
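
A quick sketch of how the three columns relate, using a made-up correlation matrix rather than the tutorial's data:

    import numpy as np

    # Hypothetical correlation matrix of p variables (any valid correlation matrix works).
    R = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.0]])

    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # Total column: eigenvalues, largest first
    pct = 100 * eigvals / eigvals.sum()              # % of Variance column
    cum = np.cumsum(pct)                             # Cumulative % column

    # For a correlation matrix, eigvals.sum() equals the number of variables.
    print(eigvals, pct, cum)
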

Total Variance Explained

The second section of the table shows the extracted
components. They explain nearly 88% of the variability in
the original ten variables, so you can considerably reduce
the complexity of the data set by using these components,
with only a 12% loss of information.


The rotation maintains the cumulative percentage of
variation explained by the extracted components, but that
variation is now spread more evenly over the components.
The large changes in the individual totals suggest that the
rotated component matrix will be easier to interpret than
the unrotated matrix.
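
To make the "spread more evenly" point concrete, here is a rough varimax sketch on a hypothetical loadings matrix: the per-component sums of squared loadings change, but their total, and hence the cumulative percentage explained, does not.

    import numpy as np

    def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
        """Rotate a loadings matrix L (variables x components) by the varimax criterion."""
        p, k = L.shape
        R = np.eye(k)
        d = 0.0
        for _ in range(max_iter):
            d_old = d
            Lam = L @ R
            u, s, vt = np.linalg.svd(
                L.T @ (Lam ** 3 - (gamma / p) * Lam @ np.diag(np.diag(Lam.T @ Lam))))
            R = u @ vt
            d = s.sum()
            if d_old != 0 and d / d_old < 1 + tol:
                break
        return L @ R

    # Hypothetical unrotated loadings for 5 variables on 2 components.
    L = np.array([[0.8, 0.3],
                  [0.7, 0.4],
                  [0.6, 0.5],
                  [0.3, 0.8],
                  [0.2, 0.7]])
    L_rot = varimax(L)

    # Per-component sums of squared loadings change under rotation,
    # but their total (the cumulative variance explained) stays the same.
    print((L ** 2).sum(axis=0), (L_rot ** 2).sum(axis=0))
    print((L ** 2).sum(), (L_rot ** 2).sum())
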

Scree Plot

The scree plot helps you to determine the optimal number of components. The eigenvalue of each component in the
initial solution is plotted.
Generally, you want to extract the components on the steep slope.
The components on the shallow slope contribute little to the solution.
The last big drop occurs between the third and fourth
components, so using the first three components is an easy
choice.
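
If you want to reproduce this kind of plot outside SPSS, a minimal sketch follows; the eigenvalues below are invented for illustration.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical eigenvalues from an initial principal components solution.
    eigvals = np.array([4.9, 2.4, 1.4, 0.4, 0.3, 0.2, 0.15, 0.1, 0.08, 0.07])

    plt.plot(np.arange(1, len(eigvals) + 1), eigvals, 'o-')
    plt.axhline(1.0, linestyle='--')      # the "eigenvalues greater than 1" cutoff
    plt.xlabel('Component number')
    plt.ylabel('Eigenvalue')
    plt.title('Scree plot')
    plt.show()
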

Rotated Component Matrix

The rotated component matrix helps you to determine what
the components represent.
The first component is most highly correlated with Price in
thousands and Horsepower. Price in thousands is a better
representative, however, because it is less correlated with
the other two components.
The second component is most highly correlated with
Length.
The third component is most highly correlated with Vehicle type.
This suggests that you can focus on Price in thousands, Length, and Vehicle type in further analyses, but you can do even
better by saving component scores.

Component Score Coefficient Matrix

For each case and each component, the component score is computed by multiplying the case's standardized variable
values (computed using listwise deletion) by the component's score coefficients. The resulting three component score
variables are representative of, and can be used in place of, the ten original variables with only a 12% loss of
information.
Using the saved components is also preferable to using Price in thousands, Length, and Vehicle type because the
components are representative of all ten original variables, and the components are not linearly correlated with each
other.
Although the linear correlation between the components is guaranteed to be 0, you should look at plots of the
component scores to check for outliers and nonlinear associations between the components.
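
A sketch of that computation in Python; the score coefficients here are obtained by the regression method for a principal components solution (loadings divided by their eigenvalues), and the data are made up:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10))                 # hypothetical cases x variables

    Z = StandardScaler().fit_transform(X)          # standardized variable values
    pca = PCA(n_components=3).fit(Z)

    # Score coefficients map standardized variables to component scores.
    loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
    score_coef = loadings / pca.explained_variance_    # variables x components

    scores = Z @ score_coef                        # one score per case per component
    # The three component score variables are uncorrelated with each other.
    print(np.corrcoef(scores, rowvar=False).round(2))
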

Scatterplot Matrix of Component Scores


The first plot in the first row shows the first component on the
vertical axis versus the second component on the horizontal
axis, and the order of the remaining plots follows from there.
The scatterplot matrix shows that the first component has a
skewed distribution, which is because Price in thousands is
skewed. A principal components extraction using a log-transformed price might give better results. The separation that
you see in the third component is explained by the fact that Vehicle type is a binary variable. There appears to be a
relationship between the first and third components, due to the fact that there are several expensive automobiles but no
"luxury trucks." This problem may be alleviated by using a log-transformed price, but if this does not solve the problem,
you may want to split the file on Vehicle type.
Cluster Analysis
K-means cluster analysis is a tool designed to assign cases to a fixed number of groups (clusters) whose
characteristics are not yet known but are based on a set of specified variables. It is most useful when you want
to classify a large number (thousands) of cases.
A good cluster analysis is:
Efficient. Uses as few clusters as possible.
Effective. Captures all statistically and commercially important clusters. For example, a cluster with five
customers may be statistically different but not very profitable.
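
For readers who want to experiment outside SPSS, a minimal k-means sketch follows; note that scikit-learn's KMeans chooses its initial centers differently from SPSS QUICK CLUSTER, and the customer data here are invented.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Hypothetical customer data: rows = customers, columns = usage/spend variables.
    rng = np.random.default_rng(2)
    X = rng.normal(size=(1000, 6))

    Z = StandardScaler().fit_transform(X)       # k-means is distance-based, so scale first

    km = KMeans(n_clusters=3, n_init=10, max_iter=20, random_state=0).fit(Z)

    print(km.cluster_centers_)                  # analogue of the final cluster centers
    print(np.bincount(km.labels_))              # number of cases in each cluster
    print(km.n_iter_)                           # iterations used (cf. the iteration history below)
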
Initial Cluster Centers

The initial cluster centers are the variable values of the k well-spaced observations.

Iteration History

The iteration history shows the progress of the clustering process at each step.
By the 14th iteration, the cluster centers have settled down to the general area
of their final location, and the last four iterations are minor adjustments.
If the algorithm stops because the maximum number of iterations is reached,
you may want to increase the maximum because the solution may otherwise be
unstable.

ANOVA

The ANOVA table indicates which variables contribute the most to your cluster solution.
Variables with large F values provide the greatest separation between clusters.

Final Cluster Centers

The final cluster centers are computed as the mean for each variable within each final cluster. The final cluster centers
reflect the characteristics of the typical case for each cluster.
Customers in cluster 1 tend to be big spenders who purchase a lot of services.
Customers in cluster 2 tend to be moderate spenders who purchase the "calling" services.
Customers in cluster 3 tend to spend very little and do not purchase many services.

Distances between Final Cluster Centers

This table shows the Euclidean distances between the final cluster
centers. Greater distances between clusters correspond to greater
dissimilarities.
Clusters 1 and 3 are most different.
Cluster 2 is approximately equally similar to clusters 1 and 3.
These relationships between the clusters can also be intuited from the final cluster centers, but this becomes more
difficult as the number of clusters and variables increases.
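
The distance computation itself is just the pairwise Euclidean distance between the center vectors; a tiny sketch with made-up centers:

    import numpy as np
    from scipy.spatial.distance import cdist

    # Hypothetical final cluster centers (3 clusters x 4 variables).
    centers = np.array([[ 1.2,  0.9,  1.1,  0.8],
                        [ 0.1,  0.4, -0.2,  0.3],
                        [-0.9, -0.8, -0.7, -0.9]])

    # Pairwise Euclidean distances between centers; larger = more dissimilar clusters.
    print(cdist(centers, centers))
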

Number of Cases in Each Cluster

A large number of cases were assigned to the third cluster, which unfortunately is
the least profitable group. Perhaps a fourth, more profitable, cluster could be
extracted from this "basic service" group.


Discriminant Analysis



Log determinants are a measure of the variability of the groups. Larger log determinants correspond to more
variable groups. Large differences in log determinants indicate groups that have different covariance matrices.
Since Box's M is significant, you should request separate matrices to see if it gives radically different classification
results. See the section on specifying separate-groups covariance matrices for more information.
Assessing the Contribution of Individual Predictors. Tests of Equality of Group Means:

The tests of equality of group means measure each independent variable's potential before the model is created.
Each test displays the results of a one-way ANOVA for the independent variable using the grouping variable as the
factor. If the significance value is greater than 0.10, the variable probably does not contribute to the model. According to
the results in this table, every variable in your discriminant model is significant. Wilks' lambda is another measure of a
variable's potential. Smaller values indicate the variable is better at discriminating between groups. The table suggests
that Debt to income ratio (x100) is best, followed by Years with current employer, Credit card debt in thousands, and
Years at current address.
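
A sketch of what one row of this table amounts to, using an invented predictor and grouping variable: a one-way ANOVA F test plus the univariate Wilks' lambda (within-groups sum of squares over total sum of squares).

    import numpy as np
    from scipy import stats

    # Hypothetical predictor and grouping variable (0 = nondefaulter, 1 = defaulter).
    rng = np.random.default_rng(3)
    group = rng.integers(0, 2, size=500)
    debt_ratio = rng.normal(loc=10 + 3 * group, scale=4)    # group means differ

    # One-way ANOVA of the predictor with the grouping variable as factor.
    f_stat, p_value = stats.f_oneway(debt_ratio[group == 0], debt_ratio[group == 1])

    # Univariate Wilks' lambda = within-groups SS / total SS for this predictor.
    ss_total = ((debt_ratio - debt_ratio.mean()) ** 2).sum()
    ss_within = sum(((debt_ratio[group == g] - debt_ratio[group == g].mean()) ** 2).sum()
                    for g in (0, 1))
    print(f_stat, p_value, ss_within / ss_total)
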
Standardized Canonical Discriminant Function Coefficients

The standardized coefficients allow you to compare variables measured on
different scales. Coefficients with large absolute values correspond to
variables with greater discriminating ability.
This table downgrades the importance of Debt to income ratio (x100), but the
order is otherwise the same.
Structure Matrix
The structure matrix shows the correlation of each predictor variable with the
discriminant function. The ordering in the structure matrix is the same as that
suggested by the tests of equality of group means and is different from that in
the standardized coefficients table. This disagreement is likely due to the
collinearity between Years with current employer and Credit card debt in
thousands noted in the correlation matrix. Since the structure matrix is
unaffected by collinearity, it's safe to say that this collinearity has inflated the
importance of Years with current employer and Credit card debt in thousands
in the standardized coefficients table. Thus, Debt to income ratio (x100) best discriminates between defaulters and
nondefaulters.
Assessing Model Fit. Eigenvalues.
The eigenvalues table provides information
about the relative efficacy of each
discriminant function. When there are two
groups, the canonical correlation is the most
useful measure in the table, and it is equivalent to Pearson's correlation between the discriminant scores and the
groups.
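
That equivalence is easy to verify; a sketch with invented data, where scikit-learn's LDA stands in for the SPSS procedure:

    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(4)
    group = rng.integers(0, 2, size=500)                     # hypothetical default indicator
    X = rng.normal(size=(500, 4)) + group[:, None] * 0.8     # predictors shifted by group

    lda = LinearDiscriminantAnalysis(n_components=1).fit(X, group)
    scores = lda.transform(X).ravel()                        # discriminant scores

    # With two groups, the canonical correlation equals the Pearson correlation
    # between the discriminant scores and group membership.
    r, _ = pearsonr(scores, group)
    print(abs(r))
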
Wilks' Lambda
Wilks' lambda is a measure of how well each
function separates cases into groups. It is
equal to the proportion of the total variance in
the discriminant scores not explained by
differences among the groups. Smaller values of Wilks' lambda indicate greater discriminatory ability of the function.
The associated chi-square statistic tests the hypothesis that the means of the functions listed are equal across groups.
The small significance value indicates that the discriminant function does better than chance at separating the groups.
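
A tiny numeric illustration of that definition, with invented discriminant scores for two groups:

    import numpy as np

    # Hypothetical discriminant scores for two groups.
    scores_g1 = np.array([-1.2, -0.8, -1.0, -0.5, -0.9])
    scores_g2 = np.array([ 0.7,  1.1,  0.9,  0.4,  1.3])
    all_scores = np.concatenate([scores_g1, scores_g2])

    ss_total = ((all_scores - all_scores.mean()) ** 2).sum()
    ss_within = (((scores_g1 - scores_g1.mean()) ** 2).sum()
                 + ((scores_g2 - scores_g2.mean()) ** 2).sum())

    # Wilks' lambda: share of score variance NOT explained by group differences.
    print(ss_within / ss_total)
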
Model Validation.

The classification table shows the practical results of using
the discriminant model. Of the cases used to create the
model, 94 of the 124 people who previously defaulted are
classified correctly. 281 of the 375 nondefaulters are
classified correctly. Overall, 75.2% of the cases are
classified correctly. Classifications based upon the cases
used to create the model tend to be too "optimistic" in
the sense that their classification rate is inflated. The
cross-validated section of the table attempts to correct
this by classifying each case while leaving it out from the
model calculations; however, this method is generally still
more "optimistic" than subset validation. Subset
validation is obtained by classifying past customers who
were not used to create the model. These results are
shown in the Cases Not Selected section of the table. 77.1
percent of these cases were correctly classified by the model. This suggests that, overall, your model is in fact correct
about three out of four times. The 150 ungrouped cases are the prospective customers, and the results here simply give
a frequency table of the model-predicted groupings of these customers. Since Box's M is significant, it's worth running a
second analysis to see whether using a separate-groups covariance matrix changes the classification.
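
A sketch of the three kinds of accuracy the table distinguishes, using invented data and scikit-learn's LDA in place of the SPSS procedure:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split

    rng = np.random.default_rng(5)
    y = rng.integers(0, 2, size=400)                        # hypothetical default indicator
    X = rng.normal(size=(400, 4)) + y[:, None] * 0.7        # hypothetical predictors

    # Subset validation: fit on one part of the data, classify the held-out part.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
    print("training accuracy:", lda.score(X_tr, y_tr))      # tends to be optimistic
    print("holdout accuracy :", lda.score(X_te, y_te))

    # Leave-one-out cross-validation: classify each case with it left out of the fit.
    loo_acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut()).mean()
    print("leave-one-out    :", loo_acc)
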
Specifying Separate-Groups Covariance Matrices.

The classification results have not
changed much, so it's probably not
worth using separate covariance
matrices. Box's M can be overly
sensitive to large data files, which is
likely what happened here.
Adjusting Prior Probabilities
This table displays the prior probabilities for
membership in groups. A prior probability is an estimate
of the likelihood that a case belongs to a particular
group when no other information about it is available.
Unless you specified otherwise, it is assumed that a case
is equally likely to be a defaulter or nondefaulter. Prior
probabilities are used along with the data to determine the classification functions. Adjusting the prior probabilities
according to the group sizes can improve the overall classification rate.

The prior probabilities are now based on the sizes of the
groups. A priori, 75.2% of the cases are nondefaulters,
so the classification functions will now be weighted
more heavily in favor of classifying cases as
nondefaulters.
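
A sketch of the effect of priors, again with invented data and scikit-learn's LDA, whose priors argument plays the role of the prior probabilities discussed here:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(6)
    y = (rng.random(500) < 0.25).astype(int)            # roughly 25% defaulters
    X = rng.normal(size=(500, 4)) + y[:, None] * 0.7

    equal = LinearDiscriminantAnalysis(priors=[0.5, 0.5]).fit(X, y)   # equal priors
    sized = LinearDiscriminantAnalysis(priors=None).fit(X, y)         # priors from group sizes

    # Group-size priors usually raise overall accuracy but flag fewer defaulters.
    for name, m in [("equal priors", equal), ("group-size priors", sized)]:
        pred = m.predict(X)
        print(name, "overall:", (pred == y).mean(),
              "defaulters caught:", pred[y == 1].mean())
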

The overall classification rate is higher for
these classifications than for the ones based
on equal priors. Unfortunately, this comes at
the cost of misclassifying a greater percentage
of defaulters. If you need to be conservative
in your lending, then your goal is to identify
defaulters, and you'd be better off using equal
priors. If you can be more aggressive in your
lending, then you can afford to use unequal
priors. Summary: Using Discriminant Analysis,
you created a model that classifies customers
as high or low credit risks. Box's M showed a
possible problem with heterogeneity of the covariance matrices, although further investigation revealed this was
probably an effect of the size of the data file. The use of unequal priors to take advantage of the fact that nondefaulters
outnumber defaulters resulted in a higher overall classification rate but at the cost of missing defaulters.
Stepwise Discriminant Analysis

The stepwise method starts with a model that
doesn't include any of the predictors. At each step,
the predictor with the largest F to Enter value that
exceeds the entry criteria (by default, 3.84) is
added to the model. The variables left out of the
analysis at the last step all have F to Enter values
smaller than 3.84, so no more are added.

This table displays statistics for the variables that are in the
analysis at each step. Tolerance is the proportion of a variable's variance not accounted for by other independent
variables in the equation. A variable with very low tolerance contributes little information to a model and can cause
computational problems. F to Remove values are useful for describing what happens if a variable is removed from the
current model (given that the other variables remain). F to Remove for the entering variable is the same as F to Enter at
the previous step (shown in the Variables Not in the Analysis table). Stepwise methods are convenient, but have their
limitations. Be aware that because stepwise methods select models based solely upon statistical merit, they may choose
predictors that have no practical significance. If you have some experience with the data and have expectations about
which predictors are important, you should use that knowledge and eschew stepwise methods. If, however, you have
many predictors and no idea where to start, running a stepwise analysis and adjusting the selected model is better than
no model at all.
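
Tolerance is straightforward to compute directly; a sketch on invented predictors, where the first is made nearly collinear with the second:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(7)
    X = rng.normal(size=(300, 5))
    X[:, 0] = 0.8 * X[:, 1] + 0.2 * rng.normal(size=300)   # make predictor 0 nearly collinear

    def tolerance(X, j):
        """Tolerance of predictor j: 1 - R^2 from regressing it on the other predictors."""
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        return 1.0 - r2

    print([round(tolerance(X, j), 3) for j in range(X.shape[1])])
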
Checking model fit

Nearly all of the variance explained by the
model is due to the first two discriminant
functions. Three functions are fit
automatically, but due to its minuscule
eigenvalue, you can fairly safely ignore the third.

Wilks' lambda agrees that only the first two functions are useful. For each set of functions, this tests the hypothesis that
the means of the functions listed are equal across groups. The test of function 3 has a significance value greater than
0.10, so this function contributes little to the model.
Structure Matrix
When there is more than one discriminant function, an
asterisk (*) marks each variable's largest absolute correlation
with one of the canonical functions. Within each function, these
marked variables are then ordered by the size of the correlation.
Level of education is most strongly correlated with the first
function, and it is the only variable most strongly correlated with
this function. Years with current employer, Age in years,
Household income in thousands, Years at current address,
Retired, and Gender are most strongly correlated with the
second function, although Gender and Retired are more weakly
correlated than the others. The other variables mark this
function as a "stability" function. Number of people in
household and Marital status are most strongly correlated with
the third discriminant function, but this is a useless function, so
these are nearly useless predictors.
Territorial Map

The territorial map helps you to study the relationships between the groups and the discriminant functions. Combined
with the structure matrix results, it gives a graphical interpretation of the relationship between predictors and groups.
The first function, shown on the horizontal axis, separates group 4 (Total service customers) from the others. Since Level
of education is strongly positively correlated with the first function, this suggests that your Total service customers are,
in general, the most highly educated. The second function separates groups 1 and 3 (Basic service and Plus service
customers). Plus service customers tend to have been working longer and are older than Basic service customers.
E-service customers are not separated well from the others, although the map suggests that they tend to be well educated
with a moderate amount of work experience. In general, the closeness of the group centroids, marked with asterisks (*),
to the territorial lines suggests that the separation between all groups is not very strong. Only the first two discriminant
functions are plotted, but since the third function was found to be rather insignificant, the territorial map offers a
comprehensive view of the discriminant model.
Classification Results
From Wilks' lambda, you know that your
model is doing better than guessing, but you
need to turn to the classification results to
determine how much better. Given the
observed data, the "null" model (that is, one
without predictors) would classify all
customers into the modal group, Plus service.
Thus, the null model would be correct 281/1000 = 28.1% of the time. Your model classifies 39.5% of the customers
correctly, an improvement of 11.4 percentage points. In particular, your model excels at identifying Total service customers. However, it does an exceptionally
poor job of classifying E-service customers. You may need to find another predictor in order to separate these
customers. You have created a discriminant model that classifies customers into one of four predefined "service usage"
groups, based on demographic information from each customer. Using the structure matrix and territorial map, you
identified which variables are most useful for segmenting your customer base. Lastly, the classification results show that
the model does poorly at classifying E-service customers. More research is required to determine another predictor
variable that better classifies these customers, but depending on what you are looking to predict, the model may be
perfectly adequate for your needs. For example, if you are not concerned with identifying E-service customers, the model
may be accurate enough for you. This may be the case if E-service is a loss leader that brings in little profit. If,
for example, your highest return on investment comes from Plus service or Total service customers, the model may give
you the information you need.
Correspondence Analysis
Dimensionality:
The inertia per dimension shows the
decomposition of the total inertia along each
dimension. Two dimensions account for 83% of
the total inertia. Adding a third dimension adds
only 8.6% to the accounted-for inertia. Thus, you
elect to use a two-dimensional representation.
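
The decomposition comes from the singular values of the matrix of standardized residuals of the table; a sketch on a small invented frequency table:

    import numpy as np

    # Hypothetical attribute-by-brand frequency table (rows = image attributes, cols = brands).
    N = np.array([[10, 4, 2, 1],
                  [ 3, 9, 5, 2],
                  [ 2, 3, 8, 6],
                  [ 1, 2, 4, 9]], dtype=float)

    P = N / N.sum()                                  # correspondence matrix
    r = P.sum(axis=1)                                # row masses
    c = P.sum(axis=0)                                # column masses

    # Standardized residuals; their singular values give the inertia per dimension.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    singular_values = np.linalg.svd(S, compute_uv=False)

    inertia = singular_values ** 2                   # inertia of each dimension
    print(inertia / inertia.sum())                   # proportion of total inertia per dimension
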
Contributions: The row points overview shows the
contributions of the row points to the inertia of the
dimensions and the contributions of the dimensions to
the inertia of the row points. If all points contributed
equally to the inertia, the contributions would be 0.043.
Healthy and low fat both contribute a substantial portion
to the inertia of the first dimension. Men and tough
contribute the largest amounts to the inertia of the
second dimension. Both ugly and fresh contribute very
little to either dimension.
Two dimensions contribute a large amount to the inertia
for most row points. The large contributions of the first
dimension to healthy, new, attractive, low fat, nutritious,
and women indicate that these points are very well
represented in one dimension. Consequently, the higher
dimensions contribute little to the inertia of these points,
which will lie very near the horizontal axis. The second
dimension contributes most to men, premium, and tough.
Both dimensions contribute very little to the inertia for South Australian and ugly, so these points are poorly
represented.
The column points overview displays the
contributions involving the column points. Brands
CC and DD contribute the most to the first
dimension, whereas EE and FF explain a large
amount of the inertia for the second dimension.
AA and BB contribute very little to either
dimension.
In two dimensions, all brands but BB are well
represented. CC and DD are represented well in
one dimension. The second dimension contributes the largest amounts for EE and FF. Notice that AA is
represented well in the first dimension but does not have a very high contribution to that dimension.
Plots:
The row points plot shows that fresh and ugly are both very close to
the origin, indicating that they differ little from the average row profile. Three general classifications emerge. Located in
the upper left of the plot, tough, men, and working are all similar to each other. The lower left contains sweet, fattening,
children, and premium. In contrast, healthy, low fat, nutritious, and new cluster on the right side of the plot.
Notice in the column points plot that all brands are far from the
origin, so no brand is similar to the overall centroid. Brands CC
and DD group together at the right, whereas brands BB and FF
cluster in the lower half of the plot. Brands AA and EE are not
similar to any other brand.
Symmetrical Normalization
In the upper left of the resulting biplot, brand EE is the
only tough, working brand and appeals to men. Brand AA is the most popular and also viewed as the most highly
caffeinated. The sweet, fattening brands include BB and FF. Brands CC and DD, while perceived as new and healthy, are
also the most unpopular. For further interpretation, you can draw a line through the origin and the two image attributes
men and yuppies, and project the brands onto this line. The two attributes are opposed to each other, indicating that
the association pattern of brands for men is reversed compared to the pattern for yuppies. That is, men are most
frequently associated with brand EE and least frequently with brand CC, whereas yuppies are most frequently associated
with brand CC and least frequently with brand EE.
