Factor Analysis
Analysis of Interdependence: for data reduction and the discovery of underlying themes in the data
FA is based on analysing the correlation matrix of attributes and aims to identify questions that measure what respondents see as similar or related concepts. Essentially, factor analysis is applied as a data reduction or structure detection method.
Illustration
Can a set of 30 imagery statements for the shampoo category be simplified without any loss of information?
There seem to be as many as 42 purchase decision criteria for my category. Can you help summarize these criteria?
What would Customer Service Orientation constitute? Which variables? Can the variables be grouped into themes / dimensions?
I want to deploy an objectively tested, valid scale for my Customer Satisfaction studies. My team has developed a huge battery of statements. Can you help?
Factor Analysis
Investigates interrelationships among variables.
Variable reduction exercise: reduces the variables to a smaller set of factors with minimal loss of information.
Used to define or discover themes or underlying (latent) dimensions of a large set of attributes / variables.
Often an intermediate step to some other procedure, e.g. factors are used as independent variables in multiple regression.
Interdependence technique: no variable is designated dependent or independent.
All variables should be metric (interval).
Large samples are preferred.
The worth of the solution often depends on the intuitive interpretability of the factors rather than on statistical rules.
One can summarize the correlation between two variables in a scatter plot. A regression line can be fitted that represents the best summary of the linear relationship between the two variables.
Ajay Macaden
[Figure: scatter plot; axis label: Satisfaction with Hobbies; 7-point scales]
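The scatter-plot summary described above can be sketched in a few lines. This is a minimal illustration, not the data behind the slide: the ratings and the second variable (satisfaction with work) are hypothetical, and the helper name `pearson_and_line` is made up for this example.

```python
# Hypothetical 7-point ratings from ten respondents (not data from the slides).
satisfaction_work = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7]
satisfaction_hobbies = [1, 3, 2, 4, 5, 4, 6, 5, 7, 6]


def mean(values):
    return sum(values) / len(values)


def pearson_and_line(x, y):
    """Return (r, slope, intercept) for the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / (sxx * syy) ** 0.5   # Pearson correlation coefficient
    slope = sxy / sxx              # least-squares slope
    return r, slope, my - slope * mx


r, slope, intercept = pearson_and_line(satisfaction_work, satisfaction_hobbies)
print(f"r = {r:.3f}; hobbies = {intercept:.3f} + {slope:.3f} * work")
```

The fitted line always passes through the point of means, which is why it serves as a one-line summary of the cloud of points.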
Orthogonal Factors
After we have found the line on which the variance is maximal, there remains variability around this line. We continue and define another line that maximizes the remaining variability. In this manner consecutive factors are extracted. Because each factor is defined to maximize the variability that is not captured by the preceding factors, consecutive factors are independent of each other.
Put another way, consecutive factors are uncorrelated, or ORTHOGONAL, to each other.
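The successive extraction described above can be sketched with an eigendecomposition of the correlation matrix, which is one common way to obtain principal components. The data here are simulated and hypothetical (two latent themes driving six variables), and NumPy is assumed to be available; the point is only to show that the resulting component scores are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: two latent themes drive six observed variables.
themes = rng.normal(size=(n, 2))
mixing = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
                   [0.1, 0.8], [0.0, 0.9], [0.2, 0.7]])
X = themes @ mixing.T + 0.5 * rng.normal(size=(n, 6))

# Standardize, then decompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)

# eigh returns ascending eigenvalues; put the largest-variance direction first.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Scores on each successive component, and their mutual correlations.
scores = Z @ eigvecs
score_corr = np.corrcoef(scores, rowvar=False)
max_off_diagonal = np.abs(score_corr - np.eye(6)).max()

print("variance captured by each component:", np.round(eigvals, 3))
print("largest correlation between different components:", max_off_diagonal)
```

Each eigenvector is the next "line" of maximal remaining variance, and the off-diagonal correlations between component scores are zero up to rounding error.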
Which statements did respondents rate similarly, i.e. which statements are correlated? Correlated statements point to common themes in the data.
Q1 Relaxed
Q2 Friendly
Q3 Nervous
Q4 Tolerate it
Q5 Easy
Q6 Interesting
Q7 Uncertain
Q8 Waste of time
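Factor loadings

Statement             Factor 1   Factor 2
Q2 Friendly            0.823
Q1 Relaxed             0.803
Q6 Interesting         0.732
Q5 Easy                0.725
Q4 Tolerate it         0.456
Q7 Uncertain                      0.767
Q3 Nervous                        0.697
Q8 Waste of time                  0.691

Smaller cross-loadings also shown: -0.186, -0.265, 0.253, -0.144 (their row positions are not recoverable).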
Factor loadings (rounded, small loadings suppressed)

Statement             Factor 1   Factor 2
Q2 Friendly            0.82
Q1 Relaxed             0.80
Q6 Interesting         0.73
Q5 Easy                0.72
Q4 Tolerate it         0.45
Q7 Uncertain                      0.77
Q3 Nervous                        0.70
Q8 Waste of time                  0.70
The other four statements load on the second factor: negative about bus travel. "Tolerate it" loads on both factors.
Q1 Relaxed
Q2 Friendly
Q5 Easy
Q6 Interesting
Q4 Tolerate it
Q3 Nervous
Q7 Uncertain
Q8 Waste of time
Total Variance Explained
Extraction Method: Principal Component Analysis.

            Initial Eigenvalues               Extraction Sums of Squared Loadings   Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cum. %    Total   % of Variance   Cum. %        Total   % of Variance   Cum. %
1           5.459   49.628           49.628   5.459   49.628          49.628        2.894   26.312          26.312
2           1.249   11.359           60.986   1.249   11.359          60.986        2.634   23.948          50.260
3            .900    8.179           69.165    .900    8.179          69.165        2.080   18.905          69.165
4                    7.546           76.711
5                    5.736           82.448
6                    4.348           86.795
7                    3.917           90.713
8                    3.208           93.921
9                    2.682           96.603
10                   1.850           98.453
11           .170    1.547          100.000

(Eigenvalue totals for components 4 to 10 are not shown.)

How much of the total variation in the data is explained by the factors? The factors should explain at least 2/3 of the variance. In this data, the first three factors explain 69% of the variance.
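The two-thirds check can be reproduced directly from the "% of Variance" figures in the table. The sketch below only re-adds the published percentages; the variable names are illustrative.

```python
# "% of Variance" for the 11 components, copied from the table above.
pct_variance = [49.628, 11.359, 8.179, 7.546, 5.736, 4.348,
                3.917, 3.208, 2.682, 1.850, 1.547]

target = 100 * 2 / 3  # "at least 2/3 of the variance"

# Running total of variance explained as factors are added one by one.
cumulative = []
running = 0.0
for pct in pct_variance:
    running += pct
    cumulative.append(running)

# Smallest number of factors whose cumulative share reaches the target.
n_factors = next(i + 1 for i, c in enumerate(cumulative) if c >= target)

print(f"{n_factors} factors explain {cumulative[n_factors - 1]:.1f}% of the variance")
```

Two factors reach only about 61%, so the third factor is the first point at which the two-thirds guideline is met.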
Rotated Component Matrix

Attribute                                                                       1       2       3
(label not recovered)                                                          .865    .257   -.006
(label not recovered)                                                          .836    .101    .192
(label not recovered)                                                          .741    .197    .432
(label not recovered)                                                          .657    .326    .267
(label not recovered)                                                          .251    .849    .086
(label not recovered)                                                          .187    .809    .208
(label not recovered)                                                          .425    .593    .283
(label not recovered)                                                          .074    .575    .458
One of the insurance companies that I would first recommend to my customers   .172    .086    .821
Has strong working relationships with its distributors/intermediaries          .200    .342    .689
Established local insurance company                                            .334    .481    .543

Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.
Review factor loadings to decipher the factors. The factor loadings are the correlations between the factor and the attribute.
Factor 2: Reputation
A three-factor solution is selected for these data:
1. Practical solutions
2. Reputation
3. Distribution / how well established
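One way to read a loading matrix is to assign each attribute to the factor on which it loads most strongly and to flag attributes that also load noticeably elsewhere. The sketch below uses the eleven rows of rotated loadings from the insurance example; the 0.4 cut-off for a "noticeable" secondary loading is an assumption made for illustration, not a rule from the slides.

```python
# Rotated loadings from the insurance example (11 attributes x 3 factors).
loadings = [
    [.865, .257, -.006], [.836, .101, .192], [.741, .197, .432],
    [.657, .326, .267], [.251, .849, .086], [.187, .809, .208],
    [.425, .593, .283], [.074, .575, .458], [.172, .086, .821],
    [.200, .342, .689], [.334, .481, .543],
]
factor_names = ["Practical solutions", "Reputation",
                "Distribution / how well established"]
CROSS_LOADING = 0.4  # hypothetical threshold for a secondary loading

assigned = []       # factor number (1-based) with the largest absolute loading
cross_loaded = []   # row numbers (1-based) that also load on another factor
for row_number, row in enumerate(loadings, start=1):
    magnitudes = [abs(value) for value in row]
    best = magnitudes.index(max(magnitudes))
    assigned.append(best + 1)
    others = [m for j, m in enumerate(magnitudes) if j != best]
    if max(others) >= CROSS_LOADING:
        cross_loaded.append(row_number)

for row_number, factor in enumerate(assigned, start=1):
    flag = " (cross-loads)" if row_number in cross_loaded else ""
    print(f"attribute {row_number}: factor {factor}, {factor_names[factor - 1]}{flag}")
```

Attributes flagged as cross-loading are the ones whose interpretation deserves a second look before a factor is named.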
Key Concepts
Eigenvalue
Also called a characteristic root. The eigenvalue for a given factor measures the variance in all the variables that is accounted for by that factor. The ratio of eigenvalues is the ratio of explanatory importance of the factors with respect to the variables. If a factor has a low eigenvalue, it contributes little to the explanation of variance in the variables and may be ignored as redundant with more important factors.
Variance Explained
To get the percent of variance in all the variables accounted for by each factor, sum the squared factor loadings for that factor (column) and divide by the number of variables. (Note that the number of variables equals the sum of their variances, as the variance of a standardized variable is 1.) This is the same as dividing the factor's eigenvalue by the number of variables.
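As a worked check of this rule, the sketch below sums the squared rotated loadings from the insurance example column by column. The results should match the rotated totals reported for that example (2.894, 2.634, 2.080) up to rounding of the published loadings.

```python
# Rotated loadings from the insurance example (11 attributes x 3 factors).
loadings = [
    [.865, .257, -.006], [.836, .101, .192], [.741, .197, .432],
    [.657, .326, .267], [.251, .849, .086], [.187, .809, .208],
    [.425, .593, .283], [.074, .575, .458], [.172, .086, .821],
    [.200, .342, .689], [.334, .481, .543],
]
n_variables = len(loadings)

# Sum of squared loadings per factor (column).
sum_sq = [sum(row[j] ** 2 for row in loadings) for j in range(3)]

# Share of total variance: divide by the number of variables.
pct = [100 * s / n_variables for s in sum_sq]

for j, (s, p) in enumerate(zip(sum_sq, pct), start=1):
    print(f"factor {j}: sum of squared loadings = {s:.3f}, variance explained = {p:.1f}%")
print(f"total explained by the three factors: {sum(pct):.1f}%")
```

The three shares add up to roughly 69%, the same cumulative figure reported for the three-factor solution.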
Types of Rotation
Varimax rotation is an orthogonal rotation of the factor axes to maximize the variance of the squared loadings of a factor (column) on all the variables (rows) in a factor matrix, which has the effect of differentiating the original variables by extracted factor. Each factor will tend to have either large or small loadings of any particular variable. A varimax solution yields results which make it as easy as possible to identify each variable with a single factor. This is the most common rotation option.
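A minimal sketch of a varimax rotation follows, using the widely circulated SVD-based iteration and assuming NumPy is available. It omits Kaiser normalization, the unrotated loadings are hypothetical, and the function names are illustrative, so treat it as a demonstration of the idea rather than a replacement for a statistics package.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a (variables x factors) loading matrix orthogonally (raw varimax)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target of the varimax criterion.
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix to the target direction
        if previous != 0.0 and s.sum() < previous * (1 + tol):
            break
        previous = s.sum()
    return loadings @ rotation, rotation


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return (loadings ** 2).var(axis=0).sum()


# Hypothetical unrotated loadings: a general first factor and a bipolar second one.
unrotated = np.array([[0.70, 0.45], [0.65, 0.50], [0.75, 0.40],
                      [0.60, -0.50], [0.55, -0.55], [0.65, -0.45]])
rotated, rotation = varimax(unrotated)

print(np.round(rotated, 2))
print("criterion before:", round(varimax_criterion(unrotated), 4),
      "after:", round(varimax_criterion(rotated), 4))
```

Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged; only the way that variance is split between factors changes.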
Promax rotation is an alternative non-orthogonal (oblique) rotation method which is computationally faster than the direct oblimin method and is therefore sometimes used for very large datasets.
Scree Plot
[Figure: scree plot of Eigenvalue against Component Number]
Statistics to look at
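KMO (Kaiser-Meyer-Olkin measure): should be more than 0.5.
Indicates whether the partial correlations between variables are small or large relative to the ordinary correlations.
Ranges from 0 to 1 and should be close to 1. A value below 0.5 implies factor analysis won't be useful.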
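The slides do not give the KMO formula; the sketch below uses the standard overall definition, the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations (off-diagonal entries only). The data are simulated and hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure for a correlation matrix."""
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.diag(inv))
    # Partial correlation of each pair, controlling for all other variables.
    partial = -inv / np.outer(d, d)
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diagonal] ** 2)
    p2 = np.sum(partial[off_diagonal] ** 2)
    return r2 / (r2 + p2)


# Hypothetical data: six variables that share one common theme.
rng = np.random.default_rng(1)
common = rng.normal(size=(500, 1))
X = 0.8 * common + 0.6 * rng.normal(size=(500, 6))

kmo_value = kmo(np.corrcoef(X, rowvar=False))
print(f"KMO = {kmo_value:.3f}")
```

When variables share a common theme, the partial correlations are small compared with the raw correlations and the measure moves toward 1.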
We must strike a balance between, on the one hand, having enough factors to explain the variation in the original data satisfactorily and, on the other, not having so many factors that little or no data reduction is achieved.
Look for at least 65-70% of variance explained with scale data, or 50%+ with binary data.
How big a sample is needed?
The larger the sample size, the more accurately we can estimate the correlations between questions and the more repeatable the analysis will be.
A sample of 400 or more should provide a stable factor analysis.
Minimum sample size of c. 200.
Question
Factor - respondent vs response ?
Factor vs Cluster ?
Not all variability in the data is usually accounted for in factor analysis.
Factors can be hard to interpret, since they represent many measures.
Factors depend on the data and can differ for different sets of data.
What it does
Identifies underlying families of parameters that are highly correlated; each family represents a different factor.
Helps reduce data from a large number of parameters to a small number of factors.
Produces a set of independent variables to be used for further analysis.
Examples of application
What are the main characteristics on which consumers form brand images in their minds?
What are the main service aspects retailers consider when evaluating overall satisfaction with the service provided by a supplier?
Which 10 or 15 attributes should I finalize for measurement from a list of 30 attributes?
Requirements
Type of scales: interval.
Free association data (ordinal type) could be treated as binary interval data.
Some binary nominal scales (involving opposites) could be treated as interval.
Exclude variables with very low variance, if any.
If a pair of variables has a very high correlation, keep one and exclude the other.
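The last two requirements can be sketched as a simple screening pass. The thresholds (`min_variance=0.1`, `max_abs_corr=0.9`), the function name and the simulated data are all assumptions made for illustration; the slides do not specify what counts as "very low" or "very high". NumPy is assumed to be available.

```python
import numpy as np


def screen_variables(X, names, min_variance=0.1, max_abs_corr=0.9):
    """Drop near-constant variables and one of each highly correlated pair."""
    variances = X.var(axis=0, ddof=1)
    corr = np.corrcoef(X, rowvar=False)
    kept = []
    for j in range(X.shape[1]):
        if variances[j] < min_variance:
            continue  # very low variance: exclude
        if any(abs(corr[j, k]) > max_abs_corr for k in kept):
            continue  # nearly duplicates a variable already kept: exclude
        kept.append(j)
    return [names[j] for j in kept]


# Hypothetical ratings: b nearly duplicates a, c is almost constant.
rng = np.random.default_rng(2)
a = 4 + 1.5 * rng.normal(size=300)
b = a + 0.1 * rng.normal(size=300)
c = 4 + 0.01 * rng.normal(size=300)
d = 4 + 1.5 * rng.normal(size=300)
X = np.column_stack([a, b, c, d])

selected = screen_variables(X, ["a", "b", "c", "d"])
print(selected)
```

Which member of a highly correlated pair is kept depends here only on column order; in practice that choice would normally rest on which statement is easier to interpret.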
Guidelines
Number of factors, based on:
  Total variance explained above 60%.
  Eigenvalue above 1.
  Number of variables divided by 3 or 4.
KMO measure: ranges from 0 to 1 and should be close to 1. A value below 0.5 implies factor analysis won't be useful.
Bartlett's test: the significance level should be very small, say below 0.05, and NOT above 0.10.
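The eigenvalue and variables-divided-by-3-or-4 guidelines can be tried on the insurance example. The sketch assumes standardized variables, so that each eigenvalue equals its share of variance times the number of variables; the eigenvalues are therefore derived from the published "% of Variance" column rather than read directly from the output.

```python
# "% of Variance" for the 11 components in the insurance example.
pct_variance = [49.628, 11.359, 8.179, 7.546, 5.736, 4.348,
                3.917, 3.208, 2.682, 1.850, 1.547]
n_variables = len(pct_variance)

# Eigenvalues of a correlation matrix sum to the number of variables,
# so eigenvalue = share of variance * number of variables.
eigenvalues = [pct / 100 * n_variables for pct in pct_variance]

# Guideline: keep factors whose eigenvalue is above 1.
kaiser_count = sum(1 for ev in eigenvalues if ev > 1)

# Guideline: number of variables divided by 4 and by 3.
rule_of_thumb = (n_variables / 4, n_variables / 3)

print("eigenvalues:", [round(ev, 3) for ev in eigenvalues])
print("factors with eigenvalue above 1:", kaiser_count)
print(f"variables / 4 to variables / 3: {rule_of_thumb[0]:.2f} to {rule_of_thumb[1]:.2f}")
```

The guidelines need not agree: here only two eigenvalues exceed 1 (the third is about 0.90), while the divide-by-3-or-4 heuristic points to roughly three factors, which is the solution selected in the example.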