
CHAPTER 1
INTRODUCTION

Many scientific studies are characterized by the fact that numerous variables are used to
describe objects [1]. Examples are studies in which questionnaires consisting of many
questions (variables) are used, and studies in which mental ability is tested via several subtests,
such as skills tests, logical reasoning tests, and so on [2]. Because so many variables are in
play, the study can become rather complicated: as more and more variables are added, they
overlap more and more. For situations such as these, Exploratory Factor Analysis (EFA) was
developed. Broadly speaking, factor analysis provides the tools for analyzing the structure of
interrelationships among a large number of variables by defining sets of variables that are
highly interrelated, known as factors, which are assumed to represent dimensions within the
data and to partially or completely replace the original set of variables [5].
This paper explores the use of EFA as a variable-reduction multivariate technique.
Further, it assesses results on different data formats.
The goal of this paper is to discuss common practice in studies using EFA and provide
practical information on best practices in the use of EFA. In particular, we address three issues:
(1) To determine what type of data format is most consistent for the EFA technique.
(2) To evaluate which type of factor rotation is the most appropriate for EFA.
(3) To assess results of EFA under split-sampling.

1.1 Preliminaries
A hypothesis is a tentative theory which aims to explain facts about the real world. In
statistics, a statistical hypothesis is a conjecture about a population parameter; the conjecture
may or may not be true. A parameter is a characteristic or measure obtained by using all the
data values of a specific population. Statistical hypotheses come in two forms: when the
statement indicates that there is no difference between two parameters, the hypothesis is a null
hypothesis, denoted by H0; when the statement indicates that there is a difference between two
parameters, the hypothesis is an alternative hypothesis, denoted by H1 [8].
Let n be the number of observations on two variables x and y. As defined in [7], the
correlation coefficient, denoted by r, is given by

r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[\,n\sum x^{2} - (\sum x)^{2}\,\right]\left[\,n\sum y^{2} - (\sum y)^{2}\,\right]}},

a single number that describes the strength of the relationship between x and y. A zero
correlation indicates no relationship between x and y. When x and y have a positive
correlation, x and y move in the same direction, i.e., as x increases, y also increases, or the
other way around. On the other hand, when x and y have a negative correlation, x and y move
in opposite directions; that is, when x increases, y decreases, or vice versa. In addition, a
partial correlation is the relationship of two variables when the effects of two or more related
variables are removed. A correlation matrix is a symmetric matrix showing the
intercorrelations among all variables. Its diagonal has a uniform correlation value of 1.000,
which is the correlation of each variable with itself. The number of correlations m in a
correlation matrix of p variables is given by

m = \frac{p(p-1)}{2}.
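
As a brief illustration of these two formulas (an editorial addition; the sample values below are hypothetical), they can be computed directly in Python:

    import numpy as np

    def pearson_r(x, y):
        # r = [n*sum(xy) - sum(x)*sum(y)] / sqrt{[n*sum(x^2) - (sum x)^2][n*sum(y^2) - (sum y)^2]}
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        n = len(x)
        numerator = n * np.sum(x * y) - np.sum(x) * np.sum(y)
        denominator = np.sqrt((n * np.sum(x ** 2) - np.sum(x) ** 2) *
                              (n * np.sum(y ** 2) - np.sum(y) ** 2))
        return numerator / denominator

    def number_of_correlations(p):
        # m = p(p - 1)/2 distinct correlations among p variables
        return p * (p - 1) // 2

    x = [1, 2, 3, 4, 5]
    y = [2, 1, 4, 3, 5]
    print(pearson_r(x, y))             # strength and direction of the x-y relationship
    print(number_of_correlations(26))  # 325, matching the 26-variable example below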

To exemplify the concepts of correlation and partial correlation, consider the 26 variables
of the prelim examination results in Table 1 and Table 2 of Appendix B. Table 1 is the
correlation matrix of the 26 variables; it contains 325 correlations, and its diagonal carries a
uniform value of 1.00, the correlation of each variable with itself. Moreover, variables X16
(adding unlike terms) and X18 (incorrect application of DPMA) have a correlation value of
0.96, which implies that X16 and X18 are highly and positively correlated (i.e., X16 and X18
move in the same direction: when X16 increases, X18 also increases, and when X16 decreases,
X18 also decreases). On the other hand, Table 2 presents the partial correlations of the variables.
Here, the corresponding partial correlation of each pair of variables, with the effects of the
remaining variables removed, can be read from Table 2.

When a study has been conducted, one of the goals of the researcher is to describe
the characteristics of the data set. A first attempt usually relies on descriptive measures of the
data, that is, the measures of central tendency, which serve to locate the center of the data set,
and the measures of dispersion.
Let x_1, x_2, ..., x_n be the n observations. The measures of central tendency include the mean,
median, and mode. The median of n observations, according to [11], is the middlemost value
once the data are arranged according to size. More precisely, if n is an odd number, the median
is the value of the observation numbered (n+1)/2; if n is an even number, the median is defined
as the average of the observations numbered n/2 and n/2 + 1.

On the other hand, one measure of data dispersion is the sample variance, denoted by s^2,
which is given by

s^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_{i} - \bar{x}\right)^{2},

where \bar{x} is the sample mean [10, 11, 12].
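
As a small illustration of these definitions (an editorial addition, not part of the original text), the median and sample variance can be computed directly in Python; the sample values are hypothetical.

    import statistics

    observations = [4.0, 7.0, 2.0, 9.0, 5.0]

    # Median: the middlemost value once the data are arranged according to size.
    median = statistics.median(observations)

    # Sample variance: s^2 = (1/(n-1)) * sum((x_i - mean)^2).
    sample_variance = statistics.variance(observations)

    print(median, sample_variance)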


According to [5], when a variable x is correlated with a variable y, x shares variance with y,
and the amount of variance shared by x and y is simply the squared correlation. So, from the
correlation example, X16 and X18, which have a correlation value of 0.96, share about 92
percent of their variance.
Moreover, common variance is defined as that variance in a variable that is shared with
all other variables in the analysis. This variance is accounted for, or shared, based on a
variable's correlations with all other variables in the analysis. A variable's communality is the
estimate of its shared variance among the variables included in the analysis.





1.2 Factor Analysis Decision Process [5]
The ultimate goal of any multivariate technique is to obtain reliable results and gain an
informative interpretation of the data. To achieve this goal, factor analysis follows a
seven-stage model-building paradigm. Figure 1 shows the seven stages of EFA.
As shown in the figure below, the starting point of EFA is the research problem. In this
stage, the researcher decides, based on the objective, what type of factor analysis will be
employed. If the objective is to confirm a prespecified structure among the variables, the
confirmatory approach, carried out through structural equation modeling, is employed. In
contrast, if the primary objective of the study centers on exploring the data, i.e., summarizing
it by identifying latent dimensions within it and, further, reducing it by deriving estimates such
as factor scores and composite summated scales, then exploratory factor analysis is appropriate.
As an illustrative example of the EFA decision process, we consider the application of
EFA to the data from the prelim departmental examination results of the Math 1.7 students.
There are 26 metric variables in rubric form. These variables are the weaknesses identified for
each student in the problem-solving part of the exam. Since the objective of the analysis is to
reduce the 26 variables, exploratory factor analysis will be used.
The next stage involves decisions on designing the factor analysis. When variables are to
be grouped, R-type factor analysis is utilized. If, instead, the researcher decides on grouping
the respondents, then some form of factor analysis of respondents, such as Q-type factor
analysis or cluster analysis, is used.
Continuing the illustrative example, the design of our factor analysis will be R-type factor
analysis, since we aim to group variables.
[Figure 1: Stages of EFA. The flow chart traces the seven stages: Stage 1, the research problem
and the selection of the type of factor analysis (confirmatory, leading to structural equation
modeling, versus exploratory); Stage 2, research design (grouping cases or variables); Stage 3,
assumptions; Stage 4, selecting a factor method (common factor analysis versus principal
component analysis) and specifying the factor matrix; Stage 5, selecting a rotational method
(orthogonal or oblique) and interpreting the rotated factor matrix, with factor model
respecification where needed; Stage 6, validation of EFA; and Stage 7, additional uses of the
results (selection of a surrogate variable, computation of factor scores, criterion of summated
scales).]
The third stage of EFA focuses on checking the assumptions. First to consider are the
selection of variables and the appropriateness of the sample size. Metric variables are the most
appropriate in EFA since they can directly use the typical correlation matrix of variables; but if
a nonmetric variable must be included in the analysis, one approach is to define dummy
variables (coded 0-1) to represent the categories of the nonmetric variable. If all the variables
are dummy variables, then specialized forms of factor analysis, such as Boolean factor analysis,
are more appropriate. Regarding the sample size question, the researcher generally would not
factor analyze a sample of fewer than 50 observations, and preferably the sample size should be
100 or larger. As a general rule, the minimum is to have at least five times as many observations
as the number of variables to be analyzed, and a more acceptable sample size would have a 10:1
ratio.
Another method of determining the appropriateness of factor analysis examines the entire
correlation matrix. The Bartlett test of sphericity tests the null hypothesis that the original
correlation matrix is an identity matrix. For factor analysis to work, we need this test to be
significant (i.e., p-value < .05) so that our correlation matrix is not an identity matrix; therefore,
there are some relationships between the variables we hope to include in the analysis.
A third measure to quantify the degree of intercorrelations among the variables and the
appropriateness of factor analysis is the measure of sampling adequacy (MSA). This index ranges
from 0 to 1, reaching 1 when each variable is perfectly predicted without error by the other
variables. As a rule of thumb, MSA below .50 is unacceptable.
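
As a sketch of how these two overall checks might be run outside SPSS (an editorial illustration; it assumes the Python factor_analyzer package, and the file name is hypothetical):

    import pandas as pd
    from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

    df = pd.read_csv("prelim_rubric_scores.csv")   # hypothetical file of metric variables

    # Bartlett's test of sphericity: H0 is that the correlation matrix is an identity matrix.
    chi_square, p_value = calculate_bartlett_sphericity(df)

    # Kaiser-Meyer-Olkin measure of sampling adequacy: per-variable values and an overall value in [0, 1].
    kmo_per_variable, kmo_overall = calculate_kmo(df)

    print(f"Bartlett chi-square = {chi_square:.2f}, p = {p_value:.4f}")  # want p < .05
    print(f"Overall MSA (KMO) = {kmo_overall:.3f}")                      # below .50 is unacceptable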
In addition to a visual examination of a variable's correlations with the other variables in
the analysis, the MSA guidelines can be extended to individual variables. The researcher should
examine the MSA values of each variable and exclude those falling in the unacceptable range.
In deleting variables, the researcher should first delete the variable with the lowest MSA and
then recalculate the factor analysis, continuing to delete the variable with the lowest MSA value
under .50 until all variables have an acceptable MSA value.
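
A minimal sketch of this iterative deletion rule, reusing the hypothetical df and the calculate_kmo call from the previous snippet; the .50 cutoff follows the rule of thumb stated above:

    import numpy as np

    def drop_low_msa_variables(data, cutoff=0.50):
        # Repeatedly drop the variable with the lowest MSA below the cutoff,
        # recomputing all MSA values after each deletion.
        data = data.copy()
        while True:
            kmo_per_variable, kmo_overall = calculate_kmo(data)
            worst = int(np.argmin(kmo_per_variable))
            if kmo_per_variable[worst] >= cutoff:
                return data, kmo_overall           # every remaining variable is acceptable
            data = data.drop(columns=data.columns[worst])

    reduced_df, overall_msa = drop_low_msa_variables(df)
    print(list(reduced_df.columns), round(overall_msa, 3))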
In addition, the researcher must also ensure that the data matrix has sufficient correlations
to justify the application of factor analysis. If visual inspection of the correlation matrix reveals
no substantial number of correlations greater than .30, then factor analysis is probably
inappropriate.
For checking the assumptions in our example, refer to the tables in Appendix B. In this
example, we have a sample size of 864, which falls within acceptable limits. The 26 variables
are in rubric form, which is metric. Inspection of the correlation matrix in Table 1 reveals the
presence of correlations greater than .300 (shaded in color), which provides an adequate basis
for proceeding to an empirical examination of adequacy for factor analysis on both an overall
basis and for each variable. The Bartlett test (p-value = .000: highly significant) finds that the
correlations, taken collectively, indicate the presence of nonzero correlations, which implies
that the variables in the analysis show interrelationships with each other. Clearly, our
correlation matrix shows that the correlations of the variables are not zero. Checking the
assumptions further, Table 3 presents the MSA values and the variables deleted in the analysis.
Here, the overall MSA value of .466 does not fall in the acceptable range. Examining the
individual variables' MSA values, which are the numbers on the diagonal of Table 2, identifies
fifteen variables (X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15) that
have MSA values under .50. Because X19 has the lowest MSA value, it is omitted first in the
attempt to attain a set of variables that exceeds the minimum acceptable MSA level.
Recalculating the MSA values finds another variable with an MSA value below .50 (listed in
Table 2), so it is also deleted from the analysis. The process of recalculating the MSA values
continues until all variables meet the acceptable MSA level. For this, we arrive at an overall
MSA value of .594. As observed, the deletion of those variables increases the overall MSA of
the analysis. As a result, Table 3 contains the correlation matrix for the revised set of variables,
that is, with the variables whose MSA fell below .50 deleted. As with the full set of variables,
the Bartlett test shows nonzero correlations and the correlation matrix shows the presence of
correlations greater than .30 (in color). Finally, examining the partial correlations in Table 4
shows only two with values greater than .50 (X5-X4, X12-X16: values in color), which is
another indicator of the strength of the interrelationships among the variables in the reduced
set. As shown, the reduced set of variables meets the fundamental requirements and is
appropriate for factor analysis, and the analysis can proceed to the next stages.
The next stage involves decisions concerning the method of extracting the factors and the
selection of the factors to be retained. The researcher can choose from two similar, yet distinct,
methods for extracting the factors, namely common factor analysis (CFA) or principal
component analysis (PCA). The decision on which method to use is based on the objectives of
the factor analysis.
PCA is most appropriate when data reduction is a primary concern, focusing on the
minimum number of factors needed to account for the maximum portion of the total variance
represented in the original set of variables. When PCA is used as the extraction method, each
extracted linear combination of highly interrelated variables is called a component.
In contrast, CFA is most appropriate when the primary objective is to identify the latent
dimensions or constructs represented in the original variables. When it is used, each extracted
linear combination of variables is called a factor.
Both factor analysis methods are interested in the best linear combination of variables,
best in the sense that the particular combination of original variables accounts for more of the
variance in the data as a whole than any other linear combination of variables. Therefore, the
first factor may be viewed as the single best summary of the linear relationships exhibited in the
data. The second factor is defined as the second-best linear combination of the variables, subject
to the constraint that it is orthogonal to the first factor. To be orthogonal to the first factor, the
second factor must be derived from the variance remaining after the first factor has been
extracted. Thus, the second factor may be defined as the linear combination of variables that
accounts for the most variance that is still unexplained after the effect of the first factor has been
removed from the data. The process continues, extracting factors that account for smaller and
smaller amounts of variance until all of the variance is explained.
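
A minimal sketch of this sequential extraction for the PCA case (an editorial illustration, not the study's own computation): the eigendecomposition of the correlation matrix yields components in decreasing order of explained variance, and the loadings are the eigenvectors scaled by the square roots of the eigenvalues.

    import numpy as np

    def pca_loadings(data):
        # Principal component extraction from the correlation matrix.
        # Returns the eigenvalues (variance accounted for by each component,
        # largest first) and the corresponding factor-loading matrix.
        corr = np.corrcoef(data, rowvar=False)           # variables in columns
        eigenvalues, eigenvectors = np.linalg.eigh(corr)
        order = np.argsort(eigenvalues)[::-1]            # largest variance first
        eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
        loadings = eigenvectors * np.sqrt(eigenvalues)   # correlations of variables with components
        return eigenvalues, loadings

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 6))                        # hypothetical 100 x 6 data matrix
    eigenvalues, loadings = pca_loadings(X)
    print(np.round(eigenvalues, 3))                      # first component explains the most variance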
In deciding when to stop factoring, the researcher generally begins with some
predetermined criteria, such as a general number of factors plus some general thresholds of
practical relevance (e.g., a required percentage of variance explained). An exact quantitative
basis for deciding the number of factors to extract has not been developed. However, several
stopping criteria for the number of factors to extract are commonly utilized.
The latent root criterion is the most commonly used technique and can be applied to either
PCA or CFA. The rationale of this criterion is that any individual factor should account for the
variance of at least a single variable if it is to be retained for interpretation. Thus, only the factors
having latent roots or eigenvalues greater than 1 are considered significant.
Furthermore, the percentage of variance criterion is an approach based on achieving a
specified cumulative percentage of total variance extracted by successive factors. The purpose is
to ensure practical significance for the derived factors by ensuring that they explain at least a
specified amount of variance. No absolute threshold has been adopted for all applications.
However, in the social sciences, where information is often less precise, it is not uncommon to
consider a solution that accounts for 60 percent of the total variance.
Another technique for factor retention is the scree test criterion, a graphical approach in
which the latent roots are plotted against the number of factors in their order of extraction;
the shape of the resulting curve is then used to evaluate the cutoff point.
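
Continuing the hypothetical sketch above, the latent root criterion, the percentage-of-variance criterion, and the scree test can all be read off the eigenvalues (an editorial illustration):

    import numpy as np
    import matplotlib.pyplot as plt

    def retention_summary(eigenvalues, variance_target=60.0):
        # Latent root criterion, percentage-of-variance criterion, and scree plot.
        pct_variance = 100.0 * eigenvalues / eigenvalues.sum()
        cumulative = np.cumsum(pct_variance)
        n_latent_root = int(np.sum(eigenvalues > 1.0))                       # eigenvalues greater than 1
        n_variance = int(np.searchsorted(cumulative, variance_target)) + 1   # first k reaching the target

        plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
        plt.axhline(1.0, linestyle="--")        # latent root cutoff
        plt.xlabel("Component number")
        plt.ylabel("Eigenvalue (latent root)")
        plt.title("Scree plot")
        plt.show()
        return n_latent_root, n_variance, cumulative

    n_root, n_var, cumulative = retention_summary(eigenvalues)
    print(n_root, n_var, np.round(cumulative, 2))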
In our example, we have the primary objective of variable reduction, and so PCA is the
most appropriate tool for factor extraction. Table 5 contains the information regarding the 14
possible factors and their relative explanatory power as expressed by their eigenvalues. Applying
the latent root criterion of retaining factors with eigenvalues greater than 1.0, five factors would
be retained. However, the five factors retained represent only 59.495 percent of the variance of
the 14 variables, which is not sufficient in terms of total variance explained. The scree test in
Figure 1 of Appendix B, however, indicates that six factors may be appropriate for retention,
accompanied by 66.531 percent of the total variance explained, which is deemed to meet the
requirement. Moreover, the eigenvalue of the sixth factor is quite close to the latent root criterion
value of 1.0, so its exclusion on that ground alone is not compelling. Combining all these criteria
leads to the conclusion to retain six factors for further analysis.
The fifth stage of EFA is the factor interpretation. After defining the number of factors to be
retained for interpretation, the researcher now examines the unrotated factor loading matrix
containing the factor loadings of each variable on each factor. Factor loadings are the
correlations of each variable with each factor. From the factor loadings of each variable on the
factors, the researcher can assess whether each variable's communality falls within acceptable
limits. The communality is equal to the sum of the squared factor loadings of a variable across the
factors. It is specified that at least one-half of the variance of each variable must be taken into
account. The size of the communality is a useful index for assessing how much variance in a particular
variable is accounted for by the factor solution. Higher communality values indicate that a large amount
of variance in a variable has been extracted by the factor solution. Small communalities show that a
substantial portion of the variable's variance is not accounted for by the factors. Although no statistical
guidelines indicate exactly what is large or small, practical considerations dictate a lower level of .50 for
communalities in this analysis. Using this guideline, any variable with a communality of less than .50 is
regarded as not sufficiently explained by the factor solution.
When a variable x has a communality below .500, x is deleted and the analysis is
recalculated until all remaining variables have communalities greater than .500.
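
A short sketch of the communality check and deletion rule just described, reusing the hypothetical pca_loadings function and data matrix X from the extraction sketch above; the .50 cutoff is the guideline stated in the text:

    import numpy as np

    def communalities(loadings, n_factors):
        # Communality of each variable: sum of its squared loadings on the retained factors.
        return np.sum(loadings[:, :n_factors] ** 2, axis=1)

    def drop_low_communality(data, n_factors, cutoff=0.50):
        # Delete the variable with the lowest communality below the cutoff,
        # re-extract, and repeat until every variable is sufficiently explained.
        columns = list(range(data.shape[1]))
        while True:
            _, loadings = pca_loadings(data[:, columns])
            h2 = communalities(loadings, n_factors)
            worst = int(np.argmin(h2))
            if h2[worst] >= cutoff:
                return columns, h2
            del columns[worst]

    kept, h2 = drop_low_communality(X, n_factors=3)
    print(kept, np.round(h2, 3))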
The total amount of variance explained by either a single factor or the overall factor solution can
be compared to the overall variation in the set of variables as represented by the trace of the factor
matrix. The trace is the total variance to be explained and is equal to the sum of the eigenvalues of the
variable set. In component analysis, the trace is equal to the number of variables because each variable
has a possible eigenvalue of 1.0. By adding the percentages of trace for each of the factors (or dividing
the total eigenvalues of the factors by the trace), we obtain the total percentage of trace extracted for the
factor solution. This total is used as an index to determine how well a particular factor solution accounts
for what all the variables together represent. If the variables are all very different from one another, this
index will be low. If the variables fall into one or more highly redundant or related groups, and if the
extracted factors account for all the groups, the index will approach 100 percent.
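
Stated compactly as a formula (a restatement of the index just described, added for clarity):

    \text{index} \;=\; \frac{\sum_{j=1}^{k} \lambda_j}{\operatorname{trace}} \times 100\%
                 \;=\; \frac{\sum_{j=1}^{k} \lambda_j}{p} \times 100\% \quad \text{(component analysis)},

where \lambda_1, \dots, \lambda_k are the eigenvalues of the k retained factors and p is the number of variables.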
When all the variables have communalities at acceptable levels, the researcher then
examines the significant factor loadings in the unrotated factor matrix.
The process of interpretation would be greatly simplified if each variable had only
one significant loading. When a variable is found to have more than one significant loading, it is
said to cross-load. The difficulty arises when a variable has several significant loadings. If a
variable persists in having cross-loadings, it becomes a candidate for deletion.
In most cases, many of the variables in the unrotated factor matrix cross-load. Because of
this, the interpretation is difficult and rather meaningless, and we need to rotate the factors in
the hope of finding a more simplified structure.
In factor analysis, there are two types of factor rotation: orthogonal factor rotation and
oblique factor rotation. The first type of rotation is subject to the constraint that the axes of
rotation are maintained at 90 degrees, which ensures that whenever EFA uses orthogonal
rotation, the components extracted are uncorrelated. In other words, the variables on one
component are not explained by the variables on the other components. In contrast, the second
type of rotation is similar to the first, except that oblique rotations allow correlated factors
instead of maintaining independence between the rotated factors. Thus, some variables on one
component can be explained by variables on another component.
In practice, the objective of all methods of rotation is to simplify the rows and columns of
the factor matrix to facilitate interpretation. In a factor matrix, columns represent factors, with
each row corresponding to a variable's loadings across the factors. By simplifying the rows, we
mean making as many values in each row as close to zero as possible (i.e., maximizing a
variable's loading on a single factor). By simplifying the columns, we mean making as many
values in each column as close to zero as possible (i.e., making the number of high loadings as
few as possible).
Three major orthogonal approaches have been developed. First is the QUARTIMAX
rotation, whose ultimate goal is to simplify the rows of the factor matrix. In contrast to
QUARTIMAX, the VARIMAX criterion centers on simplifying the columns of the factor
matrix. Finally, the EQUIMAX approach is a compromise between the QUARTIMAX and
VARIMAX approaches.
On the oblique side, Direct Oblimin is the default oblique factor rotation in SPSS.
No specific rules have been developed to guide the researcher in selecting a particular
orthogonal or oblique rotational technique. The choice should be made on the basis of the
particular needs of a given research problem.
It is further noted that when the objective of the study is data reduction, orthogonal
factor rotation using the VARIMAX approach is more appropriate.
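
As a hedged sketch of running the extraction and a VARIMAX rotation programmatically (an editorial illustration assuming the factor_analyzer package; df and the number of factors are hypothetical, carried over from the earlier snippets):

    import pandas as pd
    from factor_analyzer import FactorAnalyzer

    # Principal component extraction followed by an orthogonal VARIMAX rotation.
    fa = FactorAnalyzer(n_factors=6, method="principal", rotation="varimax")
    fa.fit(df)

    rotated_loadings = pd.DataFrame(fa.loadings_, index=df.columns)
    print(rotated_loadings.round(3))           # rotated factor-loading matrix
    print(fa.get_communalities().round(3))     # communalities are unchanged by rotation
    print(fa.get_factor_variance())            # variance, proportion, cumulative per factor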
As a final process, the researcher evaluates the rotated factor loadings of each variable in
order to determine that variable's role and contribution in determining the factor structure. A .30
loading translates to approximately 10 percent explanation, and a .50 loading denotes that 25
percent of the variance is accounted for by the factor. The loading must exceed .70 for the factor
to account for 50 percent of the variance of a variable. Thus, the larger the absolute size of the
factor loading, the more important the loading is in interpreting the factor matrix.
With the objective of obtaining a power level of 80%, the use of a .05 significance level,
and the proposed inflation of the standard errors of factor loading, Table 1 contains the sample
sizes necessary for each factor loading value to be considered significant.
Table 1: Guidelines for Identifying Significant Factor Loadings Based on Sample Size


When an acceptable factor solution has been obtained in which all variables have a
significant loading on a factor, the researcher attempts to assign more meaning to the pattern of
factor loadings. Variables with higher loadings are considered more important and have greater
influence on the name or label selected to represent a factor. This label is not derived or assigned
by the factor analysis computer program; rather, the label is intuitively developed by the
researcher based on its appropriateness for representing the underlying dimensions of a
particular factor.
For our example, Table 6 contains the unrotated factor loading matrix of the 6 components
retained. The seventh column provides summary statistics detailing how well each variable is explained
by the six components. For instance, the communality figure of .548 for variable X15 indicates that it has
less in common with the other variables included in the analysis than does variable X16, which has a
communality of .900. Both variables, however, still share more than one-half of their variance with the six
factors. On the other hand, two variables with communalities below .500 are depicted (X9 and X12),
which will be eliminated.
With X9 and X12 eliminated, Table 7 displays the factor loading matrix of the revised set of variables
with five components extracted. In the table, another two variables (X5 and X14) were observed to have
communalities below .500, which requires their deletion from the analysis. Table 8 portrays the revised
factor loading matrix with X5 and X14 deleted. Examination of the table reveals no variable with a
communality below .50. Thus, the analysis proceeds to the next stage.
In Table 8, the first row of numbers at the bottom of each column gives the column sums of squared
factor loadings (eigenvalues) and indicates the relative importance of each factor in accounting for the
variance associated with the set of variables. Note that the sums of squares for the five factors are 2.4,
1.79, 1.53, 1.11, and 1.08, respectively. As expected, the factor solution extracts the factors in the order of
their importance, with factor 1 accounting for the most variance, factor 2 slightly less, and so on through
all five factors. The total of the eigenvalues, 7.91, represents the total amount of variance extracted by
the factor solution. The percentages of trace explained by each of the five factors (21.82%, 16.27%,
13.91%, 10.09%, and 9.82%, respectively) are shown as the last row of values of Table 8. The
index for the overall solution shows that 71.91 percent of the total variance is represented by the
information contained in the factor matrix of the five-factor solution. Therefore, the index for
this solution is high, and the variables are in fact highly related to one another.
Given the sample size of 864, factor loadings of .30 and higher will be considered
significant for interpretative purposes. The shaded numbers in Table 8 indicate the
significant loadings. Here, 10 of the 11 variables have cross-loadings; thus, we need to rotate the
factors to obtain a more simplified structure.
The VARIMAX-rotated component analysis factor matrix is shown in Table 10. Note
that the total amount of variance extracted is the same in the rotated solution as it was in the
unrotated one, 71.91 percent. Also, the communalities for each variable do not change when a
rotation technique is applied. Still, two differences do emerge. First, the variance is redistributed
so that the factor-loading pattern and the percentage of variance for each of the factors are
slightly different. Specifically, in the VARIMAX-rotated factor solution, the first factor
accounts for 20.09 percent of the variance, compared to 21.82 percent in the unrotated solution.
Likewise, the other factors also change, the largest change being the fourth factor, increasing
from 10.09 percent in the unrotated solution to 11.55 percent in the rotated solution. Thus, the
explanatory power shifted slightly to a more even distribution because of the rotation. Second,
the interpretation of the factor matrix is simplified.
Having defined the various elements of the rotated factor matrix, let us examine the
pattern of significant factor loadings in the hope of finding a simplified structure.
In the rotated factor solution, each of the variables has a significant loading on only one
factor except for X1, X11, and X15. Moreover, the variables with no cross-loadings exhibit
factor loadings above .50, meaning that more than one-half of their variance is accounted for by
the loading on a single factor. With all of the communalities of sufficient size to warrant
inclusion, the only remaining decision is to delete X1, X11, and X15.
Deleting X1, X11, and X15, Table 11 (left side) displays the revised factor-loading
matrix. The matrix shows a simplified structure of components, but assessing the communalities
of the individual variables reveals X3 and X7 with communalities below .50. Finally, deleting
those two variables leaves us with 6 variables in the analysis. As we see, the factor loadings for
the six variables remain almost identical, exhibiting both the same pattern and almost the same
values for the loadings. The amount of explained variance increases to 82 percent. With the
simplified pattern of loadings, all communalities above 50 percent, and an overall level of
explained variance that is high enough, the six-variable/three-factor solution is accepted.
The sixth stage involves assessing the degree of generalizability of the results to the
population. The most direct method of validating the results is to split the original data set into
subsamples and assess the replicability of the results. Factor stability is the primary concern in
assessing the robustness of the solution across the subsamples.
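
A minimal sketch of such a split-sample validation (an editorial illustration; the split proportions mirror those used later in this paper, and df, the number of factors, and the use of factor_analyzer are assumptions rather than the study's own code):

    import numpy as np
    from factor_analyzer import FactorAnalyzer

    def split_sample_loadings(data, train_frac=0.60, n_factors=3, seed=0):
        # Split the observations, refit the factor model on each subsample,
        # and return the two VARIMAX-rotated loading matrices for comparison.
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(data))
        cut = int(train_frac * len(data))
        results = []
        for rows in (idx[:cut], idx[cut:]):
            fa = FactorAnalyzer(n_factors=n_factors, method="principal", rotation="varimax")
            fa.fit(data.iloc[rows])
            results.append(fa.loadings_)
        return results

    loadings_60, loadings_40 = split_sample_loadings(df)
    # Stability check: the same variables should load highly on corresponding factors.
    print(np.round(loadings_60, 3))
    print(np.round(loadings_40, 3))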

1.5.2.7 Additional Uses of Factor Analysis Results
This stage includes two options: (a) Selecting the variable with the highest factor loading
as a surrogate representative for a particular factor dimension; (b) Replacing the original set of
variables with an entirely new, smaller set of variables created either from summated scales or
factor scores.

If the researchers objective is simply to identify appropriate variables for subsequent
application with other statistical techniques, the researcher has the option of examining the
factor matrix and selecting the variable with the highest factor loading on each factor to act as a
surrogate variablethat is representative of that factor.
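
A one-line sketch of this surrogate selection, using the hypothetical rotated_loadings data frame from the earlier rotation snippet:

    # For each factor (column), pick the variable with the highest absolute loading.
    surrogates = rotated_loadings.abs().idxmax(axis=0)
    print(surrogates)   # one surrogate variable name per factor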




1.6 Statistical Packages
1.6.1 SPSS [9]
SPSS (Statistical Package for the Social Sciences) is a statistical analysis and data
management software package. SPSS can take data from almost any type of file and use them to
generate tabulated reports, charts, plots of distributions and trends, and descriptive statistics, and
to conduct complex statistical analyses.
Moreover, SPSS is a powerful, user-friendly software package for the manipulation and
statistical analysis of data. The package is widely used in the social sciences, such as
psychology, sociology, and psychiatry, as it contains an extensive range of both univariate and
multivariate procedures.










CHAPTER 2
METHODOLOGY

2.1 The Data
The data used in the empirical analysis was obtained from the results of the Prelim,
Midterm, and Final Departmental Examination of the students taking up Math 1.7 of the Caraga
State University (CSU)-Main Campus, Ampayon, Butuan City during the first (1st) semester of
SY 2012-2013.There were 864 students who took the Prelim Departmental Examination, 500 in
the Midterm, and 565 during the Final Examination.
The data focuses on the unmastered competencies (weaknesses) of the students in
Problem Solving. Manipulating the data on each departmental examination results yields 26
metric variables for prelim, 29 for midterm and 28 for final. Refer to Appendix C for the list of
variables and its corresponding deficiency indicator.

2.2 Data Format
To achieve the specified objectives of this study, we use three methods of quantification
of our data. Table 2.2.1 presents each type of data format with its corresponding description and
illustration.


Table 2.2.1 Description and Illustration of each Data Format used for EFA.

Data Format: Rubric-Score
Description: RS = Weight - Score
Illustration: The level of weakness of each student is taken as the weight of the item in each
problem minus the student's corresponding score.

Data Format: Likert Scale
Description: Equally-distant 5-category scale from 1 to 5 on each item
Illustration: The total weights of each weakness are summed and used to arrive at the
equally-distant interval scale from 1 to 5.

Data Format: Thurstone
Description: Median-based rating (0-5) of each weakness according to how difficult each
problem is
Illustration: Three persons rate each weakness and the median is derived, making it the
standard score.

2.3 Computing in SPSS
The following procedures were utilized in the analysis of the data using SPSS.
2.3.1 Starting up SPSS and Data Entry
As shown in Figure 2.3.1.1,
1. In the computer's search box (search programs and files), type SPSS and click the program.
2. From Excel, where the raw data are stored, copy the data and paste them into the SPSS
spreadsheet data editor (DATA VIEW).
3. Click VARIABLE VIEW, found just below the spreadsheet data editor. Then label,
arrange, and fix all variables so that they are ready for the empirical data analysis.


2.4 Data Analysis
As shown in Figure 2.4.1, the data were analyzed using the following steps (a Python sketch
approximating this workflow is given after the list):
1. After fixing all variables in the data, click ANALYZE in the menu bar of SPSS. Find the
DATA REDUCTION item and select FACTOR.
2. The dialog in Figure 2.3.1 (b) will pop up; click DESCRIPTIVES and check all the boxes for
the STATISTICS and CORRELATION MATRIX output. Then click Continue.
3. Click EXTRACTION (next to Descriptives). Choose PRINCIPAL COMPONENTS as the
extraction method and check CORRELATION MATRIX, UNROTATED FACTOR
SOLUTION, and SCREE PLOT. Also, choose EXTRACT EIGENVALUES GREATER
THAN 1 and use the default maximum number of iterations for convergence, which is 25.
Then click Continue again.
4. Next to Extraction, click ROTATION, choose VARIMAX, and check ROTATED
SOLUTION and LOADING PLOTS. Again, use the default maximum number of iterations
for convergence, equal to 25. Click Continue.
5. To continue, click SCORES. Under factor scores, check SAVE AS VARIABLES, choose
BARTLETT, and check DISPLAY FACTOR SCORE COEFFICIENT MATRIX. Again,
click Continue.
6. Finally, click OPTIONS. Under missing values, choose EXCLUDE CASES PAIRWISE.
Check SORTED BY SIZE and SUPPRESS ABSOLUTE VALUES LESS THAN the
significant loading corresponding to the sample size of the data analyzed.
7. The OUTPUT NAVIGATOR appears and displays the desired statistical results and tables
from the analysis performed.
8. Repeat the same procedure for each remaining data set.
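
For readers without SPSS, the following is a rough Python approximation of the workflow above (an editorial sketch: it assumes the factor_analyzer package, and the file name and the fixed number of factors are hypothetical rather than taken from the study):

    import pandas as pd
    from factor_analyzer import FactorAnalyzer
    from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

    df = pd.read_csv("prelim_rubric_scores.csv")               # hypothetical raw data file

    # Descriptives: Bartlett's sphericity test and the KMO measure of sampling adequacy.
    print(calculate_bartlett_sphericity(df))
    print(calculate_kmo(df)[1])

    # Extraction (principal components) and VARIMAX rotation; here the number of
    # factors is fixed at a hypothetical 6, whereas SPSS was set to retain eigenvalues > 1.
    fa = FactorAnalyzer(n_factors=6, method="principal", rotation="varimax")
    fa.fit(df)
    print(pd.DataFrame(fa.loadings_, index=df.columns).round(3))   # rotated loadings
    print(fa.get_factor_variance())                                # variance explained per factor

    # Factor scores for each observation (SPSS additionally offers Bartlett scores).
    scores = fa.transform(df)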

[Figure 2.3.1.1: Starting up SPSS and Data Entry (screenshots, panels (a)-(c)).]

[Figure 2.4.1: The Data Analysis (SPSS dialog screenshots, panels (e)-(l)).]

CHAPTER 3
RESULTS AND DISCUSSION

This chapter presents the results of the numerical experiments with Exploratory Factor
Analysis (EFA) using empirical data. The empirical data were obtained from the problem-solving
part of the departmental examinations of College Algebra (Math 1.7) administered at Caraga
State University (CSU) in the first semester of school year 2012-2013.
The first section presents the checking of assumptions. In particular, the MSA values and
the Bartlett test p-values are examined. Variables with MSA values less than .500 are deleted
and the analysis is recalculated repeatedly until all remaining variables have MSA values in the
acceptable range.
The second section shows the exploratory factor analysis of the three different data formats
across the departmental examinations. The number of components (Co) and the percentage of
variance explained (%VE) are the criteria for choosing the most efficient data format on which
EFA is consistent.
The third part presents the comparison of the two types of EFA factor rotation, namely
orthogonal and oblique factor rotation. These rotations are compared on the data format chosen
in the second section. Consistencies and inconsistencies of the factor loadings are used to
examine which type of factor rotation is the most appropriate.
Finally, the last section assesses the validation of EFA under split-sampling (60%-40%,
50%-50%) on the rubric data format. The variable composition of each component and the
variance explained serve as the basis for judging the stability of the factors.
3.1 The Data Analysis


Table 3.1.2: Overall MSA and Individual Variables' MSA
(the Bartlett test p-value is .000 at every iteration; the value after each deleted variable is its MSA)

Prelim:
Iteration 1: overall MSA .466; variable with MSA < .5 deleted: X19 = .131
Iteration 2: overall MSA .526; X22 = .288
Iteration 3: overall MSA .542; X23 = .366
Iteration 4: overall MSA .554; X6 & X21 = .391
Iteration 5: overall MSA .565; X20 = .434
Iteration 6: overall MSA .569; X10 = .418
Iteration 7: overall MSA .576; X13 & X24 = .452
Iteration 8: overall MSA .585; X2 = .446
Iteration 9: overall MSA .591; X17 = .477
Iteration 10: overall MSA .594; none (all variables acceptable)

Midterm:
Iteration 1: overall MSA .543; X24 = .399
Iteration 2: overall MSA .554; X18 = .441
Iteration 3: overall MSA .582; X20 = .427
Iteration 4: overall MSA .594; X22 = .442
Iteration 5: overall MSA .598; X21 & X25 = .462
Iteration 6: overall MSA .606; X17 & X14 = .496
Iteration 7: overall MSA .625; X13 = .400
Iteration 8: overall MSA .631; X19 = .474
Iteration 9: overall MSA .641; X27 = .477
Iteration 10: overall MSA .645; X11 = .481
Iteration 11: overall MSA .655; none (all variables acceptable)

Final:
Iteration 1: overall MSA .544; X11 = .450
Iteration 2: overall MSA .547; X15 = .458
Iteration 3: overall MSA .552; X14 = .446
Iteration 4: overall MSA .555; X24 = .457
Iteration 5: overall MSA .560; X4 = .480
Iteration 6: overall MSA .563; none (all variables acceptable)

The same process of checking assumptions was utilized for data in midterm and final
departmental results as shown in Table 3.1.2.






3.2 Data Format and Factor Analysis
Applying the same criteria as above, Table 3.2.2 displays the percentage of variance
explained by the extracted components under the three data formats of our data set, namely
rubric, Likert, and Thurstone, across the departmental examinations.
Examination of the table reveals that all data formats consistently extract a percentage of
variance greater than 60% across data sources.
Looking at the results in detail, in the prelim data the rubric format extracts the smallest
number of components with the highest percentage of variance explained. In the midterm and
final data, the Likert format extracts the smallest number of components but with the lowest
percentage of variance explained, while the Thurstone format attains the highest percentage of
variance explained only with the largest number of components; rubric lies between the two,
which leads us to choose rubric in both the midterm and final departmental examinations.
Based on these findings, rubric is the most appropriate data format for the application of
EFA.











Table 3.2.2: Percentage of variance explained (%VE) of extracted components (Co)
under each data format of the unmastered competencies of College Algebra across
departmental examinations

Prelim:
Rubric    - Co 1-6: 17.651, 14.063, 11.062, 8.452, 8.266, 7.036; total %VE 66.531
Likert    - Co 1-7: 16.306, 13.337, 8.251, 7.368, 7.244, 6.821, 6.305; total %VE 65.631
Thurstone - Co 1-8: 12.307, 10.526, 9.752, 8.214, 6.822, 6.486, 5.713, 5.429; total %VE 65.248

Midterm:
Rubric    - Co 1-8: 14.419, 9.027, 8.573, 7.353, 6.802, 6.328, 6.083, 5.838; total %VE 64.442
Likert    - Co 1-7: 14.325, 9.344, 8.927, 8.175, 7.354, 6.647, 6.100; total %VE 60.872
Thurstone - Co 1-9: 11.201, 8.353, 7.665, 7.474, 6.537, 6.387, 5.909, 5.725, 5.688; total %VE 64.939

Final:
Rubric    - Co 1-10: 9.613, 8.658, 7.358, 6.301, 5.631, 5.213, 4.983, 4.840, 4.609, 4.437; total %VE 61.664
Likert    - Co 1-8: 11.944, 10.482, 8.198, 6.807, 6.241, 6.228, 5.796, 5.714; total %VE 61.410
Thurstone - Co 1-11: 8.991, 7.477, 6.554, 6.301, 5.377, 5.272, 4.920, 4.714, 4.631, 4.298, 4.156; total %VE 62.690

3.3 Factor Rotations of EFA
With the above procedures on rotation of components, Table 3.3.6 compares the two
types of factor rotation, namely orthogonal versus oblique factor rotation. We observe that the
variable compositions of the first component under both types of rotation are similar.
Specifically, in the prelim data, X16 and X18 compose component 1; X10 and X12 do so in the
midterm; and in the final, component 1 is composed of X22 and X3.
In more detail, in the prelim data, orthogonal rotation yields three components with factor
loadings relatively higher than those of oblique rotation, which extracts a larger number of
components. In the midterm data, the same number of components is extracted, but X10 under
oblique rotation has a higher factor loading than the first variable of component 1 under
orthogonal rotation. However, we notice that oblique rotation does not give consistently ordered
factor loadings: X28 has a factor loading of .793 on component 4, which is higher than the .787
of X19 on component 2. This does not give a well-defined factor-loading pattern in our analysis.
The same observation is visible in the final results, where inconsistency of the factor loadings is
present under both types of rotation; moreover, orthogonal rotation extracts more components
than oblique rotation.
Thus, the results imply that orthogonal rotation is the more appropriate type of component
rotation for EFA.




Table 3.3.6: Orthogonal vs. Oblique Rotation across Data Sources
(Var = variable, FL = factor loading)

Prelim
Component 1 - Orthogonal: X16 (.982), X18 (.981); Oblique: X18 (.973), X16 (.973)
Component 2 - Orthogonal: X4 (.900), X8 (.887); Oblique: X4 (.875), X8 (.777), X9 (.670)
Component 3 - Orthogonal: X25 (.828), X26 (.794); Oblique: X25 (.776), X26 (.741)
Component 4 - Oblique only: X15 (.707), X3 (.602)
Component 5 - Oblique only: X5 (.766)

Midterm
Component 1 - Orthogonal: X12 (.809), X10 (.804); Oblique: X10 (.819), X12 (.807)
Component 2 - Orthogonal: X26 (.753), X23 (.706); Oblique: X19 (.787), X2 (.667)
Component 3 - Orthogonal: X28 (.766), X29 (.721); Oblique: X27 (.714), X11 (.709)
Component 4 - Orthogonal: X7 (.744), X5 (.732); Oblique: X28 (.793), X29 (.708)
Component 5 - Orthogonal: X15 (.74), X3 (.738); Oblique: X3 (.778), X16 (.714)

Final
Component 1 - Orthogonal: X3 (.962), X22 (.961); Oblique: X3 (.968), X22 (.967)
Component 2 - Orthogonal: X23 (.812), X16 (.731), X20 (.633); Oblique: X16 (.835), X23 (.831)
Component 3 - Orthogonal: X21 (.827), X7 (.670); Oblique: X8 (.764), X10 (.703)
Component 4 - Orthogonal: X10 (.737), X8 (.715); Oblique: X21 (.778), X25 (-.610)
Component 5 - Orthogonal: X5 (.788), X2 (-.668); Oblique: X19 (-.685), X2 (-.681)
Component 6 - Orthogonal only: X13 (.938)
Component 7 - Orthogonal only: X28 (.921)















3.4 On Split-Sampling of Factor Analysis
Cross-examination of the contents of Table 3.4.1 reveals that across the different data
sources, the extracted components of the rubric format are stable. That is, the variable
composition of each component in the 100% data analysis remains stable across the 60%-40%
split-sample. In the prelim data, the composition of the variables in each component is strongly
stable across the split-sample. In the midterm data, the factors are stable but the compositions
of the variables interchange, except on component 4. The same observation holds for
components 3 and 4 in the final data, while for component 2 only the 60% subsample gives the
same variable composition. However, the stability of the factors is not affected, as shown by the
shading in Table 3.4.1. Hence, the use of EFA on the rubric data is validated.




Table 3.4.1 Validation of EFA on Split-Samples Across Data Sources
(variables composing each component in the 100%, 60%, and 40% analyses, listed in that
order where present)

Prelim Departmental Examination (n = 864)
Co 1: X16, X18 / X18, X16 / X18, X16
Co 2: X4, X8 / X4, X8 / X8, X4
Co 3: X25, X26 / X25, X26 / X25, X26
Co 4: - / X14, X21
Co 5: - / X3, X15

Midterm Departmental Examination (n = 500)
Co 1: X12, X10 / X9, X24 / X10, X12, X13
Co 2: X26, X23 / X12, X10 / X29, X28
Co 3: X28, X29 / X16, X3
Co 4: X7, X5 / X7, X5
Co 5: X15, X3 / X26, X23

Final Departmental Examination (n = 565)
Co 1: X3, X22 / X22, X3 / X3, X22
Co 2: X23, X16, X20 / X16, X23 / X18, X19
Co 3: X21, X7 / X10, X8 / X15, X24
Co 4: X10, X8 / X27, X6 / X14, X5
Co 5: X5, X2 / X21, X5 / X7, X26
Co 6: X13 / X25 / X17
Co 7: X28

Note: Shaded parts in the original table imply factor structure stability.

Finally, Table 3.4.2 presents the reduced variables with their corresponding components,
together with their deficiency indicators. The first variable in each component is the surrogate
variable, which can be used for further analysis.
Table 3.4.2: Extracted Components with their Variable Composition and Deficiency
Indicators for the Rubric Data in Math 1.7 Across Departmental Examinations

Prelim Departmental Examination
Component 1 (Addition knowledge): X16 Adding unlike terms; X18 Incorrect application of DPMA
Component 2 (Set operations): X4 Not able to identify element/s in the complement; X8 Not able to identify element/s in the union
Component 3 (Incorrect concepts): X25 Incorrect graphing in complex plane; X26 Incorrect squaring in polynomials

Midterm Departmental Examination
Component 1 (Signed-number operations): X12 Operations on integers; X10 Carelessness
Component 2 (Division operation deficiency): X26 Identifying conjugate; X23 Finding initial quotient
Component 3 (Mathematical expressions unmastered): X28 Expression on integer; X29 Operations on radical expression
Component 4 (Exponential problems): X7 Cancellation of unlike terms; X5 Operations on exponents
Component 5 (Factoring problems): X15 Use of grouping sign; X3 Factoring sum of cubes/factoring trinomial

Final Departmental Examination
Component 1 (Simplification unmastered): X3 Answer not in simplest form; X22 Finding values of x and y in an equation
Component 2 (Graphing problems): X23 Identification of vertex; X16 Plotting the points in the graph; X20 Finding the intersection
Component 3 (No idea): X21 No solution shown; X7 Carelessness
Component 4 (Carelessness): X10 Copying the given problem; X8 Correct solution, wrong final answer
Component 5 (Factoring deficiency): X5 Factoring; X2 Formula
Component 6 (LCD unmastered): X13 Identification of LCD
Component 7 (Unit knowledge): X28 No units


CHAPTER 4
SUMMARY AND RECOMMENDATIONS

4.1 Summary of findings
1. Comparison of the results of EFA on the three data formats, namely rubric, Likert, and
Thurstone, shows that EFA is robust on the rubric format.
2. Comparison of the two types of factor rotation, that is, orthogonal versus oblique, reveals
that orthogonal rotation yields more consistent EFA results.
3. Validation of EFA on the rubric format by split-sampling shows stability of the factor structure.
4.2 Recommendations
Based on the findings of the study, the following are recommended for subsequent
investigation:
1. Using the same data and quantification, apply EFA using Common Factor Analysis
(CFA) as the extraction method and EQUIMAX as the orthogonal rotation method.
2. Compare EFA results when using CFA with EQUIMAX in orthogonal rotation versus
CFA with OBLIMIN in oblique rotation.
3. Verify, for EFA with CFA extraction and EQUIMAX orthogonal rotation, the conjecture
observed for EFA with PCA and VARIMAX orthogonal rotation, namely that the number of
components extracted is directly proportional to the percentage of variance explained.
4. Determine which factor rotation is most efficient for EFA when CFA is used as the
extraction method.

REFERENCES CITED

[1] Rietveld, T., & Van Hout, R. (1993). Statistical Techniques for the Study of Language and
Language Behaviour. Berlin, New York: Mouton de Gruyter.
[2] Darlington, R. B. (2004). Factor Analysis. Website:
http://comp9.psych.cornell.edu/Darlington/factor.htm (accessed 08 November 2012).
[3] Habing, B. (2003). Exploratory Factor Analysis. Website:
http://www.stat.sc.edu/~habing/courses/530EFA.pdf (accessed 08 November 2012).
[4] Diamantopoulos, A., & Winklhofer, H. M. (2001). Index Construction with Formative
Indicators: An Alternative to Scale Development. Journal of Marketing Research 38 (May):
269-277.
[5] Hair, J. F. Jr., & Black, W. C. (2004). Multivariate Data Analysis, pp. 90-151.
[6] Marsh, H. W., & Jackson, S. (1999). Flow Experience in Sport: Construct Validation of
Multidimensional Hierarchical State and Trait Responses. Structural Equation Modeling
6(4): 343-371.
[7] Maison, U. Correlations in Statistics. Website:
http://www.socialresearchmethods.net/kb/statcorr.php
[8] Codaste, Ijean B. (2010). On Some Tests for Normality: Numerical and Graphical
Application, pp. 4-8.
[9] Landau, S., & Everitt, B. S. (2003). A Handbook of Statistical Analyses Using SPSS.
[10] Panik, M. J. (2005). Advanced Statistics from an Elementary Point of View.
[11] Miller & Freund's Probability and Statistics for Engineers.
[12] Data Quality Assessment: Statistical Methods for Practitioners, quality@epa.gov.













APPENDICES
Appendix A:
Table 1: Variables and their Description/Deficiency Indicator in the Prelim Departmental Examination

Variables   Deficiency Indicator
X1    No idea/no answer
X2    Careless in plotting elements in Venn Diagram
X3    Incorrect plotting of elements in the Venn Diagram
X4    Not able to identify element/s in the complement
X5    Not able to identify element/s in the intersection
X6    Not able to write proper notation
X7    Careless in handling signs (+/-)
X8    Not able to identify element/s in the union
X9    Not able to identify element/s in the set difference
X10   Missing some necessary step/s in the solution
X11   Not able to simplify combination of set operations
X12   Misinterpretation of problem
X13   Applying DPMA not the technique
X14   Misinterpretation of an expression as equation
X15   Not able to simplify operations with complex numbers
X16   Adding unlike terms
X17   Not able to divide terms in long division with polynomials
X18   Incorrect application of DPMA
X19   Incorrect transposition
X20   Not able to simplify algebraic expression
X21   Not able to give the final answer but went through correct solution
X22   Not able to identify conjugate
X23   Not able to simplify the final answer
X24   Careless in writing terms
X25   Incorrect graphing in complex plane
X26   Incorrect squaring of polynomials







Table 2: Variables and their Description/Deficiency Indicator in the Midterm Departmental
Examination

Variables   Deficiency Indicator
X1    No idea/no answer
X2    Dividing terms/polynomials sum
X3    Factoring sum of cubes/factoring trinomial
X4    Assigning identification of LCD
X5    Operations on exponents
X6    Not simplifying final answer
X7    Cancellation of unlike terms
X8    Operations on unlike terms/Adding terms of polynomials
X9    Solution not shown
X10   Carelessness
X11   Inclusion of the coefficient in the variable exponent
X12   Operations on integers
X13   Laws of exponents
X14   Misuse of notations
X15   Use of grouping signs
X16   Dividing exponential expression with same base
X17   Dividing exponential expression
X18   Operations on fractions
X19   Combining unlike base in exponential expression
X20   Identifying coefficients on dividend for synthetic division
X21   Subtracting 2nd row from the 1st in synthetic division
X22   Wrong identification of divisor
X23   Finding initial quotient
X24   Express divisor as synthetic division (x - r)
X25   Identifying terms on dividend in long division
X26   Identifying conjugate
X27   Multiplying sum of radical of index 2
X28   Expression on integer
X29   Operations on radical expression







Table 3: Variables and their Description/Deficiency Indicator in the Final Departmental Examination

Variables   Deficiency Indicator
X1    No idea/no answer
X2    Formula
X3    Answer not in simplest form
X4    Identification of coefficient
X5    Factoring
X6    Operation on integers
X7    Carelessness
X8    Correct solution, wrong final answer
X9    Operations on polynomials
X10   Copying the given problem
X11   Operation on radical equation
X12   Not using equality sign
X13   Identification of LCD
X14   Identification of replacement set
X15   Not simplifying final answer
X16   Plotting the points in the graph
X17   Use of method apply
X18   No graph shown
X19   Solution set
X20   Finding the intersection
X21   No solution shown
X22   Finding values of x and y in an equation
X23   Identification of vertex
X24   Misuse of equation in finding vertex
X25   Comprehension
X26   No solution in finding consecutive integers
X27   Use of formula
X28   No units






