
1- Bas, G., Kubiatko, M. & Sünbül, A. M. (2016). Teachers' perceptions towards ICTs in teaching-learning process: Scale validity and reliability study. Computers in Human Behavior, Vol. 61, pp. 176-185.

Bas, Kubiatko & Sünbül (2016)

---------------------

First, an extensive literature review was conducted and trial items were written based on the relevant literature. A pool of 35 items was formed by the researchers. The first form of the scale, consisting of 35 items, was then submitted to a group of experts in order to test the content validity of the scale. Experts from the fields of curriculum and instruction, educational measurement and evaluation, psychological guidance, instructional technology, and linguistics were asked to comment on the items prepared for the scale. Based on the views of these experts, necessary changes and/or improvements were made regarding the language and intelligibility of the items in the scale. The items were designed as a 5-point Likert (1932) type scale ranging from 1 (totally disagree) to 5 (totally agree) to indicate teachers' level of agreement or disagreement with each item. Five items were removed because of the experts' negative evaluations, and the scale was then finalised with 30 items for the pilot study. The final form of the scale was prepared and then applied to a group of high school teachers.

In order to examine the construct validity of the scale, exploratory factor analysis (EFA) based on principal component analysis (PCA) was applied (Murphy & Davidshofer, 1991; Reuterberg & Gustafsson, 1992); then, confirmatory factor analysis (CFA) was conducted to determine whether the defined construct was valid (Jöreskog & Sörbom, 1993; Tabachnick & Fidell, 2001).

To test the construct validity of the scale, studies regarding the EFA were carried out. In the evaluation of the EFA, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity were used as criteria. The KMO and Bartlett's test values were examined in order to test the eligibility of the data obtained for the EFA (Fraenkel & Wallen, 2000; Murphy & Davidshofer, 1991). In the related literature, it is stated that the KMO value should be greater than 0.60 (Fraenkel & Wallen, 2000; Tabachnick & Fidell, 2001) and that Bartlett's test of sphericity should be significant in order to conduct a factor analysis (Murphy & Davidshofer, 1991; Reuterberg & Gustafsson, 1992).
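
The two checks above can be computed directly from the data matrix; the following numpy/scipy sketch is a generic illustration (not code from any of the cited papers), with Bartlett's chi-square taken from the determinant of the correlation matrix and the KMO built from zero-order versus partial correlations.

```python
import numpy as np
from scipy import stats

def kmo_and_bartlett(X):
    """KMO sampling adequacy and Bartlett's test of sphericity for an (n, p) data matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # Bartlett's test of sphericity: chi-square against an identity correlation matrix
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    p_value = stats.chi2.sf(chi2, df)
    # KMO: compare zero-order correlations with partial (anti-image) correlations
    R_inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    partial = -R_inv / scale
    off = ~np.eye(p, dtype=bool)
    kmo = np.sum(R[off] ** 2) / (np.sum(R[off] ** 2) + np.sum(partial[off] ** 2))
    return kmo, chi2, df, p_value
```

Per the criteria above, a KMO greater than 0.60 together with a significant Bartlett chi-square would justify proceeding with the EFA.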

Eigenvalues were used to decide the number of factors extracted as a result of the EFA (Kline, 1994). In the EFA, factors with eigenvalues equal to or greater than 1.00 are accepted as important factors (Kline, 1994; Tabachnick & Fidell, 2001). Besides, in the related literature, factor loadings ranging between 0.30 and 0.40 can be taken as the lowest limits in determining whether items are included in a scale (Diekhoff, 1992; Ferguson & Takane, 1989). According to Diekhoff (1992), a factor loading is considered "excellent" if it is 0.71 (which explains 50% of the variance). According to Tabachnick and Fidell (2001), it is considered "pretty good" if it is 0.63 (which explains 40% of the variance), "good" if it is 0.55 (30% of the variance), "average" if it is 0.45 (20% of the variance), and "poor" if it is 0.32 (10% of the variance). As there are different views on the lowest factor loading limit (e.g., Diekhoff, 1992; Tabachnick & Fidell, 2001), Ferguson and Takane (1989) indicate that 0.40 should be taken as the lowest factor loading limit in order to create factor patterns.
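
The verbal bands quoted above follow from the squared loading (the share of item variance a factor explains) and can be captured in a small helper; the cutoffs are exactly those in the text, and the function itself is only illustrative.

```python
def loading_label(loading):
    """Map an EFA factor loading to the verbal labels quoted above."""
    a = abs(loading)
    if a >= 0.71:
        return "excellent"    # 0.71^2 ~ 50% of item variance explained
    if a >= 0.63:
        return "pretty good"  # ~40%
    if a >= 0.55:
        return "good"         # ~30%
    if a >= 0.45:
        return "average"      # ~20%
    if a >= 0.32:
        return "poor"         # ~10%
    return "below conventional cutoffs"
```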

Following the EFA, a CFA was conducted in the study. As a result of the EFA, it was seen that the scale had a three-factor structure with 25 items. The CFA was used to examine whether the structure identified in the EFA worked in a new sample. Hence, the three-factor structure derived from the EFA was applied to a group of 200 high school teachers similar to the sample group of the study. Kline (2005) suggests that a CFA should be conducted on the model derived from an EFA, so the CFA was applied to test the model derived from the EFA. As a result of the CFA, various goodness-of-fit indices are obtained. In the related literature, it is accepted as reasonable to use multiple goodness-of-fit indices instead of one single fit index in order to test the model derived from the EFA (Jöreskog & Sörbom, 1993; Kline, 2005; Marsh, Balla, & McDonald, 1988; Schumacker & Lomax, 1996; Tabachnick & Fidell, 2001). For the teachers' perceptions towards ICTs in teaching-learning process scale, in addition to the traditional chi-square (χ²) analysis, various goodness-of-fit indices including the goodness of fit index (GFI), the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root mean square error of approximation (RMSEA), and the adjusted goodness of fit index (AGFI) were used in the confirmatory factor analysis (CFA). The ratio of chi-square to degrees of freedom (χ²/df) should be less than 3; GFI, CFI, TLI, and AGFI values can vary from 0 to 1, with values exceeding 0.90 indicating a good fit. Also, RMSEA should be less than 0.05, with values less than 0.06 also representing an acceptable fit (Hu & Bentler, 1999; Jöreskog & Sörbom, 1993; Kline, 2005; Thompson, 2004). After the CFA of the scale, Cronbach's (1990) Alpha internal consistency coefficients were calculated for the reliability of the scale.
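
The cutoffs listed above translate into a simple screening routine; the thresholds below are exactly those stated in the text, and the function is an illustrative sketch rather than any author's procedure.

```python
def screen_fit(chi2, df, gfi, cfi, tli, agfi, rmsea):
    """Check CFA fit statistics against the conventional cutoffs cited above."""
    return {
        "chi2/df < 3": chi2 / df < 3,
        "GFI > 0.90": gfi > 0.90,
        "CFI > 0.90": cfi > 0.90,
        "TLI > 0.90": tli > 0.90,
        "AGFI > 0.90": agfi > 0.90,
        "RMSEA < 0.05": rmsea < 0.05,
    }
```

For example (hypothetical values), screen_fit(120, 50, 0.93, 0.95, 0.94, 0.91, 0.04) meets every criterion.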
The results for the construct validity of the scale were gathered under two different titles in the study: results of the exploratory factor analysis (EFA) and results of the confirmatory factor analysis (CFA).

As seen in Table 5, the Cronbach's Alpha internal consistency coefficients of the factors obtained range between 0.72 and 0.88 in the study. When the alpha values of the factors are examined, the ATT factor was calculated as 0.88, the US factor as 0.85, and the last factor, BEL, as 0.72. Also, the general reliability coefficient for the whole scale was found to be 0.92 in the study. In reliability studies, coefficient values between 0.60 and 0.70 are accepted as sufficient only to a lesser extent (Cronbach, 1990); it is generally accepted that the reliability coefficient should be at least 0.70 (Anderson, 1988; Peers, 1996; Scherer, 1988). Secondly, the Spearman-Brown correlation coefficient for the scale was calculated, and the result of the analysis was found to be 0.85 in the study.
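
The Spearman-Brown coefficient reported above is typically obtained from a split-half correlation stepped up to full test length. A minimal sketch (odd-even split, which is one common convention and not necessarily the authors' exact procedure):

```python
import numpy as np

def spearman_brown_split_half(X):
    """Odd-even split-half reliability with the Spearman-Brown step-up correction."""
    odd_half = X[:, 0::2].sum(axis=1)   # total over items 1, 3, 5, ...
    even_half = X[:, 1::2].sum(axis=1)  # total over items 2, 4, 6, ...
    r = np.corrcoef(odd_half, even_half)[0, 1]
    return 2 * r / (1 + r)              # step up from half-length to full-length test
```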

In the related literature, values above 0.80 are accepted as good for reliability (Anderson, 1988;
Kline, 1994).
According to Özen, Güçalti, and Kandemir (2006), the Spearman-Brown correlation coefficient is a good alternative when it is hard to administer the test twice and/or to prepare two equivalent forms of the same test. Thus, the value obtained for the Spearman-Brown correlation coefficient of the scale can be regarded as good (Murphy & Davidshofer, 1991; Reuterberg & Gustafsson, 1992). Also, the item-total correlations were calculated for the total scale, and the analyses are presented in Table 6.

In Table 6, participants' perception scores (mean and standard deviation) towards ICTs in the teaching-learning process, as well as item-total correlation values, are given. As a result of Pearson's correlation analysis, all the items in the scale were found to correlate significantly with the total score at the 0.01 level.
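
Item-total correlations of the kind reported in Table 6 can be sketched as follows; the corrected variant, which excludes each item from its own total, is a common refinement (the function name and setup are illustrative):

```python
import numpy as np

def item_total_correlations(X, corrected=False):
    """Pearson correlation of each item with the (optionally corrected) total score."""
    total = X.sum(axis=1)
    r = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        # Corrected: drop item j from the total so it does not correlate with itself
        t = total - X[:, j] if corrected else total
        r[j] = np.corrcoef(X[:, j], t)[0, 1]
    return r
```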

2- Demirci, F. & Ozyurek, C. (2018). Astronomy Teaching Self-Efficacy Belief Scale: The Validity and Reliability Study. Journal of Education and Learning, Vol. 7(1), pp. 258-271.

The construct validity of the scale was investigated via exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). The results of the EFA showed that the scale construct included a total of three factors and 13 questions and explained 70.60% of the total variance. CFA results showed that the ratio of the chi-squared value to the degrees of freedom (χ²/sd = 1.67) was perfect, and the other fit indices showed a good fit (GFI = 0.86, CFI = 0.94, NNFI = 0.92, IFI = 0.94, SRMR = 0.08 and RMSEA = 0.06). The results of the reliability analysis showed that the Cronbach's alpha reliability coefficient was 0.84 for the whole scale, 0.90 for the "student outcomes through astronomy teaching" factor, and 0.83 for both the "astronomy teaching strategies" factor and the "difficulty in astronomy teaching" factor. In conclusion, the results obtained showed that the "Astronomy Teaching Self-Efficacy Belief Scale" can be used as a valid and reliable assessment instrument.

Before the pilot scale was finalized, a pre-pilot scale was used with 10 science teachers other
than those in the research group, in order to identify items which were difficult to understand and
to measure the average time required to complete the whole scale.

According to the feedback from these teachers, it was found that the scale was not difficult to
understand, and the period of time required for answering the scale was about 30-35 minutes.

Finally, participant instructions and a personal information form were added to the scale, forming the final
pilot scale. For factor analysis, the pilot scale was applied to 113 volunteer science teachers working in the
province and towns of Ordu during the 2016-2017 academic year. After the scale was implemented, the
scale forms which included random answers, unanswered items or multiple answers, were not entered into
the SPSS program and the analyses were conducted on the data from a total of 106 participants.

Reliability analysis was conducted on the ATSBS as a whole scale and on its sub-dimensions by calculating Cronbach's alpha reliability coefficients. Content validity was examined by taking experts' views, and construct validity was examined with factor analyses. In addition, item analysis was conducted with item-total correlations and with an independent-groups t-test, in order to determine whether there was a difference between the item scores of the lower and upper 27% groups.
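
The 27% upper-lower group comparison described above can be sketched with scipy; this is a generic illustration of the procedure, not the authors' code, and the variable names are assumptions.

```python
import numpy as np
from scipy import stats

def upper_lower_item_ttests(X, proportion=0.27):
    """Independent-groups t-test per item between the top and bottom 27% on total score."""
    order = np.argsort(X.sum(axis=1))          # participants sorted by total score
    k = int(round(proportion * X.shape[0]))    # size of each extreme group
    lower, upper = order[:k], order[-k:]
    return [stats.ttest_ind(X[upper, j], X[lower, j]) for j in range(X.shape[1])]
```

Items whose upper-group mean does not significantly exceed the lower-group mean are candidates for removal on discrimination grounds.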

After the exploratory factor analysis (EFA) of the scale was conducted with the help of the SPSS 22.0
program, confirmatory factor analysis (CFA) was conducted using the LISREL 8.51 program. The EFA
and the CFA of the scale were both undertaken with the same research group data.

Construct validity provides evidence about how well the scale measures the concept (factor) which it is intended to measure (DeVellis, 2014). In this study, two kinds of factor analyses were applied to obtain evidence about the construct validity of the ATSBS. First, the factor structure of the developed scale was identified with EFA, and then CFA was used to test whether the determined item-factor structure was consistent with the data.

The varimax technique was applied in EFA (Can, 2014). In order to test the suitability of the data
structure in EFA, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett’s
test of sphericity results were examined (Table 1).

A KMO value between 0.5 and 0.7 shows that the sample size is sufficient (Can, 2014). Table 1 shows that the KMO sample sufficiency level was 0.79 and that Bartlett's test resulted in a statistically significant chi-squared value (χ²(78) = 615.10; p < 0.001). These results show that the sample size of the data is suitable for factor analysis and that the data originate from a multivariate normal distribution (Kaiser, 1974). In conclusion, since these two premises are met, there is a suitable quantity of data and the data are suitable for EFA (Tabachnick & Fidell, 2007).

Principal component analysis (PCA), a factorization method, and the varimax technique, a rotation technique, were used to show the factors under which the items were grouped. Following the rotation, the distributions of the items to the factors, factor loadings, common variances, means and standard deviation values are given in Table 2.
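
The extraction-plus-rotation pipeline described above (PCA factoring, Kaiser's eigenvalue-greater-than-one retention, varimax rotation) can be sketched in numpy. This is a generic implementation of the standard algorithms, not the analysis the authors ran in SPSS.

```python
import numpy as np

def pca_loadings(X, min_eigenvalue=1.0):
    """PCA factor extraction: keep components with eigenvalues >= 1 (Kaiser criterion)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]                 # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= min_eigenvalue
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])  # unrotated loading matrix

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Kaiser's varimax rotation (orthogonal), via the standard SVD iteration."""
    p, k = loadings.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0)))
        )
        R = u @ vt
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:        # criterion stopped improving
            break
        d_old = d
    return loadings @ R
```

Because varimax is an orthogonal rotation, it redistributes loadings across factors without changing each item's communality (the row sum of squared loadings).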

Cronbach’s alpha coefficients were calculated for the whole scale and for each sub-factor, in order
to analyse how consistent (in terms of internal consistency) the items of the scale were. In general,
it can be said that for Cronbach’s alpha reliability coefficient values of 0.70 and above, the scale
developed is reliable (Fraenkel et al., 2012). The Cronbach’s alpha reliability coefficient values
were 0.84 for the whole scale, 0.90 for the first factor, and 0.83 for the second and third factors.

As a result of the pilot scale, a factor analysis was conducted to investigate the construct validity
of the ATSBS.
First of all, the KMO and Bartlett’s test of sphericity were used to test whether the data obtained
were suitable for factor analysis. Analysis results showed that the KMO value was 0.79 and
Bartlett’s test of sphericity was found to be statistically significant (p < 0.001). These results show
that the data and the sample size were suitable in terms of the applicability of factor analysis and
the data were found to be normally distributed. In other words, the variables had associations high
enough to form a reasonable basis for their use (Leech, Barrett, & Morgan, 2005).

The data tested for suitability for factor analysis were first examined with EFA, and then with CFA, to
confirm the factor construct of the scale. Principal components analysis was adopted and varimax (maximum variance) rotation was used. As a result of the varimax rotation, it was concluded that the ATSBS had a total of 13 items (nine positive and four negative) and three factors. In addition, it was found that the
factors of the scale explained 70.60% of the total variance. According to this result, it can be said that the
items in the factors were highly correlated with the factor and the scale had a strong construct.

After the scale's factor construct was determined with EFA, CFA was applied. According to the fit indices of the three-factor construct, the ratio (χ²/sd = 1.35) of the chi-squared value (χ² = 97) to the degrees of freedom (sd = 72) was found to be perfect, while the other fit indices (RMSEA = 0.06, SRMR = 0.08, CFI = 0.94, IFI = 0.94, NNFI = 0.92, AGFI = 0.79 and GFI = 0.86) were found to show an acceptable fit, and the tested model was concluded to be sufficient.

3- Türkel, A., Özdemir, E. E. & Akbulut, S. (2017). Validity and reliability study of reading culture
scale. International Periodical for the Languages, Literature and History of Turkish or Turkic,
Vol. 12/14, pp. 465-490. DOI: http://dx.doi.org/10.7827/TurkishStudies

This study is a screening study aimed at determining the structural validity and internal
consistency of the "Reading Culture Scale".

In order to develop the scale, the literature on reading culture was reviewed, and Turkish/Turkish Language and Literature teachers and instructors working in the Department of Turkish Education were consulted. A pool of items was created on the basis of the information obtained.

While creating the items, attention was paid to ensuring that the items were simple and understandable, that there were both positive and negative expressions, that there were no factual expressions, and that no item contained more than one judgment. The 51-item draft scale was developed as a 5-point Likert type. The expressions "Never" (1), "Rarely" (2), "Sometimes" (3), "Usually" (4), and "Always" (5) were used for the items in the scale. Negative items were inverted before analysis.
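
Inverting a negative item on a 1-5 scale means mapping 1↔5 and 2↔4, i.e. subtracting the response from (min + max). A minimal sketch, where the column indices of the negative items are hypothetical:

```python
import numpy as np

def reverse_code(X, negative_columns, low=1, high=5):
    """Reverse-score the given columns so all items point in the same direction."""
    X = X.astype(float).copy()
    X[:, negative_columns] = (low + high) - X[:, negative_columns]
    return X
```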

Applying for expert opinion is one of the logical ways to test the validity of a scale. The experts are expected to assess the items in the draft form of the scale in terms of content validity (Büyüköztürk, 2014: 180). The draft scale, consisting of 49 items, was evaluated by 5 Turkish teachers, 2 Turkish Language and Literature teachers, 3 instructors in the Turkish Language and Literature Education Department, and 8 instructors working in the Turkish Language Education Department. Following the expert review, the draft scale comprised 51 items.

The prepared 51-item draft scale was applied to a total of 385 randomly selected pre-service teachers studying at a state university during the fall semester of the 2016/2017 academic year. Incompletely filled forms were not considered, and a total of 379 forms were obtained; 261 (69%) of the participants were female and 118 (31%) were male.

The factor loading values of the scale, consisting of four factors and a total of 30 items, range from .458 to .826, as seen in Table 2. The four identified factors account for 48.402% of the total variance. Tavşancıl (2006: 48) stated that variance rates ranging from 40% to 60% are acceptable values in analyses made in the social sciences. Therefore, it can be said that the proportion of total variance explained by the defined factors is sufficient. The factors were named taking into account the items they contain.
One of the most important properties of a measuring instrument is reliability. Reliability can be described as the level of consistency of the responses of the participants included in the study to the test items which make up the scale. The reliability of a scale can be deduced from the commonly used Cronbach's Alpha coefficient and the item-total correlation coefficient. The Cronbach's Alpha coefficient can be used to determine the consistency level of the instrument without requiring multiple administrations. A Cronbach's Alpha coefficient of .70 and above indicates that the scale is reliable at the generally accepted level (Büyüköztürk, 2014: 181).

According to the analysis made to determine the internal consistency of the items in the scale, the internal consistency coefficient of the whole scale was .90. When the Cronbach's Alpha values were examined by factor, the reliability coefficients were found to be between .72 and .86: .86 for the first factor, .86 for the second factor, .72 for the third factor, and .78 for the fourth factor. According to these findings, it can be said that the scale is also reliable in terms of its factors.

Another criterion applied when reliability is determined is to examine the item-total correlation
coefficient in order to determine internal consistency. Item-total correlation is an important
indicator for establishing the relationship between the scores obtained from the test items and the
total scores obtained from the test, and for determining the discrimination of the items forming the
scale. In general, item-total correlation values of a scale item of .30 and above indicate that the
discrimination of those items is high (Büyüköztürk, 2014: 183). In the analysis made after the pre-application, 21 items with values under .30 were removed. According to the obtained results, the item-total correlations of all the items forming the scale are shown in Table 3.

As shown in Table 3, the item-total correlation values of all the items forming the scale are .30 and over. In this case, it is possible to say that the items measure the same behavior; the item-total correlations of the scale are seen to be sufficient.

A scale structure consisting of four sub-dimensions emerged in line with the results of the analysis
of basic components made using the Varimax vertical rotation technique. The Cronbach Alpha
reliability coefficient as a whole was calculated as .90. The Cronbach Alpha reliability coefficients
for the four sub-dimensions of the scale were found to be .86, .86, .72, .78 respectively. According
to the findings obtained, it was revealed that the scale is a reliable scale in terms of both the whole
and the sub-dimensions.

4- Vaske, J. J., Beaman, J. & Sponarski, C. C. (2017). Rethinking internal consistency in Cronbach's Alpha. Leisure Sciences, Vol. 39(2), pp. 163-173. DOI: http://dx.doi.org/10.1080/01490400.2015.1127189.

Many concepts of interest to social scientists are not directly observable (e.g., attitudes, beliefs,
norms, value orientations). Their existence must be inferred from survey responses that serve as
indicators of the concepts. A variety of measurement methodologies and scaling procedures have
been proposed for examining psychological concepts. Likert, for example, introduced the
summated rating scale in 1932. This scaling approach begins with a pool of items that are
believed to be relevant to an object (e.g., attitude, belief) of interest. Respondents indicate their
level of agreement or disagreement with each statement. For example, positive values of 1 and 2 might correspond to “agree” and “strongly agree,” while negative values of -1 and -2 would represent identical levels of disagreement. Zero is often used as a “neutral” or “neither agree nor disagree” point (Nowlis, Kahn, & Dhar, 2002). The Likert technique is referred to as a summated
rating scale because the responses received from each item are summed (or averaged) to obtain
the respondent’s score on the scale. Cronbach’s alpha (often symbolized by the lower-case Greek
letter a) is commonly used to examine the internal consistency or reliability of summated rating
scales (Cronbach, 1951).

There are four defining characteristics of a summated rating scale (Vaske, 2008). First, as the
name (i.e., summated) implies, the scale must contain multiple items (survey questions) that will
be combined by summing or averaging. Second, each item in the scale must reflect the concept
being measured. Third, unlike a multiple choice exam, there are no “right” or “wrong” answers to
the items in a summated rating scale (e.g., items measuring attitudes, beliefs, or norms). Fourth,
each item in a scale is a statement and respondents rate each statement. Many surveys contain
three to five statements per scale, with each statement including four to seven response choices
(Green & Rao, 1970). The optimal number of response categories (Dawes, 2002) and whether or
not to include a neutral category (Nowlis et al., 2002) are complex issues that can vary by salience
of topic to respondents and survey methodology (e.g., mail vs. phone survey).

Summated rating scales should have good psychometric properties (Nunnally & Bernstein, 1994).
In other words, a good scale is reliable, valid, and often precisely measured. Measurement
reliability means that the multiple items measure the same construct (i.e., the items intercorrelate
with each other). Measurement validity means that the scale measures what it was intended to
measure. Precision reflects the number of response options associated with survey questions.
For example, a 7-point strongly agree to strongly disagree scale is more precise than a 2-point
agree or disagree scale. With multiple response choices (e.g., strongly agree, moderately agree,
slightly agree, neutral, slightly disagree, moderately disagree, strongly disagree), those who feel
strongly can be distinguished from those with moderate feelings (Revilla, Saris, & Krosnick, 2014).

For summated rating scales, a reliability analysis is commonly performed to estimate the internal
consistency of the items. Internal consistency statistics estimate how consistently individuals
respond to the items within a scale. The word “scale” is used here to reflect a collection of survey
items that are intended to measure the unobserved concept (e.g., a summated rating scale).
There are several internal consistency reliability estimates, for example, Cronbach’s alpha
(Cronbach, 1951; Cronbach & Shavelson, 2004) and Kuder-Richardson formula 20 (a.k.a. KR20;
Kuder & Richardson, 1937).

Cronbach’s alpha is perhaps the most common estimate of internal consistency of items in a scale
(Cronbach, 1951; Cronbach & Shavelson, 2004). Alpha measures the extent to which item responses (answers to survey questions) correlate with each other. In other words, alpha estimates the proportion of variance that is systematic or consistent in a set of survey responses. The general formula for computing alpha is:
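
The formula itself did not survive extraction; the standard expression from Cronbach (1951), reconstructed here, is:

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)
```

where k is the number of items, σ²_Yi is the variance of item i, and σ²_X is the variance of the total (summed) score.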

The statistic “typically” ranges from 0.00 to 1.00, but a negative value can occur when the items
are not positively correlated among themselves. The size of alpha depends on the number of
items in the scale (Streiner, 2003). For example, assume the reliability for a 4-item scale is .80. If
the average correlation among the items remains constant (e.g., .50), and the number of items
increases, the scale’s reliability increases to .86 with 6 items, and .91 with 10 items. For 20 items,
the reliability increases to .95. Conversely, if the scale had only 2 items and the same average
correlation, the reliability declines to .66. In general, the relationship between number of items
and alpha is curvilinear (Komorita & Graham, 1965) and begins to level off before the number of
items reaches 19.
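
The worked numbers above follow from the standardized-alpha formula, which depends only on the number of items k and the average inter-item correlation; a quick check (the function name is illustrative):

```python
def standardized_alpha(k, r_bar):
    """Standardized Cronbach's alpha for k items with average inter-item correlation r_bar."""
    return k * r_bar / (1 + (k - 1) * r_bar)
```

With r_bar = .50 this reproduces the trajectory in the text: .80 at 4 items, about .86 at 6, .91 at 10, .95 at 20, and below .70 at 2 items.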

By convention, an alpha of .65-.80 is often considered “adequate” for a scale used in human dimensions research (Green et al., 1977; Spector, 1992; Vaske, 2008). Alpha provides an estimate of the internal consistency of variation in the variables in the scale (Cortina, 1993; Tavakol & Dennick, 2011).

Internal consistency is concerned with the interrelatedness of a sample of test items, whereas
homogeneity refers to unidimensionality (Green et al., 1977). Internal consistency is a necessary
but not sufficient condition for measuring homogeneity or unidimensionality in a sample of test
items (Cortina, 1993). If this unidimensionality assumption is violated, reliability will be
underestimated (Miller, 1995).

Finally, the items in the scale are assumed to be positively correlated with each other because they
are measuring the same theoretical concept. A negative estimate of α can occur when the items are
not positively correlated among themselves (Ritter, 2010). In this situation, one or more variables
may need to be recoded so all items are coded in the same conceptual direction. A negative
correlation, however, can also occur when respondents answer inconsistently (Thompson, 2003).
This article focuses on situations when respondents answer survey questions inconsistently.

5- Najafabadi, A. T. P., & Najafabadi, M. O. (2016). On the Bayesian estimation for Cronbach's
alpha. Journal of Applied Statistics, Vol. 43(13), pp. 2416-2441. DOI:
http://dx.doi.org/10.1080/02664763.2016.1163529.
The Cronbach’s alpha is one of the most popular coefficients for measuring reliability or internal
consistency of a test consisting of multiple components.

Likert-type data are commonly utilized in most social science settings to measure an unobserved continuous variable with an ordinal one.

6- Diedenhofen, B., & Musch, J. (2016). cocron: A Web Interface and R Package for the
Statistical Comparison of Cronbach's Alpha Coefficients. International Journal of Internet
Science, Vol. 11(1), pp. 51-60.

Thus, Cronbach's alpha is based on the ratio of the sum of item variances to the test score variance, employing a correction factor to take the number of items into account. (To avoid confusion with the statistical significance level a, distinct symbols are used for the population alpha coefficient and the sample alpha coefficient.) The upper and lower bounds of a 100(1 - a) percent confidence interval for the population alpha are then calculated.

7- Bonett, D. G., & Wright, T. A. (2015). Cronbach's alpha reliability: Interval estimation,
hypothesis testing, and sample size planning. Journal of Organizational Behavior, Vol. 36(1),
pp. 3-15.

Cronbach’s alpha reliability describes the reliability of a sum (or average) of q measurements
where the q measurements may represent q raters, occasions, alternative forms, or
questionnaire/test items. When the measurements represent multiple questionnaire/test items,
which is the most common application, Cronbach's alpha is referred to as a measure of “internal consistency” reliability.

All of the above statements refer to the population reliability value and not the sample reliability value that is reported in the vast majority of social science and organizational studies. This is another reason why it is important to report a confidence interval for the population reliability rather than just its sample value.
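
One closed-form interval of the kind Bonett and Wright discuss uses the approximate variance of ln(1 − α̂). The expression below is our reading of Bonett's (2002) approximation, so the standard error term is an assumption that should be checked against the paper before use.

```python
import math
from scipy import stats

def alpha_confidence_interval(alpha_hat, n, k, level=0.95):
    """Approximate CI for population alpha via the log-complement transformation."""
    z = stats.norm.ppf(1 - (1 - level) / 2)
    # Assumed approximation: Var[ln(1 - alpha_hat)] ~ 2k / ((k - 1)(n - 2))
    se = math.sqrt(2 * k / ((k - 1) * (n - 2)))
    lower = 1 - math.exp(math.log(1 - alpha_hat) + z * se)
    upper = 1 - math.exp(math.log(1 - alpha_hat) - z * se)
    return lower, upper
```

For instance, with a sample alpha of 0.84 from 106 respondents and 13 items (the ATSBS figures above), the interval brackets the sample value without reaching the boundaries 0 or 1.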

8- Taber, K. S. (2018). The use of Cronbach’s alpha when developing and reporting research
instruments in science education. Research in Science Education, Vol. 48(6), pp. 1273-1296.
DOI: https://doi.org/10.1007/s11165-016-9602-2.
Cronbach’s alpha is a statistic commonly quoted by authors to demonstrate that tests and scales
that have been constructed or adopted for research projects are fit for purpose. Cronbach’s alpha
is regularly adopted in studies in science education: it was referred to in 69 different papers
published in 4 leading science education journals in a single year (2015)— usually as a measure
of reliability.

When choosing an instrument, or developing a new instrument, for a study, a researcher is expected to consider the relevance of the instrument to particular research questions (National
Research Council Committee on Scientific Principles for Educational Research, 2002) as well as
the quality of the instrument. Quality may traditionally be understood in terms of such notions as
validity (the extent to which an instrument measures what it claims to measure, rather than
something else) and reliability (the extent to which an instrument can be expected to give the
same measured outcome when measurements are repeated) (Taber, 2013a).
When an instrument does not give reliable readings, it may be difficult to distinguish genuine
changes in what we are seeking to measure from changes in readings that are an artefact of the
unreliability of the instrument.

In educational research, it may be quite difficult to test the reliability of an instrument such as an
attitude scale or a knowledge test by simply undertaking repeated readings because human
beings are constantly changing due to experiences between instrument administrations, and also
because they may undergo changes due to the experience of the measurement process itself.

So, a student may answer a set of questions, and that very activity may set in chain thinking
processes that lead to new insights or further integration of knowledge. A day, week, or month
later, the student may answer the same questions differently for no other reason than that
responding to the original test provided a learning experience.

The present article takes the form of a methodological critique, focused on one measure
commonly associated with instrument reliability in science education research (Cronbach’s
alpha).

It is common to see the reliability of instruments used in published science education studies
framed in terms of a statistic known as Cronbach’s alpha (Cronbach, 1951). Cronbach’s alpha
has been described as ‘one of the most important and pervasive statistics in research involving
test construction and use’ (Cortina, 1993, p. 98), to the extent that its use in research with multiple-item measurements is considered routine (Schmitt, 1996, p. 350). Alpha is commonly reported
for the development of scales intended to measure attitudes and other affective constructs.
However, the literature also includes reports of the development of tests of student knowledge
and understanding that cite Cronbach’s alpha as an indicator of instrument quality.

The most common descriptors were (separately or together) reliability or internal consistency. Twelve articles used both of these terms, including one paper that described alpha in terms of internal consistency, reliability, and also discriminating power:

In this study, Cronbach's coefficient alpha was used to calculate the internal consistency coefficients
of the items included in the questionnaire through a pilot study with 42 science teachers. Results
of the reliability analysis showed that the items in the six scales had a satisfactory discriminating
power. (Mansour, 2015, p. 1773).

The common notion of there being a threshold of acceptability for alpha values, if only as a rule of
thumb (Plummer & Tanis Ozcelik, 2015), was not always seen as implying that lower values
of alpha should be taken as indicating an unsatisfactory instrument. Griethuijsen et al. (2014)
reported a cross-national study looking at student interests in science where “several of the values
calculated for Cronbach’s alpha are below the acceptable values of 0.7 or 0.6” (p.588).

Alpha is then widely used by authors in science education to represent the reliability, or the
internal consistency, of an instrument or an instrument scale in relation to a particular sample or
subsample of a population. These terms are often seen as synonymous in relation to alpha, and
a number of alternative terms are also associated with alpha values cited in science education. A
value of around 0.70 or greater is widely considered desirable (although characterisation of the
qualitative merits of specific values seems highly variable between studies). The examples
reviewed here show that alpha values of 0.7 or above can be achieved even when an instrument
is exploring multiple constructs or testing for several different aspects of knowledge or
understanding. As seen above, acceptable values of alpha may be reported even when an instrument
includes items of high difficulty that few students can correctly answer, or items that are
considered to be only loosely related to each other, or when items are found to be problematic in
terms of their loading on factors associated with the particular constructs they are intended to
elicit (that is, when items may not clearly belong in the scale or test section they are designated
to be part of).

Cronbach was concerned with having a measure of reliability for a test or instrument which could
be obtained from a single administration, given the practical difficulties (referred to earlier)
in obtaining test-retest data; he distinguished the latter as a matter of test stability (lack of change
over time) as opposed to tests like Cronbach’s alpha that offered measures of equivalence
(whether different sets of test items would give the same measurement outcomes). The approach
used to test equivalence was based on dividing the items in an instrument into two groups, and
seeing whether analysis of the two parts gave comparable results.

So, alpha reflects the extent to which different subsets of test items would produce similar
measures. Cronbach suggested that alpha “reports how much the test score depends upon
general and group, rather than item specific, factors” (p.320).
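Read this way, alpha is straightforward to compute from a respondents-by-items score matrix. The sketch below is a minimal, illustrative implementation; the simulated data, sample size, and item count are invented for the example and are not drawn from any of the studies cited here:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents, k_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # per-item sample variances
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated example: five items all driven by one common trait plus item noise
rng = np.random.default_rng(0)
trait = rng.normal(size=(200, 1))
items = trait + 0.8 * rng.normal(size=(200, 5))
print(round(cronbach_alpha(items), 3))
```

Because the five simulated items share a single common factor, alpha comes out high; replacing the shared trait with independent noise per item would drive it toward zero.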

“alpha is maximised when every item in a scale shares common variance with at least some
other items in the scale” (p. 286, emphasis in original). Gardner highlighted how, when using a
ratings scale where a total score was obtained by summing the responses across items, it is
important that all the items reflect the same construct. That is, the scale needs to be unidimensional
to provide an “interpretable” result, as ‘a score obtained from a measuring scale ought to indicate
the “amount” of the construct being measured’.

Cortina (1993) noted over 20 years ago that it was common for authors to assume that
demonstrating that alpha was greater than 0.70 was sufficient to consider no further scale
development was needed, leading to the statistic simply being presented in studies without further
interpretation.

9- Trinchera, L., Marie, N. & Marcoulides, G. A. (2018). A distribution free interval estimate for
Coefficient Alpha. Structural Equation Modeling: A Multidisciplinary Journal, Vol. 25(6), pp.
876-887. DOI: https://doi.org/10.1080/10705511.2018.1431544.

The use of multicomponent measurement instruments (such as tests, scales, self-reports,
questionnaires, or surveys) is very popular in the biomedical, behavioral, educational, managerial,
and social sciences. One reason for this popularity is due to their capacity to provide multiple
converging pieces of information about underlying constructs or latent variables of substantive
interest.

A key aspect of much of this work has been devoted to the concept of reliability, particularly with
regards to point and interval estimation of instrument reliability. To date, Cronbach’s (1951)
coefficient alpha appears to be the most frequently reported index for assessing the reliability of a
scale.

Point and interval estimation of instrument reliability continues to be generally determined using
point and interval estimates of coefficient alpha (Raykov, West, & Traynor, 2015).

The sampling distribution of coefficient alpha has also been extensively discussed in the literature
(Feldt, 1965; Feldt & Brennan, 1989; Kristof, 1963; Lord, 1974; Raykov & Marcoulides, 2015) with a
significant number of these studies having also considered the determination of its confidence
intervals (CIs) (Duhachek & Iacobucci, 2004; Raykov & Marcoulides, 2015).

Therefore, under these circumstances (which are empirically examinable/testable; see Raykov
& Marcoulides, 2017), point and interval estimation of the scale reliability coefficient can be
accomplished in practical terms by point and interval estimation of coefficient alpha (see Raykov et
al., 2015). Conventionally, a scale is considered to exhibit good psychometric properties
if the associated coefficient alpha is at least 0.7 and its CI is narrow.

Equations (10) and (11) can also be used to build a test statistic on the value of coefficient alpha.
This test statistic would be useful in cases in which a researcher is interested in testing whether
coefficient alpha is significantly greater than (or equal to) a given specified value (e.g., the
common threshold value of 0.7, frequently expected for a scale to exhibit good psychometric
properties). To accomplish this activity, let us consider the following null (H0) and alternative
hypotheses (H1)

Reliability is commonly examined in order to assess the measurement quality of scales. To date,
Cronbach’s coefficient alpha is the most commonly reported index of measurement quality for
assessing scale reliability.

10- Vargas-Fernández, T. & Cuesta-Santos, A. (2018). Las competencias para el turismo
sostenible. Su determinación empírica. Ingeniería Industrial, 39(3), pp. 226-236.

The reliability analysis is carried out using the Cronbach's alpha coefficient. The results
correspond to the analysis of this coefficient in the questionnaires administered to the managers
and workers of the selected organisations in Viñales, to assess their needs for sustainable-tourism
competencies. Managers and workers had to rate the option they considered best matched their
competencies on a scale of 1 to 5, where (1) was very low, (2) low, (3) medium, (4) high and
(5) very high.

With the application of the diagnostic questionnaire, 61 variables were measured, corresponding to
the dimensions of the generic and specific competencies determined by the experts, and five
measurements were carried out. The Cronbach's alpha obtained is 0.960 for the CG, 0.976 for the
CED and 0.970 for the CET, which evidences excellent reliability by showing that there is internal
consistency among the data studied. Although there is no single standard, reliability coefficients
around 0.9 are generally considered excellent, values around 0.8 very good, values around 0.7
adequate, and values above 0.6 acceptable [18].

11- Shelby, L.B. (2011) Beyond Cronbach's Alpha: Considering Confirmatory Factor Analysis and
Segmentation, Human Dimensions of Wildlife, Vol. 16(2), pp. 142-148. DOI:
https://doi.org/10.1080/10871209.2011.537302

Measurement reliability refers to the consistency of responses to a set of questions (variables)
designed to measure a given concept (Vaske, 2008). Cronbach (1951) introduced coefficient
alpha (i.e., Cronbach’s alpha) as a test of this internal structure. Cronbach’s alpha measures the
extent to which item responses correlate with each other. In other words, the statistic estimates
the proportion of variance that is systematic or consistent in a set of survey responses.
Cronbach’s alpha is now frequently used in the human dimensions literature as a panacea for
reporting the internal consistency of a multiple item scale.

Statisticians recommend that whenever Cronbach’s alpha is reported, corrected item-total
correlations should be required (Schmitt, 1996). The corrected item-total correlations should be
greater than .40 for an acceptable scale (Vaske, 2008).
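As an illustrative sketch (not code from Vaske or Schmitt), corrected item-total correlations remove the focal item from the total before correlating, so an item is never correlated with a sum that contains itself; the simulated data below are invented for the example:

```python
import numpy as np

def corrected_item_total(scores: np.ndarray) -> np.ndarray:
    """Correlation of each item with the sum of the *remaining* items."""
    total = scores.sum(axis=1)
    return np.array([
        np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
        for j in range(scores.shape[1])
    ])

# Simulated example: items sharing one construct should all clear the .40 bar
rng = np.random.default_rng(0)
items = rng.normal(size=(200, 1)) + 0.8 * rng.normal(size=(200, 5))
flagged = corrected_item_total(items) < 0.40
print(flagged)   # True marks an item failing the .40 rule of thumb
```

An item that fails the .40 rule of thumb is a candidate for removal or rewording before the scale is finalised.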

The inter-item correlations for the items in each of the four emotion scales ranged from .47 to .76.
The alpha if item deleted results did not indicate that any of the items should be removed from
the scales. These findings provide justification for combining all of the items associated with a
given emotional component into the four hypothesized scales: (a) sympathy for those negatively
impacted by wolves (i.e., ranchers), (b) sympathy for wolves, (c) anger felt about the presence of
wolves, and (d) fear of wolves.

The initial reliability results demonstrated that the Cronbach’s alphas and associated statistics for
the entire sample were acceptable. The Cronbach’s alphas for each of the four emotion concepts
were greater than .80, and all of the inter-item correlations were greater than .40. The results did
not indicate that any of the items should be deleted to increase the value of Cronbach’s alpha.
The typical conclusion in the human dimensions literature would be that internal consistency had
been shown and that all items associated with a given emotions concept could be combined into
their respective scales for further analysis.

12- Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s
alpha. Psychometrika, 74(1), pp. 107-120. DOI: https://doi.org/10.1007/S11336-008-9101-0

Schmitt (1996) distinguishes internal consistency from homogeneity and claims that internal
consistency refers to the interrelatedness of a set of items, and homogeneity to the
unidimensionality of a set of items.

A single number—alpha—that expresses both reliability and internal consistency—conceived of
as an aspect of validity that suggests that items “measure the same thing”—is a blessing for the
assessment of test quality. In the meantime, alpha “only” is a lower bound to the reliability and
not even a realistic one.

13- Ritter, N. L. (2010). Understanding a widely misunderstood statistic: Cronbach's alpha. Online
Submission.

When items are perfectly correlated, there is perfect internal consistency between the item scores.
Accordingly, alpha = 1 (within rounding error) when items are perfectly correlated.

When items are perfectly correlated but have mixed signs, the sum of the item variances will be
greater than the total score variance. When the sum of the individual item variances is greater than
the total score variance, internal consistency is non-existent between the item scores; therefore the
items are measuring different concepts. In general, as items become more correlated, shared variance
increases, increasing internal consistency and therefore the magnitude of the alpha
coefficient.
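Ritter's two limiting cases can be checked numerically. The small simulation below is illustrative only; note that equal item variances are assumed in the perfectly correlated case, since alpha also depends on how variance is distributed across items:

```python
import numpy as np

def cronbach_alpha(scores):
    k = scores.shape[1]
    return (k / (k - 1)) * (1 - scores.var(axis=0, ddof=1).sum()
                            / scores.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(1)
base = rng.normal(size=200)

# Perfectly correlated items (identical scores up to a constant shift)
perfect = np.column_stack([base, base + 1.0, base - 3.0])
print(cronbach_alpha(perfect))          # 1.0 within rounding error

# Alpha grows as items become more correlated (less item-specific noise)
alphas = [cronbach_alpha(base[:, None] + s * rng.normal(size=(200, 3)))
          for s in (2.0, 1.0, 0.5)]
print([round(a, 3) for a in alphas])
```

Shrinking the noise multiplier raises the inter-item correlations, and the computed alphas rise with them, matching Ritter's description.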


14- Streiner, D. L. (2003). Starting at the Beginning: An Introduction to Coefficient Alpha and
Internal Consistency. Journal of Personality Assessment, Vol. 80(1), pp. 99-103. DOI:
https://doi.org/10.1207/S15327752JPA8001_18
In other words, even though a scale may consist of two or more independent constructs, alpha could
be substantial as long as the scale contains enough items. The bottom line is that a high value
of alpha is a prerequisite for internal consistency, but does not guarantee it; long, multidimensional
scales will also have high values of alpha.

Scales of over 20 items or so will have acceptable values of alpha, even though they may consist of
two or three orthogonal dimensions. It is necessary to also examine the matrix of correlations of the
individual items and to look at the item-total correlations. In this vein, Clark and Watson (1995)
recommended a mean interitem correlation within the range of .15 to .20 for scales that measure
broad characteristics and between .40 and .50 for those tapping narrower ones.
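Streiner's warning is easy to reproduce: a long scale built from two completely orthogonal constructs can still post a "respectable" alpha, while the mean inter-item correlation gives the game away. The simulation below is a hypothetical illustration, not Streiner's own example:

```python
import numpy as np

def cronbach_alpha(scores):
    k = scores.shape[1]
    return (k / (k - 1)) * (1 - scores.var(axis=0, ddof=1).sum()
                            / scores.sum(axis=1).var(ddof=1))

def mean_interitem_corr(scores):
    """Average of the off-diagonal entries of the item correlation matrix."""
    r = np.corrcoef(scores, rowvar=False)
    k = r.shape[0]
    return (r.sum() - k) / (k * (k - 1))

# Two independent (orthogonal) constructs, 10 items each, summed as one 20-item "scale"
rng = np.random.default_rng(2)
t1, t2 = rng.normal(size=(200, 1)), rng.normal(size=(200, 1))
scale = np.hstack([t1 + 0.8 * rng.normal(size=(200, 10)),
                   t2 + 0.8 * rng.normal(size=(200, 10))])

print(round(cronbach_alpha(scale), 3))       # high despite two dimensions
print(round(mean_interitem_corr(scale), 3))  # dragged down by zero cross-construct correlations
```

The alpha clears the conventional 0.7 bar even though half of the item pairs are uncorrelated, which is exactly why the inter-item correlation matrix needs to be inspected alongside alpha.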

