
ALEXANDER (0755084)

DWI KURNIASIH (0755079)


EVA SUSANTI (0755072)
NOVYKO DWI AULLIAH (0755073)
RISKI ANDESPA (0755042)
WAHYU WIDIA PUTRA (0755090)
SRI WAHYUMI (0655023)
MARDIAN GUSTIANA (0655014)
LASMINOVA CHOLIS (0655085)
 Measurement specialists have long
recognized that the examination of reliability
depends upon our ability to distinguish the
effects (on test scores) of the abilities we
want to measure from the effects of other
factors.
 When we investigate reliability, it is
essential to keep in mind the distinction
between unobservable abilities, on the one
hand, and observed test scores, on the
other.
 Classical true score (CTS) measurement
theory consists of a set of assumptions about the
relationship between actual, or observed,
test scores and the factors that affect these
scores.
 Another concept that is part of CTS theory is
that of parallel tests. In order for two tests
to be considered parallel, we assume that
they are measures of the same ability, that is,
that an individual’s true score on one test
will be the same as his true score on the
other.
 The definitions of true score and error score
variance given above are abstract, in the
sense that we cannot actually observe the
true and error scores for a given test.
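The definitions referred to here are not reproduced on these slides. As a reminder, and assuming the standard CTS notation (the symbols below are conventional labels, not taken from the slides), the core relationships can be sketched as:

```latex
% Observed score = true score + error score
x = x_t + x_e
% Observed score variance = true score variance + error score variance
s_x^2 = s_t^2 + s_e^2
% Reliability = proportion of observed score variance that is true score variance
r_{xx'} = \frac{s_t^2}{s_x^2}
```

Because x_t and x_e cannot be observed directly, r_xx' has to be estimated indirectly, which is what the approaches that follow are designed to do.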
 In any given test situation, there will probably be more
than one source of measurement error. There are three
approaches to estimating reliability, each of which
addresses different sources of error. Internal
consistency estimates are concerned primarily with
sources of error from within the test and scoring
procedures, stability estimates indicate how consistent
test scores are over time, and equivalence estimates
provide an indication of the extent to which
scores on alternate forms of a test are equivalent. The
estimates of reliability that these approaches yield are called
reliability coefficients.
 One approach to examining the internal
consistency of a test is the split-half method,
in which we divide the test into two halves
and then determine the extent to which
scores on these two halves are consistent with
each other. In so doing, we are treating the
halves as parallel tests, and so we must make
certain assumptions about the equivalence of
the two halves, specifically that they have
equal means and variances.
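A minimal sketch of the split-half method in Python follows. It assumes an odd-even split of the items and applies the Spearman-Brown correction to step the half-test correlation up to full test length; neither choice is stated on the slide, and the function names are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate full-length reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores[i][j] is the score of person i on item j.

    Assumption: the halves are formed by an odd-even split of the items.
    """
    first_half = [sum(row[0::2]) for row in item_scores]
    second_half = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson(first_half, second_half))
```

The odd-even split is only one way to form the halves; whichever split is used, the estimate is meaningful only to the extent that the two halves really do have equal means and variances.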
 Generalizability coefficients
The G-theory analog of the CTS-theory reliability coefficient is the generalizability
coefficient, which is defined as the proportion of observed score variance that is universe
score variance:
$$\rho^2_{xx'} = \frac{s^2_p}{s^2_x}$$
s^2_p : universe score variance
s^2_x : observed score variance
 Estimation
One statistical procedure that can be used for estimating the relative effects of different
sources of variance on test scores is the analysis of variance. We can represent the
variance in test scores in terms of the different facets in our G-study design:
$$s^2_x = s^2_p + s^2_f + s^2_r$$
s^2_x : the variance in all the test scores
s^2_p : the universe, or person, score variance
s^2_f : the variance due to differences in question forms
s^2_r : the variance due to differences in raters
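As a rough illustration of how the variance decomposition feeds the generalizability coefficient, the sketch below takes the three variance components as given (in practice they would be estimated from an analysis of variance) and returns the proportion of observed score variance that is universe score variance. The function and parameter names are hypothetical.

```python
def generalizability_coefficient(var_persons, var_forms, var_raters):
    """Proportion of observed score variance that is universe score variance.

    Assumption: observed variance is the simple sum of the person, form,
    and rater components, as in the decomposition above.
    """
    var_observed = var_persons + var_forms + var_raters
    if var_observed <= 0:
        raise ValueError("observed score variance must be positive")
    return var_persons / var_observed
```

For example, with a person variance of 6.0 and form and rater variances of 1.0 each, the observed variance is 8.0 and the coefficient is 6.0 / 8.0 = 0.75.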
 The estimate of the SEM indicates how far a
given obtained score is from its true score,
and provides a means for estimating the
accuracy of a given obtained score as an
indicator of an individual’s true score. A
band score is a straightforward application
of the confidence interval discussed above. A
band score is the score range around an
observed score that is defined by a given
confidence interval of the SEM.
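The slide does not give the formula, so the sketch below assumes the usual CTS estimate, SEM = s_x * sqrt(1 - r_xx'), and builds the band score as the observed score plus or minus z times the SEM. The default z = 1.96 corresponds to a 95% interval under the assumption of normally distributed errors; all names are illustrative.

```python
from math import sqrt


def standard_error_of_measurement(sd_observed, reliability):
    """SEM from the observed score standard deviation and a reliability estimate."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_observed * sqrt(1.0 - reliability)


def band_score(observed, sem, z=1.96):
    """Score range around an observed score for a given confidence level."""
    return (observed - z * sem, observed + z * sem)
```

With a standard deviation of 10 and a reliability of 0.84, the SEM is about 4 score points, so an observed score of 60 has a 95% band of roughly 52 to 68.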
A. The Unidimensionality Assumption
B. Item Characteristic curve
 The degree to which the item discriminates
among individuals of differing levels of ability
 The level of difficulty of the item
 The probability that an individual of low ability
can answer correctly
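These three item characteristics correspond to the parameters of a three-parameter item characteristic curve. The sketch below assumes the common logistic form without a scaling constant; the parameter names a (discrimination), b (difficulty), c (low-ability success probability) and theta (ability) are conventional labels, not taken from the slides.

```python
from math import exp


def icc_3pl(theta, a, b, c):
    """Probability of a correct response under a three-parameter logistic model.

    theta: ability level of the individual
    a: discrimination, b: difficulty, c: probability of success at very low ability
    """
    return c + (1.0 - c) / (1.0 + exp(-a * (theta - b)))
```

When theta equals b, the probability lies halfway between c and 1; as theta becomes very low, the probability approaches c.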
 There are a number of limitations on both
CTS theory and G-theory with respect to
estimating the precision of measurement,
only two of which will be discussed here.
First, estimates of both reliability and
generalizability, and the standard errors of
measurement associated with them, are
sample dependent, so that scores from the
same test administered to different groups of
test takers may differ in their reliabilities.
 All of the approaches to reliability discussed
thus far have been developed within frameworks
that operationally define level of ability
(true score, universe score, IRT ability
parameter) as an average of an indefinitely
large number of measures.
 Although NR reliability estimates are
inappropriate for CR test scores, it is not the
case that reliability is of no concern in such
tests. On the contrary, consistency, stability
and equivalence are equally important for CR
tests. However, they take on different
aspects in the CR context, and therefore
require different approaches to both
estimation and interpretation.
 Cut-off score
 Mastery level and cut-off score
 Classification errors
 Estimating the dependability of
mastery/nonmastery classifications
 Several agreement indices that treat all
misclassifications as equally serious have been
developed. The most easily understood
approach is simply to give the test twice,
and then compute the proportion of
individuals who are consistently classified as
masters and nonmasters on both tests.
$$p_o = \frac{n_m}{N} + \frac{n_n}{N} = \frac{n_m + n_n}{N}$$
p_o : the agreement coefficient
n_m : the number of individuals classified as
masters on both test administrations
n_n : the number of individuals classified
as nonmasters on both test administrations
N : the total number of individuals who took
the test twice
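The agreement coefficient can be computed directly from two sets of scores and a cut-off. The sketch below assumes that a score at or above the cut-off counts as mastery, a convention the slides do not specify; the function name is illustrative.

```python
def agreement_index(first_scores, second_scores, cut_off):
    """Proportion of individuals classified the same way on both administrations.

    Assumption: a score greater than or equal to cut_off means mastery.
    """
    if len(first_scores) != len(second_scores) or not first_scores:
        raise ValueError("need two non-empty score lists of equal length")
    pairs = [(a >= cut_off, b >= cut_off)
             for a, b in zip(first_scores, second_scores)]
    n_masters = sum(1 for a, b in pairs if a and b)            # masters both times
    n_nonmasters = sum(1 for a, b in pairs if not a and not b)  # nonmasters both times
    return (n_masters + n_nonmasters) / len(pairs)
```

For instance, if five test takers score 80, 55, 70, 40, 65 on the first administration and 75, 62, 68, 45, 58 on the second, with a cut-off of 60, two are masters both times and one is a nonmaster both times, so the coefficient is 3/5 = 0.6.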
G theory :
$$\Phi(c_o) = 1 - \frac{1}{k - 1} \cdot \frac{\bar{x}_p (1 - \bar{x}_p) - s^2_p}{(\bar{x}_p - c_o)^2 + s^2_p}$$
\bar{x}_p : the mean of the proportion scores
s^2_p : the variance of the proportion scores
k : the number of items in the test
c_o : the cut-off score
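A minimal sketch of this dependability index follows, assuming the form 1 minus [1 / (k - 1)] times the ratio of [mean(1 - mean) - variance] to [(mean - cut-off)^2 + variance], with all scores expressed as proportions correct. The function takes the summary statistics as inputs rather than raw scores, and its name and parameter names are hypothetical.

```python
def phi_dependability(mean_prop, var_prop, n_items, cut_off):
    """Dependability of mastery decisions at a given cut-off score.

    mean_prop: mean of the proportion scores
    var_prop: variance of the proportion scores
    n_items: number of items in the test (must exceed 1)
    cut_off: cut-off score, expressed as a proportion
    """
    if n_items < 2:
        raise ValueError("the test needs at least two items")
    numerator = mean_prop * (1.0 - mean_prop) - var_prop
    denominator = (mean_prop - cut_off) ** 2 + var_prop
    if denominator == 0:
        raise ValueError("index is undefined when variance is zero and mean equals the cut-off")
    return 1.0 - (1.0 / (n_items - 1)) * (numerator / denominator)
```

With 21 items, a mean proportion score of 0.8, a variance of 0.04 and a cut-off of 0.6, the index is 1 - (1/20)(0.12 / 0.08) = 0.925. The index rises as the cut-off moves away from the group mean.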
 The effect of systematic measurement error
1. The general effect of systematic error is constant
for all observations; it affects the scores of
all individuals who take the test.
2. The specific effect varies across individuals; it
affects different individuals differentially.
 The effect of test method
Two problems:
1. A dilemma in choosing the type of error we
want to minimize
2. Ambiguity in the inferences we can make from
test scores.
