EVA SUSANTI (0755072)
NOVYKO DWI AULLIAH (0755073)
RISKI ANDESPA (0755042)
WAHYU WIDIA PUTRA (0755090)
SRI WAHYUMI (0655023)
MARDIAN GUSTIANA (0655014)
LASMINOVA CHOLIS (0655085)

Measurement specialists have long recognized that the examination of reliability depends upon our ability to distinguish the effects (on test scores) of the abilities we want to measure from the effects of other factors. When we investigate reliability, it is essential to keep in mind the distinction between unobservable abilities, on the one hand, and observed test scores, on the other.

Classical true score (CTS) measurement theory consists of a set of assumptions about the relationship between actual, or observed, test scores and the factors that affect these scores.

Another concept that is part of CTS theory is that of parallel tests. In order for two tests to be considered parallel, we assume that they are measures of the same ability, that is, that an individual's true score on one test will be the same as his or her true score on the other.

The definitions of true score variance and error score variance are abstract, in the sense that we cannot actually observe the true and error scores for a given test.

In any given test situation, there will probably be more than one source of measurement error. There are three approaches to estimating reliability, each of which addresses different sources of error. Internal consistency estimates are concerned primarily with sources of error from within the test and scoring procedures, stability estimates indicate how consistent test scores are over time, and equivalence estimates provide an indication of the extent to which scores on alternate forms of a test are equivalent. The estimates of reliability that these approaches yield are called reliability coefficients.
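The CTS assumptions summarized above are usually written as the short set of equations below. This is a sketch in standard notation that the slides themselves do not spell out: x is the observed score, x_t the true score, x_e the error score, and r_xx' the reliability coefficient. The variance decomposition assumes that true scores and error scores are uncorrelated.

```latex
x = x_t + x_e
s^2_x = s^2_t + s^2_e
r_{xx'} = \frac{s^2_t}{s^2_x} = 1 - \frac{s^2_e}{s^2_x}
```

Read this way, a reliability coefficient is the proportion of observed score variance that is true score variance, which is why it cannot be computed directly and has to be estimated by one of the three approaches.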
One approach to examining the internal consistency of a test is the split-half method, in which we divide the test into two halves and then determine the extent to which scores on these two halves are consistent with each other. In so doing, we are treating the halves as parallel tests, and so we must make certain assumptions about the equivalence of the two halves, specifically that they have equal means and variances.

Generalizability coefficients

The generalizability theory (G-theory) analog of the CTS-theory reliability coefficient is the generalizability coefficient, which is defined as the proportion of observed score variance that is universe score variance:

$\rho^2_{xx'} = \frac{s^2_p}{s^2_x}$
$s^2_p$ : universe score variance
$s^2_x$ : observed score variance

Estimation

One statistical procedure that can be used for estimating the relative effects of different sources of variance on test scores is the analysis of variance. We can represent the variance in test scores in terms of the different facets in our G-study design:

$s^2_x = s^2_p + s^2_f + s^2_r$
$s^2_x$ : the variance in all the test scores
$s^2_p$ : the universe, or person, score variance
$s^2_f$ : the variance due to differences in question forms
$s^2_r$ : the variance due to differences in raters

The estimate of the standard error of measurement (SEM) indicates how far a given obtained score is likely to be from its true score, and provides a means for estimating the accuracy of a given obtained score as an indicator of an individual's true score.

A band score is a straightforward application of the confidence interval discussed above. A band score is the score range around an observed score that is defined by a given confidence interval of the SEM.

A. The unidimensionality assumption
B. Item characteristic curve
1. The degree to which the item discriminates among individuals of differing levels of ability
2. The level of difficulty of the item
3. The probability that an individual of low ability can answer the item correctly

There are a number of limitations on both CTS theory and G-theory with respect to estimating the precision of measurement, only two of which will be discussed here. First, estimates of both reliability and generalizability, and the standard errors of measurement associated with them, are sample dependent, so that scores from the same test administered to different groups of test takers may differ in their reliabilities.

All of the approaches to reliability discussed thus far have been developed within frameworks that operationally define level of ability (true score, universe score, IRT ability parameter) as an average of an indefinitely large number of measures.

Although norm-referenced (NR) reliability estimates are inappropriate for criterion-referenced (CR) test scores, it is not the case that reliability is of no concern in such tests. On the contrary, consistency, stability and equivalence are equally important for CR tests. However, they take on different aspects in the CR context, and therefore require different approaches to both estimation and interpretation.
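The band score described earlier in this section can be illustrated with a short calculation. The sketch below assumes the usual CTS formula SEM = s_x * sqrt(1 - r_xx'), which the slides do not state explicitly, and assumes normally distributed errors so that a multiplier z defines the confidence interval. The function names and all numbers are hypothetical and chosen only for illustration.

```python
import math


def standard_error_of_measurement(sd_observed, reliability):
    """Return SEM = s_x * sqrt(1 - r_xx') (standard CTS formula, assumed here)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_observed * math.sqrt(1.0 - reliability)


def band_score(observed, sem, z=1.96):
    """Return the score range observed +/- z * SEM.

    z = 1.0 gives roughly a 68% band and z = 1.96 roughly a 95% band,
    assuming normally distributed measurement error.
    """
    half_width = z * sem
    return observed - half_width, observed + half_width


# Hypothetical test: observed-score SD of 15 points, reliability of 0.91.
sem = standard_error_of_measurement(15.0, 0.91)   # about 4.5 points
low, high = band_score(100.0, sem, z=1.0)          # about 95.5 to 104.5
print(f"SEM = {sem:.2f}, 68% band = {low:.1f} to {high:.1f}")
```

Because the SEM depends on the reliability estimate, it inherits the sample dependence noted above: the same observed score can receive a wider or narrower band when the test is administered to a different group.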
Cut-off score

Mastery level and cut-off score

Classification errors

Estimating the dependability of mastery/nonmastery classifications

Several agreement indices that treat all misclassifications as equally serious have been developed. The most easily understood approach is simply to give the test twice, and then compute the proportion of individuals who are consistently classified as masters and nonmasters on both tests:

$p_o = \frac{n_m}{N} + \frac{n_n}{N}$
$p_o$ : the agreement coefficient
$n_m$ : the number of individuals classified as masters on both test administrations
$n_n$ : the number of individuals classified as nonmasters on both test administrations
$N$ : the total number of individuals who took the test twice

G-theory:

$\Phi(c_o) = 1 - \frac{1}{k - 1} \left[ \frac{\bar{x}_p (1 - \bar{x}_p) - s^2_p}{(\bar{x}_p - c_o)^2 + s^2_p} \right]$
$\bar{x}_p$ : the mean of the proportion scores
$s^2_p$ : the variance of the proportion scores
$k$ : the number of items in the test
$c_o$ : the cut-off score

The effect of systematic measurement error
1. The general effect of systematic error is constant for all observations; it affects the scores of all individuals who take the test.
2. The specific effect varies across individuals; it affects different individuals differentially.

The effect of test method
Two problems:
1. A dilemma in choosing the type of error we want to minimize
2. Ambiguity in the inferences we can make from test scores
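Returning to the dependability of mastery/nonmastery classifications, the two indices defined earlier can be computed with a few lines of code. This is a minimal sketch, not a definitive implementation: the function names and the sample data are hypothetical, and the variance of the proportion scores is computed with an N denominator, which is an assumption because the slides do not say which denominator is intended.

```python
def agreement_index(n_masters_both, n_nonmasters_both, n_total):
    """p_o: proportion of test takers classified the same way on both administrations."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_masters_both + n_nonmasters_both) / n_total


def phi_cutoff(proportion_scores, k, cutoff):
    """Phi(c_o) = 1 - 1/(k-1) * [x(1-x) - s2] / [(x - c_o)^2 + s2].

    proportion_scores: each person's proportion of items correct (0 to 1)
    k: number of items in the test
    cutoff: cut-off score c_o, expressed as a proportion
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    n = len(proportion_scores)
    mean_p = sum(proportion_scores) / n
    # Assumption: population variance (divide by n, not n - 1).
    var_p = sum((p - mean_p) ** 2 for p in proportion_scores) / n
    denominator = (mean_p - cutoff) ** 2 + var_p
    if denominator == 0:
        raise ValueError("index undefined: no score variance and mean equals cut-off")
    numerator = mean_p * (1 - mean_p) - var_p
    return 1 - numerator / ((k - 1) * denominator)


# Hypothetical data: 100 people tested twice, 52 masters and 28 nonmasters both times.
p_o = agreement_index(52, 28, 100)               # 0.8
# Hypothetical 10-item test, three proportion scores, cut-off at 60% correct.
phi = phi_cutoff([0.5, 0.7, 0.9], k=10, cutoff=0.6)   # about 0.44
print(f"p_o = {p_o:.2f}, Phi(c_o) = {phi:.2f}")
```

Note the design difference between the two: the agreement index needs two administrations of the test, whereas the G-theory index is estimated from a single administration and grows as the cut-off score moves away from the group mean.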