2. RELIABILITY
Reliability means that the assessment is consistent and dependable (Brown &
Abeywickrama, 2010); that is, the same score will be achieved by the same type
of students no matter when the test is scored or who scores it. Brown and
Abeywickrama (2010) summarize the features of this principle as follows:
a reliable test:
is consistent in its conditions across two or more administrations
gives clear directions for scoring/evaluation
has uniform rubrics for scoring/evaluation
lends itself to consistent application of those rubrics by the scorer
contains items/tasks that are unambiguous to the test-taker (Brown &
Abeywickrama, 2010).
To make a test reliable, especially for subjective and open-ended
assessments, it is important to write scoring procedures clearly and to train teachers to
score the assessment correctly (Linville, 2011, Unit 2, p. 11).
Factors affecting reliability are (Heaton, 1975: 155-156; Brown, 2004: 21-22):
1. Student-related reliability: students' personal factors, such as motivation,
illness, or anxiety, can hinder their real performance.
2. Rater/scorer reliability: either intra-rater or inter-rater inconsistency leads to
subjectivity, error, and bias when scoring tests.
3. Test administration reliability: when the same test is administered on
different occasions, it can produce different results. An example is a test of
aural comprehension delivered with a tape recorder. While the tape recorder
played the items, the students sitting next to the windows could not hear the
tape accurately because of the street noise outside the building.
4. Test reliability: this concerns the duration of the test and the test
instructions. If a test takes a long time to complete, fatigue, confusion, or
exhaustion may affect the test takers' performance, and some test takers do
not perform well on timed tests. Test instructions must be clear to all test
takers, since test takers are affected by mental pressure.
Several methods are employed to estimate the reliability of an assessment (Heaton,
1975: 156; Weir, 1990: 32; Gronlund and Waugh, 2009: 59-64). They are:
1. Test-retest method: the same test is administered again after a lapse of
time, and the two sets of scores are then correlated.
2. Parallel-forms/equivalent-forms method: two equivalent tests are
administered at the same time to the same test takers, and the results of the
two tests are then correlated.
3. Split-half method: a test is divided into two halves and corresponding
scores are obtained; the extent to which the two halves correlate with each
other governs the reliability of the test as a whole.
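The three methods above all reduce to correlating two sets of scores. As an illustration only (the scores and function names below are hypothetical, not from the source), a minimal sketch of the arithmetic: the plain Pearson correlation serves the test-retest and parallel-forms methods, while the split-half method additionally applies the Spearman-Brown correction, since each half is only half the length of the full test.

```python
# Illustrative sketch: estimating test reliability from two score lists.
# Pearson correlation covers test-retest / parallel-forms estimates;
# the Spearman-Brown correction adjusts a split-half correlation
# upward to estimate the reliability of the full-length test.
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def split_half_reliability(half_a, half_b):
    """Correlate the two halves, then apply the Spearman-Brown
    correction: r_full = 2r / (1 + r)."""
    r_half = pearson(half_a, half_b)
    return 2 * r_half / (1 + r_half)


# Hypothetical scores for five students on two test administrations.
first_administration = [12, 15, 9, 18, 14]
second_administration = [13, 14, 10, 17, 15]
print(round(pearson(first_administration, second_administration), 3))
```

Because a correlation between two half-tests understates the consistency of the whole test, the Spearman-Brown step is what distinguishes the split-half estimate from a simple test-retest correlation.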
itself but rather a task that is related in some way. For example, if you intend to test the
learners' oral production of syllable stress and your test task is to have learners
mark (with written accent marks) the stressed syllables in a list of written words, you
could, with a stretch of logic, argue that you are indirectly testing their oral production.
A direct test of syllable production would require that students actually
produce the target words orally.
The most feasible rule of thumb for achieving content validity in classroom
assessment is to test performance directly. Consider, for example, a listening/speaking
class that is doing a unit on greetings and exchanges that includes discourse for asking
for personal information (name, address, hobbies, etc.) with some form-focus on the
verb to be, personal pronouns, and question formation. The test on the unit should
include the actual performance of listening and speaking.
2. Criterion-Related Evidence
A second form of evidence of the validity of a test may be found in what is
called criterion-related evidence, also referred to as criterion-related validity, or the
extent to which the criterion of the test has actually been reached.
Criterion-related evidence usually falls into one of two categories: concurrent
and predictive validity. A test has concurrent validity if its results are supported by
other concurrent performance beyond the assessment itself. For example, the validity
of a high score on the exam of a foreign language course will be substantiated by
actual proficiency in the language. The predictive validity of an assessment becomes
important in the case of placement tests, admission assessment batteries, language
aptitude tests, and the like. The assessment criterion in such cases is not to measure
concurrent ability but to assess (and predict) a test-taker's likelihood of future success.
3. Construct-Related Evidence
A third kind of evidence that can support validity, but that does not play as
large a role for classroom teachers, is construct-related validity, commonly referred to
as construct validity. A construct is any theory, hypothesis, or model that attempts to
explain observed phenomena in our universe of perception. Constructs may or may
not be directly or empirically measurable; their verification often requires inferential
data. Proficiency and communicative competence are linguistic constructs; self-esteem
and motivation are psychological constructs.
4. Consequential Validity
As well as the above three widely accepted forms of evidence that may be
introduced to support the validity of an assessment, two other categories may be of
some interest and utility in your own quest for validating classroom tests. Several
scholars underscore the potential importance of the consequences of using an
assessment. Consequential validity encompasses all the consequences of a test,
including such
4. AUTHENTICITY
A fourth major principle of language testing is authenticity, a concept that is a
little slippery to define, especially within the art and science of evaluating and
designing tests. Bachman and Palmer define authenticity as the degree of
correspondence of the characteristics of a given language test task to the features of a
target language task, and then suggest an agenda for identifying those target language
tasks and for transforming them into valid test items.
In a test, authenticity may be present in the following ways:
Some thematic organization to items is provided, such as through a story line
or episode.
Tasks represent, or closely approximate, real-world tasks.
5. WASHBACK
The effects of tests on teaching and learning are called washback. As a
criterion for a test, washback refers to the influence of the form and the content of the
test on what happens in classrooms. Teachers must be able to create classroom tests
that serve as learning devices through which washback is achieved. Washback
enhances intrinsic motivation.
As this checklist suggests, after you account for the administrative details of
giving a test, you need to think about the practicality of your plans for scoring the test.
4.
This question integrates the concept of face validity with the importance of
structuring an assessment procedure to elicit the optimal performance of the
student. Students will generally judge a test to be face valid if:
the timing is appropriate
the test is structured so that the best students will be modestly challenged
and the weaker students will not be overwhelmed.
CONCLUSION:
1. A test is good if it demonstrates practicality, high reliability, good validity,
authenticity, and positive washback.
2. The five principles provide guidelines both for constructing and for evaluating
tests.
3. Teachers should apply these five principles when constructing or evaluating
tests to be used in assessment activities.