The Five Principles of Language Assessment


Teachers need to consider five principles of language assessment when they
create assessments (Brown & Abeywickrama, 2010). These principles, which are all
of equal importance, may be used to evaluate a designed assessment:
1. PRACTICALITY
Practicality refers to evaluating the assessment according to cost, time needed,
and usefulness. This principle is important for classroom teachers.
An effective test is practical. This means that it:
is not excessively expensive.
A test that is prohibitively expensive is impractical.
stays within appropriate time constraints.
A test of language proficiency that takes a student 10 hours to complete
is impractical.
is relatively easy to administer.
A test that takes a student a few minutes to complete but requires several
hours of an examiner's time to evaluate is impractical for most classroom
situations.
has a scoring/evaluation procedure that is specific and time-efficient.
A test that can be scored only by computer is impractical if it takes place
a thousand miles away from the nearest computer.
(Brown, 2004).
In addition, Brown and Abeywickrama (2010) have explained the attributes of
practical tests as follows:
a practical test stays within budgetary limits
can be completed by the test-taker within appropriate time constraints
has clear directions for administration
appropriately utilizes available human resources
does not exceed available material resources
considers the time and effort involved for both design and scoring
(Brown & Abeywickrama, 2010).

Furthermore, for a test to be practical:


administrative details should be clearly established before the test,
students should be able to complete the test reasonably within the set time
frame,
all materials and equipment should be ready,
the cost of the test should be within budgeted limits, and
the scoring/evaluation system should be feasible within the teacher's time frame.

2. RELIABILITY
Reliability means that the assessment is consistent and dependable (Brown &
Abeywickrama, 2010): the same students would achieve the same score no matter
when the test is administered or who scores it. Brown and Abeywickrama (2010)
have summarized the features of this principle as follows:
a reliable test:
is consistent in its conditions across two or more administrations
gives clear directions for scoring/evaluation
has uniform rubrics for scoring/evaluation
lends itself to consistent application of those rubrics by the scorer
contains items/tasks that are unambiguous to the test-taker (Brown &
Abeywickrama, 2010).
To make the test reliable, especially for subjective and open-ended
assessments, it is important to write scoring procedures clearly and to train teachers to
be able to score the assessment correctly (Linville, 2011, Unit 2, p. 11).
Factors affecting reliability are (Heaton, 1975: 155-156; Brown, 2004: 21-22):
1. Student-related reliability: personal factors such as motivation,
illness, or anxiety can keep students from showing their real performance.
2. Rater/scorer reliability: subjectivity, error, or bias, whether within one
rater (intra-rater) or between raters (inter-rater), can distort scores.
3. Test administration reliability: when the same test is administered on
different occasions, the conditions can produce different results. One
example is a test of aural comprehension delivered by tape recorder: when
the tape recorder played the items, the students sitting next to the windows
could not hear the tape accurately because of the street noise outside the
building.
4. Test reliability: this concerns the duration of the test and the test
instructions. If a test takes a long time to complete, fatigue, confusion,
or exhaustion may affect the test-takers' performance, and some test-takers
do not perform well on timed tests. Test instructions must be clear to all
test-takers, since they are already under mental pressure.
Several methods are employed to establish the reliability of an assessment
(Heaton, 1975: 156; Weir, 1990: 32; Gronlund and Waugh, 2009: 59-64). They are:
1. Test-retest method: the same test is administered again after a lapse of
time. The two sets of scores are then correlated.
2. Parallel-forms/equivalent-forms method: two equivalent tests are
administered at the same time to the same test-takers. The results of the
two tests are then correlated.
3. Split-half method: a test is divided into two halves, corresponding scores
are obtained for each half, and the extent to which the two halves correlate
with each other governs the reliability of the test as a whole (see the
sketch after this list).
4. Test-retest with equivalent forms: a combination of the test-retest and
parallel-forms methods. Two equivalent tests are administered to the same
test-takers on different occasions.
5. Intra-rater and inter-rater methods: employing one person to score the
same test at different times is called intra-rater reliability. Some hints
for minimizing unreliability are using a rubric, avoiding fatigue, scoring
the same item across all papers before moving to the next, and asking
students to write their names on the back of the test paper. When two people
each score the same set of test papers, it is inter-rater reliability; a
rubric must be developed and discussed first so that both raters share the
same perception. The two scores, whether intra- or inter-rater, are then
correlated.
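Most of these methods come down to correlating two sets of scores. The
following minimal sketch (in Python, using only the standard library; the
score lists are hypothetical, not from any real administration) illustrates
how a teacher might compute a test-retest or inter-rater correlation, and a
split-half estimate with the Spearman-Brown correction:

    import statistics

    def pearson(x, y):
        # Pearson correlation coefficient between two lists of scores.
        mx, my = statistics.mean(x), statistics.mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    # Test-retest (or parallel forms, or two raters): correlate the two
    # sets of scores obtained from the same ten students.
    first = [72, 85, 64, 90, 78, 69, 88, 75, 81, 59]
    second = [70, 88, 66, 92, 74, 71, 85, 77, 80, 62]
    print("test-retest estimate:", round(pearson(first, second), 2))

    # Split-half: correlate scores on the odd- and even-numbered items,
    # then apply the Spearman-Brown correction r_full = 2r / (1 + r),
    # because each half is only half as long as the whole test.
    odd_items = [38, 42, 30, 45, 40, 33, 44, 36, 41, 28]
    even_items = [34, 43, 34, 45, 38, 36, 44, 39, 40, 31]
    r = pearson(odd_items, even_items)
    print("split-half estimate:", round(2 * r / (1 + r), 2))

The closer these coefficients are to 1.0, the more consistent and dependable
the test can be considered.
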
3. VALIDITY
By far the most complex criterion of an effective test, and arguably the most
important principle, is validity: the extent to which inferences made from
assessment results are appropriate, meaningful, and useful in terms of the
purpose of the assessment. A valid test of reading ability actually measures
reading ability, not 20/20 vision, nor previous knowledge of a subject, nor
some other variable of questionable relevance. To measure writing ability,
one might ask students to write as many words as they can in 15 minutes and
then simply count the words for the final score. Such a test would be easy to
administer (practical), and the scoring quite dependable (reliable), but it
would not constitute a valid test of writing ability without some
consideration of the ideas expressed, among other factors.
How is the validity of a test established? There is no final, absolute
measure of validity, but several different kinds of evidence may be invoked
in its support. In some cases, it may be appropriate to examine the extent to
which a test calls for performance that matches that of the course or unit of
study being tested. In other cases, we may be concerned with how well a test
determines whether or not students have reached an established set of goals
or a given level of competence. Statistical correlation with other related
but independent measures is another widely accepted form of evidence. Other
concerns about a test's validity may focus on the consequences of a test
beyond measuring the criteria themselves, or even on the test-taker's
perception of validity. We will look at these five types of evidence below:
1. Content-related evidence
If a test actually samples the subject matter about which conclusions are to
be drawn, and if it requires the test-taker to perform the behavior that is
being measured, it can claim content-related evidence of validity, often
popularly referred to as content validity. You can usually identify
content-related evidence observationally if you can clearly define the
achievement that you are measuring.
Another way of understanding content validity is to consider the difference
between direct and indirect testing. Direct testing involves the test-taker
in actually performing the target task. In an indirect test, learners are not
performing the task itself but rather a task that is related in some way. For
example, if you intend to test learners' oral production of syllable stress
and your test task is to have learners mark (with written accent marks) the
stressed syllables in a list of written words, you could, with a stretch of
logic, argue that you are indirectly testing their oral production. A direct
test of syllable production would require that students actually produce the
target words orally.
The most feasible rule of thumb for achieving content validity in classroom
assessment is to test performance directly. Consider, for example, a
listening/speaking class that is doing a unit on greetings and exchanges that
includes discourse for asking for personal information (name, address,
hobbies, etc.) with some form-focus on the verb to be, personal pronouns, and
question formation. The test on the unit should include actual performance of
listening and speaking.
2. Criterion-related evidence
A second form of evidence of the validity of a test may be found in what is
called criterion-related evidence, also referred to as criterion-related
validity: the extent to which the criterion of the test has actually been
reached.
Criterion-related evidence usually falls into one of two categories:
concurrent and predictive validity. A test has concurrent validity if its
results are supported by other concurrent performance beyond the assessment
itself. For example, the validity of a high score on the final exam of a
foreign language course will be substantiated by actual proficiency in the
language. The predictive validity of an assessment becomes important in the
case of placement tests, admissions assessment batteries, language aptitude
tests, and the like. The assessment criterion in such cases is not to measure
concurrent ability but to assess (and predict) a test-taker's likelihood of
future success.
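Both categories are typically quantified by correlating test scores with the
external criterion. As an illustration only (the numbers are hypothetical,
and pearson is the helper function sketched in the reliability section
above), a teacher checking the predictive validity of a placement test might
proceed like this:

    # Hypothetical data: placement-test scores at entry and end-of-course
    # grades for the same ten students. A strong positive correlation is
    # evidence of predictive validity; a weak one suggests the placement
    # test does not forecast future success.
    placement = [55, 78, 62, 91, 70, 48, 84, 66, 73, 58]
    final_grades = [60, 82, 58, 95, 74, 52, 80, 70, 76, 55]
    print("predictive validity estimate:",
          round(pearson(placement, final_grades), 2))
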
3. Construct-related evidence
A third kind of evidence that can support validity, but that does not play as
large a role for classroom teachers, is construct-related evidence, commonly
referred to as construct validity. A construct is any theory, hypothesis, or
model that attempts to explain observed phenomena in our universe of
perception. Constructs may or may not be directly or empirically measurable;
their verification often requires inferential data. Proficiency and
communicative competence are linguistic constructs; self-esteem and
motivation are psychological constructs.
4. Consequential validity
As well as the three widely accepted forms of evidence above, two other
categories may be of some interest and utility in your own quest for
validating classroom tests. Several scholars underscore the potential
importance of the consequences of using an assessment. Consequential validity
encompasses all the consequences of a test, including such considerations as
its accuracy in measuring intended criteria, its impact on the preparation of
test-takers, its effect on the learner, and the (intended and unintended)
social consequences of a test's interpretation and use. In other words,
consequential validity is how well the use of assessment results accomplishes
the intended purposes and avoids unintended effects.
5. Face validity
An important facet of consequential validity is the extent to which students
view the assessment as fair, relevant, and useful for improving learning, or
what is popularly known as face validity. Face validity refers to the degree
to which a test looks right, and appears to measure the knowledge or
abilities it claims to measure, based on the subjective judgment of the
examinees who take it, the administrative personnel who decide on its use,
and other psychometrically unsophisticated observers.
Sometimes students do not know what is being tested when they take a test.
They may feel, for a variety of reasons, that a test is not testing what it
is supposed to test. Face validity means that the students perceive the test
to be valid. Face validity cannot be empirically tested by a teacher or even
by a testing expert, because it rests on the subjective judgment of the
examinees who take the test.

4. AUTHENTICITY
A fourth major principle of language testing is authenticity, a concept that
is a little slippery to define, especially within the art and science of
evaluating and designing tests. Bachman and Palmer define authenticity as the
degree of correspondence of the characteristics of a given language test task
to the features of a target language task, and then suggest an agenda for
identifying those target language tasks and for transforming them into valid
test items.
In a test, authenticity may be present in the following ways:
The language in the test is as natural as possible.
Items are contextualized rather than isolated.
Topics are meaningful (relevant, interesting) for the learner.
Some thematic organization to items is provided, such as through a story line
or episode.
Tasks represent, or closely approximate, real-world tasks.

5. WASHBACK
The effects of tests on teaching and learning are called washback. Washback
refers to the influence that the form and content of a test exert on what
happens in the classroom. Teachers must be able to create classroom tests
that serve as learning devices through which washback is achieved. Washback
enhances intrinsic motivation, autonomy, self-confidence, language ego,
interlanguage, and strategic investment in students. Instead of giving letter
grades and numerical scores, which tell students nothing about their
performance, giving generous and specific comments is a way to enhance
washback.
In large-scale assessment, washback generally refers to the effects a test
has on instruction in terms of how students prepare for the test. Cram
courses and "teaching to the test" are examples of such washback. Another
form of washback, which occurs more in classroom assessment, is the
information that "washes back" to students in the form of useful diagnoses of
strengths and weaknesses. Washback also includes the effects of an assessment
on teaching and learning prior to the assessment itself, that is, on
preparation for the assessment.
Informal performance assessment is by nature more likely to have built-in
washback effects because the teacher is usually providing interactive
feedback. Formal tests can also have positive washback, but they provide no
washback if the students receive a simple letter grade or a single overall
numerical score. Washback enhances a number of basic principles of language
acquisition: intrinsic motivation, autonomy, self-confidence, language ego,
interlanguage, and strategic investment, among others. One way to enhance
washback is to comment generously and specifically on test performance.
Washback implies that students have ready access to the teacher to discuss
the feedback and evaluation they have been given. Teachers can raise the
washback potential by asking students to use test results as a guide to
setting goals for their future effort.
Washback can be either positive or negative. Negative washback is easy to
find, such as narrowing language competencies down to only those involved in
tests and neglecting the rest: although language is a tool of communication,
most students and teachers in language classes focus only on the competencies
covered by the test. On the other hand, a test has positive washback if it
encourages better teaching and learning, although this is quite difficult to
achieve. An example of the positive washback of a test is the National
Matriculation English Test in China: after the test was administered,
students' proficiency in English for actual or authentic language-use
situations improved.

CAN THESE PRINCIPLES APPLY TO CLASSROOMS?


The five principles of practicality, reliability, validity, authenticity, and
washback can serve as guidelines for evaluating, step by step, an assessment
procedure in the classroom. Clearly, validity is the first priority to
consider, while practicality is of lesser importance. The guiding questions
below are based on the features of the five principles:

1. Are the test procedures practical?

Practicality is determined by the teacher's (and the students') time
constraints, costs, and administrative details, and to some extent by what
occurs before and after the test. To determine whether a test is practical
for your needs, you may want to use the checklist below:
Are administrative details clearly established before the test?
Can students complete the test reasonably within the set time frame?
Can the test be administered smoothly, without procedural glitches?
Are all materials and equipment ready?
Is the cost of the test within budgeted limits?
Is the scoring system feasible within the teacher's time frame?
Are methods for reporting results determined in advance?

As this checklist suggests, after you account for the administrative details
of giving the test, you need to think about the practicality of your plans
for scoring it.

2. Is the test itself reliable?


Reliability applies to both the test and the teacher, and at least four
sources of unreliability can be guarded against by making sure that all
students receive the same quality of input, whether written or auditory.
Part of achieving test reliability depends on the physical context, making
sure, for example, that:
Every student has a cleanly photocopied test sheet.
Sound amplification is clearly audible to everyone in the room.
Video input is equally visible to all.
Lighting, temperature, extraneous noise, and other classroom conditions are
equal (and optimal) for all students.
Objective scoring procedures leave little debate about the correctness of an
answer.

3. Does the procedure demonstrate content validity?


There are two steps to evaluating the content validity of a classroom test.
1. Are classroom objectives identified and appropriately framed? Underlying
every good classroom test are the objectives of the lesson, module, or unit
of the course in question. So the first measure of an effective classroom
test is the identification of objectives.
2. Are the lesson objectives represented in the form of test specifications?
The next content-validity issue that can be applied to a classroom test
centers on the concept of test specifications. Don't let this term scare
you: it simply means that a test should have a structure that follows
logically from the lesson or unit you are testing.
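For instance, a bare-bones specification for the greetings unit described
earlier might look like the following (a hypothetical sketch, not a
prescribed format):

    Unit: Greetings and exchanges of personal information
    Section A, Listening (10 minutes): respond to 5 recorded questions
      asking for name, address, and hobbies (objectives 1-2).
    Section B, Speaking (5 minutes per pair): role-play a first meeting,
      using the verb to be, personal pronouns, and question formation
      (objectives 3-4).
    Scoring: rubric with criteria for comprehension, accuracy of question
      forms, and fluency.

Each section maps back to a stated objective, which is exactly the structure
that content validity requires.
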

4. Is the procedure "biased for best"?

This question integrates the concept of face validity with the importance of
structuring an assessment procedure to elicit the optimal performance of the
student. Students will generally judge a test to be face valid if:

Directions are clear.
The structure of the test is organized logically.
Its difficulty level is appropriately pitched.
The test has no surprises.
Timing is appropriate.

A phrase that has come to be associated with face validity is "biased for
best," a term that goes a little beyond how the student views the test to a
degree of strategic involvement on the part of student and teacher in
preparing for, setting up, and following up on the test itself. To give an
assessment procedure that is biased for best, a teacher:
Offers students appropriate review and preparation for the test.
Suggests strategies that will be beneficial.
Structures the test so that the best students will be modestly challenged
and the weaker students will not be overwhelmed.

5. Are the test tasks as authentic as possible?


Evaluate the extent to which a test is authentic by asking the following
questions:
Is the language in the test as natural as possible?
Are items as contextualized as possible rather than isolated?
Are topics and situations interesting, enjoyable, and/or humorous?
Is some thematic organization provided, such as through a story line or
episode?
Do tasks represent, or closely approximate, real-world tasks?

Comparing two excerpts from tests, one with contextualized multiple-choice
tasks and one with decontextualized multiple-choice tasks, may make the
concept of authenticity a little clearer.

6. Does the test offer beneficial washback to the learner?


The design of an effective test should point the way to beneficial washback.
A test that achieves content validity demonstrates relevance to the
curriculum in question and thereby sets the stage for washback. When test
items represent the various objectives of a unit, and/or when sections of a
test clearly focus on major topics of the unit, classroom tests can serve in
a diagnostic capacity even if they are not specifically labeled as
diagnostic.


CONCLUSION:
1. A test is good if it demonstrates practicality, high reliability, sound
validity, authenticity, and positive washback.
2. The five principles provide guidelines for both constructing and
evaluating tests.
3. Teachers should apply these five principles when constructing or
evaluating the tests they will use in assessment activities.
