CHAPTER I
INTRODUCTION
There are as many different tests of foreign language skills as there are
reasons for testing them. However, one thing that holds true for any test is that
there is no such thing as perfection. Human fallibility has a part to play there, but
it is also a result of the need to bear in mind certain principles when constructing a
test, principles which have within them certain contradictions, making it
impossible to design The Ultimate Test. The aim here is to set out the principles
that are used when the construction of language tests is under discussion, suggest
examples of how they can be applied, and point out the areas of conflict which
make test design so tricky. It should also be noted that while this entry will look at
these principles in relation only to language tests, they could well be applied to
many tests in other subjects. The difference tends to be that every field has its own
way of referring to and grouping the issues discussed here.
These and other questions help to identify five cardinal criteria for “testing
a test”: practicality, reliability, validity, authenticity, and washback. This paper
discusses three of these principles of language assessment, namely practicality,
reliability, and validity, since there is not enough space to treat all five in detail.
1. What is Practicality?
2. What is Validity?
3. What is Reliability?
C. Purposes of the Problem
CHAPTER II
CONCEPTUAL AND THEORETICAL FRAMEWORK
A. Practicality
1 Douglas H. Brown, Language Assessment: Principles and Classroom Practices (San Francisco: Longman, 2004), p. 19.
B. Reliability
1. Definition of reliability
A reliable test is consistent and dependable. If you give the same test to
the same student or matched students on two different occasions, the test
should yield similar results. The issue of the reliability of a test may be addressed by
considering a number of factors that may contribute to its unreliability.
Consider the following possible sources of fluctuation: in the student, in scoring, in the
test administration, and in the test itself.2
2 Brown, Language Assessment ..., pp. 20-21.
3 Jurnal Psikologi Universitas Diponegoro, Vol. 3, No. 1, June 2006.
3. Test administration reliability
Unreliability may also result from the conditions in which the test is
administered. I once witnessed the administration of a test of aural
comprehension in which a tape recorder played items for comprehension, but
because of street noise outside the building, students sitting next to windows
could not hear the tape accurately. This was a clear case of unreliability caused by
the conditions of the test administration. Other sources of unreliability are found in
photocopying variations, the amount of light in different parts of the room, variations
in temperature, and even the condition of the desks and chairs.5
4. Test Reliability
Sometimes the nature of the test itself can cause measurement errors. If a
test is too long, test-takers may become fatigued by the time they reach the later
items and hastily respond incorrectly. Timed tests may discriminate against
students who do not perform well on a test with a time limit. We all know people
(and you may be included in this category) who “know” the course material
perfectly but who are adversely affected by the presence of a clock ticking away.
Poorly written test items (those that are ambiguous or that have more than one correct
answer) may be a further source of test unreliability.6
C. Validity
1. Content validity
The first form of evidence relates to the content of the test. A test is said to
have content validity if its content constitutes a representative sample of the
language skills, structures, etc.
The test would have content validity only if it included a proper sample of
relevant structures. Just what are the relevant structures will depend, of course,
upon the purpose of the test.
What is the importance of content validity? First, the greater a test's content
validity, the more likely it is to be an accurate measure of what it is supposed to
measure. A test in which major areas identified in the specification are under-
represented, or not represented at all, is unlikely to be accurate. Secondly, such a test
is likely to have a harmful backwash effect. Areas that are not tested are likely to
become areas ignored in teaching and learning. For this reason, content validation
should be carried out while a test is being developed; it should not wait until the test
is already being used.8
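One informal way to make “representative sample” concrete is to tag each test item with the area of the specification it targets, and then check whether any specified area goes untested. The specification and item tags below are invented for illustration:

```python
from collections import Counter

# Hypothetical test specification: areas the test is meant to sample
specification = {"present simple", "past simple", "conditionals", "passives"}

# Hypothetical test items, tagged with the area each one targets
item_tags = ["present simple", "past simple", "past simple", "present simple"]

counts = Counter(item_tags)               # how often each area is tested
missing = specification - counts.keys()   # areas not represented at all

print(sorted(missing))  # → ['conditionals', 'passives']
```

Any area that appears in `missing` is unrepresented in the test, which, as noted above, both weakens content validity and invites that area to be ignored in teaching.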
2. Criterion-related validity
The second form of evidence of a test's validity relates to the degree
to which results on the test agree with those provided by some independent and
highly dependable assessment of the candidate's ability.
7 Arthur Hughes, Testing for Language Teachers, 2nd ed. (United Kingdom: Cambridge University Press, 2003), p. 26.
8 Hughes, Testing for ..., p. 27.
9 Hughes, Testing for ..., p. 27.
10 Hughes, Testing for ..., p. 27.
3. Construct validity
The word “construct” refers to any underlying ability (or trait) that is
hypothesized in a theory of language ability.11 A construct is any theory,
hypothesis, or model that attempts to explain observed phenomena in our universe
of perceptions. Constructs may or may not be directly or empirically measured;
their verification often requires inferential data.
4. Consequential Validity
5. Face Validity
11 Hughes, Testing for ..., p. 31.
12 Hughes, Testing for ..., p. 26.
13 Hughes, Testing for ..., p. 33.
it may mean that they do not perform on it in a way that truly reflects their ability.
Novel techniques, particularly those which provide indirect measures, have to be
introduced slowly, with care, and with convincing explanations.
In the development of a high stakes test, which may significantly affect the
lives of those who take it, there is an obligation to carry out a full validation
exercise before the test becomes operational.
First, write explicit specifications for the test which take account of all that
is known about the constructs that are to be measured. Make sure that you include
a representative sample of the content of these in the test.
Second, whenever feasible, use direct testing. If for some reason it is decided
that indirect testing is necessary, reference should be made to the research
literature to confirm that measurement of the relevant underlying constructs has
been demonstrated using the testing techniques that are to be employed (this may
often result in disappointment, another reason for favouring direct testing).
Third, make sure that the scoring of responses relates directly to what is
being tested.
14 Hughes, Testing for ..., p. 34.
CHAPTER III
CONCLUSION
A. Conclusion
Practicality refers to the need to ensure that the assessment requirements are
appropriate to the intended learning outcomes of a program, that in their
operation they do not distort the learning/training process, and that they do not
make unreasonable demands on the time and resources available to the learner,
teacher/trainer, and/or assessor.
A reliable test is consistent and dependable. If you give the same test to
the same student or matched students on two different occasions, the test
should yield similar results.