Testing Paper KLP 3

CHAPTER I
INTRODUCTION
A. Background of the Problem
The most important consideration in designing and developing a language
test is the use for which it is intended, so that the most important quality of a
test is its usefulness. Test usefulness provides a kind of metric by which we
can evaluate not only the test that we develop and use, but also all aspects of
test development and use. We thus regard a model of test usefulness as the
essential basis for quality control throughout the entire test development
process.
A fundamental concern in the development and use of language tests is to
identify potential sources of error in a given measure of communicative
language ability and to minimize the effect of these factors on that measure.
We must be concerned about errors of measurement, or unreliability,
because we know that test performance is affected by factors other than the
abilities we want to measure.
In this paper, the writers present a model of usefulness that includes five test
qualities: validity, reliability, practicality, impact, and authenticity.
The writers also discuss language use in language tests.
B. Limitation of the Problem

1. Reliability, validity, authenticity, impact and practicality
2. Describing tasks
CHAPTER II
THEORETICAL FRAMEWORK
A. Qualities of Language Test
The traditional approach to describing test qualities has been to discuss these as
more or less independent characteristics, emphasizing the need to maximize
them all. Language testers have been told that the qualities of reliability and
validity are essentially in conflict, or that it is not possible to design test tasks
that are authentic and at the same time reliable. A much more reasonable
position is that although there is a tension among the different test qualities,
this need not lead to the total abandonment of any. It is our view that rather
than emphasizing the tension among the different qualities, test developers
need to recognize their complementarity. The notion of usefulness can be
expressed as in the figure below.
Usefulness = Reliability + Construct Validity + Authenticity + Interactiveness + Impact + Practicality
Test usefulness can be described as a function of several different qualities,
all of which contribute in unique but interrelated ways to the overall
usefulness of a given test. The principles of test design and development
are as follows:
Principle 1: It is the overall usefulness of the test that is to be maximized,
rather than the individual qualities that affect usefulness.
Principle 2: The individual test qualities cannot be evaluated independently,
but must be evaluated in terms of their combined effect on the overall
usefulness of the test.
Principle 3: Test usefulness and the appropriate balance among the
different qualities cannot be prescribed in general, but must be determined for
each specific testing situation.
1. Validity
a. Definition of Validity
A test is valid if it tests what it is supposed to test. In other words, a test is said
to be valid if it measures accurately what it is intended to measure. Thus it is
not valid, for instance, to test writing ability with an essay question that
requires specialist knowledge of history or biology, unless it is known that all
students share this knowledge before they do the test. Validity is a concept
designating an ideal state, to be pursued but not to be attained.1 As the
roots of the word imply, validity has to do with truth, strength, and value.
Validity is like integrity or character: a quality to be assessed.
b. Types of Validity
1. Construct Validity
Construct validity refers to the question of whether the theory supported by
the findings provides the best available explanation of the results. Construct
validity focuses on the kind of test that is used to measure the ability. A
test, part of a test, or a testing technique is said to have construct validity if it
can be demonstrated that it measures just the ability which it is supposed to
measure.2 The word “construct” refers to any underlying ability (or trait)
which is hypothesized in a theory of language ability. Construct validation is a
research activity, the means by which theories are put to the test and are
confirmed, modified, or abandoned.
1 David Brinberg and Joseph E. McGrath, Validity and the Research Process (California: Sage Publications, 1984), p. 13.
2. Content Validity
A test is said to have content validity if its content constitutes a representative
sample of the language skills, structures, etc. with which it is meant to be
concerned.3 It is obvious that a grammar test, for instance, must be made up
of items testing knowledge or control of grammar. But this in itself does not
ensure content validity. The test would have content validity only if it included
a proper sample of the relevant structures. Content validity is important for
two reasons:
a. The greater a test’s content validity, the more likely it is to be an
accurate measure of what it is supposed to measure.
b. A test that lacks content validity is likely to have a harmful backwash
effect. Areas which are not tested are likely to become areas ignored in
teaching and learning.
3. Face Validity
A test is said to have face validity if it looks as if it measures what it is
supposed to measure. For example, a test which pretended to measure
pronunciation ability but which did not require the candidate to speak might
be thought to lack face validity. Face validity is hardly a scientific concept, yet
it is very important. A test which does not have face validity may not be
accepted by candidates, teachers, education authorities or employers.
2 Arthur Hughes, Testing for Language Teachers (New York: Cambridge University Press, 1989), p. 26.
3 Ibid., p. 22.

4. Criterion-Related Validity
Another approach to test validity is to see how far results on the test agree
with those provided by some independent and highly dependable assessment
of the candidate’s ability. This independent assessment is thus the criterion
measure against which the test is validated. There are essentially two kinds of
criterion-related validity: concurrent validity and predictive validity.
a. Concurrent Validity
Concurrent validity is established when the test and the criterion are
administered at about the same time.
b. Predictive Validity
Predictive validity concerns the degree to which a test can predict
candidates’ future performance.
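Agreement between a test and its criterion measure is commonly summarized as a correlation coefficient. The sketch below is a minimal illustration under that assumption, not a procedure taken from the sources cited here. The function name pearson_r and all score data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: six candidates' scores on the test being validated
# and on an independent criterion measure taken at about the same time.
test_scores = [52, 61, 70, 45, 88, 76]
criterion_scores = [50, 65, 72, 40, 90, 70]

validity_coefficient = pearson_r(test_scores, criterion_scores)
print(round(validity_coefficient, 2))
```

For concurrent validity the criterion scores would be collected at about the same time as the test. For predictive validity they would be a later measure of performance, with the same calculation applied.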
2. Reliability
Reliability is often defined as consistency of measurement.4 A good test
should give consistent results. For example, if the same group of students
took the same test twice within two days, without reflecting on the first test
before they sat it again, they should get the same results on each occasion. If
they took another similar test, the results should be consistent. If two groups
who were demonstrably alike took the test, the marking range would be the
same. An instrument can be trusted as a data collection tool only when it is
reliable in this sense.
In practice, reliability is enhanced by making the test instructions absolutely
clear, restricting the scope for variety in the answers, and making sure that
test conditions remain constant. Reliability also depends on the people who
mark the test, the scorers. Clearly a test is unreliable if the result depends
to any large extent on who is marking it.
4 Lyle F. Bachman and Adrian S. Palmer, Language Testing in Practice: Designing and Developing Useful Language Tests (Oxford: Oxford University Press, 1996), p. 19.
[Figure: Reliability as the relationship between scores on test tasks with characteristics A and scores on test tasks with characteristics A′]
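The dependence of results on who marks the test can be checked with a simple agreement count. The sketch below is an illustration only: the two indices (exact agreement and mean absolute difference) are assumptions chosen for simplicity, not measures prescribed by the sources cited here, and the marks are invented.

```python
def exact_agreement(rater_a, rater_b):
    """Proportion of scripts to which both raters gave the same mark."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length lists of marks")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


def mean_absolute_difference(rater_a, rater_b):
    """Average size of the gap between the two raters' marks."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length lists of marks")
    return sum(abs(a - b) for a, b in zip(rater_a, rater_b)) / len(rater_a)


# Hypothetical marks (1 to 5) given by two raters to the same eight scripts.
rater_a = [4, 3, 5, 2, 4, 3, 5, 1]
rater_b = [4, 3, 4, 2, 4, 2, 5, 1]

print(exact_agreement(rater_a, rater_b))           # 0.75
print(mean_absolute_difference(rater_a, rater_b))  # 0.25
```

The lower the agreement, the more a candidate's result depends on which rater happened to mark the script.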
3. Practicality
Practicality pertains primarily to the ways in which the test will be
implemented, and to a large degree whether it will be developed and used at
all. Although the consideration of practicality logically follows the
considerations of other qualities, this does not imply that practicality is any
less important than the other qualities of usefulness. Thus, determining the
practicality of a given test involves the consideration of the resources that will
be required to develop an operational test that has the balance of qualities,
and the allocation and management of the resources that are available.
Practicality = Available Resources / Required Resources
If practicality ≥ 1, the test development and use is practical
If practicality < 1, the test development and use is not practical
Practicality is a matter of the extent to which the demands of the particular
test specifications can be met within the limits of existing resources.
This view of practicality is useful because it enables us to define a “threshold
level” for practicality in any given testing situation. If the resource demands of
the test specifications do not exceed the available resources at any stage in test
development, then the test is practical and development and test use can
proceed. Thus, a practical test is one whose design, development, and use
do not require more resources than are available.
In order to assess practicality, we need to identify the resources involved,
which can be classified into three types:
a. Human Resources
They include test writers, scorers or raters, and test administrators.
b. Material Resources
Rooms for test development and test administration, and equipment such
as papers, pictures, and typewriters.
c. Time
It consists of development time (the time from the beginning of the test
development process to the reporting of scores from the first
operational administration).
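The practicality ratio and its threshold of 1 can be sketched as follows. This is a minimal illustration: treating each resource type as a separate ratio is an assumption made here, and the function names, resource names, and figures are all hypothetical.

```python
def practicality(available, required):
    """Practicality ratio for one resource: available / required."""
    if required <= 0:
        raise ValueError("required resources must be positive")
    return available / required


def is_practical(available_by_type, required_by_type):
    """True only if every resource type reaches the threshold ratio of 1."""
    return all(
        practicality(available_by_type[name], required_by_type[name]) >= 1
        for name in required_by_type
    )


# Hypothetical figures for the three resource types: human, material, time.
available = {"human_hours": 120, "rooms": 3, "development_weeks": 10}
required = {"human_hours": 100, "rooms": 3, "development_weeks": 12}

print(practicality(available["human_hours"], required["human_hours"]))  # 1.2
print(is_practical(available, required))  # False: too few development weeks
```

In this example the test is not practical as specified, because the development time required exceeds the time available even though the other resources are sufficient.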
4. Impact
Another quality of tests is their impact on society and educational systems
and upon the individuals within those systems. The impact of test use
operates at two levels: a micro level, in terms of the individuals who
are affected by the particular test use, and a macro level, in terms of the
educational system or society.
[Figure: Impact of test taking and use of test scores at the macro level (society, education system) and the micro level (individuals)]
a. Washback
An aspect of impact that has been of particular interest to both language
testing researchers and practitioners is what is referred to as
“washback”, and most discussions of this have focused on processes
(learning and instruction). Washback is the direct impact of testing
on individuals, and it is widely assumed to exist.
b. Impact on individuals
A variety of individuals will be affected by, and thus have an interest or
hold a “stake” in, the use of a given test in any particular situation.
1. Impact on test takers
Test takers can be affected by three aspects of the testing
procedure:
- The experience of taking and, in some cases, of preparing for
the test,
- The feedback they receive about their performance on the test,
and
- The decisions that may be made about them on the basis of
their test scores.
2. Impact on teachers
The second group of individuals who are directly affected by tests
are test users, and in an instructional program the test users that
are most directly affected by test use are teachers. As noted above,
impact on the program of instruction, as implemented by classroom
teachers, has been referred to by language testers as washback.
One way to minimize the potential for negative impact on
instruction is to change the way we test so that the characteristics
of the test and test tasks correspond more closely to the
characteristics of the instructional program.
c. Impact on society and education systems
Test developers and test users must always consider the societal and
educational value systems that inform our test use. The consideration
of values and goals is particularly complex in the context of second or
foreign language testing, since this situation inevitably leads us to
the realization that the values and goals that inform test use may vary
from one culture to the next. For example, one culture may place great
value on individual effort and achievement, while in another culture
group cooperation and respect for authority may be highly valued.
Values and goals also change over time, so that issues such as secrecy
and access to information, privacy and confidentiality, which are now
considered by many to be basic rights of test takers, were at one time
not even a matter of consideration. Another example is the potential
impact on language teaching practice in a given country of using a
particular type of test task, such as the multiple-choice item or a
specific type of oral interview, in widely used high-stakes tests at the
national level.
The impact of the test use needs to be considered within the values and
goals of society and the educational program in which it takes place, and
according to the potential consequences of such use.
5. Authenticity
Authenticity is the degree of correspondence of the characteristics of a given
language test task to the features of a target language use (TLU) task.
Authenticity is an important test quality because it relates the test task to
the domain to which we want our score interpretations to generalize. Another
reason for considering authenticity to be important is its potential effect on
test takers’ perceptions of the test and, hence, on their performance.
[Figure: Authenticity as the correspondence between the characteristics of the TLU task and the characteristics of the test task]
B. Describing Tasks: Language Use in Language Tests
The characteristics of tasks are of interest for several reasons:
1. These characteristics provide the link between tasks in different
domains (the domain of test tasks and the domain of non-test tasks)
and permit us to select or design test tasks that correspond in
specific ways to language use tasks.
2. The characteristics of the task will help determine the extent and ways
in which the test taker’s language ability is engaged.
3. The degree of correspondence between the characteristics of a given
test task and of a particular language use task will determine the
authenticity of the test task.
4. The characteristics of the test task can potentially be controlled by the
way language tests are designed and developed.
1. Language Use Tasks

Two aspects of tasks are relevant to both language use and
language testing:
a. The individual must understand what sort of result is to be achieved.
b. The individual needs to have some idea of the criteria by which
performance will be assessed.
A language use task is an activity that involves individuals in using
language for the purpose of achieving a particular goal or objective in a
particular situation.
2. Language use in language tests

In general, language use can be defined as the creation or
interpretation of intended meanings in discourse by an individual, or as
the dynamic and interactive negotiation of intended meanings between
two or more individuals in a particular situation.
3. Target Language Use ( TLU ) Domain

A target language use domain is a set of specific language use tasks that
the test taker is likely to encounter outside of the test itself, and to which
we want our inferences about language ability to generalize. There
are two general types of target language use domain that are of
particular interest to the development of language tests:
a. Real-life domains, in which language is used essentially for
purposes of communication.
b. Language instruction domains, in which language is used for the
purpose of teaching and learning language.
4. Characteristics of Test Task

The characteristics of the tasks used are always likely to affect test
scores to some degree, so that there is virtually no test that yields only
information about the ability we want to measure.
5. A Framework of language task characteristics
The framework of task characteristics consists of a set of features for
describing five aspects of tasks: setting, test rubric, input, expected
response, and relationship between input and response. The purpose of
the framework is to provide a basis for language test development and
use. This involves three activities:
a. Describing TLU tasks as a basis for designing language test tasks,
b. Describing different test tasks in order to ensure their comparability
and as a means for assessing reliability,
c. Comparing the characteristics of TLU and test tasks to assess
authenticity.
6. Characteristics of the setting

These characteristics include:
a. Physical setting: the location, noise level, temperature, and
lighting.
b. Participants: the people who are involved in the task.
c. Time of task: the time at which the test is administered or at
which the TLU task takes place.
7. Characteristics of the test rubric

a. Instructions: language (native, target), channel (aural, visual),
specification of procedures and tasks.
b. Structure: number of parts/tasks, sequence of parts, number of
tasks.
c. Time allotment
d. Scoring method
8. Characteristics of the input
a. Format
b. Language of input
c. Topical characteristics
9. Characteristics of the expected response

a. Format: channel, form, language, length, type
b. Language of expected response: language characteristics and
topical characteristics
10. Relationship between input and response

a. Reactivity: the extent to which the input or the response directly
affects subsequent input and responses.
- Reciprocal: tasks in which the test taker or language user
engages in language use with another interlocutor.
- Non-reciprocal: tasks in which there is neither feedback nor
interaction between language users.
- Adaptive: tasks in adaptive tests, a recent development in
measurement, in which the particular tasks presented to the test
taker are determined by the responses to previous tasks.
b. Scope of relationship: the amount or range of input that must be
processed in order for the test taker or language user to respond as
expected.
- Broad scope: tasks that require the test taker or language user
to process a lot of input.
- Narrow scope: tasks that require the processing of only a
limited amount of input.
c. Directness of relationship: the degree to which the expected
response can be based primarily on information in the input, or
whether the test taker or language user must also rely on
information in the context or in his or her own topical knowledge.
- Direct: the response includes primarily information supplied in
the input.
- Indirect: the response includes information not supplied in the
input.
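The adaptive case described under reactivity, in which the next task depends on the response to the previous one, can be illustrated with a very simple selection rule. The sketch below is an assumption for illustration only: it moves one difficulty level up after a correct response and one level down after an incorrect one, which is far simpler than the procedures used in operational adaptive tests. The function names and the five-level scale are hypothetical.

```python
def next_level(current_level, was_correct, lowest=1, highest=5):
    """Move one level up after a correct response, one level down otherwise."""
    step = 1 if was_correct else -1
    # Keep the level inside the range of available task difficulties.
    return max(lowest, min(highest, current_level + step))


def presented_levels(responses, start_level=3):
    """Return the difficulty level of each task presented, in order."""
    levels = []
    level = start_level
    for was_correct in responses:
        levels.append(level)
        level = next_level(level, was_correct)
    return levels


# Hypothetical test taker: two correct responses, one incorrect, one correct.
print(presented_levels([True, True, False, True]))  # [3, 4, 5, 4]
```

A non-reciprocal, non-adaptive test would present the same sequence of tasks regardless of the responses.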
11. Applications of the task characteristics framework

a. Using a task characteristics checklist for comparing characteristics
of target language use tasks and test tasks
b. Using a task characteristics checklist to create completely new test
task types
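The first application, comparing a TLU task with a test task on a checklist, can be sketched as a side-by-side comparison of characteristics. The sketch is an illustration under stated assumptions: the characteristic names and values are hypothetical, and the proportion of matching characteristics is a rough indicator introduced here, not a measure of authenticity defined in the sources.

```python
def compare_tasks(tlu_task, test_task):
    """Split the TLU task's characteristics into matches and mismatches."""
    matches, mismatches = [], []
    for name, tlu_value in tlu_task.items():
        if test_task.get(name) == tlu_value:
            matches.append(name)
        else:
            mismatches.append(name)
    return matches, mismatches


# Hypothetical checklist entries for one TLU task and one test task.
tlu_task = {
    "setting": "office, quiet",
    "input_channel": "aural",
    "response_channel": "oral",
    "reactivity": "reciprocal",
    "scope": "broad",
}
test_task = {
    "setting": "classroom, quiet",
    "input_channel": "aural",
    "response_channel": "written",
    "reactivity": "non-reciprocal",
    "scope": "broad",
}

matches, mismatches = compare_tasks(tlu_task, test_task)
correspondence = len(matches) / len(tlu_task)
print(matches)         # ['input_channel', 'scope']
print(mismatches)      # ['setting', 'response_channel', 'reactivity']
print(correspondence)  # 0.4
```

The list of mismatches shows which characteristics of the test task could be changed to bring it closer to the TLU task. Varying those same characteristics deliberately is one way to arrive at new test task types.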
CHAPTER III
CONCLUSION
There are five test qualities: validity, reliability, authenticity, impact, and
practicality. These five test qualities all contribute to test usefulness, so
they cannot be evaluated independently of each other. Furthermore, the
relative importance of these different qualities will vary from one testing
situation to another, so test usefulness can only be evaluated for a
specific testing situation.
A language use task is an activity that involves individuals in using language
for the purpose of achieving a particular goal or objective in a particular
situation or setting. Two important uses of the task characteristics framework
are (1) creating a template for describing both a TLU situation and a test task
in order to evaluate the degree to which these two tasks correspond to each
other, and (2) varying certain test method characteristics in order to create
new testing methods.
REFERENCES
Bachman, Lyle F. Fundamental Considerations in Language Testing. Oxford:
Oxford University Press, 1990.
Bachman, Lyle F., and Adrian S. Palmer. Language Testing in Practice: Designing
and Developing Useful Language Tests. Oxford: Oxford University Press, 1996.
Brinberg, David, and Joseph E. McGrath. Validity and the Research Process.
California: Sage Publications, 1984.
Hughes, Arthur. Testing for Language Teachers. New York: Cambridge
University Press, 1989.
