3.0 Introduction
Unit 1 of this block introduced you to the aims and purposes of testing and
helped you understand what language testing involves. Unit 2 dealt with
the objectives of teaching and testing, what is worth testing and how. In
this unit we will examine the inherent characteristics of a good test.
A good test is based on certain general principles of testing. Whatever
the kind of test or the scoring procedure, a good test will tell us whether
certain abilities do or do not exist in the language user. It also tells us the
extent to which the abilities exist. This information has to be true. The two
important terms associated with this truth value of test-information are
validity and reliability. This unit will deal with the key principles of
language testing:
validity
reliability
authenticity
interactiveness
practicality and
impact
A flawed test may fail to reveal abilities that learners have; or, alternatively, the learning may not have taken place but the test results make us think that it has been successful.
The need to assess the worth of a test can arise in two types of situation:
a. You along with a team have prepared a test for some specific purpose
and wish to evaluate it.
b. You wish to use a readymade test for a specific purpose.
In both cases you will need some criteria for evaluating the test
instruments. Various criteria are available for judging the quality of a test,
or rather, a testing procedure, which involves the test itself and the use that
is made of the information that is provided.
The questions we ask correspond to the principles listed above. The first two, validity and reliability, are technical concepts. The others are considerations related to the usability and usefulness of the test in practical situations. We will deal with each of these questions in the following sections.
Please remember that no test is perfect. The goal is to constantly improve
the validity, reliability and usability of tests.
3.2 Validity
Every test has a specific purpose. How well does the test fulfill this
purpose? How successful is it in eliciting the information that it seeks to
elicit? Does it accurately measure the ability it seeks to measure? The
answers to these questions indicate the validity of a test.
Validity has been defined as "the degree to which a test measures what it
claims, or purports, to be measuring" (Brown, 1996, p. 231).
Judging validity will include examining the test design, the scoring procedure and the interpretation of test performance.
There are three main kinds of validity:
Construct validity
Content validity and
Criterion validity
Activity A
A construct is a mental ability that can be defined and shown to exist, or not exist, in an individual. With the help of this definition, tick the items that may be called a construct.
reading
writing
voting
counting
spelling
telephoning
inferring
sequencing
Discussion
Except for voting and telephoning all the others are examples of constructs.
All of them are mental abilities. Reading, writing, spelling are language-related abilities. Counting, inferring, sequencing are cognitive abilities.
Each of them can be tested.
If we want to test if a person can read, we give him/her a reading test. This
test will specifically measure reading ability. We recognise that reading is
an ability distinct from writing or speaking or listening or grammatical
control. The ability can be isolated for testing.
When we talk of validity in testing, we primarily mean construct validity.
The other kinds of validity fall under construct validity.
How do we say that a test has construct validity? If the test results show
the existence or non-existence of an ability, we say it has construct validity.
For example, if we give a six- year old child a string of 25 beads and ask
the child to count them, we can say from what the child does, if she/he can
count or not. This test of counting has construct validity.
It is easy to say that a test has construct validity in a direct test, say, of
writing or speaking. It is difficult to establish this in indirect tests. We
need empirical evidence to prove this.
If we test writing indirectly through multiple choice tests, the performance
on these tests should correspond with performance on longer pieces of
writing.
If we test reading comprehension, we must be sure that the items test only
comprehension and not the grammaticality of the written answers.
Content validity and criterion validity are the two subordinate forms of construct validity.
(Figure: Construct validity branching into Content validity and Criterion validity)

3.2.2 Content validity
As the name suggests, this relates to the content of the test. How is the
content determined? This depends on the area or domain being tested.
This content will already have been laid down in the course objectives.
The test should measure the ability/abilities in this domain appropriately,
and at the same time include adequate samples from the domain.
Let us look at an example of a sample CBSE Class IX test paper and see
how it aims to establish content validity. Only the outline of the contents is
given, not the actual test paper.
SAMPLE QUESTION PAPER
Time: 3 hours; Maximum marks: 80

SECTION A: READING (40 periods) 15 marks
A.1 and A.2: Two unseen passages with a variety of comprehension questions, including 4 marks for word-attack skills such as word formation and word meaning. The total length of the two passages is between 650 and 800 words.
SECTION B: WRITING (63 periods) 25 marks
B.1 and B.2: Short compositions of not more than 50 words each, e.g. notice, message or short postcard. 5 marks.
Two longer compositions: the 150-word composition will be for 7 marks and the 200-word composition for 8 marks. One of the longer (7/8-mark) questions will draw on the thematic content of the Main Course Book.
SECTION C: GRAMMAR (42 periods) 15 marks

SECTION D: LITERATURE 25 marks
D.1 and D.2: Two extracts from different poems from the prescribed Reader, followed by two or three questions to test local and global comprehension of the set text. Each extract will carry 4 marks. Word limit: one or two lines for each answer.
D.3 and D.4: One question based on one of the prose texts from the prescribed Reader to test global comprehension and extrapolation beyond the text. Word limit: 50-75 words. 3 marks.
D.6: One extended question based on one of the prose texts from the prescribed Reader to test global comprehension and extrapolation beyond the text. Word limit: 150-175 words. 6 marks.
Questions will test comprehension at different levels: literal, inferential and evaluative.
Note:
Since Continuous and Comprehensive Evaluation is to be followed, the weighting will be as follows:
Assignments to test Listening skills: 10%
Other components: 10%, 20% and 40%
The four areas of content are specified, namely, reading, writing, grammar
and literature. Writing and literature get 25 marks each; reading and
grammar 15 marks each out of a total of 80 marks.
You will see that 55 marks of the test paper are devoted to language and 25 marks to questions based on the prescribed reader.
The course of instruction will have a similar distribution. The number of
periods is also indicated. This follows the objectives of a school English
course at Class IX level.
What has been described above is the content of the course. Content
validity in an achievement test would imply the extent to which the areas
covered during the course appear in the test paper and if there are adequate
samples of each area.
We might argue that literature and memory-based questions do not reflect
language ability. In school contexts, however, there is a great reliance on
textbooks for language teaching and hence both teachers and students feel
that there should be questions based on the prescribed textbook. Very few
language papers in the academic set-up do without memory questions,
though we may question the construct that is being tested here. At best, it
may be said that overall language ability is indirectly tested.
To partially get around this problem, in the question paper above, the
lines from the textbook are reproduced (with or without extract). Yet, the
questions depend upon remembering the context of the poem and the prose
pieces.
If you look at the objectives in any language syllabus, you will not find any
reference to prescribed texts. You will find only language abilities. The
abilities stated in the objectives should form the content of a language test.
The above example illustrates what we mean by content validity.
A proficiency test will also have similar content specification. Here is the
test content specification for the CBSE Class X Proficiency Test.
The test is text independent i.e. it is not based on a set text or syllabus. As a
Proficiency Test, it tests both skills and knowledge. There is a balance between
key aspects of language as, for instance, reading skills, involving language
knowledge as well as ability to process meanings through inference, analysis,
comparison and evaluation; knowledge of grammar and vocabulary to the extent
that is required for general communicative tasks. Specific skills involved in writing
are also important e.g. awareness of the structure of simple written texts, how
they are organized, and the kinds of formats that are used in letters, for
instance.
Reading is given 30 marks as it is the basis for grammar & writing and because it
is important in further studies which students have to undertake in their later
academic work.
One passage of reading is of a narrative type (it may be an extract from a story) and tests candidates' understanding of events, characters and descriptions, and also the perception of meanings which are implicit in the details of the story.
One passage is a nonfictional text containing information, argument, opinion,
facts and ideas. Reading of this kind is focused on ability to arrive at the gist of an
idea/argument, to correctly separate opinions from facts (which implies some
ability to analyse), to be able to distinguish main ideas from subordinate ones, to
understand the tone or viewpoint e.g. humorous, ironic, serious etc.
The third passage is a short poem, around 20 lines. This is to test if candidates can understand language which is composed differently: it is not linear, has hidden meanings and unusual expressions, and uses sound effects (e.g. rhyme), simile and metaphor, which convey meaning indirectly rather than directly.
Vocabulary is given 20 marks as:
1. Vocabulary is central in reading comprehension, where it is essential to meaning.
2. It is also tested separately in order to test range of knowledge of words.
It is to be kept in mind that the level of vocabulary is such as is commonly found in texts that are prescribed in the school readers at classes 9 and 10.
Grammar + Writing (30 +20) The MCQ format does not allow testing of writing
skills as writing is integrative of other skills and needs to be tested through
production. However, it is felt that the awareness of the components of writing can
be tested here e.g. format of letters, paragraph organization, linkage between
sentences etc. These are also part of language knowledge.
A Cloze Test has been included as it is the most global and comprehensive test of language. It consists of a passage in which the first sentence sets the context and blanks subsequently occur at regular intervals.
Notice the first statement. It states that it is text independent and it tests
both skills and knowledge.
The domain of the test is delineated. The question paper must reflect the
description of the test content.
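The cloze procedure described in the specification above (an intact opening sentence to set the context, then deletions at regular intervals) can be sketched in code. This is a minimal illustration, not part of the CBSE specification; the passage, the deletion interval of 7, and the function name are our own inventions.

```python
# Illustrative sketch of a cloze-test generator (not from the source
# specification): keep the first sentence intact to set the context,
# then replace every nth word with a numbered blank.

def make_cloze(passage: str, interval: int = 7):
    """Return the gapped passage and the key (the deleted words)."""
    sentences = passage.split(". ")
    context = sentences[0] + ". "              # first sentence kept whole
    rest = ". ".join(sentences[1:]).split()
    gapped, key = [], []
    for i, word in enumerate(rest, start=1):
        if i % interval == 0:                  # delete every nth word
            key.append(word)
            gapped.append(f"__({len(key)})__")
        else:
            gapped.append(word)
    return context + " ".join(gapped), key

text = ("The farmer walked to the market early in the morning. "
        "He carried a basket of fresh vegetables that he hoped to sell "
        "before noon because the afternoon heat would spoil them quickly.")
cloze, answers = make_cloze(text, interval=7)
# answers == ['vegetables', 'noon', 'them']
```

Scoring such a test can then follow either the exact-word method (only the deleted word is accepted) or the acceptable-word method (any contextually appropriate word is accepted).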
One question that is important is the coverage of the different areas and the
adequacy of the sampling. This is conditioned by the length of the test and
the sub-test. Obviously 3 items are not enough to cover the whole area of
tense. The principle of coverage is readily seen in the case of the general
English paper related to the reader consisting of several selections. While
no test can be long enough to include a number of questions on each
lesson, the overall plan of the paper (choice within sections only) can
enhance the coverage of the content.
Content validity cannot be represented by a numerical index. It is
established qualitatively by a process of inspection. Both the apparent
nature of the item and what actually happens when a student attempts it
need to be scrutinized. It is important to look at what the item actually
requires the examinee to do, rather than the format alone (short answer,
long essay etc.).
3.2.3 Criterion validity

(Figure: Criterion validity branching into Concurrent validity and Predictive validity)
Suppose we wish to replace a long test with a much shorter one, and administer both to a sample of 10 students. The scores of the 10-student sample on the long test and the short test are compared. If they correspond well, then the short test is a valid representation of the items in the long test.
This is called concurrent validity. The scores on the longer test are the criterion against which the validity of the scores on the shorter test is judged.
A long test may not be the only criterion. As a teacher you are in a position to judge the abilities of your students. You expect a high degree of agreement between your subjective assessment and your students' test performance. In other words, what we are saying is that the test should bring out the best of the students' ability. The test should be a valid measure of a candidate's ability.
Degrees of agreement between a test and the criterion can be statistically
measured by the correlation coefficient. This will be dealt with in Unit 3,
Block IV. Perfect agreement between two sets of scores will result in a
coefficient of 1 and total lack of correlation will result in 0.
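Since the correlation coefficient is only treated formally in Block IV, here is a minimal sketch of how it can be computed for two sets of test scores using only the standard library. The ten score pairs and the helper function are invented for illustration; they are not taken from any actual test.

```python
# Sketch of the Pearson correlation coefficient for two score lists,
# using only the standard library. The scores below are invented.
from statistics import mean, stdev

def pearson(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

short_test = [12, 15, 9, 18, 14, 11, 16, 10, 13, 17]   # scores out of 20
long_test  = [45, 55, 38, 70, 52, 41, 60, 40, 50, 65]  # scores out of 80
r = pearson(short_test, long_test)   # close to 1: strong agreement
```

A value of r near 1 would support using the short test in place of the long one; a value near 0 would not.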
3.2.3.2 Predictive validity
If candidates are asked to write about some subject area topic in the
English test, it will fail to elicit the right kind of response. Finally, it should
also be said that if the test is not scored properly, its validity suffers. This
will be further discussed under reliability.
In this section, we have discussed validity and its different forms. When we say a test is valid, we are saying that the test result or score is a dependable measure of a person's ability.
Review question I
Match the statements in Column A with the terms in Column B.

Column A
a. The quality of a test that indicates fairly accurately the existence or non-existence of an ability in an individual.
c. A mental ability that can be defined and can be shown to exist or not exist in an individual.
d. The correlation between performance on a test and some independent and dependable assessment of that ability.
e. The correlation between an individual's performance on a test and his/her performance on future tasks that are represented in the test.

Column B
1. Predictive validity
2. Face validity
3. Construct
4. Content validity
5. Concurrent validity
6. Construct validity
7. Criterion validity
3.2.4 Reliability
We said that validity is an inherent quality of a test. The content of the test
is one aspect of testing; the other aspect relates to the conditions that affect
the performance on a test.
Activity B
Given below are five test situations. Tick the items that you think will lead to
a true reflection of ability.
1. Candidates appearing for a Common Medical PG Entrance test were surprised to find that the 200-mark paper that they had come prepared for turned out to be a 100-mark paper. As soon as the mistake was realised the test papers were replaced by the 200-mark paper. This resulted in a half-hour delay in starting the test.
2. Candidates appearing for a mathematical ability test were given four
different versions of a test covering the same content areas on four
successive days.
3. Candidates were assessed on the basis of one examination at the end of
a year-long course of instruction.
4. Candidates appeared for a test under strict police vigilance during a
local political disturbance.
5. Candidates had to submit a collection of writings that they had done
during the course for evaluation when they appeared for the final
writing test.
Discussion
Items 2 and 5 are true indicators of ability. The tests can be said to be
reliable.
In situation 2, the candidates are tested for the same ability again and again. This evens out any differences arising from one's psychological state on a particular occasion.
In situation 5, the candidates are tested not only by a formal test but also by all the writing that they had done during the course. This offers a variety of writing samples which can be judged.
Let us now examine the other three situations.
In situation 1, candidates will tend to be psychologically affected by the
administrative mistake and may not perform to the best of their ability.
In situation 3, one final examination may not be a reliable source of information. A candidate may not be in the best of health; another candidate may have had a mental setback; a third candidate may not have studied through the year but crammed for the examination. Several samples of ability are required to judge a candidate.
Item 4 is a stressful situation. Candidates may experience a great deal of
anxiety while they are writing the exam. This may affect their
performance.
The activity would have helped you see that reliability is very important for the interpretations we make about a candidate's ability based on test scores. Assuming that the test is measuring a stable characteristic (construct), different observations made using the same test should yield consistent results.
To ensure reliability we need the scores on several tests. There is always an
element of chance each time a test is administered. At several stages in the
total process of constructing and using an ability test there could arise the
question of error of measurement in a statistical sense. At each stage a
decision has to be made, which, even if sound in itself, is only one among
many possible decisions at that point. The occurrence of a particular
decision is governed by chance factors- and so is a source of error, in the
sense of variability. That is, when the test is repeated, some other factor
might influence the item construction. This becomes an error of
measurement.
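Classical test theory quantifies this chance variability as the standard error of measurement (SEM), estimated as the standard deviation of the scores multiplied by the square root of one minus the test's reliability. A brief sketch, with invented figures:

```python
# Sketch of the standard error of measurement (SEM) from classical
# test theory: SEM = SD * sqrt(1 - reliability). Figures invented.
from math import sqrt

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    return sd * sqrt(1.0 - reliability)

# A test with a score SD of 10 and a reliability of 0.91:
sem = standard_error_of_measurement(sd=10.0, reliability=0.91)   # 3.0
# An observed score of 60 then carries a band of roughly 60 +/- 3
# (about 68% confidence) around the candidate's "true" score.
```

The higher the reliability, the narrower this band, and the more dependable any single administration of the test becomes.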
c) Each item of Test A that is in the multiple-choice format has a number of distractors. Other test writers could use a wide range of other possible distractors even if the correct option is identical in every case.
The list above has indicated a number of points in the testing procedure at
which a particular decision is taken or outcome obtained. Thus the score
secured by a candidate in a test is affected by a unique combination of
chance factors. A different combination, again, purely the result of chance,
might have resulted in a different score.
Although these examples relate to test content, it is not the validity of the items themselves that is being questioned. Each test may be valid by itself. What is at issue here are the variable factors that may influence test performance.
Review question II
Tick the items that make a test reliable.
1. Limited number of test items.
2. Open-ended questions.
8. Ambiguous items.
3.3 Authenticity
Validity and reliability are traditionally accepted principles in testing and
have been applied to system-based testing. With the shift in approach to
communicative language testing, authenticity is an additional principle that
is seen to be important for validity. Performance on language tests should
indicate how well the candidate can perform on real language use tasks.
These are called Target Language Use (TLU) tasks. If you are taking a
course in editing and publishing in order to take up an editorial job in a
publishing firm, the test should include tasks that you will be expected to
do in that field. If the test is a general language proficiency test, it might
be valid, but it may not be authentic, because editing for publishing
requires a specialised orientation.
You may think that this is similar to predictive validity. The difference is
this. Predictive validity is an indirect way of judging suitability for specific
tasks. The content of an authentic test is more direct in that it uses tasks drawn from the TLU domain itself.

(Figure: Authenticity as the correspondence between the characteristics of the TLU task and the characteristics of the test task)
Let us now look at the reasons why authenticity in test tasks is important.
One of the purposes of tests is that we can make generalisations about
performance on future tasks based on our interpretations of the test scores.
That is, we want to match test performance with non-test performance in the Target Language Use domains. It is thus related to construct validity, which addresses the question: Does the test give accurate information on the ability that it claims to test?
Try this activity to see what we mean.
Activity C
Given below are two test tasks. State which task is more authentic and why.
1. Write a story with the help of the outline given below.
2. Write a notice to be put up on your school notice-board announcing a poetry-reading competition to be held in your school, with the help of the information given.
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
Discussion
Task 2 is more authentic than Task 1.
What are the reasons?
Wherever we work or study, at some point or the other, we may have to write notices for various purposes. Notices belong to the informational text-types that we use in real life.
Read the text about car production and complete the flow chart below.
BUILT TO ORDER
As soon as a car is ordered and a delivery date agreed, weekly and daily production schedules are created and sent to outside suppliers and the company's own pre-assembly stations. This is to make sure that all the necessary components arrive on time.
First of all, a small data carrier contains all the customer's specifications and communicates wirelessly with control units along the production line. In the body shop the floor pan, wheel arches, side panels, and roof are welded together by robots to make the frame of the car. The add-on parts (the doors, boot lid, and bonnet) are then mounted to make the body-in-white.
The finished body shell then goes into the paint shop where the data carrier
determines the colour. In final assembly, the interior and exterior parts (for
example, the front and rear bumpers, headlights, windscreen and other windows)
are fitted. After quality control and a final check, the finished car can be released.
It is now ready for delivery to its new owner.
From English for the Automobile Industry, Marie Cavanagh, OUP, 2007 p.14
Both the topic and the task type will be of interest to automobile
engineering students. The focus is not on language as such, but they have to read the
text, gather information from it, and fill in the flow chart according to what
they read. All this involves processing language and transferring
information to a non-text format. The students may have to read such texts
during their course of study as well as in their occupations later on. They
will thus perceive its relevance and usefulness and be motivated to do the
task well.
Compare this with the following test task for the same group of students.
Read the following poem and underline the figures of speech used in it. Then
explain each of these.
This task is literary in nature and not all engineering students will see the
usefulness or relevance of this task for their needs for English in their work
domain.
It must, however, be said that the objectives of a particular language course
must be made clear to the learners so that they understand the rationale for
particular tasks that are done during classroom learning as well as the test
tasks. This will ensure that the perceptions of course developers, teachers,
paper-setters, examiners and the candidates converge towards a common
goal.
I am saying this because many teachers and students come with the
conditioned view that a language course is meant to give exposure through
literary and semi-literary texts and that this would invisibly lead to
language development. While this may happen to a certain extent, it may
not lead to equipping the learners with the kind of skills that they need to
employ for authentic language tasks.
Another factor to be remembered is that complete congruence of test tasks
and TLU domain tasks is difficult to achieve. What we can aim at is a
close approximation to them.
Closely related to authenticity is interactiveness of test tasks.
3.4 Interactiveness
Bachman and Palmer define interactiveness as "the extent and type of involvement of the test taker's individual characteristics in accomplishing a test task" (1996: 25).
The three elements involved in individual characteristics are the test-taker's:
language ability
topical knowledge
affective schemata
Interactiveness of a test task is the extent to which these three elements are
engaged while doing the test task.
For example, a student of sociology will find a language test task which
has something related to sociology as a test input more engaging than a test
input from the domain of natural science.
The following figure from Bachman and Palmer illustrates the relationship
between the three elements and the characteristics of a language test task.
(Figure: language ability, comprising language knowledge and metacognitive strategies, together with topical knowledge and affective schemata, interacting with the characteristics of the language test task)
Authenticity pertains to the correlation between test tasks and the TLU
tasks. Interactiveness pertains to the relationship between the individual
characteristics and the test task. Interactiveness is a quality of any task.
To bring out this distinction Bachman and Palmer give four examples of test tasks.
The first example (A) is a test for selection of typists. The test task for this
involves the ability either to listen to and type out a spoken text or copy out
a written text. A typist may or may not really understand what he/she is
typing. But as a test of typing ability the task relates to the future work
domain ability. Such a task is highly authentic because the test task relates
to the TLU task (what the typist has to do at work) but it is not very
interactive because the typist does not really process the text or interact
with it. It is mechanical copying on the typewriter. (You might be familiar with this example from the typists of your own institution.)
The second example (B) is a test for the same selection of typists. The candidates are asked to make small talk about food, the weather, clothing, etc. in an interview format. They are also allowed to choose a topic that interests them. This task involves a lot of interaction but does not have any connection with a typist's job. Hence this task is more interactive but less authentic.
The third example (C) is a vocabulary task for international students
entering an American University. The test involves matching a list of
words in one column with their meanings in the second column. This test is expected to give diagnostic information on the students' ability to read academic texts. But the test task has no relation to the TLU domain
because the candidate will never be asked to match words with their
meanings in their studies at the university. It is not very authentic. The
task also does not demand too much language knowledge or use of
metacognitive strategies. Hence it is not very interactive either.
The last example (D) is a test task for prospective salesmen involving a
role-play between the salesperson and a customer. The examiner takes on
the role of a customer and the candidate has to sell his/her product. This
task is highly authentic and interactive. It is authentic because it relates to
the TLU domain tasks. It is also highly interactive because it involves all
areas of language knowledge, strategic competence and topic knowledge.
This task is both authentic and interactive.
Both interactiveness and authenticity are relative concepts. We say that a
task is high or low on these principles but not authentic/inauthentic or
interactive/non-interactive. The characteristics are not inherent in the tasks
but depend upon the individuals taking the test and the intended purposes
of the test.
Please note that while interactiveness is more prominent in oral tasks, reading and writing tasks also involve interactiveness. The individual's engagement with the task is what determines interactiveness.
This discussion will help you especially when you are studying the unit on
testing speaking skills.
3.5 Practicality
As you were reading about these principles, you must have been
wondering whether all this is really practicable and if tests can always be
made according to these principles. This raises the issue of practicality or
usability.
A concern for all the principles mentioned above may lead to the devising
of tests that are not very cost- and time-effective. We are concerned with
educational testing- procedures that you as a language teacher can use and
even develop yourself. Moreover, the test should not be so strange that
students and even the other stake-holders (parents, employers) do not
accept them. It is also important that the use of a new test has, in balance,
a positive effect on the teaching-learning process. This will be discussed in
the next section under washback.
A test may be very valid, reliable, authentic and interactive and yet it may require a disproportionate amount of teachers' (and students') time and hence not be practicable. Two important considerations are: a) the resources that will be required to develop an operational test that will have a balance of all the qualities mentioned above, and b) the allocation and management of the time that is available.
Bachman and Palmer define practicality as the relationship between the
resources that will be required in the design, development, and use of the
test and the resources that will be available for these activities. If the
available resources are equal to the required resources, the test is practical;
if the available resources are less than the resources required the test is not
practical.
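Bachman and Palmer's comparison of required and available resources can be sketched as a simple check. The resource categories and figures below are invented for illustration, not drawn from their framework:

```python
# Sketch of practicality as a comparison of required and available
# resources: a test is practical only if every required resource is
# covered by what is actually available.

def is_practical(required: dict, available: dict) -> bool:
    """Practical only if every required resource is covered."""
    return all(available.get(r, 0) >= need for r, need in required.items())

required  = {"examiner_hours": 40, "rooms": 2, "budget": 500}
available = {"examiner_hours": 30, "rooms": 3, "budget": 800}
print(is_practical(required, available))   # prints False: too few examiner hours
```

Even a surplus in some resources (rooms, budget above) cannot compensate for a shortfall in another (examiner hours); the shortfall makes the test impractical as designed.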
This is why we need Continuous Comprehensive Evaluation. What is not
practicable in limited-time, large-scale testing can be done in an open-ended manner during the course of instruction. This allows for a variety of
task-types. Authenticity can be better ensured with greater allocations of
resources and time. Interactiveness, especially in speaking activities, can be
better tested through items that will involve peer interaction rather than
communication with the examiner or a recording machine.
Project-based learning is an accepted educational practice. This would involve out-of-classroom field work: surveys, data collection of various kinds, creating tasks, etc. These can be valid and reliable measures of a candidate's ability. They cannot, however, be assessed in a timed examination.
In this section, we tried to rationalise the principles of testing and to ensure that you do not get the feeling that the principles are paramount and that we have to subject all our tests to them rigidly. Practicality, we saw, is the final consideration.
3.6 Impact
Until this point we have discussed the principles of validity, reliability, authenticity, interactiveness and practicality. All these principles relate to the test itself. Impact relates to the effect a test has on the people and institutions connected with it.
Consider the Class XII examination conducted either by the Central or the
State Boards. List all the stakeholders in this examination.
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
Discussion
Match your list with the following:
1. Students
2. Teachers
3. The school system
4. The examining boards
5. The higher education admitting authorities
6. Employers
The school-leaving examination is a landmark in the career path of
students. Their levels of achievement serve as an indicator for successive
batches of students with regard to the investment of effort they will have to
make. This examination decides their future, their choice of career and
institutions of higher education. Those who cannot continue their education
will have this certificate for applying for various jobs available at that
level.
Teachers have a very high stake in examination results as they are
perceived to be primarily responsible for the preparation of the students for
the examination.
School administrations take these results as a yardstick of the performance
of their schools. If the results are good with students getting national or
state level ranks, it serves as publicity for their schools. If the results are
not satisfactory, they take remedial measures to ensure that their
institutions do not fall behind the others in standards.
Washback
That apart, one of the most important points about testing is the effect it
has on teaching. Based on the test results, teaching content, techniques,
methods change. Objectives can be reformulated on the basis of test
results. Students also change their learning strategies based on their test
performance. Tests can thus have a beneficial washback on the whole
instructional process. Hence it is very important to ensure that tests are
valid and reliable.
Tests should be based directly on the objectives of the course to ensure that
the course objectives have been fulfilled. If the test results are positive, it
indicates that the objectives are realistic and achievable. If the test results
are not encouraging, then the course of instruction including the materials
used should be reviewed. In an extreme case it might be that the objectives
are unrealistic. Thus we see that the test is the final determinant of the
course of instruction.
3.7 Summary
In this unit we examined the various characteristics of a good test. We
discussed validity and the three types of validity, construct, content and
criterion and their sub-components. We saw how reliability is an important
concomitant of validity. We then moved on to the principles that are very
relevant to communicative language testing, namely, authenticity and
interactiveness. Finally we said that all these principles are subject to the condition of their being usable and practicable. While validity and reliability are absolutely essential criteria, the principles of authenticity and
interactiveness are relative to the intended purposes and the resources
available.
The unit ended with a section on the impact that tests have on various
stakeholders in the educational system.
In the next unit we will discuss test formats and their appropriateness for
testing various levels of linguistic and cognitive abilities.
3.8 Sources
Bachman, L. F. and Palmer, A. S. 1996. Language Testing in Practice. Oxford: Oxford University Press.
Hughes, Arthur. 1989. Testing for Language Teachers. Cambridge: Cambridge University Press.
Review question I
2-g; 3-c; 4-b; 5-f; 6-a; 7-d
Review question II
Items 3, 4, 6, 7 and 10: reliable; items 1, 2, 5 and 8: not reliable.
Review question III
1. Interactiveness
2. Continuous Comprehensive Evaluation
3. Practicality
4. Authenticity