
All children in rural areas must go further!
You can help them!
Programme co-financed by the Government of Romania, the World Bank and rural communities.
Unitatea de Management a Proiectului pentru Învățământul Rural (Project Management Unit for Rural Education)
Str. Spiru Haret nr. 10-12, etaj 2,
sector 1, cod poștal 010176,
București
Tel: 021 305 59 99
Fax: 021 305 59 89
http://rural.edu.ro
e-mail: office@ump.kappa.ro
Ministerul Educației și Cercetării
2007
Postgraduate professional conversion programme
for teachers in rural areas
THE METHODOLOGY
OF EVALUATION AND TESTING
Dumitru DOROBĂȚ
Specialization: ENGLISH LANGUAGE AND LITERATURE
Form of study: distance learning (ID), semester IV
Ministerul Educației și Cercetării
Proiectul pentru Învățământul Rural

© 2007 Ministerul Educației și Cercetării
Proiectul pentru Învățământul Rural

No part of this work may be reproduced without
the written consent of the Ministry of Education and Research.

ISBN 978-973-0-04814-8
TABLE OF CONTENTS

Introduction

Unit 1
Introduction to Language Testing

1.1 Unit Objectives
1.2 Assessment. Testing. Evaluation
1.3 Setting Testing Parameters
1.4 Participants in Testing
1.4.1 The Tester
1.4.2 The Test Takers/ The Testees
1.4.3 The Test User
1.5 The Beneficiaries of Testing
1.6 The Overall Impact of Testing in Students' Motivation
1.7 Summary
1.8 Key Concepts
1.9 Checklist
1.10 Answers to SAQs
1.11 Further Readings

Unit 2
Conditions of a Good Test

2.1 Unit Objectives
2.2 Principles of Good Practice for Assessing Student Learning
2.3 Validity
2.3.1 Content Relevance
2.3.2 Content Coverage
2.3.3 Face Validity
2.3.4 Content Validity
2.3.5 Predictive Validity
2.3.6 Construct Validity
2.3.7 Curricular Validity
2.3.8 Criterion-Related Validity
2.3.9 Concurrent Validity
2.4 Reliability
2.4.1 Measuring Reliability
2.4.1.1 Test-Retest Method
2.4.1.2 Parallel Forms of the Test to the Same Group
2.4.1.3 The Split-Half Method
2.4.1.4 Factors that Affect Language Scores
2.4.1.5 Test Length
2.5 Discrimination
2.6 Feasibility
2.7 Washback
2.7.1 Negative Washback
2.7.2 Positive Washback
2.8 Summary
2.9 Key Concepts
2.10 Checklist
SAA 1
2.11 Answers to SAQs
2.12 Further Readings

Unit 3
Types of Tests I

3.1 Unit Objectives
3.2 Informal Assessment
3.2.1 Informal Assessment of Speaking
3.2.2 Informal Assessment of Writing
3.2.3 Informal Assessment of Listening
3.2.4 Informal Assessment of Reading
3.2.5 Informal Assessment of Non-Linguistic Factors
3.2.6 Informal Assessment of Grammar and Vocabulary
3.3 Formal Assessment - Types of Tests and Testing
3.3.0 Classification by Stimulus Material
3.3.1 The purpose, or use, for which they are intended, i.e. the types of decisions to be made function of the scores
3.3.1.1 Selection Tests
3.3.1.2 Entrance Tests
3.3.1.3 Readiness Tests
3.3.1.4 Placement Tests
3.3.1.5 Diagnostic Tests
3.3.1.6 Progress Tests
3.3.1.7 Achievement/ Attainment Tests
3.3.1.8 Mastery Tests
3.3.2 Function of Content
3.3.2.1 Proficiency Tests
3.3.2.2 Achievement or Attainment Tests
3.3.2.3 Aptitude or Prognostic Tests
3.3.3 The frame of reference
3.3.3.1 Norm-Referenced Tests
3.3.3.2 Criterion-Referenced Tests
3.4 Summary
3.5 Key Concepts
3.6 Checklist
3.7 Answers to SAQs
3.8 Further Readings

Unit 4
Types of Tests II

4.1 Unit Objectives
4.2 Formal Assessment - Types of Tests and Testing
4.2.1 Scoring Procedures
4.2.1.1 Subjective Tests
4.2.1.2 Objective Test
4.2.1.3 Performance Tests
4.2.2 The Specific Technique or Method They Employ
4.2.2.1 Multiple Choice, Completion, Dictation, Cloze Tests
4.2.3 The Approach to Test Construction
4.2.3.1 Direct Tests
4.2.3.2 Indirect Tests
4.2.4 Function of the Number of Elements Tested at a Time
4.2.4.1 Discrete Point Tests
4.2.4.2 Integrative Tests
4.2.5 Speed Tests vs. Power Tests
4.2.6 Other Test Categories
4.3 Self-Assessment
4.4 Standardized Tests
4.5 Summary
4.6 Key Concepts
4.7 Checklist
SAA 2
4.8 Answers to SAQs
4.9 Further Readings

Unit 5
Testing the Language Skills I

5.1 Unit Objectives
5.2 Testing Speaking
5.2.1 What Is Speaking?
5.2.2 Types of Speaking Based on Content and Function
5.2.3 Objectives
5.2.4 Types of Speaking Tests
5.3 Testing Listening
5.3.1 How Do We Comprehend?
5.3.2 Micro Skills
5.3.3 Informal Evaluation
5.3.4 Scoring the Listening Test
5.4 Summary
5.5 Key Concepts
5.6 Checklist
5.7 Answers to SAQs
5.8 Further Readings

Unit 6
Testing the Language Skills II

6.1 Unit Objectives
6.2 Testing Reading
6.2.1 Types of Reading based on Content and Function
6.2.2 Types of Reading based on Context and Processing Variables
6.2.3 Types of Reading according to Purpose
6.2.4 Cloze Passages
6.2.5 Passages with Questions
6.2.6 Microskills
6.2.7 True - False - Don't Know Checks
6.2.8 Other Reading Techniques
6.2.9 Assessing Overall Comprehension
6.2.10 Issues in Teaching Reading
6.2.10.1 Narrative Text. Reading for Pleasure
6.2.10.2 Reading for Information
6.2.10.3 An Instructive Test
6.2.10.4 Types of Test Procedures
6.3 Testing Writing
6.3.1 Conditions under which Writing Takes Place
6.3.2 Current Theories of Writing with Particular Reference to Foreign Language Writing
6.3.2.1 Writing as a Product
6.3.2.2 Writing as a Process
6.3.2.3 Writing as a Social Activity
6.3.3 The Main Approach to Teaching Writing. Text-Based Approaches
6.3.3.1 Grammatical Form Practice
6.3.3.2 A Communicative Approach
6.3.3.3 Writer-Based Approach
6.3.4 Various Choices of Writing Tasks
6.3.4.1 Scoring Essay Type Tests
6.3.4.2 The Point Score Method
6.4 Summary
6.5 Key Concepts
6.6 Checklist
SAA 3
6.7 Answers to SAQs
6.8 Further Readings

Unit 7
Testing the Language System and Beyond

7.1 Unit Objectives
7.2 Testing Pronunciation
7.3 Testing Grammar and Usage
7.3.1 Multiple-Choice Fill-In
7.3.2 Modify and Fill-In
7.4 Testing Vocabulary
7.4.1 Cloze
7.4.2 Multiple-Choice Fill-In Type
7.4.3 Multiple-Choice Synonym Type
7.4.4 Matching
7.4.5 Simple Prompts
7.4.6 Selection of the Words to Be Tested
7.4.7 Translation
7.4.8 True/ False
7.4.9 Checklist Tests
7.5 Testing Beyond Language Form
7.5.1 Discourse and Culture
7.5.2 Speech events
7.5.3 Literature
7.6 Summary
7.7 Key Concepts
7.8 Checklist
SAA 4
7.9 Answers to SAQs
7.10 Further Readings

Unit 8
New Trends in Testing

8.1 Unit Objectives
8.2 General Trends
8.3 Computer-Based Language Testing
8.4 Alternative Assessment
8.4.1 Techniques
8.4.2 Journals
8.4.3 Conferences
8.4.4 Cooperative test construction
8.5 Portfolios
8.5.1 Characteristics
8.5.2 Assessing Portfolios
8.5.3 Portfolio Content
8.5.4 Useful advice on development of portfolios
8.6 Summary
8.7 Key Concepts
SAA 5
8.8 Answers to SAQs
8.9 Further Readings

Bibliography


INTRODUCTION


1. What this course is about

This course is an introduction to the methodology of evaluation
and testing in teaching and learning English as a Foreign Language.
Educators widely recognize that grading and reporting on student
learning continue to challenge teachers. However, more is known at
the beginning of the 21st century than ever before about the
complexities involved and about how certain practices can influence
teaching and learning. This introduction tries to identify grading and
reporting practices that can beneficially influence teaching and
learning. Developing teachers' awareness is another area of
interest. The practical aim of the course is to encourage
the design and use of effective techniques in English language
testing. In short, the course addresses a number of questions
about testing, helping you to develop a scientific
perspective before you begin using and devising tests.

2. Course objectives

One of the major goals is to help you recognize that the
purposes of measurement and evaluation are good, not bad.
Measurement, evaluation and testing are essential to sound educational
decision making. After reading this course you will be able to:
- recognize that evaluation and testing are essential to sound
  educational decision making;
- understand the components of a model of decision making;
- recognize the way evaluation and testing can assist in
  instructional, guidance, administrative, and research decisions;
- have a better understanding of the role of testing in language
  teaching;
- analyse and assess different kinds of tests;
- identify the different purposes of testing;
- identify the way in which testing can encourage good
  teaching and learning;
- learn how teachers can test the main skills, the language
  system and beyond;
- learn and apply techniques of test construction and
  administration;
- design tests that can assist and complement good teaching and
  learning;
- develop techniques of self-learning;
- appreciate the variety of interesting issues in evaluation and
  testing that will be covered in subsequent chapters.

3. Course content and structure

This course is divided into eight units of study. Each unit comprises
general presentations and self-assessment questions (SAQs) that
aim at actively involving you in the learning process. The solutions
and suggestions to the SAQs are provided in a separate section. At
the same time, SAQs also give you a sense of direction, motivating
you to continue along the right path. We also provide other
instruments that might help you, e.g. a summary, key concepts, a
checklist, and, in four cases, assignments. All assignments must be
submitted for evaluation. Each assignment will be marked by your
tutor and returned to you with written feedback. If
you fail to present adequate papers, you will be given two
opportunities to submit work for assessment and feedback.

4. The units of learning

Unit 1 (Introduction to Language Testing) defines and differentiates
the terms evaluation, assessment, testing; classifies the purpose of
evaluation and testing.

Unit 2 (Conditions of a Good Test) presents the principles of good
practice for assessing student learning and the basic tools for
assessing tests: validity, reliability, discrimination, feasibility and washback.

Unit 3 (Types of Tests I) is an introduction to informal assessment of
the main skills and of grammar and vocabulary; it also presents a
classification of tests based on a number of criteria: the purpose for
which they are intended (selection, entrance, readiness,
placement, diagnostic, progress, achievement/attainment and mastery
tests); the content upon which they are based (proficiency,
achievement and aptitude tests); and the frame of reference within
which the scores of the tests are interpreted (norm-referenced and
criterion-referenced tests).

Unit 4 (Types of Tests II) continues the classification of formal
tests according to scoring procedures (subjective, objective and
performance tests), the specific technique they employ (multiple-
choice, completion, dictation and cloze tests), the approach to test
construction (direct and indirect tests), and the number of elements
tested at a time (discrete-point and integrative tests); it also covers
speed vs. power tests and other test categories.

Unit 5 (Testing the Language Skills I) explores the issues of testing
two of the main language skills: speaking and listening.

Unit 6 (Testing the Language Skills II) introduces you to the main
techniques of testing the other two language skills: reading and writing.

Unit 7 (Testing the Language System and Beyond) discusses the
main issues of testing pronunciation, grammar and usage,
vocabulary, discourse, literature, culture, etc.

Unit 8 (New Trends in Testing) tries to identify the main contemporary
trends in evaluation and testing: computer based language testing,
alternative forms of assessment, authentic testing, etc.

5. Self-Assessment Questions (SAQs)

The self-assessment questions aim at actively involving you in
the learning process. The tasks aim mainly at activating your
schemata, at making you think creatively. The variety of SAQs
(multiple-choice, answering questions, matching, true-false etc) tries
to exemplify the theoretical and practical aspects of testing.

A self-assessment question (SAQ) is signalled by the icon on the left.



6. Point(s) to Ponder

Points to Ponder include aphorisms and quotations which may
be starting points for personal reflection on various issues/
controversies.

Point to Ponder is signalled by the icon on the left.



7. Solutions and suggestions for SAQs

You are advised to check your answers to each SAQ by going
to this section at the end of each unit. You should not be
discouraged if some of your answers are different from those offered
in this section. Read them carefully and try to learn from them as,
hopefully, you will find them interesting and thought-provoking.

8. Assessment and Evaluation

The course also contains four send-away assignments (SAAs)
which will enable your tutor to assess your performance on the course.



A send-away assignment (SAA) is signalled in the course text by
the icon on the left.

These SAAs count for 40% of your final grade. The exam at the
end of the semester will add 40% while portfolio assessment
represents 20%. Your portfolio should contain samples of tests of
various kinds, tests and essays of your pupils, other materials
designed by you for evaluation purposes. In compiling a portfolio,
your creativity is a must. The table below represents the place,
number of tasks, and the weight of each assignment.

Assignment   Unit containing   Number of tasks and their             Weight of the SAA in
no.          the SAA           weight in each SAA                    the final assessment
SAA no. 1    Unit 2            1 task (task 1: 100%)                 10%
SAA no. 2    Unit 4            1 task (task 1: 100%)                 5%
SAA no. 3    Unit 6            2 tasks (task 1: 50%; task 2: 50%)    5%
SAA no. 4    Unit 8            2 tasks (task 1: 25%; task 2: 75%)    20%

In the assessment of each assignment, the tutor will take into
account:
the degree to which your answer proves that you meet the
requirements of the task - 40%
the correctness of your metalanguage - 10%
discourse features (coherence, cohesion, etc.) - 20%
grammatical accuracy - 20%
spelling accuracy - 10%

Each assignment must be completed and sent to the tutor in the
allotted study week, according to your study schedule. A typewritten
paper is recommended. If this is not possible, take care that your
handwriting is legible.
My advice is to contact your tutor with any queries.

9. Further readings

Before you start studying this textbook, I recommend that you read
two books in Romanian:
Vogler, Jean (2000) Evaluarea în învățământul preuniversitar,
translated by Cătălina Grba and Ionela Blu, Iași: Polirom
Pavelcu, Vasile (1968) Principii de docimologie, București: EDP


10. Your study schedule

This course is devised for 42 hours of study, i.e. 28 hours are
meant for individual study of the course material (the solving of the
SAQs included), 6 hours are allotted to your tutorial meetings and 8
hours to the completion of your SAAs.
Plan your study by taking into account that an academic
semester lasts 14 weeks. Depending on the difficulty of the various
topics, I recommend the following study schedule:

Week     Unit              Number of      Assignment   Number of hours
                           study hours                 for the SAAs
1        Introduction      2
2, 3     Unit 1, Unit 2    4              SAA no. 1    2
4, 5     Unit 3            4
6, 7     Unit 4            4              SAA no. 2    2
7, 8     Unit 5            4
9, 10    Unit 6            4              SAA no. 3    2
11, 12   Unit 7            2
13       Unit 8            2              SAA no. 4    2
14       Revision          2
TOTAL                      28 hours

Planning your course work is important as it will enable you to
send your assignments to your tutor in due time.

11. Appendices

To facilitate your acquisition of the main issues, several
appendices have been added:
At the end of each unit you may find a Summary, a list of
Key words and a Checklist. The Further Reading section
gives you a minimal bibliography, indicating the pages where
you may find the information you need.

At the end of the course your final grade will take into account:
attendance of and contribution to face-to-face meetings with your
tutor and assisted activities, solving of SAQs and SAAs - 40%
final examination - 40%
portfolio (containing your tests/ your models) - 20%
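
The weighting of the final grade can be checked with a short calculation. The sketch below is only an illustration in Python: the function name, the component labels and the 1-10 marking scale are assumptions made for the example, not part of the course regulations.

```python
# Weights of the final-grade components, as stated in the course introduction.
WEIGHTS = {"assignments": 0.40, "final_exam": 0.40, "portfolio": 0.20}


def final_grade(marks):
    """Return the weighted final grade.

    `marks` maps each component name to a mark. The 1-10 scale used in the
    example below is an assumption; the course text does not specify a scale.
    """
    missing = set(WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)


# Hypothetical marks for one student.
example = {"assignments": 8.0, "final_exam": 7.0, "portfolio": 9.0}
print(round(final_grade(example), 2))
```

With these sample marks the weighted result is about 7.8: a strong portfolio raises the grade, but only by its 20% share.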

Unit 1

INTRODUCTION TO LANGUAGE TESTING



1.1 Unit Objectives
1.2 Assessment. Testing. Evaluation
1.3 Setting Testing Parameters
1.4 Participants in Testing
1.4.1 The Tester
1.4.2 The Test Takers/ The Testees
1.4.3 The Test User
1.5 The Beneficiaries of Testing
1.6 The Overall Impact of Testing on Students' Motivation
1.7 Summary
1.8 Key Concepts
1.9 Checklist
1.10 Answers to SAQs
1.11 Further Readings



1.1 Unit Objectives

After you have completed the study of this unit you will be:

familiar with the background of language testing
aware of the fact that testing is an important part of every teaching
and learning experience
aware that both experienced and inexperienced teachers of
English as a Foreign Language (EFL) need to improve their skills
in constructing and administering classroom tests
able to understand how testing helps students create positive
attitudes towards your class and able to identify the main issues of
language testing
able to define and differentiate the terms test, measurement,
evaluation and assessment
able to recognize that assessment, measurement, evaluation and
testing are essential to sound educational decision making
able to recognize the ways assessment, measurement and evaluation can
assist in instruction, guidance, administrative and research decisions.

1.2 Assessment. Testing. Evaluation

The terms test, measurement, evaluation and assessment
are occasionally used interchangeably, but some users make
distinctions among them. Measurement often connotes a broader
concept than testing: we can measure characteristics in other ways
than by giving tests (observation, rating scales, etc.).
The term assessment refers to a variety of ways of collecting
information on learners' ability or achievement. Although testing and
assessment are often used interchangeably, the latter is an umbrella
term encompassing measurement instruments such as tests, as well as
qualitative methods of monitoring and recording student learning such
as observation, simulations or project work. Assessment is also
distinguished from evaluation, which in a TEFL setting is a process of
collecting, analysing and interpreting information about teaching and
learning in order to make informed decisions that enhance student
achievement and the success of educational programmes. This means
that evaluation is concerned with the overall language programme:
textbooks, other instructional materials, student achievement.

Point to Ponder
At the end of the fifth grade, we have two pupils who are both reading
at the fifth-grade level. However, at the beginning of the year, one
student was reading at the third-grade level, and one at the fourth-
grade level. Are our evaluations of those outcomes the same?


Answer: Measurement is not the same as evaluation. In this particular
case, the evaluations are not the same. One student progressed at an
above-average rate, and the other at a below-average rate.

Assessment of achievement establishes what a student has learned in
relation to a particular course content or course objectives.
Formative assessment is carried out by teachers during the
learning process with the aim of using the results to improve
instruction. Summative assessment is done at the end of a course
to provide information on the programme to educational authorities.
When you teach you are part of a cultural and social system
that extends beyond the walls of your classroom. Both you and your
pupils have expectations about
what they will do,
what they should get from and give to the experience, and
how you will know if you are succeeding.

Beyond the walls of the classroom, almost everything is tested,
as contemporary society values numbers, counting, and doing
research based on figures. Measurement is a fact of life.


Point to Ponder
To teach without testing is unthinkable.
The Joint Committee of the American Association of School
Administrators

Testing is a very widespread and common management
strategy if we accept the following:

testing represents the explicit codification of the real goals of a
teaching and learning programme. Contemporary trends in testing
show that this management strategy is rarely the decision of the
individual teacher. Rather, unless we speak about formative
tests, it is passed down from the next administrative level, or even
from the Ministry of Education and Research. But how you feel
about it, and the way you let it affect your attitudes and the
attitudes of your pupils, is still under your control.

marking is a form of assessment. It involves giving the pupils a
grade or mark. Any assignment, oral or written, can be marked.
Marking is one of the most time-consuming parts of a teacher's
job. If you want to cope with all this marking, you have to take into
account a number of options. You may:
- correct all errors
- be selective in choosing particular errors
- correct misunderstandings or mistakes of content
- suggest/ require corrections to be done
- go over areas of common difficulty with the whole class
- see individual pupils about their work
- display the best papers
- simply put a tick to show it has been read
Whatever you choose, students should know how they are to be assessed.
You should avoid:
- building up a backlog of unmarked work
- marking down a paper because a pupil misbehaved

SAQ 1
What is measured/ tested beyond the walls of your classroom?








Write your answers in the space provided above (in no more than 60 words) and
compare them to those in the Answers to SAQs section at the end of the unit.


1.3 Setting Testing Parameters

Experience has demonstrated that teachers, even successful
and very experienced ones, are relatively unsophisticated and
careless when it comes to the design, operation and interpretation of
evaluation instruments, and to the interpretation and presentation of
results.
The first stage in the preparation of an evaluation instrument is
the setting of appropriate parameters.

WHY? What is the purpose of the evaluation?

Possible answers: to determine the extent to which a course/
chapter has achieved its stated aims; to measure pupils' reaction/
satisfaction; to provide a basis for comparison of different approaches,
methods, techniques.

WHAT? What is to be evaluated?
Possible answers include some of/ the entire course; content;
methodology; participants, teachers, etc. Always prepare a table of
specifications based on what you want to evaluate.

WHEN? When is it to be evaluated? During the course? At the end? Before
and after the course?

WHO? Who will evaluate? Teachers? Inspectors? An outside party?

WHY?
- To give feedback on how your teaching is going
- To provide feedback and guide improvement
- To classify or grade learners
- To enable student progression
- To add variety to students' learning experience
- To enable grading
- To provide statistics for the school
- To maximize learners' motivation
- To diagnose faults and provide students with an essential tool to
  put things right


HOW? How will the evaluation be carried out? What form will it take? Will it
be a pen-and-paper instrument or be conducted orally? Will it seek to
elicit quantitative or qualitative data, or both?
Among these questions, "why?" and "what?" are obviously
of major importance. Also, in the real world, there will be constraints
(such as time, space, resources) that will operate on the parameters
of "when?", "who?" and "how?".
Once the parameters are defined, it is possible to clarify valid
objects for evaluation and to agree on an appropriate methodology.
For small scale evaluation projects in which time is an important
factor, the most convenient vehicle is the pen-and-paper instrument.
At the next stage, that of instrument design, it is necessary to
consider such questions as validity, format and administration.


1.4 Participants in Testing

The participants in language testing are the:
tester
test taker/ the testee
test user

WHAT DO YOU REALLY WANT TO ASSESS?
- Do you test subject knowledge (information recall) or how well
  students can use such information for synthesis, analysis and
  evaluation?
- Do you evaluate group work or individual work?
- Is assessment formative or summative?
- Does testing encourage deep, surface or strategic learning?
- Is the assessment convergent (aimed at identical results) or
  divergent (to demonstrate individuality and diversity)?
- Is the assessment norm-referenced or criterion-referenced?
- Is it teaching or learning that is being assessed?
- Are you going to assess product (report, essay) or process (how
  the learners achieved the outcome)?
- Is it time/ context specific?
- Is the assessment holistic?

1.4.1 The Tester
The tester may be:
a foreign language teacher who designs, administers, and
interprets tests given to his own learners
a group of people responsible for developing test requirements
a private or governmental testing agency (PALSO in Greece; ETS,
the Educational Testing Service in New Jersey, USA; CITO in
Holland; or the Ministry of Education and Research in Romania)
other organizations/ international meetings: the annual Language
Testing Research Colloquium; the Scientific Commission on
Language Tests and Testing of the International Association of
Applied Linguistics; Language Testing, a professional and
academic journal

1.4.2 The Test Takers/ The Testees
The Test Takers may be:
students in schools and universities
applicants for positions that require foreign language abilities
people seeking certification of language proficiency for their jobs
Candidates who are not test-wise (familiar with the test format
and content) are usually at a disadvantage. In order to avoid this,
some programmes offer preparation for tests and practice sessions (for
example, the TOEFL textbook).

1.4.3 The Test User
The test users are the individuals or institutions that make use of
the interpretation of scores, e.g.
foreign language teachers (to encourage and monitor learning, for
personal feedback)
the Ministry of Education uses tests to ensure that the National
Curriculum is followed and to assess the standards achieved in
school work
foreign universities (American or British) use language tests
(TOEFL or Cambridge Examinations) to assess proficiency and
predict whether applicants can successfully attend a programme of
instruction in English
public and private institutions assess the linguistic competence of
those employees who need a foreign language in their work
foreign language teaching schools use tests for placement at an
appropriate level in their courses

1.5 The Beneficiaries of Testing

diagnostic and placement tests offer advantages of improved
efficiency for learner, teacher, and educational system
admission tests protect admitting institutions and agencies that
offer scholarships from too high a failure rate
certification tests offer advantages to the persons who pass the
test and the agencies that hire them. They also offer protection to
existing professionals organized in professional organizations who
control access to certain professions
testing agencies (TOEFL; the University of Cambridge Local
Examinations Syndicate, UCLES, for English as a Foreign Language):
tests are major sources of income for testing agencies


Points to Ponder
The most effective teachers assess student learning often
One research study found that of all the things a student
learns, 80% is forgotten in one year. Most of what is
forgotten are facts memorized for a quiz or test
What conclusion should you draw from these two statements?

1.6 The Overall Impact of Testing on Students' Motivation

Testing has an impact on students' self-esteem. Self-esteem is
both an outcome of educational experience and a factor determining
future learning. You therefore have to be very careful, because one
impact of tests is the reduction in self-esteem of those students
who did not achieve well.
Although at primary school level pupils are not aware that tests
give a narrow view of their learning, test performance is more highly
valued than what is being learned. Some pupils are also aware of the
narrowing of the curriculum. Only the pupils confident of success
enjoy tests. High achievers use appropriate test-taking strategies.
Low achievers may become overwhelmed and de-motivated when
they repeatedly receive low grades. If we are not careful, the gap
between low and high achieving students may be increased. The use
of repeated practice tests is not good practice because pupils may
adopt test-taking strategies designed to avoid effort.
How assessment of students' learning is reported back to the
pupils (feedback) affects motivation to learn. Your feedback should
focus on how to improve or build on what has been done (task-related
feedback) rather than on marks which are formally or informally
compared with those of others. Motivation is increased if you explain the
purpose of their tests and provide task-related feedback.


SAQ 2
Maslow's hierarchy of needs includes:
The self-actualization needs
The esteem needs
The belongingness and love needs
The safety needs
The physiological needs
What needs are satisfied if all learners experience success, and get
praise and other reinforcement?

Circle the correct answer. Compare your answer to that in the Answers to SAQs
section at the end of the unit.

MOTIVATION AND TESTING

Improving traditional exam questions
- Keep the language simple
- Use questions that seek to discover what has been learned and ask
  students to reinterpret their knowledge intelligently
- Be sure that learners understand the instructions
- Avoid trick questions
- Be creative. Allow your students to write one question of their own
  in an exam

Coping with exam failure
- Help students focus on what they can do in the future to improve
- Offer opportunities for practising under simulated exam conditions
- Build up confidence (develop revision and exam techniques)
- Let students play exams
- Give students opportunities to reflect on unsuccessful exam
  performance
- Help students identify their strengths
- Take account of the feelings of students who fail exams
- See failure as an opportunity for learning

1.7 Summary


This unit aimed at sensitizing learners to the main issues of
testing. We cannot stop testing. Tests seem inevitable because
they are part of a much larger cultural system. Testing can become
a healthy part of an honest and responsive learning atmosphere.
The following statements summarize the major points of the first
unit:

1. Assessment and testing, measurement and evaluation are
essential to sound educational decision making;
2. The concept of assessment is broader than that of testing.
The same is true about measurement. We can measure
characteristics in ways other than by giving tests.

Who wants tests?

Learners - in order to be motivated, learners want solid
evidence of progress
Parents - are concerned when children are not making
normal progress
Schools - tests are used as the basis for grades which
schools require teachers to give
Adults - want tests to use in getting into schools or into jobs
Teachers - tests can give teachers ideas for effective
teaching procedures, tests can help teachers plan for the
future, tests are useful management tools as tests can
stimulate effort, guide learning, provide fair rewards for
honest work.


1.8 Key Concepts
Language Test
Assessment
Measurement
Assessment Criteria
Evaluation
Testing

1.9 Checklist
Do you ask questions which students can answer successfully?
Do you leave time for students to think?
Do you always praise or otherwise acknowledge correct
responses?
Do you avoid ridiculing students' answers?
If no answer comes, are you able to ask a simpler question that
leads to the answer to the original question?



1.10 Answers to SAQs

SAQ 1 Your answer depends upon your personal experience.

Time - we are surrounded by machines, schedules, and
procedures intended to get more things done more quickly.
Time - we tend to become impatient when someone/ something
wastes our time: a disorganized book, a tie-up in traffic.
Possessions, money, land, other goods.
Anything that can be counted becomes more real, safer, more
valuable (research results).
Average income.
Cars are tested repeatedly.
Testing whether the consumer will buy a product.
Drugs are tested.



SAQ 2

If your answer to SAQ 2 is not comparable to the one suggested
below, please reread section 1.6.

Esteem needs. Students gain respect from the teacher and other
learners. At the same time, almost all of them want to be praised
by their parents.



1.11 Further readings


Harrison, Andrew (1983) A Language Testing Handbook, London: Macmillan, pp 1-4
Hughes, Arthur (1991) Testing for Language Teachers, Cambridge: Cambridge University
Press, pp 1-9


Unit 2

CONDITIONS OF A GOOD TEST


2.1 Unit Objectives
2.2 Principles of Good Practice for Assessing Student Learning
2.3 Validity
2.3.1 Content Relevance
2.3.2 Content Coverage
2.3.3 Face Validity
2.3.4 Content Validity
2.3.5 Predictive Validity
2.3.6 Construct Validity
2.3.7 Curricular Validity
2.3.8 Criterion Related Validity
2.3.9 Concurrent Validity
2.4 Reliability
2.4.1 Measuring Reliability
2.4.1.1 Test-Retest Method
2.4.1.2 Parallel Forms of the Test to the Same Group
2.4.1.3 The Split-Half Method
2.4.1.4 Factors that Affect Language Scores
2.4.1.5 Test Length
2.5 Discrimination
2.6 Feasibility
2.7 Washback
2.7.1 Negative Washback
2.7.2 Positive Washback
2.8 Summary
2.9 Key Concepts
2.10 Checklist
SAA 1
2.11 Answers to SAQs
2.12 Further Readings


2.1 Unit Objectives

Testing, including all forms of language testing, is, among other
things, one form of measurement. If we test reading comprehension
or spelling, for example, we want to measure to what degree these
abilities are present in the examinee. But there is potential for error
whenever we weigh something. Tests of language may be inaccurate
(or unreliable) or invalid. Tests, to be useful instruments, must offer
reliable and valid scores.

The objectives of this unit aim at:
offering you tools for the evaluation of the adequacy of any given
test
making you recognize sources of error variance and factors that
influence reliability estimates
understanding and interpreting the reliability/ validity of different
scores
understanding the relationship between reliability and validity
understanding the basic kinds of validity evidence
interpreting various expressions of validity
recognizing what factors affect validity and how they affect it
recognizing the relationship between test validity and decision
making
making you familiar with the rudiments of statistical concepts
developing your awareness of the characteristics that make a good
test
offering you an instrument for your personal classroom research
giving you an example about the importance of interdisciplinary
studies in TEFL
making you capable of rating a test, taking into account several
criteria:

Rating: 10 = highly adequate, 0 = highly inadequate

10 9 8 7 6 5 4 3 2 1 0

1. Validity (the test should adequately measure what it is supposed to
measure)
2. Difficulty (Is the test neither too difficult nor too easy? Is it a test
for adults or for children? Has the test been piloted?)
3. Reliability (Does the test minimize the presence of measurement
error? Can it be used for important post-test decisions? Is the test
long enough?)
4. Applicability (Is the test format familiar to the testees and to the
administrator? Is it a tape-recorded test or one using live voices?
etc.)
5. Relevance (Do all the testees have the same native language
background? Is the sample of test items drawn from a relevant
domain?)
6. Replicability (Is it possible to administer equivalent forms of the
same test to avoid cases of security breakdown?)
7. Interpretability (How is the test to be scored, reported and
interpreted?)
8. Economy (Is the test cheap or expensive to develop, purchase,
duplicate, score, report, store and interpret?)
9. Availability (Can you find and administer an available standardized
test?)
10. Acceptability (Is the test accepted by your immediate superiors or
by the Ministry of Education and Research? Are there any
constraints?)
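
The ten criteria above can be recorded as a simple rating sheet. The Python sketch below is a hypothetical illustration: the dictionary keys, the function name and the idea of averaging the ten ratings are assumptions added for the example; the unit itself only asks you to rate each criterion from 0 to 10.

```python
# The ten criteria listed above, in the same order.
CRITERIA = [
    "validity", "difficulty", "reliability", "applicability", "relevance",
    "replicability", "interpretability", "economy", "availability",
    "acceptability",
]


def rate_test(ratings):
    """Return (mean rating, weakest criterion) for one test.

    `ratings` maps every criterion to a score from 0 (highly inadequate)
    to 10 (highly adequate). Averaging the scores is an assumption of this
    sketch, not a procedure prescribed by the course.
    """
    for name in CRITERIA:
        if name not in ratings:
            raise ValueError(f"no rating given for {name}")
        if not 0 <= ratings[name] <= 10:
            raise ValueError(f"rating for {name} must be between 0 and 10")
    mean = sum(ratings[name] for name in CRITERIA) / len(CRITERIA)
    weakest = min(CRITERIA, key=lambda name: ratings[name])
    return mean, weakest


# Hypothetical ratings for an imaginary classroom test.
sample = {name: 8 for name in CRITERIA}
sample["economy"] = 4
mean, weakest = rate_test(sample)
print(mean, weakest)
```

Reporting the weakest criterion next to the mean is a deliberate choice: a test with a good average may still fail on a single criterion, such as validity, that makes it unusable.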


2.2 Principles of Good Practice for Assessing Student Learning

Assessment is not an end in itself but a vehicle for educational
improvement. Educational values should determine what we
choose to assess and how. When questions about educational
mission and values are skipped over, assessment threatens to be
an exercise in measuring what is easy, rather than a process of
improving what we really care about.

Assessment is most effective when it reflects an understanding of
learning as multidimensional, integrated, and revealed in
performance over time

Assessment should reflect that learning is a complex process i.e. it
involves knowledge, values, attitudes, habits of mind that affect
both academic success and performance in real life

It follows that assessment should reflect the complexity of learning
by:
- employing a diverse array of methods including those that call
for actual performance;
- using methods that cover time so as to reveal change, growth
and increasing degrees of integration

Assessment works best when the programmes it seeks to improve
have clear, explicitly stated purposes
Assessment is a goal-oriented process. It entails comparing
educational performance with educational purposes and
expectations. These are derived from the institution's mission,
from teachers' intentions in programme and course design, and
from knowledge of students' own goals. It follows that assessment
as a process pushes an institution of learning towards clarity about
where to aim and what standards to apply. Assessment also
prompts attention to where and how programme goals will be
taught and learned.
Assessment requires attention to outcomes but also to the
experiences that lead to those outcomes.
Assessment can help us understand which students learn best
under what conditions. With such knowledge comes the capacity to
improve the whole of their learning (students' experience along the
way, curricula, teaching).
Assessment works best when it is ongoing, not episodic.
Assessment is a process whose power is cumulative. The teacher
should monitor progress towards intended goals in a spirit of
intended improvement.


SAQ 1

Read the principles of good practice for assessing pupil learning and
try to write your own assessment Decalogue on a single sheet of
paper.

1. ---------------------------------- ---------------------------------------------

2. ----------------------------------- --------------------------------------------

3. ---------------------------------- --------------------------------------------

4. --------------------------------- --------------------------- ----------------

5. --------------------------------- ----------------------------------------------

6. ------------------------------------------------------------------------------

7. -------------------------------------------------------------------------------

8. -------------------------------------------------------------------------------

9. -------------------------------------------------------------------------------

10. -------------------------------------------------------------------------------

Write 10 principles in the spaces provided above. Your choices depend on your
teaching and learning experience.


2.3 Validity

Validity, the most important quality of a test, refers not only to
the degree to which the test actually measures what it is intended to
measure, but also to the adequacy and appropriateness of the way
we interpret and use test scores.
A valid test is one in which a testee's score gives a true
reflection of his or her ability on the trait being measured. Statistical
and descriptive means have been used to check validity. Content
analysis of tests determines:

- the language items present in a test (quality, number, whether
they are representative samples);
- the skills, or some aspects of a skill (reading speed, the
variety of text types, etc.).

Validity might also be a function of language knowledge, skill in
using the language, ability to negotiate certain language activities,
and task authenticity.


Points to Ponder

- If a large number of students do poorly on an exam, reconsider
its worth.
- When used to describe a test, the term "valid" should be
accompanied by the preposition "for", e.g. "This test is valid for ...".
- Make the first test relatively easy to build up students'
confidence.
- Never argue with a student about a grade in front of the class.
Offer to meet him/her the next day. Give him/her some time to
cool off first.


If the test scores are reliable, then differences in performance on
the test are due not to measurement error but to other causes. In
examining validity, we consider the relationship between performance
on the test and other types of performance in other contexts.
Validity also involves:
- the uses or interpretations we make of the test results;
- the value systems that justify a given use of test scores;
- the educational and social consequences of the uses we make of
tests.

In test validation we are not examining the validity of the test content
or of even the test scores themselves, but rather the validity of the
way we interpret or use the information gathered through the testing
procedure (Bachman, 1990: 238).
Reliability is a requirement for validity. A test is not valid unless
it meets the conditions of reliability. The investigation of reliability
and the investigation of validity are complementary aspects of
identifying, estimating and interpreting different sources of variance
in test scores. Correlation between scores on parallel tests
demonstrates reliability. Correlation between scores on a
multiple-choice test of grammar and ratings of grammar in an oral
interview demonstrates validity.
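
The two correlations just described can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions: the scores are invented, and the function name pearson_r is hypothetical, not part of the course material. It computes the usual Pearson correlation coefficient, once between two parallel forms (evidence of reliability) and once between a multiple-choice grammar test and interview ratings (evidence of validity).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of the deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Invented scores for six testees (hypothetical data)
form_a = [12, 15, 9, 18, 14, 11]        # parallel form A
form_b = [13, 14, 10, 17, 15, 10]       # parallel form B
grammar_mc = [22, 30, 18, 35, 27, 20]   # multiple-choice grammar test
interview = [3, 4, 2, 5, 3, 3]          # grammar rating in an oral interview

reliability_evidence = pearson_r(form_a, form_b)
validity_evidence = pearson_r(grammar_mc, interview)
print(round(reliability_evidence, 2), round(validity_evidence, 2))
```

A coefficient close to 1 indicates that testees are ranked in much the same way by both measures; a coefficient close to 0 indicates that the two measures have little in common.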



Validity is a unitary concept. Treating content validity,
criterion-related validity and construct validity as separate kinds of
validity is inadequate. In practice, they are complementary types of
evidence.

Point to Ponder

Reliability is a necessary but not sufficient condition for validity to be
present i.e. it is possible for a test to be reliable without being valid for
a specific purpose, but it is not possible for a test to be valid without
first being reliable.


2.3.1 Content Relevance (validity)

Content relevance (validity) requires the specification of the ability
domain and of the task, or task domain, i.e. what it is that the test
measures, the attributes of the stimuli presented to the testee, and
the nature of his or her responses.

2.3.2 Content Coverage

Content coverage refers to the extent to which the tasks required in
the test adequately represent the behavior domain in question,
i.e. whether the tasks required by the test are representative
of that domain.

2.3.3 Face Validity

A test has face validity when it looks right to other people (testers,
teachers, testees). J.B. Heaton gives the following example: "Is
photography an art or a science?" (a discussion question from a public
matriculation examination). It is obvious that this question demands
specialized knowledge. Adapted tests may lack face validity (because of
the presence of culturally bound words, for example). Face validity may
increase motivation, as testees will try harder if the test looks sound.
Face validity means that the testees feel that the test tests what it is
supposed to test. In order to increase face validity, follow this advice:
- use a carefully constructed format;
- include items that are clear;
- give clear directions;
- be sure that the tasks are familiar to the testees and relate to
their course of study.

For Oller (1979), face validity is a kind of impressionistic
reaction on the part of the examinees.


SAQ 2

Why is face validity a desirable feature of a test?



Write your answer in the space provided above (in no more than 45 words) and
compare it to that in the Answers to SAQs section at the end of the unit.

2.3.4 Content Validity
This concept answers the question: Is the content a comprehensive
and representative sample of what you want to measure, i.e. of the
language skills and structures? It follows that the test constructor
needs a specification of the skills or structures from which to make a
principled selection. The key question is whether the items in the
test represent an adequate sample of the ability. Lack of content
validity has a harmful backwash effect (the areas neglected in the
test are usually neglected by
testees). A written test which tries to measure pronunciation lacks
face validity.

2.3.5 Predictive Validity
Predictive validity concerns the degree to which a test can predict
testees' future behavior. Predictive validity answers questions such as:
Does the score predict a testee's ability to cope with a graduate
course at an American university? If a university admission exam is
administered and its scores are correlated with successive annual
grades, the highest validity is found for the first year of study.
There is a tendency for predictive validity to decrease with each
successive year, reflecting maturational changes in the students. In
this case, the long-term predictive validity is poor, and criteria other
than annual grades should be selected as success measures (e.g. annual
ratings of job performance).
The choice of criterion measure raises interesting questions:

- Should we rely on the subjective judgments of supervisors?
- How helpful is it to use the final outcome as the criterion measure
when so many factors other than ability (subject knowledge,
intelligence, motivation, health) will have contributed to the
outcome?

A typical example of predictive validation would be an
attempt to validate a placement test by asking how many of the
students were misplaced.
Information about criterion relatedness, whether concurrent or
predictive, is by itself insufficient evidence for validation.

2.3.6 Construct Validity
Construct validity concerns the extent to which a test measures just
the ability which it is supposed to measure, i.e. the purpose of
construct validation is to provide evidence that the underlying
theoretical constructs being measured are themselves valid. The word
"construct" refers to a complex idea formed by combining simpler
ideas. A synonym might be the word "concept". Examples of constructs
are reading ability and writing ability. Construct validation answers
the question: To what extent is performance on tests consistent with
predictions that we make on the basis of a theory of abilities?
Construct validity is central to the appropriate interpretation of test
scores and provides the basis for the view of validity as a unitary
concept.
Bachman considers that in conducting construct validation, we
are empirically testing hypothesized relationships between test
scores and abilities. Construct validation can thus be seen as a
special case of verifying, or falsifying, a scientific theory, and just
as a theory can never be proven, the validity of any given test use or
interpretation is always subject to falsification. Construct validation
requires both logical analysis and empirical investigation. It also
concerns the extent to which the content of a test or assessment
reflects current understanding of the skill(s) or sub-skill(s) being
assessed.


SAQ 3

You want to test competence in vocabulary and grammar. You
decide to use two kinds of tests: a multiple-choice test and a writing
sample. The scores on the multiple-choice tests are highly correlated
with each other. The correlation between the multiple-choice and
writing tests of grammar is poorer. What is the cause of this lack of
correlation?



Write your answers in the space provided above (in no more than 30 words) and
compare them to those in the Answers to SAQs section at the end of the unit.

Messick (after Bachman) considers the following types of
empirical evidence among the means of construct validation:
1. the examination of patterns of correlations among item scores
and test scores, and between characteristics of items and tests
and scores on items and tests;
2. analysis and modeling of the processes underlying test
performance;
3. studies of group differences;
4. studies of changes over time;
5. investigation of the effects of experimental treatment.

A correlation is a functional relationship between two measures:
two sets of scores may be correlated with each other or they may vary
independently. Example of a correlation: a high score on a test of
grammatical competence goes with a high grade in writing classes.

A correlation coefficient tells us to what extent variation in one
measure goes with variation in the other.

Due to the influence of Samuel Messick, an outstanding representative
of educational measurement in the United States, construct validity,
content validity, criterion-related validity and consequential validity
have come to be included within a single unitary concept of validity
centered on construct validity (Messick, 1980, 1983).
In discussing validity we have to take into account experimental
evidence and the use of self-report data, i.e. what testees say about
the experience of answering particular test items, in order to try to
separate the measurement of relevant aspects, e.g. the use of reading
strategies, from the use of test-taking strategies. This may be seen
as a way of opening up construct validity, or the accuracy of
measurement of the theoretical essentials of a given skill or area of
knowledge.
Another factor that may influence test results is test bias (the
result of differences in individual characteristics other than the
ability being tested).

Test bias might include:
- misinterpretation of test scores;
- sexist or racist content;
- unequal prediction of criterion performance;
- unfair content with respect to the experience of the test taker;
- inappropriate selection procedures;
- inadequate criterion;
- threatening atmosphere;
- conditions of testing;
- background knowledge;
- cognitive characteristics:
  - field independence (the extent to which a person perceives
    analytically);
  - ambiguity tolerance (a person's ability to function rationally and
    calmly in a situation in which the interpretation of all stimuli
    is not clear);
- native language, ethnicity;
- sex and age.

Messick has identified areas to be considered in the ethical use
and interpretation of test results:
- construct validity (Does the evidence offered by the test support
the particular interpretation we wish to make? Does the test, for
example, guarantee the certification of teachers?);
- the value systems that inform the particular test use;
- the practical usefulness of the test;
- the consequences to the educational system or society of using test
results for a particular purpose.

Establishing the validity of a test or assessment may, thus,
include an evaluation of the intended or unintended consequences of
a test's interpretation and use.

2.3.7 Curricular Validity
The term "curricular validity" relates to the question of the degree to
which the test content is covered in the curriculum. This is certainly
important if one wishes to make inferences about instructional
effectiveness. Curricular validity is considered by many to be
important for any type of minimal competency test required for, say,
secondary school graduation. It seems unfair to withhold a diploma
from someone who did not learn something that was not covered in the
curriculum.

2.3.8 Criterion Related Validity
This approach to test validity answers the question: how far do results
on the test agree with those provided by some outside independent
criterion measure? It takes two forms:
- concurrent validity;
- predictive validity.
Predictive validity is usually discussed together with concurrent
validity.
2.3.9 Concurrent Validity

When the test and the criterion are administered at about the same
time we speak about concurrent validity.

Example:
The course objectives call for an oral component as part of the
final achievement test. The testee is expected to perform orally a
large number of functions. A full test might take 45 minutes for
each student. Because of the great number of testees, only ten
minutes can be devoted to each of them. Does the test have content
validity? In order to check this, a sample of testees chosen at random
is fully tested (45 minutes each). The result of this extended test
becomes the criterion against which the shorter tests will be judged.
A high level of agreement between the two tests indicates that the
shorter version of the oral component may be considered valid. The
mathematical measure of similarity is called the validity coefficient.
Perfect agreement between the two sets of scores will result in a
validity coefficient of 1. Total lack of agreement will give a
coefficient of zero.

The teacher's assessment of his or her students might also be used as
the criterion for concurrent validation.


Point to Ponder

Before introducing a new test, its specifications and sample items
have to be made available to everyone concerned with preparation
for the test.




SAQ 4

Match the threats to Test Validity with the practical examples given
below:
1. Misapplication of tests
2. Standardized proficiency tests, developed from a distinct
population, are administered to subjects drawn from a qualitatively
different population.
3. Items do not match the objectives or content of instruction
4. Imperfect cooperation of the examinee

a. The examinees are insincere, misinformed or hostile. They
consider that the test is a waste of time. They respond quickly, giving
a series of answers which do not at all reflect their opinions.
b. A test of reading comprehension designed to measure
achievement in reading comprehension in accordance with the
syllabus of 4th-year high school students is applied to measure the
achievement of 4th-year general school students.
c. The TOEFL is a standardized proficiency test of high validity
and reliability, designed for the purpose of screening foreign students
for entry to American universities. In spite of this, the vocabulary
section is not difficult at all for Romanian testees (words of Romance
origin which are difficult for Anglo-Saxons are easy for Romanians).
d. The test requires knowledge of vocabulary and structures to
which the students were never exposed. The test requires knowledge
of if-clauses only.

1. ...... 2. ...... 3. ...... 4. ......

Write your answers in the space provided and compare them to those at the end of
the unit.


2.4 Reliability

If we buy a kilo of fruit, each time we weigh the parcel on the
same scales, we expect to get the same weight. The same thing is
expected from a test. In order to be reliable, a test must be consistent
in its measurements, i.e. if the test is given to the same learners on
different occasions with no further language lessons between the two
dates, the same scores are obtained.


Points to Ponder

- Make the first test relatively easy to build students'
confidence.
- Provide adequate feedback on students' test performance.
- Any assessment should help a student to learn.

Reliability is vital, especially when the test is used for an
entrance examination. One factor affecting the reliability of a test is
the size or extent of the sample, i.e. the longer the sample, or
the more tasks the pupil has to perform, the greater the reliability.
Objective tests are favoured because they allow a wide field to
be covered.

2.4.1 Measuring reliability
Well-constructed tests usually have a reliability coefficient of
r = 0.90 or greater.

2.4.1.1 Test-Retest Method
In the test-retest method, the test is re-administered to the same
testees after a lapse of time (no more than two weeks).
Comparison of the two results then shows how reliable the
test is. The following formula is recommended: $r_{tt} = r_{1,2}$,
where $r_{tt}$ = the reliability coefficient using this method and
$r_{1,2}$ = the correlation of the scores at time one with those at
time two for the same test used with the same persons. Frequent use of
this method is not recommended because of:
- familiarity: a number of pupils will benefit more than others from
familiarity with the test and its format;
- changes in performance resulting from the memory factor;
- personal factors.
In research that involves data collection at more than one
point in time, the same question can be asked in different locations
without drawing the respondents' attention to what is being done.
The test-retest method can sometimes produce surprising
results, especially in the case of questionnaires. The test-retest
method can be approximated through a different technique, if the
tester has several indicators of the variable in question.

2.4.1.2 Parallel Forms to the Same Group

Administer parallel forms of the test to the same group. The
second form should be equivalent to the first in its sampling,
difficulty, length and rubrics. If the correlation is high, the tests
can be termed reliable. The following formula is recommended:
$r_{tt} = r_{A,B}$, where $r_{tt}$ = the reliability coefficient and
$r_{A,B}$ = the correlation of form A with form B of the test when
administered to the same people at the same time.

2.4.1.3 The Split-Half Method

In the split-half method, the test, and the corresponding scores
obtained, are divided into two halves. The test is reliable if the two
halves correlate highly with each other. Ways of splitting the test
into halves:
- Divide it into the first and second halves. This may be unsuitable
for language tests, which are usually designed as power tests, i.e.
with the easiest questions at the beginning and the questions
becoming progressively more difficult, so that the two halves differ
in difficulty.
- Split the test into random halves, or use the odd-even method,
when the items measure the same ability, e.g. multiple-choice tests
of grammar and vocabulary. The first half may comprise items
1, 4, 5, 8, 9, 12. The second half may contain items 2, 3, 6, 7,
10, 11.

The following formula is recommended:

$$r_{tt} = \frac{2\,r_{A,B}}{1 + r_{A,B}}$$

where $r_{tt}$ = reliability estimated by the split-half method and
$r_{A,B}$ = the correlation of the scores from one half of the test
with those from the other half.
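
As a worked illustration of the split-half procedure, the following Python sketch scores the two halves named above (items 1, 4, 5, 8, 9, 12 and items 2, 3, 6, 7, 10, 11), correlates the half scores, and applies the step-up formula, which is commonly known as the Spearman-Brown formula. The item responses are invented, and the function names are assumptions made for this example only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def spearman_brown(r_half):
    """Step the half-test correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores, half_a, half_b):
    """item_scores: one list of 0/1 item scores per testee.
    half_a, half_b: 1-based item numbers making up each half."""
    totals_a = [sum(row[i - 1] for i in half_a) for row in item_scores]
    totals_b = [sum(row[i - 1] for i in half_b) for row in item_scores]
    return spearman_brown(pearson_r(totals_a, totals_b))


# Invented responses of five testees to a 12-item test (1 = correct)
responses = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
]
first_half = [1, 4, 5, 8, 9, 12]
second_half = [2, 3, 6, 7, 10, 11]

r_tt = split_half_reliability(responses, first_half, second_half)
print(round(r_tt, 2))
```

The step-up is needed because each half is only half as long as the full test, and a shorter test tends to be less reliable than a longer one.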


Point to Ponder

Tension between Reliability and Validity
The best measurements are those ranking high in both validity and
reliability. However, there is a tension between the two
characteristics. Increasing one often reduces the other. The
solution to this dilemma is to use a variety of measurement
techniques varying in validity and reliability whenever possible.




SAQ 5

Circle T(rue) or F(alse)
1. Reliability is a necessary but not sufficient condition for
validity. T F
2. Reliabilities of the prediction and criterion measures, and
group heterogeneity, cannot affect validity. T F
3. Availability of other data, cost of testing and faulty
decisions, selection ratio, and success ratio cannot affect
whether a test is valid enough to be useful in decision
making. T F
4. Reliability is the degree of consistency between two
measures of the same thing. T F
5. Reliability will be higher when a test is given to a
heterogeneous group. T F
6. All measurement is subject to error. T F

In order to give good answers, read 2.3 and 2.4, then circle T or F.
Compare your answers to those at the end of the unit.


2.4.1.4 Factors that affect language scores

1. Test method features
- Features of the test environment:
  - familiarity with the place and the equipment
  - personnel
  - time of testing
  - physical conditions
- Features of the test:
  - the salience, sequence and relative importance of parts
  - time allocation
  - instructions (language, visual or oral channel)
- Features of the input format (language form and vehicle of
  presentation)
- Nature of language (vocabulary, contextualization, distribution
  of new information, type of information)
- Discourse characteristics
- Features of the expected response (format, nature of
  language)
- Restrictions on response (channel, format, time)
- Relationship between input and response (reciprocal,
  nonreciprocal, adaptive)
2. Personal attributes
- individual characteristics (cognitive state, knowledge of
  particular content areas)
- group characteristics (sex, race, and ethnic background)

3. Random (unsystematic) factors
- emotional state
- mental alertness
- changes in the test environment

The effects of all the above features:
- Individuals who take a language test are not likely to perform
equally well.
- Variation is due to the different factors above.
- Different factors affect different individuals differently.
- Individuals are affected by different methods of testing (some
may do very well on a multiple-choice test and perform poorly
on a composition).



SAQ 6

If all error of measurement could be removed from a testee's
score, what would we call the remaining quantity?



Write your answer in the space provided above (in no more than 15 words) and
compare it to that in the Answers to SAQs section at the end of the unit.


Conclusions
- A major concern is to minimize the effects of test method,
personal attributes and random factors that are not part of
language ability.
- The interpretation and use of language test scores must be
moderated by your assessment (or estimates) of the extent to
which these scores reflect personal, test method or random
features.

Points to Ponder

Are you aware that:
- Students with neat handwriting get higher marks on essay
tests?
- A halo effect exists in the assignment of grades? Students
who performed well on previous essays tend to be rated
higher on subsequent ones, even if the quality diminishes?
- Longer essays get rated higher than better shorter essays?
- Students with common names get rated higher than students
with unusual names?
- Grades have proven of little value in predicting any criteria of
post-school success in any field? (after Ronald L. Partin)



Random factors and test method features are sources of
measurement error that affect reliability. Personal attributes are
sources of test bias or test invalidity. Two statistical concepts are
useful in discussing reliability: mean and variance.
The mean (symbolized by $\bar{x}$) is the arithmetic average of the
scores of a given group of test takers. The variance (symbolized by
$s^2$, the square of the standard deviation $s$) is a statistic that
shows how much individual scores vary from the group mean. The
subscripts x, t and e indicate specific types of variance; $s_x^2$
refers to the variance in observed test scores.
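
For readers who want to see the two statistics computed, here is a minimal Python sketch using the standard library. The scores are invented, and the choice of the population formulas (dividing by the number of scores rather than by one less) is an assumption, since the text does not say which version is intended.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical observed scores of eight test takers
scores = [52, 61, 47, 70, 65, 58, 49, 62]

x_bar = mean(scores)       # arithmetic average of the group
s2_x = pvariance(scores)   # variance: mean squared distance from the mean
s_x = pstdev(scores)       # standard deviation: square root of the variance

print(x_bar, s2_x, round(s_x, 2))
```

A larger variance means that the scores are more widely spread around the group mean.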
Norm-referenced (NR) test results are interpreted with
reference to the performance of a given group, or norm. A norm
group is made up of a large group of individuals who are similar to
the testees.
Stages in the development of an NR test:
- a test is given to the norm group;
- the results are used as reference points for interpreting the
performance of other students who take the test;
- the reference points are the mean $\bar{x}$ (or average score) and
the standard deviation $s$ of the group, which indicates how spread
out the scores of the group are.

The scores on an NR test are distributed in the shape of a bell-shaped
curve. Statistical characteristics of a normal distribution of scores:
- 50% of the scores are below the mean and 50% are above;
- 34% of the scores are between the mean and one standard deviation
above it (+1 s), and 34% are between the mean and one standard
deviation below it (-1 s);
- 27% are between one and two standard deviations from the
mean (13.5% above and 13.5% below);
- 5% of the scores will be as far away as two or more standard
deviations from the mean.

Example: The mean of the TOEFL is about 512, s = 66.
A score of 578 (512 + 66) is above average with reference to
the norm group, i.e. the testee's performance is equal to or better
than that of 84% of the students in the norm group.
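
The 84% figure can be checked with a short Python sketch. It assumes that the norm-group scores follow a normal distribution exactly, with the mean and standard deviation quoted in the example; the function name percentile_rank is hypothetical.

```python
from statistics import NormalDist

# Norm-group figures quoted in the example: mean about 512, s = 66
norm_group = NormalDist(mu=512, sigma=66)


def percentile_rank(score, dist=norm_group):
    """Percentage of the norm group expected to score at or below `score`."""
    return 100 * dist.cdf(score)


# One standard deviation above the mean: 50% below the mean + 34% above it
print(round(percentile_rank(578)))
```

Real score distributions are only approximately normal, so a percentile obtained this way is an estimate rather than an exact count.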




SAQ 7

Imagine an interview aimed at testing speaking ability in EFL.
Conditions: one rater; each testee was interviewed twice (test-retest
reliability). What are the threats to this reliability?








Write your answers in the space provided above (in no more than 50 words) and
compare them to those in the Answers to SAQs section at the end of the unit.
2.4.1.5 Test Length
The test must be of sufficient length to yield reliable scores.
Usually, the longer the test, the more reliable the scores. If the
teacher follows the table of specifications (i.e. the relative emphasis
given to each content area, which usually reflects its relative
importance to the instructional objectives), the three classifications
of the cognitive domain (knowledge, understanding and application)
should be indicated (e.g. knowledge 30%, understanding 40%, application
30%). The test should be valid as well as reliable. After taking these
decisions and some others (test and item characteristics, test
difficulty, test instructions and layout, and obviously scoring), all
that is now required is to construct a test of sufficient length. The
following factors should be considered:

- If a test is to be administered during a class session, it should be
constructed so that most of the examinees can easily finish it
during the examination period.
- The age of the testees should also be considered, and the item length
should take into account the pupils' schemata and attention span.

In summary, a test should be long enough to be adequately
reliable and short enough to be administered. Hints:

- 35 to 45 items should be reliable for the average end-of-unit
revision.
- 75 or more items are needed for a final examination.
- The time needed to answer test items varies with the grade level,
the type of items used, the difficulty of the items, and the level of
cognitive activity required.
- A typical learner can usually answer about two knowledge items
per minute, or one application or understanding item per minute.
- Allow ten minutes to distribute materials, explain procedures and
collect materials.
- A teacher can use only 40 minutes of a 50-minute class period
when administering an examination of 35 to 50 multiple-choice
items.
- 2 or 3 written true-or-false items, 2 or 3 matching items, or 1 or 2
completion items may be answered in 1 minute.
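
The rates in the hints above can be turned into a rough planning aid. The Python sketch below is only an illustration: the function names and default values are assumptions made for this example, and the rates are the rules of thumb quoted above (two knowledge items per minute, one understanding or application item per minute, ten minutes for administration).

```python
def estimated_minutes(knowledge_items=0, other_items=0, admin_minutes=10):
    """Rough testing time based on the rule-of-thumb rates in the hints.

    knowledge_items: answered at about two items per minute
    other_items: understanding or application items, about one per minute
    admin_minutes: time to distribute materials, explain procedures, collect
    """
    return knowledge_items / 2 + other_items + admin_minutes


def fits_class_period(knowledge_items, other_items, period_minutes=50):
    """True if the estimated time does not exceed the class period."""
    return estimated_minutes(knowledge_items, other_items) <= period_minutes


# Hypothetical test plan: 20 knowledge items and 30 understanding items
print(estimated_minutes(20, 30), fits_class_period(20, 30))
```

Such an estimate should be adjusted for the grade level and the difficulty of the items, as the hints themselves point out.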

When the testee must supply the answer, the amount of time depends
on the amount of thinking and the amount of writing involved.

- Do not include more than 6-7 essay questions per hour.
- At the elementary level, allow more time per item.
- Rely on your personal experience.


SAQ 8

What general relationship exists between test reliability and the
number of the items on the test?




Write your answers in the space provided above (in no more than 30 words) and
compare them to those in the Answers to SAQs section at the end of the unit.





2.5 Discrimination

Another characteristic of a good test is discrimination, i.e. a test
has to have the power to discriminate between testees. This is not a
problem with tests for learners at much the same level (e.g. class
achievement tests). In order to discriminate reliably, the test should
be fairly long. Short tests are not always able to discriminate. One
solution is to require the testees who score highly on a short test
given to the majority to take a further, longer extension test that
meets this condition.
Discrimination is also a property of individual items in a test.
Each item should contribute to the discrimination power of the test as
a whole. Item analysis answers the following questions:
- Is an item answered correctly by candidates who answer most of
the rest of the items right? (good discrimination)
- Is an item answered correctly by poor testees and incorrectly by
good ones? (poor discrimination)
- Do the items show a reasonably good record of agreeing with the
overall score given by the rest of the items?

SAQ 9

What extraneous variables can be anticipated and controlled?
a. the sex of the examiner
b. the uniformity of the procedures followed in administering
the test, e.g. time, clearly specified instructions
c. the uniformity of the procedures followed in scoring the test
d. the explanations given by the examiner
e. the examiner's facial expression, tone of voice
f. the manner in which the examiner presents the materials





Write your answers in the space provided above (in no more than 40 words) and
compare them to those in the Answers to SAQs section at the end of the unit.

A sophisticated technique known as Item Response Theory (IRT)
models the response to items on the assumption that each item has
a level of difficulty associated with it. More than that, IRT assumes
that the items can be ordered with respect to each other, as in a
power or "ladder" test, and that candidates' responses can be
expected to be inconsistent within the limits of their ability level.
The consequence is that with these techniques comes a more
sophisticated approach to test design and pre-administration.
In determining discriminability with sample separation, the first step
is to separate the highest scoring group and the lowest scoring group
from the entire sample on the basis of total score on the test.
Then the following formula is applied:

$$D = \frac{N_c}{N_c + L_c} = \frac{7}{7 + 4} = 0.64$$

where D = discriminability,
$N_c$ = the number of correct responses in the high group,
$L_c$ = the number of correct responses in the low group.

The discriminability for item six is 0.64. A discriminability index of
0.67 is considered the lowest acceptable discriminability by this
method.
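
The calculation can be reproduced with a few lines of Python. This is a minimal sketch: the function name is hypothetical, the counts (7 correct responses in the high group and 4 in the low group) are those used in the formula, and 0.67 is the lowest acceptable value stated in the text.

```python
def discriminability(high_correct, low_correct):
    """D = Nc / (Nc + Lc) for a single item.

    high_correct: correct responses to the item in the high-scoring group (Nc)
    low_correct: correct responses to the item in the low-scoring group (Lc)
    """
    total = high_correct + low_correct
    if total == 0:
        raise ValueError("nobody in either group answered the item correctly")
    return high_correct / total


LOWEST_ACCEPTABLE = 0.67  # threshold quoted in the text for this method

d_item = discriminability(high_correct=7, low_correct=4)
meets_threshold = d_item >= LOWEST_ACCEPTABLE
print(round(d_item, 2), meets_threshold)
```

An item answered equally often by both groups gives D = 0.5, which shows that the item does not separate stronger from weaker testees.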



2.6 Feasibility

Something that is feasible can be done, made, or achieved.
This requirement has become easier to meet thanks to a number of
technological developments, such as:
- computer marking for large-scale tests, in which the testee marks
the answer on a special computer-readable sheet;
- tests run on PCs;
- tests using video presentation (e.g. BBC English Video Test);
- prerecorded oral tests such as the Simulated Oral Proficiency
Interview and the Test of Spoken English (TSE).


SAQ 10

Case study. Half of the testees pass a given item and half fail it. If we
take difficulty into account, we would rate this item as an easy one.
Unfortunately, the testees who passed the item were the weaker half
of the testees, and those who failed the item were the better testees in
the ability being measured. What is your conclusion?





Write your answer in the space provided above (in no more than 35 words) and
compare it to that in the Answers to SAQs section at the end of the unit.





2.7 Washback

Washback is the effect a test has on teaching in the classroom.
It is true that we are always advised not to teach to the test.
However, we can use tests as teaching tools. Tests (especially
formative tests) may be used as feedback devices that make
teachers aware of the areas where the learners need improvement.
Formal tests may be channels through which learners can receive a
diagnosis of their areas of strength and weakness. The prompt return of
written tests with your feedback is a must if you really want to use
washback positively. It is also important to comment on your
evaluation. Give praise for strengths and offer constructive
criticism of weaknesses. Give hints on how a learner might
improve his or her performance. Encourage learners to seek clarification
about their grades / scores.
Tests have the power to influence the method and
content of language courses. Their washback (or backwash) effect may be
positive or negative. In their turn, the teachers who use such tests and
the testees who have suffered from negative backwash may be able to
influence the decisions of the testing organizations, which respond
positively to constructive feedback (e.g. the new TOEFL test reflects this
type of feedback).

Washback can be positive (beneficial) or negative.

SAQ 11

What is affected in 1 and 2 below: reliability or validity?
1. A recording for an oral comprehension test is poor in quality.
2. The quality of the recording is good. One group hears it played
under good acoustic conditions while another group hears it
under poor conditions.








Write your answers in the space provided above (in no more than 40 words) and
compare them to those in the Answers to SAQs section at the end of the unit.

2.7.1 Negative Washback
Examples of negative effects:
teaching is dominated by coaching for the testing session /
examination;
the test content and testing techniques differ from the objectives of
the course.

Examples of positive effects:
motivation is increased.

2.7.2 Positive Washback
How can positive backwash on teaching and learning be
achieved?
test the skills / abilities whose development you want to promote
(if you want to develop oral skills, then test oral skills);
vary the tasks you set (if tests set only two kinds of task, e.g.
compare / contrast and describe / interpret, teaching will
concentrate on these tasks alone, and backwash is harmful);
employ direct testing (i.e. tasks / tests that are as authentic as
possible);
make testing criterion-referenced (norm-referenced testing
makes teachers and learners assume that a certain percentage of
candidates will fail the exam). Use a series of criterion-referenced
tests representing different levels of achievement and
allow learners to choose the tests they are able to pass. This will
encourage a positive attitude to language learning;
base achievement tests on objectives rather than on textbook
content;
be sure that students understand what the test demands of them.



Point to Ponder

On the whole, if learners fail to learn, it is the fault of the
teacher, the school, or a poor curriculum.



SAQ 12

Diagnose a situation in which questions are difficult for the candidate
to understand, or are possibly biased.






Write your answers in the space provided above (in no more than 30 words) and
compare them to those in the Answers to SAQs section at the end of the unit.

2.8 Summary


The principal ideas, conclusions, and implications presented in
the chapter Conditions of a Good Test are summarized in the
following statements:
Reliability is a necessary but not sufficient condition for validity;
Validity can be defined as the degree to which certain
inferences can be made from the test scores (or other
measurements). Since a single test may have many different
purposes, there is no single validity index for a test;
Various factors affect validity;
Reliability is the degree of consistency between measures of the
same thing;
The different methods of estimating reliability consider different
sources of error. Which should be used depends upon how one
wishes to use the results of the test;
In general, longer tests are more reliable;
Reliability will be higher when a test is given to a heterogeneous
group.
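The statement that longer tests are generally more reliable can be illustrated with the Spearman-Brown prophecy formula, a standard result of classical test theory. This is a sketch under the assumption that the added items are comparable to the existing ones; the function name and the starting reliability of 0.60 are hypothetical, not taken from the unit.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`."""
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)

base_reliability = 0.60  # hypothetical reliability of the original test

for factor in (1, 2, 3, 4):
    predicted = spearman_brown(base_reliability, factor)
    print(f"{factor}x length: predicted reliability = {predicted:.2f}")
```

Each further doubling or tripling adds less than the one before, which matches the remark in the answer to SAQ 8 that little is gained beyond a certain number of items.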


2.9 Key Concepts

construct validity
content coverage
content relevance
content validity
criterion-related validity
curricula
discrimination
face validity
feasibility
predictive validity
reliability
reliable variance
the split-half method
test length
the test-retest method
washback
validity

2.10 Checklist

Do you make questions short and clear?
Do you accept each student's standard, however high or low, and set
about improving it in steps, by reinforcement and encouragement?
Do you allow resubmission of unsatisfactory work?
Do you ask your learners to evaluate their own work and set
themselves targets?
Do you set each student achievable goals?



SAA No. 1

This activity aims at reviewing Unit 2.
Match the principles (letters) with their main characteristics
(numbers).
I. Principles:
a) face validity; b) practicality; c) authenticity; d) content; e) validity;
f) reliability; g) washback; h) discrimination.

II. Characteristics:
1. a well-constructed format with familiar tasks; clear timing;
uncomplicated items; crystal-clear instructions; a difficulty level that
presents a reasonable challenge;
2. tasks that relate to the course work of the learners;
3. spending classroom time after the test reviewing its content; students
discover their areas of strength and weakness; asking students to
use test results as a guide to setting goals for their future effort;
items can serve in a diagnostic capacity;
4. the language in the test is as natural as possible; contextualized
items; tasks represent real-world tasks;
5. objective scoring procedures; classroom conditions are equal
and optimal for all students;
6. the test is not expensive and stays within appropriate time
constraints; relatively easy to administer; a scoring procedure that
is specific and time-efficient.








Please note that each correct answer will count for 15 points. 10
points will be given for ordering the principles according to their
importance.
The maximum score for this assignment is 100 points.

Do not forget to send your answers to your tutor in due time.





2.11 Answers to SAQs

SAQ 1

If your answer to SAQ 1 is not comparable to the one suggested
below, please reread section 2.2 again.

The answer might consist of a number of brief sentences:
When I assess my pupils I aim at educating them.
I ensure that my pupils understand the aim of my assessment.
Learning is complex and multidimensional. So is assessment.
Use tests that reveal change and growth.
The goals of assessment should be clear and shared.
Curricula, teaching, and students' effort should aim at reaching a
certain outcome.
My assessment is fair.
I use formative tests.
I prefer criterion referenced tests.

Traditionally, the characteristics of a good test have been seen to
be validity, reliability, discrimination and feasibility.

SAQ 2

If your answer to SAQ 2 is not comparable to the one suggested
below, please reread section 2.2.3 again.

Face validity is useful from a public acceptance standpoint.
Untrained people who look at or take the test should think the test
is measuring what its author claims. If a test appears irrelevant,
examinees may not take the test seriously, or potential users may
not consider the results useful.

SAQ 3

If your answer to SAQ 3 is not comparable to the one suggested
below, please reread section 2.3.6 again.

The poor correlation may be attributed to the effect of test method
factors since the two highly correlated tests of different abilities
shared the same multiple choice test method.

SAQ 4

If your answer to SAQ 4 is not comparable to the one suggested
below, please reread section 2.3 again.
1 b, 2 c, 3 d, 4 a

SAQ 5

If your answer to SAQ 5 is not comparable to the one suggested
below, please reread section 2.4 again.
T: 1, 4, 5, 6
F: 2, 3

SAQ 6

If your answer to SAQ 6 is not comparable to the one suggested
below, please reread section 2.4.1 again.
Removal of all error leaves only the testee's true score.

SAQ 7

If your answer to SAQ 7 is not comparable to the one suggested
below, please reread section 2.4.1.4 again.

fluctuations in the interviewer (fatigue, anxiety, mental awareness)
fluctuations in the testee (fatigue, boredom, changing attitude
towards the interviewer)
fluctuations in the administration of the interview (length, time,
place)

SAQ 8

If your answer to SAQ 8 is not comparable to the one suggested
below, please reread section 2.4.1.5 again.

In general, reliability increases as the number of items
increases, up to an asymptote, beyond which little is gained through
the addition of new items.

SAQ 9

If your answer to SAQ 9 is not comparable to the one suggested
below, please reread section 2.4.1 again.

The reliability and validity of a test depend on the uniformity and
standardization of the procedures. We can anticipate and minimize:
b, c, d, f. We cannot control a, e (although such variables cannot be
controlled, their influence should be taken into consideration).

SAQ 10

If your answer to SAQ 10 is not comparable to the one suggested
below, please reread section 2.5 again.

The item is not suitable. If a test comprised only such items, a high
score would be an indication of inability and a low score would be
an indication of comparative ability.

SAQ 11

If your answer to SAQ 11 is not comparable to the one suggested
below, please reread sections 2.3 and 2.4 again.

In the first case, the recording is poor in quality for all testees. In this
case we may speak about the invalidity of the test. In the second
case, we may speak about unreliability and therefore invalidity.

SAQ 12

If your answer to SAQ 12 is not comparable to the one suggested
below, please reread section 2.3 again.

Validity is compromised. It is common for teachers to confuse poor
learning with a student's difficulty in understanding examination questions.


2.12 Further Readings


Harrison, Andrew (1983), A Language Testing Handbook, London: Macmillan, pp. 10-16
Hughes, Arthur (1991), Testing for Language Teachers, Cambridge: Cambridge
University Press, pp. 22-48


Unit 3

TYPES OF TESTS I


3.1 Unit Objectives
3.2 Informal Assessment
3.2.1 Informal Assessment of Speaking
3.2.2 Informal Assessment of Writing
3.2.3 Informal Assessment of Listening
3.2.4 Informal Assessment of Reading
3.2.5 Informal Assessment of Non-Linguistic Factors
3.2.6 Informal Assessment of Grammar and Vocabulary
3.3 Formal Assessment - Types of Tests and Testing
3.3.0 Classification by Stimulus Material
3.3.1 The purpose, or use, for which they are intended, i.e. the types of decisions to be
made on the basis of the scores
3.3.1.1 Selection Tests
3.3.1.2 Entrance Tests
3.3.1.3 Readiness Tests
3.3.1.4 Placement Tests
3.3.1.5 Diagnostic Tests
3.3.1.6 Progress Tests
3.3.1.7 Achievement / Attainment Tests
3.3.1.8 Mastery Tests
3.3.2 Function of Content
3.3.2.1 Proficiency Tests
3.3.2.2 Achievement or Attainment Tests
3.3.2.3 Aptitude or Prognostic Tests
3.3.3 The frame of reference
3.3.3.1 Norm-Referenced Tests
3.3.3.2 Criterion-Referenced Tests
3.4 Summary
3.5 Key Concepts
3.6 Checklist
3.7 Answers to SAQs
3.8 Further Readings


3.1 Unit Objectives

Just as there are many purposes for which language tests are
developed, so there are many types of language tests. Some types
serve a variety of purposes, while others are more restricted in their
applicability. If we were to consider all kinds of language tests, the
remainder of this book might not suffice. However, there are some
broad groups of tests that deserve description and explanation. Many
stand in opposition to one another. We cannot but recognize that
there is much overlap.
Many people still view tests as the best available means of
determining what people can do. Others think that they are narrow
and restrictive as they do not measure innovation, social skills, and
qualities of leadership. As we describe and examine various kinds of
tests, we will look at the evidence on both sides of this controversy.

At the end of this unit, you will be able to:
distinguish among the following concepts: informal
assessment, formal assessment, and self-assessment
assess students' language skills informally
be aware of the wide choice you have
make changes in the way you assess and test your learners
realize that testing aids you by helping:

a) to provide knowledge concerning the students' entry
behaviours;
b) to set, refine, and clarify realistic goals for each student;
c) to determine the degree to which objectives have been
achieved;
d) to determine, evaluate and refine your instructional techniques.

understand how testing aids the student by:

a) communicating the goals of the teacher;
b) increasing motivation;
c) encouraging good study habits;
d) providing feedback that identifies his / her strengths and
weaknesses.

3.2 Informal Assessment

Informal assessment is carried out by the teacher in normal
classroom conditions.
Characteristics
a way of collecting information
a kind of continuous assessment (an academic year or more)
the result of systematic observation
Where? In and outside the classroom (looking at samples of learners' work /
portfolios)
What? Linguistic and non-linguistic factors
How? Establishing what we are going to assess:
criteria for assessing learners (do not rely only on impressions)
link informal assessment with formal assessment (tests) and with
self-assessment
Why? To help learners identify difficulties, to give students positive
feedback, to develop students' awareness

It is useful to consider what things we are going to assess
formally and which factors we are just going to get an impression of.

SAQ 1

Which of the following items are assessed informally and which formally?
Circle in the margin I (informally) or F (formally)
Written homework I F
Written grammar activities I F
Speaking I F
Projects I F
Portfolios I F
Listening tasks I F
Reading tasks I F
Writing tasks I F
Vocabulary activities I F
Attitude / effort I F
Participation in class I F
Group work I F
Pair work I F
Organization of work I F
Presentation of work I F

Circle in the margin I (informally) or F (formally). Compare your answers to
those in the Answers to SAQs section at the end of the unit.

3.2.1 Informal Assessment of Speaking

Assessing speaking informally is important when you have
practical difficulties in organizing oral tests. It is a way of providing
positive feedback and motivation to the learners. The criteria you
may take into account are: fluency (speed / hesitations), the
relevance and appropriacy of the message, grammatical and lexical
accuracy, and pronunciation (sounds, intonation, and rhythm).
How?
walk around the classroom monitoring pair work or group work,
thus learning about your learners' pronunciation problems and their
intonation
give learners points based on pre-established criteria

SAQ 2

What kind of assessment is favoured by teachers? Circle in the margin
I (informally) or F (formally) and compare your answers to those at the
end of the unit.
A. who teach small classes I F
B. who teach small classes and have
more than two hours a week I F
C. who teach large classes and have
two hours of English each week I F

Circle in the margin I (informally) or F (formally). Compare your answers to
those in the Answers to SAQs section at the end of the unit.

3.2.2 Informal Assessment of Writing
Assessing your learners' written work can be very time-consuming. It
follows that a teacher, in order to avoid neglecting other aspects of
teaching, has to take a number of decisions:
correct the most important pieces of writing
organize group writing activities (in this case, you correct only a
limited number of essays)
link informal assessment with formal assessment
avoid unreliability (which is often a problem when you correct essays)
establish clear criteria (for a 5-band scale):
5. Excellent writer - writes fluently, no errors, little hesitation
4. Good writer - writes quite fluently, not many mistakes
3. Modest writer - some difficulties, limited structures, difficult to
understand
2. Marginal writer - difficulty in writing, almost
incomprehensible
1. Poor writer - unable to use vocabulary, grammatical structures



3.2.3 Informal Assessment of Listening

We can informally assess learners' listening comprehension abilities by:
observing which learners seem to understand
getting an overall impression of what they have understood
looking at the learners
monitoring pair work activities
assessing learners' reactions to instructions
asking for a show of hands, i.e. seeing how many learners have put up
their hands
going through the answers one by one
asking learners to summarize what they have heard
using a recorded text for a speaking activity
using TPR techniques
SAQ 3

Is informal / impressionistic assessment reliable? Answer the following
question:
What kind of mark do you give a composition?
a. when you are tired H L
b. on a Friday H L
c. on a Monday H L
d. at the beginning of the activity (when
you have to correct 50 essays) H L
e. at the end of the activity H L
Circle the letter in the margin L (lower mark), H (higher mark). Compare your
answers to those in the Answers to SAQs section at the end of the unit.

3.2.4 Informal Assessment of Reading
Reading can be assessed informally by:
observing
checking class understanding of several important points
using a reading text for developing speaking and writing abilities
using a reading text for role-play
asking for a summary in Romanian
assessing learners' personal opinions

3.2.5 Informal Assessment of Non-Linguistic Factors
Non-linguistic factors are important in assessing learners' overall
educational development and in encouraging personal effort.
Informal assessment of non-linguistic factors implies assessment of:
attitude (passive versus active learner)
cooperativeness (ability to work with other people in group work)
independence (ability to use dictionaries and other language materials)
creativity (originality shown through initiative)

Informal assessment of non-linguistic factors is carried out:
by observing learners in class and giving an impression or rating
them using a band scale
by collecting vocabulary notebooks and marking them
by using peer assessment of group work

3.2.6 Informal Assessment of Grammar and Vocabulary
Grammar and vocabulary can be assessed informally:
by observing and identifying problems students are having
by observing what they are doing while they perform speaking and
writing tasks
by going round the class and writing down the most important mistakes
by organizing language awareness exercises: What is wrong with
this sentence?

Points to Ponder

Reflect upon some of your arguments against formal
proficiency tests.
Infants do not need to be tested formally. We interact with them,
offering them comprehensible input, i.e. we subtly adjust to their
level of proficiency
Testing erects a barrier between us and our pupils
The results of testing are often used in ways that cause learners
/ teachers pain
The test is artificial, destructive
Reward acquisition and not test results
Help students understand that testing is needed and that they
have to accept it
Humanize the experience (teachers must openly comment on
test results when they seem to misrepresent abilities)
If you are not happy with the test, make suggestions to testing
organizations (local inspector, Ministry of Education and Research,
textbook authors)

3.3 Formal Assessment - Types of Tests and Testing

Language tests can be classified according to some distinctive
criteria:
3.3.0. The types of stimulus material used to present the problems
to the learners: verbal and non-verbal
3.3.1. The purpose, or use, for which they are intended
3.3.2. The content upon which they are based
3.3.3. The frame of reference within which their results (scores) are
to be interpreted

3.3.1. The purpose, or use, for which they are intended, i.e. the types
of decisions to be made on the basis of the scores
Tests with regard to admission decisions
3.3.1.1. Selection tests
3.3.1.2. Entrance tests
3.3.1.3. Readiness tests
Tests with regard to identifying the appropriate instructional
level or the specific language areas in which instruction is
needed:
3.3.1.4. Placement tests
3.3.1.5. Diagnostic tests
Tests with regard to decisions about how learners should
proceed through the language programme, or how well they
are attaining the programme's objectives:
3.3.1.6. Progress tests
3.3.1.7. Achievement tests/ attainment tests
3.3.1.8. Mastery tests
3.3.2. The content upon which they are based (tests may be based
on a certain theory of language or a specific domain of content)
3.3.2.1. proficiency tests
3.3.2.2. achievement or attainment tests
3.3.2.3. aptitude or prognostic tests

3.3.3. The frame of reference within which their results (scores) are
to be interpreted
3.3.3.1. norm-referenced tests
3.3.3.2. criterion-referenced tests

3.3.0 Classification by Stimulus Material: verbal / non-verbal

There are many instances where the stimulus material used to
present the problem to the student need not be verbal. The stimulus
can be pictorial (in humanities, art courses, or foreign languages,
mainly at an elementary level) or a recording (in a music test).
Although non-verbal stimulus items are infrequently
used in the classroom, this does not mean that they are not a good
medium to use.

3.3.1 The purpose, or use, for which they are intended, i.e. the types of decisions to
be made on the basis of the scores


3.3.1.1. Selection Tests
A selection test is a special form of placement test. It excludes
learners who are below a certain percentage. It selects candidates
for a particular job or course of study (success or failure depends on
the number of places available). Proficiency tests are often used for
selection.
Among classroom teachers, the true-false test is the most popular of
the selection-type examinations.

Weaknesses:
Its fifty-fifty chance of guessing the correct answer encourages
students to guess wildly
It does not discriminate well between those examinees receiving
the highest scores on the total test and those receiving the lowest
scores
It is not as reliable as a multiple-choice test of equal length
It is quite difficult to develop statements which are
absolutely true or false
It is seldom applicable to the measurement of complex
understandings and other higher-order mental processes
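The effect of the fifty-fifty guessing chance can be made concrete with a short calculation. The sketch below assumes a hypothetical 20-item true-false test and a hypothetical pass mark of 12, neither of which comes from the text, and treats each blind guess as an independent event.

```python
from math import comb

def prob_at_least(correct_needed: int, items: int, p_guess: float = 0.5) -> float:
    """Probability of getting at least `correct_needed` items right by blind guessing."""
    return sum(
        comb(items, k) * p_guess**k * (1 - p_guess) ** (items - k)
        for k in range(correct_needed, items + 1)
    )

ITEMS = 20      # hypothetical test length
PASS_MARK = 12  # hypothetical pass mark

expected_score = ITEMS * 0.5
print(f"expected score from guessing alone: {expected_score:.0f} / {ITEMS}")
print(f"chance of reaching {PASS_MARK} by guessing: {prob_at_least(PASS_MARK, ITEMS):.3f}")
```

Calling the same function with `p_guess=0.25`, as for a four-option multiple-choice item, gives a much smaller probability, which is one way of seeing why a true-false test is less reliable than a multiple-choice test of equal length.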
Strengths:
It can be rapidly and accurately scored by individuals unqualified to teach
the subject matter area being examined
The scoring is completely objective
Extraneous factors have no influence on test scoring
It can be administered relatively quickly (less time per item is
required to answer true-false questions in comparison with any
other item type)
It takes less time to construct and refine the items
The item statement need not include instructions on how to
respond
It is very useful in situations where the measurement of the
acquisition of factual, non-interpretative information is desired
(vocabulary, technical terms, formulae, dates, proper names)

Construction
Select from the table of specifications the areas that can be
successfully tested by the true-false test
Write each item on a separate 3x5 piece of paper (it is easier to
place the items in the desired order on the test)
The true-false item consists of a statement and an indication of
agreement or disagreement with the statement
At the beginning of the test, the testee is instructed to mark each
statement true or false, right or wrong, or yes or no

Examples: The following items are true-false questions. If the statement is
true circle A on your answer sheet; if the statement is false, circle B
on your answer sheet. Be sure that the item number that you are
marking on the answer sheet corresponds to the item number of the
question you are answering:
1. According to the cognitive theorists, a learner will learn by heart if
he lacks a cognitive structure. A B
2. According to the cognitive theorists, learning something new is a
matter of seeing where it fits in. A B

In another variety of this item type, the statement may be: True
(T), False (F), True or False (TF). A separate answer sheet should
be used.

Example: All sides of a square are equal. T F TF

Rules for constructing true-false tests
Be sure that the item is absolutely true or completely false (except
when using the TF category)
The true-false statement should possess one and only one central
theme and should be free from ambiguities
A test should contain the same number of true and false
statements
You should avoid:
Negative statements
Irrelevant clues (all and none should be used with caution)
Qualifying clues (a long sentence)



SAQ 4

Rewrite the following items:
1. In Blake's "The Lamb", the lamb does not stand as a symbol
of a child.
2. Only a few men have been elected president of the US after
having been defeated for that office.





Correct the items in the space provided above and compare them to those in
the Answers to SAQs section at the end of the unit.

The following sentences may or may not contain grammatical
errors. In front of each item there is a T and an F. If the sentence is
entirely correct grammatically, mark out the T. If it is not
grammatically correct, mark out the F.

TF 1. I heard you was a wedding party.
TF 2. The rare tires wore out.

Sometimes, we want testees to identify the false element and
correct it. Disadvantages: the speed of the response is reduced; fewer
items can be constructed in a given period of time. Scoring takes
longer. Sometimes an answer not anticipated by the teacher is
given (objectivity is reduced).


SAQ 5

Consider each statement below. Circle T if it is a do and F if it is a
don't:
1. include items to adequately sample the material T F
2. include a Table of Specifications to assure adequate
comprehension T F
3. use questions which are partially true and partially false T F
4. use unnecessary words and phrases T F
5. write concise, unambiguous and grammatically correct
statements T F
6. have more than one theme in the item T F
7. have a pattern in the order of the response T F
8. have approximately the same number of true and false
statements T F
9. use negative statements T F
10. use the qualifying terms: all, none, some, few, many T F

Compare your answers to those in the Answers to SAQs section at the end of
the unit.


3.3.1.2. Entrance Tests
They are used to admit pupils to a certain school. They protect
admitting institutions and student funding agencies from too high a
failure rate. Tests are used by universities and other educational
institutions to assess the proficiency and predict the readiness of
applicants to benefit from instruction given in the foreign language.
Examples of entrance tests are the ACT (American College
Testing Programme) and the SAT (Scholastic Aptitude Test), which is
required for admission to many colleges in the USA. Applicants to
law schools and medical schools must pass special admission tests,
e.g. the LSAT (Law School Admission Test) and the MCAT (Medical College
Admission Test).

3.3.1.3. Readiness Tests
They assess whether a child is ready to benefit from instruction
in general or from instruction aimed at acquiring a certain skill e.g.
reading readiness.

3.3.1.4. Placement Tests
Closely related to the notions of diagnosis and selection is the
concept of placement. A placement test is a test which is designed to
place students at an appropriate level or stage in a programme or
language course. Such tests are used to assign learners to groups at
different levels. The term refers only to the purpose for which it is
used. Various types of tests or testing procedures (e.g. dictation,
interview, grammar test) can be used for this purpose. Such a test is
used to assign language learners to one of the following levels:
beginner, lower intermediate, middle intermediate, upper
intermediate, advanced. The UCLA Placement Exam is used to
assign students to all levels and screen students with extremely low
English proficiency for participation in regular university instruction.
Placement tests designed by teachers are usually successful
because they take into account the particular situation of their school.
They should be simple, easy to administrate and quick to mark.
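
The assignment of learners to levels described above can be sketched in a few lines of Python. This is only an illustration: the cut-off scores and the names CUTOFFS, LEVELS and place are hypothetical, and a school would have to set its own boundaries after trying the test out.

```python
import bisect

# Hypothetical cut-off scores on a 0-100 placement test (an assumption,
# not taken from any real placement exam).
CUTOFFS = [20, 40, 60, 80]

# The five levels named in the text, from lowest to highest.
LEVELS = [
    "beginner",
    "lower intermediate",
    "middle intermediate",
    "upper intermediate",
    "advanced",
]


def place(score):
    """Return the level whose score range contains `score`."""
    # bisect_right counts how many cut-offs the score has reached or passed,
    # which is exactly the index of the learner's level.
    return LEVELS[bisect.bisect_right(CUTOFFS, score)]
```

With these assumed boundaries, a learner scoring 59 would join the middle intermediate group and a learner scoring 80 the advanced group. Keeping the cut-offs in one list makes them easy to adjust to the situation of a particular school.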

3.3.1.5. Diagnostic Tests
Diagnostic tests are designed to show what a learner knows and
does not know, i.e. the strengths and weaknesses in the learning
abilities of the students. As they try to find out problem areas,
diagnostic tests help teachers to design mastery learning and to
work out remedial activities. The data are also useful for
self-assessment. For example, a pronunciation test may become a
diagnostic test if it tries to identify which sounds a learner is or
is not able to pronounce. Few tests serve only as diagnostic tests.
Achievement and proficiency tests may also be useful for diagnostic
purposes.
Areas of focus that may serve for diagnostic purposes:
phoneme discrimination tests
grammar and usage tests
controlled writing tests

Diagnostic tests are usually used at the beginning of a language
course. They are based on error analysis and deficiency analysis
(of the learners' language deficiencies).
Diagnostic tests
offer feedback to the learners
are set after about eight hours of instruction
are not longer than 15
can be marked by the learners themselves
motivate
reduce anxiety about later summative tests
quickly diagnose errors
prevent compound errors of learning (a week's poor learning
makes next week's learning all the more difficult)
are not used for grading or judging
Corrective help
mastery learning also involves a self-correcting system
retakes are allowed
learners are advised to use appropriate instructional materials
correction in group after the test
out-of-class meeting to clear up difficulties
encouraging family and friends to help
corrective learning and retesting continues until mastery has been
achieved

3.3.1.6. Progress Tests
A progress test is a small-scale achievement test (quiz) linked to
a textbook or a set of teaching materials. Progress tests are prepared
by a teacher on the basis of a textbook, a curriculum or a particular
course of instruction, and are given at the end of a unit, chapter,
course or term. They are more specifically focused and narrower in
scope than achievement tests. They are usually designed by the class
teacher, who can fully take into consideration his/ her knowledge of
the learners, the programme which they have been following, and his/
her own particular aims and goals.
Teachers should learn how to construct such tests as they are
extremely useful.

Such tests
are based on the language programme or curriculum (textbook,
workbook) which the class has been following
assess learning and teaching
familiarize the teacher with the progress of each of his students
and of the whole class
have positive and motivating backwash effect
reinforce what has been taught
allow the learners to show what they have learned
show high scores if progress has been made
do not require a wide range of performance as in the case of
standardized achievement or proficiency tests (Gauss's bell-shaped
curve, i.e. the Gaussian or normal curve showing the probability
distribution associated with different values of a variable, does
not apply in this case).

Progress tests are widely used, as they try to measure the
extent to which the pupils have learned what has been taught. They
are usually constructed by the class teacher, who can judge this
fully, taking into account:

his/ her knowledge of the students;
the programme which they have been following;
his/ her own particular aims and goals.


SAQ 6

Doing Well on Tests

Read each of the following questions and circle the correct answer.
1. When should preparation for taking a test begin?
a. when the teacher announces there will be a test
b. the first day of class
c. the night before the test

2. Which of the following is not an effective way to prepare for a test?
a. go over your notes, underlining key words and phrases
b. reread all of the material that might be covered on the test
c. write summaries and make outlines of what you have
learned

3. Which method should you use when you take a test?
a. answer the more difficult questions first, since they will take
the most thought
b. answer the questions in the order in which they are asked,
so as not to take extra time deciding which question to
answer first
c. answer the easiest questions first; then tackle the ones that
are more difficult for you

Compare your answers to those in the Answers to SAQs section at
the end of the unit.


3.3.1.7 Achievement or Attainment Tests
An achievement/ attainment test aims at showing how much of
a language has been learned with reference to a particular study
programme and its explicitly stated objectives. It differs from the
proficiency test, which is not linked to any language programme or
language syllabus. The terms achievement and attainment are
sometimes used interchangeably. Those who differentiate between them
emphasize that the achievement test is based on past learning or a
textbook, while an attainment test is based on what the learner can
do now, irrespective of his/ her past learning. A more useful
distinction is that between achievement tests and proficiency tests
(related to a particular purpose). Achievement tests may be
traditional or innovative. They may be used for certification of
learned competence. Achievement tests may become diagnostic
tests if they isolate learning deficiencies in the learner with the
intention of remediation.

3.3.1.8. Mastery Tests
Formative assessment refers to the process of providing
information to curriculum developers during the development of a
curriculum or programme. It is also used in syllabus design and the
development of language teaching programmes and materials. A
formative test is given during a course of instruction. It provides
feedback to the teacher and the student. It tests only what has been
taught. The score shows whether the student needs extra work. The
result is a pass or a fail. A person who fails is able to do more
study and take the test again. All tests in our schools should be
criterion referenced. Criterion referencing is ideal for mastery
objectives.
In order to avoid the usual 30 to 40 percent failure rate, teachers
should adhere to formative assessment by
allowing pupils as much individualized instruction as they feel
they need;
allowing them as much practice as they feel they need;
defining the skill the learners need in order to pass;
focusing practice on well-defined criteria in a checklist or a
list of competences;
identifying the causes of failure (gaps in the learning of the pupils);
allowing any number of retakes.

B.S. Bloom has shown that, as some people take five or six times
longer than others to learn something, the solution is appropriate
instruction. On this view, intelligence and aptitude tests measure how
quickly students can learn rather than what they can learn.
According to B.S. Bloom, the learner needs:
effort (the pupil should try hard enough for long enough);
quality instruction;
awareness of errors;
more time (the key to many learning situations).
Characteristics of mastery testing
in mastery testing, the person has either achieved (mastered) the
objective satisfactorily or not
typically, the objectives sampled in a mastery test are more narrow
mastery tests are used in programmes of individualized instruction
where a mastery-learning model is employed
the mastery learning model suggests that the degree of learning
is a function of the time spent on the material
if degree of learning is fixed at some mastery level, then the
amount of time individuals must spend to reach this level will vary.
The most rapid learners learn about 6 times as fast as the slowest
learners
mastery tests are useful at the early elementary school level
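
The relation between time and mastery listed above can be made concrete with a small Python sketch. It assumes, purely for illustration, that a learner covers the material at a constant rate; the function names and the figures (12 units of material, rates of 6 and 1 units per hour) are hypothetical.

```python
def hours_needed(units_of_material, units_per_hour):
    """Time a learner needs to reach mastery, assuming a constant learning rate."""
    return units_of_material / units_per_hour


def degree_of_learning(hours_spent, hours_required):
    """Share of the material mastered so far, capped at 1.0 (full mastery)."""
    return min(1.0, hours_spent / hours_required)


# A rapid learner and a slow learner facing the same 12 units of material.
fast = hours_needed(12, 6.0)   # learns 6 units per hour
slow = hours_needed(12, 1.0)   # learns 1 unit per hour

# If both are given only the time the rapid learner needs, the slow learner
# falls short; given his or her own required time, mastery is reached.
slow_after_fast_time = degree_of_learning(fast, slow)
slow_after_own_time = degree_of_learning(slow, slow)
```

Under these assumptions the slow learner needs six times as long as the rapid one, which is why mastery learning fixes the level to be reached and lets the time vary.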

Mastery learning offers 90 percent success. How is mastery
learning organized?
First stage: defining mastery objectives, i.e. objectives
attainable by the whole class after several hours of instruction and
corrected practice.
Writing mastery objectives:
The learner should be able to list, to recall, to recognize ...
The learner should be able to differentiate between, to summarize, to
evaluate ...
Mastery learning is based on the premise that all learners can master
a subject given sufficient time.

Points to Ponder

There is a widespread belief that the success of Asian
education systems depends in part on their zero tolerance of
failure. They adopt a mastery learning style: diagnostic
testing followed by corrective action. (Geoffrey Petty)
The essence of mastery learning strategies is group
instruction supplemented by frequent feedback and
individualized corrective help as each student needs it. (B.S.
Bloom, Evaluation to Improve Learning)


Mastery learning implies
individualized instruction (a step-by-step approach, especially for
the corrected practice phase of learning; projects and open-ended
activities allow learners to work at their own pace)
competence only in the basic use of skills or knowledge (stretching
activities that are not time-killers and that do not require extra
teaching assistance are recommended)
personalized extra work for students who experience difficulties
peer tutoring
access to reading materials at different levels
awareness of what mastery learning means





SAQ 7

What are the differences between a progress test and an achievement
test?







Write your answers in the space provided above (in no more than 20 words) and
compare them to those in the Answers to SAQs section at the end of the unit.


3.3.2. Function of Content

A test may be based on a certain theory of language (e.g.
structuralist, communicative) or a specific domain of content.

3.3.2.1 Proficiency Tests
Proficiency tests or attainment tests are theory-based tests.
They are most often global measures of ability in a language and are
not necessarily developed with reference to some previously
experienced course of instruction. They are used for placement or
selection because of their power to spread students out on a
proficiency range within the desired area of learning.
What does "proficiency test" mean?
A proficiency test is a summative test.
At the beginning of the twentieth century, proficiency meant the
present level of the learner's proficiency, i.e. limited or advanced
proficiency made up of a combination of reading and writing skills,
but also translation abilities, grammatical knowledge, vocabulary
range etc. The total or aggregate proficiency of the learner used to
be assessed by combining the scores from separate tasks or sub-tests.
After the Second World War the concept of unitary proficiency
was favored, i.e. a proficiency test based only on a single factor,
e.g. the concept of expectancy grammar, that samples only a
limited range of skills (translation, grammar knowledge). It has
been abandoned mainly because of its negative washback effect.
After the 1980s the concept of proficiency was used to refer to the
ability to do something with the language that has been
learned. This concept is related to ESP: a doctor has to be
proficient in English for medicine. The first test of proficiency was
the Cambridge CPE, developed for foreign teachers of English. This
specific or applied proficiency does not contradict a
communicative view of language. It has good face validity (it
meets the needs of the learners) and a positive washback effect.

There are three concepts of language proficiency:
Aggregate proficiency: the learner's present level of language
mastery, as demonstrated in his/ her ability to carry out a range of
language tasks. It is also used as a summative or even as an
achievement test. It has a good washback effect.
Unitary proficiency: an underlying level of proficiency or language
competence which can be applied to any language operation.
Because it uses only one technique (cloze, dictation, etc.) it has a
negative washback effect. Dirty tests often adopt this unitary
approach.
Specific/ applied proficiency: an externally defined level of
language needed for a particular job or academic course. This
concept of proficiency is widely accepted. It has good face validity
(e.g. ESP test).

In discussions of proficiency testing, we refer to specific or
applied proficiency. The main function of most proficiency tests is the
same as that of placement/ selection testing. Proficiency tests
have always been regarded as summative tests (given at the end of
a language course). Initially, proficiency was seen as a complex
combination of various skills: reading, writing, translation,
grammatical knowledge, vocabulary range. The learner's total or
aggregate proficiency is the result of combining the scores from
separate sub-tests. For instance, the Cambridge Syndicate
Certificate of Proficiency in English (CPE, 1913) consisted of five
papers lasting about six hours.
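
Combining the scores from separate sub-tests can be illustrated with a short Python sketch. The sub-test names, the marks and the optional weights below are invented for the example; they do not reproduce the marking scheme of the CPE or of any other examination.

```python
def aggregate_proficiency(sub_scores, weights=None):
    """Combine sub-test scores (each out of 100) into one weighted average.

    `sub_scores` maps a sub-test name to a mark; `weights` maps the same
    names to their relative importance (all equal if omitted).
    """
    if weights is None:
        weights = {name: 1.0 for name in sub_scores}
    total_weight = sum(weights[name] for name in sub_scores)
    weighted_sum = sum(sub_scores[name] * weights[name] for name in sub_scores)
    return weighted_sum / total_weight


# Hypothetical marks for one candidate on five sub-tests.
marks = {
    "reading": 80,
    "writing": 60,
    "translation": 70,
    "grammar": 90,
    "vocabulary": 50,
}

overall = aggregate_proficiency(marks)
# The same marks with reading counted twice as heavily as the other papers.
reading_heavy = aggregate_proficiency(
    marks,
    {"reading": 2, "writing": 1, "translation": 1, "grammar": 1, "vocabulary": 1},
)
```

The single aggregate figure is convenient for selection, but it hides the profile of strengths and weaknesses that the separate sub-test scores reveal.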


Point to Ponder

Dictation is the bête noire of teaching methods, but students often
find it quicker and easier than copying.
However, dictation is a disaster for slow writers and bad spellers.


The second definition (Oller, 1979), unitary proficiency, is like
Chomsky's notion of linguistic competence: a single factor which
can be applied to all aspects of language performance. If language
proficiency is a single or unitary factor, then a proficiency test need
only identify and assess that factor and that factor alone. For Oller,
the factor was expectancy grammar (the fact that we are able to
complete a sentence or to identify and correct a language form). A
test based on a single factor is built on the theory of unitary
proficiency.
Nowadays, proficiency means a person's ability to use a
language for a specific purpose. A proficiency test measures how
much of a language someone has learned. It is not
linked to a particular syllabus or course of study, but measures the
learner's general level of language mastery. Although this may be the
result of a particular course of instruction, the latter is not the focus of
attention.
Some proficiency tests have been standardized for worldwide
use, such as the American TOEFL test, which is used to measure
the English language proficiency of foreign students who wish to
study in the USA. Its specific purpose is to answer the question:
Does the student know enough English to follow a lecture in English?
Proficiency tests should be based on a specification of what
candidates have to be able to do in a language in order to be
proficient, i.e. able to function in a foreign language for a particular
purpose. For example, a test may be designed to determine whether a
candidate's English is good enough to function as a guide or
translator.
However, there are standard proficiency tests that do not aim at
a certain occupation. They are administered to candidates from
different schools:
Cambridge Examinations: First Certificate Examination,
Proficiency Examination
Oxford Examination: Preliminary and Higher
They show whether a candidate has reached a certain standard
with respect to certain abilities. Each proficiency test should be
based on detailed specifications. The selection of the best candidates
by teachers or employers should be based on these specifications. No
proficiency test is based on the courses that candidates might have
previously attended. Whereas achievement tests look back (at what
has been learned from a teaching programme or course), a proficiency
test looks forward. It answers the question: Will the student be able
to perform a particular task which he/ she will be required to carry
out in English?



SAQ 8

The main function of most proficiency tests is the same as one of the
other principal test types. Which?
Aptitude testing
Placement/ selection testing
Diagnostic testing
Progress testing
Communicative testing



Circle the answer. Compare it with that in the Answers to SAQs at the end of the
unit.

Communicative Tests. Communicative testing aims at testing
communicative proficiency. The items in this type of test use
communicative events, i.e. they are related directly to language
use: authentic tasks; knowledge of language functions and
appropriateness of expression to the social situation; knowledge of
structure and word meanings. Communicative testing allows the
testee some choice of what to communicate or what level of
proficiency to be tested on in certain skills, and uses up-to-date
texts representative of the testee's intended use of language.
Communicative testing aims at assessing communicative
competence, which is made up of grammatical competence (knowledge
of lexical items and rules of morphology), socio-cultural competence
(knowledge of the relation of language use to its non-linguistic
context and communicative functions, coherence and cohesion),
strategic competence (verbal and non-verbal communication
strategies that may be called upon to compensate for breakdowns in
communication due to performance variables or to insufficient
knowledge), and discourse competence. This model has exerted a
considerable influence on all aspects of language teaching and
assessment, including overall approach, syllabus design, methodology
and testing.


The main characteristics of communicative tests

Communicative tests are integrative rather than discrete point.
Communicative testing covers the full range of language skills.

They are administratively costly, as they involve large numbers of
markers.
The use of yardsticks reduces the subjective element: 9: expert
user; 8: very good user; 7: good user; 6: competent user; 5:
modest user; 4: limited user; 3: extremely limited user; 2:
intermittent user; 1: non-user. Each band has a brief description
of the expected language performance.
They test knowledge of the language rather than knowledge of
the elements of the language.
The students have to produce language rather than simply
recognize appropriate language.
The learners have to respond by using their own language
rather than merely the examiner's language.
They are tests of actual performance.
They are realistic: there is a real-world purpose.
They are authentic.
They use real-world tasks.
They are based on an information gap.
The learner uses strategies which are part of communicative
competence.

Examples of simple communicative items:
a) Show that you can't, or don't, believe what the other person is telling you.
b) Show you are annoyed. Use some phrases which are strong and
firm but without swearing.

Answers:
a) Come off it!
You're pulling my leg.
That's not true.
You are having me on.
You can't really mean that!
b) Push off!
Get lost!
I've had enough! Just stop it.
Leave me alone!
I've already told you I don't want to discuss that.
Who on earth told you that?
Where on earth have I put my keys (books) etc?





























3.3.2.2 Achievement Tests

Achievement tests are related to a total syllabus or to
classroom lessons or units. Most school examinations (secondary
school entrance tests, school certification examinations) take the
form of achievement tests. National examinations organized by the
Romanian Ministry of Education and Research are achievement tests.

SAQ 9

Identify the main characteristics of proficiency tests in the modern
sense of the word. Tick the correct statements:
Proficiency tests:

a. are based on a language syllabus
b. look forwards
c. look backwards
d. are based on an analysis of the language learner
e. have a positive washback effect
f. have a negative washback effect
g. are standardized
h. are made by teachers
i. are used for diagnostic purposes
j. are used for certificates
k. are used for selection

Compare your answers with those in the Answers to SAQs at the end of the unit.
Innovative (communicative) achievement tests look
backwards and forwards. They look backwards because they are
based on a textbook/ syllabus that has been taught/ learned. They
look forwards because they test whether the learner is able to
transfer his/ her language knowledge or abilities to the world outside
the walls of the classroom.
Revision tests. You can also ask students to learn for a short
test. This works best if the learning task is specified very
precisely. For example, don't say "Read for a test next week", but
"I want you to learn the following ...". The learning task should be
very well defined. Then, there is no excuse for failure. The revision
test should be achievable by the great majority. Success should be
rewarded with praise and recognition. Students get a real feeling of
achievement from success in tests and this fuels future motivation.



3.3.2.3 Aptitude/ Prognostic Tests

Aptitude (prognostic) tests are based on abilities that are
related to the process of acquisition, rather than the use of language.
Such tests measure the testee's probable performance in a language
he/ she has not studied. They measure the suitability of a testee for a
specific programme of instruction or a particular job. Sometimes the
term is used synonymously with intelligence tests or scientific tests.
The theory of language aptitude, as described by Carroll
(1956-1981), hypothesizes that cognitive abilities such as rote
memorization, phonetic coding, and the recognition of grammatical
analogies are related to an individual's ability to learn a second or
foreign language, and together constitute language aptitude. Language
learning aptitude is also related to intelligence, age, motivation,
phonological sensitivity and sensitivity to grammatical patterning. As
you know, all these elements vary greatly from one pupil to another.

SAQ 10

a. Achievement tests are sometimes described as summative.
Why?
b. How do we judge good achievement tests?











Write your answers in the space provided above (in no more than 60 words) and
compare them to those in the Answers to SAQs section at the end of the unit.

The Modern Language Aptitude Test (Carroll and Sapon, 1958)
and Pimsleur's Language Aptitude Battery have recently been
severely criticized: they do not measure language aptitude but
general intelligence or academic ability. These tests disregard
learning strategies and styles, context, motivation and determination.
A language aptitude test may be used to predict the likelihood
of success of a candidate for instruction in a modern language. It is
made up of several different tests that measure:
sound coding ability: the testee has to identify and remember
new sounds in a foreign language;
grammatical coding ability: the testee has to identify the
grammatical functions of different parts of sentences;
inductive learning ability: learners are left to discover or induce
rules from their experience of using the language, i.e. meanings
are induced without explanations;
memorization: the ability to remember words, sentences and rules in
a foreign language.



3.3.3. The Frame of Reference

The results of a test can be interpreted in two different ways,
depending on the frame of reference adopted:

3.3.3.1. If the frame is the performance of a particular group of
individuals, we speak of norm-referenced tests (NRT) or
psychometric tests (Cziko: 1981).
3.3.3.2. If the frame of interpretation is domain referenced, i.e.
scores are interpreted with respect to a specific level/ ability, we
speak of criterion-referenced tests (CRT) or edumetric tests
(Cziko: 1981).
There is no essential difference between CRT and
domain-referenced tests. Both differ from objectives-referenced
tests, in which items are selected to match objectives directly without
reference to a pre-specified domain of target behaviors.

SAQ 11

Read, reflect and take a decision. Would you administer an aptitude
test to your 3rd form pupils? Yes? No? Justify your decision.










Write your answers in the space provided above (in no more than 60 words) and
compare them to those in the Answers to SAQs section at the end of the unit.

3.3.3.1 Norm-Referenced Tests
A norm-referenced test compares candidates with each other
and usually rewards the best. Why norm referenced? Because
marks show how the testee does compared with the norm, or
average, for all the testees. If a test result shows that the testee
obtained a score that placed him/ her in the top ten per cent of
candidates or in the bottom five per cent, or that he/ she did better
than 70% of those who took the test, we may say that the test is
norm-referenced. The testee's score relates one candidate's
performance to that of the other candidates. However, the score
does not tell us directly what the testee is capable of doing in the
language. You must not forget that, for statistical reasons,
norm-referenced tests work effectively only for examinations with at
least a few hundred testees. The percentage of candidates getting each
grade remains unchanged, regardless of their marks, unless a
conscious decision is made to change the percentages. It follows that
variation in the difficulty of tests/ exams from year to year does not
affect grades. This is sometimes unfair, however, as the testees may
be better in one year than in the next. Norm referencing was used for
deciding grades for admission exams in the Romanian system of
education when a limited number of students were accepted to higher
forms of learning. In England, norm referencing is used for deciding
A-level and GCSE grades.
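
The two ideas in the paragraph above (a score read as a position in the group, and grades awarded to fixed percentages of candidates) can be sketched in Python. The function names, the sample marks and the quota values are hypothetical; they are chosen only to show the principle and do not describe any real examination.

```python
def percent_outscored(score, all_scores):
    """Percentage of the scores in the group that lie strictly below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


def grade_by_quota(all_scores, quotas):
    """Rank the candidates and award grades to fixed shares of the group.

    `quotas` is a list of (grade, share) pairs, best grade first, whose
    shares add up to 1. The grade depends only on rank, not on the raw mark.
    """
    n = len(all_scores)
    ranked = sorted(range(n), key=lambda i: all_scores[i], reverse=True)
    grades = [None] * n
    start = 0
    cumulative = 0.0
    for grade, share in quotas:
        cumulative += share
        end = round(cumulative * n)
        for i in ranked[start:end]:
            grades[i] = grade
        start = end
    return grades


# Invented marks for ten candidates; a real norm-referenced examination
# would need at least a few hundred.
marks = [35, 50, 62, 70, 48, 90, 55, 41, 77, 66]
quotas = [("A", 0.2), ("B", 0.5), ("C", 0.3)]

grades = grade_by_quota(marks, quotas)
# An easier paper on which every candidate gains ten marks.
easier_paper_grades = grade_by_quota([m + 10 for m in marks], quotas)
```

With these assumed figures the two grade lists are identical, which illustrates why the difficulty of the paper does not affect norm-referenced grades, and also why the grades say nothing directly about what a candidate can do in the language.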
Characteristics of Norm Referenced or Standardized Tests:
must have been previously administered to a large sample of people
acceptable standards of achievement can only be determined after
the test has been developed and administered (by reference to the
mean or average score of other students)
items at various levels of difficulty are included
discriminate between low and high achieving students
good reliability and validity
norm-referenced measurement is necessary to make different
predictions
if learners differ in achievement levels, this normative information
can often assist in decision-making
norm-referenced testing is often considered a substantial
component of program evaluation
whether one uses norm- or criterion-referenced measurement
depends upon the kind of decision one wishes to make
Weaknesses:
norms change with time as the characteristics of the population
change, and therefore tests must be re-normed
developed independently of any particular course of instruction


Point to Ponder

Never underestimate the pleasure, satisfaction and
educational value which pupils get from satisfactorily completing an
action, however simple. (Michael Marland, The Craft of the Classroom)





3.3.3.2 Criterion- Referenced Tests

Criterion-referenced tests measure what the testee can do,
awarding a pass if he/ she can do it and a fail if he/ she cannot. It
does not matter if all the candidates pass or if all of them fail. A
clear example is the driving test. This method of assessment is
reliable only if the criteria are well defined (e.g. in a checklist or
a list of competences); otherwise, different markers will apply
different standards, or the same marker may apply different standards
on different days and to different candidates. This method is
appropriate for mastery objectives.
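
A criterion-referenced decision of this kind can be sketched in Python as a check against a list of competences. The checklist items and the function name below are hypothetical examples, not criteria taken from any syllabus.

```python
def criterion_result(observed, checklist):
    """Return ('pass' or 'fail', list of criteria not yet met).

    `observed` maps each criterion to True or False for one candidate;
    a criterion missing from `observed` counts as not met. The result
    never depends on how the other candidates performed.
    """
    unmet = [c for c in checklist if not observed.get(c, False)]
    return ("pass" if not unmet else "fail"), unmet


# A hypothetical checklist for a short speaking task.
CHECKLIST = [
    "greets the partner appropriately",
    "asks for directions",
    "understands the reply",
]

ana = {
    "greets the partner appropriately": True,
    "asks for directions": True,
    "understands the reply": True,
}
dan = {
    "greets the partner appropriately": True,
    "asks for directions": False,
}

ana_result = criterion_result(ana, CHECKLIST)
dan_result = criterion_result(dan, CHECKLIST)
```

Returning the unmet criteria together with the verdict gives the learner the diagnostic feedback needed for corrective work and a retake.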

Strengths of criterion referenced tests

set criteria meaningful in terms of what people can do
the criteria do not change with different groups of candidates
motivate learners to reach these criteria
they have a beneficial backwash effect
they are helpful in clarifying objectives
useful with small groups
test anxiety is reduced

Weaknesses

many criterion-referenced tests are shorter and therefore less
reliable than norm-referenced tests
students are unable to compare their performance with that of
other students
bright students, who easily attain the level of mastery, may not be
motivated to reach high standards
the results do not inform decision makers whether children achieve
what they should when they should

SAQ 12

The following list summarizes the chief objectives of language testing:
1. to determine the readiness for instructional programmes
2. to classify or place individuals in appropriate language classes
3. to diagnose the individual's specific strengths
4. to measure aptitude for learning
5. to measure the extent of student achievement of the
instructional goals
6. to evaluate the effectiveness of instruction

Group these six categories under three headings:
1. Aptitude test
2. General Proficiency test
3. Achievement test



Compare your groups with those in the Answers to SAQs at the end of the unit.




Points to Ponder

Learners require some reward or reinforcement for learning.
Reinforcement should follow the desired behavior as soon as
possible.
Learning proceeds step by step rather than happening all at
once, and it is strengthened by repeated success.
Self-assessment is preferable to teacher assessment.


3.4 Summary



This unit has been concerned with informal assessment and formal
assessment. Categories of tests have been introduced taking into
account the purpose, the content of the tests and the frame of
reference within which their scores are to be interpreted. The following
types of tests have been introduced: selection tests, entrance tests,
placement tests, diagnostic tests, achievement/ attainment tests,
mastery tests, proficiency tests, aptitude or prognostic tests,
norm-referenced tests and criterion-referenced tests.
Norm-referenced tests are used to interpret the score of an individual
by comparing it with those of other individuals. Criterion-referenced
tests are used to interpret a person's performance by comparing it to
some specified behavioural criterion.


SAQ 13

Suppose you give 200 learners a test, choosing the best 40% to attend
a good high school, and the remaining 60% to attend a vocational school.
Is this test criterion-referenced, norm-referenced, motivating or
demotivating? Tick the right answers.

criterion-referenced

norm-referenced

motivating

demotivating

Compare your answers to those in the Answers to SAQs section at the end of the
unit.


3.5 Key Concepts

Achievement/ attainment test
Aptitude test
Communicative test
Criterion-Referenced test
Diagnostic test
Entrance test
Informal assessment
Norm-Referenced test
Mastery test
Placement test
Proficiency test
Progress test
Readiness test
Selection test

3.6 Checklist

What is the purpose of the test? Why am I giving it?
What skills, knowledge, and so on, do I want to measure?
Have I clearly defined the instructional objectives?
Have I prepared a table of specification?
Do the test items match the objectives?
What kind of test/ test format do I want to use? Why?
How long should the test be?
What do I need to do to prepare learners for taking the test?
How are scores/ grades, or level of competency to be assigned?
How are the test results to be reported?

3.7 Answers to SAQs

SAQ 1

Your answer depends upon your personal teaching and learning
experience.

Formal assessment: written grammar activities, reading tasks,
listening tasks, vocabulary activities. The others are generally
assessed informally.

SAQ 2

Your answer depends upon your personal teaching and learning
experience.

A I
B I
C F

SAQ 3

Your answer depends upon your personal teaching and learning
experience.

a. L, b. H, c. L, d. L, e. H
SAQ 4

If your answer to SAQ 4 is not comparable to the one suggested
below, please reread section 3.3.1.1.2.

1. In Blake's "The Lamb", the lamb stands as a symbol of
innocence.
2. Only three men have been elected President of the U.S. after
having been defeated for that office in the preceding general
election.

SAQ 5

If your answer to SAQ 5 is not comparable to the one suggested
below, please reread section 3.3.1.

False: 3, 4, 6, 7, 9, 10 (Don'ts)
The rest are true (Do's)

SAQ 6

Your answer depends upon your personal experience / common
sense.

1. b, 2. b, 3. c

SAQ 7

If your answer to SAQ 7 is not comparable to the one suggested
below, please reread sections 3.3.1.6 and 3.3.1.7.

Progress tests:
Look back over a period of learning
Are small-scale tests
Are used for diagnostic purposes
Are administered during the language programme
Are devised by the teacher
Have no pass/ fail purpose

SAQ 8

If your answer to SAQ 8 is not comparable to the one suggested
below, please reread section 3.3.1.

Placement/ selection testing

SAQ 9

If your answer to SAQ 9 is not comparable to the one suggested
below, please reread section 3.3.2.1.

b, c, g, h

SAQ 10

If your answer to SAQ 10 is not comparable to the one suggested
below, please reread section 3.3.2.2.

a. Achievement tests are sometimes described as summative, that
is, they sample the total language syllabus at the end of the
course/ after a term/ at the end of the school year.
b. The most important criterion is content validity. They must
sample the language syllabus fully and fairly. They should not test
anything which has not been taught.

SAQ 11

The answer is definitely "no". A justification may be found in
the following paragraph:

How is one to interpret a language aptitude test? Rarely does an
institution have the luxury or freedom to test people before they
take a foreign language to counsel certain people out of their
decision to do so. So, an aptitude test biases both student and
teacher. They are each led to believe that they will be successful
or unsuccessful, depending on the aptitude test score, and a
self-fulfilling prophecy occurs. It is better for teachers to be optimistic
for students, and in the early stages of a student's process of
language learning, to monitor styles and strategies carefully,
leading the student toward strategies that will aid in the process of
learning and away from those blocking factors that will hinder the
process. (Brown: 1994, p. 261)

SAQ 12

If your answer to SAQ 12 is not comparable to the one suggested
below, please reread sections 3.3.1.7, 3.3.2.1 and 3.3.2.2.

Achievement test: 4
General Proficiency test: 1,2,3
Achievement test: 5,6

SAQ 13

If your answer to SAQ 13 is not comparable to the one suggested
below, please reread sections 3.3.3.1 and 3.3.3.2.

Norm-referenced, demotivating


3.8 Further Readings

Harrison, Andrew (1983), A Language Testing Handbook, London: Macmillan, pp. 4-10
Hughes, Arthur (1991), Testing for Language Teachers, Cambridge: Cambridge University
Press, pp. 9-22, 48-59, 59-75

Types of Tests II


Unit 4

TYPES OF TESTS II


4.1 Unit Objectives
4.2 Formal Assessment - Types of Tests and Testing
4.2.1 Scoring Procedures
4.2.1.1 Subjective Tests
4.2.1.2 Objective Tests
4.2.1.3 Performance Tests
4.2.2 The Specific Technique or Method They Employ
4.2.2.1 Multiple Choice, Completion, Dictation, Cloze Tests
4.2.3 The Approach to Test Construction
4.2.3.1 Direct Tests
4.2.3.2 Indirect Tests
4.2.4 Function of the Number of Elements Tested at a Time
4.2.4.1 Discrete Point Tests
4.2.4.2 Integrative Tests
4.2.5 Speed Tests vs. Power Tests
4.2.6 Other Test Categories
4.3 Self-Assessment
4.4 Standardized Tests
4.5 Summary
4.6 Key Concepts
4.7 Checklist
SAA 2
4.8 Answers to SAQs
4.9 Further Readings


4.1 Unit Objectives

The history of language testing may be divided into three
periods:
the prescientific period (prior to the early 1950s)
the psychometric-structuralist period (from the early 1950s
through the late 1960s)
the integrative-sociolinguistic period (from the late 1960s to the
present time)
This unit aims at covering the last two periods in the history of
testing and the new trend of self-assessment. It implicitly reflects
teachers' and researchers' striving for objectivity.

At the end of this unit, you will be able to
distinguish among various objective tests
understand why clarity of expression is so important in test items
recognize how irrelevant clues to the correct answer can easily
creep into objective items
define and discuss the following objective type formats: short
answer, matching, true-false and multiple choice
apply guidelines offered for constructing objective tests
write better objective tests


4.2 Formal Assessment - Types of Tests and Testing

Language tests can also be classified according to several
distinct criteria:

4.2.1. The way in which they are scored
4.2.1.1. subjective tests
4.2.1.2. objective tests
4.2.1.3. performance tests

4.2.2. The specific technique or method they employ
4.2.2.1. multiple choice, completion, dictation, and cloze tests

4.2.3. The approach to test construction
4.2.3.1. direct tests
4.2.3.2. indirect tests

4.2.4. The number of elements tested at a time
4.2.4.1. discrete point tests
4.2.4.2. integrative tests

4.2.1. Scoring Procedures
According to their scoring procedures, tests may be:
objective (the correctness of the test taker's response is determined
entirely by predetermined criteria, so that no judgment is required
on the part of the scorers, e.g. multiple choice tests, cloze tests and
dictation)
subjective (the scorer must make a judgment about the correctness
of the response based on his / her subjective interpretation of the
scoring criteria, e.g. oral interviews or written compositions)
The objective-type item was developed in response to the
criticisms leveled against essay questions: inadequate content sampling,
unreliable scoring, time-consuming grading, and encouragement of
bluffing. All objective-item formats may be subdivided into two
classes:
supply type (short answer)
select type (true-false, matching, and multiple choice)
These two types are sometimes called recall and recognition.



4.2.1.1. Subjective Tests
A subjective test requires the scorer to exercise personal judgment.
An example might be the scoring of free,
written compositions for the presence of creativity when no definition of
creativity is provided. Many tests, such as cloze tests that permit all
grammatically acceptable responses to systematic deletions from a
context, lie between the extremes of objectivity and subjectivity. It is
true, however, that some subjective tests may be objectified in
scoring, either by using a precise rating schedule that
clearly specifies the kinds of errors to be quantified or through the
use of multiple independent raters.

4.2.1.2. Objective Tests

Objective tests can be marked without the use of the
examiner's personal judgment. The correctness of the testee's
response is determined entirely by predetermined criteria:
examinees' responses are compared with a scoring key.

Advantages of objective tests:
Have only one correct answer
Are scored mechanically; no particular knowledge or training in the
examined content area is required on the part of the scorer
All have the students working in a completely structured situation
and responding to a large number of items
Many questions can be asked during the examination period and
more adequate content sampling can be obtained
Objective items may create an incentive for pupils to build up a
broad base of knowledge, skills and abilities
May be marked by computer
Require more careful preparation than subjective examinations
Can be made just as easy or as difficult as the test constructor wishes
Can be pre-tested before being administered on a wider basis
May encourage guessing (although 4 or 5 alternatives for each item are
sufficient to reduce the possibility of guessing)
May emphasize irrelevant areas just because they are testable
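
The remark about guessing can be checked with elementary probability. The sketch below is an illustration under two stated assumptions: the testee guesses blindly on every item, and every option is equally likely to be chosen. The function names are hypothetical.

```python
from math import comb


def expected_guess_score(n_items, n_options):
    """Average number of items answered correctly by blind guessing."""
    return n_items / n_options


def prob_at_least(n_items, n_options, pass_mark):
    """Probability of reaching pass_mark or more correct answers by blind
    guessing (binomial distribution with success chance 1 / n_options)."""
    p = 1 / n_options
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(pass_mark, n_items + 1)
    )


# A 20-item test with a pass mark of 10, first as true-false items
# (2 options), then as 4-option multiple choice items.
print(expected_guess_score(20, 2), prob_at_least(20, 2, 10))
print(expected_guess_score(20, 4), prob_at_least(20, 4, 10))
```

Moving from two to four options halves the expected chance score and makes it far less likely that a testee reaches the pass mark by guessing alone.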

SAQ 1

Which is more subjective?
The scoring of an essay
The scoring of short answers in response to questions on a
reading passage

Circle your answer. Compare your choice with that in the Answers to SAQs section
at the end of the unit.

Drawbacks. Some critics of the objective-type item contend
that objective tests:
Do not measure the higher mental processes, but rather
encourage rote memorization
Encourage guessing
Neglect the measurement of writing ability


Point to Ponder

Everything it is possible for us to analyze depends on a clear
method which distinguishes the similar from the not similar.
Linnaeus, Genera Plantarum, 1754

Objective items can be written so that they measure the higher
mental processes of understanding, application, analysis and
interpretation.

Guidelines for writing objective tests
test for important facts and knowledge
tailor the questions to fit the examinees' age and ability levels as
well as the purpose of the test
write the items as clearly as possible. Ambiguity can often occur
when qualitative rather than quantitative language is used (few,
many, low can mean different things to different pupils)
clarity can also be improved by using good grammar and sentence
structure
avoid using interrelated items (the correct answer may be found in
another item)
there should be only one correct answer
avoid negative questions whenever possible (not, never, least)
do not give the answer away

Objective techniques are clearly preferable in terms of
practicality (speed of scoring) and reliability (consistency of
scoring). Usually, tests of the so-called productive skills (writing and
speaking) are scored subjectively. They are based on the teacher's judgment
or impression of the language output. The lack of reliability is
evidenced by:
low inter-rater reliability (the scores of two markers do not correlate)
low mark/ re-mark reliability (the same marker gives different scores
to the same test if asked to re-mark it after a while)
The first generation of tests used subjective scoring: impression
marking. The second generation restricted the use of subjective
tests. The third generation developed more reliable subjective
techniques.
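
One common way to put a number on how well two sets of marks agree is the Pearson correlation coefficient. The sketch below is only an illustration: the marks are invented, the function name is hypothetical, and a high correlation shows that two markers rank the papers similarly, not that they award identical marks. The same calculation can be applied to one marker's first and second marking of the same papers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of marks.
    Values near 1 mean the two sets of marks rise and fall together."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 marks)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all marks are equal")
    return cov / (spread_x * spread_y)


# Invented marks given by two markers to the same five compositions.
marker_a = [12, 15, 9, 18, 14]
marker_b = [11, 16, 10, 17, 13]

print(round(pearson(marker_a, marker_b), 2))
```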

Examples of objective test items (excluding multiple-choice items)

1. Conversions
Helen is a very good learner of English.
Helen learns ......
Change the following sentences into questions:
I am a student.
We can work together.
2. Gap filling
He will come at half ...... eight.
...... high-speed computer is always expensive.
I always take ...... magazine with me.
...... the United States, 30 million people have successfully kicked
the habit of smoking ...... 1964, and only one ...... three Americans now smoke.
3. Combination
Combine the following sentences into one sentence without using
"and" or "but":
Do you see that traffic policeman? He is stopping the cars. He is
letting the school children cross the road.
Answer: Do you see that traffic policeman stopping the cars to let
the school children cross the road?
Helen did her homework. Then she went swimming. After ......
That cheese pie is hard to resist. I'm on a strict diet.
4. Addition
Yet . Haven't seen this film.
5. Rearrangement or Ordering
At/ poor/ look/ that/ woman/ old
I had seen/ immediately reminded me/ John's face/ in a zoo/ of a
silly monkey.
6. False/ True
Circle in the margin
A. if the sentence is true
B. if the sentence is false

A magazine is a newspaper shop. A, B
A library is a bookshop. A, B
7. Insertion
Identify a missing word. Find the place and write in the word:
You should go to see it. It's best film I've ever seen.
8. Alternative response
Circle your answer:
Do you enjoy to watch television? Correct/ incorrect
9. Sentence completion
I hope you ...... if I leave. (free response)
10. Rewriting
Rewrite in reported speech: "I think you're great," he said.
11. Correction
Correct the grammar: Is that the woman which lives next door to you?
12. Short-Answer Items
They are something of a cross between the essay and other
objective items. On the one hand, like essay items, they require
recall rather than recognition. On the other hand, they can be
objectively scored. Short-answer items are best for dates, names,
places, vocabulary.

Objectivity in scoring results in greater reliability.

4.2.1.3. Performance Tests
Performance tests are tests of skill: the skill with which
learners can identify objects, manipulate objects, perform assigned
tasks, or react to simulated situations. Typical examples are
tests of typing speed and simulated-situation tests (the
typing of business letters, personal letters). The technique is often
used to assess skills in areas such as chemistry, physics, and foreign
languages. Performance testing has long been neglected in
classroom testing, as teachers are preoccupied with tests of verbal
behavior. Performance tests are:
used to measure the effectiveness of final behavior
used at early grade levels, before paper and pencil tests can be
effectively used (handwriting performance tests; sentences to be copied)

With very young examinees or with those who cannot write, it
may be necessary to use an oral response format. Identification tests
include tasks that may be presented orally or visually, from a
reproduction on the exam paper. Examinees are asked to respond
orally or in writing. In the latter case, responses may be short
answers, completion, multiple choice and matching.

SAQ 2

Which is generally more suitable for each of the following language
areas: an objective or subjective test?

Language area    Objective    Subjective
pronunciation
vocabulary
grammar
discourse
listening
speaking
reading
writing

Tick your answers. Compare them to those in the Answers to SAQs section at
the end of the unit.


Point to Ponder
Darwin concluded that he needed to keep a notebook and pencil with
him at all times, as he found that he remembered evidence in favour of
his theories, but quickly forgot evidence against them! We too tend to
have selective memories about our students' work and behavior.
Geoffrey Petty





4.2.2. The Specific Technique or Method They Employ

4.2.2.1. Multiple Choice Tests
The multiple choice item is the most popular and useful of all
objective item types. It can be used to measure rote memory as well
as complex skills. It is simple to score and administer.

SAQ 3

Objective testing uses a variety of techniques. Classify them into
discrete point techniques and integrative techniques.

Techniques              Discrete point techniques    Integrative techniques
Transformation
Fill in the blanks
Blank and cue
Joining elements
Replacing elements
Adding elements
Arranging elements
Matching elements
True/ false
Multiple choice
Cloze
Dictation
Information transfer

Tick your choices in the space provided and compare them to those in the Answers
to SAQs section at the end of the unit.
Stages in Constructing the Test Items
Consider the purpose of the test (mastery or discrimination) and
each item's specific instructional objective
Determine the actual areas to be covered by multiple-choice items.
Make a careful list of exactly which items it is desirable to include
(table of specifications)
Determine the number of items to be included in the test
Include enough items to ensure reliability, depending on the level of
difficulty, the nature of the areas being tested, and the purpose of the test
Avoid overly long tests (a source of administration difficulties, mental
strain and tension)
Have someone read the tentative items before preparing them in
final form

The Structure of a Multiple Choice Test
The initial part is the stem. A stem can be either a direct question
or an incomplete statement which can be completed correctly. The
stem contains the problem and sets an appropriate frame of
reference. It must include all the conditions and limitations needed
to respond. The direct question format has several
advantages:
1. it forces the item writer to state the problem clearly in the stem
2. it reduces the possibility of giving the examinees grammatical
clues
3. it may be more easily handled by the younger and less able
students because less demand is placed on good reading skills
One disadvantage of the direct question form is that it may
require lengthier responses.
The choices offered are referred to as options,
responses or alternatives.
One option is the answer, correct option, or key. It is true that a
multiple response question can have more than one correct
answer among the options. In this case, the pupil may be told how
many there are, and is required to identify all of them to gain the mark.
The other options are distracters (or foils). The role of the
distracter/ foil is to divert or distract the majority of learners from
the correct option.
Use the active, present tense in stems and options.
Avoid using double negatives in either the stem or the options. Be
careful with negative questions too. For example, if asking "Which
of the following is not true?" or "Which is an exception to the
rule?", make it really stand out that it is a wrong option that has to
be selected in such questions. You should be aware that testees
usually look for correct options. It follows that it is much better to
write "Which of the following is ...?" rather than "Which of the
following is not ...?".
Use simple, direct sentences, careful layout, and appropriate
emboldening
After the test has been constructed and carefully corrected for
ambiguities, inappropriate vocabulary, unintentional
comprehension difficulty or obscurity, unlikely meaning, and a
detectable pattern in the order of correct choices (e.g. ABCABC), it should
be passed on to another person to be read and checked for these
weaknesses.
Incorporate in the stem all words which otherwise would have to
be repeated in each alternative
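
The check for a patterned order of correct choices (such as ABCABC) can be partly automated before the test is passed on to a second reader. The sketch below is a minimal illustration, assuming the answer key is written as a string of option letters; the function names are hypothetical. It only detects a key that is one short block repeated end to end, and it counts how often each letter is the key so that an over-used position stands out.

```python
from collections import Counter


def is_repeating_pattern(key):
    """True if the whole key is a shorter block repeated end to end,
    e.g. 'ABCABC' (block 'ABC') or 'AAAA' (block 'A')."""
    # A string is a repetition of a shorter block exactly when it occurs
    # inside its own doubling with the first and last characters removed.
    return len(key) > 1 and key in (key + key)[1:-1]


def position_counts(key):
    """How often each option letter is the correct one."""
    return Counter(key)


print(is_repeating_pattern("ABCABC"))   # patterned key
print(is_repeating_pattern("ACBDAB"))   # no simple repetition
print(position_counts("ACBDAB"))
```

A human check is still needed: a key can be free of simple repetition and yet favour one letter, which is why the counts are reported as well.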

Point to Ponder
God himself does not presume to judge a man till the end of his
days. Why then should you or I? (Ben Jonson)


Weaknesses
The multiple-choice test is both blamed and praised.
It is mainly blamed because it:
Tests only factual knowledge (in fact it can also measure
understanding, application of principles, analysis, synthesis and
evaluation);
Penalizes the creative student (research in this area has failed to
support this assumption. In fact a creative student may be better
off with selection-type tests than with supply-type tests);
Is difficult to construct: good multiple-choice items can be written
only by skillful individuals, and teachers cannot always think of
plausible-sounding distracters
Tempts teachers to write multiple-choice items
demanding only factual recall
Requires more time to answer (than in the case of true-false tests)
Cannot be used to measure the testee's ability to organize materials,
or to express answers clearly according to acceptable
language usage rules (the only solution in this case is to
complement it with an essay-type examination)



SAQ 4

A paper is scored by two different scorers, or by the same
scorer again after two weeks. The scores are similar in both cases.
Comment upon: reliability, objectivity, subjectivity.









Write your answers in the space provided above (in no more than 30 words) and
compare them to those in the Answers to SAQs section at the end of the unit.

Strengths
The multiple-choice test is the most flexible and versatile of all
selection-type examinations. It can be used to measure instructional
objectives at all levels of the cognitive domain (i.e. knowledge,
comprehension, application, analysis, synthesis, and evaluation).
Multiple choice tests may be
designed for all subject matters and for learners at all grade levels.

A large number of items can be answered during a brief period of
time. It can have a relatively small content sampling error if a table
of specifications is carefully designed and used
There are no scoring errors
It may be scored rapidly, accurately, and objectively even by
individuals who are unqualified (secretaries, student assistants)
The scoring is not influenced by the previous performance or by
the personal appearance of the testee
Scoring is completely objective

Example: The first version
The language of the earliest literature of England was:
a) difficult;
b) French;
c) Anglo-Saxon;
d) Latin.
The improved version
The language of the earliest literature of England was:
a) Midland;
b) French;
c) Anglo-Saxon;
d) Latin.

Example: 1. The works of Mihai Eminescu had "a lasting" effect on the
development of modern Romanian poetry. - STEM
A. an enduring - correct option/ key
B. an unknown - distractor/ foil
C. a startling - distractor/ foil
D. a final - distractor/ foil

2. Where is water found? - STEM
A. in the air - foil/ distractor
B. on the earth's surface - foil/ distractor
C. in the ground - foil/ distractor
D. all of the above - KEY RESPONSE

Principles of construction
Each multiple choice item should have only one answer
(absolutely correct), although some instructions require choosing
the best option. There must be no ambiguity in the choices.
Test one feature at a time (it is less confusing for the testee and it
helps to reinforce a particular teaching point). Items that test more
than one feature at a time are called impure items. Test constructors use
impure items because of the limited number of distractors.
Each option should be concise, unambiguous, and correct when
included in the stem (except in the case of specific grammar test
items). Phrases/ words not required to express the question or to
set an appropriate frame of reference should be eliminated.
Difficult and technical words and phrases should be eliminated.
Figurative language and complex sentences should be avoided.
Do not provide clues through tense conflicts, misuse of articles,
and singular-plural conflicts between verbs and nouns. Each
alternative must linguistically fit the stem.
The vocabulary and general construction of each sentence should
be at a level that the learners can easily comprehend, so that they
are not distracted from the real task of deciding on tense and
aspect.
The correct choice should not repeat word for word some sentence
in a listening test
Avoid using stems ending with the article "a" or "an", or with a preposition
The correct choice should not depend on comprehension or
non-comprehension of one unusual vocabulary item
All multiple choice items should be at a level appropriate to the
linguistic ability of the learners
The context should be at a lower level than the actual problem that
the item is testing
Multiple choice items should be brief and clear (although the
tendency is to provide a context)
Begin with one or two simple items
In constructing items to test ability to select correct verb forms, one
must give sufficient indicators of time relationships to make the
appropriate choice clear
Examples:
1. If she ...... (understand) the situation, she would explain it to
us.
The solution is "understood", as required by "would explain".
2. Sometimes an adverbial expression should be used:
He asked me how I was every time I ...... (see) him.
They ...... (live) in Europe for years when I first met them.
In constructing the items, use sentences that learners might
encounter or wish to use.

SAQ 5

Why is the following item wrong? Revise it.
American colonists were given the same rights as other Englishmen by:
a. local governors
b. 1542
c. charters
d. Parliament

Rewrite the above item. Compare it to the one in the Answers to SAQs section
at the end of the unit.

The Stem
The stem is the first part of a multiple choice item. The task has
to be clear and concise. No irrelevant details should be included.
After reading the stem, the testee should be able to identify what
exactly the requirements are. The wording should be very clear.
Options such as "all of these" or "none of these" should be avoided. Be
careful with negative questions.

The stem may take the following forms:

a. an incomplete statement

Example: The Romanian Constitution gives Parliament ...... to pass laws.
A. the power
B. has the power
C. the power is
D. of the power

b. a complete statement

Example: People think of salt "mostly" as a seasoning for food, but this use
accounts for less than five percent of the world's salt production.
A. exclusively
B. nutritionally
C. mainly
D. necessarily

c. a passage

Example: What does the passage mainly discuss?
A. A new type of telescope.
B. Ancient and modern attitudes towards stars.
C. A system of star classification.
D. Progress in identifying new stars.
The primary purpose of the stem is to present the problem
clearly and concisely. The stem contains words/ phrases which
would otherwise have to be repeated.

Example: The word "spaceman" is used in the passage to refer to a:
A. traveller in space
B. traveller in a balloon
C. traveller in a boat
D. traveller in an ocean

The repeated word "traveller" should be part of the stem:
The word "spaceman" is used in the passage to refer to a traveller in:
A. space
B. a balloon
C. a boat
D. the ocean

The same principle applies to grammar tests:

Example: The item:
I enjoy ...... children playing football in the park.
A. looking for
B. looking to
C. looking about
D. looking at
E. looking on
may be rewritten as:
I enjoy looking ...... the children.
A. for
B. to
C. about
D. at
E. on

SAQ 6

Analyze the following badly-constructed multiple choice items:
1. What was the ring made of?
A. it was made of gold
B. it was made of iron
C. it was made of cotton and rope
D. it was made of light wood
2. Puts his own desires first:
A. egoist
B. egotist
C. altruist

Rewrite the above items. Compare them to those in the Answers to SAQs section
at the end of the unit.


The Correct Option

Usually one correct or best option is recommended. More correct
options confuse students. The correct option should be approximately the
same length as the distractors. A correct option that is longer than the
distractors may give the answer away. Rules for writing options:
keep answer options short and concise (not longer than 15 words)
options should be approximately equal in length
make all options parallel in grammatical structure and general
appearance
Each option should follow logically and grammatically from the
stem. The correct option must not be grammatically different from
all the other options.
Do not begin with "none of the above" or "all of these"
Check the options for careless clues
Be sure that the distractors are clearly incorrect, although plausible
If you use options which form a pair, for example by stating the
opposite of each other, make sure that the remaining two options
also form a pair.
Words or phrases from the stem should not be repeated in the options
You may make a question more difficult by creating options which
are very similar to each other

NOTE

Do not write long stems (not more than 50 words, sentences not
longer than 15 words)
Avoid the use of conditionals
Be careful to choose between "which is the best answer" and "which is
the correct answer" (do not forget that multiple choice questions
may have more than one answer)
A good testee should be able to give an answer without seeing the options
A stem should not include general instructions
Include as much of the problem as possible in the stem, so that the
options can be kept short


Example: John is a "dutiful" son.
A. stern
B. kind
C. very respectful and obedient
D. lawful

Option C gives the answer away because it is much longer than the distractors. The item should be written in the following way:

John is a "dutiful" son.
A. stern
B. kind
C. obedient
D. lawful
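
The length rules above (options of no more than 15 words, stems of no more than 50 words, a key no longer than the distractors) lend themselves to a rough automatic check. The sketch below is an illustration only: the function name and warning texts are hypothetical, word count is a crude stand-in for length, and such a check cannot judge plausibility or grammar.

```python
def check_item(stem, options, key_index, max_option_words=15, max_stem_words=50):
    """Return a list of warnings about the length of a multiple choice item.
    key_index is the position of the correct option in the options list."""
    warnings = []
    lengths = [len(option.split()) for option in options]
    if len(stem.split()) > max_stem_words:
        warnings.append("stem is too long")
    if any(n > max_option_words for n in lengths):
        warnings.append("an option is too long")
    distractor_lengths = [n for i, n in enumerate(lengths) if i != key_index]
    if lengths[key_index] > max(distractor_lengths):
        warnings.append("key is longer than every distractor")
    return warnings


stem = "John is a dutiful son."
first_version = ["stern", "kind", "very respectful and obedient", "lawful"]
revised_version = ["stern", "kind", "obedient", "lawful"]

print(check_item(stem, first_version, key_index=2))
print(check_item(stem, revised_version, key_index=2))
```

The first version of the item is flagged because its key is the only multi-word option; the revised version passes.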
The Distracters or Foils

Each distracter should:
be reasonably attractive and plausible to any testee
be grammatically correct
be constructed in such a way that students obtain the correct
option by direct selection rather than by elimination of obviously
incorrect options
be at the level being tested
not be absurd

Sources for plausible distracters:
learners' mistakes
previous answers in tests
the teacher's experience
contrastive analysis between the native language (Language 1) and the foreign
language (Language 2)



Test Instructions
It is extremely important for the students to understand
the question format clearly. If they do not, we fail to
measure what we intend to measure, i.e. the instructional objectives.
Instructions may be oral, although a combination of written and
oral instructions is probably desirable. The instructions should be
clear, concise and explicit, and should be accompanied by
examples of each type of item. Be extremely careful when an item
type occurs for the first time. Encourage the students to ask
questions.
Test Layout
The layout influences the speed and accuracy of the
examinee. Hints:
use all the space available without hindering readability
make it easy for the examinee to keep track of his/her place in the
examination
a two-column page may be the best layout for multiple choice or
true/false items
if you use various item types in the same examination, group
together the items of the same type: true/false items, multiple choice,
completion. In this way, you reduce the number of shifts in mental
orientation
do not use more than two or three item types in a one-hour
examination
in order to reduce test anxiety, arrange test items in order from the
easiest to the most difficult
order the items on the basis of their content
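
The grouping and ordering hints above amount to a two-level sort: first by item type, then by difficulty within each type. The sketch below is an illustration only. The Item class, the difficulty field (a hypothetical estimate between 0 and 1, for instance the proportion of pupils who got a similar item wrong in the past) and the arrange function are assumptions, not something prescribed by this unit.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    item_type: str     # e.g. "true/false", "multiple choice", "completion"
    difficulty: float  # hypothetical estimate: 0.0 = easiest, 1.0 = hardest


def arrange(items, type_order):
    """Group items by type, then order each group from easiest to hardest."""
    rank = {name: position for position, name in enumerate(type_order)}
    return sorted(items, key=lambda item: (rank[item.item_type], item.difficulty))


draft = [
    Item("What ___ your father do?", "multiple choice", 0.6),
    Item("A cloze test is an integrative test.", "true/false", 0.4),
    Item("She ___ eating breakfast.", "multiple choice", 0.2),
    Item("A speed test has very difficult items.", "true/false", 0.1),
]

for item in arrange(draft, ["true/false", "multiple choice"]):
    print(item.item_type, item.difficulty, item.text)
```

Keeping the type order as an explicit list lets the test writer decide which item type opens the paper, while the difficulty estimate settles the order inside each group.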

Readability is increased if
each item is completed in the column and on the page in which it is
started
reference materials (paragraphs, graphs) occur on the same
page as the item
the items that refer to the same reference material are placed
on the same page, separated from other unrelated items by dotted
lines

SAQ 7

Analyze the following distracters:
1. Intimating oneself in another's good graces.
A. extenuating
B. ingratiating
C. superseding



Circle A, B or C. Compare your answer to that in the Answers to SAQs section at
the end of the unit.

if you use Arabic numbers for the items, use letters to differentiate the
alternatives
the labels on the test should correspond with the labels on the answer
sheet
Formats
Writing Multiple-Choice Tests
A typical multiple-choice test will look like the following:

1. Write the correct option in full in the blank space:
The practice of making excellent films based on rather obscure
novels has been going on so long in the United States ______
constitute a tradition.
a. being
b. as to
c. so that
d. could

2. Write only the letter of the correct option in the box.
She ______ eating breakfast.
A. is
B. has
C. does
D. no extra word

3. Why ______ that dog following us?
A. is
B. has
C. does
D. no extra word

4. Underline the correct option:
What ______ your father do?
A. is
B. has
C. does
D. no extra word

5. Put a circle round the letter at the side of the correct option
As a result of ______ in physics and chemistry, scientists have been
able to make important discoveries in biology and medicine.
A. there is more knowledge
B. what is now known
C. knowing now that
D. known now
The correct option should appear in each position (e.g. A, B, C,
D, E) approximately the same number of times in a test, or the
options may be placed in alphabetical order of the first word in each
option. However, figures and dates should be kept in numerical or
chronological order.

William Shakespeare was born in
A. 1564
B. 1592
C. 1603
D. 1616
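
The two rules above (spread the correct option evenly over the positions, but keep figures and dates in order) can be sketched in a few lines of Python. This is only one possible way of doing it: the function names balanced_key and sort_options are hypothetical, and the sketch assumes that every option is a non-empty string.

```python
import random


def balanced_key(n_items, letters="ABCD", seed=None):
    """Return a shuffled answer key in which each letter occurs about equally often."""
    rng = random.Random(seed)
    repeats = -(-n_items // len(letters))  # ceiling division
    positions = (list(letters) * repeats)[:n_items]
    rng.shuffle(positions)
    return positions


def sort_options(options):
    """Figures in ascending order; other options alphabetically by first word."""
    if all(option.isdigit() for option in options):
        return sorted(options, key=int)
    return sorted(options, key=lambda option: option.split()[0].lower())


print(balanced_key(20, seed=1))
print(sort_options(["1603", "1564", "1616", "1592"]))
print(sort_options(["stern", "kind", "obedient", "lawful"]))
```

For items with dates or figures the position of the correct option is fixed by the ordering, so such items simply fall outside the balanced key.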

Circle in the margin the letter corresponding to the correct form to
complete the following sentences:
A. is
B. has
C. does
D. no extra word
1. She ______ having lunch. A B C D
2. What ______ that phrase mean? A B C D
3. ______ he driven that car before? A B C D
4. What ______ your mother do? A B C D
5. John ______ never seen snow. A B C D
6. Who ______ always knows the answer? A B C D
7. Why ______ that dog following us? A B C D
8. Jenny ______ usually eat lunch at school. A B C D
9. When ______ the bus come? A B C D
10. What ______ your mother making? A B C D

Read and circle in the margin the letter corresponding to the tense
and aspect of the verb that you would use to fill in the sentence:
A. simple past
B. past continuous
C. present perfect
D. past perfect
1. He discovered that he (lose) his money A B C D
2. They came just to (get) breakfast. A B C D
3. He asked me how I was every time (see). A B C D
4. The cake would have been better if it (stay) in the oven longer. A B C D
5. While we (wait) for the train, we heard a terrible noise. A B C D
6. If she (understand) the case, she would explain it to us. A B C D
7. They (live) in France six months when I first met them. A B C D
8. That's the best play this year. A B C D

Circle in the margin the letter corresponding to the most appropriate
preposition:

A. back; B. along; C. through; D. out; E. off; F. up

1. Was he rude? Yes, he told me to get ______. A B C D E F
2. If you lend him money, you'll never get it ______. A B C D E F
3. Although we were relatives, we didn't get ______. A B C D E F
4. Be sure to get ______ the tram at the third stop. A B C D E F
5. I liked the first part of the movie, but I can't get ______ the second. A B C D E F

Circle in the margin the letter corresponding to the word which
correctly completes the sentence:
1. The sister of your father or mother is your ______ A B C D
A. great aunt
B. uncle
C. stepsister
D. aunt
2. The son of your sister is your ______ A B C D
A. nephew
B. cousin
C. niece
D. godson
3. The mother of your father is your ______ A B C D
A. stepmother
B. grandmother
C. godmother
D. mother-in-law

Cloze Tests

Cloze is a testing technique whereby a complete text is gapped
after a few sentences of introduction. Learners try to fill each gap
with a word that fits the context. Marking can accept either the exact
word only, which is more reliable, or any equivalent word.
Cloze is often used as a test of reading comprehension, though
there are questions as to what reading skills it reveals. The term was
coined in 1953 from the gestalt notion of closure, referring to the
human tendency to complete a pattern once it has been grasped.

Cloze tests come in a variety of formats:
A fixed ratio deletion method, which deletes
every nth word (usually every sixth or seventh word) regardless of
what that word may be
A rational deletion method, which deletes words that meet certain
grammatical or discourse criteria
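
Fixed ratio deletion is mechanical enough to be sketched as a short program. The Python function below is an illustration under stated assumptions: the name fixed_ratio_cloze, the intro_words parameter (the number of words left intact as a lead-in) and the choice to keep punctuation outside the blank are not taken from this unit.

```python
import re


def fixed_ratio_cloze(text, n=7, intro_words=0, blank="______"):
    """Delete every n-th word after an untouched lead-in of intro_words words.

    Returns the gapped text and the list of deleted words (the exact-word key).
    """
    gapped = []
    answers = []
    for index, token in enumerate(text.split()):
        position = index - intro_words + 1  # 1-based count after the lead-in
        if position > 0 and position % n == 0:
            # Keep leading and trailing punctuation outside the blank.
            lead, word, trail = re.match(r"^(\W*)(.*?)(\W*)$", token).groups()
            answers.append(word)
            gapped.append(f"{lead}{blank}{trail}")
        else:
            gapped.append(token)
    return " ".join(gapped), answers


passage = ("After the trip I will return to my country, where my family "
           "is expecting me to take over the family business.")
test_text, key = fixed_ratio_cloze(passage, n=6)
print(test_text)
print(key)
```

Rational deletion cannot be automated this simply, because the choice of words depends on the grammatical or discourse criteria the test writer has in mind.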

Scoring can vary. The testee may be required to supply:
The exact word that was deleted (efficient for large-scale testing; it
can also be adapted to a multiple choice format for easier
scoring mechanisms)
An acceptable word which makes sense
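
The difference between the two scoring methods can be made concrete with a small sketch. The function score_cloze and the lists of acceptable alternatives below are hypothetical; in practice the alternatives would be agreed by the teachers marking the test.

```python
def score_cloze(responses, exact_key, acceptable=None):
    """Return (exact_word_score, acceptable_word_score) for one testee."""
    if acceptable is None:
        acceptable = [set() for _ in exact_key]
    exact_score = 0
    acceptable_score = 0
    for response, exact, alternatives in zip(responses, exact_key, acceptable):
        answer = response.strip().lower()
        if answer == exact.lower():
            # The deleted word itself counts under both methods.
            exact_score += 1
            acceptable_score += 1
        elif answer in {alternative.lower() for alternative in alternatives}:
            # A word that makes sense counts only under the second method.
            acceptable_score += 1
    return exact_score, acceptable_score


exact_key = ["return", "family", "over"]
alternatives = [{"go"}, {"relatives", "friends"}, set()]  # hypothetical lists
print(score_cloze(["return", "relatives", "up"], exact_key, alternatives))
```

Here the testee scores 1 out of 3 under exact-word scoring and 2 out of 3 under acceptable-word scoring, which shows why the second method is more lenient but requires agreement on what counts as acceptable.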

Both types are valid as long as there are 100 blanks (deletions).
The cloze test is considered an integrative test because it
requires knowledge of vocabulary, grammatical structure, discourse
structure, reading skills and strategies. It also demonstrates that the
testee has an internalized expectancy grammar i.e. he/she is able
to predict what item comes next in a sequence.


SAQ 8

Starting from the following paragraph, construct as many cloze tests as
possible using fixed ratio deletion and rational deletion.

After the trip I will return to my country, where my family is expecting
me to take over the family business. I am not anxious to do so since I
am not interested in that business, and while it is profitable, it is not
personally rewarding to me. Quite frankly, my real hope is to develop a
strong leadership within the company itself, hopefully with the
involvement of my younger brother, who says he really wants to play a
vital role. I think it is realistic to expect the company to prosper in this
way, leaving me free to pursue my own interests.

Compare your solutions to those provided in the Answers to SAQs section at the
end of the unit.

4.2.3 Function of Approach to Test Construction

4.2.3.1 Direct Tests
Direct tests rate language use in real/authentic communication, i.e.
they test language performance directly, e.g. an interview, a
contextualized vocabulary test.
4.2.3.2 Indirect Tests
Indirect tests measure language proficiency indirectly, i.e. they
are less valid for measuring language proficiency, e.g. multiple choice
recognition tests, a synonym matching test. However, the value of a
test should be decided on the basis of other criteria in addition to
whether it is direct or indirect, e.g. cost efficiency.

4.2.4 Function of the number of elements tested at a time, a distinction is made
between

4.2.4.1 Discrete Point Tests
Discrete point tests are designed to measure knowledge or
performance in very restricted areas of the foreign language, e.g. a test
of the ability to use the perfect tenses in English correctly or to supply
correct prepositions in a cloze passage. Discrete point tests are based on the
theory that language consists of different parts (e.g. grammar,
phonology, vocabulary) and different skills (listening, speaking, reading,
and writing) and that these are made up of elements that can be tested
separately. The meaning of discrete is separate and distinct from each
other. Tests consisting of multiple choice items are usually discrete
point tests. Discrete point items operate at the sentence level. This
shortcoming is aggravated by the lack of context.

4.2.4.2 Integrative Tests
An integrative test is one that tests several language skills at
the same time (e.g. a dictation test requires the learner to use
knowledge of grammar, vocabulary, listening comprehension).
Integrative tests have in view a greater variety of language abilities.
They have a greater value in measuring overall language proficiency,
e.g. random cloze, dictation, oral interviews, and oral imitation tasks.
Dictations and cloze tests can combine to form partial dictations or
oral cloze tests. Dictations are considered integrative tests because
they require: careful listening, reproduction in writing of what is
heard, efficient short term memory.
Test batteries solve the conflict between the two kinds of tests
as they comprise discrete point subtests for diagnostic purposes and
integrative tests. The test battery provides a total score that is
considered to reflect overall language proficiency.

4.2.5 Speed Tests vs. Power Tests
A speed test is one in which the items are so easy that every
testee might be expected to get every item correct, given enough time. But
in a speed test enough time is not provided, i.e. testees are compared
on their speed of performance rather than on knowledge alone.
A power test allows sufficient time for every person to finish but
the items are so difficult that very few testees are expected to get
every item correct.



4.2.6 Other Test Categories
Examinations vs. quizzes
Questionnaires
Rating schedules
Single stage and multiple stage tests
Language skills tests
Language feature tests (verb tense/ aspect/ voice; subject/ verb
agreement; modifiers, comparatives, superlatives, relativization,
embedding)
Memory span tests
Sentence completion tests
Word association tests etc

4.3 Self Assessment
Self-assessment is carried out by students themselves. There is a
danger in students relying solely on their teacher for the evaluation of their
performance. If they are never trusted to evaluate their own experience
they will not acquire the habits and skills of reflecting on their own
performance. The aim is to produce a student with the confidence and skill
to reflect and evaluate independently of the teacher, to become a reflective
practitioner. Ask questions: What were the main difficulties? Ask students
to draw up an assessment checklist which will aid reflection.

SAQ 9

An integrative test is one that measures knowledge of a variety of
language features, modes, or skills simultaneously. An example would
be dictation. What does it measure simultaneously?



Write your answers in the space provided above (in no more than 60
words) and compare them to those in the Answers to SAQs
section at the end of the unit.


Point to Ponder

Research shows that students are generally quite accurate in their
self-scoring.


Students are often harsher in evaluating their own performance
than the teacher would be. Provide self-assessment questions.
Provide students with model answers after they have completed the
worksheet. The students can then use these to mark their own or
each other's work. Self-assessment can be:
Low heat: self-worked tests, quizzes
High heat: presentations or exams

Example: The Stanford-Binet test is an example of a reliable intelligence test.
It is mainly used for children and, among others, for the diagnosis of
academic achievement. It measures:
Attention (absorbed by task versus easily distracted)
Reactions during test performance:
- normal activity level versus abnormal activity level
- initiates activity versus waits to be told
- quick to respond versus urging needed
emotional independence
- socially confident versus unsure
- realistically self-confident versus distrusts own ability
- comfortable in adult company versus ill-at-ease
- assured versus anxious
problem solving behavior
- persistent versus gives up easily
- reacts to failure realistically versus reacts to failure
unrealistically
- eager to continue versus seeks to terminate
- challenged by hard tasks versus prefers only easy tasks
independence of examiner support
- needs minimum of encouragement versus needs constant
praise and encouragement
expressive language
- excellent articulation versus very poor articulation
receptive language
- excellent sound discrimination versus very poor discrimination
establishing rapport
- easy versus difficult

The test also takes into account:
verbal reasoning: vocabulary, comprehension, verbal relations
abstract/ visual reasoning: pattern analysis, copying
quantitative reasoning
short-term memory
Point to Ponder
The ultimate goal of the educational system is to shift to the
individual the burden of pursuing his own education.
J.W. Gardner

Self-assessment is a component of learner-centred education
or student autonomy. It underpins the individualization of instruction,
the development of patterns of self-directed learning and of the
methodology of self-access, as well as implying some degrees of
learner training.
It follows that autonomy refers to the learner's capacity to take charge
of both the strategy and content of learning. The psychological
argument in favour of autonomy suggests that learning is more efficient
and motivating to the degree that it matches a learner's own style and
strategies. An autonomous learner can identify what has been taught and is
able to formulate his/her own learning objectives. Autonomous students
select and implement appropriate strategies. They can monitor these for
themselves and finally they know how to give up strategies that are not
working for them. So-called self-directed learning may also involve
other issues: syllabus, negotiation, the role of autonomy in whole-class
instruction.
In this context the learners have to assume a degree of
responsibility for assessing the progress of their own
learning. The concept of the trainability of autonomous learning skills,
and implicitly of self-assessment, is currently under discussion.
An important role is also played by attitudes to language learning, which
range from anxiety about the language and the situation, through
attitudes to speakers of the L2, the country in which it is spoken, the
classroom, the teacher and other learners, to views on the nature of
language learning, particular elements in learning activities, tests and grades.
Attitudinal information has a place in language teaching in two
areas: preparing the student to learn, which may involve both the
discovery of the student's own underlying attitudes and a process of
attitude change; and establishing preferences for particular kinds of
learning activities. The effectiveness of all learning depends on the learner's
ability to judge when her/his performance (comprehension and
production) is adequate for the situation in which she/he is operating.
The effective learner is one who can judge when his/her proficiency
is adequate for the purpose. The learner who is satisfied with
inadequate performance will tend to allow the language to fossilize.
Conversely, the learner who strives for perfection will limit his/her
progress in range and quality. Judging the adequacy of one's
performance is after all a matter of self-assessment.

Point to Ponder
It does not seem reasonable to impose freedom on anyone who
does not desire it.
Carl Rogers, Freedom to Learn



Self-monitoring versus self-assessment
Although both of them refer to judging one's performance, the
difference between them is one of scale and timing. Self-monitoring
refers to small stretches of language, self-assessment to large
stretches of language. It is also true that self-assessment depends upon
and includes self-monitoring. The new trends in teaching and learning
make teachers responsible for explicitly helping the learner to become
more proficient in self-assessment, so that he/she can become better
and better at judging his/her performance. Although most assessment is
explicitly summative, much everyday classroom assessment is
concerned with learning in progress, i.e. it is formative assessment. If I
hadn't separated formal and informal testing, I would have considered
self-assessment as an integral part of formative assessment.
The ability to judge when performance is at an adequate level
for the situation in which one is operating includes:

establishing appropriate criteria or standards, which may be overt or
covert; they may also be informal, e.g. what a particular phrase
sounds like or a sense that the phrase feels right or wrong
judging the situation to decide what the minimum acceptable standard is

Point to Ponder
Don't assess key or common skills, such as problem-solving or
working with others, without teaching these skills.
Geoffrey Petty

Self-assessment may involve:

the motivation to undertake it
the willingness to reject performance that is inadequate against some
internal standard, either established by oneself or learned
the ability to measure one's own performance against the standards
the confidence to make these assessments
the recognition that one's ability to judge is limited

Self-monitoring and self-assessment, although part of the
process of learning, can be developed i.e. the learner may be helped
to improve his self-assessment techniques. This process begins with
the awareness activity that aims at persuading the learner that self-
assessment is a useful activity.

SELF-EVALUATION FORM
What were your objectives for this term?
What did you achieve?
What do you feel you have been responsible for this year?
Did you live up to this responsibility? Why/ Why not?
Give examples of good activities.
Give examples of good materials.
Which type of evaluation have you made use of? Diaries?
Whole-class talks? Talks with the teacher?
How do you judge the usefulness of the various types of
evaluation? Good? Bad? In between? Don't know? Why?
Positive things about your teacher.
Negative things about your teacher.
Ideas for next term.


Techniques of self-assessment
You can involve your learners in self-assessment if you ask
them to write reports about their English and give them to you, if you
learn their problems from their diaries, if you involve them in rating
their skills in English, if they monitor their language when they edit
their essays, when they use your correction codes, if you ask them to
grade their mistakes. They may also list difficulties (pronunciation
problems), favorite activities, organize group and class surveys to
find out about their learning preferences and problems. You may
begin by distributing a questionnaire to learn how they feel about
their English e.g.

1. Learning English is (difficult, easy, very difficult)
2. Which of these areas of English are easiest for you? Rank them
starting from the easiest area. (speaking, listening, writing, reading,
grammar, vocabulary)
3. Give yourself a mark
4. Is English useful?
5. Do you try to speak English with your classmates?

You may also gather information with the help of the following
questions:
1. Why do you learn English?
2. How do you learn English?
3. How do you tackle an unknown text?
4. What is it to learn a word?


Point to Ponder
You learn from your own mistakes only when you think about them.
Michael Harris and Paul McCann

Self-assessment
Implies knowledge about language (language awareness)
Implies teachers' doubts (unreliability); students usually give
themselves lower marks than they deserve
Should not involve marking but thinking about performance and
progress
Must be integrated with other classroom activities
Can take a lot of time
Useful tips for grading
For some homework, provide keys that the students use to score
their own work
Require students to keep a page in their notebook on which they
record each test or quiz grade when they receive it. That way they
always have a record of their own
Teach students to edit and revise their papers before turning them
in
You'll manage your time more efficiently if the assignments are
spaced
Use a computer grade keeping system
If the teacher does not have time to correct every student
assignment, allow students to swap and grade papers. Spot check
to prevent cheating


Points to Ponder
Independence, maturity, and self-reliance are all facilitated when
self-criticism and self-evaluation are basic and evaluation by
others is of secondary importance. (Carl Rogers)
Effective teachers let students know they are somebody, not
some body. (William Purkey)
Nothing is so fatiguing as the eternal hanging on of an
uncompleted task. (William James)

4.4 Standardized Tests

Tests and examinations of proficiency are designed by many
organizations in many English speaking countries. Examinations are
generally closed and restricted to particular educational systems.
Tests are open and available at all levels. Some tests are restricted
to limited proficiency levels, others claim to measure across the
range of all levels. Some tests concentrate on one skill (e.g. oral skills);
others test all four skills.

Great Britain: University of Cambridge Local Examinations Syndicate (UCLES) Tests
1. Diploma in English Studies (DES);
2. Certificate of Proficiency in English (CPE);
3. First Certificate in English (FCE); the most frequently taken
language test in the world;
4. Preliminary English Test (PET);
5. Key English Test (KET)

The English Speaking Union Framework's nine-point scale is
approximately equivalent to the nine-band descriptors used by the
International English Language Testing System (IELTS):

9. Expert user;
8. Very good user;
7. Good user;
6. Competent user;
5. Modest user;
4. Limited user;
3. Extremely limited user;
2. Intermittent user;
1. Non user.

USA Tests
Proficiency tests, the so-called ACTFL Proficiency Guidelines, have
been developed in collaboration by the American Council on the Teaching of
Foreign Languages, the Educational Testing Service and the
Federal Interagency Language Roundtable.
There are four levels (in fact seven if a more refined approach
is used):
Novice; Novice High
Intermediate; Intermediate High;
Advanced; Advanced High;
Superior.
General English
The Cambridge range already mentioned;
The Certificate in Communicative Skills in English (Cambridge
CCSE);
The Association of Recognized English Language Schools (ARELS);
The Oxford Delegacy of Local Examinations (ODLE);
The English Speaking Board;
The Institute of Linguistics;
Pitman's Examination Institute;
Trinity College.
Placement test
Nelson Quick Check
Oxford Placement Test
Study English Tests
IELTS;
TOEFL;
CENRA;
Northern Examination and Assessment Board;
Pitman;
Cambridge CPE;
Certificate in Advanced English (CAE);
Michigan English Language Battery;
University of London;
Business English Tests
London Chamber of Commerce and Industry (LCCI);
Oxford International Business English Certificate;
Pitman's English for Business;
Cambridge's Certificate in English for International Business and
Trade;
Educational Testing Service (USA);
Test of English for International Communication (TOEIC).

Tourism
Oxford's Tourism Proficiency;
LCCI.
Teaching English
Cambridge Examination in English for Language Teachers
(CEELT)
Young learners
ARELS/ODLE : Junior Counterpart (ages 12-17)
Associated Examination Board: English as an acquired language
(ages 7-12);
Pitman (ages 9-13).
Special purpose testing
Testing language required for specific purposes is an aspect of
English for Specific Purposes (ESP). Where individuals or small
groups are concerned, there may be no need for a standardized test.
Cases of large scale use are the fields of academic English,
business English, and medical English. In these fields, there has
been considerable test development, aimed at accurately assessing
appropriate language skills for relevant activities. Features of this
development are the use of authentic texts, authentic materials from
appropriate situations, communicative activities and group tasks for
assessing spoken language as if in real life situations.


SAQ 10

Self-assessment is extremely important for you. Examine the ways
in which self-assessment can be carried out in a work context. Circle
the forms you can use:
Use a learning contract
Include assessment of a work log
Ask students to produce a reflective journal
Consider using a portfolio
Devise record keeping aids for students
Use technology (e-mail)
Encourage networking

Compare your answers to those in the Answers to SAQs section at the end of
the unit.



SAQ 11

1. Objective-type tests can use either the one correct answer
or the best answer format. Which one would you use? Why
would you use this type over the other?
2. If you were preparing a true-false test would you have more
true than false items? Why/ why not?



Write your answers in the space provided (not more than 60 words) and compare
them to those in the Answers to SAQs section at the end of the unit.

4.5 Summary

The principal ideas, conclusions, and recommendations
presented in this unit are summarized in the following
statements:
1. Objective tests must be written as simply and clearly as
possible so that all examinees will be able to make the same
interpretation of the item's intent
2. Test items should be tailored to fit the age and ability level of
the examinees
3. Technical jargon and excessively difficult vocabulary should
be avoided
4. Irrelevant clues should be avoided
5. Trivial details should be avoided (otherwise, we encourage
rote memory)


4.6 Key Concepts
Cloze test
Criterion referenced test
Diagnostic test
Direct test
Discrete point test
Formative evaluation
Indirect test
Integrative tests
Norm referenced test
Objective test
Objectives referenced test
Power test
Speed test
Standardized test
Subjective test
Summative evaluation
4.7 Checklist
Do you set the most able learners the highest targets?
Are your learners kept informed of how their attainment compares
with the needs of the course?
Do students evaluate their own performance?
Is this evaluation checked by the teacher?
Are successes and failures related to theory and to how to do better
next time?


SAA No. 2

Write a multiple choice test made up of 20 items. Use as
distracters some of the mistakes made by your pupils.

Please note that a correct item will count for 5 points.

Do not forget to send your multiple choice test to your tutor.


4.8 Answers to SAQs

SAQ 1 This SAQ 1 is meant to activate your schemata.

Marking an essay

SAQ 2 Your answer depends upon your personal teaching / learning
experience.

Language area Objective Subjective
pronunciation X
vocabulary X
grammar X
discourse X
listening X
speaking X
reading X
Writing X

SAQ 3 If your answer to SAQ 3 is not comparable to the one suggested
below, please reread sections 4.2.4.1 and 4.2.4.2 again.


Techniques Discrete point techniques Integrative techniques
Transformation X
Fill in the blanks X
Blank and cue X
Joining elements X
Replacing elements X
Adding elements X
Arranging elements X
Matching elements X
True/ false X
Multiple choice X
Cloze X
Dictation X
Information transfer X

SAQ 4 If your answer to SAQ 4 is not comparable to the one suggested
below, please reread section 4.2.1.2 again.

The test is reliable. The fact that the score is the same points to its
objectivity.

SAQ 5 If your answer to SAQ 5 is not comparable to the one suggested
below, please reread section 4.2.2.1 again.

Item B may also be considered an acceptable answer. In this case
both B and D are acceptable answers.
The revised form is:
Who gave the American Colonists the same rights as the
Englishmen?
A. The King
B. Local governors
C. Colonial legislatures
D. Parliament

SAQ 6 If your answer to SAQ 6 is not comparable to the one suggested
below, please reread section 4.2.2.1 again.

The stem contains words repeated in each option.
What was the ring made of?
A. gold
B. iron
C. cotton
D. wood

SAQ 7 If your answer to SAQ 7 is not comparable to the one suggested
below, please reread section 4.2.2.1 again.

The distracters are too difficult. They may distract even the good
student. This tends to happen in vocabulary test items.

SAQ 8 If your answer to SAQ 8 is not comparable to the one suggested
below, please reread section 4.2.2 again (the paragraph on cloze
tests.)

1. After the trip I will ______ to my country, where my ______ is expecting me to
take ______ the family business. I am ______ anxious to do so since ______ am not
interested in that ______, and while it is profitable, ______ is not personally
rewarding to ______.
2. After the trip I will r..n to my country, where my f..y is expecting
me to take o..r the family business.
3. After the trip I will return/ visit/ go again to my country, where my
family/ friends/ relatives is expecting me to take in/ up/ over/ for the
family business.
4. After the trip I ______ to my country, where my family ______ me ______ over the
family business.

SAQ 9 If your answer to SAQ 9 is not comparable to the one suggested
below, please reread section 4.2.2.2 again.

Listening comprehension, spelling, or general language proficiency

SAQ 10 Your answer depends on your teaching / learning experience.

Personal choice.

SAQ 11 If your answer to SAQ 11 is not comparable to the one suggested
below, please reread sections 3.3.1.1 and 4.2.1.2 again.

1. The correct answer variety should be used insomuch as it is
difficult to obtain agreement, even among experts, on what is the
best answer.
2. By having approximately an equal number of true and false
statements in the test, we limit the influence of response set on the
validity of the test score. But having exactly the same number of
true-false statements could be a clue to the test-wise student. It is
much better to have more false than true statements since there is
evidence that false statements tend to be more discriminating.


4.9 Further Readings

Harrison, Andrew (1983), A Language Testing Handbook, London: Macmillan, pp 4-10
Hughes, Arthur (1991), Testing for Language Teachers, Cambridge: Cambridge University
Press, pp 152-155


Unit 5

TESTING THE LANGUAGE SKILLS I


5.1 Unit Objectives
5.2 Testing Speaking
5.2.1 What Is Speaking?
5.2.2 Types of Speaking Based on Content and Function
5.2.3 Objectives
5.2.4 Types of Speaking Tests
5.3 Testing Listening
5.3.1 How Do We Comprehend?
5.3.2 Micro Skills
5.3.3 Informal Evaluation
5.3.4 Scoring the Listening Test
5.4 Summary
5.5 Key Concepts
5.6 Checklist
5.7 Answers to SAQs
5.8 Further Readings


5.1 Unit Objectives

Today language teaching is communicative, interactive, and
integrated. In spite of this, we have to find ways of simplifying things
so that learners can listen and talk about real things in real ways. The
next two units examine the approaches employed in order to test the
four basic language skills.
This unit attempts to provide the most obvious, natural, and
effective general strategies of testing speaking and listening. The aim
of this unit is to familiarize you with the principal techniques of testing
and marking the skills of speaking and listening.
By the end of this unit you should be able to:
- construct your own speaking and listening tests
- evaluate the speaking and listening tests from the textbooks you
  currently use
- construct and administer such tests competently

5.2 Testing Speaking

Speech is probably the most socially visible skill, the one that
allows the greatest amount of interaction, the one that will most
quickly lead people to conclude that you are proficient.
However, we should not forget that speaking:
- probably comes from listening
- is less important than reading or listening for many who are
  studying in academic programs in the target language
- in rare cases is less important than writing
- cannot be detached from the overall context or from the
  extra-linguistic components such as conversational rules, gestures,
  cultural information, social status, level of formality, gender and
  age differences, and so on.

5.2.1 What Is Speaking?
The range of things people do when speaking is so broad
that saying only that we are teaching or testing speaking is almost
meaningless. Oral production varies by:
- content
- function (purpose)
- emotional and social context
- processing capabilities (proficiency)

5.2.2 Types of Speaking Based on Content and Function
- short exchanges for information or courtesy: we talk to officials,
  clerks, police officers, and classmates to get the information we
  need; we greet and are pleasant to people we meet or deal with
  (our relatives, the postman)
- longer impersonal exchanges: job interviews, business transactions,
  problem solving, seminars or group work. These are likely to have a
  formal or semiformal register.
- speaking directed at others in structured contexts, such as class
  presentations, lectures, news broadcasts. The language tends to be
  at the formal end of the register.
- conversations over dinner, gossip, friendly discussions, and so on.
  These are at the informal end of the register.
- aesthetic, ritualistic and entertaining speech: what we do with
  others (reciting a pledge, saying a prayer, or singing) or address to
  others with the purpose of eliciting aesthetic responses, as with
  plays and poetry readings
- classroom discourse

Types of speaking based on emotional and social context
and on processing variables
- speech can be objective, cold, and impersonal, or it can be intimate,
  relaxed, and informal
- speech can vary in relation to:
  - relative status, age, or gender
  - in-group or out-group affiliation

The conclusions about types of speaking are important for
testing speaking. The variety of types points to the complexity of the
task of testing speaking. Many different types of speech exist. They
are affected by topic, context, relationship between speakers, and
proficiency. Mere vocalization of correct utterances or repetitions of
sentences in a dialog are not really talking at all.
Real talking means saying something important to someone
under certain conditions and for a certain purpose.
The linguistic form will be among the most important
considerations, but will certainly not be the only one. Normal human
speech is not produced in the sequence grammatical form, selection of
words, application of the phonological rules; it starts from a message
triggered during the course of a conversational interaction. This message
(feeling, idea or fact) is embedded in our schemata (i.e. it is part of our
experience). Because we wish to communicate this message, we
activate the linguistic, discourse, content, event, or strategic schemata of
our listener, by means of which we can convey our message.
Tests of spoken language also test other skills (listening, for
example). There are no pure tests of spoken language because only in
traditional academic lectures or at the theatre is language one-way.
The development of the ability to interact in a foreign language
involves comprehension as well as production. At the earliest stages
of learning a foreign language, formal testing is avoided. Informal
observation provides the necessary diagnostic information.

5.2.3 Objectives
- to set tasks that are representative samples of the oral tasks that
  we expect students to be able to perform
- tasks should elicit behavior which truly represents the candidate's
  ability and which can be scored validly and reliably
- the testing of speaking is the most difficult of all language tests to
  design, administer, and score. Why?
- it is difficult to choose the criteria in evaluating speaking: which is
  more important: grammar, vocabulary, pronunciation, fluency,
  listening comprehension, correct tone (fear, anxiety), reasoning
  ability, or initiative in asking for clarification? How can we evaluate
  each of these criteria properly? Shall we also include among them
  questions of response?

Other difficulties

- The tester has to get students to speak and to evaluate them at
  the same time
- Each testee has to be evaluated individually
- How are we to evaluate learners at the elementary level?
- How are we going to evaluate those testees who want to work in
  professions like teaching, business, translation, or professional oral
  reading by radio announcers?
- Tests of speaking appeared much later than other types of tests.
  The main reason for this is that, for academic purposes, the written
  language was more important than the spoken language. Tests of
  speaking are also subjective, unreliable, time consuming, and
  expensive, and therefore impractical if we take into account that
  the learners have to be tested individually. The exclusion of
  speaking tests from examinations has a poor washback on
  learning and teaching. The spoken test has to have a place in all
  kinds of language examinations.
Situations
Learner speaks to:
assessor,
learner and assessor,
interlocutor and assessor

Three views on the nature of spoken language.
1. The literary view, e.g. public speeches, dramatic monologues: these are
not spontaneous spoken language; they are prewritten or memorized.
2. The linguistic view sees the spoken language as the oral
expression of writing.
3. The communicative view sees language as a spontaneous and
interactive means of developing social relationships.




Point to Ponder
Evaluation can be done surreptitiously, and it can be done
with flags and trumpets; but it must be done, otherwise the teacher
will not know if learning is taking place
(Geoffrey Petty)

5.2.4 Types of Speaking Tests
reading aloud (problem solving working in pairs)
oral interview: the interviewer tends to intimidate the learner or to
dominate the interaction; reduced reliability

The Speaking Tests

- literary tests of spoken language: reciting a poem or speech
- the reading aloud of a prepared passage of prose or poetry
- summarizing or retelling a story or book
- the discussion of other aspects of a piece of literature
- free talk: prepared talk or lectures on given topics
- long turns which the learner has only a limited time to prepare (the
  description of a picture)
The limitations of free speaking tests

limited interaction
lack of authenticity
poor washback effect
doubtful test security (who is the author of the grading and what is
graded is difficult to assess)
limited sample

Linguistic speaking tests have in view the assessment of stress
patterns, intonation, grammatical structure, range of lexical units.
Examples:
an unprepared passage is read aloud (the assessor listens out for
the pronunciation of a few pre-selected words, intonation)
elicitation through questions/direct instruction, e.g. Ask me a
question beginning with ...
Accuracy of function realization is tested by describing a situation
to which the testee must respond, e.g. Situation: You are in a
restaurant. You find a fly in your soup. What would you say to the
waiter?

The limitations:
the inauthentic, non-communicative purpose, no interaction,
restricted method (what is not tested will not be taught), no
washforward effect (no real link with the outside world)
Advantages:
no open answers
easy-to-mark tests



Guided speaking tests provide a task environment. The test
provides details about:
Who is speaking?
To whom?
Why and about what?
The context of situation
Advantages: authenticity (role-play, a real world scenario)

Example: The learner will be given a topic to prepare before entering the room.
Time for preparation: 3 minutes.
Part 1: Presentation: three-minute uninterrupted talk
Assessor: takes notes
Examples of topics:
a. your interests and free time
b. your reasons for studying English
c. your future plans

Part 2: Personalized interview. The assessor asks questions about
the learner's family, education, interests, future plans (5 minutes):
Where do you live? What do you enjoy about living there? How
long have you been studying English? Have you been studying
English in the same class?

SAQ 1

Which of the following characteristics are generally typical of modern
tests of speaking? Underline your choices.
a. integrative - discrete point
b. subjective - objective
c. high reliability - low reliability
d. authentic - contrived
e. direct - indirect
f. good validity - poor face validity


Compare your answers to those in the Answers to SAQs section at the end of the
unit.
Example: Look at the two pictures below. Discuss with your partner.

Point to Ponder

Getting accurate measurements of how skillful students are in
authentic communication becomes very difficult in practice.
Furthermore, indirect measures of correction take a great deal more
skill and planning, though they may be worth the extra effort when
dealing with students who are feeling discouraged or threatened.

Reading Aloud
The testee reads aloud to the assessor a passage of a text, a
dialogue (one of the parts is read by the interviewer), a specialized
technical English text, a descriptive passage, instructions (how to
cook a dish or giving instructions by phone), retelling a story

Variants:
Reading a scripted dialogue with someone else reading the other part
Reading a text with phonetic markers (sounds, words, technical
vocabulary, idiomatic or conversational expressions), speech
factors (assimilation, liaison or contractions), or words or sounds that
are known to cause problems for speakers of a certain language
Reading sentences containing minimal pairs
Spelling aloud (if testees apply for a job in a travel agency or for
orders by phone)
Reading figures, abbreviations, or initials from a table, in different
quantities

SAQ 2

Tests of speaking do not test objectively more than two of the four
components of communicative competence. Circle the correct
competences:

a. linguistic competence
b. discourse competence
c. sociolinguistic competence
d. strategic competence

Compare your choices to those in the Answers to SAQs section at the end of the
unit.

Advantages
The assessor may choose the topic
The same test may be given to all testees (higher reliability)
Simple to administer and quick to score
Correct production of sentence stress and intonation patterns
suggests good comprehension

Disadvantages
The technique is not authentic (we rarely read aloud in real life)
The test is not communicative
Reading aloud is a skill that can be improved in a short time
Only the mechanical skills are tested i.e. pronunciation, intonation

Using pictures, maps, diagrams (pictures provide a realistic context)
Pictures of single objects (for testing the production of phoneme
contrasts)
Pictures of scenes (for description, for narration)

1. The testee is given a picture.
2. The testee studies the picture for a few minutes.
3. Then he/she is required to describe the picture in a given time.
4. The number of words he/she speaks is counted by one examiner.
5. The number of errors is counted.
6. Separate scores are given for fluency, grammar, vocabulary, phonology,
and accuracy of description.
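
The record kept for one picture description could be sketched as follows. This is only an illustration in Python, under stated assumptions: the function name, the use of a written transcript for the word count, and the two derived rates (words per minute, errors per 100 words) are not prescribed by the procedure above, and the rating scale used in the example (1 to 5) is hypothetical.

```python
# The five criteria named in step 6 of the procedure.
CRITERIA = (
    "fluency",
    "grammar",
    "vocabulary",
    "phonology",
    "accuracy of description",
)


def picture_description_record(transcript, error_count, seconds, ratings):
    """Collect the counts and separate scores for one picture description.

    transcript  -- what the testee said, written down by the examiner
    error_count -- number of errors noted (step 5)
    seconds     -- time the testee was given to speak
    ratings     -- dict with one score per criterion in CRITERIA
    """
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError("missing ratings for: " + ", ".join(missing))

    n_words = len(transcript.split())  # step 4: number of words spoken
    if n_words == 0 or seconds <= 0:
        raise ValueError("need a non-empty transcript and a positive time")

    return {
        "words": n_words,
        "errors": error_count,
        # Derived rates are assumptions added for comparison across testees.
        "words_per_minute": n_words * 60 / seconds,
        "errors_per_100_words": 100 * error_count / n_words,
        "ratings": {c: ratings[c] for c in CRITERIA},
    }


if __name__ == "__main__":
    record = picture_description_record(
        "a man is walking his dog in the park",
        error_count=2,
        seconds=30,
        ratings={c: 3 for c in CRITERIA},  # hypothetical 1-5 scale
    )
    print(record)
```

Keeping the separate ratings alongside the raw counts preserves the distinction the procedure makes between counting and judging.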
Other examples:
advertisements
pictures for comparison
pictures for instructions
if the picture depicts a story or sequence of events, it is useful to
give the testee one or two sentences as a starter
oral interview
question and answer ( disconnected questions are graded in order
of increasing difficulty; suitable for lower levels; easy to adapt
questions to suit level)

Point to Ponder

Whatever the homework, if it is set it must be seen, marked or
tested by the teacher, otherwise it will be evaded.


Individual Elicitation

Research and specialized testing have devised various ways of
eliciting specific parts of the language. One technique is to use
pictures. The tester talks about part of the picture (for example using
a singular noun, present tense verbs, or question forms). The
learner responds and the tester notes if the expected plural, past
tense verb, or the inflected form of the verb was present.

Interviews
Interviews are a direct, face-to-face exchange between testee and
interviewer.

Advantages
they are structured
the interviewer maintains firm control, keeps the initiative
more authentic
several topics may be raised

Disadvantages
the testee sees the assessor as a superior (the result is only one
style of speech)
many functions are absent
only at intermediate level or below
a candidate may dominate another
Stages

introduction (polite social questions to put the learner at ease)
find level against a specific scale
check questions above and below the established level
several more questions at about the right level
self assessment (oral ability, strength/ weakness)
feedback, tell the learner the result, invite any comment
assessor should not overcorrect errors, fill pauses automatically,
or interrupt unless necessary

Example: Question and answer for a lower-level achievement test. If one
question is not understood, the interviewer can move to another. This
model may be adapted for various levels, simplified or made more
complex.

What's your name? Could you spell it?
How are you?
How well can you speak English?
Do you like speaking English?
What do you do? What's your job?
Tell me a little about your family
Can you count up to twenty?
Can you tell me the time?
What is the date today?
What day of the week is it?
What is the weather like?
Where/ how did you learn to speak English?
Tell me three things you did yesterday.
What were you doing/ where were you at this time yesterday?
Where do you live? How long have you been living there?
Where do you work? Do you like it there? How long have you been
working?
What is your hobby? What do you like doing in your spare time?
What will you do when you leave here today?
What are you going to do for your next holiday?
What are your plans for the future?
Have you been to England/ America?
How many foreign countries have you visited?
When did you go there? How long did you stay?
What did you see/ do?
Did you enjoy your stay/ visit? Why? Why not?
What differences in lifestyle/ transport/ food/ people did you
notice?
Would you like to live/ work/ go back there?
Can you speak any other languages? How well?




Interaction with Peers

Two or more candidates are asked to discuss a topic.

Pictures
When looking at pictures, you can often use the present tense: This
picture shows a ... who seems to be ... In the center of the picture I
can see ...

Role play
The candidates are asked to assume a role in a particular situation.

Interpreting

In part 3 of Paper , you and your partner will have to reach a
decision or work something out using one or more pictures or
diagrams.
Discussion between candidates
Discuss some of the important achievements that have influenced
the world we live in. Which three achievements have offered the
greatest benefit?

Imitation

Candidates have a series of sentences, each of which they have to
repeat in turn. If the sentences are long enough, testees will make
the same mistakes in performing this task as they will when speaking
freely.
Advantages
control in choice of sentences
good for a placement test

SAQ 3

Oral interviews are criticized for at least 3 reasons. Can you identify
them?





Write your answers in the space provided above (in no more than 10 words) and
compare them to those in the Answers to SAQs section at the end of the
unit.




SAQ 4

What should you test in order to encourage oral ability?










Write your answers in the space provided above (in no more than 60 words) and
compare them to those in the Answers to SAQs section at the end of the unit.


5.3 Testing Listening

Listening comprehension may be considered the language skill of the
greatest significance to foreign language teachers because it:
offers the most natural form of input
is a highly valuable skill in its own right
is a prerequisite of meaningful speaking and authentic interaction
In order to identify the techniques of testing listening comprehension
we have to understand:

What?
Types of listening as a function of intonation, rate of delivery, stress, rhythm:
Short, impersonal material: announcements in airports, train
stations, information about office hours, the weather
Longer impersonal material: lectures, reports, presentations
Interpersonal interchanges: conversations, greetings, invitations,
compliments, eavesdropping
Aesthetic and entertainment functions: songs, poems, movies
Instructional functions: orders, dictations, true-false tests, fill-in
exercises

Types of listening based on context and discourse variables:
face to face
remote
live
recorded
with a friend
with a stranger
between social equals
between people different in age, sex, and status
with or without visual or non verbal cues
surrounded by noise or in a quiet background
over the phone. Communication is not less effective. The listener
may have difficulties because there are no non-verbal signals

Types of listening based on production and processing variables.
Speech can be:
slow or fast
formal or informal
full of hesitations, pauses, repetition
polished or casual
linguistically complex or simple
whispered
clearly articulated
native or non-native


SAQ 5

Which is more difficult to understand in a foreign language? A face-
to-face or a telephone conversation? Why?








Write your answers in 20 words in the space provided and compare them to
those in the Answers to SAQs section at the end of the unit.


5.3.1 How Do We Comprehend?
We comprehend because we know a lot about what the
speaker is saying. We have access to the same background, the
same frame of reference, to the same schemata. Generally, we
comprehend by linking up interactively with the speaker on familiar
cognitive and emotional ground.

The main modern approaches to TEFL are rich sources for
testing listening comprehension, e.g. Total Physical Response (the
learner responds to a number of commands) and the Comprehension
Approach (show two pictures and give a command or ask a question,
e.g. point to the picture that ...).

5.3.2 Micro Skills
Micro skills might include:
discriminate among the sounds of English
recognize sound patterns, rhythmic structures, intonation, and their
role in signaling information
recognize forms of words, grammatical word classes, tense and
agreement, patterns, rules, elliptical forms, cohesive devices
recognize communicative functions
infer situations, participants
infer links and connections between events, deduce causes and
effects
detect the main idea, supporting ideas, and new information

5.3.3 Informal Evaluation
You need to know at all times whether your students really
comprehend. Informal evaluation is much easier than you might
imagine. Learn to watch and read your students' faces and gestures
and listen to the overall communicability of their responses:
a stiff look may mean boredom (the task is too easy)
fear, frustration, restlessness, whispering (the input is too difficult)

Correction is part of the evaluation process, but on the whole
listening does not require correction. You can make mistakes in hearing, but
most of these mistakes are self-evident. The recommended materials
and procedures from modern textbooks that aim at developing
listening comprehension are a source for listening comprehension
tests. In principle, listening is an easy skill to measure. In practice,
objective testing requires high quality sound, special testing
materials, and proper facilities.
Listening materials tend to test rather than to teach. A
comparison of teaching materials with explicitly designed testing
materials (e.g. those used for the TOEFL examination, taken by foreign
students who intend to study at a United States university, or the
Cambridge First Certificate intermediate-level examination) shows that
the differences are often slight, i.e. the most important ones involve
the degree of attention paid to the learners' responses: in tests, the
responses really do count.
R. Lund (1990), in Taxonomy for Teaching Second Language
Listening (Foreign Language Annals 23 (1), 105-115), identifies nine
different ways in which we can check listeners' comprehension:

Doing: the testee responds physically to a command
Choosing: the testee selects from alternatives (pictures,
objects, texts)
Transferring: the testee draws a picture of what is heard
Answering: the testee answers questions about a message
Extending: the testee provides an ending to a story heard
Duplication: the testee translates the message into the native
language or repeats it verbally
Modeling: the testee may order, for example, a meal after
listening to a model order
Conversing: the testee engages in a conversation that indicates
appropriate processing of information

The main procedures are:
Teacher talk: the most authentic and useful listening experience
is when the teacher is saying something real and learners are
doing something in response, e.g. Today we are going to talk
about ..., Listen to this story about ..., I am going to talk about
some of the people in your class and you will tell me who I am
talking about.
Total Physical Response. Give the following commands.
Beginning with everyone sitting down, with three pencils on each
desk: Pick up one pencil in your right hand, Point the tip of the
pencil up, and Lift both hands over your desk.
Imaginary movement. Say the following sentences and ask
learners to act them out: You push the door open slowly and
quietly. You put your head through, you look around thoroughly
Non-verbal and short response. Follow the movement on the
map. Begin at ... go out the front door and turn right ... walk along ...
cross the street ... where are you now?
Look at the picture and follow the details. Pick the picture. Talk
about a series of pictures on a cartoon strip, mentioning as many
details as you can that also apply to several other pictures. Then
ask the testee to point to the picture you have referred to.
True or false. Say true or false after you hear the following
sentences: The flight lasted 10 days
Identifying or pointing. Who is wearing ...? Who has three books on
her desk?
Find and color/ Find and cut it out
Definitions e.g. It lives in the ocean, breathes air, is smaller than a
whale, and likes to jump in and out of the water playfully. Answer?
Unnamed biography. Describe a famous person. After you finish,
ask who the person is. The same can be used for places, things,
activities.
Connected discourse. Stories: for all ages at any level. Use
stories of all types. Select those appropriate for the age and
interest level of the class. If you are using a story book that is well
illustrated, you have the added possibility of using the pictures and
general story line.
Dialogues. Take both sides of this dialogue, shifting posture, tone
of voice, or otherwise identifying which of the two characters is
speaking.
Dictation. It is a specialized but highly useful listening activity
which also requires writing. It can be used with all ages and levels
of learners who can write. Choose a passage that is
comprehensible and also writable. Read over the whole passage,
asking for general responses. Reread the passage two or three
times, broken into short segments. You can read phrase by
phrase, and then repeat the whole sentence, and so on through all
the sentences. Repeat a final time, reading through at normal
speed. Correct them yourself.
Auditory scanning (i.e. listening for detail/ specific information).
Give the questions in advance. Tell them that they will only hear
the selection once. Weather report. Thank you for calling weather
line. Currently at the downtown weather forecast station the
temperature is 24 degrees. And now our forecast. Overnight, clear
skies and lows in the upper 10s. Clear and sunny tomorrow
morning, with highs around 28. Watch for afternoon showers with
temperature dropping into the low 10s. Questions in the written
form are in front of the testee: What is the current temperature?
What will the high temperature be tomorrow?
Individual responses. In a one-to-one setting you can talk about
pictures and have the student point, you can make reports and
see if the learner can follow them, or if the level is high enough,
you can have conversational interchanges and see if the learner
can respond to what you are saying. If you want to know about
comprehension, you will do most of the talking and require rather
simple, but unambiguous responses.
Simple paper and pencil test. Normally, you should pre-record
the tests if you want to have objective results i.e. comparable
results with different groups. Be careful! In giving live cues you are
very tempted to respond to the non verbal feedback of the group
and vary your production considerably.
True false. Provide a series of statements that must be
comprehended for their general meanings. They are clearly true or
false. The testees mark a standard answer sheet.
Pictures. You say something about one of a group of 3 or 4
pictures. The testee picks the one you referred to and marks the
answer sheet.
Multiple choice. You can make statements, ask questions, or
have short conversations. The test contains 3 or 4 choices which
the testees read. They pick the one most related to what they
hear.
Completion multiple choice. Learners choose the best way of
completing the lines of a conversation from among the 3 or 6
choices and mark their answer sheets.
Oral cloze. Testees see a passage that has blanks. As they listen
the second time, they fill in the blanks.
Note taking. Candidates take notes during the talk. After the talk
is finished they see the questions they have to answer.



SAQ 6

Dictate the following text:
Dear Sir,
I am answering your advertisement/ for an engineer./ I saw it
in the paper yesterday./ Can I come for an interview next week?/ I
left my job last month/ and I am free every day of the week./
I am 25 in August/ and I am not married./ I studied at Bolton
University/ and I finished there in 2004.

What do you test? Why is the dictation test written this way?








Write your answers in the space provided above (in no more than 60 words) and
compare them to those in the Answers to SAQs section at the end of the unit.


5.3.4 Scoring the Listening Test

Tests may be classified into non-productive exercises (only
listening is involved; scoring is objective) and productive exercises
(integrated with other skills, e.g. writing or speaking; scoring is subjective).

Non-productive listening tests
Multiple-choice questions
True-false questions
Multiple-choice cloze
Matching
Sequencing
Information transfer
Productive listening tests
Open-ended questions
True-false (false items to be corrected by the testee)
Listening cloze
Summarizing
Note taking
Completion tasks
Dictation

Non-productive tests can be scored objectively. Cloze, dictation, and
gap filling can be scored semi-objectively. For reasons of
practicality and reliability, these tasks should be selected. For
reasons of washback, productive tasks are recommended.
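
The classification above can be written down as a small lookup, which may help when planning a balanced test. This is a minimal Python sketch, not a definitive scheme: the function name `scoring_mode` and the lower-case task labels are illustrative assumptions, and "gap filling" is listed as its own label because the text names it without placing it in either list.

```python
# Task lists taken from the classification above (labels lower-cased).
NON_PRODUCTIVE = {
    "multiple choice",
    "true-false",
    "multiple-choice cloze",
    "matching",
    "sequencing",
    "information transfer",
}
PRODUCTIVE = {
    "open-ended questions",
    "true-false with correction",
    "listening cloze",
    "summarizing",
    "note taking",
    "completion tasks",
    "dictation",
}
# Tasks the text says can be scored semi-objectively.
# Assumption: "gap filling" is treated as a separate label.
SEMI_OBJECTIVE = {"listening cloze", "dictation", "gap filling"}


def scoring_mode(task):
    """Return how a listening task is scored according to the lists above."""
    name = task.strip().lower()
    if name in NON_PRODUCTIVE:
        return "objective"
    if name in SEMI_OBJECTIVE:
        return "semi-objective"
    if name in PRODUCTIVE:
        return "subjective"
    raise KeyError("unknown listening task: " + task)


if __name__ == "__main__":
    for task in ("Matching", "Dictation", "Summarizing"):
        print(task, "->", scoring_mode(task))
```

Checking the semi-objective set before the productive set reflects the text: cloze and dictation are productive tasks, yet they can be scored more reliably than open-ended ones.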



SAQ 7

Which humanistic approach is recommended for developing and
testing listening?


Write your answer in the space provided above (in no more than 5 words) and
compare it to that in the Answers to SAQs section at the end of the unit.




A check list of listening task types that may be used in formal and informal testing

1. Listen and Do: During or after listening, students are asked to perform some action.
   Example: Numbering a drawing, completing a map, ordering items in a list, matching items, labeling, ticking.
2. Listen and Do Nothing: No output.
   Example: Listening to a story or a poem.
3. Listen and Follow: Students may be given a map or picture and match what they hear with what they see.
   Example: Map, picture, diagram work.
4. Listen and Respond: Students are asked for an affective response.
   Example: Listening to a tape: did they like/dislike it, did they empathize with the person, etc.
5. Listen and Answer: The traditional type of question task.
   Example: Students have to answer questions of a variety of possible types: true-false, wh-, multiple-choice, or open-ended questions.
6. Listen and Compare: Listening for similarities/discrepancies between two (or more) inputs.
   Example: The inputs may be both/all listening inputs (e.g. Jigsaw Listening) or a mixture of a tape and print material, e.g. radio and press reports on the same event.
7. Listen and Complete: Gap-filling.
   Example: Cloze-type exercise; masked-word tape task (word obscured by noise on tape); collating fragments of text (Patchwork Listening).
8. Listen and Predict: Partial text is provided and students are asked to anticipate.
   Example: What will Mrs. X say next? How will Mrs. Y respond to that? How will the story end?
9. Listen and Correct.
   Example: Students have a written text which they correct to match the spoken version.
10. Listen and Recall/Write: Making and using notes.
   Example: Students take notes as they listen, in order to prepare a written summary, or reach agreement on what happened, in group discussion. (Dictation is a variant of Listen and Write.)
11. Listen and Discuss: Using a tape as an information source for oral interaction.
   Example: Deduction of information from spoken (and/or written) texts, evaluation of information, problem solving on the basis of a taped text.
12. Listen and React: Expressing value judgments.
   Example: Students are asked to make value judgments about opinions given or actions described on tape, e.g. Did the person do the right thing?

Point to Ponder

Talking is sharing, but listening is caring. (Anonymous)
Education is the ability to listen to almost anything without
losing your temper or your confidence.


Rating Scale for testing listening

8. Handles all general listening operations; shows confidence,
competence similar to those in his/ her native language, able to
compensate for difficulties
7. Similar to the top band; low repetition, repairs, adjusts listening
strategies for purpose
6. Extracts the majority of messages with minor loss of details, few
corrections
5. Handles listening operations moderately well; extracts most of the
message; needs repetition
4. Loss of detail, little grasp of subtlety, frequent need of repetition,
and difficulties in handling input at normal speed
3. Only the gist of the message, need for repetition, and difficulties in
handling input at normal speed, no compensation for errors
2. Comprehension of isolated points, dependent on repetition, a
narrow range of language
1. Little confidence, comprehends only basic messages, unable to
compensate

SAQ 8

Listening comprehension test

PART A
Testee's objective: to demonstrate your ability to understand
spoken English
For each question you will hear a short sentence. Each
sentence will be spoken just once.
After you hear each sentence, read the four choices in your
book, marked (A), (B), (C), and (D), and decide which one is
closest in meaning to the sentence you heard. Then, on your
answer sheet, find the number of the question and fill in the
space that corresponds to the letter of the answer you have
chosen. Fill in the space completely so that the letter inside
the oval cannot be seen.
Listen to the example: Please turn in the key of your room
before you leave.
In your textbook, you read:
A. Please lock your room when you leave.
B. Turn the key to the left to enter the room.
C. Please return your room key before leaving.
D. You must leave your room by four o'clock.
The correct answer is C.
Listen to another example: What is Mary going to do
tomorrow?
In your textbook, you read:
a. Will Mary be traveling tomorrow?
b. What are Mary's plans for tomorrow?
c. Who will be with Mary tomorrow?
d. Does Mary have to do it tomorrow?
The correct answer is b
PART B: 15 short conversations
In Part B, you will hear short conversations between two people.
After each conversation, a third person will ask a question about what
was said. Read the four possible answers and decide which one is
the best answer to the question you heard. Then, on your own
answer sheet, find the number of the question and fill in the space that
corresponds to the letter of the answer you have chosen.
Listen to an example: A man tells a woman that he doesn't like
the painting either. The question is: What does the man mean?
In your textbook, you read:
a. he doesn't like the painting either
b. he doesn't know how to paint
c. he doesn't love any paintings
d. he doesn't know what to do
The correct answer is a.
PART C

In this part of the test, you will hear longer conversations and talks. After
each conversation or talk, you will be asked some questions. After you
hear a question, read the four possible answers in your textbook and
decide which is the best answer to the question you heard.
Listen to an example: The topic is computer animation.
Question: What is the main purpose of the program?
In your textbook, you read:
a. to demonstrate the latest use of computer graphics
b. to discuss the possibility of an economic depression
c. to explain the workings of the brain
d. to dramatize a famous mystery story
The correct answer is C.
Question: Why does the speaker recommend watching the
program?
In your textbook, you read:
a. it is required of all science majors
b. it will never be shown again
c. it can help viewers improve their memory skills
d. it will help with coursework
The correct answer is D.

Comment upon the above test:
a. Is it a discrete/ integrative test?
b. What language areas does it cover?
c. What type of test is it?
d. How do you evaluate such a test: easy / difficult?

Write your answers in the space provided above (in no more than 10 words) and
compare them to those in the Answers to SAQs section at the end of the unit.

5.4 Summary


This unit has been concerned with testing speaking and listening.
Specific procedures have been introduced for the formal and informal
assessment of both skills. The changing emphasis in the assessment of
speaking and listening is a move towards integrative tests that cover
all four elements of communicative competence, and towards objective
scoring, higher reliability and authenticity. We have also identified
other characteristics:
- Assessing processes
- Internal assessment (during the course) instead of external,
  end-of-course assessment
- Use of a variety of methods
- Criterion referencing
- Formative identification of strengths and weaknesses and recording
  of positive achievement instead of pass/fail summative assessment

5.5 Key Concepts

Oral interview
Linguistic competence
Discourse competence
Sociolinguistic competence
Strategic competence
Transactional
Information transmitting
Interactional
Authenticity
Top-down
Bottom up
Interlocutor/ assessor

5.6 Checklist

Do students draw up checklists of criteria for success?
Do your learners get frequent reinforcement, e.g. marks,
comments, praise, etc.?
Does your reinforcement or recognition of success come as
quickly as possible after the student has completed the work?
Are the standards you set seen as worth achieving by your
students, as well as being achievable by them?
Do you test regularly, and set well-managed deadlines for
students' work?

5.7 Answers to SAQs

SAQ 1 If your answer to SAQ 1 is not comparable to the one suggested
below, please reread sections 2.3, 2.4, 4.2.1.2, and 4.2.4.2 again.

a. integrative
b. objective
c. high reliability
d. authentic
e. direct
f. good validity

SAQ 2 If your answer to SAQ 2 is not comparable to the one suggested
below, please reread section 3.3.2.1 again.

a, d

SAQ 3 Your answer depends upon your personal teaching / learning
experience.

The interviewer can intimidate the listener and dominate the
interaction.

SAQ 4 If your answer to SAQ 4 is not comparable to the one suggested
below, please reread section 5.2.4 again.

If you want to encourage oral ability, then test oral ability.
Generally, the abilities / skills should be given sufficient weight in
relation to other abilities. Some teachers ask their learners to bring
something to the test, a favourite object which is appropriate to their
age/ interest (an object, a picture). The testee is asked to speak
about the object. This technique reduces the fear of the unknown.
The disadvantage is that the presentation can be prepared.

SAQ 5 If your answer to SAQ 5 is not comparable to the one suggested
below, please reread section 5.3 again.

A telephone conversation is more difficult to understand. A
telephone conversation has a high information content and little
verbal redundancy. If you do not understand, it can be embarrassing
to ask for repetition. At the same time, the telephone adds a lot of
noise interference. Moreover, we cannot enjoy the benefits that body
language contributes to improved listening comprehension.

SAQ 6 If your answer to SAQ 6 is not comparable to the one suggested
below, please reread section 5.3.3 again.

Listening comprehension is tested. The dictation is read three times.
1) The first reading of the whole text is done aloud at normal speed.
2) Then each block of text // is read twice in succession with a
pause between. 3) Read the whole passage through and allow
students time to check what they have written before collecting the
answer paper.

SAQ 7 If your answer to SAQ 7 is not comparable to the one suggested
below, please reread section 5.3.3 again.

The Total Physical Response

SAQ 8 If your answer to SAQ 8 is not comparable to the one suggested
below, please reread sections 3.3.2.1, 4.2.4.1 again.

a. discrete
b. grammar, vocabulary, discourse
c. proficiency


5.8 Further Readings

Harrison, Andrew (1983) A Language Teaching Handbook, London: Macmillan, pp 16-24
Hughes, Arthur (1991), Testing for Language Teachers, Cambridge: Cambridge University
Press, pp 101-116, 134-141

Testing the Language Skills II

Proiectul pentru nvmnt Rural 113

Unit 6

TESTING THE LANGUAGE SKILLS II


6.1 Unit Objectives ......................................................................................................... 113
6.2 Testing Reading ....................................................................................................... 114
6.2.1 Types of Reading based on Content and Function ................................................ 114
6.2.2 Types of Reading based on Context and Processing Variables ............................ 114
6.2.3 Types of Reading according to Purpose ................................................................ 115
6.2.4 Cloze Passages ..................................................................................................... 116
6.2.5 Passages with Questions ...................................................................................... 117
6.2.6 Microskills .............................................................................................................. 117
6.2.7 True - False - Don't Know Checks ....................................................................... 118
6.2.8 Other Reading Techniques .................................................................................... 118
6.2.9 Assessing Overall Comprehension ........................................................................ 118
6.2.10 Issues in Teaching Reading ................................................................................ 119
6.2.10.1 Narrative Text. Reading for Pleasure ................................................................ 120
6.2.10.2 Reading for Information .................................................................................... 120
6.2.10.3 An Instructive Test ............................................................................................ 120
6.2.10.4 Types of Test Procedures ................................................................................. 121
6.3 Testing Writing .......................................................................................................... 121
6.3.1 Conditions under which Writing Takes Place ......................................................... 122
6.3.2 Current Theories of Writing with Particular Reference to
Foreign Language Writing ...................................................................................... 123
6.3.2.1 Writing as a Product ........................................................................................... 123
6.3.2.2 Writing as a Process ........................................................................................... 124
6.3.2.3 Writing as a Social Activity .................................................................................. 124
6.3.3 The Main Approach to Teaching Writing. Text Based Approaches .................... 125
6.3.3.1 Grammatical Form Practice ................................................................................ 125
6.3.3.2 A Communicative Approach ............................................................................... 125
6.3.3.3 Writer Based Approach .................................................................................... 125
6.3.4 Various Choices of Writing Tasks .......................................................................... 126
6.3.4.1 Scoring Essay Type Tests .................................................................................. 126
6.3.4.2 The Point Score Method ..................................................................................... 128
6.4 Summary .................................................................................................................. 130
6.5 Key Concepts ............................................................................................................ 130
6.6 Checklist ................................................................................................................... 131
SAA 3 ............................................................................................................................. 131
6.7 Answers to SAQs ..................................................................................................... 132
6.8 Further Readings ...................................................................................................... 132

6.1 Unit Objectives

Interactive, integrated skills approaches to language teaching
emphasize the interrelationship of skills. Reading is usually
developed and tested in association with writing, listening and
speaking activities.
The aim of this unit is to familiarize you with the main
procedures of testing the skills of reading and writing so that you will
feel confident to construct and administer such tests competently and
appropriately. We consider the major types of tests, their advantages
and limitations. We shall also give some suggestions on how to
prepare and grade the essay questions.
At the end of this unit you should be able to address the main issues in
testing reading and writing. Moreover, you'll be able to apply many of
the techniques to your own situation.

6.2 Testing Reading

Reading theories, activities and materials, and research reports
on reading are increasing both in quality and quantity. In addition, the
number of publications has also increased in spite of the expansion
of the visual media that apparently make reading almost
unnecessary. More than that, the general public is keenly concerned
about promotion of literacy.
Reading is an extremely useful skill in itself. It makes it easier
for learners to get usable intake, comprehensible input as they can
adjust their reading to fit their own level. They can stop, back up, use
a dictionary, or ask for help much more easily than if they were
listening or speaking.
In other words, reading is the ideal form of comprehensible input.
The classification of the main types of reading gives you an idea of the
kind of authentic readings that should be used in teaching and testing.

Point to Ponder
The art of reading is to skip judiciously.
P. G. Hamerton, 1834-1894, The Intellectual Life

6.2.1 Types of Reading based on Content and Function
The list is important if we think of positive washback. A
variety of types of reading tests will encourage the learners to read a
broad range of texts. The list also gives you suggestions for using
authentic texts in testing listening comprehension.
- Short, impersonal material: highway signs, tickets, timetables,
  labels, instructions. The style is usually formal and elliptic
  (missing articles, regular verbs).
- Larger, informational material: articles, books, reports, business
  letters, etc.
- Personal material: personal letters, notes and messages,
  biographies, what we write.
- Literary and aesthetic material: short stories, novels, plays, poetry.
- Instructional materials: textbooks, workbooks, tests, cloze exercises.
6.2.2 Types of Reading based on Context and Processing Variables
Reading can be:
Formal
Informal
Impersonal
Light
6.2.3 Types of Reading according to Purpose
Skimming
Scanning
Speed reading
Reading for enjoyment
Reading for information
Reading for studying

Reading can vary depending on:
The age group
Gender
Social level
Professional background
Legal status

Reading varies according to:
Length of sentences
Difficulty of words
Complexity of sentence structure
Size of print
Use of co-text

Reading abilities and types of reading can vary greatly according to
readers' interests, proficiency, reading habits, and background.

Reading is possible because the reader and the writer share the
same schemata. The reader reads selectively, using the appropriate
knowledge, skill and experience (schemata) to get on the same ground
as the writer and figure out what the message is. Reading is favored by
the grammar-translation and the reading approach. Evaluation plays
an important role in the process of developing this skill. You need to
evaluate students' progress, to plan future reading and also to detect
problems before they become serious. Evaluation is reasonably easy,
based on learners' ability to respond to the meaning of the material.
Teachers are sometimes tempted to take the phonological performance
of oral readings as a guide to reading ability, but we must view such a
tactic as a serious error. Some learners can perform a passage perfectly
without understanding it at all, while others can sound terrible and
know exactly what they read.
Some reading material can be talked about reasonably well simply
by looking at the co-text and the titles, so teachers should not be
content with a superficial interpretation. By having a variety of
response types, you can also better judge which types of
comprehension each student has. Formal tests of general skill,
when used with caution, can also be useful.

The procedures that can lead to growth in reading ability are a
source for formal and informal testing:
- reading what you have said
- reading around the classroom (e.g. the learners read the labels
  placed on the door, table, chair, etc.)
- reading material presented first orally
- managed reading of short passages
- non-expository, non-narrative reading
- skimming and scanning
- speed reading
In order to make the reader give overt responses you may ask
the reader:
To respond physically to a command rendered in written form
To select from written alternatives
To summarize what has been read
To answer questions about a written text
To outline and take notes from a written passage
To provide an ending to a story
To translate the message into the native language
To read instructions and assemble a toy
To talk in order to prove comprehension

SAQ 1

Why does reading share some techniques of testing with listening?






Write your answers in 20 words in the space provided and compare them to
those in the Answers to SAQs section at the end of the unit.

POINT TO PONDER
Before testing reading in a foreign language, the native language
reading skills of the testee must be assessed. If some testing skills
are not yet developed by the testee in his native language, you can
be sure that the learner encounters difficulties in reading in the
foreign language. However, do not conclude that reading skills from
the native language can be transferred to the foreign language.


Reading tests can follow the general outline of listening tests, i.e.
the same types of tests that are used for listening can be used for
reading. They can be handled individually, with true-false
questions, written instructions, and so on. However, easy as reading
is to organize informally, it is surprisingly difficult to measure
formally.
6.2.4 Cloze Passages

Cloze passages are easy to prepare, but the fill-in type can be
difficult to write. Multiple-choice cloze tests appear to do much the
same thing, taking only a little longer to prepare and being many times
easier to score. You have to supply the correct choice and two or
three other attractive distractors (that have the right meaning but the
wrong grammatical form, or somehow look right but have the wrong
meaning, and so on). A cloze passage of over 40 blanks can approach a
reliability of .85 and above rather easily. But once made, such tests
are difficult to revise. You cannot just cut out the parts of the passage
that do not give good results, as you can with multiple-choice tests
or other test types.
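
The preparation of a fill-in cloze passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the common fixed-ratio procedure (deleting every n-th word after an intact lead-in), which this section does not itself prescribe, and the function name make_cloze and its parameters n and lead_in are hypothetical.

```python
def make_cloze(text, n=7, lead_in=10):
    """Blank out every n-th word of `text`, leaving the first
    `lead_in` words intact so the reader can get into the passage.

    Returns the gapped passage and the list of deleted words
    (the answer key). Punctuation attached to a deleted word is
    deleted with it; a real test writer would tidy this by hand.
    """
    words = text.split()
    answers = []
    # The first blank is the n-th word after the lead-in.
    for i in range(lead_in + n - 1, len(words), n):
        answers.append(words[i])
        words[i] = f"({len(answers)}) ______"
    return " ".join(words), answers


sample = "one two three four five six seven eight nine ten"
passage, key = make_cloze(sample, n=3, lead_in=2)
print(passage)
print(key)
```

With n=7 and a ten-word lead-in, a passage of roughly 300 words is needed to obtain the 40 or more blanks mentioned above.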

6.2.5 Passages with Questions
Many reading tests use passages followed by 3 or 5 questions
(for example, the TOEFL reading section). These tests are extremely
difficult to construct. First, it takes a lot of time to find suitable
passages that have content which is equally familiar to all testees.
The questions tend to overlap, and the answer to one question often
gives cues to the answers of the others. Reading through the
questions in fact can often tell you what the passage is about and
allow a high level of performance even if you do not read the
passage. Such tests take a lot of time per item, and in some ways
require more academic and study skills than reading skills.

6.2.6 Microskills
Lists of reading skills give us an idea of what to teach. At the
same time, these microskills can become testing criteria:
- Discriminate among the graphemes and orthographic patterns of
  English
- Retain chunks of language of different lengths in short-term
  memory
- Process writing at an efficient rate of speed
- Recognize the cohesive devices, rhetorical forms, communicative
  functions
- Realize that a particular meaning can be rendered in different
  grammatical forms
- Infer context that is implicit
- Infer links, connections, causes, effects, main idea, supporting
  idea, new information
- Distinguish between literal and implied meaning
- Develop and use reading strategies (scanning, skimming, guessing
  the meaning of words from the context)
- Identify the referents of pronouns
- Understand relations between parts of a text (introduction,
  development, conclusion)

Setting criterial levels for reading is problematical. According to
Arthur Hughes the best way to proceed is to use the tasks
themselves to define the level. All of the items (and so the task that
they require the candidates to perform) should be within the
capabilities of anyone to whom we are prepared to give a pass. In
other words, in order to pass, a candidate should be expected, in
principle, to score 100%. But since we know that human performance
is not reliable, we can set the actual cutting point rather lower, say at
the 80% level. In order to distinguish between candidates of different
levels of ability, more than one test will be required.
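
The pass decision described above amounts to a simple comparison against the cutting point. The following Python sketch is an illustration only; the function name passes and the parameter cut_percent are hypothetical, and the default of 80 is the example figure given in the paragraph, not a fixed rule.

```python
def passes(correct_items, total_items, cut_percent=80):
    """Return True if the candidate reaches the cutting point.

    In principle every item is within a passing candidate's
    capabilities (100%), but the cutting point is set lower to
    allow for the unreliability of human performance.
    """
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return correct_items * 100 >= cut_percent * total_items


print(passes(16, 20))  # 16 of 20 items is exactly 80%
print(passes(15, 20))  # 15 of 20 items is 75%
```

A single test scored this way yields one pass/fail decision at one level, which is why more than one test is needed to separate candidates of different levels of ability.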



6.2.7 True - False - Don't Know Checks

True-false tests are useful as a quick assessment of reading
comprehension. Scoring: 2 points for a correct answer, -1 point for an
incorrect answer, no penalty for answering "don't know". Wilga Rivers
(1978: 267) suggests that this procedure discourages wild guesses.
Advantages: the true-false test requires attention to structural
cues.
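
The scoring rule above can be written out as a short Python sketch. It is an illustration only: the function name score_true_false and the codes "T", "F" and "?" (for "don't know") are assumptions made for the example.

```python
def score_true_false(answers, key, correct=2, wrong=-1):
    """Score a True - False - Don't Know check.

    `answers` holds the testee's responses ("T", "F" or "?"),
    `key` holds the correct responses ("T" or "F").
    A "?" (don't know) earns nothing and costs nothing.
    """
    total = 0
    for given, expected in zip(answers, key):
        if given == "?":
            continue  # no penalty for admitting "don't know"
        total += correct if given == expected else wrong
    return total


key = ["T", "F", "T", "T", "F"]
careful = ["T", "F", "?", "T", "?"]   # leaves two items open
guesser = ["T", "F", "F", "T", "T"]   # guesses the same two items wrongly
print(score_true_false(careful, key))  # 6
print(score_true_false(guesser, key))  # 4
```

Both testees know the same three items, but the one who guesses wildly on the other two ends up with the lower score, which is the effect the penalty is meant to produce.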

6.2.8 Other Reading Techniques
Questions in English requiring answers in the students' native
language. This is helpful for students who comprehend the text but
have trouble expressing what they have understood in correct English.
If the text is too difficult, students may be asked to read the
questions before they start reading.

Point to Ponder

Eye movements during reading are an important source of
information about the reading process. Modern theories of reading
usually analyze reading into a sequence of processes that are
applied to each word as it is encountered in the text.


6.2.9 Assessing Overall Comprehension
Supply in English a suitable title/ suitable subtitles for a text
Outline the plot
Give the main idea for each paragraph
Supply paraphrases or definitions for words in a text

Example of Reading Comprehension Test

After reading the passage, answer the questions by choosing
the letter for the alternative which could accurately complete the
statement.
SAQ 2

What happens if, on a nine-point scale, the best paper is assigned
an 8 and the worst paper a 2 because the teacher considers that
the extreme scores should not be used, since no paper is good
enough to receive the top score and no paper is bad enough to
receive the bottom score?





Write your answers in the space provided above (in no more than 40 words) and
compare them to those in the Answers to SAQs section at the end of the unit.


I must tell you that there is something in the proximity of the
woods which is very singular. It is with men as it is with the plants and
animals that grow and live in the forests: they are entirely different
from those that live in the plains. By living in or near the woods, their
actions are regulated by the wildness of the neighborhood. The
deer often come to eat their grain, the wolves to destroy their sheep,
the bears to kill their hogs, the foxes to catch their poultry. This
surrounding hostility immediately puts the gun into their hands, they
watch these animals, they kill some; and then, by defending their
property, they soon become professed hunters; this is the progress;
once hunters, farewell to the plough. The chase renders them
ferocious, gloomy and unsociable; a hunter wants no neighbor, he
rather hates them, because he dreads the competition. In a little time
their success in the woods makes them neglect their tillage. (Letters
from an American Farmer, III, "What is an American?", by St. John de
Crèvecoeur)

1. Living in the woods affects:
a. animals but not plants and men
b. plants but not animals and men
c. men as well as plants and animals
d. men but not animals and plants

2. The frontiersman
a. is forced to become a hunter
b. hunts for sport
c. hunts rarely
d. prefers fishing

3. Hunting and farming
a. go hand in hand
b. do not work well together
c. have no effect on one another
d. are alternate pursuits

4. Hunting makes the frontiersman
a. more sociable
b. a snob
c. less sociable
d. indifferent to neighbors

5. The author's opinion of the frontiersman seems
a. high
b. unflattering
c. flattering
d. envious

6.2.10 Issues in Teaching Reading
- Skills may be used, with one predominant skill, in combination or in
  isolation, e.g. reading without responding in writing. Testing skills
  in isolation is favoured for washback reasons and also because
  integrated tests generate a lot of output (written or spoken) which
  cannot be marked objectively. Moreover, integrated testing is
  time-consuming, costly, and less reliable than objective testing.
- Reading is not a passive or receptive skill but an active process.
- Authenticity, an extremely important factor in testing reading,
  refers to the degree to which language teaching materials have
  the qualities of natural writing (texts from newspapers, magazines,
  and other authentic materials). The opposite of authentic texts is
  simplified texts. When authenticity is discussed, Widdowson's
  sense (1976: 165) has to be taken into consideration: "Authenticity
  is a function of the interaction between the reader and the text
  which incorporates the intentions of the writer/ speaker."
- Authenticity of task is also an important issue. A task is authentic
  when the result is a behavioral outcome (we do something with the
  information derived from reading, e.g. we read a recipe book and
  then make a cake). However, it is also true that, in the real world,
  reading is not always accompanied by a behavioral outcome.
- The texts we read have various purposes, for example reading for
  reference, for interest or pleasure, for information or for instruction
  or advice. Each of these types has a different style (uses a certain
  tense or voice, is concise). It is also true that, depending on the
  relation with the real world, comprehension may be fragmentary
  (a telephone number from a directory) or global (a message). The
  reading strategies, which depend on the purpose of reading, are:
  skimming, scanning, intensive reading, and extensive reading.
- Starting from the type of text we use, we may require the testee
  to carry out a certain task.


Point to Ponder

Novice teachers are nearly always surprised by the results of
evaluation; it is not easy to guess who is learning and who is not.



6.2.10.1 Narrative Text. Reading for Pleasure

Sequencing a series of pictures/ statements to reconstruct a plot
Drawing
Assembling a number of lines

6.2.10.2 Reading for Information

Drawing a map
Labeling
Completing a table

6.2.10.3 An Instructive Test

Sequencing
Following instructions

6.2.10.4 Types of Test Procedures
- Open-ended questions: rejected by the second generation of
  testers because of their reduced objectivity and practicality;
  accepted by some testers for their authenticity and fragmentary
  comprehension
- True or false, same or different, yes or no questions (authentic,
  realistic)
- Multiple-choice questions (advantages: objectivity, reliability,
  practicality)
- Global multiple questions (an answer is given after scanning the
  whole text)
- Information transfer, e.g. transferring information given by a
  picture or map
- Matching vocabulary
- Cloze / discourse cloze
- Translation. Drawbacks: poor practicality, negative washback
  effect, subjective, time-consuming
- Testing reference skills and study skills


SAQ 3

Match the letters (i.e. the strategies) with the figures (objectives /
tasks)
I. a) responding b) summarizing; c) skimming; d) scanning; e) note-
taking; f) outlining.
II. 1. The test-taker must: locate a date, a name, or place in an
article; the setting for a story; the principal divisions of a chapter; the
cost of an item; comprehend labels, headings, numbers, and
symbols, making inferences that are not presented overtly.
2. What is the main idea of this text?
What is the author's purpose in writing the text?
What kind of writing is this?
What do you think you will learn from the text?
3. Write a summary of the text.
4. In this article.................., the author suggests that ....................
Write an essay in which you agree or disagree with the author's
thesis. Support your opinion with information from the article or from
your experience.

Write your answers in the space provided and compare them to those in the
Answers to SAQs section at the end of the unit.


6.3 Testing Writing

Writing appears to be the most complex, the most variable, and
perhaps the least urgent of the four main language skills. However,
for virtually all students at all levels, writing is a skill they simply
cannot ignore. Writing is often of crucial importance. Judgment on
the performance of the learner may have consequences for him/ her,
such as exclusion from a specific discourse community.
Writing is also the means through which assessment and
testing of learning regularly take place. Writing is an important skill
for a learner in supporting other learning experience. It is obviously
the major means of recording, assimilating and reformulating
knowledge, and of developing and working through his/ her own
ideas. It is also a means of creativity and of self-expression.

6.3.1 Conditions under which Writing Takes Place
We must think not merely of products, but also of the processing
conditions. Those conditions include the motivation and proficiency of
the writer, the topic, the intended audience and the amount of time
available.

Physical processes
you put your fingers on a pen or keyboard
environment: you are sitting at a desk
you have to have a lot of information before doing it (different from
reading, when knowing is a result of doing it)
you need more energy
your ego is much more at risk
you expect writing to have more aesthetic qualities
you have to observe a number of cultural conventions
instructions are complex and compulsory
the main types of writing are expected to be more polished and
more permanent
writers can go back and forth and can use dictionaries and
reference materials
written language is more complex, more formal, more concise, and
uses less frequent vocabulary
written language has orthographic and punctuation rules
writers' emotions are translated indirectly into words on the page
writing lacks the social immediacy and urgency of speaking
writers feel lonely
the writer must think of an imagined reception audience
writing implies creation


Point to Ponder

The invention of writing ... has had a greater influence in uplifting the
human race than any other intellectual achievement.
James Breasted, The Conquest of Civilization



How Do We Write?
Writers compose on the basis of their schemata:
Linguistic
Textual that affects quantity and quality (background knowledge)
Event (knowledge of the order of events)
Strategic (strategies for how to compose)
A level of print schemata is also involved (spelling, punctuation)
In second language acquisition, writing correlates with the degree
of language proficiency
Reading and writing are correlated
In other words, when you sit down to write something simple,
you activate the schemata (ideas, information, and organization)
regarding the topic. Sometimes you have to activate various parts of
the schemata. If you find a lot of information, you do not just copy
everything down. Your discourse schemata are activated next and
suggest possible ways of writing essay answers, while your event
schemata help you decide the order in which events occur. You also
need strategies for impressing the audience, for deciding how to use
your time, and for controlling your excitement. As a foreign learner of
English, you also have to pay attention to the structure of the
language (vocabulary, idioms, the correct level of formality, spelling).

6.3.2 Current Theories of Writing with Particular Reference to
Foreign Language Writing
Current theories of writing may be classified as:
Linguistic
Cognitive
Social
depending on the particular emphasis given to the text, the writer
or the context (readers).
Two of these perspectives on writing are important in language
teaching. Writing is seen both as a product and as a process.

6.3.2.1 Writing as a Product
Writing means, among other things, the output of the activity of
writing, i.e. a static text, visible on paper, isolated in time and place from
the writer. Writing as a product has been analyzed in many ways:
Comparisons of L1 and L2 writings
Length (fluency)
Accuracy of form (error)
Effectiveness (quality)
Structure
Many of the techniques of analysis of written texts derive from
Discourse Analysis. In 1966, Kaplan introduced the concept of
cultural variation in thought patterning in L1, which is also important
for the acquisition of L2 writing. As a result, research has been
carried out on:
Distribution of information
Inter-clause relations to measure patterning of the whole text level
Other differences between L1 and L2 texts:
Differences in the ways clauses are sequenced to build up
argument structures
Organization and elaboration of structures
The way in which readers' requirements are met through topic
signaling and attention-getting devices

Stylistic features of L2:
Relative inconsistency
Inappropriateness or limitations in variety of style and tone
Specific morpho-syntactic and lexical-semantic features
The nature and frequency of clause connectors
Types of modification
Occurrence of passives
Cohesion
The use of collocations

SAQ 4

Which one of the following statements is most applicable to the
selection of distractors for multiple-choice items?
A. distractors should be unequivocally false
B. one should avoid tricky distractors based on misconceptions
C. distractors should be attractive to the uninformed
D. distractors should be heterogeneous

Circle your answers in the space provided and compare them to those in the
Answers to SAQs section at the end of the unit.


6.3.2.2 Writing as a Process
Writing as a process focuses on the process of producing text,
i.e. the activity of transforming ideas into written text. According to this
view, writing is a complex cognitive activity, involving the use of a
range of problem-solving strategies and composing processes.
Research tools include verbal protocols (analyses of the recorded
verbalizations of people thinking aloud while writing), observation
and post-writing interview techniques. The resulting model
consists of the following components:

The writer's long-term memory (knowledge of the topic,
audience)
The task environment (the assignment, topic, audience, exigency,
the text produced)
The processes themselves i.e. planning (generating ideas, goal
setting, organizing), expressing ideas in verbal form, reviewing
(reading and editing)
The model is interactive. Both the writer's internal resources and the
external context interact with the composing processes.

6.3.2.3 Writing as a Social Activity
Writing is an act of communication between the writer and
reader within an external context. The important thing here is the
interaction between producer and receptor in terms of common
schemata and situational context.
The main components of writing are summarized in a diagram
presented by Raimes (1983) in Techniques in Teaching Writing,
Oxford University Press.
6.3.3 The Main Approach to Teaching Writing. Text Based Approaches

6.3.3.1 Grammatical Form Practice
This approach follows the requirement of the Audio-Lingual
Approach that considers writing as a way of reinforcing other
language skills. It requires:
Practice of syntactic and morphological patterns that can be
isolated
Reinforcement of target situations is achieved through: sentence
completion, combining texts, gap filling, manipulation and imitation
activities using model paragraphs that contain the selected
structures
Its objective is formal linguistic accuracy. Appropriateness to
context and self expression are corrected by the teacher.
This limiting, sentence-bound approach was replaced in the
1960s by a discourse analysis approach known as current-traditional
rhetoric. Although the aim is still structural, this approach
emphasizes the formation of habits in writing: paragraph patterns
and sequences of units of meaning over longer stretches of
discourse. Rhetorical categories (e.g. description, narration) and
functions are practiced by constructing and manipulating discourse
forms through completion-type tasks, topic sentence exercises and
paragraph development exercises.

Point to Ponder

The study of languages ... should be joined to that of objects that
our own acquaintance with the objective world and with language
may progress side by side. For it is people we are forming and not
parrots.
Comenius, 1957

6.3.3.2 A Communicative Approach

In a communicative approach to writing, the emphasis is on
message forms. The goal is purposeful interaction. Its objectives include:
Appropriateness to the purpose of communication i.e. content
Appropriate techniques
Information/ opinion
Information transfer exercise (from visual to text)
An emphasis on real time, holistic practice, risk taking strategies,
free practice

6.3.3.3 Writer Based Approach
It is based on the idea that writing is a non-linear, exploratory
and generative process. It focuses on writers' efforts to formulate and
communicate ideas. It involves problem-solving cognitive activities,
using strategies of goal setting, idea generation, organization,
drafting, revising and editing. The main characteristics are:
A text is worked and reworked through a number of draft versions
Writing is a collaborative activity to be shared within the classroom
The teacher is an advisor rather than a tester
However, in any successful approach to teaching, we should not
forget that there is no process without product, and no product which
has not arisen out of a process. Both approaches should be
considered in a practical approach to writing in L2.
Writing is fairly easy to elicit and reasonably easy to grade, at
least linguistically. But testing writing presents some important
issues. Selecting fair topics is one difficulty, but grading is more
serious. You must decide the relative weight of:
Mechanics
Structures
Vocabulary
Fluency
Accuracy
Communication
Organization
Total amount of information

6.3.4 Various Choices of Writing Tasks
Controlled writing. You can have learners manipulate sentences,
expand, edit, summarize
Completion or synthesis. You can begin a passage and have
testees complete it. Or you can present paragraphs and ask
students to sort, arrange and expand them.
Simple stories, dialogues, or descriptions. You can ask
learners to develop written material.

6.3.4.1 Scoring Essay Type Tests
With the exception of the oral test, the essay is the oldest test
format in use today. The distinctive features of the essay question
are:
1. the examinee is permitted freedom of response
2. the answers vary in degree of quality or correctness

Advantages of the essay
1. it is relatively easier to prepare an essay test than to prepare a
multiple-choice test
2. it is the only means that we have to assess the examinee's ability
to compose an answer and present it in effective prose
3. it tests the pupil's ability to supply rather than select the correct
answer

Limitations of the essay
1. their poor content sampling
2. their low reader reliability
3. the student does not always understand the question and therefore
is not sure how to respond
4. reading essays and grading them is time consuming and laborious

Why Are Essay Tests Still Popular?
1. Essay tests can indirectly measure attitudes, values and opinions
2. Good essay tests are more easily prepared than objective tests
3. They provide good learning experiences
4. Essay tests require the student to express himself/herself logically,
coherently, and in good English
5. Essay questions should be restricted to advanced foreign learners
of English

It is very difficult to score essays. Many approaches have been
tried, ranging from skimming the response for a quick estimate of its
worth to assigning points for specific bits of information. Regardless
of the method of scoring used, the emphasis should be on the ideas
presented, the relationships developed and the judgments made.
If factual information is desired, use selection-type items or
completion items. If several questions are to be scored, score the first
question on every paper before proceeding to the next question. When
all the items are scored, the points assigned can be summed for each
paper. In order to increase reliability, ignore students' names.
There are two methods of scoring essay responses:
The sorting method
The point score method

Sorting Method
Decide on the number of groups to be used (for example, five) prior
to scoring the papers
Read the papers
Sort the papers and place them into several groups ranging from high
to low
Re-sort the questionable or borderline papers
Take care to see that the poorest papers in the better group are
superior to the top papers in the poorer group
A predetermined number of papers may be assigned to each group,
depending on the size of the class and the number of points assigned
to the question. For example, if the number of groups is five, two
papers may be assigned to the first (top) group, four to the second,
eight to the third, four to the fourth and two to the fifth. The teacher is
not required to conform exactly to the expected distribution; the
method simply establishes an expected standard of measurement. All
groupings should be used in order to increase the reliability of the
examination.
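
The expected distribution in the sorting method can be sketched in a few lines of Python. This is only an illustration, not part of the method as described: the function names are hypothetical, and the 1:2:4:2:1 weighting is simply our reading of the 2-4-8-4-2 example for five groups. The reader still ranks the papers by judgment; the code only suggests how many papers each group would be expected to hold.

```python
def expected_group_sizes(n_papers, weights=(1, 2, 4, 2, 1)):
    """Split n_papers across groups in proportion to weights (top group first).

    The default weights are an assumption taken from the 2-4-8-4-2 example.
    """
    total = sum(weights)
    exact = [n_papers * w / total for w in weights]
    sizes = [int(x) for x in exact]  # round every group down first
    leftover = n_papers - sum(sizes)
    # Give the remaining papers to the groups with the largest fractional parts.
    by_fraction = sorted(range(len(weights)),
                         key=lambda i: exact[i] - sizes[i], reverse=True)
    for i in by_fraction[:leftover]:
        sizes[i] += 1
    return sizes


def sort_into_groups(ranked_papers, sizes):
    """Cut a list of papers, already ranked from best to poorest, into groups."""
    groups, start = [], 0
    for size in sizes:
        groups.append(ranked_papers[start:start + size])
        start += size
    return groups


sizes = expected_group_sizes(20)
groups = sort_into_groups([f"paper {n}" for n in range(1, 21)], sizes)
print(sizes)
print(groups[0])
```

As the text notes, the result is a guide rather than a rule: the teacher may move borderline papers between groups after re-reading them.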



SAQ 5

Imagine you are in a library. Find your reference materials in one of
the sections of the library.
Which of these items relate to:
a. study skills
b. reference skills

1. Where might you look for a book about submarines?
a. section J; b. section G; c. section H
2. Where would you find the latest issue of Time magazine?
a. section A; b. section B; c. section K
3. Where might you find a mystery about a teen-age detective?
a. section D; b. section E; c. section F
4. Where would you find an atlas with maps of Africa?
a. section J; b. section D; c. section C

Section A: Listening and Video-Tape Room
Section B: Reading Room
Section C: Reference Section
Section D: Children's Section
Section E: Young Adult Section
Section F: Biography Section
Section G: Non-fiction Section
Section H: Fiction Section
Section K: Newspapers and periodicals

5. Read each of the questions that follow, and decide in which book
you would look first to find the answer. Circle your answer:
a. almanac
b. encyclopaedia
c. others
d. Readers' Guide to Periodical Literature
1. Where are some famous lighthouses located in North America?
2. What is the current population of Vancouver?


Your answers depend on your personal experience. Compare your
solutions to those in the Answers to SAQs section at the end of
the unit.

6.3.4.2 The Point Score Method
The Point Score method of scoring essays is more reliable than
the sorting method. However, in this case, validity may easily be
sacrificed for an increase in reliability. Rules:

Avoid selecting facts as scoring points
A graded key that contains the expected responses should be
constructed
The specific points for the reader to note must be isolated and
assigned relative weights
The teacher reads a response and assigns to it the appropriate
number of points
At the end, all the points assigned to the questions can be added
to determine the total score for each paper
The method is useful for scoring the short answer essay test
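
The arithmetic of the point score method can be illustrated with a short Python sketch. Everything in it is hypothetical: the graded key, its weights and the function names are invented for illustration and are not taken from any published key. The scoring points are deliberately about reasoning and support rather than isolated facts, in line with the first rule above.

```python
# Hypothetical graded key for one essay question: scoring point -> weight.
GRADED_KEY = {
    "states a clear position": 2,
    "gives a specific reason": 3,
    "supports the reason with an example": 3,
    "answers a counter-argument": 2,
}


def point_score(key, credited):
    """Add up the weights of the scoring points the reader credited."""
    unknown = set(credited) - set(key)
    if unknown:
        raise ValueError(f"not in the graded key: {sorted(unknown)}")
    # A scoring point is counted once, even if it was noted twice.
    return sum(key[point] for point in set(credited))


def paper_total(question_scores):
    """Sum the scores of all questions on one paper."""
    return sum(question_scores)


q1 = point_score(GRADED_KEY, ["states a clear position",
                              "gives a specific reason"])
print(q1)
print(paper_total([q1, 7, 4]))
```

Because the key fixes what earns credit, two readers are more likely to agree on a score; the cost, as noted above, is that a response may be strong in ways the key does not anticipate.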

Example
Essay writing

You will have 30 minutes to plan, write, and correct your essay.
Your essay will be graded on its overall quality

INSTRUCTIONS
1. When the supervisor tells you to begin, read the essay question
carefully.
2. Think before you write. Making notes may help you to organize
your essay. Below the essay question is a space marked NOTES.
Use only this area to outline your essay or make notes.
3. Write only on this topic. If you write an essay on a different topic, it
will not be scored. Write clearly and precisely. How well you write
is much more important than how much you write, but to cover the
topic adequately, you may want to write more than one paragraph.
4. Write neatly and legibly. Do not skip lines. Do not write in very
large letters or leave large margins.
5. Check your work. Allow a few minutes before time is called to
read over your essay and make small changes.
6. After thirty minutes, the supervisor will tell you to stop. You must
stop writing and put your pencil down. If you continue to write, it
will be considered cheating.


Essay Question
(30 minutes)
Do you agree or disagree with the following statement?
A zoo has no useful purpose.
Use specific reasons and examples to explain your answer.
Notes:

Score    Level    Criteria    Comments

CONTENT
30-27    EXCELLENT TO VERY GOOD: knowledgeable; substantive; thorough
         development of thesis; relevant to assigned topic
26-22    GOOD TO AVERAGE: some knowledge of subject; adequate range; limited
         development of thesis; mostly relevant to topic, but lacks detail
21-17    FAIR TO POOR: limited knowledge of subject; little substance; inadequate
         development of topic
16-13    VERY POOR: does not show knowledge of the subject; non-substantive; not
         pertinent; OR not enough to evaluate

ORGANIZATION
20-18    EXCELLENT TO VERY GOOD: fluent expression; ideas clearly stated/supported;
         succinct; well-organized; logical sequencing; cohesive
17-14    GOOD TO AVERAGE: somewhat choppy; loosely organized but main ideas stand
         out; limited support; logical but incomplete sequencing
13-10    FAIR TO POOR: non-fluent; ideas confused or disconnected; lacks logical
         sequencing and development
9-7      VERY POOR: does not communicate; no organization; OR not enough to
         evaluate

VOCABULARY
20-18    EXCELLENT TO VERY GOOD: sophisticated range; effective word/idiom choice
         and usage; word form mastery; appropriate register
17-14    GOOD TO AVERAGE: adequate range; occasional errors of word/idiom form,
         choice, usage, but meaning not obscured
13-10    FAIR TO POOR: limited range; frequent errors of word/idiom form, choice,
         usage; meaning confused or obscured
9-7      VERY POOR: essentially translation; little knowledge of English vocabulary,
         idioms, word form; OR not enough to evaluate

LANGUAGE USE
25-22    EXCELLENT TO VERY GOOD: effective complex constructions; few errors of
         agreement, tense, number, word order/function, articles, pronouns,
         prepositions
21-18    GOOD TO AVERAGE: effective but simple constructions; minor problems in
         complex constructions; several errors of agreement, tense, number, word
         order/function, articles, pronouns, prepositions, but meaning seldom
         obscured
17-11    FAIR TO POOR: major problems in simple/complex constructions; frequent
         errors of negation, agreement, tense, number, word order/function,
         articles, pronouns, prepositions and/or fragments, run-ons, deletions;
         meaning confused or obscured
10-5     VERY POOR: virtually no mastery of sentence construction rules; dominated
         by errors; does not communicate; OR not enough to evaluate

MECHANICS
5        EXCELLENT TO VERY GOOD: demonstrates mastery of conventions; few errors
         of spelling, punctuation, capitalization, paragraphing
4        GOOD TO AVERAGE: occasional errors of spelling, punctuation,
         capitalization, paragraphing, but meaning not obscured
3        FAIR TO POOR: frequent errors of spelling, punctuation, capitalization,
         paragraphing; poor handwriting; meaning confused or obscured
2        VERY POOR: no mastery of conventions; dominated by errors of spelling,
         punctuation, capitalization, paragraphing; handwriting illegible; OR not
         enough to evaluate

Total score:          Reader:          Comments:
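
The way the five component scores of the profile combine into a total can be sketched in Python. This is a minimal illustration under stated assumptions: the band limits are copied from the profile above, while the dictionary layout and the function names are our own and are not part of the profile. With these bands the best possible total is 30 + 20 + 20 + 25 + 5 = 100 and the lowest is 13 + 7 + 7 + 5 + 2 = 34.

```python
LEVELS = ("EXCELLENT TO VERY GOOD", "GOOD TO AVERAGE",
          "FAIR TO POOR", "VERY POOR")

# (lowest, highest) score of each level, in the order given in LEVELS.
PROFILE = {
    "content":      [(27, 30), (22, 26), (17, 21), (13, 16)],
    "organization": [(18, 20), (14, 17), (10, 13), (7, 9)],
    "vocabulary":   [(18, 20), (14, 17), (10, 13), (7, 9)],
    "language use": [(22, 25), (18, 21), (11, 17), (5, 10)],
    "mechanics":    [(5, 5), (4, 4), (3, 3), (2, 2)],
}


def level(component, score):
    """Return the level label that a component score falls into."""
    for label, (low, high) in zip(LEVELS, PROFILE[component]):
        if low <= score <= high:
            return label
    raise ValueError(f"{score} is outside the range for {component}")


def total_score(scores):
    """Check that every component has a valid score, then add them up."""
    missing = set(PROFILE) - set(scores)
    if missing:
        raise ValueError(f"no score given for: {sorted(missing)}")
    for component in PROFILE:
        level(component, scores[component])  # raises if out of range
    return sum(scores[component] for component in PROFILE)


essay = {"content": 24, "organization": 15, "vocabulary": 16,
         "language use": 19, "mechanics": 4}
print(level("content", essay["content"]))
print(total_score(essay))
```

Keeping the component scores separate, rather than recording only the total, lets the reader tell the student where the essay is weakest.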

6.4 Summary

The assessment and testing of reading and writing, especially in a
communicatively oriented classroom, is a thorny issue. Because
reading, like listening comprehension, is totally unobservable, it is
as important in reading as it is in other skills to be able to assess
students' comprehension and development of skills accurately. The
following overt responses indicate comprehension: doing, choosing,
transferring, summarizing, condensing, extending (providing an
ending to a story), duplicating (translating), modeling (after reading
instructions), conversing (engaging in a conversation that indicates
appropriate processing of information). Six general categories form
the basis for the evaluation of student writing: meaning, organization,
content, vocabulary, discourse (sentences, grammar), syntax,
mechanics (spelling, punctuation). This unit provides a wide range of
procedures that can be applied to your own situation.

6.5 Key Concepts

Schema theory
Skimming
Scanning
Silent reading
Reading aloud
Bottom up approach
Top down approach
Process vs product
Authenticity
Interaction
Correction symbols
Communicative approach
Writer based approach

6.6 Checklist

Do students draw up checklists of criteria for success?
Do your learners get frequent reinforcement, e.g. marks,
comments, praise, etc.?
Does your reinforcement or recognition of success come as
quickly as possible after the student has completed the work?
Are the standards you set seen as worth achieving by your
students, as well as being achievable by them?
Do you test regularly, and set well-managed deadlines for
students' work?
Are the questions realistic in terms of difficulty, time allowed the
student to respond, complexity of test?
Does the essay question establish a framework to guide the
student to the expected answer?
a) Is the problem delimited?
b) Are descriptive words used (compare, contrast, define instead of
discuss, explain)?

SAA No. 3

1. Evaluate the paragraphs below: use the levels and criteria from
the L2 composition profile. Fill in the descriptions in the forms that
follow the paragraphs.
a. In the beginning of life there was no classroom, but we read
about many people have a big deal of knowledge. There was no
classroom told the first man in the world how to plan, how to build
his huts. I read about many potteries that have good poems in the
first and second centuries, they knew hoe these poems without any
classroom. In ancient the women knew how to sewing there
chesses without any teacher.
b. I believe that we get more knowledge out side the classroom
than we do inside. A classroom can give us only limited kinds of
information. If we look at the beginning of civilization, foe example,
we will note that people back then did not have formal classrooms,
yet many of them were well informed. There were no classrooms to
teach the first men how to plant or how to build huts. The great
early poets of the first and second centuries didnt learn their
poems in a classroom, nor did the women find out how to sew their
clothes there.
a. Meaning ............................................
Organization .......................................
Content ...........................................
Vocabulary .........................................
Sentences ..........................................
Grammar ..........................................
Mechanics .............................................. Grade:

b. Meaning .............................................
Organization .........................................
Content .............................................
Vocabulary ..........................................
Sentences ...........................................
Grammar ...........................................
Mechanics ................................................ Grade:

Please note that the appropriateness of your evaluation of each
paragraph will count for 50% of your grade. Try to give full
descriptions, e.g. : Meaning: clear, confusing parts, clear
presentation of the point of view etc.

Do not forget to send your evaluation to your tutor in due time.


6.7 Answers to SAQs


SAQ 1 If your answer to SAQ 1 is not comparable to the one suggested
below, please reread section 5.3.6.2.
When testees read and listen there is nothing to observe as there is
no overt behaviour.
SAQ 2 If your answer to SAQ 2 is not comparable to the one suggested
below, please reread sections 2.4 and 4.2.11.
The resulting reduction in the range of scores reduces the reliability
of the examination. However, the teacher need not feel compelled to
assign a 10 to the top paper or a 1 to the bottom one. This evaluation
depends not only on the score assigned to the paper but also on the
teacher's judgment of his/her work.
SAQ 3 If your answer to SAQ 3 is not comparable to the one suggested
below, please reread section 6.2.10.

1. d, 2.c, 3.b, 4.a.

SAQ 4 If your answer to SAQ 4 is not comparable to the one suggested
below, please reread section 4.2.2.1.

The answer is C.

SAQ 5 Your answer depends upon your personal study skills experience.

All of them are reference skills.


6.8 Further Readings

Harrison, Andrew (1983), A Language Testing Handbook, London: Macmillan, pp. 24-110
Hughes, Arthur (1991), Testing for Language Teachers, Cambridge: Cambridge University
Press, pp. 75-101, 116-134

Testing the Language System and Beyond

Proiectul pentru nvmnt Rural 133

Unit 7

TESTING THE LANGUAGE SYSTEM AND BEYOND


7.1 Unit Objectives
7.2 Testing Pronunciation
7.3 Testing Grammar and Usage
7.3.1 Multiple-Choice Fill In
7.3.2 Modify and Fill In
7.4 Testing Vocabulary
7.4.1 Cloze
7.4.2 Multiple Choice Fill-In Type
7.4.3 Multiple Choice Synonym Type
7.4.4 Matching
7.4.5 Simple Prompts
7.4.6 Selection of the Words to Be Tested
7.4.7 Translation
7.4.8 True/False
7.4.9 Checklist Tests
7.5 Testing Beyond Language Form
7.5.1 Discourse and Culture
7.5.2 Speech Events
7.5.3 Literature
7.6 Summary
7.7 Key Concepts
7.8 Checklist
SAA 4
7.9 Answers to SAQs
7.10 Further Readings


7.1 Unit Objectives

Although teachers think that only language skills are usually of
interest, most proficiency tests still retain a grammar or vocabulary
section. It is believed that the lack of grammatical ability sets limits to
what can be achieved in the way of skills performance. The same is
true about vocabulary and pronunciation. Other tests also contain a
literature or culture section.
The aim of this unit is to familiarize you with the different
techniques of testing grammar, pronunciation, vocabulary, discourse,
literature and culture.

7.2 Testing Pronunciation

While most language acquisition occurs because of natural
processes within the learner, processes can be made more efficient
by selectively focusing on language form, even at the expense of
authentic communication, interaction, or integration. We can
significantly accelerate and enhance learning a foreign language and
make it more satisfying and cost effective by knowing more about
the linguistic system, and about the processes used by that linguistic
system. One of the components of the language system is
phonology.
Research emphasizes that, although teaching does not
seem to affect the sequence in which language is acquired, it does
seem to affect the rate. As conservative as this statement is,
it hardly leads to the conclusion that formal instruction is of
necessity worthless or wasteful. The lack of effectiveness in the
teaching of pronunciation might very well stem from bad methods:
aiming at inappropriate goals (e.g. native-like pronunciation instead
of near-native pronunciation), compressing instruction into
insufficient periods of time, or placing too much attention on conscious
manipulation.

Successful strategies:

Use a top-down approach. Instead of beginning with the
articulation of individual sounds, newer methods emphasize the
relevant features of pronunciation, e.g. stress, rhythm and
intonation. The rhythm and intonation of English are two major
organizing structures that native speakers rely on to process
speech. Because of their major roles in communication, rhythm
and intonation merit greater priority in the teaching program than
attention to individual sounds. (Rita Wong, 1987, p. 21, Teaching
Pronunciation: Focus on English Rhythm and Intonation, Prentice
Hall Regents)
Spread instruction over a longer time
Stress the need to overcome psychological blocks: fear,
frustration, self-consciousness, self image, and so on
Place emphasis on the overall discourse patterns

SAQ 1

True or false?
Competence in culture and discourse is easy to measure/test.

Circle True or False and compare your answer to that in the Answers to SAQs
section at the end of the unit.



Some of the microskills for listening comprehension (adapted
from Richards) apply to pronunciation too:
produce chunks of language of different length
orally produce differences among the English phonemes and
allophonic variants
produce English stress patterns, words in stress positions, rhythm,
structure
produce reduced forms of words and phonemes

As our goal is to focus on clear and comprehensible
pronunciation, we have to think of the factors that affect
pronunciation learning:
the native language of the learner
age
exposure
innate phonetic ability
identity and language ego
motivation and concern for good pronunciation
Tests devoted exclusively to pronunciation are rare today. This
does not mean that pronunciation is not important. It means that it is
evaluated together with listening and speaking. Pronunciation tests
today incorporate context and meaning.
It is normal for a teacher to try to know what his/her learners
are learning, and how well, so that he/she will know what they are
ready for. To a large extent, testing should go beyond phonology, but
specialized prosodic levels can be useful. Correction should make
maintaining self-image and motivation a high priority. In
communication activities, correction should be indirect, and should not
interfere with the activity. In drills where phonology is the centre of
attention, I advise you to give a concise, clear and direct identification
of the problem and a request for repetition, both by the individual and
by the group. You can, of course, take notes of students'
mistakes and mention these collectively at the end of the activity,
without identifying individual students, or give this information
individually.

Point to Ponder

My voice goes after what my eyes cannot reach,
With the twirl of my tongue
I encompass words and volumes of words.
Walt Whitman

Pronunciation is normally not tested repeatedly except for
specialized purposes. This list of techniques does not exhaust the
procedures of testing pronunciation. Except for the first and the
second, most of such testing must be done individually.

Discrimination tests. Testees listen and decide which sounds are
similar or different, correct or incorrect. Such tests are easy to
score. In minimal pair tests, the testee must discriminate between
two words that are identical except for the sounds being focused on,
e.g. sink/link, cheap/chip, sheep/ship, pin/pen.

Examples: You can use pictures while you say (keeping the intonation
identical in both cases):
The sheep is in the lake. The ship is in the lake.
The testee points to the right picture.

Write two words on the board labeled 1 and 2. Repeat the
words. Then ask testees to give you the number when you give
additional examples: cheap, sheep, lip, chip, sheep, meet, skip, leap,
ship, leap. Be sure to use the same intonation.
You can also use triplets:
Cheap, chip, cheap
Meat, meet, meet
Ship, sheep, ship

Ask testees to pronounce the contrasting word. If you say
ship, they say sheep and so on.
Then have them repeat phrases:
Cheap ship
Meet the jeep
Leap in the jeep
Ship the chips

Identification. Testees can be asked to indicate what they heard.
The listening task can be structured so that complex auditory skills
can be assessed, such as the ability to hear contractions, reduced
vowels, intonation patterns, and so on.

Example: Sometimes all three words are the same (AAA); sometimes only
the first and the second are the same (AAB), and so on. Which is the
correct answer?
Ship sheep ship AAA AAB ABA ABC
bad bad bat AAA AAB ABA ABC
bed bad beard AAA AAB ABA ABC
road rod nod AAA AAB ABA ABC
Then have them give full sentences. You might use the pictures from
discrimination tests: The sheep is in the lake. The ship is in the lake.
They may make up sentences that contrast the sounds: The ship has
a leak. The sheep has pink lips.
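
When many AAA/AAB/ABA/ABC identification items have to be prepared, the answer key can be derived mechanically from the word triplets. The following is only a minimal sketch: the function name triplet_pattern is hypothetical, and it assumes that words spelled differently are also pronounced differently, which does not hold for homophones such as meat/meet, so the key should still be checked by ear.

```python
def triplet_pattern(words):
    """Label each word by order of first appearance: A, then B, then C."""
    labels = {}      # maps each distinct word to its letter
    pattern = []
    for word in words:
        key = word.strip().lower()   # ignore capitalisation and stray spaces
        if key not in labels:
            labels[key] = "ABC"[len(labels)]
        pattern.append(labels[key])
    return "".join(pattern)


# The four items from the example above.
items = [
    ("Ship", "sheep", "ship"),
    ("bad", "bad", "bat"),
    ("bed", "bad", "beard"),
    ("road", "rod", "nod"),
]
answer_key = [triplet_pattern(item) for item in items]
```

The sketch compares spellings only; it is meant to save clerical work when building the key, not to judge pronunciation.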


SAQ 2

Study the following statements and circle T (true) or F (false):
1. Written examinations were introduced because
oral examinations were found not to be valid. T F
2. Objective tests were introduced because it was
found that traditional techniques lacked reliability. T F
3. Because of their format, objective tests can be
assumed to possess reliability and validity. T F
4. Tests, unlike examinations, give accurate
information about a testee's abilities. T F
5. By eliminating marker variability, validity is ensured. T F
6. Some people are more variable in their
performance than others. T F

Compare your answers to those in the Answers to SAQs section at
the end of the unit.

Repetition. Testees mimic the tester who then evaluates the
accuracy of specific components, including stress and rhythm.

For initial or final evaluation of pronunciation, you can record
testees reading a passage that includes some of the following features:
General features: lax muscular control, central tongue position,
general articulation, stress and rhythm
Vowels: lengthening, diphthongization
Consonants: consonant contrasts, voiceless/voiced pairs,
inflectional endings
Word stress, phrase stress, intonation (yes/no questions,
information questions, series, etc.)

Example: Read the passage clearly and expressively into the microphone with
your tape recorder set at record:
Joe: Where are you going, Betty?
Betty: Hello, Joe, I am going shopping. I've just moved and I need
some things for my room. Would you like to come with me, or
are you going to work?
J: Thanks, I'd like to come. I want to buy a few things too.
B: I'm going to look for chairs, a rug, and perhaps a picture.
J: A rug? How big? Did you measure your room?
B: Oh, no. I'm only going to get a little one. A big one would be very
expensive. I haven't got much money.
J: I haven't either. First, let's go to that old shop, the one near the
railway station.
B: OK. My boyfriend told me that was a good place to start.

Reading of isolated elements. Testees can be asked to produce
words or phrases from a list. Alone, this task gives little information
about overall comprehensibility, but it can help pinpoint specific
types of mistakes in articulation or stress.
Reading of dialogues. Testees can practice and then read aloud
natural discourse. The advantage is that everyone has the same
task and it is easier to make comparisons among those being
tested.
Pronunciation in discourse. You can separately rate the various
elements of pronunciation during the course of an interview:
overall physical projection and upper body movement, stress and
rhythm patterns, vowel reduction, articulation, and so on.
Dictation. Dictation is also recommended for its ability to sensitize
listening acuity. If you wish to make the task more specific to
phonological issues, you can use material that includes minimal
pairs, contractions, and allophonic variations, and so on.
Intonation. The teacher models and then asks the testee to
achieve a certain effect.

Example The window is open. neutral statement
The window is open. complaint
The window is open. request that the window be closed
The window is open. warning

7.3 Testing Grammar and Usage

Although grammar is no longer seen as the goal or even the
primary means of language acquisition, a wide range of evidence
suggests convincingly that knowing about grammatical
processes and structures can accelerate learning. It means we
can enhance the process, make it more effective or efficient, prevent
wasted energy and spare the learner unnecessary frustration.
Grammar tests are normally of the objective scoring type. Since
there are so many grammatical patterns in a language, it is relatively
easy to design reliable multiple choice tests of grammar. These often
correlate strongly with other types of test, so there is evidence of
their validity. But the ease of construction and the vast number of
possible items are perhaps a temptation to overuse such items at
the expense of more elusive communicative testing.

7.3.1 Multiple-Choice Fill In
Many tests have sentences that look like sentences students
have incorrectly produced with a blank where the mistake would be.
The correct choice is one of the choices, of course, along with the
common mistake, and the distractors/ additional mistakes that
students might make or find tempting.

Example: Circle in the margin the letter corresponding to the form that
completes each of the following sentences, where:
A is; B has; C does; D no extra word

1. ____ Tom usually eat lunch at school? A B C D
2. Why ____ that man following us? A B C D
3. Who ____ always knows the solution? A B C D
4. John ____ never seen his cousin. A B C D
5. What ____ your mother do? A B C D
6. ____ he ridden that bike before? A B C D
7. What ____ that phrase mean? A B C D

7.3.2 Modify and Fill In
Some tests provide blanks with uninflected forms in parentheses. The
student has to decide on the correct form and write it in.
In fill-in tests, testees must supply missing words or forms,
sometimes from a list.
Cloze passages are fill-in exercises, but the sentences are all
part of a context. In the following example, the passage requires a
focus on verb tense and aspect. The choices could be provided in
a separate list, or left completely up to the testee. Here is an
easier version:

Example: Sylvia (come) here about a month ago. She (leave) her village
because her father (die) and (leave) her with a cruel step
mother and sister. Sylvia (not hear) from her since she left. She
(live) in a flat with two roommates for the last three weeks. They
(make) her do most of the work while they (go out) to have a good
time. Last night she (finish) her work early and (go) to the
local disco. She (have) a good time but (leave) at midnight
because her feet (hurt). Since then she (find) more interesting
things to do and so she (feel) much happier now.

In a choice drill, students make choices between alternatives.
The format is essentially that of a multiple choice test.
An editing test presents students with sentences, in or out of
context, that contain mistakes for them to correct. Alertness to
erroneous forms in proofreading is extremely important. In real life
we are often required to proofread. It can also be a problem-solving
activity. Sometimes in such tests the mistakes are marked.
Be aware that when testees have to find and also correct errors,
the test can be much more difficult than it might seem to be.
Patterned performance. Some tests require modifications such
as transformation, deletion and expansion. Normally, papers
would be graded individually.
Translation. Testees translate sentences that have the target
grammatical form. The responses are graded individually.

Scale of Grammatical Competence
(after Bachman and Palmer, 1983)

Rating 0
Range: No evidence of correct morphological and syntactic structures
Accuracy: No control of structures; errors of all types

Rating 1
Range: Limited range of morphologic and syntactic structures
Accuracy: Control of very few structures; many errors of all types

Rating 2
Range: The same as above but with signs of systematic evidence
Accuracy: Control of some structures; many error types

Rating 3
Range: Large but incomplete range of morphologic and syntactic structures
Accuracy: The same as above

Rating 4
Range: Large but incomplete range of morphologic and syntactic structures
Accuracy: Control of most structures used; few error types

Rating 5
Range: Complete range of morphologic and syntactic structures
Accuracy: Control of most structures used; few error types

Rating 6
Range: Complete range of morphologic and syntactic structures
Accuracy: No systematic errors


SAQ 3

True or false?
Grammar has been the skeleton around which most testing has
developed techniques.

Circle T or F and compare your choice to that in the Answers to SAQs section
at the end of the unit.

7.4 Testing Vocabulary

We are fully aware that teachers can effectively help learners
expand both the size and sophistication of their vocabularies. But
how many words is enough to use a new language at all well? Are
words quite independent from the grammatical system and other
schemata? These and other questions form the core of this unit.
Words are not really yours until you appropriate them by using
them for your own purposes. In fact, schooling is little more than the
acquisition of new concepts and the words that go with them. How
many words you need is entirely dependent on what you are trying to
do, with whom, under which circumstances. As you well know, words
are not evenly distributed: a small number of words is used very
frequently, and the remaining words become increasingly rare as the
size of the vocabulary grows. Nor do all words have the same number
of meanings. The most common ones have the greatest number of
occurrences and meanings. For example, the articles the, a, and an
represent about ten percent of over one million written words. If we
add about two hundred other structure words (like they, their and
them) we can account for over 50 percent of all written words. A 92
percent figure is reached with only 10,000 words. By 25,000 (half of
all the words in the count), you have almost 98 percent of all the
running words. The number of words used in normal phone
conversations by non-native speakers is said to run around 2,000.
Some even say that 850 words can do the work of 20,000. It is said
that a range of about 7,000 words is adequate as a beginning level
for functioning in a United States university. It is obvious that for
undergraduate programmes more words are needed.
In summary, the frequency of words decreases geometrically
with the size of the vocabulary. It follows that attention should be paid
to words that appear frequently.
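
The coverage figures quoted above come from large word counts, but the same kind of calculation can be tried on any text you plan to use in class. The sketch below is only an illustration under stated assumptions: the function name coverage_by_rank and the toy sentence are hypothetical, and splitting on spaces is a crude stand-in for proper word counting.

```python
from collections import Counter


def coverage_by_rank(tokens, ranks):
    """Share of running words covered by the N most frequent word types."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # Frequencies of the word types, most frequent first.
    freqs = sorted(counts.values(), reverse=True)
    return {n: sum(freqs[:n]) / total for n in ranks}


# Toy text: 13 running words, 8 different word types.
sample = "the cat sat on the mat and the dog sat on the rug".split()
coverage = coverage_by_rank(sample, [1, 3, 8])
```

In this toy text the single most frequent word already covers 4 of the 13 running words, and the three most frequent cover 8 of 13, which mirrors on a small scale the pattern described in the paragraph above.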
Words are also part of a complex network that goes from the
phonological level to the level of background knowledge. Words are
linked to grammar, to their frequency, to other words of the same
type, to words with opposite meaning, to words that begin or end
with the same letter or syllable, and to words that sound the same
but have different meanings (homophones). It follows that a foreign
language teacher has to select new words carefully and introduce
them properly, with lots of contextual clues, built-in explanation,
synonyms, and so on.
You should know roughly which words students know in order
to ascertain which materials to use and how to use them. In other
words, you need to figure out roughly which new words students are
ready for. You can do this quite easily informally, as well as with
tests. Feedback (correction) also seems necessary at times. Adjust
correction to fit the learner, the context, the social dynamics, and the
time available, having as top priority maintaining incentive and
self-esteem. That is, avoid emphasizing the fact that students do not
know words or that they use them inappropriately.
Various procedures are used to study words. They offer useful
suggestions for testing. From among these we mention:
Reading or listening to loosely graded materials
Identifying and studying words
Introducing words in the experience process of other language
teaching activities
Introducing words through songs, poems
Introducing words through enriched short contexts (materials that
are not authentic)
Exercises: fill-in, cloze, matching, complete the words, define and
translate
Word study (prefixes and suffixes, stems)
Use of dictionaries
Self-study activities (cards, computers)

Point to Ponder

The gift of language is the single human trait that marks us all
genetically, setting us apart from the rest of life.
Lewis Thomas, The Lives of a Cell

Why test vocabulary?
To find learners' vocabulary size
To motivate and encourage the learner by setting short-term
goals
To give feedback to teachers
To evaluate progress
To compare vocabulary size before and after a language
programme

Some rules have to be observed when testing vocabulary:
Avoid traditional vocabulary test designs that test words which
are rarely used in everyday situations. Vocabulary achievement
tests are highly valued for their backwash effect.
Decide whether you want to test the learners' active or passive
vocabulary, the spoken or the written language
In the case of beginners, focus on vocabulary deriving from the
spoken language
The selection may be based on:
A syllabus
A frequency list: A General Service List of English Words and The
Wright Frequency List (both of them are based on written
language; no account is taken of difficulty levels or of areas of
interference between L1 and L2)
The students' textbook or reading materials
Errors taken from the written work of the student
Besides quantitative tests, add qualitative ones (test whether the
learner is able to discriminate between words)
Avoid difficult grammatical structures when you test vocabulary
(words can be grouped according to their frequency and usefulness;
a test should contain more frequent and useful words)
Vocabulary, like grammar, is so easy to test that it might be
better to measure communicative competence, rather than to
measure vocabulary.

SAQ 4

True or false? T F
The ten thousandth word in frequency might appear only once in
over a million words.

Circle T or F and compare your choice to that in the Answers to SAQs section at
the end of the unit.


7.4.1 Cloze
Cloze is indirectly a vocabulary test.
Example
Great Britain is an island that (1) ... the Atlantic Ocean and the North
Sea. It (2) ... the mainlands of England, Wales and Scotland. Ireland
(3) ... the west coast of Great Britain.
(Answers: 1. is surrounded by; 2. comprises, consists of, is
composed of; 3. lies off)

Point to ponder

Words are one of our chief means of adjusting to all the situations of life.
Bergen Evans

7.4.2 Multiple Choice Fill In Type

The test item presents a complete sentence or short utterance
within which the target word fits naturally. A blank is inserted for the
target word, and three other words that might conceivably seem
possible (to a non-native speaker) are also provided. Students mark
the answer sheet with the letter of their choice.
7.4.3 Multiple Choice Synonym Type
The target word is underlined in a sentence, and 4 choices are
provided. Students pick the one word which they think comes closest
in meaning to the underlined word and mark the answer sheet
accordingly. These tests are harder to construct and are more
confusing than the fill-in type.

7.4.4 Matching
Some tests merely present a single word and then a list of four
additional words. The testee picks the one word that matches the
target word. While some object that this is artificial and a distortion of
natural processes, the evidence now seems to indicate that the
more proficient readers are, the more likely they are to respond
immediately to words out of context.

7.4.5 Simple prompts
Tests occasionally present pictures, words in L1, or definitions
and ask testees to supply the words in L2.

7.4.6 Selection of the Words to Be Tested
It is not easy to test all the words. One possible procedure:
Start from the General Service List: choose 60 out of 100 words which
will be used to represent the 2,000 headwords
Exclude all the words that cannot be easily tested (a, the, of, be)
It is much easier to test nouns, adjectives, verbs, adverbs
If we use pictures, the selection is based on concrete nouns
Choose the test items from the words that are left
Number the words and choose every tenth word
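
The last steps of this procedure (excluding words that are hard to test, numbering the rest and taking every tenth word) can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name select_test_items and the placeholder word list are hypothetical, and the excluded set simply repeats the four examples given above.

```python
# Words named above as hard to test; extend the set as needed.
HARD_TO_TEST = {"a", "the", "of", "be"}


def select_test_items(word_list, step=10, excluded=HARD_TO_TEST):
    """Drop hard-to-test words, then take every step-th remaining word."""
    candidates = [w for w in word_list if w.lower() not in excluded]
    # Numbering starts at 1, so the tenth word sits at index 9.
    return candidates[step - 1::step]


# Placeholder list standing in for a real frequency list.
word_list = ["the", "of"] + [f"word{i}" for i in range(1, 31)]
chosen = select_test_items(word_list)
```

With thirty testable words the sketch picks the 10th, 20th and 30th; starting the count at a different word would give an equally systematic sample.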

7.4.7 Translation
Translation is a useful way of providing a quick check of learning.
The learners can be asked to translate the underlined words; this
makes it possible to test words that we could not test with a multiple
choice test. The aim is to find which words in the General Service List
are known and which are not.
The 2,000 and 3,000 word levels contain high-frequency words
The 5,000 word level is on the boundary between high- and
low-frequency words
The 10,000 word level contains low-frequency words

7.4.8 True/ False
The words to be tested are put in sentences. If the tested word is not
known, the learner will find it difficult to answer correctly.

7.4.9 Checklist Tests
Checklists using some non-words should be used with caution,
as learners with a small vocabulary overestimate their vocabulary.
The method is unreliable with learners who are poor at spelling and
with words having multiple meanings. Tests that use non-words in
sentences are easy to prepare and score.
Examples:
1. To test yourself on the vocabulary, fill in the missing letters in
the incomplete words:
A superstition is an untrue b- - - - f held by many p - - - - e based on
fear of n - - - e. The ground hog s - - - y is one of the c - - - - - - t
superstitions.
The matching lexical cloze is a similar type of test. The words
are listed below. In a true matching lexical cloze the words are
omitted according to a system.

2. Choose appropriate words from the list below to complete the
passage. You may need to change the forms of some of the words.
Capable, permit, privilege, employ, complaint

Women in the United States were looked upon, for a long time, as
being less ____ than men. This is the why they were not ____ to have
as many ____ as men.

3. Circle in the margin the letter corresponding to the most
appropriate completion for the following sentences, where
A = back; B = along; C = through; D = out; E = off; F = up
I liked the first volume, but I can't get ____ the second. A B C D E F
Be sure to get ____ the bus at the second stop. A B C D E F

4. Circle in the margin the letter corresponding to the phrase which
correctly completes the sentence:
The mother of your father or mother is your ____. A B C D
A. stepmother; B. grandmother; C. godmother; D. mother-in-law

5. Sets. Three of the four words in each line are similar in meaning or
share some common features. Draw a circle around the word that
does not fit:
1. conference, congress, meeting, ethics
2. collapse, dissipate, speculate, decay

6. Multiple choice in context.
Later I ... to them for my bad behavior.
a. apologized
b. applauded
c. enquired
d. entertained
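
The matching lexical cloze mentioned among the examples above omits words "according to a system". The text does not say which system, so the sketch below assumes the simplest one, deleting every n-th word and collecting the deleted words as the list offered to testees. The function name make_cloze and its parameters are hypothetical.

```python
def make_cloze(text, n=7, lead_in=0):
    """Replace every n-th word after the lead-in with a numbered gap."""
    gapped = []
    answers = []
    for position, word in enumerate(text.split(), start=1):
        if position > lead_in and (position - lead_in) % n == 0:
            answers.append(word)                  # keep the word for the key
            gapped.append(f"({len(answers)}) ____")
        else:
            gapped.append(word)
    return " ".join(gapped), answers


passage, word_bank = make_cloze("one two three four five six", n=3)
```

Punctuation attached to a deleted word disappears with it in this sketch, so the gapped passage should be proofread before use. Leaving the first sentence intact (a larger lead_in) gives testees some context before the first gap.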

7.5 Testing Beyond Language Form

As you know, language is inextricably intertwined with
information, culture, and products of various kinds including
literature. Although these concepts are usually included within areas
of language skills and language system, they do merit a focus of their
own. If they are interwoven, we cannot separate them anyway, and
by focusing on them we are teaching language in any case.
Furthermore, it is quite impossible to acquire a new language
and not a new culture. Culture offers some students an incentive: to
find out how other people think and live. Cross-cultural and even
multicultural learning is considered nowadays to be highly desirable.

Point to Ponder

Were Shakespeare suddenly to materialize in London or New York
today, he would be able to understand, on the average, only five out
of every nine words in our vocabulary. The Bard would be a semi-
literate.
Toffler

7.5.1 Discourse and Culture

In its turn, discourse is not only a linguistic property, but a
socio-linguistic and cultural component as well. Discourse at a simple
level includes how people select, arrange and time utterances in
order to produce certain effects in those they talk to. In all cultures,
people talk in order to get things done. And the things that they try
to get done fall into the same general categories. Finocchiaro and
Brumfit (1983, 65-66) identify 5 broad functions, each in turn
containing several other functions:
Personal. People clarify and express how they are feeling
Interpersonal. People use speech to initiate, interrupt, and end
conversations and to negotiate many other social functions:
complimenting, apologizing, offering, accepting and refusing
invitations, agreeing and disagreeing
Directive. People attempt to influence others and to respond to the
attempts of others to influence them
Referential. People exchange, compress, summarize, and list
information.
All cultures have structured conversations that probably have
the same rules. Grice's maxims are observed in all cultures: be just
as informative as you need to be, say things that are true, say things
that are relevant, say things clearly, briefly, and in an organized
manner. But beyond these similarities, cultures have very different
types of speech acts. There are also differences between the sexes
and between different members of the same culture. In any case, all
learners of a foreign language should know that:
Discourse is shaped by your view of yourself and your
understanding of what right you have to express your opinions to
others
Discourse is shaped by social relations and topic, and how that
topic can or cannot be handled
Discourse is timed, and different cultures have different guidelines
for determining the lengths of pauses, interruptions, length of
talking, who gets more time, and who gets to talk
Discourse occurs in physical space, and it matters where people are
positioned and how far apart they are
Discourse follows particular conventions (how people begin a
speech act, and how they close it)
All these rules are learned by experience, observation, and
individual hypothesis testing. Subtle and complex features have to be
taught, since the wrong rules can be learned but never unlearned. So
explanation seems a necessary strategy. Explanation should be
preceded by observation and followed by meaningful practice.
Discourse functions are normally taught explicitly and by means
of dialogues. Emphasize first recognition and comprehension, provide
selective explanation, and then many additional opportunities for
continuing observation and appropriate responses. When you teach
functions, begin with a short and interesting dialogue, illustrating the
functions e.g. greeting, apologizing. After the dialogue, offer an
explanation (when, why, and how people express these functions).
Further examples are provided. Students practise the dialogue
material and hold discussions to react to and interpret the functions,
comparing them with the same functions in their native language.
Finally, they are asked to use the new function (a variety of prompts
is provided: pictures, unfinished dialogues, practice tasks).
Testing is either by means of tests of knowledge, by
recognition of appropriate responses (in a multiple choice test), by
correctly responding to a prompt, or by performances.

When we teach and test culture we may:
Introduce cultural concepts that include information about the
culture, and require students to solve problems within the
possibilities the culture has available. Other approaches include
values clarification activities, in which students are given a
situation such as a conflict between going out with family and
getting homework finished, or turning in a classmate who is
cheating. The discussion should remain open-ended.
Role play. Students are asked to take roles, some being the
stereotyped group (the outsiders) and the others the
stereotyping group (the insiders)

Facts about a culture can easily be approached as content
(how the Smiths celebrate a birthday). Facts are easy to list and
quantify and are, therefore, easy to test.
Behavior can be tested indirectly with regular tests, although
such testing does not at all guarantee that the learner would actually
behave in a comparable way in real life.
Interviews, role playing and simulation also offer opportunities for
you to guide students. But practically speaking, setting up a role play
with the purpose of giving a grade can make you put a lot of weight on
one or two behaviors. The amount of time required, and the risk
that the testing is not effective, leave us with few valid, reliable and
cost-effective ways of measuring what seems to be among the most
significant of our objectives. Language testing can also include the
higher levels of functions, content, literature and culture.
Example:
1. What is the difference between the United Kingdom and Great
Britain?
2. What does the Union Flag stand for and how should it be flown?
3. Does Britain have a National day?
4. How do the British celebrate traditional and religious holidays?
5. What and when are bank holidays?
6. What is Pancake Day?
7. What is Guy Fawkes Night?
8. What is the significance of the poppy and when is it worn?
9. What are Britain's national flags?
10. What are the most common superstitions in Britain?
11. What is the most popular food in Britain?
12. Why do the British like drinking tea?
13. Why do the British like going to the pub?
14. Why is the Tower of London so popular with tourists?
15. What is Speakers' Corner?


Point to Ponder

Novice teachers are nearly always surprised by the results of
evaluation; it is not easy to guess who is learning and who is not.


Testing discourse implies assessing the quality of coherence
deriving from the interaction of a text with given participants,
i.e. the participants' knowledge and perception of paralanguage, the
situation, the culture, the world in general, and the role, intentions
and relationships of participants. It also involves the study of
cohesive devices (pronouns, ellipsis and conjunctions) and of the
differences between the written and the spoken form.
Discourse is more difficult to test than other areas.

Example: Test lexical cohesion by synonyms: each gap should be filled by a
synonym or near synonym of the italicized word in the first sentence.
1. These animals live in rainforests. They are beautiful ____.
2. Have you seen this gadget for cleaning combs? It's an excellent
little ____; you should buy it.
3. New moves are afoot to stamp out tax evasion. The ____ are
intended to stamp out tax evasion.
4. Mr. Smith said an altercation had arisen between himself and Mr.
Jones. He said the ____ was over a bill.
Answers: 1. creatures/beasts; 2. device; 3. measures; 4. argument/
quarrel

7.5.2 Speech Events
Learners of foreign languages are often asked to perform on
certain specific occasions and those occasions have culturally
specific forms: learners give class presentations and attend lectures.
Each of these has a structure, determined by culture. To perform
successfully, those patterns must be followed.
The theory postulates necessary conditions for particular acts.
In an order, for example, the speaker must refer to a possible future
action by the addressee and must have the right to give orders; the
addressee must have the obligation to do the action. When these
conditions hold, an utterance such as "You ought to tidy up" results
in the act of ordering.
7.5.3 Literature
Literature is the collection of products, usually written down,
that is valued for its aesthetic, rather than its informational content.
The assumption that literature is the ideal basis for both cultural and
linguistic learning is often disputed. Nowadays, the writing of
literature is not seen as utilitarian, or as functional. The themes of
novels and plays certainly take place within cultures, but can hardly
be said to be members of the culture. A story that made all aspects
of culture explicit would strike most readers as tedious and
unrealistic. Writers do not write for non native speakers.
Nevertheless, despite changes of approach, doubts about pedagogic
validity and even doubt about its distinct existence as a discourse
type, literature continues to be popular with students and an
inexhaustible resource for the language teacher. Traditionally,
literature has occupied a central position in the teaching of English. It
is still seen as a model of the best language. Language learning in
which literature is central inevitably focuses more upon the written
than the spoken language. The intrinsic value of literature, and the
fact that it does provide interesting and authentic use of the
language, has guaranteed it continued prominence.
If you teach literature or teach language through literature, use it
in the way that the author intended. If you use it for linguistic or
cultural analysis, do not observe the intentions of the author. Select
literature that your learners will respond to aesthetically, personally
and emotionally. Select material that is within the right range of
difficulty.
Use the procedures employed in the teaching of other skills: pre-
reading, silent reading or listening, social interaction, integration of
the four skills, and appropriate focus on language. Include writing as
a possible response. Literature continues as an internationally
recognizable discourse in spite of fashions and cultural differences. It
allows people to gain insights into other cultures while also
appreciating the universality of human nature. It also allows you to
enjoy a universal pleasure in language art. All these factors have
ensured that literature teaching has survived. It continues
strengthened rather than weakened by the current debates.

Point to Ponder

Do not underestimate the motivating effect of an anticipated test.


When you teach culture, you teach about a culture (family size,
customs, holidays, and educational system), attitudes towards a
culture, and behavior appropriate for a culture (how to behave in a
family, act during a ceremony, etc). When we think about teaching a
culture we have to think of what is essential to teach in order to
survive in the respective culture. Becoming native would take many
years of effort. In other words, choosing the right things to teach
seems more important for culture than it does for language.
HOW TO
SELECT
LITERARY
TEXTS IN TEFL
TEACHING
CULTURE
Testing the Language System and Beyond

Proiectul pentru nvmnt Rural 149

7.6 Summary

This unit explained how to assess mastery of the subskills of English,
i.e. how to test how well each component has been mastered as a
subskill of the four main skills. Of course, tests of grammar,
vocabulary and pronunciation do not show exactly how well a person
uses English, but they can help teachers identify students' strengths
and weaknesses in oral or written communication. Choosing which
procedure to use depends on the learners' age and language ability
as well as on the kind of skill being taught. It is true that tests devoted
exclusively to pronunciation are rare today. This does not mean that
testing pronunciation is useless. It simply means that this subskill is
assessed in context, in conjunction with listening and speaking.
Pronunciation items can still be useful, as they may measure
progress made on specific points of pronunciation.

7.7 Key Concepts

Top-down
Bottom-up
Discrimination test
Minimal pairs
Intonation
Modify and fill in
Discourse
Grice's rules
Cohesion
Coherence
Speech events
Sheltered academic programme
Culture


7.8 Checklist

• Do all students achieve some success and get some
  reinforcement?
• Do you ask your students to set themselves tasks?
• Do you adopt assessment methods which do not rely exclusively
  on written assessments?
• Do you ask questions equally of males and females?
• Do all your students get some measure of success in their
  learning? Does this success get quickly reinforced?
• Do you encourage self-evaluation and student responsibility?
• Do your homework assignments combine maximum learning value
  with minimum marking effort?
• Are you rigorous about setting, collecting and marking homework?
  And do your students' parents know this?
• Do you use homework marks for reports or record cards?

SAA No. 4

Give your learners three tests of one grammatical feature (e.g. the
form and use of personal pronouns): a multiple-choice test, a cloze
test, and ten sentences to translate into English. Make graphs of the
number of errors made by the learners on each test. Repeat the tests
in a different order for another feature (e.g. Past Tense versus Present
Perfect) and examine the results as well. Write an analysis of what
this experiment has revealed about the relative difficulty and
discriminatory power of the tests, and about the most persistent
problems for the students who are learning these features.
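
Before drawing the graphs, it may help to tabulate the error counts. The Python sketch below is only an illustration: the error counts are invented placeholders, and using the mean number of errors as a rough indicator of difficulty and the spread of errors as a rough indicator of discriminatory power is an assumption made here, not a method prescribed by this assignment.

```python
from statistics import mean, pstdev

# Hypothetical error counts, one number per learner, for each test.
errors = {
    "multiple choice": [1, 2, 0, 3, 2],
    "cloze": [4, 6, 2, 7, 5],
    "translation": [5, 5, 5, 5, 5],
}

def summarise(error_counts):
    """Return (mean errors, spread of errors) for one test."""
    return mean(error_counts), pstdev(error_counts)

summary = {name: summarise(counts) for name, counts in errors.items()}

for name, (avg, spread) in summary.items():
    # Simple text bar chart: one '#' per average error (rounded).
    print(f"{name:16} mean={avg:.1f} spread={spread:.2f} {'#' * round(avg)}")
```

In this invented data the translation test is the hardest on average, yet its spread is zero: every learner makes the same number of errors, so it does not separate stronger learners from weaker ones.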



Send your paper to your tutor.

7.9 Answers to SAQs

SAQ 1 Your answer depends upon your personal teaching and learning
experience.

False. Cultural and discourse competence are very difficult to
measure. One reason is that the number of items is smaller, but the
bigger problem is that attitude and behavior are difficult to assess.

SAQ 2 Your answer depends upon your personal teaching and learning
experience. You may also read sections 4.2.1.2 and 4.2.1.3.

1, 2, 3, 4, 5, 6, 7 True

SAQ 3 If your answer to SAQ 3 is not comparable to the one suggested
below, please reread section 7.3.
True

SAQ 4 If your answer to SAQ 4 is not comparable to the one suggested
below, please reread section 7.4.

True. The probability of encountering a word drops off sharply as
its frequency decreases. 98% of words in learned writing are among
the six to ten thousand most frequent. The 50,000th most frequent
word can be expected to appear only once in about a million
words of running text.


7.10 Further Readings

Harrison, Andrew (1983) A Language Teaching Handbook, London: Macmillan, pp 110-118
Hughes, Arthur (1991), Testing for Language Teachers, Cambridge: Cambridge University
Press, pp 141-152

Unit 8

NEW TRENDS IN TESTING


8.1 Unit Objectives
8.2 General Trends
8.3 Computer-Based Language Testing
8.4 Alternative Assessment
8.4.1 Techniques
8.4.2 Journals
8.4.3 Conferences
8.4.4 Cooperative test construction
8.5 Portfolios
8.5.1 Characteristics
8.5.2 Assessing Portfolios
8.5.3 Portfolio Content
8.5.4 Useful advice on development of portfolios
8.6 Summary
8.7 Key Concepts
SAA 5
8.8 Answers to SAQs
8.9 Further Readings


8.1 Unit Objectives

This unit tries to identify the main trends in assessing and testing
learners of English as a foreign language. By the end of this unit you will:

• be aware of the advantages of using qualitative methods of
  assessment
• be able to adopt and apply alternative procedures of assessment
• appreciate the value of assessment as a process
• be able to assess learners' portfolios
• be aware of the advantages of using computer-adaptive testing

8.2 General Trends

There is extreme resistance to change in regard to language
testing. Testing is conservative as it concerns institutional standards,
norms and society. In spite of this, there are signs of:
• a closer relation between second language acquisition studies and
  language testing
• testing providing a very useful link between second language
  acquisition and Applied Linguistics
• the elaboration of more scientific criteria for language tests
• more attention to validity than to reliability

8.3 Computer-Based Language Testing

Definition: A computer-based language test is a test that is
delivered and scored by computer. The computer needs to be able to
judge whether or not a particular response is correct.
Developments in computer-based language testing have tended not to
keep pace with those in CALL (Computer Assisted Language Learning)
because of the difficulty of programming the computer to deal with
open-ended input.
In spite of this, the computer plays a significant role in language
testing. It provides:
• a user-friendly testing environment for the test candidate
• a variety of options within a test
• a way of recording information that both assesses linguistic
  performance and helps in identifying a candidate's test-taking
  strategies

The computer may be used:
• as a testing device, especially for informal classroom tests
• as a tool for research into test-taking and language learning
  strategies

Limitations
• the quality of language produced during a direct test of oral
  production cannot be assessed by the computer
• a computer cannot cope with grammatical analysis above the
  sentence level or with semantic analysis
• direct testing of written production is limited to items which
  require relatively short, predictable responses

Advantages
• it can stimulate oral production by using a simulation with a group
  of learners
• it has the characteristics of speed, memory, patience and flexibility
• there are two ways in which these characteristics can be exploited
  to provide computer-adaptive tests (tests which the computer
  adapts to the individual):
  o learner-adaptive testing implies immediate feedback, a
    second attempt at a question if the first answer is
    wrong, access to a dictionary or glossary, and clues
    (the number of letters in the word that is the correct
    answer, one or more letters given in various
    positions, reference to other information in the text,
    explanations)
  o with learner-adaptive tests (tests adapted to the
    individual candidate), the testee can select from a list of
    nine different item types (multiple choice, gap filling,
    transformation, correction, insertion, deletion,
    identification, organization, matching)
• it can be programmed to accept alternative responses and
  misspellings, and to carry out some syntactic analysis of responses
• self-assessment tests, in which candidates are asked to say
  whether they think they know the answer to a question or how well
  they think they would be able to perform on a test, offer feedback
  to the teacher/test constructor, reduce learner alienation from the
  testing process, and provide evidence of progress in the area of
  self-assessment. The use of self-assessment procedures has an
  important role in student motivation. Results from self-assessment
  and conventional procedures can be calculated, compared, and
  presented immediately.
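
The point above about accepting alternative responses and misspellings can be illustrated with a short Python sketch. It is not taken from any particular testing package: the function name `is_acceptable`, the sample answers and the 0.8 similarity threshold are all assumptions chosen for illustration, and the comparison relies on `SequenceMatcher` from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

def is_acceptable(response, accepted_answers, threshold=0.8):
    """Return True if the response is close enough to any accepted answer.

    The threshold is a hypothetical tolerance: 1.0 demands an exact
    match, lower values tolerate more spelling mistakes.
    """
    cleaned = response.strip().lower()
    for answer in accepted_answers:
        similarity = SequenceMatcher(None, cleaned, answer.lower()).ratio()
        if similarity >= threshold:
            return True
    return False

# Two alternative correct responses for one hypothetical gap-fill item.
accepted = ["has gone", "has left"]

print(is_acceptable("has gone", accepted))   # exact match
print(is_acceptable("has gonne", accepted))  # minor misspelling
print(is_acceptable("went", accepted))       # different answer
```

A teacher would need to tune the threshold: set too low, it accepts genuinely wrong forms; set too high, it penalizes typing slips rather than language errors.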

Features that are characteristic of a teaching programme should
have a place in a test because:
• it is difficult to distinguish between a teaching activity and a test
  (it is obvious that learners should learn from both)
• take-away tests are becoming increasingly common
• a candidate's reference to the help facilities can be recorded (in
  this way we gather information about strategies and the state of
  the candidate's knowledge)

Other information that can help research:
• the time taken for the test
• the order in which candidates answer questions, which provides
  information about test-taking strategies and candidates'
  processing, and about the difficulty a candidate is having with
  different questions, as a function of the number of times the
  candidate considers different questions before answering them

Conclusions about computer assessment
• Testees enjoy computer-based tests more than paper-based
  ones
• Testees consider computer-based tests more useful
  (especially the immediate feedback and the second-try facility)
• Testees feel more relaxed
• Computer-based tests measure more than language performance
• The results are influenced by past experience of using computers
• Computers are now at a point where they are suitable for
  classroom tests
• Computer-based tests tend to become more flexible and friendlier
• The barriers between teaching and testing are being blurred
• Computer-based tests provide access to other sources of
  information, e.g. about a candidate's test performance and
  the strategies used to achieve that performance

Three examples of computer-adaptive tests
• Decision point tests: these tests are constructed with items of
  difficulty limited to the task difficulty at the decision cut-off points
• Step ladder tests: on these tests precalibrated items are
  clustered at a series of graduated difficulty steps
• Error-controlled tests: these tests are distinguished in that
  examinee ability is re-estimated using an appropriate ability
  estimation algorithm after each item is evaluated
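
The idea behind an error-controlled test, re-estimating ability after every item, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm of any real testing system: it assumes a one-parameter (Rasch) model for the probability of a correct answer, a coarse grid search for the most likely ability, an invented item bank of difficulty values, and a simulated candidate.

```python
import math

def p_correct(ability, difficulty):
    """Rasch model (an assumption here): chance of answering correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_ability(responses):
    """Grid-search the ability that best explains (difficulty, correct) pairs."""
    grid = [x / 10 for x in range(-40, 41)]  # candidate abilities -4.0 .. 4.0

    def log_likelihood(theta):
        total = 0.0
        for difficulty, correct in responses:
            p = p_correct(theta, difficulty)
            total += math.log(p if correct else 1.0 - p)
        return total

    return max(grid, key=log_likelihood)

def run_adaptive_test(item_bank, answer_fn, n_items=5):
    """Give the item closest to the current estimate, then re-estimate."""
    ability = 0.0
    remaining = list(item_bank)
    responses = []
    for _ in range(min(n_items, len(remaining))):
        item = min(remaining, key=lambda d: abs(d - ability))
        remaining.remove(item)
        responses.append((item, answer_fn(item)))
        ability = estimate_ability(responses)  # re-estimated after each item
    return ability, responses

# Hypothetical precalibrated item difficulties and a simulated candidate
# who answers correctly whenever the item difficulty is at most 1.0.
bank = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ability, responses = run_adaptive_test(bank, lambda difficulty: difficulty <= 1.0)
print(ability, responses)
```

Because each new item is chosen near the current estimate, the candidate quickly stops seeing items that are far too easy or far too hard, which is where the savings in testing time come from.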

SAQ 1

What kind of tests are these questions specific to?
1. This is a test about the use of the Past Tense and Present Perfect in
English. Put a cross on the number which shows how well you think
you can use these tenses:
   I make few mistakes when I use these tenses.
   10  9  8  7  6  5  4  3  2  1  0
   I always make mistakes when I use these tenses.
2. Do you think you can answer this question correctly?
   Yes / Not sure / No






Write your answers in the space provided above (in no more than 20 words) and
compare them to those in the Answers to SAQs section at the end of the unit.

The curriculum for the twenty-first century emphasizes:
• A shift from content and objectives to skills and processes
• The empowerment of learners to act on their own
• A focus on new knowledge, on the interdependence of knowledge
  areas, and on the relevance of school knowledge to everyday
  problems

Implications for the assessment of learning

A norm-referenced approach is no longer suitable for assessing
the quality of work. More important is a curriculum context that
encourages feedback focused on learning purposes and that values
critical, reflective, interactive processes for development and
improvement.

Ethical Issues of Critical Language Testing

Taking into account that psychometric traditions in testing are
challenged by interpretative, individualized procedures for evaluating
ability and that tests are undoubtedly embedded in culture and
ideology, test designers have to offer new ways of testing for varying
styles, abilities, and intelligences among test takers.

Other challenged convictions
• Standardized tests are not infallible in their predictive validity
• Tests are culture-biased


Points to Ponder
• Tests serve as gatekeepers in society.
• Tests are milestones in the journey to success.


RESEARCH ON INTELLIGENCE

Recent developments in classroom testing show that educators'
thinking about testing is undergoing a process of change.
These developments are driven by a broader view of the
measurement of ability and by the development of more authentic
testing rubrics. These changes derive from:
• the research on intelligence by Howard Gardner and Robert
  Sternberg, who identified more than one type of intelligence
  (linguistic, logical-mathematical, spatial, musical,
  bodily-kinesthetic, and two forms of personal understanding:
  interpersonal and intrapersonal). This research freed testing from
  exclusive reliance on timed, discrete-point, analytical tests in
  measuring language
• the above research conclusions on intelligence infused this field
  with a responsibility for tapping into whole language skills,
  learning processes and the ability to negotiate meaning. Our
  challenge was to test interpersonal, communicative, interactive
  skill, and in doing so, to place some trust in our subjectivity, our
  intuition (Brown: 404)
• as a result, more performance-based testing is involved in the
  testing of typical school subjects, in spite of the fact that it is
  time-consuming and expensive: open-ended problems, hands-on
  projects, student portfolios, experiments
• more and more interactive language tests, i.e. tests that assess
  testees while they actually perform the behavior we want to
  measure (speaking, requesting, responding, combining listening
  and speaking, or reading and writing). For example, Swain's test
  battery includes paper-and-pencil multiple-choice tests, oral
  communication skills and written proficiency; the OPI (Oral
  Proficiency Interview), a widely used interactive proficiency test,
  is currently in the process of revision.
The current period uses fewer and fewer decontextualized tests in
favour of alternative and more authentic means of testing. Brown
offers a table that highlights the differences between the traditional
and alternative approaches to assessment:

Differences between the traditional and alternative
approaches to assessment (after Brown)

Traditional                      Alternative
One-shot standardized exam       Continuous long-term assessment
Timed, multiple-choice format    Untimed, free-response format
Decontextualized test items      Contextualized communicative tasks
Scores suffice for feedback      Formative, interactive feedback
Norm-referenced scores           Criterion-referenced scores
Focus on the right answer        Open-ended, creative answers
Summative                        Formative
Oriented to product              Oriented to process
Non-interactive performance      Interactive performance
Fosters extrinsic motivation     Fosters intrinsic motivation

8.4 Alternative Assessment

Alternative assessment includes self-assessment and peer-
assessment. Self-assessment has the following advantages: speed,
direct involvement of learners, learning autonomy, increased
motivation. Disadvantage: subjectivity.

8.4.1 Techniques
• Oral production: use of self-checklists or peer checklists to detect
  pronunciation or grammar errors
• Listening comprehension: listening to TV broadcasts or tapes and
  checking comprehension with a partner, asking for help when you
  do not understand something
• Writing: revising written work on your own or with a peer,
  proofreading
• Reading: reading passages followed by self-check
  comprehension questions, reading and checking comprehension
  with a partner, vocabulary quizzes
Other characteristics
Alternative, authentic assessment:
• is varied and cohesive
• encourages multiple methods for demonstrating learning
• can promote learning opportunities beyond the classroom
• encourages students to develop skills, understanding, and insights
  relevant to their particular needs and contexts
• makes assessment fair by reducing the dependence on
  performance in a single examination as the only determinant of
  student achievement and by giving individuals the opportunity to
  demonstrate attainment over time and in a variety of contexts
• promotes complex thinking and problem solving
• encourages students' performance of their learning
• engages with issues of equity
• makes assessment more accurate and reflective of an
  individual's learning and development by identifying the abilities
  being examined
• assesses knowledge in terms of its constructive use for further
  learning rather than viewing it simply as a measure of achievement

8.4.2 Journals
Journals, a form of dialogue between teacher and students, may include
language learning logs, student grammatical discussions, reactions
to readings, personal feelings, attitudes. You should:
• Tell the learners how to get started and give them a model
• Make the learners aware of the importance of journals and their
  content
• Give directions about the length of each entry
• Collect, read and return the journals promptly
• Make the feedback clear
• Help learners to process your feedback

8.4.3 Conferences
Conferencing implies a one-to-one interaction between teacher
and student. The teacher assumes the role of facilitator and guide,
an ally of the student. Conferencing develops self-reflection in your
learners. These alternative types of assessment are formative,
looking forward towards further development.

8.4.4 Cooperative test construction
Ask the testees about the things they have learned that
should be in a test. Then ask them to formulate the actual test
questions. The teacher makes them aware that the real test will
contain some of the questions they have selected. Cooperative test
construction thus becomes a way to stimulate review and integration.
SAQ 2

List and explain three advantageous features of a system for the
machine construction of tests.






Write your answers in the space provided above (in no more than 30 words) and
compare them to those in the Answers to SAQs section at the end of the unit.


8.5 Portfolios

The increasing dissatisfaction with traditional, quantitative forms
of assessment has led to the development of alternative assessment
approaches. The theories behind the use of the portfolio for both
assessment and learning purposes are the constructivist learning
theories (which see learners as actively making sense of new
knowledge and deciding how to integrate it with previously held
concepts) and Vygotsky's notion of the zone of proximal
development. The use of portfolios for both assessment and learning
purposes provides opportunities for demonstrating learning and for
the development of important learning dispositions, processes and
strategies.


Point to Ponder

Lev Semyonovich Vygotsky (1896-1934), Russian psychologist,
born in Orsha. He studied various social sciences at Moscow
University, and turned to psychology when aged 28. The last
decade of his life, when he was at the Institute of Psychology in
Moscow (1924-34), was his most productive period. His theory of
cognitive development, especially his view of the relationship
between language and thinking, has strongly influenced western
psychology. He was open to intuition, had an undogmatic approach
to experimental methodology, and moved easily between the pure
and applied fields. He always emphasized the role of cultural
and social factors in the development of cognition. Thought and
Language is now a classic text in university courses in
psycholinguistics.


Portfolio use for assessment parallels the shift from the quantitative
tradition of assessment to a more qualitative approach. In the
quantitative tradition, the curriculum is viewed as discrete units
(listed as decontextualized objectives and possibly including facts,
skills, competences and performance indicators). The teacher transmits
units of knowledge to learners in a fixed time frame. The learner
attempts to acquire the content transmitted by the teacher.
Assessment is norm-referenced (based on multiple-choice or essay
marks). Such assessment practices atomize knowledge. Portfolios
offer an opportunity to address some of the limitations of quantitative
assessment.
The significance of engaging students in the process of
developing a portfolio of work is best understood in the context of this
conception of learning. Teachers provide feedback to promote
learning. In the portfolio process this occurs when the students
collect, select and evaluate their own work.

Points to Ponder

• What the child is able to do in collaboration today, he/she will be
  able to do independently tomorrow. (Vygotsky)
• Many people are increasingly likely to live so-called "portfolio
  lives", constantly needing to update their skills and knowledge in
  order to take advantage of opportunities as they arise. Their
  skills will need to be transferable. This changing world will thus
  place much greater emphasis on individuals taking responsibility
  for reflecting on what they have already experienced, setting
  future learning goals and preparing plans in order to improve
  their contribution and their employability. (Report of the Steering
  Group of the National Records of Achievement Review)
• Whenever Sir Isaac Newton had a particularly thorny problem, he
  worked on it just before he went to sleep. He said, "I invariably
  wake up with a solution."

The notion of a portfolio of work, developed over time,
incorporating critical reflections and self-evaluation of what has been
achieved, makes for a more compatible assessment system.

8.5.1 Characteristics
• Reflection on the inclusions in the portfolio (assignment, written
  paper, test)
• Reflections on what has been learned (these provide the
  opportunity for another level of analysis, i.e. the extent to which
  intentions and purposes have been achieved)
• The system empowers students to take responsibility for learning

PORTFOLIO AND THE CONSTRUCTIVIST PEDAGOGY

Portfolio use requires a constructivist pedagogy characterized by:
• Opportunities to analyze learning
• Teacher facilitation of learning
• Group and pair work
• Student-teacher dialogue about students' learning
• Available support

All students need to acquire skills in self-assessment,
continued learning, self-evaluation and planning of future work
because of radical global changes of an economic, social and
political nature. These imply necessary changes to assessment,
pedagogy and the curriculum, as described through the use of
portfolios. All portfolios should contain pieces of evidence; the
more relevant the evidence, the more useful it is for evaluating the
level of achievement.

Developmental phases and uses of portfolios:
1. conceptualization of portfolios to support the learning process
2. construction and development of portfolios to support learning
processes
3. grading the portfolios: reliability, standards, summative
assessment or holistic assessment

Guidelines for using portfolios: state the purpose, show learners how
to get started (e.g. with a sample portfolio from a previous student),
indicate acceptable material, evaluate portfolios, give feedback, and
help learners respond to your feedback.

8.5.2 Assessing Portfolios
• assess in a team
• assess the quality, not the quantity, of the content
• be clear about what you are assessing
• structure your feedback
• encourage creativity
• provide opportunities for self-assessment
• set up an exhibition
• get students to provide a clear structure, labeled and numbered
  for easy reference
• get students to provide carefully structured route maps
• ask students to answer questions such as: After you have
  completed your portfolio, what do you consider you did especially
  well, and what would you now do differently?
• remember that self-evaluation is an integral part of portfolio
  assessment
• use portfolios to supplement, not replace, traditional
  assessment procedures
• make the whole portfolio process a collaborative teacher-student
  effort (the teacher is a consultant to the student)

8.5.3 Portfolio Content

A portfolio contains evidence of a student's achievement, skills and
accomplishments:
• Journal and logs
• Examples of written work
• Videotapes of student performance
• Self-evaluation
• Mind map and notes
• Charts, graphs
• Questionnaire results
• List of books read/summaries
• Tests and quizzes
• Audiotapes of presentations


8.5.4 Useful advice on development of portfolios (after Ronald L. Pastin)


• Help the parent examine the portfolio (make him/her aware of
  evidence of progress and areas of needed improvement)
• Ask students to reflect on which items are worth including
• Set a limited number of objectives
• Organize conferences to review students' portfolios
• Be sure each item is dated (so that progress can be assessed)
• Develop your own teaching portfolio as a means of
  facilitating your professional development (videotapes of
  successful classes, curriculum materials, sample lesson plans,
  your goals and objectives, workshop classes attended,
  publications, awards, certificates, professional affiliations, your
  teaching philosophy, principal's evaluation, inspections)

SAQ 3

Why are people who do well in intelligence tests usually better
learners than those with excellent memories?




Write your answers in 20 words in the space provided and compare them to
those in the Answers to SAQs section at the end of the unit.

8.6 Summary
This unit pinpoints the main trends in contemporary assessment
and testing, which reveal a move from quantitative to qualitative
testing, from paper-based tests (PBT) to computer-based tests, and
from traditional testing to alternative testing, including
self-assessment. Advantages of CAT (Computer Adaptive Testing) may
be described as follows:
• Individual testing time may be reduced
• Frustration and fatigue are minimized
• Boredom is reduced
• Test scores may be provided immediately
• Diagnostic feedback may be given immediately
• Test security may be enhanced
• Record-keeping functions are improved
• Reporting, research and evaluation capabilities are
  expanded.
Portfolio evaluation, one of the alternatives to traditional testing,
is a constructivist approach to testing. Several of its advantages are:
• It promotes cooperation rather than competition
• It enhances professional communication
• It requires no technical knowledge of quantitative evaluation
  procedures
• Ideas are conserved for future application in other classes
Other characteristics of contemporary trends include:
• Continuous long-term assessment
• Formative tests
• Criterion-referenced scores
• Interaction and motivation

8.7 Key Concepts

Alternative assessment
Bodily kinaesthetic intelligence
Constructivist approach
Computer
Computer adaptive testing
Computer administered test
Computer assisted instruction (CAI)
Computer assisted language learning (CALL)
Computer based instruction
Computer literacy
Computational linguistics
Error control test
Interpersonal intelligence
Intrapersonal intelligence
Journal
Linguistic intelligence
Log
Logical mathematical intelligence
Microcomputer
Mainframe computer
Musical intelligence
Portfolio evaluation
Spatial intelligence
Zone of proximal development


SAA No. 5

This paper aims to help you identify, clarify and develop an
informed, comprehensive personal philosophy of grading that is
consistent with your philosophy of teaching and evaluation.
Examine the following items and circle the numbers of all items that
should be included in your set of criteria for determining a final mark.
Write, as a percentage, the weight you would assign to each circled
item (the percentages must total 100%).
1. Language performance of the learner (based on tests, quizzes,
other objective tests) ..........
2. Your informal observation of the learner's language ......
3. Oral participation in class activities ..............
4. Attitudes and behaviour: degree of cooperation, politeness,
disruption in the classroom ..............
5. Effort ...........
6. Motivation .................
7. Punctuality and attendance .............
8. Self-assessment ...............
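
Once you have chosen your weights, the arithmetic of a weighted final mark can be checked with a short Python sketch. The criteria, weights and marks below are invented examples, the 1-10 marking scale is an assumption, and `final_mark` is a hypothetical helper name; substitute your own circled items and percentages.

```python
def final_mark(weights, scores):
    """Weighted final mark; weights are percentages that must total 100."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must total 100%")
    return sum(weights[c] * scores[c] for c in weights) / 100

# Hypothetical weighting of four of the eight criteria listed above.
weights = {
    "language performance": 60,
    "oral participation": 20,
    "effort": 10,
    "self-assessment": 10,
}
# Hypothetical marks for one learner on an assumed 1-10 scale.
scores = {
    "language performance": 8,
    "oral participation": 9,
    "effort": 7,
    "self-assessment": 10,
}

print(final_mark(weights, scores))
```

The check on the total makes the constraint explicit: raising the weight of one criterion always means lowering the weight of another, which is exactly the trade-off your essay should justify.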

Write an essay about your philosophy of grading taking into account
your answers to the questionnaire above. Do not write more than
three pages. Consider the following questions:
Do you consider yourself consistently impeccable in your objectivity?
Can you capture the totality of your student's competence only
through formal tests? What is the value of alternatives in
assessment?
Example:
I base my final marks/grades on student language performance
because I think grades should represent .................. Grades should
not be contaminated by ........................... I discourage the inclusion
of ..............................

Your essay will be scored according to the following grading scale
and explanations:
50% - clarity and strength of all main ideas and supporting ideas,
argument and logic;
10% - grammatical and mechanical errors, word choices and
expressions;
20% - cohesive devices within and across paragraphs
10% - documentation, citation of sources, evidence, and other
support
10% - adequacy and strength of the conclusion.


Do not forget to send your evaluation to your tutor in due time.


8.8 Answers to SAQs

SAQ 1 If your answer to SAQ 1 is not comparable to the one suggested
below, please reread section 8.3.

These are self-assessment questions.

SAQ 2 If your answer to SAQ 2 is not comparable to the one suggested
below, please reread section 8.3.

• Difficulty and content range could be specified in advance
• By consideration of the number of items employed, test reliability
  could be estimated in advance
• Since the items are being assembled by machine from a large
  item bank, security is maintained
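
The second point, estimating reliability from the number of items, is often done with the Spearman-Brown formula. This unit does not name a specific formula, so its use here is an assumption, and the starting reliability of 0.6 is an invented value. A minimal Python sketch:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when the number of items is multiplied
    by length_factor (Spearman-Brown formula)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# A hypothetical test with reliability 0.6, doubled in length.
print(round(spearman_brown(0.6, 2), 2))
```

The formula assumes that the added items are of the same kind and quality as the existing ones, which is plausible when all items are drawn from the same calibrated item bank.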

SAQ 3 If your answer to SAQ 3 is not comparable to the one suggested
below, please reread section 8.3 (the paragraph on intelligence).

IQ tests measure the abilities of pattern recognition, non-verbal and
verbal reasoning, and problem solving. These abilities matter because
learning requires creative meaning making, not passive remembering.

8.9 Further Readings

Brown, H. Douglas (1994) Teaching by Principles, Englewood Cliffs: Prentice Hall
Regents, pp. 373-395
Harrison, Andrew (1983) A Language Teaching Handbook, London: Macmillan, pp. 118-132

Bibliography




Atkinson, Rita L., Richard C. Atkinson, Ernest R. Hilgard (1983), Introduction to
Psychology, Eighth Edition, San Diego: Harcourt Brace Jovanovich
Brown, H. Douglas (1994), Teaching by Principles: An Interactive Approach to Language
Pedagogy, Englewood Cliffs: Prentice Hall Regents
Brown, H. Douglas (1994), Language Learning and Teaching, Third Edition, Englewood
Cliffs: Prentice Hall Regents
Brown, H. Douglas (2004), Language Assessment: Principles and Classroom Practices,
White Plains, NY: Longman
Harrison, Andrew (1983), A Language Teaching Handbook, London: Macmillan
Heaton, Brian (1991), Language Testing, London: Modern English Publications
Hedge, Tricia (2000), Teaching and Learning in the Language Classroom, Oxford: Oxford
University Press
Hughes, Arthur (1991), Testing for Language Teachers, Cambridge: Cambridge University
Press
Jones, Leo (1991), Cambridge Advanced English, Cambridge: Cambridge University
Press
O'Grady, William and Michael Dobrovolsky (1989), Contemporary Linguistics, New York:
St. Martin's Press
Patton, Michael Quinn (1982), Practical Evaluation, London: Sage Publications, Newbury
Park
Pavelcu, Vasile (1968), Principii de docimologie, București: EDP
Seaton, Brian (1982), A Handbook of English Language Teaching Terms and Practice,
London: The Macmillan Press Ltd
Vogler, Jean (2000), Evaluarea în învățământul preuniversitar, translated by Cătălina
Grba and Ionela Băluță, Iași: Polirom
