KINDS OF TEST
Tests are a way of checking your knowledge, your comprehension, or your ability to perform a set of skills, and they are the main means by which teachers assess what students have learned. Tests vary in style, rigor and requirements. For example, in a closed-book test, a test taker is often required to rely upon memory to respond to specific items, whereas in an open-book test, a test taker may use one or more reference materials while responding.
According to research studies, tests have another benefit: they make you learn and
remember more than you might have otherwise. Although it may seem that all tests
are the same, many different types of tests exist and each has a different purpose and
style.
Diagnostic Tests
These tests are used to diagnose how much you know and what you know. They can
help a teacher know what needs to be reviewed or reinforced in class. They also
help to show each student's individual strengths and weaknesses.
Placement Tests
These tests are used to place students in the appropriate class or level. For example,
in language schools, placement tests are used to check a student's language level
through a series of graded questions. After establishing the student's level, the
student is placed in the appropriate class.
Progress or Achievement Tests
These tests measure how well students are learning in relation to the course
syllabus. These tests only contain items which the students have been taught in class.
Short-term progress tests check how well students have understood or learned
material covered in specific units or chapters. They enable the teacher to decide
whether the class is ready to move on or whether material needs to be reviewed.
Long-term progress tests (also called achievement tests) check the learner's
progress over the entire course. They enable the students to judge how well
they have progressed. Administratively, they are often the sole basis of decisions to
promote a student to the next level.
BACKWASH
The backwash effect (also known as the washback effect) is the influence that a test
has on the way students are taught (e.g. the teaching mirrors the test because teachers
want their students to pass). The washback effect can be either positive or negative.
Positive washback occurs when there is harmony between what is taught and what
the examination or class test measures. Negative washback occurs when the two are
out of sync, for example when teaching narrows down to only the content that will
be tested. Both types of washback influence the teaching as well as the learning
process.
Beneficial backwash
Beneficial backwash occurs when preparing for a test leads teaching and learning to
cover all the language skills.
Harmful backwash
Harmful backwash occurs when a test measures only part of the language skills
instead of testing all of them. For example, if writing ability is tested through
a multiple-choice test, teachers will focus their methodology on multiple-choice
items instead of teaching writing itself.
Constructive alignment
Constructive alignment is a design for teaching in which what students are intended
to learn, and how they are expected to demonstrate that learning, are clearly stated before
teaching takes place. Teaching is then designed to engage students in learning
activities that optimize their chances of achieving those outcomes, and assessment
tasks are designed to enable clear judgments as to how well those outcomes have
been attained.
Reliability
Reliability refers to the extent to which assessments are consistent. Just as we enjoy
having reliable cars (cars that start every time we need them), we strive to have
reliable, consistent instruments to measure student achievement. Another way to
think of reliability is to imagine a kitchen scale. If you weigh five pounds of potatoes
in the morning, and the scale is reliable, the same scale should register five pounds
for the potatoes an hour later (unless, of course, you peeled and cooked them).
Likewise, instruments such as classroom tests and national standardized exams
should be reliable – it should not make any difference whether a student takes the
assessment in the morning or afternoon; one day or the next.
Another measure of reliability is the internal consistency of the items. For example,
if you create a quiz to measure students’ ability to solve quadratic equations, you
should be able to assume that if a student gets an item correct, he or she will also get
other, similar items correct. Three common reliability measures are test-retest
reliability, parallel-forms reliability, and internal consistency.
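As a concrete illustration of internal consistency, the quadratic-equations quiz mentioned above can be checked with Cronbach's alpha, a standard internal-consistency statistic that compares the variance of individual item scores with the variance of students' total scores. This is a minimal sketch, not part of the original text; the student scores are invented.

```python
# Cronbach's alpha for a small quiz: rows are students, columns are items
# (1 = correct, 0 = incorrect). The data below are hypothetical.

def variance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def cronbach_alpha(scores):
    """scores: list of per-student lists of item scores."""
    k = len(scores[0])                        # number of items
    item_columns = list(zip(*scores))         # transpose to per-item score lists
    sum_item_vars = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

quiz_scores = [
    [1, 1, 1, 1],   # strongest student: every item correct
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],   # weakest student: every item incorrect
]
print(round(cronbach_alpha(quiz_scores), 2))  # → 0.84
```

An alpha close to 1 indicates that students who get one item right tend to get the similar items right, which is exactly the internal-consistency idea described above.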
Validity
Validity refers to the accuracy of an assessment -- whether or not it measures what
it is supposed to measure. Even if a test is reliable, it may not provide a valid
measure. Let’s imagine a bathroom scale that consistently tells you that you weigh
130 pounds. The reliability (consistency) of this scale is very good, but it is not
accurate (valid) because you actually weigh 145 pounds (perhaps you re-set the scale
in a weak moment)! Since teachers, parents, and school districts make decisions
about students based on assessments (such as grades, promotions, and graduation),
the validity of the inferences drawn from those assessments is essential -- even more
crucial than the reliability. Moreover, if a test is valid, it is almost always reliable.
There are three ways in which validity can be measured. In order to have confidence
that a test is valid (and therefore the inferences we make based on the test scores are
valid), all three kinds of validity evidence should be considered.
Test Validity
Test validity is an indicator of how much meaning can be placed upon a set of test
results.
Criterion Validity
Criterion Validity assesses whether a test reflects a certain set of abilities.
Concurrent validity measures the test against a benchmark test and high
correlation indicates that the test has strong criterion validity.
Predictive validity is a measure of how well a test predicts abilities. It involves
testing a group of subjects for a certain construct and then comparing them with
results obtained at some point in the future.
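Both forms of criterion validity are usually quantified as a correlation coefficient between the two sets of scores. Below is a minimal sketch of a concurrent-validity check; all score pairs are invented, and the "benchmark" is assumed to be an established test taken by the same students.

```python
# Pearson correlation between a new test and a benchmark test.
# A high r for the same students on both tests supports criterion validity.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

new_test_scores  = [55, 62, 70, 78, 85, 91]   # hypothetical new test
benchmark_scores = [58, 60, 72, 75, 88, 90]   # same students, benchmark test
print(round(pearson(new_test_scores, benchmark_scores), 2))  # → 0.98
```

For predictive validity the calculation is the same, except the second list would hold results collected at some later point (e.g., an end-of-course exam).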
Content Validity
Content validity is the estimate of how much a measure represents every single
element of a construct.
Construct Validity
Construct validity defines how well a test or experiment measures up to its claims.
A test designed to measure depression must measure only that particular construct,
not closely related constructs such as anxiety or stress.
Convergent validity tests that constructs that are expected to be related are, in
fact, related.
Discriminant validity tests that constructs that should have no relationship do, in
fact, not have any relationship. (Also referred to as divergent validity).
Face Validity
Face validity simply reflects whether a test appears, on the surface, to measure
what it claims to measure.
Rubric
A rubric is a great tool for teachers because it is a simple way to set up
grading criteria for assignments. Not only is this tool useful for teachers, it is
helpful for students as well. A rubric defines in writing what is expected of
the student to get a particular grade on an assignment. A good rubric also
describes levels of quality for each of the criteria. These levels of
performance may be written as different ratings (e.g., Excellent, Good,
Needs Improvement) or as numerical scores (e.g., 4, 3, 2, 1). Under
mechanics, for example, the rubric might define the lowest level of
performance as "7-10 misspellings, grammar, and punctuation errors," and
the highest level as "all words are spelled correctly; your work shows that you
understand subject-verb agreement, when to make words possessive, and
how to use commas, semicolons and periods."
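The rating scheme just described can be sketched as a small data structure. The criterion names and assigned scores below are hypothetical, and a fourth label ("Poor") is added as an assumption to cover all four numerical scores.

```python
# A rubric as a mapping from criterion to an assigned rating (4 = highest).

RATING_LABELS = {4: "Excellent", 3: "Good", 2: "Needs Improvement", 1: "Poor"}

def grade(ratings):
    """ratings: dict of criterion -> score (1-4). Returns (total, maximum)."""
    total = sum(ratings.values())
    maximum = 4 * len(ratings)
    return total, maximum

essay_ratings = {"content": 4, "organization": 3, "mechanics": 2}
total, maximum = grade(essay_ratings)
print(f"{total}/{maximum}")                       # → 9/12
print(RATING_LABELS[essay_ratings["mechanics"]])  # → Needs Improvement
```

Writing the levels down this way makes the teacher's expectations explicit to students before the assignment is graded, which is the point of a rubric.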
Direct testing
A test is said to be direct when the test actually requires the candidate to demonstrate
ability in the skill being sampled. It is a performance test. For example, if we wanted
to find out if someone could drive a vehicle, we would test this most effectively by
actually asking him to drive the vehicle. In language terms, if we wanted to test
whether someone could write an academic essay, we would ask him to do just that.
In terms of spoken interaction, we would require candidates to participate in oral
activities that replicated as closely as possible [and this is the problem] all aspects
of real-life language use, including time constraints, dealing with multiple
interlocutors, and ambient noise. Attempts to reproduce aspects of real life within
tests have led to some interesting scenarios.
Indirect testing
An indirect test measures the ability or knowledge that underlies the skill we are
trying to sample in our test. So, for example, you might test someone on the Highway
Code in order to determine whether he is a safe and law-abiding driver [as is now
done as part of the UK driving test]. An example from language learning might be
to test the learners’ pronunciation ability by asking them to match words that rhymed
with each other.
THE END