VALIDITY

The term construct validity has increasingly been used to refer to the general, overarching
notion of validity. It is not enough to assert that a test has construct validity; empirical evidence
is needed.

A. Face validity

Face validity considers how suitable the content of a test seems to be on the
surface. It’s similar to content validity, but face validity is a more informal and subjective
assessment. As face validity is a subjective measure, it’s often considered the weakest
form of validity. However, it can be useful in the initial stages of developing a method.

B. Content validity

Content validity assesses whether a test is representative of all aspects of the
construct. To produce valid results, the content of a test, survey or measurement method
must cover all relevant parts of the subject it aims to measure. If some aspects are
missing from the measurement (or if irrelevant aspects are included), the validity is
threatened.

C. Criterion validity

Criterion validity evaluates how closely the results of your test correspond to the
results of a different test. The criterion is an external measurement of the same thing. It is
usually an established or widely used test that is already considered valid. To evaluate
criterion validity, you calculate the correlation between the results of your measurement
and the results of the criterion measurement. If there is a high correlation, this gives a
good indication that your test is measuring what it intends to measure.
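
As a concrete illustration of this calculation, here is a minimal Python sketch (all score lists below are hypothetical) that computes Pearson's correlation between a new test and an established criterion measure:

    # A minimal sketch: correlate scores on a new test with scores on an
    # established criterion test for the same (hypothetical) group of students.
    from statistics import correlation  # Pearson's r; requires Python 3.10+

    new_test_scores = [72, 85, 60, 90, 78, 55, 88, 67]   # hypothetical data
    criterion_scores = [70, 82, 65, 94, 75, 58, 85, 70]  # hypothetical data

    r = correlation(new_test_scores, criterion_scores)
    print(f"Criterion validity coefficient: r = {r:.2f}")  # near 1.0 = close agreement
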
D. Construct validity

Construct validity evaluates whether a measurement tool really represents the
thing we are interested in measuring. It’s central to establishing the overall validity of a
method. A construct refers to a concept or characteristic that can’t be directly observed,
but can be measured by observing other indicators that are associated with it. Constructs
can be characteristics of individuals, such as intelligence, obesity, job satisfaction, or
depression; they can also be broader concepts applied to organizations or social groups,
such as gender equality, corporate social responsibility, or freedom of speech.

E. Validity in scoring

It is worth pointing out that for a test to be valid, not only the items but
also the way in which responses are scored must be valid. It is no use having excellent
items if they are scored invalidly.

How to make tests more valid: Write explicit specifications for the test. Make sure that
you include a representative sample of the content. Use direct testing. Make sure that the scoring
of the responses relates directly to what is being tested. Do everything possible to make the test
reliable.

RELIABILITY

Reliability refers to how consistently a method measures something. If the same result
can be consistently achieved by using the same methods under the same circumstances, the
measurement is considered reliable. However, students do not behave in exactly the same way on
every occasion, even when the circumstances seem identical. Teachers therefore have to
construct, administer and score tests in such a way that the scores obtained on one occasion are
similar to those that would have been obtained on another. The more similar the scores would
have been, the more reliable the test is said to be. There are several types of reliability:

a. Inter-rater Reliability

When multiple people are giving assessments of some kind or are the subjects
of some test, then similar people should lead to the same resulting scores. It can be
used to calibrate people, for example those being used as observers in an
experiment. Inter-rater reliability thus evaluates reliability across different people. Two
major ways in which inter-rater reliability is used are (a) testing how similarly
people categorize items, and (b) testing how similarly people score items. Inter-rater
reliability is also known as inter-observer reliability or inter-coder reliability.
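
The sketch below illustrates this idea with two common inter-rater statistics, computed on invented ratings: raw percent agreement, and Cohen's kappa, which corrects that agreement for the level expected by chance:

    # A minimal sketch of inter-rater reliability on hypothetical "pass"/"fail"
    # judgements made by two raters of the same eight performances.
    from collections import Counter

    rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass"]
    rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # raw agreement

    # Chance agreement: probability both raters pick the same category at random.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in set(rater_a) | set(rater_b)) / n**2

    kappa = (observed - expected) / (1 - expected)  # Cohen's kappa
    print(f"Percent agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")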

b. Test-Retest Reliability

An assessment or test of a person should give the same results whenever
you apply the test. Test-retest reliability evaluates reliability across time. Reliability
can vary with the many factors that affect how a person responds to the test,
including their mood, interruptions, time of day, etc. A good test will largely cope
with such factors and give relatively little variation. An unreliable test is highly
sensitive to such factors and will give widely varying results, even if the person re-
takes the same test half an hour later.

Generally speaking, the longer the delay between tests, the greater the likely
variation. Better tests will give less retest variation with longer delays. Of course, the
problem with test-retest is that people may have learned in the meantime, so the second
test is likely to give different results. This method is particularly used in experiments that
use a no-treatment control group that is measured pre-test and post-test.
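
A minimal sketch of the calculation, assuming hypothetical scores from two administrations of the same test two weeks apart:

    # A minimal sketch of test-retest reliability: the same (hypothetical) group
    # takes the same test twice, and the two sets of scores are correlated.
    from statistics import correlation  # Pearson's r; requires Python 3.10+

    scores_week_0 = [64, 81, 70, 55, 92, 73, 60, 88]  # first administration
    scores_week_2 = [66, 79, 72, 58, 90, 70, 63, 85]  # retest two weeks later

    r = correlation(scores_week_0, scores_week_2)
    print(f"Test-retest reliability: r = {r:.2f}")  # high r = stable over time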

c. Parallel-Forms Reliability

One problem with questions or assessments is knowing which questions are
the best ones to ask. A way of discovering this is to run two tests in parallel, using
different questions. Parallel-forms reliability evaluates different questions and
question sets that seek to assess the same construct. Parallel-forms evaluation may
be done in combination with other methods, such as split-half, which divides items
that measure the same construct into two tests and applies them to the same group
of people.
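
A minimal sketch, again with hypothetical scores: the same group takes Form A and Form B, and the two score sets are correlated:

    # A minimal sketch of parallel-forms reliability: the same (hypothetical)
    # group takes two forms of the test built from different questions that
    # target the same construct, and the two score sets are correlated.
    from statistics import correlation  # Pearson's r; requires Python 3.10+

    form_a = [75, 62, 88, 54, 91, 70, 66, 83]  # hypothetical scores on Form A
    form_b = [72, 65, 85, 58, 89, 74, 63, 80]  # hypothetical scores on Form B

    print(f"Parallel-forms reliability: r = {correlation(form_a, form_b):.2f}")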

d. Internal Consistency Reliability

When asking questions in research, the purpose is to assess the response
against a given construct or idea. Different questions that test the same construct
should give consistent results. Internal consistency reliability evaluates individual
questions in comparison with one another for their ability to give consistently
appropriate results. Average inter-item correlation compares all pairs of questions
that test the same construct by calculating the mean of all paired correlations.
Average item-total correlation calculates, for each item, the correlation between
that item and the total score, then averages these correlations. Split-half
correlation divides items that measure the same construct into two tests, which
are applied to the same group of people, then calculates the correlation between the
two total scores.
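
The sketch below illustrates two of these estimates on a small hypothetical matrix of item scores; the Spearman-Brown step-up applied to the split-half correlation is a standard companion adjustment, not something described above:

    # A minimal sketch of two internal-consistency estimates, using a
    # hypothetical matrix of item scores (one row per item, one column per student).
    from itertools import combinations
    from statistics import correlation  # Pearson's r; requires Python 3.10+

    # Responses of six students to four items meant to measure one construct.
    items = [
        [3, 4, 2, 5, 4, 1],  # item 1 scores across the six students
        [2, 5, 3, 4, 4, 2],  # item 2
        [3, 4, 2, 5, 3, 1],  # item 3
        [4, 5, 2, 4, 4, 2],  # item 4
    ]

    # Average inter-item correlation: the mean of Pearson's r over all item pairs.
    pair_rs = [correlation(a, b) for a, b in combinations(items, 2)]
    avg_inter_item = sum(pair_rs) / len(pair_rs)

    # Split-half: total the odd items and the even items separately, correlate
    # the two halves, then apply the Spearman-Brown formula to estimate the
    # reliability of the full-length test.
    half_1 = [i1 + i3 for i1, i3 in zip(items[0], items[2])]
    half_2 = [i2 + i4 for i2, i4 in zip(items[1], items[3])]
    r_half = correlation(half_1, half_2)
    split_half = 2 * r_half / (1 + r_half)

    print(f"Average inter-item correlation: {avg_inter_item:.2f}")
    print(f"Split-half reliability (Spearman-Brown): {split_half:.2f}")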

The reliability coefficient: It allows us to compare the reliability of different tests. The
acceptable level of the reliability coefficient will depend on other considerations, most
particularly the importance of the decisions that are to be taken on the basis of the test.
Calculating it requires two sets of scores to compare, for example by having the same group
take the same test twice.
How to make tests more reliable: Take enough samples of behavior. Exclude items which
do not discriminate well between weaker and stronger students. Do not allow candidates too
much freedom. Provide clear and explicit instructions. Ensure that tests are well laid out and
legible. Provide uniform and non-distracting conditions of administration. Use items that permit
scoring which is as objective as possible. Identify candidates by number, not name. Employ
multiple, independent scoring.

Practicality: A practical test must not be too expensive, must be neither too long nor too
short, must be easy to administer, and must match the topics that students are actually studying
or the knowledge they have. Its scoring and evaluation procedures should also be easy to follow.

STAGES OF TEST CONSTRUCTION

A. Planning the Test:

Planning the test is the first important step in test construction. The main
goal of the evaluation process is to collect valid, reliable and useful data about the student.

B. Preparing the Test:

After planning, preparation is the next important step in test construction. In
this step the test items are constructed in accordance with the table of specification. Each
type of test item needs special care in its construction.

C. Try Out of the Test:

Once the test is prepared, it is time to confirm the validity, reliability
and usability of the test. A try-out helps us to identify defective and ambiguous items, to
determine the difficulty level of the test, and to determine the discriminating power of the
items.
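
As an illustration of the last two points, the sketch below computes a facility value (difficulty) and a simple discrimination index for one item, using invented right/wrong responses ordered by students' total scores:

    # A minimal item-analysis sketch for a try-out, using invented data:
    # 1 = correct, 0 = incorrect, with students ordered from highest to
    # lowest total score on the whole test.
    responses = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]

    facility = sum(responses) / len(responses)  # proportion answering correctly

    # Discrimination: proportion correct in the top third minus the bottom third.
    k = len(responses) // 3
    upper, lower = responses[:k], responses[-k:]
    discrimination = sum(upper) / k - sum(lower) / k

    print(f"Facility value (difficulty): {facility:.2f}")
    print(f"Discrimination index: {discrimination:.2f}")  # higher discriminates better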

D. Evaluating the Test:

Evaluating the test is the most important step in the test construction process.
Evaluation is necessary to determine the quality of the test and the quality of the
responses. Quality of the test refers to how good and dependable the test is (its
validity and reliability). Quality of the responses means identifying which items do not
fit the test. Evaluation also enables us to assess the usability of the test in a general
classroom situation.

FEEDBACK: BACKWASH FROM THE TEST

A. Feedback: is information that a teacher or another speaker, including another learner, gives to
learners on how well they are doing, either to help them improve specific points or to
help them plan their learning. Feedback can be immediate, during an activity, or delayed, at the
end of an activity or a part of a learning programme, and it can take various forms.
B. Backwash effect: is the effect that tests have on learning and teaching. For example, if we
want to encourage oral ability, then we should test oral ability. Positive backwash can help
students become good learners and develop their understanding.
C. Use direct testing: direct testing implies the testing of performance skills, using realistic texts
and tasks. If we test directly, practice for the test will involve practising those very skills.
D. Make testing criterion-referenced: make clear what candidates have to be able to do, and
with what degree of success; students will then have a clear picture of what they have to
achieve.
E. Base achievement tests on objectives: every test has to have an objective and make clear
what the students are expected to achieve.
F. Ensure the test is known and understood by students and teachers: the rationale for the
test, its specifications, and sample items should be made available to everyone concerned with
preparation for the test.
G. Counting the cost: we should not forget that testing costs time and money, so teachers could
use alternative ideas. Tests can be expensive, especially if they are printed in color with
images; for large groups in particular, teachers can use black-and-white tests.
