The term construct validity has been increasingly used to refer to the general, overarching
notion of validity. It is not enough to assert that a test has construct validity; empirical evidence
is needed.
A. Face validity
Face validity considers how suitable the content of a test seems to be on the
surface. It’s similar to content validity, but face validity is a more informal and subjective
assessment. As face validity is a subjective measure, it’s often considered the weakest
form of validity. However, it can be useful in the initial stages of developing a method.
B. Content validity
C. Criterion validity
Criterion validity evaluates how closely the results of your test correspond to the
results of a different test. The criterion is an external measurement of the same thing. It is
usually an established or widely-used test that is already considered valid. To evaluate
criterion validity, you calculate the correlation between the results of your measurement
and the results of the criterion measurement. If there is a high correlation, this gives a
good indication that your test is measuring what it intends to measure.
D. Construct validity
E. Validity in scoring
It is worth pointing out that if a test is to have validity, not only the items but
also the way in which responses are scored must be valid. It is no use having excellent
items if they are scored invalidly.
How to make tests more valid: Write explicit specifications for the test. Make sure that
you include a representative sample of the content. Use direct testing. Make sure that the scoring
of the responses relates directly to what is being tested. Do everything possible to make the test
reliable.
RELIABILITY
Reliability refers to how consistently a method measures something. If the same result
can be consistently achieved by using the same methods under the same circumstances, the
measurement is considered reliable. However, students do not behave in exactly the same way on
every occasion, even when the circumstances seem identical. Teachers have to construct,
administer and score tests in such a way that the scores obtained on a particular occasion are
similar to those that would have been obtained on another. The more similar the scores, the more
reliable the test is said to be. There are several types of reliability:
a. Inter-rater reliability
When multiple people are giving assessments of some kind or are the subjects
of some test, then similar people should lead to the same resulting scores. It can be
used to calibrate people, for example those being used as observers in an
experiment. Inter-rater reliability thus evaluates reliability across different people.
Two major ways in which inter-rater reliability is used are (a) testing how similarly
people categorize items, and (b) how similarly people score items. Inter-rater
reliability is also known as inter-observer reliability or inter-coder reliability.
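The first of these uses, how similarly people categorize items, can be illustrated with the most basic index, simple percent agreement (the ratings below are invented; more refined indices such as Cohen's kappa also correct for chance agreement):

```python
# Hypothetical example: two raters assign the same ten essays
# to bands A-C.
rater1 = ["A", "B", "B", "C", "A", "B", "C", "A", "B", "B"]
rater2 = ["A", "B", "C", "C", "A", "B", "C", "A", "A", "B"]

# Proportion of items on which the two raters agree exactly.
agreements = sum(a == b for a, b in zip(rater1, rater2))
agreement_rate = agreements / len(rater1)
print(agreement_rate)  # 8 of the 10 ratings match, so 0.8
```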
b. Test-Retest Reliability
An assessment or test of a person should give the same results whenever
you apply the test. Test-retest reliability evaluates reliability across time. Reliability
can vary with the many factors that affect how a person responds to the test,
including their mood, interruptions, time of day, etc. A good test will largely cope
with such factors and give relatively little variation. An unreliable test is highly
sensitive to such factors and will give widely varying results, even if the person re-
takes the same test half an hour later.
Generally speaking, the longer the delay between tests, the greater the likely
variation. Better tests will give less retest variation with longer delays. Of course, the
problem with test-retest is that people may have learned, so that the second test is
likely to give different results. This method is particularly used in experiments that
use a no-treatment control group that is measured pre-test and post-test.
c. Parallel-Forms Reliability
Parallel-forms reliability evaluates two equivalent versions of the same test: instead of
taking the same test twice, candidates take two comparable sets of items, and the two sets of
results are compared. The reliability coefficient allows us to compare the reliability of different
tests. What counts as an acceptable coefficient will depend on other considerations, most
particularly the importance of the decisions that are to be taken on the basis of the test.
How to make tests more reliable: Take enough samples of behavior. Exclude items which
do not discriminate well between weaker and stronger students. Do not allow candidates too
much freedom. Provide clear and explicit instructions. Ensure that tests are well laid out and
legible. Provide uniform and non-distracting conditions of administration. Use items that permit
scoring which is as objective as possible. Identify candidates by number, not name. Employ
multiple, independent scoring.
Practicality: A test must not be too expensive, not too long but not too short, easy to
administer, and appropriate to the topics that students are actually studying or have knowledge
of. The scoring and evaluation procedure should also be straightforward to follow.
STAGES OF TEST CONSTRUCTION