
Validity refers to how well a test measures what it is purported to measure.

Why is it necessary?

While reliability is necessary, it alone is not sufficient: a test must also be valid, and a reliable test is not automatically a valid one. For example, if your scale is off by 5 lbs, it reads your weight every day with an excess of 5 lbs. The scale is reliable because it consistently reports the same weight every day, but it is not valid because it adds 5 lbs to your true weight. It is not a valid measure of your weight.

1. Validity

• is the most important quality of a good measuring instrument
• refers to the degree to which a test, or any other measuring device, truly measures what it is intended to measure
• supports the decisions made on the basis of students’ test results
• A valid test fulfills the objectives for which it is meant; for this it should be free from defects that are likely to distort the results.

a. (Is this a valid test?)

- “How valid is the test for the decision that I need to make?” or
- “How valid is the interpretation I propose for the test?”

b. What precisely does the test measure?

c. How well does the test measure?

*A test is said to be valid if it measures what it intends to measure.

• There are different types of validity:

A. Face validity
B. Content validity
C. Criterion validity (Concurrent and Predictive)

A) Face Validity:

- refers to the extent to which the physical appearance of the test corresponds to what it
is claimed to measure

B) Content Validity:
- means the extent to which the content of the test is truly representative of the
content of the course. A well constructed achievement test should cover the
objectives of instruction, not just its subject matter. Three domains of behavior are
included: cognitive, affective and psychomotor.

C) Criterion-Related Validity

Two kinds of Criterion-Related Validity: Concurrent Validity and Predictive Validity

• Concurrent validity is established when the test and the criterion are
administered at about the same time.
• Predictive validity concerns the degree to which a test can predict
candidates’ future performance.

1. Face Validity ascertains that the measure appears to be assessing the intended
construct under study. The stakeholders can easily assess face validity. Although this is
not a very “scientific” type of validity, it may be an essential component in enlisting the
motivation of stakeholders. If the stakeholders do not believe the measure is an accurate
assessment of the ability, they may become disengaged from the task.

Example: If a measure of art appreciation is created, all of the items should be
related to the different components and types of art. If the questions are about
historical time periods, with no reference to any artistic movement,
stakeholders may not be motivated to give their best effort or invest in this
measure because they do not believe it is a true assessment of art appreciation.

2. Construct Validity is used to ensure that the measure is actually measuring
what it is intended to measure (i.e. the construct), and not other variables. Using
a panel of “experts” familiar with the construct is one way in which this type of
validity can be assessed. The experts can examine the items and decide what
each specific item is intended to measure. Students can be involved in this
process to obtain their feedback.

Example: A women’s studies program may design a cumulative assessment
of learning throughout the major. If the questions are written with complicated
wording and phrasing, the test can inadvertently become a test of reading
comprehension rather than a test of women’s studies. It is important that the
measure is actually assessing the intended construct, rather than an extraneous
factor.
3. Criterion-Related Validity is used to predict future or current performance
- it correlates test results with another criterion of interest.

Example: Suppose a physics program designed a measure to assess cumulative
student learning throughout the major. The new measure could be correlated
with a standardized measure of ability in this discipline, such as an ETS field
test or the GRE subject test. The higher the correlation between the established
measure and the new measure, the more faith stakeholders can have in the new
assessment tool.
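As a rough illustration of this idea, the sketch below (hypothetical scores, not real data) correlates students' results on a new program measure with their scores on an established external criterion; a higher Pearson correlation suggests stronger criterion-related validity.

```python
import numpy as np

# Hypothetical data: each position is one student.
new_measure = np.array([72, 85, 60, 91, 78, 66, 88, 74])   # scores on the new assessment
criterion   = np.array([70, 88, 58, 95, 80, 63, 90, 71])   # scores on an established external test

# Pearson correlation between the new measure and the external criterion.
r = np.corrcoef(new_measure, criterion)[0, 1]
print(f"Criterion-related validity coefficient: r = {r:.2f}")
```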

Type of Validity | Definition | Example/Non-Example

Content – The extent to which the content of the test matches the instructional objectives.
Example/Non-Example: A semester or quarter exam that only includes content covered during the last six weeks is not a valid measure of the course's overall objectives -- it has very low content validity.

Criterion – The extent to which scores on the test are in agreement with (concurrent validity) or predict (predictive validity) an external criterion.
Example: If the end-of-year math tests in 4th grade correlate highly with the statewide math tests, they would have high concurrent validity.

Construct – The extent to which an assessment corresponds to other variables, as predicted by some rationale or theory.
Example: If you can correctly hypothesize that ESOL students will perform differently on a reading test than English-speaking students (because of theory), the assessment may have construct validity.
Therefore, reliability is “the extent to which a test produces consistent scores." The greater this consistency, the more reliable the test. A valid test is also a reliable test, but a reliable test may not be a valid one. Reliability also refers to the stability of test scores when the test is administered at different times.
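One simple way to read "stability of scores at different times" is test-retest reliability: administer the same test twice and correlate the two sets of scores. Below is a minimal sketch with hypothetical scores and a plain Pearson correlation.

```python
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical scores of the same students on two administrations of the test.
first_sitting  = [55, 71, 63, 80, 49, 92, 67]
second_sitting = [58, 69, 65, 78, 52, 90, 70]

print(f"Test-retest reliability: r = {pearson(first_sitting, second_sitting):.2f}")
```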

Factors Affecting Test Reliability


1. sample of students’ performance
2. conditions of administering the test
3. poor motivation
4. illness or personal problems

How to Make Tests More Reliable?

1. Take a large enough sample of behavior.
2. Do not allow candidates too much freedom.
3. Write unambiguous items.
4. Provide clear and explicit instructions.
5. Ensure that tests are well laid out and perfectly legible.
6. Candidates should be familiar with the format and testing techniques.
7. Provide uniform and non-distracting conditions of administration.
8. Use items that permit scoring which is as objective as possible.
9. Make comparisons between candidates as direct as possible.
10. Provide a detailed scoring key.
11. Train scorers.
12. Identify candidates by number, not name.
13. Employ multiple, independent scoring.

Discriminating Power

• Discriminating power of the test is its power to discriminate between the upper and
lower groups who took the test.

The test should contain questions of different difficulty levels.

Discrimination Index

• The power of an item to discriminate between students who scored high and
those who scored low on the overall test. In other words, it is the power of the
item to distinguish students who know the lesson from those who do not.

Types of Discrimination Index

1. Positive Discrimination

• An item is answered correctly by the upper (high-scoring) group and answered
incorrectly by the lower (low-scoring) group. The discriminative power ranges from -1 to +1.

2. Negative Discrimination

• An item is answered correctly by the lower group and answered incorrectly by the
upper group.

3. Zero Discrimination

• An item is answered correctly by all of the examinees, or
• An item is answered correctly by none of the examinees.

The formula for the discrimination index (D.I):

D.I = (R.H - R.L) / (N.H or N.L)

• R.H – number in the highest group who answered the item correctly
• R.L – number in the lowest group who answered the item correctly
• N.H – number of examinees in the highest group
• N.L – number of examinees in the lowest group
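A minimal sketch of this formula in Python, using hypothetical 0/1 item responses (1 = correct, 0 = incorrect) for equal-sized upper and lower groups:

```python
def discrimination_index(upper_responses, lower_responses):
    """D.I = (R.H - R.L) / N, where N is the size of the upper (or lower) group."""
    r_h = sum(upper_responses)   # correct answers in the highest group
    r_l = sum(lower_responses)   # correct answers in the lowest group
    n = len(upper_responses)     # examinees per group (groups assumed equal in size)
    return (r_h - r_l) / n

# Hypothetical item: 8 of 10 upper-group and 3 of 10 lower-group students answered correctly.
upper = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
lower = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(f"D.I = {discrimination_index(upper, lower):.2f}")   # 0.50 -> a very good item (>= 0.40)
```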
General guidelines for the discrimination index (D.I), according to Ebel:

D.I            Item Evaluation
≥ 0.40         Very good items
0.30 – 0.39    Reasonably good, but subject to improvement
0.20 – 0.29    Marginal items; need improvement
≤ 0.19         Poor items; to be rejected or revised
General Guidelines for Difficulty Value (D.V)

A low difficulty value index means the item is a difficult one. Example: D.V = 0.20 means only 20% of examinees answered that item correctly, so the item is too difficult.
A high difficulty value index means the item is an easy one. Example: D.V = 0.80 means 80% of examinees answered that item correctly, so the item is too easy.

D.V            Item Evaluation
0.20 – 0.30    Most difficult
0.30 – 0.40    Difficult
0.40 – 0.60    Moderately difficult
0.60 – 0.70    Easy
0.70 – 0.80    Easiest
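A small companion sketch (same hypothetical 0/1 response format as above) that computes the difficulty value as the proportion of examinees answering the item correctly:

```python
def difficulty_value(responses):
    """D.V = proportion of all examinees who answered the item correctly."""
    return sum(responses) / len(responses)

# Hypothetical item answered correctly by 11 of 20 examinees.
responses = [1] * 11 + [0] * 9
print(f"D.V = {difficulty_value(responses):.2f}")   # 0.55 -> moderately difficult (0.40 - 0.60)
```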
Relationship between Difficulty Value and Discrimination Power
• Both (D.V & D.I) are complementary, not contradictory, to each other.
• Both should be considered in selecting good items.
• An item with negative or zero discrimination is to be rejected, whatever its difficulty value.
CRITERIA FOR SELECTION AND REJECTION OF ITEMS
• Only items with a positive discrimination index are selected.
• Items with a negative or zero discrimination index are rejected.
• Items with very high or very low difficulty values are rejected.
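Putting the two indices together, here is a hedged sketch of an item-screening rule that follows the criteria above. The D.I cut-offs are the Ebel guidelines from this handout; treating "very high or very low difficulty" as D.V outside 0.20–0.80 is an assumption for illustration.

```python
def evaluate_item(di, dv):
    """Classify an item from its discrimination index (di) and difficulty value (dv)."""
    if di <= 0:
        return "reject (zero or negative discrimination)"
    if dv < 0.20 or dv > 0.80:          # assumed bounds for "too difficult / too easy"
        return "reject (too difficult or too easy)"
    if di >= 0.40:
        return "select (very good item)"
    if di >= 0.30:
        return "select (reasonably good, may be improved)"
    if di >= 0.20:
        return "revise (marginal item)"
    return "revise or reject (poor discrimination)"

# Hypothetical items: (discrimination index, difficulty value)
for di, dv in [(0.50, 0.55), (0.25, 0.65), (-0.10, 0.45), (0.45, 0.90)]:
    print(f"D.I={di:+.2f}, D.V={dv:.2f} -> {evaluate_item(di, dv)}")
```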
