TT60104: Quantitative Research Method in Education
Associate Professor Dr. Lay Yoon Fah
Faculty of Psychology and Education
Universiti Malaysia Sabah
layyoonfah@yahoo.com.my
Selecting Measuring Instruments
Constructs
1. All types of research require collecting data. Data are pieces of
evidence used to examine a research topic or hypothesis.
2. Constructs are mental abstractions such as personality,
creativity, and intelligence that cannot be observed or measured
directly. Constructs become variables when they are stated in
terms of operational definitions.
Variables
1. Variables are placeholders that can assume any one of a range
of values.
2. Categorical variables assume nonnumerical (nominal) values;
quantitative variables assume numerical values and are measured
on an ordinal, interval, or ratio scale.
3. An independent variable is the treatment or cause, and the
dependent variable is the outcome or effect of the independent
variable.
Characteristics of Measuring Instruments
1. The three main ways to collect data for research studies are
administering an existing instrument, constructing one’s own
instrument, and recording naturally occurring events (i.e.,
observation).
2. The time and skill it takes to select an appropriate instrument
are invariably less than the time and skill it takes to develop
one’s own instrument.
3. Thousands of standardized and nonstandardized instruments
are available for researchers. A standardized test is administered,
scored, and interpreted in the same way no matter when and
where it is administered.
4. Most quantitative tests are paper-and-pencil ones, whereas
most qualitative researchers collect data by observation and oral
questioning.
5. Raw scores indicate the number of items a person answered
correctly or the number of points earned.
6. Norm-referenced scoring compares a student’s test
performance to the performance of other test takers; criterion-
referenced scoring compares a student’s test performance to
predetermined standards of performance. A simple contrast is
sketched below.
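The sketch assumes hypothetical raw scores out of 30 points and a hypothetical 80% mastery cutoff (Python is used for all sketches in these notes):

    # Norm-referenced vs. criterion-referenced interpretation of one score.
    scores = [12, 15, 18, 20, 22, 25, 27, 28, 29, 30]  # raw scores out of 30
    student_score = 25

    # Norm-referenced: locate the student among the other test takers,
    # here as a simple percentile rank within the group.
    below = sum(1 for s in scores if s < student_score)
    percentile_rank = 100 * below / len(scores)   # 50.0

    # Criterion-referenced: compare the student to a fixed standard.
    mastery_cutoff = 0.80 * 30                    # 24 of 30 points
    mastered = student_score >= mastery_cutoff    # True

    print(percentile_rank, mastered)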
Types of Measuring Instruments
1. Cognitive tests measure intellectual processes. Achievement
tests measure the current status of individuals on school-taught
subjects.
2. Aptitude tests are used to predict how well a test taker is likely
to perform in the future. General aptitude tests typically ask the
test takers to perform a variety of verbal and nonverbal tasks.
3. Affective tests are assessments designed to measure
characteristics related to emotion.
4. Most affective tests are nonprojective, self-report measures in
which the individual responds to a series of questions about him-
or herself.
5. Five basic types of scales are used to measure attitudes: Likert
scales, semantic differential scales, rating scales, Thurstone
scales, and Guttman scales. The first three are the most commonly used.
6. Attitude scales ask respondents to state their feelings about
various objects, persons, and activities. People respond to Likert
scales by indicating their feelings along a scale such as strongly
agree, agree, undecided, disagree, and strongly disagree.
Semantic differential scales present a continuum of attitudes on
which the respondent selects a position to indicate the strength
of attitude, and rating scales present statements that
respondents must rate on a continuum from high to low. A Likert
scoring sketch follows.
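The sketch assumes the common 1-to-5 coding and one hypothetical reverse-worded item:

    # Sum a respondent's Likert answers, reverse-scoring negative items.
    CODES = {"strongly disagree": 1, "disagree": 2, "undecided": 3,
             "agree": 4, "strongly agree": 5}
    REVERSED = {2}  # index of the hypothetical negatively worded item

    responses = ["agree", "strongly agree", "disagree", "undecided"]

    total = 0
    for i, answer in enumerate(responses):
        value = CODES[answer]
        if i in REVERSED:
            value = 6 - value  # reverse-score: 1<->5, 2<->4
        total += value

    print(total)  # 4 + 5 + (6 - 2) + 3 = 16; higher = more favourable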
7. Interest inventories ask individuals to indicate personal likes
and dislikes. Responses are generally compared to interest
patterns of other people.
8. Personality describes characteristics that represent a person’s
typical behaviour. Personality inventories include lists of
statements describing human behaviours, and participants must
indicate whether each statement pertains to them.
9. Personality inventories may be specific to a single trait
(introversion-extroversion) or may be general and measure a
number of traits.
10. Use of self-report measures creates a concern about whether
an individual is expressing his or her true attitudes, values,
interests, or personality.
11. Test bias in both cognitive and affective measures can distort
the data obtained. Bias is present when one’s ethnicity, race,
gender, language, or religious orientation influences test
performance.
12. Projective tests present an ambiguous situation and require
the test taker to “project” his or her true feelings onto the
ambiguous situation.
13. Association is the most commonly used projective technique
and is exemplified by the inkblot test. Only the specially trained
can administer and interpret projective tests.
Criteria for Good Measuring Instruments
1. Validity is the degree to which a test measures what it is
supposed to measure, thus permitting appropriate interpretations
of test scores.
2. A test is not valid per se; it is valid for a particular
interpretation and for a particular group. Each intended test use
requires its own validation. Additionally, validity falls on a
continuum: a test may be highly valid, moderately valid, or
generally invalid.
3. Content validity assesses the degree to which a test measures
an intended content area. Content validity is of prime importance
for achievement tests.
4. Content validity is determined by expert judgment of item
validity and sampling validity, not by statistical means.
5. Criterion-related validity is determined by relating
performance on a test to performance on a second test or other
measure.
6. Criterion validity has two forms, concurrent and predictive.
Concurrent validity is the degree to which the scores on a test are
related to scores on another test administered at the same time
or to another measure available at the same time. Predictive
validity is the degree to which scores on a test are related to
scores on another test administered in the future. In both cases, a
single group must take both tests; the computation is sketched below.
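The sketch correlates hypothetical scores from one group on the new test and on the criterion measure (available now for concurrent validity, later for predictive validity):

    # Pearson correlation between test scores and criterion scores.
    import statistics  # statistics.correlation needs Python 3.10+

    test      = [55, 60, 65, 70, 75, 80, 85]  # scores on the new test
    criterion = [50, 58, 66, 69, 77, 79, 88]  # same people on the criterion

    r = statistics.correlation(test, criterion)
    print(f"validity coefficient r = {r:.2f}")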
7. Construct validity is a measure of whether the construct
underlying a variable is actually being measured.
8. Construct validity is determined by a series of validation
studies that can include content and criterion-related
approaches. Both confirmatory and disconfirmatory evidence are
used to determine construct validity.
9. Consequential validity is concerned with the potential of tests
to create harmful effects for test takers. This is a new but
important form of validity.
10. The validity of any test or measure can be diminished by such
factors as unclear test directions, ambiguous or difficult test
items, subjective scoring, and nonstandardized administration
procedures.
11. Reliability is the degree to which a test consistently measures
whatever it measures. Reliability is expressed numerically, usually
as a coefficient ranging from 0.0 to 1.0; a high coefficient
indicates high reliability.
12. Measurement error refers to the inevitable fluctuations in
scores due to person and test factors. No test is perfectly
reliable, but the smaller the measurement error, the more
reliable the test.
13. The five general types of reliability are stability, equivalence,
equivalence and stability, internal consistency, and scorer/rater.
14. Stability, also called test-retest reliability, is the degree to
which test scores are consistent over time. It is determined by
correlating scores from the same test, administered more than
once.
15. Equivalence, also called equivalent-forms reliability, is the
degree to which two similar forms of a test produce similar scores
from a single group of test takers.
16. Equivalence and stability reliability is the degree to which two
forms of a test given at two different times produce similar scores,
as measured by correlation. All three designs reduce to correlating
two sets of scores from the same test takers, as sketched below.
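The sketch uses hypothetical scores; only what the two score sets represent differs across the three designs:

    import statistics  # statistics.correlation needs Python 3.10+

    admin_1 = [70, 82, 65, 90, 75, 88]  # e.g., Form A at time 1
    admin_2 = [72, 80, 68, 91, 73, 85]  # Form A later (stability), Form B now
                                        # (equivalence), or Form B later
                                        # (equivalence and stability)

    print(f"reliability = {statistics.correlation(admin_1, admin_2):.2f}")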
17. Internal consistency deals with the reliability of a single test
taken at one time. It measures the extent to which the items in
the test are consistent among themselves and with the test as a
whole. Split-half, Kuder-Richardson 20 and 21, and Cronbach’s
alpha are the main approaches to obtaining internal consistency.
18. Split-half reliability is determined by dividing a test into two
equivalent halves (e.g., odd items vs. even items), correlating the
two halves, and using the Spearman-Brown formula to estimate
the reliability of the whole test, as sketched below.
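The sketch assumes a hypothetical matrix of dichotomously scored items (1 = right, 0 = wrong):

    import statistics  # statistics.correlation needs Python 3.10+

    items = [            # rows = test takers, columns = items
        [1, 1, 0, 1, 1, 0],
        [1, 0, 1, 1, 0, 0],
        [1, 1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0, 1],
        [1, 1, 1, 0, 1, 1],
    ]

    odd_half  = [sum(row[0::2]) for row in items]  # items 1, 3, 5
    even_half = [sum(row[1::2]) for row in items]  # items 2, 4, 6

    r_half = statistics.correlation(odd_half, even_half)
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown correction
    print(f"half-test r = {r_half:.2f}, whole-test reliability = {r_full:.2f}")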
19. Kuder-Richardson reliability deals with the internal
consistency of tests whose items are scored dichotomously (i.e.,
right or wrong), whereas Cronbach’s alpha deals with the internal
consistency of tests whose items have more than two response
options (e.g., agree, neutral, disagree, or items scored 0, 1, 2, 3),
as in the sketch below.
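The sketch computes alpha from its definition, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), using hypothetical 0-3 item scores; with 0/1 items the same computation gives KR-20:

    import statistics

    items = [        # rows = respondents, columns = items scored 0-3
        [3, 2, 3, 2],
        [1, 1, 2, 1],
        [2, 2, 2, 3],
        [0, 1, 1, 0],
        [3, 3, 2, 3],
    ]
    k = len(items[0])
    item_vars = [statistics.pvariance(col) for col in zip(*items)]
    total_var = statistics.pvariance([sum(row) for row in items])

    alpha = k / (k - 1) * (1 - sum(item_vars) / total_var)
    print(f"Cronbach's alpha = {alpha:.2f}")  # about 0.90 for these data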
20. Scorer/rater reliability is important when scoring tests that
are potentially subjective. Interjudge reliability refers to the
reliability of two or more independent scorers, whereas
intrajudge reliability refers to the consistency of a single
individual’s ratings over time. Two simple interjudge indices are
sketched below.
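The sketch computes a correlation and an exact percent agreement between two hypothetical independent scorers:

    import statistics  # statistics.correlation needs Python 3.10+

    rater_a = [4, 3, 5, 2, 4, 3, 5, 1]
    rater_b = [4, 2, 5, 3, 4, 3, 4, 1]

    r = statistics.correlation(rater_a, rater_b)
    agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
    print(f"interjudge r = {r:.2f}, exact agreement = {agreement:.0%}")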
21. The acceptable level of reliability differs among test types,
with standardized achievement tests having very high reliabilities
and projective tests having considerably lower reliabilities.
22. If a test is composed of several subtests that will be used
individually in a study, the reliability of each subtest should be
determined and reported.
23. The standard error of measurement is an estimate of how
often one can expect test score errors of a given size. A small
standard error of measurement indicates high reliability; a large
standard error of measurement indicates low reliability.
24. The standard error of measurement is used to estimate the
difference between a person’s obtained and true scores; large
differences indicate low reliability. Its computation is sketched below.
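The sketch applies SEM = SD * sqrt(1 - reliability) with hypothetical values for the test’s standard deviation and reliability coefficient:

    import math

    sd = 10.0           # standard deviation of the test's scores
    reliability = 0.91  # reliability coefficient of the test
    sem = sd * math.sqrt(1 - reliability)  # = 3.0

    obtained = 75       # one person's obtained score
    print(f"SEM = {sem:.1f}")
    print(f"~68% band for the true score: {obtained - sem:.0f} to {obtained + sem:.0f}")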
Test Selection, Construction, and Administration
1. Do not choose the first test you find that appears to meet your
needs. Identify a few appropriate tests and compare them on
relevant factors.
2. The Mental Measurements Yearbook (MMY) is the most
comprehensive source of test information available. It provides
factual information on all known or revised tests, test reviews,
and comprehensive bibliographies and indexes.
3. Tests in Print (TIP) is a comprehensive bibliography of all tests
that have appeared in preceding MMYs. Pro-Ed Publications’ Tests
describes more than 2,000 tests in education, psychology, and
business; reviews of many of these tests are found in Test
Critiques.
4. The ETS Test Collection Database describes more than 25,000
tests, published and unpublished.
5. Other sources of test information are professional journals and
test publishers or distributors.
6. The three most important factors to consider in selecting a test
are its validity, reliability, and ease of use.
7. Self-constructed tests should be pilot-tested before use to
determine validity, reliability, and feasibility.
8. Be certain to align instruction and assessment to ensure valid
test results.
9. Every effort should be made to ensure ideal test administration
conditions. Failing to follow the administration procedures
precisely, or altering them, especially on standardized tests,
lowers the validity of the results.
10. Monitor test takers to minimize cheating.
