The evaluator as observer must decide about personal relationships and group interests without losing perspective on the experience of other students with whom the evaluator is less directly involved. Similarly, in interviewing, Patton is opposed to having participants fit their knowledge, experiences, and feelings into the evaluator's categories. Instead, the evaluator must provide a framework within which respondents can express their own understanding of the program: not asking "How satisfied are you with this program?" but asking "What do you think of this program?"

Measuring Affect

Although it is a controversial activity, the assessment of affect is gaining interest. Special techniques are used for this task because it is believed that persons are more likely to "fake" their attitudinal responses. Hence, mild deception is often used so that learners will not know the purpose of the inquiry or that they are being observed. A student may be asked, for example, to respond to several hypothetical situations, only one of which is of interest to the examiner. The examiner may ask, "Where would you take a friend visiting from out of town: to the market, the movies, the school, the library, or the bank?" If "school" is the answer, it is presumed that the respondent tends to value that institution. Another, less direct, approach is to use high-inference and theoretical instruments. The examiner might ask, "Would you play the part of a degenerate in a play?" or "Which of the following names (one of which is the respondent's own) do you like?" (The inference is that students with high self-concepts will play any role and will like their own names.) Situations are sometimes contrived, and students' reactions are interpreted to indicate particular attitudes. Student observers may collect unobtrusive data and report their observations later, for example. Audio recordings are sometimes made of students' small-group discussions and analyzed later. Sometimes, too, students are offered ways to respond anonymously. In evaluating the affective consequences of a curriculum, students need not be identified; one only has to know what effect the curriculum is having on students as a group. Furthermore, the measures or scores obtained with most high-inference instruments are not reliable enough for making predictions about individual students.
In an effort to improve the credibility of their findings, evaluators may use triangulation (the use of three different measures in concert). If a similar attitude is found by all three measures, evaluators have more confidence in the findings. Locally developed instruments also are thought to be more valid when two or more persons score students' responses the same way and when several samples of student behavior are consistent.
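
For readers who want to see what such a convergence check might look like, here is a minimal sketch. It is illustrative only: the three instruments, the scores, and the choice of Pearson correlation as the agreement index are all assumptions, not prescriptions from the text.

```python
# A minimal sketch of a triangulation check: do three attitude measures
# converge for the same group of students? All data are invented.
from statistics import correlation  # requires Python 3.10+

# Hypothetical scores for the same six students on three instruments
survey      = [3.2, 4.1, 2.8, 3.9, 4.5, 2.5]
observation = [3.0, 4.3, 2.6, 3.7, 4.4, 2.9]
interview   = [3.5, 4.0, 2.7, 4.1, 4.6, 2.4]

measures = {"survey": survey, "observation": observation, "interview": interview}
names = list(measures)

# If all pairwise correlations are high, the three measures point to the
# same attitude, and the evaluator gains confidence in the finding.
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = correlation(measures[a], measures[b])
        print(f"{a} vs {b}: r = {r:.2f}")
```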

Sampling

Sampling is the practice of inferring an educational status on the basis of responses from representative persons or representative tasks. James Popham has said, "Sampling should make a Scotsman's values vibrate. It is so terribly thrifty." Sampling is controversial mainly because it is sometimes imposed in inappropriate situations. When students are to be graded on their relative attainment of common objectives, it is not proper to assess only certain students, nor is it valid to test some students on one set of objectives and others on another set.
Administrators rightfully use sampling when they estimate the typical reactions of students from a few instances of their behavior. It is not necessary to collect all the compositions that students have written in order to judge their writing ability. Samples will suffice: perhaps one at the beginning of the year and one at the end to show change, if any, as a result of instruction. Similarly, to determine a student's knowledge in one subject, it is not necessary to ask the student to respond to all the items that are involved in this knowledge. A sample of what is involved is enough to draw an inference about the student's status. To find out whether a student can name all the letters of the alphabet, one might ask the student to name five of them. The responses indicate the ability to respond to the total population of letters. If all five are named correctly, there is a high probability that the student could name all of the letters; if the child cannot name one or more of them, obviously the objective has not been reached. Controversy arises here as well: if sampling indicates that a child cannot name all of the letters of the alphabet, the teacher wants to know specifically which ones must be taught, and sampling is unlikely to reveal this information.
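
As a concrete illustration of this inference (my own sketch, not part of the text), the following fragment computes the chance that a child who actually knows only some of the 26 letters would nevertheless name all five randomly sampled letters correctly. The hypergeometric arithmetic is standard; the scenarios are invented.

```python
# How much does a five-letter sample tell us about mastery of all 26?
from math import comb

SAMPLE = 5   # letters the examiner asks for
TOTAL = 26   # letters in the alphabet

for known in (26, 25, 24, 20):
    # Probability that every sampled letter falls among the known ones
    p = comb(known, SAMPLE) / comb(TOTAL, SAMPLE)
    print(f"knows {known}/26 letters -> P(names all {SAMPLE} correctly) = {p:.2f}")
```

A child who knows 24 of the 26 letters still passes a random five-letter sample about 65 percent of the time, so a perfect sample is evidence of mastery, not proof of it.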
Controversy may also arise between legislators and others who want achievement records of individual students and evaluators who prefer to use a technique like matrix sampling to determine the effects of a program. In this sampling technique, randomly selected students respond to randomly selected test items measuring different objectives. Thus, different students take different tests. The advantages of the technique are many: reduced testing time required of the student, attainment of information concerning learners' knowledge with respect to many objectives, and reduced apprehension on the student's part because examinees are not compared. The disadvantage is that sampling does not tell us the status of an individual on all the objectives. But again, this is not necessary to get an indication of abilities within groups of students.
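
A minimal sketch of the idea follows. It is illustrative only: the counts of students and items and the simulated response rate are assumptions, not figures from the text.

```python
# Matrix sampling: randomly chosen students each answer a random subset
# of the item pool, yet group-level performance per item (and thus per
# objective) can still be estimated from the partial data.
import random

random.seed(1)
students = range(100)        # the student population
items = range(40)            # the full item pool
ITEMS_PER_STUDENT = 8        # each tested student sees only 8 of 40 items

results = {item: [] for item in items}
for s in random.sample(students, 30):            # only 30 students tested
    for item in random.sample(items, ITEMS_PER_STUDENT):
        correct = random.random() < 0.7          # stand-in for a real answer
        results[item].append(correct)

# Group-level estimate: proportion correct per item, from partial data
for item in list(items)[:5]:
    answers = results[item]
    if answers:
        print(f"item {item}: {sum(answers)}/{len(answers)} correct")
```

Note that no student answered more than eight items, yet every item still receives enough responses to estimate how the group as a whole is doing.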

Technical Hazards

Donald Horst and colleagues of the RMC Research Corporation have identified twelve hazards in conducting evaluations. Each hazard makes it difficult to know whether students do better in a particular program than they would have done without it.

1. The use of grade-equivalent scores. One should not use grade-equivalent scores in evaluating programs. The concept is misleading; a grade-equivalent score of 7 by fifth-graders on a math test does not mean that they know sixth- and seventh-grade math. Such scores do not comprise an equal-interval scale, and, therefore, it is difficult to obtain an average score. The procedures for obtaining these scores make them too low in the fall and too high in the spring.
2. The use of gain scores. Gain scores have been used to adjust for differences found in the pretest scores of treatment and comparison groups. Using them in this way is a mistake, because raw gain scores (posttest score minus pretest score) excessively inflate the posttest performance measure of an initially inferior group. Students who initially have the lowest scores have the greatest opportunity to show gain.
3. The use of norm-group comparisons with inappropriate test dates. A distorted picture of a program's effect occurs when pupils in the new program are not tested within a few weeks of the norm group's tests. Standardized test developers might collect performance scores in May in order to obtain a norm for the test. If the school's staff, however, administers the test during a different month, the discrepancy might be due to the date of testing rather than to the program.
4. The use of inappropriate test levels. Standardized norm-referenced tests are divided into levels that cover different grades. The test level may be too easy or too difficult and thereby fail to provide a valid measurement of achievement. The test might not differentiate sufficiently among groups at either end of the scale. Such effects may also occur with the use of criterion-referenced tests. Hence, tests should be chosen on the basis of the pupils' achievement level, not their grade in school.
5. The lack of pre- and posttest scores for each treatment participant. The group of students ultimately posttested is not usually composed of exactly the same students as the pretest group. Eliminating the scores of dropouts from the posttest may raise the posttest scores considerably. Conclusions of a program's report should be based on the performance of students who have both pre- and posttest scores. The reasons for dropping out also should be reported.
6. The use of noncomparable treatment and comparison groups. Students should be randomly assigned to groups. If they are not, students in a special program may do better or worse than those in other programs because they were different to start with.
7. Using pretest scores to select program participants. Groups with low pretest scores appear to learn more from a special program than they actually do because of a phenomenon called regression toward the mean (simulated in the first sketch following this list). Gains of high-scoring students may be obscured.
8. Assembling mismatched comparison groups. The correct procedure for matching groups is to match pairs of pupils and then randomly assign one member of each pair to a treatment or comparison group (see the second sketch after this list). If, for example, one wants to control for age, one should choose pairs of pupils of the same age. Each member of the pair must have an equal opportunity to be assigned to a given treatment. Do not consciously try to place one member in a certain group.
9. Careless administration of tests. Pupils from both treatment and comparison groups should complete pre- and posttests together. Problems arise when there is inconsistent administration of tests to the two groups. If, for example, there is a disorderly situation in one setting and a different teacher present, the results may differ.
10. The assumption that an achievement gain is due to the treatment alone. The Hawthorne effect and unrecognized "treatments," such as novelty, may be responsible for gains. Plausible rival hypotheses should be examined as likely explanations.
11. The use of noncomparable pretests and posttests. Although conversion tables allow one to convert scores on one test to their equivalent on other tests, it is best to use the same level of the same test for both pre- and posttesting. Often it is possible to use the identical test as both pre- and posttest. Obviously, this does not suffice if teachers teach to the test or if there are practice effects from taking the test.
12. The use of inappropriate formulas to estimate posttest scores. Formulas that calculate expected posttest scores from IQ or an average of grade-equivalent scores are inaccurate. The actual posttest scores of treatment and comparison groups provide a better basis for evaluating treatment effects.
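
The first sketch, promised in hazard 7, is a small simulation of regression toward the mean. It is entirely illustrative: the population size, score distributions, and noise levels are assumptions, and there is deliberately no treatment effect at all.

```python
# Students selected for low pretest scores "gain" on the posttest even
# though nothing was done for them, because extreme scores are partly
# bad luck that does not repeat.
import random

random.seed(7)
N = 1000

ability = [random.gauss(50, 10) for _ in range(N)]    # stable true ability
pretest = [a + random.gauss(0, 8) for a in ability]   # ability + test noise
posttest = [a + random.gauss(0, 8) for a in ability]  # fresh noise, no treatment

# Select the 100 lowest pretest scorers, as a remedial program might
chosen = sorted(range(N), key=lambda i: pretest[i])[:100]
pre_mean = sum(pretest[i] for i in chosen) / len(chosen)
post_mean = sum(posttest[i] for i in chosen) / len(chosen)

print(f"selected group pretest mean:  {pre_mean:.1f}")
print(f"selected group posttest mean: {post_mean:.1f}  (apparent 'gain', no treatment)")
```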
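
The second sketch shows the matching procedure from hazard 8: pupils are paired on the control variable (age here), and a random draw decides which member of each pair receives the treatment. Pupil names and ages are invented for illustration.

```python
# Matched-pair random assignment: pair on the control variable, then
# give each member of a pair an equal chance of either group.
import random

random.seed(8)
pupils = [(f"P{i:02d}", random.choice([9, 10, 11])) for i in range(12)]

# Sort by age so adjacent pupils form same-age (or nearest-age) pairs
pupils.sort(key=lambda p: p[1])

treatment, comparison = [], []
for a, b in zip(pupils[::2], pupils[1::2]):
    pair = [a, b]
    random.shuffle(pair)   # never consciously place a member in a group
    treatment.append(pair[0])
    comparison.append(pair[1])

print("treatment: ", treatment)
print("comparison:", comparison)
```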

CONCLUDING COMMENTS

Evaluation is useless if conclusions are not drawn from the data and acted on in modifying the curriculum. Looking at test scores and filing them away mocks the evaluation process, although, admittedly, evaluation sometimes serves other purposes. Results are sometimes used to win the support of parents and others.
Consensus evaluation also may be undertaken because it is a necessary basis for requesting monies or reassuring a public that the school is doing its job. The principal purpose for using the data, however, should be improvement of the curriculum. Hence, some schools now have curriculum groups that study the findings and then make plans both for the whole school and for individual teachers.
Scores or descriptive terms summarize learner performance and give study groups the opportunity to see the strengths and weaknesses of their programs. Analyses of different populations of pupils reveal how well the curriculum is serving major cultural subgroups, such as the physically handicapped, or how different groups compare with each other. Teachers attempt to ascertain from the data what individual students need, and diagnosing needs becomes a basis for giving personal help. Study groups also discuss the reasons for a curriculum's strengths and weaknesses. Members try to explain the results in terms of particular learning opportunities, the time spent on an objective, the arrangement of activities and topics, and the kinds and frequency of responses from learners. These explanations are verified by determining whether all the data lead to the same conclusion. Plans are made to modify the curriculum in light of deficiencies noted and the causes of the deficiencies.
The results from consensus evaluation should be used in at least two ways. First, they can be used to strengthen ends. Results can be the basis for deciding on new instructional objectives aimed at meeting revealed needs. If evaluation of a program or particular learning opportunity results in the selection of more important objectives than were originally held, the evaluation was valuable. Dewey said it well: "There is no such thing as a final set of objectives, even for the time being. Each day of teaching ought to enable a teacher to revise and better in some respect the objectives aimed at in previous work."
Pluralistic evaluation, especially critical inquiry, is consistent with the rise of professionalism and the school as the center of evaluation focus. Accordingly, responsibility, learning, and change become more important than scoreboard accountability. Such evaluation includes teachers, students, administrators, parents, community members, and possibly a researcher from the university. As they focus upon curriculum matters like content, goals, learning opportunities, and grouping, participants create new awareness, knowledge, and values, at least if they engage in inquiry for action and try to answer Sirotnik's generic question. Conditions for critical inquiry include trust among participants, understanding (comprehension) of one another, and sharing of feelings, observations, and interpretations. In the evaluative process any statement can be challenged, and evaluation of the statement rests solely on the strength of the evidence and supporting arguments. All curriculum practices are subject to questions and to examination of their consequences. "Deep" critical evaluation even allows for evaluation of the school's normative structure, in which local values are assessed in light of larger values for human life.
