
Technological Institute of the Philippines
College of Education
Center for Teaching Excellence
Teaching Certificate Program
Introduction to Measurement and Evaluation

Discussion Point 1: Introduction to Measurement and Evaluation

The Necessity of Evaluation

Evaluation in Teaching
To teach without evaluation is a contradiction in terms. By its very nature, teaching requires innumerable judgments by the teacher, school administrators, parents, and the pupils themselves. Teachers are obligated to assemble, analyze, and utilize whatever evidence can be brought forward to make the most effective decisions (evaluations) for the benefit of the students in their classes. Among these decisions are the following:
1. The nature of the subject matter that should be taught at each grade level;
2. Which aspects of the curriculum need to be eliminated, modified, or included as a function of the current level of student knowledge and attitudes;
3. How instruction can be improved to ensure that students learn;
4. How pupils should be organized within the classroom to maximize learning;
5. How teachers can tell whether students are able to retain knowledge;
6. Which students are in need of remedial or advanced work;
7. Which students will benefit from placement in special programs for the mentally retarded, emotionally disturbed, or physically handicapped;
8. Which children should be referred to the school counselor, psychologist, speech therapist, nurse, or social worker; and
9. How each pupil's progress can be explained most clearly and effectively.

The Relationship between Teaching and Evaluation
The purpose of teaching is to improve the knowledge, behaviors, and attitudes of students. Teachers want students to increase the amount of knowledge they possess and to decrease the amount of forgetting. Teaching consists of at least four interrelated elements (Glaser and DeCecco, 1968):
1. Developing instructional objectives. Teachers need to know what they are attempting to accomplish and cannot leave such matters to chance. Students improve when they make progress toward clearly defined objectives. Clearly defined instructional objectives serve at least two roles:
   a. They help the teacher recognize student improvement by clarifying what it is the teacher wants to accomplish; and
   b. They imply the way in which the goals will be evaluated.
TCP.TIP_Rungduin Page 1

2. Evaluating the students' entering behavior. Individual differences (academic achievement, sexual preference, social class, notes from previous teachers, former school or former location, physical characteristics, knowledge of an older brother or sister, and family background) matter because teaching methods are effective only if they are considered in relationship to the background of the student.
3. Selecting an instructional strategy. If student background is important in selecting an instructional strategy, the teacher will have to become familiar with the procedures used to measure and evaluate those backgrounds.
4. Providing for an evaluation of the students' performance. Performance assessment may suggest that a program is ineffective because the objectives are unrealistic or because the entering behavior was not considered adequately. Evaluation can determine whether instructional objectives have been met; it provides evidence that students have the necessary entering behavior, and it helps to evaluate the adequacy of an instructional strategy.

Test: an instrument or systematic procedure for measuring a sample of behavior. (Answers the question "How well does the individual perform, either in comparison with others or in comparison with a domain of performance tasks?")
Measurement: the process of obtaining a numerical description of the degree to which an individual possesses a particular characteristic. (Answers the question "How much?")
Evaluation: the systematic process of collecting, analyzing, and interpreting information to determine the extent to which pupils are achieving instructional objectives. (Answers the question "How good?")


Measurement
Measurement involves the assigning of numbers to attributes or characteristics of persons, objects, or events according to explicit formulations or rules. Educational measurement requires the quantification of attributes according to specified rules.

Characteristics of Scales of Measurement (from least to most complex)

Nominal (least complex)
- Definition: a scale involving the classification of objects, persons, or events into discrete categories
- Uses and examples: plate numbers, Social Security numbers, names of people, places, and objects, numbers to identify athletes
- Limitations: cannot specify quantitative differences among categories

Ordinal
- Definition: a scale involving the ranking of objects, persons, traits, or abilities without regard to equality of differences
- Uses and examples: letter grades (ratings from excellent to failing), military ranks, order of finishing a test
- Limitations: restricted to specifying relative differences without regard to the absolute amount of difference

Interval
- Definition: a scale having equal differences between successive categories
- Uses and examples: temperature, grades, scores
- Limitations: ratios are meaningless; the zero point is arbitrarily defined

Ratio (most complex)
- Definition: a scale having an absolute zero and equal intervals
- Uses and examples: distance, weight, time required to learn a skill or subject
- Limitations: none, except that few educational variables have ratio characteristics
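The "ratios are meaningless" limitation of interval scales can be made concrete with a small sketch (the temperatures below are invented for illustration): Celsius has an arbitrary zero, so a ratio of Celsius readings does not survive conversion to Kelvin, which does have an absolute zero.

```python
# Illustrative sketch: ratio statements are valid on a ratio scale (Kelvin)
# but not on an interval scale (Celsius), whose zero point is arbitrary.

def celsius_to_kelvin(c):
    return c + 273.15

a_c, b_c = 10.0, 20.0  # two temperatures in degrees Celsius (invented)

naive_ratio = b_c / a_c                                       # 2.0: "twice as hot"?
true_ratio = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)  # about 1.035

# The interval-scale *difference* is meaningful and conversion-invariant:
diff_c = b_c - a_c
diff_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)

print(naive_ratio, round(true_ratio, 3), abs(diff_c - diff_k) < 1e-9)
```

Differences survive the conversion but ratios do not, which is exactly what separates interval from ratio data: only the latter supports statements like "twice as much."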


Testing
A test may be defined as a task or series of tasks used to obtain systematic observations presumed to be representative of educational or psychological traits or attributes. Typically, tests require examinees to respond to items or tasks from which the examiner infers something about the attribute being measured.

Tests and other measurement instruments serve a variety of purposes:
1. Selection. To determine which persons will be admitted to or denied admittance to an institution or organization.
2. Placement. To help individuals determine which of several programs they will pursue.
3. Diagnosis and remediation. To help discover the nature of the specific problems individuals may have.
4. Feedback
5. Motivation and guidance of learning
6. Program and curriculum improvement
7. Theory development

Tests may be classified by how they are administered (individually or in groups), how they are scored (objectively or subjectively), what sort of response they emphasize (power or speed), what type of response subjects make (performance or pencil-and-paper), what they attempt to measure (sample or sign), and the nature of the groups being compared (teacher-made or standardized).
1. Individual and group tests. Some tests are administered on a one-to-one basis during careful oral questioning (e.g., individual intelligence tests), whereas others can be administered to a group of individuals.
2. Objective and subjective tests. An objective test is one on which equally competent scorers will obtain the same scores (e.g., multiple-choice tests), whereas a subjective test is one on which the scores are influenced by the opinion or judgment of the person doing the scoring (e.g., essay tests).
3. Power and speed tests. A speed test measures the number of items that an individual can complete in a given time, whereas a power test measures the level of performance under ample time conditions. Power test items usually are arranged in order of increasing difficulty.
Relationship between power and speed tests
- Power tests: generous time; relatively hard items
- Partially speeded tests: fall between the two types
- Speed tests: limited time; relatively easy items

4. Performance and paper-and-pencil tests. Performance tests require examinees to perform a task rather than answer questions. They are usually administered individually so that the examiner can count the number of errors committed by the student and can measure how long each task takes. Paper-and-pencil tests are almost always given in group situations in which students are asked to write their answers on paper.
5. Sample and sign tests. A sample test measures a sample of a student's total behavior, whereas sign tests are administered to distinguish one group of individuals from another.
6. Teacher-made and standardized tests. Teacher-made tests are constructed by teachers for use within their own classrooms. Their effectiveness depends on the skill of the teacher and his or her knowledge of test construction. Standardized tests are constructed by test specialists working with curriculum experts and teachers. They are standardized in that they have been administered and scored under standard and uniform conditions so that results from different classes and different schools may be compared.
7. Mastery and survey tests. Some achievement tests measure the degree of mastery of a limited set of specific learning outcomes, whereas others measure pupils' general level of achievement over a broad range of outcomes.
8. Supply and selection tests. Some tests require examinees to supply the answer (e.g., essay tests), whereas others require them to select the correct response from a set of alternatives (e.g., multiple-choice tests).
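The objective/subjective distinction in the classification above can be sketched in code (a hypothetical example; the answer key and responses are invented): an objective test reduces scoring to a fixed rule, so every competent scorer reaches the same score.

```python
# Hypothetical sketch of objective scoring: the score depends only on a
# fixed answer key, not on the judgment of whoever applies it.

ANSWER_KEY = ["b", "d", "a", "c", "a"]  # invented 5-item multiple-choice key

def score(responses, key=ANSWER_KEY):
    """Return the number of responses that match the key."""
    return sum(r == k for r, k in zip(responses, key))

pupil = ["b", "d", "c", "c", "a"]  # one pupil's invented answer sheet

# Two independent passes over the same answer sheet must agree exactly,
# which is what makes the test objective in the sense defined above.
scorer_1 = score(pupil)
scorer_2 = score(list(pupil))
print(scorer_1, scorer_1 == scorer_2)
```

An essay test has no such fixed rule; its score passes through a rater's judgment, which is why subjective tests need devices such as scoring rubrics to approach the same consistency.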


Evaluation
Evaluation is a process through which a value judgment or decision is made from a variety of observations and from the background and training of the evaluator.

General Principles of Evaluation
1. Determining and clarifying what is to be evaluated always has priority in the evaluation process.
2. Evaluation techniques should be selected according to the purpose to be served.
3. Comprehensive evaluation requires a variety of evaluation techniques.
4. Proper use of evaluation techniques requires an awareness of both their limitations and their strengths.
5. Evaluation is a means to an end, not an end in itself.

Reasons for Using Tests and Other Measurements
Basis for Classification: Nature of Measurement
- Maximum performance. Function: determines what individuals can do when performing at their best. Illustrative instruments: aptitude tests, achievement tests.
- Typical performance. Function: determines what individuals will do under natural conditions. Illustrative instruments: attitude, interest, and personality inventories; observational techniques; peer appraisal.

Basis for Classification: Use in Classroom Instruction
- Placement. Function: determines prerequisite skills, degree of mastery of course objectives, and/or best mode of learning. Illustrative instruments: readiness tests, aptitude tests, pretests on course objectives, self-report inventories, observational techniques.
- Formative. Function: determines learning progress, provides feedback to reinforce learning, and corrects learning errors. Illustrative instruments: teacher-made mastery tests, custom-made tests from test publishers, observational techniques.
- Diagnostic. Function: determines causes (intellectual, physical, emotional, environmental) of persistent learning difficulties. Illustrative instruments: published diagnostic tests, teacher-made diagnostic tests, observational techniques.
- Summative. Function: determines end-of-course achievement for assigning grades or certifying mastery of objectives. Illustrative instruments: teacher-made survey tests, performance rating scales, product scales.

Basis for Classification: Method of Interpreting Results
- Criterion referenced. Function: describes pupil performance according to a specified domain of clearly defined learning tasks (e.g., adds single-digit whole numbers). Illustrative instruments: teacher-made mastery tests, custom-made tests from test publishers, observational techniques.
- Norm referenced. Function: describes pupil performance according to relative position in some known group (e.g., ranks tenth in a classroom group of 30). Illustrative instruments: standardized aptitude and achievement tests, teacher-made survey tests, interest inventories, adjustment inventories.

Motivation and Guidance of Learning
Tests can be used to motivate and guide students to learn. Because pupils study for the type of examination they expect to take, it is the teacher's responsibility to construct examinations that measure important course objectives.

Program and Curriculum Improvement: Formative and Summative Evaluations

1. Formative Evaluation. Formative evaluation is used to monitor learning progress during instruction and to provide continuous feedback to both pupil and teacher concerning learning successes and failures.
2. Summative Evaluation. Summative evaluation typically comes at the end of a course (or unit) of instruction. It is designed to determine the extent to which the instructional objectives have been achieved and is used primarily for assigning course grades or certifying pupil mastery of the intended learning outcomes.

Norm-Referenced and Criterion-Referenced Measurement
Evaluation procedures can also be classified according to how the results are interpreted. There are two basic ways of interpreting pupil performance on tests and other evaluation instruments. One is to describe the performance in terms of the relative position held in some known group (e.g., typed better than 90 percent of the class members). The other is to directly describe the specific performance that was demonstrated (e.g., typed 40 words per minute without error). The first type of interpretation is called norm referenced; the second, criterion referenced. Both types of interpretation are useful.

Some Basic Terminologies
1. Norm-referenced test: a test designed to provide a measure of performance that is interpretable in terms of an individual's relative standing in some known group.
2. Criterion-referenced test: a test designed to provide a measure of performance that is interpretable in terms of a clearly defined and delimited domain of learning tasks.
3. Objective-referenced test: a test designed to provide a measure of performance that is interpretable in terms of a specific instructional objective. (Many objective-referenced tests are called criterion-referenced tests by their developers.)
Other terms that are less often used have meanings similar to criterion referenced: content referenced, domain referenced, and universe referenced.
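The two interpretations can be sketched side by side (a hypothetical example; the class scores and the 40-words-per-minute criterion are invented for illustration): the same raw score answers both "what can this pupil do?" and "where does this pupil stand?"

```python
# Hypothetical sketch: one raw score, two interpretations.
# All numbers below are invented for illustration.

class_wpm = [22, 25, 28, 29, 31, 33, 34, 36, 38, 44]  # classmates' typing speeds
pupil_wpm = 40

# Criterion-referenced: describe the performance itself against a
# defined learning task ("types 40 words per minute without error").
criterion_wpm = 40
meets_criterion = pupil_wpm >= criterion_wpm

# Norm-referenced: describe the relative position in a known group
# ("typed better than 90 percent of the class members").
pct_outperformed = 100 * sum(s < pupil_wpm for s in class_wpm) / len(class_wpm)

print(meets_criterion, pct_outperformed)
```

The same score supports both statements; which one is reported depends on the decision at hand: relative decisions such as selection or grouping, versus descriptions of what the pupil can actually do.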
Comparison of Norm-Referenced Tests (NRTs) and Criterion-Referenced Tests (CRTs)

Common Characteristics of NRTs and CRTs
1. Both require specification of the achievement domain to be measured.
2. Both require a relevant and representative sample of test items.
3. Both use the same types of test items.
4. Both use the same rules for item writing (except for item difficulty).
5. Both are judged by the same qualities of goodness (validity and reliability).
6. Both are useful in educational measurement.

Differences Between NRTs and CRTs (though the difference is only a matter of emphasis)



1. An NRT typically covers a large domain of learning tasks, with just a few items measuring each specific task. A CRT typically focuses on a delimited domain of learning tasks, with a relatively large number of items measuring each specific task.
2. An NRT emphasizes discrimination among individuals in terms of relative level of learning. A CRT emphasizes description of what learning tasks individuals can and cannot perform.
3. An NRT favors items of average difficulty and typically omits easy items. A CRT matches item difficulty to the learning tasks, without altering item difficulty or omitting easy items.
4. An NRT is used primarily (but not exclusively) for survey testing. A CRT is used primarily (but not exclusively) for mastery testing.
5. NRT interpretation requires a clearly defined group. CRT interpretation requires a clearly defined and delimited achievement domain.

Strictly speaking, norm referenced and criterion referenced refer only to the method of interpreting the results. These types of interpretation are likely to be most meaningful and useful, however, when tests (and other evaluation instruments) are specifically designed for the type of interpretation to be made. Thus, we can use the terms criterion referenced and norm referenced as broad categories for classifying tests and other evaluation techniques. Tests that are specifically built to maximize each type of interpretation have much in common, and it is impossible to determine the type of test from examining the test itself. Rather, it is in the construction and use of the tests that the differences can be noted.

A key feature in constructing norm-referenced tests is the selection of items of average difficulty and the elimination of items that all pupils are likely to answer correctly. This procedure provides a wide spread of scores so that discrimination among pupils at various levels of achievement can be more reliably made. This is useful for decisions based on relative achievement, such as selection, grouping, and relative grading. In contrast, a key feature in constructing criterion-referenced tests is the selection of items that are directly relevant to the learning outcomes to be measured, without regard to the items' ability to discriminate among pupils. If the learning tasks are easy, the test items will be easy, and if the learning tasks are difficult, the test items will be difficult. Here the main purpose is to describe the specific knowledge and skills that each pupil can demonstrate, which is useful for planning both group and individual instruction.

Norm-referenced test: discrimination among pupils
Combined type: dual interpretation
Criterion-referenced test: description of performance
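The item-selection contrast can be sketched with the item difficulty index, the proportion of pupils answering an item correctly (a hypothetical example; the response data and the cutoff band are invented for illustration):

```python
# Hypothetical sketch: item difficulty (p = proportion correct) as used in
# norm-referenced item selection. All data are invented for illustration.

# Each inner list records 1 (correct) or 0 (incorrect) for one item
# across ten pupils.
item_responses = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # item everyone answers correctly
    [1, 0, 1, 1, 0, 1, 0, 1, 0, 1],  # item of average difficulty
    [0, 0, 1, 0, 0, 1, 0, 0, 0, 0],  # hard item
]

def difficulty(responses):
    """Item difficulty index: proportion of pupils answering correctly."""
    return sum(responses) / len(responses)

p_values = [difficulty(r) for r in item_responses]

# NRT construction favors middling p-values, which spread scores out;
# an item with p = 1.0 cannot discriminate among pupils and is dropped.
nrt_keep = [p for p in p_values if 0.3 <= p <= 0.7]

print(p_values, nrt_keep)
```

A CRT would keep the first item if it matched an intended learning outcome: that every pupil answers it correctly is exactly the mastery information the test is meant to report. The 0.3 to 0.7 band is one common rule of thumb, not a fixed standard.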

Other Descriptive Terms
Some of the other terms that are frequently used in describing tests are presented here as contrasting test types, but some are simply the ends of a continuum (e.g., speed versus power tests).
1. Informal Versus Standardized Tests. Informal tests are those constructed by classroom teachers, whereas those designed by test specialists and administered, scored, and interpreted under standard conditions are called standardized tests.
2. Individual Versus Group Tests. Some tests are administered on a one-to-one basis during careful oral questioning (e.g., individual intelligence tests), whereas others can be administered to a group of individuals.
3. Mastery Versus Survey Tests. Some achievement tests measure the degree of mastery of a limited set of specific learning outcomes, whereas others measure pupils' general level of achievement over a broad range of outcomes. Mastery tests are typically criterion referenced, and survey tests tend to be norm referenced, but some criterion-referenced interpretations are also possible with carefully prepared survey tests.
4. Supply Versus Selection Tests. Some tests require examinees to supply the answer (e.g., essay tests), whereas others require them to select the correct response from a set of alternatives (e.g., multiple-choice tests).
5. Speed Versus Power Tests. A speed test measures the number of items that an individual can complete in a given time, whereas a power test measures the level of performance under ample time conditions. Power test items usually are arranged in order of increasing difficulty.
6. Objective Versus Subjective Tests. An objective test is one on which equally competent scorers will obtain the same scores (e.g., multiple-choice tests), whereas a subjective test is one on which the scores are influenced by the opinion or judgment of the person doing the scoring (e.g., essay tests).


Discussion Point 2: Preparing Instructional Objectives

Instructional objectives play a key role in the instructional process. When properly stated, they serve as guides for both teaching and evaluation. A clear description of the intended outcomes of instruction aids in selecting relevant materials and methods of instruction, in monitoring pupil learning progress, in selecting or constructing appropriate evaluation procedures, and in conveying instructional intent to others. In preparing instructional objectives, it is possible to focus on different aspects of instruction.

Educational goal: a general aim or purpose of education that is stated as a broad, long-range outcome to work toward. Goals are used primarily in policy making and general program planning (e.g., "Develop proficiency in the basic skills of reading, writing, and arithmetic").

General instructional objective: an intended outcome of instruction that has been stated in general enough terms to encompass a set of specific learning outcomes (e.g., "Comprehends the literal meaning of written material").

Specific learning outcome: an intended outcome of instruction that has been stated in terms of specific and observable pupil performance (e.g., "Identifies details that are explicitly stated in a passage"). A set of specific learning outcomes describes a sample of the types of performance that learners will be able to exhibit when they have achieved a general instructional objective (also called specific objectives, performance objectives, behavioral objectives, and measurable objectives).

Pupil performance: any measurable or observable pupil response in the cognitive, affective, or psychomotor area that is a result of learning.

Dimensions of Instructional Objectives
1. Mastery vs. Developmental Outcomes. Mastery objectives are typically concerned with relatively simple knowledge and skill outcomes (e.g., adds two single-digit numbers with sums of ten or less).
Developmental outcomes are concerned with objectives that can never be fully achieved; pupils show varying degrees of progress along a continuum of development.
2. Ultimate vs. Immediate Objectives. Ultimate objectives are those concerned with the typical performance of individuals in the actual situations they will face in the future. For example, good citizenship is reflected in adult life through voting behavior, interest in community affairs, and the like; safety consciousness shows up in safe driving and safe work habits and in obeying safety rules in daily activities. Immediate objectives should be closely related to the ultimate situation. For example, can pupils apply basic skills to practical situations? Such objectives, calling for the application of knowledge and skills, aid in the transfer of skills to ultimate situations and should be on any list of objectives.

3. Single-course vs. Multiple-course Objectives. Whether areas containing multiple-course objectives are the shared responsibility of several teachers depends on the grade level and the school's goals (e.g., in some schools, every teacher is considered a teacher of basic skills).

Areas containing multiple-course objectives:
- Reading, Writing, Speaking
- Computer skills, Study skills, Library skills
- Creativity, Citizenship, Health

Selection of Instructional Objectives
1. Types of learning outcomes to consider
2. Taxonomy of educational objectives
3. Use of published lists of objectives
4. Review of your own teaching materials and methods

Begin with a Simple Framework: Knowledge, Understanding, Application
Reading: K = Knows vocabulary; U = Reads with comprehension; A = Reads a wide variety of printed materials
Writing: K = Knows the mechanics of writing; U = Understands grammatical principles in writing; A = Writes complete sentences (paragraph, theme)
Math: K = Knows the number system and basic operations; U = Understands math concepts and processes; A = Solves math problems accurately and efficiently

Criteria for Selecting Appropriate Objectives
1. Do the objectives include all important outcomes?
2. Are the objectives in harmony with the general goals of the school?
3. Are the objectives in harmony with sound principles of learning?
4. Are the objectives realistic in terms of the pupils' abilities and the time and facilities available?


Stating the Specific Learning Outcomes
1. Focus on action verbs. Examples:
   a. Understands the meaning of terms
      1. Defines the terms in own words
      2. Identifies the meaning of a term in context
      3. Differentiates between proper and improper usage of a term
      4. Distinguishes between two similar terms on the basis of meaning
      5. Writes an original sentence using the term
   b. Demonstrates skill in critical thinking
      1. Distinguishes between fact and opinion
      2. Distinguishes between relevant and irrelevant information
      3. Identifies fallacious reasoning in written material
      4. Identifies the limitations of given data
      5. Identifies the assumptions underlying conclusions
2. Keep outcomes free of specific content so that they can be used in various units of study.
   Poor: Identifies the last ten presidents of the Psychological Association of the Philippines.
   Better: Identifies important historical figures.
   Poor: Identifies the parts of the brain.
   Better: Identifies the parts of a given structure.


Discussion Point 3: Achieving Different Types of Learning Outcomes

Achieving Cognitive Learning

Teaching Facts, Factual Information, and Knowledge

Basic Concepts
1. Fact: something that has happened, an event, or an actual state of affairs
2. Factual information: information discriminated by many individuals who share the same cultural background and accepted as correct or appropriate
3. Information: anything that is discriminated by an individual
4. Knowledge: factual information that is learned initially and then remembered

Types of Knowledge
1. General: knowledge that applies to many different situations
2. Domain-specific: knowledge that pertains to a particular task or subject
3. Declarative: knowledge of verbal information: facts, beliefs, opinions
4. Procedural: knowledge of how a task is performed
5. Conditional: knowing when and why to use declarative and procedural knowledge

Three Categories of Knowledge
1. Knowledge of specifics: isolated facts remembered separately
2. Knowledge of ways and means: conventions, trends and sequences, classifications and categories, criteria and methodology
3. Knowledge of abstractions: laws, theories, and principles


Application of Principles in Teaching and Learning Factual Information

Principle 1: Organizing learning material through meaningful association facilitates acquisition of information.
Applications: Group items according to common attributes or by relationships.

Principle 2: Transition from old to new materials facilitates acquisition of information.
Applications: Organize the material to a higher level of generality; use advance organizers that are more general, abstract, inclusive, and comparative; utilize students' related prior knowledge as advance organizers.

Principle 3: Proper sequencing of materials facilitates acquisition of information.
Applications: Order subject matter according to the regularity of its structure, according to the responses available to the learner, or according to the similarity of different stimuli.

Principle 4: Appropriate practice facilitates acquisition of information.
Applications: Provide adequate practice through use of knowledge in situations, relationships, distribution of sessions, and review of a small amount of material at increasingly larger intervals; reinforce practice through confirmation of correct responses.

Principle 5: Independent evaluation facilitates acquisition of information.
Applications: Provide mechanisms for learners to evaluate their own responses.

Teaching Concepts and Principles

Basic Concepts of Concepts and Principles
1. Concept: essentially an idea or an understanding of what something is; a category used to group similar events, ideas, objects, or people; organized information about the properties of one or more things
2. Principle: a relationship among two or more concepts

Classification of Principles
1. Cause and effect: if-then relationships
2. Probability: prediction in an actual sense
3. Correlation: prediction based on a wide range of phenomena
4. Axioms: rules


Instances of Concepts
1. Positive: a specific example of the concept
2. Negative: a non-example of the concept

Attributes of Concepts
1. Learnability: some concepts are more readily learned than others
2. Usability: some concepts can be used more than others
3. Validity: the extent to which experts agree on its attributes
4. Generality: the higher the concept, the more general it is
5. Power: the extent to which a concept facilitates learning of other concepts
6. Structure: internally consistent organization
7. Instance perceptibility: the extent to which instances of the concept can be sensed
8. Instance numerousness: the number of instances, ranging from one to infinite

Four Levels of Concept Attainment
1. Concrete
2. Identity
3. Classificatory
4. Formal

Four Components in Any Concept Development
1. Name of the concept
2. Definition
3. Relevant and irrelevant attributes
4. Examples and non-examples

Simple Procedures in Concept Analysis
1. State attributes and non-attributes
2. Give examples and non-examples
3. Indicate relationships of the concept to other concepts
4. Identify the principles in which the concept is used
5. Use the concept in solving problems


Application of Principles in Teaching and Learning Concepts and Principles

Principle 1: Awareness of attributes facilitates concept learning.
Applications: Manage instruction by guiding learners to identify the critical attributes; using examples and non-examples from which the attributes of the concept can be identified; utilizing activities where instances of the concept can be directly observed; providing for overgeneralization and undergeneralization to establish the limits of the concept; providing the right amount of variation and repetition; and varying irrelevant dimensions so that the relevant dimensions may be identified easily.

Principle 2: Correct language for concepts facilitates concept learning.
Applications: Teach the relevant names and labels associated with the concept's attributes.

Principle 3: Proper sequencing of instances facilitates concept learning.
Applications: Concept development should proceed from simple to complex examples, from concrete to abstract, from parts to whole, and from whole to parts. Present a larger number of instances of one concept, instances of high dominance rather than low dominance, and positive and negative instances of the concept rather than all positive or all negative instances. Present instances of the concept simultaneously rather than successively.

Principle 4: Guided student discovery facilitates concept learning.
Applications: Guide students' discovery of the concept through encounters with real and meaningful problems, gathering of accurate information, a responsive environment, and prompt and accurate feedback.

Principle 5: Concept application facilitates concept learning.
Applications: Conduct meaningful applications of concepts by drawing on the learners' experiences, observing related situations, and encountering life-like situations.

Principle 6: Independent evaluation facilitates concept learning.
Applications: Arrange for independent evaluation by creating an attitude of seeking and searching, arranging for self-evaluation of the adequacy of one's concepts, and assisting learners to evaluate their concepts and their methods of evaluating them.


Developing Problem-Solving Abilities

Basic Concepts
1. Problem: a felt difficulty, or a question for which a solution may be found only by a process of thinking
2. Thinking: the recall and reorganization of facts and theories that occurs when the individual is faced with obstacles and problems
3. Reasoning: productive thinking in which previous experiences are organized or combined in new ways to solve problems
4. Problem solving: creating new solutions to remove a felt difficulty

Steps in Problem Solving
1. Felt need
2. Recognizing a problem situation
3. Gathering data
4. Evaluating the possible solutions
5. Testing and verification
6. Making a generalization or conclusion


Principles for Developing Problem-Solving Abilities and Their Applications in Classroom Situations

Principle 1: Recognizing difficulties in a situation facilitates problem solving.
Applications: Assist students to identify solvable, significant problems and to state problems themselves.

Principle 2: Delimiting the problem facilitates problem solving.
Applications: Guide students in analyzing the situation related to the problem; determining problems of immediate concern; delimiting the problem; stating problems with opportunity for securing progress toward a solution; deciding the form in which the solutions might appear; and using the information-processing skill of selective attention.

Principle 3: Using new methods for arriving at a conclusion facilitates problem solving.
Applications: Help students in locating needed information; acquiring the necessary background information, concepts, and principles for dealing with the problem; developing a minimum reference list; identifying various sources of information; drawing information from their own experiences; and deciding on a uniform system for writing a bibliography.

Principle 4: Generating possible solutions through applying knowledge and methods to the problem situation facilitates problem solving.
Applications: Lead students to generate solutions through brainstorming sessions; processing information; analyzing information in terms of the larger problems; incorporating diverse information; and eliminating overlaps and discrepancies.

Principle 5: Reaching problem solutions through inferring and testing hypotheses facilitates problem solving.
Applications: Develop students' skills in drawing hypotheses, stating hypotheses, and testing hypotheses.


Developing Creativity

Basic Concepts
1. Restructuring - conceiving of a problem in a new or different way
2. Incubation - unconscious work toward a solution while one is away from the problem
3. Divergent thinking - coming up with many possible solutions
4. Convergent thinking - narrowing the possibilities down to a single answer
5. Creativity - the occurrence of uncommon or unusual but appropriate responses; imaginative, original thinking

Characteristics of a Creative Individual
1. Has a high degree of intellectual capacity
2. Genuinely values intellectual matters
3. Values own independence and autonomy
4. Is verbally fluent
5. Enjoys aesthetic impressions
6. Is productive
7. Is concerned with philosophical problems
8. Has a high aspiration level for self
9. Has a wide range of interests
10. Thinks in unusual ways
11. Is an interesting, arresting person
12. Appears straightforward, candid
13. Behaves in an ethically consistent manner

Principles for Developing Creativity and their Applications in Classroom Teaching
Principle 1: Production of novel forms of ideas through expressing oneself by figural, verbal, and physical means facilitates the development of creativity.
Applications:
- Model creative behaviors such as curiosity, inquiry, and divergent production
- Provide opportunities for expression in many media (language, rhythm, music, and art); divergent production through figural, verbal, and physical means; creative processes; valuing creative achievement; and production of ideas that cannot be scored right or wrong
- Develop a continuing program for developing creative abilities

Principle 2: Associating success in creative efforts with a high level of creative experience facilitates the development of creativity.
Applications:
- Respect unusual questions and imaginative, creative ideas
- Reward creative efforts and unique productions


Achieving Psychomotor Learning

Basic Concepts
1. Capacity - the individual's potential power to do a certain task
2. Ability - the actual power to perform an act, physically and mentally
3. Skill - the level of proficiency attained in carrying out sequences of action in a consistent way

Characteristics of Skilled Performance
1. Less attention to the specific movements (voluntary to involuntary)
2. Better differentiation of cues
3. More rapid feedback and correction of movements
4. Greater speed and coordination
5. Greater stability under a variety of environmental conditions

Phases of Motor Skill Learning
1. Cognitive phase - understanding the task
2. Organizing phase - associating responses with particular cues and integrating responses
3. Perfecting phase - executing the performance in automatic fashion


Application of Principles in Developing Psychomotor Skills in Classroom Teaching

Principle 1: Attending to the characteristics of the skill and assessing one's own related abilities facilitate motor skill learning.
Application: Analyze the psychomotor skill in terms of the learner's abilities and developmental level, to determine the specific abilities necessary to perform it, to arrange the component abilities in order, and to help students master them.

Principle 2: Observing and imitating a model facilitates initial learning of skills and movements.
Application: Demonstrate and describe the entire procedure as an advance organizer, the correct components of the motor abilities, the links of the motor chain in sequence, and the skill again step by step.

Principle 3: Guiding initial responses verbally and physically facilitates learning of motor skills.
Application: Provide verbal guidance to give learners a feeling of security, direct attention to more adequate techniques, and promote insight into the factors related to successful performance of the task. Provide physical guidance to facilitate making correct responses initially and to correct wrong responses immediately.

Principle 4: Practicing under desirable conditions facilitates the learning of skills through eliminating errors and strengthening and refining correct responses and form.
Application: Conduct practice of skills close to the actual conditions where the skill will be used; from whole-to-part arrangement; through repetitive drills with the same correct materials; by distributed rather than massed practice; and with intervals of rest long enough to overcome fatigue but not so long that forgetting occurs.

Principle 5: Knowledge of results facilitates skill learning.
Application: Provide informational feedback on correct and incorrect responses, adequate and inadequate responses, and correct or incorrect verbal remarks. Feedback may be secured from verbal analysis, chart analysis, and taped performance.

Principle 6: Evaluating one's own performance facilitates mastery of skills.
Application: Guide learners' self-evaluation of performance through discussion, analysis, and assessment.


Achieving Affective Learning: Developing Attitudes and Values
1. Affective - pertains to emotions or feelings rather than thought
2. Affective learning - consists of responses acquired as one evaluates the meaning of an idea, object, person, or event in terms of one's view of the world

Main Elements
1. Taste - like or dislike of a particular animal, color, or flavor
2. Attitudes - learned, emotionally toned predispositions to react in a consistent way, favorably or unfavorably, toward a person, object, or idea
3. Values - inner core beliefs and internalized standards serving as norms of behavior

Defining Attributes of Attitudes
1. Learnability - all attitudes are learned
2. Stability - learned attitudes become stronger and more enduring
3. Personal-societal significance - attitudes are of high importance to the individual and society
4. Affective-cognitive content - attitudes carry both factual information and emotions associated with an object


Application of Principles in Developing Attitudes and Values in Classroom Teaching

Principle 1: Recognizing an attitude facilitates its initial learning.
Application: Guide students in identifying the attitudes and values to be developed and in defining the terminal behavior expected of them.

Principle 2: Observing and imitating a model facilitates initial attitude learning.
Application: The teacher provides different types of exemplary models and opportunities to examine instructional materials carefully in terms of the attitudes and values presented, and sets a good example.

Principle 3: Positive attitudes toward a person, event, or object facilitate affective learning.
Application: Provide pleasant and positive emotional experiences by showing warmth and enthusiasm toward students, keeping personal prejudices under control, allowing students to express their own value commitments, demonstrating interest in the subject matter, and making it possible for each student to experience success.

Principle 4: Getting information about a person, event, or object influences initial attitude learning and later commitment to group-held attitudes.
Application: Guide learners to extend their informative experiences by undergoing direct experiences, listening to group lectures and discussions, engaging in extensive reading, and participating in related activities.

Principle 5: Interacting in primary groups influences initial attitude learning and later commitment to group-held attitudes.
Application: Facilitate interaction in primary groups through group planning, group discussion, group decision making, and role-playing.

Principle 6: Practicing an attitude facilitates its stable organization.
Application: The practice context should have the teacher as an exemplary model manifesting interest in the students, be characterized by a positive climate, and confirm learner responses with positive remarks, an approving nod, and a smile.

Principle 7: Purposeful learning facilitates effective attitude acquisition and modification.
Application: Guide learners to engage in independent attitude cultivation by providing opportunities for them to think about their own attitudes and to write about open-ended themes.


Discussion Point 4: Test Construction, Reliability and Validity

STEP I: CONTENT VALIDATION
Content validity is the degree to which the test represents the essence, the topics, and the areas that the test is designed to measure. It is considered the most crucial procedure in the test construction process because content validity sets the pace for the succeeding validity and reliability measures.

1.1 Documentary analysis or pre-survey. At this stage, one must have familiarized oneself with the theoretical constructs directly related to the test one is planning.

1.2 Development of a Table of Specifications. Determining the areas or concepts that will represent the nature of the variable being measured, and the relative emphasis of each area, is essentially judgmental. A detailed TS includes areas or concepts, objectives, number of items, and the percentage or proportion of items in each area. It is advisable to make a 50 to 100 percent allowance in the construction of items.

Sample Table of Specifications (first draft) for an Introduction to Psychology unit exam (the LEARNING OBJECTIVES columns of the original table, coded K C A A S E, are not reproduced here):

AREAS                      | NO. OF ITEMS | PLACEMENT OF ITEMS                                                           | PERCENTAGE
I. History of Psychology   | 15           | 1, 6, 13, 14, 22, 23, 24, 32, 33, 49, 50, 51, 59, 65, 65                     | 21.43 %
II. Branches of Psychology | 15           | 7, 12, 15, 16, 20, 21, 31, 34, 47, 48, 52, 58, 59, 61, 70                    | 21.43 %
III. Schools of Psychology | 20           | 3, 4, 5, 17, 19, 25, 26, 30, 35, 36, 39, 40, 44, 46, 53, 54, 60, 62, 67, 69  | 28.57 %
IV. Research Methods       | 20           | 2, 8, 9, 10, 11, 18, 27, 28, 29, 37, 38, 41, 42, 43, 45, 55, 56, 63, 64, 68  | 28.57 %
Total                      | 70           |                                                                              | 100 %

Building a Table of Specifications:
1. Obtaining a list of instructional objectives
2. Outlining the course content
3. Preparing a two-way chart
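The arithmetic behind the percentage column above can be sketched in a few lines of Python; the area names and item counts are taken from the sample TS, and the rest is just proportion arithmetic:

```python
# Sketch: compute each content area's share of a test from its item count.
# Counts are from the sample Table of Specifications above (70-item exam).
areas = {
    "History of Psychology": 15,
    "Branches of Psychology": 15,
    "Schools of Psychology": 20,
    "Research Methods": 20,
}

total_items = sum(areas.values())
percentages = {name: round(100 * count / total_items, 2)
               for name, count in areas.items()}
# e.g. 15 / 70 gives 21.43 %, and 20 / 70 gives 28.57 %
```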


Table of Specifications for a Summative Third-Grade Social Studies Test

1.3 Consultation with experts. At this point it is advisable to consult your thesis adviser or authorities who have the expertise to judge the representativeness or relevance of the entries made in your TS.

1.4 Item writing. At this stage you should know what types of items you are supposed to construct: the type of instrument, format, scaling, and scoring techniques.

STEP II: FACE VALIDATION
Face validity, the crudest type of validity, pertains to whether the test looks valid; that is, whether, on the face of the instrument, it looks like it can measure what you intend to measure. This type of validity cannot stand alone, especially in research at the graduate level.

2.1 Item inspection. Have the initial draft of the instrument inspected by a group of evaluators: the thesis adviser, test construction experts, and professionals whose specializations are related to the subject matter at hand. A simple inspection form lists each item number against the ratings SUITABLE, NOT SUITABLE, and NEEDS REVISION.

2.2 Inter-judge consistency. You may collate the data gathered from the evaluators for analysis. You have to look at the agreement, or consistency, of the judgments they made on each of the items.


STEP III: FIRST TRIAL RUN
At this stage you must already have a stencil of your first draft as a result of Steps I and II. Try out your test on a sample that is comparable to your target population or to your final sample. This tryout should be large enough to provide meaningful computations.

STEP IV: ITEM ANALYSIS
Both the reliability and validity of any test depend largely on the characteristics of the items. High validity and reliability can be built into an instrument in advance through item analysis. According to Likert, item analysis can be used as an objective check in determining whether the members of the group react differentially to the battery; that is, item analysis indicates whether those persons who fall toward one end of the attitude continuum on the battery do so on the particular statement, and vice versa.

THE U-L INDEX METHOD
Appropriate for tests whose criterion is measured along a continuous scale and whose individual items are scored right or wrong, or negative or positive.

Steps in the U-L Index Method
1. Score the tests and arrange them from lowest to highest based on the total scores.

Sample scores from a 10-item test, n = 30:
List of scores: 2, 9, 3, 8, 5, 9, 9, 5, 5, 2, 4, 3, 2, 4, 6, 4, 8, 6, 8, 2, 8, 2, 9, 1, 3, 8, 6, 4, 3, 5
Arranged from lowest to highest: 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 8, 8, 8, 8, 8, 9, 9, 9, 9


2. Separate the top 27 % and the bottom 27 % of the cases. 27 % of 30 is 8.1, or 8 cases (30 x .27 = 8.1).

Bottom 27 % (lowest 8 scores): 1, 2, 2, 2, 2, 2, 3, 3
Top 27 % (highest 8 scores): 8, 8, 8, 8, 9, 9, 9, 9

3. Prepare a tally sheet. Tally the number of cases from each group who got each item right, and then convert the tallies into frequencies.

ITEM NO.        1  2  3  4  5  6  7  8  9  10
UPPER 27 % (f)  8  5  8  6  8  7  6  5  8  7
LOWER 27 % (f)  2  2  1  1  2  3  1  1  2  2


4. Compute the proportion of cases in each group who got each item right:

U or L = f / n

where n is the number of cases in each 27 % group (here, n = 8).

ITEM NO.       1     2     3     4     5     6     7     8     9     10
UPPER 27 % f   8     5     8     6     8     7     6     5     8     7
UPPER 27 % p  1.00   .63  1.00   .75  1.00   .88   .75   .63  1.00   .88
LOWER 27 % f   2     2     1     1     2     3     1     1     2     2
LOWER 27 % p   .25   .25   .13   .13   .25   .38   .13   .13   .25   .25

5. Compute the discrimination index of each item. The discrimination index refers to the degree to which an item differentiates correctly among test takers in the behavior that the test is designed to measure. Thus, a good test item separates the bright from the poor respondents.

Ds = Pu - Pl

where:
Ds is the discrimination index
Pu is the proportion of the upper 27 %
Pl is the proportion of the lower 27 %

ITEM NO.   1     2     3     4     5     6     7     8     9     10
Pu        1.00   .63  1.00   .75  1.00   .88   .75   .63  1.00   .88
Pl         .25   .25   .13   .13   .25   .38   .13   .13   .25   .25
Ds         .75   .38   .87   .62   .75   .50   .62   .50   .75   .63

6. Compute the difficulty index of each item. The difficulty index is the percentage of the respondents who got the item right. It can also be interpreted as how easy or how difficult an item is.

Df = (Pu + Pl) / 2

where:
Df is the difficulty index
Pu is the proportion of the upper 27 %
Pl is the proportion of the lower 27 %

ITEM NO.   1     2     3     4     5     6     7     8     9     10
Pu        1.00   .63  1.00   .75  1.00   .88   .75   .63  1.00   .88
Pl         .25   .25   .13   .13   .25   .38   .13   .13   .25   .25
Df         .63   .44   .57   .44   .63   .63   .44   .38   .63   .57

7. Decide whether to retain each item based on two ranges: items with difficulty indices within .20 to .80 and discrimination indices within .30 to .80 are retained.
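Steps 4 through 7 of the U-L method can be sketched in Python. The frequencies are those tallied in Step 3; note the assumption, consistent with the usual U-L procedure, that proportions divide by the size of each 27 % group (8 cases), not by the whole sample:

```python
# U-L index method: discrimination (Ds) and difficulty (Df) per item.
upper_f = [8, 5, 8, 6, 8, 7, 6, 5, 8, 7]   # correct answers, upper 27 %
lower_f = [2, 2, 1, 1, 2, 3, 1, 1, 2, 2]   # correct answers, lower 27 %
n_group = 8                                 # cases per 27 % group (27 % of 30)

results = []
for fu, fl in zip(upper_f, lower_f):
    pu, pl = fu / n_group, fl / n_group
    ds = pu - pl              # discrimination index: Ds = Pu - Pl
    df = (pu + pl) / 2        # difficulty index: Df = (Pu + Pl) / 2
    # Retention rule stated above: Df within .20-.80 and Ds within .30-.80.
    retained = 0.20 <= df <= 0.80 and 0.30 <= ds <= 0.80
    results.append((ds, df, retained))
# Item 1, for example: Ds = 0.75, Df = 0.625, so it is retained.
```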


The Chung-Teh Fan item analysis table can be used to interpret the discrimination indices of the items:
.40 and above - very good item
.30 to .39 - reasonably good item, but possibly subject to improvement
.20 to .29 - marginal item, usually needing improvement
.19 and below - poor item, to be rejected, improved, or revised

Difficulty indices can be interpreted as follows:
.00 to .20 - very difficult
.21 to .80 - moderately difficult
.81 to 1.00 - very easy

CRITERION OF INTERNAL CONSISTENCY
Somewhat similar to the U-L Index Method, in that two criterion groups, the high group and the low group, are employed to judge the discriminatory power of an item. However, in this method, Likert recommends the use of the high 10 percent and the low 10 percent groups.

Steps in the Criterion of Internal Consistency Method
1. List all the scores of the respondents and get the high 10 percent and the low 10 percent. Write their respective scores for each item.

Sample scores from a 10-item test using a 4-point scale, n = 50 (high 10 % = respondents A-E; low 10 % = respondents F-J):

Respondent | Items: 1  2  3  4  5  6  7  8  9  10
High 10 %
A          |        4  4  4  4  4  3  4  2  4  3
B          |        4  2  4  2  2  4  4  2  4  3
C          |        4  4  4  4  4  4  4  1  3  3
D          |        4  4  4  4  2  4  4  2  4  4
E          |        3  4  4  3  4  2  2  2  4  4
Low 10 %
F          |        4  1  1  2  3  2  3  4  3  2
G          |        4  1  2  2  4  2  2  3  3  2
H          |        3  2  2  1  3  3  1  4  2  2
I          |        3  2  2  1  4  2  3  4  2  2
J          |        3  2  2  2  4  1  3  3  1  2


2. Get the summation of the high group and the low group for each item. Using the sample scores above:

Item              | 1   2   3   4   5   6   7   8   9   10
Sum of high group | 19  18  20  17  16  17  18  9   19  17
Sum of low group  | 17  8   9   8   18  10  12  18  11  10

3. Get the difference between the two groups for each item (sum of the high group minus sum of the low group):

Item              | 1   2   3   4   5   6   7   8   9   10
Sum of high group | 19  18  20  17  16  17  18  9   19  17
Sum of low group  | 17  8   9   8   18  10  12  18  11  10
Difference        | 2   10  11  9   -2  7   6   -9  8   7


4. To see the differences between items more clearly, ranking is helpful. The larger the difference, the better the item discriminates between the high and low groups:

Item       | 1   2   3   4   5    6    7   8   9   10
Difference | 2   10  11  9   -2   7    6   -9  8   7
Rank       | 8   2   1   3   9    5.5  7   10  4   5.5

Items 5 and 8, whose differences are negative, discriminate in the wrong direction and would be rejected.
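The four steps above can be sketched in Python. The two matrices are the high-group and low-group scores transcribed from the sample table, and the ranking follows the common convention of giving tied items their average rank:

```python
# Criterion of internal consistency (Likert): compare item scores of the
# high 10 % and low 10 % groups. Rows are respondents, columns are items.
high = [
    [4, 4, 4, 4, 4, 3, 4, 2, 4, 3],  # A
    [4, 2, 4, 2, 2, 4, 4, 2, 4, 3],  # B
    [4, 4, 4, 4, 4, 4, 4, 1, 3, 3],  # C
    [4, 4, 4, 4, 2, 4, 4, 2, 4, 4],  # D
    [3, 4, 4, 3, 4, 2, 2, 2, 4, 4],  # E
]
low = [
    [4, 1, 1, 2, 3, 2, 3, 4, 3, 2],  # F
    [4, 1, 2, 2, 4, 2, 2, 3, 3, 2],  # G
    [3, 2, 2, 1, 3, 3, 1, 4, 2, 2],  # H
    [3, 2, 2, 1, 4, 2, 3, 4, 2, 2],  # I
    [3, 2, 2, 2, 4, 1, 3, 3, 1, 2],  # J
]

sum_high = [sum(col) for col in zip(*high)]   # per-item sums, high group
sum_low = [sum(col) for col in zip(*low)]     # per-item sums, low group
diff = [h - l for h, l in zip(sum_high, sum_low)]

# Rank the differences, largest first; tied items share the average rank.
order = sorted(range(len(diff)), key=lambda i: -diff[i])
rank = [0.0] * len(diff)
i = 0
while i < len(order):
    j = i
    while j + 1 < len(order) and diff[order[j + 1]] == diff[order[i]]:
        j += 1
    avg = (i + j) / 2 + 1          # average of the tied rank positions
    for k in range(i, j + 1):
        rank[order[k]] = avg
    i = j + 1
```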

PEARSON PRODUCT-MOMENT CORRELATION METHOD
This item analysis technique is used for tests with continuous scaling of three (3) or more scale points. There is a total score, which serves as the X criterion, and an item score, which serves as the Y criterion. This is done for every item; therefore, if the draft consists of 60 items, there should be 60 correlation coefficients computed.


Steps in the Pearson Product-Moment Correlation Method
1. Find the X and Y scores, where X is each respondent's total score and Y is the item score.
2. Square all the X and Y scores.
3. Multiply each X by its Y.

Sample scores for item no. 1 in a 75-item test, n = 10:

Respondent | X    | Y  | X²     | Y²   | XY
A          | 30   | 4  | 900    | 16   | 120
B          | 43   | 5  | 1849   | 25   | 215
C          | 53   | 3  | 2809   | 9    | 159
D          | 45   | 4  | 2025   | 16   | 180
E          | 70   | 2  | 4900   | 4    | 140
F          | 45   | 3  | 2025   | 9    | 135
G          | 68   | 4  | 4624   | 16   | 272
H          | 48   | 5  | 2304   | 25   | 240
I          | 38   | 2  | 1444   | 4    | 76
J          | 45   | 4  | 2025   | 16   | 180
TOTAL      | 485  | 36 | 24905  | 140  | 1717

4. Given the above data, compute the Pearson r:

r_xy = [nΣxy - (Σx)(Σy)] / sqrt{ [nΣx² - (Σx)²] [nΣy² - (Σy)²] }

where:
r_xy = correlation between X and Y
Σx   = sum of the total scores
Σy   = sum of the item scores
Σxy  = sum of the products XY
Σx²  = sum of the squared total scores
Σy²  = sum of the squared item scores

A significant coefficient reflects a good item, while an insignificant coefficient reflects a poor item. Most researchers consider a coefficient of .30 and above as indicating a good item. To interpret the correlation coefficient values (r) obtained, the following classification may be applied:
+.00 to +.20 = negligible correlation
+.21 to +.40 = low or slight correlation
+.41 to +.70 = marked or moderate correlation
+.71 to +.90 = high correlation
+.91 to +.99 = very high correlation
+1.00 = perfect correlation
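A minimal sketch of the computation, using the sample data for item no. 1 above:

```python
import math

# Pearson product-moment item analysis for one item: X is each respondent's
# total score, Y is the item score (handout's sample: 75-item test, n = 10).
X = [30, 43, 53, 45, 70, 45, 68, 48, 38, 45]
Y = [4, 5, 3, 4, 2, 3, 4, 5, 2, 4]

n = len(X)
sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))   # sum of the products XY
sx2 = sum(x * x for x in X)              # sum of squared total scores
sy2 = sum(y * y for y in Y)              # sum of squared item scores

r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
```

For this sample the result is about r = -.24; by the rule of thumb above (.30 and above indicates a good item), this particular item would be judged poor.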


POINT-BISERIAL CORRELATION METHOD
This is applied to tests with a dichotomous scoring system (yes/no, right/wrong, improved/not improved). Unlike the Pearson product-moment method, the Y criterion is scored either 1 or 0.

USING TWO OR MORE TECHNIQUES
Basically, this is a combination of two or more item analysis techniques. Although item analysis is laborious, some researchers opt to play safe by going through this process, to ensure a more accurate quantitative judgment.

STEP V: SECOND TRIAL RUN OR FINAL TEST ADMINISTRATION
More often than not, the second trial run becomes the final run. This means that for the second trial run one may administer the draft resulting from the item analysis to one's final sample. Necessary adjustments can still be made before finally administering the instrument to the final sample.

STEP VI: EVALUATION OF THE TEST
After the final run, the test can now be evaluated statistically for its final validity and reliability.
6.1 Evaluation of reliability

THE SPLIT-HALF RELIABILITY
The most common technique for evaluating reliability is the odd-even split-half technique. This is done by splitting the test into two: the odd-numbered items as one half, and the even-numbered items as the other. Through the use of the Pearson product-moment correlation, the reliability of the half-instrument can be determined. The reliability coefficient of this type is often called a coefficient of internal consistency. Through the use of the Spearman-Brown prophecy formula, the reliability of the entire instrument can then be obtained.
r11 = 2r / (1 + r)

where:
r11 = reliability of the whole test
r   = reliability of the half test
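As a quick sketch of the step-up, a half-test coefficient of .60, for example, prophesies a whole-test reliability of .75:

```python
# Spearman-Brown prophecy formula: whole-test reliability from half-test r.
def spearman_brown(r_half):
    return 2 * r_half / (1 + r_half)

r11 = spearman_brown(0.60)   # half-test r = .60 steps up to about .75
```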

What would be the reliability of the whole test if the computed coefficient from the odd-even method is r = .63?

The Kuder-Richardson Formula 20 can also be used to determine the reliability of the entire test, and at the same time it avoids the problems that may arise in using the Spearman-Brown prophecy formula.

Steps in the Kuder-Richardson Formula 20

1. Check the test by giving 1 for every correct answer and 0 for every wrong answer, and get the frequency (f) of correct answers for each item. (n = 14 respondents, 10 items.)

Respondent | Items: 1  2  3  4  5  6  7  8  9  10 | Total
A          |        1  1  1  1  1  1  1  0  0  0  | 7
B          |        1  1  1  1  1  1  1  1  1  0  | 9
C          |        1  1  1  1  1  1  1  1  1  1  | 10
D          |        1  1  1  1  1  1  1  1  1  0  | 9
E          |        1  1  1  1  1  1  1  1  1  0  | 9
F          |        1  1  1  1  1  1  1  1  1  1  | 10
G          |        1  1  1  1  1  1  1  0  1  1  | 9
H          |        1  1  1  1  1  1  1  1  1  0  | 9
I          |        1  1  1  1  1  1  1  1  0  1  | 9
J          |        1  1  1  1  1  1  0  0  0  0  | 6
K          |        1  1  1  0  0  0  0  1  0  0  | 4
L          |        1  1  0  0  0  0  0  0  0  0  | 2
M          |        0  0  0  0  0  0  0  0  0  0  | 0
N          |        0  0  0  0  0  0  0  0  0  0  | 0
f          |        12 12 11 10 10 10 9  8  8  4  |

2. Find the proportion passing each item (pi) and then the proportion failing each item (qi). pi is computed by dividing the number of respondents who got the item correct by the total number of respondents, while qi is computed by subtracting the computed pi from 1.

pi = (no. of students with the correct answer) / (total number of respondents)
qi = 1 - pi

ITEM   f    pi    qi
1      12   .86   .14
2      12   .86   .14
3      11   .79   .21
4      10   .71   .29
5      10   .71   .29
6      10   .71   .29
7      9    .64   .36
8      8    .57   .43
9      8    .57   .43
10     4    .29   .71


3. Multiply pi and qi for each item, and get the sum.

ITEM   f    pi    qi    piqi
1      12   .86   .14   .1204
2      12   .86   .14   .1204
3      11   .79   .21   .1659
4      10   .71   .29   .2059
5      10   .71   .29   .2059
6      10   .71   .29   .2059
7      9    .64   .36   .2304
8      8    .57   .43   .2451
9      8    .57   .43   .2451
10     4    .29   .71   .2059
                 Σpiqi = 1.9509

4. Compute the variance (s²) of the test from the respondents' total scores (x):

x̄ = Σx / n

s² = Σ(x - x̄)² / (n - 1)

With the totals from Step 1 (Σx = 93, n = 14), x̄ ≈ 6.64 and s² ≈ 13.32.

5. Compute the Kuder-Richardson Formula 20:

rtt = [k / (k - 1)] [1 - (Σpiqi / s²)]

where:
rtt = reliability of the whole test
k   = number of items
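The whole KR-20 procedure can be sketched in Python. The item frequencies and respondent totals are those from Step 1; with these data the formula yields a reliability of about .95:

```python
# Kuder-Richardson Formula 20 from the handout's worked example:
# 14 respondents, k = 10 dichotomously scored items.
f = [12, 12, 11, 10, 10, 10, 9, 8, 8, 4]          # passing each item
totals = [7, 9, 10, 9, 9, 10, 9, 9, 9, 6, 4, 2, 0, 0]  # per-respondent scores

n = len(totals)                       # number of respondents
k = len(f)                            # number of items
p = [fi / n for fi in f]              # proportion passing each item
q = [1 - pi for pi in p]              # proportion failing each item
sum_pq = sum(pi * qi for pi, qi in zip(p, q))

mean = sum(totals) / n
s2 = sum((x - mean) ** 2 for x in totals) / (n - 1)   # variance of totals

rtt = (k / (k - 1)) * (1 - sum_pq / s2)
```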


THE TEST-RETEST RELIABILITY
This is also called the coefficient of stability. To calculate the coefficient, the test is administered twice to the same sample with a given time interval. The Pearson r is then calculated to determine the reliability of the instrument. The most critical problem in this technique is determining the correct time interval between the two testings; generally, two weeks or so is used.

PARALLEL-FORM OR ALTERNATE-FORM RELIABILITY
The coefficient of equivalence is computed by administering two parallel or equivalent forms of the test to the same group of individuals. This technique is also referred to as the method of equivalent forms, and the coefficient obtained is also called the coefficient of equivalence.
6.2 Evaluation of validity

CRITERION-RELATED VALIDITY
Criterion-related validity is a very common type of validity, and it is primarily statistical. It is a correlation between a set of scores, or some other predictor, and an external measure. This external measure is called the criterion. A correlation coefficient is then run between the two sets of measurements. In actual practice, several predictors are used; a multiple r is then computed between these predictors and the criterion. The difficulty usually met in this type of validity is in selecting or judging which criterion should be used to validate the measure at hand. It is also called predictive validity.

CONSTRUCT VALIDITY
Construct validity is determined by investigating the psychological qualities, traits, or factors measured by a test. It is often called concept validity because it is not really after a high validity coefficient but rather the theory and concept behind the test. Likewise, it involves discovering positive correlations between and among the variables/constructs that define the concept.


Discussion Point 5: Constructing Objective Test Items: Multiple-Choice Form

Objective test items are not limited to the measurement of simple learning outcomes. The multiple-choice item can measure at both the knowledge and understanding levels and is also free of many of the limitations of other forms of objective items. The multiple-choice item is generally recognized as the most widely applicable and useful type of objective test item. It more effectively measures many of the simple learning outcomes measured by the short-answer item, the true-false item, and the matching exercise, and it measures a variety of the more complex outcomes in the knowledge, understanding, and application areas. This flexibility, plus the higher quality of the items usually found in the multiple-choice form, has led to its extensive use in achievement testing.

CHARACTERISTICS OF MULTIPLE-CHOICE ITEMS
A multiple-choice item consists of a problem and a list of suggested solutions. The problem may be stated as a direct question or an incomplete statement and is called the stem of the item. The suggested solutions may include words, numbers, symbols, or phrases and are called alternatives (also called choices or options). The pupil is typically requested to read the stem and the list of alternatives and to select the one correct, or best, alternative. The correct alternative in each item is called simply the answer, and the remaining alternatives are called distracters (also called decoys or foils). These incorrect alternatives receive their name from their intended function: to distract those pupils who are in doubt about the correct answer.

Whether to use a direct question or an incomplete statement in the stem depends on several factors. The direct-question form is easier to write, is more natural for younger pupils, and is more likely to present a clearly formulated problem.
On the other hand, the incomplete statement is more concise, and if skillfully phrased, it too can present a well-defined problem. A common procedure is to state each stem first as a direct question, shifting to the incomplete-statement form only when the clarity of the problem can be retained and greater conciseness achieved.


Examples:
Direct-question form
Which one of the following cities is the capital of the Philippines?
a. Manila
b. Parañaque
c. Pasay
d. Taguig

Incomplete-statement form
The capital of the Philippines is ______.
a. Manila
b. Parañaque
c. Pasay
d. Taguig

Examples:
Best-answer type
Which one of the following factors contributed most to the selection of Manila as the capital of the Philippines?
a. Central location
b. Good climate
c. Good highways
d. Large population

The best-answer type of multiple-choice item tends to be more difficult than the correct-answer type. This is due partly to the finer discriminations called for and partly to the fact that such items are used to measure more complex learning outcomes. The best-answer type is especially useful for measuring learning outcomes that require the understanding, application, or interpretation of factual information.

USES OF MULTIPLE-CHOICE ITEMS
The multiple-choice item is the most versatile type of test item available. It can measure a variety of learning outcomes from simple to complex, and it is adaptable to most types of subject-matter content. The uses below show only its function in measuring some of the more common learning outcomes in the knowledge, understanding, and application areas; more complex outcomes can be measured using modified forms of the multiple-choice item.

Measuring Knowledge Outcomes
Knowledge of Terminology. A simple but basic learning outcome measured by the multiple-choice item is knowledge of terminology. For this purpose, pupils can be requested to show their knowledge of a particular term by selecting a word that has the same meaning as the given term or by choosing a definition of the term. Special uses of a term can also be measured by having pupils identify the meaning of the term when used in context.


Knowledge of Specific Facts. Another learning outcome basic to all school subjects is knowledge of specific facts. It is important in its own right, and it provides a necessary basis for developing understanding, thinking skills, and other complex learning outcomes. Multiple-choice items designed to measure specific facts can take many different forms, but questions of the who, what, when, and where variety are most common.


Knowledge of Principles. Knowledge of principles is also an important learning outcome in most school subjects. Multiple-choice items can be constructed to measure knowledge of principles as easily as those designed to measure knowledge of specific facts. The items appear a bit more difficult, but this is because principles are more complex than isolated facts.

Knowledge of Methods and Procedures. Another common learning outcome readily adaptable to the multiple-choice form is knowledge of methods and procedures. In some cases we might want to measure knowledge of procedures before we permit pupils to practice in a particular area (e.g., laboratory procedures). In other cases, knowledge of methods and procedures may be an important learning outcome in its own right (e.g., knowledge of governmental procedures).


Measuring Outcomes at the Understanding and Application Levels
Many teachers limit the use of multiple-choice items to the knowledge area because they believe that all objective-type items are restricted to the measurement of relatively simple learning outcomes. Although this is true of most of the other types of objective items, the multiple-choice item is especially adaptable to the measurement of more complex learning outcomes. In reviewing the following items, it is important to keep in mind that such items measure learning outcomes beyond factual knowledge only if the applications and interpretations are new to the pupils. Any specific application or interpretation of knowledge can, of course, be taught directly to the pupils, as any other fact is taught. When this is done, and the test item contains the same problem situations and solutions used in teaching, it is obvious that the pupils can be given credit for no more than mere retention of factual knowledge. To measure understanding and application, an element of novelty must be included in the test items.


Ability to Identify Applications of Facts and Principles. A common method of determining whether pupils' learning has gone beyond the mere memorization of a fact or principle is to ask them to identify its correct application in a situation that is new to them.

Ability to Interpret Cause-and-Effect Relationships. Understanding can frequently be measured by asking pupils to interpret various relationships among facts. One of the most important relationships in this regard, and one common to most subject-matter areas, is the cause-and-effect relationship. Understanding of such relationships can be measured by presenting pupils with a specific cause-and-effect relationship and asking them to identify the reason that best accounts for it.

Ability to Justify Methods and Procedures. Another phase of understanding important in various subject-matter areas is concerned with methods and procedures. A pupil might know the correct method or sequence of steps in carrying out a procedure without being able to explain why it is the best method or sequence of steps. At the understanding level we are interested in the pupil's ability to justify the use of a particular method or procedure. This can be measured with multiple-choice items by asking pupils to select the best of several possible explanations of a method or procedure.


Advantages and Limitations of Multiple-Choice Items

The multiple-choice item is one of the most widely applicable test items for measuring achievement. It can effectively measure various types of knowledge and complex learning outcomes. In addition to this flexibility, it is free from some of the shortcomings characteristic of the other item types. The ambiguity and vagueness that frequently are present in the short-answer item are avoided because the alternatives better structure the situation. The short-answer item can be answered in many different ways, but the multiple-choice item restricts the pupil's response to a specific area.

Poor: Jose Rizal was born in _______.
Better: Jose Rizal was born in
A. Cavite
B. Laguna
C. Manila
D. Quezon

One advantage of the multiple-choice item over the true-false item is that pupils cannot receive credit for simply knowing that a statement is incorrect; they must also know what is correct.

True-false: T F The degree to which a test measures what it purports to measure is reliability.
Multiple-choice: The degree to which a test measures what it purports to measure is
A. Objectivity
B. Reliability
C. Standardization
D. Validity

Another advantage of the multiple-choice item over the true-false item is its greater reliability per item. Because the number of alternatives is increased from two to four or five, the opportunity for guessing the correct answer is reduced, and reliability is correspondingly increased. The effect of increasing the number of alternatives for each item is similar to that of increasing the length of the test. Using the best-answer type of multiple-choice item also circumvents a difficulty associated with the true-false item: obtaining statements that are true or false without qualification. This makes it possible to measure learning outcomes in the numerous subject-matter areas in which solutions to problems are not absolutely true or false but vary in degree of appropriateness (e.g., best method, best reason, best interpretation).

Another advantage of the multiple-choice item over the matching exercise is that the need for homogeneous material is avoided. The matching exercise, which is essentially a modified form of the multiple-choice item, requires a series of related ideas to form the list of premises and alternative responses. In many content areas it is difficult to obtain enough homogeneous material to prepare effective matching exercises.
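The effect of the number of alternatives on blind guessing can be illustrated with a short calculation. This is only an illustrative sketch; the item counts and alternative counts below are hypothetical:

```python
# With k equally plausible alternatives, the probability of guessing an
# item correctly is 1/k, so the expected chance score on an n-item test
# drops as k grows.

def expected_chance_score(n_items: int, n_alternatives: int) -> float:
    """Expected number of items answered correctly by blind guessing."""
    return n_items / n_alternatives

# A 40-item true-false test (k = 2) versus a 40-item
# four-alternative multiple-choice test (k = 4):
print(expected_chance_score(40, 2))  # 20.0 items by chance alone
print(expected_chance_score(40, 4))  # 10.0 items by chance alone
```

Halving the expected chance score in this way is the sense in which adding alternatives acts like lengthening the test.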

Two other desirable characteristics of the multiple-choice item are worthy of mention. First, it is relatively free from response sets; that is, pupils generally do not favor a particular alternative when they do not know the answer. Second, using a number of plausible alternatives makes the results amenable to diagnosis: the kinds of incorrect alternatives pupils select provide clues to factual errors and misunderstandings that need correction.

The wide applicability of the multiple-choice item, plus these advantages, makes it easier to construct high-quality test items in this form than in any of the other forms. This does not mean that good multiple-choice items can be constructed without effort. But for a given amount of effort, multiple-choice items will tend to be of higher quality than short-answer, true-false, or matching-type items in the same area. Despite its superiority, the multiple-choice item does have limitations.

1. As with all other paper-and-pencil tests, it is limited to learning outcomes at the verbal level. The problems presented to pupils are verbal problems, free from the many irrelevant factors present in natural situations. Also, the applications pupils are asked to make are verbal applications, free from the personal commitment necessary for application in natural situations. In short, the multiple-choice item, like other paper-and-pencil tests, measures whether the pupil knows or understands what to do when confronted with a problem situation, but it cannot determine how the pupil actually will perform in that situation.

2. As with other types of selection items, the multiple-choice item requires selection of the correct answer, and therefore it is not well adapted to measuring some problem-solving skills in mathematics and science, or to measuring the ability to organize and present ideas.

3. The multiple-choice item has a disadvantage not shared by the other item types: the difficulty of finding a sufficient number of incorrect but plausible distracters. This problem is especially acute at the early primary level because of the pupil's limited vocabulary and knowledge in any particular area. Even at this level, however, classroom teachers have been creative in adapting the multiple-choice item to the measurement of newly learned concepts. As pupils move up through the grade levels and expand their vocabulary, knowledge, and understanding, plausible but incorrect answers become more available. It still takes a touch of creativity, however, to identify and state the most plausible distracters for use in multiple-choice items. This is the task that separates the good from the poor item writer. Fortunately, it gets easier with experience in constructing such items.


Suggestions for Constructing Multiple-Choice Items

The general applicability and the superior qualities of multiple-choice test items are realized most fully when care is taken in their construction. This involves formulating a clearly stated problem, identifying plausible alternatives, and removing irrelevant clues to the answer. The following suggestions provide more specific maxims for this purpose.

1. The stem of the item should be meaningful by itself and should present a definite problem. Often the stems of test items placed in multiple-choice form are incomplete statements that make little sense until all the alternatives have been read. These are not multiple-choice items but, rather, a collection of true-false statements placed in multiple-choice form. A properly constructed multiple-choice item presents a definite problem in the stem that is meaningful without the alternatives. Formulating a definite problem in the stem not only improves the stem of the item, but it also has a desirable effect on the alternatives: heterogeneity is possible when the stem lacks structure, whereas a clearly formulated problem in the stem forces the alternatives to be more homogeneous. A good check on the adequacy of the problem statement is to cover the alternatives and read the stem by itself. It should be complete enough to serve as a short-answer item. Stating each item stem as a direct question, and shifting to the incomplete-statement form only when greater conciseness is possible, is the most effective method for obtaining a clearly formulated problem.

2. The item stem should include as much of the item as possible and should be free of irrelevant material. This will increase the probability of a clearly stated problem in the stem and will reduce the reading time required. Conciseness is increased both by removing irrelevant material and by including in the stem any words that would otherwise be repeated in each alternative.

There are a few exceptions to this rule. In testing problem-solving ability, irrelevant material might be included in the stem of an item to determine whether pupils can identify and select the material that is relevant to the problem's solution. Similarly, repeating common words in the alternatives is sometimes necessary for grammatical consistency or greater clarity.

3. Use a negatively stated item stem only when significant learning outcomes require it. Most problems can and should be stated in positive terms. This avoids the possibility of pupils overlooking the no, not, least, and other similar words used in negative statements. In most instances, it also avoids measuring relatively insignificant learning outcomes. Teachers sometimes go to extremes to use negatively stated items because they appear more difficult. The difficulty of such items, however, lies in the lack of sentence clarity rather than in any greater difficulty of the concept being measured. Although negatively stated items are generally to be avoided, there are occasions when they are useful, mainly in areas in which the wrong information or wrong procedure can have dire consequences. When used, the negative aspect of the item should be made obvious.

4. All of the alternatives should be grammatically consistent with the stem of the item. This rule is not presented merely to perpetuate proper grammar usage, however; its main function is to prevent irrelevant clues from creeping in. All too frequently the grammatical consistency of the correct answer is given attention, but that of the distracters is neglected. As a result, some alternatives are grammatically inconsistent with the stem and are therefore obviously incorrect answers.

5. An item should contain only one correct or clearly best answer. Including more than one correct answer in a test item and asking pupils to select all of the correct alternatives has two shortcomings. First, such items are usually no more than a collection of true-false items presented in multiple-choice form: they do not present a definite problem in the stem, and the selection of answers requires a mental response of true or false to each alternative rather than a comparison and selection of alternatives. Second, because the number of alternatives selected as correct answers varies from one pupil to another, there is no satisfactory method of scoring.

Poor: Which one of the following is a source of heat for home use?
A. Coal
B. Electricity
C. Gas
D. Oil
Better: In the provinces, which one of the following is the most economical source of heat for home use?
A. Coal
B. Electricity
C. Gas
D. Oil

6. Items used to measure understanding should contain some novelty, but beware of too much. The construction of multiple-choice items that measure learning outcomes at the understanding level requires a careful choice of situations and skillful phrasing. The situations must be new to the pupils, but not too far removed from the examples used in class. If the test items contain problem situations identical with those used in class, the pupils can, of course, respond on the basis of memorized answers. On the other hand, if the problem situation contains too much novelty, some pupils may respond incorrectly merely because they lack the necessary factual information about the situations used. The problem of too much novelty can usually be avoided by selecting situations from pupils' everyday experiences, by including in the stem of the item any factual information needed, and by phrasing the item so that the type of application or interpretation called for is clear.

7. All distracters should be plausible. The purpose of a distracter is to distract the uninformed away from the correct answer. To the pupil who has not achieved the learning outcome being tested, the distracters should be at least as attractive as the correct answer, and preferably more so. In a properly constructed multiple-choice item, each distracter will be selected by some pupils. If a distracter is not selected by anyone, it is not contributing to the functioning of the item and should be eliminated or revised. One factor contributing to the plausibility of distracters is their homogeneity. If all of the alternatives are homogeneous with regard to the knowledge being measured, the distracters are more likely to function as intended. Whether alternatives appear homogeneous and distracters plausible, however, also depends on the pupils' age level.

In selecting plausible distracters, the pupils' learning experiences must not be ignored.
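The advice that every distracter should attract at least some pupils can be checked after a test is administered by tallying how often each alternative was chosen. A minimal sketch, with invented response data and an invented answer key:

```python
from collections import Counter

def distracter_report(responses, key, alternatives="ABCD"):
    """Tally alternative choices for one item and flag distracters
    that no pupil selected (candidates for revision or removal)."""
    counts = Counter(responses)
    unused = [alt for alt in alternatives
              if alt != key and counts[alt] == 0]
    return counts, unused

# Hypothetical responses of 10 pupils to one item keyed "B":
responses = ["B", "A", "B", "C", "B", "B", "A", "B", "C", "B"]
counts, unused = distracter_report(responses, key="B")
print(counts)   # how many pupils picked each alternative
print(unused)   # ['D'] -- distracter D attracted no one
```

In practice this tally would be run for every item on the test, and any flagged distracter rewritten before the item is reused.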


8. Verbal associations between the stem and the correct answer should be avoided. Frequently a word in the correct answer will provide an irrelevant clue because it looks or sounds like a word in the stem of the item. Such verbal associations should never permit the pupil who lacks the necessary achievement to select the correct answer. However, words similar to those in the stem might be included in the distracters to increase their plausibility.

9. The relative length of the alternatives should not provide a clue to the answer.

10. The correct answer should appear in each of the alternative positions an approximately equal number of times, but in random order.

11. Use special alternatives such as none of the above or all of the above sparingly. The phrases none of the above and all of the above are sometimes added as the last alternative in multiple-choice items. This is done to force the pupil to consider all of the alternatives carefully and to increase the difficulty of the items. All too frequently, however, these special alternatives are used inappropriately.

The use of none of the above is restricted to the correct-answer type of multiple-choice item and consequently to the measurement of factual knowledge to which absolute standards of correctness can be applied. It is inappropriate in best-answer items, because the pupil is told to select the best of several alternatives of varying degrees of correctness. Use of none of the above is frequently recommended for items measuring computational skill in mathematics and spelling ability. But these learning outcomes should generally not be measured by multiple-choice items, because they can be measured more effectively by short-answer items. The use of all of the above is loaded with so many difficulties that it might well be discarded as a possible alternative.

12. Do not use multiple-choice items when other item types are more appropriate.

13. Break any of these rules when you have a good reason for doing so. Although these rules provide valuable guidelines for constructing multiple-choice items, there are instances when an exception to a rule may improve the item.
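Rule 10 above (the correct answer appearing in each position an approximately equal number of times, in random order) is easy to satisfy by shuffling each item's alternatives while tracking where the key lands. A sketch, using the Jose Rizal item from earlier as an invented example:

```python
import random

def shuffle_alternatives(stem, alternatives, correct):
    """Shuffle the alternatives of one item and report the letter
    position where the correct answer ended up."""
    shuffled = alternatives[:]          # copy so the original is untouched
    random.shuffle(shuffled)
    key_letter = "ABCD"[shuffled.index(correct)]
    return stem, shuffled, key_letter

stem = "Jose Rizal was born in"
alternatives = ["Cavite", "Laguna", "Manila", "Quezon"]
stem, shuffled, key = shuffle_alternatives(stem, alternatives, correct="Laguna")
print(shuffled, "key:", key)  # the key's position varies from run to run
```

Applying this to every item when assembling the test keeps the answer key free of position patterns that test-wise pupils could exploit.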


Discussion Point 6: Rubrics Development


Definition: A printed set of guidelines that distinguishes performances or products of quality. A rubric has descriptions that define what to look for at each level of performance. Rubrics also often have indicators providing specific examples or tell-tale signs of things to look for in work. As an assessment tool, it measures student performance or output (product) based on real-life criteria.

Educational Benefits of Rubrics
1. Easy to use and simple to understand.
2. Good alternative forms of evaluating student performance or output.
3. Give students clear guidelines regarding teacher expectations.
4. Provide students with feedback regarding weaknesses and strengths, thus enabling them to develop their skills.
5. Weighted rubrics are particularly advantageous because they clearly reflect which parts of the task or project are more important for students to learn.
6. Provide focus and emphasis to particular details.
7. May be reused.

Types of Rubrics
1. Holistic rubric. Holistic scoring is more global and does little to separate the tasks in any given product; rather, it views the final product as a set of interrelated tasks contributing to the whole. Anchor points are used to assign value to descriptions of products or performances that contribute to the whole. One score provides an overall impression of ability on any given product or work. Disadvantage: it does not provide detailed information about student performance in specific areas of content or skills.

Homework Rubric:
4: WOW! Exceptional work!
3: Complete (neat and easy to read; has date and name; on time)
2: Incomplete (directions not followed; difficult to read; has name but missing date; may be on time)
1: Incomplete (unorganized and/or difficult to read; missing name and date; late)
0: Not done

2. Analytical rubric. Analytic scoring breaks down the objective or final product into component parts, and each part is scored independently. In this case, the total score is the sum of the ratings for all of the parts being evaluated. A rubric may be qualitative or quantitative. It is qualitative if its sole purpose is to provide feedback to the students. If the teacher intends to use a rubric in giving grades or scores, a quantitative rubric is appropriate. The teacher may assign scores or weights to the different gradations of performance for each particular task or criterion.
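Analytic scoring as described above, with each part rated independently and the total taken as the (optionally weighted) sum, can be sketched in a few lines. The criteria, weights, and ratings here are invented purely for illustration:

```python
def weighted_rubric_score(ratings, weights):
    """Total score = sum of each criterion's rating times its weight."""
    return sum(ratings[criterion] * weights[criterion] for criterion in ratings)

# Hypothetical essay rubric: each criterion rated 1-4, with content
# weighted most heavily to signal its importance to students.
weights = {"content": 3, "organization": 2, "mechanics": 1}
ratings = {"content": 4, "organization": 3, "mechanics": 2}
print(weighted_rubric_score(ratings, weights))  # 4*3 + 3*2 + 2*1 = 20
```

The weights make the rubric quantitative in the sense used above: they convert qualitative gradations into a grade while still showing pupils which part of the task matters most.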


Elements Common to All Analytic Rubrics: criteria (listed down the side) and gradations of performance (e.g., Advanced, Proficient, Basic) across the top.

RUBRIC DEVELOPMENT PROTOCOL
1. Define the learning outcome or objective that students are expected to achieve. Consider: impact, work quality, methods, content, knowledge.
2. Determine how to describe each level; use anchor products that represent performances that can be evaluated as high quality, average, and low.
2.a. Generate a number of potential dimensions to use. If the task has these elements, then consider these as possible dimensions:
- Oral presentation: voice projection, body language, grammar and pronunciation, organization
- PowerPoint or other media: technical quality, aesthetics, grammar and spelling
- Analysis (scientific or otherwise): data gathering and analysis, inferences made
- Judgment: adequacy of elements considered, articulation of ranking criteria
3. Use concept words that convey various degrees of performance: depth, breadth, quality; accuracy; presence to absence; complete to incomplete; many to some to none; major to minor; consistent to inconsistent.

Technical Requirements of Rubrics
1. Continuous
2. Parallel
3. Coherent
4. Aptly weighted
5. Valid
6. Reliable

Sample analytic rubric fragment:
Task: Students will use reference materials to complete assignments.
Goal: Participation meets stated criteria.
Dictionary criterion (rated 1 to 4): uses guide words; locates words independently; understands dictionary abbreviations.
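The technical requirement that a rubric be parallel (every criterion described at every gradation level) lends itself to a mechanical check when the rubric is drafted as a table. A minimal sketch, with an invented dictionary-skills rubric; the descriptor strings are placeholders:

```python
def is_parallel(rubric, levels=("Advanced", "Proficient", "Basic")):
    """True if every criterion has a descriptor for every gradation level."""
    return all(set(descriptors) == set(levels)
               for descriptors in rubric.values())

# Hypothetical rubric: each criterion maps gradation levels to descriptors.
rubric = {
    "Uses guide words": {"Advanced": "...", "Proficient": "...", "Basic": "..."},
    "Locates words":    {"Advanced": "...", "Proficient": "..."},  # Basic missing
}
print(is_parallel(rubric))  # False: "Locates words" lacks a Basic descriptor
```

A similar pass could check the other technical requirements that are structural, such as continuity of the gradation scale, before the rubric is handed to students.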


