
Catalina Valverde Ovares

Aspects to consider when writing a good test


Teachers assess students to determine how well they have achieved the unit outcomes or mastered certain topics. Through assessment, teachers can provide feedback that helps students improve their performance. Assessment should motivate students and support their learning; it also provides evidence of the achievement of learning outcomes and enables students' work to be graded. When writing a good test, teachers have to take into consideration the five testing criteria for assessment:

1. PRACTICALITY
A practical test:
- is not excessively expensive
- stays within appropriate time constraints
- is relatively easy to administer
- has a scoring procedure that is specific and time-efficient

For a test to be practical:
- details should be clearly established before the test
- students should be able to complete the test within the set time frame
- the test should be able to be administered without any problems
- all materials and equipment should be ready
- the cost of the test should stay within the budgeted limit
- the scoring system should be practical within the teacher's time frame
- methods for reporting results should be determined in advance

2. RELIABILITY
A reliable test is consistent and dependable. The issue of a test's reliability is best addressed by considering the factors that may contribute to its unreliability. Consider the following possibilities:
- fluctuations in the student (student-related reliability)
- fluctuations in scoring (rater reliability)
- fluctuations in test administration (test administration reliability)
- fluctuations in the test itself (test reliability)

Student-Related Reliability
Temporary illness, fatigue, a bad day, anxiety, and other physical or psychological factors may make an observed score deviate from one's true score. A test-taker's strategies for efficient test taking also fall into this category.

Rater Reliability
Human error, subjectivity, lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases may enter into the scoring process.
- Inter-rater unreliability occurs when two or more scorers yield inconsistent scores for the same test.
- Intra-rater unreliability is a common occurrence for classroom teachers because of unclear scoring criteria, fatigue, bias toward particular "good" and "bad" students, or simple carelessness.

One solution to such intra-rater unreliability is to read through about half of the tests before assigning any final scores or grades, then to recycle back through the whole set of tests to ensure an even-handed judgment. The careful specification of an analytical scoring instrument can also increase rater reliability.

Test Administration Reliability
Unreliability may also result from the conditions in which the test is administered: street noise, photocopying variations, poor lighting, variations in temperature, and the condition of desks and chairs, for example.

Test Reliability
Sometimes the nature of the test itself can cause measurement errors:
- Timed tests may discriminate against students who do not perform well under a time limit.
- Poorly written test items (ambiguous, or with more than one correct answer) may be a further source of test unreliability.
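To make the idea of rater reliability concrete, here is a minimal sketch (not from the original text) of one common way to quantify inter-rater agreement: correlating the scores two raters assign to the same set of tests. The scores are invented, and the Pearson correlation is only one of several possible agreement statistics.

```python
# Illustrative sketch: quantifying inter-rater agreement as the Pearson
# correlation between the scores two raters give the same set of tests.
# All scores below are invented for the example.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Hypothetical essay scores (0-100) given by two raters to ten students.
rater_a = [78, 85, 62, 90, 71, 88, 55, 93, 67, 80]
rater_b = [75, 88, 60, 92, 70, 85, 58, 95, 65, 78]

r = pearson_r(rater_a, rater_b)
print(f"Inter-rater correlation: {r:.2f}")  # values near 1.0 suggest consistent scoring
```

A low value would signal that the scoring criteria need clarification before the grades can be trusted.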

3. VALIDITY
Arguably, validity is the most important principle: the extent to which the assessment requires students to perform tasks that were included in the previous classroom lessons. The validity of a test is established by several characteristics:
- There is no final, absolute measure of validity, but several different kinds of evidence may be invoked in its support.
- In some cases it may be appropriate to examine the extent to which a test calls for performance that matches that of the course or unit of study being tested.
- In other cases we may be concerned with how well a test determines whether or not students have reached an established set of goals or level of competence.
- In still other cases it may be appropriate to study statistical correlation with other related but independent measures.
- Other concerns about a test's validity may focus on the consequences of a test, beyond measuring the criteria themselves, or even on the test-taker's perception of validity.

Below is a closer look at the five types of evidence.

Content Validity
If a test requires the test-taker to perform the behavior that is being measured, it can claim content-related evidence of validity, often popularly referred to as content validity. For example, if you are trying to assess a person's ability to speak a second language in a conversational setting, asking the learner to answer paper-and-pencil multiple-choice questions requiring grammatical judgments does not achieve content validity. In contrast, a test that requires the learner actually to speak within some sort of authentic context does. Additionally, for content validity to be achieved, a test should meet the following conditions:
- Classroom objectives should be identified and appropriately framed. The first measure of an effective classroom test is the identification of objectives.
- Lesson objectives should be represented in the form of test specifications. In other words, a test should have a structure that follows logically from the lesson or unit you are testing.

If you clearly perceive the performance of test-takers as reflective of the classroom objectives, then you can argue that content validity has probably been achieved. Another way of understanding content validity is to consider the difference between direct testing (which involves the test-taker in actually performing the target task) and indirect testing (which involves the test-taker in performing not the target task itself, but a task related to it in some way). For example, when you test learners' oral production of syllable stress, having them mark stressed syllables in a list of written words is indirect testing, while requiring them to actually produce the target words orally is direct testing. Consequently, direct testing is the most reasonable way to achieve content validity in classroom assessment.

Criterion-Related Validity
Criterion-related validity examines the extent to which the criterion of the test has actually been achieved. For example, a classroom test designed to assess a point of grammar in communicative use will have criterion validity if test scores are corroborated either by observed subsequent behavior or by other communicative measures of the grammar point in question.

Criterion-related evidence usually falls into one of two categories:
- Concurrent validity: a test has concurrent validity if its results are supported by other concurrent performance beyond the assessment itself. For example, the validity of a high score on the final exam of a foreign language course will be substantiated by actual proficiency in the language.
- Predictive validity: the assessment criterion in such cases is not to measure concurrent ability but to assess (and predict) a test-taker's likelihood of future success. Predictive validity becomes important, for example, in the case of placement tests, language aptitude tests, and the like.
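Because criterion-related evidence is often established through statistical correlation with an independent measure, the hedged sketch below shows how concurrent validity might be checked in practice. The exam scores and external proficiency ratings are invented, and `statistics.correlation` requires Python 3.10 or later.

```python
# Illustrative sketch: concurrent validity checked by correlating
# final-exam scores with an independent measure of the same ability
# (e.g., ratings from an external proficiency interview).
# All numbers are invented; statistics.correlation needs Python 3.10+.
from statistics import correlation

final_exam = [88, 72, 95, 60, 81, 77, 90, 65]    # hypothetical final-exam scores
proficiency = [85, 70, 92, 58, 84, 75, 93, 62]   # hypothetical independent ratings

r = correlation(final_exam, proficiency)
print(f"Concurrent-validity correlation: {r:.2f}")
# A strong positive correlation supports the claim that the exam
# reflects actual proficiency in the language.
```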

Construct Validity
Virtually every issue in language learning and teaching involves theoretical constructs. In the field of assessment, construct validity asks, "Does this test actually tap into the theoretical construct as it has been identified?" Imagine that you have been given a procedure for conducting an oral interview. The scoring analysis for the interview includes several factors in the final score: pronunciation, fluency, grammatical accuracy, vocabulary use, and sociolinguistic appropriateness. The justification for these five factors lies in a theoretical construct that claims they are major components of oral proficiency. So if you were asked to conduct an oral proficiency interview that evaluated only pronunciation and grammar, you could be justifiably suspicious about the construct validity of that test. Or let's suppose you've created a simple written vocabulary quiz, covering the content of a recent unit, which asks students to correctly define a set of words. Your chosen items may be a perfectly adequate sample of what was covered in the unit, but if the lexical objective of the unit was the communicative use of vocabulary, then the writing of definitions certainly fails to match a construct of communicative language use.

Consequential Validity
Consequential validity encompasses all the consequences of a test, including such considerations as its accuracy in measuring intended criteria, its impact on the preparation of test-takers, its effect on the learner, and the (intended and unintended) social consequences of a test's interpretation and use. McNamara (2000, p. 54) cautions that test results may reflect socioeconomic conditions such as opportunities for coaching: only some families can afford coaching, and children with more highly educated parents can get help from them. Teachers should consider the effect of assessments on students' motivation, subsequent performance in a course, independent learning, study habits, and attitude toward school work.

Face Validity

Face validity refers to the degree to which a test "looks right" and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the test-takers; in other words, the students perceive the test to be valid. Face validity asks, "Does the test, on the face of it, appear from the learner's perspective to test what it is designed to test?" Keep the following in mind:
- Face validity is not something that can be empirically tested by a teacher or even by a testing expert; it depends on the subjective evaluation of the test-taker.
- A classroom test is not the time to introduce new tasks.
- If a test samples the actual content of what the learner has achieved or expects to achieve, face validity is more likely to be perceived.
- Content validity is a very important ingredient in achieving face validity.
- Students will generally judge a test to be face valid if the directions are clear, the structure of the test is organized logically, its difficulty level is appropriately pitched, the test has no surprises, and the timing is appropriate.
- To give an assessment procedure that is "biased for best," a teacher offers students appropriate review and preparation for the test, suggests strategies that will be beneficial, and structures the test so that the best students will be modestly challenged and the weaker students will not be overwhelmed.

4. AUTHENTICITY
In an authentic test:
- the language is as natural as possible
- items are as contextualized as possible
- topics and situations are interesting, enjoyable, and/or humorous
- some thematic organization is provided, such as through a story line or episode
- tasks represent real-world tasks
- reading passages are selected from real-world sources that test-takers are likely to have encountered or will encounter
- listening comprehension sections feature natural language with hesitations, white noise, and interruptions
- more and more tests offer items that are sequenced to form meaningful units, paragraphs, or stories

5. WASHBACK
Washback includes the effects of an assessment on teaching and learning prior to the assessment itself, that is, on preparation for the assessment.

- Informal performance assessment is by nature more likely to have built-in washback effects, because the teacher is usually providing interactive feedback.
- Formal tests can also have positive washback, but they provide no washback if students receive only a letter grade or a single overall numerical score.
- Classroom tests should serve as learning devices through which washback is achieved.
- Students' incorrect responses can become windows of insight into further work; their correct responses should be praised, especially when they represent accomplishments in a student's interlanguage.
- Washback enhances a number of basic principles of language acquisition: intrinsic motivation, autonomy, self-confidence, language ego, interlanguage, and strategic investment, among others.
- One way to enhance washback is to comment generously and specifically on test performance.
- Washback implies that students have ready access to the teacher to discuss the feedback and evaluation they have been given.
- Teachers can raise the washback potential by asking students to use test results as a guide to setting goals for their future effort.
