
DEVELOPING THE ASSESSMENT INSTRUMENT OF SPEAKING

A. Ghufran Ferdiant *
Tadris Bahasa Inggris STAIN Pamekasan

Abstract

Speaking means expressing ideas orally. By expressing what is in his/her mind, a
speaker can make others understand it. In order to make others grasp and
understand what he/she expresses orally, a student needs to pay attention to the
criteria that should be fulfilled. How can an instrument for assessing students'
speaking ability be developed? The writer used a qualitative research design to
describe the way to develop such an assessment instrument. The results showed
that developing a speaking test is not as easy as developing other tests, because a
test developer has to prepare the mechanism, directions, and instructions well in
order to keep the test valid; here the test developer used content validity to prove
that the test was valid. To establish reliability, the test developer used inter-rater
agreement and the Pearson product-moment formula. Content validity, inter-rater
agreement, and the Pearson product-moment formula proved suitable for assessing
a speaking test. This study should be useful for English teachers in increasing
students' speaking ability by assessing their capability in sound ways.

Key words: Developing, Assessment Instrument, and Speaking

Introduction

Naturally, students often think that the ability to speak a language is the product of language learning, but speaking is also a crucial part of the language learning process. Effective teachers/lecturers teach students speaking strategies, such as using minimal responses, recognizing scripts, and using language to talk about language, that students can use to help themselves expand their knowledge of the language and their confidence in using it. These teachers/lecturers help students learn to speak so that the students can use speaking to learn.

Language learners who lack self-confidence in their ability to participate successfully in oral interaction often listen in silence while others do the talking. One way to encourage such learners to begin to participate is to help them build up a stock of minimal responses that they can use in different types of exchanges. Such responses can be especially useful for beginners.

Minimal responses are predictable, often idiomatic phrases that conversation participants use to indicate understanding, agreement, doubt, and other responses to what another speaker is saying. Having a stock of such responses enables a learner to focus on what the other participant is saying, without having to simultaneously plan a
OKARA Journal of Languages and Literature, Vol. 1, Tahun 1, Mei 2016

response. In accordance with this explanation, White argues that the speaker supplies verbal and nonverbal symbols to the listeners, who receive and interpret them in terms of their own experiences, beliefs, knowledge, interests, and needs¹.

Some communication situations are associated with a predictable set of spoken exchanges. Greetings, apologies, compliments, invitations, and other functions that are influenced by social and cultural norms often follow patterns or scripts. So do the transactional exchanges involved in activities such as obtaining information and making a purchase. In these scripts, the relationship between a speaker's turn and the one that follows it can often be anticipated. Teachers/lecturers can help students develop speaking ability by making them aware of the scripts for different situations so that they can predict what they will hear and what they will need to say in response. Through interactive activities, instructors can give students practice in managing and varying the language that different scripts contain.

Language learners are often too embarrassed or shy to say anything when they do not understand another speaker or when they realize that a conversation partner has not understood them. Instructors can help students overcome this reticence by assuring them that misunderstanding and the need for clarification can occur in any type of interaction, whatever the participants' language skill levels. Instructors can also give students strategies and phrases to use for clarification and comprehension checks.

By encouraging students to use clarification phrases in class when misunderstanding occurs, and by responding positively when they do, instructors can create an authentic practice environment within the classroom itself. As they develop control of various clarification strategies, students will gain confidence in their ability to manage the various communication situations that they may encounter outside the classroom.

Speaking means expressing ideas orally. By expressing what is in his/her mind, a speaker can make others understand it. In order to make others grasp and understand what he/she expresses orally, a student needs to pay attention to certain requirements. First, he/she needs to have a piece of advice, a problem, or a particular topic in mind to convey to the listeners, whether it is to be understood or to be responded to. Without such a topic, there is no need for him/her to speak. According to Djiwandono, content, organization, and language must receive particular attention in speaking². If a speaker wants what he/she expresses orally to be understood by other people, he/she has to pay attention to these aspects. They are also needed as criteria for a speaking test.

1 E.E. White, Basic Public Speaking (New York: Macmillan Publishing Company, 1984), p. 19.
2 S.M. Djiwandono, Tes Bahasa (Jakarta: Indeks, 2008), p. 19.
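Djiwandono's three criteria can be made concrete as a small rubric record. This is a minimal sketch: only the three criteria come from the text, while the field layout and the 0-10 rating scale are illustrative assumptions.

```python
from dataclasses import dataclass

# The three criteria named above (Djiwandono): content, organization, language.
# The 0-10 scale and the class/field names are illustrative assumptions.
@dataclass
class SpeakingRating:
    content: float       # relevance of what is said to the given topic
    organization: float  # systematic arrangement of the ideas
    language: float      # grammar, pronunciation, and word choice

    def validate(self):
        # Reject any rating that falls outside the assumed 0-10 scale.
        for name, value in vars(self).items():
            if not 0 <= value <= 10:
                raise ValueError(f"{name} must be on the 0-10 scale, got {value}")

rating = SpeakingRating(content=8, organization=7, language=9)
rating.validate()  # passes: all three criteria rated on the assumed scale
```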
As with any other area of language assessment, the fundamental issues to be considered in a speaking assessment are: (a) whether or not the test is used as intended, and (b) what its consequences may be (Bachman & Purpura, in press). To ensure that the uses and consequences of a speaking test are fair, the operational definition of speaking ability in the testing context should be examined, since the definition of speaking ability varies with respect to the targeted use and the decisions made. One way to elicit the construct of speaking ability for a certain context is through a scoring rubric, which informs test users what a test aims to measure³. However, a scoring rubric can affect the speaking assessment, as there may be an interaction effect between the rating criteria and examinees' performance⁴. Different interpretations of the construct may cause biased effects on test-takers' performance, leading to unfairness in scoring and test use. Thus, careful examination of how rating scales interact with speaking performance is needed to determine the fairness of the speaking assessment.

The first issue in examining rating scales is whether the scores given based on the rating scale truly reflect the quality of the test participants' speaking performance. Douglas hypothesizes that quantitatively similar scores may not necessarily guarantee qualitatively similar speaking performance⁵. In order to test this hypothesis, the performance of six test participants in a semi-direct speaking test was rated for (a) grammar, (b) vocabulary, (c) fluency, and (d) content and rhetorical organization. The taped responses of test participants were transcribed for qualitative analysis, where the actual language produced by the test participants was described in terms of the four rating criteria. Both quantitative and qualitative analyses of test-takers' performance revealed a weak relationship between their quantitative scores based on the ratings and their language production analyzed qualitatively. Meiron and Schick also found that similar quantitative scores represented qualitatively different performance in a role-play simulation task⁶. In their study, the pre- and post-speaking performance of 25 participants in an EFL teacher training program was scored based on a five-category rubric (topic control, pronunciation, grammatical control, lexical control, and conversational control). Close examination of the performance of two test participants, one whose scores increased considerably from pre- to post-test, and the other who exhibited a very small increase,

3 Sari Luoma, Assessing Speaking (Cambridge, UK: Cambridge University Press, 2004).
4 Ibid.; T.F. McNamara, Measuring Second Language Performance (London: Longman, 1996).
5 D. Douglas, "Quantity and Quality in Speaking Test Performance," Language Testing 11 (1994): 125-44.
6 B. Meiron and L. Schick, "Ratings, Raters and Test Performance: An Exploratory Study," in A.J. Kunnan (ed.), Fairness and Validation in Language Assessment: Selected Papers from the 19th Language Testing Research Colloquium, Orlando, Florida (Cambridge, UK: Cambridge University Press, 2000).
showed that their performances were qualitatively very different, despite similar quantitative scores on their post-test performance. For example, although these two examinees received the same score on conversational control, one examinee's performance showed more of "an academic approach to rhetorical control" while the other's performance exhibited more of "a dialogic approach to conversational control". The mismatch between examinees' quantitative scores and their qualitative performances, which was found in both of the cited studies, raises questions about the reliability and validity of the test scores. Thus, for a better estimation of test participants' speaking ability, rating scales should be designed to accurately reflect the operational definition of speaking ability⁷. This step can prevent different raters from attending to different features in a test participant's discourse.

What should be considered before deciding on rating scales that ensure the validity of interpretations of test participants' speaking performance? Alderson and Banerjee divide rating scales into two categories. The first category is "generic scales", which refers to scales that are constructed in advance by proclaimed experts and that are used to evaluate test participants' performance on any type of task. The second category includes rating scales designed to target specific tasks⁸. Rating scales and tasks are thus directly linked, because the scales describe the kinds of speaking skills that the tasks elicit⁹. Generic scales have the potential to present inappropriate criteria in measuring the intended ability, a concern related to the issue of validity. Different interpretations of descriptors also lead to problems of reliability¹⁰. Thus, rating scales developed for particular tasks are more desirable and preferred, since they should have greater validity and reliability, particularly those based partially or wholly on a sample of test participants' performance¹¹.

Another consideration in deciding on rating criteria involves what the speaking test intends to measure. That is, it should be clear what speaking ability means in a given task or test and whether or not the defined aspects or features of speaking ability are appropriate for the purposes of the test. Based on the criteria used in assessing performance, McNamara distinguished between strong and weak language performance tests. Strong performance tests evaluate test participants' performance based on real-world criteria, where how well test-takers perform on a given task is the main

7 Ibid.
8 J.C. Alderson and J. Banerjee, "Language Testing and Assessment (Part 2)," Language Teaching 35 (2002): 79-113.
9 Luoma, Assessing Speaking.
10 J. Upshur and C.E. Turner, "Constructing Rating Scales for Second Language Tests," English Language Teaching Journal 49 (1995): 3-12.
11 G. Fulcher, "Tests of Oral Performance: The Need for Data-Based Criteria," English Language Teaching Journal 41 (1987): 287-91; Upshur and Turner, "Constructing Rating Scales for Second Language Tests"; J. Upshur and C.E. Turner, "Systematic Effects in the Rating of Second Language Speaking Ability: Test Method and Learner Discourse," Language Testing 16 (1999): 82-111.
interest¹². On the other hand, weak performance tests focus more on the language itself. Such tests attempt to elicit a sample of the test participants' language for evaluation through simulated and artificial tasks, where success on the task is less important than the language elicited. Although this dichotomy should be understood as a continuum rather than as two separate extremes, McNamara claimed that most general-purpose language performance tests are weak in nature¹³.

Douglas and Myers questioned what appropriate rating criteria are necessary in a language testing context that has a specific purpose¹⁴. In their study, they reviewed veterinary students' recorded performances in simulated patient/client interviews. The researchers found that proficiency was judged according to three different criteria: participants who were professional veterinarians focused on the test participants' professional relationship with the client and their content knowledge, applied linguists concentrated on the framework of language use and the measurement construct, and student participants used their own knowledge base and the authenticity of the test format. In conclusion, Douglas and Myers argued that raters should blend criteria from different perspectives¹⁵. Rating criteria derived from task-specific and real-world concerns might not be useful beyond a certain context. Nevertheless, knowledge of the indigenous criteria employed in a real-world situation makes it possible to better understand speaking test performance in relation to the situation at hand¹⁶.

Besides, in the Indonesian education system, the government has stated that the study of English is no longer grammar-minded; it has been changed into a speaking-minded approach, because the main measure of success in learning a target language is that the students are able to communicate using the language orally. Nowadays, the English syllabus in schools all over Indonesia is inclined to focus on how to increase the students' capability in speaking; however, this does not mean that there is no attempt to enhance their capability in the other three skills. Hence, it is appropriate to develop a speaking performance test in which the test participants are students who are studying English.

To sum up, in order to ensure the validity and reliability of a speaking performance test, attention needs to be paid to the quality of the speaking performance, along with scoring that is based on criteria specific to that particular testing context. Efforts to ensure high validity and reliability can help guarantee fairness in the speaking assessment. Ultimately, "the point is to get test developers to be clearer about

12 McNamara, Measuring Second Language Performance.
13 Ibid.
14 D. Douglas and R. Myers, "Assessing the Communication Skills of Veterinary Students: Whose Criteria?", in A.J. Kunnan (ed.), Fairness and Validation in Language Assessment: Selected Papers from the 19th Language Testing Research Colloquium, Orlando, Florida (pp. 60-81) (Cambridge, UK: Cambridge University Press, 2000).
15 Ibid.
16 Ibid.
what they are requiring of test takers and raters, and to think through the consequences of such requirements".

Based on the background knowledge and issues above, the test developer thought it very necessary to develop a speaking test, one expected to be beneficial for English lecturers, teachers, students, and other test developers.

Research Method

The present study employed a qualitative design to describe the assessment of the students' speaking ability. The subjects were the students of the English Department of the Teacher Training and Education Faculty, Islamic University of Malang.

Before the test was conducted, the test developer and the lecturer made sure that the students knew about the objectives of the test, both the general objective and the specific objective. The specific objective of the test stated the aspects that would be evaluated.

The general objective reads: "The test is to assess the students' speaking skill in expressing ideas orally." The specific one reads: "The test is to assess the students' ability in expressing ideas orally with (1) clear content, (2) good organization, and (3) good language in terms of intelligible pronunciation, appropriate grammar, and appropriately chosen words."

They not only informed the students about the objectives but also clarified the description of each aspect of speaking competence:

a. Content
The content should be relevant to the topic given in the test. It means that in conveying the spoken text, the whole content of the text should refer to the topic stated by the raters.

b. Organization
The test participant should organize his/her sentences systematically. In other words, he/she should know how to arrange the plot of the unforgettable experience in the proper sequence:
Orientation: tells who was involved in the unforgettable experience, what was happening, and where and when it was taking place;
Events 1 and 2: tell about the complication (whether amusing, frightening, or embarrassing) and the resolution (the way out) in the experience;
Reorientation: tells the conclusion or ending of the event.

c. Language
The test participant should perform well in the following components: grammar, pronunciation, and word choice.

Besides, the test participants were informed that the text told orally would be scored on the following aspects:
Content : 40%
Organization of ideas : 30%
Language : 30%
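The weighting above can be sketched as a small computation. Only the 40/30/30 weights come from the text; the 0-10 rating scale for each aspect is an illustrative assumption.

```python
# Sketch of the scoring weights described above. Only the 40/30/30
# weights come from the text; the 0-10 rating scale for each aspect
# is an illustrative assumption.
WEIGHTS = {"content": 0.40, "organization": 0.30, "language": 0.30}

def composite_score(ratings):
    """Combine per-aspect ratings (0-10 each) into one weighted score."""
    assert set(ratings) == set(WEIGHTS), "all three aspects must be rated"
    return sum(WEIGHTS[aspect] * value for aspect, value in ratings.items())

# Hypothetical participant: 8 on content, 7 on organization, 9 on language.
print(round(composite_score({"content": 8, "organization": 7, "language": 9}), 2))  # 8.0
```

Because the weights sum to 1, the composite stays on the same 0-10 scale as the per-aspect ratings, which is consistent with the average total score of 8.40 that the study reports against a minimum passing score of 6.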

In total, the three aspects make up 100% of the score.

Implementation of the Test

On the day of the test, the raters conveyed the test directions and instructions to keep the test well conducted:

a. Test Directions:
1. All students should first be outside the classroom.
2. They are called in one by one, at random.
3. Each chooses one of the topics by lottery.

b. Test Instruction:
"Now please tell me about your unforgettable experience when you were ……… (related to the selected topic), for a maximum of four minutes."

c. Topic and Sub-topics Provided:
Topic: telling about an unforgettable experience.
Sub-topics:
1. Having a picnic
2. Studying in Senior High School
3. Going camping
4. Watching TV
5. Attending a party
6. Playing a favorite sport
7. Eating a favorite food
8. Helping parents
9. Gardening
10. Making a friend

d. Mechanism of the Test:
The test developer and the lecturer sat on separate seats while scoring the test participant, who was telling his/her experience orally based on the sub-topic selected by lottery. In scoring, they rated the test participants based on the scoring guide, and they set the maximum acceptable difference between raters at 3. Each rater had his own form on which to write the list of scores; the two raters' lists of scores would then be summed and averaged.

After the Test

The two sets of scores from the test developer and the lecturer were summed and then averaged. The raters had set the maximum acceptable difference at 3, and the results of the test showed no score difference higher than that threshold, so there was no need for the test developer to involve a third rater.

It can be stated that the lecturer had been successful in teaching the students, because the average total score was 8.40, which was above the stated minimum score.

Finding and Discussion

This section deals with the principles behind the selection of test

material and the development of test items, based on sayings of Confucius. They are as follows:

"I cannot deny what I experience for myself."

Experience is a part of human destiny. Every human being surely has experiences, whether interesting or bad. Moreover, there are unforgettable experiences, which are very difficult to forget.

Therefore, it was not mistaken for the test developer to select this topic as the topic of the test. Telling about experience is stated in the syllabus of the Speaking 2 course and is taught in the Speaking 2 class. Hence, all students as test participants surely knew about their own experiences. This helps guarantee the validity of the test.

"I hear and I know. I see and I believe. I do and I understand."

From experience, a human being can hear and know about something. He/she can see and believe, and he/she can understand something by doing it. In short, everyone can learn lessons from experience.

"I do and I understand" refers to human competence. So here the test developer assessed the students', that is, the test participants', speaking skill, especially in telling about an unforgettable experience. Because experiences contain various stories, he provided ten sub-topics related to experience to choose from. Besides, he intended to prevent the test participants from informing one another about the test.

Validating the Test

A language test can be defined as a means or procedure used to evaluate the learning process. The test should measure the language ability possessed by the test-taker or test participant. Related to language tests, Djiwandono states that in language learning, a test is intended to measure language competence as a reflection of the learning result. In addition, he states that a good test should have certain characteristics, two of which are validity and reliability¹⁷.

Validity

To prove the validity of the test, the test developer used curricular validity, in which validity can be established from the relevance between the test and the curriculum used in the department.

To maintain the validity of the test, before scoring, the test developer asked the lecturer about the materials that had been given to the students. The test developer and the lecturer agreed that the topic of the test was an unforgettable experience, in telling which a student used one of the sub-topics selected by lottery. Moreover, the test developer provided a Table of Specification in order to guard the relevance between the test and the

17 Djiwandono, Tes Bahasa, p. 163.
objectives of the test, both the general objective and the specific objective.

After scoring, it was found that the speaking test given to the students was relevant to the objectives of the test, both the general objective and the specific objective. The test given to the students was relevant to what the lecturer had explained in the Speaking 2 class, and it was relevant to the objectives. This means that the test given to the students was valid.

Reliability

To maintain the reliability of the test, before scoring, the test developer and the lecturer agreed to apply inter-rater reliability, in which assessing the reliability level requires two lists of scores for the test participants, obtained from two raters. It was agreed that the test developer would be the first rater and the lecturer the second rater. Besides, he gave the lecturer the scoring guide in order to make the scoring process more reliable, that is, consistent. In other words, it was expected that there would be no discrepancy in scoring the students' ability in speaking, especially in telling about an unforgettable experience. In addition, it was agreed that the minimum (passing) score was 6 (2 + 2 + 2) and the maximum acceptable difference between raters was 3.

After scoring, there were two lists of scores for the test participants, obtained from the two raters, who scored based on the scoring guide provided: the test developer was the first rater and the lecturer the second rater. Based on the two lists of scores, it can be concluded that there was consistency in scoring the students' ability to tell about an unforgettable experience orally. In other words, the test was reliable. This could be seen from the differences between the first rater's and the second rater's scores: the maximum acceptable difference was 3, whereas the highest observed difference was only 2.

Moreover, in establishing the reliability, the test developer used the Pearson product-moment formula. The two lists of scores were processed using the formula in order to determine the reliability of the test.

Conclusion and Suggestion

Developing a speaking test is not as easy as developing other tests, because a test developer has to prepare the mechanism, directions, and instructions well in order to keep the test valid; here the test developer used content validity to prove that the test was valid. In keeping the reliability, the test developer used inter-rater agreement and the Pearson product-moment formula. Content validity, inter-rater agreement, and the Pearson product-moment formula are appropriate for assessing a speaking test.

In developing a speaking test, it is better to employ content validity, inter-rater agreement, and the Pearson product-moment formula, because they work well in keeping the test valid and reliable.
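As an illustration, the inter-rater procedure described in the Reliability section (two raters, averaged scores, an agreed maximum difference of 3, and a Pearson product-moment check) can be sketched as follows. The score lists here are hypothetical; only the thresholds come from the study.

```python
# Sketch of the inter-rater reliability procedure described above:
# two raters score each participant, the scores are averaged, a third
# rater is needed only if some pair of scores differs by more than the
# agreed maximum (3), and Pearson's product-moment r estimates the
# consistency between raters. The score lists are hypothetical.
import math

rater1 = [8.0, 7.0, 9.0, 6.0, 8.5]  # hypothetical scores, first rater
rater2 = [8.5, 6.5, 9.0, 7.0, 8.0]  # hypothetical scores, second rater

def needs_third_rater(a, b, max_diff=3.0):
    """True if any participant's two ratings differ by more than max_diff."""
    return any(abs(x - y) > max_diff for x, y in zip(a, b))

def final_scores(a, b):
    """Average the two raters' scores for each participant."""
    return [(x + y) / 2 for x, y in zip(a, b)]

def pearson_r(x, y):
    """Pearson product-moment correlation:
    r = sum((x-mx)(y-my)) / sqrt(sum((x-mx)^2) * sum((y-my)^2))"""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

print(needs_third_rater(rater1, rater2))  # False: no gap exceeds 3
print(final_scores(rater1, rater2))       # per-participant averages
print(pearson_r(rater1, rater2))          # high positive correlation
```

In the study itself, the highest observed difference was 2, so, as in this sketch, the third-rater rule was never triggered.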

References

Alderson, J. C., & Banerjee, J. 2002. Language testing and assessment (Part 2). Language Teaching, 35, 79-113.

Bachman, L. F., & Purpura, J. E. (in press). Language assessments: Gate-keepers or door openers? In B. M. Spolsky & F. M. Hult (Eds.), Blackwell handbook of educational linguistics. Oxford, UK: Blackwell Publishing.

Djiwandono, S. M. 2008. Tes Bahasa. Jakarta: Indeks.

Douglas, D. 1994. Quantity and quality in speaking test performance. Language Testing, 11, 125-44.

Douglas, D., & Myers, R. 2000. Assessing the communication skills of veterinary students: Whose criteria? In A. J. Kunnan (Ed.), Fairness and validation in language assessment: Selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida (pp. 60-81). Cambridge, UK: Cambridge University Press.

Fulcher, G. 1987. Tests of oral performance: The need for data-based criteria. English Language Teaching Journal, 41, 287-91.

Luoma, S. 2004. Assessing speaking. Cambridge, UK: Cambridge University Press.

McNamara, T. F. 1996. Measuring second language performance. London: Longman.

Meiron, B., & Schick, L. 2000. Ratings, raters and test performance: An exploratory study. In A. J. Kunnan (Ed.), Fairness and validation in language assessment: Selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida. Cambridge, UK: Cambridge University Press.

Renshaw, J. 2008. Boost Speaking 2. Hong Kong: Pearson Longman Asia ELT.

Richards, J. C. 2000. Interchange. Cambridge, UK: Cambridge University Press.

Upshur, J., & Turner, C. E. 1995. Constructing rating scales for second language tests. English Language Teaching Journal, 49, 3-12.

Upshur, J., & Turner, C. E. 1999. Systematic effects in the rating of second-language speaking ability: Test method and learner discourse. Language Testing, 16, 82-111.

White, E. E. 1984. Basic public speaking. New York: Macmillan Publishing Company.
