Running head: TEST REVIEW PROJECT

Test Review Project: ACTFL, TOEFL iBT, and IELTS


Krista Boddy
Colorado State University

As an English language instructor, it is imperative that I know about the standard proficiency tests that universities, governments, and corporations are currently using to make
high-stakes decisions, such as admission to a university, changing immigration status, or hiring
and promoting in business. Bachman and Palmer (2010) highlight the consequences of
assessments on stakeholders, such as English language learners (ELLs), and note that language
testers and instructors must be held accountable for how the scores from assessments are used (p.
85). Along with accountability, testing professionals must critically analyze tests for evidence of
validity and reliability. Miller, Linn, and Gronlund (2009) emphasize the significance of validity, or the appropriateness of the interpretations and uses of assessment results, and of reliability, or the consistency of assessment results (pp. 70-71). The more evidence there is of validity and
reliability, the better the assessment serves its objectives and purposes.
The following test review examines three published proficiency tests in the field of Applied Linguistics: the American Council on the Teaching of Foreign Languages (ACTFL) proficiency assessments, the Test of English as a Foreign Language Internet-Based Test (TOEFL iBT), and the International English Language Testing System (IELTS). These three tests are designed for adult learners, who are my target group of English language learners. They are also among the best-known language assessments. Results from these particular tests are often
used for high-stakes decisions. Having worked with ELLs in an Adult Basic Education (ABE)
program for four years, I am interested in working with academic ELLs abroad for a completely
different experience. I believe that the type of ELLs I will be working with will have taken or
will be preparing for one of the three tests in this review. My intention in reviewing these three
assessments is to learn whether they meet the needs and expectations of academic ELLs. I am
also interested in these particular language assessments because they each measure proficiency in the four main skills of listening, speaking, reading, and writing in English. My goal in this
review is to ascertain which of the three tests meets my criteria of practicality (i.e., ease of use, time to take the test, time to receive results) and provides sufficient evidence of validity and reliability (i.e., fairness of scoring, consistency of rating, actual uses match what the test intends). This knowledge will help me determine which test best suits the academic ELLs in the educational context that I am interested in. Below, I've organized each test review summary into Tables 1, 2, and 3.
Table 1
ACTFL - American Council on the Teaching of Foreign Languages

Publisher: ACTFL, Inc., 1001 N. Fairfax St., STE 200, Alexandria, Virginia 22314; Phone: (703) 894-2900; http://www.actfl.org/professional-development/proficiency-assessments-the-actfl-testing-office
LTI - Language Testing International (Exclusive Licensee of ACTFL), 445 Hamilton Avenue, Suite 1104, White Plains, NY 10601; Tel: 914-963-7110; Toll Free: 1-800-486-8444; http://www.languagetesting.com/

Date of 1st publication: The ACTFL Proficiency Guidelines were first published in 1986.

Target population: Secondary education, higher education, and beyond. Tests are available in many different target languages (ACTFL Assessments Brochure, 2012).

Cost: Varies depending on test type and on commercial, private, or government pricing (e.g., the government price for the ACTFL Oral Proficiency Interview, Certified Double Rated, is $136.88). See http://www.languagetesting.com/contact-us for pricing.

Purpose: ACTFL proficiency assessments are meant to determine functional language ability in all modes of communication (listening, speaking, reading, and writing). These assessments are used internationally by academic institutions, government agencies, and private corporations for academic placement, student assessment, program evaluation, professional certification, hiring, and promotional qualification (ACTFL Website, 2016).
Structure (parts and item types):

Oral Proficiency Interview (OPI) and Computer-based (OPIc)
A live, recorded 20-30 minute telephone conversation with a tester.
Five phases: warm-up, level checks, probes, wind-down, and role play.
Item types are personalized questions that adapt to candidate responses.

Listening Proficiency Test (LPT)
A computerized test: 50 minutes for a two-level test; 75 minutes for a three-level test; 125 minutes for a four-level test.
Authentic recorded passages, each consisting of three questions; each question has four multiple-choice answers, of which only one is correct (LTI Website, 2016).
Range of informal/formal speech on general, social, and academic topics, such as daily interactions, reports, discussions, and broadcasts (ACTFL Assessments Brochure, 2012).
Reading Proficiency Test (RPT)
A computerized test: 50 minutes for a two-level test; 75 minutes for a three-level test; 125 minutes for a four-level test.
Authentic text passages with three questions each; multiple-choice answers of which only one is correct (LTI Website, 2016).
Range of informal/formal texts on general, social, academic, and professional topics, such as correspondence, technical reports, and news articles (ACTFL Assessments Brochure, 2012).
Questions target the main idea, supporting detail, and, for some texts, the inferences and connections the test-taker can make from the content and organization of the text (ACTFL Reading Proficiency Familiarization Manual, 2013).
Writing Proficiency Test (WPT)
Computer or paper tests are available. Total time to read the directions and complete all the writing tasks is 80 minutes. A language self-assessment determines which one of three WPT test forms is generated for the individual.
Candidates answer questions about their education or work history, their hobbies and pastimes, and special areas of interest. Their answers are used by the online system to select prompts from topics relevant to the test taker for portions of the test (LTI Website, 2016).
The candidate is presented with four tasks, with approximately 10-20 minutes allowed for each task. The test-taker is prompted to demonstrate descriptive, narrative, informative, and persuasive writing. Responses are open-ended and are written in the target language (LTI Website, 2016).
WPT tasks deal with practical, social, and professional topics encountered in informal and formal contexts and may include content, context, accuracy, and discourse types associated with the writing tasks at each level. Text prompts include presentational writing (essays, reports, letters) or interpersonal writing (instant messaging, e-mail communication, and texting). Responses are spontaneous (immediate, unedited) or reflective (revised, edited) (ACTFL Assessments Brochure, 2012).
Scoring:
All four ACTFL proficiency tests follow the 2012 ACTFL Proficiency Guidelines and use the same scale range, which includes five major levels (Novice, Intermediate, Advanced, Superior, Distinguished), with the three lowest levels divided into High, Mid, and Low sublevels.

Oral Proficiency Interview (OPI) and Computer-based (OPIc)
Two scales: ACTFL and/or Inter-Agency Language Roundtable (ILR). The ILR scale runs from 0 (No Proficiency) to 5 (Functionally Native).
Commercial OPIs are single rated; 50% are double rated (ACTFL Website, 2016).
Official/Certified OPIs are double rated; the independent ratings must agree before an official rating is released (LTI Website, 2016).
Listening Proficiency Test (LPT)
Auto-graded, with results provided instantaneously on the ACTFL scale from Novice to Superior levels. Superior individuals can understand passages from many genres dealing with a wide range of subjects. Novice individuals can recognize words and get limited information from highly predictable, simple passages in familiar contexts and formats (ACTFL Listening Proficiency Familiarization Manual, 2014).

Reading Proficiency Test (RPT)
Auto-graded, with results provided instantaneously on the ACTFL scale from Novice to Superior levels. Superior individuals can understand texts from many genres dealing with a wide range of subjects. Novice individuals can recognize words and get limited information from highly predictable, simple texts in familiar contexts and formats, including simple forms and documents (ACTFL Reading Proficiency Familiarization Manual, 2013).
Writing Proficiency Test (WPT)
The writing performance is first placed within a major range and then matched to the sub-level description.
The following criteria are considered for evaluation: the writing tasks or functions the writer performs, the social contexts and specific content areas within which the writer is able to perform the tasks, the accuracy of the writing, and the length and organization of the written text the writer is capable of producing (LTI Website, 2016).
Superior individuals can produce informal and formal writing on practical, social, and professional topics, treated both abstractly and concretely. Novice individuals can produce only lists, notes, and limited formulaic information on simple forms and documents (ACTFL Writing Proficiency Familiarization Manual, 2012).
Official ACTFL WPTs (LTI-conducted assessments) are blindly double rated by two separate certified raters to assign a final rating.
Statistical distribution of the scores (total and part): N/A
Standard Error of Measurement (SEM): N/A

Evidence for reliability:
Inter-rater reliability statistics and trainer comments are reviewed by ACTFL testing experts, with the goal of maintaining 85% or higher agreement among tester ratings. Mandatory trainer and tester meetings are also held periodically throughout the year to review testing issues, protocols, and new developments (ACTFL Assessments Brochure, 2012).
The ACTFL OPIc exceeded the minimum inter-rater reliability and agreement standards. Spearman r's exceeded the standard for use, ranging from .95 to .97 across languages. Inter-rater reliability was similar across language category and interview year. Inter-rater agreement showed higher absolute agreement for Spanish and English (80% for both) than for Arabic (71%). The Arabic data showed an improving trend in agreement from 2009 (28%; N=7) to 2011 (73%; N=152). Overall, the findings support the reliability of the ACTFL OPIc as an assessment of speaking proficiency (SWA Consulting, Inc. Reliability Study, 2012).
For the ACTFL Writing Proficiency Test, interrater consistency was found to be well above acceptable levels for applied settings (e.g., r = .94 and .92 for the full and Spanish-only samples, respectively). Measures of interrater agreement indicated that for the full sample the majority of judges provided identical scores (80% perfect agreement). Similar results were found for the Spanish-only sample (78% perfect agreement). Interrater agreement is also reported within each proficiency category (e.g., Advanced) and level (e.g., Advanced-Mid) (SWA Consulting, Inc. Report, 2004).
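The "perfect agreement" figures reported in the SWA Consulting studies are the share of samples on which two certified raters assigned the identical rating. A minimal sketch of that statistic; the ratings below are hypothetical ACTFL levels, not data from the actual studies:

```python
# Percent exact (perfect) agreement between two raters: the proportion of
# samples on which both raters assigned the identical level.

def exact_agreement(rater_a, rater_b):
    """Share of samples rated identically by both raters."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must score the same samples")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Hypothetical double-rated writing samples (not study data):
rater_1 = ["Advanced-Mid", "Intermediate-High", "Novice-High", "Advanced-Low", "Superior"]
rater_2 = ["Advanced-Mid", "Intermediate-High", "Novice-Mid", "Advanced-Low", "Superior"]

print(f"{exact_agreement(rater_1, rater_2):.0%}")  # 4 of 5 identical -> 80%
```

Exact agreement is a stricter measure than a correlation such as Spearman's r, which only asks whether the two raters rank candidates in the same order.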
Evidence for validity:
ACTFL assessments have been researched and validated by independent researchers. A bibliography of studies of ACTFL/LTI assessments is available on the LTI website: http://www.languagetesting.com
Confirmatory Factor Analysis (CFA) results provide further construct validity evidence for the ACTFL OPI and ACTFL OPIc. The fit statistics were excellent, the latent correlation between the two assessments was .94, and the construct validity coefficient was .99 (SWA Consulting, Inc. Report, 2008).
Table 2
TOEFL iBT - Test of English as a Foreign Language, Internet-Based Test

Publisher: Educational Testing Service, Publication Order Services, ETS Corporate Headquarters, Rosedale Road, Princeton, NJ 08541; etsinfo@ets.org; http://www.ets.org

Date of 1st publication: PBT Paper-Based Test (1965), CBT Computer-Based Test (1998), iBT Internet-Based Test (2005)

Target population: Non-native speakers of English wanting to enroll in universities and colleges in the US.

Cost: Varies between countries; the ETS site shows prices based on location. For Colorado, the test is currently $190 (ETS Website, 2016).

Purpose: The TOEFL iBT test "measures your ability to use and understand English at the university level" and "evaluates how well you combine your listening, reading, speaking and writing skills to perform academic tasks" (ETS Website, 2016). Scores from the test can be used by U.S. government agencies, scholarship programs, and licensing and certification agencies (Stoynoff & Chapelle, 2005). Some educational institutions require higher TOEFL scores than others; TOEFL is therefore a high-stakes test, since students' futures are affected by the scores (Stoynoff & Chapelle, 2005).

Structure (parts and item types):
The entire TOEFL iBT must be completed within four hours and is conducted via the internet at a TOEFL test center. A standard English language (QWERTY) computer keyboard is used for the test. ETS recommends that test-takers practice typing on a QWERTY keyboard before taking the test (ETS Website, 2016).
Four sections: listening, reading, speaking, and writing; tasks often combine more than one skill.
Reading: 60-80 min., 36-56 multiple-choice questions; read 3-4 passages from academic texts and answer questions.
Listening: 60-80 min., 34-51 multiple-choice questions; listen to lectures, classroom discussions, and conversations, then answer questions.
10-minute break
Speaking: 20 min., 6 tasks; express an opinion on a familiar topic; speak based on reading and listening tasks. Preparation time: 15-30 seconds depending on the task. Response time: 45-60 seconds depending on the task.
Writing: 50 min., 2 tasks; write essay responses based on reading and listening tasks; support an opinion in writing. In the actual test, candidates can take notes while listening and reading and use them to complete the essay. Three minutes are allowed to read the passage and 20-30 minutes to plan and write a response. Typically, an effective response to the first essay is 150 to 225 words, with a 300-word minimum for the second essay. Test takers with disabilities may request additional time to read the passage and write the response (TOEFL iBT Sample Test Questions, 2016).
Scoring:
A total score of up to 120 is computed from the four skill sections, each scored 0-30.
Reading: 0-30 (High = 22-30, Intermediate = 15-21, Low = 0-14)
Listening: 0-30 (High = 22-30, Intermediate = 15-21, Low = 0-14)
The Reading and Listening sections are scored by computer, each with a score range from 0 to 30 (ETS Website, 2016).
Speaking: 0-30 (Good = 26-30, Fair = 18-25, Limited = 10-17, Weak = 0-9). ETS-certified test scorers rate responses for six tasks from 0 to 4 based on a Speaking Rubric. The sum is converted to a scaled score of 0 to 30.
Writing: 0-30 (Good = 24-30, Fair = 17-23, Limited = 1-16). Two tasks are rated from 0 to 5 based on a Writing Rubric for Integrated and Independent tasks. The sum is converted to a scaled score of 0 to 30. The integrated writing task is evaluated for development, organization, grammar, vocabulary, accuracy, and completeness (ETS Website, 2016). The independent essay is scored on overall writing quality, including development, organization, grammar, and vocabulary.
Human rating: Multiple, rigorously trained raters score tests anonymously. ETS raters are continually monitored to ensure fairness and the highest quality (ETS Website, 2016).
Software rating: eRater automated scoring technology is used alongside human ratings to score the independent and integrated writing tasks. Combining human judgment for content and meaning with automated scoring for linguistic features ensures consistent, quality scores (ETS Website, 2016).
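The conversion from raw ratings to the 0-30 scale is published by ETS as a lookup table; the linear rescaling below is a simplifying assumption used only to illustrate the shape of the conversion, not ETS's actual table:

```python
# Sketch: six Speaking task ratings (0-4 each, raw total 0-24) rescaled
# linearly to the 0-30 reporting scale. ETS's real conversion uses its own
# published table; this linear mapping is an illustrative assumption.

def speaking_scaled(task_ratings):
    """Rescale six 0-4 task ratings (raw 0-24) linearly to 0-30."""
    if len(task_ratings) != 6 or any(not 0 <= r <= 4 for r in task_ratings):
        raise ValueError("Expected six ratings between 0 and 4")
    return round(sum(task_ratings) / 24 * 30)

print(speaking_scaled([4, 3, 3, 4, 3, 3]))  # raw 20 -> scaled 25
```

The Writing conversion works analogously, but from two 0-5 ratings (raw 0-10) to the same 0-30 scale.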

Statistical distribution of the scores:
The data below are based on test takers who took the TOEFL iBT test between January 2015 and December 2015 and who indicated that they were applying for admission to colleges or universities as undergraduate students.

Table 5. Percentile Ranks for TOEFL iBT Scores, Undergraduate-Level Students (means and standard deviations):
Section     Mean   S.D.
Reading     19.2   6.9
Listening   19.2   7.0
Speaking    20.1   4.6
Writing     20.2   5.0
Total       79     21
(Test and Score Data Summary for TOEFL iBT Tests, 2015)

SEM:
Section     Score scale   SEM
Reading     0-30          3.35
Listening   0-30          3.20
Speaking    0-30          1.62
Writing     0-30          2.76
Total       0-120         5.64
(Reliability and Comparability of TOEFL iBT Scores, 2011, p. 5)

Evidence for reliability:
Section     Reliability estimate
Reading     .85
Listening   .85
Speaking    .88
Writing     .74
Total       .94
The report notes: "The reliability estimates for the Reading, Listening, Speaking, and Total scores are relatively high, while the reliability of the Writing score is somewhat lower. This is a typical result for writing measures composed of only two tasks (Breland, Bridgeman, & Fowles, 1999)" (Reliability and Comparability of TOEFL iBT Scores, 2011, p. 5).
Zhang (February 2008) compared the test scores of more than 12,000
examinees who were identified as having taken two TOEFL iBT tests within a
period of one month. The correlations of their scores on the two test forms
were 0.77 for the listening and writing sections, 0.78 for reading, 0.84 for
speaking, and 0.91 for the total test score. Because these measures of reliability
take into account additional sources of variability, they are typically lower than
internal consistency measures. Nevertheless, they indicate a high degree of
consistency in the rank ordering of the scores of these test repeaters
(Reliability and Comparability of TOEFL iBT Scores, 2011, p. 5).
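The SEM and reliability figures above are linked by the standard psychometric relationship SEM = SD * sqrt(1 - reliability). A minimal sketch; the illustrative inputs pair the Speaking SD from the 2015 score summary with the Speaking reliability estimate from the 2011 report, so the result only approximates the reported SEM of 1.62 (the two reports describe different test-taker cohorts):

```python
# Standard error of measurement from a score SD and a reliability estimate:
# SEM = SD * sqrt(1 - reliability).
import math

def sem(sd, reliability):
    """Standard error of measurement for a given SD and reliability."""
    return sd * math.sqrt(1 - reliability)

# Speaking: SD 4.6 (2015 summary), reliability .88 (2011 report).
print(round(sem(4.6, 0.88), 2))  # 1.59, close to the reported 1.62
```

Read the other way, the SEM gives a rough confidence band: a Speaking score of 20 with an SEM near 1.6 is plausibly anywhere from about 18 to 22 on a retake.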
Evidence for validity:
TOEFL iBT validity studies: http://www.ets.org/toefl/research/topics/validity
One study, conducted by Sawaki, Stricker, and Oranje (2009), examines the loadings of first-order factors on a higher-order general factor.
"Turning to the higher-order factor loadings in Table 4, all the four sections had high loadings, ranging from .78 to .97. This supports the presence of a common underlying dimension that is strongly related to the Reading, Listening, Speaking and Writing trait factors. However, it is notable that the loading of the Speaking factor is somewhat lower than the loadings of the other factors, suggesting that this factor also reflects other abilities not captured by the general trait factor" (Sawaki, Stricker, & Oranje, 2009, p. 24).

Table 4. Loadings of first-order factors on the higher-order factor:
Factor      General   Error   SMCa
Reading     .91       .17     .84
Listening   .97       .07     .93
Speaking    .78       .39     .61
Writing     .91       .17     .83
Note: * t > +/- 1.96. (a = squared multiple correlation)
(Sawaki, Stricker, & Oranje, 2009, p. 25)
A further study, by Biber and Gray (2013), discusses the validity of the spoken and written task range.
In sum, the findings here have shown that there is significant and
extensive linguistic variation among TOEFL iBT texts corresponding
to differences between independent and integrated tasks in the spoken
and written modes. These findings strongly support the TOEFL
validity argument that this range of tasks is required to capture the breadth
of academic expectations in American universities (Biber & Gray,
2013, p. 68).
Table 3
IELTS - International English Language Testing System

Publisher: University of Cambridge ESOL Examinations, the British Council, and IDP: IELTS Australia. Subject Manager, University of Cambridge ESOL Examinations, 1 Hills Road, Cambridge CB1 2EU, United Kingdom; telephone 44-1223-553355; ielts@ucles.org.uk; http://www.ielts.org/. North America: Cambridge Examinations and IELTS International, 100 East Corson Street, Suite 200, Pasadena, CA 91103 USA; telephone 626-564-2954; bmeiron@ceii.org; http://www.ielts.org/

Date of 1st publication: 1989

Target population: English language learners who wish to work or attend college in an English-speaking country.

Cost: Varies greatly by location of test center; see http://www.ielts.org/ or the IELTS handbook. In general, costs are: Australia, A$160; United Kingdom, £72; United States, about US$100 (O'Sullivan, 2005).

Purpose: The IELTS measures English language performance in reading, writing, speaking, and listening for people whose first language is not English and who desire to work or study in an English-speaking community. The test provides two tracks for test-takers depending on their purpose: the Academic or General Training (GT) routes. The Academic route is for those interested in applying to a
university or membership in a professional organization, while the GT route is for those who wish to study at the secondary level, complete immigration requirements, or work in an English-speaking environment (O'Sullivan, 2005).

Structure (parts and item types):
The four modules (Listening, Speaking, Reading, Writing) together last 2 hrs. 45 min. for both test tracks (GT and Academic). Both GT and Academic test-takers take the same Listening and Speaking tests but different Reading and Writing tests; the distinction between IELTS Academic and IELTS General Training lies in the subject matter of the Reading and Writing components. Listening, Reading, and Writing must be completed on the same day, with no breaks in between. The order in which these tests are taken may vary (IELTS Website, 2016).
Listening: 30 min., 4 recorded texts, including monologues and conversations, by a range of native speakers. Question types include: multiple choice, matching, plan/map/diagram labelling, form/note/table/flow-chart/summary completion, and sentence completion. Test-takers write their answers on the question paper as they listen and at the end of the test are given 10 minutes to transfer their answers to an answer sheet (IELTS Website, 2016).
Speaking: 11-14 min. recorded oral interview in which test-takers respond to personal questions on a range of familiar topics, such as home, family, work, studies, and interests. There are three parts to the test. In Part 2, test takers are given 1 minute to prepare a talk, along with a pencil and paper to make notes, and then speak for 2 minutes; Part 2 lasts 3-4 minutes in total, including preparation time. Part 3 focuses on the test takers' ability to express and justify opinions and to analyze, discuss, and speculate about issues (IELTS Website, 2016).
Academic Reading: 60 min. to read three passages and answer 40 questions. Texts are taken from books, journals, magazines, and newspapers and have been written for a non-specialist audience. Passages may be narrative, descriptive, or discursive/argumentative in style and may contain non-verbal materials (diagrams, graphs, or illustrations). A variety of question types is used, including: multiple choice, identifying information, identifying writers' views/claims, matching information, matching headings, matching features, matching sentence endings, sentence completion, summary completion, note completion, table completion, flow-chart completion, diagram label completion, and short-answer questions. Test takers are required to transfer their answers to an answer sheet within the time allowed for the test; no extra time is allowed for transfer. Poor spelling and grammar are penalized (IELTS Website, 2016).
GT Reading: 60 min. to complete three sections and answer 40 questions.
The same question types are used as in the Academic Reading test. Section 1,
social survival, deals with basic linguistic survival tasks (e.g., retrieving and
providing general factual information, like notices, advertisements and
timetables). Section 2, workplace survival, deals with job descriptions,
contracts, and staff development and training materials. Section 3, general
reading, involves extended prose with a more complex structure but with the
emphasis on descriptive and instructive rather than argumentative texts
(newspapers, magazines and fictional and non-fictional book extracts).

Academic Writing: Two writing tasks in 60 min.
Task 1: 150 words in 20 min. Describe visual information (a graph, table, chart, or diagram) in one's own words. The goal is to identify the most important and relevant information and trends and to give a well-organized overview, using language accurately in an academic register or style (IELTS Website, 2016).
Task 2: 250 words in 40 min. Respond to a prompt presenting a point of view, argument, or problem in an academic or semi-formal/neutral style. This task assesses the ability to present a clear, relevant, well-organized argument, giving evidence or examples to support one's ideas (IELTS Website, 2016).
GT Writing: Two writing tasks in 60 min.
Task 1: 150-word response with consideration of audience and purpose (e.g., writing to a friend (informal) or writing to a manager (semi-formal or formal)). This task assesses the test takers' ability to follow English letter-writing conventions, to use language accurately and appropriately, and to organize and link information coherently and cohesively.
Task 2: 250-word response to a point of view, argument, or problem: providing general factual information, outlining and/or presenting a solution, justifying an opinion, or evaluating evidence and ideas. Essays should show organization of ideas, supportive arguments with relevant examples, complex ideas, and a range of vocabulary and grammatical structures. This task assesses the ability to organize and link information coherently and cohesively and to use language accurately and appropriately (IELTS Website, 2016).
Scoring:
All IELTS tests are scored using a Band Score Conversion Table, which translates scores out of 40 into the IELTS 9-band scale. A 0 means the test-taker did not attempt the test, while a 9 denotes an "Expert user" (IELTS Website, 2016). Scores for each module (Listening, Speaking, Reading, and Writing) are reported in whole and half bands, with a final, averaged holistic score. The minimal score accepted by most universities is band 6.5, whereas some universities (e.g., the University of California, Berkeley) require band 7 (O'Sullivan, 2005). All modules are scored by certificated IELTS examiners, who are regularly monitored to ensure reliability. After scoring at the test center, all answer sheets are returned to Cambridge English Language Assessment for further analysis (IELTS Website, 2016).
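The "final, averaged holistic score" is the mean of the four module bands, rounded to the nearest half band; per IELTS's published rounding convention, an average ending in .25 rounds up to the next half band and .75 up to the next whole band. A minimal sketch of that rule:

```python
# Overall IELTS band: mean of the four module bands, rounded to the nearest
# half band with ties rounding up (.25 -> next half band, .75 -> next whole
# band), per IELTS's published rounding convention.
import math

def overall_band(listening, reading, writing, speaking):
    avg = (listening + reading + writing + speaking) / 4
    return math.floor(avg * 2 + 0.5) / 2  # round half up, in half-band steps

print(overall_band(6.5, 6.5, 5.0, 7.0))  # mean 6.25 -> overall band 6.5
```

Note that a single weak module can pull the overall band down even when the other three are strong, which is why many universities set minimums per module as well as overall.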
Listening: One mark is awarded for each correct answer in the 40-item test. A
confidential Band Score conversion table is produced for each version of the
Listening test, which translates scores out of 40 into the IELTS 9-band scale.
Test-takers are penalized for poor spelling and grammar on the answer sheet.
Speaking: Detailed performance descriptors have been developed which
describe spoken performance at the nine IELTS bands. Assessment criteria
include: Fluency and coherence, Lexical resource, Grammatical range and
accuracy and Pronunciation (IELTS Website, 2016).
Academic and GT Reading: The following skills are assessed: an overall
understanding of the main points of the text, ability to recognize opinions or
ideas, ability to recognize the main theme in the paragraphs or sections of a text,
ability to distinguish main ideas from supporting ones, ability to recognize relationships and connections between facts in the text versus opinions and
theories, ability to skim and scan text for specific information, and ability to read
for detail, understand a detailed description, and relate it to information
presented in the form of a diagram (IELTS Website, 2016).
Academic and GT Writing: Each task is assessed independently. Task 2 carries
more weight than Task 1. Detailed performance descriptors apply to both the
Academic and General Training Modules and are based on the following
criteria: Task Achievement/Response, Coherence and Cohesion, Lexical
Resource, and Grammatical Range and Accuracy (IELTS Website, 2016).
Statistical distribution of the scores:
Table 1: Mean, standard deviation, and standard error of measurement of Listening and Reading (2014):
Module             Mean   SD    SEM
Listening          6.1    1.3   0.391
Academic Reading   6.0    1.2   0.378
GT Reading         6.0    1.4   0.418
(IELTS Test Performance, 2014)

Evidence for reliability:
Table 2.3: Reliability estimates for the IELTS modules (IELTS, 2004):
Module             M     SD
Listening          .90   0.02
Academic Reading   .85   0.02
GT Reading         .90   0.02
(O'Sullivan, 2005)
Shaw (2004) shows reliability estimates from .77 to .85 for a series of research studies done during a revision project (O'Sullivan, 2005).
Evidence for validity:
O'Sullivan (2005) notes in his review of IELTS that the University of Cambridge ESOL Examinations claims evidence of construct-related validity through "the use of expert judgement in operationalizing the construct," in addition to "empirical evidence provided through statistical analysis of test responses" (O'Sullivan, 2005, p. 77). However, a study by Moore, Morton, and Price (2012) analyzed the construct validity of the IELTS Academic Reading Test. The researchers found a divergence between the two domains of the IELTS corpus and an academic corpus.

"This was mainly found to arise from the considerable variety of reading tasks identified in the academic corpus, especially those that related to more extended assignment tasks (e.g. essays, reports). Thus, whereas the IELTS corpus saw virtually all task types fall within the local-literal [quadrant] of our analytic matrix, the academic corpus was notable for incorporating tasks that covered all four areas. Amid this diversity were tasks which seemed, on the face of it, to be quite remote from the IELTS profile of tasks, including, for example, those which required a critical engagement with material, or which stipulated engagement with a multiplicity of sources and viewpoints" (pp. 191-192).
Discussion
The teaching context I envision for myself is working for a university abroad (Costa Rica, Thailand, Japan, Brazil, etc.) with academic ELLs wanting to use their proficiency scores for high-stakes purposes, such as university admissions or employment in a professional field in an English-speaking nation (U.S., U.K., Australia). The learners in this context will already be at university level in their home country, but desire further education (master's, PhD) or employment where English is the common language. The ELLs will most likely be monolingual, with advanced literacy in their L1, but may have varying levels of English proficiency (beginning, intermediate, advanced) based on previous educational experiences in their home country. All learners would need to take an appropriate language proficiency test to be admitted or hired in their country of choice. Motivation levels would be high for this group of learners, as learning English will improve their employment and educational opportunities. Class size would range from 30 to 50 pupils, depending on the
university. The curriculum would be selected by the university language department, language
instructors, or a collaboration of both. Classes would be divided by level (beginning,
intermediate, advanced English) into listening/speaking, reading/writing, grammar, or by each
individual skill. English for Academic Purposes (EAP) would be the primary focus of
instruction, and depending on the group of learners, other content areas of focus (e.g., fields of
study, careers) may be included in classroom materials.
In comparing the three published proficiency tests reviewed to the teaching context
above, I have concluded that the TOEFL iBT would be the most appropriate test for this
particular group of ELLs. Considering the practicality of the TOEFL iBT (internet-based, 4
hours total, numerical scores), along with the empirical evidence of validity and reliability of its
results, the IELTS and ACTFL are less suitable for my context of learners.
Though the other two tests are designed for the same target population and a similar
purpose, the TOEFL iBT is more practical than the IELTS because it is strictly internet and
computer based, and TOEFL iBT scores are more widely recognized by universities than ACTFL
scores, which are not numerical. There are also multiple third-party research studies that provide
evidence of validity and reliability for the TOEFL iBT. Unfortunately, I could not find as much
research regarding the validity or reliability of the IELTS. The fact that the TOEFL iBT Writing
test uses both computer and human ratings provides further evidence of rater reliability.
Though the IELTS is the shortest test of the three (2 hours and 45 minutes), it requires
test-takers to transfer their answers to an answer sheet, sometimes with no extra time allowed for
the transfer. This extra step seems redundant and time-consuming for test-takers. They are also
penalized for poor spelling and grammar in their answer booklet on the Listening and Reading
modules of the IELTS (IELTS Website, 2016), as opposed to the multiple-choice question types
presented on the TOEFL iBT and ACTFL. The IELTS General Training track provides a good
alternative for learners seeking jobs abroad, but there is not much research supporting the construct
or content validity of this test. I believe the least effective assessment of the tests reviewed is the
IELTS, due to its lack of evidence of validity and the redundant (and outdated) hand-written
procedures on some of its modules (Reading and Listening).
Some benefits of ACTFL assessments include proficiency tests in multiple languages,
instantaneous results, and the volume of validation research conducted by independent
researchers and consulting groups. Although ACTFL assessments are easy to use, auto-graded,
and provide results instantaneously, employers and universities must be familiar with the ACTFL
five-point scale, which runs from Novice to Distinguished levels with sub-levels (low, mid, high).
ACTFL assessments also provide no data regarding the statistical distribution of scores or the
standard error of measurement, which are often used to determine evidence of reliability. The
ACTFL can also be extremely time-consuming for test-takers, with the Reading and Listening
tests maxing out at 125 minutes each for four-level tests and the Writing test maxing out at 80
minutes.

References

ACTFL Assessments Brochure (2012). ACTFL Assessments. Retrieved from
http://d2k4mc04236t2s.cloudfront.net/wp-content/uploads/2014/12/ACTFL%20Assessments%20Brochure.pdf
ACTFL, Inc. (2014). Listening Proficiency Familiarization Manual. Retrieved from
http://d2k4mc04236t2s.cloudfront.net/wp-content/uploads/2015/02/ACTFL_FamManual_Listening_2015.pdf
ACTFL, Inc. (2013). Reading Proficiency Familiarization Manual. Retrieved from
http://d2k4mc04236t2s.cloudfront.net/wp-content/uploads/2015/02/ACTFL_FamManual_Reading_2015.pdf
ACTFL, Inc. (2012). Writing Proficiency Familiarization Manual. Retrieved from
http://d2k4mc04236t2s.cloudfront.net/wp-content/uploads/2012/08/ACTFL-Writing-Proficiency-Test-WPT-Familiarization-Manual-.pdf
ACTFL Website (2016). American Council on the Teaching of Foreign Languages. Retrieved
from http://www.actfl.org/professional-development/proficiency-assessments-the-actfl-testing-office
Bachman, L., & Palmer, A. (2010). Language assessment in practice. New York: Oxford
University Press.
Biber, D., & Gray, B. (2013). Discourse characteristics of writing and speaking task types on the
TOEFL iBT test: A lexico-grammatical analysis. TOEFL iBT Report No. iBT-19.
Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/j.2333-8504.2013.tb02311.x/epdf
ETS Website (2016). TOEFL iBT Sample Test Questions. Retrieved from
http://www.ets.org/Media/Tests/TOEFL/pdf/SampleQuestions.pdf
ETS Website (2016). TOEFL Home. Retrieved from http://www.ets.org/toefl/
IELTS Website (2016). Retrieved from https://www.ielts.org/
LTI Website (2016). Language Testing International. Exclusive Licensee of ACTFL. Retrieved
from http://www.languagetesting.com/
Miller, M. D., Linn, R. L., & Gronlund, N. E. (2009). Measurement and assessment in teaching.
Upper Saddle River, NJ: Pearson.
Moore, T., Morton, J., & Price, S. (2012). Construct validity in the IELTS Academic Reading
test: A comparison of reading requirements in IELTS test items and in university study.
In L. Taylor & C. J. Weir (Eds.), IELTS Collected Papers 2: Research in reading and
listening assessment (pp. 120-211). Cambridge: Cambridge University Press.
O'Sullivan, B. (2005). International English Language Testing System (IELTS). In S. Stoynoff
& C. A. Chapelle (Eds.), ESOL Tests and Testing (pp. 73-78). Alexandria, VA: TESOL.
Reliability and comparability of TOEFL iBT scores. (2011). Insight: TOEFL iBT Research, 1(3),
1-8. Retrieved from http://www.ets.org/s/toefl/pdf/toefl_ibt_research_s1v3.pdf
Sawaki, Y., Stricker, L. J., & Oranje, A. H. (2009). Factor structure of the TOEFL Internet-based
test. Language Testing, 26(1), 5-30. doi: 10.1177/0265532208097335
Stoynoff, S., & Chapelle, C. A. (2005). ESOL tests and testing: A resource for teachers and
administrators. Alexandria, VA: TESOL.
SWA Consulting, Inc. (2012). Reliability Study of ACTFL OPIc in Spanish, English, and
Arabic for the ACE Review. Retrieved from http://www.languagetesting.com/wp-content/uploads/2013/08/actfl-opic-reliability-2012.pdf
SWA Consulting, Inc. (2004). Preliminary Reliability and Validity Findings for the ACTFL
Writing Proficiency Test. SWA Technical Report 2004-C04-R01. Retrieved from
http://www.languagetesting.com/wp-content/uploads/2013/08/ACTFL-WPT-Technical-Report-2004.pdf
SWA Consulting, Inc. (2008). Two Studies Investigating the Reliability and Validity of the
English ACTFL OPIc with Korean Test Takers. The ACTFL OPIc Validation Project
Technical Report. Retrieved from http://d2k4mc04236t2s.cloudfront.net/wp-content/uploads/2013/08/ACTFL-OPIc-English-Validation-2008.pdf
Test and Score Data Summary for TOEFL iBT Tests (2015). January 2015-December 2015
Test Data. Retrieved from https://www.ets.org/s/toefl/pdf/94227_unlweb.pdf