Leakey - Evaluating Computer-Assisted Language Learning: An Integrated Approach To Effectiveness Research in CALL

Schools, colleges and universities are investing a great deal in
Evaluating Computer-Assisted Language Learning

the purchase of computer resources for the teaching of modern
languages, but whether these resources make a measurable
difference to the learning of language students is still unclear. In
Evaluating
this book the author outlines the existing evidence for the impact of
computers on language learning and makes the case for an integrated
approach to the evaluation of computer-assisted language learning
(CALL). Drawing on current and past research linked to CALL and
e-learning, the author builds a comprehensive model for evaluating
not just the software used in language learning, but also the teaching
and learning that takes place in computer-based environments, and Computer-
Assisted
the digital platforms themselves. This book will be of interest not
only to language teachers and CALL researchers, but also to those
interested in e-learning and general research methodology, as well
as designers of educational software, digital labs, virtual learning
Language
environments (VLEs) and institutional budget holders.
Learning
Jonathan Leakeys interest in evaluating the effectiveness of comput-
ers in language teaching comes from his years of teaching modern An Integrated Approach to
languages in secondary schools in Liverpool and in the further and
Effectiveness Research in CALL
Jonathan Leakey
higher education sectors in Northern Ireland. Since 2002 he has been
a lecturer in French, German and European Studies at the University
of Ulster, where he completed his doctorate in 2008.
JONATHAN LEAKEY
ISBN 978-3-0343-0145-9
Peter Lang
www.peterlang.com
Schools, colleges and universities are investing a great deal in
Evaluating Computer-Assisted Language Learning

the purchase of computer resources for the teaching of modern
languages, but whether these resources make a measurable
difference to the learning of language students is still unclear. In
Evaluating
this book the author outlines the existing evidence for the impact of
computers on language learning and makes the case for an integrated
approach to the evaluation of computer-assisted language learning
(CALL). Drawing on current and past research linked to CALL and
e-learning, the author builds a comprehensive model for evaluating
not just the software used in language learning, but also the teaching
and learning that takes place in computer-based environments, and Computer-
Assisted
the digital platforms themselves. This book will be of interest not
only to language teachers and CALL researchers, but also to those
interested in e-learning and general research methodology, as well
as designers of educational software, digital labs, virtual learning
Language
environments (VLEs) and institutional budget holders.
Learning
Jonathan Leakeys interest in evaluating the effectiveness of comput-
ers in language teaching comes from his years of teaching modern An Integrated Approach to
languages in secondary schools in Liverpool and in the further and
Jonathan Leakey
higher education sectors in Northern Ireland. Since 2002 he has been
a lecturer in French, German and European Studies at the University
of Ulster, where he completed his doctorate in 2008.
JONATHAN LEAKEY
Peter Lang
www.peterlang.com
Evaluating Computer-Assisted
Language Learning
Evaluating
Computer-
Assisted
Language
Learning
An Integrated Approach to
J o n th n L k y
Peter Lang
Oxford Bern Berlin Bruxelles Frankfurt am Main ew York Wien
Bibliographic information published by Die Deutsche ationalbibliothek.
Die Deutsche ationalbibliothek lists this publication in the Deutsche
ationalbibliografie; detailed bibliographic data is available on the Internet
at http://dnb.d-nb.de.
catalogue record for this book is available from the British Library.
Library of Congress Cataloging-in-Publication Data:
Leakey, Jonathan, 1962-

valuating computer-assisted language learning : an integrated
approach to effectiveness research in C LL / Jonathan Leakey.
p. cm.
Includes bibliographical references and index.
ISB 978-3-0343-0145-9 (alk. paper)
1. Language and languages--Computer-assisted instruction. 2. Language
and languages--Study and teaching--Computer network resources. I.
itle.
P53.285.L43 2010
418.00785--dc22
2010045779
ISBN 978-3-0343-0145-9
E-ISBN 978-3-0353-0131-1
Peter Lang , International cademic Publishers, Bern 2011
Hochfeldstrasse 32, CH-3012 Bern, Switzerland
info@peterlang.com, www.peterlang.com, www.peterlang.net
ll rights reserved.
ll parts of this publication are protected by copyright.
ny utilisation outside the strict limits of the copyright law, without the
permission of the publisher, is forbidden and liable to prosecution.
This applies in particular to reproductions, translations, microfilming,
and storage and processing in electronic retrieval systems.
Printed in ermany
Contents
List of Figures vii
List of Tables ix
Chapter 1
The need for systematic quality control in CALL 1
Chapter 2
Swings, spirals and re-incarnations: Lessons from the past 21
Chapter 3
Has CALL made a difference: And how can we tell? 59
Chapter 4
A model for evaluating CALL. Part 1: CALL enhancement criteria 73
Chapter 5
A model for evaluating CALL. Part 2: Qualitative and quantitative
measures 115
Chapter 6
Case Study 1: Evaluating digital platforms 133
Chapter 7
Case Study 2: Evaluating programs 167
vi
Chapter 8
Case Study 3: Evaluating pedagogy 197
Chapter 9
A new framework for evaluating CALL 247
Bibliography 291
Index 299
List of Figures
Figure 2.1 Research process onion 30
Figure 4.1 Evaluation flowchart (MFE1) 75
Figure 4.2 Elliss Framework for investigating L2 acquisition 97
Figure 5.1 Evaluation diamond for CALL effectiveness research (MFE1) 122
Figure 8.1 Mean improvement from the pre- to the post-test 210
Figure 9.1 Evaluation flowchart (MFE2) 249
Figure 9.2 Evaluation diamond for CALL effectiveness research (MFE2) 279
List of Tables
Table 1.1 Levels of analysis for CALL evaluation 13

Table 2.1 Chapelles six criteria for CALL task appropriateness 48
Table 4.1 Chapelles criteria for evaluating the qualities of test usefulness 78
Table 4.2 Chapelles six criteria for evaluation of CALL task appropriateness 80
Table 4.3 Additional six principles for evaluating CALL enhancement 81
Table 4.4 Tally chart of exercises mapping the twelve CALL
Enhancement Criteria 83
Table 4.5 Mapping the University of Ulster/LLAS (Toner et al. 2007)
survey questions 84
Table 4.6 Mapping the Melissi Digital Classroom performance indicators 8687
Table 4.7 Mapping of Ingraham and Emerys (1991) evaluative headings 9091
Table 4.8 Mapping of Hubbards (1988) evaluative headings 93
Table 4.9 Elliss Framework for investigating L2 acquisition 95
Table 4.10 Mapping of Elliss framework for investigating SLA 99
Table 4.11 Mapping of Dunkels (1991) evaluative headings 106107
Table 4.12 Mapping of Robinson et al.s (1984) evaluative headings 111
Table 4.13 A mapping of Mehannas pedagogical clusters 112113
Table 5.1 Checklist to enable the mapping of quantitative and
qualitative measures 121122
Table 5.2 Research Design Criteria checklist for MFE1 125126
Table 5.3 Proto-typical (MFE1) version of the checklist for data
collection methods 126
Table 5.4 Validity assessment criteria for MFE1 drawing 127
x List of Tables
Table 6.1 Platform-judging considerations linked to the CALL

Table 6.2 Comparison of three digital platforms 142143
Table 6.3 MFE1 table mapping Robotel functionality against
Ingraham and Emery (1991) 150151
Table 6.4 MFE1 table mapping Robotel functionality against
Hubbard (1998) 154155
Table 6.5 MFE1 table mapping Melissi functionality against
Hubbard (1988) 158
Ingraham and Emery (1991) 160
Dunkel (1991) 161162
Table 7.1 Comparing the different pedagogical approaches behind
the TMM7 and TMM9 studies 171
Table 7.2 MFE1 checklist for data collection methods 172
Table 7.3 Validity assessment criteria for MFE1: mapping of the
TMM9 project 182
Table 7.4 Group descriptives by language and year group (TMM9) 185
Table 7.5 General statistics describing participation (TMM9) 186187
Table 7.6 Language levels, participation and total time spent
(treatment group) TMM9 187188
Table 7.7 TMM7 and TMM9 mapped against the twelve CALL
Table 7.8 Summary of the comparative features of TMM7 and TMM9 194195
Table 7.9 Summary of the comparative features of TMM7 and TMM9 196
Table 8.1 Model for CALL evaluation (MFE1) CALL pedagogy
checklist 200201
Table 8.2 MFE1 checklist for data collection methods: Mapping of the
TOLD project 203
List of Tables xi
Table 8.3 Validity assessment criteria for MFE1: Mapping of the

TOLD project 206
Table 8.4 Model for CALL evaluation (MFE1) Data Collection
Measure and Variable details 208
Table 8.5 Task-by-task and skill-by-skill paired samples t-test for the
treatment group 211
Table 8.6 Task-by-task and skill-by-skill paired samples t-test for the
control group 212
Table 8.7 Mapping ofTOLD against Dunkels (1991) evaluative
headings 216217
Table 8.8 Different blends of approach, settling, media and task in
the BLINGUA projects 222
checklist (BLINGUA-1) 223
checklist (BLINGUA-2) 225
Table 8.11 MFE1 Data Collection Measure and Variable details for
TOLD and BLINGUA projects 226227
Table 8.12 MFE1 checklist for data collection methods 229
Table 8.13 Validity assessment criteria for MFE1: Mapping of the
BLINGUA projects 230231
Table 8.14 Advantages and disadvantages of the blending of
Multimedia lab + VLE 239
Table 8.15 Mapping ofBLINGUA-1 and 2 projects against
Dunkel (1991) 240241
Table 8.16 Mapping of BLINGUA-1 and 2 projects against twelve
CALL Enhancement Criteria 242243
Table 9.1 Synthesized list of criteria for evaluation of CALL
programs, platforms and pedagogy 250251
Table 9.2 Model for CALL evaluation (MFE2) Quality control 253
xii List of Tables
Table 9.3 MFE2 quality control. Evaluation of CALL enhancement

criterion 1 254255
criterion 2 256257
criterion 3 258259
criterion 4 260261
criterion 5 262263
criterion 6 264265
criterion 7 266267
criterion 8 268269
criterion 9 270271
criterion 10 272273
criterion 11 274275
criterion 12 276277
Table 9.15 Research Design Criteria pre-project checklist for MFE2 280
Table 9.16 Model for CALL evaluation MFE2 Quality control:
Data collection measures 281
Research construct validity checklist 282283
Table 9.18 Research Design Criteria post-project checklist for MFE2 284285
Chapter 1
The need for systematic quality control in CALL
Introduction
New technologies, new literacies and a need to demonstrate their value
There have always been sceptics who have doubted whether the computer
has anything significant to add to the language learning experience beyond
the wow factor. Even with the arrival of the modem, broadband, Local
Area Networks (LAN), the worldwide web (WWW), Virtual Learning
Environments (VLE) and e-learning, doubts have persisted and the absence
of clear-cut empirical data demonstrating improved learning has not helped
to quell the uncertainty. It is still not really known with any degree of cer-
tainty whether computer-assisted language learning (CALL) makes an
objective, measurable and significant difference to students learning.
Qualitative studies have been aplenty and these have lent some cre-
dence to the educational benefits of new technologies for language learn-
ing. The language teacher may now, by means of a computer, deliver the
four main language skills (listening, speaking, reading and writing), teach
vocabulary acquisition, grammar tuition, literature, area studies, and also
enhance meta-cognitive language learning skills. Computer-mediated com-
munication (CMC) and web-enhanced language learning (WELL) have
sought to exploit the opportunities to motivate a new generation of lan-
guage learners. Within educational institutions we also have ever-improv-
ing multimedia language laboratories, interactive whiteboards (IWBs),
networked courseware and sophisticated tracking software. Nowadays,
language learning can occur through mobile-assisted language learning
(MALL), audio-, video- streaming, mp3s, pod-casting and wi-fi literally,
2 Chapter 1
language learning on the hoof. But can we show that any, or all, of these
do any better than an inspirational and well-organized language teacher
can achieve, or could have achieved in the past, without the benefit of a
computer or digital lab, and using merely those tools ofthe pre-digital era:
paper, pen, chalk (or dry-wipe marker!) and talk, conversation class, group/
pair work, cassette recorder and an overhead projector?
The digital revolution has even altered the way language is used.
Chapelle (2004) put it this way: language learners are entering a world
in which their communicative competence will include electronic litera-
cies, i.e., communication in registers associated with electronic commu-
nication (2004: 2). But are the tools of educational measurement still
flexible enough, and do they have the scope, to be able to evaluate and
indeed measure the impact of this revolution on language learning and
language learners? Indeed, is the task of identifying scientifically the causes
of improvement in language learning an impossible one? Is it like trying to
triangulate on the infinite (or whatever else we choose to call it) with our
finite minds and tools, as Willard McCarty put it in his key-note speech
to the 1995 EUROCALL Conference in Valencia?
The challenge for those attempting to apply scientific metrics to any
Humanities subject and CALL must surely belong, in large measure, to
the Humanities is that we are dealing with human beings, all of whom
possess complex subjectivity, multiple motivations and unique experi-
ences and gifts. Each one uses different learning processes, adopts different
learner strategies, and demonstrates different learning styles. However, in
evaluating pedagogy for language acquisition, there is not only the learner
to factor in, but also the learning and the learning environment.
When one considers the learning, there are plenty oflanguage-learning
pedagogies past and present that may be influencing teachers and class-
room or lab proceedings: Behaviourism, Functionalism, Constructivism,
Social Constructivism, Associationism, Connectivism, Socio-linguistics,
Chomskyism, the Natural Approach, Accelerative learning, Suggestopedia,
Second Language Acquisition (SLA), Cognitivism, Task-Based Learning,
Blended Learning (BL) and more. The question is the following: is the role
they play identifiable, and if so, is it susceptible to qualitative appraisal or
even quantitative measurement?
As for the CALL learning environment, clearly there are factors that
must play their part in influencing learning outcomes, such as comfort,
ergonomics and affective or psycholinguistic dynamics. Computer-based
learning environments clearly create their own variables in the learning
equation. Can these, too, be identified, isolated and measured? And if so,
how?
In essence, this book is about evaluation and aims to give the reader,
whatever his or her experience of evaluation, a theoretical introduction as
well as practical tools (i.e. a model for evaluation and stage-by-stage check-
lists) for assessing the value of computers in language teaching and learning
(CALL). This book will look at the history of attempts to be more certain
in evaluating CALL and will explore ways in which evaluation might be
done more efficiently and comprehensively. While the field-work has been
carried out at a UK university level, examples are cited from other sectors of
education from primary, through secondary and up to adult level. Readers
will find the model for evaluation (abbreviated to MFE), and checklists
have a built-in flexibility to enable them to be applied in a wide range of
educational contexts. They will enable the evaluator to carry out a kind of
quality control of the key factors that contribute to computer-assisted
language learning.
To that end, the focus has been on three variables which were felt at
the outset ofthe study to encompass the principal factors influencing the
language learner and language learning: the digital platform, the software
program, and the pedagogy employed. It was concluded early on in the
project that an evaluative model for CALL had to deploy the appropriate
metric tools and research approach to assess empirically both the impact
of each distinct element and any added synergies that may operate when
all the elements are working together in a real-life setting.
The need for a systematic approach to CALL evaluation
This book builds on the agenda-setting work of a small number ofCALL/

CASLA researchers to develop a more systematic approach to evaluat-
ing CALL. From the outset the aim was to demonstrate that CALL
4 Chapter 1
effectiveness research, drawing on the findings of empirical as well as more

qualitative research, should also be an integral part in the design and con-
struction of appropriate digital learning platforms, the writing of software
for language learning, and the conceptualizing of effective pedagogies for
CALL. That this has not always been the case may have been due to the
fact that CALL evaluations have often been undertaken in a scattergun
way and have not always employed a rigorous methodology. It is probable
that there will never be a single optimal pedagogy given the plethora of
existing approaches and language theories and their ever-changing nature.
However, what is needed, the more so because of this diffuseness in the
pedagogy and the hectic pace oftechnological advances, is a holistic, stable
and reliable approach to CALL evaluation. There are, in the history of
CALL and CAL, examples of good evaluative practice dating back to the
earliest days of computer-assisted learning (i.e. the 1960s), which need to
be integrated into a model for CALL evaluation.
The shabby equipment may have improved, become less prone to
breakdown and more interoperable, but has the pedagogy kept pace, and
have the learners and teachers managed to keep up? Laurillard in 1994
sounded a note of caution: If the use of new technology were to begin
with an analysis of what students need, instead of an analysis of what the
technology can offer, the directions taken would be very different (1994:
1). Similarly, Thompson states that CALL materials must be relevant to,
and integrated into, the curriculum: Unless [CALL materials] are directed
towards specific modules in specific programmes, which relate in a mean-
ingful way to their general learning programme, learners will soon dismiss
CALL as a waste of their time (2005: 151). In recent years there seems to
have been a gradual reappraisal of priorities with an increased emphasis
on a learner-oriented approach. Yet the nagging question persists: does all
this computer-assisted learning make any real difference to the learning
process and to the quantity and quality of the learning? And, if so, how
can this be proven?
Existing parameters for CALL evaluation
A book whose main objective is the evaluation ofthe methodologies used

in CALL effectiveness research and the exploration of useful ways of quan-
tifying student progress through CALL is faced with the challenge of
measuring a range of potential contributors, as well as potential hindrances,
to learning gains. It must consider the effectiveness not only of software
but also ofthe persona in the software (McCarty 1995: 3031), that is, its
personality, life, responsiveness and interactivity. It needs to determine the
ability of equipment, be it dated or state-of-the-art, to contribute added
value. It has to assess the impact upon learning outcomes of those many
independent variables (such as attitude, aptitude, learning style, degree of
ICT use and environment) which may, or may not, yield themselves up to
quantification and qualification.
Leaders in the field of CALL, from Pederson and Dunkel in the late
1980s and early 1990s to Levy, Chapelle and Felix in more recent years, have
pointed to the dearth ofboth CALL pedagogy and the empirical research
into learning processes and learning gains, while at the same time providing
guidance as to the kind of research that is needful. Chapelle, for example,
in Computer Applications in Second Language Acquisition Foundations
for teaching, testing and research (2001), declares: The CASLA retrospec-
tive ofthe twentieth century portrays it as a time of idiosyncratic learning,
quirky software development, and nave experimentation. She is, however,
hopeful that the stage is set for an improved approach to collaborative
development of infrastructure for large-scale use by the profession, and
research essential for the development oftheories of language assessment
and acquisition (2001: 175) as the twenty-first century gets under way.
Light at the interface
There is general agreement on the need in a field such as CALL, anchored

as it is between the humanities and the world of technology, to balance
qualitative with quantitative data. It is not that the humanities can only
be subject to qualitative study and the world of technology only subject
6 Chapter 1
to quantitative analysis, but rather that human interaction, or inter-sub-

jectivity, is so complex as not to be easily quantifiable and that technology
so utterly dependent on empiricism and logic as to miss the affective, the
human, the persona, and the synergistic. Can each be harnessed together
for the purpose of identifying and isolating contributory variables and
causality within the CALL dynamic?
Effectiveness research based on social sciences and the humanities
faces a clash of two philosophical approaches to research: positivism and
phenomenology. A positivist approach emphasizes observable social reali-
ties, law-like generalizations similar to those produced by the physical and
natural sciences (Remenyi, Williams, Money & Swartz 1998: 32), and seeks
to make detached interpretations aboutdata that have been collected
in an apparently value-free manner with an emphasis on a well-struc-
tured methodology that enables replication and quantifiable observations
that lend themselves to statistical analysis (Saunders, Thornhill & Lewis
2006: 85). As such, it will always stand in tension with a phenomenologi-
cal approach, whose proponents argue that rich insights into this complex
world are lost if such complexity is reduced entirely to a series of law-like
generalizations (Saunders et al. 2006: 86), that the hidden reality behind
thoughts and feelings does not lend itselfto exact quantification and that
only a phenomenological approach can enable the discovery ofthe reality
working behind the reality (Remenyi et al. 1998: 35).
The challenge to both technology and humanities researchers and ped-
agogues occurs at the human-computer interface. Is their coming together
measurable? If so, how? Can the study of such a conjunction inform lan-
guage teachers, language learners, courseware designers, educational man-
agers, and theorists of second, or foreign, language acquisition? How have
CALL researchers and others approached these questions, and what new
paths are being forged? Such is the heart of this enquiry.
A need to configure metric methodologies
It is not surprising, given the coincidence in CALL of at least two major

academic fields, that various research design models have been adopted
and adapted. Murray (1999: 191, cited in Levy 2000: 180), for example,
underlines the importance of multiple data sources and urges a configura-

tion of methods from pre- and post- language proficiency tests to inter-
views, personal language learning histories, journals, interviews and video
observations.
Likewise, Felix (2004: 124) in a conference paper entitled the Para-
doxes and Pitfalls ofICT Effectiveness Research: Some Modest Solutions,
states that a combination of various data collection methods within one
single study will help in strengthening confidence levels about results. In her
analysis of over 100 recent studies, she lists at least sixteen different design
models being used, with varying degrees of validity, by CALL researchers
(2004: 128). These range from pre-/ quasi/ and non-experimental designs
to various kinds of pre-test/post-test combinations, and within-subject or
between-subject models. To measure individual outcome effects she says
that a short-term fully controlled experimental design would be appropri-
ate, while a longer-term non-experimental study using qualitative measures
such as observational procedures and think-aloud protocols would yield
important data relating to effects on learning processes (Felix 2005a: 12).
Clearly what is needed in the world of CALL is well thought-through
design models together with apt combinations of data-capture methods,
which, where necessary, span the quantitative-qualitative axis.
A spouse for fruit: Keeping sight of the main purpose
Yet the gathering and publishing of data must surely not be our final goal.
In our endeavour to improve the performance of our language students in
their target language, and in our search for the elusive goals of optimized
platforms, programs and pedagogy, the role of our data must be to inform
further improvements in teaching and learning as well as CALL software
design and not be an end in themselves. A lesson might be taken from Sir
Francis Bacon whose condemnation ofthose who valued knowledge as an
end in itself, who use it as a mistress for pleasure rather than as a spouse
for fruit, might also be applied to those who value data and technology
as ends in themselves (Bacon, cited in Lewis 1943: 46). The fruit we seek
as teachers and CALL researchers must ultimately be the progress of our
students, not the generation of unapplied data and evidence.
8 Chapter 1
Much CALL research increasingly focuses on what have been called

atomistic studies (Yildiz & Atkins 1993: 134, cited in Levy 1997: 30) that
aim to unpack ever smaller foci of study given the increasing complexity
in CALL, and our growing understanding ofthe many variables involved.
Such discrete studies, while important, can easily lose sight of the larger
picture and ofCALLs location at the crossroads of a number of different
disciplines. It is essential to remain mindful of this and the fact that our
clients, or subjects, will always be human beings at varying stages of devel-
opment, with varied combinations of learning style, and the power to
behave unpredictably. If our promised land is to remain multilingual-
ism, then CALL must enslave neither teacher nor learner to the means
of delivery.
All the same, CALL effectiveness research must endeavour to be as
precise in its measurement as possible, employing diagnostic tools that are
appropriate to the task and rigorous. Pederson (1988), in her synthesis of
previous CALL research, cites Sterns (1983: 63) description of appropriate
second-language research methodologies:
Language learning theory has had a strong preference for speculation, the expres-
sion of personal opinion, the explanation of practical experience, and participation
in controversy all perfectly legitimate ways offinding directions provided they are
balanced by systematic empirical procedures. But in language teaching theory we
have tended to neglect the collection of empirical data (p. 126).
CALL evaluators and researchers need to answer her call for disciplined,
dispassionate research that attempts patiently and carefully to add to what
is already known about how students learn languages in the context of
computer-assisted language learning, sharing her desire to create the best
assurance that CALL, unlike the language lab of the 1960s, will be used
intelligently (1988: 127).
Second language acquisition, language theory and the Case Studies
In its attempt to heed Pedersons call to build on the lessons from existing
research, the project behind this book did not espouse any one particular
theory of language learning despite a leaning towards second language
acquisition (SLA). Other theories and approaches were not precluded in

our search for an effective pedagogy for CALL, which itself was part of
the rationale for the four Case Studies that generated data for this project.
These Case Studies reflect an attempt to evaluate a variety of different
approaches that broadly come under what Felix has described as combin-
ing constructivist and instructivist elements (2006: 182). By definition
SLA is a very broad church. As Ellis states: the nature of this object
[SLA] is far from clear, and different researchers have given very different
interpretations of it (1994: 11). A summary ofthe nature ofSLA research
in general and its application to CALL is given as part of Chapter 2, and
with it a clarification of which definition and pedagogical emphasis within
SLA has been adopted for the purposes of this book.
Through the course ofthe studies different language learning theories
informed the experimental construct and teaching approach. For example,
a communicative approach informed the TOLD project (Barr, Leakey &
Ranchoux 2005; see also Chapter 8), a theory of Blended Learning (BL),
combining an eclectic mix of Task-Based Learning (TBL), constructivist
and instructivist approaches, informed the BLINGUA project (Leakey
& Ranchoux 2006; see also Chapter 8). The experimental design of this
project deliberately compared a teacher-led approach with one that involved
more autonomous learning in groups determined by learning style. The
primary product ofthis project, the Model for Evaluation (MFE), occurs
as two realizations in this tome. MFE1 was the leaner prototype model
used in the Case Studies, derived from various mapping exercises that
pitted Chapelles theory-driven principles for evaluating Second Language
Acquisition against other evaluative or conceptual frameworks from the
relevant literature. MFE2 is the expanded model that pools the lessons
from the Case Studies and includes a number of sub-checklists that drill
down into the twelve CALL Enhancement Criteria and the Qualitative
and Quantitative Measures routes anticipated in MFE1. MFE2 is unpacked
in full in the final chapter.
While the evaluative model generated by this study is designed to be
applicable across the full range of CALL applications and environments,
no claims are made as to the establishing of a universally applicable new
theory of CALL pedagogy. The empirical data, if anything, reinforce the
value within a broad acceptance ofSLA theory of a pragmatic and locally-
10 Chapter 1
determined blending of resources and approaches in response to student

needs, the demands of the curriculum and the constraints of the learning
environment. Indeed, an attempt to define and test a blended learning
pedagogy for CALL is made (see the BLINGUA Case Study). Statistical
inferences based on the Case Studies reinforce some existing good practice
and provide insights that may help enhance the rigour or direction offuture
studies. Their generalizability must, however, due to the small sample sizes
involved, remain debatable. Such conclusions bear out the wisdom ofElliss
words, in SLA Research and Language Teaching:
SLA is characterized by small-scale studies. There have been few, if any, studies that
might be characterized as large-scale. However, this can be seen as an advantage, as
it affords a rich contextualized view of how L2 [i.e. Second Language] acquisition
takes place. The danger lies in using local research to advance pedagogic proposals
of a categorical nature. (1997: 252)
The aim is more to afford a broad and deep contextualized view ofCALL
learning and CALL evaluation, to add insights and guidelines to the corpus
of good evaluative practice, and to suggest a framework for systematiz-
ing CALL effectiveness research. Out of this, it is hoped, future CALL
effectiveness researchers might more easily identify gaps in the literature,
generate research questions that build on a logical progression of enquiry,
and employ proven methodologies that meet a consensual CALL effec-
tiveness research agenda.
It is now time to look at the research questions and methodologies
used for this study.
The foundational research questions
The following list of research questions was drawn up prior to the Case Stud-
ies and out ofthe initial literature review. They form the basis for the scope
ofthe literature review in Chapter 1 and, while other subordinate research
questions arise out ofthe literature review, they are the foundational focus
for the MFE and the Case Studies in the subsequent chapters.
1. Does CALL really improve language learning, and if so, what is the
evidence for this?
2. What might be a useful evaluation model for investigating and meas-
uring the effectiveness of platforms, programs and pedagogy?
3. Can one usefully compare CALL to traditional methods of second
language teaching and learning?
4. How can one best measure the effectiveness ofCALL platforms, pro-
grams and pedagogy over a sustained period oftime (between 1 semester
and 3 years minimum)?
5. Is it possible, using an appropriate evaluation model, to identify best
practice using optimized combinations of multimedia/CALL?
6. How can one integrate best practice using optimized combinations of
multimedia/CALL with more traditional second language teaching
and learning methods and with varying degrees of enthusiasm amongst
staff and students?
7. Is student progress (or lack of it) through CALL or non-CALL peda-
gogy determined to a significant degree by independent variables such
as learning style or prior exposure to/use ofICT? If so, might there be
value in mapping student learning paths to their dominant learning
style?
Research methodology
Theoretical considerations and research philosophy
In delineating the parameters of this study a decision was made to adopt

a flexible research philosophy from the outset. A balance of positivistic
and phenomenological approaches would lead to the generation of metric
data allowing for objective, statistical inferences to be made, while also
providing context-specific insights of a deeper and more subjective nature.
This blended philosophy would influence, in particular, the design ofthe
12 Chapter 1
Case Studies (quasi-experimental, real-life based, as opposed to tightly

experimental, clinical), but would also inform the kind ofCALL pedagogy
that would prevail (a blended, eclectic, and inductive approach, rather
than a deductive approach moving from a monistic educational theory
to practice).
An exclusively positivistic, scientific and empirical approach might
deny the opportunity to explore the reality behind the reality (Remenyi
et al. 1998:35) that occurs in any educational context where the complex
behaviour of human beings is under observation. Likewise a purely phe-
nomenological research philosophy might restrict the possibilities for the
quantitative study of, say, student progress over time (pre-test to post-test),
or for the tight control of different variables under experimental or near-
experimental conditions, or for strong internal and external validity in the
data. Such a quantitative approach would, furthermore, enable one to get
closer to explaining the causal relationships between variables (e.g. the link
between a particular learning style and progress using CALL).
A potential danger of a mixed, or configured, approach might be that
of falling between two stools: on the one hand, obtaining insufficiently
robust data because of having too much complexity and too many con-
founding variables to grapple with in a real-life classroom/lab context, and
on the other hand, having insufficient depth of context-specific insight and
participant feedback due to a preoccupation with obtaining satisfactory
sample sizes, full data sets and non-intrusive observation.
Evaluative theory
The theoretical background to the research and the evaluative methodol-

ogy used in the thesis that inform this book were informed by Chapelle
(2001) for broad principles and Felix (2004, 2005a) for experimental design
detail. The statistical methodology was drawn from the general principles
and practice of statistical analysis (e.g. Buglear 2000; Pallant 2001).
In articulating principles for CALL evaluation, Chapelle (2001)
states that three needs have to be addressed to improve CALL evalua-
tion. First, evaluation criteria need to incorporate findings and theory-
based speculation about ideal conditions for SLA, in other words CALL
researchers need to be familiar with previous effectiveness research findings
and know as far as possible the agreed best practice for language learn-
ing. Second, criteria should be accompanied by guidance as to how they
should be used; in other words, a theory of evaluation needs to be articu-
lated. Third, both criteria and theory need to apply not only to software,
but also to the task that the teacher plans and that the learner carries out
(pp. 5152). In other words, a holistic MFE, that is capable of evaluating
platforms, programs and pedagogy, is needful.
Chapelle argues that CALL evaluation has to go beyond what she
calls judgmental methods of evaluation to include empirical methods.
The former, she argues, is a level of analysis suitable for evaluating both
CALL software and teacher-planned activities to determine how well the
program/task does the job of improving language competency (see Table
1.1). Empirical analysis, she argues, evaluates the learners performance, and
is therefore conducted through examination of empirical data reflecting
learners use of CALL and learning outcomes (pp. 5354).
Level of Object of Method of

Example question
analysis evaluation evaluation
Does the software provide learners

1 CALL software the opportunity for interactional Judgmental
modifications to negotiate meaning?
Does the CALL activity designed

Teacher-planned by the teacher provide learners the
2 Judgmental
CALL activities opportunity to modify interaction for
negotiation of meaning?
Learners Do learners actually interact and

3 performance during negotiate meaning while they are Empirical
CALL activities working in a chat room?
Table 1.1 Levels of analysis for CALL evaluation. Source: Chapelle, C. (2001: 53).
Computer applications in Second Language Acquisition.
Cambridge University Press, reproduced with permission.
14 Chapter 1
This study incorporates both types of evaluative method. The terms judg-
mental and empirical echo the terms qualitative and quantitative. Judg-
mental data are usually open, descriptive, verbal, subjective and based on
opinion or affective response; they can be collated and analysed qualitatively
(for example, in semi-structured interviews or focus groups) or quanti-
tatively (for example, in Likert scale and closed yes/no questionnaires).
Empirical data are closed and explanatory data that are gained by objective
observation and/or experiment; they are usually collated numerically (i.e.
quantitatively).
Reporting formats
Empirical and qualitative/quantitative data can be depicted in a variety of

formats depending on the nature of the study, the purpose of the report
and the target readership or audience. Judgmental data are most frequently
depicted in a verbal, non-quantitative, tabular format, though often Yes/
No and multiple-choice qualitative response data might be tabulated quan-
titatively. Quantitative data are, of course, normally depicted numerically,
and here the descriptive statistical tables normally precede any inferential
analysis of the data and any t-tests or analyses of correlation. However,
there are instances where quantitative data are portrayed in a verbal format.
This may occur when the readership is a non-specialist one and is more
interested in a general statement of results than the fine detail ofthe data.
Meta-analyses reporting on multiple studies often use this approach.
Reporting of the Case Studies employs a range of different tabular
and graphic formats for the representation of quantitative and qualitative
data. Statistical analyses have been for the most part carried out using the
software package SPSS in conjunction with Microsoft Excel. These two
packages have, between them, an extensive range of tabular and graphic
formats. Typically, a discussion of descriptive statistics with quantitative
tabular illustration is followed by inferential statistical analyses employing
paired sample or independent sample t-tests and Pearsons rho analyses
of correlation. Where the sample size is small, and this is often the case
in the Case Studies due to the small size of the different language classes,
the above analyses are sometimes accompanied by supporting, or negat-

ing, non-parametric tests, such as the Wilcoxon Signed Ranks test or the
Mann-Whitney test or the Spearman rho test. The problem of small groups
is made even more challenging when a group is then subdivided according
to learning styles and correlations with learning outcomes or any other
variable are sought.
Much ofthe qualitative data was also submitted to statistical analysis
so that valid inferences could be made. To this end surveying techniques
such as questions requiring a simple Yes or No answer, or varieties ofLikert
scales (e.g. 15 rating), were used. Where richer data were sought, open-
ended questioning was used, although these do not lend themselves so
easily to statistical analysis. For a useful summary of some key concepts
and definitions linked to research design and statistical analysis see Felix
(2005a, pp. 2627).
Data collection methods
For the most part the studies involved make use of Treatment and Com-
parison groups to control for specific variable(s) and so enable the gathering
of data for the comparing of means.
A treatment (sometimes also called a test or experimental) group is,
in most instances, a group taught in a CALL environment such as a mul-
timedia laboratory. In the case of the BLINGUA Project for the same
University of Ulster Case Study, both treatment and comparison groups
were taught in the multimedia laboratory.
A comparison group, for example, might be a group taught in the tradi-
tional manner (i.e. in the classroom, never in the multi-media lab). This was
the case with the TOLD Project in the University of Ulster Case Study.
Quantitative data were collected by means of a pre- and post-test, so
that an empirical gauge oflearning gains/outcomes could be made. Before
teaching began, all subjects were given a test on the areas oflanguage cov-
ered by the module. At the completion of the period of instruction, the
same, or a near-identical, test was again administered as a post-test.
16 Chapter 1
Qualitative data were collected by means of student and staff evalua-

tions at mid- and end of semester/academic year. During and/or at the end
of the period of instruction, students were asked to complete a question-
naire. This was designed to find out how students reacted to the instruc-
tion methods, as well as how they felt about their progress. Some of the
later Case Studies included focus-group interviews in which subjects were
interviewed and digitally recorded during or after the treatment. Use was
also made of journals and web-logs. Students were asked to record, over the
duration of the treatment period, their subjective reaction to their learn-
ing experience. The TOLD and TellMeMore projects used a paper-based
journal, the BLINGUA Project made use of a combination of web-log
and paper-based journal.
Experimental design models used
These models varied according to the requirements of the project and

the nature of the data sought. In most cases the following research design
models were used. Fuller definitions of the terms below are given in Felix
(2005a, pp. 2627).
Firstly, the research design is based largely on a Case Study approach.
This, in essence, is a strategy for doing research which involves an empirical
investigation of a particular phenomenon within its real-life context using
multiple sources of evidence. The contexts for the Case Studies for this
project were always university undergraduate settings involving language
learners in CALL and sometimes non-CALL settings.
Given the real-life context of each Case Study one must also describe
the research design as a quasi-experimental research (similar in many ways
to action-research) design. These are, essentially, real-life studies which,
by definition, involve potentially confounding extraneous variables but
have the advantage of taking a slice of life. Being a gauge of what is really
going on they are in contrast to experimental research designs which are
more tightly controlled, clinical studies, and are more artificial for these
reasons and may not, therefore, show how subjects behave under normal
real-life conditions.
Quasi-experimental studies usually involve both a pre-test and a post-

test, treatment and control (or comparison) groups, but usually no random
assignment of subjects, due to the fact that they are subject to such con-
straints as timetables and streaming or course-determined cohorts. Given
the constraints of the university context it was usually impossible to ran-
domize the make-up of our treatment and comparison groups, although
the TellMeMore (v.9) study drew its subjects from volunteers rather than
subject cohorts.
All of the Case Studies employed a pre-test/post-test comparison
group design to provide an objective measure of learning gains over the
period of the study. The word comparison is used advisedly instead of
control as the latter relates normally to more experimental contexts.
All Case Studies were variants of a between-subjects study, which is a
study designed to make a comparison oftwo or more treatments and that
compares them by having one set of users try one treatment and another
set of users try another treatment, measuring their performance for each
(Felix 2005a: 26).
Finally, one can also describe all ofthe Case Studies as cross-sectional,
in that a cross section of subjects was chosen to represent a particular target
population, and this was studied at essentially one point in time (albeit for
periods of between one and two semesters). One study, the BLINGUA
study, while taking cross-sectional snap-shots over several years, took on
a longitudinal character in that it followed one cohort over a two-year
period, and gathered the same learning gain data from different cohorts
over a four-year period.
The above research questions and methodology inform the proto-
typical Model MFE1 (Chapters 4 and 5) and run as a thread through the
Case Studies that follow, before being widened, in the concluding chapter,
into the more systematic MFE2, born of the experience of the four Case
Studies and more recent developments in the field.
18 Chapter 1
Summary of chapters
Having introduced above the broad research questions and research phi-
losophy for my overall enquiry, as well as introduced some of the key ter-
minology and methodology to be used, this study will now, in Chapter 2,
evaluate the relevant literature relating to CALL definitions, CALL history,
the history ofCALL effectiveness research, CALL pedagogy and second
language pedagogy in general as it relates to the context of CALL and
CALL evaluation. Chapter 3 addresses the question Has CALL made a
difference, and how can we tell? in the context offour key debates that arise
frequently in the literature when CALLs effectiveness is being discussed:
whether CALL improves language learning, what the value of compara-
tive evaluations is, what combination of methods is best for measuring
progress, and whether the focus should be on learning processes or learning
outcomes. The chapter concludes by applying past lessons to an improved
model ofCALL evaluation. Chapter 4 presents the prototype MFE1 and
suggests two primary routes through the evaluative jungle: one that uses
twelve CALL Enhancement Criteria to judge CALL impact phenom-
enologically and a second that configures Qualitative and Quantitative
Measures to judge CALL impact more positivistically. The choice of the
twelve CALL Enhancement Criteria is then justified by means of mapping
Chapelles six evaluative criteria for evaluating CALL task appropriateness
against a number of evaluative agendas from the literature and CALL
practice. Chapter 5 presents the Qualitative and Quantitative Measures
and argues for an empirical methodology of CALL evaluation involving
a triangulation of analytical and diagnostic tools aimed at obtaining data
that are both rich and containing strong internal and external validity.
The three chapters that follow on from the presentation of MFE1
(that is, Chapters 6 to 8) draw together the evidence and findings from
the Case Studies which pilot MFE1 in a number of CALL settings. The
Case Studies follow the hierarchical logic ofthe construction site: starting
with the foundation of all CALL activity: the digital platform, followed by
the software program that sits on this platform, and finishing up with the
pedagogy that harnesses these for the purposes of teaching and learning.
MFE1 is revisited and improved upon in the final chapter (Chapter

9). It is important to reiterate, at this stage, that the Case Studies did not
trial the final ideal version of the Model for Evaluation (MFE2) that is
extrapolated in the final chapter. This is simply due to the fact that the
expanded model came about as a consequence of the trialling of the first,
more rudimentary, version in the Case Studies (between 2003 and 2006)
and of the insights gained through that experience and through wider
reading as well as developments in the field ofCALL effectiveness research
since the start of the research for this study in 2003.
The Case Studies
The decision to adopt a Case Study approach was born of three principal
needs. First, there was the need to prove, or if necessary disprove, the effec-
tiveness of CALL. Secondly, there was the need to trial and improve, in
real-life academic settings, a Model for Evaluation that would be a flexible
and systematic tool capable of assembling a large-scale picture from numer-
ous small-scale studies using a configuration of data-gathering techniques.
And thirdly, there was the need to develop a pedagogy for CALL that was
both informed by theory and yet pragmatic and flexible enough to identify
and profit from the rapidly changing and diverse world oftechnology with
which most of our students are already familiar.
The first Case Study (Chapter 6) looks at the evaluation ofthe impact
of digital platforms on the whole CALL process, with particular emphasis on
Robotels SmartClass digital platform as used in the University ofUlster and
the Melissi Digital Classroom as used in the University of Portsmouth.
The second Case Study (Chapter 7) evaluates the role of commercial
software in driving (or hindering) the CALL agenda and looks, in particu-
lar, at two evaluative projects that trialled, in the context of higher edu-
cation language teaching, different versions (i.e. a networked CD-ROM
and an online e-learning course) of a high-powered product developed by
a major CALL software manufacturer (Auralog, France).
20 Chapter 1
The third Case Study (Chapter 8) reports on two pedagogy-based

studies (TOLD and BLINGUA). These track and evaluate the migration
from an analogue language lab to a high performance digital platform in
the University of Ulster between 2003 and 2006 and assess the effective-
ness of new CALL platforms, programs and pedagogy in the context of
the teaching of French to undergraduate students and report on the use-
fulness of the trial Model for Evaluation with recommendations for its
improvement.
First, however, we must look at the literature ofCALL, CAL and SLA
and at the evidence of effectiveness research in the field ofCALL since its
birth over forty years ago.
Chapter 2
Swings, spirals and re-incarnations:

Lessons from the past
Introduction
Effectiveness research, or systematic evaluation, as it has also been called

(Dunkel 1991: 2324), is the analytical approach to evaluating the impact
or effects of a treatment on individuals, groups or processes. It is about
qualification and quantification. In our case it concerns the measuring of
the effects ofCALL on language learners. The notions ofCALL and CALL
evaluation are associated with a number of concepts that, from the outset,
need explaining and setting in the context of CALL evaluation.
To start with, the term CALL is defined and placed in the context
of the academic disciplines to which it is related, from which it draws
inspiration and to which it is now contributing. Several frequently used
CALL acronyms will also be briefly summarized. The concepts evaluation
and effectiveness are then unpacked into their various component parts:
effectiveness research; quantitative and qualitative analysis; experimental
and quasi-experimental research, and internal and external validity. The
scope ofthe terms pedagogy, platforms, and programs (or the Three Ps
for short) as they occur in the language learning and CALL literature and
as they will apply in the chapters following will then be given. Concepts
such as behaviouristic, communicative, constructivist, language acquisi-
tion, and learner differences all feature heavily in the evolution oflanguage
learning from the 1950s on; and as such have influenced the development
of pedagogies for CALL, and the evolution of the digital platforms and
software programs that support language teaching. This will pave the way,
in the following chapters, for a look at the evidence that already exists for
the effectiveness of CALL, how this was arrived at, and how one might
justify a new more inclusive and flexible model for evaluating CALL.
22 Chapter 2
Definitions
CALL A definition
Levy defines CALL as the search for and study of applications of the
computer in language teaching and learning (1997: 1). In light ofthe review
ofthe literature and for the purposes of clarity for the enquiry that follows
I have defined CALL as the following: the exploration, sometimes coher-
ent, sometimes disparate, of all aspects ofthe human-computer axis, with
the primary goal of enhancing the process of second-language teaching and
learning, be it in curriculum design, delivery, testing, feedback, monitoring
or evaluation, by means of the generation of improved computer-based
platforms, courseware, learning environments and pedagogies.
Computer-assisted language learning is now in its fifth decade as an
academic discipline or field of study. This relative youth may go some way
to explaining its being a relatively unknown and disparate entity. Also,
its very name suggests three vast areas of knowledge each with their own
fields of study and frames of reference. The notion of computer-assisted
automatically links the discipline with the new and rapidly changing dig-
ital world and with it a plethora of fields of varying degrees of relevance
to CALL. This relatively new field is linked to two that are nearly as old
as human beings: the concepts of language learning and learning, each
with their own conceptual root systems. CALLs very name reflects its
interdisciplinary nature.
CALL An interdisciplinary field of study
Levy (1997: 4950) lists twenty-four disciplines that bear upon CALL
and are to varying degrees influenced by it. They reflect our three principal
conceptual areas: the notion computer-assisted links CALL to Artificial
Intelligence (AI), to Computer-Assisted Instruction (CAI), Computational
Linguistics, Educational Technology, Expert Systems, Human-Computer
Interaction (HCI), Information Processing, Instructional Design, Instruc-

tional Technology, Language Data Processing, Machine-Translation,
Materials Design, Programmed Instruction/Learning, Natural Language
Processing (NLP), Parsing Theory and Systems Theory. Language learn-
ing links CALL to Linguistics, Applied Linguistics, Language Teaching
Methodology, Psycholinguistics, Second Language Acquisition (SLA),
and Sociolinguistics. Learning links CALL to Curriculum Development,
Cognitive Psychology, and Educational Psychology. Developments in any
ofthese fields are likely to have a knock-on effect on CALL and vice-versa.
Thus, an understanding of what constitutes CALL must remain a subject
of ongoing and evolving debate.
In Levys definition the focus is on the role of computer applications
in the language learning process, an emphasis which has gained increasing
centrality in recent CALL agenda-setting studies (cf. Chapelle 2001, Felix
2005a). Yet there is broad agreement with Levys assertion that CALL
practitioners and educators must ensure that CALL is not technology-led
(1997: 215) and that one ofthe key areas offocus must be in tightening the
relationship between theory and application. By application we usually
think of pedagogy in practice, and, quite literally, software applications.
These latter may be broadly defined as software or courseware. Courseware
is a subset of the general term software, and is defined by Davies et al. as:
complete commercial [language] courses either on- or off-line (2005: 9).
As for digital platforms, are there grounds for their featuring as a focus of
effectiveness research in CALL?
Recent years have witnessed the development of a growing number
of multimedia language learning systems, which blur the traditional line
between hardware and software. One might call them language learning
networks, shells, or indeed platforms. As such, they incorporate on the
one hand a number of hardware features such as switching systems, real
or virtual servers, video-conferencing equipment, specialist sound and
graphics cards, and tutor/administrator tools, and on the other hand, fully
integrated courseware applications. This project included Case Studies
(reported on in Chapters 4 to 7) of a number of university language depart-
ments which have recently changed from analogue to digital multimedia
laboratories. An evaluation ofthe capabilities ofthese platforms in real-life
24 Chapter 2
educational contexts will provide vital feedback to department managers

about to undertake just such a migration, to teachers coming to grips with
transition to a new teaching and learning culture, and to designers whose
products have not really been subjected to the pressures and demands of
different pedagogies, software modalities or learner differences.
CALL acronyms
For the sake of brevity, acronyms will be used as often as possible. The
acronym CALL, coined in the early 1980s, in all probability by Davies and
Higgins (1982), is one ofthree generic labels that jostled for pre-eminence
in the 1980s and 1990s, the other two being CELL (Computer Enhanced
Language learning) and TELL (Technology Enhanced Language learn-
ing). The name CELL was probably first coined by Professor Andrew
Lian around 1988, and, like CALL, emphasizes the computers role in
language learning, while allowing for all types of computer programs and
computer-based resources. The TELL Consortium was founded at the
1993 EUROCALL conference in Hull and was based at the Centre for
Modern Languages at the University ofHull in the UK; its nomenclature
implies a broader scope than CELL including all the technologies involved
in language learning. CALL, nevertheless, has stuck as a term. CALL, as
well as any acronym, emphasized the gamut of roles the computer can play
in learning, and by 1997 had already made its way into the titles of key
journals and conferences (ReCALL, CALL, On-CALL). The emphasis
on the term computer-assisted in the name CALL is more neutral than
computer-enhanced, emphasizes the facilitating role ofthe computer, and
discourages the perception of the computer as the tutor. As we will see,
this distinction has become increasingly important in the debate concern-
ing causality in the learning process.
While not departing from the generally accepted acronym ofCALL,
nevertheless, this book will balance the concept of learning with role of
the teacher and his/her pedagogy. Chapelles (2001) use of the concept
acquisition in her preferred acronym Computer-Assisted Second Lan-
guage Acquisition (CASLA) perhaps comes closest to this balance, while
still placing the greater emphasis on the learning, rather than the teaching
process. Likewise her use of the term task implies a two-way engage-
ment: something set by the teacher, but worked on with varying degrees
of autonomy by the student. Clearly learning can and should take place
without the teacher, but faced with the increasing plethora of resources
available to the learner, this book will argue that, now as much as ever, the
teacher/tutor/facilitator is needed to harness, integrate, pilot and humanize
the learning materials and processes, and that a clear and holistic agenda is
required to enable a coherent assessment as to how effectively this is being
done. To this end the notions of evaluation and effectiveness research need
to be clarified.
Evaluation studies
Effectiveness research, analysis of data: Definitions and scope
Effectiveness research is more than about providing empirical, scientifi-

cally rigorous evidence that something is the case or that a cause produces
a particular effect. Cameron was, in large part, talking about issues of
effectiveness, when he stated:
Validation and evaluation are extremely important aspects of any project and research-
ers should remember that iftheir papers are going to carry any weight, their findings
have to be substantiated with the support of usage and validation. A number of
submissions to this journal have been brilliant in their conception but have had to
be returned becausethe project had been poorly evaluated. (Cameron 1995: 294,
cited in Chapelle 1997).
Evaluation is about quantifying and qualifying the value or worth of an

object, idea, person or system. In evaluating quantitatively anything edu-
cational we are primarily interested in looking at learning gains or benefits.
The scope for qualitative evaluation does not always overlap with that of
the quantitative. It is at once broader and narrower: able to describe the
subjective and experiential as well as the objective and empirical yet less
able to demonstrate these empirically.
26 Chapter 2
Pederson (1988) defines effectiveness research as the search for an

empirical base for assumptions, strategies and applications ofthe computer
in language teaching; one whose primary goal is to ensure that, follow-
ing the demise of the audio-lingual method and the language laboratory,
another powerful tool for language learning does not fail to reach its
potential (p. 101).
Laurillard and Hewer (1998), published a detailed two-part evaluation
of the TELL Consortium Project whose principal aim was to make the
teaching and learning of modern languages within UK higher education
more productive and efficient by harnessing modern technology (p. 2).
Laurillard states that evaluation, both formative and summative, of the
effectiveness of [] courseware has been frustratingly neglected in the
past (p. 1). Their two reports, one formative and the other summative, are a
good example of an integrated evaluation of courseware design and imple-
mentation. Their holistic study looked at planning and implementation
processes, the data arising from the project, issues related to pedagogical
design, and the main findings in terms ofthe language learning issues. The
studys main limitation, from an effectiveness research point of view, was
that it was qualitative in its approach and, as such, did not look at evalu-
ating quantitatively the impact of the courseware on learning gains. This
somewhat undermined the empirical nature of the study.
Clark, like Laurillard and Hewer, argues that CALL evaluation needs
to include the criterion of cost-effectiveness. He asks a question that clari-
fies further the scope of effectiveness research:
We need to ask whether there are other media or another set of media attributes that
would yield similar learning gains. The question is critical because if different media
or attributes yield similar learning gains and facilitate achievement of necessary per-
formance criteria, then in a design science or an instructional technology, we must
always choose the less expensive way. (Clark 1994: 22, cited in Allum 2002: 147).
Effectiveness research goes beyond value to explore value added. While it

is interested in the identification of an effect, it is even more interested in
the effectiveness of this treatment in a given educational context.
CALL evaluation studies aim to assess the full spectrum of CALL-
related activity, from theoretical justifications to design and implemen-
tation. MacWhinney (1995: 324, cited in Barrire and Duquette 2002)
called for evaluation studies that assessed not just the courseware but their
operationalisation in the classroom: The design of computational systems
to support foreign language instruction needs to be grounded in what we
know about human learning, language instruction, and human-computer
interaction. Principles derived from these fields need to be tested and quan-
tified in the context of specific tutoring systems (Barrire and Duquette
2002: 472).
Yildiz and Atkins (1993, cited in Levy 1997: 41) called for an evalu-
ation of learning outcomes with different sizes of learner group and with
different methods of integrating the multimedia application into other
learning taking place in that context.
Is it, however, just about measuring learning outcomes, and merely
about product?
More recently Barr has argued that: few researchers have investi-
gated how to integrate all these applications to achieve maximal benefit
for the learning process (2004: 12). He quotes Richmond (1999: 312) as
suggesting this as an area for future research. Barr concludes his book on
computer-based learning environments with the following recommenda-
tion: a future study might also look at empirical evidence ofthe impact
of computer technology on the language learning and teaching process
(Barr 2004: 226).
The work of researchers such as Chapelle and Jamieson has focused
primarily on just this area of process. They argue forcefully the case for
research that identifies and isolates the specific variables surrounding CALL
activity that may, or may not, be contributing to increased learning gains.
They state that:
The researcher must ask and answer the following questions: What
factors other than the hypostudyed CALL activity could have influenced
students performance on the test used to measure the effectiveness ofthe
CALL activity in promoting the L2 development? What factors other than
the attitudes or strategies under investigation could have been responsible
for students reported perceptions and use oflearning strategies? What jus-
tifies the interpretation of particular behaviours observed as suggestive of
certain linguistic functions or cognitive strategies, and to what extent were
two independent raters able to agree on those interpretations? (1991: 54).
28 Chapter 2
These references help shed light on what is meant by evaluation and

the scope that such studies have taken or should take. Empirical data, how-
ever, come in a variety of forms, which can be broadly categorized under
the umbrella terms: quantitative and qualitative data. How are these to
be defined?
Quantitative and qualitative data
Buglear defines data as a set of known facts and the difference between
qualitative and quantitative data as the difference between categorizing
and measuring. He states that: any data that is based on characteristics
or attributes is qualitative. Data that is based on counting or measuring is
quantitative. (2000: 23). Thus if we were to describe a certain CALL pro-
gram or teaching method as effective, we could verify this by either using
qualitative terms to categorize the responses or effect it had on students
(i.e. motivating, helped me to learn my verbs, improved my fluency and
pronunciation) or we could quantify the actual effect such a program had
on the students by giving them the same test twice, once at the start of a
treatment and once at the end, and measure the difference. One could
then measure the effect on a whole class, add up the average of the whole
group and then compare the means before and after the treatment. Such
an approach would be a quantitative approach.
Whether quantitative or qualitative, the research requires rigour or
validity for its findings to be generalizable beyond the context of the
study. Many factors go towards ensuring such validity, such as the size
of the sample studied, the isolation of contributing variables, the ethical
integrity ofthe research, the length oftime over which the study was car-
ried out and the repeatability ofthe data. Validity can be divided into two
categories: internal and external. Internal validity can be described as: the
accurate attribution of observed experimental results to the factors that
were supposed to be responsible for those results, and external validity as
the applicability of research results to instructional and research contexts
other than the one in which the research was carried out (Chapelle and
Jamieson 1991: 38).
Studies assessing the effects ofCALL on language learning proficiency

can yield both quantitative data (e.g. pre-test results compared with post-
test results using the same test) or qualitative data (e.g. staff or student
attitudinal survey investigating perceptions of the learning process, of
improvement and reasons for improvement). Internal validity may be com-
promised by a failure to isolate the variables contributing to the progress.
For example, if two groups are compared, one using computers and the
other a similar group covering similar content in a traditional classroom,
but with a different teacher, learning gains may be attributable more to
the influencing factor of the teacher than to the influencing factor of the
computer-based environment. Thus internal factors like the context of
the language learning could compromise the external validity ofthe data.
Likewise the characteristics ofthe subjects and the type ofCALL activities
may also influence the external validity of the data.
Given the nature ofthe CALL classroom and the difficulties in obtain-
ing accurate positivistic data that fulfil the requirements ofexperimental
science, most CALL field research can often be no more than what has
been called quasi-experimental research. This is because the researchers
must examine intact groups of students and are unable to control and
manipulate variables (e.g. the content ofthe lesson) for experimental pur-
poses. They must be satisfied with investigating the effects of naturally
occurring treatments in real-world settings (Chapelle and Jamieson
1991: 39). In spite ofthese challenges, is there still justification in pursuing
both a qualitative and a quantitative approach to CALL evaluation in spite
of the difficulties? Furthermore, is the qualitative-quantitative divide as
neat as it appears at face value, and is it possible to pursue a combined or
dual route through the complexities of data-gathering that occur in CALL
environments?
Empirical versus judgmental; positivistic versus phenomenological
The contrasting of qualitative and quantitative data is part of a larger con-

ceptualization of data that posits the empirical against the judgmental
and, from a more philosophical point of view, the positivistic against the
30 Chapter 2
phenomenological. To convey the multiple layering of research approaches

that might follow on from a particular research philosophy Saunders et al.
(2000: 85) use the concept of a research process onion. Reproduced below
(Figure 2.1), it shows how a positivistic research philosophy will tend to
lead to a deductive approach, requiring a more experimental strategy, with
a cross-sectional rather than longitudinal time horizon, drawing more on
sampling, secondary data and observational data collection methods than
interviews and questionnaires.
Positivism Research
Deductive philosophy
Experiment
Research
Cross Survey approaches
sectional
Case
Sampling
Study Research
Secondary data
Observations strategies
Grounded
Interviews theory
Questionnaires
Ethnography
Longitudinal
Action Time horizons
research
Inductive Data collection
Phenomenology methods
Figure 2.1 Research process onion. Source: Saunders, M., Thornhill, A., & Lewis, P.
(2006: 85). Research methods for business students. 4th ed.
Pearson Education Ltd, reproduced with permission.
On the other hand a phenomenological research philosophy will tend

to lead to an inductive approach, requiring an action research or ethno-
graphic strategy, with a more longitudinal time horizon drawing more on
interviews and questionnaires for data collection. It is important to stress
that these are not mutually exclusive routes and depending on the research
design and the setting some aspects can feature in both philosophies. For
example, a closed questionnaire or interview approach will lend itselfto a
positivistic approach, whereas a more open-ended questionnaire or inter-
view technique will suit a phenomenological approach.
There are advantages and disadvantages in rigidly adopting any one

particular approach. For example, a purely positivistic underpinning phi-
losophy on its own would by definition generate mostly quantitative data.
This would allow for an experimental strategy, for example one involving
cross-sectional studies, since in such contexts (e.g. comparing first year stu-
dents year on year) the variables could be more tightly controlled than in
longitudinal studies, where variables such as age and subject material would
be constantly evolving. These can also be termed replication studies.
The primary focus of a purely phenomenological philosophy, on the
other hand, would be to understand the different meanings that the sub-
jects under observation attach to the learning experience. An inductive
approach emanating from such a philosophy would enable one to develop
theory from the raw data gained from a number of contexts. Quantitative
analysis of data produced by such an approach would, however, yield less
generalizable results than much quantitative data from positivistic studies,
as each set of qualitative, reaction data would be context-bound; however,
useful evaluations of individual contexts, experiences and outcomes can be
achieved almost regardless of sample sizes. Such studies lend themselves
more easily to longitudinal and ethnographic studies, where the necessity
of comparing like with like and having tightly controlled variables would
be less important than the evaluation of a long-term process.
The benefits of combining positivistic and phenomenological approaches
This thesis contains a number ofCase Studies where both broad approaches
have been adopted for the purposes of obtaining data that are as reliable
as possible, while also allowing for a richer analysis of the learning pro-
cess permitting alternative explanations of what is going on. For example,
in the TOLD project carried out in 2004, the quantitative data showed
that, over a 12-week semester, learning in a CALL context did not help
students in their oral language development significantly more than those
deprived of CALL were helped by similar material taught in a different
context. However, the quantitative data were given a depth of clarity by
the student logs which revealed that while most students who had access
to CALL materials appreciated the value of these materials for a variety
32 Chapter 2
of drilling methods and presentational communication, they missed the

opportunities to produce face-to-face message-centred communication
that were encountered in the traditional conversation classes with a for-
eign language assistant. This perceived lack was deemed to be a significant
factor in the relatively slower progress made by the treatment group. This
combined approach allowed us to endorse a pedagogic approach in favour
of a more collaborative CALL than the TellMeMore software and the
approach adopted for TOLD was able to deliver.
It is partly due to the difficulties in charting a way through the com-
plexities of evaluative method and partly due to the problems in account-
ing for the numerous factors involved in improved performance in the
CALL classroom or e-lab that much CALL research has tended to focus
on discrete aspects of the learning contexts, where it is easier to identify
single contributing variables; and for similar reasons much CALL research
has also tended to be more of the qualitative kind, gauging attitudes and
perceptions and observing student behaviour. This kind of research has
also been called descriptive research. A variant of this is ethnographic
or ethno-methodological research, which investigates the social as well
as the cognitive impact of using computers (Dunkel 1991: 23). Dunkel
refers to Sheingold, Kane and Endreweit (1983) who argue for this kind
of research in CALL in order to consider the interaction ofthe computer
with the social system that surrounds itto examine how this interaction
changes over time, as computing activities influence the social system, and
the social system, in turn, shapes use of the technology (Dunkel 1991:
23). Clearly there are many forces and factors and variables at work in the
CALL classroom, some of which can be factored out, others factored in,
some ignored, others identified and isolated. This study, while remaining
mindful ofthe potential for multiple other influencing factors, will focus
primarily on what are, arguably, the three most important factors in the
CALL classroom: the hardware used, the software installed and the teach-
ing and learning strategies employed: that is, the pedagogy, platforms,
and programs. It will suggest a dual route through the CALL evaluation
jungle that takes into account the need for both judgmental and empirical
data, a configuring of phenomenological and positivistic approaches in the
search for both valid and rich evidence.
Before justifying such a dual route approach in the light ofthe CALL
literature on past evaluation studies a clarification is needed ofthe nature
and significance of CALL-related pedagogy, platforms, and programs
within CALL effectiveness research.
Pedagogy, platforms and programs Definitions and scope
Three primary questions arise in the CALL (and CAI) literature when
evaluating the role of pedagogy, platforms, and programs in learning gains.
Firstly, how best may one conceptualize the nature and interrelationship
ofthese three terms? Secondly, from a CALL perspective, do they all need
to be involved in our evaluation ofCALL effectiveness? Thirdly, is it pos-
sible to devise a means of measuring their overall effectiveness that takes
into account the separate and the symbiotic role of each in the learning
process? Notions such as Computer-Based Environments and Intel-
ligent Tutoring Systems reflect in their names the search for a language
that takes into account all three concepts and their interrelationships.
Advances in network design and the Internet have also added to the need
for a descriptive and evaluative language that goes beyond separate assess-
ments of software and pedagogy.
Furthermore, evaluative studies need a grasp not just of hardware
specifications and the authoring and instructional design process, but also
ofHCI and psycholinguistic notions, in order to ensure that an integrated
model of evaluation (MFE2) does justice to the experience that the stu-
dents go through when they interact with software and digital platforms,
and provides useful feedback for all the various professionals involved
in the creative process. When Morgenstern stated that too many CALL
programs were technology-driven and called for a more goal-driven
approach to authoring (1986: 23, cited in Levy 1997), he put his finger on
the fact that many CALL programs may be inadequate for the language
teaching and learning context they are used in, even though they may use
34 Chapter 2
the latest technology. Such software may, for instance, tend to exploit for
its own sake recent technological breakthroughs, such as speech recogni-
tion or multimedia, without properly ensuring the content matches student
abilities, curriculum requirements or language learning theory. Likewise
teachers may under-use or misuse expensive resources through ignorance
and inadequate stafftraining. CALL effectiveness research has a vital role
to play at the intersection of the various players involved.
In their 2005 report entitled Setting up effective digital language
laboratories and multimedia ICT suites, Davies et al. produced for CILT
(the UKs National Centre for Languages and the Association for Lan-
guage learning) a useful guide to evaluating the range of platforms and
programs available. They emphasize that the lessons from the demise of
the original analogue language laboratory are being learned and state that
nowadays the language lab is no longer seen as the panacea, but rather as
one of the many technological aids that the language teacher can choose
to use to enhance teaching and learning (Davies et al. 2005: 4). Their case
studies taken from secondary schools in England demonstrate that, even
without the latest hi-tech networks or technical support, a well-structured,
integrative approach to the use ofICT can motivate students and improve
results by up to 15 per cent (p. 25).
The guide gives some useful evaluative questions that it says should
be addressed prior to the purchasing of educational courseware. These
questions, some of which are listed below, could well form the entry point
of an evaluative model for the interrelationship between programs and
pedagogy:
Is the material adaptable for specific use in my institution?

Is it possible to use it in a blended learning way?
Does the material match the pedagogical and methodological objectives
of my curriculum?
Does the material offer appropriate feedback to assist users to evaluate
their inputs and make progress?
Is the material structured in a way that offers a meaningful progres-
sion through it, allowing the learner to establish progress? (Davies et
al. 2005: 10).
In a later section a number of early CALL evaluative frameworks will be

looked at to assess their current value as models for gauging the relation-
ship between pedagogy and software. Such concepts as adaptability, blend-
ing, match, appropriateness of feedback and meaningful progression will
inform our measurement criteria.
An effectiveness measurement plumb-line must, therefore, assess what
Wyatt called the fit between the computers capabilities and the require-
ments ofthe teaching and learning context (1988: 86, cited in Levy 1997),
and what Pennington refers to as the quality of the match between the
properties ofthe medium, the attributes ofthe users and the way in which
it is implemented in a given context (1991: 274, cited in Levy 1997).
Levy states that the computer has too often become an end in itself, in
the sense that CALL materials cease to exist when computers are replaced,
and that commercial imperatives have driven decisions in hardware and
software selection, particularly hardware rendering subservient the ideals
of CALL authors (1997: 230). For these reasons it is not unreasonable to
assume, therefore, that student learning gains have suffered more than
they would have, had technology always been the servant of pedagogy
and courseware design.
What appears not to have been developed is an effectiveness research
model that addresses the relationship between all three links in the CALL
chain, and brings the contributory role ofthe platform under the evaluative
microscope. An evaluation model should, it is suggested, aim to develop
criteria for assessing optimal combinations and integration of pedagogy,
programs and platforms. The current interest in Blended Learning is gen-
erating research paradigms in Computer-Assisted Learning (Oliver and
Trigwell 2005) that CALL researchers can draw on in the search for an
optimal basis for language learning and teaching given the particular con-
ditions at hand (Neumeier 2005: 176).
CALL descriptions and developments have tended to focus primarily
on software and pedagogy, these being issues over which CALL practition-
ers potentially can have some influence, whereas issues to do with hardware
tend to be a question of making the most of what is available (Ingraham
and Emery 1991: 322). Since the arrival of the VLE and integrated lan-
guage learning multimedia environments or CALL systems, in which the
36 Chapter 2
distinction between hardware, platforms, programs and courseware is

increasingly blurred, a clearer definition ofthese terms and a reappraisal of
the role of the platform in the CALL equation have become imperative.
A major reason for this development may be the growing number of
language teachers involved in the design process, bringing together class-
room experience with high-level systems design and programming skills.
Examples of good practice in design synergy have been around since the
beginning ofCAL. Pioneering integrated systems have been around since
the pioneering days of TICCIT and PLATO in the 1960s where main-
frame computers were harnessed to innovative features such as student-
tutor communication, student record storage, tutor involvement in the
authoring process and a specially adapted keyboard (Delcloque 2000: 64).
Specialist companies are now producing integrated packages which harness
the capabilities ofLAN/WAN, RSS feeds and mobile technology, speech
recognition and video-conferencing technology as well as virtual servers
and digitizing equipment to service the needs of not only language teach-
ers but also other disciplines in multiple-use environments. The world of
multimedia should benefit from an all-encompassing evaluative framework,
comprehensive, yet flexible enough to keep up with its progress.
As it is generally a given that pedagogy should inform the design of
the hardware and the software it is to a clarification ofthe term as it relates
to language learning and CALL that we shall turn first before looking at
digital platforms and programs.
Pedagogy Language learning theory and CALL pedagogy
Pedagogy is defined in the Concise Oxford Dictionary (7th edn) as the

science ofteaching. Mortimore prefers the definition of pedagogy as: any
conscious action by one person designed to enhance learning in another
(1999: 3). For the purposes of this study the fuller definition given by
Mehanna is deemed the most helpful: pedagogy is any effective behav-
iour or activities designed to impart knowledge, it is used in the process of
teaching and learning, and has an association with students learning and
outcomes (2004: 283). As yet, one could not say there is a unified peda-
gogy oflanguage teaching, let alone CALL. It is debatable whether such a
unified approach would be possible or even desirable given the diversity of
theoretical standpoints adopted by individuals, institutions and national
education bodies, and given the state offlux that exists with regard to the
nature and acceptance of some ofthese theories. A brieflook at the history
of language learning and CALL pedagogies will illustrate the diversity of
approaches and methodologies past and present, underscore useful con-
ceptual and thus evaluative criteria, and will help outline the integrated,
blended approach to CALL that will inform MFE2 as well as the approach
adopted in some of the Case Studies.
A history of CALL pedagogy: Swings, spirals or an eclectic muddle?
Historically, educational theories and their resultant methodologies have

been subject to the influences and swings of educational fads and fash-
ionable philosophies at any given time; the value in revisiting the past is
to avoid reinventing the wheel by rediscovering and reviving the valuable
from what has gone before, and then integrating the dated but good into
the new.
The history of language teaching proves what Decoo has called the
mortality of language learning methods (2001). His insightful study of
language teaching and learning methods stretching back to the nineteenth-
century plots repeated cycles of evolution from the Reform movement and
Direct Method of the 1860s, through the Active Method, the Phonetic
Method, the Intuitive Method, the Reading Method, Behaviourism and
the Audio-lingual Method, Accelerative Learning, Suggestopedia, the
Silent Way, Community Language Learning, Total Physical Response,
SLA methods incorporating the Natural Method, Cognitive Method,
Communicative Method, and Constructivist Approach, and the most
recent: Post-Communicative language learning. These were interspersed
at various points with integrative, eclectic approaches that sought to draw
on the best elements of several approaches.
38 Chapter 2
In its own briefhistory CALL has attempted to incorporate, with vary-

ing degrees of success, elements of some of these theories. Behaviouristic
CALL, for example, seemed, from CALLs earliest days, to marry success-
fully drill-and-practice activity with the ability ofthe computer to handle
repetitive work, preset answers and feedback. Early communicative CALL
was a less successful response to digital recording and speech recognition
functionality in micro-processors in the 1980s. More recently, terms, such
as Integrative CALL (Warschauer 1996) and Integrated CALL (Bax 2003)
have been coined to describe the process of eclecticism and hybridization
that has been ongoing in CALL since the early 1990s.
Warschauer (1996) talked of a third phase of CALL: integrative
CALL, following on from but not necessarily abandoning all aspects of
behaviouristic CALL and communicative CALL. He stated that the intro-
duction of a new phase does not necessarily entail rejecting the programs
and methods of a previous phase; rather the old is subsumed within the
new. In addition, the phases do not gain prominence in one fell swoop, but,
like all innovations, gain acceptance slowly and unevenly. (1996: 3).
Benefits of behaviouristic CALL that have persisted and contribute
to hybrid CALL approaches might be the ability easily to deliver repeated
exposure to the same material, to provide immediate, non-judgmental
feedback and present such material on an individualized basis, allowing
students to proceed at their own pace thus freeing up class time for other
activities (Warschauer 1996: 3).
Integrative CALL also draws on features of communicative CALL.
According to Underwood (1984, paraphrased in Bax 2003), communica-
tive CALL
focuses more on using forms rather than on the forms themselves, teaches grammar
implicitly rather than explicitly, permits and encourages students to generate their
own sentences rather than just manipulate pre-packaged language, does not judge
and evaluate everything the students produce nor reward them with congratulatory
feedback, avoids telling students they are wrong and makes allowance for a variety of
student responses, uses the target language exclusively and creates an environment
in which using the target language feels natural, both on and offthe screen; and will
never attempt what a book can do just as well. (p. 16)
Warschauers notion of one phase subsuming aspects of a previous phase

goes beyond an uncritical adoption ofthose aspects. The fresh insights of
a new phase will inform and adapt the way the old form had been used to
ensure its limitations do not limit the new form. For example, there are
a number of drill-and-practice programs which could be used in a more
communicative fashion, if, for example, students were assigned to work in
pairs or small groups and then compare and discuss their answers (or, as
in Higgins, 1988, students can even discuss what inadequacies they found
in the computer program) (Warschauer 1996: 4). Furthermore, he argues
strongly that the nature ofthe pedagogy and methodology adopted in the
classroom is equally if not more significant than the actual software pack-
age used in determining which type of CALL approach holds sway: the
dividing line between behaviouristic and communicative CALL involves
not only which software is used, but also how the software is put to use by
the teacher and students (p. 4). And in his concluding remarks, he reiter-
ates that the effectiveness of CALL cannot reside in the medium itself
but only in how it is put to use (p. 9). Clearly our model for evaluation
will need to include judgments as to both the role played by the different
players in the CALL setting: computer(s) (networked or stand-alone),
software, platform, teacher/tutor, learner, environment, and the relation-
ship between them all.
Evolutionary progress in CALL approaches involves also, according
to Warschauer, a progression in understanding of the role the computer
itself plays in the classroom (cf. also Levy 1997, on software as tutor or
tool). For example, within behaviouristic CALL the computer tends to
function in the role ofknower-of-the-right-answer or computer as tutor
(Taylor and Perez 1989, cited in Warschauer 1996). Communicative and
integrative forms ofCALL may well also use the computer in this way, but
with increased amounts of student choice, control and interaction. Thus
the computer might be used more as a stimulus, fostering a more heuristic
approach to obtaining the right answer, stimulating students discussion,
writing or critical thinking, or as a tool/workhorse, where the computer, by
means of word processors, spelling and grammar checkers, desk-top pub-
lishing programs and electronic dictionaries and concordances, empowers
the learner to use or understand language, rather than necessarily provid-
ing language material (Warschauer 1996: 4).
40 Chapter 2
Writing in 1996, a few years after the arrival of the CD-ROM, mul-
timedia and the Internet, Warschauer was addressing the new impetus to
CALL brought about by hypermedia and CMC (Computer-Mediated-
Communication). From an evolutionary point of view Warschauer was well
aware ofthe seemingly limitless possibilities that these new developments
would bring: global, real time and asynchronous audio and visual com-
munication, access in the classroom to vast amounts of different authentic
material, the easier integration of a variety of language skills in a single
activity, the liberty now to focus primarily on content without sacrificing
focus on language form or learning strategies, and finally greater control
for students of their own learning. The potential for quantum leaps in
optimized learning packages, environments, and experiences was there,
and with it the potential for improved student motivation and learning
gains. Even with further advances since then his central argument is just as
relevant today as it was then; citing Garrett (1991: 75), he concludes: the
use ofthe computer does not constitute a method. Rather, it is a medium
in which a variety of methods, approaches, and pedagogical philosophies
may be implemented (Warschauer 1996: 6).
A linear evolutionary analysis is not the only way CALL analysts have
viewed CALL history. Bax (2003), for example, distinguished his own
appellation Integrated CALL from Warschauers Integrative CALL mainly
by stating that Integrated CALL had not happened yet, and was still an
ideal to be aimed for, whereas the reality behind Warschauers term either
was already happening in the communicative era (e.g. task-based, project-
based and content-based approaches where language use aimed to be in
authentic social contexts and to combine the various language skills) or else
was not happening at all. Whether integrated or integrative both authors
appeared to be calling for a move away from occasional, un-integrated use
of CALL towards a more imaginative and holistic approach. Warschauer
and Healey (1998, cited in Bax 2003) had pointed out that In integrative
approaches, students learn to use a variety oftechnological tools as an ongo-
ing process oflanguage learning and use, rather than visiting the computer
lab on a once-a-week basis for isolated exercises (whether the exercises be
behaviouristic or communicative) (Bax 2003: 5771).
Bax argued that in most teachers experience once-weekly bolt-on visits

to the lab are what still prevail throughout the world and that the ideal,
where CALL was an unremarkable (i.e. normalized) process of frequent,
ongoing use of a variety of tools in an integrated way, was still a long way
off. An evaluative model for effectiveness research will, therefore, need
to be able to gauge the contribution within a given CALL environment
made by the differing elements that make up CALL, be it an integrated
version or a more purist, single-theory approach. For the jury is still out as
to whether a particular approach is more effective because it better mixes,
or blends resources, media and content to suit the situation. While intui-
tion suggests this may be the case, evidence is still lacking as to whether
the normalisation of the computer as a tool for learning integrated into
the syllabus, adapted to learners needs (p. 21) leads to improved learning
outcomes when compared against a less normalized approach.
The history ofCALL has, according to Decoo, followed a pendulum
swing motion (p. 13) of repeated reaction against previous learning fads due
to the perception of a serious weakness or omission, though this history has
taken more easily to some approaches than others. Theory has not always
driven the approach. The reasons for this may be due in no small way to
the qualities of the medium itself. As Levy states (1997: 2829):
When empiricist theory (the dominant educational theory ofthe 1950s and 1960s)
predominated there appeared to be a perfect match between the qualities of the
computer and the requirements oflanguage teaching and learning. With the advent
ofthe communicative approach to language teaching, some writers began to say that
CALL methodology was out of step with current ideas (Stevens et al. 1986: (xi)), that
the ideas conflicted (Smith 1988: 5), and that CALL was not adaptable to modern
methodologies. (Last 1989: 39)
One cannot deny that the unique capabilities ofthe computer to support
drill-and-practice (i.e. behaviourist, habit-formation) methodology explains
in large measure the continued popularity of behaviourist didactics, and
the reintegration of much drill-based software such as the enduring Fun
With Texts, or the more recent HotPotatoes into the language learning
curricula of the current eclectic post-communicative era. Such a reha-
bilitation is occurring, ironically, at a time when multimedia technology
42 Chapter 2
is more capable than ever of delivering effective communicative CALL.

Whether programs and platforms reflect pedagogical trends or drive them,
what are increasingly needful are empirical evidence and evaluative tools
to enable purchasing managers, curriculum designers and teachers to make
intelligent choices as to how best to select from, implement and integrate
the large number of CALL resources and methods.
The software package TellMeMore that features in the Case Study of
Chapter 5 illustrates well the coexistence of multiple teaching and learning
theories in the one package. Technological advances in speech recognition
technology have no doubt caused it to play a central role in the program,
and arguably have driven the design of some ofthe activities. Such technol-
ogy was a major marketing tool in the 1990s for Auralogs products even
before the technology had been developed to a point where its inclusion
in a program actually was justified by an acceptable level of sensitivity and
accuracy. While the technology has clearly improved since then, the theo-
retical underpinning ofthe software remains unclear. In fact the role ofthe
speech recognition software hardly ventures beyond context-free, behav-
iourist, drill-and-practice, phonetic exercises. As Lafford (2004) states,
the program does not allow the learner to view the target language in situ
as a form of social practice (Kramsch 1993, cited by Lafford, p. 30), and
the dialogues do not always follow a cogent script (Schank and Abelson,
1977, cited by Lafford 2004: 30). The ability to create software that can
generate realistic, spontaneous and open-ended dialogue has yet to catch
up with ASR (Advanced Speech Recognition). Pedagogically, the design-
ers ofTellMeMore appear to have tried to avoid imposing a given learning
theory on the teacher by building in a vast amount of customisability to
provide the teacher with a myriad of choices as to settings and content mix.
For example, there are context-independent behaviourist drill-and-practice
type activities (such as the phonetic exercises) and more context-dependent
cognitive type activities (such as simulated role-plays). There is a choice
between the Free-to-roam mode and the Guided mode, the latter making
use of preset or teacher-set learning paths that the teacher or learner can
incorporate into the lesson. These learning paths, in turn, can be custom-
ized to reflect a particular behaviourist, constructivist or other approach.
There is even an authoring tool to allow the incorporation by the teacher
of authentic and up-to-date texts together with simple multiple-choice
exercises. No single, coherent theory can be said to be driving its use in

the software, but rather a combination of methodologies. This approach
reflects the current uncertainty as to which language learning method is
predominant, and has opted for an eclecticism where the language learn-
ing theory ball is, to a large extent, left in the teachers court. Maybe, then,
Decoo should have spoken of reincarnation-ality of language learning
methods, rather than their mortality given that the upward spiral ofCALL
and language teaching methods draws on the pasts apparent rejects. And
maybe the way forward is an organized eclecticism that draws on the best
in the search for an optimized, integrationist pedagogy.
Towards pedagogy for CALL for the twenty-first century

Blended CALL?
A more recent (i.e. since the arrival ofthe new millennium) development,
and one which reflects the above uncertainty and eclecticism has been
blended learning (BL). The term in its worst guise is used as a catch-all
for a thoroughly unreasoned pragmatism, but at its best appears to be a
synonym for a multi-modal approach that seeks to bring together best
practice from a range of pedagogies, methodologies and media in an opti-
mized package tailored to given situations. Its pragmatic nature has been
tagged as what works by one exponent in the business world whence the
term probably derived (Bersin and Associates 2003). One CALL author
has defined it as the optimal basis for language learning and teaching given
the particular conditions at hand (Neumeier 2005). Oliver and Trigwell
have linked BL to variation theory (2005: 1726) and see it as enhancing
learning through the controlled blending of media, modes of experience,
and patterns of variation. For the BLINGUA (blended learning) project
at the University of Ulster (see Chapter 6) we defined blended learning
in CALL as the adaptation in a local context of previous CALL and non-
CALL pedagogies into an integrated programme oflanguage teaching and
learning, drawing on different mixes of media and delivery to produce an
optimum mix that addresses the unique needs and demands of that con-
text. (Leakey and Ranchoux 2005: 358).
44 Chapter 2
Further support for adopting such a blending of pedagogy for CALL

(or at least CAL) has come from an analysis of effective e-learning pedago-
gies, from a range of academic disciplines not just languages, carried out
by Mehanna (2004). In an article entitled: The pedagogies of e-learning,
she presents a powerful case for adoption, in computer-assisted learning
(CAL), of a composite system that blends pedagogies and theories that
have been shown to work, rather than any single approach. Mehannas con-
clusions claim her composite system provides empirical reinforcement for
the idea that a blending ofdifferent learning theories and pedagogies was
not only possible but beneficial (p. 290), that it satisfied Olivers (2004)
call to justify the use of blended pedagogies and affirms other authors
propositions that e-learning in higher education should adopt pedagogical
models that are not fundamentalist in nature but allow for a complemen-
tarity between behaviourism and constructivism (Minocha and Sharp
2004, cited in Mehanna 2004) and cognitivism (Driscoll 2002). While
Mehannas study focused on the tertiary sector there is every reason to
believe that most, if not all, of the pedagogies found to be effective there
would be similarly effective in other educational sectors.
Her meta-analytical study uses a mixed methods multi-case study
research (p. 290) approach and looks at four postgraduate programmes in
four different UK universities (200 students and 14 tutors) and provides
convincing empirical evidence correlating a cluster of nine specific e-learn-
ing pedagogies that yielded improved student learning and outcomes in a
postgraduate e-learning course. Mehannas report sheds useful light on how
a holistic evaluative model for the measuring of a multi-modal approach
to CALL might operate, and threw up an interesting mix, or clusters, of
effective e-learning pedagogies, many of which might form the basis of a
blended pedagogy for CALL.
Nine e-learning pedagogies, drawn from techniques already proven
in face-to-face learning outside of computer-based environments, yielded
particularly significant learning gains (given as effect sizes ES). These nine
were, in rank order of effectiveness: identifying similarities and differences
between items, summarizing and note-taking (which include also as sub-
categories: filling missing parts and translation of information into a syn-
thesized form), recognizing student effort leading to improved engagement
in cognitive processes, homework and practice, non-linguistic representa-

tions (graphs, charts, maps, mind maps), cooperative learning (comprising:
positive interdependence, face-to-face promotive interaction, individual
and group accountability, interpersonal and small group skills, and group
processing), generating and testing hypotheses involving the application
ofknowledge, setting objectives and providing feedback (meta-cognitive
thinking), and activating prior knowledge by use of cues, questions, brain-
storming, etc. (Mehanna 2004: 281282). Most educational practitioners
would acknowledge at face value the worth of most ofthese methods and
will have probably used them themselves, whatever the subject discipline.
Mehannas findings were based on meta-analyses in a CAL, not CALL,
context. There is, nevertheless, good reason to conjecture that a similar
composite system for CALL will provide useful insights for an evaluative
framework for CALL. This enquiry represents a search for a valid blended
framework both for CALL pedagogy and CALL effectiveness research.
Any theory of blended CALL, however, would be seriously remiss if it
did not give significant attention to the central role played by construc-
tivist learning theory in the last three decades of language teaching, and
in particular that played by the theory of Second Language Acquisition.
It is time to consider what these are and what role they should play in a
developing a quality control model for CALL.
The theory of SLA Definition and scope
Coleman and Klapper (2005) state that for many years: there has been a
serious discrepancy between second language acquisition research findings
on the way foreign languages (FL) are learned and the way many univer-
sities have continued to teach them to students (p. 31). As a corollary of
this one might reasonably expect the discrepancy to apply to the way that
CALL itself is taught. A number ofCALL pedagogues have attempted to
apply a single theory oflanguage learning to CALL, but Chapelles (2001)
treatise on Computer-Assisted Second Language Acquisition (CASLA)
is one of the few also to link theory to CALL evaluation.
46 Chapter 2
SLAs links to constructivist views of learning ultimately go back as

far as Kant in the eighteenth century, who thought that experience leads
to the formation of general conceptions or constructs that are models of
reality (Fry et al. 2003:10). From Piaget onwards, the idea that individuals
grow intellectually by actively constructing their knowledge (be it cogni-
tive, affective, interpersonal or psychomotor) has taken hold. Piaget looked
back two centuries to Rousseau for evidence that constructivist approaches
to learning had been articulated a long time before, albeit in rudimentary
terms. Piaget states: It is true that he [Rousseau] provided eloquent proof
that it is impossible to learn anything other than by actively acquiring
mastery of it, and that the pupil must reinvent science instead of merely
repeating its verbalized formulas (1967: 139).
Since Piagets emphasis on the importance ofthe active acquisition of
knowledge by the pupil, in place of passive receptivity before a dominant
magister, cognitive constructivism has taken root and spawned numerous
studies and sub-theories, and altered significantly the way educators view
instruction and the learning process.
One could reference, for example, Bruners spiral curriculum the
idea that we revisit knowledge at ever higher levels of understanding (Fry
et al. 2003: 11), which in turn has given birth to approaches such as experi-
ential learning and reflective study. Such an approach echoes the views of
social constructivists, many of whom espouse the ideas ofthe psychologist
Vygotsky, a contemporary of Piaget, but who went beyond Piagets con-
cept of cognitive development requiring cooperation and mutual respect
in social interaction (1932: 79) to elaborate a more complex view of the
interrelationship between learning, development and the learners peers
and teachers. Vygotskys concept ofzones of proximal development were
built on the premise that what a child can do with the assistance of others
might be in some sense even more indicative oftheir mental development
than what they can do alone (Vygotsky 1978: 85). The zone of proximal
development he defines as
the distance between the actual developmental level [of a child] as determined by
independent problem solving and the level of potential development as developed
through problem solving under adult guidance or in collaboration with more capa-
ble peersthe distance between the assisted and the non-assisted performance of
an individual. (p. 86)
In other words, children, adolescents and adults can, and do, create their
own tested truth through interaction with other more advanced learners.
The concept ofscaffolded learning that ensures consistent and structured
support and guidance for the learner has been elaborated from this theory.
Such an approach echoes Higgins (1983) call for the more facilitative role
ofthe classical pedagogue to be adopted in a constructivist, heuristic use
ofthe computer in education, balancing the more prescriptive, knowledge-
dispensing magister of conventional drill-and-practice education. Through
its ever-expanding array of online and offline support mechanisms, tuto-
rials, helps, reference materials, plug-ins, tracking and feedback systems,
CALL clearly has the potential to provide specialized and differentiated
scaffolding for all types of learners.
Elaborations of Vygotskys ideas have in recent years led to increased
emphasis on amongst other things, collaborative learning, paired/group
activities and projects, peer assessment and feedback, and these are a key
feature of language learning as taught in many, if not most, schools and
universities. CALL pedagogues and indeed designers of courseware, ana-
logue and digital language labs have also sought to take social construc-
tivist approaches on board to nurture pair/group learning whether by
random pairing in a lab or group writing projects via web-based chat or
conferencing.
In its striving to describe and explain language acquisition, SLA
research has, in the last decade, increasingly focused on the point where
learning and the learner meet: the task. Chapelle (2001) charts six principles
or criteria for CALL task appropriateness: language learning potential,
learner fit, meaning focus, authenticity, positive impact and practicality.
CALL effectiveness researchers now regularly refer to this agenda to evalu-
ate CALL software and pedagogies; it will in turn inform the evolution of
the MFE1 and MFE2. Table 2.1 provides a basic definition of each of the
principles. A brief explanation of these principles then follows.
48 Chapter 2
Criteria for CALL task

Definitions
appropriateness
The degree of opportunity present for beneficial
Language learning potential
focus on form
The amount of opportunity for engagement with
Learner fit language under appropriate conditions given learner
characteristics
The extent to which learners attention is directed
Meaning focus
toward the meaning of the language
The degree of correspondence between the learning
Authenticity activity and target language activities of interest to
learners out of the classroom
The positive effects of the CALL activity on those
Positive impact
who participate in it
The adequacy of resources to support the use of the
Practicality
CALL activity
Table 2.1 Chapelles six criteria for CALL task appropriateness.

Source: Chapelle, C. (2001: 55). Computer applications in Second Language Acquisition.
By language learning potential, Chapelle means the degree of opportu-

nity present for beneficial focus on form (p. 55); by this she distinguishes
language learning activities from opportunities for language learning use
where there is no focus on form and students come away with no enhanced
understanding of structure. Important in Chapelles concept oflearner fit
is that activities be designed with an understanding of individual differences
and be at a level that is challenging while not dispiriting. A paraphrase for
learner fit could be appropriateness oflanguage presented, that is a mixture
of already known material and new material that is not beyond the learners
grasp (p. 56). This echoes closely Vygotskys zone of proximal development
and the importance of scaffolded input. Meaning focus is defined as the
extent to which learners attention is directed toward the meaning ofthe
language (p. 55). The key point here is that the task or activity requires the
learner to carry out a real or meaningful task or exchange of information, or
make a decision to achieve a goal, and thereby go beyond form-based tasks
such as grammatical gap-fill or substitution activities (p. 56). Such activities
may involve any or all ofthe four main language skills, where communica-
tion is the purpose. Authenticity is seen as the degree of correspondence
between the learning activity and target language activities of interest to

learners out of the classroom (p. 55). Positive impact denotes the posi-
tive effects of the CALL activity on those who participate in it (p. 55).
In other words, the language learning experience should help learners
develop their meta-cognitive strategies (Oxford 1990, cited in Chapelle
2001: 57), increase their desire and ability to transfer their learning outside
the classroom and their interest in the target culture. And finally there is
practicality, that is, the adequacy ofthe resources to support the use ofthe
CALL activity (p. 55). This criterion refers to the whole back-up system
within an institution to ensure that the CALL resources are in working
order and that the activities devised can be delivered successfully within
the given constraints. In a CALL environment the most crucial elements
of practicality have to do with the reliability and effectiveness of the
platform (VLE or multimedia lab), the technical support staff, and staff
development. Any evaluative model for CALL, and indeed for language
teaching, would be foolish to ignore these criteria.
Learner fit, learner differences and learner strategies
Each ofthese six criteria has a wealth of related meaning, and has drawn on
and also inspired other CALL or SLA researchers. While there is no point
in restating all of the background behind all of Chapelles criteria, a brief
foray into one ofthe six will demonstrate the range of associations behind
the term and their usefulness to CALL evaluation. To take Chapelles
second criterion, learner fit, it is clear that already a vast amount of work
has been done on incorporating the reality of learner experiences, differ-
ences and strategies into both language teaching and CALL.
SLA and constructivist theory is as much about understanding the
dynamics at work within and between individual learners as it is about
describing the universal characteristics of language acquisition. Piaget
himself looked back to a much earlier era, once more citing Rousseau
whose advice to teachers, in the preface to Emile, or On Education (1762)
was: begin by studying your pupils, for assuredly you do not know them at
all (Piaget 1967: 139142). The prior learning, experiences and individual
differences that our students bring to CALL must surely also be factored
into the impact equation.
50 Chapter 2
In 1983 Krashen spoke ofaffective variables and affective filters (such

as motivation, self-image and anxiety levels) which influence a learner
positively or negatively. In CAL/CALL new anxieties and resistances
regarding technology, or technophobia, have emerged, which can also be
seen as affective filters. Teaching, therefore, should aim to supply good
comprehensible input and lower the affective filter, (Krashen 1983: 38).
The field ofSLA has come some way since Krashen and a number of revi-
sions in the light of advances in several ofthe disciplines listed above have
taken place, most notably in the field of psycholinguistics. It is in large
part thanks to the insights of SLA that we are now more aware that the
CALL classroom introduces new affective variables and filters, motivational
issues and self-image issues for a small minority of learners, who may not
have come up against these hurdles in the ordinary classroom. By the same
token, the CALL classroom can liberate some learners who struggle in the
conventional classroom context (i.e. rows of desks + black/white-board
+ teacher at front).
In the 1990s CALL researchers, drawing on the insights of Human-
Computer Interaction (HCI), began to address issues of ergonomics and
inter-subjectivity in the CALL classroom. For example, the entire CALICO
(The Computer-Assisted Language Instruction Consortium) conference
of 1994 entitled Human Factors was given over to looking at issues of user-
friendliness, interface, ease-of-use and accessibility of content. Chapelle
argued there that HCI can answer pedagogical and psycholinguistic ques-
tions about SLA as it relates to the CALL context. In the light of these
insights her evaluative approach took an interactionalist perspective, which
posits that key learner factors and contextual factors enable SLA researchers
to construct hypotheses and focused on three main aspects of interaction:
the nature ofthe input into the learner, learner interaction with the input,
and the output the learner produces (Chapelle 1994, cited in Levy 1997).
Articulated in the same year as the 1994 CALICO conference was
Elliss (1994) framework for understanding the relationship between indi-
vidual learner differences, strategies and language learning outcomes. These,
by extension, have a bearing on learning that derives from CALL. Learner
difference he defines as beliefs (i.e. of the individual about the language
learning process), affective states (such as anxiety), general factors (such
as age), and previous learning experiences. SLA research has identified a
vast array of learner differences, each of which is influenced in turn by
other variables such as age, gender, and prior learning experience. Some
of these, such as gender, age, and aptitude are fixed, or at least cannot be
altered by the individual, others, such as motivation, attitude and learning
strategies, are potentially alterable through individual decision, experience
and/or learning. The research methodology for most of the Case Studies
will include a qualitative survey of learner difference that aims to qualify
and quantify student learning style, prior language learning experience and
beliefs and subjective judgments regarding their language learning.
To this end just one particular list oflearning styles has been adopted:
Reids VARK model (1987). This hypothesizes four key learning style vari-
ables: visual, auditory/aural, kinaesthetic, and read/write, although origi-
nally the R was a T (for tactile learning). This was selected for the following
reasons. First, it was originally articulated as a result of analysing the behav-
iours oflanguage students as opposed to other kinds oflearners. Secondly,
it is the list used by the University of Ulster (and many other universities
and schools) as the basis for its Personal Development Planning online
self-diagnosis questionnaire to help students understand how they learn.
Finally, the list of four variables is conveniently brief by comparison with
other learner style/personality type lists, such as the four pairings of the
Myers-Briggs Type Indicator (MBTI): Extroversion-Introversion, Intuitive-
Sensing, Thinking-Feeling, Perceptive-Judging (see Myers, McCaulley,
Quenk, and Hammer 1998, cited in Hu 2006: 47), or the four pairs oflearn-
ing styles in the Felder-Silverman list: Active-Reflective, Sensing-Intuitive,
Visual-Verbal, Sequential-Global. Brevity was considered important for the
purpose of making it easier to obtain sample sizes large enough for analysis
of covariance. The more variables one has the fewer the number of indi-
viduals who fall into any one category and the less generalizable ones data
become. The danger with this approach is that what one gains in sample
sizes one may lose in the precision of ones learning style descriptors. The
VARK list is neat and popular but, of course, leaves plenty of gaps in its
description of the way individuals learn.
A model of evaluation that seeks to measure effectiveness of a peda-
gogy or CALL object must acknowledge that a multitude of dynamics are
at play in the real-life CALL classroom and in the individual learner, and
that it is virtually impossible to control for all these potentially confound-
ing variables. Every researcher must, therefore, couch his/her inferences in
cautious terms and with reference to the caveats that exist.
52 Chapter 2
Having looked at the mix oflanguage related pedagogy as it pertains

to CALL, it is time to clarify the nature of, and role played by, digital plat-
forms in the CALL environment, and to justify the need to include them
in our evaluation of CALL.
Platforms
It is hard to settle upon a generic term to describe the underlying hardware-

based or virtual foundation on which learning resources are installed. This
is perhaps not surprising given the huge conceptual stretch between the
following foundational units that might qualify as platforms: Operating
System, Language Laboratory, Managed Learning Environment (MLE),
WorldWideWeb (WWW), Interactive Whiteboard (IWB), VLE, and
Digital Laboratory. Given the vast scope represented by the above for the
purposes of this study, VLEs and the digital lab will be the main points
offocus as these are the most recently-evolved kind of platform to appear
on the scene, hence less work has been done on them and, furthermore,
their inner workings are almost exclusively designed for an educational
context. Inasmuch as the WorldWideWeb is accessible from most digital
labs and VLEs, its use too will need to form part of an integral evaluation
agenda. The days ofthe analogue language laboratory, on the other hand,
are over. The Case Studies will include reference to a number of platforms
currently being used in higher education language learning environments:
Robotel, Melissi, WebCT, and Blackboard (the latter merged with WebCT
in 2005).
The first two ofthese come under the label Digital Laboratory, the last
two are VLEs. Robotel, ofCanadian provenance, produces high perform-
ance commercially marketed platforms, capable of serving both intranet
and distance learning packages. Robotel describe them as computer col-
laboration platforms that use highly-reliable hardware-based distributed
switching technology to support real-time sharing of information includ-
ing multimedia information among computer workstations (see: www.
robotel.com). Melissi is a less expensive, yet versatile and user-friendly

system, developed for use in a networked e-lab, by a UK team with a back-
ground in higher education. Davies et al. (2005) list all three of these in
their more comprehensive list of platforms:
Activa Solutions (formerly known as ASC Telecom): SmartClass

(Robotel), Esprit, Esprit LE and D.A.V.I.D.
Artec Electronics: VACS-22 Virtual Recorder
CAN-8 VirtuaLab
Keylink Computers: Virtual Language Lab (VLL)
Melissi Multimedia: Digital Classroom, Digital Lab, BlackBox
Sanako (formerly known as Tandberg, Teleste and Divace Learning
Systems) Lab100 (Elice), Lab 250, Lab 350 and Lounge100 Virtual
Classroom
Sonys Virtuoso/Soloist labs, marketed by SANS (Software And Net-
work Solutions), and ConnectEd (UK).
Sun-Tech: Digital Language Laboratory (DLL).
They further state that:
Specifications vary enormously. For example, Artec, CAN-8, Keylink, Melissi and
Sun-Tech labs are purely digital and only need normal network cables, whereas Sanako
(Divace) also requires the room to have special analogue cabling. Activa Solutions
Esprit uses hardware interface boxes for remote control and monitoring. CAN-8
requires that lessons are pre-authored with supplied tools, whilst the others allow
the teacher more flexibility and spontaneity. Prices also vary considerably. (p. 6)
As for the principal VLE that this study will focus on, WebCT, which
describes itself as the worlds leading provider of e-learning systems for
educational institutions (<http://www.webct.com/>), was founded in
1995 by Murray Goldberg, ofthe University ofBritish Columbia. In Feb-
ruary 2005 the two companies Blackboard and WebCT merged; the VLE
WebCT Vista is now one of its frontline products.
When assessing the role played by such platforms in the learning pro-
cess our quantitative and qualitative evaluation will need to assess such issues
as best fit within a diversity of pedagogies and didactic approaches, and
54 Chapter 2
make holistic judgments as to cost-effectiveness. Recently a major survey

of digital platform use in Higher Education was conducted by Toner et
al. (2007); this aimed to obtain a general idea as to the provision, use and
application of technologies in language teaching, as well as the impact
of digital laboratories on language teaching. This collaborative survey
conducted by the UK Subject Centre for Languages Linguistics and Area
Studies, on behalf ofthe University ofUlsters Centre ofExcellence in Mul-
timedia Language learning, received 147 returns (83 from UK universities
representing 56 institutions, 43 from other educational sectors in the UK
and 21 from educational institutions outside ofthe UK). Key findings were
that there was a fair degree of consensus as to their benefits, in particular
in their ability to encourage autonomous learning amongst students (4.0),
provide/ storage of audio/ video and other media files (4.0), introduce
audio, video and other media to classes (3.9) and encourage student engage-
ment (3.8) (Toner et al. 2007), where the figures in brackets represent the
mean rating on the surveys Likert scale (from 1 to 5, where 1 = Strongly
disagree, and 5 = Strongly Agree). Lesser noted benefits included effec-
tive monitoring and supervision of learners (3.3), an increase in teacher
interaction with students (3.2) and interaction among students (3.2) the
flexibility of being able to make timely interventions and give assistance
(3.2). On a more negative side there was a general perception however that
new technologies increased workloads for staff (3.5), involved technical
problems which impacted on the effectiveness of teaching, and reduced
tutor contact hours somewhat (2.5) (Toner et al. 2007). A remarkable aspect
of the study was its attempt, albeit superficially, to evaluate the role of all
three Ps: platforms, programs and pedagogy. A major limitation of the
study from the perspective of this enquiry was that it was only addressed
at teachers and did not garner student reaction. Furthermore, it relied
entirely on qualitative and subjective reaction, and involved no attempt
at an empirical evaluation of learning gains.
From the positive consensus in some of the responses in this survey
it is clear that there is at least a perceived importance of the role that the
digital platform plays in the language learning process; from the fact that
such a survey is almost unique amongst the plethora of qualitative CALL
surveys one is probably safe in saying that further research of this kind
is much needed and a proper evaluative framework required to help us

understand the significance of the role it is playing in language learning.
What, then, of the importance of CALL programs?
Programs
Three virtually interchangeable terms are used in relation to CALL pro-

grams: software, applications and courseware. The latter term will be used
most frequently in this study as it covers both the idea of software and the
concept of applications designed for use in an educational context.
CALL courseware design has been informed by almost as many
assumptions as there are courseware designers. Levys CALL Survey
showed that the design of software has tended to be more practitioner-
led than theory-led (1997). To that one could add also two other drivers
of courseware design: technological advances and the growing market in
commercially-produced CALL software. As early as 1988 Pederson said:
The point, however obvious, needs to be restated: CALL, in and of itself,
does not result in more and better learning, it is the specific way instruc-
tion is coded in CALL software that has the potential of affecting learning
positively, for specific learners in specific contexts (p. 107).
Software is not dismissed in the CALL impact equation; it is merely
that one must be careful when ascribing causality, and focus on its effects,
and effectiveness, in situ. Pederson goes on to say that one obvious problem
in CALL is to provide evidence that a given software package is designed
and programmed effectively (p. 108). She adds that the wise language
teacher should examine evaluative research reports carefully for clear edu-
cational objectives, a specific target audience, and an adequate evaluative
consensus from classroom teachers, students, and CALL experts (p. 109).
In other words, the evaluation of CALL programs should be intercon-
nected with CALL pedagogy and the two should not be mutually exclu-
sive activities.
56 Chapter 2
In terms of software, this study will focus, in particular, on a commer-

cially-produced package called TellMeMore, which is a successful commer-
cial product that one could argue has been both market and technology
driven. No doubt its creators will argue that language learning theory has
informed its development and that practitioners have been part of the
development team, and, as has been suggested, there is evidence in its
design of an awareness of constructivist and behaviourist approaches to
language learning. It will, however, be shown that the consistent applica-
tion oflanguage learning theory and a clear pedagogical approach appear
to some extent to have been surrendered to exercises determined by the
capacities of the available technology.
One question that needs to be answered is whether software that incor-
porates a blend oftechnologies and is driven by a hotchpotch oflanguage
learning theories or none can be equally as effective as, or even more effec-
tive than, software that is rigidly subservient to the dictates of one particular
theory. It is ultimately up to the teacher or pedagogue to impose his or her
teaching structure and learning theory on an item of software whatever its
informing pedagogy; however, not all languages software grants the teacher
the freedom to be flexible or eclectic. Some software works from a closed
database of language and content that cannot be added to or updated,
while others are more open to being updated and authored; some software
integrates well with the internet while other software does not need to,
cannot or has not been programmed to even though this might benefit it.
An evaluative framework for CALL software needs to allow for this vari-
ability while giving the evaluator the conceptual tools to be able to judge
each program against an objective standard.
Towards a method for evaluating CALL
If the current state of CALL represents a mix of approaches, despite the

preponderance of constructivist (SLA) ideology, and if its history of pen-
dulum swings and spirals of eclecticism and re-incarnation suggests a similar
future, then it is clear that a common agenda for evaluating CALL, both
now and in the future, is essential so that clear benchmarks are established.
On the other hand, a one size fits all approach to such evaluation will need
to be both flexible enough to handle the hybrid nature of much CALL
activity and the scope of the Three Ps, and rigid enough to ensure repeat-
ability across a range of educational sectors and geographical contexts.
The next chapter looks at what evidence there already is for CALLs
effectiveness, and how this evidence was obtained. Both the existing evi-
dence and the methodologies will be discussed in the context of four key
debates in the field of CALL effectiveness research: the improvement
debate, the comparison debate, the configuration debate, and the outcome
versus processes debate. This will prepare the ground for the subsequent
chapter which assembles a new framework for evaluation drawn from the
lessons of the past.
Chapter 3
Has CALL made a difference: And how can we tell?
Introduction
The shifting parameters of CALL effectiveness research
To judge by the conclusions of several CALL researchers there is much to

learn from the mistakes and successes ofthe past. Levy, for example, talks
ofthe contemporary relevance of old projects and past experience (Levy
1997). Even the oft-maligned empiricist approach of 1950s and 1960s lan-
guage teaching dubbed by Stern (1983: 169) as pedagogically audiolingual-
ism, psychologically behaviourism, linguistically structuralism can inform
our post-communicative era (Levy 1997: 14), or contribute to the best
practice of future eras. One can summarize the preoccupations of CALL
effectiveness research in its relatively short history in the following four
principal debates:
1. The improvement debate: Does CALL improve language learning?

2. The comparison debate: Can comparative evaluations be of any value
in demonstrating learning gains?
3. The configuration debate: What combination of methods is best for
measuring progress in CALL?
4. The outcome vs. processes debate: Do we only focus on measuring
learning outcomes? What about learning processes? Can we measure
the latter? If so, how?
The history of effectiveness research in CALL shows a move away from a

preoccupation with proving that CALL, of itself, improves students second
language competence towards an interest in the variables involved in the
60 Chapter 3
language teaching and learning process within computer-based environ-

ments. Nevertheless, we can glean useful lessons and methodology from
each ofthe different debates, even the less fashionable ones, that will inform
a new evaluative methodology.
Debate 1 the improvement debate:

Does CALL improve learning?
As will be demonstrated below there is a certain amount of research pointing

to the possible benefits of using computers in language teaching, however,
according to a report by the British Educational Communications and
Technology Agency (2004), the body of empirical evidence, compared
to other subject areas, is small. BECTA reports that further longterm
academic studies are needed in order to obtain a clearer picture of the
uses and benefits ofICT in the teaching and learning ofMFL, states that
twenty percent ofMFL teachers still make little or no use ofICT to sup-
port their teaching but that this figure is declining (citing DfES, 2003).
While most of the BECTA studies appear to be linked to the secondary
sector, the conclusions of their report, entitled: What the research says
about using ICT in modern foreign languages, may be broadly applicable
to other educational sectors as well. Below is a summary ofthe main points
made as they relate to the improvement debate.
First, TOP (Teachers Online Project 2001) found that ICT can
benefit the four language skills oflistening, speaking, reading and writing,
often with overlapping benefits. Tschirner (2001) found that digital video
was particularly effective in supporting language acquisition in the class-
room. Harris and Kington (2002) show that video-conferencing benefited
listening and speaking skills. In terms of its impact on attainment, BECTA
cites the ImpaCT2 project (2002) which found that when compared to
other subjects, the degree of use of ICT showed the greatest mean differ-
ence in relative gains. Even though the overall use of ICT in MFL was
quite low, when it was used, the extent of use made the greatest difference.
(Harrison et al, cited in BECTA, p. 3).
Other Case Studies showed links between ICT in MFL and improved
motivation and attainment (Blow 2001; TOP, 2001), understanding of
language, confidence, and examination performance (Superhighways Ini-
tiative, 1997).
In its introduction to its summary of research findings BECTA is
careful, however, to point out that the technology itself, in isolation from
effective pedagogy, must not be seen as the prime agent of gain, any posi-
tive impacts depend on the ways in which ICT is used. Improvements in
attainment and motivation will inevitably be reliant on the capacity of
teachers and students to use ICT as an effective pedagogical tool in the
pursuit of particular learning objectives (BECTA, p. 3).
Overall figures for HE are harder to come by, and one is dependent
on occasional meta-analyses such as the Felix one (2005c) which found
that very few studies aimed to obtain empirical quantitative evidence of
learning gains, and occasional country-wide surveys such as the Multimedia
Language Learning in UK Universities survey (Toner et al. 2007) which
received responses from 56 UK HE establishments. This study, however,
focused on the use of multimedia hardware in HE language teaching and
provides no qualitative or quantitative data regarding the impact ofCALL
on language learning outcomes. For evidence of the impact of CALL on
learning outcomes at HE level one is largely dependent on small-scale
studies published in the literature, the majority of which point to quali-
tative benefits, but lack empirical evidence of learning gains, and whose
replicability is often questionable.
When discussing the effectiveness or impact oftechnology, be it plat-
forms, programs or pedagogy, it is vital that the terms and parameters of
reference be clearly established so that we are clear as to the full range
of forces and variables at play. The community of CALL effectiveness
researchers is slowly extricating itself from the blind alley it had moved
down in the 1970s and 1980s, namely that of seeing CALL as treatment
applied to the learner, and then attempting to measure the effect of that
treatment on learning, without factoring in the role played by a host of
other factors such as the tutor, the teaching approach, the environment, and
most importantly the internal dynamics ofthe learner. In large part due to
the influence of a more cognitive approach such as that advocated by the
SLA community, warning sounds began to emerge in the late 1980s from
62 Chapter 3
a number of individuals. There were, it appears, two tracks to this blind

alley: the one, a misappropriation oflearning gains to the technology (i.e.
computer + software) itself; the other an over-simplistic conceptualization
of CALL as the agent of improvement.
Warnings regarding the first error were sounded as early as 1988 Peder-
son who pointed out the trap that some effectiveness research studies were
already falling into, namely that of attempting to attribute learning gains
to the medium itself rather than the way the medium was manipulated to
affect achievement (1988: 104). In 1993 Yildiz and Atkins (p. 134, cited
in Levy 1997: 30) then documented a fundamental shift from the early
quest in evaluation studies for the holy grail of a computer that would take
over the role of the teacher and do it better, to an emphasis upon a more
atomistic study ofthe qualities ofthe new media and their relation to the
learning, the learner and the learning context. They call for studies that
analyse the underlying beliefs regarding how students learn and how these
assumptions or beliefs influence software design.
In the same year as Pedersons warning came Doughtys call to move
away from an attribution of gains to a simplistic notion of CALL (i.e.
technology + treatment) to something at once more holistic and nuanced:
this product-oriented approach to the evaluation of the effectiveness
of CALL has proven unsatisfactory primarily due to inattention to the
central role of the learning process and the corresponding influence of
learner characteristics (1988: 137). Several studies, from CALL and CAL
authors, were showing that what is delivered affects learning far more than
the delivery system and warned of confusing the two (Schramm 1977,
Clark 1983, and Salomon 1979 cited in Pederson 1988: 108 & 111). The
primary focus of 1970s CALL research had been: Is CALL effective in
improving students second language competence? Chapelle (1989, cited
in Levy 1997: 30) gives four key reasons why the assumptions behind this
focus had since been rendered invalid. Firstly, more recent research had
shown that the term CALL encompassed a wider range of activities than
was originally conceived (i.e. CALL was about more than text-manipula-
tion and the behaviouristic emphases of the 1960s and 1970s). Secondly,
second language competence was now seen as a more complex interrela-
tion of skills that as a whole were difficult to measure exactly. Thirdly, it
was coming to be seen that a positivistic focus on measuring learning gains

by merely measuring learning outcomes, was ignoring the significance of
analysing the learning process, and the phenomenology ofthe classroom/
multimedia laboratory environment. Finally, Chapelle argues, individual
learner differences were being shown to affect significantly the process of
second language acquisition (1989: 79).
This focus away from technology as the cause of improved learning
gains and away from a simplistic view of CALL as merely treatment was
accompanied by a growing scepticism regarding the value of comparative
analyses where CALL was compared to non-CALL activities and contexts.
So concerned was Pederson, for example, at the problems linked to com-
parative studies that she argued for them to forever be abandoned (1988:
125). The next section summarizes this scepticism and asks whether there
might be any value in certain types of comparative studies.
Debate 2 the comparison debate
Behind this debate lies the question: can comparative evaluations be of any
value in demonstrating learning gains? Pedersons critiques of compara-
tive studies related to problems of replicability, attribution of causality,
and language learning theory. First, she argued that comparative stud-
ies cannot be easily replicated for the reason that the conditions under
which the study took place are hard, if not impossible, to reproduce. She
asked: if the independent variableis use of the computer versus use of
a traditional method, how can the classroom teacher in another setting
be assured that his or her use of the computer will be identical to that of
the primary study? (p. 106).
She goes on to state that there is no valid way to ascribe with confi-
dence the causes for differences in the dependent variables to the independ-
ent variables. For these reasons, she argues, any results will be difficult, if
not impossible, to generalize (Pederson 1988: 106107).
64 Chapter 3
According to Pederson, comparative studies usually fail to hold

hypotheses based on language learning theory and therefore it is hard to
integrate their results into the growing research base and use the findings
to improve classroom teaching (Pederson 1988: 106107).
Comparative studies suffer from the dilemma that a researcher will
never be able to demonstrate scientifically that one medium has delivered
better results than another using the same content with the same students;
for how can one run a comparative study without wiping the students
memories and rewinding the tape so as to run the same test again but via
a different medium?
Even were one able to wipe the treatment groups memories before
rerunning the test, a whole range of other variables might be shown to be
contributing to the outcomes such as the novelty factor, feedback, lesson
format, learner expectations, display mode and form, response mode, cogni-
tive response, motivation etc (Pederson 1988: 107). Of course, these same
variables and others (such as place and time of test, prior training in, and
exposure to, the medium) might well pertain, and thus skew the data, in
the more likely context of a similar but not identical comparison group.
Is there, then, any value in such comparative studies? From the litera-
ture studied it would seem that as long as the context is clearly defined,
the learning materials similar if not identical across studies, the data col-
lected on repeated occasions and under differing contexts and as many
variables as possible are controlled for, then there may be some value and
generalizability in the findings. This value would increase if the focus of
the study were not so much to show that CALL, in and of itself, produces
improved learning, but rather to gauge the effectiveness of combining the
method of instruction with the softwares potential of affecting learning
positively (Pederson, p. 107).
Ultimately the effectiveness of anything is often best proven by com-
parison. We would be unable to evaluate anything if we did not have a
yardstick, or control, against which to measure our object of focus. The
discipline of statistics has devised research designs that minimize the con-
founding effect of variables and isolating or accounting for possible skew-
ing factors. The rigour or demonstrability of proof, from a positivistic or
empiricist perspective, will hinge on the validity ofthe data that accompany
claims of effectiveness. Validity can be defined as the degree to which
research findings accurately reflect reality (Chapelle & Jamieson 1991:

38). Accuracy requires precision of measurement, and measurement, by
definition, requires numerical data.
Statistical testing of a population sample to demonstrate effectiveness
of any kind of treatment, whether it be that of a medical, dietetic, social
or educational nature, requires a comparison or control or blind group to
provide the plumb-line against which the treatment group is compared, in
order to throw into relief any differences applicable only to the treatment
group. The most common statistical tests to provide evidence of signifi-
cance of difference in the effectiveness of a treatment are between-groups
or within-group comparisons, independent samples, or related samples,
t-tests, various tests of correlation, and effect size calculations. These all
require the comparison of sets of data from two or more different groups
(i.e. in the case ofbetween-group comparisons) or comparative data from
the same group (i.e. in the case of a within-subjects comparison, for exam-
ple, where each subject is their own control (e.g. Al-Seghayer 2001). Meta-
analyses, which synthesize the findings of a large number of related studies,
also depend upon comparative techniques and are used for aggregating
the results of multiple empirical studies to determine the direction and
size of relationships between similar variables across these studies (Felix
2005c: 272).
Such tools of measurement and data analysis can operate in other
humanities-based fields; surely they can operate in a CALL context. The
question is whether they can effectively attribute correlation and causality
in the complex dynamic where platforms, programs and pedagogy overlap
and where mental processes interface with computer processes.
This enquiry works from the basis that comparative techniques are
essential to the obtaining of statistically acceptable, empirical data, and
that, as long as the research design is not attempting to attribute causality
simplistically to the medium, but rather looking at the multiplicity of vari-
ables involved, then the enquiry is a valid one. There are a number of ways in
which Pedersons issues of replicability and attribution can be tackled. First,
one might run the same content past comparable students (same course, and
age, similar IQ and learning experience, at the same time of day and point
in the course, and so on) but using the different medium under scrutiny.
Secondly, one can ensure that the same teacher conducts both studies, and
66 Chapter 3
in the same setting. The key in the repeat test is to alter as few variables as
possible, preferably only one, in order to increase attribution of causality.
Thirdly, and in order to make the study applicable outside the cohort and
institution of study, the key is to obtain as large a sample size as possible so
as to increase its generalizability or external validity. Fourthly, longitudinal
time-series analyses are a way of ensuring students are exposed to the same
or at least similar conditions, and of enlarging the sample size where the
institution has cohorts smaller than thirty students (the minimum needed
for the purposes of assuming normality). Here, a series of observations is
made on the same variable consecutively over time. The observations can be
on identical or similar units. Felix is an advocate ofthis approach (2005c:
17). The BLINGUA project in the Pedagogy Case Study is an attempt at
a longitudinal approach to a comparative study.
Chapelle gives a fifth way in which external validity (replicability and
generalizability) can be achieved. As long as practitioners are fully informed
of the specific nature of the context of instruction, the characteristics of
the students, and the type of CALL activities undertaken in a particular
study then it may be possible to relate such findings to a different context
where such conditions do not pertain in exactly the same way (Chapelle
1991: 4953).
Debate 3 the configuration debate
This debate asks: what combination of methods is best for measuring

progress in CALL? A number ofCALL authors have spoken ofthe value
of a configuration of data collection methods to ensure a rich picture of
what is happening in the classroom, lab, or online. Murray states that the
point is that none ofthe research tools employed in this study, when taken
individually, appear to offer a great deal of pertinent information. However,
configured as a network, narratives, diaries/journals, video observation
and interviews produced data which conveyed a picture of the learners
experience from his/her point of view. (Murray 1999: 191, cited in Levy
2000: 180).
Alongside a warning that such multiple-method studies have their

drawbacks too especially in terms of information overload and intru-
siveness overload and intrusiveness Murrays conclusion is that CALL
research needs the kind of insight that such approaches bring to light
(Murray cited in Levy 2000: 180). One should note that Murray was here
only talking about qualitative measures.
Felix, in the same year, adds quantitative measures when making the
same point about configured methods in an effectiveness study of a German
CD-ROM project:
An excellent aspect of the evaluation was the variety of data collection techniques
used, and this approach is highly recommended for future research. Questionnaire,
journal and test data complemented the information collected during the observa-
tions. These latter, in particular, yielded interesting information that would have
been difficult to obtain through questionnaires. They clearly confirmed the general
enthusiasm for the approach. They also highlighted differences oflearning styles and
preferences among students. (Felix 2000b: 61; emphasis added)
Early agenda-setting in this regard came from Salaberry. In his article enti-
tled: A theoretical foundation for the development of pedagogical tasks in
Computer Mediated Communication (1996), Salaberry calls for greater
rigour in the experimental design ofCALL studies: Adrawback of arche-
typal CALL programs has been the lack of appropriate empirical studies
that assess the benefits of such programs (Salaberry 1996: 10).
Salaberry cites reported deficiencies in the few empirical studies
addressing the pedagogical benefits ofCAI on learning (e.g. Reeves 1993;
Schmitt 1991, cited in Salaberry 1996). Schmitt, he says, noted small sample
sizes, lack of criteria for what constitutes appropriate software, faulty statis-
tical analysis, and inadequate length of treatment to measure educational
outcomes. Reeves mentioned the lack of theoretical framework, infre-
quency and brevity of experimental treatments, small sample sizes, and
large attrition in the number of participating subjects (Salaberry, p. 9).
Salaberry also lists the lack of use of a control group to measure
increased learning as an outcome and the Hawthorne effect. He cites
Reeves solution to these problems which is a step-wise and configured
approach of multiple research methods: conduct extensive, in-depth
studies to observe human behaviour in our field and relate the observations
68 Chapter 3
to meaningful learning theory that may be later susceptible to quantitative

theory (Reeves 1995: 44, cited in Salaberry 1996: 10).
Salaberrys principal argument is that the use of qualitative studies
should not rule out the design of quantitative studies that present a clear
theoretical rationale and a sound research design (p. 10) and that experi-
mental studies should include a clear theoretical rationale that guides
the investigation, an iterative design that promotes a more encompassing
framework of analysis (p. 27).
A variety of configurations of data-gathering methods are used in
the Case Studies of this project. What the majority have in common is
the combination of test data with questionnaires, and/or e-journal, and/
or focus group semi-structured interviews. The test data, if internally and
externally valid, add an empirical dimension to the judgmental data.
However, in developing an all-encompassing approach one needs to
be mindful ofMurrays warning about intrusiveness, for one runs the risk
of creating a monster that will itself skew the data gathered due to its inter-
ference in the learning process. The Model for Evaluation (MFE2) arising
out of this project, nevertheless, attempts to bring together and then test
in the Case Studies many of the above principles and tools of evaluation
in its evolution of an encompassing framework of analysis. On some occa-
sions concerns about intrusiveness prompted reductions in the diagnostic
burden imposed on the students.
Debate 4 the outcome vs processes debate
This debate revolves around the following questions: do we only focus on

measuring learning outcomes? What about learning processes? Can we
measure the latter? If so, how do we go about it? The summary of SLA
theory in Chapter Two drew attention to the growing interest in under-
standing the learner as well as the learning, in describing and explaining
the impact of individual differences and learning strategies on learning
outcomes. This interest remains high on the research agenda for many
CALL researchers. Felix states:
There exists a clear trend away from the comparison studies carried out during the
1980s that wanted to find out whether teaching with computers was better than
teaching without them. One of the reasons for this is surely the difficulty of car-
rying out valid research of this kind in natural settings. The most obvious reason,
though, is that in an environment where computers have become a natural part of
the educational environment and in which we have learnt that teachers will not be
replaced by them, the question is no longer as interesting. What remains interesting
to investigate is how technologies are impacting learning processes and as a conse-
quence might improve learning outcomes. (2004: 127; 2005c: 16)
Crucial to an evaluation of synergies across the Three Ps is the understand-

ing of the relationship between the design and the instructional theories
behind the development ofthe courseware and the reality on the ground,
that is, the learning contexts and environment in which the courseware is
used, its relationship to other materials, resources used, what one might call
non-computer work, and ultimately the effects on the language learners
and teachers in terms of motivation and learning outcomes.
In her challenging article entitled CALL in the year 2000: Still in
search of research paradigms? (1997: 19) Chapelle suggests an SLA-inspired
evaluative agenda that she will then fully articulate in her book (2001). She
urges that CALL researchers adopt an approach that looks to L2 class-
room research in general and interactionist research in particular, which
direct us to investigate two critical questions about CALL: What kind of
language does the learner engage in during a CALL activity? How good is
the language experience in CALL for L2 learning? (1997: 25).
Her theory-driven approach is at once process and outcome-focused.
The SLA-informed approach she proffers will:
investigate the extent to which learners have mastered a specific linguistic point, the
meta-cognitive strategies learners use while working on CALL, or the quality of
the cross-cultural experience they gain through CALL. Accordingly, other research
methods, such as experimental, correlational, introspective, or ethnographic meth-
ods, might also be used. (1997: 28)
She concludes that it seems necessary to shift from general approaches such
as those of psychology, computational linguistics, and educational technol-
ogy to the specific questions and methods of researchers who investigate
instructed SLA. (1997: 28). Whether this means that CALL (or CASLA)
70 Chapter 3
should forever abandon the insights and methods of non language-related

disciplines is debatable, but as shall be seen there is much that can be gleaned
from revisiting the work of pre-CALL and non-CALL SLA theorists and
researchers for the purposes of constructing a considered application of
instructed SLA theory to CALL practice and CALL evaluation.
Applying past lessons to an improved model

of CALL evaluation
The search for an optimal platform, program or pedagogy for CALL may
yet be shown to be in vain. The range of different possibilities in terms of
language task, learner need, instructional method and language learning
theory is so vast that it is unlikely that any one product or approach will
ever be proven to be vastly better than the rest. What is needful, however,
are comparative studies that are formative in nature, which highlight those
approaches and combinations of theory, design, environment, platform,
courseware and pedagogy that work best together, and thereby contribute
to improved CALL design, an enhanced integration of CALL into lan-
guage learning curricula, and improved CALL pedagogy.
To develop an evaluative methodology that will effectively measure
the impact of CALL on students progress, the following four aspects of
course design logic will have to be taken into account at both the analysis
stage and the reporting stage. First, there needs to be an awareness of the
nature of the thinking that lay behind the development (if the materials
were developed in-house) or choice of courseware used. Was the develop-
ment or selection made on the basis of an instructional or design theory
or were pragmatic, context-specific issues paramount? Secondly, clarity is
needed as to the basis on which the teaching context, that is the computer-
based environment and the wider language teaching and learning context,
was constructed. Thirdly, the particular pedagogical approach, if any, that
has been adopted will need to be identified. And fourthly, the degree of
integration of the CALL activities into the wider language learning cur-
riculum needs to be described and explained.
To this day, these four key debates remain pivotal in the world ofCALL
in general and in the field ofCALL effectiveness research in particular. The
answers to them will continue to determine the direction we head in. The
first debate, or question, as to whether CALL improves language learning
has met with a guarded yes as long as the question is framed aright it is
not so much about whether the computer itself can deliver improved learn-
ing gains as about whether an intelligent integration of good hardware and
courseware and sound pedagogy can do so. Secondly, comparative studies
can be of value, again as long as the aim is not to prove the effectiveness of
computers or software rather than an integrated and sound CALL peda-
gogy, and as long as reporting ofthe study clearly states any differences in
conditions under which each element of a study carried out. Comparisons
between various CALL approaches are also valid, indeed essential, and are
a central focus of all ofthe Case Studies in this enquiry. The third debate,
regarding the optimal configuration of data collection methods, highlights
the importance ofhaving a variety of methods, both qualitative and quan-
titative, to ensure that both a rich and an accurate picture is obtained as to
what is going on in the CALL activity under observation. Effectiveness
researchers in CALL/CASLA such as Pederson, Chapelle and Felix have
laid down clear and directives for a rigorous approach to construct valid-
ity in CALL measurement, and their agendas will contribute much to the
qualitative and quantitative measures ofboth MFE1 and MFE2. The fourth
debate has shown that there is ambivalence as to the respective weight we
should be putting on processes and outcomes. What is needed, as Salab-
erry stated, is a more encompassing framework of analysis to survey and
measure, qualify and quantify what is going on both within the individual
learner and between learners in the learning process, all the while adding
evidential data to the bank of unexplored, under-explored and disputed
areas of CALL learning gains.
CALL evaluation needs a matrix oftheory-derived criteria for observ-
ing CALL learning process, that is, the CALL task, activity, and experience.
Such a matrix should also have the capacity to conduct a kind of quality
control of what is going on in the CALL environment, that is, in the learner
and in the learning. Process will be hard, if not impossible to evaluate
quantitatively, and its evaluation may be primarily a matter for judgmen-
tal evaluation, that by definition cannot be substantiated by measurement
72 Chapter 3
of any objective kind. It may be that to obtain an empirical assessment of

the effectiveness of process we must focus on measuring outcome, and
endeavour to link by statistical inference the data from the latter with that
from the former. To that end CALL evaluators also need to improve both
the identifying and the validating of those outcomes. A clear and agreed
methodology is required for the construction of research designs that can
determine the impact of key variables involved in a given CALL context
on learning outcomes (in particular those that the educational institution
and the teaching staff have control over, i.e. the platform, program, and
the pedagogy). Such guidelines should also help identify any correlations,
if not causality, between the learning processes and those outcomes.
Furthermore, the internal and external validity ofCALL research designs
needs to be clearly established and maintained along an established code
of practice that draws on the lessons of past good practice. The next chap-
ter will present, justify and explain a possible way forward for both ofthe
above: a matrix oftheory-derived criteria for judging the nature and quality
of the CALL learning experience and a series of guidelines for conduct-
ing a configured empirical (both quantitative and qualitative) analysis of
learning outcomes in the CALL environment.
Chapter 4
A model for evaluating CALL

Part 1: CALL enhancement criteria
Introduction to the model for evaluation
The lessons learned from the CALL and CAL literature and the Case
Studies yielded a prototype (MFE1) which is outlined and anticipated in
Figure 4.1 and justified theoretically, and in the light of a review of the
relevant literature, in the remainder of this chapter. Those wishing to see
the presentation and explanation of the final complete model (MFE2)
assembled in the light of lessons learned while conducing the Case Stud-
ies, as well as a complete set of evaluative checklists, may skip to the final
chapter (Chapter 9). The Case Studies are included to demonstrate how
various aspects of the model for evaluation were applied to the Three Ps
and trialled in real-life educational settings.
There are essentially two routes through the evaluative process, as sug-
gested in the conclusion to the previous chapter: a judgmental appraisal
ofthe twelve CALL enhancement criteria in a given unit ofCALL teach-
ing, and the empirical (qualitative and quantitative) evaluation of that
unit through the prism of one, two or all three ofthe Three Ps (platforms,
programs, pedagogy). Using the twelve CALL Enhancement Criteria as
a starting point for any CALL evaluation should help to clarify the scope
and angle of approach to a planned judgmental and/or empirical enquiry,
and help inform the direction and progression offuture evaluative studies.
The Qualitative and Quantitative Measures route then outlines the pre-
cise methodological steps such studies should at least bear in mind when
designing the research construct for a study, then when gathering data for
and implementing a study, and, finally, when reporting on it.
74 Chapter 4
The CALL Enhancement Criteria route, which it is suggested should

be followed first, is made up of twelve criteria that have emerged from a
review of the relevant literature (CALL and non-CALL) and a number
of mapping exercises linked to that literature: Chapelles Six principles for
CALL task appropriateness were seen, in Chapter 2 as an already accepted
and useful basis for the evaluation ofCALL-related pedagogy from a con-
structivist/SLA perspective. However, Chapelles Six Criteria were never
devised to address, let alone evaluate, digital platforms such as multimedia
labs, VLEs and Interactive Whiteboards, and CALL software programs as
distinct from CALL pedagogy; so it is not surprising, therefore, that they
are inadequate for the task of judging their impact. So the additional six
Enhancement Criteria, listed in Figure 4.1 and defined in Table 4.3, were
needed to address the full scope ofCALL. They were arrived at as a result
of the mapping of Chapelles Six against a number of evaluative agendas
and frameworks from the field ofCALL/SLA/e-learning literature. These
six additional principles are language skills and combinations of skills,
learner control, error correction and feedback, collaborative CALL, the
teacher factor, and tuition delivery modes. The theoretical rationale for
all twelve is given in the first half ofthis chapter. Chapter 8, after the Case
Studies, will show how these might be used in future CALL evaluation
studies, and will provide a more detailed breakdown of each criterion
into sub-elements. Each criterion is given its own evaluator checklist; for
example, for the criterion error correction and feedback the evaluator is
asked to assess whether a CALL unit of study, or a CALL program, or a
CALL digital platform provides for implicit or explicit feedback, summa-
tive or formative assessment, whether it allows for, amongst other things,
the monitoring, tracking, reporting and certification of student work, as
well as staff and/or student access to correction and feedback data.
Figure 4.1 provides, by way of introduction, a simple at-a-glance
overview as to how these twelve criteria sit in relation to the overall pro-
posed evaluative process.
A model for evaluating CALL Part 1: CALL enhancement criteria 75
EVALUATION FLOW-CHART
CALL enhancement criteria
Learner fit
Meaning focus Qualitative & Quantitative Measures

Authenticity Diamond timeline
Positive Impact Research Design Criteria
Practicality Data Collection Methods
Language skills Validity checklist
Language control Data Collection
Error correction & feedback Report write-up
Collaborative CALL
Teacher factor
Tuition delivery modes
Figure 4.1 Evaluation flowchart (MFE1).
After an appraisal of any, or all, ofthe CALL Enhancement Criteria as they

relate to a given context the idea is that the evaluator who is interested in
a more empirical, positivist study, will move on to follow the Qualitative
and Quantitative Measures route in Figure 4.1. This route proposes a step-
by-step progression through a series of evaluative checklists, also devised
from CALL and non-CALL effectiveness literature, most of which are
also trialled in the Case Studies.
These measures are presented and justified in the second half of this
chapter, in tabular format to make them easier to grasp and more usable in
the context of an in situ evaluation. Each measures table is accompanied by
76 Chapter 4
an explanatory narrative. After a conceptualization ofthe evaluative process

in the form of a timeline diamond the evaluator is taken through a series
of questions relating to Research Design Criteria, such as the quantitative
instruments used, the conditions under which the statistical analysis should
be carried out, and issues relating to sampling. The second empirical step
has to do with Data Collection Methods; these are divided into qualitative/
judgmental (such as surveys and focus groups) and quantitative/empirical
methods (such as pre- and post-tests). A checklist for Internal and External
Validity then follows to ensure that the evaluator has ensured that key vari-
ables have been identified. In MFE1 the Research Design Criteria quality
control questions appear just once, after the timeline; however, in MFE2 it
is appears twice, the second time being just prior to the final Report Write-
up stage to ensure that the criteria have been adhered to in the study.
Twelve criteria for the evaluation of CALL Enhancement
Enlarging upon Chapelles evaluative method
In drawing up her criteria for CALL task appropriateness to guide CALL

evaluation (2001: 5160), Chapelle asserts that three needs have to be
addressed to improve the quality ofCALL/CASLA evaluation. First, evalu-
ation criteria need to show due regard to theory and findings regarding the
ideal conditions for SLA; second, such criteria need to be explained to the
user, or as Chapelle puts it: a theory of evaluation needs to be articulated
(p. 52); third, there needs to be an application of the above criteria and
theory to both the software and the task devised for the learner. What is
seminal to the approach is her call to combine a judgmental method of
evaluation of software and teacher-planned CALL activities and tasks
with a statistically rigorous empirical method of evaluation ofthe learners
performance. In other words, subjective evaluation must be combined with
objective, qualitative with quantitative, observational with experimental.
The implication is that the one without the others weakens the validity
of the evaluation. Together with her awareness of the need to judge both
the software and the pedagogical task it is clear that in her approach we
have the basis for a theory-driven, holistic, and configured approach to
CALL evaluation.
Chapelles approach addresses theory-driven pedagogy while also
being teacher-, courseware designer- and researcher-friendly, in that she
accompanies her criteria with clearly stepped questions that relate to each
ofher evaluation criteria. Where she differs from many CALL evaluators
is that her approach is based on a single theory (SLA) whereas others, as
we shall see, are either theory-neutral or employ a hybrid mix of theories.
While Chapelles approach, being single-theory based, is less flexible than
others, her six principles are generic and flexible enough to operate at a
number of different evaluative levels (e.g. at the level of evaluating task
appropriateness (p. 55), test usefulness (p. 101), and in a variety of different
contexts, in particular, judgmental analysis ofCALL appropriateness (p. 59)
and empirical evaluation ofCALL tasks (p. 68). They are accompanied by
repeated calls for empirical evidence and are followed up by clear guidelines
on ensuring suitable internal and external validity.
Our proposed new Model for Evaluation also prompts evaluators in
their own studies to seek out evidence for adherence to criteria evidence
that is, ideally, both objective and measurable and then to relate each of
these criteria and their sub-elements to any, or all, of the Three Ps and to
the different phases of a teaching cycle. It also includes space for evalua-
tors both to rate the quality of the evidence they find using Likert scale
ranking and open-ended comment boxes.
Having a generic methodology that operates at a number of different
levels is both an advantage and a disadvantage. It is an advantage in that it
is a reasonably simple and memorable model that enables quite a holistic
approach to evaluation; the disadvantage of this is that it is not always
adaptable in its entirety to the exigencies of a given context. A model with
a clearly-defined, though narrow, focus will occasionally be inadequate to
address the theoretical requirements and detail of a more complex, multi-
modal situation, and in such instances might require modification. For
example, at one point Chapelle, in applying her task-focused model to a
different context, has to deviate from her six criteria. When her context
shifts from CALL task appropriateness to CALL test usefulness, makes
use of just three of her six principles (i.e. authenticity, positive impact
78 Chapter 4
and practicality), and replaces the remainder with the new principles of
reliability, construct validity, and interactiveness (see Table 4.1). Our
larger model for evaluation will indeed incorporate these notions but they
are categorized and distributed differently within a full, and therefore,
more flexible evaluative framework. Reliability is seen on the one hand
as a CALL Enhancement Criterion and, as such, is subsumed into the
principle of practicality, and on the other hand is seen as data validity
criterion and, along with construct validity features in the Data Col-
lection Measures section of our final model. Interactiveness, however, is
deemed to be too multi-faceted a notion to come under one discrete head-
ing. Given the increase of interactivity across the full range of computer-
assisted learning the notion has been distributed across at least four of
the six additional criteria: it is relevant to learner control, collaborative
CALL, error correction and feedback and tuition delivery modes in the
final list of twelve criteria.
Qualities of test usefulness

Quality Definition
Reliability The consistency of the performance reflected in scores
The appropriateness of the inferences made on the basis of test
Construct validity
scores
The correspondence of characteristics of the testing activity to
Authenticity characteristics of relevant non-test contexts where language is
used
The expected extent of involvement of the test takers
Interactiveness knowledge and interest and of their communicative language
strategies in accomplishing a test task
The positive consequences that a test can have on society and
Positive impact educational systems and on the individuals within the systems
(i.e. learners and teachers)
The adequacy of the available resources for the design,
Practicality
development, use and evaluation of the test
Table 4.1 Chapelles criteria for evaluating the qualities of test usefulness.
As for the decision to change Chapelles terminology from criteria for evalu-
ation ofCALL task appropriateness to criteria/principles for evaluation
ofCALL enhancement this was guided by two principal considerations.
First, Chapelles six criteria are restricted to judging the CALL task, and
therefore exclude the judging of platforms, as well as some other features
ofthe CALL experience such as error correction and feedback. Secondly,
the term appropriateness, while being an excellent term to describe the
suitability and the fit of CALL provision did not extend to the idea of
value added and the difference that CALL makes (or does not make) to
student learning which was deemed to be essential aspect of effectiveness
research.
To adapt further Chapelles criteria to the Three Ps two adjustments
were made to two ofher own definitions (see Table 4.2), before adding the
six new criteria. Her definition of authenticity referred to the degree of
correspondence between the learning activity and target language activ-
ities of interest to learners outside of the classroom (2001: 55 emphasis
added); this reference to the classroom was enlarged to include also the
CALL environment as the word classroom excludes the notions of the
dedicated VLE and the multimedia laboratory which increasing numbers
of institutions have adopted to replace the dated analogue language labora-
tory. Secondly, her definition ofpracticality was given as the adequacy of
the resources to support the use ofthe CALL activity. This was lengthened
to include the cost effectiveness of such resources, given the importance
of budgetary considerations in most institutions and the need to choose
the less expensive way if the learning gains delivered by two differently
priced resources prove to be similar (cf: Clark 1994: 22, cited in Allum
2002: 147).
80 Chapter 4
6 criteria for CALL PPP

Definitions
evaluation
degree of opportunity present for beneficial focus
on form
amount of opportunity for engagement with
Learner fit language under appropriate conditions given
learner characteristics
Chapelle (2001)
extent to which learners attention is directed

Meaning focus
toward the meaning of the language
degree of correspondence between the learning
activity and target language activities of interest
Authenticity
to learners outside of the CALL environment or
classroom
positive effects of the CALL activity on those
Positive impact
who participate in it
adequacy and cost-effectiveness of the resources to
Practicality
support the use of the CALL activity
Table 4.2 Chapelles six criteria for evaluation of CALL task appropriateness.
Cambridge University Press, adapted with permission
(adaptations indicated in italics).
The six new CALL enhancement criteria (see Table 4.3), additional to
Chapelles six, were arrived at over the course of a five year project and
drew from a study of eight different authors from e-learning (Mehanna
2004), CALL (Chapelle 2001, Ingraham and Emery 1991, Hubbard 1988,
Dunkel 1991, and Pederson 1988) and SLA (Ellis 1994), from a UK-wide
survey of multimedia laboratory use at HE level (Toner et al. 2007) and
from the design criteria of a manufacturer of digital labs for language teach-
ing (Melissi 2007). To obtain these additional criteria Chapelles own cri-
teria were mapped against the varied evaluative concepts and hardware/
courseware design features arising from these sources. The wording ofthe
definitions for each new criterion was new but informed by the relevant
literature and fine-tuned through the experience gained in conducting
the Case Studies.
6 new principles for CALL

Definitions
PPP evaluation
the ability to deliver, either in isolation or in
Language skills and combination, all the main language skills, listening,
combinations of skills speaking, reading, writing, vocabulary, grammar and
area studies as well as meta-cognitive language skills
the degree of opportunity for self-directed, self-paced
Learner control
and autonomous learning
the extent of automated correction of error (whether
Error correction and explicit or implicit, formative or summative) and
feedback the monitoring, tracking, storing and reporting of
progress, level and achievement
Leakey
degree of opportunity for paired, group or class

Collaborative CALL interaction creating the social dynamic for learning
through concerted and collaborative effort
the influence of individual teacher personality and
style factors on the effectiveness of CALL; the
Teacher factor
quality and relevance of ongoing staff training and
development
capacity of the CALL platform, software or pedagogy,
to enable the delivery of a variety of teaching modes
(such as lecture, seminar, tutorial and practical) in a
CALL setting
Table 4.3 Additional six principles for evaluating CALL enhancement (Leakey).
While most ofthe sources were interested in pedagogy linked to computer-

assisted learning, none of them covered all of the Three Ps; one author,
Ellis, looked just at pedagogy with no reference to CALL; the remainder
with an interest in CALL have looked at least two of the Three Ps the
most frequent combination being pedagogy and programs. It soon became
clear that the least researched area was that of digital platforms and their
impact on the learning gains of students. The tally chart (Table 4.4) plots
the occurrences of references to each of the Twelve CALL Enhancement
Criteria across the eight principle sources in the literature. These eight are
grouped according to their principle focus, be it platforms, programs, or
pedagogy. The mapping exercises that follow will be dealt with in the same
order as they appear in Table 4.4.
82 Chapter 4
From the tally chart one can make a number of points relating to the
relevance of the sources to the generation of a new model for evaluation.
The distribution ofthe top two scores in each row (shown in italics) reveals
a good spread across the sources and strong justification for each criterion.
Clearly, both the Chapelle criteria and new criteria have a good spread of
representation across the sources; this is shown by the total scores for each
row (lowest is ten, and highest thirty-seven). Also, the Chapelle criteria
resonate well with most of the new criteria, and the new criteria resonate
well with most of the Chapelle criteria; the main exception to this being
the two platform-related columns where Chapelles criteria fare less well,
for reasons already mentioned. One can infer also that all the sources have a
strong resonance with constructivist and SLA concepts. This is also backed
up by the strong showing ofEllis (the principal SLA author ofthe sources)
against most ofthe criteria, and indicative ofthe fact that most teaching of
modern and foreign languages is nowadays heavily, though not exclusively,
influenced by constructivist ideology and practice.
What follows are the mapping exercises for each of the eight sources
above accompanied by a commentary explaining how each mapping exer-
cise was used to generate, define and justify the six new criteria. These new
criteria have been listed in the far right hand column of each mapping table
and, when considered with the mapping exercises from the other evaluative
agendas from the literature, they have been deemed significant enough to
include in the final list of evaluative principles.
Platforms Programs Pedagogy
Ingraham &
Toner et al.
Mehanna
Hubbard
Pederson
Dunkel
Melissi
Emery
Ellis
12 criteria for judging
Total
CALL enhancement
14 criteria
10 criteria
10 criteria
16 criteria
15 criteria
11 criteria
11 criteria
9 criteria
Language
0 1 0 1 5 3 0 9 19
learning potential
Learner fit 1 3 4 3 4 6 7 4 32
Chapelle
Meaning focus 0 4 0 2 7 2 1 5 21
Authenticity 1 4 1 0 2 0 1 1 10
Positive impact 2 0 9 3 1 3 1 2 21
Practicality 7 15 4 9 1 1 0 0 37
Language skills and
0 4 0 0 3 2 0 1 10
combinations of skills
Learner control 1 8 2 1 2 1 2 1 18
Error correction and
1 4 0 1 1 1 2 3 13
Leakey
feedback
Collaborative CALL 2 6 1 1 2 0 0 1 13
Teacher factor 6 1 0 0 2 1 1 3 14
Tuition delivery
7 4 0 0 0 3 0 4 18
modes
Totals 28 54 21 21 30 23 15 34 226
Table 4.4 Tally chart of exercises mapping the twelve CALL Enhancement Criteria.
Mapped against key authors from the literature and CALL practice.
84 Chapter 4
evaluation supplementary to
Proposed principles of
Positive impact
Meaning focus
Authenticity
Practicality
Learner fit
Chapelle
Author
PPP
Descriptor
Encouraging autonomous Provides learner

x
learning among students control
Interaction &
Increasing teacher inter
Collaborative
action with students
CALL
Interaction &
Increasing interaction
Collaborative
among students
CALL
Introducing audio/video
x x
and other media to classes
Provision/storage of
Toner et al. (2007)
audio/video and other x

Platforms
media files
Encouraging student
x x
engagement
Increase in tutor workload x
Technical problems
impact upon effectiveness x
of class
Reduction in tutor con-

x
tact hours
Effective monitoring /
x
supervision
Timely intervention /
x
assistance
Table 4.5 Mapping the University of Ulster/LLAS (Toner et al. 2007) survey questions.
For digital platform evaluation against Chapelles six principles for CALL evaluation.
Mapping exercise 1: Digital platforms (Toner et al. survey) vs Chapelle
Mapped against Chapelles six criteria for CALL task evaluation (see Table
4.5) the questions used by Toner et al. (2007) in their survey of digital
platform use in the UK Higher Education sector, already discussed in
Chapter 2, did not specifically cover issues relating to language learning
potential and meaning focus and was lacking also in the area of learner
fit and authenticity. This was mainly due to the fact that its principal focus
was on the functionality ofthe digital platform and less about pedagogical
content or method. There was, therefore, reasonable coverage ofpositive
impact and strong coverage of issues ofpracticality. On the other hand the
survey did consider two new areas that did not fit easily into the Chapelle
list: the provision oflearner control (encouraging autonomous learning)
and the promotion of interactivity and a group dynamic, which we have
classified under collaborative CALL.
Mapping exercise 2: Digital platforms (Melissi) vs. Chapelle
An analysis of performance indicators for a digital platform (Melissi),

specifically designed for language teaching and quite popular at UK HE
level, was, like the Toner et al. survey, strongly focused on issues of prac-
ticality. Its interface and functionality both leant themselves to intuitive
ease-of-use.
86
Language learning
Positive impact
Meaning focus
Authenticity
Practicality
Learner fit
potential
Author
PPP
Descriptor evaluation supplementary

to Chapelle
Teacher can easily produce audio and video files

from cassette, VCR, DVD and CD and transfer x x 4+ skills benefit
them quickly to the students
Melissi digital classtoom criteria (2005)
Teachers are able to produce lessons on the fly Teacher style factor;
x
without long, complicated advanced preparation Tuition delivery modes
They can also produce complete activities in
Tuition delivery modes;
advance that can include: audio; video; pictures; x
Platforms
4+ skills
text and instructions
Students log in and are allocated file storage space
x Learner control
on the teachers computer or server
Students can record their voice in synchronization

x x Learner control; 4+ skills
with the audio playback of these lesson files
Chapter 4
Students can work on documents, using a word
processor, while listening or watching material either x x x x Learner control
sent from the teacher or off the web
A model for evaluating CALL Part 1: CALL enhancement criteria
Teacher can speak individually or collectively to the
x x x Error correction and feed-
students through their headphones
back; Collaborative CALL
Learner control;
Students can call and speak to the teacher x x
Collaborative CALL
Learner control;
Students can call and speak to the teacher x x
Collaborative CALL
Melissi digital classtoom criteria (2005)
Student and teacher can engage in group Learner control;

x
conferencing Collaborative CALL
Teacher can monitor the students recordings Error correction and
x
Platforms
during and after feedback

Viewing of students desktop and remote control of Error correction and
x
students mouse and keyboard feedback
Ability to control several Classrooms from one
x Error correction and feed-
central server
back; Collaborative CALL
Students can telephone other students in the class x x
Students can text other students in the class x x
Students can subtitle a video x x x
87
Table 4.6 Mapping the Melissi Digital Classroom performance indicators.
Mapped against Chapelles six principles for CALL evaluation
(see also <http://www.Melissi.co.uk.htm> [accessed 13 April 2005]).
88 Chapter 4
The teacher console allowed for increased customizability, content

authoring/manipulation and differentiation by student need and level, and
therefore meant that issues oflearner fit and meaning focus also featured
in the mapping exercise. This exercise also pointed to six possible further
criteria: tuition delivery modes, and the teacher style factor, coverage of
the four language learning skills and ability to deliver benefit from combina-
tions ofthe four skills, the teacher style factor, learner control and error
correction and feedback (see Table 4.6). These evaluative criteria would,
after subsequent insertion into the remaining mapping exercises, be deemed
generic and distinctive enough to be included in the final model.
Mapping exercise 3: Programs and courseware design

(Ingraham and Emery) vs Chapelle
Ingraham and Emerys report on their Hypermedia approach to language

training (1991) provided, in the first decade ofCALL, useful early design
reference points for language courseware designers and users, all of which
remain relevant today. Looking at these from the perspective of effective-
ness research, these reference points lend themselves easily to the role of
evaluative criteria, as Alderman, author of the evaluative report on the
pioneering TICCIT project, stated what is a goal for a developer is often
a question for an evaluator (1978: 29). Ingraham and Emery, both coming
from EFL/ESL backgrounds, had an approach to courseware design that
aimed to marry pedagogy and technology in a way that was both prag-
matic and cost-effective (p. 322). Their design headings reflect the peda-
gogic nature oftheir priorities: methodological issues, active and passive
learning, authenticity and interaction and response. They place levels of
competence first on their list of overall objectives and structures, and this
has proven prescient both through the subsequent adherence and neglect
of them by CALL designers. Chapter 7 will feature a software product
TellMeMore Campus (or Online) that is a good example of both adher-
ence and neglect in this regard, on the one hand harnessing advances in
computer-adaptive testing to match the students learning paths to his/her
performance in the online placement test, and on the other persisting with
some challenging games and activities that are, often, beyond all but the
most competent students, or simply inauthentic in their content.
Ingraham and Emerys sub-topics televisual environment, windows
environment, screen design, hypermedia and linearity, autonomy versus
control, autonomy and self-tuition will prove important in qualitative
assessments of student and staff reactions to these elements. Performance
by CALL programs against these criteria will be seen to play an important
role in motivating or de-motivating users. An evaluative model needs to
construct research activities to test the premise that software that matched
student levels of competence to levels and the lesson structure within the
software package leads to greater and quicker learning gains (quantitative
measures) and greater student satisfaction and motivation (qualitative
measures) than a package that did not do this. These issues will feature
significantly in Chapter 7 when Ingraham and Emerys criteria are used
to evaluate the TellMeMore software program.
Ingraham and Emerys final set of criteria under the heading practi-
cal considerations (including authenticity, active and passive learning,
interaction and response) anticipate at least two of Chapelles six prin-
ciples. The BLINGUA project at the University of Ulster (see the Case
Study in Chapter 8) applied such criteria as practicality and workability
to its evaluation of a blended learning project for CALL in the context of
undergraduate language learning.
When mapped against Chapelles six principles for CALL evalua-
tion (see Table 4.7), Ingraham and Emerys agenda for CALL courseware
design (1991) has no coverage of language learning potential, meaning
focus, positive impact and little coverage of authenticity. Their agenda
does throw up, however, other supplementary headings already met in the
Melissi mapping: the provision of learner control and the promotion of
collaborative CALL in. As with the other mapped authors these supple-
mentary principles of evaluation will feature in MFE2 to ensure a fuller
and more comprehensive analysis.
Programs PPP
Ingraham & Emery (1991) in Levy (1997) Author
Methodological Overall objectives and
issues structure
Language learning methods
The televisual environment
Levels of competence
CAL methodology
Descriptor
Course structure
Lesson structure
Learner fit
x
Meaning focus
Authenticity
Positive impact
Practicality
x
tion supplementary to Chapelle

Proposed principles of evalua-
Chapter 4 90
A model for evaluating CALL Part 1: CALL enhancement criteria
The windows environment x
Screen design x
Interface issues
Ingraham & Emery (1991) in Levy (1997)
Hypermedia and linearity x
Autonomy versus control x x Provides learner control

Programs
Autonomy and self-tuition x x Provides learner control

Practical considerations
Authenticity x x
Active and passive learning x x
Interaction and response x Collaborative CALL
Table 4.7 Mapping of Ingraham and Emerys (1991) evaluative headings. For CALL courseware design against
Chapelles (2001) evaluative agenda for CALL tasks.
91
92 Chapter 4
Mapping exercise 4: Programs and courseware design

(Hubbard) vs Chapelle
Hubbard, in his article Language teaching approaches, the evaluation of

CALL software, and design implications (in Smith 1988), constructed
an evaluative framework for software that assesses its fit to pedagogy. His
software evaluation matrix, drawing on behaviourist, explicit learning as
well as acquisition approaches, goes beyond merely addressing generic
features of software to judging their language-specific and pedagogical
qualities. Hubbards guidelines help connect two ofthe Ps: programs and
pedagogy. One ofHubbards principal aims was to ensure that an institu-
tions purchasing decisions were preceded by informed evaluation of best
pedagogical fit and thereby constrain software publishers to develop design
criteria that include approach considerations (Hubbard 1988: 252). Hub-
bards questionnaires can thus be integrated into an evaluative framework
for both software design and pedagogy quality control. The Hubbard-
inspired and SLA theory-driven descriptors in the mapping exercise (in
Table 4.8) are drawn from a Hubbard questionnaire relating to what he
calls Acquisition Approaches in a chapter entitled Language teaching
approaches, the evaluation of CALL software, and design implications
(Hubbard, in Smith 1988: 246247).
When mapped against Chapelles six principles for CALL evaluation
(see Table 4.8), one sees that Hubbards agenda for a second language acqui-
sition (i.e. SLA) approach to CALL courseware design and evaluation lacks
reference to the need for authenticity of content and has sparse coverage
oflanguage learning potential but does yield, and thereby reinforce, head-
ings already met in other mappings, that were not there in Chapelle: the
promotion of collaborative CALL, error correction and feedback and
the provision oflearner control, all three of which have a broad relevance
to current technology-based pedagogy across a range of disciplines.
Proposed principles
Language learning
supplementary to
Positive impact
Meaning focus
of evaluation
Authenticity
Practicality
Learner fit
potential
Chapelle
Author
PPP
Descriptor
Provides meaningful
communicative interac-
x x x
tion between student
and computer
Provides comprehensi-
ble input at a level just
x x
beyond that currently
acquired by the learner
Promotes a positive self
x x
image in the learner
Motivates the learner
x x
to use it
Hubbard (1988)
Motivates the learner to

Programs
x x
acquire the language
Provides a challenge
but does not produce x x
frustration or anxiety
Error
Does not include overt
x correction
error correction
and feedback
Allows the learner the Provides
opportunity to produce x x x learner
comprehensible output control
Acts effectively as a
catalyst to promote
Collaborative
learner-learner inter- x
CALL
action in the target
language
Table 4.8 Mapping of Hubbards (1988) evaluative headings.

Against Chapelles (2001) evaluative agenda for CALL tasks.
94 Chapter 4
Hubbards aim is not so much to enforce rigid adoption of one or other

approach or syllabus type as to foster awareness ofthese and to ensure learn-
ing gains are not hampered by mismatches of the software to the course,
teaching or learning style. How many software packages, one might ask, sit
unused or underused in language department servers or on learning resource
unit shelves for the very reason that they were not, prior to purchase or even
after purchase, properly evaluated for the appropriateness oftheir fit to both
teaching approach and the syllabus type? Clearly, many teachers subscribe
to a fundamental eclecticism and blend approaches and resources to suit
the need of the moment. Hubbards article is a warning against assuming
that one can blend any combination of approach and syllabus with any
software package in any learner context and get away with it.
Chapter 7 is an evaluation of a networked CD package (TellMeMore
Education, 2004), and its more recent adaptation as an e-learning solution
(TellMeMore Campus (Online), 2006). Hubbards checklists formed part
of a fuller evaluative matrix that led ultimately to the former package not
being adopted as an integral part of the curriculum delivery as it failed
to measure up to Hubbards criteria. The evaluative conclusions then fea-
tured as part of a collaborative partnership with the software company to
ensure that the latest product was better adapted to the needs of higher
education.
Mapping exercise 5: Pedagogy (Ellis) vs Chapelle
When it comes to the third ofthe Three Ps in our list: pedagogy, the com-
plexity level increases as the human element (learner and teacher) is now
the central focus. Is it possible, one needs to ask at the outset, using well-
targeted and narrowly-focused research designs to get nearer to accounting
empirically for what is actually taking place in the learner? Ultimately, it
is from what is learned by the students that any measurable impact data
can be derived. The challenge here is initially about identifying, clarify-
ing, and then measuring the impact made by key variables involved in the
learning process.
There have been pendulum swings in the research between focusing on

the learning and emphasizing the learner. Recently the balance has prob-
ably shifted towards the learner, learner differences and learning styles.
In 1994 Ellis stated that: The main goal of SLA research is to character-
ize learners underlying knowledge of the L2, i.e., to describe and explain
their competence (Ellis 1994: 13). Elliss framework for investigating L2
acquisition (see Table 4.9) gives an idea of the complexities involved in
the second language learning process and, hence for us, the areas of focus
that our quality control model would need to include. These range across
the descriptive-explanatory as well as the learner-learning axes/dynamics
and give an idea of the enormity of the task that CALL/CASLA evalu-
ation involves.
Focus on Learning Focus on the Learner

Description Explanation
Area 1 Area 2 Area 3 Area 4
Characteristics of Learner-external Learner-internal The language
learner language factors mechanisms learner
general factors e.g.
errors social context L1 transfer
motivation
acquistion orders
input and
and developmental learning processes learner strategies
interaction
sequences
communication
variability
strategies
knowledge of
pragmatic features
linguistic universals
Table 4.9 Elliss Framework for investigating L2 acquisition. Reproduced by

permission of Oxford University Press. From Oxford Applied Linguistics: Study of
Second Language Acquisition, by Rod Ellis Rod Ellis 1994.
The capabilities of metric technologies and neuroscience have not yet

progressed sufficiently to provide us with a clear idea of agency and cau-
sality in some of Elliss areas, such as learnerinternal mechanisms and
social context. Until medical scanners and neurological science are able
96 Chapter 4
to monitor non-intrusively what is happening in learners brains as they

learn in the classroom we are restricted to waiting for the outcome ofthat
process, that is obtaining data based on what they say, write and do, rather
than on what is actually going on. Ellis put it this way:
learners mental knowledge is not open to direct inspection; it can only be inferred
by examining samples of their performance. SLA researchers have used different
kinds of performance to try to investigate competence. Many analyse the actual utter-
ances that learners produce in speech or writingSome try to tap learners intuitions
about what is correct or appropriate by means of judgment taskswhile others rely
on the introspective and retrospective reports that learners provide about their own
learning Needless to say none ofthese provide a direct window into competence.
Also, not surprisingly, very different results can be obtained depending on the kind
of performance data the researcher studies. (Ellis 1994: 13)
The variability, and in many cases unreliability, of such performance data is

due partly to the fact that students are often doing something very different
from what [language teachers] assume they are doing (Hosenfeld 1976: 123,
cited in Chapelle 2001). This is even more the case in CAL/CALL settings
where students increasingly are transferring into the lab or e-learning arena
multi-tasking habits they have adopted at home (e.g. combining emailing,
chat, MSN, texting, and listening to music with actual work which may
or may not involve the computer!). Therefore, the data gathered need to be
observable data that provide evidence ofCALL qualities (Chapelle 2001:
66). As we have seen from Chapelle, observable data can be evaluated by
means of judgmental or empirical methods of evaluation. They also need
to have strong internal and external validity.
Linguists and CALL researchers are, however, steadily gathering dis-
crete data made possible by progress in new technologies and advances
in science. The holistic model for CALL evaluation (i.e. MFE2) aims to
provide a logically expandable framework of analysis so that new data and
findings can be assimilated easily into an ever-increasing bank ofknowledge
and an organized overview of the whole field.
A rudimentary adaptation of Elliss table shows those areas that our
model will address (see Figure 4.2). Clearly there is virtually limitless
scope for further research based on the interrelationships of the differing
elements. Evaluation in any one ofthe circled elements in the figure already
provides the scope for a separate discipline in itself. For example, CALL-
based analysis of errors subsumes, amongst other things, the domains
of tracking software, computerized error and needs analyses, explicit and
implicit, formative and summative feedback, diagnostic tests, computer-
adaptive testing (CAT) and online assessment.
Focus on Learning Focus on the Learner

Description Explanation
Area 1 Area 2 Area 3 Area 4
Characteristics of Learner-external Learner-internal The language

learner language factors mechanisms learner
general factors e.g.
errors social context L1 transfer
motivation
acquistion orders
input and
and developmental learning processes learner strategies
interaction
sequences
communication
variability
strategies
knowledge of lin-
pragmatic features
guistic universals
Figure 4.2 Elliss Framework for investigating L2 acquisition.

Reproduced by permission of Oxford University Press.
From Oxford Applied Linguistics: Study of Second Language Acquisition,
by Rod Ellis Rod Ellis 1994.
As for the language learner we now have a dazzling array of online devices
for learners to self-diagnose their learning style. For the diagnostic survey
of learning style the VARK list, as mentioned in Chapter 2, was used in
the Case Studies for this project, but there are many others. For one ofthe
pre-tests a computer-adaptive test was used that responded and adapted
to the ongoing correctness, or lack of it, of students answers to direct the
difficulty level of remaining questions, and then make recommendations
98 Chapter 4
as to the level of activities in the subsequent learning paths. Other diag-

nostic or metric software packages can enable the monitoring of individual
learner differences and allow for further individualizing of learning. Just
focusing on the matching of the characteristics of the language learner to
appropriate software and/or CALL learning environment would be mate-
rial enough for many large-scale studies.
Obtaining reliable data depends in large measure on being able to con-
figure ones data by a combination of appropriate data collection methods.
A purely positivistic or empirical approach will not paint as rich a picture
as one that incorporates judgmental and phenomenological data. A quan-
titative verdict on learning outcomes will be more meaningful and true if
accompanied by a report on the learning context, the subject content and
feedback on students opinions regarding the learning process, informa-
tion about prior learning and learner characteristics. SLA theory outside
of CALL (or CASLA) inevitably lacks the conceptual vocabulary for
evaluation of CALL; however, that is not to say that it does not provide
a useful starting point.
When mapped against Chapelles criteria, Elliss framework, while
providing a useful vocabulary for labelling what is happening in an SLA
framework, is not broad enough to take on board the extra dimension that
CALL brings to the table (see Table 4.10).
For example, Chapelles sixth principle, practicality (i.e. the adequacy
of resources to support the use of the CALL activity), is not an issue for
Ellis when investigating acquisition processes. The mapping ofElliss frame-
work for investigating SLA to Chapelles principles for CALL evaluation
reveals an empty final column. Ellis is also inadequate for the purposes of
describing the impact of CALL on the learner (penultimate column of
Table 4.10). On the other hand Elliss reference to error and the whole
underlying concept of error analysis, error correction and feedback are a
criterion that is missing in Chapelles framework, but which will be added
to our framework based on our mapping ofHubbard, as well as Mehanna,
Dunkel, and Pederson. This addition also arose from a conviction that an
evaluation model for CALL platforms, programs and pedagogy would be
severely lacking if it did not look at the capacity of each ofthese to support
error-correction and feedback mechanisms.
Language learning
Positive impact
Meaning focus
Authenticity
Practicality
Learner fit
potential
Author
PPP
Descriptor
characteristics of
x
learner-language
Focus on Learning
Description: Area 1
errors x
acquisition orders and
x
developmental sequences
variability x
pragmatic features x
Explanation:
learner-external factors x
Learning
Focus on
Area 2
social context x
Ellis (1994)
Pe dagogy
input and interaction x x

learner-internal mechanisms x
Focus on Learning
Explanation: Area 3
L1 transfer x
learning processes x
communication strategies x x
knowledge of linguistic
x
universals
Focus on Learner
the language learner x

Explanation:
Area 4
general factors
x x
e.g. motivation
learner strategies x
Table 4.10 Mapping of Elliss framework for investigating SLA.
Mapped to Chapelles principles for CALL evaluation.
100 Chapter 4
Mapping exercise 6: Pedagogy (Dunkel) vs Chapelle
Dunkel (1991) provides a useful, early review ofthe strengths and weaknesses
ofkey effectiveness research studies. Her main interest is in the narrative and
meta-analytic research base. What Dunkel brings to the CALL effectiveness
research table is an ability to ask pertinent questions about the impacts of
CALL, a highlighting of the strengths and weaknesses of various research
designs, and useful recommendations for improvement in the rigour of evalu-
ative studies. The mapping exercise relating to Dunkel draws on her overall
in-depth analysis and recommendations for future effectiveness research, and
not from any tabulated framework; so, a brief review of her recommenda-
tions is needed to establish the grounds for her evaluative criteria.
The first CAI meta-analysis she looks at is that carried out by Roblyer,
Castine and King (eds) (1988, cited in Dunkel 1991: 535), which she calls a
review ofthe syntheses (Roblyer et al. looked at 26 ofthese prior to 1980),
but which, in addition, includes an analysis of 38 research reports and 44
doctoral dissertations completed between 1980 and 1988. She highlights
the editors conclusion that, while specific measures exist for evaluating
educational achievement (such as student achievement, attitudes, drop-out
rate, learning time), yet after 25 years of use of computers in instruction
the impact of computer applications on these measures remains largely an
unknown quantity (Roblyer et al. 1988: 12). Their review throws up key
questions that remained at that time to be answered unambiguously, and
which, for a large part, are still matters of dispute, such as:
Can computer applications help improve student performance in basic skills and
other key areas? For what specific skill areas, grade levels, and content areas are
computer applications most effective? Which kinds and levels of students seem to
profit most from using computers to learn? Which kinds of computer applications
are most effective for which skill and content areas? Can computer applications
improve students attitudes towards school, learning, and their abilities to learn? Will
improved attitudes translate into better performance in school and lower drop-out
rates? (Roblyer et al., p. 12)
Most, if not all, of these questions are still relevant today and some have
informed the evolution of MFE1 and MFE2. Dunkel bases her summary
of research findings on seven strong and consistent trends in findings
relating to the following aspects of instruction:
(1) The amount of learning time; (2) student attitudes towards the computer and
the subject matter; (3) the effect of computer use in specific content areas; (4) the
types ofCAI (tutorials, drill and practice, and simulations); (5) the computer envi-
ronments (CAI, CMI, and CEI); (6) the uses of CAI (i.e., as a supplement to,
versus a replacement for, traditional methods); and (7) the levels of student ability.
(Dunkel 1991: 11)
These trends are drawn from a mix of quantitative and qualitative studies.
Their findings can, when appropriately marshalled, inform the design of
programs themselves, the pedagogy behind the delivery ofthese programs
and also the approach to the evaluation of the effectiveness of these pro-
grams. What follows is a brief summary of her key findings in these areas
as they relate to effectiveness research and this enquiry.
The first criterion Dunkel looks at relates to the timesaving benefits of
CAL/CALL. While arguing that research interest in the timesaving ben-
efits of computers had lessened (in 1991) in favour of cost-effectiveness and
courseware design issues, her reference to the possible remedial benefit of
CALL materials is a valid point which could have been enlarged upon and
which more recent research has suggested may be a factor for less able lan-
guage students in an HE setting (Leakey and Ranchoux 2005: 47). Further-
more, timesaving and cost-effectiveness issues may be of use to commerce
as well as in secondary and higher education to assess students language
proficiency for diagnostic or achievement purposes, training tutorials and
drill-and-practice activities (p. 12). The second Case Study (Chapter Seven)
will feature a software package where all three of the above factors relat-
ing to timesaving appear to have featured as drivers behind the design. In
1991 this timesaving benefit was seen as also of use in its ability to free up
the teacher so that he/she could concentrate on devising communication-
engendering activities for the learner, which in the 1980s computer tech-
nology was very limited in its ability to deliver. It is increasingly possible
for the computer now to even assist with the communication-engendering
activities, for example, via video-conferencing and telephony applications
(such as MSN and Skype), not to mention text-based chat.
Our CALL effectiveness assessment should include, I suggest, a gauge
of the timesaving factor and its role in acceleration of student learning,
accelerated feedback in diagnostic and formative testing, and reducing
the workload ofteachers in the areas of preparation of materials, class and
individual contact time and the marking of tests.
102 Chapter 4
The second strong and consistent trend in findings that Dunkel deals
with relates to student attitudes toward the computer and the subject
matter. Dunkel points to the Florida Department of Education report
(1980) and a series of studies by Kulik and colleagues which both suggest
that students hold positive attitudes towards using computers. This is still
generally not in doubt, though teachers are finding and the focus groups
linked to the Case Studies for this thesis have confirmed that, even a quar-
ter of a century on, many students are wary of them, and even the more
IT-literate student may react negatively to the unnecessary or over-use of
computers for teaching purposes.
Even more interesting to the effectiveness debate is Dunkels inference
from Kulik and Kuliks finding that computers do not seem to have much
impact on students motivation to learn the subject matter even though
students may report that they like to use computers (1986: 13). This
phenomenon echoes Thorndike and Hagens halo error (1977, cited in
Chapelle and Jamieson 1991: 45), whereby students reporting of an experi-
ence may not accurately reflect their actual experience of it. Effectiveness
researchers need to be cautious in either wrongly designing student attitude
surveys or in misinterpreting the findings.
An effective model for evaluation will, therefore, need to provide a
qualitative indicator of student and staff reaction to the use of computers.
However, it will need to factor in the halo error and be able to distinguish
between attitudes to the computer and the effect of a computer-based
environment and learning programme on students attitudes to learning
the subject matter.
Dunkels third strong trend relates to the effect of computer use in spe-
cific content areas. Dunkels summary of previous findings places languages
in the top three subject areas benefiting from CAI alongside mathematics
and science (Fisher 1983: 13, and Roblyer et al. 1988). As for the language
learning skills that benefit most from computers she points to the Rob-
lyer et al. (1988) study that stated that: computer applications seem most
effective in the area of word analysis skills, such as phonics, followed by
higher level reading and language skills (Roblyer et al. 1988: 92, cited in
Dunkel 1991), and argue that while their own conclusions were based on
just four studies, these nevertheless replicated previous findings. A more
recent meta-analysis (Felix 2005c), which will be discussed in greater depth
later, suggests that little has changed in this regard. Our final model will
need to isolate the impacts of CALL on both individual language skills
and combined skills activities.
Dunkels fourth area is what she calls the types ofCAI (e.g. tutorials,
drill and practice, and simulations). Dunkel here draws together conclu-
sions from a number of different authors, whose findings have stood the
test of time (Burns and Bozeman 1981, Roblyer and King 1983, Samson,
Niemiec, Weinstein, and Walberg 1985, cited in Dunkel 1991: 14), namely
that: drill works better with lower level skills found at lower grade levels
while tutorials are required for higher level skills (Roblyer et al. 1988:
35). Also cited was the finding of Willis, Johnson and Dixon (1983) that
computer games and simulations were more attractive and interesting to
students than any other form of computer-based instruction. The latter also
pointed out the cost-efficiency of instructional simulations in that they
could bring the real world into the classroom, thus obviating expenditure
on trips abroad, and saw the cognitive benefits of simulations in nurturing
divergent thinking. Since then the most successful and abidingly popular
CALL products and activities have been heuristic packages such as the
tourist simulation game Granville (1980s), the murder mystery Who is
Oscar Lake (1996)?, or MOOs involving variants of simulations games
such as Dungeons and Dragons.
Courseware design and evaluation of courseware design can and
should still be informed by these findings. The fundamental dynamics
of the information gap, treasure hunt and the need to communicate to
discover stand at the heart ofthe best simulations and instructional course-
ware. Interactive web-enhanced instructional resources involving chat, file-
exchange, conferencing and peer feedback are more recent developments
in this genre. Felix gives examples of several fee-paying stand-alone courses
that are password protected, offering free trial materials open to anyone.
They range from one-person operations like Cyberitalian and Interdeutsch
to large organizations like GlobalEnglish that employ considerable staff
and offer a 24-hour attended chat site and other extensive services (Felix
2000a). This thesis reports on extensive trialling of another similar product,
TellMeMore, that started off as a networked or stand-alone CD-ROM
and has evolved into a sophisticated online tutoring resource incorporat-
ing simulated dialogues using speech recognition, sophisticated tracking,
104 Chapter 4
a significant fun element, flexible and customizable learning paths and an

email link to the tutor. The fact that these types of software appear to tick
up to five ofChapelles six principles (language learning potential, learner
fit, meaning focus, authenticity, and positive impact) might explain in
large part their impact.
The computer environments (CAI or Computer-Assisted Instruction,
CMI or Computer-Mediated Instruction, and CEI or Computer-Enhanced
Instruction) are Dunkels fifth strong trend in findings. She reports that
effectiveness research findings in each of these types of computer-based
instruction (CBI) differed depending on the educational context, age or
level of the students tested. The finding that two CEI studies Kulik et al.
(1986) involving simulations yielded the greatest average Effect Size (1.13)
is interesting in the light of Dunkels conclusions in the previous section.
Also it appears that certainly the more student-centred CEI approach
was of more benefit to higher level and older students, whereas the more
tutor-driven, drill-and-practice based CAI approach helped younger and
lower level students more.
CALL evaluators need to test these wider findings in the modern
CALL context and ensure that the environment or instructional context
is factored in as a variable. Such studies may well yield different results. For
example, younger learners familiar with more heuristic programs may be
much more amenable to learner-centred instruction, while older learners
may cope better with a tutor-driven approach.
The sixth area of interest to Dunkel is the uses of CAI (i.e., as a sup-
plement to, rather than as a replacement for, traditional methods). She
highlights research that found that CAI is more effective as a substitute
for, rather than a replacement for, teacher-based instruction, especially for
young learners. She makes the significant point that study ofthis issue has
decreased over time as researchers and users have come to reject the idea
that CAI or CALL might serve as the totality of instruction (p. 16), and
cites Kulik and Bangert-Drowns (19831984) who see computer-assisted
instruction as enhancement of, rather than a substitute for, traditional
or alternate methods of instruction. The Case Studies of this thesis will
investigate the effectiveness of different current multi-modal, or multi-
method, pedagogies that reflect this finding and the insights from CEI
to blend or customize multiple learning environments for a more flexible
approach to CALL.
Finally, Dunkel highlights the levels of student ability as her seventh

point where impact findings have been significant. Here the variable of
student ability had produced contradictory findings. Roblyer et al.s (1988)
study found no statistically significant differences between low achievers
and regular-achieving students, whereas Fishers 1983 study concluded
that CAI was best suited to single ability groups (whether high or low/
disadvantaged) rather than entire student populations. Dunkel argues
again that more research is needed to prove the relationship between CAI/
CALL and learning outcomes for differing ability levels. She also makes
the important point that CAI/CALL may be able to help under-achieving
studentscatch up in a non-threatening instructional environment (p. 17).
This finding was borne out in a 20032004 study of a cohort of first year
undergraduates ofFrench at the University ofUlster as part ofthe TOLD
Project, where the less able and less confident students found the CALL
environment less threatening for oral work, and made more progress over
a semester than the abler students in pronunciation and fluency (Toner et
al. 2007; see Chapter 6). CALL evaluation needs to control for ability as
well as learning style in order to determine the optimum approaches for
individuals and groupings of students.
Dunkel concludes her study with four seminal questions:
i. Is CALL more effective for certain L2 skills areas than others?

ii. Do certain levels of proficiency profit more from computer use than
others?
iii.What role does feedback play in the effectiveness of CALL
programs?
iv. What degree oflearner control is related to effective CALL designs?
(pp. 2526).
Such questions will of necessity relate to evaluations of platforms, programs

and pedagogy. Since 1991 various CALL researchers have, directly and indi-
rectly, addressed these questions. This thesis has taken, in particular, the first
two of Dunkels questions above, as basic research questions; in addition,
the same questions are integral to each of the study projects in the Case
Study chapters. In relation to the first of Dunkels questions, findings are
reported on regarding the effects ofCALL on oral skills, blended learning,
area studies and language study skills (or meta-skills). As regards Dunkels
106 Chapter 4
second question, findings are reported on in the Case Study chapters show-
ing how different levels of proficiency perform in a CALL environment.
Correlations were also looked for between learning style and learning gains,
and between experience with computers and learning gains.
As for Dunkels third and fourth questions, issues of feedback and
learner control will form an important element in particular in the Case
Studies looking at Platforms (Chapter 4) and the software program
TellMeMore (Chapter 5). Qualitative feedback gleaned from participat-
ing staff and students and their comments in evaluative questionnaires
and focus groups will be a significant part ofthe findings. The key factors
behind each ofDunkels four questions will also feature in our final evalu-
ative model (MFE2). Their value was underlined by our mapping against
Chapelle of the key Dunkel criteria gleaned from the above analyses and
meta-analyses.
When mapped against Chapelle (Table 4.11), Dunkels research agenda
for CALL, while lacking in the areas of authenticity and practicality,
otherwise overlaps reasonably well. Dunkels agenda throws up supple-
mentary headings similar to those shown by the mappings ofthe previous
chapter (Toner et al; Mehanna) and those that follow below. These are
language skills and combinations of skills, error correction and feedback
and learner control.
evaluation supplemen-
Language learning
tary to Chapelle
Positive impact
Meaning focus
Authenticity
Practicality
Learner fit
potential
Author
PPP
Descriptor
Does CALL save time?

Cost-efficiency; acceleration of
x
Dunkel (1991)
learning; reduction of teacher

Pedagogy
workload
Student attitudes towards
the computer and the subject
x
matter; halo & Hawthorne
effects; CHILL factor.*
The effect of computer use in

specific content areas (other
x x
subjects; 4+ skills); age;
gender; LS differentials?
The different impact of
different types of CAI
x x
(tutorials, drill and practice,
and simulations)
The different impact of
different types of computer
x x
environment (CAI, CMI, CEI,
WELL, MALL, podcasting)
The uses of CAI (i.e., as
a supplement to, versus a
x
replacement for, traditional
Dunkel (1991)
methods): early BL?

Pedagogy
The levels of student ability. Is

CAL/CALL best for remedial x
work and underachievers?
4+ skills
Is CALL more effective for and skills
certain L2 skills areas than x x combi-
others? nations
benefit
Do certain levels of proficiency
profit more from computer use x
than others?
What role does feedback play Error cor-

in the effectiveness of CALL x rection and
programs? feedback
What degree of learner control Provides

is related to effective CALL x learner
designs? control
Table 4.11 Mapping of Dunkels (1991) evaluative headings.

Mapped against Chapelles (2001) evaluative agenda for CALL tasks.
*CHILL factor = computer-hindered language learning (Ross 1991).
108 Chapter 4
Mapping exercise 7: Pedagogy (Pederson) vs Chapelle
In her synthesis ofCALL effectiveness research prior to the late 1980s (1988:
20121) Pederson draws on the insights of what she calls perhaps the most
ambitious CALL experimental endeavour to date, that ofRobinson et al.
(1985), for an evaluation of six pedagogical and four answer-judging (i.e.
feedback) hypotheses which, when tested over a nine-day period, in albeit a
junior U.S. high school Spanish class and not at HE level, revealed signifi-
cant out-performance by the experimental group (students who practised
with CALL the ten point criteria listed below). Such a strong finding, in
a field where strongly significant findings are very much the exception,
warrants closer scrutiny both for its findings and its research design. This
approach is an example of good experimental design practice in its atom-
istic rather than general approach, as Pederson states: the purpose of the
research was not to prove the effectiveness ofCALL in general, but to pro-
vide evidence ofhow the manipulation of certain CALL coding elements
may be particularly well suited to encouraging meaningful, communica-
tive, and maximally facilitative CALL (p. 120). The design used a classic
pre-test/post-test design that included also two tests of prior knowledge
to establish a benchmark or starting point for comparison, and thereby
allowed a clear isolation of learning gains to be made. Finally, rather than
being technology-driven the design insisted on a pedagogical rather than
a technological rationale for generating research questions and selecting
variables (p. 120). Both the control and the treatment groups were given
the identical materials as well as the same pre- and post-test in order to
isolate the one variable of CALL. The control group, however, practised
under the opposite conditions (p. 121), though it is not stipulated whether
these were non-CALL or alternative CALL conditions. The conclusions
from the study are a positive reinforcement for post-behavioural CALL
methodology, in that they showed that meaningful and discovery-oriented
CALL leads to more learning than CALL that is less communicative and
more directive (Pederson 1988: 121).
The six pedagogical hypotheses in the Robinson et al. study, echo-
ing Mehannas clusters in many ways, predicted improved achievement
as a result of the following types of materials presentation: integrated
context for discrete structural items; meaningful practice of structural

items; reference to people that students knew; use ofhumour and emotion
in order to involve the learner personally; student choice of general context;
and higher-level cognitive tasks (drawing inferences or problem solving)
(cited in Pederson, 1988: 121). As with Dunkel, several of these anticipate
Chapelles six principles of CALL evaluation seventeen years later.
Robinson et al.s four answer-judging (or feedback) hypotheses pre-
dicted improved learning gains for students who, when they answered
incorrectly, were given feedback that caused them to discover their error,
were provided assistance with a degree of personal control whether or
not to use it, were provided with implicit rather than explicit correction,
and finally were given the same items to practise again at spaced intervals
(p. 120). This study shows how a research design that is pedagogy- rather
than technology-driven, where the computer is servant to, rather than
master of, the pedagogy can yield results showing significantly improved
learning gains.
Some of these research design principles will inform the method
adopted in the Case Studies of this thesis, for example, the atomistic
approach above is reflected in the isolation of skills to be tested in the
TOLD (oral skills isolated), and BLINGUA (area studies comprehension
skills isolated) projects. Also the model will as often as possible require
an integrated context for discrete structural items and the meaningful
practice of structural items.
Anticipating Chapelle and Felix, Pederson states that future research
should investigate related differences in learning style, learning preference,
and aptitude as independent variables in addition to simply controlling for
them (p. 122, my italics). Since 1988 there have been a few attempts to
respond to this challenge to investigate learner differences, though the
majority have tackled this from purely a qualitative angle. In Felixs review
of current research (2004 and 2005a), of the 93 projects reviewed only
four look at the variables oflearner strategies/style, and ofthese only one,
Yeh and Lehmann (2001, cited in Felix 2005a: 24), tackles the approach
with a quantitative design. Their findings (N = 111) point in particular to
benefits for lower-ability students when given greater learner control (Yeh
and Lehmann 2001: 141).
110 Chapter 4
There is clearly a dearth of quantitative studies looking at learning

gains and configurations of approaches that include quantitative studies.
From what we have seen, careful thought and planning needs to go into
both the design of each evaluative study and the statistical tests carried out
to ensure variables are controlled for, extraneous variables are isolated and
subsequent inferences are both internally and externally valid. Evaluations
of the impacts of CALL, whether they be of discrete elements or of the
interplay between pedagogy and digital resource, need to be cut down to
manageable proportions but also fit into an overall research agenda that
moves the whole field forward incrementally. Felix would take up Dunkel
and Pedersons call for an awareness of previous findings and gaps and for
rigour in future studies and argue that, while effectiveness researchers need
to follow a common agenda, there is a clear need to tailor research design
constructs to the particular study (2005a):
Because there is such a large scope for research in this area, there cannot be a single
best design model. What is imperative, though, is that researchers match the design
to the research questions, the context in which the study takes place, the time frame
available, the variables under investigation, their capacity of statistical analyses and
their ability to control for confounding elements. (p. 12).
When mapped against Chapelles six principles (see Table 4.12) Pedersons
agenda, drawing primarily on Robinson et al.s criteria (i.e. those peda-
gogical hypotheses that predicted improved achievement (1988: 120)),
scores most strongly on issues oflanguage learning potential and learner
fit, provides some coverage of meaning focus (cf. meaningful practice of
structural items) and positive impact (cf. use of humour) but makes no
overt reference to authenticity and practicality. Similar to those shown by
the mapping ofDunkels agenda and the mappings ofthe previous chapter
(Toner et al. and Clarke) several of Pedersons criteria map well with our
new criteria error correction and feedback and learner control. And the
hitherto unmentioned factor teacher style has been added to our list. Some
ofthe other mapped agendas and methodologies below also highlight these
and other extra factors that may influence the quality ofthe teaching and
learning. MFE1 and MFE2 will incorporate these additional factors.
evaluation supplementary
Language learning
Positive impact
Meaning focus
Authenticity
to Chapelle
Practicality
Learner fit
potential
Author
PPP
Descriptor
Integrated context for

x
discrete structural items
Meaningful practice of
x x
structural items
Reference to people that
x
students knew
Use of humour and emotion

Teacher style
in order to involve the x
factor
Robinson et al (1984) in Pederson (1988)
learner personally
Student choice of general Provides

x
context learner control
Higher-level cognitive tasks

Pedagogy
(drawing inferences or x x
problem solving)
Give feedback that causes Error

students to discover their x correction and
error feedback
Provide assistance with a

Provides
degree of personal control x
learner control
whether or not to use it
Error
Provide with implicit rather
x x correction and
than explicit correction
feedback
Give the same items to

practice again at spaced x x
intervals
Table 4.12 Mapping of Robinson et al.s (1984) evaluative headings.

Mapped against Chapelles (2001) evaluative agenda for CALL tasks.
112 Chapter 4
Mapping exercise 8: Pedagogy (Mehanna) vs Chapelle
Lining up Mehannas meta-analysis of e-learning pedagogies (discussed

in Chapter 2) with Chapelles criteria reveals a broad overlap, particularly
with the first three criteria as they focus on learning and meaning. This
overlap suggests a strong transferability of general e-learning good practice
to CALL task design (see Table 4.13). There are, however, two significant
areas where Mehannas framework does not overlap with Chapelle: authen-
ticity and practicality.
Language learning
supplementary to
Positive impact
Meaning focus
Authenticity
Principles of
Practicality
Learner fit
evaluation
potential
Chapelle
Author
PPP
Descriptor
Identifying similarities and

x
differences between items
Summarizing and note-
taking (which include also
as subcategories: filling
x x
missing parts and translation
of information into a
synthesized form
Recognizing student
Error
effort leading to improved
x x correction
Mehanna (2004)
engagement in cognitive
and feedback
processes
Pedagogy
Homework and practice x

Non-linguistic
representations
x x
(graphs, charts, maps, mind
maps)
Cooperative learning
(comprising: positive
interdependence, face-to-
face promotive interaction, Collaborative
x x x
individual and group CALL
accountability, interpersonal
and small group skills, and
group processing)
Generating and testing

hypotheses involving the x x
application of knowledge
Setting objectives and Error
providing feedback (meta- x correction
cognitive thinking) and feedback
Activating prior knowledge
by use of cues, questions, x x
Mehanna (2004)
brainstorming, etc.
Pedagogy
Error
The self-system processing of
x x correction
presenting tasks
and feedback
The use of task-related
x
knowledge
The cognitive processing of
x
tasks
Provides
learner
The meta-cognitive
control;
processing of tasks
4+ skills
combinations
Table 4.13 A mapping of Mehannas pedagogical clusters.

Mapped against Chapelles six principles for CALL evaluation.
One might conclude from this that authenticity is more important in

SLA/CALL than across a wide spectrum of disciplines, or, at least those
academic disciplines that featured in the Mehanna e-learning meta-analysis
(one should note that Mehanna does not specify which academic disciplines
featured in her meta-analysis, or whether language learning featured at all).
As regards practicality in Chapelles definition ofthe term (i.e. adequacy
of the resources to support the use of the CALL activity (2004: 55)) this
is clearly not a matter of pedagogy as of resourcing and as such one would
not expect it to appear in Mehannas list, though, of course, it is an impor-
tant part of CALL evaluation.
114 Chapter 4
The extreme right-hand column ofTable 4.13 reveals three ofMehan-

nas clusters that Chapelles framework for evaluation does not cover, and
that resonate with other evaluative agendas in the relevant literature. First,
recognizing student effort leading to improved engagement in cognitive
processes, which broadly comes under our suggested new, evaluative cri-
terion error correction and feedback, does not feature in Chapelle. It is,
nevertheless, an important area of general pedagogy, SLA and CALL. Like-
wise, Mehannas cooperative learning criterion will feature as our evalu-
ative principle collaborative CALL. Thirdly, Mehannas ninth beneficial
e-learning cluster the meta-cognitive processing oftasks resonates in
part with two of the new evaluative principles in our model: the provi-
sion oflearner control (i.e. autonomous or self-directed learning in which
students develop their reflective learning skills) and training in all the
language skills and combinations of skills (which includes meta-cognitive
language learning or study skills as well as the more obvious skills oflisten-
ing, reading, writing, speaking, vocabulary, and grammar skills). Looking
beyond CALL and more general language learning pedagogy to other
language disciplines in search of evaluative criteria that might help with
the evaluation ofCALL seems well justified by the Mehanna meta-analysis
of e-learning pedagogies. Her notion of a composite system that blends
pedagogies and theories differs from Chapelles single theory approach, and
in many ways suggested the possible value of a research-informed adoption
ofblended CALL. Likewise her mixed methods multi-case study research
reinforced the idea of an evaluative model that employed a configuration of
diagnostic and analytical methods in the pursuit of a practical and empiri-
cal evaluative methodology for CALL. The next chapter sets out how such
a system might work.
Chapter 5
A model for evaluating CALL

Part 2: Qualitative and quantitative measures
A blended agenda for empirical effectiveness research
A decade earlier than her seminal work in providing a theory-driven jus-

tification for using SLA as the foundational theory for teaching, test-
ing and research in CALL (2001), Chapelle had, together with Jamieson
already set out clear guidelines for improved internal and external validity
in research on CALL effectiveness (1991). While her 2001 work was to
focus particularly on qualitative, judgmental studies ofCALL task appro-
priateness, this earlier article focused on improving the rigour in empirical
studies that set out to measure the impact ofCALL on learning outcomes,
and employ statistical techniques to that end. Her findings in this regard
have also contributed an important element in the metric route of our
method. Two other authors, Pederson and, more recently, Felix have also
contributed to the establishing of a clear agenda for CALL effectiveness
research. The relevant contribution of these three authors will now be
considered and integrated into the evaluative approach, as well as applied
in the Case Studies.
Common themes running through the work of these three authors,
whose work spans the last three decades ofCALL, is the demand for sound
construct validity in research design, strong internal and external validity,
rigour in the isolating, controlling and analysis of variables, and veracity
and sobriety in the reporting of findings.
Some ofPedersons guidance for improved effectiveness research has
already, in the previous chapter, been mapped against Chapelles six criteria
to inform the generation of six new CALL Enhancement Criteria. Those
116 Chapter 5
priorities that relate to data collection will be included in the mapping table
below (Table 5.1). The concluding remarks to her chapter in Smith (1988:
126127) serve as a clarion call for our enquiry: an increased interest in
disciplined, dispassionate research that attempts patiently and carefully to
add to what is already known about how students learn languages is the
best assurance that CALL, unlike the language lab of the 1960s, will be
used intelligently.
Likewise, Chapelles call for strong internal and external validity were
documented in Chapter 3. Her guidelines for strong internal and exter-
nal validity will form the basis of both Felixs guidelines below and the
data collection validity checklist (see Tables 5.1 and 5.4). To reinforce her
message one could add her comment in her concluding remarks in her
chapter in Dunkel (1991): Because perfect worlds in which to carry out
research do not exist and because the environment of each research study
has unique elements that constrain the validity of the investigation, it is
the responsibility of the researcher to identify and pinpoint threats to a
studys [internal] validity (p. 54).
Felix has been interested in good practice in effectiveness research
since her doctoral thesis on Suggestopedia (completed in 1989), and in
CALL evaluation since her 1993 article: Marking: a pain in the neck
the computer to the rescue. In 2000 she was advising caution regarding
unreasonable claims and contradictory findings regarding the effective-
ness of CALL and trying to raise awareness as to the complex nature of
the variables involved:
research into the efficacy of computer-assisted learning has produced very equivo-
cal results (Dunkel 1991), and it is easy to list problems attached to such research
(Chapelle 1997). Judgments in the area vary widely. At one end are positive reports
from the authors of several large meta-analyses as exemplified in the computer
did its job quickly on average in about two-thirds the time required by conven-
tional teaching methods (Kulik et al. 1980: 538), and the newer technologies show
promise to be able to provide feedback in multiple modes, such as listening and
reading (Basena and Jamieson 1996: 19), although they did caution that the results
are difficult to interpret, and the designs and measures do not lend themselves to
reproduction or generalizability (p. 19). At the other end are dismissive (and in the
quoted case unsubstantiated) comments such as: Study after study seems to con
A model for evaluating CALL Part 2: Qualitative and quantitative measures 117
firm that computer-based instruction reduces performance levels and that habitual
Internet use induces depression (Noble 1998: 2). Given this variation, it is fairly clear
that general conclusions about the effectiveness of CALL cannot be formulated
without qualification nor relied upon uncritically. What is more, the problem is
going to intensify: as programs become more sophisticated, variables to be investi-
gated become more wide-ranging and conclusions on meta-analyses more difficult.
(Felix 2000b: 50)
Since 2000 ongoing syntheses ofCALL research by Felix (2004 and 2005a)
and Hubbard (2004 and 2005) reveal that these lessons are still not being
learned. Hubbards review of over ninety research articles found that a high
percentage ofCALL research involves research subjects, whether students
or teachers, who are novices to CALL; they are also novices to the task or
application under study, are often studied exclusively during their initial
experience. Additionally, the studies may be very short, representing a
single event, such as a class or lab session. Also, surveys and questionnaires
are used in place of more objective measures, such as tracking or testing
(Hubbard 2004: 165; 2005: 358). Hubbard also adds: with better studies
of trained and experienced learners, we may find CALL is more effective
than is currently believed (Hubbard 2004: 165 (online)). This comment
reflects an awareness of the unconvincing findings of much CALL effec-
tiveness research and a confidence that improved outcomes are more likely
with improved research designs and students that are more familiar with
CALL. Familiarity suggests long-term exposure and longitudinal studies
to monitor this. These would help to eliminate skewing factors such as the
halo and Hawthorne and poor learning outcomes due to teachers and
students wasting learning time coming to grips with new technology.
In her article Analysing recent CALL effectiveness research Towards
a common agenda (2005a), based on a meta-analysis of more recent CALL
research (i.e. between 2000 and 2004), Felix also points out the frequent
shortcomings of research constructs, listing common problems that still
occur with effectiveness research: misleading titles, poor description of
the research design, failure to investigate previous research, poor choice
of variables to be investigated, and overambitious reporting of results
(2005a: 10).
118 Chapter 5
In this paper she outlines common problems afflicting effectiveness

research. From these one can discern the following nine principles for a
common agenda that link well with Dunkel, Pederson and Chapelle:
Firstly, extraneous or confounding variables must be checked to ensure

that there is a trustworthy attribution of cause to its effect. However, as
Felix puts it: if we managed to control for every possible confounding
variable in an experimental design we would be left with the technol-
ogy itself as the independent variable (2005a: 2).
Secondly, subjects should be randomly selected.
Thirdly, instruments of measurement for learning outcomes and atti-
tudes need to be reliable.
Fourthly, account must be taken ofthe reactions of students and staff
which may skew the data and create a false impression ofthe impacts
of a treatment.
Next, researchers ought to provide a full and dispassionate reporting of
data that gives a balanced picture ofthe insignificant and negative results
as well as merely the significant findings, and avoid over-ambition.
Titles of projects need to accurately reflect the reality, scope and sig-
nificance of that project.
The seventh principle is that a clear description of a sound research
design is required that distinguishes between experimental and non-
experimental studies, that ensures strong internal and external validity
and clearly informs the reader ofthe variables involved and the nature
of the sample.
Previous relevant research needs to be investigated and the implica-
tions of the findings for the study reported on.
Felixs ninth principle is that the choice of variables must be sound,
manageable and extraneous variables properly controlled for (Felix
2005a).
Several ofthese points featured earlier in Felixs (2000b) article and were,
therefore, available to inform the method used in the Case Studies. They
will also feature in MFE2 in Chapter 9. In concluding this section, it is
interesting to note some ofthe comments Felix made in her last conference
paper prior to retirement from academia. In her keynote address given at

the EUROCALL 2007 conference held at the University ofUlster (2007),
she highlighted the findings ofher meta-analysis (2005a) and reiterated her
call for a common agenda for effectiveness research in CALL. She pointed
out the large body of CALL research now showing the positive impact
of CALL on the quality of learners written accuracy, spelling (though
with the proviso that this may well be partly a function of increased use
of spell-checkers), and written fluency, improved attitudes to CALL, to
language learning and perceptions ofthe learning experience. She stressed
mixed results for reading comprehension and grammar. In concluding,
she called for future studies with a higher quality of experimental and
non-experimental design looking at under-explored areas such as speaking
online, and for research with a narrower focus, for example, syntheses of
sub-categories oflanguage learning skills such as the role of collaboration
in e-learning, meta-cognitive skills, online presence and identity. Above
all she emphasized the need for rigour and strong internal and external
validity and the importance of addressing the nine key principles outlined
above. This project aims to harmonize Felixs research agenda with previous
CALL authors, test them in a variety of third level settings, and take the
next steps towards an iterative and systematized approach to filling in the
pieces of the CALL evidence jigsaw. To what extent do the Case Studies
of this project comply with this agenda?
The Case Studies mapped against the agendas of Pederson and Felix
Looking, then, initially at evaluative methodology, one can see from the
table below that the four case studies managed to observe the majority of
measures recommended as best practice by esteemed CALL effectiveness
researchers. Leakey and Pedersons quantitative and qualitative measures
overlapped in 88% of cases and Leakey and Felixs (2005a) overlapped in
78% of cases. This percentage was arrived at by counting all those boxes
where there was either a Yes entered or else compliance was observed
(respective to Pederson (= 21/24 boxes) and Felix (= 28/36 boxes)). The
entries marked with a No or an Uncertain were deducted from the total
120 Chapter 5
number ofboxes. There were three main areas of non-adherence to princi-

ples: identical materials for pre- and post-test, subjects should be randomly
selected and choice of variables must be sound, manageable and extrane-
ous variables must be properly controlled for. In each case only one out of
four of the case studies observed the requirement fully.
Of course, a further, more detailed evaluation of the Case Studies is
necessary than that provided above. A Yes/No response does not convey
the full facts of adherence to a principle, which would be better ascertained
by more open-ended answers. The table is useful first and foremost as a
checklist; as yet, though, it provides no conceptual matrix for evaluating
whether and to what extent programs, pedagogy and platforms comply
with any given theory of language learning, nor is it merged with our
twelve principles for the judging of task quality. We are ultimately look-
ing for a model that will incorporate the Pederson and Felix effectiveness
research principles into a larger framework that will enable researchers,
teachers, courseware or platform designers to identify the type of pedagogi-
cal approach and phase used in a given teaching cycle, list any evidence of
good practice, grade for degree of complicity, and then give a judgmental
score for effectiveness ofCALL pedagogical planning, courseware design
or platform construction. This judgmental score will then have to be con-
figured with any quantitative data assessing the impact on learners of a
given platform, program or pedagogy (or combination of these). Impact
data can include learning gains or outcomes over the period of the study
(usually quantitative data), student/staff reaction to process and treatment
(qualitative data which can be collated either quantitatively or qualitatively)
and process data (based on a configuring of repeated diagnostic measure-
ments and cross-sectional and longitudinal observations).
Model for Evaluation 1 (Leakey 20032006)

TellMeMore TOLD BLINGUA
Other Quantitative and SOTON-UU
(2004 (2003 (2004
agendas Qualitative measures (20052006)
2007) 2004) 2006)
Pre-test/post-test Yes Yes Yes Yes
Tests of prior
knowledge used
as benchmark and Yes Yes Yes Yes
isolation of learning
gains
Pedagogy rather than
technology-driven in
generation of research Yes Yes Yes Yes
questions and selection
Pederson (1988)
of variables
CAT
drawn
Identical materials for
from same Yes Yes Yes
pre- and post-test
bank of
questions
Identical materials
for treatment and No Yes No No
comparison groups
Main control variable
for treatment and
control groups was
Yes Yes Yes Yes
opposite conditions
(i.e. CALL v NON-
CALL)
Extraneous or
confounding variables Yes Yes Yes Yes
must be controlled for
Felix (2004 & 2005)
No No vol-
Subjects should be No Yes whole
course unteer
randomly selected volunteers cohort
groups groups
Instruments of Yes CAT
LO test not
measurement for based on Yes 5
a measure of
learning outcomes and graded different Uncertain
meta-skills as
attitudes need to be database of tests
such
reliable questions
122 Chapter 5
Titles of projects need

to accurately reflect
the reality, scope and Yes Yes Yes Yes
significance of that
project
Account must be
taken of the reactions Survey +
Survey of Focus Focus group
of students and staff log (incl.
students group (incl. staff )
which may skew the staff )
data
Full and dispassionate
Felix (2004 & 2005)
reporting of data that Yes Yes Yes Yes

gives a balanced picture
Sound research design
that distinguishes Yes
Yes non-
between experimental Yes non- non-exp. Yes non-
exp. but
and non-experimental exp. small but exp. larger
small
studies, and strong full data set small sample
sample
internal and external sample
validity
Previous relevant
research needs to be Yes Yes Yes Yes
investigated
Choice of variables
must be sound,
manageable and Uncertain Yes Uncertain Uncertain
extraneous variables
properly controlled for
Table 5.1 Checklist to enable the mapping of quantitative and qualitative measures.
From Pederson and Felix to a Case Study or Research Project.
A suggested sequence for CALL effectiveness research studies
The evaluation diamond for CALL effectiveness research (see Figure 5.1)
gives a graphic overview ofthe options to consider in the process of design-
ing an evaluative study of impact or student learning gains. This suggests a
timeline, or sequence, of optional metric steps combining qualitative and

quantitative measures. It is vital to ensure that a clear base line or start-
ing point is established and that there is a clear understanding of those
variables (such as prior knowledge and language learning experience) that
might have a bearing on outcomes, and which may be extraneous to the
study. It is also important to have a diagnostic survey(s) and pre-test(s)
that will capture all the data necessary for gauging changes over the full
length of the study, as any additions to these in mid-term or final surveys
or post-test(s) will not be usable for the purposes of comparison if these
contains new items that do not feature in the initial test(s) or survey(s).
This is a prototype for MFE1 ofthe fuller version that appears in the final
chapter. The more comprehensive version for MFE2 will give a fuller list
of variables and research designs possible and will also indicate some ofthe
key statistical tests available for qualitative and quantitative data (such as
cross-tabulations, t-tests, correlations, and analysis of variance).
The Evaluation Flowchart (Figure 4.1) in Chapter 4 suggests a proce-
dural sequence through the Qualitative and Quantitative Measures outlined
in the tables below. It is suggested that the Research Criteria Questionnaire
(Table 5.2) be visited twice once prior to the Data Collection (Table 5.3),
Validity (Table 5.4) checklists, and Learning Gain Tests, where the ques-
tions will appear in the future tense and then, retrospectively, after the
test, where the questions will, of course, be in the past tense. Ideally the
researcher should be aware of all of the guidance above before commenc-
ing an effectiveness research study. Some of the tables below, which were
designed for MFE1, have been amended and enlarged upon for MFE2
in the final chapter. For example, the Data Collection table and Validity
checklists have been turned into tables (Tables 9.17 and 9.18) that will enable
judgmental scoring (by means ofLikert Scales) ofthe degree of adherence,
during the Learning Gain Test(s), to the listed criteria.
124 Chapter 5
Figure 5.1 Evaluation diamond for CALL effectiveness research (MFE1).
Tables 5.2, 5.3 and 5.4 are a checklist of good practice for effectiveness
research applicable when conducting ones own empirical studies or evalu-
ating those carried out by others.
Table 5.3 shows the proto-typical (MFE1) version ofthe checklist for
data collection methods (both qualitative and quantitative) used in the
Program and Pedagogy Case Studies.
From the experience ofthe Case Studies this has evolved into a longer,
more comprehensive version. This is shown in Chapter 9 (Table 9.17) and
contains twenty-one as opposed to the eleven data collection methods
of Table 5.3, and includes diagnosis of staff reactions as well as students.
Nevertheless, researchers must be mindful ofMurrays warnings about the
potentially intrusive nature of multiple-method data-gathering, and so it
must be stated that use of all the given intervention points in one study is
not recommended. A study, and in particular students enthusiasm, can
be spoiled by excessive monitoring.
Research Design Criteria

Criteria questionnaire Details
What was your Sample Size at start of study (N = ?)
Sampling
What was your Sample Size at end of study (N = ?)

What was your complete number of Full Data Sets? (N = ?)
What Pedagogical approach or Teaching Methodology was adopted?
PPP
What Platform Technology was used (software, hardware or hybrid)?

What Program Technology was used?
What is the Educational Setting (primary, secondary, tertiary, adult)?
Over what Period of Time did the study take place?
What is the Research Construct (Experimental, Quasi-experimental,
Non-experimental, Pre-experimental) of your study?
What Research Design (combination of qualitative and quantitative
data collection methods) is being employed? E.g. between-subjects
time-series study with focus groups; or within-subjects, cross-sectional
with surveys.
Is the study a Between-Subjects or Within-Subjects design?
Are the Instructors across the groups the same person /different
Conditions of the Study
people?
Are the Activities across the groups: identical, near-identical, different?
Is there a Treatment group and a Control or Comparison group?
Are the Pre- and Post-tests identical, near-identical or different?
What Language(s) are being studied?
What Language Skill/Combination of language skills is under
analysis?
What Variable(s) are being analysed
Is the Allocation of Subjects to groups random or selective
If Random allocation, how was this achieved?
If Selective, what criteria and methods were use to select subjects
What methods for Controlling for and isolating of variables were
adopted?
Is the Scoring carried out by an independent scorer?
126 Chapter 5
Is the wording of your Null Hypothesis and your Alternative

Hypothesis appropriate? Have these been recorded in your reporting?
What instrument(s) were used for the Comparison of Means?
Quantitative instruments
Parametric or non-parametric?
What instrument(s) were used to measure Correlation? Parametric or
non-parametric?
What instrument(s) were used to measure Variance? Parametric or
non-parametric?
What instrument(s) were used to measure Covariance? Parametric or
non-parametric?
Was an Effect Size equivalent given where relevant?
What degree ofConfidence was established at the outset? (99% or 95%)
Table 5.2 Research Design Criteria checklist for MFE1.
Element
present?
Leakey Data collection method
Yes/No
Diagnostic survey of prior learning

Qualitative/ judgmental data
Diagnostic survey of learning style

Post-treatment survey of student reaction
CALL learning measures
Post-treatment student focus group

Post-treatment survey of staff reaction
Post-treatment staff focus group
Electronic/paper log/journal of student reaction
Test(s) of prior learning
empirical data
Quantitative/
Pre-test
Progress test (mid-treatment)
Post-test (identical to pre-+ progress test)
Table 5.3 Proto-typical (MFE1) version of the checklist for data collection methods.
Issues of internal and external validity are also crucial for the robustness of
any research project where data (be they qualitative or quantitative) are being
gathered for reporting to a wider readership. For MFE1, we pooled validity
assessment criteria from the literature, in particular Chapelle (1991) and Felix
(2000b) to enable us to develop a sound research design (Table 5.4).
Adapted from Element

Chapelle addressed?
Validity checklist
(1991) and
Yes/No/
Felix (2000b)
Detail
Is this an experimental (variables can be controlled/
manipulated) or quasi-experimental (variables cannot
be controlled/manipulated) study?
Have the students been randomly assigned to the
treatment and comparison groups?
Have the respondents been isolated from each other?
Are the results attributable to the factor(s) studied?
Internal validity
What other factors (variables) might have contributed

to the effect?
How will you control for extraneous variables (such as
learner/ teacher differences, variable settings, time of
day/week/year)?
Validity checklist
How certain are you the learners are not getting

language instruction apart from through this study?
Does the student reporting accurately reflect what
happened?
Are the different variables (independent/control/
dependent) clearly identified and reported?
Generalizable sample N > 30 use parametric tests
Sample less easily generalizable N < 30 use non-
parametric tests
External validity
To what extent can the results be generalized to other

populations, settings and experimental situations?
How relevant are they elsewhere?
Does the report describe the skills tested?
Does the report describe the characteristics of the
subjects? (i.e. age; gender; ability/year group; cohort/
course
Does the report describe the CALL materials used?
Table 5.4 Validity assessment criteria for MFE1 drawing from Chapelle (1991)
and Felix (2000b).
128 Chapter 5
Clearly some questions in the checklist are easier to address than others; all
will require careful thought and planning prior to the start ofthe project.
Some, such as random assignment of respondents and the amount oflan-
guage instruction being received outside of the study may be depend-
ent on institutional and timetabling arrangements and may well require
adjustments at this level. Others will require knowledge of the context of
the project in the wider field of research. Many will be determined at the
reporting stage. In the Case Studies not all the above criteria were met on
every occasion and the next chapters will report on the rigour of each Case
Studys construct. What follows is a summary of the main data collection
techniques used for MFE1 in the Case Studies.
Quantitative and qualitative data (MFE1)
Our review ofthe CALL and statistical literature has argued that the rich-
est data can best be gained by combining data types. Given the complexi-
ties involved in such multiple learning environments and permutations
as CALL can throw up, it would be impossible to come up with a single
optimal experimental design model to suit all requirements. More impor-
tant is an understanding ofthe different instruments and research designs
possible and the ability to match them to the setting. As Felix puts it:
Because there is such a large scope for research in this area, there cannot be a single
best design model. What is imperative, though, is that researchers match the design
to the research questions, the context in which the study takes place, the time-frame
available, the variables under investigation, their capacity of statistical analyses and
their ability to control for confounding elements. A short-term fully controlled
experimental design, for instance would be suitable to measure individual well-defined
outcome effects (), while a longer-term non-experimental study using qualitative
measures such as observational procedures and think-aloud protocols would yield
important data related to effects on learning processes. A combination of various
data collection methods within one single study will help in strengthening confidence
levels about results. (2004: 124; and 2005a: 12).
What follows is an outline of how various methods of statistical analysis

were used in the Case Studies, first to present and describe the data, be
they quantitative or qualitative, and then to make valid inferences from
such data.
Quantitative data 1: Descriptive statistics (MFE1)
Ideally, descriptive data should be displayed first, prior to any inferential

data, in tabulated or graphical format giving measures of central tendency
(i.e. mean, median and mode). Columns indicating the sample size, range
of the data, minimum and maximum scores, standard deviations and an
indication of the significance of the variances between compared groups
should be shown and explained. Such information is vital for the conveying
ofthe broad scope ofthe study and its statistical parameters, and a trained
eye will start to make interpretations from just these figures.
Quantitative data 2: Inferential statistics (MFE1)
Inferential statistics should form the main body of the statistical analysis.
The aim should be to test a pre-stated hypothesis by means of a variety
of statistical analyses in order to show whether there are any significant
relationships between compared data. A typical CALL-related hypothesis
might be that exposure to technology in language development makes no
difference to student progress (called the null hypothesis). We can accept
the alternative hypothesis (i.e. that exposure to technology makes a differ-
ence to progress in language development) ifthe significance value in our
compared means tests for pre- and post-test scores across the two groups
is less than or equal to 0.05 (i.e. is at a 95% or higher level of confidence).
Significance, or the level of probability (i.e. the Sig. or p value) that the
results are due to chance in a comparison of means, is shown as a value
between 0 and 1. The nearer to 0 a significance in the comparison of means
is, the more unlikely it is that the results are due to chance.
130 Chapter 5
The analyses we might be interested in are the following. We will

want to compare the students pre- and post-test total scores for a gauge of
progress over a semester, year or course. Also we will wish to know whether
progress has been made by both/all cohorts, be it/they a control or treat-
ment group. We will then want to compare pre- and post-test scores by the
independent variable of which teaching group they were in, to see if signifi-
cantly more or less progress was made by one or other. The comparison could
also be made using the independent variable of gender, comparing progress
within and across the groups. The above comparisons will then need to be
carried out across language learning tasks or skills to see if more progress is
made in certain skills than others and whether there were any reasonable
assumptions possible as to the causes for this. Finally, we will need to assess
whether factors, other than cohort group and treatment, are influencing
students progress. To determine this we might wish to look at testing for
other variables (attendance, prior learning, experience with CALL/ICT
and learning style) and run statistical tests to see ifthese variables stood in
significant (positive or negative) correlation with progress.
To compare the means in pre- and post-tests to see if there has been
any general improvement over a period of study for a cohort of students
(control and treatment groups together) or, secondly, to compare the means
of two or more independent samples (groups of individuals), in our case
the treatment group and the control group, to see if one group has made
significantly more progress than the other, might require that an inde-
pendent samples t-test (for parametric samples) be carried out, and its
non-parametric equivalent (Mann-Whitney Test). We conducted both
types since, for most of our samples, normality could not be assumed due
to their small size (N < 30). Normality can only be assumed for sample
sizes of thirty subjects or more.
To test the differences of means from two sets of observations from
the same group of individuals a repeated measures or paired samples t-test
(with its non-parametric equivalent, the Wilcoxon Matched Pairs test) was
regularly carried out. Here we were mainly interested in comparing one
groups performance under one set of conditions (e.g. the pre-test scores,
or by task and skill) with their performance under another set of condi-
tions (i.e. the post-test after a period of time has elapsed, again looking at
overall totals and separate tasks and skills). For the paired samples t-tests
a measure of the means for the same group of individuals was, typically,
repeated for dependent (or outcome) variable A (i.e. a pre-test score) and
dependent (or outcome) variable B (i.e. a post-test score).
Additionally we tested for degrees of relationship or correlation
between variables such as: attendance, language learning experience, ICT-
use, learning style and learning outcome. It is worth noting that correlation
does not imply causation. As with any correlation, there could be a third
variable which explains the association between the variables we measure.
So in the case of the TOLD project even if we showed that there was a
strong positive correlation, say, between ICT use score and progress in the
treatment group, a third variable such as positive exposure to something
new may be playing a significant role, especially in the first weeks of expe-
riencing a new multi-media lab.
Qualitative data (MFE1)
As already mentioned in Chapter 1 (section on Research Questions and

Methodology) qualitative data from the Case Studies was collected in
a variety of ways such as survey/questionnaire, observation, journals or
web-logs and focus groups. Other ways of collecting qualitative data, some
of which featured in our studies, are video observation, audio recording
of classes or interviews, participant audio diaries (transcribed), teacher
diaries, participant and non-participant observer logs. These are all ways
of observing learning processes and gauging affective responses by the
subjects to the treatment.
In addition to identifying the independent variables already mentioned
(learning style, prior language learning and experience with ICT) the
qualitative data sought focused on subject reaction to both the treatment
and the technology and concentrated on issues of motivation, positive
or negative impact of the measures and the resources, causes of frustra-
tion or satisfaction, and suggestions for possible improvements. These
data were collated, and where possible quantified (e.g. via Likert scales)
and then configured with interesting quantitative findings to see whether
there were any salient connections or disparities between the two. Some
of the configuring process was conducted by a mix of statistical tests (e.g.
132 Chapter 5
Pearson rho correlation between learning style, prior language learning

and ICT-use) and judgmental inference that sought to make connections
between non-quantifiable data (such as subjective student comments in a
focus-group) and quantitative findings.
Reporting the findings (MFE1)
It is essential that reporting offindings be honest, dispassionate and full. A

poor description, say, ofthe research design, with, for example, insufficient
information about the subjects gender, prior learning, language ability or
learner differences, will limit the external validity of a study. Too many
variables not properly isolated may lead to misleading inferences. On the
other hand, the validity of even a small-scale study containing a number of
variables may be improved with a well-constructed design and thorough
reporting. Felix gives the example of Vincents study (2001) which she
describes as:
an excellent example of what can be done to increase validity in a study with a very
limited number of subjects and with so much scope for outcomes having been pro-
duced by elements other than the treatment. Procedures are described in great detail.
Participating children were selected by rigorous selection criteria including scores
from recognized (and referenced) visual, verbal and spatial tests, interviews with
children and some parents and a log of classroom observations. (Felix 2005a: 15)
With a prototype for evaluating CALL now assembled it is time to look

at how some of its principles stood up to testing in a higher education
context. In the next chapter (Chapter 6) elements ofthe model are applied
in a judgmental evaluation of digital platforms, in particular Robotels
SmartClass (as used in the University of Ulster and a number of institu-
tions worldwide) and the Melissi Digital Classroom (as used in particular
by the University ofPortsmouth as well as a number of other institutions
in the UK and worldwide).
Chapter 6
Case Study 1: Evaluating digital platforms
Introduction and definitions
By definition, a platform is foundational, a launch-pad for software; one

might also see it as a skeleton, with software as the muscle. Without plat-
form or software, CALL pedagogy is homeless. For this reason digital
platforms are the first of the three Ps that will be looked at, programs
the second P and pedagogy the third. Each P has its own separate iden-
tity and may be evaluated separately, and usually is. Like modern medi-
cine which, since the discrediting of diagnosis of illness by means of the
humours, has tended to look at the individual parts of the body for the
causes of disease rather than the whole system, effectiveness research has
tended to be atomistic rather than holistic. Such measurement is easier,
having fewer variables to deal with and being, therefore, a more precise
science. This narrowly-focused approach may, however, sometimes miss
the wood for the trees.
The Case Studies look at the three Ps and their sub-elements as sepa-
rate entities, then step back and look at the whole. Their overarching goal
is to see how the one P relates to and interacts with the others and see if
there is a synergy at work. Platforms, like programs and pedagogies, can
differ vastly from one to the next, are designed to meet functional and
human needs, and have the capacity to motivate, surprise, and infuriate.
They can enhance, or detract from, the performance of the software and
the effectiveness ofteaching and learning. But do digital platforms submit
to a general definition?
In light of the differing technologies involved, and the varied role
played by hardware and software in these systems, an all-encompassing
134 Chapter 6
definition is not obvious. Outside of the world of education, probably

the first association made, when people think of digital platforms, is with
home entertainment and broadcast satellite systems. To secure our defini-
tion within education we might define a digital platform as an integrated
PC-based hardware-only, software-only or hybrid system, that enables the
storage and delivery of multimedia learning, via cable or wireless technol-
ogy, to multiple users, either within a networked laboratory or a virtual,
web-based, environment.
Digital platforms are, nowadays, as much software solutions as hard-
ware with several traditionally well-known manufacturers of language
laboratories more recently diversifying into software-controlled labora-
tories and online interfaces. The knowledge, experience and data gleaned
from three different sources will be drawn upon to develop the capacity of
the MFE to evaluate platforms. These sources are: CAL/CALL literature
and studies linked to digital platform evaluation; secondly, the research
projects (TOLD and BLINGUA) linked to the pedagogy Case Study;
and thirdly, recent studies looking at digital platforms as used in Higher
Education institutions in the UK.
The platform Case Study that then follows looks at the use of the
Robotel and Melissi labs in some UK universities, as well as making a
general comparison of digital labs with Virtual Learning Environments
(VLE) and Interactive Whiteboards (IWB). The analysis in this chapter
is essentially a qualitative/judgmental one drawing, in particular, upon the
evaluative criteria ofDunkel, Ingraham and Emery, and Hubbard. Neither
a quantitative study aimed at correlating student learning gains with plat-
form use nor a scored judgmental evaluation was attempted between 2003
and 2006, when the Case Studies were conducted. This was due to the fact
that the primary focus at the time was on gathering empirical data from the
program and pedagogy studies that feature in Chapters 7 and 8.
Chapter 9 will show how scored judgmental evaluation of platforms, as
well as of programs and pedagogy, might be conducted. For those research-
ers interested in quantitative studies correlating platform use with student
learning gains some ofthe methodologies for correlating program use with
learning gains discussed in Chapter 7 (e.g. t-tests linked to either a time-
series or longitudinal research design involving pre- and post-tests) can be
applied to platform impact studies, though these come with the same caveats
regarding correctly identifying, isolating, and reporting on extraneous vari-
ables. The evaluative reports on the interdisciplinary TICCIT (Alderman
1978), and PLATO (Murphy and Appel 1977) projects of the 1960s and
1970s give further useful insights on conducting large-scale quantitative
impact studies linked to early computer-based education systems.
It must be stated at the outset that this chapter, as well as the next two
chapters, look retrospectively at Case Studies that were reported on fully
at the time. For the purposes ofthis book they are discussed in the light of
their relevance to the evolving Model for Evaluation (MFE2), and as such
focus primarily on principal issues relating to the evaluation framework;
a complete reporting and statistical analysis was conducted in each case
and is available on request, but would have been excessively detailed for
inclusion in these next three chapters.
CALL Enhancement Criteria and digital platforms
For the purpose of this project digital platforms are divided into three
subsections: digital labs (whether driven by hardware, software or a hybrid
ofboth), VLEs and Interactive Whiteboards. Each occupies very different
spaces; a digital lab is a discrete, self-contained physical space, a VLE exists
in cyber-space accessible from any location with access to the Internet,
and IWBs are a mobile resource that can be installed in any physical space
(classroom or lab) with access to an electrical socket, a master PC, and, ide-
ally, the Internet. This section will clarify what each of these is, show-case
some examples, and discuss some of the evaluative and pedagogy-related
issues that pertain to each.
It is evident that the functionalities of digital platforms sit well with
nearly all ofthe CALL Enhancement Criteria. Some criteria, such as lan-
guage learning potential, are clearly more relevant to programs and peda-
gogy than platforms. For most if not all others, there is a direct relevance
to the functionality of the platform itself, be it physical or virtual. When
one considers, for example, meaning focus, there is a clear link with Hewett
et al.s reference to the capacity of a platform to provide access to digital
136 Chapter 6
references or enhance meaning inference via coding elements. As for posi-

tive impact, the evaluator might ask whether the interface increases or
decreases motivation, frustrates, arouses curiosity, etc. Table 6.1 provides
evaluator prompts relevant to platforms for each of the twelve CALL
Enhancement Criteria. Chapter 9 will include a gradable version of this
table. Given the rapidly evolving nature of such platforms, and their increas-
ing liberation from physical locations, both CALL pedagogy and CALL
evaluation will need to demonstrate a growing flexibility and pragmatism
to ensure that they respond to evolving learner and teacher experiences and
expectations of digital resources and learning spaces. It is hoped that the
MFE contains criteria that will allow for evaluation to follow the times.
12 criteria for
judging CALL Platform-judging considerations
enhancement
Does the platform support software that allows for a beneficial
Language
focus on form? Does it support drill-and-practice and
learning
vocabulary acquisition activities? Does it enable rapid error
potential
correction and feedback linked to the focus on form?
Does the platform allow learners of different abilities, learning
style, ages and genders to learn together or in differentiated
Learner fit groups? How well does it support diagnosis of learner levels
and needs, and customization of materials and learning paths to
these levels and needs?
Meaning What capacity does the platform have to provide access to digital
focus references or enhance meaning inference via coding elements?
Chapelle
Does the platform deliver access to the World Wide Web?

Can this be controlled from the tutor console? How well does
the platform lend itself to harnessing technologies that are
Authenticity widely-used in the real world (such as USB connectors, mobile
telephony, Flipcams, digital recorders, Webcams, Skype)? Does
it support simulations of real-life activity (such as simulated
texting/chat/phone calls, or a newsroom/TV/radio studio?
Positive Does the interface increase or decrease motivation, frustrates,
impact arouses curiosity, etc?
Is the platform easy to use? Does it comply with accessibility
Practicality considerations? How reliable, stable and glitch-free is the
platform?
Language
How efficiently does the platform deliver sound, images, record,
skills and
and playback? How easy is it to combine language skills via
combina-
multimedia using this platform?
tions of skills
What degree of interaction, choice, control and manipulation of
Learner
material is enabled by the functionality of the platform itself, as
control
opposed to the CALL software program?
Can the teacher monitor student screens from the teacher
Error cor- console, take control of a students keyboard and/or mouse, and
rection and intervene to provide discrete individual or group feedback? Can
feedback the teacher or student readily access a record of performance and
progress?
Leakey
Does the platform allow for flexible lay-out and learner-friendly

ergonomics? Does it support random or fixed audio/video-
pairing? Does it support interactive paired or group work
Collaborative (e.g. through simulations of real-life activity, such as simulated
CALL texting/chat/phone calls, or a newsroom/TV/radio studio)?
How well does the VLE support group interaction (such as
through wikis, blogs, discussion forums, comment and feedback
boxes, file-sharing and assignment drop-boxes)?
How well does the lab allow the teacher use and switch between
Teacher
different teacher styles: e.g. pedagogue, facilitator, front-of class,
factor
roaming?
Tuition Is the platform flexible enough to allow for, and switch easily
delivery between, a range of modes ofteaching (i.e. teacher to class, teacher
modes to individual, conference teaching, distance teaching, etc.)?
Table 6.1 Platform-judging considerations linked to the CALL Enhancement Criteria.
Hardware, software and hybrid solutions
Davies et al.s (2005) comprehensive list of companies that manufacture

digital platforms for CALL has already been mentioned in Chapter 2.
A look at the resources currently available from these companies reveals
variation in hardware and software platform solutions and highlights the
challenge of applying our evaluative framework.
138 Chapter 6
At least one ofthese solutions employs simple to use audio panels and
headsets, the only PC being that ofthe teacher who has a software-driven
interface. There are a number ofhybrid (software/hardware) solutions on
the market, where a software- or hardware-driven digital recorder is required
for the recording and playback of multimedia learning material.
Ifthere is a trend to be observed it is towards virtualization and away
from confinement to specific physical locations, in response to the grow-
ing ability of lap-tops, and indeed mobile phones, to access remotely and
engage with learning objects in cyberspace. All the same companies that
have historically invested heavily in analogue and digital labs appear to be
hedging their bets and endeavouring to ensure that their latest labs are able
to serve both single networked sites and deliver remote learning.
Virtual Learning Environments (VLEs)
Davies et al.s report makes brief reference to Virtual Learning Environments

(VLE). The main weakness identified by the authors is the relationship
between VLEs and pedagogy: VLEs have certain advantages in terms of
ease of delivery and management oflearning materials. They may, however,
be restrictive in that the underlying pedagogy attempts to address a very
wide range of subjects, and thus does not necessarily fit in with established
practice in language learning and teaching. (2005: 9).
This point also applies to commercial software whose underlying peda-
gogical assumptions may, or may not, be appropriate to the purchasing
institution. Chapter 8 features the VLE WebCT in the pedagogy projects
involving University ofUlster. The evaluative approach adopted for digital
labs is equally applicable to VLEs as these environments are important to
language learning and probably in more widespread use now than digital
laboratories, given that they are web-based, do not require a dedicated
physical space, and some (such as Moodle) are entirely free.
Interactive Whiteboards
The Interactive Whiteboard (IWB) is another technology that might be

considered to fall within the category of digital platform as it, too, is a
content-neutral digital shell and distribution base for learning. A number
of CALL researchers see the IWB as playing an increasingly significant
role in language learning. In the 2005 report discussed above Davies et al.,
for example, say of the IWB:
These are becoming increasingly common in all types of teaching establishments.

They can be used at varying levels of complexity, but there is an inherent risk that they
may encourage a teacher-led approach. However, learners can participate actively,
receiving feedback and checking their progress in a variety of ways, even when using
programs that tend to be presentational, such as PowerPoint. (2005: 8)
In the Introduction to CALL on the ICT4LT website IWBs are seen as a

welcome solution to reviving the threatened art of whole-class teaching:
The advent of lower-priced, high-quality projection equipment and the
interactive whiteboard have brought about a revival in whole-class teaching
with a computer. (Davies (ed.) 2007, Section 4 of Module 1.4).
Bell (2002) presents thirteen arguments in favour ofthe use ofIWBs
in language teaching; these include: its ability to accommodate different
learning styles, popularity with all ages of students, suitability for distance
learning and constructivist approaches to teaching, cost-effectiveness com-
pared with labs, and its integrability with a number of different media.
Towards a framework for evaluating platforms in CALL
A phenomenological comparison of three different platform types
Dunkels evaluative criteria, as gathered in Table 4.3 of Chapter 4, can be

usefully applied to a comparison ofthe virtues and weaknesses ofthe three
main types of digital platform in use in schools, colleges and universities,
140 Chapter 6
namely, digital labs, VLEs, and interactive whiteboards. MFE2 will, in

Chapter 9, build on this evaluative framework to enable a numerical value
to be given to each platform based on its ability to meet the range ofCALL
learning criteria. Table 6.2 uses Dunkels criteria to compare the functional-
ity of VLEs, IWBs and digital labs.
Our Model for Evaluation aims to support both analysis of individual
platform types and comparative assessment of digital labs, VLEs and IWBs.
Comparison of technical specifications can readily be made by compar-
ing published charts. What is less visible is an evaluative framework that
will help researchers, platform designers, software developers, tutors and
budget holders to weigh up the comparative strengths and weaknesses of
each platform when the latter perform in CALL pedagogic contexts.
Table 6.2 provides an initial commentary comparing the three main
sub-types of digital platform identified. One must acknowledge, of course,
that some VLEs will be better or more efficient, cost-effective moti-
vational than others, and that some IWBs and digital labs will be better
than other IWBs and digital labs. General statements such as probably
the most expensive option will need to be qualified as some systems are
more expensive than others, and a comparative rating based on the dig-
ital genre rather than specific branded products would be inaccurate and
unfair at this stage. Future studies will need to attempt such ratings, based
on qualitative and quantitative analyses. Nevertheless some general points
about the comparative pros and cons of each genre are possible and valid.
For example, where IWBs gain in terms of practicality and flexibility in
servicing different learning environments they lose out against both VLEs
and digital labs in terms ofthe discreteness oftheir feed-back. IWBs, on the
other hand, will surpass digital labs and most VLEs on most cost effective-
ness measures (though the jury is still out on accelerated learning), and
score well on popularity with learners of all ages, flexibility and economy
with regard to physical space.
Designers of digital platforms, like courseware designers, employ their
own goals and principles of design, such as accessibility, interoperability
and interactivity, upgrade-ability drawn from theories of design, educa-
tion, psychology, computer science and others. Some of these principles
should, of course, feature in CALL effectiveness research evaluative theory,
but the latter should not be confined to or solely determined by them, nor
are all design issues directly relevant, for questions of language learning
and computer-assisted language learning should be the principal driver in
CALL evaluation.
At this stage, digital platforms will be submitted to an evaluation in the
light ofthe three software-based frameworks used in the previous chapters:
Ingraham and Emery (1991), Hubbard (1988), and Dunkel (1991). Also,
Davies et al.s (2005) guidelines for Setting up effective digital language
platforms and multimedia ICT suites for MFL, and the more recent Uni-
versity of Ulster/ LLAS survey on multimedia language learning in UK
universities (Toner et al. 2007) are harnessed in order to update and apply
the criteria from earlier publications to the current context. For our contex-
tual analyses of the Robotel and Melissi platforms we draw on data from
the TOLD and BLINGUA projects at Ulster and the Clarke (2005) survey
of Melissi Digital Classroom users at the University of Portsmouth.
Recent evaluative reports on digital platforms
Two important studies concentrating on digital platform use were the

Davies et al. (2005) report commissioned by CILT, and the Toner et al.
(2007) survey report which has already featured in the mapping exercise
in Chapter 4 (Table 4.5). Davies et al. provided guidance on good prac-
tice and Toner et al. was a survey on the nature and extent of their use,
especially in Higher Education in the UK. What follows below is a brief
rsum of the principal points and findings of these reports as they relate
to the evaluation of platforms.
The Davies et al. paper, commissioned by CILT (The UK National
Centre for Languages) and entitled Setting up effective digital language
laboratories and multimedia ICT suites for MFL, provided at the time a
much-needed overview ofthe evolution oftechnology to support language
teaching, defined terms, listed advantages of digital labs, and gave advice on
the integration of digital systems and software into language teaching.
Descriptor VLE
Cost-effective especially if freeware; can accelerate
Does the platform save time? Is it cost- learning due to accessibility outside class and hours;
efficient; does it accelerate learning; does reduction of workload through quiz correction and
it reduce teacher workload feedback, long-term storage of files, whole cohort
communication functionality etc
Student reaction to the digital platform Generally positive if students receive training and
and the multimedia environment the VLE is integrated into learning
Increasing capacity of VLEs to deliver all skills
The effectiveness of platform use in
(e.g. WIMBA functionality for speaking/
delivering specific content and language
listening; video objects and most media files can
skill areas. Learnability and efficiency
be imported; area studies etc enhanced through
of use factors provided by the platform
hyperlinks to other web resources)
The different impact of different types
Monitoring and tracking possible; best suited to
of platform functionality (broadcast,
autonomous learning; most multimedia elements
monitoring by tutor, audio pairing,
integrate well
integration of multimedia elements, etc)
The impact of the platform in serving

Highly flexible, can be integrated into whole class
different types of computer environment
lab-based teaching, self-access learning. Will sit on
(Teaching or self-access lab, virtual
any digital platform that gives access to WWW
environment, hard-wired or wireless,
and/or the VLE server
CAI, CMI, CEI, WELL)
The uses of the platform in relation to

other spaces (i.e., as a supplement to, Excellent for distant learning where the physical
versus a replacement for, traditional spaces are not available; or else can function in a
spaces such as the lecture room, seminar blended resources/spaces environment
room, library; or for blended spaces)
The levels of student competence. Is the
Serves the autonomous learner best; excellent
platform best suited to REMEDIAL
resource for reinforcement or remedial work; but
work and underachievers, or to very able
also can provide material to extend high achievers
students or to all types?
Which learner differences (age, gender) Autonomous learners will benefit most; all
/ learning styles/ learner strategies is the learner styles can benefit depending on the coding
digital platform best suited to? elements, manner of presentation and media used
VLE provides for all of these including monitoring

What role does monitoring, tracking
of amount of time spent on task. Synchronous and
and feedback play in the effectiveness of
asynchronous feedback through email, discussion
the digital platform?
forum, quiz scoring functionality
What degree of learner control is related Tutor must upload resources and links; after that
to effective CALL digital platform the student has high level of control over access,
design? timing, rate of work and interaction
IWB Digital labs
Cost-effective compared with digital lab purchase;
can accelerate learning due to motivational factor; Probably most expensive option. Other
may reduce workload through enabling teacher to two cannot match the broadcast, scan,
stay in his/her classroom and not have to transfer pairing/group functionality
to lab
Qualitative studies show this to be a popular tool Popular if ergonomically sound and
with all ages well integrated into teaching
Probably allows for the widest range

IWBs can also link with other media and support of media delivery. Can induce more
full range of software passive learning if used as a class-room
or lecture theatre
Best suited for monitoring and tracking

Monitoring and tracking not necessary as this tool
as the tutor is there physically but
is best suited to whole class or group learning; most
also can monitor all students using
multimedia elements integrate well
technology
Best suited to traditional classroom where it can
replicate some lab functionality in a whole class
Can be teaching lab or self-access; VLE
environment, although some institutions use
+ IWB can be accessed from it. Less
it in the lab to enhance interactivity; especially
flexible for face-to-face group work
useful where lab does not have a digital projector,
broadcast or capture function
Can be used in blended spaces or as

substitute for classroom, but IWBs
As above
more flexible as they can exist within
the traditional class-room space
All types from the academic university student and

business professional to the special needs pupil at Should cater for all competences
Primary level
Tactile learners can benefit from touching and

Should cater for all learning styles and
marking at the board, audio learners can have the
strategies depending on the media and
class discussion, visual learners can see what is taking
pedagogy used
place as it develops at the board (Bell, 2002)
IWBs can support monitoring and tracking within
A vital role, otherwise the digital lab
the learner software package installed. Tutor at front
is little more than a room filled with
of class will provide most of the feedback. One-to-
computers
one feedback not possible.
Depends on the teacher and the software installed;
The best digital labs allow for high level
only those students with access to screen are
of student control once the tutor has
exercising control (except where ACTIVote is
uploaded the relevant materials
available).
Table 6.2 Comparison of three digital platforms: VLEs, IWBs and Digital labs.
144 Chapter 6
The advantages of digital platforms, as given by Davies et al., are: interactiv-

ity, ease of movement between different applications, potential for teacher
intervention, potential for independent learning and versatility, ease of inte-
gration of the range of media. Versatility is explained thus: Text, images,
audio and video can easily be integrated, and the teacher can remodel and
alter materials to fit the needs of different language teaching programmes
(p. 6). Though not in name, the Davies et al.s principles have also informed
the twelve CALL Enhancement Criteria. For example, potential for teacher
intervention will come under tuition delivery modes; versatility and
interactivity relate to both tuition delivery modes and learner control
as well as collaborative CALL and combinations of skills.
The reports most useful contribution seems to be its guidelines for an
appropriate approach to the integration of digital platforms into teaching.
Issues to be carefully considered include the costs ofinstalling, maintain-
ing, supporting and upgrading equipment, time needed for stafftraining,
selection and creation of resources, and management of resources, which
are all linked to the criterion ofpracticality and its sub-theme cost effec-
tiveness. Davies et al. are adamant that a change of culture is required that
necessitates a time and budget commitment to stafftraining to ensure the
resource is not underused or misused and that new social and cognitive/
constructivist learning theories are taken into account (p. 6). In short,
there is in this report a restating of the importance of proper pedagogical
integration ofthe new resources into the curriculum and the teaching and
learning culture of an institution. CALL evaluation should contribute to
informed migration to new platforms, programs or pedagogies and needs
to assess the appropriateness oftraining, or the lack of it, in any educational
institutions transition to a new platform, program or pedagogy.
Davies et al. have highlighted some key principles of evaluation; the
survey that followed in 2006, and reported on in 2007 goes on to provide
a picture of the reality on the ground and give a clearer idea of the extent
of use of multimedia in UK Higher Education institutions. It also gives
some insight, albeit a limited one, into the extent to which good practice
guidelines, such as those advocated by Davies et al., are being followed.
A comprehensive Model for Evaluation should generate further insights
regarding the quality control of digital platforms use and their integration
into CALL pedagogy.
The University of Ulster/ LLAS Survey (Toner et al. 2007)
In 2006 a web-based questionnaire, entitled Multimedia Language learning

in Higher Education in the UK, was originated by the University ofUlster
Centre ofExcellence in Multimedia Language learning and administered
by the Subject Centre for Languages, Linguistics and Area Studies based
at the University of Southampton. After it was circulated to HE Institu-
tions across the UK and Europe a total of 147 responses were returned. 83
of these were from individuals in UK universities, and a total of 56 UK
HE establishments were represented. While acknowledging the upward
trend in interest and use of digital labs, the report asserts the necessity for
continued examination of good practice and research in the area of digital
lab technology. The survey also gives some insight into the use of digital
platforms (digital labs, VLEs and IWBs), and useful comparison with
environments such as classrooms and lecture theatres. It acknowledges the
increased and varied use of VLEs but advocates the value in establishing
good practice guidelines for language teachers in these areas and urges
better use ofthe tools provided by VLEs that may benefit language learn-
ing. Issues, such unfamiliarity with the technology and technical glitches,
are seen as off-putting and reinforce Davies et al.s call for more and better
stafftraining. The report also notes the increased use ofIWBs in HE (22.1
per cent of respondents) but highlights the continued gulf between HE
and the primary/secondary sectors where usage is much more extensive.
While over half of the respondents (54.2 per cent) use PCs in their
teaching on a regular or very regular basis, the most popular choice of
location for teaching is the classroom (46.4 per cent of respondents con-
duct their teaching in classrooms). The authors speculate that the reason
for this is that institutions will have more classrooms than other teaching
locations, such as labs. The report goes on to confirm the decline of the
analogue lab in favour of the digital lab, with findings such as only 3.4%
of respondents confirmed that 60% or more of their teaching takes place
in an analogue language lab, while 12.8% of their respondents teach the
same proportion of their classes in a digital lab.
As for the kind of language learning activities using technology, the
report states that: Irrespective of where teaching takes place, one of the
most popular activities is using audiovisual resources. However, the report
146 Chapter 6
goes on to state that the digital lab is underused for this purpose, and that
the CALL package is the most frequently used resource in such locations,
over and above teacher-devised activities, and speculates that the cause
of this is the issue of stigmatizationa danger that digital labs are seen
as highly specialized areas, only to be used for certain teaching activities.
CALL effectiveness research needs to test the validity of such speculation
by means of case studies, staff and student focus groups and ethnographic
research, and provide managers and teachers with evidence ofthe benefits
of stafftraining and enriched student learning. MFE targets the quality of
integration of digital platforms not only by asking questions about train-
ing but also with evaluation criteria such as Ingraham and Emerys crite-
ria: Supports Course structure, Supports lesson structure, Adaptable to
different language learning methods and Supports CAL methodology
(see Table 6.3).
In Chapter 4, key principles from the LLAS report were mapped
against Chapelles criteria (Table 4.6). These include: the encouragement
of autonomous learning, the impact on teacher interaction with students,
interaction among students, the integration of audio/video and other
media to classes, the provision and storage of audio/video and other media
files, the encouragement of student engagement, the effect on tutor work-
load, the impact oftechnical problems upon effectiveness ofthe platform,
and the impact on tutor contact hours. Most echo the criteria from Dunkel,
Hubbard, and Ingraham and Emery, which have been used (in Table 6.2)
to compare digital labs with VLEs and Interactive Whiteboards, and will
be used now to evaluate the Robotel and Melissi platforms.
Digital platforms in the Case Studies
While there is no evidence of a direct correlation between comfort/envi-

ronment and learning gains (Felix 2007), qualitative feedback suggests that
it at least has a bearing. The digital platform is the central feature of the
CALL environment. While the digital platforms studied all have excellent
functionality which would probably score well on an out-of-context evalua-

tion, one ofthe aims ofthis thesis is to evaluate the usability ofthe platforms
within CALL settings and as part of a pedagogical process. The pedagogy
Case Study in Chapter 8 (featuring the TOLD and BLINGUA projects)
features student reaction which mentioned the impact that, for good or ill,
the learning environment and the digital platform made on the students
motivation and experience. Several Ulster students after the TOLD and
BLINGUA1 projects expressed their dissatisfaction with the ergonomics
and lay-out ofthe transitional multimedia lab, and after BLINGUA2 their
relief at being in a comfortable lab laid out in a manner they felt was more
conducive to learning, preferring the more spacious and attractive new
lab where students face the teacher to the confined transitional lab where
students did faced the wall rather the teacher. They also commented favour-
ably on the broadcast, monitoring and interactive features ofthe Robotel
SmartClass system, and were critical when they felt it was underused and
its functionality not integrated into learning.
The Robotel and Melissi labs that feature below were selected prima-
rily for the fact that they are representative of the two primary digital lab
solutions available: a hard-wired solution (Robotel) and a software solu-
tion (Melissi); Robotel is the system used at Ulster and Melissi is used at
Portsmouth. The University of Portsmouth was chosen for the platform
Case Study as its use of the platform was comparable to Ulsters in that
it was integrated into classroom teaching (rather than being primarily a
self-access space, as is sometimes the case).
Case Study 1(a) Robotel at the University of Ulster
The Robotel and Melissi Case Studies provide a framework for correlat-
ing, qualitatively if not quantitatively, the impact of the chosen platform
and to provide a gauge of the synergy of the three Ps within the teaching
and learning experience.
148 Chapter 6
Multimedia spaces at Ulster
Ulster currently operates two multimedia labs equipped with Robotel

technology (that is the SmartClass2000 lab and its IDL (Interactive Dis-
tance Learning) technology). Its Coleraine campus is equipped with two
partitioned (but joinable) rooms (one with 20 workstations plus the teacher
console and the other with 22 workstations plus the teacher console). Its
Magee campus has 25 workstations plus the teacher console. The system
is a hybrid analogue/digital platform which enables digitized audio and
video to be used in conjunction with a student virtual recorder (SCVR)
for interactive individual and group learning with individual record and
playback and saving of student work. The language tutor can monitor
and communicate with individual students or the whole class as well as
broadcast material from and to any workstation in the network. Other
functions include the teacher control student screens, keyboards and mouse
functions. The lab can link remotely to its sister campus equipped with the
same platform. This remote communication function was not included in
this study as the focus was on communication within the lab.
Robotel evaluated against Ingraham and Emery criteria
An essential difference between a digital platform and most language

learning software is that, while the latter is filled with content, the former
is a content-neutral shell, and so the former needs to be judged more on
its ability to support teaching and learning modalities, multiple media
and room layouts, rather than do the actual teaching; it is thus always a
tool, and never a tutor. Ingraham and Emerys criteria with their primary
focus on supports for learning (Table 6.3) adapt well to platforms and
human-computer interaction (HCI) considerations (see Hewett et al.
1996), such as screen layout and ergonomic layout of a digital lab, as well
as course content or structure. Using this table a comparison ofthe TMM
and BLINGUA contexts throws up interesting insights. The cells with
italicized text in the two right-hand columns highlight instances where
the platform provides an added value to the delivery of pedagogy, either
via prepackaged language software such as TMM, or in the area studies
context where the main resources were more open-ended (i.e. Internet,
WORD and PowerPoint). The Ingraham and Emery criteria, for exam-
ple, show how the Robotel digital platform enabled TMM to be used for
more teacher-led instruction than the software designers may originally
have intended, by means of the scan and broadcast functionality to dem-
onstrate, say, a learning path and sequence of activities via the broadcast
function and then allow students to practice individually. With regard to
area studies teaching the Robotel digital platform, for example, is shown
to allow the teacher to switch swiftly between a teacher-led scenario and
an autonomous-learning setting.
The table also reveals the limitations of a digital platform; for example,
it will not enable the opening up of a closed database or software system:
teachers will not be able to use the Robotel SCVI to author any content
in the TMM software. Nor will they be able to use the software for paired
activities as all the interaction in the TMM dialogues is between the stu-
dent and the software.
150
Intended use of platform in teaching context:
TMM: Tutor mainly as facilitator and monitor
Area Studies (BLINGUA): Tutor as Lecturer and facilitator
Robotel For Area Studies teaching
Descriptor For teaching via TMM
SmartClass (BLINGUA)
SCVI enables tutor monitoring may
Overall objectives and
Digital platform caters to different SCVI is neutral and flexible in this

cater to the less competent, or techno-
levels of competence regard
phobic
structure
(adapted from Ingraham and Emery 1991)
Digital platform makes no differ-

Supports course structure ditto
ence here
SmartClass is a content shell. SCVI
Supports lesson structure ditto
Digital Platform
Lesson Editor enables lesson creation

Methodological issues
Digital platform enables teacher-led Teacher-led through SCVI lesson editor,

Adaptable to different language
and monitored use of software for or teacher from front. Full autonomy
learning methods
autonomous learning also possible
Scaffolded SLA: drill and practice Scaffolded SLA and/or drill and prac-
Supports CAL methodology
modes etc tice modes etc
Chapter 6
Delivery of the televisual Digital platform makes no differ- Broadcast, Capture, and Flex/Pairing
environment ence here modes
Delivery of the windows Digital platform makes no differ-
ditto
environment ence here
Supports Screen design ditto n/a
Yes can intergrate with other media

Digital platform will not open up
Supports hypermedia and linearty (such as digitizer, satellite TV, remote
Interface issues
the closed system that is TMM
location conferencing)
Yes through student control of own Yes through student control of own
Enables range of Autonomy and workstation and tutor broadcast and workstation and tutor broadcast and
Digital Platform
teacher control screen/mouse/keyboard capture + inter- screen/mouse/keyboard capture + inter-

vention mode vention mode
Yes scaffolded SLA and/or drill and Yes scaffolded SLA and/or drill and
Enables autonomy and self-tuition
practice modes etc. practice modesetc
Digital platform will not open up Supports Windows environment and
Provides access to authenticity
the closed system Internet access
Yes broadcast mode or student control

Supports active and passive learning Yes broadcast mode or student control
of own workstation
Yes through broadcast and student

Yes through broadcast and student
Allows for interaction and response control, audio pairing, call for help
control, call for help button etc
button etc
Table 6.3 MFE1 table mapping Robotel functionality against Ingraham and Emery (1991)
for the purposes of digital platform evaluation.
151
152 Chapter 6
Robotel evaluated against Hubbard criteria
Hubbards criteria (Table 6.4), on the other hand, articulate pedagogic

descriptors which apply less readily to platforms; however, by changing
key verbs in the descriptors (e.g. provides to enables) Hubbards criteria
will engender useful analysis of digital platforms. Insights into strengths (in
white cells in table) and weaknesses ofthe Robotel platform are provided.
For example, the platform allowed, via the audio-pairing and SCVR digital
recorder, meaningful communication on both a learner-to-learner and
learner-to-computer basis within the area studies (BLINGUA) context,
but was limited to learner-to-computer with TMM as the latter excludes
learner-to-learner communication. On the other hand, the teacher scan
and intervention functionalities allowed for reduction of affective hurdles
in students in both contexts. The principal negative comment relates to a
conflict between the SCVI and the pre-existing LAN network-server; a
possible weakness in the Robotel software, but it was one which the tech-
nician was able to circumvent. As with the previous table, the cells with
italicized text in the two right-hand columns highlight instances where the
platform provides an added value to the delivery of pedagogy.
Ulster has recently complemented its hardware-driven digital platform
with the software-driven Sony Virtuoso digital platform, which will work
alongside the Robotel system and make up for some ofthe shortfall in the
Robotel functionality. One advantage it will bring is that students will be
able to access it remotely and wirelessly from PCs and lap-tops within the
bounds of the University. This will greatly enhance autonomous learning
and allow teachers to set homework knowing that students will be able
to use many resources that until now have been confined to the physical
space of the digital lab.
Case Study 1(b) Melissi at the University of Portsmouth
Language learning context
Clarke (2005) claims that the University of Portsmouth is bucking the

trend of the decline in the uptake of Modern Languages courses at HE
level. This may or may not be connected to their investment in digital lab
technology. Between 2002 and 2005 the School of Language and Area
Studies purchased the Melissi multimedia lab (called the Digital Class-
room), the first university to do so. Over this period they equipped 70
workstations (three teaching labs and a self-access lab) at a cost of around
125,000 (Clarke 2005). Having looked at other similar products on the
market (notably Sanako and ASC systems), according to the Resources
Manager, they opted for Melissi to take advantage of cutting-edge devel-
opments on the IT front, which allowed audio and video to be recorded
and streamed for student positions in digital format, whereas the existing
players were still using analogue distribution methods. It was a product
designed by software engineers from a UK university who had developed
a software-driven platform that would emulate the existing functions of
analogue laboratories digitally and at the same time allow a portfolio of
activities to be put together through the medium ofthe PC without lengthy
and time-consuming preparation (Clarke 2005: 5). It was seemingly the
first off-the-shelf product of its kind to produce subtitles in conjunction
with videos. It offered the facility to digitize pre-existing audio and video
tapes, to view minimized or full-screen images without picture degrada-
tion while multi-tasking with other activities such as gap-filling activities
and consulting an online dictionary, to move easily between applications,
and easy customizability of materials by the teacher to suit the require-
ments of specific lessons. Clarke also claims as a singular feature Melissis
facility to create and overlay subtitles on video images without infringing
copyright restrictions because each file although permanently associated
is achieved as a separate entity (2005: 5). The lab is primarily used for
language teaching. Clarke reports that there was less institutional support
for its use as a resource to teach Area Studies.
154
Robotel For Area Studies teaching
Descriptor For teaching via TMM
SmartClass (BLINGUA)
Enables meaningful communicative Yes through SCVR digital recorder,

Digital platform makes no differ-
interaction between student and access to full PC functionality, WWW,
ence here
computer etc
Enables comprehensible input at

SmartClass is a neutral shell. Input
a level just beyond that currently ditto
depends on tutor or student
Digital Platform
SCVR caused frustration in the

early days as it conflicted with our
Scan and broadcast functionality helps Novell server and frequently crashed
Promotes a positive self image in the
tutor to assist those students struggling affective hurdle! Feedback, discreet
learner
with the software monitoring and intervention avoid stu-
dent embarrassment. Good practice can
also be broadcast to all
Chapter 6
Motivates the learner to use it Yes through the above See above
Motivates the learner to acquire the
ditto See above
language
Digital platform can reduce frustra-

(adapted from Ingraham and Emery 1991) Provides a challenge but does not
tion and help tutor to set targets See above
produce frustration or anxiety
more easily
See above. Appropriateness of error

Digital Platform
Tutor intervention function can add

Allows for appropriate error correction depends largely on the tutor,
value to the softwares own feedback
correction but at least SmartClass allows the tutor
function
to be discreet or overt
Yes through SCVR or microphone to

Allows for student production of Digital platform makes no differ- tutor. Broadcast of individual to whole
comprehensible output ence here group is also possible though takes a
while to set up
No, but the real limitation is in the

Allows for learner-learner interaction Yes through audio-pairing, flex mode,
software: TMM dialogues are between
in the target language one-to-one with the tutor and SCVR
the user and the PC
Table 6.4 MFE1 table mapping Robotel functionality against Hubbard (1998) for the purposes of digital platform evaluation.
155
156 Chapter 6
The Melissi Case Study will look at the claims for the Melissi Digital
Classroom and evaluate its impact in the light of staff and student focus
groups/surveys conducted by Clarke. It will then be compared with the
Robotel SmartClass 2000 system using the same MFE1 evaluative criteria
that were used for the Robotel study above.
The Melissi website draws attention to the primary difference between
the software and hardware solution:
Traditional language laboratories, and even some ofthe newer so-called digital labs,
still need dedicated wiring, making multi-use difficult. The Melissi Digital Classroom,
however, is not constrained by analogue wiring so the PCs can be installed almost
anywhere there is a suitable network. It can even be split over two or more rooms
providing that they are connected to the same network switch.
The absence of analogue wiring and the flexibility ofthe software solution
are probably Melissis main selling points; software solutions appear to be
the direction that most, if not all, digital platform providers are going. Even
those companies traditionally known for hardware solutions, such as Sanako
and Robotel, have developed a range of software solutions. In Robotels case
they have developed two digital platforms since SmartClass: a software
platform solution Symposium (targeted at fixed language learning environ-
ments and a more flexible virtual lab solution (LogoLab) targeted at
higher education applications requiring a virtual language lab solution, per-
mitting students to tackle media activitiesat their own time and pace from
any computer on campus (source: <www.Robotel.ca/english/documents/
NewsRelease_LogoLAB_200603-10.pdf>; accessed 1 January 2008). The
aim ofthese is possibly to improve on the more limited functionality oftheir
hard-wired system. The design of this product and its targeting at the HE
sector may well be in response to the Melissi challenge and Robotel would
doubtless now claim their system, too, makes multi-use easy.
Melissi evaluated against Hubbard criteria
For Robotel to match Melissis functionality, quite apart from the hardware/
software difference, it would need to look at developing its own equivalent
to the Black Box for interpretation work as well as a learner-to-learner
communication system (for text/chat and phone (i.e. audio link)) that is
learner controlled, but which the tutor can control from the teacher desk.
When one applies the pedagogic criteria ofHubbard, Ingraham and Emery,
and Dunkel the verdict does balance out. Against Hubbards acquisition
criteria (Table 6.5) one begins to see how much ofthe platforms effective-
ness will depend on teacher input and use. It also shows the value of the
Black Box, which was not being used at time of writing.
While there is wide functionality the platform will need to be effec-
tively harnessed to ensure students are motivated to use it for language
learning and comprehensible output. As we will see in the student and staff
feedback, there was limited use of Melissis functionality and it is maybe
not surprising that affective feedback was mixed.
Clarke Melissi survey (2005): Student feedback
Students were questioned on three aspects of their reaction to the use

of Melissi: which functionalities (or modalities) they had used, which
aspects they had found most useful and what technical problems they had
encountered. Key findings were: the students responded most positively to
the enhanced sound and picture quality ofthe video; other functionalities
responded to positively were the ease of use, and the design ofthe applica-
tion, versatility and variety of the application, increased learner control
and autonomy which the students understood to mean control over the
video and the ability to work more at their own pace in class. Features that
were not quite so well received were the customized and personalized files
and the possibility for students to create their own work. An uneven
response was given to the possibility of interaction via phone, email and
text (pp. 69). EFL students used the labs more than MFL students, and
made more use of the online dictionaries.
The Ingraham and Emery table (Table 6.6) highlights the versatility
and support for autonomous learning that the students appreciated, but
also exposes the restricted use by Portsmouth of its adaptability to different
language learning methods, even though its functionality suggests it has
great potential for blended teaching.
158 Chapter 6
Melissi
Digital Descriptor Comment
Classroom
Yes through A-V with text
Enables meaningful
comprehension functionality, Black
communicative interaction
Box, access to full PC functionality,
between student and computer
WWW, etc
Enables comprehensible input at
Melissi is a neutral shell. Input depends
a level just beyond that currently
on tutor or student
System caused frustration in the

early years as the network and server
Promotes a positive self image in it sat on was not able to support its
the learner full functionality! Feedback, discreet
monitoring and intervention avoid
(adapted from Hubbard 1988)
student embarrassment.
Digital Platforms
Motivates the learner to use it See above. Encourages autonomous use
See above. Interactive features and

Motivates the learner to acquire link to WWW and reference material
the language (online dictionaries) motivate
acquisition
Provides a challenge but does not Some frustration linked to technical
produce frustration or anxiety glitches above
See above. Appropriateness of error

Allows for appropriate error correction depends largely on the tutor;
correction monitoring allows the tutor to be
discreet or overt
Yes through audio-record, through
Allows for student production of
microphone to tutor, through Black Box
comprehensible output
for interpreting.
Allows for learner-learner Yes through chat, email and phone
interaction in the target language modes, and one-to-one with the tutor
Table 6.5 MFE1 table mapping Melissi functionality against Hubbard (1988) for the
purposes of digital platform evaluation.
Clarke Melissi survey (2005): Staff feedback
When it comes to staff reaction, Clarkes (2005) observations on their lab

use suggest a culture of pragmatism rather than a theory-driven approach,
which the Dunkel criteria (Table 6.7) also suggest. Clarke comments that
questioning of the teachers involved seemed to bear out the results: that
the classes were indeed structured around a video, with on-line dictionaries
and search facilities for students to do a bit oftheir own research added on
as and when, and that it was questionable whether a genuine blending
(or multimodality) of teaching and learning was taking place. On being
asked in what way are these facilities combined in class? several answered
to the effect that they were not being combined much, if at all, with the
primary use being video for listening comprehension.
The staff interviewed who had known the previous analogue system
noted the improvement that Melissi represented. Comments such as: we
are mainly using Melissi as a more sophisticated traditional language lab
suggest a lack of sustained staff development on the new system. Staff
predominantly used it for video and audio, and avoided using those func-
tions that in many ways set a digital lab apart from the older system (i.e.
text, phone and subtitling). Those activities that did prove popular, such
as access to the Internet and online dictionaries, are available in any com-
puter lab linked to the Internet and did not require Melissi, although staff
appreciated the ease with which students could transfer to these from within
Melissi. The inbuilt file management system was also appreciated after
teachers had learned how to use it. Students and staff appreciated the con-
trol the system gave them. Some staff exploited this both to increase their
own control during assessments (e.g. by switching off the text and phone
functions and WWW access), and to increase student control by allow-
ing free rewind and advance control within fixed listening assessments.
This element of learner control is identified by the Dunkel criteria below,
which also highlights those skill areas that benefited from the system as
well as the lack of combined skill activities carried out despite the potential
for this. The over-dependence on the system as a self-access/ autonomous
learning, rather than a teaching, resource is also exposed.
160 Chapter 6
Melissi
Classroom
Overall objectives and structure
Digital platform caters to

Cannot judge (CJ)
different levels of competence
Melissis internal file management

system linked to the in-house server
Supports course structure enables user-friendly transfer of
the course structure to the digital
environment
Supports lesson structure As above
Great potential for blended teach-
Methodological
Adaptable to different language ing, though Portsmouth tend to

issues
learning methods restrict their use to either A-V work

(adapted from Ingraham & Emery (1991)
or student self-access
Supports CAL methodology Yes
Digital Platforms
Yes, the icons (e.g. mobile phone

Delivery of the televisual
icon) and simple interface are
environment
attractive and inviting
Delivery of the windows Well designed interface linked to
environment Windows (e.g. file directory trees)
Interface issues
Supports screen design Yes

Supports hypermedia and
Yes
linearity
Enables range of autonomy and
Yes
teacher control
Enables autonomy and self-tuition Yes
Provides access to authenticity Yes
considerations
Supports active and passive

Practical
Yes
learning
Allows for interaction and
Yes
response
Table 6.6 MFE1 table mapping Melissi functionality against Ingraham and Emery
(1991) for the purposes of digital platform evaluation.
Melissi
Classroom
Does the platform save time? Is If anything it increases staff work-
it cost-efficient; does it accelerate load. It is cost-effective when com-
learning; does it reduce teacher pared with Robotels hard-wired
workload? system
Generally positive. Criticisms
Student reaction to the digital
tended to be linked to integration
platform and the multimedia
and technical glitches not con-
environment
nected to the system
as a support for CALL Pedagogy adapted from Dunkel (1991)
Portsmouth found it most useful

The effectiveness of platform use for listening comprehension using
in delivering specific content and the A-V. Other skills were under-
language skill areas. Learnability explored using the system. Area
and efficiency of use factors pro- Studies, for example, were not
vided by the platform taught in the lab as a policy
decision
Digital Platforms
Multimedia elements (video, audio

The different impact of different and hypermedia ease of transfer to
types of platform functionality other functions such as WWW)
(broadcast, monitoring by tutor, made most impact. No reference
audio pairing , intergration of mul- made to audio-pairing; some staff
timedia elements, etc) deliberately avoided the monitor-
ing function
The impact of the platform in
Depended on the teacher; some
serving different types of computer
preferred to use it as an autonomous
environment (Teaching or self-
learning resource, others as an inter-
access lab, virtual environment,
active teaching space. Some com-
hard-wired or wireless, CAI, CMI,
mented on ease of transfer to VLE
CEI, WELL)
The uses of the platform in relation
to other spaces (i.e., as a supple- Great potential for flexible use,
ment to, versus a replacement for, though Portsmouth tend to opt for
traditional spaces such as the lec- single-use function (A-V LC) rather
ture room, seminar room, library; than blended use
or for blended spaces)
162 Chapter 6
The levels of student competence.

Is the platform best suited to EFL students seemed to take to and
REMEDIAL work and undera- profit more from than the MFL
chievers, or to very able students or students
to all types?
Which learner differences (age,
gender) / learning styles/ learner
Cannot judge (CJ)
strategies is the digital platform
best suited to?
What role does monitoring, track- Dependent on the teacher; some
ing and feedback play in the effec- avoided it. Portsmouth study did
tiveness of the digital platform? not address this per se.
Significant element. Students and
staff appreciated the control it gave
them. Some staff exploited this both
What degree of learner control is to increase their control during
related to effective CALL digital assessments by switching off text
platform design? and WWW access, and to increase
student control by allowing free
rewind and advance control within
a set assessment period.
Table 6.7 MFE1 table mapping Melissi functionality against Dunkel (1991) for the
purposes of digital platform evaluation.
A significant gap in the three frameworks above has been revealed by this
study, namely the need to evaluate the bedding-in phase which involves
issues such as technical problems linked to this phase and staff reaction
to migration to a new system, environment and culture (and this will be
important for programs and pedagogy, as well). If there have been early
teething problems and if staff, either for this reason or for lack of train-
ing, are not disposed or equipped to use the systems, or to use them to the
full, then this will feed into under-use by the students, even in self-access
rooms. In Portsmouth teething problems with the technology occurred in
the first year or two, and although these were ironed out by the third year,
staff still commented that, even in its third year, the system often crashed
when the room was used to capacity, and that tended to be for assessments
when every student was present. Such experiences only reinforced a general
reluctance to use the system to the full. Some had tried using other features
ofMelissi: one teacher had made experiments with subtitlingas a filler
also tried to use the telephone functionbut that consistently does not
work because there is a sound card missing. This same teacher would prefer
to use the system as a self-directed learning tool, and disliked using it as an
interactive teaching space. Another teacher, however, liked to use it in this
way, but rather than using the systems own monitoring of student screens
function preferred to orally check where people are, check responses. I
know there is a facility to look at the screens and see what people are writ-
ing, but I personally prefer the personal checking of learning.
Lack oftime allocated to development of materials and ongoing train-
ing were major hurdles to staff using the wider functionality ofthe system.
Clarke comments in her section on training that many stafffelt their train-
ing was neither adequate, integrated or ongoing and that they were given
two to three hours of training at the start and then expected to get on
with it without any follow-up (pp. 1112). In short, the Melissi engineers
claim that the system would not involve time-consuming preparation was
possibly misleading.
Clearly most ofthe criticisms above are less concerned with weaknesses
in the Melissi system as such and more to do with technical, managerial,
cultural or pedagogical issues in the host institution. Technical glitches
tended to be linked to the pre-existent network or PCs that housed the
system. Staff reluctance to use the full functionality of the system, even
when everything was working well, derived from a lack oftraining, a lack of
encouragement to use different functionalities, leading to a culture of staff
doing their own thing in the lab rather than adopting a department-wide
ethos. Clarkes findings bear out Davies et al.s principle that an institutional
commitment to integrated and ongoing staff training is vital if a full and
proper pedagogical exploitation of the digital lab is to be made.
164 Chapter 6
Conclusions
Ideally, the Robotel and Melissi systems need to be tested side by side in
an experimental setting to control for system and student performance
differentials, and this should be the subject offuture research. While it has
not been possible to compare the two in this way, the project has helped
clarify the evaluative criteria needed for an assessment of the qualities of
digital platforms. The qualitative data obtained and the phenomenological
analysis gained from using these evaluative criteria have shed useful light
on the varying impacts a system and the manner of its integration make on
student and staff perceptions of effectiveness and on a number of dynamics
that contribute to the synergies at play. This Case Study has shown that
institutional priorities, problems of technical installation, staff training,
the management of staff expectations, and the existence or absence of a
pedagogy-driven approach to use are all as important, if not more so than
the array of functionalities a system may have. Evidently, a good number
of functionalities may as well not be there if staff are not trained or pre-
pared to use them.
Creating a culture of optimized use must start with clear and well
thought-out management commitments. Two digital systems, as we have
seen in the Robotel and Melissi systems, may have broadly similar func-
tionality despite the one being a software solution and the other a hard-
wired solution; but the degree and manner oftheir integration may be very
different due to decisions regarding timetable allocations, training, and
maintenance and ongoing investment priorities. Whereas the Ulster labs
were designated as teaching spaces only, Portsmouth operated a mixed-use
(teaching and self-access) system; secondly, at Ulster there was a commit-
ment to increase the timetabled uptake ofthe labs to ensure maximum use
and with no restrictions as to what modules or skills were taught using the
lab, whereas at Portsmouth there was a restriction to language modules (area
studies modules were excluded); thirdly, at Ulster there was the commit-
ment of significant human resources to ensure adequate staff training for
transition, technical support and the creation of a teaching and research
culture to create and sustain a momentum of optimized use of the lab.
For such a culture to be maintained, whatever the institution, the

greatest effort must also be invested, not just in the improving ofthe digital
platforms per se, but in the research-driven integration oftheir use with our
two remaining Ps, Programs and Pedagogy. The remaining Case Studies
will explore this process, looking first of all at the evaluation of the use of
commercially developed language learning software in the Ulster setting,
secondly at the evaluation ofCALL pedagogies for the teaching of various
language learning skills in the Ulster setting, and also at an evaluation ofthe
impact oftwo different approaches to the teaching of meta-cognitive skills
for language students as two different higher education institutions.
Chapter 7
Case Study 2: Evaluating programs
Introduction
The case for evaluating software
The evaluation of the impact of CALL software must be tied to the role
this software plays in the teaching and learning process. As early as 1988
Pederson said: The point, however obvious, needs to be restated: CALL,
in and of itself, does not result in more and better learning, it is the specific
way instruction is coded in CALL software that has the potential of affect-
ing learning positively, for specific learners in specific contexts (p. 107).
Software is not dismissed in the CALL impact equation; it is merely
that one must be careful when ascribing causality, and focus on its effects,
and effectiveness, in situ. Pederson goes on to say that one obvious problem
in CALL is to provide evidence that a given software package is designed
and programmed effectively (p. 108). She adds that the wise language
teacher should examine evaluative research reports carefully for clear edu-
cational objectives, a specific target audience, and an adequate evaluative
consensus from classroom teachers, students, and CALL experts (p. 109).
In other words, the evaluation of CALL programs should be intercon-
nected with CALL pedagogy and the two should not be mutually exclusive
activities. Pedersons core thesis is built upon the CAL work done by Salo-
mon. His contribution to effectiveness research generally derives from his
insights into the relationship between the software coding and cognition.
He defines coding, or the coding elements as the way a medium stores
and delivers instruction (Salomon 1979, cited in Pederson 1988: 111) and
identifies three key variables that influence computer-assisted learning:
168 Chapter 7
aptitude; (what the learner brings with him/her in the way of learning
style, strategy and ability), treatment (pedagogy, or how the material is
integrated into CALL), and thirdly, the coding elements (e.g. colour, dis-
play, graphics, rate, timing, format, clarity, print size, linearity, hierarchy
of elements, navigation).
Our Model for Evaluation, therefore, must be able to assess for CALL
programs a number of interrelationships. It will need to be able to isolate,
quantify, compare and correlate improvements in learner performance as
much in response to different software coding elements as to different
teaching approaches, as much to different software interfaces as to teach-
ing settings (CALL vs. CALL and CALL vs. non-CALL); it will need to
identify those learning styles that respond better to certain coding elements
and compare them to the effects generated by traditional pedagogies. In
looking at programs we will not so much be assessing their qualities as
technological products as qualifying and quantifying their effectiveness
in education learning environments.
Many software reviews for CALL have been carried out already. This
chapter will not be a review as such, for reviews or software evaluations
become rapidly obsolete (Pederson 1988: 109) since software products
are constantly updated and improved, and indeed this is the case with
TellMeMore, now in its ninth version. While this Case Study focuses on
one commercially developed product in particular, the primary aim is to
continue to test the Model for Evaluation with a view to identifying and
defining effective coding elements, or in McCartys words the persona in
each separate software package, and assessing the role software plays in the
engine that is the CALL teaching and learning process and experience.
MFE2 will be a framework for qualifying and, to a more limited extent,
quantifying, the persona in the software and the extent to which it embod-
ies the qualities of a good teacher (McCarty 1995: 30). As an extension of
this, the aim is also to evaluate qualitatively, if not quantitatively, the role
of the software when configured with the pedagogy and the platform.
In this Case Study two generations of the commercially successful
TellMeMore language learning software package, created by the French
company Auralog, were evaluated in the context of the teaching of lan-
guages at the University of Ulster. The first package trialled was the net-
workable CD-ROM package TellMeMore Education (version 7); the
second was version 9: the TellMeMore Campus (or Online) e-learning

application of the version 7 content. Both were designed specifically for
the education as opposed to the business or private markets.
Research design
There were two key differences to note in the respective research designs
ofthe TMM7 and TMM9 projects. First of all, the learning environments
were different: the TMM7 study took place in a multimedia language
laboratory because it was a networked CD-ROM; the TMM9 study, being
based on a web-accessible e-learning package, was context-free, accessible
from any PC linked to the internet. Secondly, the TMM9 project was a
discrete project, and so it was possible to conduct some quantitative as well
as qualitative analyses, whereas the TMM7 study was carried out in the
context of the research goals of the Ulster-based TOLD and BLINGUA
projects (see next chapter), which were interested less in the effectiveness
of a software program than in the impact of a wider CALL pedagogy that
included TMM7 as an aspect ofthe design. This inevitably had a limiting
effect on the nature and quantity of evaluative activity pertaining specifi-
cally to software. Nevertheless, the students do make specific reference to
the software in their feedback to those pedagogy studies. Central to both
was the challenge of integration of software into teaching programmes.
The TMM9 study, on the other hand, had as its primary goal the
evaluation of the software package, and could therefore be more focused
on the specific impact ofthe software; the potential for isolating causality
was, therefore, also increased. Student volunteers for this trial could be
randomly assigned from a number of different years and languages; the
main disadvantage, however, was that they had to work on the package in
their own time as the trial was based on a new product that had not been
integrated formally into the institutions modular structure. As with any
voluntary study what you gain in terms of a random assignment of students
to groups, and thereby good construct validity, you may lose in terms of
analysis of a real CALL experience in which the language learning on com-
puters is fully integrated with a module and its assessment structure.
170 Chapter 7
These two factors, the voluntary nature of participation and the bolt-on
nature ofthe study, laid the project open to the possibility that the students
might take it less seriously than if it were an obligatory, integrated part
of their studies. Data might therefore be skewed. Also, there was the fact
that progress made might be attributable as much to language taught in
the regular language modules that all students were committed to as to the
extra TMM factor. Given, however, that this affected all students (i.e. the
treatment and the comparison group) equally then one could reasonably
argue that their normal language tuition would act as a control.
The pedagogical designs behind the two TellMeMore studies were
determined by two separate theoretical agendas. First, there was the agenda
set by the researchers; in other words, the TOLD and BLINGUA context
in the case ofTMM7, which focused on oral and writing skills respectively,
and the autonomous e-learning context ofthe TMM9 study which focused
on overall language improvement rather than any one skill. The second
agenda at play was the pre-determined pedagogical agenda built into the
product by the courseware designers when they developed the packages.
Table 7.1 gives the research agendas relative to each study. In the Element
present? column a distinction has been drawn between whether the peda-
gogical descriptor was a characteristic of the teaching approach (T), an
inbuilt feature of the software (S), or both.
For the TMM7 study the degree of teaching input linked to TMM
varied depending on the demands ofthe TOLD and BLINGUA projects,
whereas for the TMM9 study there was no teaching input outside the
software program. In both studies we were interested in assessing the mal-
leability ofthe software to the overarching pedagogic requirements ofthe
institution and module, and whether our Model for Evaluation could pro-
vide an exportable diagnostic tool for gauging the intrusiveness and flexibil-
ity of pre-set learning content for any other language software programs.
Element present? In the

software (S) and/or the
TMM teaching (T)
CALL approach descriptor
7and9
TMM7 TMM9
Behaviouristic drill-and-practice (S) (S)

Communicative focus on meaningful output (S+T) (S)
Teacher-led didactic and directive, from the Online
(S+T)
front approach tutor only
(S+T)
CALL pedagogy
Student-centred autonomous or ID/LS TOLD no;

(S)
determined learning BLINGUA
mixed
Constructivist; instructed SLA; ZPD (S+T) (S)
e-learning
Blended learning mixed approach (S+T)
only
Blended learning mixed setting (T) (S)
TMM
Blended learning mixed resources (T)
only
Table 7.1 Comparing the different pedagogical approaches behind the TMM7
and TMM9 studies.
While both studies shared the fact that they were quasi-experimental stud-
ies, aimed at gleaning empirical data, and used undergraduates at the Uni-
versity of Ulster, their respective research designs were otherwise quite
different. Table 7.2 compares the different data-gathering methods. With
regard to TMM7 a fuller treatment ofthe overall TOLD and BLINGUA-1
(i.e. pedagogy) project designs is given in Chapter 8. Here those projects are
only considered as they relate to the TMM7 package, and not to detailed
matters of pedagogy. The key research design feature to note is that in the
TOLD and BLINGUA projects we would not be able to isolate language
learning progress made in the use of TMM7 from progress made as part
of the overall project. This is because the pre- and post-tests would apply
to the whole project most of which did not involve the use ofthe software
172 Chapter 7
package. The teaching scheme required students to dip in and out of the
software package as part of wider tuition involving discussion groups, paper
exercises, web-related activity and other language software programs such
as CLEF and HotPotatoes.
The TMM9 study made use, for the pre- and post-tests, ofthe Com-
puter Adaptive Test (CAT), which is a foundational, diagnostic tool built
into the TMM9 product.
Element present?
TMM Data collection method
TMM7 TMM7
TMM9
TOLD BLINGUA-1
Diagnostic survey of
Yes Yes No
prior learning
Diagnostic survey of
Yes Yes No
learning style
Post-treatment survey of
Yes Yes Yes

student reaction
Post-treatment student
No No No
focus group
Tutor
Post-treatment survey of
feedback No No
staff reaction
notes
Tutor
Post-treatment staff focus Yes tutors
feedback No
group log
notes
Electronic/paper log/
journal of student Yes Yes Yes
reaction
Quantitative/ empirical data
Test(s) of prior learning Yes No Yes (CAT test)

Yes (same as
Pre-test Yes Yes (week 1)
above)
Progress test (mid-
No No No
treatment)
Yes (CAT
Post-test (identical to
Yes Yes (week 5) test) used as
pre-+ progress test)
post-test
Table 7.2 MFE1 checklist for data collection methods. Mapping of TMM7 and TMM9.
Pedagogic context of TMM7 and TMM9 trials
The TMM Education (v.7) package was tested in the context ofthe teach-
ing of undergraduate (post A-Level/Leaving Cert) French only. With the
TOLD oral skills project this involved a CALL-based treatment group of
15 students who had access to the software and the lab and a comparison
group of 14 students who accessed similar but non-CALL content. In
the BLINGUA-1 writing skills project all 25 students had access to the
software and the lab as part of their area studies module. This cohort was
now divided into a treatment group (12 students) whose CALL-based
teaching was differentiated according to their dominant learning style,
and a comparison group (13 students) whose CALL-based teaching was
not differentiated.
The TMM Campus (v.9) trial, on the other hand, increased the range
of languages taught to five (French, German, Spanish, English and Ital-
ian). Three of the six available levels were used (Beginner, Intermediate
and Advanced). The overall number of students was 86 of which 47 were
Participants (i.e. had access to the materials for the duration), and 39 were
labelled Non-Participants (i.e. only had access to the pre- and post-test
for the purposes of comparison). Some of the participants who did two
languages chose to work on both oftheir languages for the trial, hence the
disparity between the above global total and the sum of the totals below.
The target cohorts were first to final year undergraduates. The students of
English and Italian were French foreign exchange students on the Eras-
mus programme. A fuller account of the participant details and further
background to the project was made but space does not allow for their
inclusion here.
While the TMM7 project took place over one semester (TOLD:
September-December, 2003; BLINGUA-1: September-December, 2003),
the TMM9 project was approximately six months in length: December
2006 until May 2007).
174 Chapter 7
Evaluation of strengths and weaknesses TMM7 and TMM9
Some general findings
Lafford (2004) has already reviewed the Spanish version of what appears
to be version 7 (Education), though she does not specify which version
she is looking at. What she says regarding content and progression within
the database will also apply to version 9 (Campus), which is identical to
Version 7 in that regard. Version 9 differs mainly in its adaptation to an
online, e-learning environment and its integration with sophisticated CAT
diagnostics, progress and summative testing. Laffords rsum summarizes
the strengths and weaknesses as she saw them with TMM7 as a networked
CD-ROM. Our primary judgment ofthe content ofTMM9 is that nothing
has changed, for good or ill, while the online CAT tests and web portal
functionality do represent significant value-added in terms of the adap-
tation of content to student levels, the accessibility and instantaneity of
feedback, and the liberation oflearning from the laboratory to a distance-
learning dimension.
While acknowledging the high-end graphics and excellent speech
recognition software that provides the learner multiple opportunities to
practice Lafford identifies key weaknesses that we found tend to handicap
its use in HE language teaching. For example, in both TMM7 and TMM9
the Cultural Workshop provides knowledge about some isolated cultural
facts from a sealed database of very short, and in some cases dated, texts.
The need for a functionality that allows the easy input of extended, up-to-
date authentic texts of a cultural, social or political nature for area studies
modules is currently not met. Nor are there any appropriate comprehension
or vocabulary related questions linked to these cultural texts. A potentially
significant advance on TMM7 is TMM9s access to an Authoring Tool,
developed with large commercial enterprises in mind such as Renault and
EDF, to enable their own technical training content to replace or comple-
ment the existing content in the Professional Situations route. This could
be a major selling point at higher education level as it would enable tutors
to import authentic and up-to-date texts with accompanying multiple-
choice style (assessable by the package) or open-ended questions (assessable
only by the tutor). In spite of its appeal, an early decision was made not to
use the Authoring Tool as the technical complications were prohibitive.
Uploading home-authored texts and exercises to Auralogs server in France,
while possible, would have involved the temporary suspension of their
globally accessible web-based materials every time it occurred; importing
the whole content onto a local server would have obviated this problem,
but in our case the language department server was not linked to the web
and so students could not access the materials outside ofthe local campus
intranet. The quest for an easy-to-use, open and updatable as well as web-
based package with the high-powered functionality of a program such as
TMM9 is definitely worth pursuing for the HE market. TMM9 has nearly
achieved it, though it needs a more user-friendly authoring tool.
Lafford then points out its suitability to the needs of individual learn-
ers, who are given a great deal of control over various elements ofthe pro-
gram so they can forge their own learning path, a point which our study
bears out, and lists the programs focus on pronunciation, structurally-based
curriculum, mechanical exercises, decontextualized interaction, and use of
culture capsules (mostly isolated from vocabulary and grammar exercises
and listening, speaking and writing activities) as reasons why it is out
of step with modern communicatively-based views of task-based foreign
language pedagogy views which are grounded in cultural authenticity
and the notion oflanguage as social practice (p. 32). Again, our trials con-
firm her findings, while we and most of our students would be less scath-
ing about the value of the pronunciation activities and some of the other
mechanical exercises that feature in the package. Also, it is hard to imagine
a product that, given the current limitations oftechnology, would be able
to deliver better non-structural, fully contextualized, communicatively-
based, task-based learning via a pre-packaged sealed database of content
and interactivity.
176 Chapter 7
Technical evaluation
Version 7, with its HTTP server, represented a clear improvement on the

previous version (5) we had installed. The only real criticism our techni-
cian had was the extensive use of Macromedia Flash in version 7, which
he felt slowed the overall performance. While Version 9 did not involve
installing any content for the students as all the content was held on the
Auralog server in France, it did, nevertheless, present several challenges
on initial access. A major prior concern was the fear that speech recogni-
tion would be sluggish over broadband. Auralogs downloadable plug-in
addresses this issue neatly. Through it all the speech recognition calcula-
tions are done on the client PC, and are therefore instant. Most technical
problems related to firewalls, security levels or pop-up blockers preventing
the full installation of the Auralog components (in particular the speech
recognition plug-in) or loading of pages. Several students had difficulties
accessing TMM9 at home. Essentially, the problems were no different
than those posed by the lab installation, though students did not have the
benefit of access to a technician.
The technical compatibility test that the online portal takes one
through, prior to registering is normally a quick process and tests that the
minimum requirements are met, that the Internet connection is sufficient
and that Flash Player is installed.
Once access has been gained by the learner to the web-portal, quite a
complex procedure is required to then ensure that the workstation is com-
patible with the requirements for running TellMeMore9. Those with little
knowledge ofthe workings of a PC would be discouraged by complications
with firewalls, Flash downloads, sound-cards and pop-up blockers. For
example, the window indicating that the configuration test is in progress
will not show unless the pop-up blocker is disabled. Furthermore, once this
window appears the pop-up blocker may also need to be disabled a second
time before the configuration test commences. On several occasions students
sat a while in front of an inactive configuration test simple window simply
because they did not realize that they had to disable the Pop-up blocker a
second time. The reference on this same window (to optimal user condi-
tions not being possible without this plug-in) may be improved by stating
that it is the (vital) speech recognition functionality that will be inactive.
This quite protracted procedure for accessing the product before a stu-
dent can merely start to use it presents quite an affective barrier to all con-
cerned (students, staff and in-house technicians). While travelling technical
support is available (at a price), one can imagine many an institution baulking
at the hurdle that installation presents. In many ways the procedure is easy
to understand despite the technical sophistication ofthe functionality, and
most students were very patient. One can only hope that with the advance of
technology and improvements in interoperability this will be simplified.
Data collection and findings (TMM7)
The quantitative data collected for the TMM7 study were largely incor-
porated into the data gathered for the respective studies the software was
used for whether TOLD or BLINGUA-1. These will be covered and the
findings reported on in Chapter 8. Discrete evaluation of the impact of
TMM7 within these studies relied for the most part on the qualitative data
gleaned from student logs and questionnaires, and stafffeedback. Some of
this has been reported in graphic and tabular format below.
Qualitative evaluation of TMM7
Staff and student reaction to TMM7 endorses most ofLaffords points both
positive and negative with the following caveats and additional points. Peda-
gogically, the main problem area concerns the mismatch between the self-
contained nature ofthe most ofthe activities and the way that teachers in a
given situation like to teach. In their feedback most staff echoed Roblyer et
al.s concerns (1997: 91) and saw the package as an all-or-nothing challenge
where they felt that ifthey were going to use it in a whole class context then
they would need to adapt completely their teaching style as well as the con-
tent oftheir classes to accommodate the package. Most preferred its use as a
self-access trainer in the mdiathque. Some staff also felt the highest levels
were not sufficiently taxing for the abler student at undergraduate level.
178 Chapter 7
Some qualitative findings from the TOLD Project (TMM7)
In the TOLD classes the drill-and-practice qualities of the phonetic fea-

tures lent themselves well to pronunciation and fluency coaching at the
start of an oral class. These features were found to be excellent in the pre-
communicative, or rehearsal stage of a sequence of instruction (Barr et
al. 2005).
Where the program remains weak, from the point of view of authen-
ticity in communication, is at the performance stage, where the limits of
technology do not allow for anything more real than the simulated inter-
active dialogues. Greater authenticity, as Lafford herself states, could be
brought in by toggling out ofthe program at a given point to, say, a teacher-
prepared activity involving research on the web followed by oral feedback
in a face-to-face context.
Many students expressed the feeling that, while they appreciated the
coaching for fluency and pronunciation in the program as well as the chance
for mistakes to be made without the embarrassment they might feel in a
group, nevertheless, they missed the spontaneity and human element of
an ordinary oral class. The native-speaker learning assistants endorsed this
view. These reactions underscored the conviction that oral work linked to
CALL, if it is to add value to non-CALL oral classes, needs to at the very
least include large amounts of authentic communication, such as might
be obtained through audio- or video-conferencing, or, at the very least,
text-based chat or CMC.
Some qualitative findings from the BLINGUA-1 Project (TMM7)
This project trialled a blended learning pedagogy for a written language

and an area studies module. TellMeMore was used in the written language
module for grammar rehearsal and testing, and in the area studies module
to kick-start research on a given topic. Students in the language module
found the dictation exercises and the sentence transformation activities
particularly useful. However, as with Lafford, the grammar workshop was
difficult to map to a sequenced programme of grammar tuition, and the
tutors and students preferred to use a separate grammar drilling program for
initial grammar input. For the area studies module the Culture Workshop
had material on a wide range oftopics. However, the passages were judged
too short and basic for university level. We gave web links and other sup-
port material to complement and extend these texts. The product, if it is
to support work in area studies, requires a greater degree of flexibility to
allow for teachers to bring in current texts and set up their own questions
within a pre-existing template (similar, say, to the HotPotatoes format).
In the absence of such flexibility teachers in HE are more likely to ignore
TMM for the teaching of culture and link to live and current pages on
the Internet. In many ways the ready availability of authentic, current and
free video material on the web (via streamed news sites, YouTube etc.),
from which teachers can rapidly generate lesson material, is making the
need for commercially-produced programs selling cultural material increas-
ingly redundant.
The Learning Paths (parcours pdagogiques) feature of TellMeM-
ore proved a useful means of differentiating activities and relating them
to various learning styles. Using Admin Tools and Tutor Tools to preset
student IDs and map learning paths to different students did initially take
a while to get used to but, once understood, proved to be a quick way to
customize student learning. Another useful feature of the package is the
student tracking and feedback functionality. TellMeMore automatically
scores student work, and displays this in tabular format which can then
be exported as text-files, html pages or to a spreadsheet. This is clearly a
welcome timesaving feature. The tracking includes a record of time spent
on a given activity.
Student reaction to TMM7
The two joint top scoring responses (15/17 students) in the student satisfac-
tion survey for TellMeMore after BLINGUA-1 were for overall enjoyment
and for the variety of activities in the program. Under activities enjoyed
most, listening activities (12/17 students) and exercises and games (10/17
students) were the most popular.
180 Chapter 7
In the student logs the students responded most negatively to the fol-
lowing features. They found the speech recognition activities occasionally
off-putting because the graph often did not give them a good score even
though even the tutor felt they had said the word or phrase well. Sometimes
even a native speaker did not always get a full 7/7. Secondly, the Speech
Recognition on the interactive dialogue was not always sensitive enough
and the students sometimes had to shout to get a reaction. When many
students are working in the same room this sometimes distorts feedback
from the PC. The third most frequently mentioned item was the hang-
man game. It only became enjoyable when they found they could translate
individual words in the clues by right-clicking and using the dictionary
otherwise, especially the weaker ones found it a little hard a case of either
you knew the word or you did not. The dictation exercises proved to be
the least popular activity (for 8/15 students), with pronunciation drills
and speech recognition activities coming close behind (7/15 each).
Auralog have worked hard at the sensitivity/accuracy of the speech
recognition software (the downloadable speech recognition plug-in for
TMM9 has contributed significantly to this); it is notable that in the more
recent TMM9 study more students reacted favourably to the speech recog-
nition, phonetic drills and dictation exercises. For the most part students
reported that they were happy with the program and would use it alone,
if it were available. The new TMM9 (Campus) edition would enable just
such an autonomous extra-mural use.
Since TMM9 used for the most part the same content as TMM7
(Education) we were primarily interested in any impact the new elements
made: the new mode of delivery (i.e. distance or e-learning, the role of
the web-portal) and the web-based computer-adaptive tests which were
designed to gear the learning more specifically to student needs and enable
closer tracking of learning gains. We also wanted to know to what extent
a new way of teaching would be required.
Data collection and findings TMM9
The TMM9 study revealed some of the weaknesses of a voluntary study.

For example, the oft-voiced student claim that they would use the package
more if they could access it outside class, while well-intentioned, was not
readily fulfilled by most of the students in the second study when usage
of the program was voluntary and in their own time. This finding raises
general questions as to how software is most effectively integrated into a
students learning. In this regard, the main dilemma raised in discussions
with language staff and students tends to concern whether use of an applica-
tion should be integrated into the schemes of work and linked to modular
assessments or merely made available as an optional self-access resource. The
various studies featured here show that software remains severely underused
when merely available as an optional remedial resource and not integrated
into teaching and/or assessment. This echoes McCarthys (1999: 5) warn-
ing that students do not take voluntary add-on material seriously for the
reason that, and here he cites Bull and Zakrzewski, if it is not worthy of
lecturer attention it is not considered worthy of theirs (1997: 17, cited in
McCarthy 1999).
Internal and external validity issues are tabulated in the MFE1 table
(Table 7.3). Inasmuch as volunteers can be called random, this study involved
a random assignment of subjects to groups. One should point out that it
is possible that more motivated and more able students are more likely
to volunteer than the less able and less motivated. However, the student
feedback and knowledge of the students who did volunteer suggests that
there was a good representation of weak students who were aware oftheir
need for remedial support and perceived involvement in the TMM9 study
as a possible way of obtaining this support. The large drop between the
number of participants and the number of complete data sets was the most
disappointing feature ofthis study. Had its conclusion not coincided with
the examination period it is likely that the completed data sets would have
been larger. That was the original intention, but delay in the start time of
the project due to logistical problems linked to the software company led
to slippage in the scheduling.
182 Chapter 7
Model for CALL evaluation (MFE1) Validity checklist

Element
TMM9 Validity criteria addressed?
Yes/No/Detail
Quasi-
manipulated) or quasi-experimental (variables cannot
experimental
be controlled/manipulated) study?
Yes
Have the respondents been isolated from each other? CJ
Are the results attributable to the factor(s) studied? See report
Internal validity
What other variables might have contributed to the Hawthorne;

effect? CHILL; halo;
Extra-curricular
learner/ teacher differences, variable settings, time of
activity
day/week/year)?
How certain are you the learners are not getting
They were!
language instruction apart from through this study?
See report
happened?
Yes
N = 95; but full
data sets N = 11!
Yes
parametric tests
External validity
populations, settings and experimental situations? See conclusions

Does the report describe the skills tested? Yes
Does the report describe the characteristics of the
subjects? (i.e. age; gender; ability/year group; cohort/ Yes
course)
Does the report describe the CALL materials used? Yes
Table 7.3 Validity assessment criteria for MFE1: Mapping of the TMM9 project.
CJ = cannot judge.
Some qualitative findings from the TMM9 study
The full monitoring of student reaction through pre-, mid-, post-treatment

surveys, focus groups and participant logs is a worthy aim. However, logis-
tic and time constraints, not to mention staff resistance, may not allow
for this and there is also the danger of over-intrusive monitoring which
can be counter-productive and skew the qualitative data. Given that the
TOLD and BLINGUA projects themselves included all of the above, it
was decided on this occasion to opt only for student logs and a summa-
tive survey. Most feedback comments related to the content and were, not
surprisingly, similar to the TMM7 content feedback. As for the difference
that having access to the product outside of class-contact hours and outside
ofthe campus buildings, there were surprisingly few comments. There was
general expression of disappointment that they had not spent as much
time using the product as they originally intended. Some commented on
the technical challenges of accessing the product from home. Regarding
pedagogy a number of students voiced a desire for greater integration of
the product into their normal learning.
Students KT and FG, for example, found the program useful for home-
based learning, but (Q.11) would have wanted this to be a requirement of
the course. Five students ( JC, AP, KT, LM, FG, and CN) felt it would
have been better to integrate TMM9 assessments into existing modules
assessments. There was unanimity here but with some caveats, such as it
should only be a small percentage of the marks. Most were in favour at
least of the program being used regularly in classroom teaching for the
purposes of grammar reinforcement, oral work (esp. pronunciation), lis-
tening comprehension and vocabulary acquisition.
As regards the tracking feature there was a certain ambivalence typi-
fied by CN, who saw both sides of knowing that there was a tutor able to
monitor use: A possible weakness, though I cant really complain about it,
was the fact that it was monitored exactly how much you used it and you
felt under pressure to use it for a certain amount every week even if you
have a demanding timetable but this is actually good because it puts a bit
of pressure on you to use it more often (CN Q.10).
184 Chapter 7
Thus, while TMM9 did liberate the students from the classroom the
freedom thus generated served to pull them back towards a greater desire
for external controls, be they via guided and integrated content, or the
knowledge that they were being watched by a tutor. Would this motiva-
tional impetus, however, contribute to improved time spent on task, and,
more importantly, improved learning gains?
Some quantitative findings from the TMM9 study
From the outset it was agreed that the CAT tests would form the basis ofthe
test for learning gains, as it had been designed to do this as part ofthe inbuilt
provision ofthe online courses. It was possible because it drew on the same
database of questions, and so was able to compare like with like (though
not necessarily same with same). Furthermore, even though devised by a
company that knew nothing of our students language levels, as they were
adaptive, both tests were a good gauge of the students current ability.
For the pre-test all the students (that is, the participant group (PG)
and the non-participant group (NPG) would take the placement test (test
de positionnement) at the start and the volunteers would be given access
to the TMM online for the period. They would follow the guided mode
(mode guid) and agree to spend at least 2 hours a week on the TMM
online material. Towards the end ofthe trial period the students would all
complete the progress test (test de progression), which was drawn from
the same database of questions as the placement test, and was the same
length, thus providing comparability.
Quantitative evaluation would therefore involve the following statisti-
cal tests: comparing sample means oflearning gains between groups (treat-
ment and comparison) and within groups, looking at progress between the
placement test and the progression test in particular.
Due to time constraints, and probably an element of over-monitoring,
the number of complete data sets was markedly reduced compared with
those ofthe pedagogy study that was going on at the same time. While the
pre-test took place for the entire cohort, the post-test was only completed by
a small proportion ofthe group due to the fact that we were unable to gather
all the classes together to complete the progress test under examination
184 Chapter 7
Thus, while TMM9 did liberate the students from the classroom the
freedom thus generated served to pull them back towards a greater desire
for external controls, be they via guided and integrated content, or the
knowledge that they were being watched by a tutor. Would this motiva-
tional impetus, however, contribute to improved time spent on task, and,
more importantly, improved learning gains?
Some quantitative findings from the TMM9 study
From the outset it was agreed that the CAT tests would form the basis ofthe
test for learning gains, as it had been designed to do this as part ofthe inbuilt
provision ofthe online courses. It was possible because it drew on the same
database of questions, and so was able to compare like with like (though
not necessarily same with same). Furthermore, even though devised by a
company that knew nothing of our students language levels, as they were
adaptive, both tests were a good gauge of the students current ability.
For the pre-test all the students (that is, the participant group (PG)
and the non-participant group (NPG) would take the placement test (test
de positionnement) at the start and the volunteers would be given access
to the TMM online for the period. They would follow the guided mode
(mode guid) and agree to spend at least 2 hours a week on the TMM
online material. Towards the end ofthe trial period the students would all
complete the progress test (test de progression), which was drawn from
the same database of questions as the placement test, and was the same
length, thus providing comparability.
Quantitative evaluation would therefore involve the following statisti-
cal tests: comparing sample means oflearning gains between groups (treat-
ment and comparison) and within groups, looking at progress between the
placement test and the progression test in particular.
Due to time constraints, and probably an element of over-monitoring,
the number of complete data sets was markedly reduced compared with
those ofthe pedagogy study that was going on at the same time. While the
pre-test took place for the entire cohort, the post-test was only completed by
a small proportion ofthe group due to the fact that we were unable to gather
all the classes together to complete the progress test under examination
conditions in the final week ofterm due to staff reluctance to make available
teaching time for the completion of the test. Students were then asked to
complete it in their own time and an extension to the trial into the exami-
nation period was granted. This enabled a few more students to complete,
but by no means the whole cohort. Table 7.4 shows the spread of students
across the different languages and years for both the participant and non-
participant groupings. We can learn some things from an inspection of dis-
tribution of participants and non-participants across the languages and year
groups. Table 7.4 portrays the distribution of students by language group
and year. Participant numbers generally reflect the proportional difference
in cohort sizes within those language groups on the campus. In the larger
cohorts students clearly feel freer to opt out of participating, whereas in the
smaller cohorts we tended to have 100 per cent participation.
Module Total cohort size No. of participants No. of non-participants

French (year 1) 21 9 12
German (year 1) Did not participate in trial
German (year 2) 6 6 0
German (year 3) 2 2 0
Spanish (year 1) 15 8 7
Spanish (year 2) 5 5 (incl. EFL) 0
Spanish (year 3) 6 6 0
English (EFL) 9 8 1
Total 107 68 39
Table 7.4 Group descriptives by language and year group (TMM9).

Note: Some students do two languages and chose to access both languages for the trial,
hence the disparity between the cohort sizes and the number of participants.
Table 7.5 shows summative data linked to the completion of various ele-
ments of the trial. The figure of 11/107 represents the total number of
students for whom we have complete quantitative data sets. The figure of
186 Chapter 7
8/107 represents the total number of students for whom we have complete
qualitative and quantitative data. Full data sets, that represent between 7
and 10 per cent ofthe original sample size, would be statistically practica-
ble if the original sample size was 350+. As it is, we may make speculative
inferences from the data we have to work with, but speculative is all they
will be. However, given that we have qualitative data to complement the
empirical data our speculative inferences may carry a little more weight.
Of course, the judgmental feedback from the students is valid in its own
right, though only to the extent that it is a truthful record ofthe students
experience, and most CALL impact studies rely entirely on such qualitative
data, however, for the purposes of configuring data collection methods,
this Case Study fell short of our desired objective.
TMM9 trial data collation

Item No. Comment
Total number of student For this, students accessing two languages are
107
registrations counted twice
Total number of placement
95 participants + non-participants
tests completed
Total number of students participants + non-participants (= licences
86
in trial used)
Number of participants
accessing more than one 23 Not all completed the second placement test
language
Number of participants
47 treatment group
with access to material
Number of non-participants 39 With suspended access comparison group.
Fr/Ge/Sp/En. Italian was accessed by two
Number of languages 4
Erasmus students but is not taught at UU
Number of modules FRE103C1, GER101C1, SPA101C1, FRE303C1,
6
involved FRE501C1, EFL303C1
Number of complete data That is, those students from the treatment
sets for the purpose of AND the comparison groups who completed
11
assessing learning gains both the placement test (= pre-test) and the
during the trial progress test (= post-test).
Number of complete data (two of these are actually one student who
8
sets in the treatment group accessed two languages)
Number of complete data
sets in the comparison 3
group
Survey returns 8
Table 7.5 General statistics describing participation (TMM9).
Table 7.6 records the levels and overall time spent on task by the treatment
group only. Noteworthy in the table is the very low number of students
who spent more than ten hours on the package. Several recorded in their
logs that they had spent more than they actually had. Other (more honest?)
students recorded their own disappointment with themselves at the lack
of time spent and also stated their desire to have the product integrated
into their modular studies rather than as an adjunct.
No test level
Advanced+
Intermed.+
Advanced
Intermed.
Expert
0 05 610 1115 1620

Total
TMM9
hrs hrs hrs hrs hrs
Overall 5 1 17 34 6 63 8 16 34 8 4 1
French (yr 1) 0 0 0 8 0 8 1 3 5 0 0 0
French (yr 2) 0 0 1 6 2 9 0 3 3 2 0 1
French (yr 3) 0 1 3 11 0 15 0 5 10 0 0 0
German (yr 1) 0 0 0 0 0 0 0 0 0 0 0 0
German (yr 2) 0 0 2 1 0 3 3 0 2 1 0 0
188 Chapter 7
German (yr 3) 0 0 0 1 1 2 0 0 2 0 0 0
Spanish (yr 1) 0 0 1 5 1 7 1 2 5 0 0 0
Spanish (yr 2) 0 0 3 2 0 5 0 1 3 0 1 0
Spanish (yr 3) 1 0 2 0 0 3 3 1 2 0 0 0
English (EFL) 4 0 4 0 1 9 0 0 2 5 2 0
Italian 0 0 1 0 1 2 0 1 0 0 1 0
Table 7.6 Language levels, participation and total time spent

(treatment group) TMM9.
The mean increase (i.e. learning gains) pre to post-test for the treatment
group was 0.7 (or 7 per cent). The mean increase pre to post-test for the
comparison group was 1.13 (or 11.3 per cent). Neither of these increases
was shown to be statistically significant when submitted to an independ-
ent samples t-test.
Conclusions from the TMM9 study
There are various speculative inferences that can be drawn from a configu-
ration of the qualitative and quantitative data. The poorer (though not
significantly so) performance ofthe treatment group hints at a number of
tentative conclusions to be drawn. First, it is clear that insufficient time was
spent by the treatment group on the product to make a significant impact
on their learning. Secondly, the slightly greater learning gains made by the
comparison group, who had no access to the product, suggests that they
benefited from possibly having more time to devote to their other language
learning. The treatment groups time commitment to the software, on the
other hand, was not sufficient of a critical mass to bring them any real com-
parative benefit. Thirdly, the product needs to be trialled as an integrated
element of normal studies and modular assessment to test the students
assertion that they would then take it more seriously. To conduct such a
trial on a sufficiently large scale to obtain the required generalizable data
necessitates, however, an institution-wide decision to integrate it across
the board. Such a decision, in most institutions, would only be based on
evidence that the product would bring the desired benefit, evidence which
would only be available from just such a trial! This returns us to the chicken
and egg argument that to obtain the data we need to demonstrate effec-
tiveness often involves us in the ethical dilemma of potentially favouring or
disadvantaging groups of students, the only way round which is to ask for
volunteers, who in turn will not be able or willing to commit the necessary
time outside of their normal hours to make the study viable.
Further trialling of the product at HE level, preferably over a full
academic year, will be necessary to confirm or reject the hypothesis that
its consistent and integrated use actually does lead to improved learning
gains. The generally positive student reaction to the product, and some
ofthe statistical data, suggest that the products fuller, or at least blended,
harnessing within an integrated programme of study is defensible at the
very least from a motivational point of view. Whether it will indeed yield
the significant learning gains that would justify both its expense and a
management decision to use it for more than just a self-access trainer is
another matter entirely.
General conclusions
From TMM7 to TMM9: A new way of tutoring
For teachers used to the networked CD-ROM version and the teaching
approach required to deliver it in a laboratory environment, the switch
to an online/distance tutoring mode may involve quite a conceptual and
pedagogical leap. Some ofthe features that might have remained redundant,
such as the email link to ones tutor and the bulletin board immediately
become more useful. Who needs to email their tutor when s/he is in the
room? Likewise the tracking facility becomes essential viewing when one
is not seeing the students at work. Knowing how many hours per week and
the date oflast usage ofthe package are most helpful and instantly flag up
those students who are slacking.
190 Chapter 7
The teacher mode for TMM9 is now no longer a front of class ped-
agogue but rather a facilitator who can intervene more as monitor or
consultant. Clearly dangers exist if the online mode is used entirely in its
distant mode in other words with no classroom contact (prsentiel);
weeks may pass without a student feeling they wish or need to commu-
nicate with the teacher, and a teacher who does not check the progress of
students regularly will not be so immediately aware of problems compared
with the instant feedback one gets from noticing the absence of a student
in a classroom or lab situation.
This e-learning environment probably is better suited to a business
learning environment where the learner probably has no other academic
commitments and where the online tutor is more available to provide prompt
feedback to students and to monitor what is happening with each student
from day to day, than to a higher education setting where the tutor and stu-
dents primary interaction is in the classroom or face-to-face tutorial, and
where constant online monitoring and feedback is less likely to occur.
Challenges of integration and blending
TMM9, like TMM7, risks being ignored, or else purchased and then
ignored, due to the impossibility of adapting the cultural content to exist-
ing curricula. The manufacturers may point to its vast size, and extensive
functionality and its dynamic mode, or to the fact that it rehearses all the
major language learning skills or to the vast number oflearning paths and
customizable permutations, but this does not get around the fact that
Auralog has decided the themes covered and has chosen the content of
the databases and the cultural workshops.
On the one hand the software needs to adapt to the changing edu-
cational context. On the other hand one could also argue that if teachers
are going to continue to recruit, inspire and retain students with the fun
of learning and the wow factor that CALL can bring, then they need
to consider adapting their modules to include time allocations as well as
assessment requirements involving programs, such as TMM7 or TMM9,
if they can be shown to reduce the teacher workload, motivate students
to learn the language in their own time, and can be matched to suit the
students level of learning and learning needs.
Blended learning/teaching is increasingly being seen both in busi-

ness and education as a possible way forward in an increasingly dynamic
and multimodal and multimedia knowledge economy. Language teachers
have to mix and match the resources they use, be they paper-based, digital
or human. They also have to adapt to a mix of learning environments, be
they classrooms, seminar rooms, digital labs, self-access learning resource
rooms, offices or homes.
Language learning software needs to take this into consideration and
develop flexibility and an increasing customizability. There is an increasing
trend to return to simple old and cheap software packages that rehearse
discrete aspects of language such as grammar or reading or vocabulary or
listening. Large packages, ifthey are to compete, need to be adaptable both
in functionality (which TellMeMore does very well), and content (which
TellMeMore does less well). If the product can adapt its content for, say
Renault or EDF, why can it not do the same for a school or university?
Making the TMM9 Authoring Tool something that teachers can use with
the ease, say, of the HotPotatoes package may well be a way forward as
regards encouraging a greater engagement by university (and even school)
practitioners. For example, the tool could be given a wider number of activ-
ities (such as cloze tests, matching exercises, and jumbled sentences), the
importing of media objects (such as video and audio) could be enabled,
and all of these stored on local servers, or released as shareware.
TellMeMore in the light of different CALL models for evaluation
When set against the twelve CALL Enhancement Criteria (see Table 7.7)
for evaluation ofCALL task appropriateness the TellMeMore package, in
the two formats we have trialled, still seems to be transgressing a number
of principles of good design. The student and staff reaction indicated a
shortcoming in the evaluative criterion positive impact. The datedness
of the video material is linked to the evaluative criterion of authenticity,
or lack of it, as Chapelle defines it: the degree of correspondence between
the learning activity and target language activities of interest to learners
out of the classroom (2001: 55). Students are nowadays, as ever, sensitive
to fashion and topicality and where this is absent a corresponding affective
hurdle can be raised in the students mind. The technical problems raised
192 Chapter 7
by the students raise some, albeit minor, doubts about TMMs fulfilment
of Chapelles sixth criterion: practicality, that is, the adequacy of the
resources to support the use ofthe CALL activity, though TMM9 repre-
sents an advance on TMM7. The crossword and Hangman exercises do not
seem to meet fully the criterion oflanguage learning potential, that is, the
degree of opportunity present for beneficial focus on form. The customiz-
able learning paths and CAT tests fully meet the criterion of learner fit,
by providing significant opportunity for engagement with language under
appropriate conditions given learner characteristics. However, we were not
able, with the experimental constructs we had particularly for the TMM9
trial, to test adequately the notion of engagement with language under
appropriate conditions as the product was generally underused and some
students would have welcomed a more integrated engagement.
12 CALL
Enhancement Definitions TMM7 and 9
Criteria
Mixed; Yes for dialogues;
Language degree of opportunity
dictation and some grammar
learning present for beneficial focus
exercises; No for crosswords,
potential on form
hangman
amount of opportunity for
Yes for learning paths; no
engagement with language
Learner fit for adaptability of content to
under appropriate conditions
different teaching settings
given learner characteristics
Chapelle (2001)
Yes, tutors can (de-)activate

translation and glossary func-
extent to which learners tion; once activated students
Meaning
attention is directed toward have full control of these func-
focus
the meaning of the language tions. Some meaningless drill
still possible in the phonetic
exercises for example
Mixed. Much dated and
degree of correspondence
Eurocentric material with poten-
between the learning activity
tial to alienate students. No
Authenticity and target language activities
hyperlinks to live web material
of interest to learners out of
possible. Easy-to-use Authoring
the classroom
Tool would help greatly
positive effects of the CALL Generally yes, but the caveats

Positive
activity on those who partici- mentioned above and below
impact
pate in it damage the good impression
Technical challenge at start
adequacy of the resources to when installing plug-in
Practicality support the use of the CALL and accessing the portal.
activity Improvements in speech recog-
nition functionality observable
A comprehensive skills trainer.
Language
the ability to deliver, either in Also allows for combined skills
skills and
isolation or in combination, activities. Closed database does
combina-
all the main language skills not customizable for updatable
tions of skills
civilization content
the degree of opportunity for Customizable learning paths.
Learner
self-directed, self-paced and Full autonomy and self-pacing
control
autonomous learning possible
the extent of automated
Error cor- Detailed and exhaustive error
correction of error, and the
rection and correction, feedback, tutor
monitoring, tracking, storing
feedback monitoring and reporting
Leakey
and reporting of achievement

degree of opportunity This package is not designed for
Collaborative
for paired, group or class collaborative CALL. The PC is
CALL
interaction the partner!
Teacher input possible through
the influence of individual
Tutor Tools, customiza-
Teacher teacher personality and style
tion of learning paths, email
factor factors on CALL; staff train-
tutor function, and tracking.
ing and development
Otherwise a closed system
capacity of the CALL Designed for students working
Tuition
resource to enable the deliv- alone, and learning from the
delivery
ery of a variety of teaching package, with input from the
modes
modes tutor only if requested
Table 7.7 TMM7 and TMM9 mapped against the twelve CALL Enhancement Criteria.
MFE2 will allow for a judgmental scoring ofCALL programs against the
above criteria and will direct researchers, language department managers
and teachers to question the capacity and flexibility of the teaching and
learning environment to integrate fully the package into its demands and
needs. To this end the twelve-point checklist will be supplemented with
quality control checklists that also draw on the full range of evaluative
194 Chapter 7
principles already mapped from the literature. Tables 7.8 to 7.9 are inter-
mediary examples of evaluations of TMM7 and TMM9 based on those
authors whose criteria are relevant to software judging.
Author
PPP
Descriptor TMM7 TMM9

Overall objectives
Levels of
Beginner to Advanced Beginner to Expert
and structure
competence
Pre-set paths or Pre-set paths or
Course structure
customizable customizable
Lesson structure Topic or skills based Topic or skills based
Language learn- Teacher-led or self- Teacher-led or self-
Methodological issues
ing methods study study

Online tutoring;
Networked or self- Free to roam and
CAL study CD; Free to Guided Mode (par-
methodology roam and Guided cours); placement
Mode (parcours) and achievement
CAT test
Ingraham and Emery (1991)
Colourful; Flash- Colourful; Flash-

The televisual
driven; not always driven; more
environment
intuitive intuitive
Programs
The windows Flash-driven;

Flash-driven; navigable
environment navigable
Not always intuitive;
More intuitive; slick
screen-tips and help
layout, navigation a
Screen design available; some hidden
little easier, but still
elements should be
Interface issues
confusing in places
more visible
Wide range of inter-
Wide range of inter-
activity; but sealed
activity; but sealed
Hypermedia and content; linear pro-
content; linear progres-
linearity gression but also
sion but also heuristic
heuristic approach
approach possible
possible
Balanced (can be both Balanced (can be
Autonomy versus
teacher- or learner- both teacher- or
control
driven) learner-driven)
Autonomy and
Very good Excellent
self-tuition
Euro-centric; video
Euro-centric; some
Authenticity material needs
dated video material
updating
Active and pas- Mainly active; all skills Mainly active; all
sive learning trained skills trained
Highly interactive
Highly interactive;
Interaction and excellent feedback
very good feedback and
response and exhaustive
tracking
tracking
Table 7.8 Summary of the comparative features of TMM7 and TMM9.

Ingraham and Emerys model for evaluating courseware/software (MFE1).
Tables 7.8 and 7.9 summarize the comparative features of TMM7 and
TMM9 using Ingraham and Emerys and Hubbards models for courseware
evaluation. The italicized boxes highlight where there are significant differ-
ences (usually improvements) between the two versions of the software.
These show up, in particular, the increased degree of autonomy, monitoring
and feedback that the Campus version has brought to the product.
In the context ofthis study, the TellMeMore Case Study has enabled
the road-testing of the proto-typical evaluative framework MFE1 with
regard to commercially produced language learning software. This frame-
work employed various criteria from six CALL authors (Pederson, Dunkel,
Ingraham and Emery, Hubbard, and Chapelle) who have contributed in
their different ways to the conceptualization of criteria for the design and
pedagogical implementation of software over the past two decades. For
MFE2 these criteria will support the judgmental evaluation using primarily
the twelve CALL Enhancement Criteria. Further work will also be required
to test the frameworks adaptability to the full range oflanguage courseware
and software types: from commercial to home-produced packages, from
simpler single language skill trainers to sophisticated multi-skill packages,
from stand-alone CD-ROMs to networked CD-ROMs and Internet-based
e-learning tuition systems complete with tutor support.
It is time now for the next road-test of our evolving methodology: the
evaluation ofPedagogy. Chapter 8 will feature the TOLD and BLINGUA
projects carried out at the University ofUlster. The TOLD project focused
on CALLs impact on oral skills, and the BLINGUA project that looked
at blended learning, with particular reference to writing skills, comprehen-
sion and area studies.
196 Chapter 7
Author
PPP
Descriptor TMM7 TMM9
Provides meaningful
Yes but some less Yes but still some
communicative interac-
meaningful, and some less meaningful,
tion between student and
dated and some dated
computer
Excellent progres-
Provides comprehensible Very good progres-
sion aided by CAT
input at a level just beyond sion some exercises
test still some
that currently acquired by (e.g. Hangman) mis-
mismatching of
the learner matched to levels
activities to levels
Promotes a positive self Positive feedback,
Positive feedback, fun
image in the learner fun
Motivates the learner to Yes (some frustrating Yes (some access
use it bits) issues at start)
Hubbard (1988)
Motivates the learner to

Yes Yes
Programs
acquire the language

Provides a challenge but Challenge and less
Challenge yes but also
does not produce frustra- frustration than
frustration
tion or anxiety TMM7
Degree of error
Does not include overt error Degree of error correc-
correction is
correction tion is customizable
customizable
Yes, but simula-
Yes, but simulations
Allows the learner the tions and writ-
and writing tasks
opportunity to produce ing tasks are still
are limited to closed
comprehensible output limited to closed
database
database
Only interaction is
Acts effectively as a catalyst Only interaction is
with the software
to promote learner-learner with the software
can be linked
interaction in the target can be linked to wider
to wider class
language class activities
activities
Table 7.9 Summary of the comparative features of TMM7 and TMM9.

Hubbards agenda for evaluating courseware/software (MFE1).
Chapter 8
Case Study 3: Evaluating pedagogy
Introduction
University of Ulster The migration from analogue to digital platforms
Managing effective migrations to digital environments requires the adop-

tion of good practice and ongoing monitoring. Effectiveness research, and
with it the configuration of data-harnessing approaches that capture and
portray a richer picture of the impacts of change, must play a key role in
such transitions. What follows is the pedagogical and evaluative narrative
behind just such a migration. It draws on the lessons learned over three
years (20032006) within the setting of undergraduate language studies
at the University of Ulster, and, in particular, from two CALL research
studies looking at different language skills.
The move to adopt an empirical approach followed studies of staff
and student reluctance and resistance to CALL carried out at the Uni-
versity of Ulster by Gillespie and McKee (1999b) and Gillespie and Barr
(2002). Adopting new technologies just because they were new, exciting
and effective elsewhere did not guarantee improved learning gains in lan-
guage acquisition. Furthermore, qualitative evidence of positive impact may
not necessarily show that students are learning languages better because
of the new platforms, programs or pedagogy. The literature was showing
a clear need to provide empirical evidence from more longitudinal stud-
ies, ideally with non-novices in the CALL environment (Hubbard 2004:
165, and 2005: 352). The TOLD and BLINGUA research teams, partly in
response to the research agenda of Hubbard, Felix, Pederson and others,
and partly responding to management aims and the Centre ofExcellence
198 Chapter 8
goals, wished to identify what factors were contributing to any gains, create
a bank of qualitative and quantitative evidence of good practice, diagnose
pedagogies that work in situ and ultimately mitigate staff and student
reluctance to embrace CALL.
Introducing TOLD and BLINGUA
The TOLD (Technology and Oral Language Development) and the BLIN-
GUA (B is for blended) projects aimed to identify and correlate learn-
ing gains with a number of different variables such as learning style, prior
familiarity with ICT, blend of environment and pedagogical approach
in the context of teaching different language skills. TOLD looked at the
teaching of oral skills, BLINGUA looked at comprehension and writing
skills, in particular in area studies (civilization) teaching.
TOLD assessed student progress in oral skills across two groups, one
using technology and the other a traditional conversation class. BLINGUA
was designed as a longitudinal study over two to three years looking, in
particular, at the teaching to first and second year undergraduates of area
studies. Both projects have generated published material (Barr, Leakey
and Ranchoux (2005), and Leakey and Ranchoux (2005), respectively);
this Case Study will only approach these projects and data from the new
angle of our effectiveness research model.
The TOLD study
Background to the research on CALL and oral skills
The TOLD project focused on oral skills training (responding to the com-
puter using speech recognition software) and oral communication within
the classroom/e-lab between students and with the teacher (responding to
a human using headsets and prompts from computer-delivered content).

Our guiding themes were using technology to enhance oral communica-
tion in face-to-face instruction and technology and the assessment of
oral language development. We chose to approach them from a primarily
pedagogical angle to ascertain and indeed quantify, the effectiveness of
oral language taught to undergraduates (mostly novices to CALL) over
one semester in a technological setting as against the opposing conditions
of a technology-free environment.
Our approach drew primarily on communicative and constructivist
theories of second language acquisition with activities geared to flow in a
progression from rehearsal to meaningful performance. Key constructiv-
ist principles that applied were multi-modality and content-based learn-
ing (Warschauer and Healey 1998), a strong focus on learner-centredness
with the teacher acting as a facilitator and the learner free to make his or
her own interpretations, student ownership of learning fostered through
reflection and self-awareness of knowledge construction (Driscoll 1994),
and scaffolding for knowledge construction through student collaboration
and Zones of Proximal Development (Vygotsky 1978). Table 8.1 features
the proto-typical evaluative checklist (MFE1) of CALL pedagogy, and
blends of pedagogy as realized in the TOLD project. Some features of
the pedagogy, such as the behaviourist elements were determined by the
nature of the software.
200
Model for CALL evaluation (MFE1) CALL pedagogy checklist
Element How well done?

Degree (03) Evidence
present? (05)
TOLD
CALL approach descriptor 0 = not at all 0 = poorly

1 = minimally 1 = minimally
relate comments to the fuller
Yes/No 2 = somewhat 3 = to a great extent
definition of each descriptor
3 = fully 5 = excellently
CJ = cannot judge CJ = cannot judge
Behaviouristic: drill-and- In TMM phonetic (ASR)

Yes 2 3
practice work
pedagogy
With Learning Assistant

Communicative: focus on
Yes 2 3 for group discussion and
meaningful output
presentations
Chapter 8
Teacher-led: didactic and For explanation of tasks and
Partially 2 3
directive, from the front group discussions
Negotiated goal-set-
Constructivist: instructed
Partially 2 2 ting; reflective learning;
SLA; ZPD
scaffolding
Student-centred: autonomous While on TMM and for pair

Partially 1 1
or ID/LS determined learning work + presentation
CALL
Blended learning: mixed P-P-P and Variation depend-

Yes 3 2
approach ent on the task
Treatment group only in

Blended learning: mixed
Yes 3 4 CALL lab + Only for experi-
setting
mental design purposes
Blended learning: mixed AV, TMM, Text, circle,

Yes Yes 3
resources WWW, Learning Assistant
Table 8.1 Model for CALL evaluation (MFE1) CALL pedagogy checklist.
201
202 Chapter 8
Research questions and methodology for TOLD
This project addressed the following questions:
1. Does computer technology enhance significantly progress in students

oral language development?
2. What factors may affect students oral language development when
using computers?
3. How do staff and students react to the use of computer technology
for oral language development?
Our null hypothesis was that a CALL environment makes no difference

to learning gains in oral language development.
Context of the TOLD project
After completing the pre-test together in the old analogue lab the cohort
was divided into two groups. The students were divided into four small
conversation classes along course lines taught by native-speaker Learning
Assistants. Two of the groups were comparison groups denied access to
technology, but taught with similar content. The two treatment groups
were taught in the multimedia laboratory. The project focused on the single
hour per week allocated to French conversation classes. Students also had
five other hours per week of other language tuition.
TOLD data collection and evaluation methods
The data collection and evaluation methods were in keeping with the MFE1
framework and are summarized below and in Table 8.2. Three qualitative
surveys were the surveys used at the start of all the Case Studies: a Language
Experience Questionnaire, an ICT-use survey, and the VARK learning style
survey. All students kept a reflective journal. Quantitative measurement
of learning gains relied on a pre- and post-test which all students sat, and
which involved a pronunciation task, some personal questions, a listening

comprehension presented initially without transcript ofthe text and ques-
tions, and then with these, to which the students recorded oral answers,
and lastly, an oral rsum of an extract of a television documentary.
Model for CALL evaluation (MFE1)

quantitative and qualitative measures checklist
Element
TOLD
Data collection method present?

Yes/No
Diagnostic survey of prior learning Yes
Diagnostic survey of learning style Yes

Post-treatment survey of student reaction Yes
Post-treatment student focus group No

Post-treatment survey of staff reaction No
Post-treatment staff focus group No
Electronic/paper log/journal of student reaction Yes
Test(s) of prior learning Yes
empirical data
Quantitative/
Pre-test Yes
Progress test (mid-treatment) No
Post-test (identical to pre-+ progress test) Yes
Table 8.2 MFE1 checklist for data collection methods: Mapping of the TOLD project.
204 Chapter 8
TOLD project environment, content and delivery
The environment
The Faculty of Arts at the University of Ulster is spread across four cam-
puses. Our project utilized the language resources available on the Coleraine
campus. The facilities included a small new multimedia lab (sixteen worksta-
tions) and an old analogue audio-visual laboratory (twenty workstations).
The multimedia classroom was equipped with the Robotel SmartClass2000
digital platform (already looked at in Chapter 6).
TOLD pedagogical approach
Our pedagogical approach combined small group face-to-face discussion

oftopical civilization issues with the use ofCALL software for pronuncia-
tion drilling. For the sake of comparability the comparison group covered
the same discussion topics and included a pronunciation-drilling element
in their routine. For both groups activities were designed to progress from
rehearsal to meaningful performance. We were conscious that the software
gave no opportunity for face-to-face communication for the treatment
group. We therefore ensured that lesson plans shared a similar structure for
both groups involving progression from pre-communication to rehearsal,
information retrieval, assimilation and final meaningful production in the
target language. All groups worked at the following oral skills: pronuncia-
tion, accent and intonation, fluency, one-to-one communication, respond-
ing spontaneously in a conversation, responding to visual or aural input,
structured and unstructured group discussion, and giving a presentation.
TOLD delivery
Students in the treatment group would record their pronunciation of a

passage or respond to a series of pre-recorded questions digitally, while the
comparison group would record themselves using an analogue recorder.
Multimedia was a feature of all the oral classes for the treatment group,
and not just an add-on. The only time that students regularly broke from
interaction with the computers was for the purpose of group discussion
or conversation.
Each of the four main language skills (listening, reading, speaking,
and writing) can be broken down into a number of different sub-skills.
For TOLD the skill of speaking was sub-divided into eleven sub-skills
(e.g. pronunciation, accent/intonation, fluency, one-to-one with a French
person, one-to-one in French with an English speaker, responding spon-
taneously in a conversation, responding to visual or aural input (e.g. from
TV/Radio), taking an active part in a structured group discussion, taking
an active part in an unstructured group discussion, giving a group pres-
entation, and giving a presentation alone). An evaluative framework for
CALL must surely require the capacity to identify and test separately such
sub-skills within not only the speaking skill but also the listening, read-
ing, writing, vocabulary acquisition, grammar, and area studies skills (in
particular reading comprehension and essay writing in the target language)
to provide an overview ofthe impacts ofCALL. Such information will be
of use to language teachers, learners, as well as CALL designers as CALL
products will be better at delivering some sub-skills than others.
Analysis of the TOLD quantitative data
For our quantitative analysis we employed standard statistical measures

alongside our MFE1 frameworks of criteria. Table 8.3 shows the MFE1
checklist of validity criteria as applied to the procedures employed for the
TOLD project. Validity may well have been hampered by the skewing
factor that most students and staff were novices to the CALL environ-
ment. Some advantage may also have accrued to the comparison group
due to the fact that the treatment group may have been distracted by the
new technology and environment. Qualitative feedback strongly points
to this possibility.
206 Chapter 8
Model for CALL evaluation (MFE1) validity checklist

Element
TOLD Validity criteria addressed?
Yes/No/Detail
Quasi-
manipulated) or quasi-experimental (variables cannot be
experimental
controlled/manipulated) study?
No
Have the respondents been isolated from each other? No
Are the results attributable to the factor(s) studied? See report
Internal validity
What other factors (variables) might have contributed to Hawthorne;

the effect? CHILL; halo;
learner/ teacher differences, variable settings, time of day/ See report
week/year)?
How certain are you the learners are not getting language
See report
instruction apart from through this study?
See report
happened?
Yes
Generalizable sample N > 30 use parametric tests No N =29
Yes
parametric tests
External validity

See
populations, settings and experimental situations? How
conclusions
relevant are they elsewhere?
Does the report describe the skills tested? Yes
Does the report describe the characteristics ofthe subjects?
Yes
(i.e. age; gender; ability/year group; cohort/course
Does the report describe the CALL materials used? Yes
Table 8.3 Validity assessment criteria for MFE1: Mapping of the TOLD project.
Where the table states see report this is because a full or direct answer
is impossible in the space provided in the table and in this chapter; fuller
explanation is available in the published report by Barr, Leakey and Ran-
choux (2005).
Table 8.4 shows more detailed mapping ofData Collection Measure
and Variables for TOLD project using MFE1. Due to the small sample size
and given that we did not select group members by learning style (or any
ofthe other independent variables) we were unable to control for these as
such in this project. Attendance was included as an additional numerical
criterion which we correlated with the learning gains.
While we were able to eliminate at least four of the listed confound-
ing variables (different content, location, cohort level and assessment) by
making sure these were consistent, nevertheless, we cannot account for
the possible skewing role played by the fact that we had different teachers
for the different groups (each with a certain freedom to deliver the con-
tent their way), a different class time/day of the week and two different
course groups. The difference in location was a control variable rather than
a confounding variable.
Our summary of findings below is based on statistical analyses using
the statistics program SPSS, which was fed with raw data gathered in Excel
spreadsheets.
208 Chapter 8
Model for CALL evaluation (MFE1)

Data Collection Measure and Variable details
TOLD
TOLD Statistical mode Data collection method
N = 29
qualitative ICT-use survey
qualitative Learning Style survey
qualitative Language learning survey
qualitative paper log/diary
Detail of Data Collection
qualitative electronic log/diary

qualitative mid-semester survey
Measure used
qualitative end of semester survey

qualitative electronic survey
qualitative focus group
quantitative pre-test
quantitative post-test
quantitative module assessment
quantitative attendance
control variable environment: CALL vs non-CALL

control variable % blended environment
control variable access to lab: taught vs self-access
independent variable Learning Style: VARK
independent variable ICT use
Detail of Variables
independent variable Language ability/level

dependent variable learning gains
confounding variable different teacher
confounding variable different location/environment control
confounding variable different class time
confounding variable different day of week for classes
confounding variable different cohort level of language
confounding variable different course cohort
confounding variable course content different
confounding variable different assessment used
Table 8.4 Model for CALL evaluation (MFE1) Data Collection

Measure and Variable details. For TOLD project.
Summary of TOLD quantitative findings
In answer to our first research question as to whether computer technology

significantly enhances progress in oral language development we drew the
following conclusions:
1. The parametric and non-parametric results were very similar for all
the tests analysed for this Case Study, implying that the sample size for
these tests (N = 29) was sufficiently reliable data from which to make
inferences. Nevertheless, a larger sample size would increase external
validity.
2. The Language learning Experience survey scores showed that both
the treatment and the comparison groups were, when viewed as a
whole, starting from a similar ability/experience benchmark (Treat-
ment group: 57.53 per cent; Comparison group: 59.57 per cent). This
went some way towards countering the skewing effect that might have
been caused by the fact that these groups were not randomly selected
but self-selecting according to course.
3. Both the treatment and the comparison group made significant
progress. However, the comparison group (NON-TECH) generally
made more progress than the treatment (TECH) group. The average
percentage gain for the comparison group was 15.64, whereas that of
the treatment group was just 5.44. Figure 8.1 shows that both groups
reached parity in outcome standard but the comparison group (NON-
TECH) had begun at a lower mean starting point. The immediate
suggestion is that the technology added nothing to learning gains,
and if anything may have hindered progress.
4. When broken down into individual sub-skills, the comparison group
did make significant progress in fluency, content and grammar, while
the treatment group did not. It is not so surprising that fluency and
content improved more in the comparison group as more time was
spent in this group on meaningful communication. What is more
surprising is that the treatment group, which had access to grammar
drilling software with built-in feedback, did not progress more in the
area of grammar.
210 Chapter 8
Figure 8.1 Mean improvement from the pre- to the post-test.

For the treatment group (TECH) and the control group (NON-TECH).
One way of analysing the test data was by means of paired t-tests for the
treatment (Table 8.5) and comparison (Table 8.6) groups respectively.
The final column reveals the significance scores at the 95 per cent (0.05)
confidence level. The bottom five rows in each table give the skills scores
and reveal that, while both groups made significant gains in pronuncia-
tion, and accent/intonation, only the comparison group made significant
gains in fluency, content and grammar. However, the difference between
the comparison groups gains in these three skills and the treatment groups
gains in the same skills was not statistically significant.
Paired Samples Test Paired Differences t df Sig.
(2-tailed)
Table 8.5 Task-by-task and skill-by-skill paired samples t-test for
Mean Std. Std. 95% Confidence

Deviation Error Interval of the
Mean Difference
Upper Lower
the treatment group (Tech).
Pair 1 Total % - PTotal -6.600 8.069 2.083 -11.069 -2.131 -3.168 14 .007
Pair 2 Task 2% - ptask2 -8.429 11.817 3.158 -15.252 -1.605 -2.669 13 .019
Pair 3 Task 4% - ptask4 -7.357 13.703 3.662 -15.269 .555 -2.009 13 .066
Pair 4 Task 5% - ptask5 -6.267 18.219 4.704 -16.356 3.823 -1.332 14 .204
Pair 5 Pronunciation % - ppron -8.800 9.615 2.483 -14.125 -3.475 -3.545 14 .003
Pair 6 Accent/Intonation % -10.800 10.930 2.822 -16.853 -4.747 -3.827 14 .002
- pAccInt
Pair 7 Fluency % - pfluency -6.000 11.458 2.958 -12.345 .345 -2.028 14 .062
Pair 8 Content % - pcontent -6.733 13.128 3.390 -14.004 .537 -1.986 14 .067
Pair 9 Grammar % - pgrammar -2.400 11.224 2.898 -8.615 3.815 -.828 14 .421
211
212
Table 8.6 Task-by-task and skill-by-skill paired samples t-test for the control group
Paired Samples Test Paired Differences t df Sig.

(2-tailed)
Mean Std. Std. 95% Confidence
Deviation Error Interval of the
Mean Difference
Upper Lower
Pair 1 Total % - PTotal -13.571 8.925 2.385 -18.724 -8.419 -5.690 13 .000
(Non-Tech).
Pair 2 Task 2% - ptask2 -15.429 18.912 5.054 -26.348 -4.509 -3.053 13 .009
Pair 3 Task 4% - ptask4 -7.000 27.139 7.527 -23.400 9.400 -.930 12 .371
Pair 4 Task 5% - ptask5 -7.917 14.311 4.131 -17.010 1.176 -1.916 11 .082
Pair 5 Pronunciation % - ppron -13.714 11.317 3.024 -20.248 -7.180 -4.534 13 .001
Pair 6 Accent/Intonation % -21.286 14.824 3.962 -29.845 -12.726 -5.373 13 .000
- pAccInt
Pair 7 Fluency % - pfluency -13.143 13.132 3.510 -20.725 -5.561 -3.745 13 .002
Pair 8 Content % - pcontent -12.143 12.347 3.300 -19.272 -5.014 -3.680 13 .003
Chapter 8
Pair 9 Grammar % - pgrammar -10.786 15.338 4.099 -19.642 -1.930 -2.631 13 .021
In answer to our second research question which aimed at isolating the

factors that may affect oral language development, the following conclu-
sions were drawn:
1. Our statistical tests suggest that the CALL environment proved to be

more inhibiting to progress in oral language development than the
traditional non-CALL setting. This finding was backed up by staff and
student qualitative feedback which mentioned the lack ofthe human
element in the lab, the lack of opportunity for meaningful output,
the inhibiting nature ofthe layout ofthe room and the learning time
sacrificed getting to grips with the technology. Other factors that may
have acted as inhibitors or brakes on progress may have been: the short
length ofthe study and the fact that the students and the teaching staff
were novices in the CALL environment, which may have exacerbated
resistance and reluctance on the part of some ofthem, just as much as
for some it may have had a positive (Hawthorne) effect. We did not
in this study analyse individual cases quantitatively. A student focus
group might have teased out the reasons on a skill by skill and task
by task basis.
2. A negative correlation was found to exist between student improve-
ment over the semester and their language learning experience in both
the treatment and comparison groups. In other words both contexts
most encouraged the weaker students. Given that we have shown that
the ICT environment did not contribute significantly to progress,
we must look elsewhere for an explanation of this finding. It might,
for example, be due to the smaller group context (as opposed to the
more threatening lecture/ large seminar environment) and the closer
attention students received in the experimental situation, which may
have raised the confidence of the weaker students.
3. No significant correlation was found to exist between ICT-use scores
and progress and attendance and progress. The Pearsons rho correla-
tion test showed no significant link (in a positive or negative direc-
tion) between the mean student scores on the ICT-use survey and
their learning gains between the pre- and post-test; nor was there any
clear link shown (in a positive or negative direction) between good
attendance and significantly improved progress. Both variables may
well play a larger role in results over a longer-term study.
214 Chapter 8
TOLD qualitative analysis
Even though the statistical evidence showed that the pedagogical benefit
of using technology for oral work was unclear, the views of students and
staff towards the use of technology in oral language development also
merit consideration to allow us to gauge the reaction ofboth groups to the
technology and help us answer the third of our research questions. Further
qualitative evidence was drawn from student and stafflogs and reports as
well as classroom observations. In spite of the restrictions of the software
and hardware resources available to us at the time and the fact that we were
not using CMC (such as conferencing) we found that a strong case can be
made for the use of technology for the tutorial, rehearsal and assessment
phases of oral skills teaching. The principal benefit is that technology can
ensure that every student is actively engaged in the production of speech
(whether interacting with the computer, a neighbour, or a native speaker
abroad) and the receiving of personalized and correct feedback more fre-
quently than in a class environment where there is usually no more than one
teacher. Technology also allows for the rapid access to multimedia resources
that may act as a prompt for oral production/discussion, or a means of
recording oral output and playback, thus allowing for a rich combination
of language skills that would be harder to replicate without technology.
The positive feedback for the CALL-based oral language tuition must be
correlated with the less positive impact on learning gains (i.e. the quantita-
tive data) for the treatment group to give a more balanced picture.
Student qualitative feedback on TOLD
Students in the treatment group were willing to use technology and gener-
ally were very upbeat about its use. In fact, in some cases the use of compu-
ter technology was cited as the most positive aspect ofthe classes, making
them more interesting. Furthermore, in their logs a majority oftreatment
students report feeling that progress was made in several of the oral sub-
skills. A configuration of this positive finding with the less than remark-
able quantitative data highlights a wider issue in the area of CALL and
ICT, whether perceived pedagogical benefit of technology by learners
corresponds to the actual benefit derived.
Student logs also revealed that just under halfthe students in the treat-
ment group (7 out of 15) described the group discussions and debates as
the best aspect of the oral development classes. These activities were the
least technological aspects ofthe oral development classes. The technology
may help in the development and practice of oral skills through drill-and-
practice and pronunciation exercises the rehearsal stage but its role in
the application of this practice the performance stage is not as clear,
especially given that TOLD did not involve message-orientated communi-
cation (e.g. by webcam or video/audio conferencing) with a real audience
outside the lab. A future study would need to investigate the qualitative
and quantitative impact on learning ofthis kind oftechnology-enhanced
oral work, before a general statement about the benefits of CALL on the
oral skill.
Staff feedback on TOLD
Discussions with the tutors showed that they were not opposed to the tech-
nology in itself but that they felt it did not always fit in with the aims of
the oral classes. The stafffeedback in general pointed to a dehumanization
of oral classes when technology was introduced, and this was supported
from classroom observations. We found the tutors reaction to be one of
pragmatism in other words, only using the technology when it makes
a difference to the learning process: a view that confirms Gillespie and
Barrs findings (2002: 131). The BLINGUA project will explore further
the effectiveness of a more deliberate pragmatism in the design ofblended
teaching in CALL. For the purposes of oral communication it is clear from
the TOLD study that, while a lab environment that does not contain a
live remote oral link to native speakers (say, in France) may support and
benefit some oral sub-skills through activities such as drill-and-practice,
record and playback, and web-inspired discussion, TOLD-2, if it were to
take place, would need to evaluate where technology is able to introduce
a human element that the classroom cannot replicate, namely authentic,
live communication with native speakers abroad.
216 Chapter 8
Conclusions for TOLD and lessons for MFE1
The application of MFE1 to the teaching of oral skills using technology

using Dunkels framework highlights key strengths and weaknesses in the
research design ofTOLD, as well as possible directions for future research.
Mapping TOLD against the Dunkel criteria (Tables 8.7) highlights most of
the key positive and negative aspects in the TOLD project. Against Dunkel,
positive features were the motivating aspect oftechnology especially linked
to speech recognition, feedback and monitoring. Negative features related
to the use ofCALL with novices, the Hawthorne effect, affective hurdles
through use of technology for some oral work, and the lack of real need
to communicate using technology in the TOLD design.
Descriptor Findings from the TOLD project

Author
PPP
Does CALL save time? Cost- Time was wasted familiarizing

efficiency; acceleration of learning; CALL novices with the technology
quicker diagnostics; reduction of which may have advantaged the
teacher workload comparison group. Using non-
novices may have yielded different
results.
Student attitudes towards the Some evidence of disparity between
computer and the subject matter; students favourable reporting and
halo and Hawthorne effects; actual learning gains. Some affective
Dunkel (1991)
CHILL factor. hurdles amongst staff and students

Pedagogy
CALL impedes human factor in

talk.
The effect of computer use in Some skills (fluency and
specific content areas (other pronunciation) benefited more
subjects; 4+ skills); age; gender; LS from drill-and-practice than others.
differentials? Some remedial benefit for less able
students.
The different impact of different Drill and practice element in TMM
types of CAI (tutorials, drill and generally made a positive impact.
practice, and simulations). Simulations were seen as artificial and
dated content. Design did not allow
for real communication outside lab.
The different impact of different n/a

types of computer environment
(CAI, CMI, CEI, WELL, MALL,
pod-casting).
The uses of CAI (i.e., as a Instinctive blending adopted by staff
supplement to, versus a replacement to introduce a more human element
for, traditional methods): early BL? into CALL environment.
The levels of student ability. Is Less able students seemed to benefit
CAL/CALL best for REMEDIAL most from the CALL elements and
work and underachievers? found drill elements helpful for
remedial work.
Is CALL more effective for certain TOLD confirms reluctance by staff
Dunkel (1991)
Pedagogy
L2 skills areas than others? to use CALL for oral skill work.
Some sub-skills within the oral
skill do however benefit more than
others.
Do certain levels of proficiency Weaker students may benefit more.
profit more from computer use than
others?
What role does feedback play in the Students appreciated instant and
effectiveness of CALL pedagogy? discrete feedback in TMM software
(e.g. speech recognition) and the
Robotel monitoring facility.
What degree of learner control is Some suggestion that differentiation
related to effective CALL designs? in learner control through
individualized learning paths may
enhance motivation.
Table 8.7 Mapping of TOLD against Dunkels (1991) evaluative headings.
Our Model for Evaluation, drawing on qualitative and quantitative data,

suggests therefore that a possible two alternative hypotheses will now need
to be tested in greater depth and with a narrower design focus. These are
first, that the CALL environment and CALL pedagogy make a positive
difference in particular sub-skills or tasks of oral language development
(such as pronunciation and accent/intonation), but not in others (H1), and
second, that the CALL environment and CALL pedagogy hinder certain
aspects (skills or tasks) of oral language development (H2). Such tests will,
of course, only show that those CALL environments and pedagogies tested
218 Chapter 8
have the particular effect. It is very possible that, as our analysis ofthe Hub-
bard table suggests, the lack of opportunities for meaningful output using
technology meant that it was partly our approach to teaching oral language
in the lab that prevented greater progress being made. Ideally, therefore, a
number of different CALL oral environments and oral pedagogies, as well
as blends ofCALL and non-CALL approaches, need to be trialled against
the same sub-skills and tasks before we get any nearer a more definitive
answer for the benefits of CALL on oral skills.
Having applied MFE1 to a single language learning skill (speaking)
over a single semester, we turned our attention to applying it to a mix of
skills (reading comprehension and writing within an area studies setting)
in a more longitudinal study (over the course oftwo semesters), delivered
in a more considered blending of settings, media and pedagogies. This
time both the treatment group and the comparison groups would have
access to CALL.
Case Study 3 continued: BLINGUA-1 and 2
Introduction to BLINGUA
BLINGUA was about developing and evaluating a blended language learn-

ing approach for CALL. A report on our first blended learning project
(which is called BLINGUA-1 in this evaluation) was published in 2006
(Leakey and Ranchoux). Rather than repeating the detail of that study,
this aspect of the pedagogy Case Study involves an attempt to configure
findings and insights from related project data collected over a period of
three years at the University ofUlster, a kind oflongitudinal study, in the
light of MFE1. BLINGUA-1, which took place in 20042005, was fol-
lowed by BLINGUA-2 featuring the same students in their second year
ofFrench undergraduate studies. This second year ofthe blended learning
project shows three significant evolutional advances on the design of the
TOLD based on our experiences gained in that study. First, where TOLD
and BLINGUA-1 had involved novices in CALL, for BLINGUA-2 the
BLINGUA-1 students were followed from their first year into their second
year, when they were now familiar with the technology and the new envi-
ronment. We hoped this would minimize skewing effects such as Haw-
thorne, and enable us to confirm short-term inferences with more certain
long-term findings. Secondly, we wanted to move away from comparing
CALL with non-CALL students in the same year for a number of reasons:
from an ethical point of view we did not wish to deny half a cohort access
to a treatment; we also wanted to focus on developing the right pedagogy
for CALL so our comparison was between different approaches to CALL
pedagogy (i.e. differentiated by learning style versus non-differentiated).
So, for the purposes of comparative data analysis, rather than using students
within the same year, we compared the same cohort and similar module
test data from a previous year (20032004) when no use of CALL had
been involved. Our third advance was to reduce the amount of language
teaching taking place outside the study by ensuring that, for BLINGUA-2
at least, all three contact hours per week for our module were taught in the
CALL environment (rather than the single hour that TOLD had worked
with). This aspect ofthe Case Study, therefore, harnesses our MFE frame-
work to provide a longitudinal perspective ofthree years of data-gathering
where a number of constant variables (same teachers and module, similar
learning content and assessments) have been maintained to ensure internal
validity, and where quantitative and qualitative data have been configured
to provide a mix of phenomenological richness and empirical rigour.
Background to BLINGUA-1 and 2
The BLINGUA project, from the outset, aimed to develop an effective

blended learning approach to CALL that drew on a range of resources and
pedagogic behaviours and matched the learning to an analysis of partici-
pants needs and abilities (Pederson 1988: 122; Neumeier 2005: 176). We
wanted to move beyond a mere eclectic pragmatism, to a more considered
blending or multi-modality. We were also keen to address Dunkels call for
research that would assess whether CALL was more effective for certain L2
skill areas than others and whether certain levels of proficiency and learning
style profit more from computer use than others (Dunkel 1991: 2526).
220 Chapter 8
As part ofthe migration to a new digital platform and way ofteaching

languages through CALL we were interested in an integrative approach
and in normalizing CALL for both staff and students (Bax 2003). In
line with McCarthys (1999) earlier appeal for CALL to become routine
(p. 7), Bax had urged that students experience of computers for language
learning become unremarkable (p. 25). We were also aware ofthe discus-
sion initiated by Oliver and Trigwell around the importance of ensuring
that blended learning be well thought through and linked to educational
theory; in their case they draw on variation theory as their foundation
(Oliver and Trigwell 2005).
We defined blended learning for CALL as the adaptation in a local
context of previous CALL and non-CALL pedagogies into an integrated
program oflanguage teaching and learning drawing on different combina-
tions of media and delivery to produce an optimum mix that addresses the
unique needs and demands of that context.
Research questions and project aims: BLINGUA
There were two central research questions driving BLINGUA-1 and BLIN-
GUA-2. These were:
1. Was an integrative or blended teaching and learning approach to the

teaching of language in an area studies context and in a computer-
based environment more effective than a traditional classroom-based
pedagogy?
2. Would a CALL-based teaching and learning approach be possible that
might allow students to learn according to their dominant learning
style and would this be more effective than a CALL-based approach
where the same (or similar) material was delivered but without differ-
entiation according to learning style?
To address these questions we designed a research project that drew together,

in an HE undergraduate context, a considered blend of drill-and-practice
and acquisition approaches and a multi-modality ofCALL resources. These
we would evaluate over a period of time using MFE1, itself an integrated
mix of analytical tools, to generate insights that other language teachers

and CALL practitioners would find useful.
While the research questions, the modules and the digital platform
remained the same across the two parts of the project, there were differ-
ences: For BLINGUA-1 two first year modules were under the spotlight: a
language module (FRE101) and an area studies module (FRE103) had one
hour each out ofthree devoted to the project; BLINGUA-2 was exclusively
devoted to a single area studies module (FRE303: French Press and Media),
and all three of its contact hours took place in the lab with students, who
were now non-novices. BLINGUA-2 CALL classes took place in a larger
and more ergonomically pleasing lab.
BLINGUA pedagogical considerations
CALL pedagogy tends to fall in line behind the pendulum swings oflan-
guage learning pedagogy and methodology (Decoo 2001), though it has
taken more easily to some approaches than others. Blended learning for
CALL can draw on the strengths of both behaviouristic and acquisition
approaches and resources, and need not restrict itself to computer-based
environments, resources and methodologies. The BLINGUA project has
been eclectic, too, in its trialling of different learning environments, teach-
ing and learning methods (at times teacher-centred, at others self-study or
parcours and learning style driven) and in the choice of software and online
resources, alternating as it did between the more behaviouristic CLEF and
Logifrench programs (used in BLINGUA-1) on the one hand and the more
constructivist, open-ended, customizable HotPotatoes program, parcours
(i.e., learning paths) of TellMeMore and home-produced web-enhanced
learning activities, on the other (used in both BLINGUA projects). Both
the treatment and comparison groups in BLINGUA-1 made use of the
same sixteen station digital lab employed for TOLD. BLINGUA-2 took
place in the newer forty-two station multimedia lab.
A blended approach should, ideally, strive to develop tasks and learning
activities or cycles of activities that prioritize meaningful communication
at some point in the teaching cycle, be it either earlier on in the cycle (as
in Task-Based Learning, where the production or performance precedes
222 Chapter 8
the focus on form) or at the end of the sequence (as in the Presentation-
Practice-Production approach). Both BLINGUA projects used the latter
(P-P-P) method, and we added a final phase: that of reflective learning by
means of student managed web-logs, paper-based logs and student inter-
views. Table 8.8 summarizes blend mixes in the BLINGUA groups and the
comparison group 20032004.
% lab/class Input blend Setting mix Resource mix Task mix
100%
classroom
Teacher led (l); Lecture Essay(s);
20032004 Board, OHP;
group discuss Theatre presentation;
FRE313 handouts; TV/video
(S+C) +Classroom 2 comps
Comparison
group
33% lab
(L+ S): Board, OHP;
BLINGUA1 LT +C/room
nonCALL handouts; TV/video Essay(s);
20042005
(C): CALL and SC2000; MSOffice; presentation;
FRE103 LAB ( J205) +
split into LgS/ CMS; WWW; 2 comps
Comparison Mdiathque
NLgS TMM
group
100% lab
BLINGUA2 LABS Dry-wipe board,
Teacher led (l); Essay(s);
20052006 (MMLL) + SC2000; MSOffice;
group discuss 2 dossiers;
FRE313 Mdiathque online dict.;
(S+C) 2 comps
Treatment (MMRU) WebCT; WWW;
group
Table 8.8 Different blends of approach, setting, media and task in the BLINGUA
projects.
BLINGUA integrated the behaviouristic elements built into some of the

software with a mix of teacher-led and autonomous learning elements.
The comparison groups were primarily teacher-led. The MFE1 checklists
in Tables 8.9 and 8.10 pull together the key points of pedagogy design and
implementation for BLINGUA-1 and BLINGUA-2.

How well done?
Element present? Degree (03) Evidence
(05)
BLINGUA-1
0 = not at all 0 = poorly
CALL approach descriptor 1 = minimally 1 = minimally
Yes/No 2 = somewhat 3 = to a great extent relate comments to the descriptor
In TMM phonetic (ASR) work,
behaviouristic
Yes 2 2 CLEF, Logifrench, and www.
drill-and-practice
frenchlesson.org
communicative focus on for written work and PowerPoint
Yes 3 3
meaningful output presentations
teacher-led didactic and
Yes for comparison 3 for comparison 3 for comparison For explanation of tasks, feedback, and
directive, from the front
group group group instruction
approach
CALL pedagogy
constructivist; Yes for treatment 3 for treatment 3 for treatment Negotiated goal-setting; reflective
instructed SLA; ZPD group group group learning; scaffolding
student-centred autonomous Yes for treatment 3 for treatment 3 for treatment
For all tasks (treatment gp.)
or ID/LS determined learning group group group
blended learning Yes for treatment 3 for treatment 3 for treatment P-P-P and Variation dependent on
mixed approach group group group the task
Treatment group mostly in CALL lab
blended learning Yes for treatment 3 for treatment 3 for treatment
(some occasionally in Mdiathque by
mixed setting group group group
LS). Comparison gp in CALL lab only
AV, TMM + grammar s/ware,
blended learning Yes for treatment 3 for treatment 3 for treatment
WORD/PPT, WWW (Mdiathque
mixed resources group group group
paper-based by LS)
223
Table 8.9 Model for CALL evaluation (MFE1) CALL pedagogy checklist (BLINGUA-1).
224 Chapter 8
The principal point of difference between the BLINGUA-1 and 2 research

designs was the fact that for the first project the students were from the same
first year cohort which we split into a treatment and comparison group.
For the second project the whole cohort was the treatment group, and we
compared the data from the cohort with other years that had covered the
same or similar modules, teaching and assessment experiences.
The construct for BLINGUA-1 also allowed us to sub-divide the treat-
ment group into four sub-groups with activities differentiated according to
dominant learning style. While the comparison group underwent a largely
teacher-led regime, the treatment group was allowed more freedom to roam
both between and within a range of software programs and web-based texts
which were categorized according to suitability to the different dominant
learning styles. For BLINGUA-2 the tuition structure made use of a blended
approach for lecture and seminar contexts (all classes were taught in the
e-lab, but there were a combination of activities: lectures were delivered via
PowerPoint; for the seminars and the comprehensions the teacher used the
Robotel SmartClass functionality as well as the (dry-wipe) whiteboard;
students used word-processing software, face-to-face interaction with the
teacher, screen-read and paper-based comprehensions, online French news
comprehensions, the online dictionary, tandem email surveys, and under-
took independent study using the multimedia resource unit (MMRU) and
paper-based activities). WebCT was used for posting of course-notes and
communication with students (instead ofCMS). In the e-lab the students
could also access materials and save their own work on the lab server as
well as on the university-wide system. Students also regularly accessed the
Internet and WebCT both within and outside the lab.
The next section looks at the impact of the projects on the learners.
First, there will be a comparative analysis ofthe different variables involved
in the BLINGUA-1 and 2 projects (also mapped against the TOLD vari-
ables). The different data collection methods used and the comparative
adherence to the internal and external validity criteria will then also be
looked at prior to a synthesis of the qualitative and quantitative findings.
Element present? Degree (03) How well done? (05) Evidence

BLINGUA-2
CALL approach descriptor relate comments to the
descriptor
Behaviouristic drill-and-
No 0 n/a n/a
practice
For written work, seminar
Communicative focus on
Yes 3 4 discussions and PowerPoint
meaningful output
presentations
Teacher-led didactic and For lectures, for explanation of
directive, from the front 33% 2 3 tasks, feedback, and instruction
approach both groups
Seminars, goal-setting;
CALL pedagogy
Constructivist; instructed
Yes 3 4 reflective learning; scaffolding
SLA; ZPD
treatment group
Student-centred
For all tasks for treatment
autonomous or ID/LS Yes 3 4
group
determined learning
Blended learning mixed P-P-P and Variation dependent
Yes 3 4
approach on the task treatment group
All classes taught in the
Blended learning mixed multimedia lab, but students
Minimal 3 4
setting visited self-access suite
(MMRU) in own time
Blended learning mixed Lab, WWW, VLE, WORD/
Yes 3 4
resources PPT, online dictionary, MMRU
225
Table 8.10 Model for CALL evaluation (MFE1) CALL pedagogy checklist (BLINGUA-2).
226 Chapter 8
Data collection: BLINGUA-1 and 2
Data-gathering for the two BLINGUA projects involved fewer data col-
lection methods than had featured for the TOLD project (see Table 8.11).
This was partly due to the awareness that excessive diagnostic measures can
inhibit participants and potentially affect the accuracy of the data.
MFE1 Data Collection Measure and Variable details
Statistical TOLD BLINGUA-1 BLINGUA-2

UU Data collection method
mode N = 29 N = 21 N =17
qualitative ICT-use survey
qualitative Learning Style survey
qualitative Language learning survey
qualitative paper log/diary

Detail of Data Collection Measure used
qualitative electronic log/diary
qualitative mid-semester survey
qualitative end of semester survey
qualitative electronic survey
qualitative focus group
quantitative pre-test
quantitative post-test
quantitative module assessment
quantitative attendance
control environment:

variable CALL vs non-CALL
control
% blended environment
variable
Access to lab: teaching

Small
control space (MMLL) vs self- Small Large MMLL
MMLL +
variable access (mdiathque/ MMLL + MMRU
mdiathque
MMRU)
independent
Learning Style: VARK
variable
independent
ICT use
variable
independent
Language ability/level
variable
dependent
learning gains
variable
control different location/

Detail of Variables

variable environment
confounding
different teacher
variable
confounding (previous
different class time
variable years)
confounding different day of week for (previous

variable classes years)
confounding different cohort level of (only when
variable language 303 vs 103)
confounding (to compare
different course cohort
variable other years)
confounding (only when
course content different
variable 303 vs 103)
(similar
confounding
different assessment used test type and
variable
structure)
Table 8.11 MFE1 Data Collection Measure and Variable details for
TOLD and BLINGUA projects.
228 Chapter 8
For qualitative feedback from students we relied, in BLINGUA-1, on a

paper-log and a summative survey and, for BLINGUA-2, a focus group.
The number of confounding variables was reduced; for example, the BLIN-
GUA projects always involved the same teacher, whereas TOLD had used
different Learning Assistants for the different groups which may have had
a skewing effect. In the TOLD and BLINGUA projects the difference in
location was a control variable (for TOLD it was part ofthe CALL vs non-
CALL comparison; in BLINGUA-1 occasional access for certain learning
styles to the mdiathque was part of the differentiation between CALL
approaches. For BLINGUA-2 the location ofthe treatment group was now
the new digital lab, and performance here could be compared with that of
previous similar cohorts in other locations (small lab or classroom).
Table 8.12 includes, along with the qualitative measures used, a list
of the quantitative measures. This highlights an improvement in design
from BLINGUA-1 to BLINGUA-2. In the BLINGUA-1 project there had
been a weakness in the timing ofthe pre- and post-tests, with the post-test
coming too early to measure learning gains, but also being too easy and
therefore yielding excessively high scores; for BLINGUA-2 we included
a computer-generated, gapped c-test in week 1 and 11 to measure learning
gains in vocabulary, grammar and comprehension. This complemented
the two modular comprehension assessments which we used as a constant
across all area studies groups studied.
MFE1 quantitative and qualitative measures checklist

Element Element
BLINGUA
present? present?
1 and 2
Data collection method

BLINGUA-1 BLINGUA-2
Yes/No Yes/No
Diagnostic survey of prior learning Yes Yes
Diagnostic survey of learning style Yes Yes

Post-treatment survey of student reaction Yes Yes
Post-treatment student focus group No Yes
Post-treatment survey of staff reaction No No
Post-treatment staff focus group No No

Electronic/paper log/journal of student
Yes No
reaction
Test(s) of prior learning No No
Quantitative/ empirical data
Yes
Pre-test Yes (week 1) (c-test wk 1
+week 5 comp)
Progress test (mid-treatment) No No
Yes
Post-test (identical to pre-+ progress test) Yes (week 5) week 11 c-test +
comp
Table 8.12 MFE1 checklist for data collection methods.

Mapping of the BLINGUA projects.
230 Chapter 8
Table 8.13 shows the comparative validity of the two BLINGUA studies.
MFE1 Validity checklist

Element addressed?
UU
Validity criteria BLINGUA-1 BLINGUA-2

Yes/No/Detail Yes/No/Detail
Is this an experimental (variables can
be controlled/manipulated) or quasi- Quasi- Quasi-
experimental (variables cannot be controlled/ experimental experimental
manipulated) study?
Yes due to
Have the students been randomly assigned
No longitudinal
to the treatment and comparison groups?
nature of design
Have the respondents been isolated from
No No
each other?
Are the results attributable to the factor(s)
See report See report
studied?
Hawthorne; Hawthorne,
Internal validity
CHILL; CHILL,
What other factors (variables) might have halo; language halo; reduced
contributed to the effect? learning language-
outside of learned outside
study study
How will you control for extraneous
variables (such as learner/ teacher
differences, variable settings, time of day/
week/year)?
How certain are you the learners are not
getting language instruction apart from See report See report
through this study?
Does the student reporting accurately
reflect what happened?
Are the different variables (independent/
control/ dependent) clearly identified and Yes Yes
reported?
Generalizable sample
No N = 21 No N= 17
N > 30 use parametric tests
Sample less easily generalizable
Yes N =21 Yes N =17
N < 30 use non-parametric tests
To what extent can the results be
External validity
generalized to other populations, settings See

See conclusions
and experimental situations? conclusions
Does the report describe the skills tested? Yes Yes
Does the report describe the characteristics
of the subjects? (i.e. age; gender; ability/ Yes Yes
year group; cohort/course
Does the report describe the CALL
Yes Yes
materials used?
Table 8.13 Validity assessment criteria for MFE1: Mapping of the BLINGUA projects.
In terms of internal validity BLINGUA-2 represented an improvement on

BLINGUA-1 in two regards: firstly, whereas for BLINGUA-1 the treatment
group and the comparison groups were divided non-randomly along course
lines, in the BLINGUA-2 study the treatment group was made up of the
whole cohort for that year, and then compared with the performance of
whole cohorts from other years a more random arrangement, and thereby
containing greater internal validity. Secondly, by ensuring that the entire
module FRE303 was taught in the lab as part of the project, the number
oflanguage learning contact hours outside the project was reduced in the
BLINGUA-2 project by two hours, thus reducing the confounding ele-
ment of progress made outside ofthe study. It must be stated, though, that
students still had a further 6 hours oflanguage tuition outside the project
module, which in all likelihood did contribute significantly to learning
gains, albeit in different language skills than we were testing for.
232 Chapter 8
BLINGUA-1 principal quantitative findings
When considering BLINGUA-1 it is important to remember that this

study was not a comparison ofCALL with non-CALL, but rather oftwo
different CALL treatments (the comparison group with a teacher-led whole
class approach versus a treatment group with more autonomous learning
where students were differentiated by learning style). First of all slight,
though not significant, differences were evident in the compared means
ofthe pre- and post test totals for the treatment group versus comparison
group. The treatment group (N = 10) scored lower than comparison group
(N = 11) only in some elements of the language module (FRE101), not in
any area studies elements (FRE103). One possible inference from this is that
blended learning, when linked to learning style and a more autonomous-
learning approach to classroom teaching may suit area studies classes better
than language classes which may benefit more from a teacher-led whole
class teaching approach.
The second main finding was that, when focusing purely on the differ-
ence between the pre-test and post-test scores (rather than individual test
scores) the comparison group scored as well as, or better than, the treatment
group in three out offour ofthese analyses. This suggests that differentiation
by learning style did not necessarily help the treatment group, even though
they may have reacted positively to the experience in their feedback.
A number of correlations were significant in BLINGUA-1. An analysis
ofPearsons rho correlations, to see what paired variables might be linked
causally, did not show any significant connection. It is important to note
that one cannot ascribe causality even when the significance is high (* =
significance at the.05 level; ** = significance at the.01 level of confidence).
One can only speculate as to possible connections or causality. Scores for
attendance, language learning experience and ICT-use were run against
scores for a number of tests.
For the Monday (comparison) group the following Pearsons rho tests
showed significant correlation. For example, Language Learning Experience
correlated very strongly with the difference between grammar pre-test and
post-test for module FRE101 (r =.910**). Attendance correlated strongly
with the FRE101 total module score (r =.838**). Attendance correlated
strongly with FRE101 difference between 2 grammar scores (r =.821**).
Attendance correlated strongly with FRE103 total module score (r =.794**).

Language Learning Experience correlated quite strongly with FRE103
total module score (r =.788**). Language Learning Experience correlated
quite strongly with FRE101 total module score (r =.720*), and ICT-use
correlated quite strongly with FRE101 difference between 2 translation
scores (r =.640*).
For the Wednesday (treatment) group only the following test showed
significant correlation: ICT-use correlated quite strongly with FRE101
difference between two translation scores (r =.701*).
From this one can infer that the Monday group, for some unclear
reason, showed many more significant correlations than the Wednes-
day group, with attendance proving to be the most significant factor in
progress. As with TOLD this could be attributable to the fact that this
group of slightly less able linguists on average found greater benefit from
the remedial elements of the multimedia lab in a teacher-led, rather than
autonomous learning context. The fact that the Wednesday (treatment)
group only showed one significant correlation could possibly be due to a
number offactors. It is possible that students were more alert on a Monday
afternoon than a Wednesday morning. Alternatively one might argue that
the treatment group students were unfamiliar with autonomous learning
and found it difficult to organize their learning, even though they were
generally very positive about the experience. A different research structure
(e.g. with non-novice students studying on the same day and at the same
time) would be needed to isolate which variable was more likely to be the
contributory factor to their poor performance relative to the comparison
group (even though they did make progress).
As for the role played by learning style in the treatment group, one can
only say that the results for learning styles are interesting at an individual
level but inconclusive and also not generalizable given the small sample size
of each category. Different categories scored higher for different tests with
perhaps the Visual and the Reading style doing best overall, but not by very
much. An analysis of learning styles shows that the Reading (i.e. reader/
writer) group ofthree students scored highest on the qualitative survey of
prior language qualifications and self-assessment oflanguage competence
across the language skills. This is in line with the fact that academics who
are Auditory learners and/or those with a Read/Write learning style are
234 Chapter 8
more likely to stay in higher education (see the University ofSouthamptons

study skills support materials, available at: http://www.studyskills.soton.
ac.uk/studytips/learn_styles.htm; accessed 2 September 2010).
The jury must remain out at this stage regarding which aspect/aspects
of the blended learning experience (environment, resource, differentia-
tion by learning style, teacher-led/autonomous learning) are more signifi-
cant. The findings from the BLINGUA-2 project, which was set in a more
comfortable environment with students more familiar with autonomous
learning, would suggest that the blend of environments might play a more
significant role than the other factors).
BLINGUA-2: Principal quantitative findings
The BLINGUA-2 cohort was the same group of area studies students as
BLINGUA-1, now in their second year, and now taught entirely (Lec-
ture + Seminar + Comprehension class) in a multimedia lab setting with
full use oftechnology. The graphs for the 20052006 second year cohort
show parity in mean scores for the treatment and comparison groups. In
other words there was no significant improvement in the mean scores from
the first comprehension to the second (paired t test: p =.964; Wilcoxon
Signed Ranks for two related samples: p =.831]. This test is based on a
comparison of sample means of similar two comprehensions again set at
week 6 and 12.
Nevertheless, the results for this module represent the best scores for
any area studies group in a multimedia setting in the period 20032006.
The configured data showed us that motivation improved significantly
even though performance showed no positive upward trend. The fact that
they were taught 100 per cent in the new, ergonomically-improved mul-
timedia labs, and were given a theory-driven, blended approach may have
played a role.
To chart progress from BLINGUA-1 to BLINGUA-2 data from
both projects was collated in a spreadsheet (not included) from which
some useful inferences could be made. There was a slight, but not signifi-
cant, correlation between good attendance and final overall ranking in the
BLINGUA-2 module. There was an apparent link between a Reading/

Writing and Auditory learning style and a strong high rank in the overall
module scores in the BLINGUA-2 module (three of the four students in
this category are ranked in the top five students). This endorses the BLIN-
GUA-1 finding that agreed with University of Southampton statement
regarding the favourable HE outcomes for Reading/Writing students.
Only a few students made it into the highest range of comprehension
scores, and no Kinaesthetic students made it into the top third of students
in the overall module ranks even though they were the largest LS group
of students (5/16).
A longitudinal view of BLINGUA
BLINGUA-1 and BLINGUA-2 configured findings
The students in both projects, generally speaking, found the blended

learning experience a positive and motivating one and would lean towards
preferring the BLINGUA approach to the traditional classroom based
learning.
Students kept their own logs and completed an electronic survey.
From an analysis of the logs the positive statements tended to revolve
around the following factors: enjoyment, informative nature offeedback,
independent learning, clarity oflayout, opportunity for repetitive drilling
and manipulation of structures, interest in the content of material, train-
ing in examination technique. The negative statements centred around
the following notions: difficulties with the technology, getting used to the
new environment, physiological effects of doing a reading comprehension
on a computer, lack of time to complete tasks, difficulty understanding
some of the instructions in French, lack of enjoyment or interest, missing
paper-based resources, lack of depth of some of the online Area Studies
content.
236 Chapter 8
From an electronic survey conducted after BLINGUA-1 we found that

percentages tipped between 56 and 88 per cent in favour ofthe BLINGUA
treatment. The survey focussed on the following issues: students percep-
tions of elements of teaching in a multimedia environment, the extent to
which this integrative teaching and learning approach in a multimedia
classroom was felt to be more effective than a traditional classroom-based
pedagogy, and the perceived level of effectiveness and satisfaction regard-
ing the BLINGUA approach, the FRE101 and 103 modules delivery and
their overall experience in first year.
Space does not allow for a detailed summary of qualitative feedback
from students. One key finding after BLINGUA-1 was that over two thirds
of students (68.8 per cent) agreed to having a real need to have both a
traditional and a multimedia classroom in which to learn. This statistic is
perhaps the clearest evidence in favour of a blended learning (i.e. mixed
settings) approach over and above an integrative CALL environment.
Students felt they needed different contexts to fulfil different needs.
The BLINGUA-2 feedback was similar in this regard, though this posi-
tive reaction to mixed settings appears to contradict the quantitative data
that suggested single location teaching was more conducive to improved
learning gains, whether in classroom or lab. For BLINGUA-2 we drew
our quantitative data primarily from an end oftreatment survey of student
reactions and a focus group involving five ofthe seventeen students in the
cohort. We did not repeat the three surveys (ICT-use, Language Learning
Experience and Learning Style) that we had used in TOLD and BLIN-
GUA-1 as we were dealing with the same students. There were five students
involved in the focus group. Their positive comments covered a range of
themes: normalization ofthe CALL experience, clarity and attractiveness
ofthe PowerPoint presentations, access at all times via the VLE to lecture
notes and other resources, the accessibility of the tutor via email or the
VLE, as well as discreteness in the lab via the headsets, the new lab environ-
ment, the fun factor. Negative comments also covered a range of topics:
the temptation not to use paper-based resources and texts, frustration with
some aspects of the software, inhumanity of the computer environment,
underuse at times of the full functionality of the lab, the increased work-
load linked to the new requirement to produce an e-dossier.
One significant finding from the two BLINGUA projects is that

having an improved environment, albeit married with a carefully integrated
approach to multimedia teaching, was an important part ofthe improve-
ment in student learning gains.
The principal empirical findings from BLINGUA-1 were equivocal
regarding the effectiveness of a blended learning pedagogy for CALL
designed to encourage improved, autonomous learning through learning
paths geared to a students predominant learning style. The data again
suffered from sample sizes inadequate to ensure external validity. What
significant findings there were broadly endorsed the finding in the TOLD
project that the comparison group performed slightly better than the treat-
ment group, although both groups made progress over most skills despite
drops in mean scores between pre- and post-tests due mainly to the fact that
only the post-test was sat under examination conditions and with time con-
straints. The remedial nature of some aspects ofCALL was also suggested
by the fact that (well-attending) weaker students appeared to make larger
learning gains than abler students. No evidence was found that differenti-
ated treatment by learning style in a blended learning environment made
any significant difference to learning gains over a single semester, despite
positive feedback from the treatment group. Student feedback was largely
positive towards both teaching and learning differentiated by learning style
(BLINGUA-1) and towards the blended, multimodal learning experience
(BLINGUA-1 and 2); however, this did not always correlate with improved
learning gains or attendance.
As for the principal BLINGUA-2 empirical findings we found that all
three groups with negative significance in learning gains had experienced a
mix of settings (33 per cent spent in lab, the remainder in the classroom).
What of the cohorts with less negative learning gains? The cohort with
the best score was the (20032004 retrospective) comparison group which
had spent 100 per cent of its time outside the lab in the ordinary classroom,
then came the BLINGUA-2 group which spent 100 per cent of its time
for the module FRE313 inside the lab. One possible inference from this is
that our null hypothesis (H0 ) that blended teaching, resources and set-
tings make a significant difference to language learning gains, might need
to be rejected in favour of the alternative hypothesis (H1) that language
238 Chapter 8
learning in one fixed setting, whether with blended teaching or not, has a
greater impact on language learning gains than a combination of settings
(CALL or absence of CALL being an insignificant variable).
This alternative hypothesis will clearly need further testing to isolate
what is really happening, and without recourse (as occurred in BLINGUA)
to a retrospective control group. There were at least three possible skewing
factors that might have been at play in our longitudinal study, and which
would need to be factored out in a future study. First, the 20032004
students were very familiar with their classroom environment and used
to all the well-established (teacher-led) routines; secondly, this group did
not have the extra challenge of having to manipulate and navigate a new
digital environment; and thirdly, one cannot be categorical about con-
clusions based on sample sizes of nineteen, twenty-one and seventeen.
With both the BLINGUA projects we were dealing with many novices
or relative novices to either the CALL setting, semi-autonomous blended
learning, or both. With larger sample sizes, studied over time, and with
learners and teachers well used to the new environment more favourable
data might be obtained.
Given the complexity ofthe research designs and the variables involved
there is uncertainty regarding the ability of our research design MFE1 to
assess empirically, within non-experimental contexts, the impact ofCALL
pedagogy with all its complexities and permutations. The real-life educa-
tional setting, in which the students learning was not confined to that
which was taking place during the research project class contact hours,
will always compromise validity. It was not certain whether our Model
for Evaluation could glean significant and generalizable data for language
teaching differentiated by learning style from a single institution where
the sample sizes were inadequate for the purpose of statistical analysis of
a multivariate study. As with TOLD a single semester was deemed to be
not long enough for students to get accustomed to a new approach to
language learning.
It became clear that full migration to CALL should not mean the
abandonment of blended teaching/learning and the reasoned use of the
classroom environment for some skills or sub-skills. Table 8.14 brings
together some of the insights into comparative advantages and disadvan-
tages of a CALL setting for the three tuition modes: lecture, seminar and
comprehension class. A well-designed lab and a broadly-based teaching

approach will allow for paper-based support materials, time-out from star-
ing at screens, pair or group work (face-to-face or using the computer), and
multi-modal multimedia (PowerPoint input, aural input and oral output,
web-based research, emailing partners abroad, written tasks, AV/TV-based
activities, etc.). Teachers must decide on a case-by-case basis which setting
is appropriate for a given activity and at what point in a scheme of work
recourse should be made to a different setting or resource.
BLINGUA-1 and 2 Tuition mode findings

Advantages of digital lab + VLE Disadvantages of digital lab + VLE
White board is there if you need it
Technology can break down;
Materials can go up on WebCT
home/ lodgings access may be limited
as an asynchronous resource, for
Cannot fit more than 22 into
consolidation, reflection, and
Lecture
Lecture
single lab or 42 into joined lab

revision accessible from home
Danger ofbeing used as a bolt-on,
Students can use paper ifthey
rather than integrated resource
wish more desk space between
Passivity: students may rely on
our monitors, than in an ordinary
WebCT and not take notes
classroom, possibly even than a LT
Advantages ofthe multimedia
Seating arrangements can be less
Interactivity; audio pairing for
conducive to group work
Seminar
Seminar
discussions
Technical glitches can be disruptive
Independent learning
Takes time to train novice users of
Reinforcement from networked
the lab (TOLD; BLINGUA-1)
materials in MMRU
As for seminar (interactivity,
Access to Internet translators
independent learning)
Comprehension
Comprehension
Paper version oftexts more

Relevant and up-to-date texts
desirable for longer texts
Online dictionary
Temptation to plagiarize or copy
Grand-Robert for synonym,
+ paste for written answers is
paraphrasing work
greater (but this can be checked
Grammar and other software
quicker via Google/ TurnItIn)
available
Table 8.14 Advantages and disadvantages of the blending of Multimedia lab + VLE.
For different teaching modes in the teaching of Area Studies in French (20032006).
240 Chapter 8
Mapping of the pedagogy Case Study for MFE1
When the BLINGUA-1 and 2 projects are mapped against Dunkel (1991)
the progression in design, pedagogy and effectiveness when compared with
TOLD is highlighted (see Table 8.15). In particular, one should note the
increased role played by familiarity with CALL, the percentage of module
time spent using CALL, and the ergonomics of the setting. The speed of
turn-around in the diagnostics relating to the pre-test scores was vastly
increased due to the use of digital tests (i.e. the learning styles survey, c-test,
and TellMeMore test). This in turn enabled an increased efficiency in the
allocation of students to differentiated learning paths.
Findings from the BLINGUA-1

Author
PPP
Descriptor and 2 project

(abbr. B-1 and B-2)
Some time wasted familiarizing
CALL novices (B-1) with the
technology but as both groups were
Does CALL save time? Cost-
in the lab this did not disadvantage
efficiency; acceleration of learning;
any group. Quicker allocation of
quicker diagnostics; reduction of
students to differentiated groups
teacher workload
due to computerized test-scoring.
Photocopying reduced. 100% in lab
saved time.
Dunkel (1991)
Pedagogy
Generally positive in B-1 and B-2,

Student attitudes towards the though some notable exceptions.
computer and the subject matter; Some disparity between reported
halo and Hawthorne effects; reaction and learning gains.
CHILL factor. Comparison groups did not appear
to suffer from denial of treatment.
Some skills (area studies) benefited
The effect of computer use in more from BLINGUA treatment
specific content areas (other than others. Age, gender not
subjects; 4+ skills); age; gender; LS targeted in B-1 and B-2. B-1 and B-2
differentials? showed Read/Write LS benefited
most from BL treatment.
Drill and practice element less a

The different impact of different feature in area studies elements.
types of CAI (tutorials, drill and Main benefit due to broadcast, scan
practice, and simulations). features and WWW access from
Robotel SC2000 lab.
The different impact of different
Notable improvement in the
types of computer environment
affective impact of the lab change
(CAI, CMI, CEI, WELL, MALL,
between B-1 and B-2 studies.
pod-casting).
LS differentiated treatment
combined with BL made positive
The uses of CAI (i.e., as a
impact on treatment group B-1.
supplement to, versus a replacement
However, B-2 showed 100% of lab
for, traditional methods): early BL?
or classroom yielded better learning
gains than mixed settings.
Dunkel (1991)
Less clear-cut remedial benefit.

Pedagogy
The levels of student ability. Is

Test data showed 3 of top 5 ranked
CAL/CALL best for REMEDIAL
students after B-1 and B-2 were R/W
work and underachievers?
LS students.
B-1 and B-2 does not show a
Is CALL more effective for certain statistical benefit of CALL for the
L2 skills areas than others? comprehension element of area
studies, merely an affective one.
Do certain levels of proficiency
profit more from computer use than See two cells above.
others?
Students appreciated instant and
What role does feedback play in the
discrete monitoring and feedback
effectiveness of CALL pedagogy?
via the Robotel monitoring facility.
B-1 suggested that differentiation
What degree of learner control is in learner control through
related to effective CALL designs? LS-differentiated learning enhances
motivation.
Table 8.15 Mapping of BLINGUA-1 and 2 projects against Dunkel (1991).

242 Chapter 8
Table 8.16 shows a mapping ofthe BLINGUA-1 and 2 projects against the
twelve CALL Enhancement Criteria. Even more so than with the Dunkel
mapping above for these projects, this table shows progression towards a
more self-consciously SLA-type approach to teaching and learning for the
treatment groups and some of the benefits thereof.
12 CALL Enhancement Criteria BLINGUA-1 BLINGUA-2

Focus primarily on form
Language learning Strong focus on form in
in context of meaningful
potential FRE101 language module.
output.
LS differentiation Lecture, Seminar and

matched learning Comprehension hour.
Learner fit resources to learner, but Lab functionality
only in 1/3 hours per advantageous in all 3
module. modes.
Written French and Drill element removed

Comprehension hours entirely from construct.
Meaning focus both focused on meaning So entire focus was on
more than drill the meaningful input and
latter more so. output.
Chapelle (2001)
Access to authentic texts Access to authentic

via WWW and TMM texts via WWW and
Authenticity
and differentiated staff differentiated staff
materials materials.
Some negative reaction

Positive reaction to the
to poor environment
new lab environment.
of lab, despite new
100% L+S+C in lab
Positive impact digital equipment. LS
popular for most, though
differentiation popular
some wanted a mix of
though no significant
settings.
benefit on learning gains.
Improved equipment
Much improved
compared with previous
ergonomics and technical
analogue lab. But some
Practicality functionality. Some
problems with sound
glitches in network, and
cards and head-sets and
noisy air-conditioning.
comfort of setting.
Focus primarily on Focus primarily on

Language skills and
written French and written French and
combinations of skills
comprehension skills comprehension skills
Treatment group had

Yes for 67% of activities;
Learner control much more control than
not for lecture
comparison group
More rapid feedback More rapid feedback

Error correction and for treatment group for treatment group
feedback when completing certain when completing certain
computerized tasks computerized tasks
More collaborative B-2 treatment group

for differentiated by worked collaboratively
Leakey
Collaborative CALL
LS groups. Pairs for for some of seminar
presentations element
A key difference between A key difference between

treatment and comparison treatment and comparison
Teacher factor groups. Treatment groups groups. Treatment groups
had less teacher-led (from had less teacher-led (from
the front) time the front) time
Another key difference Another key difference

between treatment and between treatment and
comparison groups, e.g. comparison groups, e.g.
access to PCs, type and access to PCs, type and
shape of room differed shape of room differed
according to tuition mode according to tuition mode
Table 8.16 Mapping of BLINGUA-1 and 2 projects against twelve

CALL Enhancement Criteria.
Conclusion to Ulster CALL pedagogy Case Study
The University of Ulster Case Study projects (TOLD, BLINGUA-1 and

2) have brought a number of benefits, both to the teaching and learning
oflanguages at the University ofUlster, and to the way in which CALL is
244 Chapter 8
evaluated. They have assisted the transition to a digital platform, shown us

that an improved CALL environment and blended teaching and resources
increase motivation even if they do not improve results, and confirmed
that CALL is better received when it is a normal and integral part of the
students experience rather than a bolt-on addition. On the negative side,
it is clear that the data by no means endorses our original hypothesis that
CALL significantly improves learning outcomes; in fact, with some stu-
dents (some novices to CALL, some abler students, some learning styles)
it is possible that CALL may hinder their learning.
Our Model for Evaluation has been applied here to various pedagogical
approaches, behaviourist, communicative, constructivist, and blended. It
has shown us strengths and limitations of our research design, and gaps that
need to be filled in future studies. In particular, in multivariate, long-term,
quasi-experimental studies the benefits of obtaining a richer overall picture
of an authentic CALL versus non-CALL experience are undermined by
the difficulties of isolating key variables and controlling for extraneous
variables. A clearer idea of causality and isolation of key variables will be
easier to obtain in an experimental, short-term, atomistic study, but this
may be at the expense of a richer, more authentic and overall picture of
student learning experiences. External validity was undermined in all of
our Case Studies by small sample sizes a problem which can be solved in a
number of ways. The obvious first solution is to conduct ones study in an
institution with larger language cohorts. Failing that one might consider
a cross institutional study where the treatment group is in one institution
and the comparison group is in another. The problem with such a con-
struct is that one is unlikely to be comparing like with like (e.g. different
locations, timetabling, teachers, resources and curriculum being the main
hurdles). Alternatively one might consider a time-series design whereby
the treatment and comparison groups in the same institution are swapped
around after one semester, thereby exposing all students to both treatment
and comparison settings. The main disadvantage of this is that students
after the first semester will now have knowledge that they did not have at
the start of the first semester, thereby potentially skewing the data.
The concluding chapter will first provide a summative overview of

the MFE2 in the light of the literature studied and experience gained in
the three Case Studies. It will also provide an extrapolation of the twelve
CALL Enhancement Criteria into more detailed checklists and thereby
provide CALL evaluators with a flexible and practical tool for judgmental
and empirical quality control of CALL in different educational settings.
Chapter 9
A new framework for evaluating CALL
MFE2: A new methodology for evaluating CALL
Dunkel defined effectiveness research as systematic evaluation (1991:

2324). This enquiry has been, above all, a quest for an improved systema-
tization of the field of CALL effectiveness research. It has featured an
exploration of evolving practice, agenda setting, as well as gaps and weak-
nesses in the field of effectiveness research in CALL. From the outset our
enquiry was divided into three main strands or clusters of related questions.
The first strand concerned the nature of learning and language learning,
and their relationship to computer-assisted language learning. The second
strand concerned the nature and degree of the impacts of CALL on lan-
guage learners and learning, the various learning processes, styles, skills and
sub-skills; the loci ofthese impacts were at the human-computer interface
and at the juncture ofthe Three Ps (platforms, programs and pedagogy),
and we were interested in the combined effects and synergies occurring
at these intersections. The third strand of enquiry concerned the nature,
direction and quality of CALL evaluation: if CALL does make a differ-
ence, we wanted to know how this can best be measured, what combina-
tions of judgmental and empirical, qualitative and quantitative measures
were appropriate in different settings. Following on from our three-strand
enquiry, and from the Case Studies that it generated, an end-product has
emerged which now needs to be tested by others across all educational sec-
tors (primary, secondary, tertiary, and adult education), in the full range
of existing CALL settings and resources, and targeting single or multiple
language learning skills. This end-product comes primarily in the form of
the evaluative frameworks that are gathered together in this final chapter.
248 Chapter 9
However, any of the checklists that do not feature in this chapter, such as
the checklists used in the Case Studies chapters based on the evaluative
criteria ofDunkel, Hubbard, Ingraham and Emery, may also form the part
of future evaluations. Most of the checklists that appear in this chapter
have already occurred in some form or other in the Case Studies and there-
fore need little explanation. However, most do not occur in exactly the
same form as they did earlier, given that the Case Studies were primarily a
formative process whereby the MFE1 prototypes were tried and tested, and
some developments have been made, and some new checklists generated.
The main novelties being the space allowed in most checklists for scoring
(qualitatively) the quality of the nature and use of the CALL resource in
a given context, and twelve CALL Enhancement Criteria sub-checklists
(one for each criterion) to enable a more detailed analysis to be made for
these covering the main theoretical elements linked to each.
MFE2 is, in short, a framework of quantitative and qualifying meas-
ures for the comparative judgment of platforms, programs and pedagogies
that improves on MFE1. It will give us a basis for scoring future CALL
effectiveness research evaluations using a trialled framework drawn from
CALL and SLA principles and road-tested in the Case Studies. Figure 9.1
was introduced in Chapter 4 and provides from the outset a simple over-
view as to the overall proposed evaluative process. The flowchart has been
amended slightly with the Research Design Criteria table appearing twice.
Its first appearance is as a prospective checklist straight after the Diamond
timeline, and here the questions are couched in the future tense. Its second
appearance is just before the final write-up as a retrospective checklist in
which the questions are couched in the past tense. The subsequent tables
enlarge upon each of the elements in this figure.
A clear idea of the effectiveness of CALL will only be gained when
CALL studies follow an agreed agenda and conform to accepted standards
of validity and reporting. MFE2 is a suggested solution. While its main
focus will be forward-looking, there may well also be value in revisiting past
CALL studies to pull these too into the same systematic categorization,
or meta-analysis, looking at not just programs and pedagogy but also the
ever-evolving digital platforms.
EVALUATION FLOW-CHART
CALL enhancement criteria
Learner fit
Meaning focus Qualitative & Quantitative Measures

Authenticity Diamond timeline
Positive Impact Research Design Criteria
Practicality (prospective)
Language skills Data Collection Methods
Learner control Validity checklist
Error correction & feedback Data Collection
Collaborative CALL Research Design Criteria

(retrospective)
Teacher factor
Report write-up
Figure 9.1 Evaluation flowchart (MFE2).
The proposed evaluative process is two-pronged, although evaluators may

wish to focus exclusively on just one of the two prongs. First, there is the
approach that is essentially about the obtaining of judgmental or phenom-
enological data based on the CALL Enhancement Criteria, which can, as
has been demonstrated, be applied to the design and delivery qualities of
the platform, program and pedagogy respectively; and secondly, there is the
obtaining of empirical data based on the systematic application of principles
of empirical research design, data-gathering and validity to student learning
gains over a period oftime. Once an overall study has been carried out a third
stage of a study may be conducted if desired: the discerning of synergies, or
collective dynamic, generated by the Three Ps when operating together.
250 Chapter 9
The left hand column in Figure 9.1 is made up of the twelve evalua-
tive criteria, or CALL Enhancement Criteria, which were generated from
the literature review and subsequent mapping exercises. They, it is argued,
should inform the direction and scope ofCALL evaluation studies, be they
qualitative or quantitative. Table 9.1 gives the full list of twelve criteria and
their definitions.
12 criteria for CALL

Definitions
enhancement
Language learning degree of opportunity present for beneficial focus on
1
potential form
amount of opportunity for engagement with
2 Learner fit language under appropriate conditions given learner
characteristics
extent to which learners attention is directed toward
3 Meaning focus
the meaning of the language
degree of correspondence between the learning activity
4 Authenticity and target language activities of interest to learners out
of the CALL environment
positive effects of the CALL activity on those who
5 Positive impact
participate in it
adequacy and cost-effectiveness of the resources to
6 Practicality
support the use of the CALL activity
the ability to deliver, either in isolation or in
Language skills and combination, all the main language skills, listening,
7
combinations of skills speaking, reading, writing, vocabulary, grammar and
area studies as well as meta-cognitive language skills
the degree of opportunity for self-directed, self-paced
8 Learner control
and autonomous learning
the extent of automated correction of error (whether
Error correction and explicit or implicit, formative or summative) and the
9
feedback monitoring, tracking, storing and reporting of progress,
level and achievement
degree of opportunity for paired or group interaction

10 Collaborative CALL creating the social dynamic for learning through
concerted and collaborative effort
the influence of individual teacher personality and style
11 Teacher factor factors on the effectiveness of CALL; the quality and
relevance of ongoing staff training and development
capacity of the CALL platform, software or pedagogy,
Tuition delivery to enable the delivery of a variety of teaching modes
12
modes (such as lecture, seminar, tutorial and practical) in a
CALL setting
Table 9.1 Synthesized list of criteria for evaluation of CALL programs, platforms
and pedagogy (MFE2). With definitions.
These twelve principles have, it is hoped, the theoretical and conceptual

breadth to cater for the full range of language learning approaches and
CALL resources, even though they have been informed primarily by SLA
and constructivist considerations. Empirical studies of CALLs effective-
ness cannot take place in a theoretical or pedagogical vacuum and any
judgment of student progress will need to be made in the light of existing
language learning theory and pedagogy, and will also need to discern from
the outset which theory (or blend of theories) and which pedagogy (or
blend of pedagogies) is extant in a given setting.
To be effective and usable an evaluative model must avoid falling
between the stools of over-simplicity, which would make it a blunt and
meaningless instrument, and excessive complexity, that might discourage
use through its impenetrability and unwieldiness. The series of evaluation
tables that follow are designed to enable a general assessment of CALL
provision in a given institution as well as penetrative insights into discrete
selected elements of that provision.
The first prong of an evaluative study will employ, it is suggested, the
Twelve CALL Enhancement Criteria and will be primarily qualitative, phe-
nomenological and judgment-based. Tables 9.2 and 9.39.14 illustrate how
the new model will move the evaluator from the general evaluative criterion
(9.2) to the particular sub-elements for each general criterion (9.39.14).
Table 9.3 covers evaluation of the twelve CALL Enhancement Criteria
252 Chapter 9
now enlarged from the criterion definition list (Table 9.1) to a chart with
columns for scoring adherence in a given institution or setting. Once the
evaluator has decided which P (or combination ofPs) is to be evaluated
the aims and objectives and the time-horizon ofthe study then need to be
spelt out. After that the evaluator needs to determine which phase of the
teaching cycle will be assessed, whether it is a task-based learning (TBL),
(Pre-task/Task phase/Language) or a Presentation-Practice-Production
cycle. From that point the form invites a judgmental grading of each of
the twelve elements for the extent to which it features in the cycle and the
quality of its implementation. An overall score may also be given which
will ultimately be converted into a percentage (see Net score cell). Tables
9.39.14 follow the same outline.
MFE2 Discrete principle quality control: 12 criteria for CALL Enhancement
Tick P being Time horizon (cross-sectional;
PPP descriptor Aims and objectives of CALL evaluation
studied longitudinal; time-series)?
Platform (hardware/ software
Which of the 3 Ps? solution; VLE?; brand name?):
Table 9.2 Model for CALL evaluation (MFE2) Quality control.
Program (commercial/ in-house;

Overall evaluation of the twelve CALL enhancement criteria.
networked or online?):
Pedagogy (language learning
theory; method used?):
Phase of Element How well done?
Degree (03)
cycle present? (05)
Description Notes on
CALL enhancement criteria 1 = minimally 1 = minimally
Phase 2
Phase 3
Phase 1
of element evidence
1. language learning potential
2. learner fit
Qualitative/judgmental data gathering
3. meaning focus
4. authenticity
5. positive impact
6. practicality
7. language skills & combinations of skills
8. learner control
9. error correction and feedback
10. collaborative CALL
11. teacher factor
12. tuition delivery modes
13. language learning potential
Totals Grand total: Final % grade:
254 Chapter 9
MFE2 Discrete principle quality control:
Tick P being
PPP descriptor Aims and objectives of
studied
Platform (hardware/ software solution;
Which of the 3 Ps?
VLE?; brand name?):

Pedagogy (language learning theory;
method used?):
Phase of
Element present?
cycle
Discrete principle evaluative sub-descriptors for

criterion 1: Language learning potential
Phase 2
Phase 3
Phase 1
Yes/No
Focus on form (salient L2 input, explicitly flagged

Doughty (1991, in Chapelle 2001))
Highlighting of specific grammatical items
Links to explanations of grammatical structures

Highlighting/repetition of specific lexical items
Links to meaning via word lists or images
Encouragement to discover own errors
Modified interaction
built-in opportunities for interruption of a reading,
listening or viewing task to allow for interactive
sequences and help options
Modified output
improved student discovery strategies through

formative error-correction and feedback
encouragement to use modified TL forms after

corrective feedback
meaningful practice of structural items
Totals
Criterion 1: Language learning potential
Time horizon (cross-sectional; longitudinal;

CALL evaluation
time-series)?
Degree (03) How well done? (05)
0 = not at all 0 = poorly Description of

1 = minimally 1 = minimally Notes on evidence
element
2 = somewhat 3 = to a great extent
Grand total: Final % grade:
Table 9.3 MFE2 quality control.

Evaluation of CALL enhancement criterion 1: Language learning potential.
256 Chapter 9
Tick P being
studied

Which of the 3 Ps?
solution; VLE?; brand name?):


method used?):
Phase of
Element present?
cycle

criterion 2: Learner fit
Phase 2
Phase 3
Phase 1
Yes/No
Opportunity for engagement with language?
Graded levels oflinguistic difficulty?
Needs analysis? Diagnostic test? CAT test?

Assessment of prior learning?

Link between diagnostic test and learning path
assigned?
Accessibility fit for those with special needs?
Content appropriate to age, gender, LD/LS?

Relevance oftask to learner demographic and
needs analysis?
Scaffolding for remedial, extension work?
Clear reason for use ofCALL, value-added?
Integrated with course/ module/units aims?

Personalized and customized levels and
intervention?
Totals
Criterion 2: Learner fit
Time horizon (cross-sectional;

CALL evaluation
longitudinal; time-series)?

element

Evaluation of CALL enhancement criterion 2: Learner fit.
258 Chapter 9

Tick P being
studied
Platform (hardware/ software solution; VLE?;
Which of the 3 Ps?
brand name?):
Program (commercial/ in-house; networked or
online?):
Pedagogy (language learning theory; method
used?):
Phase of Element
cycle present?

criterion 3: Meaning focus
Phase 2
Phase 3
Phase 1
Yes/No
To what extent is the learners attention directed towards the

meaning of the language?
Meaningful practice of structured items
Comprehensible input at a level just beyond that currently
Summarizing of content
Note-taking
Gap-filling
Dictation/transcribing
Information gap activity
Vocabulary building
Feedback/error correction focused on meaning
Comprehension questions (multiple-choice)
Comprehension questions (open-ended)
Combined skill activity (L/R/S/W)
Training in improved communication strategies
Subtitling, voice-over tasks with AV clips
Totals
Criterion 3: Meaning focus

CALL evaluation
time-series)?

element

Evaluation of CALL enhancement criterion 3: Meaning focus.
260 Chapter 9
Tick P being
studied

Which of the 3 Ps?
VLE?; brand name?):

method used?):
Phase of
Element present?
cycle

criterion 4: Authenticity
Phase 2
Phase 3
Phase 1
Yes/No
Projects a clear link between the learning activity

and target language activities of interest to learners
beyond the CALL environment
The interaction is context-embedded

the language is cognitively demanding involving
episodes of choice and problem-solving learners

might meet in the real world
The CALL task affords the opportunity to use the
TL in ways that the learner will be called upon to do
as a language user
Integrating audio/video material into classes
Use non-linguistic representations (graphs, charts,
maps, images)
Encourage students to test their knowledge in real
settings
Make reference to people the students know
Authentic content taken from target culture (web,
printed, audio-visual, native speakers)
Simulations of real-life activities (telephone calls
(MALL), shop, job interview, surveys, blogs, wikis,
newsroom, Facebook groups, etc)
Totals
Criterion 4: Authenticity

CALL evaluation
time-series)?

element

Evaluation of CALL enhancement criterion 4: Authenticity.
262 Chapter 9

Tick P being
studied
Which of the 3 Ps?

method used?):
Phase of
Element present?
cycle
Phase 2
Phase 3
Phase 1
criterion 5: Positive impact
Yes/No
The effect of CALL use on those who use it

PLATFORM:
Encourages student engagement through
ergonomic design, intuitive interface and
functionality
Combines well with software and pedagogy to

deliver content appropriate to lesson aims
PROGRAM:
Encourages student engagement through
ergonomic design, intuitive navigation and
functionality
Customizable to student level and learner need
Adds value to classroom teaching. Not bolt-on
Provides positive reinforcement through
automated feedback and tracking
PEDAGOGY
Integrates CALL well with curriculum
Recognizes student effort leading to improved
engagement in cognitive processes
Promotes a positive self image in the learner
Totals
Criterion 5: Positive impact

CALL evaluation
time-series)?
How well done?

Degree (03)
(05)
Description of
element

Evaluation of CALL enhancement criterion 5: Positive impact.
264 Chapter 9
Tick P being
studied

Which of the 3 Ps?
VLE?; brand name?):

method used?):
Phase of
Element present?
cycle
Phase 2
Phase 3
Phase 1
criterion 6: Practicality
Yes/No
Are the CALL resources adequate to support the

CALL activity?
There are sufficient workstations to cater for
demand
There is competent technical support available
The PCs are networked and linked to

institutional LAN and WWW
The PCs have sufficient memory and speed to
support multimedia activities and user numbers
The software has sufficient user-licences to meet
demand
The software is appropriate to the level and
content of the curriculum
The software is compatible with the platform and
PCs
The staff are trained in the use ofthe platform
The staff are trained in the use ofthe software
The staff are trained in CALL pedagogy
The resources are cost-efficient
The resources are time-saving
Totals
Criterion 6: Practicality

CALL evaluation
time-series)?
How well done?

Degree (03)
(05)
Description of
element

Evaluation of CALL enhancement criterion 6: Practicality
266 Chapter 9

Tick P being
studied
Which of the 3 Ps?
VLE?; brand name?):

method used?):
Phase of Element
cycle present?
Phase 2
Phase 3
Phase 1
criterion 7: Language skills
Yes/No
Does the CALL resource have the capacity to deliver

individually or in combination the principal language
skills?
Listening (from single utterance to longer speech)
Reading (from simple texts to whole ebooks)
Speaking (from pronunciation drills to
conversation, and interpreting)

Writing (from gap-fill, to translations and essays)
Vocabulary acquisition
Grammar
Area studies/civilization
and meta-cognitive language learning skills
Dictionary skills
Note-taking, referencing, web-based research
Language learning strategies
transferable skills
Basic ICT skills (file management/ word-
processing etc)
Presentational ICT
Multimedia use (subtitling, video editing etc)
Totals
Criterion 7: Language skills and combination of skills

CALL evaluation
time-series)?
How well done?

Degree (03)
(05)
Description of
element
Table 9.9 MFE2 quality control. Evaluation of CALL enhancement criterion 7:

Language skills and combination of skills.
268 Chapter 9
MFE2 Discrete principle quality control: Criterion 8: Learner control
Tick P being
studied

Which of the 3 Ps?

method used?):
Phase of
Element present?
cycle
Phase 2
Phase 3
Phase 1
criterion 8: Learner control
Yes/No
What degree of opportunity is afforded by the

CALL resource/activity for self-directed, self-paced
and autonomous learning?
It encourages and trains the students in the meta-
cognitive processing of tasks
It gives the students, when appropriate, control of
the workstation

the software
the learning content
It gives the students flexibility regarding the
learning context (LRC, classroom, teaching lab,
self-access lab)
Students may contribute to decisions regarding:
Learning path
Pace oflearning
Number of attempts at a task
Peer review
Communication with group
Communication with tutor
Totals
Criterion 8: Learner control

CALL evaluation
time-series)?

element

Evaluation of CALL enhancement criterion 8: Learner control.
270 Chapter 9
Tick P being
studied

Which of the 3 Ps?
VLE?; brand name?):

method used?):
Phase of
Element present?
cycle

criterion 9: Error correction and feedback
Phase 2
Phase 3
Phase 1
Yes/No
Does it provide for the correction of errors?

Explicit correction
Implicit correction
Formative feedback
Summative feedback
Monitoring of student activity
Tracking of student activity
Reporting of student activity
Certification of student activity
Student access to data
Teacher access to data
Totals
Criterion 9: Error correction and feedback

CALL evaluation
time-series)?

element
Table 9.11 MFE2 quality control. Evaluation of CALL enhancement criterion 9:

Error correction and feedback.
272 Chapter 9

Tick P being
studied
Which of the 3 Ps?

Pedagogy (language learning
theory; method used?):
Phase of
Element present?
cycle
Phase 2
Phase 3
Phase 1
criterion 10: Collaborative CALL
Yes/No
What degree of opportunity is afforded by the

CALL resource for paired or group interaction
and learning?
The CALL resource provides for two-way
interaction:
Teacher-students
Student-student paired interaction

Student-student group interaction
The CALL resource is equipped to enable
Audio-visual conference link between students
within the same room
Audio-visual conference link between students
in separate labs on same or remote site
Audio-visual link conference between students
and others via WWW applications (e.g. Skype,
MSN etc)
Chat/ text-based communication between
students
Individual and group uploads to VLE or
common space
Peer review online of group work
Individual and group feedback and tracking
Totals
Criterion 10: Collaborative CALL

CALL evaluation
time-series)?

Description of
element
2 = somewhat 3 = to a great extent 5
3 = fully = excellently

criterion 10: Collaborative CALL.
274 Chapter 9
Tick P being
studied

Which of the 3 Ps?

method used?):
Phase of
Element present?
cycle
Discrete principle evaluative sub-descriptors for criterion

11: Teacher factor
Phase 2
Phase 3
Phase 1
Yes/No
To what extent does the CALL resource allow for

input and customization by the teacher?
In terms of content
In terms oflevel of difficulty

In terms oflearning path
In terms of pace
In terms of remedial and extension provision
In terms of assessment
In terms of personalizing input
In terms of self-authoring of content
In terms ofteachers personality and teaching style
To what extent does the CALL resource provide
training for the teacher?
In terms of online, animated or paper manual
In terms of ongoing support for the teacher
Totals
Criterion 11: Teacher factor

CALL evaluation
time-series)?

element

Evaluation of CALL enhancement criterion 11: Teacher factor.
276 Chapter 9

Tick P being
studied
Which of the 3 Ps?

method used?):
Phase of
Element present?
cycle

criterion 12: Tuition delivery modes
Phase 2
Phase 3
Phase 1 Yes/No
Does the CALL resource allow for a flexible

approach to delivery mode?
Resources for Tutor preparation of materials
Space and equipment for tutor preparation of
their own authored materials
Equipment for tutor preparation of materials

from other sources (WWW, CD-ROM, AV, VLE)
Tutor delivery and follow-up
Lecture/tutorial mode: to whole class or selected
groups or individuals
Seminar mode: Interactive, two-way, sessions
Facilitator mode: tutor scanning, monitoring and
intervention, control of student screen, keyboard,
mouse. Tracking of student performance.
Student engagement with resources in different
modes
Access to online helps, tutorials and dictionaries
Access to WWW, gadgets, storage, hardware,
software, tutor, technical support, other students
Totals
Criterion 12: Tuition delivery modes

CALL evaluation
time-series)?
Degree(03) How well done? (05)

element

Evaluation of CALL enhancement criterion 12: Tuition delivery modes.
278 Chapter 9
The above tables should provide the evaluator with a clear idea ofthe qual-
ity ofCALL provision, resources and adherence to principles of pedagogy.
As such they may exist as a stand-alone study. However, if the evaluator
wishes to conduct an empirical, or positivistic, study of the impact of
CALL provision on student language learning gains and experience, and
data-gathering of a quantitative and qualitative nature, then what follows
is guidance as to a possible methodology to follow. As with the above
such a study may stand alone or else be combined and configured with
the phenomenological approach above to determine synergies provide a
fuller and richer picture.
A proposed timeline for an empirical CALL evaluation
The principal diagnostic and metric processes involved in a CALL effec-

tiveness research, or CALL impact, study looking at student gains over a
period of time, are pulled together Figure 9.1 below, in a single timeline
or flow-chart, which is the diamond referred to in the Evaluative flow-
chart. Most of these measures featured in the Case Studies. The vertical
line represents the chronological starting point of any empirical and/or
phenomenological study. It indicates the need to establish a benchmark of
standards from which to compare like-with-like, and control for differences
in the independent variables (such as: age, gender, competence, learning
style). CALL studies invariably address one or more ofthe three Ps in either
a cross-sectional study over varied periods of time (a day, week, semester,
academic year, etc.) and adopting single or repeated measures, time-series
or replication studies. Most CALL studies have tended to gather mainly
qualitative data and correlate them with similar studies in the literature; a
more powerful model will add quantitative data and triangulate them with
the qualitative data and similar studies in the literature. MFE2 aims to pro-
vide such a configuration, drawing on the various agendas and approaches
in the CALL/SLA literature that informed the mapping processes of the
Case Studies and proposing a systematic method for cataloguing past
evidences for CALL effectiveness and adding to them, incrementally, the
findings of future studies that meet the quality control criteria set out in
the literature (in particular by Dunkel, Pederson, Hubbard, Chapelle, and
Felix) and reviewed and trialled in this study.
Following on from Figure 9.2 are four quality control checklists (Tables
9.159.18) that were prefigured in Chapter 4, and provide prompts linked
to ensuring a high standard of research design, data collection, and validity
in an empirical study. Clearly a foundational understanding of statistical
techniques and an ability to use statistical analysis software is a prerequisite
for proceeding down this route. There is not space nor was it the aim in this
study to provide tuition in statistical analysis and readers should at the very
least consult the relevant literature before conducting such a study. Table
9.15 provides prospective questions relating to the design of the research
study, such as the sample size, allocation of subjects to treatment and com-
parison groups, and the metric measures to be employed. These need to be
addressed prior to the commencement ofthe study. The same table should
be revisited retrospectively at the end ofthe study, and Table 9.18 is similar
to 9.15 but with the questions couched in retrospective terms. Table 9.16
addresses the data collection methods (both qualitative and quantitative)
that will be employed, and gives space for the researcher to score the degree
and nature oftheir use. Table 9.17 asks the evaluator to consider the valid-
ity issues (both internal and external) necessary for a robust study whose
findings may then be generalizable to other contexts and studies.
Figure 9.2 Evaluation diamond for CALL effectiveness research (MFE2).

Research Design Criteria for a CALL impact study
Criteria Questionnaire Details
What will be your Sample Size at start of study (N =?)
Sampling
What will be your Sample Size at end of study (N =?)

What will be your complete number of Full Data Sets? (N=?)
What Pedagogical approach or Teaching Methodology will be adopted?
PPP
What Platform Technology will be used (software, hardware or hybrid)?

What Program Technology will be used?
Over what Period of Time will the study take place?
What is the Research Construct (Experimental, Quasi-experimental, Non-
experimental, Pre-experimental) of your study?
What Research Design (combination of qualitative and quantitative data
collection methods) will be employed? E.g. between-subjects time-series
study with focus groups; or within-subjects, cross-sectional with surveys.
Is the study a Between-Subjects or Within-Subjects design?
Conditions of the study
Are the Instructors across the groups the same person/different people?
Are the Activities across the groups: identical, near-identical, different?
Will the Pre- and Post-tests be identical, near-identical or different?
What Language(s) are being studied?
What Language Skill/Combination of language skills is under analysis?
What Variable(s) are being analysed
Is the Allocation of Subjects to groups random or selective
If Random allocation, how will this be achieved?
If Selective, what criteria and methods will be used to select subjects
What methods for controlling for & isolating of variables will be adopted?
Will the scoring be carried out by an independent scorer?
Is the wording of your Null Hypothesis and your Alternative Hypothesis
appropriate? Will these be recorded in your reporting?
What instrument(s) will be used for the Comparison of Means?
What instrument(s) will be used to measure Correlation? Parametric or
non-parametric?
What instrument(s) will be used to measure Variance? Parametric or non-
parametric?
What instrument(s) will be used to measure Covariance? Parametric or
non-parametric?
Will an Effect Size equivalent be given where relevant?
What degree of confidence will be established at the outset? (99% or 95%)
Table 9.15 Research Design Criteria pre-project checklist for MFE2.

Phase of cycle (if Element How well done?
Degree (03) Evidence
relevant) present? (05)
Presentation/ 0 = not at all 0 = poorly
relate comments
Data collection methods for a CALL impact study Practice/ 1 = minimally 1 = minimally
to the fuller
Production? Yes/No 2 = somewhat 3 = to a great extent
definition of
Pre-task, Task phase, 3 = fully 5 = excellently
each descriptor
Language phase? CJ = cannot judge CJ = cannot judge
Diagnostic survey of prior learning
Diagnostic survey of learning style
Diagnostic survey of learning strategies
Pre-treatment survey of student reaction
Mid-treatment survey of student reaction
Post-treatment survey of student reaction

CALL learner + learning measures
Pre-treatment student focus group

Mid-treatment student focus group
Post-treatment student focus group
Pre-treatment survey of staff reaction
Mid-treatment survey of staff reaction
Post-treatment survey of staff reaction
Pre-treatment staff focus group
Mid-treatment staff focus group
Post-treatment staff focus group
Electronic/paper log/journal of student reaction
Electronic/paper log/journal of staff reaction
empirical data
Test(s) of prior learning

Quantitative/
Pre-test
Progress test (mid-treatment)
Post-test (identical to pre-+ progress test)
Table 9.16 Model for CALL evaluation MFE2 Quality control:Data collection measures. For learner and learning.
282 Chapter 9
Phase of cycle
(if relevant)
Validity checklist Pre-task,
for a CALL impact study Task phase,
Language
phase?
Is this an experimental (variables can be controlled for/manipulated) or

quasi-experimental (variables cannot be controlled for/manipulated) study?
Have the students been randomly assigned to the treatment and

comparison groups?
Have the respondents been isolated from each other?
Are the results attributable to the factor(s) studied?
Internal validity
What other factors (variables) might have contributed to the effect?
How will you control for extraneous variables (such as learner/ teacher
differences, variable settings, time of day/week/year)?
How certain are you the learners are not getting language instruction
apart from through this study?
Does the student reporting accurately reflect what happened?
Are the different variables (independent/control/ dependent) clearly
identified and reported?
Sample less easily generalizable N < 30 use non-parametric tests
External validity
To what extent can the results be generalized to other populations,

settings and experimental situations? How relevant are they elsewhere?
Does the report describe the skills tested?

Does the report describe the characteristics of the subjects?
(i.e. age; gender; ability/year group; cohort/course
Does the report describe the CALL materials used?
How well done?

(05)

Research construct validity checklist.
How well done?

(05)

Research construct validity checklist.
284 Chapter 9
Research Design Criteria for a CALL impact study

End of study Criteria Questionnaire Details
What was your Sample Size at start of study (N = ?)
Sampling
What was your Sample Size at end of study (N = ?)

What was your complete number of Full Data Sets? (N = ?)
What Pedagogical approach or Teaching Methodology was adopted?
PPP
What Platform Technology was used (software, hardware or hybrid)?

What Program Technology was used?
Over what Period of Time did the study take place?
What was the Research Construct (Experimental, Quasi-experimental,
Non-experimental, Pre-experimental) of your study?
What Research Design (combination of qualitative and quantitative data

collection methods) was employed? E.g. between-subjects time-series
study with focus groups; or within-subjects, cross-sectional with surveys.
Was the study a Between-Subjects or Within-Subjects design?
Were the Instructors across the groups the same person/different
Conditions of the Study
people?
Were the Activities across the groups: identical, near identical, different?
Were the Pre- and Post-tests identical, near identical or different?
What Language(s) were being studied?
What Language Skill/Combination of language skills were under
analysis?
What Variable(s) were being analysed?
Was the Allocation of Subjects to groups random or selective?
If random allocation, how was this achieved?
If selective, what criteria and methods were used to select subjects?
What methods for controlling for and isolating of variables were adopted?
Was the scoring carried out by an independent scorer?
Was the wording of your Null Hypothesis and your Alternative

Hypothesis appropriate? Have these been recorded in your reporting?
What instrument(s) were used for the Comparison of Means?
What instrument(s) were used to measure Correlation? Parametric or

non-parametric?
What instrument(s) were used to measure Variance? Parametric or non-
parametric?
What instrument(s) were used to measure Covariance? Parametric or
non-parametric?
Was an Effect Size equivalent given where relevant?
Was the degree of confidence established at the outset maintained?
(99% or 95%)
Table 9.18 Research Design Criteria post-project checklist for MFE2.
Reporting of CALL effectiveness research needs to address the above

issues and the above tables have been designed to reflect the good practice
recommended in the literature and thereby instil both a logical progres-
sion and thoroughness to such reports. Ifthese measures are followed they
should also make reporting on an evaluative study a more straightforward
process.
Conclusions and future recommendations
It is clear that, in the search for rigour and improved validity in CALL effec-
tiveness MFE2 might run the risk of imposing excessive rigour, of losing
sight ofthe wood in the focus on the separate trees. It also risks alienating
the human subjects under investigation through excess of monitoring and
measuring, of exasperating the evaluator with excessive demands, and of
controlling out the human element just because it is so hard to pin down.
As Felix puts it:
286 Chapter 9
Naturally, one can go too far in the demand for the application of rigorous condi-
tions to educational research. After all, if we managed to control for every possible
confounding variable in an experimental design we would be left with the technol-
ogy itself as an independent variable, when in todays learning environment this is
inextricably linked to the instructional method and the context in which the learning
takes place. (2005a: 23).
The identification, exploration, measurement and evaluation of what Felix

describes above as inextricably linked have been the focus ofthis enquiry.
Our Model for Evaluation has also helped us progress towards an improved
means of identifying and qualifying synergies between the Three Ps. The
evaluative framework outlined above was the primary product ofthe inves-
tigation. It will, hopefully, provide a way forward for the more precise
exploration ofboth the trees and the wood ofCALL effectiveness research.
For in many ways the diagnostic tools we used to get this far were blunt
instruments, the more so when the number of variables at play increased,
and when the small sample sizes fell short of a generalizable level. In addi-
tion, it must be remembered that our Case Studies were restricted to UK
students in two universities, and confined to a time-horizon ofbetween a
semester and three years, with only one Case Study (BLINGUA) taking
us beyond one academic semester.
Felix adds that it is imperative to also look at studies that focus on the
process oflearning rather than outcomes alone (2005a: 3). Our Case Stud-
ies showed that to study process effectively requires a stable environment,
students and staff that are familiar with the CALL routines and technol-
ogy and longitudinal studies involving repeated measures analyses on the
same students (not a different cohort). The Case Studies all took place in
contexts where a process of migration to new digital resources was taking
place; as such they were contexts in a state offlux. Nevertheless, they were
a useful cross-sectional snap-shot of the transition to CALL and threw
up useful insights regarding appropriate blends of resource, setting and
pedagogy. While all ofthe Case Studies lasted at least a semester, which is
perhaps long enough to gain useful data in a stable setting where students
(and staff ) are familiar with the CALL setting, resources, and teaching
approach, may not be an adequate length oftime and a stable enough set-
ting for obtaining reliable data.
While our research design was based on sound empirical principles

(i.e. pre-test/ post-test and a configuration of qualitative and quantitative
data) there were problems in its administration which future research will
need to address. Fuller data sets would probably have been obtained by
administering post-tests at a different time when students attention was
not on exam revision or the submission of coursework. Larger sample sizes
would have been obtained by adopting a repeated measures approach, such
as a time-series study, whereby one group of students followed the treatment
(be it CALL only or a blend ofCALL/non-CALL) for half ofthe period
of time (a semester, or academic year) and the other was denied the treat-
ment (i.e. would follow a non-CALL only treatment or a different blend
ofCALL); at the half way point the groups would swap round. This would
have possibly also better served a longitudinal study. This approach would
ensure equality and address the ethical problem of denial of treatment to
the comparison group. The challenge, again, would be in the administra-
tion, in particular, ensuring that after the swap round the assessments were
the same or near-identical, to ensure validity, while not repeating the same
content and thereby wasting students time.
To the extent that they endorse, or cast doubt on, the efficacy ofCALL
the findings from any well-conducted study should have implications for
the future pursuit of CALL and language teaching. If the answers to the
above questions had shown categorically in favour of CALL, then the
implications would have been that any language teacher not using CALL
on principle should seriously reconsider his/her position. Conversely, if
the findings had cast serious doubt on the efficacy of CALL then, like-
wise, every language teacher who had, perhaps unthinkingly, embraced
CALL would need to reassess their position. As, however, the findings
have been largely inconclusive, suggesting that neither CALL nor non-
CALL is significantly more effective than the other, then prudence would
suggest a pragmatic approach that adopted and blended the best of both
worlds, while continuing to test and evaluate the evolving pedagogies and
resources.
The Case Studies have shown clearly that CALL platforms, programs
and pedagogy should not be viewed as an all-or-nothing proposition, which
would be a threatening idea to many and prohibitively expensive to adopt,
288 Chapter 9
but rather it should be seen as a tool to be integrated thoughtfully into

language learning curricula after careful evaluation of its potential merits
and ofthose points in a scheme of work where it will add value. Clearly our
students welcome the appropriate incorporation oftechnology into their
learning, and are motivated by those platforms, programs and pedagogies
that they feel enhance their learning experience, if not their learning out-
comes. They are increasingly familiar with a wide range oftechnologies in
their everyday life, and a curriculum that made no use oftechnology would
potentially deny them essential skills for life. What my research has shown
to be needful, however, is a pragmatic blending oflearning environments,
resources and approaches, that draws on good practice from the classroom
and the lab/ virtual environment, and is flexible enough to take on board
new technologies (IWBs, MALL, podcasting, etc) as, where, and when
these are shown to be effective.
What MFE1 and MFE2 have begun to address is the need for a working
evaluative system capable of identifying and qualifying both the separate
dynamic of, and the interrelationships between, CALL platforms, programs
and pedagogies of diverse kinds. In addressing calls in the literature for a
robust agenda for CALL effectiveness research our enquiry has defined
the scope of, and procedure for, a holistic, integrative, progressive and
incremental approach to enlarging the bank of evidence for the impacts
of CALL. Felix appears to have hinted at the need for something of its
kind when she stated:
While no single study, nor any meta-analysis on its own can so far give a definitive
answer on ICT effectiveness, a series of systematic syntheses of findings related to
one particular variable such as learning style or writing quality might produce more
valuable insights into the potential impact oftechnologies on learning processes and
outcomes. These would need to incorporate qualitative findings rather than rely on
effect sizes alone. An approach like this would begin to establish a research agenda in
ICT effectiveness rather than continue the series of isolated single studies on different
topics from which it is difficult to draw firm conclusions. (Felix, 2005a: 17).
This study has, it is hoped, begun to address Felixs call. The Case Studies
have all addressed various aspects of her agenda: her call for more rigour
in construct design, for greater detail and transparency in reporting, and
for an atomistic approach to gaps in the CALL effectiveness research lit-

erature (e.g. the oral skill). But if the Case Studies findings are not to be
consigned to her series of isolated single studies on different topics from
which it is difficult to draw firm conclusions, they need to be harnessed to
her call for systematic syntheses of single variable analyses and replication
studies of narrowly-focused projects. And if the evaluative framework, to
which this investigation has given birth, is to be harnessed as a working
tool for such a systematic approach, it can start by collating and incorpo-
rating the existing body of work that already meets Felixs and the MFE
criteria, including those isolated single studies. This study has been less
about justifying CALL, or trying to prove its incontrovertible impacts in
the different Case Studies, than about how best to evaluate CALL in a
range of contexts, using a variety of resources, and delivered via a number
of different teaching approaches. What has been established is a systematic
agenda and methodology for CALL evaluation, which may also serve as a
more general methodology for CALL research. Its ultimate aim must be to
generate an ongoing, logically-sequenced, and ever-enlarging meta-analysis
that will, with every new study carried out using sound methodology in
line with an agreed agenda for CALL evaluation, add credibility to the
body of evidence for CALLs effectiveness.
Bibliography
Alderman, D.L. (1978). Evaluation of the TICCIT computer-assisted instructional

system in the Community College (final report volume 1). Princeton, NJ: Edu-
cational Testing Service.
Allum, P. (2002). CALL and the classroom: The case for comparative research.
ReCALL, 14(1): 144166.
Al-Seghayer, K. (2001). The effect of multimedia annotation modes on L2 vocabu-
lary acquisition: A Comparative Study. Language-learning and Technology
[online], 5(1): 202232. Available at: <http://llt.msu.edu/vol5num1/alseg-
hayer/> [accessed 12 January 2008].
Barr, D. (2004). ICT Integrating computers in teaching, creating a computer-based
language-learning environment. Bern: Peter Lang.
Barr, D., Leakey, J. & Ranchoux, A. (2005). TOLD like it is! An evaluation of an
integrated oral development model. Language-learning & Technology [online],
9(3): 5578. Available at: <http://llt.msu.edu/vol9num3/pdf/vol9num3.pdf>
[accessed October 2005].
Barrire, C. & Duquette, L. (2002). Cognitive-based model for the development of
a reading tool in FSL. CALL Journal, 15(5): 469481.
Bax, S. (2003). CALL Past, present and future. System, 31: 1328.
BECTA (British Educational Communications and Technology Agency) (2004).
What the research says about using ICT in modern foreign languages [online].
Available at: <http://www.becta.org.uk/page_documents/research/wtrs_mfl.
pdf> [accessed October 2004].
Bell, M.A. (2002). Why use an Interactive Whiteboard? A bakers dozen reasons!
Teachers.Net Gazette [online], (3)1, Available at: <http://teachers.net/gazette/
JAN02/mabell.html> [accessed 19 December 2007].
Bersin & Associates. (2003). Blended learning: what works? [Online]. Available at:
<http://www.e-learningguru.com/wpapers/blended_bersin.doc> [accessed 18
August 2004].
Buglear, J. (2000). Stats to go A guide to statistics for hospitality, leisure and tourism.
Oxford: Butterworth-Heinemann.
Chapelle, C. (1989). Using intelligent Computer-Assisted Language-Learning. Com-
puters and the Humanities, 23: 5970.
292 Bibliography
Chapelle, C. (1997). CALL in the year 2000: still in search of research paradigms?
Language-learning & Technology [online], 1(1): 1943. Available at: <http://
llt.msu.edu/vol1num1/chapelle/> [accessed 10 October 2004].
Chapelle, C. (1998). Multimedia CALL: lessons to be learned from research on
instructed SLA. Language-learning and Technology [online], 2(1): 2234. Avail-
able at: <http://llt.msu.edu/vol2num1/article1/> [accessed 18 August 2004].
Chapelle, C. (2001). Computer applications in Second Language Acquisition. Cam-
bridge: Cambridge University Press.
Chapelle, C., & Jamieson, J. (1991). Internal and external validity issues in research
on CALL effectiveness. In P. Dunkel (ed.), Computer-Assisted Language-learn-
ing and Testing: Research Issues and Practice, pp. 3759. New York: Newbury
House, 1991.
Clarke, M. (2005). Moving towards the digital classroom. [Conference paper]. Pre-
sented at EUROCALL 2005 conference Krakow, Poland.
Coleman, J.A., & Klapper, J. (eds) (2005). Effective learning and teaching in modern
languages. London & New York: Routledge.
The Concise Oxford Dictionary (1982). 7th edn. Oxford: Oxford University Press.
Cutrim Schmid, E. (2007a). Enhancing performance knowledge and self-esteem in
classroom language-learning: The potential of the ACTIVote component of
interactive whiteboard technology. System, 35: 119133.
Cutrim Schmid, E. (2007b). Interactive Whiteboard technology: A further step
towards the normalisation ofCALL? [Conference paper]. Presented at EURO-
CALL 2007 Conference, University of Ulster.
CyberItalian. [Online]. Available at: <http://cyberitalian.com/> [accessed 10 Octo-
ber 2007].
Davies, G. (1997). Lessons from the past, lessons for the future: 20 years of CALL.
In A.-K. Korsvold & B. Rschoff (eds), New technologies in language-learn-
ing and teaching. Strasbourg: Council of Europe. Available at: <http://www.
camsoftpartners.co.uk/coegdd1.htm> [updated December 2007, accessed 11
January 2008].
Davies G. (ed.) (2007). Information and communications technology for language
teachers (ICT4LT). Slough, Thames Valley University [Online]. Available at:
<http://www.ict4lt.org/en/evalform.doc> [accessed 7 December 2007].
Davies, G., Bangs, P., Frisby, R., & Walton, E. (2005). Setting up effective digital
language laboratories and multimedia ICT suites for MFL. CILT. [Online].
<http://www.languages-ict.org.uk/info/digital_language_labs.pdf> [accessed
26 August 2005].
Davies, G., & Higgins, J. (1982). Computers, language and language learning. London:
CILT.
Bibliography 293
Decoo, W. (2001). On the mortality oflanguage-learning methods. [Online]. Avail-

able at: <www.didascalia.be/mortality.htm> [accessed 26 July 2004].
Delcloque, P. (ed.) (2000). The history ofComputer-Assisted Language-Learning
Web exhibition. [Online]. Available at: <http://www.eurocall-languages.org/
resources/> [accessed 9 October 2007].
Doughty, C. (1988). Relating Second-Language Acquisition theory to CALL research
and application. In W.F. Smith (ed.), Modern media in foreign language educa-
tion: Applications and projects, pp. 133167. Lincolnwood, IL: National Text-
book Company, 1988.
Driscoll, M. (1994). Psychology of learning for instruction. Boston: Allyn and
Bacon.
Driscoll, M. (2002). Blended learning: Lets go beyond the hype. E-learning, 3(3):
254.
Dunkel, P. (ed.) (1991). Computer-Assisted Language-Learning and testing: Research
issues and practice. New York: Newbury House.
Dunkel, P. (1991). Research on the effectiveness ofComputer-Assisted Instruction and
Computer-Assisted Language-learning. In P. Dunkel (ed.), Computer-Assisted
Language-Learning and testing: Research issues and practice, pp. 136. New
York: Newbury House.
Dunkel, P. (1999). Considerations in developing or using second/foreign language
proficiency computer-adaptive tests. Language-learning & Technology [online],
2(2): 7793. Available at: <http://llt.msu.edu/vol2num2/article4/> [accessed
22 July 2006].
Eliot, T.S. (1943). The Four Quartets. [Online]. Available at: <http://www.tristan.
icom43.net/quartets/> [accessed: 26 April 2005].
Ellis, R. (1994). Oxford Applied Linguistics: Study of Second Language Acquisition.
Oxford: Oxford University Press.
Ellis, R. (1997). SLA research and language teaching. Oxford: Oxford University
Press.
Felix, U. (1993). Marking: A pain in the neck The computer to the rescue. Babel,
28(3): 1516.
Felix, U. (1999). Exploiting the Web for language teaching: Selected approaches.
ReCALL, 11(1): 3037.
Felix, U. (2000a). The impact ofthe Web on CALL. [Online]. Melbourne: Monash
University. Available at: <http://users.monash.edu.au/~ufelix/finalWebCall.
htm> [accessed 10 October 2007].
Felix, U. (2000b). The potential of CD-ROM technology for integrating language
and literature: student perceptions. [Online]. German as a foreign language, 2:
4863. Melbourne: Monash University. Available at: <http://www.gfl-journal.
de/22000/felix.pdf> [accessed 16 December 2003].
294 Bibliography
Felix, U. (2001). A multivariate analysis of students experience of web based learning.

Australian Journal of Educational Technology, 17(1), 2136.
Felix, U. (2002). The Web as a vehicle for constructivist approaches in language teach-
ing. ReCALL, 14(1): 215.
Felix, U. (2004). Paradoxes and pitfalls of ICT effectiveness research: Some modest
solutions. [Conference paper]. In: 11th International CALL Conference on
CALL & Research Methodologies, University of Antwerp Proceedings,
pp. 113142.
Felix, U. (2005a). Analysing recent CALL effectiveness research Towards a common
agenda. Computer Assisted Language-Learning 18(1&2): 132.
Felix, U. (2005b). E-learning pedagogy in the third millennium: The need for combin-
ing social and cognitive constructivist approaches. ReCALL, 17(1): 85100.
Felix, U. (2005c). What do meta-analyses tell us about CALL effectiveness? ReCALL,
17(2): 269288.
Felix, U. (2006). Accelerative Learning Wonder method or pseudo-scientific gob-
bledygook? Melbourne: CAE Press.
Felix, U. (2007). The unreasonable effectiveness of CALL: What have we learnt in
two decades of research? [Plenary talk]. Given at: EUROCALL 2007 Confer-
ence at the University of Ulster. [Online archive]. Available at: <http://www.
eurocall2007.com/> (View Plenary 3) [accessed 5 December 2007].
Fry, H., Ketteridge, S., & Marshall, S. (eds) (2003). A handbook for teaching and learn-
ing in higher education. 2nd edn. London and New York: Routledge Falmer.
Gillespie, J.H., & Barr, J.D. (2002). Resistance, reluctance and radicalism: A study
of staff reaction to the adoption ofCALL/C&IT in modern languages depart-
ments. ReCALL, 14(1): 120132.
Gillespie, J.H., & McKee, J. (1995). The Text Analysis Program: Moving closer to the
computer-based language class-room. [Conference paper]. In: Proceedings of
EUROCALL 1995, Valencia: Servicio de Publicaciones SPUPV, pp. 133146.
Gillespie, J.H. & McKee, J. (1999a). Does it fit and does it make any difference? Inte-
grating CALL into the curriculum. CALL, 12(5): 441455.
Gillespie, J.H. & McKee, J. (1999b). Resistance to CALL: Degrees of student reluc-
tance to use CALL and ICT. ReCALL, 11(1): 3846.
GlobalEnglish. [Online] <http://www.globalenglish.com/> [accessed 10 October
2007].
Hewett, T., Baecker, R., Card, S., Carey, T., Gasen, J., Mantei, M., Perlman, G., Strong,
G., & Verplank, W. (1996). ACM SIGCHI curricula for Human-Computer
Interaction. [Online]. Available at: <http://www.sigchi.org/cdg/index.html>
[accessed 21 December 2007].
Higgins, J. (1983). Can computers teach? CALICO Journal, 1(2): 46.
Bibliography 295
Hincks, R. (2003). Speech technologies for pronunciation feedback and evaluation.

ReCALL, 15(1): 320.
HotPotatoes. [Online]. Available at: <http://hotpot.uvic.ca/> [accessed 11 Septem-
ber 2005].
Hu, Y. (2006). Chinese learners and Computer-Assisted Language-Learning: A study
of learning styles, learner attitudes and the effectiveness of CALL in Chinese
higher education. [PhD dissertation]. University of Edinburgh, Scotland.
Hubbard, P. (1988). Language teaching approaches, the evaluation ofCALL software,
and design implications. In W.F. Smith (ed.). Modern media in foreign language
education: Theory and implementation, pp. 227252. Lincolnwood, IL: National
Textbook Company, 1988.
Hubbard, P. (2004). Some subject, treatment and data collection trends in current
CALL research. [Conference paper]. In: 11th International CALL Conference
on CALL & Research Methodologies, University of Antwerp Proceedings,
pp. 165166 [online]. Available at: <http://www.stanford.edu/~efs/call04/
CALL04_files/frame.htm> [accessed 10 October 2007].
Hubbard, P. (2005). A review of subject characteristics in CALL research. Computer
Assisted Language-Learning, 18(5): 351368.
Ingraham, B.D., & Emery, C.R. (1991). France Interactive: A hypermedia approach
to language training. Educational and Training Technology International, 25(4):
321333.
Interdeutsch. [Online]. Available at: <http://www.virtuelles-kaufhaus.de/> [accessed
10 October 2007].
Krashen, S.D., & Terrell, T.D. (1983). The Natural Approach Language acquisition
in the classroom. Oxford: Pergamon Press.
Lafford, B.A. (2004). Review ofTell Me More Spanish. Language-learning & Technol-
ogy [online], 8(3): 2134. Available at: <http://llt.msu.edu/vol8num3/review1>
[accessed 6 May 2005].
Laurillard, D. (1994). Reinventing the steering wheel. ALT-J, 6(1): 67.
Laurillard, D., & Hewer, S. (1998). TELL Consortium Project Evaluation. University
ofHull, Hull. [Online]. Available at: <http://www.hull.ac.uk/cti/tell/eval.htm>
[accessed 18 December 2008].
Leakey, J. (2006). Evaluation of a one-year trial of Auralogs TellMeMore Education
(version 7) software package in a higher education context. CALL-EJ Online,
8(1): 17. Available at: <http://www.tell.is.ritsumei.ac.jp/callejonline/jour-
nal/81/leakey.html> [accessed 3 July 2006].
Leakey, J. (2007). Report on a trial of TellMeMore Online (v.9) at the University of
Ulster December 2006May 2007. [Unpublished report]. Commissioned by
and submitted to the French software company Auralog.
296 Bibliography
Leakey, J., & Ranchoux, A. (2005). BLINGUA An evaluation of a CALL blended

learning pilot project. [Conference paper]. Presented at U-CALL Conference,
University of Ulster, June 2006.
Leakey, J., & Ranchoux, A. (2006). BLINGUA A blended language-learning
approach for CALL. Computer Assisted Language-learning, 19(4&5): 357372
[online]. Available at: <http://www.informaworld.com/smpp/content~conte
nt=a768150127~db=all> [accessed 29 December 2006].
Levy, M. (1997). Computer-Assisted Language-Learning, context and conceptualiza-
tion. Oxford: Clarendon Press.
Levy, M. (2000). Scope, goals and methods in CALL research: Questions of coher-
ence and autonomy. ReCALL, 12(2): 170195.
Levy, M. (2002). CALL by design: Discourse, products and processes. ReCALL
14(1): 5884.
Lewis, C.S. (1978 [1943]). The Abolition of Man. Glasgow: William Collins Sons.
McCarthy, B. (1999). Integration: The sine qua non ofCALL. CALL-EJ Online, 1(2):
116. Available at: <http://www.tell.is.ritsumei.ac.jp/callejonline/journal/12/
mccarthy.html> [accessed 10 January 2006].
McCarty, W. (1995). Cannot without the procss of speech be told. [Keynote speech].
In: Proceedings of EUROCALL 1995, Valencia, pp. 1934.
Melissi. [Online]. Available at: <http://www.melissi.co.uk> [accessed 31 December
2007].
Mehanna, W.N. (2004). E-Pedagogy: The pedagogies of e-learning. ALT-J, Research
in Learning Technology, 12(3): 279293.
Mortimore, P. (1999). Understanding pedagogy and its impact on learning. London:
Paul Chapman.
Murphy, R.T., & Appel, L.R. (1977). Evaluation of the PLATO IV computer-based
education system in the Community College. Princeton, NJ: Educational Test-
ing Service.
Neumeier, P. (2005). A closer look at blended learning Parameters for designing a
blended learning environment for language teaching and learning. ReCALL,
17(2): 163178.
Oliver, M., & Trigwell, K. (2005). Can blended learning be redeemed? E-Learning,
2(1): 1726.
Pallant, J. (2001). SPSS Survival Manual A step by step guide to data analysis using
SPSS for Windows (Versions 10 and 11). Maidenhead: Open University Press.
Pederson, K.M. (1988). Research on CALL. In W.F. Smith (ed.), 1988, pp. 99131.
Piaget, J. (1932). The moral judgment of the child. London: Routledge & Kegan
Paul.
Bibliography 297
Piaget, J. (1970 [trans. of 1967]). Science of education and the psychology of the
child. London: Longman.
Reid, J. (1987). The learning style preferences of ESL students. TESOL Quarterly,
21: 87111.
Remenyi, D., Williams B, Money, A., & Swartz, E. (1998). Doing research in business
and management: An introduction. London: Sage.
Robotel Language Lab Systems website. [Online]. Available at: <http://www.robotel.
com> [accessed: March 2004].
Ross, M. (1991). The CHILL factor (or computer-hindered language-learning). In
Language Learning Journal, 4: 656.
Rousseau, J.-J., (1762). Emile, or On Education. [Online]. Available at: <http://www.
ilt.columbia.edu/pedagogies/rousseau/index.html> [accessed 10 January 2008].
English trans. by B. Foxley, 1911; rev. by G. Roosevelt, 1998.
Rowntree, D. (1981). Statistics without tears A Primer for non-mathematicians.
Harmondsworth: Penguin Books.
Salaberry, M.R. (1996). A theoretical foundation for the development of pedagogi-
cal tasks in Computer Mediated Communication. CALICO Journal, 14(1):
536.
Sanako. [Online]. Available at: <http://www.sanako.com> or <http://www.multi-
media-fl.com/LAB100datasheet1.pdf> [accessed 17 December 2007].
Saunders, M., Thornhill, A., & Lewis, P. (2006). Research methods for business stu-
dents. 4th edn. Upper Saddle River, NJ: Prentice Hall (Pearson Education).
Shadish, W.R., Cook, T.D., & Campbell, D.T. (2002). Experimental and quasi-exper-
imental designs for generalized causal inference. Boston: Houghton Mifflin
Company.
Smith, W.F. (ed.) (1988). Modern media in foreign language education: Applications
and projects. Lincolnwood, IL: National Textbook Company.
TellMeMore Online version 9 web portal. [Online]. Available at: <http://www.
TellMeMorecampus.com/portalCOR/modportalCOR.axrq> [accessed 27 June
2007].
Thompson, J. (2005). Computer-Assisted Language-Learning. In J.A. Coleman & J.
Klapper (eds), 2005, pp. 148152.
Toner, G., Barr, D., Carvalho Martins, S., Duffner, K., Gillespie, J., & Wright, V. (2007).
Multimedia language-learning in UK universities A report by the Subject
Centre for Languages, Linguistics and Area Studies carried out on behalf ofthe
Centre for Excellence in Multimedia Language-Learning, University ofUlster.
[Online]. Available at: <http://www.cemll.ulster.ac.uk/site/news/CETL%20
Survey> [accessed 29 June 2010].
298 Bibliography
Towell, R., & Hawkins, R.D. (1994). Approaches to Second Language Acquisition.
Clevedon, Avon: Multilingual Matters.
University of Southampton Study Skills website. [Online]. Available at: <http://
www.studyskills.soton.ac.uk/studytips/learn_styles.htm> [accessed 30 Novem-
ber 2007].
Vygotsky, L.S. (1978). Mind in society: The development of higher psychological
processes. Cambridge: Harvard University Press.
Warschauer, M. (1996). Computer-assisted language-learning: An introduction.
In S. Fotos (ed.), Multimedia language teaching. Tokyo: Logos International,
pp. 320 [online]. Available at: <http://www.ict4lt.org/en/warschauer.htm>
[accessed 20 April 2006].
Warschauer, M. (2000). The death of cyberspace and the rebirth of CALL. English
Teachers Journal, 53: 6167 [online]. Available at: <http://www.gse.uci.edu/
person/markw/cyberspace.html> [accessed 20 July 2005].
Warschauer, M., & Healey, D. (1998). Computers and language learning: An overview.
Language Teaching, 31: 5771.
Yeh, S.-W., & Lehmann, J.D. (2001). Effects oflearner control and learning strategies
on English as a Foreign Language (EFL) Learning from interactive hypermedia
lessons. Journal of Educational Multimedia and Hypermedia, 10(2): 141159
[online]. Available at: <http://www.editlib.org/index.cfm?fuseaction=Reader.
ViewAbstract&paper_id=8413> [accessed 22 November 2007].
Index
Note: Page numbers in italics denote references to tables and figures
ability of students 105, 117 BECTA (British Educational Com-

and BLINGUA 241 munications and Technology
and CALL Enhancement Agency) 6061
Criteria 107 behaviouristic approach 3839
and TOLD Project 217 and BLINGUA 223, 225
acronyms 24 and TMM 171
affective variables 50 and TOLD Project 200
Alderman, D.L. 88 Bell, M.A. 139
answer-judging hypotheses 108109 Black Box 156157
approaches to language learning 9 Blended Learning (BL) 35, 4345
approaches to research 6, 1112 advances 190191
area studies see BLINGUA see also BLINGUA
atomistic studies 8 blended learning approach
attitudes of learners 100, 102 and BLINGUA 223, 225
and BLINGUA 240 and TMM 171
and CALL Enhancement and TOLD Project 201
Criteria 106 BLINGUA 43, 89, 152, 198, 218221
to new technology 197 and CALL Enhancement
to remote learning 181, 183 Criteria 242243
to TOLD Project 216 data collection 226, 229
Auralog see TellMeMore (TMM) and evaluation framework 240241
authenticity in TMM7 178 learning environment 147
authenticity criterion 4849, 79, 80, 83 MFE1 121122, 150151, 154155
and BLINGUA 242 pedagogical approach 221225, 223,
and MFE2 250 225
and pedagogy 99, 106107, 111, qualitative data 228, 235239
112113 quantitative data
and platforms 84, 8687, 136 (BLINGUA-1) 232234
and programs 9091, 93 quantitative data
and TMM 192 (BLINGUA-2) 234235
Authoring Tool 174175, 191 and TMM7 172, 178179
tuition modes 239
Bacon, Sir Francis 7 validity assessment checklist 230231
Barr, D. 27 variables 227
Bax, S. 4041, 220 Bruner 46
300 Index
Buglear, J. 28 Coleman, J.A. and Klapper, J. 45

collaborative CALL criterion 81, 8385
CALICO conference (1994) 50 and BLINGUA 243
CALL, definition 2223 and MFE2 251
CALL Enhancement Criteria 74114, and pedagogy 112, 114
75, 83 and platforms 87, 137
and BLINGUA 242243 and programs 91, 93
and digital platforms 135144, and TMM 193
136137 common agenda, principles for 118119,
and MFE2 249251 121122
and TMM 191192, 192193 communicative approach
see also task appropriateness and BLINGUA 223, 225
principles and TMM 171
Cameron 25 and TOLD Project 200
Case Studies communicative CALL 3839
rationale 9, 1112, 1920 comparative studies, limitations 6366,
research constraints 1617 71
CASLA (Computer-Assisted Second comparison and treatment groups 15
Language Acquisition) 5, 2425, comparison debate 6366
45 competence levels 142143, 162
CELL (Computer Enhanced Language composite system 4445
learning) 24 comprehension class mode and
Chapelle, C. 2, 6263, 115116 BLINGUA 239
CALL task appropriateness princi- Computer Adaptive Test (CAT) and
ples 4749, 74, 7680, 83 TMM9 172
and BLINGUA 242 computer environments see environment
and pedagogy 99, 106107, 111113 Computer-Assisted Instruction
and platforms 84, 85, 8687, 136 (CAI) 103104
and programs 9091, 93 computer-assisted learning (CAL) and
and TMM 192193 blended learning 44
CASLA 5, 2425, 45 Computer-Assisted Second Language
levels of analysis 1213 Acquisition (CASLA) 5, 2425,
SLA-informed approach 69 45
validity 66, 127 computer-mediated communication
Chapelle, C. and Jamieson, J. 27, 29, 115 (CMC) 1
CILT (UK National Centre for configuration debate 6668, 71
Languages) 141 constraints of Case Study method 1617
Clark 26 constructivist approach 4647
Clarke, M. 153, 157, 159, 163 and BLINGUA 223, 225
CMC (computer-mediated and TMM 171
communication) 1 and TOLD Project 201
coding, definition 167 content vs. delivery system 62
Index 301
control groups, need for 6465 Melissi 53, 153163

cooperative learning criterion 114 and CALL Enhancement Crite-
cost-effectiveness 26, 101, 142143, 161 ria 83, 85, 8687
courseware design 23, 5556, 70, 8889, and MFE1 158, 160, 161162
103 Robotel 5253, 147155, 150151,
appropriateness 94 154155
Cultural Workshop 174, 179 subsections 135, 139140
curriculum, integration of CALL 4 survey on 145146
customized software 42 Virtual Learning Environments
(VLEs) 53, 135, 138, 140, 142
data collection 15 disciplines related to CALL 2223
and BLINGUA 226, 229 Doughty, C. 62
checklist 123, 126 Dunkel, P. 32, 100107, 116, 247
configuring 67, 68 and BLINGUA 240241
and MFE2 281 and CALL Enhancement Criteria 83
and TOLD Project 203, 208 and digital platforms 139140,
and TMM7 177 161162
and TMM9 181, 188189 and TOLD Project 216217
data see qualitative data; quantitative data
Davies, G. and Higgins, J. 24 effectiveness research 6, 8, 10, 21, 2527,
Davies, G. et al. 23, 34, 53, 137139, 88, 285286
141144 agenda for 115122
debates on CALL effectiveness 5972 and Dunkel 100104
Decoo, W. 37, 41 history 59
definition of CALL 2223 and MFE2 248, 279
delivery system vs. content 62 and Pederson 108
descriptive statistics 129 sequence for 122124
design logic 70 e-learning pedagogies 4445, 112114
design models 1617 see also TellMeMore Campus/Online
configuring 67 (TMM9)
design of research checklist, MFE2 280, Ellis, R. 9, 10, 50, 81, 83, 9598, 99
284285 empirical data collection, importance
diagnostic surveys 9798 of 8
digital labs 5253, 54, 135, 140, 143, empirical methods vs. judgmental
145146 methods of evaluation 1314
digital platforms Enhancement Criteria 74114, 75, 83
advantages 144 and BLINGUA 242243
definition 133134 and digital platforms 135144,
designs 140141 136137
digital labs see digital labs and MFE2 249251
evaluation 142143 and TMM 191192, 192193
Interactive Whiteboards (IWBs) 135, see also task appropriateness
139140, 143 principles
302 Index
environment 104, 142143, 146147, 161 halo effects

and affective variables 50 and BLINGUA 240
and BLINGUA 241 and pedagogy 106
and TOLD Project 217 and TOLD Project 216
error correction and feedback crite- halo error 102
rion 74, 81, 83, 98, 114 hardware, design 3536
and BLINGUA 243 Harris and Kington 60
and MFE2 250 HE language teaching, impact of
and pedagogy 107, 111113 CALL 61
and platforms 87, 137 Hewett, T. et al. 135136
and programs 93 Higgins, J. 47
and TMM 193 history of CALL 3738, 4042
ethnographic research 32 Hubbard, P. 117
evaluation, qualitative and and CALL Enhancement Crite-
quantitative 25 ria 83, 9294, 93
evaluation diamond 124 and Melissi 156157, 158
MFE2 279 and Robotel 152, 154155
Evaluation Flowchart 75, 123 and TMM 196
evaluation framework and Human-Computer Interaction
BLINGUA 240241 (HCI) 50
evaluative methodology 1214 humanitiestechnology interface 56
experience levels of students 117 hybrid (software/hardware) solutions 138
external validity see validity
ICT, benefits 6061
familiarity 117 ImpaCT2 project 60
feedback improvement debate 6063, 71
on BLINGUA 235237 inferential statistics 129131
on TMM7 179180 Ingraham, B.D. and Emery, C.R. 146
on TOLD Project 214215 and CALL Enhancement Criteria 83
Felix, U. 9, 61, 6869, 103, 109110, and courseware design 8889, 9091
285286, 288289 and Melissi 160
data collection 7, 67, 128 and Robotel 148149, 150151,
effectiveness research 116120, 154155
121122 and TMM 194195
validity 127, 132 Integrated CALL 40
Fisher 105 Integrative CALL 3839, 4041
Flash Player, use in TMM 176 Interactive Whiteboards (IWBs) 135,
139140, 143
games 103 interactiveness quality 78
Garrett 40 internal validity see validity
Gillespie, J.H. and Barr, J.D. 197, 215
Gillespie, J.H. and McKee, J. 197 judgmental methods vs. empirical
GlobalEnglish 103 methods of evaluation 1314
Index 303
Krashen, S.D. 50 and pedagogy 99, 106107, 111,

Kulik and Bangert-Drowns 104 112113
Kulik et al. 102, 104 and platforms 84, 8687, 136
and programs 9091, 93
Lafford, B.A. 42, 174175, 178 and TMM 192
language learning potential criterion 48, learners, internal competence vs.
80, 83 performance 96
and BLINGUA 243 learning environment 146147
and MFE2 250, 255 Learning Paths feature of TMM 179
and pedagogy 99, 106107, 111, lecture mode and BLINGUA 239
112113 levels of competence 142143, 162
and platforms 84, 8687, 136 Levy, M. 2223, 35, 41, 55, 59, 9091
and programs 9091, 93 Lian, Andrew 24
and TMM 192 locations for learning 142143, 145, 161
language skills and combinations of skills longitudinal approach to comparative
criterion 81, 83 study 66
and BLINGUA 243
and MFE2 250 MacWhinney 2627
and pedagogy 107, 113 mapping exercises 84114
and platforms 137 McCarthy, B. 181, 220
and TMM 193 McCarty, Willard 2, 168
Laurillard, D. 4 meaning focus criterion 48, 80, 83,
Laurillard, D. and Hewer, S. 26 135136
Leakey, J. 81, 83, 119, 126, 137, 193, 243 and BLINGUA 242
learner attitudes 100, 102 and MFE2 250
and BLINGUA 240 and pedagogy 99, 106107, 111,
and CALL Enhancement 112113
Criteria 106 and platforms 84, 8687, 136
to new technology 197 and programs 9091, 93
to remote learning 181, 183 and TMM 192
to TOLD Project 216 Mehanna, W.N. 3637, 4445, 83,
learner control criterion 81, 83 112114
and BLINGUA 241, 243 Melissi 53, 153163
and MFE2 250 and CALL Enhancement Crite-
and pedagogy 107, 111, 113 ria 83, 85, 8687
and platforms 8687, 137, 162 and MFE1 158, 160, 161162
and programs 91, 93 MFE1 9, 121122, 124
and TMM 193 and BLINGUA
and TOLD Project 217 data collection 226
learner differences 5051, 142143, 162 pedagogical approach 223, 225
learner fit criterion 4849, 80, 83 validity assessment criteria 230231
and BLINGUA 242 variables 227
and MFE2 250 Data Collection checklist 126
304 Index
and Melissi 158, 160, 161162 Morgenstern 33

qualitative data 131132 Mortimore, P. 36
quantitative data 129131 Multimedia Language learning surveys
Research Criteria checklist 125126 (2006) 145146
and Robotel 150151, 154155 (2007) 61
and TMM 194196 Murray 67, 6668
data collection 172
and TMM9 182 novice students 117
and TOLD Project 200201
data collection 203, 208 Oliver, M. 44
validity assessment checklist 206 Oliver, M. and Trigwell, K. 43, 220
variables 208 oral skills training see TOLD Project
Validity checklist 127 outcomes vs. processes debate 6870,
MFE2 9, 68, 248 7172
and authenticity criterion 260261
and CALL Enhancement pedagogical approach
Criteria 250251 and BLINGUA 221225, 223, 225
and collaborative CALL and TMM 171
criterion 272273 and TOLD Project 200201, 204
data collection methods 281 pedagogies 2, 4, 9
and error correction and feedback of CALL 3839
criterion 270271 definition 3637
evaluation flowchart 249 as design basis 36
and language learning potential history 3738, 4042
checklist 254255 interrelationship with program 34
and language skills and combination and task appropriateness princi-
of skills criterion 266267 ples 94114, 99, 106107, 111,
and learner control criterion 268270 112113
and learner fit criterion 256257 Pederson, K.M. 8, 6264
and meaning focus criterion 258259 and CALL Enhancement Criteria 83
and positive impact and effectiveness research 26, 115,
criterion 262263 119120, 121122
and practicality criterion 264265 and pedagogy 108111
and research design criteria 280, and software 55, 167
284285 Pennington 35
and teacher factor criterion 274275 performance data as competence
and tuition delivery modes indicator 96
criterion 276277 phenomenological approach 6, 1112,
validity assessment checklist 282283 3032
Model for Evaluation (MFE) see MFE1; Piaget, J. 46, 49
MFE2 platform-judging considerations 136137
Index 305
platforms 5255 Qualitative and Quantitative Meas-

evaluation 35 ures 75, 121122, 124, 128129
for language learning 2324 quality control checklists for MFE2 253
and task appropriateness princi- authenticity criterion 260261
ples 84, 8687, 136137 collaborative CALL
positive impact criterion 4849, 78, 80, criterion 272273
83, 136 data collection methods 281
and BLINGUA 242 error correction and feedback
and MFE2 250 criterion 270271
and pedagogy 99, 106107, 111, language learning potential
112113 criterion 254255
and platforms 84, 8687, 136 language skills and combination of
and programs 9091, 93 skills criterion 266267
and TMM 193 learner control criterion 268269
positivistic approach 6, 1112, 3032 learner fit criterion 256257
practicality criterion 4849, 79, 80, 83 meaning focus criterion 258259
and BLINGUA 242 positive impact criterion 262263
and MFE2 250 practicality criterion 264265
and pedagogy 99, 106107, 111, research design criteria 280, 284285
112113 teacher factor criterion 274275
and platforms 84, 8687, 136 tuition delivery modes
and programs 9091, 93 criterion 276277
and TMM 193 validity assessment 282283
processes vs. outcomes debate 6870, quantitative data
7172 and BLINGUA 226, 229
programs 556 and BLINGUA-1 232234
interrelationship with pedagogy 34 and BLINGUA-2 234235
and task appropriateness princi- descriptive statistics 129
ples 9091, 93 inferential statistics 129131
purpose of evaluation 78 and MFE2 281
vs. qualitative data 25, 2832
qualitative data 1312 and TMM 172
and BLINGUA 226, 228, 229, and TMM7 178179
235239 and TMM9 184188
and MFE2 281 and TOLD Project 203, 205, 208,
vs. quantitative data 25, 2832 209213
and TMM 172 questions, research 1011
and TMM7 177, 178179
and TMM9 183184 Reeves 67
and TOLD Project 203, 208, 214215 Reid, J. 51
reliability quality 78
306 Index
remote learning 138 student ability 105, 117

student attitudes 181, 183 and BLINGUA 241
replicability issues in comparative and CALL Enhancement
studies 6566 Criteria 107
replication studies 31 and TOLD Project 217
reporting of findings 132 student-centred approach
Research Criteria checklist 123, 125126 and BLINGUA 223, 225
research methodologies 15 and TOLD 201
configuring 67 student reactions 142143, 162
research philosophy 1112 style variables, VARK model 51
research process onion 30 surveys
research questions 1011 Melissi 157, 159, 161162
research shortcomings 117 Multimedia Language learning
Richmond 27 (2006) 145146
Robinson et al. 108109, 111 Multimedia Language learning
Roblyer et al. 100, 102, 105 (2007) 61
Robotel 523, 147155, 150151, 154155
Rousseau, J.-J. 46, 49 task appropriateness principles 4749,
74, 7680, 83
Salaberry, M.R. 6768, 71 and BLINGUA 242
sample sizes 10 and pedagogy 99, 106107, 111113
Saunders, M. et al. 30 and platforms 84, 85, 8687, 136
scaffolding 47 and programs 9091, 93
Schmitt 67 and TMM 192193
Second Language Acquisition see also Enhancement Criteria
(SLA) 89, 4550 teacher factor criterion 81, 83, 137
framework for investigating 99 and BLINGUA 243
research 95 and MFE2 251
seminar mode and BLINGUA 239 and TMM 193
Sheingold, Kane and Endreweit 32 teacher role and TMM9 189190
skill areas benefiting from CAI 102103 teacher-led approach
Smith, W.F. 116 and BLINGUA 223, 225
social constructivism 4647 and TMM 171
software packages 167 and TOLD Project 200
choices 4142 technological advances
see also courseware design; TellMeM- anxieties about 50
ore (TMM) emphasis on 3334
SOTON-UU 121122 technologyhumanities interface 56
speech recognition software 42 TELL (Technology Enhanced Language
spiral curriculum 46 learning) 24
statistical analyses 1415, 129131 TellMeMore (TMM) 42, 56, 103104,
need for comparison 65 121122, 148155, 150151, 154155
Stern 8, 59 Authoring Tool 174175, 191
Index 307
and CALL Enhancement Crite- quantitative data 203, 205, 208,

ria 191192, 192193 209213
comparison between version 7 and and TMM7 172, 178
version 9 169 validity assessment checklist 206
Cultural Workshop 174, 179 Toner, G. et al. 54, 83, 84, 145146
data collection 172 TOP (Teachers Online Project) 60
Learning Paths 179 tracking feature of TMM9 183
and MFE1 194196 treatment and comparison groups 15
and pedagogical approaches 171 Tschirner 60
technical evaluation 176177 tuition delivery modes criterion 81, 83,
TellMeMore Education (TMM7) 171 144
and Computer Adaptive Test and BLINGUA 243
(CAT) 172 and MFE2 251
context 173 and platforms 87, 137
qualitative findings 177, 178179 and TMM 193
quantitative findings 178179 tuition modes and BLINGUA 239
student feedback 179180
and TOLD Project 172, 178 Underwood 38
TellMeMore Campus/Online University of Portsmouth,
(TMM9) 8889, 171 Melissi 153163
context 173 University of Ulster
data collection 181, 188189 BLINGUA 218
qualitative findings 183184 use of CALL 243245
quantitative findings 184188 migration to digital environment 197
teacher role 189190 Robotel 147155, 150151, 154155
validity assessment criteria 182 TOLD delivery 204
Thompson, J. 4
Thorndike and Hagen 102 validity
time spent on TMM9 187188 ensuring 2829
time-efficiency internal and external 116, 126, 127, 132
and BLINGUA 240 validity assessment
and TOLD Project 216 and BLINGUA 230231
timeline for CALL evaluation 278279 checklist 123, 127
timesaving benefits of CALL 101 and MFE2 282283
TMM see TellMeMore and TMM9 182
TOLD Project (Technology and Oral and TOLD 206
Language Development) 3132, variables
105, 121122, 198199, 202203 BLINGUA 227
data collection methods 203, 208 control of 11819, 121122, 123,
delivery 204205 130131
evaluation framework 21617 influencing language learning 3, 5
pedagogy checklist 200201 internal 29
qualitative data 203, 208, 214215 VARK model 51
308 Index
versatility 144 Willis, Johnson and Dixon 103

Vincent 132 WorldWideWeb 52
Virtual Learning Environments Wyatt 35
(VLEs) 53, 135, 138, 140, 142
Vygotsky, L.S. 4647 Yeh, S.-W. and Lehmann, J.D. 109
Yildiz and Atkins 27, 62
Warschauer, M. 3840
Warschauer, M. and Healey, D. 40 zones of proximal development 4647
WebCT 53
web-enhanced language learning
(WELL) 1

Leakey - Evaluating Computer-Assisted Language Learning: An Integrated Approach To Effectiveness Research in CALL

Încărcat de

Informații document

Titlu original

Drepturi de autor

Formate disponibile

Partajați acest document

Partajați sau inserați document

Opțiuni de partajare

Vi se pare util acest document?

Este necorespunzător acest conținut?

Drepturi de autor:

Formate disponibile

Leakey - Evaluating Computer-Assisted Language Learning: An Integrated Approach To Effectiveness Research in CALL

Încărcat de

Drepturi de autor:

Formate disponibile

Schools, colleges and universities are investing a great deal in

Evaluating Computer-Assisted Language Learning

Evaluating Computer-Assisted Language Learning

Library of Congress Cataloging-in-Publication Data:

Leakey, Jonathan, 1962-

List of Figures vii

Figure 2.1 Research process onion 30

Figure 4.1 Evaluation flowchart (MFE1) 75

Figure 4.2 Elliss Framework for investigating L2 acquisition 97

Figure 9.1 Evaluation flowchart (MFE2) 249

Table 1.1 Levels of analysis for CALL evaluation 13

Table 6.1 Platform-judging considerations linked to the CALL

Table 8.3 Validity assessment criteria for MFE1: Mapping of the

Table 9.3 MFE2 quality control. Evaluation of CALL enhancement

The need for systematic quality control in CALL

New technologies, new literacies and a need to demonstrate their value

The need for a systematic approach to CALL evaluation

This book builds on the agenda-setting work of a small number ofCALL/

effectiveness research, drawing on the findings of empirical as well as more

Existing parameters for CALL evaluation

A book whose main objective is the evaluation ofthe methodologies used

Light at the interface

There is general agreement on the need in a field such as CALL, anchored

to quantitative analysis, but rather that human interaction, or inter-sub-

A need to configure metric methodologies

It is not surprising, given the coincidence in CALL of at least two major

underlines the importance of multiple data sources and urges a configura-

A spouse for fruit: Keeping sight of the main purpose

Much CALL research increasingly focuses on what have been called

Second language acquisition, language theory and the Case Studies

acquisition (SLA). Other theories and approaches were not precluded in

determined blending of resources and approaches in response to student

The foundational research questions

Theoretical considerations and research philosophy

In delineating the parameters of this study a decision was made to adopt

Case Studies (quasi-experimental, real-life based, as opposed to tightly

The theoretical background to the research and the evaluative methodol-

Level of Object of Method of

Does the software provide learners

Does the CALL activity designed

Learners Do learners actually interact and

Empirical and qualitative/quantitative data can be depicted in a variety of

the above analyses are sometimes accompanied by supporting, or negat-

Data collection methods

Qualitative data were collected by means of student and staff evalua-

Experimental design models used

These models varied according to the requirements of the project and

Quasi-experimental studies usually involve both a pre-test and a post-

MFE1 is revisited and improved upon in the final chapter (Chapter

The Case Studies

The third Case Study (Chapter 8) reports on two pedagogy-based

Swings, spirals and re-incarnations:

Effectiveness research, or systematic evaluation, as it has also been called

CALL An interdisciplinary field of study

Interaction (HCI), Information Processing, Instructional Design, Instruc-

educational contexts will provide vital feedback to department managers

Effectiveness research, analysis of data: Definitions and scope

Effectiveness research is more than about providing empirical, scientifi-

Evaluation is about quantifying and qualifying the value or worth of an

Pederson (1988) defines effectiveness research as the search for an

Effectiveness research goes beyond value to explore value added. While it

These references help shed light on what is meant by evaluation and

Quantitative and qualitative data