Evaluating Computer-Assisted Language Learning
An Integrated Approach to Effectiveness Research in CALL

Jonathan Leakey

Schools, colleges and universities are investing a great deal in … In this book the author outlines the existing evidence for the impact of computers on language learning and makes the case for an integrated approach to the evaluation of computer-assisted language learning (CALL). Drawing on current and past research linked to CALL and e-learning, the author builds a comprehensive model for evaluating not just the software used in language learning, but also the teaching and learning that takes place in computer-based environments, and the digital platforms themselves. This book will be of interest not only to language teachers and CALL researchers, but also to those interested in e-learning and general research methodology, as well as designers of educational software, digital labs, virtual learning environments (VLEs) and institutional budget holders.

Jonathan Leakey's interest in evaluating the effectiveness of computers in language teaching comes from his years of teaching modern languages in secondary schools in Liverpool and in the further and higher education sectors in Northern Ireland. Since 2002 he has been a lecturer in French, German and European Studies at the University of Ulster, where he completed his doctorate in 2008.

ISBN 978-3-0343-0145-9
Peter Lang
www.peterlang.com
Evaluating Computer-Assisted Language Learning
An Integrated Approach to Effectiveness Research in CALL

Jonathan Leakey

Peter Lang
Oxford Bern Berlin Bruxelles Frankfurt am Main New York Wien

Bibliographic information published by Die Deutsche Nationalbibliothek.
Die Deutsche Nationalbibliothek lists this publication in the Deutsche
Nationalbibliografie; detailed bibliographic data is available on the Internet
at http://dnb.d-nb.de.

A catalogue record for this book is available from the British Library.

ISBN 978-3-0343-0145-9
E-ISBN 978-3-0353-0131-1

© Peter Lang AG, International Academic Publishers, Bern 2011
Hochfeldstrasse 32, CH-3012 Bern, Switzerland
info@peterlang.com, www.peterlang.com, www.peterlang.net

All rights reserved.
All parts of this publication are protected by copyright.
Any utilisation outside the strict limits of the copyright law, without the
permission of the publisher, is forbidden and liable to prosecution.
This applies in particular to reproductions, translations, microfilming,
and storage and processing in electronic retrieval systems.

Printed in Germany
Contents
List of Tables ix
Chapter 1
The need for systematic quality control in CALL 1
Chapter 2
Swings, spirals and re-incarnations: Lessons from the past 21
Chapter 3
Has CALL made a difference: And how can we tell? 59
Chapter 4
A model for evaluating CALL. Part 1: CALL enhancement criteria 73
Chapter 5
A model for evaluating CALL. Part 2: Qualitative and quantitative
measures 115
Chapter 6
Case Study 1: Evaluating digital platforms 133
Chapter 7
Case Study 2: Evaluating programs 167
Chapter 8
Case Study 3: Evaluating pedagogy 197
Chapter 9
A new framework for evaluating CALL 247
Bibliography 291
Index 299
List of Figures
Figure 5.1 Evaluation diamond for CALL effectiveness research (MFE1) 122
Figure 8.1 Mean improvement from the pre- to the post-test 210
Figure 9.2 Evaluation diamond for CALL effectiveness research (MFE2) 279
List of Tables
Chapter 1
The need for systematic quality control in CALL

Introduction
There have always been sceptics who have doubted whether the computer
has anything significant to add to the language learning experience beyond
the wow factor. Even with the arrival of the modem, broadband, Local
Area Networks (LAN), the worldwide web (WWW), Virtual Learning
Environments (VLE) and e-learning, doubts have persisted and the absence
of clear-cut empirical data demonstrating improved learning has not helped
to quell the uncertainty. It is still not really known with any degree of certainty whether computer-assisted language learning (CALL) makes an objective, measurable and significant difference to students' learning.
There have been plenty of qualitative studies, and these have lent some credence to the educational benefits of new technologies for language learning. The language teacher may now, by means of a computer, deliver the four main language skills (listening, speaking, reading and writing), teach vocabulary, grammar, literature and area studies, and also enhance meta-cognitive language learning skills. Computer-mediated communication (CMC) and web-enhanced language learning (WELL) have sought to exploit the opportunities to motivate a new generation of language learners. Within educational institutions we also have ever-improving multimedia language laboratories, interactive whiteboards (IWBs), networked courseware and sophisticated tracking software. Nowadays, language learning can occur through mobile-assisted language learning (MALL), audio- and video-streaming, MP3s, podcasting and wi-fi: literally,
language learning on the hoof. But can we show that any, or all, of these
do any better than an inspirational and well-organized language teacher
can achieve, or could have achieved in the past, without the benefit of a
computer or digital lab, and using merely those tools of the pre-digital era: paper, pen, chalk (or dry-wipe marker!) and talk, conversation class, group/pair work, cassette recorder and an overhead projector?
The digital revolution has even altered the way language is used.
Chapelle (2004) put it this way: 'language learners are entering a world in which their communicative competence will include electronic literacies, i.e., communication in registers associated with electronic communication' (2004: 2). But are the tools of educational measurement still flexible enough, and do they have the scope, to be able to evaluate and indeed measure the impact of this revolution on language learning and language learners? Indeed, is the task of identifying scientifically the causes of improvement in language learning an impossible one? Is it like trying to 'triangulate on the infinite (or whatever else we choose to call it) with our finite minds and tools', as Willard McCarty put it in his keynote speech to the 1995 EUROCALL Conference in Valencia?
The challenge for those attempting to apply scientific metrics to any Humanities subject (and CALL must surely belong, in large measure, to the Humanities) is that we are dealing with human beings, all of whom possess complex subjectivity, multiple motivations and unique experiences and gifts. Each one uses different learning processes, adopts different learner strategies, and demonstrates different learning styles. However, in
evaluating pedagogy for language acquisition, there is not only the learner
to factor in, but also the learning and the learning environment.
When one considers the learning, there are plenty of language-learning
pedagogies past and present that may be influencing teachers and class-
room or lab proceedings: Behaviourism, Functionalism, Constructivism,
Social Constructivism, Associationism, Connectivism, Socio-linguistics,
Chomskyism, the Natural Approach, Accelerative learning, Suggestopedia,
Second Language Acquisition (SLA), Cognitivism, Task-Based Learning,
Blended Learning (BL) and more. The question is the following: is the role
they play identifiable, and if so, is it susceptible to qualitative appraisal or
even quantitative measurement?
As for the CALL learning environment, clearly there are factors that
must play their part in influencing learning outcomes, such as comfort,
ergonomics and affective or psycholinguistic dynamics. Computer-based
learning environments clearly create their own variables in the learning
equation. Can these, too, be identified, isolated and measured? And if so,
how?
In essence, this book is about evaluation and aims to give the reader,
whatever his or her experience of evaluation, a theoretical introduction as
well as practical tools (i.e. a model for evaluation and stage-by-stage check-
lists) for assessing the value of computers in language teaching and learning
(CALL). This book will look at the history of attempts to be more certain
in evaluating CALL and will explore ways in which evaluation might be
done more efficiently and comprehensively. While the field-work has been
carried out at a UK university level, examples are cited from other sectors of
education from primary, through secondary and up to adult level. Readers
will find that the model for evaluation (abbreviated to MFE) and the checklists have a built-in flexibility that enables them to be applied in a wide range of educational contexts. They will enable the evaluator to carry out a kind of 'quality control' of the key factors that contribute to computer-assisted language learning.
To that end, the focus has been on three variables which were felt at
the outset of the study to encompass the principal factors influencing the
language learner and language learning: the digital platform, the software
program, and the pedagogy employed. It was concluded early on in the
project that an evaluative model for CALL had to deploy the appropriate
metric tools and research approach to assess empirically both the impact
of each distinct element and any added synergies that may operate when
all the elements are working together in a real-life setting.
Yet the gathering and publishing of data must surely not be our final goal.
In our endeavour to improve the performance of our language students in
their target language, and in our search for the elusive goals of optimized
platforms, programs and pedagogy, the role of our data must be to inform
further improvements in teaching and learning as well as CALL software
design and not be an end in themselves. A lesson might be taken from Sir
Francis Bacon, whose condemnation of those who valued knowledge as an end in itself, who use it 'as a mistress for pleasure rather than as a spouse for fruit', might also be applied to those who value data and technology as ends in themselves (Bacon, cited in Lewis 1943: 46). The fruit we seek
as teachers and CALL researchers must ultimately be the progress of our
students, not the generation of unapplied data and evidence.
Language learning theory has had a strong preference for speculation, the expres-
sion of personal opinion, the explanation of practical experience, and participation
in controversy: all perfectly legitimate ways of finding directions, provided they are
balanced by systematic empirical procedures. But in language teaching theory we
have tended to neglect the collection of empirical data (p. 126).
CALL evaluators and researchers need to answer her call for disciplined,
dispassionate research that attempts patiently and carefully to add to what
is already known about how students learn languages in the context of
computer-assisted language learning, sharing her desire to create the best
assurance that CALL, unlike the language lab of the 1960s, will be used
intelligently (1988: 127).
In its attempt to heed Pederson's call to build on the lessons from existing
research, the project behind this book did not espouse any one particular
theory of language learning despite a leaning towards second language
SLA is characterized by small-scale studies. There have been few, if any, studies that
might be characterized as large-scale. However, this can be seen as an advantage, as
it affords a rich contextualized view of how L2 [i.e. Second Language] acquisition
takes place. The danger lies in using local research to advance pedagogic proposals
of a categorical nature. (1997: 252)
The aim is more to afford a broad and deep contextualized view of CALL
learning and CALL evaluation, to add insights and guidelines to the corpus
of good evaluative practice, and to suggest a framework for systematiz-
ing CALL effectiveness research. Out of this, it is hoped, future CALL
effectiveness researchers might more easily identify gaps in the literature,
generate research questions that build on a logical progression of enquiry,
and employ proven methodologies that meet a consensual CALL effec-
tiveness research agenda.
It is now time to look at the research questions and methodologies
used for this study.
The following list of research questions was drawn up prior to the Case Studies and out of the initial literature review. They form the basis for the scope of the literature review in Chapter 1 and, while other subordinate research questions arise out of the literature review, they are the foundational focus for the MFE and the Case Studies in the subsequent chapters.
1. Does CALL really improve language learning, and if so, what is the
evidence for this?
2. What might be a useful evaluation model for investigating and meas-
uring the effectiveness of platforms, programs and pedagogy?
3. Can one usefully compare CALL to traditional methods of second
language teaching and learning?
4. How can one best measure the effectiveness of CALL platforms, programs and pedagogy over a sustained period of time (between 1 semester and 3 years minimum)?
5. Is it possible, using an appropriate evaluation model, to identify best
practice using optimized combinations of multimedia/CALL?
6. How can one integrate best practice using optimized combinations of
multimedia/CALL with more traditional second language teaching
and learning methods and with varying degrees of enthusiasm amongst
staff and students?
7. Is student progress (or lack of it) through CALL or non-CALL pedagogy determined to a significant degree by independent variables such as learning style or prior exposure to/use of ICT? If so, might there be value in mapping student learning paths to their dominant learning style?
Research methodology
Evaluative theory
based speculation about ideal conditions for SLA, in other words CALL
researchers need to be familiar with previous effectiveness research findings
and know as far as possible the agreed best practice for language learn-
ing. Second, criteria should be accompanied by guidance as to how they
should be used; in other words, a theory of evaluation needs to be articu-
lated. Third, both criteria and theory need to apply not only to software,
but also to the task that the teacher plans and that the learner carries out
(pp. 51-52). In other words, a holistic MFE, one that is capable of evaluating platforms, programs and pedagogy, is needful.
Chapelle argues that CALL evaluation has to go beyond what she calls 'judgmental' methods of evaluation to include 'empirical' methods. The former, she argues, is a level of analysis suitable for evaluating both CALL software and teacher-planned activities to determine how well the program/task does the job of improving language competency (see Table 1.1). Empirical analysis, she argues, evaluates the learner's performance, and is therefore conducted through examination of empirical data reflecting learners' use of CALL and learning outcomes (pp. 53-54).
Table 1.1 Levels of analysis for CALL evaluation. Source: Chapelle, C. (2001: 53).
Computer applications in Second Language Acquisition.
Cambridge University Press, reproduced with permission.
This study incorporates both types of evaluative method. The terms 'judgmental' and 'empirical' echo the terms 'qualitative' and 'quantitative'. Judgmental data are usually open, descriptive, verbal, subjective and based on opinion or affective response; they can be collated and analysed qualitatively (for example, in semi-structured interviews or focus groups) or quantitatively (for example, in Likert-scale and closed yes/no questionnaires). Empirical data are closed and explanatory data that are gained by objective observation and/or experiment; they are usually collated numerically (i.e. quantitatively).
Reporting formats
For the most part the studies involved make use of Treatment and Com-
parison groups to control for specific variable(s) and so enable the gathering
of data for the comparing of means.
A treatment (sometimes also called a 'test' or 'experimental') group is,
in most instances, a group taught in a CALL environment such as a mul-
timedia laboratory. In the case of the BLINGUA Project for the same
University of Ulster Case Study, both treatment and comparison groups
were taught in the multimedia laboratory.
A comparison group, for example, might be a group taught in the tradi-
tional manner (i.e. in the classroom, never in the multi-media lab). This was
the case with the TOLD Project in the University of Ulster Case Study.
Quantitative data were collected by means of a pre- and post-test, so that an empirical gauge of learning gains/outcomes could be made. Before teaching began, all subjects were given a test on the areas of language covered by the module. At the completion of the period of instruction, the same, or a near-identical, test was again administered as a post-test.
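To make this comparison-of-means logic concrete, the following is a minimal sketch in Python (not drawn from the book; the scores and group sizes are invented for illustration) of how learning gains from such a pre-/post-test design might be gauged within the treatment group and between the treatment and comparison groups:

# Hypothetical sketch (invented data): gauging learning gains from
# pre-/post-test scores in a treatment (CALL) and a comparison group.
from scipy import stats

treatment_pre = [52, 61, 47, 66, 58, 49, 71, 55]
treatment_post = [63, 70, 55, 72, 66, 58, 78, 61]
comparison_pre = [54, 59, 50, 64, 57, 51, 69, 56]
comparison_post = [60, 63, 53, 68, 61, 55, 73, 59]

# Within-group gain: a paired t-test of post- against pre-test scores.
t_within, p_within = stats.ttest_rel(treatment_post, treatment_pre)

# Between-group comparison of individual gain scores (Welch's t-test,
# which does not assume equal variances across the two groups).
gains_t = [post - pre for pre, post in zip(treatment_pre, treatment_post)]
gains_c = [post - pre for pre, post in zip(comparison_pre, comparison_post)]
t_between, p_between = stats.ttest_ind(gains_t, gains_c, equal_var=False)

print(f"Treatment pre/post: t={t_within:.2f}, p={p_within:.3f}")
print(f"Treatment vs comparison gains: t={t_between:.2f}, p={p_between:.3f}")

In real cohorts the samples above would of course be far too small; the point is only the shape of the analysis.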
Summary of chapters
Having introduced above the broad research questions and research phi-
losophy for my overall enquiry, as well as introduced some of the key ter-
minology and methodology to be used, this study will now, in Chapter 2,
evaluate the relevant literature relating to CALL definitions, CALL history,
the history of CALL effectiveness research, CALL pedagogy and second
language pedagogy in general as it relates to the context of CALL and
CALL evaluation. Chapter 3 addresses the question 'Has CALL made a difference, and how can we tell?' in the context of four key debates that arise frequently in the literature when CALL's effectiveness is being discussed:
whether CALL improves language learning, what the value of compara-
tive evaluations is, what combination of methods is best for measuring
progress, and whether the focus should be on learning processes or learning
outcomes. The chapter concludes by applying past lessons to an improved
model of CALL evaluation. Chapter 4 presents the prototype MFE1 and
suggests two primary routes through the evaluative jungle: one that uses
twelve CALL Enhancement Criteria to judge CALL impact phenom-
enologically and a second that configures Qualitative and Quantitative
Measures to judge CALL impact more positivistically. The choice of the
twelve CALL Enhancement Criteria is then justified by means of mapping Chapelle's six criteria for evaluating CALL task appropriateness
against a number of evaluative agendas from the literature and CALL
practice. Chapter 5 presents the Qualitative and Quantitative Measures
and argues for an empirical methodology of CALL evaluation involving
a triangulation of analytical and diagnostic tools aimed at obtaining data
that are both rich and strong in internal and external validity.
The three chapters that follow on from the presentation of MFE1
(that is, Chapters 6 to 8) draw together the evidence and findings from
the Case Studies which pilot MFE1 in a number of CALL settings. The
Case Studies follow the hierarchical logic of the construction site, starting with the foundation of all CALL activity, the digital platform, followed by
the software program that sits on this platform, and finishing up with the
pedagogy that harnesses these for the purposes of teaching and learning.
The decision to adopt a Case Study approach was born of three principal
needs. First, there was the need to prove, or if necessary disprove, the effec-
tiveness of CALL. Secondly, there was the need to trial and improve, in
real-life academic settings, a Model for Evaluation that would be a flexible
and systematic tool capable of assembling a large-scale picture from numer-
ous small-scale studies using a configuration of data-gathering techniques.
And thirdly, there was the need to develop a pedagogy for CALL that was
both informed by theory and yet pragmatic and flexible enough to identify
and profit from the rapidly changing and diverse world of technology with
which most of our students are already familiar.
The first Case Study (Chapter 6) looks at the evaluation of the impact of digital platforms on the whole CALL process, with particular emphasis on Robotel's SmartClass digital platform as used at the University of Ulster and the Melissi Digital Classroom as used at the University of Portsmouth.
The second Case Study (Chapter 7) evaluates the role of commercial
software in driving (or hindering) the CALL agenda and looks, in particu-
lar, at two evaluative projects that trialled, in the context of higher edu-
cation language teaching, different versions (i.e. a networked CD-ROM
and an online e-learning course) of a high-powered product developed by
a major CALL software manufacturer (Auralog, France).
Chapter 2
Swings, spirals and re-incarnations: Lessons from the past
Introduction
Definitions
CALL: a definition
Levy defines CALL as 'the search for and study of applications of the computer in language teaching and learning' (1997: 1). In light of the review of the literature, and for the purposes of clarity for the enquiry that follows, I have defined CALL as the following: the exploration, sometimes coherent, sometimes disparate, of all aspects of the human-computer axis, with the primary goal of enhancing the process of second-language teaching and learning, be it in curriculum design, delivery, testing, feedback, monitoring or evaluation, by means of the generation of improved computer-based platforms, courseware, learning environments and pedagogies.
Computer-assisted language learning is now in its fifth decade as an
academic discipline or field of study. This relative youth may go some way
to explaining its being a relatively unknown and disparate entity. Also,
its very name suggests three vast areas of knowledge each with their own
fields of study and frames of reference. The notion of 'computer-assisted' automatically links the discipline with the new and rapidly changing digital world, and with it a plethora of fields of varying degrees of relevance to CALL. This relatively new field is linked to two concepts that are nearly as old as human beings, language learning and learning, each with its own conceptual root system. CALL's very name reflects its interdisciplinary nature.
Levy (1997: 49-50) lists twenty-four disciplines that bear upon CALL and are to varying degrees influenced by it. They reflect our three principal conceptual areas: the notion 'computer-assisted' links CALL to Artificial Intelligence (AI), to Computer-Assisted Instruction (CAI), Computational Linguistics, Educational Technology, Expert Systems, Human-Computer
CALL acronyms
For the sake of brevity, acronyms will be used as often as possible. The
acronym CALL, coined in the early 1980s, in all probability by Davies and
Higgins (1982), is one ofthree generic labels that jostled for pre-eminence
in the 1980s and 1990s, the other two being CELL (Computer Enhanced
Language learning) and TELL (Technology Enhanced Language learn-
ing). The name CELL was probably first coined by Professor Andrew
Lian around 1988, and, like CALL, emphasizes the computers role in
language learning, while allowing for all types of computer programs and
computer-based resources. The TELL Consortium was founded at the
1993 EUROCALL conference in Hull and was based at the Centre for
Modern Languages at the University ofHull in the UK; its nomenclature
implies a broader scope than CELL including all the technologies involved
in language learning. CALL, nevertheless, has stuck as a term. CALL, as
well as any acronym, emphasized the gamut of roles the computer can play
in learning, and by 1997 had already made its way into the titles of key
journals and conferences (ReCALL, CALL, On-CALL). The emphasis
on the term 'computer-assisted' in the name CALL is more neutral than 'computer-enhanced', emphasizes the facilitating role of the computer, and
discourages the perception of the computer as the tutor. As we will see,
this distinction has become increasingly important in the debate concern-
ing causality in the learning process.
While not departing from the generally accepted acronym of CALL, this book will nevertheless balance the concept of learning with the role of the teacher and his/her pedagogy. Chapelle's (2001) use of the concept 'acquisition' in her preferred acronym Computer-Assisted Second Language Acquisition (CASLA) perhaps comes closest to this balance, while still placing the greater emphasis on the learning, rather than the teaching
process. Likewise her use of the term 'task' implies a two-way engagement: something set by the teacher, but worked on with varying degrees
of autonomy by the student. Clearly learning can and should take place
without the teacher, but faced with the increasing plethora of resources
available to the learner, this book will argue that, now as much as ever, the
teacher/tutor/facilitator is needed to harness, integrate, pilot and humanize
the learning materials and processes, and that a clear and holistic agenda is
required to enable a coherent assessment as to how effectively this is being
done. To this end the notions of evaluation and effectiveness research need
to be clarified.
Evaluation studies
We need to ask whether there are other media or another set of media attributes that
would yield similar learning gains. The question is critical because if different media
or attributes yield similar learning gains and facilitate achievement of necessary per-
formance criteria, then in a design science or an instructional technology, we must
always choose the less expensive way. (Clark 1994: 22, cited in Allum 2002: 147).
called for evaluation studies that assessed not just the courseware but its operationalisation in the classroom: 'The design of computational systems to support foreign language instruction needs to be grounded in what we know about human learning, language instruction, and human-computer interaction. Principles derived from these fields need to be tested and quantified in the context of specific tutoring systems' (Barrière and Duquette 2002: 472).
Yildiz and Atkins (1993, cited in Levy 1997: 41) called for an evalu-
ation of learning outcomes with different sizes of learner group and with
different methods of integrating the multimedia application into other
learning taking place in that context.
Is it, however, just about measuring learning outcomes, and merely
about product?
More recently Barr has argued that 'few researchers have investigated how to integrate all these applications to achieve maximal benefit for the learning process' (2004: 12). He quotes Richmond (1999: 312) as suggesting this as an area for future research. Barr concludes his book on computer-based learning environments with the following recommendation: 'a future study might also look at empirical evidence of the impact of computer technology on the language learning and teaching process' (Barr 2004: 226).
The work of researchers such as Chapelle and Jamieson has focused
primarily on just this area of process. They argue forcefully the case for
research that identifies and isolates the specific variables surrounding CALL
activity that may, or may not, be contributing to increased learning gains.
They state that:

The researcher must ask and answer the following questions: What factors other than the hypothesized CALL activity could have influenced students' performance on the test used to measure the effectiveness of the CALL activity in promoting L2 development? What factors other than the attitudes or strategies under investigation could have been responsible for students' reported perceptions and use of learning strategies? What justifies the interpretation of particular behaviours observed as suggestive of certain linguistic functions or cognitive strategies, and to what extent were two independent raters able to agree on those interpretations? (1991: 54)
Buglear defines data as 'a set of known facts' and the difference between qualitative and quantitative data as the difference between categorizing and measuring. He states that 'any data that is based on characteristics or attributes is qualitative. Data that is based on counting or measuring is quantitative' (2000: 23). Thus if we were to describe a certain CALL program or teaching method as effective, we could verify this either by using qualitative terms to categorize the responses or effect it had on students (i.e. 'motivating', 'helped me to learn my verbs', 'improved my fluency and pronunciation'), or by quantifying the actual effect such a program had on the students: giving them the same test twice, once at the start of a treatment and once at the end, and measuring the difference. One could then measure the effect on a whole class, take the average of the whole group and compare the means before and after the treatment. Such an approach would be a quantitative approach.
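As a toy illustration of Buglear's distinction (invented data, not taken from the book), the categorizing and the measuring operations might look like this in Python:

# Hypothetical sketch: Buglear's distinction between categorizing
# (qualitative) and measuring (quantitative) data.
from collections import Counter
from statistics import mean

# Qualitative: free-text responses coded into categories, then counted.
coded_responses = ["motivating", "helped with verbs", "motivating",
                   "improved fluency", "motivating", "helped with verbs"]
print(Counter(coded_responses))  # frequency of each response category

# Quantitative: the same test administered before and after a treatment.
pre_scores = [55, 62, 48, 70, 59]
post_scores = [61, 66, 54, 75, 63]
print(f"mean before: {mean(pre_scores):.1f}, "
      f"mean after: {mean(post_scores):.1f}, "
      f"mean gain: {mean(post_scores) - mean(pre_scores):.1f}")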
Whether quantitative or qualitative, the research requires rigour, or validity, for its findings to be generalizable beyond the context of the study. Many factors go towards ensuring such validity, such as the size of the sample studied, the isolation of contributing variables, the ethical integrity of the research, the length of time over which the study was carried out and the repeatability of the data. Validity can be divided into two categories: internal and external. Internal validity can be described as 'the accurate attribution of observed experimental results to the factors that were supposed to be responsible for those results', and external validity as 'the applicability of research results to instructional and research contexts other than the one in which the research was carried out' (Chapelle and Jamieson 1991: 38).
[Figure 2.1: The 'research process onion', showing concentric layers from research philosophy (positivism, phenomenology) and research approaches (deductive, inductive), through research strategies (experiment, survey, case study, grounded theory, ethnography, action research) and time horizons (cross-sectional, longitudinal), to data collection methods (sampling, secondary data, observations, interviews, questionnaires). Source: Saunders, M., Thornhill, A. and Lewis, P. (2006: 85), Research Methods for Business Students, 4th edn, Pearson Education Ltd, reproduced with permission.]
This book contains a number of Case Studies where both broad approaches have been adopted for the purposes of obtaining data that are as reliable as possible, while also allowing for a richer analysis of the learning process, permitting alternative explanations of what is going on. For example,
in the TOLD project carried out in 2004, the quantitative data showed
that, over a 12-week semester, learning in a CALL context did not help
students in their oral language development significantly more than those
deprived of CALL were helped by similar material taught in a different
context. However, the quantitative data were given a depth of clarity by
the student logs which revealed that while most students who had access
to CALL materials appreciated the value of these materials for a variety
Before justifying such a dual-route approach in the light of the CALL literature on past evaluation studies, a clarification is needed of the nature and significance of CALL-related pedagogy, platforms, and programs within CALL effectiveness research.
Three primary questions arise in the CALL (and CAI) literature when
evaluating the role of pedagogy, platforms, and programs in learning gains.
Firstly, how best may one conceptualize the nature and interrelationship of these three terms? Secondly, from a CALL perspective, do they all need to be involved in our evaluation of CALL effectiveness? Thirdly, is it possible to devise a means of measuring their overall effectiveness that takes into account the separate and the symbiotic role of each in the learning process? Notions such as 'Computer-Based Environments' and 'Intelligent Tutoring Systems' reflect in their names the search for a language that takes into account all three concepts and their interrelationships.
Advances in network design and the Internet have also added to the need
for a descriptive and evaluative language that goes beyond separate assess-
ments of software and pedagogy.
Furthermore, evaluative studies need a grasp not just of hardware
specifications and the authoring and instructional design process, but also
of HCI and psycholinguistic notions, in order to ensure that an integrated
model of evaluation (MFE2) does justice to the experience that the stu-
dents go through when they interact with software and digital platforms,
and provides useful feedback for all the various professionals involved
in the creative process. When Morgenstern stated that too many CALL programs were 'technology-driven' and called for a more 'goal-driven' approach to authoring (1986: 23, cited in Levy 1997), he put his finger on
the fact that many CALL programs may be inadequate for the language
teaching and learning context they are used in, even though they may use
the latest technology. Such software may, for instance, tend to exploit for
its own sake recent technological breakthroughs, such as speech recogni-
tion or multimedia, without properly ensuring the content matches student
abilities, curriculum requirements or language learning theory. Likewise teachers may under-use or misuse expensive resources through ignorance and inadequate staff training. CALL effectiveness research has a vital role
to play at the intersection of the various players involved.
In their 2005 report entitled 'Setting up effective digital language laboratories and multimedia ICT suites', Davies et al. produced for CILT (the UK's National Centre for Languages) and the Association for Language Learning a useful guide to evaluating the range of platforms and programs available. They emphasize that the lessons from the demise of the original analogue language laboratory are being learned, and state that nowadays the language lab 'is no longer seen as the panacea, but rather as one of the many technological aids that the language teacher can choose to use to enhance teaching and learning' (Davies et al. 2005: 4). Their case studies, taken from secondary schools in England, demonstrate that, even without the latest hi-tech networks or technical support, a well-structured, integrative approach to the use of ICT can motivate students and improve results by up to 15 per cent (p. 25).
The guide gives some useful evaluative questions that it says should
be addressed prior to the purchasing of educational courseware. These
questions, some of which are listed below, could well form the entry point
of an evaluative model for the interrelationship between programs and
pedagogy:
teaching and learning, and has an association with students' learning and outcomes (2004: 283). As yet, one could not say there is a unified pedagogy of language teaching, let alone of CALL. It is debatable whether such a unified approach would be possible or even desirable, given the diversity of theoretical standpoints adopted by individuals, institutions and national education bodies, and given the state of flux that exists with regard to the nature and acceptance of some of these theories. A brief look at the history of language learning and CALL pedagogies will illustrate the diversity of approaches and methodologies past and present, underscore useful conceptual and thus evaluative criteria, and help outline the integrated, blended approach to CALL that will inform MFE2 as well as the approach adopted in some of the Case Studies.
Writing in 1996, a few years after the arrival of the CD-ROM, mul-
timedia and the Internet, Warschauer was addressing the new impetus to
CALL brought about by hypermedia and CMC (Computer-Mediated-
Communication). From an evolutionary point of view Warschauer was well
aware of the seemingly limitless possibilities that these new developments
would bring: global, real time and asynchronous audio and visual com-
munication, access in the classroom to vast amounts of different authentic
material, the easier integration of a variety of language skills in a single
activity, the liberty now to focus primarily on content without sacrificing
focus on language form or learning strategies, and finally greater control
for students of their own learning. The potential for quantum leaps in
optimized learning packages, environments, and experiences was there,
and with it the potential for improved student motivation and learning
gains. Even with further advances since then, his central argument is just as relevant today as it was then; citing Garrett (1991: 75), he concludes: 'the use of the computer does not constitute a method. Rather, it is a medium in which a variety of methods, approaches, and pedagogical philosophies may be implemented' (Warschauer 1996: 6).
A linear evolutionary analysis is not the only way CALL analysts have viewed CALL history. Bax (2003), for example, distinguished his own appellation 'Integrated CALL' from Warschauer's 'Integrative CALL' mainly by stating that Integrated CALL had not happened yet, and was still an ideal to be aimed for, whereas the reality behind Warschauer's term either was already happening in the communicative era (e.g. task-based, project-based and content-based approaches where language use aimed to be in authentic social contexts and to combine the various language skills) or else was not happening at all. Whether integrated or integrative, both authors appeared to be calling for a move away from occasional, un-integrated use of CALL towards a more imaginative and holistic approach. Warschauer and Healey (1998, cited in Bax 2003) had pointed out that 'In integrative approaches, students learn to use a variety of technological tools as an ongoing process of language learning and use, rather than visiting the computer lab on a once-a-week basis for isolated exercises (whether the exercises be behaviouristic or communicative)' (Bax 2003: 57-71).
When empiricist theory (the dominant educational theory of the 1950s and 1960s) predominated there appeared to be a perfect match between the qualities of the computer and the requirements of language teaching and learning. With the advent of the communicative approach to language teaching, some writers began to say that CALL methodology was out of step with current ideas (Stevens et al. 1986: xi), that the ideas conflicted (Smith 1988: 5), and that CALL was not adaptable to modern methodologies. (Last 1989: 39)
One cannot deny that the unique capability of the computer to support drill-and-practice (i.e. behaviourist, habit-formation) methodology explains in large measure the continued popularity of behaviourist didactics, and the reintegration of much drill-based software, such as the enduring Fun With Texts or the more recent Hot Potatoes, into the language learning curricula of the current eclectic post-communicative era. Such a rehabilitation is occurring, ironically, at a time when multimedia technology
A more recent development (i.e. since the arrival of the new millennium), and one which reflects the above uncertainty and eclecticism, has been blended learning (BL). The term in its worst guise is used as a catch-all for a thoroughly unreasoned pragmatism, but at its best appears to be a synonym for a multi-modal approach that seeks to bring together best practice from a range of pedagogies, methodologies and media in an optimized package tailored to given situations. Its pragmatic nature has been tagged as 'what works' by one exponent in the business world, whence the term probably derived (Bersin and Associates 2003). One CALL author has defined it as 'the optimal basis for language learning and teaching given the particular conditions at hand' (Neumeier 2005). Oliver and Trigwell have linked BL to variation theory (2005: 17-26) and see it as enhancing learning through the controlled blending of media, modes of experience, and patterns of variation. For the BLINGUA (blended learning) project at the University of Ulster (see Chapter 6) we defined blended learning in CALL as 'the adaptation in a local context of previous CALL and non-CALL pedagogies into an integrated programme of language teaching and learning, drawing on different mixes of media and delivery to produce an optimum mix that addresses the unique needs and demands of that context' (Leakey and Ranchoux 2005: 358).
Coleman and Klapper (2005) state that for many years 'there has been a serious discrepancy between second language acquisition research findings on the way foreign languages (FL) are learned and the way many universities have continued to teach them to students' (p. 31). As a corollary of this, one might reasonably expect the discrepancy to apply to the way that CALL itself is taught. A number of CALL pedagogues have attempted to apply a single theory of language learning to CALL, but Chapelle's (2001) treatise on Computer-Assisted Second Language Acquisition (CASLA) is one of the few also to link theory to CALL evaluation.
In other words, children, adolescents and adults can, and do, create their own tested truth through interaction with other more advanced learners. The concept of 'scaffolded' learning, which ensures consistent and structured support and guidance for the learner, has been elaborated from this theory. Such an approach echoes Higgins' (1983) call for the more facilitative role of the classical 'pedagogue' to be adopted in a constructivist, heuristic use of the computer in education, balancing the more prescriptive, knowledge-dispensing 'magister' of conventional drill-and-practice education. Through its ever-expanding array of online and offline support mechanisms, tutorials, helps, reference materials, plug-ins, tracking and feedback systems, CALL clearly has the potential to provide specialized and differentiated scaffolding for all types of learners.
Elaborations of Vygotsky's ideas have in recent years led to increased emphasis on, amongst other things, collaborative learning, paired/group activities and projects, and peer assessment and feedback, and these are a key feature of language learning as taught in many, if not most, schools and universities. CALL pedagogues, and indeed designers of courseware and of analogue and digital language labs, have also sought to take social constructivist approaches on board to nurture pair/group learning, whether by random pairing in a lab or group writing projects via web-based chat or conferencing.
In its striving to describe and explain language acquisition, SLA
research has, in the last decade, increasingly focused on the point where
learning and the learner meet: the task. Chapelle (2001) charts six principles
or criteria for CALL task appropriateness: language learning potential,
learner fit, meaning focus, authenticity, positive impact and practicality.
CALL effectiveness researchers now regularly refer to this agenda to evalu-
ate CALL software and pedagogies; it will in turn inform the evolution of
the MFE1 and MFE2. Table 2.1 provides a basic definition of each of the
principles. A brief explanation of these principles then follows.
Each of these six criteria has a wealth of related meaning, and has drawn on and also inspired other CALL or SLA researchers. While there is no point in restating all of the background behind all of Chapelle's criteria, a brief foray into one of the six will demonstrate the range of associations behind the term and their usefulness to CALL evaluation. To take Chapelle's second criterion, 'learner fit': it is clear that a vast amount of work has already been done on incorporating the reality of learner experiences, differences and strategies into both language teaching and CALL.
SLA and constructivist theory is as much about understanding the
dynamics at work within and between individual learners as it is about
describing the universal characteristics of language acquisition. Piaget
himself looked back to a much earlier era, once more citing Rousseau
whose advice to teachers, in the preface to Emile, or On Education (1762)
was: begin by studying your pupils, for assuredly you do not know them at
all (Piaget 1967: 139142). The prior learning, experiences and individual
differences that our students bring to CALL must surely also be factored
into the impact equation.
other variables such as age, gender, and prior learning experience. Some
of these, such as gender, age, and aptitude are fixed, or at least cannot be
altered by the individual, others, such as motivation, attitude and learning
strategies, are potentially alterable through individual decision, experience
and/or learning. The research methodology for most of the Case Studies
will include a qualitative survey of learner difference that aims to qualify
and quantify student learning style, prior language learning experience and
beliefs and subjective judgments regarding their language learning.
To this end just one particular list of learning styles has been adopted: Reid's VARK model (1987). This hypothesizes four key learning style variables: visual, auditory/aural, kinaesthetic, and read/write, although originally the 'R' was a 'T' (for tactile learning). This was selected for the following reasons. First, it was originally articulated as a result of analysing the behaviours of language students as opposed to other kinds of learners. Secondly, it is the list used by the University of Ulster (and many other universities and schools) as the basis for its Personal Development Planning online self-diagnosis questionnaire to help students understand how they learn.
Finally, the list of four variables is conveniently brief by comparison with other learner style/personality type lists, such as the four pairings of the Myers-Briggs Type Indicator (MBTI): Extroversion-Introversion, Intuitive-Sensing, Thinking-Feeling, Perceptive-Judging (see Myers, McCaulley, Quenk, and Hammer 1998, cited in Hu 2006: 47), or the four pairs of learning styles in the Felder-Silverman list: Active-Reflective, Sensing-Intuitive, Visual-Verbal, Sequential-Global. Brevity was considered important for the purpose of making it easier to obtain sample sizes large enough for analysis of covariance. The more variables one has, the fewer the number of individuals who fall into any one category and the less generalizable one's data become. The danger with this approach is that what one gains in sample sizes one may lose in the precision of one's learning style descriptors. The VARK list is neat and popular but, of course, leaves plenty of gaps in its description of the way individuals learn.
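For readers who want to see what the analysis of covariance mentioned above might look like in practice, here is a hedged sketch in Python using statsmodels; the file name and the column names (pre, post, group, vark) are assumptions for illustration, not the book's own:

# Hypothetical sketch: ANCOVA on post-test scores, with VARK learning
# style and teaching group as categorical factors and the pre-test
# score as covariate.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("scores.csv")  # assumed columns: pre, post, group, vark

# Model the post-test score on the pre-test covariate plus both factors.
model = smf.ols("post ~ pre + C(group) + C(vark)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # Type II ANCOVA table

# Few students per VARK category means low power for the C(vark) term,
# which is exactly the sample-size concern raised in the text.
print(df["vark"].value_counts())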
A model of evaluation that seeks to measure effectiveness of a peda-
gogy or CALL object must acknowledge that a multitude of dynamics are
at play in the real-life CALL classroom and in the individual learner, and
that it is virtually impossible to control for all these potentially confound-
ing variables. Every researcher must, therefore, couch his/her inferences in
cautious terms and with reference to the caveats that exist.
Platforms
Specifications vary enormously. For example, Artec, CAN-8, Keylink, Melissi and
Sun-Tech labs are purely digital and only need normal network cables, whereas Sanako
(Divace) also requires the room to have special analogue cabling. Activa Solutions' Esprit uses hardware interface boxes for remote control and monitoring. CAN-8
requires that lessons are pre-authored with supplied tools, whilst the others allow
the teacher more flexibility and spontaneity. Prices also vary considerably. (p. 6)
As for the principal VLE that this study will focus on: WebCT, which describes itself as 'the world's leading provider of e-learning systems for educational institutions' (<http://www.webct.com/>), was founded in 1995 by Murray Goldberg of the University of British Columbia. In February 2005 the two companies Blackboard and WebCT merged; the VLE WebCT Vista is now one of its frontline products.
When assessing the role played by such platforms in the learning process, our quantitative and qualitative evaluation will need to assess such issues as 'best fit' within a diversity of pedagogies and didactic approaches, and
Programs
future, then it is clear that a common agenda for evaluating CALL, both now and in the future, is essential so that clear benchmarks are established. On the other hand, a 'one size fits all' approach to such evaluation will need to be both flexible enough to handle the hybrid nature of much CALL activity and the scope of the Three Ps, and rigid enough to ensure repeatability across a range of educational sectors and geographical contexts.
The next chapter looks at what evidence there already is for CALL's
effectiveness, and how this evidence was obtained. Both the existing evi-
dence and the methodologies will be discussed in the context of four key
debates in the field of CALL effectiveness research: the improvement
debate, the comparison debate, the configuration debate, and the outcome
versus processes debate. This will prepare the ground for the subsequent
chapter which assembles a new framework for evaluation drawn from the
lessons of the past.
Chapter 3
Has CALL made a difference: And how can we tell?

Introduction
Other Case Studies showed links between ICT in MFL and improved
motivation and attainment (Blow 2001; TOP, 2001), understanding of
language, confidence, and examination performance (Superhighways Ini-
tiative, 1997).
In the introduction to its summary of research findings, BECTA is careful, however, to point out that the technology itself, in isolation from effective pedagogy, must not be seen as the prime agent of gain: any positive impacts 'depend on the ways in which ICT is used. Improvements in attainment and motivation will inevitably be reliant on the capacity of teachers and students to use ICT as an effective pedagogical tool in the pursuit of particular learning objectives' (BECTA, p. 3).
Overall figures for HE are harder to come by, and one is dependent
on occasional meta-analyses such as the Felix one (2005c) which found
that very few studies aimed to obtain empirical quantitative evidence of
learning gains, and occasional country-wide surveys such as the Multimedia
Language Learning in UK Universities survey (Toner et al. 2007) which
received responses from 56 UK HE establishments. This study, however,
focused on the use of multimedia hardware in HE language teaching and
provides no qualitative or quantitative data regarding the impact of CALL on language learning outcomes. For evidence of the impact of CALL on learning outcomes at HE level, one is largely dependent on small-scale
studies published in the literature, the majority of which point to quali-
tative benefits, but lack empirical evidence of learning gains, and whose
replicability is often questionable.
When discussing the effectiveness or impact of technology, be it platforms, programs or pedagogy, it is vital that the terms and parameters of reference be clearly established so that we are clear as to the full range of forces and variables at play. The community of CALL effectiveness researchers is slowly extricating itself from the blind alley it had moved down in the 1970s and 1980s, namely that of seeing CALL as a 'treatment' applied to the learner, and then attempting to measure the effect of that treatment on learning, without factoring in the role played by a host of other factors such as the tutor, the teaching approach, the environment, and most importantly the internal dynamics of the learner. In large part due to the influence of a more cognitive approach such as that advocated by the SLA community, warning sounds began to emerge in the late 1980s from
Behind this debate lies the question: can comparative evaluations be of any value in demonstrating learning gains? Pederson's critiques of comparative studies related to problems of replicability, attribution of causality, and language learning theory. First, she argued that comparative studies cannot be easily replicated, for the reason that the conditions under which the study took place are hard, if not impossible, to reproduce. She asked: 'if the independent variable is use of the computer versus use of a traditional method, how can the classroom teacher in another setting be assured that his or her use of the computer will be identical to that of the primary study?' (p. 106).
She goes on to state that there is no valid way to ascribe with confidence the causes for differences in the dependent variables to the independent variables. For these reasons, she argues, any results will be difficult, if not impossible, to generalize (Pederson 1988: 106-107).
in the same setting. The key in the repeat test is to alter as few variables as
possible, preferably only one, in order to increase attribution of causality.
Thirdly, and in order to make the study applicable outside the cohort and
institution of study, the key is to obtain as large a sample size as possible so
as to increase its generalizability or external validity. Fourthly, longitudinal
time-series analyses are a way of ensuring students are exposed to the same
or at least similar conditions, and of enlarging the sample size where the
institution has cohorts smaller than thirty students (the minimum needed
for the purposes of assuming normality). Here, a series of observations is
made on the same variable consecutively over time. The observations can be
on identical or similar units. Felix is an advocate of this approach (2005c:
17). The BLINGUA project in the Pedagogy Case Study is an attempt at
a longitudinal approach to a comparative study.
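A minimal sketch of the pooling step in such a longitudinal design (file and column names invented for illustration) might look like this in Python:

# Hypothetical sketch: pooling small semester cohorts into one
# longitudinal sample large enough (n >= 30) to assume normality.
import pandas as pd

# Each CSV holds one cohort's pre-/post-test scores for the same module,
# taught under the same (or similar) conditions in successive years.
files = ["cohort_2004.csv", "cohort_2005.csv", "cohort_2006.csv"]
cohorts = [pd.read_csv(f) for f in files]
pooled = pd.concat(cohorts, keys=["2004", "2005", "2006"], names=["cohort"])

print(len(pooled))  # check the pooled sample reaches thirty students
print(pooled.groupby(level="cohort")[["pre", "post"]].mean())  # drift by year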
Chapelle gives a fifth way in which external validity (replicability and
generalizability) can be achieved. As long as practitioners are fully informed
of the specific nature of the context of instruction, the characteristics of
the students, and the type of CALL activities undertaken in a particular
study then it may be possible to relate such findings to a different context
where such conditions do not pertain in exactly the same way (Chapelle 1991: 49-53).
An excellent aspect of the evaluation was the variety of data collection techniques
used, and this approach is highly recommended for future research. Questionnaire,
journal and test data complemented the information collected during the observa-
tions. These latter, in particular, yielded interesting information that would have
been difficult to obtain through questionnaires. They clearly confirmed the general
enthusiasm for the approach. They also highlighted differences of learning styles and
preferences among students. (Felix 2000b: 61; emphasis added)
Early agenda-setting in this regard came from Salaberry. In his article entitled 'A theoretical foundation for the development of pedagogical tasks in Computer Mediated Communication' (1996), Salaberry calls for greater rigour in the experimental design of CALL studies: 'A drawback of archetypal CALL programs has been the lack of appropriate empirical studies that assess the benefits of such programs' (Salaberry 1996: 10).
Salaberry cites reported deficiencies in the few empirical studies addressing the pedagogical benefits of CAI on learning (e.g. Reeves 1993;
Schmitt 1991, cited in Salaberry 1996). Schmitt, he says, noted small sample
sizes, lack of criteria for what constitutes appropriate software, faulty statis-
tical analysis, and inadequate length of treatment to measure educational
outcomes. Reeves mentioned the lack of theoretical framework, infre-
quency and brevity of experimental treatments, small sample sizes, and
large attrition in the number of participating subjects (Salaberry, p. 9).
Salaberry also lists the lack of use of a control group to measure increased learning as an outcome, and the Hawthorne effect. He cites Reeves' solution to these problems, which is a step-wise and configured approach of multiple research methods: conduct extensive, in-depth studies to observe human behaviour in our field and relate the observations
There exists a clear trend away from the comparison studies carried out during the
1980s that wanted to find out whether teaching with computers was better than
teaching without them. One of the reasons for this is surely the difficulty of car-
rying out valid research of this kind in natural settings. The most obvious reason,
though, is that in an environment where computers have become a natural part of
the educational environment and in which we have learnt that teachers will not be
replaced by them, the question is no longer as interesting. What remains interesting
to investigate is how technologies are impacting learning processes and as a conse-
quence might improve learning outcomes. (2004: 127; 2005c: 16)
investigate the extent to which learners have mastered a specific linguistic point, the
meta-cognitive strategies learners use while working on CALL, or the quality of
the cross-cultural experience they gain through CALL. Accordingly, other research
methods, such as experimental, correlational, introspective, or ethnographic meth-
ods, might also be used. (1997: 28)
She concludes that 'it seems necessary to shift from general approaches such as those of psychology, computational linguistics, and educational technology to the specific questions and methods of researchers who investigate instructed SLA' (1997: 28). Whether this means that CALL (or CASLA)
The search for an optimal platform, program or pedagogy for CALL may
yet be shown to be in vain. The range of different possibilities in terms of
language task, learner need, instructional method and language learning
theory is so vast that it is unlikely that any one product or approach will
ever be proven to be vastly better than the rest. What is needed, however, is comparative studies that are formative in nature, which highlight those approaches and combinations of theory, design, environment, platform, courseware and pedagogy that work best together, and thereby contribute to improved CALL design, an enhanced integration of CALL into language learning curricula, and improved CALL pedagogy.
To develop an evaluative methodology that will effectively measure
the impact of CALL on students' progress, the following four aspects of
course design logic will have to be taken into account at both the analysis
stage and the reporting stage. First, there needs to be an awareness of the
nature of the thinking that lay behind the development (if the materials
were developed in-house) or choice of courseware used. Was the develop-
ment or selection made on the basis of an instructional or design theory
or were pragmatic, context-specific issues paramount? Secondly, clarity is
needed as to the basis on which the teaching context, that is the computer-
based environment and the wider language teaching and learning context,
was constructed. Thirdly, the particular pedagogical approach, if any, that
has been adopted will need to be identified. And fourthly, the degree of
integration of the CALL activities into the wider language learning cur-
riculum needs to be described and explained.
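By way of illustration only, these four aspects could be recorded as a simple structured checklist, completed at the analysis stage and reproduced at the reporting stage; the field names and sample values below are hypothetical, not part of the model itself.

from dataclasses import dataclass

@dataclass
class CourseDesignLogic:
    # The four aspects of course design logic described above
    courseware_rationale: str    # instructional/design theory vs pragmatic, context-specific choice
    teaching_context: str        # the computer-based environment and the wider teaching/learning context
    pedagogical_approach: str    # the particular pedagogical approach adopted, if any
    curriculum_integration: str  # integration of CALL activities into the wider curriculum

report = CourseDesignLogic(
    courseware_rationale="pragmatic: institutional licence already held",
    teaching_context="networked multimedia lab plus weekly seminar",
    pedagogical_approach="task-based, broadly constructivist",
    curriculum_integration="one timetabled CALL hour per week, formatively assessed",
)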
To this day, these four key debates remain pivotal in the world of CALL in general and in the field of CALL effectiveness research in particular. The
answers to them will continue to determine the direction we head in. The
first debate, or question, as to whether CALL improves language learning has met with a guarded yes, as long as the question is framed aright: it is
not so much about whether the computer itself can deliver improved learn-
ing gains as about whether an intelligent integration of good hardware and
courseware and sound pedagogy can do so. Secondly, comparative studies can be of value, again as long as the aim is not to prove the effectiveness of computers or software as such, but rather of an integrated and sound CALL pedagogy, and as long as the reporting of the study clearly states any differences in the conditions under which each element of the study was carried out. Comparisons between various CALL approaches are also valid, indeed essential, and are a central focus of all of the Case Studies in this enquiry. The third debate,
regarding the optimal configuration of data collection methods, highlights
the importance of having a variety of methods, both qualitative and quan-
titative, to ensure that both a rich and an accurate picture is obtained as to
what is going on in the CALL activity under observation. Effectiveness
researchers in CALL/CASLA such as Pederson, Chapelle and Felix have
laid down clear directives for a rigorous approach to construct validity in CALL measurement, and their agendas will contribute much to the qualitative and quantitative measures of both MFE1 and MFE2. The fourth
debate has shown that there is ambivalence as to the respective weight we
should be putting on processes and outcomes. What is needed, as Salab-
erry stated, is a more encompassing framework of analysis to survey and
measure, qualify and quantify what is going on both within the individual
learner and between learners in the learning process, all the while adding
evidential data to the bank of unexplored, under-explored and disputed
areas of CALL learning gains.
CALL evaluation needs a matrix of theory-derived criteria for observing the CALL learning process, that is, the CALL task, activity, and experience. Such a matrix should also have the capacity to conduct a kind of quality control of what is going on in the CALL environment, that is, in the learner and in the learning. Process will be hard, if not impossible, to evaluate quantitatively, and its evaluation may be primarily a matter for judgmental evaluation, which by definition cannot be substantiated by measurement
The lessons learned from the CALL and CAL literature and the Case
Studies yielded a prototype (MFE1) which is outlined and anticipated in
Figure 4.1 and justified theoretically, and in the light of a review of the
relevant literature, in the remainder of this chapter. Those wishing to see
the presentation and explanation of the final complete model (MFE2) assembled in the light of lessons learned while conducting the Case Stud-
ies, as well as a complete set of evaluative checklists, may skip to the final
chapter (Chapter 9). The Case Studies are included to demonstrate how
various aspects of the model for evaluation were applied to the Three Ps
and trialled in real-life educational settings.
There are essentially two routes through the evaluative process, as sug-
gested in the conclusion to the previous chapter: a judgmental appraisal of the twelve CALL enhancement criteria in a given unit of CALL teaching, and the empirical (qualitative and quantitative) evaluation of that unit through the prism of one, two or all three of the Three Ps (platforms, programs, pedagogy). Using the twelve CALL Enhancement Criteria as a starting point for any CALL evaluation should help to clarify the scope and angle of approach to a planned judgmental and/or empirical enquiry, and help inform the direction and progression of future evaluative studies.
The Qualitative and Quantitative Measures route then outlines the pre-
cise methodological steps such studies should at least bear in mind when
designing the research construct for a study, then when gathering data for
and implementing a study, and, finally, when reporting on it.
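As a rough sketch only (the step labels below paraphrase the text and are not the author's own), the two routes might be outlined as follows:

# Route 1: judgmental appraisal of a unit of CALL teaching
JUDGMENTAL_ROUTE = [
    "appraise the unit against the twelve CALL Enhancement Criteria",
]

# Route 2: empirical (qualitative and quantitative) evaluation through
# the prism of one, two or all three of the Three Ps
THREE_PS = ("platforms", "programs", "pedagogy")
EMPIRICAL_ROUTE = [
    "design the research construct for the study",
    "gather data for and implement the study",
    "report on the study",
]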
[Figure 4.1 Evaluation flow-chart; elements shown include learner fit and the teacher factor.]
the software and the pedagogical task it is clear that in her approach we
have the basis for a theory-driven, holistic, and configured approach to
CALL evaluation.
Chapelle's approach addresses theory-driven pedagogy while also being teacher-, courseware designer- and researcher-friendly, in that she accompanies her criteria with clearly stepped questions that relate to each of her evaluation criteria. Where she differs from many CALL evaluators is that her approach is based on a single theory (SLA), whereas others, as we shall see, are either theory-neutral or employ a hybrid mix of theories. While Chapelle's approach, being single-theory based, is less flexible than others, her six principles are generic and flexible enough to operate at a number of different evaluative levels (e.g. evaluating task appropriateness (p. 55) and test usefulness (p. 101)) and in a variety of different contexts, in particular judgmental analysis of CALL appropriateness (p. 59) and empirical evaluation of CALL tasks (p. 68). They are accompanied by
repeated calls for empirical evidence and are followed up by clear guidelines
on ensuring suitable internal and external validity.
Our proposed new Model for Evaluation also prompts evaluators in their own studies to seek out evidence for adherence to criteria (evidence that is, ideally, both objective and measurable) and then to relate each of these criteria and their sub-elements to any, or all, of the Three Ps and to the different phases of a teaching cycle. It also includes space for evaluators both to rate the quality of the evidence they find, using Likert-scale rankings, and to add open-ended comments.
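A hypothetical encoding of one cell of such a matrix might look like the following; the names, phase labels and scale anchors are illustrative assumptions, not the model's own labels.

from dataclasses import dataclass

@dataclass
class CriterionRating:
    criterion: str       # one of the twelve CALL Enhancement Criteria
    applies_to: str      # which of the Three Ps: "platform", "program" or "pedagogy"
    teaching_phase: str  # phase of the teaching cycle under evaluation
    evidence: str        # ideally objective and measurable evidence found
    likert: int          # Likert-scale ranking, e.g. 1 (weak) to 5 (strong adherence)
    comment: str = ""    # open-ended comment box

rating = CriterionRating(
    criterion="error correction and feedback",
    applies_to="program",
    teaching_phase="practice",
    evidence="tracking logs show individualized, implicit feedback",
    likert=4,
    comment="feedback implicit rather than explicit",
)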
Having a generic methodology that operates at a number of different
levels is both an advantage and a disadvantage. It is an advantage in that it
is a reasonably simple and memorable model that enables quite a holistic
approach to evaluation; the disadvantage of this is that it is not always
adaptable in its entirety to the exigencies of a given context. A model with
a clearly-defined, though narrow, focus will occasionally be inadequate to
address the theoretical requirements and detail of a more complex, multi-
modal situation, and in such instances might require modification. For
example, at one point Chapelle, in applying her task-focused model to a different context, has to deviate from her six criteria. When her context shifts from CALL task appropriateness to CALL test usefulness, she makes use of just three of her six principles (i.e. authenticity, positive impact
and practicality), and replaces the remainder with the new principles of
reliability, construct validity, and interactiveness (see Table 4.1). Our
larger model for evaluation will indeed incorporate these notions but they
are categorized and distributed differently within a full, and therefore,
more flexible evaluative framework. Reliability is seen on the one hand as a CALL Enhancement Criterion and, as such, is subsumed into the principle of practicality, and on the other hand is seen as a data validity criterion and, along with construct validity, features in the Data Collection Measures section of our final model. Interactiveness, however, is deemed to be too multi-faceted a notion to come under one discrete heading. Given the increase of interactivity across the full range of computer-assisted learning, the notion has been distributed across at least four of the six additional criteria: it is relevant to learner control, collaborative CALL, error correction and feedback, and tuition delivery modes in the final list of twelve criteria.
Table 4.1 Chapelle's criteria for evaluating the qualities of test usefulness. Source: Chapelle, C. (2001: 101), Computer Applications in Second Language Acquisition, Cambridge University Press; reproduced with permission.
As for the decision to change Chapelle's terminology from criteria for evaluation of 'CALL task appropriateness' to criteria/principles for evaluation of 'CALL enhancement', this was guided by two principal considerations. First, Chapelle's six criteria are restricted to judging the CALL task, and therefore exclude the judging of platforms, as well as some other features of the CALL experience such as error correction and feedback. Secondly, the term 'appropriateness', while being an excellent term to describe the suitability and the fit of CALL provision, did not extend to the idea of value added and the difference that CALL makes (or does not make) to student learning, which was deemed to be an essential aspect of effectiveness research.
To adapt Chapelle's criteria further to the Three Ps, two adjustments were made to two of her own definitions (see Table 4.2), before adding the six new criteria. Her definition of authenticity referred to the degree of correspondence between the learning activity and target language activities of interest to learners 'outside of the classroom' (2001: 55, emphasis added); this reference to the classroom was enlarged to include also the CALL environment, as the word 'classroom' excludes the notions of the dedicated VLE and the multimedia laboratory, which increasing numbers of institutions have adopted to replace the dated analogue language laboratory. Secondly, her definition of practicality was given as the adequacy of the resources to support the use of the CALL activity. This was lengthened to include the cost-effectiveness of such resources, given the importance of budgetary considerations in most institutions and the need to choose the less expensive way if the learning gains delivered by two differently priced resources prove to be similar (cf. Clark 1994: 22, cited in Allum 2002: 147).
Table 4.2 Chapelle's six criteria for evaluation of CALL task appropriateness. Source: Chapelle, C. (2001: 55), Computer Applications in Second Language Acquisition, Cambridge University Press; adapted with permission (adaptations indicated in italics).
The six new CALL enhancement criteria (see Table 4.3), additional to Chapelle's six, were arrived at over the course of a five-year project and drew from a study of eight different authors from e-learning (Mehanna 2004), CALL (Chapelle 2001, Ingraham and Emery 1991, Hubbard 1988, Dunkel 1991, and Pederson 1988) and SLA (Ellis 1994), from a UK-wide survey of multimedia laboratory use at HE level (Toner et al. 2007) and from the design criteria of a manufacturer of digital labs for language teaching (Melissi 2007). To obtain these additional criteria, Chapelle's own criteria were mapped against the varied evaluative concepts and hardware/courseware design features arising from these sources. The wording of the definitions for each new criterion was new, but informed by the relevant literature and fine-tuned through the experience gained in conducting the Case Studies.
Table 4.3 Additional six principles for evaluating CALL enhancement (Leakey).
From the tally chart one can make a number of points relating to the
relevance of the sources to the generation of a new model for evaluation.
The distribution of the top two scores in each row (shown in italics) reveals
a good spread across the sources and strong justification for each criterion.
Clearly, both the Chapelle criteria and new criteria have a good spread of
representation across the sources; this is shown by the total scores for each
row (lowest is ten, and highest thirty-seven). Also, the Chapelle criteria
resonate well with most of the new criteria, and the new criteria resonate
well with most of the Chapelle criteria; the main exception to this being
the two platform-related columns where Chapelles criteria fare less well,
for reasons already mentioned. One can infer also that all the sources have a
strong resonance with constructivist and SLA concepts. This is also backed
up by the strong showing of Ellis (the principal SLA author of the sources) against most of the criteria, and is indicative of the fact that most teaching of modern and foreign languages is nowadays heavily, though not exclusively, influenced by constructivist ideology and practice.
What follows are the mapping exercises for each of the eight sources
above accompanied by a commentary explaining how each mapping exer-
cise was used to generate, define and justify the six new criteria. These new
criteria have been listed in the far right-hand column of each mapping table
and, when considered with the mapping exercises from the other evaluative
agendas from the literature, they have been deemed significant enough to
include in the final list of evaluative principles.
Chapelle's criteria (row totals across the eight sources): language learning potential 19; learner fit 32; meaning focus 21; authenticity 10; positive impact 21; practicality 37.
Leakey's criteria (row totals across the eight sources): language skills and combinations of skills 10; learner control 18; error correction and feedback 13; collaborative CALL 13; teacher factor 14; tuition delivery modes 18.
Grand total across all twelve criteria: 226.
Table 4.4 Tally chart of exercises mapping the twelve CALL Enhancement Criteria, mapped against key authors from the literature and CALL practice (Mehanna, Toner et al., Hubbard, Pederson, Dunkel, Melissi, Ingraham and Emery, and Ellis).
Survey questions (Platforms), with proposed principles of evaluation supplementary to Chapelle where identified:
- Increasing teacher interaction with students (Interaction & Collaborative CALL)
- Increasing interaction among students (Interaction & Collaborative CALL)
- Introducing audio/video and other media to classes
- Provision/storage of media files
- Encouraging student engagement
- Increase in tutor workload
- Technical problems impact upon effectiveness of class
Table 4.5 Mapping the University of Ulster/LLAS (Toner et al. 2007) survey questions for digital platform evaluation against Chapelle's six principles for CALL evaluation.
Mapped against Chapelle's six criteria for CALL task evaluation (see Table 4.5), the questions used by Toner et al. (2007) in their survey of digital platform use in the UK Higher Education sector, already discussed in Chapter 2, did not specifically cover issues relating to language learning potential and meaning focus, and were lacking also in the areas of learner fit and authenticity. This was mainly due to the fact that the survey's principal focus was on the functionality of the digital platform and less on pedagogical content or method. There was, therefore, reasonable coverage of positive impact and strong coverage of issues of practicality. On the other hand the survey did consider two new areas that did not fit easily into the Chapelle list: the provision of learner control (encouraging autonomous learning) and the promotion of interactivity and a group dynamic, which we have classified under collaborative CALL.
Melissi digital classroom criteria (2005) (Platforms), with proposed principles of evaluation supplementary to Chapelle where identified:
- Teachers are able to produce lessons on the fly without long, complicated advance preparation (Teacher style factor; Tuition delivery modes)
- They can also produce complete activities in advance that can include audio, video, pictures, text and instructions (Tuition delivery modes; 4+ skills)
- Students log in and are allocated file storage space on the teacher's computer or server (Learner control)
- Students can work on documents, using a word processor, while listening to or watching material either sent from the teacher or off the web (Learner control)
- Teacher can speak individually or collectively to the students through their headphones (Tuition delivery modes; Error correction and feedback; Collaborative CALL)
- Students can call and speak to the teacher (Learner control; Collaborative CALL)
Table 4.6 Mapping the Melissi Digital Classroom performance indicators against Chapelle's six principles for CALL evaluation (see also <http://www.Melissi.co.uk.htm> [accessed 13 April 2005]).
some challenging games and activities that are, often, beyond all but the
most competent students, or simply inauthentic in their content.
Ingraham and Emery's sub-topics (televisual environment, windows environment, screen design, hypermedia and linearity, autonomy versus control, autonomy and self-tuition) will prove important in qualitative assessments of student and staff reactions to these elements. Performance by CALL programs against these criteria will be seen to play an important role in motivating or de-motivating users. An evaluative model needs to construct research activities to test the premise that software that matches student levels of competence to the levels and lesson structure within the software package leads to greater and quicker learning gains (quantitative measures) and greater student satisfaction and motivation (qualitative measures) than a package that does not do this. These issues will feature significantly in Chapter 7 when Ingraham and Emery's criteria are used to evaluate the TellMeMore software program.
Ingraham and Emery's final set of criteria under the heading practical considerations (including authenticity, active and passive learning, interaction and response) anticipates at least two of Chapelle's six principles. The BLINGUA project at the University of Ulster (see the Case Study in Chapter 8) applied such criteria as practicality and workability to its evaluation of a blended learning project for CALL in the context of undergraduate language learning.
When mapped against Chapelle's six principles for CALL evaluation (see Table 4.7), Ingraham and Emery's agenda for CALL courseware design (1991) has no coverage of language learning potential, meaning focus, or positive impact, and little coverage of authenticity. Their agenda does throw up, however, other supplementary headings already met in the Melissi mapping: the provision of learner control and the promotion of collaborative CALL. As with the other mapped authors, these supplementary principles of evaluation will feature in MFE2 to ensure a fuller and more comprehensive analysis.
Ingraham & Emery (1991), in Levy (1997) (Programs):
- Methodological issues: overall objectives and structure; language learning methods; the televisual environment; levels of competence; CAL methodology; course structure; lesson structure
- Interface issues: the windows environment; screen design; hypermedia and linearity; authenticity
Table 4.7 Mapping of Ingraham and Emery's (1991) evaluative headings for CALL courseware design against Chapelle's (2001) evaluative agenda for CALL tasks.
Hubbard (1988) criteria, with proposed principles of evaluation supplementary to Chapelle where identified:
- Provides meaningful communicative interaction between student and computer
- Provides comprehensible input at a level just beyond that currently acquired by the learner
- Promotes a positive self-image in the learner
- Motivates the learner to use it
- … acquire the language
- Provides a challenge but does not produce frustration or anxiety
- Does not include overt error correction (Error correction and feedback)
- Allows the learner the opportunity to produce comprehensible output (Provides learner control)
- Acts effectively as a catalyst to promote learner-learner interaction in the target language (Collaborative CALL)
When it comes to the third of the Three Ps in our list, pedagogy, the complexity level increases, as the human element (learner and teacher) is now the central focus. Is it possible, one needs to ask at the outset, using well-targeted and narrowly-focused research designs, to get nearer to accounting empirically for what is actually taking place in the learner? Ultimately, it
is from what is learned by the students that any measurable impact data
can be derived. The challenge here is initially about identifying, clarify-
ing, and then measuring the impact made by key variables involved in the
learning process.
elements. Evaluation in any one of the circled elements in the figure already provides the scope for a separate discipline in itself. For example, CALL-based analysis of errors subsumes, amongst other things, the domains of tracking software, computerized error and needs analyses, explicit and implicit, formative and summative feedback, diagnostic tests, computer-adaptive testing (CAT) and online assessment.
As for the language learner, we now have a dazzling array of online devices for learners to self-diagnose their learning style. For the diagnostic survey of learning style the VARK list, as mentioned in Chapter 2, was used in the Case Studies for this project, but there are many others. For one of the pre-tests a computer-adaptive test was used that responded and adapted to the ongoing correctness, or lack of it, of students' answers to direct the difficulty level of remaining questions, and then make recommendations
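The adaptive mechanism just described can be sketched minimally as follows; the stepping rule, level bounds and function names are illustrative assumptions rather than the behaviour of the actual test used.

def next_level(level: int, correct: bool, n_levels: int = 5) -> int:
    # Raise difficulty after a correct answer, lower it after an error
    return max(1, min(n_levels, level + (1 if correct else -1)))

def run_adaptive_pretest(questions_by_level, answer_fn, n_items=10, start=3):
    # Administer n_items questions, adapting the difficulty level after
    # each response; the history can then ground recommendations
    level, history = start, []
    for _ in range(n_items):
        question = questions_by_level[level].pop()
        correct = answer_fn(question)
        history.append((level, correct))
        level = next_level(level, correct)
    return history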
Ellis (1994) framework (Pedagogy):
- Area 1, Description (focus on learner-language): characteristics of learner-language; errors; acquisition orders and developmental sequences; variability; pragmatic features
- Area 2, Explanation (focus on learning): learner-external factors; social context; L1 transfer; learning processes; communication strategies; knowledge of linguistic universals
- Focus on the learner: general factors (e.g. motivation); learner strategies
Table 4.10 Mapping of Ellis's framework for investigating SLA to Chapelle's principles for CALL evaluation.
Dunkel (1991) provides a useful, early review of the strengths and weaknesses of key effectiveness research studies. Her main interest is in the narrative and meta-analytic research base. What Dunkel brings to the CALL effectiveness
research table is an ability to ask pertinent questions about the impacts of
CALL, a highlighting of the strengths and weaknesses of various research
designs, and useful recommendations for improvement in the rigour of evalu-
ative studies. The mapping exercise relating to Dunkel draws on her overall
in-depth analysis and recommendations for future effectiveness research, and
not from any tabulated framework; so, a brief review of her recommenda-
tions is needed to establish the grounds for her evaluative criteria.
The first CAI meta-analysis she looks at is that carried out by Roblyer, Castine and King (eds) (1988, cited in Dunkel 1991: 535), which she calls a review of the syntheses (Roblyer et al. looked at 26 of these prior to 1980), but which, in addition, includes an analysis of 38 research reports and 44 doctoral dissertations completed between 1980 and 1988. She highlights the editors' conclusion that, while specific measures exist for evaluating educational achievement (such as student achievement, attitudes, drop-out rate, learning time), after 25 years of use of computers in instruction the impact of computer applications on these measures remains largely an unknown quantity (Roblyer et al. 1988: 12). Their review throws up key questions that remained at that time to be answered unambiguously, and which, for a large part, are still matters of dispute, such as:
Can computer applications help improve student performance in basic skills and
other key areas? For what specific skill areas, grade levels, and content areas are
computer applications most effective? Which kinds and levels of students seem to
profit most from using computers to learn? Which kinds of computer applications
are most effective for which skill and content areas? Can computer applications improve students' attitudes towards school, learning, and their abilities to learn? Will
improved attitudes translate into better performance in school and lower drop-out
rates? (Roblyer et al., p. 12)
Most, if not all, of these questions are still relevant today and some have
informed the evolution of MFE1 and MFE2. Dunkel bases her summary of research findings on seven 'strong and consistent trends in findings' relating to the following aspects of instruction:
(1) The amount of learning time; (2) student attitudes towards the computer and
the subject matter; (3) the effect of computer use in specific content areas; (4) the
types of CAI (tutorials, drill and practice, and simulations); (5) the computer environments (CAI, CMI, and CEI); (6) the uses of CAI (i.e., as a supplement to,
versus a replacement for, traditional methods); and (7) the levels of student ability.
(Dunkel 1991: 11)
These trends are drawn from a mix of quantitative and qualitative studies.
Their findings can, when appropriately marshalled, inform the design of programs themselves, the pedagogy behind the delivery of these programs
and also the approach to the evaluation of the effectiveness of these pro-
grams. What follows is a brief summary of her key findings in these areas
as they relate to effectiveness research and this enquiry.
The first criterion Dunkel looks at relates to the timesaving benefits of
CAL/CALL. While arguing that research interest in the timesaving ben-
efits of computers had lessened (in 1991) in favour of cost-effectiveness and
courseware design issues, her reference to the possible remedial benefit of
CALL materials is a valid point which could have been enlarged upon and
which more recent research has suggested may be a factor for less able lan-
guage students in an HE setting (Leakey and Ranchoux 2005: 47). Further-
more, timesaving and cost-effectiveness issues may be of use in commerce as well as in secondary and higher education, for assessing students' language proficiency for diagnostic or achievement purposes, training tutorials and drill-and-practice activities (p. 12). The second Case Study (Chapter 7)
will feature a software package where all three of the above factors relat-
ing to timesaving appear to have featured as drivers behind the design. In
1991 this timesaving benefit was seen as also of use in its ability to free up
the teacher so that he/she could concentrate on devising communication-
engendering activities for the learner, which in the 1980s computer tech-
nology was very limited in its ability to deliver. It is increasingly possible
for the computer now to even assist with the communication-engendering
activities, for example, via video-conferencing and telephony applications
(such as MSN and Skype), not to mention text-based chat.
Our CALL effectiveness assessment should include, I suggest, a gauge
of the timesaving factor and its role in acceleration of student learning,
accelerated feedback in diagnostic and formative testing, and reducing
the workload of teachers in the areas of preparation of materials, class and
individual contact time and the marking of tests.
The second strong and consistent trend in findings that Dunkel deals
with relates to student attitudes toward the computer and the subject
matter. Dunkel points to the Florida Department of Education report
(1980) and a series of studies by Kulik and colleagues which both suggest
that students hold positive attitudes towards using computers. This is still
generally not in doubt, though teachers are finding, and the focus groups linked to the Case Studies for this thesis have confirmed, that even a quarter of a century on many students are wary of them, and even the more IT-literate student may react negatively to the unnecessary use or over-use of computers for teaching purposes.
Even more interesting to the effectiveness debate is Dunkel's inference from Kulik and Kulik's finding that computers do not seem to have much impact on students' motivation to learn the subject matter, even though students may report that they like to use computers (1986: 13). This phenomenon echoes Thorndike and Hagen's 'halo error' (1977, cited in Chapelle and Jamieson 1991: 45), whereby students' reporting of an experience may not accurately reflect their actual experience of it. Effectiveness researchers need to be cautious not to design student attitude surveys wrongly, nor to misinterpret the findings.
An effective model for evaluation will, therefore, need to provide a
qualitative indicator of student and staff reaction to the use of computers.
However, it will need to factor in the halo error and be able to distinguish between attitudes to the computer and the effect of a computer-based environment and learning programme on students' attitudes to learning the subject matter.
Dunkel's third strong trend relates to the effect of computer use in specific content areas. Dunkel's summary of previous findings places languages in the top three subject areas benefiting from CAI, alongside mathematics and science (Fisher 1983: 13, and Roblyer et al. 1988). As for the language learning skills that benefit most from computers, she points to the Roblyer et al. (1988) study, which stated that 'computer applications seem most effective in the area of word analysis skills, such as phonics, followed by higher level reading and language skills' (Roblyer et al. 1988: 92, cited in Dunkel 1991), and argues that while their own conclusions were based on just four studies, these nevertheless replicated previous findings. A more recent meta-analysis (Felix 2005c), which will be discussed in greater depth later, suggests that little has changed in this regard. Our final model will
need to isolate the impacts of CALL on both individual language skills
and combined skills activities.
Dunkel's fourth area is what she calls the types of CAI (e.g. tutorials,
drill and practice, and simulations). Dunkel here draws together conclu-
sions from a number of different authors, whose findings have stood the
test of time (Burns and Bozeman 1981, Roblyer and King 1983, Samson,
Niemiec, Weinstein, and Walberg 1985, cited in Dunkel 1991: 14), namely
that: drill works better with lower level skills found at lower grade levels
while tutorials are required for higher level skills (Roblyer et al. 1988:
35). Also cited was the finding of Willis, Johnson and Dixon (1983) that
computer games and simulations were more attractive and interesting to
students than any other form of computer-based instruction. The latter also
pointed out the cost-efficiency of instructional simulations in that they
could bring the real world into the classroom, thus obviating expenditure
on trips abroad, and saw the cognitive benefits of simulations in nurturing
divergent thinking. Since then the most successful and abidingly popular CALL products and activities have been heuristic packages such as the tourist simulation game Granville (1980s), the murder mystery Who is Oscar Lake? (1996), or MOOs involving variants of simulation games such as Dungeons and Dragons.
Courseware design and evaluation of courseware design can and
should still be informed by these findings. The fundamental dynamics
of the information gap, treasure hunt and the need to communicate to
discover stand at the heart of the best simulations and instructional course-
ware. Interactive web-enhanced instructional resources involving chat, file-
exchange, conferencing and peer feedback are more recent developments
in this genre. Felix gives examples of several fee-paying stand-alone courses
that are password protected, offering free trial materials open to anyone.
They range from one-person operations like Cyberitalian and Interdeutsch
to large organizations like GlobalEnglish that employ considerable staff
and offer a 24-hour attended chat site and other extensive services (Felix
2000a). This thesis reports on extensive trialling of another similar product,
TellMeMore, that started off as a networked or stand-alone CD-ROM
and has evolved into a sophisticated online tutoring resource incorporat-
ing simulated dialogues using speech recognition, sophisticated tracking,
second question, findings are reported on in the Case Study chapters show-
ing how different levels of proficiency perform in a CALL environment.
Correlations were also looked for between learning style and learning gains,
and between experience with computers and learning gains.
As for Dunkel's third and fourth questions, issues of feedback and
learner control will form an important element in particular in the Case
Studies looking at Platforms (Chapter 4) and the software program
TellMeMore (Chapter 5). Qualitative feedback gleaned from participat-
ing staff and students and their comments in evaluative questionnaires
and focus groups will be a significant part of the findings. The key factors behind each of Dunkel's four questions will also feature in our final evaluative model (MFE2). Their value was underlined by our mapping against
Chapelle of the key Dunkel criteria gleaned from the above analyses and
meta-analyses.
When mapped against Chapelle (Table 4.11), Dunkel's research agenda for CALL, while lacking in the areas of authenticity and practicality, otherwise overlaps reasonably well. Dunkel's agenda throws up supplementary headings similar to those shown by the mappings of the previous chapter (Toner et al.; Mehanna) and those that follow below. These are language skills and combinations of skills, error correction and feedback, and learner control.
Table 4.11 (extract) Mapping of Dunkel's criteria against Chapelle's principles, with proposed principles of evaluation supplementary to Chapelle:
- … workload
- Student attitudes towards the computer and the subject matter; halo & Hawthorne effects; CHILL factor*
In her synthesis of CALL effectiveness research prior to the late 1980s (1988: 120–121) Pederson draws on the insights of what she calls perhaps the most ambitious CALL experimental endeavour to date, that of Robinson et al. (1985), for an evaluation of six pedagogical and four answer-judging (i.e. feedback) hypotheses which, when tested over a nine-day period, albeit in a junior U.S. high school Spanish class and not at HE level, revealed significant out-performance by the experimental group (students who practised with CALL under the ten-point criteria listed below). Such a strong finding, in
with CALL the ten point criteria listed below). Such a strong finding, in
a field where strongly significant findings are very much the exception,
warrants closer scrutiny both for its findings and its research design. This
approach is an example of good experimental design practice in its atom-
istic rather than general approach, as Pederson states: the purpose of the
research was not to prove the effectiveness ofCALL in general, but to pro-
vide evidence ofhow the manipulation of certain CALL coding elements
may be particularly well suited to encouraging meaningful, communica-
tive, and maximally facilitative CALL (p. 120). The design used a classic
pre-test/post-test design that included also two tests of prior knowledge
to establish a benchmark or starting point for comparison, and thereby
allowed a clear isolation of learning gains to be made. Finally, rather than
being technology-driven the design insisted on a pedagogical rather than
a technological rationale for generating research questions and selecting
variables (p. 120). Both the control and the treatment groups were given identical materials as well as the same pre- and post-test in order to isolate the one variable of CALL. The control group, however, practised under 'the opposite conditions' (p. 121), though it is not stipulated whether
these were non-CALL or alternative CALL conditions. The conclusions
from the study are a positive reinforcement for post-behavioural CALL
methodology, in that they showed that meaningful and discovery-oriented
CALL leads to more learning than CALL that is less communicative and
more directive (Pederson 1988: 121).
The six pedagogical hypotheses in the Robinson et al. study, echoing Mehanna's clusters in many ways, predicted improved achievement as a result of the following types of materials presentation: integrated
When mapped against Chapelle's six principles (see Table 4.12), Pederson's agenda, drawing primarily on Robinson et al.'s criteria (i.e. those pedagogical hypotheses that predicted improved achievement (1988: 120)), scores most strongly on issues of language learning potential and learner fit, provides some coverage of meaning focus (cf. meaningful practice of structural items) and positive impact (cf. use of humour), but makes no overt reference to authenticity and practicality. Similar to those shown by the mapping of Dunkel's agenda and the mappings of the previous chapter (Toner et al. and Clarke), several of Pederson's criteria map well with our new criteria of error correction and feedback and learner control. And the hitherto unmentioned factor of teacher style has been added to our list. Some of the other mapped agendas and methodologies below also highlight these and other extra factors that may influence the quality of the teaching and learning. MFE1 and MFE2 will incorporate these additional factors.
Table 4.12 (extract) Pederson's criteria (drawing on Robinson et al.), mapped against Chapelle's six principles, with proposed principles of evaluation supplementary to Chapelle where identified:
- Meaningful practice of structural items
- Reference to people that students knew
- … learner personally (drawing inferences or problem solving)
- Provide with implicit rather than explicit correction (Error correction and feedback)
Mapping extract (Pedagogy), with proposed principles of evaluation supplementary to Chapelle where identified:
- … engagement in cognitive processes
- … brainstorming, etc.
- The self-system processing of presenting tasks (Error correction and feedback)
- The use of task-related knowledge
- The cognitive processing of tasks
- The meta-cognitive processing of tasks (Provides learner control; 4+ skills combinations)
priorities that relate to data collection will be included in the mapping table below (Table 5.1). The concluding remarks to her chapter in Smith (1988: 126–127) serve as a clarion call for our enquiry: 'an increased interest in disciplined, dispassionate research that attempts patiently and carefully to add to what is already known about how students learn languages is the best assurance that CALL, unlike the language lab of the 1960s, will be used intelligently.'
Likewise, Chapelle's calls for strong internal and external validity were documented in Chapter 3. Her guidelines for strong internal and external validity will form the basis of both Felix's guidelines below and the data collection validity checklist (see Tables 5.1 and 5.4). To reinforce her message one could add her comment in her concluding remarks in her chapter in Dunkel (1991): 'Because perfect worlds in which to carry out research do not exist and because the environment of each research study has unique elements that constrain the validity of the investigation, it is the responsibility of the researcher to identify and pinpoint threats to a study's [internal] validity' (p. 54).
Felix has been interested in good practice in effectiveness research since her doctoral thesis on Suggestopedia (completed in 1989), and in CALL evaluation since her 1993 article 'Marking: a pain in the neck – the computer to the rescue'. In 2000 she was advising caution regarding unreasonable claims and contradictory findings concerning the effectiveness of CALL, and trying to raise awareness as to the complex nature of the variables involved:
research into the efficacy of computer-assisted learning has produced very equivocal results (Dunkel 1991), and it is easy to list problems attached to such research (Chapelle 1997). Judgments in the area vary widely. At one end are positive reports from the authors of several large meta-analyses, as exemplified in 'the computer did its job quickly on average in about two-thirds the time required by conventional teaching methods' (Kulik et al. 1980: 538), and 'the newer technologies show promise to be able to provide feedback in multiple modes, such as listening and reading' (Basena and Jamieson 1996: 19), although they did caution that the results 'are difficult to interpret, and the designs and measures do not lend themselves to reproduction or generalizability' (p. 19). At the other end are dismissive (and in the quoted case unsubstantiated) comments such as: 'Study after study seems to confirm that computer-based instruction reduces performance levels and that habitual Internet use induces depression' (Noble 1998: 2). Given this variation, it is fairly clear
that general conclusions about the effectiveness of CALL cannot be formulated
without qualification nor relied upon uncritically. What is more, the problem is
going to intensify: as programs become more sophisticated, variables to be investi-
gated become more wide-ranging and conclusions on meta-analyses more difficult.
(Felix 2000b: 50)
Since 2000, ongoing syntheses of CALL research by Felix (2004 and 2005a) and Hubbard (2004 and 2005) reveal that these lessons are still not being learned. Hubbard's review of over ninety research articles found that a high percentage of CALL research involves research subjects, whether students or teachers, who are novices to CALL; they are also novices to the task or application under study, and are often studied exclusively during their initial experience. Additionally, the studies may be very short, representing a
single event, such as a class or lab session. Also, surveys and questionnaires
are used in place of more objective measures, such as tracking or testing
(Hubbard 2004: 165; 2005: 358). Hubbard also adds: 'with better studies of trained and experienced learners, we may find CALL is more effective than is currently believed' (Hubbard 2004: 165 (online)). This comment
reflects an awareness of the unconvincing findings of much CALL effec-
tiveness research and a confidence that improved outcomes are more likely
with improved research designs and students that are more familiar with
CALL. Familiarity suggests long-term exposure and longitudinal studies
to monitor this. These would help to eliminate skewing factors such as the halo and Hawthorne effects, and poor learning outcomes due to teachers and students wasting learning time coming to grips with new technology.
In her article 'Analysing recent CALL effectiveness research: towards a common agenda' (2005a), based on a meta-analysis of more recent CALL research (i.e. between 2000 and 2004), Felix also points out the frequent
shortcomings of research constructs, listing common problems that still
occur with effectiveness research: misleading titles, poor description of
the research design, failure to investigate previous research, poor choice
of variables to be investigated, and overambitious reporting of results
(2005a: 10).
Several of these points featured earlier in Felix's (2000b) article and were, therefore, available to inform the method used in the Case Studies. They will also feature in MFE2 in Chapter 9. In concluding this section, it is interesting to note some of the comments Felix made in her last conference
The Case Studies mapped against the agendas of Pederson and Felix
Looking, then, initially at evaluative methodology, one can see from the table below that the four case studies managed to observe the majority of measures recommended as best practice by esteemed CALL effectiveness researchers. Leakey's and Pederson's quantitative and qualitative measures overlapped in 88% of cases, and Leakey's and Felix's (2005a) overlapped in 78% of cases. These percentages were arrived at by counting all those boxes where there was either a Yes entered or else compliance was observed (21/24 boxes for Pederson and 28/36 boxes for Felix). The entries marked with a No or an Uncertain were deducted from the total
Pederson (1988):
- Tests of prior knowledge used as benchmark and isolation of learning gains: Yes / Yes / Yes / Yes
- Pedagogy rather than technology-driven in generation of research questions and selection of variables: Yes / Yes / Yes / Yes
- Identical materials for pre- and post-test: CAT drawn from same bank of questions / Yes / Yes / Yes
- Identical materials for treatment and comparison groups: No / Yes / No / No
- Main control variable for treatment and control groups was opposite conditions (i.e. CALL v non-CALL): Yes / Yes / Yes / Yes
Felix (2004 & 2005):
- Extraneous or confounding variables must be controlled for: Yes / Yes / Yes / Yes
- Subjects should be randomly selected: No (course groups) / Yes (volunteers) / No (whole cohort) / No (volunteer groups)
- Instruments of measurement for learning outcomes and attitudes need to be reliable: Yes (CAT based on graded database of questions) / Yes (5 different tests) / LO test not a measure of meta-skills as such / Uncertain
Table 5.1 Checklist to enable the mapping of quantitative and qualitative measures from Pederson and Felix to a Case Study or Research Project.
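The compliance percentages quoted above follow directly from the box counts; a minimal sketch of the calculation, assuming each checklist box is reduced to compliant or not:

def compliance(boxes):
    # Boxes with a 'Yes' or observed compliance count as True;
    # 'No' and 'Uncertain' entries count as False against the total
    return 100 * sum(boxes) / len(boxes)

print(round(compliance([True] * 21 + [False] * 3)))   # Pederson: 21/24 boxes -> 88
print(round(compliance([True] * 28 + [False] * 8)))   # Felix: 28/36 boxes -> 78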
The evaluation diamond for CALL effectiveness research (see Figure 5.1) gives a graphic overview of the options to consider in the process of designing an evaluative study of impact or student learning gains. This suggests a
Tables 5.2, 5.3 and 5.4 provide a checklist of good practice for effectiveness research, applicable when conducting one's own empirical studies or evaluating those carried out by others.
Table 5.3 shows the proto-typical (MFE1) version of the checklist for
data collection methods (both qualitative and quantitative) used in the
Program and Pedagogy Case Studies.
From the experience of the Case Studies this has evolved into a longer, more comprehensive version. This is shown in Chapter 9 (Table 9.17) and contains twenty-one as opposed to the eleven data collection methods of Table 5.3, and includes diagnosis of staff reactions as well as students'. Nevertheless, researchers must be mindful of Murray's warnings about the potentially intrusive nature of multiple-method data-gathering, and so it must be stated that use of all the given intervention points in one study is not recommended. A study, and in particular students' enthusiasm, can be spoiled by excessive monitoring.
… people?
Are the Activities across the groups identical, near-identical or different?
Is there a Treatment group and a Control or Comparison group?
Are the Pre- and Post-tests identical, near-identical or different?
What Language(s) are being studied?
What Language Skill or Combination of language skills is under analysis?
What Variable(s) are being analysed?
Is the Allocation of Subjects to groups random or selective?
If Random allocation, how was this achieved?
If Selective, what criteria and methods were used to select subjects?
What methods for Controlling for and isolating variables were adopted?
Is the Scoring carried out by an independent scorer?
… Parametric or non-parametric?
What instrument(s) were used to measure Correlation? Parametric or non-parametric?
What instrument(s) were used to measure Variance? Parametric or non-parametric?
What instrument(s) were used to measure Covariance? Parametric or non-parametric?
Was an Effect Size equivalent given where relevant?
What degree of Confidence was established at the outset? (99% or 95%)
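As an illustration of how such instrument choices might be operationalized (the parametric/non-parametric pairings below follow common statistical convention using SciPy; they are not prescribed by the checklist itself):

import numpy as np
from scipy import stats

# Hypothetical lookup pairing checklist questions with instruments
INSTRUMENTS = {
    "difference":  {"parametric": stats.ttest_ind, "non-parametric": stats.mannwhitneyu},
    "correlation": {"parametric": stats.pearsonr,  "non-parametric": stats.spearmanr},
    "variance":    {"parametric": stats.f_oneway,  "non-parametric": stats.kruskal},
}

def cohens_d(a, b):
    # Effect-size equivalent for a comparison of two means (Cohen's d)
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd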
Leakey data collection methods (element present? Yes/No):
- Pre-test
- Progress test (mid-treatment)
- Post-test (identical to pre- and progress tests)
Table 5.3 Proto-typical (MFE1) version of the checklist for data collection methods.
Issues of internal and external validity are also crucial for the robustness of
any research project where data (be they qualitative or quantitative) are being
gathered for reporting to a wider readership. For MFE1, we pooled validity
assessment criteria from the literature, in particular Chapelle (1991) and Felix (2000b), to enable us to develop a sound research design (Table 5.4).
Table 5.4 Validity assessment criteria for MFE1 drawing from Chapelle (1991)
and Felix (2000b).
Clearly some questions in the checklist are easier to address than others; all will require careful thought and planning prior to the start of the project. Some, such as random assignment of respondents and the amount of language instruction being received outside of the study, may be dependent on institutional and timetabling arrangements and may well require adjustments at this level. Others will require knowledge of the context of the project in the wider field of research. Many will be determined at the reporting stage. In the Case Studies not all the above criteria were met on every occasion, and the next chapters will report on the rigour of each Case Study's construct. What follows is a summary of the main data collection techniques used for MFE1 in the Case Studies.
Our review of the CALL and statistical literature has argued that the richest data can best be gained by combining data types. Given the complexities involved in such multiple learning environments and permutations as CALL can throw up, it would be impossible to come up with a single optimal experimental design model to suit all requirements. More important is an understanding of the different instruments and research designs
possible and the ability to match them to the setting. As Felix puts it:
Because there is such a large scope for research in this area, there cannot be a single
best design model. What is imperative, though, is that researchers match the design
to the research questions, the context in which the study takes place, the time-frame
available, the variables under investigation, their capacity of statistical analyses and
their ability to control for confounding elements. A short-term fully controlled
experimental design, for instance, would be suitable to measure individual well-defined outcome effects (…), while a longer-term non-experimental study using qualitative
measures such as observational procedures and think-aloud protocols would yield
important data related to effects on learning processes. A combination of various
data collection methods within one single study will help in strengthening confidence
levels about results. (2004: 124; 2005a: 12)
Inferential statistics should form the main body of the statistical analysis.
The aim should be to test a pre-stated hypothesis by means of a variety
of statistical analyses in order to show whether there are any significant
relationships between compared data. A typical CALL-related hypothesis
might be that exposure to technology in language development makes no
difference to student progress (called the null hypothesis). We can accept
the alternative hypothesis (i.e. that exposure to technology makes a difference to progress in language development) if the significance value in our compared-means tests for pre- and post-test scores across the two groups is less than or equal to 0.05 (i.e. is at a 95% or higher level of confidence).
Significance, or the level of probability (i.e. the Sig. or p value) that the
results are due to chance in a comparison of means, is shown as a value
between 0 and 1. The nearer to 0 a significance in the comparison of means
is, the more unlikely it is that the results are due to chance.
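To make this decision rule concrete, a minimal sketch in Python follows, using SciPy's independent-samples t-test; the gain scores are invented purely for demonstration and are not data from any of the Case Studies.

```python
# Illustrative sketch only: the scores below are invented for demonstration
# and do not come from any of the Case Studies.
from scipy import stats

# Hypothetical pre-to-post gain scores for two groups of learners.
treatment_gains = [5, 8, 3, 6, 7, 4, 9, 5]
comparison_gains = [2, 4, 1, 3, 2, 5, 3, 2]

# Independent-samples t-test comparing the two sets of means.
t_stat, p_value = stats.ttest_ind(treatment_gains, comparison_gains)

if p_value <= 0.05:
    # 95% or higher level of confidence: reject the null hypothesis.
    print(f"p = {p_value:.3f}: significant difference between groups")
else:
    print(f"p = {p_value:.3f}: no significant difference; retain the null hypothesis")
```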
a measure of the means for the same group of individuals was, typically,
repeated for dependent (or outcome) variable A (i.e. a pre-test score) and
dependent (or outcome) variable B (i.e. a post-test score).
Additionally, we tested for degrees of relationship, or correlation, between variables such as attendance, language learning experience, ICT use, learning style and learning outcome. It is worth noting that correlation does not imply causation. As with any correlation, there could be a third variable which explains the association between the variables we measure. So in the case of the TOLD project, even if we showed that there was a strong positive correlation, say, between ICT-use score and progress in the treatment group, a third variable such as positive exposure to something new may be playing a significant role, especially in the first weeks of experiencing a new multimedia lab.
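A minimal sketch of such a correlation check follows; the ICT-use and progress values are invented for illustration and are not TOLD project data.

```python
# Illustrative sketch only: invented values, not TOLD project data.
from scipy import stats

ict_use_scores = [3, 5, 2, 4, 5, 1, 4, 3]    # hypothetical ICT-use ratings
progress_scores = [6, 9, 4, 7, 8, 3, 7, 5]   # hypothetical learning gains

r, p_value = stats.pearsonr(ict_use_scores, progress_scores)
print(f"Pearson r = {r:.2f}, p = {p_value:.3f}")
# Even a strong positive r here would not establish causation: a third
# variable (e.g. the novelty of a new multimedia lab) could drive both.
```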
an excellent example of what can be done to increase validity in a study with a very
limited number of subjects and with so much scope for outcomes having been pro-
duced by elements other than the treatment. Procedures are described in great detail.
Participating children were selected by rigorous selection criteria including scores
from recognized (and referenced) visual, verbal and spatial tests, interviews with
children and some parents and a log of classroom observations. (Felix 2005a: 15)
applied to platform impact studies, though these come with the same caveats regarding correctly identifying, isolating, and reporting on extraneous variables. The evaluative reports on the interdisciplinary TICCIT (Alderman 1978) and PLATO (Murphy and Appel 1977) projects of the 1960s and 1970s give further useful insights on conducting large-scale quantitative impact studies linked to early computer-based education systems.
It must be stated at the outset that this chapter, as well as the next two, looks retrospectively at Case Studies that were reported on fully at the time. For the purposes of this book they are discussed in the light of their relevance to the evolving Model for Evaluation (MFE2), and as such focus primarily on principal issues relating to the evaluation framework; a complete reporting and statistical analysis was conducted in each case and is available on request, but would have been excessively detailed for inclusion in these next three chapters.
For the purpose of this project digital platforms are divided into three subsections: digital labs (whether driven by hardware, software or a hybrid of both), VLEs and Interactive Whiteboards (IWBs). Each occupies a very different space: a digital lab is a discrete, self-contained physical space; a VLE exists in cyberspace, accessible from any location with access to the Internet; and IWBs are a mobile resource that can be installed in any physical space (classroom or lab) with access to an electrical socket, a master PC and, ideally, the Internet. This section will clarify what each of these is, showcase some examples, and discuss some of the evaluative and pedagogy-related issues that pertain to each.
It is evident that the functionalities of digital platforms sit well with nearly all of the CALL Enhancement Criteria. Some criteria, such as language learning potential, are clearly more relevant to programs and pedagogy than to platforms. For most if not all of the others, there is a direct relevance to the functionality of the platform itself, be it physical or virtual. When one considers, for example, meaning focus, there is a clear link with Hewett et al.'s reference to the capacity of a platform to provide access to digital references or to enhance meaning inference via coding elements.
12 criteria for judging CALL enhancement, with the corresponding platform-judging considerations:

Chapelle:
Language learning potential: Does the platform support software that allows for a beneficial focus on form? Does it support drill-and-practice and vocabulary acquisition activities? Does it enable rapid error correction and feedback linked to the focus on form?
Learner fit: Does the platform allow learners of different abilities, learning styles, ages and genders to learn together or in differentiated groups? How well does it support diagnosis of learner levels and needs, and customization of materials and learning paths to these levels and needs?
Meaning focus: What capacity does the platform have to provide access to digital references or enhance meaning inference via coding elements?

Leakey:
Language skills and combinations of skills: How efficiently does the platform deliver sound, images, recording and playback? How easy is it to combine language skills via multimedia using this platform?
Learner control: What degree of interaction, choice, control and manipulation of material is enabled by the functionality of the platform itself, as opposed to the CALL software program?
Error correction and feedback: Can the teacher monitor student screens from the teacher console, take control of a student's keyboard and/or mouse, and intervene to provide discrete individual or group feedback? Can the teacher or student readily access a record of performance and progress?
At least one of these solutions employs simple-to-use audio panels and headsets, the only PC being that of the teacher, who has a software-driven interface. There are a number of hybrid (software/hardware) solutions on the market, where a software- or hardware-driven digital recorder is required for the recording and playback of multimedia learning material.
If there is a trend to be observed it is towards virtualization and away from confinement to specific physical locations, in response to the growing ability of laptops, and indeed mobile phones, to access and engage with learning objects remotely in cyberspace. All the same, companies that have historically invested heavily in analogue and digital labs appear to be hedging their bets and endeavouring to ensure that their latest labs can both serve single networked sites and deliver remote learning.
Interactive Whiteboards
but the latter should not be confined to or solely determined by them, nor
are all design issues directly relevant, for questions of language learning
and computer-assisted language learning should be the principal driver in
CALL evaluation.
At this stage, digital platforms will be submitted to an evaluation in the light of the three software-based frameworks used in the previous chapters: Ingraham and Emery (1991), Hubbard (1988), and Dunkel (1991). In addition, Davies et al.'s (2005) guidelines for 'Setting up effective digital language platforms and multimedia ICT suites for MFL', and the more recent University of Ulster/LLAS survey on multimedia language learning in UK universities (Toner et al. 2007), are harnessed in order to update the criteria from earlier publications and apply them to the current context. For our contextual analyses of the Robotel and Melissi platforms we draw on data from the TOLD and BLINGUA projects at Ulster and the Clarke (2005) survey of Melissi Digital Classroom users at the University of Portsmouth.
Which learner differences (age, gender)/learning styles/learner strategies is the digital platform best suited to?: Autonomous learners will benefit most; all learner styles can benefit depending on the coding elements, manner of presentation and media used.
What degree of learner control is related to effective CALL digital platform design?: Tutor must upload resources and links; after that the student has a high level of control over access, timing, rate of work and interaction.

IWB: Cost-effective compared with digital lab purchase; can accelerate learning due to motivational factor; may reduce workload by enabling the teacher to stay in his/her classroom and not have to transfer to a lab. Qualitative studies show this to be a popular tool with all ages.
Digital labs: Probably the most expensive option; the other two cannot match the broadcast, scan and pairing/group functionality. Popular if ergonomically sound and well integrated into teaching.

Table 6.2 Comparison of three digital platforms: VLEs, IWBs and digital labs.
goes on to state that the digital lab is underused for this purpose, and that the CALL package is the most frequently used resource in such locations, over and above teacher-devised activities, and speculates that the cause of this is the issue of 'stigmatization': a danger that digital labs are seen as highly specialized areas, only to be used for certain teaching activities. CALL effectiveness research needs to test the validity of such speculation by means of case studies, staff and student focus groups and ethnographic research, and provide managers and teachers with evidence of the benefits of staff training and enriched student learning. MFE targets the quality of integration of digital platforms not only by asking questions about training but also with evaluation criteria such as Ingraham and Emery's 'Supports course structure', 'Supports lesson structure', 'Adaptable to different language learning methods' and 'Supports CAL methodology' (see Table 6.3).
In Chapter 4, key principles from the LLAS report were mapped against Chapelle's criteria (Table 4.6). These include: the encouragement of autonomous learning, the impact on teacher interaction with students, interaction among students, the integration of audio/video and other media into classes, the provision and storage of audio/video and other media files, the encouragement of student engagement, the effect on tutor workload, the impact of technical problems upon effectiveness of the platform, and the impact on tutor contact hours. Most echo the criteria from Dunkel, Hubbard, and Ingraham and Emery, which have been used (in Table 6.2) to compare digital labs with VLEs and Interactive Whiteboards, and will be used now to evaluate the Robotel and Melissi platforms.
The Robotel and Melissi Case Studies provide a framework for correlating, qualitatively if not quantitatively, the impact of the chosen platform, and for gauging the synergy of the three Ps within the teaching and learning experience.
Word and PowerPoint). The Ingraham and Emery criteria, for example, show how the Robotel digital platform enabled TMM to be used for more teacher-led instruction than the software designers may originally have intended, by means of the scan and broadcast functionality: demonstrating, say, a learning path and sequence of activities via the broadcast function and then allowing students to practise individually. With regard to area studies teaching, the Robotel digital platform is shown to allow the teacher to switch swiftly between a teacher-led scenario and an autonomous-learning setting.
The table also reveals the limitations of a digital platform; for example, it will not enable the opening up of a closed database or software system: teachers will not be able to use the Robotel SCVI to author any content in the TMM software. Nor will they be able to use the software for paired activities, as all the interaction in the TMM dialogues is between the student and the software.
Intended use of platform in teaching context:
TMM: Tutor mainly as facilitator and monitor
Area Studies (BLINGUA): Tutor as lecturer and facilitator

Robotel SmartClass (columns: Descriptor; For teaching via TMM; For Area Studies teaching (BLINGUA)):
Overall objectives and …: SCVI enables tutor monitoring … / …
Supports CAL methodology: Scaffolded SLA: drill and practice modes etc. / Scaffolded SLA and/or drill and practice modes etc.
Delivery of the televisual environment: Digital platform makes no difference here / Broadcast, Capture, and Flex/Pairing modes
Delivery of the windows environment: Digital platform makes no difference here / ditto
Supports screen design: ditto / n/a
Enables range of autonomy and …: Yes, through student control of own workstation and tutor broadcast and … / Yes, through student control of own workstation and tutor broadcast and …
Enables autonomy and self-tuition: Yes, scaffolded SLA and/or drill and practice modes etc. / Yes, scaffolded SLA and/or drill and practice modes etc.
Provides access to authenticity: Digital platform will not open up … / Supports Windows environment and …
Practical considerations: …

Table 6.3 MFE1 table mapping Robotel functionality against Ingraham and Emery (1991) for the purposes of digital platform evaluation.
Motivates the learner to use it: Yes, through the above / See above
Motivates the learner to acquire the language: ditto / See above

Table 6.4 MFE1 table mapping Robotel functionality against Hubbard (1988) for the purposes of digital platform evaluation.
The Melissi Case Study will look at the claims for the Melissi Digital
Classroom and evaluate its impact in the light of staff and student focus
groups/surveys conducted by Clarke. It will then be compared with the
Robotel SmartClass 2000 system using the same MFE1 evaluative criteria
that were used for the Robotel study above.
The Melissi website draws attention to the primary difference between
the software and hardware solution:
Traditional language laboratories, and even some of the newer so-called digital labs,
still need dedicated wiring, making multi-use difficult. The Melissi Digital Classroom,
however, is not constrained by analogue wiring so the PCs can be installed almost
anywhere there is a suitable network. It can even be split over two or more rooms
providing that they are connected to the same network switch.
The absence of analogue wiring and the flexibility of the software solution are probably Melissi's main selling points; software solutions appear to be the direction in which most, if not all, digital platform providers are going. Even those companies traditionally known for hardware solutions, such as Sanako and Robotel, have developed a range of software solutions. In Robotel's case they have developed two digital platforms since SmartClass: a software platform solution (Symposium), targeted at fixed language learning environments, and a more flexible virtual lab solution (LogoLab), targeted at higher education applications requiring a virtual language lab solution, permitting students to tackle media activities 'at their own time and pace from any computer on campus' (source: <www.Robotel.ca/english/documents/NewsRelease_LogoLAB_200603-10.pdf>; accessed 1 January 2008). The aim of these is possibly to improve on the more limited functionality of their hard-wired system. The design of this product and its targeting at the HE sector may well be in response to the Melissi challenge, and Robotel would doubtless now claim that their system, too, makes multi-use easy.
For Robotel to match Melissi's functionality, quite apart from the hardware/software difference, it would need to look at developing its own equivalent to the Black Box for interpretation work, as well as a learner-to-learner communication system (for text/chat and phone, i.e. an audio link) that is learner controlled but which the tutor can control from the teacher desk.
When one applies the pedagogic criteria of Hubbard, Ingraham and Emery, and Dunkel, the verdict does balance out. Against Hubbard's acquisition criteria (Table 6.5) one begins to see how much of the platform's effectiveness will depend on teacher input and use. It also shows the value of the Black Box, which was not being used at the time of writing.
While there is wide functionality, the platform will need to be effectively harnessed to ensure students are motivated to use it for language learning and comprehensible output. As we will see in the student and staff feedback, there was limited use of Melissi's functionality, and it is perhaps not surprising that affective feedback was mixed.
Melissi Digital Classroom (columns: Descriptor; Comment):
Enables meaningful communicative interaction between student and computer: Yes, through A-V with text comprehension functionality, Black Box, access to full PC functionality, WWW, etc.
Enables comprehensible input at a level just beyond that currently acquired by the learner: Melissi is a neutral shell; input depends on tutor or student
…: … student embarrassment.

Table 6.5 MFE1 table mapping Melissi functionality against Hubbard (1988) for the purposes of digital platform evaluation.
Melissi Digital Classroom (columns: Descriptor; Comment):
Overall objectives and structure: … or student self-access
Supports CAL methodology: Yes
… learning: Yes
Allows for interaction and response: Yes

Table 6.6 MFE1 table mapping Melissi functionality against Ingraham and Emery (1991) for the purposes of digital platform evaluation.
Melissi Digital Classroom as a support for CALL pedagogy (adapted from Dunkel (1991)):
Does the platform save time? Is it cost-efficient; does it accelerate learning; does it reduce teacher workload?: If anything it increases staff workload. It is cost-effective when compared with Robotel's hard-wired system.
Student reaction to the digital platform and the multimedia environment: Generally positive. Criticisms tended to be linked to integration and technical glitches not connected to the system.

Table 6.7 MFE1 table mapping Melissi functionality against Dunkel (1991) for the purposes of digital platform evaluation.
A significant gap in the three frameworks above has been revealed by this study, namely the need to evaluate the 'bedding-in' phase, which involves issues such as technical problems linked to this phase and staff reaction to migration to a new system, environment and culture (and this will be important for programs and pedagogy as well). If there have been early teething problems, and if staff, either for this reason or for lack of training, are not disposed or equipped to use the systems, or to use them to the full, then this will feed into under-use by the students, even in self-access rooms. In Portsmouth teething problems with the technology occurred in the first year or two, and although these were largely ironed out, staff still commented that, even in its third year, the system often crashed when the room was used to capacity, and that tended to be for assessments, when every student was present. Such experiences only reinforced a general reluctance to use the system to the full. Some had tried using other features of Melissi: one teacher had 'made experiments with subtitling' as a filler and had also tried to use the telephone function, 'but that consistently does not work because there is a sound card missing'. This same teacher would prefer to use the system as a self-directed learning tool, and disliked using it as an interactive teaching space. Another teacher, however, liked to use it in this way, but rather than using the system's own monitoring of student screens preferred to orally 'check where people are, check responses. I know there is a facility to look at the screens and see what people are writing, but I personally prefer the personal checking of learning.'
Lack of time allocated to the development of materials and to ongoing training were major hurdles to staff using the wider functionality of the system. Clarke comments in her section on training that many staff felt their training was neither adequate, integrated nor ongoing, and that they were given two to three hours of training at the start and then expected to get on with it without any follow-up (pp. 11–12). In short, the Melissi engineers' claim that the system would not involve time-consuming preparation was possibly misleading.
Clearly most of the criticisms above are less concerned with weaknesses in the Melissi system as such and more to do with technical, managerial, cultural or pedagogical issues in the host institution. Technical glitches tended to be linked to the pre-existent network or the PCs that housed the system. Staff reluctance to use the full functionality of the system, even when everything was working well, derived from a lack of training and a lack of encouragement to use different functionalities, leading to a culture of staff doing their own thing in the lab rather than adopting a department-wide ethos. Clarke's findings bear out Davies et al.'s principle that an institutional commitment to integrated and ongoing staff training is vital if a full and proper pedagogical exploitation of the digital lab is to be made.
Conclusions
Ideally, the Robotel and Melissi systems need to be tested side by side in an experimental setting to control for system and student performance differentials, and this should be the subject of future research. While it has not been possible to compare the two in this way, the project has helped clarify the evaluative criteria needed for an assessment of the qualities of digital platforms. The qualitative data obtained, and the phenomenological analysis gained from using these evaluative criteria, have shed useful light on the varying impacts that a system and the manner of its integration have on student and staff perceptions of effectiveness, and on a number of dynamics that contribute to the synergies at play. This Case Study has shown that institutional priorities, problems of technical installation, staff training, the management of staff expectations, and the existence or absence of a pedagogy-driven approach to use are all as important as, if not more important than, the array of functionalities a system may have. Evidently, a good number of functionalities may as well not be there if staff are not trained or prepared to use them.
Creating a culture of optimized use must start with clear and well thought-out management commitments. Two digital systems, as we have seen with Robotel and Melissi, may have broadly similar functionality despite the one being a software solution and the other a hard-wired solution; but the degree and manner of their integration may be very different due to decisions regarding timetable allocations, training, maintenance and ongoing investment priorities. First, whereas the Ulster labs were designated as teaching spaces only, Portsmouth operated a mixed-use (teaching and self-access) system; secondly, at Ulster there was a commitment to increase the timetabled uptake of the labs to ensure maximum use, with no restrictions as to what modules or skills were taught using the lab, whereas at Portsmouth use was restricted to language modules (area studies modules were excluded); thirdly, at Ulster there was a commitment of significant human resources to ensure adequate staff training for transition, technical support and the creation of a teaching and research culture, in order to create and sustain a momentum of optimized use of the lab.
Introduction
The evaluation of the impact of CALL software must be tied to the role this software plays in the teaching and learning process. As early as 1988 Pederson said: 'The point, however obvious, needs to be restated: CALL, in and of itself, does not result in more and better learning; it is the specific way instruction is coded in CALL software that has the potential of affecting learning positively, for specific learners in specific contexts' (p. 107).
Software is not dismissed in the CALL impact equation; it is merely that one must be careful when ascribing causality, and focus on its effects, and effectiveness, in situ. Pederson goes on to say that one obvious problem in CALL is 'to provide evidence that a given software package is designed and programmed effectively' (p. 108). She adds that the wise language teacher should examine evaluative research reports carefully for 'clear educational objectives, a specific target audience, and an adequate evaluative consensus from classroom teachers, students, and CALL experts' (p. 109). In other words, the evaluation of CALL programs should be interconnected with CALL pedagogy, and the two should not be mutually exclusive activities. Pederson's core thesis is built upon the CAL work done by Salomon. His contribution to effectiveness research generally derives from his insights into the relationship between software coding and cognition. He defines coding, or 'coding elements', as the way a medium stores and delivers instruction (Salomon 1979, cited in Pederson 1988: 111) and identifies three key variables that influence computer-assisted learning:
aptitude (what the learner brings with him/her in the way of learning style, strategy and ability); treatment (pedagogy, or how the material is integrated into CALL); and, thirdly, the coding elements (e.g. colour, display, graphics, rate, timing, format, clarity, print size, linearity, hierarchy of elements, navigation).
Our Model for Evaluation, therefore, must be able to assess a number of interrelationships for CALL programs. It will need to be able to isolate, quantify, compare and correlate improvements in learner performance, as much in response to different software coding elements as to different teaching approaches, and as much to different software interfaces as to teaching settings (CALL vs. CALL and CALL vs. non-CALL); it will need to identify those learning styles that respond better to certain coding elements and compare them to the effects generated by traditional pedagogies. In looking at programs we will not so much be assessing their qualities as technological products as qualifying and quantifying their effectiveness in educational learning environments.
Many software reviews for CALL have been carried out already. This chapter will not be a review as such, for reviews or software evaluations 'become rapidly obsolete' (Pederson 1988: 109) since software products are constantly updated and improved; indeed this is the case with TellMeMore, now in its ninth version. While this Case Study focuses on one commercially developed product in particular, the primary aim is to continue to test the Model for Evaluation with a view to identifying and defining effective coding elements, or in McCarty's words the 'persona' in each separate software package, and assessing the role software plays in the engine that is the CALL teaching and learning process and experience.
MFE2 will be a framework for qualifying and, to a more limited extent, quantifying the persona in the software and the extent to which it embodies 'the qualities of a good teacher' (McCarty 1995: 30). As an extension of this, the aim is also to evaluate, qualitatively if not quantitatively, the role of the software when configured with the pedagogy and the platform.
In this Case Study two generations of the commercially successful TellMeMore language learning software package, created by the French company Auralog, were evaluated in the context of the teaching of languages at the University of Ulster. The first package trialled was the networkable CD-ROM package TellMeMore Education (version 7); the second was the web-based e-learning package TellMeMore Campus (version 9).
Research design
There were two key differences to note in the respective research designs of the TMM7 and TMM9 projects. First of all, the learning environments were different: the TMM7 study took place in a multimedia language laboratory because it was a networked CD-ROM; the TMM9 study, being based on a web-accessible e-learning package, was context-free, accessible from any PC linked to the Internet. Secondly, the TMM9 project was a discrete project, and so it was possible to conduct some quantitative as well as qualitative analyses, whereas the TMM7 study was carried out in the context of the research goals of the Ulster-based TOLD and BLINGUA projects (see next chapter), which were interested less in the effectiveness of a software program than in the impact of a wider CALL pedagogy that included TMM7 as an aspect of the design. This inevitably had a limiting effect on the nature and quantity of evaluative activity pertaining specifically to the software. Nevertheless, the students do make specific reference to the software in their feedback to those pedagogy studies. Central to both was the challenge of integrating the software into teaching programmes.
The TMM9 study, on the other hand, had as its primary goal the evaluation of the software package, and could therefore be more focused on the specific impact of the software; the potential for isolating causality was therefore also increased. Student volunteers for this trial could be randomly assigned from a number of different years and languages; the main disadvantage, however, was that they had to work on the package in their own time, as the trial was based on a new product that had not been integrated formally into the institution's modular structure. As with any voluntary study, what you gain in terms of a random assignment of students to groups, and thereby good construct validity, you may lose in terms of analysis of a real CALL experience in which the language learning on computers is fully integrated with a module and its assessment structure.
These two factors, the voluntary nature of participation and the bolt-on nature of the study, laid the project open to the possibility that the students might take it less seriously than if it were an obligatory, integrated part of their studies. Data might therefore be skewed. Also, progress made might be attributable as much to language taught in the regular language modules to which all students were committed as to the extra TMM factor. Given, however, that this affected all students (i.e. the treatment and the comparison group) equally, one could reasonably argue that their normal language tuition would act as a control.
The pedagogical designs behind the two TellMeMore studies were determined by two separate theoretical agendas. First, there was the agenda set by the researchers; in other words, the TOLD and BLINGUA context in the case of TMM7, which focused on oral and writing skills respectively, and the autonomous e-learning context of the TMM9 study, which focused on overall language improvement rather than any one skill. The second agenda at play was the pre-determined pedagogical agenda built into the product by the courseware designers when they developed the packages. Table 7.1 gives the research agendas relative to each study. In the 'Element present?' column a distinction has been drawn between whether the pedagogical descriptor was a characteristic of the teaching approach (T), an inbuilt feature of the software (S), or both.
For the TMM7 study the degree of teaching input linked to TMM varied depending on the demands of the TOLD and BLINGUA projects, whereas for the TMM9 study there was no teaching input outside the software program. In both studies we were interested in assessing the malleability of the software to the overarching pedagogic requirements of the institution and module, and whether our Model for Evaluation could provide an exportable diagnostic tool for gauging the intrusiveness and flexibility of pre-set learning content for any other language software programs.
Table 7.1 Comparing the different pedagogical approaches behind the TMM7
and TMM9 studies.
While both studies were quasi-experimental, aimed at gleaning empirical data, and used undergraduates at the University of Ulster, their respective research designs were otherwise quite different. Table 7.2 compares the different data-gathering methods. With regard to TMM7, a fuller treatment of the overall TOLD and BLINGUA-1 (i.e. pedagogy) project designs is given in Chapter 8. Here those projects are only considered as they relate to the TMM7 package, and not in terms of detailed matters of pedagogy. The key research design feature to note is that in the TOLD and BLINGUA projects we would not be able to isolate language learning progress made in the use of TMM7 from progress made as part of the overall project. This is because the pre- and post-tests would apply to the whole project, most of which did not involve the use of the software package. The teaching scheme required students to dip in and out of the software package as part of wider tuition involving discussion groups, paper exercises, web-related activity and other language software programs such as CLEF and HotPotatoes.
The TMM9 study made use, for the pre- and post-tests, of the Computer Adaptive Test (CAT), a foundational diagnostic tool built into the TMM9 product.
(Columns: data collection method; element present in TMM7 TOLD; TMM7 BLINGUA-1; TMM9)

Qualitative/judgmental data:
Diagnostic survey of prior learning: Yes / Yes / No
Diagnostic survey of learning style: Yes / Yes / No
Post-treatment survey of staff reaction: Tutor feedback notes / No / No
Post-treatment staff focus group: Yes, tutors' feedback log notes / No / …
Electronic/paper log/journal of student reaction: Yes / Yes / Yes

Quantitative/empirical data: …

Table 7.2 MFE1 checklist for data collection methods: mapping of TMM7 and TMM9.
The TMM Education (v.7) package was tested in the context of the teaching of undergraduate (post A-Level/Leaving Cert) French only. With the TOLD oral skills project this involved a CALL-based treatment group of 15 students, who had access to the software and the lab, and a comparison group of 14 students, who accessed similar but non-CALL content. In the BLINGUA-1 writing skills project all 25 students had access to the software and the lab as part of their area studies module. This cohort was then divided into a treatment group (12 students), whose CALL-based teaching was differentiated according to their dominant learning style, and a comparison group (13 students), whose CALL-based teaching was not differentiated.
The TMM Campus (v.9) trial, on the other hand, increased the range of languages taught to five (French, German, Spanish, English and Italian). Three of the six available levels were used (Beginner, Intermediate and Advanced). The overall number of students was 86, of which 47 were Participants (i.e. had access to the materials for the duration) and 39 were labelled Non-Participants (i.e. only had access to the pre- and post-test for the purposes of comparison). Some of the participants who did two languages chose to work on both of their languages for the trial, hence the disparity between the above global total and the sum of the totals below. The target cohorts were first to final year undergraduates. The students of English and Italian were French foreign exchange students on the Erasmus programme. A fuller account of the participant details and further background to the project was compiled, but space does not allow for its inclusion here.
While the TMM7 project took place over one semester (TOLD: September–December 2003; BLINGUA-1: September–December 2003), the TMM9 project was approximately six months in length (December 2006 until May 2007).
Lafford (2004) has already reviewed the Spanish version of what appears to be version 7 (Education), though she does not specify which version she is looking at. What she says regarding content and progression within the database will also apply to version 9 (Campus), which is identical to version 7 in that regard. Version 9 differs mainly in its adaptation to an online, e-learning environment and its integration with sophisticated CAT diagnostics, progress and summative testing. Lafford's résumé summarizes the strengths and weaknesses as she saw them with TMM7 as a networked CD-ROM. Our primary judgment of the content of TMM9 is that nothing has changed, for good or ill, while the online CAT tests and web portal functionality do represent significant added value in terms of the adaptation of content to student levels, the accessibility and instantaneity of feedback, and the liberation of learning from the laboratory to a distance-learning dimension.
While acknowledging the high-end graphics and excellent speech recognition software that provides the learner with multiple opportunities to practise, Lafford identifies key weaknesses that we found tend to handicap its use in HE language teaching. For example, in both TMM7 and TMM9 the Cultural Workshop provides knowledge about some isolated cultural facts from a sealed database of very short, and in some cases dated, texts. The need for functionality that allows the easy input of extended, up-to-date authentic texts of a cultural, social or political nature for area studies modules is currently not met. Nor are there any appropriate comprehension or vocabulary-related questions linked to these cultural texts. A potentially significant advance on TMM7 is TMM9's access to an Authoring Tool, developed with large commercial enterprises such as Renault and EDF in mind, to enable their own technical training content to replace or complement the existing content in the Professional Situations route. This could be a major selling point at higher education level, as it would enable tutors to import authentic and up-to-date texts with accompanying multiple-choice questions (assessable by the package) or open-ended questions (assessable only by the tutor). In spite of its appeal, an early decision was made not to use the Authoring Tool, as the technical complications were prohibitive.
Uploading home-authored texts and exercises to Auralog's server in France, while possible, would have involved the temporary suspension of their globally accessible web-based materials every time it occurred; importing the whole content onto a local server would have obviated this problem, but in our case the language department server was not linked to the web, and so students could not access the materials outside of the local campus intranet. The quest for an easy-to-use, open and updatable as well as web-based package with the high-powered functionality of a program such as TMM9 is definitely worth pursuing for the HE market. TMM9 has nearly achieved it, though it needs a more user-friendly authoring tool.
Lafford then points out its suitability to the needs of individual learners, who are given a great deal of control over various elements of the program so they can forge their own learning path, a point which our study bears out, and lists the program's focus on pronunciation, structurally-based curriculum, mechanical exercises, decontextualized interaction, and use of culture capsules (mostly isolated from vocabulary and grammar exercises and listening, speaking and writing activities) as reasons why it is out of step with modern communicatively-based views of task-based foreign language pedagogy, views which are grounded in cultural authenticity and the notion of language as social practice (p. 32). Again, our trials confirm her findings, while we and most of our students would be less scathing about the value of the pronunciation activities and some of the other mechanical exercises that feature in the package. Also, it is hard to imagine a product that, given the current limitations of technology, would be able to deliver better non-structural, fully contextualized, communicatively-based, task-based learning via a pre-packaged sealed database of content and interactivity.
Technical evaluation
This quite protracted procedure for accessing the product before a student can even start to use it presents quite an affective barrier to all concerned (students, staff and in-house technicians). While travelling technical support is available (at a price), one can imagine many an institution baulking at the hurdle that installation presents. In many ways the procedure is easy to understand despite the technical sophistication of the functionality, and most students were very patient. One can only hope that with the advance of technology and improvements in interoperability this will be simplified.
The quantitative data collected for the TMM7 study were largely incorporated into the data gathered for the respective studies the software was used for, whether TOLD or BLINGUA-1. These will be covered, and the findings reported on, in Chapter 8. Discrete evaluation of the impact of TMM7 within these studies relied for the most part on the qualitative data gleaned from student logs and questionnaires, and staff feedback. Some of this has been reported in graphic and tabular format below.
Staff and student reaction to TMM7 endorses most of Lafford's points, both positive and negative, with the following caveats and additional points. Pedagogically, the main problem area concerns the mismatch between the self-contained nature of most of the activities and the way that teachers in a given situation like to teach. In their feedback most staff echoed Roblyer et al.'s concerns (1997: 91) and saw the package as an all-or-nothing challenge: they felt that if they were going to use it in a whole-class context then they would need to completely adapt their teaching style, as well as the content of their classes, to accommodate the package. Most preferred its use as a self-access trainer in the médiathèque. Some staff also felt the highest levels were not sufficiently taxing for the abler student at undergraduate level.
tutors and students preferred to use a separate grammar drilling program for initial grammar input. For the area studies module the Culture Workshop had material on a wide range of topics. However, the passages were judged too short and basic for university level. We gave web links and other support material to complement and extend these texts. The product, if it is to support work in area studies, requires a greater degree of flexibility to allow teachers to bring in current texts and set up their own questions within a pre-existing template (similar, say, to the HotPotatoes format). In the absence of such flexibility, teachers in HE are more likely to ignore TMM for the teaching of culture and link instead to live and current pages on the Internet. In many ways the ready availability of authentic, current and free video material on the web (via streamed news sites, YouTube etc.), from which teachers can rapidly generate lesson material, is making commercially-produced programs selling cultural material increasingly redundant.
The Learning Paths (parcours pédagogiques) feature of TellMeMore proved a useful means of differentiating activities and relating them to various learning styles. Using Admin Tools and Tutor Tools to preset student IDs and map learning paths to different students did initially take a while to get used to but, once understood, proved to be a quick way to customize student learning. Another useful feature of the package is the student tracking and feedback functionality. TellMeMore automatically scores student work and displays this in tabular format, which can then be exported as text files, HTML pages or to a spreadsheet. This is clearly a welcome time-saving feature. The tracking includes a record of time spent on a given activity.
The two joint top-scoring responses (15/17 students) in the student satisfaction survey for TellMeMore after BLINGUA-1 were for overall enjoyment and for the variety of activities in the program. Under activities enjoyed most, listening activities (12/17 students) and exercises and games (10/17 students) were the most popular.
In the student logs the students responded most negatively to the following features. They found the speech recognition activities occasionally off-putting because the graph often did not give them a good score even when the tutor felt they had said the word or phrase well; even a native speaker did not always get a full 7/7. Secondly, the speech recognition on the interactive dialogue was not always sensitive enough, and the students sometimes had to shout to get a reaction. When many students are working in the same room this can distort feedback from the PC. The third most frequently mentioned item was the hangman game. It only became enjoyable when they found they could translate individual words in the clues by right-clicking and using the dictionary; otherwise the weaker students especially found it a little hard, a case of either you knew the word or you did not. The dictation exercises proved to be the least popular activity (for 8/15 students), with pronunciation drills and speech recognition activities coming close behind (7/15 each).
Auralog have worked hard at the sensitivity/accuracy of the speech
recognition software (the downloadable speech recognition plug-in for
TMM9 has contributed significantly to this); it is notable that in the more
recent TMM9 study more students reacted favourably to the speech recog-
nition, phonetic drills and dictation exercises. For the most part students
reported that they were happy with the program and would use it alone,
if it were available. The new TMM9 (Campus) edition would enable just
such an autonomous extra-mural use.
Since TMM9 used for the most part the same content as TMM7 (Education), we were primarily interested in any impact the new elements made: the new mode of delivery (i.e. distance or e-learning, and the role of the web portal) and the web-based computer-adaptive tests, which were designed to gear the learning more specifically to student needs and enable closer tracking of learning gains. We also wanted to know to what extent a new way of teaching would be required.
Table 7.3 Validity assessment criteria for MFE1: Mapping of the TMM9 project.
CJ = cannot judge.
Thus, while TMM9 did liberate the students from the classroom, the freedom thus generated served to pull them back towards a greater desire for external controls, be they via guided and integrated content or the knowledge that they were being watched by a tutor. Would this motivational impetus, however, contribute to improved time spent on task and, more importantly, improved learning gains?
From the outset it was agreed that the CAT tests would form the basis of the test for learning gains, as they had been designed to do this as part of the inbuilt provision of the online courses. This was possible because both tests drew on the same database of questions, and so were able to compare like with like (though not necessarily same with same). Furthermore, even though devised by a company that knew nothing of our students' language levels, both tests, being adaptive, were a good gauge of the students' current ability.
For the pre-test all the students (that is, the participant group (PG) and the non-participant group (NPG)) would take the placement test (test de positionnement) at the start, and the volunteers would be given access to the TMM online materials for the period. They would follow the guided mode (mode guidé) and agree to spend at least two hours a week on the TMM online material. Towards the end of the trial period the students would all complete the progress test (test de progression), which was drawn from the same database of questions as the placement test, and was the same length, thus providing comparability.
Quantitative evaluation would therefore involve the following statistical tests: comparing sample means of learning gains between groups (treatment and comparison) and within groups, looking in particular at progress between the placement test and the progression test.
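A minimal sketch of these two comparisons follows; the placement and progress scores are invented for illustration and are not the trial's data.

```python
# Illustrative sketch only: invented placement (pre) and progress (post) scores.
from scipy import stats

treat_pre, treat_post = [40, 55, 48, 60, 52], [47, 60, 50, 68, 58]
comp_pre, comp_post = [42, 50, 58], [50, 62, 66]

# Within-group comparison: paired t-test of pre- vs post-test scores.
t_within, p_within = stats.ttest_rel(treat_pre, treat_post)

# Between-group comparison: independent t-test on the gain scores.
treat_gains = [post - pre for pre, post in zip(treat_pre, treat_post)]
comp_gains = [post - pre for pre, post in zip(comp_pre, comp_post)]
t_between, p_between = stats.ttest_ind(treat_gains, comp_gains)

print(f"within-group p = {p_within:.3f}, between-group p = {p_between:.3f}")
```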
Due to time constraints, and probably an element of over-monitoring, the number of complete data sets was markedly reduced compared with those of the pedagogy study that was going on at the same time. While the pre-test took place for the entire cohort, the post-test was only completed by a small proportion of the group, because we were unable to gather all the classes together to complete the progress test under examination
conditions in the final week of term, owing to staff reluctance to make teaching time available for the completion of the test. Students were then asked to complete it in their own time, and an extension of the trial into the examination period was granted. This enabled a few more students to complete, but by no means the whole cohort. Table 7.4 shows the spread of students across the different languages and years for both the participant and non-participant groupings. Participant numbers generally reflect the proportional difference in cohort sizes within those language groups on the campus. In the larger cohorts students clearly feel freer to opt out of participating, whereas in the smaller cohorts we tended to have 100 per cent participation.
Table 7.5 shows summative data linked to the completion of various elements of the trial. The figure of 11/107 represents the total number of students for whom we have complete quantitative data sets. The figure of 8/107 represents the total number of students for whom we have complete qualitative and quantitative data. Full data sets representing between 7 and 10 per cent of the original sample size would be statistically practicable only if the original sample size were 350+. As it is, we may make speculative inferences from the data we have to work with, but speculative is all they will be. However, given that we have qualitative data to complement the empirical data, our speculative inferences may carry a little more weight. Of course, the judgmental feedback from the students is valid in its own right, though only to the extent that it is a truthful record of the students' experience, and most CALL impact studies rely entirely on such qualitative data; however, for the purposes of configuring data collection methods, this Case Study fell short of our desired objective.
Table 7.5:
Number of complete data sets for the purpose of assessing learning gains during the trial: 11 (that is, those students from the treatment AND the comparison groups who completed both the placement test (= pre-test) and the progress test (= post-test))
Number of complete data sets in the treatment group: 8 (two of these are actually one student who accessed two languages)
Number of complete data sets in the comparison group: 3
Survey returns: 8
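As a rough indication of why a sample of 350+ is invoked above, a conventional power calculation might run along the following lines; the effect size and power targets here are assumptions for illustration, not parameters from the study.

```python
# Illustrative sketch only: assumed effect size and power, not study parameters.
from statsmodels.stats.power import TTestIndPower

# Students per group needed to detect a small-to-medium effect
# (Cohen's d = 0.4) at alpha = 0.05 with 80% power.
n_per_group = TTestIndPower().solve_power(effect_size=0.4, alpha=0.05, power=0.8)
print(f"approximately {n_per_group:.0f} complete data sets needed per group")
```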
Table 7.6 records the levels and overall time spent on task by the treatment group only. Noteworthy in the table is the very low number of students who spent more than ten hours on the package. Several recorded in their logs that they had spent more time than they actually had. Other (more honest?) students recorded their own disappointment with themselves at the lack of time spent, and also stated their desire to have the product integrated into their modular studies rather than offered as an adjunct.
[Table 7.6: test levels (no test level, Intermediate, Intermediate+, Advanced, Advanced+, Expert) and hours spent on task for the treatment group, by language and year group: Overall; French (yrs 1–3); German (yrs 1–3); Spanish (yrs 1–3); English (EFL); Italian.]
The mean increase (i.e. learning gains) from pre- to post-test for the treatment group was 0.7 (or 7 per cent). The mean increase from pre- to post-test for the comparison group was 1.13 (or 11.3 per cent). Neither of these increases was shown to be statistically significant when submitted to an independent samples t-test.
There are various speculative inferences that can be drawn from a configuration of the qualitative and quantitative data. The poorer (though not significantly so) performance of the treatment group hints at a number of tentative conclusions. First, it is clear that insufficient time was spent by the treatment group on the product to make a significant impact on their learning. Secondly, the slightly greater learning gains made by the comparison group, who had no access to the product, suggest that they may have benefited from having more time to devote to their other language learning. The treatment group's time commitment to the software, on the other hand, did not reach the critical mass needed to bring them any real comparative benefit. Thirdly, the product needs to be trialled as an integrated element of normal studies and modular assessment to test the students' assertion that they would then take it more seriously. To conduct such a trial on a sufficiently large scale to obtain the required generalizable data necessitates, however, an institution-wide decision to integrate it across the board. Such a decision, in most institutions, would only be based on evidence that the product would bring the desired benefit, evidence which would only be available from just such a trial! This returns us to the chicken-and-egg argument: obtaining the data we need to demonstrate effectiveness often involves us in the ethical dilemma of potentially favouring or disadvantaging groups of students, the only way round which is to ask for volunteers, who in turn may not be able or willing to commit the necessary time outside of their normal hours to make the study viable.
Further trialling of the product at HE level, preferably over a full academic year, will be necessary to confirm or reject the hypothesis that its consistent and integrated use actually does lead to improved learning gains. The generally positive student reaction to the product, and some of the statistical data, suggest that the product's fuller, or at least blended, harnessing within an integrated programme of study is defensible at the very least from a motivational point of view. Whether it will indeed yield the significant learning gains that would justify both its expense and a management decision to use it for more than just a self-access trainer is another matter entirely.
General conclusions
For teachers used to the networked CD-ROM version and the teaching
approach required to deliver it in a laboratory environment, the switch
to an online/distance tutoring mode may involve quite a conceptual and
pedagogical leap. Some of the features that might otherwise have remained
redundant, such as the email link to one's tutor and the bulletin board,
immediately become more useful. Who needs to email their tutor when s/he is in the
room? Likewise, the tracking facility becomes essential viewing when one
is not seeing the students at work. Knowing how many hours per week
students have spent, and the date of last usage of the package, is most helpful
and instantly flags up those students who are slacking.
The teacher's role in TMM9 is now no longer that of a front-of-class
pedagogue but rather of a facilitator who can intervene more as monitor or
consultant. Clearly dangers exist if the online mode is used entirely in its
distant mode, in other words with no classroom contact (présentiel):
weeks may pass without a student feeling they wish or need to communicate
with the teacher, and a teacher who does not check the progress of
students regularly will not be so immediately aware of problems compared
with the instant feedback one gets from noticing the absence of a student
in a classroom or lab situation.
This e-learning environment is probably better suited to a business
setting, where the learner has no other academic commitments
and where the online tutor is more available to provide prompt
feedback to students and to monitor what is happening with each student
from day to day, than to a higher education setting, where the tutor's and
students' primary interaction is in the classroom or face-to-face tutorial, and
where constant online monitoring and feedback is less likely to occur.
TMM9, like TMM7, risks being ignored, or else purchased and then
ignored, because its cultural content cannot be adapted to existing
curricula. The manufacturers may point to its vast size and extensive
functionality, to its dynamic mode, to the fact that it rehearses all the
major language learning skills, or to the vast number of learning paths and
customizable permutations, but this does not get around the fact that
Auralog has decided the themes covered and has chosen the content of
the databases and the cultural workshops.
On the one hand the software needs to adapt to the changing educational
context. On the other hand, one could also argue that if teachers
are going to continue to recruit, inspire and retain students with the fun
of learning and the 'wow' factor that CALL can bring, then they need
to consider adapting their modules to include time allocations, as well as
assessment requirements, involving programs such as TMM7 or TMM9,
provided these can be shown to reduce the teacher workload, to motivate
students to learn the language in their own time, and to be matchable to
the students' level of learning and learning needs.
When set against the twelve CALL Enhancement Criteria (see Table 7.7)
for evaluation of CALL task appropriateness, the TellMeMore package, in
the two formats we have trialled, still seems to be transgressing a number
of principles of good design. The student and staff reaction indicated a
shortcoming in the evaluative criterion 'positive impact'. The datedness
of the video material is linked to the evaluative criterion of 'authenticity',
or lack of it, as Chapelle defines it: the degree of correspondence between
the learning activity and target language activities of interest to learners
out of the classroom (2001: 55). Students are nowadays, as ever, sensitive
to fashion and topicality, and where this is absent a corresponding affective
hurdle can be raised in the student's mind. The technical problems raised
by the students raise some, albeit minor, doubts about TMM's fulfilment
of Chapelle's sixth criterion: 'practicality', that is, the adequacy of the
resources to support the use of the CALL activity, though TMM9 represents
an advance on TMM7. The crossword and Hangman exercises do not
seem to meet fully the criterion of 'language learning potential', that is, the
degree of opportunity present for beneficial focus on form. The customizable
learning paths and CAT tests fully meet the criterion of 'learner fit',
by providing significant opportunity for engagement with language under
appropriate conditions given learner characteristics. However, we were not
able, with the experimental constructs we had, particularly for the TMM9
trial, to test adequately the notion of engagement with language under
appropriate conditions, as the product was generally underused and some
students would have welcomed a more integrated engagement.
Table 7.7 TMM7 and TMM9 mapped against the twelve CALL Enhancement Criteria (recoverable rows).

12 CALL Enhancement          Definitions (Chapelle 2001)                TMM7 and 9
Criteria
Language learning            degree of opportunity present for          Mixed: Yes for dialogues, dictation
potential                    beneficial focus on form                   and some grammar exercises; No for
                                                                        crosswords, hangman
Learner fit                  amount of opportunity for engagement       Yes for learning paths; No for
                             with language under appropriate            adaptability of content to different
                             conditions given learner characteristics   teaching settings
MFE2 will allow for a judgmental scoring of CALL programs against the
above criteria and will direct researchers, language department managers
and teachers to question the capacity and flexibility of the teaching and
learning environment to integrate the package fully into its demands and
needs. To this end the twelve-point checklist will be supplemented with
quality control checklists that also draw on the full range of evaluative
principles already mapped from the literature. Tables 7.8 and 7.9 are intermediary
examples of evaluations of TMM7 and TMM9 based on those
authors whose criteria are relevant to software judging.
Table 7.8 TMM7 and TMM9 compared using Ingraham and Emery's courseware evaluation criteria.

Criterion                      TMM7                                    TMM9
Levels of competence
and structure:
  Levels of competence         Beginner to Advanced                    Beginner to Expert
  Course structure             Pre-set paths or customizable           Pre-set paths or customizable
  Lesson structure             Topic or skills based                   Topic or skills based
  Language learning            Teacher-led or self-directed            Teacher-led or self-directed
Methodological issues:
  Navigation                   Confusing in places                     More visible
  Hypermedia and linearity     Wide range of interactivity, but        Wide range of interactivity, but
                               sealed content; linear progression      sealed content; linear progression
                               but also heuristic approach possible    but also heuristic approach possible
  Autonomy versus control      Balanced (can be both teacher- or       Balanced (can be both teacher- or
                               learner-driven)                         learner-driven)
  Autonomy and self-tuition    Very good                               Excellent
Practical considerations:
  Authenticity                 Euro-centric; video material needs      Euro-centric; some dated video
                               updating                                material
  Active and passive learning  Mainly active; all skills trained       Mainly active; all skills trained
  Interaction and response     Highly interactive; very good           Highly interactive; excellent
                               feedback and tracking                   feedback and exhaustive tracking
Tables 7.8 and 7.9 summarize the comparative features of TMM7 and
TMM9 using Ingraham and Emery's and Hubbard's models for courseware
evaluation. The italicized boxes highlight where there are significant differences
(usually improvements) between the two versions of the software.
These show up, in particular, the increased degree of autonomy, monitoring
and feedback that the Campus version has brought to the product.
In the context of this study, the TellMeMore Case Study has enabled
the road-testing of the prototypical evaluative framework MFE1 with
regard to commercially produced language learning software. This framework
employed various criteria from six CALL authors (Pederson, Dunkel,
Ingraham and Emery, Hubbard, and Chapelle) who have contributed in
their different ways to the conceptualization of criteria for the design and
pedagogical implementation of software over the past two decades. For
MFE2 these criteria will support the judgmental evaluation using primarily
the twelve CALL Enhancement Criteria. Further work will also be required
to test the framework's adaptability to the full range of language courseware
and software types: from commercial to home-produced packages, from
simpler single language skill trainers to sophisticated multi-skill packages,
and from stand-alone CD-ROMs to networked CD-ROMs and Internet-based
e-learning tuition systems complete with tutor support.
It is time now for the next road-test of our evolving methodology: the
evaluation of Pedagogy. Chapter 8 will feature the TOLD and BLINGUA
projects carried out at the University of Ulster. The TOLD project focused
on CALL's impact on oral skills, while the BLINGUA project looked
at blended learning, with particular reference to writing skills, comprehension
and area studies.
Table 7.9 TMM7 and TMM9 compared using Hubbard's (1988) courseware evaluation criteria.

Criterion                          TMM7                              TMM9
Provides meaningful                Yes, but some less meaningful,    Yes, but still some less
communicative interaction          and some dated                    meaningful, and some dated
between student and computer
Provides comprehensible input      Very good progression; some       Excellent progression aided by
at a level just beyond that        exercises (e.g. Hangman)          CAT test; still some mismatching
currently acquired by the          mismatched to levels              of activities to levels
learner
Promotes a positive self-image     Positive feedback, fun            Positive feedback, fun
in the learner
Motivates the learner to use it    Yes (some frustrating bits)       Yes (some access issues at start)
Introduction
… goals, wished to identify what factors were contributing to any gains, create
a bank of qualitative and quantitative evidence of good practice, diagnose
pedagogies that work in situ, and ultimately mitigate staff and student
reluctance to embrace CALL.
The TOLD (Technology and Oral Language Development) and the BLINGUA
('B' is for 'blended') projects aimed to identify and correlate learning
gains with a number of different variables, such as learning style, prior
familiarity with ICT, blend of environment and pedagogical approach,
in the context of teaching different language skills. TOLD looked at the
teaching of oral skills; BLINGUA looked at comprehension and writing
skills, in particular in area studies (civilization) teaching.
TOLD assessed student progress in oral skills across two groups, one
using technology and the other taught in a traditional conversation class.
BLINGUA was designed as a longitudinal study over two to three years looking, in
particular, at the teaching of area studies to first and second year
undergraduates. Both projects have generated published material (Barr, Leakey
and Ranchoux (2005), and Leakey and Ranchoux (2005), respectively);
this Case Study will only approach these projects and data from the new
angle of our effectiveness research model.
The TOLD project focused on oral skills training (responding to the computer
using speech recognition software) and oral communication within
the classroom/e-lab between students and with the teacher (responding to …).

Table 8.1 Model for CALL evaluation (MFE1): CALL pedagogy checklist (recoverable rows).

Approach                       Present?    Degree (0-3)   How well done? (0-5)   Notes
Teacher-led: didactic and      Partially   2              3                      For explanation of tasks and
directive, from the front                                                        group discussions
Constructivist: instructed     Partially   2              2                      Negotiated goal-setting; reflective
SLA; ZPD                                                                         learning; scaffolding
After completing the pre-test together in the old analogue lab the cohort
was divided into two groups. The students were divided into four small
conversation classes along course lines taught by native-speaker Learning
Assistants. Two of the groups were comparison groups denied access to
technology, but taught with similar content. The two treatment groups
were taught in the multimedia laboratory. The project focused on the single
hour per week allocated to French conversation classes. Students also had
five other hours per week of other language tuition.
The data collection and evaluation methods were in keeping with the MFE1
framework and are summarized below and in Table 8.2. The three qualitative
surveys were those used at the start of all the Case Studies: a Language
Experience Questionnaire, an ICT-use survey, and the VARK learning style
survey. All students kept a reflective journal. Quantitative measurement
of learning gains relied on a pre- and post-test which all students sat, and …
Pre-test                                         Yes
Progress test (mid-treatment)                    No
Post-test (identical to pre- + progress test)    Yes

Table 8.2 MFE1 checklist for data collection methods: mapping of the TOLD project.
The environment
The Faculty of Arts at the University of Ulster is spread across four cam-
puses. Our project utilized the language resources available on the Coleraine
campus. The facilities included a small new multimedia lab (sixteen worksta-
tions) and an old analogue audio-visual laboratory (twenty workstations).
The multimedia classroom was equipped with the Robotel SmartClass2000
digital platform (already looked at in Chapter 6).
TOLD delivery
Multimedia was a feature of all the oral classes for the treatment group,
and not just an add-on. The only time that students regularly broke from
interaction with the computers was for the purpose of group discussion
or conversation.
Each of the four main language skills (listening, reading, speaking,
and writing) can be broken down into a number of different sub-skills.
For TOLD the skill of speaking was sub-divided into eleven sub-skills
(e.g. pronunciation, accent/intonation, fluency, one-to-one with a French
person, one-to-one in French with an English speaker, responding spontaneously
in a conversation, responding to visual or aural input (e.g. from
TV/radio), taking an active part in a structured group discussion, taking
an active part in an unstructured group discussion, giving a group presentation,
and giving a presentation alone). An evaluative framework for
CALL must surely require the capacity to identify and test separately such
sub-skills, not only within the speaking skill but also within the listening,
reading, writing, vocabulary acquisition, grammar, and area studies skills (in
particular reading comprehension and essay writing in the target language),
to provide an overview of the impacts of CALL. Such information will be
of use to language teachers and learners, as well as to CALL designers, since
CALL products will be better at delivering some sub-skills than others.
Table 8.3 Validity assessment criteria for MFE1: Mapping of the TOLD project.
Where the table states 'see report' this is because a full or direct answer
is impossible in the space provided in the table and in this chapter; fuller
explanation is available in the published report by Barr, Leakey and
Ranchoux (2005).
Table 8.4 shows a more detailed mapping of the Data Collection Measures
and Variables for the TOLD project using MFE1. Due to the small sample size,
and given that we did not select group members by learning style (or any
of the other independent variables), we were unable to control for these as
such in this project. Attendance was included as an additional numerical
criterion, which we correlated with the learning gains.
While we were able to eliminate at least four of the listed confounding
variables (different content, location, cohort level and assessment) by
making sure these were consistent, we nevertheless cannot account for
the possible skewing role played by the fact that we had different teachers
for the different groups (each with a certain freedom to deliver the content
their way), a different class time/day of the week, and two different
course groups. The difference in location was a control variable rather than
a confounding variable.
Our summary of findings below is based on statistical analyses using
the statistics program SPSS, which was fed with raw data gathered in Excel
spreadsheets.
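For readers without SPSS, the same pipeline can be approximated with open-source tools. In the sketch below the file name and column names are hypothetical placeholders; the steps (load the spreadsheet, compute gains, summarize by group, correlate attendance with gains) simply mirror the procedure described above.

    # Sketch of an open-source equivalent of the Excel-to-SPSS workflow.
    # File and column names are hypothetical.
    import pandas as pd
    from scipy import stats

    df = pd.read_excel("told_raw_data.xlsx")   # assumed columns: group, pre, post, attendance
    df["gain"] = df["post"] - df["pre"]        # learning gain per student

    # Mean gain per group (treatment vs comparison).
    print(df.groupby("group")["gain"].mean())

    # Correlation of attendance with learning gains. Pearson is shown;
    # stats.spearmanr is the non-parametric alternative for small samples.
    r, p = stats.pearsonr(df["attendance"], df["gain"])
    print(f"r = {r:.2f}, p = {p:.3f}")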
Quantitative measures: pre-test; post-test; module assessment; attendance.
1. The parametric and non-parametric results were very similar for all
the tests analysed for this Case Study, implying that the sample size for
these tests (N = 29) yielded sufficiently reliable data from which to make
inferences. Nevertheless, a larger sample size would increase external
validity.
2. The Language Learning Experience survey scores showed that both
the treatment and the comparison groups were, when viewed as a
whole, starting from a similar ability/experience benchmark (treatment
group: 57.53 per cent; comparison group: 59.57 per cent). This
went some way towards countering the skewing effect that might have
been caused by the fact that these groups were not randomly selected
but self-selecting according to course.
3. Both the treatment and the comparison group made significant
progress. However, the comparison group (NON-TECH) generally
made more progress than the treatment (TECH) group. The average
percentage gain for the comparison group was 15.64, whereas that of
the treatment group was just 5.44. Figure 8.1 shows that both groups
reached parity in outcome standard but the comparison group (NON-
TECH) had begun at a lower mean starting point. The immediate
suggestion is that the technology added nothing to learning gains,
and if anything may have hindered progress.
4. When broken down into individual sub-skills, the comparison group
did make significant progress in fluency, content and grammar, while
the treatment group did not. It is not so surprising that fluency and
content improved more in the comparison group as more time was
spent in this group on meaningful communication. What is more
surprising is that the treatment group, which had access to grammar
drilling software with built-in feedback, did not progress more in the
area of grammar.
One way of analysing the test data was by means of paired t-tests for the
treatment (Table 8.5) and comparison (Table 8.6) groups respectively.
The final column reveals the significance scores at the 95 per cent (0.05)
confidence level. The bottom five rows in each table give the skills scores
and reveal that, while both groups made significant gains in pronunciation
and accent/intonation, only the comparison group made significant
gains in fluency, content and grammar. However, the difference between
the comparison group's gains in these three skills and the treatment group's
gains in the same skills was not statistically significant.
Table 8.5 Task-by-task and skill-by-skill paired samples t-test for the treatment group (Tech).

Paired Samples Test                    Mean     SD      SE Mean  95% CI lower  95% CI upper  t       df  Sig. (2-tailed)
Pair 1 Total % - PTotal                -6.600   8.069   2.083    -11.069       -2.131        -3.168  14  .007
Pair 2 Task 2% - ptask2                -8.429   11.817  3.158    -15.252       -1.605        -2.669  13  .019
Pair 3 Task 4% - ptask4                -7.357   13.703  3.662    -15.269       .555          -2.009  13  .066
Pair 4 Task 5% - ptask5                -6.267   18.219  4.704    -16.356       3.823         -1.332  14  .204
Pair 5 Pronunciation % - ppron         -8.800   9.615   2.483    -14.125       -3.475        -3.545  14  .003
Pair 6 Accent/Intonation % - pAccInt   -10.800  10.930  2.822    -16.853       -4.747        -3.827  14  .002
Pair 7 Fluency % - pfluency            -6.000   11.458  2.958    -12.345       .345          -2.028  14  .062
Pair 8 Content % - pcontent            -6.733   13.128  3.390    -14.004       .537          -1.986  14  .067
Pair 9 Grammar % - pgrammar            -2.400   11.224  2.898    -8.615        3.815         -.828   14  .421

Table 8.6 Task-by-task and skill-by-skill paired samples t-test for the control group (Non-Tech).

Paired Samples Test                    Mean     SD      SE Mean  95% CI lower  95% CI upper  t       df  Sig. (2-tailed)
Pair 1 Total % - PTotal                -13.571  8.925   2.385    -18.724       -8.419        -5.690  13  .000
Pair 2 Task 2% - ptask2                -15.429  18.912  5.054    -26.348       -4.509        -3.053  13  .009
Pair 3 Task 4% - ptask4                -7.000   27.139  7.527    -23.400       9.400         -.930   12  .371
Pair 4 Task 5% - ptask5                -7.917   14.311  4.131    -17.010       1.176         -1.916  11  .082
Pair 5 Pronunciation % - ppron         -13.714  11.317  3.024    -20.248       -7.180        -4.534  13  .001
Pair 6 Accent/Intonation % - pAccInt   -21.286  14.824  3.962    -29.845       -12.726       -5.373  13  .000
Pair 7 Fluency % - pfluency            -13.143  13.132  3.510    -20.725       -5.561        -3.745  13  .002
Pair 8 Content % - pcontent            -12.143  12.347  3.300    -19.272       -5.014        -3.680  13  .003
Pair 9 Grammar % - pgrammar            -10.786  15.338  4.099    -19.642       -1.930        -2.631  13  .021
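The paired comparisons in Tables 8.5 and 8.6 are standard SPSS output. A minimal SciPy sketch of a single pair follows; the pre- and post-test vectors are invented placeholders, but the test itself (a two-tailed paired samples t-test read against the 0.05 threshold) is the one reported in the tables.

    # Sketch of one paired samples t-test of the kind shown in
    # Tables 8.5 and 8.6. Scores are hypothetical.
    from scipy import stats

    pre  = [52, 60, 47, 55, 63, 58, 50, 61, 54, 57, 49, 62, 56, 53, 59]  # pre-test %
    post = [58, 66, 50, 62, 70, 61, 55, 68, 60, 63, 52, 69, 61, 57, 66]  # post-test %

    # SPSS reports the paired difference as pre minus post, which is why
    # the means in the tables are negative where students improved.
    t_stat, p_value = stats.ttest_rel(pre, post)
    deg_f = len(pre) - 1

    print(f"t = {t_stat:.3f}, df = {deg_f}, Sig. (2-tailed) = {p_value:.3f}")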
Even though the statistical evidence showed that the pedagogical benefit
of using technology for oral work was unclear, the views of students and
staff towards the use of technology in oral language development also
merit consideration, allowing us to gauge the reaction of both groups to the
technology and helping us answer the third of our research questions. Further
qualitative evidence was drawn from student and staff logs and reports, as
well as from classroom observations. In spite of the restrictions of the software
and hardware resources available to us at the time, and the fact that we were
not using CMC (such as conferencing) we found that a strong case can be
made for the use of technology for the tutorial, rehearsal and assessment
phases of oral skills teaching. The principal benefit is that technology can
ensure that every student is actively engaged in the production of speech
(whether interacting with the computer, a neighbour, or a native speaker
abroad) and the receiving of personalized and correct feedback more fre-
quently than in a class environment where there is usually no more than one
teacher. Technology also allows for the rapid access to multimedia resources
that may act as a prompt for oral production/discussion, or a means of
recording oral output and playback, thus allowing for a rich combination
of language skills that would be harder to replicate without technology.
The positive feedback for the CALL-based oral language tuition must be
correlated with the less positive impact on learning gains (i.e. the quantita-
tive data) for the treatment group to give a more balanced picture.
Students in the treatment group were willing to use technology and generally
were very upbeat about its use. In fact, in some cases the use of computer
technology was cited as the most positive aspect of the classes, making
them more interesting. Furthermore, in their logs a majority of treatment
students reported feeling that progress was made in several of the oral sub-skills.
A configuration of this positive finding with the less than remarkable
quantitative data highlights a wider issue in the area of CALL and
ICT: whether the pedagogical benefit of technology as perceived by learners
corresponds to the actual benefit derived.
Student logs also revealed that just under half the students in the treatment
group (7 out of 15) described the group discussions and debates as
the best aspect of the oral development classes. These activities were the
least technological aspects of the oral development classes. The technology
may help in the development and practice of oral skills through drill-and-practice
and pronunciation exercises (the rehearsal stage), but its role in
the application of this practice (the performance stage) is not as clear,
especially given that TOLD did not involve message-orientated communication
(e.g. by webcam or video/audio conferencing) with a real audience
outside the lab. A future study would need to investigate the qualitative
and quantitative impact on learning of this kind of technology-enhanced
oral work before a general statement can be made about the benefits of
CALL for the oral skill.
Discussions with the tutors showed that they were not opposed to the technology
in itself, but that they felt it did not always fit in with the aims of
the oral classes. The staff feedback in general pointed to a dehumanization
of oral classes when technology was introduced, and this was supported
by classroom observations. We found the tutors' reaction to be one of
pragmatism, in other words, only using the technology when it makes
a difference to the learning process: a view that confirms Gillespie and
Barr's findings (2002: 131). The BLINGUA project will explore further
the effectiveness of a more deliberate pragmatism in the design of blended
teaching in CALL. For the purposes of oral communication it is clear from
the TOLD study that, while a lab environment that does not contain a
live remote oral link to native speakers (say, in France) may support and
benefit some oral sub-skills through activities such as drill-and-practice,
record and playback, and web-inspired discussion, TOLD-2, if it were to
take place, would need to evaluate where technology is able to introduce
a human element that the classroom cannot replicate, namely authentic,
live communication with native speakers abroad.
Research questions and findings (recoverable rows):

Research question                              Finding
… L2 skills areas than others?                 … to use CALL for oral skill work. Some sub-skills
                                               within the oral skill do, however, benefit more
                                               than others.
Do certain levels of proficiency profit        Weaker students may benefit more.
more from computer use than others?
What role does feedback play in the            Students appreciated instant and discrete feedback
effectiveness of CALL pedagogy?                in TMM software (e.g. speech recognition) and
                                               the Robotel monitoring facility.
What degree of learner control is related     Some suggestion that differentiation in learner
to effective CALL designs?                     control through individualized learning paths may
                                               enhance motivation.
… have the particular effect. It is very possible that, as our analysis of the
Hubbard table suggests, the lack of opportunities for meaningful output using
technology meant that it was partly our approach to teaching oral language
in the lab that prevented greater progress being made. Ideally, therefore, a
number of different CALL oral environments and oral pedagogies, as well
as blends of CALL and non-CALL approaches, need to be trialled against
the same sub-skills and tasks before we get any nearer a more definitive
answer on the benefits of CALL for oral skills.
Having applied MFE1 to a single language learning skill (speaking)
over a single semester, we turned our attention to applying it to a mix of
skills (reading comprehension and writing within an area studies setting)
in a more longitudinal study (over the course of two semesters), delivered
in a more considered blending of settings, media and pedagogies. This
time both the treatment group and the comparison groups would have
access to CALL.
Introduction to BLINGUA
BLINGUA-1 students were followed from their first year into their second
year, by which time they were familiar with the technology and the new
environment. We hoped this would minimize skewing effects such as the
Hawthorne effect, and enable us to confirm short-term inferences with more
certain long-term findings. Secondly, we wanted to move away from comparing
CALL with non-CALL students in the same year, for a number of reasons:
from an ethical point of view we did not wish to deny half a cohort access
to a treatment; we also wanted to focus on developing the right pedagogy
for CALL, so our comparison was between different approaches to CALL
pedagogy (i.e. differentiated by learning style versus non-differentiated).
So, for the purposes of comparative data analysis, rather than using students
within the same year, we compared the same cohort and similar module
test data from a previous year (2003–2004) when no use of CALL had
been involved. Our third advance was to reduce the amount of language
teaching taking place outside the study by ensuring that, for BLINGUA-2
at least, all three contact hours per week for our module were taught in the
CALL environment (rather than the single hour that TOLD had worked
with). This aspect of the Case Study, therefore, harnesses our MFE framework
to provide a longitudinal perspective on three years of data-gathering,
where a number of constant variables (same teachers and module, similar
learning content and assessments) have been maintained to ensure internal
validity, and where quantitative and qualitative data have been configured
to provide a mix of phenomenological richness and empirical rigour.
There were two central research questions driving BLINGUA-1 and
BLINGUA-2.
CALL pedagogy tends to fall in line behind the pendulum swings of language
learning pedagogy and methodology (Decoo 2001), though it has
taken more easily to some approaches than others. Blended learning for
CALL can draw on the strengths of both behaviouristic and acquisition
approaches and resources, and need not restrict itself to computer-based
environments, resources and methodologies. The BLINGUA project has
been eclectic, too, in its trialling of different learning environments, teaching
and learning methods (at times teacher-centred, at others self-study or
parcours- and learning-style-driven) and in the choice of software and online
resources, alternating as it did between the more behaviouristic CLEF and
Logifrench programs (used in BLINGUA-1) on the one hand, and the more
constructivist, open-ended, customizable Hot Potatoes program, the parcours
(i.e. learning paths) of TellMeMore and home-produced web-enhanced
learning activities on the other (used in both BLINGUA projects). Both
the treatment and comparison groups in BLINGUA-1 made use of the
same sixteen-station digital lab employed for TOLD. BLINGUA-2 took
place in the newer forty-two-station multimedia lab.
A blended approach should, ideally, strive to develop tasks and learning
activities, or cycles of activities, that prioritize meaningful communication
at some point in the teaching cycle, be it early in the cycle (as
in Task-Based Learning, where the production or performance precedes
the focus on form) or at the end of the sequence (as in the Presentation-Practice-Production
approach). Both BLINGUA projects used the latter
(P-P-P) method, and we added a final phase: that of reflective learning by
means of student-managed web-logs, paper-based logs and student interviews.
Table 8.8 summarizes the blend mixes in the BLINGUA groups and the
comparison group of 2003–2004.
Table 8.8 Different blends of approach, setting, media and task in the BLINGUA projects.

2003–2004 FRE313 (comparison group)
  Setting: 100% classroom (Lecture Theatre + classroom)
  Approach: teacher-led (L); group discussion (S+C)
  Media: board, OHP; handouts; TV/video
  Tasks: essay(s); presentation; 2 comprehensions

2004–2005 FRE103 BLINGUA-1 (comparison group)
  Setting: 33% lab ((L+S): Lecture Theatre + classroom; (C): CALL lab (J205) + Médiathèque)
  Approach: non-CALL, and CALL split into LgS/NLgS
  Media: board, OHP; handouts; TV/video; SC2000; MS Office; CMS; WWW; TMM
  Tasks: essay(s); presentation; 2 comprehensions

2005–2006 FRE313 BLINGUA-2 (treatment group)
  Setting: 100% lab (labs (MMLL) + Médiathèque (MMRU))
  Approach: teacher-led (L); group discussion (S+C)
  Media: dry-wipe board; SC2000; MS Office; online dictionary; WebCT; WWW
  Tasks: essay(s); 2 dossiers; 2 comprehensions
Table 8.9 Model for CALL evaluation (MFE1): CALL pedagogy checklist (BLINGUA-1).

Approach                        Present?            Degree (0-3)  How well done? (0-5)  Notes
Constructivist; instructed      Yes for treatment   3             3                     Negotiated goal-setting; reflective
SLA; ZPD                        group                                                   learning; scaffolding
Student-centred autonomous      Yes for treatment   3             3                     For all tasks (treatment gp.)
or ID/LS determined learning    group
Blended learning:               Yes for treatment   3             3                     P-P-P and variation dependent on
mixed approach                  group                                                   the task
Blended learning:               Yes for treatment   3             3                     Treatment group mostly in CALL lab
mixed setting                   group                                                   (some occasionally in Médiathèque by
                                                                                        LS); comparison gp. in CALL lab only
Blended learning:               Yes for treatment   3             3                     AV, TMM + grammar s/ware,
mixed resources                 group                                                   WORD/PPT, WWW (Médiathèque
                                                                                        paper-based by LS)
Table 8.10 Model for CALL evaluation (MFE1): CALL pedagogy checklist (BLINGUA-2).

Approach                        Present?   Degree (0-3)  How well done? (0-5)  Notes
Constructivist; instructed      Yes        3             4                     Reflective learning; scaffolding
SLA; ZPD                                                                       (treatment group)
Student-centred autonomous      Yes        3             4                     For all tasks for treatment group
or ID/LS determined learning
Blended learning:               Yes        3             4                     P-P-P and variation dependent
mixed approach                                                                 on the task (treatment group)
Blended learning:               Minimal    3             4                     All classes taught in the multimedia
mixed setting                                                                  lab, but students visited self-access
                                                                               suite (MMRU) in own time
Blended learning:               Yes        3             4                     Lab, WWW, VLE, WORD/PPT,
mixed resources                                                                online dictionary, MMRU
Data-gathering for the two BLINGUA projects involved fewer data
collection methods than had featured for the TOLD project (see Table 8.11).
This was partly due to the awareness that excessive diagnostic measures can
inhibit participants and potentially affect the accuracy of the data.
Type                    Detail                                Notes
quantitative            pre-test
quantitative            post-test
quantitative            attendance
control variable        environment: CALL vs non-CALL
control variable        % blended environment
independent variable    learning style: VARK
independent variable    ICT use
independent variable    language ability/level
dependent variable      learning gains
confounding variable    environment
confounding variable    different teacher
confounding variable    different class time                  (previous years)
confounding variable    different day of week for classes     (previous years)
confounding variable    different cohort level of language    (only when 303 vs 103)
confounding variable    different course cohort               (to compare other years)
confounding variable    course content different              (only when 303 vs 103)
confounding variable    different assessment used             (similar test type and structure)

Table 8.11 MFE1 Data Collection Measure and Variable details for the
TOLD and BLINGUA projects.
                                                 Present in BLINGUA-1?   Present in BLINGUA-2?
Pre-test                                         Yes (week 1)            Yes (c-test wk 1 + week 5 comp)
Progress test (mid-treatment)                    No                      No
Post-test (identical to pre- + progress test)    Yes (week 5)            Yes (week 11 c-test + comp)
Table 8.13 shows the comparative validity of the two BLINGUA studies.
Validity question                                       BLINGUA-1               BLINGUA-2
What other factors (variables) might have               CHILL; halo; language   CHILL; halo; reduced
contributed to the effect?                              learning outside of     language learned
                                                        study                   outside study
How will you control for extraneous variables           See report              See report
(such as learner/teacher differences, variable
settings, time of day/week/year)?
How certain are you the learners are not getting        See report              See report
language instruction apart from through this study?
Does the student reporting accurately reflect           See report              See report
what happened?
Are the different variables (independent/control/       Yes                     Yes
dependent) clearly identified and reported?
Generalizable sample (N > 30: use parametric tests)     No (N = 21)             No (N = 17)
Sample less easily generalizable (N < 30: use           Yes (N = 21)            Yes (N = 17)
non-parametric tests)
External validity: to what extent can the results be …

Table 8.13 Validity assessment criteria for MFE1: mapping of the BLINGUA projects.
The BLINGUA-2 cohort was the same group of area studies students as
BLINGUA-1, now in their second year, and now taught entirely (Lecture
+ Seminar + Comprehension class) in a multimedia lab setting with
full use of technology. The graphs for the 2005–2006 second year cohort
show parity in mean scores for the treatment and comparison groups. In
other words, there was no significant improvement in the mean scores from
the first comprehension to the second (paired t-test: p = .964; Wilcoxon
Signed Ranks for two related samples: p = .831). This test is based on a
comparison of sample means for two similar comprehensions, again set at
weeks 6 and 12.
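Because the samples here fall below the N = 30 threshold used in this study, the parametric result was checked against its non-parametric counterpart. A minimal sketch, with invented comprehension marks:

    # Sketch of the paired t-test / Wilcoxon signed-ranks comparison used
    # for the two BLINGUA-2 comprehensions. Marks are hypothetical.
    from scipy import stats

    comp1 = [55, 62, 48, 70, 66, 59, 53, 61, 57, 64, 50, 68, 60, 56, 63, 58, 52]
    comp2 = [56, 61, 49, 69, 67, 58, 54, 60, 58, 63, 51, 69, 59, 57, 62, 59, 51]

    t_stat, p_param    = stats.ttest_rel(comp1, comp2)    # parametric
    w_stat, p_nonparam = stats.wilcoxon(comp1, comp2)     # two related samples

    # Close agreement between the two p-values (as with p = .964 and
    # p = .831 above) suggests the parametric result can be trusted
    # despite the small sample.
    print(f"paired t: p = {p_param:.3f}; Wilcoxon: p = {p_nonparam:.3f}")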
Nevertheless, the results for this module represent the best scores for
any area studies group in a multimedia setting in the period 2003–2006.
The configured data showed us that motivation improved significantly
even though performance showed no positive upward trend. The fact that
the students were taught 100 per cent in the new, ergonomically improved
multimedia labs, and were given a theory-driven, blended approach, may
have played a role.
To chart progress from BLINGUA-1 to BLINGUA-2, data from
both projects was collated in a spreadsheet (not included) from which
some useful inferences could be made. There was a slight, but not significant,
correlation between good attendance and final overall ranking in the […]
[…] learning in one fixed setting, whether with blended teaching or not, has a
greater impact on language learning gains than a combination of settings
(CALL or the absence of CALL being an insignificant variable).
This alternative hypothesis will clearly need further testing to isolate
what is really happening, and without recourse (as occurred in BLINGUA)
to a retrospective control group. There were at least three possible skewing
factors that might have been at play in our longitudinal study, and which
would need to be factored out in a future study. First, the 2003–2004
students were very familiar with their classroom environment and used
to all the well-established (teacher-led) routines; secondly, this group did
not have the extra challenge of having to manipulate and navigate a new
digital environment; and thirdly, one cannot be categorical about conclusions
based on sample sizes of nineteen, twenty-one and seventeen.
With both the BLINGUA projects we were dealing with many novices,
or relative novices, to either the CALL setting, semi-autonomous blended
learning, or both. With larger sample sizes, studied over time, and with
learners and teachers well used to the new environment, more favourable
data might be obtained.
Given the complexity of the research designs and the variables involved,
there is uncertainty regarding the ability of our research design MFE1 to
assess empirically, within non-experimental contexts, the impact of CALL
pedagogy with all its complexities and permutations. The real-life educational
setting, in which the students' learning was not confined to that
which was taking place during the research project class contact hours,
will always compromise validity. It was not certain whether our Model
for Evaluation could glean significant and generalizable data for language
teaching differentiated by learning style from a single institution where
the sample sizes were inadequate for the purpose of statistical analysis of
a multivariate study. As with TOLD, a single semester was deemed not
long enough for students to get accustomed to a new approach to
language learning.
It became clear that full migration to CALL should not mean the
abandonment of blended teaching/learning and the reasoned use of the
classroom environment for some skills or sub-skills. Table 8.14 brings
together some of the insights into comparative advantages and disadvantages
of a CALL setting for the three tuition modes: lecture, seminar and
comprehension class.
Table 8.14 Advantages and disadvantages of the blending of multimedia lab + VLE
for different teaching modes (lecture, seminar, comprehension) in the teaching of Area
Studies in French (2003–2006). Recoverable entries include: independent learning;
reinforcement from networked materials in the MMRU; as for seminar (interactivity,
independent learning); access to Internet translators; technical glitches can be
disruptive; takes time to train novice users of the lab (TOLD; BLINGUA-1).
When the BLINGUA-1 and 2 projects are mapped against Dunkel (1991)
the progression in design, pedagogy and effectiveness when compared with
TOLD is highlighted (see Table 8.15). In particular, one should note the
increased role played by familiarity with CALL, the percentage of module
time spent using CALL, and the ergonomics of the setting. The speed of
turn-around in the diagnostics relating to the pre-test scores was vastly
increased due to the use of digital tests (i.e. the learning styles survey, c-test,
and TellMeMore test). This in turn enabled an increased efficiency in the
allocation of students to differentiated learning paths.
Table 8.16 shows a mapping of the BLINGUA-1 and 2 projects against the
twelve CALL Enhancement Criteria. Even more than the Dunkel
mapping above, this table shows progression towards a more self-consciously
SLA-type approach to teaching and learning for the treatment groups, and
some of the benefits thereof.
Table 8.16 (recoverable rows):

Criterion            BLINGUA-1                             BLINGUA-2
Practicality         Improved equipment compared with      Much improved ergonomics and
                     previous analogue lab, but some       technical functionality; some
                     problems with sound cards and         glitches in network, and noisy
                     head-sets and comfort of setting      air-conditioning
Collaborative CALL   LS groups; pairs for presentations    Collaborative CALL for some of
                                                           the seminar element
However, any of the checklists that do not feature in this chapter, such as
those used in the Case Studies chapters based on the evaluative
criteria of Dunkel, Hubbard, and Ingraham and Emery, may also form part
of future evaluations. Most of the checklists that appear in this chapter
have already occurred in some form or other in the Case Studies and therefore
need little explanation. However, most do not occur in exactly the
same form as they did earlier, given that the Case Studies were primarily a
formative process whereby the MFE1 prototypes were tried and tested;
some developments have been made, and some new checklists generated.
The main novelties are the space allowed in most checklists for scoring
(qualitatively) the quality of the nature and use of the CALL resource in
a given context, and the twelve CALL Enhancement Criteria sub-checklists
(one for each criterion), which enable a more detailed analysis to be made
of the main theoretical elements linked to each criterion.
MFE2 is, in short, a framework of quantitative and qualitative measures
for the comparative judgment of platforms, programs and pedagogies
that improves on MFE1. It will give us a basis for scoring future CALL
effectiveness research evaluations using a trialled framework drawn from
CALL and SLA principles and road-tested in the Case Studies. Figure 9.1
was introduced in Chapter 4 and provides from the outset a simple overview
of the overall proposed evaluative process. The flowchart has been
amended slightly, with the Research Design Criteria table appearing twice.
Its first appearance is as a prospective checklist straight after the Diamond
timeline, and here the questions are couched in the future tense. Its second
appearance is just before the final write-up, as a retrospective checklist in
which the questions are couched in the past tense. The subsequent tables
enlarge upon each of the elements in this figure.
A clear idea of the effectiveness of CALL will only be gained when
CALL studies follow an agreed agenda and conform to accepted standards
of validity and reporting. MFE2 is a suggested solution. While its main
focus will be forward-looking, there may well also be value in revisiting past
CALL studies to pull these too into the same systematic categorization,
or meta-analysis, looking at not just programs and pedagogy but also the
ever-evolving digital platforms.
[Figure 9.1: Evaluation flow-chart]
The left-hand column in Figure 9.1 is made up of the twelve evaluative
criteria, or CALL Enhancement Criteria, which were generated from
the literature review and subsequent mapping exercises. They, it is argued,
should inform the direction and scope of CALL evaluation studies, be they
qualitative or quantitative. Table 9.1 gives the full list of twelve criteria and
their definitions.
Table 9.1 Synthesized list of criteria for evaluation of CALL programs, platforms
and pedagogy (MFE2), with definitions.
… now enlarged from the criterion definition list (Table 9.1) to a chart with
columns for scoring adherence in a given institution or setting. Once the
evaluator has decided which P (or combination of Ps) is to be evaluated,
the aims and objectives and the time-horizon of the study then need to be
spelt out. After that the evaluator needs to determine which phase of the
teaching cycle will be assessed, whether it is a Task-Based Learning (TBL)
cycle (Pre-task / Task phase / Language phase) or a Presentation-Practice-Production
cycle. From that point the form invites a judgmental grading of each of
the twelve elements for the extent to which it features in the cycle and the
quality of its implementation. An overall score may also be given, which
will ultimately be converted into a percentage (see the 'Net score' cell). Tables
9.3–9.14 follow the same outline.
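The 'Net score' conversion is simple arithmetic: each judged criterion contributes a degree score (0-3) and a quality score (0-5), and the grand total is expressed as a percentage of the maximum obtainable. The sketch below makes one assumption the form itself leaves open, namely that the two scales are summed and that 'CJ' (cannot judge) items are excluded from both numerator and denominator; the entries are hypothetical.

    # Sketch of the Final % grade computation for the MFE2 quality
    # control form. Entries are hypothetical; 'CJ' items are assumed
    # to be excluded from the maximum obtainable score.
    scores = {
        "language learning potential": (3, 4),   # (degree 0-3, quality 0-5)
        "learner fit":                 (2, 3),
        "meaning focus":               (3, 5),
        "authenticity":                (1, 2),
        "positive impact":             "CJ",     # cannot judge
        "practicality":                (2, 4),
    }

    MAX_PER_CRITERION = 3 + 5
    judged  = [v for v in scores.values() if v != "CJ"]
    total   = sum(degree + quality for degree, quality in judged)
    net_pct = 100 * total / (MAX_PER_CRITERION * len(judged))
    print(f"Net score: {net_pct:.0f}%")   # 29 of 40, roughly 72%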
MFE2 Discrete principle quality control: 12 criteria for CALL Enhancement

PPP descriptor (tick the P being studied). Which of the 3 Ps?
  Platform (hardware/software solution; VLE?; brand name?):
  Program (commercial/in-house; networked or online?):
  Pedagogy (language learning theory; method used?):
Aims and objectives of CALL evaluation:
Time horizon (cross-sectional; longitudinal; time-series)?

Scoring columns, recorded for each phase of the cycle (Phase 1, Phase 2, Phase 3):
  Element present? Yes/No
  Degree (0-3): 0 = not at all; 1 = minimally; 2 = somewhat; 3 = fully; CJ = cannot judge
  How well done? (0-5): 0 = poorly; 1 = minimally; 3 = to a great extent; 5 = excellently; CJ = cannot judge
  Description of element; Notes on evidence

CALL enhancement criteria (qualitative/judgmental data gathering):
  1. language learning potential
  2. learner fit
  3. meaning focus
  4. authenticity
  5. positive impact
  6. practicality
  7. language skills & combinations of skills
  8. learner control
  9. error correction and feedback
  10. collaborative CALL
  11. teacher factor
  12. tuition delivery modes
  Totals; Grand total; Final % grade

Table 9.2 Model for CALL evaluation (MFE2): Quality control.
Tables 9.3 to 9.5, the sub-checklists for the first three criteria, each repeat the
header fields of Table 9.2 (the P being studied, the aims and objectives, and the
Yes/No and scoring columns for each phase of the cycle). The recoverable items
for language learning potential are: modified interaction (built-in opportunities
for interruption of a reading, listening or viewing task to allow for interactive
sequences and help options) and modified output. For meaning focus the items
are: summarizing of content; note-taking; gap-filling; dictation/transcribing;
information gap activity; vocabulary building; feedback/error correction focused
on meaning; comprehension questions (multiple-choice); comprehension questions
(open-ended); combined skill activity (L/R/S/W); training in improved communication
strategies; and subtitling and voice-over tasks with AV clips.
The same form is repeated for each of the remaining criteria: criterion 4
(authenticity), criterion 5 (positive impact), criterion 6 (practicality), criterion 7
(language skills), criterion 8 (learner control), criterion 9 (error correction and
feedback), criterion 10 (collaborative CALL), and so on, each gathering
qualitative/judgmental data phase by phase. The recoverable items for learner
control are: learning path; pace of learning; number of attempts at a task; peer
review; communication with group; and communication with tutor. For error
correction and feedback they are: explicit correction; implicit correction;
formative feedback; summative feedback; monitoring of student activity; tracking
of student activity; reporting of student activity; certification of student activity;
student access to data; and teacher access to data. Of the collaborative CALL
items, only 'in terms of content' is recoverable.
The above tables should provide the evaluator with a clear idea of the quality
of CALL provision and resources, and of adherence to principles of pedagogy.
As such they may exist as a stand-alone study. However, if the evaluator
wishes to conduct an empirical, or positivistic, study of the impact of
CALL provision on student language learning gains and experience, with
data-gathering of a quantitative and qualitative nature, then what follows
is guidance as to a possible methodology. As with the above,
such a study may stand alone or else be combined and configured with
the phenomenological approach above to determine synergies and provide a
fuller and richer picture.
Following on from Figure 9.2 are four quality control checklists (Tables
9.15–9.18) that were prefigured in Chapter 4 and provide prompts linked
to ensuring a high standard of research design, data collection, and validity
in an empirical study. Clearly a foundational understanding of statistical
techniques and an ability to use statistical analysis software are prerequisites
for proceeding down this route. There is not space here, nor was it the aim of this
study, to provide tuition in statistical analysis, and readers should at the very
least consult the relevant literature before conducting such a study. Table
9.15 provides prospective questions relating to the design of the research
study, such as the sample size, the allocation of subjects to treatment and
comparison groups, and the metric measures to be employed. These need to be
addressed prior to the commencement of the study. The same table should
be revisited retrospectively at the end of the study, and Table 9.18 is similar
to 9.15 but with the questions couched in retrospective terms. Table 9.16
addresses the data collection methods (both qualitative and quantitative)
that will be employed, and gives space for the researcher to score the degree
and nature of their use. Table 9.17 asks the evaluator to consider the validity
issues (both internal and external) necessary for a robust study whose
findings may then be generalizable to other contexts and studies.
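The sample-size rule of thumb used in these checklists (N > 30: parametric tests; N < 30: non-parametric tests) can be encoded directly. A minimal sketch, with hypothetical gain scores; the document itself names Wilcoxon for related samples, and the Mann-Whitney U test is used here as the standard non-parametric counterpart for independent groups:

    # Sketch of the test-selection rule used in the validity checklists:
    # parametric above N = 30, non-parametric below it.
    from scipy import stats

    def compare_groups(a, b):
        """Compare two independent groups, choosing the test by sample size."""
        if min(len(a), len(b)) > 30:
            result = stats.ttest_ind(a, b)
            return "independent t-test", result.pvalue
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
        return "Mann-Whitney U", result.pvalue

    # Hypothetical gain scores for two small groups (N = 21 and N = 17).
    group_a = [3, 5, 2, 8, 6, 4, 7, 1, 5, 6, 2, 9, 4, 3, 7, 5, 6, 2, 8, 4, 5]
    group_b = [6, 9, 4, 11, 8, 7, 10, 5, 9, 6, 12, 7, 8, 10, 6, 9, 11]

    name, p = compare_groups(group_a, group_b)
    print(f"{name}: p = {p:.3f}")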
Are the Instructors across the groups the same person/different people?
Are the Activities across the groups identical, near-identical, or different?
Is there a Treatment group and a Control or Comparison group?
Will the Pre- and Post-tests be identical, near-identical or different?
What Language(s) are being studied?
What Language Skill/Combination of language skills is under analysis?
What Variable(s) are being analysed?
Is the Allocation of Subjects to groups random or selective?
If random allocation, how will this be achieved?
If selective, what criteria and methods will be used to select subjects?
What methods for controlling for and isolating variables will be adopted?
Will the scoring be carried out by an independent scorer?
Is the wording of your Null Hypothesis and your Alternative Hypothesis
appropriate? Will these be recorded in your reporting?

Quantitative instruments:
What instrument(s) will be used for the Comparison of Means? Parametric or
non-parametric?
What instrument(s) will be used to measure Correlation? Parametric or
non-parametric?
What instrument(s) will be used to measure Variance? Parametric or
non-parametric?
What instrument(s) will be used to measure Covariance? Parametric or
non-parametric?
Will an Effect Size equivalent be given where relevant?
What degree of confidence will be established at the outset? (99% or 95%)
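Table 9.15 asks whether an effect-size equivalent will be reported. The most common such measure for the difference between two group means is Cohen's d; the formula below is the standard one, while the data are hypothetical placeholders.

    # Sketch of Cohen's d: the mean difference divided by the pooled
    # standard deviation. Data are hypothetical.
    import statistics

    def cohens_d(a, b):
        na, nb = len(a), len(b)
        pooled_var = ((na - 1) * statistics.variance(a) +
                      (nb - 1) * statistics.variance(b)) / (na + nb - 2)
        return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

    treatment  = [5.4, 6.1, 4.8, 5.9, 6.3, 5.1]
    comparison = [6.8, 7.2, 6.1, 7.5, 6.6, 7.0]

    # Conventional benchmarks: |d| of 0.2 small, 0.5 medium, 0.8 large.
    print(f"d = {cohens_d(treatment, comparison):.2f}")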
Pre-test
Progress test (mid-treatment)
Post-test (identical to pre- + progress test)

Table 9.16 Model for CALL evaluation (MFE2) Quality control: data collection
measures for learner and learning.
Validity checklist for a CALL impact study. Phase of cycle (if relevant):
Pre-task, Task phase, or Language phase?
How will you control for extraneous variables (such as learner/ teacher
differences, variable settings, time of day/week/year)?
How certain are you the learners are not getting language instruction
apart from through this study?
Does the student reporting accurately reflect what happened?
Are the different variables (independent/control/ dependent) clearly
identified and reported?
Generalizable sample (N > 30): use parametric tests
Sample less easily generalizable (N < 30): use non-parametric tests
External validity
Were the Instructors across the groups the same person/different people?
Were the Activities across the groups: identical, near identical, different?
Is there a Treatment group and a Control or Comparison group?
Were the Pre- and Post-tests identical, near identical or different?
What Language(s) were being studied?
What Language Skill/Combination of language skills were under
analysis?
What Variable(s) were being analysed?
Was the Allocation of Subjects to groups random or selective?
If random allocation, how was this achieved?
If selective, what criteria and methods were used to select subjects?
What methods for controlling for and isolating of variables were adopted?
Was the scoring carried out by an independent scorer?
It is clear that, in the search for rigour and improved validity in CALL effectiveness
research, MFE2 might run the risk of imposing excessive rigour, of losing
sight of the wood for the trees. It also risks alienating
the human subjects under investigation through an excess of monitoring and
measuring, exasperating the evaluator with excessive demands, and
controlling out the human element just because it is so hard to pin down.
As Felix puts it:
Naturally, one can go too far in the demand for the application of rigorous conditions
to educational research. After all, if we managed to control for every possible
confounding variable in an experimental design we would be left with the technology
itself as an independent variable, when in today's learning environment this is
inextricably linked to the instructional method and the context in which the learning
takes place. (2005a: 23)
While no single study, nor any meta-analysis on its own, can so far give a definitive
answer on ICT effectiveness, a series of systematic syntheses of findings related to
one particular variable such as learning style or writing quality might produce more
valuable insights into the potential impact of technologies on learning processes and
outcomes. These would need to incorporate qualitative findings rather than rely on
effect sizes alone. An approach like this would begin to establish a research agenda in
ICT effectiveness rather than continue the series of isolated single studies on different
topics from which it is difficult to draw firm conclusions. (Felix, 2005a: 17)
This study has, it is hoped, begun to address Felix's call. The Case Studies
have all addressed various aspects of her agenda: her call for more rigour
in construct design, for greater detail and transparency in reporting, and …
Chapelle, C. (1997). CALL in the year 2000: still in search of research paradigms?
Language Learning & Technology [online], 1(1): 19–43. Available at: <http://
llt.msu.edu/vol1num1/chapelle/> [accessed 10 October 2004].
Chapelle, C. (1998). Multimedia CALL: lessons to be learned from research on
instructed SLA. Language Learning and Technology [online], 2(1): 22–34. Available
at: <http://llt.msu.edu/vol2num1/article1/> [accessed 18 August 2004].
Chapelle, C. (2001). Computer applications in Second Language Acquisition.
Cambridge: Cambridge University Press.
Chapelle, C., & Jamieson, J. (1991). Internal and external validity issues in research
on CALL effectiveness. In P. Dunkel (ed.), Computer-Assisted Language Learning
and Testing: Research Issues and Practice, pp. 37–59. New York: Newbury House.
Clarke, M. (2005). Moving towards the digital classroom. [Conference paper]. Presented
at the EUROCALL 2005 conference, Kraków, Poland.
Coleman, J.A., & Klapper, J. (eds) (2005). Effective learning and teaching in modern
languages. London & New York: Routledge.
The Concise Oxford Dictionary (1982). 7th edn. Oxford: Oxford University Press.
Cutrim Schmid, E. (2007a). Enhancing performance knowledge and self-esteem in
classroom language learning: The potential of the ACTIVote component of
interactive whiteboard technology. System, 35: 119–133.
Cutrim Schmid, E. (2007b). Interactive Whiteboard technology: A further step
towards the normalisation of CALL? [Conference paper]. Presented at the
EUROCALL 2007 Conference, University of Ulster.
CyberItalian. [Online]. Available at: <http://cyberitalian.com/> [accessed 10 Octo-
ber 2007].
Davies, G. (1997). Lessons from the past, lessons for the future: 20 years of CALL.
In A.-K. Korsvold & B. Rüschoff (eds), New technologies in language learning
and teaching. Strasbourg: Council of Europe. Available at: <http://www.
camsoftpartners.co.uk/coegdd1.htm> [updated December 2007, accessed 11
January 2008].
Davies, G. (ed.) (2007). Information and communications technology for language
teachers (ICT4LT). Slough: Thames Valley University [online]. Available at:
<http://www.ict4lt.org/en/evalform.doc> [accessed 7 December 2007].
Davies, G., Bangs, P., Frisby, R., & Walton, E. (2005). Setting up effective digital language laboratories and multimedia ICT suites for MFL. CILT. [Online]. Available at: <http://www.languages-ict.org.uk/info/digital_language_labs.pdf> [accessed 26 August 2005].
Davies, G., & Higgins, J. (1982). Computers, language and language learning. London:
CILT.
Piaget, J. (1970 [trans. of 1967]). Science of education and the psychology of the
child. London: Longman.
Reid, J. (1987). The learning style preferences of ESL students. TESOL Quarterly, 21: 87–111.
Remenyi, D., Williams, B., Money, A., & Swartz, E. (1998). Doing research in business and management: An introduction. London: Sage.
Robotel Language Lab Systems website. [Online]. Available at: <http://www.robotel.com> [accessed March 2004].
Ross, M. (1991). The CHILL factor (or computer-hindered language learning). Language Learning Journal, 4: 65–6.
Rousseau, J.-J. (1762). Emile, or On Education. English trans. by B. Foxley, 1911; rev. by G. Roosevelt, 1998. [Online]. Available at: <http://www.ilt.columbia.edu/pedagogies/rousseau/index.html> [accessed 10 January 2008].
Rowntree, D. (1981). Statistics without tears: A primer for non-mathematicians. Harmondsworth: Penguin Books.
Salaberry, M.R. (1996). A theoretical foundation for the development of pedagogical tasks in Computer Mediated Communication. CALICO Journal, 14(1): 5–36.
Sanako. [Online]. Available at: <http://www.sanako.com> or <http://www.multimedia-fl.com/LAB100datasheet1.pdf> [accessed 17 December 2007].
Saunders, M., Thornhill, A., & Lewis, P. (2006). Research methods for business stu-
dents. 4th edn. Upper Saddle River, NJ: Prentice Hall (Pearson Education).
Shadish, W.R., Cook, T.D., & Campbell, D.T. (2002). Experimental and quasi-exper-
imental designs for generalized causal inference. Boston: Houghton Mifflin
Company.
Smith, W.F. (ed.) (1988). Modern media in foreign language education: Applications
and projects. Lincolnwood, IL: National Textbook Company.
TellMeMore Online version 9 web portal. [Online]. Available at: <http://www.
TellMeMorecampus.com/portalCOR/modportalCOR.axrq> [accessed 27 June
2007].
Thompson, J. (2005). Computer-Assisted Language Learning. In J.A. Coleman & J. Klapper (eds), 2005, pp. 148–152.
Toner, G., Barr, D., Carvalho Martins, S., Duffner, K., Gillespie, J., & Wright, V. (2007). Multimedia language learning in UK universities: A report by the Subject Centre for Languages, Linguistics and Area Studies carried out on behalf of the Centre for Excellence in Multimedia Language Learning, University of Ulster. [Online]. Available at: <http://www.cemll.ulster.ac.uk/site/news/CETL%20Survey> [accessed 29 June 2010].
Towell, R., & Hawkins, R.D. (1994). Approaches to Second Language Acquisition.
Clevedon, Avon: Multilingual Matters.
University of Southampton Study Skills website. [Online]. Available at: <http://
www.studyskills.soton.ac.uk/studytips/learn_styles.htm> [accessed 30 Novem-
ber 2007].
Vygotsky, L.S. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
Warschauer, M. (1996). Computer-assisted language learning: An introduction. In S. Fotos (ed.), Multimedia language teaching. Tokyo: Logos International, pp. 3–20. [Online]. Available at: <http://www.ict4lt.org/en/warschauer.htm> [accessed 20 April 2006].
Warschauer, M. (2000). The death of cyberspace and the rebirth of CALL. English Teachers' Journal, 53: 61–67. [Online]. Available at: <http://www.gse.uci.edu/person/markw/cyberspace.html> [accessed 20 July 2005].
Warschauer, M., & Healey, D. (1998). Computers and language learning: An overview. Language Teaching, 31: 57–71.
Yeh, S.-W., & Lehmann, J.D. (2001). Effects of learner control and learning strategies on English as a Foreign Language (EFL) learning from interactive hypermedia lessons. Journal of Educational Multimedia and Hypermedia, 10(2): 141–159. [Online]. Available at: <http://www.editlib.org/index.cfm?fuseaction=Reader.ViewAbstract&paper_id=8413> [accessed 22 November 2007].
Index