(Corpus and Discourse) Magali Paquot-Academic Vocabulary in Learner Writing - From Extraction To Analysis-Continuum PDF

Academic Vocabulary in
Learner Writing
Corpus and Discourse
Series editors: Wolfgang Teubert, University of Birmingham, and Michaela
Mahlberg, University of Liverpool.
Editorial Board: Paul Baker (Lancaster), Frantisek Čermák (Prague), Susan

Conrad (Portland), Geoffrey Leech (Lancaster), Dominique Maingueneau (Paris
XII), Christian Mair (Freiburg), Alan Partington (Bologna), Elena Tognini-
Bonelli (Siena and TWC), Ruth Wodak (Lancaster), Feng Zhiwei (Beijing).
Corpus linguistics provides the methodology to extract meaning from texts.

Taking as its starting point the fact that language is not a mirror of reality but lets
us share what we know, believe and think about reality, it focuses on language as a
social phenomenon, and makes visible the attitudes and beliefs expressed by the
members of a discourse community.
Consisting of both spoken and written language, discourse always has historical,
social, functional, and regional dimensions. Discourse can be monolingual or
multilingual, interconnected by translations. Discourse is where language and
social studies meet.
The Corpus and Discourse series consists of two strands. The first, Research in Corpus
and Discourse, features innovative contributions to various aspects of corpus
linguistics and a wide range of applications, from language technology via the
teaching of a second language to a history of mentalities. The second strand,
Studies in Corpus and Discourse, is comprised of key texts bridging the gap between
social studies and linguistics. Although equally academically rigorous, this strand
will be aimed at a wider audience of academics and postgraduate students working
in both disciplines.
Research in Corpus and Discourse
Conversation in Context
A Corpus-driven Approach
With a preface by Michael McCarthy
Christoph Rühlemann
Corpus-Based Approaches to English Language Teaching

Edited by Mari Carmen Campoy, Begona Bellés-Fortuno and Ma Lluïsa Gea-Valor
Corpus Linguistics and World Englishes

An Analysis of Xhosa English
Vivian de Klerk
Evaluation and Stance in War News

A Linguistic Analysis of American, British and Italian television news reporting of
the 2003 Iraqi war
Edited by Louann Haarman and Linda Lombardo
Evaluation in Media Discourse
Analysis of a Newspaper Corpus
Monika Bednarek
Historical Corpus Stylistics

Media, Technology and Change
Patrick Studer
Idioms and Collocations

Corpus-based Linguistic and Lexicographic Studies
Edited by Christiane Fellbaum
Meaningful Texts
The Extraction of Semantic Information from Monolingual and Multilingual
Corpora
Edited by Geoff Barnbrook, Pernilla Danielsson and Michaela Mahlberg
Rethinking Idiomaticity
A Usage-based Approach
Stefanie Wulff
Working with Spanish Corpora

Edited by Giovanni Parodi
Studies in Corpus and Discourse
Corpus Linguistics and The Study of Literature

Stylistics In Jane Austen’s Novels
Bettina Starcke
English Collocation Studies

The OSTI Report
John Sinclair, Susan Jones and Robert Daley
Edited by Ramesh Krishnamurthy
With an introduction by Wolfgang Teubert
Text, Discourse, and Corpora. Theory and Analysis

Michael Hoey, Michaela Mahlberg, Michael Stubbs and Wolfgang Teubert
With an introduction by John Sinclair
This page intentionally left blank
Academic Vocabulary in
Learner Writing
From Extraction to Analysis
Magali Paquot
Continuum International Publishing Group
The Tower Building 80 Maiden Lane
11 York Road Suite 704, New York
London SE1 7NX NY 10038
www.continuumbooks.com
© Magali Paquot 2010
All rights reserved. No part of this publication may be reproduced or

transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording, or any information storage or
retrieval system, without prior permission in writing from the publishers.
British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library.
ISBN: 978-1-4411-3036-5 (hardcover)
Library of Congress Cataloging-in-Publication Data

A catalog record for this book is available from the Library of Congress.
Typeset by Newgen Imaging Systems Pvt Ltd, Chennai, India

Printed and bound in Great Britain by the MPG Books Group
Contents
Acknowledgements xi
List of abbreviations xiii
List of figures xv
List of tables xvii
Introduction 1
Part I: Academic vocabulary

Chapter 1 What is academic vocabulary? 9
1.1. Academic vocabulary vs. core vocabulary and
technical terms 9
1.1.1. Core vocabulary 10
1.1.2. Academic vocabulary 11
1.1.3. Technical terms 13
1.1.4. Fuzzy vocabulary categories 13
1.2. Academic vocabulary and sub-technical vocabulary 17
1.3. Vocabulary and the organization of academic texts 22
1.4. Is there an ‘academic vocabulary’? 25
1.5. Summary and conclusion 27
Chapter 2 A data-driven approach to the selection of

academic vocabulary 29
2.1. Corpora of academic writing 31
2.2. Corpus annotation 34
2.2.1. Issues in annotating corpora 34
2.2.2. The software 36
2.3. Automatic extraction of potential academic words 44
2.3.1. Keyness 46
2.3.2. Range 48
2.3.3. Evenness of distribution 50
2.3.4. Broadening the scope of well-represented
semantic categories 53
2.4. The Academic Keyword List 55
viii Contents
Part II: Learners’ use of academic vocabulary

Chapter 3 Investigating learner language 67
3.1. The International Corpus of Learner English 67
3.2. Contrastive Interlanguage Analysis 70
3.3. A comparison of learner vs. expert writing 72
Chapter 4 Rhetorical functions in expert academic writing 81

4.1. The Academic Keyword List and rhetorical functions 81
4.2. The function of exemplication 88
4.2.1. Using prepositions, adverbs and adverbial
phrases to exemplify 90
4.2.2. Using nouns and verbs to exemplify 95
4.2.3. Discussion 106
4.3. The phraseology of rhetorical functions in expert
academic writing 108
Chapter 5 Academic vocabulary in the International

Corpus of Learner English 125
5.1. A bird’s-eye view of exemplification in learner writing 125
5.2. Academic vocabulary and general interlanguage features 142
5.2.1. Limited lexical repertoire 142
5.2.2. Lack of register awareness 150
5.2.3. The phraseology of academic vocabulary
in learner writing 154
5.2.4. Semantic misuse 168
5.2.5. Chains of connective devices 174
5.2.6. Sentence position 177
5.3. Transfer-related effects on French learners’ use
of academic vocabulary 181
Part III: Pedagogical implications and conclusions

Chapter 6 Pedagogical implications 201
6.1. Teaching-induced factors 201
6.2. The role of the first language in EFL learning
and teaching 203
6.3. The role of learner corpora in EAP materials design 206
Contents ix
Chapter 7 General Conclusion 211

7.1. Academic vocabulary: a chimera? 211
7.2. Learner corpora, interlanguage and second
language acquisition 215
7.3. Avenues for future research 216
Appendix 1: Expressing cause and effect 219

Appendix 2: Comparing and contrasting 226
Notes 235
References 240
Author index 257
Subject index 261
Acknowledgements
There are several people without whom this book would never have been
written. First and foremost, I want to express my deepest and most sincere
gratitude to my PhD supervisor, Professor Sylviane Granger, for her
infectious enthusiasm, her intellectual perceptiveness and her unfailing
expert guidance. I am greatly indebted to you, Sylviane, for giving me the
opportunity to join the renowned Centre for English Corpus Linguistics
seven years ago now! I have been lucky enough to undertake research in
an environment where writing a PhD also means collaborating with
many fellow researchers on up-and-coming projects, attending thought-
provoking conferences, organizing seminars, conferences and summer
schools, as well as lecturing and offering guidance to undergraduate
students.
I am also very grateful to my colleagues and friends at the Centre for
English Corpus Linguistics - Céline, Claire, Fanny, Gaëtanelle, Jennifer,
Marie-Aude, Suzanne and Sylvie – for making the Centre for English
Corpus Linguistics such an inspiring and intellectually stimulating research
centre. I also wish to thank them for their moral and intellectual support
and for all the entertaining lunchtimes we spent together talking about
everyday life . . . and work.
I am indebted to a great number of colleagues not only for supplying
me with corpora, corpus-handling tools and references, but also for
providing helpful comments on earlier versions and stimulating ideas for
my research. I would like to thank Yves Bestgen, Liesbet Degand, Jean
Heiderscheidt, Sebastian Hoffmann, Scott Jarvis, Jean-René Klein, Fanny
Meunier, Hilary Nesi, John Osborne and JoAnne Neff van Aertselaer. I am
also grateful to an anonymous reviewer for recommendations on the first
draft of the text.
I gratefully acknowledge the support of both the Communauté française
de Belgique, which funded my doctoral dissertation out of which this
book has grown, and the Belgian National Fund for Scientific Research
(F.N.R.S).
xii Acknowledgements
On a more personal note, I would like to express my deepest thanks to my

parents and friends for everything they have done to help me while I was
working on this book. And last, but not least, Arnaud: thank you for making
it all worthwhile.
Magali Paquot
Louvain-la-Neuve
November, 2009
List of abbreviations
AKL Academic Keyword List (my own list)

AWL Academic Word List (Coxhead, 2000)
BAWE British Academic Written English (BAWE) Pilot
Corpus
BNC British National Corpus
B-BNC Baby BNC Academic Corpus
BNC-AC British National Corpus – academic sub-corpus
BNC-AC-HUM British National Corpus – academic sub-corpus
(discipline: humanities and arts)
BNC-SP British National Corpus – spoken sub-corpus
CALL Computer-assisted language learning
CECL Centre for English Corpus Linguistics, Université
catholique de Louvain
CIA Contrastive Interlanguage Analysis
CLAWS Constituent Likelihood Automatic Word-tagging
system
CODIF Corpus de Dissertations Françaises
EAP English for academic purposes
EFL English as a foreign language
ESL English as a second language
ESP English for specific purposes
GSL General Service List (West, 1953)
ICLE International Corpus of Learner English (Granger
et al., 2002)
ICLEv2 International Corpus of Learner English (version 2)
(Granger et al., 2009)
IL interlanguage
L1 First language
L2 Foreign language
LDOCE4 Longman Dictionary of Contemporary English
(4th edition)
xiv List of abbreviations
LOCNESS Louvain Corpus of Native Speaker Essays

LogL Log-likelihood statistical test
MC Micro-Concord Corpus Collection B
MED2 Macmillan English Dictionary for Advanced Learners
(second edition)
MLD Monolingual learners’ dictionary
NS Native speaker
NNS Non-native speaker
pmw Per million words
POS Part-of-speech
SLA Second language acquisition
UCREL University Centre for Computer Corpus Research on
Language, Lancaster University
WST4 WordSmith Tools (version 4)
List of figures
Figure 1.1: The relationship between academic and sub-technical

vocabulary 21
Figure 2.1: A three-layered sieve to extract potential

academic words 45
Figure 2.2: WordSmith Tools – WordList option 49
Figure 2.3: Distribution of the words example and law in the
15 sub-corpora 50
Figure 2.4: WordSmith Tools Detailed Consistency Analysis 51
Figure 2.5: Distribution of the noun ‘solution’ 53
Figure 3.1: ICLE task and learner variables (Granger et al.,

2002: 13) 68
Figure 3.2: Contrastive Interlanguage Analysis (Granger 1996a) 70
Figure 3.3: BNCweb Collocations option 77
Figure 4.1: Exemplification in the BNC-AC-HUM 89

Figure 4.2: The distribution of the adverb ‘notably’
across genres 93
Figure 4.3: The distribution of ‘by way of illustration’
across genres 94
Figure 4.4: The distribution of ‘to name but a few’
across genres 95
Figure 4.5: The distribution of the verbs ‘illustrate’ and
‘exemplify’ across genres 103
Figure 4.6: The phraseology of rhetorical functions
in academic prose 121
Figure 5.1: Exemplifiers in the ICLE and the BNC-AC-HUM 127

Figure 5.2: The use of the prepositions ‘like’ and ‘such as’
in different genres 131
Figure 5.3: The use of the adverb ‘notably’ in different genres 131
xvi List of figures
Figure 5.4: Distribution of the adverbials ‘for example’ and

‘for instance’ across genres in the BNC 132
Figure 5.5: The treatment of ‘namely’ on websites devoted
to English connectors 140
Figure 5.6: The use of ‘despite’ and ‘in spite of’ in
different genres 145
Figure 5.7: The frequency of speech-like lexical items in expert
academic writing, learner writing and speech
(based on Gilquin and Paquot, 2008) 153
Figure 5.8: Phraseological cascades with ‘in conclusion’ and
learner-specific equivalent sequences 161
Figure 5.9: Collocational overlap 165
Figure 5.10: A possible rationale for the use of ‘according to me’
in French learners’ interlanguage 187
Figure 5.11: A possible rationale for the use of ‘let us in
French learners’ interlanguage 191
Figure 5.12: Features of novice writing - Frequency in expert
academic writing, native-speaker and EFL novices’
writing and native speech (per million words of
running text) 195
Figure 6.1: Connectives: contrast and concession

( Jordan 1999:136) 202
Figure 6.2: Comparing and contrasting: using nouns such
as ‘resemblance’ and ‘similarity’ (Gilquin et al.,
2007b: IW5) 208
Figure 6.3: Reformulation: Explaining and defining:
using ‘i.e.’, ‘that is’ and ‘that is to say’ (Gilquin
et al., 2007b: IW9) 209
Figure 6.4: Expressing cause and effect: ‘Be careful’ note on
‘so’ (Gilquin et al., 2007b: IW13) 210
List of tables
Table 1.1: Composition of the Academic Corpus

(Coxhead 2000: 220) 12
Table 1.2: Chung and Nation’s (2003: 105) rating scale for finding
technical terms, as applied to the field of anatomy 14
Table 1.3: Word families in the AWL 17
Table 2.1: The corpora of professional academic writing 31

Table 2.2: The re-categorization of data from the professional
corpus into knowledge domains 32
Table 2.3: The corpora of student academic writing 33
Table 2.4: Examples of essay topics in the BAWE pilot corpus 34
Table 2.5: An example of CLAWS vertical output 39
Table 2.6: CLAWS horizontal output [lemma + POS] 40
Table 2.7: CLAWS horizontal output [lemma + simplified
POS tags] 40
Table 2.8: Simplification of CLAWS POS-tags 41
Table 2.9: CLAWS tagging of the complex preposition
‘in terms of’ 41
Table 2.10: Semantic fields of the UCREL Semantic
Analysis System 42
Table 2.11: USAS vertical output 43
Table 2.12: USAS horizontal output 44
Table 2.13: The fiction corpus 47
Table 2.14: Number of keywords 47
Table 2.15: Automatic semantic analysis of potential
academic words 54
Table 2.16: Distribution of grammatical categories in the
Academic Keyword List 55
Table 2.17: The Academic Keyword List 56
Table 2.18: The distribution of AKL words in the GSL
and the AWL 60
xviii List of tables
Table 3.1: Breakdown of ICLE essays 69

Table 3.2: BNC Index – Breakdown of written BNC genres
(Lee 2001) 74
Table 4.1: Ways of expressing exemplification found in the

BNC-AC-HUM 89
Table 4.2: The use of ‘for example’ and ‘for instance’ in the
BNC-AC-HUM 91
Table 4.3: The use of ‘example’ and ‘for example’ in the
BNC-AC-HUM 95
Table 4.4: Significant verb co-occurrents of the noun ‘example’
in the BNC-AC-HUM 96
Table 4.5: Adjective co-occurrents of the noun ‘example’
Table 4.6: The use of the lemma ‘illustrate’ in the BNC-AC-HUM 103
Table 4.7: The use of the lemma ‘exemplify’ in the BNC-AC-HUM 105
Table 4.8: The use of imperatives in academic writing (based
on Siepmann, 2005: 119) 107
Table 4.9: Ways of expressing a concession in the
BNC-AC-HUM 109
Table 4.10: Ways of reformulating, paraphrasing and clarifying
Table 4.11: Ways of expressing cause and effect
Table 4.12: Ways of comparing and contrasting found in
the BNC-AC-HUM 112
Table 4.13: Co-occurrents of nouns expressing cause or effect
Table 4.13a: reason 115
Table 4.13b: implication 115
Table 4.13c: effect 116
Table 4.13d: outcome 116
Table 4.13e: result 117
Table 4.13f: consequence 117
Table 4.14: Co-occurrents of verbs expressing possibility and
certainty in the BNC-AC-HUM 119
Table 4.14a: suggest 119
Table 4.14b: prove 120
Table 4.14c: appear 120
Table 4.14d: tend 120
List of tables xix
Table 5.1: A comparison of exemplifiers based on the total

number of running words 128
Table 5.2: A comparison of exemplifiers based on the total
number of exemplifiers used 129
Table 5.3: Two methods of comparing the use of exemplifiers 130
Table 5.4: Significant adjective co-occurrents of the noun
‘example’ in the ICLE 133
Table 5.5: Adjectives co-occurrents of the noun ‘example’
in ICLE not found in the BNC 133
Table 5.6: Significant verb co-occurrents of the noun ‘example’
in the ICLE 134
Table 5.7: Verb co-occurrent types of the noun ‘example’
in ICLE not found in BNC 134
Table 5.8: The distribution of ‘example’ and ‘be’ in the ICLE
and the BNC-AC-HUM 135
Table 5.9: The distribution of ‘there + BE + example’ in ICLE
Table 5.10: The distribution of AKL words in the ICLE 143
Table 5.11: Examples of AKL words which are overused and
underused in the ICLE 144
Table 5.12: Two ways of comparing the use of cause and effect
markers in the ICLE and the BNC 146
Table 5.13: The over- and underuse by EFL learners of specific
devices to express cause and effect (based on
Appendix 1) 147
Table 5.14: The over- and underuse by EFL learners of
specific devices to express comparison and
contrast (based on Appendix 2) 149
Table 5.15: Speech-like overused lexical items per
rhetorical function 151
Table 5.16: The frequency of ‘maybe’ in learner corpora 154
Table 5.17: The frequency of ‘I think’ in learner corpora 154
Table 5.18: Examples of overused and underused clusters
with AKL words 156
Table 5.19: Clusters of words including AKL verbs which
are over- and underused in learners’ writing,
by comparison with expert academic writing 158
Table 5.20: Examples of overused clusters in learner writing 159
Table 5.21: Verb co-occurrents of the noun conclusion
in the ICLE 162
xx List of tables
Table 5.22: Adjective co-occurrents of the noun conclusion

in the ICLE 167
Table 5.23: The frequency of sentence-initial position of
connectors in the BNC-AC-HUM and the ICLE 178
Table 5.24: Sentence-final position of connectors in the ICLE
Table 5.25: Jarvis’s (2000) three effects of potential L1 influence 183
Table 5.26: Jarvis’s (2000) unified framework applied to
the ICLE-FR 184
Table 5.27: A comparison of the use of the English verb
‘illustrate’ and the French verb ‘illustrer’ 188
Table 5.28: ‘let us’ in learner texts 189
Table 5.29: The transfer of frequency of the first person
plural imperative between French and English writing 191
Table 6.1: Le Robert & Collins CD-Rom (2003–2004):

Essay writing 205
Introduction
That English has become the major international language for research
and publication is beyond dispute. As a result, university students need to
have good receptive command of English if they want to have access to the
literature pertaining to their discipline. As a large number of them are also
required to write academic texts (e.g. essays, reports, MA dissertations, PhD
theses, etc.), they also need to have a productive knowledge of academic
language. As noted by Biber, ‘students who are beginning university studies
face a bewildering range of obstacles and adjustments, and many of these
difficulties involve learning to use language in new ways’ (2006: 1). Several
studies have shown that the distinctive, highly routinized, nature of
academic prose is problematic for many novice native-speaker writers
(e.g. Cortes, 2002), but poses an even greater challenge to students for
whom English is a second (e.g. Hinkel, 2002) or foreign language (e.g.
Gilquin et al., 2007b).
Studies in second language writing have established that learning to write
second-language (L2) academic prose requires an advanced linguistic com-
petence, without which learners simply do not have the range of lexical and
grammatical skills required for academic writing (Jordan, 1997; Nation and
Waring, 1997; Hinkel, 2002; 2004; Reynolds, 2005). A questionnaire survey
of almost 5,000 undergraduates showed that students from all 26 depart-
ments at the Hong Kong Polytechnic University experienced difficulties
with the writing skills necessary for studying content subjects through the
medium of English (Evans and Green, 2006). Almost 50 per cent of the
students reported that they encountered difficulties in using appropriate
academic style, expressing ideas in correct English and linking sentences
smoothly. Mastering the subtleties of academic prose is, however, not only a
problem for novice writers. International refereed journal articles are
regarded as the most important vehicle for publishing research findings
and non-native academics who want to publish their work in those top jour-
nals often find their articles rejected, partly because of language problems.
2 Academic Vocabulary in Learner Writing
These problems include the fact that they have less facility of expression
and a poorer vocabulary; they find it difficult to ‘hedge’ appropriately
and the structure of their texts may be influenced by their first language
(see Flowerdew, 1999).
Because it causes major difficulties to students and scholars alike,
academic discourse has become a major object of study in applied linguis-
tics. Flowerdew (2002) identified four major research paradigms for
investigating academic discourse, namely (Swalesian) genre analysis,
contrastive rhetoric, ethnographic approaches and corpus-based analysis.
While the first three approaches to English for Academic Purposes (EAP)
emphasize the situational or cultural context of academic discourse,
corpus-linguistic methods focus more on the co-text of selected lexical
items in academic texts.
Corpus linguistics is concerned with the collection in electronic format
and the analysis of large amounts of naturally occurring spoken or written
data ‘selected according to external criteria to represent, as far as possible,
a language or language variety as a source of linguistic research’ (Sinclair,
2005: 16). Computer corpora are analysed with the help of software pack-
ages such as WordSmith Tools 4 (Scott, 2004), which includes a number of
text-handling tools to support quantitative and qualitative textual data anal-
ysis. Wordlists give information on the frequency and distribution of the
vocabulary – single words but also word sequences – used in one or more
corpora. Wordlists for two corpora can be compared automatically so as to
highlight the vocabulary that is particularly salient in a given corpus, i.e.,
its keywords. Concordances are used to analyse the co-text of a linguistic
feature, in other words its linguistic environment in terms of preferred
co-occurrences and grammatical structures. The research paradigm of
corpus linguistics is ideally suited for studying the linguistic features of
academic discourse as it can highlight which words, phrases or structures
are most typical of the genre and how they are generally used.
Corpus-based studies have already shed light on a number of distinctive
linguistic features of academic discourse as compared with other genres.
Biber’s (1988) study of variation across speech and writing has shown that
academic texts typically have an informational and non-narrative focus;
they require highly explicit, text-internal reference and deal with abstract,
conceptual or technical subject matter (Biber, 1988: 121–60). The Longman
Grammar of Spoken and Written English (Biber et al., 1999) provides a compre-
hensive description of the range of distinctive grammatical and lexical
features of academic prose, compared to conversation, fiction and newspa-
per reportage. Common features of this genre include a high rate of
Introduction 3
occurrence of nouns, nominalizations, noun phrases with modifiers,

attributive adjectives, derived adjectives, activity verbs, verbs with inanimate
subjects, agentless passive structures and linking adverbials. By contrast,
first and second person pronouns, private verbs, that-deletions and contrac-
tions occur very rarely in academic texts.
In addition, studies of vocabulary have emphasized the importance of a
‘sub-technical’ or ‘academic’ vocabulary alongside core words and techni-
cal terms in academic discourse (Nation, 2001: 187–216). Hinkel (2002:
257–65) argues that the exclusive use of a process-writing approach, the
relative absence of direct and focused grammar instruction, and the lack of
academic vocabulary development contribute to a situation in which non-
native students are simply not prepared to write academic texts. She pro-
vides a list of priorities in curriculum design and writes that, among the top
priorities, ‘NNSs [non-native students] need to learn more contextualized
and advanced academic vocabulary, as well as idioms and collocations to
develop a substantial lexical arsenal to improve their writing in English’
(Hinkel, 2002: 247). The Academic Word List (Coxhead, 2000) was compiled
on the basis of corpus data to meet the specific vocabulary needs of stu-
dents in higher education settings.
But what is ‘academic vocabulary’? Despite its widespread use, the term
has been used in various ways to refer to different (but often overlapping)
vocabulary categories. This book aims to provide a better description of the
notion of ‘academic vocabulary’. It takes the reader full circle, from the
extraction of potential academic words through their linguistic analysis in
expert and learner corpus data, to the pedagogical implications that can be
drawn from the results. Recent corpus-based studies have emphasized the
specificity of different academic disciplines and genres. As a result, research-
ers such as Hyland and Tse (2007) question the widely held assumption that
students need a common core vocabulary for academic study. They argue
that the different disciplinary literacies undermine the usefulness of such
lists and recommend that lecturers help students develop a discipline-based
lexical repertoire.
This book is an attempt to resolve the tension between the particularizing
trend which advocates the teaching of a more restricted, discipline-based
vocabulary syllabus, and the generalizing trend which recognizes the
existence of a common core ‘academic vocabulary’ that can be taught to a
large number of learners in many disciplines. I first argue that, to resolve
this tension, the concept of ‘academic vocabulary’ must be revisited.
I demonstrate, on the basis of corpus data, that, as well as discipline-specific
vocabulary, there is a wide range of words and phraseological patterns that
are used to refer to activities which are characteristic of academic discourse,

and more generally, of scientific knowledge, or to perform important dis-
course-organizing or rhetorical functions in academic writing.
A large proportion of this lexical repertoire consists of core vocabulary, a
category which has so far been largely neglected in EAP courses but which
is usually not fully mastered by English as a foreign language (EFL) learn-
ers, even those at the high-intermediate or advanced levels. I make use of
Granger’s (1996a) Contrastive Interlanguage Analysis to test the working
hypothesis that upper-intermediate to advanced EFL learners, irrespective
of their mother tongue background, share a number of linguistic features
that characterize their use of academic vocabulary. The learner corpus
used is the first edition of the International Corpus of Learner English (ICLE),
which is among the largest non-commercial learner corpora in existence.
It contains texts written by learners with different mother tongue back-
grounds. Ten ICLE sub-corpora representing different mother tongue
backgrounds (Czech, Dutch, Finnish, French, German, Italian, Polish,
Russian, Spanish, Swedish) are compared with a subset of the academic
component of the British National Corpus (texts written by specialists in the
Humanities) to identify ways in which learners’ use of academic vocabulary
differs from that of more expert writers. A comparison of the ten sub-
corpora then makes it possible to identify linguistic features that are
shared by learners from a wide range of mother tongue backgrounds, and
therefore possibly developmental. The EFL learners are all learning how
to write in a foreign language, and they are often novice writers in their
mother tongue as well.
However, not all learner specific-features can be attributed to develop-
mental factors. The comparison of several ICLE sub-corpora helps to
pinpoint a number of patterns that are characteristic of learners who share
the same first language, and which may therefore be transfer-related.
I made use of Jarvis’s (2000) unified framework to investigate the potential
influence of the first language on French learners’ use of academic vocabu-
lary in English.
The book is organized in three sections. The first scrutinizes the concept
of ‘academic vocabulary’, reviewing the many definitions of the term and
arguing that, for productive purposes, academic vocabulary is more use-
fully defined as a set of options to refer to those activities that characterize
academic work, organize scientific discourse, and build the rhetoric of
academic texts. It then proposes a data-driven procedure based on the
criteria of keyness, range, and evenness of distribution, to select academic
words that could be part of a common core academic vocabulary syllabus.
Introduction 5
The resulting list, called the Academic Keyword List (AKL), comprises a set of
930 potential academic words. One important feature of the methodology
is that, unlike Coxhead’s (2000) Academic Word List, the AKL includes the
2,000 most frequent words of English, thus making it possible to appreciate
the paramount importance of core English words in academic prose.
The AKL is used in Section 2 to explore the importance of academic
vocabulary in expert writing and to analyse EFL learners’ use of lexical
devices that perform rhetorical or organizational functions in academic
writing. This section offers a thorough analysis of these lexical devices as
they appear in the International Corpus of Learner English, describing the fac-
tors that account for learners’ difficulties in academic writing. These factors
include a limited lexical repertoire, lack of register awareness, infelicitous
word combinations, semantic misuse, sentence-initial positioning of adverbs
and transfer effects.
The final section briefly comments on the pedagogical implications of
these results, summarizes the major findings, and points the way forward to
further research in the area.
Part I
Academic vocabulary
‘Academic vocabulary’ is a term that is widely used in textbooks on English

for academic purposes and Second Language Acquisition (SLA) reference
books. Nevertheless, it can be understood in a variety of ways and used to
indicate different categories of vocabulary. In this section, my objectives are
to clarify the meaning of ‘academic vocabulary’ by critically examining its
many uses, and to build a list of words that fit my own definition of the term.
Chapter 1 therefore tries to identify the key features of academic vocabu-
lary and to clear up the confusion between academic words and other
vocabulary. Chapter 2 proposes a data-driven methodology based on the
criteria of keyness, range and evenness of distribution, and uses this to build
a new list of potential academic words, viz. the Academic Keyword List (AKL).
This list is very different from Coxhead’s Academic Word List and has already
been used to inform the writing sections in the second edition of the
Macmillan English Dictionary for Advanced Learners (see Gilquin et al., 2007b).
The AKL is used in Section 2 to analyse EFL learners’ use of lexical devices
that perform rhetorical or organizational functions in academic writing.
Chapter 1
What is academic vocabulary?
Academic vocabulary is in fashion, as witnessed by the increasing number

of textbooks on the topic. Recent titles include Essential Academic Vocabulary:
Mastering the Complete Academic Word List (Huntley, 2006) and Academic Vocab-
ulary in Use (McCarthy and O’Dell, 2008). But what is academic vocabulary?
The term often refers to a set of lexical items that are not core words but
which are relatively frequent in academic texts. Examples of academic
words include adult, chemical, colleague, consist, contrast, equivalent, likewise,
parallel, transport and volunteer (cf. Coxhead, 2000). Unlike technical terms,
they appear in a large proportion of academic texts, regardless of the disci-
pline. Academic vocabulary is also sometimes used as a synonym for sub-
technical vocabulary (e.g. mouse, bug, nuclear, solution) or discourse-organizing
vocabulary (e.g. cause, compare, differ, feature, hypothetical, and identify). In this
chapter, I set out to review the many definitions of academic vocabulary
that have been given and to clear up the confusion between academic
words, core words, technical terms, sub-technical words and discourse-
organizing words. I will show why a definition of academic vocabulary that
excludes the top 2,000 words of English is not very useful for productive
purposes in higher education settings and argue for a function-based
definition of the term. The very existence of academic words has recently
been challenged by several researchers in English for Specific Purposes
(ESP) who advocate that teachers help students develop a more restricted,
discipline-specific lexical repertoire. I will round off this chapter by situat-
ing the book in ongoing debates over generality vs. disciplinary specificity
in teaching vocabulary for academic purposes.
1.1. Academic vocabulary vs. core vocabulary

and technical terms
Numerous second language acquisition studies have investigated whether

there is a threshold which marks the point at which vocabulary knowledge
becomes sufficient for adequate reading comprehension. Laufer (1989;

1992) has shown that at least 95 per cent coverage is needed to ensure
reasonable comprehension of a text. To achieve this coverage, it is com-
monly believed that students in higher education settings need to master
three lists of vocabulary: a core vocabulary of 2,000 high-frequency
words, plus some academic words, and technical terms. Some researchers,
however, do not agree that vocabulary categories can be described as if they
were clearly separable. In this section, the notions of core vocabulary,
academic vocabulary and technical terms are described and illustrated. The
criticisms levelled at the division of vocabulary into mutually exclusive lists
are then reviewed.
1.1.1. Core vocabulary

A core (or basic or nuclear) vocabulary consists of words that are of high
frequency in most uses of the language. It comprises the most useful func-
tion words (e.g. a, about, be, by, do, he, I, some and to) and content words like
bag, lesson, person, put and suggest. Stubbs describes nuclear words as an
essential common core of ‘pragmatically neutral words’ (1986: 104) and
lists five main reasons for their pragmatic neutrality:
1. Nuclear words have a ‘purely conceptual, cognitive, logical or proposi-

tional meaning, with no necessary attitudinal, emotional or evaluative
connotations’ (ibid.).
2. They have no cultural or geographical associations.
3. They give no indication of the field of discourse from which a text is
taken, i.e. its domain of experience and social settings.
4. They are also neutral with respect to tenor and mode of discourse: they
are not restricted to formal or informal usage or to a specific medium of
communication, e.g. written or spoken language.
5. They are used in preference to non-nuclear words in summarizing
tasks.
The best-known list of core words is West’s (1953) General Service List of
English Words (GSL),1 which was created from a five-million word corpus of
written English and contains around 2,000 word families. Percentage figures
are given for different word meanings and parts of speech of each head-
word. In a variety of studies, the GSL provided coverage of up to 92 per cent
of fiction texts (e.g. Hirsh and Nation, 1992), and up to 76 per cent of
academic texts (Coxhead, 2000). Next to frequency and coverage, other
What is academic vocabulary? 11
criteria such as learning ease, necessity and style were also used in making
the selection (West 1953: ix–x). West also wanted the list to include words
that are often used in the classroom or that would be useful for understand-
ing definitions of vocabulary outside the list. The GSL has had a wide influ-
ence for many years and served as a resource for writing graded readers and
other material.
A number of criticisms have, however, been levelled at the GSL, most
particularly at its coverage and age. Engels (1968) criticized the low cover-
age of the second 1,000 word families. While the first 1,000 word families
covered between 68 and 74 per cent of the words in the ten texts of 1,000
running words he analysed, the second set of word families in the GSL
provided coverage of less than 10 per cent. In addition, because of changes
in the English language and culture, the GSL includes many words that are
considered to be of limited utility today (e.g. crown, coal, ornament and vessel)
but does not contain very common words such as computer, astronaut and
television (see Nation and Hwang, 1995: 35–6; Leech et al., 2001: ix–x; Carter,
1998: 207). However, several researchers have pointed out that, for educa-
tional purposes, it still remains the best of the available lists because of ‘its
information on frequency of each word’s various meanings, and West’s
careful application of criteria other than frequency and range’ (Nation and
Waring 1997:13).
1.1.2. Academic vocabulary

A number of academic word lists have been compiled to meet the specific
vocabulary needs of students in higher education settings (e.g. Campion
and Elley, 1971; Praninskas, 1972; Lynn, 1973; Ghadessy, 1979; Xue and
Nation, 1984). The Academic Word List (Coxhead, 2000) is the most widely
used today in language teaching, testing and the development of pedagogi-
cal material. It is now included in vocabulary textbooks (e.g. Schmitt and
Schmitt, 2005; Huntley, 2006), vocabulary tests (e.g. Schmitt et al., 2001),
computer-assisted language learning (CALL) materials, and dictionaries
(e.g. Major, 2006).
The Academic Word List (AWL) was created from a corpus of 414 academic
texts by more than 400 authors and totals around 3.5 million words.
The Academic Corpus includes journal articles, chapters from university
textbooks and laboratory manuals. It is divided into four sub-corpora of
approximately 875,000 words representing broad academic disciplines:
arts, commerce, law and science. Each sub-corpus is further subdivided
into seven subject areas as shown in Table 1.1.
Table 1.1 Composition of the Academic Corpus (Coxhead 2000: 220)
Running words Texts Subject areas
Arts 883,214 122 education; history; psychology; politics; psychol-

ogy; sociology
Commerce 879,547 107 accounting; economics; finance; industrial rela-
tions; management; marketing; public policy
Law 874,723 72 constitutional law; criminal law; family law and
medico-legal; international law; pure commer-
cial law; quasi-commercial law; rights and
remedies
Science 875,846 113 biology; chemistry; computer science; geography;
geology; mathematics; physics
Total 3,513,330 414
Like the General Service List, the Academic Word List is made up of word
families. Each family consists of a headword and its closely related affixed
forms according to Level 6 of Bauer and Nation’s (1993) scale, which
includes all the inflections and the most frequent and productive deriva-
tional affixes. For example, the words presumably, presume, presumed, presumes,
presuming, presumption, presumptions and presumptuous are all members of the
same family.
Coxhead (2000) selected word families to be included in the AWL on the
basis of three criteria:
1. Specialized occurrence: a word family could not be in the first 2,000

most frequent words of English as listed in West’s (1953) General Service
List.
2. Range: a word family had to occur in all four academic disciplines with
a frequency of at least 10 in each sub-corpus and in 15 or more of the
28 subject areas.
3. Frequency: a word family had to occur at least 100 times in the Academic
Corpus.
The resulting list consists of 570 word families and covers at least
8.5 per cent of the running words in academic texts. By contrast, it accounts
for a very small percentage of words in other types of texts such as novels,
suggesting that the AWL’s word families are closely associated with academic
writing (Coxhead, 2000: 225). It is divided into 10 sublists ordered accord-
ing to decreasing word-family frequency. Some of the most frequent
word families included in Sublist 1 are headed by the word forms analyse,
benefit, context, environment, formula, issue, labour, research, significant and
vary. Examples of the least frequent word families in Sublist 10 are assemble,
colleague, depress, enormous, likewise, persist and undergo.
Academic words are likely to be problematic for native as well as non-
native students as a large proportion of them are Graeco-Latin in origin
and refer to abstract ideas and processes, thus introducing additional prop-
ositional density to a text (cf. Corson, 1997). Scarcella and Zimmerman
(2005: 127) have also shown that mastery of derivative forms makes aca-
demic words particularly difficult for foreign language learners who often
fail to analyse the different parts of complex words.
1.1.3. Technical terms

Domain-specific or technical terms are words whose meaning requires
scientific knowledge. They are typically characterized by semantic special-
ization, resistance to semantic change and absence of exact synonyms
(cf. Mudraya, 2006: 238–9). As explained by Nation (2001: 203), some prac-
titioners consider that it is not the English teacher’s job to teach technical
terms. These words are best learned through the study of the body of
knowledge that they are attached to. Language teachers are not specialists
in chemistry, computer science, law or economics and may have a great
deal of difficulty with technical words. By contrast, learners who specialize
in the field may have little difficulty in understanding these words (Strevens,
1973: 228).
Since technical terms are highly subject-specific, it is possible to identify
them on the basis of their frequencies of occurrence, range and distribu-
tion (see Section 2.3) and to use them as a way of characterizing text types
(Yang, 1986). Technical terms occur with very high or at least moderate
frequency within a very limited range of texts (Nation and Hwang, 1995). In
biology, for example, we find words such as alleles, genotype, chromatid, cyto-
plasm and abiotic. These words are very unlikely to occur in texts from other
disciplines or subject areas. Technical vocabulary is difficult to quantify.
According to Coxhead and Nation (2001), technical dictionaries contain
probably 1,000 headwords or less per subject area. Research suggests that
knowledge of domain-specific or technical terms allows learners to under-
stand an additional 5 per cent of academic texts in a specific discipline.
1.1.4. Fuzzy vocabulary categories

Although core words, academic words and technical terms are described
as if they were clearly separable, the boundaries between them are fuzzy
(cf. Yang, 1986; Mudraya, 2006; Beheydt, 2005). As Nation and Hwang
remark, ‘any division is based on an arbitrary decision on what numbers
represent high, moderate or low frequency, or wide or narrow range,
because vocabulary frequency, coverage and range figures for any text or
group of texts occur along a continuum’ (1995: 37). Chung and Nation
(2003) investigate what kinds of words make up technical vocabulary in
anatomy and applied linguistics texts. They classify technical terms on a
four-level scale designed to measure the strength of the relationship of a
word to a particular specialized field. Results for vocabulary in anatomy
texts are given in Table 1.2. Chung and Nation consider items at Steps 3
and 4 to be technical terms, but not items at Steps 1 and 2. A large pro-
portion of technical words belong to the 2,000 most frequent word
families of English as given in the GSL or to the AWL. In the anatomy texts,
16.3 per cent of the word types at Step 3 are from the GSL or AWL (e.g. cage,
chest, neck, shoulder). This increases to 50.5 per cent in the applied linguistics
texts (e.g. acquisition, input, interaction, meaning, review). A major result of
this study is that a word can only be described as general service, academic
or technical in context.
Table 1.2 Chung and Nation’s (2003: 105) rating scale for finding technical
terms, as applied to the field of anatomy
Step 1
Words such as function words that have a meaning that has no particular relationship with
the field of anatomy, that is, words independent of the subject matter. Examples are: the, is,
between, it, by, adjacent, amounts, common, commonly, directly, constantly, early and especially
Step 2
Words that have a meaning that is minimally related to the field of anatomy in that they
describe the positions, movements, or features of the body. Examples are: superior, part,
forms, pairs, structures, surrounds, supports, associated, lodges, protects.
Step 3
Words that have a meaning that is closely related to the field of anatomy. They refer to
parts, structures and functions of the body, such as the regions of the body and systems of
the body. Such words are also used in general language. The words may have some
restrictions of usage depending on the subject field. Examples are: chest, trunk, neck,
abdomen, ribs, breast, cage, cavity, shoulder, skin, muscles, wall, heart, lungs, organs, liver, bony,
abdominal, breathing. Words in this category may be technical terms in a specific field like
anatomy and yet may occur with the same meaning in other fields where they are not
technical terms.
Step 4
Words that have a specific meaning to the field of anatomy and are not likely to be used
in general language. They refer to structures and functions of the body. These words have
clear restrictions of usage depending on the subject field. Examples are: thorax, sternum,
costal, vertebrae, pectoral, fascia, trachea, mammary, periosteum, hematopoietic, pectoralis, viscera,
intervertebral, demifacets, pedicle.
Similarly, it has been shown that the GSL contains words that appear with
particularly high range and frequency in academic texts (e.g. example, reason,
argument, result, use, find, show) (cf. Martínez et al., 2009: 192). These words
may be used differently in academic discourse. For example, Partington
(1998: 98) has shown that a claim in academic or argumentative texts is not
the same as in news reporting or a legal report. On the other hand, the AWL
includes words that are extremely common outside academia (e.g. adult,
drama, sex, tape) (Paquot, 2007a). Hancioğlu et al. argue that ‘the assump-
tion that any high frequency word outside the GSL coverage in the academic
corpus would be a de facto academic item perhaps accounts for the
distinctly “un-academic” texture of some of the items on the list’ (Hancioğlu
et al., 2008: 462). They also comment that the fact that ‘items such as study
appear in the GSL (but not in the AWL) and items such as drama in the AWL
(but not in the GSL), suggests that the division of vocabulary into mutually
exclusive lists is likely to be an activity that for all its initial convenience may
prove inherently problematic in the long run’ (ibid.: 463).
Originating from research on vocabulary needs for reading comprehen-
sion and text coverage, the division between core words and academic
words is very practical for assessing text difficulty and targeting words that
are worthy of explanation when reading an academic text in the classroom.
Most English for Academic Purposes (EAP) students recognize core words
but are not familiar with the meaning of academic words such as amend,
concept, implement, normalize, panel, policy, principle and rationalize, which are
not very common in everyday English. These words are, however, relatively
frequent in academic texts and students will most probably encounter them
quite often while reading. They should therefore be the focus of an aca-
demic reading course.
The division of vocabulary into three mutually exclusive lists becomes
problematic, however, when it is transposed to academic writing courses
and the need arises to distinguish between knowing a word for receptive
and productive purposes. As early as 1937, West argued that ‘both as regards
Selection and still more as regards detailed Itemization, there is a need of a
divorce between receptive and productive work’ (West, 1937: 437) and
regretted that teachers were giving
composite lessons aiming at teaching reading and speaking simultane-

ously, whereas reading and speaking are the Hare and the Tortoise.
Reading and speech bear the same relation to each other as musical
appreciation and actual execution on the piano. The one is Recognition of
a lot; the other is Skill in using a little. (ibid.)
Learning vocabulary for productive purposes has been found to be much

more difficult than learning for receptive uses. Knowing a word produc-
tively involves, for example, being able to pronounce and/or spell it
correctly, produce it to express the intended meaning in the appropriate
context, and use it with words that commonly occur with it (Nation, 2001:
27–8). Selection is thus a key issue in teaching vocabulary for academic
writing and speaking. It is questionable whether all the words from the
AWL should be the focus of productive learning. And yet this strategy lies at
the heart of several recent textbooks (e.g. Schmitt and Schmitt, 2005;
Huntley, 2006) and CALL materials (see, e.g., Gillett’s website about vocab-
ulary in EAP < http://www.uefap.com/vocab/vocfram.htm>; Luton’s
Exercises for the Academic Word List < http://www.academicvocabularyex-
ercises.com> and Haywood’s AWL Gapmaker <)
Several scholars have suggested replacing separate lists of general service
words, academic vocabulary and technical terms by a single list, either a
more specialized list or a larger common core vocabulary. Ward (1999), for
example, built an engineering word list of 2,000 word families which con-
tains both technical terms and all the general words necessary for reading
comprehension and shows that it provides 95 per cent coverage of many
basic engineering texts (see also Mudraya, 2006). Others, by contrast, have
tried to revise the General Service List, to ensure maximum utility for
any learner, regardless of specialization. Billuroğlu and Neufeld (2007)
combined into one list all the words from: (1) the GSL, (2) the AWL,
(3) the first 2,000 words of the Brown corpus, (4) the first 5,000 words of
the British National Corpus, (5) the revised version of the GSL, (6) the
Longman Wordwise of commonly used words and (7) the Longman
Defining Vocabulary. The resulting Billuroğlu-Neufeld-List (BNL) consists
of 2,709 word families categorized according to the number of lists in which
they were represented. This procedure led to the emergence of only 176
word families that were not in either the GSL or the AWL, thus confirming
that ‘if the GSL was enlarged by even a relatively small degree, [. . . ] much
of the AWL would be absorbed into it’ (Hancioğlu et al., 2008: 466).
See Stein (2008) for a similar approach.
A final criticism that can be levelled at the AWL is related to the notion of
a word family. The AWL, as well as most word lists for learners of English,
groups words into families. Other examples include the GSL, the University
Word List (Xue and Nation, 1984) and recent domain-specific lists such as
those developed by Ward (1999) and Mudraya (2006). Coxhead (2000:
218) argues that this practice is supported by psycholinguistic evidence
suggesting that morphological relations between words are represented in
Table 1.3 Word families in the AWL
link proceed issue evident item stress utilize
linkage procedural issued evidenced itemisation stressed utilisation

linkages procedure issues evidence itemise stresses utilised
linked procedures issuing evidential itemised stressful utilises
linking proceeded evidently itemises stressing utilising
links proceeding itemising unstressed utiliser
proceedings items utilisers
proceeds utility
utilities
utilization
utilize
utilized
utilizes
utilizing
the mental lexicon. This may well be true and may justify the use of word
families for receptive purposes. However, not all members of a word family
are likely to be equally helpful in academic writing. For example, under
the headword item, which has a relative frequency of 134.29 occurrences
per million words in the academic part of the British National Corpus
(see Section 3.3), we find the noun itemisation and word forms of the verb
itemise. However, these two lemmas are quite rare in academic writing,
with relative frequencies of 0.06 and 1.17 occurrences per million words
respectively. A related problem is that parts-of-speech are not differenti-
ated. Table 1.3 shows several word families taken from the AWL: the only
information provided is that the words in italics are the most frequent form
of their family. This, however, does not tell us whether the word forms issue
and issues (under the headword issue) are more often used as nouns or
verbs in EAP.
1.2. Academic vocabulary and sub-technical vocabulary
Like Coxhead (2000), Nation (2001: 187–96) uses the term ‘academic vocab-
ulary’ to refer to words that are not in the top 2,000 words of English but
which occur reasonably frequently in a wide range of academic texts. Unlike
Coxhead, however, he also uses it to label a whole set of lexical items also
known as ‘sub-technical vocabulary’ (Cowan 1974; Yang, 1986; Baker, 1988;
Mudraya, 2006), ‘semi-technical vocabulary’ (Farrell, 1990), ‘non-technical
terms’ (Goodman and Payne, 1981), and ‘specialised non-technical lexis’
(Cohen et al, 1988). However, all these terms have been used quite differ-
ently in the literature. Cowan defines sub-technical vocabulary as ‘context
independent words which occur with high frequency across disciplines’ and
comments that,
Clearly some of what I am calling sub-technical vocabulary would be

encompassed in the existing word frequency counts like Thorndike Lorge,
Michael West’s General Service List and the recent one million word com-
puter analysis by Henry Kučera and Nelson Francis. (Cowan, 1974: 391)
Cowan’s definition of sub-technical vocabulary applies to those words that

have the same meaning in several disciplines. Trimble (1985) extends
Cowan’s (1974) usage to include ‘those words that have one or more
“general” English meanings and which in technical contexts take on extended
meanings’ (Trimble 1985: 129). Trimble’s definition thus encompasses words
such as junction, circuit, wage and cage that would be categorized as technical
terms according to Chung and Nation’s (2003) four-level rating scale of tech-
nicality or field-specificity (see Table 1.2) (see also Farrell, 1990: 37).
Cohen et al. (1988) regard the extended meanings of what they call
‘non-technical’ words as a major area of difficulty for non-native readers
who may only be aware of one of their meanings. In biology, for example,
the adjective specific may also be used with reference to the genetic notion
of specificity, which is a characteristic of enzymes. A second area of difficulty
arises because non-technical words may be used in contextual paraphrases
to refer to the same concept (e.g. repair processes and repair mechanism
in a genetics text), thus causing problems of lexical cohesion at the level of
synonymy. Cohen et al. (1988) identify a subset of non-technical vocabulary
as a third area of difficulty, viz. ‘specialized non-technical lexis’. They do
not offer a precise definition of the term, but explain that this lexis includes
vocabulary items indicating, for example, time sequence, measurement, or
truth validity. They show that a large proportion of vocabulary items which
indicate time sequence or frequency in a genetics text are unknown to their
informants (e.g. ensuing, alternatively, consecutively, intermittently, subsequent
and successive).
In Li and Pemberton’s (1994) view, sub-technical vocabulary as defined
by Trimble (1985) is an important subset of academic vocabulary. They
showed that first-year computer science students are better able to recog-
nize the technical meanings of sub-technical words than their non-technical
meanings. For example, they are quite familiar with the technical meaning
of the verb compile in computer science and tend to interpret it as ‘convert
or translate a language into a machine code’ or ‘translate’ regardless of the

context in which the word occurs. This is problematic as the non-technical
meaning of a sub-technical word is often more common than its technical
meaning (see Mudraya, 2006). For example, the word solution is more fre-
quently used in its non-technical sense in engineering textbooks, even in a
chemical engineering thermodynamics textbook.
Baker (1988) has argued that this middle area between core and techni-
cal vocabulary is itself made up of several different types of vocabulary:
1. Items which express notions shared by all or several specialized

disciplines. Examples include factor, method and function.
2. Items which have a specialized meaning in a particular field, in addition
to a different meaning in general language (e.g. bug in computer
science, solution in mathematics and chemistry).
3. Items which are not used in general language but which have different
technical meanings in different disciplines (e.g. morphological in linguis-
tics, botany and biology).
4. General language items which have restricted meanings in one or more
disciplines. In botany, ‘genes which are expressed have observable effects,
i.e. are more apparent physically, as opposed to being masked. Expressed in
botany is therefore not associated with emotional or verbal behaviour as
is the case in general language’ (Baker, 1988: 92).
5. General language items which are used, in preference to other semanti-
cally equivalent items, to describe or comment on technical processes
and functions. For example, an examination of biology textbooks showed
that photosynthesis does not happen but takes place or occasionally occurs.
Baker thus comments that take place and occur can be regarded as sub-
technical words.
6. Items which are used in academic texts to perform specific rhetorical
functions. These are ‘items which signal the writer’s intentions or his
evaluation of the material presented’ (Baker, 1988: 92).
Martin uses the term academic vocabulary as a synonym for sub-technical

vocabulary to refer to words that ‘have in common a focus on research,
analysis and evaluation – those activities which characterize academic work’
(1976: 92). The vocabulary of the research process consists primarily of
verbs, nouns and their co-occurrences (e.g. state the hypothesis and expected
results; present the methodology; plan or design the experiment; develop a model).
The vocabulary of analysis includes high-frequency verbs and two-word
verbs that are ‘often overlooked in teaching English to foreign students but
which graduate students need in order to present information in an

organized sequence’ (ibid: 93), e.g. consist of, group, result from, derive,
bring about, cause, base on, be noted for. Adjectives and adverbs make up a large
proportion of the vocabulary of evaluation.
In summary, the many definitions of sub-technical vocabulary proposed
in the literature cover very different sets of lexical items, which are of
various sizes and may share certain characteristics. Sub-technical vocabu-
lary is generally defined as a category of words which are frequent across
disciplines and account for a significant proportion of word tokens in
academic texts. Farrell (1990), for example, found that out of 508 lemmas
occurring more than five times in a corpus of electronic texts, 44 per cent
were sub-technical. Definitions of sub-technical vocabulary also differ
widely, referring to words that take on extended meanings in specific
academic disciplines (Trimble, 1985), or to words that allow scholars to
conduct research, analyse data and evaluate results (Martin 1976). Baker
(1988) uses the term as a broad category for different types of lexical sets
including both Trimble’s (1985) sub-technical vocabulary and Martin’s
(1976) academic vocabulary.
Figure 1.1 shows that the various definitions of sub-technical vocabulary
and academic vocabulary as defined in Section 1.1.2 partially overlap.
Coxhead’s (2000) Academic Word List includes a large proportion of the
words that take on extended meanings in specialised fields (cf. Trimble’s
definition of ‘sub-technical vocabulary’). For example, according to the
Oxford English Dictionary (OED), the adjective nuclear has extended senses in
astronomy, biology, medicine, psychoanalysis, sociology, linguistics and
phonetics; the verb enable has a specialized meaning in computer science
(‘to make (a device) operational; to turn on’). The noun error refers to ‘the
quantity by which a result obtained by observation or by approximate
calculation differs from an accurate determination’ in mathematics. The
AWL also contains several sub-technical words as defined by Martin (1976)
(e.g. hypothesis, significant, method, function) but a large number of them do
not fall within Coxhead’s definition of academic vocabulary. Many of these
are general service words (e.g. cause, develop, group, model, plan, result). The
same is true of Baker’s (1988) category of words that perform rhetorical
functions: case, cause, compare, describe, explanation, observe, report, and study
are among the top 2,000 most frequent words of English. This category will
be the focus of the next section as it is itself made up of various sets of
lexical items and Baker (1988) suggested that it is the most difficult type of
sub-technical vocabulary to teach and acquire.
Baker's (1988) sub-technical vocabulary
Coxhead's (2000) academic vocabulary Martin's (1976)

academic vocabulary
cause, develop,
psychology hypothesis group, model,
appropriate interesting, show,
significant, experiment,
colleague method remarkable,
consist result, plan,
nevertheless factor, present, observe
function explanation
enormous derive increase, case
result, study
thereby
compile base
briefly
fast
welfare transport, mouse
journal, dog
civil, nuclear, bug
hence decade, text solution
error 'expressed' (genes)
widespread enable 'masked' (genes)
participant
morphological
Trimble's (1985) sub-technical vocabulary
Figure 1.1 The relationship between academic and sub-technical vocabulary

1.3. Vocabulary and the organization of academic texts
Baker (1988) gave the following examples of sub-technical words that are
used to perform rhetorical functions: ‘One explanation is that…’; ‘Others
have said …’; and ‘It has been pointed out by . . . that . . .’. These words bear
a strong relationship with what Winter (1977) called ‘Vocabulary 3 items’
and Widdowson (1983) ‘procedural vocabulary’. Winter (1977: 14–23) dis-
tinguished between three types of words that are commonly used to create
cohesion or structure in discourse and that are essential to the understand-
ing of clause relations. Each group is distinguished by its clause-relating
functions. Vocabulary 1 consists of ‘subordinators’ which either connect
clauses together (e.g. although, as far as, except that, unless, whereas) or embed
one clause within another (e.g. not so much . . . as . . .; not . . . let alone . . .).
Vocabulary 2 comprises ‘sentence connectors’ which ‘make explicit the
clause relation between the matrix clause and the preceding clause or sen-
tence’ (Winter, 1977: 15), e.g. therefore, anyway, hence, for example, indeed, thus,
that is to say, in other words. Vocabulary 3 items serve to establish semantic
relations in the connection of clauses or sentences in discourse. They
behave grammatically like subject, verb, object or complement and can be
pre- or post-modified. Examples include addition, affirm, alike, analogous,
cause, compare, connect, consequence, contradict, differ, explanation, feature, hypo-
thetical, identify, method, reason, result, specify and subsequent. These words ‘may
be used to make the relation explicit by saying what the relation is’ (Winter,
1977: 22). As such, they are part of what Widdowson (1983) called ‘proce-
dural vocabulary’, i.e. highly context-dependent items with very little lexical
content which serve to do things with the content-bearing words and draw
attention to the function that a stretch of discourse is performing (see also
Harris, 1997; Luzón Marco, 1999).
Vocabulary 3 items include a large proportion of nouns that are inher-
ently unspecific and require lexical realization in their co-text, either
beforehand or afterwards. Francis (1994) refers to this type of lexical cohe-
sion as ‘advance’ and ‘retrospective labelling’: labels2 allow the reader to
predict the precise information that will follow when they occur before
their lexical realization and they encapsulate and package a stretch of dis-
course when they occur after their realization (e.g. approach, area, aspect,
case, matter, move, problem, and way). Labels have traditionally been described
as content words. However, when we encounter them in a text, we often
need to do ‘something similar to what we do when we encounter words like
it, he and do in texts: we either refer to the bank of knowledge built up with
the author, look back in the text to find a suitable referent, or forward,
anticipating that the writer will supply the missing content’ (Carter and
McCarthy, 1988: 206–7). As explained by McCarthy, ‘the language learner
who has trouble with such words may be disadvantaged in the struggle to
decode the whole text as efficiently as possible and as closely as possible to
the author’s designs’ (McCarthy, 1991: 76).
Within the category of labels, Francis identified a set of nouns which are
‘metalinguistic in the sense that they label a stretch of discourse as being a
particular type of language’ (1994: 89). Metalinguistic labels are of four
types, although there is some overlap between them:
1. Illocutionary nouns are nominalizations of verbal processes, e.g. advice,

answer, argument, assertion, claim, observation, recommendation,
remark, reply, response, statement, suggestion.
2. Language-activity nouns refer to language activities and the results
thereof, e.g. comparison, contrast, definition, description, detail, exam-
ple, illustration, instance, proof, reasoning, reference, summary, etc.
3. Mental process nouns refer to cognitive states and processes and the
results thereof, e.g. analysis, assumption, attitude, belief, concept,
conviction, finding, hypothesis, idea, insight, interpretation, opinion,
position, theory, thesis, view, etc.
4. Text nouns refer to the formal textual structure of discourse, e.g. phrase,
words, quotation, excerpt, section, term, etc.
As pointed out by Nation, the strength of labels as discourse organizing

vocabulary is that ‘they have a referential function and variable meaning
like pronouns but, unlike pronouns, they can be modified by demonstra-
tive pronouns, numbers, and adjectives, they can occur in various parts of a
sentence and they have a significant constant meaning’ (2001: 212). As well
as representing text segments, labels ‘additionally give us indications of
the larger text-patterns the author has chosen, and build up expectations
concerning the shape of the whole discourse’ (McCarthy, 1991: 76). The
following words typically cluster round the elements of problem-solution
patterns: concern, difficulty, dilemma, hinder, obstacle, respond, consequence, effect
and result (see also Jordan, 1984; Hoey, 1993; 1994; Flowerdew, 2008; Nation,
2001: 211). Labels not only cluster around elements of macro patterns, they
are also characterized by their specific collocational environment as shown
by Francis:
there is a tendency for the selection of a label to be associated with

common collocations. Many labels are built into a fixed phrase or ‘idiom’
(in the widest sense of the word), representing a single choice. Frequent
collocations include, for example, ‘the move follows . . . ’, ‘. . . rejected/
denied the allegations’, ‘. . . to solve a problem’, and ‘. . . to reverse the
trend’, where the retrospective label is found in predictable company
(. . . ). Even where the collocations are less fixed, the label occurs in a
compatible lexical environment. (1994: 100–1)
More generally, Baker comments that sub-technical words which perform

specific rhetorical functions and structure the writer’s argument ‘should
not be taught in isolation but in context and as central elements in typical
collocations’ (Baker, 1988: 103).
Labels are not the only indicators of text patterns in academic discourse.
The claim-counterclaim pattern, for example, is often organized with verbs
such as assert and state, adjectives like false and likely, the preposition according
to and adverbs such as apparently and arguably. In Building Academic Vocabu-
lary, Zwier focused on lexical items that are ‘particularly useful in the kinds
of writing most common in EAP writing classes – general description,
description of processes (especially those involving changes), comparison/
contrast, and cause/effect’ (Zwier, 2002: xiii) and described the way in which
words such as consist of, comprise, parallel, alike, likewise, distinguish, raise, rise,
link, stem from, and yield are used to perform specific rhetorical functions in
academic discourse. He devoted particular attention to verbs because ‘accu-
rate verb use is especially difficult for academic writers’ (Zwier, 2002: xi) (see
also Swales and Feak, 2004). Similarly, Meyer (1997) commented that non-
technical words ‘provide a semantic-pragmatic skeleton for the text. They
determine the status of the (more or less technically phrased) propositions
that are laid down in it, and the relations between them’ (Meyer, 1997: 9).
For example, these words express temporal deixis (e.g. original, currently),
modality (e.g. may, obviously, likely), epistemic relations between the subject
matter and the scholar (e.g. indicate, seem), quantitative changes of entities
(e.g. increase, fluctuation), classifiers of entities (e.g. problem, method, theory,
characteristic), relations between entities (e.g. arising from, follow, since, involve),
scholarly speech acts (e.g. suggest, proposal, show, define), and textual deixis
(e.g. above, later) (ibid: 10–11). Meyer’s ‘non-technical vocabulary’ is there-
fore closely related to the notion of metadiscourse, i.e. ‘a specialized form of
discourse which allows writers to engage with and influence their inter-
locutors and assist them to interpret and evaluate the text in a way they
will see as credible and convincing’ (Hyland, 2005: 60).
As shown in the previous sections, the term ‘academic vocabulary’ has
been used extensively in the literature to refer to various sets of lexical
items. However, its very existence has recently been challenged by several
ESP researchers.
1.4. Is there an ‘academic vocabulary’?
In an article entitled ‘Is there an “academic vocabulary”?’, Hyland and

Tse questioned the widely held assumption that ‘a single inventory can
represent the vocabulary of academic discourse and so be valuable to all
students irrespective of their field of study’ (Hyland and Tse, 2007: 238).
They made use of Coxhead’s (2000) Academic Word List and showed that the
coverage of AWL items in a corpus of 3.3 million words from a range of
academic disciplines is not evenly distributed. The disciplines that make up
the corpus are biology, physics and computer science (sciences sub-corpus);
mechanical and electronic engineering (engineering sub-corpus), and
sociology, business studies and applied linguistics (social sciences sub-
corpus). Of the 570 AWL families, 534 (94%) have irregular distributions
across the sciences, engineering, and social sciences sub-corpora, with,
in many cases, a majority of the occurrences located in just one domain.
Of these, 227 (40%) have at least 60 per cent of all occurrences concen-
trated in just one sub-corpus. Overall, only 36 word families were found
to be relatively evenly distributed across the sub-corpora. By contrast,
78 families were extremely infrequent in one sub-corpus, 63 in two sub-
corpora and 6 in all three.
Hyland and Tse further argued that ‘all disciplines shape words for
their own uses’ (ibid: 240) as demonstrated by their clear preferences for
particular meanings and collocations. They gave the example of the word
process which is far more likely to be encountered as a noun by science and
engineering students than by social scientists. They also showed that the
verb analyse tends to refer to ‘methods of determining the constituent
parts or composition of a substance’ in engineering, while in the social sci-
ences it often simply means ‘considering something carefully’ (ibid: 244).
An investigation of a set of potential homographs in the AWL revealed a
considerable amount of semantic variation across fields. Science and engi-
neering students, for example, are very unlikely to come across the noun
volume in the meaning of ‘a book or journal series’ unless they are reading
book reviews. In addition, words may take on additional discipline-specific
meanings as a result of their regular co-occurrence with other items. The
noun strategy, for example, often appears in the multi-word unit marketing
strategy in business, learning strategy in applied linguistics and coping strategy
in sociology. The authors concluded that ‘By considering context, cotext,

and use, academic vocabulary becomes a chimera’ (ibid: 250).
These findings pose a tremendous challenge to the growing number
of students who enrol in interdisciplinary programmes and to English
teachers who are regularly faced with mixed groups of students, most nota-
bly in international EAP programmes (cf. Bhatia, 2002; Huckin, 2003;
Eldridge, 2008). Two decades ago, in an article on second language teach-
ing for academic achievement, Saville-Troike insisted that ‘vocabulary
knowledge is the single most important area of second language compe-
tence when learning content through that language is the dependent
variable’ (1984: 199). EAP courses need to ensure that sufficient attention
is given to vocabulary development (cf. Sutarsyah et al, 1994: 37). That
being the case, if academic vocabulary is a chimera, the problem is to deter-
mine what words EAP tutors should teach a mixed groups of students.
Granger and Paquot (2009a) advocate a ‘happy medium’ approach
which concurs with Hyland and Tse’s rejection of approaches of EAP as ‘an
undifferentiated unitary mass’ (Hyland and Tse, 2007: 247) while also sub-
scribing to Eldridge’s claim that ‘though one function of research is to
unravel what distinguishes different fields and genres, another function is
to find similarities and generalities that will facilitate instruction in an
imperfect world’ (Eldridge, 2008: 111). This balanced approach aims
to reconcile research findings and the reality of EAP teaching practice.
An investigation of the verb analyse in a corpus of 1,701,351 words of
business, linguistics and medicine articles has shown that it is possible to
identify both the common core features of an academic word and its
discipline-specific characteristics in terms of meaning, lexico-grammar and
phraseological patterns (Granger and Paquot, 2009a). Wang and Nation
commented that ‘learners should be encouraged to look for the central
concept behind a variety of uses’ (2004: 310). As regards the verb analyse,
this central concept can be defined as ‘to examine data using specific
methods or tools in order to make sense of it’, with the ‘data’ and the
‘methods or tools’ varying across academic fields. It is only by invoking
more general definitions of this type that EAP tutors will help L2 learners
deal with the various uses of verbs that they may come across even within a
single discipline. In linguistics, for example, analyse is also often used in the
sense of carrying a statistical analysis, submitting data to computer-aided
analyses or distinguishing the constituents of a word, phrase or sentence.
The lexico-grammatical environment of the verb will help differentiate its
‘distinct (though not unconnected)’ (Hoey, 2005: 105) senses (see also
Sinclair, 1987; 1991).
1.5. Summary and conclusion
There have been several studies that have investigated the vocabulary
needed for academic study. Some of them have assumed that learners
already knew the 2,000 most frequent words of English and looked at aca-
demic texts to see what words not in the core vocabulary occur frequently
across a range of academic disciplines. The Academic Word List (Coxhead,
2000) consists of 570 word families that are not in West’s (1953) General
Service List but which have wide range and occur reasonably frequently in a
3,500,000 word corpus of academic texts. This list is very useful for students
entering university, as well as being an excellent resource for preparing for
the reading test in International English Certificates such as TOEFL and
IELTS. It proves helpful in setting feasible learning goals and assessing
vocabulary learning.
Defining academic vocabulary in opposition to core words, however, is of
limited use when the role words play in academic discourse is examined. As
shown by Martínez et al. (2009: 192), the verbs show, find and report are not
presented as academic words because they are part of the GSL. They
perform, however, the same rhetorical function of reporting research as
establish, conclude, and demonstrate and are often more frequent than these
three AWL verbs in academic texts. These GSL verbs therefore also deserve
careful attention in the academic writing classroom.
I agree with Hancioğ lu and her colleagues that EAP practitioners
should ‘avoid taking the GSL as any kind of “given” in the compilation of
more specialized wordlists’ (Hancioğlu et al. 2008: 464). I do not, however,
subscribe to the idea according to which we ‘should seriously consider put-
ting aside the idea of a distinct discrete-item Academic Word List’ (Hancioğlu
et al. 2008: 468). The construct of academic vocabulary remains a useful
one which is, nevertheless, in need of a more precise definition (cf. Beheydt,
2005). That definition should rely on the work of researchers such as
Martin (1976) and Meyer (1997) who focused on the nature and role of
words that occur across subject-oriented texts, irrespective of the disci-
plines. Martin (1976) discussed words that are useful instruments in the
description of activities that characterize academic work, that is, research,
analysis and evaluation. Meyer (1997) focused on words that provide a
semantic-pragmatic skeleton for academic texts and identified a number of
lexical subsets that fulfil important rhetorical and organizational functions
in academic discourse (e.g. expressing modality, textual deixis, scholarly
speech acts). More recently, Martínez et al. commented that academic
words should serve ‘to build the rhetoric of a text, providing words useful
for the construction of the argument of science’ (2009: 193). All in all,
it seems reasonable to argue that, for productive purposes, academic
vocabulary would be more usefully defined as a set of options to refer
to those activities that characterize academic work, organize scientific
discourse and build the rhetoric of academic texts.
The next step is to build a list of academic words according to this
definition and it remains to be seen whether this can be done automati-
cally. Academic words in their ‘functional’ sense should be useful to biolo-
gists, agronomists, physicists, historians, sociologists, lawyers, economists,
linguists and computer scientists writing in higher education settings but
not to novelists, poets or playwrights. Following Coxhead (2000), this lexi-
cal set should therefore be reasonably frequent in a wide range of academic
texts but relatively uncommon in other kinds of texts. There are, however,
two major differences between this proposal and Coxhead’s work:
– The 2,000 most frequent words of English may be part of a list of aca-
demic vocabulary;
– Words that are reasonably frequent in a wide range of academic texts but
relatively uncommon in other kinds of texts will not be granted the sta-
tus of academic words automatically. This frequency-based criterion is
not regarded as a defining property of academic words but as a way of
operationalizing a function-based definition of academic vocabulary.
In the next chapter, I will investigate whether academic words can be

automatically extracted from corpora. To weed out those words that are not
specific to academic texts, I will use a number of corpus linguistics tech-
niques, and more particularly, keyword analysis.
Chapter 2
A data-driven approach to the selection

of academic vocabulary
In this chapter, I describe the data-driven approach used to extract

potential academic words from corpora. The term ‘potential academic
words’ is used to refer to words that are reasonably frequent in a wide range
of academic texts but relatively uncommon in other kinds of texts and
which, as such, might be used to refer to those activities that characterize
academic work, organize scientific discourse and build the rhetoric of
academic texts, and so be granted the status of academic vocabulary. The
method used to extract potential academic words is based on Rayson’s
(2008) data-driven approach, which draws on both the ‘corpus-based’
and the ‘corpus-driven’ paradigms in corpus linguistics. Rayson (2008)
identified two general kinds of research question that can be investigated
using a corpus-based paradigm. The majority of corpus-based studies tend
to focus on a particular linguistic feature, possibly a word, lemma, multi-
word expression or a grammatical construction. They examine ‘linguistic
(lexical or grammatical associations of the feature), and non-linguistic
aspects (distribution of the feature across different types of texts or speech)’
(Rayson 2008: 520). Other corpus-based studies invert this relationship
and investigate the characteristics of whole texts or language varieties, by
examining how certain linguistic features appear in a text (e.g. Biber, 1988).
Common to all corpus-based studies is the prior selection of which linguis-
tic features to study.
Rayson proposed a different approach: ‘decisions on which linguistic
features are important or should be studied further are made on the basis
of information extracted from the data itself; in other words, it is data-
driven’ (2008: 521). This model is set out in five main steps:
1. Build: corpus design and compilation.

2. Annotate: manual or automatic analysis of the corpus.
3. Retrieve: quantitative and qualitative analyses of the corpus.

4. Question: devise a research question or model (iteration back to
Step 3).
5. Interpret: interpretation of the results or confirmation of the accuracy of
the model.
Studies that make use of the data-driven approach first focus on whole texts
(Step 3) and then refine the research question or suggest specific linguistic
features to study in further detail (Step 4). Research questions emerge from
iterative analyses of the corpus data.
The model bears some similarity to corpus-driven linguistics as pre-
sented by Tognini-Bonelli (2001: 85), in which the corpus is the main
informant. However, Rayson (2008) uses the term ‘data-driven’ to distin-
guish this approach from the corpus-driven paradigm. Corpus-driven
linguists question the ‘underlying assumptions behind many well estab-
lished theoretical positions’ (Tognini-Bonelli, 2001: 48), stating that
pre-corpus theories need to be re-examined in the light of evidence
from corpora. They also have strong objections to corpus annotation
(see McEnery et al., 2006: 9–10). Rayson’s (2008) data-driven method thus
combines elements of both the corpus-based and the corpus-driven
approaches. It relies on pre-existing part-of-speech tagsets but considers
corpus data as ‘the starting point of a path-finding expedition that will
allow linguists to uncover new grounds, new categories and formulate new
hypotheses on the basis of the patterns that were observed’ (De Cock,
2003: 197). The model is also testimony to the fact that the ‘distinction
between the corpus-based vs. corpus-driven approaches to language
studies is overstated. In particular the latter approach is best viewed as an
idealized extreme’ (McEnery et al., 2006: 8).
Following Rayson’s (2008) data-driven approach, I first detail the cor-
pora used (Step 1) and the type of annotation adopted (Step 2). I then
focus on the different steps undertaken to retrieve potential academic
words. The keyword procedure is first used to retrieve a set of words
which are distinctive of academic writing. The advantages and disadvan-
tages of a keyword list are discussed and the criteria of range and even-
ness of distribution are proposed to refine the list of potential academic
words (Steps 3 and 4). In the last part of the chapter, I give a description
of the final list of potential academic words, the Academic Keyword List,
and investigate whether its constituents fit my definition of academic
vocabulary. This provides a check on the accuracy of the retrieval proce-
dure (Step 5).
Selection of academic vocabulary 31
2.1. Corpora of academic writing
Corpus-based studies of vocabulary in academic discourse (e.g. Johansson,

1978; Coxhead, 2000; Mudraya, 2006) have principally considered book
sections, journal articles and textbooks. Academic writing, however, includes
other kinds of text than professionally edited articles and books, notably
student essays. As Nesi et al. (2004: 440) comment, ‘novice writers do not
(. . . ) begin by writing for publication, or for a readership of strangers.
Their early attempts at academic writing are more likely to be assessed texts
produced in the context of a course study.’ The automatic selection of
potential academic words for this study was therefore made on the basis of
an analysis of both professional and student writing.
The professional academic corpora used are the Micro-Concord Corpus
Collection B (MC) and the Baby BNC Academic Corpus (B-BNC). The corpora
contain about a million words of published academic prose each. The MC
comprises 33 book sections and the B-BNC is made up of 30 book sections
and extracts from scientific journals. Texts in the B-BNC were written by
British scholars while the MC also includes texts written by American research-
ers. As shown in Table 2.1, both corpora consist of five sections of about
200,000 words each, corresponding to five broad academic domains (e.g. arts,
social science, science). This division into ‘knowledge domains’ (Hyland,
2009: 62–5) is particularly well suited to extracting words that are used by all
members of the ‘academic discourse community’ (Swales, 1990).
Table 2.1 The corpora of professional academic writing
Corpus Variety of English Text type Number of words
MC mainly British books 1,005,060

Arts English 180,496
Belief and religion 199,612
Science 219,596
Applied science 203,316
Social science 202,302
B-BNC British English books and 1,021,007

Humanities periodicals 262,476
Politics, education and law 196,322
Social science 132,678
Science 283,490
Technology and engineering 146,041
TOTAL 2,026,067
Table 2.2 The re-categorization of data from the

professional corpus into knowledge domains
Corpus Number of words
ProfSS 1,173,886
MC Arts 180,496
MC Belief and religion 199,612
MC Social science 202,302
BNC Humanities 262,476
BNC Politics, education and law 196,322
BNC Social science 132,678
ProfHS 852,443
MC Science 219,596
MC Applied science 203,316
BNC Science 283,490
BNC Technology and engineering 146,041
For centuries the traditional dividing line in the history of academia has
been between the natural sciences and technology (‘hard sciences’), and
humanities and the social sciences (‘soft sciences’). It is across this dividing
line that ‘we tend to see the clearest discoursal variation and rhetorical
distinctiveness’ (Hyland, 2009: 63). There seem to be good reasons for
taking knowledge domains as the point of departure for identifying poten-
tial academic words. As shown in Table 2.2, two corpora were compiled
from the MC and the B-BNC: a corpus of professional ‘soft science’ (ProfSS)
and a corpus of professional ‘hard science’ (ProfHS).
Two corpora of student writing were also used: part of the Louvain Corpus
of Native Speaker Essays (LOCNESS) and a selection of texts from the British
Academic Written English (BAWE) Pilot Corpus. Together they constitute the
Student Writing Corpus.
LOCNESS totals 323,304 words and consists of argumentative and
literary essays written by British A-level students (60,209 words), British uni-
versity students (95,695 words) and American university students (168,400
words) (see Granger, 1996a; 1998a for further details). The part used for
this study consists of argumentative essays written by university students and
totals 168,593 words. Argumentative essay titles include, among others,
‘The death penalty’, ‘Euthanasia’, ‘Fox hunting’, ‘The National Lottery’,
‘Nuclear power’, ‘Crime does not pay’ and ‘Money is the root of all evil’.
The BAWE Pilot Corpus1 contains about one million words of proficient
assessed student writing, in the form of 500 assignments ranging from 1,000
to 5,000 words in length (Nesi et al., 2004). 27 per cent of the contributors
were not native speakers of English. Nesi et al. (2004: 444) comment that
‘the University of Warwick is a multicultural, multilingual environment,
and in their departments students are assessed on merit, without regard for
their language background’, and add that ‘all contributors are proficient
users of English, given that their assignments have been awarded high
grades’. For the purpose of identifying potential academic words, I decided
to only make use of assignments written by British students as Hinkel
(2003) has shown that even English as a Second Language (ESL) students
‘continue to have a restricted repertoire of syntactic and lexical features
common in the written academic genre’ (Hinkel, 2003: 1066).
Unlike the final BAWE corpus2, disciplines are not equally represented in
the pilot corpus and the majority of student assignments come from the
humanities and social sciences. As shown in Table 2.3, the texts were
grouped into four sub-corpora which represent a discipline or a set of
disciplines. The ‘Language studies’ sub-corpus consists of essays produced
for courses in English studies, French studies, Italian studies, theatre, and
literature. Texts in business, law, politics, sociology and economics were
grouped together as social sciences as there were not enough texts per
discipline to build separate corpora. Essay topics in the BAWE pilot corpus
are very diverse and seldom repeated (see Table 2.4 for examples).
The Student Writing Corpus is thus quite representative of university
students’ writing in that it comprises different types of writing tasks (skills-
based writing and content-based writing; argumentative and expository
writing). It is, however, skewed towards humanities and social sciences.
It could be argued that the Academic Keyword List might therefore not fully
represent academic vocabulary used in the ‘hard sciences’. It will be seen in
Section 2.3 that the procedure used to extract potential academic words
largely overcomes this limitation.
Table 2.3 The corpora of student academic writing
Corpus Variety of English Text type Number of words
BAWE British English assignments 845,344

Language studies 221,841
Social sciences 163,300
Psychology 201,946
History 258,257
LOCNESS mainly American argumentative 168,593

English essays
Student Writing Corpus 1,013,937

Table 2.4 Examples of essay topics in the BAWE pilot corpus
Language studies – Visual arts in Britain

[98 essays] – Prince Arthur portrayed in books
– Rise of aestheticism
– Modes of writing essays
Social sciences – Housing policy

[64 essays] – Teachers as professionals
– Would you agree that subordination was inscribed in the life of domestic
servant?
Psychology – Clinical depression

[103 essays] – Psychology as a science
– Expressing attitude
– Is attention merely a matter of selection?
History – Absolutism in early modern Europe

[136 essays] – Why did America dominate the world film market by the 1920s?
– Who was to blame for the Boxer rising?
2.2. Corpus annotation
The Academic Word List (Coxhead, 2000) is a list of word forms that were
manually classified into 570 word families (cf. Section 1.1.2.). Most studies
of vocabulary in the field of English for specific purposes (ESP) are based
on raw corpora (e.g. Coxhead and Hirsh, 2007; Martínez et al., 2009;
Mudraya, 2006; Wang et al., 2008; Ward, 2009). None of them discuss issues
arising from the format of the corpus. However, any corpus-based study that
aims to identify a specific set of vocabulary items should consider the
advantages and disadvantages of annotating corpora.
2.2.1. Issues in annotating corpora

As Leech put it, ‘corpora are useful only if we can extract knowledge or
information from them. The fact is that to extract information from
a corpus, we often have to begin by building information in’ (1997: 4).
Corpus annotation refers to the practice of adding linguistic information to
an electronic corpus of language data. Various levels of annotation can be
distinguished, starting from the addition of lemma information to each
word in the corpus. A lemma is used to group together inflected forms of a
word, such as the singular and the plural forms of a noun, or the different
conjugated forms of a verb. A second type of annotation is the morphosyn-
tactic level of annotation, which concerns the labelling of the part-of-speech
(POS) or grammatical category of each word in the corpus. POS tagging is
the most popular kind of linguistic annotation applied to text. By providing

information about the grammatical nature of a word, it makes it possible
to extract information about its various meanings and uses. Thus, it
distinguishes between left as the past tense or past participle of leave
(‘I left early’), and left as a word meaning the opposite of right, either as an
adjective (‘my left hand’), an adverb (‘turn left’) or a noun (‘on your left’).
Other levels of annotation are syntactic annotation or parsing (the analysis
of sentences into their constituents), semantic annotation (the labelling of
semantic fields) and discourse tagging (the annotation of discourse rela-
tions within the texts). For more information on the different levels of
annotation, see McEnery et al. (2006: 33–43).
A number of criticisms have been directed at corpus annotation, notably
by distinguished contributors to corpus-driven linguistics. One of the most
widespread criticisms is that annotation reflects, at least to a certain extent,
some theoretical perspective. Although the sets of categories and features
used in annotating a corpus are generally chosen to be as uncontroversial
as possible, the interpretative nature of corpus annotation has been
perceived as a way of imposing pre-existing models of language on corpus
data (Tognini-Bonelli, 2001: 73–4). These models of language date from a
‘pre-corpus’ time and some of them derive from descriptions which ignore
empirical evidence altogether (Sinclair, 2004a: 52). The argument, although
valid, is certainly not strong enough to counterbalance all the advantages of
corpus annotation, but it should be taken as a warning against the naive
assumption that using annotating software is a neutral act.
Another argument against annotation is that it may introduce errors.
While it is inevitable that annotation systems will sometimes get things
wrong, the various levels of annotation distinguished above are performed
with varying degrees of accuracy. POS tagging is a well-researched kind of
linguistic annotation and taggers perform with very high levels of accuracy.
However discourse annotation systems, for example, are more recent and
still need to be substantially refined.
Although annotated data is often described as ‘enriched’ data (Leech
and Smith, 1999; Aarts, 2002; Bowker and Pearson, 2002), annotation
has also sometimes been criticized for resulting in a loss of information
(Sinclair, 1992; 2004a). The argument can be summarized as follows:
It could be argued that in a tagged text no information is lost because

the words of the text are still there and available, but the problem is that
they are bypassed in the normal use of a tagged text. The actual loss of
information takes place when, once the annotation of the corpus is
completed and the tagsets are attached to the data, the linguist processes
the tags rather than the raw data. By doing this the linguist will easily lose
sight of the contextual features associated with a certain item and will
accept single, uni-functional items —tags — as the primary data. What is
lost, therefore, is the ability to analyse the inherent variability of language
which is realised in the very tight interconnection between lexical and
grammatical patterns. This is the price paid for simplification; a process
that is so useful – but it is argued here that the interconnection between
lexis and grammar is crucial in determining the meaning and function of
a given unit: any processing that loses out on this is bound to lose out in
accuracy. (Tognini-Bonelli 2001: 73–4)
The data-driven methodology adopted in this book aims to preserve the

best of both worlds, by first extracting potential academic words from
annotated corpora and then returning to raw data to analyse their use in
context.
Finally, it is worth stressing that this chapter does not attempt to meet a
theoretical objective. Rather it is content with an applied aim. Even the
linguists who have directed the most severe criticisms at annotated data
acknowledge that the ‘good point of annotation lies in its value in applica-
tions’ (Tognini-Bonelli, 2001: 73). The main objective of this chapter is to
select nouns, verbs, adjectives, adverbs and other function words that are
commonly used in academic texts. Part-of-speech tagged corpora will thus
facilitate the extraction of specific word classes. Elsewhere (Paquot, 2007b),
I have tested the extraction procedure described in this chapter on two
corpus formats, (word form + morphosyntactic tag, and lemma + morpho-
syntactic tag), and shown that using lemmatised corpora makes it possible
to identify some 31 per cent more lexical verbs that are typical of academic
texts than using unlemmatised corpora.. If lemmas are used, the different
inflectional forms of a verb (e.g. consist, consists, consisted, and consisting) are
merged and so a better frequency distribution for the lemma across texts is
obtained. Word forms of lexical items that have two alternative spellings
(e.g. analyse /analyze; characterise/characterize; centre/center; behaviour/behav-
ior) were lemmatized under the same headword (either the British variant
or the most frequently used option).
2.2.2. The software

The analysis was carried out using Wmatrix, a web-based corpus processing
environment which gives researchers access to several corpus annotation
and retrieval tools developed at the University Centre for Computer
Corpus Research on Language (UCREL) at Lancaster University. The tools
available in Wmatrix include the Constituent Likelihood Automatic Word-tagging

System (CLAWS) and the UCREL Semantic Analysis System (USAS) (see
Rayson, 2003).
The Constituent Likelihood Automatic Word-tagging System

A corpus uploaded to the Wmatrix environment is first grammatically
tagged with the Constituent Likelihood Automatic Word-tagging System (CLAWS)
(Garside and Smith, 1997). The tagger makes use of a detailed set of
146 tags3 (CLAWS C7 tagset). It also uses two lexicons: (a) a lexicon of
single words with all their possible parts of speech and associated lemmas;
and (b) a multiword expression lexicon. Multiword expressions include
adverbs such as a bit, all the same, and so forth, at least, and by and large; prepo-
sitions such as as opposed to, because of, contrary to, in the light of, and in compari-
son with; compounds such as tabula rasa, brand new, matter-of-fact and grown
up; and conjunctions such as even though, as if, provided that, and so that.
Part-of-speech tagging is essentially a disambiguation task. Many
words are part-of-speech homographs, i.e. they are spelt the same but
belong to different word classes. A tagger needs to determine which part-of-
speech is most probable, given the immediate syntactic and semantic con-
text of a homograph. Although close to 90 per cent of English types4 can
only be one part-of-speech (e.g. abound can only be a verb and kindness is
always a noun), over 40 per cent of the running words (or tokens) in a cor-
pus are morphosyntactically ambiguous (DeRose, 1988: 31). This is largely
due to the ambiguity of a number of high-frequency words such as that,
which can be a determiner (Do you remember that nice Mr. Hoskins who came to
dinner?)5, a relative pronoun (The people that live next door), a conjunction
(I can’t believe that he is only 17) or an adverb (I hadn’t realized the situation was
that bad!). Another very common source of ambiguity in English is homog-
raphy between verbs and nouns, e.g. use, issue, cause, abandon, craft, etc.
(see Ide, 2005).
Most current part-of-speech taggers use an approach to disambiguation
which is at least partly probabilistic: they rely on co-occurrence probabili-
ties between neighbouring tags. Co-occurrence probabilities are often
automatically derived by training the software on manually disambiguated
texts. For example, given that x is a determiner, the probability that
the item to its immediate right is a noun or an adjective can be calculated.
Non-probabilistic or rule-based taggers have also been making a comeback
with systems such as that proposed by Brill (1992). Typical rule-based
taggers use context frame rules to assign tags to unknown or ambiguous
words. An example of a context frame rule is ‘if an ambiguous or unknown
word is preceded by a determiner and followed by a noun, tag it as an

adjective’. Voutilainen (1999) has surveyed the history of the different
approaches to word class tagging.
CLAWS is a hybrid tagger, combining both probabilistic and rule-based
approaches. This hybrid approach allows CLAWS to assign POS-tags with a
very high degree of accuracy – 97–98 per cent for written texts (Rayson,
2003: 63). The tagger is commonly described as going through five major
stages (Garside, 1987):
1. A pre-editing or tokenisation phase: This stage prepares the text for the
tagging process by segmenting it into words and sentence units, a task
which is not trivial. A sentence is generally described as a string of words
followed by a full stop. A full stop does not, however, always signal the
end of a sentence (e.g. in figures (5.8 or 14.28), title nouns (Mr., Dr.),
and other types of abbreviations (i.e., viz., fig.). Similarly, a word is
generally considered as an orthographic word, i.e. a string of letters
surrounded by white spaces. However, words are not always separated by
blanks (e.g. in contractions such as don’t, it’s, they’re).
2. An initial part-of-speech assignment: Once a text has been tokenised, the
tagger assigns part-of-speech tags to all the word tokens in the text with-
out considering the context. If a word is unambiguous, i.e. belongs to
only one part-of-speech category or word class (e.g. boat, person, belong), it
is assigned a single tag. If a word is ambiguous, that is, if it can belong to
more than one word class (e.g. use, cause, fire), it is assigned several tags
listed in decreasing likelihood. Thus, fire is first tagged as a noun and
then as a verb, because the probability of it being a noun is higher than
that of it being a verb. If a particular word is not found in the tagger’s
lexicon, it is assigned a tag based on various sets of rules, e.g. morpho-
logical rules, for tagging unknown items. Thus, a word ending in *ness
will be classified as a noun; a word ending in *ly will be classified as an
adverb, etc.
3. A rule-based contextual part-of-speech assignment: This stage assigns a
single ‘ditto-tag’ to two or more orthographic words which function as a
single unit or multiword expression (e.g. as well as is tagged as a conjunc-
tion, and in situ as an adverb (see below for more details on ditto-tags
and their advantages)).
4. The probabilistic tag-disambiguation program: The task of the probabi-
listic tag-disambiguation program is to inspect all the cases where
a word has been assigned two or more tags and choose a preferred tag
by considering the context in which the word appears and assessing the
probability of any particular sequence of tags. The probability of a tag

sequence is typically a function of,
– the probability that one tag follows another; and

– the probability of a word being assigned a particular tag from the list
of all its possible tags (Garside and Smith, 1997: 104).
If, for example, the word run has been assigned both a noun and a verb
tag, it is less likely to be classified as a verb if it appears in the vicinity of
another verb, despite the fact that run is more often a verb than a noun.
5. Output: The output data can be presented in intermediate format (vertical

output for manual post-editing) or final format (horizontal and encoded
in SGML). Table 2.5 shows a typical CLAWS vertical output: each line rep-
resents a running word in the corpus and gives its POS-tag and lemma.
The intermediate format has the advantage of allowing researchers to select

the information needed. I have written a Perl program which takes this
intermediate format as its input and creates a corpus with lemmas followed
by their POS-tags (Table 2.6).
The problem with the format shown in Table 2.6 is that the word
forms are replaced by their lemmas, while the POS-tags are too specific
for our purposes. Redundant information includes, for example, that on
number given by the tags NN1 (singular common noun) or DD1 (singular
Table 2.5 An example of CLAWS

vertical output
POS-tag Word form Lemma
AT The the
JJ whole whole
NN1 point point
IO of of
AT the the
NN1 play play
VVZ seems seem
TO to to
VBI be be
AT1 an an
NN1 attack attack
II on on
AT the the
NN1 Church church
. . PUNC
Table 2.6 CLAWS horizontal output [lemma + POS]
the_AT whole_JJ point_NN1 of_IO the_AT play_NN1 seem_VVZ to_TO be_VBI an_AT1 attack_NN1
on_II the_AT Church_NN1 ._PUNC
Where AT: article; JJ: adjective; NN1: singular common noun; IO: of (as preposition); VVZ:
-s form of lexical verb; TO: infinitive marker ‘to’; VBI: be, infinitive; AT1: singular article; II: general
preposition; PUNC: punctuation
Table 2.7 CLAWS horizontal output [lemma + simplified POS tags]
the_AT whole_JJ point_NN of_IO the_AT play_NN seem_VV to_TO be_VB an_AT attack_NN on_II
the_AT Church_NN ._PUNC
determiner) and that on verbal forms given by the tags VVZ (-s form of a
lexical verb) or VVG (-ing form of a lexical verb). As a result, frequency lists
based on this format generate different frequencies for ‘example_NN1’ and
‘example_NN2’. POS-tags were therefore simplified by a Perl program to
match the level of specificity of the lemmas. Table 2.7 shows the same
sentence in Table 2.6 after simplification of the POS-tags. The simplifica-
tion routines are presented in Table 2.8.
Finally, each CLAWS7 tag can be modified by the addition of a pair of
digits to show that it occurs as part of a sequence of similar tags, represent-
ing a group of graphemic words which, for grammatical purposes, are best
treated as a single unit. The expression ahead of is an example of a sequence
of two graphemic words treated as a single preposition. It receives the tags:
ahead_II21 of_II22, where II stands for a general preposition. The first of
the two digits indicates the number of graphemic words in the sequence,
and the second digit the position of each graphemic word within that
sequence. Such ‘ditto tags’ are not included in the lexicon but the program
assigns them via an algorithm which is applied after initial part-of-speech
assignment and before disambiguation by looking for a range of multiword
expressions included in a pre-established list.
Ditto tags are very useful as they make it possible to extract complex prep-
ositions, and complex conjunctions as well as single words that are typical
of academic discourse. However, the annotation format has to be slightly
modified to do this. Table 2.9 shows the CLAWS vertical output for the com-
plex preposition in terms of. Each graphemic word of the complex preposi-
tion is tagged and lemmatised independently. A word list based on CLAWS
horizontal output would thus distinguish between the preposition in (in_II)
and the preposition in used as the first word of three-word sequences
(such as in terms of ) (in_II31). It would not be able to retrieve the complex
Table 2.8 Simplification of CLAWS POS-tags
Simplified POS tags CLAWS7 POS tags
Singular vs. plural forms
MC (cardinal number) MC1, MC2

NN (common nouns) NN1, NN2
NNL (locative nouns, e.g. island, street) NNL1, NNL2
NNO (numeral nouns, e.g. hundred) NNO, NNO2
NNT (temporal nouns, e.g. day, week) NNT1, NNT2
NNU (units of measurement, e.g. inch) NNU1, NNU2
NP (proper nouns) NP1, NP2
NPD (weekday noun) NPD1, NPD2
NPM (month noun) NPM1, NPM2
Comparative and superlative forms
DA (after-determiners, e.g. little, much, few) DAR (more, less), DAT (most, fewest)
JJ (adjective) JJR, JJT
Verb forms
VB (be) VB0 (be, base form), VBDR (were), VBDZ

(was), VBI (be, infinitive), VBM (am), VBN
(been), VBR (are), VBZ (is)
VD (do) VD0 (do, base form), VDD (did), VDG (doing),
VDI (do, infinitive), VDN (done), VDZ (does)
VH (have) VH0 (have, base form), VHD (had), VHG
(having), VHI (have, infinitive), VHN (had),
VHZ (has)
VV (lexical verbs) VV0 (base form of lexical verb), VVD (past
tense), VVG (-ing participle), VVGK (-ing
participle catenative, e.g. be going to), VVI
(infinitive), VVN (past participle), VVNK
(past participle catenative, e.g. be bound to),
VVZ (-s form)
Table 2.9 CLAWS tagging of

the complex preposition ‘in
terms of’
POS-tag Word form Lemma
II31 in in
II32 terms term
II33 of of
preposition itself. Another Perl program was therefore used to replace any
sequence of words with ditto tags (e.g. in_II31 terms_II32 of_II33) by the
component words, separated by a hyphen and followed by their POS-tag
(e.g. in-terms-of II).
The UCREL Semantic Analysis System

A second layer of annotation was applied by the UCREL Semantic Analysis
System (USAS). This tool assigns tags representing the general semantic
field of words from a lexicon of single words and multiword expressions.
A semantic field is a theoretical construct which groups together ‘words
that are related by virtue of their being connected – at some level of gener-
ality – with the same mental concept’ (Wilson and Thomas, 1997: 54). This
includes not only synonyms and antonyms of a word but also its hypernyms
and hyponyms, and any other words that are linked in other ways with the
concept concerned. For example, the category ‘language and commu-
nication’ (Q) includes words such as answer, reply, response, question, query,
statement, message, feedback, anecdote, explain, and explanation.
The USAS tagset includes 21 major semantic fields (see Table 2.10),
which, in turn, expand into 232 categories (see Archer et al., 2002). Letters
Table 2.10 Semantic fields of the

UCREL Semantic Analysis System
A General and abstract terms

B The body and the individual
C Arts and crafts
E Emotional actions, states and processes
F Food and farming
G Government and public
H Architecture, house and the home
I Money and commerce in industry
K Entertainment, sports and games
L Life and living things
M Movement, location, travel and transport
N Numbers and measurement
O Substances, materials, objects and equipment
P Education in general
Q Language and communication
S Social actions, states and processes
T Time
W World and environment
X Psychological actions, states and processes
Y Science and technology
Z Names and grammar
are used to denote the major semantic fields while numbers indicate field
subdivisions. For example, the semantic tag A2.2 represents a word in the
category ‘general and abstract words’ (A), the subcategory ‘affect’ (A2) and
more precisely the sub-subcategory ‘cause / connected’ (A2.2). The seman-
tic annotation does not apply to proper names and closed classes of words
such as prepositions, conjunctions and pronouns. These categories are all
marked with a Z-tag.
Like part-of-speech tagging, semantic tagging can be subdivided broadly
into a tag assignment phase and a tag disambiguation phase. First, a set of
potential semantic tags are attached to each lexical unit. The next stage
consists of selecting the contextually appropriate semantic tag from the set
of potential tags provided by the tag assignment algorithm. The program
makes use of a number of sources of information in the disambiguation
phase, notably POS-tags, domain of discourse, and contextual rules
(Rayson, 2003: 67–8). It assigns a semantic field tag to every word in the text
with about 92 per cent accuracy. Table 2.11 shows that in the sentence ‘This
chapter deals with the approach of the criminal law to behaviour which causes or risks
causing death’, the word chapter has been assigned the tags Q4.1 (‘language
and communication – media – books’), S5 (‘social actions, states and pro-
cesses – groups and affiliation’), S9 (‘social actions, states and processes –
religion and the supernatural’) and T1.3. (‘time-period’). The program
Table 2.11 USAS vertical output
POS-tag Word form Semantic tag
DD1 This M6 Z5 Z8
NN1 chapter Q4.1 T1.3 S9/S5 S5+
VVZ deals A1.1.1 I2.2 I2.1 A9- K5.2 F3/I2.2
IW with Z5
AT the Z5
NN1 approach X4.2 M1 E1 S1.1.1
IO of Z5
AT the Z5
JJ criminal G2.1[i1.2.1 G2.1- A5.1-
NN1 law G2.1[i1.2.2 G2.1 S6+ Y1
II to Z5
NN1 behaviour S1.1.1 A1.1.1
DDQ which Z8 Z5
VVZ causes A2.2
CC or Z5
VVZ risks A15-
VVG causing A2.2
NN1 death L1-
. .
Table 2.12 USAS horizontal output
This_M6 chapter_Q4.1 deals_A1.1.1 with_Z5 the_Z5 approach_X4.2 of_Z5 the_Z5 criminal_

G2.1[i1.2.1 law_G2.1[i1.2.2 to_Z5 behaviour_S1.1.1 which_Z8 causes_A2.2 or_Z5 risks_A15
causing_A2.2 death_L1- ._PUNC
ranked these semantic tags and chose Q4.1 as the semantic tag with the
highest correctness probability. This is displayed in the final output format
(see Table 2.12).
The same occurrence of a word in a text may simultaneously signal
more than one semantic field. The word chapter in the sense of ‘an
ecclesiastical assembly of priests or monks’ is a case in point. It belongs
equally to the semantic fields of ‘groups and affiliation’ and ‘religion’.
The two semantic tags are thus assigned in the form of a single tag S9/S5
(see Table 2.11).
In the USAS lexicon of multiword expressions, phrasal verbs (e.g. break
out, take off), compounds (e.g. academic year, advisory committee, bank
account), and idioms (e.g. at the drop of a hat, to bark up the wrong tree, by the
skin of one’s teeth) are described as regular expressions or templates, i.e.
sequences of words, parts of words and grammatical categories used to
match similar patterns of text and extract them. Thus, the template
‘ma[kd]*_V* {JJ, D*, AT*} sense_NN1’ identifies all occurrences of the verb
make directly followed by an optional adjective (JJ), determiner (D*) or
article (AT*) and the singular noun (NN1) sense. It thus retrieves all
instances of the expression make sense and its variants make no sense, makes
little sense, made more sense, etc. Multiword expressions are analysed as if they
were single words, using ditto-tags similar to those used in part-of-speech
tagging. For example, criminal law is tagged as: criminal G2.1[i1.2.1 law
G2.1[i1.2.2 (see Table 2.12).
2.3. Automatic extraction of potential academic words
Coxhead (2000) made use of the Range corpus analysis program

(Heatley & Nation, 1996) to select words that met three frequency-based
criteria:
1. The word families included had to be outside the first 2,000 most fre-
quent words in English, as represented by West’s (1953) General Service
List.
2. A member of a word family had to occur in all 4 disciplines represented

in the Academic Corpus, with a frequency of at least 10 occurrences in
each sub-corpus and in 15 or more of the 28 subject areas.
3. Members of a word family had to occur at least 100 times in the corpus
(cf. Section 1.1.2.).
If Criterion 1 had not been used, the resulting list would have included a
large number of function words and other high-frequency words that tend
to be frequent in the English language as a whole (cf. Section 1.1.1) but
which are not necessarily the most representative lexical items in the
Academic Corpus. On the other hand, applying Criterion 1 makes it impos-
sible to identify high-frequency words that are particularly prominent in
academic texts.
To address this limitation, the procedure described in this book is primar-
ily based on keyness (Scott, 2001), a fully data-driven method that is often
used in corpus linguistics to find salient linguistic features in texts (e.g.
Archer, 2009) and which does not require the use of a stop list to filter out
function words. These words and other high-frequency words will only
occur in a keyword list “if their usage is strikingly different from the norm
established by the reference text” (Archer, 2009a: 3).
Two quantitative filters, namely range and evenness of distribution, are
subsequently used to narrow down the resulting list of potential academic
words (Figure 2.1).
1. Keyness
2. Range
3. Evenness of distribution
Potential academic words
Figure 2.1 A three-layered sieve to extract potential academic words

2.3.1. Keyness
Keyword analysis has been used in a variety of fields to extract distinctive
words or keywords, e.g. business English words (Nelson, 2000), words
typically used by men and women with cancer in interviews and online
cancer support groups (Seale et al., 2006), and terminological items typical
of specific sub-disciplines of English for information science and techno-
logy (Curado Fuentes, 2001). As emphasized by Scott and Tribble (2006:
55–6), ‘keyness is a quality words may have in a given text or set of texts,
suggesting that they are important, they reflect what the text is really about,
avoiding trivia and insignificant detail. What the text ‘boils down to’ is its
keyness, once we have steamed off the verbiage, the adornment, the blah
blah blah’.
The procedure to identify keywords of a particular corpus involves five
main stages (see Scott and Tribble, 2006: 58–60):
1. Frequency-sorted word lists are generated for a reference corpus and the
research corpus.
2. A minimum frequency threshold is usually set at 2 or 3 occurrences in
the research corpus. Thus, ‘for a word to be key, then it (a) must occur at
least as frequently as the threshold level, and (b) be outstandingly fre-
quent in terms of the reference corpus’ (Scott and Tribble, 2006: 59).
3. The two lists of word types and their frequencies are compared by means
of a statistical test, usually the log-likelihood ratio.
4. Words that occur less frequently than the threshold in the research
corpus, or are not significantly more frequent in the research corpus
than in the reference corpus, are filtered out.
5. The word list for the research corpus is reordered in terms of the
keyness of each word type. Software tools usually list positive keywords,
i.e. words that are statistically prominent in the research corpus, as well
as negative keywords, i.e. words that have strikingly low frequency in the
research corpus in comparison to the reference corpus.
For the purposes of this research, the two corpora of professional writing
and the corpus of student academic writing described in Section 2.1 were
each compared with a large corpus of fiction on the grounds that academic
words would be particularly under-represented in this literary genre. Thus,
the reference corpus was not chosen to represent all the varieties of the
language6 but to serve as a ‘strongly contrasting reference corpus’ (Tribble,
2001: 396). The K (general fiction), L (mystery and detective fiction),
Table 2.13 The fiction corpus
Corpora Number of words
LOB (categories K, L, M, N, P) 946,337

FLOB (categories K, L, M, N, P)
BROWN (categories K, L, M, N, P)
FROWN (categories K, L, M, N, P)
Baby BNC fiction 999,688
TOTAL 1,946,025
Table 2.14 Number of keywords
Corpus Positive keywords Negative keywords
ProfHS corpus 4,322 837

ProfSS corpus 4,656 1,201
Student Writing corpus 4,492 956
M (science fiction), N (adventure and western fiction) and P (romance and

love story) categories of the LOB (Lancaster-Oslo/Bergen) corpus, the
FLOB (Freiburg Lancaster-Oslo/Bergen) corpus, the BROWN corpus and
the FROWN (Freiburg-Brown) corpus7 were combined with the Baby BNC
fiction corpus (Table 2.13) to form the reference corpus for this study.
Keyness values were calculated with the Keyness module of WordSmith
Tools 4 (Scott, 2004). The significance of the log-likelihood test was set at
0.01 with a critical value of 15.13 (see Rayson et al., 2004), which means
that there is less than 1 per cent danger of mistakenly claiming a significant
difference in frequency. Keywords were extracted for the ProfSS corpus,
the ProfHS corpus and the Student Writing Corpus, in lemma + POS-tag
format. Table 2.14 gives the number of positive and negative keywords for
each corpus.
Positive keywords are more numerous than negative keywords for each aca-
demic corpus. This can be explained by the large amount of specialized
vocabulary present in academic texts, e.g. formula, cell and species in biology
(hard science), law, offence and policy in law (soft science) and theory, factor and
participant in student writing. However, not all of the keywords meet the defi-
nition of academic vocabulary in Section 1.5. The keyword procedure selects
all words that occur with unusual frequency in a given text/corpus, com-
pared to a reference corpus. The resulting list is therefore likely to include
technical words that do not occur in all types of academic texts, simply
because they are under-represented in fiction writing (e.g. bacterium, methane,
DNA, penicillin, chromosome, enzyme, jurisdiction, rape, archbishop, martyr, etc.).
The keyword procedure relies on the conception of a corpus as one big

text rather than as a collection of smaller texts. Statistical measures such as
the log-likelihood ratio are computed on the basis of absolute frequencies
and cannot account for the fact that ‘corpora are inherently variable inter-
nally’ (Gries, 2007: 110). As a consequence, the procedure cannot distin-
guish between ‘global’ and ‘local’ keywords (Katz, 1996). Global keywords
are dispersed more or less evenly through the corpus while local keywords
appear repeatedly in some parts of the corpus only, a phenomenon which
Katz (1996: 19) has referred to as ‘burstiness’8. For example, in a keyword
analysis of gay male vs. lesbian erotic narratives, Baker shows that wuz (used
as a non-standard spelling of was) appears to be a keyword of gay male
erotic narratives, when in fact its use is restricted to one single text, ‘which
suggests that this word is key because of a single author’s use of a word in a
specific case, rather than being something that indicates a general differ-
ence in language use’ (2004: 350). In other words, the keyword status of
wuz is more a function of the sampling decision to include one particular
narrative in the corpus than evidence of the distinctiveness of the word in
gay male erotic narratives (see also Oakes and Farrow, 2007: 91).
As a first step to overcome this inherent limitation of the keyword
procedure, I wrote a Perl program which automatically compares keywords
for several corpora and creates a list of positive keywords that are shared in
the ProfHS corpus, the ProfSS corpus and the Student Writing corpus
(cf. Scott’s (1997) notion of ‘key keywords’). Although the resulting
number of keywords fell by more than 60 per cent, 2,048 shared keywords
were still identified. The criteria of range and evenness of distribution
were subsequently used to refine the list of potential academic words still
further.
2.3.2. Range
Range (i.e. the number of texts in which a word appears) is used to deter-
mine whether a word appears to be a potential academic keyword because
it occurs in most academic disciplines or because of a very high usage in a
limited subset of texts. It is calculated on the basis of the 15 sub-corpora
described in Section 2.1 with the WordList option of WordSmith Tools
(Scott, 2004). This tool can take several corpus files as input and range
comes automatically with any word list it produces. Figure 2.2 shows that
there is a column headed ‘Texts’ which shows the number of texts each
word occurred in. The words ability, able and about, for example, are shown
to appear in all 15 sub-corpora, that is, in 100 per cent of the corpora
Figure 2.2 WordSmith Tools – WordList option
analysed. For the purposes of this study, only words appearing in all 15
academic sub-corpora were retained as potential academic words.
Used alone, range has an important limitation in that it gives no informa-
tion on the frequency of a word in each sub-corpus. Thus, the criterion of
range excludes the words sector, paradigm and variance as they only appear in
11 sub-corpora but includes both the word example, which we intuitively
regard as an academic word, and the word law, the meaning of which is
more discipline or topic-dependent (e.g. canon law, criminal law, the law of
gravity). The frequencies of these two words in each sub-corpus are shown
in Figure 2.3 and accurately reflect the difference between them. The fre-
quency of the word example ranges from 26 to 226 in the 15 sub-corpora,
while that of the word law varies between 11 and 812. The large variation in
the range of law can be explained by the peak frequency of occurrence of
the noun in the professional soft science sub-corpora.
900
800
700
600
500 LAW
400 EXAMPLE
300
200
100
0
C M
C
AP C
M BEL
SC
W E CH
C C
C L
BN SC
S
TS
W HO
C C
Y
W IST S
BN PO
SO
ES
BN HU
BN SO
PS R
BA H RT
LO SO
AR
BA AW TE
O
BA YC
M MC
BN C
N
C
E A
C
E
M
B C
C
C
M
E
Figure 2.3 Distribution of the words example and law in the 15 sub-corpora
2.3.3. Evenness of distribution

Differences in range can be highlighted by a measure of the evenness of the
distribution of words in a corpus. This is the last criterion I applied to
restrict the list of potential academic words. The evenness of the distribu-
tion or dispersion of a word is ‘a statistical coefficient of how evenly distrib-
uted a word is across successive sectors of the corpus’ (Rayson, 2003: 93).
This measure takes into account ‘not only the presence or absence of a
word in each subsection of the corpus, but the exact number of times it
appears’ (Oakes and Farrow 2007: 91). A number of studies have used a
measure of dispersion to define a core lexicon on the basis that ‘if a word is
commonly used in a language, it will appear in different parts of the corpus.
And if the word is used commonly enough, it will be well-distributed’
(Zhang et al., 2004).
One such measure is Juilland’s D statistical coefficient. Juilland’s D was
first used in the Frequency Dictionary of Spanish Words (Juilland and Rodriguez,
1964) and is calculated as D = 1 – V / √n-1 where n is the number of sectors
(i.e. the number of sub-corpora or texts) in the corpus. The variation coef-
ficient V is given by V = s / x where x is the mean sub-frequency of the word
in the corpus and s is the standard deviation of these sub-frequencies. Its
values range from 0 (most uneven distribution possible) to 1 (perfectly
even distribution across the sectors of the corpus) (see Oakes (1998:
189–92) and Gries (2008) for more information on dispersion measures).
Juilland’s Ds were calculated for each word using the output list from
WordSmith Tools Detailed Consistency Analysis. Figure 2.4 gives an example of a
Selection of academic vocabulary
Figure 2.4 WordSmith Tools Detailed Consistency Analysis
51
detailed consistency analysis: the second column (Total) gives the total
frequency of each word in the whole corpus, the third column (Texts) gives
the range of each word and the following columns show its frequencies
in each sub-corpus. These frequencies were copied into an Excel file and
normalized per 100,000 words as the 15 sub-corpora are of different sizes.
The measures necessary to calculate Juilland’s D values (i.e. the variation
coefficient, the mean sub-frequency and the standard deviation) were
computed in Excel and Juilland’s D values were then calculated for
each word.9
For a word to be selected as a potential academic word, its Juilland’s
D value had to be higher than 0.8. The noun example was thus identified as
a potential academic word as its dispersion value was 0.83, whereas the
noun law, with a Juilland’s D value of 0.69, was not selected. Dispersion
values make it possible to avoid the mistaken conclusion that these two
words behave similarly in academic writing, and confirm that only example
is of widespread and general use in this genre, while the noun law is
over-represented in the professional soft science corpus, and more specifi-
cally in the social science sub-corpus. Evenness of distribution is the only
criterion used that could perhaps favour keywords that are more promi-
nent in the different parts of the ProfSS corpus and the Student Writing
corpus. However, a relatively high minimum threshold of 0.8 reduces the
possibility of giving too much weight to words that would be particularly
frequent in the ‘soft science’ sub-corpora but much less common in the
‘hard science’ sub-corpora.
The resulting list of 599 potential academic words includes nouns such as
conclusion, difference, extent, significance, and consequence; verbs such as prove,
appear, provide, discuss, show, result and illustrate; adjectives such as significant,
effective, similar and likely and adverbs such as particularly, conversely, highly
and above. Examples of words that have D values lower than 0.8 and were
therefore not selected include the nouns health, employment, personality,
and treatment; and the verbs label, perceive and isolate. At times, the cut-off
point of 0.8 appears to be too restrictive and words that would intuitively be
considered as academic words are excluded. Some words have skewed
Juilland’s D values because of their polysemy. The noun solution is a case in
point, as it has both a general meaning (‘a way of solving a problem’) and
a technical meaning (‘a liquid in which a solid or gas has been mixed’) with
different frequencies and distributional behaviours. Its general meaning is
found in all academic sub-corpora while its technical meaning is restricted
to scientific writing and accounts for its much higher frequency in the two
professional scientific sub-corpora (MC-SC and BNC-SC in Figure 2.5).
300
250
200
150
100
50
0
TS
BN UM
H
I
PS T
S
C
AP N
L
BN C
BA HY
BN OL
LO SC
BN SC
IS
C
BE
ES
M SO
SO
IE
BA RT
S
AR
BA TE
H
C
SC
E
C
N
A
C
C
C
C
BA WE
C
C
C
M
E
BN
M
C
M
E
C
W
M
Figure 2.5 Distribution of the noun ‘solution’
These two peak frequencies of occurrence are responsible for the relatively
low D value (54.6) of the noun solution.
2.3.4. Broadening the scope of well-represented semantic categories

More generally, the 15 sub-corpora are relatively small and the frequencies
of occurrence of words may be skewed by a particular topic or author’s pre-
ferred turn of phrase. I therefore made use of a semi-automatic procedure
to identify words that did not pass the dispersion criterion but were seman-
tically related to the 599 potential academic words.
Section 2.2.2 discussed how a text uploaded to the web-based environ-
ment Wmatrix is morphosyntactically and semantically tagged. The seman-
tic analysis was conducted with the UCREL System which classifies words and
multiword units into 21 major semantic categories. Table 2.15 shows the
distribution of the 599 potential academic words across these semantic
classes. Some words were automatically classified into more than one cate-
gory but the figures given are based on the semantic tag most frequently
attributed to each word. It is notable that 87 per cent of the 599 potential
academic words fall into just six of the categories; in particular, the category
‘general and abstract terms’ includes almost half the potential academic
words. Examples include the nouns activity, circumstance, and limitation as
well as the verbs perform and cause, the adjectives detailed and particular and
the adverbs similarly and conversely. The category ‘numbers and measure-
ment’ accounts for more than 10 per cent of the potential academic words
and includes nouns (e.g. degree, measure, amount, extent), adjectives (e.g. high,
Table 2.15 Automatic semantic analysis of potential academic words
Semantic categories Number of words Percentage of words
A. General and abstract terms 267 44.6

B. The body and the individual 2 0.3
C. Arts and crafts 2 0.3
E. Emotion 4 0.7
F. Food and farming 0 0.0
G. Government and public 4 0.7
H. Architecture, house and the home 2 0.3
I. Money and commerce in industry 7 1.2
K. Entertainment, sports and games 0 0.0
L. Live and living things 0 0.0
M. Movement, location, travel and transport 12 2.0
N. Numbers and measurement 74 12.4
O. Substances, materials, objects and equipment 7 1.2
P. Education in general 4 0.7
Q. Language and communication 34 5.7
S. Social actions, states and processes 47 7.7
T. Time 26 4.3
W. World and environment 2 0.3
X. Psychological actions, states and processes 55 9.2
Y. Science and technology in general 2 0.3
Z. Names and grammar 50 8.3
TOTAL 599 100
large, wide), verbs (e.g. extend, increase, reduce), adverbs (e.g. frequently,
subsequently, also) and prepositions (e.g. in addition to). The categories
‘psychological actions, states and processes’ (e.g. assumption, analyse, inter-
pretation, conclusion, attempt), ‘names and grammar’ (mainly consisting of
connective devices such as conjunctions (or, whether), prepositions (such as,
according to, since, during) and adverbs (moreover, thus, therefore)), ‘social
actions, states and processes’ (e.g. social, encourage, facilitate, impose), and
‘language and communication’ (e.g. argue, claim, define, suggest) represent
9.2 per cent, 8.3 per cent, 7.7 per cent and 5.7 per cent of the potential
academic words respectively.
On this basis, 331 keywords that did not have Juilland’s D values higher
than 0.8 but which formed part of one of the six semantic categories
described above were added to the list of potential academic words. Many
of the words that were retrieved by this additional criterion are morphologi-
cally related to words that had already been automatically selected. For
example, the noun analysis, which is morphologically related to the poten-
tial academic verb analyse, was retrieved by the semantic criterion although
its Juilland’s D was below 0.8. However this is not an argument for using
word families instead of lemmas. The criteria of minimum frequency and
range still apply: the noun analysis was retrieved only because it is very
frequent in academic prose and appears in a wide range of academic texts.
Other morphologically related words such as analyst or analysable were still
excluded from the list.
2.4. The Academic Keyword List
The (semi-)automatic extraction procedure described in Section 2.3

identified 930 potential academic words on the basis of four criteria. First,
the words had to be keywords in professional (both ‘hard’ and ‘soft’
disciplines) and student academic writing. Second, they had to be charac-
terized by wide range, i.e. to appear in all 15 sub-corpora representing
different academic disciplines. Third, they had to be well-distributed across
the corpora and have a Juilland’s D value higher than 0.8. Keywords that did
not match this last criterion but belonged to one of the six best represented
semantic categories – ‘general and abstract’, ‘numbers and measurement’,
‘psychological actions, states and processes’, ‘names and grammar’, ‘social
actions, states and processes’ and ‘language and communication’ – were also
included. The resulting list of potential academic words has been named the
Academic Keyword List (AKL) to emphasize the fact that it is the output of a
data-driven set of criteria, the first of which is keyness, and not a list of aca-
demic vocabulary in its functional sense. Table 2.16 presents a breakdown of
the AKL by grammatical category. The complete list is given in Table 2.17.
Nouns make up 38.17 per cent of all potential academic words. This is
consistent with Biber et al.’s (1999) finding that nouns are particularly
frequent in academic prose. A large proportion of the nouns in the list are
Table 2.16 Distribution of

grammatical categories in the
Academic Keyword List
Number Percentage
Nouns 355 38.17

Verbs 233 25.05
Adjectives 180 19.35
Adverbs 87 9.35
Others 75 8.06
Total 930 100
Table 2.17 The Academic Keyword List
56
355 nouns
ability, absence, account, achievement, act, action, activity, addition, adoption, adult, advance, advantage, advice, age, aim, alternative, amount,
analogy, analysis, application, approach, argument, aspect, assertion, assessment, assistance, association, assumption, attempt, attention, attitude,
author, awareness, balance, basis, behaviour, being, belief, benefit, bias, birth, capacity, case, category, cause, centre, challenge, change, character,
characteristic, choice, circumstance, class, classification, code, colleague, combination, commitment, committee, communication, community,
comparison, complexity, compromise, concentration, concept, conception, concern, conclusion, condition, conduct, conflict, consensus,
Academic Vocabulary in Learner Writing

consequence, consideration, constraint, construction, content, contradiction, contrast, contribution, control, convention, correlation, country,
creation, crisis, criterion, criticism, culture, damage, data, debate, decision, decline, defence, definition, degree, demand, description, destruction,
determination, development, difference, difficulty, dilemma, dimension, disadvantage, discovery, discrimination, discussion, distinction, diversity,
division, doctrine, effect, effectiveness, element, emphasis, environment, error, essence, establishment, evaluation, event, evidence, evolution,
examination, example, exception, exclusion, existence, expansion, experience, experiment, explanation, exposure, extent, extreme, fact, factor,
failure, feature, female, figure, finding, force, form, formation, function, future, gain, group, growth, guidance, guideline, hypothesis, idea, identity,
impact, implication, importance, improvement, increase, indication, individual, influence, information, insight, instance, institution, integration,
interaction, interest, interpretation, intervention, introduction, investigation, isolation, issue, kind, knowledge, lack, learning, level, likelihood, limit,
limitation, link, list, literature, logic, loss, maintenance, majority, male, manipulation, mankind, material, means, measure, medium, member, method,
minority, mode, model, motivation, movement, need, network, norm, notion, number, observation, observer, occurrence, operation, opportunity,
option, organisation, outcome, output, parallel, parent, part, participant, past, pattern, percentage, perception, period, person, personality,
perspective, phenomenon, point, policy, population, position, possibility, potential, practice, presence, pressure, problem, procedure, process,
production, programme, progress, property, proportion, proposition, protection, provision, publication, purpose, quality, question, range, rate,
reader, reality, reason, reasoning, recognition, reduction, reference, relation, relationship, relevance, report, representative, reproduction,
requirement, research, resistance, resolution, resource, respect, restriction, result, review, rise, risk, role, rule, sample, scale, scheme, scope, search,
section, selection, sense, separation, series, service, set, sex, shift, significance, similarity, situation, skill, society, solution, source, space, spread,
standard, statistics, stimulus, strategy, stress, structure, subject, success, summary, support, survey, system, target, task, team, technique, tendency,
tension, term, theme, theory, tolerance, topic, tradition, transition, trend, type, uncertainty, understanding, unit, use, validity, value, variation, variety,
version, view, viewpoint, volume, whole, work, world
233 verbs
accept, account (for), achieve, acquire, act, adapt, adopt, advance, advocate, affect, aid, aim, allocate, allow, alter, analyse, appear, apply, argue, arise,
assert, assess, assign, associate, assist, assume, attain, attempt, attend, attribute, avoid, base, be, become, benefit, can, cause, characterise, choose, cite,
claim, clarify, classify, coincide, combine, compare, compete, comprise, concentrate, concern, conclude, conduct, confine, conform, connect,
consider, consist, constitute, construct, contain, contrast, contribute, control, convert, correspond, create, damage, deal, decline, define, demonstrate,
depend, derive, describe, design, destroy, determine, develop, differ, differentiate, diminish, direct, discuss, display, distinguish, divide, dominate,
effect, eliminate, emerge, emphasize, employ, enable, encounter, encourage, enhance, ensure, establish, evaluate, evolve, examine, exceed, exclude,
exemplify, exist, expand, experience, explain, expose, express, extend, facilitate, fail, favour, finance, focus, follow, form, formulate, function, gain,
generate, govern, highlight, identify, illustrate, imply, impose, improve, include, incorporate, increase, indicate, induce, influence, initiate, integrate,
Selection of academic vocabulary

interpret, introduce, investigate, involve, isolate, label, lack, lead, limit, link, locate, maintain, may, measure, neglect, note, obtain, occur, operate,
outline, overcome, participate, perceive, perform, permit, pose, possess, precede, predict, present, preserve, prevent, produce, promote, propose,
prove, provide, publish, pursue, quote, receive, record, reduce, refer, reflect, regard, regulate, reinforce, reject, relate, rely, remain, remove, render,
replace, report, represent, reproduce, require, resolve, respond, restrict, result, retain, reveal, seek, select, separate, should, show, solve, specify, state,
stimulate, strengthen, stress, study, submit, suffer, suggest, summarise, supply, support, sustain, tackle, tend, term, transform, treat, undermine,
undertake, use, vary, view, write, yield
180 adjectives
absolute, abstract, acceptable, accessible, active, actual, acute, additional, adequate, alternative, apparent, applicable, appropriate, arbitrary, available,
average, basic, central, certain, clear, common, competitive, complete, complex, comprehensive, considerable, consistent, conventional, correct,
critical, crucial, dependent, detailed, different, difficult, distinct, dominant, early, effective, equal, equivalent, essential, evident, excessive,
experimental, explicit, extensive, extreme, far, favourable, final, fixed, following, formal, frequent, fundamental, future, general, great, high, human,
ideal, identical, immediate, important, inadequate, incomplete, independent, indirect, individual, inferior, influential, inherent, initial, interesting,
internal, large, late, leading, likely, limited, local, logical, main, major, male, maximum, mental, minimal, minor, misleading, modern, mutual,
natural, necessary, negative, new, normal, obvious, original, other, overall, parallel, partial, particular, passive, past, permanent, physical, positive,
possible, potential, practical, present, previous, primary, prime, principal, productive, profound, progressive, prominent, psychological, radical,
random, rapid, rational, real, realistic, recent, related, relative, relevant, representative, responsible, restricted, scientific, secondary, selective,
separate, severe, sexual, significant, similar, simple, single, so-called, social, special, specific, stable, standard, strict, subsequent, substantial, successful,
successive, sufficient, suitable, surprising, symbolic, systematic, theoretical, total, traditional, true, typical, unique, unlike, unlikely, unsuccessful,
useful, valid, valuable, varied, various, visual, vital, wide, widespread
57
(Continued)
58
Table 2.17 (Cont’d)
87 adverbs
above, accordingly, accurately, adequately, also, approximately, at best, basically, clearly, closely, commonly, consequently, considerably, conversely,
correctly, directly, effectively, e.g., either, equally, especially, essentially, explicitly, extremely, fairly, far, for example, for instance, frequently, fully,
further, generally, greatly, hence, highly, however, increasingly, indeed, independently, indirectly, inevitably, initially, in general, in particular, largely,
less, mainly, more, moreover, most, namely, necessarily, normally, notably, often, only, originally, over, partially, particularly, potentially, previously,

primarily, purely, readily, recently, relatively, secondly, significantly, similarly, simply, socially, solely somewhat, specifically, strongly, subsequently,
successfully, thereby, therefore, thus, traditionally, typically, ultimately, virtually, wholly, widely
75 others
according to, although, an, as, as opposed to, as to, as well as, because, because of, between, both, by, contrary to, depending on, despite, due to,
during, each, even though, fewer, first, former, from, for, given that, in, in addition to, in common with, in favour of, in relation to, in response to, in
terms of, in that, in the light of, including, its, itself, latter, less, little, many, most, of, or, other than, per, prior to, provided, rather than, same, second,
several, since, some, subject to, such, such as, than, that, the, their, themselves, these, third, this, those, to, unlike, upon, versus, whereas, whether,
whether or not, which, within
abstract terms which belong to Francis’s (1994) category of metalinguistic

labels: illocutionary nouns (e.g. argument, question), language-activity nouns
(e.g. comparison, contrast, definition, and description), mental process nouns
(e.g. analysis, concept, hypothesis, theory, and view) and text nouns (e.g. section,
term). Verbs account for 25 per cent of the list and include verbs that Hinkel
(2004) classified as activity verbs (e.g. deal, use, show, provide), reporting
verbs (e.g. argue, discuss, emphasize, explain, respond), mental verbs (e.g. assess,
examine, interpret, note), linking verbs (e.g. appear, be, become, prove) and
logico-semantic relationship verbs (e.g. arise, cause, contrast, differ, follow,
imply, illustrate, include).
Although usually disregarded in academic textbooks and teaching mate-
rials, adjectives represent 19.35 per cent of the potential academic words in
this list. The syntactic function of adjectives is to modify nouns and noun
phrases. So it is logical that, if nouns are frequent in academic writing,
numerous adjectives will be used to qualify them. The majority of adjectives
express value judgements, either positive or negative (e.g. adequate,
appropriate, clear, inadequate, incomplete, interesting, prime, useful), possibility
(e.g. possible, potential, likely, unlikely) and logico-semantic relationships
(e.g. additional, alternative, different, equivalent, final, following, parallel,
similar). As Conrad (1999) pointed out, the category of potential academic
adverbs (9.35%) consists essentially of linking (e.g. consequently, conversely,
hence, however, thus) and evaluative (e.g. adequately, correctly, effectively, highly,
increasingly, inevitably, significantly) words. The category ‘other’ includes
prepositions, conjunctions, pronouns, articles, determiners and ordinal
numbers. Inspection of this list illustrates the value of CLAWS ditto tags
(Section 2.2.2): some 40 per cent of the potential academic prepositions
are complex (e.g. such as, according to, in terms of, because of). There are also
complex conjunctions such as whether or not and given that. Pronouns, arti-
cles and determiners are not fully lemmatised by CLAWS. For example, this,
these and those are not analysed as word forms of the same determiner and
are categorized as separate lemmas.
In order to calculate the percentages of GSL and AWL words in the list of
potential academic words, the Academic Keyword List was uploaded to the
Web Vocab Profile developed by Tom Cobb10. This web interface takes a text
file as its input, analyses the vocabulary in the text, and classifies the words
into four main categories: (1) the first 1,000 most frequent words of
English; (2) the second 1,000 most frequent words of English; (3) words
of the Academic Word List; and (4) other words. Before uploading the list of
potential academic words, it was necessary to remove multiword units (such
as the adverbs for example and for instance and the complex prepositions
Table 2.18 The distribution of AKL words in the GSL and the AWL
Lists % Examples
GSL 57% aim, argue, argument, because, compare, comparison, differ, difference, discuss,
example, exception, explain, explanation, importance, include, increasingly,
likelihood, namely, point, reason, result, therefore, typically
AWL 40% accurately, adequate, analysis, assess, comprise, conclude, conclusion,
consequence, emphasize, hypothesis, inherent, method, proportion, relevance,
scope, summary, survey, theory, validity, whereas
Other 3% assertion, correlation, criticism, exemplify, proposition, reference, tackle, versus,
viewpoint
in addition to, due to, prior to, in the light of, in favour of and because of) since
Web Vocab Profile cannot deal with them and would simply decompose multi-
word units into their parts. The comparison is thus based on single words
only. The results show that only 40 per cent of the potential academic words
in my list also occur in the Academic Word List, while 57 per cent are among
the 2,000 most frequent words of English as described in West’s (1953)
General Service List (see Table 2.18). These results highlight the important
role played by general service vocabulary in academic writing.
The Academic Keyword List is a list of potential academic words. As explained
in Section 1.5, frequency-based criteria are not used as a defining property
of academic words but as a way of operationalizing a function-based defini-
tion of academic vocabulary. Table 2.17 shows that a high proportion of AKL
words fit the functional definition of academic vocabulary (e.g. evidence,
approach, result, show, define, measure, degree, extent, condition, experience, result),
which provides the retrieval procedure with some evidence of face validity.
An important application of word frequency lists for course design is to
uncover ‘functional and notional areas which might be important for the
syllabus’ (Flowerdew, 1993: 237). The automatic semantic analysis has
shown that a number of semantic sub-categories are particularly well-
represented. For example, the semantic sub-category named ‘A2.2. Affect:
cause – connected’ consists of the nouns basis, cause, consequence, correlation,
effect, factor, impact, implication, influence, link, motivation, relation, reason, result
and stimulus; the verbs associate, attribute, base, cause, combine, connect, depend,
derive, effect, generate, induce; influence, lead, link, produce, relate, render, result
and stimulate; the adjectives dependent, related and resulting; the adverbs conse-
quently, hence, therefore, thereby, and thus; the prepositions depending on, due to,
in relation to, in view of, on account of, with respect to, in respect of, in response to,
because of, in the light of, in terms of and subject to; and the conjunctions because,
given that, provided and since. Other well-represented semantic categories
include ‘A1.1. General actions, making, etc’ (e.g. activity, circumstance, event,
arise, perform), ‘A1.8. Inclusion / exclusion’ (e.g. content, scope, consist, exclude,
include), ‘A4.1. General kinds, groups, examples’ (e.g. case, category, example,
instance, kind, classify, illustrate), ‘A5. Evaluation’ (e.g. progress, enhance,
improve, favourable, positive), ‘A6. Comparing’ (e.g. contrast, difference, similar,
unlike, conversely), ‘A11.1. Importance’ (e.g. significance, emphasize, fundamen-
tal, major, primary); ‘N5. Quantities’ (e.g. amount, extent, figure, considerable,
limited, widely, several), ‘Q2.2. Speech acts’ (e.g. account, definition, discussion,
describe, quote, refer, suggest) and ‘X2.1. Thought and belief’ (e.g. view, assume,
consider, formulate).
The fact that these AKL words are used to serve particular functions in
academic prose has ‘to be corroborated by concordancing’ (Flowerdew,
1993: 237). Words such as the noun illustration, the verb illustrate and the
preposition like are often employed as exemplifiers but also have different
uses. The verb illustrate, for example, also means ‘to put pictures in a book’,
a sense that will not be useful to all scholars who are writing in higher edu-
cation settings. Similarly, the Academic Keyword List includes several words
that are relatively well distributed keywords in academic prose but should
probably not be part of a list of academic vocabulary (e.g. the nouns coun-
try, female, male, parent, sex and world). The Academic Keyword List is not a final
product and does not in itself ‘carry any guarantee of pedagogical
relevance’ (Widdowson, 1991: 20–1). It still needs ‘pedagogic mediation’
(Widdowson, 2003) and is thus better conceived of as a ‘platform from which
to launch corpus-based pedagogical enterprises’ (Swales, 2002: 151).
In Chapter 1, academic vocabulary was defined as ‘a set of options to refer

to those activities that characterize academic work, organize scientific
discourse and build the rhetoric of academic texts’. The main objective of
Chapter 2 has been to operationalize this function-based definition
using frequency-based criteria, and to retrieve potential academic words
(i.e. words that are reasonably frequent in a wide range of academic texts
but relatively uncommon in other kinds of texts and which, as such, might
count as academic vocabulary) from several corpora of academic prose.
The (semi-)automatic method used relies on Rayson’s (2008) data-driven
model, which combines elements of both ‘corpus-based’ and ‘corpus-
driven’ paradigms in corpus linguistics. The keyword procedure was first
used to retrieve a set of words which are distinctive of academic writing.
Second, the criteria of range and evenness of distribution were used as a

sieve to refine the list of potential academic words. Third, a number of
keywords that had failed the dispersion test were selected on the basis of a
semantic criterion: they belonged to one of the six semantic categories
(‘general and abstract’, ‘numbers and measurement’, ‘psychological
actions, states and processes’, ‘names and grammar’, ‘social actions, states
and processes’ and ‘language and communication’) that included most
potential academic words. The method provides a good illustration of the
usefulness of annotation for the development of practical applications such
as the Academic Keyword List.
This methodology has limitations. First, the corpora used are relatively
small and contain a limited number of text types. However, at present no
corpus exists that represents all the varieties of academic discourse.
As Krishnamurthy and Kosem (2007: 370) comment, ‘the one thing that
EAP seems to lack is a corpus that includes all levels of data – from pre-
sessional students’ writing and speech to academic lectures, PhD theses,
and published research articles and books. Such a corpus would need to
include as many disciplines as possible, with sufficient detailed categoriza-
tion to enable the users (teachers or students) to select a customized set of
corpus texts appropriate for their needs’. Second, a limitation inherent in
the keyness approach is the use of a reference corpus. A reference corpus
is itself characterized by a set of distinctive linguistic features, some of which
may be shared with the corpus under study. There is thus a strong case for
using ‘strongly contrasting reference corpora’ (cf. Tribble, 2001: 396).
However, it is likely that a few potential academic words passed unnoticed
because they are also used in fiction, irrespective of differences in meaning
or function. Third, the results are dependent on a number of arbitrary
cut-off points: the probability threshold under which log-likelihood ratio
values are not considered significant, the minimum number of texts in
which a keyword must appear, and the minimum coefficient of dispersion
(see Oakes and Farrow, 2007: 92; Paquot and Bestgen, 2009). The Academic
Keyword List might have looked quite different if other cut-off points had
been adopted.
Despite these limitations, a high proportion of the words included in the
Academic Keyword List match the definition of academic vocabulary, confirm-
ing the accuracy of the retrieval procedure. AKL words have been shown to
fall in semantic categories such as cause and effect, inclusion/exclusion,
evaluation, comparison, importance, quantities, and speech acts, which
clearly relate to academic work, the organization of scientific discourse, or
the building of the argument of academic texts. The method has also made
it possible to appreciate the paramount importance of general service words

in academic prose. A number of general service words take on prominent
rhetorical and organizational functions in academic discourse and their
absence from lists such as Coxhead’s (2000) Academic Word List may be
highly problematic in academic writing courses.
The Academic Keyword List still needs validation. However it unquestion-
ably offers ‘a portal into the complex behaviour and intricate relationships
of individual lexical items’ (Hancioğ lu et al., 2008: 461). Each AKL word
should be subject to a careful corpus-based analysis to confirm its status as
an academic word and establish how it is used in academic prose in terms
of meaning, phraseology, and sentence position.
Part II
Learners’ use of academic

vocabulary
Having clarified what academic vocabulary is by providing a critical over-

view of its many definitions, and proposed a data-driven methodology to
extract potential academic words from corpora, I will now examine EFL
learners’ use of academic vocabulary. Chapter 3 offers a detailed descrip-
tion of the corpora used and the method adopted to compare them:
Granger’s (1996) Contrastive Interlanguage Analysis (CIA). In Chapter 4 the
Academic Keyword List is shown to include a large number of lemmas that can
be used to organize academic texts and structure their content around
logico-semantic relations. The function of ‘exemplification’ is presented in
detail. This illustrates the type of data and the results obtained when the
whole range of lexical strategies available to expert writers for establishing
cohesive links in their texts is examined. Chapter 4 also focuses on the
phraseology of academic words. Chapter 5 aims to test the working hypoth-
esis that upper-intermediate to advanced EFL learners, irrespective of their
mother-tongue background, share a number of linguistic features that
characterize their use of academic vocabulary. The learner corpus used
is a collection of texts taken from the International Corpus of Learner English.
As learner texts are short argumentative essays, I focus on AKL words that
are used to organize discourse and build the rhetoric of a text, rather than
on words that focus on research, analysis and evaluation. Many of the lin-
guistic features that characterize learner writing are shared by learners
from a range of mother tongue backgrounds, and can therefore be assumed
to be developmental. The EFL learners are all learning how to write in a
foreign language, and they are often novice writers in their mother tongue
as well. However, not all the features of their writing are shared across the
different language backgrounds, and the differences may be due to transfer

from the writers’ mother tongues. Chapter 5 therefore ends with a brief
discussion of the potential influence of their mother tongue on French
speakers’ use of academic vocabulary in English.
Chapter 3
Investigating learner language
Whatever the definition adopted, academic vocabulary has generally been

described as a major source of difficulty for EFL learners. In the second
part of this book, I will focus on learners’ use of academic words that serve
typical organizational or rhetorical functions in academic discourse. I will
investigate whether upper-intermediate to advanced EFL learners, irrespec-
tive of their mother tongue backgrounds, share characteristic ways of using
academic vocabulary. In this chapter, I describe the corpora and methodol-
ogy used to pursue these objectives. The learner corpus used is the Interna-
tional Corpus of Learner English, which is among the largest non-commercial
learner corpora currently existing, and contains texts written by learners
with different mother-tongue backgrounds. The method of analysing
learners’ use of academic vocabulary and comparing it with that of
expert writers relies on Granger’s (1996a) Contrastive Interlanguage Analysis.
A subset of the British National Corpus is used as a control corpus and helps
bring to light features of learner language.
3.1. The International Corpus of Learner English
The learner data used consist of ten sub-corpora of the International Corpus
of Learner English version 1 (ICLE) compiled at the University of Louvain,
Belgium, under the supervision of Sylviane Granger (Granger et al., 2002;
Granger, 2003). A computer learner corpus is an electronic collection of
(near-) natural language learner texts assembled according to explicit
design criteria (see Granger, 2009 for the design criteria, and Ellis and
Barkhuizen, 2005 for a discussion of different types of learner data). Each
learner text is documented with a detailed learner-profile questionnaire,
which all the learners were requested to fill in. Learner-profile question-
naires give two types of information: learner characteristics and informa-
tion on the type of task.
International Corpus of Learner English
Shared features Variable features
Learner variables Task variables Learner variables Task variables
Age Medium Gender Topic

Learning context Field Mother tongue Task setting
Proficiency level Genre Region
Length Other FL Timming
L2 exposure Exam
Reference
tools
Figure 3.1 ICLE task and learner variables (Granger et al., 2002: 13)
As shown in Figure 3.1, ICLE texts share a number of learner and task
variables, which were used as corpus-design criteria. All the learners who
submitted an essay to the ICLE were university undergraduates and were
therefore usually in their twenties. They were learners of English as
a Foreign Language rather than as a Second Language and were in their
third or fourth year of university study. On the basis of these external crite-
ria, their level is described as advanced although ‘individual learners and
learner groups differ in proficiency’ (Granger, 2003: 539).1 Learner pro-
ductions have quite a few task variables in common, notably in terms of
medium (writing), genre (academic essay), field (general English rather
than English for Specific Purposes) and length (between 500 and 1,000
words).
Other variables differ (e.g. mother tongue background of learners,
L2 exposure, essay topic and task settings). The ICLE learners represent
11 different mother tongue backgrounds: Bulgarian, Czech, Dutch, Finnish,
French, German, Italian, Polish, Russian, Spanish and Swedish.2 A large
proportion of the learner texts are argumentative, but the ICLE essays cover
a wide range of topics. Topics include Most university degrees are theoretical and
do not prepare students for the real word; In the words of the old song, ‘Money is the
root of all evil’ and Feminists have done more harm to the cause of women than good.
Essays differ in task conditions: they may have been written in timed or
untimed conditions, as part of an exam or not, with reference tools such as
Investigating learner language 69
Table 3.1 Breakdown of ICLE essays
No. of essays No. of words Average no. of

words per essay
Czech (ICLE-CZ) 147 130,768 890

Dutch (ICLE-DU) 196 162,243 828
Finnish (ICLE-FI) 167 125,292 750
French (ICLE-FR) 228 136,343 598
German (ICLE-GE) 179 109,556 612
Italian (ICLE-IT) 79 47,739 604
Polish (ICLE-PO) 221 140,521 636
Russian (ICLE-RU) 194 165,937 855
Spanish (ICLE-SP) 149 99,119 665
Swedish (ICLE-SW) 81 48,060 593
TOTAL 1641 1,165,524 697
grammars and dictionaries or not. Most studies of the ICLE data to date
have not taken these task settings into consideration (see Ädel’s (2008)
analysis of timed and untimed essays for an exception). However, research-
ers in second and foreign language acquisition and teaching insist that the
influence of task type and condition is important (Shaw, 2004; Kroll,
1990).
Task and learner variables can be used to compile homogeneous sub-
corpora. As shown in Table 3.1, this study makes use of ten ICLE sub-
corpora, representing different mother tongue backgrounds. Learner
essays in each sub-corpus were carefully selected in an attempt to control
for the task variables which may affect learner productions: all the texts are
untimed argumentative essays, potentially written with the help of reference tools.
Although essays written without the help of reference tools would arguably
have been more representative of what advanced EFL learners can pro-
duce, I chose to select untimed essays with reference tools as they represent
the majority of learner texts in ICLE.3
The ICLE sub-corpora compiled for this study were analysed with the
Concord tool of the computer software WordSmith Tools 4 (Scott, 2004). The
software was used to examine the occurrences of potential academic words
in context. The Clusters option proved very useful for identifying the most
frequent n-grams or lexical bundles that contained the words being
studied. More information on the types of analyses that can be performed
with WordSmith Tools can be found on Mike Scott’s webpage (http: //www.
lexically.net) and in Scott and Tribble (2006).
3.2. Contrastive Interlanguage Analysis
The methodology most frequently used to analyse learner corpora is

Contrastive Interlanguage Analysis (CIA) (Granger 1996a; Gilquin 2000/2001).
Unlike contrastive analysis, which consists of comparing two or more lan-
guages, CIA compares varieties of one language: native and non-native vari-
eties (L1/L2), or different non-native varieties (L2/L2) (see Figure 3.2).
L1/L2 comparisons bring out the features of non-nativeness in learner pro-
ductions, ‘which at an advanced level are as much (if not more) a question
of over- and under-use of linguistic items or structures as a question of
downright errors’ (Granger et al., 2009: 41). Comparisons of different
interlanguages (e.g. the English of French speakers compared to that of
Dutch speakers), on the other hand, make it possible to assess whether
these features are peculiar to one language group (and thus possibly due to
the influence of the learner’s mother tongue), or shared by several learner
populations (and therefore likely to be developmental or due to other
causes such as teaching methods) (cf. Granger, 2002).
In his preface to Learner English on Computer (Granger, 1998), Leech
describes the native control corpus as ‘a standard of comparison, a norm
against which to measure the characteristics of the learner corpora’
(Leech, 1998: xv). In second language acquisition research, however,
L1/L2 comparisons have generally been criticized for being guilty of the
‘comparative fallacy’ (Bley-Vroman, 1983), i.e. for comparing learner
language to a native speaker norm and thus failing to analyse interlanguage
in its own right (see Lakshmanan and Selinker, 2001; Firth and Wagner,
1997). A strong argument that can be invoked in defence of the CIA model
is that the native speaker norm used in learner corpus research is explicit
and corpus-based (Mukherjee, 2005) rather than implicit and intuition-
based as has been common in second language acquisition (SLA) studies
CIA
L1 > < L2 L2 > < L2
Figure 3.2 Contrastive Interlanguage Analysis (Granger, 1996a)

(see also Granger, 2009: 20). Lakshmanan and Selinker (2001) address
the issue of the comparative fallacy and warn against the danger of
‘judging language learner speech utterances as ungrammatical from the
standpoint of the target grammar without first having compared the
relevant interlanguage utterances with the related speech utterances in
adult native-speaker spoken discourse’ (Lakshmanan and Selinker, 2001:
401). Although they do not dwell on it, Lakshmanan and Selinker’s
point may be understood as a plea for more natural language data
(i.e. corpus data) and a warning against hasty conclusions based on a single
researcher’s intuitions.
Another criticism of L1/L2 comparisons is directed at the idea of the
‘native speaker’ as the target norm (e.g. Piller, 2001; Tan, 2005). Mukherjee,
however, argues that ‘nativeness’ remains a useful construct both for lin-
guistics and for the ELT community, a ‘useful myth’ in Davies’s (2003)
terms. He proposes a usage-based definition of the native speaker based on
three aspects that he regards as central to native-like performance, i.e.
lexico-grammaticality, acceptability and idiomaticity (see Pawley and Syder,
1983):
The term ‘native speaker’ should be used for an abstraction of all

language users (1) who have good intuitions about what is lexicogram-
matically possible in a given language and speak/write accordingly,
(2) who know to a large extent what is acceptable in a given communi-
cation situation and speak/write accordingly, (3) whose usage is largely
idiomatic in terms of linguistic routines commonly used in a given speech
community. If we refer to an individual speaker as a native speaker, this
speaker is thus taken to exemplify the abstract native speaker model on
grounds of his/her language use. (Mukherjee, 2005: 14)
Mukherjee advocates a corpus-approximation to the native speaker norm

and argues that corpus data can be used to describe this norm by ‘general-
izing and abstracting from a vast amount of representative performance
data’ (ibid: 15). In this book, the corpus-approximation to the native speaker
norm is based on British and American English corpora. It should be noted,
however, that the existence of a variety of norms is recognized in learner
corpus research (see Granger, 2009) and that other varieties of English are
sometimes used as control corpora. For example, Gilquin and Granger
(2008) compared the Tswana component of the second edition of the
International Corpus of Learner English to a corpus of South African English
editorials. The control corpora used in this study are described below.
3.3. A comparison of learner vs. expert writing
Carrying out L1/L2 comparisons implies choosing an L1 corpus to be used

as some kind of ‘norm’ with which the learner corpus data can be
compared. In this study, learner writing was compared to expert academic
prose. There is no general agreement, however, on the type of material that
is best suited to serve as a control for a learner corpus. Several researchers
have criticized the use of professional writing in learner corpus research,
arguing that it is ‘both unfair and descriptively inadequate’ (Lorenz,
1999a: 14) and taking a stand against the ‘unrealistic standard of “expert
writer” models’ (Hyland and Milton, 1997: 184). Native student writing is
arguably a better source of comparable data to EFL learner writing if the
aim of the comparison is to describe and evaluate interlanguage(s) as fairly
as possible. It is, however, doubtful whether the findings from such
comparisons could make their way into the classroom. As Leech puts it,
‘native-speaking students do not necessarily provide models that everyone
would want to imitate’ (Leech, 1998: xix). For example, native students
have been shown to produce more dangling participles than EFL learners
(Granger, 1997) and different types of spelling mistakes (Cutting, 2000).
The question of the norm is best addressed by considering the aim of
the comparison. Professional writing has a major role to play in learner
corpus research if instruction and pedagogical applications are the goals
of the comparison between learner and native-speaker productions.
As Ädel put it,
On the one hand, it can be argued that in order to evaluate foreign

learner writing by students justly, we need to use native-speaker writing
that is also produced by students for comparison. On the other hand, it
can also be argued that professional writing represents the norm that
advanced foreign learner writers try to reach and their teachers try to
promote. In this respect, a useful corpus for comparison is one
which offers a collection of what Bazerman (1994: 131) calls ‘expert
performances’. (Ädel, 2006: 206–7)
The International Corpus of Learner English, however, consists of argu-

mentative texts and ‘argumentative essay writing has no exact equivalent in
professional writing’ (De Cock, 2003: 196). It has been suggested that
the ICLE might be compared with ‘a corpus of newspaper editorials, a text
type which combines the advantages of being argumentative in nature and
written by professionals’ (Granger, 1998a: 18, footnote 10). In a number of
studies based on ICLE texts produced by Spanish EFL learners, Neff and
her colleagues used both native-speaker student writing and newspaper
editorials as control corpora (Neff et al., 2004a; 2004b; Neff van Aertselaer,
2008). General corpora have also been used in learner corpus
research. Nesselhauf (2005), for example, made use of the written part
of the British National Corpus to determine the degree of acceptability of
verb-noun combinations that had been extracted from the German subset
of ICLE.
The British National Corpus (BNC) was created to be a balanced reference
corpus of late twentieth century British English. The BNC contains both
written and spoken material. The written component totals about 90 mil-
lion words and includes samples of academic books, newspaper articles,
popular fiction, letters, university essays and many other text types. The text
selection procedure has been described as follows:
In selecting texts for inclusion in the corpus, account was taken of both
production, by sampling a wide variety of distinct types of material, and
reception, by selecting instances of those types which have a wide distri-
bution. Thus, having chosen to sample such things as popular novels, or
technical writing, best-seller lists and library circulation statistics were
consulted to select particular examples of them. (Aston and Burnard,
1998: 28)
The BNC mark-up conforms to the Text Encoding Initiative (TEI) recom-
mendations (Burnard, 2007). Mark-ups include rich metadata on a variety
of structural properties of texts (e.g. headings, sentences and paragraphs),
file description, text profile, as well as linguistic information (morphosyn-
tactic tags, lemmas, etc.).
Three criteria were originally used to select written texts to design
a balanced corpus: domain, time and medium. Domain refers to the
subject field of the texts; time refers to the period when the text was written,
and medium refers to the type of publication (books, newspapers, periodi-
cals, etc.). Lee (2001) criticized the domain categories for being overly
broad and not sufficiently explicit, and developed a new resource called the
BNC Index which contains genre labels for all BNC texts. Table 3.2 gives the
breakdown of the genre categories in the BNC written corpus and shows
that genre labels are often hierarchically nested. Thus, if we want to analyse
texts produced by scholars specializing in natural sciences, we can select all
BNC texts classified under ‘W_ac_nat_science’. On the other hand, if we
are not interested in discipline-specific differences and want to examine
Table 3.2 BNC Index – Breakdown of written BNC genres (Lee 2001)
BNC written No. of % ‘Super genre’ No. of

words files
W_ac_humanities_arts 3,321,867 3.8% 87

W_ac_medicine 1,421,933 1.6% 24
W_ac_nat_science 1,111,840 1.3% Academic 43
W_ac_polit_law_edu 4,640,346 5.3% prose 17.7% 186
W_ac_soc_science 4,247,592 4.9% 138
W_ac_tech_engin 686,004 0.8% 23
W_admin 219,946 0.3% 12
W_advert 558,133 0.6% 60
W_biography 3,528,564 4.0% 100
W_commerce 3,759,366 4.3% 112
W_email 213,045 0.2% 7
W_essay_sch 146,530 0.2% Unpublished 7
W_essay_univ 65,388 0.1% essays 0.3% 4
W_fict_drama 45,757 0.1% 2
W_fict_poetry 222,451 0.3% Fiction 18.6% 30
W_fict_prose 15,926,677 18.2% 432
W_hansard 1,156,171 1.3% 4
W_institut_doc 546,261 0.6% 43
W_instructional 436,892 0.5% 15
W_letters_personal 52,480 0.1% Letters 0.2% 6
W_letters_prof 66,031 0.1% 11
W_misc 9,140,957 10.5% 500
W_news_script 1,292,156 1.5% 32
W_news_brdsht_nat_arts 351,811 0.4% 51
W_news_brdsht_nat_commerce 424,895 0.5% 44
W_news_brdsht_nat_editorial 101,742 0.1% 12
W_news_brdsht_nat_misc 1,032,943 1.2% 95
W_news_brdsht_nat_reportage 663,355 0.8% 49
W_news_brdsht_nat_science 65,293 0.1% 29
W_news_brdsht_nat_social 81,895 0.1% 36
W_news_brdsht_nat_sports 297,737 0.3% News 7.8% 24
W_news_other_arts 239,258 0.3% 15
W_news_other_commerce 415,396 0.5% 17
W_news_other_report 2,717,444 3.1% 39
W_news_other_science 54,829 0.1% 23
W_news_other_social 1,143,024 1.3% 37
W_news_other_sports 1,027,843 1.2% 9
W_news_tabloid 728,413 0.8% 6
W_non_ac_humanities_arts 3,751,865 4.3% 111
W_non_ac_medicine 498,679 0.6% 17
W_non_ac_nat_science 2,508,256 2.9% Non-academic 62
W_non_ac_polit_law_edu 4,477,831 5.1% prose 19.1% 93
W_non_ac_soc_science 4,187,649 4.8% 128
W_non_ac_tech_engin 1,209,796 1.4% 123
W_pop_lore 7,376,391 8.5% 211
W_religion 1,121,632 1.3% 35
TOTAL 87,284,364 100% 3,144

texts produced by professional writers in higher education settings, we can

select all texts whose categorizing labels begin with ‘W_ac’.
The BNC sub-corpus of academic prose in humanities and arts
(W_ac_humanities_arts; henceforth BNC-AC-HUM) totals 3,321,867 words.
It was used as the comparison corpus to ICLE in this study. This sub-corpus
was chosen for two main reasons. First, ICLE texts were produced by
students of humanities; texts in the BNC-AC-HUM are arguably quite
close to the type of text these students might have come across in their first
few years at university. They also have the advantage of corresponding to
the type of writing that learners will try to produce during their university
studies. There are, however, major differences between the two corpora.
First, ICLE is a corpus of unpublished university student essays while BNC-
AC-HUM consists of samples of published articles and books. Second,
student essays rarely total more than 1,000 words while samples in the BNC-
AC-HUM are much longer (from 25,000 to 45,000 words).4 Third, the
topics in BNC-AC-HUM differ from those in ICLE (described in Section 3.1
above). They include The people’s peace; National liberation; The morality of
freedom; Europe in the central middle ages; China’s students; British literature since
1945; What is this thing called science?; Soviet relations with Latin America;
Nietzsche on tragedy, etc. Unlike the ICLE, topics in BNC-AC-HUM appear
only once. Interpreting the results in the light of genre analysis thus
required special care: differences between student essays and expert
writing may simply reflect differences in their communicative goals and
settings (Neff et al., 2004a).
The W_fic and W_news sub-corpora (cf. Table 3.2) were sometimes
used to compare the frequency of words and phrases across ‘super genres’.
The spoken part of the British National Corpus was also regularly consulted
to check whether words and word sequences that were found in learner
writing are more typical of speech or academic writing. The BNC spoken
corpus (BNC-SP) consists of 10,334,947 words and includes a wide variety
of spoken registers, among others, broadcast documentaries and news,
interviews and lectures.
The British National Corpus was accessed via the BNCweb (CQP-edition)
interface developed by Stefan Evert and Sebastian Hoffmann. This web
interface is the result of a ‘marriage of two corpus tools’ (Hoffmann and
Evert, 2006), i.e. the BNCweb, a web-based client developed at the University
of Zurich which allows users to access the BNC by means of a Web browser
(see Lehmann et al., 2000) and the Corpus Query Processor (CQP), a central
component of IMS Open Corpus Workbench, which ‘allows sophisticated
searches both for individual words (which can be matched against regular
expressions) and for lexico-grammatical patterns (using linear grammars

that have access to all levels of annotation)’ (Hoffmann and Evert, 2006:
180). The CQP edition of the BNCweb combines the strengths of both
software packages while overcoming their limitations. It is a marriage
between the efficiency and flexibility of CQP queries, and the user-friendli-
ness of BNCweb with its wide range of query options and display facilities.
Hoffmann et al. (2008) proved further information on the British National
Corpus and the BNCweb interface.
One tool that is particularly useful is Collocations, which picks out
significant co-occurrents of the search word on the basis of a number of
measures of association. Association measures are the most widely used
method of distinguishing between casual and significant co-occurrences.
They compute an association score for each pair of words extracted from
a corpus, which indicates the strength of the association relative to that
expected by chance. Users of the BNCweb can decide to use any of five
different measures: mutual information, MI3, z-score, log-likelihood
and log-log measures.5 They can also sort co-occurrents by decreasing
frequency. A number of other settings are customizable, e.g. maximum
window span, minimum frequency of the co-occurrence, minimum
frequency of the co-occurrent, inclusion of lemma and part-of-speech
information, etc. Figure 3.3 displays a collocation query result. Signifi-
cant co-occurrents are sorted by decreasing log-likelihood values (right
column). The frequency of the co-occurrence is given together with the
number of texts in which it appears.
Mutual information, MI3, z-score, log-likelihood and log-log measures
rank co-occurrences in very different ways (Evert, 2004). McEnery et al.
(2006) compared the various statistical measures provided by BNCweb and
reported that ‘MI and z-scores tend to put too much emphasis on infre-
quent words. In contrast, the log-likelihood, log-log and MI3 tests appear to
provide more realistic collocation information’ (McEnery et al., 2006: 220).
The log-likelihood test was therefore used to study the phraseology of aca-
demic words in expert and learner writing. The log-likelihood scores can be
directly compared with critical values of a chi-square distribution table
(see Oakes, 1998: 176). Rayson et al. (2004), however, focused on the
comparison of word frequencies between corpora and suggested that,
in order to extend applicability of the frequency comparisons to low
expected values, use of a threshold value of 15.13 is preferred at p < 0.01.
Co-occurrence frequencies can be quite low and I therefore followed
Rayson et al.’s (2004) suggestion. Co-occurrences were analysed in windows
of one to three words to both the left (3L-1L) and the right (1R-3R).
Investigating learner language
Figure 3.3 BNCweb Collocations option
77
Log-likelihood measures are strongly dependent on corpus size and word

frequencies. Co-occurrence statistics are therefore not comparable across
corpora of different sizes such as the British National Corpus and the Interna-
tional Corpus of Learner English. The ICLE sub-corpora are in fact too small
for a statistical analysis of co-occurrences to be meaningful. Academic words
are not high-frequency words such as make, do and take and co-occurrences
often appear less than three times. In his study of the statistics of word
co-occurrences, Evert argued that ‘data with co-occurrence frequency
f < 3, i.e. the hapax and dis-legomena, should always be excluded from
the statistical analysis’ (Evert, 2004: 133) as expected frequencies and
p values for low frequency words are distorted in unpredictable ways.
A distributional (Evert, 2004) or frequency-based approach (Nesselhauf,
2004) was adopted to examine the phraseology of academic words in
learner writing. Word pairs in the ICLE sub-corpora were classified
into three groups according to their co-occurrence status in professional
academic writing:
– word pairs that are statistically significant co-occurrents in the academic

sub-corpus of the BNC (BNC-AC). In a pilot study, I found that learners’
word pairs were sometimes not statistically significant in BNC-AC-HUM
just because the co-occurrence was not frequent enough. As soon as
more data was used, the co-occurrence proved significant. I therefore
decided to use the larger BNC-AC instead of the BNC-AC-HUM to judge
the acceptability and typicality of EFL learners’ phraseological sequences
(see Section 5.2.3).
– word pairs that appear in the BNC-AC but are not statistically significant
co-occurrents;
– word pairs that do not appear in the BNC-AC;
Word pairs that did not appear in the BNC-AC were presented to a native
speaker of English for acceptability judgments.
This chapter has described the data and methodology used to investigate
the use of academic vocabulary in writing by EFL learners. Special care
has been taken to select a set of learner essays from the International
Corpus of Learner English that is as homogeneous as possible and to control
for a number of variables that have been found to influence such writing.
The learner corpus can be compared to the humanities and arts academic
sub-corpus of the British National Corpus to identify learner-specific
features of the use of academic vocabulary. The BNC spoken corpus can
also sometimes be useful to check whether specific words and phrases that
appear in the learner sub-corpora are more typical of speech or writing.
The method used to investigate learners’ use of academic vocabulary
is based on Contrastive Interlanguage Analysis (CIA) and combines compari-
sons of learner and native-speaker writing, and comparisons of different
learner interlanguages.
CIA is very popular among researchers in the field of learner corpus
research and has helped to highlight an unprecedented number of fea-
tures that characterize learner interlanguages. To date, however, most stud-
ies have used the technique only to compare a learner corpus and a native
reference corpus, rather than to explore different learner corpora in the
same target language. The studies that have compared more than one
interlanguage have usually focused on learners from one mother-tongue
background, and used data from one or two other learner populations
only to check whether the features they have identified are L1-specific
(and thus possibly transfer-related) or are shared by other learners. L2/L2
comparisons involving many different first languages are, however,
indispensable if we want to identify the distinguishing features of
learner language at a given stage of development (Bartning, 1997). In the
following chapters, I try to make the most of CIA by comparing academic
vocabulary in ten learner corpora representing different mother-tongue
backgrounds.
Chapter 4
Rhetorical functions in expert

academic writing
This chapter deals with academic vocabulary that serves specific rhetorical
and organizational functions in expert academic writing. Section 4.1 focuses
on the Academic Keyword List and shows that a high proportion of AKL words
can fulfil these functions in academic prose. It lists several steps which are
necessary to turn the AKL into a tool that can be used for curriculum and
materials design (most notably a phraseological analysis of AKL words).
Section 4.2 presents a detailed analysis of exemplificatory devices in
academic writing. This serves as an illustration of the type of data and results
obtained when the whole range of lexical strategies available to expert
writers to organize scientific discourse are examined. For lack of space it is
impossible to describe in similar detail all the functions that were analysed
in the BNC-AC-HUM so as to provide a basis for comparison to EFL learner
writing. Section 4.3 briefly comments on the types of lexical devices used by
expert writers to serve four additional functions: ‘expressing cause and
effect’, ‘comparing and contrasting’, ‘expressing a concession’ and ‘refor-
mulating: paraphrasing and clarifying’ and aims to characterize the phrase-
ology of rhetorical functions in academic prose.
4.1. The Academic Keyword List and rhetorical functions
The functional syllabus has a long tradition in English language teaching

(see Wilkins, 1976; Weissberg and Buker, 1978). Jordan (1997: 165) reports
that most of the textbooks that were published in Britain in the 1980s and
1990s that followed a product approach to academic writing were orga-
nized according to language functions such as explanation, definition,
exemplification, classification, cause and effect, and comparison and con-
trast (e.g. Jordan, 1999). However, they were rarely based on principled
selection criteria, relying instead on the writers’ perceptions of good

practice in academic writing.
Unlike textbooks adopting a functional approach, courses which
use vocabulary as the unit of progression, introduce new words according
to principles such as frequency and range of occurrence. Nation explains
that
such courses generally combine a “series” and a “field” approach to

selection and sequencing. In a series approach, the items in a course
are ordered according to a principle such as frequency of occurrence,
complexity or communicative need. In a field approach, a group of items
is chosen and the course covers them in any order that is convenient,
eventually checking that all the items are adequately covered. Courses
which use vocabulary as the unit of progression tend to break vocabulary
lists into manageable fields, (. . . ), according to frequency, which are
then covered in an opportunistic way. (Nation, 2001: 386)
Most pedagogical applications of Coxhead’s (2000) Academic Word List to

date have adopted this particular approach, using the frequency-based AWL
sub-lists as fields (e.g. Obenda, 2004; Huntley, 2006).
There is a need for teaching materials that merge the two types of syllabus
design, thus adopting a ‘functional-product’ approach (Jordan, 1997: 165)
to academic writing while introducing new vocabulary according to princi-
pled criteria such as frequency and range of occurrence. This is precisely
where the Academic Keyword List has a role to play.
As explained in Section 2.4, the Academic Keyword List requires pedagogic
mediation: it is a platform which can inform a functional syllabus for
academic writing, but it needs to be organized. As argued by Martinez
et al. (2009: 193), ‘a list based on semantic and pragmatic criteria would
perhaps be more useful than lists built solely on frequency criteria.’ Sinclair,
however, warns us that ‘there is no assumption that meaning attaches
only to the word’ (Sinclair, 2004b: 160). Similarly, Siepmann (2005: 86)
comments that ‘neat compartmentalizing of meanings or functions can do
no more than partially capture a complex reality’ in which any word or
multi-word sequence may express more than one discourse relation. This
being said, the results of the automatic semantic analysis of the Academic
Keyword List revealed that a significant proportion of AKL words fall into
semantic categories that correspond to the rhetorical functions typical of
scientific discourse, e.g. A2.2. Affect: Cause – connected, A4. Classification,
A5. Evaluation, A6. Comparing, Q2.2. Speech acts (see Section 2.4). A close
Rhetorical functions in expert academic writing 83
examination of the words classified into these semantic categories made it

possible to identify twelve rhetorical functions that dovetail with the func-
tions typically treated in EAP textbooks adopting a functional approach to
academic writing:
1. Exemplification, e.g. example, for example, illustrate

2. Cause and effect, e.g. cause, consequence, result
3. Comparison and contrast, e.g. contrast, difference, same
4. Concession, e.g. although, despite, however
5. Adding information, e.g. first, further, in addition to
6. Expressing personal opinion, e.g. appropriate, essential, major
7. Expressing possibility and certainty, e.g. likely, possibility, unlikely
8. Introducing topics and ideas, e.g. discuss, examine, subject
9. Listing items, e.g. first, second, third
10. Reformulating – paraphrasing and clarifying, e.g. namely
11. Quoting and reporting, e.g. define, report, suggest
12. Summarizing and drawing conclusions, e.g. conclude, conclusion, summary.
The next step is to analyse all words that may serve one of these twelve
functions in context, with special emphasis on their phraseology. Multi-
word sequences have been shown to provide ‘basic building blocks for
constructing spoken and written discourse’ (Biber and Conrad, 1999:
185) and to correlate closely with the complex communicative demands
of a particular genre, thus contributing to its lexical profile (Biber et al,
1999; 2004; Luzón Marco, 2000). A phraseological analysis will also make
it possible to investigate how academic vocabulary contributes to this
‘shared scientific voice or “phraseological accent” which leads much tech-
nical writing to polarise around a number of stock phrases’ (Gledhill,
2000: 204). It will examine phrasemes, i.e. syntagmatic relations between
at least two lemmas, contiguous or not, written separately or together,
which are typically syntactically closely related and constitute ‘“preferred”
ways of saying things’ (Altenberg, 1998: 122). This is because such
phrasemes:
– form a functional (referential, textual or communicative) unit (e.g.

Burger, 1998);
and
– display arbitrary lexical restrictions (e.g. Mel’čuk, 1998);

and/or
– are characterized by a certain degree of semantic non-compositionality

(e.g. Barkema, 1996);
– display arbitrary restrictions on the word forms that can be used to instan-
tiate at least one of the lemmas involved;
– display a certain degree of syntactic fixity.
The phraseological analysis used here is based on Granger and Paquot’s

(2008a) classification of phraseological units into three main categories:
referential phrasemes, textual phrasemes and communicative phrasemes.
Referential phrasemes are used to convey a content message: they refer to
objects, phenomena or real-life facts. They include lexical and grammatical
collocations, idioms, similes, irreversible bi- and trinomials, compounds
and phrasal verbs. Textual phrasemes are typically used to structure and
organize the content (i.e. referential information) of a text or any type of
discourse; they include grammaticalized sequences such as complex prepo-
sitions and complex conjunctions, linking adverbials and textual sentence
stems. Communicative phrasemes are used to express feelings or beliefs
towards a propositional content or to explicitly address interlocutors, either
to focus their attention, include them as discourse participants or influence
them. They include speech act formulae, attitudinal formulae, common-
places, proverbs and slogans.
In this chapter and the next, I focus on the vocabulary of five rhetorical
functions — exemplification, cause and effect, comparison and contrast,
concession, and reformulating — with occasional forays into other func-
tions. Apart from being essential rhetorical functions in academic prose,
these functions should be among the least sensitive to the text type differ-
ences discussed in Section 3.3. The use of academic words is compared in
the BNC-AC-HUM corpus (a corpus of book samples and journal articles
written by experts in the fields of arts and humanities) and the International
Corpus of Learner English (a corpus of short argumentative essays produced
by EFL learners of English). As the BNC includes truncated texts, it would
not be reasonable to quantitatively compare the words that are used to
serve the function of ‘summarizing and drawing conclusions’. This func-
tion is typically localized in the last paragraphs of a piece of academic
writing and might thus be absent from a number of BNC texts. Nor is it
reasonable to focus on functions such as ‘reporting and quoting’ and
‘expressing personal opinion’. Unlike experts writing in their field, the
learners who produced the argumentative essays were not supposed to

show that they were familiar with the subject by referring to or quoting
from the literature. By contrast, they were explicitly encouraged to give
their personal opinions (topics for the essays include: ‘Some people say
that in our modern world, dominated by science, technology and industri-
alism, there is no longer a place for dreaming and imagination. What is
your opinion?’ and ‘In the 19th century, Victor Hugo said: “How sad it is to
think that nature is calling out but humanity refuses to pay heed.” Do you
think it is still true nowadays?’) (see Section 3.1).
Several researchers in applied corpus linguistics have examined language
features in general reference corpora and compared the distributions and
patterns found in actual language use with the presentations of the same
features in teaching materials such as textbooks or grammars (e.g. Carter,
1998b; Conrad, 2004; Römer, 2004a; 2004b; 2005). They have often found
considerable mismatches between naturally-occurring language and the
type of language that is presented as a model in teaching materials (Römer,
2008: 4). I therefore consulted several EAP textbooks (Harris Leonhard,
2002; Jordan, 1999; Lonon Blanton, 2001; Oshima and Hogue, 2006;
Ruetten, 2003; Zemach and Rumisek, 2005; Zwier, 2002), and listed all the
lexical items that are commonly taught to serve rhetorical functions. The
textbooks-derived list appeared to be very different from the words found
in the Academic Keyword List. For example, the AKL includes a number of
words and phrasemes that are commonly used as exemplifiers: the word-
like units for example and for instance, the noun example, the verbs illustrate
and exemplify, the preposition such as, the adverb notably and the abbrevia-
tion e.g. Other lexical items listed in EAP materials but not found in the
AKL are the expressions by way of illustration and to name but a few, the nouns
illustration and a case in point and the preposition like.
I decided to include the lexical items found in EAP textbooks in my study
of academic vocabulary for two main reasons. First, it is not quite clear why
these items are taught to novice writers and EFL learners while other much
more frequent lexical items that are used to express the same rhetorial func-
tions are missing. Frequency, however, may not be the sole criterion to
include lexical items in the curriculum (see Section 1.1.1). Some of these
non-AKL words may be used in very specific lexico-grammatical environ-
ments, have a very restricted meaning or prove particularly difficult for learn-
ers. It is only by examining their frequency and patterns of use in expert and
learner corpora that I shall be able to assess whether these words and phrase-
mes should be part of an academic vocabulary and added to the AKL.
Second, their inclusion in the description of a specific function in

academic writing allows us to approximate as closely as possible what
Hoffmann (2004: 190) referred to as ‘conceptual frequency’, so that the
frequency of each exemplificatory lexical item can be calculated as a pro-
portion of the total number of exemplifiers. As Wray stated in her book on
formulaic language,
To capture the extent to which a word string is the preferred way of

expressing a given idea (for this is at the heart of how prefabrication is
claimed to affect the selection of a message form), we need to know not
only how often that form can be found in the sample, but also how often
it could have occurred. In other words, we need a way to calculate the
occurrences of a particular message form as a proportion of the total
number of attempts to express that message. (Wray, 2002: 30)
This approach should help us move towards ‘understanding the intersec-

tion of form and function’ (Swales, 2002: 163) in academic prose.
The Academic Keyword List is based on native corpora only, which has
limitations for an analysis of learner writing, especially if conceptual
frequency is to be investigated. EFL learners may use different lexical
devices than native writers to serve rhetorical functions. For example, they
repeatedly use word-like units such as in a nutshell, in brief and all in all for
summarizing and concluding, which are quite rare in academic prose.
A keyword procedure such as that described in Section 2.3.1 for the auto-
matic extraction of potential academic words was therefore adopted to
identify words and word sequences that EFL learners frequently use, but
which are not favoured by expert academic writers. The ICLE corpus was
compared to the BNC-AC-HUM to extract distinctive words in the learner
corpus. The resulting list was analysed to identify words that might serve
one of the 12 rhetorical functions listed above. In learner corpus research,
positive keywords are often referred to as overused words and negative
keywords are said to be underused. These two terms are neutral, and sim-
ply reflect the fact that a word is more/less frequent in learner writing.
Examples of overused words which do not belong to the AKL but are
employed to serve rhetorical functions in learner writing include like, thing,
say, let, I, really, firstly, secondly, thirdly, opinion, maybe, say, sure, but, thanks,
always, so and why (see De Cock, 2003 for a keyword analysis of the French
sub-corpus of ICLE). Learner-specific word sequences are discussed in
Section 5.2.3.
The final lists of words that may be used to serve one of the five
selected rhetorical functions are given below. The words in italics are not
part of the Academic Keyword List. They were identified on the basis of a close
examination of EAP materials and a keyword analysis of learner corpora.
They are included in the corpus-based analyses presented in this chapter
and in Chapter 5 to assess the adequacy of the treatment of rhetorical func-
tions in EAP textbooks and investigate whether the AKL should be supple-
mented with additional academic words.
Exemplification: example, illustration, a case in point, illustrate, exemplify,

such as, like, for example, for instance, e.g., notably, to name but a few, by
way of illustration
Comparison and contrast: analogy, comparison, (the) contrary, contrast,
difference, differentiation, distinction, distinctiveness, (the) opposite, parallel,
parallelism, resemblance, (the) reverse, (the) same, similarity, alike, analogous,
common, comparable, contrary, contrasting, different, differing, distinct, dis-
tinctive, distinguishable, identical, opposite, parallel, reverse, same, similar,
unlike, compare, contrast, correspond, differ, distinguish, differentiate,
look like, parallel, resemble, analogously, by/in comparison, by/in contrast, by way
of contrast, comparatively, contrariwise, contrastingly, conversely, correspond-
ingly, differently, distinctively, identically, in the same way, likewise, on the con-
trary, on the other hand, *on the other side, *on the opposite, parallely, reversely,
similarly, as against, as opposed to, by/in comparison with, contrary to,
*in contrary to, like, in contrast to/with, in parallel with, unlike, versus, as,
whereas, while, as … as, compared with/to, in the same way as/that
Cause and effect: cause (n.), consequence, effect, factor, implication,
origin, outcome, root, reason, result, source, arise from/out of, bring about,
cause (v.), contribute to, derive, emerge, follow from, generate, give rise
to, induce, lead, make sb/sth do sth, prompt, provoke, result in/from, stem
from, trigger, yield, consequent, responsible, as a result of, as a consequence of,
because of, due to, in consequence of, in (the) light of, in view of, on account
of, on the grounds that, owing to, thanks to, accordingly, as a consequence, as a
result, by implication, consequently, hence, in consequence, so, thereby,
therefore, thus, as, because, for, on the grounds that, since, so that, is why
Concession: however, nevertheless, nonetheless, though (adv.), yet, although,
though (conj.), even though, even if, albeit, despite, in spite of,
notwithstanding
Reformulating – paraphrasing and clarifying: i.e., that is, that is to say, in
other words, namely, viz., or more precisely, or more accurately, or rather
4.2. The function of exemplication
This section presents a detailed analysis of the academic words that are
used by expert writers to serve the rhetorical function of exemplification.
Siepmann (2005) showed that exemplificatory discourse markers occur in
all kinds of discursive prose, and are particularly frequent in humanities
texts. He argued, however, that as an object of study, ‘exemplification con-
tinues to be the poor relation of other rhetorical devices’ and that ‘such
neglect has led to a commonly held view in both the linguistic and the
pedagogic literature that exemplification is a minor textual operation, sub-
ordinate to major discoursal stratagems such as “inferring” and “proving”’
(Siepmann, 2005: 111). Coltier (1988) remarked that examples and exem-
plification merit close investigation at two levels: the exemplificatory strate-
gies adopted (i.e. when and why are examples introduced into a text); and
the wording of the example (i.e. the choice of exemplifiers). This section
deals with the latter and focuses on the lexical items used by expert writers
to give an example. For a rhetorical perspective on exemplifiers in native
writing, see Siepmann (2005: 112–18).
The Academic Keyword List (AKL) includes a number of words and multi-
word sequences that are commonly used as exemplificatory discourse mark-
ers: the mono-lexemic or word-like units for example and for instance, the
noun example, the verbs illustrate and exemplify, the preposition such as, the
adverb notably and the abbreviation e.g. Other lexical items commonly listed
in textbooks and EAP/EFL materials, but not found in the AKL, are the
expressions by way of illustration and to name but a few, the nouns illustration
and a case in point and the preposition like. Table 4.1 gives the absolute fre-
quencies of these words in the BNC-AC-HUM as well as their relative fre-
quencies per 100,000 words and the percentage of exemplificatory discourse
markers they represent. In Figure 4.1, the lexical items are ordered by
decreasing relative frequency in the academic corpus. The most frequent
exemplifiers in professional academic writing are the mono-lexemic phrase-
mes such as and for example, plus the noun example, which occur more than
35 times per 100,000 words. Almost half of the exemplifiers — for instance,
like, illustrate, e.g. and notably — occur with a relative frequency of between
5 and 20 occurrences per 100,000 words. The verb exemplify and the noun
illustration are less frequent (around 2.3 occurrences per 100,000 words)
while the adverbials to name but a few and by way of illustration as well as the
noun case in point appear very rarely in the BNC-AC-HUM.
I will now discuss my main findings on the exemplificatory functions
of prepositions, adverbs and adverbial phrases, and then focus on the
exemplificatory use of nouns and verbs.
Table 4.1 Ways of expressing exemplification

found in the BNC-AC-HUM
Abs. freq. % Rel. freq.
Nouns
example 1285 21.6 38.7

illustration 77 1.3 2.3
(BE) a case in point 18 0.3 0.5
TOTAL NOUNS 1380 23.2 41.5
Verbs
illustrate 259 4.4 7.8

exemplify 79 1.3 2.4
TOTAL VERBS 338 5.7 10.2
Prepositions
such as 1494 25.0 45.0
like 532 8.9 16.0
TOTAL PREP. 2026 34.0 61.0
Adverbs
for example 1263 21.2 38.0

for instance 609 10.2 18.3
e.g. 259 4.3 7.8
notably 77 1.3 2.3
to name but a few 4 0.1 0.1
by way of illustration 3 0.1 0.1
TOTAL ADVERBS 2215 37.2 66.7
TOTAL 5959 100 179.4
50
45
40
35
30
25
20
15
10
5
0
e
w
n
e
y
e
te
e
ify
g.
n
t
as
in
pl
tio
bl
nc
lik
fe
pl
tio
ra
e.
pl
po
m
ta
am
ra
ch
ta
ta
ra
st
em
xa
no
ns
st
illu
su
in
st
ex
bu
re
illu
ex
illu
ri
se
e
fo
fo
ca
of
na
ay
a
BE
to
w
by
Figure 4.1 Exemplification in the BNC-AC-HUM

4.2.1. Using prepositions, adverbs and adverbial phrases to exemplify

As shown in Figure 4.1, the complex preposition such as is the most
frequent exemplifier in the BNC-AC-HUM (see Example 4.1). Unlike in
other genres (such as speech and fiction), it is much more frequent than
the preposition like in professional academic writing (Example 4.2).
4.1. This is the arrangement in Holland whereby various institutions such as media,
schools, cultural organisations, welfare services, and hospitals are duplicated,
and run by the separate catholic and protestant communities.
4.2. Surrealist painting had publicity value, especially when executed by a showman
like Salvador Dali, who married the former wife of the poet Paul Éluard.
Similarly, for example is twice as frequent as for instance. These two adverbials are
commonly classified as ‘code glosses’ in metadiscourse theory (see Section 1.3)
as they are used to ‘supply additional information, by rephrasing, explaining,
or elaborating what has been said, to ensure the reader is able to recover the
writer’s intended meaning’ (Hyland, 2005: 52). Code glosses are ‘interactive
resources’ in Hyland’s typology of metadiscourse: they are features used to
‘organize propositional information in ways that a projected target audience
is likely to find coherent and convincing’ (ibid, 50). In a phraseological
approach to academic vocabulary, they fall into the category of textual phrase-
mes as they are mono-lexemic multi-word units, i.e. multi-word units that are
equivalent to single words and which fill only one grammatical slot, with an
organizational – exemplificatory – function. In the BNC-AC-HUM, for example
and for instance are typically used within the sentence, enclosed by commas,
especially after the subject. But they can also follow the subject of the exempli-
fying sentence, while remaining essentially cataphoric in nature (i.e. pointing
forward to the example) as shown in Examples 4.3 and 4.4.
4.3. Such associations of sexual deviance and political threat have a long history
sedimented into our language and culture. The term ‘buggery’, for example,
derives from the religious as well as sexual nonconformity of an eleventh-century
Bulgarian sect which practised the Manichaean heresy and refused to propagate
the species; the OED tells us that it was later applied to other heretics, to whom
abominable practices were also ascribed.
4.4. The small mammals living today in many different habitats and climatic zones
have been described, so that the associations between faunal types and ecology
are well documented [ . . . ]. Woodland faunas, for instance, are distinct from
grassland faunas, and tropical faunas distinct from temperate faunas, and
when these and more precise distinctions are made it is possible to correlate and
even define ecological zones by their small mammal faunas.
Table 4.2 The use of ‘for example’ and ‘for

instance’ in the BNC-AC-HUM
Cataphoric marker Endophoric marker
for example 1,185 (93.8%) 78 (6.2%)

for instance 588 (96.5%) 21 (3.5%)
For example and for instance can also function as endophoric markers and
refer back to an example given before, as illustrated in Example 4.5. This
use is, however, much less frequent (see Table 4.2).
4.5. Thirdly, the debates over how far to forge a strategy either for winning power or
for promoting economic development in a post-revolutionary society have not
been satisfactorily resolved, and indeed perhaps cannot be, given that counter-
revolutionary response to any successful formula will ensure that it will be that
much more difficult to apply the same tactics in another situation. Such is the
relation which Nicaragua bears to El Salvador, for example.
In Mieux écrire en anglais, Laruelle (2004: 96–7) writes that for example
should be placed in the initial position if the whole sentence has an exem-
plificatory function, while the adverbial should follow the subject, between
commas, if only the subject is the example. This statement, however, is not
confirmed by corpus data. Example 4.3 clearly shows that for example need
not be placed in the initial position to introduce an exemplificatory
sentence.
Like nouns and verbs, mono-lexemic adverbial phrases can also have
their own phraseological patterns. Three verbs, i.e. consider (f[n, c]1 = 13;
log-likelihood = 92.5), take (f[n, c] = 7; log-likelihood = 19.1) and see
(f[n, c] = 19; log-likelihood = 71.7) are significant left co-occurrents of for
example in the BNC-AC-HUM. They are used in the second person of the
imperative. The verbs consider and take are typically used with for example to
introduce an example that is discussed in further detail over several
sentences:
4.6. It is worth pausing here momentarily to observe that such legally provided rem-
edies can be morally justified even when applied to people who are not subject to
the authority of the government and its laws. Consider for example the law of
defamation. Assuming that it is what it should be, it does no more than incor-
porate into law a moral right existing independently of the law. The duty to
compensate the defamed person is itself a moral duty. Enforcing such a duty
against a person who refuses to pay damages is morally justified because it
implements the moral rights of the defamed. One need not invoke the authority
of the law over the defamer to justify such action. The law may not have
authority over him.
4.7. But the concept of compresence is far from clear. If it implies that no time-lag is
detectable between elements of an experienced “complex”, then this is true only in
a very limited sense. Take, for example, the perceptual experience that I have
while looking at this bunch of carnations arranged in a vase on the table in the
middle of the room. I see this “complex” as one whole. But while I am looking at
it my eyes constantly wander from one flower to the next, pausing at some, ignor-
ing others, picking out the details of their shapes and colours. Finally, without
taking my eyes off the flowers, I may move the vase closer, or walk around the
table and look at the flowers from different angles. The scene will keep constantly
changing. As a result, I shall experience a succession of different “complexes of
qualities” but I shall still be looking at the same bunch of flowers.
Hyland describes this type of imperative as directives with a rhetorical

purpose that ‘can steer readers to certain cognitive acts, where readers
are initiated into a new domain of argument, led through a line of reason-
ing, or directed to understand a point in a certain way’ (2002a: 217).
He categorizes them as interactional resources, and more specifically as
engagement markers, i.e., ‘devices that explicitly address readers, either to
focus their attention or include them as discourse participants’ (Hyland,
2005:53).
The verb see is frequently used in professional academic writing as an
endophoric marker to refer to tables, figures, or other sections of the
article or to someone else’s ideas or publications (Hyland, 1998, 2002a,
2005; Hyland and Tse, 2007). The use of the second person imperative
see ‘allow[s] academic writers to guide readers to some textual act, referring
them to another part of the text or to another text’ (Hyland, 2002a: 217).
In the BNC-AC-HUM, 63 per cent of the occurrences of the sequence see for
example appear between brackets as in Example 4.8:
4.8. Afro-Caribbean and Asian children are indeed painfully aware that many
teachers view them negatively and some studies have documented reports of
routine racist remarks by teachers (see for example Wright in this volume).
Swales et al. (1998) examined a corpus of research articles in ten disciplines

(art history, chemical engineering, communication studies, experimental
geology, history, linguistics, literary criticism, philosophy, political science
and statistics) and found that second person imperative see was the most
frequent imperative form across disciplines. Similarly, in his study of

directives in academic writing, Hyland (2002a) analysed a corpus of
240 published research articles, seven textbook chapters and 64 project
reports written by final year Hong Kong undergraduates and found that the
second person imperative see represented 45 per cent of all imperatives in
that corpus. Note that in both studies, the use of the imperative varied
across disciplines.
The advantage of adopting a phraseological approach to rhetorical func-
tions, and hence metadiscourse resources, appears quite clearly here. The
sequences take/consider for example consist of two metadiscourse resources in
Hyland’s (2005) categorization scheme: the imperative forms take and con-
sider are interactional resources, and more specifically engagement mark-
ers, while for example is a code gloss. Similarly, see is an endophoric marker
in see for example. In our phraseological framework, the sequences take/
consider/see for example are textual phrasemes as they form functional —
textual — units and display arbitrary lexical restrictions.
The adverb notably can be regarded as a typical academic word: Figure 4.2
shows that it is much more frequent in academic writing than in the other
genres. It is typically preceded by a comma (Example 4.9) and is qualified
by the adverb most in 15.2 per cent of its occurrences in the BNC-AC-HUM
(Example 4.10).
4.9. Some bishops, notably Jenkins of Durham, Sheppard of Liverpool, and

Hapgood of York, have spoken out about deprivation in the inner cities, the
miners’ strike, and the need for government to show a greater compassion for,
and understanding of, the poor.
50
freq. per million
40
words
30
20
10
0
academic news fiction speech
writing
Figure 4.2 The distribution of the adverb ‘notably’ across genres

4.10. At leading public schools, most notably Eton, there is a tradition of providing
MPs, government ministers, and prime ministers.
The abbreviation e.g. (or less frequently eg) stands for the Latin ‘exempli
gratia’ and means the same as for example. It is quite common in the
BNC-AC-HUM, in which 65.7 per cent of its occurrences are between
brackets:
4.11. Direct curative measures (e.g. flood protection) are clearly within the domain
of a soil conservation policy.
In contrast to for example and for instance, the great majority of occurrences
of e.g. introduce one or more noun phrases rather than full clauses:
4.12. It may help to refer the patient to other agencies (e.g. social services, a psycho-
sexual problems clinic, self-help groups).
When e.g. is used without brackets, it is preceded by a comma:
4.13. Primary industries are those which produce things directly from the ground, the
water, or the air, e.g. farming.
As shown in Figure 4.1, the textual phrasemes by way of illustration and

to name but a few are quite rare in the BNC-AC-HUM. In fact, these expres-
sions are very infrequent in all types of discourse. Figures 4.3 and 4.4 show
the distribution of the two phrasemes in four main genres of the British
1
freq. per million
words
0.5
0
n
s
ch
ic
tio
w
em
ee
ne
fic
ad
sp
ac
Figure 4.3 The distribution of ‘by way of illustration’ across genres

freq. per million

words
0.5
s
ic
ch
n
w
tio
em
ee
ne
fic
ad
sp
ac
Figure 4.4 The distribution of ‘to name but a few’ across genres
National Corpus (BNC), namely academic writing, fiction, newspaper texts,

and speech. Some 36 per cent of the occurrences of by way of illustration in
the BNC (i.e. 10 out of a total of 28) appear in academic texts, and only one
occurrence comes from speech. The expression to name but a few is more
frequent than by way of illustration in the whole BNC, but only 12.8 per cent
of its occurrences (10 out of 78) appear in academic writing. No instance of
to name but a few was found in speech.
4.2.2. Using nouns and verbs to exemplify

Nouns and verbs are used to give examples in specific phraseological
patterns. The noun which is most frequently used in this way is example,
which is much more common than illustration or a case in point. Table 4.3
shows that it is as frequently used as its connective counterpart, the textual
phraseme for example, in the BNC-AC-HUM.
The significant verb co-occurrents of the noun example in the BNC-
AC-HUM are listed in Table 4.4. The verb be is the most frequent verb
co-occurrent of example in windows of one to three words to both the left
Table 4.3 The use of ‘example’ and ‘for example’ in the

BNC-AC-HUM
example for example
Absolute freq. % Absolute freq. %
BNC-AC-HUM 1285 50.43 1263 49.57

Table 4.4 Significant verb co-occurrents

of the noun ‘example’ in the BNC-AC-
HUM
Left co-occurrents Right co-occurrents
Verb freq. Verb freq.
be 139 be 84
provide 26 illustrate 14
take 29 show 21
give 12 give 15
cite 5 suggest 12
consider 12 quote 6
illustrate 7 include 7
show 9 provide 8
see 10 concern 6
serve 5
will 16
can 15
would 13
(3L-1L) and the right (1R-3R). Be is, however, twice as frequent in the left
window. When example is preceded by the verb be, it mainly functions as a
retrospective label, i.e. it refers back to the exemplifying element which is
given as the subject. The noun example may refer back directly to a noun
phrase (Example 4.14) or to the demonstrative pronoun this which further
points to a previous exemplifying sentence (Example 4.15).
4.14. Vision is a better example of a modular processing system.

4.15. The designer at Olympia chose to represent the race by the moment
before it started, as Polygnotos showed the sack of Troy in its
aftermath. This is the supreme surviving example of the early classical taste
for stillness and indirect narrative.
By contrast, when the noun example is introduced by there + BE (11%) or here

+ BE (15%), it functions as an advance label which refers forward to
a following example (underlined):
4.16. In addition, of course, choices can result from lengthy weighing of odds. Here
is a simple example of the complexity at issue. I am driving along a narrow
main road, used by fast-moving traffic, with my children in the back
seat. A car some distance ahead strikes a large dog but does not stop,
leaving the creature walking-wounded but in obvious distress.
My children, seeing what occurred, cry out. I glance in the rear-view

mirror to see other cars close behind; slowing down but then speed-
ing up again. I do not stop.
When example is the subject of the verb be, it always functions as an advance
label. It is often qualified by an adjective (see Examples 4.17 to 4.19)
and the exemplified item is generally introduced by the preposition of
(Examples 4.18 and 4.19). In Example 4.19, the exemplified item is the
pronoun this which refers back to the previous sentences.
4.17. The prime example is the Dada movement, whose nihilistic work is now
admired for its qualities of imagination.
4.18. The clearest example of emotive language is poetry, which is entirely
concerned with the evocation of feelings or attitudes, and in which the writer’s
and reader’s attention is not, or should not be, directed at any of the objective
relationships between words and things.
4.19. Until the seventeenth century many, even most, European frontiers were very
vague, zones in which the claims and jurisdictions of different rulers and their
subjects overlapped and intersected in a complex and confusing way. This was
especially true in eastern Europe, where many states were large and central
governments were usually less effective at the peripheries of their territories than
in the west. The most striking example of this is perhaps the frontier in the
Danubian plain between the Ottoman empire and the Habsburg
territories in central Europe.
Copular clauses using the noun example consist of textual sentence stems
(An example of Y is . . . ) and rhemes (. . . is an example of Y). Textual sentence
stems are routinized fragments of sentences which serve specific textual or
organizational functions. They consist of sequences of two or more clause
constituents, and typically involve a subject and a verb, e.g. An example
of Y is . . . . They typically have an empty slot for the following object or
complement. Rhemes typically consist of a verb and its post-verbal elements,
which do not contain any thematic element (e.g. . . . is another issue).
Four other verbs, namely provide, give, illustrate and show (given in italics
in Table 4.4), are significant left and right co-occurrents of the noun
example. The verbs take, cite, consider, see and serve are only significant left
co-occurrents, while the verbs suggest, concern, quote and include and the
modals will, can and would are significant right co-occurrents. The verbs
provide, take, give, cite, consider, see, serve and include often co-occur with the
noun example to form textual — exemplificatory — phrasemes. The verb
provide can be used in active or passive structures, but active structures in

which the subject is the example (Example 4.20) are more frequent. The
verb cite is more often found in a passive structure in which example
functions as a retrospective label (Example 4.21). The two verbs often form
rhemes with the noun example:
4.20. The Magdalen College affair, for example, provides a classic example of
passive resistance.
4.21. A famous passage of art criticism can be cited as one example entirely beyond
dispute.
The verb take is mainly used with the noun example in sentence-initial
exemplificatory infinitive clauses (68.9%; Example 4.22). It also occurs in
active structures with a personal pronoun subject (13.79%; Example 4.23)
and in imperative sentences (13.79%). When used in the imperative, it
generally appears in the second person (Example 4.24) and there is only
one occurrence in the first person plural in the BNC-AC-HUM. By contrast,
the verb consider is mainly used with the noun example in imperative
sentences (70%), usually second person imperatives (Example 4.25) and
less frequently first person plural. The verb see always co-occurs with the
noun example in the second person of the imperative (Example 4.26). It is
never used to introduce an example, but always as an endophoric marker
to direct the reader’s attention to an example elsewhere in the text.
4.22. To take one example, at the beginning of the project seven committees were
established, each consisting of about six people, to investigate one of a range of
competing architectural possibilities.
4.23. In accordance with the theme of this chapter, I shall simply use ‘stylistics’
as a convenient label (hence the inverted commas) for the branch of
literary studies that concentrates on the linguistic form of texts, and I shall
take four different examples of this kind of work as alternatives to the Prague
School’s and Jakobson’s approach to the relationship between linguistics and
literature.
4.24. Take the example of following an object by eye-movements (so-called
‘tracking’).
4.25. Consider the following example.
4.26. The most important vowel is set to two or more tied notes in a phrase designed
to increase the lyrical expression (see Example 47, above).
The verb include is used with the plural form of the noun example in subject
position to introduce an incomplete list of examples in object position:
4.27. The floral examples include a large lotus calyx and two ivy leaves joined by a
slight fillet.
Another set of verb co-occurrents of example is used to discuss the

examples given in a text. These include quote (Example 4.28), suggest and
show (Example 4.29) to talk about conclusions that can be drawn from the
examples, and illustrate (Example 4.30) to show what something is like or
that something is true.
4.28. Thirdly, in all the examples quoted here, there is a sense in which all observers
see the same thing.
4.29. The example shows that the objector’s neat distinction between adjudicative
and legislative authorities is mistaken.
4.30. This example clearly illustrates the theory dependence and hence fallibility of
observation statements.
These significant co-occurrences illustrated in Examples 4.28 to 4.30 do not

qualify as collocations as the meaning of the verb is not restricted by the
noun example, and the combinations are fully explicable in semantic and
syntactic terms. However, these co-occurrences are frequently used in
adverbial clauses (e.g. as this example suggests . . . ) and sentence stems (e.g.
this example [adv.] illustrates . . .) which describe examples, give more detail
about them, and make suggestions on their basis.
The advantage of using the noun example rather than the adverbials for
example or for instance is that it allows the writer to evaluate the example in
terms of its suitability, e.g. good, outstanding, fine, excellent (Example 4.31) or
typicality, e.g. classic, typical, prime (Example 4.32). The adjectives above and
following are used to situate the example in the text (Example 4.33): used
with the noun example, they function as endophoric markers in Hyland’s
(2005) typology of metadiscourse features. Table 4.5 gives the 24 adjectives
that significantly co-occur with the noun example in the BNC-AC-HUM.
4.31. An outstanding example of this type of narrative is Vargas Llosa’s Conversa-

tion in the Cathedral, which pivots around a four-hour conversation
between two characters, the whole novel being made up of dialogue and
narrative units generated in waves by the central conversation, as the two
men’s review of their past lives sparks off inner thoughts and recollections and
conjures up other conversations and dramatised episodes.
4.32. The prime example is the Dada movement, whose nihilistic work is now
admired for its qualities of imagination.
Table 4.5 Adjective co-occurrents of the

noun ‘example’ in the BNC-AC-HUM
Adjective freq. Adjective freq.
good 38 fine 9
above 15 notable 8
following 18 isolated 8
well-known 10 interesting 9
obvious 16 known 7
classic 11 excellent 6
typical 13 prime 7
outstanding 10 trivial 5
extreme 12 previous 6
clear 16 remarkable 5
simple 13 numerous 5
striking 9 single 6
4.33. Consider the following example.
There is a case for considering the co-occurrence classic example as a free

combination, i.e. a word combination that is semantically fully composi-
tional, syntactically fully flexible and collocationally open: the adjective clas-
sic is used with a meaning that is listed as its first sense in the Longman
Dictionary of Contemporary English (LDOCE4) (1. TYPICAL: having all the
features that are typical or expected of a particular thing or situation) and the
Oxford English Dictionary Online2 (1. of the first class, of the highest rank or
importance; approved as a model; standard, leading). However, the adjective is
only commonly used in this sense with a very limited number of nouns—
example, mistake and case3. This is clearly an illustration of the difficulty
of separating the senses that a word has in isolation from those that it
acquires in context (see Barkema, 1996). Following Granger and Paquot
(2008a: 43), I classified co-occurrences of this type as collocations,
i.e. usage-determined or preferred syntagmatic relations between two
lexemes in a specific syntactic pattern. Both lexemes make a separate
semantic contribution to the word combination but they do not have the
same status. The ‘base’ of a collocation is semantically autonomous and is
selected first by a language user for its independent meaning. The second
element, i.e. the ‘collocate’ or ‘collocator’, is selected by and semantically
dependent on the ‘base’. The co-occurrence prime + example is a clear
example of a collocation: the adjective prime has two core meanings –
‘most important’ and ‘of the very best quality or kin’ – but a prime example
is ‘a very typical example of sth’. Collocations represent 8.3 per cent of the
types and 6.87 per cent of the tokens of adjective + example co-occurrences.
Other adjectives form semantically and syntactically fully compositional

sequences with the noun example. Thus, the meaning of an outstanding exam-
ple is composed of the meanings of the adjective outstanding and the noun
example. This does not mean, however, that they are pedagogically uninter-
esting. First, they constitute ‘preferred ways’ of qualifying example as they
are repeatedly used with this noun. Second, in her study of verb + noun
combinations, Nesselhauf (2005) has shown that free combinations
are prone to erroneous or, at least, unidiomatic use in learners’ writing.
Similarly, Lorenz (1998; 1999a) has pointed out that German learners’ use
of adjectives, irrespective of their phraseological status, differs from that of
native students.
The added value of using statistics, and more specifically association
measures, to analyse the common co-occurrences of a word in a large
corpus is made clear by comparing the significant adjective co-occurrents
of the noun example (listed in Table 4.5) with attested adjectival collocates
(as given by Siepmann, 2005: 137). In addition to most of the adjectives
given in Table 4.5, Siepmann listed a number of adjectives that do not
appear even once in the 87-million word written part of the BNC (beguiling,
consummate, eminent, apposite, anodyne, happy, alarming, crass, cautionary), and
adjectives which occur only once or twice in the corpus (exquisite, well-worn,
edifying, emotive, awe-inspiring, glittering, hideous). To use Sinclair’s (1999: 18)
words, these co-occurrents are best described as ‘singularities’ and do not
represent ‘the habitual usages of the majority of users’.
Apart from verbs and adjectives, other significant co-occurrents of
the noun example are found in professional academic writing. Left co-
occurrents include determiners and the pronoun this. Indefinite determin-
ers (a, another and one) are more frequent than the definite article the
with example. The is mainly used when the noun is qualified by a superlative
adjective or preceded by ordinals such as first, next and last (Example
4.34).
4.34. The first two examples discussed below illustrate different ways in which the
linguistic model is used to develop a narrative model, and (. . . ).
The pronoun this is typically used as a subject with the verb be to refer
back to an example given in a previous sentence (see Example 4.15 above).
Right co-occurrents include the preposition of and the pronoun this.
In 40 per cent of its occurrences, the noun example is directly followed by
the preposition of which introduces the idea, class or event exemplified,
which in turn is often determined by a demonstrative (Example 4.31 above)
or pronominalized to refer back to a previous sentence.
These findings support Gledhill’s (2000) view that there may be a very
specific phraseology and set of lexico-grammatical patterns for function
words in academic discourse. Function words seem to display co-occurrence
preferences just as content words do (also see Renouf and Sinclair’s (1991)
notion of a ‘collocational framework’). These findings also provide strong
evidence against the use of stopword lists when extracting co-occurrences
from corpora as there is a serious danger of missing a whole set of phraseo-
logical patterns (Clear, 1993).
The verbs illustrate and exemplify can also be used as exemplifiers. The
verb illustrate is used with the meaning of ‘to be an example which shows
that something is true or that a fact exists’ (Example 4.35) or ‘to make the
meaning of something clearer by giving examples’ (Example 4.36)
(LDOCE4). The verb exemplify is used with the meanings of ‘to be a very
typical example of something’ and ‘to give an example of’.
4.35. The narratives of the Passio Praeiecti and of the Vita Boniti both have
their peculiarities, and it is possible that the appointment of Praeiectus and
the retirement of Bonitus were less creditable than their hagiographers claim.
Nevertheless they do illustrate the complexities of local ecclesiastical politics.
4.36. My aim will be to illustrate different ways of approaching literature through its
linguistic form, ways involving the direct application of linguistic theory and
linguistic methods of analysis in order to illuminate the specifically literary
character of texts.
Both verbs are more frequent in academic writing than in any other genre.
Figure 4.5 compares the relative frequencies of the two verbs in academic
writing with three main genres represented in the British National Corpus.
The verb illustrate is not uncommon in news but a quick look at its concor-
dances shows that a significant proportion of its occurrences are used not
to introduce an example, but with the meaning of ‘to put pictures in a
book, article, etc’ (Example 4.37). Exemplify is very rarely used in other
genres.
4.37. Also in the pipeline is an Australian children ‘s TV series based on Gumnut

Factory Folk Tales (written, illustrated and published by Chris Trump).
(BNC-NEWS)
Figure 4.5 also shows that the verb illustrate is more frequent than
exemplify in professional academic writing. The frequencies of the two verb
lemmas, their word forms and tenses in the BNC-AC-HUM4 were computed
in the way described by Granger (2006). Table 4.6 shows that there is no
140
frequency per million word
120
100
80
60
40
20
0
Academic News Fiction Speech
illustrate exemplify
Figure 4.5 The distribution of the verbs ‘illustrate’ and ‘exemplify’ across
genres
Table 4.6 The use of the lemma ‘illustrate’ in

the BNC-AC-HUM
The lemma illustrate BNC-AC-HUM
illustrate 97 37.45%
simple present 36 13.89%
infinitive 61 23.55%
illustrated 84 32.43%
simple past 7 2.7%
present/past perfect 0 0%
past participle 77 29.73%
illustrates 63 24.32%
illustrating 15 5.79%
continuous tense 2 0.77%
-ing clause 13 5%
Total 259 100%
Nr of words 3,321,867
Relative freq. per 100,000 7.8
words
major difference in proportion between the verb forms illustrate, illustrated

and illustrates. When used in active structures, the verb is often preceded
by a non-human subject such as example, figure, table, case or approach
(Example 4.38). Almost all occurrences of the past participle appear in the
passive construction BE illustrated by/in (Example 4.39).
4.38. This example clearly illustrates the theory dependence and hence fallibility
of observation statements.
4.39. The contrast between the conditions on the coast and in the interior is illus-
trated by the climatic statistics for two stations less than 30 km (18.5 miles)
apart.
The sentence-initial adverbial clause To illustrate this/the point/X, . . .

(Example 4.40) represents 2.7 per cent of the occurrences of the lemma
illustrate in the BNC-AC-HUM.
4.40. How many observations make up a large number? (. . . ) Whatever the answer
to such a question, examples can be produced that cast doubt on the invariable
necessity for a large number of observations. To illustrate this, I refer to the
strong public reaction against nuclear warfare that followed the dropping of the
first atomic bomb on Hiroshima towards the end of the Second World War.
In the BNC-AC-HUM, illustrate significantly co-occurs with the noun exam-

ple [LogL = 112] in a 3L-1L window, and with the nouns point [LogL =
168.78], example [LogL = 49.65] and fig. [LogL = 45.08] in a 1R-3R window.
The noun point is used as an object of illustrate which refers back to an idea
put forward in a previous sentence:
4.41. For most of this century it is those disorders gathered together under the head-
ing of ‘schizophrenia’ that have been used as the paradigm for trying to describe
and understand psychosis. Yet even in this form, or forms – for many would
prefer to talk of ‘the schizophrenias’ – there is still no universally accepted set of
criteria for diagnosis. To illustrate the point, one of the present authors was
recently asked to review a paper submitted to a prominent psychiatric journal,
proposing a new set of rules for diagnosing schizophrenia. In the course of their
analysis the authors determined the extent to which their proposed criteria
agreed with those contained in other existing diagnostic schemes – some ten or
twelve of them. Correlations varied over a very wide range.
The noun figure (and the abbreviation fig.) is used either as the subject of
the verb illustrate or in the passive structure illustrated in Figure x. This
co-occurrence is even more marked in academic genres such as social
sciences, natural sciences and medicine which rely extensively on figures,
tables and diagrams (see Examples 4.42 and 4.43).
4.42. Figure 1 illustrates the spread of results for the alcoholics and the controls.
(W_ac_medicine BNC sub-corpus, see Table 3.2)
4.43. The advantages of the system are illustrated in Fig. 8.2 and, like the Peruvian
example discussed above, the fallow stage is contributing to crop productivity
as well as providing protection against soil erosion. (W_ac_soc_science
BNC sub-corpus, see Table 3.2)
The adverbs well, better, best and clearly are sometimes used with illustrate to
evaluate the typicality or suitability of the example (Example 4.44). The
verb illustrate also co-occurs significantly with how to introduce a clause
(Example 4.45), with the verb serve (Example 4.46), and with the modals
will, can and may (Example 4.47).
4.44. The history of the English monarchy well illustrates both the importance and
the unimportance of war.
4.45. We recently did a simple experiment which happens to illustrate how children’s
knowledge of where an object is determines their behaviour.
4.46. While our discussion in this chapter is of the doctrine of neutrality as such,
Rawls ‘ treatment of it will serve to illustrate the problems involved.
4.47. This prejudice against close involvement with the secular government may be
illustrated by an anecdote related in the about Molla Gurani.
Table 4.7 shows that the lexico-grammatical preferences of the verb

exemplify differ from those of illustrate. A large proportion of the occurrences
Table 4.7 The use of the lemma ‘exemplify’

in the BNC-AC-HUM
The lemma exemplify BNC-AC-HUM
exemplify 9 11.4%
simple present 5 6.33%
infinitive 4 5%
exemplified 53 67%
simple past 8 10%
present/past perfect 1 1.26%
past participle 44 55.7%
exemplifies 15 19%
exemplifying 2 2.53%
continuous tense 0 0%
-ing clause 2 2.53%
Total 79 100%
Nr of words 3,321,867
Relative freq. per 100,000 2.38
words
of the lemma exemplify are –ed forms, and more precisely past participle
forms, of the verb. In the BNC-AC-HUM, the verb significantly co-occurs
with the verb be and the conjunction as in a 3L-1L window, and with the
prepositions by and in in a 1R-3R window. These significant co-occurrents
highlight the preference of the verb for the passive structure BE exemplified
by/in (Example 4.48) and the lexico-grammatical pattern as exemplified by/in
(Example 4.49). Exemplify is also often used after a noun phrase, preceded
by a comma (Example 4.50). Unlike illustrate, the verb exemplify does not
co-occur significantly with nouns.
4.48. The association of this material with the clerk is clearly exemplified by
Chaucer’s wife of Bath’s fifth husband, the clerk Jankyn, who, in the Wife of
Bath’s Prologue, reads antifeminist material to her from his book Valerie and
Theofraste.
4.49. He assumed, without argument, that science, as exemplified by physics, is supe-
rior to forms of knowledge that do not share its methodological characteristics.
4.50. Piaget’s claim that thinking is a kind of internalised action, exemplified in the
assimilation-accommodation theory of infant learning mentioned above, is really
a global assumption in search of some refined, detailed and testable expression.
4.2.3. Discussion
The description of exemplifiers presented here does not aim at exhaustive-
ness in professional academic writing but at typicality. The corpus-based
methodology adopted has highlighted a number of lexical items that are
repeatedly used as exemplifiers in academic writing. The function of exem-
plification can be fulfilled by a whole spectrum of single words (the preposi-
tion like, the adverb notably, the abbreviation e.g.) and word combinations,
i.e. word-like units or mono-lexemic phrasemes (the preposition such as,
the adverbials for example and for instance), sentence stems (An example of
Y is X; Examples include . . .) and rhemes (… is an example of . . .; . . . provides
a classic example of . . .), imperative clauses (Consider, for example . . .) and
sentence-initial infinitive clauses (To take one example, . . .). A large majority
of these word combinations are semantically and syntactically fully compo-
sitional; the exceptions are a few collocations such as prime example and
classic example. They are, however, characterized by their high frequency of
use and can be described as ‘preferred ways’ (Altenberg, 1998) of giving an
example in professional academic writing.
Siepmann (2005) analysed a 9.5-million word corpus of academic
writing, but did not make use of statistical methods. He enumerated every
single occurrence of word sequences used to give an example and listed

rare events such as the infinitive clauses to paint an extreme example and
to pick just one example (a single occurrence in his corpus), the co-occurrence
example + is afforded by and the expression for the sake of example. It may be
argued that privileging exhaustiveness over typicality in corpus linguistic
research is counter-productive, and that such an approach results in too
much — unreliable — information. Siepmann, for example, wrote that
English authors have a large range of exemplificatory imperatives at their

disposal, using the direct second-person imperative VP ~ as well as the less
imposing hortative let us + VP and the inclusive let me + VP. Of these last
two, the former is around five times more frequent than the latter,
showing a high degree of audience sensitivity among authors. (Siepmann,
2005:120)
A closer look at his frequency data (reprinted in Table 4.8) shows, however,
that the co-occurrences see/take/consider + for example account for 89.4 per
cent of the imperatives Siepmann found. First person imperatives are
extremely rare and let me + VP only appeared three times in the 9.5-million
word corpus of professional academic writing he used. Although a large
range of exemplificatory imperatives may be available to language users, only
a very limited set of these are widespread in professional academic writing.
Table 4.8 The use of imperatives in academic writing (based on

Siepmann, 2005: 119)
Imperatives in academic writing Frequency %
(for example/for instance) see (for example/for instance) NP 200 66.2

(for example) consider (for example) NP 54 17.9
take, for (another) example, NP 16 5.3
Consider a(n) (ADJ) example/instance 7 2.3
take the example of (as examples of NP) 5 1.7
consider (as an example) NP 3 1
take, as an example, NP 1 0.3
as an illustration (of this)/ by way of (brief) illustration, 2 0.7
consider NP (2) 2 0.7
Take (even) NP (2) 4 1.3
Let us (now) take + (as) + DET + ADJ + example(s) 4 1.3
Let us consider + DET + ADJ + example(s) 2 0.7
Let me give (you) (but) one example 1 0.3
Let me offer + DET (+ ADJ+) example 1 0.3
Let us consider, for the sake of illustration, NP
Total 302 100
The analysis of exemplifiers presented here also validates the method

used to design the Academic Keyword List. The exemplificatory lexical items
which were extracted are of two types:
— the most frequent exemplifiers in academic writing (such as, example, for
example and for instance) (see Figure 4.1 discussed earlier in this chapter);
— lexical items which are not as frequent as such as, example, for example and
for instance, but which are more frequent in academic prose than in any
other genres (illustrate, exemplify, e.g. and notably).
The preposition like can be used to fulfil an exemplificatory function in

academic writing but it is much more common in other genres. The nouns
illustration and case in point are quite characteristic of formal textual genres,
but they are infrequent. The expressions to name but a few and by way of
illustration are rare in all types of discourse.
4.3. The phraseology of rhetorical functions in

expert academic writing
This section briefly comments on the types of lexical devices used by expert
writers to serve the functions of ‘expressing cause and effect’, ‘comparing
and contrasting’, ‘expressing a concession’ and ‘reformulating’ in an
attempt to give a wider overview of the way academic vocabulary is used to
serve specific rhetorical functions. It aims to characterize the phraseology
of these rhetorical functions in academic prose.
Table 4.9 shows that the lexical means of expressing a concession consist
of single word adverbs (e.g. however, nevertheless, yet), (complex) conjunc-
tions (e.g. although, even though) and (complex) prepositions (e.g. despite, in
spite of). Similarly, reformulation is most frequently achieved by means of
the mono-lexemic units that is and in other words, the abbreviation i.e. and
the adverb namely (Table 4.10).
Adverbs, prepositions and conjunctions also represent a large proportion
of the lexical devices used by expert writers to serve the functions of
‘expressing cause and effect’ (Table 4.11) and ‘comparing and contrasting’
(Table 4.12). However, these two functions can also be realized by means of
nouns, verbs and adjectives in specific phraseological or lexico-grammatical
patterns. As shown in Table 4.11, nouns account for 32.5 per cent of
the lexical means used to express a cause or an effect in academic writing,
e.g. cause, factor, source, effect, result, consequence, outcome and implication.
Table 4.9 Ways of expressing a concession in

the BNC-AC-HUM

Adverbs
however 3,353 28.6 100.9
nevertheless 676 5.8 20.3
nonetheless 66 0.6 2.0
though ADV 144 1.2 4.3
yet 1,817 15.5 54.7
TOTAL ADVERBS 6,056 51.6 182.3
Conjunctions
although 2,292 19.5 69.0
though CONJ 1,721 14.7 51.8
even though 248 2.1 7.5
(even if) 451 3.8 13.6
albeit 80 0.7 2.4
TOTAL CONJ. 4,792 40.86 144.26
Prepositions
despite 681 5.8 20.5
in spite of 159 1.4 4.8
notwithstanding 39 0.3 1.2
TOTAL PREP. 879 7.5 26.46
TOTAL 11,727 100 353
Table 4.10 Ways of reformulating, paraphrasing

and clarifying in the BNC-AC-HUM
i.e. 330 25.1 9.9

that is 375 28.5 11.3
that is to say 81 6.2 2.4
in other words 210 16.0 6.3
namely 187 14.2 5.6
viz. 21 1.6 0.6
or more precisely 12 0.9 0.4
or more accurately 7 0.5 0.2
or rather 91 6.9 2.7
TOTAL 1,314 100 39.6
Verbs are also common: cause, bring about, contribute to, lead to, result in, derive,
emerge, and stem. Patterns involving nouns (e.g. contrast, comparison, difference
and distinction) and verbs (e.g. contrast, differ, distinguish and differentiate)
are often used to compare and contrast but adjectives (e.g. different, distinct,
differing and distinctive) play a more prominent role and account for
29.2 per cent of the lexical means used by expert writers (Table 4.12).
Table 4.11 Ways of expressing cause and effect in

the BNC-AC-HUM
nouns
cause 755 2.8 22.7

factor 550 2.1 16.6
source 1,175 4.4 35.4
origin 500 1.9 15.0
root 183 0.7 5.5
reason 1,802 6.8 54.2
consequence 450 1.7 13.6
effect 1,830 6.9 55.0
result 813 3.1 24.5
outcome 143 0.5 4.3
implication 411 1.7 12.4
TOTAL NOUNS 8,612 32.52 259.25
Verbs
cause 570 2.2 17.2
bring about 125 0.5 3.8
contribute to 276 1.0 8.3
generate 227 0.8 6.8
give rise to 101 0.4 3.0
induce 67 0.2 2.0
lead to 671 2.5 20.2
prompt 115 0.4 3.5
provoke 161 0.6 4.9
result in 327 1.2 9.8
yield 129 0.5 3.9
make sb/sth do sth 171 0.6 5.2
arise from/out of 145 0.6 4.4
derive 476 1.8 14.3
emerge 466 1.8 14.0
follow from 74 0.3 2.2
trigger 56 0.2 1.7
stem 95 0.4 2.9
TOTAL VERBS 4,252 16.06 128.0
Adjectives
consequent 53 0.2 1.6
responsible (for) 344 1.3 10.4
TOTAL ADJ. 397 1.49 12
Prepositions
because of 599 2.3 18.0
due to 195 0.7 5.9
as a result of 196 0.7 5.9
as a consequence of 22 0.1 0.7
in consequence of 1 0.0 0.0
in view of 66 0.3 2.0
owing to 52 0.2 1.6
in (the) light of 109 0.4 3.3
thanks to 35 0.1 1.0
on the grounds of 22 0.1 0.7
on account of 24 0.1 0.7
TOTAL PREP. 1,321 4.99 39.8
Adverbs
therefore 1,412 5.3 42.5
accordingly 130 0.5 3.9
consequently 143 0.5 4.3
thus 1,767 6.7 53.2
hence 283 1.1 8.5
so 1,894 7.2 57.0
thereby 182 0.7 5.5
as a result 101 0.4 3.0
as a consequence 20 0.1 0.6
in consequence 14 0.0 0.4
by implication 35 0.1 1.1
TOTAL ADVERBS 5,981 22.59 180.04
Conjunctions
because 2,207 8.3 66.4
since 955 3.6 28.7
As 5 883 3.3 26.6
for 1,036 3.9 31.2
so that 696 2.6 21.0
PRO is why 52 0.2 1.6
that is why 22 0.1 0.7
this is why 18 0.1 0.5
which is why 12 0.0 0.4
on the grounds that 83 0.312 2.5
TOTAL CONJ. 5,912 22.33 177.97
TOTAL 26,475 100 796.99
Table 4.12 Ways of comparing and contrasting found

in the BNC-AC-HUM
Nouns
resemblance 116 0.4 3.5
similarity 212 0.7 6.4
parallel 147 0.5 4.4
parallelism 19 0.1 0.6
analogy 175 0.6 5.3
contrast 522 1.8 15.7
comparison 311 1.1 9.4
difference 1,318 4.5 39.7
differentiation 76 0.3 2.3
distinction 595 2.0 17.9
distinctiveness 10 0.0 0.3
(the) same 559 1.9 16.8
(the) contrary 28 0.1 0.8
(the) opposite 85 0.3 2.6
(the) reverse 56 0.2 1.7
TOTAL NOUNS 4,229 14.46 127.3
Adjectives
same 2,580 0.9 77.7
similar 1,027 3.5 30.9
analogous 55 0.2 1.7
common 1055 3.6 31.8
comparable 223 0.8 6.7
identical 137 0.5 4.1
parallel 52 0.2 1.6
alike 98 0.3 2.9
contrasting 63 0.2 1.9
different 2,496 8.5 75.1
differing 72 0.3 2.2
distinct 278 0.9 8.4
distinctive 163 0.6 4.9
distinguishable 33 0.1 1.0
unlike 43 0.1 1.3
contrary 27 0.1 0.8
opposite 127 0.4 3.8
reverse 23 0.1 0.7
TOTAL ADJECTIVES 8,552 29.24 257.44
Verbs
resemble 138 0.5 4.1
correspond 137 0.5 4.1
look like 102 0.3 3.1
compare 278 0.9 8.4
parallel 56 0.2 1.7
contrast 137 0.5 4.1
differ 242 0.8 7.3

distinguish 404 1.4 12.2
differentiate 74 0.3 2.2
TOTAL VERBS 1,568 5.36 47.2
Adverbs
similarly 394 1.3 11.9
analogously 2 0.0 0.1
identically 2 0.0 0.1
correspondingly 29 0.1 0.9
parallely 0 0.0 0.0
likewise 118 0.4 3.5
in the same way 56 0.2 1.7
contrastingly 3 0.0 0.1
differently 97 0.3 2.9
by/in contrast 185 0.6 5.6
by contrast 116
in contrast 69
by way of contrast 0 0.0 0.0
by/in comparison 23 0.1 0.7
by comparison 14 0.4
in comparison 9 0.3
comparatively 69 0.2 2.1
contrariwise 4 0.0 0.1
distinctively 25 0.1 0.7
on the other hand 372 1.3 11.2
(on the one hand) 136 0.5 4.1
on the contrary 95 0.3 2.9
quite the contrary 2 0.0 0.1
conversely 62 0.2 1.9
TOTAL ADVERBS 1,674 5.72 50.39
Prepositions
like6 2,812 9.6 84.6
unlike 244 0.8 7.3
in parallel with 8 0.0 0.2
as opposed to 121 0.4 3.6
as against 46 0.2 1.4
in contrast to/with 82 0.3 2.5
in contrast to 73 2.2
in contrast with 9 0.3
versus 53 0.2 1.6
contrary to 66 0.2 2.00
by/in comparison with 52 0.2 1.6
in comparison with 14 0.4
in comparison to 4 0.1
by comparison with 21 0.6
in comparison with 14 0.4
TOTAL PREP. 3,484 11.91 104.88
(Continued)
Table 4.12 Cont’d
Conjunctions
as 5,045 17.2 151.9
while 1264 4.3 38.0
whereas 442 1.5 13.3
TOTAL CONJ. 6,751 23.08 203.23
Other expressions
as . . . as 2,766 9.5 83.3
in the same way as/that 38 0.1 1.1
compared with/to 155 0.5 4.7
compared with 113 3.4
compared to 42 1.3
CONJ compared to/with 32 0.1 1.0
as compared to/with 11 0.3
when compared to/with 20 0.6
if compared to/with 1 0.0
TOTAL 29,249 100 880.5
Table 4.13 shows a co-occurrence analysis of several nouns that are used
to express cause or effect in academic prose: reason, implication, effect,
outcome, result and consequence. Most of the co-occurrents listed form quite
flexible and compositional textual sentence stems with their nominal node,
as illustrated in the following examples:
4.51. Another direct result of conquest by force of arms was the development of
slavery, which was widespread up to the beginning of the nineteenth
century.
4.52. This may be an effect of the uncertainty around television’s textuality; but it
is now an extremely limiting effect for the development of theory.
4.53. Health for women was held to be synonymous with healthy motherhood. This
had important implications for the debate over access to birth control informa-
tion and abortion – rarely were demands for freer access to birth control
information devoid of maternalist rhetoric.
4.54. The reason is that with Van Gogh art and life are not merely conditioned by
each other to a greater degree than with any other artist, but actually merge
with each other.
4.55. However it is first necessary to consider another important consequence of the
view of psychosis being presented here.
Table 4.13 Co-occurrents of nouns expressing cause or effect in the BNC-AC-HUM
Table 4.13a: reason
Adjective + reason Verb + reason Determiner + reason
good have this

main give another
sufficient see (no) reason to + verb
obvious base on believe
other provide suppose
different find doubt
alleged examine prefer
simple Auxiliary verb + reason think
tactical be fear
political seem accept
major reason + verb reason(s) for . . . .
additional be supposing
right justify believing
valid reason + preposition thinking
similar for accepting
fundamental against rejecting
real Preposition (2L) + reason adopting
independent for There + verb + reason
special reason + conjunction There is (no) reason to
possible why There seems no reason
historical which There are (DET/ADJ) reasons
particular that
Table 4.13b: implication
Adjective + implication Auxiliary verb + implication
important be
practical implication + preposition
political of
serious for
social Preposition + implication
Verb + implication with
have Determiner + implication
carry this
implication + verb implication + conjunction
be that
Table 4.13c: effect
Adjective + effect Verb + effect
adverse have
overall produce
good achieve
profound create
knock-on cause
indirect Auxiliary verb + effect
far-reaching be
damaging effect + verb
cumulative be
dramatic depend on
immediate occur
excellent effect + preposition
long-term of
practical on
particular upon
powerful Determiner + effect
special this
full effect + conjunction
general That
important Noun and effect
other cause
Table 4.13d: outcome
Adjective + outcome Verb + outcome
logical influence
eventual determine
likely represent
different affect
inevitable outcome + verb
final be
outcome + preposition Auxiliary verb + outcome
of be
Determiner + outcome
this
Table 4.13e: result
Adjective + result Verb + result
inevitable produce
direct achieve
immediate yield
beneficial give
eventual bring
interesting lead to
practical show
main present
similar interpret
result + preposition obtain
of have
from result + verb
Preposition (3L) + result be
with Auxiliary verb + result
Determiner + result be
this
Table 4.13f: consequence
Adjective + consequence Verb + consequence
inevitable have
unintended suffer (from)
unfortunate avoid
direct consider
important outweigh
necessary discuss
political consequence + verb
natural be
bad follow
practical ensue
social Auxiliary verb +
consequence
likely be
major consequence + preposition
possible of
Determiner + consequence for
this Preposition (3L) +
consequence
another with
consequence + conjunction of
that -
The word combinations illustrated in Examples 4.51 to 4.55 are good illus-
trations of what Sinclair and his followers have called ‘extended units of
meaning’ where lexical and grammatical choices are ‘intertwined to build
up a multi-word unit with a specific semantic preference, associating the
formal patterning with a semantic field, and an identifiable semantic pros-
ody, performing an attitudinal and pragmatic function in the discourse’
(Tognini-Bonelli, 2002: 79). These extended units of meaning are catego-
rized as textual phrasemes in Granger and Paquot’s (2008a) typology as
they function as sentence stems to organize the propositional content at a
metadiscoursal level.
A few co-occurrences are collocations as illustrated by Example 4.56.
The verb carry is used in a delexical sense in the collocation carry implica-
tions, which basically means have implications.
4.56. We may certainly talk of animals, in the absence of speech, “consciously intend-
ing” or being compassionate, both of which carry implications of understand-
ing to some degree.
The variety of adjectives used with the nouns reason, implication, effect, outcome,
result and consequence is also worthy of note and bears testimony to their prom-
inent role in argumentation (Soler, 2002; Tutin, forthcoming). A large pro-
portion of those are evaluative adjectives (e.g. fundamental, good, important,
inevitable, major, serious, sufficient) and are used to express the ‘writer’s atti-
tude or stance towards, viewpoint on, or feelings about the entities or propo-
sitions that he or she is talking about’ (Hunston and Thompson, 2000: 5).
Like nouns, verbs that serve specific rhetorical or organizational func-
tions in academic prose generally enter compositional and flexible
sequences. Table 4.14 gives the most frequent lexical bundles containing
one of the four verbs suggest, appear, prove and tend typically used to express
possibility or certainty. Most clusters are lexico-grammatical patterns which
function as textual sentence stems (e.g. it has been suggested that, it appears
that), sentence-initial adverbial clauses (e.g. as suggested above, . . . ) or
rhemes (e.g. . . . proved a complete failure). It is worth noting that each verb
form has its own ‘distinctive collocational relationship’ (Sinclair, 1999: 16),
and that these constitute different form/meaning pairings, and thus differ-
ent complete units of meaning. For example, the –ed form of suggest (unlike
that of appear, tend or prove) is mainly used in passive constructions. It is often
used to report suggestions made by other people in impersonal structures
introduced by it (e.g. it has been suggested, it is sometimes suggested), and in
phrases introduced by the conjunction as (e.g. as already suggested by).
As-phrases are also used with an endophoric marker (e.g. as suggested above)
and/or the first person pronoun I (e.g. as I have suggested) to refer to a sug-
gestion previously made. Suggested is also used in impersonal structures
introduced by it followed by a modal verb (e.g. it may/might be suggested that)
to make a tentative suggestion. By contrast, the verb form suggests is typically
used to make it clear that the suggestion offered is made on the basis of
who/whatever is the subject of the sentence:
4.57. More recent evidence suggests, however, that while it lives in woodland it
actually hunts over nearby open areas.
4.58. Sinclair Hood (1971) suggests that woollen cloth and timber were sent to
Egypt in exchange for linen or papyrus.
In summary, results indicate that the phraseology of rhetorical or organi-

zational functions in academic prose does not consist of idioms, similes,
phrasal verbs, idiomatic sentences, proverb fragments and the like (see also
Pecman, 2004 and Gledhill, 2000).7 Referential phrasemes that serve to
organize scientific discourse mainly consist of lexical and grammatical
collocations. Results also confirm Howarth’s (1996; 1998) conclusion that a
large proportion of the lexical collocations found in academic discourse
consist of a verb in a figurative sense and an abstract noun denoting
a recurrent concept in academic discussion (e.g. adopt an approach/a method;
Table 4.14 Co-occurrents of verbs expressing possibility and certainty in the

BNC-AC-HUM
Table 4.14a: suggest
suggested suggest
– it has been suggested that – NP / it / this might / may / would suggest

– it is (sometimes, commonly) suggested (that)
that – NP does suggest (that)
– it was (first, also, even) suggested that – there is evidence to suggest
– it can / could / may be suggested that – I (would / want to) suggest
– this is suggested by – NP / it / this seems to suggest (that)
– as (already) suggested by
– as suggested above
– (as) I (have) (already) suggested
suggests suggesting
– NP / it / this (ADV: strongly, also) suggests – … , (ADV: strongly) suggesting (that)

(that) – I am (not) suggesting that
– . . ., which suggests (that)
– as NP suggests
Table 4.14b: prove
proved prove
– NP / it / this proved to – ADJ (likely, difficult, easy, possible) to

– NP / it / this proved (ADV) ADJ (to) with prove
ADJ: difficult, unable, abortive, impossible, . . . may / might / would prove ADJ to
inadequate, successful, possible – NP was to prove ADJ
– NP / it / proved to be (ADV) ADJ – attempt to prove
– NP proved NP – seek to prove
proves proving
– NP proves ADJ (impossible, necessary, – BE proving
inadequate, successful) – . . ., proving that
– NP proves that – . . . by proving
– . . . of proving
Table 4.14c: appear
appeared appears
– it appeared (ADJ) that – NP / it / this appears to V

– there appeared to be – which appears to V
– this appeared to V – what appears to V
– . . . which appeared ADJ/ to V – there appears to V
– it appears that
– as appears from/in
appear appearing
NP would/might/may appear to be/V /
Table 4.14d: tend
tended tend
– NP tended to V (be, favour, take, see) – NP tend to V (be, see, look, regard)
tends tending
– NP tends to V /
– . . . which tends to V
– it tends to V
V: be, confirm, ignore, obscure, become,
support, conclude
draw an analogy/a comparison/a distinction; reach a conclusion/a consensus/a

point; develop an idea/a method/a model; carry out a task/a test/a study).
In academic prose, the category of textual phrasemes consists of three
types of phraseme (cf. Figure 4.6). The first is complex prepositions (e.g.
with respect to, in addition to) and complex conjunctions (e.g. so that, as if, even
Phrasemes
Referential function Textual function Communicative function

Referential phrasemes Textual phrasemes Communicative phrasemes
(Lexical) collocations Complex prepositions Attitudinal formulae

Grammatical collocations complex conjuctions
Linking adverbials
Textual formulae (including
textual sentence stems and
rhemes)
Figure 4.6 The phraseology of rhetorical functions in academic prose
though) which are used to establish grammatical relations (cf. Burger’s

(1998) category of structural phrasemes). The second is multiword linking
adverbials, used to connect two stretches of discourse. Although the major-
ity of linking adverbials are single adverbs, and are therefore not part of the
phraseological spectrum, prepositional phrases functioning as adverbs (e.g.
for example, in other words, in addition, in conclusion, as a result) and clausal
linking adverbials (e.g. that is, that is to say, what is more, to conclude) are also
common in academic prose (Conrad, 1999: 11–12). These first two catego-
ries of textual phraseme broadly correspond to Moon’s (1998) set of orga-
nizational fixed expressions and idioms. Textual sentence stems and rhemes
constitute the third type of textual phrasemes, which I refer to as ‘textual
formulae’. Textual sentence stems are multiple clause elements involving a
subject and a verb, which ‘form the springboard of utterances leading up to
the communicatively most important — and lexically most variable — ele-
ment’ (Altenberg, 1998: 113). Examples include It has been suggested; Another
reason is . . . ; and It is argued that. . . . Rhemes typically consist of a verb and
its post-verbal elements (e.g. . . . is another issue). They also sometimes func-
tion as textual phrasemes but are less frequent than sentence stems, possi-
bly because rhemes are ‘usually tailored to expressing the particular new
information the speakers want to convey to their listeners, and are there-
fore, as Altenberg (1998: 111) points out, “composed of variable items
drawn from an open set”’ (De Cock, 2003: 269). Textual formulae are par-
ticularly prominent in academic writing and display different degrees of
flexibility, from flexible fragments such as ‘DET (a, another) ADJ (typical,
classic, prime, good, etc.) example of [NP] is . . .’ to more inflexible phrasemes
such as ‘to be a case in point’.
Attitudinal formulae make up a large proportion of communicative

phrasemes in academic prose. They largely consist of sentence stems such
as it is important/necessary that, it seems that or it is noteworthy that. This group
is similar to Biber et al.’s (2004) category of stance bundles that ‘provide a
frame for the interpretation of the following proposition, conveying two
major kinds of meaning: epistemic and attitude/modality’ (Biber et al.,
2004: 389).
The frequency-based approach adopted to study the phraseology of
rhetorical functions has also helped uncover a whole range of word combi-
nations that do not fit traditional phraseological categories. Co-occurrences
such as direct result, evidence suggests, final outcome, and outstanding example
have traditionally been considered as peripheral or falling outside the lim-
its of phraseology (Granger and Paquot, 2008a: 29) but results suggest that
they are essential for effective communication and are also part of the pre-
ferred lexical devices used to organize scientific discourse.
In this chapter, I have shown that a high proportion of words in the

Academic Keyword List (AKL) fit my definition of academic vocabulary and
serve rhetorical or organizational functions in academic prose. The analysis
of exemplifiers presented in Section 4.2 has also validated the method used
to select AKL words: the lexical items which were automatically extracted
included the most frequent exemplifiers in academic writing (such as,
example, for example and for instance) and lexical items which are not as
frequent but which are more common in academic prose than in other
genres (illustrate, exemplify, e.g., notably). The AKL could be very useful for
curriculum and materials design as it includes a high number of words that
serve rhetorical functions in academic prose.
The list, however, still needs to be refined in various ways. To be useful to
apprentice writers, it should include the word combinations (frequent
co-occurrences, collocations, textual phrasemes, etc.) in which each AKL
word is commonly found in academic prose, together with information
on the word’s frequency (see Coxhead et al. (forthcoming) for a similar
project for Coxhead’s (2000) Academic Word List). This means that each
AKL word has to be described in context, as was done above (Section 4.2)
for the function of exemplification. Such a contextual analysis will also
make it possible to decide whether each word fits my definition of academic
vocabulary and deserves to be retained in the Academic Keyword List.
The type of data analysis presented in this chapter has also offered
valuable insights into the distinctive nature of the phraseology of rhetorical
functions in scientific discourse. Most notably, results have shown that
textual phrasemes make up the lion’s share of multiword units that ensure
textual cohesion in academic prose. This type of phraseme, however, has
often been neglected in theories of phraseology (cf. Granger and Paquot,
2008a: 34–5). Attitudinal formulae serve a major role in a restricted
number of functions such as ‘expressing personal opinion’ and ‘expressing
possibility and certainty’. Results have also pointed to the prominent role
of free combinations to build the rhetoric of academic texts. My findings
thus support Gledhill’s call for a rhetorical or pragmatic definition of
phraseology:
Phraseology is the ‘preferred way of saying things within a particular

discourse’. The notion of phraseology implies much more than invento-
ries of idioms and systems of lexical patterns. Phraseology is a dimension
of language use in which patterns of wording (lexico-grammatical
patterns) encode semantic views of the world, and at a higher level idi-
oms and lexical phrases have rhetorical and textual roles within a specific
discourse. Phraseology is at once a pragmatic dimension of linguistic
analysis, and a system of organization which encompasses more local lexi-
cal relationships, namely collocation and the lexico-grammar. I claim that
the phraseological analysis of a text should not only involve the identifica-
tion of specific collocations and idioms, but must also take account of the
correspondence between the expression and the discourse within which
it has been produced. (Gledhill, 2000: 202)
In line with this call, the functions of all AKL words and their preferred
phraseological and lexico-grammatical patterns should be identified by
examining them in context.
Another objective of this chapter has also been to assess the adequacy of
the treatment of rhetorical functions in EAP textbooks and investigate
whether the AKL should be supplemented with additional academic words.
To do so, I listed the words and phrases given in academic writing textbooks
as typical lexical devices to perform the five rhetorical functions analysed in
detail in this book and compared them with the AKL. I identified the words
that were not part of the AKL and examined their use in the BNC-AC-HUM.
Some of these lexical items turned out not to be typical of academic
prose or to be extremely rare (e.g. to name but a few, by way of illustration)
and should therefore not deserve the attention they have been given in
pedagogical materials. By contrast, a large proportion of AKL words were

not found in textbooks in spite of their relatively high frequency and major
discourse functions in academic prose. These findings show the power of a
data-driven approach to the selection of academic vocabulary and clearly
call for a revision of the treatment of rhetorical functions in academic
writing textbooks.
A pedagogically-oriented investigation of academic vocabulary cannot
rest solely on native speaker data. It is essential to examine what learners
actually do with lexical devices that serve rhetorical functions. For example,
do they use exemplifiers? Do they rely on words and phrasemes that are
typical of academic prose? Do they use the expressions to name but a few and
by way of illustration? If so, do they use them correctly? And do they use them
sparingly or do they make heavy use of these infrequent exemplifiers? These
questions can only be answered by an analysis of learner corpus data. Such
an analysis is presented in the next chapter.
Chapter 5
Academic vocabulary in the International

Corpus of Learner English
This chapter is devoted to academic vocabulary in learner writing. Section 5.1

presents a detailed comparison of exemplificatory devices in native and
learner writing. This illustrates the type of results obtained when the range
of lexical strategies available to EFL learners is compared to that of expert
writers. Differences between learner and native writing are highlighted by
means of log-likelihood tests. The UCREL log-likelihood calculator website
(http://ucrel.lancs.ac.uk/llwizard.html) was used to compute log-likelihood
values; 6.64 (p < 0.01) was taken as the threshold value. The whole learner
corpus was compared to the BNC-AC-HUM but the results are only reported
if they are common to learners from a majority of the mother tongue
backgrounds considered. The same methodology was used to examine
learners’ use of words that serve the rhetorical functions of ‘expressing
cause and effect’, ‘comparing and contrasting’, ‘expressing a concession’
and ‘reformulating: paraphrasing and clarifying’. However these analyses
are not presented in as much detail as for exemplification, both for reasons
of space and because the presentation would soon become cumbersome.
Instead, the focus of Section 5.2 is on the general interlanguage features
that emerge from these analyses. These fall into six broad categories: limited
lexical repertoire, lack of register awareness, learner-specific phraseological
patterns, semantic misuse, clusters of connectives and unmarked position of
connectors. However not all learner specific-features can be attributed to devel-
opmental factors. The learner’s first language also plays a considerable part in
his or her use of academic vocabulary. In Section 5.3, I focus on transfer effects
on French learners’ use of multiword sequences with rhetorical functions.
5.1. A bird’s-eye view of exemplification in learner writing
A general finding of the comparison between the International Corpus of

Learner English (ICLE) and the British National Corpus – Academic Humanities
(BNC-AC-HUM) subcorpus is that exemplificatory lexical items are signifi-

cantly more frequent in learner writing than in professional academic
prose. This result highlights the importance of analysing several learner
populations and comparing them so as to avoid faulty conclusions about
EFL learner writing in general. Siepmann (2005) finds that the adverbials
for example and for instance are less frequent in German learner writing than
in native and non-native professional writing and argues that ‘under-use
of exemplification as a rhetorical strategy in student writing may (. . .)
bespeak a general lack of concern for comprehensibility’ (Siepmann, 2005:
255). This explanation for German learners’ underuse of exemplifiers
is not entirely satisfactory, and does not apply to EFL learner writing in
general: most L1 learner populations overuse exemplificatory discourse
markers.
The bar chart in Figure 5.1 shows the frequencies per 100,000 words of
exemplifiers in the ICLE and the BNC-AC-HUM. The lexical items are
ordered by decreasing relative frequency in the ICLE. The bar chart shows
that EFL learners’ use of exemplifiers differs from that of professional
writers in at least two ways. First, they do not choose the same exemplifiers.
Thus the most frequent exemplifier in the ICLE is the adverbial for example,
whereas the most frequent one in the BNC-AC-HUM is such as.
The frequencies of individual items also differ widely. Figures and
log-likelihood values for each corpus comparison are given in Table 5.1.
This shows that EFL learners’ overuse of the function of exemplification is
largely explained by their massive overuse of the adverbials for example and
for instance, the noun example 2 and the preposition like. The overuse of for
instance has already been reported by Granger and Tyson (1996) for French
learners and Altenberg and Tapper (1998) for Swedish learners. Overuse of
for example has also been found in other learner populations such as
Japanese and Taiwanese learners (Narita and Sugiura, 2006; Chen, 2006).
By contrast, learners tend to make little use of the verbs illustrate and exem-
plify and the adverb notably, which are underused in the ICLE. There is no
significant difference in the use of the preposition such as, the abbreviation
e.g., the nouns illustration and case in point and the expressions to name but a
few and by way of illustration when comparisons are based on the total num-
ber of running words in each corpus. Except for the preposition such as and
the abbreviation e.g., these lexical items are quite infrequent in both native-
speaker and learner writing.
As explained in Section 4.1, the frequency of each exemplificatory lexical
item can also be calculated as a proportion of the total number of exempli-
fiers. Corpus comparisons based on the total number of running words
have shown that exemplification is used significantly more in the ICLE than
80
70
60
Academic vocabulary in the ICLE

50
40
30
20
10
0
ple . bly lify
ple as like tance e.g str
ate tion poin
t
mp
few tion
m
exa
m
suc
h tra ota ta tra
exa ins illu l u s i n n e x e b u
ill u s
for for il ase me of
E ac o na ay
B t w
by
ICLE BNC-AC-HUM
Figure 5.1 Exemplifiers in the ICLE and the BNC-AC-HUM
127
Table 5.1 A comparison of exemplifiers based on the total number of running

words
ICLE BNC-AC-HUM LogL
Abs. Rel. Abs. Rel.
Nouns
example 713 61.17 1285 38.68 91.6 (++)
example 477 40.9 665 20 134 (++)
examples 230 19.7 620 18.7 0.5
*exemple1 4 0.3
*exampl 1 0.1
*examle 1 0.1
illustration 17 1.5 77 2.3 3.3
illustration 16 1.4 63 2
illustrations 1 0.1 14 0.4
(BE) a case in point 10 0.86 18 0.5 1.3
TOTAL NOUNS 740 63.5 1380 41.5 83.6 (++)
verbs
illustrate 51 4.38 259 7.8 16.1 (− −)
illustrate 29 2.5 97 2.9 0.6
illustrates 14 1.2 63 1.9 2.6
illustrated 8 0.7 84 2.5 17.7 (− −)
illustrating 0 0 15 0.5 9
exemplify 6 0.43 79 2.38 20.32 (− −)
exemplify 2 0.2 9 0.3 0.4
exemplifies 2 0.2 15 0.5 2.1
exemplified 2 0.18 53 1.6 20.09 (− −)
exemplified 1 0.1
*examplified 1 0.1
exemplifying 0 0 2 0 1.2
TOTAL VERBS 57 4.8 338 10.2 32.1 (− −)
prepositions
such as 489 42 1494 45 1.8
like 468 40.2 532 16 199.6 (++)
TOTAL PREP. 957 82.1 2026 61 55.3 (++)
Adverbs
for example 857 73.5 1263 38.00 209.9 (++)
for example 854
*for exemple 3
for instance 344 29.5 609 18.3 47.3 (++)
e.g. 94 8 259 7.8 0.1
notably 5 0.4 77 2.3 22.1 (− −)
to name but a few 3 0.3 4 0.1 0.9
by way of illustration 1 0.1 3 0.1 0
TOTAL ADVERBS 1304 111.9 2215 66.7 208.3 (++)
TOTAL 3058 262.4 5959 179.4 279.2 (++)
Legend: (++) significantly more frequent (p < 0.01) in ICLE than in BNC-AC-HUM;
(− −) significantly less frequent (p < 0.01) in ICLE than in BNC-AC-HUM
Academic vocabulary in the ICLE 129
Table 5.2 A comparison of exemplifiers based on the total

number of exemplifiers used
ICLE BNC-AC-HUM LogL
Abs. % Abs. %
Nouns
example 713 23.3 1285 21.6 2.8
illustration 17 0.6 77 1.3 11.7 (− −)
(BE) a case in point 10 0.3 18 0.3 0
TOTAL NOUNS 740 24.2 1380 23.2 0.9
Verbs
illustrate 51 1.7 259 4.4 47.7 (− −)
exemplify 6 0.2 79 1.3 35 (− −)
TOTAL VERBS 56 1.8 338 5.7 77.3 (− −)
Prepositions
such as 489 16 1494 25 80 (− −)
like 468 15.3 532 8.9 70.7 (++)
TOTAL PREP. 957 31.3 2026 34 4.5
Adverbs
for example 854 28 1263 21.2 39 (++)
for instance 344 11.3 609 10.2 2
e.g. 94 3 259 4.3 8.1 (− −)
notably 5 0.2 77 1.3 36.9 (− −)
to name but a few 3 0.1 4 0.1 0.2
by way of illustration 1 0 3 0 0.2
TOTAL ADVERBS 1301 42.6 2215 37.2 15.3 (++)
TOTAL 3054 100 5959 100
Legend: (++) significantly more frequent (p < 0.01) in ICLE than in BNC-AC-HUM;
(− −) significantly less frequent (p < 0.01) in ICLE than in BNC-AC-HUM
in the BNC-AC-HUM, and that the four lexical items discussed above are
largely responsible for this overuse (Table 5.1). Comparisons based on the
total number of exemplifiers allow us to ask and answer different research
questions. They give information about which lexical item(s) EFL learners
prefer to use when they want to give an example, and in what proportions.
Thus Table 5.2 shows that EFL learners select for example on 28 per cent of
the occasions when they introduce an example, whereas native-speaker
academics only use it to introduce 21 per cent of their examples.
Both methods indicate that EFL learners overuse the preposition like and
the adverbial for example. As shown in Table 5.3, however, the two methods
may also give different results. The noun example appears to be overused in
the ICLE when comparisons are based on the total number of running
Table 5.3 Two methods of comparing the use of exemplifiers
Lexical item Comparison based on total Comparison based on total

number of running words number of exemplifiers
example ++ //
illustration // −−
(be) a case in point // //
TOTAL NOUNS ++ //
illustrate −− −−
exemplify −− −−
TOTAL VERBS −− −−
Such as // −−
Like ++ ++
TOTAL PREPOSITIONS ++ //
for example ++ ++
for instance ++ //
e.g. // //
notably −− −−
to name but a few // //
by way of illustration // //
TOTAL ADVERBS ++ ++
Legend: ++ significantly more frequent (p < 0.01) in ICLE than in BNC-AC-HUM;
− − significantly less frequent (p < 0.01) in ICLE than in BNC-AC-HUM; // no significant difference
between the frequencies in the two corpora
words in each corpus. However, a comparison based on the total number of

exemplifiers suggests that the learners choose the noun example about as
often as professional academics when they want to introduce an example
(23.3% vs. 21.6%). More lexical items are significantly underused when
figures are based on the total number of exemplifiers. In addition to illustrate,
exemplify and notably, the noun illustration and the preposition such as are
selected proportionally less often by EFL learners than by professionals to
introduce an example.
This first broad picture of the use of exemplifiers in the ICLE points to
EFL learners’ limited repertoire of lexical items used to serve this specific
EAP function. This characteristic of learner writing is discussed in more
detail in Section 5.2.1.
By comparison with academics, EFL learners overuse the preposition like
and underuse such as. Figure 5.2 shows the relative frequencies per 1,000,000
words of like and such as in four sub-corpora of the British National Corpus
representing different ‘super genres’ (see Section 3.3): academic writing,
fiction, newspaper texts and speech (BNC-SP) as well as in the ICLE. The
2500
2000
1500
1000
500
0
Academic Fiction News Speech Learner
writing writing
like such as
Figure 5.2 The use of the prepositions ‘like’ and ‘such as’ in different genres
50
45
freq. per million words
40
35
30
25
20
15
10
5
0
speech Fiction Learner News Academic
writing writing
Figure 5.3 The use of the adverb ‘notably’ in different genres
preposition like is much more frequent than such as in speech3, fiction, news
and learner writing but is less frequent in academic prose. By contrast,
such as is more frequently used in academic prose. Learners’ use of these
exemplificatory prepositions thus differs from academic expert writing, but
resembles more informal genres such as speech. Learners’ underuse of the
adverb notably in their academic writing is another illustration of the same
point (Figure 5.3).
for example for instance
14%
20%
2%
7%
10%
59%
11%
77%
Academic writing News Fiction Speech
Figure 5.4 Distribution of the adverbials ‘for example’ and ‘for instance’ across
genres in the BNC
A large proportion of EFL learner populations make repeated use of

the word-like unit for instance. The use of this adverbial by native-speakers,
however, differs significantly from that of for example, both in terms of
frequency and register. Figure 5.4 shows that 77 per cent of all instances of
for example in the BNC are found in the academic sub-corpus. However only
59 per cent of the occurrences of for instance appear in academic prose
while 30 per cent are found in more informal genres such as speech and
fiction. Lee and Swales (2006: 64) also showed that the use of these two
adverbials differs across academic disciplines: for instance is more frequent
in the social sciences and humanities while in natural sciences, technology
and engineering, for example is strongly favoured to clarify a difficult or
complex point through exemplification.
Lack of register awareness manifests itself in a number of ways in learner
academic writing. This will be the focus of Section 5.2.2.
The phraseology of academic words is also a major source of difficulties
to EFL learners. One of the main advantages of using a noun rather than
the adverbials for example and for instance is that the use of a noun allows the
writer to qualify the example with an adjective (see Section 4.2.2). However
only 18 per cent of the adjective co-occurrents (types) of the noun example
in the ICLE are significant co-occurrents in the BNC-AC-HUM (Table 5.4).
A quarter of the adjective co-occurrents of example in the ICLE do not
appear at all in the 100-million word British National Corpus (Table 5.5).
A large proportion of these adjectives have been described by our
Table 5.4 Significant adjective co-occurrents

of the noun ‘example’ in the ICLE
good 77 excellent 4
extreme 12 typical 3
above 8 classic 2
clear 8 interesting 2
striking 7 numerous 2
simple 6 outstanding 1
Well-known 5
Table 5.5 Adjectives co-occurrents of the

noun ‘example’ in ICLE not found in the BNC
big 2 manipulative 1
warning 2 mere 1
absolute 1 model 1
bright 1 opposite 1
cruel 1 overstated 1
present day 1 polemic 1
evident 1 hair raising 1
frightening 1 stirring 1
impermissible 1 upsetting 1
native-speaker informant as forming awkward co-occurrences with example

as illustrated in the following sentences:
5.1. The story of Cinderella is one more impermissible example. Cinderella is a

neglected child, and once again the step-family is the guilty party. (ICLE-DU)
5.2. For example a disliked politician will be shot through such a zoom as to expose
his ugly bits. Which may most probably influence our feeling towards him.
We all know thousands of such manipulative examples. (ICLE-PO)
5.3. This mere example proves that the ideal union people dream of is not yet a total
reality: national conflicts are still at work, every nation defends its own interests
before fighting for those of “the group” they joined. (ICLE-FR)
5.4. The opposite example is (the former?) USSR, where the union was imposed by
a central power without real approbation of the states and against people’s will.
(ICLE-FR)
5.5. Of course, that was an overstated example, extreme, so to speak. (ICLE-RU)
Similarly, only 23 per cent of the verb types that are used with example in the
ICLE are significant co-occurrents of the noun in the BNC-AC-HUM
(see Table 5.6). Some 27 per cent of the verb co-occurrents (types) of the
noun example in the ICLE do not appear with example in the whole BNC.
They are listed in Table 5.7. Like adjective co-occurrents, several of these
verbs form awkward co-occurrences with the noun example:
5.6. In a new society made with less inequality, less poverty and more social justice
we would not find the same quantity of crime that we find in our society. I can
make the example of Naples: here there is everyday an incredible lot of crimes.
(ICLE-IT)
Table 5.6 Significant verb

co-occurrents of the noun ‘example’
in the ICLE

be 162 be 119
take 36 show 31
give 28 illustrate 15
find 10 concern 2
show 10 suggest 1
serve 4 Suffice 1
illustrate 3
provide 2
cite 2
consider 1
TOTAL 258 TOTAL 169
Table 5.7 Verb co-occurrent types of the

noun ‘example’ in ICLE not found in BNC

culminate into 1 say 1
glide into 1 reinforce 1
state 1 criticize 1
plaster with 1 point out 1
derive 1 express 1
write 1
help as 1
appear 1
TOTAL 8 TOTAL 5
Table 5.8 The distribution of ‘example’ and ‘be’ in the ICLE and the BNC-AC-
HUM
be + example example + be TOTAL Rel. freq. LogL
ICLE 162 (57.7%) 119 (42.3%) 281 24.1 199.76 (++)

BNC-AC-HUM 139 (62.3%) 84 (37.7%) 223 6.71
Table 5.9 The distribution of ‘there + BE + example’ in

ICLE and the BNC-AC-HUM
there + BE + example LogL
Abs. freq. Rel. freq.

ICLE 31 2.66 34.52 (++)
BNC-AC-HUM 15 0.45
5.7. Their understanding of the outside world differs. It originates in dissimilar

climate, life-style, social organization, political and economical stability of the
country. To glide into an extreme example, unequality appears even between
people living in towns and villages. (ICLE-CZ)
5.8. The rules of the road you have to learn to pass your driving license are plastered
with examples of children who cross the road unexpectedly, running after a ball.
(ICLE-GE)
The copular be is the most frequent left and right co-occurrent of the
noun example in learner writing. Textual sentence stems and rhemes with
the verb be are significantly more frequent in learner writing than in profes-
sional academic writing (Table 5.8). These results differ markedly from
those reported in Paquot (2008a) in which French, Spanish, Italian and
German learners were shown to underuse stems and rhemes with the verb
be. This difference may be explained by the fact that the reference corpus
used for comparison in Paquot (2008a) was a collection of native-speaker
student essays.
Table 5.9 shows that the structure there + be + example is more frequently
used in learner writing than in professional academic writing. It appears in
all 10 learner corpora (i.e. irrespective of the learner’s mother tongue) as
illustrated by the following sentence:
5.9. There is the example of Great Britain where a professional army costs less than,
for example, the French army based on conscription. (ICLE-RU)
In professional academic writing, the verb take is mainly used in

sentence-initial exemplificatory infinitive clauses with the noun example
(Example 5.10). This pattern is very infrequent in ICLE. EFL learners
prefer to use the verb take in active structures introduced by the personal
pronoun I (Example 5.11) or in first person plural imperative sentences
(Example 5.12).
5.10. To take one example, at the beginning of the project seven committees were
established, each consisting of about six people, to investigate one of a range of
competing architectural possibilities. (BNC-AC-HUM)
5.11. I can take the example of the ‘Société Générale de Belgique’ which is directed
by ‘Suez’. (ICLE-FR)
5.12. Let’s take the example of painting. (ICLE-FR)
As illustrated by Examples 5.13 and 5.14, learners often use the verb have
in the same structures as take to introduce an example. The imperative
sentence, however, was judged to be awkward by our native-speaker
informant.
5.13. Let us have an example — an extract out of the famous Figaro’s soliloquy:
There is a liberty of the press in Madrid now, so that I can write about anything
I like, providing I will have it checked by two or three censors and an condition
that I will not write against the government and religion. (ICLE-CZ)
5.14. I have a good example in my family. (ICLE-PO)
Interestingly, the verb have and the first person plural imperative let’s are
not significant left co-occurrents of example in the BNC-AC but they are in
the BNC-SP corpus of spoken language. The verb have is often used in
speech with an inclusive we as subject (Example 5.15); let’s is typically used
with the verb take + example (Example 5.16).
5.15. Er in relation to existing employment sites er and Mr Laycock referred to

National Power, erm there we have an example of the attitude that the the
council is taking towards the the re-use of employment sites. (BNC-SP)
5.16. Let’s take the example of a cooker. (BNC-SP)
The verb give is the most significant co-occurrent of the noun example in the
BNC-SP. It is used in questions and first person plural imperative sentences
(Examples 5.17 and 5.18), two patterns that are not found in the BNC-AC-
HUM despite the fact that the verb is also a significant co-occurrent of
example in academic prose. By contrast, first person plural imperative

sentences with the verb give do appear in the ICLE (Example 5.19).4
5.17. Can you give an example when you say that the law is designed?
(BNC-SP)
5.18. Let me give you some examples. (BNC-SP)
5.19. Let me give you one example – appaling shots from the war in ex-Yugoslavia
that we can see nearly every day. (ICLE-CZ)
In summary, verb co-occurrents of the noun example provide further

evidence for the genre-bound nature of phrasemes: the preferred phraseo-
logical environment of the noun differs in academic writing and speech
(see Biber et al. 1999; 2004; Luzón Marco, 2000). Results suggest that
EFL learners sometimes select co-occurrences that are more typical of
speech, which can be interpreted as further indication of their lack of
register awareness.
Differences in phraseological or lexico-grammatical preferences are
often revealed by patterns of overuse and underuse of word forms. Thus,
the different forms of the verbs illustrate and exemplify are not all under-
used in learner writing. Table 5.1 above shows that the two verbs are under-
used in their –ed form only. This underuse corresponds to an underuse of
the passive constructions BE illustrated by/in (Example 5.20) and BE exem-
plified by/in (Example 5.21), the past participle exemplified following a noun
phrase (Example 5.22) and the patterns as illustrated/exemplified by/in
(Example 5.23):
5.20. The contrast between the conditions on the coast and in the interior is illus-
trated by the climatic statistics for two stations less than 30 km (18.5 miles)
apart. (BNC-AC-HUM)
5.21. The association of this material with the clerk is clearly exemplified by Chau-
cer’s Wife of Bath’s fifth husband, the clerk Jankyn, who, in the Wife of Bath’s
Prologue, reads antifeminist material to her from his book Valerie and
Theofraste. (BNC-AC-HUM)
5.22. Piaget’s claim that thinking is a kind of internalized action, exemplified in the
assimilation-accommodation theory of infant learning mentioned above, is
really a global assumption in search of some refined, detailed and testable
expression. (BNC-AC-HUM)
5.23. He assumed, without argument, that science, as exemplified by physics, is
superior to forms of knowledge that do not share its methodological character-
istics. (BNC-AC-HUM)
The verb illustrate is more often used with human subjects (11.76%) in
learner writing, and more specifically with the personal pronoun I:
5.24. I would like to illustrate that by means of some examples which, as you will
see, are very diverse; . . . (ICLE-DU)
5.25. In the worst cases people decide to suicide. I can illustrate that by a real
example. (ICLE-CZ)
It is also frequently used in sentence-initial infinitive clauses (13.72%):
5.26. To illustrate the truth of this, one has only to mention people’s disappointment
when realizing how little value has the time spent at university. (ICLE-SP)
5.27. To illustrate this point, it would be interesting to compare our situation with
the U.S.A.’s. (ICLE-FR)
As in professional academic writing, the noun case in point is very

rarely used in learner writing. When used, however, it sometimes appears in
lexico-grammatical patterns that are not found in expert academic
writing, e.g. in an infinitive clause with the verb take (Example 5.28)
or determined by a definite article and followed by the verb be and a
that-clause (Example 5.29).
5.28. However, wars always break out for economical reasons; For example, the
first world war, to take a case in point, did not start because the murder of
archduke Frank Ferdinand, heir of Autro-Hungary; that was only the straw
that broke the camel’s back. (ICLE-SP)
5.29. Professional observers see some even deeper danger in the emerging situation.
A great number of children spend more and more time watching television.
They take into consideration the behaviour patterns of film stars, they want to
be like them. The case in point is that little children learn how to smoke how to
drink how to be cunning and clever and get round the adults. Film stars are
usually very attractive and it’s not a surprise that children want to follow
them. (ICLE-RU)
EFL learners’ phraseological and lexico-grammatical specificities will be

discussed in detail in Section 5.2.3 below.
EFL learners may also experience difficulty with the meaning of single
words and phrasemes. For example, they sometimes use the abbreviation
i.e. instead of e.g. as an exemplificatory discourse marker (Examples 5.30
to 5.32). The abbreviation i.e., however, is a synonym of ‘that is’ used to

reformulate by paraphrasing or clarifying, and not an exemplifier at all.
5.30. The states mostly tend to solve their politic problems in a peaceful way (*i.e.
[e.g.] the split of Czech federation or the unification of Germany). (ICLE-CZ)
5.31. One of the examples that makes this point is related to children’s toys, because
nowadays children play with technological toys (*i.e.: [e.g.] video games),
and these toys do not let the children develop their imagination and, in many
cases, they are so inactive that playing with these toys does not permit physical
exercise. (ICLE-SP)
5.32. It might seem absurd, but many progressive social changes (*i.e. [e.g.] an
increase of individual liberty) may lead to further increase of crime. (ICLE-RU)
Learners also sometimes use as in lieu of the complex preposition such as

(Examples 5.33 to 5.37). It should be noted, however, that this erroneous
use is more frequently found in learner populations with Romance mother
tongue backgrounds.
5.33. Thus soldiers learned mostly bad habits *as [such as] smoking, drinking
(if possible) and being lazy in their leisure time. (ICLE-CZ)
5.34. In addition to the familiar subjects *as [such as] reading, writing and math-
ematics, time should be reserved for making children conscious of the fact that
there is more to life than the things we see. (ICLE-DU)
5.35. There should be particular institutions for those who are mentally alienated
*as [such as] the rapists, others for the young people, etc. (ICLE-FR)
5.36. In this essay I would like to show how, in my opinion, crime is caused by a pre-
disposition of the individuals and how, of course, other factors *as [such as]
society, culture and politics can influence this natural inclination. (ICLE-IT)
5.37. Another proof will be the role that imagination plays in all the Arts *as [such
as] Literature, Music and Painting. (ICLE-SP)
As illustrated in Example 5.38, the adverb namely is also sometimes misused

by EFL learners who use it instead of notably or another exemplifier.
5.38. This new wave of revolting trivial events is all the more worrying since it is
linked to a rise of the small delinquance, implying a generalized climate of ter-
ror and a total mistrust of the citizens towards the police forces and the law,
both accused of all vices and *namely [(most) notably] of being too lax with
those evils. (ICLE-FR)
Pour donner des exemples for instance, for example, such as, like
namely (c' est-à-dire)
above all (surtout)
http://page sperso-orange fr/frat. st.paul/BACK itde Survie.pdf
Example: for example, for instance, just as, in particular, namely, one example, such as, to
illustrate
http://fr.wikibooks.org/wiki/utilisateur:Jean-Francois_Gagnon/Anglais:Connective_words
Figure 5.5 The treatment of ‘namely’ on websites devoted to English connectors
This confusion is relatively common, which is not surprising as it is even

found on websites supposed to help learners master English connectors
(Figure 5.5).
More generally, namely is very often misused in learner writing and it is
not always clear what learners mean when they use this adverb:
5.39. Because the campus consists of modern buildings, built closely together, it is no
more than a ten minute’s walk to get where you need to be for lectures and semi-
nars. All the academic facilities are ?namely located on the main campus.
(ICLE-DU)
5.40. Why, then, so many people object to gay marriages and, at the same time, yearn
for equality? It is ?namely just equality what gay marriages are about, isn’t it?
(ICLE-FI)
5.41. The efforts made by the firms are obvious. They ?namely create replacement
products: they replace the gas in the aerosols and so we have ozone-friendly
aerosols, . . . (ICLE-FR)
5.42. Reluctance to eventually join The Common Market is ?namely caused by fear,
disbelieves, inferiority complex, short-sightedness or even nationalistic and
xenophobic tendencies. (ICLE-PO)
More examples of semantic misuse are illustrated and discussed in

Section 5.2.4.
Another explanation for the general overuse of the function of exempli-
fication in learner writing may be that exemplifiers are repeatedly used
when they are superfluous, redundant or even when other rhetorical func-
tions should be made explicit. In Example 5.43, the logical relation between
the two sentences is a causal link that is left implicit while an unnecessary
exemplifier is used:
5.43. I described there only some examples from the great number of criminal offences.
After some years many of those criminals will be set free because of their
relatively mild punishment. They had for example youthful age. (Youthful age
– by the way in contrast to the punishment of 16 years old boys in our country,
who got off with the light punishment, in England were recently sentenced two
10 years old boys for murder of a 3 years old boy to the lifelong punishment!)
(ICLE-CZ)
Section 5.2.5 will focus on the unnecessary use of lexical items that serve
rhetorical or organizational functions as well as on learners’ tendency to
clutter up their texts with too many logical devices.
EFL learners’ use of exemplifiers also differs from that of expert writers
with respect to positioning. A sentence-initial position for the adverbials for
example and for instance is clearly favoured in the ICLE, compared to the
BNC-AC-HUM:
5.44. But there are actually a number of things we all can do that make a difference.
For example, there ought to be information about different ways to save elec-
tricity. (ICLE-SW)
5.45. There were a lot of wars due to the religion. For instance, England has always
been divided according to the kind of religion in which a person believed.
(ICLE-SP)
The two adverbials are also repeatedly found at the end of a sentence in the
learner subcorpora (7.14% of the occurrences of for example and 8.4% of
the occurrences of for instance), although this position is rare in academic
professional writing (1.6% for for example; 1.3% for for instance):
5.46. Let us have a good look at television for example. (ICLE-PO)

5.47. They only want an easy to operate camera, a Single Use Camera for instance.
(ICLE-DU)
Aspects of sentence position are dealt with in Section 5.2.5.

In Section 4.4, I argued that Academic Keyword List (AKL) lexical items
and their phraseological patterns should be taught to EFL learners.
Learner corpus data support this claim as all the AKL words that are used
to give examples in academic prose present one or more learner-specific
difficulties. The adverb notably and the abbreviation e.g. are semantically
misused; the adverbials for example and for instance are predominantly used
in sentence-initial position; and the noun example and the verbs illustrate
and exemplify are used in learner-specific phraseological patterns. It was also
argued that the pedagogical relevance of non-AKL items – the preposition
like, the nouns illustration and case in point and the expressions to name but a
few and by way of illustration – depended on whether learners already
used these exemplifiers and how they used them. The analysis of the ICLE
corpus suggests that:
– A word of caution is needed against excessive reliance on the preposition

like;
– The noun illustration should be specifically taught to upper-intermediate
and advanced learners as it is underused in the ICLE;
– The specific lexico-grammatical patterns of case in point should also be
taught as this phraseme is repeatedly used in ‘unidiomatic’ patterns.
The pedagogical implications of learner corpus-based findings will be

further considered in Chapter 6.
5.2. Academic vocabulary and general

interlanguage features
A comparison of words that serve the rhetorical functions of ‘giving

examples’, ‘expressing cause and effect’, ‘comparing and contrasting’,
‘expressing a concession’ and ‘reformulating: paraphrasing and clarifying’
in learner and expert academic writing has made it possible to identify
six specific areas of where learner English varies from native-speaker
academic English. Section 5.2.1 focuses on learners’ limited lexical reper-
toire by examining aspects of over- and underuse. In Section 5.2.2, the char-
acteristics of learner’s lack of register awareness are presented. Section
5.2.3 explores the type of phraseological and lexico-grammatical patterns
that are found in most learner sub-corpora. Section 5.2.4 discusses patterns
of semantic misuse of connectors and abstract nouns. Learners’ tendency
to clutter their texts with unnecessary connectives is the focus of Section
5.2.5 and Section 5.2.6 illustrates their preference for placing connectors at
the beginning of sentences.
5.2.1. Limited lexical repertoire

Several studies based on one or more ICLE subcorpora have argued that
‘these EFL writers are not equipped with the type of lexical knowledge
necessary for the type of writing task they are undertaking’ (Petch-Tyson,
1999: 60). An analysis of learners’ use of potential academic words from the
Academic Keyword List (AKL) supports this view. Table 5.10 shows that almost
50 per cent of the words in the AKL are underused in the ICLE, a percent-
age that rises to 52.1 per cent for nouns and 56.3 per cent for adverbs.
By contrast, the proportion of words in the AKL that are overused in learner
academic writing is only 21.4 per cent . The largest percentages of overused
items are found in nouns and in the ‘other’ category which includes prepo-
sitions, conjunctions, determiners, etc.
Table 5.11 gives examples of overused and underused AKL words in
the ICLE. It could be argued that ‘learner usage tends to amplify the high
frequencies and diminish the low ones’ (Lorenz 1999b: 59). For example,
overused items such as the nouns idea and problem, the verbs be and become
and the adjectives difficult and important are very frequent words in general
English (relative frequencies of more than 200 occurrences per million
words in the whole BNC). Conversely, underused items such as the nouns
hypothesis and validity, the verbs exemplify and advocate, the adverbs conversely
and ultimately and the prepositions as opposed to and in the light of are much
less frequent in English (relative frequencies of less than 30 occurrences
per million words in the whole BNC).
The picture, however, appears to be more complex than Lorenz’s quote
suggests. Not all high frequencies are amplified in EFL learner writing.
Many AKL words that appear with a relative frequency of more than 100
occurrences per million words in the whole BNC are underused in the
ICLE, e.g. the nouns argument, difference and effect, the verbs argue and
explain, the adjectives likely and significant and the adverbs generally and par-
ticularly (in bold in Table 5.11). Key function words such as between, in, by,
and of are quite representative of the nominal style of academic texts, where
60 per cent of all noun phrases have a modifier (Biber, 2006). However,
these highly frequent prepositions are underused in the ICLE, a fact that
can be related to EFL learners’ tendency to avoid prepositional noun
phrase postmodification (Aarts and Granger, 1998; Meunier, 2000: 279).
Table 5.10 The distribution of AKL words in the ICLE
overused no statistical underused

difference
nouns 86 [24.2%] 84 [23.7%] 185 [52.1%]

verbs 40 [17.2%] 93 [39.9%] 100 [42.9%]
adjectives 34 [18.9%] 59 [32.8%] 87 [48.3%]
adverbs 16 [18.4%] 22 [25.3%] 49 [56.3%]
other 21 [28.0%] 21 [28.0%] 33 [44.0%]
TOTAL 199 [21.4%] 277 [29.8%] 454 [48.8%]
Table 5.11 Examples of AKL words which are overused and underused in
the ICLE
overused underused
nouns advantage, aim, benefit, change, choice, addition, argument, assumption, basis,
conclusion, consequence, degree, bias, comparison, concept, contrast,
disadvantage, example, fact, idea, criterion, difference, effect, emphasis,
influence, possibility, problem, reality, evidence, extent, form, hypothesis, issue,
reason, risk, solution, stress outcome, perspective, position, scope,
sense, summary, theme, theory, validity
verbs aim, allow, avoid, be, become, cause, adopt, advocate, argue, assert, assess,
choose, concern, consider, consist, assume, cite, comprise, conduct,
contribute, create, deal, depend, develop, contrast, define, derive, describe,
exist, improve, increase, influence, emphasise, enhance, ensure, examine,
participate, prove, solve, study, treat, exemplify, explain, highlight, indicate,
use note, propose, reflect, reveal, specify,
suggest, view, yield
adjectives common, different, difficult, important, adequate, appropriate, comprehensive,
interesting, main, necessary, obvious, critical, detailed, explicit, extensive,
possible, practical, real, special, true, inherent, likely, major, misleading,
useful parallel, particular, prime, relative,
representative, significant, similar,
subsequent, substantial, unlikely
adverbs also, consequently, especially, extremely, adequately, conversely, effectively,
however, mainly, more, moreover, essentially, generally, hence, increas-
often, only, secondly, successfully, ingly, largely, notably, originally,
therefore particularly, potentially, previously,
primarily, readily, relatively, similarly,
specifically, subsequently, ultimately
other according to, because, due to, during, although, an, as opposed to, between, by,
each, for, less, many, or, same, several, despite, from, given that, in, in relation
some, than, this to, in response to, in terms of, in the
light of, including, its, latter, of, prior
to, provided, rather than, subject to,
the, to, unlike, upon, which
The preposition despite is underused, while its much less frequent synonym,
the complex preposition in spite of, is overused in learner writing
(Figure 5.6), irrespective of genre. In addition, words such as the noun
disadvantage, the verbs participate and solve, and the adverbs consequently and
moreover (underlined in Table 5.11) are overused although they appear with
frequencies of less than 50 per million words in the BNC.
The amplification of a restricted set of low frequency words in
learner writing may be partly explained by teaching-induced factors. Words
such as consequently, moreover and secondly usually appear in the long and
250
200
150
100
50
0
Academic News Fiction Speech Learner
writing writing
despite in spite of
Figure 5.6 The use of ‘despite’ and ‘in spite of’ in different genres
undifferentiated lists of connectors provided in EFL/EAP teaching materi-

als (see Section 6.1). This situation may be compounded by problems of
semantic misuse as will be discussed in Section 5.2.4. The underuse of some
frequent, but semantically specialized, words probably stems from learners’
tendency to rely on all-purpose, general, and vague words where more pre-
cise vocabulary should be used (Granger and Rayson, 1998; Petch-Tyson,
1999). Another tentative explanation may be that EFL learners do not
amplify any high frequencies words except those that are common in
speech. As argued by Baayen et al. (2006), ‘the complexity of the frequency
variable has been underestimated’ and it may be that more emphasis should
be placed on the explanatory potential of spoken frequency counts.
Underused words such as argument, issue, assume, indicate, appropriate, and
particularly are quite frequent in general English (as represented by the
whole BNC), but their frequencies are significantly less when the conversa-
tion component is analysed separately.
In Section 5.1, it was shown that, although they generally overuse exem-
plifiers, EFL learners make little use of a number of EAP-specific lexical
devices such as the verbs illustrate and exemplify or the adverb notably. They
rely instead on a restricted lexical repertoire mainly composed of the adver-
bials for example and for instance, the noun example and the prepositions like
and such as. The same conclusion holds for learners’ use of cause and effect
lexical items, which is compared with that of expert writers in Appendix 1.
Broadly speaking, learners overuse logical links signifying cause and effect
in their argumentative essays. This overuse does not, however, affect all
grammatical categories. When corpus comparisons are based on the total
Table 5.12 Two ways of comparing the use of cause and effect
markers in the ICLE and the BNC
Absolute frequency / total Absolute frequency / total

number of words number of ‘cause and
effect’ markers
nouns // −−
verbs // −−
adjectives // //
adverbs ++ //
prepositions ++ ++
conjunctions ++ ++
− − significantly less frequent (p < 0.01) in ICLE than in BNC-AC-HUM;
// no significant difference between the frequencies in the two corpora
number of running words in each corpus, the overuse seems to be generally

attributable to adverbs, prepositions and conjunctions. The categories of
nouns, verbs and adjectives do not display significant patterns of over- or
underuse. By contrast, when frequencies are compared to the total number
of cause and effect lexical items, only prepositions and conjunctions are
significantly overused, while nouns and verbs are underused (Table 5.12).
This means that, compared to expert writers, EFL learners prefer to use
prepositions, conjunctions and, to a lesser extent, adverbs to express a cause
or an effect, and tend to avoid nouns and verbs.
Table 5.13 shows that, even though EFL learners prefer to use preposi-
tions, conjunctions and adverbs to express cause and effect, not all indi-
vidual connectors are overused in learner writing. The overuse of
conjunctions largely stems from learners’ marked preference for because,
which represents 19.9 per cent of all cause and effect markers in the ICLE.
Lorenz (1999b) examined the use of causal links in essays written by
16-to-18-year-old German learners and described the marked overuse of
the conjunction because as ‘wild-card use’. He argued that ‘if a linguistic
element is used as an all-purpose wild card, that usage is bound to include a
number of instances of over-extension. In other words, it can be expected
that learners may disregard target-language restrictions which are not that
obvious, or even accounted for in the standard grammars, but which are nev-
ertheless observed by the native speakers. Such “simplification” is one of the
most frequently cited features of learner language’ (Lorenz, 1999b: 60–1).
Several of the overused lexical items are massively overused in learner
writing. The adverb so represents 11.5 per cent of the ‘cause and effect’
Table 5.13 The over- and underuse by EFL learners of specific devices to express
cause and effect (based on Appendix 1)
overuse no statistical underuse TOTAL

difference
2 [19%] 4 [37%] 5 [45%]

11
nouns root, consequence cause, factor, reason, source, origin, effect, [100%]
result outcome, implication
1 [6%] 3 [18%] 13 [76%]
cause bring about, generate, give rise to,
contribute to, lead to induce, prompt, 17
verbs stem, provoke, result [100%]
in, yield, arise,
derive, emerge,
follow, trigger
0 1 [50%] 1 [50%] 2
adjectives
responsible (for) consequent [100%]
4 [40%] 2 [20%] 4 [40%]

consequently, as a therefore, in accordingly, thus, 10
adverbs
result, as a consequence hence, thereby [100%]
consequence, so
3 [27%] 6 [54%] 2 [18%]
because of, due to, as a result of, owing in view of, in (the)
thanks to to, as a consequence light of 11
prepositions
of, on the grounds of, [100%]
in consequence of, on
account of
2 [40%] 0 3 [60%]
5
conjunctions because, this/that is for, so that, on the
[100%]
why grounds that
TOTAL 12 [21%] 16 [29%] 28 [50%] 56
[100%]
lexical items used by learners while it only accounts for 7.2 per cent of those
in expert writing. Other examples of ‘lexical teddy bears’ (Hasselgren,
1994) or ‘pet’ discourse markers (Tankó, 2004) are the prepositions because
of and due to. In their study of expressions of doubt and certainty, Hyland
and Milton (1997) reported similar findings: Cantonese learners used a
more limited range of epistemic modifiers, with the ten most frequently
used items (will, may, think, would, always, usually, know, in fact, actually, and
probably) accounting for 75 per cent of the total.5
On the other hand, 50 per cent of the lexical devices which serve to
express cause or effect in expert writing are underused by learner writers.
While underuse was found in all grammatical categories, the proportions
varied significantly. Nouns and verbs constitute a large proportion of the
possible ways of expressing a cause or an effect in academic prose, but
64.3 per cent of them are underused in the ICLE (e.g. the nouns source,
effect and implication; the verbs induce, result in, yield, arise, emerge and stem
from). As will be discussed in Section 6.1, this may be explained by teaching-
induced factors, as lexical cohesion has been largely neglected in teaching
materials (textbooks and especially grammars), where the focus has gener-
ally been on adverbial connectors.
An analysis of the lexical items which serve to express a comparison or a
contrast in academic prose shows that the rate of underuse is also quite
high in this function. Table 5.14 shows that almost half of all comparison
and contrast markers are underused. As with cause and effect lexical items,
the degree of underuse varies significantly. Nouns and adjectives (e.g. resem-
blance, similarity, contrast, similar, distinct, and unlike) account for 59 per cent
of all underused lexical items in the comparison and contrast category. The
rate of overuse is relatively low, but once again overused items include
words and phrasemes that are more frequent in speech (e.g. look like, in the
same way) (see Section 5.2.2) as well as commonly misused expressions such
as on the contrary (see Section 5.2.4). Unlike the cause and effect lexical
items, overused comparison and contrast word do not compensate for the
underused ones. Comparisons and contrasts are generally underused in
learner writing.
In summary, EFL learners tend to rely heavily on a restricted set of greatly
overused adverbs, prepositions or conjunctions to establish textual cohe-
sion. Logical links can also be provided by nouns (cf. the concept of ‘label-
ling’ explained in Section 1.3), verbs and adjectives, which often account
for a large proportion of the lexical strategies used to serve a specific rhe-
torical or organizational function in expert academic prose. These cohesive
devices, however, do not seem to be readily accessible to upper-intermedi-
ate/advanced EFL learners. This is not particularly surprising, as lexical
cohesion has generally been neglected in teaching materials. These
findings are not restricted to EFL learners: although they may become
fluent in English conversational discourse, English as a Second Language
(ESL) speakers have also been reported to ‘continue to have a restricted
repertoire of syntactic and lexical features common in the written
academic genre’ (Hinkel, 2003: 1066). Tables 5.13 and 5.14 provide useful
Table 5.14 The over- and underuse by EFL learners of specific devices to express comparison and contrast (based on Appendix 2)
overuse no statistical difference underuse TOTAL
0 5 [33%] 10 [67%]
nouns parallelism, difference, distinctiveness, resemblance, similarity, parallel, analogy, 15 [100%]

the contrary, the opposite contrast, comparison, differentiation,
distinction, the same, the reverse
2 [22%] 5 [56%] 2 [22%]
verbs look like, compare resemble, correspond, differ, parallel, contrast 9 [100%]
distinguish, differentiate
2 [11%] 4 [22%] 12 [67%]
adjectives same, different alike, contrary, opposite, reverse similar, analogous, common, comparable, 18 [100%]
identical, parallel, contrasting, differing,
distinct, distinctive, distinguishable, unlike
4 [19%] 10 [48%] 7 [33%]
in the same way, on the other hand, on analogously, differently, identically, similarly, likewise, correspondingly,
adverbs the one hand, on the contrary, parallely, reversely, contrariwise, by by/in comparison, conversely, by/in contrast, 21 [100%]
way of contrast, contrastingly, distinctively
+ erroneous expressions quite the contrary, comparatively
2 [22%] 3 [33%] 4 [44%]
prepositions like, by/in comparison with in parallel with, in contrast to/with, unlike, as opposed to, as against, versus 9 [100%]
+ erroneous expressions contrary to
0 1 [33.33%] 2 [66.67%]
conjunctions 3 [100%]
whereas as, while6
1 [25%] 3 [75%] 0
other expressions as … as, in the same way as/ that, compared 4 [100%]
with/to,
CONJ compared with/to
TOTAL 11 [13.9%] 31 [39.2%] 37 [46.8%] 79
information about learners’ particular needs. Section 6.3 will discuss how
they can be used to inform pedagogical material.
In this section, the breadth of EFL learners’ lexical repertoire has been
examined in terms of the proportion of over- and underused AKL single
words and mono-lexemic units used to perform specific rhetorical func-
tions. In Section 5.2.3, it will be shown that the limited nature of EFL learn-
ers’ lexical repertoire also stems from a restricted use of the phrasemes and
lexico-grammatical patterns typically found in expert academic prose.
5.2.2. Lack of register awareness

Many learner corpus-based studies have reported on EFL learners’ lack of
register awareness (e.g. Granger and Rayson, 1998; Lorenz, 1999b;
Altenberg and Tapper, 1998; Meunier, 2000; Ädel, 2006). These studies,
however, have often focused on learners with the same mother tongue
background. The large-scale study undertaken here allows for a more sys-
tematic description of register awareness, by exploring the way EFL learn-
ers with different mother tongue backgrounds use academic vocabulary.
In the ICLE, most rhetorical functions are characterized by the overuse
of at least one lexical item that is more typical of speech than of expert writ-
ing (Table 5.15). Examples 5.48 to 5.52 illustrate overused lexical items that
are more frequent in the BNC spoken component than in the BNC-AC-
HUM: the adverb so to express an effect, the adverb though to introduce a
concession, the adverbial of course to express certainty, the stem I am going to
talk about to introduce a new topic, and the adverbial all in all which is used
to ‘show that you are considering every part of a situation’ (Longman
Dictionary of Contemporary English (LDOCE4)).
5.48. Many people who are in this situation think that this is a waste of time: you
lose an entire year. So they want to get rid of the military service. (ICLE-DU)
5.49. Spanish holds an important position in South America and increasingly so in
the United States, too. According to Crystal it has little further potential ouside
Spain, though. (ICLE-FI)
5.50. But practically everybody is able to dream. Of course, there are different people
with different concepts of happiness, different thoughts and emotions.
(ICLE-RU)
5.51. In this essay I am going to talk about the link between crime and politics; what
I want to demostrate is that a good way of making politics can cut the roots to
crime. (ICLE-IT)
Table 5.15 Speech-like overused lexical items per rhetorical function
Rhetorical function Speech-like overused lexical item
Exemplification like
Cause and effect thanks to
so
because
that/this is why
Comparison and contrast look like
like
Concession the (sentence-final) adverb though
Adding information sentence-initial and
the adverb besides
Expressing personal opinion I think
to my mind
from my point of view
it seems to me
Expressing possibility and certainty really
of course
absolutely
maybe
Introducing topics and ideas I would like to/want/am going to
talk about
thing
by the way
Listing items first of all
Reformulation: paraphrasing and
clarifying
Quoting and reporting say
Summarizing and drawing conclusions all in all
5.52. Thanks to them anyone willing to broaden his/her general knowledge of the
world has an easy access to useful information. All in all, there are many ways
in which mass media affect our approach to reality and they are, by no means,
all positive or good for us. (ICLE-PO)
Gilquin and Paquot (2008) examined the use of some of the lexical items
listed in Table 5.15 in the ten learner corpora used here as well as in four
L1 sub-corpora (Norwegian, Japanese, Chinese, and Turkish) from the
second version of the International Corpus of Learner English (ICLEv2)
(Granger et al., 2009). The corpus totalled around 1.5 million words.
We compared the frequencies of speech-like lexical items in learner writing

with their frequencies in the 10-million word spoken component (BNC-SP)
and the 15-million word academic sub-corpus of the British National
Corpus. Our findings support Lorenz’s (1999b: 64) statement that there is
‘mounting evidence that text-type sensitivity does indeed lie at the heart of
the NS/NNS numerical contrast.’ They show that the relative frequency of
these speech-like lexical items in learner writing is often situated between
their frequency in academic prose and in speech (see the bar charts for
maybe, I would like/want/am going to talk about, really, absolutely, definitely, by the
way and though in Figure 5.7). However some of these items (so expressing
effect, it seems to me, of course and certainly) are even more frequent in learner
writing than in speech.
The overuse of several of these speech-like lexical items has been high-
lighted in a number of studies focusing on specific L1 learner populations.
For example, Chen (2006) reports on the overuse of besides in Taiwanese
student writing; Lorenz (1999b) discusses the marked overuse of the
conjunction because and the adverb so in German learner writing; French,
Spanish and Swedish learners’ heavy reliance on I think to express their
personal opinion is reported by Granger (1998b), Neff et al. (2007) and
Aijmer (2002); Japanese, French and Swedish learners’ overuse of of course
is highlighted by Narita and Sugiura (2006), Granger and Tyson (1996) and
Altenberg and Tapper (1998). Using the ICLE, my results suggest that these
features are often shared by a large proportion of the learners investigated,
irrespective of their mother tongues, and are therefore likely to be develop-
mental or teaching-induced. It remains to be seen, however, whether lack
of register awareness is a typical feature of EFL learner writing or whether
it is a more general characteristic of novice writing. This issue will be
touched upon in Section 5.4.
Different EFL learner populations, however, do not use speech-like
lexical items similarly. Although all L1 learner populations overuse the
adverb maybe when compared to the BNC-AC-HUM, Table 5.16 shows
that relative frequencies differ widely across L1 populations. Another
example is EFL learners’ use of I think, which is overused by all
L1 learner populations while showing marked differences across learner
L1 sub-corpora. As shown in Table 5.17, relative frequencies range from
17.29 occurrences per 100,000 words in the Polish learner sub-corpus
(ICLE-PO) to 143.57 occurrences per 100,000 words in the Swedish
one (ICLE-SW). This huge difference may be partly explained by L1
influence. Studies in contrastive rhetoric (e.g. Connor, 1996; Vassileva,
1998) have shown that features of writer visibility in academic prose may
differ markedly across languages.
350 1200
300 1000
250
800
200
600
150
400
100
50 200
0 0
Frequency of maybe (pmw) Frequency of so expressing effect (pmw)
40 20
35
30 15
25
10
20
15
5
10
5 0
0
Frequency of it seems to me (pmw) Frequency of I would like/want/am going to
talk about (pmw)
2000
1500
1000
500
0
really of course certainly absolutely definitely
Frequency of amplifying adverbs (pmw)
120
40 100
80
30
60
20
40
10 20
0 0
Frequency of by the way (pmw) Frequency of through at the end of a sentence
(pmw)
Academic writing: British National Corpus, academic component (15million words)

Learner writing: ICLEv2 (14 L1s; 1.5million words)
Speech: British National Corpus, spoken component (10million words)
Figure 5.7 The frequency of speech-like lexical items in expert academic writing,
learner writing and speech (based on Gilquin and Paquot, 2008)
Table 5.16 The frequency of

‘maybe’ in learner corpora
relative freq. per

100,000 words
ICLE-IT 48.18 ++
ICLE-GE 38.34 ++
ICLE-DU 35.13 ++
ICLE-CZ 32.88 ++
ICLE-SP 32.28 ++
ICLE-SW 31.21 ++
ICLE-FI 24.74 ++
ICLE-FR 20.34 ++
ICLE-PO 16.37 ++
ICLE-RU 13.26 ++
BNC-AC-HUM 1.93
Legend: ++ frequency significantly higher
(p < 0.01) than in the BNC-AC-HUM
Table 5.17 The frequency of

‘I think’ in learner corpora
relative freq. per

100,000 words
ICLE-SW 143.57 ++
ICLE-IT 134.06 ++
ICLE-RU 121.13 ++
ICLE-CZ 101.7 ++
ICLE-FR 94.61 ++
ICLE-GE 72.11 ++
ICLE-SP 66.59 ++
ICLE-FI 55.87 ++
ICLE-DU 51.77 ++
ICLE-PO 17.79 ++
BNC-AC-HUM 6.14
Legend: ++ frequency significantly higher
(p < 0.01) than in the BNC-AC-HUM
5.2.3. The phraseology of academic vocabulary in learner writing

In this section, I first present the major results of an analysis of recurrent
word sequences in EFL learner writing. I focus on aspects of over-
and underuse of word sequences that include AKL words before discussing
learner-specific clusters that are not found in professional academic
prose. Learner writing is also typically recognizable by a whole range of co-
occurrences that differ from academic prose in quantitative and qualitative
terms. I illustrate this with a comparison of the co-occurrents of the noun

conclusion in academic and learner writing and examine EFL learners’ phra-
seological infelicities and lexico-grammatical errors.
An analysis of word sequences in EFL learner writing

The results presented in this section are based on an analysis of 2-to-5
word sequences that are over- or underused in learner writing. The compari-
son between the ICLE and the BNC-AC-HUM was performed with the
Keywords option of the software tool WST4. The results show that learner
writing is characterized by a marked underuse of a large proportion of
the 2-to-5 word sequences that include AKL words and that are typically used
to serve specific rhetorical and/or organizational functions in academic
prose. EFL learners rely instead on a restricted set of clusters which they
massively overuse (e.g. for example, main reason, it depends, more and more,
in order to, the problem is that). Granger (1998b) suggests that the use of these
sequences ‘could be viewed as instances of what Dechert (1984: 227)
calls “islands of reliability” or “fixed anchorage points”, i.e. prefabricated
formulaic stretches of verbal behaviour whose linguistic and paralinguistic
form and function need not be “worked upon”’ (Granger, 1998b: 156).
This is also consistent with the author’s statement that ‘while the foreign-
soundingness of learners’ productions has generally been related to a lack of
prefabs, it can also be due to an excessive use of them’ (Granger, 1998b: 155).
The foreign-soundingness of EFL learner writing also stems from learners’
overuse of AKL words in clusters that are not typical of the particular genre
of academic prose but are more frequently used in speech or more informal
types of writing (e.g. people claim that, I will discuss, from my point of view, because
of the fact7).
Table 5.18 shows that EFL learners overuse adjective + noun sequences
with ‘nuclear’ adjectives (see Section 1.1.1) such as main (e.g. main reason,
main cause, main problem), real (e.g. real problem, real value), important
(e.g. important role, important question, important factor), great (e.g. great num-
ber, great importance), different (e.g. different points, different problems, different
reasons) and big (e.g. big problem) to the detriment of more EAP-like phrase-
mes such as extensive use, crucial importance, central issue, significant number,
integral part, lesser extent and wide variety. Similarly, they overuse adverb +
adjective/adverb /conjunction sequences with highly frequent adverbs
such as mainly (e.g. mainly because), quite (e.g. quite clear) and very (e.g. very
important) but make little use of phrasemes such as readily available, relatively
few, significantly different, almost entirely, closely associated, particularly interesting,
more generally, highly significant and precisely because.
Table 5.18 Examples of overused and underused clusters with AKL words
Overused clusters Underused clusters
for example, for instance, important to, main reason, opportunity of, therefore I, and by contrast, in particular, was probably, a similar, the view, suggestion that, described
therefore, have problems, are concerned, another important, mainly because, only as, suggested above, was effectively, still further, more generally, readily available,
because, quite clear, different reasons, totally different, different way, more difficult, relatively few, more significantly, is ultimately, he concludes, on average, central issue,
2-word clusters
great importance, very important, main cause, main problem, absolutely necessary, certain respects, radically different, consistent with, crucial importance, significantly
because I, negative consequences, real problem, great amount, good idea, I consider, different, extensive use, final analysis, they suggest, inferred from, listed above, general
great part, important part, big problem, best solution, allows us, conclusion I, different principles, inherent in, major source, particular attention, highly significant, by
points, we can, can choose, it depends, good use, good example, real value, important comparison, considerable degree, perhaps because, much emphasis, he cites, provide
question, important factor, I can, different problems evidence, little evidence, central figure, in practice, reports that, allowing for, what
appears, discussed in, may suggest, reported by, precisely because, crucial role, integral
part, wide variety, they argued, partly because, somewhat different, almost entirely, he
remarks, his method
as a result, as a consequence, in my view, more and more, more or less, take into in terms of, the absence of, the view that, extent to which, the implications of, an
3-word clusters
account, advantages and disadvantages, aim of this, pay attention to, as a account of, a theory of, in relation to, an attempt to, closely associated with, a
conclusion, take into consideration, of great importance, it means that, affect our considerable degree, as distinct from, high degree of, high proportion of, it seems likely,
approach, people claim that, I will discuss, may say that, prevents us from, provides us various forms of, a concern with, to this extent, despite the fact, the hypothesis that, the
with, issue of, this need not, at any rate, by reference to, in certain respects, were subject to, in
his view, in view of, it was claimed, it follows that, by showing that, this suggests that,
be ascribed to, when compared with, as noted above, is described in
the problem is that, it is very difficult, the fact is that, is the fact that, it is also true, it may be that, may well have been, to the effect that, are likely to be, would seem to be,
there are also people, a great number of, it is high time, it is obvious that, as much as to the extent that, with the exception of, it does not follow, it seems likely that, in the
4-word clusters
possible, it is true that, to a great extent, because of the fact, to answer this question, in presence of, the edge of the, it was difficult to, the immediate aftermath of, it is possible
order to achieve, it is necessary for, that, can be related to, similar to that of, the total number of, it is unlikely that, a wide
variety of, in the absence of, to the advantage of, on the assumption that, as an
attempt to, on the basis of, in the belief that, might have been expected, with the
exception of, the extent to which, was by no means, in the presence of, no reason to
suppose, with the result that, it would appear that, it is assumed that, may have been
used
5-word
from my point of view, far as I am concerned, there are more and more, it is very as in the case of, it has been suggested that, it could be argued that, in so far as they, it
difficult to, but it is true that, this is not the case, as a matter of fact, it is very is more likely that, it is hardly surprising that, be defined in terms of, it is worth noting
important to, it is a fact that, one of the most important that, be explained in terms of
The results also seem to support the widely held view that EFL learners’
academic writing is characterized by ‘firmer assertions, more authoritative
tone and stronger writer commitments when compared with native speaker
discourse’ (Hyland and Milton, 1997: 193) (see also Petch-Tyson, 1998;
Lorenz, 1998; Neff et al., 2004a). EFL learners state propositions more force-
fully and make a more overt persuasive effort: they overuse communicative
phrasemes that serve as attitude markers (e.g. it is very difficult to, it is very
important to) and boosters (e.g. but it is true that, it is a fact that, it is obvious that).
By contrast, they underuse hedges such as it is (more) likely that, it may be that,
it seems likely that, it is possible that, it is unlikely that, and it would appear that.
Word sequences used as self mentions are also much more frequent
in learner writing than in academic prose (Aijmer, 2002; De Cock, 2003;
Ädel, 2006). Examples include therefore I, because I, I consider, we can, I can, in
my view, I will discuss, provides us with, and from my point of view. Conversely,
academic writers use more clusters with third person pronouns with an
evidential function, e.g. he remarks, she cites, his method, they suggest, a differ-
ence which can be related to the more intertextual nature of professional
academic texts.
EFL learners also underuse a whole set of word sequences involving
the –ed form of verbs, and more precisely, their past participle form.
For example, they underuse the 2-word clusters described as, suggested above,
inferred from, listed above, discussed in and reported by, the 3-word clusters closely
associated with, it was claimed, be ascribed to, when compared with, as noted above,
is described in; the 4-word clusters can be related to, might have been expected, it is
assumed that, may have been used; and the 5-word clusters it has been suggested
that, it could be argued that, be defined in terms of, and be explained in terms of.
This is consistent with Granger and Paquot’s (2009b) finding that past
participles are the most frequent verb forms in academic prose, but are
highly underused in learner writing.
Verbs may have similar frequencies as lemmas in learner writing and
academic prose, but still display over- or underuse of some forms (Granger
and Paquot, 2009b). Examples of AKL verbs following this pattern are differ
and discuss. The lemmas do not differ significantly in their use. However,
differ is underused in its –ing form while discuss is overused in its unmarked
form (discuss) and underused in its –ed form. Similarly, some verbs are
under- or overused as lemmas without this affecting all forms of the verb.
For example, the lemma provide is underused in learner writing compared
to expert writing, but an analysis of word forms indicates that this only
applies to provided; use of other forms of the verb does not differ signifi-
cantly in the two corpora. Table 5.19 shows that the picture can even be
more complex: verb forms may be overused in some specific lexical bundles,
Table 5.19 Clusters of words including AKL verbs which are over- and underused
in learners’ writing, by comparison with expert academic writing
Lemmas and their Overused clusters Underused clusters

word forms
affect (++) affect our, media affect, affect us, was affected, not affect the
affect (++) affects the, affect our approach, media
affects (++) affect our, mass media affect, affect
our approach, media affect our
approach, mass media affect our,
affect our approach to reality, media
affect our approach to
allow (++) allowed to, not allowed, are allowed, allow them, allowed him, by allowing,
allowed (++) not allow, be allowed, allow them, it allow it, allows for, to allow, allowing
allows, allows us, are not allowed, are for, allow that, which allowed,
allowed to, not allowed to, allow them allowed him, to allow for
to, be allowed to, allows us to, are not
allowed to
concern (++) is concerned, are concerned, am was concerned, been concerned,
concerning (++) concerned, concerned about, it concerned to, concerned with, we are
concerns, concerning the, I am concerned, been concerned with, was
concerned, as I am concerned, far as I concerned with, concerned with the, is
am concerned concerned with the
depend (++) depends on, it depends, depending depending upon, depended on,
depends (++) on, much depends, it depends on, depended upon, will depend, depends
depending (++) depends on the, depending on the upon the, depend upon the, will
depended (− −) depend on
differ (//) - differed from, differs from the
differing (− −)
discuss (//) will discuss, to discuss, I will discuss was discussed, already discussed, and
discuss (++) discussed, discussed below, in
discussed (− −) discussing, discussed in, discussed in
chapter
tend (//) tend to, people tend, we tend, they they tended to, and tended to, has
tend (++) tend, people tend to, we tend to, they tended to, have tended to, tended
tended (− −) tend to to be
provide (− −-) provides us, provide us, provide them, might provide, provide the, provides
provided (− −) can provide, provide us with, provides that, provide an, provide evidence, to
us with, provide them with provide, provide a, provides an,
provides a, was to provide, to provide
an, to provide a
− − significantly less frequent (p < 0.01) in ICLE than in BNC-AC-HUM;
// no significant difference between the frequencies in the two corpora
Table 5.20 Examples of overused clusters in learner writing
Examples
2-word clusters in sum, of course, in fact, is why, let us, I think, instead of, look at, we must, or
maybe, really think, there are, my opinion, if you, but I, if we, there is, thanks to, we
want, sure that, I believe, people say, people think, when I, said that, I agree, many
things, no matter, means that, opinion is, I want, everybody knows, people often, let
them, we look, I hope, at all, people believe, even worse, I really, so why, we think,
people feel, we get, I guess, just imagine, think twice, quite sure, why we, I must, very
serious, helps us
3-word clusters in my opinion, in spite of, to sum up, first of all, I think that, in order to, I would
like, that is why, on the contrary, I believe that, to my mind, we have to, all kinds of,
I would say, we all know, people think that, if we want, it means that, by the way, a
look at, on one hand, I am convinced, people believe that, I will try, I agree that, and
of course, everybody knows that, many people think
4-word clusters on the one hand, last but not least, I would like to, some people say that, we can say
that, in this essay I, are more and more, I am sure that, there are a lot, it is impossible
to, I don’t agree with, I want to say, but if we look, I am afraid that, it is easy to
5-word clusters I do not think that, as a matter of fact, from my point of view, I would like to say, far
as I am concerned, it seems to me that, I do not agree with, but at the same time, due
to the fact that, I do not think so
while being underused in others. For example, the verb form concerned is
overused in as I am concerned and concerned about but underused in been
concerned with or we are concerned. Similarly, EFL learners overuse the sequences
it allows and allows us to and underuse the EAP sequence allows for.
A keyword analysis of recurrent word sequences is indispensable if we
want to build up a full picture of all the possible lexical realizations of rhe-
torical functions in learner writing. It makes it possible to uncover a whole
range of words and word sequences that are not typical of academic prose
but which are nevertheless used by EFL learners to organize scientific dis-
course and build the argument of academic texts. Examples of learner-
specific sequences that do not include an AKL word are given in Table 5.20.
They include:
– word sequences that are more frequently used in speech, e.g. of course,
I think that, there are a lot of (see Section 5.2.2);
– sequences that are not used in English to establish the logical link
intended by the EFL learner, e.g. on the other side (see Section 5.2.4 on
semantic misuse);
– sequences that exist in English but are very rare in all types of discourse,
e.g. the sequence as far as I am concerned which is repeatedly used to
express a personal opinion in the ICLE;
– ‘unidiomatic’ sequences such as as a conclusion used as a textual phraseme

to introduce a conclusion (see below for a co-occurrence analysis of the
noun conclusion in the ICLE);
– erroneous sequences such as in contrary, by the contrary, in the contrary, in
contrary to that are used to express a contrast in EFL learner writing.
EFL learners’ overuse of sequences that are rarely used by native-speakers

(such as as far as I am concerned or last but not least) or ‘unidiomatic’ sequences
(such as as a conclusion) may be partly explained by poor teaching materials
and/or the influence of their mother tongue. For example, Les fiches
essentielles du Baccalauréat en anglais (published in 2008 by Clairefontaine)
give a list of linking words that French students are encouraged to use in
the English test of the ‘Baccalauréat’ (the final secondary school examina-
tion which gives successful students the right to enter university) to ‘enrich
their essay and give more clarity to their argumentation’. This includes as a
conclusion but not in conclusion.8 The rare expression as far as I am concerned
is also given as a key expression for voicing one’s own opinion.
Preferred co-occurrences in EFL learner writing

In Section 5.2.1, it was shown that EFL learners manifest a marked prefer-
ence for a restricted set of single words and mono-lexemic phrasemes to
express logical links. They also use learner-specific functional equivalents
of these markers such as the sequence as a conclusion instead of in conclusion.
This learner-specific word combination represents 39.2 per cent of the
concluding textual phrasemes involving the noun conclusion in the ICLE.
In a longitudinal study of German learners’ use of the noun conclusion,
Mukherjee and Rohrback (2006) commented that the sequence as a conclu-
sion is gaining ground in learner writing to the extent that it is even more
frequent than in conclusion in the more recent corpus they use:
Interestingly, the most frequent phrase is no longer in conclusion, but as a

conclusion. This certainly is a problematical development because in con-
clusion is much more frequent and idiomatic than as a conclusion, the lat-
ter being notoriously overused by German learners of English at university
level as well. (Mukherjee and Rohrback, 2006: 224)
This development may be related to the increasing use of the internet

for study purposes and of the type of teaching materials available on this
channel, as discussed above. Another example of a learner-specific logical
say 6
emphasize 2
In conclusion 59 I 37 would 21 like to 13 tell 1
As a conclusion 40 (36%) (20.6%) (12.7%) mention 1
As conclusion 3 speak about 1
reiterate 1
quote 1
Figure 5.8 Phraseological cascades with ‘in conclusion’ and learner-specific

equivalent sequences
marker is on the other side which they use instead of on the other hand to
compare and contrast (see Section 5.2.4 below for more details of learners’
use of on the other side).
In Section 4.2.1, it was shown that mono-lexemic phrasemes such as
for example have their own phraseological patterns in academic prose.
However, these do not seem to be readily available to EFL learners,
who tend to produce their own phraseological ‘cascades’, ‘collocational
patterns which extend from a node to a collocate and on again to another
node (in other words, chains of shared collocates)’ (Gledhill, 2000: 212).9
Figure 5.8 shows that the textual phraseme in conclusion (or one of its
learner-specific functional equivalents) is very often directly followed by the
personal pronoun I in the ICLE. This is consistent with Ädel’s (2006) find-
ing that personal metadiscourse, i.e. metadiscourse items that refer explic-
itly to the writer and/or reader, serves a wide range of rhetorical functions
(including exemplifying, arguing, anticipating the reader’s reaction, and
concluding) in Swedish learner writing. The sequence in conclusion, I is
generally followed by the modal would to produce the word sequence in
conclusion, I would, which, in turn, very often introduces the sequence like to.
The sequence in conclusion, I would like to either introduces the verb say or
another verb of saying such as tell or mention.
EFL learners use AKL nouns and verbs in different lexico-grammatical
or phraseological patterns than professional writers. This has already
been illustrated by learners’ use of the noun example and the verbs illustrate
and exemplify in Section 5.1. Another example is learners’ use of the noun
conclusion. Table 5.21 lists the verb co-occurrents of the noun conclusion in
the ICLE. Some 30.8 per cent of verb co-occurrent types are significant co-
occurrents of the noun conclusion in the BNC-AC . However almost half of
the verb co-occurrent types (46.2%) used in the ICLE do not appear in the
BNC-AC. When tokens are analysed, the percentage of verb co-occurrents
162
Table 5.21 Verb co-occurrents of the noun conclusion in the ICLE
Verb + conclusion as Freq. in Statistically significant Appearance in conclusion as Freq. in Significant Appearance
object ICLE co-occurrent in the BNC-AC subject + verb ICLE co-occurrent in the
BNC-AC in BNC-AC BNC-AC
add up to 1 − x emerge 1 ** √
apply 1 − x arise 1 − √
approach 1 − x contain 1 − X

arrive at 5 ** √ be 23 ** √
bring 1 − x come 1 − X
bring sb to 2 − √ need 1 − √
come to 52 ** √ bring sb to 1 − X
*come into 1 − x
confirm 1 ** √
contain 1 − x
draw 25 ** √
*draw up 1 − x
end with 1 − x
escape 1 ** √
express 1 ** √
find 2 − x
gather 1 − x
get 1 − √
give 1 − √
have 1 − √
influence 1 − √
jump to 2 ** √
lead to 4 ** √
leave sb with 1 − x
look for 1 − x
make 11 − √
point to 1 * √
put 1 − x
put forward 1 − x
reach 3 ** √
write as 1 − x
TOTAL 128 tokens (32 types) TOTAL 29 tokens (7 types)
Academic vocabulary in the ICLE

Legend: ** significant co-occurrent in the BNC-AC (p < 0.01);
− not significant co-occurrents in the BNC-AC;
√ the co-occurrent appears in the BNC-AC;
x the co-occurrent is not found in the BNC-AC
163
that are significant co-occurrents of the noun conclusion in the BNC-AC rises
to 75.8 per cent as several of the verbs are repeatedly used in learner writing
(e.g. come to and draw). Conversely, the percentage of verb co-occurrents
that are not found in the BNC-AC falls to 12 per cent as ‘non-native’
co-occurrences are rarely repeated.
EFL learners use the collocations arrive at + conclusion, come to + conclu-
sion, draw + conclusion, lead + conclusion and reach + conclusion. However, they
do not always use them in native-like lexico-grammatical patterns.
In Examples 5.53 and 5.54, the indefinite article a is used instead of the
definite article the, which is always used in the BNC-AC when the conclu-
sion (underlined in the examples) is introduced by a that-clause. In Exam-
ple 5.55, the frequent phraseme lead to the conclusion that is used with the
personal pronoun us, a pattern which is very rarely found in academic
prose. In the context of EFL teaching/learning, these findings support
Nesselhauf’s (2005: 25) argument that collocations should not be viewed
as involving only two lexemes; other elements closely associated with them
should also be taught.
5.53. However, when we consider all the pros and cons of fast food we will certainly
arrive at a conclusion that it is not an ideal way of eating. (ICLE-PO)
5.54. And taking into consideration that Marx was a materialist we can come to a
conclusion that he himself would be attracted by the advantages of
television, and religion for him would remain the opium of the
masses. (ICLE-RU)
5.55. To sums up, all I have mentioned before lead us to the conclusion that if our
lifes were a little ‘easier’ and we wouldn’t be dominated by a world that is
constantly changing, due to new techniques and industrialization, we could
enjoy doing things as dream and imagine more frequently. (ICLE-SP)
The collocation escape + conclusion appears in two phraseological patterns

in academic prose: ‘it is difficult to escape the conclusion that’ and ‘we cannot
escape the conclusion that’. The single occurrence of the collocation that
appears in the ICLE is used in the native-like lexico-grammatical pattern
‘cannot escape the conclusion that’ but its subject is a nominal phrase headed
by the noun evaluation:
5.56 However, a more objective evaluation of the problem cannot escape the
conclusion that, drug use and abuse have occurred in all civilizations all over
the world, and that it is the criminalization of drugs that has created a much
heavier burden on society. (ICLE-DU)
In the collocation express + conclusion, the verb express has acquired

a semi-technical sense and means ‘make something public’. It is mainly
used in legal discourse and thus conveys a rather formal tone as illustrated
in Example 5.57. Its single occurrence in the ICLE can be qualified as ‘non-
native like’ as it appears with the first person singular pronoun I as subject
and the possessive determiner my (Example 5.58). It may be hypothesized
that the learner who wrote this sentence has been influenced by the native-
like co-occurrence ‘express one’s opinion/view’.
5.57. The Divisional Court expressed its conclusion in the following terms: . . .
(BNC-AC-HUM)
5.58. Finally, I wanted to express my conclusions. (ICLE-SP)
There are many other examples of EFL learners’ attempts at using native-
like collocations, which result in crude approximations. In Example 5.59,
the phrasal verb draw up is used in place of draw and in Example 5.60, the
preposition into replaces to, and no article is used in an attempt to produce
the native-speaker sequence ‘came to the conclusion that’.
5.59. Finally, a conclusion can be drawn up emphasizing our first statement,

that is: technology, science and industrialization have not killed dreams and
imagination. (ICLE-SP)
5.60. The woman started to think about the price of progress and came into conclu-
sion that automation causes more problems than it solves. (ICLE-PO)
In Example 5.61, the verb put forward is used with the noun conclusion. This
verb is commonly used with the abstract nouns plan and proposal, two nouns
that, like conclusion, combine with the verb draw to form collocations.
However, the verb put forward is not used with the noun conclusion in English
(see Figure 5.9). Howarth (1996; 1998) refers to this phenomenon as
a collocational overlap, i.e. a set of nouns which have partially shared
collocates (see also Lennon, 1996).
plan
draw
proposal
put forward
conclusion
Figure 5.9 Collocational overlap

5.61. Without putting forward premature conclusions, we can nevertheless notice

that a certain importance is granted to them. (ICLE-FR)
The semantic incongruity of the co-occurrence ‘put forward a conclusion’

is made apparent by contrasting the definitions of put forward and
conclusion. The verb put forward means ‘to suggest an idea, explanation etc,
especially one that other people later consider and discuss’ (LDOCE4)
while a conclusion is ‘something you decide after considering all the infor-
mation you have’ (LDOCE4). Thus, a conclusion can hardly be put forward
as it is supposed to be more than a suggestion and the result of serious con-
sideration and discussion.
As already pointed out by Nesselhauf (2005), EFL learners also produce
deviant verb + noun free combinations. The noun conclusion enters into
combinations that are not found in academic prose and which are semanti-
cally awkward:
5.62. Looking for the conclusion I would like to say that every person is individual
and each has his or her own character. (ICLE-RU)
5.63. Having considered the various aspects of capitalism a conclusion must
be gathered: the system cannot provide for the basic needs of the population;
consequently it needs to take steps in order to prevent combativity which will
endangered their interests. (ICLE-SP)
The same remark can be made about several adjective + conclusion

co-occurrences (Example 5.64). More importantly, however, adjective
co-occurrents of the noun conclusion in learner writing are not the most
typical ones in academic prose even though a large proportion of them
occur in the BNC-AC (see Table 5.22). The first ten most significant
adjective co-occurrents of the noun conclusion in the BNC-AC are general,
logical, tentative, similar, foregone, main, firm, different, opposite, and definite.
None of these appear in learner writing except for logical. This reveals learn-
ers’ weak sense of native speakers’ ‘preferred ways of saying things’.
5.64. Looking at this idea from the Polish point of view, also brings double standard
conclusions. (ICLE-PO)
The phraseology of EFL learner writing is also characterized by a number

of lexico-grammatical infelicities and errors. Learners sometimes use
the preposition about after the abstract noun account (e.g. an account
*about a murder (ICLE-RU)) or the preposition of instead of for after the
noun demand (e.g. the demand *of raw material (ICLE-GE)). They also use
Table 5.22 Adjective co-occurrents of the noun conclusion in the ICLE
Adjectives Frequency Significant co-occurrents of Appearance in the

conclusion in the BNC-AC BNC-AC
− √
absolute 1
awful 1 − x
certain 3 ** √
clear 1 ** √
clever 1 − x
concrete 1 − √
depressing 1 ** √
double standard 1 − x
fair 1 − √
false 1 − √
final 5 ** √
frightening 1 − x
interesting 1 − √
liberal 1 − x
logical 4 ** √
long-searched for 1 − x
obvious 1 ** √
overall 1 ** √
only 1 − √
own 4 ** √
personal 1 − √
premature 1 − √
private 1 − x
radical 1 − √
right 2 − √
sad 1 − x
same 2 ** √
satisfactory 2 ** √
satisfying 1 − x
sensible 2 − √
successful 1 ** √
terrifying 1 − x
understated 1 − x
unequivocal 1 − √
wrong 1 − √
TOTAL 51 tokens (35 types)
Legend: ** significant co-occurrent in the BNC-AC (p < .01);
− not significant co-occurrents in the BNC-AC;
√ the co-occurrent appears in the BNC-AC;
x the co-occurrent is not found in the BNC-AC.
a to-infinitive structure after the noun possibility instead of an –ing form

(e.g. the possibility *to learn a good job (ICLE-FR)). Other examples of colliga-
tional errors include suggest *to, related *with, attempt *of, and discuss *about.
Example 5.65 illustrates learners’ confusion between the prepositions despite
and in spite of, which results in the blend *despite of (cf. Dechert and
Lennon, 1989):
5.65. Despite *of [Despite] the absence of such professionalism our nation
overcame fascists. (ICLE-RU)
Learners also have a tendency to use the impersonal pronoun it in the

subject position after as:
5.66. It is a matter of fact that these ‘things’ cannot be bought and sold like shares
on the stockmarket. Luckily, I would say because otherwise only the rich would
be able to posses them as *it is [as is] unfortunately the case with many prod-
ucts in other areas of living. (ICLE-GE)
5.67. Because of the ambition for the power, their rivalry made them hold
continuous battles, as *it was [as was] the case of Catholics and Protestants.
(ICLE-SP)
Another source of error is the adjective same which is sometimes preceded

by the indefinite article in the ICLE:
5.68. The negative image of feminism makes it twice as hard for women to rise
above it than it would be if men were facing *a [the] same kind of dilemma.
(ICLE-FI)
5.69. When different people read *a [the] same book they have probably various
imaginations while reading. (ICLE-CZ)
It should be noted that very few of these errors are widespread in learner
writing and that some of them may be partly L1-induced. For example,
French learners use the erroneous colligation discuss *about as a translation
of the French discuter de (Granger and Paquot, 2009b).
5.2.4. Semantic misuse

In Section 5.2.1, the function of comparing and contrasting was shown to
be generally underused in learner writing. An analysis of individual lexical
items, however, reveals that the adverbials on the contrary and on the other
hand are overused in the ICLE. As Lorenz (1999b: 72) has demonstrated,
overuse is often accompanied by patterns of non-native usage. EFL learn-
ers’ semantic misuse of the phraseme on the contrary has already been
reported for different learner populations:
In Hong Kong, we are all familiar with students who use ‘on the contrary’
for ‘however/on the other hand’, thus adding an unintended ‘corrective’
force to the merely ‘contrastive’ function sought. (Crewe, 1990: 317)
Granger and Tyson (1996) report the same conceptual problems for French
learners. Lake (2004) states that a large proportion of EAP non-native
speakers who use on the contrary do so inappropriately. This is confirmed by
our corpus-based analysis of EFL learners from different mother tongue
backgrounds (see also Celce-Murcia and Larsen-Freeman, 1999: 534-535).
EFL learners typically use on the contrary erroneously (instead of a contras-
tive discourse marker such as on the other hand or by contrast) to contrast the
qualities of two different subjects (underlined in Examples 5.70 to 5.72).
Thus, in Example 5.70, the fact that Onasis had everything is contrasted
with the fact that Raskolnikov had nothing and the phraseme by contrast
would have been more appropriate.
5.70. Raskolnikov differs from Onasis, of course. Onasis had everything but he
wanted to have more. Raskolnikov, *on the contrary [by contrast], had
nothing. (ICLE-RU)
5.71. The young like crazy driving, overtaking and leading on the roads.
Sports cars are created for this use and this may be the reason why their price
is so high and use is expensive. *On the contrary [By contrast], station
wagons are not expensive in maintenance. The main users of this kind of
vehicles are families. (ICLE-PO)
5.72. For instance, most Americans have moved to the USA from different coun-
tries as immigrants. *On the contrary [By contrast], Europeans have lived
in their countries for hundreds of years. (ICLE-FI)
The semantic inappropriacy of on the contrary in EFL learner writing has

been attributed to teaching practices. Teaching materials often provide lists
of connectors in which the adverbial on the contrary is described as a phrase
of contrast, that is, as an equivalent alternative to on the other hand, by
contrast, etc. (cf. Crewe, 1990). For pedagogical purposes, Lake (2004)
proposes a checklist of contextual features that should be present when

on the contrary is employed:
As for the implications for learners, it now becomes possible to consult a

checklist of contextual features that should be present in order for on the
contrary to be appropriate:
one subject;
two contrasting qualities;
one positive statement and one negative statement open to similar
interpretations;
an argument, either genuinely present or implied, to which the two
statements, adjacent to the phrase both form a refutation.
Such a checklist may be simplistic in that it does not cover all the possible
lexico-syntactical environments in which the phrase might be encoun-
tered; but as a guideline for production, it ought to prove a useful
starting point from which EAP teachers can devise their own practice
materials. (Lake, 2004: 142)
Lake (2004) rules out the possibility of an L1 influence on EFL learners’

semantic misuse of on the contrary on the basis that over 70 per cent of
international students from widely different mother tongue backgrounds
produced two distinctly separate L1-equivalent items in a cloze test in which
they were required to insert on the contrary or on the other hand and provide an
equivalent phrase for both adverbials. It is, however, probable that misguided
teaching practices and L1 interact here. The L1 equivalent forms to on the
contrary and on the other hand may be characterized by different patterns of
usage and thus be the source of negative transfer. Granger and Tyson (1996),
for example, argued that French learners’ overuse and misuse of on the
contrary is probably due to an over-extension of the semantic properties of
the French ‘au contraire’, which can be used to express both a concessive
and an antithetic link. The potential influence of the first language on
French learners’ use of on the contrary is discussed in Section 5.3 below.
Lake (2004) considers EFL learners’ misuse of on the contrary to be ‘some-
thing of an exception’ and writes that ‘in the EAP context, such functional
phrases [connectives] are usually familiar to learners from an early stage,
and do not pose great problems of usage’ (Lake, 2004: 137). This view,
however, is over-optimistic and is clearly not reflected in our corpus-based
learner data. In Section 5.1, EFL learners’ inappropriate use of the abbre-
viation i.e. (in lieu of e.g.), the preposition as (instead of such as) and the
adverb namely was discussed. Other examples of semantically misused

lexical items in learner writing include on the other hand, on the other side,
moreover, besides, and even if.
Field and Yip (1992: 25) reported that on the other hand is frequently used
by Cantonese speakers to make an additional point, with no implied
contrast. They suggested that this semantic misuse might be L1 induced:
the Chinese equivalent of on the other hand is often misused by novice
L1 writers, who use it to mean ‘another side or aspect’. Although L1
influence may play a part in Hong Kong Chinese students’ inappropriate
use of the adverbial, erroneous uses of on the other hand are found in most
ICLE sub-corpora, which suggests that there are other contributing factors
to this learner difficulty. The following extracts are examples of the use of
on the other hand in the ICLE where it would have been more appropriate to
use no connector or an additive marker:
5.73. I strongly believe that there is still a place for dreaming and imagination in our
modern society. [P]10 Firstly, where there is a child, there are always dreams
and imagination. Everybody knows that children like inventing funny stories
and amusing plays by using their wide fantasy. This is one reason why
children always bring happiness and awake the adults’ childish part. *On the
other hand, fantasy is [also] a useful mean used by teachers in primary
schools to teach school subjects to their little students. So, it is children who keep
dreams and imagination alive! (ICLE-IT)
5.74. The re-introduction of the death penalty may have positive sides, too.
Criminality would be limited, because criminals would be afraid of the severe
punishment. [P] This might be an illusion, because *on the other hand [ø]
the death penalty develops violence and is incompatible with the basic laws of
humanity. (ICLE-GE)
5.75. The function of punishment is to show that crimes are not acceptable or that
they can solve any problems. *On the other hand the aim of punishments is
[also] to make the criminals obey the laws and show example to other’s so that
they will not follow the bad example and commit the same crime. (ICLE-FI)
The word combination on the other side sometimes appears in the ICLE in
places where a contrast seems to be the logical link intended by EFL learn-
ers. This does not occur in academic prose. It is illustrated in the following
examples:
5.76. Poland cannot reply with isolation as the unification still remains the best
solution to its problems. On the other side, all countries should understand
that history and its consequences cannot divide the continent. The successful
process of unification should be carried out with respect to nations’ rights and
without special privileges given to the powerful. (ICLE-PO)
5.77. Another big problem is our environment. There is pollution wherever you look.
We can no longer enjoy the sun in summer because of the hole in the ozone
layer. This hole is caused by technical improvements in the last decades. But on
the other side it is sometimes hard to live without car or aerosols. (ICLE-GE)
5.78. Europe 92 means well a loss of identity since we’ll be no longer Belgians,
Italians, English ... but Europeans. But on the other side we will form a new
nation with new hopes, new ideas . . . (ICLE-FR)
There is also some confusion between the conjunctions even if and even
though in EFL learner writing. Learners often use even if in lieu of even though
to introduce a concession:
5.79. However,*even if [even though] I agree that the American public school
system is defective, home schooling to me is no real alternative, as I feel that
parents are not the best teachers for their own children. (ICLE-GE)
5.80. We must forget about refrigerators containing CFC-11 and CFC-12, *even if
[even though] they are cheaper. (ICLE-PO)
5.81. We are as much a part of Europe as any other country here, *even if [even
though] we are not in the European Union. (ICLE-SW)
Even if should be used to introduce a condition, not a concession.

Compare:
5.82. Even if these descriptions are valid they still leave open a number of questions,
particularly why the same mechanisms do not operate with girls.
5.83. Even though these descriptions are valid they still leave open a number of
questions, particularly why the same mechanisms do not operate with girls.
In the second sentence, the writer knows and accepts that the descriptions
are valid. In the first sentence, he or she does not.
Semantic misuse has often been discussed in the literature in relation to
logical connectives. However, EFL learners also experience difficulty
with the semantic properties of other types of cohesive devices, and more
specifically, labels, i.e. abstract nouns such as issue, argument, and claim that
are inherently unspecific and require lexical realization in their co-text,
either beforehand or afterwards (Flowerdew, 2006). In addition to phraseo-
logical and lexico-grammatical infelicities, EFL learners’ use of labels is
characterized by semantic infelicity or lack of semantic precision. Learners,

for example, use the noun problem as an ‘all purpose wild card’ (cf. Lorenz,
1999b) in lieu of more specific nouns such as issue or question as illustrated
in the following sentences:
5.84. This short discussion of the main points linked to the problem [issue] of
capital punishment leads to the final question. (ICLE-PO)
5.85. The most important question concerning genetic engineering is the problem
[that] of gen manipulation with humans. (ICLE-GE)
5.86. If we are aware of the fact that such time-tables are very common for people
living in a modern society like ours, the problem [question] of the place of
imagination and dreaming is not even worth examining. Industrialisation
has transformed dreaming into a waste of time which is now “cleverly” linked
to money. (ICLE-FR)
The noun argument also seems to cause difficulty to EFL learners.

In Example 5.87, the rather unidiomatic expression ‘familiar arguments
about’ should be rephrased as ‘widespread or popular beliefs about’.
In Example 5.88, the sentences that follow the label argument would be
better described as ‘reasons’ why Big Tobacco did not depart from pre-
pared statements.
5.87. Female participation in making decisions concerning war and peace, economy
and environmental protection would be to the benefit of all. However it will not
be possible until males re-think and, hopefully, reject familiar arguments
[widespread/popular beliefs] about women being unreliable, irrational
and dependent on instincts. (ICLE-PO)
5.88. There are two main arguments [?reasons] that help us understand why Big
Tobacco stuck to their statements for so long. [P] First, the companies feared
the consequences that would follow a confession. They feared that there was
going to be even more legislation and regulation if they would ever admit to
lying. . . . . (ICLE-DU)
Other problematic labels include, among many others, aspect and issue.
In Example 5.89, another aspect introduces a second example (about the
unemployed and housewives) of the fact that you are judged by what you
do rather than by what you are, contrasting it with the first example (about
physicists and mathematicians). In Example 5.90, in certain aspects stands for
in some respects and the aspect of money probably refers to the ‘money issue’
or the ‘money question’ in Example 5.91.
5.89. Our modern western society puts a lot of pressure on people as far as work
is concerned. Your job is your “trademark”. Or, in other words, you are judged
by what you do rather than by what you are. Sad, but true. For example,
according to popular opinion you must be very intelligent if you are a physicist
or a mathematician. And another aspect is that [?by contrast,] the unem-
ployed or housewives are sometimes treated as social outcasts. (ICLE-GE)
5.90. Actually, bits of information from the remotest parts of the globe reach us in an
instant. Human beings can eventually feel as one great family, but only *in
certain aspects [in some respects], for as far as real good relations among
countries are concerned, it is still a matter of distant future. (ICLE-PO)
5.91. A legend exists that money was invented by the devil to tempt the mankind.
The aspect [?issue/question] of money includes the problem of equality.
There were and there are different ideas about making all people equal, because
it was considered that this would lead to common happiness. (ICLE-RU)
In Example 5.92, it is not quite clear what her issues refer to and in
Example 5.93, issue most probably stands for ‘product’:
5.92. Uta Ranke-Heinemann, the most famous woman in the field of Catholic
theology, tries to provide answers to them. Her issues [?] lies on the verge of
theology, philosophy and first of all, religion. She is employed in defining the
relation between faith and the mind. (ICLE-PO)
5.93. The picture I draw from my dear old houseman admittedly is nothing
but a mere cliché, a hyperbolic issue [product] of my vivid imagination.
(ICLE-GE)
5.2.5. Chains of connective devices

EFL learners’ texts are sometimes characterized by the use of too many
connective devices (Crewe, 1990; Chen, 2006; Narita and Sugiura, 2006).
The following text is an excerpt from an essay written by a French-speaking
EFL learner. Each sentence contains at least one connective device –
typically an adverbial connector or a sentence stem – which is often found
in sentence-initial position (see Section 5.2.6 below).
5.94. [1] But what about these prestigious institutions today? [2] To caricature
them rapidly one could say that universities consist of courses given by profes-
sors (competent in their fields) in front of a silent audience who is conscien-
tiously taking notes. [3] So one can wonder if a university degree really prepare
students for real world and what his value is nowadays. [4] I think it is true
that lectures in themselves are theoretical. [5] Firstly because students spend
most of their time sitting in big classrooms which do not allow practical
exercises but only ex cathedra lectures. [6] Secondly because the subjects of the
lectures are theoretical. [7] For example: during a general methodology course
(which, we think, could be more practical) different theories as Krashen’s,
Lado’s are studied in detail but practical points are hardly ever considered.
[8] However is it true that this formation does not prepare students for real
world? [9] I am of the opinion that the answer is no. [10] First I think that
university degrees are theoretical on purpose (as opposed to high schools which
are more practical.). [11] The reason is that, thanks to the theoretical back-
ground they have learned, university students are able to build up their own
way to achieve their aim. [12] Moreover they are also able to adapt or to
modify their method according to the situation. [13] To take the example of a
teacher again, I could say that a teacher in front of a classroom does not think
about particular methodological theories again but that he has created his own
methodology. [14] Secondly, I think that academic studies develop a critical
mind. [15] The students are indeed trained to analyse pieces of information
coming from different horizons from a critical point of view, which means that
they have to dissect them, to confront them and then to be able to pass judgment
on them. [16] That is the way they should create a personal opinion for them-
selves. [17] Nevertheless, I do not want to go too far. [18] I really think that
theory is essential but I am convinced that practice should also be present. [19]
Let’s take the example of a student in economics who has his certificate in his
pocket and proudly goes working in a big firm for the first time. [20] I would
compare this business man to a gentleman who perfectly knows the highway
code and who knows how to start and how to run through the gears but who
finds himself in the center of Paris at the peak hours the first time he really
drives! [21] By this example, I want to show that theory must always be
accompagnied by practical applications, which is not often the case at univer-
sity. [22] I think that this is a fully justified criticism against this institution.
Some EFL learners use many logical connectives between sentences simply
to indicate to the reader that they are adding another point (e.g. firstly,
secondly, for example, first, moreover, to take the example of). Several of these
connectors are superfluous and sometimes wrongly used (e.g. moreover
in sentence [12], indeed in sentence [15]). Crewe (1990) attributed EFL
learners’ massive overuse of connective devices to their attempt at imposing
‘surface logicality on a piece of writing where no deep logicality exists’
(Crewe, 1990: 320). He added that ‘over-use at best clutters up the text
unnecessarily, and at worst causes the thread of the argument to zigzag
about, as each connective points it in a different direction’ (ibid: 324).
The following excerpt from an EFL learners’ essay is a good example
of EFL learners’ use of logical connectors as ‘stylistic enhancers’, i.e. ‘words

or expressions that may be sprinkled over a text in order to give it an
“educated” or “academic” look’ (Crewe, 1990: 316) but whose presence will
not make the text coherent.
5.95. Furthermore, Hobbes is a stern determinist. He regards man, like nature, as

subject to the chain of cause and effect. Therefore a concept like ‘free will’ is
impossible. Hobbes even considers people as artificial creatures, not belonging
to nature, because they are not able to live together in harmony, something
which animals like bees and ants are capable of, because they are natural.
Of course, these ideas were as much an insult to man’s estimation of himself
as Darwin’s allegation, two hundred years later, that our ancestors used to live
in trees. As a consequence, Hobbes was accused of being an atheist and forbid-
den to publish any more books. (ICLE-DU)
As Aijmer (2001) showed in a study of Swedish EFL student writing, learn-

ers use I think to make their claims more persuasive rather than to express
a tentative degree of commitment. They often use I think or an equivalent
expression (e.g. I am of the opinion that, I am convinced that) when it is
communicatively unnecessary in the flow of argumentation. For example,
Sentence [18] in Example 5.94 could be rephrased as ‘Theory is essential
but practice should also be present’. The sequence I think it is true in
Sentence [4] corresponds to what Aijmer (2001) described as a ‘rhetorical
overstatement’, which the author regards as typical of non-native-speaker
argumentative essays. The clusters To me, I think and as far as I am concerned,
I think that in Examples 5.96 and 5.97 respectively are two more instances of
rhetorical overstatement.
5.96. To me I think technology and imagination are very much interrelated, and
then on the other hand I understand that they also can be seen as separate.
(ICLE-SW)
5.97. I agree with George Orwell, because as far as I am concerned I think that in
every country there are few people which are rich and many people which are
poor. (ICLE-IT)
The pedagogical implication of these findings is that, ‘important as these

links are, learning when not to use them is as important as learning when
to do so. In other words, students need to be taught that excessive use of
linking devices, one for almost every sentence, can lead to prose that sounds
both artificial and mechanical’ (Zamel, 1983: 27).
5.2.6. Sentence position

Linking adverbials occur in different sentence positions. They often occur
initially, as does however in Example 5.98. They can also occur in a medial
position, i.e. within the sentence, often immediately after the subject, as
shown in Example 5.99. The final position is also possible, but is more typi-
cal of speech as illustrated in Example 5.100.
5.98. In practice, the Red Army units did nothing to conciliate the Ukrainian
Left or the peasants. Agriculture was brutally collectivized and no conces-
sions were made in the use of the Ukrainian language and culture.
However, Denikin’s White armies counter-attacked and after seven months
the Red Army was obliged to withdraw. (BNC-AC-HUM)
5.99. Coysevox’s bust of Lebrun repeats – again with a certain restraint – the
general outlines of Bernini’s bust of Louis XIV. The face, however, shows a
realism and subtlety of characterization that are Coysevox’s own. (BNC-
AC-HUM)
5.100. It’d be worth asking him first, though. (BNC-SP)
EFL learners’ marked preference for the sentence-initial position has

been reported in various studies focusing on one L1 learner populations
(Field and Yip, 1992; Lorenz, 1999b; Zhang, 2000; Narita and Sugiura,
2006). Granger and Tyson (1996: 24) commented that ‘it is likely that this
tendency for learners to place connectors in initial position is not language-
specific’. Our analysis of connectors in the ICLE supports this hypothesis.
Table 5.23 shows that the total proportion of sentence-initial connectors in
learner writing is much higher than that found in academic prose (13.2%
compared to 6%). Examples include the preposition despite which appears
in sentence-initial position in 52 per cent of its occurrences in the ICLE but
only in 34.5 per cent in the BNC-AC-HUM (see Example 5.101), and
sentence-initial due to which is repeatedly used in learner writing but hardly
ever occurs in academic prose (Example 5.102).
5.101. Despite its commercial character Christmas still means a lot to me.
(ICLE-FI)
5.102. Due to these developments the production expanded enormously, which
meant that a greater number of people could be fed. (ICLE-DU)
Another example is the adverb therefore which often appears in sentence-

initial position in the ICLE but is not often used in that position in the
BNC-AC-HUM:
Table 5.23 The frequency of sentence-initial position of connectors in the

BNC-AC-HUM and the ICLE
ICLE BNC-AC-HUM
S-I Total % Rel. freq. S-I Total % Rel. freq.

freq. pmw freq. pmw
although 263 522 50.4 225.6 676 2,276 29.7 203.5

and 1456 32,236 4.5 1249 1374 91,306 1.5 413.6
as a result 71 103 68.9 60.9 65 102 63.7 19.6
as a result of 24 79 30.4 20.6 22 194 11.3 6.6
as far as X is 96 167 57.5 82.4 31 59 52.5 9
concerned
because 107 2,493 4.3 91.8 151 2,207 6.79 45.4
because of 62 530 11.7 53.2 46 599 7.67 13.8
consequently 103 179 57.5 88.4 60 143 42 18
despite 50 96 52 42.9 235 681 34.5 70.7
due to 29 246 11.8 24.9 3 195 1.5 0.9
even if 83 274 30.3 71.2 94 451 20.8 28.2
even though 46 127 36.2 39.5 28 248 11.3 8.4
for example 235 854 27.5 201.6 233 1263 18.4 70
for instance 93 344 27 79.8 86 609 14.1 25.9
furthermore 113 127 96.6 96.9 176 217 81.1 53
however 673 1,128 59.7 577.4 882 3,353 26.3 265.5
in spite of 47 106 44.3 40 42 159 26.4 12.6
moreover 255 292 87.3 218.8 365 495 73.7 109.9
nevertheless 170 250 68 145.8 392 676 58 118
on the 92 164 56.1 78.9 48 95 50.5 14.4
contrary
on the other 228 418 54.5 195.6 155 372 41.7 46.7
hand
so 805 1,436 56 690 675 1,894 35.6 203.2
thanks to 68 199 34.2 58.3 5 35 14.3 1.5
therefore 340 689 49.3 291.7 75 1,412 5.3 22.5
thus 221 446 49.5 189.6 756 1,767 42.8 227.6
TOTAL 5,730 43,505 13.2 4,916.24 6,675 110,808 6 2009
5.103. Scientific research as well as individual observations prove that eating habits
have a great impact on the condition of the human body and soul and, con-
sequently, on rest, sleeping and even dreams. Therefore people should pay
more attention to what they consume. (ICLE-PO)
These findings provide evidence for EFL learners’ lack of knowledge of

the preferred syntactic positioning of connectors in English.11 This lack has
often been attributed to L2 writing instruction. Flowerdew (1993) argued
that teaching materials do not provide students with authentic descriptions
of syntactic patterns of words. He showed that, contrary to what is often
taught in course books, the adverbial connector then rarely occurs in

sentence-initial position, but is more usually found in a medial position.
Similarly, Milton (1999: 225) discussed the problematic aspects of teaching
connectors by means of lists of undifferentiated items, and suggested
that one way in which instruction may skew EFL learners’ style is ‘by the
presentation of these expressions as if they occurred in only sentence-initial
position’ (see also Narita and Sugiura, 2006). Thus, EFL learners’ tendency
to place connectors in unmarked sentence-initial position seems to be
reinforced by teaching (see Granger, 2004: 135).
Unmarkedness provides another possible explanation for EFL learners’
massive overuse of sentence-initial connectors. Conrad (1999) studied
variation in the use of linking adverbials across registers. She showed that,
in both conversation and academic prose, the highest percentage of linking
adverbials appear in sentence-initial position and concluded that ‘initial
position seems the unmarked position for linking adverbials’ (Conrad,
1999:13) (see also Biber et al., 1999 and Quirk et al., 1985). EFL learners
seem to use the unmarked sentence-initial position as a safe bet.
Contrary to our expectations, the proportion of sentence-initial because is
lower in learner writing than in professional writing. However, sentence-
initial because is significantly more frequent (relative frequencies of 9.18 in
learner writing and 4.54 in academic prose). It is also used to serve different
functions in learner writing. In academic prose, sentence-initial because-
clauses are attached to a main clause. As shown in the following examples,
they introduce the cause of something that is described in the main clause:
5.104. Because these changes were worldwide, Europe’s history is inseparable from
world history between 1880 and 1945. (BNC-AC-HUM)
5.105. Because the death-rate was high, marriages were usually short-term. (BNC-
AC-HUM)
Unlike expert writers, EFL learners sometimes use sentence-initial because

to introduce new information in independent segments and give the cause
of something that was referred to in the previous sentence:
5.106. The crime rate would also strongly reduce and this is of course the main objec-
tive of all this measures. Because everybody wants to live in a safe society.
(ICLE-DU)
5.107. To directly try to change people with ‘experience of life’ would, at best, only be
to win Pyrrhic-victories, compared to this effective investment. Because deep
inside every man’s heart lies the ‘Indian’-insight that we are only borrowing
the earth from our children. (ICLE-SW)
5.108. In my opinion it is useful only for them, for their trial. Because their sorrow
is found as the extenuating circumstance. (ICLE-CZ)
EFL learners share this characteristic with ESL writers. In a comparison of

strategies for conjunction in spoken English and English as a Second
Language (ESL) writing, Schleppegrell (1996) found that students who
had spent most of their lives in the US and learnt English primarily through
oral interaction, transferred conjunction strategies from speech to essay
writing. They employed both an ‘afterthought’ because (Altenberg, 1984) to
add information in independent segments, and other types of speech-like
clause-combining strategies.
Conrad (1999) reported that, in academic prose, most linking adverbials
are placed in sentence-initial or medial position. Three types of medial
position are particularly frequent (Conrad, 1999: 14–15):
1. Linking adverbials which occur immediately after the subject as

illustrated in Example 5.99 above.
2. Linking adverbials which occur between an auxiliary and the main verb,
such as: All estimates of population size must therefore allow for a large measure
of conjecture, a fact stressed by all reputable modern historians who have worked
on this intractable subject. (BNC-AC-HUM)
3. Linking adverbials which occur between the main verb and its comple-
ment, e.g.: It is difficult to believe therefore that one of these mosaics was not
influenced by the other. (BNC-AC-HUM)
A medial position for connectors is quite typical of academic prose.

However, it is clearly less favoured by EFL learners. As indicated
above, teaching materials tend to focus on sentence-initial position, and
EFL learners probably feel unsafe about other syntactic positionings for
connectors.
Table 5.24 shows that several connectors are also repeatedly used in
sentence-final position in the ICLE, which is quite uncommon in the
BNC-AC-HUM. The final position is frequent in conversation, but rare in
academic prose. Conrad (1999) found that three highly frequent items –
then, anyway and though — account for the relatively high proportion of
sentence-final linking adverbials in native-speaker’s conversation. She
argued that these linking adverbials are commonly found in sentence-final
position as they serve important interpersonal functions:
Adverbials in conversation, in addition to showing a link with previous

discourse, can also play important roles in the interpersonal interaction
Table 5.24 Sentence-final position of connectors in the ICLE and the BNC-AC-
HUM
ICLE BNC-AC-HUM
S-F Tot. freq. % Rel. freq. S-F Tot. freq. % Rel. freq.
anyway 25 132 18.9 2.1 20 71 28.2 0.6

for example 63 854 7.4 5.4 20 1263 1.6 0.6
for instance 31 344 9.0 2.7 8 609 1.3 0.2
indeed 15 257 5.8 1.3 18 1413 1.3 0.5
of course 34 750 4.5 2.9 14 863 1.6 0.4
then 35 1054 3.3 3.0 17 3062 0.5 0.5
though 11 256 4.3 0.9 7 178 0.9 0.2
that takes place. These roles are often particularly noticeable for the
common adverbials in final position. (. . . ) [A] final though often occurs
when speakers are disagreeing or giving negative responses, final anyway
is often associated with expressions of doubt or confusion, and (. . . ) then
typically indicates that a speaker is making an inteference (sic) based on
another speaker’s utterance. The placement of these adverbials in final
position is consistent with previous corpus analysis of conversation that
has found that elements with particular interpersonal importance are
often placed at the end of a clause (. . . ). It may be, then, that in some
cases in conversation there is a tension between placing the linking adver-
bial at the beginning of the clause, due to its linking function, and at the
end of the clause, due to its interactional function. (Conrad 1999:14)
The type of interpersonal interaction that takes place in conversation is not

typical of academic prose. Thus, none of the linking adverbials commonly
associated with the final position in conversation are common in formal
writing. These findings suggest that the positioning of linking adverbials in
native discourse is directly influenced by the register in which they appear,
and the textual and/or interpersonal functions they serve.
5.3. Transfer-related effects on French learners’

use of academic vocabulary
The focus of Section 5.2 was on interlanguage features that are shared by
most learner populations when compared to expert academic writing, and
which are therefore likely to be developmental. Multiple factors, however,
may combine to influence learners’ use of academic vocabulary. It has, for
example, been suggested that learners’ preference for the sentence-initial
position for connectors may be attributed to the influence of instruction or

‘transfer of training’ (Selinker, 1972). The marked difference in frequency
of I think across the learner sub-corpora may be partly explained by
different academic writing conventions in the different mother tongues.
As Granger (1998b: 158) put it, ‘learners clearly cannot be regarded as
“phraseologically virgin territory”: they have a whole stock of prefabs in
their mother tongue which will inevitably play a role – both positive and
negative – in the acquisition of prefabs in the L2’.
Claims made about the nature of L1 influence and its interaction with
other factors, however, have often been built on shaky methodological
foundations and suffer from what Jarvis (2000: 246) referred to as
a ‘you-know-it-when-you-see-it’ syndrome. To remedy this situation, Jarvis
(2000) incorporated three types of L1 observable effects into a unified
framework for the study of L1 influence and proposed the following
working definition of L1 influence, which is intended as a methodological
heuristic to be used by transfer researchers:
L1 influence refers to any instance of learner data where a statistically

significant correlation (or probability-based relation) is shown to exist
between some features of learners’ IL performance and their L1 back-
ground. (Jarvis, 2000: 252)
Jarvis translated his working definition of L1 influence into a list of specific

types of L1 observable effects that should be examined when investigating
transfer. He argued that transfer studies should minimally consider at least
three potential effects of L1 influence when presenting a case for or against
L1 influence:
1. Intra-L1-group homogeneity in learners’ IL performance is found when

learners who speak the same first language behave as a group with respect
to a specific L2 feature. To illustrate this first L1 effect, Jarvis used
Selinker’s (1992) finding according to which Hebrew-speaking learners
of English as a group tend to produce sentences in which adverbs are
placed before the object (e.g. I like very much movies).12 Intra-L1-group
homogeneity is verified by comparing the interlanguage of learners
sharing the same mother tongue background.
2. Inter-L1-group heterogeneity in learners’ IL performance is found when
‘comparable learners of a common L2 who speak different L1s diverge
in their IL performance’ (Jarvis, 2000: 254). To illustrate this effect,
Jarvis referred to a number of studies reported by Ringbom (1987) that
have shown that Finnish-speaking learners are more likely than Swedish-
speaking learners to omit English articles and prepositions. Jarvis argued
that ‘this type of evidence strengthens the argument for L1 influence
because it essentially rules out developmental and universal factors as
the cause of the observed IL behaviour. In other words, it shows that the
IL behaviour in question (omission of function words) is not something
that every learner does (to the same degree or in the same way) regard-
less of L1 background’ (Jarvis, 2000: 254–5). Inter-L1-group heterogene-
ity is identified by comparing the interlanguage of learners from different
mother tongue backgrounds.
3. Intra-L1-group congruity between learners’ L1 and IL performance is
found where ‘learners’ use of some L2 feature can be shown to parallel
their use of a corresponding L1 feature’ (Jarvis, 2000: 255). Selinker
(1992) uses this type of evidence to show that Hebrew-speaking learners’
positioning of English adverbs parallels their use of adverbs in the L1.
The added value of this third L1 effect is that it also has explanatory
power by showing ‘what it is in the L1 that motivates the IL behavior’
(Jarvis, 2000: 255). Intra-L1-group congruity is confirmed by an IL/L1
comparison.
These three effects can emerge in circumstances in which transfer is not

at play and can thus be misleading when considered in isolation. As shown
in Table 5.25, Jarvis concluded that, despite differences in the degree of
reliability, none of the three effects is sufficient by itself to verify or charac-
terize L1 influence. The identification of two simultaneous L1 effects is
necessary to present a convincing case for L1 influence. Identifying the
three L1 effects would be even more convincing if it were not that ‘the
ubiquity of conditions that can obscure L1 effects renders the three-effect
requirement unrealistic in many cases’ ( Jarvis, 2000: 255).
Table 5.25 Jarvis’s (2000) three effects of potential L1 influence
L1 effect reliability sufficient criterion
Intra-L1-group homogeneity in learners’ poor no

IL performance
Inter-L1-group heterogeneity in learners’ strong no
IL performance
Intra-L1-group congruity between learners’ L1 and strongest no
IL performance
Table 5.26 Jarvis’s (2000) unified framework applied to the ICLE-FR
L1 effect Corpus comparisons
Intra-L1-group homogeneity in learners’ A comparison of the use of a specific lexical

performance item in all the essays written by French
learners
Inter-L1-group heterogeneity in learners’ IL A comparison of the use of a specific lexical
performance item in the ICLE-FR against other L1 sub-
corpora
Intra-L1-group congruity between learners’ A comparison of a specific lexical item in
L1 and IL performance the ICLE-FR to the use of its equivalent
form in a comparable corpus of French
native student writing
I made use of Jarvis’s (2000) unified framework to investigate the poten-

tial influence of the first language on multiword sequences that serve rhe-
torical functions in French learners’ argumentative writing. The International
Corpus of Learner English appears to be ideally suited to analysing the three
potential effects of L1 influence described by Jarvis (2000). Table 5.26 lists
the three steps needed to investigate the influence of French on recurrent
word sequences in the ICLE-FR. Intra-L1-group homogeneity in learners’
performance is investigated by comparing all the essays written by French
learners to verify whether they behave as a group with respect to a specific
L2 feature. Inter-L1-group heterogeneity in learners’ IL performance is
verified by a comparison of the number of texts in which a specific lexical
item is used in the ICLE-FR and in other L1 sub-corpora. Unlike Jarvis
(2000), I made use of comparison of means tests and post hoc tests such as
Ryan’s procedure and Dunnett’s test to confirm this second L1 effect.13
To establish intra-L1-group congruity between learners’ L1 and IL perfor-
mance, French EFL learners’ use of a specific lexical item is compared to
the use of its equivalent form in a 225,174-word comparable corpus of essays
written by French-speaking students collected at the University of Louvain,
i.e. the Corpus de Dissertations Françaises (CODIF).
Applying Jarvis’s (2000) framework to the ICLE texts reveals the potential
influence of transfer on French learners’ use of multiword sequences that
serve specific rhetorical functions in English. For example, the three trans-
fer effects are found in French learners’ use of on the contrary, indicating
that L1 influence most probably reinforces the conceptual problems and
misguided teaching practices that were identified in Section 5.2.4 as poten-
tial explanations for the frequent misuse of the adverbial. This strongly
supports Granger and Tyson’s (1996) suggestion that French learners’
overuse and misuse of the connector is probably due to an over-extension
of the semantic properties of the French au contraire, which can be used to

express both a concessive and an antithetic link.
Most transfer studies have focused on what we can call ‘transfer of
form’ (e.g. borrowing), ‘transfer of meaning’ (cf. semantic transfer, seman-
tic extension) or ‘transfer of form/meaning mapping’ (e.g. cognates)
(see Jarvis and Pavlenko, 2008; Odlin, 1989; 2003 and Ringbom, 2007 for
excellent syntheses on lexical and semantic transfer). Next to knowledge of
form and meaning, however, knowing a word also involves knowing in what
patterns, with what words, when, where and how to use it (Nation, 2001: 27).
These other types of knowledge can also give rise to transfer. For example,
research into learners’ use of cognates has highlighted transfer effects on
style and register (cf. Granger and Swallow 1988; Van Roey 1990; Granger
1996b). Studies focusing on learners’ use of phrasemes have brought to
light transfer effects on collocational restrictions and lexico-grammatical
patterns (e.g. Biskup, 1992; Granger, 1998b; Nesselhauf, 2003). However,
much remains to be done regarding ‘transfer of use’.
Applying Jarvis’s (2000) unified framework on learner corpus data
brings to light interesting findings relating to L1 influence on word use.
It helps to identify a number of transfer effects that remain largely undocu-
mented in the SLA literature: transfer of function, transfer of the phraseo-
logical environment, transfer of style and register, and transfer of L1
frequency. These four transfer effects often accompany transfer of form
and meaning and may also reinforce each other. They are illustrated in the
remaining of this section.
Multiword sequences with a pragmatic anchor seem to be quite
easily transferred. French learners’ use of the idiosyncratic expression
*according to me is a good example of transfer of function. This sequence is
repeatedly used in the ICLE-FR; it does not appear in other learner
sub-corpora except for the ICLE-DU and the ICLE-SW, where it is extremely
rare. Moreover, there is congruity between French learners’ use of according
to me in English and selon moi in French, which are probably regarded as
translation equivalents by French EFL learners. The English preposition
according to and the French selon both mean ‘as shown by something or stated
by someone’ (e.g. According to George Heard Hamilton, Rodin became “a figure of
international significance, the most admired, prolific, and influential sculptor since
Bernini”, BNC-AC-HUM). However, they differ in one significant way: accord-
ing to me is usually not accepted as a correct English phraseme. By contrast,
selon moi is perfectly fine in French and is, in fact, quite frequent in French
native-speaker students’ writing. This may explain why French EFL learners
are keen to use what they regard as a direct translation of a common French
expression.
The following examples illustrate French students’ use of selon moi and
French EFL learners’ use of according to me:
5.109. Selon moi, la chanson est un vecteur de culture parce qu’elle est un art qui
impose l’engagement des différents acteurs. (CODIF)
5.110. Selon moi, tout le monde pense ce qu’il veut et comme il veut, agit comme il
l’entend en respectant la loi et les codes établis. (CODIF)
5.111. According to me, the real problem now is not that man refuses to pay heed
but that man refuses to make some sacrifices for the sake of ecology and to
understand that the values that we have chosen are the wrong ones.
(ICLE-FR)
5.112. According to me, the prison system is not outdated: it has never been
a solution per se. (ICLE-FR)
Figure 5.10 represents graphically how the misleading translation equiva-

lent may be created by French EFL learners.
Transfer effects are also detectable in French learners’ use of lexico-
grammatical and phraseological patterns. The English verb illustrate is a
case in point. Although it is not found in many texts written by French
learners, it is much more frequent in ICLE-FR overall than in any other
learner sub-corpus. Table 5.27 shows that French EFL learners frequently
use the verb illustrate in its infinitive form. The percentage of use of this
form (40%) is quite similar to that of the infinitive form of the French cog-
nate verb illustrer in CODIF, but differs significantly from the proportion of
infinitive forms of the English verb illustrate that were found in the BNC-AC-
HUM (23.6%) (cf. Table 4.6 in Section 4.2.2). A closer look at the occur-
rences of the infinitive form of illustrate in ICLE-FR reveals that it is
repeatedly used in sentence-initial to-infinitive structures (Examples 5.113
and 5.114), a pattern that is also the preferred lexico-grammatical environ-
ment of illustrer in the corpus of French essays (Example 5.115).
5.113. To illustrate this, we can mention the notion of culture and language in the
north of Belgium. (ICLE-FR)
5.114. To illustrate this point, it would be interesting to compare our situation with
the U.S.A.’s. (ICLE-FR)
5.115. Pour illustrer cela, prenons l’exemple des pâtes alimentaires italiennes.
(CODIF)
French learners’ knowledge of the verb illustrer in their mother tongue

probably influences the type of word combinations and lexico-grammatical
patterns in which they use the English verb illustrate.
FRENCH
'selon'
'selon' + [+HUM] 'selon' + [-HUM]

e.g. idée, loi, principe, philosophie,
'selon X' 'selon moi' argument, théorie, norme, etc.
e.g. lui, Hugo,
monsieur Bernanos,
certains, etc.
FRENCH LEARNERS' INTERLANGUAGE
'according to'
'according to' + [+HUM] 'according to'

+ [+HUM]
'according to X' *'according

to me'
e.g. Civil Liberty Members, e.g. idea, article, theory, argument,

supporters, Judge Kamins, Xavier situation, etc.
Flores, etc.
'according to' + [+HUM] 'according to' + [-HUM]
'according to'
ENGLISH
Figure 5.10 A possible rationale for the use of ‘according to me’ in French
learners’ interlanguage
Similarly, French EFL learners almost always use the verb conclude in the
sentence-initial discourse marker To conclude followed by an active structure
introduced by a first person pronoun + modal verb. This pattern is less fre-
quent in the writing of EFL learners with other mother tongue backgrounds
and parallels a very frequent way of concluding in French, viz. sentence-
initial Pour conclure. The following examples show that longer sequences
Table 5.27 A comparison of the use of the English verb

‘illustrate’ and the French verb ‘illustrer’
En. ‘illustrate’ Fr. ‘illustrer’ in

in ICLE-FR CODIF
simple present 10 50% 8 31%

infinitive 8 40% 13 50%
past participle 2 10% 3 12%
imperative 0 0% 1 4%
past 0 0% 1 4%
TOTAL 20 100% 26 100%
Rel. freq. per 100,000 words 14.67 11.55
and phraseological cascades (see Section 5.2.3) may also be transfer-

related.
5.116. Pour conclure, nous pouvons dire que les deux stades sont aussi importants
l’ un que l’ autre : il est nécessaire que l’ homme soit membre d’ un groupe
mais il est tout aussi primordial qu’ il s’ en détache pour construire son iden-
tité propre. (CODIF)
5.117. To conclude, we can say that many people are today addicted to television.
(ICLE-FR)
5.118. Pour conclure, je dirais que chaque individu est unique, différent et qu’il est
facile de vouloir ressembler aux autres plutôt que de s’accepter tel qu’on est.
(CODIF)
5.119. To conclude, I would say that science, technology and industrialisation cer-
tainly stand in the way of human relationships but not in people’s dreams
and imagination. (ICLE-FR)
My findings also point to a transfer of style and register. In Section 5.2.3,

the first person plural imperative form let us was shown to be overused by all
L1 learner populations when compared to expert academic writing.
As shown in Table 5.28, the two-word sequence occurs in 25.9 per cent of
the texts produced by French learners and is much more frequent in the
ICLE-FR than in any other learner sub-corpus. This difference in use
between ICLE-FR and the other ICLE sub-corpora proved to be statistically
significant.
An analysis of concordance lines for let us shows that this sequence is
repeatedly used by French speaking students to serve a number of rhetori-
cal and organisational functions. For example, it is used as a code gloss to
Table 5.28 ‘let us’ in learner texts
Rel. freq. of Number of Number %

‘let us’ and texts including of texts
‘let’s’ per ‘let us’ or ‘let’s’
100,000 words
French 71.88 59 228 25.9%

Czech 25.24 19 147 12.9
Dutch 12.33 19 196 9.7
Finnish 8.78 10 167 6
German 13.69 14 179 7.8
Italian 20.95 10 79 12.7
Polish 19.21 23 221 10.4
Russian 38.57 47 194 24.2
Spanish 26.23 14 149 9.4
Swedish 18.73 9 81 11.11
TOTAL 26.85 224 1641 13.65
introduce an example (Example 5.120), a transition marker to change topic

(Examples 5.121 and 5.122), and an attitude marker (Example 5.123).
5.120. To illustrate the truth of this, let us take the example of Britain which was
already fighting its corner alone after Mrs Thatcher found herself totally
isolated over the decision that Europe would have a single currency.
(ICLE-FR)
5.121. Let us then focus on the new Europe as a giant whose parts are striving for
unity. (ICLE-FR)
5.122. Let us now turn our attention to the students who want to apply for a job in
the private sector. (ICLE-FR)
5.123. Let us be clear that we cannot let countries tear one another to pieces and if
we closed our eyes to such an atrocity, our behaviour would be cowardly.
(ICLE-FR)
As explained in Section 4.2.3, the first person plural imperative form let us
is found in professional academic writing, but it is not frequent (relative
frequency of 5.46 occurrences per 100,000 words). It is also restricted to a
limited set of verbs (see Swales et al., 1998; Hyland, 2002). In the BNC-AC-
HUM, there are only eight significant verb co-occurrents of let us: consider,
say, suppose, return, begin, look, take and have.
There is no lexically equivalent form to En. let us in French. Equivalence
is however found at the morphological level as French makes use of an inflec-
tional suffix to mark the first imperative plural form. Thus, to investigate the
third L1 effect, i.e. intra-L1 group congruity between learners’ L1 and IL
performance, I compared the use of let us in ICLE-FR with that of first person
plural imperative verbs in CODIF. The rhetorical and organisational func-
tions fulfilled by let us in French EFL learner writing can be paralleled with
the very frequent use of first person plural imperative verbs in French student
writing to organize discourse and interact with the reader (Paquot, 2008a):
5.124. Prenons l’exemple des sorciers ou des magiciens au Moyen Age. (CODIF)
5.125. Ajoutons qu’une partie plus spécifique de la population est touchée.
(CODIF)
5.126. Comparons cela à la visite de la cathédrale d’Amiens. (CODIF)
5.127. Envisageons tout d’abord la question économique. (CODIF)
5.128. Examinons successivement le problème de l’abolition des frontières d’un point
de vue économique, juridique et enfin culturel. (CODIF)
5.129. Imaginons un monde ou règne une pensée unique. (CODIF)
5.130. Considérons un instant le cinéma actuel. (CODIF)
First person plural imperative verbs serve specific discourse strategies in

French formal types of writing, and more specifically in academic writing.
French EFL learners seem to transfer their knowledge of French academic
writing conventions (Connor, 1996) and make use of imperatives in English
academic writing in the same way as in French academic writing. Impera-
tive forms that are repeated in the ICLE-FR often have formal equivalents
that are found in CODIF (e.g. let us take the example of ‘prenons l’exemple
de’; let us consider ‘considérons’; let us hope ‘espérons’; let us examine ‘exami-
nons’; let us take ‘prenons’; let us (not/never) forget ‘oublions/n’oublions pas
que’; let us think ‘pensons’). This generalized overuse of the first person
plural imperative in EFL French learner writing as a rhetorical strategy
does not conform to English academic writing conventions but rather to
French academic style. In English, let us (and more precisely its contracted
form let’s) is much more typical of speech (relative frequency of 42.5 occur-
rences per 100,000 words in the BNC-SP but only 5.3 per 100,000 in the
BNC-AC). As a result, the speech-like nature of let us in French EFL learner
writing leads to an overall impression of stylistic inappropriateness.
This example points to yet another type of transfer effect, namely transfer
of L1 frequency. As shown in Table 5.29, the frequency of let us in the
ICLE-FR is much closer to the frequency of first person plural imperative
verbs in student writing in French, than in English expert or novice writing.
Other examples of sequences that have French-like frequencies in the
ICLE-FR include on the contrary, on the other hand, let us take the example, to
illustrate this, to conclude and *according to me.
Table 5.29 The transfer of frequency of the first person plural

imperative between French and English writing
Corpus Relative frequency per

100,000 words of first person
plural imperative verbs
French L1 students (CODIF) 95.5

French EFL learners (ICLE-FR) 71.9
English expert writers (BNC-AC-HUM) 5.7
English novice writers (LOCNESS) 3
FRENCH ENGLISH
Fr. 1st plural imperative En. 1st plural imperative
Fr. ‘prenons’ example de’ En. ‘let us take the example of’
Fr. ‘n’ oublions pas’ En ‘let us not forget’
Fr. ‘examinons’ En. ‘let us examine’
FREQUENCYFR FREQUENCYEN
REGISTERFR REGISTEREN
FUNCTIONFR FUNCTIONEN
PHRASEOLOGYFR PHRASEOLOGYEN
FRENCH EFL LEARNERS'

INTERLANGUAGE
En. 1st plural imperative
En. ‘let us take the example of’

En ‘let us not forget’
En. ‘let us examine’
...
FREQUENCYFR
REGISTERFR
FUNCTIONFR
PHRASEOLOGYFR
Figure 5.11 A possible rationale for the use of ‘let us’ in French learners’
interlanguage
Transfer effects often interact in learners’ use of English lexical devices.

Thus, French EFL learners use English first person plural imperatives in
academic writing with the frequency of French imperative verbs in the corre-
sponding register, in French-like phraseological patterns and to serve the same
organizational and interactional functions. As illustrated in Figure 5.11, French
EFL learners’ use of textual phrasemes such as let’s take the example of, let’s
examine or let us not forget mirror the stylistic profile of the French sequences
prenons l’exemple de .., examinons et n’oublions pas in French academic writing.
The transfer effects identified in this section – transfer of function, trans-
fer of lexico-grammatical and phraseological patterns, transfer of style and
register, and transfer of frequency – make up what, following Hoey (2005:
183), I refer to as ‘transfer of primings’. EFL learners’ knowledge of words
and word combinations in their mother tongue includes a whole range of
information about their preferred co-occurrences and sentence position,
stylistic or register features, discourse functions and frequency. Primings
for collocational and contextual use of (at least a restricted set of frequent
or core) L1 lexical devices are particularly strong in the mental lexicon of
adult EFL learners. They are the result of many encounters with these lexi-
cal items in L1 speech and writing. Mental primings in the L1 lexicon prob-
ably influence EFL learners’ knowledge of English words and word
sequences by priming the lexico-grammatical preferences of an L1 lexical
item to its English counterpart.
The data presented in this chapter support the idea that the ‘English of
advanced learners from different countries with a relatively limited varia-
tion of cultural and educational background factors share a number of fea-
tures which make it differ from NS language’ (Ringbom, 1998: 49). The
focus of the analysis has been on the lexical means available to learners to
perform specific rhetorical and organizational functions in academic writ-
ing, and more precisely in argumentative essays. This textual dimension is
particularly difficult to master and has been described by Perdue (1993) as
the last developmental stage before bilingualism in second language acqui-
sition. My results show that the expression of rhetorical and organizational
functions in EFL writing is characterized by:
A limited lexical repertoire: EFL learners tend to massively overuse a

restricted set of words and phrasemes to serve a particular rhetorical
function and to underuse a large proportion of the lexical means avail-

able to expert writers. They also seem to prefer to use conjunctions,
adverbs and prepositions rather than phraseological patterns with nouns,
verbs and adjectives.
A lack of register awareness: texts produced by EFL learners often ‘give
confusing signals of register’ (Field and Yip, 1992: 26) as they display
mixed patterns of formality and informality. The frequency of informal
words and phrases in learner writing is often closer to their frequency in
native-speakers’ speech than in their academic prose.
Lexico-grammatical and phraseological specificities: EFL learners’
writing is distinguishable by a whole range of lexico-grammatical
patterns and co-occurrences that differ from academic prose in both
quantitative and qualitative terms. Preferred co-occurrences in the ICLE
are often not the same as in academic prose, which reveals learners’ weak
sense of native speakers’ ‘preferred ways of saying things’. Learners’
attempts at using collocations are not always successful and sometimes
result in crude approximations and lexico-grammatical infelicities.
My results also support Lorenz’s (1999b) remark that ‘advanced learn-
ers’ deficits are most resilient in the area of lexico-grammar, where lexi-
cal items are employed to signal grammatical and textual relations’ and
that ‘a lack of coherence in advanced learners’ writing must at least partly
be attributable to lexico-grammatical deficits’ (Lorenz, 1999b: 56).
Semantic misuse: As Crewe (1990: 317) commented, ‘the misuse of
logical connectives is an almost universal feature of ESL students’
writing’. What is less well-documented in the literature, however, is that
EFL learners also experience difficulty with the semantics of other types
of cohesive devices, and specifically, with labels, i.e. abstract nouns that
are inherently unspecific and require lexical realization in their co-text,
either beforehand or afterwards.
Chains of connective devices: EFL learners’ texts are sometimes charac-
terized by the use of superfluous (and sometimes semantically inconsis-
tent) connective devices.
A marked preference for sentence-initial position of connectors: con-
nectors are often used in the unmarked sentence-initial position in
learner writing. A medial position is not favoured by EFL learners,
although it is typical of academic prose.
The methodology used in the first part of this chapter has made it possi-
ble to draw a general picture of the writing of upper-intermediate to
advanced EFL learners from different mother tongue backgrounds. Most
of these features have already been mentioned in the literature, but they
have always been reported on the basis of only one or two L1 learner popu-
lations. My methodology makes it possible to avoid hasty interpretations in
terms of L1 influence. Consider the following quotations by Zhang (2000),
who attributes a number of features to the influence of the learners’ mother
tongue, in this case Chinese:
The overuse of this expression [more and more] was most probably due to
language transfer since a familiar expression in the Chinese language ye
lai yue was popularly used. (Zhang, 2000: 77);
The reason for the initial positioning of conjunctions was again due to
the transfer of the Chinese language where conjunction devices with sim-
ilar meaning are mostly used at the beginning of a sentence. (Zhang,
2000: 83).
As explained above, sentence-initial positioning of conjunctions is

common to most learner populations. The mother tongue may reinforce
learners’ preference for sentence-initial position but cannot be regarded as
a complete explanation for this learner-specific feature. In Section 5.2.6,
teaching-induced factors have been identified as a possible cause for learn-
ers’ preference for sentence-initial position. Syntactic positioning of
connectors is rarely taught and EFL learners often consider the sentence-
initial position to be a safe strategy. As for the overuse of the expression more
and more, although it is indeed very significant in the Chinese component of
the second edition of the ICLE (Granger et al., 2009), this feature is actu-
ally common to all learner populations represented in the corpus. This
suggests that, while transfer may be at work in Chinese learners’ use of more
and more, it is probably not the only explanation. It is not always possible to
attribute learner-specific features to a single factor, as developmental,
teaching-induced and transfer-related effects can reinforce each other
(Granger, 2004: 135–6).
Another advantage of the method I used is that, once linguistic features
of upper-intermediate to advanced EFL learner writing have been identi-
fied, we can check to what extent they are specific to EFL learners or just
typical of novice writing. This is precisely where a corpus of essays written by
English native university students such as LOCNESS (see Section 2.1) has a
role to play. Tripartite comparisons between professional writing, foreign
learner writing and native student writing make it possible to distinguish
between learner-specific and developmental features (e.g. Neff et al., 2008).
Whether a feature is learner-specific or developmental varies from lexical

item to lexical item, but as a general rule, the findings suggest that the main
feature shared by native and non-native novice writers is a lack of register-
awareness. Figure 5.12 shows that a whole range of lexical items that Gilquin
and Paquot (2008) found to be overused in EFL learners’ writing – maybe,
so expressing effect, it seems to me, really, sentence-final though, this/that is why,
I think and first of all –are also more frequently used by native-speaker
student writers than in expert academic prose.14 The overuse of I think in
both EFL learner and native-speaker student writing has already been
reported by Neff et al. (2004a) who described it as a general ‘novice-writer
characteristic of excessive visibility’ (Neff et al, 2004a: 152).
Figure 5.12 also shows that not all learner-specific speech-like lexical
items are overused in the writing of native-speaking students. Thus, the
lexical items of course, certainly, absolutely, by the way and I would like/want/am
going to talk about are quite rare in LOCNESS. They are even less frequent in
native-speaker students’ writing than in academic prose, which suggests
that native novice writers do not transfer all spoken features to their
3000
200
2500
150 2000
100 1500
1000
50
500
0 0
Freq. of PRO (this, that, which) is why Freq. of I think (pmw)
(pmw)
200 Expert academic writing: British

National Corpus, academic
150 component (15m words)
100 Native-speaker student writing:
Sub-corpus of LOCNESS (100,702
50 words)
EFL learners' writing: ICLEv2
0 (14L1s; around 1.5m. words)
Freq. of first of all (pmw) Native speech: British National
corpus, spoken component
(10m words)
Figure 5.12 Features of novice writing – Frequency in expert academic writing,

native-speaker and EFL novices’ writing and native speech (per million words of
running text)
350 1200
300 1000
250 800
200
600
150
400
100
50 200
0 0
Freq. of maybe (pmw) Freq. of so expressing effect (pmw)
40 18
35 16
30 14
25 12
10
20
8
15 6
10 4
5 2
0 0
Freq. of it seems to me (pmw) Freq. of I would like / want / am going to talk
about (pmw)
2000
1500
1000
500
0
really of course certainly absolutely definitely
Freq. of amplifying adverbs (pmw)
45 120
40 100
35
30 80
25 60
20
15 40
10 20
5
0 0
Freq. of by the way (pmw) Freq. of sentence-final though (pmw)
Figure 5.12 Continued

academic writing. It seems that lexical items which are not particularly
frequent in speech and are rare in academic prose (e.g. I want/would like/
am going to talk about) are less likely to be overused by native novice writers.
By contrast, lexical items that are very frequent in speech, and acceptable
in academic prose, are very likely to be overused (e.g. maybe, so expressing
effect).
Other linguistic features are limited to EFL learners. These include
lexico-grammatical errors (*a same, possibility *to, despite *of, discuss *about),
the use of non-native-like sequences (e.g. according to me and as a conclusion),
and the overuse of relatively rare expressions such as in a nutshell. As
Gilquin, Granger and Paquot (2007a: 323) have argued, the issue of the
degree of overlap between novice native writers and non-native writers has
far-reaching methodological and pedagogical implications and is clearly in
need of further empirical study.
Developmental factors in L1 and L2 acquisition cannot, however, be
held responsible for all learner specific-features. In addition to teaching-
induced factors and proficiency, the first language also plays a part in EFL
learners’ use of academic vocabulary. In the last part of this chapter,
I focused on the potential influence of the first language on multiword
sequences that serve rhetorical functions in French learner writing. I made
use of Jarvis’s (2000) framework for assessing transfer and identified
a number of transfer effects – transfer of function, of the phraseological
environment, of style and register, and of L1 frequency – that I referred to
as ‘transfer of primings’. These results support Kellerman’s claim that the
‘hoary old chestnut’ according to which transfer does not afflict the more
advanced learner ‘should finally be squashed underfoot as an unwarranted
overgeneralization based on very limited evidence’ (Kellerman, 1984: 121).
However, they also suggest that the main effect of the students’ mother
tongue on higher-intermediate to advanced learner writing is not errors,
but more subtle transfer effects, especially at higher levels of proficiency.
Part III
Pedagogical implications
and conclusions
In the first two sections I defined the concept of ‘academic vocabulary’,

built a list of academic keywords from corpora of expert writing, and
analysed their use in ten sub-corpora of the International Corpus of
Learner English. In Chapter 6, I discuss some of the important pedagogical
implications of this research. There are three key aspects: the influence of
teaching on learners’ writing; the role of the first language in EFL learning
and teaching; and the use of corpora, and more specifically, learner cor-
pora, in the development of EAP teaching materials. Chapter 7 then briefly
summarizes the major results, discusses some of their implications, and sug-
gests several remaining issues and avenues for future research.
Chapter 6
Pedagogical implications
This chapter considers three areas where my findings have major pedagogi-
cal implications: teaching-induced factors; the role of the first language in
EFL learning and teaching; and the role of corpora in EAP material design.
The ways in which corpus data, in particular data from learner corpora,
have been used to inform the academic-writing sections of the second
edition of the Macmillan English Dictionary for Advanced Learners (MED2)
(Rundell, 2007) are also discussed.
6.1. Teaching-induced factors
Factors linked to teaching have repeatedly been denounced in the litera-

ture as being responsible for a number of learners’ inappropriate uses of
connectors (see Zamel, 1983; Hyland and Milton, 1997; Flowerdew, 1998;
Milton, 1999). Connectors are often presented in long lists of undifferenti-
ated and supposedly equivalent items, classified in broad functional catego-
ries. This can cause semantic misuse (Crewe, 1990; Lake, 2004). For
example, Jordan (1999) describes the adverbial on the contrary as a phrase of
contrast equivalent to on the other hand and by contrast (see Figure 6.1). The
same is said about conversely. However this adverb should only be used to
indicate that one situation is the exact opposite of another:
6.1. American consumers prefer white eggs; conversely, British buyers like brown
eggs. (LDOCE4)
Also problematic are the categorization of besides as a marker of concession,

and the misleading presentation of the conjunctions even if and even though
as synonyms.
Overuse of connectors such as nevertheless, in a nutshell, as far as I am con-
cerned, on the one hand, and on the other hand can also be attributed to the long
A. Contrast, with what has preceded:
instead
conversely
then
on the contrary
by (way of ) contrast
in comparison
(on the one hand) . . . on the other hand . . .
B. Concession indicates the unexpected, surprising nature of what is being said in view of
what was said before:
besides yet
(or) else in any case
however at any rate
nevertheless for all that
nonetheless in spite of/despite that
notwithstanding after all
only at the same time
still on the other hand
while all the same
(al)though even if/though
Figure 6.1 Connectives: contrast and concession (Jordan, 1999: 136)
lists of connectors found in most textbooks (Granger, 2004: 135) as no

information is given about their frequency or semantic properties. Milton
(1999) has shown that there is a strong correlation between the words and
phrases overused by Hong Kong students and the functional lists of expres-
sions distributed by tutorial schools (private institutions which prepare
most high school students in Hong Kong for English examinations). The
selection of connectors to be taught may also lend itself to criticism. It was
shown in Section 5.2.3 that sequences that are rarely used by native speak-
ers (e.g. as far as I am concerned or last but not least) or ‘unidiomatic’ sequences
(e.g. as a conclusion) are sometimes found in teaching materials, especially
in the lists of connectors freely available on the Internet.1 By contrast,
the connectors most frequently used to serve rhetorical functions are
sometimes missing from these lists.
Another direct consequence of these lists is EFL learners’ stylistic
inappropriateness, as Milton explains:
Students are drilled in the categorical use of a short list of expressions –

often those functioning as connectives or alternatively those which are
Pedagogical implications 203
colourful and complicated (and therefore impressive) – regardless of

whether they are used primarily in spoken or written language (if indeed
at all), or to which text types they are appropriate (1998: 190).
Thus, the spoken-like expression all the same is given as an equivalent alter-
native to more formal connectors such as on the other hand or notwithstanding
in Jordan (1999) (see Figure 6.1). This example also illustrates the fact that
no information about the connectors’ grammatical category or syntactic
properties is made available to the learners. The preposition notwithstand-
ing is listed together with adverbs and adverbial phrases (e.g. however, yet) as
well as conjunctions (e.g. although, while). Learners’ marked preference for
the sentence-initial positioning of connectors has also been related to
L2 instruction (see Flowerdew, 1998; Milton, 1999; Narita and Sugiura,
2006). Positional variation of connectors is usually not taught, and learners
use the sentence-initial position as a safe bet.2
Another problem of teaching practices (which has not often been
documented) is that too much emphasis tends to be placed on connectors,
that is, on grammatical cohesion (see Halliday and Hasan, 1976), to the
detriment of lexical cohesion.3 However I have shown in this book that
nouns, verbs and adjectives all have prominent rhetorical functions in
academic prose. Labels, have also been found to fulfil a prominent cohesive
role in this particular genre. It is most probable that lexical cohesion
has been neglected in EFL teaching because ‘there have been no good
descriptions of the forms and functions of this phenomenon’ (Flowerdew,
2006: 345).
6.2. The role of the first language in EFL learning

and teaching
My findings have at least two important pedagogical implications

relating to the role of the first language in EFL learning and teaching.
Transfer of primings means that words or word sequences in the foreign
language may be primed for L1 use in terms of discourse function, colloca-
tional and lexico-grammatical preferences, register and frequency. One of
the many roles of teaching should thus be to counter these ‘default’ and
sometimes misleading primings in EFL learners’ mental lexicons. Aware-
ness-raising activities focusing on similarities and differences between the
mother tongue and the foreign language are clearly needed to achieve this.
These activities should not be restricted to ‘helping learners focus on errors
typically committed by learners from a particular L1’ (Hegelheimer and
Fisher, 2006: 259). They should also raise learners’ awareness of more
subtle differences such as the register differences and collocational prefer-
ences of similar words in the two languages. This recommendation stands
in sharp contrast to Bahns’s (1993: 56) claim that collocations that are
direct translation equivalents do not need to be taught. Learners have no
way of knowing which collocations are congruent in the mother tongue
and the foreign language; moreover, the differences between the colloca-
tions in L1 and L2 may lie in aspects of use rather than form or meaning.
However, as Odlin commented, it is not always possible to make use of the
first language in the classroom and to rely on contrastive data:
Whatever the merits of contrastive materials in some contexts, it is

clear that such materials are not always feasible. For example, when an
ESL class consists of speakers of Chinese, Persian, Spanish, Tamil, and
Yoruba, there is not likely to be any textbook that contrasts English verb
phrases with verb phrases in all of those languages – and even if there
were, teachers could not profitably spend the class time necessary to illu-
minate so many contrasts. Yet even in such classes, one type of contrastive
information is frequently available: bilingual dictionaries. Although the
comparisons are sometimes restricted to words in the native and target
languages, the most carefully prepared dictionaries often provide some
comparisons of pronunciation and grammar as well. If the class size allows
it, teachers can help individual students in using any contrastive informa-
tion that their dictionaries provide. (1989: 162)
Bilingual dictionaries should ideally facilitate the teacher’s task in multilin-

gual as well as monolingual classrooms. However, it is questionable whether
the type of contrastive information they provide is fully adequate. For exam-
ple, the Robert & Collins CD-Rom (Version 1.1) includes an essay-writing sec-
tion in which first person plural imperatives in French are systematically
translated by structures employing let us in English (Granger and Paquot,
2008b). In Section 5.3, however, I showed that first person plural impera-
tives are not the best way of organizing discourse and interacting with the
reader in English academic writing. Table 6.1 lists examples of infelicitous
translation equivalents. Similarly, a web-page devoted to linking words and
hosted by the ‘Académie de Lille (Anglais BTS Informatique)’ lists accord-
ing to me as a direct translation equivalent of the French ‘à mon avis’, and as
a conclusion as a possible equivalent of the French ‘pour conclure / pour
résumer ’.4
Table 6.1 Le Robert & Collins CD-Rom (2003–2004): Essay writing
Essay writing: function French sentence Proposed English equivalence
Developing the Prenons comme point de = ‘let us take ... as a starting

argument départ le rôle que le point’
gouvernement a joué dans
l’élaboration de ces
programmes
En premier lieu, examinons = ‘firstly, let us examine’
ce qui fait obstacle à la paix
The other side of the Après avoir étudié la progres- = ‘after studying ... let us now
argument sion de l’action, considérons consider’
maintenant le style
Venons-en maintenant à = ‘now let us come to’
l’analyse des retombées
politiques
Assessing an idea Examinons les origines du = ‘let us examine ... as well as’
problème ainsi que certaines
des solutions suggérées
Sans nous appesantir or nous = ‘without dwelling on the
attarder sur les détails, details, let us note, however,
notons toutefois que le rôle that’
du Conseil de l’ordre a été
déterminant
Nous reviendrons plus loin = ‘we shall come back to this
sur cette question, mais question later, but let us
signalons déjà l’absence point out at this stage’
totale d’émotion dans ce
passage
Avant d’aborder la question = ‘before tackling ... let us
du style, mentionnons mention briefly’
brièvement le choix des
métaphores
Adding or detailing Ajoutons à cela or Il faut = ‘let us add to this or added
ajouter à cela or À cela to this’
s’ajoute un sens remarqua-
ble du détail
Introducing an example Prenons le cas de Louis dans = ‘(let us) take the case of’
«le Nœud de vipères»
Stating facts Rappelons les faits. Victoria = ‘let’s recall the facts’
l’Américaine débarque à
Londres en 1970 et réussit
rapidement à s’imposer sur
la scène musicale
Emphasizing particular N’oublions pas que, sur Terre, = ‘let us not forget that’
points la gravité pilote absolument
tous les phénomènes
These findings are quite representative of a general lack of good contras-

tive studies on which pedagogical materials can be based. Multilingual
corpora clearly have an important role to play here by providing an empir-
ically-based source of translation equivalents (Bowker, 2003; King, 2003).
6.3. The role of learner corpora in EAP materials design
While teaching materials designed to help undergraduate students

improve their academic writing skills are legion (e.g. Bailey 2006;
Hamp-Lyons and Heasley 2006), few make use of authentic texts and very
few are informed by the use of corpora. Even when they are corpus-
informed, EAP resources tend to be based on data from native-speakers
only. Thus, Thurstun and Candlin’s (1997) Exploring Academic English, which
uses concordance lines to introduce new words in context and familiarize
learners with phraseological patterns, relies exclusively on data from a
native-speakers’ academic corpus. Although this is one of the most innova-
tive EAP textbooks to date, it is arguably less useful for non-native learners,
despite Thurstun and Candlin’s (1998) claim that it is equally appropriate
for native and non-native writers. As shown in Section 5.2, EFL writing is
characterized by a number of linguistic features that differ from novice
native-speakers’ writing.
The value of pedagogical tools for non-native speakers of English would
be greatly increased if findings from learner corpus data were also used to
select what to teach and how to teach it. As Flowerdew (1998) put it, ‘when
choosing which markers to teach, decisions made should also be based on
findings from a parallel student corpus to ascertain where students’ main
deficiencies lie. If not, there is a danger that the emphasis on teaching the
most frequent markers may focus on ones already familiar to and correctly
used by students, or in this case, exacerbate the problem with their overuse’
(Flowerdew, 1998: 338). By showing, in context, the types of infelicities
EFL learners produce and the types of errors they make, as well as the items
they tend to under- or overuse, learner corpora are the most valuable
resources for designing EAP materials which address the specific problems
that EFL learners encounter (see also Flowerdew, 2001; Granger, 2009).
Yet, such corpora have very rarely been used systematically to inform EAP
materials (see Milton, 1998 and Tseng and Liou, 2006 for two exceptions in
Computer-Assisted Language Learning).5
The only type of resource in which learner corpus data have been
relatively successfully implemented up to now is the monolingual learners’
dictionary (MLD). For example, the Longman Dictionary of Contemporary
English and the Cambridge Advanced Learner’s Dictionary include a number of

learner corpus-informed usage notes which warn against common learner
errors (e.g. the confusion between the adjectives actual and current,
the countable use of the noun information). Yet, if MLDs are to take
further ‘proactive steps to help learners negotiate known areas of difficulty’
(Rundell, 1999: 47), learner corpora should not only be exploited to
compile error notes but also to improve other aspects of the dictionary.
As put by Cook (1998: 57) referring to Carter’s (1998b) standpoint,
however, ‘materials should be influenced by, but not slaves to, corpus
findings’ (see also Swales, 2002; Widdowson, 2003). The method used in
Chapter 5 has made it possible to identify a number of common features
of EFL learners’ expression of rhetorical and organizational functions.
A selected list of features were used to inform a 30-page writing section
which I and two other members of the Centre for English Corpus Linguistics
(CECL), Gaëtanelle Gilquin and Sylviane Granger, designed for the second
edition of the Macmillan English Dictionary for Advanced Learners (Gilquin et
al. 2007b: IW1–IW29). The writing section includes 12 functions that EFL
learners need to master in order to write well-structured academic texts.
These were identified in Section 4.1 as typically appearing in EAP textbooks
which adopt a functional approach to academic writing: (1) adding infor-
mation; (2) comparing and contrasting: describing similarities and differ-
ences; (3) exemplification: introducing examples; (4) expressing cause and
effect; (5) expressing personal opinions; (6) expressing possibility and
certainty; (7) introducing a concession; (8) introducing topics and related
ideas; (9) listing items; (10) reformulation: paraphrasing or clarifying;
(11) reporting and quoting; (12) summarizing and drawing conclusions.
Each writing section includes a detailed ‘corpus-based rather than
corpus-bound’ description (Summers, 1996: 262) of the many lexical means
that are available to expert writers to perform a specific function. Special
emphasis is placed on AKL nouns, adjectives and verbs and their phraseo-
logical patterns. As shown in Figures 6.2 and 6.3, the sections provide infor-
mation about how to use these words appropriately by focusing on their:
– semantic properties,
– syntactic positioning,
– collocations,
– frequency,
– style and register differences.
All the examples come from the academic component of the British National
Corpus.
You can use the nouns resemblance, similarity, parallel, and analogy to show that two
points, ideas, or situations are similar in certain ways:
If there is a resemblance or similarity between two or more points, ideas, situations, or

people, they share some characteristics but are not exactly the same:
There is a striking resemblance between them.
He would have recognized her from her strong resemblance to her brother.
There is a remarkable similarity of techniques, of clothes and of weapons.
The noun similarity also refers to a particular characteristic or aspect that is shared by two or
more points, ideas, situations, or people:
These theories share certain similarities with biological explanations.
The orang-utan is the primate most closely related to man; its lively facial expressions show
striking similarities to those of humans.
Collocation
Adjectives frequently used with resemblance and

similarity.
Certain, close, remarkable, striking, strong, superficial
The distribution of votes across the three parties in
1983 bears a close resemblance to the elections of
1923 and of 1929.
You can also use the noun parallel to refer to the way in which points, ideas, situations, or
people, are similar to each other:
Scientists themselves have often drawn parallels between the experience of a scientific
vocation and certain forms of religious experience.
There are close parallels here with anti-racist work in education.
An analogy is a comparison between two situations, processes, etc which are similar in some
ways, usually made in order to explain something or make it easier to understand:
A usefull analogy for understanding Piaget's theory is to view the child as a scientists who is
seeking a 'theory' to explain complex phenomena.
Collocation
Adjectives frequently used with analogy and parallel

close, interesting, obvious
A close analogy can be drawn between cancer of
the cell and a society hooked on drugs
Figure 6.2 Comparing and contrasting: using nouns such as ‘resemblance’ and
‘similarity’ (Gilquin et al., 2007b: IW5)
Evidence from learner corpora was used in several ways to inform the
writing sections. The sections specifically address the types of problems
discussed in Chapter 5 — limited lexical repertoire, lack of register aware-
ness, phraseological infelicities, semantic misuse, overuse of connective
When you want to explain or define exactly what you mean by something, you can use the
abbreviation i.e. (short for 'id est', the Latin equivalent of 'that is') or the expressions that is
and that is to say:
The police now have up to ninety-six hours, i.e. four days and nights, to detain people without
charge.
Descartes was obsessed by epistemological questions, that is, questions about what we can
know and how we can know it.
First, it excludes the public sector, that to say, the nationalized industries.
That is and that is to say are usually enclosed by commas. The abbreviation i.e. follows a
comma or is used between brackets:
Network emergencies (i.e. network failures) should be reported immediately.

Note that, in academic writing and professional reports, i.e. and that is are much more
frequent than that is to say.
Academic writing
Freq. per million words
160
140
120
100
80
60
40
20
0
i.e. that is that is to say
Figure 6.3 Reformulation: explaining and defining: using ‘i.e.’, ‘that is’ and ‘that
is to say’ (Gilquin et al., 2007b: IW9)
devices and syntactic positioning. Our treatment of these problems is mainly

explicit, in that we draw learners’ attention to error-prone items and we
provide negative feedback in the form of ‘Be careful!’ notes which focus
on problems of frequency (over- and underuse), register confusion and
atypical positioning. These notes are typically supported by frequency data,
in the form of graphs which help the reader visualize the differences
between learners’ language and that of native writers. Thus, in the section
on ‘Expressing cause and effect’, a graph is used to show that learners have
a strong tendency to use the adverb so, which is relatively rare in academic
prose and much more typical of speech (see Figure 6.4). There are also
‘Get it right’ boxes which are intended to give guidance on how to avoid
common errors. Numerous authentic examples are provided to illustrate
Be careful!
Learners often use so to express an effect. This use is correct, but it is more typical of speech
and should therefore not be used too often in academic writing and professional reports.
so expressing effect
Freq. per million words
1200
1000
800
600
400
200
0
Academic writing Learner writing Speech
Figure 6.4 Expressing cause and effect: ‘Be careful’ note on ‘so’ (Gilquin et al.,
2007b: IW13)
all the points we make. The reader is referred to Gilquin et al. (2007a) for
more detailed information on the principles that guided the design of these
writing sections.
My investigation of academic vocabulary has shown that the use of learner
corpus data, and their systematic comparison with native corpora, can bring
to light a wide range of learner-specific features, not limited to grammatical
or lexical errors, but also including over-reliance on a limited set of lexical
devices and under-representation of a wide range of typical academic words
and phraseological patterns. While Gilquin et al. (2007a; 2007b) have
shown how these findings can be integrated into a learner’s dictionary,
other writing resources, such as textbooks or electronic writing aids,6 could
equally benefit from the use of learner corpus data.
Chapter 7
General conclusion
This book lies at the intersection of three areas of research: English

for academic purposes, learner corpus research and second language
acquisition. In this final chapter, I take stock of the main findings of the
present study and bring out its major contributions to these three research
areas. The chapter concludes with some avenues for future research.
7.1. Academic vocabulary: a chimera?
The status and usefulness of EAP has been questioned by Hyland who
believes that ‘academic literacy is unlikely to be achieved through an
orientation to some general set of trans-disciplinary academic conven-
tions and practices’ (Hyland, 2000: 145). This book, however, supports
and substantiates the concept of ‘English for (General) Academic
Purposes’ both as a macro-genre which subsumes a wide range of text
types in academic settings (Biber et al., 1999), and as a teaching practice
that deals with ‘the teaching of the skills and language that are common
to all disciplines’ (Dudley-Evans and St Johns, 1998: 41) and focuses on ‘a
general academic English register, incorporating a formal, academic style, with
proficiency in the language use’ (Jordan, 1997: 5). My own contribution to
legitimizing EAP has been to demonstrate – on the basis of corpus data –
that ‘it is possible to delimit a procedural vocabulary of such words that
would be useful for readers/writers over a wide range of academic disci-
plines involving varied textual subject matters and genres’ (McCarthy,
1991: 78).
Academic texts are characterized by a wide range of words and phrase-
mes that refer to activities which are typical of academic discourse, and
more generally, of scientific knowledge. These lexical items also contribute
to discourse organization and cohesion, from topic introduction to
concluding statements. I have therefore argued in favour of a functional
definition of ‘academic vocabulary’ (Martínez et al., 2009) and proposed

the following definition:
academic vocabulary consists of a set of options to refer to those activities

that characterize academic work, organize scientific discourse and build
the rhetoric of academic texts.
Unlike Coxhead’s (2000) definition of the term, a large proportion of what

has been referred to as academic vocabulary in this book consists of core
words, a category which has so far largely been neglected in EAP courses.
Following researchers such as Hancioğ lu et al. (2008), I have therefore
questioned the fuzzy but well-established frequency-based distinction
between general service words and academic words.
Teachers should not assume that EAP students know the first 2,000 words
of English. Numerous so-called general service words are not mastered
productively by L2 learners, even at upper-intermediate to advanced levels
of proficiency. However, these words serve important discourse-organizing
functions in academic writing; this suggests that they should be the target
of teaching, particularly teaching aimed at productive activities. My find-
ings call into question the systematic use of Coxhead’s Academic Word List
as the exclusive vocabulary syllabus in a number of recent productivity-
oriented vocabulary textbooks. Another fact that stands out is that a clear
distinction should be made between vocabulary needs for academic
reading and writing.
As a result, I have derived a productive counterpart to the Academic Word
List, and have developed a rigorous and empirically-based procedure to
select potential academic words for this list. The methodology makes use of
the criteria of keyness, range and evenness of distribution, and provides a
good illustration of the usefulness of POS-tagged corpora for applied pur-
poses. One important feature of the methodology adopted here is that it
includes the 2,000 most frequent words in English, thus making it possible
to appreciate the paramount importance of core English words in academic
prose. The outcome of this procedure is the Academic Keyword List.
This list should not, however, be regarded as an end product. In its
current form (see Table 2.17), the list is the raw result of the application of
purely quantitative criteria to native-speaker corpus data. As such, it is not a
list of academic vocabulary in a functional sense. Each word still needs
‘pedagogic mediation’ (Widdowson, 2003): its different meanings, lexico-
grammatical patterning and phraseology in expert academic prose needs
to be carefully described and learner corpus data should be used to
General conclusion 213
complement these descriptions. This procedure has already been applied

to the study of words that serve discourse functions (such as exemplifying,
expressing cause and effect, comparing and contrasting) in academic prose.
I have shown that a phraseological approach to the description of academic
vocabulary provides a mine of valuable information for pedagogical tools.
The first result of this method has been to dethrone adverbs from their
dominant position as default cohesive markers. Adverbs do not have a
monopoly on lexical cohesion and discourse organization in academic
writing. My results have provided ample evidence for the prominent discur-
sive role of nouns, verbs and adjectives and their phraseological patterns, a
role which is hardly ever mentioned in EFL/EAP teaching. These part-
of-speech categories, however, serve organizational functions as diverse as
exemplification, comparing and contrasting, and expressing cause and
effect. Second, the method has helped to demonstrate that an essential set
of phrasemes in academic prose consists of ‘lexical extensions’ (Curado
Fuentes, 2001: 115) of academic words (e.g. conclusion, issue, claim, argue).
These words acquire their organizational or rhetorical function in specific
word combinations that are essentially semantically and syntactically
compositional (e.g. as discussed below, an example of . . . is . . ., the aim of this
study, the next section aims at . . ., it has been suggested) (Oakey, 2002; Biber
et al., 2004) and contribute to push ‘the boundary that roughly demarcates
the “phraseological” more and more into the zone previously thought of
as free’ (Cowie, 1998: 20).
The focus has been on words that are reasonably frequent in a wide range
of academic texts and their preferred lexicogrammatical and phraseologi-
cal patterns, irrespective of discipline. As well as their common core
features, these words may also have a discipline-specific phraseology
(Granger and Paquot, 2009a). Different disciplines may also have their pre-
ferred ways of performing rhetorical or organizational functions. A decade
ago, Milton (1999: 223) commented that ‘a great deal of research [was] still
necessary to describe with any empirical rigour the lexis that is characteris-
tic of particular purposes, genres, and registers’. Since then, there has been
a huge increase in the number of corpus-based studies highlighting the
specificity of vocabulary and phraseology in different academic disciplines
and genres.
The primary motivation of these studies, however, has not been
pedagogical. As a result, their findings do not easily lend themselves to
being used in general EAP courses and it is now essential to find ways of
reconciling research findings and the reality of EAP teaching practice. EAP
tutors are left wondering how they can possibly meet the needs of all their
students in classes which are ‘often composed of students from different

disciplines and/or language backgrounds with different purposes for
taking the class’ (Huckin, 2003: 6). They do not know either what should
be taught, for example, to law students who also have to take courses in
economics, history, sociology or psychology. With the emergence of a wide
range of interdisciplinary curricula, the problem is likely to become even
more acute in the future, not only for students but also for their teachers as
‘it seldom happens, especially in mixed classes, that the LSP [Language for
Specific Purposes] teacher has the disciplinary knowledge needed to pro-
vide reliably accurate instruction in technical varieties of language’ (Huckin,
2003: 8).
Faced with this difficulty, we have advocated elsewhere (Granger and
Paquot, 2009a) a balanced approach which concurs with Hyland’s (2002b)
plea for more specificity in EAP teaching while also subscribing to Eldridge’s
view that an essential function of research is to identify ‘similarities and
generalities that will facilitate instruction in an imperfect world’ (Eldridge,
2008: 111). We have shown that it is possible to identify both the common
core features of an academic word and its discipline-specific characteristics
in terms of meaning, lexico-grammar, phraseological patterns, etc. One way
of implementing this ‘happy medium’ approach in the classroom is to apply
a data-driven learning methodology, which consists of making use of corpus
data as a source of learning materials for language students (Johns, 1994).
The study of ‘individualized’ examples derived from specialized corpora
can be of considerable benefit in helping learners to appreciate the possi-
ble linguistic realisations of rhetorical and organizational functions in their
own disciplines. As Charles put it,
although it may not be possible in all teaching situations to provide mate-

rials that are specifically tailored to the disciplines of the students taught,
the process of investigation is itself of great value in raising students’
awareness of the patterned nature of academic discourse. With this
understanding, students are better equipped to examine the ways in
which grammatical patterns and lexical choices combine to perform rhe-
torical functions within their own disciplines and hence to apply this
knowledge to their own academic writing. (2007: 216)
In a heterogeneous EAP class, where disciplinary variability constitutes

a serious problem, this approach allows teachers to emphasize general
academic words and phrasemes which ‘are not likely to be glossed by
the content teacher’ (Flowerdew 1993: 236), while also empowering
learners by giving them the tools to investigate authentic texts and practices
in their own disciplines, ‘thereby allowing considerations of subject

specificity and disciplinary variation to inform classroom discussion’
(Groom, 2005: 273).
My journey into academic vocabulary – from the extraction of potential
academic words through their linguistic analysis in expert and learner
corpus data, to the pedagogical implications that can be drawn from the
results – has contributed to fleshing out this concept and has convincingly
demonstrated that academic vocabulary is anything but a chimera.
7.2. Learner corpora, interlanguage and second

language acquisition
Contrastive Interlanguage Analysis (CIA) (Granger, 1996) involves two types

of comparison. One compares native with non-native (or inter-) language,
for example native English and the English produced by French-speaking
learners. The other type of analysis compares two (or more) interlanguages,
for example the English produced by French-speaking learners and the
English produced by Italian-speaking learners. Although the CIA method
has become quite popular, most studies using the method have been of the
first type. Studies comparing more than one IL usually focus on learners
from one mother tongue background and use data from one or two other
learner populations only to check whether the features they have high-
lighted in one corpus are common to other learners, or are L1-specific
(and so possibly transfer-related). In this book, I have tried to make the
most of CIA by systematically exploiting the two types of comparison it
allows to examine EFL learners’ use of academic vocabulary.
The results show that academic, and more precisely argumentative, essays
written by upper-intermediate to advanced EFL learners share a number of
linguistic features irrespective of the learners’ mother tongue backgrounds
or language families. The common core of interlanguage features that
characterize the expression of rhetorical and organizational functions in
EFL writing includes a limited lexical repertoire and a lack of register aware-
ness as well as lexico-grammatical and phraseological specificities, the
semantic misuse of connectors and labels, the extensive use of chains of
connective devices and a marked preference for placing connectors in the
sentence-initial position. Several of these linguistic features, and more
specifically, the lack of register awareness, may also be found in novice
native-speaker writing. However, other features such as lexico-grammatical
errors, the use of non-native-like sequences and the overuse of relatively
rare expressions seem to be largely learner-specific.
A systematic analysis of several interlanguages is necessary to analyse the

potential influence of developmental, teaching-induced and transfer-
related factors on EFL learner writing. By focusing on shared features
across L1 learner populations, I have highlighted the important role
played by developmental and teaching-induced factors in learners’ written
production. I have also shown that it is not always possible to attribute
learner-specific features to a single factor, because developmental, teach-
ing-induced and transfer-related effects can reinforce each other (Granger,
2004:135–6). Applying Jarvis’s (2000) methodological framework to learner
corpus data has helped identify a number of transfer effects that until now
have been largely undocumented in the SLA literature. Lexical transfer has
too often been narrowed down to transfer of form/meaning mappings and
the third aspect of word knowledge, i.e. use, has rarely been investigated.
My study has helped to identify a number of transfer effects relating to word
use that make up what, following Hoey (2005), I refer to as ‘transfer of
primings’. Transfer of primings includes L1 influence on collocational use,
lexico-grammatical and phraseological patterns, discourse function, style
and register preferences, and frequency of use.
The valuable theoretical insights provided by a learner-corpus based
approach to the study of L1 influence bring to the fore the potential contri-
bution of learner corpora for SLA studies. Learner corpora are probably
the best – if not the sole – type of learner interlanguage samples which can
be used to investigate these transfer effects. In addition, they arguably pro-
vide a good account of the complexity and versatility of L1 influence. With
its focus on frequency, register differences and phraseology, corpus linguis-
tics clearly has numerous resources and specific tools to offer SLA research-
ers who wish to further investigate the manifestations of L1 influence on
learners’ interlanguage. There are many other variables that interact in
learners’ interlanguage which are also in need of careful operationaliza-
tion. Learner corpora can clearly act as a test bed for studies that aim to
provide empirical evidence for theories of second language acquisition.
They are not the exclusive preserve of learner corpus researchers, and
should feature prominently in the battery of data types used by all SLA
specialists.
7.3. Avenues for future research
A promising area of research which has only been touched upon in

this book lies in the investigation of patterns of difficulty shared by
mother-tongue English-speaking students and EFL learners. Such research

would enable linguistic features that are characteristic of novice writing to
be separated from those features that have commonly been attributed to
EFL writing. Novice native-speaker writers have been shown to have diffi-
culty with academic language, and more particularly with its highly conven-
tionalized phraseology. Howarth postulated the existence of a continuum
of phraseological competence that would ‘encompass mature NS writers at
one extreme and weak NNS writers at the other, with NS and NNS students
of varying levels of proficiency in between, and some overlap between native
and non-native writers’ (Howarth, 1999: 151). Hoey (2005) insisted that
primings are constrained by register and genre. He gave the example of the
word research which is primed in the mind of academic language users to
occur with recent in academic discourse and news reports on research. The
collocation is not primed to occur in other text types or other contexts.
A direct implication of Hoey’s theory of lexical priming is that academic
phraseology cannot be assumed to be primed in the mental lexicon of
novice native-speaker writers who have had little contact with academic
disciplines. Further research is clearly needed to shed more light on the
similarities and differences between EFL learners’ use of academic words
and phrasemes and that of novice native-speaker writers.
All in all, I have shown that the research paradigm of corpus linguistics is
ideally suited to studying the lexical specificities of academic discourse in
native-speaker and learner writing. The many corpora already available
make it possible to examine a wide range of genres and text types. However,
much more could be achieved in the field if other types of corpora were
collected. In particular, longitudinal corpora of learner language are sorely
lacking.1 L1 writing skills also need to figure more prominently in future
research. It does not make sense to expect learners to write properly in
English, and produce coherent and cohesive texts in a foreign language, if
they cannot already perform this task in their mother tongue. Learner
corpus research would greatly benefit from the design of comparable
corpora of L1 and L2 writing produced by the same learners. There is also
an urgent need for learner corpora which represent academic text
types other than argumentative essays. New corpora such as the British
Academic Written Corpus and the Michigan Corpus of Upper-level Student Papers
are thus particularly welcome, as they consist of ESP texts produced by
writers at different stages of undergraduate and graduate level study, both
native and non-native speakers, in a variety of disciplines. A new corpus
currently under development at Louvain, the Varieties of English for Specific
Purposes dAtabase (VESPA) learner corpus, has been designed as the ESP
counterpart of the International Corpus of Learner English. It includes English

for specific purposes texts written by L2 writers from various mother tongue
backgrounds.
New avenues of research can now be explored by SLA specialists, corpus
learner researchers and teachers alike. Not only have a number of largely
unrecognized transfer effects been brought to light, but the potential influ-
ence of L1 frequency on learner interlanguage has also been highlighted.
The role of frequency is a key issue in second language acquisition.
However, it has generally been conceived of in terms of L2 frequency.2 Not
a single article in the special issue of Studies in Second Language Acquisition
(2002, Volume 24/2) is devoted to L1 frequency effects and their implica-
tions for second language acquisition. The volume largely focuses on input
frequency, and its relation with language processing, intake3, and implicit
vs. explicit learning. Similarly, in a state-of-the-art article on SLA theory,
Gregg (2003) only addresses the issue of frequency in relation to the role of
input, thus restricting his discussion to the question of ‘how often does
input of X need to be provided in order for X to be acquired?’ (Gregg,
2003: 846). The role of L1 frequency is particularly interesting, and can be
expected to be the object of much attention in the next few years.
My journey into academic vocabulary has led me to explore a large
number of fascinating fields of research, and experiment with a wide range
of tools and methods. Navigating my way through the complexity of each
of these research areas, I have sought to unify several aspects of English
for academic purposes, learner corpus research, and second language
acquisition into a coherent whole. The challenges presented by such a
cross-disciplinary position have quickly been proved worthwhile by the fresh
light the approach has shed on key issues such as the nature of academic
vocabulary, the relative influence of developmental features and transfer
effects, and the methodological aspects of interlanguage studies. I hope
that this book will serve as a starting block for further research into the
many issues raised. There is still so much to explore.
Appendix 1: Expressing cause and effect
Comparisons based on total number of running words
ICLE BNC−AC−HUM LogL
Abs. Rel. Abs. Rel.

nouns
cause 314 26.9 755 22.7 6.3 (++)
cause 127 492
causes 186 263
*causae 1 −
factor 229 19.7 550 16.6 4.6
factor 100 244
factors 129 306
source 274 23.5 1,175 35.4 40.2 (− −)
source 194 577
sources 78 598
*sourse 2 −
origin 60 5.2 500 15 81.2 (− −)
origin 48 286
origins 11 214
*origine 1 −
root 173 14.8 183 5.5 83.3 (++)
root 112 72
roots 61 111
reason 939 80.6 1,802 54.3 92.2 (++)
reason 563 1105
reasons 374 697
*reaons 1 −
*reasongs 1 −
consequence 319 27.4 450 13.6 87.1 (++)
consequence 76 223
consequences 227 269
*consecvencies 1 −
*consecuence 2 −
*consecuences 3 −
*consecuenses 2 −
*consequencies 4 −
*consequense 1 −
*consequenses 3 −
(Continued)
220 Appendix 1
Abs. Rel. Abs. Rel.

effect 395 33.9 1,830 55 84.8 (− −)
effect 214 1249
effects 179 581
efect 2
result 381 32.7 813 24.5 20.9 (++)
result 167 502
results 213 311
*resut 1 −
outcome 28 2.4 143 4.3 9.03 (− −)
outcome 21 135
outcomes 7 8
implication 12 1 411 12.4 170.4 (− −)
implication 4 93
implications 8 318
TOTAL NOUNS 3,124 268 8,612 259.3 2.5
verbs
cause 499 42.8 570 17.2 211 (++)
cause 140 133
causes 106 66
caused 220 317
causing 33 54
bring about 51 4.4 125 3.8 0.8
brings 25 44
brings 10 6
brought 14 64
brining 2 11
contribute to 116 10 276 8.3 2.6
contribute 61 52
contributes 20 18
contributed 21 82
contributing 13 26
*contribuates 1 −
generate 14 1.2 227 6.8 67.4 (− −)
generate 3 63
generates 2 23
generated 9 119
generating 0 22
give rise to 20 1.7 101 3 6.2
give 8 23
gives 4 21
gave 3 32
given 5 18
giving 0 7
Appendix 1 221
Abs. Rel. Abs. Rel.

induce 15 1.3 67 2 2.7
induce 7 19
induces 8 5
induced 0 35
inducing 0 8
lead to 356 30.5 671 20.2 31.8 (++)
lead 184 161
leads 83 105
led 72 334
leading 17 71
prompt 12 1 115 3.5 22.1 (− −)
prompt 4 14
prompts 2 13
prompted 3 82
prompting 3 6
provoke 50 4.3 161 4.9 0.6
provoke 14 38
provokes 8 11
provoked 16 102
provoking 8 10
provocate 1 −
provocated 2 −
provoqued 1 −
result in/from 114 8.6 327 9.8 0
result 30 104
results 33 18
resulted 32 138
resulting 5 67
yield 2 0.2 88 2.7 39.1 (− −)
yield 2 31
yields 0 16
yielded 0 34
yielding 0 7
make sb/sth do sth# 489 42 171 5.2 666.1 (++)
arise from/out of 8 0.7 145 4.4 46 (− −)
arise 4 31
arises 2 28
arose 2 30
arisen 0 4
arising 0 52
derive 39 3.4 476 14.3 115.2 (− −)
derive 12 77
derives 8 68
derived 15 297
deriving 3 34
*derivated 1 −
(Continued)
222 Appendix 1
Abs. Rel. Abs. Rel.

emerge 33 2.8 466 14 126.2 (− −)
emerge 11 107
emerges 6 95
emerged 15 221
emerging 1 43
follow from 4 0.3 74 2.2 23.7 (− −)
follow 1 33
follows 0 35
followed 2 5
following 1 1
trigger 8 0.7 56 1.7 7 (− −)
trigger 5 14
triggers 0 3
triggered 3 27
triggering 0 12
stem from 7 0.6 68 2.9 13.3 (− −)
stem 1 14
stems 5 22
stemmed 0 23
stemming 1 9
TOTAL VERBS 1,847 158.5 4,174 125.7 66.8 (++)
adjectives
consequent 10 0.9 53 1.6 3.7
responsible (for) 171 14.7 344 10.4 13.3 (++)
TOTAL ADJ. 181 15.5 397 12 4.89
prepositions
because of 531 45.6 599 18 229.6 (++)
due to 246 21.1 195 5.9 175.1 (++)
as a result of 79 6.8 196 5.9 1.1
as a consequence of 7 0.6 22 0.7 0.1
in consequence of 5 0.4 1 0 8.7 (++)
in view of 8 0.7 66 2 10.6 (− −)
owing to 17 1.5 52 1.6 0.1
in (the) light of 7 0.6 109 3.3 31.6 (− −)
thanks to 199 17 35 1 360.1 (++)
on the grounds of 3 0.3 22 0.7 3
on account of 7 0.6 24 0.7 0.19
TOTAL PREP. 1,109 95 1,321 39.8 433.4 (++)
Appendix 1 223
Abs. Rel. Abs. Rel.

Adverbs
therefore 701 60.1 1,412 42.5 54.1 (++)
therefore 689
*therefor 12
accordingly 26 2.2 130 3.9 7.7 (−)
consequently 183 15.7 143 4.3 132.4 (++)
consequently 179
*consecuently 4
thus 446 38.3 1,767 53.2 41.2 (− −)
hence 42 3.6 283 8.5 33.3 (− −)
so 1,436 123.2 1,894 57 457.8 (++)
thereby 15 1.3 182 5.5 43.8 (− −)
as a result 103 8.8 101 3 55.7 (++)
as a consequence 35 3 20 0.6 34.3 (++)
in consequence 11 0.9 14 0.4 3.8
by implication 0 0 35 1.1 21.1 (− −)
TOTAL ADVERBS 2,998 257.2 5,981 180 243.3 (++)
conjunctions
because 2,495 214 2,207 66.4 1553.8
because 2,493 (++)
*becausae 1
*becaus 1
since## 428 36.7 955 28.74 17.1 (++)
##
as 331 28.4 883 26.6 1
for 58 5 1,036 31.2 325.9 (− −)
so that 273 23.4 696 21 2.4
PRO is why 220 18.9 52 1.56 359 (++)
that is why 189 16.2 22 0.7 381.7 (++)
this is why 18 1.5 18 0.5 24.7 (++)
which is why 12 1 12 0.4 0.3
on the grounds that 5 0.4 83 2.5 25 (− −)
TOTAL CONJ. 3,810 326.9 5,912 178 809.1 (++)
TOTAL 13,066 1121 26,407 794.9 989.9 (++)
224 Appendix 1
Comparisons based on total number of ‘cause and effect’ lexical items
Abs. % Abs. %
nouns
cause 314 2.4 755 2.9 6.9 (− −)
factor 229 1.8 550 2.1 4.9
source 274 2.1 1,175 4.5 145.3 (− −)
origin 60 0.55 500 1.9 153.3 (− −)
root 173 1.3 183 0.7 36.4 (++)
reason 939 7.2 1,802 6.8 1.7
consequence 319 2.4 450 1.7 23.5 (++)
effect 395 3 1,830 6.9 263.8 (− −)
result 381 2.9 813 3.1 0.8
outcome 28 0.2 143 0.5 24.4 (− −)
implication 12 0.1 446 1.7 274 (− −)
TOTAL NOUNS 3,124 23.9 8,612 32.6 231.2 (− −)
Verbs
cause 499 3.8 570 2.2 84.4 (++)
bring about 51 0.4 125 0.5 1.4
contribute to 116 0.9 276 1.1 2.2
generate 14 0.1 227 0.9 106.6 (− −)
give rise to 20 0.2 101 0.4 16.9 (− −)
induce 15 0.1 67 0.3 9 (− −)
lead to 356 2.7 671 2.5 1.1
prompt 12 0.1 115 0.4 39.5 (− −)
provoke 50 0.4 161 0.6 8.9 (− −)
result in 114 0.9 327 1.2 10.9 (− −)
yield 2 0.0 88 0.3 56 (− −)
make sb/sth do sth# 489 3.7 171 0.7 463.6 (++)
arise from/out of 8 0.1 145 0.6 71.5 (− −)
derive 39 0.3 476 1.8 192.7 (− −)
emerge 33 0.3 466 1.8 201 (− −)
follow from 4 0.0 74 0.3 36.8 (− −)
trigger 8 0.1 56 0.2 14.5 (− −)
stem 7 0.1 68 0.3 23.6 (− −)
TOTAL VERBS 1,847 14.1 4,174 15.8 16.2 (− −)
adjectives
consequent 10 0.1 53 0.2 9.6 (− −)
responsible (for) 171 1.3 344 1.3 0
TOTAL ADJ. 181 1.4 397 1.5 0.9
prepositions
because of 531 4.1 599 2.3 93.3 (++)
due to 246 1.9 195 0.7 95.3 (++)
as a result of 79 0.6 196 0.7 2.4
as a consequence of 7 0.1 22 0.1 1.1
Appendix 1 225
Abs. % Abs. %
in consequence of 5 0.0 1 0 6.45
in view of 8 0.1 66 0.3 20.1 (− −)
owing to 17 0.1 52 0.2 2.4
in (the) light of 7 0.1 109 0.4 50.2 (− −)
thanks to 199 1.5 35 0.1 270.7 (++)
on the grounds of 3 0.0 22 0.1 6
on account of 7 0.1 24 0.1 1.7
TOTAL PREP. 1109 8.5 1321 5 164.1 (++)
adverbs
therefore 701 5.4 1,412 5.4 0.0
accordingly 26 0.2 130 0.5 21.4 (− −)
consequently 183 1.4 143 0.5 72.6 (++)
thus 446 3.4 1,767 6.7 182.7 (− −)
hence 42 0.3 283 1.1 70.2 (− −)
so 1,436 11 1,894 7.2 144.9 (++)
thereby 15 0.1 182 0.7 73.4 (− −)
as a result 103 0.8 101 0.4 26.2 (++)
as a consequence 35 0.3 20 0.1 21.4 (++)
in consequence 11 0.1 14 0.1 1.3
by implication 0 0 35 0.1 28.1 (− −)
TOTAL ADVERBS 2,998 23 5,981 22.7 0.3
Conjunctions
because 2,495 19.1 2,207 8.4 790.6 (++)
since## 428 3.3 955 3.6 2.9
as## 331 2.6 883 3.3 19.3 (− −)
for 58 0.4 1,036 3.9 507.59 (− −)
so that 273 2.1 696 2.6 10.92 (−)
PRO is why 220 1.7 52 0.2 262.8 (++)
that is why 189 1.5 22 0.1 294.5 (++)
this is why 28 0.2 18 0.1 14.8 (++)
which is why 3 0.0 12 0.0 1.3
on the grounds that 5 0.0 83 0.3 39.4 (− −)
TOTAL CONJ. 3,810 29.2 5,912 22.4 158.3 (++)
TOTAL 13,066 100 26,407 100
#
Estimations based on Gilquin (2008).
##
Estimations based on an analysis of the first 200 occurrences of the word in each corpus.
Appendix 2: Comparing and contrasting
Comparisons based on total number of running words
Abs. Rel. Abs. Rel.

nouns
resemblance 2 0.2 116 3.49 54.9 (− −)
resemblance 2 0.2 100 3
resemblances 0 0 16 0.5
similarity 25 2.1 212 6.38 35.2 (− −)
similarity 7 0.6 106 3.19
similarities 16 1.4 106 3.19
*similarieties 1 0.1 −
*similiraty 1 0.1 −
parallel 6 0.5 147 4.4 54 (− −)
parallel 6 0.5 76 2.3
parallels 0 0 71 2.1
parallelism 3 0.3 19 0.6 2
parallelism 1 0.1 10 0.3
parallelisms 0 0 9 0.3
*paralelism 1 0.1 −
*parallelim 1 0.1 −
analogy 3 0.3 175 5.3 82.9 (− −)
analogy 3 0.3 133 4
analogies 0 0 42 1.3
contrast 25 2.1 522 15.7 178.3 (− −)
contrast 18 1.5 470 14.2
contrasts 7 0.6 52 1.6
comparison 38 3.3 311 9.4 49.3 (− −)
comparison 36 3.1 249 7.5
comparisons 0 0 62 1.9
*comparaison 1 0.1 −
*comparision 1 0.1 −
difference 394 33.8 1,318 39.7 8 (− −)
difference 187 16 802 24.1
differences 191 16.4 516 15.5
*differencies 6 0.5 −
*difference 3 0.3 −
Appendix 2 227
Abs. Rel. Abs. Rel.

*diference 1 0.1 −
*differency 1 0.1 −
*differene 1 0.1 −
*diffrences 1 0.1 −
differentiation 3 0.3 76 2.3 28.3 (− −)
differentiation 2 0.2 72 2.2
differentiations 0 0 4 0.1
*differenciation 1 0.1 −
distinction 47 4.1 595 17.9 148.4 (− −)
distinction 38 3.3 498 15
distinctions 9 0.8 97 2.9
distinctiveness 2 0.2 10 0.3 0.6
(the) same 246 21.1 559 16.8 8.5 (+)
*similars 1 0.1 −
(the) contrary 17 1.5 28 0.8 3
contrary 16 1.4 27
contraries 1 0.1 1
(the) opposite 44 3.78 85 2.6 4.2
opposite 40 3.4 58
opposites 4 0.3 27
(the) reverse 5 0.4 56 1.7 12.6 (− −)
TOTAL NOUNS 860 73.8 4,229 127.3 283.7 (− −)
Adjectives
same 1,058 90.8 2,580 77.7 18.8 (++)
similar 160 13.7 1,027 30.9 110.5 (− −)
similar 157 13.5 30.9
*similiar 2 0.2
*simmilar 1 0.1
analogous 1 0.1 55 1.7 25.8 (− −)
common 275 23.6 1055 31.8 20.4 (− −)
comparable 16 1.4 223 6.7 59.8 (− −)
identical 12 1 137 4.1 31.3 (− −)
parallel 5 0.4 52 1.6 10.9 (− −)
alike 23 2 98 3 3.3
contrasting 1 0.1 63 1.9 30.3 (− −)
different 1,515 130 2,496 75.1 268 (++)
different 1510 129.6 2496 75.1
*differents 2 0.2 −
*differrent 1 0.1 −
*diffrent 2 0.1 −
differing 4 0.3 72 2.17 22.75 (− −)
(Continued)
228 Appendix 2
Abs. Rel. Abs. Rel.

distinct 9 0.8 278 8.4 111.4 (− −)
distinct 7 0.6 278 8.4
*distinc 2 0.2 −
distinctive 13 1.1 163 4.9 40.3 (− −)
distinguishable 2 0.2 33 1 9.9 (− −)
unlike 2 0.2 43 1.3 14.9 (− −)
contrary 7 0.6 27 0.8 0.5
opposite 53 4.6 127 3.8 1.1
reverse 7 0.6 23 0.7 0.1
reverse 3 0.3
*reversed 4 0.3
TOTAL ADJECTIVES 3,163 271.4 8,552 257.4 6.4
verbs
resemble 31 2.7 138 4.2 5.5
resemble 16 1.4 51 1.5
resembled 3 0.3 18 0.5
resembles 11 0.9 46 1.4
resembling 1 0.1 23 0.7
correspond 41 3.52 137 4.1 0.8
correspond 27 2.3 73 2.2
corresponded 3 0.3 16 0.5
corresponds 4 0.3 48 1.4
corresponding 7 0.6 28 0.8
look like 106 9.1 102 3.1 58.9 (++)
look like 72 6.2 42 1.3
looks like 21 1.8 38 1.1
looked like 12 1.0 19 0.6
looking like 1 0.1 3 0.1
compare 129 11.1 278 8.4 6.6 (+)
compare 75 6.4 140 4.2
compared 36 3.1 71 2.1
compares 2 0.2 17 0.5
comparing 16 1.4 50 1.5
parallel 2 0.2 56 1.7 21.7 (− −)
parallel 1 0.1 9 0.3
parallels 0 0 4 0.1
paralleled 0 0 38 1.1
paralleling 1 0.1 5 0.2
contrast 7 0.6 137 4.1 45.3 (− −)
contrast 3 0.3 31 6.4
contrasted 4 0.3 47 11 (− −)
contrasts 0 0 42
contrasting 0 0 17
Appendix 2 229
Abs. Rel. Abs. Rel.

differ 86 7.4 242 7.29 0.01
differ 57 4.9 112 3.4 5
differs 29 2.5 73 2.2 0.3
differed 0 0 57 1.7 34.3 (− −)
distinguish 107 9.2 404 12.16 7.1 (− −)
distinguish 70 6 164 4.9 1.8
distinguished 16 1.4 116 3.4 15.4 (− −)
distinguishes 12 1.0 36 2.2 0.0
distinguishing 6 0.5 88 1.7 24.5 (− −)
*distinquish 2 0.2 −
*distingush 1 0.1 −
differentiate 18 1.5 74 2.2 2.09
differentiates 1 0.1 6 0.2 0.6
differentiated 2 0.2 31 0.9 9 (− −)
differentiating 1 0.1 15 0.5 4.2
*differenciate 2 0.2 −
TOTAL VERBS 527 45.2 1,568 47.2 0.7
adverbs
similarly 31 2.7 394 11.9 98.6 (− −)
similarly 26 2.2
*similarely 1 0.1
*similarily 1 0.1
*similary 3 0.3
analogously 1 0.1 2 0.1 0.1
identically 0 0 2 0.1 1.2
correspondingly 0 0 29 0.9 17.4 (− −)
parallely 1 0.1 − −
likewise 9 0.8 118 3.6 30.3 (− −)
in the same way 38 3.3 56 1.7 9.3 (+)
contrastingly 0 0 3 0.1 1.8
differently 42 3.6 97 2.9 1.3
by/in contrast 9 0.8 185 62.7 (− −)
by contrast 2 0.2 116 54.9 (− −)
in contrast 7 0.6 69 5.6 13.7 (− −)
by way of contrast 1 0.1 0 3.5 2.7
by/in comparison 0 0 23 0.7 13.8 (− −)
by comparison 0 0 14 0.4
in comparison 0 0 9 0.3
comparatively 14 1.2 69 2.1 3.9
comparatively 13 1.1
*comparitively 1 0.1 −
contrariwise 0 0 4 0.1 2.4
distinctively 1 0.1 25 0.8 9.3 (− −)
on the other hand 418 35.9 372 11.2 258.3 (++)
(Continued)
230 Appendix 2
Abs. Rel. Abs. Rel.

(on the one hand) 100 8.6 136 4.1 29.8 (++)
*on the other side 23 2 0 0 62 (++)
*on the opposite 3 0.3 0 0 8.1 (++)
on the contrary 164 14 95 2.9 158.9 (++)
on the contrary 160 95
*on the contray 3 −
*on the contrairy 1 −
Other expressions with 13 1.1 2 0.1
contrary
*in contrary 1 0 0
*by the contrary 1 0 0
*to the contrary 2 0 0
quite the contrary 4 2 0.1 4.4
*in the contrary 2 0 0
rather the contrary 1 0 0
*quite contrary 1 0 0
*contrary 1 0 0
reversely 1 0.1 0 0
conversely 6 0.5 62 1.9 12.9 (− −)
TOTAL ADVERBS 875 76.7 1,250 38.7 231.7 (++)
Prepositions
like# 1,435 123.1 2,812 84.7 127.5 (++)
unlike 26 2.2 244 7.3 45.8 (− −)
in parallel with 0 0 8 0.2 4.8
as opposed to 7 0.6 121 3.6 37.4 (− −)
as against 0 0 46 1.4 27.7 (− −)
in contrast to/with 23 2 82 2.5 0.9
in contrast to 15 1.3 73 2.2 4
in contrast with 8 0.7 9 0.3 3.5
versus 7 0.6 53 1.6 7.5 (− −)
contrary to 18 1.5 66 2 0.9
*in contrary to 2 0.2 0 0
*opposite to 3 0.3 0 0 8.1 (++)
by/in comparison with 39 3.4 52 1.6 12.1 (+)
in comparison with 28 2.4 14 0.4 30.5 (++)
in comparison to 11 0.9 4 0.1 14.7 (+)
by comparison with 0 0 21 0.6 12.6 (− −)
in comparison with 0 0 14 0.4 8.4 (− −)
TOTAL PREP. 1,560 133.8 3,484 104.9 62 (++)
Conjunctions
as # 1,157 99.3 5,045 151.9 185.4 (− −)
while # 206 17.7 1264 38 124.4 (− −)
Appendix 2 231
Abs. Rel. Abs. Rel.

whereas 137 11.8 442 13.3 1.6
whereas 135 11.6
wheras 2 0.2
TOTAL CONJ. 1,500 128.7 6,751 203.2 281.3 (− −)
Other expressions
as . . . as 1,287 110.4 2,766 83.26 67.5 (++)
in the same way as/that 19 1.6 38 1.14 1.5
compared with/to 49 4. 155 4.67 0.4
compared with 12 1.0 113 3.4 21.3 (− −)
compared to 37 3.2 42 1.26 15.8 (+)
CONJ compared to/with 14 1.2 32 1 0.5
as compared to/with 5 0.4 11 0.3 0.2
when compared to/with 3 0.3 20 0.6 2.3
if compared to/with 6 0.5 1 0.0 11 (++)
TOTAL 9,854 845.5 29,249 880.5 12.24 (− −)
232 Appendix 2
Comparisons based on total number of ‘comparison and contrast’ lexical items
Abs. % Abs. %
Nouns
resemblance 2 0.0 116 0.4 52.6 (− −)
similarity 25 0.3 212 0.7 32.3 (− −)
parallel 6 0.1 147 0.5 51.3 (− −)
parallelism 3 0.0 19 0.1 1.8
analogy 3 0.0 175 0.1 79.5 (− −)
contrast 25 0.3 522 1.8 168.9 (− −)
comparison 38 0.4 311 1.1 45.1 (− −)
difference 394 4 1,318 4.5 4.4
differentiation 3 0.0 76 0.3 26.9 (− −)
distinction 47 0.5 595 2.0 138.9 (− −)
distinctiveness 2 0.0 10 0.0 0.5
(the) same 246 2.5 559 1.9 11.8 (− −)
(the) contrary 17 0.2 28 0.1 3.5
(the) opposite 44 0.5 85 0.3 5.1
(the) reverse 5 0.1 56 0.2 11.7 (− −)
TOTAL NOUNS 860 8.7 4,229 14.5 202.8 (− −)
Adjectives
same 1,058 10.7 2,580 0.9 28.2(++)
similar 160 1.6 1,027 3.5 98.8(− −)
analogous 1 0.0 55 0.2 24.7(− −)
common 275 2.8 1055 3.6 15.1(− −)
comparable 16 0.2 223 0.8 56.2 (− −)
identical 12 0.1 137 0.5 29.2 (− −)
parallel 5 0.1 52 0.2 10.1 (−)
alike 23 0.2 98 0.3 2.6
contrasting 1 0.0 63 0.2 29 (− −)
different 1,515 15.4 2,496 8.5 307.7 (++)
differing 4 0.0 72 0.3 21.5 (− −)
distinct 9 0.1 278 1 106.2 (− −)
distinctive 13 0.1 163 0.6 37.7 (− −)
distinguishable 2 0.0 33 0.1 9.3 (− −)
unlike 2 0.0 43 0.2 14.1 (− −)
contrary 7 0.1 27 0.1 0.4
opposite 53 0.5 127 0.4 1.7
reverse 7 0.1 23 0.1 0.1
TOTAL ADJECTIVES 3,163 32.1 8,552 29.2 19.82 (++)
Verbs
resemble 31 0.3 138 0.5 4.5
correspond 41 0.4 137 0.5 0.5
look like 106 1.1 102 0.4 63.2 (++)
compare 129 1.3 278 1 8.7 (+)
parallel 2 0.0 56 0.2 20.6 (−)
Appendix 2 233
Abs. % Abs. %
contrast 7 0.1 137 0.5 42.9 (− −)

differ 86 0.9 242 0.8 0.2
distinguish 107 1.1 404 1.4 5.1
TOTAL VERBS 527 5.4 1,568 5.4 0
Adverbs
similarly 31 0.3 394 1.4 92.3 (− −)
analogously 1 0.0 2 0.0 0.1
identically 0 0 2 0.0 1.2
correspondingly 0 0 29 0.1 16.8 (− −)
parallely 1 0.0 0 0 2.8
likewise 9 0.1 118 0.4 28.3 (− −)
in the same way 38 0.4 56 0.2 10.4 (++)
contrastingly 0 0 3 0.0 1.7
differently 42 0.4 97 0.3 1.8
by/in contrast 9 0.1 185 0.6 59.4 (− −)
by contrast 2 116
in contrast 7 69
by way of contrast 1 0.0 0 0 2.8
by/in comparison 0 0 23 0.1 13.4 (− −)
by comparison 0 14
in comparison 0 9
comparatively 14 0.1 69 0.2 3.3
contrariwise 0 0 4 0.0 2.3
distinctively 1 0.0 25 0.1 8.8 (− −)
on the other hand 418 4.2 372 1.3 275.8 (++)
(on the one hand) 100 1.0 136 0.5 33 (++)
*on the other side 23 0.2 0 0 63.4 (++)
*on the opposite 3 0.0 0 0 8.3 (++)
on the contrary 164 1.7 95 0.3 166.8 (++)
Other expressions with 13 0.1 2 0.0 25.2 (++)
contrary
reversely 1 0.0 0 0 2.8
conversely 6 0.1 62 0.2 12 (−)
TOTAL ADVERBS 875 8.9 1,250 4.3 258.6 (++)
Prepositions
like# 1,435 14.6 2,812 9.6 155.8 (++)
unlike 26 0.3 244 0.8 42.3 (− −)
in parallel with 0 0 8 0.0 4.7
as opposed to 7 0.1 121 0.4 35.3 (− −)
as against 0 0 46 0.2 26.7 (− −)
in contrast to/with 23 0.2 82 0.3 0.6
versus 7 0.1 53 0.2 6.9 (− −)
contrary to 18 0.2 66 0.2 0.7
*in contrary to 2 0.0 0 0 5.5
(Continued)
234 Appendix 2
Abs. % Abs. %
*opposite to 3 0.0 0 0 8.3 (+)

by/in comparison with 39 0.4 52 0.2 13.4 (+)
TOTAL PREP. 1,560 15.8 3,484 11.9 83.9 (++)
Conjunctions
as# 1,157 11.7 5,045 17.3 150.5 (− −)
while# 206 2.1 1264 4.3 110.6 (− −)
whereas 137 1.4 442 1.5 0.7
TOTAL CONJ. 1,500 15.2 6,751 23.1 231.6 (− −)
Other expressions
as … as 1,287 13.1 2,766 9.5 87.8 (++)
in the same way as/that 19 0.2 38 0.1 2.7
compared with/to 49 0.5 155 0.5 0.2
CONJ compared to/ 14 0.1 32 0.1 0.6
with
TOTAL 9,854 100 29,249 100
# Estimations based on an analysis of the first 200 occurrences of the word in each corpus.
Notes
Chapter 1
1
See Stein (2008) for a review of major twentieth-century projects aimed at
developing a controlled vocabulary for foreign language learners.
2
This specific set of abstract nouns has variously been referred to as ‘signalling
words’ (Jordan, 1984), ‘anaphoric nouns’ (Francis, 1986), ‘carrier nouns’ (Ivanicˇ,
1991), ‘shell nouns’ (Schmid, 2000) and ‘discourse-organising words’ (McCarthy,
1991).
Chapter 2
1
The BAWE Pilot Corpus was a pilot for the ESRC funded project ‘An investigation
of genres of assessed writing in British higher education (RES-000-23-0800).
It was created in 2001 under the directorship of Hilary Nesi, with support from the
University of Warwick Teaching Development Fund.
2
The British Academic Written English (BAWE) corpus was developed at the Universi-
ties of Warwick, Reading and Oxford Brookes under the directorship of Hilary
Nesi and Sheena Gardner (formerly of the Centre for Applied Linguistics at
Warwick University), Paul Thompson (Department of Applied Linguistics,
Reading) and Paul Wickens (Westminster Institute of Education, Oxford Brookes),
with funding from the ESRC (RES-000-23-0800). The BAWE corpus contains 2761
pieces of proficient assessed student writing. Holdings are fairly evenly distributed
across four broad disciplinary areas (Arts and Humanities, Social Sciences, Life
Sciences and Physical Sciences). Thirty-five disciplines are represented.
3
See http://ucrel.lancs.ac.uk/claws7tags.html for a list of tags used in CLAWS C7
tagset (accessed 2 August 2009).
4
If a text is 75,000 words long, it has 75,000 tokens. But a lot of these words will be
repeated, and there may be only 2,000 different words (called types) in the text.
5
Sentence examples are taken from the Longman Dictionary of Contemporary English
(2005)
6
See the definition of a reference corpus proposed by the Expert Advisory Group
on Language Engineering Standards (EAGLES96) at http://www.ilc.cnr.it/
EAGLES96/corpustyp/node18.html (accessed 2 August 2009).
7
Each of these corpora consists of one million words of British or American written
English. The four corpora are equivalent in the sense that they were compiled using
the same corpus design and sampling methods. For more information about these
corpora, see http://khnt.hit.uib.no/icame/manuals (accessed 2 August 2009).
236 Notes
8
Katz (1996: 19) distinguishes between ‘document-level burstiness’, i.e. ‘multiple
occurrences of a content word or phrase in a single-text document, which is
contrasted with the fact that most other documents contain no instances of this
word or phrase at all’; and ‘within-document burstiness’ or ‘burstiness proper’,
i.e. the ‘close proximity of all or some individual instances of a content word or
phrase within a document exhibiting multiple occurrences’.
9
Scott’s (2004) WordSmith Tools 4 can compute Juilland’s D values, but only for
words in a single file, based on an arbitrary division of a text into 8 segments of
equal size.
10
Available at http://www.lextutor.ca/vp/eng/ (accessed 2 August 2009).
Chapter 3
1
A random sample of 20 essays from each of the 16 L1 sub-corpora available in the
second version of ICLE were submitted to a professional rater who was asked to
rate them on the basis of the Common European Framework of Reference for Languages
(CEF) descriptors for writing. While 60 per cent of the sample essays were rated
as advanced (C1 or C2), the proportion was much higher in some sub-corpora,
reaching 100 per cent for students with Swedish mother tongue, but falling as low
as 40 per cent for Spanish speakers (Granger et al., 2009: 11–12).
2
ICLEv2 now also includes texts written by students with Chinese, Japanese,
Norwegian, Turkish and Tswana mother tongue backgrounds (cf. Granger et al.,
2009).
3
ICLE also comprises a Bulgarian sub-corpus. However, essays written by
Bulgarian-speaking learners were mainly written without the help of reference
tools and were therefore not included in the analysis.
4
Texts longer than 45,000 words were sampled so as to allow for a wider coverage
of text types and avoid over-representation of idiosyncratic uses. This design
criterion, however, causes problems for certain types of linguistic enquiries.
A number of studies in the field of English for academic purposes have shown
that words may behave differently and display different preferred lexico-
grammatical environments in different sections of a text (see, for example,
Gledhill, 2000). Quantitative comparisons between the BNC and ICLE thus have
to be treated with caution, especially when the lexical items under study are
closely linked to specific parts of texts (e.g. words and phrasemes used to intro-
duce the main topic or a conclusion).
5
See Stefan Evert’s webpage (http: //www.collocations.de/index.html (accessed
2 August 2009)) for a comprehensive list of measures of association and their
mathematical interpretation.
Chapter 4
1
In f[n, c], f is the frequency, n the node and c the collocate.
2
http://www.oed.com (accessed 2 August 2009).
3
These three nouns are listed under the first sense of ‘classic’ in LODCE4.
Notes 237
4
These figures are based on disambiguated data. The instances of illustrate used in
the sense of ‘to put pictures in a book, article, etc’ are not included.
5
Estimations based on an analysis of the first 200 occurrences of the conjunction
in the BNC-AC-HUM.
6
Estimations based on an analysis of the first 200 occurrences of the preposition
in the BNC-AC-HUM.
7
This does not mean, however, that there are no idioms, similes, compounds,
phrasal verbs, commonplaces and allusions to proverbs and quotations in
academic prose. As shown by Gläser, ‘authors of scientific writing are prone to
modify idioms, proverbs, and quotations for intellectual punning and sophisti-
cated allusions’ (1998: 143). Studies focusing on terminological terms used in
English for Specific Purposes have also revealed the pervasiveness of compounds
(e.g. Bourigault et al., 2004) in specialized texts.
Chapter 5
1
The ‘word list’ option of WST4 was used to search for any misspelt form of the
words under study in the ICLE.
2
The relative frequencies of for instance and example are higher in most learner cor-
pora than in the BNC-AC-HUM in most learner corpora. When the learner corpora
for different mother tongues are analysed separately, however, the differences in
use are only significant for a few groups. Aggregated frequencies thus also help to
reveal general, though moderate, overuse in learner corpora in general.
3
See Miller and Weinert (1995), Siegel (2002) and Biber et al. (1999: 562) for
specific functions of like in speech. See Müller (2005: 197–228) for an analysis of
like as a discourse marker.
4
Other verb co-occurrents that are quite frequent in the BNC-SP but not found
in the BNC-AC-HUM are the verbs get and think.
So we’ve got some examples here of some patterns that we want to learn using the N tuple
method and tuple and tuple. (BNC-SP)
Again think of the example of erm erm a social club you know, relationships between
members, although they may be close and intimate and friendly and all that, are not the
same as a relationship between members of a family. (BNC-SP)
5
The noun root is overused in the ICLE largely because it appears in an essay title
given to some of the EFL learners, ‘In the words of the old song: “Money is the
root of all evil”’, which learners then tend to work into their essays.
6
The underuse of the conjunctions as and while reported here must be treated
with caution as it results from estimations based on an analysis of only the first
100 occurrences of each conjunction in each corpus.
7
AKL words are printed in bold in these examples.
8
John Osborne (Université de Haute Savoie, France) kindly pointed out to me
that the sequence according to me also appeared in published textbooks such
as Ok! (Lacoste and Marcelin, Nathan 1984), which was widely used in French
colleges throughout the 1980s and early 1990s.
238 Notes
9
Gledhill (2000) uses the term ‘collocational cascade’ but, following Granger and
Paquot (2008a), I prefer to avoid using the adjective ‘collocational’ to refer to
sequences of co-occurrents.
10
[P] indicates a new paragraph in learner writing
11
A related problem is that of punctuation. EFL learners sometimes omit commas
after sentence-initial subordinate clauses or connectors or before and after
appositives such as that is and that is to say (e.g. According to von Mayer, however, what
matters is relative poverty* that is to say* the sudden decrease of wealth, ICLE-IT). By
contrast, they sometimes erroneously use a comma after the conjunctions although
or (even) though (e.g. When I compare these languages I do not consider English as an
easy language, although, I do admit that I have noticed some things that are easier about
English than about the other languages that I had the chance to learn, ICLE-PO).
12
Osborne (2008) compared adverb placement in the various interlanguages
represented in the first version of the International Corpus of Learner English and
found that ‘V-Adv-O order is most frequent in the productions of learners whose
L1 has verb-raising (French, Italian and Spanish), and least frequent with
speakers of V2 languages (Dutch, German and Swedish), with speakers of non-
raising languages (Russian, Polish, Czech and Bulgarian) in between’ (Osborne,
2008: 77).
13
See Paquot (2008b and in preparation) for details on the corpus linguistics meth-
ods and statistical measures used to operationalize Jarvis’s (2000) framework on
learner corpus data.
14
The results reported here are only preliminary. The figures should be treated
with caution as the LOCNESS corpus is quite small.
Chapter 6
1
The quality of the teaching material on the use of connectors in English that is
freely available on the Internet is generally quite alarming, especially given that
students increasingly use the Internet for study purposes.
2
As shown in Section 4.2.1, when the preferred sentence position of individual
connectors is taught, the information is often neither corpus-based nor con-
firmed by corpus data.
3
Cohesion is often dealt with in grammars, where the focus is always on connec-
tors. It is noteworthy that, in the new corpus-based Cambridge Grammar of English
(Carter and McCarthy, 2006), no attention is given to lexical cohesion, although
there is a chapter on textual cohesion (‘Grammar across turns and sentences’,
pp. 242–62) as well as a full chapter on ‘Grammar and Academic English’
(pp. 266–94).
4
http://www2c.ac-lille.fr/malraux-bethune/FORMAT/super/anglaisinfo/
methodes/ Expressions_et_mots_de_liaison.htm (last accessed: 30 July 2009).
5
See Gilquin et al.(2007a) for a detailed discussion of the role of corpora, and
more specifically, learner corpora in the design of EAP materials and for possible
explanations of the relatively modest role that corpora have played so far.
6
The ‘Improve your writing skills’ section in the MED2 shows how a rigorous
corpus-based method can help users achieve higher levels of accuracy and fluency
Notes 239
in academic writing. However, to achieve maximum efficiency, it is essential to

explore ways of integrating this type of description into the microstructure of
dictionaries rather than inserting it as a separate middle section. The Centre for
English Corpus Linguistics (Université catholique de Louvain) has therefore
recently launched a new dictionary project which consists of a web-based EAP
dictionary-cum-writing aid tool, the Louvain EAP Dictionary (LEAD) (see Granger
and Paquot, 2008b and 2010). This project is innovative in two main respects: it
allows for both onomasiological (via the lexeme) and semasiological (via the con-
cept) access and is customizable according to the learner’s mother tongue and
the field in which he or she is specializing (business, medicine, etc.).
Chapter 7
1
The Centre for English Corpus Linguistics launched the LONGDALE project in
January 2008, with the intention of building a large longitudinal database of
learner English containing data from learners with a wide range of mother
tongue backgrounds. In the LONGDALE project, the same students will be fol-
lowed over a period of two to three years.
2
The major role of L1 frequency has been identified in a few transfer studies
focusing on phonology and syntax (Selinker, 1992: 211; Kamimoto et al., 1992).
3
De Bot et al. distinguish between input and intake as follows: ‘“Input” is every-
thing around us we may perceive with our senses, and “uptake” or “intake” is what
we pay attention to and notice’ (2005: 8).
References1
Aarts, J. (2002), ‘Does corpus linguistic exist? Some old and new issues’, in L. Breivik
and A. Hasselgren (eds), From the COLT’s Mouth . . . and Others. Amsterdam: Rodopi,
pp. 1–17.
Aarts, J. and Granger, S. (1998), ‘Tag sequences in learner corpora: a key to
interlanguage grammar and discourse’, in Granger S. (ed.), Learner English on Com-
puter. London and New-York: Addison Wesley Longman, pp. 132–41.
Ädel, A. (2006), Metadiscourse in L1 and L2 English. Amsterdam: John Benjamins.
Ädel, A. (2008), ‘Involvement features in writing: do time and interaction trump register
awareness’, in Gilquin, G., Papp, S. and Diez-Bedmar B. M. (eds), Linking up Contras-
tive and Learner Corpus Research. Amsterdam and Atlanta: Rodopi, pp. 35–53.
Aijmer, K. (2001), ‘I think as a marker of discourse style in argumentative Swedish student
writing’, in Aijmer, K. (ed.), A Wealth of English. Studies in Honour of Göran Kjellmer.
Göteborg: Acta Universitatis Gothoburgensis, pp. 247–57.
Aijmer, K. (2002), ‘Modality in advanced Swedish learners’ written interlanguage’, in
Granger, S., Hung, J. and Petch-Tyson, S. (eds), Computer Learner Corpora, Second
Language Acquisition and Foreign Language Teaching. Language Learning and Language
Teaching 6. Amsterdam and Philadelphia: John Benjamins, pp. 55–76.
Altenberg, B. (1984), ‘Causal linking in spoken and written English’. Studia Linguistica,
38, 20–69.
Altenberg, B. (1998), ‘On the phraseology of spoken English: the evidence of recurrent
word-Combinations’, in Cowie, A. P. (ed.), Phraseology: Theory, Analysis and Applica-
tions. Oxford: Oxford University Press, pp. 101–22.
Altenberg, B. and Tapper, M. (1998), ‘The use of adverbial connectors in advanced
Swedish learners’ written English’, in Granger S. (ed.), Learner English on Computer.
London and New York: Addison Wesley Longman, pp. 80–93.
Archer, D. (ed.) (2009), What’s in a Word-list? Investigating Word Frequency and Keyword
Extraction. Farnham: Ashgate.
Archer, D. (2009a), ‘Does frequency really matter?’, in Archer, D. (ed.), What’s in a Word-
list? Investigating Word Frequency and Keyword Extraction. Farnham: Ashgate, pp. 1–15.
Archer, D., Wilson, A. and Rayson, P. (2002), Introduction to the USAS category system. Avail-
able from http://www.comp.lancs.ac.uk/computing/research/ucrel/usas/usas%20
guide.pdf.
Aston, G. and Burnard, L. (1998), The BNC Handbook. Edinburgh: Edinburgh University
Press.
Baayen, R. H., Feldman, L. F. and Schreuder, R. (2006), ‘Morphological influences
on the recognition of monosyllabic monomorphemic words’. Journal of Memory and
Language, 53, 496–512.
Bahns, J. (1993), ‘Lexical collocations: a contrastive view’. ELT Journal, 47 (1), 56–63.
Bailey, S. (2006), Academic Writing: A Handbook for International Students (2nd edition).
London and New York: Routledge.
References 241
Baker, M. (1988), ‘Sub-technical vocabulary and the ESP teacher: an analysis of some
rhetorical items in medical journal articles’. Reading in a Foreign Language, 4, 91–105.
Baker, P. (2004), ‘Querying keywords: questions of difference, frequency and sense in
keyword analysis’. Journal of English Linguistics, 32 (4), 346–59.
Barkema, H. (1996), ‘Idiomaticity and terminology: a multi-dimensional descriptive
model’. Studia Linguistica, 50 (2), 125–60.
Bartning, I. (1997), ‘L’apprenant dit avancé et son acquisition d’une langue étrangère:
tour d’horizon et esquisse d’une caractérisation de la variété avancée’. AILE, 9,
9–50.
Bauer, L. and Nation, I. S. P. (1993), ‘Word families’. International Journal of Lexicography,
6 (4), 253–79.
Bazerman, C. (1994), ‘Systems of genres and the enactment of social intentions’, in
Freedman, A. and Medway, P. (eds), Genre and the New Rhetoric. London: Taylor and
Francis, pp. 79–101.
Beheydt, L. (2005), ‘The development of an academic vocabulary’, in Battaner, P. and
DeCesaris, J. (eds), De lexicografia. Actes del I Symposium Internacional de Lexicografia.
Série actvitats 15. Barcelona: Institut universitari de linguistica
applicada, pp. 241–50.
Bhatia, V. (2002), ‘A generic view of academic discourse’, in Flowerdew, J. (ed.) Academic
discourse. Harlow: Longman, pp. 21–39.
Biber, D. (1988), Variation across Speech and Writing. Cambridge: Cambridge University
Press.
Biber, D. (2006), University Language: A Corpus-based Study of Spoken and Written Registers.
Amsterdam and Philadelphia: John Benjamins.
Biber, D. and Conrad, S. (1999), ‘Lexical bundles in conversation and academic prose’,
in Hasselgård, H. and Oksefjell, S. (eds), Out of Corpora: Studies in Honour of Stig Johans-
son. Amsterdam: Rodopi, pp. 181–90.
Biber D., Conrad, S. and Cortes, V. (2004), ‘If you look at . . . .: lexical bundles in university
teaching and textbooks’. Applied Linguistics, 25 (3), 371–405.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. (1999), Longman Grammar
of Spoken and Written English. Harlow: Longman.
Billuroğ lu, A. and Neufeld, S. D. (2007), BNL 2709: The most commonly used words in Eng-
lish. Fourth Edition. Nicosia: Rüstem Kitabevi.
Biskup, D. (1992), ‘L1 influence on learners’ renderings of English collocations:
a Polish/ German empirical study’, in Arnaud, P. and Béjoint, H. (eds), Vocabulary
and Applied Linguistics. London: Macmillan, pp. 85–93.
Bley-Vroman, R. (1983), ‘The comparative fallacy in interlanguage studies: the case of
systematicity’. Language Learning, 33, 1–17.
Bourigault, D., Aussenac-Gilles, N. and Charlet, J. (2004), ‘Construction de ressources
terminologiques ou ontologiques à partir de textes: un cadre unificateur pour trois
études de cas’. Revue d’Intelligence Artificielle, 18 (1), 87–110. Available from http://
w3.univ-tlse2.fr/erss/textes/pagespersos/bourigault/RIA-bourigault-aussenac-
charlet.doc
Bowker, L. (2003), ‘Corpus-based applications for translator training: exploring the
possibilities’, in Granger, S., Lerot, J. and Petch-Tyson, S. (eds), Corpus-based Approaches
to Contrastive Linguistics and Translation Studies. Amsterdam and Philadelphia: John
Benjamins, pp. 169–83.
Bowker, L. and Pearson, J. (2002), Working with Specialized Text: A Practical Guide to Using
Corpora. London and New York: Routledge.
242 References
Brill, E. (1992), ‘A simple rule-based part of speech tagger’. Proceedings of ANLP-92, 3rd
Conference on Applied Natural Language Processing. Available from http://
citeseer.ist.psu.edu/brill92simple.html
Burger, H. (1998), Phraseologie. Eine Einführung am Beispiel des Deutschen. Berlin: Erich
Schmidt VerčČlag.
Burnard, L. (2007), Reference Guide for the British National Corpus (XML edition). Available
from http://www.natcorp.ox.ac.uk/XMLedition/URG/
Campion, M. E. and Elley, W. B. (1971), An Academic Vocabulary List. Wellington:
NZCER.
Carter, R. (1998 [1987]) Vocabulary: Applied Linguistic Perspectives (2nd edition). London:
Routledge.
Carter, R. (1998b), ‘Orders of reality: CANCODE, communication, and culture’. ELT
Journal, 52 (1), 43–56.
Carter R. and McCarthy, M. (1988), ‘Lexis and discourse: vocabulary in use’, in Carter, R.
and McCarthy, M. (eds), Vocabulary and Language Teaching. New York: Longman,
pp. 201–20.
Carter, R. and McCarthy, M. (2006), Cambridge Grammar of English: A Comprehensive Guide.
Spoken and Written English Grammar and Usage. Cambridge: Cambridge University
Press.
Celce-Murcia, M. and Larsen-Freeman, D. (1999), The Grammar Book: An ESL/EFL
Teacher’s Course (2nd edition). Boston: Heinle and Heinle.
Charles, M. (2007), ‘Argument or evidence? Disciplinary variation in the use of the noun
“that” pattern in stance construction’. English for Specific Purposes, 26 (2), 203–18.
Chen, C. W. (2006), ‘The use of conjunctive adverbials in the academic papers
of advanced Taiwanese EFL learners’. International Journal of Corpus Linguistics,
11 (1), 113–30.
Chung, T. and Nation, P. (2003), ‘Technical vocabulary in specialised texts’. Reading in a
Foreign Language, 15 (2). Available from http://www.nflrc.hawaii.edu/rfl/
Clear, J. (1993) ‘From Firth principles: computational tools for the study of collocation’,
in Baker M., Francis, G. and Tognini-Bonelli, E. (eds), Text and Technology: In Honour
of John Sinclair. Amsterdam: John Benjamins, pp. 271–92.
Cohen, A. D., Glasman, H., Rosenbaum-Cohen, P. R., Ferrera, J. and Fine, J. (1988),
‘Reading English for specialised purposes: discourse analysis and the use of
student informants’, in Carrrell P., Devine, J. and Eskey, D. E. (eds), Interactive
Approaches to Second Language Reading. Cambridge: Cambridge University Press,
pp. 152–67.
Coltier, D. (1988), ‘Introduction et gestion des exemples dans les textes à thèse’. Pra-
tiques, 58, 23–41.
Connor, U. (1996), Contrastive Rhetoric: Cross-Cultural Aspects of Second-Language
Writing. New York: Cambridge University Press.
Conrad, S. (1999), ‘The importance of corpus-based research for language teachers’.
System, 27, 1–18.
Conrad, S. (2004), ‘Corpus Linguistics, Language Variation, and Language
Teaching’, in Sinclair, J. McH. (ed.) How to Use Corpora in Language Teaching. Amster-
dam: John Benjamins, pp. 67–85.
Cook, G. (1998), ‘The uses of reality: a reply to Ronald Carter’. ELT Journal, 52 (1):
57–63.
Corson, D. (1997), ‘The learning and use of academic English words’. Language
Learning, 47 (4), 671–718.
References 243
Cortes, V. (2002), ‘Lexical bundles in Freshman composition’, in Reppen, R.,

Fitzmaurice, S. M. and Biber, D. (eds), Using Corpora to Explore Linguistic Variation.
Amsterdam: John Benjamins, pp. 131–45.
Council of Europe (2001), Common European Framework of Reference for Languages: Learn-
ing, Teaching, Assessment. Cambridge: Cambridge University Press.
Cowan, J. R. (1974), ‘Lexical and syntactic research for the design of EFL reading materi-
als’. TESOL Quarterly, 8 (4), 389–400.
Cowie, A. P. (1998), ‘Phraseological dictionaries: some east-west comparisons’, in Cowie,
A. P. (ed.), Phraseology: Theory, Analysis and Applications. Oxford: Oxford University
Press, p. 209–28.
Coxhead, A. (2000), ‘A new Academic Word List’. TESOL Quarterly, 34 (2), 213–38.
Coxhead, A. and Nation, P. (2001), ‘The specialised vocabulary of English for
Academic Purposes’, in Flowerdew, J. and Peacock, M. (eds), Research Perspectives on
English for Academic Purposes. Cambridge: Cambridge University Press,
pp. 252–67.
Coxhead, A. and Hirsh, D. (2007), ‘A pilot science-specific word list’. Revue française de
Linguistique Appliquée, 12 (2), 65–78.
Coxhead, A., Bunting, J., Byrd, P. and Moran, K. (forthcoming), The Academic Word List:
Collocations and Recurrent Phrases. Boston: University of Michigan Press.
Crewe, W. (1990), ‘The illogic of logical connectors’. ELT Journal, 44 (4), 316–25.
Curado Fuentes, A. (2001), ‘Lexical behaviour in academic and technical corpora: impli-
cations for ESP development’. Language Learning and Technology, 5 (3), 106–29.
Cutting, J. (2000), ‘Written errors of international students and English native speaker
students’, in Blue, G. M., Milton, J. and Saville, J. (eds), Assessing English for Academic
Purposes. Frankfurt am Main: Peter Lang, pp. 97–113.
Davies, A. (2003), The Native Speaker: Myth and Reality. Clevedon: Multilingual Matters.
De Bot, K., Lowie, W. and Verspoor, M. (2005), Second Language Acquisition: An Advanced
Resource Book. London and New York: Routledge.
De Cock, S. (2003), ‘Recurrent sequences of words in native speaker and advanced
learner spoken and written English: a corpus-driven approach’. Unpublished PhD
thesis. Louvain-la-Neuve: Université catholique de Louvain.
De Cock, S. and Granger, S. (2004), ‘Computer learner corpora and monolingual learn-
ers’ dictionaries: the perfect match’, in Teubert, W. and Mahlberg, M. (eds), The
Corpus Approach to Lexicography. Special issue of Lexicographica, 20, 72–86.
Dechert, H. (1984), ‘Second language production: six hypotheses’, in Dechert, H.,
Möhle, D. and Raupach, M. (eds), Second Language Productions. Tübingen: Gunter
Narr, pp. 211–30.
Dechert, H. and Lennon, P. (1989), ‘Collocational blends of advanced language learn-
ers: a preliminary analysis’, in Olesky, W. (ed.), Contrastive Pragmatics. Amsterdam:
John Benjamins, pp. 131–68.
DeRose, S. (1988), ‘Grammatical category disambiguation by statistical optimization’.
Computational Linguistics, 14, 31–9.
Dudley-Evans, T. and St Johns, M. J. (1998), Developments in English for Specific Purposes.
Cambridge: Cambridge University Press.
Eldridge, J. (2008), ‘“No, there isn’t an ‘academic vocabulary,’ but . . . ” A reader responds
to K. Hyland and P. Tse’s “Is there an ‘academic vocabulary’?”’. TESOL Quarterly,
42 (1), 109–13.
Ellis, R. and Barkhuizen, G. (2005), Analysing Learner Language. Oxford: Oxford Univer-
sity Press.
244 References
Engels, L.K. (1968), ‘The fallacy of word counts’. International Review of Applied Linguis-
tics, 10, 213–31.
Evans, S. and Green, C. (2006), ‘Why EAP is necessary: a survey of Hong Kong
tertiary students’. Journal of English for Academic Purposes, 6, 3–17.
Evert, S. (2004), ‘The statistics of word cooccurrences: word pairs and collocations’.
Ph.D. thesis, Institut für maschinelle Sprachverarbeitung, University of Stuttgart.
Available from http://www.collocations.de/phd.html
Farrell, P. (1990), ‘Vocabulary in ESP: a lexical analysis of the English of electronics and
a study of semi-technical vocabulary’. CLCS Occasional Paper, 25, 1–83.
Field, Y. and Yip, L.M.O. (1992), ‘A comparison of internal conjunctive cohesion in the
English essay writing of Cantonese and native speakers of English’. RELC Journal, 23
(1), 15–28.
Firth, A. and Wagner, J. (1997), ‘On discourse, communication and (some) fundamental
concepts in SLA research’. Modern Language Journal, 81, 285–300.
Flowerdew, J. (1993), ‘Concordancing as a tool in course design’. System, 21 (2), 231–44.
Flowerdew, J. (1999), ‘Problems in writing for scholarly publication in English: the case
of Hong-Kong’. Journal of Second Language Writing, 8 (3), 243–64.
Flowerdew, J. (2002), ‘Introduction: approaches to the analysis of academic discourse in
English’, in Flowerdew, J. (ed.) Academic Discourse. Harlow: Longman, pp. 1–17.
Flowerdew, J. (2006), ‘Signalling nouns in a learner corpus’. International Journal of
Corpus Linguistics, 11 (3), 345–62.
Flowerdew, L. (1998), ‘Integrating ‘expert’ and ‘interlanguage’ computer corpora find-
ings on causality: discoveries for teachers and students’. English for Specific Purposes, 17
(4), 329–345.
Flowerdew, L. (2001), ‘The exploitation of small learner corpora in EAP materials
design’, in Ghadessy, M. and Roseberry, R. (eds), Small corpus studies and ELT.
Flowerdew, L. (2008) Corpus-based Analyses of the Problem-Solution Pattern: A Phraseological
Approach. Studies in Corpus Linguistics 29. Amsterdam and Philadelphia: John
Benjamins.
Francis, G. (1986) Anaphoric Nouns. Discourse Analysis Monograph 11. Birmingham:
English Language Research, University of Birmingham.
Francis, G. (1994), ‘Labelling discourse: an aspect of nominal-group lexical
cohesion’, in Coulthard, M. (ed.), Advances in Written Text Analysis. London and
New-York: Routledge, pp. 82–101.
Garside, R. (1987), ‘The CLAWS word-tagging system’, in Garside, R., Leech, G. and
Sampson, G. (eds), The Computational Analysis of English. London and New York:
Longman, pp. 30–41.
Garside, R. and Smith, N. (1997), ‘A hybrid grammatical tagger: CLAWS4’, in Garside,
R., Leech, G. and McEnery, A. (eds), Corpus Annotation: Linguistics Information from
Computer Text Corpora. New York: Addison Wesley Longman, pp. 102–21.
Ghadessy, M. (1979), ‘Frequency counts, word lists and materials preparation:
a new approach’. English Teaching Forum,17, 24–7.
Gilquin, G. (2000/2001), ‘The integrated contrastive model. Spicing up your data’. Lan-
guages in Contrast, 3 (1), 95–123.
Gilquin, G. (2008), ‘Combining contrastive and interlanguage analysis to apprehend
transfer’, in Diez-Bedmar, B. M., Gilquin, G. and Papp, S. (eds), Linking Up Contrastive
and Learner Corpus Research. Amsterdam and Atlanta: Rodopi, pp. 3–33.
Gilquin, G. and Granger, S. (2008), ‘From EFL to ESL: evidence from the International
Corpus of Learner English’. Paper presented at the First Triennial Conference of the
References 245
International Society for the Linguistics of English, Albert-Ludwigs-Universität

Freiburg, 8–11 October 2008.
Gilquin, G., Granger, S. and Paquot, M. (2007a) ‘Learner corpora: the missing link in
EAP pedagogy, in Thompson, P. (ed.), Corpus-based EAP pedagogy. Special issue of the
Journal of English for Academic Purposes, 6 (4), 319–35.
Gilquin, G., Granger, S. and Paquot, M. (2007b), ‘Improve your writing skills: writing
sections’, in Rundell, M. (editor in chief) Macmillan English Dictionary for Advanced
Learners (2nd edition). Oxford: Macmillan Education, pp. IW4–IW28.
Gilquin, G. and Paquot, M. (2008), ‘Too chatty: learner academic writing and register
variation’. English Text Construction, 1 (1), 41–61.
Gläser, R. (1998), ‘The stylistic potential of phraseological units in the light of genre
analysis’, in Cowie, A. P. (ed.), Phraseology: Theory, Analysis and Applications. Oxford:
Oxford University Press, pp. 125–43.
Gledhill, C. (2000), Collocations in Science Writing. Language in Performance 22.
Tuebingen: Gunter Narr Verlag.
Goodman, A. and Payne, E. (1981), ‘A taxonomic approach to the lexis of science’, in
Selinker L., Tarone, E. and Hnazeli, V. (eds), English for Academic and Technical Purposes:
Studies in Honour of Louis Trimble. Rowley MA: Newbury House, pp. 23–39.
Granger, S. (1996a), ‘From CA to CIA and back: an integrated approach to computer-
ized bilingual and learner corpora’, in Aijmer K., Altenberg, B. and Johansson, M.
(eds), Languages in Contrast: Text-based Cross-linguistic Studies. Lund Studies in English 88.
Lund: Lund University Press, pp. 37–51.
Granger, S. (1996b), ‘Romance words in English: from history to pedagogy’, in Svartvik,
J. (ed.), Words. Proceedings of an International Symposium. Stockholm: Almqvist and
Wiksell International, pp. 105–21.
Granger, S. (1997), ‘On identifying the syntactic and discourse features of participle
clauses in academic English: native and non-native writers compared’, in Aarts, J.,
de Mönnink, I. and Wekker, H. (eds), Studies in English Language and Teaching.
Amsterdam and Atlanta: Rodopi, pp. 185–98.
Granger, S. (ed.) (1998), Learner English on Computer. London and New York: Addison
Wesley Longman.
Granger, S. (1998a), ‘The computer learner corpus: a versatile new source of data
for SLA research’, in Granger, S. (ed.), Learner English on Computer. London and
New York: Addison Wesley Longman, pp. 3–18.
Granger, S. (1998b), ‘Prefabricated patterns in advanced EFL writing: collocations and
formulae’, in Cowie, A. P. (ed.), Phraseology: Theory, Analysis and Applications. Oxford:
Oxford University Press, pp. 145–60.
Granger, S. (2002), ‘A bird’s-eye view of learner corpus research’, in Granger, S., Hung,
J. and Petch-Tyson, S. (eds), Computer Learner Corpora, Second Language Acquisition and
Foreign Language Teaching. Language Learning and Language Teaching 6. Amsterdam
and Philadelphia: John Benjamins, pp. 3–33.
Granger, S. (2003), ‘The International Corpus of Learner English: a new resource for
foreign language learning and teaching and second language acquisition research’.
TESOL Quarterly, 37 (3), 538–46.
Granger, S. (2004), ‘Computer learner corpus research: current status and future
prospects’, in Connor, U. and Upton, T. (eds), Applied Corpus Linguistics: A Multidi-
mensional Perspective. Amsterdam and Atlanta: Rodopi, pp. 123–45.
Granger, S. (2006), ‘Lexico-grammatical patterns of EAP verbs: how do learners cope?’
Paper presented at Exploring the Lexis-Grammar Interface, 5–7 October 2006, University
of Hanover, Germany.
246 References
Granger, S. (2009), ‘The contribution of learner corpora to second language acquisition

and foreign language teaching: a critical evaluation’, in Aijmer, K. (ed.), Corpora and
Language Teaching. Amsterdam and Philadelphia: John Benjamins, pp. 15–32.
Granger, S., Dagneaux, E. and Meunier, F. (eds) (2002), The International Corpus of Learner
English. CD-ROM and Handbook. Presses universitaires de Louvain: Louvain-la-
Neuve.
Granger S., Dagneaux, E., Meunier, F. and Paquot, M. (2009), The International Corpus of
Learner English. Version 2. Handbook and CD-ROM. Louvain-la-Neuve: Presses univer-
sitaires de Louvain.
Granger, S. and Paquot, M. (2008a), ‘Disentangling the phraseological web’, in Granger,
S. and Meunier, F. (eds), Phraseology: An Interdisciplinary Perspective. Amsterdam and
Philadelphia: John Benjamins, pp. 27–49.
Granger, S. and Paquot, M. (2008b), ‘From dictionary to phrasebook?’, in Bernal, E. and
DeCesaris, J. (eds), Proceedings of the XIII EURALEX International Congress, Barcelona,
Spain, 15–19 July 2008, pp. 1345–55.
Granger, S. and Paquot, M. (2009a), ‘In search of General Academic English: a corpus-
driven study’, in Katsampoxaki-Hodgetts, K. (ed.), Options and Practices of L.S.P
practitioners Conference Proceedings. University of Crete Publications, E-media,
pp. 94–108. Available from http://cecl.fltr.ucl.ac.be/Downloads/In_search_of_a_general_
academic_english.pdf
Granger, S and Paquot, M. (2009b), ‘Lexical verbs in academic discourse: a corpus-driven
study of learner use’, in Charles, M., Pecorari , D. and Hunston, S. (eds.), Academic
Writing: At the Interface of Corpus and Discourse. Continuum, pp. 193–214.
Granger, S. and Paquot, M. (2010), ‘Customising a general EAP dictionary to learner
needs’, in Granger, S. and Paquot, M. (eds) eLexicography in the 21st century: new chal-
lenges, new applications. Proceedings of the eLex2009 Conference. Cahiers du Cental, 6.
Louvain-la-Neuve: Presses universitaires de Louvain.
Granger, S. and Rayson, P. (1998), ‘Automatic profiling of learner texts’, in Granger, S.
(ed.), Learner English on Computer. London and New York: Addison Wesley Longman,
pp. 119–31.
Granger, S. and Swallow, H. (1988), ‘False friends: a kaleidoscope of translation
difficulties’. Langage et l’Homme, 23, 108–20.
Granger, S. and Tyson, S. (1996), ‘Connector usage in the English essay writing of native
and non-native EFL speakers of English’. World Englishes, 15, 19–29.
Gregg, K. R. (2003), ‘SLA theory: construction and assessment’, in Doughty, C. and
Long, M. H. (eds), Handbook of Second Language Research. London: Blackwell,
pp. 831–65.
Gries, S. (2007), ‘Exploring variability within and between corpora: some methodologi-
cal considerations’. Corpora, 1 (2), 109–51.
Gries, S. (2008), ‘Dispersions and adjusted frequencies in corpora’. International Journal
of Corpus Linguistics, 13 (4), 403–37.
Groom, N. (2005), ‘Pattern and meaning across genres and disciplines: an exploratory
study’. Journal of English for Academic Purposes, 4, 257–77.
Halliday, M. and Hasan, R. (1976), Cohesion in English. London: Longman.
Hamp-Lyons, L. and Heasley, B. (2006), Study Writing: A Course in Writing Skills for Aca-
demic Purposes. Cambridge: Cambridge University Press.
Hancioğlu, N., Neufeld, S. and Eldridge, J. (2008), ‘Through the looking glass and into
the land of lexico-grammar’. English for Specific Purposes, 27 (4), 459–79.
Harris, S. (1997), ‘Procedural vocabulary in law case reports’. English for Specific
Purposes, 16 (4), 289–308.
References 247
Harris Leonhard, B. (2002), Discoveries in Academic Writing. Bonston: Heinle and Heinle.
Hasselgren, A. (1994), ‘Lexical teddy bears and advanced learners: a study into the ways
Norwegian students cope with English vocabulary’. International Journal of Applied Lin-
guistics, 4 (2), 237–60.
Heatley, A. and Nation, P. (1996), Range [Computer software]. Wellington, New Zealand:
Victoria University of Wellington. Available from http://www.victoria.ac.nz/lals/
resources/range.aspx
Hegelheimer, V. and Fisher, D. (2006), ‘Grammar, writing, and technology: a sample
technology-supported approach to teaching grammar and improving writing for ESL
learners’. CALICO Journal, 23 (2), 257–79.
Hinkel, E. (2002), Second Language Writers’ Text: Linguistic and Rhetorical Features. London:
Lawrence Erlbaum Associates.
Hinkel, E. (2003), ‘Adverbial markers and tone in L1 and L2 students’ writing’. Journal
of Pragmatics, 35 (7), 1049–68.
Hinkel, E. (2004), Teaching Academic ESL Writing: Practical Techniques in Vocabulary and
Grammar. Mahwah, NJ: Lawrence Erlbaum Associates.
Hirsh, D. and Nation, P. (1992), ‘What vocabulary size is needed to read unsimplified
texts for pleasure?’ Reading in a Foreign Language, 8, 689–96.
Hoey, M. (1993), ‘A common signal in discourse: how the word reason is used in texts’,
in Sinclair, J. M., Hoey, M. and Fox, G. (eds), Techniques of Description. London and
New York: Routledge, pp. 67–82.
Hoey, M. (1994), ‘Signalling in discourse: a functional analysis of a common
discourse pattern in written and spoken English’, in Coulthard, M. (ed.), Advances in
Written Text Analysis. London: Routledge, pp. 26–45.
Hoey, M. (2005), Lexical Priming: A New Theory of Words and Language. London and
New-York: Routledge.
Hoffmann, S. (2004), ‘Are low-frequency complex prepositions grammaticalized? On
the limits of corpus data – and the importance of intuition’, in Lindquist, H.
and Mair, C. (eds), Corpus Approaches to Grammaticalization in English. Amsterdam and
Philadelphia: John Benjamins, pp. 171–210.
Hoffmann, S. and Evert, S. (2006), ‘BNCweb (CQP-edition): The marriage of two corpus
tools’, in Braun, S., Kohn, K. and Mukherjee, J. (eds), Corpus Technology and Language
Pedagogy: New Resources, New Tools, New Methods, volume 3 of English Corpus Linguis-
tics. Peter Lang, Frankfurt am Main, pp. 177–95. Available from http://purl.org/
stefan.evert/PUB/HoffmannEvert2006.pdf
Hoffmann, S., Evert, S., Smith, N., Lee D. and Berglund Prytz, Y. (2008), Corpus Linguis-
tics with BNCweb – a Practical Guide. Frankfurt am Main: Peter Lang.
Howarth, P. (1996), Phraseology in English Academic Writing: Some Implications for
Language Learning and Dictionary Making. Tübingen: Max Niemeyer Verlag.
Howarth, P. (1998), ‘The phraseology of learners’ academic writing’, in Cowie, A. P.
(ed.), Phraseology: Theory, Analysis and Applications. Oxford: Oxford University Press,
pp. 161–86.
Howarth, P. (1999), ‘Phraseological standards in EAP’, in Bool H. and Luford, P. (eds)
Academic Standards and Expectations: The Role of EAP. Nottingham: Nottingham Univer-
sity Press, 149–58.
Huckin, T. (2003), ‘Specificity in LSP’. Ibérica, 5, 3–17.
Hunston, S. and Thompson, G. (eds) (2000), Evaluation in Text: Authorial Stance and the
Construction of Discourse. Oxford: Oxford University Press.
Huntley, H. (2006), Essential Academic Vocabulary: Mastering the Complete Academic Word
List. Boston: Houghton Mifflin Company.
248 References
Hyland, K. (1998), ‘Persuasion and context: the pragmatics of academic metadiscourse’.

Journal of Pragmatics, 30, 437–55.
Hyland, K. (2000) Disciplinary Discourses: Social Interactions in Academic Writing. Harlow:
Pearson Education Limited.
Hyland, K. (2002a), ‘Directives: argument and engagement in academic writing’. Applied
Linguistics, 23, 215–39.
Hyland, K. (2002b), ‘Specificity revisited: how far should we go now?’ English for Specific
Purposes, 21, 385–95.
Hyland, K. (2005), Metadiscourse. London and New York: Continuum.
Hyland, K. (2009), Academic Discourse. London and New York: Continuum.
Hyland, K. and Milton, J. (1997), ‘Qualifications and certainty in L1 and L2 students’
writing’. Journal of Second Language Writing, 6 (2), 183–205.
Hyland, K. and Tse, P. (2007), ‘Is there an “academic vocabulary”?’ TESOL Quarterly, 41
(2), 235–53.
Ide, N. (2005), ‘Preparation and analysis of linguistic corpora’, in Schreibman, S.,
Siemens, R. and Unsworth, J. (eds), A Companion to Digital Humanities. Oxford:
Blackwell, pp. 289–306.
Ivanič, R. (1991), ‘Nouns in search of a context: a study of nouns with both
open- and closed-system characteristics’. International Review of Applied Linguistics in
Language Teaching, 29, 93–114.
Jarvis, S. (2000), ‘Methodological rigor in the study of transfer: identifying L1 influence
in the interlanguage lexicon’. Language Learning, 50 (2), 245–309.
Jarvis, S. and Pavlenko, A. (2008), Crosslinguistic Influence in Language and Cognition.
New York and London: Routledge.
Johansson, S. (1978), Some Aspects of the Vocabulary of Learned and Scientific English.
Göteborg: Acta Universitatis Gothoburgensis.
Johns, T. (1994), ‘From printout to handout: grammar and vocabulary teaching in the
context of data-driven learning’, in Odlin, T. (ed.), Approaches to Pedagogic Grammar.
Cambridge: Cambridge University Press, pp. 293–313.
Jordan, M. P. (1984), Rhetoric of Everyday English Texts. London: Allen and Unwin.
Jordan, R. R. (1997), English for Academic Purposes. A Guide and Resource Book for Teachers.
Jordan, R. R. (1999), Academic Writing Course. Study Skills in English. Harlow: Pearson
Education Limited.
Juilland, A. and Rodriguez, E. C. (1964), Frequency Dictionary of Spanish Words. La Haye:
Mouton.
Kamimoto, T., Shimura, A. and Kellerman, E. (1992), ‘A second language classic recon-
sidered: the case of Schachter’s avoidance’. Second Language Research, 8 (3): 251–77.
Katz, S. (1996), ‘Distribution of common words and phrases in text and language model-
ling’, Natural Language Engineering, 2 (1), 15–59.
Kellerman, E. (1984), ‘The empirical evidence for the influence of the L1 in interlan-
guage’, in Davies A., Criper, C. and Howatt, A. (eds), Interlanguage. Edinburgh:
Edinburgh University Press, pp. 98–122.
King, P. (2003), ‘Parallel concordancing and its applications’, in Granger, S., Lerot, J.
and Petch-Tyson, S. (eds), Corpus-based Approaches to Contrastive Linguistics and Transla-
tion Studies. Amsterdam and Philadelphia: John Benjamins, pp. 157–68.
Krishnamurthy, R. and Kosem, I. (2007), ‘Issues in creating a corpus for EAP pedagogy
and research’. Journal of English for Academic Purposes, 6, 356–73.
Kroll, B. (1990), ‘What does time buy? ESL student performance on home vs. class com-
positions’, in Kroll, B. (ed.) Second Language Writing. Cambridge: Cambridge University
Press, 140–53.
References 249
Lake, J. (2004), ‘Using ‘on the contrary’: the conceptual problems for EAP
students’. ELT Journal, 58 (2), 137–44.
Lakshmanan, U. and Selinker, L. (2001), ‘Analysing interlanguage: how do we know
what learners know?’ Second Language Research, 17, 393–420.
Laruelle, P. (2004), Mieux écrire en anglais. Paris: Presses Universitaires de France.
Laufer, B. (1989), ‘What percentage of text-lexis is essential for comprehension?’, in
Lauren, C. and Nordman, M. (eds), Special Language: From Humans Thinking to Think-
ing Machines. Clevedon: Multilingual Matters, pp. 316–23.
Laufer, B. (1992), ‘How much lexis is necessary for reading comprehension?’, in Arnaud,
P. J. and Béjoint, H. (eds), Vocabulary and Applied Linguistics. London: Macmillan,
pp. 126–132.
Lee, D. (2001), ‘Genres, registers, text types, domains, and styles: clarifying the concepts
and navigating a path through the BNC jungle’. Language Learning and Technology, 5
(3), 37–72.
Lee, D. and Swales, J. (2006), ‘A corpus-based EAP course for NNS doctoral
students: Moving from available specialized corpora to self-compiled corpora’. English
for Specific Purposes, 25, 56–75.
Leech, G. (1997), ‘Introducing corpus annotation’, in Garside, R., Leech, G. and
McEnery, A. (eds), Corpus Annotation: Linguistics Information from Computer Text
Corpora. New York: Addison Wesley Longman, pp. 1–18.
Leech G. (1998) ‘Preface’, in Granger, S. (ed.), Learner English on Computer. London and
New York: Addison Wesley Longman.
Leech, G., Rayson, P. and Wilson, A. (2001), Word Frequencies in Written and Spoken English.
London: Longman.
Leech, G. and Smith, N. (1999), ‘The use of tagging’, in van Halteren, H. (ed.), Syntactic
Wordclass Tagging. Dordrecht: Kluwer Academic Publishers, pp. 23–36.
Lehmann, H.-M., Schneider, P. and Hoffmann, S. (2000), ‘BNCweb’, in Kirk, J. (ed.),
Corpora Galore: Analysis and Techniques in Describing English. Amsterdam: Rodopi, pp.
259–66.
Lennon, P. (1996), ‘Getting ‘easy’ verbs wrong at the advanced level’. IRAL, 34 (1),
23–36.
Li, E. S.-L. and Pemberton, R. (1994), ‘An investigation of students’ knowledge of
academic and subtechnical vocabulary’, in Flowerdew, L. and Tong, A. K. K. (eds),
Entering Text. Hong Kong: The Hong Kong University of Science & Technology,
pp. 183–96.
Lonon Blanton, L. (2001), Composition Practice 3. Boston: Heinle and Heinle.
Lorenz, G. (1998), ‘Overstatement in advanced learners’ writing: stylistic aspects of
adjective intensification’, in Granger, S. (ed.), Learner English on Computer. London
and New York: Addison Wesley Longman, pp. 53–66.
Lorenz, G. (1999a), Adjective Intensification – Learners versus Native Speakers. A Corpus Study
of Argumentative Writing. Language and Computers: Studies in Practical Linguistics 27.
Amsterdam and Atlanta: Rodopi.
Lorenz, G. (1999b), ‘Learning to cohere: causal links in native vs. non-native argumenta-
tive writing’, in Bublitz, W., Lenk, U. and Ventola, E. (eds), Coherence in Spoken and
Written Discourse. How to Create it and How to Describe it. Amsterdam and Philadelphia:
John Benjamins, pp. 55–75.
Luzón Marco, M. J. (1999), ‘Procedural vocabulary: lexical signalling of conceptual rela-
tions in discourse’. Applied Linguistics, 20 (1), 1–21.
Luzón Marco, M. J. (2000), ‘Collocational frameworks in medical research papers: a
genre-based study’. English for Specific Purposes, 19 (1), 63–86.
250 References
Lynn, R. W. (1973), ‘Preparing word lists: a suggested method’. RELC Journal, 4 (1),
25–32.
Major, M. (ed.) (2006), The Longman Exams Dictionary. Harlow: Longman.
Martin, A. (1976), ‘Teaching academic vocabulary to foreign graduate students’. TESOL
Quarterly, 10 (1), 91–7.
Martínez, I., Beck, S. and Panza, C. (2009), ‘Academic vocabulary in agricultural research
articles: a corpus-based study’. English for Specific Purposes, 28, 183–98.
McCarthy, M. (1991), Discourse Analysis for Language Teachers. Cambridge: Cambridge
University Press.
McCarthy, M. and O’Dell, F. (2008), Academic Vocabulary in Use. Cambridge: Cambridge
University Press.
McEnery, A., Xiao, R. and Tono, Y. (2006), Corpus-based Language Studies: An Advanced
Resource Book. London and New-York: Routledge.
Mel’čuk, I. (1998), ‘Collocations and lexical functions’, in Cowie, A. P. (ed.), Phraseology:
Theory, Analysis and Applications. Oxford: Oxford University Press, pp. 23–53.
Meunier, F. (2000), ‘A computer corpus linguistics approach to interlanguage grammar:
noun phrase complexity in advanced learner writing’. Unpublished PhD thesis.
Université catholique de Louvain: Louvain-la-Neuve.
Meyer, P.G. (1997), Coming to Know: Studies in the Lexical Semantics and Pragmatics of Aca-
demic English. Tübingen: Gunter Narr Verlag Tübingen.
Miller, J. and Weinert, R. (1995), ‘The function of like in dialogue’. Journal of Pragmatics,
23, 365–93.
Milton, J. (1998). ‘Exploiting L1 and interlanguage corpora in the design of an elec-
tronic language learning and production environment’, in Granger, S. (ed.), Learner
English on Computer. London and New York: Addison Wesley Longman, 186-198.
Milton, J. (1999), ‘Lexical thickets and electronic gateways: making text accessible by
novice writers’, in Candlin, C. N. and Hyland, K. (eds), Writing: Texts, Processes and
Practices. London and New York: Longman, pp. 221–43.
Moon, R. (1998), Fixed Expressions and Idioms in English. Oxford: Clarendon Press.
Mudraya, O. (2006), ‘Engineering English: a lexical frequency instructional model’.
English for Specific Purposes, 25 (2), 235–56.
Mukherjee, J. (2005), ‘The native speaker is alive and kicking – linguistic and language-
pedagogical perspectives’. Anglistik, 16 (2), 7–23.
Mukherjee, J. and Rohrback, J-M. (2006), ‘Rethinking applied corpus linguistics from a
language-pedagogical perspective: new departures in learner corpus research’, in
Kettemann, B. and Marko, G. (eds), Planning, Gluing and Painting Corpora: Inside
the Applied Corpus Linguist’s workshop. Frankfurt am Main: Peter Lang, pp. 205–32.
Available from http://www.uni-giessen.de/anglistik/LING/Staff/mukherjee/
Müller, S. (2005), Discourse Markers in Native and Non-native English Discourse. Amsterdam
and Philadelphia: John Benjamins.
Narita, M. and Sugiura, M. (2006), ‘The use of adverbial connectors in argumentative
essays by Japanese EFL college students’. English Corpus Studies, 13, 23–42.
Nation, P. (2001), Learning Vocabulary in another Language. Cambridge: Cambridge
University Press.
Nation, P. and Hwang, K. (1995), ‘Where would general service vocabulary stop and
special purposes vocabulary begin?’ System, 23 (1), 35–41.
Nation, P. and Waring, R. (1997), ‘Vocabulary size, text coverage and word lists’, in
Schmitt, N. and Nation, P. (eds), Vocabulary: Description, Acquisition and Pedagogy.
Cambridge: Cambridge University Press, pp. 6–19.
References 251
Neff, J., Ballesteros, F., Dafouz, E., Martínez, F. and Rica, J. P. (2004a), ‘The expression of
writer stance in native and non-native argumentative texts’, in Facchinetti, R.
and Palmer, F. (eds), English Modality in Perspective. Frankfurt am Main: Peter Lang,
pp. 141–61.
Neff J., Dafouz, E., Díez, M., Prieto, R. and Chaudron, C. (2004b), ‘Contrastive discourse
analysis: argumentative text in English and Spanish’, in Moder, C. and Martinovic-Zic,
A. (eds), Discourse across Languages and Cultures. Amsterdam and Philadelphia: John
Neff J., Ballesteros, F., Dafouz, E., Martínez, F. and Rica, J-P. (2007), ‘A contrastive func-
tional analysis of errors in Spanish EFL university writers’ argumentative texts:
corpus-based study’, in Fitzpatrick, E. (ed.), Corpus Linguistics beyond the Word: Corpus
Research from Phrase to Discourse (Language and Computers 23). Amsterdam: Rodopi,
pp. 203–25.
Neff, J., Ballesteros, F., Dafouz, E., Martínez, F., Rica, J-P., Díez, M. and Prieto, R. (2008),
‘Formulating writer stance: a contrastive study of EFL learner corpora’, in Gerbig, A.
and Mason, O. (eds), Language, People, Numbers. Corpus Linguistics and Society.
Amsterdam and New York: Rodopi, pp. 73–89.
Neff van Aertselaer, J. (2008), ‘Contrasting English-Spanish interpersonal discourse
phrases: a corpus study’, in Meunier, F. and Granger, S. (eds), Phraseology in Language
Learning and Teaching. Amsterdam: John Benjamins, pp. 85–100.
Nelson, M. (2000), ‘A corpus-based study of business English and business English
teaching materials’. Unpublished PhD Thesis. Manchester: University of
Manchester.
Nesi, H., Sharpling, G. and Ganobcsik-Williams, L. (2004), ‘Student papers across the
curriculum: designing and developing a corpus of British student writing’. Computers
and Composition, 21, 439–50.
Nesselhauf, N. (2003), ‘Transfer at the locutional level: an investigation of German-
speaking and French-speaking learners of English’, in Tschichold, C. (ed.) English
Core Linguistics. Essays in Honour of D. J. Allerton. Bern: Lang, pp. 269–86.
Nesselhauf, N (2004), ‘What are collocations?’, in Allerton, D. J., Tschichold, C. and
Wieser, J. (eds), Phraseological Units: Basic Concepts and their Application. Basel: Schwabe,
pp. 1–22.
Nesselhauf, N. (2005), Collocations in a Learner Corpus. Amsterdam: John Benjamins.
Oakes, M. P. (1998), Statistics for Corpus Linguistics. Edinburgh: Edinburgh University
Press.
Oakes, M. and Farrow, M. (2007), ‘Use of the Chi-Squared Test to examine vocabulary
differences in English language corpora representing seven different countries’.
Literary and Linguistic Computing, 22 (1), 85–99.
Oakey, D. (2002), ‘Formulaic language in English academic writing: a corpus-based study
of the formal and functional variation of a lexical phrase in different academic disci-
plines’, in Reynolds, D. (2005), ‘Linguistic correlates of second language literacy
development: evidence from middle-grade learner essays’. Journal of Second Language
Writing,14 (1), 19–45.
Obenda, D. (ed.) (2004), Academic Word Power (1 – 4). Boston and New York: Houghton
Mifflin Company.
Odlin, T. (1989), Language Transfer: Cross-linguistic Influence in Language Learning.
Odlin, T. (2003), ‘Cross-linguistic influence’, in Doughty, C. J. and Long, M. H. (eds),
The Handbook of Second Language Acquisition. Oxford: Blackwell, pp. 436–86.
252 References
Osborne, J. (2008), ‘Phraseology effects as a trigger for errors in L2 English: the case of
more advanced learners’, in Meunier, F. and Granger, S. (eds), Phraseology in Language
Learning and Teaching. Amsterdam and Philadelphia: John Benjamins, pp. 67–84.
Oshima, A. and Hogue, A. (2006), Writing Academic English. New York: Pearson
Education.
Paquot, M. (2007a), ‘Towards a productively-oriented academic word list’, in Walinski, J.,
Kredens, K. and Gozdz-Roszkowski, S. (eds), Corpora and ICT in Language Studies.
PALC 2005. Lodz Studies in Language 13. Frankfurt am Main: Peter Lang, pp. 127–40.
Paquot, M. (2007b), ‘EAP vocabulary in native and learner writing: from extraction to
analysis. A phraseologically-oriented approach’. Unpublished PhD thesis, Université
catholique de Louvain.
Paquot, M. (2008a), ‘Exemplification in learner writing: a cross-linguistic perspective’, in
Granger, S. and Meunier, F. (eds), Phraseology in Language Learning and Teaching.
Paquot, M. (2008b), ‘Lifting the “methodological fog” that covers transfer studies:
a combination of Granger’s (1996) integrated contrastive model and Jarvis’s (2000)
unified framework for transfer research’. Paper presented at the ASKeladden Opening
Conference, 24–25 June 2008, Bergen, Norway. Available from http://cecl.fltr.ucl.ac.
be/publications.html
Paquot, M. (in preparation), ‘Unveiling L1-induced effects with the help of learner
corpora: Transfer of lexical priming’.
Paquot, M. and Bestgen, Y. (2009), ‘Distinctive words in academic writing: a comparison of
three statistical tests for keyword extraction’, in Hundt, M., Schreier, D. and Jucker, A. H.
(eds), Corpora: Pragmatics and Discourse. Amsterdam: Rodopi, pp. 243–65.
Partington, A. (1998), Patterns and Meanings. Using Corpora for English Language Research
and Teaching. Amsterdam and Philadelphia: John Benjamins.
Pawley, A. and Syder, F. H. (1983), ‘Two puzzles for linguistic theory: nativelike selection
and nativelike fluency’, in Richards, J. C. and Schmidt, W. (eds), Language and
Communication. London and New York: Longman, pp. 29–59.
Pecman, M. (2004), ‘Phraséologie contrastive anglais-français: analyse et traitement en
vue de l’aide à la rédaction scientifique’. Unpublished PhD Thesis, Université de
Nice, Sophia Antipolis.
Perdue, C. (1993), ‘Comment rendre compte de la “logique” de l’acquisition d’une
langue étrangère par l’adulte?’ Etudes de Linguistique Appliquée, 92, 8–22.
Petch-Tyson, S. (1998), ‘Reader/writer visibility in EFL persuasive writing’, in Granger, S.
(ed.), Learner English on Computer. London and New York: Addison Wesley Longman,
pp. 107—118.
Petch-Tyson, S. (1999), ‘Demonstrative expressions in argumentative discourse – a com-
puter-based comparison of non-native and native English’, in Botley, S. P. and
McEnery, A. M. (eds), Corpus-based and Computational Approaches to Discourse Anaphora.
Amsterdam and Philadelphia: John Benjamins, pp. 43–64.
Piller, I. (2001), ‘Who, if anyone, is a native speaker?’ Anglistik, 12 (2), 109–21.
Praninskas, J. (1972), American University Word List. London: Longman.
Quirk, R., Greenbaum, S., Leech, G. and Svartvik, J. (1985), A Comprehensive
Grammar of the English Language. London: Longman.
Rayson, P. (2003), ‘Matrix: a statistical method and software tool for linguistic analysis
through corpus comparison’. Unpublished PhD thesis, Lancaster University. Avail-
able from http://www.comp.lancs.ac.uk/~paul/public.html
Rayson, P. (2008), ‘From key words to key semantic domains’. International Journal of
Corpus Linguistics, 13 (4), 519–49.
References 253
Rayson, P., Berridge, D. and Francis, B. (2004), ‘Extending the Cochran rule for the com-
parison of word frequencies between corpora’, in Purnelle, G., Fairon, C. and Dister,
A. (eds), Le Poids des Mots: Proceedings of the 7th International Conference on Statistical Analy-
sis of Textual Data (JADT 2004), Louvain-la-Neuve, Belgium, March 10–12, 2004.
Louvain-la-Neuve: Presses universitaires de Louvain, pp. 926–36.
Renouf, A. and Sinclair, J. (1991), ‘Collocational frameworks in English’, in Aijmer, K.
and Altenberg, B. (eds), English Corpus Linguistics: Studies in Honour of Jan Svartvik.
London and New York: Longman, pp. 128–43.
Reynolds, D. W. (2005), ‘Linguistic correlates of second language literacy development:
evidence from middle-grade learner essays’. Journal of Second Language Writing, 14,
19–45.
Ringbom, H. (1987), The Role of the First Language in Foreign Language Learning. Clevedon
and Philadelphia: Multilingual Matters.
Ringbom, H. (1998), ‘Vocabulary frequencies in advanced learner English: a cross-
linguistic approach’, in Granger, S. (ed.), Learner English on Computer. London and
New York: Addison Wesley Longman, pp. 41–52.
Ringbom, H. (2007), Cross-linguistic Similarity in Foreign Language Learning.
Clevedon: Multilingual Matters.
Römer, U. (2004a), ‘A corpus-driven approach to modal auxiliaries and their didactics’,
in Sinclair, J. McH. (ed.), How to Use Corpora in Language Teaching. Amsterdam: John
Römer, U. (2004b), ‘Comparing real and ideal language learner input: The use of an
EFL textbook corpus in corpus linguistics and language teaching’, in Aston, G.,
Bernardini, S. and Stewart, D. (eds), Corpora and Language Learners. Amsterdam: John
Römer, U. (2005), Progressives, Patterns, Pedagogy. A Corpus-driven Approach to English
Progressive Forms, Functions, Contexts and Didactics. Amsterdam: John Benjamins.
Römer, U. (2008), ‘Corpora and language teaching’, in Lüdeling, A. and Kytö, M. (eds),
Corpus Linguistics. An International Handbook (volume 1). [HSK series]. Berlin: Mouton
de Gruyter, pp. 112–30.
Ruetten, M. (2003), Developing Composition Skills. Rhetoric and Grammar. Boston: Heinle.
Rundell, M. (1999), ‘Dictionary use in production’. International Journal of Lexicography,
12 (1), 35–53.
Rundell, M. (ed.) (2007), Macmillan English Dictionary for Advanced Learners. (2nd
Edition). Oxford: Macmillan Education.
Saville-Troike, M. (1984), ‘What really matters in second language learning for academic
achievement?’ TESOL Quarterly 18 (2), 199–219.
Scarcella, R. C. and Zimmerman, C. B. (2005), ‘Cognates, cognition and writing: an
investigation of the use of cognates by university second-language learners’, in Tyler,
A., Takada, M., Kim, Y. and Marinova, D. (eds), Language in Use: Cognitive and Discourse
Perspectives on Language and Language Learning. Washington: Georgetown University
Press, pp. 123–36.
Schleppegrell, M. J. (1996), ‘Conjunction in spoken English and ESL writing’. Applied
Linguistics, 17 (3), 271–85.
Schmid, H-J. (2000), English Abstract Nouns as Conceptual Shells: From Corpus to Cognition.
Berlin: Mouton de Gruyter.
Schmitt, N. and Schmitt, D. (2005), Focus on Vocabulary: Mastering the Academic Word List.
London: Longman.
Schmitt, N., Schmitt, D. and Clapham, C. (2001), ‘Developing and exploring the behaviour
of two new versions of the Vocabulary Levels Test’. Language Testing, 18 (1), 55–88.
254 References
Scott, M. (1997), ‘PC analysis of keywords and key keywords’. System, 25 (2), 233–45.
Scott, M. (2001), ‘Comparing corpora and identifying key words, collocations, frequency
distributions through the WordSmith Tools suite of computer programs’, in Ghadessy,
M., Henry, A. and Roseberry, L. (eds), Small Corpus Studies and ELT. Amsterdam: John
Scott, M. (2004), WordSmith Tools 4. Oxford: Oxford University Press.
Scott, M. and Tribble, C. (2006), Textual Patterns: Key Words and Corpus Analysis in Lan-
guage Education. Amsterdam: John Benjamins.
Seale, C., Ziebland, S. and Charteris-Black, J. (2006), ‘Gender, cancer experience and
internet use: a comparative keyword analysis of interviews and online cancer support
groups’. Social Science and Medicine, 62, 2577–90.
Selinker, L. (1972), ‘Interlanguage’. IRAL, X (3), 209–31.
Selinker, L. (1992), Rediscovering Interlanguage. London and New York: Longman.
Shaw, P. (2004), ‘The development of Swedish university students’ written English,
appropriacy, scope and coherence’, in Proceedings of the Ninth Nordic Conference for Eng-
lish Studies, Aarhus, Denmark, May 27–29. Available from http://www.hum.au.dk/
engelsk/naes2004/papers.html
Siegel, M. (2002), ‘Like: the discourse particle and semantic’. Journal of Semantics, 19 (1),
35–71.
Siepmann, D. (2005), Discourse Markers across Languages: A Contrastive Study of Second-level
Discourse Markers in Native and Non-native Text with Implications for General and Pedagogic
Lexicography. London and New York: Routledge.
Sinclair, J. M. (1987), ‘The nature of the evidence’, in Sinclair, J. M. (ed.), Looking up.
London: Collins, pp. 150–9.
Sinclair, J. M. (1991), Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Sinclair, J. M. (1992), ‘The automatic analysis of corpora’, in Svartvik, J. (ed.), Directions
in Corpus Linguistics. Berlin and New York: Mouton de Gruyter, pp. 378–97.
Sinclair, J. M. (1999), ‘The lexical item’, in Weigand, E. (ed.), Contrastive Lexical
Semantics. Current Issues in Linguistic Theory 17. Amsterdam and Philadelphia: John
Sinclair, J. M. (2004a), ‘Intuition and annotation – the discussion continues’, in Aijmer,
K. and Altenberg, B. (eds), Advances in Corpus Linguistics. Papers from the 23rd Interna-
tional Conference on English Language Research on Computerized Corpora. Amsterdam and
New York: Rodopi, pp. 39–59.
Sinclair, J. M. (2004b), ‘The empty lexicon’, in Sinclair, J. (ed.), Trust the Text: Language,
Corpus and Discourse. New York: Routledge, pp. 149–63.
Sinclair, J. M. (2005), ‘Corpus and text–basic principles’, in Wynne, M. (ed.), Developing
Linguistic Corpora: A Guide to Good Practice. Oxford: Oxbow Books, 1–16. Available from
http://ahds.ac.uk/creating/guides/linguistic-corpora/chapter1.htm.
Soler, V. (2002), ‘Analysing adjectives in scientific discourse: an exploratory study with
educational applications for Spanish speakers at advanced university level’. English for
Specific Purposes, 21, 145–65.
Stein, G. (2008), Developing your English Vocabulary. A Systematic New Approach. Tübingen:
Stauffenburg Verlag.
Strevens, P. (1973), ‘Technical, technological, and scientific English’. ELT Journal, 27 (3),
223–34.
Stubbs, M. (1986), ‘Language development, lexical competence and nuclear vocabu-
lary’, in Stubbs, M. (ed.), Educational Linguistics. Oxford and New York: Blackwell,
pp. 98–115.
References 255
Summers, D. (1996), ‘Computer lexicography: the importance of representativeness

in relation to frequency’, in Thomas, J. and Short, M. (eds), Using Corpora for
Language Research: Studies in Honour of Geoffrey Leech. London: Longman,
pp. 260–6.
Sutarsyah, C., Nation, P. and Kennedy, G. (1984), ‘How useful is EAP vocabulary for
ESP? A corpus based case study’. RELC Journal, 25, 34—50.
Swales, J.M. (1990) Genre Analysis: English in Academic and Research Settings. Cambridge:
Cambridge University Press.
Swales, J. M. (2002), ‘Integrated and fragmented worlds: EAP materials and corpus
linguistics’, in Flowerdew, J. (ed.), Academic Discourse. Harlow: Pearson Education,
pp. 150–64.
Swales, J.M., Ahmad, U., Chang, Y., Chavez, D., Dressen, D. and Seymour, R. (1998),
‘Consider this: the role of imperatives in scholarly writing’. Applied Linguistics,
19, (1), 97–121.
Swales, J.M. and Feak, C. B. (2004), Academic Writing for Graduate Students: Essential Tasks
and Skills (2nd edition). Ann Arbor, MI: University of Michigan Press.
Tan, M. (2005), ‘Authentic language or language errors? Lessons from a learner corpus’.
ELT Journal, 59 (2), 126–34.
Tankó, G. (2004), ‘The use of adverbial connectors in Hungarian university students’
argumentative essays’, in Sinclair, J. (ed.), How to Use Corpora in Language Teaching.
Amsterdam and Philadelphia: John Benjamins, pp. 157–81.
Thurstun, J. and Candlin, C. (1997), Exploring Academic English. A Workbook for Student
Essay Writing. Sydney: NCELTR Publications.
Thurstun, J. and Candlin, C. (1998), ‘Concordancing and the teaching of the vocabulary
of Academic English’. English for Specific Purposes, 17 (3), 267–80.
Tognini-Bonelli, E. (2001), Corpus Linguistics at Work. Amsterdam and Philadephia: John
Benjamins.
Tognini-Bonelli, E. (2002), ‘Functionally complete units of meaning across English and
Italian: towards a corpus-driven approach’, in Altenberg, B. and Granger, S. (eds)
Lexis in Contrast: Corpus-based Approaches. Amsterdam and Philadephia: John Benjamins,
pp. 73–95.
Tribble, C. (2001), ‘Small corpora and teaching writing: towards a corpus-informed
pedagogy of writing’, in Ghadessy, M., Henry, A. and Roseberry, R. (eds), Small Corpus
Studies and ELT: Theory and Practice. Amsterdam and Philadelphia: John Benjamins,
pp. 381–408.
Trimble, L. (1985), English for Science and Technology. Cambridge: Cambridge University
Press.
Tseng, Y-C. and Liou, H-C. (2006), ‘The effects of online conjunction materials on
college EFL students’ writing’. System, 34, 270–83.
Tutin, A. (forthcoming), ‘Evaluative adjectives in academic writing in the humanities
and social sciences’. Paper presented at Interpersonality in Written Academic
Discourse: Perspectives across Languages and Cultures, Jaca, 11–13 December 2008.
Available from http://w3.u-grenoble3.fr/lidilem/labo/file/evaluative_adjectives_
interlae_2008_tutin.pdf
Van Roey, J. (1990), French-English Contrastive Lexicology: An Introduction. Louvain-la-Neuve:
Peeters.
Vassileva, I. (1998), ‘Who am I/who are we in academic writing? A contrastive analysis of
authorial presence in English, German, French, Russian and Bulgarian’. International
Journal of Applied Linguistics, 8 (2), 163–90.
256 References
Voutilainen, A. (1999), ‘A short history of tagging’, in van Halteren, H. (ed.), Syntactic

wordclass tagging. Dordrecht: Kluwer Academic Publishers, pp. 9–21.
Wang, J., Liang, S., and Ge, G. (2008), ‘Establishment of a Medical Academic Word List’.
English for Specific Purposes, 27 (4), 442–58.
Wang, K. and Nation, P. (2004), ‘Word meaning in academic English: homography in the
Academic Word List’. Applied Linguistics, 25 (3), 291–314.
Ward, J. (1999), ‘How large a vocabulary do EAP engineering students need?’
Reading in a Foreign Language, 12 (2), 309–24.
Ward, J. (2009), ‘A basic engineering English word list for less proficient foundation
engineering undergraduates’. English for Specific Purposes, 28, 170–82.
Weissberg, R. and Buker, S. (1978), ‘Strategies for teaching the rhetoric of written
English for Science and Technology’. TESOL Quarterly,12 (3), 321–9.
West, M. (1937), ‘The present position in vocabulary selection for foreign language
teaching’. The Modern Language Journal, 21 (6), 433–7.
West, M. (1953), A General Service List of English Words. London: Longman.
Widdowson, H. G. (1983), Learning Purpose and Language Use. Oxford: Oxford
University Press.
Widdowson, H. G. (1991), ‘The description and prescription of language’, in Alatis, J. E.
(ed.), Linguistics and Language Pedagogy: The State of the Art. Washington, D.C.:
Georgetown University Press, pp. 11–24.
Widdowson, H. G. (2003), Defining Issues in English Language Teaching. Oxford: Oxford
University Press.
Wilkins, D. A. (1976), Notional Syllabuses. Oxford: Oxford University Press.
Wilson, A. and Thomas, J. (1997), ‘Semantic annotation’, in Garside, R., Leech, G. and
McEnery, A. (eds), Corpus Annotation: Linguistics Information from Computer Text
Corpora. New York: Addison Wesley Longman, pp. 53–65.
Winter, E. (1977), ‘A clause relational approach to English texts: a study of some predic-
tive lexical items in written discourse’. Instructional Science, 6, 1–92.
Wray, A. (2002), Formulaic Language and the Lexicon. Cambridge: Cambridge University
Press.
Xue, G. and Nation, P. (1984), ‘A University Word List’. Language Learning and
Communication, 3 (2), 215–29.
Yang, H. (1986), ‘A new technique for identifying scientific / technical terms and describ-
ing science texts’. Literary and Linguistic Computing, 1 (2), 93–103.
Zamel, V. (1983), ‘Teaching those missing links in writing’. ELT Journal, 37 (1), 22–9.
Zemach, D. and Rumisek, L. (2005), Academic Writing: From Paragraph to Essay. Oxford:
Macmillan.
Zhang, M. (2000), ‘Cohesive features in the expository writing of undergraduates in two
Chinese universities’. RELC Journal, 31, 61–95.
Zhang, H., Huang, C. and Yu, S. (2004), ‘Distributional consistency: A general method
for defining a core lexicon’, in Proceedings of the Fourth International Conference on Lan-
guage Resources and Evaluation, Lisbon, Portugal, 26–28 May 2004. Available from
http://cwn.ling.sinica.edu.tw/churen/LA040524CT01.pdf
Zwier, L. J. (2002), Building Academic Vocabulary. Michigan: The University of Michigan
Press.
Note
1
All cited internet sources were correct as of 2 August 2009.
Author index
Note: Page numbers in italics denote illustrations.
Aarts, J. 35, 143 Charles, M. 214

Ädel, A. 69, 72, 150, 157, 161 Chen, C. W. 126, 152, 174
Aijmer, K. 152, 157, 176 Chung, T. 14, 18
Altenberg, B. 83, 106, 121, 126, 150, Clear, J. 102
152, 180 Cohen, A. D. 18
Archer, D. 42, 45 Coltier, D. 88
Aston, G. 73 Connor, U. 152, 190
Conrad, S. 59, 83, 85, 121, 179, 180
Baayen, R. H. 145 Cook, G. 207
Bahns, J. 204 Corson, D. 13
Bailey, S. 206 Cortes, V. 1
Baker, M. 17, 19, 20, 21, 22, 24 Cowan, J. R. 17, 18
Baker, P. 48 Cowie, A. P. 213
Barkema, H. 84, 100 Coxhead, A. 3, 5, 9, 10, 11, 12, 13, 16, 17,
Barkhuizen, G. 67 20, 21, 25, 27, 28, 31, 34, 44, 63, 82,
Bartning, I. 79 122, 212
Bauer, L. 12 Crewe, W. 169, 174, 175, 176, 193, 201
Bazerman, C. 72 Curado Fuentes, A. 46, 213
Beheydt, L. 14, 27 Cutting, J. 72
Bestgen, Y. 62
Bhatia, V. 26 Davies, A. 71
Biber, D. 2, 29, 55, 83, 122, 137, 143, De Bot, K. 239n. 3
179, 211, 237n. 3 Dechert, H. 155, 168
Billuroğlu, A. 16 De Cock, S. 30, 72, 86, 121, 157
Biskup, D. 185 DeRose, S. 37
Bley-Vroman, R. 70 Dudley-Evans, T. 211
Bourigault, D. 237n. 7 (Ch. 4)
Bowker, L. 35, 206 Eldridge, J. 26, 214
Brill, E. 37 Elley, W. B. 11
Buker, S. 81 Ellis, R. 67
Burger, H. 83, 121 Engels, L. K. 11
Burnard, L. 73 Evans, S. 1
Evert, S. 75, 76, 78
Campion, M. E. 11
Candlin, C. 206 Farrell, P. 17, 18, 20
Carter, R. 11, 23, 85, 207, 238n. 3 Farrow, M. 48, 50, 62
Celce-Murcia, M. 169 Feak, C. B. 24
258 Author index
Field, Y. 171, 177, 193 Huntley, H. 9, 11, 16, 82

Firth, A. 70 Hwang, K. 11, 13, 14
Fisher, D. 204 Hyland, K. 3, 24, 25, 26, 31, 32, 72, 90,
Flowerdew, J. 2, 60, 61, 172, 178, 201, 92, 93, 99, 147, 157, 189, 201, 211,
214 214
Flowerdew, L. 23, 199, 201, 204
Francis, G. 22, 23, 59, 235n. 2 (Ch. 1) Ide, N. 37
Ivanič, R. 235n. 2 (Ch. 1)
Garside, R. 37, 38, 39
Ghadessy, M. 11 Jarvis, S. 4, 182, 183, 184, 185, 197, 216,
Gilquin, G. 1, 7, 70, 71, 151, 153, 195, 238n. 13
197, 207, 207, 208, 209, 210, 225, Johansson, S. 31
238n. 5 Johns, T. 214
Gläser, R. 237n. 7 (Ch. 4) Jordan, M. P. 23, 235n. 2 (Ch. 1)
Gledhill, C. 83, 102, 119, 123, 161, Jordan, R. R. 1, 81, 82, 85, 201, 202,
236n. 4, 238n. 9 203, 211
Goodman, A. 17 Juilland, A. 50
Granger, S. 4, 26, 32, 65, 67, 68, 70, 71,
72, 84, 100, 102, 118, 122, 123, Kamimoto, T. 239n. 2
126, 143, 145, 150, 151, 152, 155, Katz, S. 48, 236n. 8
157, 168, 169, 170, 177, 179, 182, Kellerman, E. 197
184, 185, 194, 197, 202, 204, 206, King, P. 206
213, 214, 215, 216, 236n. 1, 236n. 2, Kosem, I. 62
238n. 9 Krishnamurthy, R. 62
Green, C. 1 Kroll, B. 69
Gregg, K. R. 218
Gries, S. 48, 50 Lake, J. 169, 170, 201
Groom, N. 215 Lakshmanan, U. 70, 71
Larsen-Freeman, D. 169
Halliday, M. 203 Laruelle, P. 91
Hamp-Lyons, L. 206 Laufer, B. 10
Hancioğlu, N. 15, 16, 27, 63, 212 Lee, D. 73, 74, 132
Harris, S. 22 Leech G. 11, 34, 35, 70, 72
Harris Leonhard, B. 85 Lennon, P. 165, 168
Hasan, R. 203 Li, E. S.-L. 18
Hasselgren, A. 147 Liou, H.-C. 206
Heasley, B. 206 Lonon Blanton, L. 85
Heatley, A. 44 Lorenz, G. 72, 101, 143, 146, 150, 152,
Hegelheimer, V. 203 157, 169, 173, 177, 193
Hinkel, E. 1, 3, 33, 59, 148 Luzón Marco, M. J. 22, 83, 137
Hirsh, D. 10, 34 Lynn, R. W. 11
Hoey, M. 23, 26, 192, 216, 217
Hoffmann, S. 75, 76, 86 Major, M. 11
Hogue, A. 85 Martin, A. 19, 20, 21, 27
Howarth, P. 119, 165, 217 Martínez, I. 15, 27, 34, 82, 212
Huckin, T. 26, 214 McCarthy, M. 9, 23, 211, 235n. 2
Hunston, S. 118 (Ch.1), 238n. 3,
Author index 259
McEnery, A. 30, 35, 76 Quirk, R. 179

Mel’čuk, I. 83
Meunier, F. 143, 150 Rayson, P. 29, 30, 37, 38, 43, 47, 50, 61,
Meyer, P. G. 24, 27 76, 145, 150
Miller, J. 237n3 Renouf, A. 102
Milton, J. 72, 147, 157, 179, 201, 202, Reynolds, D. W. 1
203, 206, 213 Ringbom, H. 182, 185, 192
Moon, R. 121 Rodriguez, E. C. 50
Mudraya, O. 13, 14, 16, 17, 19, 31, 34 Rohrback, J.-M. 160
Mukherjee, J. 70, 71, 160 Römer, U. 85
Müller, S. 237, Ruetten, M. 85
Rumisek, L. 85
Narita, M. 126, 152, 174, 177, 179, 202 Rundell, M. 201, 207
Nation, I. S. P. 12
Nation, P. 1, 3, 10, 11, 13, 14, 16, 17, 18, St Johns, M. J. 211
23, 26, 44, 82, 185 Saville-Troike, M. 26
Neff, J. 73, 75, 152, 157, 194, 195 Scarcella, R. C. 13
Neff van Aertselaer, J. 73 Schleppegrell, M. J. 180
Nelson, M. 46 Schmid, H.-J. 235n. 2 (Ch.1)
Nesi, H. 31, 32, 33 Schmitt, D. 11, 16
Nesselhauf, N. 73, 78, 101, 164, 166, Schmitt, N. 11, 16
185 Scott, M. 2, 45, 46, 47, 48, 69, 236n. 9
Neufeld, S. D. 16 Seale, C. 46
Selinker, L. 70, 71, 182, 183, 239n. 2
Oakes, M. P. 48, 50, 62, 76 Shaw, P. 69
Oakey, D. 213 Siegel, M. 237n. 3
Obenda, D. 82 Siepmann, D. 82, 88, 101, 107, 126
O’Dell, F. 9 Sinclair, J. M. 2, 26, 35, 82, 101, 102,
Odlin, T. 185, 204 118
Osborne, J. 238n. 12 Smith, N. 35, 37, 39
Oshima, A. 85 Soler, V. 118
Stein, G. 16, 235n. 1 (Ch.1)
Paquot, M. 15, 26, 36, 62, 84, 100, 118, Strevens, P. 13
122, 123, 135, 151, 153, 157, 168, Stubbs, M. 10
190, 195, 197, 204, 213, 214, Sugiura, M. 126, 152, 174, 177, 179,
238nn. 9,13, 239n. 6 203
Partington, A. 15 Summers, D. 207
Pavlenko, A. 185 Sutarsyah, C. 26
Pawley, A. 71 Swales, J. M. 24, 31, 61, 86, 92, 132, 189,
Payne, E. 17 207
Pearson, J. 35 Swallow, H. 185
Pecman, M. 119 Syder, F. H. 71
Pemberton, R. 18
Perdue, C. 192 Tan, M. 71
Petch-Tyson, S. 142, 145, 157 Tankó, G. 147
Piller, I. 71 Tapper, M. 126, 150, 152
Praninskas, J. 11 Thomas, J. 42
260 Author index
Thompson, G. 118 Weinert, R. 237n. 3

Thompson, P. 255n. 2 Weissberg, R. 80
Thurstun, J. 206 West, M. 10, 11, 12, 15, 27, 44, 60
Tognini-Bonelli, E. 30, 35, 36, 118 Widdowson, H. G. 22, 61, 207, 212
Tribble, C. 46, 62, 69 Wilkins, D. A. 81
Trimble, L. 18, 20, 21 Wilson, A. 42
Tse, P. 3, 25, 26, 92 Winter, E. 22
Tseng, Y.-C. 206 Wray, A. 86
Tutin, A. 118
Tyson, S. 126, 152, 169, 170, 177, 184 Xue, G. 11, 16
Van Roey, J. 185 Yang, H. 13, 14, 17

Vassileva, I. 152 Yip, L. M. O. 171, 177, 193
Voutilainen, A. 38
Zamel, V. 176, 201
Wagner, J. 70 Zemach, D. 85
Wang, J. 34 Zhang, H. 50
Wang, K. 26 Zhang, M. 177, 194
Ward, J. W. 16, 34 Zimmerman, C. B. 13
Waring, R. 1, 11 Zwier, L. J. 24, 85
Subject index
Note: Page numbers in italics denote illustrations.
Academic Corpus 11–12 adjectives 101, 118

academic discourse 2, 3–4, 15, 24, in the Academic Keyword List 57
27–8, 31, 40, 63, 102, 119, 214, as co-occurrents of academic
217 nouns 100, 133, 167
academic discourse community 31 potential academic 57, 59
Academic Keyword List (AKL) 5, 7, 55–61, adverbials/adverbs 93, 213
122 in the Academic Keyword List 58
academic vocabulary and 60 mono-lexemic 91
automatic semantic analysis of 82–3 multiword linking 121
distribution, in ICLE 143 potential academic 58, 59
exemplificatory discourse markers semantic misuse and 139–40
in 88 sentence position 179
grammatical distribution categories annotation 34–6, 37–42, 43
in 55 part-of-speech annotation 30, 34–5,
need for concordancing in 61 36, 37, 38, 40, 41, 43
need for pedagogic mediation 61, 82 semantic annotation 35, 43–4, 53
nouns and 56 association measures 76, 101
overused and underused clusters attitudinal formulae 84, 122, 123
with 156 automatic semantic analysis, of
and rhetorical functions 81–7 AKL 82–3
words distribution, in GSL and
AWL 60 Baby BNC Academic corpus (B-BNC) 31,
words, overused and underused in 32, 47
ICLE 144 bilingual dictionaries 204
academic literacy 231 Billuroğlu-Neufeld-List (BNL) 16
academic vocabulary 7 blend 168
vs. core vocabulary and technical BNC-AC-HUM 75, 78, 90, 95, 100, 102
terms 10–13 comparing and contrasting
definition of 212 in 112–14
fuzzy vocabulary categories 13–17 expressing cause and effect in
meaning of 9, 24–5, 28 110–11, 114–18
and sub-technical vocabulary 17–21 expressing a concession in 109
Academic Word List (AWL) 3, 5, 11, 12, expressing possibility and
15, 16, 17, 20, 25, 27, 34, 59, 60, certainty 118–20
63, 82, 122, 212 reformulating in 109
activity verbs 59 see also British National Corpus (BNC)
262 Subject index
BNCweb 75, 76, 77 Common European Framework of Reference

booster 157 for Languages (CEF) 236n. 1 (ch3)
British Academic Written Corpus 217 communicative phrasemes 84, 121–2
British Academic Written English comparative fallacy 70, 71
(BAWE) Pilot Corpus 32–3, 34, comparison and contrast markers 87,
235n. 1,2 (ch2) 202, 208, 226–34
British National Corpus (BNC) 4, 16, in BNC-AC-HUM 112–14
17, 31, 67, 73, 74, 75–6, 77, 78, 79, EFL learners’ use of 148, 149
84, 95, 102, 125, 130, 132, 133, conceptual frequency 86
134, 146, 152, 207 concession markers 87
Baby BNC Academic Corpus in BNC-AC-HUM 109
(B-BNC) 31 conjunctions
BNC-AC 78 complex 40, 59, 84, 119, 120
BNC-AC-HUM see BNC-AC-HUM overuse of 146
Index 73–5 sentence initial position of 194
mark-ups 73 connectors 140, 169, 170, 172, 174–6,
BROWN corpus 47 177, 178–9, 178, 180, 181, 193,
burstiness 48, 236n. 8 201–2, 202, 203, 215, 238nn. 1–3
medial position for 180
Cambridge Advanced Learner’s overuse of 201–2
Dictionary 207 semantic misuse 201
cataphoric markers 90–1 sentence 22
cause and effect markers 87, 210, 219–25 sentence position 141, 174–82, 193–4,
in BNC-AC-HUM 110–11 203
EFL learners’ use of 146–8, 147 Constituent Likelihood Automatic Word-
Centre for English Corpus Linguistics tagging System (CLAWS) 37–42, 59
(CECL) 207 content words 10, 22, 102, 236n. 8
Clairefontaine Contrastive Interlanguage Analysis
Les fiches essentielles du Baccalauréat en (CIA) 4, 65, 70, 79, 85, 87, 215
anglais 160 contrastive rhetoric 2, 152
CLAWS 37–42, 59 control corpus 67, 70, 71, 73
code gloss 90, 93, 188–9 co-occurrence 37, 76, 78, 95, 96, 99,
cohesion 22, 123, 211, 213, 238n. 3 100, 101, 114, 115–17, 119–20,
advance and retrospective 133–4, 137, 160, 162, 166, 167,
labelling 22 193
grammatical 203 preferred co-occurrences in EFL
lexical 18, 22, 148, 203, 213 writing 160–8
non-technical words 18 core vocabulary 3, 4, 10–11, 15
textual 123, 148 Corpus de Dissertations Françaises
colligation 168 (CODIF) 184, 186, 188, 190,
colligational errors 166, 168 191
collocation 23–4, 76, 77, 100, 102, 118, corpus-based approach 2, 3, 29, 30, 31,
119, 161, 164, 165, 192, 204, 217, 61, 87, 106, 150, 216, 238n. 6
238n. 9 corpus-driven approach 29, 30, 35
collocational framework 102 Corpus Query Processor (CQP) 75, 76
collocational overlap 165 co-text 2, 22, 172, 193, 203
Subject index 263
data-driven approach 4, 29, 30, 36 attitudinal formulae 84, 122, 123

data-driven learning 214 textual formulae 121
derivation 12, 13, 16, 19 free combination 100, 101, 123
developmental factor 4, 125, 181, 183, FROWN corpus 47
197, 216 functional-product approach 82
directives functional syllabus 81–2, 83
see imperatives function words 10, 45, 102, 143
discipline 3, 18, 25, 26, 27, 33, 55, 211, fuzzy vocabulary categories 13–17
213, 214, 215, 235n. 2
see also knowledge domain General Service List of English Words
discourse marker 88, 126, 138 (GSL) 10–11, 12, 14, 15, 16, 18,
cataphoric marker 90, 91 27, 44, 59, 60
endophoric marker 91, 92, 93, 98, general service word 16, 20, 63, 212
99, 119 genre 2, 73, 74, 75, 93, 94, 95, 102, 103,
engagement marker 92, 93 130, 131, 132, 145
discourse-organizing vocabulary 9, 23 global keywords 48
dispersion see distribution grammatical cohesion see cohesion
distribution 29, 45, 50–3, 55, 60, 78, 93, graphemic words 40
94, 95, 103, 132, 135, 143
ditto-tag 38, 40, 44, 59 hedge 2, 157
document-level burstiness 236n. 8 high-frequency word 5, 10, 14, 15, 20,
27, 28, 37, 45, 60, 212
EAP material design 221 homographs 25, 37
EAP teaching 26, 213, 214
endophoric markers 91, 93, 98, idiom 3, 23, 44, 71, 84, 119, 123, 237n. 7
99, 119 illocutionary nouns 23
English as a Second Language imperatives
(ESL) 33, 148, 180, 204 in academic writing 93, 107
English for Academic Purposes as directives with rhetorical
(EAP) 15, 26, 27, 62, 85 purpose 92
English for Specific Purposes (ESP) 9, first person plural 136, 137, 188,
217 189–90, 191, 192, 204
epistemic modifiers 147 second person 91–2, 98
evenness of distribution see distribution IMS Open Corpus Workbench 75
and Juilland’s D statistical International Corpus of Learner
coefficient English (ICLE) 4, 5, 65, 67–9, 71,
exemplifiers 85–8 72–3, 75, 78, 84, 86, 125, 236n. 3
in BNC-AC-HUM 88–108
learners’ use of 125–42, 189–91 Juilland’s D statistical coefficient 50–3
‘extended units of meaning’ 118
keyness 4, 45, 46–8, 55, 62, 159, 212
fiction 46, 47 keyword 30, 46–8, 47, 55, 61, 62, 86, 159
field approach 82 global keyword 48
fixed phrase 23–4, 121 local keyword 48
FLOB corpus 47 negative keyword 47, 86
formulae 12, 47, 61, 84 positive keyword 47, 86
264 Subject index
keyword analysis see keyness non-technical meaning 18, 19

knowledge domain 31, 32 technical meaning 18, 19, 52
mental process nouns 23
L1 frequency 185, 190, 218, 239n2 mental verbs 59
L1-induced factor see transfer metadiscourse 24, 90, 93, 99, 118, 161
L1 influence 182–92 metalinguistic labels 23, 59
Jarvis’s unified framework 182–4 Michigan Corpus of Upper-level
labeling 22–4, 73, 173, 203 Student Papers 217
advance labeling 96, 97 Micro-Concord Corpus Collection
retrospective labeling 22, 96, 98 B (MC) 31, 32
labels 22–3 monolingual learners’ dictionary
semantic misuse 172–3 (MLD) 206–7
language-activity nouns 23 morphosyntactic annotation 34–5
learners’ dictionary 206–7 multiword expression 37, 44, 53, 59, 60,
lexical bundle 69, 118, 177 83, 88, 121, 184, 185, 197
lexical cohesion see cohesion
lexical extension 213 native control corpus 70
lexical priming see priming native speaker norm, corpus-
lexical repertoire 3, 4, 5, 9, 125, approximation to 70, 71
142–50, 192–3 native student writing 72
lexical teddy bear 147 negative keywords 86
lexical transfer see transfer n-gram 69
lexico-grammar 26, 71, 85, 105, 118, non-technical term 17, 18
123, 137, 138, 161, 164, 186, 193, non-technical words 18–19, 24
197, 214, 215 nouns 22–3, 108, 138
lexico-grammatical error 155, 197, in the Academic Keyword List 56
215 adjectives as co-occurrents of
linking word 3, 84, 121, 160, 177, 179, academic 100, 133, 167
180, 181, 204 verbs as co-occurrents of
LOB corpus 47 academic 95–9, 134, 137, 162–3
local keywords 48 novice writing 1, 4, 31, 65, 85, 95–6,
logico-semantic relationship verbs 59 152, 190, 194, 195–6, 197, 206,
log-likelihood 47, 48, 62, 76, 78, 125 215, 217
log-likelihood calculator, UCREL 125 nuclear vocabulary 9, 10, 14, 20
LONGDALE project 239n. 1 nuclear words and pragmatic
Longman Dictionary of Contemporary neutrality 14
English 206
Louvain Corpus of Native Speaker organizational function see rhetorical
Essays (LOCNESS) 32, 194, 195 function
overuse 86, 126, 129, 130, 140, 143,
Macmillan English Dictionary for Advanced 144, 145–6, 148, 150, 151, 152,
Learners 7, 201, 207, 238n. 6 155, 156, 157, 158, 159, 160, 194,
meaning 10, 13, 14, 18–19, 20, 25, 35, 195, 201, 237n. 5
49, 52, 100–1, 102, 118, 185 Oxford English Dictionary (OED) 20
delexical meaning 118
figurative meaning 119 paraphrasing and clarifying see
over-extension 146, 170, 184 reformulation markers
Subject index 265
parsing see syntactic annotation range 1, 4, 11, 12, 13, 14, 30, 45, 48–50,
part-of-speech (POS) tagging 34–5, 37–8 62, 212
see also annotation Range corpus analysis program 44
pedagogic mediation 61, 82, 212 reception 1, 15, 17, 73
Perl program 48 reference corpus 46, 47, 62, 73, 135
personal metadiscourse 161 referential phrasemes 84, 119
personal pronoun 98, 136, 138, 161, 164 reformulation markers 87, 108, 139, 209
phraseme 83, 84, 93, 94, 106, 118, 119, in BNC-AC-HUM 109
121, 137, 138, 148, 164, 185, 211, register awareness 5, 125, 132, 142,
214 150–2, 193, 208, 215
communicative phraseme 84, 122, 157 reporting verbs 59
mono-lexemic phraseme 88, 90, 106, retrospective labelling 22
160, 161 rhemes 97, 98, 121, 135
referential phraseme 84, 119 rhetorical function 5, 7, 9, 10, 20, 22,
structural phraseme 121 24, 26, 27, 60, 61, 63, 81, 125, 141,
textual phraseme 84, 90, 94, 95, 118, 142, 148, 150, 151, 155, 161, 184,
120, 121, 123, 161 188, 190, 192–3, 197, 202, 203,
phraseological accent 83 207, 213, 214, 215
phraseological analysis 83–4, 90, 93, rhetorical overstatement 176
123, 213 Robert & Collins CD-Rom 204, 205
phraseological ‘cascade’ 161, 188
phraseological competence 217 semantic annotation 35
phraseological infelicity 155 semantic misuse 5, 139–40, 145,
phraseology of rhetorical functions 65, 168–74, 170, 172, 193, 201
76, 78, 81, 102, 108, 109, 110–11, semantic tagging 43
112–14, 115–17, 119–20, 121, 123, semantic transfer see transfer
132, 154, 166, 213, 217 semi-technical vocabulary 17
frequency-based approach to 122 sentence connectors 22
positive keywords 86 sentence stem 97, 99, 106, 114, 118,
potential academic words 29, 44–55 121, 122, 135
preferred co-occurrence 2, 160, 192, 193 specialised non-technical lexis 17, 18
preferred ways of saying things 83, 123, speech 2, 62, 71, 84, 95, 131, 136, 145,
166, 193 151, 152, 153, 177, 190, 195, 197,
preposition 97, 101, 108, 143 213
complex 40, 41, 84, 90, 120, 139, 144 speech-like lexical item 151–2, 153,
priming 192, 197, 203, 216, 217 195
mental 192 spoken frequency counts 145
transfer of 192, 203 Student Writing Corpus 32, 33
procedural vocabulary 22, 211 sub-technical vocabulary 17–21, 21, 22,
production 1, 4, 9, 15–16, 33, 68, 69, 24
70, 142, 155, 212 syntactic annotation 35
pronoun 23, 101
demonstrative 96 tagging see annotation
as exemplified item 97 teaching material 82, 85, 148, 160, 169,
impersonal 168 178, 180, 206, 238n. 1
personal 98, 136, 138, 161, 164 technical terms 3, 9, 13, 14, 18
third person 157 technical vocabulary 13, 17–21
266 Subject index
text coverage 10, 11, 15, 16, 236n. 4 Varieties of English for Specific
text nouns 23 Purposes dAtabase (VESPA) 217
textual formulae 121 verbs 24, 26–7, 59, 91, 118–20, 136, 157–8
textual phrasemes 84, 93, 94, 118, 119, activity 59
121, 123, 160, 161 co-occurrents
textual sentence stems 97, 121, 135 of academic nouns 95–9, 134, 137,
tokenisation 38 162–3
transfer 4, 168, 171, 181, 190, 191, 194, forming rhemes with noun 98
197, 203, 216, 218 lexical 36
lexical transfer 216 linking 59
transfer effects 182–5 mental 59
transfer of form 185 potential academic 57
transfer of form/meaning reporting 59
mapping 185, 216 in sentence-initial infinitive
transfer of L1 frequency 185, 190–1 clauses 138
transfer of meaning 185 Vocabulary 3 items 22
transfer of the phraseological
environment 185, 197 Web Vocab Profile 59, 60
transfer of primings 192, 197, 203, within-document burstiness 236n. 8
216 Wmatrix 36–7, 53
transfer of style and register 185, word families 12, 16, 17, 45
188, 192 in AWL 12, 16–17, 17
transfer of training 144, 182, 194, 201–3 in GSL 11
transfer of use 185 word form 12, 17, 34, 36, 39, 102, 157
transfer-related factor see transfer word list 2, 16, 27, 40, 46
typicality 106, 107 in the Academic Keyword List 57
Word Smith Tools 2, 47, 48, 49, 51, 69
underuse 86, 126, 130, 131, 135, 137, Concord tool 69
143, 144, 145, 146, 147, 148, 149, Detailed Consistency Analysis 51
155, 156, 157, 158, 159 Keywords option 155
underused words see negative keywords WordList option 49
University Word List 16
USAS (UCREL Semantic Analysis ‘you-know-it-when-you-see-it’
System) 37, 42–4 syndrome 182

(Corpus and Discourse) Magali Paquot-Academic Vocabulary in Learner Writing - From Extraction To Analysis-Continuum PDF

Încărcat de

Informații document

Titlu original

Drepturi de autor

Formate disponibile

Partajați acest document

Partajați sau inserați document

Opțiuni de partajare

Vi se pare util acest document?

Este necorespunzător acest conținut?

Drepturi de autor:

Formate disponibile

(Corpus and Discourse) Magali Paquot-Academic Vocabulary in Learner Writing - From Extraction To Analysis-Continuum PDF

Încărcat de

Drepturi de autor:

Formate disponibile

Academic Vocabulary in

Editorial Board: Paul Baker (Lancaster), Frantisek Čermák (Prague), Susan

Corpus linguistics provides the methodology to extract meaning from texts.

Research in Corpus and Discourse

Corpus-Based Approaches to English Language Teaching

Corpus Linguistics and World Englishes

Evaluation and Stance in War News

Historical Corpus Stylistics

Idioms and Collocations

Working with Spanish Corpora

Studies in Corpus and Discourse

Corpus Linguistics and The Study of Literature

English Collocation Studies

Text, Discourse, and Corpora. Theory and Analysis

© Magali Paquot 2010

All rights reserved. No part of this publication may be reproduced or

British Library Cataloguing-in-Publication Data

ISBN: 978-1-4411-3036-5 (hardcover)

Library of Congress Cataloging-in-Publication Data

Typeset by Newgen Imaging Systems Pvt Ltd, Chennai, India

Part I: Academic vocabulary

Chapter 2 A data-driven approach to the selection of

Part II: Learners’ use of academic vocabulary

Chapter 4 Rhetorical functions in expert academic writing 81

Chapter 5 Academic vocabulary in the International

Part III: Pedagogical implications and conclusions

Chapter 7 General Conclusion 211

Appendix 1: Expressing cause and effect 219

On a more personal note, I would like to express my deepest thanks to my

AKL Academic Keyword List (my own list)

LOCNESS Louvain Corpus of Native Speaker Essays

Figure 1.1: The relationship between academic and sub-technical

Figure 2.1: A three-layered sieve to extract potential

Figure 3.1: ICLE task and learner variables (Granger et al.,

Figure 4.1: Exemplification in the BNC-AC-HUM 89

Figure 5.1: Exemplifiers in the ICLE and the BNC-AC-HUM 127

Figure 5.4: Distribution of the adverbials ‘for example’ and

Figure 6.1: Connectives: contrast and concession

Table 1.1: Composition of the Academic Corpus

Table 2.1: The corpora of professional academic writing 31

Table 3.1: Breakdown of ICLE essays 69

Table 4.1: Ways of expressing exemplification found in the

Table 5.1: A comparison of exemplifiers based on the total

Table 5.22: Adjective co-occurrents of the noun conclusion

Table 6.1: Le Robert & Collins CD-Rom (2003–2004):

occurrence of nouns, nominalizations, noun phrases with modifiers,

are used to refer to activities which are characteristic of academic discourse,

‘Academic vocabulary’ is a term that is widely used in textbooks on English

What is academic vocabulary?

Academic vocabulary is in fashion, as witnessed by the increasing number

1.1. Academic vocabulary vs. core vocabulary

Numerous second language acquisition studies have investigated whether

becomes sufficient for adequate reading comprehension. Laufer (1989;

1.1.1. Core vocabulary

1. Nuclear words have a ‘purely conceptual, cognitive, logical or proposi-

1.1.2. Academic vocabulary

Table 1.1 Composition of the Academic Corpus (Coxhead 2000: 220)

Running words Texts Subject areas

Arts 883,214 122 education; history; psychology; politics; psychol-

1. Specialized occurrence: a word family could not be in the first 2,000

1.1.3. Technical terms

1.1.4. Fuzzy vocabulary categories

composite lessons aiming at teaching reading and speaking simultane-