
ELT Journal Advance Access published May 5, 2011

Key concepts in ELT

Corpus-aided language learning


Li-Shih Huang
A corpus is a large collection or database of machine-readable texts involving natural discourse in diverse contexts (Bernardini 2000). Such discourses can be spoken, written, computer-mediated, spontaneous, or scripted and may represent a variety of genres (for example everyday conversations, lectures, seminars, meetings, radio and television programmes, and essays). Some readily available corpora include the British National Corpus (BNC, http://www.natcorp.ox.ac.uk), which contains 100 million words from written and spoken language in a variety of contexts, the Michigan Corpus of Academic Spoken English (MICASE, http://micase.elicorpora.info), which features 1.8 million words of speech in various academic contexts, and the Corpus of Contemporary American English (COCA), with 410 million words (http://www.americancorpus.org).¹ Although corpus linguistics (i.e. computer-assisted analysis techniques for studying texts) is a young specialization, its usefulness in teaching and learning has received growing attention and recognition (for example Hunston 2002; Sinclair 2004; Conrad 2005; O'Keeffe, McCarthy, and Carter 2007; Bennett 2010; Reppen 2010). In particular, researchers have identified corpus data as resources that provide descriptive insights relevant to how people use language and as tools that enable students and instructors to analyse both how people use different language forms at various levels of formality and how language fulfills multiple speech functions across contexts. Corpus data suggest that individuals often do not use language as specified in grammar books and that word meanings vary across contexts and users (Biber and Reppen 2002). Over the past ten years, a growing number of studies have shown how learners can use corpus data to further their language learning (see Hunston op.cit.; Boulton 2010).
Numerous corpus linguists (for example Gavioli and Aston 2001) have pointed out that learning activities centred on analysing corpus data are consistent with current principles of language-learning theory, that is, students develop more autonomy when they receive guidance about how to observe language and make generalizations. Such activities promote noticing and grammatical consciousness raising (Schmidt 1990), which can enhance second language learning and development. Despite the growing interest in corpora and corpus-aided learning, however, many teachers believe that incorporating corpora into their teaching would be too technically challenging or time consuming (Boulton 2010). Yet, while some researchers have suggested substantial training is necessary (for example Estling Vannestål and Lindquist 2007), others have provided evidence that only a minimal amount of training is needed (for example Boulton 2008). Some have also recommended using paper-based materials generated from corpora as a viable alternative to accessing corpora via computers (Boulton 2010).

ELT Journal; doi:10.1093/elt/ccr031 © The Author 2011. Published by Oxford University Press; all rights reserved.

Downloaded from http://eltj.oxfordjournals.org/ by guest on April 24, 2013

A key pedagogical approach for using corpora in language teaching and learning is data-driven learning (DDL), which emerged in the mid-1980s. DDL was defined as 'the use in the classroom of computer-generated concordances to get students to explore the regularities of patterning in the target language, and the development of activities and exercises based on concordance output' (Johns and King 1991: iii). As Johns (1994: 297) stated, 'what distinguishes the DDL approach is the attempt to cut out the middleman as far as possible and to give direct access to the data so that the learner can take part in building up his or her own profiles of meaning and uses'. Furthermore, 'corpus data [offer] a unique resource for the stimulation of inductive learning strategies – in particular the strategies of perceiving similarities and differences and of hypothesis formation and testing' (ibid.).
By extension, the corpus-aided discovery learning (CADL) approach entails encouraging learners to take the role of language researchers by systematically engaging in discovery learning (Gavioli 2000) and in learning how to learn through observations, analyses, interpretations, and presentations of language-use patterns in corpus data. In the CADL approach, learning about language use is driven by a process of enquiry that works toward understanding or problem solving, and corpora are used as mediational tools (Vygotsky 1978) rather than as the basis for language teaching and learning. Furthermore, instructors adhering to the CADL approach play a critical role in facilitating or guiding the process of discovery, which depends on the learners' needs, stages of learning, and levels of proficiency. Researchers have generally agreed that corpus data enrich our understanding of language use and are an important resource for language teaching and learning.
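The computer-generated concordances on which DDL relies can be illustrated with a minimal key-word-in-context (KWIC) sketch. This is an illustration under stated assumptions rather than the tooling used in any of the studies cited here: the function names (`tokenize`, `concordance`, `format_kwic`) are hypothetical, the tokenizer is deliberately crude, and the sample sentences are invented stand-ins for a real corpus such as the BNC or MICASE.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit of keyword."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


def format_kwic(hits):
    """Align the keyword in a single column, as in a printed concordance."""
    pad = max((len(left) for left, _, _ in hits), default=0)
    return [f"{left:>{pad}}  {key}  {right}" for left, key, right in hits]


# Invented sample sentences standing in for a real corpus (assumption).
SAMPLE = (
    "It depends on the context. Success depends on careful planning. "
    "Whether we go depends on the weather."
)

if __name__ == "__main__":
    for line in format_kwic(concordance(tokenize(SAMPLE), "depends", width=2)):
        print(line)
```

Because the hits are aligned on the keyword, a learner can see that every occurrence of 'depends' in the sample is followed by 'on'. A pattern of this kind is what DDL activities ask students to notice and generalize from. The same output can be printed for paper-based use.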
The use of corpora in language teaching is not without controversies, however. Among the debates featured in Seidlhofer (2003), for example, some scholars have advocated using real examples only in the classroom (for example Sinclair 1997), while others, in contrast, wonder whether the discourse in corpora, taken out of its original context, can still be considered authentic, real, or natural, thereby questioning the efficacy of analysing displaced language that may not be relevant to learners' linguistic and sociocultural contexts. In response to Widdowson's (1998) remark that corpora may provide samples of genuine language produced by language users with real communication goals but do not necessarily guarantee that learners can participate in discourse in ways that lead to learning, researchers such as Gavioli and Aston (op.cit.) note that learners can still authenticate language samples by adopting an observer's role to critically analyse the data, which will raise their awareness of lexical, grammatical, and textual issues as they restructure their views about language use in real situations. Similarly, Carter (1998: 50–1) argues that while real English from corpora can be unrealistic for classroom instruction and thus modified language used in the classroom that is based on learners' needs and levels might be more pedagogically viable and realistic, learners should be provided with opportunities to develop a feel for the language through corpus data. The validity of analysing corpora to capture language use across seemingly limitless contexts or to describe the workings of real English around the world has also been questioned. Some scholars point out that communicative contexts are not restricted to native speaker discourse, and, as such, language teaching should not be based simply on descriptive facts generated from largely native speaker-oriented corpora (Prodromou 1996).² Despite these debates, technological advancements have undoubtedly enhanced language learners' and instructors' access to corpora, and the plethora of articles and books written for language-teaching researchers and practitioners published during the past five years suggest that attention to and interest in using corpora for teaching and learning purposes will continue for the foreseeable future.
Notes
1 For more examples, visit http://corpus.byu.edu and the International Corpus of English: http://icecorpora.net/ice.
2 The Vienna-Oxford International Corpus of English (VOICE) (http://www.univie.ac.at/voice) is one such corpus that collects English spoken by non-native language users in various contexts. VOICE comprises one million words of naturally occurring, non-scripted, face-to-face interactions by over 1,200 speakers with 50 different first languages.

References
Bennett, G. 2010. Using Corpora in the Language Learning Classroom. Ann Arbor, MI: Michigan University Press.
Bernardini, S. 2000. 'Systematising serendipity: proposals for concordancing large corpora with language learners' in L. Burnard and T. McEnery (eds.). Rethinking Language Pedagogy from a Corpus Perspective. Frankfurt am Main: Peter Lang.
Biber, D. and R. Reppen. 2002. 'What does frequency have to do with grammar teaching?' Studies in Second Language Acquisition 24: 199–208.
Boulton, A. 2008. 'Looking for empirical evidence for DDL at lower levels' in B. Lewandowska-Tomaszczyk (ed.). Corpus Linguistics, Computer Tools, and Applications: State of the Art. Frankfurt am Main: Peter Lang.
Boulton, A. 2010. 'Data-driven learning: taking the computer out of the equation'. Language Learning 60/3: 534–572.
Carter, R. 1998. 'Orders of reality: CANCODE, communications, and culture'. ELT Journal 52/1: 43–56.
Conrad, S. 2005. 'Corpus linguistics and L2 teaching' in E. Hinkel (ed.). Handbook of Research in Second Language Teaching and Learning. Mahwah, NJ: Lawrence Erlbaum Associates.
Estling Vannestål, M. and H. Lindquist. 2007. 'Learning English grammar with a corpus: experimenting with concordancing in a university grammar course'. ReCALL 19/3: 329–50.
Gavioli, L. 2000. 'The learner as researcher: introducing corpus concordancing in the classroom' in G. Aston (ed.). Learning with Corpora. Houston, TX: Athelstan/Bologna: CLUEB.
Gavioli, L. and G. Aston. 2001. 'Enriching reality: language corpora in language pedagogy'. ELT Journal 55/3: 238–46.
Hunston, S. 2002. Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
Johns, T. 1994. 'From printout to handout: grammar and vocabulary teaching in the context of data-driven learning' in T. Odlin (ed.). Perspectives on Pedagogical Grammar. Cambridge: Cambridge University Press.
Johns, T. and P. King (eds.). 1991. 'Classroom concordancing'. English Language Research Journal 4: 27–45.
O'Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom: Language Use and Language Teaching. Cambridge: Cambridge University Press.
Prodromou, L. 1996. 'Correspondence'. ELT Journal 50/1: 88–9.
Reppen, R. 2010. Using Corpora in the Language Classroom. Cambridge: Cambridge University Press.
Schmidt, R. 1990. 'The role of consciousness in second language learning'. Applied Linguistics 11/2: 129–58.
Seidlhofer, B. 2003. Controversies in Applied Linguistics. Oxford: Oxford University Press.
Sinclair, J. 1997. 'Corpus evidence in language description' in A. Wichmann, S. Fligelstone, T. McEnery, and G. Knowles (eds.). Teaching and Language Corpora. New York, NY: Longman.
Sinclair, J. 2004. How to Use Corpora in Language Teaching. Amsterdam: John Benjamins Publishing Company.
Vygotsky, L. S. 1978. Mind in Society: The Development of Higher Psychological Processes. Cambridge, MA: Harvard University Press.
Widdowson, H. G. 1998. 'Context, community and authentic language'. TESOL Quarterly 32/4: 705–16.

The author
Li-Shih Huang is an Associate Professor of Applied Linguistics and Learning and Teaching Centre Scholar-in-Residence at the University of Victoria, Canada. Her current research examines academic language learning needs and outcomes assessment, corpus-aided discovery learning, and learner strategies in language learning and language testing contexts.
Email: lshuang@uvic.ca
