
Using Corpora in the L2 Classroom
2014 CoTESOL Annual Fall Convention
Angela Sharpe
Moriah Kent
Tatiana Nekrasova-Beker

Introduction
Objectives of the demonstration:
1) To introduce readily available corpora and ways in which you can
use corpora as resources for authentic language in order to inform
activities and materials in the L2 classroom.
2) To provide a model and examples of in-class activities and tasks
which utilize corpora.
3) To discuss ways in which corpus data can be used as a tool to
analyze materials in order to improve learning opportunities in the L2
classroom.

What is a corpus?

A corpus is a large, principled collection of naturally occurring texts (written
or spoken) stored electronically (Reppen, 2010).
Examples of naturally occurring texts:
Textbooks
Magazine, newspaper, and scholarly articles
Conversations between friends
Call center phone recordings
University office hours
Dissertations
Letters
Class assignments
Meetings

Readily available corpora:


The Corpus of Contemporary American English (COCA) - 450 million words
Google Books: American English
TIME Magazine Corpus
Michigan Corpus of Academic Spoken English (MICASE) - 1.7 million words
Michigan Corpus of Upper-Level Student Papers (MICUSP) - 2.6 million words
Open American National Corpus (OANC) - 15 million words
Brown Corpus - 1 million words
Corpus of Spoken Professional American English (CSPAE) - 2 million words
British National Corpus - 100 million words

When/How/Why to use a corpus?


1. A language-related question.
2. A representative corpus of language.
3. Pedagogically sound principles in accessing and applying corpus data.

Language features a corpus can be used to analyze:

Analyzing lexicography:
What are the meanings associated with a particular word?
What is the frequency of a word relative to other related words?
What non-linguistic association patterns does a particular word have
(e.g., with registers, historical periods, dialects)?
What words commonly co-occur with a particular word, and what is the
distribution of these collocational sequences across registers?
How are the meanings and uses of a word distributed?
How are seemingly synonymous words used and distributed in different
ways?
(Biber et al., 1988)
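Two of the questions above (relative frequency and common co-occurrence) can be sketched in a few lines of Python. This is only an illustration on an invented three-sentence sample, not a real corpus; the function names `tokenize` and `collocates` are hypothetical, and the tokenizer ignores sentence boundaries for simplicity.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens of each hit for `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts

# Invented sample text (an assumption for illustration, not corpus data).
sample = (
    "She made a strong argument. He drinks strong coffee. "
    "They made a powerful argument about strong coffee."
)
tokens = tokenize(sample)
freq = Counter(tokens)

# Frequency of a word relative to a near-synonym.
print(freq["strong"], freq["powerful"])

# Words that co-occur immediately before or after "strong".
strong_collocates = collocates(tokens, "strong", window=1)
print(strong_collocates.most_common(3))
```

With a real corpus, the same counts would be computed separately per register to compare distributions.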

What else can you do?


Beyond focusing on individual linguistic features, a
corpus can be used to draw learners' attention to
characteristics and organization according to genre, text
type, or register.
Take learners beyond the word level by introducing
them to authentic formulaic language sequences in
context according to register and genre.

Concordances and concordancers


Language in a corpus is displayed in concordance lines.
Using a concordancer program, you can retrieve pieces of
language (e.g., a target word or sequence) from a corpus.
Available concordance software:
- AntConc, MonoConc, PowerConc, Lextutor
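To make the idea of a concordance line concrete, here is a minimal keyword-in-context (KWIC) sketch in Python. It is not how any of the programs listed above is implemented; the function name `kwic`, the `width` parameter, and the sample text are assumptions for illustration.

```python
import re

def kwic(text, target, width=30):
    """Return keyword-in-context lines for each whole-word match of `target`."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Pad both sides so the target lines up in a single column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Invented sample text (an assumption, not taken from a real corpus).
sample_text = (
    "Corpora such as COCA are freely searchable. "
    "Learners notice patterns such as collocations, rather than isolated words."
)
concordance = kwic(sample_text, "such as", width=20)
for line in concordance:
    print(line)
```

Aligning the target in one column is what lets learners scan the left and right contexts for recurring patterns.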

MICASE

Example query in MICASE: "English Language Learners"

MICUSP


Classroom Concordancing
What is classroom concordancing?
A teaching approach in which concordance data are
used in the language classroom to help learners
notice and practice language patterns and use. This
teaching approach is sometimes referred to as
Data-driven Learning (DDL). Learning is driven by
authentic language data, presented in the form of
concordance lines, where learners act as linguistic
detectives to infer meaning from context and induce
linguistic patterns.
(Johns, 1988; 1991a, b)

Types of Concordance-based Tasks:

Teacher-centered
The teacher selects words or phrases to be investigated, usually taken
from observations or information presented in the course text.
The teacher retrieves and selects concordance lines, and designs
concordance-based tasks with different degrees of control.
The teacher provides clues and hints to help learners complete
concordance tasks, or guides learners to a generalization or conclusion.

Collaborative
The teacher and the learners agree on the language to be studied.
The teacher and the learners browse the corpus and examine the
language data together.
The teacher comments on and helps refine the learners' generalizations.

Learner-centered
The learners form their own questions.
The learners browse the corpus independently. There is no structured
or controlled task.
There is very little interference from the teacher in the
generalization process.

(adapted from Sripicharn, 2003)

Published Corpus-Informed Materials

-Touchstone
-Basic Vocabulary in Use
-Focus on Grammar
-Natural Grammar
-English Idioms in Use
-Real Grammar
(see Sharpe, 2014)

Using Corpora in the Classroom: Why?
Help students increase their knowledge of formulaic
sequences.
Provide students and teachers with a tangible resource.
Help increase their fluency and native-like language
production at the academic level of discourse.
Important because ELLs often overgeneralize or avoid
formulaic language.

How/What of Formulaic Language

Formulaic language:
is associated with particular pragmatic functions (Nattinger & DeCarrico,
1992, p. 179).
is a key to native-like fluency (Simpson, 2004, p. 37).
is stored and retrieved whole from memory at the time of use
(Wray, 2000, p. 465).
provides short-cuts to [increase] production and fluency (Wray, 2000, p. 475).

With corpus and storage, the main attribute is frequency: if sequences
are frequent, they are more salient in the language and
therefore might be stored more holistically.
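Because frequency is the attribute of interest, one simple way to surface candidate formulaic sequences is to count recurring n-word strings. The sketch below does this on an invented two-sentence sample; the names `ngrams` and `frequent_sequences` and the `min_freq` threshold are assumptions, and real studies typically add criteria such as range across texts.

```python
from collections import Counter
import re

def ngrams(tokens, n):
    """Return all contiguous n-word sequences as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequent_sequences(text, n=3, min_freq=2):
    """Return (sequence, count) pairs occurring at least `min_freq` times."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(ngrams(tokens, n))
    return [(seq, c) for seq, c in counts.most_common() if c >= min_freq]

# Invented sample text (an assumption for illustration only).
sample = (
    "As well as reading, learners write. A number of tasks, as well as "
    "a number of readings, are assigned as well as discussed."
)
bundles = frequent_sequences(sample, n=3, min_freq=2)
print(bundles)
```

On this sample only two three-word sequences recur, which mirrors how a frequency cutoff filters a large list of candidates down to a teachable set.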

What can you do with corpora? Model Task
Extract the most common formulaic sequences used in academic writing
from Martinez's PHRASE List.
Martinez, R. (2011). The development of a corpus-informed list of formulaic sequences
for language pedagogy (Doctoral dissertation, University of Nottingham). Retrieved
from
http://www.academia.edu/download/31080970/Ron_Martinez_2011_Doctoral_Thesis_University_of_Nottingham_Chapter_5_Development_of_the_PHRASE_Test.pdf
OR http://www.norbertschmitt.co.uk/resources.html

Assign formulaic sequences to levels of instruction (Basic, Intermediate,
Advanced).

Breakdown of Bundles for each Level of Class

Class              Word Frequency List   Approximate number of phrases taught per class period
Basic 1            1-2K                  1-2 bundles/class
Basic 2            2K                    1-2 bundles/class
Intermediate 100   3K                    2 bundles/class
Intermediate 200   4K                    1-2 bundles/class
Advanced 300       4-5K                  2 bundles/class
Advanced 400       5K                    1 bundle/class and review

Teacher Phrase List Examples

Basic 1 Excerpt:
Frequency   Lexical Bundle
1K          SUCH AS
1K          RATHER THAN
1K          AS WELL AS
1K          (BE) LIKELY TO
1K          A NUMBER OF

Intermediate 200 Excerpt:
Frequency   Lexical Bundle
4K          In accordance with
4K          At times
4K          In effect
4K          All the way
4K          In advance
4K          On the part of
4K          So as to
4K          Take advantage
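The step of assigning sequences to class levels can be sketched as a lookup from frequency band to level. The phrase entries below come from the two excerpts on this slide, and the band ranges follow the breakdown by class level; the data structures and the function name `phrases_for_class` are hypothetical, and treating each range as inclusive is an assumption.

```python
# Excerpt entries from the slides: (frequency band in thousands, phrase).
PHRASE_LIST = [
    (1, "such as"), (1, "rather than"), (1, "as well as"),
    (1, "(be) likely to"), (1, "a number of"),
    (4, "in accordance with"), (4, "at times"), (4, "in effect"),
    (4, "all the way"), (4, "in advance"), (4, "on the part of"),
    (4, "so as to"), (4, "take advantage"),
]

# Class level -> inclusive band range (assumed reading of the breakdown).
CLASS_BANDS = {
    "Basic 1": (1, 2), "Basic 2": (2, 2),
    "Intermediate 100": (3, 3), "Intermediate 200": (4, 4),
    "Advanced 300": (4, 5), "Advanced 400": (5, 5),
}

def phrases_for_class(level):
    """Return the phrases whose frequency band falls in the level's range."""
    low, high = CLASS_BANDS[level]
    return [phrase for band, phrase in PHRASE_LIST if low <= band <= high]

print(phrases_for_class("Basic 1"))
print(phrases_for_class("Intermediate 200"))
```

Because some band ranges overlap (for example, 4K appears under both Intermediate 200 and Advanced 300), a teacher would still need to decide which overlapping phrases to teach at which level.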

Example Worksheet and Student Booklets:

Group Discussion
What are some ways that corpora could
be used in your classroom?
What challenges do you foresee with
using corpora in the classroom?
What subjects/research questions might
corpora be most appropriate for?

AntConc
http://www.laurenceanthony.net/antconc_index.html

AntConc can be used to:
generate concordance lines;
perform a keyword analysis;
extract frequent expressions in a text;
search for typical contexts in which a word can be used.
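As a rough illustration of what a keyword analysis computes, the sketch below scores words by log-likelihood, one commonly used keyness statistic, comparing a target text against a reference text. It is not a description of AntConc's internals or settings; the function names and both toy texts are assumptions.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Keyness of a word seen `a` times in a target corpus of `c` tokens
    and `b` times in a reference corpus of `d` tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the target corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_tokens, reference_tokens, top=5):
    """Rank words that are relatively more frequent in the target text."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in t.items():
        b = r[word]
        if a / c > b / d:  # keep only positively key words
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]

# Toy texts invented for illustration; real analyses need far larger corpora.
target = "the beam supports the load and the beam resists stress".split()
reference = "the cat sat on the mat and the dog sat on the rug".split()
top_keywords = keywords(target, reference, top=3)
print(top_keywords)
```

Function words such as "the" drop out because they are about equally frequent in both texts, while topic-specific words rise to the top.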


Range
http://www.victoria.ac.nz/lals/about/staff/paul-nation
Range can be used to:
compare vocabulary in a text (or several
texts) against several baseword lists
scan texts to provide data on frequency
and range of vocabulary.
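The kind of comparison described above can be approximated with a short Python sketch that assigns each token to the first word list containing it. This is not the Range program itself, which works with word families and much larger lists; the function name `profile`, the tiny lists, and the sample sentence are all invented for illustration.

```python
import re
from collections import Counter

def profile(text, wordlists):
    """Assign each token to the first word list containing it, else 'Off-list'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for tok in tokens:
        for name, words in wordlists.items():
            if tok in words:
                counts[name] += 1
                break
        else:
            # No list contained the token.
            counts["Off-list"] += 1
    return counts, len(tokens)

# Tiny hypothetical lists for illustration only; not the real GSL or AWL.
toy_lists = {
    "1K": {"the", "of", "a", "is", "and", "by"},
    "AWL": {"analysis", "method", "data"},
}
counts, total = profile(
    "The analysis of the data is done by a torque method.", toy_lists
)
for name, n in counts.items():
    print(name, n, f"{100 * n / total:.2f}%")
```

The off-list tokens are often the most useful output for a teacher, since they point to discipline-specific vocabulary the general lists miss.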


Case study (Range)

Three chapters of the textbook used in an
introductory Engineering course (Eide,
Jenison, Northup, & Mickelson, 2012).
Range was used to examine:
the distribution of words in the text that
belonged to the 1K, 2K, and AWL lists (wordlist 1);
the distribution of words in the text that
belonged to the BEL (Ward, 2009) (wordlist 2).

Results (Range)
Results of the Range Analysis

Wordlists     Sub-lists   Tokens   % Coverage   Cumulative %
(1) GSL/AWL   1K          6,543    45.42        45.42
              2K          2,197    15.25        60.67
              AWL         3,424    23.77        84.44
              Off-list    2,240    15.55        99.99
(2) BEL       On-list     2,636    18.07        18.07
              Off-list    11,951   81.93        100.00
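The coverage figures can be recomputed from the reported token counts. The sketch below assumes the cumulative column was produced by summing the rounded percentages, which would explain why the first analysis ends at 99.99 rather than 100.00; the function name `coverage_rows` is hypothetical.

```python
def coverage_rows(token_counts):
    """Return (sub-list, tokens, % coverage, cumulative %) rows.

    The cumulative column sums the rounded percentages; that is an
    assumption about how the reported figures were produced.
    """
    total = sum(count for _, count in token_counts)
    rows, cumulative = [], 0.0
    for name, count in token_counts:
        pct = round(100 * count / total, 2)
        cumulative = round(cumulative + pct, 2)
        rows.append((name, count, pct, cumulative))
    return rows

# Token counts as reported in the results.
gsl_rows = coverage_rows(
    [("1K", 6543), ("2K", 2197), ("AWL", 3424), ("Off-list", 2240)]
)
bel_rows = coverage_rows([("On-list", 2636), ("Off-list", 11951)])

for row in gsl_rows + bel_rows:
    print(row)
```

Note that the token totals implied by the two analyses differ (14,404 versus 14,587); the slides do not say why, so the sketch simply treats each analysis separately.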

Pedagogical implications
To examine opportunities for students to
encounter and learn less frequent and
discipline-specific vocabulary.
To generate key vocabulary that is
genre- and discipline-specific.
To investigate the distribution of target
vocabulary in a range of pedagogical
materials covered in a course (textbook,
handouts, tests and exams, lectures).

Join the CSU TEFL/TESL Student Association to welcome
Diane Larsen-Freeman!
When: Friday, April 24th 2015, 5:00 p.m.
Where: Location TBD (stay tuned!), but on
the CSU campus

Thank you!
Angela Sharpe:
absharpe@rams.colostate.edu

Moriah Kent:
moriahkent@gmail.com

Tatiana Nekrasova-Beker:
t.nekrasova_beker@colostate.edu
