
MANUSYA: Journal of Humanities Regular 16.1, 2013

A CORPUS-BASED STUDY OF THE STYLE IN JANE AUSTEN’S NOVELS¹

Raksangob Wijitsopon²

Abstract

While a corpus linguistic technique has been applied to various studies in text and discourse analysis, it has not been much adopted in stylistic analysis of literary texts. The present study, therefore, applies a corpus-driven approach to Jane Austen’s six major novels, in order to see how well this new method works with literary texts, compared with what has been observed in previous studies of Jane Austen’s language. It has been found that the corpus-driven approach can provide quite a few results that are useful in supporting and refining literary scholars’ intuitive observations on the author’s works. Some of the linguistic patterns derived from the comparative corpus-driven method have not been remarked on before in any previous studies and hence can serve as new textual evidence in the study of Jane Austen’s writing style. Despite such great potential for the study of style in literary works, it is suggested that the analyst’s knowledge and understanding of the text(s) under study is crucial in interpreting and evaluating those results because the corpus-driven approach to literary texts relies heavily on quantitative data.

¹ This study is sponsored by the TRF-CHE Research Grant for New Scholar and the Ratchadaphiseksomphot Endowment Fund of Chulalongkorn University (RES560530083-HS). I would like to express my gratitude to the Thailand Research Fund (TRF), the Commission on Higher Education (CHE), Ministry of Education, and Chulalongkorn University for their support.
² Assistant Professor, Department of English, Faculty of Arts, Chulalongkorn University

Introduction

The past few decades have seen a remarkable prominence in the application of corpora in English applied linguistic research. This also includes the area of text and discourse analysis. The corpus-based technique allows text and discourse analysts to expand the size of their data, which in turn enables them to generalize their findings to a larger extent than before. On the theoretical side, it enables text and discourse researchers to show that patterns of co-occurrence among words in texts are associated with different meanings and uses in the communicative events (Sinclair 2004). For example, Biber and Conrad (1999) showed that conversation and academic writing are markedly different from each other through their frequency analysis of phrases found in corpora of the two text types. Recent work in critical discourse analysis, whose main concern lies in the relationship between discourse, ideology and power, also incorporates the use of corpora in its analytical practice. Moon and Caldas-Coulthard (2010), for instance, found from their analysis of a British newspaper corpus that women are frequently described in terms of their physical appearance, as shown by a high frequency of such adjectives as “beautiful”, “pretty” and “lovely” used in collocation with references to women, whereas references to men are usually modified by adjectives related to importance, including “great”, “key” and “main”. This difference in the media’s discursive practice, it is argued, reflects
and simultaneously sustains the patriarchal ideological beliefs in British society.

While corpora have been adopted in the examination of a variety of text types, ranging from everyday conversation and newspaper reports to academic writing, little has been done on literary discourse. Given that our interpretation of a literary work relies particularly heavily on language in a text, it would be interesting to explore to what extent the use of corpora would enable us to investigate language in literary texts and its relationship with interpretative issues. To this end, I apply corpus techniques to the analysis of Jane Austen’s six major novels. The research questions the present research addresses are:

(1) What lexical patterns are characteristic of Jane Austen’s novels?

(2) What textual meanings do those patterns suggest?

(3) What are strengths and limitations of the corpus approach in the study of fictional prose?

In the sections that follow, I first give an overview of “corpus stylistics”, the theoretical framework in which this research is grounded. Then, an outline of previous studies on Jane Austen’s works and writing style is provided. This is followed by an account of the methodology adopted in this study, comprising explanations about corpus descriptive tools and data preparation. Then, the results of the study are reported, followed by a discussion on the strengths and limitations of the corpus-driven approach to literary texts, as observed from a corpus of Jane Austen’s six major novels.

Corpus stylistics: The theoretical framework of the study

The term “corpus stylistics” has recently been used by scholars in stylistics (e.g. Short and Semino 2004) or in corpus linguistics (e.g. Mahlberg 2007), to refer to the practice of linguistic analysis of literary texts, making use of a collection of electronic texts, sampled to be maximally representative of a writer’s works or a particular literary genre (see Biber 2011 for review of corpus stylistics research). While corpus linguistics is interested in describing normative uses of language, which can be inferred from repeated occurrences of linguistic patterns, stylistics pays particular attention to deviations from linguistic norms, which create particular textual meanings and aesthetic effects. Given the primary concern of each different discipline, corpus linguistics and stylistics appear to focus on opposing phenomena in the form-meaning relationship. However, in identifying deviant instances of language use in a literary text, stylisticians in effect draw on their observation or knowledge of what is normal in language use. Therefore, while linguistic deviation is central to stylistics, consideration of linguistic norms is in fact inherent to the practice of stylistic analysis.

It is through the concern about linguistic norms that stylistics and corpus linguistics come to converge. As Stubbs (2005: 21) notes, “[…] a text is a selection from the potential of the language […] Comparative corpus methods […] allow us to study how far texts consist of recurrent phrasal patterns which are widespread in the language as a whole.” Corpus stylistics thus involves an explicit comparison between a corpus of texts under investigation and linguistic “norms”,
represented by a corpus of the kind of texts that are “contextually related” (Enkvist 1973) to the text or group of texts under investigation.

Some corpus linguists (e.g. Tognini-Bonelli 2001, Mahlberg 2005) argue that a corpus-based approach can allow textual analysts to obtain quantitative data to test their hypothesis about textual features. On the other hand, an analyst can come to study the text without identifying what textual features are likely to mark the style of the work and keep their eyes open to the findings derived from a comparison between the text(s) in question and the appropriate reference corpus. These findings are then taken as textual patterns that are brought to analysts’ attention and deserve further investigation in terms of the semantic and pragmatic roles they have in our interpretation of the text(s).

In the present study, I take this corpus-driven stance in that stylistic features of Jane Austen’s novels are not first identified but, instead, the comparison is allowed to reveal lexical items and patterns that characterize Jane Austen’s novels. These patterns will be analyzed qualitatively in order to see how they contribute to meanings of the texts under study.

Given that Jane Austen’s six major novels are among canonical classic works in English literature, they were chosen to be the objects of study. As there have been numerous studies of Jane Austen’s works, a brief note on her novels, particularly on her language use, is provided in the next section so that corpus-informed findings can be assessed and discussed in relation to intuitive observations made in previous studies of Jane Austen.

A brief review of previous studies of Jane Austen’s novels

Jane Austen is one of the most renowned novelists in English literature. Her works usually present the story of a young lady who has some kind of limitations, including social standing, economic insecurity and even her own distorted understanding of the world. All of her heroines have to learn to overcome these limitations by developing self-understanding and sound judgment of people around them before they have a happy ending (McMaster 1996; Tanner 2007). Because Jane Austen’s works involve the portrayal of the emotional, intellectual and spiritual growth of the female protagonists, they have often been criticized, according to Sherry (1966), for being lacking in physical action and full of socializing activities, such as neighbor or relative visits, picnics, and parties. Advocates of her novels, however, argue that Jane Austen is great in her realistic description and biting commentary on domestic life during the Regency period of England.

Although Jane Austen’s works have been widely discussed in the study of English literature, little has been researched on the interplay between the author’s linguistic choices and the aesthetic value of her novels. For example, literary critics often discuss the author’s use of irony in her novels but rarely are the ironic statements explained as to how their ironic force is achieved (cf. e.g. Mudrick 1952). The few attempts that have been made to explain Jane Austen’s language use are mostly concerned with her word choices. Page (1972), for instance, observes that Jane Austen’s novels are full of abstract nouns. Booth (1991) also comments on the author’s preference for abstract nouns,
suggesting that they are used to delineate reliable characters from comic or superficial ones, whose idiolects tend to be filled with concrete nouns. A more detailed account of Jane Austen’s lexical choices is found in Stokes (1991), who argues that Jane Austen’s novels contain four major groups of words, namely those related to (1) spirit, e.g. “vivacity” and “ardour”, (2) manners, e.g. “civil” and “elegant”, (3) intelligence, e.g. “accomplishment” and “discernment” and (4) temperament, e.g. “amiable” and “disposition”, all of which are central to the development of the plots and themes of her novels, including those about judgment or the disparity between appearance and reality.

Given that Jane Austen’s novels have been discussed widely in literary studies and that some observations have been made on her language use, though intuitively, an examination of the language in her six major novels using a new approach like corpus stylistics might be a boon to both English literary studies and corpus linguistics: findings from the corpus-driven approach can be compared with what has been said in the previous studies, which in turn would help us evaluate how well the approach works in relation to literary discourse; on the other hand, critics’ careful observations of Jane Austen’s writing can be validated or refined by findings from a systematic corpus-stylistic approach, which involves both the quantitative and qualitative analysis of all her six major novels.

Methodology

1. Descriptive tools

There are three corpus linguistic descriptive tools that are used in this study to explore stylistic features and their textual functions in Jane Austen’s novels: “keyness”, “collocation” and “cluster”, each of which is explained in turn below.

(a) Keyness

To answer the research questions stated above, the concept of “keyword” in corpus linguistics is drawn upon as a starting point in the analytical procedures. According to Scott and Tribble (2006), keywords are lexical items of significance to a text in question, because of their “unusual frequency in comparison with a reference corpus of some suitable kind”. The “unusual frequency” here refers to both “unusually high” and “unusually low” frequency. For the purpose of the present study, only items with “unusually high” frequency are considered.

The phrase “unusually high” here suggests that keywords are not simply words of high frequency. In other words, keywords are not necessarily the most frequent words found in a text. Keywords are important to the text because they are used “unusually often” when compared with other texts. To illustrate, if we consider a corpus of Jane Austen’s novels alone, the definite article “the” is found to be the most frequent word in Jane Austen’s novels. However, can we immediately say that it marks Jane Austen’s writing style? Given that “the” is an article, it is likely to be used very often in any piece of writing, not just in Jane Austen’s novels. Therefore, we need to compare Jane Austen’s works with other authors’ so that we can see if she really used “the” significantly more than others. And when compared with other novelists’ writing, the article does not turn up as a keyword in Jane Austen’s works because it is also used frequently by other authors. On the
other hand, the word “very”, whose frequency is lower than that of “the”, appears in the keyword list of Jane Austen’s novels. This means that Jane Austen used the word “very” significantly more often than other writers. Therefore, as Baker (2006) puts it, a keyword list gives a measure of saliency, not just frequency, of the lexical items in a text and hence can suggest further examination of their textual functions. This is the reason why keywords are fundamental to the corpus-stylistic analysis of Jane Austen’s novels in the present study: through the Keyword function in Rayson’s (2007) Wmatrix Tools (see below), lexical items that are characteristic of Jane Austen’s novels are extracted, some of which will be further investigated in detail.

Based on the above principle, it can be seen that a keyword list is derived through a comparison between the text or corpus in question and reference corpora, with the size of each corpus and the frequencies of each word within them being cross-tabulated. Such statistical tests as the chi-square or log-likelihood tests are then employed to measure the degree of each word’s significance. In this study, as far as a statistical test is concerned, the statistical measure log-likelihood is applied for all the comparisons. This is because, according to Leech et al. (2001), while many statistical tests rely on an assumption of normal distribution of data, which is often not the case with linguistic data such as word frequencies, the log-likelihood measure does not.

According to Scott and Tribble (2006), three kinds of words usually come out of a comparison as keywords: (1) proper nouns, (2) words that “human beings would recognise” as key, which tend to indicate a text’s “aboutness”, and (3) words that are not usually identified consciously by readers as key but nonetheless occur in significantly high frequencies and so can be indicators of the style of a text, rather than of its content.

(b) Collocation

After keywords are extracted, their significance in the six major novels needs to be explained. To this end, the concept of collocation, defined by Hoey (1991: 6-7) as “the relationship a lexical item has with items that appear with greater than random probability in its (textual) context”, is drawn upon. This definition emphasizes that collocation of a word is not just a random co-occurrence of words, e.g. “she + is”, but the co-occurrence takes place in a text for some reason, as seen from the phrase “with greater than random probability in its (textual) context”. For example, as shown by Stubbs (2001: 28), common collocations of the word “seek” include “help”, “advice” and “support”. An examination of the collocational patterns of a word in a text can therefore allow us to see the relationship between lexical items in a text, which in turn enables us to see the way words are used to create meanings in a text. To find out what keywords are used “with greater than random probability” in Jane Austen’s novels, a computer-assisted extraction of collocates, through the statistical measure Mutual Information (MI)³, is adopted.

³ Mutual Information is a concept in statistics, often adopted in lexicography to measure the strength of collocations (see, for example, Church and Hanks 1990)

(c) Cluster

In this study, while a collocation refers to an individual lexical item that is found
through statistical measure to co-occur significantly with another lexical item, the term “cluster” is used to refer to a recurrent string of uninterrupted word forms, e.g. “you do not” and “I am sure that” (Scott 1999). Given these examples, clusters are thus phrasal constructions, which are combinations of lexis and grammar. According to Stubbs (2001), because clusters display both lexical and grammatical relationships among words, they play an important role in creating textual meanings. As far as the present study is concerned, the concept of “cluster” is of particular use in showing the way function words such as “could” and “must”, which turn up in the keyword list of Jane Austen’s novels, contribute to meanings in the author’s works. As individual words, it is hard to explain systematically what semantic and pragmatic contributions the modal verbs “could” and “must” have to Jane Austen’s major novels, because the verbs can be used for various purposes, including expressing probability, obligation, politeness or permission. However, if we consider their frequent clusters, e.g. “she could not say” and “you must have thought”, we are able to see more clearly in what ways these function words are important to the style of Jane Austen’s writing.

Clusters can be extracted automatically by the software WordSmith Tools (see below). To extract clusters from a text, we need to identify the length of a cluster. In the present study, I chose to look at three-word clusters, e.g. “I do not”, because, after experiments on various lengths of clusters, this is the optimal length of clusters since a manageable size of data can be generated for detailed concordance analysis.

2. Data, reference corpora and software

To answer the research questions with the above three descriptive tools, three corpora were compiled, one being the main corpus data and the other two being reference corpora used for a comparison with the main corpus data:

(1) the main corpus data: a corpus of Jane Austen’s six major novels (henceforth JA), with 729,488 tokens⁴;

(2) the first reference corpus: a corpus of modern fictional texts from the British National Corpus provided by the Wmatrix software (henceforth “BNC”);

(3) the second reference corpus: a corpus of British prose fiction published during the period 1780 – 1820 (henceforth 19CNov), with 2,027,118 tokens.

⁴ The word “token” used here is a term in corpus linguistics, referring to an occurrence of any given word form.

The latter two corpora serve as the reference corpora in this project, with which JA was compared so that key lexical items in JA can be extracted. BNC is meant to represent linguistic patterns in modern British fictional prose, with which modern readers are familiar and is therefore likely to affect readers’ interpretation of Jane Austen’s novels, while 19CNov is meant to represent the British English the author drew upon when writing her novels. In other words, BNC represents language on the receptive side while 19CNov represents language on the productive side of the texts under investigation. Thus, a comparison between JA and BNC would show how the language in Jane Austen’s novels is
different from the kind of English with which modern readers are familiar. A comparison with 19CNov, on the other hand, would show in what ways linguistic patterns in JA are different from those in the novels contemporary to JA. The lists of keywords obtained from the comparison of JA with the two different reference corpora are compared. The words that are found to occur on both lists are deemed true keywords that mark Jane Austen’s writing style since they are found to be of statistical significance no matter what “linguistic norms” are considered, the tendency of present-day British English representing modern readers’ language or that of 19th Century British English.

The software tools used for the corpora comparison in this project are WordSmith Tools, developed by Scott (1999), and Wmatrix, developed by Rayson (2007). Both are software tools for corpus analysis and text comparison. The former is an integrated suite of programmes that enable us to examine how words behave in texts. Wmatrix provides a web interface to the semantic and grammatical corpus annotation tools, i.e. USAS and CLAWS⁵, respectively. Wmatrix users can upload their own corpus data to the system, so that it can be automatically annotated and viewed via the web browser. Wmatrix also extends the keyword method to key grammatical categories and key semantic fields, i.e. grammatical categories and semantic fields that are of significance to the text under investigation due to their unusual frequencies when compared with reference corpora.

⁵ USAS stands for UCREL Semantic Analysis System and CLAWS for the Constituent Likelihood Automatic Word-tagging System

An examination of the texts at three linguistic levels, i.e. lexical, grammatical and semantic, serves as a means of triangulation in the way we approach the texts, looking at different linguistic features of the texts, and also reduces the problem that may arise as a result of the focus on frequency and statistical value. That is, by focusing on keywords only, researchers may have to ignore words below the statistical cut-off point⁶ even though they are actually closely related to those above the cut-off point. For example, as will be seen below, the word “very” is ranked second in the keyword list of JA. However, the author also used other degree adverbs, such as “so” and “really”, but they do not turn up in the keyword list since their statistical value stands below the cut-off point set in the extraction. Nevertheless, they may be found as part of key grammatical categories if the “degree adverbs” category is designated as key. If not, we can infer that only the word “very” is of particular significance to Jane Austen’s writing since it was used more significantly even than its close synonyms. In other words, the extraction of key semantic fields and key grammatical categories would help confirm the significance of the individual keywords or shed light on the density of some words that may have been overlooked due to their relatively lower frequency as individual words.

⁶ The statistical cut-off point used here refers to the level at which the statistical value of a word is considered meaningful for data interpretation. For example, if a cut-off point is set at a statistical value of 0.05, words with values higher than 0.05 are considered important and should be chosen for detailed examination while those with values lower than 0.05 are not very important.
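
The keyness comparison described in this section, at both the word level and the category level, can be sketched in a few lines of Python. This is a minimal illustration only: the paper names the log-likelihood measure but does not print a formula, so the sketch assumes the standard two-corpus log-likelihood calculation commonly used in keyword tools. The function names, the word-to-category mapping and all counts are hypothetical and invented for the example; they are not figures from JA, BNC or 19CNov.

```python
import math
from collections import Counter


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-corpus log-likelihood score for a single item (assumed standard form)."""
    combined = freq_study + freq_ref
    total = size_study + size_ref
    # Expected frequencies if the item were equally common in both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * score


def is_overused(freq_study, size_study, freq_ref, size_ref):
    """True when the item is relatively more frequent in the study corpus."""
    return freq_study / size_study > freq_ref / size_ref


def category_counts(word_counts, word_to_category):
    """Sum word frequencies into category frequencies; untagged words are skipped."""
    totals = Counter()
    for word, count in word_counts.items():
        category = word_to_category.get(word)
        if category is not None:
            totals[category] += count
    return totals


# Invented toy counts, NOT figures from JA, BNC or 19CNov.
study_counts = {"very": 12, "so": 5, "really": 3, "the": 40}
ref_counts = {"very": 9, "so": 12, "really": 9, "the": 120}
tags = {"very": "degree adverb", "so": "degree adverb", "really": "degree adverb"}
study_size, ref_size = 1000, 3000

study_cats = category_counts(study_counts, tags)
ref_cats = category_counts(ref_counts, tags)
category_ll = log_likelihood(study_cats["degree adverb"], study_size,
                             ref_cats["degree adverb"], ref_size)
```

In the toy data, “the” has the same relative frequency in both corpora, so its score is zero however frequent it is, which mirrors the point made about “the” in the Keyness section. Summing counts by category before scoring is one plausible way to let lower-frequency items such as “so” and “really” contribute to a key category even when they fall below the cut-off individually.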

In this study, to extract key linguistic features, the cut-off point was set at the log-likelihood value of 200, which can be considered rather high. This is due to the fact that the two reference corpora are to a large extent different, one representing the language of present-day British English fictional prose and the other that of 19th century British English fiction. Consequently, lists of keywords derived from the comparison of JA with each reference corpus are likely to be remarkably different. Setting a high cut-off point means that the possibility of key linguistic features found on the lists occurring by chance is slim. Therefore, this would guarantee the keyness of the lexical items, grammatical categories and semantic fields that are found from the comparison. The items or categories that are found to turn up on both JA-BNC and JA-19CNov lists are considered true key items or categories in Jane Austen’s novels and hence further examined through an analysis of their collocations and clusters.

Result

Based on the setting spelled out in the previous section, keywords, key grammatical categories and key semantic fields in Jane Austen’s novels that are found from comparing JA with BNC and 19CNov are derived. However, as stated above, in this study only those that occur on both lists are considered significant style markers of Jane Austen’s novels. The lists of keywords, semantic fields and grammatical categories in JA are presented in Tables 1-3 below, starting from the item with the highest degree of keyness.

Table 1: Keywords in Jane Austen’s novels

Rank  Keyword
1     be
2     very
3     not
4     she
5     her
6     could
7     every
8     herself
9     must
10    such
11    any
12    been
13    however
14    sister
15    feelings
16    to
17    have
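
The selection rule stated above (keep only items that reach the cut-off of 200 against both reference corpora) can be sketched as follows. This is an illustration under stated assumptions, not the procedure as implemented in Wmatrix or WordSmith Tools: the function names are hypothetical, the scores are invented, the input dictionaries are assumed to hold log-likelihood scores for overused items only, and both the inclusive comparison (`>=`) and the ranking by the lower of the two scores are assumptions, since the paper does not say how the combined list is ordered.

```python
def key_items(ll_scores, cutoff=200.0):
    """Items whose log-likelihood score reaches the cut-off (inclusive, by assumption)."""
    return {item for item, score in ll_scores.items() if score >= cutoff}


def true_key_items(ll_vs_first, ll_vs_second, cutoff=200.0):
    """Items that are key against BOTH reference corpora.

    Ranking by the lower of the two scores is an assumption made for this sketch.
    """
    shared = key_items(ll_vs_first, cutoff) & key_items(ll_vs_second, cutoff)
    return sorted(
        shared,
        key=lambda item: min(ll_vs_first[item], ll_vs_second[item]),
        reverse=True,
    )


# Invented scores for illustration only, NOT results from the study.
ll_vs_modern = {"very": 950.0, "could": 610.0, "the": 12.0, "carriage": 480.0}
ll_vs_period = {"very": 720.0, "could": 305.0, "the": 3.0, "carriage": 40.0}

shared_keywords = true_key_items(ll_vs_modern, ll_vs_period)
```

In the invented data, “carriage” clears the cut-off against the modern reference list only, so it is dropped, which is the kind of period-specific item the two-reference design is meant to filter out.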

Table 2: Key semantic fields in Jane Austen’s novels

Rank  Key semantic field                      Sample words in the semantic field⁷
1     degree boosters                         very, so, much
2     likely                                  could, would, might
3     entire; maximum                         all, any, every
4     thought, belief                         think, felt, believe
5     kin                                     sister, father, mother
6     degree maximizers                       most, perfectly, entirely
7     content                                 pleasure, glad, satisfied
8     strong obligation or necessity          must, should, obliged
9     social actions, states and processes    manner, visit, conduct
10    respected                               respect, regard, esteem
11    expected                                hope, expected, anticipated
12    like                                    dear, like, affection

Table 3: Key grammatical categories in Jane Austen’s novels

Rank  Key grammatical category                            Sample words in the grammatical category⁸
1     degree adverb                                       very, so, much
2     be – infinitive                                     be
3     noun of title                                       Mr, Miss, Mrs
4     modal auxiliary                                     could, would, might
5     determiner capable of pronominal function           such
6     have – infinitive                                   have
7     been                                                been
8     to                                                  to
9     third-person singular objective personal pronoun    him, her

⁷ The three most frequent words in each semantic field are given as sample words.
⁸ If a grammatical category contains more than three items, e.g. the “degree adverb” group, the three most frequent words in each category are given as sample words. For those that contain three or fewer items, all of the items are put in the table.
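
The follow-up analysis of the key items in Tables 1-3 relies on the two other descriptive tools introduced in the Methodology section, collocation and cluster. A minimal sketch of both is given here. It assumes the simple Church and Hanks style of Mutual Information, with co-occurrences counted inside a fixed window around the node word; tools differ in how they adjust for window size, and the paper does not state which variant was used. The function names, the `span` and `min_freq` parameters and the token sequence are all hypothetical; the tokens are an invented string, not a quotation from the novels.

```python
import math
from collections import Counter


def three_word_clusters(tokens, min_freq=2):
    """Recurrent uninterrupted three-word strings and their frequencies."""
    counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return {" ".join(gram): n for gram, n in counts.items() if n >= min_freq}


def mutual_information(tokens, node, collocate, span=4):
    """MI = log2(f(node, collocate) * N / (f(node) * f(collocate))).

    Co-occurrences are counted within `span` tokens on either side of each
    occurrence of the node word. Returns -inf when the pair never co-occurs.
    """
    freq = Counter(tokens)
    joint = 0
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            joint += window.count(collocate)
    if joint == 0:
        return float("-inf")
    return math.log2(joint * len(tokens) / (freq[node] * freq[collocate]))


# Invented token sequence for illustration only.
tokens = "she could not say she could not think she must have thought".split()

clusters = three_word_clusters(tokens)
mi_could_not = mutual_information(tokens, "could", "not", span=1)
```

On a real corpus the cluster list would normally be filtered to strings containing a keyword of interest, such as “could” or “must”, before concordance lines are read.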

Interpretation of overall findings

It is observed that a number of the findings relate to or overlap with one another, either in the same categories or across different linguistic groups. To illustrate, the keywords “she”, “her” and “herself” are related in the sense that they are personal pronouns referring to women, or the keyword “very” is also part of the key grammatical category “degree adverb” and key semantic field “degree boosters”, the latter of which also expresses a similar concept to the key semantic field “degree maximizers”. Such correspondence suggests that these overlapping items or categories are especially characteristic of Jane Austen’s novels since they remain on the lists, whether we look at individual lexical items, grammatical categories or semantic fields in the novels. Such overlapping items and categories can be put together into groups, resulting in a total of six groups of key linguistic features that mark the style of Jane Austen’s novels. They are:

(1) words related to a high degree, which comprise the keywords “very”, “every” and “any”, the semantic fields “degree boosters”, “entire, maximum” and “degree maximizers” and the grammatical category “degree adverb”

(2) modal auxiliary verbs, which comprise the keywords “could” and “must”, the semantic fields “likely” and “strong obligation or necessity” and the key grammatical category “modal auxiliary”

(3) auxiliary BE and HAVE, which comprise the keywords “be”, “been” and “have”, the grammatical categories “be-infinitive”, “have-infinitive” and “been”

(4) words related to women, which comprise the keywords “she”, “her” and “herself” and the grammatical categories “noun of title” and “third-person singular objective personal pronoun”

(5) words related to family relationships, which comprise the keyword “sister” and the key semantic field “kin”

(6) words related to internal states of mind, which comprise the keyword “feelings” and the semantic fields “thought, belief”, “content”, “respected”, “expected” and “like”

Applying Scott and Tribble’s (2006) categorization of keywords (see above), these six groups of key linguistic features in Jane Austen’s novels can be divided into two main groups. The first group contains lexical items that are likely to be identified through human observation and suggest the content of the texts. This group comprises Groups (4), (5) and (6). The lexical items in these three groups match what has been discussed in literary criticism of Jane Austen’s novels and writing style. For instance, that her novels deal with women’s lives in Regency England can be represented by the keyness of words related to women (Group 4). Group (5), which consists of the keyword “sister” and the semantic field “kin”, corresponds to the point, noted by Sherry (1966) and Page (1972), that Jane Austen’s novels are primarily domestic. Group (6), which contains the keyword “feelings” and various semantic fields (see above), reflects the point critics often make that her novels feature characters’ thoughts and feelings, rather than physical actions or adventures. The fact that these corpus-informed sets of findings can be interpreted in close relation to what Jane Austen literary scholars and critics have
been talking about suggests that the corpus-driven keyword analysis conducted in this study can provide textual evidence for previous studies on Jane Austen. Because these three groups of linguistic markers of Jane Austen’s writing style have often been mentioned in literary studies of Jane Austen, the present study will not deal with them in detail.

The other group consists of lexical items, as Scott and Tribble (2006) state, that are not usually identified consciously by readers as important but occur in significantly high frequencies and so can be indicators of the style of a text, rather than of its content. Given that Jane Austen’s novels are not about the degree of something, we can say that words in Group (1) “words related to a high degree” are style markers, rather than “aboutness” indicators while the other two groups, “modal auxiliary verbs” and “auxiliary BE and HAVE” are helping verbs which generally do not express the content of the texts. To the best of my knowledge, the auxiliary verbs BE and HAVE have never been mentioned anywhere, even in passing, in literary studies of Jane Austen. As for words related to a high degree, they have been mentioned in passing in relation to Jane Austen’s characterization of comic and insensible characters through their idiolects (cf. e.g. Booth 1991 and Stokes 1991), i.e. those characters tend to use such intensifiers as “extremely” and “vastly” when expressing their opinions about something. Modal auxiliary verbs have been studied in great detail by Burrows (1986), but they are treated as properties of the speech of characters with a strong sense of morality, e.g. Fanny Price in Mansfield Park or Mr. Knightley in Emma, while it is revealed in the present study that they are not just characteristic of some characters’ speech but style markers of all Jane Austen’s six major novels. Since these three groups of findings have not yet received much attention in literary discussions of Jane Austen’s works, they will be investigated in turn in the present study.

(1) Words related to a high degree

Of all the six groups of key linguistic features shown above, words related to a high degree are the most characteristic of Jane Austen’s novels. This is reflected in the fact that they occur in all three different linguistic categories, as shown below:

Key semantic fields          degree boosters
                             entire; maximum
                             degree maximizers

Keywords                     very
                             every
                             such
                             any

Key grammatical categories   degree adverb

Not only the density of words related to a high degree but also the degree of their keyness reflect their greater significance to Jane Austen’s novels than other key features; the semantic field “degree boosters” and grammatical category “degree adverbs” are ranked first in the relevant lists (see Tables 2 and 3 above) while the keyword “very” is in second place in the keyword list (see Table 1 above).

Upon examination of concordance lines of the words in this group, it is found that the words denoting a high degree are used in close proximity to one another. A strong density of high-degree words at some
MANUSYA: Journal of Humanities Regular 16.1, 2013

points in the novels constitutes an exaggerated discourse in Jane Austen’s works. The exaggeration, in turn, is likely to encourage readers to feel that the part of the text they are reading cannot be interpreted at face value. Some doubt is likely cast on the reliability or sincerity of the character whose point of view is focalized. The excerpt below, taken from Emma, illustrates this point. In this part of the novel, Emma, the protagonist of the novel, has met with Harriet Smith, an orphan from the lower class, for the first time and is interested in making Harriet her protégée. The extract below shows what she thinks about Harriet. The underlined words are those found either in the above-mentioned semantic fields, grammatical categories or keyword list.

Emma was as much pleased with her [Harriet Smith] manners as her person, and quite determined to continue the acquaintance. She was not struck by anything remarkably clever in Miss Smith's conversation, but she found her altogether very engaging -- not inconveniently shy, not unwilling to talk -- and yet so far from pushing, showing so proper and becoming a deference, seeming so pleasantly grateful for being admitted to Hartfield [Emma’s mansion], and so artlessly impressed by the appearance of everything in so superior a style to what she had been used to, that she must have good sense and deserve encouragement.

In the above paragraph, Emma’s satisfaction with Harriet on their first meeting is presented as remarkably strong, as can be seen from the recurrence of high-degree words, especially that of “so”. On the surface, Emma may seem very kind and compassionate but her intense admiration for Harriet’s manners that show “deference” and “grateful[ness]” for Emma is likely to raise doubt in some readers’ minds as to whether Emma’s kindness for Harriet comes from her genuine good wishes for Harriet or from her own vanity. In fact, some critics even argue that Emma’s decision to adopt Harriet as her protégée is not out of compassion but due to her preference for dominating someone and exercising power (cf. e.g. Mudrick 1952). Of course, the above paragraph does not state so and yet the fact that there are such interpretative arguments and that we find it hard to follow Emma’s viewpoint that Harriet is super-good suggests that there are meanings between the lines here and what partly accounts for this textually is the reiteration of words denoting a high degree that present Emma’s extreme evaluation.

It must be noted that my attention to the above extract came before I discovered what literary scholars and critics have said about Emma and Harriet. The text was chosen for careful study because concordance lines of words in the above semantic fields show that this part of the novel is rich in hyperbolic words. It can thus be said that concordance investigation is useful in helping stylistic analysts select a text for further detailed qualitative analysis with less subjectivity.⁹ This is because the analyst’s selection of an excerpt is guided not only by his/her knowledge and interpretation of a relevant part of the literary work but also by quantitative corpus-driven findings,

⁹ Stylisticians are sometimes criticized for being subjective, selecting a text that they already know contains some interesting linguistic features relatable to some interpretative issues (cf. e.g. Fish 1996).

A Corpus-based Stylistic Study of Jane Austen’s Novels

without which he/she may not have at all considered that part of the work.

Apart from the use of high-degree words in the narrative part, it is also found from the concordance lines analysis that this group of lexical items occurs significantly as part of the conversations among characters. It is found from the analysis that, like the narrative part, a character’s direct speech that displays a density of high-degree words is a marked exaggeration. The exaggerated speech tends to betray the speaker’s insincerity or insensibility. Below is a direct speech quotation of Lucy Steele, an antagonist in Sense and Sensibility. In this extract, Lucy is talking to Elinor Dashwood, trying to convince Elinor that she truly loves Edward and that it is not because of his prospect of inheriting a large fortune from his mother when she dies.

He [Edward] has only two thousand pounds of his own; it would be madness to marry upon that, though for my own part, I could give up every prospect of more without a sigh. I have been always used to a very small income, and could struggle with any poverty for him.

Like the passage from Emma, this extract does not state that Lucy Steele is lying but readers are encouraged not to believe what she says, more or less because her love for Edward sounds exaggerated and that is evidenced by the use of such high-degree lexical items as “very”, “every” and “always” close to one another.

To summarise, based on the corpus-driven approach, words denoting a high degree are found to be most characteristic of Jane Austen’s novels. Their close distribution in all her major novels plays a crucial role in suggesting that there are meanings between the lines in many parts of the novels. The word “crucial” used here is not an exaggeration, however, given that the creation and interpretation of meanings between the lines is one of the remarkable qualities of Jane Austen’s novels. Literary critics often note that it is not only the female protagonists in all her novels but also her readers who are involved in the process of distinguishing between appearance and reality (McMaster 1996). Decoding meaning between the lines, be it irony or the insincerity of some characters, is a task that her readers will experience in the course of their reading. While this is widely remarked on in studies of Jane Austen, it is hardly ever mentioned in what ways this thematic instantiation is achieved textually. The corpus-driven approach has directed our attention to high-degree words and led us to see that it is this group of lexical items that are used strategically for creating and hinting at meanings between the lines in her novels.

(2) Modal auxiliary verbs

Modal auxiliary verbs are also highly significant to Jane Austen’s writing style, whether as a semantic and grammatical group or as a single lexical item, since they occur at all three linguistic levels:

Key semantic fields          likely
                             strong obligation or necessity
Keywords                     could
                             must
Key grammatical categories   modal auxiliary

The modal verbs that are most characteristic of Jane Austen’s novels are “could” and “must”; however, the fact that


modal auxiliary verbs also turn up as key grammatical categories suggests that other modal verbs are also used significantly in her works but they do not turn up above the cut-off point set at LL 200. As for the key semantic fields, although not all the words in the “likely” and “strong obligation or necessity” groups are modal verbs¹⁰, it can be argued that the keyness of these two semantic fields is largely constituted by modal auxiliary verbs since 78.25% of the “likely” semantic field is made up of “could”, “would”, “can”, “might” and “may” while 66.57% of the “strong obligation or necessity” field is made up of “must” and “should”.

¹⁰ The “likely” group also contains such words as “probably”, “promising” and “probable” and the “strong obligation or necessity” group consists of such words as “necessary”, “obligation” and “duties”.

The textual functions of the modal auxiliary verbs in Jane Austen’s novels cannot be approached in the same way as the hyperbolic words analysed above. This is because meanings of the modal verbs vary across the contexts of their occurrences while uses of those high-degree words are closely similar to one another. Therefore, an account of the semantic and pragmatic significance of all modal verbs in Jane Austen’s novels cannot be presented here. However, two modal verbs, “could” and “must”, will be discussed in some detail because, among all the modal auxiliary verbs, they are also listed as keywords in Jane Austen’s novels.

2.1 “could”

The modal verb “could” is found to occur 3,599 times in all six novels. To do a qualitative analysis of textual functions of such a frequent word, reading through all the 3,599 concordance lines of the verb would hardly be possible. Therefore, an automatic extraction of the most frequent phraseological patterns of the verb “could” is opted for, so that it is possible to see in what sort of textual environment the verb is predominantly used. Based on the extraction of three-word clusters of which “could” is a part, which occur more than 50 times in Jane Austen’s novels, a total of nine clusters turn up in the list presented below, with the frequency of each cluster in parentheses.

Top 9 three-word clusters of “could”
1. she could not (333)
2. could not be (167)
3. I could not (87)
4. could not have (76)
5. could not help (76)
6. that she could (75)
7. could not but (73)
8. he could not (69)
9. as she could (66)

As can be seen, out of the nine 3-word clusters, seven of them contain the negative word “not”. We can thus infer that the keyword “could” is used repeatedly in the novels to convey characters’ inability to do something. Upon further investigation of the concordance lines of all the “could not” clusters, it is found that they tend to co-occur with words or phrases related to cognition, perception and speech, as illustrated below:

- She could not avoid a little suspicion at the total suspension of Isabella’s impatient desire to see Mr. Tilney. (Northanger Abbey)

- With all these circumstances, recollections and feelings, she could not hear that


Captain Wentworth’s sister was likely to live at Kellynch without a revival of former pain. (Persuasion)

- Whether he had felt more of pain or of pleasure in seeing her she could not tell, but he certainly had not seen her with composure. (Pride and Prejudice)

This collocational pattern points to one of the central features of Jane Austen’s novels, noted by literary critics (e.g. Stovel and Gregg 2002), that in Jane Austen’s fictional world, her characters are primarily engaged in conversation and judging others from each character’s talk. The density of the “could not” cluster in collocation with verbs of perception, cognition and speech reflects the difficulty the characters, mostly the female protagonists, have in understanding or expressing their thoughts about certain matters. The inability to see through things or speak up is the main problem Jane Austen’s protagonists encounter and must be able to solve before they have a happy ending at the close of the novel: Catherine Morland in Northanger Abbey, Marianne Dashwood in Sense and Sensibility, Elizabeth Bennet in Pride and Prejudice and Emma Woodhouse in Emma have to learn to perceive the discrepancy between appearance and reality while Elinor Dashwood in Sense and Sensibility, Fanny Price in Mansfield Park and Anne Elliot in Persuasion can see through things and judge correctly but cannot speak their mind because of the force of certain circumstances. The statistically significant predominance of the modal verb “could” can thus be related to the author’s portrayal of the problems each character suffers, as shown by the frequent collocation of “could” with “not” and cognition, perception or speech verbs.

2.2 “must”

The other modal verb that ranks among keywords in Jane Austen’s novels is “must”, with 2,079 tokens in JA. As with “could”, it is hardly possible to investigate all the concordance lines of “must”, given its frequency. Therefore, three-word clusters in which “must” is embedded were extracted in order to see the ways in which “must” is often used in Jane Austen’s novels. However, the criterion set for extraction of “could” clusters, i.e. a minimum frequency of 50, cannot be applied to “must”, since there are only two clusters that occur more than 50 times (see below). Hence, the ten most frequent clusters of “must” are considered so that more patterns can be found and explored in detail. Below is a list of the top 10 three-word clusters of “must”. Note that there are two clusters that have the same frequency. As with the “could” clusters, the frequency of each cluster is in parentheses.

Top 10 three-word clusters of “must”
1. it must be (78)
2. must have been (70)
3. must be a (43)
4. must not be (42)
5. he must be (27)
6. you must be (25)
7. you must have (25)
8. must be the (24)
9. that she must (22)
10. I must have (20)

Given the above list, it can be seen that “must” tends to co-occur with two auxiliary verbs, the infinitives “be” and “have” and the past participle “been”, with either “be” or “been” embedded in seven of the ten clusters of “must”. Upon investigation of these clusters, it is found that they are often used in the characters’


speculations about certain people or states of affairs.¹¹ This can be illustrated by the sample concordance lines below, where the relevant clusters are underlined:

- The fact [that Colonel Brandon fell in love with Marianne Dashwood] was ascertained by his listening to her again. It must be so. She was perfectly convinced of it. (Sense and Sensibility)

- “No doubt she [Miss Crawford] will be very glad. It must be a great relief to her,” said Fanny, trying for greater warmth of manner. (Mansfield Park)

- She felt that something must be the matter. The change was indubitable. The difference between his present air and what had been in the Octagon Room was strikingly great. (Persuasion)

¹¹ Another use of “must”, though less frequent, is suggested by the clusters “must not be” and “that she must”, which rank fourth and ninth, respectively, in the list. Unlike the other clusters, “must not be” tends to be used in a character’s speech when the speaker, often higher in status or older than the interlocutor, indicates that he/she does not want something to happen. For example: “My dear friend, you must not be angry with me” (Northanger Abbey); “You must not be too severe upon yourself” (Pride and Prejudice).

This pattern of the modal “must” suggests that Jane Austen’s novels are to a large extent concerned with characters’ speculations about others or certain states of affairs. This relates to the point discussed above in the analysis of “could” clusters: just as the frequent negative clusters of “could” contribute to the description of the characters’ inability to understand some people or matters, the clusters of “must” indicate that, in Jane Austen’s fictional world, the characters often do not have a clear idea about other characters or certain matters and hence have to guess what exactly is the case. These two keywords therefore serve as linguistic evidence that accounts for literary critics’ interpretation that Jane Austen’s works tend to be lacking in physical action but are full of the narrator’s portrayal of the characters’ thoughts and presentation of their conversations.

(3) BE and HAVE

Table 1 above shows that two verb forms of the lemma BE, “be” and “been”, and the infinitive “have” are among the top 17 keywords and key grammatical features in Jane Austen’s novels. However, they do not occur in the key semantic domains. This is probably because, based on an investigation of the concordance lines of the verbs, many of them are used as helping verbs, whose role has more to do with grammatical form than semantic or pragmatic aspects of the texts. Although they are mainly used as helping verbs, the statistical significance of their occurrences in the novels should lead us to turn our eyes to them and find out why such small words mark the writing style of this great author.

Since the three verb forms occur very frequently in JA¹², it is hardly possible to analyse every single concordance line of the verbs. It is therefore more helpful to extract predominant patterns in which BE and HAVE occur. To this end, an extraction of statistically significant

¹² In the six novels by Jane Austen, there are 8,157 tokens of “be”, 3,257 tokens of “been” and 5,189 tokens of “have”.


collocates of the two verbs was conducted. It should be noted that I did not choose to extract frequent clusters as in the analysis of “could” and “must” because, based on experiments with the Cluster function in WordSmith, the frequent clusters of the two auxiliary verbs tend to display the co-occurrence among auxiliary verbs, such as “will have been”, which requires still further steps in the analysis. An extraction of collocates of “be”, “been” and “have”, which, unlike cluster analysis, includes words that do not necessarily occur immediately before or after the verb forms but still occur within the four-word span of the node words, seems to be more helpful in showing phraseological patterns of the three verb forms. To extract significant collocates of “be”, “been” and “have”, three criteria were set up as follows:

(1) The collocates must be lexical items with a minimum frequency of 30 tokens in JA

(2) The collocates must be lexical items found to occur within the 4-word span to the right and left of the search word

(3) The collocates must have a minimum statistical MI value of 3.

The collocates of these three keywords can similarly be divided into seven groups: (1) modal auxiliary verbs, (2) personal pronouns, (3) prepositions (i.e. “by” and “before”), (4) adjectives, (5) adverbs, (6) lexical verbs and (7) nouns.

However, the dominant group of collocates of each verb varies. The largest group of collocates of “be” is adjectives, namely “glad”, “sorry”, “satisfied”, “sure”, “happy”, “able”, “likely”, and “better”.¹³ As can be seen, many of these adjective collocates of “be” are concerned with thoughts and feelings. This suggests that the auxiliary verb “be” and its collocation with adjectives denoting thoughts and feelings is a statistically significant collocational pattern in Jane Austen’s novels. This reflects the fact that what features in the author’s novels is the description of characters’ thoughts and feelings, rather than their physical actions. In fact, as shown in Table 3 above, lexical items about thoughts and feelings constitute the key semantic domains in Jane Austen’s novels. Although the other three adjectives, “able”, “likely” and “better”, are not directly about thoughts and feelings, an investigation of their concordance lines suggests that they are also more or less connected to thinking and feeling since they are part of the characters’ judgment or evaluation of others or certain states of affairs. This is illustrated in the following sample concordance lines of the collocation among “be”, “likely” or “able” or “better”, and evaluative words or phrases:

- Their drive, even when this subject was over, was not likely to be very agreeable. (Northanger Abbey)

- But I shall tell you, Miss Anne, because you may be able to set things to rights, that I have no very good opinion of Mrs. Charles’ nursery maid. (Persuasion)

- She […] thought it would be better to speak openly to her aunt than to run such a risk. (Pride and Prejudice)

¹³ There are far fewer cases in which “better” is used as an adverb, compared with its use as an adjective.
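
The three collocate criteria set out above, and the contrast drawn between clusters and collocates, can be made concrete with a short sketch. It is an illustration under stated assumptions rather than the WordSmith procedure used in the study: the function names `clusters` and `collocates` are hypothetical, the minimum frequency is read here as the number of co-occurrences with the node word, and MI is taken in its basic form log2(O x N / (f(node) x f(collocate))), which may differ from the variant a particular tool reports.

```python
import math
from collections import Counter


def clusters(tokens, node, size=3):
    """Count contiguous `size`-word sequences that contain `node`."""
    counts = Counter()
    for i in range(len(tokens) - size + 1):
        gram = tokens[i:i + size]
        if node in gram:
            counts[" ".join(gram)] += 1
    return counts


def collocates(tokens, node, span=4, min_freq=30, min_mi=3.0):
    """Return {word: (co-occurrence count, MI)} for words near `node`.

    Assumptions (not taken from the study): `min_freq` applies to the
    co-occurrence count, and MI = log2(O * N / (f(node) * f(word))).
    """
    n = len(tokens)
    freq = Counter(tokens)
    joint = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Every token within `span` positions to the left or right.
        for j in range(max(0, i - span), min(n, i + span + 1)):
            if j != i:
                joint[tokens[j]] += 1
    result = {}
    for word, observed in joint.items():
        if observed < min_freq:
            continue
        mi = math.log2(observed * n / (freq[node] * freq[word]))
        if mi >= min_mi:
            result[word] = (observed, mi)
    return result
```

Given a tokenised text of the six novels in a list such as `tokens` (a hypothetical name), `clusters(tokens, "could")` counts only contiguous sequences, whereas `collocates(tokens, "be")` also picks up words that are separated from the node by other words. Restricting the output to lexical items, as criterion (1) requires, would need a part-of-speech filter that is not sketched here.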


Taking all the above statistically significant adjective collocates together, we can see that the pattern of “be + adjective” is the dominant phraseological pattern that is connected to the noted quality of Jane Austen’s writing style, i.e. her works mainly involve characters’ judging someone or something.

Unlike “be”, the predominant group of collocates of “been”, the other verb form of BE ranked in the keyword list, is adverbs, namely “never”, “always”, “ever”, “too”, “so” and “much”. All of these collocates can be used to express a high degree of something. This collocational pattern of “been” and high-degree adverbs is linked to the creation of exaggerated discourse in Jane Austen’s works. As already discussed above, these hyperbolic statements perform crucial textual functions in the novels. For example, they may be a part of a character’s speech, which often betrays the speaker’s extreme, unreliable judgment or insincerity, or articulates an ironic force in the narrative presentation of a character’s thoughts, or the narrative description of some character or event. This is illustrated through the following sample concordance lines:

[Context: Lucy Steele is trying to mislead Elinor into believing that Edward loves her very much.]

- “I have never been able,” continued Lucy, “to give him my picture in return, which I am very much vexed at, for he has been always so anxious to get it!” (Sense and Sensibility)

[Context: Emma is pondering that after she and Frank Churchill have not seen each other for a while, she has lost her feelings for him and he would be upset if he found this out. But, in fact, Emma does not know at all that Frank has always only pretended to feel attached to her.]

- Her own attachment had really subsided into a mere nothing; it was not worth thinking of; -- but if he, who had undoubtedly been always so much the most in love of the two, were to be returning with the same warmth of the sentiment which he had taken away, it would be very distressing. (Emma)

[Context: While other characters go out, Mrs. Rushworth and Mrs. Norris are left at home. However, this is not a problem for Mrs. Norris as she likes to flatter the rich, such as Mrs. Rushworth. In this excerpt, the narrator describes her enjoyment sarcastically.]

- Mrs. Norris had been too well employed to move faster. Whatever cross-accidents had occurred to intercept the pleasure of her nieces, she had found a morning of complete enjoyment (Mansfield Park)

As for the verb “have”, its largest group of collocates is lexical verbs in the past participle form, namely “seen”, “heard”, “known”, “thought”, “given”, “done”, and “made”. This does not appear very surprising, given that the node word is the verb “have” and hence the collocational pattern between “have” and past participle verbs reflects the grammatical construction of the present perfect. However, upon investigation of the list of those significant verb collocates, it is observed that they have in common certain semantic properties; that is, the verbs “seen” and “heard” are verbs of perception and “known” and “thought” are verbs of cognition. Two of the other three verbs, “given” and “made”, though not conveying meanings of perception or cognition, are in


many cases used with words related to thoughts and feelings. This is illustrated below with the relevant words underlined:

- I think differently now; time and sickness and sorrow have given me other notions. (Persuasion)

- Could she but have given Harriet her feelings about it all? She has talked her into love; but, alas! she was not so easily to be talked out of it. (Emma)

- The letters from town, which a few days before would have made every nerve in Elinor’s body thrill with transport, now arrived […] (Sense and Sensibility)

- But your arts and allurements may, in a moment of infatuation, have made him forget what he owes to himself and to all his family. (Pride and Prejudice)

- But with sense and temper which ought to have made him judge and feel better, he allowed himself great latitude on such points. (Mansfield Park)

- “And now, Henry,” said Miss Tilney, “that you have made us understand each other, you may as well make Miss Morland understand yourself […]” (Northanger Abbey)

The fact that the keyword auxiliary verb “have” tends to be used in collocation with words related to perception, thoughts and feelings serves as a set of linguistic evidence that accounts for the reason why literary critics tend to feel that Jane Austen’s novels are lacking in action.

The collocational pattern of “have” and “done”, however, occurs significantly in the six novels because it is often used as part of the narrative description of conflicts between characters’ words or actions and their thoughts or contrasts between what a character speculates and what really happens. This is reflected in the fact that “have done” tends to co-occur with modal verbs, as shown in the sample concordance lines below:

- “Well, Catherine, how do you like my friend Thorpe?” Instead of answering, as she probably would have done, had there been no friendship and no flattery in the case, “I do not like him at all,” she directly replied, “I like him very much; he seems very agreeable.” (Northanger Abbey)

- Had he wished ever to see her again, he need not have waited till this time; he would have done what she could not but believe that in his place she should have done long ago, […] (Persuasion)

- “[…] However, I recollected afterwards that if he had been prevented from going, the wedding need not be put off, for Mr. Darcy might have done as well.” (Pride and Prejudice)

Given such repeated uses of the phraseological pattern “modal verb + have + done” in Jane Austen’s novels, it can be said that the recurrence of this pattern contributes to the interpretation that Jane Austen’s novels deal with differences between appearance and reality.

Discussion

The findings presented above throw light on two important points in relation to the research questions (see above) addressed in the present study: (1) textual patterns and their relationship to meaning in Jane Austen’s novels and (2) assessment of a corpus-driven approach to the study of


literary texts. These two points will be discussed together in this section.

By comparing Jane Austen’s novels with a corpus of their late 18th- and early 19th-century contemporaries and with a corpus of present-day British fictional prose, it is found that the lists of statistically significant lexical items, grammatical categories and semantic fields in Jane Austen’s novels correlate with one another to a large extent. This suggests that, whether we take a lexical, grammatical or semantic perspective, six groups of linguistic features are particularly characteristic of Jane Austen’s novels, namely:

(1) words related to a high degree,

(2) modal auxiliary verbs,

(3) the auxiliary verbs BE and HAVE,

(4) words related to internal states of mind,

(5) words related to family relationships,

(6) words related to women.

In the light of what has been discussed in literary studies, some of the above corpus-driven findings, especially groups (4) – (6), cannot be said to be totally new or to throw fresh light on stylistic features of Jane Austen’s novels, since they have been already mentioned, more or less, by literary critics. In fact, to the best of my knowledge, only group (3) “the auxiliary verbs BE and HAVE” has not been talked of in any previous studies of Jane Austen. On the surface, this might be taken to mean that a corpus approach does not seem very helpful in the study of literature, in this case Jane Austen’s novels, since important textual features can be observed through scholars’ close readings. However, we should not forget that those claims are intuition-based and that they can now be validated (or refuted) through the findings derived from a quantitative comparative method. In short, though not providing a totally new set of findings or generating new points of discussion, a corpus approach can provide statistically significant textual evidence that helps support or refute claims made in literary studies.

Having said that, I must admit that I have some reservations about the value of some textual evidence derived from the corpus-driven method. Looking, for example, at the density of words related to “women” and “family relations”, I cannot help but wonder whether we need a corpus approach to explain that Jane Austen’s novels deal with women and their families. Is it a worthwhile effort to compile a corpus and reference corpora, and to conduct a statistical quantification, just to find that Jane Austen’s novels are primarily concerned with women and family matters? I personally feel that only some sets of corpus findings, to be discussed below, can be of value to literary criticism while others, such as the keyness of words related to women, seem to point to aspects that are too general or superficial for literary discussion. This may be because, unlike academic or other informative texts, literary texts are expressive texts, whose thematic meanings are not always conveyed straightforwardly through the words that appear on the surface of the text. The keywords that are content words, which generally indicate the “aboutness” of a text as Scott and Tribble (2006) state, do not seem to be of much value to a stylistic study of literary texts.
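
As a point of reference for the “statistical quantification” mentioned above, comparisons of a study corpus against a reference corpus are commonly scored with a log-likelihood (LL) statistic. The sketch below uses a widely used two-cell formulation. It is an assumption that this corresponds to the statistic behind the LL 200 cut-off in the present study, and the function names `log_likelihood` and `is_key` are hypothetical.

```python
import math


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Two-cell log-likelihood for one item observed in two corpora."""
    total = freq_study + freq_ref
    combined = size_study + size_ref
    expected_study = size_study * total / combined
    expected_ref = size_ref * total / combined
    ll = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def is_key(freq_study, freq_ref, size_study, size_ref, cutoff=200.0):
    """True when the item is relatively more frequent in the study
    corpus and its LL value reaches the cut-off (assumed 200 here)."""
    overused = freq_study / size_study > freq_ref / size_ref
    score = log_likelihood(freq_study, freq_ref, size_study, size_ref)
    return overused and score >= cutoff
```

An item with the same relative frequency in both corpora scores zero, so only items whose frequency departs strongly from the reference corpus pass a high cut-off. That property is what makes very frequent grammatical words, and not only content words, eligible to surface as keywords.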


A more valuable set of findings that a corpus-informed method yields, in my opinion, is those that involve function words or semi-grammatical lexical items, such as lexical items in groups (1) – (3). This is because this group of findings can hardly ever be detected even by the most careful readers. To me, the statistical significance of the auxiliary verbs “be”, “been” and “have” and their collocational patterns in the texts is a prime example that illustrates the value of a corpus-driven method in shedding light on linguistic features and patterns that influence our interpretations but are very likely to escape scholarly attention.

This also applies to the linguistic categories that have been less observed by literary critics. While the high-degree words are rarely mentioned in literary criticism of Jane Austen (and when they are, they often occur in passing), it is the corpus-driven approach in this study that enables us to find out that these apparently small words turn out to be the most distinctive stylistic feature of Jane Austen’s writing. Moreover, while what has been mentioned by critics are all marked intensifiers, such as “vastly” and “perfectly”, it has been revealed through a corpus-driven approach in this study that it is not simply the use of intensifiers but also other sorts of words denoting a high degree, e.g. adverbs of frequency like “always” and “never” and indefinite pronouns like “everybody”, “anyone”, that are used significantly and, more importantly, in close proximity to one another. In other words, the corpus approach has shown that it is not just the use of intensifiers but also the close proximity of high-degree words of various kinds that serves as a tool for the author to create and hint at meanings between the lines in her novels.

In the case of modal auxiliary verbs, though having been studied by Burrows (1986), the corpus-driven approach has enabled us to explicate their roles in Jane Austen’s novels in a more precise and refined manner. It shows patterns of co-occurrences between “could” and “not” or “must” and “be”, which in turn helps illuminate the instantiation of thematic ideas in her novels.

Based on my observation of the set of findings from the study, it can be said that a corpus-driven approach is of value to a stylistic analysis of literary texts at varying levels. At the most basic level, it can be a “supporting actor” in literary or stylistic research, yielding quantitative textual evidence that supports or refutes intuitive interpretations. More than that, a corpus approach can be the “main actor” that unearths linguistic or stylistic features that even a well-trained reader could hardly imagine in explaining interpretative issues. Finally, while it has been widely acknowledged that a corpus approach seems to be the only method analysts can resort to if a whole work of fictional prose or a group of literary works are objects of the study, it has been found from the present study that the corpus-driven approach can also be of great help for a detailed manual analysis of part of the whole text, since it can draw our attention to part(s) of a literary work that is(are) worth further detailed investigation, as illustrated in the analysis of high-degree words.

Despite such potential, the comparative corpus-driven technique has certain limitations. First, since corpus linguistics holds that the more frequent linguistic patterns are, the more significant they are, an application of a corpus-driven approach in stylistics relies heavily on quantitative

MANUSYA: Journal of Humanities Regular 16.1, 2013

value, when, in fact, items or patterns that lie above the cut-off point may not be of much importance in a literary text. For example, as discussed above, the keyness of words related to women and family ranks very high and yet these words are not very illuminating textual features when it comes to an academic discussion of Jane Austen’s works. On the other hand, items that are below a statistical cut-off point can be significant to an analysis of a literary text. In fact, as Leech and Short (1981) argue, a word that occurs only once in a text may be of great significance to the text under investigation. Second, the corpus approach allows us to explore mainly lexical items and their patterns in a text. Other linguistic aspects, such as characters’ interactions, can hardly be dealt with from a corpus-stylistic approach. Irony in Jane Austen’s novels is a good example. Although the present study enables us to see that an extensive distribution of various kinds of hyperbolic words plays an important role in creating and interpreting meanings between the lines in Jane Austen’s novels, it is undeniable that the understanding of irony requires more than the recognition of high-degree words. A pragmatic or cognitive perspective must also be involved in a full explanation of ironic statements. In other words, the corpus-driven approach offers an insight into only one aspect of textual features. Finally, a word of caution is in order. The corpus-driven approach simply shows us what is statistically significant in the text; it does not help explain to what extent and why the lexical item is significant. That job is the analyst’s.

Conclusion

The present study starts from the observation that while a corpus linguistic technique has been adopted in a number of types and forms of discourse and text analyses, it has been adopted only infrequently in stylistic analysis of literary texts. A corpus-driven approach was therefore applied to an analysis of Jane Austen’s six major novels in order to see how well this method works with literary texts. Although I did not identify in the first place what linguistic features should mark the style of Jane Austen’s works, the corpus-driven method has yielded a set of findings that are useful for the discussion of Jane Austen. Some can help support literary scholars’ observations on Austen’s novels in quantitative terms while others serve as new linguistic evidence that can enrich the study of the novelist. However, although reliance on a computer makes it possible to investigate a group of literary texts and provides satisfactory results, it is only part of the story; the analyst’s understanding of the text in question still plays a central role in explaining and evaluating those corpus-informed findings.

References

Austen, Jane. Emma. London: Penguin Books, 1994.

---. Mansfield Park. Harmondsworth, Middlesex: Penguin Books, 1985.

---. Northanger Abbey. New York: Bantam Books, 1989.

---. Persuasion. New York: Bantam Books, 1989.

---. Pride and Prejudice. London: Penguin Books, 1994.

---. Sense and Sensibility. London: Penguin Books, 1994.


Baker, Paul. 2006. Using Corpora in Discourse Analysis. London/New York: Continuum.

Biber, Douglas and Susan Conrad. 1999. Lexical bundles in conversation and academic prose. In Out of corpora, edited by H. Hasselgard, & S. Oksefjell, pp. 181-190. Amsterdam-Atlanta GA: Rodopi.

Biber, Douglas. 2011. Corpus linguistics and the study of literature: Back to the future? Scientific Study of Literature 1(1), 15-23.

Booth, Wayne. 1991. Control of Distance in Jane Austen’s Emma. In Jane Austen: Emma (A Casebook), edited by D. Lodge. London: Macmillan Press, 137-55.

Burrows, J. F. 1986. Modal verbs and moral principles: An aspect of Jane Austen’s style. Literary and Linguistic Computing 1, 9–23.

Church, Kenneth and Patrick Hanks. 1990. Word Association Norms, Mutual Information, and Lexicography. Computational Linguistics 16 (1), 22-29.

Enkvist, Nils. 1973. Linguistic Stylistics. The Hague: Mouton.

Fish, Stanley. 1996. What is Stylistics and Why are They Saying Such Terrible Things About It? In The Stylistics Reader, edited by J. Weber, 94-116. London: Arnold.

Fletcher, William. 2003. April 12, 2011 <http://phrasesinenglish.org/>

Hoey, Michael. 1991. Patterns of Lexis in Text. Oxford: Oxford UP.

Leech, G. and Short, M. 1981. Style in Fiction. London: Longman.

Leech, G., Rayson, P. and Andrew Wilson. 2001. Word Frequencies in Written and Spoken English Based on the British National Corpus. London: Longman.

Mahlberg, Michaela. 2005. English General Nouns: A Corpus Theoretical Approach. Amsterdam: John Benjamins.

---. 2007. Clusters, key clusters and local textual functions in Dickens. Corpora, 2(1), 1-31.

McMaster, Juliet. 1996. Jane Austen the Novelist: Essays Past and Present. Basingstoke: Macmillan.

Moon, R. and Carmen R. Caldas-Coulthard. 2010. ‘Curvy, hunky, kinky’: using corpora as tools for critical analysis. Discourse & Society, 21(2), 99–133.

Mudrick, M. 1952. Jane Austen: Irony as Defense and Discovery. Princeton: Princeton UP.

Page, Norman. 1972. The Language of Jane Austen. Oxford: Basil Blackwell.

Rayson, Paul. 2007. Wmatrix: a web-based corpus processing environment, Computing Department, Lancaster University, January 23, 2011. <http://www.comp.lancs.ac.uk/ucrel/wmatrix/>


Scott, Mike. 1999. WordSmith Tools (Software). Oxford: Oxford UP.

Scott, M. and Christopher Tribble. 2006. Textual Patterns: Keyword and Corpus Analysis in Language Education. The Hague: John Benjamins.

Sherry, Norman. 1966. Jane Austen. London: Evans Bros.

Short, M. and Elena Semino. 2004. Corpus Stylistics: Speech, Writing and Thought Presentation in a Corpus of English Writing. London: Routledge.

Sinclair, John. 2004. Trust the Text: Language, Corpus and Discourse. London: Routledge.

Stokes, Myra. 1991. The Language of Jane Austen: A Study of Some Aspects of Her Vocabulary. Basingstoke: Macmillan.

Stovel, B. and Lynn Gregg. 2002. The Talk in Jane Austen. Alberta: U. of Alberta Press.

Stubbs, M. 2005. Conrad in the computer: examples of quantitative stylistic methods. Language and Literature, 14, 1: 5-24.

Stubbs, M. and Isabelle Barth. 2003. Using recurrent phrases as text-type discriminators: a quantitative method and some findings. Functions of Language, 10, 1: 65-108.

Tanner, Tony. 2007. Jane Austen (reissued edition). Hampshire: Palgrave Macmillan.

Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins.
