Raksangob Wijitsopon [2]

Abstract

While a corpus linguistic technique has been applied to various studies in text and discourse analysis, it has not been much adopted in stylistic analysis of literary texts. The present study, therefore, applies a corpus-driven approach to Jane Austen’s six major novels, in order to see how well this new method works with literary texts, compared with what has been observed in previous studies of Jane Austen’s language. It has been found that the corpus-driven approach can provide quite a few results that are useful in supporting and refining literary scholars’ intuitive observations on the author’s works. Some of the linguistic patterns derived from the comparative corpus-driven method have not been remarked on before in any previous studies and hence can serve as new textual evidence in the study of Jane Austen’s writing style. Despite such great potential for the study of style in literary works, it is suggested that the analyst’s knowledge and understanding of the text(s) under study is

Introduction

The past few decades have seen a remarkable prominence in the application of corpora in English applied linguistic research. This also includes the area of text and discourse analysis. The corpus-based technique allows text and discourse analysts to expand the size of their data, which in turn enables them to generalize their findings to a larger extent than before. On the theoretical side, it enables text and discourse researchers to show that patterns of co-occurrence among words in texts are associated with different meanings and uses in the communicative events (Sinclair 2004). For example, Biber and Conrad (1999) showed that conversation and academic writing are markedly different from each other through their frequency analysis of phrases found in corpora of the two text types. Recent work in critical discourse analysis, whose main concern lies in the relationship between discourse, ideology and power, also incorporates the use of corpora in its analytical practice. Moon and Caldas-Coulthard (2010), for instance, found from their analysis of a British newspaper corpus that women are frequently described in terms of their physical appearance, as shown by a high frequency of such adjectives as “beautiful”, “pretty” and “lovely” used in collocation with references to women, whereas references to men are usually modified by adjectives related to importance, including “great”, “key” and “main”. This difference in the media’s discursive practice, it is argued, reflects

[1] This study is sponsored by the TRF-CHE Research Grant for New Scholar and the Ratchadaphiseksomphot Endowment Fund of Chulalongkorn University (RES560530083-HS). I would like to express my gratitude to the Thailand Research Fund (TRF), the Commission on Higher Education (CHE), Ministry of Education, and Chulalongkorn University for their support.
[2] Assistant Professor, Department of English, Faculty of Arts, Chulalongkorn University
MANUSYA: Journal of Humanities Regular 16.1, 2013
A Corpus-based Stylistic Study of Jane Austen’s Novels
suggesting that they are used to delineate reliable characters from comic or superficial ones, whose idiolects tend to be filled with concrete nouns. A more detailed account of Jane Austen’s lexical choices is found in Stokes (1991), who argues that Jane Austen’s novels contain four major groups of words, namely those related to (1) spirit, e.g. “vivacity” and “ardour”, (2) manners, e.g. “civil” and “elegant”, (3) intelligence, e.g. “accomplishment” and “discernment” and (4) temperament, e.g. “amiable” and “disposition”, all of which are central to the development of the plots and themes of her novels, including those about judgment or the disparity between appearance and reality.

Given that Jane Austen’s novels have been discussed widely in literary studies and that some observations have been made on her language use, though intuitively, an examination of the language in her six major novels using a new approach like corpus stylistics might be a boon to both English literary studies and corpus linguistics: findings from the corpus-driven approach can be compared with what has been said in the previous studies, which in turn would help us evaluate how well the approach works in relation to literary discourse; on the other hand, critics’ careful observations of Jane Austen’s writing can be validated or refined by findings from a systematic corpus-stylistic approach, which involves both the quantitative and qualitative analysis of all her six major novels.

Methodology

1. Descriptive tools

There are three corpus linguistic descriptive tools that are used in this study to explore stylistic features and their textual functions in Jane Austen’s novels: “keyness”, “collocation” and “cluster”, each of which is explained in turn below.

(a) Keyness

To answer the research questions stated above, the concept of “keyword” in corpus linguistics is drawn upon as a starting point in the analytical procedures. According to Scott and Tribble (2006), keywords are lexical items of significance to a text in question, because of their “unusual frequency in comparison with a reference corpus of some suitable kind”. The “unusual frequency” here refers to both “unusually high” and “unusually low” frequency. For the purpose of the present study, only items with “unusually high” frequency are considered.

The phrase “unusually high” here suggests that keywords are not simply words of high frequency. In other words, keywords are not necessarily the most frequent words found in a text. Keywords are important to the text because they are used “unusually often” when compared with other texts. To illustrate, if we consider a corpus of Jane Austen’s novels alone, the definite article “the” is found to be the most frequent word in Jane Austen’s novels. However, can we immediately say that it marks Jane Austen’s writing style? Given that “the” is an article, it is likely to be used very often in any piece of writing, not just in Jane Austen’s novels. Therefore, we need to compare Jane Austen’s works with other authors’ so that we can see if she really used “the” significantly more than others. And when compared with other novelists’ writing, the article does not turn up as a keyword in Jane Austen’s works because it is also used frequently by other authors. On the
other hand, the word “very”, whose frequency is lower than that of “the”, appears in the keyword list of Jane Austen’s novels. This means that Jane Austen used the word “very” significantly more often than other writers. Therefore, as Baker (2006) puts it, a keyword list gives a measure of saliency, not just frequency, of the lexical items in a text and hence can suggest further examination of their textual functions. This is the reason why keywords are fundamental to the corpus-stylistic analysis of Jane Austen’s novels in the present study: through the Keyword function in Rayson’s (2007) Wmatrix Tools (see below), lexical items that are characteristic of Jane Austen’s novels are extracted, some of which will be further investigated in detail.

Based on the above principle, it can be seen that a keyword list is derived through a comparison between the text or corpus in question and reference corpora, with the size of each corpus and the frequencies of each word within them being cross-tabulated. Such statistical tests as the chi-square or log-likelihood tests are then employed to measure the degree of each word’s significance. In this study, as far as a statistical test is concerned, the statistical measure log-likelihood is applied for all the comparisons. This is because, according to Leech et al. (2001), while many statistical tests rely on an assumption of normal distribution of data, which is often not the case with linguistic data such as word frequencies, the log-likelihood measure does not.

According to Scott and Tribble (2006), three kinds of words usually come out of a comparison as keywords: (1) proper nouns, (2) words that “human beings would recognise” as key, which tend to indicate a text’s “aboutness”, and (3) words that are not usually identified consciously by readers as key but nonetheless occur in significantly high frequencies and so can be indicators of the style of a text, rather than of its content.

(b) Collocation

After keywords are extracted, their significance in the six major novels needs to be explained. To this end, the concept of collocation, defined by Hoey (1991: 6-7) as “the relationship a lexical item has with items that appear with greater than random probability in its (textual) context”, is drawn upon. This definition emphasizes that collocation of a word is not just a random co-occurrence of words, e.g. “she + is”, but the co-occurrence takes place in a text for some reason, as seen from the phrase “with greater than random probability in its (textual) context”. For example, as shown by Stubbs (2001: 28), common collocations of the word “seek” include “help”, “advice” and “support”. An examination of the collocational patterns of a word in a text can therefore allow us to see the relationship between lexical items in a text, which in turn enables us to see the way words are used to create meanings in a text. To find out what keywords are used “with greater than random probability” in Jane Austen’s novels, a computer-assisted extraction of collocates, through the statistical measure Mutual Information (MI) [3], is adopted.

(c) Cluster

In this study, while a collocation refers to an individual lexical item that is found

[3] Mutual Information is a concept in statistics, often adopted in lexicography to measure the strength of collocations (see, for example, Church and Hanks 1990)
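
The keyword procedure described under Keyness (cross-tabulating each word's frequency against the size of the study corpus and of a reference corpus, then applying the log-likelihood measure and a cut-off) can be sketched in Python. This is only an illustration, not the Wmatrix implementation: it uses the widely used two-cell form of the log-likelihood statistic, and the function names, the toy token lists and the cut-off of 3.84 are assumptions made for the example.

```python
import math
from collections import Counter


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood of one word, from its frequency in two corpora."""
    combined = freq_study + freq_ref
    total = size_study + size_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def keywords(study_tokens, ref_tokens, cutoff):
    """Words that are unusually frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    n_study, n_ref = len(study_tokens), len(ref_tokens)
    found = []
    for word, freq in study.items():
        # Keep only "unusually high" items: higher relative frequency
        # in the study corpus than in the reference corpus.
        if freq / n_study <= ref[word] / n_ref:
            continue
        ll = log_likelihood(freq, n_study, ref[word], n_ref)
        if ll >= cutoff:
            found.append((word, ll))
    return sorted(found, key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration only.
study_tokens = ["the", "very", "very", "very", "the", "good"]
ref_tokens = ["the", "the", "good", "bad", "the", "fine"]
print(keywords(study_tokens, ref_tokens, cutoff=3.84))
```

In the toy data "the" is frequent in both lists and so is not returned, whereas "very" is, which mirrors the contrast between "the" and "very" drawn in the text.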
different from the kind of English with which modern readers are familiar. A comparison with 19CNov, on the other hand, would show in what ways linguistic patterns in JA are different from those in the novels contemporary to JA. The lists of keywords obtained from the comparison of JA with the two different reference corpora are compared. The words that are found to occur on both lists are deemed true keywords that mark Jane Austen’s writing style since they are found to be of statistical significance no matter which “linguistic norm” is considered: that of present-day British English, representing modern readers’ language, or that of 19th-century British English.

The software tools used for the corpus comparison in this project are WordSmith Tools, developed by Scott (1999), and Wmatrix, developed by Rayson (2007). Both are software tools for corpus analysis and text comparison. The former is an integrated suite of programmes that enable us to examine how words behave in texts. Wmatrix provides a web interface to the semantic and grammatical corpus annotation tools, i.e. USAS and CLAWS [5], respectively. Wmatrix users can upload their own corpus data to the system, so that it can be automatically annotated and viewed via the web browser. Wmatrix also extends the keyword method to key grammatical categories and key semantic fields, i.e. grammatical categories and semantic fields that are of significance to the text under investigation due to their unusual frequencies when compared with reference corpora.

An examination of the texts at three linguistic levels, i.e. lexical, grammatical and semantic, serves as a means of triangulation in the way we approach the texts, looking at different linguistic features of the texts, and also reduces the problem that may arise as a result of the focus on frequency and statistical value. That is, by focusing on keywords only, researchers may have to ignore words below the statistical cut-off point [6] even though they are actually closely related to those above the cut-off point. For example, as will be seen below, the word “very” is ranked second in the keyword list of JA. However, the author also used other degree adverbs, such as “so” and “really”, but they do not turn up in the keyword list since their statistical value stands below the cut-off point set in the extraction. Nevertheless, they may be found as part of key grammatical categories if the “degree adverbs” category is designated as key. If not, we can infer that only the word “very” is of particular significance to Jane Austen’s writing since it was used more significantly even than its close synonyms. In other words, the extraction of key semantic fields and key grammatical categories would help confirm the significance of the individual key words or shed light on the density of some words that may have been overlooked due to their relatively lower frequency as individual words.

[5] USAS stands for UCREL Semantic Analysis System and CLAWS for the Constituent Likelihood Automatic Word-tagging System
[6] The statistical cut-off point used here refers to the level at which the statistical value of a word is considered meaningful for data interpretation. For example, if a cut-off point is set at a statistical value of 0.05, words with values higher than 0.05 are considered important and should be chosen for detailed examination while those with values lower than 0.05 are not very important.
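
The step of keeping only the words that are key against both reference corpora amounts to intersecting two keyword lists. A minimal Python sketch follows; the function name and the sample words are hypothetical placeholders and do not reproduce the study's actual lists.

```python
def true_keywords(keywords_vs_present_day, keywords_vs_19cnov):
    """Return the words that are key against both reference corpora.

    The order of the first list is kept, so the ranking from that
    comparison is preserved in the result.
    """
    also_key = set(keywords_vs_19cnov)
    return [word for word in keywords_vs_present_day if word in also_key]


# Placeholder lists, invented for illustration only.
vs_present_day = ["very", "could", "thee", "must"]
vs_19cnov = ["must", "very", "could", "carriage"]

print(true_keywords(vs_present_day, vs_19cnov))
```

A word that is key against only one reference corpus (here the invented "thee" and "carriage") is dropped, since its prominence may reflect the difference between the two norms rather than the author's style.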
Result
[7] The three most frequent words in each semantic field are given as sample words.
[8] If a grammatical category contains more than three items, e.g. the “degree adverb” group, the three most frequent words in each category are given as sample words. For those that contain three items or fewer, all of the items are put in the table.
been talking about suggests that the corpus-driven keyword analysis conducted in this study can provide textual evidence for previous studies on Jane Austen. Because these three groups of linguistic markers of Jane Austen’s writing style have often been mentioned in literary studies of Jane Austen, the present study will not deal with them in detail.

The other group consists of lexical items, as Scott and Tribble (2006) state, that are not usually identified consciously by readers as important but occur in significantly high frequencies and so can be indicators of the style of a text, rather than of its content. Given that Jane Austen’s novels are not about the degree of something, we can say that words in Group (1) “words related to a high degree” are style markers, rather than “aboutness” indicators while the other two groups, “modal auxiliary verbs” and “auxiliary BE and HAVE” are helping verbs which generally do not express the content of the texts. To the best of my knowledge, the auxiliary verbs BE and HAVE have never been mentioned anywhere, even in passing, in literary studies of Jane Austen. As for words related to a high degree, they have been mentioned in passing in relation to Jane Austen’s characterization of comic and insensible characters through their idiolects (cf. e.g. Booth 1991 and Stokes 1991), i.e. those characters tend to use such intensifiers as “extremely” and “vastly” when expressing their opinions about something. Modal auxiliary verbs have been studied in great detail by Burrows (1986), but they are treated as properties of the speech of characters with a strong sense of morality, e.g. Fanny Price in Mansfield Park or Mr. Knightley in Emma, while it is revealed in the present study that they are not just characteristic of some characters’ speech but style markers of all Jane Austen’s six major novels. Since these three groups of findings have not yet received much attention in literary discussions of Jane Austen’s works, they will be investigated in turn in the present study.

(1) Words related to a high degree

Of all the six groups of key linguistic features shown above, words related to a high degree are the most characteristic of Jane Austen’s novels. This is reflected in the fact that they occur in all three different linguistic categories, as shown below:

Key semantic fields           degree boosters
                              entire; maximum
                              degree maximizers

Keywords                      very
                              every
                              such
                              any

Key grammatical categories    degree adverb

Not only the density of words related to a high degree but also the degree of their keyness reflects their greater significance to Jane Austen’s novels than other key features; the semantic field “degree boosters” and grammatical category “degree adverbs” are ranked first in the relevant lists (see Tables 2 and 3 above) while the keyword “very” is in second place in the keyword list (see Table 1 above).

Upon examination of concordance lines of the words in this group, it is found that the words denoting a high degree are used in close proximity to one another. A strong density of high-degree words at some
without which he/she may not have at all considered that part of the work.

Apart from the use of high-degree words in the narrative part, it is also found from the concordance lines analysis that this group of lexical items occurs significantly as part of the conversations among characters. It is found from the analysis that, like the narrative part, a character’s direct speech that displays a density of high-degree words is a marked exaggeration. The exaggerated speech tends to betray the speaker’s insincerity or insensibility. Below is a direct speech quotation of Lucy Steele, an antagonist in Sense and Sensibility. In this extract, Lucy is talking to Elinor Dashwood, trying to convince Elinor that she truly loves Edward and it is not because of his prospect of inheriting a large fortune from his mother when she dies.

He [Edward] has only two thousand pounds of his own; it would be madness to marry upon that, though for my own part, I could give up every prospect of more without a sigh. I have been always used to a very small income, and could struggle with any poverty for him.

Like the passage from Emma, this extract does not state that Lucy Steele is lying but readers are encouraged not to believe what she says, more or less because her love for Edward sounds exaggerated and that is evidenced by the use of such high-degree lexical items as “very”, “every” and “always” close to one another.

To summarise, based on the corpus-driven approach, words denoting a high degree are found to be most characteristic of Jane Austen’s novels. Their close distribution in all her major novels plays a crucial role in suggesting that there are meanings between the lines in many parts of the novels. The word “crucial” used here is not an exaggeration, however, given that the creation and interpretation of meanings between the lines is one of the remarkable qualities of Jane Austen’s novels. Literary critics often note that it is not only the female protagonists in all her novels but also her readers who are involved in the process of distinguishing between appearance and reality (McMaster 1996). Decoding meaning between the lines, be it irony or the insincerity of some characters, is a task that her readers will experience in the course of their reading. While this is widely remarked on in studies of Jane Austen, it is hardly ever mentioned in what ways this thematic instantiation is achieved textually. The corpus-driven approach has directed our attention to high-degree words and led us to see that it is this group of lexical items that is used strategically for creating and hinting at meanings between the lines in her novels.

(2) Modal auxiliary verbs

Modal auxiliary verbs are also highly significant to Jane Austen’s writing style, whether as a semantic and grammatical group or as a single lexical item, since they occur at all three linguistic levels:

Key semantic fields           likely
                              strong obligation or necessity

Keywords                      could
                              must

Key grammatical categories    modal auxiliary

The modal verbs that are most characteristic of Jane Austen’s novels are “could” and “must”; however, the fact that
modal auxiliary verbs also turn up as key grammatical categories suggests that other modal verbs are also used significantly in her works but they do not turn up above the cut-off point set at LL 200. As for the key semantic fields, although not all the words in the “likely” and “strong obligation or necessity” groups are modal verbs [10], it can be argued that the keyness of these two semantic fields is largely constituted by modal auxiliary verbs since 78.25% of the “likely” semantic field is made up of “could”, “would”, “can”, “might” and “may” while 66.57% of the “strong obligation or necessity” field is made up of “must” and “should”.

The textual functions of the modal auxiliary verbs in Jane Austen’s novels cannot be approached in the same way as the hyperbolic words analysed above. This is because meanings of the modal verbs vary across the contexts of their occurrences while uses of those high-degree words are closely similar to one another. Therefore, an account of the semantic and pragmatic significance of all modal verbs in Jane Austen’s novels cannot be presented here. However, two modal verbs, “could” and “must”, will be discussed in some detail because, among all the modal auxiliary verbs, they are also listed as keywords in Jane Austen’s novels.

2.1 “could”

The modal verb “could” is found to occur 3,599 times in all six novels. To do a qualitative analysis of textual functions of such a frequent word, reading through all the 3,599 concordance lines of the verb would hardly be possible. Therefore, an automatic extraction of the most frequent phraseological patterns of the verb “could” is opted for, so that it is possible to see in what sort of textual environment the verb is predominantly used. Based on the extraction of three-word clusters of which “could” is a part, which occur more than 50 times in Jane Austen’s novels, a total of nine clusters turn up in the list presented below, with the frequency of each cluster in the parentheses.

Top 9 three-word clusters of “could”
1. she could not (333)
2. could not be (167)
3. I could not (87)
4. could not have (76)
5. could not help (76)
6. that she could (75)
7. could not but (73)
8. he could not (69)
9. as she could (66)

As can be seen, out of the nine 3-word clusters, seven of them contain the negative word “not”. We can thus infer that the keyword “could” is used repeatedly in the novels to convey characters’ inability to do something. Upon further investigation of the concordance lines of all the “could not” clusters, it is found that they tend to co-occur with words or phrases related to cognition, perception and speech, as illustrated below:

- She could not avoid a little suspicion at the total suspension of Isabella’s impatient desire to see Mr. Tilney. (Northanger Abbey)

- With all these circumstances, recollections and feelings, she could not hear that

[10] The “likely” group also contains such words as “probably”, “promising” and “probable” and the “strong obligation or necessity” group consists of such words as “necessary”, “obligation” and “duties”.
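
The extraction of three-word clusters containing “could” can be approximated with a simple n-gram count. The sketch below is an illustration under stated assumptions, not the WordSmith Cluster function: the function name, the whitespace tokenisation and the toy sentence are invented, and only the nine clusters in `reported_clusters` are taken from the list above.

```python
from collections import Counter


def clusters(tokens, node, n=3, more_than=0):
    """Count n-word sequences that contain the node word.

    Only clusters occurring more than `more_than` times are returned,
    most frequent first.
    """
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        if node in gram:
            counts[" ".join(gram)] += 1
    return [(c, f) for c, f in counts.most_common() if f > more_than]


# Toy text, invented for illustration only.
toy = "she could not go and she could not stay but he could".split()
print(clusters(toy, "could", n=3, more_than=1))

# The nine clusters reported in the list above.
reported_clusters = [
    "she could not", "could not be", "I could not", "could not have",
    "could not help", "that she could", "could not but", "he could not",
    "as she could",
]
negative = [c for c in reported_clusters if "not" in c.split()]
print(len(negative), "of", len(reported_clusters), "clusters contain 'not'")
```

For the full corpus, the threshold in the text would correspond to `more_than=50`.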
speculations about certain people or states of affairs. [11] This can be illustrated by the sample concordance lines below, where the relevant clusters are underlined:

- The fact [that Colonel Brandon fell in love with Marianne Dashwood] was ascertained by his listening to her again. It must be so. She was perfectly convinced of it. (Sense and Sensibility)

- “No doubt she [Miss Crawford] will be very glad. It must be a great relief to her,” said Fanny, trying for greater warmth of manner. (Mansfield Park)

- She felt that something must be the matter. The change was indubitable. The difference between his present air and what had been in the Octagon Room was strikingly great. (Persuasion)

This pattern of the modal “must” suggests that Jane Austen’s novels are to a large extent concerned with characters’ speculations about others or certain states of affairs. This relates to the point discussed above in the analysis of “could” clusters: just as the frequent negative clusters of “could” contribute to the description of the characters’ inability to understand some people or matters, the clusters of “must” indicate that, in Jane Austen’s fictional world, the characters often do not have a clear idea about other characters or certain matters and hence have to guess what exactly is the case. These two keywords therefore serve as linguistic evidence that accounts for literary critics’ interpretation that Jane Austen’s works tend to be lacking in physical action but are full of the narrator’s portrayal of the characters’ thoughts and presentation of their conversations.

(3) BE and HAVE

Table 1 above shows that two verb forms of the lemma BE, “be” and “been”, and the infinitive “have” are among the top 17 keywords and key grammatical features in Jane Austen’s novels. However, they do not occur in the key semantic domains. This is probably because, based on an investigation of the concordance lines of the verbs, many of them are used as helping verbs, whose role has more to do with grammatical form than semantic or pragmatic aspects of the texts. Although they are mainly used as helping verbs, the statistical significance of their occurrences in the novels should lead us to turn our eyes to them and find out why such small words mark the writing style of this great author.

Since the three verb forms occur very frequently in JA [12], it is hardly possible to analyse every single concordance line of the verbs. It is therefore more helpful to extract predominant patterns in which BE and HAVE occur. To this end, an extraction of statistically significant

[11] Another use of “must”, though less frequent, is suggested by the clusters “must not be” and “that she must”, which rank fourth and eighth, respectively, in the list. Unlike the other clusters, “must not be” tends to be used in a character’s speech when the speaker, often higher in status or older than the interlocutor, indicates that he/she does not want something to happen. For example:
- “My dear friend, you must not be angry with me” (Northanger Abbey)
- “You must not be too severe upon yourself” (Pride and Prejudice)
[12] In the six novels by Jane Austen, there are 8,157 tokens of “be”, 3,257 tokens of “been” and 5,189 tokens of “have”.
collocates of the two verbs was conducted. It should be noted that I did not choose to extract frequent clusters as in the analysis of “could” and “must” because, based on the experiments with the Cluster function on WordSmith, the frequent clusters of the two auxiliary verbs tend to display the co-occurrence among auxiliary verbs, such as “will have been”, which requires still further steps in the analysis. An extraction of collocates of “be”, “been” and “have”, which, unlike cluster analysis, includes words that do not necessarily occur immediately before or after the verb forms but still occur within the four-word span of the node words, seems to be more helpful in showing phraseological patterns of the three verb forms. To extract significant collocates of “be”, “been” and “have”, three criteria were set up as follows:

(1) The collocates must be lexical items with a minimum frequency of 30 tokens in JA

(2) The collocates must be lexical items found to occur within the 4-word span to the right and left of the search word

(3) The collocates must have a minimum statistical MI value of 3.

The collocates of these three keywords can similarly be divided into seven groups: (1) modal auxiliary verbs, (2) personal pronouns, (3) prepositions (i.e. “by” and “before”), (4) adjectives, (5) adverbs, (6) lexical verbs and (7) nouns.

However, the dominant group of collocates of each verb varies. The largest group of collocates of “be” is adjectives, namely “glad”, “sorry”, “satisfied”, “sure”, “happy”, “able”, “likely”, and “better”. [13] As can be seen, many of these adjective collocates of “be” are concerned with thoughts and feelings. This suggests that the auxiliary verb “be” and its collocation with adjectives denoting thoughts and feelings is a statistically significant collocational pattern in Jane Austen’s novels. This reflects that what features in the author’s novels is the description of characters’ thoughts and feelings, rather than their physical actions. In fact, as shown in Table 3 above, lexical items about thoughts and feelings constitute the key semantic domains in Jane Austen’s novels. Although the other three adjectives, “able”, “likely” and “better”, are not directly about thoughts and feelings, an investigation of their concordance lines suggests that they are also more or less connected to thinking and feeling since they are part of the characters’ judgment or evaluation of others or certain states of affairs. This is illustrated in the following sample concordance lines of the collocation among “be”, “likely” or “able” or “better”, and evaluative words or phrases:

- Their drive, even when this subject was over, was not likely to be very agreeable. (Northanger Abbey)

- But I shall tell you, Miss Anne, because you may be able to set things to rights, that I have no very good opinion of Mrs. Charles’ nursery maid. (Persuasion)

- She […] thought it would be better to speak openly to her aunt than to run such a risk. (Pride and Prejudice)

[13] There are far fewer cases in which “better” is used as an adverb, compared with its use as an adjective.
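
The three criteria for collocate extraction (minimum frequency, a four-word span on either side of the node, and a minimum MI value) can be sketched in Python as follows. This is a hedged illustration rather than the WordSmith procedure: MI is computed here as log2 of (co-occurrence count × corpus size) / (node frequency × collocate frequency), without any adjustment for span size, and the minimum frequency is assumed to refer to the collocate's overall frequency in the corpus. The function name and the toy data are invented.

```python
import math
from collections import Counter


def collocates(tokens, node, span=4, min_freq=30, min_mi=3.0):
    """Return (word, co-occurrence count, MI) for collocates of `node`."""
    corpus_size = len(tokens)
    freq = Counter(tokens)
    joint = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            # Words up to `span` positions to the left and right of the node.
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            joint.update(window)
    results = []
    for word, together in joint.items():
        if freq[word] < min_freq:          # criterion (1), as assumed above
            continue
        mi = math.log2(together * corpus_size / (freq[node] * freq[word]))
        if mi >= min_mi:                   # criterion (3)
            results.append((word, together, mi))
    return sorted(results, key=lambda item: item[2], reverse=True)


# Toy corpus, invented for illustration: 16 tokens in total.
toy = ["be", "glad"] + ["x"] * 14
print(collocates(toy, "be", span=4, min_freq=1, min_mi=3.0))
```

In the toy corpus the filler word "x" occurs near the node but is frequent everywhere, so its MI falls below the threshold, while the rare word "glad" passes. With the study's settings the call would use `span=4, min_freq=30, min_mi=3.0`.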
Taking all the above statistically significant adjective collocates together, we can see that the pattern of “be + adjective” is the dominant phraseological pattern that is connected to the noted quality of Jane Austen’s writing style, i.e. her works mainly involve characters’ judging someone or something.

Unlike “be”, the most predominant group of collocates of “been”, the other verb form of BE ranked among the keyword list, is adverbs, namely “never”, “always”, “ever”, “too”, “so” and “much”. All of these collocates can be used to express a high degree of something. This collocational pattern of “been” and high-degree adverbs is linked to the creation of exaggerated discourse in Jane Austen’s works. As already discussed above, these hyperbolic statements perform crucial textual functions in the novels. For example, they may be a part of a character’s speech, which often betrays the speaker’s extreme, unreliable judgment or insincerity, or articulates an ironic force in the narrative presentation of a character’s thoughts, or the narrative description of some character or event. This is illustrated through the following sample concordance lines:

[Context: Lucy Steele is trying to mislead Elinor into believing that Edward loves her very much.]

- “I have never been able,” continued Lucy, “to give him my picture in return, which I am very much vexed at, for he has been always so anxious to get it!” (Sense and Sensibility)

[Context: Emma is pondering that after she and Frank Churchill have not seen each other for a while, she has lost her feelings for him and he would be upset if he found this out. But, in fact, Emma does not know at all that Frank has always only pretended to feel attached to her.]

- Her own attachment had really subsided into a mere nothing; it was not worth thinking of; -- but if he, who had undoubtedly been always so much the most in love of the two, were to be returning with the same warmth of the sentiment which he had taken away, it would be very distressing. (Emma)

[Context: While other characters go out, Mrs. Rushworth and Mrs. Norris are left at home. However, this is not a problem for Mrs. Norris as she likes to flatter the rich, such as Mrs. Rushworth. In this excerpt, the narrator describes her enjoyment sarcastically.]

- Mrs. Norris had been too well employed to move faster. Whatever cross-accidents had occurred to intercept the pleasure of her nieces, she had found a morning of complete enjoyment (Mansfield Park)

As for the verb “have”, its largest group of collocates is lexical verbs in the past participle form, namely “seen”, “heard”, “known”, “thought”, “given”, “done”, and “made”. This does not appear very surprising, given that the node word is the verb “have” and hence the collocational pattern between “have” and past participle verbs reflects the grammatical construction of the present perfect. However, upon investigation of the list of those significant verb collocates, it is observed that they have in common certain semantic properties; that is, the verbs “seen” and “heard” are verbs of perception and “known” and “thought” are verbs of cognition. The other three verbs, “given”, “done” and “made”, though not conveying meanings of perception or cognition, are in
A Corpus-based Stylistic Study of Jane Austen’s Novels
many cases used with words related to thoughts and feelings. This is illustrated below with the relevant words underlined:

- I think differently now; time and sickness and sorrow have given me other notions. (Persuasion)

- Could she but have given Harriet her feelings about it all? She has talked her into love; but, alas! she was not so easily to be talked out of it. (Emma)

- The letters from town, which a few days before would have made every nerve in Elinor’s body thrill with transport, now arrived […] (Sense and Sensibility)

- But your arts and allurements may, in a moment of infatuation, have made him forget what he owes to himself and to all his family. (Pride and Prejudice)

- But with sense and temper which ought to have made him judge and feel better, he allowed himself great latitude on such points. (Mansfield Park)

- “And now, Henry,” said Miss Tilney, “that you have made us understand each other, you may as well make Miss Morland understand yourself […]” (Northanger Abbey)

The fact that the keyword auxiliary verb “have” tends to collocate with words related to perception, thoughts and feelings serves as linguistic evidence that helps explain why literary critics tend to feel that Jane Austen’s novels are lacking in action.

The collocational pattern of “have” and “done”, however, occurs significantly in the six novels because it is often used as part of the narrative description of conflicts between characters’ words or actions and their thoughts, or of contrasts between what a character speculates and what really happens. This is reflected in the fact that “have done” tends to co-occur with modal verbs, as shown in the sample concordance lines below:

- “Well, Catherine, how do you like my friend Thorpe?” Instead of answering, as she probably would have done, had there been no friendship and no flattery in the case, “I do not like him at all,” she directly replied, “I like him very much; he seems very agreeable.” (Northanger Abbey)

- Had he wished ever to see her again, he need not have waited till this time; he would have done what she could not but believe that in his place she should have done long ago, […] (Persuasion)

- “[…] However, I recollected afterwards that if he had been prevented from going, the wedding need not be put off, for Mr. Darcy might have done as well.” (Pride and Prejudice)

Given such repeated uses of the phraseological pattern “modal verb + have + done” in Jane Austen’s novels, it can be said that the recurrence of this pattern contributes to the interpretation that Jane Austen’s novels deal with differences between appearance and reality.

Discussion

The findings presented above throw light on two important points in relation to the research questions (see above) addressed in the present study: (1) textual patterns and their relationship to meaning in Jane Austen’s novels and (2) assessment of a corpus-driven approach to the study of
literary texts. These two points will be discussed together in this section.

By comparing Jane Austen’s novels with a corpus of their late 18th- and early 19th-century contemporaries and with a corpus of present-day British fictional prose, it is found that the lists of statistically significant lexical items, grammatical categories and semantic fields in Jane Austen’s novels correlate with one another to a large extent. This suggests that, whether we take a lexical, grammatical or semantic perspective, six groups of linguistic features are particularly characteristic of Jane Austen’s novels, namely:

(1) words related to a high degree,

(2) modal auxiliary verbs,

(3) the auxiliary verbs BE and HAVE,

(4) words related to internal states of mind,

(5) words related to family relationships,

(6) words related to women.

In the light of what has been discussed in literary studies, some of the above corpus-driven findings, especially groups (4) – (6), cannot be said to be totally new or to throw fresh light on stylistic features of Jane Austen’s novels, since they have already been mentioned, to a greater or lesser extent, by literary critics. In fact, to the best of my knowledge, only group (3), “the auxiliary verbs BE and HAVE”, has not been discussed in any previous study of Jane Austen. On the surface, this might be taken to mean that a corpus approach is not very helpful in the study of literature, in this case Jane Austen’s novels, since important textual features can be observed through scholars’ close readings. However, we should not forget that those claims are intuition-based and that they can now be validated (or refuted) through the findings derived from a quantitative comparative method. In short, even where it does not provide a totally new set of findings or generate new points of discussion, a corpus approach can provide statistically significant textual evidence that helps support or refute claims made in literary studies.

Having said that, I must admit that I have some reservations about the value of some textual evidence derived from the corpus-driven method. Looking, for example, at the density of words related to “women” and “family relations”, I cannot help but wonder whether we need a corpus approach to explain that Jane Austen’s novels deal with women and their families. Is it a worthwhile effort to compile a corpus and reference corpora and to conduct a statistical analysis, just to find that Jane Austen’s novels are primarily concerned with women and family matters? I personally feel that only some sets of corpus findings, to be discussed below, can be of value to literary criticism, while others, such as the keyness of words related to women, seem to point to aspects that are too general or superficial for literary discussion. This may be because, unlike academic or other informative texts, literary texts are expressive texts, whose thematic meanings are not always conveyed straightforwardly through the words that appear on the surface of the text. The keywords that are content words, which generally indicate the “aboutness” of a text, as Scott and Tribble (2006) state, do not seem to be of much value to a stylistic study of literary texts.
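To make the notion of keyness used in this discussion concrete, the following is a minimal Python sketch of a keyword comparison between a study corpus and a reference corpus. It assumes the log-likelihood statistic (G2) and a cut-off of 6.63 (p < 0.01 at one degree of freedom); this section does not state which statistic or threshold was applied, and the function names and toy token lists are hypothetical illustrations rather than the procedure or software settings used in this study.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 for a word seen a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens."""
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    total = 0.0
    if a > 0:
        total += a * math.log(a / expected_study)
    if b > 0:
        total += b * math.log(b / expected_ref)
    return 2 * total


def keywords(study_tokens, reference_tokens, cutoff=6.63):
    """Return (word, G2) pairs overused in the study corpus, highest G2 first.

    The cutoff value is an assumption made for illustration only.
    """
    study = Counter(study_tokens)
    ref = Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    results = []
    for word, a in study.items():
        b = ref.get(word, 0)
        score = log_likelihood(a, b, c, d)
        # Keep only words that are relatively more frequent in the study corpus.
        if score >= cutoff and a / c > b / d:
            results.append((word, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): "always" is overused in the study corpus.
toy_study = ["always"] * 20 + ["the"] * 80
toy_reference = ["always"] * 5 + ["the"] * 95
toy_keywords = keywords(toy_study, toy_reference)
```

A list produced this way only ranks words by how unusual their frequency is; as the discussion above stresses, deciding whether a high-ranking item such as a word related to women is stylistically interesting remains the analyst’s task.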
In my opinion, the more valuable findings that a corpus-informed method yields are those that involve function words or semi-grammatical lexical items, such as the items in groups (1) – (3). This is because findings of this kind can hardly ever be detected even by the most careful readers. To me, the statistical significance of the auxiliary verbs “be”, “been” and “have” and their collocational patterns in the texts is a prime example of the value of a corpus-driven method in shedding light on linguistic features and patterns that influence our interpretations but are very likely to escape scholarly attention.

This also applies to the linguistic categories that have been less observed by literary critics. While the high-degree words are rarely mentioned in literary criticism of Jane Austen (and when they are, they are often mentioned only in passing), it is the corpus-driven approach in this study that enables us to find out that these apparently small words turn out to be the most distinctive stylistic feature of Jane Austen’s writing. Moreover, while the items mentioned by critics are all marked intensifiers, such as “vastly” and “perfectly”, the corpus-driven approach in this study has revealed that it is not simply intensifiers but also other sorts of words denoting a high degree, e.g. adverbs of frequency like “always” and “never” and indefinite pronouns like “everybody” and “anyone”, that are used significantly and, more importantly, in close proximity to one another. In other words, the corpus approach has shown that it is not just the use of intensifiers but also the close proximity of high-degree words of various kinds that serves as a tool for the author to create and hint at meanings between the lines in her novels.

In the case of modal auxiliary verbs, though they have been studied by Burrows (1986), the corpus-driven approach has enabled us to explicate their roles in Jane Austen’s novels in a more precise and refined manner. It shows patterns of co-occurrence between “could” and “not” or “must” and “be”, which in turn help illuminate the instantiation of thematic ideas in her novels.

Based on my observation of the set of findings from the study, it can be said that a corpus-driven approach is of value to a stylistic analysis of literary texts at varying levels. At the most basic level, it can be a “supporting actor” in literary or stylistic research, yielding quantitative textual evidence that supports or refutes intuitive interpretations. More than that, a corpus approach can be the “main actor” that unearths linguistic or stylistic features that even a well-trained reader could hardly think of when explaining interpretative issues. Finally, while it has been widely acknowledged that a corpus approach seems to be the only method analysts can resort to if a whole work of fictional prose or a group of literary works is the object of study, the present study has found that the corpus-driven approach can also be of great help for a detailed manual analysis of part of the whole text, since it can draw our attention to those parts of a literary work that are worth further detailed investigation, as illustrated in the analysis of high-degree words.

Despite such potential, the comparative corpus-driven technique has certain limitations. First, since corpus linguistics holds that the more frequent linguistic patterns are, the more significant they are, an application of a corpus-driven approach in stylistics relies heavily on quantitative
value, when, in fact, items or patterns that lie above the cut-off point may not be of much importance in a literary text. For example, as discussed above, the keyness of words related to women and family ranks very high and yet these words are not very illuminating textual features when it comes to an academic discussion of Jane Austen’s works. On the other hand, items that are below a statistical cut-off point can be significant to an analysis of a literary text. In fact, as Leech and Short (1981) argue, a word that occurs only once in a text may be of great significance to the text under investigation. Second, the corpus approach allows us to explore mainly lexical items and their patterns in a text. Other linguistic aspects, such as characters’ interactions, can hardly be dealt with from a corpus-stylistic approach. Irony in Jane Austen’s novels is a good example. Although the present study enables us to see that an extensive distribution of various kinds of hyperbolic words plays an important role in creating and interpreting meanings between the lines in Jane Austen’s novels, it is undeniable that the understanding of irony requires more than the recognition of high-degree words. A pragmatic or cognitive perspective must also be involved in a full explanation of ironic statements. In other words, the corpus-driven approach offers an insight into only one aspect of textual features. Finally, a word of caution is in order. The corpus-driven approach simply shows us what is statistically significant in the text. It does not explain to what extent and why a lexical item is significant; that job is the analyst’s.

Conclusion

The present study starts from an observation that while a corpus linguistic technique has been adopted in a number of types and forms of discourse and text analyses, it has been adopted only infrequently in stylistic analysis of literary texts. A corpus-driven approach was therefore applied to an analysis of Jane Austen’s six major novels in order to see how well this method works with literary texts. Although I did not identify in the first place what linguistic features should mark the style of Jane Austen’s works, the corpus-driven method has yielded a set of findings that are useful for the discussion of Jane Austen. Some can help support literary scholars’ observations on Austen’s novels in quantitative terms, while others serve as new linguistic evidence that can enrich the study of the novelist. However, although reliance on a computer makes it possible to investigate a group of literary texts and provides satisfactory results, it is only part of the story; the analyst’s understanding of the text in question still plays a central role in explaining and evaluating those corpus-informed findings.

References

Austen, Jane. Emma. London: Penguin Books, 1994.

---. Mansfield Park. Harmondsworth, Middlesex: Penguin Books, 1985.

---. Northanger Abbey. New York: Bantam Books, 1989.

---. Persuasion. New York: Bantam Books, 1989.

---. Pride and Prejudice. London: Penguin Books, 1994.

---. Sense and Sensibility. London: Penguin Books, 1994.
Baker, Paul. 2006. Using Corpora in Discourse Analysis. London/New York: Continuum.

Biber, Douglas and Susan Conrad. 1999. Lexical bundles in conversation and academic prose. In Out of corpora, edited by H. Hasselgard & S. Oksefjell, pp. 181-190. Amsterdam-Atlanta GA: Rodopi.

Biber, Douglas. 2011. Corpus linguistics and the study of literature: Back to the future? Scientific Study of Literature 1(1), 15-23.

Booth, Wayne. 1991. Control of Distance in Jane Austen’s Emma. In Jane Austen: Emma (A Casebook), edited by D. Lodge. London: Macmillan Press, 137-55.

Burrows, J. F. 1986. Modal verbs and moral principles: An aspect of Jane Austen’s style. Literary and Linguistic Computing 1, 9–23.

Church, Kenneth and Patrick Hanks. 1990. Word Association Norms, Mutual Information, and Lexicography. Computational Linguistics 16 (1), 22-29.

Enkvist, Nils. 1973. Linguistic Stylistics. The Hague: Mouton.

Fish, Stanley. 1996. What is Stylistics and Why are They Saying Such Terrible Things About It? In The Stylistics Reader, edited by J. Weber, 94-116. London: Arnold.

Fletcher, William. 2003. April 12, 2011 <http://phrasesinenglish.org/>

Hoey, Michael. 1991. Patterns of Lexis in Text. Oxford: Oxford UP.

Leech, G. and Short, M. 1981. Style in Fiction. London: Longman.

Leech, G., Rayson, P. and Andrew Wilson. 2001. Word Frequencies in Written and Spoken English Based on the British National Corpus. London: Longman.

Mahlberg, Michaela. 2005. English General Nouns: A Corpus Theoretical Approach. Amsterdam: John Benjamins.

--- 2007. Clusters, key clusters and local textual functions in Dickens. Corpora, 2(1), 1-31.

McMaster, Juliet. 1996. Jane Austen the Novelist: Essays Past and Present. Basingstoke: Macmillan.

Moon, R. and Carmen R. Caldas-Coulthard. 2010. ‘Curvy, hunky, kinky’: using corpora as tools for critical analysis. Discourse & Society, 21(2) 99–133.

Mudrick, M. 1952. Jane Austen: Irony as Defense and Discovery. Princeton: Princeton UP.

Page, Norman. 1972. The Language of Jane Austen. Oxford: Basil Blackwell.

Rayson, Paul. 2007. Wmatrix: a web-based corpus processing environment, Computing Department, Lancaster University, January 23, 2011. <http://www.comp.lancs.ac.uk/ucrel/wmatrix/>