
Formal Representation and the Digital Humanities

Edited by Paola Cotticelli-Kurras and Federico Giusfredi

This book first published 2018

Cambridge Scholars Publishing

Lady Stephenson Library, Newcastle upon Tyne, NE6 2PA, UK

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

Copyright © 2018 by Paola Cotticelli-Kurras, Federico Giusfredi
and contributors

All rights for this book reserved. No part of this book may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording or otherwise, without
the prior permission of the copyright owner.

ISBN (10): 1-5275-0803-X

ISBN (13): 978-1-5275-0803-3

ON SONORITY AND ACCENT IN TOCHARIAN B¹

HANNES A. FELLNER
AND BERNHARD KOLLER

Abstract
Since early 2011 the Linguistics Department at the University of Vienna
has hosted a project to create an electronic edition of all available
Tocharian manuscript fragments combined with a linguistic database (A
Comprehensive Edition of Tocharian Manuscripts [CEToM]:
www.univie.ac.at/tocharian). The present study demonstrates how the
CEToM database can effectively be employed in the study of Tocharian B
phonology, specifically its accentual system. Jasanoff (2015) argued that
the location of the stress accent is the result of sonority-based principles.
The eventual goal of our ongoing study is to test the predictions of
Jasanoff’s proposal on the entire Tocharian B nominal system. To that end
we present an early version of an algorithm implemented in Perl that
automatically determines the underlying accent of nominal stems.

1. Tocharian
Tocharian is a branch of the Indo-European language family.² It consists
of two languages, designated Tocharian A and Tocharian B. The
Tocharian linguistic material was discovered at the turn from the 19th to
the 20th century by Central Asian expeditions of the major political
powers of that time in today’s Xinjiang Uyghur Autonomous Region of the
People’s Republic of China.³ It is still under debate who the Tocharians
were, whence they came, and when they moved into the territory from
which the evidence of their language hails. It is also not clear which
ethnonyms attested in the Central Asian, Chinese, and Western sources
refer to the Tocharians.⁴

¹ We would like to thank the participants of the workshop Formal Representation
and Digital Humanities for useful comments and suggestions on the present
material, as well as Anna Pagé for providing comments and corrections on a draft
of this paper. The usual disclaimer applies. Research for this paper has been
supported by the Austrian Science Fund (FWF): project number Y 492-G20.
² See Fortson (2009) for a comprehensive overview and introduction to
Indo-European languages and linguistics.

The affiliation and relationship of Tocharian with other Indo-European
branches remains an unresolved issue. Based mainly on certain linguistic
and archaeological considerations some scholars have advanced the
hypothesis that, like the Anatolian branch, Tocharian split off from the
rest of the Indo-European family rather early.⁵ The Tocharian languages
are known from paper manuscript fragments, wooden tablets, and various
inscriptions mainly written in a special Central Asian variant of the Indian
Brahmi script.⁶ The oldest Tocharian fragments date to the 4th century and
the youngest to the first centuries of the second millennium CE.⁷ The
extant manuscript fragments are scattered in collections in Berlin, Paris,
London, St. Petersburg, and various places in situ.⁸

The vast majority of Tocharian texts are Buddhist in nature. Because their
early date of attestation makes Tocharian the oldest extant non-Indian
Buddhist language, they have a bearing on the question of the spread of
Buddhism along the ancient Silk Road to China. The Tocharian tradition,
though fragmentary, belongs to the oldest extant Buddhist literature. While
many texts are translations or adaptations of Sanskrit or Prakrit originals,
Buddhist drama enjoyed popularity among the Tocharians and this is
where Tocharian literature made original contributions to Buddhism.⁹
There is also some secular literature, such as medicinal and grammatical
texts, commercial exchanges, letters, and caravan passes.

Aside from its historic significance, Tocharian now plays an increasingly
important role in Indo-European historical linguistics after having largely
been neglected due to its philological intricacies. These resulted from
problems posed by the attestation of Tocharian texts and complexities
concerning the histories of the individual manuscript collections.

³ See Fellner (2007) for an overview of the expeditions.
⁴ See Mallory (2015), with further references.
⁵ See Anthony and Ringe (2015), with further references.
⁶ See Sander (1968).
⁷ See Malzahn (2007b).
⁸ See Malzahn (2007c).
⁹ See Pinault (2016).

Despite some great scholarly achievements during the first decades after
the discovery of the Tocharian languages, until twenty years ago only a
very few experts had access to manuscripts. In the past the virtual non-
availability of the Tocharian linguistic material stood in the way of
thorough linguistic research. Therefore, up until recently, there were only
a few text editions covering merely a fraction of the attested corpus. This
resulted in incoherent and non-exhaustive handbooks, which in turn made
Tocharian the most understudied major branch of Indo-European. As a
consequence, studies of the history, culture, and religion of the Tocharians
based on the linguistic material were almost impossible for non-specialists.

Only in recent years have these problems started to be overcome by
modern dictionaries,¹⁰ linguistic introductions¹¹ and research guides,¹²
linguistic monographs¹³ and an international journal specifically dedicated
to Tocharian Studies.¹⁴

Tocharian has therefore become increasingly relevant within
Indo-European linguistics,¹⁵ and other fields, including Buddhist studies, have
also started to recognize the importance of Tocharian within the context of
Silk Road and Central Asian socio-economic, cultural and linguistic
history.¹⁶ With the online database A Comprehensive Edition of Tocharian
Manuscripts (CEToM), Tocharian in all its aspects has become one of the
most accessible Indo-European branches.

2. CEToM
Since early 2011 the Linguistics Department at the University of Vienna
has hosted the CEToM project to create an electronic edition of all
available Tocharian material combined with a linguistic database. This
project was generously funded by the START Program of the Austrian
Science Fund (FWF) (project number Y492) from February 2011 to
January 2017. It is currently maintained by the Department of Linguistics,
University of Vienna. The system is programmed in Perl 5.¹⁷ The output is
generated using HTML 4.01 Transitional¹⁸ and CSS 2.1¹⁹ and encoded in
Unicode UTF-8.²⁰

¹⁰ See, e.g., Adams (2013); Carling, Pinault, Winter (2009).
¹¹ Pinault (2008).
¹² Malzahn (2007a).
¹³ See, e.g., Ringe (1996); Peyrot (2008); Malzahn (2010).
¹⁴ Tocharian and Indo-European Studies, Museum Tusculanum Press, established
by the late Jörundur Hilmarsson (University of Iceland) and currently edited by
Birgit Anette Olsen (University of Copenhagen), Georges-Jean Pinault (École
Pratique des Hautes Études, Paris), Michaël Peyrot (Leiden University), and
Thomas Olander (University of Copenhagen).
¹⁵ See, e.g., the treatment of the verbal system in Jasanoff (2003: 144-214).
¹⁶ See, e.g., the contributions in Malzahn et al. (2015).

The specific aim of CEToM is to provide an integrated corpus of already
published, as well as still unpublished, Tocharian texts. While the
traditional editions are often lacking translations and photographs of the
manuscripts, in recent years images of many fragments have been made
available online, first on Thesaurus Indogermanischer Text- und
Sprachmaterialien (TITUS)²¹ and later also on the International
Dunhuang Project (IDP).²² Likewise, many of the old editions have been
digitised, on TITUS and in Gerd Carling’s Thesaurus of Tocharian A (in
progress), and new fragments have been edited both on TITUS and on
IDP. Starting from these existing editions, both printed and online, the aim
of CEToM is to provide a comprehensive edition that uses a unified
notation, provides information about chronological layer and script type,
and adds translation, commentary, a systematic metrical analysis, and a
bibliography.

In addition, the texts are analysed grammatically and, as far as possible,
the morphological characteristics of each word are determined. In this
way, the text corpus is made accessible for any kind of lexical or
grammatical search. It will be possible, for instance, to search for all
genitive plural forms, or for 2sg. finite verbal forms. In the advanced
search function it will also be possible to refine a search by limiting it to
archaic texts, metrical texts, texts from a certain find spot or genre etc., or
a combination thereof.

The combination of an electronic edition with grammatical information
and a sophisticated search function will not only be useful for linguistic
inquiries, but will also be of essential help in the decipherment of
problematic unedited fragments and the identification of Buddhist text
genres.

¹⁷ http://www.perl.org/.
¹⁸ http://www.w3.org/TR/html4/.
¹⁹ http://www.w3.org/Style/CSS/.
²⁰ http://www.unicode.org/.
²¹ http://titus.fkidg1.uni-frankfurt.de/framee.htm?/index.htm.
²² http://idp.bl.uk/.

The parameters for a given text are: press mark(s); provenience (main find
spot, specific find spot, expedition code, collection); language and script
(language, linguistic stage, additional linguistic characteristics, script); text
contents (title of the work, passage, manuscript, text genre, text subgenre,
verse/prose, parallel text); object (manuscript, leaf number, material, form,
accessibility, size, completeness, number of lines, line distance); image;
manuscript remarks, transliteration; transcription; translation; philological
commentary; linguistic commentary; parallel text commentary; references;
editor; bibliography. The lexical database is a thesaurus and provides the
relevant morphological information on all parts of speech. By now,
CEToM contains entries for more than 10000 manuscript fragments,
13000 lexical items, and 1200 bibliographical items.

Future plans for the database include an etymological dictionary, a
syntactic treebank, and digital palaeography.

3. Tocharian B Stress Accent


The present study employs the CEToM database to investigate Tocharian
B stress accent, specifically a proposal about exceptions to the general
rules of Tocharian B accent made by Jasanoff (2015). Before moving on to
Jasanoff’s specific claims, we provide an overview of the general rules of
Classical Tocharian B stress accent.

In Tocharian B the location of the word accent can be inferred from vowel
alternations, specifically, the behavior of the central vowel phonemes /ə/
and /a/. These underlying segments show different realizations depending
on whether they are accented or not. The underlying segment /ə/ is
rendered by <a> [ ] if accented and by <ä> [ ] if unaccented. The
underlying segment /a/ is rendered by <ā> [a] if accented and by <a> [ ] if
unaccented. In the Tocharian version of the Brāhmī script the difference
between <ä> [ ], <a> [ ], and <ā> [ ] is expressed by the use of different
characters.

Underlying segment   Accented   Unaccented
/ə/                  <a> [ ]    <ä> [ ]
/a/                  <ā> [a]    <a> [ ]

Table 1.
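
The correspondences of Table 1 can be restated as a small lookup. The following is a minimal Python sketch written for this purpose only; it is not the Perl implementation referred to in this paper, and the names SPELLING, spell and readings are hypothetical. It makes explicit why <a> is the only ambiguous grapheme: it is the one spelling that appears in two cells of the table.

```python
# Classical Tocharian B spelling of the two central vowels (Table 1).
# Key: (underlying segment, accented?)  ->  grapheme.
SPELLING = {
    ("ə", True): "a",
    ("ə", False): "ä",
    ("a", True): "ā",
    ("a", False): "a",
}


def spell(segment: str, accented: bool) -> str:
    """Return the grapheme used for an underlying central vowel."""
    return SPELLING[(segment, accented)]


def readings(grapheme: str) -> list:
    """Return every (segment, accented) pair a grapheme can stand for."""
    return [key for key, value in SPELLING.items() if value == grapheme]
```

Under this sketch, readings("ā") and readings("ä") each return a single pair, whereas readings("a") returns two, accented /ə/ and unaccented /a/.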

The vowel <ä> [ ] is always unaccented (at least in Classical Tocharian
B); the vowel <ā> [a] is always accented. The accent properties of <a> [ ]
can be determined based on the alternation it displays across
morphologically related forms (both within and across paradigms). The
basic generalization for Tocharian B accent, as first observed by Krause
(1952: 10; cf. also Krause and Thomas 1960: 43), is that disyllabic words
bear the accent on the initial syllable, whereas polysyllabic words usually
bear the accent on the second syllable.

Disyllabic word                       Polysyllabic word
tarkär [ t rk r] NOM.SG ‘cloud’       tärkarwa [t r k rw ] NOM.PL ‘clouds’
ost [ ost ] NOM.SG ‘house’            ostameṃ [os t men] ABL.SG ‘from the house’
śarsa [ rs ] 3SG.PRET ‘s/he knew’     śärsāre [ r sare] 3PL.PRET ‘they knew’

Table 2.

Krause specifically described this pattern as involving weakening of the
initial vowel in words of more than two syllables. This interpretation of
the data was challenged by Marggraf (1970), who argued that the stems
that show the kind of vowel alternation observed by Krause are
underlyingly accented on the second (i.e. final) syllable. He then
established the rule that if the lexical accent on a morpheme would end up
on the final syllable, it is retracted onto the preceding syllable instead. As
a complicating factor, a number of morphemes end in an underlying
sequence of /Cə/. In Tocharian B a final /ə/ generally does not surface.²³
Yet, an underlying word-final /ə/ counts as an additional syllable peak
with respect to the structural description of Marggraf’s rule, resulting in
surface forms with the accent on the final syllable.

   Accent retraction   Accent in situ         Underlying form
a) kante               känte-nma-sa PERL.PL   /kənté/ + /nma/ + /sa/ ‘hundred’
b) māka                makā-ntso GEN.PL       /maká/ + /ntsə/ ‘much/many’
c) māka                makā-ṁts GEN.PL        /maká/ + /ntsə/ ‘much/many’

Table 3.

²³ More precisely, the final /ə/ is deleted in prose context but can be realized as [ ]
or [o] in metrical contexts (for details see Malzahn 2012).

In the first column of table 3 the lexical accent has been retracted from its
underlying position in order to avoid accenting the word-final syllable. In
the second column the same stem is followed by an unaccented suffix and
the accent can therefore remain in its original position. Note that the
forms in rows b) and c) involve the same underlying representations, the
only difference being that the final /ə/ of the genitive plural morpheme
/ntsə/ is realized as [o] in b) and deleted in c). Yet neither form undergoes
accent retraction: in both, the accent remains in situ.
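
Marggraf's rule, as summarized above, can be sketched as a function over underlying syllable peaks. This is an illustrative Python sketch, not the authors' Perl code; the function name and the 0-based indexing are assumptions, and so is the treatment of monosyllables (which have no preceding syllable to retract to). The essential point is that the syllable count includes a word-final /ə/ even when it does not surface.

```python
def surface_accent(n_underlying_peaks: int, lexical_accent: int) -> int:
    """Apply accent retraction in the sense of Marggraf (1970).

    n_underlying_peaks: number of underlying syllable peaks, counting a
        word-final /ə/ even if it is deleted on the surface.
    lexical_accent: 0-based index of the underlyingly accented peak.
    Returns the 0-based index of the peak accented on the surface.
    """
    if not 0 <= lexical_accent < n_underlying_peaks:
        raise ValueError("accent must fall on an existing syllable peak")
    on_final_peak = lexical_accent == n_underlying_peaks - 1
    # Retract only if there is a preceding peak to retract to (assumption).
    if on_final_peak and lexical_accent > 0:
        return lexical_accent - 1
    return lexical_accent
```

For the forms of Table 3: kante and māka have two peaks with the lexical accent on the second, so the accent is retracted to the first; känte-nma-sa has four peaks and keeps the accent in place; makā-ṁts has three underlying peaks (/ma/, /ka/, /ntsə/), so the accent on the second peak is not final and stays where it is.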

A clear advantage of Marggraf’s account over Krause’s is that it correctly
captures the rare cases in which we find vowel alternations that do not
involve the first two syllables.

Accent retraction   Accent in situ   Underlying form
eneṅka              eneṅkā-ññeṃ      /enenká/ ‘inside’

Table 4.

According to Marggraf’s model, we simply need to assume that eneṅka
bears underlying accent on the final syllable, which must be retracted if
the stem is realized in isolation. Krause’s description, according to which
words are either accented on the first or the second syllable, has nothing to
say about these cases.

Additionally, there are a number of polysyllabic forms that deviate from
the general patterns outlined above by bearing the accent on the initial
syllable. Among these apparent exceptions is a group of forms that,
according to Malzahn (2010: 6), “have in common that vowel of the initial
syllable is a full vowel such as ā or *æ > TB e and […] the vowel of the
following, second, syllable is, or was, (*)ä.” One example is the preterite
participle eṅku, pl. eṅkoṣ ‘seized’ from Proto-Tocharian *ǽnkəwə and
*ǽnkəwæṣə, respectively.

Word (pret. ptcp.)          Expected form          Underlying form
yāmu [ yamu] M.SG           †yamáu [y m u]         / yaməwə/
yāmoṣ [ yamo ] M.PL         †yamáweṣ [y m we ]     / yaməwæṣə/
yāmuwa [ yamuw ] F.SG       †yamáwa [y m wa]       / yaməwa/

Table 5.

4. Jasanoff’s Weight-to-Stress Principle


Recently, Jasanoff (2015) – focusing on forms in or associated with the
TB verbal system – has argued that the synchronic pattern observed by
Malzahn constitutes the reflex of a more pervasive phonological Weight-
to-Stress principle – if heavy, then stressed (Prince 1990) – operative in a
prehistoric stage of the Tocharian languages. According to Jasanoff (2015:
90), Tocharian B accent developed via two stages: “1) replacement of the
PIE accentual system by a system of initial accent; 2) advancement of the
stress accent one syllable rightwards in words of three or more syllables,
except in sequences of the form *-ÁC0ə- (i.e., sequences in which the first
syllable contained a “full” (= non-high) vowel and the second contained a
schwa or schwa-antecedent (*i, *u, *e, *R̥ ))”. The non-advancement of the
accent due to the condition specified under 2) Jasanoff calls the “yāmu-
rule”.

Stage I      Stage II     Toch. B                           Comment
*lə́klæ       *lə́klæ       lákle NOM.SG ‘pain’               disyllable, normal non-advancement
*lə́klænta    *ləklǽnta    läklénta NOM.PL ‘pains’           trisyllable, normal advancement
*lə́təwə      *lətə́wə      ltú PRET.PTCP.NOM.SG              trisyllable, normal advancement
                          ‘having gone’
*yáməwə      *yáməwə      yā́mu ‘having done’                trisyllable, advancement barred by yāmu-rule

Table 6.
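
The two stages quoted from Jasanoff (2015) can be sketched as a function over the syllable nuclei of a Stage I form. This Python sketch is illustrative only: the function name is hypothetical, and the classification of vowels (schwa and the high vowels i, u count as non-full, everything else as full) is a simplifying assumption based on the gloss "full (= non-high)" in the quotation.

```python
SCHWA = "ə"
HIGH = {"i", "u"}  # assumption: the only high vowels considered here


def stage2_accent(nuclei: list) -> int:
    """Return the 0-based index of the accented syllable after Stage II.

    nuclei: the syllable nuclei of a Stage I form, which by definition
    carries the accent on its first syllable.
    """
    if not nuclei:
        raise ValueError("a word needs at least one syllable")
    if len(nuclei) < 3:
        return 0  # no advancement in words of fewer than three syllables
    first_is_full = nuclei[0] != SCHWA and nuclei[0] not in HIGH
    if first_is_full and nuclei[1] == SCHWA:
        return 0  # advancement barred: the yāmu-rule
    return 1  # normal advancement by one syllable
```

Applied to the rows of Table 6, the nuclei of *lə́klæ (ə, æ) give index 0, those of *lə́klænta (ə, æ, a) and *lə́təwə (ə, ə, ə) give index 1, and those of *yáməwə (a, ə, ə) give index 0.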

Apart from the nominal forms discussed above, Jasanoff built his
hypothesis about the development of the Tocharian B accent system
primarily on the verbal system, where underlyingly initial accent occurs
systematically in a number of categories. Since, according to him, Weight-
to-Stress was a purely phonological principle operating in Pre-Tocharian,
we would expect to find some reflexes of it within the nominal system as
well. The most obvious way such a reflex could manifest itself would be if
nominal stems with a full vowel in the first syllable and a schwa in the
second syllable bear initial accent more frequently than other types of
stems. Determining whether this prediction is actually borne out within the
attested corpus of Tocharian B requires parsing every single nominal
paradigm within Tocharian B in order to determine the underlying accent
of the stem (wherever possible). The purpose of the present study is to
explore to what extent the electronic corpus can be employed in order to
automate this task and to demonstrate the potential use of CEToM as a
tool for linguistic analysis. As the CEToM dictionary is still a work in
progress, the current paper merely aims to illustrate the methodology
employed rather than to provide definitive linguistic results, a task that
will have to await the inclusion of additional data in the dictionary, both in
terms of lexical entries and additional lexical information on existing
entries (such as loan word status, on which see below).

5. Approach
We employed an algorithm implemented in Perl 5 to automatically parse
the CEToM dictionary in order to determine the location of the accent in
nominal and adjectival forms. The goal is to be able to automatically
retrieve stems with the relevant structure (i.e. heavy in the first syllable,
light in the second) and initial accent, and compare their distribution with
other types of stems. The corpus we are using for this study comes from
two sources. 2706 nominal forms are taken from the CEToM dictionary,
which is still a work in progress and does not contain all of the forms
attested in the Tocharian B text corpus. Therefore, we increased the data
coverage by adding forms from the Tocharian B dictionary by Douglas
Adams (Adams 1999). The resulting 4562 individual forms are grouped
into 2024 families. We use the term family to refer to a set of
morphologically related forms. This includes forms belonging to the same
paradigm, as well as forms that are related by derivational processes. The
reason we need to operate in terms of families rather than individual stem
forms is that in many cases, the underlying accent location can only be
established by comparing multiple forms of the same stem. Take, for
example, the word for ‘river’, cake, with the genitive ckentse.

Surface Representation   Underlying Representation
cake [ t ke]             /t ke/
ckentse [ t kentse]      /t kentse/

Table 7.

The nominative cake shows that the underlying representation of the word
contains a schwa intervening between the first two consonants. Therefore,
in terms of underlying representations, ckentse is accented on the second
vowel, despite the fact that in both surface representations it is the first
syllable that is accented. In the case of the nominative this is due to
Marggraf’s accent retraction rule, which prohibits final syllables from
being accented on the surface, while in the case of the genitive the vowel
/ə/ undergoes deletion in an unaccented open syllable.

We will now give a brief overview of how lexical entries are structured
within the CEToM database.²⁴ Each form is stored as a separate entry but
contains a reference to the nominative singular of the same paradigm, or
the form it is derived from. The following examples involve two forms of
the paradigm for ‘pain’. The nominative/oblique plural läklenta (second
entry) contains a reference to the nominative singular lakle (first entry),
which doubles as the lemma form of the paradigm as a whole. The
reference is contained in the field <w_lemma>, indicating that the two
forms belong to the same paradigm.

<entry>
<page_name> lakle </page_name>
<w_case> Nominative+Oblique </w_case>
<w_class> Noun </w_class>
<w_gender> Alternant </w_gender>
<w_language> TB </w_language>
<w_meaning> "suffering, pain" </w_meaning>
<w_noun_number> Singular </w_noun_number>
</entry>
<entry>
<w_lemma> lakle </w_lemma>
<page_name> läklenta </page_name>
<w_case> Nominative+Oblique </w_case>
<w_class> Noun </w_class>
<w_language> TB </w_language>
<w_noun_number> Plural </w_noun_number>
</entry>

Similarly, the derived adjective läklessu ‘painful’ contains a reference to
the base form lakle in the field <w_family>, indicating that the former is
derived from the latter.

²⁴ Rather than presenting the CEToM-internal markup we are giving an XML
version thereof, as the reader is likely more familiar with this type of markup. The
internal structure of the entries is the same in either representation, however.

<entry>
<w_family> lakle </w_family>
<page_name> läklessu </page_name>
<w_case> Nominative </w_case>
<w_class> Adjective </w_class>
<w_language> TB </w_language>
<w_noun_number> Singular </w_noun_number>
</entry>
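
Grouping entries of this shape into families can be sketched as follows. This is a Python illustration using the standard library, not the Perl code or the CEToM-internal markup; the wrapping <dictionary> root element, the abridged sample and the function names are assumptions made so that the example is self-contained. Each entry is assigned to the form named in <w_lemma> or <w_family>, or to itself if it has neither field.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abridged entries modelled on the examples above; the <dictionary> root
# element is an assumption needed to obtain a well-formed XML document.
SAMPLE = """<dictionary>
<entry><page_name> lakle </page_name><w_class> Noun </w_class></entry>
<entry><w_lemma> lakle </w_lemma><page_name> läklenta </page_name></entry>
<entry><w_family> lakle </w_family><page_name> läklessu </page_name></entry>
</dictionary>"""


def field(entry, tag):
    """Return the whitespace-trimmed text of a child element, or None."""
    text = entry.findtext(tag)
    return text.strip() if text else None


def group_into_families(xml_text):
    """Map each head form to the list of forms that refer to it."""
    families = defaultdict(list)
    for entry in ET.fromstring(xml_text).iter("entry"):
        form = field(entry, "page_name")
        head = field(entry, "w_lemma") or field(entry, "w_family") or form
        families[head].append(form)
    return dict(families)
```

The sketch resolves references one level deep only. An inflected form of a derived word would point to its own lemma, which in turn points to the base of the family, so a fuller version would have to follow such chains.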

When determining the underlying accent of a set of morphologically
related forms, we can ignore the difference between inflection and
derivation, since derivational morphemes (such as the adjectival suffix
-ṣṣe) are accent-neutral. That is, they do not alter the location of the
underlying accent on the stem (although they can have an impact on the
surface accent by providing an extra syllable to the stem and thereby
preventing the application of accent retraction in otherwise disyllabic
forms).

Let us now look at how our algorithm uses the CEToM data in order to
determine the underlying accent of individual nominal forms. The
following sample derivation illustrates how the underlying accent of the
adjectival stem lāre ‘beloved’ is determined based on a set of related
forms²⁵ from the same family.

Sample Family: lāre ‘beloved’

Base: lāre → lVrV

Form set: {lāre.M.NOM.SG, larona.F.PL, larauñe.NOUN.NOM/OBL.SG}

   Form      surface accent   underlying accent   updated template
1  lāre      láre             ?                   lVrV → larV
2  larona    laróna           non-initial         larV
3  larauñe   laráuñe          non-initial         larV

Table 8.

²⁵ In the interest of space we are only giving a subset of the actually attested forms,
which provide sufficient information to determine the location of the accent.

The nominative singular masculine form lāre functions as the base form of
the family. This form is converted into a CV-template, in which all vowels
are removed and every consonant is followed by a potential vowel
position. The purpose of this template is to keep track of vowel
alternations (specifically between <ä>, <a>, <ā> and zero) and to establish
underlying representations of central vowels. The base form lāre yields the
template lVrV. This template is updated with every parsed form of the
family in order to record alternations between central vowels, which are
often required to establish the underlying representation of the vowel
position. Specifically, an alternation between <a> and <ä>/zero indicates
the underlying segment /ə/ while an alternation between <a> and <ā>
points to /a/. The algorithm cycles through each form of the family and
attempts to determine the underlying accent of the stem based on both
evidence internal to the form itself and information from the current
version of the template. The algorithm first parses the base form of the
family (Table 8) and determines that lāre has initial surface accent based
on the generalization that <ā> is always accented. Since <ā> can only
represent the underlying segment /a/, the template can be updated
accordingly (lVrV → larV). The items in the column labelled surface
accent represent an intermediary representation of the form containing a
hypothesis about the underlying representation of the central vowels and
the location of the surface accent based on the information gathered so far.
In the case of láre this is already the correct analysis. However, the
location of the underlying accent is obscured by Marggraf’s rule. That is,
even if the lexical accent was located on the second syllable, it would have
to be realized on the first syllable due to the ban on final accented surface
syllables. The algorithm then moves on to the next form (8.2). In isolation
the feminine plural form larona is ambiguous in that the accent could be
either on the first or the second syllable, since the vowel <a> either
represents an accented schwa or an unaccented /a/. The current form of the
template resolves this ambiguity and the underlying accent in this form is
correctly determined to be non-initial. The derived nominal larauñe works
exactly the same way. As noted above, the fact that larauñe is related to
lāre via derivation instead of inflection is irrelevant for determining the
accent of the stem.
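
The template mechanism described in this section can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Perl implementation: the vowel inventory is reduced to single letters, forms are matched to the template by a left-to-right search for the stem consonants, and all function names are hypothetical.

```python
import unicodedata

VOWELS = set("aāäeiou")  # simplified vowel inventory (assumption)


def cv_template(base):
    """Drop all vowels and give every consonant an open vowel slot 'V'."""
    base = unicodedata.normalize("NFC", base)
    return [[ch, "V"] for ch in base if ch not in VOWELS]


def show(template):
    """Render a template as a string such as 'lVrV'."""
    return "".join(cons + slot for cons, slot in template)


def vowels_at_slots(form, template):
    """Return the vowel letter found after each template consonant.

    An empty string is recorded where no vowel follows the consonant.
    """
    form = unicodedata.normalize("NFC", form)
    found, pos = [], 0
    for cons, _slot in template:
        pos = form.index(cons, pos) + 1  # ValueError if the stem does not match
        nxt = form[pos:pos + 1]
        found.append(nxt if nxt in VOWELS else "")
    return found


def update(template, form):
    """Record /a/ wherever the form spells <ā>, which is never ambiguous."""
    for slot, grapheme in zip(template, vowels_at_slots(form, template)):
        if grapheme == "ā":
            slot[1] = "a"


def reading_of_a(slot_value):
    """Interpret the grapheme <a> given what the template knows of the slot."""
    known = {"a": "unaccented /a/", "ə": "accented /ə/"}
    return known.get(slot_value, "ambiguous")
```

For the family of lāre, cv_template("lāre") yields lVrV, and update with the base form turns it into larV. The first vowel of larona is <a>; since the template now records /a/ in that slot, reading_of_a returns "unaccented /a/", which excludes initial accent for this form.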

Let us now return to the word for ‘river’, cake, to see how the algorithm
determines the location of schwas that have been deleted on the surface.

Sample Family: cake ‘river’

{cake.NOM/OBL.SG, ckentse.GEN.SG}

Base: cake → cVkV

   Form      surface accent   underlying accent   updated template
1  cake      cə́ke             ?                   cVkV → cəkV
2  ckentse   cəkéntse         non-initial         cəkV

Table 9.

The sample consists again of a subset of the forms attested for the family.
As in lāre the surface accent of cake must be located on the first syllable
due to Marggraf’s rule. From this it follows that <a> must represent
underlying /ə/, which is recorded in the template (cVkV → cəkV). Using
the updated template, the algorithm correctly restores the schwa following
the initial consonant in the genitive ckentse, despite it having undergone
deletion on the surface. Based on this alternation, the underlying accent
can be correctly determined to be non-initial.
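
Restoring a deleted schwa from the template can be sketched as follows. This Python fragment is an illustration only, with hypothetical function names and a simplified vowel inventory; the template is passed in as a list of (consonant, slot) pairs in which the slot is "ə", "a" or the unknown "V". The second function encodes the inference used for ckentse: a schwa that is missing on the surface must have been unaccented.

```python
VOWELS = set("aāäeiouə")  # simplified vowel inventory (assumption)


def restore_schwas(form, template):
    """Re-insert template schwas that are absent from the surface form.

    A schwa is inserted after a template consonant whose slot is 'ə'
    whenever that consonant is not followed by a vowel letter (this
    includes the end of the word).
    """
    out, pos = [], 0
    for cons, slot in template:
        nxt = form.index(cons, pos)
        out.append(form[pos:nxt + 1])  # everything up to and including cons
        pos = nxt + 1
        if slot == "ə" and form[pos:pos + 1] not in VOWELS:
            out.append("ə")
    out.append(form[pos:])  # material after the last stem consonant
    return "".join(out)


def initial_accent_excluded(form, template):
    """True if the first stem vowel is a schwa deleted on the surface."""
    first_cons, first_slot = template[0]
    after = form.index(first_cons) + 1
    return first_slot == "ə" and form[after:after + 1] not in VOWELS
```

With the template cəkV, the genitive ckentse is expanded to cəkentse and initial accent is excluded, while the nominative cake is returned unchanged.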

The family of the noun kercapo ‘donkey’ illustrates two additional aspects
of the method employed.

Sample Family: kercapo ‘donkey’

{kercapo.NOM.SG, kercapai.OBL.SG, kercäpo.NOM.SG (archaic)}

Base: kercapo → kVrVcVpV

   Form       surface accent   underlying accent   updated template
1  kercapo    kerc?po          ?                   kVrVcVpV
2  kercapai   kerc?pai         ?                   kVrVcVpV
3  kercäpo    kercäpo          ?                   kVrVcVpV → kVrVcəpV
2nd pass
4  kercapo    kercə́po          non-initial         kVrVcəpV
5  kercapai   kercə́pai         non-initial         kVrVcəpV
6  kercäpo    kercəpo          ?                   kVrVcəpV

Table 10.

The nominative kercapo is ambiguous with regard to whether its accent is
on the first or the second syllable. There is no indication as to whether
<a> realizes accented /ə/ or unaccented /a/ and, consequently, the
template remains underspecified. The same applies to the oblique form
kercapai. The interesting form is the alternative nominative kercäpo,
attested only in archaic texts. The information regarding the dating of the
text in which the form is attested is crucial, since in archaic texts the rules
for the spelling of central vowels do not apply in the same way as they do
in classical and late texts. For our purposes the relevant difference is that
/ə/ is usually spelled with <ä>, regardless of whether it is accented or not.
This means that the accent of archaic forms cannot be determined. In order
to prevent the algorithm from wrongly analysing kercäpo as bearing initial
accent it needs to be able to identify the form as archaic. This information
is not present within the dictionary and must be retrieved from the
electronic manuscript corpus, where each manuscript fragment is marked
for the diachronic layer it belongs to. Each form in the dictionary is
matched against the entire text corpus in order to determine whether it
belongs to the archaic layer and must therefore receive exceptional
treatment in terms of accent. However, instead of simply discarding
archaic forms from the corpus, they can still be used to establish the
underlying representations of central vowels. In this case the nominative
form kercäpo unambiguously identifies the medial vowel as /ə/,
information that is invaluable for determining the accent of the stem in
classical and late forms. With the final version of the template in place, the
family is processed a second time, during which the nominative and
oblique are reanalysed and correctly identified as bearing non-initial
accent. The accent of the archaic form remains, for the aforementioned
reasons, undetermined.
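
The two-pass treatment of archaic forms can be sketched as follows. This Python fragment is a simplified illustration, not the Perl implementation: each form is given as the list of vowel spellings found at the template slots (an empty string where no vowel follows the consonant) together with an archaic flag, and the function names are hypothetical. As an assumption of the sketch, archaic forms contribute only <ä> to the template, and only the surface accent signalled by the spelling is reported.

```python
def accent_from_spelling(slots, template):
    """Locate the accent signalled by <ā> or by <a> in a known /ə/ slot."""
    syllable = 0
    for grapheme, quality in zip(slots, template):
        if grapheme == "":
            continue  # no vowel after this consonant, hence no syllable
        if grapheme == "ā" or (grapheme == "a" and quality == "ə"):
            return "initial" if syllable == 0 else "non-initial"
        syllable += 1
    return None  # the spelling does not reveal the accent


def analyse_family(forms):
    """forms: list of (name, slot_graphemes, is_archaic) tuples."""
    template = ["V"] * len(forms[0][1])
    # Pass 1: collect vowel qualities; archaic forms contribute <ä> only.
    for _name, slots, archaic in forms:
        for i, grapheme in enumerate(slots):
            if grapheme == "ä":
                template[i] = "ə"
            elif grapheme == "ā" and not archaic:
                template[i] = "a"
    # Pass 2: read the accent off classical and late forms only.
    accents = {}
    for name, slots, archaic in forms:
        accents[name] = None if archaic else accent_from_spelling(slots, template)
    return template, accents


KERCAPO = [
    ("kercapo", ["e", "", "a", "o"], False),
    ("kercapai", ["e", "", "a", "ai"], False),
    ("kercäpo", ["e", "", "ä", "o"], True),
]
```

For the KERCAPO data, the first pass fixes the third slot as /ə/ on the strength of the archaic spelling alone, and the second pass then classifies kercapo and kercapai as non-initial while leaving kercäpo undetermined.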

6. Problems and Outlook


The system presented here represents work in progress and currently does
not yield any reliable generalizations regarding the relationship between
(underlyingly) initial accent and the phonological structure of the stem that
we would expect to find under the proposal of Jasanoff (2015). This is
because there are still too many forms whose accent cannot be correctly
determined by the algorithm due to a number of factors. Foremost among
these is the large number of Sanskrit loanwords occurring in Tocharian
texts, which frequently do not adhere to the general rules governing the
interaction between accent and central vowel alternations. For example,
the noun anāgāme (from Sanskrit anāgāmin-) ‘one destined to return no
more to this world’ (Adams 2013: 12) contains two instances of <ā>,
conflicting with the general rule that <ā> corresponds to accented /a/
(under the assumption that words only bear a single accent). Many other
cases do not have as obvious a tell as anāgāme that they deviate from the
general accent pattern but are, due to their status as loanwords, still
unreliable as evidence for inherited accentual properties. Before arriving at
any definitive results, it is therefore necessary to enrich the dictionary with
information regarding borrowing.
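
Until such information is available in the dictionary, the obvious cases can at least be flagged mechanically. The following Python sketch is a heuristic suggested by the anāgāme example, not part of the system described here, and the function name is hypothetical. It catches only forms with more than one <ā> and says nothing about loanwords that happen to conform to the native spelling rules.

```python
import unicodedata


def has_multiple_long_a(form: str) -> bool:
    """Flag forms with more than one <ā>.

    Under the assumption of a single accent per word, such forms cannot
    follow the native rule that <ā> spells accented /a/.
    """
    return unicodedata.normalize("NFC", form).count("ā") > 1
```

The check returns True for anāgāme and False for lāre.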

The applications of this system go well beyond the current research
question. In addition to refining the system for nominal forms by
expanding the electronic dictionary, we are planning to expand the scope
of the algorithm to verbal forms and to add the automatically generated
phonological representations to the CEToM dictionary. This will facilitate
the phonological study of Tocharian B by making it possible to search for
accented vowels, as well as accentual patterns. Furthermore, it will be
possible to search the dictionary for forms containing the underlying
segments /a/ and /ə/, independent of their surface realizations.

Bibliography
Adams D.Q., 1999, A Dictionary of Tocharian B, (Leiden Studies in
Indo-European 10), Rodopi, Amsterdam/Atlanta.
—. 2013, A Dictionary of Tocharian B, 2nd edition, (Leiden Studies in
Indo-European 10), Rodopi, Amsterdam/New York.
Anthony D.W. and Ringe D.A., 2015, “The Indo-European Homeland
from Linguistic and Archaeological Perspectives”, in Annual Review of
Linguistics 1, 199-219.
Carling G., Pinault G.-J. and Winter W., 2009, A Dictionary and
Thesaurus of Tocharian A. Volume 1: letters a-j. Harrassowitz,
Wiesbaden.
CEToM Comprehensive Edition of Tocharian Manuscripts:
univie.ac.at/tocharian.
Fellner H.A., 2007, “The Expeditions to Tocharistan”, in Malzahn M.
(ed.), Instrumenta Tocharica, Winter, Heidelberg, 13-36.
Fortson B.W., 2009, Indo-European Language and Culture: An
Introduction, 2nd edition, Blackwell, Malden, Mass.
IDP International Dunhuang Project: idp.bl.uk.
Jasanoff J.H., 2003, Hittite and the Indo-European verb, Oxford
University Press, Oxford.
Jasanoff J.H., 2015, “The Tocharian B accent”, in Malzahn M. (ed.), Tocharian Texts
in Context, International Conference on Tocharian Manuscripts and
Silk Road Culture held in June 26-28, 2013 in Vienna, Hempen,
Bremen, 87-98.
Krause W., 1952, Westtocharische Grammatik, Band I. Das Verbum,
Winter, Heidelberg.
Krause W. and Thomas W., 1960, Tocharisches Elementarbuch, Band I.
Grammatik, Winter, Heidelberg.
Mallory J.P., 2015, “The problem of Tocharian origins: an archaeological
perspective”, in Sino-Platonic Papers 259.
Malzahn M. (ed.), 2007a, Instrumenta Tocharica, Winter, Heidelberg.
—. 2007b, “The most Archaic Manuscripts of Tocharian B and the
Varieties of the Tocharian B Language”, in Malzahn M. (ed.),
Instrumenta Tocharica, Winter, Heidelberg, 255-297.
—. 2007c, “Tocharian Texts and Where to Find Them”, in Malzahn M.
(ed.), Instrumenta Tocharica, Winter, Heidelberg, 79-112.
—. 2010, The Tocharian Verbal System, Brill, Leiden/Boston.
Malzahn M., 2012, “Now you see it, now you don’t – Bewegliches o in Tocharisch
B”, in Hackstein O. and Kim R.I. (eds.), Linguistic developments along
the Silk Road: Archaism and Innovation in Tocharian, (Philosophisch-
Historische Klasse, Sitzungsberichte, 834. Band. Iranische Onomastik,
Nr. 12. Multilingualism and History of Knowledge, vol. II.),
Österreichische Akademie der Wissenschaften, Wien, 33-82.
Malzahn M., Peyrot M., Fellner H.A. and Illés T.-S. (eds.), 2015, Tocharian
Texts in Context. International Conference on Tocharian Manuscripts
and Silk Road Culture, Hempen, Bremen.
Marggraf W.-J., 1970, Untersuchungen zum Akzent in Tocharisch B,
doctoral dissertation, Christian-Albrechts-Universität zu Kiel.
Peyrot M., 2008, Variation and change in Tocharian B, Rodopi,
Amsterdam/New York.
Pinault G.-J., 2008, Chrestomathie tokharienne. Textes et Grammaire,
Peeters, Leuven/Paris.
—. 2016, “Les Tokhariens, passeurs et interprètes du bouddhisme”, in
Espagne M., Gorshenina S., Grenet F., Mustafayev S. and Rapin C.
(eds.), Asie Centrale. Transferts culturels le long de la Route de la
Soie, Vendémiaire, Paris, 167-200.
Prince A., 1990, “Quantitative consequences of rhythmic organization”, in
Deaton K. et al. (eds.), CLS 26-II: Papers from the Parasession on the
Syllable in Phonetics and Phonology, Chicago Linguistic Society,
Chicago, 355-398.
Ringe D.A., 1996, On the Chronology of Sound Changes in Tocharian.
Volume 1: From Proto-Indo-European to Proto-Tocharian, American
Oriental Society (AOS 80), New Haven, Conn.
Sander L., 1968, Paläographisches zu den Sanskrithandschriften der
Berliner Turfan Sammlung, (Verzeichnis der Orientalischen
Handschriften in Deutschland, Suppl.-Bd. 8), Franz Steiner Verlag,
Wiesbaden.
TITUS Thesaurus Indogermanischer Text- und Sprachmaterialien:
titus.fkidg1.uni-frankfurt.de.
