TRANSACTIONS of the American Philosophical Society
Held at Philadelphia for Promoting Useful Knowledge
VOLUME 82, Part 5, 1992

An Indoeuropean Classification: A Lexicostatistical Experiment

Acknowledgments

The first author collected the Indoeuropean data, made the required cognation decisions, and made the classification from the matrix of lexicostatistical percentages. He also compared the lexicostatistical and traditional classifications, including the material described in Chapter 5.

The second author invented box diagrams and shared in doing the analyses based on them, including most of the material described in Chapter 6. He also did the statistical parts of the monograph, including camera-ready copy for the publisher.

The third author did much of the computing, including writing and using the computer programs to create [...] that led to clarification and development of the ideas.

The three authors all shared substantially in writing the monograph.

The authors would like to acknowledge the help of Anton Dufek for inexhaustible energy. We [...] those many colleagues [...] used, for without their contribution this work could not have been undertaken: their names are listed in Appendix 4 and in the index. Finally, we would like to express our appreciation to AT&T Bell Laboratories for their support and aid in many forms.

Copyright © 19[...] by The American Philosophical Society
Library of Congress Catalog [...]
International Standard Book Number [...]
ISSN [...]

TABLE OF CONTENTS

[...]
4.6 The Entry for a List or Closed Group
The Entry for an Open [...]
Finding the Critical Difference
[...]
5.6 English and Frisian (The Ingveonic Hypothesis)
5.7 Gujarati
6. The Box Diagram and Discrepancies
7. Multidimensional Scaling
8. Conclusions
Bibliography
A1. The Outline Classification of Indoeuropean
A2. The Groups and Speech Varieties
A3. The Lexicostatistical Method
A4. The Word Lists and Their Sources
A5. Table of Lexicostatistical Percentages
A6. The Pair-Group Method of Clustering
A7.
The Probability that p ~p 2 0.08
Index

Table 1. Critical Differences of the Four Qualifiers by Rank
[...] Some Conditions on p\~p2
Figure 1. The Box Diagram of Indoeuropean
Figure 4. Slavic Pseudomap on Geographic Map
Figure 5. Pseudomap of Major Indoeuropean Branches

1. INTRODUCTION

[...] described in Appendix 3 without assuming any lingu[...] For readers who want to have their memories refreshed, [...] phases of the method in the next paragraph. Finally, important points about how the method is used in this monograph are described at the beginning of Chapter 2.

There are four phases in the lexicostatistical method. Phase 1 is collecting the word lists for the various dialects. Phase 2 [...] cognate decisions among corresponding words from different [...] phase is called lexicostatistical comparison. Phase 3 is calculating the lexicostatistical percentages, i.e., calculating the percentage of cognates shared by each pair of lists. Phase 4 is subgrouping the word lists by using the lexicostatistical percentages. This phase is the subgrouping phase of the lexicostatistical method.

[...] heart the law of regular phonetic change, sometimes referred to as the regularity of phonetic change or the regularity of sound change. In fact this method is little more than a new application of the comparative method and rests on the same assumptions. It relies, however, on the more plentiful lexical innovations rather than the relatively few innovations availa[...] comparative method. To deal with the abundance of [...] principles from statistics. Much as the application of [...] modernizing innova[...] welcomed as an expansi[...]

1.1 The Purposes of the Work

[...] from contemporary languages and have [...] historical information in forming the [...] of the Indoeuropean languages. This greater number of lists treated in the present study gives it the advantage of being more nearly comparable in size with LCAL as an investigation.
[...] scale agreement provides substantial evidence for the validity of the lexicostatistical method and thus of other [...] [...]ld be carried out, it is likely to yield only [...] because the number of languages involved is [...] reported even when the reports are accurate, and [...] procedure which conforms [...] been offered in evidence. The aim in not presenting them is to save space, and the action is furthermore justified on the grounds that anyone who so desires can test the lexicostatistical percentages offered by [...] that the perce[...] of approximation, given that some differences and even occasional cognation errors (on either side) are allowed for.

It may appear to some that the agreement between the classification here and the generally accepted one can be attributed to capturing the "obvious" aspects of classification. It should be remembered, however, [...] agreed upon [...] which therefore it is not yet [...] considering only the implications of detailed reconstruction, yields [...] number and qual[...] argument against its validity on [...] differences are, however, far from indicating that the method is invalid, because the few differences are all of three types. The first type results from the self-imposed [...] languages, while the [...] ancient written languages. [...] though it is a striking one.

The second type of difference occurs where the lexicostatistical classification depends on differences of a few points in the lexicostatistical percentages. Unlike some correspondences between phonemes that are so well and repeatedly atte[...] drawn from them approach certainty, the lexi[...] should always be regarded as approximate. The [...]haps also be regarded as approximative, for if it is [...] it is less surprising that different scholars can reach [...] one difference of this type, [...] traditional method seems not to guarantee a universally acceptable inference.
The third type of difference betwe[...] traditional classifications occurs in parts of class[...] which the issue of methodologic[...]ity is not raised, since such controversies tend to revolve about the evidence. As the evidence accumulates on one side or the other, a consensus or majority view is expected to adhere to the implications of that evidence. The validity of the lexicostatistical method should therefore not be regarded as dependent on its exact reproduction of a majority view reached by other arguments.

The approach to certainty of inferences drawn from the percentages depends on the magnitude of the differences between them and on the agreements of the percentages among themselves and with other relevant indicators in pointing to the same inference. Although the logic behind [...] opposed, for example, to models [...] association is not contraindicated, [...]alization at the highest level of Indoeuropean, so that no [...] which weak evidence of a [...] contradictions provide only a negative kind of evidence for [...] scaling methods such as that of [...] positive evidence.

1.2 Validity and Reliability

[...] validity of a method concern[...] relation between the assumptions that the method makes [...] languages behave and how the data are treated in the [...] a method; the assumption [...] scientific argument. We [...] this requirement without di[...] results agree when the same investigator [...] than one occasion, or 2) interinvestigator [...] the results of different investigators using [...] method is invalid, its re[...] difference how accurately the wrong [...]

Substantial questions have been [...] lexicostatistical method, chiefly on the [...] be reliable if glottochronology (i.e., lexicostatistical dating) is not reliable (Bergsland and Vogt 1962.126). Our response has two parts.

First, the asserted connection between the two ways of using lexicostatistics does not hold.
Lexicostatistics might be sufficiently accurate for the purpose of subgrouping, as we believe in fact it is, regardless of whether it has the greater accuracy needed for dating. (This position seems to be rather widely held despite the opposition of [...] the fact that basic [...] opposition.)

Second, even though glottochronology based on lexicostatistics is not used in this monograph, it can nevertheless be well [...] Dyen and [...] Merwe [1966], lexicostatistical percentage does not decay in the simple [...] for deducing age from lexicostatistical percentage is not valid. Instead, similarity decays in the manner of radioactivity from a mixture [...] connect age and similarity; see, e.g., Kruskal, [...] Simpler methods than the one shown there are [...]

[...] definitively established, [...] between the method under consideration and other methods is usually substituted: if the results agree, they provide an external validation of all [...] Methods that are equally [...] undoubtedly differ and differ markedly in their degrees of [...] It is not our purpose here to claim that the lexicostatistical method has greater reliability than the traditional method, but merely that it has a satisfactory degree of reliability. That this is so is indicated by the degree to which it agrees with the results obtained by the traditional method, since otherwise the reliability of the traditional method must be challenged or the agreement must be laid to chance, neither of which options appears to be open.

Nevertheless there are differences between the lexicostatistical and traditional classifications, and these differences are important. But when such differences are found, they call for deeper studies and careful reexamination. They may on the one hand result [...] lexicostatistics like other linguistic methods is [...] other hand contain a suggestion or argument for a new view that will [...]

[...] This defini[...] "[...] simplex" (Hockett [...] units that approximate, in a rough and ready way, the definit[...] dialectological language stated above. No instance is known of
[...] defined above. The insistence on the dialectologi[...] unit of classification in genetic [...] criterion for determining [...] have a major effect on the [...] not languages. The assignment of [...] effect on the classification [...] effect is through the minor [...] At the same time it [...] earlier language and (some of) its dialects, [...]

1.3 Language and Relationship

We now turn to creole languages. There are two in the [...] dialects, and Takitaki. The difficulty they offer is that they present [...] native speaker of modern Hebrew [...] word because it seems most likely that the last native [...] Hebrew and the first native speaker of modern Hebrew [...] statements can apparently be made regarding [...]

Nevertheless a lexicostatistical percentage of pseudo-cognates can be obtained between creole languages and the languages that contributed to [...] under some theory of linguistic change. It is obvious that a classification, [...] proviso that undetected loanwords can be excluded as a factor [...] such borrowing cannot be absolutely excluded as a factor, [...] meanings by which the words in a language list are selected that are on the whole not likely to have been borrowed, provided that the language has not been subjected to "intimate" (after Bloomfield) or "prestige-seeking" (after Hockett) borrowing. Even the likely occurrence of intimate borrowings is normally detectable [...] of a list's percentages (see LCAL.25f; for effects, see Dyen 1963). These [...] for attributing validity to lexicostatistics.

1.4 Lexicostatistical Method

Anyone who actually works on [...] having a large membership is soon struck with how diffi[...] proceed beyond the relatively obvious determinations of close relationships that can be regarded as judgments by inspection.
Strong phonological arguments such as those which could be based on the effects of the changes formulated as Grimm's and Verner's Laws, which lead inevitably to the recognition of the Germanic subfamily, seem to be quite rare, whereas other nonlexical arguments seem generally to end up in circularity and/or speculation. Lexicostatistics for its part has the advantage of presenting a careful, internally consistent argument which merits challenge with counterevidence, not with the prejudice often accorded the use of statistics in historical linguistics.

1.5 Lexicostatistical Classification and Reconstruction

It is generally accepted as a matter of course that the proto-language of a language family implied by lexicostatistics is the same as the proto-language of that family implied by reconstruction. For reasons explained later in this section, we refer to the proto-language implied by lexicostatistics as the "proto-language of the original list", which is abbreviated as POL, and we refer to the [...] the "proto-language [...] generally accepted that the Indoeuropean POL is the same as PIPS, and in particular that both of these refer to the [...]

We refer to this regularity as the "law of regular topophonic change." Here the term topophone is used to specify a phoneme, or a transition between phonemes, or a transition between a pause and a phoneme in either order, that is, either as the onset of an initial phoneme or the offset following a final phoneme. The law of regular topophonic change states that in linguistic material directly inherited from one stage of a language to another, the outcome in the later stage from a topophone in the earlier stage depends only on the topophone and its environment of topophones. Different outcomes from the same topophone arise only by virtue of [...]

[...] 2.5 percentage points, go to Step 2 (which forms a group); otherwise return to Step 0. [...] of the subgrouping met[...] percentages of an open [...] declare G to be an open [...] inherits the percentages of N.
[...] be seen that the percentage of G with any other pool member Z is [...]imum of the percentages shared by X1, ..., Xk with Z.) Go to Step 0. [...] Go to Step 0.

A finer distinction among groups than "closed" and "open" is useful within the classification, to indicate how well separated each group is. Though not part of the subgrouping method, four qualifiers are appended to the group name to indicate the size of the critical difference as shown in Table 1.

Table 1. Critical Differences of the Four Qualifiers

Qualifier    Critical difference          Closed or open
subfamily    9.5[...]%, no upper limit    closed
genus        [...]                        closed
cluster      [...]                        open
hesion       [...]                        open

A subfamily and a genus are closed groups; a cluster and a hesion are open groups. For an exact value of the critical difference, or to check the qualifiers, it is possible with a little effort to work out the exact critical difference for any group from Appendix 1, as explained in Section 4.8.

2. Data and Method of Subgrouping

The magnitudes of the percentage point criteria in the table are based partly on experience, but remain arbitrary in the sense that some other magnitudes may ultimately prove more useful. They are also based [...]ed. The probability of observing more by chance in the absence of [...] (or less, rather than 10%. [...]

[...] (smallest percentage between immediate members of group) − (largest percentage between group and rest of pool), and the difference was significant. This would strongly indicate that no other pool members belong to the group, and hence would de[...] to be well separated. The second term in the critical di[...] to the second term above, but the first term is not the [...]aller than any previous mating percen[...] this group, but in general there are [...] percentages between two group members. Fortunately, there is a [...] percentages within each group. This review gives assurance that the smallest percentages are not too much smaller than the mating percentage of the group, which helps provide assurance that no other pool members belong in the group.

2.3 Averaging for Closed Groups

[...] is applied only to well-separated groups.) The lexicostatistical
The lexicostatistical 2.3 Averaging for Closed Groups 2» ‘evidence fora well-separated group is not only the fact that the members ently high percentages with each other to group them ‘also the fact that their percentages with nonmembers are ly similar, The homogeneous behavior of members of a group nce that they originate from a well-defined dialect chaia, most distinct language. Im view of the homogeneous behavior in a closed group, the average fof the percentages shared by group members better than any individual percentage. climinate the effects of percentages whi a accidentally high or low, Thus for reasonably be regarded group, the percentages shared with a nonmember are averaged, and the further relationships of the group are based on these averaged percentages. The averaging process tends to roduce the superficial closeness of the nearest members of different groups. However, there are a few cases where a closed group does not exhibit the homogeneous behavior described above. During the process of subgrouping, whenever a closed group is fonmed on the its intemal percentages, the percentages of the members with nonmembers are compared to Verify ther similarity. In a few cases, the percentages shared with nonmembers may vary too much. Ifthe percentages shared with a single nonmember vary by more than 10 percentage points, the percentages are considered disturbed. 
Every disturbance found in the present matrix of percentages has been traceable to the percentages of one member being systematically smaller or systematically larger than the corresponding percentages of other members. As explained in Section 2.4, all percentages of the nonconforming member are excluded from the averaging process, and the nonconforming member is marked by a double asterisk (**) in Appendices 1 and [...] and throughout the [...] set, exclusion occurred in six cases, which are all discussed in Section 2.4.

2.4 Some Special Problems: Deflated and Inflated [...]

In a few cases, the percentages that a singl[...] share with other lists may be deflate[...] circumstances. Such percentages and the associat[...]

A third problem that can occur is rare: distorted percentages due to [...]izing a dialect. **Greek K represents **Katharevousa [...] is normally determinable. First, as with **Albanian, borrowed words may be directly observed. **Albanian is generally regarded to have [...] use except [...] artificially removes cognates with closely related colloquial lists, thus [...] with the related lists, [...] Greek lists than they [...] non-Greek lists as the **Katharevousa in depth [...]

Modification 3: Brackets are used in "Greek [Subfamily]," for a nonstandard reason. (The standard reason, which involves the language limit of 70%, is explained in Section 4.2.) The brackets are used because the status of Greek as a subfamily depends on the 69.9% average of the deviant percentages between **Katharevousa and the other Greek lists, and deviant percentages are bei[...]

Modification 4: Quite apart from its deflation of percentages, [...] relationship between English and **Takitaki is only a pseudo-genetic [...] Section 1.3. English and **Takitaki are [...] regarded as forming a group, so that [...] apparent group is referred to merely as English without any qualifier.

3. EXPLANATION OF THE BOX DIAGRAM

[...] belonging to that speech variety and extending all the way across the page. The body of the diagram contains many [...] in with heavy black lines.
These boxes are the [...] diagram." Each box represents a group, whose [...] box. For example, in the upper left-hand corner is a tiny box which represents Welsh, while some of the nearby boxes represent respectively Breton, Irish, Brythonic, **French Creole, Franco-Provençal, [...] Rumanian, as labeled. For lack of space, names are ori[...] different directions. A few of the boxes are labeled with abbreviations, due to lack of space to place the entire name. For example, just beneath the tiny Rumanian box is a large box labeled WR, representing a group [...] rectangles) drawn, Appendix 2 contains an entry for each abbreviation in alphabetical position. To avoid obscuring other information [...] are not completely drawn in. For example the Indoaryan and Iranian boxes (lower right) lack parts of their right-hand edges, and the [...] upper left-hand corner. By tracing over to the left, one can see that the [...]
