
Semi-Automatic Annotation of Intra-Sentential Discourse Relations in PDT

Pavlína Jínová, Jiří Mírovský and Lucie Poláková

Charles University in Prague, Institute of Formal and Applied Linguistics

{jinova|mirovsky|polakova}@ufal.mff.cuni.cz

ABSTRACT

In the present paper, we describe in detail and evaluate the process of semi-automatic annotation of intra-sentential discourse relations in the Prague Dependency Treebank, which is a part of the project of otherwise mostly manual annotation of all (intra- and inter-sentential) discourse relations with explicit connectives in the treebank. Our assumption that some syntactic features of a sentence analysis (in a form of a deep-syntax dependency tree) correspond to certain discourse-level features proved to be correct, and the rich annotation of the treebank allowed us to automatically detect the intra-sentential discourse relations, their connectives and arguments in most of the cases.

TITLE AND ABSTRACT IN CZECH

Poloautomatická anotace vnitrovětných diskurzních vztahů v PDT

ABSTRAKT

V tomto článku nabízíme detailní popis a evaluaci procesu poloautomatické anotace vnitrovětných textových vztahů v Pražském závislostním korpusu jako součást projektu jinak především manuální anotace všech (vnitro- a mezivětných) textových vztahů s explicitním konektorem v tomto korpusu. Potvrdil se náš předpoklad, že některé syntaktické vlastnosti analýzy věty (ve formě závislostního stromu hloubkové syntaxe) odpovídají jistým vlastnostem na úrovni analýzy textových vztahů (diskurzu). Bohatá anotace korpusu nám ve většině případů umožnila automaticky detekovat vnitrovětné vztahy, jejich konektory a argumenty.

KEYWORDS: TECTOGRAMMATICS, PDT, DISCOURSE ANNOTATION, INTRA-SENTENTIAL RELATIONS

KEYWORDS IN CZECH: TEKTOGRAMATIKA, PDT, ANOTACE DISKURZU, VNITROVĚTNÉ VZTAHY

Proceedings of the Workshop on Advances in Discourse Analysis and its Computational Aspects (ADACA), pages 43-58, COLING 2012, Mumbai, December 2012.


1 Introduction

Linguistic phenomena going beyond the sentence boundary have been coming into the focus of computational linguists in the last decade. Various corpora annotated with discourse relations have appeared; two of the first and most influential (for English) were the RST Discourse Treebank (Carlson, Marcu, Okurowski, 2002) and the Penn Discourse Treebank (Prasad et al., 2008). For other languages we can mention discourse-annotated resources for Turkish (Zeyrek et al., 2010), Arabic (Al-Saif and Markert, 2010), and Chinese (Zhou and Xue, 2012). Most of these projects have raw texts as their annotation basis. In the discourse project for Czech, contrary to the others, discourse-related phenomena have been annotated directly on top of the syntactic (tectogrammatical) trees of the Prague Dependency Treebank 2.5 (henceforth PDT, Bejček et al., 2012), with the goal to make maximum use of the syntactico-semantic information from the sentence representation.

The annotation of discourse relations (semantic relations between discourse units) in PDT consisted of two steps: first, the inter-sentential discourse relations were annotated manually; second, the intra-sentential discourse relations were annotated semi-automatically. In both cases, only relations signalled by an explicit discourse connective have been annotated.

The main goal of this paper is to report in detail on the process of the semi-automatic annotation of intra-sentential discourse relations in PDT. As we assumed, some of the (not only) syntactic features already annotated in the treebank were very helpful and enabled us to perform automatic extractions and conversions.[1] Nevertheless, some manual work had to be done both before and after the annotation.

1.1 Layers of Annotation in PDT

The data in our project come from the Prague Dependency Treebank 2.5 (Bejček et al., 2012), which is a corrected and enhanced version of PDT 2.0 (Hajič et al., 2006). PDT is a treebank of Czech written journalistic texts (almost 50 thousand sentences) enriched with a complex manual annotation at three layers: the morphological layer, where each token is assigned a lemma and a POS tag; the so-called analytical layer, at which the surface-syntactic structure of the sentence is represented as a dependency tree; and the tectogrammatical layer, at which the linguistic meaning of the sentence is represented.

At the tectogrammatical layer, the meaning of the sentence is represented as a dependency tree structure. Nodes of the tectogrammatical tree represent auto-semantic words, whereas functional words (such as prepositions, auxiliaries, subordinating conjunctions) and punctuation marks have (in most cases) no node of their own. The nodes are labelled with a large set of attributes, mainly with a tectogrammatical lemma and a functor (semantic relation; e.g. Predicate (PRED), Actor (ACT), Patient (PAT), Location (LOC))[2]. Additionally, the tectogrammatical layer includes the annotation of information structure attributes (sentence topic and focus, rhematizing expressions etc.).

[1] For details on the exploitation of the syntactic features during the manual annotation of the inter-sentential relations, please consult Mírovský et al. (2012).

FIGURE 1: An example of an inter-sentential discourse relation, represented by a thick arrow between roots of the arguments

1.2 Discourse Annotation in Two Steps

In the project of discourse annotation, we have focused on discourse relations anchored by an explicit (surface-present) discourse connective. These relations and their connectives have been annotated throughout the whole PDT. However, all the numbers reported in the paper refer to the training and development test parts of the whole data[3], i.e. 43,955 sentences (approx. 9/10 of the treebank).[4]

The annotation of discourse relations proceeded in two steps: first, the inter-sentential and some selected intra-sentential discourse relations were annotated manually; second, the remaining intra-sentential discourse relations were annotated (semi-)automatically, based on the information already annotated in PDT.[5]

The main theoretical principle of the annotation was the same for both phases. It was inspired partially by the lexical approach of the Penn Discourse Treebank project (Prasad et al., 2008), and partially by the tectogrammatical approach and the functional generative description (Sgall et al., 1986, Mikulová et al., 2005). A discourse connective in this view takes two text spans (verbal clauses or larger units) as its arguments. The semantic relation between the arguments is represented by a discourse arrow (link), the direction of which also uniformly defines the nature of the argument (e.g. reason-result).[6]

[2] For a description of functors in PDT, see http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/t-layer/html/ch07.html.
[3] As distinguished in the PDT project.
[4] Thus the last tenth of the treebank, evaluation test data, remains (as far as possible) unobserved.
[5] The annotation had to proceed in this order. Our understanding of what is possible to annotate automatically only formed during the manual annotation, as we got familiar with the data.
[6] For further information on the annotation guidelines, see http://ufal.mff.cuni.cz/discourse/.

FIGURE 2: An example of an intra-sentential discourse relation annotated during the first phase

1.2.1 Step 1: Manual Annotation (Mostly of the Inter-Sentential Relations)

The first phase of the annotation was a thorough manual processing of the treebank primarily focused on the inter-sentential relations (relations between sentences) signalled by explicit discourse connectives. Example 1 and Figure 1 show an inter-sentential discourse relation of type opposition with explicit connective ale (but).

(1) Lidé chtějí platit jen to, co skutečně spotřebovali. Ještě dlouho tomu tak ale patrně nebude.
People only want to pay for what they really consumed. But apparently, it will not be so yet for a long time.

Intra-sentential relations (within a sentence) were during the first phase only marked manually in cases where the discourse type could not be determined unambiguously by the tectogrammatical label (functor) and the actual discourse type was not prevailing for the given functor. For instance, the tectogrammatical label (functor) ADVS (the adversative relation, in our case clausal) is too general and corresponds to several finer discourse types, namely the types of opposition, restrictive opposition, correction, confrontation, and concession. Opposition is predominant among the discourse types for the functor ADVS, so it was not annotated in the first phase (and was left for the second phase)[7]. All the other discourse types for the functor ADVS were annotated manually in the first phase. The situation is illustrated by Example 2 and Figure 2; on the tectogrammatical layer, the relation between the two clauses was labelled as ADVS (functor of the coordinative node in Figure 2); the discourse type is correction (the relation is marked by the arrow with label corr in Figure 2).

(2) Důvodem kanibalismu nebyl hlad, ale politické motivy.
The reason for the cannibalism was not hunger but political motives.

For a more detailed description of the manual annotation of the treebank including the annotation evaluation see e.g. Jínová et al. 2012.

[7] See Table 1 for predominant discourse types for various functors.

1.2.2 Step 2: Automatic Annotation of the Intra-Sentential Relations

The second phase of the annotations consisted predominantly of an automatic procedure that extracted mostly tectogrammatical features and used them directly for the annotation of intra-sentential discourse relations. The main goal was to find and mark all so far unmarked intra-sentential discourse relations. This is the main topic of the present paper and we describe it in detail in the following sections. Section 2 briefly describes the manual preparatory work preceding the automated part of the extraction. Section 3 is devoted to the automatic annotation itself and to some practical issues connected to it. In Section 4, we mention two necessary manual corrections performed after the automatic annotation, and we evaluate our results in Section 5, which is followed by a conclusion.

2 Pre-Annotation

Two manual steps preceded the automatic annotation of the intra-sentential discourse relations: a completely manual annotation of selected intra-sentential relations, and a partially manual annotation of temporal relations.

2.1 Manual Work

As explained in Subsection 1.2.1 (Example 2, Figure 2), some of the intra-sentential discourse relations were annotated manually during the first phase of the annotations. These comprised 510 vertical (subordinate) relations and 1,681 horizontal (coordinate)[8] intra-sentential relations. Other cases of intra-sentential relations, where the tectogrammatical annotation was adequate for the discourse interpretation, were left to the second phase. As an example, if we follow the sub-classification of the ADVS tectogrammatical label for discourse semantics mentioned above in 1.2.1, except for the relations marked previously in the manual phase, the remaining cases were all automatically set to discourse type opposition (opp), see Table 1 and Section 3.1 for details.

2.2 Semi-Automatic Annotation

Finite verbs with the type of dependency being one of the temporal relations (functors TFHL, THL, THO, TSIN, TTILL, TWHEN) were pre-processed manually. For each of them, the type of the discourse relation was set by a human annotator, along with the direction of the relation (whether from the dependent node to its governor or the other way)[9] and the exact position of the arguments (the nodes themselves or possibly their coordinating nodes, if present). All this information was recorded in a table and passed to the automatic script, which created the discourse relations and found and set the appropriate connective for each relation automatically. Altogether, there were 491 such relations.

[8] In dependency trees of PDT, root nodes of coordinated phrases are captured as siblings (direct children of the coordinating node), hence "horizontal" relations.

3 Automatic Annotation

After the manual annotation described in Subsection 2.1 and the manual pre-processing of temporal relations described in Subsection 2.2, an automatic script went through the tectogrammatical layer of the whole data of PDT, document by document, sentence by sentence and node by node. If the node represented

- a finite verb with one of the temporal functors (TFHL, THL, THO, TSIN, TTILL, TWHEN), it was annotated using the information from the manually created table (Subsection 2.2 above);
- a finite verb with functor CAUS, COND, CNCS, AIM, CONTR or SUBS, it became a candidate for an automatically detected vertical discourse relation;
- a coordination node with functor REAS, CSQ, ADVS, CONFR, GRAD, CONJ or DISJ, coordinating (directly or transitively) finite verbs or non-finite-verbal nodes with functor PRED[10], it became a candidate for a horizontal relation.
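The three-way decision above can be sketched as a small classifier. This is only an illustration under stated assumptions: the `Node` record, its boolean flags (in particular `coordinates_clauses`, which stands in for the transitive check of the coordinated members) and the returned labels are hypothetical and are not taken from the actual annotation script.

```python
from dataclasses import dataclass
from typing import Optional

# Functor sets as listed in the text.
TEMPORAL = {"TFHL", "THL", "THO", "TSIN", "TTILL", "TWHEN"}
VERTICAL = {"CAUS", "COND", "CNCS", "AIM", "CONTR", "SUBS"}
HORIZONTAL = {"REAS", "CSQ", "ADVS", "CONFR", "GRAD", "CONJ", "DISJ"}


@dataclass
class Node:
    """Hypothetical, heavily simplified view of a tectogrammatical node."""
    functor: str
    is_finite_verb: bool = False
    is_coordination: bool = False
    # Assumed to be precomputed: the node coordinates (directly or
    # transitively) finite verbs or other nodes with functor PRED.
    coordinates_clauses: bool = False


def classify(node: Node) -> Optional[str]:
    """Return 'temporal', 'vertical', 'horizontal', or None (not a candidate)."""
    if node.is_finite_verb and node.functor in TEMPORAL:
        return "temporal"      # handled via the manually created table
    if node.is_finite_verb and node.functor in VERTICAL:
        return "vertical"      # candidate for a vertical relation
    if (node.is_coordination and node.functor in HORIZONTAL
            and node.coordinates_clauses):
        return "horizontal"    # candidate for a horizontal relation
    return None
```

For instance, `classify(Node("CAUS", is_finite_verb=True))` yields `"vertical"`, while a coordination with functor CONJ that only joins noun phrases is not a candidate.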

In all cases, the connective was detected automatically (see below in Subsection 3.4).

Vertical Relations

Candidates for a vertical relation were checked for the presence of a previously manually annotated relation; if there was none, an automatic discourse relation was created, in the basic case directly between the dependent and governing verbal nodes. If one of the nodes was a member of a coordination, a more complex procedure was used to set the exact position of the arguments (see below Subsections 3.2 and 3.2.1). The discourse type and direction of the discourse arrow were set based on the tectogrammatical functor of the dependent node, see Subsection 3.1 below for details. Finally, the connective was found and set; see Subsection 3.4 for the procedure.

Horizontal Relations

Similarly, candidates for a horizontal relation were checked for the presence of a previously manually annotated relation; if there was none, an automatic discourse relation was created among the members of the coordination. A special case of multiple coordinations is discussed in 3.2.2 below. The discourse type and direction of the arrow were established based on the tectogrammatical functor of the coordinating node, again see Subsection 3.1 below for details. Subsection 3.4 describes the procedure of searching for the connective of the horizontal relation.

[9] There is a rich variety of connectives, and also verbal aspect values and negation play a role. These features in combination determine the discourse type and also the direction of the discourse arrow (i.e. the nature of the discourse arguments: precedence vs. succession). However, as the occurrences in the data were not so many, it was faster to decide on the type of the relation and the order of arguments manually.
[10] PRED: a tectogrammatical predicate; for a list and description of all functors, please see the tectogrammatical manual: http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/t-layer/html/ch07.html
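The "check for an existing manual relation, otherwise create one" step shared by both relation kinds might look as follows. This is a minimal sketch: the `Candidate` tuple, the string node identifiers and the set of manually annotated pairs are assumptions made for illustration only.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Candidate(NamedTuple):
    """Hypothetical candidate for an automatic discourse relation."""
    first: str     # id of the dependent node (or one coordinated member)
    second: str    # id of the governing node (or the other member)
    functor: str   # functor that made the pair a candidate


def create_automatic_relations(
    candidates: Iterable[Candidate],
    manual_pairs: Set[Tuple[str, str]],
) -> List[Candidate]:
    """Keep only candidates that have no manually annotated relation yet."""
    created = []
    for cand in candidates:
        if (cand.first, cand.second) in manual_pairs:
            continue  # a relation from the manual phase takes precedence
        created.append(cand)
    return created
```

The discourse type, the argument positions and the connective would then be filled in by the later steps described in Subsections 3.1, 3.2 and 3.4.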

3.1 Functor to Discourse Type Conversion

Table 1 shows a list of tectogrammatical functors and their corresponding prevailing discourse types. After the manual annotation, the table could be (and was) used to identify the discourse type of the remaining relations. Note that it is still not a 1-1 relation; for example, the discourse type confrontation can be signalled by two different functors (CONTR and CONFR), as we give up the syntactic distinction of hypotactic (CONTR) vs. paratactic (CONFR) in this respect. The transformation table was used for all automatically annotated horizontal relations (7,392 cases) and all automatically annotated vertical relations (2,599 cases).

Functor   Functor (long name)[11]   Discourse type   Discourse type (long name)
AIM       purpose                   purp             purpose
CAUS      cause                     reason           reason-result
CNCS      concession                conc             concession
COND      condition                 cond             condition
CONTR     confrontation             confr            confrontation
SUBS      substitution              corr             correction
ADVS      adversative relation      opp              opposition
CONFR     confrontation             confr            confrontation
CONJ      conjunction               conj             conjunction
CSQ       consequence               reason           reason-result
DISJ      disjunction               disjalt          disjunctive alternative
GRAD      gradation                 grad             gradation
REAS      causal relation           reason           reason-result

TABLE 1: Functor to discourse type automatic translation table; the first six rows represent vertical relations, the last seven rows represent horizontal relations.
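Table 1 can be written down directly as a lookup table. The sketch below is illustrative; the names `FUNCTOR_TO_DISCOURSE_TYPE`, `discourse_type` and `functors_for` are hypothetical, while the functor and type labels are copied from the table.

```python
# Table 1 as a dictionary: functor -> prevailing discourse type.
FUNCTOR_TO_DISCOURSE_TYPE = {
    # vertical relations (functor of the dependent finite verb)
    "AIM": "purp", "CAUS": "reason", "CNCS": "conc",
    "COND": "cond", "CONTR": "confr", "SUBS": "corr",
    # horizontal relations (functor of the coordinating node)
    "ADVS": "opp", "CONFR": "confr", "CONJ": "conj", "CSQ": "reason",
    "DISJ": "disjalt", "GRAD": "grad", "REAS": "reason",
}


def discourse_type(functor: str) -> str:
    """Prevailing discourse type for a functor; KeyError if not in Table 1."""
    return FUNCTOR_TO_DISCOURSE_TYPE[functor]


def functors_for(dtype: str) -> list:
    """All functors mapped to a discourse type (the mapping is not 1-1)."""
    return sorted(f for f, t in FUNCTOR_TO_DISCOURSE_TYPE.items() if t == dtype)
```

Here `functors_for("confr")` returns both CONFR and CONTR, which mirrors the remark that the hypotactic vs. paratactic distinction is given up at the discourse level.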

3.2 Arguments with Coordinations

In PDT, coordinating expressions are represented as separate nodes and technically they are not different from other nodes representing content words. In the detection of discourse arguments, two situations needed to be treated in a special way, as described in the following two subsections.

[11] Taken from the tectogrammatical manual: http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/t-layer/html/ch07.html

3.2.1 Coordinated Structures in the Detection of the Argument Position

In many cases, an argument of a discourse relation is represented by a coordination of verbal nodes, not by the verbal nodes individually. In such cases, the position of the argument was shifted from the verbal nodes to the coordinating node. It could even happen transitively, so the topmost suitable coordination was always searched for. Example 3 demonstrates a complex case of coordinated arguments. The situation is depicted in Figure 3, which is a tectogrammatical tree in a folded mode (nodes of the tree represent individual clauses or coordinations)[12]. All discourse annotation in the tree is a result of the automatic procedure.

(3) Po revoluci se s různými pavědami a šarlatánstvím roztrhl pytel, což chápu, protože jednak byly za komunismu zakázany, a tak logicky přitahovaly, a za druhé nabízejí rychlá a snadná řešení a vysvětlení, což se hrozně líbí těm, kteří neradi myslí.
After the revolution, we were flooded with various pseudosciences and charlatanisms, which I can understand, because for one thing, they were forbidden in the communist era and so logically they were attractive, and for another, they offer fast and easy solutions and explanations, which is awfully liked by those who do not like to think.

[12] For all features of the annotation tool for discourse, see Mírovský et al. (2010).

In this example sentence, five discourse relations along with their types and connectives have been automatically detected. Four of them are horizontal relations:

i. a horizontal relation of type conj between clauses "Po revoluci se … roztrhl pytel" ("After the revolution, we were flooded … charlatanisms") and "chápu" ("I can understand"), with the connective což (which),
ii. a horizontal relation of type reason between clauses "logicky přitahovaly" ("logically they were attractive") and "byly za komunismu zakázany" ("they were forbidden in the communist era"), with the connective "a tak" ("and so"),
iii. a horizontal relation of type conj between clauses "nabízejí … vysvětlení" ("they offer … explanations") and "se hrozně líbí … neradi myslí" ("is awfully liked … do not like to think"), with the connective což (which),
iv. and a horizontal relation of type conj between coordinations of clauses in (ii) and (iii), with the connective "jednak a za druhé" ("for one thing and for another").

One of them is a vertical relation:

v. a vertical relation of type reason between the coordination of the coordinations in (iv) and the coordination of clauses in (i), with the connective protože (because).
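The shift of an argument to the topmost coordinating node, as needed for relation (v), can be sketched as a simple climb along parent links. The `TNode` class and the tiny hand-built tree are assumptions made for illustration; the real procedure works on PDT trees and applies further suitability checks that are not modelled here.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(eq=False)  # identity-based equality and hashing, usable in sets
class TNode:
    label: str
    parent: Optional["TNode"] = None
    is_coord: bool = False    # the node is a coordinating node
    is_member: bool = False   # the node is a coordinated member of its parent


def topmost_coordination(node: TNode) -> TNode:
    """Climb while the node is a coordinated member of a coordinating parent."""
    while node.is_member and node.parent is not None and node.parent.is_coord:
        node = node.parent
    return node


# Simplified shape of Example 3 (labels only; not the real PDT tree).
coord_i = TNode("což", is_coord=True)
parents = [TNode("roztrhnout se", coord_i, is_member=True),
           TNode("chápat", coord_i, is_member=True)]
coord_iv = TNode("jednak ... za druhé", coord_i, is_coord=True)
coord_ii = TNode("a tak", coord_iv, is_coord=True, is_member=True)
coord_iii = TNode("což", coord_iv, is_coord=True, is_member=True)
children = [TNode("být (zakázaný)", coord_ii, is_member=True),
            TNode("přitahovat", coord_ii, is_member=True),
            TNode("nabízet", coord_iii, is_member=True),
            TNode("líbit se", coord_iii, is_member=True)]

# Every dependent verb paired with every effective parent ...
pairs = list(product(children, parents))
# ... collapses to one pair of coordinating nodes after lifting.
lifted = {(topmost_coordination(c), topmost_coordination(p)) for c, p in pairs}
```

In this toy tree the four dependent verbs and two governing verbs give eight verb-to-verb pairs, and lifting both ends leaves the single pair of coordinating nodes.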

Cases (i), (ii) and (iii) are simple cases where the arguments are represented directly by the coordinated verbal nodes. Case (iv) is also a relatively simple case; only the presence of a coordinated[13] finite verb in the subtree of both the coordinated clauses needed to be checked (transitively in general). Case (v) is a vertical discourse relation represented by an arrow between the two coordinating nodes. The relation was however signalled by four occurrences of functor CAUS, marking a linguistic (effective) dependency[14] between each of the transitively coordinated finite verbs with this functor[15] and each of their linguistic parents (finite verbs "roztrhnout se" ("be flooded") and chápat ("to understand")), which are also coordinated. The arguments of the relation(s) needed to be lifted to the topmost suitable coordinating nodes.[16] Thus, instead of eight discourse relations that could be created directly between the individual verbal nodes, only one overall discourse relation was created, which is a more comprehensible solution, without a loss of any information.

In all detected vertical relations, the effective parent was shifted by one coordination level 263 times, resulting in 110 discourse relations, and by two coordination levels 8 times, resulting in 3 discourse relations. The effective child was shifted by one coordination level 634 times, resulting in 314 discourse relations, and by two coordination levels 61 times, resulting in 25 discourse relations.

[13] The tectogrammatical attribute is_member serves to distinguish coordinated and non-coordinated children of a coordinating node.
[14] The effective dependency is a linguistic dependency between nodes representing content words, taking all effects of coordinations etc. into account.
[15] Verbal nodes "být (zakázaný)" ("to be (forbidden)"), přitahovat ("to be attractive"), nabízet ("to offer"), and "líbit se" ("to be liked").
[16] Again, the tectogrammatical attribute is_member was used.

3.2.2 Multiple Coordinations

In case of multiple coordinations (coordinations with more than two members) with only a comma as the conjunction of the first members of the coordination and a connective (often a (and)) as the conjunction of the last two members of the coordination, only the last two members form a discourse relation with an explicit connective (as we do not consider a comma to be a discourse connective). Example 4 demonstrates such a case:

(4) Pozoroval jsem jednou jednu slečnu: seděla u PC, měla prst zabořen do klávesnice a evidentně se nudila.
I watched a young lady once: she was sitting at a PC, had her finger buried in the keyboard and evidently was bored.[17]

Here, a discourse relation was only created between clauses "evidentně se nudila" ("evidently was bored") and "měla prst zabořen do klávesnice" ("had her finger buried in the keyboard"), with a (and) as a connective. The other discourse relations in these coordinations are considered implicit and will be annotated in the future, during the annotations of implicit discourse relations. Multiple coordinations of this type occur 501 times in the data.
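The rule for this pattern can be sketched as follows. The function name, the list-based representation of members and separators, and the returned dictionary are assumptions for illustration; coordination patterns other than "commas, then one connective" are deliberately left unhandled.

```python
from typing import Dict, List, Optional


def relation_for_multiple_coordination(
    members: List[str], separators: List[str]
) -> Optional[Dict[str, str]]:
    """Return the one explicit relation of a 'comma, ..., connective' coordination.

    members    -- coordinated clauses in surface order (three or more)
    separators -- what stands between neighbouring members, in order
    """
    if len(members) < 3 or len(separators) != len(members) - 1:
        return None
    *leading, last = separators
    if all(sep == "," for sep in leading) and last != ",":
        # A comma is not a discourse connective, so only the last two
        # members are linked by an explicit relation.
        return {"arg1": members[-2], "arg2": members[-1], "connective": last}
    return None  # other patterns are outside this sketch


example_4 = relation_for_multiple_coordination(
    ["seděla u PC", "měla prst zabořen do klávesnice", "evidentně se nudila"],
    [",", "a"],
)
```

For Example 4 this links only the last two clauses with the connective a; the relation to the first clause stays unannotated, matching its treatment as implicit.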

3.3 Scope of Arguments

In all intra-sentential relations, the scope of a discourse argument is defined as the effective subtree[18] of the root node of the argument (the root node of the argument can either be a finite verb or a node coordinating[19] finite verbs or another type of node with functor PRED), excluding all nodes of the other argument of the relation. In all 10,482 automatically annotated intra-sentential relations, the tectogrammatical tree structure correctly defined the scope of the arguments, independently of whether the argument was formed on the surface by a continuous sequence of words or not.[20]
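The scope definition can be sketched as a subtree computation followed by a set difference. This is a simplification under stated assumptions: the tree is a plain child map with string node ids, and the plain subtree is used in place of the effective subtree, so the effects of coordinations are not modelled.

```python
from typing import Dict, List, Set, Tuple


def subtree(root: str, children: Dict[str, List[str]]) -> Set[str]:
    """All nodes reachable from root, including root itself."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, ()))
    return seen


def argument_scopes(
    root_a: str, root_b: str, children: Dict[str, List[str]]
) -> Tuple[Set[str], Set[str]]:
    """Scopes of two arguments; each excludes the nodes of the other one."""
    scope_a = subtree(root_a, children)
    scope_b = subtree(root_b, children)
    if root_b in scope_a:        # vertical: b is embedded under a
        return scope_a - scope_b, scope_b
    if root_a in scope_b:        # vertical: a is embedded under b
        return scope_a, scope_b - scope_a
    return scope_a, scope_b      # horizontal: the subtrees are disjoint


# Toy tree: governing verb with one own dependent and an embedded clause.
toy = {"gov": ["subj", "dep"], "dep": ["obj"]}
gov_scope, dep_scope = argument_scopes("gov", "dep", toy)
```

Because scopes are sets of tree nodes rather than word spans, an argument that is discontinuous on the surface needs no special handling.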

3.4 Detection of Discourse Connectives

In most cases, the discourse connectives of intra-sentential discourse relations could be automatically detected on the basis of the information on the tectogrammatical and analytical layers.

Connectives of the vertical relations can be found among nodes from the analytical layer that correspond to the verbal root of the discourse argument on the tectogrammatical layer. All auxiliary analytical counterparts (not the lexical counterpart) of the verbal node except for auxiliary verbs and reflexive particles (se, si) become a part of the connective. Connectives of the horizontal relations can be found on the tectogrammatical layer at the coordinating node (all its analytical counterparts, e.g. a (and), buď … nebo (either … or), etc.) or its modifiers (functor CM (conjunction modifier), e.g. dokonce (even), přesto (despite of that), or negation). With the exception of 23 atypical cases (which were fixed manually, see Subsection 4.1), discourse connectives could be detected automatically for all 10,482 intra-sentential discourse relations. In the rest of this subsection, we point out three special cases of the connective detection.

[17] The presence of a subject in a Czech clause is irrelevant for the decision whether to annotate a discourse relation or not, as Czech is a pro-drop language. Hence, the English translation of the example sentence with no subject in the last two clauses is not to be treated as a VP coordination, which would not be annotated in some projects for English like the PDTB (see Prasad, 2007).
[18] Effective subtree of a node is a set of nodes that linguistically depend (transitively) on the given node, taking all effects of coordinations etc. into account.
[19] Possibly transitively, i.e. through other coordinating nodes.
[20] For the 2,191 manually annotated intra-sentential relations, in all but 146 cases the scope of arguments was also equal to the effective subtree of the root node; in the 146 cases the annotator had to define a different scope of the argument.

3.4.1 Connectives with tak, pak, potom

For vertical relations with connectives like jestliže … pak (if … then), the second part (pak (then)) needed to be found among the effective children of the effective parent(s) of the given verbal node. They were filtered using the tectogrammatical lemma (only tak, pak, potom (so, then, then)) and the functor (only PREC or one of the temporal relations). This happened 93 times in the data.

3.4.2 Connectives with Expression což

The expression což (which) can represent an intra-sentential connective with the conjunctive meaning even though it can be inflected and plays a role of a participant of the clause structure (including a valence participant). To make it possible to distinguish the connective role of this expression automatically, grammatical coreference[21] was used. If the annotated anaphoric link from the expression což referred to the coordinated verbal phrase (or in a more complex case to a coordination of verbal phrases), což became a part of the connective. See Example 5, where což (which) refers (via the grammatical coreference) to stal se (became):

(5) Pavlov se pak stal předsedou vlády, což se Klausovi přihodilo nakonec také.
Pavlov then became the prime minister, which after all happened to Klaus as well.

In the data, 220 occurrences of the expression což have a grammatical coreference link to a finite-verb node, 11 occurrences have this link to a coordination of finite-verb nodes. Altogether, 231 discourse relations were created with což (which) as a part of the connective.

[21] Grammatical coreference was annotated in PDT for expressions where it is possible to identify the coreferred part of the text on the basis of grammatical rules (see Mikulová et al., 2005).
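The což rule reduces to a check on the target of the grammatical coreference link. The sketch below is illustrative only: the function name and the two target labels are hypothetical, and the counts are the ones reported in Subsection 3.4.2.

```python
# Counts reported in the text, keyed by the kind of node that the
# grammatical coreference link from "což" points to (labels are assumed).
COREF_TARGET_COUNTS = {
    "finite_verb": 220,
    "finite_verb_coordination": 11,
}


def coz_joins_connective(lemma: str, coref_target_kind: str) -> bool:
    """True if the expression is 'což' and its antecedent is verbal."""
    return lemma == "což" and coref_target_kind in COREF_TARGET_COUNTS


# Number of relations with "což" as a part of the connective.
total_coz_relations = sum(COREF_TARGET_COUNTS.values())
```

A což whose antecedent is, say, a noun phrase stays an ordinary clause participant and does not become part of a connective.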

3.4.3 Double Connectives

In some cases of a vertical relation where dependent finite verbal nodes are coordinated, the coordinated clauses begin with separate or different connectives, like protože … protože (because … because) in Example 6. Both the connectives become a part of the connective of the discourse relation.

(6) … je škodlivý a ideologicky zavádějící, protože odráží nedůvěru v racionalitu chování každého z nás a protože implikuje falešnou víru ve schopnosti některých z nás vytvořit pro nás ostatní lepší, dokonalejší svět.
… is harmful and ideologically misleading because it reflects the mistrust in the behaviour rationality of each of us and because it implicates a false faith in the ability of some of us to create for the rest of us a better, more perfect world.

This happened 69 times in our data.

4 Manual Corrections

After the automatic annotation, a few manual checks and corrections were needed. They are described in the following two subsections.

4.1 Failures in the Connective Identification

After the script had been run, some manual correction turned out to be necessary in cases where the automatic search for connectives failed (23 cases in total). These failures arose from two types of situation. First, connectives were placed in a non-typical position in the tree. Second, connectives were not present in the sentence at all. The second situation is illustrated by Example 7: the last clause (he did not pay for this) is interpreted as a causal sentence on the tectogrammatical layer, but no connective signals this relation.

(7) … vůbec nejhorší posádka v safari busu je smíšená: Angličan si zapomene kameru v hotelu a chce se vrátit, Francouz zuří, za tohle neplatil!
… the absolutely worst crew in a safari bus is a mixed one: the Englishman forgets his camera in the hotel and wants to go back, the Frenchman is furious, he did not pay for this!

In the first type of situation, the connective was added manually (we count these relations under the manually annotated ones); in the second type (as in Example 7), the whole relation was deleted for violation of the surface-present connective rule.

4.2 Clauses Depending on a Noun Phrase or an Infinitive

Solely manual treatment was required for those types of constructions where the dependent clause with discourse semantics was related to a complex predicate structure containing a noun phrase or an infinitive. Only semantics makes it possible to distinguish cases where the dependent clause is related to the whole predicate structure from those related only to an infinitive or a noun phrase. Consider Examples 8 and 9. In both structures, the dependent clause is a child node of the infinitive, but only in Example 8 is it semantically related to the whole predicate structure "je ochoten povolit" ("is willing to permit"). In Example 9 the dependent clause is semantically related only to the noun phrase "připravenost odpovědět silou" ("readiness to respond with force"). As we only annotate discourse relations between text spans with finite verbs, a discourse relation was annotated only in Example 8.

(8) Srbský prezident Slobodan Miloševič je ochoten povolit mezinárodní kontrolu své blokády bosenských Srbů, pokud bude obdobná kontrola uplatněna i na hranicích Chorvatska a Bosny.
The Serbian president Slobodan Milosevic is willing to permit an international inspection of his blockade of the Bosnian Serbs if a similar control is applied also on borders of Croatia and Bosnia.

(9) Zdůraznili však také připravenost odpovědět silou, pokud opozice bude trvat na použití zbraní.
However, they also emphasised their readiness to respond with force if the opposition will insist on the use of weapons.

There were 146 cases with such a dependent clause related to the whole predicate structure and 73 occurrences where it was not the case.

5 Summary

Table 2 shows the summary of all relations annotated during both phases of the project, and gives detailed numbers of various "types" of the intra-sentential relations. The last row of the table presents the whole number of all annotated discourse relations of any type.[22]

Type of the relation            Count
Intra-sentential relations      12,673
- automatic vertical             2,599
- semi-automatic vertical          491
- automatic horizontal           7,392
- manual vertical                  510
- manual horizontal              1,681
Inter-sentential (all manual)    5,514
Total                           18,187

TABLE 2: Overview of discourse relations annotated in PDT

We were able to automatically convert 9,991 (2,599 vertical and 7,392 horizontal) tectogrammatical dependencies into discourse relations, along with all properties of the relations (i.e. the position of arguments, the discourse type and the connective). For another 491 vertical dependencies, the discourse type, the order of arguments and their position according to possible coordinations were set manually, as explained in Subsection 2.2, while the rest of the work with these relations was also done automatically; we count these relations as semi-automatic. Mostly during the first phase of the annotation, 2,191 (510 vertical and 1,681 horizontal) intra-sentential discourse relations were annotated completely manually. After the automatic procedure, non-typical connectives needed to be fixed in 23 cases, and 146 relations between a dependent clause and a complex predicate structure needed to be manually added, as explained in Section 4.

[22] Let us emphasize again: although everything was done on the whole PDT data, all reported numbers only refer to the training and development test parts of the data (9/10 of the treebank, 43,955 sentences).
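The totals reported in this section can be cross-checked with a few lines of arithmetic. The dictionary keys are hypothetical labels; the numbers are those of Table 2.

```python
# Counts of intra-sentential relations by annotation mode (Table 2).
INTRA = {
    "automatic vertical": 2599,
    "semi-automatic vertical": 491,
    "automatic horizontal": 7392,
    "manual vertical": 510,
    "manual horizontal": 1681,
}
INTER_SENTENTIAL = 5514

intra_total = sum(INTRA.values())
fully_automatic = INTRA["automatic vertical"] + INTRA["automatic horizontal"]
auto_or_semi = fully_automatic + INTRA["semi-automatic vertical"]
fully_manual = INTRA["manual vertical"] + INTRA["manual horizontal"]
grand_total = intra_total + INTER_SENTENTIAL

print(intra_total, fully_automatic, auto_or_semi, fully_manual, grand_total)
# prints: 12673 9991 10482 2191 18187
```

The value 10,482 is the number of automatically annotated intra-sentential relations cited in Subsections 3.3 and 3.4, so it covers the fully automatic and the semi-automatic relations together.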

Conclusion
In the paper, we have presented in detail the second phase of the discourse annotation project in the Prague Dependency Treebank 2.5, namely the semi-automatic annotation of intra-sentential discourse relations marked by an explicit connective. In the preceding first phase of the project, the whole treebank was processed manually and all inter-sentential relations were marked by a human annotator. All intra-sentential relations were also assessed manually, and those relations whose discourse semantics was not unambiguously inferable from the tectogrammatical information were annotated. After the manual annotation, the tectogrammatical interpretation of the remaining relations conveyed the discourse semantics properly and, in the second phase of the project, all these remaining intra-sentential relations were annotated semi-automatically or automatically.

During the automatic part of the annotation, the presence of a discourse relation, the exact position of its arguments, its discourse type and the connective were automatically detected, using the annotation of the deep-syntax dependency trees at the tectogrammatical layer of PDT. As a final step, a few manual checks and corrections were performed.

We have also discussed interesting theoretical observations revealed during the semi-automatic annotation, namely to what extent a syntax-based discourse analysis is automatically processible and what are the special (and so linguistically interesting) cases that require more attention.

The annotated data (both intra- and inter-sentential relations) was published in the autumn of 2012 under the same licence as the underlying PDT 2.5, i.e. the Creative Commons licence [23]. It is available (downloadable) from the repository of LINDAT-Clarin – Centre for Language Research Infrastructure in the Czech Republic [24].

Acknowledgments
We gratefully acknowledge support from the Grant Agency of the Czech Republic (grants P406/12/0658 and P406/2010/0875) and from the Ministry of Education, Youth and Sports in the Czech Republic, program KONTAKT (ME10018) and the LINDAT-Clarin project (LM2010013).
[23] http://creativecommons.org
[24] http://www.lindat.cz



