
Studies in Natural Language Processing Sponsored by the Association for Computational Linguistics

Text generation

Studies in Natural Language Processing

This series publishes monographs, texts, and edited volumes within the interdisciplinary field of computational linguistics. Sponsored by the Association for Computational Linguistics, the series will represent the range of topics of concern to the scholars working in this increasingly important field, whether their background is in formal linguistics, psycholinguistics, cognitive psychology, or artificial intelligence.

Text generation

Using discourse strategies and focus constraints to generate natural language text


Department of Computer Science, Columbia University



Published by the Press Syndicate of the University of Cambridge The Pitt Building, Trumpington Street, Cambridge CB2 1RP


West 20th Street, New York, NY 10011-4211, USA


Stamford Road, Oakleigh, Victoria 3166, Australia

© Cambridge University Press 1985

First published 1985 Reprinted 1986 First paperback edition 1992

Library of Congress Cataloging in Publication Data McKeown, Kathleen R. Text generation. Bibliography: p. Includes index.

1. Discourse analysis - Data processing.

2. Linguistics - Data processing. I. Title.

P302.M392 1985 410'.28'54 84-19889

ISBN 0-521-30116-5 hardback ISBN 0-521-43802-0 paperback

Transferred to digital printing 2004




1. Introduction


1.1 Problems in generating text


1.2 A processing model


1.3 A sketch of related work


1.4 A text generation theory and method


1.5 System overview


1.6 The database application


1.7 Other issues


1.8 Guide to remaining chapters


2. Discourse



2.1 Rhetorical predicates



Linguistic background



Ordering communicative techniques


2.2 Analysis of texts


2.2.1 Predicate recursiveness


2.2.2 Summary of text analysis


2.3 Related research using rhetorical predicates


2.4 Use of schemata



Associating technique with purpose


2.5 Selecting a schema


2.6 Filling the schema


2.7 An example


2.8 Future work


2.9 Conclusions


3. Focusing in discourse


3.1 Computational theories and uses of focusing


3.1.1 Global focus


3.1.2 Immediate focus


3.2 Focusing and generation


3.2.1 Global focus and generation


3.2.2 Immediate focus and generation


3.2.3 Current focus versus potential focus list


3.2.4 Current focus versus focus stack



A focus algorithm for generation


3.2.7 Selecting a default focus


3.2.8 Overriding the default focus


3.2.9 The focus algorithm


3.2.10 Use of focus sets


3.3 Focus and syntactic structures


3.3.1 Linguistic background


3.3.2 Passing focus information to the tactical component


3.4 Future work


3.5 Conclusions


4. TEXT system implementation


4.1 System components


4.2 Knowledge representation



Representation overview









The entity-relationship model



Use of generalization



The topic hierarchy






Distinguishing descriptive attributes



DDAs for database entity generalizations



Supporting database attributes



Based database attributes



DDAs for database entity subsets



Constant database attributes


4.3 Selection of relevant knowledge


4.3.1 Requests for information and definitions


4.3.2 Comparisons


4.3.3 Determining closeness


4.3.4 Relevancy on the basis of conceptual closeness


4.3.5 Conclusions


4.4 Schema implementation



Arc types



Arc actions






Graphs used



Traversing the graph



The compare and contrast schema


4.5 The tactical component


4.5.1 Overview of functional grammar


4.5.2 The grammar formalism


4.5.3 A functional grammar


4.5.4 The unifier



Unifying a sample input with a sample grammar



4.5.7 Grammar implementation


4.5.8 Morphology and linearization


4.5.9 Extensions


4.5.10 Disadvantages


4.5.11 Advantages



The dictionary


4.6.1 Design


4.6.2 Structure of dictionary entries


4.6.3 General entries


4.6.4 An example


4.6.5 Creating the dictionary


4.6.6 Conclusions



Practical considerations


4.7.1 User needs


4.7.2 Question coverage


4.7.3 Conclusions


5. Discourse history



5.1 Possible discourse history records


5.2 Questions about the difference between entities


5.3 Requests for definitions



5.4 Requests for information


5.5 Summary



6. Related generation



6.1 components - early systems



6.2 components - later works



6.3 Generation in database systems


6.4 Planning and generation


6.5 Knowledge needed for generation


6.6 Text generation



7. Summary




7.1 Discourse structure



7.2 Relevancy criterion


7.3 Discourse coherency


7.4 Generality of generation principles


7.5 An evaluation of the generated text


7.6 Limitations of the implemented system


7.7 Future directions




7.7.1 Discourse structure


7.7.2 Relevancy


7.7.3 Coherency


7.7.4 User model


7.7.5 Conclusion


Appendix A. Sample output of the TEXT system



B. Introduction to Working



Appendix C. Resources used


Appendix D. Predicate Semantics








There are two major aspects of computer-based text generation:

1) determining the content and textual shape of what is to be said; and

2) transforming that message into natural language.

Emphasis in this research has been on a computational solution to the questions of what to say and how to organize it effectively. A generation method was developed and implemented in a system called TEXT that uses principles of discourse structure, discourse coherency, and relevancy criterion. In this book, the theoretical basis of the generation method and the use of the theory within the computer system TEXT are described.

The main theoretical results have been on the effect of discourse structure and focus constraints on the generation process. A treatment of rhetorical devices has been developed which is used to guide the generation process. Previous work on focus of attention has been extended for the task of generation to provide constraints on what to say next. The use of these two interacting mechanisms constitutes a departure from earlier generation systems. The approach taken here is that the generation process should not simply trace the knowledge representation to produce text. Instead, communicative strategies people are familiar with are used to convey information. This means that the same information may be described in different ways on different occasions.

The main features of the generation method developed for the TEXT strategic component include 1) selection of relevant information for the answer, 2) the pairing of rhetorical techniques for communication (such as analogy) with discourse purposes (for example, providing definitions) and 3) a focusing mechanism. Rhetorical techniques, which encode aspects of discourse structure, are used to guide the selection of propositions from a relevant knowledge pool. The focusing mechanism aids in the organization of the message by constraining the selection of information to be talked about next to that which ties in with the previous discourse in an appropriate way.

This work on generation has been done within the framework of a natural language interface to a database system. The implemented system generates responses of paragraph length to questions about database structure. Three classes of questions have been considered: questions about information available in the database, requests for definitions, and questions about the differences between database entities.


The work described in this book was done at the University of Pennsylvania and would not have been possible without the help of a number of people who deserve special mention. First and foremost is the influence of my advisor, Aravind K. Joshi, who provided many of the insights and much appreciated guidance throughout all stages of the work. I am also grateful to Bonnie Webber for her many helpful suggestions, pointers to relevant papers, and editorial comments.

The implementation of TEXT was greatly assisted by Kathleen McCoy and Steven Bossie who designed and implemented portions of the system. Kathy developed a system which automatically generated a portion of the knowledge base and implemented the knowledge base interface. Steve designed and partially implemented the tactical component used in TEXT.

Many people read and commented on various sections of the manuscript. Barbara Grosz's comments on the chapter on focusing were particularly valuable. Norman Badler, Peter Buneman, Richard Korf, Michael Lebowitz, Eric Mays, Kevin Matthews, Cecile Paris, and Ellen Prince also contributed in this area. Finally, the detailed and insightful comments of the unnamed reviewers must be mentioned as these were extremely helpful in improving the text.

Support for this work was provided in part by an IBM Research Fellowship, by National Science Foundation grant #MCS81-07290 awarded to the Department of Computer and Information Science of the University of Pennsylvania, by ONR grant N00014-82-K-0256 awarded to the Department of Computer Science of Columbia University, and by ARPA contract N00039-82-C-0427 awarded to the Department of Computer Science of Columbia University.



In the process of producing discourse, speakers and writers must decide what it is that they want to say and how to present it effectively. They are capable of disregarding information in their large body of knowledge about the world which is not specific to the task at hand and they manage to integrate pertinent information into a coherent unit. They determine how to appropriately start the discourse, how to order its elements, and how to close it. These decisions are all part of the process of deciding what to say and when to say it. Speakers and writers must also determine what words to use and how to group them into sentences. In order for a system to generate text, it, too, must be able to make these kinds of decisions.

In this work, a computational solution is sought to the problems of deciding what to say and how to organize it effectively. What principles of discourse can be applied to this task? How can they be specified so that they can be used in a computational process? A computational approach can aid our understanding of how discourse is produced by demanding a precise specification of the process. If we want to build a system that can perform these tasks, our theory of production must be detailed and accurate. To build a system that can produce discourse, determining content and textual shape, the development and application of principles of discourse structure, discourse coherency, and relevancy are essential to its success.


1.1. Problems in generating text

To get a feeling for what a text generation theory must handle, consider an example of the kind of text the system should be able to generate (see (A) below). This text (taken from The Hamlyn Pocket Dictionary of Wines (Paterson 80)) was written for the explicit discourse goal of defining Flagey-Echezeaux. It presents information relevant to that goal in a comprehensible organizational framework. What must a generation system take into account to generate a text such as this one, given a specific discourse goal? To illustrate the problems inherent in language generation, I'll consider the following questions in light of example (A):


• How do problems in language generation differ from those of language interpretation?

• What is the range of choices a generation system must consider?

• How does generation of text differ from generation of single sentences?

• What is specific to written text as opposed to speech?

A) Flagey-Echezeaux (France) Important red wine township in the Cote de Nuits with two front-ranking vineyards, Echezeaux and Grands Echezeaux. The first produces a fine rich, round wine and the second, which is not a single vineyard but a group, is also capable of producing fine wines but, like other divided properties, the quality of its wine is variable. The lesser wines of Flagey-Echezeaux are entitled to the appellation Vosne-Romanee.

While researchers have investigated the problems involved in computer interpretation of natural language for some time now, interest in generating it has only recently begun to gain momentum. As a result, people are less familiar with the problems in language generation. Although there is research that suggests that the same information can be used both for interpretation and generation (e.g., Kay 79; Winograd 83; Wilensky 81), there are some important distinctions that can be made about the processes required for each task. Interpretation of natural language requires examination of the evidence provided by a particular text in order to determine the meaning of the text and intentions of the writer who produced it. It necessitates using that evidence to examine the limited set of options the system knows to be available to the writer to determine the option actually taken. For example, in interpreting the second sentence of example (A), a system would use the evidence that "produce" occurs in the active form to determine that "a rich round wine" is the object being produced and Echezeaux (to which "the first" refers, one of many problems for interpretation that I don't discuss) is the agent that does the producing. While interpretation involves specification of how a speaker's options are limited at any given point (for example, by writing grammars), it does not

require a formulation of reasons for selecting between those options. In interpreting sentence (2) of example (A), a system does not consider why the writer used the active form as opposed to any of the other options available at that point. In generation of natural language, however, this is exactly what is required. A generator must be able to construct the best utterance for a given situation by choosing between many possible options involving a wide range of knowledge sources. To produce the second sentence of the example, a generator must decide that although both the active and passive forms are possible (the passive would result in "a fine rich, round wine is produced by the first"), the active is better than the passive. Furthermore, the generator must have a principled reason for making that decision, which it can use in all similar cases. Where research on interpretation may describe limitations on options in order to more efficiently determine the option taken, research in generation must specify why one option is better than others in various situations.

1 Note that as interpretation systems become more sophisticated, the analysis of reasoning behind the selection of a choice may be helpful in determining the goals of the speaker.

The choices that a language generator must face include options regarding the content and textual shape of what is to be said and choices in the transformation of the message so determined into natural language. A language generation system must be able to decide what information to communicate, when to say what, and which words and syntactic structures best express its intent. In the last of these stages, local decisions such as syntactic and lexical choices are made, often using a grammar and dictionary to do so. It is in this stage that the active form would be selected for sentence (2) of the example. Until recently, this has been considered the extent of language generation research. But determining what to say and how to put it together above the sentence level also introduce language issues that must be addressed by any speaker or writer of extended discourse. These three classes of decisions constitute the full range of the language generation problem. If connected text (and not simply single sentences) is to be generated, issues of discourse structure and discourse coherency are particularly important. Generation of text requires the ability to determine how to organize individual sentences. A writer does not randomly order the sentences in his text, but rather, plans an overall framework or outline, from which the individual sentences are produced. This is obvious in example (A). The author has chosen an organizational framework that is appropriate for providing definitions. Here, he first identifies Flagey-Echezeaux by describing its superordinate ("important red wine township in the Cote de Nuits") and then introduces two of its constituents (Echezeaux and Grands Echezeaux). Next, characteristic descriptive information about each of these vineyards is presented in turn, and finally, the author presents additional information about Flagey-Echezeaux (the item being defined) in the last sentence. 
To generate texts that are well organized, an analysis of the kinds of structures that are appropriate for achieving discourse goals such as define is needed as well as methods for formalizing the results so that they can be used by a computational process.


Discourse coherency is required if the generated text is to be a unit: the computational process must produce a text that "hangs together." This means that only information that is relevant to the discourse goal is included and that each sentence must be semantically related to the previous text. In example (A), only information supporting the definition of Flagey-Echezeaux is included in the text. This is due partly to the fact that the author only considers information that is related to Flagey-Echezeaux, but it is also due to the organizational strategy he has chosen. It dictates that information about each of the two constituents be included and not information about the Cote de Nuits, for example. Furthermore, each sentence relates to the previous sentences. Having introduced Echezeaux and Grands Echezeaux in the first sentence, the author continues talking about them in the second sentence. If a system is to produce coherent text, a formalization of the factors that contribute to coherency is necessary so that the computational process can make use of it.

These issues suggest a contrast between the generation of text and the generation of single sentences. Generation of text differs from generation of single sentences within dialogue in that a text is more or less a linguistically complete structure. Because a text has an organizational framework and is coherent, it constitutes a unit that in and of itself has a meaningful interpretation. This is in contrast to a dialogue sentence which may only be comprehensible in the context of the preceding discourse.

Considerations of context are also important for the generation of text, however. Generation of a single sentence within a text must take into account the preceding and succeeding text. Even if the overall organization of the text provides an appropriate framework for the single sentence, the sentence must nonetheless be semantically linked in some way to the preceding and succeeding sentences if the resulting text is to be coherent. If the text is generated within an interactive environment, preceding discourse may also affect its generation.

Although speakers do produce discourse consisting of more than one sentence, the concern here is with text that more closely resembles written than spoken text. This means that some of the phenomena which normally occur in speech, such as self-correction, incomplete or ungrammatical sentences, informal styles or phrases (e.g., "yeah", "well"), interruption, and circularity, are not of importance. It also means that an investigation of the process of planning text is important, since writers typically spend more time planning the organization and content of what is to be said than do speakers. For practical reasons, the use of written text is more appropriate since natural language systems produce their output in written form on a terminal screen and reading transcribed spoken text is difficult.

The problem dealt with here can now be stated more concisely: How can a computer system determine what to say in service of a given discourse goal such that a coherent, well organized text is generated? The system's choices across all phases of the language generation process must not be arbitrary; rather, they must be well founded on linguistic principles which clearly justify the choice the system has made.

1.2. A processing model

In the preceding section, I have delineated the orientation and themes of this work: generation of multi-sentential text as opposed to single sentences, determination of textual content and organization as opposed to the surface text, and generation of written as opposed to spoken text. In order to focus on just these problems, a model of language production has been adopted that divides processing into two phases. In the first phase, the content and structure of the discourse are determined. The component embodying this phase is termed the "strategic" component, following Thompson (77). The second, the "tactical" component, uses a grammar to translate the message into English. This distinction allows focus on the problems and processes of the strategic component. The output of the strategic component is an ordered message; all decisions about what to include in the text and when to include it have been made. The strategic component, furthermore, must be capable of providing information needed by the tactical component to make decisions about lexical and syntactic choices. Although the discourse planning process need not know how to express its message in natural language, it must provide the information on which choices about expression can be made.

The strategic component embodies both semantic and structural processes. Semantic processes determine relevancy: of all that could be said, the component must be capable of selecting that information that is relevant to a given discourse goal. The strategic component must also be capable of determining an organizational strategy that is appropriate for the given discourse goal. Communicative techniques, comprising the strategy, must be selected and integrated to form the text. Such strategies determine the structure of the text. In the formation of the text, semantics are also necessary, in part to ensure that each sentence is related to previous text.
Although some of the decisions that must be made are basically semantic in nature, while others are structural, the mechanisms that handle these decisions need not each use only semantic or structural information. In fact, I claim that each of these decisions is determined by an interaction between structural and semantic processes. The organizational strategies used in the text will affect its content and the information that is determined to be relevant will influence the chosen organization of the text.

For example, suppose the generation system was to generate a definition of Flagey-Echezeaux as in example (A). A diagram illustrating the different mechanisms involved is shown in Figure 1-1. The strategic component would receive as input the discourse goal define (Flagey-Echezeaux). It has access to a knowledge base containing information about the world, including many townships. From this knowledge base, the semantic processes select just information pertaining to Flagey-Echezeaux. Structural processes, also part of the strategic component, select an organizational strategy for the text. The double arrow between these two components in the diagram indicates interaction between the two processes. I have yet to define this interaction exactly. A message, represented in internal form, is produced by the strategic component and passed to the tactical component which generates the final English text.

2 Although processing in this research was based on a division of the two stages such that the results of the strategic component were completely determined and then passed to the tactical component, a control structure which allowed for backtracking between the tactical and strategic component such as Appelt suggests (Appelt 81) would also be possible. The approach I have taken clearly specifies how processes in the planning of the text influence the realization of a message in natural language. Backtracking would allow for processes that produce the surface expression to influence the planning of the discourse. See Chapter Six for further discussion of this issue.
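The division of labor described in this section can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the TEXT implementation: the names `Proposition`, `STRATEGIES`, `strategic_component` and `tactical_component`, the predicate labels, and the placeholder knowledge-base entry are all hypothetical, and the "tactical" step simply prints stored English glosses rather than using a grammar. The Flagey-Echezeaux facts are taken from example (A).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    predicate: str  # hypothetical label for the role the fact plays in a text
    topic: str      # entity the fact is about
    gloss: str      # English gloss standing in for an internal message form


# Toy knowledge base (assumption); one entry is a deliberate placeholder.
KNOWLEDGE_BASE = [
    Proposition("additional", "Flagey-Echezeaux",
                "its lesser wines are entitled to the appellation Vosne-Romanee"),
    Proposition("identification", "Placeholder-Township",
                "placeholder fact about some other township"),
    Proposition("constituency", "Flagey-Echezeaux",
                "it has two front-ranking vineyards, Echezeaux and Grands Echezeaux"),
    Proposition("identification", "Flagey-Echezeaux",
                "Flagey-Echezeaux is an important red wine township in the Cote de Nuits"),
]

# Hypothetical pairing of a discourse goal with an ordering of predicates.
STRATEGIES = {"define": ["identification", "constituency", "additional"]}


def strategic_component(goal, entity, knowledge_base):
    """Decide what to say and in what order; returns an ordered message."""
    # Semantic process: keep only facts about the entity named in the goal.
    relevant = [p for p in knowledge_base if p.topic == entity]
    # Structural process: order the facts by the strategy tied to the goal.
    rank = {name: i for i, name in enumerate(STRATEGIES[goal])}
    usable = [p for p in relevant if p.predicate in rank]
    return sorted(usable, key=lambda p: rank[p.predicate])


def tactical_component(message):
    """Turn the ordered message into text (here, trivially, via the glosses)."""
    return " ".join(p.gloss[0].upper() + p.gloss[1:] + "." for p in message)


message = strategic_component("define", "Flagey-Echezeaux", KNOWLEDGE_BASE)
text = tactical_component(message)
print(text)
```

The sketch keeps the two phases strictly sequential, as in the processing model adopted here; it does not attempt to model the interaction between semantic and structural processes, nor the backtracking alternative attributed to Appelt.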

1.3. A sketch of related work

What is known about language generation? Is there a previous body of work from which this work can draw? The majority of work done to date on computer generation of language has focused on problems in the tactical component. This has ranged from work on direct translation of an underlying formal representation (e.g., Simmons and Slocum 72; Chester 76) to the development of grammars and mechanisms for using those grammars to produce language (e.g., McDonald 80; Kay 79) and the development and representation of criteria for making decisions about vocabulary as part of a dictionary (e.g., Goldman 75; McDonald 80). While such work has little to say about determining the content and organization of the text, it does offer well tried mechanisms and procedures for translating a message into language and much of the tactical component implemented here draws on this knowledge. It should be noted that most of this previous work deals with the generation of single sentences and thus, questions about how the surface text must link in with the previous text have gone largely unanswered. This is a place where the development of a well founded strategic component can provide answers.

3 This section outlines previous work in generation and orients it in the strategic and tactical division. Only work which is directly related is outlined. For a full discussion of previous work in natural language generation, see Chapter Six.











[Figure: Strategic and Tactical]






Relatively little work has been done on problems in the strategic component. The work that has been done can be characterized as addressing one of three main issues: knowledge needed for generation (e.g., Swartout 81; Meehan 77), planning to determine an appropriate speech act (e.g., Cohen 81a), and textual organization (e.g., Mann and Moore 81; Weiner 80). Of these, the works of Mann and Weiner address issues most similar to the ones I am concerned with. They are both concerned with achieving the best ordering of information in a given knowledge base, but assume that all information pertains to the discourse task and thereby avoid the problem of determining textual content. The approach I take to textual organization is significantly different from both of their approaches. These differences are explored in more detail in Chapter Six.

Linguistic research also bears on the questions I am considering. There are two main classes of work that are related. These involve research on textual and discourse organization and research on discourse coherency. The work on textual organization (e.g., Williams 1893; Shipherd 26; Grimes 75) provides an analysis of the structure of text, identifying the basic structural units of which text is composed. This is useful for generating text, since the same basic units can be used, but it leaves open questions about how units can be combined and fails to be very precise. These analyses are significantly extended in this work so that they can be applied in a generation system.

Linguistic work on discourse coherency (e.g., Halliday 67; Prince 79; Firbas 74) identifies how coherency is expressed in a text (e.g., how certain syntactic constructions and coreferentiality can be used to increase the coherency of the text), but does not address how coherency of content and structure can be achieved before the surface text is generated. These results can be used by the tactical component in generating the surface text, but do not apply to the problems of the strategic component. There has been a significant amount of work on discourse coherency in computer science for interpretation of natural language (e.g., Grosz 77; Sidner 79) which uses a formulation of the speaker's focus of attention. As I shall show, this work is applicable to the problem of generating coherent textual content.

1.4. A text generation theory and method

The main theoretical emphasis of this work has been on the effect of discourse structure and focus constraints on the generation process. This has involved a formulation of discourse structure that is commonly used in naturally occurring texts. I present a representation of discourse structure that specifies a computational model of rhetorical devices that can be used for generation, an approach that has not previously been taken. This means that the generation process is able to use the same strategies that people commonly use to produce effective text.


One of the strategies formalized for the generation process is the constituency strategy that was used in the example text for defining Flagey-Echezeaux. This strategy is used in many naturally occurring texts and thus is a common one for producing effective texts. It is characterized by three main steps:

1. Present the constituents of the item to be defined.

2. Present characteristic information about each constituent in turn.

3. Present additional characteristic information about the item to be defined.
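The three steps above can be sketched as a small planning routine. This is a minimal Python illustration, not the formal representation presented in Chapter Two: the record layout, the function name `constituency_plan`, and the labels "constituency" and "attributive" are assumptions made for the example, and the facts are paraphrased from example (A).

```python
# Hypothetical record for the item being defined; facts follow example (A).
ENTITY = {
    "name": "Flagey-Echezeaux",
    "constituents": {
        "Echezeaux": ["produces a fine rich, round wine"],
        "Grands Echezeaux": [
            "is not a single vineyard but a group",
            "produces wine of variable quality",
        ],
    },
    "additional": ["its lesser wines are entitled to the appellation Vosne-Romanee"],
}


def constituency_plan(entity):
    """Return an ordered list of (step label, focus, content) tuples."""
    plan = []
    constituent_names = list(entity["constituents"])
    # Step 1: present the constituents of the item to be defined.
    plan.append(("constituency", entity["name"], constituent_names))
    # Step 2: present characteristic information about each constituent in turn.
    for name in constituent_names:
        for fact in entity["constituents"][name]:
            plan.append(("attributive", name, fact))
    # Step 3: present additional characteristic information about the item.
    for fact in entity["additional"]:
        plan.append(("attributive", entity["name"], fact))
    return plan


plan = constituency_plan(ENTITY)
for label, focus, content in plan:
    print(label, "|", focus, "|", content)
```

The plan fixes only the order and, in part, the content of the text; how each entry is worded is left to a separate realization step.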

By representing the strategy formally, the system can use it to determine the ordering and, in part, the content of the text it generates. Other strategies developed are presented in Chapter Two along with the formal representation.

Strategies for effective communication are used in combination with a treatment of focus constraints on language generation. A model of focus of attention, which represents the focus of the text constructed so far, is used to constrain what can be said next at any point when a choice exists. The model extends the use of focus constraints for interpretation (Sidner 79) to generation through the development of a preference ordering. The use of ordered focus constraints ensures that the resulting text will be coherent. Not only does focus of attention provide a basis for deciding what to say next when the discourse strategy allows for a choice, but it also provides information needed by the tactical component to decide how to express the information. This is evident in the Flagey-Echezeaux definition. In the first sentence, the author focuses on Flagey-Echezeaux, but introduces its two constituents, Echezeaux and Grands Echezeaux. In the first part of the second sentence, he shifts focus to Echezeaux and this determines his choice of the active form as it allows him to signal his focus to the readers.

The result is a formal theory of discourse strategy and focus of attention, as well as a specification of their interaction. These formulations have been embodied in the semantic and structural processes of the strategic component. That these processes interact with each other results in a greater variety of possible texts. A single plan for generation used in different situations does not always produce the same text because of the focus constraints (and different underlying knowledge). Similarly, although the same information may be produced by semantic processes for satisfying two different discourse goals, the texts generated may be different since different strategies are associated with the discourse goals.

These analyses have been implemented as part of the strategic component of a generation system called TEXT. The main features of the


generation method developed for the TEXT strategic component include 1) the pairing of rhetorical techniques for communication (such as analogy) with discourse purposes (for example, providing definitions), 2) selection of relevant information for the current discourse goal, and 3) a focusing mechanism. Rhetorical techniques, which encode aspects of discourse structure, are used to guide the selection of propositions from a relevant knowledge pool: a subset of the knowledge base which serves as the source for all information which can be included in the text. The focusing mechanism helps maintain discourse coherency. It aids in the organization of the message by constraining the selection of information to be talked about next to that which ties in with the previous discourse in an appropriate way. These processes operate in a cooperative fashion to produce the textual message.

The relevant knowledge pool is constructed by semantic processes after receiving a discourse goal. It contains information determined by the system to be relevant to the given goal. Use of a relevant knowledge pool provides a limit on the information that needs to be considered when constructing a text for a given goal, thus increasing the efficiency of the program while at the same time providing a model of a speaker's narrowing of attention when producing a text.

Rhetorical techniques are the means which a speaker has available for description. In the TEXT system, these techniques have been encoded as schemata which represent patterns of discourse structure. Use of schemata reflects the fact that people have preconceived ideas about how to provide different kinds of descriptions. The choice of a particular schema to use for an answer is affected by a characterization of the information available and by the discourse purpose of the current answer. The schema is effectively a plan for the text and is used to guide the generation process in its decisions about what to say next.

Focusing constraints, which define how focus can shift from one sentence to the next, are used to ensure that the generated text is coherent. Since text is about something, what is said at any given point must be appropriately related to what has already been said. The focusing mechanism tracks focus of attention as a text is created and, where there are choices for what to say next, it eliminates options that violate its knowledge about valid shifts in focusing. The focus constraints monitor the use of the schemata in the TEXT system.

1.5. System overview


The TEXT system was developed to generate text in response to a limited class of questions about the structure of a military database. The system consists of four basic modules: the semantic processor which produces the relevant knowledge pool, the schema selector, the schema filler, which uses both the selected schema and focus constraints to do its job, and the tactical component. A diagram providing a simple overview of the generation process is shown in Figure 1-2. To answer an incoming question, TEXT first selects a set of possible schemata to be used for the answer. These are the strategies associated with the discourse purpose of the current answer (for example, to provide a definition). On the basis of the input question, semantic processes produce a pool of relevant knowledge. The type of information available in this pool is used to select a single schema from the set of possible schemata. This marks the beginning of interaction between the structural and semantic processes in the system; here semantics influences the structure selected for the answer. The answer is constructed by "filling" the schema: propositions are selected from the relevant knowledge pool which match the rhetorical techniques in the schema. Each rhetorical technique has associated semantics that indicate which types of propositions in the knowledge base it matches. These semantics are dependent on system type (such as database versus computer aided instruction system), but are not dependent on the domain of the system. A focusing mechanism monitors the matching process; where there are choices for what to say next (i.e., where the rhetorical technique matches several propositions in the knowledge pool), the focusing mechanism selects that proposition which ties in most closely with the previous discourse. When a proposition has been selected, focus information about the proposition is recorded.
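The filling process described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the TEXT implementation: the `Proposition` fields, the coverage heuristic in `select_schema`, the preference order in `focus_rank`, and the two toy schemata are all assumptions introduced here. The pool entries are taken from sample text (B).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    predicate: str          # rhetorical technique this proposition can realize
    focus: str              # entity the proposition is about
    mentions: tuple = ()    # entities it introduces (candidate next foci)
    text: str = ""


def select_schema(candidates, pool):
    """Assumed heuristic: prefer the schema whose steps the pool covers best."""
    available = {p.predicate for p in pool}
    return max(candidates,
               key=lambda name: sum(step in available for step in candidates[name]))


def focus_rank(prop, current_focus, potential_foci):
    """Lower is better. The ordering is an assumption made for illustration."""
    if prop.focus in potential_foci:
        return 0            # shift to an entity that was just introduced
    if prop.focus == current_focus:
        return 1            # keep talking about the same entity
    return 2                # not tied to the immediately preceding sentence


def fill_schema(steps, pool):
    message, remaining = [], list(pool)
    current_focus, potential_foci = None, ()
    for step in steps:
        matches = [p for p in remaining if p.predicate == step]
        if not matches:
            continue        # no proposition in the pool matches this technique
        chosen = min(matches,
                     key=lambda p: focus_rank(p, current_focus, potential_foci))
        remaining.remove(chosen)
        message.append(chosen)
        # Record focus information about the selected proposition.
        current_focus, potential_foci = chosen.focus, chosen.mentions
    return message


# Toy schemata (hypothetical; not the schemata developed in Chapter Two).
SCHEMATA = {
    "toy-constituency": ["identification", "constituency",
                         "attributive", "attributive"],
    "toy-analogy": ["identification", "analogy"],
}

POOL = [
    Proposition("identification", "guided projectile", (),
                "A guided projectile is a projectile that is self-propelled."),
    Proposition("constituency", "guided projectile", ("torpedo", "missile"),
                "There are 2 types of guided projectiles: torpedoes and missiles."),
    Proposition("attributive", "guided projectile", (),
                "The guided projectile has DB attributes HORZ RANGE & UNITS and NAME."),
    Proposition("attributive", "missile", (),
                "The missile has a target location in the air or on the earth's surface."),
]

chosen_name = select_schema(SCHEMATA, POOL)
message = fill_schema(SCHEMATA[chosen_name], POOL)
for prop in message:
    print(prop.text)
```

In this toy run the constituency proposition introduces the missile, so the sketch prefers the missile's attributes over those of the guided projectile at the next step, which mirrors the ordering seen in (B).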

When the schema has been filled, the system passes the constructed, ordered message to the tactical component. The tactical component uses a functional grammar, based on a formalism defined by Kay (79), to translate the message into English. The main theoretical emphasis at this level is on the use of information derived by the strategic component to determine surface choices. The grammar was designed so that it can use the focus information provided in the message to select appropriate syntactic constructions. A sample text produced by the system in response to a request to define a guided projectile is shown below in (B). The system used the constituency strategy to produce this text.



Figure 1-2: System Overview

B) A guided projectile is a projectile that is self-propelled. There are 2 types of guided projectiles in the ONR database: torpedoes and missiles. The missile has a target location in the air or on the earth's surface. The torpedo has an underwater target location. The missile's target location is indicated by the DB attribute DESCRIPTION and the missile's flight capabilities are provided by the DB attribute ALTITUDE. The torpedo's underwater capabilities are provided by the DB attributes under DEPTH (for example, MAXIMUM OPERATING DEPTH). The guided projectile has DB attributes TIME TO TARGET & UNITS, HORZ RANGE & UNITS and NAME.

1.6. The database application

In order to test principles about natural language generation, an application was selected that could provide a motivation for generation as well as a restricted yet interesting domain. A system was developed, therefore, within the framework of a natural language interface to a database system that addressed the specific problem of generating answers to questions about the database structure. To date, natural language database systems have concentrated on answering factual questions, providing answers in the form of lists or tables of objects in the database. These questions query the existence or identity of restricted classes of objects in the database. An answer is provided by searching the database for objects which meet the given restrictions. To ask such questions, the user must already know what information is stored in the database and how it is structured. (Note that even if the user already knows what type of information is available, its structure in the database is not always intuitive.) A user who is not aware of the nature of information stored and its structure can neither request the system to supply this information (since current systems do not possess this capability) nor phrase appropriate questions about the database contents. The task of the TEXT system is to generate responses to such meta-level questions. Three classes of questions have been considered: questions about information available in the database, requests for definitions, and questions about the differences between database entities. In this context, input questions provide the initial motivation for generating text.

Note that in some systems, the list (especially in cases where it consists of only one object) may be embedded in a sentence, or a table may be introduced by a sentence which has been generated by the system (Grishman 79). In a few systems (e.g., Malhotra 75; Codd 78), a one or two sentence reply about the information in the database may be generated, but this reply is usually stored as a whole in the knowledge structure.

Although the specific application of answering questions about database structure was used primarily for testing principles about text generation, it is a feature that many users would like. Several experiments (Malhotra 75; Tennant 79) have shown that users need to ask questions to familiarize themselves with the database structure before proceeding to make requests about the database contents.

Malhotra's experiment involved a simulated management decision support system in which users typed in questions at a terminal. These questions were intercepted by a person familiar with the system, who rephrased the questions using syntax acceptable to the system. When questions were asked which the system could not answer, the interceptor would either answer the question himself or construct a series of questions necessary to answer the one asked. Subjects were given a problem to solve which required using information stored in the database. Transcripts of the user sessions indicate that people often begin by asking questions to familiarize themselves with the material available before asking questions particular to the given problem. Typical of the questions that were asked are the following:

• What kind of data do you have?

• What do you know about unit cost?

• What is the difference between material cost and production cost?

• What is production cost?

Tennant's experiments were done on two natural language database systems: the PLANES system, which accesses a large database containing information about naval aircraft, and the Automatic Advisor, which accesses a smaller database containing course information. University students were asked to solve database problems after reading introductory information about the database. Tennant found that systems tended to be lacking in conceptual coverage. Like Malhotra, he found that users often asked questions which were not interpretable as database queries. These included questions about the database (e.g., "What do you know?") and questions about vocabulary (e.g., "What is a buser?").

Responding to questions such as these requires more than a simple search of the database. These types of questions do not provide as clear restrictions on what information is sufficient to answer them as do specific questions about the database content. In fact, there is often no single correct way to answer such questions. Since answers to questions about the structure of the database will usually require more than a single sentence, the application provides an appropriate testbed for generation principles. The system will be required to determine how to select the appropriate information to be included in the answer and how to organize it into a multi-sentential text.

Implementation of the TEXT system for natural language generation used a portion of the Office of Naval Research (ONR) database that contains information about vehicles and destructive devices. Some examples of questions that can be asked of TEXT include:


• What is a frigate?

• What do you know about submarines?

• What is the difference between an ocean escort and a cruiser?

Examples of questions and responses from this domain will be used throughout the book. The kind of generation of which the system is capable is illustrated by the response it generates to question (C) below.

C) What kind of data do you have?

All entities in the ONR database have DB attributes REMARKS. There are 2 types of entities in the ONR database: destructive devices and vehicles. The vehicle has DB attributes that provide information on SPEED INDICES and TRAVEL MEANS. The destructive device has DB attributes that provide information on LETHAL INDICES.


The type of response generated by the TEXT system could be used not only for specific questions about the database structure, but also as supportive explanations for yes/no questions or as explanations for structural presumption failures (Mays 80). As an example, consider the question "What is the draft and displacement of the whisky?". A plausible response is given in (D) below. This is very similar to some of the responses currently generated by the TEXT system.

D) The database contains no information on DRAFT and DISPLACEMENT for the whisky. Ships have DB attributes DRAFT and DISPLACEMENT. The whisky is an underwater submarine with a PROPULSION TYPE of DIESEL and a FLAG of RDOR. The submarine's underwater capabilities are provided by the DB attributes under DEPTH (for example, OPERATING DEPTH) and MAXIMUM SUBMERGED SPEED. Other DB attributes of the submarine include OFFICIAL NAME, FUEL (FUEL TYPE and FUEL CAPACITY), and PROPULSION TYPE.

Kaplan (79) also discusses the use of supportive explanations for yes/no questions.

The system developed is not capable of detecting an intensional failure. Assuming that such a failure has been found, the system could be extended to generate a response that explains the failure.

A system for generating textual responses to questions requiring descriptions or explanations could be useful in other application areas in addition to the database query system. Computer assisted instruction systems (Collins, Warnock and Passafiume 74) and expert systems (Grosz 77) are examples of areas where the provision of descriptions and explanations would be useful. The methods for generation developed for the TEXT system are not specific to the database application and could be adapted for systems where generation of descriptions of static information is required. That is, the schemata capture discourse strategies in terms of text structure, and their representation does not rely on domain (or system) dependent concepts. Similarly, the focus constraints describe general preferences for what should be appropriately said next and apply to all situations in which coherent text must be produced.

1.7. Other issues

In order to develop a system that can generate text in response to questions about database structure, problem areas outside the realm of text generation per se had to be considered. These include the knowledge representation which contains the information to be described, interpretation of the user's question and user modelling.

Knowledge representation and content are important since they limit what the generation component is able to talk about unless an extensive inferencing component is available. A knowledge representation was implemented (McCoy 82) which draws heavily on features used in other database models. It is based on the Chen entity-relationship model (Chen 76) and also includes a generalization hierarchy on entities, a hierarchy on attributes, and distinguishing characteristics of entities in the generalization hierarchy. This combination of features incorporates both information about the actual database and its structure as well as a real world view of the data.
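To make these ingredients concrete, here is a minimal Python sketch of a generalization hierarchy on entities with inherited DB attributes, distinguishing characteristics, and a small attribute hierarchy. It is not the representation of (McCoy 82): the class name `Entity`, its fields, and `ATTRIBUTE_HIERARCHY` are assumptions made for illustration, relationships between entities are omitted for brevity, and the facts are the ones stated in sample text (B).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    name: str
    parent: Optional["Entity"] = None                   # generalization link
    attributes: set = field(default_factory=set)        # DB attributes at this level
    distinguishing: dict = field(default_factory=dict)  # what sets it apart from siblings

    def ancestors(self):
        """Yield the entities above this one in the generalization hierarchy."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def all_attributes(self):
        """Own DB attributes plus those inherited from every ancestor."""
        attrs = set(self.attributes)
        for ancestor in self.ancestors():
            attrs |= ancestor.attributes
        return attrs


# Hierarchy on attributes: a general attribute and the DB attributes under it.
ATTRIBUTE_HIERARCHY = {"DEPTH": ["MAXIMUM OPERATING DEPTH"]}

guided_projectile = Entity(
    "guided projectile",
    attributes={"TIME TO TARGET & UNITS", "HORZ RANGE & UNITS", "NAME"},
)
missile = Entity(
    "missile",
    parent=guided_projectile,
    attributes={"DESCRIPTION", "ALTITUDE"},
    distinguishing={"target location": "in the air or on the earth's surface"},
)
torpedo = Entity(
    "torpedo",
    parent=guided_projectile,
    attributes={"DEPTH"},
    distinguishing={"target location": "underwater"},
)

print(sorted(missile.all_attributes()))
print(torpedo.distinguishing["target location"])
```

Keeping the distinguishing characteristics next to each entity is what would let a generator say how two sub-types differ without having to infer it from the raw data.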

No facility for interpreting a user's questions is provided in the TEXT system implementation since this work is on the generation of language and not interpretation. Questions must be phrased using a simple functional notation which corresponds to the types of questions that can be asked. The TEXT system provides a canned explanation of this notation when it is invoked and it is fairly easy to use. For a user model, the system assumes a casual and naive user and gears its responses to a level appropriate for this characterization. An extensive user modelling facility, which can represent and infer information about different types of users, was not implemented as part of the TEXT system. An analysis was done, however, on the effect of previous discourse on the generation of responses. This analysis indicates how a generation system can make use of the previous dialogue to tailor its responses to the current user and is described in Chapter Five.

The research described here focuses on issues concerning the content and organization of the generated text. These two problems have not, for the most part, been addressed in the past and they represent areas about which little is known. In order to handle them appropriately, a comprehensive treatment of discourse structure and focusing constraints and their relation to the generation of natural language was necessary.

1.8. Guide to remaining chapters

A discussion of discourse structure, its effect on generation, and the implementation of the schemata is provided in Chapter Two. The focus constraints, both as they affect discourse coherency and as they restrict attention to relevant information, are discussed in Chapter Three. These two chapters describe the major part of the text generation theory and are essential to the remaining chapters. The implementation of the TEXT system is described in Chapter Four. This includes the knowledge base used, the method used to determine relevancy, the dictionary and the tactical component. Chapter Four closes with a discussion of practical considerations, discussing how close the system comes to meeting the needs of real users. Chapter Four will be of interest to those who want to get a real feel for how the system works. It will be of less interest to those who lack a computer science background, although the section on the tactical component is recommended for those with a linguistic background. Chapter Five gives an analysis of how the previous discourse could be used to improve the quality of the responses generated. A comparison of this work to other research in natural language generation is provided in Chapter Six and the final chapter presents some conclusions, along with suggestions for future work. Appendix A provides examples of the TEXT system in operation.


Discourse structure

The approach I have taken towards text generation is based on two fundamental hypotheses about the production of text: 1) that how information is stored in memory and how a person describes that information need not be the same and 2) that people have preconceived notions about the ways in which descriptions can be achieved. I assume that information is not described in exactly the same way it is organized in memory. Rather, such descriptions reflect one or more principles of text organization. It is not uncommon for a person to repeat himself and talk about the same thing on different occasions. Rarely, however, will he repeat himself exactly. He may describe aspects of the subject which he omitted on first telling or he may, on the other hand, describe things from a different perspective, giving the text a new emphasis. Chafe (79) has performed a series of experiments which he claims support the notion that the speaker decides as he is talking what material should go into a sentence. These experiments show that the distribution of semantic constituents among sentences often varies significantly from one version of a narrative to another.

I make no claims about the nature of stored knowledge in this research. For the purposes of text generation, any representation of knowledge could have been used. In practice, however, a particular representation for the given application had to be selected. Questions about how a representation can restrict the generation process, either in terms of content or ease of [...], are discussed in Section 4.2.

The second hypothesis central to this research is that people have preconceived ideas about the means with which particular communicative tasks can be achieved as well as the ways in which these means can be integrated to form a text. In other words, people generally follow standard patterns of discourse structure. For example, they commonly begin a narrative by describing the setting (the scene, the characters, or the time-frame). In the TEXT system, these types of standard patterns of discourse structure have been exploited through the use of schemata. A schema is a representation of a standard pattern of discourse structure which efficiently encodes the set of communicative techniques that a speaker can use for a particular discourse purpose. It defines a particular organizing principle for text and is used to structure the information that will be included in the answer. It is used to guide the generation process, controlling decisions about what to say when in a text. This mechanism embodies a computational treatment of rhetorical devices, which have not previously been formalized in such a way.

2.1. Rhetorical predicates

Rhetorical predicates are the means which a speaker has for describing information. They characterize the different types of predicating acts he may use and they delineate the structural relations between propositions in a text. Some examples are analogy (the making of an analogy), constituency (description of sub-parts or sub-types), and attributive (providing detail about an entity or event). Linguistic discussion of such predicates (e.g., Williams 1893; Shipherd 26; Grimes 75) indicates that some combinations are preferable to others. The following sections give the linguistic background of rhetorical predicates.

2.1.1. Linguistic background

The notion of the means available to a speaker or writer goes back to Aristotle, who describes the means which a speaker can use for persuasive argument (McKeon 41). He distinguished between enthymemes (or syllogisms) and examples, where syllogisms are argument types and examples provide evidence for different arguments. Both Williams (1893) and Shipherd (1926), old-style grammarians, categorize sentences by their function in order to illustrate to the beginning writer how to construct paragraphs. The functions Williams identifies include: topic, general illustration, particular illustration, comparison, amplification, contrasting sentences, and conclusions. Although Williams enumerates many of the "do's" and "don'ts" of writing, he says nothing about combining sentence functions to form paragraphs. He merely cites examples of prose that he considers well done and identifies the function of each sentence in the examples.

In more recent years, Grimes describes rhetorical predicates as explicit organizing relations used in discourse (Grimes 75). Grimes distinguishes three functions that predicates can serve in discourse:

1. supporting or supplementary (which add detail, explain, or substantiate what has come before. The three examples of predicates given above fall into this category.)


2. setting (which locate an object or event in space or time)

3. identification (which establish or maintain reference to an object)

Grimes claims that the predicates are recursive and can be used to identify the organization of text at any level (i.e., proposition, sentence, paragraph, or longer sequence of text), but does not show how this is done. Rhetorical predicates have also been called coherence relations (e.g., Hirst 81) and have been used as an aid in anaphora resolution (Hobbs 78; Lockman 78). Hirst (81) proposed a set of relations extracted from a variety of sources, including elaboration, contrast, effect, cause, syllogism, parallel, and exemplification. Works using coherence relations concentrate on their aid in interpretation for the specific task of anaphora resolution.

2.1.2. Ordering communicative techniques

Although the use of rhetorical predicates in text as structuring devices has been considered, most researchers have not discussed the ways in which they may be combined to form larger units of text. Both Grimes and Williams imply this use however. Grimes claims that the predicates are recursive, and Williams cites examples of well-written prose, identifying the predicates used. My own examination of texts and transcripts has shown that not only are certain combinations of rhetorical techniques more likely than others, but certain ones are more appropriate in some discourse situations than others. For example, I found that identification of objects was frequently achieved by employing some combination of the following means: (1) identification of an item as a member of some generic class, (2) description of an object's function, attributes, and constituency (either physical or class), (3) analogies made to familiar objects, and (4) examples. These techniques were rarely used in random order; for instance it was common to identify an item as a member of some generic class before providing examples.

For this analysis of rhetorical predicates, a variety of texts were examined - ten different authors, in varying styles, from very literate written to transcribed spoken texts formed the basis of the study. Short samples of expository writing were used since these are most relevant to the system being developed. This also avoided problems involved in narrative writing (e.g., scene, temporal description, personality). The data were drawn from the following texts: Working (the introduction plus two transcriptions) (Terkel 72), Dictionary of Weapons and Military Terms (Quick 73), Encyclopedia Americana (Encyclopedia 76), The Hamlyn Pocket Dictionary of Wines (Paterson 80), The Poorperson's Guide to Great Cheap Wines (Nelson 77), "The American Style of Warfare and Military Balance" (Luttwak 79), Future Facts (Rosen 76), "Toxicants occurring naturally in spices and flavors" (Hall 73), transcripts of mother-child dialogues (Shipley 80), transcripts of user interactions with database systems (Malhotra 75), and "Tactical Nuclear Weapons" (Martin 73).

Each proposition in the texts was classified as one of the set of rhetorical predicates shown in Figures 2-1 - 2-3. A proposition is a simple predicating act and can surface linguistically as either a sentence, sentence fragment, or a clause. Where possible, each clause was classified as a predicate, but in some cases a sentence consisting of several clauses could not be broken down into separate predicates and a single predicate was assigned to the entire sentence. In such cases, the entire sentence as a whole conveyed the force of the predicate. In a few cases, it was difficult to classify a proposition definitively as a single predicate. In such cases, the ambiguous proposition was assigned several predicates. Figures 2-1 - 2-3 show three groups of predicates, categorized by source.


In the figures, each predicate is followed by an example English sentence. In some cases, a preceding sentence was needed to provide a context in which to give the example. In such cases, the example illustrating the predicate is underlined. The first group of predicates was taken from (Grimes 75). The second group of predicates was taken from (Williams 1893). These predicates are somewhat similar to those proposed by Grimes, but provide a viewpoint different enough to be useful. For example, "conclusion" names a predicate which draws a conclusion from the previous discourse, while Grimes' "inference" identifies a specific fact deduced from previous facts.

The final group of predicates are those that I found necessary to add during the analysis of texts. "Identification" identifies an entity as belonging to a specific class (the opposite of Grimes' "constituency"). This predicate may be followed by attributes or functions which further identify the entity. "Positing" simply introduces an entity into the text (e.g., "Just think of Marcus Welby", "Movies set up these glamorized occupations" (Terkel 72)). Further discussion of the entity was only provided in succeeding sentences and not in the positing proposition. "Renaming" provides alternative names for an entity (e.g., "Also known as the 'Red Baron'").

These are transcripts of taped dialogues between mothers and their children where each mother was asked to show her child pictures of familiar and unfamiliar objects and discuss them. Some mothers described the pictures in great detail, while others provided minimal comments. The dialogues were taped by Shipley and her colleagues for psychological experiments.

Only a sampling of paragraphs was used from each.



1. Attributive: Mary has a pink coat.

2. Equivalent: Wines described as 'great' are fine wines from an especially good village.

3. Specification (of general fact): Mary is quite heavy. She weighs 200 pounds.

4. Explanation (reasoning behind an inference drawn): So people form a low self-image of themselves, because their lives can never match the way Americans live on the screen.

5. Evidence (for a given fact): The audience recognized the difference. They started laughing right from the very first frames of that film.

6. Analogy: You make it in exactly the same way as red-wine sangria, except that you use any of your inexpensive white wines instead of one of your inexpensive reds.

7. Representative (item representative of a set): What does a giraffe have that's special? ... a long neck.

8. Constituency (presentation of sub-parts or sub-classes): This is an octopus. There is his eye, these are his legs, and he has these suction cups.

9. Covariance (antecedent, consequent statement): If John went to the movies, then he can tell us ...

10. Alternatives: We can visit the Empire State Building or call it a day.

11. Cause-effect: The addition of spirit during the period of fermentation arrests the fermentation development ...

12. Adversative: It was a case of sink or swim.

13. Inference: So people form a low self-image of themselves.

Figure 2-1: Grimes' Predicates


Williams' predicates are illustrated by providing an example paragraph from his text in which each sentence is classified as one of his predicates. The classifying predicate follows the sentence.



"What, then, are the proper encouragements of genius? (topic) I answer, subsistence and respect, for these are rewards congenial to nature. (amplification) Every animal has an aliment suited to its constitution. (general illustration) The heavy ox seeks nourishment from earth; the light chameleon has been supposed to exist on air. (particular illustration) A sparer diet than even this satisfies the man of true genius, for he makes a luxurious banquet upon empty applause. (comparison) It is this alone which has inspired all that ever was truly great and noble among us. It is as Cicero finely calls it, the echo of virtue. (amplification) Avarice is the pain of inferior natures; money the pay of the common herd. (contrasting sentences) The author who draws his quill merely to take a purse no more deserves success than he who presents a pistol. (conclusion)"

Figure 2-2: Williams' Predicates

2.2. Analysis of texts

My analysis has shown that, with slight variations, similar patterns of predicate usage occur across various expository texts. These patterns have been represented as schemata. The schemata are recursive descriptions and may be embedded in other schemata to form paragraphs. In addition, in the texts, a paragraph was sometimes introduced by the positing predicate. Allowing for schema embedding and positing initial sequences, each paragraph that was examined (a total of 56) could be described by one of the schemata developed. Four schemata were found to capture the structure of the 56

Discourse structure

1. Identification ELTVILLE (Germany) An important wine village of the Rheingau region.

2. Renaming Also known as the Red Baron.

3. Positing Just think of Marcus Welby.

Figure 2-3:

Additional Predicates Needed For The Analysis

paragraphs. ® These schemata are not intended to capture the structure of all written text. Additional analysis is necessary to capture common strategies used for discourse goals other than those considered here.

identified are shown in Figures 2-4 - 2-7. "{} " indicates

The schemata

optional constituents, "/ " indicates alternatives, "+ " indicates that the item

may appear

may appear 0 to n times. Each schema is followed by a sample paragraph

taken from the data and a classification of the propositions contained in the

paragraph. ";" is use d to

the paragraph. These were translated into the schemata as alternatives. The attributive schema (Figure 2-4) can be used to illustrate a particular point about a concept or object. The sample paragraph, taken from the Introduction to Working, attributes the topic (working and violence) to the book, amplifies on that ("spiritual as well as physical") in proposition (2),

and provides a series of illustrations in the third sentence. Note that the third proposition could either be classified as many illustrations or as a single illustration, both of which are covered by the schema. The fourth

1 to

n times, and "*" indicates that the item is optional and

represent classification of ambiguous propositions in

Note that, in order to make such an analysis, the function of each proposition had to be determined and a predicate assigned to it. Since there are no hard and fast rules for predicate assignment, the analysis is subjective and could have had somewhat different results if done by someone else. This affects both the form of the resulting schemata and the number of schemata necessary to cover all 56 paragraphs. If rules for predicate assignment could be developed, then the schemata could be used for interpretation as well: if an input textual sequence were captured by a schema, its discourse purpose would be discernible. See Section 2.8 for further discussion.


proposition presents a single example as representative of the problem and the fifth amplifies on that instance. The fifth proposition illustrates an ambiguous classification, since it could conceivably function as either amplification or explanation.

The identification schema (Figure 2-5) is used to identify entities or events. The characteristic techniques it uses to do so include identification, particular illustration, evidence, analogy, renaming, and various descriptive predicating acts. It should be noted that the identification schema was only found in texts whose primary function was to provide definitions (i.e., dictionaries and encyclopedias). The other texts simply did not have occasion to provide definitions. Moreover, the schema represents the types of definitions provided in the particular examples analyzed but does not dictate what every definition must look like. For example, some definitions may be provided by describing process information associated with the term.

The constituency schema (Figure 2-6) describes an entity or event in terms of its sub-parts or sub-types. After identifying its sub-types, the focus can either switch to each of its sub-types in turn (following the depth-identification or depth-attributive path) or can continue on the entity itself, describing either its attributes (attributive path) or its functions (cause-effect path). Note that there are three possible predicates that can be used for each sub-type if the depth-identification or depth-attributive path is taken (this is indicated by indentation of the entire set in the figure). Two of these are optional and need not occur (i.e., particular-illustration/evidence and comparison/analogy), but if they do, this portion of the schema will expand into three propositions for each sub-type. The schema may end by optionally returning to discussion of the original by using the amplification, explanation, attributive, or analogy predicate.

In the sample paragraph, taken from the American Encyclopedia, part of the entry under torpedo includes a description of its classification. In the section title and first sentence, the two types of torpedoes are introduced. First, the steam-propelled model is identified by citing facts about it, and then the electric-powered model is compared against it, with the most significant difference cited.

The contrastive schema (Figure 2-7) is used to describe something by contrasting it against something else. The speaker may contrast his major point against a negative point (something he wishes to show isn't true). The lesser item (to be contrasted against) is introduced first. The major concept is then described in more detail using one or more of the predicates shown in the second option of the schema. The closing sequence makes a direct comparison between the two. This schema dictates the structural relation between the two concepts (A and ~A, that is, not A, represent the major and lesser concepts in the schema) but is less restrictive about which predicates are used.


Attributive Schema

Attributive
{Amplification; restriction}
Particular illustration*
{Representative}
{Question; problem Answer} /
{Comparison; contrast Adversative}
Amplification/Explanation/Inference/Comparison


"1) This book, being about work, is, by its very nature, about violence - 2) to the spirit as well as to the body. 3) It is about ulcers as well as accidents, about shouting matches as well as fistfights, about nervous breakdowns as well as kicking the dog around. 4) It is, above all (or beneath all), about daily humiliations. 5) To survive the day is triumph enough for the walking wounded among the great many of us." (Terkel 72)

Example Classification

1. Attributive

2. Amplification

3. Particular illustration

4. Representative

5. Amplification; explanation

Figure 2-4:

The Attributive Schema
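
The notation used in the schemata ("{}", "/", "+", "*") maps closely onto regular-expression operators, so a schema can be read as a pattern over sequences of predicates. The following is a minimal sketch under that assumption, not a description of how TEXT represents schemata. The helper names (`tok`, `alt`, `seq`, `opt`, `plus`, `star`, `matches`) are hypothetical, ";" is treated as an alternative (as the text says the ambiguous classifications were translated), and the last line of the attributive schema is read as obligatory because it is not enclosed in braces.

```python
import re

# Hypothetical encoding: each predicate occurrence is the token "<name> ",
# and the schema notation is translated into regular-expression operators.
def tok(name):
    return "(?:" + re.escape(name) + " )"

def alt(*items):   # "/" and ";" : alternatives
    return "(?:" + "|".join(items) + ")"

def seq(*items):   # juxtaposition: constituents in order
    return "(?:" + "".join(items) + ")"

def opt(item):     # "{}" : optional constituent
    return "(?:" + item + ")?"

def plus(item):    # "+" : item appears 1 to n times
    return "(?:" + item + ")+"

def star(item):    # "*" : item appears 0 to n times
    return "(?:" + item + ")*"

# The attributive schema of Figure 2-4, as read in this sketch.
ATTRIBUTIVE = seq(
    tok("attributive"),
    opt(alt(tok("amplification"), tok("restriction"))),
    star(tok("particular-illustration")),
    opt(tok("representative")),
    alt(
        opt(seq(alt(tok("question"), tok("problem")), tok("answer"))),
        opt(seq(alt(tok("comparison"), tok("contrast")), tok("adversative"))),
    ),
    alt(tok("amplification"), tok("explanation"),
        tok("inference"), tok("comparison")),
)

def matches(schema, predicates):
    """True if the predicate sequence is an instantiation of the schema."""
    text = "".join(p + " " for p in predicates)
    return re.fullmatch(schema, text) is not None

# Classification of the sample paragraph from Working (taking the first
# reading of the ambiguous fifth proposition).
sample = ["attributive", "amplification", "particular-illustration",
          "representative", "amplification"]
print(matches(ATTRIBUTIVE, sample))
```

Treating a schema as a pattern only checks ordering; it says nothing about which content fills each predicate, which is the part the schemata leave to other processes.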



Identification Schema

Identification (class & attribute/function)
{Analogy/Constituency/Attributive/Renaming/Amplification}*
Particular-illustration/Evidence+
{Amplification/Analogy/Attributive}
{Particular illustration/Evidence}

"Eltville (Germany) 1) An important wine village of the Rheingau 2) The vineyards make wines that are emphatically of the Rheingau style, 3) with a considerable weight for a white wine. 4) Taubenberg, Sonnenberg and Langenstuck are among vineyards of note." (Paterson 80)

Example Classification

1. Identification (class & attribute)

2. Attributive

3. Amplification

4. Particular illustration

Figure 2-5:

The Identification Schema

Constituency Schema

Constituency
Cause-effect*/Attributive*/
{Depth-identification/Depth-attributive
    {Particular-illustration/evidence}
    {Comparison/analogy}}+
{Amplification/Explanation/Attributive/Analogy}



"Steam and electric torpedoes. 1) Modern torpedoes are of 2 general types. 2) Steam-propelled models have speeds of 27 to 45 knots and ranges of 4000 to 25,000 yds. (4,367 - 27,350 meters). 3) The electric powered models are similar 4) but do not leave the telltale wake created by the exhaust of a steam torpedo." (Encyclopedia 76)

Example Classification

1. Constituency

2. Depth-identification;


3. Comparison

4. Depth-identification;


Figure 2-6:

The Constituency Schema


In the sample paragraph, the contrastive schema is used to show how people form a bad self-image by comparing themselves against those in the movies. In the first sentence, the movie standard is introduced (the negative point or ~A). In the second and third sentences, real-life occupations and the feelings associated with them are described (the major point or A). Finally, a direct comparison is made between the two situations and an inference drawn: "people form a low self-image of themselves."

2.2.1. Predicate recursiveness

Although the examples above only show how the schemata work at the paragraph level, there is evidence that such organization also occurs at higher levels of text. The schemata were found to apply to a sequence of paragraphs, with each predicate in the schema matching an entire paragraph, instead of a single proposition. The Introduction to Working, for example, covers three major topics, each of which is introduced and closed within four or five paragraphs. The first topic group follows the attributive schema (the text for this topic group is reproduced in Appendix B); each paragraph in the group matches a single rhetorical predicate.* Figure 2-8 shows a tree representing the first topic group of the Introduction. Paragraphs are numbered nodes in the tree. The tree is described by the predicates listed at the bottom of the figure, which is an instantiation of the attributive schema.

Thus, the predicates do indeed seem to function recursively as Grimes suggests. Schemata, since they consist of predicates, also function recursively; that is, each predicate in a schema can expand to another schema. The structure of a text when described by the schemata is, therefore, hierarchical. Each node in the hierarchical structure corresponds to a predicate. The predicate can either be interpreted as a single predicate or can be expanded to another set of predicates representing the schema named.

Recursion functions to describe the structure of text at all levels. For example, a single sentence may be used to attribute information to an entity or a longer sequence of text may be used for the same purpose. The analysis of texts was made in order to discover just how predicates are combined to form a longer sequence of text having a specific function. Thus, the resulting schemata describe combinations of predicates which serve the function of a single predicate. For this reason, each schema is associated with a single predicate and is given its name.

Schema recursion is achieved by allowing each predicate in a schema to expand to either a single proposition (e.g., a sentence) or to a schema (e.g., a text sequence). The structure for a text generated from this application of schemata will be a tree structure, with a sub-tree occurring at each point

Note that this analysis is somewhat subjective.


Compare and Contrast Schema

Positing/Attributive (~A)
{Attributive (A) /
 Particular illustration/Evidence (A) /
 Amplification (A) /
 Inference (A) /
 Explanation (A)}+
{Comparison (A and ~A) /
 Explanation (A and ~A) /
 Generalization (A and ~A) /
 Inference (A and ~A)}



"1) Movies set up these glamorized occupations. 2) When people find they are waitresses, they feel degraded. 3) No kid says I want to be a waiter, I want to run a cleaning establishment. 4) There is a tendency in movies to degrade people if they don't have white-collar professions. 5) So, people form a low self-image of themselves, 6) because their lives can never match the way Americans live — on the screen." (Terkel 72)

Example Classification

1. Positing (~A)

2. Attributive (A)

3. Evidence (A)

4. Comparison; explanation (A and ~A)

5. Inference (A and ~A)

6. Comparison; explanation (A and ~A)

Figure 2-7:

The Compare and Contrast Schema




Topic Group 1

1) Attributive 2) Restriction 3) Attributive 4) Particular-illustration 5) Particular-illustration

Figure 2-8:

Introduction to Working




where a predicate has been expanded into a schema. Propositions occur at the leaves of the tree.

Schemata, therefore, are similar in concept to hierarchical plans (Sacerdoti 77). Each predicate in the schema is a generation goal which can be achieved either by fulfilling a number of sub-goals (the predicate expands to a schema) or producing a single utterance (the predicate expands to a proposition).

Figure 2-9 illustrates how schema recursion works through the use of a constructed example. The identification schema is used in response to the question "What is a Hobie Cat?". The first step the hypothetical speaker takes is to identify the Hobie Cat as a class of catamarans (1). To do so, however, he also provides a definition of a catamaran, assuming that his listener knows little about sailing and that simply identifying the Hobie Cat as a catamaran is not adequate for him. The identification predicate expands to the identification schema, where the speaker identifies the catamaran as a sailboat (2) and provides an analogy between the two, which consists of their similarities (3) and differences (4). Note that these two steps are dictated by an analogy schema. After pointing out a catamaran to the listener (5), he pops back to the original identification schema to provide additional information about the Hobie Cat (6) and finally cites two types of Hobie Cats, the 16-ft. and the 14-ft. (7).

ID Schema
    identification -> ID Schema

1) A Hobie Cat is a brand of catamaran, 2) which is a kind of sailboat. 3) Catamarans have sails and a mast like other sailboats, 4) but they have two hulls instead of one. 5) That thing over there is a catamaran. 6) Hobie Cats have a canvas cockpit connecting the two pontoons and one or two sails. 7) The 16 ft. Hobie Cat has a main and a jib and the 14 ft. Hobie Cat has only a main.

Figure 2-9:

Schema Recursion
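
The Hobie Cat example can be written down as a tree in which each predicate expands either to a proposition or to an embedded schema, with the propositions at the leaves. The sketch below is illustrative only: the class names `Proposition` and `SchemaNode` and the function `leaves` are hypothetical, and the predicate labels attached to propositions 5, 6 and 7 are assumptions, since the text does not name them.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Proposition:
    text: str                      # a single utterance (leaf of the tree)

@dataclass
class SchemaNode:
    predicate: str                 # predicate (or schema) this node realizes
    children: List[Union["SchemaNode", Proposition]] = field(default_factory=list)

def leaves(node):
    """Collect the propositions at the leaves, left to right."""
    if isinstance(node, Proposition):
        return [node.text]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out

# Constructed example from the text: the identification predicate of the
# outer schema expands to an embedded identification schema.
hobie = SchemaNode("identification schema", [
    SchemaNode("identification", [
        Proposition("A Hobie Cat is a brand of catamaran,"),
        SchemaNode("identification schema (embedded)", [
            SchemaNode("identification", [
                Proposition("which is a kind of sailboat.")]),
            SchemaNode("analogy", [
                Proposition("Catamarans have sails and a mast like other sailboats,"),
                Proposition("but they have two hulls instead of one.")]),
            # label assumed for illustration
            SchemaNode("particular-illustration", [
                Proposition("That thing over there is a catamaran.")]),
        ]),
    ]),
    # labels assumed for illustration
    SchemaNode("attributive", [
        Proposition("Hobie Cats have a canvas cockpit connecting the two pontoons and one or two sails.")]),
    SchemaNode("particular-illustration", [
        Proposition("The 16 ft. Hobie Cat has a main and a jib and the 14 ft. Hobie Cat has only a main.")]),
])

for number, text in enumerate(leaves(hobie), start=1):
    print(number, text)
```

Reading the leaves from left to right reproduces the seven numbered propositions in the order in which the hypothetical speaker utters them.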

Full recursion, such as is illustrated in the above example, is not currently implemented in the TEXT system. In order for the system to be fully recursive, a schema must be written for each rhetorical predicate. Right now, schemata for only four of the predicates (out of a total of ten predicates) are written. (In the above example, the analogy schema shown is assumed to correspond to the compare and contrast schema, but this would require more analysis to verify.)

Another, perhaps more interesting side to the recursive use of schemata is the question of when recursion is necessary. Clearly, there are situations where a simple sentence is sufficient for fulfilling a communicative goal, while in others, it may be necessary to provide a more detailed explanation. One test for recursion hinges on an assessment of the listener's knowledge. In the above example, the speaker provided a detailed identification of the Hobie Cat, because he assumed that the listener knew very little about sailing. In order to achieve comprehensive treatment for providing more detailed information, a full user model (Moore 80; Rich 79; Allen 80) would have to be developed to determine how much detail is needed for each user at different times.

Another test for recursion hinges on the amount of information available about a given concept in the knowledge pool. No matter how much detail a user needs to understand a concept, it cannot be supplied if nothing more is known about the concept. On the other hand, if a speaker knows a great deal about a concept he is discussing, he will probably want to say it unless he's sure the listener already knows about it. Neither user modelling nor assessments of the amount of information which can be talked about have been implemented in the TEXT system.

The machinery for actually performing the recursive push to an associated schema (i.e., entering a new schema and saving the states associated with the old on a stack) has been implemented, so that once the extra schemata are written and sufficient tests for providing detailed information developed, full recursion would be possible.

There are situations where a full-blown user model is not necessary to determine that recursion is necessary. One such case has been implemented in the TEXT system, where recursion is used in answering a question about the difference between two very different items. In this case, simply asking the question signifies to the system that the user has no idea what these two items are. Since the most appropriate information to include in the answer is about generic classes (see Section 4.3), it is the only information provided in the relevant knowledge pool. Therefore, double identification of the two questioned objects is necessary (as was the case in identifying a Hobie Cat). When a question is asked about two very different items, it triggers the tagging of the super-ordinates of the questioned objects as unknown to the user.

For example, in asking about the difference between a destroyer and a bomb, the questioner indicates that he doesn't understand that one is a vehicle and the other a destructive device, two objects with totally different functions.* During schema filling, the presence of an unknown tag indicates that the user needs more detailed information and a recursive push is performed. In (A) below the answer to the question "What's the difference between a destroyer and a bomb?" is shown. Two recursive schema invocations were made, both from one identification predicate to the next (proposition 1 to 2 and proposition 3 to 4), resulting in a double identification. Note that since no information other than identificational information is available in the relevant knowledge pool, an early exit from the embedded schema is taken.


Note that the system does not address itself to the question of why the user thinks they are similar, another possible way of answering the question.



Schema selected:
proposition selected:
DESTROYER SHIP (restrictive ((DRAFT)) (((DRAFT (15 222))))) (non-restrictive TRAVEL-MODE SURFACE))
Schema selected:
proposition selected:
focus: SHIP
Schema exited
proposition selected:
SURFACE) (non-restrictive TRAVEL-MEANS
SHIP VEHICLE (non-restrictive FUNCTION
BOMB FREE-FALLING (restrictive
focus: BOMB
Schema selected:
proposition selected:
Schema exited
proposition selected:
Message through dictionary.
Entering tactical component

A destroyer is a surface ship with a DRAFT between 15 and 222. A ship is a vehicle. A bomb is a free falling projectile that has a surface target location. A free falling projectile is a lethal destructive device. The bomb and the destroyer, therefore, are very different kinds of entities.
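
The double identification in the destroyer and bomb example can be mimicked with an explicit stack: when the super-ordinate of the entity just identified is tagged as unknown, the state of the current schema is saved and an embedded identification is entered. This is a minimal sketch of that control flow under simplifying assumptions, not TEXT's implementation. The function `identify`, the dictionary `superordinate_of` and the set `unknown` are hypothetical names, and the knowledge pool is reduced to a table of super-ordinates.

```python
def identify(entity, superordinate_of, unknown):
    """Produce identification propositions for `entity`, performing a
    recursive push whenever the super-ordinate is tagged as unknown."""
    propositions = []   # (entity, super-ordinate) pairs, in order of utterance
    trace = []          # record of pushes and pops
    stack = []          # saved states of suspended schemata
    current = entity
    while True:
        parent = superordinate_of.get(current)
        if parent is None:
            break                       # nothing available to identify with
        propositions.append((current, parent))
        if parent in unknown and parent in superordinate_of:
            stack.append(current)       # save the state of the old schema
            trace.append("push " + parent)
            current = parent            # enter the embedded schema
        else:
            break
    while stack:                        # early exit: only identificational
        trace.append("pop to " + stack.pop())  # information is available
    return propositions, trace

# Simplified pool for "What's the difference between a destroyer and a bomb?"
superordinate_of = {
    "DESTROYER": "SHIP",
    "SHIP": "VEHICLE",
    "BOMB": "FREE-FALLING PROJECTILE",
    "FREE-FALLING PROJECTILE": "LETHAL DESTRUCTIVE DEVICE",
}
unknown = {"SHIP", "FREE-FALLING PROJECTILE"}   # tagged by the question

for questioned in ("DESTROYER", "BOMB"):
    props, trace = identify(questioned, superordinate_of, unknown)
    print(props, trace)
```

Each questioned object yields two identification propositions, matching the pairs (1, 2) and (3, 4) described in the text; the closing inference that the two objects are very different is outside the scope of this sketch.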




2.2.2. Summary of text analysis

The analysis of texts and transcripts shows that patterns do occur across a variety of text styles. It appears, however, that the patterns are very loose. Each schema contains a number of alternatives, indicating that a speaker has a wide variety of options within each type of structure. Moreover, since it is difficult to precisely define a predicate, the interpretation of each predicate in the schema allows for additional speaker variation.

Encoding this extent of variability in schemata is intentional. It reflects the observation that at the text level speakers have more options in constructing English than at the sentence level: there is less agreement on what constitutes a bad paragraph than a bad sentence. Moreover, these choice points allow other factors to influence textual structure. In the current system, focus of attention is the sole influence (see Chapter Three), but influences such as a user model could also be incorporated.

Despite variability, the schemata do provide definite constraints on the ordering of paragraph constituents. A higher level view of the identification schema, the least constrained of the schemata, illustrates this. The schema consists of an identification, a description, and an example, in that order, with the option of an additional descriptive statement or example following, a reasonably constrained organization.

It should be noted that the schemata are descriptive and not prescriptive. Any discourse norm developed over a period of time will eventually be broken in order to achieve a desired literary effect. Literary license, in fact, is based on the breaking of norms. It may be that norms at the discourse level are broken to create implicatures similarly to the creation of implicatures at the sentence level (Grice 75). All this points to the fact that the schemata do not function as grammars of text.

The schemata do, however, identify common means for achieving certain discourse goals. They capture patterns of textual structure that are frequently used by a variety of people. Thus, they describe the norm for achieving given discourse goals, although they do not capture all the means for achieving these goals. Since they formally capture means that are used for achieving a discourse goal, they can be used by a generation system to produce effective text.

13 where

description ->

example -> particular-illustration/evidence


2.3. Related research using rhetorical predicates

One computational use of rhetorical predicates is for the interpretation of arguments (Cohen 81b). Cohen's goal is to determine argument structure. She uses linguistic clues in the text to aid in determining the rhetorical function of a proposition and, thereby, the supporting relations between propositions in the text. Some of the predicate types which Cohen uses include claim, evidence, and inference. It should be noted that Cohen assumes an "oracle" which does the classification of propositions as predicates. In Cohen's work, predicates are used for the interpretation of language, as opposed to its generation.

Another proposed use of rhetorical predicates is in the generation of paragraphs (Jensen et al. 81). Jensen assumes that the content of the paragraph has already been determined. Her system then determines the function of each proposition and uses it to aid in the development of paragraph style. By identifying the underlying structure between propositions, they can be combined appropriately in text. This is in contrast to the use of predicates here to guide the determination of content as well as to determine ordering.

As mentioned earlier, Hobbs (78) and Lockman (78) both used predicates (or coherence relations) as an aid in the interpretation of anaphora. These works use formal definitions of coherence relations to identify the relations between juxtaposed sentences. These relations then help to predict what kind of anaphora can occur. This work has application to natural language interpretation, but has little to say about the use of coherence relations in generation.

Rumelhart's story grammars (Rumelhart 75) are similar to schemata as they describe textual structure for stories. He uses the grammars to recognize the underlying structure of a story, as opposed to generating it, and to summarize the important events of a story. Rumelhart's grammars differ from schemata in that they include both a structural and a semantic component, the non-terminals of the grammar (e.g., setting, episode, event) do not correspond to the rhetorical predicates used for TEXT, and he captures the structure of narratives, while I am more interested in the structure of descriptions.

2.4. Use of schemata

In the TEXT system, schemata describing discourse structure are used to guide the generation process. They are used to decide what is said first, what next, and so forth. The four schemata shown in Figures 2-4 - 2-7 above (identification, attributive, constituency, and compare and contrast) are used in the TEXT system with minor variations.

The identification, constituency, and attributive schemata were modified by eliminating several predicates for which no corresponding information exists in the specific application. Specifically, the renaming predicate was eliminated from the identification schema since synonyms are not represented or used in the TEXT system, and the cause-effect predicate was eliminated from the constituency schema since no process information is represented. TEXT's attributive schema is even more constrained than the original. The analogy predicate is used in place of the comparison predicate and the classification predicate is used instead of restriction, as neither the comparison nor the restriction predicate has a translation in the database domain. In addition, several alternatives and options were deleted from the attributive schema, notably question-answer, adversative, and all alternatives of the last line except explanation. The modified schemata are each a subset of their corresponding originals; that is, the structures the modified schemata generate are generated by the originals, but they do not generate all structures generated by the originals.

The compare and contrast schema was modified to allow for equal discussion of the two items in question. Recall that the contrastive schema which emerged from the text analysis called for contrasting a major concept against a minor one. The minor concept had, in most cases, either been discussed in the preceding text, or was assumed by the writer to be familiar to the reader. Thus, more discussion of the major concept was provided. Since no history of discourse is currently maintained in the TEXT implementation (see Chapter Five for suggestions for future work) and no user model, other than a static one, is constructed, the system does not know whether the user has more knowledge about one concept than another and the comparison, therefore, must be equally balanced between the two. An example of an equally balanced comparison taken from the texts analyzed is shown below in (B) (the basic outline of the compare and contrast schema used in TEXT is shown).

(B) "Made by" vs. "Produced by" (Nelson 77)

Each listing also states that the wine was "produced and bottled by," or "made and bottled by," or "cellared and bottled by" a particular vintner. In the case of California wines, this is a very rough guide to how much of the wine in the bottle was actually fermented and finished by the company that put it into the bottle.

Differences

If the label states "produced and bottled by," then at least 75 percent of the wine was fermented and finished by that winery. If the label says "made and bottled by," then only 10 percent of the wine need have been produced by the winery, and the other 90 percent or some portion of it may have been bought from another source and blended into the final product. If the label says anything else ("cellared," "vinted," "bottled," "perfected," or any long and glorious combination of these words), then none of the wine in the bottle need have been produced by that winery.

The fact that the label says simply "Bottled by Jones Brothers Winery" doesn't mean the wine is no good, however. It may be [...] Its goodness will simply depend on the ability of the Jones Brothers to buy good wine, rather than on their ability to make it.

2.4.1. Associating technique with purpose

In the texts I analyzed, different rhetorical techniques were found to be used for different discourse purposes. In the TEXT system, this association of technique with discourse purpose is achieved by associating the schemata with different question-types. For example, if a question involves defining a term, a different set of schemata (and therefore rhetorical techniques) is chosen than if the question involves describing the type of information available in the knowledge base. The discourse purposes under consideration correspond to the three response types handled by TEXT:

1. define: provide a definition

2. describe: describe available information

3. compare: compare


The identification schema was found to be used for definitions. (In fact, in TEXT it is only used in response to a request for a definition.) On the other hand, the purpose of the attributive schema is to provide detailed information about one particular aspect of any concept and it can therefore be used in response to a request to describe available information. In situations where an object or concept can be described in terms of its sub-parts or sub-classes, the constituency schema is used. It may be selected in response to requests for either definitions or information. The compare and contrast schema is used in response to a question about the difference between objects. It makes use of each of the three other schemata (see Section 4.4.6). A summary of the assignment of schemata to question-types is shown in Figure 2-10.

requests for definitions                         identification, constituency

requests for available information               attributive, constituency

requests about the difference between objects    compare and contrast

Figure 2-10:

Schemata used for TEXT

It should be noted that the compare and contrast schema actually has many uses and is an expository device frequently used in many of the texts analyzed. This schema is appropriate as the response structure for any question type when an object similar to the questioned object has been discussed in the immediately preceding discourse or is assumed to be familiar to the reader. In such situations, it serves two purposes: 1) it can point out the ways in which the questioned object differs from a concept familiar to the user; and 2) it can be used to parallel the structure of an earlier answer. This type of response would require using the one-sided compare and contrast schema that most of the analyzed texts used. In order for TEXT to use the compare and contrast schema for questions other than "What's the difference" questions, a discourse history record would have to be implemented and maintained throughout a session.


2.5. Selecting a schema

Textual organization is influenced both by a speaker's goal and by what he has to say. Thus, the selection of a textual strategy is dictated by the discourse purpose and by knowledge that is relevant to that purpose. Each discourse purpose has a set of schemata associated with it that restricts the choice of which textual strategy to use to a small number of possibilities. A characterization of the information relevant to that purpose can then be used to select a single schema from the small set of possible schemata. Basically, this characterization specifies how much information is potentially relevant to the discourse purpose.

In TEXT, processing for schema selection models this view. Once a question has been posed to the system, a schema is first selected for the response structure. It will later be used to control the decisions involved in deciding what to say when. On the basis of the given question, which defines the discourse purpose, a small set of schemata is selected as possible structures for the response. This set includes those schemata associated with the given question-type (see Figure 2-10, above). A single schema is selected out of this set on the basis of the information available to answer the question.

In response to requests for definitions and information, the constituency schema is selected when the relevant knowledge pool contains a "rich" description of the questioned object's sub-classes and less information about the object itself. When this is not the case, the identification schema is used for definition questions and the attributive schema is used for information questions.

The test for what kind of information is available is a relatively simple one. If the questioned object occurs at a higher level in the knowledge base hierarchy than a pre-determined level, the constituency schema is used. Note that the higher an entity occurs in the hierarchy, the less descriptive information is available to describe the set of instances it represents, since the larger the class, the fewer features are common across it. Thus, above this level the constituency schema will be used and below it the attributive or identification schema will be used. This process assumes a hierarchically structured knowledge base and could not be done on an unstructured one (see Section 4.2 for a description of the knowledge base used in the TEXT system).
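
The selection procedure just described amounts to a table lookup followed by a single test on the position of the questioned object in the hierarchy. The sketch below illustrates it under stated assumptions and is not TEXT's code: the names `CANDIDATES` and `select_schema`, the question-type labels, and the numeric `depth` and `cutoff_depth` parameters are all hypothetical. Depth is counted from the root (0 = root), so "higher in the hierarchy" means a smaller depth, and an object exactly at the cutoff is treated here as below it.

```python
# Candidate schemata per question-type, following the assignment the text
# describes (default schema first, constituency as the alternative).
CANDIDATES = {
    "definition": ("identification", "constituency"),
    "information": ("attributive", "constituency"),
    "difference": ("compare and contrast",),
}

def select_schema(question_type, depth, cutoff_depth):
    """Pick one schema for the response.

    depth        -- hypothetical distance of the questioned object from the
                    root of the knowledge base hierarchy (0 = root)
    cutoff_depth -- hypothetical pre-determined level; objects above it
                    (depth < cutoff_depth) are described by their sub-classes
    """
    candidates = CANDIDATES[question_type]
    if len(candidates) == 1:
        return candidates[0]            # only one possible structure
    if depth < cutoff_depth:
        return "constituency"           # little is known about the class itself
    return candidates[0]                # identification or attributive

# Illustrative values only: a class near the top of the hierarchy versus a
# class further down.
print(select_schema("definition", depth=1, cutoff_depth=2))
print(select_schema("definition", depth=3, cutoff_depth=2))
print(select_schema("information", depth=3, cutoff_depth=2))
```

The single numeric comparison stands in for the "rich description of sub-classes" test; a more faithful version would inspect the relevant knowledge pool directly.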

(C) and (D) below show two examples of a request for a definition. For the question "What is a guided projectile?" (C) the constituency schema is selected since the guided projectile occurs above the determined level in the hierarchy and thus more information is available about the guided projectile's sub-classes than about the guided projectile itself, while the identification schema is selected for the question "What is an aircraft-carrier?" (D).

(definition GUIDED)

Schema selected: constituency








Message through dictionary.


Entering tactical component

A guided projectile is a projectile that is self-propelled. There are 2 types of guided projectiles in the ONR database: torpedoes and missiles. The missile has a target location in the air or on the earth's surface. The torpedo has an underwater target location. The missile's target location is indicated by the DB attribute DESCRIPTION and the missile's flight capabilities are provided by the DB attribute ALTITUDE. The torpedo's underwater capabilities are provided by the DB attributes under DEPTH (for example, MAXIMUM OPERATING DEPTH). The guided projectile has DB





Schema selected: identification






Message through dictionary. Entering tactical component

An aircraft carrier is a surface ship with a DISPLACEMENT between 78000 and 80800 and a LENGTH between 1039 and 1063. Aircraft carriers have a greater LENGTH than all other ships and a greater DISPLACEMENT than most other ships. Mine warfare ships, for example, have a DISPLACEMENT of 320 and a LENGTH of 144. All aircraft carriers in the ONR database have REMARKS of 0, FUEL TYPE of BNKR, FLAG of BLBL, BEAM of 252, ENDURANCE RANGE of 4000, ECONOMIC SPEED of 12, ENDURANCE SPEED of 30 and PROPULSION of STMTURGRD. A ship is classified as an aircraft carrier if the characters 1 through 2 of its HULL NO are CV.


The compare and contrast schema, as intimated above, is significantly different in format from the other schemata. It dictates a contrastive structure without specifying which predicates are to be used. Use of predicates varies, depending upon what is being talked about. To achieve this variation, while allowing the schema the same guiding role as the other schemata, the compare and contrast schema makes use of one of the three other schemata as part of the response depending on the semantic information available about the two entities. The type of information included in the relevant knowledge pool for this kind of question is dependent on the conceptual similarity 15 of the two entities. In building the relevant knowledge pool, the semantic processor categorizes the entities as very close in concept, very different in concept, or in between these two extremes (see Section 4.3 for a description of how this is done). This classification is available for deciding which schema to use. If the two entities are very close in concept, the attributive schema is used since detailed information about each of the entities is available in the knowledge pool. If the entities are very different in concept, the identification schema is used since the only information available in the knowledge pool is hierarchical classification. For entities in between these two classifications, the constituency schema is used since the class difference in the hierarchy can be discussed as well as some of the entities' attributes.
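The choice of embedded schema described above amounts to a three-way lookup on the similarity classification. The following is a minimal sketch of that lookup only; the names `Similarity` and `schema_for_comparison` are illustrative assumptions and are not taken from the TEXT implementation.

```python
from enum import Enum


class Similarity(Enum):
    """Hypothetical labels for the three conceptual-similarity classes."""
    VERY_CLOSE = "very close"
    IN_BETWEEN = "in between"
    VERY_DIFFERENT = "very different"


def schema_for_comparison(similarity):
    """Return the schema embedded in a compare and contrast response."""
    table = {
        # Detailed information about each entity is in the pool.
        Similarity.VERY_CLOSE: "attributive",
        # Only hierarchical classification is in the pool.
        Similarity.VERY_DIFFERENT: "identification",
        # Class difference plus some attributes can be discussed.
        Similarity.IN_BETWEEN: "constituency",
    }
    return table[similarity]
```

How the semantic processor arrives at the classification itself is a separate question (Section 4.3) and is not modelled here.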

2.6. Filling the schema

Once a schema has been selected, it is filled by matching the predicates it contains against the relevant knowledge pool (Section 4.3). Semantics associated with each predicate define the type of information that it can match in the knowledge pool. These are semantics in the sense that they define what a predicate means in the database system; that is, what it can refer to in the database. The semantics defined for TEXT are particular to database systems and would have to be redefined if the schemata were to be used in another type of system (such as a tutorial system, for example). The semantics are not particular, however, to the domain of the database. When transferring the system from one database to another, the predicate semantics would not have to be altered.

Before describing predicate semantics in more detail, it is important to note the difference between a rhetorical predicate and a proposition. A rhetorical predicate specifies a generic type of speech act (Searle 75). Each predicate is essentially a type of an inform act. Associated with each are arguments which can take any value of a given type. The number and types of arguments associated with a predicate are defined by its semantics. A proposition is an instantiation of a predicate; the predicate arguments have been filled with values from the knowledge base. Furthermore, although predicates, loosely speaking, match propositions in the knowledge base, propositions are not stored as wholes in the knowledge base (see Section 4.2 for a description of the knowledge base representation). Instead, pieces of the knowledge base are selected as values for the predicate arguments to construct a proposition.

15 For another approach to determining similarities, or drawing analogies, see (Winston 79).

The semantics for each predicate indicate the data-types of information in the knowledge base that can satisfy its arguments. A single predicate may have more than one way in which its arguments can be satisfied. The semantics for the attributive predicate, for example, indicate that the following two English sentences both attribute information to the missile:


1. The missile has database attributes TIME TO TARGET & UNITS, LETHAL RADIUS & UNITS, ALTITUDE, SPEED, and PROBABILITY OF KILL. (database attributes)

2. The missile has a target location in the air or on the earth's surface. (distinguishing descriptive attribute)

The constituency predicate, on the other hand, has only one type. It matches the sub-classes of an entity in the generalization hierarchy and would translate to an English sentence like: "There are two types of water-going vehicles in the ONR database: ships and submarines."

The semantics of the predicates are represented as functions. Associated with each predicate is a function that accesses the relevant knowledge pool and retrieves values for the predicate arguments. Each predicate has the effect of providing information about something. The attributive propositions in the two examples given above attribute information to the missile. Likewise, the constituency example above presents the sub-classes of water-going vehicles. This specialized entity is the given argument of the predicate. It is passed as input to the predicate function. The function searches for the remaining predicate arguments that are associated with the given argument in the relevant knowledge pool. The values for the arguments which are passed to the predicate functions are supplied by the previous discourse or, before any discourse has been constructed, by the input question. Where possible, they are supplied by the focus of the discourse. In other cases, the function extracts an instance of the data type it is looking for from the most recent proposition which contains it.

The predicate arguments and their ordering, specified by the predicate semantics, are called the message formalism in the TEXT system. Each predicate has its associated formalism. When a predicate is evaluated, one or more of its arguments are given and the others are filled by values in the database to form a proposition. This proposition is the actual output of the predicate function. A complete specification of the predicates and their formalism, along with examples, is given in Appendix D.

A schema is filled by stepping through it, using the predicate semantics to select propositions that match the predicates. At any choice point in the schema, the focus constraints (described in Chapter Three) are used to decide which proposition should be selected. This is a place where additional information, such as a user model, could be incorporated as an influence on the generated text. For cases where a single predicate has several types and matches more than one proposition in the knowledge base, information about how focus of attention can shift is used to select the most appropriate proposition (see Chapter Three). In places where alternative predicates occur in the schema, all alternatives are matched against the relevant knowledge pool, producing a set of propositions (if more than one predicate succeeds). Again, focus of attention dictates how to select the most appropriate proposition. When an optional predicate occurs in the schema, both the optional predicate and the predicate which would succeed it are matched against the knowledge pool. If the optional predicate has no match, the successor's match is selected. If both predicates match, focus of attention is used to select the most appropriate proposition.

After a proposition has been selected, it is marked in order to prevent repetition in a single answer. Since a proposition may be composed of pieces of information in the knowledge pool, each piece of information is marked by adding the property "used" to it. When selecting propositions, this property is checked to determine whether it has already been said. Since no tracking of discourse is currently done, the "used" property is removed after the generation of each answer.
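The matching and marking steps can be illustrated with a small sketch. Everything below is an assumption made for illustration: the `Fact` and `Proposition` layouts, the kind labels, and the `select` function, which simply takes the first candidate where TEXT would apply the focus constraints. It is not the message formalism of Appendix D.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    """One piece of information in the relevant knowledge pool (hypothetical layout)."""
    entity: str
    kind: str
    value: object
    used: bool = False


@dataclass
class Proposition:
    """An instantiated predicate: the given argument plus values found in the pool."""
    predicate: str
    given: str
    args: tuple
    sources: list = field(default_factory=list)


def match(pool, predicate, kinds, given):
    """Build one proposition per unused fact about `given` whose kind is accepted."""
    return [
        Proposition(predicate, given, (fact.kind, fact.value), [fact])
        for fact in pool
        if fact.entity == given and fact.kind in kinds and not fact.used
    ]


def attributive(pool, given):
    # Two types: database attributes or a distinguishing descriptive attribute.
    return match(pool, "attributive", ("db-attributes", "descriptive"), given)


def constituency(pool, given):
    # One type: the sub-classes of the given entity.
    return match(pool, "constituency", ("sub-classes",), given)


def select(candidates):
    """Stand-in for the focus constraints: take the first candidate and mark it used."""
    chosen = candidates[0]
    for fact in chosen.sources:
        fact.used = True
    return chosen


def reset(pool):
    """Remove the "used" property once the answer has been generated."""
    for fact in pool:
        fact.used = False


def example_pool():
    return [
        Fact("MISSILE", "db-attributes", ("ALTITUDE", "SPEED")),
        Fact("MISSILE", "descriptive", "target location in the air or on the earth's surface"),
        Fact("WATER-VEHICLE", "sub-classes", ("SHIP", "SUBMARINE")),
    ]
```

With this pool, `attributive(pool, "MISSILE")` yields two candidate propositions; after one is selected and marked, only the other remains available until `reset` is called.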

2.7. An example

To see exactly how a schema is filled, consider the process of answering the question "What is a ship?" (in functional notation, "(definition SHIP)"). Two schemata are associated with definitions: constituency and identification. A test on the generalization hierarchy indicates that the ship occurs at a level where a large amount of information is available about the entity itself. The identification schema is therefore selected and the process of schema filling begins. The first predicate in the TEXT identification schema (shown in Figure 2-11) is identification. The relevant knowledge pool constructed for this question is shown in Figure 2-12 (see Section 4.3 for the determination of relevant information). Since this is the first statement of the answer and no preceding discourse exists to provide a context for the predicate to use, the questioned object (all that has been mentioned) is passed as argument to the identification function. In this case, the questioned equals SHIP. The identification predicate is matched against the relevant knowledge pool and the ship's super-ordinate in the hierarchy, plus certain descriptive information as dictated by the semantics of the predicate, are selected. Note that the identification predicate has only one type and therefore, only one proposition matches it:



(identification SHIP WATER-VEHICLE (restrictive TRAVEL-MODE SURFACE) (non-restrictive TRAVEL-MEDIUM WATER)) 16

Identification (class & attribute /






Figure 2-11: The TEXT Identification Schema

16 Here, the arguments of the identification predicate are filled by the entity SHIP (what is being identified), the entity WATER-VEHICLE (its superordinate) and two distinguishing descriptive attributes, one of which distinguishes SHIPS from all other WATER-VEHICLES (labeled "restrictive"), and the other which describes all WATER-VEHICLES (and therefore is labeled "non-restrictive").

The second step in the schema specifies an optional alternative. The alternative includes the descriptive predicates analogy, constituency, and attributive. Each of these predicates is matched against the relevant knowledge pool. Since each of these predicates takes an entity as its given argument, both SHIP and WATER-VEHICLE are passed to the various predicate functions (SHIP and WATER-VEHICLE are the only entities mentioned so far). Since quite a bit of information remains about the SHIP in the relevant knowledge pool, each of these predicates matches and three propositions are produced. Since the only remaining information about the WATER-VEHICLE is its sub-classes, only the constituency predicate matches for the WATER-VEHICLE. The four matched propositions are:


(analogy rels SHIP ON GUIDED GUNS)




) )


(attributive db SHIP (name OFFICIAL_NAME) (topics



DIMENSIONS) (duplicates





Since the alternative is optional, its succeeding step (an alternative between particular-illustration and evidence) is also matched against the relevant knowledge pool. The same entities are passed as given arguments to the predicate functions. Since the second argument required by the particular-illustration predicate does not exist in the discourse so far, there is nothing to illustrate and the particular-illustration predicate fails. The evidence function succeeds for the entity SHIP since there are several database attributes which indicate that it travels on the surface. It does not succeed for the WATER-VEHICLE, however, so this step matches one proposition:


One proposition is then selected from this set of five by applying constraints based on how focus of attention can shift. In this case, the proposition matching the evidence predicate is selected, although the reasoning behind the choice is not discussed here since it depends on the focus constraints (see Section 3.2.9 for the focus algorithm). The answer created so far and the updated relevant knowledge pool (information occurring in the answer is marked as used) are shown in Figures 2-13 and 2-14. This process is then repeated for the next step in the schema to complete the answer, but is not shown here. It should be noted that the identification schema encodes more alternatives than the other schemata and is therefore less efficient in deciding what to say next. Less restrictive schemata necessarily entail more than others as more processing must be done to explore the additional choices.



































Figure 2-12: Sample Relevant Knowledge Pool

Figure 2-13: Updated Relevant Knowledge Pool






The ship is a water-going vehicle that travels on the surface. Its surface going capabilities are provided by the DB attributes DRAFT and DISPLACEMENT.

Figure 2-14: Selected Information

2.8. Future work

There are several directions of research which this work suggests. Currently, the model of generation allows for several influences on the structure of generated text, including rhetorical strategies (encoded as schemata), potentially relevant information, discourse goal, and focus of attention (Chapter Three). One influence not taken into account is knowledge about the user's beliefs and goals. This information could be taken into account both in the initial selection of a schema and in deciding between alternatives within a schema. Which of several alternatives at a schema choice point should be taken is currently determined by the focus constraints alone, which select the proposition that ties in most closely with the previous discourse. When filling the schema, conditions on focus are tested to determine which alternative to take. These conditions are essentially preconditions to taking a particular action (predicate) in the text plan (schema). They provide the hooks for using tests based on the system's beliefs about the user to select a predicate most satisfactory for the user, given what he already knows.

The development of full recursion is related to the incorporation of a user model. The use of full recursion would mean the system would have the ability to provide different responses to the same question which vary by level of detail. In order to develop such a capability, it will be necessary to investigate the influences on level of detail, some of which will be part of a user model, and incorporate these as conditions on recursion. This is a direction that I have already begun to pursue.

Another direction, hinted at earlier, is the use of schemata to aid in the interpretation of natural language. Schemata could be used to determine the discourse goals of the user when interacting with the system. If a schema matches a sequence of input questions from the user, then the goal he is pursuing over that sequence of dialogue will be a goal associated with the matched schema (e.g., definition if the identification schema matches). This information will be useful both in responding to individual questions and anticipating follow-up questions.

2.9. Conclusions

Schemata have been used in the TEXT system to model common patterns of text structure. The schemata embody a number of alternatives, thereby allowing for a good deal of structural variety in text. Moreover, it was shown that schemata are not grammars of text; many experienced and talented writers purposely break norms in order to achieve a striking literary effect. Rather, the schemata describe common patterns of description that are used to effectively achieve the discourse purposes considered. They capture, in a computationally useful formalism, techniques that are frequently used. Although not a complete grammar of text in general, they serve as a grammar for TEXT (i.e., they delineate the extent of text structures it can generate). Schemata are used in the TEXT system to guide the generation process. They initiate the process of what to say next; their decisions are monitored by the focusing mechanism which selects between alternatives. Since the schemata were shown to be recursive, describing text structure at many levels, they are much like a hierarchical plan for text. The schemata developed for the TEXT system encode structures suitable for description of static information. Other text types will require different kinds of schemata and probably different kinds of predicates as well. Descriptions of processes involving cause and effect, reasoning involved in explanation, and narrative are all examples of different text types that will require additional examination of text to determine commonly used means of description and explanation, but were not relevant to the application.


Focusing in discourse

Focusing is a prevalent phenomenon in all types of naturally occurring discourse. Everyone, consciously or unconsciously, centers their attention on various concepts or objects throughout the process of reading, writing, speaking, or listening. In all these modalities, the focusing phenomena occur at many levels of discourse. For example, we expect a book to concern itself with a single theme or subject; chapters are given headings, indicating that the material included within is related to the given heading; paragraphs are organized around topics; and sentences are related in some way to preceding and succeeding sentences. In conversation, comments such as "Stick to the subject", "Let's change the subject", or "Going back to what you were saying before ..." all indicate that people are aware that the conversation centers on specific ideas and that there are conventions for changing the focus of attention.

The use of focusing makes for ease of processing on the part of participants in a conversation. When interpreting utterances, knowledge that the discourse is about a particular topic eliminates certain possible interpretations from consideration. Grosz (77) discusses this in light of the interpretation of definite referring expressions. She notes that although a word may have multiple meanings, its use in an appropriate context will rarely bring to mind any meaning but the relevant one. Focusing also facilitates the interpretation of anaphoric, and in particular, pronominal, references (see Sidner 79). When the coherence provided by focusing is missing from discourse, readers and hearers may have difficulty in determining what a pronoun refers to.

When speaking or writing, the process of focusing constrains the set of possibilities for what to say next. Having decided that he wants to talk about the weather, for example, a speaker need not consider what he could say about yesterday's movie. When a speaker or writer has not decided ahead of time on the specific themes he wants to convey, he will experience difficulty in proceeding. Incoherent text or conversation is often the result of such a situation.

Focusing also influences how something is said. Changing what is focused may involve marking the move for the hearer by using a different syntactic form. Continuing discussion of the same topic may require pronominalization 17. The use of marked syntactic structures can highlight new information about a previously mentioned item. This use of focusing is what makes a sequence of sentences a whole. The fact that a sequence of sentences is about something makes that sequence connected, coherent, and in some sense, a unit. Intuitively then, a text is a connected, coherent sequence of sentences. In order to generate texts, some account of the use of focusing must be given.

3.1. Computational theories and uses of focusing

Focusing has been used effectively as a computational tool in the interpretation of discourse by several researchers in artificial intelligence. Although theories about the process of focusing were developed specifically for use in the interpretation of discourse, some of the ideas developed are applicable to the generation of natural language. Some background on previous work in this area is presented before discussing the use of focusing in this research.

3.1.1. Global focus

Grosz (77) identified the role of focusing in the interpretation of referring expressions in dialogue. In particular, she was concerned with the distinction between two types of focus: global and immediate. Immediate focus refers to how a speaker's center of attention shifts or remains constant over two consecutive sentences. Both the ordering of sentence constituents and the interpretation of sentence fragments are affected by the immediate focus. Global focus, on the other hand, describes the effect of a speaker's center of attention throughout a set of discourse utterances on succeeding utterances. A speaker's global focus encompasses a more general set of objects than his immediate focus.

In her work, Grosz concentrated on defining the representation and use of global focus. She did not address the problem of defining and using immediate focus. Grosz represented global focus by partitioning a subset of the entire knowledge base containing focused items from the remaining knowledge base. Determining what is focused on throughout discourse was part of the theory she developed. She distinguished between items that were explicitly focused on, as a result of having been mentioned, and those that were implicitly in focus by virtue of their association with mentioned items. Knowing which items are focused makes further interpretation of discourse easier.

17 The use of a pronoun to refer to a person, place, or thing (e.g., "he" or

Considering only a subset of the knowledge base at a given time limits the search for referents of definite noun phrases occurring in the discourse and makes it more likely that the correct referent will be found. Grosz's representation of focus is, in fact, slightly more complicated than this.

In the implementation of a focusing mechanism, Grosz termed the subset of the knowledge base that contains items in focus a focus space. A focus space is "open" (i.e., its contents are currently in focus) if items within it have been recently mentioned. By not bringing items into the focus space until mentioned, the efficiency of the search for referents is increased. Items are "implicitly" in focus if they are related to items in an open focus space, but have not yet been mentioned. Mention of one of these items opens the implicit focus space. The old open focus space remains open but is stacked. An open focus space is closed only when conversation returns to a stacked open focus space. In this case, conversation returns to an earlier topic thereby closing recent discussion. The highly structured task domain in which this work was done was used to guide changes in focus.
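The bookkeeping for open, stacked, and closed focus spaces can be sketched as follows. This is only an illustration of the description above, under the assumption that a focus space can be identified by a name; the class `FocusSpaces` and its method names are hypothetical and do not come from Grosz's implementation.

```python
class FocusSpaces:
    """Tracks which focus spaces are open (and stacked) and which are closed."""

    def __init__(self, first):
        # The last element is the active open space; earlier ones are open but stacked.
        self.open_stack = [first]
        self.closed = []

    @property
    def active(self):
        return self.open_stack[-1]

    def open_implicit(self, space):
        """An implicitly focused item was mentioned: open its space.

        The previously active space stays open but is now stacked.
        """
        self.open_stack.append(space)

    def return_to(self, space):
        """Conversation returns to a stacked open space.

        Every space opened after it is closed.
        """
        if space not in self.open_stack:
            raise ValueError(f"{space!r} is not an open focus space")
        while self.open_stack[-1] != space:
            self.closed.append(self.open_stack.pop())
```

A space is never closed directly in this sketch; it is closed only as a side effect of returning to an earlier, stacked space, which mirrors the condition stated in the text.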

3.1.2. Immediate focus

Sidner (79) extended Grosz's work with an extensive analysis of immediate focus. She used focus for the disambiguation of definite anaphora and thus for aiding in the interpretation of discourse. She was able to explain types of anaphora which Grosz did not consider, particularly the use of pronouns. A major result of her work was the specification of detailed algorithms for maintaining and shifting immediate focus.

Tracking immediate focus involves maintaining three pieces of information: the immediate focus of a sentence (represented by the current focus), the elements of a sentence which are potential candidates for a change in focus (represented by a potential focus list), and past immediate foci (represented by a focus stack 18). Current focus indicates the constituent of a sentence being focused on. The potential focus list records constituents within the sentence that are candidates for a shift in focus. The potential focus list is partially ordered. The focus stack is updated every time a change in focus occurs. When conversation shifts to a member of the previous potential focus list, the old focus is pushed on the stack and the current focus becomes the new focus. When conversation returns to an item previously discussed, the stack is popped to yield that item.

18 By focus stack I mean specifically a data structure classically known as a stack. A stack is a list to which one can only add and delete items from one end of the list, called the top of the stack. Stacks are also known as last-in first-out lists. The stack operations that are referred to here are: pop: remove and return the top element off the stack. If a specific item on the stack is popped, then all elements above that item are deleted from the stack and the item is deleted and returned. push: add a new element to the top of the stack. The verb stack is also used to mean push.

Because the concept of focusing is meaningful only in the context of at least two sentences, Sidner's algorithms specify rules for maintaining and shifting focus from one sentence to the next. Briefly, she claims that the speaker has four options:

1. continue talking about the same thing (current focus remains the same)

2. switch to talk about an item just introduced (current focus becomes a member of the previous potential focus list)

3. return to a topic of previous discussion (current focus becomes the popped member of the focus stack)

4. talk about an item implicitly related to the current focus (general world knowledge is needed to determine that such a switch has been made)

These rules are only part of an algorithm which is used to determine the referent of an anaphoric expression in the incoming sentence. Tracking the focus of the current sentence is part of the process of determining the referent of an anaphoric expression.
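The three records and the four options can be made concrete with a short sketch. It is a simplification written for illustration, not Sidner's algorithm: foci are plain strings, the potential focus list is an ordinary list, and the fourth option is reported only as a fallback label because recognizing it requires general world knowledge. The names `FocusStack` and `shift_focus` are assumptions.

```python
class FocusStack:
    """Last-in first-out record of past immediate foci."""

    def __init__(self):
        self._items = []  # the top of the stack is the end of the list

    def push(self, item):
        self._items.append(item)

    def pop(self, item=None):
        """Pop the top element, or pop a specific item.

        When a specific item is popped, every element above it is deleted
        and the item itself is removed and returned.
        """
        if item is None:
            return self._items.pop()
        # Position of the occurrence of `item` nearest the top.
        index = len(self._items) - 1 - self._items[::-1].index(item)
        del self._items[index + 1:]
        return self._items.pop()

    def __contains__(self, item):
        return item in self._items


def shift_focus(current, potential, stack, new_focus):
    """Classify the move to `new_focus`, update the stack, and return a label."""
    if new_focus == current:
        return "same"                      # option 1: focus is unchanged
    if new_focus in potential:
        stack.push(current)                # option 2: old focus is stacked
        return "potential-focus-list"
    if new_focus in stack:
        stack.pop(new_focus)               # option 3: return to an earlier topic
        return "focus-stack"
    return "implicit-or-unrelated"         # option 4 needs world knowledge
```

For example, moving from "John" to "new graduate students" (a potential focus) stacks "John"; a later return to "John" pops it together with anything stacked above it.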

3.2. Focusing and generation

In previous research in computational linguistics, the use of focusing has been considered as a factor in the comprehension of discourse and in particular, definite anaphora in discourse. In this research, I show how it can be used as a tool for the generation of discourse. The use of a focusing mechanism provides constraints on the possibilities for what can be said. Global focus constrains the entire knowledge base, producing a subset containing items which can be talked about. Immediate focus further constrains the subset since after any given utterance a smaller set of choices will be possible. Furthermore, the use of focusing provides a computationally tractable method of producing coherent and cohesive discourse. Use of a focusing mechanism ensures discourse connectivity by ensuring that each proposition 19 of the discourse is related through its focused argument to the previous discourse.

The following sections describe how to make use of the focusing mechanisms and guidelines developed by both Grosz and Sidner in the generation process. Several problems arise in adapting this work to generation. Since it considers interpretation, there is no need to discriminate between members of the set of legal foci; when more than one possibility for global or immediate focus exists after a given sentence, the next incoming sentence determines which of the choices is taken. The kinds of choices that must be made in generation, as well as the extensions which must be included in the focusing mechanism to accommodate these decisions, are described in the following sections.

19 A proposition corresponds to a single sentence in the generated text.

3.2.1. Global focus and generation

In the TEXT system, a relevant knowledge pool which contains information determined by the system to be relevant to the input question is constructed for each answer. It is equivalent to Grosz's concept of an open focus space. The relevant knowledge pool contains those items which are in focus over the course of an answer. It contains all that can be talked about further. Since it is a subset of the entire knowledge base, it contains a limited amount of information.

I claim that, in generating discourse, one way that global focus may shift is when a recursive push on a schema is taken (see Section 2.2.1). That is, when it is necessary to provide a more detailed description of a particular concept (in identifying it, attributing information to it, providing an analogy about it, etc.), it is necessary to describe information related to the concept in question and therefore implicitly in focus. Such information is not part of the open focus space since it has not been mentioned previously, but it is related to the information explicitly in focus. When the push is taken, the focus shifts to this information and it remains in focus for the duration of the new schema. Thus, a new open focus space has been created. Note that the old focus space remains open; due to the nature of recursive pushes on schemata, the text will continue where it left off when the task of providing more detailed information (the push) is completed. When the pop from the sub-schema occurs, the new open focus space is closed and the old open focus space again becomes the active one.

As an example, consider the problem (described in Section 2.2.1) of defining a Hobie Cat to someone who knows nothing about sailing. The text for identifying the Hobie Cat is reproduced below. Remember that in identifying a Hobie Cat as a type of catamaran, it was also necessary to define a catamaran for the listener. In this case, the features of a catamaran are implicitly in focus when talking about the Hobie Cat. When the push is made to identify the catamaran, the features of the catamaran are focused. They remain in focus throughout the definition of the catamaran (the Hobie Cat is not discussed now) and when the discussion is finished, mention of the Hobie Cat brings the old open space back into focus and closes the space containing catamaran features (see Figure 3-1).

1. A Hobie Cat is a brand of catamaran,

2. which is a kind of sailboat.

3. Catamarans have sails and a mast like other sailboats,

4. but they have two hulls instead of one.

5. That's a catamaran there.

6. Hobie Cats have a canvas cockpit connecting the two pontoons and one or two sails.

7. The 16 ft. Hobie Cat has a main and a jib and the 14 ft. Hobie Cat has only a main.

It should be noted that this feature of shifting global focus is not currently implemented in the TEXT system, although the design has been worked out. Its implementation is dependent upon the implementation of full recursion (see Chapter 2, Section 2.2.1) which requires the development of a user-model, a major research effort.

For sentence 1: Space 1-1 open; Space 2-1, 1-2, 1-3 implicit
For sentence 2: Space 1-1 open but stacked, Space 2-1 open; Space 2-2, 2-3 implicit
For sentence 3: Space 1-1 open but stacked; Space 2-1, 2-2 open; Space 2-3 implicit
For sentence 4: Space 1-1 open but stacked; Space 2-1, 2-2, 2-3 open
For sentence 5: same
For sentence 6: Space 1-1, 1-2 open and active; Space 2 closed
For sentence 7: Space 1-1, 1-2, 1-3 open; Space 2 closed

Figure 3-1: Global Focus Shifts

3.2.2. Immediate focus and generation

The previous sections have shown that the speaker is limited in many ways to what he will say at any given point. He is limited by the goal he is trying to achieve in his current speech act, which in the TEXT system is to answer the user's current question. To achieve that goal, he has limited his scope of attention to a set of objects relevant to this goal, as represented by global focus or the relevant knowledge pool. The speaker is also limited by his higher-level plan of how to achieve the goal (the schema). Within these constraints, however, a speaker may still run into the problem of deciding what to say next.

In the TEXT system an immediate focusing mechanism is used to select between these remaining options. It constrains the process of filling the selected schema with propositions from the relevant knowledge pool. Remember that the schemata describe normal patterns of discourse structure and encode a number of alternatives. Hence, they only partially constrain the choice of what to say. During the process of schema filling, more than one proposition may match the next predicate in the schema. This can occur either because 1) alternative predicates appear in the schema and propositions in the relevant knowledge pool match more than one alternative, or 2) more than one proposition matches a single predicate. The decision as to which proposition is most appropriate is made by the focusing mechanism. It eliminates any propositions whose current focus does not meet the legal restrictions specified by Sidner. That is, the focus of the next proposition must be the same as the current focus of the last proposition, a member of the potential focus list of the last proposition, or a member of the focus stack.

The representation of immediate focus and guidelines for shifting and maintaining focus used in the TEXT system follow Sidner. As each proposition is added to an answer, 20 its focus (termed current focus) and its potential focus list (a partially ordered list of items within the proposition that are potential candidates for a shift in focus) are recorded. Thus, focus is maintained and may shift or remain constant across each proposition. A focus stack is maintained throughout the course of an answer and it is updated every time the current focus changes. When the current focus shifts to a member of the potential focus list, the old current focus is stacked. When the current focus shifts to a member of the focus stack (conversation returns to a topic of previous discussion) the focus stack is popped to return to a previous focus.

Although this information suffices for interpretation, additional mechanisms are needed for generation which can decide among focus alternatives. In the interpretation of discourse, this is not necessary because the choice is dictated by the incoming sentence. For generation, however, the speaker may have to decide between any of the valid foci at any given point. Figure 3-2 shows the choices that a speaker may have to make. The following sections describe how the TEXT system selects between these alternatives.
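The elimination step, which discards propositions whose focus is not a legal continuation, can be sketched as a simple filter. The representation is an assumption made for illustration: each candidate proposition is a dictionary with a "focus" entry, and the function name `legal_candidates` is hypothetical. Choosing among the survivors is the separate problem taken up in the sections that follow.

```python
def legal_candidates(candidates, current_focus, potential_focus_list, focus_stack):
    """Keep only the propositions whose focus is a legal next focus.

    A focus is legal if it equals the current focus of the last proposition,
    is a member of that proposition's potential focus list, or is a member
    of the focus stack.
    """
    legal = {current_focus, *potential_focus_list, *focus_stack}
    return [p for p in candidates if p["focus"] in legal]
```

Given candidates focused on SHIP, WATER-VEHICLE, and some unrelated entity, with SHIP as current focus and WATER-VEHICLE on the potential focus list, only the first two survive the filter.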

* Each proposition will translate to a sentence.

Choice := CF (new sentence) = CF (last sentence) vs. CF (new sentence) ∈ PFL (last sentence)

Choice := CF (new sentence) = CF (last sentence) vs. CF (new sentence) ∈ focus-stack

Figure 3-2: Choices between valid foci

3.2.3. Current focus versus potential focus list

The choice between current focus and items on the potential focus list corresponds to choosing between continuing to talk about the same thing or starting to talk about something introduced in the last sentence. As an example, consider the following situation. Suppose I want to tell you that John is a new graduate student. Suppose I also want to tell you that new graduate students typically have a rough first semester and I want to tell you a lot of other things about John: what courses he's taking, what he's interested in, where he lives. If I decide to tell you all the other things about John first, when I finally get around to telling you about the first semester of new graduate students, I will somehow have to re-introduce it into the conversation, either by reminding you that John is a new graduate student, by relating it to rough times, etc. If, on the other hand, I first told you that new graduate students typically have a rough first semester, I would have no trouble in continuing to tell you the other facts about John. In fact, in continuing talking about John, I will be returning to a topic of previous discussion, a legal focus move. In other words, the current focus of the next sentence will be a member of the focus stack. Note that discussing new graduate students after an ensuing conversation about John is not a legal focus move, since "new graduate students" never became the focus of conversation, but was only a potential focus list member.

Thus, for reasons of efficiency, when one has the choice of remaining on the same topic or switching to one just introduced, I claim the preference is to switch. If the speaker has something to say about an item just introduced and he does not present it next, he will have to go to the trouble of reintroducing the topic at a later point. In summary,


In this example, I am ignoring the effect of discourse structure and planning on the choice of what to say next for purposes of illustration only. Normally, the schema or discourse plan would also constrain what a person could say next.

Choice := CF (next sentence) = CF (last sentence) vs. CF (next sentence) ∈ PFL (last sentence)

Preference := CF (next sentence) ∈ PFL (last sentence)

Reason := if preference is not taken, speaker will have to re-introduce PFL-member (last sentence) at a later point.

If this rule is followed, it will have the effect of producing "topic clusters", each spinning off of and clustered around an item just introduced, rather than producing an extended discourse about a single topic. This causes the formation of sub-topics which results in a more interesting text than if a single topic were consistently maintained over a sequence of sentences. In the imagined conversation about John, one instantiation would produce the discourse shown below in Figure 3-3. Two topic clusters are John as a new graduate student and the courses John is taking.

John is a new graduate student.

New graduate students typically have a rough first semester.

John is taking four courses: intro to programming, graphics, analysis of algorithms, and hardware.

Graphics is the most interesting one.

John lives at Graduate Towers.

Figure 3-3: Topic Clusters

Several consecutive moves to potential focus list members are not a problem. In fact, they occur frequently in written text. In the following example, taken from "Pseudo-silk" in Future Facts (Rosen 76), focus shifts in every case but one.

1. Finally in November 1973, two Japanese scientists, Seigo Oya and Juzo Takahashi, announced that they had synthesized a fiber which "very much resembles silk."

2. The base for their pseudo-silk is glutamic acid,

focus = their pseudo-silk (a fiber which "very much resembles silk")

3. one of the 20 amino acids that make up all proteins

focus = one (glutamic acid)

4. and a chemical long used in the production of monosodium glutamate,

focus = a chemical (glutamic acid)

5. the controversial seasoning found in many meals,

focus = the controversial seasoning (monosodium glutamate)

6. ranging from baby food to egg rolls,

focus = <gap> (meals)

If this rule were applied indefinitely, however, it would result in a never-ending side-tracking onto different topics of conversation. The discourse would be disconcerting and perhaps incoherent. However, TEXT is operating under an assumption that information is being presented in order to achieve a particular goal (i.e., answer a question). Only a limited amount of information is within the speaker's scope of attention because of its relevance to that goal. Hence only a limited amount of sidetracking can occur. The rule is viable because global focus constrains consecutive shifts, illustrating the necessity for the interaction between global and immediate focus.

3.2.4. Current focus versus focus stack

The choice between current focus and returning to an item on the focus stack corresponds to the choice between continuing talking about the same thing or returning to a topic* of previous discussion.

* Topic is used here loosely to refer to the subject or theme. It does not refer to the linguistic notion of topic.

Consider an extension of the discourse about John. Suppose I have already told you that John is a new graduate student and that new graduate students have a rough first semester (the first two sentences of Figure 3-3). Suppose that in addition to telling you the other facts about John (about his courses and where he lives), I also want to tell you that new graduate students are required to maintain a B or above average or they will not be allowed to continue their studies (a fact not mentioned in the last discourse). I have the choice of telling you this immediately after sentence 2 of Figure 3-3 (in which case CF (new sentence) = CF (last sentence) = new graduate students) or of telling you the other facts about John first (in which case CF (new sentence) = focus-stack member = John).

If I should decide to tell you the other facts about John first, I would not run into the same problem of re-introducing a topic since "new graduate students" had been focused on. There is, however, something odd about this. Here, the issue of global focus is more important than that of local focus. Having switched the local focus to "new graduate students", I opened a new focus space for discussion. If I switch back to John, I close that space (see (Grosz 77) for a detailed discussion of opening and closing focus spaces), thereby implying that I have finished that topic of conversation. I am implying, therefore, that I have nothing more to say about the topic, when in fact I do.

The preference I claim in this case is to continue talking about the same thing rather than returning to a topic of previous discussion. Having introduced a topic, which may entail the introduction of other topics, one should say all that needs to be said on that topic before returning to an earlier one.

Choice := CF (new sentence) = CF (last sentence) vs. CF (new sentence) ∈ focus-stack

Preference := CF (new sentence) = CF (last sentence)

Reason := to avoid false implication of a finished topic

These two guidelines for changing and maintaining focus during the process of generating language provide an ordering on the three basic legal focus moves that Sidner specifies:

1. change focus to member of previous potential focus list if possible: CF (new sentence) ∈ PFL (last sentence)

2. maintain focus if possible: CF (new sentence) = CF (last sentence)

3. return to topic of previous discussion: CF (new sentence) ∈ focus-stack
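The ordering of the three moves can be sketched as a ranking function. This is a minimal illustration under stated assumptions, not the system's code: `move_rank` and `preferred_focus` are hypothetical names, and foci are represented as plain strings.

```python
def move_rank(focus, cf, pfl, stack):
    """Return 1, 2, or 3 for the three legal moves in preference order, else None."""
    if focus in pfl:
        return 1  # shift to an item just introduced
    if focus == cf:
        return 2  # maintain the current focus
    if focus in stack:
        return 3  # return to a topic of previous discussion
    return None   # not a legal focus move


def preferred_focus(foci, cf, pfl, stack):
    """Pick the candidate focus with the best (lowest) rank, or None if none is legal."""
    ranked = [(move_rank(f, cf, pfl, stack), f) for f in foci]
    legal = [(rank, f) for rank, f in ranked if rank is not None]
    if not legal:
        return None
    return min(legal, key=lambda pair: pair[0])[1]


# After "John is a new graduate student": switching is preferred to staying.
first = preferred_focus(["John", "new graduate students"],
                        cf="John", pfl=["new graduate students"], stack=[])

# After "New graduate students ... rough first semester": staying on the
# new topic is preferred to returning to John, who is now on the stack.
second = preferred_focus(["John", "new graduate students"],
                         cf="new graduate students",
                         pfl=["rough first semester"], stack=["John"])
```

Both calls select "new graduate students", for different reasons: the first by guideline 1 (shift to a potential focus list member), the second by guideline 2 (maintain focus rather than pop the stack).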

I have not investigated the problem of incorporating focus moves to items implicitly related to current foci, potential focus list members, or previous foci into this scheme. This remains a topic for future research.

3.2.5. Other choices

Even the addition of constraints induced by immediate focusing, however, is not sufficient for ensuring a coherent discourse. Although a speaker may decide to focus on a specific entity, he may want to convey information about several properties of the entity. The guidelines developed so far prescribe no set of actions for this situation. Rather than arbitrarily listing properties of the entity in any order, I claim that a speaker will group together in his discussion properties that are in some way related to each other. Thus, strands of semantic connectivity will occur at more than one level of discourse.

An example of this phenomenon is given in discourses (1) and (2) below. In both, the discourse is focusing on a single entity (the balloon), but in (1), properties that must be talked about are presented randomly. In (2), a related set of properties (color) is discussed before the next set (size). As a result, (2) is more connected than (1).

1. The balloon was red and white striped. Because this balloon was designed to carry men, it had to be large. It had a silver circle at the top to reflect heat. In fact, it was larger than any balloon John had ever seen.

2. The balloon was red and white striped. It had a silver circle at the top to reflect heat. Because this balloon was designed to carry men, it had to be large. In fact, it was larger than any balloon John had ever seen.

This type of phenomenon is very common in literary texts. Consider the following example, taken from the introduction to Working (Terkel 72).* Except for the last sentence, where the current focus changes to "daily humiliations", the focus remains unchanged throughout the paragraph (current focus = this book). An undercurrent of related themes occurs from one sentence to the next. Violence, physical violence, spiritual violence, and examples of violence are all related properties of the book that are described.

This book, being about work, is, by its very nature, about violence - to the spirit as well as to the body. It is about ulcers as well as accidents, about shouting matches as well as fist fights, about nervous breakdowns as well as kicking the dog around. It is, above all (or beneath all), about daily humiliations. To survive the day is triumph enough for the walking wounded among the great many of us.

(p. xiii)

* Other literary techniques, which I am ignoring here, such as syntactic parallelism, also serve to make the text a cohesive unit. It is difficult to single out any one of these devices as the main mechanism for achieving cohesiveness.

This phenomenon manifests itself as links between the potential focus lists of consecutive propositions in discourse. Consider the focus records for the first two sentences of the Working paragraph:




1. CF = this book

PFL = violence
{spirit; body}
being about work
is about

2. CF = this book

PFL = {ulcers; accidents, shouting matches; fist fights, nervous breakdowns; kicking the dog around}
is about

In this example the first item of the potential focus list (PFL) of sentence (2) (a list) provides instances of violence and is thus related to the first item in sentence (1)'s PFL. The elements of the list also exemplify the spiritual/physical dichotomy of violence and are thus related to the second item in sentence (1)'s PFL. Note furthermore that the PFL links are implicit links; ulcers and accidents are sub-types of violence, spiritual and physical. Although the current focus of a sentence is often a definite reference (pronominal or otherwise) to a previously mentioned item, definite reference across potential focus lists rarely occurs. These potential focus links result in a layering of foci that corresponds to the speaker's global focus. More than one thing is focused on at a time (global focus) and one of them is distinguished as immediate focus.

In the generation process, this phenomenon is accounted for by further constraining the choice of what to talk about next to the proposition with the greatest number of links to the previous potential focus list. This constraint ensures that the text will maintain the global focus of the speaker when possible. If application of the guidelines discussed above does not constrain the possibilities to a single proposition, links between potential focus lists are examined to select the single proposition with the most links to the previous discourse (i.e., that proposition containing the greatest number of links to elements already mentioned). When this constraint is included, the ordering of focus maintaining and shifting rules becomes:

1. shift focus to member of previous PFL: CF (new sentence) ∈ PFL (last sentence)

2. maintain focus: CF (new sentence) = CF (last sentence)

3. return to topic of previous discussion: CF (new sentence) ∈ focus-stack

4. select proposition with greatest number of implicit links to previous potential focus list: PFL (new sentence) related-to PFL (last sentence)
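The fourth rule can be sketched as a simple link count. The source does not say how "related-to" is computed, so the sketch below assumes a hand-built table (`RELATED`, hypothetical) mapping an item to the items it is implicitly related to; PFL entries are flattened to plain strings, and the two candidate propositions are invented.

```python
# Hypothetical relatedness table: item -> items it is implicitly linked to.
RELATED = {
    "ulcers": {"violence", "spirit"},
    "accidents": {"violence", "body"},
}


def count_links(new_pfl, last_pfl, related):
    """Count new-PFL items that are identical or related to a last-PFL item."""
    last = set(last_pfl)
    return sum(
        1 for item in new_pfl
        if item in last or related.get(item, set()) & last
    )


def select_by_links(candidates, last_pfl, related):
    """Pick the candidate proposition whose PFL has the most links."""
    return max(candidates,
               key=lambda c: count_links(c["pfl"], last_pfl, related))


last_pfl = ["violence", "spirit", "body"]
candidates = [
    {"name": "about-salary", "pfl": ["salary", "hours"]},
    {"name": "about-ulcers", "pfl": ["ulcers", "accidents"]},
]
best = select_by_links(candidates, last_pfl, RELATED)
```

Under these assumptions `best` is the "about-ulcers" candidate, with two links against none. When several candidates tie, `max` returns the first one encountered; the source does not specify a tie-breaking policy.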

3.2.6. A focus algorithm for generation

Before describing exactly how these guidelines for maintaining and shifting focus are incorporated into an algorithm that can determine what to say next, the assignment of focus and potential focus list to a proposition must be discussed. In TEXT, the assignment of focus involves a process of give and take between what the focus could be and what the guidelines about focus maintenance claim as preference. Initially, a default focus is assigned to a proposition. This focus can be overridden, however, if another item within the proposition allows for the application of one of the higher rules on the ordered list of guidelines.

3.2.7. Selecting a default focus

Selecting a default focus is a simple look-up procedure. A single argument of each predicate is singled out as the one most likely to be focused on. This information is stored in a table and the entry for the given predicate accessed when needed. Use of a default focus implies that a predicating act has a marked and unmarked syntax associated with it, the unmarked dictated by the default focus. This does not seem unlikely. Consider the attributive predicate, which is exemplified in sentences (1) and (2) below. In its usual use, it attributes features to an entity or event. The unmarked use assumes an entity has been focused on: the entity is being talked about and some of its features are being described (see Sentence (1) below). The opposite case, of associating talked-about features with a different entity, is less usual (see Sentence (2) below).

1. The chimpanzee has fine control over finger use.

2. Fine control over finger use is also common to the chimpanzee.

Use of a default focus also assumes that a particular class of words will be used to verbalize the predicate: those that allow the default focus to appear in surface subject position. For the attributive predicate, the default focus would allow verbs such as "have" and "possess" as a translation, both of which are possible in Sentence 1, but not "is common to" or "belongs", which are possible in Sentence 2. Use of a default focus here is different from Sidner's default focus. She associates the default focus with the theme of the verb rather than surface subject, and she uses default focus to predict what the focus of the next sentence might be, while I use it to establish the focus of the current sentence.
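The look-up procedure can be pictured as a table keyed by predicate. The sketch below is illustrative only: the table contents, the role names ("entity", "attribute"), and the dictionary layout of a proposition are assumptions, not the representation used in TEXT.

```python
# Hypothetical table: predicate -> name of the argument taken as default focus.
DEFAULT_FOCUS_ARG = {
    "attributive": "entity",
}


def default_focus(proposition):
    """Look up the default focus of a proposition from its predicate."""
    role = DEFAULT_FOCUS_ARG[proposition["predicate"]]
    return proposition["args"][role]


prop = {
    "predicate": "attributive",
    "args": {
        "entity": "the chimpanzee",
        "attribute": "fine control over finger use",
    },
}
focus = default_focus(prop)
```

With this table, `focus` is "the chimpanzee", matching the unmarked reading of Sentence 1, where the entity appears in surface subject position.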

3.2.8. Overriding the default focus

The default focus of a proposition is overridden if taking a different predicate argument as focus will allow the application of a more preferable guideline for focus movement. For example, if the default focus of a proposition is the same as the current focus of the last proposition, guideline (2) would apply (CF (new sentence) = CF (last sentence)). If, however, another predicate argument of the proposition is a member of the previous proposition's potential focus list, that argument is selected as the proposition's focus, since it allows for the application of guideline (1) (CF (new sentence) ∈ PFL (last sentence)). The assignment of focus to propositions is made when a proposition is selected which ties in most closely with the preceding discourse according to the focus constraint guidelines.

Although the default focus can be overridden, it is useful since it provides some indication of the usual way of presenting information. It is needed to determine which argument to focus on in discourse initial propositions where no information exists to override it. In addition, since it indicates the most likely case, it can result in savings in processing time within each guideline application. If, for example, one proposition of a set of possible next propositions has an argument that is a member of the previous potential focus list, it will be selected by application of guideline (1). If that argument is the proposition's default focus, the proposition will be selected after the default focus of each proposition is checked for membership in the previous potential focus list, requiring one test per proposition. If default focus is not represented, the proposition will only be selected after each argument of each proposition is checked for membership in the previous potential focus list, requiring many tests per proposition.

Moreover, if following a single guideline of the focus constraints would allow more than one proposition argument to qualify for focus, the default focus indicates which of these to use. If, for instance, a proposition has several arguments which occur in the previous potential focus list, the default focus is selected as the argument to be focused on.
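The interplay between default focus and the guidelines can be sketched for a single proposition as follows. This is a minimal illustration, assuming arguments are plain strings; `assign_focus` is a hypothetical name. Within each guideline the default focus is tried before the other arguments, so it wins whenever several arguments qualify.

```python
def assign_focus(arguments, default, cf, pfl, stack):
    """Return (guideline, focus); guideline is None if no guideline applies."""
    others = [a for a in arguments if a != default]
    guidelines = [
        (1, lambda a: a in pfl),    # shift to a potential focus list member
        (2, lambda a: a == cf),     # maintain the current focus
        (3, lambda a: a in stack),  # return to a stacked focus
    ]
    for number, applies in guidelines:
        # Test the default focus first, then the remaining arguments.
        for arg in [default] + others:
            if applies(arg):
                return number, arg
    return None, default  # fall back on the default focus


args = ["chimpanzee", "finger control"]

# Another argument is on the PFL: the default focus is overridden.
overridden = assign_focus(args, "chimpanzee", cf="chimpanzee",
                          pfl=["finger control"], stack=[])

# Both arguments are on the PFL: the default focus is preferred.
kept = assign_focus(args, "chimpanzee", cf="chimpanzee",
                    pfl=["finger control", "chimpanzee"], stack=[])
```

In the first call guideline (1) applies only through the non-default argument, so the result is `(1, "finger control")`; in the second the default focus qualifies as well and is chosen, giving `(1, "chimpanzee")`.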

3.2.9. The focus algorithm

The algorithm for using focus constraints in the selection of propositions is given below. This algorithm does not specify exactly how a schema is selected or filled and, as such, is not a complete algorithm for the strategic component (see Chapter Two for details on these processes).

I. Select schema

II. Initialization of focus records

GF (global focus) = argument of goal

(e.g., for (definition AIRCRAFT-CARRIER), GF = AIRCRAFT-CARRIER;

for (differense OCEAN-ESCORT CRUISER), GF = {OCEAN-ESCORT, CRUISER} (the set))

CF (current focus) =

PFL (potential focus list) =



III. For each entry in schema:

1. Select all propositions in relevant knowledge pool that match the predicate (see Appendix D for a description of predicate semantics)

If schema allows for a choice of predicates, select all possible propositions for each predicate

2. If options exist (i.e., more than one proposition matched), use immediate focus constraints:

a. Select proposition(s) with default-focus ∈ PFL. Set proposition-focus = default-focus

b. If none exist, select proposition(s) having some other argument-entry ∈ PFL. Set proposition-focus = other argument-entry

c. If none exist, select proposition(s) with default-focus = CF. Set proposition-focus = CF

d. If none exist, select proposition(s) having some other argument-entry = CF. Set proposition-focus = other argument-entry

e. If none exist, select proposition(s) with default-focus ∈ focus-stack. Set proposition-focus = default-focus

f. If none exist, select proposition(s) with other argument-entry ∈ focus-stack. Set proposition-focus = other argument-entry

g. If none exist, set proposition-focus of each proposition = default-focus

3. If options exist (i.e., more than one proposition remains after application of immediate focus constraints), use potential focus list links

a. Select that proposition with greatest number of links to PFL

4. Record predicate, proposition pair

a. Add to message

i. Add CF, PFL, focus-stack for last proposition to message

ii. Add new proposition to message

b. Update focus records

i. If proposition-focus ∈ focus-stack: pop focus-stack; CF = proposition-focus

ii. If proposition-focus ≠ CF: stack CF on focus-stack; CF = proposition-focus

iii. Else no change to CF and focus-stack

iv. Set PFL:

1. member 1 = default-theme* of proposition if proposition-focus ≠ default-theme. Else = default-focus

2. last-member = predicate (corresponding to sentence verb)

3. Other PFL members = arbitrary listing of other predicate arguments in proposition

Note that if no suitable focus for the next proposition is found (step 2g), the proposition's default focus is used and the proposition added to the message.

* As is the case for focus, a table of proposition arguments that function as unmarked theme (that item that is most likely of all introduced items to be focused on next) is maintained. Note that theme is a vaguely defined concept (see, for example, Sidner 79) and for that reason, is minimally used.

This type of conversational move is the equivalent of a total shift in focus. Since the strategic component maintains a focus record, the tactical component could use this information to select an appropriate syntactic cue to signal this kind of abrupt shift. Note also that the potential focus list of each proposition is only partially ordered. Its first member is the default theme and its last member is the predicate, or what will eventually be the verb of the sentence. Other entries are set in arbitrary order. The focus algorithm makes a breadth-first search of all possible next propositions. In other words, each possible proposition is retrieved and then the focus constraints are applied. Another approach would be a depth-first search of the possibilities, retrieving only propositions that have an argument which meets the first focus preference and if that fails, retrieving propositions which meet the second focus preference, etc. To determine which propositions meet a focus preference, however, it is necessary to retrieve all propositions and examine their arguments, making the depth-first search a more expensive alternative.
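Step 4b of the algorithm (updating the focus records) can be sketched as below. This is an illustrative reading, not the TEXT code. Three assumptions are worth flagging: the dictionary layout of a proposition is hypothetical; when the new focus is found deeper in the stack, the sketch pops down through it; and the "other" PFL members exclude the first member and the proposition's own focus.

```python
def update_focus(cf, stack, prop):
    """Return (new_cf, new_stack, new_pfl) after adding prop to the message."""
    focus = prop["focus"]
    stack = list(stack)  # work on a copy
    if focus in stack:
        # Return to a previous topic: pop until that item has been removed.
        while stack.pop() != focus:
            pass
        cf = focus
    elif focus != cf:
        # Shift to a new focus: stack the old one (if there was one).
        if cf is not None:
            stack.append(cf)
        cf = focus
    # Partially ordered PFL: default theme first, predicate last.
    if focus != prop["default_theme"]:
        first = prop["default_theme"]
    else:
        first = prop["default_focus"]
    middle = [a for a in prop["args"] if a not in (first, focus)]
    return cf, stack, [first] + middle + [prop["predicate"]]


# Shift from John to "new graduate students" ...
shift = update_focus("John", [], {
    "focus": "new graduate students",
    "default_focus": "new graduate students",
    "default_theme": "rough first semester",
    "args": ["new graduate students", "rough first semester"],
    "predicate": "have",
})

# ... and later return to John, who is on the stack.
back = update_focus("new graduate students", ["John"], {
    "focus": "John",
    "default_focus": "John",
    "default_theme": "Graduate Towers",
    "args": ["John", "Graduate Towers"],
    "predicate": "lives at",
})
```

The first call stacks "John" and makes "new graduate students" the current focus; the second pops the stack and restores "John". In both cases the new potential focus list starts with the default theme and ends with the predicate.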

3.2.10. Use of focus sets

Sidner notes that a discourse need not always focus on a single central concept. A speaker may decide to talk about several concepts at once and yet the resulting discourse is still coherent. She terms this type of phenomenon "co-present foci". She gives the following discourse as an example of the use of co-present foci:

1. I have 2 dogs.

2. The one is a poodle;

3. the other is a cocker spaniel.

4. The poodle has some weird habits.

5. He eats plastic flowers and likes to sleep in a paper bag.

6. It's a real problem keeping him away from the flowers.

7. My cocker is pretty normal,

8. and he's a good watchdog.

9. I like having them as pets.

In this discourse, a set of two elements is introduced as focus in sentence (1). Each element of the set is specified in (2) and (3) by "the one ... the other" construction (Sidner, in fact, relies heavily on this type of construction to identify the use of co-present foci). The discourse then proceeds to focus on each element of the set in turn.

Sidner notes that the use of co-present foci in discourse is a highly regulated phenomenon. For this reason, focusing on more than one central concept does not result in an incoherent discourse. She says:



Co-present foci reflect a special kind of structure that occurs in discourse. Several elements are introduced. When continuing discussion of one of the elements extends the discourse, the focus moves to that element. When that discussion is complete, the focus cannot simply move onto any other thing the speaker wants to mention. The discussion should return to the other elements, and those elements discussed. However, the discussion of one element for an extended part of the discourse may involve introduction and consideration of other elements. The real constraint in the foregoing analysis is that discussion should eventually return to the other elements via co-present foci. When it does not, the hearer is left to wonder why co-presence was used in the first place.

(Sidner 79), p. 195.

Although Sidner describes the restrictions on how co-present foci can occur, she does not describe the reasons for their use or for the focus moves to elements of the focused set. Again, for interpretation of discourse, the use of co-present foci is given by the incoming discourse, and there is no need to decide when their use is appropriate. Generation of discourse, however, requires that these kinds of decisions be made.

Decisions to use co-present foci rest in part on the rhetorical techniques used in discourse and thus the discourse structure (e.g., the decision to define an object in terms of its sub-classes). They also depend on the discourse goal (e.g., the decision to answer a question about the difference between two objects). In the first case, definition of a concept in terms of its sub-classes suggests the use of the constituency schema, a particular structure for discourse. Use of the constituency schema implies focusing on the questioned object, followed by the introduction of its sub-classes and extended discussion of each of these in turn. In this case, the structure of the discourse forces the use of co-present foci and the changes in focus to set members.

In the second case, the discourse purpose is to provide a description of the differences between the two objects. Associated with this discourse purpose are the rhetorical techniques encoded in the compare and contrast schema. Although the exact structure dictated by this schema varies depending on the type of information available in the relevant knowledge pool (see Section 2.3), its basic outline is a discussion of the similarities between the two objects, followed by a discussion of their differences. Thus, the discourse will first center on the two objects and their common attributes; focus will then switch to the questioned objects in turn. Again, the structure of the discourse forces the use of co-present foci and the changes in focus to set members.

In the TEXT system, schemata control the introduction of focus sets and changes in focus to their elements. Arc actions on the schemata (see Section 4.4 for a discussion of the ATN nature of schemata) can force selection of a set as focus and dictate moves to its elements. The focus algorithm continues as usual, with the exception that it allows decisions involving that focus set to override its own. Once discussion involving the focus set is over, the focus algorithm proceeds normally.


3.3. Focus and syntactic structures

3.3.1. Linguistic background

Phenomena similar to what is called "immediate focus" have been studied by a number of linguists. Terminology and definitions for these vary widely; some of the names which have emerged include topic/comment, presupposition and focus, theme/rheme, and given/new. It should be noted that a major difference between these concepts and focus, as discussed in this chapter, is that focusing describes an active process on the part of speaker and listener. The item in focus is that item on which the speaker is currently centering his attention. These linguistic concepts describe a distinction between functional roles elements play in a sentence. A brief description of each of these linguistic concepts follows.

Topic/comment articulation is often used to describe the distinction between what the speaker is talking about (topic) and what he has to say about that topic (comment). Definitions of topic/comment for a sentence usually do not depend upon previous context, although some linguists (in particular, Sgall, Hajicova and Benesova 73) provide different definitions of the distinction for sentences that contain a link to previous discourse and for those that do not. Others who have discussed topic/comment articulation include Lyons (68) and Reinhart (81).

Presupposition has been used to describe information which a sentence structure indicates is assumed as true by the speaker. Presupposition has a very precise definition for formulations which consider meaning the equivalent of truth and has been analyzed by many (e.g., Weischedel 75; Keenan 71). It refers to all that must hold in order for a sentence to be true. Focus, as defined by linguistic precedents, labels information in the sentence which carries the import of the message. Note that this term does not refer to the same concept as used in this work. Linguistic focus refers to the focus of the sentence, while AI researchers use focus to refer to the focus of the speaker. The linguistic focus of a sentence is usually determined by the position where phonological stress occurs (see Chomsky 71; Quirk and Greenbaum 73; Halliday 67).

The given/new distinction identifies information that is assumed by the speaker to be derivable from context (given), where context may mean either the preceding discourse or shared world knowledge, and information that cannot be (new). The given/new distinction has been discussed by Halliday (67), Prince (79), and Chafe (76).

Theme/rheme is a distinction used in work by the Prague School of linguists (see Firbas 66; Firbas 74). They postulate that a sentence is divided into a theme (elements providing common ground for the conversants) and a rheme (elements which function in conveying the information to be imparted). In sentences containing elements that are contextually dependent, the contextually dependent elements always function as theme. Thus, the Prague School version is close to the given/new distinction with the exception that a sentence always contains a theme, while it need not always contain given information. Halliday also discusses theme (Halliday 67), but he defines theme as that which the speaker is talking about now, as opposed to given, that which the speaker was talking about. Thus, his notion of theme is closer to the concept of topic/comment articulation. Furthermore, Halliday always ascribes the term theme to the element occurring first in the sentence.

Focus of attention is like topic (and Halliday's theme) in that it specifies what the speaker is focusing on (i.e., talking about) now. But it is also like given information in that immediate focus is linked to something that has been mentioned in the previous utterance and thus is already present in the reader's consciousness. It combines these two concepts. The potential focus list is akin to new information in that it has not previously been referred to and it specifies the items to which focus of attention is likely to shift.

These paragraphs provide only a brief overview of these various distinctions. Fuller discussions are provided in (Prince 79) and (Chafe 76) which also give overviews of the conflicting descriptions and definitions of these concepts. What is important to this work is that each of these concepts, at one time or another, has been associated with the selection of various syntactic structures. For example, it has been suggested that focus, new information, and rheme usually occur toward the end of a sentence (e.g., Halliday 67; Lyons 68; Sgall et al. 73; Firbas 74). In order to place this information in its proper position in the sentence, structures other than the unmarked active sentence may be required (for example, the passive). Structures such as it-extraposition, there-insertion, left-dislocation, and topicalization* have been shown to function in the introduction of new information into discourse (Sidner 79; Prince 79), often with the assumption that it will be talked about for a period of time. Pronominalization is another linguistic device associated with these distinctions (see Akmajian 73; Sidner 79). In the following sections, I describe how these observations have been implemented in the TEXT system.

* Some examples of these constructions are:

1. It was Sam who left the door open.

2. There are 3 blocks on the table.

3. Sam, I like him.

4. Sam I like.

3.3.2. Passing focus information to the tactical component

Since focus information has been used to constrain the selection of propositions, a record containing each proposition's focus and its potential focus list is available for the tactical component to use when determining the specific syntactic structures and linguistic devices that should be used in the answer. The tactical component can examine this information to determine how a proposition is related to previous discourse: whether the focus has shifted to a new topic, whether a return to a previous topic was made, or whether, in extreme cases, a total shift in topic has been made and the proposition is totally unrelated to what came before. Some of the uses that can be made of this information and the linguistic effects that can be achieved are described in this section.

Pronominalization is a linguistic device that has long been linked with concepts such as focus of attention (Akmajian 73; Sidner 79; McDonald 80). Sidner uses it to aid in determining focus. If an entity remains in focus over a sequence of sentences, references to it can be pronominalized. The following two sentences illustrate this:

John was late coming home. He got caught in a traffic jam.

In the TEXT system, focus information is used in some limited situations to test whether pronominalization can be used.* Part of an answer where pronominalization was selected is shown in (A) below. In the first sentence of the answer, the ship is being focused on and reference to it in the following sentence can therefore be pronominalized. Note that definite reference such as "the ship" would also be appropriate, but in the TEXT system, pronominalization is selected wherever it is determined possible.

* McDonald shows that some additional tests for pronominalization must be made before a pronoun can be used. See (McDonald 80) for a discussion of subsequent reference.


A) (definition SHIP)

A ship is a water-going vehicle that travels on the surface. Its surface-going capabilities are provided by the DB attributes [...] and DRAFT.
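The way a tactical component might consult the focus record described in this section can be sketched roughly in Python. This is an illustrative sketch only, not the TEXT system's implementation: the names FocusRecord, focus_relation, and may_pronominalize, the relation labels, and the sample potential focus lists are all assumptions introduced here. The pronominalization test covers only the simple case of an entity remaining in focus and omits the additional tests McDonald describes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FocusRecord:
    """Focus information kept for one proposition (hypothetical layout)."""
    focus: str                                   # the proposition's focus
    potential_focus_list: List[str] = field(default_factory=list)


def focus_relation(current: FocusRecord, history: List[FocusRecord]) -> str:
    """Classify how a proposition relates to the preceding discourse.

    The labels are assumptions made for this sketch.
    """
    if not history:
        return "initial"                         # nothing came before
    previous = history[-1]
    if current.focus == previous.focus:
        return "maintained"                      # entity remains in focus
    if current.focus in previous.potential_focus_list:
        return "shift"                           # moved to a newly introduced item
    if any(current.focus == earlier.focus for earlier in history[:-1]):
        return "return"                          # back to a previous topic
    return "total-shift"                         # unrelated to what came before


def may_pronominalize(current: FocusRecord, history: List[FocusRecord]) -> bool:
    """Simplified test: allow a pronoun only when the focus is unchanged."""
    return focus_relation(current, history) == "maintained"


if __name__ == "__main__":
    # Illustrative values loosely modelled on the ship example in (A).
    first = FocusRecord("ship", ["vehicle", "surface"])
    second = FocusRecord("ship", ["DRAFT"])
    print(focus_relation(second, [first]))
    print(may_pronominalize(second, [first]))
```

Keeping the classification separate from the pronominalization test means other devices sensitive to focus, such as the choice of syntactic structure, could reuse the same relation.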

Focus information has also been shown to affect the use of syntactic structures, as discussed in Section 3.3.1. Depending upon which constituent of