
A small natural language interpreter in Prolog

Knut Tveitane, knut at itu.dk Christian Theil Have, cth at itu.dk Supervisor: Henning Christiansen, henning at ruc.dk May 29, 2006
4 Week Project, IT University of Copenhagen

Contents
1 Introduction
  1.1 About the Project
  1.2 The Authors
  1.3 Purpose and Scope

2 Background
  2.1 Introduction
  2.2 Use Cases
  2.3 Similar work
    2.3.1 Attempto Controlled English
    2.3.2 Common Logic Controlled English
    2.3.3 Natural Language Case Tool
    2.3.4 Metafor
  2.4 Lexicons
  2.5 Discussion

3 Our approach
  3.1 Use Cases and Natural English
  3.2 Supported Natural Language Constructs
    3.2.1 Basic Sentences
    3.2.2 Property Sentences
    3.2.3 Entity-relational Sentences
    3.2.4 Phrase Lists
    3.2.5 Compound Sentences and Pronouns
    3.2.6 Syntactic Stringency
  3.3 Delimiting the Project
  3.4 Tools and methods
  3.5 Input Grammar Details
  3.6 Code Generation
    3.6.1 Extraction of facts from the program
    3.6.2 Generation of Graphviz code
    3.6.3 From parse tree to code

4 Running the Project Software
  4.1 Instructions for Use
  4.2 Example Sessions
    4.2.1 Simple example using Test domain
    4.2.2 More complex example with the Company domain

5 Future Work

6 Conclusion

7 References

8 Appendix A - Code
  8.1 Input Grammar.pl
  8.2 CodeGen.pl
  8.3 Lexicon Test.pl
  8.4 Lexicon Company.pl


List of Figures
1 Object notation for UML diagrams
2 UML diagram for sample session
3 UML diagram for company use case

List of Tables
1 Examples of use cases and the entities associated with them

1 Introduction

1.1 About the Project

This project is a 4 week project at the IT University of Copenhagen. We, the authors, are MSc students at ITU, but met at RUC in the spring of 2006, where we both attended the course Paradigms in Programming. This course was also where we were introduced to Prolog. The PiP course gave us both a wish to achieve a better working knowledge of Prolog than the course alone could give us. That is why we decided to do a 4 week project in Prolog. A common interest in language and language technology led us to a grammar-based project. We thank professor Henning Christiansen at RUC for supervising the project, and for supplying valuable input and feedback during these four weeks.

1.2 The Authors

Knut Tveitane is a second semester MSc student on the Software Development study line at ITU. He has worked for a number of years with IT applications within the language supplier and translation industry. He has an interest in language, but he is not a linguist, and he is new to logic programming and Prolog. Christian Theil Have is also an MSc student on the Software Development line at ITU. He has solid programming experience and a computer science background (B.S.). During his studies he has developed a strong interest in AI related technologies. He has no linguistic background, but finds it to be a very interesting field. This is also his first adventure in Prolog, apart from the PiP course.

1.3 Purpose and Scope

The process of getting from a use case written in natural language to code in a programming language is time consuming work that traditionally demands human skills, and is usually performed by educated professionals. In object oriented approaches, one must identify entities and partition them into classes and methods as part of this process. The purpose of the project is to investigate the possibilities for automated transition from use cases in a natural language syntax into a computer readable representation, by trying to capture the semantics of the natural language and map it into building blocks of the object oriented programming paradigm (classes, objects, methods, properties etc.). The shift in programming languages from strictly procedural to more object oriented has made this mapping process easier. Programming language constructs more and more resemble the linguistic constructs in natural language. Liu concludes that "fairly direct mappings are possible from parsed English to the control and data structures of traditional high-level programming languages" [LL05].

We propose automating this process, at least partly, with a tool that performs the mapping from a use case written in natural language to a computer understandable language. It is our hope that such a tool might be useful to the professionals normally performing this process manually. A number of possible uses for such a tool could be:

Brainstorming
Prototyping
A learning tool that could help students gain insight into the process of developing software
Certain types of domain specific applications

2 Background

2.1 Introduction

There have been several attempts to map natural language specifications to code in a programming language. We have not found anyone specifically targeting use cases, though some of the examined approaches do something very similar to what we hope to achieve. The examined approaches have all used English as their input language.

Philosophies: The approaches can be divided according to two different philosophies:

Formal languages: Approaches that use a controlled subset of English that maps into first-order logic.

Opportunistic recognition: Approaches that leave room for ambiguity in the input language, and opportunistically recognize a subset of it.

Attempto's ACE (section 2.3.1) and CLCE (section 2.3.2) are examples of the first philosophy, while Metafor (section 2.3.4) and, to a degree, the Natural Language CASE Tool (section 2.3.3) are examples of the second.

2.2 Use Cases

Use cases are normally described in natural language. A use case describes what a system does but does not specify how it does it [BJR99]. Use cases model the flow of events between a system and its actors. Even though use cases are written in natural language, only a subset of English is normally used.

The Unified Modeling Language User Guide [BJR99] does not provide any in-depth guidance on how to write use cases; it only gives a few examples written as paragraphs of text in present tense and third person. There has been some relevant research concerned with the structure and style of use cases. Alistair Cockburn [Coc97] has suggested a semi-formal approach where each action description has a certain structured format. Both Martin Fowler [Fow03] and Cockburn suggest consistent line numbering. Line numbers make it less ambiguous to determine the flow of events. A European research team called CREWS has elaborated on Cockburn's research and defined a rigorous set of guidelines for use case writing in [Ach98a]. The guidelines address both the content and style of use cases, and are expressed almost as a formal grammar. The CREWS team also provides some research in the area of linguistic structures of use cases written using their guidelines in [Ach98b]. The guidelines are divided into content guidelines and style guidelines. They include a number of things that simplify the language in which the use cases are written. Avoiding anaphoric references, explicit references and synonyms clearly makes parsing simpler, as does consistency in terminology and abstraction level. The style guidelines provide some insight into which linguistic structures one can expect to find in use cases. Some of those are:

Atomic action structures
Flow conditions
Loops

The guidelines provide some templates for these linguistic structures, which could relatively easily be formulated as grammars. Research by Karl Cox and others [CP00] has questioned the usefulness of the CREWS guidelines, judging them to be too complex. Subsequently they have written their own guidelines [KCS01], called CP. The CP guidelines indeed seem clearer and simpler. CP is also divided into style rules and content rules. They are summarized below.
CP Style rules

Style 1: Each sentence in the description should be on a new, numbered line. Alternatives and exceptions should be described in a section below the main description and the sentence numbers should agree.

Style 2: Avoid pronouns if there is more than one actor.

Style 3: No adverbs or adjectives.

Style 4: Avoid negatives.

Style 5: Give explanations if necessary.

Style 6: All verbs are in present tense format.

Style 7: There should be logical coherence throughout the description.

Style 8: When an action occurs there should be a meaningful response to that action.

CP Content rules

Structure 1: Subject verb object.

Structure 2: Subject verb object prepositional phrase.

Structure 3: Subject passive.

Structure 4: Underline other use case names.
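To give a feel for how rigid these content rules are, the CP sentence structures could be sketched as DCG rules in Prolog. This is our own illustration, not code from the CP paper, and the tiny vocabulary is hypothetical:

```prolog
% Illustrative only: CP Structure 1 and 2 as DCG production rules.
sentence --> noun_phrase, verb, noun_phrase.               % Structure 1: subject verb object
sentence --> noun_phrase, verb, noun_phrase, prep_phrase.  % Structure 2: ... prepositional phrase

prep_phrase --> preposition, noun_phrase.
noun_phrase --> [the], noun.

% A tiny hypothetical vocabulary:
noun --> [customer].
noun --> [order].
noun --> [form].
verb --> [submits].
preposition --> [on].

% ?- phrase(sentence, [the, customer, submits, the, order]).
% ?- phrase(sentence, [the, customer, submits, the, order, on, the, form]).
```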

2.3 Similar work

2.3.1 Attempto Controlled English

Attempto Controlled English (ACE) is a subset of English designed for writing software specifications. This subset is enough to express first order logic. ACE "can be accurately and efficiently processed by a computer, but is expressive enough to allow natural usage" [Fuc00]. Specifications in ACE appear to be informal, but are in fact quite formal. The Attempto system translates specification texts in ACE into discourse representation structures and optionally into Prolog [Fuc00]. The input text is parsed using something similar to definite clause grammars: "The specification text is parsed by a top-down parser using a unification-based phrase structure grammar" [Fuc00]. It operates with a small vocabulary that contains entries of the function word class, e.g. "determiners, prepositions, pronouns" [Fuc00], but nouns, verbs and adjectives have to be added for the specific input text. Their grammar is built to recognize declarative sentences and composite sentences built from declarative sentences by using coordination constructors (and, or, either-or). "A declarative sentence tells us how the world looks like if the sentence is true (proposition) and claims that the world looks like that (illocution)" [Fuc00]. A declarative sentence can be illustrated by a simple grammar, subject + finite verb (+ complement or object). Only sentences in present tense and third person (either singular or plural) are allowed. Sentences containing modal adjectives or verbs are not allowed. They employ some techniques that allow for more natural sentences. For instance they allow anaphoric references and provide a means to resolve them. The technique used is a combination of deixis look-back based on syntactic features such as gender and number, and a simple rule of right associativity. A feature that is not present in natural language, but practical for software specifications, is a sort of variables (in ACE terminology, dynamic names).
"Dynamic names in ACE distinguish single instances of the set of objects denoted by the preceding noun." [FST99]

They also allow synonyms and abbreviations, but do not provide details on how they manage synonym resolution. Possibly it is done using a lexicon such as WordNet, which is described in section 2.4. The input text is translated to a discourse representation structure (DRS). "A DRS is a structured form of first-order predicate logic which contains discourse referents representing the objects of the discourse, and conditions for the discourse referents" [Fuc00]. The DRS is similar to rules in the Prolog programming language and can be translated to Prolog.

2.3.2 Common Logic Controlled English

Common Logic Controlled English is a specification draft written by John F. Sowa for a formal language with an English-like syntax [Sow04]. The grammar definition is still incomplete. It is a subset of English that can be translated into first-order logic. As of yet no tool exists that transforms sentences to FOL, but Sowa's claim is that "Under the assumption that all words, names, and variables are declared explicitly or implicitly before their first use, the translation of any CLCE text to FOL can be performed in a single pass by a context-free parser augmented with two symbol tables" [Sow04]. ACE and CLCE are very similar languages. Attempto has advanced schemes for resolution of anaphora and structural ambiguity. This makes ACE a more natural language than CLCE, but prevents translation in both directions and makes parsing more complex.

2.3.3 Natural Language Case Tool

[NT96] introduces an approach to mapping process descriptions written in natural text to information system design. The paper describes a CASE tool that utilizes "natural language processing for interpreting and mapping business rules to information systems design" [NT96]. In brief, it identifies condition-action structures in the sentences and represents them as branches in a flowchart. The boxes in the flowchart are labeled with relevant parts of the input text. The text within the boxes is processed in a systematic way, so that subjects are left out and negations in conditions are eliminated. The tool does not seem to be capable of much contextualizing, in the sense that the text in the nodes is not processed further. Liu describes the nodes as "large unparsed natural language utterances" [LL05]. The paper gives the impression that it can identify a lot of different condition and action structures within the input text, but unfortunately the detail in which these constructions are described is very limited. The technical details are also sparse. They employ multi-pass parsing and disambiguation using a dictionary with syntactic and semantic information. The dictionary distinguishes different conceptual categories (tangible object, person, event, location etc.).

2.3.4 Metafor

Metafor is a tool that maps natural language (English) into code in the Python programming language. The authors conclude that "fairly direct mappings are possible from parsed English to the control and data structures of LISP and Python" [LL04]. Metafor accepts a wide range of language constructions and is definitely not a formal language. It is a prime example of a tool in the opportunistic category. It understands different narrative stances, past and present tense, metonyms, dominion, anaphoric and even dynamic references (set selections). They compare programming to storytelling. This is interesting, since storytelling is similar to use case writing, where events are also described chronologically. The capabilities of the tool are very impressive, since it allows much ambiguity in the input language. One of the reasons why Metafor so successfully recognizes such a rich input language is the use of the ConceptNet lexicon. We briefly describe this lexicon in section 2.4. They do not describe in detail how parsing and internal representation of the input text is handled. On a side note, a deictic stack is mentioned, which might well be the way they handle anaphoric resolution.

Programmatic Semantics

The authors [LL05] have coined the term programmatic semantics to describe the transliteration process: "Programmatic semantics is a mapping between natural linguistic structures and basic programming language structures" [LL05]. They have further divided programmatic semantics into four categories:

Syntactic features
Procedural features
Relational and set-theoretic features
Representational equivalence

Syntactic features imply semantics. Different word categories naturally map to certain programming language constructs: nouns as objects, verbs as functions and adjectives as properties. Procedural features include expression of conditional rules and iteration. The paper [LL05] points out linguistic constructions for handling conditionals: subjunctives, possibles and when.
Subjunctives are the well-known two-clause constructions also seen in most programming languages (if ... then ...). When is similar to if, and can be handled the same way. Possibles are constructions such as may and might. Note that possibles are illegal in the formal languages examined (ACE and CLCE). Loops are similar to conditionals; they are also two-clause constructions with a conditional and a body.

Relational and set-theoretic features can denote implicit loops. For instance a selectional constraint can be used to select a subset from a set. Implicitly this means looping through the set, applying the selectional constraint. Representational equivalence is a kind of type system used for inferring a code representation for the objects in the input text. "In Metafor, we always begin by assuming the simplest code representation which can accommodate the facts in the story, dynamically refactoring to more complex representations as necessary." [LL05] They claim that the sort of representational equivalence found in natural language is quite unparalleled in any formal programming language. This might be true, but the way they infer representation seems similar to Standard ML's type inference system.

2.4 Lexicons

Each of the examined approaches uses some sort of lexicon. Lexicons are widely used in natural language processing, so they will not go unmentioned here either. In the following we describe some important lexicons and briefly discuss their properties. We use the term lexicon to describe a collection or database of words where each word is linked with lexical and possibly semantic knowledge, i.e. an advanced dictionary. A very popular publicly available lexicon is WordNet [Fel98]. WordNet is a huge lexicon that contains information about different words, categorized by word class (e.g. noun, verb, adverb etc.). Most notably, WordNet also contains information about relations between words and different word senses (e.g. rock can refer both to a stone and to a kind of music). Some of the supported relations include synonyms, antonyms, part-of, kind-of and several others. Attempto and CLCE use custom lexicons. Each of them contains a limited number of words and must be extended by the user to process a specific input document. Those dictionaries contain only closed word classes such as determiners, prepositions and conjunctions. The user extends the dictionary with domain specific nouns and verbs. These limited lexicons make good sense in a formal language approach. However, in the opportunistic approaches, large lexicons packed with lexical and semantic information are required, since input cannot easily be anticipated. The Natural Language CASE Tool uses a large lexicon with 75,000 words which can also be extended by the user. This lexicon contains information about different concept categories. Details on the lexicon are sparse, but it probably has a high degree of similarity to WordNet. Metafor uses a large lexicon called ConceptNet [LS04]. ConceptNet is a very advanced lexicon developed at MIT that includes more than 250,000 elements of commonsense concepts. It is similar to WordNet but contains a much richer set of semantic relations.
While WordNet has been hand-crafted, ConceptNet was developed as a web collaboration project. WordNet only contains entries for individual words, but entries in ConceptNet are linked with much more contextual information (so word sense, for instance, may be determined using sentence analysis).

2.5 Discussion

The limited language of use cases makes it feasible to employ natural language processing. The scope of this could be limited to a certain subset of language constructions. Different approaches for mapping natural language to code exist and can be divided into formal languages and opportunistic language recognition tools. The approaches also differ in what types of language constructs they support and how they map them to a computer understandable representation. ACE and CLCE map to formal logic, while Metafor and the Natural Language CASE Tool have a heavy emphasis on procedural features. The Natural Language CASE Tool totally disregards structural features, whereas Metafor works with a combination of structural and procedural features. The Natural Language CASE Tool creates diagrams instead of actual code in a programming language, but its authors argue that the tool could easily be modified to output code in a programming language. Metafor skips directly to code generation in the Python programming language. Because of Python's high-level features (dynamic typing, lambda functions etc.), they are able to do this quite elegantly, and are spared many considerations they would have had to go through if they had used a language like Java.

3 Our approach

3.1 Use Cases and Natural English

Use cases are descriptions of system functionality and interaction between parts of a system. The word system is used here in its widest meaning: in theory it can be any compound structure where two or more components interact. In reality the systems described usually consist of human beings (users), some computer software and some kind of hardware objects. The use cases describe the capabilities of the system, and interactions between the parts of the system, in particular between the users on one side and the hardware and software parts on the other. Use cases are a system design tool. This means that the software (and sometimes also some of the hardware) described in the use cases does not exist at the time the use cases are written. On the contrary, it is the purpose of the use cases to aid the system designer in designing a well functioning system. One of the tasks for the system designer is, based on the use cases and other documentation, to define the class structure - that is, the main entities of the system, their content and capabilities, and the relations between them. The purpose of this project is, as mentioned before, to do some investigation into the possibilities of automating this process. The use cases are written in a natural language. However, they are not free form. As discussed in section 2.2, several sets of guidelines have been elaborated for how natural language should be applied to use cases. We have chosen to base the syntax on Attempto Controlled English, as presented in section 2.3.1. ACE comprises a subset of the English language,

such that any statement in ACE is valid English, but not every valid English statement is valid Controlled English. Three notable limitations of ACE are that it supports only:

Present tense
3rd person singular or plural
Active, declarative sentences

The grammar we have implemented in this version does not cover the entire definition of ACE. The precise subset of ACE that we support is presented in the next section.

3.2 Supported Natural Language Constructs

When defining the constraints for the subset of natural English that we would implement support for, the goal was that it had to be expressive enough to identify the important entities and relations in an object oriented description of a system. Further, it not only had to comply with a basic formal syntax, but had to be sufficiently flexible to maintain a certain degree of the flow and style of natural language. This raises the tension between expressional flexibility on one side and syntactical stringency and error control on the other. We have tried to find a balance between these, implementing a proof of concept showing that both can be accommodated within this kind of solution. Below we present the different sentence types and constructions the grammar parser is designed to recognize, from a grammatical or syntactical point of view. Where applicable, the programmatic semantics of individual constructs are included. In section 3.5, we present implementation details for the most important of these constructs.

3.2.1 Basic Sentences

The starting point is the simplest of sentences, with a noun phrase followed by a simple verb phrase. The verb phrase can contain an intransitive or transitive verb. We will first consider verbs that imply an action to be performed by or on the subject of the sentence. Examples are A man walks or The woman drives a car.

Programmatic Semantics

The subject of the sentence is a noun. This noun maps to a class definition in the object oriented programming paradigm. The verb maps to a method of the class represented by the subject. If the verb is transitive, the object is another noun (that defines another class) and this class serves as the argument to the method.
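A minimal sketch of how such a mapping could look in Prolog: the fact formats (class/1, method/2) mirror those the project uses, but the grammar rules and vocabulary below are our own simplified illustration, not the project code:

```prolog
% Illustrative sketch: assert class/method facts while parsing a basic sentence.
:- dynamic class/1, method/2.

sentence --> noun_phrase(Class), verb_phrase(Class).

noun_phrase(Class) --> determiner, [Class],
    { noun(Class), assert_once(class(Class)) }.

verb_phrase(Class) --> [Verb],
    { iv(Verb), assert_once(method(Verb, Class)) }.
verb_phrase(Class) --> [Verb],
    { tv(Verb), assert_once(method(Verb, Class)) },
    noun_phrase(_Argument).          % transitive: the object becomes the argument

determiner --> [a].
determiner --> [the].

% Tiny hypothetical lexicon:
noun(man). noun(woman). noun(car).
iv(walks).                           % intransitive
tv(drives).                          % transitive

assert_once(Fact) :- ( call(Fact) -> true ; assertz(Fact) ).

% ?- phrase(sentence, [the, woman, drives, a, car]).
% leaves class(woman), class(car) and method(drives, woman) in the database.
```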


3.2.2 Property Sentences

Sentences that imply an ownership or containment relation are syntactically identical to the transitive sentences described in the previous section. However, verbs like to have or to contain cannot be considered to imply an action performed by or on the subject. Instead, a sentence like A car has an engine signals a static super-subordinate relation between subject and object. The object of a property sentence may be plural. In these cases, the object may contain a quantifier. This may be an integer, a number (between two and twelve) spelled out in letters, or an unspecified quantifier like some. An example is the sentence A car has four wheels.

Programmatic Semantics

The main semantic difference between this sentence construction and the basic transitive sentences described above lies in the fact that the verb in itself is not considered a method. Instead, the class represented by the object is considered a property of the class in the subject. When it comes to properties in plural form, they map to multi-valued properties. There are different approaches to representing these in object oriented programming languages. We have tried not to limit the flexibility, by maintaining as precise information as possible about the cardinality of the property. This means the exact number of instances is conserved if present; if it is not, the information that it is a multi-valued property is still saved.

3.2.3 Entity-relational Sentences

Some sentences define entities in terms of others. These sentences are syntactically similar to the above examples. The verb to be (is/are) serves the purpose of defining such relationships.

Programmatic Semantics

One type of such sentences is similar to A car is a kind of vehicle. Here too there is a super-subordinate relationship, but of a different type. In object oriented terminology, vehicle is a superclass of car. The key to this kind of sentence is what we can call a subclassing phrase, kind of, following the verb to be. Another construction is a sentence like John is a man. The noun phrase here is a proper noun, and the significance of this sentence is that there is a concrete, named entity (John) of the class man. In object oriented terminology, John is an object of class man. After the person John has been introduced, we must be prepared for sentences of the form John talks. The sentence looks straightforward at first glance, but its programmatic semantics are a bit more complex. John maps to an object, talks to a method - but objects don't define methods (or properties). The sentence must be understood to mean that the method belongs to the class the object is an instance of.
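A simplified sketch of how these two constructions could be turned into facts. The fact formats (extends/2, object/2) are those the project uses; the flat, pattern-matching rules and the small vocabulary are only an illustration:

```prolog
:- dynamic extends/2, object/2.

% "A car is a kind of vehicle"  ->  extends(car, vehicle).
sentence --> [a, Sub, is, a, kind, of, Super],
    { noun(Sub), noun(Super), assertz(extends(Sub, Super)) }.

% "John is a man"  ->  object(john, man).
sentence --> [Name, is, a, Class],
    { proper_noun(Name), noun(Class), assertz(object(Name, Class)) }.

% Tiny hypothetical lexicon:
noun(car). noun(vehicle). noun(man).
proper_noun(john).

% ?- phrase(sentence, [a, car, is, a, kind, of, vehicle]), extends(car, vehicle).
% ?- phrase(sentence, [john, is, a, man]), object(john, man).
```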


3.2.4 Phrase Lists

In natural language, several phrases of the same type are often packed into a list instead of having a sentence each. Such lists can be lists of verb phrases or lists of objects to the verb to have. Lists are comma separated, except for the last two entries, which are separated by a conjunction. We have implemented support for such lists, using the conjunctions and and or.

3.2.5 Compound Sentences and Pronouns

One or more sentences of the types mentioned above, combined by conjunctions and followed by a sentence-ending punctuation mark (the full stop is the only one implemented by now), comprise a compound sentence. A use case is made up of one or more compound sentences. Within a use case (possibly spanning compound sentences), the subject of a sentence can be substituted by a pronoun, e.g. A man is a type of person. He has a car. The pronoun will always refer to the subject of the previous sentence. The pronouns available are he, she, it and they.

3.2.6 Syntactic Stringency

Several syntax check constraints regulate which words and word forms can be used together. We have implemented singular/plural agreement between subject and verb phrases, such that A woman walks is accepted, but She drive a car is not. Gender agreement between pronouns and the nouns they refer to has also been implemented. Therefore, the use case A man goes. She has a bag is not accepted, because she is not allowed to refer to a man.
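Such agreement checks fall out naturally from DCG arguments (the "features" mentioned in section 3.4). The following is our own simplified illustration of the idea, not the project grammar; it also sketches pronoun resolution by remembering the previous subject:

```prolog
% Illustration: number agreement between subject and verb, and gender
% agreement between a pronoun and its referent, via DCG arguments.
:- dynamic last_subject/2.

sentence --> subject(Num, Gender), verb(Num),
    { retractall(last_subject(_, _)), assertz(last_subject(Num, Gender)) }.

subject(sg, Gender)  --> [a], noun(Gender).
subject(Num, Gender) --> pronoun(Num, Gender),
    { last_subject(Num, Gender) }.   % pronoun must agree with its referent

noun(masc) --> [man].
noun(fem)  --> [woman].
pronoun(sg, masc) --> [he].
pronoun(sg, fem)  --> [she].
verb(sg) --> [walks].
verb(pl) --> [walk].

% ?- phrase(sentence, [a, woman, walks]).  % succeeds; records last_subject(sg, fem)
% ?- phrase(sentence, [she, walks]).       % then succeeds: gender and number agree
% ?- phrase(sentence, [he, walks]).        % fails: 'he' cannot refer to a woman
```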

3.3 Delimiting the Project

We have made simplifications regarding the input and output. Instead of full natural language input, we use Prolog list format, with each word and syntax element represented as atoms. The transition from natural syntax to this list syntax is considered trivial. The project depends on a lexicon that, in our test project, is defined as part of the program. The lexicon consists of two parts: one part defines words that are part of the basic language definition; the other part defines the domain specific terminology for the use cases we want to analyze. The latter consists of nouns, proper nouns, and transitive and intransitive verbs. The domain specific lexicon is implemented as a separate module (file), making it easily interchangeable. We have built a couple of relatively small domain-specific lexicons for testing the approach. The lexicons are sufficiently large to show that a reasonably wide selection of use cases can be transformed, and can easily be extended. Other dictionaries are available which contain a much larger selection of words and word categorizations; see section 2.4. It would be possible to substitute our simple dictionary with one of those to make the system recognize a much larger vocabulary.
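To make the list format concrete, here is a hypothetical example of how a short use case would be represented and handed to the grammar (the top-level nonterminal name use_case is an assumption on our part):

```prolog
% The sentence "A man is a type of person. He has a car." would be
% supplied to the parser as a flat list of atoms, punctuation included:
%
%   [a, man, is, a, type, of, person, '.', he, has, a, car, '.']
%
% and handed to the grammar via phrase/2, e.g.:
%
%   ?- phrase(use_case, [a, man, is, a, type, of, person, '.',
%                        he, has, a, car, '.']).
```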

We have chosen to focus on the structural features found in natural language. Behavioral and procedural features are not represented in our system. Metafor (section 2.3.4) and the Natural Language CASE Tool (section 2.3.3) have shown that it is possible to handle these, but it would be time consuming and out of reach in the time available for this project. Instead of generating code in an actual programming language, we generate UML diagrams. There are several reasons why we chose to do this:

There usually is a design phase between use case writing and coding. The UML class diagram is a central tool and model in this phase, and is later used when writing the actual code.

We focus on the structural aspects, which is exactly what is modeled in the UML class diagram.

The class diagram is more illustrative than code.

3.4 Tools and methods

The Definite Clause Grammar (DCG) syntactic extension to Prolog is a very powerful tool for building parsers and otherwise analyzing language. DCG itself is based on a simple syntax, where production rules define one grammatical symbol (or segment) in terms of a sequence of (one or more) others. The format is s --> n, v, meaning that the (non-terminal) symbol s consists of the symbol sequence n followed by v. The symbols are non-terminal (i.e. segments that have their own production rules and will be further segmented) or terminal (which, in a natural language grammar, correspond to words). Expressional strength is added by use of parameters (so-called features) and by embedding ordinary Prolog syntax in the production rules. We have used DCG in combination with standard Prolog code for the input grammar and the lexicon. DCG is, as mentioned, just a syntactic extension to Prolog. The DCG production rules are translated to ordinary Prolog clauses - the production rule s --> n, v is transformed to the Prolog syntax:

    s(List1, Rest) :-
        n(List1, List2),
        v(List2, Rest).

The arguments to all of the predicates in this clause are a construction called difference lists - a construction Definite Clause Grammars rely heavily on. Difference lists are pairs of two lists, where the second list contains a tail part of the first list. The value of a difference list is the head part of the first list, up to the point corresponding to the start of the second. The difference list [1,2,3,4,5],[4,5] evaluates to [1,2,3]. The second list may be (and often is) the empty list, in which case the difference list equals the first list. Different solutions for the grammatical parsing of the input were considered. One option was to build a full syntax tree for the use cases, but this resulted in


rather complex code that was hard to get an overview of, and it seemed the complexity did not ease the syntactical transition. A better solution proved to be an approach where the input is segmented on multiple levels, aiming to isolate the programmatic semantics of each segment on the highest possible level, and embedding functionality to handle these semantics directly in the grammar. In other words, as high up in the segmenting hierarchy as possible, the entities (from an object oriented point of view) that can be captured from the programmatic semantics of the text segments are asserted to the program's database as facts. The following types of facts (shown with their arguments) are asserted:

class(Classname)
extends(Classname, Superclass)
property(PropertyName, Classname, Cardinality)
method(Methodname, Classname)
method(Methodname, Classname, Argument)
object(Objectname, Classname)

Prolog, being primarily a declarative language, has a symmetrical approach to input and output, in the sense that which variables are output from a rule or procedure depends on whether they are uninitialized when the rule is evaluated, not on their position in the rule as in imperative languages. In other words, variables to hold output can be found in the body (right side part) as well as the head (left side) of a rule. This symmetrical approach also holds for DCG, meaning that DCG production rules can be used not only for parsing a complex, non-terminal symbol into terminals, but also for generating a non-terminal (output) symbol from terminals. Thus, we could also use DCG in combination with standard Prolog to produce the output, based on the set of facts that exist after the input is processed.
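This symmetry can be seen in a minimal toy grammar (our own two-rule example for illustration, not part of the project grammar):

```prolog
% A toy DCG: a sentence is a noun followed by a verb.
s --> n, v.
n --> [john].
v --> [walks].

% The rules compile to clauses over difference lists, so the same
% grammar both parses and generates:
% ?- s([john, walks], []).       % parsing: succeeds
% ?- s([john, walks, fast], R).  % partial parse: R = [fast]
% ?- s(L, []).                   % generation: L = [john, walks]
```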

3.5 Input Grammar Details

As mentioned earlier, the grammar in this version is a subset of ACE. Even though support for the most complex features is not implemented, the grammar is sufficiently advanced to allow for a variety of sentence constructions. The text is limited to present tense, 3rd person, declarative active sentence form. Viewed from the top down, input consists of one or more use cases. The input grammar will identify each piece of text with semantic significance, and for each such piece, one or more facts are asserted to the program. The list of facts increases for each use case that is analyzed. See table 1 at the end of this section for an example of the mapping between use cases and asserted facts. A use case consists of one or more compound sentences (each terminated with an end punctuation symbol, '.'). A compound sentence, in turn, consists of one or more sentences, joined together by conjunctions.

A sentence is divided into a noun phrase and a verb phrase, with the noun phrase - either a proper noun or a determiner-noun sequence - as the subject of the sentence. When a noun is identified in the text, it is generally asserted as a class fact. The verb phrase can have different functional significance. It can be either a subclassing verb phrase, an instantiation verb phrase, a method verb phrase or a property verb phrase, depending partly on the verb it contains, partly on other factors. Verbs are grouped according to their functionality, related to the types of verb phrases stated above. The verb "to be" (in its 3rd person present singular and plural forms, is and are) is classified in a functional class of its own, as a subordinating verb. As mentioned in 3.2.3 above, it is found in entity-relational sentences, its usage being to define entities (objects and classes) that are based on other classes. It occurs in two verb phrase types: subclassing verb phrases, which signify the definition of a class as a subclass of some other class (and trigger the assertion of extends facts), and instantiation verb phrases, signifying the instantiation of an object of a class (and triggering the assertion of object facts). Another functional verb class is possessive verbs. They are found in property verb phrases, which describe property relations for classes, as discussed in section 3.2.2, and assert property facts. The group consists of verbs of the type have and contain. Quantifiers are used in property verb phrases to set the value of the cardinality argument of the property fact. Cardinality is set to 1 if the noun object of the verb phrase is singular, or - if the noun is plural - to a number > 1 if a numeric quantifier is specified (either as an integer or spelled out in letters for the numbers two to twelve). The atom n is used if no quantifier or an indefinite quantifier (like some) is specified. Other verbs are divided along traditional dividing lines, into transitive and intransitive verbs.
In our lexicon, these will typically be action verbs, mapping into method facts. Intransitive verbs are asserted as parameter-less methods, while transitive verbs are asserted using the grammatical object as parameter (only one parameter is possible using this approach). As with sentences, method- and property-specifying verb phrases as well as property noun phrases can be compound, following the normal syntax where the first entries in the list are separated with a comma, and only the last separator must be a conjunction - and or or. The grammar is - though fairly basic - sufficiently advanced to prove that the concept of a DCG, handling different elements of the programmatic semantics on different grammatical levels, has the ability to extract and express differences in meaning from syntactically quite similar sentences.
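The cardinality mapping described above is driven by the quantifier rules of the lexicon; these lines are taken from InputGrammar.pl in Appendix A:

```prolog
quantifier(n) --> [].                   % no quantifier
quantifier(n) --> [several].            % indefinite quantifiers give the atom n
quantifier(n) --> [some].
quantifier(n) --> [a, number, of].
quantifier(X) --> [X], { integer(X) }.  % an integer quantifier, e.g. 4
quantifier(4) --> [four].               % spelled-out numbers two..twelve
```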

3.6 Code Generation

The system generates UML class diagrams. We decided that this more illustrative approach is better suited to the purpose of this project than generating actual code in a programming language, as argued in section 3.3.

Table 1: Examples of use cases and the entities associated with them

"A car is a type of vehicle."
    class(car), class(vehicle), extends(car,vehicle)

"A man is a kind of person. John is a man."
    class(man), class(person), extends(man,person), object(john, man)

"A car has some seats, an engine and four wheels."
    class(car), class(seat), property(seat,car,n), class(engine), property(engine,car,1), class(wheel), property(wheel,car,4)

"A library contains books and magazines and has borrowers. A borrower borrows a book. He takes the book and goes."
    class(library), class(book), property(book,library,1), class(magazine), property(magazine,library,1), class(borrower), property(borrower,library,1), method(borrow,borrower,book), method(take,borrower,book), method(go,borrower)

Diagrams are generated in the dot language, which can be visualized using Graphviz. Graphviz is open source graph visualization software [gra]. UML class diagrams only represent structural relations between classes such as inheritance, association and aggregation. Objects are not included. However, in our diagrams we decided not to conform strictly to the UML specification [uml], so we could include objects anyway. Objects can be recognized using our input grammar, so they should also be displayed in the output. We have introduced our own notation for including objects in the diagram. Objects are represented as boxes that contain the class of the object, then a colon, followed by the name of the object. There is an arrow from the class to the object. The head of this arrow is a round circle. This is also our own notation. An example of the notation is shown in figure 1. We display properties both on the class they belong to and as aggregation arrows. Normally only simple types are displayed as properties on the class; however, we do not operate with simple types. From a visual perspective, it seems clearer to display properties both ways.
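In dot terms, this object notation amounts to a record node for the object and an edge with an "odot" arrowhead from the class; the lines below are taken from the generated output shown in section 4.2.1:

```dot
obj_john [ label = "{man:\ljohn}" ]
edge [ arrowhead = "odot" ]
man -> obj_john
```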


Figure 1: Object notation for UML diagrams

3.6.1 Extraction of facts from the program

As described in section 3.4, programmatic semantics are asserted as facts in the program. To facilitate code generation, we must later extract these facts again. This is done using rules that utilize the built-in bagof predicate. All the facts of each category are appended to lists as tokens. These lists are combined into a token program, where the facts from each list are qualified with additional information tokens (such as class, method, property and object). The final token program is a flat list of tokens. An example of such a token program is shown below (indented for readability):

[
  program,
  class, vehicle,
  class, car,
    method, drive,
    property, wheel, 4,
    property, engine, 1,
    property, seat, n,
  class, wheel,
  class, engine,
  class, seat,
  extends, vehicle, car
]

3.6.2 Generation of Graphviz code

The token program is used as input to a DCG grammar. This also serves as an illustration of how definite clause grammars can be used to produce a language as well as parse one. The DCG grammar builds the Graphviz code as a parse tree, while recognizing the token language. Each rule in the DCG grammar builds a list of dot code and control tokens, corresponding to the element in the token grammar. The result is a nested list (indeed, a tree) containing all the output for Graphviz.
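As a hypothetical sketch of such a rule (the rule name and exact dot attributes are invented for illustration; the real rules appear in CodeGen.pl in Appendix A), a rule consumes tokens from the token program and passes the generated dot code up through a feature:

```prolog
% Consumes the tokens `class` and a class name, and produces
% a list of dot code and control tokens for that class.
class_node([Name, '[ label = "{', Name, '||}" ]', newline]) -->
    [class, Name].
```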


The grammar contains rules for each semantic category: classes, properties, methods, extends (inheritance) and objects.
Since each of the elements may occur a number of times (including zero), they are recognized by recursive DCG rules. Those rules are similar in structure, and the structure of such a recursive rule is:

rec_elements([]) --> []. % match zero or more elements
rec_elements([X,Y]) --> single_element(X), rec_elements(Y).
single_element(X) --> ...

Each single element rule contributes some code to the feature X. The recursive rules collect these contributions in nested lists. The nesting level of the produced list is proportional to the number of elements of this type. For most of the elements, context is not really needed: code can be generated independently of its context. Whether or not this is possible depends on the similarity of the input and output languages. In our case, we have designed the input language such that the order of elements is almost identical to the corresponding elements of Graphviz code. The places where we need to handle context are described below.

Classes The syntax of classes in the token language, expressed in Backus-Naur Form (BNF), is:

<class> ::= class class-name methods properties

For each class, its properties and methods are generated recursively. The class rule produces a feature with all the Graphviz code for the class, which includes the code passed up from the methods and properties.

Properties Properties have the following token syntax expressed in BNF:

<property> ::= property name cardinality

Properties are translated into aggregation arrows that point from one class to another. The name of the class pointed from is given as the next token in the input stream, whereas the class pointed to occurs somewhere before the property in the input stream. We handle this by passing down the name of the current class as a feature to the properties rules. A second feature is used to pass

up the generated tree for aggregation arrows, and a third feature is used to pass up the generated tree for the class property list. Cardinality is resolved using the typeinf relation. typeinf generates either an atomic type or an array type (with the relevant number of elements) for the property list.

Methods We handle two kinds of methods: methods that don't take any arguments and methods that take exactly one argument. The token syntax for methods in BNF is:

<method> ::= method name | method name argument

Methods without an argument are simple and are reflected only in the method list in the class description. A feature is used to generate the tree for the method list. Methods with an argument also trigger the construction of an association arrow. The first end of the association arrow, the name of the class the method belongs to, is passed down using a feature; the second end is given as the next token in the input stream. A second feature is used to pass up the method list and a third is used to pass up the code for the association arrow.

extends Extends has the following syntax in BNF:

<extends> ::= extends superclass-name subclass-name

In inheritance (extends), the names of the related classes are given directly as tokens following the extends token. This saves the use of a feature to pass down the name of the current class. In the dot language it doesn't matter where we put the inheritance declarations, so we just put them after all the class declarations.

Objects Objects have the following syntax in BNF:

<object> ::= object object-name class-name

Objects are similar in construction to extends. No contextual information is needed by the rules producing code for objects. They are placed at the end of the dot program.

3.6.3 From parse tree to code

A depth-first traversal of the parse tree will visit the nodes in the correct order for output in the dot language. We flatten the list before output, and this flattening process is really a depth-first traversal that puts the elements in a flat list in the order visited. Each element in this list is written to a file in the order it appears. Certain control tokens (tab and newline) are used to control the formatting of the output, and thus have special interpretations. All other elements are written directly.
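The flatten utility in CodeGen.pl (Appendix A) performs this traversal; atoms become singleton lists and nested lists are appended in visiting order:

```prolog
flatten([Head|Tail], FlatList) :-
    flatten(Head, FlatHead),
    flatten(Tail, FlatTail),
    append(FlatHead, FlatTail, FlatList), !.
flatten([], []).
flatten(X, [X]) :- atomic(X).
% ?- flatten([a, [b, [c]], d], L).   % L = [a, b, c, d]
```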


4 Running the Project Software

4.1 Instructions for Use

The project software consists of several Prolog program files:

InputGrammar.pl (Grammatical parser)
CodeGen.pl (Output generator)
Lexicon_XXX.pl (Domain specific lexicon for domain XXX)

After consulting the relevant files, one or more use cases are entered using the clause:

use_case(U).

The argument U is a list of atoms representing words and punctuation. Punctuation marks must be enclosed in apostrophes. To generate output, enter the clause:

generate_dotty_file(F).

F contains the name of a file (enclosed in apostrophes) to contain the output. The file contents serve as input to Graphviz.

4.2 Example Sessions

4.2.1 Simple example using Test domain

This first example session uses the lexicon contained in the file Lexicon_Test.pl. The file contains terms related to persons and cars - a domain that is not closely related to computer systems, but that serves the point of demonstrating entity relationships.

SICStus 3.11.0 (x86-win32-nt-4): Mon Oct 20 00:38:10 WEDT 2003
Licensed to ruc.dk
| ?- consult('C:/NLPproject/InputGrammar.pl').
% consulting c:/nlpproject/inputgrammar.pl...
% loading c:/program files/sicstus prolog 3.11.0/library/lists.po...
% module lists imported into user
% loaded c:/program files/sicstus prolog 3.11.0/library/lists.po in module lists, 0 msec 13600 bytes
% consulted c:/nlpproject/inputgrammar.pl in module user, 10 msec 30688 bytes
| ?- consult('C:/NLPproject/codegen.pl').
% consulting c:/nlpproject/codegen.pl...
% consulted c:/nlpproject/codegen.pl in module user, 10 msec 12320 bytes
| ?- consult('C:/NLPproject/Lexicon_Test.pl').
% consulting c:/nlpproject/lexicon_test.pl...
% consulted c:/nlpproject/lexicon_test.pl in module user, 0 msec 3976 bytes
| ?- use_case([a,man,is,a,kind,of,person,'.',john,is,a,man,'.']).
yes
| ?- use_case([a,woman,is,a,type,of,person,'.',
               women,walk,',',talk,and,drive,cars,'.']).
yes
| ?- use_case([a,car,has,an,engine,and,four,wheels,'.']).
yes
| ?- generate_dotty_file('c:\\dotty_test.txt').
yes
| ?-

This generates the following output:

digraph G {
fontsize = 8
node [ fontsize = 8 shape = "record" ]
edge [ fontsize = 8 ]
man[ label = "{man||}" ]
person[ label = "{person||}" ]
woman[ label = "{woman||+: walk(param:) : void\l+: talk(param:) : void\l+: drive(param:car) : void\l}" ]
edge [ arrowhead = "none" ]
woman -> car [ label="drive" ]
car[ label = "{car|- property: engine\l- property: wheel[4]\l|}" ]
edge [ arrowhead = "odiamond" ]
engine -> car [ label="1" ]
edge [ arrowhead = "odiamond" ]
wheel -> car [ label="4" ]
engine[ label = "{engine||}" ]
wheel[ label = "{wheel||}" ]
edge [ arrowhead = "empty" ]
man -> person
edge [ arrowhead = "empty" ]
woman -> person
obj_john [ label = "{man:\ljohn}" ]
edge [ arrowhead = "odot" ]
man -> obj_john
}

When this file is processed by Graphviz, it produces the graphic representation shown in figure 2.

Figure 2: UML diagram for sample session

The diagram uses slightly modified UML diagram syntax. The nouns from the use cases can be recognized as class rectangles, with methods and properties specified, and the standard UML arrowed connectors showing generalization (inheritance). Classes that are properties of other classes are shown with connectors with an open diamond ending, the UML syntax for aggregations. The connectors are even equipped with cardinality indicators. If a method in one class has another class type as parameter, the classes are connected with an

arrowless connector - UML for associations. The connectors are marked with the method name. There is one additional syntactic element, as presented in 3.6, which is not part of the UML standard: instantiated objects are shown in the diagram. They are connected to the class they belong to by a connector with a circle at the object end.

4.2.2 More complex example with the Company domain

Not only does this example represent a more complex set of entities, the domain is also more relevant in relation to the design of computer systems. The lexicon file Lexicon_Company.pl contains terms related to companies and employees, and can be viewed as a starting point for a Human Resources information system. The Prolog session is similar to the one above, but instead of Lexicon_Test.pl, the file Lexicon_Company.pl should be consulted. And of course the use cases are different:

use_case([a,company,has,a,number,of,departments,'.',
          it,produces,goods,or,delivers,services,'.']),
use_case([a,department,consists,of,employees,'.',
          all,employees,work,and,they,have,a,position,'.',
          they,are,persons,'.']),
use_case([the,employees,have,salaries,'.',
          the,company,pays,the,salary,'.']),
use_case([a,sales,representative,is,a,kind,of,position,'.',
          an,office,clerk,is,a,type,of,position,'.']),
use_case([office,clerks,use,a,computer,'.']),
use_case([sales,representatives,sell,and,they,have,a,budget,'.']),
use_case([a,boss,is,a,kind,of,position,'.']),
use_case([a,boss,manages,employees,'.']),
use_case([mary,is,the,boss,'.',john,is,an,office,clerk,'.']).

We will not show the contents of the intermediate output file here. The graph produced when it is processed by Graphviz is shown in figure 3.

Future Work

There are several options for extending the project and its software. We give a short presentation of some of them below. An obvious one is to extend the grammar. The other options included deal with integration and communication with the outside world, i.e. other existing systems and services.

Grammar Extension. The grammatical system implemented in the project software is quite limited. Due to the time frame for the project, we had to concentrate on a limited number of syntactical constructs. The software must be considered a proof of concept more than a finished system.

Figure 3: UML diagram for company use case

It would be interesting to continue extending the grammar. The list of interesting new features to implement includes more advanced use of pronouns (also enabling pronominal references to the object of sentences) and handling indirect objects and prepositional expressions (enabling sentences indicating methods with more than one argument: "The man gives the book to the woman"). These extensions do not have large semantic implications. Features that also involve new features in the programmatic semantics of the software would be the introduction of primitive types (int, real, string). Presently, all properties of a class are of another class type. Also, the introduction of adjectives would imply changes to the property typing system. "The car is red" would for instance have to map red to an enumeration type colour, for which a finite set of values is defined. Finally, one could try to implement support for expressing program logic,


to put some flesh on the skeleton code. One would need ways to specify sequences, selections and iterations of operations, and a strategy for coupling the code description to the correct method. This needs very careful consideration, and it is hard to imagine a solution to this without a quite stringent and restricted natural language syntax.

Input Parsing via ACE Web Service. The Attempto Controlled English project offers a Web Service that - among more advanced functions - can be used to transform a sentence from ordinary free form (with blanks separating words) to Prolog list format. The service also checks that the input is legal Controlled English, so it can serve as a first level syntax check for the program. We have not been able to find any specific information on, or explicit support for, using SICStus Prolog as a Web Service client. Though it is probably not very complicated to implement (using Prolog's HTTP libraries), we did not want to spend time on investigating this during this project.

Extending the Program with Semantically Rich Lexicons. It would be interesting to extend the program using one of the various lexicons described in section 2.4. One obvious consequence of doing this is that we can assume things not explicitly described in the use case by the user. For instance, if the user describes an entity called john, we could infer that john is the name of a person, even if the user has never mentioned anything about persons. Inheritance and composition could be inferred with a very limited amount of input from the user using WordNet's kindOf and partOf relations. ConceptNet's semantic relations go further and define things such as subeventOf, which could be used to infer procedural features. It would also allow for a much wider input language where synonyms and word senses could be inferred from context. There is a catch though.
Automatically inferring things from a semantic net such as the mentioned lexicons could have the consequence that the system infers something erroneously. The system could also infer too much or too little. Maybe the user does not want the system to automatically infer a person class, etc. Using these rich semantic lexicons would bring us closer to an opportunistic approach. Such an approach has some very desirable features, but also introduces unwanted complexity.

Output Formats and .Net CodeDOM. We have concentrated on generating output in a format that can be used to produce UML diagrams. UML diagrams are a common representation for all object oriented languages and formats, and this makes it a natural first step in a system like this. However, having the information necessary to construct the UML class diagram, going one step further and actually generating a code skeleton in some object oriented language is close to trivial, even if it would involve quite a bit of labor. An output grammar (code generator) would have to be constructed for each language to support. However, general purpose programming


languages pose some challenges that were not considered in the generation of UML diagrams:

Types: Simple types vary between programming languages, and implementation would require a connection from the input grammar to the desired programming language. The current input grammar can be used to describe that a person has a name and that a name is a string. That would trigger the creation of a string class (which wouldn't know how to actually represent a string). A more sensible solution would instead use the programming language's built-in string class.

Order of construction of classes: In Graphviz the order in which the classes are described doesn't matter. In a programming language, classes usually must be described in an order such that they are declared before the classes referencing them.

Generation of variable and parameter names: Uniqueness could be easily ensured by using a global uniqueness scheme. Giving the variables sensible names would be more difficult.

One particularly interesting output alternative to investigate would be to interface the program to Microsoft .NET and deliver the data in CodeDOM (Code Document Object Model) format. CodeDOM is a .NET API defining a language-independent program description (or meta language) model. A program specification in CodeDOM can be rendered to any .NET-based programming language. However, the CodeDOM model itself has no persistent format - the model is constructed at runtime and can not be saved otherwise than by rendering it in a programming language. Therefore, direct interfacing to .NET is necessary to use this approach. Prolog has a .NET interface module that we presume could facilitate this.

Conclusion

The purpose of this project was formulated as "...to investigate the possibilities for automated transition from Use Cases in a natural language syntax into a computer readable representation...". Natural language processing is an immensely complex field. It was important for us to constrain the scope of the project, since there would have been more than enough interesting problems to spend our time on. Committing to these constraints, we have shown that our goal was realistic, and that it was possible to construct a - though somewhat limited - functional NLP system within the timeframe of a 4 week student project. We also got valuable experience in the use of Prolog and Definite Clause Grammars, which proved to be as powerful tools for this kind of task as we anticipated. For relatively inexperienced Prolog users, however, it is sometimes demanding to switch your mindset from imperative to logic programming -


things take time, and often, the result of a day's work measured in lines of code is not too impressive. It is all the more satisfying to see how much advanced functionality one can achieve with really few program lines! Summing up, it has been a most interesting project to work on, we achieved our goal, and it had a good learning effect.


References

[Ach98a] Camille Ben Achour. Guiding scenario authoring. In EJC, pages 152-171, 1998.

[Ach98b] Camille Ben Achour. Writing and correcting textual scenarios for system design. In DEXA Workshop, pages 166-170, 1998.

[BJR99] Grady Booch, Ivar Jacobson, and James Rumbaugh. The Unified Modeling Language User Guide. Addison-Wesley, 1999.

[Coc97] Alistair Cockburn. Structuring use cases with goals. Journal of Object-Oriented Programming, September-October 1997.

[CP00] Karl Cox and Keith Phalp. Replicating the CREWS use case authoring guidelines experiment. Empirical Software Engineering, 5(3):245-267, 2000.

[Fel98] Christiane Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998.

[Fow03] Martin Fowler. UML Distilled: A Brief Guide to the Standard Object Modeling Language. Object Technology Series. Addison-Wesley, third edition, September 2003.

[FST99] Norbert E. Fuchs, Uta Schwertel, and Sunna Torge. Controlled natural language can replace first-order logic, October 09 1999.

[Fuc00] Norbert E. Fuchs. Attempto Controlled English. In WLP, pages 211-218, 2000.

[gra] Graphviz - graph visualization software.

[KCS01] Karl Cox, Keith Phalp, and Martin Shepperd. Comparing use case writing guidelines. In Seventh International Workshop on Requirements Engineering (RE'01), June 2001.

[LL04] Hugo Liu and Henry Lieberman. Toward a programmatic semantics of natural language. In VL/HCC, pages 281-282. IEEE Computer Society, 2004.

[LL05] Hugo Liu and Henry Lieberman. Programmatic semantics for natural language interfaces. In Proceedings of ACM CHI 2005 Conference on Human Factors in Computing Systems, volume 2 of Late breaking results: short papers, pages 1597-1600, 2005.

[LS04] Hugo Liu and Push Singh. ConceptNet: A practical commonsense reasoning toolkit, May 02 2004.

[NT96] Eliciting and mapping business rules to IS design: Introducing a natural language CASE tool, July 31 1996.

[Sow04] John F. Sowa. Common Logic Controlled English, 2004.

[uml] Unified Modeling Language (UML), version 2.0.


Appendix A - Code

This Appendix contains the Prolog code produced during the project. The code is also available via the webpage: http://www.itu.dk/~cth/nlp/

8.1 InputGrammar.pl

:- dynamic(class/1).    % class(Name)
:- dynamic(method/2).   % method(Name, Class)
:- dynamic(method/3).   % method(Name, Class, Argument)
:- dynamic(property/3). % property(Name, Class, Cardinality)
:- dynamic(extends/2).  % extends(Name, Super)
:- dynamic(object/2).   % object(Name, Class)

:- use_module(library(lists)).

%%%% Global predicates

% assert F only if it is not already known
addfact(F) :- \+ F -> assert(F) ; true.

getclass(A, C) :- object(A, C) ; class(A), A = C.


%%%% Translation from input format to DCG format

use_case(S) :- use_case(_, _, S, []).

%%%% Grammar rules and assertions
%%%% Ai = Actor (input), Ao = Actor (output), Gi = Gender (input), Go = Gender (output)

use_case(Ai, Gi) --> compound_sentence(Ai, Gi, _, _).
use_case(Ai, Gi) --> compound_sentence(Ai, Gi, Ao, Go), use_case(Ao, Go).

compound_sentence(Ai, Gi, Ao, Go) -->
    sentence(Ai, Gi, Ao, Go), moresentences(Ao, Go), end_punctuation.

sentence(_, _, Actor, Gnd) -->
    noun_phrase(Cnt, Gnd, Actor), verb_phrase(Cnt, Actor).
sentence(Actor, Gnd, Actor, Gnd) -->
    pronoun(Cnt, Gnd, Actor), verb_phrase(Cnt, Actor).

moresentences(_, _) --> [].
moresentences(Ai, Gi) -->
    conjunction, sentence(Ai, Gi, Ao, Go), moresentences(Ao, Go).

verb_phrase(Cnt, Actor) --> subclassing_verb_phrase(Cnt, Actor).
verb_phrase(Cnt, Actor) --> instantiation_verb_phrase(Cnt, Actor).
verb_phrase(Cnt, Actor) --> method_verb_phrases(Cnt, Actor).
verb_phrase(Cnt, Actor) --> property_verb_phrases(Cnt, Actor).

subclassing_verb_phrase(Cnt, Actor) -->
    subord_verb(Cnt, _), subclassing_noun_phrase(Cnt, Object), !,
    { addfact(extends(Actor, Object)) }.

instantiation_verb_phrase(Cnt, Actor) -->
    subord_verb(Cnt, _), noun_phrase(_, _, Object), !,
    { addfact(object(Actor, Object)) }.

method_verb_phrases(Cnt, Actor) --> method_verb_phrase(Cnt, Actor).
method_verb_phrases(Cnt, Actor) -->
    method_verb_phrase_list(Cnt, Actor),
    method_verb_phrase(Cnt, Actor), conjunction, method_verb_phrase(Cnt, Actor).

method_verb_phrase_list(_, _) --> [].
method_verb_phrase_list(Cnt, Actor) -->
    method_verb_phrase(Cnt, Actor), list_separator,
    method_verb_phrase_list(Cnt, Actor).

method_verb_phrase(Cnt, Actor) -->
    intrans_verb(Cnt, Verb), !,
    { getclass(Actor, ActorClass), addfact(method(Verb, ActorClass)) }.
method_verb_phrase(Cnt, Actor) -->
    trans_verb(Cnt, Verb), noun_phrase(_, _, Object), !,
    { getclass(Actor, ActorClass), addfact(method(Verb, ActorClass, Object)) }.

property_verb_phrases(Cnt, Actor) --> property_verb_phrase(Cnt, Actor).
property_verb_phrases(Cnt, Actor) -->
    property_verb_phrase_list(Cnt, Actor),
    property_verb_phrase(Cnt, Actor), conjunction, property_verb_phrase(Cnt, Actor).

property_verb_phrase_list(_, _) --> [].
property_verb_phrase_list(Cnt, Actor) -->
    property_verb_phrase(Cnt, Actor), list_separator,
    property_verb_phrase_list(Cnt, Actor).

property_verb_phrase(Cnt, Actor) -->
    possess_verb(Cnt, _), property_noun_phrases(Actor).

property_noun_phrases(Actor) --> property_noun_phrase(Actor).
property_noun_phrases(Actor) -->
    property_noun_phrase_list(Actor),
    property_noun_phrase(Actor), conjunction, property_noun_phrase(Actor).

property_noun_phrase_list(_) --> [].
property_noun_phrase_list(Actor) -->
    property_noun_phrase(Actor), list_separator,
    property_noun_phrase_list(Actor).

property_noun_phrase(Actor) -->
    noun_phrase(sing, _, Object), !,
    { getclass(Actor, ActorClass), addfact(property(Object, ActorClass, 1)) }.
property_noun_phrase(Actor) -->
    quantifier(X), noun_phrase(plur, _, Object), !,
    { getclass(Actor, ActorClass), addfact(property(Object, ActorClass, X)) }.

noun_phrase(Cnt, Gnd, Actor) -->
    determiner(Cnt), noun(Cnt, Gnd, Actor), !,
    { addfact(class(Actor)) }.
noun_phrase(sing, Gnd, Actor) --> proper_noun(Gnd, Actor).

subclassing_noun_phrase(Cnt, Actor) -->
    subclasser, noun(Cnt, _, Actor), !,
    { addfact(class(Actor)) }.

%%%% Lexicon general

conjunction [and]. conjunction [or]. list separator [',']. end punctuation ['.']. determiner(sing) [a]. determiner(sing) [an]. determiner( ) [the]. determiner(plur) [ ]. determiner(sing) [any]. determiner(sing) [every]. determiner(plur) [some]. determiner(plur) [most]. determiner(plur) [all]. pronoun(sing, n, ) [it]. pronoun(sing, m, ) [he]. pronoun(sing, f , ) [she]. pronoun(plur, , ) [they]. quantier(n) [ ]. quantier(n) [several]. quantier(n) [some]. quantier(n) [a, number, of ]. quantier(X ) [X ], {integer(X )}. quantier(2) [two]. quantier(3) [three]. quantier(4) [four]. quantier(5) [ve]. quantier(6) [six]. quantier(7) [seven]. quantier(8) [eight]. quantier(9) [nine]. quantier(10) [ten]. quantier(11) [eleven]. quantier(12) [twelve]. subclasser [a, kind, of ]. subclasser [a, sort, of ]. subclasser [a, type, of ]. possess verb(sing, have) [has].


possess_verb(plur, have) --> [have].
possess_verb(sing, contain) --> [contains].
possess_verb(plur, contain) --> [contain].
possess_verb(sing, consistof) --> [consists, of].
possess_verb(plur, consistof) --> [consist, of].

subord_verb(sing, be) --> [is].
subord_verb(plur, be) --> [are].

8.2 CodeGen.pl

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Code generation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

:- use_module(library(lists)).
%%% utility rules

flatten([Head|Tail], FlatList) :-
    flatten(Head, FlatHead),
    flatten(Tail, FlatTail),
    append(FlatHead, FlatTail, FlatList), !.
flatten([], []).
flatten(X, [X]) :- atomic(X).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Determine array type using cardinality.

typeinf(n, '[n]') :- !.
typeinf(1, '') :- !.
typeinf(X, ['[', X, ']']) :- !.
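Read operationally, the three clauses map a property's cardinality to a type suffix for the class-diagram label. Given the clauses above, one would expect queries along these lines (illustrative, shown as a top-level sketch):

```prolog
% Expected behaviour of typeinf/2, inferred from its three clauses:
?- typeinf(n, T).   % T = '[n]'          -- unknown cardinality: unsized array
?- typeinf(1, T).   % T = ''             -- cardinality 1: plain attribute
?- typeinf(5, T).   % T = ['[', 5, ']']  -- fixed cardinality: sized array
```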


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% extraction
% Extracts facts from the program as lists
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

classes(Classes) :-
    bagof(Class, class(Class), Classes), !.
classes([]).

class_methods_noarg(Class, Methods) :-
    bagof(X, method(X, Class), Methods).
class_methods_noarg(_, []).

class_methods_one_arg(Class, [Methods, Arg]) :-
    bagof(X, method(X, Class, Arg), Methods).
class_methods_one_arg(_, []).

class_properties(Class, [X, Cnt]) :-
    bagof(X, property(X, Class, Cnt), X).
class_properties(_, []).

class_properties_list(Class, L) :-
    bagof(X, class_properties(Class, X), L).

extends_list([SuperClass, SubClass]) :-
    bagof(X, extends(SubClass, SuperClass), X).
extends_list([]).

objects([O, C]) :-
    bagof(X, object(O, C), X).
objects([]).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Generate program list
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

generate_program(Program) :-
    classes(C),
    generate_classes(C, ClassProg),
    bagof(EL, extends_list(EL), EL),
    qualify(extends, EL, ExtendsProg),
    bagof(O, objects(O), O),
    qualify(object, O, ObjectsProg),
    flatten([program, ClassProg, ExtendsProg, ObjectsProg], Program), !.

generate_classes([], []).
generate_classes([C|Rest], [class, C, M0L, M1L, PL|GRest]) :-
    class_methods_noarg(C, M0),
    qualify(method, M0, M0L),
    bagof(X, class_methods_one_arg(C, X), M1),
    qualify(method, M1, M1L),
    class_properties_list(C, P),
    qualify(property, P, PL),
    generate_classes(Rest, GRest).

qualify(_, [], []).
qualify(_, [[]], []).
qualify(Q, [[]|Rest], R) :-
    qualify(Q, Rest, R).
qualify(Q, [C|Rest], [Q, C|R]) :-
    qualify(Q, Rest, R).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Generate output for Graphviz
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

gv_uml([H, C, E, O, F]) -->
    gv_header(H),
    gv_classes(C),
    gv_extends(E),
    gv_objects(O),
    gv_footer(F).

gv_header([ 'digraph G {', newline,
            tab, 'fontsize = 8', newline, newline,
            tab, 'node [', newline,
            tab, tab, 'fontsize = 8', newline,
            tab, tab, 'shape = "record"', newline,
            tab, ']', newline, newline,
            tab, 'edge [', newline,
            tab, tab, 'fontsize = 8', newline,
            tab, ']', newline ]) --> [program].

gv_footer([newline, '}', newline]) --> [].
%%% Classes:

gv_classes([]) --> [].
gv_classes([C, CS]) -->
    gv_class(C),
    gv_classes(CS).

gv_class([ tab, Name, '[ ', newline,
           tab, tab, 'label = "{', Name, '|', P, '|', M, '}"', newline,
           tab, ']', newline,
           Aggregations, newline, newline,
           A, newline ]) -->
    [class],
    gv_name(Name),
    gv_methods(Name, A, M),
    gv_properties(Name, P, Aggregations).
%% Methods:

gv_methods(_, [], []) --> [].
gv_methods(Class, [A1, A2], [N, M]) -->
    gv_method(Class, A1, N),
    gv_methods(Class, A2, M).

gv_method(Class, Assoc,
          [ '+: ', Name, '(param:', Arg, ')', ' : ', 'void', '\\', 'l' ]) -->
    [method],
    gv_name(Name),
    gv_method_arg(Class, Arg, Name, Assoc).

gv_method_arg(_, [], _, []) --> [].
gv_method_arg(Class, Arg, MethodName,
              [ newline,
                tab, 'edge [ arrowhead = "none" ]', newline,
                tab, Class, ' -> ', Arg, ' [ label="', MethodName, '" ]', newline ]) -->
    gv_name(Arg).
%% Properties:

gv_properties(_, [], []) --> [].
gv_properties(Class, [Props1|PropsRest], [Agg1, AggRest]) -->
    gv_property(Class, Props1, Agg1),
    gv_properties(Class, PropsRest, AggRest).

gv_property(Class,
            [ '- property: ', Name, ArrayType, '\\', 'l' ],
            [ tab, 'edge [ arrowhead = "odiamond" ]', newline,
              tab, Name, ' -> ', Class, ' [ label="', C, '" ]', newline ]) -->
    [property],
    gv_name(Name),
    gv_cardinality(C),
    { typeinf(C, ArrayType) }.
%% inheritance:

gv_extends([]) --> [].
gv_extends([E, F]) -->
    gv_extend(E),
    gv_extends(F).

gv_extend([ tab, 'edge [ arrowhead = "empty" ]', newline,
            tab, Sub, ' -> ', Super, newline ]) -->
    [extends],
    gv_name(Super),
    gv_name(Sub).
% objects

gv_objects([]) --> [].
gv_objects([O, P]) -->
    gv_object(O),
    gv_objects(P).

gv_object([ newline,
            tab, 'obj_', Object, ' [ ', newline,
            tab, tab, 'label = "{', Class, ':', '\\', 'l', Object, '}"', newline,
            tab, ']', newline,
            tab, 'edge [ arrowhead = "odot" ]', newline,
            tab, Class, ' -> ', 'obj_', Object, newline ]) -->
    [object],
    gv_name(Object),
    gv_name(Class).

gv_name(X) --> [X].
gv_cardinality(X) --> [X].
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Write output to file
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

output_code([]).
output_code([newline|Rest]) :-
    nl,
    output_code(Rest).
output_code([tab|Rest]) :-
    write('    '),
    output_code(Rest).
output_code([X|Rest]) :-
    write(X),
    output_code(Rest).

generate_dotty :-
    generate_program(P), !,
    gv_uml(Code, P, []),
    flatten(Code, FlatCode),
    output_code(FlatCode).

generate_dotty_file(File) :-
    tell(File),
    generate_dotty,
    told.

8.3 Lexicon_Test.pl

%%%% Dictionary - Domain specific - Persons and cars

noun(sing, n, person) --> [person].
noun(plur, n, person) --> [persons].
noun(sing, m, man) --> [man].
noun(plur, m, man) --> [men].
noun(sing, f, woman) --> [woman].
noun(plur, f, woman) --> [women].
noun(sing, n, car) --> [car].
noun(plur, n, car) --> [cars].
noun(sing, n, engine) --> [engine].
noun(plur, n, engine) --> [engines].
noun(sing, n, wheel) --> [wheel].
noun(plur, n, wheel) --> [wheels].
noun(sing, n, seat) --> [seat].
noun(plur, n, seat) --> [seats].
noun(sing, n, bag) --> [bag].
noun(plur, n, bag) --> [bags].

proper_noun(m, john) --> [john].
proper_noun(f, mary) --> [mary].

intrans_verb(sing, go) --> [goes].
intrans_verb(plur, go) --> [go].
intrans_verb(sing, walk) --> [walks].
intrans_verb(plur, walk) --> [walk].
intrans_verb(sing, talk) --> [talks].
intrans_verb(plur, talk) --> [talk].
intrans_verb(sing, look) --> [looks].
intrans_verb(plur, look) --> [look].


intrans_verb(sing, run) --> [runs].
intrans_verb(plur, run) --> [run].

trans_verb(sing, drive) --> [drives].
trans_verb(plur, drive) --> [drive].
trans_verb(sing, like) --> [likes].
trans_verb(plur, like) --> [like].
trans_verb(sing, love) --> [loves].
trans_verb(plur, love) --> [love].

8.4 Lexicon_Company.pl

%%%% Dictionary - Domain specific - Companies and employees

noun(sing, n, company) --> [company].
noun(plur, n, company) --> [companies].
noun(sing, n, department) --> [department].
noun(plur, n, department) --> [departments].
noun(sing, n, person) --> [person].
noun(plur, n, person) --> [persons].
noun(sing, m, man) --> [man].
noun(plur, m, man) --> [men].
noun(sing, f, woman) --> [woman].
noun(plur, f, woman) --> [women].
noun(sing, n, employee) --> [employee].
noun(plur, n, employee) --> [employees].
noun(sing, n, salary) --> [salary].
noun(plur, n, salary) --> [salaries].
noun(sing, n, position) --> [position].
noun(plur, n, position) --> [positions].
noun(sing, n, office_clerk) --> [office, clerk].
noun(plur, n, office_clerk) --> [office, clerks].
noun(sing, n, sales_rep) --> [sales, representative].
noun(plur, n, sales_rep) --> [sales, representatives].
noun(sing, n, budget) --> [budget].
noun(plur, n, budget) --> [budgets].
noun(sing, n, computer) --> [computer].
noun(plur, n, computer) --> [computers].
noun(plur, n, goods) --> [goods].
noun(plur, n, service) --> [services].
noun(sing, n, boss) --> [boss].
noun(plur, n, bosses) --> [bosses].

proper_noun(m, john) --> [john].
proper_noun(f, mary) --> [mary].

intrans_verb(sing, work) --> [works].
intrans_verb(plur, work) --> [work].
intrans_verb(sing, sell) --> [sells].
intrans_verb(plur, sell) --> [sell].

trans_verb(sing, pay) --> [pays].
trans_verb(plur, pay) --> [pay].
trans_verb(sing, produce) --> [produces].
trans_verb(plur, produce) --> [produce].
trans_verb(sing, deliver) --> [delivers].
trans_verb(plur, deliver) --> [deliver].
trans_verb(sing, manage) --> [manages].
trans_verb(plur, manage) --> [manage].
trans_verb(sing, use) --> [uses].
trans_verb(plur, use) --> [use].
%%%% Test case:

company_test_case :-
    use_case([a, company, has, a, number, of, departments, '.',
              it, produces, goods, or, delivers, services, '.']),
    use_case([a, department, consists, of, employees, '.',
              all, employees, work, and, they, have, a, position, '.',
              they, are, persons, '.']),
    use_case([the, employees, have, salaries, '.',
              the, company, pays, the, salary, '.']),
    use_case([a, sales, representative, is, a, kind, of, position, '.',
              an, office, clerk, is, a, type, of, position, '.']),
    use_case([office, clerks, use, a, computer, '.']),
    use_case([sales, representatives, sell, and, they, have, a, budget, '.']),
    use_case([a, boss, is, a, kind, of, position, '.']),
    use_case([a, boss, manages, employees, '.']),
    use_case([mary, is, the, boss, '.', john, is, an, office, clerk, '.']).
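Assuming the parser (which exports use_case/1), CodeGen.pl and this lexicon are consulted together, the whole pipeline could be driven from the Prolog top level roughly as follows; the file names here are illustrative, not fixed by the sources above:

```prolog
% Hypothetical top-level session; file names are examples.
?- ['Parser.pl', 'CodeGen.pl', 'Lexicon_Company.pl'].
?- company_test_case,                  % parse the use cases, asserting facts
   generate_dotty_file('company.dot'). % write the Graphviz source to a file
```

The resulting file can then be rendered with Graphviz, e.g. `dot -Tpng company.dot -o company.png`.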

