
A Linguistic Approach to Conceptual Modeling

with Semantic Types and OntoUML

Lucia Castro, Fernanda Baião
NP2Tec - Research and Practice Group in Information Technology
UNIRIO
Rio de Janeiro, Brazil
{lucia.castro, fernanda.baiao}@uniriotec.br

Giancarlo Guizzardi
NEMO - Ontology and Conceptual Modeling Research Group
UFES
Vitória, Brazil
gguizzardi@inf.ufes.br

Abstract— The process of conceptual modeling involves the acquisition of concepts (and of the signs that represent them) used in the Universe of Discourse (UoD) being modeled, and the creation of the actual model according to a modeling language grammar. The knowledge about the UoD is obtained from a variety of sources, all of which are always expressed in a natural language. It is correct to say that conceptual modeling is much like language translation, i.e., identifying concepts that are represented by signs of a language, and then representing those same concepts in a different language. As such, conceptual modeling activities should be performed from a linguistic point of view. Moreover, the modeling language should have constructs that allow for a representation that is at least semantically equal to the original natural language descriptions. This work proposes a linguistic approach to conceptual modeling based on the notion of semantic types, and on the use of OntoUML as a modeling language. The proposed approach is illustrated in a theoretical case study.

Keywords-conceptual modeling, OntoUML, linguistic approach

I. INTRODUCTION

Conceptual modeling “is by far the most critical phase of database design and further development of database technology is not likely to change this situation” [1]. In fact, the results of an empirical study on enterprise conceptual modeling [2] reveal that “…the vast majority of modeling teams are sketching and not using CASE or CAD tools.” This is a consequence of the fact that a conceptual model is a tool for intentional communication and reasoning, i.e., human, as opposed to technical, activities.

The process of conceptual modeling involves two main tasks; the first one involves the acquisition of concepts related to the Universe of Discourse (UoD) being modeled, along with the identification of the natural language signs used to represent such concepts. The second involves the representation of the acquired concepts according to the grammar of a modeling language, i.e., the creation of the actual model. The knowledge about the UoD is obtained in a variety of ways, such as from interviews with users, reports and functional documents pertaining to that environment, from observation of the group routine, etc. However, no matter the source of information, the knowledge about the scenario to be modeled is always passed to the modeler in a natural language. To develop a conceptual model, the modeler must identify conceptual elements, understand how they relate to each other and then represent both conceptual elements and their inter-relationships in a modeling language [3]. It is correct to say that the conceptual modeling process is a translation activity, i.e., identifying concepts that are represented by signs that belong to a language, and then representing those same concepts with signs that belong to a different language. The modeler is supposed to be a bilingual person, i.e., someone who can communicate in both the natural and the modeling languages.

Defining the process of conceptual modeling as a translation activity is not a novelty and, consequently, it was only natural that researchers resorted to linguistics for support in the development of methods and solutions for the modeling process, as illustrated by the research works presented in [4] and its references. However, such projects view the modeling activities from the perspective of the (meta)model, and linguistic concepts are used, at best, as a means to support modeling decisions. Also, they are restricted to syntactic analysis and concepts to corroborate their choices, barely mentioning semantics at all; yet, translating is an activity based on “meaning”.

As with a NL to NL translation, the model is ideally expected to have the same meaning as the texts or representations in the natural language; in other words, anyone who is literate in the modeling language must, from reading the model, get the same understanding of the UoD as from reading its natural language description(s). To accomplish this task, the modeler needs not only a linguistic understanding of the source documents but also to choose a modeling language that is as expressive as its natural
counterpart, and that provides the means for the creation of clear and sound representations of the modeled domain.

This article documents research in the field of conceptual data modeling that involves the semantic types proposed by Dixon [5] and their comparison to the constructs of the OntoUML modeling language, which is described and documented in [6], [7], [8] and [9]; it also proposes an approach for that modeling process. It is divided into six sections, as follows: section 2 discusses languages, both natural and modeling ones; section 3 describes the translation or mapping between natural and modeling language; section 4 describes the proposed approach; section 5 proposes a modeling example and section 6 concludes the article.

II. LANGUAGES

In [9] Bunge presents a two-page long description for the language entry; the basic concept, however, is not long and reads: “System of signs serving to communicate and think. […] Since every sign must be elucidated in terms of other symbols, stray signs are nonsignificant.” In other words, any language is a system, intentionally used by humans for communication and reasoning, and composed of signs whose meaning is determined by opposition to one another. This is a generic definition of language that applies to both natural and modeling ones; however, there are characteristics that are particular to one or the other, and that must be understood.

A. Natural Languages

Natural language is the designation given to languages natively spoken by humans in order to transmit information, to express emotions, attitudes, commands and requests, for social interaction, and for poetic and creative expression. McWhorter [11], in his presentation of the history of languages, describes how one single language, first used approximately 150,000 years ago, evolved to form what are now the more than 6,000 natural languages known in the world. The original language spread as its speakers migrated from East Africa to Asia, then to Europe, Australia and finally to the Americas. According to the reality faced by each of those groups, the original language evolved until the separate dialects morphed into completely diverse languages. This means that, although a natural language presently learned by human beings may seem a static structure, it is a dynamic entity in slow but constant change and evolution, in order to reflect and meet its speakers' communication and reasoning needs.

All facts and phenomena related to natural languages are studied in Linguistics, which is, according to Saussure [12], a branch of Semiotics (or Semiology), the study of signs and symbols in general. Linguistics focuses on the study of the linguistic sign, comprising semantics (study of the relations between the signs and their referents), syntax (study of the relations among signs) and pragmatics (the study of the relations between signs and the one who uses them); from these, semantics stands out since meaning is the “holy grail” not only of linguistics, but also of philosophy, psychology and neuroscience, to say the least, since “Understanding how we mean and how we think is a vital issue for our intuitive sense of ourselves as human beings.” [13].

Saussure, also in [11], describes the linguistic sign as the union of a signifier (an acoustic image or an articulated word) and a signified (a concept). In [14], Ullmann, on the other hand, presents the three elements of meaning: a sign that represents a concept and refers to a thing in the real world, or the referent; the concept is an abstraction of the thing. He states that linguistics works on the left side of the triangle, that is, on the relations between the concept and the symbol, since the real world thing or event is outside its area of study. The elements of meaning are commonly presented as the vertices of a triangle, known as “semiotic triangle” or “Ullmann’s triangle” or even “Ogden and Richards’ triangle of meaning”, which can be seen in Figure 1.

Figure 1. The semiotic triangle

The lexicon of a natural language, or its vocabulary, is divided into word classes or parts of speech, which are sets of words grouped according to notional, morphological and grammatical criteria [15]. These classes themselves are further grouped into two other classes: the closed-system items, or closed classes, and the open-class items, or open classes. Closed classes are the ones that have fewer members and cannot normally be extended by the creation of new items; they include determiners, prepositions, pronouns, conjunctions and numerals. Open classes, on the other hand, are the most numerous, and the ones that can be indefinitely extended; these are nouns, adjectives, verbs and adverbs [15] [16]. Open class items are the ones that carry the semantic load.

The first step in conceptual modeling is the acquisition of the concepts and symbols related to a certain UoD from documents and/or statements written or uttered in a natural language. As such, in this first step the modeler must focus on understanding the elements present in the triangle above, i.e., the identification of the signs used in that context and the understanding of their relations to the concepts and the reality they abstract. Only then can a modeler try to represent the same Universe in a modeling language.
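
As a rough illustration of the open/closed distinction described above, the sketch below filters the open-class items out of a clause that has already been tagged with parts of speech. The tag names, the `OPEN_CLASSES` and `CLOSED_CLASSES` sets and the `open_class_items` helper are assumptions made for this illustration and do not come from [15] or [16]; the tagging itself is assumed to have been done beforehand, by hand or by some tool.

```python
# Hypothetical tag names; the text only lists the word classes themselves.
OPEN_CLASSES = {"noun", "adjective", "verb", "adverb"}
CLOSED_CLASSES = {"determiner", "preposition", "pronoun", "conjunction", "numeral"}


def open_class_items(tagged_words):
    """Return the words whose part of speech belongs to an open class."""
    return [word for word, pos in tagged_words if pos in OPEN_CLASSES]


# Hand-tagged clause taken from the library text used in the example section:
# "each student has a limited budget".
tagged_clause = [
    ("each", "determiner"),
    ("student", "noun"),
    ("has", "verb"),
    ("a", "determiner"),
    ("limited", "adjective"),
    ("budget", "noun"),
]

candidates = open_class_items(tagged_clause)
```

Under these assumptions, `candidates` keeps only the words that carry the semantic load (student, has, limited, budget), which are the ones a modeler would go on to examine as potential signs.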
B. Modeling Languages

While natural languages allow for several diverse ways to express the same content, and even for the creation of messages that, although grammatically correct, are not precise in meaning (and this is a very desirable trait since it favors creativity, as expressed in poetic and literary usage, besides jokes and puns), a modeling language must be much more strict and prevent ambiguity. As such, both the adoption of a systematic approach to conceptual modeling and the choice of a well-founded ontological language, whose constructs and grammar prevent ambiguity and non-semantic constructions, are key to the development of good and efficient conceptual models.

Humans have been creating abstractions of real-world things since the Stone Age, as a way to understand and cope with reality [17]; thus, it is right to say that humans have been building models ever since. According to Guizzardi [6], “A model is an abstraction of reality according to a certain conceptualization.”; however, for models to be understandable and useful to a community, they must be created according to a known system of symbols and connecting rules. Such systems are modeling languages, which are artificial languages that, like their natural counterparts, are also used for communication and to help reasoning but, in this case, through the creation of models.

In terms of information systems, conceptual modeling languages started being formally defined in the 1970s. ER, CSL, EER, UML, DAML and OWL are a few examples. This work adopts OntoUML as its modeling language.

III. TRANSLATION

Considering there are around 6,000 natural languages, and that they are all used to accomplish the same communication tasks [11], it is correct to say that there are at least six thousand ways of expressing a certain message or information. It is also safe to say that native speakers of different natural languages may need to communicate and that, for such communication to occur, it may be necessary that the message to be exchanged between them be translated from one natural language to another.

A. Translation between natural languages

Mounin [19] discusses problems related to the translation activities between natural languages. He starts by stating that the lexicon of a language cannot be seen as just an inventory of words, and introduces the concept of semantic field, or area of meaning, to which a group of words would belong, and within which the meaning of each word is determined in opposition to the meaning of the others. One example would be the word house, which can refer to any kind of habitation in the vocabulary of a young child, but which, through conceptual differentiations, specializes into apartment, bungalow, shack, mansion, manor, or even palace in the vocabulary of an adult person. The concept of semantic field is important for translation activities because it provides a way to deal with the fact that each linguistic system (or language) provides a way of perceiving the world that differs from the other ones; that is to say, when translating a message, the sign of the target language must be chosen from the ones that belong to the same semantic field as the sign in the source language. Figure 2 represents the translation process between natural languages.

Figure 2. The translation process

B. Semantic Types

Dixon [5] states that the open class items of any (natural) language can be grouped into classes he names semantic types. All the words of a semantic type share a common meaning component and a typical set of grammatical properties, such as their association with a part of speech. In fact, he groups semantic types according to the parts of speech with which they are associated in English, i.e., nouns, verbs and adjectives. The most important semantic types for conceptual modeling purposes, at least in English and other structurally similar languages, are the ones related to nouns, since this is the class of words that name beings and things. Examples are the ones that have a concrete reference: Animate (in this case, animals), Human and its subclasses (Kin, Rank and Social Group), Parts (body and others) and Inanimate and its subclasses (Artefacts, Celestial and Weather, Environment and Flora). Figure 3 presents the semantic types linked to nouns.

Verbs are important for establishing relations and links between concepts. The semantic types associated with verbs are classified as Primary (“refer to some activity or state; verbs that can make up sentences by themselves”) and Secondary (“those providing semantic modification of some other verb”); Primary types subdivide into A and B, whereas Secondary types subdivide into A, B, C and D. Semantic types associated with adjectives are divided into 11 classes: Dimension, Physical Property, Speed, Age, Colour, Value, Difficulty, Volition, Qualification, Human Propensity and Similarity.
[Figure 3: tree diagram. Noun branches into Abstract Reference (Time, Place, Quantity, Variety, Language, Other), Activities, Concrete Reference, States and Speech Acts. Concrete Reference branches into Animate, Human (Kin, Rank, Social Group), Parts and Inanimate (Artefacts, Celestial and Weather, Environment, Flora).]

Figure 3. Semantic types associated with Nouns

The semantic types associated with concrete reference nouns are the main focus of conceptual data modeling, since they represent concepts that abstract real-world things about which data is supposed to be stored. A complete description of the semantic types proposed by Dixon [5] is outside the scope of the present work; however, as the ones related to concrete reference nouns are the most common in conceptual models, and are important for the understanding of the example provided, a short discussion of these constructs is relevant.

Concrete reference nouns are the ones whose referents are beings or things that exist independently in the world [21]; they are also described as the ones that “refer to entities that are typically perceptible and tangible” [15]. Either way, the independent existence, perceptibility and tangibility imply the possibility of individuation and identification of such referents. The Concrete reference semantic type specializes into four other semantic types:

• Animate – semantic type that groups concrete reference nouns whose referents are living beings that are members of the kingdom Animalia but are not human.
• Human – semantic type that groups concrete reference nouns whose referents are living beings that are members of the kingdom Animalia and are human. It further specializes in:
  o Kin – semantic type that groups concrete reference nouns whose referents are living beings that are members of the kingdom Animalia, that are human, and that denote a position distinction of one element in relation to others in a family group. Since the same element may have different positions in different groups (the same person can be a father, a son, a nephew, a son-in-law, a husband, etc.), Kin references cannot be mutually exclusive; however, such positions are not interchangeable within the same family group, i.e., a son cannot exchange his position with his father, for instance.
  o Rank – semantic type that groups concrete reference nouns whose referents are living beings that are members of the kingdom Animalia, that are human, and that denote a position distinction of one element in relation to others in a social group. Since the same element may have different positions in different social groups (the same person can be a goalie, a teacher, a manager, and a volunteer coach at the same time), Rank references cannot be mutually exclusive and they can be interchangeable within the same social group, i.e., a goalie can be a stopper if needed.
  o Social Group – semantic type that groups concrete reference nouns whose referents are living beings that are members of the kingdom Animalia, that are human, and that denote groups of such humans which are seen and identified as a whole. A company is an example of a Social Group.
• Parts – semantic type that groups concrete reference nouns whose referents are parts, i.e., that have a component relation with other beings; such parts can be corporeal or not.
• Inanimate – semantic type that groups concrete reference nouns whose referents are inanimate things. It further specializes in:
  o Artefacts – semantic type that groups concrete reference nouns whose referents are things created or made by humans, i.e., things that are not present in nature, such as a book.
  o Flora – semantic type that groups concrete reference nouns whose referents are living beings that are members of the kingdom Plantae.
  o Celestial and Weather – semantic type that groups concrete reference nouns whose referents are celestial bodies, like sun and moon, or are related to weather or climate, like rain or wind.
  o Environment – semantic type that groups concrete reference nouns whose referents are related to the environment and are mostly members of the mineral kingdom, like water and gold, for instance.

Most referents of concrete reference nouns are countable, but some of them are non-countable; in this case, their referents are seen and perceived as a mass. They are generally minerals and belong to the Environment semantic type, like water and gold, for instance.
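
The concrete-reference branch described above is a small tree, and it can help to have it in a form that can be queried while classifying signs. The sketch below is only one possible encoding: the nested-dictionary layout and the `path_to` helper are assumptions made for this illustration and are not part of Dixon's work [5].

```python
# Concrete-reference branch of the noun semantic types, as listed above.
# The nested-dictionary layout is an illustrative assumption.
CONCRETE_REFERENCE = {
    "Animate": {},
    "Human": {"Kin": {}, "Rank": {}, "Social Group": {}},
    "Parts": {},
    "Inanimate": {
        "Artefacts": {},
        "Celestial and Weather": {},
        "Environment": {},
        "Flora": {},
    },
}


def path_to(semantic_type, tree=CONCRETE_REFERENCE, prefix=("Concrete Reference",)):
    """Return the chain of types from the root down to `semantic_type`, or None."""
    for name, children in tree.items():
        current = prefix + (name,)
        if name == semantic_type:
            return current
        # Depth-first search through the subtypes of this node.
        found = path_to(semantic_type, children, current)
        if found is not None:
            return found
    return None
```

For instance, `path_to("Kin")` yields the chain Concrete Reference, Human, Kin, while a name that is not in this branch yields `None`.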
The semantic types Dixon [5] describes are common to all natural languages; however, grammatical properties, like the word class with which they are associated, may vary from one language to another. In other words, a semantic type that is associated with a noun in English may be associated with a verb in a different language. Also, a word may be linked to different semantic types in the same natural language, depending on the context in which it is used: the word book, for instance, can be a noun, when referring to a number of written sheets of paper bound together (an artefact), or a verb, when it refers to the act of reserving something (a giving type verb). The concept of semantic type can be a great support when it comes to the creation of conceptual models. The present work takes semantic types as the natural language constructs to which the modeling language constructs must be compared when the model is being built.

C. OntoUML

OntoUML is a modeling language based on a revision of a portion of UML, whose constructs derive from the Unified Foundational Ontology (UFO) [6] [9]. A foundational ontology is “a domain-independent common-sense theory constructed by aggregating suitable contributions from areas such as descriptive metaphysics, philosophical logics, cognitive science and linguistics.” [18]. Based on philosophical concepts, and starting from universals, the UFO describes real world categories and, as such, provides OntoUML with the basis for the creation of semantically accurate models. Figure 4 presents the lists of OntoUML constructs, the representatives of concepts in the first column and those of relations in the second one.

Although a complete explanation of UFO concepts and the OntoUML constructs is beyond the scope of the present work, some of them will be briefly discussed here, as a means for the understanding of the proposal presented later. The first step is to define universals as properties that determine classes of all things, and sortals as universals that determine classes of things and beings that have an identity and individuation principle. Sortals can be substance sortals (properties that provide an identity principle) or phased-sortals (properties that determine a class of things or beings for a period of time).

• Kind – properties (substance sortal) that rigidly (all members in every world must have them) determine classes of complex beings or things that are relationally independent and that can be clearly identified; the beings or things that are modeled by this construct can be animals (Dog, Cat, Person), plants (Tree) or artifacts (Chair, Book, Television).
• Quantity – properties (substance sortal) that determine classes of mass substances, like Gold, Water, Clay.
• Collective – properties (substance sortal) that determine classes of collectives, i.e., collections of members of a class of beings or things (such as the ones determined by the Kind construct) seen and perceived as a uniform structure.
• Phase – properties (phased-sortal) that determine classes of partitions of beings or things determined as Kind. The classes determined by Phase are stages in the existence of a being; they are mutually exclusive and depend on the intrinsic properties of an individual. A good example would be Caterpillar and Butterfly, which are partitions (Phases) of a Lepidopteran (Kind).
• Role – properties (phased-sortal) that determine classes of relationally dependent roles of beings or things determined as Kind. The classes determined by Role are not mutually exclusive and depend on the extrinsic properties of an individual. A good example would be Student, which is a role of a Person (Kind). [6]

The definitions provided here have been simplified to fit the purposes of this work. The UFO, as well as OntoUML, involves many other philosophical concepts that must be understood for the creation of correct and precise models. However, as said before, those are outside the scope of this paper.

Figure 4. OntoUML constructs

D. Translation from natural language to modeling language

Although the process of translating natural language concepts into a conceptual model implies comparison techniques, as with translations between two natural languages, there are two main differences that must be taken into account. First, the modeler and the domain experts need to communicate in the same natural language. Second, the
modeling language does not provide a sign that is semantically equivalent to the one used in the natural language; i.e., comparison is not done between equivalent semantic fields but between natural language constructs (here, semantic types) and the constructs of the modeling language. The natural language sign representing the concept being modeled appears in the conceptual model as the label of a construct.

As an example of the aforementioned comparison, we will compare the semantic types already defined to the OntoUML constructs previously presented. Such a comparison is not trivial and there is no single, clear one-to-one correspondence: languages are culturally dependent and subjective; also, the classification of signs according to semantic types depends on the Universe being modeled.

Since the referents of all concrete reference nouns can be individuated and identified, they are all determined by sortals. According to the definition and examples provided above, most of the concrete reference semantic types are mapped to the Kind (classes of existentially independent, clearly identified and individuated, rigidly determined beings or things) OntoUML construct; this shows that the semantic types propose a more detailed differentiation of such classes. The same happens with the semantic types Kin and Rank, which are mapped to the Role OntoUML construct; however, the Phase construct does not have a direct equivalent among the semantic types and will be modeled according to the judgment of the modeler, who must know, for instance, that Caterpillar is a partition in the life of a Lepidopteran. The definition of the semantic type Environment points to the Quantity construct, as does the Social Group to the Collective. Table I presents the results of this comparison.

TABLE I. SEMANTIC TYPES AND ONTOUML

Semantic Types              OntoUML construct
Animate                     Kind
Human                       Kind
Human/Kin                   Role
Human/Rank                  Role
Human/Social Group          Collective
Parts                       Kind
Inanimate                   Kind
Inanimate/Artefacts         Kind
Inanimate/Cel. & Weather    Kind
Inanimate/Environment       Quantity
Inanimate/Flora             Kind
(no equivalent)             Phase

IV. APPROACH

Any conceptual modeling activity may be considered as a process that involves knowledge and perceptions that are inherently human and very subjective. However, it is possible to systematize modeling activities in a detailed list of activities, which can serve as a guide to modelers. This work proposes, then, a conceptual modeling approach that supports modeling activities that may be already known or familiar but that, when ordered and organized as presented here, can help save time, avoid mistakes and create better models. Also, this approach has linguistics, mainly semantics, as a starting point, which is key to the actual understanding of the meaning of signs; it also focuses on the interaction with UoD specialists and on the contextualized meaning of the signs present in the UoD descriptions. In order to make the analysis and understanding of the texts easier, they should be turned into a numbered list of sentences. The approach is illustrated in Figure 5 (in the UML Activity diagram notation); the activities are described below.

Figure 5. Activity diagram of the proposal

A. Decompose text into simple sentences

A simple sentence is a sentence that does not have co-ordinate or subordinate clauses; such sentences generally present the format Subject + verb + object, and none of these components can be a clause in itself. Consequently, a way to “break” complex sentences is by looking for co-ordinate and
subordinate conjunctions and understanding how they relate clauses.

The aim here is to reach what Chomsky [22] defines as kernel sentences, i.e., the ones that belong to the deep structure of the language (meaning), without the transformations that lead to the surface structure of the language (form). Chomsky describes kernel sentences as active voice, affirmative simple sentences. Thus, it is also important to convert sentences from passive to active voice, whenever applicable, so that subjects and objects are clearly identified; if the resulting active voice sentence does not have an explicit subject, the word someone should be used as a substitute. The list of simple sentences should also be numbered in the order they appear in the text, so that the reading of the sentences, in order, reproduces the reading of the text itself.

B. Create list of questions and doubts

As the modeler decomposes the text into simple sentences, he or she may find that pieces of information are missing or find ambiguities that will have to be explained by the UoD specialist. The first thing to notice is the presence of the word someone in the subject position, derived from the transformation of a sentence into its active form; the modeler must be able to replace it with the correct subject. Another potential question arises, for instance, when two of the three components are repeated in two or more sentences; this may mean that there are synonyms in the texts and the best (or correct) sign must be identified. The modeler should write a list with all questions and doubts, in the same order as they appear in the text, and later ask all the listed questions to the specialist. This activity can be executed in parallel to the decomposition of the sentences described above.

C. Update list of simple sentences

According to the answers provided by the UoD specialist, the modeler must update the list of simple sentences, making previously unknown subjects explicit, and eliminating synonyms and ambiguities. It is advisable that this updated list be a separate document and that it does not replace the first list, so that the work rationale can be recovered at any time.

D. Identify signs of the Universe of Discourse

From the final list of simple sentences the modeler can identify the conceptually significant natural language signs that are used in the UoD being modeled. In English, as well as in other similarly structured languages, such symbols will be nouns, verbs and adjectives.

E. Associate signs with semantic types

The modeler must associate each of the identified signs with one of the semantic types. As semantic types are not mutually exclusive, the modeler must be careful to make the association that is applicable in that specific context (or UoD). This is a very important and yet subjective activity whose success relies on the modeler’s knowledge, experience and ability to focus on the current context without being influenced by his/her previous experiences. This is one of the two most important steps of the proposed approach and, for it to be performed successfully, the modeler must seek a clear understanding of the concepts of the UoD being modeled.

F. Map semantic types to OntoUML constructs

The second most important step in the proposed approach is, no doubt, the mapping of semantic types to the modeling language constructs. The modeler must compare the (meta-)properties of each semantic type identified to the (meta-)properties of OntoUML constructs, searching for equivalences, as was shown above. In order to avoid repetition, the rationale and guide for this step will be explained in the example that follows.

G. Create first version of the model

Once the semantic types have been mapped to the modeling language constructs, the first version of the model can be created. Benevides and Guizzardi [7] present an OntoUML tool that can be used for this activity; the use of such a tool ensures the syntactic quality of the produced model, since it does not allow for the construction of syntactically non-valid models.

H. Validate model

The model must be taken to the UoD specialist for validation. The modeler can, once again, ask questions, this time about the meta-properties of the constructs used to model UoD concepts, in order to make sure they were correctly mapped.

I. Create final version of the model

The modeler must adjust the model according to the comments provided by the UoD specialist during the validation activity. Also, there are metrics for the quality of conceptual models that, although outside the scope of the present work, must also be taken into consideration in evaluating the correctness of the created model, as, for instance, the ones described in [20]. The final version of the model must take such metrics into account.

V. EXAMPLE

In their seminal conceptual modeling book [1], Batini et al. provide exercise case studies for students. The natural language text used in the example presented here is an excerpt that was taken from one of these case studies (pp. 268-269), as follows:

“In the library of a computer science department, books can be purchased both by researchers and by students. Researchers must indicate the grant used to pay for the book; each student has a limited budget, which is fixed each year by the dean of the college.”
According to the approach proposed in this work, a modeler should start
by decomposing the text above into simple sentences and, simultaneously,
writing down a list of questions and doubts to be answered by UoD
specialists. Table II presents the list of simple sentences that should
result from this activity, and Table III presents the question and
answer produced.

TABLE II. LIST OF SIMPLE SENTENCES

No.  Sentence
0    In the library of a computer science department
     This “sentence” is, in fact, an adverbial phrase that identifies
     and localizes the UoD; thus it was given the number zero.
1    Researchers can purchase books
     This sentence is the active voice form of the one in the natural
     language text.
2    Students can purchase books
     This sentence is also the active voice form of the one in the
     natural language text.
3    Researchers pay for books with grants.
     A Grant is an amount of money given by an organization or
     institution for a particular purpose.
4    Researchers must indicate the grant used to pay for the book
     Grants represent something identifiable.
5    Each student has a limited budget
     Budget refers to an amount of money provided for a particular
     purpose.
6    Each student pays for books from their budget
7    The Dean of the college fixes students’ budgets every year

TABLE III. MODELER QUESTION AND ANSWER PROVIDED BY UOD SPECIALIST

No.  Question                              Answer given by Specialist
1    When you say “grant”, do you refer    It refers to the amount of
     to the amount of money or to a        money (library budget).
     document, like a grant report, or
     a grant certificate?

In the next step, the modeler must update the list of sentences so as to
reflect the answers provided by the specialist. In this small case,
where we had just one question, the list does not need to be updated;
the modeler must just keep in mind that “grant”, in this UoD, refers to
the amount of money received by an institution and linked to a
researcher. Next, the modeler must identify the signs in the sentences.
Table IV presents the signs identified in each sentence.

TABLE IV. LIST OF SIGNS IDENTIFIED IN EACH SENTENCE

Sentence No.  Signs
0             Library, Computer Science Department
1             Researcher, Purchase, Book
2             Student, Purchase, Book
3             Researcher, Book, Grant
4             Researcher, Indicate, Grant, Book
5             Student, Limited Budget
6             Student, Book, Budget
7             Dean, Fix, Student [yearly] Budget

The next step is the association of the identified signs with semantic
types. Table V presents the list of signs and their associations.

TABLE V. ASSOCIATION OF SIGNS WITH SEMANTIC TYPES

Sign                   Semantic Type
Computer Science Dpt.  Sign is a noun phrase that refers to a division
                       of an institution, i.e., it has a concrete
                       reference, is related to humans and is a Social
                       Group.
Library                Sign is a noun that also refers to a division of
                       an institution; it also has a concrete reference,
                       is related to humans and is a Social Group.
Researcher             Sign refers to a human but qualifies the person
                       according to a position before the other members
                       of a group; the semantic type should be Rank.
Purchase               Sign is a Primary A verb of the type Giving,
                       i.e., one that always involves 3 semantic roles:
                       a donor, a donated thing and a recipient.
Book                   Sign refers to an object (concrete and inanimate)
                       produced by men, thus, an Artefact.
Student                Sign refers to a human but qualifies the person
                       according to a position before the other members
                       of a group; the semantic type should be Rank.
Grant                  Sign refers to an amount of money given by an
                       organization for a particular purpose; a
                       nominalization of the verb to grant, its meaning
                       relates to a Primary A verb of the type Giving,
                       since an institution provides the amount of money
                       to a Researcher, and each of such occurrences
                       must be uniquely identified.
Budget                 Sign refers to an amount of money set aside for a
                       particular purpose; a nominalization of the verb
                       to budget, its meaning relates to a Primary A
                       verb of the type Giving, since the Dean fixes the
                       amount of money a Student has at his/her
                       discretion, and this procedure is repeated every
                       year.
Dean                   Sign refers to a human but qualifies the person
                       according to a position before the other members
                       of a group; the semantic type should be Rank.
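The sign-to-type associations of Table V can also be kept in a small lookup structure, which makes it easy to group the signs by semantic type before they are mapped to modeling constructs. The following Python fragment is only an illustrative sketch and is not part of the proposed approach: the names `SIGN_TYPES` and `group_by_type` are hypothetical, and the label "Giving (nominalization)" is an assumption introduced here to summarize the rationale given for Grant and Budget.

```python
from collections import defaultdict

# Sign -> semantic type, transcribed from Table V.
# "Giving (nominalization)" is a label made up for this sketch: the table
# describes Grant and Budget as nominalizations related to Giving verbs.
SIGN_TYPES = {
    "Computer Science Dpt.": "Social Group",
    "Library": "Social Group",
    "Researcher": "Rank",
    "Purchase": "Giving",
    "Book": "Artefact",
    "Student": "Rank",
    "Grant": "Giving (nominalization)",
    "Budget": "Giving (nominalization)",
    "Dean": "Rank",
}


def group_by_type(sign_types):
    """Return a dict mapping each semantic type to the sorted list of its signs."""
    groups = defaultdict(list)
    for sign, sem_type in sign_types.items():
        groups[sem_type].append(sign)
    return {sem_type: sorted(signs) for sem_type, signs in groups.items()}


if __name__ == "__main__":
    for sem_type, signs in group_by_type(SIGN_TYPES).items():
        print(f"{sem_type}: {', '.join(signs)}")
```

Grouping the signs this way gives the modeler one place to check that signs of the same semantic type are treated consistently.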


The modeler must then map or relate semantic types to OntoUML
constructs. Table VI presents the construct mappings. Finally, the
modeler can create a first version of the model and present it to the
UoD specialists for validation; from the comments provided by such
specialists, the modeler should either update the documents and the
model, or evaluate the model according to established metrics (which, as
mentioned before, are not in the scope of this work), and produce a
final version of the model. Figure 6 presents the final version of our
example model.

TABLE VI. ASSOCIATION OF SEMANTIC TYPES WITH ONTOUML CONSTRUCTS

Semantic Type  OntoUML Construct  Rationale
Social Group   Kind               Concrete reference = existentially
                                  independent, identifiable (substance
                                  sortal); a unit formed by a collection
                                  of complexes.
Rank           Role               Qualifies the entity according to a
                                  position and/or responsibility that is
                                  not exclusive (anti-rigid) but that is
                                  relationally dependent.
Giving         Relator            A relation that is existentially
                                  dependent (moment) on at least two of
                                  the three semantic roles.
Artefact       Kind               Concrete reference = existentially
                                  independent, identifiable (substance
                                  sortal); functional complex, either an
                                  instance of a natural kind or an
                                  artifact.

Figure 6. OntoUML model

VI. CONCLUSION
Conceptual modeling remains the most important activity in database
design and it is not likely that this activity can ever be fully
automated, since it relies on essentially human activities. Such
activities include the interaction between the modelers and community
members, the understanding of the language particularly used by such
community, and the ability to translate its related concepts to a
modeling language. Also, it is certain that, no matter the source of
information the modeler has access to, it is always expressed in a
natural language. Language understanding is the cornerstone of the
modeling (translation) process; thus, the knowledge and application of
linguistic principles are an invaluable support.
This work proposed an approach to the modeling process that is based on
linguistic analysis; it relies on the semantic types theory and
classification proposed by Dixon [5], and on the mapping of each of
those types to a well-founded ontological modeling language, OntoUML.
Since modeling languages differ from natural languages in that the
meaning of models does not come from signs but from constructs, the
modeler must compare natural language constructs (semantic types) to the
modeling language ones, having the (meta-)properties inherent to each of
them in mind. One quality trait of the produced model relies on its
semantic equivalence to the descriptions provided in a natural language.

VII. FUTURE WORK
In [20], Poels et al. present 6 types of quality desired in a conceptual
model; among them is semantic quality, which is difficult not only to
achieve but also to measure. The use of ontological languages to that
end is not a novelty, and the semantic gain of an OntoUML model over a
corresponding ER one can be seen in [6].
As referenced in [4], most conceptual modeling research works based on
linguistic principles only go as far as syntactic concepts and word
classes, or parts of speech, in terms of natural language constructs.
The approach proposed in this work advances the use of linguistics,
since it focuses on semantic constructs and on the path to achieving
semantic quality in conceptual models. However, the research must
continue to establish the semantic gain from the use of such an
approach.
Future work would, then, include a real-world case study, in which the
semantic quality of models produced from the same textual descriptions
can be evaluated. Diverse modeling languages and methods should be used,
besides the proposed approach; the models should be created by different
modelers and should each be read by a different analyst. The aim is to
estimate the semantic gain by comparing the readings (or recreated
natural language sentences) of the produced models to the original
sentences.
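As a rough illustration of how readings could be compared to the original sentences, the Python sketch below scores aligned sentence pairs by word overlap (Jaccard similarity). This is an assumption made purely for illustration: the text does not prescribe any measure, lexical overlap is only a crude proxy for semantic equivalence, and the function names and the sample reading are hypothetical.

```python
import re


def tokens(sentence):
    """Lowercase a sentence and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity between the token sets of two sentences (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty sentences are treated as identical
    return len(ta & tb) / len(ta | tb)


def mean_similarity(originals, readings):
    """Average similarity over aligned pairs of original sentences and readings."""
    if len(originals) != len(readings):
        raise ValueError("originals and readings must be aligned one-to-one")
    if not originals:
        return 0.0
    scores = [jaccard(o, r) for o, r in zip(originals, readings)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    original = ["Researchers can purchase books"]
    reading = ["Researchers purchase books"]  # hypothetical analyst reading
    print(mean_similarity(original, reading))
```

A real study would more likely rely on human judgment or a semantic similarity measure; the sketch only shows the shape of a pairwise comparison between original sentences and recreated ones.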
ACKNOWLEDGMENT
The authors would like to thank Ms. Leila Andrade, a Librarian who
kindly took the time to contribute to our case study.

REFERENCES
[1] Batini, C., Ceri, S., Navathe, S.: Conceptual Database Design. Benjamin/Cummings, Redwood City (1992)
[2] Anaby-Tavor, A., Amid, D., Fisher, A., Ossher, H., Bellamy, R., Callery, M., Desmond, M., Krasikov, S., Roth, T., Simmonds, I., Vries, J.: An Empirical Study of Enterprise Conceptual Modeling. In: Laender, A. H. F., Castano, S., Dayal, U., Casati, F., Oliveira, J. P. M. (eds) ER 2009. LNCS, vol. 5829, pp 55-69. Springer, Heidelberg (2009)
[3] Gangopadhyay, A.: Conceptual Modeling from Natural Language Functional Specifications. Artificial Intelligence in Engineering, vol. 15, issue 2, 207-218 (2001)
[4] Castro, L., Baiao, F., Guizzardi, G.: A Survey on Conceptual Modeling from a Linguistic Point of View. Technical Report, RelaTe-DIA (2009)
[5] Dixon, R. M. W.: A Semantic Approach to English Grammar. Oxford University Press, Oxford (2005)
[6] Guizzardi, G.: Ontological Foundations for Structural Conceptual Models. CTIT, Enschede (2005)
[7] Benevides, A. B., Guizzardi, G.: A Model-Based Tool for Conceptual Modeling and Domain Ontology Engineering in OntoUML. In: Filipe, J., Cordeiro, J. (eds) ICEIS 2009. LNBIP, vol. 24, pp 528-538. Springer, Heidelberg (2009)
[8] Guizzardi, G., Lopes, M., Baião, F., Falbo, R.: On the Importance of Truly Ontological Distinctions for Ontology Representation Languages: An Industrial Case Study in the Domain of Oil and Gas. In: Halpin, T., Krogstie, J., Schmidt, R., Soffer, P., Ukor, R. (eds) BPMDS 2009 and EMMSAD 2009. LNBIP, vol. 29, pp 224-236. Springer, Heidelberg (2009)
[9] Benevides, A. B., Guizzardi, G., Braga, B. F. B., Almeida, J. P. A.: Assessing Modal Aspects of OntoUML Conceptual Models in Alloy. In: Heuser, C., Pernul, G. (eds) ETheCoM 2009. LNCS, vol. 5833, pp 55-64. Springer, Heidelberg (2009)
[10] Bunge, M.: Philosophical Dictionary. Prometheus Books, Amherst (2003)
[11] McWhorter, J.: The Power of Babel: A Natural History of Language. HarperCollins, New York (2003)
[12] Saussure, F.: Course in General Linguistics. Cultrix, São Paulo (2006) In Portuguese
[13] Jackendoff, R.: Foundations of Language. Oxford University Press, Oxford (2002)
[14] Ullmann, S.: Semantics: An Introduction to the Science of Meaning. Calouste, Lisboa (1977) In Portuguese
[15] Greenbaum, S.: Oxford English Grammar. Oxford University Press, Oxford (1996)
[16] Quirk, R., Greenbaum, S.: A University Grammar of English. Longman, London (1973)
[17] Schichl, H.: Models and History of Modeling. In: Kallrath, J.: Modeling Language in Mathematical Optimization. pp 25-36. Kluwer Academic Publishers, Norwell (2004)
[18] Guizzardi, G.: The Role of Foundational Ontologies for Conceptual Modeling and Domain Ontology Representation. In: 7th International Baltic Conference on Databases and Information Systems. Vilnius (2006)
[19] Mounin, G.: Les Problèmes Théoriques de la Traduction. Cultrix, São Paulo (1975) In Portuguese
[20] Poels, G., Nelson, J., Genero, M., Piattini, M.: Quality in Conceptual Modeling – New Research Directions. In: Olivé, A. (eds) ER 2003 Ws. LNCS, vol. 2784, pp 243-250. Springer, Heidelberg (2003)
[21] Bechara, E.: Moderna Gramática Portuguesa. Nova Fronteira, Rio de Janeiro (2009) In Portuguese
[22] Chomsky, N.: Syntactic Structures. Mouton de Gruyter, New York (2002)
