
a methodology for metadata modelling -

depth for a flat world


Andreas Aschenbrenner
ERPANET
Abstract: The recognition of the value of metadata continues to rise, and accordingly metadata frameworks are ever more widespread; they grow more comprehensive and they become increasingly complex. This rise in quantity, size, and complexity calls for a methodology that supports metadata design. Information modelling techniques, as they are routinely employed in information system design and other domains, are well suited for this task, offering both visualisation techniques and an entire design methodology. Above all, information models help to manage complexity and leverage communication, thereby promoting reuse and interoperability. Some metadata initiatives have already successfully employed modelling techniques, building on the large body of existing experience in this area, yet these techniques remain to be widely adopted in the metadata world.

Information modelling techniques allow visualisation for intuitive interpretation and clear communication, facilitate a structured approach to design, and create new perspectives on existing metadata models. This paper describes the application of information modelling to metadata. It also provides orientation on where the metadata community can further extend their modelling skills to create quality metadata models with a robust design.

Keywords: conceptual metadata model; metadata modeling; metadata visualisation; graphic modelling technique; information model; entity-relationship analysis.

1 introduction

Abstraction is probably one of the most powerful of human capabilities. Plans and models have assisted so many before us to coordinate hunting, to find the way, to develop novel tools, and in countless other activities. Today, modelling also manifests in a myriad of formalised methodologies and takes place in a variety of fields and situations. Conceptual models can take many forms: they can be text-based or employ graphics for visualisation; high-level or highly detailed; plain or hierarchically structured.

When composing a metadata set, a host of requirements, influences and scenarios need to be considered. After all, metadata needs to be tuned to a specific business environment with all its activities, roles, and possible interdependencies. Such a complex undertaking calls for the use of formal modelling techniques that support the design process.

Metadata design starts from an initial
requirements analysis that explores the relevant business processes by establishing use cases and functional models (1). The actual metadata is then represented by a data model or, as it is also called, an information model, and its development comes as a natural succession to precursory analyses. Such a comprehensive approach ensures that all external requirements are accounted for, supports implementation at a later stage, and essentially raises the quality of the final product.

Looking at the experience available in the metadata community, one example in which this design process can be followed nicely is the "Functional Design for a digital depot" (2), where the actual metadata model is based on functional models of a comprehensive process analysis. In a similar vein, a core standards activity in metadata modelling, IFLA's Functional Requirements for Bibliographic Records (3), based its metadata model on requirements analyses and use cases. Another authoritative initiative that employs various modelling techniques, and indeed is working on a data model with regard to digital preservation, is the InterPARES project (4). The conceptual models of this huge, international project are part of its core deliverables, and surely they are also a great means for communication and discussion.

Before metadata-related initiatives picked up modelling techniques, these techniques were widely used in information system design. It is the experience accumulated in that domain that the metadata world can still benefit from. This paper refers to these experiences. It focuses on information models to illustrate the value of an actual metadata model. The description of the modelling technique in Chapters 3 and 4 is practically oriented and illustrates how a graphic metadata model can transport more information in a far clearer way compared to a flat text listing of a metadata set. Chapters 5 and 6 then explore the new perspectives such a model allows and reflect on some extensions that may further enhance the power of this technique.

2 background

Information models take a prominent role in engineering and computer science. The success of a development project depends on the models created in the design phase. The respective techniques were first developed with the advent of data processing systems in the 1950s. A proliferation of methodologies and tools followed, and modelling techniques are widely applied today. Graphic modelling techniques are employed to facilitate human interpretation as part of requirements analysis and conceptual design. These visual models are then translated into detailed lists suitable for implementation.

An early methodology was the Structured Analysis and Design Technique (SADT), developed in the 1970s as a “language for communicating ideas” (5).

The family of Integrated Definition Languages, IDEF for short, was first developed in the 1970s by the US Air
Force, and they are standard modelling techniques today. They cover a range of applications from functional to information modelling, simulation, object-oriented analysis and design, and knowledge acquisition. Specifically, IDEF0 (6, 7) is a functional modelling language that builds on SADT, and IDEF1X (8, 9) provides for information modelling.

The development of the Unified Modeling Language (UML) (10) started in the 1990s, building on the experience gained in a range of existing object-oriented analysis and design methods. UML is geared at combining a range of modelling techniques and provides a set of twelve model types including functional as well as information models. The Object Management Group (OMG), a non-profit consortium, coordinates the development of UML to create a rigorous, open standard for software modelling and system design.

The Extended Entity-Relationship (EER) model is a widely used conceptual information model, and has a long history in system engineering (11, 12). The EER found many proponents, some of whom introduced individual styles or extensions for specialised applications, but the basic concepts have hardly been obscured.

In the following we will use the well established and widely used EER to model metadata. The two metadata initiatives (2) and (3) introduced earlier also employed entity-relationship models. However, the concepts of information models and the conclusions presented in the following are of a more general nature, and it does not really matter in which modelling language they are notated. The abstraction mechanisms presented below can hence be applied with IDEF1X or any other information model as well.

3 the methodology

In order to understand the workings of an EER model and its application to metadata, we construct in the following a small learning example in a step-by-step process. We start from a metadata set for any sort of digital object comprising four elements:

Clearly this metadata set is actually composed of two different entities: there is the actual ‘object’ described by the title and the creation_date, and then there is the ‘creator’ of the object, of whom we know the nationality. These two entities are in a certain relation to each other: an object is created by one or more creators, and each creator may have created one or more objects. In EER notation this is called a many-to-many relation, and is notated like this:
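As a rough illustration of the step just described, the split of the flat metadata set into an ‘object’ entity and a ‘creator’ entity joined by a many-to-many relation might be sketched in Python as follows. This is only a sketch under assumptions: the class names, the `name` attribute of the creator, and all sample values are hypothetical and do not come from the original figures.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Creator:
    name: str          # assumed identifying attribute; the text names only 'nationality'
    nationality: str


@dataclass(frozen=True)
class DigitalObject:
    title: str
    creation_date: date


# The many-to-many relation is stored as a set of (object, creator) pairs,
# so neither entity has to embed the attributes of the other.
created_by = set()


def creators_of(obj):
    """Return the names of all creators linked to the given object, sorted."""
    return sorted(c.name for o, c in created_by if o == obj)


def objects_of(creator):
    """Return the titles of all objects linked to the given creator, sorted."""
    return sorted(o.title for o, c in created_by if c == creator)


# Hypothetical sample data
ana = Creator("Ana Example", "Portuguese")
ben = Creator("Ben Sample", "Austrian")
report = DigitalObject("Annual report", date(2003, 5, 1))
photo = DigitalObject("Harbour photograph", date(2002, 8, 17))

created_by.update({(report, ana), (report, ben), (photo, ana)})

print(creators_of(report))   # ['Ana Example', 'Ben Sample']
print(objects_of(ana))       # ['Annual report', 'Harbour photograph']
```

Keeping the relation separate from both entities is what lets one object have several creators and one creator several objects without repeating any attribute values.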
Other connectivities of relations are a one-to-one relation or a one-to-many relation. To demonstrate a one-to-many relation, let us assume that each creator may be working for exactly one organisation, but each organisation may contract many creators:

One step further, we define that each affiliation may contract zero, one, or any number of creators, whereas a creator must be assigned to exactly one organisation. For this reason we introduce a circle signifying ‘optional’ on the side of the ‘creator’ entity, whereas we leave the relation on the side of the ‘affiliation’ unchanged.

Lastly, a more sophisticated kind of relation is the ternary relation. The above relations were binary ones between two different entities. Unary relations are between one instance of an entity and another instance of the same entity. Ternary relations, consequently, involve three different entities. For the following example we assume that each object is created through a number of processes conducted by each creator, and also that the creator may actually perform the same processes on various objects. This calls for a many-to-many-to-many relationship.

Turning our focus to the entity ‘object’, we find that each object may consist of any number of items, but at least one. Each item has certain attributes such as file_size and location. Furthermore, each item is of a certain type, for example an ‘image’, ‘audio’, or ‘text’. These item types are specialisations and may have attributes of their own; for example, an image has a resolution, whereas for an audio item the sample_rate may be specified.

Each of these items can be stored in a certain file format. An image, for instance, may be stored as a TIFF or as a JPEG. These specific formats may again have attributes of their own, similar to the generalisation/specialisation above. The following notation, called subset, however, allows overlapping entities. In other words, an image may be stored as a TIFF, as a JPEG, or in both formats at the same time.
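The constructs introduced in this section (specialisation, the overlapping subset, the "at least one item" constraint, and the ternary relation) can be approximated in code. The following Python sketch is an assumption-laden illustration rather than a rendering of the original diagrams: the class names, the byte unit for `file_size`, the representation of formats as a set of strings, and all sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    file_size: int                               # unit (bytes) is an assumption
    location: str
    formats: set = field(default_factory=set)    # subset notation: formats may overlap


@dataclass
class Image(Item):                               # specialisation with its own attribute
    resolution: str = ""


@dataclass
class Audio(Item):                               # specialisation with its own attribute
    sample_rate: int = 0


@dataclass
class DigitalObject:
    title: str
    items: list

    def __post_init__(self):
        # An object may consist of any number of items, but at least one.
        if not self.items:
            raise ValueError("an object must consist of at least one item")


# Ternary relation: every entry ties one object, one creator and one process together.
performed = {
    ("Harbour photograph", "Ana Example", "scanning"),
    ("Harbour photograph", "Ana Example", "colour correction"),
    ("Annual report", "Ana Example", "scanning"),
}


def processes_for(title, creator):
    """Return the processes a creator performed on an object, sorted."""
    return sorted(p for t, c, p in performed if t == title and c == creator)


scan = Image(file_size=2_400_000, location="store/0001",
             formats={"TIFF", "JPEG"}, resolution="600 dpi")
photo = DigitalObject("Harbour photograph", [scan])

print(sorted(scan.formats))                                  # ['JPEG', 'TIFF']
print(processes_for("Harbour photograph", "Ana Example"))    # ['colour correction', 'scanning']
```

The ternary relation is deliberately kept as triples: splitting it into two binary relations would lose the information about which creator performed which process on which object.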
This basic toolset of the EER model can be applied for modelling any conceptual metadata framework. The above is a synopsis of the original discussion in (12), translated to the modelling of metadata. Relevant in this context is mainly Step 1 described in Chapters 1 and 2 of that paper. For a more extensive description please refer to that paper, or to any of the numerous tutorials and information sources available online and via other channels.

4 application

Information models do not provide a mathematically sound technique which allows only a single possible solution for a specific metadata model. They rather help to understand and shape a task; they help to ask the right questions and provide a typology for communicating possible answers. Indeed, distinct perceptions of a single metadata set may manifest in small but eye-catching differences within the model. As an example of this, we assume that in the tentative metadata model we started to create above, we want to describe possible ‘quirks’ of ‘objects’, i.e. deficiencies in the original objects. This time, however, we flip the procedure: first we look at possible descriptions of this requirement in EER notation, followed by a discussion.

In the first description (a) the quirk is an attribute of the entity ‘object’. In other words, quirks are described in a short text, which is attached to the very object. Approach (b), however, assumes that each object may have any number of distinct quirks, including none. Yet another perspective is (c), where various objects can have the same quirk or any number of quirks. If in a specific environment a set of recurring quirks can be identified, predefined quirk descriptions could be quickly assigned to newly incoming objects, which may raise efficiency considerably. As illustrated by the Figure above, differences between the models are quite evident, whereas in a flat textual description of the metadata set these differences may go unnoticed and may cause misunderstandings and wasted efforts.

Visualisation is one key feature of a conceptual model. A visual model further facilitates organising and structuring the data at hand through multiple abstraction levels and modularisation. Ultimately this is conducive to efficiency in the model’s practical application. To illustrate the value of modularisation and
aggregation, we describe a specific quirk more closely by recording the technology environment (i.e. the hardware and software platform) it occurred in. In another part of the model, the description of the process an object was created in, we plan to include this technology description as well. By making ‘technology’ an entity of its own, we can establish a relation between it and the ‘process’ as well as the ‘quirk’ entity. As technology descriptions can be expected to be rather repetitive, this promises considerable time and cost savings in practice. Creators are in this approach not obliged to produce a technology description themselves; they just select given descriptions, which are reused by the people creating the descriptions of possible quirks. (Note also that this approach could not be realised if perspective (a) in the Figure above was chosen.)

Information models can be taken down from rather sketchy high-level models to detailed descriptions of future functionality. In a further step after the model has been established, the type of each attribute should be specified. Consistency requirements for attributes should be established as well. For instance, the attribute ‘CreationDate’ of the entity ‘object’ should be a date somewhere between the birth and the death dates of the creator. Exact type definitions and consistency rules will further enhance the quality of metadata.

5 implications

Groupings in the metadata model offer further opportunities for optimising system implementation and streamlining user tasks. For instance, instead of requiring that all attributes of an entity be entered again and again, the system could automatically complete the empty attributes once it has uniquely identified the corresponding dataset. Looking at the entity ‘creator’, once the creator’s name has been given, and there is only a single creator with that name known to the system, the birth and death dates as well as the nationality could be filled in automatically. Obvious as such a semi-automatic approach may appear, it requires the modularisation of data to make it possible in the first place. More than that, modularisation in conceptual modelling is the basis for allowing multiple creators of an object. Again this is obvious to humans, yet these concepts and relations can only be created with adequate data structures at their basis. Picking up the example above, the system would simply not know which nationality belongs to which author, or who died, if merely two authors and one date of death were given in the form of a plain metadata list.

Modularisation in metadata design may also reflect the organisational roles and responsibilities for metadata creation. Let’s take the ‘creator’ entity again: somebody needs to enter a creator’s data first. Should the person who enters the metadata be the creator herself, or should this be taken care of by the organisation? Going one step further, the task of populating the metadata of a specific module within a metadata framework
could actually be outsourced to a dedicated service. A whole sector could decide to join forces and create such a service that all organisations in the sector can make use of. In the case of the ‘creator’ entity, this step has been taken in the cultural heritage sector by a European project called LEAF (13). The LEAF system contains authority files that describe specific persons. LEAF can indeed be used as a service responsible for one particular module in the metadata framework of a library, as it allows external resources to link to its authority files.

Another example of exploiting possible synergies is a file format registry that holds a comprehensive catalogue of file format documentation. Instead of providing all the object type information itself, an organisation may rather entrust this task to an external service. The Global Digital Format Registry (GDFR) (14) is setting up exactly such a service. One of the open challenges is the unique identification of entries in this database, so that specific organisations can actually refer to the registry and establish a relation between their metadata model and it.

What other such services are conceivable, on a corporate, sectoral, or even a global level? Information models for metadata may help to answer this question and thereby highlight opportunities for cooperation, which essentially saves costs and raises quality.

6 extensions

Several extensions to existing conceptual modelling techniques are conceivable that could tune them specifically for modelling metadata. One convenient extension might be a colour coding for attributes to convey how rigidly their types are defined: at one end are plain textual fields that allow any conceivable input, at the other are clearly specified data types such as the ISO 8601 date format (15). Tightly defined data types, of course, allow a certain level of machine comprehension, which is conducive to their automatic handling. One may distinguish three levels of typing: (1) machine comprehensible; (2) strongly typed; and (3) human understandable. Each of these levels could be assigned a colour, so that the viewer is able to discern at first sight which attributes can be interpreted by the system (1); which can to some degree be automatically handled and manipulated, but are essentially meaningless to the system (2); and which are unstructured strings and unfit for automatic handling (3).

Another extension may be required on a higher conceptual level: increasingly complex relationships between different metadata sets are being established, such as certain attributes that are part of various metadata sets at the same time, or virtual attributes that are created automatically and on the fly from other metadata. Technologies such as Application Profiles (16) foster these pioneering developments. Modelling such complex relationships may, however, require extended modelling techniques.
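The colour coding of typing levels proposed in this section could be recorded alongside the attribute definitions of a model. The Python sketch below is only one possible reading: the colour names, the assignment of the example attributes to levels, and the helper names are all assumptions made for illustration. The ISO 8601 check relies on the standard library's `date.fromisoformat`, which accepts calendar dates of the form YYYY-MM-DD.

```python
from datetime import date
from enum import Enum


class TypingLevel(Enum):
    MACHINE_COMPREHENSIBLE = 1   # can be interpreted by the system
    STRONGLY_TYPED = 2           # can be handled automatically, but carries no meaning for the system
    HUMAN_UNDERSTANDABLE = 3     # unstructured string, unfit for automatic handling


# Colour per level; the concrete colours are an arbitrary, hypothetical choice.
COLOUR = {
    TypingLevel.MACHINE_COMPREHENSIBLE: "green",
    TypingLevel.STRONGLY_TYPED: "yellow",
    TypingLevel.HUMAN_UNDERSTANDABLE: "red",
}

# Illustrative assignment of attributes to levels (an assumption, not from the paper).
ATTRIBUTE_LEVEL = {
    "creation_date": TypingLevel.MACHINE_COMPREHENSIBLE,   # ISO 8601 date
    "file_size": TypingLevel.STRONGLY_TYPED,                # integer
    "quirk": TypingLevel.HUMAN_UNDERSTANDABLE,              # free text
}


def colour_of(attribute):
    """Return the display colour for an attribute according to its typing level."""
    return COLOUR[ATTRIBUTE_LEVEL[attribute]]


def parse_creation_date(value):
    """Parse an ISO 8601 calendar date (YYYY-MM-DD); raises ValueError otherwise."""
    return date.fromisoformat(value)


print(colour_of("creation_date"))            # green
print(parse_creation_date("2004-03-15"))     # 2004-03-15
```

A rendering tool could use `colour_of` when drawing each attribute, so that the strictness of typing is visible in the diagram itself.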
Extended techniques may likewise be needed for including a temporal component in metadata modelling, the importance of which was underlined in (17). Temporal modelling techniques have already been analysed and introduced for various conceptual modelling techniques, including EER (18) and also the IDEF family (14); there is no need to reinvent the wheel if these existing techniques are applicable.

Previous research, albeit in the field of system engineering, has also focused on quality-related issues. The different background notwithstanding, the findings of this research are transferable. For example, Daniel Moody (19) identified model quality factors and established a quality management framework for both the model as such and the process of modelling. The framework takes different roles in the modelling process into account, and should lead to a model that can be implemented and is complete, simple, flexible, integrated, and understandable.

An important design goal for metadata models obviously is their long-term stability. While time will inevitably make certain adaptations necessary, the model can be designed such that it is robust despite necessary modifications. Lex Wedemeijer (20) researched the long-term evolution of conceptual schemas. His findings bear on the simplicity, the extensibility, and essentially the stability over time of conceptual schemas. The transfer of these findings will be particularly valuable, for quality and long-term stability are central concerns in metadata modelling.

With more and more initiatives embarking on metadata modelling, techniques are needed to compare, combine and reuse models. As (21) pointed out in the scope of the "Guidelines of Modeling GoM" project, however, modelling is an essentially creative and subjective act. Exactly the same requirements may therefore be captured in entirely different models by different designers. Guidelines such as those introduced by GoM (21) and by the quality frameworks referenced above potentially reduce subjectivism in the design process to such a degree that model comparison is possible. However, any more automatic approach to the comparison of a large body of models, such as suggested in (22), is probably still out of reach for metadata models, due to the inherent heterogeneity of backgrounds and the often incompatible terms and perspectives of metadata initiatives.

7 conclusions

Information modelling techniques applied to metadata allow quick comprehension, and help to keep track of large and complex models. Even newcomers will find the interpretation of such a metadata model straightforward, and will appreciate it as a means for communicating the foundation concepts of a specific model. Some experience is necessary in order to best exploit the features and opportunities modelling offers in the design phase. However, the wealth of experience and resources available in other fields supports a steep
learning curve. Tutorials and auxiliary software tools are easily found.

Information modelling is one link in the chain of the design process. Before the metadata model can be created, a requirements analysis needs to explore the real-life environment and the tasks and responsibilities the model has to support. After the model is finalised, it needs to be taken forward to implementation. The close connection between information models and software engineering helps to bridge the gap between conceptual metadata models and system implementation. Methodologies for translating models into database schemas are widely available. Desirable for metadata is a methodology for translating a conceptual metadata model into an XML/RDF framework. In fact, an information model was designed for RDF (23), which is tailored to the specific use and very concrete. However, ways to translate conceptual models into the RDF data model remain to be explored.

Another area that calls for research is the comparison of metadata models, which is particularly needed for tools and services such as metadata registries. Metadata registries (24) continue to be a hot topic for their promise to enhance the discovery and reuse of existing metadata definitions. Information modelling techniques may support metadata registries by reducing idiosyncratic model features through the application of modelling quality frameworks. This promotes more canonical models, in order to avoid different models being created for essentially the same requirements. At the same time metadata models facilitate the communication between different communities and thereby foster the reconciliation of the variety of concepts and notions inherent in the diverse backgrounds of metadata initiatives.

Overall, there is little doubt that this support in conceptualisation and communication, together with the gains in quality, efficacy, and robustness, speaks for the adoption of modelling techniques and their more consistent use in the metadata world.

References

1. J.García Molina, M.José Ortín, Begoña Moros, Joaquín Nicolás, and Ambrosio Toval. Towards Use Case and Conceptual Models through Business Modeling. In: A.H.F. Laender, S.W. Liddle, V.C. Storey (Eds.). ER2000 Conference, LNCS 1920, pp. 281-294, 2000.
2. Nico van Egmond, Hans Hofman, Jacqueline Slats, Tamara van Zwol. Depot 2000 - Functional design for a digital depot. Rijksarchiefdienst, Den Haag, 2000.
3. IFLA Study Group. Functional Requirements for Bibliographic Records. UBCIM Publications – New Series Vol 19; September 1997. ISBN 3-598-11382-X.
4. Bill Underwood. The InterPARES Preservation Model: A Framework for the Long-Term Preservation of Authentic Electronic Records. In: Toblach/Dobbiaco. Choices and
Strategies for Preservation of the Collective Memory. Italy, 25-29 June 2002. Also see the website of the InterPARES 2 project (International Research on Permanent Authentic Records in Electronic Systems). http://www.interpares.org
5. D.Ross. Structured analysis: A language for communicating ideas. In: IEEE Transactions on Software Engineering 3(1). Special Issue on Requirements Analysis. (1977), pp 16-34.
6. Keith McConnelly (US Department of Defense). Introduction to IDEF Modeling: Function and Information Modeling. http://www.defenselink.mil/nii/bpr/bprcd/0066.htm
7. National Institute of Standards and Technology (NIST). Integration Definition for Function Modeling (IDEF0). Federal Information Processing Standards Publication 183 (FIPS PUBS). December 1993. http://www.itl.nist.gov/fipspubs/idef02.doc
8. National Institute of Standards and Technology (NIST). Integration Definition For Information Modeling (IDEF1X). Federal Information Processing Standards Publication 184 (FIPS PUBS). December 1993. http://www.itl.nist.gov/fipspubs/idef1x.doc
9. IEEE 1320.2-1998 - IEEE Standard for Conceptual Modeling Language-Syntax and Semantics for IDEF1X97 (IDEFobject) - Description. http://standards.ieee.org/reading/ieee/std_public/description/se/1320.2-1998_desc.html
10. Object Management Group (OMG). Unified Modeling Language. http://www.omg.org/um
11. P.P.Chen. The Entity-Relationship Model: towards a unified view of data. ACM Transactions on Database Systems; v1, n1; (March 1976). pp 9-36
12. Toby J.Teorey, Dongqing Yang, James P.Fry. A logical Design Methodology for Relational Databases Using the Extended Entity-Relationship Model. ACM Computing Surveys, v18, n2; (1986). pp 197-222
13. Max Kaiser, Hans-Jörg Lieder, Kurt Majcen, Heribert Vallant. New Ways of Sharing and Using Authority Information - The LEAF Project. In: D-Lib Magazine (ISSN 1082-9873), November 2003; Volume 9, Number 11. http://www.dlib.org/dlib/november03/lieder/11lieder.html
14. Global Digital Format Registry (GDFR). http://hul.harvard.edu/gdfr/
15. ISO 8601. Data elements and interchange formats - Information interchange - Representation of dates and times. ISO TC 154 (International Organization for Standardization, Technical Committee).
16. Thomas Baker, Makx Dekkers, Rachel Heery, Manjula Patel, and Gauri Salokhe. What terms does your metadata use? Application profiles as machine-understandable narratives. In: Journal of Digital Information
2(2), November 2003. http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Baker/
17. Results of the Harmony Project. http://www.metadata.net/harmony/. Of particular interest in this context: Carl Lagoze, Jane Hunter. The ABC Ontology and Model. In: Journal of Digital Information 2(2), November 2003.
18. Heidi Gregersen, Christian S. Jensen. Temporal Entity-Relationship Models - A Survey. In: IEEE Transactions on Knowledge and Data Engineering, May 1999; v.11, n.3, pp 464-497.
19. Daniel L.Moody, Graeme G.Shanks. Improving the quality of data models: empirical validation of a quality management framework. In: Information Systems 28(6), 2003; pp 619-650; ISSN 0306-4379.
20. Lex Wedemeijer. Long-term evolution of a conceptual schema at a life insurance company, Annals of cases on information technology, Idea Group Publishing, Hershey, PA, 2002.
21. Reinhard Schuette, Thomas Rotthowe. The Guidelines of Modeling - An Approach to Enhance the Quality in Information Models. In: T.W. Ling, S. Ram, and M.L. Lee (Eds.). ER’98, LNCS 1507, pp. 240-254, 1998. Springer-Verlag Berlin Heidelberg 1998.
22. S.Castano, V. de Antonellis, M.G.Fugini, B.Pernici. Conceptual schema analysis: techniques and applications. In: ACM Trans. Database Systems 23(3); 1998; pp 286-333; ISSN 0362-5915.
23. Eric Miller. An Introduction to the Resource Description Framework. In: D-Lib Magazine (ISSN 1082-9873), May 1998.
24. The Dublin Core Metadata Registry. http://dublincore.org/dcregistry/
