Andreas Aschenbrenner
ERPANET

Abstract: The recognition of the value of metadata continues to rise, and accordingly metadata frameworks are ever more widespread; they grow more comprehensive and become increasingly complex. This rise in quantity, size, and complexity calls for a methodology that supports metadata design. Information modelling techniques, as they are routinely employed in information system design and other domains, are well suited for this task, offering both visualisation techniques and an entire design methodology. Above all, information models help to manage complexity and leverage communication, thereby promoting reuse and interoperability. Some metadata initiatives have already employed modelling techniques successfully, building on the large body of existing experience in this area, yet these techniques remain to be widely adopted in the metadata world.

Information modelling techniques allow visualisation for intuitive interpretation and clear communication, facilitate a structured approach to design, and create new perspectives on existing metadata models. This paper describes the application of information modelling to metadata. It also provides orientation on where the metadata community can further extend its modelling skills to create quality metadata models with a robust design.

Keywords: conceptual metadata model; metadata modeling; metadata visualisation; graphic modelling technique; information model; entity-relationship analysis.

1 introduction

Abstraction is probably one of the most powerful of human capabilities. Plans and models have assisted so many before us to coordinate hunting, to find the way, to develop novel tools, and in countless other activities. Today, modelling also manifests in a myriad of formalised methodologies and takes place in a variety of fields and situations. Conceptual models can take many forms: they can be text-based or employ graphics for visualisation; high-level or highly detailed; plain or hierarchically structured.

When composing a metadata set, a host of requirements, influences and scenarios need to be considered. After all, metadata needs to be tuned to a specific business environment with all its activities, roles, and possible interdependencies. Such a complex undertaking calls for the use of formal modelling techniques that support the design process.

Metadata design starts from an initial requirements analysis that explores the relevant business processes by establishing use cases and functional models (1). The actual metadata is then represented by a data model (also called an information model), and its development comes as a natural succession to the preceding analyses. Such a comprehensive approach ensures that all external requirements are accounted for, supports implementation at a later stage, and essentially raises the quality of the final product.

Looking at the available experience in the metadata community, one project in which this design process can be followed nicely is the "Functional Design for a digital depot" (2), where the actual metadata model is based on functional models of a comprehensive process analysis. In a similar vein, a core standards activity in metadata modelling, IFLA's Functional Requirements for Bibliographic Records (3), based its metadata model on requirements analyses and use cases. Another authoritative initiative that employs various modelling techniques, and indeed is working on a data model with regard to digital preservation, is the InterPARES project (4). The conceptual models of this huge, international project are part of its core deliverables, and surely they are also a great means for communication and discussion.

Before metadata-related initiatives picked up modelling techniques, these techniques were widely used in information system design. It is the experience accumulated in that domain that the metadata world can still benefit from. This paper refers to these experiences. It focuses on information models to illustrate the value of an actual metadata model. The description of the modelling technique in Chapters 3 and 4 is practically oriented and illustrates how a graphic metadata model can transport more information in a far clearer way than a flat text listing of a metadata set. Chapters 5 and 6 then explore the new perspectives such a model allows and reflect on some extensions that may further enhance the power of this technique.

2 background

Information models take a prominent role in engineering and computer science. The success of a development project depends on the models created in the design phase. Such techniques were first developed with the advent of data processing systems in the 1950s. A proliferation of methodologies and tools followed, and modelling techniques are widely applied today.

Graphic modelling techniques are employed to facilitate human interpretation as part of requirements analysis and conceptual design. These visual models are then translated into detailed lists suitable for implementation. An early methodology was the Structured Analysis and Design Technique (SADT), developed in the 1970s as a "language for communicating ideas" (5).

The family of Integrated Definition Languages, IDEF for short, was first developed in the 1970s by the US Air Force, and its members are standard modelling techniques today. They cover a range of applications from functional to information modelling, simulation, object-oriented analysis and design, and knowledge acquisition. Specifically, IDEF0 (6, 7) is a functional modelling language that builds on SADT, and IDEF1X (8, 9) provides for information modelling.

The development of the Unified Modeling Language (UML) (10) started in the 1990s, building on the experience gained in a range of existing object-oriented analysis and design methods. UML is geared at combining a range of modelling techniques and provides a set of twelve model types including functional as well as information models. The Object Management Group (OMG), a non-profit consortium, coordinates the development of UML to create a rigorous, open standard for software modelling and system design.

The Extended Entity-Relationship (EER) model is a widely used conceptual information model and has a long-lasting history in system engineering (12, 11). The EER found many proponents, some of whom introduced individual styles or extensions for specialised applications, but the basic concepts have hardly been obscured.

In the following we will use the well established and widely used EER to model metadata. The two metadata initiatives (2) and (3) introduced earlier also employed entity-relationship models. However, the concepts of information models and the conclusions presented in the following are of a more general nature, and it does not really matter in which modelling language they are notated. The abstraction mechanisms presented below can hence be applied with IDEF1X or any other information model as well.

3 the methodology

In order to understand the workings of an EER model and its application to metadata, we construct in the following a small learning example in a step-by-step process. We start from a metadata set for any sort of digital object comprising four elements:

Clearly this metadata set is actually composed of two different entities: there is the actual 'object' described by the title and the creation_date, and then there is the 'creator' of the object, of whom we know the nationality. These two entities are in a certain relation to each other: an object is created by one or more creators, and each creator may have created one or more objects. In EER notation this is called a many-to-many relation, and is notated like this:
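To make the learning example of Chapter 3 concrete, the following is a minimal Python sketch of the two entities 'object' and 'creator' and the many-to-many relation between them. It is an illustration only, not part of the EER notation: the class names, the `CreatedBy` helper, and the `name` attribute of the creator are assumptions introduced here, and the sample values are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DigitalObject:
    """The 'object' entity with the two attributes named in the text."""
    title: str
    creation_date: date


@dataclass(frozen=True)
class Creator:
    """The 'creator' entity."""
    name: str           # assumed attribute, used only to tell creators apart
    nationality: str


class CreatedBy:
    """The many-to-many relation, kept as a set of (object, creator) pairs."""

    def __init__(self):
        self._pairs = set()

    def add(self, obj, creator):
        # Adding the same pair twice has no effect, because a set is used.
        self._pairs.add((obj, creator))

    def creators_of(self, obj):
        return sorted(c.name for o, c in self._pairs if o == obj)

    def objects_of(self, creator):
        return sorted(o.title for o, c in self._pairs if c == creator)


# Invented sample data: one object with two creators, one creator with two objects.
report = DigitalObject("Annual report", date(2003, 5, 1))
survey = DigitalObject("Survey", date(2004, 1, 15))
ana = Creator("Ana", "Portuguese")
ben = Creator("Ben", "Dutch")

created_by = CreatedBy()
created_by.add(report, ana)
created_by.add(report, ben)
created_by.add(survey, ana)
```

Keeping the relation in a structure of its own, rather than as a field of either entity, mirrors the way a many-to-many relation is drawn between two entities and is one common way to carry it into an implementation.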
Other connectivities of relations are the one-to-one relation and the one-to-many relation. To demonstrate a one-to-many relation, let us assume that each creator may be working for exactly one organisation (the entity 'affiliation'), but each organisation may contract many creators:

One step further, we define that each affiliation may contract zero, one, or any number of creators, whereas a creator must be assigned to exactly one organisation. For this reason we introduce a circle signifying 'optional' on the side of the 'creator' entity, whereas we leave the relation on the side of the 'affiliation' unchanged.

Lastly, a more sophisticated kind of relation is the ternary relation. The above relations were largely binary ones between two different entities. Unary relations are between one instance of an object and another instance of the same object. Ternary relations, consequently, involve three different objects. For the following example we assume that each object is created through a number of processes conducted by each creator, and also that the creator may actually perform the same processes on various objects. This calls for a many-to-many-to-many relationship.

Turning our focus to the entity 'object', we find that each object may consist of any number of items, but of at least one. Each item has certain attributes such as file_size and location. Furthermore, each item is of a certain type, for example an 'image', 'audio', or 'text'. These item types are specialisations and may have attributes of their own; for example, an image has a resolution, whereas for an audio item the sample_rate may be specified.

Each of these items can be stored in a certain file format. An image, for instance, may be stored as a TIFF or as a JPEG. These specific formats may again have attributes of their own, similar to the generalisation/specialisation above. The following notation, called subset, however allows overlapping entities. In other words, an image may be stored as a TIFF, as a JPEG, or in both formats at the same time.
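The difference between a generalisation/specialisation hierarchy (item types) and a subset hierarchy (overlapping file formats) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than a prescribed mapping: class inheritance stands in for specialisation, a list of format records stands in for the subset notation, and the format attributes `compression` and `quality` as well as all sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tiff:
    compression: str    # hypothetical format-specific attribute


@dataclass
class Jpeg:
    quality: int        # hypothetical format-specific attribute


@dataclass
class Item:
    """Generic entity: attributes shared by every item type."""
    file_size: int      # in bytes (the unit is an assumption)
    location: str


@dataclass
class Image(Item):
    """Specialisation of Item with an attribute of its own."""
    resolution: str
    # Subset hierarchy: several formats may apply to one image at once.
    formats: list = field(default_factory=list)


@dataclass
class Audio(Item):
    """Another specialisation; an item is an Image or an Audio, not both."""
    sample_rate: int


# Invented sample: one image stored both as TIFF and as JPEG.
scan = Image(file_size=2_400_000, location="store/0001", resolution="600dpi")
scan.formats.append(Tiff(compression="none"))
scan.formats.append(Jpeg(quality=90))
format_names = sorted(type(f).__name__ for f in scan.formats)
```

Inheritance gives each item exactly one type, which matches the specialisation, whereas the `formats` list lets TIFF and JPEG overlap on the same image, which is what the subset notation permits.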
This basic toolset of the EER model can be applied for modelling any conceptual metadata framework. The above is a synopsis of the original discussion in (12), translated to modelling metadata. Relevant in this context is mainly Step 1, described in Chapters 1 and 2 of that paper. For a more extensive description please refer to that paper, or to any of the numerous tutorials and information sources available online and via other channels.

4 application

Information models do not provide a mathematically sound technique that allows only a single possible solution for a specific metadata model. They rather help to understand and shape a task; they help to ask the right questions and provide a typology for communicating possible answers. Indeed, distinct perceptions of a single metadata set may manifest in small but eye-catching differences within the model. As an example, assume that in the tentative metadata model we started to create above, we want to describe possible 'quirks' of 'objects', i.e. deficiencies in the original objects. This time, however, we flip the procedure: first we look at possible descriptions of this requirement in EER notation, followed by a discussion.

In the first description the quirk is an attribute of the entity 'object'. In other words, quirks are described in a short text, which is attached to the very object. Approach (b), however, assumes that each object may have any number of distinct quirks, including none. Yet another perspective is (c), where various objects can have the same quirk or any number of quirks. If in a specific environment a set of recurring quirks can be identified, predefined quirk descriptions could be quickly assigned to newly incoming objects, which may raise efficiency considerably. As illustrated by the figure above, differences between the models are quite evident, whereas in a flat textual description of the metadata set these differences may go unnoticed and may cause misunderstandings and wasted effort.

Visualisation is one key feature of a conceptual model. A visual model further facilitates organising and structuring the data at hand through multiple abstraction levels and modularisation. Ultimately this is conducive to efficiency in the model's practical application. To illustrate the value of modularisation and aggregation, we describe a specific quirk more closely by specifying the technology environment (i.e. the hardware and software platform) it occurred in. In another part of the model, the description of the process an object was created in, we plan to include this technology description as well. By making 'technology' an entity of its own, we can establish a relation between it and the 'process' as well as the 'quirk' object. As technology descriptions can be expected to be rather repetitive, this promises considerable time and cost savings in practice. In this approach creators are not obliged to produce a technology description themselves; they just select existing descriptions, which are also reused by the people describing possible quirks. (Note also that this approach could not be realised if perspective (a) in the figure above was chosen.)

Information models can be refined from rather sketchy high-level models to detailed descriptions of future functionality. In a further step after the model has been established, the type of each attribute should be specified. Consistency requirements for attributes should be established as well. For instance, the attribute 'creation_date' of the entity 'object' should be a date somewhere between the birth and the death dates of the creator. Exact type definitions and consistency rules will further enhance the quality of metadata.

5 implications

Groupings in the metadata model offer further opportunities for optimising system implementation and streamlining user tasks. For instance, instead of requesting that all attributes of an entity be entered again and again, the system could automatically complete the empty attributes once it has uniquely identified the corresponding dataset. Looking at the entity 'creator', once the creator's name has been given, and there is only a single creator with that name known to the system, the birth and death dates as well as the nationality could be filled in automatically. As obvious as such a semi-automatic approach may appear, it requires the modularisation of data to make it possible in the first place. More than that, modularisation in conceptual modelling is the basis for allowing multiple creators of an object. Again this is obvious to humans, yet these concepts and relations can only be created with adequate data structures at their basis. Picking up the example above, the system would simply not know which nationality belongs to which author, or who died, if merely two authors and one date of death were given in the form of a plain metadata list.

Modularisation in metadata design may also reflect the organisational roles and responsibilities for metadata creation. Let's take the 'creator' entity again: somebody needs to enter a creator's data first. Should the person who enters the metadata be the creator herself, or should this be taken care of by the organisation? Going one step further, the task of populating the metadata of a specific module within a metadata framework could actually be outsourced to a dedicated service. A whole sector could decide to join forces and create such a service that all organisations in the sector can make use of. In the case of the 'creator' entity, exactly this step has been taken in the cultural heritage sector by a European project called LEAF (13). The LEAF system contains authority files that describe specific persons. LEAF can indeed be used as a service responsible for one particular module in the metadata framework of a library, as it allows external resources to link to its authority files.

Another example of exploiting possible synergies is a file format registry that holds a comprehensive catalogue of file format documentation. Instead of providing all the object type information itself, an organisation may rather entrust this task to an external service. The Global Digital Format Registry (14) is setting up exactly such a service. One of the open challenges is the unique identification of entries in this database, so that specific organisations can actually refer to registry entries and establish a relation between their metadata model and the registry.

What other such services are conceivable, on a corporate, sectoral, or even a global level? Information models for metadata may help to answer this question and thereby highlight opportunities for cooperation, which essentially saves costs and raises quality.

6 extensions

Several extensions to existing conceptual modelling techniques are conceivable that could tune them specifically for modelling metadata. One convenient extension might be a colour coding for attributes to convey how rigidly their types are defined: on the one end are textual fields that allow any conceivable input, on the other are clearly specified data types such as the ISO 8601 date format (15). Tightly defined data types, of course, allow a certain level of machine comprehension, which is conducive to their automatic handling. One may distinguish three levels of typing: (1) machine comprehensible; (2) strongly typed; and (3) human understandable. Each of these levels could be assigned a colour, so that the viewer is able to discern at first sight which attributes can be interpreted by the system (1); which can to some degree be automatically handled and manipulated, but are essentially meaningless to the system (2); and which are unstructured strings and unfit for automatic handling (3).

Another extension may be required on a higher conceptual level: increasingly complex relationships between different metadata sets are being established, such as certain attributes that are part of various metadata sets at the same time, or virtual attributes that are created automatically and on the fly from other metadata. Technologies such as Application Profiles (16) foster these pioneering developments. Modelling such complex relationships may, however, require extended modelling techniques.

The same applies to including a temporal component in metadata modelling, the importance of which was underlined in (17). Temporal modelling techniques have already been analysed and introduced for various conceptual modelling techniques, including EER (18) and also the IDEF family (14); there is no need to reinvent the wheel if these existing techniques are applicable.

Previous research, albeit in the field of system engineering, has also focused on quality-related issues. The different background notwithstanding, the findings of this research are transferable. For example, Daniel Moody (19) identified model quality factors and established a quality management framework for both the model as such and the process of modelling. The framework takes different roles in the modelling process into account, and should lead to a model that can be implemented and is complete, simple, flexible, integrated, and understandable.

An important design goal for metadata models obviously is their long-term stability. While time will inevitably make certain adaptations necessary, the model can be designed such that it is robust despite necessary modifications. Lex Wedemeijer (20) researched the long-term evolution of conceptual schemas. His findings promote the simplicity, extensibility and, essentially, the stability over time of conceptual schemas. The transfer of these findings will be particularly valuable, for quality and long-term stability are central concerns in metadata modelling.

With more and more initiatives embarking on metadata modelling, techniques are needed to compare, combine and reuse models. As (21) pointed out in the scope of the "Guidelines of Modeling" (GoM) project, however, modelling is an essentially creative and subjective act. Exactly the same requirements may therefore be captured in entirely different models by different designers. Guidelines such as those introduced by GoM (21) and by the quality frameworks referenced above potentially reduce subjectivism in the design process to such a degree that model comparison is possible. However, any more automatic approach to comparing a large body of models, such as suggested in (22), is probably still out of reach for metadata models, due to the inherent heterogeneity of backgrounds and the often incompatible terms and perspectives of metadata initiatives.

7 conclusions

Information modelling techniques applied to metadata allow quick comprehension and help in keeping track of large and complex models. Even newcomers will find the interpretation of such a metadata model straightforward, and will appreciate it as a means for communicating the foundation concepts of a specific model. Some experience is necessary in order to best exploit the features and opportunities modelling offers in the design phase. However, the wealth of experience and resources available in other fields supports a steep learning curve. Tutorials and auxiliary software tools are easily found.

Information modelling is one link in the chain of the design process. Before the metadata model can be created, a requirements analysis needs to explore the real-life environment and the tasks and responsibilities the model has to support. After the model is finalised, it needs to be taken forward to implementation. The close connection between information models and software engineering helps to bridge the gap between conceptual metadata models and system implementation. Methodologies for translating models into database schemas are widely available. Desirable for metadata is a methodology for translating a conceptual metadata model into an XML/RDF framework. In fact, an information model was designed for RDF (23), which is tailored to that specific use and very concrete. However, ways to translate conceptual models into the RDF data model remain to be explored.

Another area that calls for research is the comparison of metadata models, which is particularly needed for tools and services such as metadata registries. Metadata registries (24) continue to be a hot topic for their promise to enhance the discovery and reuse of existing metadata definitions. Information modelling techniques may support metadata registries by reducing idiosyncratic model features through the application of modelling quality frameworks. This promotes more canonical models, in order to avoid different models being created for essentially the same requirements. At the same time metadata models facilitate communication between different communities and thereby foster the reconciliation of the variety of concepts and notions inherent in the diverse backgrounds of metadata initiatives.

Overall, there is little doubt that this support in conceptualisation and communication, together with the quality, efficacy, and robustness that modelling techniques foster, argues for adopting them and using them more consistently in the metadata world.

References

1. J.García Molina, M.José Ortín, Begoña Moros, Joaquín Nicolás, and Ambrosio Toval. Towards Use Case and Conceptual Models through Business Modeling. In: A.H.F. Laender, S.W. Liddle, V.C. Storey (Eds.). ER2000 Conference, LNCS 1920, pp. 281-294, 2000.

2. Nico van Egmond, Hans Hofman, Jacqueline Slats, Tamara van Zwol. Depot 2000 - Functional design for a digital depot. Rijksarchiefdienst, Den Haag, 2000.

3. IFLA Study Group. Functional Requirements for Bibliographic Records. UBCIM Publications - New Series Vol 19; September 1997. ISBN 3-598-11382-X.

4. Bill Underwood. The InterPARES Preservation Model: A Framework for the Long-Term Preservation of Authentic Electronic Records. In: Toblach/Dobbiaco. Choices and Strategies for Preservation of the Collective Memory. Italy, 25-29 June 2002. Also see the website of the InterPARES 2 project (International Research on Permanent Authentic Records in Electronic Systems). http://www.interpares.org

5. D.Ross. Structured analysis: A language for communicating ideas. In: IEEE Transactions on Software Engineering 3(1). Special Issue on Requirements Analysis. (1977), pp 16-34.

6. Keith McConnelly (US Department of Defense). Introduction to IDEF Modeling: Function and Information Modeling. http://www.defenselink.mil/nii/bpr/bprcd/0066.htm

7. National Institute of Standards and Technology (NIST). Integration Definition for Function Modeling (IDEF0). Federal Information Processing Standards Publication 183 (FIPS PUBS). December 1993. http://www.itl.nist.gov/fipspubs/idef02.doc

8. National Institute of Standards and Technology (NIST). Integration Definition For Information Modeling (IDEF1X). Federal Information Processing Standards Publication 184 (FIPS PUBS). December 1993. http://www.itl.nist.gov/fipspubs/idef1x.doc

9. IEEE 1320.2-1998 - IEEE Standard for Conceptual Modeling Language-Syntax and Semantics for IDEF1X97 (IDEFobject) - Description. http://standards.ieee.org/reading/ieee/std_public/description/se/1320.2-1998_desc.html

10. Object Management Group (OMG). Unified Modeling Language. http://www.omg.org/um

11. P.P.Chen. The Entity-Relationship Model: towards a unified view of data. ACM Transactions on Database Systems; v1, n1; (March 1976). pp 9-36.

12. Toby J.Teorey, Dongqing Yang, James P.Fry. A Logical Design Methodology for Relational Databases Using the Extended Entity-Relationship Model. ACM Computing Surveys, v18, n2; (1986). pp 197-222.

13. Max Kaiser, Hans-Jörg Lieder, Kurt Majcen, Heribert Vallant. New Ways of Sharing and Using Authority Information - The LEAF Project. In: D-Lib Magazine (ISSN 1082-9873), November 2003; Volume 9, Number 11. http://www.dlib.org/dlib/november03/lieder/11lieder.html

14. Global Digital Format Registry (GDFR). http://hul.harvard.edu/gdfr/

15. ISO 8601. Data elements and interchange formats - Information interchange - Representation of dates and times. ISO TC 154 (International Organization for Standardization, Technical Committee).

16. Thomas Baker, Makx Dekkers, Rachel Heery, Manjula Patel, and Gauri Salokhe. What terms does your metadata use? Application profiles as machine-understandable narratives. In: Journal of Digital Information 2(2), November 2003. http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Baker/

17. Results of the Harmony Project. http://www.metadata.net/harmony/. Of particular interest in this context: Carl Lagoze, Jane Hunter. The ABC Ontology and Model. In: Journal of Digital Information 2(2), November 2003.

18. Heidi Gregersen, Christian S. Jensen. Temporal Entity-Relationship Models - A Survey. In: IEEE Transactions on Knowledge and Data Engineering, May 1999; v.11, n.3, pp 464-497.

19. Daniel L.Moody, Graeme G.Shanks. Improving the quality of data models: empirical validation of a quality management framework. In: Information Systems 28(6), 2003; pp 619-650; ISSN 0306-4379.

20. Lex Wedemeijer. Long-term evolution of a conceptual schema at a life insurance company. Annals of cases on information technology, Idea Group Publishing, Hershey, PA, 2002.

21. Reinhard Schuette, Thomas Rotthowe. The Guidelines of Modeling - An Approach to Enhance the Quality in Information Models. In: T.W. Ling, S. Ram, and M.L. Lee (Eds.). ER'98, LNCS 1507, pp. 240-254, 1998. Springer-Verlag Berlin Heidelberg 1998.

22. S.Castano, V. de Antonellis, M.G.Fugini, B.Pernici. Conceptual schema analysis: techniques and applications. In: ACM Trans. Database Systems 23(3); 1998; pp 286-333; ISSN 0362-5915.

23. Eric Miller. An Introduction to the Resource Description Framework. In: D-Lib Magazine (ISSN 1082-9873), May 1998.

24. The Dublin Core Metadata Registry. http://dublincore.org/dcregistry/