
Proceeding of the 3rd International Conference on Informatics and Technology, 2009

KNOWLEDGE BASE MAPPING FOR PRESERVATION PLANNING ON DIGITAL PRESERVATION STRATEGIES DEVELOPMENT
Mardhani Riasetiawan, Universiti Teknologi PETRONAS

ABSTRACT

The current issues in the development of digital preservation strategies are planning, policies, strategies, and
actions for preservation. Preservation planning is the set of activities that arranges and defines the preservation
strategies chosen for implementation on digital information. Knowledge Base Mapping for Preservation Planning was
developed on the basis of the OAIS processes, to ensure that the preservation processes follow current standards, and
on the basis of preservation development methodologies, for compatibility with other preservation planning tools.
Knowledge Base Mapping consists of four layers: attributes of digital repositories, attributes of archived
collections, tools and technologies, and economic and policy models. It has ensured “optimal” preservation strategies
for many types of digital objects, in this case geoscience data files.

Keywords: Digital Preservation, Planning, Knowledge Base Mapping, Digital Repositories, Archived
Collections, Tools and Technology, Economic and Policy Models, Preservation Strategies

1.0 INTRODUCTION

Digital information collections are heterogeneous, vast, and growing at a rate that outpaces our ability to manage
and preserve them. An increasing amount of information is being created and maintained in digital format, including
objects from virtually every discipline and of every type. All of these objects need to be preserved and maintained
for long periods, whether because of legal requirements, because they represent the basis of business models, because
they constitute valuable cultural heritage, because they serve as evidence and proof of scientific experiments, or
because of personal reasons and value. The technologies, methodologies, strategies, and resources needed to manage
digital information for the long term have not kept pace with innovations in the creation and capture of digital
information. Much more digital content is available and worth preserving; researchers increasingly depend on digital
resources and assume that they will be preserved.

Digital preservation is a set of activities required to make sure digital objects can be identified, located, rendered,
used and understood in the future. This can include managing the object names and locations, updating the storage
media, documenting the content and tracking hardware and software changes to make sure objects can still be
opened and understood. Digital preservation combines planning, policies, strategies and actions to ensure access to
reformatted and born-digital content regardless of the challenges of media failure and technological change. The goal
of digital preservation is the accurate rendering of authenticated content over time. In OAIS terms, long-term
preservation is the act of maintaining information, in a correct and Independently Understandable form, over the Long
Term.

1.1 Technology Challenges

The preservation of digital information has its own characteristics. It is a complex task that requires a clear
understanding of the objects, their intellectual characteristics, the way they were created and used, and how they
will most likely be used in the future. It requires a continuous commitment to preserving objects in order to avoid
the “digital dark hole”, and a solid, trusted infrastructure and workflows to ensure that digital objects are not
lost. It is essential for keeping electronic publications and data accessible, it will become more complex as digital
objects become more complex, and it needs to be defined in a preservation plan.

1.2 Preservation Planning Needs

The current issues in digital preservation application development are planning, policies, strategies, and actions
for preservation. All of these issues need close attention as the first step in developing a preservation
application. Preservation planning is the set of activities that arranges and defines the preservation strategies
chosen for implementation on digital information. This is crucial because the technology challenges in digital
preservation are still current issues. Preservation planning aims to understand the objects, their intellectual
characteristics, and the way the files were created and used. Once these are identified, it is easier for the
preserver to choose the matching preservation strategies.

©Informatics '09, UM 2009 RDT3 - 75

Several preservation strategies have been developed. For each strategy several tools are available, and for each tool
several parameter settings are available. The questions are: How do you know which one is most suitable? What are the
needs of your users, now and in the future? Which aspects of an object do you want to preserve? What are the
requirements? How can you prove in 10, 20, 50, or 100 years that the decision was correct or acceptable at the time
it was made?

2.0 DIGITAL PRESERVATION AND STANDARDS

Digital preservation is the set of processes, activities, and technologies used to store and access large amounts of
heterogeneous digital data for long periods of time [1]. The focus of digital preservation is long-term access and
management, and the central challenge is time. Digital preservation enables people or systems to use and understand
the data in the far future in spite of unknown changes in users and technologies (software and hardware). Because of
this, it is an inherently cross-functional task, and many stakeholders are involved in a preservation environment,
including IT, records and information management, security, legal, and regulatory compliance.

2.1 Standard on Digital Preservation

OAIS originated at NASA's National Space Science Data Center (NSSDC), NASA's first digital archive, which has
experienced many technological changes since 1966. It was further developed by the Consultative Committee for Space
Data Systems (CCSDS), an international group of space agencies that has developed a range of discipline-independent
standards and that evolved into the ISO TC 20/SC 13 working group around 1990 (TC 20: Aircraft and Space Vehicles;
SC 13: Space Data and Information Transfer Systems). OAIS is a framework for understanding and applying the concepts
needed for long-term digital information preservation, where long term means long enough to be concerned about
changing technologies. It is also a framework for comparing the architectures and operations of existing and future
archives. OAIS addresses a full range of archival functions [2].

Other relevant standards are the Extensible Access Method (XAM), the SNIA 100 Year Archive Task Force, and Object
Storage Devices (OSD). XAM is a Storage Networking Industry Association (SNIA) initiative to define a standard
interface between applications and management applications (consumers) and storage systems (providers); it was
developed by IBM and EMC in Q4 2004 [3]. The SNIA 100 Year Archive Task Force defines storage standards for long-term
digital information retention [4], and Object Storage Devices (OSD) enable the creation of self-managed,
heterogeneous, shared, and secure storage by moving low-level storage functions into the storage device itself [5].

2.2 Preservation Planning

Preservation planning is a consistent workflow that leads to a preservation plan. Its first steps analyse which
solution to adopt, taking into account preservation policies, legal obligations, organisational and technical
constraints, user requirements, and preservation goals. The next steps describe the preservation context, evaluate
the candidate preservation strategies, and record the resulting decision together with its reasoning. Preservation
planning ensures that preservation decisions are repeatable and backed by solid evidence [1].

2.3 Related Project

Plato is a preservation planning tool [6] that serves as a reference implementation of the planning workflow. It is a
Web-based application; version 2.0 was released on 12 November 2008. Plato documents the planning process and ensures
that all steps are considered. It automates several steps and creates a preservation plan (XML, PDF). Its technical
basis is Enterprise JavaBeans (EJB 3, with Hibernate) on the JBoss Application Server, the JBoss Seam integration
framework, JavaServer Faces with Facelets, and XML import/export (XStream). The main functions of Plato are to assist
in analysing the collection through profiling and through analysis of sample objects via PRONOM and other services.
Plato also allows the creation of an objective tree within the application or via import of mind maps, and allows the
selection of preservation action tools. In addition, it runs experiments and documents the results, allows the
definition of transformation rules and weightings, performs evaluation and sensitivity analysis, and provides a
recommendation (a ranking of solutions).
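
The weighting, evaluation, and ranking steps described above can be illustrated with a small weighted-sum sketch. This
is not Plato's code or API. The criteria names, weights, and scores are hypothetical, and the scores are assumed to
have already been transformed to a common 0 to 5 scale.

```python
from typing import Dict, List, Tuple

# Hypothetical criteria weights (assumed to sum to 1.0); not taken from Plato.
WEIGHTS: Dict[str, float] = {
    "content_preserved": 0.5,
    "format_openness": 0.3,
    "cost": 0.2,
}

# Hypothetical scores per alternative, already on a common 0-5 scale.
SCORES: Dict[str, Dict[str, float]] = {
    "migration": {"content_preserved": 4.0, "format_openness": 5.0, "cost": 3.0},
    "emulation": {"content_preserved": 5.0, "format_openness": 2.0, "cost": 2.0},
}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Aggregate per-criterion scores into one value using a weighted sum."""
    return sum(weights[name] * scores[name] for name in weights)


def rank_alternatives(
    all_scores: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (alternative, score) pairs sorted from highest to lowest score."""
    ranked = [(alt, weighted_score(s, weights)) for alt, s in all_scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_alternatives(SCORES, WEIGHTS):
        print(f"{name}: {score:.2f}")
```

A sensitivity analysis in this setting would amount to varying the weights slightly and checking whether the ranking
changes.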


3.0 METHODOLOGIES

This research followed several methodological steps:
1. Identification
Identification consists of describing the context (institutional settings, legal obligations, user groups, target
community, and organizational constraints). The parameters in this step are the New Collection Alert (NCA),
Changed Collection Profile Alert (CPA), Changed Environment Alert (CEA), Changed Objective Alert (COA), and
Periodic Review Alert (PRA).
2. Status
This step registers the digital file and defines its file name, file context, reference, provenance, and fixity.
Preservation planning also defines the representation information model, which consists of the information object
model and the representation information network model.
3. Description of Institutional Setting
Describe the institutional settings: legal obligations, user groups/community, and organizational constraints.
4. Description of Collection
Describe the types of objects, the number of objects, and the formats.
5. Requirements for Preservation
Based on the identification, status, description of institutional settings, and description of the collection,
preservation planning offers options for preservation strategies (recommended by the preservation planning).
6. Evidence for Preservation Strategy
Collect the evidence for each preservation strategy that is implemented. This step collects and identifies the
object characteristics, record characteristics, process characteristics, and costs.
7. Cost
Calculate the cost of each preservation planning option. This is needed to identify the resources spent on each
option.
8. Trigger for Re-evaluation
Each option in preservation planning needs to be evaluated. The evaluation is a deliberate step for deciding whether
it will be useful and cost-effective to continue the given procedure, considering the resources to be spent (people,
money), the availability of tools and solutions, and the expected result(s). It also reviews the design of the
experiment/evaluation process so far and answers the question: is the design complete, correct, and optimal?
9. Preservation Action Plan
Run the preservation action based on the recommended preservation plan. This step plans each experiment: the steps
to build and test software components, the hardware set-up, procedures and preparation, parameter settings, and the
capturing of measurements.
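
As a rough illustration of the Identification and Status steps, the sketch below models the five alerts and a file
registration record. The class and function names are hypothetical, the use of SHA-256 for fixity is an assumed
choice, and treating any raised alert or a fixity mismatch as a reason to re-evaluate is likewise an assumption rather
than the procedure used in this research.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Alert(Enum):
    """Identification-step alerts, using the abbreviations from the list above."""
    NCA = "New Collection Alert"
    CPA = "Changed Collection Profile Alert"
    CEA = "Changed Environment Alert"
    COA = "Changed Objective Alert"
    PRA = "Periodic Review Alert"


@dataclass
class RegisteredFile:
    """Hypothetical registration record for the Status step."""
    file_name: str
    context: str
    reference: str
    provenance: str
    fixity: str  # SHA-256 hex digest of the content (assumed algorithm)


def compute_fixity(content: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(content).hexdigest()


def register(file_name: str, content: bytes, context: str,
             reference: str, provenance: str) -> RegisteredFile:
    """Create a registration record and store the fixity value with it."""
    return RegisteredFile(file_name, context, reference, provenance,
                          compute_fixity(content))


def needs_reevaluation(record: RegisteredFile, current_content: bytes,
                       alerts: Iterable[Alert]) -> bool:
    """Assumed rule: re-evaluate if any alert is raised or the fixity differs."""
    return bool(list(alerts)) or compute_fixity(current_content) != record.fixity
```

Storing the digest at registration time makes later integrity checks a simple comparison, independent of where the
file is stored.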

4.0 KNOWLEDGE BASE MAPPING FOR PRESERVATION PLANNING

The research resulted in the design and implementation of Knowledge Base Mapping for Preservation Planning
techniques. The result focuses on managing the preservation planning processes used to select preservation strategies
for long-term archiving and access. Knowledge Base Mapping for Preservation Planning was developed on the basis of the
OAIS processes, to ensure that the preservation processes follow current standards, and on the basis of preservation
development methodologies, for compatibility with other preservation planning tools.

Knowledge Base Mapping consists of four layers: attributes of digital repositories, attributes of archived
collections, tools and technologies, and economic and policy models.


Table 1 : Knowledge Base Mapping for Preservation Planning


Layer                   Elements
DIGITAL REPOSITORIES    Data Model Driven Architecture
                        Controlled Access Repositories
                        Archives of temporally changing data
                        Archives of derived data products
                        Archives of evolving data
                        Re-purposing of Archives
ARCHIVED COLLECTIONS    Selection and preservation of complex digital objects
                        Aggregation of item objects into collections
                        Decision models for selection
                        Resolution of naming hierarchy
TOOLS & TECHNOLOGY      Acquisition and Ingest
                        Naming & Authorization
                        Decision Models & Metrics
                        Standards & Interoperability
CONTENT                 Long Term Preservation of Digital Information
                        Deposit of digital content into archives
                        Metrics
                        Intellectual Capital

Knowledge Content
Notes:

1. Attributes of Digital Repositories
The development of infrastructures for digital archiving is strongly driven by the need to support
multiple communities. Each community has unique requirements that will influence the design of
the digital archive.
2. Attributes of Archived Collections
The archived collections have additional attributes that enhance their quality, utility,
trustworthiness, and longevity. The archival collections are created through curatorial processes
that include selection, organization, description, and quality control, and they require individuals or
organizational entities that will take on formal responsibility for long-term stewardship.
3. Tools and Technology
Human labor is the most expensive component of digital archiving systems. Therefore, research
and development of better archiving tools and technologies will not only make digital archives more
robust and reliable, it will also drive down the costs of this endeavor.
4. Policy and Economic Models
Even the most effective tools and technology will be useless without a policy and economic
environment that is conducive to long-term preservation. The area of policy and economic models
is ripe for research.
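
The layered mapping in Table 1 can be pictured as a simple lookup structure. The sketch below is only an
illustration: the keys are the row labels of Table 1, each list holds a subset of that row's entries, and the function
name `layer_of` is hypothetical.

```python
from typing import Dict, List, Optional

# Row labels from Table 1, each with a subset of the entries listed in that row.
KNOWLEDGE_BASE_MAP: Dict[str, List[str]] = {
    "DIGITAL REPOSITORIES": [
        "Data Model Driven Architecture",
        "Controlled Access Repositories",
    ],
    "ARCHIVED COLLECTIONS": [
        "Selection and preservation of complex digital objects",
        "Resolution of naming hierarchy",
    ],
    "TOOLS & TECHNOLOGY": [
        "Acquisition and Ingest",
        "Standards & Interoperability",
    ],
    "CONTENT": [
        "Deposit of digital content into archives",
        "Intellectual Capital",
    ],
}


def layer_of(element: str) -> Optional[str]:
    """Return the layer that lists the element (case-insensitive), or None."""
    wanted = element.strip().lower()
    for layer, elements in KNOWLEDGE_BASE_MAP.items():
        if any(wanted == entry.lower() for entry in elements):
            return layer
    return None


if __name__ == "__main__":
    print(layer_of("Acquisition and Ingest"))
```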
The research ran an experiment on preserving the following types of geoscience data from petroleum data:
1. [A] Two-dimensional seismic section with well logs plotted on the seismic data.
2. [B] Gas chromatograph plots derived from analyses of oil samples.
3. [C] Down-hole profile of hydrocarbon content of fluid inclusions.
4. [E] Three-dimensional seismic data cube.
5. [F] Well-log plot from an industry drill well showing rock type and possible hydrocarbon-rich layers.
6. [G] Outcrop of sedimentary rocks and a geological cross section (upper image) derived from data collected
at that outcrop


Fig. 2 : Geoscience Data/File Preserved


Based on the design and implementation of Knowledge Base Mapping for Preservation Planning and on the geoscience data
files collected, the following preservation planning scheme was captured:

Table 2: Preservation Planning Scheme for Well-log Plot


Element Name            Description/Remarks
Permit No               Well permit number of the Alabama State Oil and Gas Board (ASOGB)
API_Number              API assigned code number
Well Name               Well name
Field                   Name of the oil or gas field
County                  County name
Section                 Component of local coordinate scheme
Township                Component of local coordinate scheme
Range                   Component of local coordinate scheme
Dist1                   Distance 1 (N-S)
Dist2                   Distance 2 (E-W)
Latitude                Coordinate location of well
Longitude               Coordinate location of well
Derived Lat/Long        Approximate coordinate; values were derived from either nearby wells (the field shows the
                        Permit No. of the well used for approximation) or an interpolated value from the closest
                        Township or Range
Well Logs               List (code) of available well logs: ca=caliper, son=sonic, iel=induction electric, etc.
Mudlog                  TRUE if mudlog analysis/cuttings for well on file with ASOGB
Cores                   TRUE if core samples from well filed with ASOGB
Cuttings                TRUE if washed cuttings from well filed with ASOGB
Core_analysys           TRUE if core analysis for well on file with ASOGB
Production_data         TRUE if production data for well on file with ASOGB
Well Status             Coded status of the well (for example, plugged and abandoned = P&A)
Core_Fnumb              ASOGB core collection number
Core Type               Values are: ch=chips, pl=plugs, pt=parts of whole core, sl=slabs, sw=sidewalls, wh=whole,
                        fwc=filed with cuttings
Core Analysis           CA = Core analysis performed ?
Core Quantity           In number of boxes
Cores1                  Cored intervals (and diameter ?)
Cores2                  Other core information
Sample_Fnumb            ???
Sample_Type             ???
Sample_Analysis         ???
Pool                    Name of the formation or oil reservoir
Preservation Strategies E=Emulation, M=Migration, V=Virtualization
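
One way to read Table 2 is as a record type whose last element carries the chosen preservation strategy codes. The
sketch below covers only a subset of the elements; the class name, the Python-style field names, and the sample values
are hypothetical and do not describe a real well.

```python
from dataclasses import dataclass, field
from typing import List

# Strategy codes as listed in the last row of Table 2.
STRATEGY_CODES = {"E": "Emulation", "M": "Migration", "V": "Virtualization"}


@dataclass
class WellLogPlan:
    """Subset of the Table 2 elements; field names are adapted for Python."""
    permit_no: str
    api_number: str
    well_name: str
    latitude: float
    longitude: float
    well_logs: List[str] = field(default_factory=list)   # e.g. "ca", "son", "iel"
    strategies: List[str] = field(default_factory=list)  # codes "E", "M", "V"

    def strategy_names(self) -> List[str]:
        """Expand strategy codes to names, rejecting codes not in Table 2."""
        unknown = [code for code in self.strategies if code not in STRATEGY_CODES]
        if unknown:
            raise ValueError(f"unknown preservation strategy codes: {unknown}")
        return [STRATEGY_CODES[code] for code in self.strategies]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    plan = WellLogPlan("0000", "00-000-00000", "Example Well", 0.0, 0.0,
                       well_logs=["ca", "son"], strategies=["M", "E"])
    print(plan.strategy_names())
```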


5.0 DISCUSSIONS

Preservation planning shares many requirements with well-designed information systems, such as security,
authentication, robust models for representation, and sophisticated information retrieval mechanisms. Nevertheless,
unique long-term preservation requirements raise many interesting research questions that demand innovative
solutions. Redundancy, replication, and security against intentional attacks on archival systems and against
technological failures are critical requirements for long-term preservation, as are issues of forward migration. Future
users of digital archives will have different needs, expectations, technologies and analytical tools from those of the
communities that created the digital content initially. Another factor that distinguishes digital preservation research
from many other types of research is the difficulty of knowing whether or not we have solved the problems
successfully, because the ultimate test of success will be the new knowledge and discoveries that result at some
future date. This problem requires some very challenging thinking about success measures and evaluation criteria,
and it will demand an extended research effort over the next decade.

6.0 CONCLUSIONS

Knowledge Base Mapping for Preservation Planning has ensured “optimal” preservation strategies for many types of
digital objects, in this case geoscience data files. It is a simple, methodologically sound model for specifying and
documenting requirements. The knowledge base mapping technique provides a repeatable and documented evaluation for
each step of preservation planning, a basis for well-informed, accountable decisions, a concretization of the OAIS
model, and a generic workflow that can easily be integrated into different institutional settings.

REFERENCES

[1] S Cohen, D Naor, L Ramati, P Reshef, Toward OAIS-Based Preservation Aware Storage. IBM Haifa Labs,
November 2006.

[2] ISO 14721:2003, Blue Book. Issue 1. CCSDS 650.0-B-1: Reference Model for an Open Archival Information
System (OAIS), 2002.

[3] SNIA – Storage Networking Industry Association, Data Management Group. XAM (Extensible Access Method). See
http://www.snia-dmf.org/xam

[4] Galen Schreck. Building the 100-Year Archive, Forrester, 2005

[5] SNIA – Storage Networking Industry Association. OSD : Object Based Storage Devices Technical Work
Group.

[6] Digital Preservation Working Group, Faculty of Informatics, Vienna University of Technology. See
http://www.ifs.tuwien.ac.at/dp

[7] Brian F. Lavoie. The Open Archival Information System Reference Model: Introductory Guide. DPC
Technology Watch Report 04-01, 2004

[8] Andreas Rodriguez et al. Supporting e-Research Using Representation Information. Proceedings of the UK e-
Science, Nottingham, UK, 2005

[9] OCLC/RLG Working Group on Preservation Metadata. Preservation Metadata and the OAIS Information
Model: A Metadata Framework to Support the Preservation of Digital Objects, 2002

BIOGRAPHY

Mardhani Riasetiawan is a PhD-by-research student in the Computer and Information Science Department, Universiti
Teknologi PETRONAS. He is currently working on knowledge management research, focusing on knowledge-based
digital preservation. He obtained his Master of Engineering from Universitas Gadjah Mada, Indonesia, in 2007.

