
Developing a Distributed Data Dictionary Service

Jim U’Ren
Jet Propulsion Laboratory
California Institute of Technology
Design Hub, KM Standards Working Group & EDA Team

April 11, 2002


Problem
• 1. Data dictionaries mean different things to
different people:
• Vocabularies - human readable collections of terms
and definitions pertaining to a domain
• Data element dictionaries - machine-interpretable
collections of data elements (from ISO/IEC 11179)
• Schemas (information models) - structured, machine-interpretable
collections of information models
consisting of structured relationships between data
elements
• 2. Dictionaries do not communicate with each
other
Developing A Distributed Data
2001-07-30 Dictionary Service 2
What is Needed
• A mechanism that can be used to access,
publish, update, relate and integrate data
dictionaries (vocabularies, data elements,
and data models)
• Mechanism must be able to span domains
and subdomains, e.g., engineering, science,
and administrative
• Mechanism must have both manual and
automated interfaces
• Mechanism should follow the distributed
service model (e.g., DNS (the Internet Domain Name
System), X.500 Directory, etc.)
A Solution
• Develop a distributed data dictionary “service”
using:
• LDAP Internet service protocol (Lightweight Directory Access
Protocol)
• ISO/IEC 11179 - a specification for standard data
elements
• DSML XML DTD/Schema (Directory Services Markup Language)
• Dublin Core Meta-data
• The service will store and relate vocabulary, data
element, and data model information

Advantages of LDAP
❚ LDAP has many advantages, including:
• Universal Access - Internet directory standard, widely
adopted and implemented by numerous vendors and
open source software solutions
• Simple - a relatively simple, high-level protocol with a
straightforward API
• Extensible - easily extended and adapted
• Access Control and Security - connections can be
authenticated and secured using layered Internet security
mechanisms
• Multi-Platform Development - C/C++, Perl, Java,
JavaScript, Python, PHP and other APIs are available,
making LDAP services accessible from virtually any
language, platform, or development environment
What is LDAP?
❙ An Internet Standard – from an IETF working group
❘ RFC 1777 Lightweight Directory Access Protocol
❘ RFC 1778 String Representation of Standard Attribute Syntaxes
❘ RFC 1779 String Representation of Distinguished Names
❘ RFC 1959 LDAP URL Format
❘ RFC LDAP API

❙ A distributed, hierarchical database
❙ Uses a multi-part naming convention to create
unique records (“distinguished names”)
❘ cn=behaviour, dc=vocabulary, dc=Part233, dc=10303, dc=ISO
❘ cn=requirement_set, dc=data-element, dc=Part233, dc=10303, dc=ISO
❘ cn=TBR-apha1, dc=schema, dc=Part233, dc=10303, dc=ISO
❙ Includes the ability to implement multiple levels of
security
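
The distinguished names above read from the most specific component (the entry itself) to the root of the hierarchy. A minimal Python sketch of splitting such a name into its components is shown below. It is an illustration only: the helper names `parse_dn` and `path_from_root` are hypothetical, and the parsing is deliberately simplified (it does not handle escaped commas or multi-valued components, which a real LDAP library would).

```python
def parse_dn(dn: str) -> list[tuple[str, str]]:
    """Split a distinguished name into (attribute, value) pairs, entry first.

    Simplified on purpose: assumes no escaped commas and one
    attribute=value pair per component.
    """
    pairs = []
    for component in dn.split(","):
        attribute, _, value = component.strip().partition("=")
        pairs.append((attribute.strip().lower(), value.strip()))
    return pairs


def path_from_root(dn: str) -> list[str]:
    """Return the component values ordered from the tree root down to the entry."""
    return [value for _, value in reversed(parse_dn(dn))]


dn = "cn=requirement_set, dc=data-element, dc=Part233, dc=10303, dc=ISO"
print(parse_dn(dn)[0])     # ('cn', 'requirement_set')
print(path_from_root(dn))  # ['ISO', '10303', 'Part233', 'data-element', 'requirement_set']
```

Reversing the components gives the path a client would walk through the directory tree, starting at ISO and ending at the entry.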

Example of an LDAP tree

ISO
    9000
    10303
        203  209  210  . . .  233  . . .  235  237
            (under 233)  Vocabulary  |  Data Elements  |  Schema
    14496
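
Each path through this tree corresponds to one distinguished name. The sketch below builds a name from a root-to-entry path and wraps it in an LDAP URL of the general form `ldap://host:port/dn`. It is a hedged illustration: the function names are hypothetical, the host `dd.example.org` is a placeholder rather than a real service, and the component spellings (`Part233`, `vocabulary`) follow the distinguished-name examples given earlier.

```python
from urllib.parse import quote


def dn_from_path(path: list[str], leaf_attr: str = "cn") -> str:
    """Build a distinguished name from a root-to-entry path.

    Assumption for this sketch: every container uses 'dc' and the
    final entry uses 'cn', as in the examples in this presentation.
    """
    *containers, leaf = path
    parts = [f"{leaf_attr}={leaf}"] + [f"dc={c}" for c in reversed(containers)]
    return ",".join(parts)


def ldap_url(host: str, dn: str, port: int = 389) -> str:
    """Return an LDAP URL that points at the entry (389 is the standard LDAP port)."""
    return f"ldap://{host}:{port}/{quote(dn, safe='=,')}"


dn = dn_from_path(["ISO", "10303", "Part233", "vocabulary", "behaviour"])
print(dn)
# cn=behaviour,dc=vocabulary,dc=Part233,dc=10303,dc=ISO
print(ldap_url("dd.example.org", dn))
# ldap://dd.example.org:389/cn=behaviour,dc=vocabulary,dc=Part233,dc=10303,dc=ISO
```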

Advantages of ISO 11179
• An established international standard
• Widely supported - US Census Bureau, NIST,
Defense Information Systems Agency, Environmental
Security, DoE, DoJ, Bureau of Labor Statistics, DoT, EPA,
etc.
• Flexible use of elements within the schema
• Easily implemented in an LDAP directory service -
LDAP servers are flexible and easily configured, which
suits the flexible 11179 schema

Data Dictionary Components
for a given “namespace”

A Distributed Data Dictionary Service
using Standards-based technology
LDAP Protocol | ISO 11179 meta-data schema | DSML |
Dublin Core
Prototype service viewable at:
http://step.jpl.nasa.gov/ldap

• Supporting Validation Scenarios
• Supporting Automated Processes
• Supporting Terminology Lookups
• Supporting Data Modeling Activities
A Proposed Data Element Naming Convention

• A structured, multi-part naming system
• similar to IP addressing and URLs
• “dot”-delimited names
• follows the convention used by the Dublin Core Meta-data
Initiative
• short-name aliases could be supported in the
planned distributed data dictionary service
• e.g., author = DC.Creator, keyword = DC.Subject, etc.
• Names would consist of domains, descriptors,
and qualifiers.

Examples of the Data Element Naming
Convention within JPL Domains
• Dublin Core Meta-data Initiative (a JPL adopted standard)
• DC.Date
• DC.Date.Created
• DC.Date.LastModified
• JPL’s Planetary Data System (PDS)
• PDS.Target_Name
• PDS.Sampling_Factor
• JPL’s Product Data Management System (PDMS)
• PDMS.Version
• PDMS.ReferenceDesignator
• JPL New Business System (NBS)
• NBS.HR.start_date
• NBS.HR.employee_status
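
The examples above can be split mechanically into a domain, a descriptor, and optional qualifiers. The Python sketch below shows one possible way to do that, including the short-name aliases mentioned earlier. Several parts are assumptions made for illustration: the presentation does not say how a subdomain such as `HR` is told apart from a descriptor, so the sketch uses a hypothetical registry of known domains (`KNOWN_DOMAINS`) and picks the longest matching prefix; `ElementName` and `parse_name` are likewise hypothetical names.

```python
from dataclasses import dataclass

# Short-name aliases taken from the naming-convention examples.
ALIASES = {"author": "DC.Creator", "keyword": "DC.Subject"}

# Hypothetical registry of domains and subdomains (an assumption of this sketch).
KNOWN_DOMAINS = {("DC",), ("PDS",), ("PDMS",), ("NBS",), ("NBS", "HR")}


@dataclass(frozen=True)
class ElementName:
    domain: str
    descriptor: str
    qualifiers: tuple[str, ...] = ()


def parse_name(name: str) -> ElementName:
    """Resolve an alias if there is one, then split the dot-delimited name."""
    parts = tuple(ALIASES.get(name, name).split("."))
    if not all(parts):
        raise ValueError(f"empty component in {name!r}")
    # Try the longest domain prefix first, always leaving one part for the descriptor.
    for cut in range(len(parts) - 1, 0, -1):
        if parts[:cut] in KNOWN_DOMAINS:
            return ElementName(".".join(parts[:cut]), parts[cut], parts[cut + 1:])
    raise ValueError(f"no known domain in {name!r}")


print(parse_name("DC.Date.Created"))
# ElementName(domain='DC', descriptor='Date', qualifiers=('Created',))
print(parse_name("NBS.HR.start_date"))
# ElementName(domain='NBS.HR', descriptor='start_date', qualifiers=())
print(parse_name("author"))
# ElementName(domain='DC', descriptor='Creator', qualifiers=())
```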
Terminology Lookup Scenarios
• Resolving Ambiguous Terminology - an end user who needs to clarify the use and
meaning of a word in a specific context performs a multi-domain vocabulary
lookup across multiple data dictionary (DD) services, looking for the published
vocabulary of the referenced domain
• Finding the Correct Acronym - an end user, confronted with a number of new
acronyms in a presentation, accesses a local DD service to look up the
acronyms within the probable domains, thereby eliminating alternative
meanings, e.g., searching for STEP standards work versus the JPL STEP project
• Enabling Improved Search Engine Performance - as a search engine scans
a document, it discovers a keyword list and finds a “reserved word”; the
document includes a reference to a domain-specific vocabulary list in a DD service;
the search engine uses this vocabulary to be certain it is indexing the keywords in
the right context
• Building Glossaries for Technical Papers - an engineer or scientist writing a
technical paper needs to include a glossary of relevant terms; by
performing a multi-service search, terms and definitions that relate to the topic of the
paper are quickly found and inserted into the paper with the corresponding
attributions
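
The lookup scenarios above share one pattern: search several services for a term and, optionally, narrow the result to the probable domains. A minimal Python sketch of that pattern follows. It is an assumption-laden illustration: the services are plain in-memory dictionaries standing in for real directory queries, the service names are hypothetical, and the JPL definition is a placeholder rather than real published vocabulary.

```python
# Hypothetical stand-ins for DD services: service -> domain -> {term: definition}.
SERVICES = {
    "service-a": {
        "ISO 10303": {"STEP": "Standard for the Exchange of Product model data"},
    },
    "service-b": {
        "JPL": {"STEP": "(placeholder for the JPL STEP project definition)"},
    },
}


def lookup(term, services, domains=None):
    """Return (service, domain, definition) for every published match.

    If `domains` is given, only those domains are searched, which is
    how the acronym scenario rules out alternative meanings.
    """
    hits = []
    for service_name, vocabularies in services.items():
        for domain, vocabulary in vocabularies.items():
            if domains is not None and domain not in domains:
                continue
            if term in vocabulary:
                hits.append((service_name, domain, vocabulary[term]))
    return hits


# Searching everywhere returns both meanings; restricting the domain disambiguates.
print(len(lookup("STEP", SERVICES)))                  # 2
print(lookup("STEP", SERVICES, domains={"ISO 10303"}))
```

Keeping the service and domain in each result also covers the glossary scenario, where definitions are inserted with their attributions.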

Validation Scenarios
• Validating Units of Measure - a system integrator receives an MCAD geometry
model (e.g., a STEP AP203 Part 21 file) of a component to be integrated into an
assembly; automatically, a standard validation routine is performed against the
schema located in a referenced data dictionary, checking for use of the units of
measure called for in the contract and identified in the exchange file
• Enabling Automated Repository Check-In - as a STEP model is checked into a
PDM system, an automated validation routine checks the model using the schema
(located in the DD service) that is identified in the Part 21 data file
• Improving Quality of Data Handoffs - an MCAD geometry model is sent from
design to thermal analysis, and validation is performed using the correct schema
version as referenced in the model; validation is an automated process that occurs
before any work is done with the model as it is transferred between domains
• Validating for Adequacy and Range - the PDS (NASA’s Planetary Data System)
central node receives a dataset description in template format to be ingested into the
dataset catalogue database; automatically, a standard validation routine is
performed that checks for required keywords, keyword values, and value types in the
dataset template against a corresponding structure stored in the PDS
domain of the data dictionary service
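
The adequacy-and-range check in the last scenario can be sketched in a few lines of Python. This is a hedged illustration, not the PDS validation routine: the `Rule` structure, the specific rules, and the minimum value are hypothetical, and only the keyword names `Target_Name` and `Sampling_Factor` come from the naming examples in this presentation. In the envisioned service, the rules would be retrieved from the PDS domain of the data dictionary rather than hard-coded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    value_type: type
    required: bool = False
    minimum: Optional[float] = None
    maximum: Optional[float] = None


# Hypothetical rules standing in for a structure stored in the dictionary service.
RULES = {
    "Target_Name": Rule(str, required=True),
    "Sampling_Factor": Rule(float, minimum=0.0),
}


def validate(dataset: dict, rules: dict) -> list[str]:
    """Return a list of problems; an empty list means the dataset passed."""
    problems = []
    for keyword, rule in rules.items():
        if keyword not in dataset:
            if rule.required:
                problems.append(f"missing required keyword: {keyword}")
            continue
        value = dataset[keyword]
        if not isinstance(value, rule.value_type):
            problems.append(
                f"{keyword}: expected {rule.value_type.__name__}, "
                f"got {type(value).__name__}"
            )
            continue
        if rule.minimum is not None and value < rule.minimum:
            problems.append(f"{keyword}: {value} is below minimum {rule.minimum}")
        if rule.maximum is not None and value > rule.maximum:
            problems.append(f"{keyword}: {value} is above maximum {rule.maximum}")
    for keyword in dataset:
        if keyword not in rules:
            problems.append(f"unknown keyword: {keyword}")
    return problems


print(validate({"Target_Name": "MARS", "Sampling_Factor": 2.0}, RULES))  # []
print(validate({"Sampling_Factor": -1.0}, RULES))
# ['missing required keyword: Target_Name', 'Sampling_Factor: -1.0 is below minimum 0.0']
```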

Data Modeling Scenarios
• Data Reuse in Modelling Activities - a data modeller, charged with
developing an information model for a new application, uses data
elements published in several DD services (much like a parts library),
ensuring that the new information model will have compatible interfaces
with data sets that share the same data elements or collections of elements
• Creating a TDP (technical data package) - an application performs a
schema check against objects about to be wrapped into a TDP (e.g., STEP
AP232 or PDM Schema TDP) to ensure their correct structure and meta-data
content
• Data Integration Enabled - an analyst, charged with integrating data
from two or more data sets, accesses the “correct” version of each
schema, as referenced in the data set, from the “DD service space”,
allowing them to identify and map interfaces between the data sets, e.g.,
MCAD-ECAD-cost data
• Extending a Schema - to solve a “local” problem, a data modeller uses
data elements from a published collection of data items to extend an
existing “official” schema; the new schema is published in the DD service
with traces/links back to the “official” schema

What’s next? (Completing the “prototype”)

❚ Architecture development
❙ UML Model (50%)
❙ Naming Convention (50%)
❙ Linking ontology (25%)
❚ Server configuration
❙ 2nd and 3rd DD test nodes (33%)
❙ Wrapping existing DD DBs (10%)
❚ Client configurations
❙ LDAP URL (75%)
❙ Java (33%)
❙ Python (33%)
❙ Perl (33%)
❙ C/C++ (75%)
❙ Unix Shell (25%)
❙ PHP (25%)
❙ Native clients (25%)
❚ Reporting Module
❙ Glossary Builder (0%)
❙ History/CM Report (0%)
❙ Summarized Listings (0%)
❚ Testing (0%)
❙ Scenarios
❙ Server
❙ Clients
❙ Data population
❚ Documentation
❙ White Paper (100%)
❙ FAQ (20%)
❙ Server Configuration Guide (15%)
❙ Recommended Best Practices (10%)