
OCLC - XML Schema Strategy DRAFT

Purpose of this document:


We use XML schemas to define the message payloads that can be exchanged
between clients and our services. This document describes how these schemas
shall be handled, for example where they should be stored and how they can be
imported into development projects where needed.
Why is this necessary? Currently there is no OCLC-wide agreement on how XML
schemas are maintained. A centralized schema repository offers several
advantages:
1. It helps eliminate redundant schemas, for example identical copies of
schemas kept under source control in multiple software projects
(rather than being treated as a dependency).
2. It helps eliminate redundant data type definitions in new schemas by
promoting (requiring) reuse of data types already defined in schemas in
the central repository.
3. It helps eliminate redundant element definitions. Element types such as
language, currency and country should be defined only once
and reused wherever they are needed in other XML schemas.

Schemas as archetypes and dependencies


Ideally, all new OCLC XML schemas should be controlled in their own sub-project
of the XMLSchemas Subversion project. Each sub-project creates its own artifact;
this artifact can then be used as a dependency by software projects and other
schema projects. Having sub-projects that produce versioned release artifacts
(schema versions with unique URLs) allows each schema to be
versioned/released independently. Each schema version can then be defined as a
Maven dependency, for example:
<dependency>
  <groupId>org.oclc.schemas</groupId>
  <artifactId>MySchema</artifactId>
  <version>1.1</version>
</dependency>
As schema releases are deployed as Maven artifacts, their official public URIs
follow the pattern:
http://worldcat.org/xmlschemas/SchemaName/version/SchemaName-version.xsd
Note: The URIs for pre-release or SNAPSHOT versions of XML schemas will
reference internal OCLC Maven repositories. Worldcat.org is a public-facing,
read-only, partial mirror of the OCLC enterprise Archiva repository
(http://svn.dev.oclc.org:10000/archiva/index.action), so when an XMLSchemas
sub-project is released and the schema artifact is deployed to that repository,
it becomes available via http://worldcat.org/xmlschemas/ (it is pulled over on
the first external access request).
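A project that depends on a SNAPSHOT schema has to resolve it from an internal repository rather than from worldcat.org. The following POM fragment is a minimal sketch of that setup. It assumes the snapshot repository URL is the Archiva snapshot location used for schema imports elsewhere in this document; the repository id and the 1.2-SNAPSHOT version are hypothetical, and the actual repository may already be configured in your Maven settings.

```xml
<!-- Sketch only: the repository id and the dependency version are
     hypothetical; verify the URL against your local Maven settings. -->
<repositories>
  <repository>
    <id>oclc-archiva-snapshots</id>
    <url>http://svn.dev.oclc.org:10000/archiva/repository/snapshots</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

<dependencies>
  <!-- Pre-release schema version, resolved from the snapshot repository -->
  <dependency>
    <groupId>org.oclc.schemas</groupId>
    <artifactId>MySchema</artifactId>
    <version>1.2-SNAPSHOT</version>
  </dependency>
</dependencies>
```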

Creating a schema project


Note: The archetype needs to be adjusted to use the schema root.
A new schema sub-project under XMLSchemas (for example XMLSchemas/foo) must be
requested via ServiceNow.
After the project has been created, check out the XML schema project to the
directory foo.
svn co http://svn.dev.oclc.org/svn/XMLSchemas/foo foo
Create a Maven project from the XMLSchema archetype:
mvn archetype:generate \
  -DarchetypeRepository=http://artifactory.dev.oclc.org/artifactory/development-internal \
  -DarchetypeGroupId=org.oclc.maven.archetypes \
  -DarchetypeArtifactId=standalone-schema-archetype \
  -DarchetypeVersion=1.0-SNAPSHOT \
  -Dversion=1.0-SNAPSHOT \
  -DgroupId=org.oclc.schemas \
  -DartifactId=foo
Put the (valid) foo.xsd into foo/trunk/src/main/schemas/foo.xsd.
Create a (valid) foo/trunk/src/main/resources/fooExample.xml instance document.
Adjust trunk/pom.xml as needed.
Run mvn test to verify that the project builds and that the schema and the
example document are valid.
Add the project to Subversion (/tmp/foo contains a list of glob patterns to
ignore, such as .idea and *.iml):
svn add pom.xml src
svn propset svn:ignore -F /tmp/foo .
svn commit -m "whatever"
Run mvn deploy and verify the artifact in Artifactory.
After significant testing, publish the 1.0 release to Archiva by using Self
Service (see "The schema release process").
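As an illustration of what the foo.xsd from the steps above might contain, here is a minimal sketch. The type and element names (FooType, foo, title, count) are hypothetical; only the namespace pattern with the pom.version placeholder follows the conventions described in this document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical content for foo/trunk/src/main/schemas/foo.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://worldcat.org/xmlschemas/foo-${pom.version}"
           targetNamespace="http://worldcat.org/xmlschemas/foo-${pom.version}"
           elementFormDefault="qualified">

  <!-- Named type, so that other schemas can reuse it -->
  <xs:complexType name="FooType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="count" type="xs:nonNegativeInteger" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- The single root element of the schema -->
  <xs:element name="foo" type="FooType"/>
</xs:schema>
```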

The XMLSchemas project structure


If a sub-project is created within the XMLSchemas repository, there are two
structures to be aware of: the Maven project structure and the file directory
structure.

Note: Sub-projects can be created via a Maven archetype:
http://artifactory.dev.oclc.org/artifactory/development-internal/org/oclc/maven/archetypes/standalone-schema-archetype/1.0-SNAPSHOT/standalone-schema-archetype-1.0-SNAPSHOT.pom
POM file structure
All schema sub-projects should define the schema root project as their parent.
The root project contains all overall Maven project settings like plugins and
versions that are common for all the schema projects.
<parent>
  <groupId>org.oclc.schemas</groupId>
  <artifactId>SchemaRoot</artifactId>
  <version>1.1</version>
</parent>
The Maven project structure as a tree looks like this:
- Schema Root Project
  - Schema Project
    - Any optional child projects, e.g. when nested schemas are used
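Put together, the pom.xml of a schema sub-project might look roughly like the sketch below. The artifactId foo and the version are hypothetical, and the POM actually generated by the archetype may contain additional settings; the sketch only shows how the parent reference fits into a complete file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Common plugins and versions are inherited from the schema root -->
  <parent>
    <groupId>org.oclc.schemas</groupId>
    <artifactId>SchemaRoot</artifactId>
    <version>1.1</version>
  </parent>

  <!-- groupId is inherited from the parent; artifactId is hypothetical -->
  <artifactId>foo</artifactId>
  <version>1.0-SNAPSHOT</version>
</project>
```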
File directory structure
The file directory structure is the same as in any other SVN project. The trunk
contains the current SNAPSHOT version, and the releases can be found within
branches and tags.
- schema project
  - trunk
    - src
      - main
        - resources (for test data)
        - schemas (for the schema files)

Working with namespaces


Every service schema should define its own unique namespace identifier
following the pattern http://worldcat.org/xmlschemas/MyNamespaceId, where
MyNamespaceId is a name meaningfully associated with the service.
The namespace definition of a schema ready for release should look like this:
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://worldcat.org/xmlschemas/MyNamespaceId-${pom.version}"
    targetNamespace="http://worldcat.org/xmlschemas/MyNamespaceId-${pom.version}">

The pom.version placeholder is replaced by Maven during the build process.
The schema's file name and its namespace identifier should also be
consistent, meaning they should be the same.
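To show how the namespace and the file name line up after the build, here is a sketch of an instance document for a hypothetical schema named foo released as version 1.0. The root element foo and its child title are assumptions made for illustration, as is the use of qualified local elements; the namespace and the schema URI follow the patterns given in this document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance document; element names are illustrative only.
     xsi:schemaLocation pairs the namespace with the public schema URI. -->
<foo xmlns="http://worldcat.org/xmlschemas/foo-1.0"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://worldcat.org/xmlschemas/foo-1.0
                         http://worldcat.org/xmlschemas/foo/1.0/foo-1.0.xsd">
  <title>Example</title>
</foo>
```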

Working with imports


When working with nested schemas it becomes necessary to define import,
include or redefine statements within the schemas. If the schema that is
imported has not been released yet, it needs to be imported from its snapshot
repository location rather than the release repository location
(http://worldcat.org/xmlschemas/...). This is done using the schemaLocation
attribute value. Note that the namespace attribute value URI should always
follow the http://worldcat.org/xmlschemas/... pattern, even during the pre-release
(snapshot) phase of the schema's development.
<xs:import
    schemaLocation="http://svn.dev.oclc.org:10000/archiva/repository/snapshots/org/oclc/schemas/MyNamespaceId/${version}/MyNamespaceId-${version}.xsd"
    namespace="http://worldcat.org/xmlschemas/..."/>
Before or after the imported schema has been released, depending on the
schema, the import must be switched to the worldcat.org schema location.
<xs:import
    schemaLocation="http://worldcat.org/xmlschemas/MyNamespaceId/${version}/MyNamespaceId-${version}.xsd"
    namespace="http://worldcat.org/xmlschemas/..."/>
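An import is only useful once the imported namespace is bound to a prefix and its types are referenced. The sketch below shows this for a released dependency. All names in it (Order, CommonTypes, CurrencyCodeType, OrderType) are hypothetical and stand in for whatever schemas are actually involved.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical importing schema; all schema and type names are assumptions -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://worldcat.org/xmlschemas/Order-${pom.version}"
           xmlns:common="http://worldcat.org/xmlschemas/CommonTypes-1.0"
           targetNamespace="http://worldcat.org/xmlschemas/Order-${pom.version}"
           elementFormDefault="qualified">

  <!-- namespace identifies the imported schema; schemaLocation says where to fetch it -->
  <xs:import namespace="http://worldcat.org/xmlschemas/CommonTypes-1.0"
             schemaLocation="http://worldcat.org/xmlschemas/CommonTypes/1.0/CommonTypes-1.0.xsd"/>

  <xs:complexType name="OrderType">
    <xs:sequence>
      <!-- Type taken from the imported schema via the common prefix -->
      <xs:element name="currency" type="common:CurrencyCodeType"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="order" type="OrderType"/>
</xs:schema>
```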

Schema versioning:
The basic schema version numbering should be in the form of:
A.B[-SNAPSHOT]
Here A is the major version number, which only changes when a significant
change to the service is made. B is the minor version number, which changes
when a minor enhancement to the service requires a minor change to the schema.
SNAPSHOT is used during the development phase and marks a schema as a work
in progress.
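As a worked example of this numbering, the version element in a schema project's pom.xml might take the following values over time. The concrete numbers are illustrative only; each line shows the value at a different point in the project's life, not the content of a single file.

```xml
<!-- Development of the first release: work in progress -->
<version>1.0-SNAPSHOT</version>

<!-- First release -->
<version>1.0</version>

<!-- Minor enhancement under development (B incremented) -->
<version>1.1-SNAPSHOT</version>

<!-- Significant change to the service released (A incremented, B reset) -->
<version>2.0</version>
```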

The Common Schema Projects


Within XMLSchemas there are so-called Common projects. These projects
contain base schemas or base element type definitions to be reused within other
schemas. (These are not yet fully mature, widely vetted, or comprehensive, but
they are a good start.) This eliminates the need to maintain redundant
information (type and element definitions) throughout other schema projects and
allows enterprise-level changes to be made in a minimal set of schemas.
Common Schemas
Common schemas are like abstract Java classes: they can be used as templates
from which other schemas inherit.
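In XML Schema terms, this kind of inheritance is typically expressed with an abstract complex type and xs:extension. The following is a minimal sketch, not an actual OCLC common schema: the names CommonMessage, BaseMessageType and RenewalRequestType are hypothetical, and for brevity the base and the derived type live in one schema, whereas in practice the derived type would sit in a service schema that imports the common one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://worldcat.org/xmlschemas/CommonMessage-1.0"
           targetNamespace="http://worldcat.org/xmlschemas/CommonMessage-1.0"
           elementFormDefault="qualified">

  <!-- Abstract base type: cannot be used directly in an instance document -->
  <xs:complexType name="BaseMessageType" abstract="true">
    <xs:sequence>
      <xs:element name="messageId" type="xs:string"/>
      <xs:element name="timestamp" type="xs:dateTime"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Derived type: inherits messageId and timestamp, then adds itemId -->
  <xs:complexType name="RenewalRequestType">
    <xs:complexContent>
      <xs:extension base="BaseMessageType">
        <xs:sequence>
          <xs:element name="itemId" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="renewalRequest" type="RenewalRequestType"/>
</xs:schema>
```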

Common Types
Common types can be reused and extended in multiple schemas, and have the
same meaning throughout OCLC. Types such as Currency, Language and Country
are good examples.
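A common types schema could look something like the sketch below. The schema name CommonTypes, the type names and the chosen restrictions are assumptions for illustration and are not taken from the existing Common projects.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://worldcat.org/xmlschemas/CommonTypes-1.0"
           targetNamespace="http://worldcat.org/xmlschemas/CommonTypes-1.0">

  <!-- Defined once here, referenced from every schema that needs a currency -->
  <xs:simpleType name="CurrencyCodeType">
    <xs:annotation>
      <xs:documentation>Three-letter upper-case currency code, for example USD or EUR.</xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{3}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Thin wrapper around the built-in xs:language type -->
  <xs:simpleType name="LanguageCodeType">
    <xs:restriction base="xs:language"/>
  </xs:simpleType>
</xs:schema>
```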

Schema guidelines
Some guidelines for writing good schema definitions:

- To have only one root element in your schema, avoid using ref="" to
  reference other elements.
- Work with named types where you can. (Avoid inlining a type definition
  anonymously within an element definition.)
- Document your elements with annotation/documentation.
- Do NOT keep copies of XML schemas under source control in other projects!
  (This allows them to drift out of sync without people realizing it.)
  Include them as proper dependencies and use their URLs.
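The first three guidelines can be sketched in one small schema. The Patron example and all of its element and type names are hypothetical; the point is only the shape: one global element, named types instead of anonymous ones, and documentation annotations.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://worldcat.org/xmlschemas/Patron-1.0"
           targetNamespace="http://worldcat.org/xmlschemas/Patron-1.0"
           elementFormDefault="qualified">

  <!-- Named, documented type instead of an anonymous inline definition -->
  <xs:complexType name="AddressType">
    <xs:annotation>
      <xs:documentation>Postal address of a patron.</xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="PatronType">
    <xs:annotation>
      <xs:documentation>A patron as exchanged between client and service.</xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <!-- Local elements typed by name; no ref to other global elements -->
      <xs:element name="name" type="xs:string"/>
      <xs:element name="address" type="AddressType" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- The only global element, and therefore the only possible root -->
  <xs:element name="patron" type="PatronType"/>
</xs:schema>
```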

The schema release process


To release schema projects you need them set up in the Middleware Self Service
tool. This is a straightforward process:
1. Log in to the Middleware Self-Service System.
2. Select the schema you want to release.
3. Create a branch and build the project.
4. In the branch, change all import statements (if there are any) from the
Archiva location to worldcat.org.
5. Tag the branch.

After these steps the schema is available under http://worldcat.org/xmlschemas/.

Importing schemas to service projects


There are multiple ways to import schemas into service projects. The simplest
is to define a Maven dependency, but in most cases the schema will also be
used by tools like JAXB to generate binding classes.
This can be done easily with Maven and the maven-jaxb2-plugin. Basically there
are three ways to do this. The first two work well for simple, non-nested
schemas (compare http://confluence.highsource.org/display/MJIIP/User+Guide).
Compiling Schema from a URL
<configuration>
  <forceRegenerate>true</forceRegenerate>
  <schemas>
    <schema>
      <url>http://worldcat.org/xmlschemas/MySchemaName/${version}/MySchemaName-${version}.xsd</url>
    </schema>
  </schemas>
</configuration>
Compiling (generating code from) the schema from a Maven artifact
<configuration>
  <forceRegenerate>true</forceRegenerate>
  <schemas>
    <schema>
      <dependencyResource>
        <groupId>org.oclc.schemas</groupId>
        <artifactId>MySchema</artifactId>
        <!-- Can be defined in project dependencies or dependency management -->
        <version>${project.version}</version>
        <resource>MySchema.xsd</resource>
      </dependencyResource>
    </schema>
  </schemas>
</configuration>
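The configuration fragments above need to sit inside a maven-jaxb2-plugin declaration in the service project's pom.xml. A minimal sketch of the surrounding plugin element follows. The generated package name and the pinned schema version 1.1 are hypothetical, and a plugin version should be set according to your project's conventions; it is left out here because this document does not name one.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.jvnet.jaxb2.maven2</groupId>
      <artifactId>maven-jaxb2-plugin</artifactId>
      <executions>
        <execution>
          <goals>
            <!-- Runs the schema compiler during the build -->
            <goal>generate</goal>
          </goals>
        </execution>
      </executions>
      <configuration>
        <!-- Hypothetical target package for the generated classes -->
        <generatePackage>org.oclc.example.myschema</generatePackage>
        <schemas>
          <schema>
            <dependencyResource>
              <groupId>org.oclc.schemas</groupId>
              <artifactId>MySchema</artifactId>
              <version>1.1</version>
              <resource>MySchema.xsd</resource>
            </dependencyResource>
          </schema>
        </schemas>
      </configuration>
    </plugin>
  </plugins>
</build>
```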
If nested schemas are used, things get a bit more complicated, as they need to
be compiled in the reverse order of import: the base schema that contains no
imports is compiled first, and the top-level schema is compiled last.
This can be done with a two-step process:
1. Import the artifact with the Maven remote resources plugin.
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-remote-resources-plugin</artifactId>
  <executions>
    <execution>
      <id>process</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>process</goal>
      </goals>
      <configuration>
        <resourceBundles>
          <!-- Add all resources that are referenced from the schema definition -->
          <resourceBundle>org.oclc.schemas.MySchema:MySchema:${version}</resourceBundle>
        </resourceBundles>
      </configuration>
    </execution>
  </executions>
</plugin>
2. Create an execution step for every schema and pipe the created episode
file to the next execution step.
<plugin>
  <groupId>org.jvnet.jaxb2.maven2</groupId>
  <artifactId>maven-jaxb2-plugin</artifactId>
  <executions>
    <execution>
      <id>Common</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>generate</goal>
      </goals>
      <configuration>
        <specVersion>2.2</specVersion>
        <extension>true</extension>
        <schemaDirectory>${project.build.directory}/maven-shared-archive-resources/schemas</schemaDirectory>
        <schemaIncludes>
          <include>MySchema.xsd</include>
        </schemaIncludes>
        <bindingDirectory>${project.basedir}/src/main/xsd</bindingDirectory>
        <bindingIncludes>
          <include>MyBindingFile.xjb</include>
        </bindingIncludes>
        <episode>true</episode>
        <verbose>true</verbose>
        <args>
          <arg>-Xannotate</arg>
          <arg>-XhashCode</arg>
          <arg>-Xequals</arg>
          <arg>-XtoString</arg>
        </args>
        <episodeFile>${project.build.directory}/xjc/my.episode</episodeFile>
        <plugins>
          <plugin>
            <groupId>org.jvnet.jaxb2_commons</groupId>
            <artifactId>jaxb2-basics-annotate</artifactId>
            <version>${jaxb2BasicsVersion}</version>
          </plugin>
          <plugin>
            <groupId>org.jvnet.jaxb2_commons</groupId>
            <artifactId>jaxb2-basics</artifactId>
            <version>${jaxb2BasicsVersion}</version>
          </plugin>
        </plugins>
      </configuration>
    </execution>
    <execution>

      <args>
        <arg>-Xannotate</arg>
        <arg>-XhashCode</arg>
        <arg>-Xequals</arg>
        <arg>-XtoString</arg>
        <arg>-b</arg>
        <arg>${project.build.directory}/xjc/my.episode</arg>
      </args>
.
