
Introduction to XML

John Arnett, MSc


Standards Modeller
Information and Statistics Division
NHSScotland
Tel: 0131 551 8073 (x2073)
mailto:John.Arnett@isd.csa.scot.nhs.uk
http://isdscotland.org/xml
Contents
What is XML?
Anatomy of an XML Document
Conformance and Validation
Summary
Find Out More
What is XML?
XML is not
a programming language
a software panacea
an object-oriented technology
HTML with funny tags
a replacement for HTML
But it is re-shaping publishing on the web
What is XML?
Stands for Extensible Markup Language
Meta-markup language derived from SGML (Standard Generalised Markup Language)
Open standard, currently XML 1.0 2nd edition (W3C Recommendation, 6 October 2000)
What is XML?
W3C says
"XML is the universal format for structured documents and data on the Web"
"A data object is an XML document if it is well-formed, as defined in [the W3C] specification" (more on this later)
What is XML?
Data Content and Presentation
Sample dataset
ID      SURNAME   FORENAME  SEX  DOB
134376  Jones     Ian       0    06011971
198457  McKenzie  Alison    1    23081983
111672  Martin    Lesley    0    12111979
147678  Jackson   Sarah     1    15061976
What is XML?
Record data oriented structure
Flat file, database, spreadsheet, etc.
111672 Martin Lesley 0 12111979
Structured
Searchable
Easy to understand
Portable
What is XML?
HTML document oriented structure
<h1>Record Id: <font color="red">111672</font></h1>
<table><colgroup><col align="left"></colgroup>
<tr><th>Surname:</th><td>Martin</td>
</tr><tr><th>Given Name:</th><td>Lesley</td>
</tr><tr><th>Sex:</th><td>Male</td></tr>
<tr><th>Date of Birth:</th><td>12 November 1979</td></tr>
</table>
Record Id: 111672
Surname: Martin
Given Name: Lesley
Sex: Male
Date of Birth: 12 November 1979

Easy to understand
Portable
Structured
Searchable
What is XML?
XML to the rescue!
<Record recordId="111672">
  <Surname>Martin</Surname>
  <GivenName>Lesley</GivenName>
  <Sex>M</Sex>
  <DateOfBirth>
    <Day>12</Day><Month>11</Month><Year>1979</Year>
  </DateOfBirth>
</Record>
Easy to understand
Portable
Structured
Searchable
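As a rough illustration of the "structured" and "searchable" points, a record of this shape can be read with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the choice of Python and the variable names are assumptions made for illustration and are not part of the original slides.

```python
import xml.etree.ElementTree as ET

# A record modelled on the slide's example (illustrative copy).
RECORD_XML = """
<Record recordId="111672">
  <Surname>Martin</Surname>
  <GivenName>Lesley</GivenName>
  <Sex>M</Sex>
  <DateOfBirth>
    <Day>12</Day><Month>11</Month><Year>1979</Year>
  </DateOfBirth>
</Record>
"""

record = ET.fromstring(RECORD_XML)

# Fields are located by element name, not by column position.
record_id = record.get("recordId")
surname = record.findtext("Surname")
birth_year = int(record.findtext("DateOfBirth/Year"))

print(record_id, surname, birth_year)
```

Because each value is wrapped in a named element, the lookup keeps working if fields are reordered or new ones are added, which is not true of a positional flat-file record.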
What is XML?
HTML and XML are
Text based
Open standards
Widely used
What is XML?
But XML is also
Structured
Self-describing
Searchable
Extensible, i.e. any number of tags allowed
and it separates data from presentation
Anatomy of an XML Document
XML documents consist of text
character data (tab, carriage return, line feed and Unicode characters)
markup
Anatomy of an XML Document
Markup
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
start-, end- and empty element tags
tag names are case sensitive!
entity and character references
comments
Anatomy of an XML Document
Character data
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
Reserved characters
&, <, >, " and '
Anatomy of an XML Document
Declaration
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
Optional first line of markup (but W3C recommended)
Tells the parser which XML version and character encoding the document uses
Anatomy of an XML Document
Root Element
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
The single outermost element; every document has exactly one
Contains all the data and links to other documents
Anatomy of an XML Document
Elements
<Book>XML Bible
  <Price>24.99</Price>
  <img src="book.gif"/>
  <Author>E.R. Harold</Author>
  <Publisher>J. Forbes</Publisher>
</Book>
Define the content of the XML document
May contain other elements, character data or can be empty
Anatomy of an XML Document
Attributes
<BookCatalog Subject="XML">
  <Book Title="XML Bible" Price="24.99"/>
  <Book Title="XML How To Program" Price="19.99"/>
  <Book Title="Definitive XML Schema" Price="44.99"/>
</BookCatalog>
Add data about the elements
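To make the role of attributes concrete, here is a minimal sketch that reads them back from a catalogue of this shape. It assumes Python's standard xml.etree.ElementTree; the variable names (CATALOG_XML, prices, cheapest) are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Catalogue modelled on the slide, with every attribute value quoted.
CATALOG_XML = """
<BookCatalog Subject="XML">
  <Book Title="XML Bible" Price="24.99"/>
  <Book Title="XML How To Program" Price="19.99"/>
  <Book Title="Definitive XML Schema" Price="44.99"/>
</BookCatalog>
"""

catalog = ET.fromstring(CATALOG_XML)

# Attribute values always arrive as strings; convert them explicitly.
prices = {
    book.get("Title"): float(book.get("Price"))
    for book in catalog.iter("Book")
}
cheapest = min(prices, key=prices.get)

print(catalog.get("Subject"), cheapest)
```

Note that the parser would reject the catalogue outright if any attribute value were left unquoted.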
Anatomy of an XML Document
Handling reserved characters
Built-in entities
& = &amp;
" = &quot;
< = &lt;
> = &gt;
' = &apos;
CDATA Sections
<CodeSnippet>
<![CDATA[if(this->getX() < 5 && values[0] >= 10) cerr << "out of range";]]>
</CodeSnippet>
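The two approaches can be compared side by side. The sketch below, which assumes Python's standard xml.sax.saxutils and xml.etree.ElementTree modules, escapes a string with the built-in entities and also wraps the same string in a CDATA section; the sample string and variable names are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if (x < 5 && y >= 10) cerr << "out of range";'

# escape() handles &, < and > by default; the extra mapping also covers quotes.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# Either form yields the same character data once parsed.
via_entities = ET.fromstring(f"<CodeSnippet>{escaped}</CodeSnippet>")
via_cdata = ET.fromstring(f"<CodeSnippet><![CDATA[{raw}]]></CodeSnippet>")

print(escaped)
print(via_entities.text == via_cdata.text == raw)
```

Entities suit the occasional reserved character, while a CDATA section is usually easier to read when a whole block, such as source code, is full of them. A CDATA section cannot itself contain the sequence ]]>.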
Anatomy of an XML Document
Namespaces
Preventing naming collisions
<order
    xmlns:cust="http://www.example.com/custDetails"
    xmlns:book="http://www.example.com/bookDetails"
    xmlns="http://www.example.com/order">
  <cust:title>Dr</cust:title>
  <cust:name>Peter Parker</cust:name>
  <book:title>White Teeth</book:title>
  <book:price>5.99</book:price>
  <orderNumber>AYT2379</orderNumber>
</order>
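A short sketch of how a namespace-aware parser tells the two title elements apart. It assumes Python's standard xml.etree.ElementTree; the prefixes in the NS mapping are hypothetical and local to the script, since only the namespace URIs have to match the document.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """
<order
    xmlns:cust="http://www.example.com/custDetails"
    xmlns:book="http://www.example.com/bookDetails"
    xmlns="http://www.example.com/order">
  <cust:title>Dr</cust:title>
  <cust:name>Peter Parker</cust:name>
  <book:title>White Teeth</book:title>
  <book:price>5.99</book:price>
  <orderNumber>AYT2379</orderNumber>
</order>
"""

# Script-local prefixes mapped to the same URIs the document declares.
NS = {
    "c": "http://www.example.com/custDetails",
    "b": "http://www.example.com/bookDetails",
    "o": "http://www.example.com/order",
}

order = ET.fromstring(ORDER_XML)
customer_title = order.findtext("c:title", namespaces=NS)
book_title = order.findtext("b:title", namespaces=NS)
# Unprefixed elements belong to the default namespace.
order_number = order.findtext("o:orderNumber", namespaces=NS)

print(customer_title, book_title, order_number)
```

The element named title appears twice, but each occurrence is qualified by a different namespace URI, so there is no collision.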
Conformance and Validation
One root element
Start and end tags match
<Tag>content</Tag>
Empty elements are terminated as
<Tag/>
Tags are correctly nested
<Parent><Child></Child></Parent>
All attribute values enclosed in quotes
All XML processors must check well-formedness constraints
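The rules above can be tried out with any parser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree, which is a non-validating parser: it checks well-formedness only. The helper name is_well_formed and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "matched tags": "<Tag>content</Tag>",
    "empty element": "<Tag/>",
    "nested": "<Parent><Child></Child></Parent>",
    "two roots": "<A/><B/>",
    "mismatched case": "<Tag>content</tag>",
    "bad nesting": "<Parent><Child></Parent></Child>",
    "unquoted attribute": "<Tag id=1/>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

The first three samples pass and the last four fail, one for each rule they break.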
Conformance and Validation
Validating XML processors check against validity constraints
specified in Document Type Definitions (DTDs) or Schemas
a valid XML document must be well-formed
a well-formed document need not necessarily be valid
Document Type Definitions
DTD syntax able to specify
Structure and order of child elements
<!ELEMENT Product (Name, Size?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Size (#PCDATA)>
Element attributes
<!ATTLIST Product EffDate CDATA #IMPLIED>
limited number of data types
default and fixed attribute values
Document Type Definitions
DTDs are
easy to understand and implement
a lightweight alternative to schemas
But they
use non-XML syntax
have only limited support for data typing and namespaces
are difficult to extend
Schemas
W3C Schema
Uses XML syntax
Provides built-in data types and supports user-defined ones
Supports namespaces
Provides several extensibility mechanisms
Schemas
Schemas are therefore more flexible
<xs:element name="Product">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Size" type="xs:positiveInteger" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="EffDate" type="xs:date"/>
  </xs:complexType>
</xs:element>
The same structure as a DTD
<!ELEMENT Product (Name, Size?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Size (#PCDATA)>
<!ATTLIST Product EffDate CDATA #IMPLIED>
but harder to understand than DTDs
In Summary
A language for describing markup languages
Extensible, i.e. define your own tags
Readable, structured and self-describing
Documents must be well-formed
Documents may be validated using DTDs and/or Schemas
Find Out More
World Wide Web Consortium
www.w3.org
W3C XML v1.0 Specification
http://www.w3.org/TR/REC-xml
Find Out More
The XML Industry Portal
www.xml.org
O'Reilly XML site
www.xml.com
XML Cover Pages
www.oasis-open.org/cover/
Café con Leche
www.ibiblio.org/xml/
Find Out More
Scottish Health and Community Care
XML Steering Group
www.isdscotland.org/xml
XML Tools
XSV - Open Source XML Schema Validator
www.ltg.ed.ac.uk/~ht/xsv-status.html
MSXML 4.0
www.microsoft.com/downloads/details.aspx?FamilyID=3144b72b-b4f2-46da-b4b6-c5d7485f2b42
XML Tools
XML Spy 2004 IDE
www.altova.com/products_ide.html
Free XML Tools and Software
www.garshol.priv.no/download/xmltools/

Printed Sources
Numerous printed sources; for more information visit
Charles F. Goldfarb's www.xmlbooks.com
www.amazon.com
