
XML Session 2

Rajesh Math

SICSR XML - Lecture 1

10/17/2012

XML Syntax Rules Summary

XML documents have exactly one root element.

All elements have a parent element except for the root element.
All elements have a start and an end tag (except for empty elements), eg:

<myTag>some content in here</myTag> <myEmptyTag />


Element names: must begin with a letter or underscore (_), followed by letters, digits, underscores, periods (.) or hyphens (-). Element names cannot start with the string xml in any case combination (names beginning with xml are reserved).

Attributes: elements may have attributes associated with them. Attribute names follow the element naming rules. Attribute values must be enclosed in quotes (double or single).

Nested elements: elements can be nested within other elements. <A><B></B></A> is allowed.
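
The naming rule above can be sketched as a small check. This is a minimal illustration of the simplified rule as stated on this slide, not the full XML name grammar (which also permits colons and many non-ASCII characters); the function name is_valid_name is a hypothetical helper.

```python
import re

# Simplified rule from the slide: the first character is a letter or an
# underscore; the rest are letters, digits, underscores, periods or hyphens.
# Assumption: ASCII only. The full XML grammar accepts more characters.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_name(name: str) -> bool:
    """Return True if `name` satisfies the simplified naming rule."""
    if name.lower().startswith("xml"):
        return False  # names beginning with "xml" (any case) are reserved
    return _NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["myTag", "_private", "my-tag.v2", "2fast", "XmlData", "-dash"]:
        print(candidate, is_valid_name(candidate))
```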


XML Syntax Rules Summary


No overlapping tags: XML elements must be properly nested. <A><B></A></B> is not allowed.

The XML declaration is the first line of the document. It identifies the document as an XML document and specifies the XML version being used and the character encoding:
<?xml version="1.0" encoding="UTF-8"?>

Built-in entity references:
&lt; < left angle bracket
&gt; > right angle bracket
&apos; ' apostrophe
&quot; " double quotation mark
&amp; & ampersand

Attributes: the attribute value must be supplied if the attribute is used, and the value must be quoted.
<myTag MyAttrib="x"> ... </myTag>
<document versionNo="1.4"> ... </document>
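
As a rough check of the nesting and entity rules, the sketch below uses Python's standard-library parser to test whether a string is well formed, and xml.sax.saxutils.escape to produce the built-in entity references. The helper name is_well_formed and the sample strings are illustrative assumptions, not part of the lecture.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


nested = "<A><B></B></A>"        # properly nested
overlapping = "<A><B></A></B>"   # overlapping tags
two_roots = "<A></A><B></B>"     # more than one root element

# escape() replaces &, < and > by default; the extra mapping adds the quotes.
raw = 'x < y && name == "fred"'
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
doc = f"<expr>{escaped}</expr>"

if __name__ == "__main__":
    print(is_well_formed(nested), is_well_formed(overlapping), is_well_formed(two_roots))
    print(escaped)
    # The parser turns the entity references back into the original text.
    print(ET.fromstring(doc).text == raw)
```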


XML Is Not Just A Markup Language


When we say "XML", we are really referring to a whole family of technologies:

DTD
HTML is defined by a Document Type Definition (DTD) that specifies the structure and syntax of all valid HTML documents. In XML we can define our own markup language and the structure of any documents created from it. The rules are defined in a DTD that we design for that particular application.

Schema
Another method for defining the structure and rules for an XML document. A schema gives a tighter definition of the elements and their allowed values, as well as the order in which tags may be nested.

XSL
eXtensible Stylesheet Language: a family of languages for describing how an XML document is transformed and presented.

XSLT
XSL Transformations: the language in the XSL family for writing rules that translate one XML document into another document. XSLT processors, available as utility programs and as library APIs, take an XML document and an XSL stylesheet and produce the transformed output as a new file.


XML Is Not Just A Markup Language


Parsing
A parser is responsible for disassembling an XML document into its basic objects. The objects are then available for a programmer to extract and process in whatever way they need.

DOM - Document Object Model
Used by browsers for HTML. DOM parsers are available as libraries for Java, Perl, C++ and many other languages. Represents the whole document as a tree held in memory. Very memory and CPU intensive for large documents.
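
A minimal sketch of the DOM approach, using Python's standard-library xml.dom.minidom. The class/student document is made up for illustration. The whole document is parsed into a tree first, and the tree is then queried.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
doc_text = (
    "<class>"
    "<student id='1'>Asha</student>"
    "<student id='2'>Ravi</student>"
    "</class>"
)

# The parser builds the complete tree in memory before we touch it.
dom = minidom.parseString(doc_text)
root = dom.documentElement

students = root.getElementsByTagName("student")
names = [s.firstChild.data for s in students]    # text inside each element
ids = [s.getAttribute("id") for s in students]   # attribute values

if __name__ == "__main__":
    print(root.tagName, names, ids)
```

Because the full tree exists, any node can be revisited in any order, which is what makes DOM convenient but costly for large documents.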

SAX: Simple API for XML
But not that simple to use! Uses a state engine and event notification to extract objects from the XML document. SAX parsers are less CPU and memory demanding than DOM parsers for large documents. Processes the document sequentially from start to end. Useful for extracting single items from the document.
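
A minimal sketch of the SAX style with Python's standard-library xml.sax. The handler class, the first_text helper and the sample document are hypothetical. The handler keeps a small amount of state and reacts to start, character and end events as the parser streams through the document, collecting the text of the first element with a given name.

```python
import xml.sax


class FirstMatchHandler(xml.sax.ContentHandler):
    """Collect the text of the first element named `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.inside = False   # state: currently inside the wanted element?
        self.done = False     # state: has the first match been completed?
        self.chunks = []

    def startElement(self, name, attrs):
        if name == self.wanted and not self.done:
            self.inside = True

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self.inside:
            self.chunks.append(content)

    def endElement(self, name):
        if name == self.wanted and self.inside:
            self.inside = False
            self.done = True


def first_text(xml_text, tag):
    """Return the text of the first <tag> element, or None if absent."""
    handler = FirstMatchHandler(tag)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.chunks) if handler.done else None


if __name__ == "__main__":
    sample = "<class><student>Asha</student><student>Ravi</student></class>"
    print(first_text(sample, "student"))
```

No tree is built, so memory use stays small, but the program has to track where it is in the document itself.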


XML syntax
XML consists of ELEMENTS.

Each element is named and contains some content (except for special empty elements).
<friend>George</friend>

Elements are represented using tags, and each tag has a corresponding closing tag unless it is an empty tag, such as:
<student id="40123721" grade="A" />
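
The two examples above can be inspected with Python's standard-library ElementTree. This is only a sketch. It also parses the long form of the empty element to show that a parser sees it as the same thing as the self-closing form.

```python
import xml.etree.ElementTree as ET

friend = ET.fromstring("<friend>George</friend>")
student = ET.fromstring('<student id="40123721" grade="A" />')
# The same empty element written with an explicit closing tag.
student_long = ET.fromstring('<student id="40123721" grade="A"></student>')

if __name__ == "__main__":
    print(friend.tag, friend.text)        # element name and its content
    print(student.tag, student.attrib)    # empty element: attributes only
    print(student.text, student_long.text)
```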


Attributes, Comments
Attribute values must be present, and must be quoted. For example, in HTML we could get away with: <hr noshade>. This is not legal in XHTML, where it must be written <hr noshade="noshade"/>. Similarly, in XML:

<myTag myAttrib="1.6"> ..... </myTag>

NB: An open issue in XML design is whether a particular entity should be modelled as an element of its own, or as an attribute of an existing element. The general rule of thumb is that elements should be thought of as containers (which are understood to have contents) and attributes are characteristics of the element.
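
To make the element-versus-attribute choice concrete, the sketch below models the same fact both ways and reads it back with Python's standard-library ElementTree. The student/grade documents are illustrative assumptions. It also shows that an attribute without a value is rejected by the parser.

```python
import xml.etree.ElementTree as ET

# Grade modelled as a characteristic of the student (attribute).
as_attribute = ET.fromstring('<student id="40123721" grade="A" />')
# Grade modelled as content held by the student (child element).
as_element = ET.fromstring('<student id="40123721"><grade>A</grade></student>')

grade_from_attribute = as_attribute.get("grade")
grade_from_element = as_element.findtext("grade")

# An attribute with no value is not well-formed XML.
try:
    ET.fromstring("<hr noshade/>")
    valueless_accepted = True
except ET.ParseError:
    valueless_accepted = False

if __name__ == "__main__":
    print(grade_from_attribute, grade_from_element, valueless_accepted)
```

Either model carries the same information. The child element leaves room for nested structure later, while the attribute is more compact.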

CData
The content of a CDATA section is not treated as markup. It is typically used to include data that will be used by another application, e.g. JavaScript. The syntax is a little messy, and looks like:

<![CDATA[
<script language="javascript" type="text/javascript">
var name="fred";
var x = 3.0;
var y = 4.0;
if ( x < y ) document.write("x is less than y");
</script>
]]>

Note the unusual use of square brackets: <![CDATA[ ... ]]>. No spaces are allowed inside the opening <![CDATA[ delimiter. In XHTML the above example would be written:

<script language="javascript" type="text/javascript">
<![CDATA[
var name="fred";
var x = 3.0;
var y = 4.0;
if ( x < y ) document.write("x is less than y");
]]>
</script>
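
A short sketch of what the CDATA section buys us, using Python's standard-library ElementTree. The script content is a trimmed version of the example above. Inside the CDATA section the "<" in "x < y" reaches the application as plain text, whereas the same content without CDATA is rejected as malformed markup.

```python
import xml.etree.ElementTree as ET

with_cdata = """<script language="javascript" type="text/javascript"><![CDATA[
var x = 3.0;
var y = 4.0;
if ( x < y ) document.write("x is less than y");
]]></script>"""

without_cdata = """<script language="javascript" type="text/javascript">
if ( x < y ) document.write("x is less than y");
</script>"""

# The parser hands back the CDATA content as ordinary element text.
script = ET.fromstring(with_cdata)

try:
    ET.fromstring(without_cdata)
    bare_accepted = True
except ET.ParseError:
    bare_accepted = False  # the unescaped "<" is read as the start of a tag

if __name__ == "__main__":
    print(script.text)
    print(bare_accepted)
```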

Processing Instructions
An XML file can also contain processing instructions that give commands or information to an application that is processing the XML data. Processing instructions look rather like the lines in the prolog:

<?target instructions?>

where target is the name of the application and instructions is a string of text which is passed to it, e.g.:

<?xml-stylesheet type="text/xsl" href="weather.xsl"?>
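
As a sketch of how an application sees a processing instruction, the example below parses a small document with Python's standard-library xml.dom.minidom and lists the target and instruction string of each processing instruction found at document level. The weather root element is an assumed placeholder.

```python
from xml.dom import minidom, Node

doc_text = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="weather.xsl"?>'
    "<weather/>"
)

dom = minidom.parseString(doc_text)

# Processing instructions before the root element are children of the
# document node. The XML declaration itself is not reported as a node.
instructions = [
    (node.target, node.data)
    for node in dom.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]

if __name__ == "__main__":
    for target, data in instructions:
        print(target, "->", data)
```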

Namespace
Similar in concept to scope rules for variables in programming. For example, in Java, the keyword "this" is used as a prefix to refer to an instance variable, to avoid confusion with a local variable of the same name. Another example: in Perl, the keywords "my" and "local" provide fine-grained control over variable scope. Uniform Resource Identifiers (URIs) are used to uniquely identify a namespace.

URI
Uniform Resource Identifier: can be a URL or a URN

URL
Uniform Resource Locator

URN
Uniform Resource Name
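
A minimal sketch of namespaces keeping two elements with the same local name apart, using Python's standard-library ElementTree. The report document, the prefixes w and f, and both namespace URIs (one URN, one URL) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define an element called "high".
doc_text = """<report xmlns:w="urn:example:weather"
        xmlns:f="http://example.com/finance">
  <w:high>31</w:high>
  <f:high>1024.5</f:high>
</report>"""

root = ET.fromstring(doc_text)

# Prefixes are local shorthand. The URI is what identifies the namespace.
ns = {"w": "urn:example:weather", "f": "http://example.com/finance"}
weather_high = root.find("w:high", ns).text
finance_high = root.find("f:high", ns).text

# ElementTree stores each name as "{namespace-uri}local-name".
tags = [child.tag for child in root]

if __name__ == "__main__":
    print(weather_high, finance_high)
    print(tags)
```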


URN Syntax
The syntax is similar to that of a URL. All URNs have the following syntax:

<URN> ::= "urn:" <NID> ":" <NSS>

where <NID> is the Namespace Identifier, and <NSS> is the Namespace Specific String.

A namespace can be declared for any XML document type (custom markup language). The namespace is identified using a unique URN or URL.

IMPORTANT: The URI need not physically exist. It is only being used as a means of uniquely identifying a document definition.

A namespace is a conceptual zone in which all names are unique.
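
The URN grammar above can be sketched as a small splitter. The function name parse_urn is a hypothetical helper, and the check is deliberately loose: it only separates the three parts and does not validate which characters are allowed in each.

```python
def parse_urn(urn: str):
    """Split "urn:<NID>:<NSS>" into (NID, NSS); raise ValueError otherwise."""
    scheme, first_colon, rest = urn.partition(":")
    nid, second_colon, nss = rest.partition(":")
    if scheme.lower() != "urn" or not first_colon or not second_colon:
        raise ValueError(f"not a URN: {urn!r}")
    if not nid or not nss:
        raise ValueError(f"URN needs both an NID and an NSS: {urn!r}")
    return nid, nss


if __name__ == "__main__":
    print(parse_urn("urn:isbn:0451450523"))
    # Only the first colon after the NID separates it from the NSS.
    print(parse_urn("urn:example:weather:v1"))
```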
