
Introduction to XML

Chapter 1

Chapter Objectives -1

Discuss markup languages
List and explain the drawbacks of HTML
Discuss the architecture of XML documents
List the benefits of XML
Discuss parsers

Core XML / Chapter 1 / Slide 2 of 35

Chapter Objectives -2

Build a complete XML document:
Character Data
Comments
Processing Instructions
Entities
  General Entities
  Parameter Entities
The DOCTYPE Declarations


History of Markup
Documents recorded using paper and pen
Typesetters formatting documents
Tools used by typesetters to format a document


Markup Language

A markup language defines rules for adding meaning to the content and structure of documents.
Markup is classified as:
Stylistic markup: determines the presentation of the document
Structural markup: defines the structure of the document
Semantic markup: describes the meaning of the document's content

SGML

Generalized Markup Language (GML) was an early system for formatting documents.
GML was refined and came to be known as Standard Generalized Markup Language (SGML).
SGML is the origin of many later markup languages, including HTML and XML.

Features of SGML

It is a metalanguage for describing markup languages, which allows authors to create their own tags that relate to their content.
It needs a separate file, the Document Type Definition (DTD), that contains the rules of the language so that it can be interpreted.
An SGML application is a markup language derived from SGML.

HTML

HTML is the best-known markup language derived from SGML.
It was created to mark up technical papers so that the scientific community could transfer them across different platforms.
It is now also used by non-scientific users who are concerned about the presentation of their documents.

Drawbacks of HTML

Fixed tag set
Presentation markup does not describe the meaning of the content
It is flat
Clogging
HTML is not international
Data interchange is difficult
Does not have a robust linking mechanism
HTML is not reusable

HTML and XML Code Examples

HTML code
<UL>
<LI>TOM CRUISE
<UL>
<LI>CLIENT ID : 100
<LI>COMPANY : XYZ Corp.
<LI>Email : tom@usa.net
<LI>Phone : 3336767
<LI>Street Address : 25th St.
<LI>City : Toronto
<LI>State : Toronto
<LI>Zip : 20056
</UL>
</UL>

XML code
<Details>
<CONTACT>
<PERSON_NAME>TOM CRUISE</PERSON_NAME>
<ID>100</ID>
<Company>XYZ Corp.</Company>
<Email>tom@usa.net</Email>
<Phone>3336767</Phone>
<Street>25th St.</Street>
<City>Toronto</City>
<State>Toronto</State>
<ZIP>20056</ZIP>
</CONTACT>
</Details>
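
Because the XML version names every field, a program can look values up by tag instead of guessing from list positions. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name contact_fields is an assumption made for illustration and is not part of the slides.

```python
import xml.etree.ElementTree as ET

# The contact record from the slide, laid out one element per line.
CONTACT_XML = """
<Details>
  <CONTACT>
    <PERSON_NAME>TOM CRUISE</PERSON_NAME>
    <ID>100</ID>
    <Company>XYZ Corp.</Company>
    <Email>tom@usa.net</Email>
    <Phone>3336767</Phone>
    <Street>25th St.</Street>
    <City>Toronto</City>
    <State>Toronto</State>
    <ZIP>20056</ZIP>
  </CONTACT>
</Details>
"""


def contact_fields(xml_text):
    """Return a dict mapping each child tag of <CONTACT> to its text.

    Hypothetical helper, written only for this illustration.
    """
    root = ET.fromstring(xml_text)
    contact = root.find("CONTACT")
    return {child.tag: child.text.strip() for child in contact}


fields = contact_fields(CONTACT_XML)
print(fields["Email"])
```

The HTML list carries the same text, but nothing in its markup says which item is the e-mail address.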

XML -1

XML stands for Extensible Markup Language.
It was designed to address the drawbacks of HTML.
It allows users to define their own set of tags, and makes it possible for others (people or programs) to understand them.
It is more flexible than HTML.
It inherits the features of SGML and combines them with the features of HTML.
It is a simplified subset of SGML.

XML -2

XML is a metalanguage: it is used to define other markup languages.
The data contained in an XML file can be displayed in different ways.
It can also be offered to other applications for further processing.
Style sheets help transform structured data into different HTML views. This enables the data to be displayed in different browsers.

XML Architecture - 1

XML supports a three-tier architecture for handling and manipulating data.
It can be generated from existing databases using a scalable three-tier model.
XML tags represent the logical structure of data that can be interpreted and used in various ways by different applications.
The middle tier is used to access multiple databases and translate data into XML.
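
The middle-tier role can be sketched as a small function that reads rows from a database and emits XML. This is only an illustration under assumptions: the in-memory SQLite database, the clients table, its columns, and the CLIENTS/CLIENT element names are all hypothetical stand-ins for whatever existing databases a real system would use.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(connection):
    """Middle tier: read rows from the database and return them as XML text.

    The table and element names are hypothetical, chosen for illustration.
    """
    root = ET.Element("CLIENTS")
    cursor = connection.execute("SELECT id, name, city FROM clients ORDER BY id")
    for client_id, name, city in cursor:
        client = ET.SubElement(root, "CLIENT", id=str(client_id))
        ET.SubElement(client, "NAME").text = name
        ET.SubElement(client, "CITY").text = city
    return ET.tostring(root, encoding="unicode")


# Data tier: an in-memory database standing in for an existing one.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
db.execute("INSERT INTO clients VALUES (100, 'XYZ Corp.', 'Toronto')")

# Client tier: receives XML and can display or process it as it sees fit.
xml_text = rows_to_xml(db)
print(xml_text)
```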

XML Architecture -2


XML: A Universal Data Format

HTML is a single markup language, but XML is a family of markup languages.
Any type of data can be easily defined in XML.
XML is popular because it supports a wide range of applications and is easy to use.
XML has a structured data format, which allows it to store complex data.


Benefits of XML

The three-tier architecture is easier to scale and to secure.
The benefits of XML are classified as follows:
Business benefits
Technological benefits


Business Benefits

Information sharing:
XML inside a single application:
Allows businesses to define data formats in XML
Provides tools to read, write and transform data between XML and other formats
Powerful, flexible and extensible language
Content delivery:
Supports different users and channels, such as digital TV, phone, web and multimedia kiosks

Technological Benefits
Separation of data and presentation
Semantic information
Extensibility
Re-use of data


XML Document Structure

An XML document is composed of one or more entities, each identified by a unique name.
All documents begin with a root or document entity.
An entity is a named storage unit, such as a piece of replacement text, that the document refers to by name.
Documents are logically composed of declarations, elements, comments, character references, and processing instructions.

Well-Formed and Valid Documents

An XML document is considered well formed if it satisfies the minimum set of requirements defined in the XML 1.0 specification.
These requirements ensure that the correct language constructs are used in the right manner.
A valid XML document is a well-formed XML document that also conforms to the rules of a Document Type Definition (DTD).
A DTD defines the rules that the markup in an XML document must follow.
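
A well-formedness check can be sketched with Python's standard xml.etree.ElementTree parser, which raises ParseError on any document that is not well formed. The function name check_well_formed is an assumption for illustration. Note that this parser is non-validating: it does not check a document against a DTD, so it cannot say whether a document is valid.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, (line, column)).

    Hypothetical helper; it checks well-formedness only, not validity.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return False, error.position
    return True, None


good = check_well_formed("<BOOK><TITLE>Core XML</TITLE></BOOK>")
bad = check_well_formed("<BOOK><TITLE>Core XML</BOOK>")  # TITLE is never closed
print(good, bad)
```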

Parsers - 1

Parsers help the computer interpret an XML file.

<?xml version="1.0"?>
<nxn> </nxn>

Stages shown in the figure:
Editor with the XML document
XML document parsed by the parser
Parsed document viewed in the browser

There are two types of parsers:
Non-validating parser
Validating parser


Parsers - 2
Parsers load the XML file and other related files (such as a DTD file) to check whether the XML document is well formed and valid.
Figure: the XML file and the other related files are read by the parser, which produces a data tree.
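
The data tree produced by a parser can be inspected directly. The sketch below uses Python's standard xml.etree.ElementTree parser and prints one indented line per element; the outline function and the sample CONTACT document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Yield one indented line per element in the parsed data tree."""
    text = (element.text or "").strip()
    yield "  " * depth + element.tag + (": " + text if text else "")
    for child in element:
        yield from outline(child, depth + 1)


# Hypothetical sample document, used only to show the tree shape.
tree_root = ET.fromstring(
    "<CONTACT><NAME>TOM CRUISE</NAME><ADDRESS><CITY>Toronto</CITY></ADDRESS></CONTACT>"
)
lines = list(outline(tree_root))
print("\n".join(lines))
```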


Data versus Markup


<NAME>Tom Cruise</NAME>
Markup: <NAME> and </NAME>
Data: Tom Cruise


Creating an XML Document

To create an XML document:
State an XML declaration
Create a root element
Create the XML code
Verify the document


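Stating an XML Declaration

Syntax
<?xml version="1.0" encoding="UTF-8" standalone="no"?>

The encoding and standalone attributes are optional; only the version number is mandatory.
When present, the attributes must appear in the order version, encoding, standalone, and their values must be quoted.
standalone states whether the document relies on external declarations ("no") or not ("yes").
encoding specifies the character encoding used by the author.
XML 1.0 is the default version.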
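
The quoting and ordering requirements of the declaration can be checked with a parser. This is a small sketch using Python's standard xml.etree.ElementTree module; the function name declaration_accepted and the three sample documents are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def declaration_accepted(document):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Quoted values, in the order version, encoding, standalone.
quoted = b'<?xml version="1.0" encoding="UTF-8" standalone="no"?><BOOK/>'
# Same declaration without quotes around the values.
unquoted = b'<?xml version=1.0 encoding=UTF-8 standalone=no?><BOOK/>'
# Quoted values, but standalone placed before encoding.
wrong_order = b'<?xml version="1.0" standalone="no" encoding="UTF-8"?><BOOK/>'

for sample in (quoted, unquoted, wrong_order):
    print(declaration_accepted(sample))
```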

Creating a Root Element

There can be only one root element.
It describes the function of the document.
Every XML document must have a root element.

Example
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<BOOK>
</BOOK>

Creating the XML Code -1

It is the process of creating our own elements and attributes as required by our application.
Elements are the basic units of XML content.
Tags tell the user agent how to treat the content enclosed between the start tag and the end tag.

Parts of an element
<TITLE>Aptech Ltd</TITLE>
Opening tag: <TITLE>
Content: Aptech Ltd
Closing tag: </TITLE>

Creating the XML Code -2

Rules that govern elements:
At least one element is required
XML tags are case sensitive
End the tags correctly
Nest tags properly
Use legal tag names
Length of markup names
Define valid attributes
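
Most of the rules above can be illustrated by feeding a parser one document that breaks each rule. The sketch below uses Python's standard xml.etree.ElementTree module; the sample documents and the names is_well_formed and RULE_VIOLATIONS are assumptions for illustration, and the rule about the length of markup names is left out because the slide does not state a limit.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One hypothetical document per rule, each breaking that rule.
RULE_VIOLATIONS = {
    "at least one element": "",
    "tags are case sensitive": "<Title>Core XML</TITLE>",
    "end the tags correctly": "<TITLE>Core XML",
    "nest tags properly": "<BOOK><TITLE>Core XML</BOOK></TITLE>",
    "use legal tag names": "<1TITLE>Core XML</1TITLE>",
    "define valid attributes": "<TITLE lang=en>Core XML</TITLE>",
}

results = {rule: is_well_formed(doc) for rule, doc in RULE_VIOLATIONS.items()}
for rule, accepted in results.items():
    print(rule, "->", "accepted" if accepted else "rejected")
```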


Verify the document

The document must follow the XML rules; otherwise it will not be read by the browser or by any other XML reader.


Comments

A comment is information for the understanding of the reader and is ignored by the processor.
Syntax
<!-- Write the comment here -->

Example
<!-- don't show these
<NAME>KATE WINSLET</NAME>
<NAME>NICOLE KIDMAN</NAME>
<NAME>ARNOLD</NAME>
-->
<NAME>TOM CRUISE</NAME>

The example displays only the name TOM CRUISE; the other names are treated as part of the comment.
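
The effect of the comment can be confirmed by parsing the example and listing the NAME elements that the parser reports. The sketch below uses Python's standard xml.etree.ElementTree module; the CAST root element is a hypothetical wrapper added only because a document needs a single root.

```python
import xml.etree.ElementTree as ET

# The slide's example wrapped in a hypothetical <CAST> root element.
DOCUMENT = """<CAST>
<!-- don't show these
<NAME>KATE WINSLET</NAME>
<NAME>NICOLE KIDMAN</NAME>
<NAME>ARNOLD</NAME>
-->
<NAME>TOM CRUISE</NAME>
</CAST>"""

# Only elements outside the comment are present in the parsed tree.
names = [name.text for name in ET.fromstring(DOCUMENT).iter("NAME")]
print(names)
```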

Processing Instruction

A processing instruction is a piece of information meant for the application using the XML document.
The parser passes these instructions directly to the application.
The XML declaration uses the same <? ... ?> syntax, but the XML 1.0 specification does not classify it as a processing instruction.
<?xml-stylesheet type="text/xsl"?>
Name of application (target): xml-stylesheet
Instruction information: type="text/xsl"
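
How a parser hands a processing instruction to the application can be sketched with Python's standard xml.dom.minidom module, which exposes each instruction as a node with a target and a data part. The function name processing_instructions and the BOOK root element are assumptions for illustration.

```python
from xml.dom import minidom

# Hypothetical document: a declaration, one processing instruction, a root.
DOCUMENT = '<?xml version="1.0"?><?xml-stylesheet type="text/xsl"?><BOOK/>'


def processing_instructions(xml_text):
    """Return (target, data) pairs for instructions at the top level."""
    document = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in document.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


instructions = processing_instructions(DOCUMENT)
print(instructions)
```

The XML declaration does not appear in the result, which matches its special status.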

Character Data

The text between the start and end tags is defined as character data.
Character data may contain any legal (Unicode) character.
Character data is classified into:
PCDATA
CDATA


PCDATA

It stands for parsed character data.
PCDATA is text that will be parsed by a parser.
Tags inside the text will be treated as markup and entities will be expanded.

Predefined entities
Entity Name    Character
&lt;           <
&gt;           >
&amp;          &
&quot;         "
&apos;         '
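
The round trip between special characters and predefined entities can be sketched in Python: xml.sax.saxutils.escape replaces &, < and > with entity references, and the parser expands them again when it reads the PCDATA. The CODE element name and the sample text are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text containing characters that would otherwise be read as markup.
raw_text = "if a < b & b > c"

# Replace the special characters with predefined entity references.
escaped = escape(raw_text)

# Hypothetical <CODE> element; the parser expands the entities in its PCDATA.
element = ET.fromstring("<CODE>" + escaped + "</CODE>")

print(escaped)
print(element.text)
```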


CDATA

It means character data.
It will not be parsed by the parser.
CDATA sections make it convenient to include large blocks of text containing special characters.
The character string ]]> is not allowed within a CDATA section because it signals the end of the section.

Example

<SAMPLE>
<![CDATA[<DOCUMENT>
<NAME>TOM CRUISE</NAME>
<EMAIL>tom@usa.com</EMAIL>
</DOCUMENT>]]>
</SAMPLE>
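
Parsing the example shows that the tags inside the CDATA section arrive as plain text rather than as child elements. This is a minimal sketch using Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# The slide's example, unchanged.
SAMPLE = """<SAMPLE>
<![CDATA[<DOCUMENT>
<NAME>TOM CRUISE</NAME>
<EMAIL>tom@usa.com</EMAIL>
</DOCUMENT>]]>
</SAMPLE>"""

sample = ET.fromstring(SAMPLE)

# The CDATA content is not parsed, so <SAMPLE> has no child elements.
child_count = len(sample)
print(child_count)
print(sample.text)
```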

Entities

Entities are used to avoid typing long pieces of text repeatedly within a document.
There are two categories of entities:

General entities
Syntax
<!ENTITY ADDRESS "text that is to be represented by an entity">

Parameter entities
Syntax
<!ENTITY % ADDRESS "text that is to be represented by an entity">

Examples of Entities

An example of a general entity
Entity declaration
<!ENTITY full_address "My Address 12 Tenth Ave. Suite 12 Paris, France">
Entity reference
Syntax
&ENTITY_NAME;
Example
&full_address;

General entity references used in attribute values
< CLIENT = "&APTECH;"
PRODUCT = "&PRODUCT_ID;"
QUANTITY = "15">

An example of a parameter entity
Entity reference
Syntax
%PARAMETER_ENTITY_NAME;
Example
%address;


The DOCTYPE declarations

The <!DOCTYPE [..]> declaration follows the XML declaration in an XML document.
Syntax
<?xml version="1.0"?>
<!DOCTYPE myDoc [
...declare the entities here....
]>
<myDoc>
...body of the document....
</myDoc>

Example
<!DOCTYPE CUSTOMERS [
<!ENTITY firstFloor "15 Downing St Floor 1">
<!ENTITY secondFloor "15 Downing St Floor 2">
<!ENTITY thirdFloor "15 Downing St Floor 3">
]>
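
Entities declared in the DOCTYPE are expanded wherever the body refers to them. The sketch below parses such a document with Python's standard xml.etree.ElementTree module; the CUSTOMERS body and the ADDRESS element are assumptions added for illustration, since the slide shows only the declaration.

```python
import xml.etree.ElementTree as ET

# DOCTYPE from the slide (two of its entities) plus a hypothetical body.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE CUSTOMERS [
<!ENTITY firstFloor "15 Downing St Floor 1">
<!ENTITY secondFloor "15 Downing St Floor 2">
]>
<CUSTOMERS>
<ADDRESS>&firstFloor;</ADDRESS>
<ADDRESS>&secondFloor;</ADDRESS>
</CUSTOMERS>"""

# The parser replaces each general entity reference with its declared text.
addresses = [address.text for address in ET.fromstring(DOCUMENT).iter("ADDRESS")]
print(addresses)
```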

Attributes

An attribute gives information about an element.
Attributes are embedded in the element start tag.
An attribute consists of an attribute name and an attribute value.

Example
<TV count="8">SONY</TV>
<LAPTOP count="10">IBM</LAPTOP>
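
Attribute values can be read by name once the document is parsed. The sketch below uses Python's standard xml.etree.ElementTree module; the STOCK root element is a hypothetical wrapper added because a document needs a single root.

```python
import xml.etree.ElementTree as ET

# The slide's two elements wrapped in a hypothetical <STOCK> root.
STOCK = '<STOCK><TV count="8">SONY</TV><LAPTOP count="10">IBM</LAPTOP></STOCK>'

# Attribute values are always strings, so convert them before doing arithmetic.
counts = {item.tag: int(item.get("count")) for item in ET.fromstring(STOCK)}
total = sum(counts.values())
print(counts, total)
```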

Summary-1

A markup language defines a set of rules that adds meaning to the content and structure of documents.
XML is extensible, which means that we can define our own set of tags and make it possible for other parties (people or programs) to know and understand these tags. This makes XML much more flexible than HTML.
XML inherits features from SGML and includes the features of HTML.
XML can be generated from existing databases using a scalable three-tier model. XML-based data does not contain information about how the data should be displayed.
An XML document is composed of a set of entities identified by unique names.


Summary-2

A well-formed document is one that conforms to the basic rules of XML; a valid document is a well-formed document that conforms to the rules of a DTD (Document Type Definition).
The parser helps the computer to interpret an XML file.
The steps involved in building an XML document are:
Stating an XML declaration
Creating a root element
Creating the XML code
Verifying the document
Character data is classified into PCDATA and CDATA.


Summary-3

Entities are used to avoid typing long pieces of text repeatedly in a document. The two types of entities are:
General entities
Parameter entities
The <!DOCTYPE []> declaration follows the XML declaration in an XML document.
An attribute gives information about an element.

