Sunteți pe pagina 1din 34

Lecture 41

XML

Semistructured Data
Extensible Markup Language
Document Type Definitions
Source: slides and textbook of Dr. J. D. Ullman’s
1
Graphs of Semistructured Data
 Nodes = objects.
 Labels on arcs (attributes, relationships).
 Atomic values at leaf nodes (nodes with
no arcs out).
 Flexibility: no restriction on:
 Labels out of a node.
 Number of successors with a given label.

2
Example: Data Graph
root Notice a
new kind
beer beer of data.
bar
manf manf prize
name A.B.
name
servedAt year award
Bud
M’lob 1995 Gold
name addr

Joe’s Maple The beer object


for Bud
The bar object
for Joe’s Bar

3
Example
 The BARS:
BARS

BAR BAR
BAR
NAME ...
BEER
BEER
Joe’s Bar
PRICE NAME PRICE
NAME
Bud 2.50 Miller 3.00
4
Well-Formed and Valid XML
 Well-Formed XML allows you to invent
your own tags.
 Similar to labels in semistructured data.
 Valid XML involves a DTD (Document
Type Definition), which limits the labels
and gives a grammar for their use.

5
Well-Formed XML
 Start the document with a declaration,
surrounded by <?xml … ?> .
 Normal declaration is:
<? xml version = “1.0”
standalone = “yes” ?>
 “standalone” = “no DTD provided.”
 Balance of document is a root tag
surrounding nested tags.
6
Example
 The <BARS> XML document is:
BARS

BAR BAR
BAR
NAME ...
BEER
BEER
Joe’s Bar
PRICE NAME PRICE
NAME
Bud 2.50 Miller 3.00
7
Example: The <BARS> XML document
<? xml version=“1.0 standalone = “yes” ?>
<BARS>
<BAR><NAME>Joe’s Bar</NAME>
<BEER><NAME>Bud</NAME>
<PRICE>2.50</PRICE>
</BEER>
<BEER><NAME>Miller</NAME> BARS
<PRICE>3.00</PRICE>
</BEER>
BAR BAR
</BAR>
<BAR> …</BAR> BAR
<BAR> …</BAR>
NAME BEER ...
</BARS>
BEER
Joe’s Bar
PRICE NAME PRICE
NAME
Bud 2.50 Miller 3.00

8
Semistructured Data
StarsMovies

star movie
star
cf
mh sw
address name
name city year
address street title
Carrie
Mark
Fisher
city Hamill Oak B’wood Star 1977
city street
Wars
street

Maple H’wood Locust Malibu

9
XML Example
<?xml version="1.0" standalone = "yes" ?>

<STARSMOVIES>
<STAR>
<NAME>Carrie Fisher</NAME>
<ADDRESS>
<STREET>Maple</STREET>
<CITY>H&apos;wood</CITY>
</ADDRESS>
<ADDRESS>
<STREET>Locust</STREET>
<CITY>Malibu</CITY>
</ADDRESS>
</STAR>
<STAR>
<NAME>Mark Hamill</NAME>
<STREET>Oak</STREET>
<CITY>B&apos;wood</CITY>
</STAR>
<MOVIE>
<TITLE>Star Wars</TITLE>
<YEAR>1977</YEAR>
</MOVIE>
</STARSMOVIES>
10
DTD Structure
<!DOCTYPE <root tag> [
<!ELEMENT <name> ( <components> )
<more elements>
]>

11
Example: The <BARS> XML document
<? xml version=“1.0” standalone = “yes” ?> <!DOCTYPE BARS [
<BARS> <!ELEMENT BARS (BAR*)>
<BAR><NAME>Joe’s Bar</NAME> <!ELEMENT BAR (NAME, BEER+)>
<BEER><NAME>Bud</NAME> <!ELEMENT NAME (#PCDATA)>
<PRICE>2.50</PRICE> <!ELEMENT BEER (NAME, PRICE)>
</BEER>
<BEER><NAME>Miller</NAME>
<!ELEMENT PRICE (#PCDATA)>
<PRICE>3.00</PRICE> ]>
</BEER>
</BAR> BARS
<BAR> …</BAR>
<BAR> …</BAR> BAR BAR
</BARS>
BAR
NAME BEER ...
BEER
Joe’s Bar
PRICE NAME PRICE
NAME
12
Bud 2.50 Miller 3.00
Element Descriptions
 Subtags must appear in order shown.
 A tag may be followed by a symbol to
indicate its multiplicity.
 * = zero or more.
 + = one or more.
 ? = zero or one.
 Symbol | can connect alternative
sequences of tags.
13
End of review

14
Use of DTD’s
1. Set standalone = “no”.
2. Either:
a) Include the DTD as a preamble of the
XML document, or
b) Follow DOCTYPE and the <root tag> by
SYSTEM and a path to the file where the
DTD can be found.

15
Example (a)
<? xml version = “1.0” standalone=“no” ?>
<!DOCTYPE BARS [
<!ELEMENT BARS (BAR*)>
<!ELEMENT BAR (NAME, BEER+)> The DTD
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT BEER (NAME, PRICE)>
<!ELEMENT PRICE (#PCDATA)> The document
]>
<BARS>
<BAR><NAME>Joe’s Bar</NAME>
<BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
<BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
</BAR>
<BAR> … </BAR>
</BARS>

16
Example (b)
 Assume the BARS DTD is in file bar.dtd.
<? xml version = “1.0” standalone=“no” ?>
<!DOCTYPE BARS SYSTEM “bar.dtd”>
<BARS> Get the DTD
<BAR><NAME>Joe’s Bar</NAME> from the file
<BEER><NAME>Bud</NAME> bar.dtd
<PRICE>2.50</PRICE></BEER>
<BEER><NAME>Miller</NAME>
<PRICE>3.00</PRICE></BEER>
</BAR>
<BAR> … </BAR>
</BARS>

17
Attributes
 Opening tags in XML can have
attributes.
 In a DTD,
<!ATTLIST E . . . >
declares an attribute for element E,
along with its datatype.

18
ATTLIST syntax
<!ATTLIST element-name attribute-name
attribute-type default-value>

attribute-type:
CDATA The value is character data
(en1|en2|..) The value must be one from an
enumerated list
ID The value is a unique id
IDREF The value is the id of another element
IDREFS The value is a list of other ids

Source:http://www.w3schools.com/dtd/dtd_attributes.asp 19
ATTLIST syntax (cont’d)
<!ATTLIST element-name attribute-name
attribute-type default-value>

default-value :
value The default value of the attribute
#REQUIRED The attribute value must be included in the element
#IMPLIED The attribute does not have to be included
#FIXED value The attribute value is fixed

Source:http://www.w3schools.com/dtd/dtd_attributes.asp 20
Example: Attributes
 Bars can have an attribute kind, a
character string describing the bar.
<!ELEMENT BAR (NAME BEER*)>
<!ATTLIST BAR kind CDATA
#IMPLIED>
Character string
type; no tags
Attribute is optional
opposite: #REQUIRED
21
Example: Attribute Use
 In a document that allows BAR tags, we might
see:
<BAR kind = “sushi”> Note attribute
values are quoted
<NAME>Akasaka</NAME>
<BEER><NAME>Sapporo</NAME>
<PRICE>5.00</PRICE></BEER>
...
</BAR>
22
Example: Attributes
 Bars can have an attribute kind, which is
either sushi, sports, or other.
<!ELEMENT BAR (NAME BEER*)>
<!ATTLIST BAR kind (sushi |
sports | other) “sushi”>

23
Example (c)
<? xml version = “1.0” standalone=“no” ?>
<!DOCTYPE BARS [
<!ELEMENT BARS (BAR*)>
<!ELEMENT BAR (NAME, BEER+)>
<!ATTLIST BAR kind (sushi | sports | other) “sushi”>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT BEER (NAME, PRICE)>
<!ELEMENT PRICE (#PCDATA)>
]>
<BARS>
<BAR kind = “sushi”><NAME>Joe’s Bar</NAME>
<BEER><NAME>Bud</NAME>
<PRICE>2.50</PRICE></BEER>
<BEER><NAME>Miller</NAME>
<PRICE>3.00</PRICE></BEER>
</BAR>
<BAR kind = “sports”> … </BAR>
</BARS> 24
Example (d)
<?xml version = “1.0” standalone=“no” ?>
<!DOCTYPE BARS [
<!ELEMENT BARS (BAR*)>
<!ELEMENT BAR (NAME, kind, BEER+)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT kind (#PCDATA)>
<!ELEMENT BEER (NAME, PRICE)>
<!ELEMENT PRICE (#PCDATA)>
]>
<BARS>
<BAR><NAME>Joe’s Bar</NAME> <kind>sushi</kind>
<BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
<BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
</BAR>
<BAR> … </BAR>
</BARS>

25
ID’s and IDREF’s
 Attributes can be pointers from one
object to another.
 Compare to HTML’s NAME = “foo” and
HREF = “#foo”.
 Allows the structure of an XML
document to be a general graph, rather
than just a tree.

26
Creating ID’s
 Give an element E an attribute A of
type ID.
 When using tag <E > in an XML
document, give its attribute A a unique
value.
 Example:
<E A = “xyz”>

27
Creating IDREF’s
 To allow objects of type F to refer to
another object with an ID attribute,
give F an attribute of type IDREF.
 Or, let the attribute have type IDREFS,
so the F –object can refer to any
number of other objects.

28
Example: ID’s and IDREF’s
 Let’s redesign our BARS DTD to include both
BAR and BEER subelements.
 Both bars and beers will have ID attributes
called name.
 Bars have PRICE subobjects, consisting of a
number (the price of one beer) and an IDREF
theBeer leading to that beer.
 Beers have attribute soldBy, which is an
IDREFS leading to all the bars that sell it.

29
The DTD Bar elements have name
as an ID attribute and
have one or more
SELLS subelements.
<!DOCTYPE BARS [
<!ELEMENT BARS (BAR*, BEER*)>
SELLS elements
<!ELEMENT BAR (SELLS+)> have a number
<!ATTLIST BAR name ID #REQUIRED> (the price) and
<!ELEMENT SELLS (#PCDATA)> one reference
to a beer.
<!ATTLIST SELLS theBeer IDREF #REQUIRED>
<!ELEMENT BEER EMPTY>
<!ATTLIST BEER name ID #REQUIRED>
<!ATTLIST BEER soldBy IDREFS #IMPLIED>
]>
Explained Beer elements have an ID attribute called name,
next and a soldBy attribute that is a set of Bar names.
30
Example Document
<BARS>
<BAR name = “JoesBar”>
<SELLS theBeer = “Bud”>2.50</SELLS>
<SELLS theBeer = “Miller”>3.00</SELLS>
</BAR> …
<BEER name = “Bud” soldBy = “JoesBar
SuesBar …”/> …
</BARS>
31
Empty Elements
 We can do all the work of an element in
its attributes.
 Like BEER in previous example.
 Another example: SELLS elements could
have attribute price rather than a
value that is a price.

32
Example: Empty Element
 In the DTD, declare:
<!ELEMENT SELLS EMPTY>
<!ATTLIST SELLS theBeer IDREF #REQUIRED>
<!ATTLIST SELLS price CDATA #REQUIRED>
 Example use:
<SELLS theBeer = “Bud” price = “2.50”/>

Note exception to
“matching tags” rule 33
34

S-ar putea să vă placă și