
PHP XML Tutorial: Create, Parse, Read with Example

What is XML?
XML is the acronym for Extensible Markup Language.

XML is used to structure, store and transport data from one system to another.

XML is similar to HTML.

It uses opening and closing tags.

Unlike HTML, XML allows users to define their own tags.

In this tutorial, you will learn:

• What is DOM?
• XML Parsers
• Why use XML?
• XML Document Example
• How to Read XML using PHP
• How to Create an XML document using PHP

What is DOM?
DOM is the acronym for Document Object Model.

It is a cross-platform, language-neutral standard that defines how to access and manipulate
data in:

• HTML
• XHTML
• XML

DOM XML is used to access and manipulate XML documents. It views the XML document as a
tree-structure.

XML Parsers
An XML parser is a program that reads an XML document and translates it into an XML Document
Object Model (DOM) object.

The XML DOM object can then be manipulated from languages such as JavaScript, Python, and PHP.

A CDATA (character data) section, written <![CDATA[ ... ]]>, marks text that the parser must
treat as literal characters rather than markup, so special characters such as “<”, “>” and “&”
can appear inside it without being escaped.
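To make the CDATA behaviour concrete, here is a minimal sketch. The <note> and <code> element names and the sample text are made up for illustration; only DOMDocument and its standard methods are used.

```php
<?php
// Hypothetical sample: a <code> element whose text contains markup characters.
$xml = '<note><code><![CDATA[if (a < b && b > c) { run(); }]]></code></note>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$codeNode = $doc->getElementsByTagName('code')->item(0);

// By default the DOM keeps the section as a dedicated CDATA node.
$isCdata = $codeNode->firstChild instanceof DOMCdataSection;

// The text comes back exactly as written, with no entity escaping needed.
$text = $codeNode->textContent;
echo $text, "\n";
```

Without the CDATA wrapper, the same text would have to be written with &lt; and &amp; entities to keep the document well formed.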

Why use XML?

• Web services exchange information in XML: SOAP always does, and many REST services do as
well. Understanding what XML is and how it works gives you a competitive advantage as a
developer, since modern applications make heavy use of web services.
• XML documents can be used to store the configuration settings of an application.
• XML lets you define your own custom tags, which makes it flexible.

XML Document example


Let’s suppose that you are developing an application that gets data from a web service in XML
format.

Below is a sample of what the XML document looks like.

<?xml version="1.0" encoding="utf-8"?>
<employees status="ok">
    <record man_no="101">
        <name>Joe Paul</name>
        <position>CEO</position>
    </record>
    <record man_no="102">
        <name>Tasha Smith</name>
        <position>Finance Manager</position>
    </record>
</employees>

HERE,

• “<?xml version="1.0" encoding="utf-8"?>” specifies the XML version and the character encoding
of the document.
• “<employees status="ok">” is the root element.
• “<record…>…</record>” are the child elements of employees. Each record holds the name and
position of one employee.

How to Read XML using PHP


Let’s now write the code that reads the employees XML document and displays the results in
a web browser. Save it as index.php.

<?php
// Load the XML file into a SimpleXMLElement object
$xml = simplexml_load_file('employees.xml');

echo '<h2>Employees Listing</h2>';

// All <record> children of the root element
$list = $xml->record;

for ($i = 0; $i < count($list); $i++) {
    echo '<b>Man no:</b> ' . $list[$i]->attributes()->man_no . '<br>';
    echo 'Name: ' . $list[$i]->name . '<br>';
    echo 'Position: ' . $list[$i]->position . '<br><br>';
}
?>

HERE,

• “$xml = simplexml_load_file('employees.xml');” uses the simplexml_load_file function to load
the file named employees.xml and assigns the result, a SimpleXMLElement object, to the variable
$xml.
• “$list = $xml->record;” gets the record child elements of the root element.
• “for ($i = 0; $i < count(…)…” is the for loop that walks through the records by numeric index
and outputs the results.
• “$list[$i]->attributes()->man_no;” reads the man_no attribute of the element.
• “$list[$i]->name;” reads the value of the name child element.
• “$list[$i]->position;” reads the value of the position child element.

Testing our application


Assuming you saved the file index.php in the phptuts/xml folder, browse to the URL
http://localhost/phptuts/xml/index.php

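The same records can also be read with foreach instead of a counter. The sketch below is self-contained: it assumes the employees document is available as a string (the $source and $rows names are illustrative) and uses simplexml_load_string so it runs without a file on disk.

```php
<?php
// Inline copy of the employees document used in this tutorial.
$source = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<employees status="ok">
  <record man_no="101"><name>Joe Paul</name><position>CEO</position></record>
  <record man_no="102"><name>Tasha Smith</name><position>Finance Manager</position></record>
</employees>
XML;

$xml = simplexml_load_string($source);
if ($xml === false) {
    exit("The XML could not be parsed\n");
}

// Cast each value to string so plain PHP data is stored, not SimpleXMLElement objects.
$rows = [];
foreach ($xml->record as $record) {
    $rows[] = [
        'man_no'   => (string) $record['man_no'],   // attribute access
        'name'     => (string) $record->name,       // child element access
        'position' => (string) $record->position,
    ];
}

foreach ($rows as $row) {
    echo $row['man_no'], ': ', $row['name'], ' (', $row['position'], ")\n";
}
```

Checking the return value for false matters in practice, because simplexml_load_file and simplexml_load_string return false when the input is missing or not well formed.
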
How to Create an XML document using PHP
We will now look at how to create an XML document using PHP.

The example builds a small movie list with a Movies root element and one movie child.

The following code uses the built-in PHP class DOMDocument to create the XML document.

<?php
$dom = new DOMDocument();
$dom->encoding = 'utf-8';
$dom->xmlVersion = '1.0';
$dom->formatOutput = true;

$xml_file_name = 'movies_list.xml';

// Root element
$root = $dom->createElement('Movies');

// One movie element with a movie_id attribute
$movie_node = $dom->createElement('movie');
$attr_movie_id = new DOMAttr('movie_id', '5467');
$movie_node->setAttributeNode($attr_movie_id);

// Child elements of the movie
$child_node_title = $dom->createElement('Title', 'The Campaign');
$movie_node->appendChild($child_node_title);

$child_node_year = $dom->createElement('Year', '2012');
$movie_node->appendChild($child_node_year);

$child_node_genre = $dom->createElement('Genre', 'Comedy');
$movie_node->appendChild($child_node_genre);

$child_node_ratings = $dom->createElement('Ratings', '6.2');
$movie_node->appendChild($child_node_ratings);

// Assemble the tree and write it to disk
$root->appendChild($movie_node);
$dom->appendChild($root);
$dom->save($xml_file_name);

echo '<a href="' . $xml_file_name . '">' . $xml_file_name . '</a> has been successfully created';
?>

HERE,

• “$dom = new DOMDocument();” creates an instance of the DOMDocument class.
• “$dom->encoding = 'utf-8';” sets the document encoding to utf-8.
• “$dom->xmlVersion = '1.0';” specifies the version number 1.0.
• “$dom->formatOutput = true;” ensures that the output is indented and easy to read.
• “$root = $dom->createElement('Movies');” creates the root node named Movies.
• “$attr_movie_id = new DOMAttr('movie_id', '5467');” defines the movie_id attribute of the movie
node.
• “$child_node_element_name = $dom->createElement('ElementName', 'ElementValue')” creates
a child node of the movie node. ElementName specifies the name of the element, e.g. Title.
ElementValue sets the child node value, e.g. The Campaign.
• “$root->appendChild($movie_node);” appends the movie node to the root node Movies.
• “$dom->appendChild($root);” appends the root node to the XML document.
• “$dom->save($xml_file_name);” saves the XML file. Because the file name is relative, the file
is written to the script's current working directory, which on a typical web server setup is the
folder that contains the script.
• “echo '<a href="' . $xml_file_name . '">' . $xml_file_name . '</a> has been successfully created';”
creates the link to the XML file.

Testing our application


Assuming you saved the file create_movies_list.php in the phptuts/xml folder, browse to the URL
http://localhost/phptuts/xml/create_movies_list.php
Click on the movies_list.xml link.
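One detail worth knowing when values come from user input: the value passed as the second argument to createElement is not escaped, so text containing “&” is safer to add through createTextNode. The sketch below illustrates this with a made-up movie title and id, and uses saveXML to return the document as a string instead of writing a file.

```php
<?php
$dom = new DOMDocument('1.0', 'utf-8');
$dom->formatOutput = true;

$root = $dom->createElement('Movies');
$dom->appendChild($root);

// Hypothetical sample data, not taken from the tutorial.
$movie = $dom->createElement('movie');
$movie->setAttribute('movie_id', '1001');
$root->appendChild($movie);

// createTextNode escapes special characters when the document is serialised.
$title = $dom->createElement('Title');
$title->appendChild($dom->createTextNode('Fast & Furious'));
$movie->appendChild($title);

$xml = $dom->saveXML();
echo $xml;
```

The serialised title reads Fast &amp; Furious, which keeps the output well formed.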

Summary
• XML is the acronym for Extensible Markup Language.
• XML can be used to exchange information between systems or to store the configuration
settings of an application.
• DOM is the acronym for Document Object Model. The XML DOM views the XML document as a
tree structure.
• An XML parser is a program that translates an XML document into a DOM tree structure.
• A CDATA section marks text that the parser treats as literal characters, so special characters
inside it are not interpreted as markup.
• PHP uses the simplexml_load_file function to read XML documents; it returns the result as a
SimpleXMLElement object.
• PHP uses the DOMDocument class to create XML files.

IBM XML

Reading and writing Extensible Markup Language (XML) in PHP may seem a little frightening.
In fact, XML and all its related technologies can be intimidating. However, reading and writing
XML in PHP doesn't have to be a daunting task. First, you need to learn a little about XML --
what it is and what it's used for. Then, you need to learn how to read and write XML in PHP,
which you can do in many ways.

This article provides a short primer on XML, then explains how to read and write XML in PHP.

What is XML?
XML is a data storage format. It doesn't define what data is being stored or the structure of that
data. XML simply defines tags and attributes for those tags. A properly formed XML tag looks
like this:

<name>Jack Herrington</name>

This <name> tag contains some text: Jack Herrington.

An XML tag that contains no text looks like this:

<powerUp />

There may be more than one way to code something in XML. For instance, this tag produces the
same output as the previous one:

<powerUp></powerUp>

You can also add attributes to an XML tag. For example, this <name> tag contains first and
last attributes:

<name first="Jack" last="Herrington" />

You can encode special characters in XML, too. For instance, an ampersand is encoded like this:

&amp;

An XML document that contains tags and attributes formatted like the examples provided is well
formed, which means the tags are balanced, and the characters are encoded properly. Listing 1 is
an example of well-formed XML.

Listing 1. An XML book list example

<books>
  <book>
    <author>Jack Herrington</author>
    <title>PHP Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
  <book>
    <author>Jack Herrington</author>
    <title>Podcasting Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
</books>

The XML in Listing 1 contains a list of books. The parent <books> tag includes a set of <book>
tags that each contain <author>, <title>, and <publisher> tags.

XML documents are valid when the structure of the tags and their content is validated by an
external schema file. Schema files can be specified in a variety of formats. For the purposes of
this article, all you need is well-formed XML.

If you think XML looks a lot like Hypertext Markup Language (HTML), you're right. Both XML
and HTML are tag-based languages, and they have many similarities. However, it's important to
note that while XML documents can be well-formed HTML, not all HTML documents are well-
formed XML. The break tag (br) is an excellent example of the differences between XML and
HTML. This line break is well-formed HTML, but not well-formed XML:

<p>This is a paragraph<br>
With a line break</p>

This line break is well-formed XML and HTML:

<p>This is a paragraph<br />
With a line break</p>

If you want to write HTML that is well-formed XML, follow the Extensible Hypertext Markup
Language (XHTML) standard from the World Wide Web Consortium (W3C). All modern
browsers render XHTML. Plus, it's possible to use XML tools to read XHTML and to find data
in the documents, which is far easier than parsing through HTML.
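A quick way to see the difference is to ask an XML parser. The following sketch uses a helper named is_well_formed_xml (a hypothetical name chosen for this example) that relies on DOMDocument::loadXML returning false for input that is not well formed.

```php
<?php
// Returns true when the markup parses as well-formed XML.
function is_well_formed_xml(string $markup): bool
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML($markup);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok !== false;
}

$htmlStyle = "<p>This is a paragraph<br>\nWith a line break</p>";
$xmlStyle  = "<p>This is a paragraph<br />\nWith a line break</p>";

var_dump(is_well_formed_xml($htmlStyle)); // bool(false)
var_dump(is_well_formed_xml($xmlStyle));  // bool(true)
```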

Reading XML using the DOM library


The easiest way to read a well-formed XML file is to use the Document Object Model (DOM)
library compiled into some installations of PHP. The DOM library reads the entire XML
document into memory and represents it as a tree of nodes, as illustrated in Figure 1.

Figure 1. XML DOM tree for the books XML

The books node at the top of the tree has two child book tags. Within each book, there are
author, publisher, and title nodes. The author, publisher, and title nodes each have
child text nodes that contain the text.

The code to read the books XML file and display the contents using the DOM is shown in
Listing 2.

Listing 2. Reading books XML with the DOM

<?php
$doc = new DOMDocument();
$doc->load( 'books.xml' );

$books = $doc->getElementsByTagName( "book" );
foreach( $books as $book )
{
  $authors = $book->getElementsByTagName( "author" );
  $author = $authors->item(0)->nodeValue;

  $publishers = $book->getElementsByTagName( "publisher" );
  $publisher = $publishers->item(0)->nodeValue;

  $titles = $book->getElementsByTagName( "title" );
  $title = $titles->item(0)->nodeValue;

  echo "$title - $author - $publisher\n";
}
?>

The script starts by creating a new DOMDocument object and loading the books XML into that
object using the load method. After that, the script uses the getElementsByTagName method to
get a list of all of the elements with the given name.

Within the loop of the book nodes, the script uses the getElementsByTagName method to find the
author, publisher, and title tags and reads the nodeValue of each. The nodeValue is the text
within the node. The script then displays those values.

You can run the PHP script on the command line like this:

% php e1.php
PHP Hacks - Jack Herrington - O'Reilly
Podcasting Hacks - Jack Herrington - O'Reilly
%

As you can see, a line is printed for each book block. That's a good start. However, what if you
don't have access to the XML DOM library?

Reading XML using the SAX parser


Another way to read XML is to use the Simple API for XML (SAX) parser. Most installations of
PHP include the SAX parser. The SAX parser runs on a callback model. Every time a tag is
opened or closed, or any time the parser sees some text, it makes callbacks to some user-defined
functions with the node or text information.

The advantage of a SAX parser is that it's really lightweight. The parser doesn't keep anything in
memory for very long, so it can be used for extremely large files. The disadvantage is that
writing SAX parser callbacks is a big nuisance. Listing 3 shows the code to read the books XML
file and display the contents using SAX.

Listing 3. Reading books XML with the SAX parser

<?php
$g_books = array();
$g_elem = null;

// Called for every opening tag. Element names arrive in upper case
// because the parser folds case by default.
function startElement( $parser, $name, $attrs )
{
  global $g_books, $g_elem;
  if ( $name == 'BOOK' ) $g_books []= array();
  $g_elem = $name;
}

// Called for every closing tag
function endElement( $parser, $name )
{
  global $g_elem;
  $g_elem = null;
}

// Called for text between tags
function textData( $parser, $text )
{
  global $g_books, $g_elem;
  if ( $g_elem == 'AUTHOR' ||
       $g_elem == 'PUBLISHER' ||
       $g_elem == 'TITLE' )
  {
    // Text can arrive in several chunks, so append rather than overwrite
    $last = count( $g_books ) - 1;
    $g_books[ $last ][ $g_elem ] = ( $g_books[ $last ][ $g_elem ] ?? '' ) . $text;
  }
}

$parser = xml_parser_create();

xml_set_element_handler( $parser, "startElement", "endElement" );
xml_set_character_data_handler( $parser, "textData" );

$f = fopen( 'books.xml', 'r' );

while( $data = fread( $f, 4096 ) )
{
  xml_parse( $parser, $data );
}
fclose( $f );
xml_parser_free( $parser );

foreach( $g_books as $book )
{
  echo $book['TITLE']." - ".$book['AUTHOR']." - ";
  echo $book['PUBLISHER']."\n";
}
?>

The script starts by setting up the g_books array, which holds all the books and their information
in memory, and a g_elem variable, which stores the name of the tag the script is currently
processing. The script then defines the callback functions. In this example, the callback functions
are startElement, endElement, and textData. The startElement and endElement functions
are called when tags are opened and closed, respectively. The textData function is called on the
text between the start and end of the tags.

In this example, the startElement function looks for the book tag to start a new element in the
book array. Then, the textData function looks at the current element to see if it's a publisher,
title, or author tag. If so, the function puts the current text into the current book. The tag
names are compared in upper case because the parser converts element names to upper case by
default.

To get the parsing going, the script creates the parser with the xml_parser_create function.
Then, it sets the callback handlers. After that, the script reads in the file and sends off chunks of
the file to the parser. After the file is read, the xml_parser_free function deletes the parser. The
end of the script dumps out the contents of the g_books array.
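The callback order is easier to follow on a tiny input. This sketch records each callback as it fires; the $events and $text variables are illustrative names, and closures are used in place of named functions.

```php
<?php
$events = [];   // names of the callbacks in the order they fire
$text = '';     // all non-whitespace character data, concatenated

$parser = xml_parser_create();

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "start:$name";
    },
    function ($parser, $name) use (&$events) {
        $events[] = "end:$name";
    }
);

xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$text) {
        // Character data may be delivered in more than one call.
        if (trim($data) !== '') {
            $text .= $data;
        }
    }
);

// The third argument marks this as the final chunk of input.
xml_parse($parser, '<book><title>PHP Hacks</title></book>', true);

echo implode(', ', $events), "\n";
echo $text, "\n";
```

The element names come back as BOOK and TITLE because case folding is on by default, which is why Listing 3 compares against upper-case names.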

As you can see, this is much tougher code to write than the DOM equivalent. What if you don't
have the DOM library or the SAX library? Is there another alternative?

Parsing XML with regular expressions


I'm certain to be vilified by some engineers for even mentioning this approach, but you can parse
XML with regular expressions. Listing 4 shows an example of using the preg_ functions to read
the books file.

Listing 4. Reading books XML with regular expressions

<?php
$xml = "";
$f = fopen( 'books.xml', 'r' );
while( $data = fread( $f, 4096 ) ) { $xml .= $data; }
fclose( $f );

preg_match_all( "/\<book\>(.*?)\<\/book\>/s",
  $xml, $bookblocks );

foreach( $bookblocks[1] as $block )
{
  preg_match_all( "/\<author\>(.*?)\<\/author\>/",
    $block, $author );
  preg_match_all( "/\<title\>(.*?)\<\/title\>/",
    $block, $title );
  preg_match_all( "/\<publisher\>(.*?)\<\/publisher\>/",
    $block, $publisher );
  echo( $title[1][0]." - ".$author[1][0]." - ".
    $publisher[1][0]."\n" );
}
?>

Notice how short that code is. It starts by reading the file into one big string. It then uses one
regex function to read in each book item. Finally, using the foreach loop, the script loops
around each book block and picks out the author, title, and publisher.

So, what are the shortcomings? The problem with using regular expression code to read XML is
that it doesn't check first to make sure that the XML is well formed. That means you may not
know you have XML that is not well formed before you start reading it. Also, some well-formed
XML (a tag with attributes or extra whitespace, for example) may not match your regular
expressions, so you will have to modify them later.

I never recommend using regular expressions to read XML, but sometimes it's the most
compatible way because the regular expression functions are always available. Don't use regular
expressions to read XML that comes directly from users; you don't control the form or structure
of that XML. Always read XML from users using a DOM library or SAX parser.
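The brittleness is easy to reproduce. In this sketch the book element carries an id attribute (an assumption added for the example; the books file in this article has none), which is perfectly good XML but defeats the pattern from Listing 4, while the DOM still finds the element.

```php
<?php
// Well-formed XML whose <book> tag has an attribute.
$xml = '<books><book id="1"><title>PHP Hacks</title></book></books>';

// The pattern from Listing 4 only matches a bare <book> tag.
preg_match_all('/<book>(.*?)<\/book>/s', $xml, $matches);
$regexCount = count($matches[1]);

// The DOM parser does not care about the attribute.
$doc = new DOMDocument();
$doc->loadXML($xml);
$domCount = $doc->getElementsByTagName('book')->length;

echo "regex found $regexCount, DOM found $domCount\n"; // regex found 0, DOM found 1
```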

Writing XML with the DOM


Reading XML is only one part of the equation. What about writing it? The best way to write
XML is to use the DOM. Listing 5 shows how the DOM builds the books XML file.

Listing 5. Writing books XML with the DOM

<?php
$books = array();
$books [] = array(
  'title' => 'PHP Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);
$books [] = array(
  'title' => 'Podcasting Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);

$doc = new DOMDocument();
$doc->formatOutput = true;

$r = $doc->createElement( "books" );
$doc->appendChild( $r );

foreach( $books as $book )
{
  $b = $doc->createElement( "book" );

  $author = $doc->createElement( "author" );
  $author->appendChild(
    $doc->createTextNode( $book['author'] )
  );
  $b->appendChild( $author );

  $title = $doc->createElement( "title" );
  $title->appendChild(
    $doc->createTextNode( $book['title'] )
  );
  $b->appendChild( $title );

  $publisher = $doc->createElement( "publisher" );
  $publisher->appendChild(
    $doc->createTextNode( $book['publisher'] )
  );
  $b->appendChild( $publisher );

  $r->appendChild( $b );
}

echo $doc->saveXML();
?>

At the top of the script, the books array is loaded with some example books. That data could
come from the user or from a database.

After the example books are loaded, the script creates a new DOMDocument and adds the root
books node to it. Then, for each book, the script creates a book element plus author, title, and
publisher elements, and adds a text node to each of those three elements. The final step for
each book node is to attach it to the root books node.

The end of the script dumps the XML to the console using the saveXML method. (You can also
use the save method to create a file from the XML.) The output of the script is shown in Listing
6.

Listing 6. Output from the DOM build script

% php e4.php
<?xml version="1.0"?>
<books>
  <book>
    <author>Jack Herrington</author>
    <title>PHP Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
  <book>
    <author>Jack Herrington</author>
    <title>Podcasting Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
</books>
%

The real value of using the DOM is that the XML it creates is always well formed. But what can
you do if you don't have access to the DOM to create XML?

Writing XML with PHP

If the DOM isn't available, you can use PHP text templating to write XML. Listing 7 shows how
PHP builds the books XML file.

Listing 7. Writing books XML with PHP

<?php
$books = array();
$books [] = array(
  'title' => 'PHP Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);
$books [] = array(
  'title' => 'Podcasting Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);
?>
<books>
<?php
foreach( $books as $book )
{
?>
<book>
<title><?php echo( $book['title'] ); ?></title>
<author><?php echo( $book['author'] ); ?></author>
<publisher><?php echo( $book['publisher'] ); ?></publisher>
</book>
<?php
}
?>
</books>

The top of the script is similar to the DOM script. The bottom of the script opens the books tag,
then iterates through each book, creating the book tag and all the internal title, author, and
publisher tags.

The problem with this approach is encoding the entities. To make sure that characters such as
&, < and > are properly encoded, an escaping function must be called on each item. For XML the
safe choice is htmlspecialchars with the ENT_XML1 flag, because htmlentities also produces
HTML-only named entities (such as &eacute;) that XML does not define. Listing 8 shows the
escaped version.

Listing 8. Using the htmlspecialchars function to encode entities

<books>
<?php
foreach( $books as $book )
{
  // Escape &, <, > and both kinds of quotes for use in XML
  $title = htmlspecialchars( $book['title'], ENT_QUOTES | ENT_XML1, 'UTF-8' );
  $author = htmlspecialchars( $book['author'], ENT_QUOTES | ENT_XML1, 'UTF-8' );
  $publisher = htmlspecialchars( $book['publisher'], ENT_QUOTES | ENT_XML1, 'UTF-8' );
?>
<book>
<title><?php echo( $title ); ?></title>
<author><?php echo( $author ); ?></author>
<publisher><?php echo( $publisher ); ?></publisher>
</book>
<?php
}
?>
</books>

This is why it's annoying to write XML in basic PHP. You think that you're creating perfect
XML, but then you find that certain elements aren't encoded properly when you try to run data
through it.
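Which escaping function is used matters here. The sketch below, built on a made-up title, compares htmlentities with htmlspecialchars plus the ENT_XML1 flag and then checks each result with an XML parser. The parses helper is a hypothetical name for this example.

```php
<?php
// Returns true when the fragment parses as well-formed XML.
function parses(string $fragment): bool
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok !== false;
}

// Hypothetical title with an accented letter and an ampersand.
$title = "Caf\u{e9} & Co";

// htmlentities turns the accented letter into &eacute;, an HTML-only entity.
$withEntities = '<title>' . htmlentities($title, ENT_QUOTES, 'UTF-8') . '</title>';

// htmlspecialchars with ENT_XML1 only escapes the characters XML reserves.
$withSpecialChars = '<title>'
    . htmlspecialchars($title, ENT_QUOTES | ENT_XML1, 'UTF-8')
    . '</title>';

var_dump(parses($withEntities));     // bool(false)
var_dump(parses($withSpecialChars)); // bool(true)
```

For plain ASCII data the two functions give the same result, which is why the problem tends to surface only once real-world data flows through the template.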

Conclusions
XML has always had a lot of hype and confusion surrounding it. However, it's not as difficult as
you think it is -- especially in a great language like PHP. When you understand and implement
XML properly, you'll find there are a lot of powerful tools you can use. XPath and XSLT are two
such tools that are worth checking out.

Advanced XML parsing techniques

PHP5's XML parsing techniques for large or complex XML documents

Cliff Morgan
Published on March 06, 2007


Advanced techniques to read, manipulate, and write XML

Add XSLT to DOM and SimpleXML APIs

Cliff Morgan
Published on March 13, 2007


This content is part 3 of 3 in the series: XML for PHP developers

PHP5 offers the developer a lot more muscle to work with XML. New and modified extensions
such as the DOM, SimpleXML, and XSL make working with XML less code intensive. In
PHP5, the DOM is compliant with the W3C standard. Most importantly, the interoperability
among these extensions is significant, providing additional functionality, like swapping formats
to extend usability, W3C's XPath, and more, across the board. Here you will look at input and
output options, and you will depend on the Yahoo Web Services REST protocol interface to
provide a more sophisticated showcase for the functionality of the now familiar DOM and
SimpleXML extensions and conclude with the XSL extension.

Previously in this series

Other articles in this series:

• XML for PHP developers, Part 1: The 15-minute PHP-with-XML starter
• XML for PHP developers, Part 2: Advanced XML parsing techniques

The first article of this series provided essential information on XML. It focused on quick-start
Application Programming Interfaces (APIs) and demonstrated how SimpleXML, combined with
the Document Object Model (DOM) where necessary, is the ideal choice if you work with
straightforward, predictable, and relatively basic XML documents. Part 2 looked at the breadth
of parsing APIs available for XML in PHP5, including SimpleXML, the DOM, Simple API for XML
(SAX), and XMLReader, and considered which parsing techniques were most appropriate for
different sizes and complexities of XML documents.

XML in PHP5

Extensible Markup Language (XML), described as both a markup language and a text-based data
storage format, offers a text-based means to apply and describe a tree-based structure to
information. Here you'll look at XML in the context of Web services, probably one of the most
important factors driving the recent growth of XML outside the enterprise world.

In PHP5, there are totally new and entirely rewritten extensions for manipulating XML, all based
on the same libxml2 code. This common base provides interoperability between these extensions
that extends the functionality of each. The tree-based parsers include SimpleXML, the DOM,
and the XSLT processor. If you are familiar with the DOM from other languages, you will have
an easier time coding with similar functionality in PHP than before. The stream-based parsers
include the Simple API for XML (SAX) and XMLReader. SAX functions the same way it did in
PHP4.

Manipulating XML using the DOM


You can use the DOM to manipulate an XML file. Using the DOM is efficient only when the XML
file is relatively small. The advantages of this method are the solid standard of the familiar W3C
DOM, its methods, and the flexibility it brings to coding. The disadvantages of the DOM are the
difficulty in coding and performance issues with large documents.

The DOM in action


With the DOM, you can build, modify, query, validate and transform XML documents. All
DOM methods and properties can be used, and most DOM level 2 methods are implemented
with properties properly supported. Documents parsed with the DOM can be as complex as they
come thanks to its tremendous flexibility. Remember, however, that flexibility comes at a price if
you load a large XML document into memory all at once.

The examples in this article use Yahoo's search API, PHP5, and REpresentational State Transfer
(REST) to illustrate the use of the DOM in an interesting application environment. Yahoo chose
REST because of a common belief among developers that REST offers 80% of SOAP's benefits
at 20% of the cost. I chose this application to showcase PHP/XML because the popularity of
Web services is probably one of the most important factors driving the recent growth of XML
outside the enterprise world.

Typically, REST forms a request by beginning with a service entry URL and then appending
search parameters in the form of a query string. Then Listing 1 parses the results of the query
using the DOM extension.
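As a small illustration of how such a request URL is put together, the sketch below assembles the query string with http_build_query. The entry URL and the appid value are the ones used in this article's listings; the $params array is just one way to organise them, and the quoted search phrase is encoded here as %22 without the backslash that appears in the listings.

```php
<?php
// Service entry URL, as used in the listings.
$base = 'http://api.search.yahoo.com/WebSearchService/V1/webSearch';

// Search parameters; the phrase is quoted to request an exact match.
$params = [
    'query' => '"XML Query"',
    'appid' => 'YahooDemo',
];

// PHP_QUERY_RFC3986 encodes spaces as %20 instead of "+".
$url = $base . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);

echo $url, "\n";
```

Letting http_build_query do the encoding avoids hand-written escape sequences in the URL.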

Listing 1. The Yahoo Demo code sample using the DOM

<?php

//This query does a search for any Web pages relevant to "XML Query"
$query = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
         "query=%5C%22XML%20Query%5C%22&appid=YahooDemo";

//Create the DOM Document object from the XML returned by the query
$xml = file_get_contents($query);
$dom = new DOMDocument;
$dom->loadXML($xml);

function xml_to_result($dom) {

    //This function takes the XML document and maps it to a
    //PHP array so that you can manipulate it later.
    $res = array();

    //First, retrieve the root element for the document
    $root = $dom->documentElement;

    //Next, loop through each of its attributes
    foreach($root->attributes as $attr) {
        $res[$attr->name] = $attr->value;
    }

    //Now, loop through each of the children of the root element
    //and treat each appropriately.

    //Start with the first child node. (The counter, i, is for
    //tracking results.)
    $node = $root->firstChild;
    $i = 0;

    //Now keep looping through as long as there is a node to work
    //with. (At the bottom of the loop, the code moves to the next
    //sibling, so when it runs out of siblings, the routine stops.)
    while($node) {

        //For each node, check to see whether it's a Result element or
        //one of the informational elements at the start of the document.
        switch($node->nodeName) {

            //Result elements need more analysis.
            case 'Result':
                //Add each child node of the Result to the result array,
                //again starting with the first child.
                $subnode = $node->firstChild;
                while($subnode) {

                    //Some of these nodes are just whitespace, which does
                    //not have children.
                    if ($subnode->hasChildNodes()){

                        //If it does have children, get a NodeList of them, and
                        //loop through it.
                        $subnodes = $subnode->childNodes;
                        foreach($subnodes as $n) {

                            //Again check for children, adding them directly or
                            //indirectly as appropriate.
                            if($n->hasChildNodes()) {
                                foreach($n->childNodes as $cn){
                                    $res[$i][$subnode->nodeName][$n->nodeName]=
                                        trim($cn->nodeValue);
                                }
                            } else {
                                $res[$i][$subnode->nodeName]=trim($n->nodeValue);
                            }
                        }
                    }
                    //Move on to the next subnode.
                    $subnode = $subnode->nextSibling;
                }
                $i++;
                break;

            //Other elements are just added to the result array.
            default:
                $res[$node->nodeName] = trim($node->nodeValue);
                break;
        }

        //Move on to the next Result or informational element
        $node = $node->nextSibling;
    }
    return $res;
}

//First, convert the DOM document to a result array you can manipulate.
$res = xml_to_result($dom);

//Use one of those "informational" elements to display the total
//number of results for the query.
echo "<p>The query returns ".$res["totalResultsAvailable"].
     " total results. The first 10 are as follows:</p>";

//Now loop through each of the actual results.
for($i=0; $i<$res['totalResultsReturned']; $i++) {

    echo "<a href='".$res[$i]['ClickUrl']."'><b>".
         $res[$i]['Title']."</b></a>: ";
    echo $res[$i]['Summary'];

    echo "<br /><br />";
}

?>

Manipulating XML using SimpleXML


The SimpleXML extension is a tool of choice for manipulating an XML document, provided that
the XML document isn't too complicated or too deep, and contains no mixed content.
SimpleXML is easier to code than the DOM, as its name implies. It is far more intuitive if you
work with a known document structure. The interoperative nature of the libxml2 architecture
greatly increases the flexibility of both the DOM and SimpleXML: import functions let you swap
formats from DOM to SimpleXML and back at will.
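A minimal sketch of that round trip follows, using the simplexml_import_dom and dom_import_simplexml functions on a small inline books document (the sample data is borrowed from the earlier book list and is only illustrative).

```php
<?php
// Start with a DOM document.
$dom = new DOMDocument();
$dom->loadXML('<books><book><title>PHP Hacks</title></book></books>');

// DOM to SimpleXML: convenient property-style access.
$sx = simplexml_import_dom($dom);
$title = (string) $sx->book[0]->title;

// Make a change through SimpleXML.
$sx->book[0]->addChild('publisher', "O'Reilly");

// SimpleXML back to DOM: full DOM API on the same element.
$bookNode = dom_import_simplexml($sx->book[0]);
$publisherCount = $bookNode->getElementsByTagName('publisher')->length;

echo $title, ' has ', $publisherCount, " publisher element(s)\n";
```

A common pattern is to read with SimpleXML and switch to the DOM only for operations SimpleXML does not offer, such as inserting a node at a specific position.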

SimpleXML in action

Documents manipulated with SimpleXML are simple and quick to code. The following code parses
the results of the query using the SimpleXML extension. As you might expect, the SimpleXML
code (see Listing 2) is more compact than the DOM code example shown above in Listing 1.

Listing 2. The Yahoo SimpleXML example

<?php

//This query does a search for any Web pages relevant to "XML Query"
$query = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
         "query=%5C%22XML%20Query%5C%22&appid=YahooDemo";

$xml = simplexml_load_file($query);

// Load up the root element attributes
$res = array();
foreach($xml->attributes() as $name=>$attr) {
    $res[$name]=(string)$attr;
}

//Use one of those "informational" elements to display the total
//number of results for the query.
echo "<p>The query returns ".$res["totalResultsAvailable"].
     " total results. The first 10 are as follows:</p>";

//Unlike with DOM, where we loaded the entire document into the
//result object, with SimpleXML, we get back an object in the
//first place, so we can just use the number of results returned
//to loop through the Result members.
for($i=0; $i<$res['totalResultsReturned']; $i++) {

    //The object represents each piece of data as a member variable
    //rather than an array element, so the syntax is a little bit
    //different from the DOM version.
    $thisResult = $xml->Result[$i];

    echo "<a href='".$thisResult->ClickUrl."'><b>".
         $thisResult->Title."</b></a>: ";
    echo $thisResult->Summary;

    echo "<br /><br />";
}

?>

Listing 3 adds a cache layer to the SimpleXML example from Listing 2. The cache caches the
results of any particular query for two hours.

Listing 3. The Yahoo SimpleXML example with a cache layer

<?php

//This query does a search for any Web pages relevant to "XML Query"
$query = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
         "query=%5C%22XML%20Query%5C%22&appid=YahooDemo";

//The cached material should only last for 2 hours, so you need the
//current time.
$currentTime = microtime(true);

//This is where I put my tempfile; you can store yours in a more
//convenient location.
$cache = 'c:\temp\yws_'.md5($query);

//First check for an existing version of the file, and then check
//to see whether or not it's expired.
if(file_exists($cache) &&
    filemtime($cache) > (time()-7200)) {

    //If there's a valid cache file, load its data.
    $data = file_get_contents($cache);
} else {

    //If there's no valid cache file, grab a live version of the
    //data and save it to a temporary file. Once the file is complete,
    //copy it to a permanent file. (This prevents concurrency issues.)
    $data = file_get_contents($query);
    $tempName = tempnam('c:\temp','YWS');
    file_put_contents($tempName, $data);
    rename($tempName, $cache);
}

//Wherever the data came from, load it into a SimpleXML object.
$xml = simplexml_load_string($data);

//From here, the rest of the file is the same.

// Load up the root element attributes
foreach($xml->attributes() as $name=>$attr) {
    $res[$name]=$attr;
}

...

Manipulating XML using XSL


EXtensible Stylesheet Language (XSL) is a functional XML language that was created for the
task of manipulating XML documents. Using XSL, you can transform an XML document into a
redefined XML document, an XHTML document, an HTML document, or a text document
based on a stylesheet definition, in much the way that CSS works by applying rules. PHP5's
implementation of the W3C standard supports interoperability with the DOM and XPath.

EXtensible Stylesheet Language Transformations (XSLT) stylesheets are themselves XML
documents, and the PHP5 XSL extension that processes them is based on libxml2. XSLT
transforms an XML source tree into an XML or XML-type result tree by applying the series of
rules specified in the stylesheet to the XML data. XSLT can add elements or attributes to the
output file or remove them from it. It allows the developer to sort or rearrange elements and
make decisions about what elements to hide or display. Different stylesheets allow your XML to
be displayed appropriately for different media, such as screen display versus print display.
XSLT uses XPath to navigate through the original XML document.

The XSLT transformation model usually involves a source XML file, an XSLT file containing
one or more processing templates, and an XSLT processor. XSLT documents have to be loaded
using the DOM. PHP5 supports only the libxslt processor.
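
The following is a minimal sketch of that transformation model, not code from the original article. The recipe data, the inline stylesheet, and the helper name transformToText() are all assumptions made for illustration. The stylesheet sorts the recipes by name and writes them out as plain text, which touches on the sorting, rearranging, and output-format points made above.

```php
<?php
// Hypothetical inline data and stylesheet, used only for illustration.
$xmlSource = '<recipes><recipe><name>Toast</name></recipe>'
           . '<recipe><name>Apple pie</name></recipe></recipes>';

$xslSource = <<<XSL
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/recipes">
    <xsl:for-each select="recipe">
      <xsl:sort select="name"/>
      <xsl:value-of select="name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Hypothetical helper: loads both inputs through the DOM, as the XSL
// extension requires, and returns the transformation result as a string.
function transformToText(string $xmlSource, string $xslSource): string
{
    $xml = new DOMDocument();
    $xml->loadXML($xmlSource);

    $xsl = new DOMDocument();
    $xsl->loadXML($xslSource);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($xsl);

    return (string) $processor->transformToXml($xml);
}

// The XSL extension is optional in many PHP builds, so check for it first.
if (class_exists('XSLTProcessor')) {
    echo transformToText($xmlSource, $xslSource);
}
```

Keeping the stylesheet in a string is only a convenience for the sketch; in practice it would more likely live in its own .xsl file and be loaded with load().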

XSL in action

An interesting application of XSL is to create XML files on the fly to contain whatever data has
just been selected from the database. Using this technique, it is possible to create complete Web
applications in which the PHP scripts build XML files from database queries and then use
XSL transformations to generate the actual HTML documents.

This method completely splits the presentation layer from the business layer so that you can
maintain either of these layers independently of the other.
Listing 4 illustrates the relationship between the XML input file, the XSL stylesheet, the XSLT
processor, and multiple possible outputs.

Listing 4. XML transformation

<?php

// Create new XSLTProcessor
$xslt = new XSLTProcessor();

//Both the source document and the stylesheet must be
//DOMDocuments, but the result can be a DOMDocument,
//a file, or even a String.

// Load the XSLT stylesheet
$xsl = new DOMDocument();
$xsl->load('recipe.xsl');

// Load the stylesheet into the processor
$xslt->importStylesheet($xsl);

// Load XML input file
$xml = new DOMDocument();
$xml->load('recipe.xml');

//Now choose an output method and transform to it:

// Transform to a string
$results = $xslt->transformToXML($xml);
echo "String version:";
echo htmlentities($results);

// Transform to DOM object
$results = $xslt->transformToDoc($xml);
echo "The root of the DOM Document is ";
echo $results->documentElement->nodeName;

// Transform to a file
$results = $xslt->transformToURI($xml, 'results.txt');

?>


Summary
The earlier parts of this series focused on the use of the Document Object Model and on
SimpleXML to perform both simple and complex parsing tasks. Part 2 also looked at the use of
XMLReader, which provides a faster, easier way to perform tasks that one would previously do
using SAX.

Now, in this article, you saw how to access remote files such as REST-based Web services, and
how to use XSLT to easily output XML data to a string, DOM Document object, or file.

This content is part 2 of 3 in the series: XML for PHP developers

PHP5 offers an improved variety of XML parsing techniques. The SAX parser, originally built on
James Clark's Expat and now based on libxml2, is no longer the only fully functional game in town. Parsing with the
DOM, fully compliant with the W3C standard, is a familiar option. Both SimpleXML, which you
saw in Part 1 (see Related topics), and XMLReader, which is easier and faster than SAX, offer
additional parsing approaches. All the XML extensions are now based on the libxml2 library by
the GNOME project. This unified library allows for interoperability between the different
extensions. This article will cover PHP5's XML parsing techniques, focusing on parsing large or
complex XML documents. It will offer some background about parsing techniques, what method
is best suited to what types of XML documents, and, if you have a choice, what your criteria for
choosing should be.

SimpleXML
Other articles in this series

 XML for PHP developers, Part 1: The 15-minute PHP-with-XML starter


 XML for PHP developers, Part 3: Advanced techniques to read, manipulate, and write XML

Part 1 provided essential information on XML and focused on quick-start Application
Programming Interfaces (APIs). It demonstrated how SimpleXML, combined with the Document
Object Model (DOM) as necessary, is the ideal choice if you work with straightforward,
predictable, and relatively basic XML documents.

XML and PHP5

Extensible Markup Language (XML) is described as both a markup language and a text-based
data storage format; it offers a text-based means to apply and describe a tree-based structure to
information.

In PHP5, there are totally new and rewritten extensions for parsing XML. Those that load the
entire XML document into memory include SimpleXML, the DOM, and the XSLT processor.
Those parsers that provide you with one piece of the XML document at a time include the
Simple API for XML (SAX) and XMLReader. SAX functions the same way it did in PHP4, but
it's not based on the expat library anymore, but on the libxml2 library. If you are familiar with
the DOM from other languages, you will have an easier time coding with the DOM in PHP5 than
previous versions.

XML parsing fundamentals


The two basic means to parse XML are: tree and stream. Tree style parsing involves loading the
entire XML document into memory. The tree file structure allows for random access to the
document's elements and for editing of the XML. Examples of tree-type parsing include the
DOM and SimpleXML. These share the tree-like structure in different but interoperable formats
in memory. Unlike tree style parsing, stream parsing does not load the entire XML document
into memory. The use of the term stream in this context corresponds closely to the term stream in
streaming audio. What it is doing and why it is doing it is exactly the same, namely delivering a
small amount of data at a time to preserve both bandwidth and memory. In stream parsing, only
the node currently being parsed is accessible, and editing the XML, as a document, is not
possible. Examples of stream parsers include XMLReader and SAX.
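
To make the contrast concrete, here is a small sketch that reads the same document both ways. The sample blog document and the variable names are assumptions made for illustration; they do not come from the article's listings.

```php
<?php
// Hypothetical sample document, used only for this illustration.
$xmlString = <<<XML
<?xml version="1.0"?>
<blog>
  <entry ID="1"><title>First post</title></entry>
  <entry ID="2"><title>Second post</title></entry>
</blog>
XML;

// Tree style: the whole document is in memory, so any node can be
// reached directly and in any order.
$tree = simplexml_load_string($xmlString);
$treeTitle = (string) $tree->entry[1]->title;

// Stream style: nodes arrive one at a time, in document order, and only
// the current node is available.
$reader = new XMLReader();
$reader->XML($xmlString);
$streamTitles = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT
        && $reader->localName === 'title') {
        $streamTitles[] = $reader->readString();
    }
}
$reader->close();

echo $treeTitle, "\n";
echo implode(', ', $streamTitles), "\n";
```

With the tree, the second entry is addressed directly; with the stream, the code has to pass over every node before it and keep whatever it wants to remember.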

Tree-based parsers
Tree-based parsers are so named because they load the entire XML document into memory with
the document root being the trunk, and all children, grandchildren, subsequent generations, and
attributes being the branches. The most familiar tree-based parser is the DOM. The easiest tree-
based parser to code is SimpleXML. You will look at both.

Parsing with the DOM

The DOM standard, according to the W3C, is "... a platform and language neutral interface that will
allow programs and scripts to dynamically access and update the content, structure and style of
documents." The libxml2 library by the GNOME project implements the DOM, along with all its
methods, in C. Since all of the PHP5 XML extensions are based on libxml2, there is complete
interoperability between the extensions. This interoperability greatly enhances their
functionality. You can, for instance, use XMLReader, a stream parser, to get an element, import
it into the DOM and extract data using XPath. That is a lot of flexibility. You'll see this in Listing
5.

The DOM is a tree-based parser. The DOM is easy to understand and utilize since its structure in
memory resembles the original XML document. DOM passes on information to the application
by creating a tree of objects that duplicates exactly the tree of elements from the XML file, with
every XML element being a node in the tree. DOM is a W3C standard, which gives the DOM a
lot of authority with developers due to its consistency with other programming languages.
Because the DOM builds a tree of the entire document, it uses a lot of memory and processor
time.

The DOM in action

If you're forced by your design or any other constraint to be a one trick pony in the area of
parsers, this is where you want to be due to flexibility alone. With the DOM, you can build,
modify, query, validate and transform XML Documents. You can use all DOM methods and
properties. Most DOM level 2 methods are implemented with properties properly supported.
Documents parsed with the DOM can be as complex as they come because of its tremendous
flexibility. Remember, however, that flexibility comes at a price if you load a large XML
document into memory all at once.

The example in Listing 1 uses the DOM to parse the document and retrieves an element with
getElementById. It's necessary to validate the document by setting validateOnParse=true
before referring to the ID. According to the DOM standard, this requires a DTD which defines
the attribute ID to be of type ID.

Listing 1. Using the DOM with a basic document


<?php

$doc = new DOMDocument;

// We must validate the document before referring to the id
$doc->validateOnParse = true;
$doc->load('basic.xml');

echo "The element whose id is myelement is: " .
     $doc->getElementById('myelement')->tagName . "\n";

?>

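
The article does not show the contents of basic.xml. As a sketch of what such a document needs, the example below uses a made-up document whose internal DTD subset declares the id attribute to be of type ID; the element names and values are assumptions.

```php
<?php
// Hypothetical stand-in for basic.xml. The internal DTD subset declares
// the "id" attribute of <item> to be of type ID.
$source = <<<XML
<?xml version="1.0"?>
<!DOCTYPE root [
  <!ELEMENT root (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item id ID #REQUIRED>
]>
<root>
  <item id="myelement">Hello</item>
  <item id="other">World</item>
</root>
XML;

$doc = new DOMDocument();
$doc->validateOnParse = true;
$doc->loadXML($source);

// getElementById() returns null when no element carries that ID.
$element = $doc->getElementById('myelement');
echo $element === null ? "not found\n" : $element->tagName . "\n";
```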

The getElementsByTagName() function returns a new instance of class DOMNodeList
containing the elements with a given tag name. The list, of course, has to be walked through.
Altering the document structure while iterating the NodeList returned by
getElementsByTagName() affects the NodeList you are iterating (see Listing 2). There is no
validation requirement.

Listing 2. DOM getElementsByTagName method

DOMDocument {
    DOMNodeList getElementsByTagName(string name);
}
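
Because the returned list is live, one way to alter the document safely while walking it is to copy the list into a plain array first. The following sketch assumes a small made-up document; copying with iterator_to_array() is one possible approach, not something the article prescribes.

```php
<?php
// Hypothetical document, used only for this illustration.
$doc = new DOMDocument();
$doc->loadXML(
    '<list><item>a</item><item>b</item><item>c</item><item>d</item></list>'
);

$items = $doc->getElementsByTagName('item');

// Copy the live list into an array so that removing nodes does not
// shift the list while it is being walked.
foreach (iterator_to_array($items) as $item) {
    if ($item->nodeValue === 'b' || $item->nodeValue === 'c') {
        $item->parentNode->removeChild($item);
    }
}

// The live list reflects the removals without being queried again.
$remaining = $items->length;
echo $remaining, "\n";
echo $doc->saveXML($doc->documentElement), "\n";
```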

The example in Listing 3 uses the DOM with XPath.

Listing 3. Using the DOM and parsing with XPath

<?php

$doc = new DOMDocument;

// We don't want to bother with white spaces
$doc->preserveWhiteSpace = false;

$doc->load('book.xml');

$xpath = new DOMXPath($doc);

// We start from the root element
$query = '//book/chapter/para/informaltable/tgroup/tbody/row/entry[. = "en"]';

$entries = $xpath->query($query);

foreach ($entries as $entry) {
    echo "Found {$entry->previousSibling->previousSibling->nodeValue}," .
         " by {$entry->previousSibling->nodeValue}\n";
}
?>


Having said all of those nice things about the DOM, I'm going to wind up with an example of
what not to do with the DOM just to make the point as strongly as possible, and, then, in the next
example, how to save yourself. Listing 4 illustrates loading a large file into the DOM simply to
extract the data from a single attribute with DomXpath.

Listing 4. Using the DOM with XPath the wrong way, on a large XML document

<?php

// Parsing a Large Document with DOM and DomXpath

// First create a new DOM document to parse
$dom = new DOMDocument();

// This document is huge and we don't really need anything from the tree
// This huge document uses a huge amount of memory
$dom->load("tooBig.xml");
$xp = new DOMXPath($dom);
$result = $xp->query("/blog/entries/entry[@ID = 5225]/title");
print $result->item(0)->nodeValue ."\n";

?>


This final, follow-up example in Listing 5 uses the DOM with XPath in the same way, except the
data is passed one element at a time by XMLReader using expand(). With this method, you can
convert a node passed by XMLReader to a DOMElement.

Listing 5. Using the DOM with XPath the right way, on a large XML document

<?php

// Parsing a large document with XMLReader with Expand - DOM/DOMXpath
$reader = new XMLReader();

$reader->open("tooBig.xml");

while ($reader->read()) {
    switch ($reader->nodeType) {
        case (XMLReader::ELEMENT):
            if ($reader->localName == "entry") {
                if ($reader->getAttribute("ID") == 5225) {
                    $node = $reader->expand();
                    $dom = new DOMDocument();
                    $n = $dom->importNode($node,true);
                    $dom->appendChild($n);
                    $xp = new DOMXPath($dom);
                    $res = $xp->query("/entry/title");
                    echo $res->item(0)->nodeValue;
                }
            }
    }
}

?>


Parsing with SimpleXML

The SimpleXML extension is another choice for parsing an XML document. The SimpleXML
extension requires PHP5 and includes built-in XPath support. SimpleXML works best with
uncomplicated, basic XML data. Provided that the XML document isn't too complicated, too
deep, and lacks mixed content, SimpleXML is simpler to use than the DOM, as its name implies.
It is more intuitive if you are working with a known document structure.
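
As a sketch of the built-in XPath support, the example below runs a query through SimpleXMLElement::xpath(). The book data follows the sample catalog used in this article's listings; the XPath expression and the variable names are assumptions chosen for illustration.

```php
<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<books>
 <book>
  <title>Great American Novel</title>
  <success type="bestseller">4</success>
  <success type="bookclubs">9</success>
 </book>
 <book>
  <title>Man Bites Dog</title>
  <success type="bestseller">22</success>
  <success type="bookclubs">3</success>
 </book>
</books>
XML;

$xml = new SimpleXMLElement($xmlstr);

// xpath() returns an array of SimpleXMLElement objects. This query
// selects the titles of books with more than 10 months as a bestseller.
$titles = [];
$query = '/books/book[success[@type="bestseller"] > 10]/title';
foreach ($xml->xpath($query) as $title) {
    $titles[] = (string) $title;
}

echo implode(', ', $titles), "\n";
```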

SimpleXML in action

SimpleXML shares many of the advantages of the DOM and is more easily coded. It allows easy
access to an XML tree, has built-in validation and XPath support, and is interoperable with the
DOM, giving it read and write support for XML documents. You can code documents parsed
with SimpleXML simply and quickly. Remember however, that, like the DOM, SimpleXML
comes with a price for its ease and flexibility if you load a large XML document into memory.
The following code in Listing 6 extracts <plot> from the example XML.

Listing 6. Extracting the plot text

<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<books>
 <book>
  <title>Great American Novel</title>
  <plot>
   Cliff meets Lovely Woman. Loyal Dog sleeps, but
   wakes up to bark at mailman.
  </plot>
  <success type="bestseller">4</success>
  <success type="bookclubs">9</success>
 </book>
</books>
XML;
?>
<?php

$xml = new SimpleXMLElement($xmlstr);
echo $xml->book[0]->plot; // "Cliff meets Lovely Woman. ..."
?>


On the other hand, you might want to extract a multi-line address. When multiple instances of an
element exist as children of a single parent element, normal iteration techniques apply. The
following code in Listing 7 demonstrates this functionality.
Listing 7. Extracting multiple instances of an element

<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<books>
 <book>
  <title>Great American Novel</title>
  <plot>
   Cliff meets Lovely Woman.
  </plot>
  <success type="bestseller">4</success>
  <success type="bookclubs">9</success>
 </book>
 <book>
  <title>Man Bites Dog</title>
  <plot>
   Reporter invents a prize-winning story.
  </plot>
  <success type="bestseller">22</success>
  <success type="bookclubs">3</success>
 </book>
</books>
XML;
?>
<?php

$xml = new SimpleXMLElement($xmlstr);

foreach ($xml->book as $book) {
    echo $book->plot, '<br />';
}
?>


In addition to reading element names and their values, SimpleXML can also access element
attributes. In the code shown in Listing 8, you access attributes of an element just as you would
elements of an array.

Listing 8. Demonstrating SimpleXML accessing the attributes of an element

<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<books>
 <book>
  <title>Great American Novel</title>
  <plot>
   Cliff meets Lovely Woman.
  </plot>
  <success type="bestseller">4</success>
  <success type="bookclubs">9</success>
 </book>
 <book>
  <title>Man Bites Dog</title>
  <plot>
   Reporter invents a prize-winning story.
  </plot>
  <success type="bestseller">22</success>
  <success type="bookclubs">3</success>
 </book>
</books>
XML;
?>
<?php

$xml = new SimpleXMLElement($xmlstr);

foreach ($xml->book[0]->success as $success) {
    switch((string) $success['type']) {
        case 'bestseller':
            echo $success, ' months on bestseller list<br />';
            break;
        case 'bookclubs':
            echo $success, ' bookclub listings<br />';
            break;
    }
}

?>

This final example (see Listing 9) uses SimpleXML and the DOM with XMLReader. With
XMLReader, the data is passed one element at a time using expand(). With this method, you can
convert a node passed by XMLReader to a DOMElement, and then to SimpleXML.

Listing 9. Using SimpleXML with the DOM and XMLReader to parse a large XML document

<?php

// Parsing a large document with Expand and SimpleXML
$reader = new XMLReader();

$reader->open("tooBig.xml");

while ($reader->read()) {
    switch ($reader->nodeType) {
        case (XMLReader::ELEMENT):
            if ($reader->localName == "entry") {
                if ($reader->getAttribute("ID") == 5225) {
                    $node = $reader->expand();
                    $dom = new DOMDocument();
                    $n = $dom->importNode($node,true);
                    $dom->appendChild($n);
                    $sxe = simplexml_import_dom($n);
                    echo $sxe->title;
                }
            }
    }
}

?>


Stream-based parsers
Stream-based parsers are so named because they parse the XML in a stream with much the same
rationale as streaming audio, working with a particular node, and, when they are finished with
that node, entirely forgetting its existence. XMLReader is a pull parser and you code for it in
much the same way as for a database query result table in a cursor. This makes it easier to work
with unfamiliar or unpredictable XML files.

Parsing with XMLReader

The XMLReader extension is a stream-based parser of the type often referred to as a cursor type
or pull parser. XMLReader pulls information from the XML document on request. It is based on
the API derived from C# XmlTextReader. It is included and enabled in PHP 5.1 by default and is
based on libxml2. Before PHP 5.1, the XMLReader extension was not enabled by default but
was available at PECL (see Related topics for a link). XMLReader supports namespaces and
validation, including DTD and RELAX NG.

XMLReader in action

XMLReader, as a stream parser, is well-suited to parsing large XML documents; it is a lot easier
to code than SAX and usually faster. This is your stream parser of choice.

This example in Listing 10 parses a large XML document with XMLReader.

Listing 10. XMLReader with a large XML file

<?php

$reader = new XMLReader();
$reader->open("tooBig.xml");
while ($reader->read()) {
    switch ($reader->nodeType) {
        case (XMLReader::ELEMENT):
            if ($reader->localName == "entry") {
                if ($reader->getAttribute("ID") == 5225) {
                    while ($reader->read()) {
                        if ($reader->nodeType == XMLReader::ELEMENT) {
                            if ($reader->localName == "title") {
                                $reader->read();
                                echo $reader->value;
                                break;
                            }
                            if ($reader->localName == "entry") {
                                break;
                            }
                        }
                    }
                }
            }
    }
}
?>


Parsing with SAX

The Simple API for XML (SAX) is a stream parser. Events are associated with the XML
document being read, so SAX is coded in callbacks. There are events for element opening and
closing tags, for the content of elements, for entities, and for parsing errors. The primary reason
to use the SAX parser rather than the XMLReader is that the SAX parser is sometimes more
efficient and usually more familiar. A major disadvantage is that SAX parser code is complex
and more difficult to write than XMLReader code.
SAX in action

SAX is likely familiar to those who worked with XML in PHP4, and the SAX extension in PHP5
is compatible with the version they're used to. Since it's a stream parser, it's a good choice for
large files, but not as good a choice as XMLReader.

This example in Listing 11 parses a large XML document with SAX.

Listing 11. Using SAX to parse a large XML file

<?php

//This class contains all the callback methods that will actually
//handle the XML data.
class SaxClass {
    private $hit = false;
    private $titleHit = false;

    //callback for the start of each element
    function startElement($parser_object, $elementname, $attribute) {
        if ($elementname == "entry") {
            if ( $attribute['ID'] == 5225) {
                $this->hit = true;
            } else {
                $this->hit = false;
            }
        }
        if ($this->hit && $elementname == "title") {
            $this->titleHit = true;
        } else {
            $this->titleHit = false;
        }
    }

    //callback for the end of each element
    function endElement($parser_object, $elementname) {
    }

    //callback for the content within an element
    function contentHandler($parser_object,$data)
    {
        if ($this->titleHit) {
            echo trim($data)."<br />";
        }
    }
}

//Function to start the parsing once all values are set and
//the file has been opened
function doParse($parser_object) {
    //Stop if the file cannot be opened
    if (!($fp = fopen("tooBig.xml", "r"))) {
        die("Could not open tooBig.xml");
    }

    //loop through data
    while ($data = fread($fp, 4096)) {
        //parse the fragment
        xml_parse($parser_object, $data, feof($fp));
    }
    fclose($fp);
}

$SaxObject = new SaxClass();
$parser_object = xml_parser_create();
xml_set_object ($parser_object, $SaxObject);

//Don't alter the case of the data
xml_parser_set_option($parser_object, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler($parser_object,"startElement","endElement");
xml_set_character_data_handler($parser_object, "contentHandler");

doParse($parser_object);

?>


Summary
PHP5 offers an improved variety of parsing techniques. Parsing with the DOM, now fully
compliant with the W3C standard, is a familiar option, and is your choice for complex but
relatively small documents. SimpleXML is the way to go for basic and not-too-large XML
documents, and XMLReader, easier and faster than SAX, is the stream parser of choice for large
documents.
Advanced techniques to read, manipulate, and write XML

Add XSLT to DOM and SimpleXML APIs

Cliff Morgan
Published on March 13, 2007


This content is part 3 of 3 in the series: XML for PHP developers

PHP5 offers the developer a lot more muscle to work with XML. New and modified extensions
such as the DOM, SimpleXML, and XSL make working with XML less code intensive. In
PHP5, the DOM is compliant with the W3C standard. Most importantly, the interoperability
among these extensions is significant, providing additional functionality, like swapping formats
to extend usability, W3C's XPath, and more, across the board. Here you will look at input and
output options, and you will depend on the Yahoo Web Services REST protocol interface to
provide a more sophisticated showcase for the functionality of the now familiar DOM and
SimpleXML extensions and conclude with the XSL extension.

Previously in this series


Other articles in this series

 XML for PHP developers, Part 1: The 15-minute PHP-with-XML starter


 XML for PHP developers, Part 2: Advanced XML parsing techniques

The first article of this series provided essential information on XML. It focused on quick start
Application Programming Interfaces (APIs) and demonstrated how SimpleXML, when
combined with the Document Object Model (DOM) as necessary, is the ideal choice if you
work with straightforward, predictable, and relatively basic XML documents. Part 2 looked at
the breadth of parsing APIs available for XML in PHP5, including SimpleXML, the DOM,
Simple API for XML (SAX), and XMLReader and considered which parsing techniques were
most appropriate for different sizes and complexities of XML documents.

XML in PHP5

Extensible Markup Language (XML), described as both a markup language and a text-based data
storage format, offers a text-based means to apply and describe a tree-based structure to
information. Here you'll look at XML in the context of Web services, probably one of the most
important factors driving the recent growth of XML outside the enterprise world.

In PHP5, there are totally new and entirely rewritten extensions for manipulating XML, all based
on the same libxml2 code. This common base provides interoperability between these extensions
that extends the functionality of each. The tree-based parsers include SimpleXML, the DOM,
and the XSLT processor. If you are familiar with the DOM from other languages, you will have
an easier time coding with similar functionality in PHP than before. The stream-based parsers
include the Simple API for XML (SAX) and XMLReader. SAX functions the same way it did in
PHP4.

Manipulating XML using the DOM


You can use the DOM to manipulate an XML file. Using the DOM is efficient only when the XML file is
relatively small. The advantages to using this method are the solid standard of the familiar W3C
DOM, its methods, and the flexibility it brings to coding. The disadvantages of the DOM are the
difficulty in coding and performance issues with large documents.

The DOM in action


With the DOM, you can build, modify, query, validate and transform XML documents. All
DOM methods and properties can be used, and most DOM level 2 methods are implemented
with properties properly supported. Documents parsed with the DOM can be as complex as they
come thanks to its tremendous flexibility. Remember however, that flexibility comes at a price if
you load a large XML document into memory all at once.

The examples in this article use Yahoo's search API, PHP5, and REpresentational State Transfer
(REST) to illustrate the use of the DOM in an interesting application environment. Yahoo chose
REST because of a common belief among developers that REST offers 80% of SOAP's benefits
at 20% of the cost. I chose this application to showcase PHP/XML because the popularity of
Web services is probably one of the most important factors driving the recent growth of XML
outside the enterprise world.

Typically, REST forms a request by beginning with a service entry URL and then appending
search parameters in the form of a query string. Then Listing 1 parses the results of the query
using the DOM extension.
Listing 1. The Yahoo Demo code sample using the DOM

<?php

//This query does a search for any Web pages relevant to "XML Query"
$query = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
         "query=%5C%22XML%20Query%5C%22&appid=YahooDemo";

//Create the DOM Document object from the XML returned by the query
$xml = file_get_contents($query);
$dom = new DOMDocument;
$dom->loadXML($xml);

function xml_to_result($dom) {

    //This function takes the XML document and maps it to a
    //PHP object so that you can manipulate it later.

    //First, retrieve the root element for the document
    $root = $dom->firstChild;

    //Next, loop through each of its attributes
    foreach($root->attributes as $attr) {
        $res[$attr->name] = $attr->value;
    }

    //Now, loop through each of the children of the root element
    //and treat each appropriately.

    //Start with the first child node. (The counter, i, is for
    //tracking results.)
    $node = $root->firstChild;
    $i = 0;

    //Now keep looping through as long as there is a node to work
    //with. (At the bottom of the loop, the code moves to the next
    //sibling, so when it runs out of siblings, the routine stops.)
    while($node) {

        //For each node, check to see whether it's a Result element or
        //one of the informational elements at the start of the document.
        switch($node->nodeName) {

            //Result elements need more analysis.
            case 'Result':
                //Add each child node of the Result to the result object,
                //again starting with the first child.
                $subnode = $node->firstChild;
                while($subnode) {

                    //Some of these nodes are just whitespace, which does
                    //not have children.
                    if ($subnode->hasChildNodes()){

                        //If it does have children, get a NodeList of them, and
                        //loop through it.
                        $subnodes = $subnode->childNodes;
                        foreach($subnodes as $n) {

                            //Again check for children, adding them directly or
                            //indirectly as appropriate.
                            if($n->hasChildNodes()) {
                                foreach($n->childNodes as $cn){
                                    $res[$i][$subnode->nodeName][$n->nodeName]=
                                        trim($cn->nodeValue);
                                }
                            } else {
                                $res[$i][$subnode->nodeName]=trim($n->nodeValue);
                            }
                        }
                    }
                    //Move on to the next subnode.
                    $subnode = $subnode->nextSibling;
                }
                $i++;
                break;
            //Other elements are just added to the result object.
            default:
                $res[$node->nodeName] = trim($node->nodeValue);
                break;
        }

        //Move on to the next Result or informational element
        $node = $node->nextSibling;
    }
    return $res;
}

//First, convert the DOM document to a result array you can manipulate.
$res = xml_to_result($dom);

//Use one of those "informational" elements to display the total
//number of results for the query.
echo "<p>The query returns ".$res["totalResultsAvailable"].
     " total results. The first 10 are as follows:</p>";

//Now loop through each of the actual results.
for($i=0; $i<$res['totalResultsReturned']; $i++) {

    echo "<a href='".$res[$i]['ClickUrl']."'><b>".
         $res[$i]['Title']."</b></a>: ";
    echo $res[$i]['Summary'];

    echo "<br /><br />";
}

?>

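
The query URL in Listing 1 is written out by hand with its parameters already percent-encoded. As a sketch of the "service entry URL plus query string" pattern described above, the same URL can be assembled with PHP's http_build_query(). The variable names $serviceUrl and $params are assumptions made for illustration, and the sketch only builds the URL; it does not send the request.

```php
<?php
// Service entry URL and parameters, taken from Listing 1.
$serviceUrl = 'http://api.search.yahoo.com/WebSearchService/V1/webSearch';
$params = [
    // In a single-quoted string, \" is a literal backslash plus a quote,
    // which encodes to the %5C%22 sequence seen in the listing.
    'query' => '\"XML Query\"',
    'appid' => 'YahooDemo',
];

// PHP_QUERY_RFC3986 encodes spaces as %20 rather than "+".
$query = $serviceUrl . '?'
       . http_build_query($params, '', '&', PHP_QUERY_RFC3986);

echo $query, "\n";
```

Building the query string this way leaves the encoding to the library, so a search term can change without anyone re-encoding it by hand.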

Manipulating XML using SimpleXML


The SimpleXML extension is a tool of choice for manipulating an XML document, provided that
the XML document isn't too complicated or too deep, and contains no mixed content.
SimpleXML is easier to code than the DOM, as its name implies. It is far more intuitive if you
work with a known document structure. Because the DOM and SimpleXML are both built on
libxml2, you can import a document from one format into the other and back at will, which
greatly increases the flexibility of both.
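
A short sketch of that round trip follows. It uses simplexml_import_dom() and dom_import_simplexml(); the sample document and the id attribute value are assumptions made for illustration.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<books><book><title>Great American Novel</title></book></books>'
);

// DOM to SimpleXML: both objects work on the same underlying nodes,
// so a change made through one is visible through the other.
$sxe = simplexml_import_dom($dom);
$sxe->book[0]->addChild('plot', 'Cliff meets Lovely Woman.');

// SimpleXML to DOM: returns a DOMElement for the same <book> node.
$bookNode = dom_import_simplexml($sxe->book[0]);
$bookNode->setAttribute('id', 'b1');

$output = $dom->saveXML($dom->documentElement);
echo $output, "\n";
```

Neither import copies the document, which is why the element added through SimpleXML and the attribute set through the DOM both show up in the final output.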

SimpleXML in action

Documents manipulated with SimpleXML are simple and quick to code. The following code parses
the results of the query using the SimpleXML extension. As you might expect, the following
SimpleXML code (see Listing 2) is more compact than the DOM code example shown above in
Listing 1.

Listing 2. The Yahoo SimpleXML example

<?php

//This query does a search for any Web pages relevant to "XML Query"
$query = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
         "query=%5C%22XML%20Query%5C%22&appid=YahooDemo";

$xml = simplexml_load_file($query);

// Load up the root element attributes
foreach($xml->attributes() as $name=>$attr) {
    $res[$name]=$attr;
}

//Use one of those "informational" elements to display the total
//number of results for the query.
echo "<p>The query returns ".$res["totalResultsAvailable"].
     " total results. The first 10 are as follows:</p>";

//Unlike with DOM, where we loaded the entire document into the
//result object, with SimpleXML, we get back an object in the
//first place, so we can just use the number of results returned
//to loop through the Result members.

for($i=0; $i<$res['totalResultsReturned']; $i++) {

    //The object represents each piece of data as a member variable
    //rather than an array element, so the syntax is a little bit
    //different from the DOM version.

    $thisResult = $xml->Result[$i];

    echo "<a href='".$thisResult->ClickUrl."'><b>".
         $thisResult->Title."</b></a>: ";
    echo $thisResult->Summary;

    echo "<br /><br />";
}

?>


Listing 3 adds a cache layer to the SimpleXML example from Listing 2. The cache stores the
results of any particular query for two hours.

Listing 3. The Yahoo SimpleXML example with a cache layer

<?php

//This query does a search for any Web pages relevant to "XML Query"
$query = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
         "query=%5C%22XML%20Query%5C%22&appid=YahooDemo";

//The cached material should only last for 2 hours, so you need the
//current time.
$currentTime = microtime(true);

//This is where I put my tempfile; you can store yours in a more
//convenient location.
$cache = 'c:\temp\yws_'.md5($query);

//First check for an existing version of the file, and then check
//to see whether or not it's expired.
if(file_exists($cache) &&
   filemtime($cache) > (time()-7200)) {

    //If there's a valid cache file, load its data.
    $data = file_get_contents($cache);
} else {

    //If there's no valid cache file, grab a live version of the
    //data and save it to a temporary file. Once the file is complete,
    //rename it to the permanent file. (This prevents concurrency issues.)
    $data = file_get_contents($query);
    $tempName = tempnam('c:\temp','YWS');
    file_put_contents($tempName, $data);
    rename($tempName, $cache);
}

//Wherever the data came from, load it into a SimpleXML object.
$xml = simplexml_load_string($data);

//From here, the rest of the file is the same.

// Load up the root element attributes
foreach($xml->attributes() as $name=>$attr) {
    $res[$name]=$attr;
}

...


Manipulating XML using XSL


EXtensible Stylesheet Language (XSL) is a functional XML language that was created for the
task of manipulating XML documents. Using XSL, you can transform an XML document into a
redefined XML document, an XHTML document, an HTML document, or a text document
based on a stylesheet definition similar to the way CSS works by implementing rules. PHP5's
implementation of the W3C standard supports interoperability with the DOM and XPath.
EXtensible Stylesheet Language Transformations (XSLT) is an XML extension based on
libxml2, and its stylesheets are XML documents. XSLT transforms an XML source tree into an
XML or XML-type result tree. These transformations apply the series of rules specified in the
stylesheet to the XML data. XSLT can add or remove elements or attributes to or from the output
file. It allows the developer to sort or rearrange elements and make decisions about what
elements to hide or display. Different stylesheets allow for your XML to be displayed
appropriately for different media, such as screen display versus print display. XSLT uses XPath
to navigate through the original XML document. The XSLT transformation model usually
involves a source XML file, an XSLT file containing one or more processing templates, and an
XSLT processor. XSLT documents have to be loaded using the DOM. PHP5 supports only the
libxslt processor.

XSL in action

An interesting application of XSL is to create XML files on the fly to contain whatever data has
just been selected from the database. Using this technique, it is possible to create complete Web
applications where the PHP scripts are made up of XML files from database queries, then use
XSL transformations to generate the actual HTML documents.

This method completely splits the presentation layer from the business layer so that you can
maintain either of these layers independently of the other.

Listing 4 illustrates the relationship between the XML input file, the XSL stylesheet, the XSLT
processor, and multiple possible outputs.

Listing 4. XML transformation

1 <?php

3 // Create new XSLTProcessor

4 $xslt = new XSLTProcessor();


5

6 //Both the source document and the stylesheet must be

7 //DOMDocuments, but the result can be a DOMDocument,


//a file, or even a String.
8

9
// Load the XSLT stylesheet
10
$xsl = new DOMDocument();
11
$xsl->load('recipe.xsl');
12

13
// Load the stylesheet into the processor
14
$xslt->importStylesheet($xsl);
15

16// Load XML input file

17$xml = new DOMDocument();

18$xml->load('recipe.xml');

19

20//Now choose an output method and transform to it:

21

22// Transform to a string


$results = $xslt->transformToXML($xml);
23
echo "String version:";
24
echo htmlentities($results);
25

26
// Transform to DOM object
27
$results = $xslt->transformToDoc($xml);
28
echo "The root of the DOM Document is ";
29echo $results->documentElement->nodeName;
30

31// Transform to a file

32$results = $xslt->transformToURI($xml, 'results.txt');

33
?>
34

35

36

Summary
The earlier parts of this series focused on the use of the Document Object Model and on
SimpleXML to perform both simple and complex parsing tasks. Part 2 also looked at the use of
XMLReader, which provides a faster easier way to perform tasks that one would previously do
using SAX.

Now, in this article, you saw how to access remote files such as REST-based Web services, and
how to use XSLT to easily output XML data to a string, DOM Document object, or file

Advanced techniques to read, manipulate, and write XML

Add XSLT to DOM and SimpleXML APIs

Cliff Morgan
Published on March 13, 2007

Content series: This content is part 3 of 3 in the series: XML for PHP developers

PHP5 offers the developer a lot more muscle to work with XML. New and modified extensions
such as the DOM, SimpleXML, and XSL make working with XML less code intensive. In
PHP5, the DOM is compliant with the W3C standard. Most importantly, the interoperability
among these extensions is significant, providing additional functionality, like swapping formats
to extend usability, W3C's XPath, and more, across the board. Here you will look at input and
output options, and you will depend on the Yahoo Web Services REST protocol interface to
provide a more sophisticated showcase for the functionality of the now familiar DOM and
SimpleXML extensions and conclude with the XSL extension.

Previously in this series


Other articles in this series

 XML for PHP developers, Part 1: The 15-minute PHP-with-XML starter


 XML for PHP developers, Part 2: Advanced XML parsing techniques

The first article of this series provided essential information on XML. It focused on quick start
Application Programming Interfaces (APIs) and demonstrated how SimpleXML, when
combined with the Document Object Model (DOM) as necessary, is the ideal choice if you
work with straightforward, predictable, and relatively basic XML documents. Part 2 looked at
the breadth of parsing APIs available for XML in PHP5, including SimpleXML, the DOM,
Simple API for XML (SAX), and XMLReader and considered which parsing techniques were
most appropriate for different sizes and complexities of XML documents.

XML in PHP5

Extensible Markup Language (XML), described as both a markup language and a text-based data
storage format, offers a text-based means to apply and describe a tree-based structure to
information. Here you'll look at XML in the context of Web services, probably one of the most
important factors driving the recent growth of XML outside the enterprise world.

In PHP5, there are totally new and entirely rewritten extensions for manipulating XML, all based
on the same libxml2 code. This common base provides interoperability between these extensions
that extends the functionality of each. The tree-based parsers include SimpleXML, the DOM,
and the XSLT processor. If you are familiar with the DOM from other languages, you will have
an easier time coding with similar functionality in PHP than before. The stream-based parsers
include the Simple API for XML (SAX) and XMLReader. SAX functions the same way it did in
PHP4.

Manipulating XML using the DOM

You can use the DOM to manipulate an XML file. Using the DOM is efficient only when the XML file is
relatively small. The advantages to using this method are the solid standard of the familiar W3C
DOM, its methods, and the flexibility it brings to coding. The disadvantages of the DOM are the
difficulty in coding and performance issues with large documents.

The DOM in action


With the DOM, you can build, modify, query, validate and transform XML documents. All
DOM methods and properties can be used, and most DOM level 2 methods are implemented
with properties properly supported. Documents parsed with the DOM can be as complex as they
come thanks to its tremendous flexibility. Remember however, that flexibility comes at a price if
you load a large XML document into memory all at once.
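
As a minimal sketch of the build, modify, and query operations mentioned above, the following code works on a small in-memory document. The element and attribute names (employees, record, man_no) are illustrative assumptions, not part of the Yahoo example used later.

```php
<?php
// Build a small document in memory (the element names are illustrative).
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;

$root = $dom->createElement('employees');
$root->setAttribute('status', 'ok');
$dom->appendChild($root);

foreach (array('101' => 'Joe Paul', '102' => 'Tasha Smith') as $id => $name) {
    $record = $dom->createElement('record');
    $record->setAttribute('man_no', (string) $id);
    $record->appendChild($dom->createElement('name', $name));
    $root->appendChild($record);
}

// Modify: add a position element to the first record.
$first = $root->getElementsByTagName('record')->item(0);
$first->appendChild($dom->createElement('position', 'CEO'));

// Query: use XPath to find a name by attribute value.
$xpath = new DOMXPath($dom);
$found = $xpath->evaluate('string(//record[@man_no="102"]/name)');

echo $found, "\n";
echo $dom->saveXML();
```

Because the whole tree lives in memory, each of these steps is cheap for a document of this size. The same approach becomes expensive when the document is very large.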

The examples in this article use Yahoo's search API, PHP5, and REpresentational State Transfer
(REST) to illustrate the use of the DOM in an interesting application environment. Yahoo chose
REST because of a common belief among developers that REST offers 80% of SOAP's benefits
at 20% of the cost. I chose this application to showcase PHP/XML because the popularity of
Web services is probably one of the most important factors driving the recent growth of XML
outside the enterprise world.

Typically, REST forms a request by beginning with a service entry URL and then appending
search parameters in the form of a query string. Then Listing 1 parses the results of the query
using the DOM extension.
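
The request-building step can be sketched on its own, without sending anything over the network. This sketch assumes the service entry URL and the two parameter names that appear in the listings. It lets http_build_query() do the URL encoding, so it encodes plain double quotes (%22), whereas the hard-coded string in the listings also contains %5C, an encoded backslash.

```php
<?php
// Service entry URL, taken from the listings in this article.
$endpoint = 'http://api.search.yahoo.com/WebSearchService/V1/webSearch';

// Search parameters; http_build_query() URL-encodes the values.
$params = array(
    'query' => '"XML Query"',
    'appid' => 'YahooDemo',
);

// PHP_QUERY_RFC3986 encodes spaces as %20 rather than "+".
$query = $endpoint . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);

echo $query, "\n";
```

Keeping the parameters in an array makes it easier to add or change search options than editing a pre-encoded string by hand.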

Listing 1. The Yahoo Demo code sample using the DOM

<?php

//This query does a search for any Web pages relevant to "XML Query"
$query = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
         "query=%5C%22XML%20Query%5C%22&appid=YahooDemo";

//Create the DOM Document object from the XML returned by the query.
//loadXML() is an instance method, so call it on the new object.
$xml = file_get_contents($query);
$dom = new DOMDocument;
$dom->loadXML($xml);

function xml_to_result($dom) {

    //This function takes the XML document and maps it to a
    //PHP array so that you can manipulate it later.
    $res = array();

    //First, retrieve the root element for the document
    $root = $dom->documentElement;

    //Next, loop through each of its attributes
    foreach($root->attributes as $attr) {
        $res[$attr->name] = $attr->value;
    }

    //Now, loop through each of the children of the root element
    //and treat each appropriately.

    //Start with the first child node. (The counter, $i, is for
    //tracking results.)
    $node = $root->firstChild;
    $i = 0;

    //Now keep looping through as long as there is a node to work
    //with. (At the bottom of the loop, the code moves to the next
    //sibling, so when it runs out of siblings, the routine stops.)
    while($node) {

        //For each node, check to see whether it's a Result element or
        //one of the informational elements at the start of the document.
        switch($node->nodeName) {

            //Result elements need more analysis.
            case 'Result':
                //Add each child node of the Result to the result array,
                //again starting with the first child.
                $subnode = $node->firstChild;
                while($subnode) {

                    //Some of these nodes are just whitespace, which does
                    //not have children.
                    if ($subnode->hasChildNodes()){

                        //If it does have children, get a NodeList of them, and
                        //loop through it.
                        $subnodes = $subnode->childNodes;
                        foreach($subnodes as $n) {

                            //Again check for children, adding them directly or
                            //indirectly as appropriate.
                            if($n->hasChildNodes()) {
                                foreach($n->childNodes as $cn){
                                    $res[$i][$subnode->nodeName][$n->nodeName]=
                                        trim($cn->nodeValue);
                                }
                            } else {
                                $res[$i][$subnode->nodeName]=trim($n->nodeValue);
                            }
                        }
                    }
                    //Move on to the next subnode.
                    $subnode = $subnode->nextSibling;
                }
                $i++;
                break;

            //Other elements are just added to the result array.
            default:
                $res[$node->nodeName] = trim($node->nodeValue);
                break;
        }

        //Move on to the next Result or informational element
        $node = $node->nextSibling;
    }
    return $res;
}

//First, convert the DOM document to a PHP array you can manipulate.
$res = xml_to_result($dom);

//Use one of those "informational" elements to display the total
//number of results for the query.
echo "<p>The query returns ".$res["totalResultsAvailable"].
     " total results. The first 10 are as follows:</p>";

//Now loop through each of the actual results.
for($i=0; $i<$res['totalResultsReturned']; $i++) {

    echo "<a href='".$res[$i]['ClickUrl']."'><b>".
         $res[$i]['Title']."</b></a>: ";
    echo $res[$i]['Summary'];

    echo "<br /><br />";
}

?>

Manipulating XML using SimpleXML


The SimpleXML extension is a tool of choice for manipulating an XML document, provided that
the XML document isn't too complicated or too deep, and contains no mixed content.
SimpleXML is easier to code than the DOM, as its name implies. It is far more intuitive if you
work with a known document structure. The interoperable nature of the libxml2 architecture
greatly increases the flexibility of both the DOM and SimpleXML: import functions let you swap
formats from DOM to SimpleXML and back at will.
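
The format swap can be sketched with the two import functions PHP provides for it, simplexml_import_dom() and dom_import_simplexml(). The inline document and the checked attribute below are illustrative assumptions.

```php
<?php
// A small inline document; the element names are illustrative only.
$source = '<ResultSet totalResultsReturned="1">'
        . '<Result><Title>XML Query</Title></Result>'
        . '</ResultSet>';

// Start in the DOM...
$dom = new DOMDocument();
$dom->loadXML($source);

// ...switch to SimpleXML for compact read access...
$sxe = simplexml_import_dom($dom);
$title = (string) $sxe->Result[0]->Title;

// ...and go back to a DOM node for DOM-only operations.
$node = dom_import_simplexml($sxe->Result[0]);
$node->setAttribute('checked', 'yes');

echo $title, "\n";
echo $dom->saveXML();
```

Both views refer to the same underlying tree, so the attribute set through the DOM node also shows up when the document is serialized.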

SimpleXML in action

Documents manipulated with SimpleXML are simple and quick to code. The following code parses
the results of the query using the SimpleXML extension. As you might expect, the following
SimpleXML code (see Listing 2) is more compact than the DOM code example shown above in
Listing 1.

Listing 2. The Yahoo SimpleXML example

<?php

//This query does a search for any Web pages relevant to "XML Query"
$query = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
         "query=%5C%22XML%20Query%5C%22&appid=YahooDemo";

$xml = simplexml_load_file($query);

// Load up the root element attributes, cast to plain strings
$res = array();
foreach($xml->attributes() as $name=>$attr) {
    $res[$name]=(string)$attr;
}

//Use one of those "informational" elements to display the total
//number of results for the query.
echo "<p>The query returns ".$res["totalResultsAvailable"].
     " total results. The first 10 are as follows:</p>";

//Unlike with DOM, where we loaded the entire document into the
//result array, with SimpleXML, we get back an object in the
//first place, so we can just use the number of results returned
//to loop through the Result members.
for($i=0; $i<$res['totalResultsReturned']; $i++) {

    //The object represents each piece of data as a member variable
    //rather than an array element, so the syntax is a little bit
    //different from the DOM version.
    $thisResult = $xml->Result[$i];

    echo "<a href='".$thisResult->ClickUrl."'><b>".
         $thisResult->Title."</b></a>: ";
    echo $thisResult->Summary;

    echo "<br /><br />";
}

?>
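
To try the SimpleXML access pattern from Listing 2 without a network connection, the following sketch parses an inline string instead of a live response. The response structure is an assumption modeled on the element and attribute names used in Listing 2, and the values are made up.

```php
<?php
// Inline stand-in for a search response; the structure is an assumption
// modeled on the names used in Listing 2.
$response = <<<XML
<ResultSet totalResultsAvailable="2500" totalResultsReturned="2">
  <Result>
    <Title>First hit</Title>
    <Summary>About XML Query.</Summary>
    <ClickUrl>http://example.com/1</ClickUrl>
  </Result>
  <Result>
    <Title>Second hit</Title>
    <Summary>More on XML Query.</Summary>
    <ClickUrl>http://example.com/2</ClickUrl>
  </Result>
</ResultSet>
XML;

$xml = simplexml_load_string($response);

// Attributes come back as SimpleXMLElement objects; cast before storing.
$total = (int) $xml['totalResultsAvailable'];

// Child elements with the same name can be iterated directly.
$titles = array();
foreach ($xml->Result as $result) {
    $titles[] = (string) $result->Title;
}

echo $total, ': ', implode(', ', $titles), "\n";
```

Iterating with foreach avoids depending on the totalResultsReturned attribute, which may be a reasonable choice when you do not fully trust the counts in the response.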

Listing 3 adds a cache layer to the SimpleXML example from Listing 2. The cache stores the
results of any particular query for two hours.

Listing 3. The Yahoo SimpleXML example with a cache layer

<?php

//This query does a search for any Web pages relevant to "XML Query"
$query = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
         "query=%5C%22XML%20Query%5C%22&appid=YahooDemo";

//The cached material should only last for 2 hours (7200 seconds).
$maxAge = 7200;

//This is where I put my tempfile; you can store yours in a more
//convenient location.
$cache = 'c:\temp\yws_'.md5($query);

//First check for an existing version of the file, and then check
//to see whether or not it's expired.
if(file_exists($cache) &&
    filemtime($cache) > (time()-$maxAge)) {

    //If there's a valid cache file, load its data.
    $data = file_get_contents($cache);
} else {

    //If there's no valid cache file, grab a live version of the
    //data and save it to a temporary file. Once the file is complete,
    //move it to the permanent file. (This prevents concurrency issues.)
    $data = file_get_contents($query);
    $tempName = tempnam('c:\temp','YWS');
    file_put_contents($tempName, $data);
    rename($tempName, $cache);
}

//Wherever the data came from, load it into a SimpleXML object.
$xml = simplexml_load_string($data);

//From here, the rest of the file is the same.

// Load up the root element attributes
foreach($xml->attributes() as $name=>$attr) {
    $res[$name]=$attr;
}

...
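
The caching pattern from Listing 3 can also be wrapped in a reusable function. The sketch below is one possible arrangement, not the article's code: cached_fetch() is a hypothetical helper name, the cache lives in the system temporary directory instead of c:\temp, and a closure stands in for the live request so the sketch runs offline.

```php
<?php
/**
 * Return cached data for $key if it is younger than $maxAge seconds;
 * otherwise call $fetch, store its result, and return it.
 * (cached_fetch() is a hypothetical helper, not part of PHP.)
 */
function cached_fetch($key, $maxAge, $fetch, $dir = null)
{
    $dir = ($dir === null) ? sys_get_temp_dir() : $dir;
    $cache = $dir . DIRECTORY_SEPARATOR . 'yws_' . md5($key);

    if (file_exists($cache) && filemtime($cache) > (time() - $maxAge)) {
        return file_get_contents($cache);
    }

    $data = call_user_func($fetch);

    // Write to a temporary file first, then rename it into place so that
    // other requests never read a half-written cache file.
    $tempName = tempnam($dir, 'YWS');
    file_put_contents($tempName, $data);
    rename($tempName, $cache);

    return $data;
}

// Stand-in for file_get_contents($query); it counts how often it runs.
$calls = 0;
$fetch = function () use (&$calls) {
    $calls++;
    return '<ResultSet totalResultsReturned="0"/>';
};

$key = 'demo-' . uniqid('', true);
$first  = cached_fetch($key, 7200, $fetch);
$second = cached_fetch($key, 7200, $fetch);

echo $calls, "\n";
```

The second call is served from the cache file, so the fetch closure runs only once for the same key.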

Manipulating XML using XSL


EXtensible Stylesheet Language (XSL) is a functional XML language that was created for the
task of manipulating XML documents. Using XSL, you can transform an XML document into a
redefined XML document, an XHTML document, an HTML document, or a text document
based on a stylesheet definition, similar to the way CSS works by implementing rules. PHP5's
implementation of the W3C standard supports interoperability with the DOM and XPath.

EXtensible Stylesheet Language Transformations (XSLT) is available in PHP5 through the XSL
extension, which is based on libxml2, and its stylesheets are XML documents. XSLT transforms
an XML source tree into an XML or XML-type result tree. These transformations apply the series
of rules specified in the stylesheet to the XML data. XSLT can add or remove elements or
attributes to or from the output file. It allows the developer to sort or rearrange elements and
make decisions about what elements to hide or display. Different stylesheets allow for your XML
to be displayed appropriately for different media, such as screen display versus print display.

XSLT uses XPath to navigate through the original XML document. The XSLT transformation
model usually involves a source XML file, an XSLT file containing one or more processing
templates, and an XSLT processor. XSLT documents have to be loaded using the DOM. PHP5
supports only the libxslt processor.
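
As a small sketch of the sorting capability described above, the following code applies an inline stylesheet that outputs recipe titles in alphabetical order as plain text. The data, the element names, and the stylesheet are assumptions made up for illustration. Because the XSL extension is optional in many PHP builds, the transformation is guarded with class_exists().

```php
<?php
// Illustrative data and stylesheet; the element names are assumptions.
$xmlSource = '<recipes>'
           . '<recipe><title>Soup</title></recipe>'
           . '<recipe><title>Bread</title></recipe>'
           . '<recipe><title>Pie</title></recipe>'
           . '</recipes>';

$xslSource = <<<XSL
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/recipes">
    <xsl:for-each select="recipe">
      <xsl:sort select="title"/>
      <xsl:value-of select="title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL;

$sorted = null;
if (class_exists('XSLTProcessor')) {   // the XSL extension is optional
    // Both the source and the stylesheet are loaded through the DOM.
    $xml = new DOMDocument();
    $xml->loadXML($xmlSource);

    $xsl = new DOMDocument();
    $xsl->loadXML($xslSource);

    $xslt = new XSLTProcessor();
    $xslt->importStylesheet($xsl);

    $sorted = $xslt->transformToXML($xml);
    echo $sorted;
}
```

The source document is left untouched. Swapping in a different stylesheet would produce a different view of the same data.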

XSL in action

An interesting application of XSL is to create XML files on the fly to contain whatever data has
just been selected from the database. Using this technique, it is possible to create complete Web
applications where the PHP scripts build XML files from database queries and then use
XSL transformations to generate the actual HTML documents.

This method completely splits the presentation layer from the business layer so that you can
maintain either of these layers independently of the other.
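
The first half of this technique, building XML on the fly from query results, might look like the following sketch. The $rows array is a hypothetical stand-in for rows returned by a database query, and the element names are assumptions.

```php
<?php
// Stand-in for rows returned by a database query (hypothetical data).
$rows = array(
    array('id' => 1, 'title' => 'Bread & butter', 'minutes' => 15),
    array('id' => 2, 'title' => 'Soup', 'minutes' => 40),
);

$dom = new DOMDocument('1.0', 'UTF-8');
$root = $dom->appendChild($dom->createElement('recipes'));

foreach ($rows as $row) {
    $recipe = $root->appendChild($dom->createElement('recipe'));
    $recipe->setAttribute('id', (string) $row['id']);

    // createTextNode() escapes characters such as "&" on output.
    $title = $recipe->appendChild($dom->createElement('title'));
    $title->appendChild($dom->createTextNode($row['title']));

    $recipe->appendChild(
        $dom->createElement('minutes', (string) $row['minutes'])
    );
}

$xmlString = $dom->saveXML();
echo $xmlString;
```

The resulting DOMDocument can be handed directly to an XSLTProcessor, so the PHP script never has to assemble HTML itself.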

Listing 4 illustrates the relationship between the XML input file, the XSL stylesheet, the XSLT
processor, and multiple possible outputs.

Listing 4. XML transformation

<?php

// Create new XSLTProcessor
$xslt = new XSLTProcessor();

//Both the source document and the stylesheet must be
//DOMDocuments, but the result can be a DOMDocument,
//a file, or even a String.

// Load the XSLT stylesheet
$xsl = new DOMDocument();
$xsl->load('recipe.xsl');

// Load the stylesheet into the processor
$xslt->importStylesheet($xsl);

// Load XML input file
$xml = new DOMDocument();
$xml->load('recipe.xml');

//Now choose an output method and transform to it:

// Transform to a string
$results = $xslt->transformToXML($xml);
echo "String version:";
echo htmlentities($results);

// Transform to DOM object
$results = $xslt->transformToDoc($xml);
echo "The root of the DOM Document is ";
echo $results->documentElement->nodeName;

// Transform to a file
$results = $xslt->transformToURI($xml, 'results.txt');

?>

Summary

The earlier parts of this series focused on the use of the Document Object Model and on
SimpleXML to perform both simple and complex parsing tasks. Part 2 also looked at the use of
XMLReader, which provides a faster, easier way to perform tasks that one would previously do
using SAX.

Now, in this article, you saw how to access remote files such as REST-based Web services, and
how to use XSLT to easily output XML data to a string, DOM Document object, or file.


Why XML validation?


XML is a markup language that enables you, as a developer, to create your own custom
language. This language is then used to carry, but not necessarily display, data in a platform-
independent fashion. The language is defined with the use of markup tags, much like Hypertext
Markup Language (HTML).

XML has gained in popularity in recent years because it represents the best of two worlds: It is
easily readable by humans and computers alike. XML languages are expressed in tree-like
structure with elements and attributes describing key data. The element and attribute names are
usually written in plain English (so humans can read them). They are also highly structured (so
computers can parse them).

Now, for example, suppose you create your own XML language, called LuresXML. LuresXML
simply defines a means for defining various types of lures that are offered on your Web site.
First, you create an XML schema that defines what the XML document should look like, as in
Listing 1.

Listing 1. lures.xsd

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:element name="lures">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="lure" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="lureName" type="xs:string"/>
              <xs:element name="lureCompany" type="xs:string"/>
              <xs:element name="lureQuantity" type="xs:integer"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

This is, quite intentionally, a fairly simple example. The root element is called lures. It is the
parent element of one or more lure elements, each of which is the parent of three other
elements. The first element is the lure name (lureName). The second element is the name of the
company that manufactures the lure (lureCompany). And, finally, the last element is the quantity
(lureQuantity), or how many lures your company has in inventory. The first two of these child
elements are defined as strings, whereas the lureQuantity element is defined as an integer.

Now, say you want to create an XML document (sometimes called an instance) based on that
schema. It might look something like Listing 2.

Listing 2. lures.xml

<lures>
  <lure>
    <lureName>Silver Spoon</lureName>
    <lureCompany>Clark</lureCompany>
    <lureQuantity>Seven</lureQuantity>
  </lure>
</lures>

This is a simple XML document instance of the schema from Listing 1. In this case, the
document instance lists only one lure. The name of the lure is Silver Spoon. The manufacturing
company is Clark. And the quantity on hand is Seven.

Here is the question: How do you know that the XML document in Listing 2 is a proper instance
of the schema defined in Listing 1? In fact, it isn't (this is also intentional).
Note the lureQuantity element as defined in Listing 1. It is of type xs:integer. Yet in Listing
2 the lureQuantity element actually contains a word (Seven), not an integer.

The purpose of XML validation is to catch exactly those kinds of errors. Proper validation
ensures that an XML document matches the rules defined in its schema.

Continuing with this example, when you attempt to validate the XML document in Listing 2, you
get an error. You fix this error (by changing the Seven to a 7) before using the document within
your software application.

XML validation is important because you want to catch errors as early as possible in the
information interchange process. Otherwise, unpredictable results can occur when you attempt to
parse an XML document and it contains invalid data types or an unexpected structure.
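
The effect of that one-word fix can be sketched in a self-contained way. The code below validates two in-memory documents, one with Seven and one with 7, against an inline copy of a schema like the one in Listing 1. It assumes DOMDocument::schemaValidateSource() in place of file-based validation so that no files are needed, and is_valid_lures() is a hypothetical helper name used only here.

```php
<?php
// Inline copy of a schema like Listing 1 (kept short for the sketch).
$schema = <<<XSD
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="lures">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="lure" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="lureName" type="xs:string"/>
              <xs:element name="lureCompany" type="xs:string"/>
              <xs:element name="lureQuantity" type="xs:integer"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// is_valid_lures() is a hypothetical helper name used only in this sketch.
function is_valid_lures($quantity, $schema)
{
    $xml = new DOMDocument();
    $xml->loadXML(
        '<lures><lure><lureName>Silver Spoon</lureName>'
        . '<lureCompany>Clark</lureCompany>'
        . '<lureQuantity>' . $quantity . '</lureQuantity></lure></lures>'
    );
    return $xml->schemaValidateSource($schema);
}

// Collect libxml messages quietly instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$withWord   = is_valid_lures('Seven', $schema);
$withNumber = is_valid_lures('7', $schema);
libxml_clear_errors();

var_dump($withWord, $withNumber);
```

Only the document whose lureQuantity is a real integer passes, which is the kind of check you want to run before the data reaches the rest of the application.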

Simple XML parsing in PHP


It is beyond the scope of this article to provide an exhaustive overview of parsing XML
documents in PHP. However, I look at the basics of loading an XML document in PHP.

Just to continue to keep things simple, keep using the schema from Listing 1 and the XML
document from Listing 2. Listing 3 demonstrates some basic PHP code to load the XML
document.

Listing 3. testxml.php

<?php

$xml = new DOMDocument();
$xml->load('./lures.xml');

?>

Nothing is complicated about this either. You are using the DOMDocument class to load the XML
document, here called lures.xml. Note that for this code to work on your own PHP server, the
lures.xml file must reside on the same path as the actual PHP code.

At this point, it is tempting to start parsing the XML document. However, as you have seen, it is
best to first validate the document to ensure that it matches the language specifications set forth
in the schema.

Simple XML validation in PHP

Continue adding to the PHP code in Listing 3 by inserting some simple validation code, as in
Listing 4.

Listing 4. Enhanced testxml.php

<?php

$xml = new DOMDocument();
$xml->load('./lures.xml');

if (!$xml->schemaValidate('./lures.xsd')) {
    echo "invalid<p/>";
}
else {
    echo "validated<p/>";
}

?>

Once again, note that the schema file from Listing 1 must be in the same directory where the
PHP code is located. Otherwise, PHP returns an error.

This new code invokes the schemaValidate method against the DOMDocument object that loaded
the XML. The method accepts one parameter: the location of the XML schema used to validate
the XML document. The method returns a Boolean where true indicates a successful validation
and false indicates an unsuccessful validation.

Now, deploy the PHP code from Listing 4 to your own PHP server. Call it testxml.php because
that is the name given in Listings 3 and 4. Ensure that the XML document (from Listing 2) and
XML schema (from Listing 1) are both in the same directory. Once again, PHP reports an error if
this is not the case.

Point your browser to testxml.php. You should see one simple word on the screen: "invalid."
The good news is that the schema validation is working. It should return an error, and it did.

The bad news is that you have no idea where the error is located within the XML document.
Okay, you might know because I mentioned the source of the error earlier in the article. But
pretend that didn't happen, okay?

There is an error, but where?

To repeat: The bad news is that you have no idea where the error is located within the XML
document. Just play along. It would be nice if the PHP code actually reported the location of the
error, as well as the nature of the error, so that you can take corrective action. Something along
the lines of "Hey! I can't accept a string for lureQuantity" would be nice.

By default, libxml reports each validation problem as a standard PHP warning.
Unfortunately, the location given in that warning is not where in the XML
document the error occurred. Instead, it identifies where in the PHP code the error was
encountered. Because that's fairly useless, you look at another option.

There is another PHP function called libxml_use_internal_errors(). This function accepts a
Boolean as its only parameter. If you set it to true, then that means that you are disabling the
libxml error reporting and fetching the errors on your own with the libxml_get_errors()
function. That's what you do.

Of course, that means that you have to write a bit more code. But the trade-off is more specific
error reporting. In the long run, this saves a lot of time.
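
A minimal sketch of what fetching the errors on your own looks like is shown below. It assumes an in-memory document and a cut-down inline schema (only lureQuantity) so that it runs without any files. The line and message properties are read from the LibXMLError objects that libxml_get_errors() returns.

```php
<?php
libxml_use_internal_errors(true);   // collect errors instead of warning

$xml = new DOMDocument();
$xml->loadXML(
    "<lures>\n  <lure>\n    <lureQuantity>Seven</lureQuantity>\n"
    . "  </lure>\n</lures>"
);

// Cut-down schema for the sketch: a single lure with an integer quantity.
$schema = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
        . '<xs:element name="lures"><xs:complexType><xs:sequence>'
        . '<xs:element name="lure"><xs:complexType><xs:sequence>'
        . '<xs:element name="lureQuantity" type="xs:integer"/>'
        . '</xs:sequence></xs:complexType></xs:element>'
        . '</xs:sequence></xs:complexType></xs:element>'
        . '</xs:schema>';

$valid  = $xml->schemaValidateSource($schema);
$errors = libxml_get_errors();      // array of LibXMLError objects
libxml_clear_errors();

foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
```

Each error object carries the line number within the XML document, which is the piece of information the default warning does not surface clearly.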

Listing 5 shows the finished product.

Listing 5. The final testxml.php

<?php

function libxml_display_error($error)
{
    $return = "<br/>\n";
    switch ($error->level) {
        case LIBXML_ERR_WARNING:
            $return .= "<b>Warning $error->code</b>: ";
            break;
        case LIBXML_ERR_ERROR:
            $return .= "<b>Error $error->code</b>: ";
            break;
        case LIBXML_ERR_FATAL:
            $return .= "<b>Fatal Error $error->code</b>: ";
            break;
    }
    $return .= trim($error->message);
    if ($error->file) {
        $return .= " in <b>$error->file</b>";
    }
    $return .= " on line <b>$error->line</b>\n";

    return $return;
}

function libxml_display_errors() {
    $errors = libxml_get_errors();
    foreach ($errors as $error) {
        print libxml_display_error($error);
    }
    libxml_clear_errors();
}

// Enable user error handling
libxml_use_internal_errors(true);

$xml = new DOMDocument();
$xml->load('./lures.xml');

if (!$xml->schemaValidate('./lures.xsd')) {
    print '<b>Errors Found!</b>';
    libxml_display_errors();
}
else {
    echo "validated<p/>";
}

?>

First, notice the function at the top of the code listing. It's called libxml_display_error() and
accepts a LibXMLError object as its only parameter. Then it uses the all-too-familiar switch
statement to determine the error level and craft an error message appropriate to that level. When
the level is determined, the code produces a string that reports the appropriate level.

Then, two more things happen. First, the error object is examined to determine whether or not a
file property contains a value. If so, then that file value is appended to the error message so
the location of the file is reported. Next, the line property is appended to the error message so
the user can see exactly where in the XML file the error occurred. Needless to say, this is
extremely important for debugging purposes.

It should also be noted that libxml_display_error() simply produces a string that describes
the error. The actual printing of the error to the screen is left up to the caller, in this case
libxml_display_errors().

The function below that is the previously mentioned libxml_display_errors(), which takes
no parameters. The first thing this function does is call libxml_get_errors(). This returns an
array of LibXMLError objects that represent all of the errors encountered when the
schemaValidate() method was invoked on the XML document.
Next, you step through each of the errors you encountered and invoke the
libxml_display_error() function for each error object. Whatever string is returned by that
function is then printed to the screen. One great benefit of handling errors this way is that all of
the errors are printed at once. This means that you only need to execute the code once to view all
of the errors specific to that particular XML document.

Finally, libxml_clear_errors() clears out the errors recently encountered by the
schemaValidate() method. This means that if schemaValidate() is executed again within the
same code sequence, you will start with a clean slate, and only new errors will be reported. If
you don't do this and you execute schemaValidate() again, then all of the errors from the first
invocation of schemaValidate() remain in the array returned by libxml_get_errors().
Obviously, that presents problems if you're looking for a fresh set of errors.
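
The collect-then-clear pattern described above can be sketched in a self-contained form. This is only an illustration: the schema and document are inline stand-ins for lures.xsd and lures.xml, the sketch assumes that lureQuantity is declared as xs:integer, and collect_validation_errors() is a hypothetical helper name rather than part of libxml.

```php
<?php
// Hypothetical inline schema, standing in for lures.xsd.
$xsd = <<<'XSD'
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="lures">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="lure" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="lureName" type="xs:string"/>
              <xs:element name="lureCompany" type="xs:string"/>
              <xs:element name="lureQuantity" type="xs:integer"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Hypothetical inline document, standing in for lures.xml.
$invalidXml = <<<'XML'
<lures>
  <lure>
    <lureName>Silver Spoon</lureName>
    <lureCompany>Clark</lureCompany>
    <lureQuantity>Seven</lureQuantity>
  </lure>
</lures>
XML;
$validXml = str_replace('Seven', '7', $invalidXml);

// Hypothetical helper: returns one message per libxml error, and always
// leaves the libxml error buffer empty for the next validation run.
function collect_validation_errors(string $xmlSource, string $xsdSource): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    if ($doc->loadXML($xmlSource)) {
        $doc->schemaValidateSource($xsdSource);
    }

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

$firstRun  = collect_validation_errors($invalidXml, $xsd);
$secondRun = collect_validation_errors($validXml, $xsd);
```

Because the buffer is cleared at the end of every call, the second run reports nothing left over from the first one.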

It's also important to note that I made a slight change to the if-then statement at the bottom of the
code in Listing 5. If an error is encountered, it prints "Errors Found!" in bold and then invokes
the aforementioned libxml_display_errors() function which displays all of the errors
encountered before clearing out the error array. I opted for this solution instead of just printing
out "invalid" as I did in Listing 4.

Second test

Now, it's time to test again. Move the PHP file from Listing 5 to your PHP server. Keep the file
name the same (testxml.php). As before, ensure that both the XML Schema Definition (XSD)
file and the XML files are in the same directory as the PHP file. Point your browser to
testxml.php once again, and now you should see something like this:

Errors Found!
Error 1824: Element 'lureQuantity': 'Seven' is not a valid value of the atomic type 'xs:integer'. in
/home/thehope1/public_html/example.xml on line 5

Well, that's fairly descriptive, isn't it? The error message tells you on what line the error
occurred. It also tells you where the file is (as if you didn't know). And it tells you exactly why
the error occurred. That's information you can use.

Fixing the problem

You can now leave the PHP file alone and work on fixing the problem in your XML document.

Because the error reportedly occurred on line 5 of the XML document, it's a good idea to look at
line 5 and see what's there. Unsurprisingly, line 5 is the location of the lureQuantity element.
And, as you look at it carefully, you suddenly have an epiphany that Seven is a string, not a
number. So you change the string Seven to the numeral 7. The final copy of the XML document
should look like Listing 6.

Listing 6. Updated XML file

<lures>
  <lure>
    <lureName>Silver Spoon</lureName>
    <lureCompany>Clark</lureCompany>
    <lureQuantity>7</lureQuantity>
  </lure>
</lures>

Now, copy this new XML file to your PHP server. And, once again, point your browser to
testxml.php. You should see just one word: "validated." This is excellent news for two reasons.
First, it means that the validation code is working properly because the XML document is, in
fact, valid. Second, you have probably just validated your first XML document in PHP.
Congratulations!

As I always advise, now it is time to tinker. Modify lures.xsd to make it a more complex schema.
Modify lures.xml to make it a more complex instance of that schema. Copy those files to the
PHP server and, once again, execute testxml.php. See what happens. Intentionally produce an
invalid document for several reasons and see what happens.

Also, note that when you tinker, you don't need to change the PHP code at all. Just make sure
that the file names (lures.xml and lures.xsd) are the same and you can modify them to your
heart's content.

Conclusion
PHP makes it easy for developers to validate XML documents. Using the DOMDocument class in
conjunction with the schemaValidate() method, you can ensure that your XML documents
comply with the specifications in their respective schemas. This is important to ensure data
integrity in your software applications.

PHP Ajax Tutorial with Example

What is Ajax?
AJAX is the acronym for Asynchronous JavaScript & XML.
It is a technology that reduces the interactions between the server and client.

It does this by updating only part of a web page rather than the whole page.

The asynchronous interactions are initiated by JavaScript.

JavaScript is a client-side scripting language. It is executed on the client side by web
browsers that support JavaScript. JavaScript code only works in browsers that
have JavaScript enabled.

XML is the acronym for Extensible Markup Language. It is used to encode messages in both
human- and machine-readable formats. It’s like HTML but allows you to create your own custom tags.
For more details on XML, see the article on XML.

Why use AJAX?

• It allows developing rich interactive web applications that behave much like desktop applications.
• Validation can be performed as the user fills in a form, without submitting it. This can be
achieved using auto completion: the words that the user types in are submitted to the server
for processing, and the server responds with keywords that match what the user entered.
• It can be used to populate a dropdown box depending on the value of another dropdown box.
• Data can be retrieved from the server and only a certain part of a page updated without reloading
the whole page. This is very useful for web page parts that load things like
  o Tweets
  o Comments
  o Users visiting the site etc.

How to Create a PHP Ajax application

We will create a simple application that allows users to search for popular PHP MVC
frameworks.

Our application will have a text box in which users type the name of a framework.

We will then use AJAX to search for a match and display the framework’s complete name
just below the search form.

Step 1) Creating the index page


index.php

<html>
<head>
    <title>PHP MVC Frameworks - Search Engine</title>
    <script type="text/javascript" src="auto_complete.js"></script>
</head>
<body>
    <h2>PHP MVC Frameworks - Search Engine</h2>
    <p><b>Type the first letter of the PHP MVC Framework</b></p>
    <form method="POST" action="index.php">
        <p><input type="text" size="40" id="txtHint" onkeyup="showName(this.value)"></p>
    </form>
    <p>Matches: <span id="txtName"></span></p>
</body>
</html>

HERE,

• “onkeyup="showName(this.value)"” executes the JavaScript function showName
every time the user releases a key in the text box, passing it the current contents of the box.

This feature is called auto complete.

Step 2) Creating the frameworks page


frameworks.php

<?php
$frameworks = array("CodeIgniter", "Zend Framework", "Cake PHP", "Kohana");

// Text typed by the user; empty when the parameter is missing
$name = isset($_GET["name"]) ? $_GET["name"] : "";

$match = "";

if (strlen($name) > 0) {
    for ($i = 0; $i < count($frameworks); $i++) {
        // Case-insensitive comparison against the start of each framework name
        if (strtolower($name) == strtolower(substr($frameworks[$i], 0, strlen($name)))) {
            if ($match == "") {
                $match = $frameworks[$i];
            } else {
                $match = $match . " , " . $frameworks[$i];
            }
        }
    }
}

echo ($match == "") ? 'no match found' : $match;
?>
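
The prefix matching in frameworks.php can also be expressed as a small function, which makes it easier to try out without a browser. This is only a sketch: match_frameworks() is a hypothetical name, and it uses stripos() for the case-insensitive comparison instead of the strtolower()/substr() pair above.

```php
<?php
// Hypothetical helper: returns the frameworks whose names start with $name,
// ignoring case. An empty search string matches nothing.
function match_frameworks(array $frameworks, string $name): array
{
    if ($name === "") {
        return [];
    }

    $matches = array_filter($frameworks, function ($framework) use ($name) {
        // stripos() returns 0 only when $name occurs at the very start
        return stripos($framework, $name) === 0;
    });

    // Re-index so the result is a plain list
    return array_values($matches);
}

$frameworks = ["CodeIgniter", "Zend Framework", "Cake PHP", "Kohana"];
$forC = match_frameworks($frameworks, "c");
$forZ = match_frameworks($frameworks, "ZE");
$forX = match_frameworks($frameworks, "x");
```

Joining the result with implode(" , ", $forC) gives the same comma-separated string that frameworks.php sends back to the browser.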

Step 3) Creating the JS script


auto_complete.js

function showName(str) {
    if (str.length == 0) { // exit function if nothing has been typed in the textbox
        document.getElementById("txtName").innerHTML = ""; // clear previous results
        return;
    }

    var xmlhttp;
    if (window.XMLHttpRequest) { // code for IE7+, Firefox, Chrome, Opera, Safari
        xmlhttp = new XMLHttpRequest();
    } else { // code for IE6, IE5
        xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
    }

    xmlhttp.onreadystatechange = function() {
        if (xmlhttp.readyState == 4 && xmlhttp.status == 200) {
            document.getElementById("txtName").innerHTML = xmlhttp.responseText;
        }
    };

    // encodeURIComponent keeps spaces and special characters intact in the query string
    xmlhttp.open("GET", "frameworks.php?name=" + encodeURIComponent(str), true);
    xmlhttp.send();
}

HERE,

• “if (str.length == 0)” checks the length of the string. If it is 0, the previous results are
cleared and the rest of the function is not executed.

• “if (window.XMLHttpRequest)…” Internet Explorer versions 5 and 6 use ActiveXObject to make
AJAX requests, while later versions of IE and other browsers such as Chrome and Firefox use
XMLHttpRequest. This check ensures that our application works in IE 5 and 6 as well as in newer
versions of IE and in other browsers.

• “xmlhttp.onreadystatechange=function…” checks whether the AJAX interaction is complete
(readyState is 4) and the status is 200, then updates the txtName span with the returned results.

Step 4) Testing our PHP Ajax application


Assuming you have saved the file index.php in phptuts/ajax, browse to the URL
http://localhost/phptuts/ajax/index.php

Type the letter C in the text box. The frameworks whose names start with C (CodeIgniter and
Cake PHP) are listed next to "Matches:".

The above example demonstrates the concept of AJAX and how it can help us create rich
interactive applications.

Summary
• AJAX is the acronym for Asynchronous JavaScript and XML.
• AJAX is a technology used to create rich interactive applications that reduce the interactions
between the client and the server by updating only parts of the web page.
• Internet Explorer versions 5 and 6 use ActiveXObject to implement AJAX operations.
• Internet Explorer version 7 and above, and the browsers Chrome, Firefox, Opera, and Safari, use
XMLHttpRequest.

Parsing XML essentially means navigating through an XML document and returning the relevant
data. An increasing number of web services return data in JSON format, but a large number still
return XML, so you need to master parsing XML if you really want to consume the full breadth
of APIs available.

With PHP’s SimpleXML extension, which was introduced back in PHP 5.0, working with XML is
very easy. In this article I’ll show you how.

Basic Usage
Let’s start with the following sample as languages.xml:

<?xml version="1.0" encoding="utf-8"?>
<languages>
  <lang name="C">
    <appeared>1972</appeared>
    <creator>Dennis Ritchie</creator>
  </lang>
  <lang name="PHP">
    <appeared>1995</appeared>
    <creator>Rasmus Lerdorf</creator>
  </lang>
  <lang name="Java">
    <appeared>1995</appeared>
    <creator>James Gosling</creator>
  </lang>
</languages>

The above XML document encodes a list of programming languages, giving two details about
each language: its year of implementation and the name of its creator.

The first step is to load the XML using either simplexml_load_file() or
simplexml_load_string(). As you might expect, the former loads the XML from a file and
the latter loads the XML from a given string.

<?php
$languages = simplexml_load_file("languages.xml");

Both functions read the entire DOM tree into memory and return a SimpleXMLElement object
representation of it. In the above example, the object is stored in the $languages variable. You
can then use var_dump() or print_r() to get the details of the returned object if you like.

SimpleXMLElement Object
(
    [lang] => Array
        (
            [0] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [name] => C
                        )
                    [appeared] => 1972
                    [creator] => Dennis Ritchie
                )
            [1] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [name] => PHP
                        )
                    [appeared] => 1995
                    [creator] => Rasmus Lerdorf
                )
            [2] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [name] => Java
                        )
                    [appeared] => 1995
                    [creator] => James Gosling
                )
        )
)

The XML contains a root languages element which wraps three lang elements, which is why
the SimpleXMLElement has the public property lang, which is an array of three
SimpleXMLElements. Each element of the array corresponds to a lang element in the XML
document.

You can access the properties of the object in the usual way with the -> operator. For example,
$languages->lang[0] will give you a SimpleXMLElement object which corresponds to the first
lang element. This object then has two public properties: appeared and creator.

<?php
$languages->lang[0]->appeared;
$languages->lang[0]->creator;

Iterating through the list of languages and showing their details can be done very easily with
standard looping methods, such as foreach.

<?php
foreach ($languages->lang as $lang) {
    printf(
        "<p>%s appeared in %d and was created by %s.</p>",
        $lang["name"],
        $lang->appeared,
        $lang->creator
    );
}

Notice that I accessed the lang element’s name attribute to retrieve the name of the language.
You can access any attribute of an element represented as a SimpleXMLElement object using
array notation like this.
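
One detail worth illustrating is that both element and attribute accesses return SimpleXMLElement objects rather than plain strings, so it is usually safer to cast them before storing or comparing them. The following is a minimal sketch; the inline XML is a shortened copy of languages.xml and the $rows variable is introduced only for this example.

```php
<?php
// Shortened inline copy of languages.xml, used so the sketch runs on its own.
$xml = <<<'XML'
<languages>
  <lang name="C">
    <appeared>1972</appeared>
    <creator>Dennis Ritchie</creator>
  </lang>
  <lang name="PHP">
    <appeared>1995</appeared>
    <creator>Rasmus Lerdorf</creator>
  </lang>
</languages>
XML;

$languages = simplexml_load_string($xml);
if ($languages === false) {
    throw new RuntimeException("Could not parse the XML");
}

$rows = [];
foreach ($languages->lang as $lang) {
    $rows[] = [
        // Array notation reads an attribute, -> reads a child element;
        // the casts turn the SimpleXMLElement objects into scalar values.
        "name"     => (string) $lang["name"],
        "appeared" => (int) $lang->appeared,
        "creator"  => (string) $lang->creator,
    ];
}
```

After the loop, $rows holds ordinary PHP strings and integers that can be compared with === or passed to json_encode().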

Dealing With Namespaces


Many times you’ll encounter namespaced elements while working with XML from different web
services. Let’s modify our languages.xml example to reflect the usage of namespaces:

<?xml version="1.0" encoding="utf-8"?>
<languages xmlns:dc="http://purl.org/dc/elements/1.1/">
  <lang name="C">
    <appeared>1972</appeared>
    <dc:creator>Dennis Ritchie</dc:creator>
  </lang>
  <lang name="PHP">
    <appeared>1995</appeared>
    <dc:creator>Rasmus Lerdorf</dc:creator>
  </lang>
  <lang name="Java">
    <appeared>1995</appeared>
    <dc:creator>James Gosling</dc:creator>
  </lang>
</languages>

Now the creator element is placed under the namespace dc which points to
http://purl.org/dc/elements/1.1/. If you try to print the creator of a language using our previous
technique, it won’t work. In order to read namespaced elements like this you need to use one of
the following approaches.

The first approach is to use the namespace URI directly in your code when accessing
namespaced elements. The following example demonstrates how:

<?php
$dc = $languages->lang[1]->children("http://purl.org/dc/elements/1.1/");
echo $dc->creator;

The children() method takes a namespace and returns the children of the element that belong
to it. It accepts two arguments: the first is the XML namespace and the second is
an optional Boolean which defaults to false. If you pass true, the first argument is treated as a
prefix rather than the actual namespace URI.
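
To make the two calling styles concrete, here is a small sketch that reads the same element once by namespace URI and once by prefix. The inline XML is a shortened copy of the namespaced languages.xml, and the variable names are chosen for this example only.

```php
<?php
// Shortened inline copy of the namespaced languages.xml.
$xml = <<<'XML'
<languages xmlns:dc="http://purl.org/dc/elements/1.1/">
  <lang name="PHP">
    <appeared>1995</appeared>
    <dc:creator>Rasmus Lerdorf</dc:creator>
  </lang>
</languages>
XML;

$languages = simplexml_load_string($xml);
$lang = $languages->lang[0];

// First argument is the namespace URI (second argument defaults to false)
$byUri = (string) $lang->children("http://purl.org/dc/elements/1.1/")->creator;

// First argument is the prefix, because the second argument is true
$byPrefix = (string) $lang->children("dc", true)->creator;

// Without children(), the namespaced element is not found
$plain = (string) $lang->creator;
```

The prefix form is shorter, but it depends on the prefix the document happens to use; the URI form keeps working if the feed author renames the prefix.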

The second approach is to read the namespace URI from the document and use it while accessing
namespaced elements. This is actually a cleaner way of accessing elements because you don’t
have to hardcode the URI.

<?php
$namespaces = $languages->getNamespaces(true);
$dc = $languages->lang[1]->children($namespaces["dc"]);

echo $dc->creator;

The getNamespaces() method returns an array of namespace prefixes with their associated
URIs. It accepts an optional parameter which defaults to false. If you set it to true, the method
returns the namespaces used in the parent and all child nodes. Otherwise, it returns only the
namespaces used in the parent node itself.
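
The effect of the parameter is easiest to see by calling the method both ways on the same document. This sketch again uses a shortened inline copy of the namespaced languages.xml; $rootOnly and $all are names chosen for the example.

```php
<?php
// Shortened inline copy of the namespaced languages.xml.
$xml = <<<'XML'
<languages xmlns:dc="http://purl.org/dc/elements/1.1/">
  <lang name="PHP">
    <appeared>1995</appeared>
    <dc:creator>Rasmus Lerdorf</dc:creator>
  </lang>
</languages>
XML;

$languages = simplexml_load_string($xml);

// Default (false): only namespaces used by the languages element itself.
// The dc prefix is declared there but not used there.
$rootOnly = $languages->getNamespaces();

// true: namespaces used anywhere below, including dc:creator
$all = $languages->getNamespaces(true);
```

This is why the examples in this article pass true: the dc namespace is only used by elements nested inside lang.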

Now you can iterate through the list of languages like so:

<?php
$languages = simplexml_load_file("languages.xml");
$ns = $languages->getNamespaces(true);

foreach ($languages->lang as $lang) {
    $dc = $lang->children($ns["dc"]);
    printf(
        "<p>%s appeared in %d and was created by %s.</p>",
        $lang["name"],
        $lang->appeared,
        $dc->creator
    );
}

A Practical Example – Parsing YouTube Video Feed


Let’s walk through an example that retrieves the RSS feed from a YouTube channel and displays
links to all of the videos from it. For this we need to make a call to the following URL:

http://gdata.youtube.com/feeds/api/users//uploads

The URL returns a list of the latest videos from the given channel in XML format. We’ll parse
the XML and get the following pieces of information for each video:

• Video URL
• Thumbnail
• Title

We’ll start out by retrieving and loading the XML:

<?php
$channel = "channelName";
$url = "http://gdata.youtube.com/feeds/api/users/".$channel."/uploads";
$xml = file_get_contents($url);

$feed = simplexml_load_string($xml);
$ns = $feed->getNamespaces(true);

If you take a look at the XML feed you can see there are several entry elements, each of which
stores the details of a specific video from the channel. But we are concerned only with the thumbnail
image, video URL, and title. These three elements are children of group, which is a child of
entry:

<entry>

<media:group>

<media:player url="video url"/>
<media:thumbnail url="video url" height="height" width="width"/>
<media:title type="plain">Title…</media:title>

</media:group>

</entry>

We simply loop through all the entry elements, and for each one we can extract the relevant
information. Note that player, thumbnail, and title are all under the media namespace. So,
we need to proceed like the earlier example. We get the namespaces from the document and use
the namespace while accessing the elements.

<?php
foreach ($feed->entry as $entry) {
    // Children of the entry that belong to the media namespace
    $group = $entry->children($ns["media"]);
    $group = $group->group;
    // The second thumbnail listed for the video
    $thumbnail_attrs = $group->thumbnail[1]->attributes();
    $image = $thumbnail_attrs["url"];
    $player = $group->player->attributes();
    $link = $player["url"];
    $title = $group->title;
    // Link to the video URL ($link), not to the attribute list ($player)
    printf('<p><a href="%s"><img src="%s" alt="%s"></a></p>',
        $link, $image, $title);
}
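
Since the loop depends on a live feed, it can be handy to try the same steps against a local sample first. The sketch below is an assumption-laden stand-in: the feed is a trimmed-down, hypothetical document with made-up example.com URLs, it assumes the media prefix is bound to the Media RSS namespace http://search.yahoo.com/mrss/, and it collects the values into a $videos array instead of printing them.

```php
<?php
// Hypothetical, trimmed-down feed with a single entry and made-up URLs.
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <media:group>
      <media:player url="http://example.com/watch?v=1"/>
      <media:thumbnail url="http://example.com/thumb-a.jpg" height="90" width="120"/>
      <media:thumbnail url="http://example.com/thumb-b.jpg" height="90" width="120"/>
      <media:title type="plain">First video</media:title>
    </media:group>
  </entry>
</feed>
XML;

$feed = simplexml_load_string($xml);
$ns = $feed->getNamespaces(true);

$videos = [];
foreach ($feed->entry as $entry) {
    // Switch to the media namespace, then descend into the group element
    $group = $entry->children($ns["media"])->group;

    $videos[] = [
        // The url attributes carry no prefix, so they are read via attributes()
        "link"  => (string) $group->player->attributes()["url"],
        "image" => (string) $group->thumbnail[1]->attributes()["url"],
        "title" => (string) $group->title,
    ];
}
```

When the values end up in HTML, as in the printf() call above, passing each one through htmlspecialchars() first avoids broken markup if a title contains quotes or angle brackets.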

Conclusion
Now that you know how to use SimpleXML to parse XML data, you can improve your skills by
parsing different XML feeds from various APIs. But an important point to consider is that
SimpleXML reads the entire DOM into memory, so if you are parsing large data sets then you
may face memory issues. In those cases it’s advisable to use something other than SimpleXML,
preferably an event-based parser such as XML Parser. To learn more about SimpleXML, check
out its documentation.
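
For comparison, here is a rough sketch of the event-based approach using PHP's XML Parser functions. It reads the input in chunks and reacts to each opening tag, so the whole document never has to sit in memory as a tree. The function name stream_language_names() and the php://memory stream are assumptions made for this example; with a real file you would pass a handle returned by fopen() on that file.

```php
<?php
// Hypothetical helper: collects the name attribute of every lang element
// while reading the XML from an open stream in 4 KB chunks.
function stream_language_names($handle): array
{
    $names = [];

    $parser = xml_parser_create();
    // Keep tag names as written instead of upper-casing them
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Called for every opening tag
        function ($parser, $tag, $attrs) use (&$names) {
            if ($tag === "lang" && isset($attrs["name"])) {
                $names[] = $attrs["name"];
            }
        },
        // Called for every closing tag; nothing to do here
        function ($parser, $tag) {
        }
    );

    while (!feof($handle)) {
        $chunk = fread($handle, 4096);
        // The third argument tells the parser whether this is the last chunk
        if (xml_parse($parser, $chunk, feof($handle)) === 0) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
        }
    }

    return $names;
}

// An in-memory stream stands in for a large file on disk
$handle = fopen("php://memory", "r+");
fwrite($handle, '<languages><lang name="C"/><lang name="PHP"/><lang name="Java"/></languages>');
rewind($handle);

$names = stream_language_names($handle);
fclose($handle);
```

The trade-off is convenience: you give up the property-style access of SimpleXML and track state yourself in the handlers.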
