SimpleXML

Introduction
Advanced Simplicity
•  Namespaces
•  Searching, Splitting, Recursing
•  Edge Conditions
Summary
About the Author



Introduction

“Simplicity of character is no hindrance to the subtlety of
intellect.”

– John Morley

When people ask me “What is SimpleXML?” I often quip, “XML is the
solution to all your problems; SimpleXML ensures it isn’t the root of
your problems!”

Those of you who have parsed XML with PHP4, or are currently dealing
with XML parsing in PHP4, know that it can be very painful to handle
documents with any degree of complexity. You either need to use the SAX
approach and hand-write a parser for every document, or you need to use
the DOM extension, which (in addition to its tendency to crash, leak and
generally misbehave under heavy usage) involves the pain of processing
documents using an API designed for a heavily object-oriented language
and targeted at supporting every single one of XML’s idiosyncrasies.

Consider the following XML snippet, which describes a small collection
of books. The document has a root node of library, with a direct child
of shelf, which classifies the books as fiction. The shelf displayed has
two children labelled book: “Of Mice and Men” by John Steinbeck and
“Harry Potter and the Philosopher’s Stone” by J.K. Rowling.



  <?xml version="1.0"?>
  <library>
   <shelf id="fiction">
    <book>
     <title>Of Mice and Men</title>
     <author>John Steinbeck</author>
    </book>
    <book>
     <title>Harry Potter and the Philosopher's Stone</title>
     <author>J.K. Rowling</author>
    </book>
   </shelf>
  </library>

The document itself is simple enough: you can
see the structure very clearly, and you can understand the path you
need to follow to access that information.

Now, before we get into why SimpleXML will
change your life, let’s first look at how one would parse this document
using DOM:



  <?php
  $doc = new DOMDocument();
  $doc->load('library.xml');

  $library = $doc->documentElement;
  $shelves = $library->childNodes;
  foreach ($shelves as $shelf) {
      if ($shelf instanceof DOMElement) {
          process_shelf($shelf);
      }
  }

  function process_shelf($shelf)
  {
      printf("Shelf %s\n", $shelf->getAttribute('id'));
      $books = $shelf->childNodes;
      foreach ($books as $book) {
          if ($book instanceof DOMElement) {
              process_book($book);
          }
      }
  }

  function process_book($book)
  {
      foreach ($book->childNodes as $child) {
          if (! ($child instanceof DOMElement)) {
              continue;
          }
          foreach ($child->childNodes as $element) {
              $content = trim($element->nodeValue);
              switch ($child->tagName) {
              case 'title':
                  printf("Title: %s\n", $content);
                  break;
              case 'author':
                  printf("Author: %s\n", $content);
                  break;
              }
          }
      }
  }
  ?>


As you can see, it takes 47 lines of well-crafted PHP code, with no
error checking, to manipulate and print out a list of the books within
the XML file. With error checking, comments and other things you might
add in the real world, it could easily take 70-80 lines of code to parse
this straightforward, simple XML document.

Contrast the example above with the following piece of code that
uses the SimpleXML extension to access the same document, and print out
the exact same information.

  <?php
  $library = simplexml_load_file('library.xml');
  foreach ($library->shelf as $shelf) {
      printf("Shelf %s\n", $shelf['id']);
      foreach ($shelf->book as $book) {
          printf("Title: %s\n", $book->title);
          printf("Author: %s\n", $book->author);
      }
  }
  ?>



With SimpleXML, element names are automatically mapped to properties
on an object, and this happens recursively. Attributes are mapped to
array-style accesses (for example, $shelf['id']). All of this happens
“on-demand,” using Zend Engine 2’s new object overloading features.
SimpleXML’s “low-fat” approach to XML parsing reduced the code size of
this example from 47 lines of code to a mere 10. Furthermore, the code
is considerably more readable: instead of using statements like
foreach($child->childNodes as $element) to access the element node of an
XML child, you simply reference it by name.
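
As a minimal sketch of that mapping, the snippet below loads the same library document from an inline string (an assumption made only to keep the example self-contained) and reads an attribute and a few elements. The (string) casts are there because each access yields a SimpleXMLElement object rather than a plain string.

```php
<?php
// Inline copy of the library document so the sketch is self-contained.
$xml = <<<XML
<?xml version="1.0"?>
<library>
 <shelf id="fiction">
  <book>
   <title>Of Mice and Men</title>
   <author>John Steinbeck</author>
  </book>
  <book>
   <title>Harry Potter and the Philosopher's Stone</title>
   <author>J.K. Rowling</author>
  </book>
 </shelf>
</library>
XML;

$library = simplexml_load_string($xml);

// Attributes use array syntax; the cast converts the result to a string.
$shelfId = (string) $library->shelf['id'];

// Repeated child elements can be indexed like a list...
$firstTitle = (string) $library->shelf->book[0]->title;

// ...or iterated with foreach.
$titles = [];
foreach ($library->shelf->book as $book) {
    $titles[] = (string) $book->title;
}

printf("%s: %s\n", $shelfId, implode(', ', $titles));
```

Without the casts, comparisons and array keys can behave unexpectedly, because the values are still objects.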


Advanced Simplicity

In a perfect world all XML documents, and the information you needed
to extract from them, would be as basic as the example given
above. In fact this is true in many cases:
configuration files, basic data export, and basic serialization all
require parsing capabilities no greater than the above example.
There are, however, some cases where the basic
functionality listed above simply isn’t suitable.


Namespaces

One issue that SimpleXML encountered was XML namespaces. XML
documents allow you to hide tags away into a labelled section called a
namespace. SimpleXML originally solved namespaces by
simply adding another level of indirection:



  <?xml version="1.0"?>
  <entries xmlns:blog="http://www.edwardbear.org/serendipity/">
   <blog:entry>
    <blog:name>RPROF - Regular Expression Profiler</blog:name>
   </blog:entry>
   <blog:entry>
    <blog:name>Advanced PHP Programming</blog:name>
   </blog:entry>
  </entries>

To print out the names of all the different blog entries you
could write the following code:



  <?php
  $entries = simplexml_load_file('syndic.xml');
  foreach ($entries->blog->entry as $entry) {
      printf("%s\n", $entry->name);
  }
  ?>



This approach, however, proved to be too naive; while it was fine
for parsing a particular document, it was no good at all for any type
of generalized processing. One thing to note about XML namespaces is
that the prefix (i.e. blog) is just a simple alias with no particular
relevance. The significant portion of a namespace is the URL
(http://www.edwardbear.org/serendipity/), which is what people who
parse XML documents should rely upon.

Therefore, the approach SimpleXML takes to supporting multiple
namespaces is not to add any changes to the way you access properties,
but rather to give you two methods: attributes() and children(). The
children() method returns all the children of an XML node in a given
namespace. If no namespace is passed to the children() method, all
the elements in the global namespace are returned.

The example given above is properly parsed with the following bit
of code:



  <?php
  $entries = simplexml_load_file('syndic.xml');
  foreach ($entries->children('http://www.edwardbear.org/serendipity/') as $entry) {
      printf("%s\n", $entry->name);
  }
  ?>



Note: You may also pass the namespace prefix (blog in this example) to
the children() or attributes() method, with the second argument set to
true, so that they match on the prefix instead, but this is not
recommended.
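
To illustrate why the URL is the part to rely on, here is a small sketch. The helper entry_names() and the second document, which uses the prefix b instead of blog, are hypothetical and exist only for this example. Both documents bind their prefix to the same URL, so the same code reads both.

```php
<?php
// Namespace URL taken from the example document above.
const BLOG_NS = 'http://www.edwardbear.org/serendipity/';

// Hypothetical helper: collect the <name> of every entry in the blog namespace.
function entry_names(string $xml): array
{
    $entries = simplexml_load_string($xml);
    $names = [];
    foreach ($entries->children(BLOG_NS) as $entry) {
        // Ask for the namespace again explicitly when descending.
        $names[] = (string) $entry->children(BLOG_NS)->name;
    }
    return $names;
}

// Same namespace URL, two different prefixes.
$withBlogPrefix = '<entries xmlns:blog="' . BLOG_NS . '">'
    . '<blog:entry><blog:name>Advanced PHP Programming</blog:name></blog:entry>'
    . '</entries>';

$withOtherPrefix = '<entries xmlns:b="' . BLOG_NS . '">'
    . '<b:entry><b:name>Advanced PHP Programming</b:name></b:entry>'
    . '</entries>';

print_r(entry_names($withBlogPrefix));
print_r(entry_names($withOtherPrefix));
```

Code that matched on the prefix would handle only one of these two documents, even though they are equivalent as far as XML namespaces are concerned.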


Searching, Splitting, Recursing

The other way that SimpleXML didn’t really address the needs
of people developing XML applications was that, while it provided a
nice way to algorithmically process a document, it didn’t provide any
features for performing common searches and accesses. For example, how
does one access all descendants of a given node? How can
you search a document, and find a tag and a value that both match a
given condition? There are many common operations on XML
documents that are a pain to write by hand, and desperately
need simplification.

As a solution to this problem, SimpleXML doesn’t re-invent the
wheel, but instead provides the xpath() method, which allows you to
perform W3C standard XPath queries on an XML document. A problem like
getting all descendants of a given node turns into a highly optimized
XPath query: descendant::* (or .//* in abbreviated form). While the full
scope of XPath is well outside the scope of this document, it is
recommended that anyone serious about processing XML should learn to use
the XPath language, which is as important to XML as regular expressions
are to plain text.
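
As a rough sketch of both questions raised above (all descendants of a node, and a search on a tag and value), the snippet below runs two XPath queries against the library document from the introduction. The document is inlined here only to keep the example self-contained.

```php
<?php
$library = simplexml_load_string(<<<XML
<library>
 <shelf id="fiction">
  <book><title>Of Mice and Men</title><author>John Steinbeck</author></book>
  <book><title>Harry Potter and the Philosopher's Stone</title><author>J.K. Rowling</author></book>
 </shelf>
</library>
XML
);

// All element descendants of the first shelf, relative to that node.
$descendants = $library->shelf[0]->xpath('.//*');

// Titles of books whose author element matches a given value.
$titles = $library->xpath('//book[author="John Steinbeck"]/title');

printf("%d descendants\n", count($descendants));
foreach ($titles as $title) {
    printf("Title: %s\n", $title);
}
```

xpath() returns a plain PHP array of SimpleXMLElement objects, so the results can be counted, iterated, or queried further.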


Edge Conditions

While SimpleXML is a great tool for processing XML, its simplicity
does come with a few drawbacks. Most notable among these is that
processing mixed XML and text content with SimpleXML is very hard. For
example, consider the following XML:



  <?xml version="1.0"?>
  <flaw>
    <blurb>
      This <italic>is</italic> some sample <bold>text</bold> where
      SimpleXML <underline>will</underline> not behave well.
    </blurb>
  </flaw>

Accessing $document->blurb with print_r() or var_dump() would return an
element iterator that contained the contents of italic, bold, and
underline. It would not, however, return the text surrounding those
elements. This is because when given the choice between mixed elements
and contents, SimpleXML will always choose to return the elements, and
ignore the contents, of a particular tag.

SimpleXML has two solutions to this problem built into the library.
Firstly, a method called asXML() is provided, which will take the given
node and serialize its contents, as well as the contents of all its
children, to either a file or a string. With the example above, you
would call $document->blurb->asXML() and it would return the full
contents of the blurb node in a format suitable for printing or further
processing.
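
A minimal sketch of the asXML() route follows, with the flaw document inlined so the example is self-contained. Using strip_tags() to recover plain text afterwards is just one simple option assumed here, not something SimpleXML itself prescribes.

```php
<?php
$document = simplexml_load_string(<<<XML
<flaw>
  <blurb>
    This <italic>is</italic> some sample <bold>text</bold> where
    SimpleXML <underline>will</underline> not behave well.
  </blurb>
</flaw>
XML
);

// Called with no arguments, asXML() returns the serialized node as a string,
// including the surrounding text and the nested tags.
$markup = $document->blurb->asXML();
echo $markup, "\n";

// One simple (assumed) way to get plain text back: drop tags, collapse whitespace.
$text = trim(preg_replace('/\s+/', ' ', strip_tags($markup)));
echo $text, "\n";
```

Passing a filename to asXML() writes the same serialization to that file instead of returning it.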

The second solution is to bypass SimpleXML for certain portions of
your document. One of the explicit design goals of PHP5’s
XML support was to allow all extensions to interoperate at a minimal
cost. Since LibXML2 is the lingua franca of all XML extensions, DOM and
SimpleXML objects can be exchanged with zero copies. It’s just
a different way of viewing the same underlying object! By this method, the
DOM extension can “import” SimpleXML objects and use them as DOM
objects, and vice versa. When you need to use a DOM feature you can,
and when you need SimpleXML’s ease of processing, you have that too.
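
The exchange can be sketched with dom_import_simplexml() and simplexml_import_dom(). The example reuses the flaw document inline; the lang attribute it adds is purely illustrative.

```php
<?php
$document = simplexml_load_string(<<<XML
<flaw>
  <blurb>
    This <italic>is</italic> some sample <bold>text</bold> where
    SimpleXML <underline>will</underline> not behave well.
  </blurb>
</flaw>
XML
);

// View the same <blurb> node through the DOM API.
$blurb = dom_import_simplexml($document->blurb);

// textContent concatenates all descendant text, mixed content included.
$text = trim(preg_replace('/\s+/', ' ', $blurb->textContent));
echo $text, "\n";

// A change made through DOM is visible from SimpleXML: both wrap the same node.
$blurb->setAttribute('lang', 'en');
echo $document->blurb['lang'], "\n";

// Going the other way: wrap a DOM node as a SimpleXMLElement again.
$again = simplexml_import_dom($blurb);
echo $again->bold, "\n";
```

This is a convenient way to handle the mixed-content case above: stay in SimpleXML for navigation and switch to DOM only for the node that needs it.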


Summary

PHP5’s new XML support was designed as a coherent set of APIs
to process and manipulate XML. This includes the DOM
extension, which provides all you’ll ever need for
handling XML, the SAX API for streaming XML parsing, XSLT for XML
transformations and SimpleXML when you need to do anything else.

Be fruitful and multiply!


About the Author

Sterling Hughes is a PHP core developer and the chief instigator of
the SimpleXML extension for PHP 5. His earlier contributions include
the ADT, cURL, XSLT, and Mono extensions. He works
as a freelance Web developer, creating dynamic Web applications in PHP,
C and Perl, and is also the co-author of PHP Developer’s Cookbook.

Sterling can be contacted at sterling@apache.org.

Published: March 16th, 2004 at 12:00
Categories: Tutorials

8 comments to “SimpleXML”

_____anonymous_____
April 9th, 2007 at 12:35 am

How about something like this for namespace access:

$node->{"namespace:tag"}

As this would make SimpleXml brilliant to use. Currently I end up with similar amounts of code to DOM when namespaces are involved.

It would be useful if you could write objects in the format that SimpleXML outputs, and have them output XML.

This would have several uses…

Just a thought :>)

I am trying to convert and use a document that uses namespaces (i.e. all tags have a namespace prefix).
When trying to convert this using simplexml_load_string(…) I am getting false/null as result although the string contains definitely legal XML.
I suspect one needs to fiddle with the optional namespace parameter here but the documentation is severely lacking. Could some kind soul feel tempted to explain the optional "namespace" and "is_prefix" parameters of the simplexml_load_string(…) and simplexml_load_file(…) methods? What exactly are these arguments meant for and/or what do they control? What types does one have to pass here? Could someone maybe provide an example?

Michael

I struggled with this for a while as well. Here is an example:
<?php
$soap_request_string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope" xmlns:ns1="urn:Gateway_Proxy" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:enc="http://www.w3.org/2003/05/soap-encoding">
<env:Body>
<ns1:make_proxy_payment env:encodingStyle="http://www.w3.org/2003/05/soap-encoding">
<payment_id>61ecc268-1cd0-f468</payment_id>
<payment_amount>15495</payment_amount>
<callback_query_string>&amp;payment_id=61ecc268-1cd0-f468</callback_query_string>
<transaction_note>Order from Student Library Fees with Payment Id: 61ecc268-1cd0-f468</transaction_note>
</ns1:make_proxy_payment>
</env:Body>
</env:Envelope>
XML;

// Arguments: data, libxml options, data-is-URL flag, namespace URL
$xml_element = new SimpleXMLElement($soap_request_string, 0, false, 'http://www.w3.org/2003/05/soap-envelope');
$name_spaces = $xml_element->getNamespaces(true);
//print_r($name_spaces);

foreach ($xml_element->children($name_spaces['env']) as $body)
{
    //printf("%s<br />", $body->getName());

    foreach ($body->children($name_spaces['ns1']) as $function)
    {
        printf("%s<br />", $function->getName());

        // The parameters carry no namespace, so children() needs no argument
        foreach ($function->children() as $parameters)
        {
            printf("%s => \"%s\"<br />", $parameters->getName(), $parameters);
        }
    }
}
?>

Hi! If I define an element with a different namespace, I’d expect that its attributes have the same namespace by default.
Example: Let’s say that default document’s namespace is Z.
In this xml:

<document xmlns:blog="http://www.foobar.com/blog">
<blog:entry author="John">Hello world</blog:entry>
</document>

I’d expect that attribute "author" has the namespace "http://www.foobar.com/blog". However, it has the document’s default Z namespace. I need to change this document to:

<document xmlns:blog="http://www.foobar.com/blog">
<blog:entry blog:author="John">Hello world</blog:entry>
</document>

But it doesn’t seem clear to me. I think that default attribute’s namespace should be the one from the tag, and not for the whole document.

Regards,


Andres P. Ferrando
http://www.pruna.com.ar

_____anonymous_____
May 2nd, 2008 at 5:08 am

Excellent article, thank you!

xpath("//children()") did not work for me.

Instead, this worked fine:

foreach ($xml->row as $row) {
    foreach ($row->xpath("*") as $node) {
        echo($node->getName());
        echo ": ";
        echo($node);
        echo "\n";
    }
}

Hi all,
I am posting XML to a URL and getting XML returned, but when I check the page source I see the following:
[code]
<?xml version="1.0" encoding="utf-8"?>
<string xmlns="http://tempuri.org/"><?xml version="1.0" encoding="utf-16"?>
<UpdateTicketAction xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<MemberID>zadmin</MemberID>
<CompanyName>company</CompanyName>
<CultureInfoName>en-US</CultureInfoName>
<IntegrationLoginId>intid</IntegrationLoginId>

<IntegrationPassword>intpass</IntegrationPassword>
<Ticket>
<ManagedId />
<CompanyId>company</CompanyId>
<Summary>testing15</Summary>
<SiteName>thewebnsoft</SiteName>
<Status>N</Status>

<AddressLine1>shyam nagar</AddressLine1>
<AddressLine2>vivek vihar</AddressLine2>
<City>jaipur</City>
<StateId>raj</StateId>
<Zip>320440</Zip>
<Resolution />

<ProblemDescription>test test test....</ProblemDescription>
<SendingSrServiceRecid>0</SendingSrServiceRecid>
<DateReq>2009-07-15T13:53:03</DateReq>
<SubBillingMethodId>None</SubBillingMethodId>
<SubBillingAmount>0</SubBillingAmount>
<SubDateAccepted>0001-01-01T00:00:00</SubDateAccepted>

<Id>24451</Id>
</Ticket>
<ManagedId />
<CompanyId>DTi</CompanyId>
<SrServiceRecid>24451</SrServiceRecid>
</UpdateTicketAction></string>
[/code]

I need to fetch the Ticket ID, 24451. Can someone guide me on how to do this using SimpleXML?