Intended Audience
Overview
Learning Objectives
Definitions
Pre-Requisites
How it Works
- A Brief Introduction to XML
- Getting started with XSLT
- Adding PHP to the mix
- Handling the processed output
- Making XSL work for its money
- What else can XSL do?
- Conclusion
Resources
About the Author

Intended Audience

This tutorial is intended for PHP programmers who use XML, or intend to do so, and who wish to manipulate their XML data quickly and easily.

Although this tutorial includes a short introduction to XML, readers new to the subject may find it more useful if they first read the section on XML parsing in the PHP manual.

Overview

With HTML, developers knew where they stood: design, content, and styling were all in one place. However, the new (and superior) trend is towards keeping various parts of data separate: XML stores content, CSS stores styling, and XHTML stores layout.

Using the new system, with content and layout clearly split, it is suddenly much easier to manipulate content without affecting layout. This is where XSLT comes in: it provides a way to process and output the data stored in an XML document based upon your processing instructions.

Learning Objectives

The purpose of this tutorial is to give you a primer in using PHP's XSLT extension (based on the Sablotron library) to apply XSL stylesheets to XML.
No prior knowledge of XSL is assumed, although prior knowledge of XML is useful.
In this tutorial you will learn how to:
  • Create XSL stylesheets
  • Apply these stylesheets to your XML using PHP
  • Manipulate XSLT processing for maximum flexibility
  • Use the following functions: xslt_create(), xslt_free(), and xslt_process()

Definitions

XML: eXtensible Markup Language. XML is a language for defining your own markup languages, similar to HTML.

XSL: eXtensible Stylesheet Language. XSL is a wide-ranging, XML-based technology designed to process XML documents.

XSLT: XSL Transformations. This is the specific part of XSL that I shall deal with in this tutorial. It lets you transform an XML document into another form, and includes flow control statements.

XHTML: eXtensible HyperText Markup Language. This is an XML/HTML hybrid that retains most of HTML's flexibility whilst forcing it to fit the XML syntax rules.

Pre-Requisites

To install the XML and XSLT extensions, you first need to have the Expat and Sablotron libraries installed somewhere your compiler can find them. If you are compiling PHP as a module for a fairly recent release of Apache, PHP will automatically use the bundled Expat library from Apache. You can get the latest version of the Sablotron library by visiting www.gingerall.com.

Once you have the necessary prerequisites, go ahead and reconfigure PHP using these extra configuration options:
--with-xml --enable-xslt --with-xslt-sablot

Note that the XSLT functions were overhauled in PHP 4.1. I strongly recommend you upgrade to at least v4.3 before trying any of the code examples in this article; with older versions of PHP, your mileage may vary.

If you are using Windows, you should find the relevant extension DLLs already compiled in the Windows distribution of PHP. Depending on your version, some of the necessary files may be in the experimental directory.
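Before trying the examples, you may want to confirm what your PHP build actually provides. The following is a minimal sketch; the helper name xslt_prerequisites() is hypothetical, and it only checks that the functions this tutorial relies on exist, not which library versions were linked in.

```php
<?php
// Hypothetical helper: reports whether this PHP build looks ready for the
// examples in this tutorial, by checking for the functions used later.
function xslt_prerequisites()
{
    return array(
        // version_compare() is available from PHP 4.1.0 onwards
        'PHP 4.3 or later' => version_compare(PHP_VERSION, '4.3.0', '>='),
        // The XML extension (built on Expat) provides xml_parser_create()
        'XML extension'    => function_exists('xml_parser_create'),
        // The Sablotron-based XSLT extension provides xslt_create()
        'XSLT extension'   => function_exists('xslt_create'),
    );
}

foreach (xslt_prerequisites() as $check => $ok) {
    echo $check, ': ', ($ok ? 'yes' : 'no'), "\n";
}
```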

How it Works

A Brief Introduction to XML

Extensible Markup Language is designed to allow you to define your own markup languages. To most people, it just looks like customisable HTML, but it's more the other way around: you can define HTML (and other such languages) using XML, and indeed the W3C has designed XHTML, a fully XML-compliant version of HTML.

Although some XML can look just like HTML, it's a little more complicated behind the scenes. Most web browsers accept messy HTML and try to make sense of it. For example, you can have three <BODY> tags, unheard-of attributes, table cells that never end, and the like, and web browsers use a little artificial intelligence to try to guess what was actually meant. XML, however, is designed more strictly; it has a simple set of syntax rules, and you can't get around them. As a result, parsing XML is faster, easier, and more efficient than parsing HTML, and it is also easier for humans to read, which is always good!

XML can be said to be either valid and well-formed, well-formed only, or neither. Valid XML is by definition also well-formed, but well-formed XML is not necessarily valid. A document is valid if it matches a Document Type Definition (DTD), a document that defines how the XML document should look grammatically. "Well-formed" means that a document fits the basic XML syntax rules.

If the given XML matches its DTD, then it is valid, and because it matches a DTD, it must therefore also be syntactically correct. This doesn't work the other way around. A document can be perfectly acceptable XML and yet not match its DTD.

Here is a quick list of the key syntax rules in XML:
  • All XML documents must have a root tag. This is the element which contains all others. In HTML, this was the <HTML> element.
  • Elements are case sensitive. <title> must have a matching </title> - </TiTlE> will not work, and neither will any other case combination.
  • Elements require a closing tag. So, <element> must be closed with </element> somewhere in the XML. <element /> is also valid, and is used as a shorthand for an element that has no content at all.
  • Elements must be properly nested. <parent><child>foo</child></parent> is acceptable, whereas <parent><child>foo</parent></child> is not (because you always need to close the most recently opened element first).
  • Attributes must always have quotes. You need to use <body bgcolor="red"> as opposed to <body bgcolor=red>.
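If you want to see these rules enforced, PHP's XML extension (which wraps the Expat library mentioned in the pre-requisites) refuses to parse a document that breaks them. Below is a minimal sketch; is_well_formed() is a hypothetical helper, and because Expat is a non-validating parser it checks well-formedness only, not validity against a DTD.

```php
<?php
// Hypothetical helper: returns true if $xml is well-formed, false otherwise.
// On failure, $message receives Expat's error text and the line number.
// Expat does not validate against a DTD, so this checks well-formedness only.
function is_well_formed($xml, &$message)
{
    $parser = xml_parser_create();
    // true marks this as the final chunk of input, so elements still open
    // at the end of the string are reported as errors too
    $ok = xml_parse($parser, $xml, true);
    if (!$ok) {
        $message = sprintf('%s (line %d)',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser));
    }
    // The parser is freed automatically when the script ends
    return (bool) $ok;
}

// One sample per rule from the list above
$samples = array(
    '<parent><child>foo</child></parent>',  // properly nested
    '<parent><child>foo</parent></child>',  // improperly nested
    '<title>foo</TiTlE>',                   // closing tag in a different case
    '<body bgcolor=red></body>',            // attribute value without quotes
    '<a/><b/>',                             // two top-level elements, no root
);

foreach ($samples as $xml) {
    $message = '';
    $result = is_well_formed($xml, $message) ? 'well-formed' : $message;
    echo htmlspecialchars($xml), ': ', htmlspecialchars($result), "<br />\n";
}
```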
There have been previous XML tutorials published on this site. If you need more information, I recommend you read through them before you continue.

To make sure we're on the same wavelength, here is an example XML document that I'll be using to explain how XSL works:

<?xml version="1.0" encoding="ISO-8859-1"?>
<channel>
<item type="lie">
<title>Microsoft gives up on Windows</title>
<url>http://www.nothere.com/foo/bar</url>
</item>

<item type="lie">
<title>George Bush finds Iraq on map</title>
<url>http://www.somesite.com/news/4544.html</url>
</item>

<item type="lie">
<title>Man sells fridge to Eskimo</title>
<url>http://www.eskimostuff.com/blah/wombat.php</url>
</item>
</channel>

Save that file as input.xml somewhere PHP can access it; we'll come back to it later. For now, though, I am assuming you know enough XML to understand how it interacts with XSL.

Getting started with XSLT

XSLT is an XML-based language that allows you to manipulate XML documents before outputting them, and PHP implements XSLT processing through the use of the Sablotron library.

When I say that XSLT manipulates XML, I mean that it transforms it (hence the T in XSLT) into another form. With one XML document, you can make the same content look vastly different: for example, you could transform it with a WML stylesheet and send it to WAP devices, or transform it with an SQL-generating stylesheet and send it to a database.

Several browsers (most notably Microsoft Internet Explorer) can perform XSL transformations on the client side. The browser downloads an XML document, the XSL stylesheet, and any accompanying CSS files, then combines them all together on your visitor's computer. But what happens when someone with an old version of IE, or any other browser without XSL support, visits your site? They won't see what you wanted them to see, that's for sure!

This is where PHP comes in. Your visitor enters a URL as per usual, and gets to a PHP page on which you have XML/XSL interaction going on. PHP loads the XML and the XSL, combines the two together into the output, and sends that output to the user (often in XHTML format). On the client-side, users see nothing special, no XML or XSL at all, just normal XHTML. Of course, there's nothing stopping that PHP page from analyzing the visitor's user agent and sending content fit for that browser, whether it be HTML 2, XHTML, WAP, or anything else.
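As an illustration of that last point, here is a minimal sketch of choosing a stylesheet from the User-Agent header. The function name choose_stylesheet(), the stylesheet file names (wml.xsl, xhtml.xsl), and the substrings tested for are all hypothetical; real user-agent detection needs a far more thorough list than this.

```php
<?php
// Hypothetical helper: picks a stylesheet based on the browser's User-Agent
// string. The file names and substrings below are illustrative only.
function choose_stylesheet($user_agent)
{
    $ua = strtolower($user_agent);

    // Some WAP browsers include one of these substrings (not a complete list)
    if (strpos($ua, 'wap') !== false || strpos($ua, 'up.browser') !== false) {
        return 'wml.xsl';
    }

    // Everything else gets the XHTML stylesheet
    return 'xhtml.xsl';
}

// The header may be missing entirely, so fall back to an empty string
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$stylesheet = choose_stylesheet($ua);
```

The chosen file name would then be passed to the XSLT processor in place of a fixed stylesheet.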

Often, XSLT makes more sense once you see it in action, so here's an example XSL document designed to work on the example XML document given previously. Save it in the same directory, as input.xsl:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://my.netscape.com/rdf/simple/0.9/">

<xsl:output method="html" indent="no" encoding="utf-8"/>

<xsl:template match="/">
<html>
<head>
<title>XSLT</title>
</head>
<body>

<xsl:for-each select="/channel/item">
News Item: <xsl:value-of select="title"/><BR/>
</xsl:for-each>

</body>
</html>
</xsl:template>
</xsl:stylesheet>

Let's break that down line by line...

Line 1: <?xml version="1.0" encoding="utf-8" ?>: You should all be used to this; it's just the XML declaration. Remember, XSL is an XML language, so a stylesheet starts with one like any other XML document.

Line 2: <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://my.netscape.com/rdf/simple/0.9/">: This declares the namespaces you'll be using in this stylesheet. If you are unfamiliar with namespaces, think of them as vocabulary qualifiers. For example, “<jackrussell>” could be a dog or a person's name, whereas “<dog:jackrussell>” unambiguously refers to the breed of dog. Line 2 in the example XSL binds each prefix to a namespace URI; the xsl prefix in particular must be bound to http://www.w3.org/1999/XSL/Transform for the processor to recognise the XSL elements. The URIs act only as unique identifiers and need not point to anything online.

Line 3: <xsl:output method="html" indent="no" encoding="utf-8"/>: lets you define various output attributes. Here, for example, we are saying that our target is HTML, amongst other things.

Line 4: <xsl:template match="/">: This is a key piece of markup in XSL. XSL uses the template's "match" attribute to pattern-match parts of the XML document. If it finds a match, the contents of the template are processed. The pattern syntax, based on the W3C's XPath language, is designed specifically for matching markup. In the example, we match "/", which equates to the root of the document (the node that contains the root element, <channel>). Every document has a root, so this template is always matched once, and its contents (<html>, <head>, and so on) are processed.

Lines 5-9: <html>...<body>: As these aren't XSL elements, Sablotron treats them as literal output, and no processing is performed here.

Line 10: <xsl:for-each select="/channel/item">: xsl:for-each is basically an array iterator, like the PHP construct foreach. Here, the "array" is whatever XML elements match the for-each's "select" attribute. The select attribute uses the same path syntax as the match attribute of xsl:template; /channel/item selects every item element that is a child of the root channel element. The for-each iterates through every matching element (there are three in the example) and processes its contents once for each one.

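Line 11: News Item: <xsl:value-of select="title"/><BR/>: xsl:value-of outputs the text content of the first node selected by its "select" attribute. Inside the for-each, paths are relative to the current item, so "title" refers to that item's title child. The literal text "News Item: " and the <BR/> tag are copied to the output as-is.

Lines 12 to end: These are closing tags. They are required because the stylesheet must itself be well-formed XML. Remember, you must always close tags you open!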
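If the for-each still feels abstract, it may help to compare it with a rough PHP equivalent. The sketch below uses the XML extension's xml_parse_into_struct() to print the same "News Item:" lines from input.xml. The function name item_titles() is hypothetical, and unlike the XSL version it simply collects every title element in the document, which only matches /channel/item/title here because that is where all the titles in input.xml live.

```php
<?php
// Hypothetical helper: returns the text of every <title> element in $xml.
// A rough PHP counterpart to <xsl:for-each select="/channel/item"> followed
// by <xsl:value-of select="title"/>, for documents shaped like input.xml.
function item_titles($xml)
{
    $parser = xml_parser_create();
    // Flattens the document into $values; $index maps each tag name to the
    // positions in $values where it occurs. Tag names are upper-cased
    // because case folding is on by default.
    xml_parse_into_struct($parser, $xml, $values, $index);

    $titles = array();
    if (isset($index['TITLE'])) {
        foreach ($index['TITLE'] as $pos) {
            // An empty <title/> has no 'value' entry
            $titles[] = isset($values[$pos]['value']) ? $values[$pos]['value'] : '';
        }
    }
    return $titles;
}

// Adjust the path to wherever you saved input.xml
$xml = file_get_contents('/path/to/input.xml');
if ($xml !== false) {
    foreach (item_titles($xml) as $title) {
        echo 'News Item: ', htmlspecialchars($title), "<BR/>\n";
    }
}
```

The XSL version is shorter and, because its paths are anchored at /channel/item, more precise; the PHP version has to know about the parser's flattened array layout.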

Adding PHP to the mix

We now have a file of well-formed XML and a file of well-formed XSL, but they remain separate. In order to combine the two together to make a final page, you need to mix in the magic ingredient: PHP.

As there are only thirteen XSLT functions in PHP at the time of writing, the learning curve is fairly smooth. The three key functions to learn are xslt_create(), xslt_free(), and xslt_process().

xslt_create() and xslt_free() are used together to create and destroy XSLT processors. As all XSLT processing in PHP is performed by a processor, you will need to call xslt_create() at least once in order to get started. Note that it returns the XSLT processor resource you should use for manipulation in other XSLT functions. For example, xslt_free() takes an XSLT resource as its only parameter, and frees up the memory associated with the provided processor.

This just leaves xslt_process(), which is the core function in PHP XSLT processing. This function, which can take a maximum of six parameters in total, is where we combine XML and XSLT. The result is the transformed document, in whatever format your stylesheet produces.

The first three parameters for xslt_process() are the XSLT processor resource to use, the location of the XML file, and the location of the XSL file. These are the only parameters that are required in order to perform the transformation; all the others are optional, and can usually be ignored.

As you know how to create an XSLT processor, and you already have input.xml and input.xsl, your xslt_process() function call is very straightforward:

xslt_process($xsltproc, '/path/to/input.xml', '/path/to/input.xsl');

In order to create and destroy the $xsltproc XSLT processor, you need the following line above xslt_process():

$xsltproc = xslt_create();

and the following line below xslt_process():

xslt_free($xsltproc);

The complete script, then, is just three lines long: we create a processor, merge the XML and XSL together, then free the processor.

Handling the processed output

There are two ways to deal with the output of xslt_process():

One way is to use the optional fourth parameter that allows you to specify the name of a file you wish to save the processed content to.

If you do not provide this parameter, or it is NULL, then xslt_process() returns the processed content via its return value.

The option you choose obviously depends on your plans for the output. Usually you will want to omit the fourth parameter (or supply NULL) in order to get the content back as the function's return value. You can then capture the return value in a variable, or pass it straight to an echo statement. Using our previous example, here is what that would look like. Save it as xslt_test.php:

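<?php
// Create an XSLT processor
$xsltproc = xslt_create();

// Transform input.xml with input.xsl; with no fourth parameter, the result
// is returned as a string (or FALSE if the transformation fails)
$xslt_result = xslt_process($xsltproc, '/path/to/input.xml', '/path/to/input.xsl');
if ($xslt_result === false) {
    die('XSLT error: ' . xslt_error($xsltproc));
}

// Free the memory associated with the processor
xslt_free($xsltproc);
?>
...........
<?php
echo $xslt_result;
?>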

The difference here is that we capture the return value of xslt_process() in $xslt_result, which we then echo out later in the script. While it's just as good in this situation to replace "$xslt_result = " with just "echo", you may wish to replace the dots between the xslt_process() call and the echoing out of the results with more PHP code. (Note that there is no need to output an <HTML> tag from PHP, because the stylesheet's root template already outputs the <html>, <head>, and <body> elements.)

Go ahead and run the xslt_test.php script. If everything has gone as planned, you should see your XML content formatted as a list of news items. Also, take a look at the source of the page and note that it contains only plain HTML, with no trace of the original XML or XSL.

Making XSL work for its money

Now that we can control how our output looks, based upon the XSL stylesheet it is processed with, let's try something a little harder that will better demonstrate the flexibility of the XML/XSLT combination.

As mentioned previously, XSLT can transform your output into pretty much any format you want, including plain-text languages like SQL. If the xslt_test.php script is modified to check whether a certain HTTP GET variable is set, we can change the format of the output merely by changing the URL.

This change needs to be implemented in two steps. Firstly, you need a new XSL stylesheet to transform your content into SQL. Naturally this will look a lot like the previous stylesheet, because the logic is basically the same: loop through all /channel/item elements, and output data about each one.

Here is the XSL stylesheet necessary to transform our example XML into SQL. Save it as "sql.xsl" in the same directory as the previous files.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://my.netscape.com/rdf/simple/0.9/"
>
<xsl:output
method="html"
indent="no"
encoding="utf-8"
/>

<xsl:template match="/">
<xsl:for-each select="/channel/item">
INSERT INTO News (Title, Link) VALUES ('<xsl:value-of select="title"/>', '<xsl:value-of select="url"/>')<BR/>
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>

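The key difference is that we no longer output any HTML, apart from a <BR/> after each statement so that the queries appear on separate lines in a browser. Our output target is different now, and HTML would not work in a MySQL query. Also, this time the stylesheet prints out the "url" value of the XML, along with the "title" value, nested inside an SQL query. Note that the values are inserted as-is, so a title containing an apostrophe would produce an invalid query.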

In order to facilitate the output selection, you also need to modify the xslt_test.php script so that it chooses its XSL stylesheet based on whether a particular GET variable appears in the URL (for example, xslt_test.php?USESQL=1). There are various ways to do this, but below I've included an example to get you started:

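<?php

// Create an XSLT processor
$xsltproc = xslt_create();

// Use sql.xsl if the URL includes ?USESQL=..., otherwise the HTML stylesheet
// from earlier ($_GET works whether or not register_globals is enabled)
if (isset($_GET['USESQL'])) {
    $xslinput = '/path/to/sql.xsl';
} else {
    $xslinput = '/path/to/input.xsl';
}

// Transform the same XML with whichever stylesheet was chosen
$hResult = xslt_process($xsltproc, '/path/to/input.xml', $xslinput);
print $hResult;

// Free the processor
xslt_free($xsltproc);

?>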
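The if/else above works well for two formats. If you expect to add more, a lookup table may scale better, and it guarantees that only stylesheets you have listed can ever be loaded, whatever a visitor puts in the URL. The sketch below assumes a hypothetical "format" query parameter (for example, xslt_test.php?format=sql) in place of USESQL, and a hypothetical helper named stylesheet_for(); the stylesheet names are the ones used in this tutorial.

```php
<?php
// Hypothetical variant: map the value of ?format=... to a stylesheet.
// Only keys present in this array can ever be selected.
function stylesheet_for($format)
{
    $stylesheets = array(
        'html' => '/path/to/input.xsl',
        'sql'  => '/path/to/sql.xsl',
    );

    // Unknown, missing, or non-string formats fall back to HTML
    if (is_string($format) && isset($stylesheets[$format])) {
        return $stylesheets[$format];
    }
    return $stylesheets['html'];
}

$format = isset($_GET['format']) ? $_GET['format'] : 'html';
$xslinput = stylesheet_for($format);
// $xslinput can now be passed to xslt_process() as before
```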

What else can XSL do?

You'd be surprised at the power of XSL. What you have seen here really is only the beginning. For example, many sites make their news content available in XML format so that other sites can syndicate their news. The xsl:stylesheet tags in the stylesheets above already declare the RDF and RSS 0.9 namespaces used by many syndicating news sites. To match those elements in your patterns, bind the RSS namespace to a prefix as well (for example, xmlns:rss="http://my.netscape.com/rdf/simple/0.9/", then select="/rdf:RDF/rss:item"), because XSLT 1.0 patterns ignore the default namespace. Experiment!

There is also a variety of other XSL elements for you to make use of, such as the conditional statements xsl:if and xsl:choose. The W3C has a very large collection of freely available information online about XML and XSLT, which is a rich mine of ideas; be sure to check it out!

Conclusion

XSLT is a very powerful technology that is only recently finding a niche for itself. As with many new technologies, its biggest limitation is acceptance: many browsers just do not support XSLT processing, and probably will not support it within the foreseeable future.

Congratulate yourself once again for having chosen to use PHP, as the PHP support for server-side parsing is complete and very powerful. The end result of switching from client-side XSLT parsing to server-side XSLT parsing is the same as switching to PHP for form validation instead of using JavaScript: you're guaranteed 100% compatibility with all clients, as well as giving yourself the ability to add in extra functionality that would otherwise have been problematic.

Even the most basic XML implementation can provide instant return on investment, simply by making your content available in a pure format. Furthermore, you can add more powerful XSLT processing at a later date.
Perhaps the most popular use for server-side XSLT processing is serving a particular content type (for example, WML for WAP phones) based upon the user agent received.

Resources

http://www.php.net/manual/en/ref.xml.php
http://www.php.net/manual/en/ref.xslt.php
http://www.phpbuilder.com/columns/justin20001025.php3
http://www.phpbeginner.com/columns/ray/xml

About the Author

Paul Hudson is a full-time IT journalist specialising in web technologies. You can read more of his work every month in Future Publishing's Linux Format magazine, or you can contact him directly at hudzilla@php.net.