Using Sablotron to process XSLT

Intended Audience

Overview

Learning Objectives

Definitions

Pre-Requisites

How it Works

- A Brief Introduction to XML

- Getting started with XSLT

- Adding PHP to the mix

- Handling the processed output

- Making XSL work for its money

- What else can XSL do?

- Conclusion

Resources

About the Author



Intended Audience


This tutorial is intended for PHP programmers who use XML, or intend to do so,
and who wish to manipulate their XML data quickly and easily.



Although this tutorial includes a short introduction to XML, readers new to the
subject may find it more useful if they first read the section on XML parsing
in the PHP manual.

Overview

With HTML, developers knew where they stood: design, content, and styling were
all in one place. However, the new (and superior) trend is towards keeping various
parts of data separate: XML stores content, CSS stores styling, and XHTML stores
layout.



Using the new system, with content and layout clearly split, it is suddenly much
easier to manipulate content without affecting layout. This is where XSLT comes
in: it provides a way to process and output the data stored in an XML document
based upon your processing instructions.

Learning Objectives


The purpose of this tutorial is to give you a primer in using PHP’s XSLT extension
(based on the Sablotron library) to process XSL.

No prior knowledge of XSL is assumed,
although prior knowledge of XML is useful.

In this tutorial you will learn how to:

  • Create XSL stylesheets
  • Apply these stylesheets to your XML using PHP
  • Manipulate XSLT processing for maximum flexibility

You will also be able to apply the following functions:

xslt_create(), xslt_free(), xslt_process()

Definitions


XML: eXtensible Markup Language. XML is a language designed to let you create
your own markup languages, similar in appearance to HTML.



XSL: eXtensible Stylesheet Language. XSL is a wide-ranging, XML-based
technology designed to process XML documents.


XSLT: XSL Transformations. This is the specific part of XSL that I shall deal
with in this tutorial. It lets you rewrite XML into another form, and includes
flow control statements.



XHTML: eXtensible HyperText Markup Language. This is an XML/HTML hybrid that retains
most of HTML’s flexibility whilst forcing it to fit the XML syntax rules.

Pre-Requisites


To install the XML and XSLT extensions, you first need to have the Expat and Sablotron
libraries installed somewhere your compiler can find them. If you are compiling
PHP as a module for a fairly recent release of Apache, PHP will automatically
use the bundled Expat library from Apache. You can get the latest version of the
Sablotron library by visiting www.gingerall.com.



Once you have the necessary prerequisites, go ahead and reconfigure PHP using
these extra configuration options:

--with-xml --enable-xslt --with-xslt-sablot




Note that the XSLT functions were overhauled in PHP 4.1. I strongly recommend
you upgrade to at least v4.3 before trying any of the code examples in this
article; they may work on older versions of PHP, but your mileage may vary.



If you are using Windows, you should find the relevant DLLs already compiled in
the PHP distribution. Depending on your version, some of the necessary files
may be in the experimental directory.

How it Works


A Brief Introduction to XML


Extensible Markup Language (XML) is designed to allow you to define your own markup
languages. To most people, it just looks like customisable HTML, but it’s more the
other way around: you can define HTML (and other such languages) using XML, and
indeed XHTML (a fully XML-compliant version of HTML) has been designed by the W3C.



Although some XML can look just like HTML, it’s a little more complicated behind
the scenes. Most web browsers accept messy HTML and try to make sense of it. For
example, you can have three <BODY> tags, unheard-of attributes, table cells that
never end, and the like, and the web browsers use a little guesswork to work out
what was actually meant. XML, however, is designed more strictly; it has a
simple set of syntax rules, and you can’t get around them. As a result, parsing
XML is faster, easier, and more efficient than parsing HTML, and it is also
easier for humans to read, which is always good!



An XML document can be valid (and therefore also well-formed), merely
well-formed, or neither. Valid XML is by definition well-formed, but well-formed
XML is not necessarily valid. “Valid” means that the XML document matches a
Document Type Definition (DTD), a document that defines how the XML document
should look grammatically. “Well-formed” means that a document fits the basic
XML syntax rules.



If the given XML matches its DTD, then it
is valid, and because it matches a DTD, it must therefore also be syntactically
correct. This doesn’t work the other way around. A document can be perfectly
acceptable XML and yet not match its DTD.



Here is a quick list of the key syntax rules in XML:


  • All XML documents must have a root tag. This is the element which
    contains all others. In HTML, this was the <HTML> element.
  • Elements are case sensitive. <title> must have a matching </title>;
    </TiTlE> will not work, and neither will any other case combination.
  • Elements require a closing tag. So, <element> must be closed with
    </element> somewhere in the XML. <element /> is also valid, and is used
    as a shorthand if your element is not going to contain any content.
  • Elements must be properly nested.
    <parent><child>foo</child></parent> is acceptable, whereas
    <parent><child>foo</parent></child> is not (because you always need to
    close the most recently opened element first).
  • Attributes must always have quotes. You need to use
    <body bgcolor="red"> as opposed to <body bgcolor=red>.
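
The rules above are easy to check mechanically. As a minimal sketch, assuming
PHP’s expat-based XML extension (the --with-xml option mentioned earlier) is
loaded, the following feeds a string to a parser and reports whether it is
well-formed. The function name is_well_formed() is made up for this
illustration, and no DTD is consulted, so it says nothing about validity.

```php
<?php
// Returns true when $xml obeys the basic XML syntax rules.
// is_well_formed() is a hypothetical helper, not part of PHP.
function is_well_formed($xml)
{
    $parser = xml_parser_create();

    // The third argument tells the parser this is the final (here, only)
    // chunk of data. xml_parse() returns a true value on success and a
    // false value at the first syntax error.
    $ok = xml_parse($parser, $xml, true);

    return (bool) $ok;
}

// Properly nested: reported as well-formed.
var_dump(is_well_formed('<parent><child>foo</child></parent>'));

// Closed in the wrong order: reported as not well-formed.
var_dump(is_well_formed('<parent><child>foo</parent></child>'));
```

Running a check like this over input.xml before handing it to the XSLT
processor can make syntax mistakes easier to track down.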

There have been previous XML tutorials published on this site. If you need more
information, I recommend you read through them before you continue.



To make sure we’re on the same wavelength, here is an example XML document that
I’ll be using to explain how XSL works:


<?xml version="1.0" encoding="ISO-8859-1"?>
<channel>
  <item type="lie">
    <title>Microsoft gives up on Windows</title>
    <url>http://www.nothere.com/foo/bar</url>
  </item>
  <item type="lie">
    <title>George Bush finds Iraq on map</title>
    <url>http://www.somesite.com/news/4544.html</url>
  </item>
  <item type="lie">
    <title>Man sells fridge to Eskimo</title>
    <url>http://www.eskimostuff.com/blah/wombat.php</url>
  </item>
</channel>

Save that file as input.xml somewhere PHP can access it; we’ll come back to it
later. For now, though, I am assuming you know enough XML to understand how it
interacts with XSL.

Getting started with XSLT


XSLT is an XML-based language that allows you to manipulate XML documents before
outputting them, and PHP implements XSLT processing through the use of the Sablotron
library.



When I say that XSLT manipulates XML, it
actually transforms it (hence the T in XSLT) into another form. With one XML
document, you can make the same content look vastly different – for example, you
could parse it with a WML XSL stylesheet and send it to WAP devices, or parse it
with an SQL XSL stylesheet and send it to a database.



Several browsers (most notably Microsoft Internet Explorer) can perform XSL
transformation on the client-side. The browser downloads an XML document, the
XSL stylesheet, and any accompanying CSS files, then combines them all together
on your visitor’s computer. But what happens when someone with an old version of
IE or any other non-XSL-enabled browser visits your site? They wouldn’t see what
you wanted them to see, that’s for sure!



This is where PHP comes in. Your visitor
enters a URL as per usual, and gets to a PHP page on which you have XML/XSL
interaction going on. PHP loads the XML and the XSL, combines the two together
into the output, and sends that output to the user (often in XHTML format). On
the client-side, users see nothing special, no XML or XSL at all, just normal
XHTML. Of course, there’s nothing stopping that PHP page from analyzing the
visitor’s user agent and sending content fit for that browser, whether it be
HTML 2, XHTML, WAP, or anything else.



Often, XSLT makes more sense once you see
it in action, so here’s an example XSL document designed to work on the example
XML document given previously. Save it in the same directory, as
input.xsl:


<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://my.netscape.com/rdf/simple/0.9/">
<xsl:output method="html" indent="no" encoding="utf-8"/>
<xsl:template match="/">
<html>
<head>
<title>XSLT</title>
</head>
<body>
<xsl:for-each select="/channel/item">
News Item: <xsl:value-of select="title"/><br/>
</xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

Let’s break that down line by line…



Line 1: <?xml version="1.0" encoding="utf-8" ?>
You should all be used to this; it’s just the XML declaration. Remember, XSL is
an XML language, so it requires this.




Line 2: <xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://my.netscape.com/rdf/simple/0.9/">
This defines what namespaces you’ll be using in this stylesheet. If you are
unfamiliar with namespaces, think of them as vocabulary qualifiers. For
example, “<jackrussell>” could be a dog or a person’s name, whereas
“<dog:jackrussell>” is defined as a specific type of jackrussell. Line 2 in the
example XSL declares the namespaces in use, along with the URI that identifies
each one.



Line 3: <xsl:output method="html" indent="no" encoding="utf-8"/>
This lets you define various output attributes. Here, for example, we are
saying that our target is HTML, amongst other things.



Line 4: <xsl:template match="/">
This is a key piece of markup in XSL. XSL uses the “match” attribute to pattern
match XML elements. If it finds a matching XML element, the contents of the
template are processed. The pattern matching syntax isn’t anything you’ll find
elsewhere because it is very specific to matching markup. In the example, we
match “/”, which equates to the root of the document. As the example XML has a
root element (“<channel>”), the template is matched, and its contents (<html>,
etc) are processed.



Lines 5-9: <html>...<body>
As these aren’t XSL elements, Sablotron treats them as output and copies them
through without any further processing.



Line 10: <xsl:for-each select="/channel/item">
xsl:for-each is basically an array iterator, like the PHP construct foreach.
Here, the array is whatever XML elements are found by pattern matching against
the “select” attribute of the for-each. The pattern matching here is the same
as used in xsl:template, although this time you are matching against <item>
elements that are children of <channel> elements. The for-each iterates through
every element matching the criteria (there are three in the example) and
processes the contents of the for-each once for every matching element.




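Line 11: xsl:value-of outputs the text of the <title> child of the current
<item>, so each pass through the loop prints one headline followed by a line
break.



Line 12 to end: These are closing tags, which keep both the stylesheet and its
output well-formed. Remember, you must always close tags you open!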

Adding PHP to the mix


We now have a file of well-formed XML and a file of well-formed XSL, but they
remain separate. In order to combine the two together to make a final page, you
need to mix in the magic ingredient: PHP.



As there are only thirteen XSLT functions
in PHP at the time of writing, the learning curve is fairly smooth. The three
key functions to learn are xslt_create(), xslt_free(), and
xslt_process().



xslt_create() and xslt_free() are used
together to create and destroy XSLT processors. As all XSLT processing in PHP is
performed by a processor, you will need to call xslt_create() at least once in
order to get started. Note that it returns the XSLT processor resource you
should use for manipulation in other XSLT functions. For example, xslt_free()
takes an XSLT resource as its only parameter, and frees up the memory associated
with the provided processor.



This just leaves xslt_process(), which is
the core function in PHP XSLT parsing. This function, which can take a maximum
of six parameters in total, is where we combine XML and XSLT. The result is
transformed XML, the great thing being that it has been transformed into the
format you want.



The first three parameters for
xslt_process() are the XSLT processor resource to use, the location of the XML
file, and the location of the XSL file. These are the only parameters that are
required in order to perform the transformation; all the others are optional,
and can usually be ignored.



As you know how to create an XSLT
processor, and you already have input.xml and input.xsl, your xslt_process()
function call is very straightforward:


xslt_process($xsltproc, '/path/to/input.xml', '/path/to/input.xsl');


In order to create and destroy the
$xsltproc XSLT processor, you need the following line above
xslt_process():


$xsltproc = xslt_create();

and the following line below
xslt_process():


xslt_free($xsltproc);

The complete script, then, is just three lines long: we create a processor, merge
the XML and XSL together, then free the processor.

Handling the processed output


There are two ways to deal with the output of xslt_process():



One way is to use the optional fourth
parameter that allows you to specify the name of a file you wish to save the
processed content to.



If you do not provide this parameter, or it
is NULL, then xslt_process() returns the processed content via its return value.
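
As a rough sketch of the first approach, assuming PHP has permission to write
to the target location, the wrapper below passes a file name as the fourth
parameter. The helper name save_transform() and the output path are made up
for illustration, and the function_exists() check simply lets the script bail
out quietly where the XSLT extension is not loaded.

```php
<?php
// Writes the transformed document to $outFile instead of returning it.
// save_transform() is a hypothetical helper for this sketch.
function save_transform($xmlFile, $xslFile, $outFile)
{
    if (!function_exists('xslt_create')) {
        return false; // XSLT extension not available
    }

    $xsltproc = xslt_create();

    // With a fourth argument, xslt_process() reports success or failure
    // rather than handing back the transformed text.
    $ok = xslt_process($xsltproc, $xmlFile, $xslFile, $outFile);

    xslt_free($xsltproc);

    return (bool) $ok;
}

// The output path is an assumption; use any location PHP can write to.
if (!save_transform('/path/to/input.xml', '/path/to/input.xsl', '/path/to/output.html')) {
    echo 'The transformation could not be saved.';
}
```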




The option you choose is obviously
dependent on your plans for the output. Usually you will probably want to ignore
the fourth parameter (or supply NULL) in order to get the content back in the
function return value. This is done by capturing the return value in a variable,
or by passing it straight to an echo statement. Using our previous example, here
is what that would look like. Save it as xslt_test.php:


<?php
$xsltproc = xslt_create();
$xslt_result = xslt_process($xsltproc, '/path/to/input.xml', '/path/to/input.xsl');
xslt_free($xsltproc);
?>

...........

<?php
echo $xslt_result;
?>

The difference here is that we capture the return value of xslt_process() in
$xslt_result, which we then echo out later in the script. While it’s just as
good in this situation to replace “$xslt_result =” with just “echo”, you may
wish to replace the dots between the xslt_process() call and the echoing of the
results with more PHP code. (Note that there is no need to add an <HTML> tag in
the PHP output, because the stylesheet’s template already outputs <HTML> and the
other basic tags when it matches the root.)



Go ahead and run the xslt_test.php script. If everything has gone as planned,
you should see your formatted XML returned. Also, take a look at the source of
the page and note that it’s plain markup, with no trace of the XML or XSL.
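
If things have not gone as planned, it helps to ask the processor what went
wrong. The following is only a sketch: run_transform() is a made-up helper
name, and it relies on xslt_errno() and xslt_error() from the same extension
to describe the most recent failure.

```php
<?php
// Runs one transformation and returns either the result or a readable
// error message. run_transform() is a hypothetical helper for this sketch.
function run_transform($xmlFile, $xslFile)
{
    if (!function_exists('xslt_create')) {
        return 'Error: the XSLT extension is not loaded.';
    }

    $xsltproc = xslt_create();
    $result = xslt_process($xsltproc, $xmlFile, $xslFile);

    if ($result === false) {
        // Ask the processor for the code and text of the last error.
        $result = 'Error ' . xslt_errno($xsltproc) . ': ' . xslt_error($xsltproc);
    }

    xslt_free($xsltproc);

    return $result;
}

echo run_transform('/path/to/input.xml', '/path/to/input.xsl');
```

Returning the message rather than printing it inside the function keeps the
decision about what to show the visitor in one place.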

Making XSL work for its money


Now that we can control how our output looks, based upon the XSL stylesheet it’s
processed with, let’s try something a little harder that will better demonstrate
the flexibility of the XML/XSLT combination.



As mentioned previously, XSLT can transform your output into pretty much any
format you want, including non-markup languages like SQL. If the xslt_test.php
script is modified to check whether a certain HTTP GET variable is set, we can
alter the format of the output merely by changing the URL.



This change needs to be implemented in two steps. Firstly, you need a new XSL
stylesheet to transform your content into SQL. Naturally this will look a lot
like the previous stylesheet, because the logic is basically the same: loop
through all /channel/item elements, and output data about each one.



Here is the XSL stylesheet necessary to transform our example XML into SQL.
Save this as “sql.xsl” in the same directory as the previous files.


<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://my.netscape.com/rdf/simple/0.9/">
<!-- Plain text output: no HTML tags are added to the SQL. -->
<xsl:output method="text" encoding="utf-8"/>
<!-- Values are not escaped here; a title containing an apostrophe would
     break the statement. -->
<xsl:template match="/">
<xsl:for-each select="/channel/item">
INSERT INTO News (Title, Link) VALUES ('<xsl:value-of select="title"/>', '<xsl:value-of select="url"/>');
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

The key difference is that we no longer output an HTML page. Our output target
is different now, and HTML would not work in a MySQL query. Also, this time the
stylesheet prints out the “url” value of each item, along with the “title”
value, nested inside an SQL query.



In order to facilitate the output selection, you also need to modify the
xslt_test.php script so that it switches the input XSL file based upon the
value of a variable. There are various ways to do this, but below I’ve included
an example to get you started:


<?php
$xsltproc = xslt_create();

// Use the SQL stylesheet when USESQL appears in the URL's query string.
if (isset($_GET['USESQL'])) {
    $xslinput = 'sql.xsl';
} else {
    $xslinput = 'input.xsl';
}

$hResult = xslt_process($xsltproc, 'input.xml', $xslinput);
print $hResult;

xslt_free($xsltproc);
?>

What else can XSL do?

You’d be surprised at the power of XSL. What you have seen here really is only
the beginning. For example, many sites make their news content available in XML
format so that sites can syndicate their news. In the XSL stylesheets above, I
have included a specific xmlns attribute in the xsl:stylesheet tag that will allow
you to parse the RSS and RDF formats used by many syndicating news sites. Experiment!



There are also a variety of other XSL processing tags for you to make use of,
such as conditional statements (“if foo = bar”). The W3C has a very large
collection of freely-available information online about XML and XSLT, which is
a rich mine for ideas; be sure to check it out!
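
To give a flavour of a conditional, here is a small sketch that keeps a
stylesheet in a PHP string and uses xsl:if to output only items whose “type”
attribute equals “lie”. It assumes the “arg:” buffer convention shown in the
PHP manual for xslt_process(), where the fifth parameter is an array of named
in-memory documents; the XML path is the placeholder used earlier. Every item
in the sample file has type="lie", so all three headlines should appear;
change one of the attributes and that item should drop out.

```php
<?php
// A stylesheet held in a PHP string. The xsl:if element outputs its
// contents only when the test expression is true for the current item.
$stylesheet = <<<XSL
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" encoding="utf-8"/>
<xsl:template match="/">
<xsl:for-each select="/channel/item">
<xsl:if test="@type = 'lie'">
Unverified: <xsl:value-of select="title"/><br/>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
XSL;

if (function_exists('xslt_create')) {
    $xsltproc = xslt_create();

    // "arg:/_xsl" asks the processor to read the stylesheet from the
    // arguments array rather than from a file on disk.
    $arguments = array('/_xsl' => $stylesheet);

    echo xslt_process($xsltproc, '/path/to/input.xml', 'arg:/_xsl', NULL, $arguments);

    xslt_free($xsltproc);
}
```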

Conclusion


XSLT is a very powerful technology that is only recently finding a niche for itself.
As with many new technologies, its biggest limitation is acceptance: many browsers
just do not support XSLT processing, and probably will not support it within the
foreseeable future.



Congratulate yourself once again for having
chosen to use PHP, as the PHP support for server-side parsing is complete and
very powerful. The end result of switching from client-side XSLT parsing to
server-side XSLT parsing is the same as switching to PHP for form validation
instead of using JavaScript: you’re guaranteed 100% compatibility with all
clients, as well as giving yourself the ability to add in extra functionality
that would otherwise have been problematic.



Even the most basic XML implementation can
provide instant return on investment, simply by making your content available in
a pure format. Furthermore, you can add more powerful XSLT processing at a later
date.

Perhaps the most popular use for server-side XSLT parsing is serving a particular
content type (e.g. WML for WAP phones) based upon the user agent received.
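
As a closing sketch of that idea, the function below picks a stylesheet by
looking for tell-tale substrings in the user agent. Treat everything specific
here as an assumption: stylesheet_for_agent() is a made-up name, “wml.xsl” is
a stylesheet you would have to write yourself, and the two markers are only
examples rather than a complete device-detection scheme.

```php
<?php
// Chooses a stylesheet file name for the requesting client.
// stylesheet_for_agent() is a hypothetical helper for this sketch.
function stylesheet_for_agent($userAgent)
{
    // Example substrings assumed to indicate a WAP browser.
    $wapMarkers = array('UP.Browser', 'WAP');

    foreach ($wapMarkers as $marker) {
        if (strpos($userAgent, $marker) !== false) {
            return 'wml.xsl'; // hypothetical WML stylesheet
        }
    }

    // Everyone else gets the HTML stylesheet from this tutorial.
    return 'input.xsl';
}

// The header may be absent, so fall back to an empty string.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$xslinput = stylesheet_for_agent($agent);
```

The resulting $xslinput can then be passed to xslt_process() as the third
parameter, exactly as in the SQL example.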

Resources

http://www.php.net/manual/en/ref.xml.php

http://www.php.net/manual/en/ref.xslt.php

http://www.phpbuilder.com/columns/justin20001025.php3

http://www.phpbeginner.com/columns/ray/xml

About the Author


Paul Hudson is a full-time IT journalist specialising in web technologies. You
can read more of his work every month in Future Publishing’s Linux Format magazine,
or you can contact him directly at hudzilla@php.net.

Published: March 11th, 2003 at 12:00
Categories: Tutorials