Web services are the coolest technology I know of that ends up turning everyone off. I don't know about you, but when I go to a lecture on Web services, invariably tons of acronyms come out, like Representational State Transfer (REST), Extensible Markup Language (XML), Remote Procedure Call (RPC), SOAP, and RSS. And then I start to nod off and dream about a land where free Krispy Kreme donuts grow on trees.

When I wake up, I realize that, in reality, I do a lot of Web service work: I just don't go through all the standards rigmarole. I look at Web services very simply, as an alternative interface to my Web applications that allows other programs to talk to my application in a way that makes sense for programming languages. Humans talk to my application through Hypertext Markup Language (HTML), and applications talk to my application through XML, Comma-Separated Value (CSV) files, or something standardized.

Figure 1 explains it a little better, I think.

On the left, you have the traditional Web application. A server speaks HTML to the Web client. That server can be whatever technology you want: Java(TM) technology, Microsoft(r) .NET, Rails, PHP, Python, ColdFusion, whatever. On the right side is the same Web application, but this time, in addition to the HTML it vends, it speaks XML to talk to other applications or other specialized services, such as RSS readers.

A lot of times, I hear people talk about an application server being completely service based. And that may be fine for a back-end technology. But for your average Web application, you always want to support both an HTML and an XML interface.

A nice fringe benefit to having two interfaces is that it forces you to centralize your business logic in one place (perhaps a "middle tier") so that both the HTML and the XML interfaces make the same queries to the database and get the same results.
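To make the "one place" idea concrete, here is a minimal sketch of a shared middle tier. The function names (fetch_articles, render_html, render_xml) are my own inventions for illustration, and an in-memory array stands in for the articles table so that the sketch runs without a database.

```php
<?php
// Hypothetical middle tier: one function owns the data access, so
// every interface sees the same rows. The array below is a stand-in
// for a real database query.
function fetch_articles() {
  return array(
    array( 'id' => 1, 'title' => 'What I like about dogs' ),
    array( 'id' => 2, 'title' => 'Why I bark' ),
  );
}

// HTML interface: for people.
function render_html( $articles ) {
  $out = '';
  foreach( $articles as $a ) {
    $out .= '<div class="title">' . htmlspecialchars( $a['title'] ) . "</div>\n";
  }
  return $out;
}

// XML interface: for programs.
function render_xml( $articles ) {
  $dom = new DOMDocument();
  $root = $dom->appendChild( $dom->createElement( 'articles' ) );
  foreach( $articles as $a ) {
    $art = $root->appendChild( $dom->createElement( 'article' ) );
    $art->setAttribute( 'id', (string)$a['id'] );
    $title = $art->appendChild( $dom->createElement( 'title' ) );
    $title->appendChild( $dom->createTextNode( $a['title'] ) );
  }
  return $dom->saveXML();
}

// Both interfaces are fed from the same call.
$articles = fetch_articles();
echo render_html( $articles );
echo render_xml( $articles );
```

If the query ever changes, only fetch_articles changes, and the two interfaces cannot drift apart.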

Perhaps the value of allowing your Web application to speak XML to other programs will be obvious to you, but it wasn't to me, at least initially. So, this article serves to walk you through an example of what can be done when you put an XML interface on an application. I start with a simple HTML front end, then show how to build the XML interface and add various readers, including Asynchronous JavaScript and XML (Ajax), RSS, and Adobe(r) Flex(TM).

The articles application

The little test application I'm going to start with is one that has a list of articles in a database. That database is shown in Listing 1.

Listing 1. articles.sql

DROP TABLE IF EXISTS articles;
CREATE TABLE articles (
  id INTEGER NOT NULL AUTO_INCREMENT,
  title VARCHAR(255),
  author VARCHAR(255),
  description TEXT,
  PRIMARY KEY( id ) );

INSERT INTO articles VALUES ( null,
  'What I like about dogs', 'Megan Herrington',
  'Everything that I love about dogs I learned in preschool' );
INSERT INTO articles VALUES ( null,
  'Making action movies', 'Jack Herrington',
  'How to script, produce and direct Hong Kong action flicks' );
INSERT INTO articles VALUES ( null,
  'Super Paper Mario Tips', 'Lori Herrington',
  'Everything you need to know to win at Paper Mario' );
INSERT INTO articles VALUES ( null,
  'Why I bark', 'Oso Herrington',
  '' );

It's pretty simple. A single table holds the list of articles, and each article has a title, an author, and a description. The elaborate HTML front end for this table is shown in Listing 2.

Listing 2. articles.php

<html>
<head><title>Articles</title></head>
<body>
<?php
require_once( "DB.php" );
$db =& DB::Connect( 'mysql://root@localhost/articles1', array() );
if (PEAR::isError($db)) { die($db->getMessage()); }

$res = $db->query( "SELECT * FROM articles" );
$rows = array();
while( $res->fetchInto($row, DB_FETCHMODE_ASSOC) ) {
?>
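<?php /* Escape the text so that any & or < in the data cannot break the HTML. */ ?>
<div class="title"><?php echo( htmlspecialchars( $row['title'] ) ) ?></div>
<div class="author"><?php echo( htmlspecialchars( $row['author'] ) ) ?></div>
<?php if ( strlen( $row['description'] ) > 0 ) { ?>
<div class="description"><?php echo( htmlspecialchars( $row['description'] ) ) ?></div>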
<?php } ?>
<br/>
<?php } ?>
</body>
</html>

If everything works and the database is set up, I see the page in Figure 2 when I navigate to it in my browser.

Yeah, it's not much to look at, but I wanted to keep it simple for this example, most of all because of what I want to show next, which is not very pretty.

One of the big reasons to provide an XML interface is to avoid having to write the code shown in Listing 3.

Listing 3. fetch.php

<?php
require_once 'HTTP/Client.php';

$client = new HTTP_Client();
$client->get( "http://localhost/ws/articles.php" );
$resp = $client->currentResponse();
$data = $resp['body'];
$found = array();
$articles = array();  // initialized so the export below works even with no matches
preg_match_all( "/(<div class=\"title\">.*?)<br\/>/s", $data, $found );

foreach( $found[1] as $item ) {
  preg_match( "/<div class=\"title\">(.*?)<\/div>/", $item, $title );
  preg_match( "/<div class=\"author\">(.*?)<\/div>/", $item, $author );
  preg_match( "/<div class=\"description\">(.*?)<\/div>/", $item, $description );
  $articles []= array( 'title' => $title[1],
    'author' => $author[1],
    'description' => count( $description ) > 0 ? $description[1] : '' );
}

// var_export() prints the array itself, so no print is needed.
var_export( $articles );
?>

This is a PHP-based "screen scraper." It's a script that gets the page as one big text string, then uses a set of complicated regular expressions to parse out the title, author, and description elements from the page.

When I run it on the command line, I see what's shown in Listing 4.

Listing 4. Running fetch.php

% php fetch.php
array (
  0 => 
  array (
    'title' => 'What I like about dogs',
    'author' => 'Megan Herrington',
    'description' => 'Everything that I love about dogs I learned in preschool',
  ),
  1 => 
  array (
    'title' => 'Making action movies',
    'author' => 'Jack Herrington',
    'description' => 'How to script, produce and direct Hong Kong action flicks',
  ),
  2 => 
  array (
    'title' => 'Super Paper Mario Tips',
    'author' => 'Lori Herrington',
    'description' => 'Everything you need to know to win at Paper Mario',
  ),
  3 => 
  array (
    'title' => 'Why I bark',
    'author' => 'Oso Herrington',
    'description' => '',
  ),
)

So, yes, it gets the data all right, but it can't get the ID of each article, because the ID isn't anywhere in the HTML.

All this "screen scraper" code is in this article for one big reason: to convince you to put an XML interface on your data. If your data is interesting, people are going to find a way to get at it. And if you don't support XML, then that means screen scraping and a lot of angry customers when you make updates to the site to "beautify" the interface.
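To see how little it takes to break a scraper, here is a small sketch that runs the title pattern from fetch.php against two snippets of markup. The "after" markup is a hypothetical redesign that I made up for illustration; the data in it is unchanged.

```php
<?php
// The title pattern from the scraper above.
$pattern = "/<div class=\"title\">(.*?)<\/div>/";

// Markup as articles.php emits it today.
$before = '<div class="title">Why I bark</div>';

// Hypothetical redesign: same data, but a heading with a new class name.
$after = '<h2 class="article-title">Why I bark</h2>';

// preg_match returns 1 when the pattern matches and 0 when it does not.
$found_before = preg_match( $pattern, $before, $m1 );
$found_after = preg_match( $pattern, $after, $m2 );

echo "matches before the redesign: ", $found_before, "\n";
echo "matches after the redesign: ", $found_after, "\n";
```

Nothing about the articles changed, yet the scraper now comes back empty-handed.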

Adding the XML service

So, to avoid all the nastiness of screen scraping (and to enable some really cool programs to use my data), I'm going to write an XML interface to the table. This interface is shown in Listing 5.

Listing 5. artxml.php

<?php
require_once( "DB.php" );
$db =& DB::Connect( 'mysql://root@localhost/articles1', array() );
if (PEAR::isError($db)) { die($db->getMessage()); }

$dom = new DomDocument();
$dom->formatOutput = true;

$root = $dom->createElement( "articles" );
$dom->appendChild( $root );

$res = $db->query( "SELECT * FROM articles" );
$rows = array();
while( $res->fetchInto($row, DB_FETCHMODE_ASSOC) ) {
  $art = $dom->createElement( "article" );
  $art->setAttribute( 'id', $row['id'] );
  $root->appendChild( $art );

  $title = $dom->createElement( "title" );
  $title->appendChild( $dom->createTextNode( $row['title'] ) );
  $art->appendChild( $title );

  $author = $dom->createElement( "author" );
  $author->appendChild( $dom->createTextNode( $row['author'] ) );
  $art->appendChild( $author );

  $desc = $dom->createElement( "description" );
  $desc->appendChild( $dom->createTextNode( $row['description'] ) );
  $art->appendChild( $desc );
}

header( "Content-type: text/xml" );
echo $dom->saveXML();
?>

When I run this script on the command line, I get the output shown in Listing 6.

Listing 6. The article XML

% php artxml.php
<?xml version="1.0"?>
<articles>
  <article id="1">
    <title>What I like about dogs</title>
    <author>Megan Herrington</author>
    <description>Everything that I love about dogs I learned in preschool</description>
  </article>
  <article id="2">
    <title>Making action movies</title>
    <author>Jack Herrington</author>
    <description>How to script, produce and direct Hong Kong action flicks</description>
  </article>
...

This code is pretty straightforward. There is a root <articles> tag that contains a bunch of <article> tags. Each <article> tag has an id attribute with the numeric ID of the record. The <title>, <author>, and <description> tags hold the corresponding data.

I used the XML Document Object Model (DOM) functions in PHP instead of hand-writing the tags so that the DOM handles all the XML node balancing and encoding work for me. It's an easy way to ensure that the XML that the page returns is always well formed. I strongly recommend using the XML DOM functions to output XML. All the major Web languages support building and exporting XML DOMs.
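Here is a minimal sketch of the encoding work the DOM does for you. The title string is a made-up example that contains characters with special meaning in XML; it isn't in the articles table.

```php
<?php
// Hypothetical title containing characters that are special in XML.
$title = 'Cats & dogs: is 1 < 2?';

// Hand-written tags: the raw & and < go straight into the output.
$byHand = "<title>" . $title . "</title>";

// DOM-built tags: the text node is escaped when the XML is written.
$dom = new DOMDocument();
$el = $dom->createElement( 'title' );
$el->appendChild( $dom->createTextNode( $title ) );
$dom->appendChild( $el );
$byDom = $dom->saveXML( $el );

echo $byHand, "\n";
echo $byDom, "\n";

// Try to parse each version again (@ hides the parser warnings).
$check = new DOMDocument();
$okHand = @$check->loadXML( $byHand );
$okDom = $check->loadXML( $byDom );
var_dump( $okHand, $okDom );
```

The hand-written version fails to parse, while the DOM-built version round-trips cleanly.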

Fetching the XML

Earlier, I showed the HTML and the PHP code to scrape the data out of the HTML. Now that I have this XML service, let's have a look at the equivalent piece of PHP code to get to the same data, but this time in XML. This XML-fetching code is shown in Listing 7.

Listing 7. fetchxml.php

<?php
require_once 'HTTP/Client.php';

$client = new HTTP_Client();
$client->get( "http://localhost/ws/artxml.php" );
$resp = $client->currentResponse();
$data = $resp['body'];

$dom = new DomDocument();
$dom->loadXML( $data );
$articles = array();
foreach( $dom->getElementsByTagName( 'article' ) as $art ) {
  $title = $art->getElementsByTagName( 'title' );
  $author = $art->getElementsByTagName( 'author' );
  $description = $art->getElementsByTagName( 'description' );
  $articles []= array(
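    // nodeValue on the element itself returns '' for an empty tag;
    // firstChild would be null there (the last article has no description).
    'title' => $title->item(0)->nodeValue,
    'author' => $author->item(0)->nodeValue,
    'description' => $description->item(0)->nodeValue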
  );
}

// var_export() prints the array itself, so no print is needed.
var_export( $articles );
?>

This is much easier. I still get the page the same way, but then I give the page contents to PHP's DOM library and use the XML functions to get the title, author, and description data quickly and easily. The code is easy to read, easy to maintain, and will not break unless the XML format changes, which is unlikely.
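The XML also carries the record ID that the screen scraper could not recover. Here is a minimal sketch of reading it; a hard-coded, trimmed-down sample of the feed stands in for the HTTP fetch so that the sketch runs on its own.

```php
<?php
// A trimmed sample of the feed, standing in for the HTTP response body.
$data = '<articles>
  <article id="1"><title>What I like about dogs</title></article>
  <article id="4"><title>Why I bark</title><description></description></article>
</articles>';

$dom = new DOMDocument();
$dom->loadXML( $data );

$byId = array();
foreach( $dom->getElementsByTagName( 'article' ) as $art ) {
  // getAttribute returns the attribute value as a string.
  $id = (int)$art->getAttribute( 'id' );
  $byId[$id] = $art->getElementsByTagName( 'title' )->item(0)->nodeValue;
}

foreach( $byId as $id => $title ) {
  echo $id, ': ', $title, "\n";
}
```

With the ID in hand, a client can key its own records to mine, which is something no amount of regular expression work on the HTML page could give it.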

Just for the sake of comparison, I wrote the same code in C# to show how two very different programming languages can read a single data source. The C# code is shown in Listing 8.

Listing 8. WebServiceTest.cs

using System;
using System.IO;
using System.Net;
using System.Xml;

namespace wstest1
{
  class WebServiceTest
  {
    [STAThread]
    static void Main(string[] args)
    {
      HttpWebRequest r = (HttpWebRequest)WebRequest.Create( "http://localhost/ws/artxml.php" );
      WebResponse res = r.GetResponse();

      string sPage;
      StreamReader reader = new StreamReader( res.GetResponseStream() );
      sPage = reader.ReadToEnd();
      reader.Close();
      res.Close();

      XmlDocument doc = new XmlDocument();
      doc.LoadXml( sPage );

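      foreach( XmlElement elArticle in doc.GetElementsByTagName( "article" ) )
      {
        // InnerText returns the decoded text; InnerXml would return raw
        // markup, with entities such as &amp; still escaped.
        string sTitle = elArticle.SelectSingleNode( "title" ).InnerText;
        string sAuthor = elArticle.SelectSingleNode( "author" ).InnerText;
        string sDescription = elArticle.SelectSingleNode( "description" ).InnerText;
        int nID = Int32.Parse( elArticle.Attributes["id"].Value );

        Console.WriteLine( "{0}: {1} by {2} ({3})", nID, sTitle, sAuthor, sDescription );
      }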
    }
  }
}

Now, with the techy-codey part of the article out of the way, it's time to talk about the fun stuff, like what you can do with the XML in other ways.

Play with XML in XSLT

Wait, didn't I just say the techy-codey part was over? Well, okay, just one more thing. It turns out that you can format XML data very quickly using Extensible Stylesheet Language Transformations (XSLT) style sheets, usually just called XSL. The style sheet shown in Listing 9 turns the XML returned from the Web service I wrote into the same HTML that the articles.php page produces.

Listing 9. articles.xsl

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/TR/xhtml1/strict">

<xsl:output method="html" indent="yes" encoding="iso-8859-1" />

<xsl:template match="/">
<html>
<head><title>Article list</title></head>
<body>
<xsl:for-each select="/articles/article">
<div class="title"><xsl:value-of select="title"/></div>
<div class="author"><xsl:value-of select="author"/></div>
<xsl:if test="string-length( description ) > 0">
<div class="description"><xsl:value-of select="description"/></div>
</xsl:if>
</xsl:for-each>
</body></html>
</xsl:template>

</xsl:stylesheet>

It's a bit tough to read, but that's XSL for you. Basically, XSL is a pattern matcher; I've defined an XSL template that matches the root of the incoming XML document. It outputs the HTML header, then uses a for-each loop to visit each article and write out the values for the title, author, and description (if there is one).

This style sheet could be attached to the XML output itself, and most browsers would use that to render the XML into HTML for display automatically. How about that!
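Attaching the style sheet means adding an xml-stylesheet processing instruction ahead of the root tag. Here is a minimal sketch of doing that with the same DOM functions; the href value assumes that articles.xsl is served from the same directory as the XML page, which is my assumption rather than anything set up earlier.

```php
<?php
$dom = new DOMDocument();
$dom->formatOutput = true;

$root = $dom->createElement( 'articles' );
$dom->appendChild( $root );

// The processing instruction has to come before the root element.
// The href is an assumed location for the style sheet.
$pi = $dom->createProcessingInstruction(
  'xml-stylesheet', 'type="text/xsl" href="articles.xsl"' );
$dom->insertBefore( $pi, $root );

echo $dom->saveXML();
```

The article tags would be added under the root exactly as in artxml.php; only the extra instruction is new.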

Ajax

Perhaps the biggest reason to have an XML export from your application is to enable the use of Ajax in Web clients. The JavaScript code in the client can request the XML from the server after the page is loaded and render it in any way it chooses, often dynamically changing based on user input without requiring a page refresh.

A simple Ajax-based table that renders the data from the XML feed is shown in Listing 10.

Listing 10. ajax.html

<html><head>
<script src="prototype.js"></script>
</head>
<body><table id="articles"></table>
<script>
new Ajax.Request( 'artxml.php', { 
  method: 'get',
  onSuccess: function( transport ) {
    var artTags = transport.responseXML.getElementsByTagName( 'article' );

    // Text of the first <tag> child, or '' when the tag is empty
    // (an empty <description> has no text node, so firstChild is null).
    function text( el, tag ) {
      var node = el.getElementsByTagName( tag )[0];
      return ( node && node.firstChild ) ? node.firstChild.nodeValue : '';
    }

    for( var a = 0; a < artTags.length; a++ ) {
      var author = text( artTags[a], 'author' );
      var title = text( artTags[a], 'title' );
      var description = text( artTags[a], 'description' );

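      // Text nodes keep any < or & in the data from being parsed as HTML.
      var elTR = $('articles').insertRow( -1 );
      var elTD1 = elTR.insertCell( -1 );
      elTD1.appendChild( document.createTextNode( author ) );
      var elTD2 = elTR.insertCell( -1 );
      elTD2.appendChild( document.createTextNode( title ) );
      var elTD3 = elTR.insertCell( -1 );
      elTD3.appendChild( document.createTextNode( description ) );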
    }
  }
} );
</script></body></html>

This code uses the Prototype.js library to access the data from the server, then uses the XML DOM functions in the browser to access the author, title, and description fields. It then uses the HTML DOM functions to add new rows and cells to the "articles" table for each article in the data set.

The output of the Ajax code is shown in the browser in Figure 3.

This is a pretty rudimentary example, but you could easily imagine adding client-side sorting or searching, all without going back to the server for additional data.

Accessing XML with Flex

Next-generation rich Internet application (RIA) frameworks, like Flex, were born and raised on XML. So, they make consuming XML data and displaying it a snap. Have a look at the example Flex application shown in Listing 11.

Listing 11. wstest.mxml

<?xml version="1.0" encoding="utf-8"?>
<mx:Application xmlns:mx="http://www.adobe.com/2006/mxml" layout="vertical">
  <mx:XML id="articles" source="http://localhost/ws/artxml.php" />
  <mx:DataGrid dataProvider="{articles..article}" width="400">
    <mx:columns>
      <mx:Array>
        <mx:DataGridColumn dataField="author" headerText="Author" />
        <mx:DataGridColumn dataField="title" headerText="Title" />
        <mx:DataGridColumn dataField="description" headerText="Description" />
      </mx:Array>
    </mx:columns>
  </mx:DataGrid>
</mx:Application>

There is no actual code in there at all, just a reference to the XML data source, which is then passed to the DataGrid control. The output is shown in Figure 4.

Seriously! Is that cool or what? No code at all to do that. And that's just the beginning of what Flex and Adobe ActionScript(TM) can do with XML. ActionScript has a language extension called E4X built in. With E4X, you can navigate XML document trees as simply as you would objects by using the "dot notation" syntax. This means no more clunky XML DOM methods, just straight object and array references like you would use with any other data structure in memory.

Standardizing

So far in this article, I've used my own flavor of XML that I cooked up just for this example. But I could have gone with a standard XML format-say RSS, for example.

The code in Listing 12 shows the same database output in RSS format.

Listing 12. artrss.php

<?php
require_once( "DB.php" );
$db =& DB::Connect( 'mysql://root@localhost/articles1', array() );
if (PEAR::isError($db)) { die($db->getMessage()); }

$dom = new DomDocument();
$dom->formatOutput = true;

$rss = $dom->createElement( "rss" );
$rss->setAttribute( "version", "0.91" );
$dom->appendChild( $rss );

$root = $dom->createElement( "channel" );
$rss->appendChild( $root );

$rtitle = $dom->createElement( "title" );
$rtitle->appendChild( $dom->createTextNode( "Article list" ) );
$root->appendChild( $rtitle );

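$rdesc = $dom->createElement( "description" );
$rdesc->appendChild( $dom->createTextNode( "The article list" ) );
$root->appendChild( $rdesc );

// RSS channels are also expected to carry a link to the site. The
// URL here is a placeholder, like the item links below.
$rlink = $dom->createElement( "link" );
$rlink->appendChild( $dom->createTextNode( "http://myhost/articles.php" ) );
$root->appendChild( $rlink );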

$res = $db->query( "SELECT * FROM articles" );
$rows = array();
while( $res->fetchInto($row, DB_FETCHMODE_ASSOC) ) {
  $art = $dom->createElement( "item" );
  $root->appendChild( $art );

  $title = $dom->createElement( "title" );
  $title->appendChild( $dom->createTextNode( $row['title'] ) );
  $art->appendChild( $title );

  $link = $dom->createElement( "link" );
  $link->appendChild( $dom->createTextNode(
    "http://myhost/showarticle.php?id=".$row['id'] ) );
  $art->appendChild( $link );

  $desc = $dom->createElement( "description" );
  $desc->appendChild( $dom->createTextNode( $row['description'] ) );
  $art->appendChild( $desc );
}

header( "Content-type: text/xml" );
echo $dom->saveXML();
?>

The advantage of this is that I can use all my RSS tools in addition to any custom code that can read XML. For example, I can point my Mozilla Firefox browser at the feed, and it will create a live bookmark that I can put on my toolbar and check for updates. This is shown in Figure 5.

Of course, not all data falls conveniently into the RSS format, and that's fine. But if it can become RSS, or Resource Description Framework (RDF), or any of the other conventional XML formats, it's often valuable to follow those instead of inventing your own.

Conclusion

Hopefully, this article puts practical Web services for applications into perspective. I know I didn't cover all the REST, XML-RPC, or SOAP bases. Lots of articles have covered those and given enough engineers standards-based nightmares to last for ages. In this article, I wanted to show how easy it is to get XML data out of an application and use it in a very pragmatic way. If I've succeeded, let me know by dropping me a line and showing me what XML data comes out of your application. Maybe we can put together a mashup with some of the other cool Web services out there.

Resources

  • Flex is the open source-based RIA development environment pioneered by Adobe.
  • REST is a simple style of Web service intended to map more directly onto Hypertext Transfer Protocol (HTTP).
  • SOAP is an advanced protocol for object method invocations over HTTP.
  • XML-RPC is a middle-ground protocol for method invocation over HTTP that's a bit lighter weight than SOAP.
  • RSS is a syndication standard for blog entries, new articles, and that sort of thing. Google's Reader service is a fantastic (and free) RSS manager for your Web browser or smart phone.
  • Prototype.js is a free JavaScript library that aids in writing maintainable cross-browser Ajax code.