STREAMLINE WORKING WITH XML IN PHP USING SERVICE DATA OBJECTS – PART 2

First published by IBM at IBM developerWorks (www.ibm.com/developerworks). All rights retained by IBM and the authors.
Used By Permission

Creating data objects

Now that we have seen how to get and set the properties on a data object and how to connect them, we need to explain how to create data objects.

SDOs are created by calling a data factory object. Before the data factory will create anything, the model must be defined to it: that is, the set of type names and the properties each type can have. It is this model that constrains which properties a type can have, so that, for example, you cannot normally add a property to a data object if that property is not in the model, and you cannot assign a data object of one type to a property that has been specified to take a data object of a different type.

In the PHP implementation of SDO, there are three ways to get a data factory initialized so that it can start creating data objects. The first two involve the supplied DASes, which assume you will be reading from and writing to either a relational database or an XML file. In these cases, the DAS creates and initializes a data factory but keeps it hidden, providing its own interface for creating data objects. The third way is to create a data factory yourself and use the addType() and addPropertyToType() methods to define the model, just as the DASes would. Although this is an unconventional thing to do, we show briefly what it looks like, to give you a clear picture of what goes on under the covers in the supplied DASes.

Here is how to define the two types we used in the examples above:
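A minimal sketch of these calls follows, assuming the PECL SDO extension is loaded. PHP 5.3 and later parse NAMESPACE as the namespace keyword, so the sketch uses a hypothetical constant, AUTHOR_NS, with an illustrative URI in place of the article's NAMESPACE.

```php
<?php
// AUTHOR_NS is a hypothetical stand-in for the article's NAMESPACE
// constant; the URI is illustrative only.
define('AUTHOR_NS', 'urn:example:authors');

// An empty data factory: it knows no types until we add them.
$df = SDO_DAS_DataFactory::getDataFactory();

// Declare the two types; their properties are added by separate calls.
$df->addType(AUTHOR_NS, 'Author');
$df->addType(AUTHOR_NS, 'Name');
```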

You may remember that we illustrated the getTypeName() call on an $author object and saw the name Author returned. It is a call to addType() like this that will have set this name in the SDO model.

All types exist in a namespace. Here, the namespace is set to NAMESPACE, but it could just as easily have been left blank or null if no namespace were wanted.

Here is how to add the simple string property called first to the Name type:

Listing 10. Adding a primitive property to an SDO type
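A sketch of the call this listing shows, assuming the SDO extension is loaded. It repeats the factory setup so that it runs on its own, and uses a hypothetical constant AUTHOR_NS in place of NAMESPACE, which later PHP versions reserve as a keyword.

```php
<?php
define('AUTHOR_NS', 'urn:example:authors'); // hypothetical namespace URI

$df = SDO_DAS_DataFactory::getDataFactory();
$df->addType(AUTHOR_NS, 'Name');

// Name.first: a single-valued property holding an SDO String.
$df->addPropertyToType(
    AUTHOR_NS, 'Name',                 // type that receives the property
    'first',                           // name of the new property
    SDO_TYPE_NAMESPACE_URI, 'String',  // type of the values it can take
    array('many' => false)             // options: single-valued
);
```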

The first two arguments to this rather ungainly call specify the namespace and name of the type we're adding a property to; in this case, NAMESPACE:Name. The third argument is the name of the property we're adding; in this case, first. The fourth and fifth arguments are the namespace and name of the type of value the new property can take; in this case, commonj.sdo:String (the value of the constant SDO_TYPE_NAMESPACE_URI is commonj.sdo:; this is the namespace in which the primitive types live in the SDO model). The final argument is an associative array of options, such as whether the property is single- or many-valued and, for object references (which this is not), whether it is a containment or noncontainment reference.

After the calls so far, you would be able to call this data factory to create a data object of type Name, and it would have one property called first that expects to be assigned values of type String. We will illustrate this shortly.

Here is how the containment reference from Author to Name would be specified:

Listing 11. Adding a containment reference property to an SDO type
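A sketch of the containment reference described here, assuming the SDO extension is loaded and again using the hypothetical AUTHOR_NS constant; the model from the earlier steps is rebuilt so the example is self-contained.

```php
<?php
define('AUTHOR_NS', 'urn:example:authors'); // hypothetical namespace URI

$df = SDO_DAS_DataFactory::getDataFactory();
$df->addType(AUTHOR_NS, 'Author');
$df->addType(AUTHOR_NS, 'Name');
$df->addPropertyToType(AUTHOR_NS, 'Name', 'first',
    SDO_TYPE_NAMESPACE_URI, 'String');

// Author.name: a single-valued containment reference to a Name object.
$df->addPropertyToType(AUTHOR_NS, 'Author', 'name',
    AUTHOR_NS, 'Name',
    array('many' => false, 'containment' => true));
```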

Here, we are adding a property called name to the Author type and constraining it to refer to objects of the Name type. The name property is a single-valued containment reference.

We must stress again that it would be unconventional to define the types and properties to the data factory like this yourself. This is usually done on your behalf by the DASes, which we will meet in the next section.

The other properties we have seen would be added the same way. If we were adding the works property we used in an earlier example, this would be many-valued.

Once the model has been specified to the data factory, we are in a position to create data objects. The first object can only be created by calling the data factory, or by calling the appropriate method on whichever DAS we are using. Here, we call the data factory to create the top-level object, passing the namespace and the type name:
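A sketch of that call, assuming the SDO extension is loaded and using the hypothetical AUTHOR_NS constant in place of NAMESPACE. Note the argument order: namespace URI first, then type name.

```php
<?php
define('AUTHOR_NS', 'urn:example:authors'); // hypothetical namespace URI

$df = SDO_DAS_DataFactory::getDataFactory();
$df->addType(AUTHOR_NS, 'Author');

// create() takes the namespace URI first, then the type name.
$author = $df->create(AUTHOR_NS, 'Author');
echo $author->getTypeName(), "\n"; // prints "Author"
```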

Once we have one object, there are two ways to create any child data objects. The straightforward way is to call the data factory again to create an object of the appropriate type, then assign it to the property in its parent object.

Listing 12. Creating and assigning a second data object with a call to the data factory
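A sketch of the two-step approach, assuming the SDO extension is loaded and the Author/Name model built in the earlier steps (rebuilt here so the example runs alone; AUTHOR_NS and the sample value are hypothetical).

```php
<?php
define('AUTHOR_NS', 'urn:example:authors'); // hypothetical namespace URI

// Model from the previous listings: an Author contains a single Name.
$df = SDO_DAS_DataFactory::getDataFactory();
$df->addType(AUTHOR_NS, 'Author');
$df->addType(AUTHOR_NS, 'Name');
$df->addPropertyToType(AUTHOR_NS, 'Name', 'first',
    SDO_TYPE_NAMESPACE_URI, 'String');
$df->addPropertyToType(AUTHOR_NS, 'Author', 'name', AUTHOR_NS, 'Name',
    array('containment' => true));

$author = $df->create(AUTHOR_NS, 'Author');

// Create the child with a second factory call, then attach it to its parent.
$name = $df->create(AUTHOR_NS, 'Name');
$name->first = 'William';
$author->name = $name;
```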

There is also a more common shortcut: call createDataObject() on an existing object, passing the name of the property that is to contain the new object.
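The same result using the shortcut, as a sketch under the same assumptions (SDO extension loaded, hypothetical AUTHOR_NS constant and sample value). Only the property name is passed; the type comes from the model.

```php
<?php
define('AUTHOR_NS', 'urn:example:authors'); // hypothetical namespace URI

$df = SDO_DAS_DataFactory::getDataFactory();
$df->addType(AUTHOR_NS, 'Author');
$df->addType(AUTHOR_NS, 'Name');
$df->addPropertyToType(AUTHOR_NS, 'Name', 'first',
    SDO_TYPE_NAMESPACE_URI, 'String');
$df->addPropertyToType(AUTHOR_NS, 'Author', 'name', AUTHOR_NS, 'Name',
    array('containment' => true));

$author = $df->create(AUTHOR_NS, 'Author');

// The model says Author.name holds a Name, so this creates a Name,
// assigns it to $author->name and returns it in one step.
$name = $author->createDataObject('name');
$name->first = 'William';
```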

This call does a look-up in the model to see what sort of data object the property name can point to, creates a data object of that type, assigns that data object to the given property, then returns the created data object. We are effectively using one data object as a data factory for the objects it refers to. Once you have one data object to start with, this is the common way to create others.
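The constraint mentioned at the start of this section can be seen with a short sketch. It assumes, as we understand the extension to behave, that writing an undeclared property on a data object of a non-open type throws an exception derived from SDO_Exception (SDO_PropertyNotFoundException); AUTHOR_NS, middle, and the sample values are hypothetical.

```php
<?php
define('AUTHOR_NS', 'urn:example:authors'); // hypothetical namespace URI

$df = SDO_DAS_DataFactory::getDataFactory();
$df->addType(AUTHOR_NS, 'Name');
$df->addPropertyToType(AUTHOR_NS, 'Name', 'first',
    SDO_TYPE_NAMESPACE_URI, 'String');

$name = $df->create(AUTHOR_NS, 'Name');
$name->first = 'William';        // allowed: 'first' is in the model

try {
    $name->middle = 'Henry';     // 'middle' is not in the model
    $rejected = false;
} catch (SDO_Exception $e) {     // assumed base class of SDO exceptions
    $rejected = true;
}
```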

Data Access Services

Now let’s see how to do what we have just done in the more conventional way using one of the DASes. First, a little background.

The job of an SDO DAS is to load data from a store and turn it into a data graph for the application program to work with, then to write a data graph back out again. Two DASes are provided for the PHP implementation of SDO: one for working with data in XML and one for working with a relational database. They are each described in detail in their respective chapters of the PHP documentation, so we will just give a short introduction here. You will also see several examples of using the XML DAS in our example in the second half of this article.

Both DASes need to start by initializing an SDO model, which they do by calling the addType() and addPropertyToType() methods you saw. The model needs to match the types, properties, and relationships that exist in the source XML or relational database, and this information needs to be specified to the DAS. The way this is specified is quite different between the two DASes, though.

The XML DAS always initializes its model by reading and parsing an XML schema definition file (an XSD file) that corresponds to the XML instance document it is going to load. The rules for mapping from an XML schema to an SDO model are given in the SDO V2.0 specification document, but essentially any element of complex type becomes an SDO type in the model. The containment relationships between elements are reflected by containment references between SDOs. Simple types such as strings end up as primitive properties on the SDOs. Attributes also become properties.

The schema definition file is usually passed to the XML DAS when it is initialized, using the static create() method on the class, like this:
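A sketch of that initialization, assuming the SDO extension is loaded. The cut-down blog schema (title and description only) and the temporary file name are hypothetical; the schema is written to disk because create() expects a file name.

```php
<?php
// Hypothetical, cut-down blog schema with no target namespace.
$xsd = <<<XSD
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="blog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="description" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;
$xsdFile = sys_get_temp_dir() . '/blog-sketch.xsd';
file_put_contents($xsdFile, $xsd);

// The DAS parses the schema and builds the SDO model from it.
$xmldas = SDO_DAS_XML::create($xsdFile);

// Start a new, empty document whose root element is <blog>.
$doc  = $xmldas->createDocument('blog');
$blog = $doc->getRootDataObject();
```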

Once the model has been defined to the XML DAS, there are loadFile() and saveFile() methods for loading and saving the XML data from or to a file, and loadString() and saveString() for loading and saving it from and to a string. There is also a createDocument() method that creates a document from scratch without needing to start from a loaded file.
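A round-trip sketch with loadString() and saveString(), assuming the SDO extension is loaded; the cut-down schema, temporary file name, and sample document are hypothetical. loadFile() and saveFile() follow the same pattern with file names.

```php
<?php
$xsd = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
     . '<xs:element name="blog"><xs:complexType><xs:sequence>'
     . '<xs:element name="item" minOccurs="0" maxOccurs="unbounded">'
     . '<xs:complexType><xs:sequence>'
     . '<xs:element name="title" type="xs:string"/>'
     . '<xs:element name="description" type="xs:string"/>'
     . '</xs:sequence></xs:complexType></xs:element>'
     . '</xs:sequence></xs:complexType></xs:element></xs:schema>';
$xsdFile = sys_get_temp_dir() . '/blog-sketch.xsd';
file_put_contents($xsdFile, $xsd);
$xmldas = SDO_DAS_XML::create($xsdFile);

// Load an instance document from a string rather than a file.
$doc = $xmldas->loadString(
    '<blog><item><title>Hello World</title>'
  . '<description>First post</description></item></blog>');
$blog = $doc->getRootDataObject();

// Change the data through the SDO API, then serialize it again;
// the optional second argument indents each level by two columns.
$blog->item[0]->title = 'Hello again';
$xml = $xmldas->saveString($doc, 2);
```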

The Relational DAS is initialized quite differently. It uses data supplied by the application program and passed to the DAS in associative arrays when the DAS is created. One parameter is a collection of associative arrays that describe the database: table names, column names, primary keys, and foreign keys. In principle, this information could be obtained automatically from the database; this is not done now, but perhaps a future version of the Relational DAS will do so. The other important information the application must supply defines how the data in the database should be mapped to a data graph: which type should be regarded as the top of the graph, which foreign keys should be interpreted as containment properties, and so on.

Once the Relational DAS has absorbed the model, it can be used to load data from the database into memory as SDOs. You give the Relational DAS an SQL query to execute, and it will issue the query and break the result set into a graph of SDOs. If you then update the objects, perhaps adding to or deleting from the graph, and call applyChanges() on the Relational DAS, it will generate the SQL statements necessary to apply the changes back to the database. Much more has been written about the Relational DAS in the online documentation and other developerWorks articles (see Resources).

My half-baked RSS feed

To illustrate the use of SDO and the XML DAS, we will develop a sample application that uses SDO to do all the XML handling. The application is in two parts: a simple blogging application that saves the contents of the blog as an XML file and another part that reads the XML file and republishes it as an RSS feed. You should find that SDO makes working with XML from PHP about as easy as it can be. See Download for the code.

Blogging application

In structure, our blog and our blogging application are simple. The first script puts up an HTML form on which the user enters a news item with a title and description. Here is the screen as we enter our first item:

Figure 3. Adding an item to the blog


When we submit this entry, the second script, which is the target of the form on the first, writes the data to the blog and puts up a screen confirming that the item has been added.

Figure 4. Confirmation


We will look at the scripts, but first, take a look at our blog with just this one entry in it:

Listing 13. Blog with one item

There is one top-level element, <blog>, that contains a number of <item>s. Each item has a title, the main text, a date, and a unique key field called guid (for globally unique ID), which is assigned when the item is created. Although we do not need the guid for the blog itself, it will be needed later as part of the RSS feed. We create the guid as a hash of the date and time the item was entered. To get some idea of who has written to the blog, we also capture the IP address of the originating site.
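The guid and IP capture can be sketched in plain PHP. The helper name makeItemStamp is hypothetical, and md5() is an assumed choice: the text says only that the guid is a hash of the entry date and time. DATE_RSS gives the RFC 822 date format RSS 2.0 uses.

```php
<?php
// Hypothetical helper: stamps a new blog item with its date, a guid
// derived from that date (md5 is an assumed hash), and the client IP.
function makeItemStamp(array $server, $now = null)
{
    $now  = ($now === null) ? time() : $now;
    $date = date(DATE_RSS, $now);   // RFC 822 date, as RSS 2.0 expects
    return array(
        'date' => $date,
        'guid' => md5($date),
        // Address of the client that submitted the form, if known.
        'ip'   => isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : 'unknown',
    );
}

// In additem.php this would be called as makeItemStamp($_SERVER).
```

Because the hash input has one-second resolution, two items submitted within the same second would share a guid; mixing in something like uniqid() would avoid that.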

Since the blog is cumulative and we want to read in a blog like the one above, add an item, and write it out, we use the SDO XML DAS. You will recall that the XML DAS gets its information about the model the SDOs must follow by reading an XML schema file. Accordingly, we must provide one.

Listing 14. XML schema for the blog

You should be able to see that this schema file does indeed correspond to the instance document above. Our blog has a document element called <blog>, containing an unbounded sequence of items with title, description, and so on.

Application code

Here is the HTML page that puts up a form to accept the title and description (no PHP or SDO is needed for this one):

Listing 15. HTML page to capture item for blog

The form submits to the second script, additem.php:

Listing 16. PHP script to add item to blog

This second script shows our first use of SDO and the XML DAS, so it is worth looking at, though there should be few surprises.

The first thing we do is use the SDO_DAS_XML::create() static method to initialize an XML DAS with the schema file containing the description of our blog. The XML DAS will parse the schema file, decide what the SDO model of types and properties looks like, and make calls to addType() and addPropertyToType() on a data factory to initialize it with the SDO model.

The subsequent call to loadFile() then reads in and parses the XML instance document, returning an object to represent the document. Suppose we are adding a second item, and the item we added above with the title Hello World is already saved. Under the covers, loadFile() has made repeated calls to createDataObject() to construct the SDOs. Since we are assuming we are loading a blog with one item in it, the XML DAS will have created one SDO of type blog and one of type item. The blog data object will contain a many-valued containment reference property called item, which points to a list containing the one item. When we call getRootDataObject() on the document object, we get the SDO representing the document element, <blog>. If we were to use var_dump() on it, we would see:

Listing 17. The blog as displayed by var_dump()

You should find that the correspondence between the instance document, the schema, and the SDO data graph is clear.

Now, in the lines that follow, we create a new item and copy in the title and description entered on the HTML form. At this point, we also get the current date and time, generate the guid, and save the IP address from which the item was entered. Note that the application creates the new item by calling createDataObject() on the blog SDO, passing the property name item. As explained, this means that under the covers the model is inspected and it is found that the property item is intended to take SDOs of type item (the first is a property name, the second a type name); a data object of type item is then created and assigned to the many-valued containment reference property item in the SDO $blog.

Finally, we call saveFile() to write the blog back to the XML file it came from.
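The steps of additem.php described above can be sketched as a single function, assuming the SDO extension is loaded. The function name addBlogItem, the element names pubDate and ip, the use of md5() for the guid, and the file names in the usage comment are all hypothetical; the real names are in Listings 14 and 16.

```php
<?php
// Hypothetical sketch of additem.php: load the blog, append one item,
// and save the blog back to the same file. Returns the new item's guid.
function addBlogItem($xsdFile, $blogFile, $title, $description, $ip)
{
    // Build the SDO model from the schema, then load the existing blog.
    $xmldas = SDO_DAS_XML::create($xsdFile);
    $doc    = $xmldas->loadFile($blogFile);
    $blog   = $doc->getRootDataObject();

    // The model says blog.item holds items, so this creates an item and
    // appends it to the many-valued containment property.
    $item = $blog->createDataObject('item');
    $item->title       = $title;
    $item->description = $description;
    $date              = date(DATE_RSS);
    $item->pubDate     = $date;          // assumed element name
    $item->guid        = md5($date);     // assumed choice of hash
    $item->ip          = $ip;            // assumed element name

    // Write the whole blog back to the file it came from.
    $xmldas->saveFile($doc, $blogFile);
    return $item->guid;
}

// In the script itself (form field names hypothetical):
// addBlogItem('blog.xsd', 'blog.xml', $_POST['title'],
//             $_POST['description'], $_SERVER['REMOTE_ADDR']);
```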

Suppose we add a second item with the title “What next?” The blog might now look like this:

Listing 18. Blog with the newly added item

Introduction to RSS

RSS

No article touching on RSS would be complete without saying something on versions of RSS. There are two main strands of RSS, and they differ substantially from one another. The version numbers may surprise, too: 0.92 and 2.0 belong to one camp, while 1.0 belongs to the other. Many feeds we find on the Web are 2.0, and this is what we will use. If you are interested in knowing more about the history of RSS, we recommend the O’Reilly book Developing Feeds with RSS and Atom, by Ben Hammersley, which starts with a detailed history (see Resources).

Here is the briefest of introductions to RSS. Although terse, it might help you understand the structure of the XML document we are trying to create, especially if you have not looked at the contents of an RSS feed before.

An RSS feed is just an XML document intended to provide a summary of some or all of a Web site’s contents. An RSS feed typically contains a list of recent articles available on the site, and for each of them a title, a brief summary, and a link to the main article. For our example, we will stick with this news-oriented paradigm and produce an RSS feed from our blog. But once you see the code that generates the feed, you will quickly appreciate that it would be equally simple to produce a summary like this from almost anything that has a simple structure of a list of items.

It is unusual to view the XML file as it stands. Instead, people use a so-called feed reader, a simple program that reads and formats the feed. There are dozens of feed readers. We tested our application with a small but representative sample of Windows® feed readers. Some have quirks, and they do not always agree exactly on how to interpret a feed, so we stuck to a subset of RSS and used it in a way that all feed readers handled properly. The feed readers we used are all free to download. They were Awasu Personal Edition, SharpReader, Mozilla’s Thunderbird, and the Live Bookmarks feature in Mozilla Firefox. Of these, we found Awasu the most useful for developing this application, partly because it will go and reread a feed on demand and partly because it will also display the feed exactly as it originally received it.

Schema file for RSS

To work with an XML document and the XML DAS, you need to start with an XML schema file, which the DAS uses to build the SDO model. There is no officially sanctioned schema file for RSS V2.0, although the specification and sample feeds can be found on the Harvard Law site (see Resources). If you search the Web for RSS20.xsd, you will find a schema file for RSS V2.0. However, the one we found was quite long, contained many elements we did not want to use, and did not generate the feed exactly as we wanted. As a result, we wrote a much-simplified schema to define just the parts of RSS we wanted:

Listing 19. A simplified schema for subset of RSS V2.0

If you are not used to looking at XML schemas, this may be daunting, but it says:

  • A feed consists of a single RSS element.
  • The RSS element contains a number of channel elements.
  • Each channel element contains a number of item elements.
  • The RSS element has a single attribute: version.
  • A channel element contains a number of other elements that occur only once, such as copyright information, as well as the list of items.
  • Each item has a title element, a description, a link that will contain a URL to the referenced article, a publication date, and the guid.

The meaning of the guid is described in the RSS specification: it is a string that should be unique to its article. The string can be an opaque identifier, or it can be a URL pointing to the article, in which case the isPermaLink attribute should be set to true and the feed reader interprets it as a link. We have chosen to generate an opaque identifier and use the <link> element to point back to the article.

Producing the feed

We will describe the application that produces the RSS feed from the blog later. First, let’s look at what the feed would look like if the blog we are producing the feed for has just the two items we added above. With luck, you will be able to see how this structure corresponds to the XML schema we wrote for our RSS feed.

Listing 20. The blog when output as an RSS feed

This is what this feed will look like when displayed by the Awasu Personal Edition feed reader:

Figure 5. The feed displayed in Awasu


If you click on the titles Hello World or What next? — which have the small document icons next to them — Awasu will follow the URL in the <link> element of the item. We will show this below.

Generating the feed

Now let’s see the PHP script that generates this feed:

Listing 21. The PHP script that generates feed from the blog

Our aim here is to construct a data object for the RSS feed, then read in our blog and create a corresponding item data object in the RSS feed for each blog item. We could decide to limit ourselves to only a few recent articles and cut off the list at a certain date, but here, we just copy all the items.

First, we write out the appropriate HTTP header to indicate that an XML document follows. Then we construct an XML DAS using our RSS schema and load an XML file to start us off. This XML file contains a few settings that probably do not change from one generation of the feed to another — copyright statement, title of the feed, and so on. This is just a tidy place to keep these settings. We could just as easily have hardcoded them as assignments to the properties of the channel data object in the script.

After initializing the properties of the RSS feed in this way, we acquire the RSS data object and set a few more properties — those relating to the time of generation. You can find out what any of these mean in the RSS V2.0 specification.

The next three statements open and load the blog and acquire the blog data object from the document. We use a second XML DAS to do this. This is perfectly all right: we now have two DASes, each loaded with a different model, and as long as we do not expect one DAS to understand the data objects created by the other, no problems should occur. Incidentally, it might have been possible to load both schema files into the one DAS (a DAS can manage many schema files), but that would have meant putting the type names they have in common (title, description, item, guid) in separate namespaces. We chose to keep them in separate DASes since we did not want to illustrate the use of namespaces.

Now we have both data graphs loaded in memory. Finishing off the RSS feed is a matter of iterating through the items within the blog, and for each of them, creating a corresponding item within the feed. Note that we cannot copy the items themselves, for not only is the structure of an item in the blog slightly different from that in the feed but we would be copying an item created by one DAS into a graph created by another, which is not allowed.

Once we have all the information in the feed as we want it, the application writes out the entire XML document as a string with saveString(). So that the output is formatted neatly should a person read it, we pass saveString() an indentation argument, which indents the text by two columns at each level.
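The heart of the feed script, copying each blog item into a fresh feed item, might look like the following sketch, assuming the SDO extension is loaded. The function name copyBlogItemsToChannel, the pubDate property, and the id query parameter are hypothetical, and the sketch treats guid as a plain string, leaving out the isPermaLink attribute the article's RSS schema carries.

```php
<?php
// Hypothetical helper: $blog and $channel are SDO data objects loaded
// by two different XML DASes; $linkBase is the URL of showitem.php.
function copyBlogItemsToChannel($blog, $channel, $linkBase)
{
    foreach ($blog->item as $blogItem) {
        // Data objects cannot be moved between the two graphs, so create
        // a new item inside the channel and copy the values one by one.
        $feedItem = $channel->createDataObject('item');
        $feedItem->title       = $blogItem->title;
        $feedItem->description = $blogItem->description;
        $feedItem->pubDate     = $blogItem->pubDate;
        $feedItem->guid        = $blogItem->guid;
        // The guid on the end of the link tells showitem.php which item
        // to display ('id' is an assumed parameter name).
        $feedItem->link = $linkBase . '?id=' . urlencode($blogItem->guid);
    }
}

// In the feed script, after loading both documents:
// header('Content-Type: text/xml');
// copyBlogItemsToChannel($blog, $rss->channel[0], 'http://localhost/rss/showitem.php');
// echo $rssDas->saveString($rssDoc, 2);
```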

Here is the base XML file to load:

Listing 22. Base file that contains a few hardcoded values for the feed

A script to show an individual item

The last piece of the puzzle concerns the <link> element in the feed. The <link> element for each item points to a URL http://localhost/rss/showitem.php?… Each of the feed readers interprets the <link> element as a link to the real article. In our case, it just goes to a simple page that displays what we have on the item. Notice that the guid is on the end of the <link> value, so showitem.php knows which item to display. The script opens the blog, extracts the item with the corresponding guid, and formats it. The code follows:

Listing 23. PHP script to show a single given item

This script creates a DAS with an XSD, loads a document, and gets the root data object, a pattern that should by now be familiar. There is just one more aspect of SDO not yet described: the use of an XPath-like expression to find the item we want within the data graph. The whole blog will have been loaded into memory as a data graph, and the expression ["item[guid=$id]"] (PHP will already have substituted the actual value of $id into the double-quoted string) is interpreted as an XPath search string, so $item is assigned the item with the matching ID. SDO implements a capable subset of XPath.
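A sketch of that lookup as a function, assuming the SDO extension is loaded. The name findItemByGuid is hypothetical, the query quotes the value as in the PHP manual's SDO examples, and the hex check assumes guids are hexadecimal hashes. That check keeps a crafted request parameter from changing the meaning of the query string.

```php
<?php
// Hypothetical helper for showitem.php: $blog is the root data object
// loaded from the blog file, $id the guid taken from the request.
function findItemByGuid($blog, $id)
{
    // Assumes guids are hex strings (as an md5 hash would be); anything
    // else is rejected before it reaches the query.
    if (!is_string($id) || !ctype_xdigit($id)) {
        return null;
    }
    // XPath-like query: the item whose guid child equals $id.
    return $blog["item[guid=\"$id\"]"];
}

// In the script: $item = findItemByGuid($blog, $_GET['id']);
// ('id' is an assumed parameter name.)
```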

Here is what it looks like if we follow the link from the first item, again shown using Awasu:

Figure 6. The first item displayed in Awasu


Getting it running

Perhaps you have been content just to read along, but if you want to download the code, get it running on your machine, and perhaps tinker with it, the following tips may help.

First, note that in the PHP script that generates the feed, the link is specified with a full URL. The script expects all the files to be installed in an rss subdirectory of your Web server’s document root — for example, htdocs/rss with Apache.

The various feed readers mentioned do not all implement the same logic for interpreting the guid and link, especially if the link is not specified or if isPermaLink is set to true or is not specified. Under some circumstances, we saw SharpReader take the guid and put it on the end of the address of the site that delivered the feed, in order to construct a link. We also saw Awasu simply put http:// on the front of the guid. The scheme we ended up with, described above, of a link and a guid with isPermaLink set to false, works fine with all our feed readers.

Some of the feed readers have what you might call a mailbox mentality, so that if they have once seen an item with a given guid, that item will keep appearing in the view they show of the feed, even if the latest version no longer has that item. Similar confusion can occur if two items end up with the same guid, or the guid of an item changes. It is best not to do that, but when experimenting, these things can happen, and they can be difficult to clear up.

Usually, deleting the feed from the feed reader is enough, but we found that for Thunderbird, the only way to completely clear out the history of a feed was to delete the directories corresponding to the RSS News & Blogs account under Thunderbird’s mail directory.
Note: Take utmost care not to delete anything else you really wanted to keep, such as your mail directory.

You can at least confirm that your RSS feed is being served by checking the Web server's log. If you're working on a UNIX® system, or on a Windows system with the MKS Toolkit or Cygwin installed, you can also watch the log with tail -f.

Conclusion

Our aim was to introduce you to Service Data Objects, to show you the API for working with them, and to illustrate that they make a convenient and natural way to work with XML data from PHP, expressing the structured data from an XML document in a natural form. Although we are not deceived into thinking that we have written the most advanced blogging application possible, or the most advanced RSS feed generator, perhaps you found the application interesting.

You will have seen that to use XML with SDO, you do need an XML schema file from which the XML DAS can initialize the model of types and properties, and perhaps you found that off-putting. But once you have a schema, the SDO API polices all assignments and alterations to the data graph, ensuring that any changes you make to the objects and the data graph will form a document consistent with the schema: a sort of on-the-fly schema validation. Perhaps you will find this useful.

We asserted that one of the original objectives of SDO was to provide a way of working with structured data that is independent of the source of the data. We have not illustrated the use of the Relational DAS, but we will leave you with this thought: had we wished to, we could have written the application above to work with data in a relational database rather than in XML. Had we done so, we would have needed to initialize a Relational DAS rather than an XML DAS, and we would have done that in a different way, but all the rest of the SDO manipulation would have been identical.

We wish you success in using SDO to work with structured data in PHP.


Resources

Learn

Get products and technologies