Streamline working with XML in PHP using Service Data Objects - Part 1

Explore SDO by building a simple blog and RSS feed
Matthew Peters (matthew_peters@uk.ibm.com), Software Engineer, IBM U.K. Laboratories
Caroline Maynard (caroline.maynard@uk.ibm.com), Software Engineer, IBM U.K. Laboratories
Graham Charters (charters@uk.ibm.com), Senior Software Engineer, IBM U.K. Laboratories
12 May 2006
Most PHP programmers will know that much of the functionality they use resides in PHP extensions, which usually either come packaged with their PHP distribution or can be downloaded from the PECL site. One such extension supports Service Data Objects (SDO) for PHP, which in February moved from a beta-level 0.9.0 release to a stable 1.0. Written by some of the original developers of the SDO extension, this article is aimed at the PHP programmer who wants to understand what SDO for PHP is, how it can be used, and how it can streamline working with XML.
We start by looking at SDOs and their associated interface, which should give you a clear idea of the API the SDO extension provides. We then move on to show a working example of using SDOs in a two-part application comprising a small PHP application to implement a simple Web log (blog) and a part that displays that blog as an RSS feed. Both parts use SDOs as a way of working with XML. We hope you will agree that SDO is an attractive option for working with XML data in PHP.
Let's begin by pulling apart the term Service Data Object (SDO) to see how the name came about.
SDOs are PHP V5 objects. Unlike ordinary PHP V5 objects, SDOs are intended only to carry data and not to have application methods or functionality defined on them. Hence, they are Data Objects. They were devised as a way of making data available to an application program while making the format independent of its original source, so the data would be structured and manipulated in the same way regardless of whether it came from a relational database or XML. In some loose way, this made them Service Data Objects. Today we think of them as useful in service-oriented applications: When data with a complex structure needs to be exchanged between two components in a service-oriented application, SDOs are a good way to do it.
SDOs are a generalization of the Data Transfer Object pattern. If you search for "Data Transfer Object" online, you will quickly come across the definition in Martin Fowler's Patterns of Enterprise Application Architecture: "An object that carries data between processes in order to reduce the number of method calls." A Data Transfer Object is a way to package a collection of values in a single object so they can be passed around economically.
SDOs extend the Data Transfer Object pattern in several ways, making something more powerful than the simple pattern. They still have this in common with Data Transfer Objects, though: They are only carriers of data and do not have application methods or business logic defined for them.
There are three important ways in which SDOs add to the basic Data Transfer Object pattern:
- They come not just as individual data objects but also as collections of objects that can refer to one another. This makes them useful for representing the kind of structured data that when stored would commonly be held in an XML file.
- A collection of SDOs that has been altered also maintains a record of its original values, enabling certain sorts of optimistic locking algorithms.
- SDOs are objects that only have meaning when they are in memory or when serialized and in transmission, but data that remains only in memory or in transmission and never makes it out to a back-end store is not always all that useful. The PHP implementation of SDO includes two so-called Data Access Services (DASes), which have the job of getting data from some back-end store and turning it into a graph of SDOs or putting data from a graph of SDOs back to the store again. One DAS is for working with data in relational databases, and one is for working with data in XML files.
Incidentally, the implementation of SDO for PHP is one of a family in that there are also implementations for C++ and for Java™ technology.
An SDO is a PHP V5 object. Like any PHP V5 object, an SDO has a set of properties, but unlike a normal PHP V5 object, an SDO is just a container for data values and cannot have application methods defined for it. You do not code a class definition for an SDO, nor do you call a constructor to create one.
Leaving aside for a moment the question of how they are created, if we were to use the var_dump() function on an SDO, we would see that it is an instance of SDO_DataObjectImpl, and we would see the property names and their values. Suppose we have an SDO called $author. If we use var_dump() on it, we might see:
Listing 1. var_dump() of an SDO
|
This data object has three properties that are all PHP strings: name, dob (date of birth), and pob (place of birth). An SDO property can hold any of the simple PHP scalar types (string, integer, float, or Boolean), NULL, or a reference to another SDO. The type of any given property is fixed, and if a value of a different type is assigned, it will be converted. For example, assigning an integer value to a string property will cause the integer to be converted to a string.
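The fixed-type behavior can be approximated in plain PHP, which may help if the SDO extension is not at hand. The sketch below is only an analogy: AuthorRecord is a hypothetical ordinary class with typed properties (PHP 7.4 or later, without strict_types), not an SDO, and the SDO extension's own conversion rules may differ in detail from PHP's type coercion.

```php
<?php
// Stand-in only: an ordinary PHP class, not an SDO.
// Requires PHP 7.4+ and must run without declare(strict_types=1).
class AuthorRecord
{
    public string $name = '';
    public string $dob = '';
}

$author = new AuthorRecord();
$author->name = 'A. N. Author';  // hypothetical value
$author->dob  = 1564;            // an integer assigned to a string property

var_dump($author->dob);          // the stored value is now a string
```

The integer is coerced to a string on assignment, so the property keeps its declared type whatever scalar is assigned to it.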
All SDOs have a PHP type of SDO_DataObjectImpl, but SDOs also have an SDO type name. The var_dump() function shows the contents of the data object, but does not show its SDO type name. To see this, we use the getTypeName() method on the object, which in this case prints like this: Author.
We'll see later where this type name was defined to SDO, as well as how the set of properties was specified.
The PHP print instruction on this object, incidentally, produces the same information as var_dump() in a slightly more compact form, all on one line (which we have split into five for readability).
Listing 2. print() of an SDO
|
Setting and getting properties
SDOs support object syntax and associative array syntax, so the properties could have been set with object syntax:
|
or with associative array syntax:
|
Of course, we can get values from the data object, as well as set them using the same two forms:
|
or
|
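As a rough illustration of the two syntaxes, PHP's built-in ArrayObject class can be made to accept both forms. This is a stand-in chosen for the example, not an SDO, and the property values are hypothetical.

```php
<?php
// Stand-in only: ArrayObject is a built-in PHP class, not an SDO.
// The ARRAY_AS_PROPS flag lets entries be reached with either syntax.
$author = new ArrayObject([], ArrayObject::ARRAY_AS_PROPS);

$author->name  = 'A. N. Author';   // set with object syntax
$author['pob'] = 'Somewhere';      // set with associative array syntax

echo $author['name'], "\n";        // get with associative array syntax
echo $author->pob, "\n";           // get with object syntax
```

Either form reads back what the other form wrote, which is the interchangeability described for SDO properties.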
Many-valued and single-valued properties
A property can be many-valued or single-valued. If a property is many-valued, var_dump() will show it pointing to an SDO_DataObjectList object, which is a list of the individual values. Here is an example where a works property has been added to the Author type and has been defined to comprise a list of strings:
Listing 3. var_dump() of an SDO with a many-valued property
|
In this case, the more succinct print instruction merely indicates that there is a many-valued property called works, but does not expand the property.
Listing 4. Print of an SDO with a many-valued property
|
To see the contents of the works property, you would need to print it separately.
Because SDO_DataObjectList implements the PHP ArrayAccess interface, many-valued properties behave much like PHP arrays. For example, we might have added these two works to the list:
|
And if we wanted to iterate through the values and print them, we might do something like this:
|
A note of caution: An SDO_DataObjectList does not behave exactly like a PHP array. For example, unset on an item in a list will cause the indices of all subsequent items to be shuffled up; there are no holes allowed in the set of indices.
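The difference from a PHP array can be seen with a small stand-in. The CompactList class below is hypothetical and written for this illustration only. It is not SDO_DataObjectList, but it copies the "no holes" rule by renumbering on unset. It assumes PHP 8.0 or later.

```php
<?php
// Stand-in only: mimics the renumbering described above.
// It is not the SDO extension's list class. Assumes PHP 8.0+.
final class CompactList implements ArrayAccess, Countable, IteratorAggregate
{
    private array $items = [];

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->items[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->items[$offset];
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        if ($offset === null) {
            $this->items[] = $value;         // $list[] = ... appends
        } else {
            $this->items[$offset] = $value;  // simplification: no bounds check
        }
    }

    public function offsetUnset(mixed $offset): void
    {
        // array_splice removes the item and renumbers the ones after it.
        array_splice($this->items, $offset, 1);
    }

    public function count(): int
    {
        return count($this->items);
    }

    public function getIterator(): Traversable
    {
        return new ArrayIterator($this->items);
    }
}

$plain = ['first', 'second', 'third'];
unset($plain[0]);               // a plain array keeps keys 1 and 2

$list = new CompactList();
$list[] = 'first';
$list[] = 'second';
$list[] = 'third';
unset($list[0]);                // the stand-in renumbers to 0 and 1

var_dump(array_keys($plain));   // keys left in the plain array
var_dump($list[0]);             // item now at index 0 of the stand-in
```

Code that caches an index across an unset is therefore safe with a plain array but not with a list that behaves this way.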
We have seen that var_dump() and print show the current set of properties of any given SDO. To get the most complete picture of an SDO -- to see the names and types of properties that have not been set, for instance -- there is a reflection API described in the SDO section of the PHP manual (see Resources).
Connecting SDOs: More complex structures
SDOs are for representing structured data, and a single SDO on its own can only represent so much. SDOs become much more capable when they are connected to form a graph of objects.
SDOs are connected by references from one to another. The reference from one SDO to another is also a property of that SDO. In our previous examples, the properties were all primitives or a list of primitives, but a property may also be a reference to another SDO or a list of references.
Here's a simple example: Once again, we have used var_dump() to dump an SDO called $author, but this time, the name property was defined to be not just a simple string but instead a reference to another SDO. This second SDO represents the author's name with two properties of its own: first and last. When we dump the author object, we see both objects because var_dump() follows the reference from the author object to the name object:
Listing 5. var_dump() of a pair of connected SDOs
|
Leaving aside the matter of how the two SDOs are created, the second SDO would have been assigned to the name property of the author SDO in the same way as the strings we saw before. The whole sequence might have looked like this:
Listing 6. Assignment of a reference using object syntax
|
or like this:
Listing 7. Assignment of a reference using associative array syntax
|
where $author and $name are both SDOs.
It is helpful to distinguish clearly the different uses of the word name here and be clear on what is an SDO type name and what is a property name. In a manner that has not been described yet, $author and $name have been created as objects of a certain SDO type. If we call getTypeName() on them, we will see the type names. Let us suppose they are Author and Name. Now, the $author object has a property called name. This has been defined so you can only assign to it objects of SDO type Name. If you attempt to assign an SDO of a different SDO type (another author, perhaps) or a primitive, SDO will throw an exception. There is no type conversion for SDO reference properties. There is inheritance, though: SDO understands an inheritance hierarchy among types, and given a property expecting a given type, you can assign a subtype.
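PHP's own typed properties follow a similar rule for object types, which gives a loose analogy for reference properties. In the sketch below, NameNode, FormalNameNode, and AuthorNode are hypothetical plain classes, not SDO types. Note also that PHP signals the failure with a TypeError, whereas the text above describes SDO throwing an exception. It assumes PHP 8.0 or later.

```php
<?php
// Stand-in only: ordinary PHP classes with typed properties, not SDOs.
class NameNode
{
    public string $first = '';
    public string $last = '';
}

class FormalNameNode extends NameNode   // a subtype of NameNode
{
    public string $title = '';
}

class AuthorNode
{
    public ?NameNode $name = null;      // accepts NameNode or a subtype
}

$author = new AuthorNode();
$author->name = new NameNode();         // exact type: accepted
$author->name = new FormalNameNode();   // subtype: accepted

$rejected = [];
foreach ([new AuthorNode(), 'just a string'] as $wrongValue) {
    try {
        $author->name = $wrongValue;    // wrong type or primitive
    } catch (TypeError $e) {
        $rejected[] = get_debug_type($wrongValue);
    }
}

print_r($rejected);                     // the types that were refused
```

Both the unrelated object and the primitive are refused without any conversion, and the property keeps the last valid value assigned to it.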
In sum: There is an object called $name, it has an SDO type of Name, and it is assigned to a property of $author called name.
So far so good, but there are two important aspects to SDO reference properties we must describe.
Many- and single-valued reference properties
We saw the first aspect already when we looked at properties containing primitives: Reference properties can be many-valued or single-valued. A many-valued reference property will point to an SDO_DataObjectList object that will be a list of data objects. These data objects will all be of the same type. To illustrate this, suppose that we make the list of the author's works into a many-valued reference to SDOs of type Work, where SDOs of type Work have a title and rough date of composition. Suppose also that the author keeps its single-valued reference to an SDO of type Name. In this case the structure of the graph of SDOs might look like this:
Figure 1. SDOs illustrating many- and single-valued reference properties

Containment and noncontainment reference properties
The second important aspect to reference properties is that they can be containment or noncontainment references.
The notion of containment vs. noncontainment is specific to reference properties and needs some explanation. As mentioned, one goal of SDO is that it is possible to represent the sort of structured data usually stored as XML. The basic structure of a well-formed XML document is a hierarchy: There is a single element at the top of the hierarchy, usually called the root element or document element, which typically contains other elements that probably contain other elements in turn. Since any given element is contained within the start and end tag of its containing element, the structure has to form a tree. Let's illustrate this with the author and name example. This is one way the author and name objects on which we used var_dump above might have been represented in XML.
Listing 8. XML instance document illustrating containment
|
The simple <first> and <last> elements are contained within the <name> element, which is in turn contained within the <author> element. The simple elements <first> and <last> are modeled as primitive (string) properties of the name SDO. The fact that the <name> element is contained within the <author> element is modeled as a containment reference property from the name property of the author SDO to the name SDO.
So, if these inclusion-style relationships are what SDO calls containment references, what is a noncontainment reference? Although not all applications use them, XML also allows a way to express links between elements that are independent of the containment hierarchy, using XML IDs and IDREFs. It is these extra relationships that SDO models as noncontainment references.
To illustrate this, we need a new example and there is one that is used in a number of places in other documents on SDO. The example is that of a company that contains departments, which in turn contain employees. Here is the example expressed as an XML document:
Listing 9. XML instance document illustrating an ID/IDREF relationship (noncontainment)
|
You will see how the XML models the simple containment hierarchy where a single company element contains a departments element, which in turn contains three employees elements. The departments and employees elements are perhaps unfortunately named; singular rather than plural might have been better. When this document is loaded into memory as a graph of SDOs, the graph will contain five data objects and two data object lists. There will be one company data object, one department data object, and three employee data objects. There will be two list objects -- one for the collection of departments (even though there is only one department, the property is many-valued), and one for the collection of employees. The company object will have a departments property that will point to a list of department objects (containing just one in this case). The department will have an employees property that points to the list of employee objects.
Each data object also has one or more primitive properties; company has a name property derived from the name attribute and has the value 'MegaCorp', the department and employee data objects also have name properties, and so on.
Notice the employeeOfTheMonth attribute on the company element. You will see it contains the serial number of one of the employees. Although we have not shown you the XML schema for this XML document that connects the serial number attribute to the employeeOfTheMonth attribute, this is a use of the XML ID and IDREF. The SN (serial number) is defined as an ID field, and the employeeOfTheMonth as an IDREF. This represents a relationship between these two elements that is independent of the main hierarchy.
We model these ID and IDREF attributes in SDO with noncontainment references. Like ID and IDREF attributes in XML, noncontainment relationships are like an extra layer over the containment hierarchy. It is a rule of SDOs that just as an IDREF cannot refer to an XML element that does not exist somewhere in the document, any SDO in a graph reachable by a noncontainment reference must also be reachable by a containment reference from the root. You cannot point to an SDO only with a noncontainment reference. This aspect of graphs of SDOs, called closure, is checked whenever SDOs are written out to storage by any of the DASes.
Here is a diagram to illustrate the principles. Note how the noncontainment reference is independent of the containment hierarchy. Note that the employee reachable by following the noncontainment relationship is also reachable by following containment references from the root element.
Figure 2. SDOs illustrating containment and noncontainment reference properties

Like containment references, noncontainment references can be many- or single-valued.
They are used about as much as XML's ID and IDREFs, which is to say that some applications will use them a lot, and some never.
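The closure rule described above can be sketched as a simple check: gather every object reachable through containment from the root, then confirm that every noncontainment target is among them. The code below is a plain-PHP illustration under stated assumptions. The array layout, the function names, and all ids are hypothetical, and this is not how the SDO extension or its DASes implement the check.

```php
<?php
// Stand-in only: plain arrays model the graph. All ids are hypothetical.

/** Build a node with contained children and noncontainment target ids. */
function node(string $id, array $contains = [], array $refersTo = []): array
{
    return ['id' => $id, 'contains' => $contains, 'refersTo' => $refersTo];
}

/** Collect the ids of every node reachable through containment. */
function containedIds(array $node): array
{
    $ids = [$node['id']];
    foreach ($node['contains'] as $child) {
        $ids = array_merge($ids, containedIds($child));
    }
    return $ids;
}

/** Collect every noncontainment target id mentioned in the tree. */
function referencedIds(array $node): array
{
    $ids = $node['refersTo'];
    foreach ($node['contains'] as $child) {
        $ids = array_merge($ids, referencedIds($child));
    }
    return $ids;
}

/** A graph is closed when every referenced id is also contained. */
function isClosed(array $root): bool
{
    return array_diff(referencedIds($root), containedIds($root)) === [];
}

// One company containing one department containing three employees.
$department = node('dept-1', [node('emp-1'), node('emp-2'), node('emp-3')]);

$closed = node('company', [$department], ['emp-2']);  // target is contained
$open   = node('company', [$department], ['emp-9']);  // target is missing

var_dump(isClosed($closed), isClosed($open));
```

The first graph passes because the referenced employee is also reachable from the root by containment. The second fails because its noncontainment reference points outside the tree.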