Introducing Service Data Objects for PHP


Introduction
Overview
SDO concepts
•  Type and Property
•  DataObject
•  Sequence
Data Access Services
Why choose Service Data Objects?
•  Reduced database overhead
•  Single API for data
•  Knowledge of the structure
Relationship to PHP Data Objects and SimpleXML
Contact scenario
•  Contact edit use case
•  Retrieving the contact entry
•  More on SDO navigation
•  Modifying the data
•  More on SDO modification
Summary
Resources
About the Authors

Introduction

Service Data Objects (SDOs) have been around in the Java technology world since November 2003. They are designed as a means of simplifying and unifying working with heterogeneous data sources. In February 2005, IBM and Zend announced a strategic partnership to collaborate on the development and support of the PHP environment. One aspect of this collaboration has been the definition and implementation of SDOs for PHP. This article gives an overview of SDOs and the motivations for using them in the PHP environment. A simple contact management scenario is used to illustrate key concepts.

Overview

Service Data Objects (SDOs) are designed to simplify and unify the way applications handle data. Using SDOs, application programmers can uniformly access and manipulate data from heterogeneous data sources, including relational databases, XML data sources, Web services, and other such enterprise information systems.

SDOs are based on the concept of disconnected data graphs. A data graph is a collection of data objects. Under the disconnected data graphs architecture, a client retrieves a data graph from a data source, changes the data graph, and can then apply the data graph updates back to the data source.

The task of connecting applications to data sources is performed by a Data Access Service (DAS) (see Figure 1). Client applications can query a DAS and get a data graph in response, modify the data graph and send the updated data graph to a DAS to have the changes applied to the original data source. Clients may also use DASes to read from one data source and write to another -- for example, reading an XML RSS feed and writing the results to a relational database. This SDO/DAS architecture allows applications to deal principally with data graphs and data objects.

Figure 1. Role of a DAS

The PHP implementation of SDOs involves mapping to a dynamic and weakly typed language. The result is a greatly simplified API, with the collapsing of the many type-specific getters/setters. In addition to this, SDOs also exploit PHP's ability to implement objects that can be manipulated as if they were arrays, including support for iteration, state testing and unsetting. The result is a powerful data object technology with a simple, intuitive interface.
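
The array-like behavior described above can be approximated in plain PHP. The following is a minimal sketch, assuming PHP 8; the class name SketchDataObject is hypothetical and this is not the SDO extension's own implementation. It only shows how one object can support property syntax, array syntax, iteration, state testing, and unsetting.

```php
<?php
// Hypothetical stand-in class, not the SDO extension's own data object.
class SketchDataObject implements ArrayAccess, IteratorAggregate
{
    private array $values = [];

    // Object property syntax: $obj->name
    public function __get(string $name): mixed { return $this->values[$name] ?? null; }
    public function __set(string $name, mixed $value): void { $this->values[$name] = $value; }
    public function __isset(string $name): bool { return isset($this->values[$name]); }
    public function __unset(string $name): void { unset($this->values[$name]); }

    // Array syntax: $obj['name'] by property name, $obj[0] by position
    public function offsetExists(mixed $offset): bool { return isset($this->values[$this->nameOf($offset)]); }
    public function offsetGet(mixed $offset): mixed { return $this->values[$this->nameOf($offset)] ?? null; }
    public function offsetSet(mixed $offset, mixed $value): void { $this->values[$this->nameOf($offset)] = $value; }
    public function offsetUnset(mixed $offset): void { unset($this->values[$this->nameOf($offset)]); }

    // Iteration: foreach ($obj as $name => $value)
    public function getIterator(): Iterator { return new ArrayIterator($this->values); }

    // Map an integer position to the property name stored at that position.
    private function nameOf(mixed $offset): string
    {
        if (is_int($offset)) {
            return array_keys($this->values)[$offset] ?? '';
        }
        return (string) $offset;
    }
}

$person = new SketchDataObject();
$person->name = 'Fred Fish';      // set via property syntax
$person['age'] = 35;              // set via array syntax
echo $person[0], "\n";            // position 0 is the "name" property
foreach ($person as $name => $value) {
    echo "$name: $value\n";
}
unset($person['age']);            // unsetting works through either syntax
var_dump(isset($person->age));
```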

SDO for PHP currently has support for relational and XML data sources, and a service provider interface to enable the implementation of support for others.

SDO concepts

Core SDO concepts are defined by the SDO model and a small number of interfaces that enable working with instance data managed in accordance with the model. Figure 2 shows a UML class diagram of these concepts. Let's take a few moments to understand the roles of the different elements.

Figure 2. SDO model

Type and Property

At the heart of every SDO instance is a model that defines the permitted structure of a data object. You can think of this as a blueprint for the data object. It covers concepts such as parent-child data object relationships, the cardinality of relationships, the permitted properties on a data object, and default values. The SDO model is described in terms of Types and Properties. Types can be primitives, such as string (SDO calls these data types and defines a set mapped to PHP types), or complex types to represent an order or address (SDO calls these data object types). Data object types contain properties. Each property has a type that can be a data type or a data object type.

This Type and Property model allows data objects to represent what SDO refers to as data graphs. The essence of a data graph is a tree structure of data objects, navigable via their containment references (you can think of these as container, or aggregation, parent-child relationships), plus noncontainment references that point to data objects within the data graph (these are not aggregation relationships and can, therefore, point to any other data object within the tree).

Let's use the example of a Person data object to illustrate the important concepts of Type and Property. The diagram below shows an example of a Person SDO instance on the left and its corresponding model on the right. The Person SDO instance has its structure defined by a type with name of Person. The Person type has been defined to have two properties with the names name and age. The types of these two properties are SDO data types of String and Int. On the left, we can see that the Person instance has had these two properties set to the values of "Fred Fish" and 35, respectively.

Example Person SDO instance and model

In addition to the attributes mentioned, types also have attributes that say whether they are open (supporting additional instance properties not defined in the type), sequenced (preserving order across properties), or abstract (a base type capable of being extended by another SDO type). Properties also have attributes that describe things like cardinality (for example, a many-valued property to represent the departments in a company), any default value, and whether the property is read-only. Not all these concepts are supported in the PHP implementation of SDO. The documentation accompanying the SDO for PHP project describes the status of the various capabilities. All these concepts are covered in detail in the Java SDO specification.

Note that SDO for PHP does not currently support read-only properties, default values, abstract types, open types, or bidirectional relationships.

DataObject

The core interface for working with data objects is the DataObject (no surprise there). This supports all the capabilities one would expect when working with a data structure, such as setting and getting properties, creating child structures, property querying via an augmented subset of XPath, and structure navigation.

If we consider the earlier example, we could use the DataObject interface to get the age property value, 35, and then use it to update the age to 36, say. The use of the DataObject interface is described in more detail in Contact scenario.
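
To make the Type and Property idea concrete, here is a minimal sketch in plain PHP, assuming PHP 8. The class names SketchType and TypedDataObject are hypothetical and do not belong to the SDO extension. The sketch rejects values that do not match the model, whereas a real SDO implementation may convert them instead.

```php
<?php
// Hypothetical model: a type is a name plus a map of property name => PHP type.
final class SketchType
{
    public function __construct(public string $name, public array $properties) {}
}

// Hypothetical data object that checks every write against its type.
final class TypedDataObject
{
    private array $values = [];

    public function __construct(private SketchType $type) {}

    public function __set(string $property, mixed $value): void
    {
        $expected = $this->type->properties[$property] ?? null;
        if ($expected === null) {
            throw new InvalidArgumentException("{$this->type->name} has no property '$property'");
        }
        if (get_debug_type($value) !== $expected) {
            throw new TypeError("Property '$property' expects $expected");
        }
        $this->values[$property] = $value;
    }

    public function __get(string $property): mixed
    {
        return $this->values[$property] ?? null;
    }
}

// The Person model from the example: name (string) and age (int).
$personType = new SketchType('Person', ['name' => 'string', 'age' => 'int']);
$person = new TypedDataObject($personType);
$person->name = 'Fred Fish';
$person->age = 35;
$person->age = $person->age + 1;    // read 35, write back 36

$rejected = false;
try {
    $person->email = 'fred@example.com';   // not part of the Person model
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```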

Sequence

The sequence interface is used to manipulate data objects when ordering across a data object's properties is required (the data object is said to be a sequenced type). This is particularly useful when working with XML data where the order of property values is important, but varies for each instance and is, therefore, not explicitly defined by the model (for mixed XML content, for example).
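
The kind of ordering a sequence preserves is easiest to see with mixed XML content. The sketch below uses PHP's standard DOM extension as a stand-in (it is not the SDO Sequence interface) and the XML fragment is made up for illustration. The text and the child elements interleave, so the order cannot be recovered from a list of properties alone.

```php
<?php
// Mixed content: text and elements interleaved inside one parent element.
$doc = new DOMDocument();
$doc->loadXML('<letter>Dear <name>Fred</name>, your order <id>42</id> has shipped.</letter>');

// Walk the children in document order, recording what kind of item each one is.
$order = [];
foreach ($doc->documentElement->childNodes as $node) {
    if ($node instanceof DOMElement) {
        $order[] = "<{$node->nodeName}> " . $node->textContent;
    } elseif ($node instanceof DOMText) {
        $order[] = 'text: ' . trim($node->textContent);
    }
}
print_r($order);
```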

Data Access Services

As mentioned in the introduction, SDOs rely on the existence of DASes. The role of a DAS is to retrieve and write data to and from a data source. When using a DAS, the client works with SDOs, and is, therefore, insulated from any data source-specific data representation.

A DAS can also act as a factory for data objects, creating new instances that conform to some predefined data source schema (for example, a database or XML schema).

Two DAS implementations are provided as part of the SDO for PHP project. These are the XML DAS for working with XML files or XML/HTTP sources, and the Relational Data Access Service (Relational DAS), implemented using PHP Data Objects (PDO) for accessing relational data sources.

Why choose Service Data Objects?

There are a number of reasons why you might consider using SDOs. The main ones are outlined below.

Reduced database overhead

At the heart of SDOs is support for disconnected working. Data objects can automatically record their change history, which a DAS can then use to detect collisions when applying changes back to an enterprise information system.

This technique is often referred to as optimistic concurrency, or optimistic offline locking (described in Patterns of Enterprise Application Architecture, by Martin Fowler). It is best suited to scenarios where there is a low risk of collision, perhaps due to infrequent edits, or the edits being governed by some external process that reduces the likelihood of multiple people concurrently working on the same data.

The benefits are even greater if, in addition to the low risk of collision, there is also high edit latency (significant time between checking out to make edits and edits being committed), or the data is passed around in ways that make it difficult to manage database connections and locks (for example, in service-oriented application architectures).

The major benefit to using this technique is the removal of the need for the application to hold connections and locks in the enterprise information system while some user or application is working on the data.

Interestingly, there is nothing in the SDO architecture that precludes the creation of DASes that employ a pessimistic concurrency model.

Single API for data

In addition to the optimistic concurrency support, another major benefit to using SDO is realized when working with multiple heterogeneous data sources. SDOs provide a single API for manipulating data independent of the data's originating data source.

In addition to the expected data structure manipulation support, SDOs also provide the ability to set and retrieve the contents of an SDO based on an XPath-like expression, including queries. This removes the burden from the client of having to navigate the data structure in order to identify substructures based on instance data. For example, we might choose to extend our Person example to include a new data object type that contains a collection of Person data objects. To identify an individual by name without XPath would require us to iterate through all the Person objects and test the name property until a match was found. To do this with XPath requires just a single call, specifying the property name and the value to be matched.
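
The difference between iterating and querying can be illustrated with SimpleXML's standard XPath support, used here only as a stand-in since SDO's XPath-like syntax is its own augmented subset. The sample data, including the second person, is made up for illustration.

```php
<?php
$people = simplexml_load_string(
    '<people>
        <person><name>Fred Fish</name><age>35</age></person>
        <person><name>Ada Lovelace</name><age>36</age></person>
     </people>'
);

// Without XPath: walk every person and test the name until one matches.
$found = null;
foreach ($people->person as $person) {
    if ((string) $person->name === 'Ada Lovelace') {
        $found = $person;
        break;
    }
}

// With XPath: a single call naming the property and the value to match.
$matches = $people->xpath("person[name='Ada Lovelace']");

echo (string) $found->age, "\n";
echo (string) $matches[0]->age, "\n";
```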

Knowledge of the structure

As we have already seen, SDOs carry internal knowledge of the structure of the data they represent (the model). This knowledge is used to ensure that the creation and modification of instance data conforms to the structure and type rules for the data object.

Data technologies such as SimpleXML and PDO do not employ this approach and, therefore, must delegate this responsibility to other technologies when creating and validating data. The corresponding benefit of SimpleXML and PDO over SDO is that they do not require the developer to specify a model.

An additional benefit to having a model is the capability to introspect it at run time. For example, this can enable developers to write flexible user interfaces that can adapt to changes in their data structures (SDO schema) at run time. SDO model introspection is not fully enabled in the PHP implementation, but we expect this to change over time.

Relationship to PHP Data Objects and SimpleXML

PDO (not to be confused with SDO) aims to provide a consistent API for the common capabilities found in most relational database APIs. This greatly simplifies creating Web applications designed to support different database vendors by encapsulating the differences under a common API. PDO provides a simple object view of results, but does not attempt to normalize those results (one row of the result set equals one object, regardless of whether there are multiple tables represented in the result). The ease of use of PDO makes it a natural choice when working directly with databases, and for this very reason is the technology chosen for the Relational DAS implementation provided with SDO.
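
The point about PDO not normalizing results can be seen in a short sketch. It assumes the pdo_sqlite driver is available and uses an in-memory database with a cut-down, made-up version of the contact schema; the grouping loop at the end is the work a caller has to do by hand.

```php
<?php
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE contact (shortname TEXT PRIMARY KEY, fullname TEXT)');
$dbh->exec('CREATE TABLE address (id INTEGER PRIMARY KEY, contact_id TEXT, city TEXT)');
$dbh->exec("INSERT INTO contact VALUES ('Charlie', 'Charles Babbage')");
$dbh->exec("INSERT INTO address (contact_id, city) VALUES ('Charlie', 'London'), ('Charlie', 'Cambridge')");

// A join across two tables still comes back as one flat object per row.
$rows = $dbh->query(
    'SELECT c.shortname, c.fullname, a.city
       FROM contact c JOIN address a ON a.contact_id = c.shortname
      ORDER BY a.id'
)->fetchAll(PDO::FETCH_OBJ);

// The contact is repeated once per address row.
echo count($rows), "\n";

// Building a parent with a list of children is left to the caller.
$contacts = [];
foreach ($rows as $row) {
    $contacts[$row->shortname]['fullname'] = $row->fullname;
    $contacts[$row->shortname]['cities'][] = $row->city;
}
```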

As mentioned, the focus of SDO is on providing a flexible data object representation for data from heterogeneous data sources and built-in support for optimistic concurrency. We have also described how having a "knowledge of the structure" enables SDOs to provide a single API for the complete life cycle of the data, including creation and validation. If these capabilities are important to your application, then SDO is probably an appropriate choice, using the Relational DAS (which is built on PDO) for relational data source support.

SimpleXML provides a simple way for working with XML instance documents. Documents are loaded, and can be navigated and manipulated through a simple API. The interface surfaces some specifics of XML (for example, the syntax differentiates between elements and attributes).
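
As a small illustration of how SimpleXML surfaces XML specifics, the sketch below reads a child element with property syntax and an attribute with array syntax. The document shape is an assumption made for this example.

```php
<?php
$contact = simplexml_load_string(
    '<contact shortname="Charlie"><fullname>Charles Babbage</fullname></contact>'
);

// A child element is read with object property syntax.
$fullname = (string) $contact->fullname;

// An attribute is read with array syntax, so the caller must know which is which.
$shortname = (string) $contact['shortname'];

echo "$shortname: $fullname\n";
```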

When parsing and processing an instance document, SimpleXML is an excellent technology choice. However, if significant manipulation is required, it is important that the technology understands the model for the document, and in that case, SDO is probably a more appropriate choice.

Contact scenario

You should now have a reasonable understanding of the main concepts of SDO. To help clarify, we will illustrate them with a simple scenario.

The following sections provide an overview of SDO for PHP capabilities, described in the context of a personal contacts example. This example demonstrates the following attributes of SDO:

  • Disconnected working (optimistic concurrency)
  • SDO navigation and manipulation
  • The role of a DAS

Code snippets are provided to help explain the concepts and development requirements. However, it is not our goal to describe a complete working example. We expect future articles to cover working examples and elaborate on the use of the relational and XML DASes.

Contact edit use case

In this scenario, a contacts Web application has been written to support the management of contact information. The contact information is stored in a relational database containing the following two tables:

Table 1. "contact" table definition
Column                   Type    Example
shortname (primary key)  string  "Charlie"
fullname                 string  "Charles Babbage"

Table 2. "address" table definition
Column                           Type     Example
id (auto-generated primary key)  integer  1
shortname (foreign key)          string   "Charlie"
addressline1                     string   "Analytical House"
addressline2                     string   "1 Engine Close"
city                             string   "Walworth Road"
state                            string   "London"
zip                              string   "XX11 1ZZ"
telephone                        string   "555-555-5555"

To illustrate the use of SDOs, we have selected the use case of modifying a contact. The main steps to modifying a contact are as follows:

  • Retrieve the contact to be modified from the database
  • Make the modifications to the contact
  • Apply the modification back to the database

These steps are described in more detail below.

Note: DAS APIs are not defined by the SDO specification, so any sample code is necessarily specific to a particular implementation. The code snippets shown below have been created to match the APIs of the Relational DAS provided with SDO for PHP.

Retrieving the contact entry

The first page the user sees is the contact management main.php page. It has a single input field for entering the shortname of the contact to be edited, along with an Edit button to submit the request.

The contact management main page

Entering a shortname and clicking Edit transitions to the edit.php page, passing the specified shortname in the $_POST array. The edit.php page contains the following main steps:

  1. Get the entered shortname
  2. Create a relational DAS instance
  3. Execute the query to retrieve the contact SDO
  4. Populate the edit form with the contact details
  5. Store the contact SDO in the session

Each of these steps is described in more detail below.

1. Get the shortname
The welcome page form configuration resulted in the shortname being placed in $_POST['shortname']. At this point, and also in the later section on "Modifying the data", we would normally validate the user input to prevent the database from being compromised.

// get the shortname from posted variables
$shortname = $_POST['shortname'];
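
The validation mentioned above might look something like the following sketch. The helper name validate_shortname and the rule it enforces (1 to 32 letters, digits, underscores, or hyphens) are assumptions made for illustration; a real application would choose rules that match its own data.

```php
<?php
// Hypothetical helper: return the cleaned shortname, or null if it is not acceptable.
function validate_shortname(mixed $raw): ?string
{
    if (!is_string($raw)) {
        return null;                 // e.g. an array was posted instead of a string
    }
    $candidate = trim($raw);
    // Assumed rule: 1 to 32 letters, digits, underscores, or hyphens.
    return preg_match('/^[A-Za-z0-9_-]{1,32}$/', $candidate) === 1 ? $candidate : null;
}

var_dump(validate_shortname('Charlie'));
var_dump(validate_shortname("x' OR '1'='1"));
```

On the edit page, the result of validate_shortname($_POST['shortname'] ?? null) would be checked for null before any query is built.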

2. Create a Relational DAS instance
An important aspect of creating the Relational DAS is describing the database schema it should use. The Relational DAS will take this schema and use it to define the model for the Service Data Objects it can create. This information is often required by other Relational DAS instances and is, therefore, a candidate for placing in a separate script which is then included.

// Describe the structure of the contact table (columns as in Table 1)
$contact_table = array(
    'name' => 'contact',
    'columns' => array('shortname', 'fullname'),
    'PK' => 'shortname'
);

// Describe the structure of the address table (telephone belongs here, as in Table 2)
$address_table = array(
    'name' => 'address',
    'columns' => array('id', 'contact_id', 'addressline1', 'addressline2', 'city', 'state', 'zip', 'telephone'),
    'PK' => 'id',
    'FK' => array('from' => 'contact_id', 'to' => 'contact')
);

$table_metadata = array($contact_table, $address_table);

// Describe the parent-child relationship.  This is information required
// by the Relational DAS to help map from the relational database representation
// to the data graph representation of SDO.
$address_reference = array('parent' => 'contact', 'child' => 'address');

Note: The Relational DAS assumes that all containment relationships are cardinality one-to-many. So in this example, the contact can contain zero or more address DataObject instances.

Having defined the model, we can now create an instance of the Relational DAS.

// Create the Relational Data Access Service telling it the database
// schema, which table should be considered the root of the graph,
// and finally the additional information for the object model.
$das = new SDO_DAS_Relational($table_metadata, 'contact', $address_reference);

3. Execute the query
We can now use the Relational DAS to retrieve the contact information.

// connect to the database.  This connection will be released when the
// $dbh variable is cleaned up.
$dbh = new PDO('mysql:dbname=contactdb;host=localhost', DB_USER, DB_PASSWORD);

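// construct the SQL query for contact retrieval, quoting the user-supplied
// shortname so that it is treated as a string literal
$stmt = "SELECT * FROM contact, address WHERE contact.shortname=" . $dbh->quote($shortname) .
    " AND contact.shortname=address.contact_id";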

// execute the query to retrieve the contact
$contact = $das->executeQuery($dbh, $stmt);

The resulting $contact SDO is shown below. This shows the data objects, their properties (property index shown in the square brackets, followed by the property name), and the property values. As mentioned, the cardinality of the address containment within a contact is assumed by the Relational DAS to be one-to-many. In the diagram below, the notation [0] DataObject has been used to signify the first entry in the list of address DataObjects.

Contact SDO instance

4. Populate the edit form
Given the contact SDO, we can populate the form to allow the user to edit the data.

<!-- Create and populate the form with the contact details -->
<form action="..." method="POST">
  <input type="text" name="fullname" value="<?php echo htmlspecialchars($contact->fullname) ?>">
  ...
  <input type="text" name="addressline1" value="<?php echo htmlspecialchars($contact->address[0]->addressline1) ?>">
  ...
</form>

5. Store the data object in the session
Because we are disconnected from the database, we need to store the contact data object in the session to make it available to the next page.

// store the contact data object in the session
$_SESSION['contact_sdo'] = $contact;

More on SDO navigation

The previous section briefly touched on accessing the properties of an SDO (see the code in Step 4, "Populate the edit form"). In addition to accessing primitive properties, most SDO applications also require navigation up and down parent-child data object relationships (between contact and address, for example). Some also require query capabilities to identify parts of the data graph.

The code snippets below give a quick overview of the ways of navigating the contact SDO data graph.

As we saw in Step 4, SDOs support property access using the object property syntax:

// get the fullname using the object property
$fullname = $contact->fullname;

We can also access fullname using the property index (the position as defined by the data object's model, as shown in the square brackets in Figure 4):

// get the second contact property (fullname)
$fullname = $contact[1];

We can access many-valued child data object properties, such as address, using the same syntax:

// get the list of address data objects via the object property
$addresses = $contact->address;

// get the list of address data objects via the property index
$addresses = $contact[2];

We can access individual elements of many-valued properties, such as the first address, using array syntax:

// get the first address from a list of address data objects
$address1 = $contact->address[0];

We can also directly reference properties within child data object properties, such as the ZIP code from the first address:

// access the zip code from the first address via the object properties
$zip = $contact->address[0]->zip;

// access the zip code via the property indices.
// Note: this style is not recommended since it leads to virtually
// unserviceable code.  The XPath-style (described later) or defining
// constants lead to much more readable code.
$zip = $contact[2][0][5];

We can also iterate over the properties of a data object:

// Iterate over the properties of the first address
// $name is assigned each property name (e.g. "id", "addressline1", ...)
// $value is assigned each property value (e.g. 1, "Analytical House", ...)
foreach ($contact->address[0] as $name => $value) {
    echo "$name: $value<br />";
}

Finally, we can access the properties using XPath-like support, the simplest form being the property name:

// use property names (XPath) to access the zip property
$zip = $contact['address'][0]['zip'];

// use single XPath expression with array index notation
// to access the zip property
// Note: XPath array indices start at 1.
$zip = $contact['address[1]/zip'];

// use single XPath expression with dotted index notation
// to access the zip property
// Note: SDO dotted notation indices start at 0.
$zip = $contact['address.0/zip'];

XPath can also be used to navigate and query data objects. The Relational DAS returns multiple results as a many-valued child property of a root data object, which we have named $root in this example. If we had retrieved a number of contacts in the Relational DAS query, we could identify an individual from its first address line:

// Get the address object that contains the addressline1 of "1 Engine Close"
$address = $root["contact/address[addressline1='1 Engine Close']"];

// Get the contact with the matching address.  getContainer() navigates to the
// parent of a data object, in this case the contact SDO.
$contact = $address->getContainer();

Modifying the data

The following shows a simple example of a contact edit page, edit.php, where the contact has a single address:

The contact edit page

This page allows the user to modify individual property values. When the Update button is clicked, all the values are placed in the $_POST array, regardless of whether they have been modified, and the application transitions to the confirm.php page. The confirm page performs the following main steps:

  1. Retrieves the contact SDO from the session
  2. Updates the contact SDO
  3. Creates a Relational DAS instance
  4. Writes the changes back to the database
  5. Informs the user of the outcome

Each of these steps is described in more detail below.

1. Retrieve the contact SDO from the session
The final step in the execution of the edit page was to place the contact SDO into the session. We now retrieve this contact to make the updates.

// retrieve the contact from the session
$contact = $_SESSION['contact_sdo'];

2. Update the contact
Now that we have the contact, we can apply the updates posted from the edit page. This is done by comparing each posted value with the old value and, if they differ, setting the posted value on the contact. We do this to avoid setting a value unnecessarily, which would cause SDO to record a change in the change summary (the structure that holds the old values for data objects that have been modified). It would be nice if the SDO implementation were to do this test on our behalf.

// update the fullname if changed
if ($contact->fullname != $_POST['fullname']) {
    $contact->fullname = $_POST['fullname'];
}
...
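
The compare-then-set pattern above can be generalized to a list of fields. The following is only a sketch: the helper name apply_changed_fields is hypothetical, a stdClass object stands in for the contact SDO, and the posted values are made up.

```php
<?php
// Hypothetical helper: copy posted values onto a data object only where they differ,
// and report which fields were actually changed.
function apply_changed_fields(object $target, array $posted, array $fields): array
{
    $changed = [];
    foreach ($fields as $field) {
        if (!array_key_exists($field, $posted)) {
            continue;                       // the field was not on the form
        }
        if ((string) $target->$field !== (string) $posted[$field]) {
            $target->$field = $posted[$field];
            $changed[] = $field;
        }
    }
    return $changed;
}

// stdClass stands in for the contact SDO here.
$contact = (object) ['shortname' => 'Charlie', 'fullname' => 'Charles Babbage'];
$posted  = ['shortname' => 'Charlie', 'fullname' => 'Charles Babbage FRS'];

$changed = apply_changed_fields($contact, $posted, ['shortname', 'fullname']);
print_r($changed);
```

Only the fields returned in $changed would have been written, so an unchanged field never produces an entry in a change summary.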

3. Create a Relational DAS
The next step is to create the Relational DAS used to write the updates to the database. This code is identical to that used in retrieval and, as mentioned, is best placed in a separate script (contact_model.inc.php, for example).

// initialize the Relational Data Access Service
require_once('contact_model.inc.php');

4. Write the changes back to the database
The next step is to apply the changes back to the database. The call below shows how this is done for the Relational DAS. There is no need to specify a SQL statement for updates because the Relational DAS derives this from the model and the contact data object's change summary.

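// connect to the database, as on the edit page
$dbh = new PDO('mysql:dbname=contactdb;host=localhost', DB_USER, DB_PASSWORD);

// apply the changes back to the database
$das->applyChanges($dbh, $contact);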

The applyChanges() call is deceptively simple. Under the covers, it is:

  • Ordering SDO updates to ensure the correct results (for example, creates before updates).
  • Generating SQL INSERT, UPDATE, and DELETE statements to apply the changes. The UPDATE and DELETE statements are qualified with the original values of the data so that should the data have changed in the database in the meantime this will be detected.
  • Executing the SQL statements -- If any of the SQL statements fails to execute, this is an indication that a collision has occurred and the Relational DAS rolls back all changes and throws an exception. If all statements succeed, all the changes are committed to the database. The client application can then continue to work with the data object, make more changes, and apply them, or can discard it.
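
The idea of qualifying an UPDATE with the original values can be sketched with plain PDO. This is not the Relational DAS's actual code; it assumes the pdo_sqlite driver and an in-memory database, and the helper name update_fullname is hypothetical. If the row no longer holds the value that was originally read, no row matches, and the sketch treats that as a collision and rolls back.

```php
<?php
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE contact (shortname TEXT PRIMARY KEY, fullname TEXT)');
$dbh->exec("INSERT INTO contact VALUES ('Charlie', 'Charles Babbage')");

// Hypothetical helper: update fullname only if it still holds the value we read.
function update_fullname(PDO $dbh, string $shortname, string $original, string $new): void
{
    $dbh->beginTransaction();
    try {
        $stmt = $dbh->prepare(
            'UPDATE contact SET fullname = :new
              WHERE shortname = :shortname AND fullname = :original'
        );
        $stmt->execute([':new' => $new, ':shortname' => $shortname, ':original' => $original]);
        if ($stmt->rowCount() !== 1) {
            throw new RuntimeException('Collision: row changed since it was read');
        }
        $dbh->commit();
    } catch (Throwable $e) {
        $dbh->rollBack();
        throw $e;
    }
}

// Two clients read the same original value; the second update collides.
update_fullname($dbh, 'Charlie', 'Charles Babbage', 'C. Babbage');
try {
    update_fullname($dbh, 'Charlie', 'Charles Babbage', 'Charles B.');
    $collided = false;
} catch (RuntimeException $e) {
    $collided = true;
}
```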

5. Inform the user of the outcome
The final task is to notify the user of the outcome. If no collisions are detected by the Relational DAS, and the update is successful, all is well.

The confirmation page

There are two common schemes employed for detecting conflicts:

  1. Add a version column (might be based on a timestamp) to each table, updated each time a row is modified. Comparing versions tells us if there is a conflict.
  2. Record all the original values and compare with the current ones to see if any have been modified.

The Relational DAS implements the second scheme, since this does not require the database to be modified in order to use it.
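
For comparison, the first scheme (a version column) might be sketched as follows. This is not something the Relational DAS does; the version column, the helper name update_if_version, and the use of an in-memory SQLite database through pdo_sqlite are all assumptions made for illustration.

```php
<?php
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec(
    'CREATE TABLE contact (
        shortname TEXT PRIMARY KEY,
        fullname  TEXT,
        version   INTEGER NOT NULL DEFAULT 1
     )'
);
$dbh->exec("INSERT INTO contact (shortname, fullname) VALUES ('Charlie', 'Charles Babbage')");

// Hypothetical helper: succeed only if the row is still at the version we read,
// and bump the version so that later stale writers are detected.
function update_if_version(PDO $dbh, string $shortname, int $version, string $fullname): bool
{
    $stmt = $dbh->prepare(
        'UPDATE contact SET fullname = :fullname, version = version + 1
          WHERE shortname = :shortname AND version = :version'
    );
    $stmt->bindValue(':fullname', $fullname);
    $stmt->bindValue(':shortname', $shortname);
    $stmt->bindValue(':version', $version, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->rowCount() === 1;
}

$first  = update_if_version($dbh, 'Charlie', 1, 'C. Babbage');   // version 1 becomes 2
$second = update_if_version($dbh, 'Charlie', 1, 'Charles B.');   // stale version, no row matches
var_dump($first, $second);
```

The trade-off is the one the text describes: the check is a single column comparison, but every table needs the extra column.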

More on SDO modification

The previous section briefly touched on accessing and setting an SDO property (see the code in Step 2, "Update the contact"). In addition to setting primitive properties, most SDO applications also require the creation of child data objects and the deletion of parts of the data structure. The code snippets below give a quick overview of the other types of modification one might wish to perform on the contact SDO.

The techniques described for getting individual properties are also available for setting:

// set the fullname via the object property
$contact->fullname = 'Alan Turing';

// set the fullname via the property index
$contact[1] = 'Alan Turing';

// set the fullname via the property name (XPath)
$contact['fullname'] = 'Alan Turing';

We can create child data objects. For example, the edit user interface could allow adding a new address to a contact. When new address details were posted, we might perform the following:

// create a child address data object
$address = $contact->createDataObject('address');

// set the address's addressline1 property from the posted value
$address->addressline1 = $_POST['addressline1'];

Note: This child data object is automatically inserted into the graph and $address is simply a reference to that position in the graph. So for example, if this were the first address added, the following would both set the ZIP code on the contact's address.

// set the address's zip
$address->zip = 'XY11 2ZZ';

// set the address's zip via the contact data object
$contact->address[0]->zip = 'XY11 2ZZ';

We can test and unset individual instance properties of the contact. For example, if the user cleared the string in the interface, we could use this to signify unsetting:

// if the fullname value was cleared on the edit page and the fullname
// was previously set then unset the fullname property
if (empty($_POST['fullname']) && isset($contact->fullname)) {
    unset($contact->fullname);
}

Finally, we probably want to enable the deletion of a contact, or contact's address in the edit page, again implemented using unset:

// test and unset the first address
if (isset($contact->address[0])) {
    unset($contact->address[0]);
}

Summary

SDOs add some interesting capabilities for working with data in PHP, whilst maintaining the simple, easy-to-use interfaces PHP developers expect. SDOs can represent complex data structures from heterogeneous data sources, whilst allowing their manipulation through a single API similar to that of SimpleXML and PDO. Optimistic concurrency support is built into SDO, allowing disconnected data manipulation without requiring the application to implement change tracking and conflict detection.

This article has given a taste of some of the capabilities of SDO, but there are many that have not been covered. We expect subsequent articles to cover these topics, including:

  • Different classes (or types) of data objects: sequenced, open, abstract
  • Relational DAS details
  • XML DAS details
  • Implementing a DAS using SDO service provider interface

This article was first published by IBM developerWorks at http://www-128.ibm.com/developerworks/opensource/library/os-sdophp/

Resources

The SDO for PHP implementation is delivered as a PECL extension, and can be downloaded from the SDO project page.

The SDO for PHP documentation can be found in the livedocs build of the PHP manual.

The Java SDO specification is available for download from IBM's developerWorks library.

About the Authors

Graham Charters is a Senior Software Engineer working at IBM's development lab in Hursley, England. Past roles have included IBM WebSphere® Application Server development, and architecture responsibilities in WebSphere Business Integration, and Adapters. His current interests are in the relationships between open source technologies, such as those of Linux®, Apache, MySQL, PHP/Perl/Python (LAMP) and the WebSphere platform. He holds degrees in computer science, numerical analysis, and machine vision, all from the University of Manchester.

Matthew Peters works at IBM's development lab in Hursley, England, as a Software Engineer. He has worked in various roles on IBM's CICS® and MQSeries® products, and also spent a number of years working with partners in scientific and technical computing and large-scale parallel processing. In recent years, he worked on the garbage collector in the IBM JVM. He has a degree in mathematics from Queens' College in Cambridge and a master's in software engineering from Oxford University.

Caroline Maynard is a Software Engineer at IBM's development lab in Hursley, England, where she has worked in diverse areas, including networking, graphics, and voice. Most recently, she led the development of the IBM Java ORB, which underpins the WebSphere Application Server EJB container. She is interested in the integration of IBM offerings with open source Linux, Apache, MySQL, PHP/Perl/Python (LAMP) technologies. She holds a degree in mathematics from the University of Sussex.

Anantoju Veera Srinivas works as a Staff Software Engineer at IBM's development lab in Bangalore, India. Past roles have included JVM development on Linux® and AIX®. His current interests are in Web technologies and databases, such as Linux, Apache, MySQL, PHP/Perl/Python (LAMP) in the open source world, and the IBM WebSphere® and DB2® platforms. He holds degrees in electrical and electronics engineering from Sri Venkateshwara University, Andhra Pradesh, India.

All four authors worked together to create the SDO extension for PHP.


