Writing Libraries in PHP


Intended Audience
Introduction
General Rules
Define the Goal
Document
Embrace Change
Use it Yourself
Specific Guidelines
The two types of API
Creating Methods
Naming Methods
File Trees
Plan to Expand
Beware of Infinity
Conclusion
About The Author

 

Intended Audience


This tutorial is intended for PHP programmers who are interested in writing reusable code. Experience with PHP4 and familiarity with writing classes are assumed. The article is primarily geared toward programmers who are new to writing code libraries, but experienced programmers may find the article informative as well.

Introduction

The core of any programming language is its API. The API (application programming interface) is the set of available functions that allow a programmer to use the programming language. Contrast the API with the language's syntax, which is the set of rules for stringing the various commands together so the interpreter can understand them.

Most of the documentation on PHP is focused on explaining the functioning of the various APIs PHP has available. Likewise, if you create a PHP library, most of the work involved is usually in sorting out how the API will work.

When you code a library, your primary goal is to write code that can be reused. You might be concerned with code reuse within a single application or maybe you want to publish the library for others to use. Taking the time to code your library properly results in code that often becomes more useful than you initially expected, and reduces the time it takes you to create the rest of your application. It also makes it easier to involve other programmers in your project. Obviously, a well-written library is much more likely to be useful to other programmers than one that was created carelessly.

Libraries generally take on two forms: functions and classes. PHP is a language that I generally classify as "hybrid" because it has most of the capabilities of an object-oriented language, yet the native PHP API is functional in nature. The shortcomings of PHP's object programming are not bad enough to warrant avoidance of writing class libraries. In fact, the examples and explanations in this article are based completely on class libraries, so I'll be covering some of the workarounds for those shortcomings.

General Rules

Define the Goal

The first thing to do when writing a class library is to determine the goal of the library (actually, this should be done before starting any kind of project, whether programming or not). I could go into a long discussion of exactly how to define the goal and exactly how much detail is required, but suffice it to say that more is better.

I've seen this called “defining the ideal scene”. You build a little utopia in your mind and imagine what your library would do if it were perfect. Identify the key elements of this success. Is it simple? Is speed a critical concern? Must it accept many different types of input or can you establish firm guidelines for input? Security is always a concern, so will your library handle it or will it be up to the application that uses the library? Defining what the library will not do is often just as important as defining what it will do.

Be sure to document the goal. Depending on how your project is organized, you might include it in a README file. Documenting the goal helps keep you focused on what's important. Everyone has time constraints, and focusing on what's important allows you to make the best use of your time. It also keeps you from turning the library into something it was never intended to be. The term "feature creep" was coined by people who either never documented a goal for their project, or ignored the goal after they started working. Documenting the goal is also important if the library is going to be made public. It keeps you from getting requests for features that don't belong, and avoids misunderstanding.

Document

Begin documenting immediately after you determine what your goal is. Actually, documenting the goal of your library bleeds right over into documenting the rest of the library.

Some people think that a library should be documented after it is written. I disagree. I think the major problem with a lot of software on the market is poor documentation. This goes for open-source software as well, and can be crippling to the success of a library of any type.

Documenting prior to development seems to help in other ways as well. Getting your ideas down in documentation form can speed actual code development considerably. It also helps in the same way as having a goal defined, by keeping you (and anyone else who may be helping with development) on the right track.

Embrace Change

If you can get the API for your library correct on the first try, please let me know so I can lay a tribute at your feet. For the rest of us mortals, I believe that the easiest way to accept change is to see it as a magical experience by which the code improves. This gives you a psychological edge over "Darn, I've got to change the code again".

But embracing change is much more than just a positive frame of mind. It is an all-aspects bear hug of involvement. When an aspect of your library changes, you need to do a number of things right away:
  1. Update the library itself to implement the change
  2. Update the documentation to reflect the change
  3. Plan for compatibility problems
Number two can take a lot of work sometimes. It's usually good to describe how the function used to work, mention the exact version where the change took place, and detail the nature of the change and the new behavior.

Number three can be accomplished a number of ways. If you're aware that a lot of code already uses the old method, you should probably create a brand-new method that implements the new system while leaving the old one intact. You can often use a difference in the name to indicate the change. Unfortunately, this can lead to code bloat and the necessity of supporting old code longer than you would like. The best way to avoid this is to prominently document the fact that the old way is deprecated and clearly state the version in which it will be removed from the library. Be sure to give people a few versions to make the switch.

Compatibility can often be accomplished with wrappers around the new methods that mimic the behavior of the old methods. For example:

function count_to($start, $increment, $end)
{
    $this->count_for($start, $end, $increment);
}

function count_for($start, $end, $increment = 1)
{
    for ($i = $start; $i <= $end; $i += $increment) {
        echo $i;
    }
}

As you can see in this fictional example, count_to has been replaced with count_for. The new function is simpler, since a case where the increment is one (which is fairly common) does not require a third argument. By wrapping the new function with the old we reduce code maintenance and bloat, while still maintaining 100% compatibility with the old method.
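Beyond documenting the deprecation, the old wrapper can also announce it at run time. The following is only a sketch of that idea, not part of the original example: the class name Counter is hypothetical, the methods return the counted values instead of echoing them so the result is easier to check, and E_USER_DEPRECATED assumes PHP 5.3 or later (on older versions E_USER_NOTICE would be the closest equivalent).

```php
<?php
// Hypothetical class wrapping the count_to/count_for example.
class Counter
{
    /**
     * Deprecated wrapper kept for compatibility; use count_for() instead.
     * Note the old argument order: start, increment, end.
     */
    function count_to($start, $increment, $end)
    {
        // Tell callers that the old name is on its way out.
        trigger_error('count_to() is deprecated; use count_for()', E_USER_DEPRECATED);
        return $this->count_for($start, $end, $increment);
    }

    /** New method: the increment is optional and defaults to 1. */
    function count_for($start, $end, $increment = 1)
    {
        $values = array();
        for ($i = $start; $i <= $end; $i += $increment) {
            $values[] = $i;
        }
        return $values;
    }
}
```

Old code that still calls count_to() keeps working, but every call now leaves a trace in the error log that points the caller at the replacement.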

Change is a large and very important topic. I could probably write an entire article on how to know when to change something and when not to and (obviously) there would still be no way to know for sure that the decision was correct. I think the important thing is to consider every change with regard to the goal of the project, and decide whether the change makes the goal more likely to be accomplished or not. This is the primary gauge for knowing whether or not to make a change.

Use it Yourself

Depending on why you're creating your library, you may well be using it as you go. If you're creating the library because you need the functionality it provides for a project you're working on, don't let this fool you into thinking you're testing the library. The project you're developing for is probably only going to stress a limited number of uses of the library. When you wrote your goal, you probably defined places where the library would be useful, and you need to test all of them, even the ones that you don't need for your current project.

It helps to document what has been thoroughly tested and what hasn't. It's a good idea to write a script that tests all aspects of your library and to run it every time you make changes. Failure to test wastes the time of other programmers. If someone else points out a failure, add tests for it to the script.
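A test script does not need a framework to be useful. Here is a minimal sketch of the idea; clampQuantity() is a made-up stand-in for a method of your library, and check() and runLibraryTests() are hypothetical helper names.

```php
<?php
// Hypothetical function under test; stands in for a method of your library.
function clampQuantity($quantity, $min, $max)
{
    return max($min, min($max, $quantity));
}

// Records one check and returns the updated list of failure labels.
function check($failures, $label, $condition)
{
    if (!$condition) {
        $failures[] = $label;
    }
    return $failures;
}

// Runs every check and returns the labels of the ones that failed.
function runLibraryTests()
{
    $failures = array();
    $failures = check($failures, 'value inside the range is unchanged', clampQuantity(5, 1, 10) === 5);
    $failures = check($failures, 'value below the range is raised', clampQuantity(0, 1, 10) === 1);
    $failures = check($failures, 'value above the range is lowered', clampQuantity(99, 1, 10) === 10);
    return $failures;
}
```

Run runLibraryTests() after every change; an empty array means every check passed. When someone reports a bug, add a check that reproduces it before you fix it.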

Specific Guidelines

The two types of API

In my experience, there are two broad types of API. Keep in mind that I'm focusing completely on class APIs; this doesn't hold true for purely functional APIs.

I think it's easiest to describe the two types by giving examples:

$lib = new Library;
$lib->setVar1($lib->getSomeValue());
$lib->setVar2('value2');
$lib->setVar3('value3');
$lib->doSomething();

I call this the “preset” approach. (There may be some other, nicer sounding, computer science degree savvy name for it, but I don't know what it is). Microsoft seems to use this model in many of their APIs. It's characterized by a method of use that involves setting a number of class values and then calling a method to act on those values. In the example above var1, var2, and var3 must be set before doSomething() can be called, and doSomething() is actually using those values.

The other method is more like the way an API is written for a functional language. The same library written in this manner would be used thus:

$lib = new Library;
$lib->doSomething($lib->getSomeValue(), 'value2', 'value3');

There are advantages and disadvantages to both methods. For methods that have few operands, the functional style is usually simpler and shorter. Reading the code is easier too, since all the pertinent information is in one place. In the “preset” method, the setting of the class variables might occur much earlier in the code, and it might not be immediately obvious what those values were set to. On the other hand, the “preset” method stays consistent even when the values are complicated to acquire. For example, if the first value required a number of method calls and a lot of math operations to calculate, the functional style would push you to store it in a temporary variable before the call, whereas the “preset” code would not change very much.

Despite the potential problems, I prefer the functional approach. I think the most important thing to do is pick one of these methods and stick with it for the entire API of your library.

Some might claim that I've made the library too complicated in the “preset” example. Why not just set the class variables directly? It would make things shorter, and the class code would be less complex if I replaced the methods above with something like:

$lib->var2 = 'value2';

The problem is that this is very self-limiting. If you decide you want to validate the data placed into var2 you can easily add such a feature in the method. If you don't use a method to set var2 then you can't do data validation until doSomething() is called. Additionally, you can never be sure how you'll want to store that value internally, and directing an implementer to set the value directly will require a change to the program if you alter your library. By using a method, your change is transparent to everyone who uses your library.

This is a current shortcoming of the PHP 4 object model. It doesn't implement ties (as Perl does), which would allow you to set the value directly and still validate it or internally alter the representation. Knowing this, the best way to avoid problems is to use methods to set your class variables.
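The argument for setter methods can be sketched as follows. The validation rule and the lower-casing of the stored value are invented for illustration, and getVar2() is a hypothetical companion method; the underscore prefix on the property is just a naming convention signalling that callers should leave it alone.

```php
<?php
// Hypothetical class illustrating a validating setter.
class Library
{
    // Underscore prefix: internal storage, not part of the public API.
    var $_var2 = '';

    // Returns true if the value was accepted, false otherwise.
    function setVar2($value)
    {
        if (!is_string($value) || trim($value) === '') {
            return false; // reject bad input at the point of entry
        }
        // The internal representation can change later without affecting callers.
        $this->_var2 = strtolower(trim($value));
        return true;
    }

    function getVar2()
    {
        return $this->_var2;
    }
}
```

Because callers only ever go through setVar2(), the check can be tightened or the storage format changed without touching any code that uses the library.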

Creating Methods

Creating methods that are truly scalable and useful is not always easy. This is by no means an exhaustive explanation on how to achieve API perfection. I won't even pretend that I know all the answers; I'm just going to share what I've learned so far.

It's best to start developing your API from the bottom up. Start out with the most basic tasks that will need to be accomplished and write the code for them. Many of these will be private methods (that the user of the API should never use), and here is one of the shortcomings of the PHP 4 object model: it doesn't truly support private methods. The PEAR coding standards recommend that you precede methods that should be private with an underscore. This is a fair workaround, and in practice the lack of private methods seldom causes problems.

For example, in the case of a system that uses a database, the first methods that you'll want to create will be ones that allow you to access the database without knowing that it's a database. Methods such as getRecordByName() or getRecordById() are a good start. Even better would be something more specific, for example: getCustomerById().

Once you have these, writing the next layer up becomes easier. A method like getAllDelinquentCustomers() might be able to use the getCustomerByDueDate() method. Next thing you know, you've written a generateDelinquentReport() method with only five lines of code.
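The layering described above might look something like this. It is a sketch under loud assumptions: the class name CustomerLibrary, the record fields, and the sample data are all invented, and an in-memory array stands in for the database so the example runs on its own.

```php
<?php
// Hypothetical customer library; an in-memory array stands in for the database.
class CustomerLibrary
{
    var $_rows = array(
        array('id' => 1, 'name' => 'Ann',  'due' => '2001-01-15', 'paid' => false),
        array('id' => 2, 'name' => 'Bob',  'due' => '2001-03-01', 'paid' => false),
        array('id' => 3, 'name' => 'Cara', 'due' => '2001-01-10', 'paid' => true),
    );

    // Low level: callers never learn how the records are stored.
    function getCustomerById($id)
    {
        foreach ($this->_rows as $row) {
            if ($row['id'] == $id) {
                return $row;
            }
        }
        return null;
    }

    // Low level: customers whose due date is earlier than $date (YYYY-MM-DD).
    function getCustomerByDueDate($date)
    {
        $found = array();
        foreach ($this->_rows as $row) {
            if (strcmp($row['due'], $date) < 0) {
                $found[] = $row;
            }
        }
        return $found;
    }

    // Middle layer: past-due customers who have not paid.
    function getAllDelinquentCustomers($today)
    {
        $delinquent = array();
        foreach ($this->getCustomerByDueDate($today) as $customer) {
            if (!$customer['paid']) {
                $delinquent[] = $customer;
            }
        }
        return $delinquent;
    }

    // Top layer: only a few lines, because the layers below do the work.
    function generateDelinquentReport($today)
    {
        $lines = array();
        foreach ($this->getAllDelinquentCustomers($today) as $customer) {
            $lines[] = $customer['name'] . ' (due ' . $customer['due'] . ')';
        }
        return implode("\n", $lines);
    }
}
```

If the storage later moves to a real database, only the two low-level methods need to change; the report code never touched the data directly.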

You'll need to refine your API from the top down. If you think I contradicted myself, look again. You start developing from the bottom up, then you refine from the top down. Actually the process will go back and forth as long as you develop your library, with the low-level methods suggesting ideas for higher level methods, and high-level methods requiring new and revised low-level methods. Flow with this process like the programming monk you long to be. Keep aware of what your high-level functions are saying to you. "Build me a generic method to access this part of the class," they will say. "Exploit my code to create a more powerful wrapper method," the low-level methods will whisper. Listen to them. A great artist once believed that the artwork already existed within the stone, and he had but to reveal it. Programming is often the same way.

Naming Methods

You always want your methods to have names that describe what they do, without being too long to type easily. While you're at it, please bring about world peace. That goal is often difficult or impossible to accomplish, but you should always keep it in mind when naming your method, and push to come as close to perfection as you can without causing a brain hemorrhage.

Be consistent with naming. Don't name one function getAllRelevantData() and another get_all_unrelated_data() and another fetch_data_important() within the same library. There are a number of style guides available so you don't have to write your own. I recommend the one developed for PEAR. But if you can't agree with any pre-existing style guide, at least develop your own style and be consistent.

If you're not sure whether to make a method private or public, make it private. If you find out that you made a mistake later, a simple FIND/REPLACE operation within your editor will correct the problem. If you make it public and realize that it wasn't a good idea, anyone who uses your library can be adversely affected when you change it.

File Trees

Exactly how this works out will differ a lot depending on the size and complexity of your library. You could, of course, put all the code for your library in a single file. There are advantages to this, such as easy distribution, but the larger the code base gets, the more disadvantages appear and maintenance becomes a nightmare. It takes hours just to scroll through to find the method you want to work on. Also, you can't divide work among multiple programmers because the code is all in one file.

Unless you're sure that the library will stay small, you should probably plan on splitting it across several files right from the start. Even if you're sure that the library will remain small, you might want to do this anyway. Things have a way of outgrowing original expectations.

Look at what your code does and try to imagine a logical division of labor that will allow you to easily decide what code should go in what file. You might have alternate parts of your code that can be included at run time. For example, if you want to store files, you may have the option to store them on the file system, or in a database. Then you can split the methods between three files as follows:

class.php
database.inc.php
filesystem.inc.php

This helps in many ways. You don't have nearly as many if-then blocks in the code. PHP has to parse less. If you want to rework the file system code, you don't risk breaking the database code while you're doing it. You get the idea.

Making it work internally can be done a number of ways. My favorite is to wrap all the include code in an additional class and make it a property of the main class. Here's an example:

// File class.php
class libraryMain {

    var $typeclass;

    function libraryMain($type = 'filesystem')
    {
        switch ($type) {
        case 'filesystem' :
            require_once('filesystem.inc.php');
            break;
        case 'database' :
            require_once('database.inc.php');
            break;
        }
        $this->typeclass = new storeClass;
    }

    function getFile($filename)
    {
        return $this->typeclass->getFile($filename);
    }
}

// File database.inc.php
class storeClass {
    function getFile($filename)
    {
        // Code to retrieve a file from the database
    }
}

// File filesystem.inc.php
class storeClass {
    function getFile($filename)
    {
        // Code to retrieve a file from the filesystem
    }
}

Aside from the fact that this oversimplified code is lacking basic error checking, it will allow you to easily switch between file system-based and database-based file storage. Because both include files define a class named storeClass, a single script can load only one of them. There are other ways to accomplish this; however, I'll leave them as an exercise for the reader.

If you don't have any optional code, you can still benefit by dividing your code base into different files. Consider grouping your methods into broad categories by what they accomplish or how they function. You might put all your low-level methods in one file, and your high-level methods in another. You might put all the methods that deal with files in a file called file.inc.php while all the methods that deal with database records are in db.inc.php and all the methods that manipulate the data are in manip.inc.php.

Plan to Expand

One of the beautiful things about high-level languages like PHP is the ability to move data around within the code. There are no pointers to get frustrated over, and no memory management to worry about. Take advantage of this feature-filled language and pass the power on to users of your library.

Never return a single value when you can return an array. Never return an array when you can return an object. Seriously. The beautiful thing about arrays and objects is their ability to expand. What if your function generates an error? One way to handle it is to return false and have a method that retrieves the last error. This is implemented something like this:

$o = new Library;
if (!$o->someMethodCall()) {
    echo $o->getLastError();
}

This works, and in many cases is the most practical approach. But don't fail to consider the following enhancement:

$o = new Library;
$r = $o->someMethodCall();
if ($r->gotError()) {
    $r->tellErrorToUser();
    $r->resetClassToHandleError();
} else {
    $r->doWhatYouWouldDoIfSuccess();
}

To say that all methods should return an object is insane, but always consider the possibility. Arrays work almost as nicely. Consider the following example:

$o = new Library;
$r = $o->someMethodCall();
if ($r['error']) {
    echo $r['usererrormsg'];
    if ($user == 'Admin') {
        echo $r['adminerrormsg'];
    }
} else {
    echo $r['value'];
}

Each of the three examples does something a little different to illustrate some of the advantages of each method. Don't blindly use objects or arrays for all return values. The best way to decide is to consider what the implementer will be doing with the returned information and use that as a gauge. If they'll be using a lot of methods on it every time they acquire it, an object may be the best thing to return. If there are a number of values that may need to be returned, but you can't be sure how they'll be used, an array is probably best. If the method simply does something and needs to indicate success or failure, then a single value will suffice.
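For the object case, the library side might be sketched like this. Everything here is hypothetical: the class names LibraryResult and SettingsLibrary, the readSetting() method, and the accessor names other than gotError() are invented to show the shape of the idea, not taken from any real library.

```php
<?php
// Hypothetical result object that a library method can return.
class LibraryResult
{
    var $_value = null;
    var $_error = null; // null means the call succeeded

    function setValue($value) { $this->_value = $value; }
    function setError($message) { $this->_error = $message; }

    function gotError() { return $this->_error !== null; }
    function getError() { return $this->_error; }
    function getValue() { return $this->_value; }
}

// Hypothetical library whose method returns a result object
// rather than a bare value or false.
class SettingsLibrary
{
    function readSetting($settings, $key)
    {
        $result = new LibraryResult;
        if (array_key_exists($key, $settings)) {
            $result->setValue($settings[$key]);
        } else {
            $result->setError('Unknown setting: ' . $key);
        }
        return $result;
    }
}
```

New information, such as a separate message for administrators, can later be added to LibraryResult as another method without breaking any caller that only uses gotError() and getValue().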

This works in the other direction as well. Think about the arguments to your method and consider the possibility that an array might be better than a long list of values, especially if you have a lot of optional values. For example:

function setUpShop($name, $size = 'large', $color = 'red', $angle = 0, $scale = 1, $language = 'en')

Obviously, the only required value is the name, while the remaining values will be assigned defaults if not specified. But what if the person using this function needs to specify a non-default angle while leaving the default size and color? There's no way to do it without also passing the size and color explicitly, which means knowing what the defaults are. You could alternately define the function as:

function setUpShop($name, $parameters = array())
{
    if (!isset($parameters['size'])) $parameters['size'] = 'large';
    if (!isset($parameters['color'])) $parameters['color'] = 'red';
    // ... and so on for the remaining parameters ...
}

This way, the user can specify only the parameters needed, and the rest will be set to defaults. It's also possible to provide an object as a parameter, although the advantages are fewer.

There are disadvantages and advantages to both approaches. In the second one, the programmer has to put the values into the array prior to calling the function, which can be somewhat tedious. The best time to use the second approach is when there are a lot of potential values, and you can't be sure which ones the user will want to set, and which ones will be left as defaults. The first method works best if you know that the presence of value three will always require value two, and so forth. Another advantage to the second method is that you can add parameters to the method call without changing the API, so if you're not sure what parameters you might need in the long run, the second method is probably better.
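When there are many optional values, the defaults can be kept in one array and merged with whatever the caller supplies. This is a sketch of one possible approach using array_merge() rather than the isset() checks; returning the resolved settings is an assumption made here so the result can be inspected. One difference to be aware of: with array_merge() a caller who passes null for a key overrides the default, whereas isset() would treat null as missing.

```php
<?php
// Hypothetical variant of setUpShop() that returns the settings it resolved.
function setUpShop($name, $parameters = array())
{
    $defaults = array(
        'size'     => 'large',
        'color'    => 'red',
        'angle'    => 0,
        'scale'    => 1,
        'language' => 'en',
    );
    // Caller-supplied values override the defaults key by key.
    $settings = array_merge($defaults, $parameters);
    $settings['name'] = $name;
    return $settings;
}

// Only the angle is specified; everything else falls back to its default.
$shop = setUpShop('Corner Store', array('angle' => 45));
```

Adding a new optional setting later means adding one line to $defaults; no existing call has to change.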

Beware of Infinity

The only major error I know of in the PHP documentation is where the database documents say that you don't have to free result sets after use, since PHP will free the memory automatically when the script terminates. This is bad advice all around, but especially bad for libraries.

Always free large amounts of memory when you're through using it. Database result sets are a good example. (Use the preferred function, such as pg_free_result().) Variables that end up containing large arrays or long strings should be unset() when you know you won't need them again.

This seems to contradict what I said earlier about not having to worry about pointers and memory management, but it's important. No matter how big and powerful newer computers get, they're still working with limited resources. And you never know how many times a particular script might legitimately call your class methods. You might be surprised how quickly repeated database result sets can eat up all the memory on a web server. Large arrays and long string values aren't as bad because the memory is garbage collected when they go out of scope, but keep your eyes open for problems. Database connections can cause trouble as well, since most database servers have a limit to the number of connections that can be made. If possible, reuse the same connection. If not, be sure to close the connection when you're done with it.
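The same discipline applies to data a library object holds on to, because the object may live until the end of the script. As a rough sketch (the class ReportBuilder and its methods are hypothetical, and a plain array stands in for a database result set), an explicit free() method lets the caller release the data as soon as it is no longer needed:

```php
<?php
// Hypothetical class that holds a potentially large set of rows.
class ReportBuilder
{
    var $_rows = array();

    // Keep the rows around so several calculations can reuse them.
    function load($rows)
    {
        $this->_rows = $rows;
    }

    function total()
    {
        $sum = 0;
        foreach ($this->_rows as $row) {
            $sum += $row['amount'];
        }
        return $sum;
    }

    function rowCount()
    {
        return count($this->_rows);
    }

    // Release the rows; the object itself may stay alive much longer.
    function free()
    {
        $this->_rows = array();
    }
}
```

A caller would compute what it needs with total() and then call free(), in the same spirit as calling pg_free_result() on a result set once the rows have been read.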

Conclusion

I haven't covered everything there is to say about writing libraries. Then again, I don't feel that could be done without writing an entire book.

I feel that the most important points are consistency and documentation. Without consistency, a library is difficult to use, and lacks the “polish” that many people expect. Without documentation, well, there's little chance that anyone will be able to use the library.

About The Author

Bill Moran works for Potential Technologies and has been helping people get more out of their computers for over ten years. He can be reached at wmoran@potentialtech.com.
